PROGRESS IN
Nucleic Acid Research and Molecular Biology
edited by
WALDO E. COHN
Biology Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee
KIVIE MOLDAVE
Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, California
Volume 50
ACADEMIC PRESS San Diego New York Boston London Sydney Tokyo Toronto
This book is printed on acid-free paper.
Copyright © 1995 by ACADEMIC PRESS, INC. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Academic Press, Inc., A Division of Harcourt Brace & Company, 525 B Street, Suite 1900, San Diego, California 92101-4495
United Kingdom Edition published by Academic Press Limited, 24-28 Oval Road, London NW1 7DX
International Standard Serial Number: 0079-6603
International Standard Book Number: 0-12-540050-0
PRINTED IN THE UNITED STATES OF AMERICA
Abbreviations and Symbols
All contributors to this Series are asked to use the terminology (abbreviations and symbols) recommended by the IUPAC-IUB Commission on Biochemical Nomenclature (CBN) and approved by IUPAC and IUB, and the Editors endeavor to assure conformity. These Recommendations have been published in many journals (1, 2) and compendia (3); they are therefore considered to be generally known. Those used in nucleic acid work, originally set out in section 5 of the first Recommendations (1) and subsequently revised and expanded (2, 3), are given in condensed form in the frontmatter of Volumes 9-33 of this series. A recent expansion of the one-letter system (5) follows.

(5) SINGLE-LETTER CODE RECOMMENDATIONS (a)

Symbol   Meaning                Origin of symbol
G        G                      Guanosine
A        A                      Adenosine
T(U)     T(U)                   (ribo)Thymidine (Uridine)
C        C                      Cytidine
R        G or A                 puRine
Y        T(U) or C              pYrimidine
M        A or C                 aMino
K        G or T(U)              Keto
S        G or C                 Strong interaction (3 H-bonds)
W (b)    A or T(U)              Weak interaction (2 H-bonds)
H        A or C or T(U)         not G; H follows G in the alphabet
B        G or T(U) or C         not A; B follows A
V        G or C or A            not T (not U); V follows U
D (c)    G or A or T(U)         not C; D follows C
N        G or A or T(U) or C    aNy nucleoside (i.e., unspecified)
Q        Q                      Queuosine (nucleoside of queuine)

(a) Modified from Proc. Natl. Acad. Sci. U.S.A. 83, 4 (1986).
(b) W has been used for wyosine, the nucleoside of "base Y" (wye).
(c) D has been used for dihydrouridine (hU or H2Urd).

Enzymes
In naming enzymes, the 1984 recommendations of the IUB Commission on Biochemical Nomenclature (4) are followed as far as possible. At first mention, each enzyme is described either by its systematic name or by the equation for the reaction catalyzed or by the recommended trivial name, followed by its EC number in parentheses. Thereafter, a trivial name may be used. Enzyme names are not to be abbreviated except when the substrate has an approved abbreviation (e.g., ATPase, but not LDH, is acceptable).
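As an illustration of how the single-letter nucleoside code tabulated earlier in this section can be applied, the following is a minimal sketch in Python. The names `IUPAC_CODES`, `matches`, and `expand` are hypothetical helpers introduced here, not part of the Recommendations; DNA letters are used, with U treated as T, and Q is left out because it denotes a modified nucleoside rather than a set of standard bases.

```python
# Single-letter codes from the table, written with DNA letters (U is read as T).
# Q (queuosine) is omitted: it names a modified nucleoside, not a base set.
IUPAC_CODES = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT",
    "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT",
    "N": "GATC",
}


def _normalize(text: str) -> str:
    """Upper-case the input and treat U as T (assumption made for this sketch)."""
    return text.upper().replace("U", "T")


def matches(symbol: str, base: str) -> bool:
    """Return True if a concrete base is allowed by a possibly ambiguous symbol."""
    return _normalize(base) in IUPAC_CODES[_normalize(symbol)]


def expand(pattern: str) -> list[str]:
    """List every concrete sequence described by a pattern of single-letter codes."""
    sequences = [""]
    for symbol in _normalize(pattern):
        choices = sorted(IUPAC_CODES[symbol])
        sequences = [prefix + base for prefix in sequences for base in choices]
    return sequences


print(expand("RY"))       # purine followed by pyrimidine
print(matches("H", "G"))  # H means "not G"
```

The lookup is kept as a plain dictionary so that each row of the table corresponds to exactly one entry.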
REFERENCES
1. JBC 241, 527 (1966); Bchem 5, 1445 (1966); BJ 101, 1 (1966); ABB 115, 1 (1966), 129, 1 (1969); and elsewhere. General.
2. EJB 15, 203 (1970); JBC 245, 5171 (1970); JMB 55, 299 (1971); and elsewhere.
3. "Handbook of Biochemistry" (G. Fasman, ed.), 3rd ed. Chemical Rubber Co., Cleveland, Ohio, 1970, 1975, Nucleic Acids, Vols. I and II, pp. 3-59. Nucleic acids.
4. "Enzyme Nomenclature" [Recommendations (1984) of the Nomenclature Committee of the IUB]. Academic Press, New York, 1984.
5. EJB 150, 1 (1985). Nucleic Acids (One-letter system).

Abbreviations of Journal Titles

Journals                                  Abbreviations used
Annu. Rev. Biochem.                       ARB
Annu. Rev. Genet.                         ARGen
Arch. Biochem. Biophys.                   ABB
Biochem. Biophys. Res. Commun.            BBRC
Biochemistry                              Bchem
Biochem. J.                               BJ
Biochim. Biophys. Acta                    BBA
Cold Spring Harbor                        CSH
Cold Spring Harbor Lab                    CSHLab
Cold Spring Harbor Symp. Quant. Biol.     CSHSQB
Eur. J. Biochem.                          EJB
Fed. Proc.                                FP
Hoppe-Seyler's Z. Physiol. Chem.          ZpChem
J. Amer. Chem. Soc.                       JACS
J. Bacteriol.                             J. Bact.
J. Biol. Chem.                            JBC
J. Chem. Soc.                             JCS
J. Mol. Biol.                             JMB
J. Nat. Cancer Inst.                      JNCI
Mol. Cell. Biol.                          MCBiol
Mol. Cell. Biochem.                       MCBchem
Mol. Gen. Genet.                          MGG
Nature, New Biology                       Nature NB
Nucleic Acid Research                     NARes
Proc. Natl. Acad. Sci. U.S.A.             PNAS
Proc. Soc. Exp. Biol. Med.                PSEBM
Progr. Nucl. Acid. Res. Mol. Biol.        This Series
Some Articles Planned for Future Volumes

The Poly(ADP)-ribosylation System of Higher Eukaryotes
FELIX R. ALTHAUS

Reconstitution of Mammalian DNA Replication
ROBERT A. BAMBARA AND LIN HUANG

The Rodent BC1 Gene as a Master Gene for the ID Family Retroposition: Evolution and Functional Studies
PRESCOTT DEININGER, HENRY TIEDGE, JOOMYEONG KIM AND JURGEN BROSIUS

Transcriptional Regulation of Growth Related Genes
THOMAS F. DEUEL AND ZHAO-YI WANG

Poly(A) Tails, Structure, and Function
MARY EDMONDS

Mechanism of Transcription Fidelity
GUNTHER EICHHORN AND JIM BUTZOW

The Mechanics and Specificity of Signal Transduction to the Nucleus: Lessons from c-fos
MICHAEL GILMAN

Regulation of Expression of the Gene for Malic Enzyme
ALAN G. GOODRIDGE

Structure/Function Relationships of Phosphoribulokinase and Ribulose Bisphosphate Carboxylase/Oxygenase
FRED C. HARTMAN AND HILLEL K. BRANDES

Histone Interactions with Special DNA Structures
KENSAL E. VAN HOLDE AND JORDANKA ZLATANOVA

Examination of Mitotic Recombination by Means of Hyper-recombination Mutants in Saccharomyces cerevisiae
HANNAH L. KLEIN

Molecular Regulation of Heme Biosynthesis in Higher Vertebrates
BRIAN K. MAY, SATISH C. DOGRA, TIM J. SADLON, C. KAMANA BHASKER, TIMOTHY C. COX AND SYLVIA S. BOTTOMLEY

Drugs That Deplete Mitochondrial DNA in Vertebrates: Basic and Physiological Considerations
RÉJEAN MORAIS

The Chemistry and Biology of Double-stranded RNA
ALLAN W. NICHOLSON

The Decay of Bacterial Messenger RNA
DONALD P. NIERLICH

The Role of Ribosomal RNA in Translation
JIM OFENGAND

Gene Structure Creates Diversity in Isozyme Structure, Substrate Specificity, and Regulation
IDA S. OWENS AND JOSEPH K. RITTER

Structure, Function, and Inhibition of O6-Alkylguanine-DNA Alkyltransferase
ANTHONY E. PEGG, M. EILEEN DOLAN AND ROBERT C. MOSCHEL

Bacterial and Eukaryotic DNA Methyltransferases
NORBERT O. REICH

The FLP Recombinase of the 2-µm Plasmid of Saccharomyces cerevisiae
PAUL D. SADOWSKI

Site-specific Chemical Nucleases
DAVID S. SIGMAN

Replicable RNA Vectors: Prospects for Cell-Free Gene Amplification, Expression, and Cloning
ALEXANDER B. CHETVERIN AND ALEXANDER S. SPIRIN

Transcriptional Regulation of Small Nuclear RNA Genes
WILLIAM E. STUMPH

Transcription of the Herpes Simplex Virus Genome during Productive and Latent Infection
EDWARD K. WAGNER, JOHN F. GUZOWSKI AND JASBIR SINGH
Ribosome-catalyzed Peptide-bond Formation
KATHY R. LIEBERMAN AND ALBERT E. DAHLBERG
Division of Biology and Medicine, Brown University, Providence, Rhode Island 02912

I. The Enzyme
II. The Substrates
III. Reactions with "Unnatural" Substrates
IV. Implication of 23-S rRNA in Peptide-bond Formation
V. Prospective
Translation of messenger RNA (mRNA) into protein by ribosomes is a complex and highly critical phase of gene expression. Peptide-bond formation, the covalent linkage of amino acids during mRNA translation, is among the most fundamental biochemical transformations in nature, and is the principal catalytic activity of ribosomes. The enzymatic activity responsible for peptide-bond formation, peptidyltransferase, is integral to the ribosome (1). In recent years, evidence has accumulated suggesting that ribosomal RNA (rRNA) is intimately involved in the catalysis of peptidyl transfer, leading to the proposal that the catalytic activity is a property of rRNA, and to the speculation that the contemporary translational mechanism has evolved from a primordial peptidyltransferase consisting solely of RNA (2). In this article, we review the essential features of ribosome-catalyzed peptide-bond formation, and the involvement of rRNA in catalysis of this critical reaction.

Ribosomes are complex ribonucleoprotein particles, consisting in all organisms of two subunits; in the eubacterium Escherichia coli, the 30-S ribosomal subunit is composed of 16-S rRNA and a single copy each of 21 ribosomal proteins, whereas the 50-S ribosomal subunit consists of 5-S rRNA, 23-S rRNA, and 32 different proteins, one of which is present in four copies. Translation of mRNA is initiated by the assembly of a ternary complex between the 30-S subunit, mRNA, and the initiation-specific aminoacyl transfer RNA (aa-tRNA) substrate, fMet-tRNAfMet (Fig. 1, top right). This assembly process is facilitated and regulated by the activities of three protein
[Figure 1: diagram of the translational cycle. Recoverable labels: Ribosome; Messenger RNA; Transfer RNA; Initiation; Elongation; Termination; Initiation factors and GTP; Initiation factors (regenerated) and GDP; Elongation factor Tu and GTP; Elongation factor G and GTP; GDP.]
FIG. 1. General scheme for the ribosomal translational cycle. Indicated are the broad features of the initiation (top right), elongation (bottom), and termination (top left) phases of ribosome-catalyzed protein synthesis. A, Aminoacyl-tRNA site; P, peptidyl-tRNA site; E, exit site. [Modified from Engelman and Moore (91). Copyright © 1976 by Scientific American, Inc. All rights reserved.]
initiation factors (3). The 50-S ribosomal subunit then associates with the ternary complex to begin protein chain elongation. For each peptide bond formed during the elongation phase of translation, a peptidyl-tRNA molecule (or fMet-tRNAfMet for the first peptide bond) is bound to the ribosomal peptidyl site (P-site) and (except in the case of the first peptide bond) the deacylated tRNA product from the previous cycle is bound to the exit site (E-site; see footnote 1) (4-9). An aa-tRNA molecule is delivered to the ribosomal aminoacyl site (A-site) in a complex with the protein elongation factor Tu (EF-Tu) and guanosine triphosphate (GTP) (Fig. 1, center), and the E-site tRNA is released from the ribosome (4). Accuracy in mRNA decoding, which might be defined as substrate specificity or discrimination in peptidyl transfer, requires that productive recognition of aa-tRNA in the A-site be tied to appropriate codon-anticodon interaction. Substrate discrimination is therefore accomplished through a complex series of events that is not yet fully understood in mechanistic detail (10-12). The first level of discrimination derives from the differential affinities of cognate and noncognate tRNAs (complexed with EF-Tu·GTP), and is conferred, at least in part, by base-pairing matches or mismatches in codon-anticodon interaction during initial recognition (Fig. 2, k1 and k-1). Binding of the aa-tRNA·EF-Tu·GTP complex to the ribosome triggers the hydrolysis (Fig. 2, k2) of at least two molecules of GTP (13, 14), and the subsequent dissociation of EF-Tu·GDP frees the α-amino group of the A-site-bound aa-tRNA to participate in nucleophilic attack on the carbonyl carbon of the aminoacyl ester of the P-site-bound peptidyl-tRNA. Accuracy is further enhanced by a proofreading step that occurs following GTP hydrolysis, and that appears to consist of a kinetic competition between the rate of dissociation from the ribosome of aa-tRNA (Fig. 2, k4) and the rate of peptidyl transfer (Fig. 2, k3) (10, 11). Thus noncognate aa-tRNAs that progress past the irreversible step of GTP hydrolysis (Fig. 2, k2) may exhibit a higher value of k4 compared to cognate tRNAs, and may still be rejected prior to their incorporation into the nascent protein. Although it is possible, in theory, that the relative magnitude of k4 is solely attributable to the strength of the codon-anticodon interaction, it is also possible that the competition between k4 and k3 reflects the differential ability of cognate or noncognate tRNAs to make additional contacts with the ribosome (i.e., to achieve a bound structure dependent on appropriate codon-anticodon interaction) necessary to obtain the binding energy and alignment for peptide-bond formation (k3). Such contacts could thus be simultaneously essential for catalysis of peptidyl transfer and for translational fidelity.

Footnote 1. Although there are conflicting views regarding the mechanistic role of the E-site in the elongation cycle (see, for example, 5 and 6), the evidence for a structurally distinct third tRNA binding site (7) with a preferential affinity for deacylated tRNA is now well established (8, 9).
[Figure 2: kinetic scheme. INITIAL RECOGNITION: RS + TC <=> RS·TC (k1 forward, k-1 reverse), then RS·TC -> RS·EF-Tu·GDP·aa-tRNA (k2). PROOFREADING: RS·EF-Tu·GDP·aa-tRNA -> RS·pep-tRNA + EF-Tu·GDP (k3), or -> aa-tRNA + RS·EF-Tu·GDP (k4), followed by RS·EF-Tu·GDP -> RS + EF-Tu·GDP (k5). The placement of k3 and k4 follows their use in the text.]
FIG. 2. Selection of aminoacyl-tRNA in the ribosomal A-site. A two-stage model for discrimination between cognate and noncognate aminoacyl-tRNAs, featuring an initial recognition step and a kinetic proofreading step. RS, mRNA-programmed ribosome with peptidyl-tRNA in the P-site; TC, complex of elongation factor Tu (EF-Tu), aminoacyl-tRNA (aa-tRNA), and GTP. [From Thompson (10).]
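The two-stage scheme of Fig. 2 can be made concrete with a small numerical sketch. It assumes, purely for illustration, that each stage is a simple partition between two competing first-order steps: dissociation (k-1) versus GTP hydrolysis (k2) during initial recognition, and dissociation (k4) versus peptidyl transfer (k3) during proofreading. The function name and all rate constants below are hypothetical and are not taken from the cited work.

```python
def acceptance_probability(k_minus1: float, k2: float, k3: float, k4: float) -> float:
    """Probability that a bound ternary complex ends in peptide-bond formation.

    Stage 1 (initial recognition): the RS.TC complex either dissociates (k-1)
    or proceeds through irreversible GTP hydrolysis (k2).
    Stage 2 (proofreading): the aa-tRNA either dissociates (k4) or undergoes
    peptidyl transfer (k3).
    """
    p_initial = k2 / (k2 + k_minus1)
    p_proofread = k3 / (k3 + k4)
    return p_initial * p_proofread


# Hypothetical rate constants (per second), chosen only for illustration:
# the noncognate substrate dissociates 100-fold faster at both stages.
cognate = acceptance_probability(k_minus1=1.0, k2=10.0, k3=10.0, k4=1.0)
noncognate = acceptance_probability(k_minus1=100.0, k2=10.0, k3=10.0, k4=100.0)

print(f"cognate acceptance:    {cognate:.3f}")
print(f"noncognate acceptance: {noncognate:.4f}")
print(f"selectivity (ratio):   {cognate / noncognate:.1f}")
```

With these invented numbers each stage alone favors the cognate substrate about 10-fold, and the two stages multiply to roughly 100-fold. The point of the sketch is only that a second, independent kinetic competition after GTP hydrolysis multiplies the discrimination obtained at initial recognition.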
The ribosome is a processive enzyme; thus, following peptide-bond formation, a translocation event occurs (Fig. 1, bottom) in which the movement of tRNA substrates and mRNA is promoted by interaction with another GTPase, elongation factor G (EF-G). The translocation step is highly critical, because maintenance of reading-frame is as essential to translational fidelity as is accurate A-site decoding, and depends on the movement of tRNA substrates and mRNA by precisely three nucleotides with respect to the ribosome. Recent studies indicate that translocation occurs in a two-state process, with the 3' ends of the P- and A-site tRNA substrates moving spontaneously relative to the 50-S subunit (from the P- to E-site, and from the A- to P-site, respectively) following peptide-bond formation, whereas the movement of the substrates and mRNA with respect to the 30-S subunit is mediated by subsequent interaction with EF-G (15, 16). Following translocation, the elongation cycle begins again, and is repeated until an mRNA termination codon enters the A-site (Fig. 1, top left), whereupon a protein release factor promotes the peptidyltransferase-catalyzed hydrolysis of the P-site-bound peptidyl-tRNA (17).

Peptide-bond formation thus occurs in the context of an intricate process in which substrates are delivered to the ribosome and move from one binding site to another in a highly ordered fashion, dictated by interactions with mRNA, soluble protein factors, and the ribosome. A complete understanding of the catalysis of peptidyl transfer must therefore seek to integrate the kinetic and energetic contributions of all of these sequential interactions, while providing a detailed structural and functional description of the encounters between the ribosome and its tRNA substrates.
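The dependence of fidelity on movement by precisely three nucleotides can be illustrated with a short sketch that reads the same message in two frames. The codon table is the standard genetic code, built from the conventional U, C, A, G ordering; the function `translate` and the example message are hypothetical and serve only to show how a one-nucleotide slip changes every downstream residue.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G order ("*" = termination).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str, start: int = 0) -> str:
    """Read codons three nucleotides at a time from `start`, stopping at a termination codon."""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


mrna = "AUGGCUUUCAAAUAA"  # invented message, not from the source

print(translate(mrna, start=0))  # correct reading frame
print(translate(mrna, start=1))  # frame shifted by one nucleotide
```

In the correct frame the message reads Met-Ala-Phe-Lys and then terminates; shifted by one nucleotide, the same sequence is read as an unrelated series of codons and the termination codon is no longer recognized.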
The very complexity of the translational process, however, renders the dissection of mechanistic aspects of individual steps, including peptidyl transfer, quite difficult. In order to study peptide-bond formation, in vitro systems have been devised that significantly simplify the reaction, and in the process, a considerable amount of information regarding the requirements for catalysis has been obtained.
I. The Enzyme

The ribosome-catalyzed in vitro synthesis of polyphenylalanine in response to the synthetic mRNA, polyuridylate [poly(U)], provides a well-defined and manipulable model system for the processive elongation phase of protein synthesis (1). This system requires 70-S ribosomes, factors from a cellular extract contained in the supernatant after the ribosomes have been removed by centrifugation, total cellular tRNAs or purified tRNAPhe, total cellular amino acids or pure phenylalanine, GTP or an energy-regenerating system, and poly(U). Using the polyphenylalanine synthesis system, critical insight into the source of the catalytic activity responsible for peptidyl transfer has been gained through the elucidation of the mode of action of the antibiotic puromycin. Puromycin, a structural analog of aminoacyl-adenosine, inhibits protein synthesis by acting as an A-site substrate, and is incorporated via its α-amino group at the carboxyl terminus of the growing peptide chains (18). Because the linkage between the amino-acid and nucleoside moieties of puromycin is an amide rather than the more reactive acyl ester of the natural aa-tRNA substrates, peptidyl-puromycin does not function as a P-site substrate. Moreover, puromycin contains no tRNA moiety capable of interacting with mRNA and the 30-S subunit; thus, following its covalent attachment, further processive reactions are not possible, and the peptide chain is released from the ribosome. If polyphenylalanine synthesis is allowed to proceed for a period in the presence of supernatant factors and GTP, and these factors are then removed by washing the ribosomes in buffer containing a high concentration of salt, the synthesized polypeptide chains (as peptidyl-tRNA) remain bound to the ribosome. The addition of puromycin releases the peptide chains from the ribosome (19). Similar results have been obtained from experiments utilizing an analogous system of poly(A)-directed synthesis of polylysine (20, 21), where it was further demonstrated that polylysyl-tRNALys, bound to salt-washed ribosomes, can be transferred to either puromycin or lysyl-tRNALys (21). Moreover, the puromycin system can be further simplified by directly binding fMet-tRNAfMet to salt-washed ribosomes in the presence of the
oligoribonucleotide triplet AUG (methionine codon) and adding puromycin, to obtain the ribosome-catalyzed formation of fMet-puromycin (22). The use of salt-washed ribosomes and purified components demonstrated that peptidyltransferase activity is integral to the ribosome, and not a property of another component of the cellular extract. Because these experiments rendered peptidyltransferase nonprocessive, they indicated that supernatant factors, necessary to support processive synthesis, are not required for catalysis. Interpreted in light of a more contemporary understanding of the roles of the supernatant factors, these findings indicate that neither the energetic nor kinetic contributions of EF-Tu-mediated binding of aminoacyl-tRNA, nor of EF-G-mediated translocation, are required for the chemistry of catalysis. The ribosome contains all determinants essential for peptide-bond formation.

In fact, the entire 70-S ribosome is not required for catalysis. Polyphenylalanyl-tRNAPhe, synthesized in response to poly(U) by 70-S ribosomes, remains bound to salt-washed 50-S ribosomal subunits when they are separated from 30-S subunits by sucrose gradient centrifugation; this complex, in the absence of the 30-S subunit, reacts with puromycin (19). Indeed, under appropriate buffer conditions, purified salt-washed 50-S subunits alone catalyzed peptide-bond formation between puromycin and an aminoacyl-oligoribonucleotide fragment derived from the 3' end of fMet-tRNAfMet by RNase digestion (23) (discussed further in Section II). Thus, the large ribosomal subunit is responsible for the principal catalytic function of the ribosome, peptidyl transfer.

Indeed, not even the entire 50-S subunit is required for catalysis. Genetic experiments have revealed that several large subunit proteins are not essential for protein synthesis in vivo (24). Furthermore, after tackling the difficult task of devising an ordered reconstitution scheme (and assembly map) for the E. coli 50-S subunit (25, 26), Nierhaus and co-workers employed a series of reconstitution experiments with rRNA and different subsets of the 50-S subunit proteins to determine that only 23-S rRNA and five of the 50-S proteins could be correlated with the reconstitution of peptidyltransferase activity (27, 28). No single ribosomal protein or group of proteins has been found to be capable of catalysis of peptide-bond formation in the absence of rRNA (29), and recently, it has been demonstrated that the 50-S subunit from Thermus aquaticus ribosomes, depleted of over 90% of its protein component by proteolysis and extensive phenol extraction, can still catalyze peptidyl transfer efficiently (30). Catalytic activity was destroyed by ribonuclease treatment, thus indicating the critical importance of rRNA in peptide-bond formation.
II. The Substrates

In both the A- and P-sites, the ribosome must productively recognize some or all of the common features of aa-tRNA structure, and must do so in the context of over 20 different tRNAs and their attached amino acids. These common determinants may include the N-terminal amide linkage (peptidyl chain) in the P-site substrate, and the free α-amino group in the A-site substrate. Common determinants in both the A- and P-site substrates may include the 2'(3')-O-aminoacyl ester linkage, the very similar but by no means identical three-dimensional shape of the tRNAs, and the universally invariant sequence CCA at the 3' terminus of tRNAs. The absolute conservation of the 3' CCA sequence strongly indicates it has an essential role in tRNA function. Indeed, in the tRNA genes of many organisms, the CCA sequence is not encoded by the DNA sequence, but rather is added posttranscriptionally (31). In addition, there is enzymatic machinery for the repair of the 3' CCA end in organisms whose genes do encode the sequence, such as E. coli (32). Presumably, such a metabolic investment is a further indication of the critical function of this sequence.

As mentioned above, puromycin is an efficient A-site substrate, indicating that a small fragment of aa-tRNA derived from the 3' terminus is sufficient for productive A-site interaction with peptidyltransferase. Analogously, the entire peptidyl-tRNA molecule is not required for P-site function. The fragment CAACCA-fMet, derived from the 3' terminus of fMet-tRNAfMet by digestion with RNase T1, is an efficient P-site substrate for the formation of fMet-puromycin, catalyzed by either 70-S ribosomes or by 50-S subunits alone (23, 33). The rate of peptide-bond formation by 70-S ribosomes between the CAACCA-fMet fragment and puromycin is about half the rate obtained with intact fMet-tRNAfMet, although the extent of reaction is identical for the two substrates (34). This system, termed the fragment reaction, requires Mg2+ ions and either K+ or NH4+ ions, is independent of mRNA and the 30-S subunit, and requires the presence of a water-miscible organic solvent at concentrations between 10 and 33%. Ethanol, methanol, and acetone all promote the reaction (35), with methanol being the most effective. The requirement for alcohol, although not well-understood, must be related to binding of the minimal P-site substrate from solution, because intact peptidyl-tRNA, synthesized on 70-S ribosomes and then separated from 30-S subunits as a complex with the 50-S subunit, is reactive toward puromycin in the absence of organic solvent (19).

The CAACCA-fMet fragment was further truncated by exonuclease digestion and the rates of reaction for the smaller substrates were compared.
The P-site activities of the fragment substrates AACCA-fMet and CCA-fMet were virtually identical to the activity obtained with the hexamer substrate, whereas CA-fMet and A-fMet were inactive (34; see footnote 2). Thus a minimal system for catalysis of peptidyl transfer consists of the large ribosomal subunit, an N-acyl-aminoacyl-oligoribonucleotide containing the 3' CCA sequence of tRNA as the P-site substrate, the aminoacyl-adenosine analog puromycin as A-site substrate, and divalent and monovalent cations. Notably, the portions of the substrates required for efficient participation in the fragment reaction consist of features found in common among all aa-tRNAs.

Footnote 2. It was later found that with intact Phe-tRNAPhe as the A-site substrate, fMet-adenosine 5'-monophosphate could function as a P-site substrate (36) in the presence of methanol and at much higher concentrations than originally employed (34). This activity was enhanced by the presence of cytidine 5'-monophosphate (pC), but not by the 5'-phosphates of the other nucleosides (37).

The attributes of such simplified assay systems are precisely the same as their limitations. By separating peptidyl transfer from the processivity and decoding demands of mRNA translation, the reaction is essentially reduced to two integrated components, productive substrate binding and the chemical steps of catalysis. Although binding of both substrates to the ribosome occurs from solution, thus likely altering the free energy changes of initial recognition events, it seems reasonable to assume that the same fundamental reaction mechanism applies in the model reaction. This assumption is supported by the fact that the fragment reaction is subject to inhibition by the same antibiotics that inhibit peptide-bond formation both in vivo and in more complete in vitro systems (38), such as chloramphenicol and carbomycin.

The importance of the 3' CCA sequence for accurate, catalytically productive P-site function is underscored by the behavior of tRNAs containing mutations in this sequence. A mutant of E. coli tRNAfMet with the 3' sequence UCA, prepared in vitro by bisulfite treatment, was methionylated and formylated in vitro, and bound to the P-site in the presence of initiation factors (39). The bound mutant fMet-tRNAfMet was inactive in the puromycin reaction. Recently, a mutant of E. coli tRNAPhe with the 3' end sequence GGA was constructed and expressed by in vitro transcription. The mutant tRNA was tested as a P-site substrate in experiments using 70-S ribosomes, in the presence of either poly(U) or methanol (40). Although, as a deacylated species, mutant tRNA bound to the P-site as efficiently as did wild-type tRNA, the aminoacylated, N-acetylated form of the mutant tRNA failed to react with puromycin. Mutants of tRNAVal with the 3' end sequences ACA and GCA have been isolated, using a genetic selection for suppressors of a frameshift mutation in the trpE gene of the tryptophan biosynthetic operon of Salmonella typhimurium (41). In addition to causing frameshifting, expression of these mutant tRNAs in E. coli promoted readthrough of nonsense mutations in a lacZ reporter gene. To obtain nonsense suppression with the mutant tRNAs, a valine codon was required just upstream of the nonsense codon. Protein sequencing of the resultant β-galactosidase revealed that valine was inserted into the protein chain in response to the valine codon, followed by the insertion of a noncognate tRNA in response to the next (nonsense) codon. Thus the misreading event responsible for nonsense suppression occurred while the mutant tRNA occupied the ribosomal P-site.
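The P-site substrate results described in this section can be collected into a small lookup and compared with a deliberately simple rule: a substrate is productive only if its 3' terminus ends in CCA. The sketch below is illustrative only. The dictionary `REPORTED` restates the outcomes given in the text for the original fragment-reaction conditions and the puromycin assays, and `has_cca_terminus` is a hypothetical helper; the rule ignores the later finding, noted in footnote 2, that a shorter substrate can react under different conditions.

```python
# 3'-terminal sequences of P-site substrates and whether the text reports
# them as active in peptidyl transfer to puromycin.
REPORTED = {
    "CAACCA": True,   # hexamer fragment from fMet-tRNAfMet
    "AACCA": True,
    "CCA": True,
    "CA": False,
    "A": False,
    "UCA": False,     # 3' end of the bisulfite-derived tRNAfMet mutant
    "GGA": False,     # 3' end of the tRNAPhe mutant
}


def has_cca_terminus(three_prime_sequence: str) -> bool:
    """Simplified rule: productive P-site substrates end in CCA."""
    return three_prime_sequence.upper().endswith("CCA")


for sequence, active in REPORTED.items():
    predicted = has_cca_terminus(sequence)
    status = "agrees" if predicted == active else "disagrees"
    print(f"{sequence:>6}  reported={active!s:<5}  rule={predicted!s:<5}  {status}")
```

The tRNAVal mutants ending in ACA and GCA are left out of the lookup because the text reports their effects on frameshifting and nonsense suppression rather than their reactivity toward puromycin.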
III. Reactions with "Unnatural" Substrates

While recognizing features common to all aa-tRNA substrates, peptidyltransferase must also have the flexibility to accommodate all combinations of 2 out of 20 different amino acids, derivatized from an even greater number of tRNAs (due to isoacceptors). In addition, the active site exhibits flexibility toward the chemical nature of the groups participating in the reaction. Although, during the elongation phase of translation, peptidyl transfer is the aminolysis of an activated aminoacyl ester, when an mRNA termination codon enters the ribosomal A-site, the peptidyltransferase active site, in the presence of a protein release factor, catalyzes the hydrolysis of P-site-bound peptidyl-tRNA (17). This corresponds to a change in the A-site specificity with regard to the nucleophile, from the primary amine of aa-tRNA to the oxygen of a water molecule.

However, the ability of peptidyltransferase to utilize water or a primary alcohol as a nucleophile is not strictly dependent on the presence of release factor. In the presence of acetone, both E. coli and rabbit reticulocyte ribosomes catalyze the hydrolysis of P-site-bound fMet-tRNAfMet (42). Under the same conditions, but with ethanol present rather than acetone, fMet is transferred to the alcohol to form fMet-ethyl ester (43). Escherichia coli 70-S ribosomes also catalyze ester formation between the α-hydroxyl derivative of puromycin and fMet-tRNAfMet (44), in a reaction that requires no organic solvent. These findings suggest peptidyltransferase has a flexibility similar to that of many proteolytic enzymes, which catalyze the hydrolysis of both amide and ester substrates (45). Perhaps more remarkably, E. coli ribosomes have been shown to catalyze the processive reaction of polyester synthesis (46). In the presence of elongation factors, GTP, poly(U), and phenyllactyl-tRNAPhe, the product poly(phenyllactate) was formed. Finally, E. coli ribosomes, but not rat liver ribosomes, catalyze thioester formation between thiopuromycin and AcPhe-tRNAPhe (47).

Alterations in the chemical nature of the electrophilic substrate are also tolerated by peptidyltransferase. Under fragment reaction conditions (in the presence of methanol), E. coli ribosomes catalyze the formation of a thioamide between an AcLeu-thioester derivative of adenosine and Phe-tRNAPhe (48). Using the same system, the ribosome-catalyzed formation of a phosphinoamide linkage between the AcMet Gly-phosphinoester derivative of adenosine and Phe-tRNAPhe was observed (49), a reaction expected to proceed through a trigonal bipyramid transition state, with very different geometry than the tetrahedral transition state predicted for the natural reaction.

Perhaps the most remarkable report of ribosome-catalyzed synthesis of an unnatural chemical linkage involved a P-site substrate containing a second electrophilic center (50). N-(chloroacetyl)Phe-tRNAPhe was prepared and employed as a P-site substrate in a dipeptide-synthesis assay, with E. coli 70-S ribosomes, poly(U), and Phe-tRNAPhe as the A-site substrate. Two products were formed. One had the normal Phe-Phe linkage, whereas the other, obtained in equal yield, was the product of attack by the α-amino group at the chloro-substituted carbon, several bond lengths away from the carbonyl carbon of the aminoacyl ester. Both reactions were ribosome-dependent and both were inhibited by chloramphenicol.

Taken together, these studies draw a picture of an active site with an intriguing level of flexibility and tolerance for chemical and structural variability in the aminoacyl moiety of both A- and P-site substrates, and even for the structure and charge distribution of the predicted transition state for the catalyzed reaction. It is possible that this flexibility reflects properties important for an enzyme that must interact with many substrates that differ both in terms of tRNA structure and attached amino-acid structure. Although DNA and RNA polymerases also face the problem of multiple substrates, the number and size of the substrates that they encounter are much smaller than those encountered by the ribosome. Among these polymerases, the ribosome is unique in possessing an essential, conserved RNA component. Perhaps the catalytic flexibility of peptidyltransferase may derive from the participation of rRNA in the catalysis of peptide-bond formation.
IV. Implication of 234 rRNA in Peptide-bond Formation
An abundance of genetic and biochemical evidence now implicates 23-S rRNA, and in particular the region encompassing the secondary structural
feature termed the central loop of domain V (Fig. 3), in peptidyltransferase function. The secondary structure of domain V is extremely well conserved (51), whereas the central loop consists for the most part of nucleotides whose identity and position are invariant in all organisms known to date. Indeed, because of the large number of universally conserved residues in this region, no secondary structural interactions, and only a single tertiary interaction (between central loop residue 2586 and residue 1782 in domain IV), have been proposed for central loop nucleotides by phylogenetic analysis (52), which relies on covariance as the criterion for the validity of an interaction. Thus the higher order folding of 23-S rRNA in this region, likely to be essential for formation of a structure important for catalysis of peptidyl transfer, is largely unknown. In many organisms, mutations that confer resistance to antibiotic inhibitors of peptidyltransferase, such as chloramphenicol and lincomycin in eubacteria and mitochondria, and anisomycin in archaebacteria and eukaryotes (53, 54, and references therein), have been mapped to the central loop region (Fig. 4A). When bound to E. coli ribosomes, chloramphenicol and carbomycin (another peptidyltransferase inhibitor) protect an overlapping set of conserved central loop residues (Fig. 4B) from chemical modification by dimethyl sulfate and kethoxal (55). The sites of protection correlate remarkably well with the sites of the resistance mutations, and argue strongly that this region constitutes the binding site for these peptidyltransferase inhibitors. Several photoaffinity analogs of the substrates of peptidyl transfer have been cross-linked to the central loop region in E. coli ribosomes (Fig. 5), suggesting that it is in extremely close proximity to the portions of the aa-tRNA substrates participating in peptide-bond formation.
FIG. 4. Interaction of antibiotic inhibitors of peptidyltransferase with the central loop of domain V of 23-S rRNA. (A) The sites of mutations in various organisms conferring resistance to the peptidyltransferase inhibitors chloramphenicol (O), lincomycin (A), or anisomycin (u). (B) The nucleotide residues in E. coli 23-S rRNA protected from chemical modification by the binding of chloramphenicol or carbomycin, another peptidyltransferase inhibitor. [From Noller (92).]
Phe-tRNAPhe, derivatized at the α-amino group with a photoreactive benzophenone function (BP-Phe-tRNAPhe), was cross-linked in high yield to universally conserved nucleotides (56). When bound to the P-site, cross-links were obtained between BP-Phe-tRNAPhe and central loop residues A2451 and C2452. When BP-Phe-tRNAPhe was bound to ribosomes in which P-site binding was blocked by deacylated tRNA [a complex that mimics the state of the ribosome immediately following peptidyl transfer but prior to EF-G-mediated translocation (15)], the cross-links obtained were across the loop at residues U2584 and U2585. The p-azido derivative of puromycin has been specifically cross-linked to the universally conserved central loop residues G2502 and U2504 (57). The cross-linking of residue A2439, using an N-(p-azidobenzoyl)glycyl derivative of Phe-tRNAPhe, has been reported (58). A2439, another universally conserved residue, is located in one of the conserved helices extending out from the central loop. Interestingly, a U-to-C mutation of the nucleotide immediately adjacent to A2439 (corresponding to position 2438 in the E. coli sequence) was obtained in the archaebacterium Halobacterium halobium, using a selection for cells resistant to amicetin, an antibiotic inhibitor of peptidyltransferase (59). Further strong indication that this region participates in the formation of the peptidyltransferase active site comes from the demonstration that each of the aa-tRNA molecules, bound to the A- and P-sites, protects a specific set of conserved 23-S rRNA residues from chemical modification (Fig. 6). The protected residues are clustered in or near the central loop of domain V (7), and consist largely of the very same nucleotides implicated in the cross-linking and antibiotic resistance studies described above. A stepwise loss of rRNA protection in both sites was observed as the substrates were truncated, first by deacylation, then by removal of the 3′-terminal adenosine, and finally by the removal of the 3′-terminal CA dinucleotide, suggesting interactions between specific substrate moieties and specific rRNA residues (7). Remarkably, all but one of the 23-S residues protected by P-site-bound aa-tRNA are also protected by short N-acyl-aminoacyl-oligoribonucleotide
FIG. 3. Domain V of E. coli 23-S rRNA. Pictured is the secondary-structure model for residues 2043-2625 of 23-S rRNA. The highly conserved central loop feature associated with peptidyltransferase function is indicated. [Modified from Gutell, Schnare and Gray (51); reproduced by permission of Oxford University Press.]
FIG. 5. Sites of photocross-linking of peptidyltransferase substrate analogs. Pictured is the central loop region of domain V of 23-S rRNA, and indicated are the universally conserved residues that have been covalently cross-linked by reaction with 3-(4′-benzoylphenyl)propionyl-Phe-tRNAPhe (BP-Phe-tRNA) from the A- and P-sites (56), by p-azidobenzoyl-Phe-tRNAPhe (ABG-Phe-tRNA) from the A- and P-sites (58), and by p-azidopuromycin (57).
fragments derived by RNase T1 digestion from the 3′ ends of either fMet-tRNAfMet, AcPhe-tRNAPhe, or AcLeu-tRNALeu (60). These fragments comprise precisely those portions of the peptidyl-tRNA substrates that constitute the minimal essential P-site substrate in the fragment reaction (34). Recent genetic experiments in E. coli support the proposal that many of the domain V residues detected in the above biochemical studies are important to the efficiency and specificity of peptidyl transfer. Utilizing the same selection that previously yielded mutants at position 74 of tRNAVal (frameshift suppression of the trpE91 gene; 41), a mutant was isolated containing a U-to-A change of 23-S rRNA residue 2555 (61). This residue, located in a loop capping a helix that extends from the central loop, was protected from chemical modification by A-site-bound tRNA (7) in a manner dependent on the presence of the 3′-terminal adenosine of the substrate (Fig. 6). In addition to effecting a −1 frameshift suppression of trpE91, both the U-to-A and the U-to-G changes at 2555 promoted −1 and +1 frameshift suppression and readthrough of all three nonsense codons in a lacZ reporter gene. Thus a genetic selection based solely on functional criteria in vivo implicated one of
FIG. 6. 23-S rRNA domain V residues protected by tRNA bound to the A- and P-sites. Protection of the indicated nucleotides from chemical modification is dependent on the presence in the bound substrate of the aminoacyl moiety (+), the 3′-terminal adenosine residue (O), the 3′-terminal CA residues (A), or the remainder of the tRNA molecule (m). The arrow next to residue A2602 in the P-site footprint indicates that the reactivity of this nucleotide is enhanced by aminoacyl-tRNA binding. All of the residues protected in the P-site by aminoacyl-tRNA, with the exception of G2505, are also protected from chemical modification by aminoacyl-oligonucleotide fragments derived from the 3′ end of fMet-tRNAMet, AcLeu-tRNALeu, or AcPhe-tRNAPhe by RNase T1 digestion (60). [Reproduced from Moazed and Noller (7); copyright by Cell Press.]
the very same 23-S rRNA residues detected in a structural analysis of an in vitro complex (7). Mutations of the universally conserved central loop residue G2583 have been constructed in E. coli by site-directed mutagenesis (62). This nucleotide is adjacent to U2584 and U2585, both of which were photocross-linked
by BP-Phe-tRNAPhe (56) and protected by P-site-bound tRNA (7) in a manner dependent on the presence of the 3′-terminal adenosine. Ribosomes containing mutations at position 2583 displayed increased levels of translational accuracy in both in vitro (62) and in vivo (63) experiments. Mutations of the highly conserved domain V residues G2252 and G2253, constructed by site-directed mutagenesis, decrease translational fidelity in vivo (64). These two nucleotides are protected from kethoxal modification when the P-site is occupied by an aminoacyl-oligonucleotide (60), by aminoacyl-tRNA, or by deacylated tRNA, but not by tRNA missing the 3′-terminal CA sequence (7) (Fig. 6). G2252 and G2253 are therefore candidates for direct interaction with the 3′ CCA end of peptidyl-tRNA, possibly through Watson-Crick base-pairing. This possibility was tested in vitro using ribosomes containing the double mutation G2252/G2253 to C2252/C2253 (40). Mutant ribosomes consistently displayed lower peptidyltransferase activity than did wild-type ribosomes, supporting a role for these 23-S rRNA residues in peptide-bond formation. However, aminoacylated, N-acetylated tRNAPhe with the 3′ sequence GGA (see Section II) did not compensate for any of the reduced activity of the mutant ribosomes. The mechanisms by which any of the above 23-S rRNA mutations perturb translational fidelity are not yet known. Because the ribosome is a processive and multifunctional enzyme, the effects of mutations may be pleiotropic, affecting multiple steps in the elongation cycle. Thus, although it cannot be excluded that the rRNA mutations alter interactions between the ribosome and elongation factors or release factors, it is intriguing that mutations in the 3′ CCA end of the aa-tRNA substrates, and in 23-S rRNA nucleotides with which they are predicted to interact, have the capacity to alter the level of accuracy (substrate discrimination) of peptidyltransferase.
These findings suggest that binding contacts between universally conserved residues of 23-S rRNA and tRNA, likely to be important for catalysis of peptidyl transfer, are also important for translational fidelity. Although a preponderance of evidence implicates domain V in peptide-bond formation, it is likely that domains II and IV of 23-S rRNA are functionally and proximally linked to the peptidyltransferase center. As mentioned above, a phylogenetic covariance occurs between the domain V central loop residue 2586 and the domain IV residue 1782 (52). P-site-bound AcPhe-tRNAPhe, derivatized with an azido group at N2 of the 3′-terminal adenosine, has been cross-linked to residue G1945 in domain IV (65), whereas three separate intrasubunit cross-links between residues in or near the central loop of domain V and domain IV were induced by UV irradiation (66). In addition, domain IV contains the rRNA binding site for ribosomal protein L2 (67), one of the few proteins found to be essential for reconstitution of peptidyltransferase activity (28).
Deletion mutations in a helix of domain II of E. coli 23-S rRNA confer resistance to erythromycin (68). Although this antibiotic does not inhibit the peptidyltransferase step directly, it does compete with chloramphenicol for binding to ribosomes (38); furthermore, erythromycin and chloramphenicol protect a partially overlapping set of central loop residues from chemical modification (55). All previously described rRNA mutations giving rise to erythromycin resistance were point mutations in the domain V central loop region (53). Finally, two UV-induced intrasubunit cross-links have been obtained between the central loop region and residues in domain II (66).
V. Prospective
The catalytic activity of protein-deficient T. aquaticus 50-S subunits (30) argues for the critical importance of rRNA in peptide-bond formation. Furthermore, the stepwise loss of 23-S rRNA protection as tRNA substrates are truncated from the 3′ end (7), taken together with the P-site protection by aa-tRNA fragments of virtually all of the 23-S residues protected by intact aa-tRNA (60) and the efficiency of peptidyl transfer with minimal substrate fragments (34), provides compelling support for the proposal that the direct recognition of the 3′ CCA end of aa-tRNA substrates by a specific region and specific residues of 23-S rRNA constitutes the binding interactions essential for ribosome-catalyzed peptide-bond formation. There is a striking contrast between the requirement for the CCA sequence of the tRNA portion of the P-site substrate in peptide-bond formation, and the permissiveness of the peptidyltransferase active site with regard to some of the chemical and structural features of the aminoacyl moiety, including the functional groups directly participating in reaction chemistry (42-44, 46-50). Such permissiveness may be a useful feature for an enzyme required to recognize any combination of 2 out of 20 different amino acids during each round of catalysis, and may derive directly from the contribution of rRNA to peptidyltransferase function. It has been suggested that the ribosome, as an enzyme that “runs on” rRNA, might have a type of flexibility not available to protein enzymes (69), because the specificity of interactions between RNA molecules can be dictated by base-pairing. A role for rRNA secondary structure in the positioning of elements critical for catalysis can be imagined, whereby the folding of a secondary structural domain or subdomain may dictate the formation of an essential rRNA tertiary structure.
Such a structure might have a degree of independence or modularity, and perhaps even some flexibility in its precise geometric relationship to the rest of the 50-S particle (69). Furthermore, formation of the catalytically competent structure may be dependent on an intermolecular nucleic acid interaction, between rRNA and the 3′ CCA ends of the aa-tRNA substrates. This leads directly to the question of whether the secondary structural features of base-pairing and helix formation can be extended to describe the intermolecular process of substrate recognition between the ribosome and its aa-tRNA substrates. Minimally, two parts of aa-tRNA molecules, the anticodon and the 3′ CCA end, must interact with the mRNA-programmed ribosome to achieve peptide-bond formation with accurate decoding. In fact, a fragment of tRNAPhe consisting of only the anticodon stem-loop protects all of the same 16-S rRNA nucleotides from chemical attack in the P-site as does intact tRNAPhe (70), whereas an aa-tRNA fragment containing the 3′ CCA end protects all but one of the 23-S residues protected by the intact aa-tRNA substrate (60). A fundamental feature of translation of the genetic code is base-pairing between the anticodon and the ribosome-bound mRNA. Although the logic of a model predicting active site flexibility conferred by RNA base-pairing interactions would be neatly satisfied by the invocation of rRNA·tRNA base-pairing as a mode of recognition for the 3′ CCA end, this question remains unresolved. Experiments using an aa-tRNA substrate with a 3′ GGA end sequence detected no evidence for canonical base-pairing between 23-S rRNA residues G2252 and G2253 and peptidyl-tRNA in the P-site (40). Also, results of an earlier study using aminoacyl-oligoribonucleotides with base substitutions in the 3′ CCA sequence in in vitro peptide-bond formation (with wild-type ribosomes) were inconsistent with Watson-Crick base-pairing in the A-site (71). Regardless of whether the 3′ CCA ends of aa-tRNA substrates bind to 23-S rRNA through canonical base-pairing, there is a more critical question in understanding peptide-bond formation: What is the nature of an rRNA structure capable of catalyzing peptidyl transfer?
The paradigm of RNA catalysis provided by present-day catalytic RNAs may be instructive in understanding how rRNA might achieve the catalysis of peptidyl transfer. The details emerging from structural and functional analyses of the RNA molecules that catalyze the making and breaking of phosphodiester bonds indicate that RNA enzymes use noncovalent binding interactions, including Watson-Crick base-pairing (72), recognition of helix length and structure (73), noncanonical base interactions (74, 75), and base and coaxial helical stacking (76) to form the intramolecular and intermolecular higher order structures necessary for catalysis by approximation. Critically positioned divalent cations are essential for both RNA tertiary structure and the chemistry of RNA catalysis (77-79, and references therein). This significantly expands the chemical repertoire available to RNA molecules by providing a source of electrophilic stabilization likely to be necessary for catalysis of reactions proceeding through negatively charged transition states, such as phosphoryl and acyl transfers. In this regard, it is important to note that the peptidyltransferase activity of protein-depleted T. aquaticus 50-S subunits was abolished by the addition of EDTA (30), consistent with a requirement for Mg2+ ions in catalytic function. The tRNA footprinting experiments (7, 60), in particular, provide structural correlates of functional states of substrate binding. However, protection of rRNA residues from chemical modification may arise from direct rRNA·tRNA contacts or from conformational alterations in 50-S subunit structure induced by tRNA binding. This limitation to interpretation is largely due to the paucity of information regarding the tertiary structure of 23-S rRNA. Development of a physical model of the peptidyltransferase active site will require a detailed understanding of the higher order folding of the rRNA in the central loop region of domain V, and of how this structure interacts with aa-tRNA substrates. Spatial relationships between elements of 23-S rRNA implicated in functional studies must be established, and both genetic and biochemical approaches might be fruitfully applied to this endeavor. Understanding peptide-bond formation in the context of the dynamic process of translation will require an integration of the local structure of the peptidyltransferase domain within the framework of the quaternary structure of the 50-S particle. The development of a three-dimensional structural model for a particle as complex as the 50-S subunit is a daunting challenge. Models of the quaternary structure of the 30-S subunit that integrate the pathway and folding of 16-S rRNA within the spatial arrangement of small subunit proteins have been proposed (80, 81).
They incorporate mapping of ribosomal proteins by immune electron microscopy (82, 83) and neutron scattering (84), combined with the localization of ribosomal protein-binding sites on 16-S rRNA (85) and the identification of intrasubunit cross-links (80). Although some of the same experimental approaches have been applied to some components of the 50-S subunit (86-89), the extent of our knowledge of the higher order structure of this particle is much less advanced. As this structural knowledge continues to expand, with the important goal of high-resolution crystal structures of ribosomes now an imminent possibility (90), it should become possible to propose increasingly sophisticated and detailed models for the mechanism of ribosome-catalyzed peptide-bond formation, and for the integration of that mechanism in the elongation cycle of protein synthesis. Application of the elegant systems currently being developed and refined for the functional analysis of the elongation process (5, 6, 10, 11, and references therein) will allow the meaningful testing of such models, using both mutant and wild-type ribosomes. Hence the prospect of understanding peptidyl transfer, catalyzed by this fascinating and
multifunctional enzyme that provides the fundamental interface between genotype and phenotype, is opening before us.
ACKNOWLEDGMENTS
We thank all of the members of the Dahlberg laboratory, and Rachel Green, Raymond Samaha, Harry Noller, and George Q. Pennable for helpful discussions. We also thank Samuel Beale for suggesting that we write this article, and Harry Noller for critical review of the manuscript. Work from the authors’ laboratory was supported by a grant from the National Institutes of Health (GM19756) to A.E.D.
REFERENCES
1. B. E. H. Maden, R. R. Traut and R. E. Monro, JMB 35, 333 (1968).
2. C. R. Woese and N. R. Pace, in “The RNA World” (R. F. Gesteland and J. F. Atkins, eds.), p. 91. CSHLab, Cold Spring Harbor, New York, 1993.
3. P. H. Van Knippenberg, in “The Ribosome: Structure, Function and Evolution” (W. E. Hill, A. E. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 265. American Society for Microbiology, Washington, D.C., 1990.
4. K. H. Nierhaus, Bchem 29, 4997 (1990).
5. H.-J. Rheinberger, U. Geigenmüller, A. Gnirke, T.-P. Hausner, J. Remme, H. Saruyama and K. H. Nierhaus, in “The Ribosome: Structure, Function and Evolution” (W. E. Hill, A. E. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 318. American Society for Microbiology, Washington, D.C., 1990.
6. W. Wintermeyer, R. Lill and J. M. Robertson, in “The Ribosome: Structure, Function and Evolution” (W. E. Hill, A. E. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 348. American Society for Microbiology, Washington, D.C., 1990.
7. D. Moazed and H. F. Noller, Cell 57, 585 (1989).
8. R. Lill, J. M. Robertson and W. Wintermeyer, Bchem 25, 3245 (1986).
9. S. Schilling-Bartetzko, F. Franceschi, H. Sternbach and K. H. Nierhaus, JBC 267, 4693 (1992).
10. R. C. Thompson, TIBS 13, 91 (1988).
11. C. G. Kurland, F. Jørgensen, A. Richter, M. Ehrenberg, N. Bilgin and A.-M. Rojas, in “The Ribosome: Structure, Function and Evolution” (W. E. Hill, A. E. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 513. American Society for Microbiology, Washington, D.C., 1990.
12. M. Ehrenberg, A.-M. Rojas, I. Diaz, N. Bilgin, J. Weiser, F. Claesens and C. G. Kurland, in “The Ribosome: Structure, Function and Evolution” (W. E. Hill, A. E. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 373. American Society for Microbiology, Washington, D.C., 1990.
13. M. Ehrenberg, A.-M. Rojas, J. Weiser and C. G. Kurland, JMB 211, 739 (1990).
14. A. Weijland and A. Parmeggiani, Science 259, 1311 (1993).
15. D. Moazed and H. F. Noller, Nature 342, 142 (1989).
16. B. Hardesty, O. W. Odom and J. Czworkowski, in “The Ribosome: Structure, Function and Evolution” (W. E. Hill, A. E. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 366. American Society for Microbiology, Washington, D.C., 1990.
17. Z. Vogel, A. Zamir and D. Elson, Bchem 8, 5161 (1969).
18. D. Nathans, PNAS 51, 585 (1964).
19. R. R. Traut and R. E. Monro, JMB 10, 63 (1964).
20. I. Rychlik, BBA 114, 425 (1966).
21. M. A. Gottesman, JBC 242, 5564 (1967).
22. A. Zamir, P. Leder and D. Elson, PNAS 56, 1794 (1966).
23. R. E. Monro, JMB 26, 147 (1967).
24. E. R. Dabbs, J. Bact. 140, 734 (1979).
25. F. Dohme and K. H. Nierhaus, JMB 107, 585 (1976).
26. M. Herold and K. H. Nierhaus, JBC 262, 8826 (1987).
27. H. Hampl, H. Schulze and K. H. Nierhaus, JBC 256, 2284 (1981).
28. H. Schulze and K. H. Nierhaus, EMBO J. 1, 609 (1982).
29. H. F. Noller, J. Bact. 175, 5297 (1993).
30. H. F. Noller, V. Hoffarth and L. Zimniak, Science 256, 1416 (1992).
31. M. P. Deutscher, This Series 39, 209 (1990).
32. M. P. Deutscher, in “Enzymes of Nucleic Acid Synthesis and Modification,” Vol. 2, p. 159. CRC Press, Boca Raton, Florida, 1983.
33. R. E. Monro and K. A. Marcker, JMB 25, 247 (1967).
34. R. E. Monro, J. Černá and K. A. Marcker, PNAS 61, 1042 (1968).
35. R. E. Monro, T. Staehelin, M. L. Celma and D. Vazquez, CSHSQB 34, 357 (1969).
36. J. Černá, I. Rychlik, A. A. Krayevsky and B. P. Gottikh, FEBS Lett. 37, 188 (1973).
37. J. Černá, FEBS Lett. 58, 94 (1975).
38. M. L. Celma, R. E. Monro and D. Vazquez, FEBS Lett. 6, 273 (1970).
39. R. M. Sundari, H. Pelka and L. H. Schulman, JBC 252, 3941 (1977).
40. K. R. Lieberman and A. E. Dahlberg, JBC 269, 16163 (1994).
41. M. O’Connor, N. M. Wills, L. Bossi, R. F. Gesteland and J. F. Atkins, EMBO J. 12, 2559 (1993).
42. C. T. Caskey, A. L. Beaudet, E. M. Scolnick and M. Rosman, PNAS 68, 3163 (1971).
43. E. M. Scolnick, G. Milman, M. Rosman and T. Caskey, Nature 225, 152 (1970).
44. S. Fahnestock, H. Neumann, V. Shashoua and A. Rich, Bchem 9, 2577 (1970).
45. A. Fersht, “Enzyme Structure and Mechanism” (Second Ed.), p. 405. W. H. Freeman and Company, New York, 1985.
46. S. Fahnestock and A. Rich, Science 173, 340 (1971).
47. J. Gooch and A. O. Hawtrey, BJ 149, 209 (1975).
48. L. S. Victorova, V. V. Kotusov, A. V. Ashaev, A. A. Krayevsky, M. K. Kukhanova and B. P. Gottikh, FEBS Lett. 68, 215 (1976).
49. N. B. Tarussova, G. M. Jacovleva, L. S. Victorova, M. K. Kukhanova and R. M. Khomutov, FEBS Lett. 130, 85 (1981).
50. J. R. Roesser, M. S. Chorghade and S. M. Hecht, Bchem 25, 6361 (1986).
51. R. R. Gutell, M. N. Schnare and M. W. Gray, NARes 18, 2319 (1990).
52. R. R. Gutell and C. R. Woese, PNAS 87, 663 (1990).
53. B. Vester and R. A. Garrett, EMBO J. 7, 3577 (1988).
54. S. Douthwaite, J. Bact. 174, 1333 (1992).
55. D. Moazed and H. F. Noller, Biochimie 69, 879 (1987).
56. G. Steiner, E. Kuechler and A. Barta, EMBO J. 7, 3949 (1988).
57. C. C. Hall, D. Johnson and B. S. Cooperman, Bchem 27, 3983 (1988).
58. P. Mitchell, K. Stade, M. Osswald and R. Brimacombe, NARes 21, 887 (1993).
59. I. G. Leviev, C. Rodriguez-Fonseca, H. Phan, R. A. Garrett, G. Heilek, H. F. Noller and A. S. Mankin, EMBO J. 13, 1682 (1994).
60. D. Moazed and H. F. Noller, PNAS 88, 3725 (1991).
61. M. O’Connor and A. E. Dahlberg, PNAS 90, 9214 (1993).
62. U. Saarma and J. Remme, NARes 20, 3147 (1992).
63. U. Saarma, B. T. U. Lewicki, T. Margus, S. Nigul and J. Remme, in “The Translational Apparatus: Structure, Function, Regulation, Evolution” (K. H. Nierhaus, F. Franceschi and A. R. Subramanian, eds.), p. 163. Plenum, New York, 1993.
64. S. T. Gregory, K. R. Lieberman and A. E. Dahlberg, NARes 22, 279 (1994).
65. J. Wower, S. S. Hixson and R. A. Zimmerman, PNAS 86, 5232 (1989).
66. P. Mitchell, M. Osswald, D. Schueler and R. Brimacombe, NARes 18, 4325 (1990).
67. A. A. D. Beauclerk and E. Cundliffe, EMBO J. 7, 3589 (1988).
68. S. Douthwaite, T. Powers, J. Y. Lee and H. F. Noller, JMB 209, 655 (1989).
69. P. B. Moore, CSHSQB 52, 721 (1987).
70. D. Moazed and H. F. Noller, Cell 47, 985 (1986).
71. M. Tezuka and S. Chládek, Bchem 29, 667 (1990).
72. R. B. Waring, P. Towner, S. J. Minter and R. W. Davies, Nature 321, 133 (1986).
73. J. A. Doudna, B. P. Cormack and J. W. Szostak, PNAS 86, 7402 (1989).
74. F. Michel, A. D. Ellington, S. Couture and J. W. Szostak, Nature 347, 578 (1990).
75. F. Michel, M. Hanna, R. Green, D. P. Bartel and J. W. Szostak, Nature 342, 391 (1989).
76. F. Michel and E. Westhof, JMB 216, 585 (1990).
77. D. Smith and N. R. Pace, Bchem 32, 5273 (1993).
78. T. Pan, D. M. Long and O. C. Uhlenbeck, in “The RNA World” (R. F. Gesteland and J. F. Atkins, eds.), p. 271. CSHLab, Cold Spring Harbor, New York, 1993.
79. T. A. Steitz and J. A. Steitz, PNAS 90, 6498 (1993).
80. R. Brimacombe, J. Atmadja, W. Stiege and D. Schüler, JMB 199, 115 (1988).
81. S. Stern, B. Weiser and H. F. Noller, JMB 204, 447 (1988).
82. G. Stöffler and M. Stöffler-Meilicke, in “Structure, Function and Genetics of Ribosomes” (B. Hardesty and G. Kramer, eds.), p. 28. Springer-Verlag, Berlin and New York, 1986.
83. M. I. Oakes, A. Scheinman, T. Atha, G. Shankweiler and J. A. Lake, in “The Ribosome: Structure, Function and Evolution” (W. E. Hill, A. E. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 180. American Society for Microbiology, Washington, D.C., 1990.
84. M. S. Capel, D. M. Engelman, B. R. Freeborn, M. Kjeldgaard, J. A. Langer, V. Ramakrishnan, D. J. Schindler, D. K. Schneider, B. P. Schoenborn, L-Y. Sillers, S. Yabuki and P. B. Moore, Science 238, 1403 (1987).
85. S. Stern, T. Powers, L.-M. Changchien and H. F. Noller, Science 244, 783 (1989).
86. J. Walleczek, D. Schüler, M. Stöffler-Meilicke, R. Brimacombe and G. Stöffler, EMBO J. 11, 3571 (1988).
87. V. Nowotny, R. P. May and K. H. Nierhaus, in “Structure, Function and Genetics of Ribosomes” (B. Hardesty and G. Kramer, eds.), p. 101. Springer-Verlag, Berlin and New York, 1986.
88. R. R. Traut, D. S. Tewari, A. Sommer, G. R. Gavino, H. M. Olson and D. G. Glitz, in “Structure, Function and Genetics of Ribosomes” (B. Hardesty and G. Kramer, eds.), p. 286. Springer-Verlag, Berlin and New York, 1986.
89. M. Oakes, A. Henderson, M. Scheinman, M. Clark and J. A. Lake, in “Structure, Function and Genetics of Ribosomes” (B. Hardesty and G. Kramer, eds.), p. 47. Springer-Verlag, Berlin and New York, 1986.
90. F. Franceschi, S. Weinstein, U. Evers, E. Arndt, W. Jahn, H. A. S. Hansen, K. von Böhlen, Z. Berkovitch-Yellin, M. Eisenstein, I. Agmon, J. Thygesen, N. Volkmann, H. Bartels, F. Schlünzen, A. Zaytzev-Bashan, R. Sharon, I. Levin, A. Dribin, I. Sagi, T. Choli-Papadopoulou, P. Tsiboli, G. Kryger, W. S. Bennett and A. Yonath, in “The Translational Apparatus: Structure, Function, Regulation, Evolution” (K. H. Nierhaus, F. Franceschi and A. R. Subramanian, eds.), p. 397. Plenum, New York, 1993.
91. D. M. Engelman and P. B. Moore, Sci. Am. 235, 44 (1976).
92. H. F. Noller, in “The RNA World” (R. F. Gesteland and J. F. Atkins, eds.), p. 137. CSHLab, Cold Spring Harbor, New York, 1993.
Promotion and Regulation of Ribosomal Transcription in Eukaryotes by RNA Polymerase I¹
TOM MOSS² AND VICTOR Y. STEFANOVSKY
Cancer Research Centre and Department of Biochemistry, Laval University, Hôtel-Dieu de Québec, Québec, Canada G1R 2J6
I. General Aspects of Ribosomal Gene Regulation
II. Ribosomal Gene Organization and Evolution
III. RNA Polymerase I Promoters
IV. The Basal Transcription Factors
A. The TBP,-Complex and UBF
B. The Polymerase
V. Activation at the Ribosomal Promoter
VI. Enhancement
VII. Mechanisms of Regulation
VIII. In Conclusion
References
¹ A list of abbreviations appears on page 58.
² To whom correspondence should be directed.
Progress in Nucleic Acid Research and Molecular Biology, Vol. 50. Copyright © 1995 by Academic Press, Inc. All rights of reproduction in any form reserved.

The ribosome is a large macromolecular complex essential for gene translation and protein synthesis. In eukaryotes it consists of four or five ribosomal RNAs (rRNAs) associated with about 80 ribosomal proteins (r-proteins) in two distinct subunits, simply called the large and small ribosome subunits. The so-called ribosomal genes code for the three larger rRNAs, that is, the 18-S rRNA of the small ribosome subunit and the 5.8-S and 28-S rRNAs of the large subunit. The ribosomal genes are transcribed into a single polycistronic transcript, from which the mature rRNAs are cleaved. In prokaryotes, the 5-S rRNA also forms part of this transcript, but in eukaryotes the production of this rRNA has become the role of a distinct gene.

In eukaryotes, ribosomal gene transcription uses a dedicated set of transcription factors and a specialized RNA polymerase, the DNA-dependent RNA polymerase I (RPOI). The resultant precursor-rRNA (pre-rRNA) is neither capped nor polyadenylated and is produced in the nucleolus, a large nuclear structure visible in the light microscope. By the 1930s, it was evident that the nucleolus forms from the secondary constrictions of specific metaphase chromosomes; hence these chromosomal loci were named nucleolar organizers (1). But it was not until the 1960s that work, predominantly with Xenopus and Drosophila, showed that these same loci contain the ribosomal genes (2). We now know that the existence of a nucleolus is the direct result of ribosomal gene transcription (3). Once formed, the nucleolus becomes the target for the transport of newly synthesized r-proteins and ribosomal small nuclear ribonucleoproteins (snRNPs). The clearly visible "granular" portion of the nucleolus consists, in fact, of partially assembled ribosomes.

The ribosomal genes are repeated in animals from one to several hundred times per haploid genome, whereas in plants they may be repeated several thousand times. These ribosomal gene repeats are distributed among several chromosomes, each chromosomal locus constituting a distinct nucleolar organizer.

Ribosomal gene transcription accounts for about 40% of all cellular transcription, and 80% of the RNA content of living cells. It is therefore a major undertaking for any cell. Experiments with yeast show that ribosomal gene transcription is limiting for vegetative growth, and a minimum number of genes is essential for survival (e.g., 4, 5). The ribosomal genes of amphibians are often amplified in the germ line (2). This is probably necessary to permit the production of sufficient ribosomes for the large, rapidly developing amphibian embryo. Other organisms may use nurse cells or follicle cells to perform similar tasks.
Drosophila bobbed mutants develop a shortened body as a direct consequence of a low ribosomal gene copy-number. The severity of the bobbed phenotype depends directly on the active gene copy-number, the mutation becoming severe around 130 ribosomal gene copies per haploid genome and lethal by 20 (6). These observations clearly indicate that ribosomal transcription is a limiting process in cell proliferation and during development. Because the hundred or so ribosomal genes, typically constituting only about 1% of the gene pool, account for 30 to 40% of the total cellular transcription (7), each ribosomal gene has clearly evolved to sustain a high level of transcription. It first became clear only about 10 years ago that this high level of transcription is made possible by a complex, multifunctional, and rapidly evolving array of enhancer sequences (8, see also 9, 10). In this review, we attempt to survey present knowledge of the mechanisms of eukaryotic ribosomal transcription and its regulation. Ribosomal transcription has been studied in an exceptionally wide range of organisms.
It is therefore often difficult to do justice to the full range of literature available and at the same time to present a coherent view of the subject. We have therefore been somewhat selective in our choice of data and take this opportunity to apologize to those whose work may appear to have been overlooked. The reader will probably find several earlier reviews helpful. Apart from our own previous review, which contains, to our knowledge, the only compendium of rDNA control sequences (9), there is an excellent resource text on all aspects of ribosome biogenesis (7) and many more specialized reviews concerned with ribosomal transcription (9-20), gene structure and evolution (21-23), growth regulation (14, 17), and ribosomal protein synthesis (14, 17, 24, 25).
I. General Aspects of Ribosomal Gene Regulation
The ribosomal gene products are, to all intents and purposes, stable structural RNAs³ (e.g., 26). Their concentrations can therefore be controlled only at two levels, the transcription rate and the rate of dilution through cell division. The rates of production of the other ribosome components, the r-proteins and the 5-S rRNA, must at the same time be coordinately regulated with rRNA production. However, it has been found that the regulation of the r-protein genes is often posttranscriptional, in some cases involving mRNA processing/turnover and in others simply the degradation of excess r-protein (14, 24, 25, 27, 28). The key element in deciding the rate of ribosome production is rRNA transcription. Not surprisingly then, the ribosomal genes are regulated in a growth-rate and hence often in a differentiation-dependent manner (14, 17), responding to many external stimuli. Serum deprivation, cell culture density, glucocorticoids, and differentiation in various forms rapidly down-regulate the rate of rRNA transcription, whereas stimulators of cell growth and division, such as insulin and phorbol esters, do the contrary (24, 29-47). Unlike other cellular transcription, rRNA transcription is also controlled during the cell cycle, ceasing at each mitosis, with the concomitant disappearance of the nucleolus (48, 49). It is further controlled during early amphibian development in a manner distinct from that of mRNA and 5-S rRNA transcription. Ribosomal transcription does not commence until the gastrula stage of amphibian development, some 10 hours after the midblastula transition, when other transcription begins (50-52).

³ Ribosomal RNA turnover occurs on the order of days to weeks. Some data show changes in the t1/2 of rRNAs between several days and several weeks with the degree of cell transformation.

The growth regulation of ribosomal transcription results in various visible changes in the nucleoli. These changes are clear markers of cellular growth potential and have been used successfully in the diagnosis of various forms of human cancers as well as in their prognosis (53). Why should it be necessary to regulate ribosomal transcription in such a responsive manner? For prokaryotes and lower eukaryotes, the answer is almost certainly to achieve an ecologically efficient growth rate. Regulation is necessary essentially to conserve energy. However, this is probably not a satisfactory, or at least not a complete, answer for more complex organisms. Several observations suggest that the number of cellular ribosomes is a determinant of potential growth rate. As mentioned above, ribosomes are often stored in the germ cells of multicellular organisms to permit the rapid development of offspring, and the number of ribosomal genes is crucial for both normal development and rapid vegetative growth. Transcription of rRNA is therefore a limiting process in cell growth and organism development. By responding to growth rate changes, ribosomal transcription regulates ribosome production, and in so doing must determine the potential for cellular proliferation. Logically, this places ribosomal transcription in the position of a regulator of somatic cell growth, one that could counterbalance and even limit the long-term effects of mitogenic stimuli. It is then quite reasonable to propose that ribosomal transcription may not be simply responsive to growth rate, but may in fact be used as a means to check unwanted cell proliferation. Overproduction of the translation initiation factor eIF-4 has already been shown to induce short-term transformation of mammalian cells in culture (54). Because, in the long term, translation potential depends on ribosome concentration, deregulation of ribosomal transcription must be a key step and possibly a determinant in neoplastic transformation.
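The statement at the start of this section, that the level of a stable rRNA can change only through its synthesis rate and through dilution by cell division, can be illustrated numerically. The following is a minimal Python sketch and is not taken from the original text. It assumes first-order dilution at a rate of ln 2 divided by the doubling time, an optional first-order decay term, and purely hypothetical units (rRNA copies per cell per hour). The function and parameter names are illustrative only.

```python
import math


def steady_state_rrna(synthesis_rate, doubling_time_h, half_life_h=math.inf):
    """Steady-state rRNA copies per cell under a simple balance model.

    Assumed model (not from the source text):
        dR/dt = synthesis_rate - (dilution + decay) * R
    where dilution = ln2 / doubling_time_h and decay = ln2 / half_life_h.
    The default half-life is infinite, i.e. a fully stable RNA, so that
    dilution through cell division is the only loss term.
    """
    dilution = math.log(2) / doubling_time_h
    decay = math.log(2) / half_life_h  # evaluates to 0.0 for an infinite half-life
    return synthesis_rate / (dilution + decay)


# Hypothetical numbers, chosen only to show how the level scales.
fast = steady_state_rrna(synthesis_rate=100.0, doubling_time_h=24.0)
slow = steady_state_rrna(synthesis_rate=100.0, doubling_time_h=48.0)
print(f"24 h doubling: {fast:.0f} copies; 48 h doubling: {slow:.0f} copies")
```

Under these assumptions, doubling the doubling time at a fixed synthesis rate doubles the steady-state level, which is consistent with the observation that slowly growing cells down-regulate rRNA transcription.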
II. Ribosomal Gene Organization and Evolution

The ribosomal genes are commonly, though not exclusively, found in tandem chromosomal arrays, the so-called ribosomal DNA (rDNA) of each nucleolar organizer (23) (Fig. 1a). Metazoan genomes generally contain several hundred ribosomal gene copies (2), whereas in plants this number may be 10 times higher (55). In exceptional cases, the rDNA may not be arranged in a simple tandem repeat, but as an inverted repeat (56) (Fig. 1a). Amphibia (2, 57), insects (2), and fungi (58) also amplify their rDNAs during meiosis. In amphibia, this amplified rDNA takes the form of episomal rings of around 10 gene copies each. Slime molds and protozoa often have only one chromosomal ribosomal gene copy, which is amplified as extrachromosomal inverted repeats during somatic growth (59; Fig. 1a). These large palindromic rDNA fragments replicate autonomously and associate within a single nucleolus.

FIG. 1. (a) The various recognized organizations of ribosomal gene transcription units within the chromosome (i and ii) and as extrachromosomal units (iii). The genes are shown as simulated Miller spreads (297, 298), the lateral branches being the growing pre-rRNAs. (b) The organization of genes and spacers within the ribosomal gene repeats. The repeat lengths from three organisms are shown to scale. The genes are in black and the 45-, 40-, or 37-S pre-rRNA transcripts are indicated by arrows.

The rDNA repeats of different organisms vary greatly in their lengths. For example, those of mammals are as large as 44 kbp, whereas that of the yeast Saccharomyces cerevisiae is just under 10 kbp. Less extreme length heterogeneities are also observed within the genome of each individual organism. These length heterogeneities are due to changes both within and between the structural genes. Each rDNA repeat contains the three (or four) rRNA genes in the order 18-S, 5.8-S, and 28-S (or 28-S-1 and 28-S-2), 5' to 3' in the direction of transcription. All three genes are transcribed as part of a
single, polycistronic precursor, variously called the 37-S (35-S), 40-S, or 45-S pre-rRNA in yeast, amphibia/insects, and mammals, respectively. The genes are preceded by a 5’-external transcribed spacer (5’-ETS), separated by internal transcribed spacers, ITS-1 and ITS-2, and followed by a short 3’-ETS (Fig. 1b). Because the genes code for structural and probably enzymatically active RNAs (60, 61), they have been highly conserved during evolution. Despite this, the 18- and 28-S genes do vary in length, tending to be longer in higher eukaryotes and especially so in mammals. This length heterogeneity is restricted to specific sites within each gene, the so-called expansion segments (62), which may represent RNA parallels of variable-length loops found in some protein folding domains. A more important length heterogeneity occurs as a result of increased gene spacing, i.e., the lengths of ITS-1 and ITS-2, and of an increase in the length of the ETS. However, by far the greatest length heterogeneity occurs within the intergenic spacer (IGS)⁴ (59) (Fig. 1b). The IGS varies in length from about 2 kbp in S. cerevisiae to about 21 kbp in mammals, and with very few exceptions it contains significant regions of internal repetition. These repetitive regions account for the very common IGS length heterogeneities or polymorphisms observed within the genomes of individual organisms and between individuals of a given species. The organization and sequence of the IGS vary enormously between species, sequence homologies being evident only between closely related organisms (Fig. 2). Hence, the various ribosomal IGSs are clearly the result of many independent evolutionary events. Naturally, this apparent lack of evolutionary constraint led to general acceptance of the concept of the IGS as “junk DNA.” Our work in the early 1980s (8) began a flood of publications that have forced us to discard this concept of the IGS (Sections III and V).
We now know that basal transcription of all known rDNAs requires sequences lying close to and upstream of the pre-rRNA initiation site. In vertebrates, the basal promoter is closely preceded by a terminator element that is also implicated in promotion. Further upstream, one or more arrays of repeated sequences, usually of variable length, are found to have enhancer activity (Fig. 2).

⁴ Formerly, this region was referred to as the nontranscribed spacer (NTS). However, because much of the region was found to be transcribed, this name became very misleading. The IGS has become the preferred name defining the region between the 5’ and 3’ ends of the semistable, preribosomal RNA polycistronic transcript.

FIG. 2. Sequence homologies and functional elements within the ribosomal IGSs of various organisms. Homologous sequences within a given IGS are indicated by shading. For Xenopus species, this shading also indicates interspecies homologies. The arrows indicate active RPOI promoters. Terminators (T) are indicated. It should be noted that in Xenopus laevis, T2 functions only as a processing site, whereas in Xenopus borealis it is also an active terminator. T3 is an active terminator in both species. In all cases the enhancers, spacer promoters, and other repetitive sequence elements are repeated a variable number of times within the IGSs of a given species, subpopulation, and even individual. The diagrams therefore depict IGS sequence organizations that have been found in given molecular clones, not definitive examples of IGS structure. The data are taken from a previous structural analysis (9).

A very large number of IGS sequences have now been determined. Some of these were analyzed and compared in our previous review (9). A selection of other sequences can be found for the following groups: mammals (63-67) (a complete sequence of human IGS will also soon be available from the Sylvester laboratory), insects (68-72), plants (55, 73-87), nematode (88), Trypanosoma (89, 90), fungi (91, 92), molds (93, 94), and yeasts (95-97).

If we look at the best-studied examples of IGS organization, we see that probably all the IGSs, but in particular the enhancer repeats, have been generated by the repeated amplification of the gene promoter or of a subpromoter element (9, 68, 70, 98-100, 102) (Fig. 2). In Xenopus, data from three different species show that although the same sequences have been amplified, their organization within the spacer varies considerably. Essentially the same is true in Drosophila species. However, in Drosophila melanogaster and Drosophila virilis, for example, the IGSs have clearly evolved from independent amplification events. Despite this, the same functional element, the gene promoter, has been chosen for amplification in each species (9, 70). Hence, the enhancer repeats of the IGS probably represent an extreme example of convergent evolution toward a common mechanism.

Given the apparent continual evolution of the IGS, how are the ribosomal gene arrays within a given organism maintained homogeneous? To some extent, they are not. The number of repeat elements in the IGS of a given species is very variable, both within the population and within the genome of each individual. This length variability in a population is probably explained by unequal crossover (103-106). Unequal crossover will lead to the homogenization of IGS sequences and necessarily to a degree of IGS length heterogeneity and rDNA copy-number variation. Conforming to this view, foreign sequences introduced into the yeast rDNA locus are rapidly deleted or expanded into adjoining rDNA repeats (107). The coevolution of multiple rDNA loci can also be explained in terms of unequal crossover between chromosomes (e.g., see 108, 108a). However, another mechanism may also be at work.
In yeast, the inactivation of topoisomerase I leads to the nonlethal accumulation of autonomously replicating, extrachromosomal rDNA, excised from the chromosome (108b, 108c). Reactivation of the topoisomerase I causes the rapid reintegration of this rDNA at the chromosomal rDNA locus. These data suggest that ribosomal DNA sequences may be maintained homogeneous by a continual process of excision, amplification, and reinsertion. This process distantly resembles the now discarded master-slave hypothesis (109-111), the master gene(s) in this case being chosen randomly. Excision, amplification, and reintegration of the rDNA could result in the rapid propagation of mutations throughout all the gene repeats of multiple nucleolar organizers. Both unequal crossover and excision-integration mechanisms of rDNA homogenization would permit the transient coexistence of mutant and wild-type genes within a single genome. Viable compensatory mutations within the transcription machinery can then be selected and fixed within a population, a process called “molecular coevolution” or “molecular drive” (108). Thus, the repetitive nature of the
ribosomal genes, along with the use of a dedicated ribosomal transcription machinery, permits rapid evolutionary change, inevitably leading to species specificities. Molecular coevolution of the IGS and the ribosomal transcription machinery has resulted in a high degree of incompatibility between the transcription machineries of different organisms. In general, ribosomal promoters show little sequence homology and are functionally incompatible between orders of organisms (Fig. 3 and Section III; see also 9). For example, mouse and rat promoters are quite interchangeable, but do not function in human cells (32, 112-114). By contrast, mammalian and Xenopus enhancers are interchangeable (115) and even plant enhancers function in Xenopus (116), all despite a complete lack of significant DNA sequence homology (Section VI). These and many other observations (Sections IV-VII) suggest that despite the enormous range of IGS structure and sequence that exists in eukaryotes, the underlying mechanisms of ribosomal transcription are common. A very striking observation provides strong support for this hypothesis. Despite the lack of sequence homology between the mouse and Xenopus ribosomal promoters, the Xenopus promoter can be persuaded to function very efficiently in the mouse by the simple means of a 5-bp insertion between its two major promoter elements (117) (see Section III).
III. RNA Polymerase I Promoters

Figure 3b shows a small selection of RPOI promoters from diverse organisms⁵. It can be seen that little sequence homology is apparent even between promoters from relatively closely related organisms, e.g., D. melanogaster and D. virilis or humans and mice. This made early promoter mapping difficult. Luckily, duplicate active promoters were found within the IGS of Xenopus and Drosophila. This allowed preliminary promoter boundaries to be very profitably judged from the clear homologies between the gene and spacer promoters (9).

⁵ In a previous review, we presented an extensive alignment and analysis of RPOI promoters and their control sequences derived from mammals, amphibia, diptera, and yeasts (9). To our knowledge, that work still constitutes the only such rDNA sequence analysis available.

The RPOI promoter has now been studied by deletion (118-126), linker scanning (127-133), and point mutation (134-139) in mammals, amphibia, diptera, protozoa, and yeast. What emerges from these studies is a view of the promoter as two essential and specifically spaced sequences (Fig. 3a). The two-element model of the RPOI promoter has a core promoter proximal to the initiation site and an upstream promoter
element (UPE), which has been given various names in the literature, e.g., upstream control element (UCE)⁶. Spacing of the UPE and core elements appears crucial in all situations in which the UPE is an essential promoter element. Data to this effect are now available from studies with rats, Xenopus, and yeast (128, 130, 131). In each case, insertions or deletions of about half a DNA duplex turn are very deleterious, whereas insertions of about a whole turn are much less so. As demonstrated by the “Xenopus paradox,” promoter spacing is probably also one important aspect of the species specificity of ribosomal transcription. A 5-bp insertion between the UPE and core of the Xenopus promoter produces a strong mouse RPOI promoter, while making this promoter nonfunctional in Xenopus (117).

The two-element promoter model is not without its detractors, especially among those working on protozoa, fungi, and plants. In these organisms, the ribosomal promoter appears somewhat simpler, consisting of only the initiation-site proximal element (122, 140). The most extreme examples of this simple promoter organization require only a few bases around the initiation site. Only two such examples have been observed to date, the RPOI promoter of Arabidopsis (C. S. Pikaard, personal communication) and, surprisingly, the Xenopus initiation-site element (when microinjected at high template concentrations into oocytes) (141). The former occurs at low copy-number in plant cells and hence may indicate a very different organization of plant promoters. [Plant RPOI promoters, strangely enough, often include TATA boxes, the consensus TATA-box binding protein (TBP) site found in many RPOII and some RPOIII promoters.] Promoter activity of the Xenopus initiator is probably a special case in which relaxation of sequence requirements has been induced by the choice of specific assay conditions. Most mammalian promoters were also originally defined, from in vitro studies, to consist of only an initiation-site proximal element, the core promoter (Fig. 3). Subsequently, more stringent assay conditions have usually revealed a UPE (130, 133, 137, 142, 143).

⁶ To date, experiments demonstrate the necessity of a UPE, but no control function has been demonstrated for this promoter element. Hence we prefer here to use the less committal term UPE.

FIG. 3. (a) The general arrangement of elements within a typical RNA polymerase I promoter. The upstream promoter element (UPE), also referred to as the UCE, is shown specifically spaced from the core promoter. Intrapromoter elements (IPE) within the spacer region modulate transcription in some systems. (b) Sequence alignments of RNA polymerase I promoters from a broad range of organisms (68, 82, 98, 100, 125, 140, 299-309). The regions of the mapped UPE and core elements are indicated by boxing. The approximate binding sites for the TBPI-complex, as deduced from footprinting, are shown by shaded overlining. The sites of binding for the various UBF HMG-boxes (box 1, etc.) are indicated below the sequences with horizontal arrows. Vertical arrows below the sequences indicate a DNase-I hypersensitivity due to UBF binding. For want of a better approach, the promoter sequences were aligned by the program Pileup (310) with a 5.0 weighting against gaps, and the initiation sites were then forced into alignment manually. Surprisingly, the Xenopus and human sequences always scored as more homologous than the mouse and human ones, independent of the gap weighting used. This is perhaps only an indication of the (G + C)-richness of the Xenopus and human sequences.
Point mutation studies of ribosomal promoters have, in particular, demonstrated that RPOI is not very exacting of DNA sequence. Typically, only bases around the initiation site, a G at -7 and in mammals at -16, and one or two bases in the UPE are of very significant importance (134, 136-138). At all other sites, single-base mutations have only minor effects. However, both point-mutation and linker-scanning studies have demonstrated a modulating role for intrapromoter element (IPE) sequences between the UPE and core elements (128-131, 137). Hence we must conclude that, although a simple model of the promoter as a correctly spaced UPE and core explains much of the data, IPE sequences can modulate promoter activity. This is perhaps not surprising considering the extent of the protein-DNA contacts that occur within this region of the promoter (see Sections IV and V). The species specificity of mammalian promoter elements has been studied mainly by the production of mouse-human promoter chimeras. These studies clearly show that the species specificity resides solely within the core promoter (144), but not within a particular subcore promoter sequence (145). The “Xenopus paradox,” i.e., the observation that a lengthened Xenopus promoter functions perfectly well in the mouse, clearly contrasts with this observation (117). It would therefore seem likely that the exact reasons for species-specific transcription will not be explained by distinct, easily identifiable differences in promoter sequence and transcription factor structure. Species specificity is probably much more subtle and variable than is presently envisioned. As such, the final explanation may come only when we eventually resolve the structures involved at their atomic level.
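The spacing observations described in this section (insertions or deletions of about half a duplex turn between the UPE and core are very deleterious, whereas insertions of about a whole turn are much less so) can be illustrated with a simple helical-phasing calculation. The Python sketch below is illustrative and is not taken from the original text. It assumes the textbook value of about 10.5 bp per turn for B-form DNA, and it uses an arbitrary 60-degree tolerance to decide whether two elements still lie on roughly the same face of the helix. Both values and all function names are assumptions.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of B-form DNA (textbook value)


def rotational_offset_deg(delta_bp, bp_per_turn=BP_PER_TURN):
    """Rotation of one element relative to the other after changing their
    spacing by delta_bp base pairs, in degrees within (-180, 180]."""
    angle = (delta_bp / bp_per_turn * 360.0) % 360.0
    if angle > 180.0:
        angle -= 360.0
    return angle


def same_face(delta_bp, tolerance_deg=60.0, bp_per_turn=BP_PER_TURN):
    """True if the spacing change leaves both elements on roughly the same
    face of the helix. The tolerance is an arbitrary illustrative cutoff."""
    return abs(rotational_offset_deg(delta_bp, bp_per_turn)) <= tolerance_deg


# Compare a half-turn change (5 bp) with near-whole-turn changes (10 or 11 bp).
for delta in (0, 5, 10, 11):
    offset = rotational_offset_deg(delta)
    print(f"{delta:+d} bp: offset {offset:+.1f} deg, same face: {same_face(delta)}")
```

Under these assumptions a 5-bp change rotates one element by about 171 degrees, placing it on nearly the opposite face of the helix, whereas a 10-bp change leaves it within about 17 degrees of its original orientation. This is only a geometric illustration. It does not by itself explain why the 5-bp insertion in the Xenopus promoter creates a functional mouse promoter.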
IV. The Basal Transcription Factors

Many investigators have isolated the protein factors necessary for RPOI transcription. Because, in most systems, these factors are still defined in terms of partially or even highly purified chromatographic fractions but not as cloned genes, it is extremely difficult to make definitive comparisons (but see 16). The mammalian (144, 146-157) and Acanthamoeba (30, 158-161) systems are probably still the best characterized. However, with the cloning of Xenopus UBF (xUBF) (152-164) and the identification of Rib1, an SL1-like activity (163), the Xenopus system has begun to catch up. More recently, the yeast system has also begun to become important (165-167).

A. The TBPI-Complex and UBF

Ribosomal transcription in vitro requires an active form of the dedicated polymerase RPOI and two other purified factors, UBF (an HMG-box protein) and an RPOI-specific TBP-complex (TBPI-complex) (variously called SL1, TIF-IB, TFID, factor D, or Rib1)⁷ (18, 146, 148, 150, 152, 163). The TBPI-complex carries the major, though far from the sole, species selectivity observed among RPOI promoters (147, 150). Alone it binds only weakly if at all to the UPE and core promoter elements (144, 146-148, 152, 159, 160, 163). However, UBF permits, or at least greatly enhances, the binding of the TBPI-complex (148, 152, 158, 163, 168) (Fig. 4a). Thus, UBF either interacts with the promoter before the TBPI-complex or they do so together. This makes UBF a potential regulator of TBPI-complex binding.

⁷ We have taken the brave or perhaps foolhardy step of not using any of the current names for this factor. The most common general term outside the intimate RPOI clique is SL1. However, among the connoisseurs, each has his own term for this factor. By using “TBPI-complex” we are not attempting to coin a new nomenclature, but simply to provide an impartial and rational term.

FIG. 4. (a) A generalized, low-resolution model of protein-DNA interactions at the mammalian RPOI promoter. (b) The components of the mouse and human TBPI-complex. The organization of the polypeptides is as yet unknown; however, the diagram is consistent with the known in vitro protein-protein and protein-DNA interactions.

It should be noted that several experiments demonstrate that UBF is not essential for in vitro transcription (148, 158, 169, 170). This may be related to the conditions used in vitro that select for independent core promoter activity (Section III). The absolute concentrations of template and of the basal factors will affect the rate of initiation complex formation; therefore,
high concentrations may bypass the need for UPE-core cooperativity. It may be the role of UBF to permit this cooperativity (see Section V).

1. THE TBPI-COMPLEX

In yeast, TBP (yTFIID is often used synonymously with yTBP) is required for transcription by all three polymerases, and mutations in yTBP that differentially affect RPOI, II, and III transcription (171, 172) have been created. Further, TBP is a functional part of the RPOI transcription initiation complex in higher and lower eukaryotes (155, 156, 159). Therefore, it is fairly clear that the same TBP must function for all three RNA polymerases (e.g., 173-176).

The TBPI-complex is best defined in humans and mice. Here it has been purified as TBP and three associated polypeptides (RPOI-specific TBP-associated factors; TAFIs) of 95, 68, and 48 kDa in mice and 110, 63, and 48 kDa in humans (155, 156) (Fig. 4b). The TBPI-complex of Acanthamoeba has been purified as an even larger complex containing TBP and 145-, 99-, 96-, and 91-kDa TAFIs (159).

Studies on 5-S gene-initiation complexes (177) have greatly stimulated the use of UV cross-linking⁸ to study the protein-DNA contacts of basal transcription factors. In the mouse and in humans it has been shown that at least two of the TAFIs, the 68/63- and the 48-kDa species, interact with DNA (178 and L. Comai and R. Tjian, personal communication) (Fig. 4b). This leaves open the question of whether TBP also contacts the ribosomal promoter DNA. The structure of TBP, with its extensive DNA-contacting surface, suggests that this is likely (179, 180). [The DNA-binding characteristics of TBP suggest that this protein has only a slight preference for the TATA-box sequence over other DNA sequences (181 and T. Moss and S. I. Dimitrov, unpublished observations).] Because TBP is essential for all eukaryotic transcription, it would be surprising if it did not play a similar role at all promoters, whether they be TATA-box or non-TATA-box, polymerase I, II, or III. Non-TATA-box RPOII promoters bind TBP, and some can even function with TBP in place of the holo-TFIID (182, 183).

⁸ Cross-linking of proteins and polynucleotides by UV is discussed by E. I. Budowsky and G. G. Abdurashidova in Vol. 37 (1989) of this series [Eds.].

A very recent study on the Acanthamoeba TBPI-complex shows that the 145-, 99-, 96-, and 91-kDa TAFIs contact the DNA and, most importantly, that TBP does so also (184). For the present, it seems reasonable to assume that TBP contacts the RPOI promoter DNA. However, studies from the same group (159) also demonstrated that, while a TATA-box oligonucleotide interfered with RPOII and III promotion, it did not affect RPOI promotion in a sequence-specific manner. Two broad explanations are possible: (1) the TBP of the TBPI-complex is not actually
39
RIBOSOMAL TRANSCRIPTION
able to interact with the DNA, but is close enough to cross-link, or (2) the DNA-binding surface of TBP is distorted within the complex by the TAF, so it is no longer able to recognize the TATA-box as a prefered DNA-binding site. The availability of the TAF, genes should resolve these questions and allow rapid advances in our understanding of TBP,-complex function in the next few years.
2. UBF
The determination of the primary structure of human UBF (185) established the existence of a family of HMG-box transcription factors to which the sex-determination factors, tissue-specific regulatory factors, and mitochondrial factors, among others, have since been added (186). UBF has been isolated from human, Xenopus, rat, and mouse cells (162, 164, 185, 187-190) and may exist in a broad range of other organisms, including protozoa (158, 191), plants (191), yeast (95), and Drosophila (B. Leblanc and T. Moss, unpublished observations). This suggests that UBF is a universal eukaryotic protein, as we might expect of a ribosomal transcription factor. However, the structure of UBF is somewhat unusual as compared with other transcription factors. In vertebrates, UBF is a protein of 80 to 92 kDa, depending on the organism, and has five or six tandem homologies to the DNA-binding domains of HMG 1 and 2 (162, 185) (Fig. 5). Like HMG 1 and 2, the UBFs also end C-terminally in long blocks of acidic residues. UBF is a highly conserved protein. It shows only one amino-acid change between primates and rodents and is 73% conserved between mammals and Xenopus, conservation of the two N-terminal HMG-boxes exceeding 90% (162) (Fig. 6a). In contrast to this interspecies sequence conservation, the individual HMG-boxes of UBF have diverged extensively from one another (Fig. 6b). For example, HMG-box 3 of UBF is no more homologous to HMG-box 1 or 2 than it is to the HMG-boxes of HMG 1 and 2 or that of the sex-determination factor, SRY (186). Each HMG-box of UBF is therefore under very different evolutionary constraints, suggesting that each has a distinct function. These functions may include DNA sequence selection and the provision of specific interfaces for protein-protein interactions with the TBPI-complex or with RPOI (192) as well as interactions with repressors of ribosomal transcription (Section VII). Because UBF binding is the first step in promoter recognition, UBF has the potential to regulate gene activation.
UBF dimerizes in solution via the N-terminal 80 or so amino acids. This dimerization domain as well as the C-terminal acidic domains and the HMG-box domains are all required for full promoter activation (190, 193-195). xUBF cannot fully replace the mammalian UBF in in vitro transcription assays, and the converse is also true (151, 196). This specificity is predominantly due to the absence from xUBF of the mammalian HMG-box 4 (162, 164, 195) (Fig. 5). It is clear from footprinting data that UBF and xUBF specifically recognize and position themselves on their cognate promoter sequences (195, 197). However, the means by which they do this is somewhat of a mystery. Although linker-scanning mutations of the RNA polymerase I promoter do have some effect on UBF binding (143, 144, 148), point mutations in essential promoter elements have no effect at all on xUBF recognition of the Xenopus promoter (137). We have even found that nearly 50% of the xUBF binding site can be deleted before promoter recognition is affected (197). Most surprisingly, the mammalian and Xenopus UBFs are completely interchangeable in their recognition of the human and Xenopus RPOI promoters, despite the almost complete lack of promoter sequence homology (see Section III and Fig. 3). Hence, (1) though it is not evident at the DNA sequence level, the mammalian and Xenopus ribosomal promoters must in fact encode common information for the positioning of UBF, and (2) amino-acid sequence differences between equivalent HMG-boxes of the known UBFs do not significantly affect their DNA-sequence recognition.
FIG. 5. Diagrammatic representations of the mammalian UBF and Xenopus (xUBF) structures and the structures of the UBF variants. HMG-box domains are shown numbered and shaded. The nuclear and nucleolar localization signals are indicated, as is the acidic C-terminal domain. Nuclear localization and targeting experiments, respectively, in Xenopus and the mouse were performed somewhat differently and have identified quite different sequences. [Graphic not reproduced; labels: UBF1, UBF2, dimerization domain, HMG-boxes 1 to 6, acidic C-terminal domain, nuclear targeting, nucleolar localization, nuclear localization.]
FIG. 6. HMG-box homologies and sequence alignments. (a) The primary sequences of the various HMG-boxes were aligned using the program Pileup (310), without consideration of tertiary structure constraints. This allowed the construction of a tree (b) showing the relative homologies. [Alignment and tree not reproduced; sequences aligned: hUBF boxes 1 to 6, xUBF boxes 1 to 5, HMG-1 boxes A and B, HMG-T boxes A and B, and the SRY box.]
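The contrast drawn above, between high conservation of equivalent HMG-boxes across species and low similarity among the boxes within one UBF, can be illustrated with a simple ungapped percent-identity count. The sketch below is not the Pileup procedure used for Fig. 6. It uses only the first ten aligned residues of three human and three Xenopus boxes as they appear in the Fig. 6a listing, and it assumes that the sequence names and residue blocks of that listing are given in the same order. Ten residues are far too few for a quantitative statement, so the numbers are indicative only.

```python
# Illustrative sketch: ungapped percent identity between short, pre-aligned
# HMG-box segments. The ten-residue strings are the opening residues of the
# Fig. 6a alignment; the name-to-sequence pairing is an assumption.
segments = {
    "hUBF box 1": "PDFPKKPLTP",
    "xUBF box 1": "PEFPKKPLTP",
    "hUBF box 2": "SDIPEKPKTP",
    "xUBF box 2": "SDVPEKPKTP",
    "hUBF box 3": "GRPTKPPPNS",
    "xUBF box 3": "GRPTKPPPNS",
}


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions carrying the same residue."""
    if len(a) != len(b):
        raise ValueError("segments must be pre-aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Same box, different species.
    for n in (1, 2, 3):
        pid = percent_identity(segments[f"hUBF box {n}"], segments[f"xUBF box {n}"])
        print(f"hUBF box {n} vs xUBF box {n}: {pid:.0f}%")
    # Different boxes, same species.
    pid = percent_identity(segments["hUBF box 3"], segments["hUBF box 1"])
    print(f"hUBF box 3 vs hUBF box 1: {pid:.0f}%")
```

Over these short segments the orthologous boxes agree at nine or ten positions out of ten, whereas human boxes 3 and 1 agree at only two, in line with the qualitative point made in the text.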
Attempts to understand the DNA sequence motifs recognized by the UBFs have not brought us further than finding that most binding sites are (G + C)-rich (195, 197, 198). Even this rule is not absolute, because HMG-box 2 of xUBF binds across a dA tract downstream of the transcription initiation site (197). It is tempting to compare the DNA sequence recognition of UBF with nucleosome phasing. As will become clear in Section V, this may not be the only similarity that UBF-DNA complexes share with nucleosomes.
Somewhat contradictory results have been published on the transcription activation roles of the various UBF domains. The contradictions are probably due to the various in vitro transcription systems used. In vitro experiments are likely to test only a subset of a protein's functions; depending on the in vitro system used, this subset may be different. The in vitro activities of the human, rat, and mouse UBF (hUBF, rUBF, and mUBF) proteins have been shown to require the acidic C-terminal domain (194, 195), phosphorylation of which may be important (193, 199, 200). On the other hand, data from Xenopus suggest that this domain is largely dispensable, as are HMG-boxes 4 and 5 (190). Our data show that a short 24-amino-acid segment between HMG-box 5 and the C-terminal acidic domain is essential for nuclear transport of xUBF (201), and others have shown that HMG-box 1 and the acidic C-terminal domain are necessary for nucleolar localization of mUBF (202) (Fig. 5). Both of these functions are, of course, unnecessary in vitro. Mouse UBF is also able to derepress the polymerase I promoter in vitro, a function that apparently requires the C-terminal acidic domain (see Section VII,A). Extrapolating somewhat from these varied studies, we might conclude that HMG-boxes 1 to 3 in Xenopus and 1 to 4 in mammals, along with the N-terminal dimerization domain, may be sufficient for promoter function under noncompetitive conditions. Perhaps it is predominantly these domains that allow the TBPI-complex to recognize the promoter. These are also the domains necessary to position UBF on, and induce a specific fold in, the ribosomal promoter (197, 203) (see Section V).
Two major UBF variants are found in all species so far studied (Fig. 5), the result of a differential splicing of the UBF message. In mammals, UBF is expressed from a single gene, whereas in Xenopus two or more genes express xUBF (164, 185, 188, 204-206). The differential splicing events in both Xenopus and mammals eliminate specific amino-acid sequences from a subset of UBF molecules. However, despite the very obvious interspecies conservation of the UBF proteins, the region eliminated by splicing in Xenopus bears no relationship to that eliminated in mammals (Fig. 5). In Xenopus, a 22-amino-acid region, which might be considered a highly evolved remnant of the mammalian UBF HMG-box 4, is differentially spliced. In mammals, it is a 37-amino-acid segment of the highly conserved HMG-box 2 that is differentially spliced.
In the mouse and rat, UBF1, which contains a complete HMG-box 2, is transcriptionally active, whereas UBF2, in which the remnant of box 2 is almost certainly nonfunctional, is inactive (170, 207). This is perhaps not surprising because mouse UBF2 has about a tenth of UBF1's affinity for the promoter. Consistent with its activity, the UBF1 variant predominates in rapidly growing cells (45). We do not yet know if the xUBF1 and 2 variants differ in their transcriptional activities. Clearly, they do not exhibit different promoter affinities. However, the differential splicing that leads to their production is developmentally regulated, mRNA for the longer (xUBF1) form being predominant under conditions of highest growth rate and early in development (206). Hence, the very different structural changes brought about by UBF splicing in mammals and Xenopus may belie a functional commonality.
B. The Polymerase
RNA polymerase I consists of an unknown number of functional polypeptides. Estimates of the complexity of the active polymerase range from about 7 polypeptides to as many as 14 (208-211). Because even the lowest level of complexity suggests that the polymerase is over 500 kDa in size, only a concerted effort will resolve the nature of the active polymerase form. In yeast, this effort is being undertaken by several groups and already some data on the roles of the different subunits are available (209, 212). Several RPOI subunits are also shared with one or both of the other nuclear RNA polymerases, RPOII and RPOIII (210, 211). It has been suggested that RPOI is the most distinct of the three eukaryotic RNA polymerases (213). If correct, this implies that the RPOI transcription machinery is also likely to be the most distinct of the three eukaryotic systems. We have recently isolated and sequenced a cDNA for the Xenopus RPOI large subunit (RPOI1) and have collaborated with L. I. Rothblum (Weis Center, Danville, PA) in the isolation of the equivalent rat cDNA (S. I. Dimitrov, W. Q. Xie, L. I. Rothblum and T. Moss, unpublished observations). The Xenopus and rat sequences fully confirm that RPOI is the most distinct of the three eukaryotic polymerases (Fig. 7). Hence, we must expect the RPOI transcription machinery to be the most divergent of the three nuclear systems.
RPOII transcription initiation involves protein phosphorylation and β,γ hydrolysis of nucleotide triphosphates. Neither of these events appears necessary for transcription initiation by RPOI (214). However, like RPOII, RPOI has been shown to exist in two forms, one competent for faithful in vitro transcription (the active form), and another with DNA-dependent RNA polymerase activity but unable to initiate transcription specifically (30-32, 36). Phosphorylation of the C-terminal domain (CTD) heptapeptide repeat of the large subunit differentiates the active and inactive forms of RPOII. Though the large RPOI subunit does not contain a C-terminal repeat, it has been suggested that the active and inactive forms of RPOI may also be related by a phosphorylation event or possibly by some other posttranslational modification (30, 31). Two laboratories have identified what is probably the same essential growth-regulated factor associated with the active form of the mouse RPOI (33, 36-38, 157). These data are further discussed in Section VII.
FIG. 7. Primary sequence relationships between the large subunits (subunit 1) of the RNA polymerases (RPO) I, II, and III. The respective sequences from X. laevis, S. cerevisiae, T. brucei, and the mouse were aligned and their degrees of homology calculated by the program Pileup (310). [Tree graphic not reproduced.]
V. Activation at the Ribosomal Promoter
The structure of UBF is very unlike that of any other known transcription factor, with the exception perhaps of mTF1 (Fig. 5) (215). Further, no distinct transcription activation domains can be defined (Section IV). Parallels between the acidic UBF C-terminal domain and RNA polymerase II acidic activators are clearly unrealistic. The former consists only of acidic residues and hence is probably unstructured in solution, while the latter contains only a few specifically distributed acidic residues (216, 217). Further, the acidic C-terminal domain of HMG1 is unable, even in part, to replace an RPOII acidic activation domain (218). Hence, we must search for an alternative explanation of RPOI transcription activation.
The HMG-boxes of LEF-1 (TCF1α) and SRY bend DNA in a sequence-specific manner and bind strongly to cruciforms, or four-way junctions, independent of their DNA sequence (186, 219). The HMG-boxes of HMG1 also bend DNA and bind cruciforms, but they further recognize drug-DNA complexes and supercoil plasmid DNA. Not unexpectedly, the HMG-boxes of UBF also bind cruciforms (198, 207). xUBF can supercoil DNA (203, 220, 221) and the ligase-mediated circularization assay shows that it can bend DNA (222). In short, the HMG-box appears to induce or select for certain DNA distortions, the exact nature of which is not yet obvious. The solution structure of HMG1-box B is known (223, 224). It consists predominantly of three α-helices, two of which form arms in a V conformation. It has been suggested that the angle of this V may be a defining factor in the degree of DNA bending each HMG-box will induce. We must await a DNA-protein structure determination before this can be confirmed.
An obvious proposition to explain UBF function is that it somehow bends the ribosomal promoter sequences, possibly allowing the UPE and core elements to cooperate. The affinity of HMG-boxes for DNA cruciforms and four-way junctions has been a popular starting point for this type of explanation. The similarity between a four-way junction and two DNA duplexes laid across each other (225) suggests that UBF might stabilize a loop by binding at the DNA entry and exit points. Electron-microscope evidence supporting this point of view has been obtained (221). However, the observed structures demonstrate long-range looping of one to several kilobase-pairs of DNA and not the 100 or so base-pairs typical for the spacing between the UPE and core. Further, ligation-mediated DNA circularization experiments clearly show that the HMG-boxes of HMG 1 can promote the looping of as little as 60 bp (226, 227) and xUBF can loop as little as 150 bp (222). Looping of such short DNA fragments could not be induced by HMG-box binding to crossovers and therefore indicates direct DNA bending by the HMG-boxes.
A. The Enhancesome
Our previous data showed that the repeated HMG-boxes of xUBF interact across the transcription initiation site of the Xenopus core promoter (197). This interaction clearly occurs in a colinear manner, consecutive HMG-boxes in the protein interacting with adjacent DNA sequences (Fig. 8a). Very similar results have been obtained for the human UBF binding at the UPE (UCE) element of the human promoter (195). If, as discussed above, UBF were to loop the promoter by binding at the crossover points of the DNA loop, the DNA-binding sites for consecutive HMG-boxes would not be adjacent. Rather, they would be positioned discontinuously along the promoter DNA. Hence, the available footprinting data are also incompatible with models implicating HMG-box binding at DNA crossovers. While mapping the xUBF HMG-box sites on the Xenopus promoter, we noted a protein-protein interaction between the C-terminus and HMG-box 1. The interaction depended on the presence of sequences between about +20 and +40, suggesting that it was the C-terminus of the downstream xUBF monomer that interacted with the HMG-box 1 of the upstream one (Fig. 8a). This was most easily explained in terms of a folded core promoter (197). More recent studies show that it is in fact the acidic C-terminal domain that interacts with the HMG-box 1 domain sitting on the critical core promoter sequences -20 to +1 (203, 220) (Fig. 8a). Other studies have already implicated the C-terminal domain of UBF in promoter activation (Section IV,A). It is therefore likely that this specific intra-xUBF interaction plays an important role in promotion.
To visualize exactly what occurs when xUBF binds to DNA, we, in collaboration with D. P. Bazett-Jones (University of Calgary), took advantage of the technique of electron spectroscopic imaging (ESI) (228, 229). ESI is ideally suited for visualizing DNA-protein complexes because (1) the specimen does not have to be stained or shadowed, (2) the technique allows direct estimation of the mass of complexes, and (3) net phosphorus images can be used to localize the DNA component and allow it to be estimated. Together, the mass information and the phosphorus content can reveal stoichiometric relationships between protein and DNA. Single xUBF complexes were found to contain about 180 bp of DNA looped nearly 360° by a dimer of xUBF (203). The net phosphorus images of the xUBF-enhancer complex clearly indicated that the DNA component was concentrated toward the periphery (Fig. 8b). We have called this complex an enhancesome, because it was originally observed on the 60/81-bp Xenopus enhancer repeats. We also constructed a low-resolution space-filling model of the enhancesome (203) (Fig. 8c), in which an xUBF dimer lies inside a 180-bp DNA loop, the tandemly arranged HMG-boxes each binding to ~20 bp of DNA (195, 197, 230). The model predicts that DNA looping by xUBF is predominantly due to a series of in-phase bends induced by the repeated binding of the HMG-boxes. The bend angle per HMG-box can be estimated from our data to be ~60° if it is assumed that only the tight DNA-binding boxes 1 to 3 bend the DNA. Even before the primary structure of xUBFs (TFIS) was known, it had been noted that this factor tended to protect against DNase I attack in a regular 10-bp repeat, suggesting binding to one face of the DNA (189). This led the authors to suggest that the DNA wrapped around the xUBF as we observe in the enhancesome.
FIG. 8. The enhancesome structure and its suggested role in initiation complex formation. (a) The colinear model of xUBF-DNA interaction. A composite picture is presented, derived from mapping data of the xUBF/UBF binding sites on the Xenopus, human, and mouse promoters. These data have clearly defined binding positions for UBF HMG-box 1 and for xUBF boxes 1 to 3. (In the case of the mouse, it has been assumed that, by analogy with Xenopus and humans, the most evident DNase-I footprints are due to HMG-box 1.) An interaction between the acidic C-terminal domain of xUBF and the HMG-box 1 bound on the Xenopus core promoter is indicated by a shaded arrow. (b) Electron spectroscopic imaging (ESI) of the xUBF-DNA complex revealed that a 180-bp near-360° loop of DNA is stabilized by one xUBF dimer. The upper image shows only the DNA whereas the lower image shows the complete complex. (c) A low-resolution model of the enhancesome showing the ten HMG-boxes and acidic C-terminal domains of an xUBF dimer within the DNA loop. (d) A model for the cooperative interaction of the TBPI-complex with the UBF-promoter complex. Consistent with the available mapping data (Fig. 3), two asymmetric TBPI-complexes have been positioned on the surface of adjacent enhancesomes, one in the UPE and one in the core promoter. The effects of the insertion of 5 or 10 bp between the UPE and core on cooperative TBPI-complex binding have also been modeled. [Graphic not reproduced; panel (d) labels: Wild Type, Plus 5 b.p., Plus 10 b.p.]
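The bend-angle estimate quoted for the enhancesome follows from simple division, and it may help to see the arithmetic spelled out. The sketch below takes the figures given in the text (a 180-bp loop of nearly 360° stabilized by one xUBF dimer carrying ten HMG-boxes) and treats the loop as exactly one full turn, which is a simplifying assumption. The constant and function names are illustrative only.

```python
# Back-of-envelope check of the enhancesome geometry quoted in the text.
LOOP_BP = 180                 # DNA per enhancesome (203)
LOOP_ANGLE_DEG = 360.0        # "nearly 360 degrees", taken here as one full turn
MONOMERS_PER_DIMER = 2
BOXES_PER_MONOMER = 5         # ten HMG-boxes per xUBF dimer (Fig. 8c)
TIGHT_BOXES_PER_MONOMER = 3   # boxes 1 to 3, assumed to do all of the bending


def bend_per_box(total_angle_deg: float, n_boxes: int) -> float:
    """Average bend contributed by each box if the boxes bend in phase."""
    return total_angle_deg / n_boxes


def bp_per_box(loop_bp: int, n_boxes: int) -> float:
    """Average length of DNA available to each box along the loop."""
    return loop_bp / n_boxes


if __name__ == "__main__":
    tight_boxes = MONOMERS_PER_DIMER * TIGHT_BOXES_PER_MONOMER
    all_boxes = MONOMERS_PER_DIMER * BOXES_PER_MONOMER
    print("bend per box, boxes 1 to 3 only:", bend_per_box(LOOP_ANGLE_DEG, tight_boxes))
    print("bend per box, all ten boxes:", bend_per_box(LOOP_ANGLE_DEG, all_boxes))
    print("bp per box, all ten boxes:", bp_per_box(LOOP_BP, all_boxes))
```

With six bending boxes per dimer the average bend is 60° per box, matching the estimate in the text; if all ten boxes contributed equally it would fall to 36°. Ten boxes sharing 180 bp gives 18 bp each, close to the ~20 bp per HMG-box inferred from footprinting.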
B. Role of the Enhancesome in Promotion
As discussed above, previous work (147, 195, 197) shows that two independent UBF complexes bind within the human and Xenopus ribosomal promoter, one centered around +1 and the other, within the UCE, at around -90 to -100 bp (Figs. 3 and 8a). Hence, we predict that two adjacent enhancesomes should form on the promoter. The TBPI-complex extends the UBF footprint on the UPE from -115 to beyond -160 bp, and also protects the core near the initiation site (147); a summary of the footprinting data is given in Fig. 3. The two promoter enhancesomes would present these TBPI-complex sites on the surface of a superhelix and hence may facilitate the cooperative binding of this factor to both sites (Fig. 8d). For simplicity, we have assumed that two TBPI-complexes interact with the promoter, one in the UPE and one in the core element. TBP dimerizes when bound to DNA (231), though it is not known if the TBPI-complex can also dimerize. A similar model could, however, be made for a single TBPI-complex interacting with both promoter elements. Spacing changes of half a duplex turn between the UPE and core elements severely diminish promoter activity, whereas changes involving a full turn only mildly affect promoter activity (see Section III). Figure 8d shows how the corresponding changes in enhancesome topology can explain these observations in terms of cooperative binding of the TBPI-complex to the UPE and core elements. Clearly, important protein-protein interactions may be superimposed on the DNA-folding role of UBF. It has already been shown that UBF interacts in vitro with RPOI, and other experiments suggest further interactions with other basal transcription factors (192 and L. I. Rothblum, personal communication). Thus, although the enhancesome may be an important structural element in aiding cooperative promoter recognition, interactions between UBF and the basal factors may aid this process and provide a further level of specificity.
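The different consequences of half-turn and full-turn spacing changes can be made concrete by converting an insertion length into the rotation it imposes on one promoter element relative to the other about the helix axis. The sketch below assumes a helical repeat of 10.5 bp per turn, the usual value for B-DNA in solution; this figure is not taken from the text, and the repeat of DNA wrapped in an enhancesome may differ. The function name is illustrative.

```python
# Rotational offset introduced by inserting DNA between two promoter elements.
# ASSUMPTION: 10.5 bp per helical turn (typical B-DNA in solution).
BP_PER_TURN = 10.5


def rotational_offset_deg(inserted_bp: float, bp_per_turn: float = BP_PER_TURN) -> float:
    """Signed rotation (degrees, in the range -180 to 180) of one element
    relative to the other after inserting `inserted_bp` base-pairs."""
    angle = inserted_bp / bp_per_turn * 360.0
    return (angle + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    for n in (0, 5, 10):
        print(f"+{n} bp: {rotational_offset_deg(n):.1f} degrees")
```

Under this assumption a 5-bp insertion rotates one element by about 171°, placing its binding site on almost the opposite face of the helix, whereas a 10-bp insertion leaves a residual offset of only about -17°. This is the geometric basis of the contrast between the "Plus 5 b.p." and "Plus 10 b.p." models of Fig. 8d.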
If our explanation of the function of UBF in the formation of the RNA polymerase I initiation complex is correct, it might also explain the functions of UBF variants. As discussed above, the different UBF forms are predominantly length variants of the protein. Because the spacing of the UPE and core promoter elements is crucial for promoter function, it is tempting to suggest that the UBF and promoter lengths are correlated. Possibly the enhancesome loop diameter could be affected by UBF length, or the binding of a different number of HMG-boxes, such as is the case for the mammalian UBF variants, could change the degree of DNA unwinding per complex. Both effects would be similar to the insertion or deletion of bases between adjacent enhancesomes (Fig. 8d). The fact that the shorter mammalian UBF is nonfunctional may then be because it induces a suboptimal juxtaposition of the TBPI-complex sites in the promoter complex. The shorter UBF variants may therefore constitute natural gene repressors. The same argument may explain why the Xenopus and mammalian UBFs are only partially interchangeable.
VI. Enhancement
A. Spacer Promoters and Repetitive Enhancers
Promoter-related repetitive sequences were first discovered in the Xenopus intergenic spacers (IGSs) (98, 99). Some years later, these same sequences were shown to enhance rRNA transcription (232). Since then, we have learned much about the molecular genetics of the Xenopus IGS. Unfortunately, we still do not understand the mechanisms by which the Xenopus repeated IGS sequences function. In the meantime, the ribosomal IGSs of a large number of eukaryotic organisms have also been shown to contain arrays of promoter-adjacent repeats (Section II and Fig. 2). These repeats have often been generated by promoter duplication. Indeed, some repeats consist of functional promoters, and where this is not the case, the repeats are usually preceded by a promoter. The IGS repeat arrays often enhance ribosomal transcription, though this activity may sometimes be revealed only in a heterologous system (65, 115, 116, 126, 233-239). Because the organisms studied have included plants, vertebrates, and insects, and the IGS has evolved independently in each group, we must conclude that transcription enhancement by promoter-adjacent repeats is a common aspect of ribosomal transcription. Underlining this point, the mouse and Xenopus enhancers are to some extent interchangeable (115, 233), and it has recently been shown that a putative plant enhancer repeat can also function in Xenopus (116). The only clear exception to the general rule of ribosomal transcription being enhanced by IGS repeat sequences occurs in S. cerevisiae, where the IGS contains a unique, promoter-distal enhancer (Section VI,B). Although we do not yet understand how the IGS enhancer-repeats function, certain experiments put clear constraints on the possible mechanisms. In Xenopus and mammals, the enhancer repeats consist of one or more active promoters upstream of an array of directly repeated elements (Fig. 2).
The direct repeats function as bidirectional enhancers (234), and it is these elements that appear to be widely interchangeable between organisms. The upstream promoter element also enhances transcription and does so as a direct result of its promoter activity (240). It is therefore probably a directional enhancer. Experiments on the Drosophila rDNA support this conclusion (236, 237). They show that the direct enhancer repeats, which in Drosophila constitute active promoter elements, act as directional enhancers.
De Winter and Moss (235, 240) have probably provided the most self-consistent and complete set of data on the role of the various repetitive enhancer elements in Xenopus. Briefly, the experiments showed that a single super-repeat, i.e., a spacer promoter followed by an enhancer array (see Fig. 2), is sufficient for full enhancer activity. Multiple super-repeats, as occur in wild-type spacers, do not function additively, and indeed a single super-repeat is by a small margin the optimal enhancer configuration. Inactivation of the spacer promoter or deletion of the enhancer array severely impairs enhancement. The number of enhancer repeats within an isolated super-repeat, but not the total number of enhancers in a natural IGS, is directly proportional to overall enhancer strength. The spacer promoter did not enhance transcription by itself but did so only when followed directly by an array of enhancer repeats. Thus, spacer promoter enhancement is somehow mediated by the enhancer repeats. A possible explanation of this is that spacer transcription modifies the proteins bound to these repeats. The enhancers appear to be associated with the core histones, whatever the state of gene activity (241, 242), and their chromatin has been reported to be compact (243). We have reinvestigated the chromatin structure of specific Xenopus IGS regions and find that the enhancers of active genes are in fact very accessible to micrococcal nuclease whereas those of inactive genes give a classic nucleosome ladder (B. Leblanc, L. Karagyozov and T. Moss, unpublished data). Hence, we suggested that transcription from the spacer promoters may open up the enhancer chromatin to allow ribosomal gene activators to bind (235). Because it is known that UBF can bind within the spacer (187, 189), it is clearly one very likely candidate. Another is the TBPI-complex, possibly in cooperation with UBF.
A Ku-related factor, E1BF, has also been shown to interact functionally with each 134-bp rat enhancer repeat (239). The contention that the enhancer repeats function by binding a transcription factor is also supported by the reports that the Xenopus enhancers compete with the gene promoter when the two are placed in trans (234, 244). However, we have carefully repeated these experiments with a natural enhancer repeat lacking an active spacer promoter and have found that the enhancers do not of themselves compete with the gene promoter in microinjected oocytes (T. Moss, unpublished data). We have yet to test if transcription of these enhancers is the key to their competitive activity.
xUBF forms a specific folded structure, the enhancesome, when bound on the Xenopus repetitive enhancer sequences (203, 220) (Fig. 8). xUBF positions itself equivalently on pairs of contiguous Xenopus 60- and/or 81-bp enhancer repeats in vitro (245). The probable colinear arrangement of xUBF dimers on the enhancers is shown in the upper panel of Fig. 9a. The resulting repeated enhancesome structure would probably resemble an unbroken DNA superhelix of about 180 bp per turn (lower panel, Fig. 9a). This, and the observation that the apparently unrelated Xenopus, mouse, and Arabidopsis enhancers are functionally interchangeable (115, 116, 233), indicate that the formation of such a structure, and not the exact DNA base sequence, is probably the major determinant of enhancement. Perhaps the TBPI-complex and/or the polymerase (8) can be effectively accumulated on this structure and lead, by some unknown mechanism, to the enhancement of transcription initiation or to gene activation (see Section VII).
FIG. 9. Enhancesome structure as applied to the IGS. (a) Colinear xUBF binding to the Xenopus 60/81-bp repetitive enhancers. The model is derived from data on the enhancesome structure and the stoichiometry of xUBF binding to multiple enhancer repeats. (b) A model for the structure of the enhancer, terminator, and promoter regions of a typical IGS. Extending the arguments of Fig. 8, the terminator will normally be found in close proximity to the promoter-bound TBPI-complex, permitting readthrough enhancement by polymerase recycling. The repetitive enhancers might form a continuous DNA superhelix between the gene promoter and the upstream spacer promoters. The need for such a structure might be explained by a capacity to accumulate polymerase, the TBPI-complex, or both. [Graphic not reproduced; panel (b) labels: Spacer Promoter; Enhancer Repeats (Bind TBPI-complex or Polymerase?); Promoter; Terminator (Readthrough Enhancement?).]
B. Single-copy Enhancers
Whereas the ribosomal IGS of Schizosaccharomyces pombe conforms to the higher eukaryotic paradigm (95), that of S. cerevisiae presents a clear exception. Here the IGS contains no repetitive enhancers, and enhancement is controlled by a relatively short segment of DNA just downstream of the 25-S gene, i.e., at the promoter-distal end of the IGS (Fig. 2). This DNA segment has been implicated not only in enhancement (246-250), but also in RPOI termination (251-256) (Section VI), replication arrest (257, 258), and rDNA recombination (259). It is still unclear how enhancement occurs or what proteins may be involved, but the binding sites for the ribosomal enhancer binding protein 1, Reb1p, which has been implicated in in vitro termination, appear to be required (260). It has been suggested that the enhancer and promoter elements are juxtaposed to allow a type of readthrough enhancement or polymerase recycling (260), much as was previously suggested for Xenopus (261, 262) (Section VI,C). However, to achieve this, the IGS must form a large loop, for which there is as yet no evidence.
Single-copy enhancers have also been observed in mammals. A 174-bp sequence upstream of the spacer promoter in the rat IGS enhances RPOI transcription in cis (238). It has also been shown to bind the factor E1BF. This factor appears to be related to the rat equivalent of the Ku antigen (263, 264), a DNA-dependent kinase shown in other experiments to be a transcriptional inhibitor (Section VII,A). This same protein also binds and activates the rat promoter (265), and binds functionally to the rat repetitive enhancers (239).
C. Terminators
Readers not working on rDNA transcription may be surprised to find transcription termination categorized under enhancement. However, termination in organisms as distantly related as yeast and mouse has been clearly implicated in ribosomal transcription enhancement, and several distinct mechanisms may be involved. As originally observed in Xenopus (8), the ribosomal promoter of vertebrates is closely preceded by a transcription terminator (Fig. 2). Termination has been studied in most detail in the mouse, Xenopus, and yeast, and the mechanisms appear similar. A short sequence motif, the Sal-box in mouse (266) and the T3-box in Xenopus (267, 268), is recognized by a single polypeptide factor, respectively called TTFI (269) or Rib2 (270). The termination reaction, which is specific for RPOI (271), occurs in two distinct steps (272, 273). The polymerase is first arrested by the DNA-bound factor. The transcript is then cleaved some tens of bases upstream of the site of arrest and within a pyrimidine-rich or (A+U)-rich sequence. Terminators usually, but not exclusively, occur at both extremes of
the vertebrate IGS (Fig. 2). Hence, ribosomal transcripts can terminate one to several hundred base-pairs downstream of the 28-S gene and/or just upstream of the gene promoter. However, the Xenopus laevis IGS does not have a functional 28-S proximal terminator (274, 275), and in Drosophila species, no terminator can be identified at either end of the IGS (276). Hence, termination per se may not be a prerequisite for expression of the ribosomal genes and may be an adaptive or a regulative function.
In S. cerevisiae the situation is somewhat different. Three termination sites have been mapped, one within the promoter-distal unique enhancer element (see Section VI,B) and two further downstream between this enhancer and the 5-S gene (see Fig. 2). In vitro, termination at the enhancer site appears to require the factor Reb1p (256); however, this factor alone does not recreate the exact 3' rRNA terminus seen in vivo. Reb1p, a relatively abundant factor, was isolated as an enhancer-binding protein. It has since been found to be identical to a factor also implicated in RPOII transcription activation (277, 278).
Several experiments have suggested a role for RPOI termination in transcription activation. The proximity of a terminator to the ribosomal promoter in vertebrates suggested the possibility that the polymerase may be rapidly recycled. Experiments in which the promoter-proximal terminator was deleted showed a strong effect on promotion (279-281). However, it was subsequently found that these observations were probably an artifact of promoter occlusion, a phenomenon in which transcription through a promoter disrupts the semistable preinitiation complex (282, 283). Parallel experiments in our laboratory suggested that multiple ribosomal genes aligned on a plasmid are not all equivalent and that this lack of equivalence is due to transcription attenuation within the plasmid vector (9).
Following up this observation, we found that premature RPOI termination on an essentially wild-type, but circular, rDNA template reduced the initiation rate in cis (261). This result was most easily explained by the inhibition of polymerase recycling. Because the mechanism depended on polymerase molecules reading through the IGS to terminate promoter proximally, the mechanism was referred to as readthrough enhancement (261). Several subsequent studies have established a role for the promoter-proximal terminator, independent of simply preventing promoter occlusion (262, 268, 284), and both the sequence and length of the DNA between the T3 terminator and the promoter affect promoter activity (268, 284). The Reb1p binding sites within the yeast IGS also appear to be important for transcription enhancement (Section VI,B). But whether termination per se is an important factor in enhancement on the yeast gene remains unclear. By modeling polymerase diffusion in the microenvironment of the Xenopus T3 terminator-promoter, we showed that polymerase recycling could
only be an important factor in enhancement if the terminator and promoter are in very close proximity, i.e., within one DNA turn or about 4 nm (262). On the linear rDNA, the promoter and terminator are separated by 40 bp, about 14 nm. However, xUBF is known to bind on either side of the Xenopus T3 terminator (197, 262). A model similar to that suggested for the promoter in Fig. 8 might then also explain the observed coupling between the terminator and the promoter (Fig. 9b). Enhancesome formation in the core, the UPE, and the terminator could juxtapose these elements on the surface of a superhelix whose pitch might be as little as 2 nm, i.e., the DNA duplex diameter.
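The distances quoted in this paragraph follow from ordinary B-form DNA geometry. As a rough check, the Python sketch below converts base pairs to contour length, assuming a rise of 0.34 nm per base pair and about 10.5 bp per helical turn. Both are standard textbook values supplied here as assumptions, not figures from the cited modeling study (262), and the function name is purely illustrative.

```python
# Assumed textbook parameters for straight B-form DNA (not taken from ref. 262).
RISE_NM_PER_BP = 0.34   # axial rise per base pair, in nm
BP_PER_TURN = 10.5      # base pairs per helical turn in solution


def contour_length_nm(n_bp: float) -> float:
    """Return the contour length in nm of n_bp base pairs of straight B-DNA."""
    return n_bp * RISE_NM_PER_BP


# One helical turn: the "about 4 nm" limit for effective recycling.
one_turn_nm = contour_length_nm(BP_PER_TURN)

# Terminator-promoter spacing on linear rDNA: 40 bp, "about 14 nm".
spacing_nm = contour_length_nm(40)

print(f"one turn: {one_turn_nm:.2f} nm")
print(f"40 bp:    {spacing_nm:.2f} nm")
print(f"ratio:    {spacing_nm / one_turn_nm:.1f} turns")
```

Under these assumptions the 40-bp spacing is several helical turns long, which is why some folding of the DNA, such as the enhancesome model, is invoked to bring terminator and promoter within recycling range.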
VII. Mechanisms of Regulation
Differentiation, cell culture density, nutrient deprivation, serum deprivation, and hormonal treatment affect promoter recognition by RPOI and, as a consequence, RPOI loading onto the ribosomal genes (29-33, 38). On the other hand, gene dosage, electron-microscope studies, and cross-linking experiments suggest that the rate of eukaryotic ribosomal transcription is regulated at the level of gene activation (7, 285, 286a,b). Hence, at least two distinct mechanisms control ribosomal gene transcription. Regulation of the polymerase or an associated factor affects the number of polymerase molecules available for initiation and hence the initiation rate on each active gene. A second mechanism determines the number of active genes and probably involves the formation of stable preinitiation complexes at the gene promoter (45). Inactivation of the ribosomal genes at mitosis (7) and their developmental regulation (50-52) may represent related phenomena or indicate yet further levels of transcriptional control.
A. Growth-regulated Activation
1. THE POLYMERASE AND ASSOCIATED FACTORS
Experiments in two very different systems implicate the polymerase in the growth-associated regulation of ribosomal transcription. Some time ago, it was shown that two distinct forms of RPOI exist in Acanthamoeba (29, 30). One major, or sporulation, form was active in unspecific polymerase assays, but could not specifically recognize the rRNA promoter. A minor, or vegetative, form of polymerase was highly active in directing transcription from the promoter. This regulation was explained by a direct modification of the polymerase. Extracts from mouse cells treated in various ways to inhibit growth also showed similar changes in polymerase activity (31-33). In one case it was argued that this was probably due to direct polymerase modification and in the other to the association of the polymerase with a factor called TIF-IA, or factor C.
Glucocorticoid treatment of lymphoma cells arrests their growth and reduces ribosomal transcription by 95%. The difference between extracts from treated and untreated cells appears to lie in the activity of a polymerase-associated factor, TFIC, essential for promoter recognition (34-38). This factor and TIF-IA have now been isolated and are probably one and the same (36, 157). However, it is still unclear if the factor is missing from growth-arrested cells, or if it is inactivated.
Recently, phosphorylation was also implicated in the regulation of ribosomal transcription in Xenopus (287). Protein kinase inhibitors repressed in vitro transcription and phosphatase inhibitors stimulated transcription. Although it is not yet known which components of the transcription machinery are affected in this system, the observations suggest some resemblance to the cycle of C-terminal domain (CTD) phosphorylation/dephosphorylation that occurs during repeated transcription-initiation by RPOII (288, 289). However, RPOI contains no domain that resembles the RPOII CTD (213 and S. I. Dimitrov, W. Q. Xie, L. I. Rothblum and T. Moss, unpublished observations).
2. UBF
As discussed in Section IV, UBF has been shown to exist as two variant proteins, one active and the other inactive in transcription. Consistent with this, expression of the two UBF forms is regulated with cell growth rate and developmental stage (45, 204, 206). Because the UBF variants are not equivalent for gene activation, such regulation could provide a route by which to control the number of active ribosomal genes. In Section V, we gave a structural explanation of gene activation by UBF and suggested that the UBF variants may induce alternative nonfunctional DNA folding. However, it is also clear that posttranslational modification of UBF can modulate the activity of this factor. Phosphorylation of the C-terminus of UBF occurs in vivo and modulates its cellular location and in vitro activity (193, 194, 200). UBF derepresses the RPOI promoter in the artificial in vitro situation of repression with histone H1 (169). It also overcomes repression by the Ku antigen, a two-subunit DNA-activated kinase that usually binds DNA relatively nonspecifically, but apparently interacts specifically with the mouse RPOI promoter (290).
B. Growth-regulated Repressors
A possible means of regulating the ribosomal genes is through regulatable repressors. An activity capable of repressing the mouse promoter has been isolated from growth-arrested cells; this activity is absent from proliferating cells (291, 292). The repressor activity appears not to affect preinitiation complex formation, as would be predicted if it were a histone or some other nonspecific DNA-binding protein, and acts at the stage of polymerase recruitment.
Chromatin regulates gene activity (293), and often does so by limiting the access of transcription factors to their DNA-binding sites. Like most other genes, the ribosomal genes partially lose the so-called linker histone H1 on activation, at the same time maintaining interactions with the core histones (241, 242). Thus, the transition from an inactive to an active gene state involves chromatin modification. We have observed that, on inactive Xenopus rDNA, nucleosomes occupy not only the complete 40-S transcribed region, but also most of the IGS, including the repetitive enhancers (B. Leblanc, L. Karagyozov and T. Moss, unpublished data). After gene activation, these regions no longer show nucleosomal characteristics. However, polymerase chain reaction (PCR)-mediated footprinting showed that the gene and spacer promoters of most of both active and inactive genes are complexed with UBF in vivo (197 and B. Leblanc and T. Moss, unpublished data). UBF is also found associated with the nucleolar organizer of mitotic chromosomes (294, 295). Hence, repression by chromatin, if it occurs, apparently does not disrupt the activated promoter organization. As we suggested in Section VI, spacer promoter activity could play a role in opening up the enhancer chromatin to activating factors. If this is so, the spacer promoters may be involved in an early step in gene activation (see below).
C. The Role of the Repetitive Enhancers and Spacer Promoters
The promoter-proximal repetitive IGS is clearly implicated in transcription enhancement, but only two experiments have directly addressed whether it functions at the level of gene activation or to control the transcription-initiation rate on activated genes. In the first experiment (296a), the Xenopus repetitive enhancers were placed on one plasmid, the gene promoter on another, and the two plasmids enzymatically interlinked or concatenated in vitro. The concatenated plasmids were then microinjected into Xenopus oocytes and both transcription and decatenation were followed. Despite the fact that the plasmids were rapidly decatenated, no difference in transcription level between this situation and the wild-type in cis enhancer-promoter linkage was observed. It was therefore concluded that the enhancers functioned solely at an early stage of gene activation. Similar conclusions were drawn from an in vitro study in which the enhancers were shown to function at preinitiation complex formation and did not affect reinitiation (296b).
As discussed in Sections VI,A and VII,B, the spacer promoters in Xenopus function to augment overall transcription only when followed downstream by the repetitive enhancers. These promoters may then simply serve
to make the enhancers available to transcription factors. As such, they might be able to induce gene activation via the enhancers. Activation of the spacer promoter, rather than the gene promoter, may therefore be the first step in gene activation.
VIII. In Conclusion
To briefly summarize, the promotion of ribosomal transcription is generally directed by a 100- to 150-bp DNA sequence that lies across and predominantly upstream of the transcription-initiation site. Promotion requires an activated form of RPOI, the DNA-binding factor UBF, and the TBP_I-complex containing three TAF_Is. UBF binds to both an upstream and an initiation-site promoter element, probably coiling the promoter into a 180-bp loop. This allows interaction of the TBP_I-complex and subsequent polymerase recruitment. The activity of the promoter is enhanced by a promoter-proximal terminator, by enhancer repeats in the proximal IGS, and by the presence in the IGS of duplicate spacer promoters. Further upstream, other sequences may also modulate transcription. Ribosomal transcription is sensitive to both growth rate and cell differentiation and is regulated at two levels, the transcription-initiation rate per gene and the number of active genes per cell. Several factors are involved in various aspects of this regulation. In particular, RPOI activity is regulated with growth rate, as is at least one of the basal transcription factors, UBF.
Ribosomal transcription is clearly a fascinating and complex problem. It is also a key player in the regulation of cell proliferation. As a physics student, converted to molecular biology in the early 1970s, I assimilated the then-current view that ribosome research was pretty well played out. It took a convinced adherent and some not insignificant means of persuasion to convince me otherwise. In the years that I have since spent attempting to unravel various aspects of ribosomal transcription, I have become convinced that such studies are not just important but are essential. In this review, we have attempted to present a rational but also a personal view of ribosomal transcription. In so doing, we hope to stimulate interest and speculation on the molecular mechanisms of ribosomal transcription.
ACKNOWLEDGMENTS
We thank Dr. M. Boissinot for indispensable aid in modeling the enhancesome and Dr. L. I. Rothblum for critical reading of the manuscript. The work was supported by the Medical Research Council of Canada. T.M. is a Senior Researcher of the FRSQ and a member of the Centre de Recherche en Cancérologie de l'Université Laval, which is supported by the FCAR of Québec.
Abbreviations
core: core promoter element
ESI: electron spectroscopic imaging
ETS: external transcribed spacer
HMG: high-mobility-group protein
HMG-box: high-mobility-group protein homology domain
hUBF: human UBF
IGS: intergenic spacer
IPE: intrapromoter element
ITS: internal transcribed spacer
mUBF: mouse UBF
rDNA: ribosomal DNA
RPOI: RNA polymerase I(A)
RPOII: RNA polymerase II(B)
RPOIII: RNA polymerase III(C)
r-proteins: ribosomal proteins
rUBF: rat UBF
snRNP: small nuclear ribonucleoprotein
TAF: TATA-box binding protein associated protein
TAF_I: RNA-polymerase-I-specific TATA-box binding protein associated protein
TATA-box: dTATA promoter element
TBP: TATA-box binding protein
TBP_I complex: RNA-polymerase-I-specific TATA-box binding protein complex
UBF: upstream binding factor
UCE: upstream control element
UPE: upstream promoter element
xUBF: Xenopus UBF
REFERENCES
1. B. McClintock, Z. Zellforsch. Mikroanat. 21, 294 (1934).
2. M. L. Birnstiel, M. Chipchase and J. Speirs, This Series 11, 351 (1971).
3. M. Oakes, Y. Nogi, M. W. Clark and M. Nomura, MCBiol 13, 2441 (1993).
4. E. P. Rustchenko, T. M. Curran and F. Sherman, J. Bact. 175, 7189 (1993).
5. P. Pasero and M. Marilley, MGG 236, 448 (1993).
6. K. D. Tartof and R. S. Hawley, in “The Genome of Drosophila melanogaster” (D. L. Lindsley and G. G. Zimm, eds.), p. 68. Academic Press, London, 1992.
7. A. A. Hadjiolov, Cell Biol. (Monograph) 12, 1 (1985).
8. T. Moss, Nature 302, 223 (1983).
9. T. Moss, K. Mitchelson and R. F. J. De Winter, Oxf. Surv. Eukaryot. Genes 2, 207 (1985).
10. R. H. Reeder, Cell 38, 349 (1984).
11. A. E. Dahlberg, Cell 57, 525 (1989).
12. M. R. Paule, J. Protozool. 30, 211 (1983).
13. M. Derenzini, M. Thiry and G. Goessens, J. Histochem. Cytochem. 38, 1237 (1990).
14. J. R. Warner, Microbiol. Rev. 53, 256 (1989).
15. R. H. Reeder, Trends Genet. 6, 390 (1990).
16. B. Sollner-Webb and E. B. Mougey, TIBS 16, 58 (1991).
17. D. E. Larson, P. Zahradka and B. H. Sells, Biochem. Cell. Biol. 69, 5 (1994).
18. M. R. Paule, E. Bateman, L. Hoffman, C. Iida, M. Imboden, W. Kubaska, P. Kownin, H. Li, A. Lofquist, P. Risi, Q. Yang and M. Zwick, MCBchem 104, 119 (1991).
19. A. Schnapp, H. Rosenbauer and I. Grummt, MCBchem 104, 137 (1991).
20. I. Grummt, in “Nucleic Acids and Molecular Biology” (F. Eckstein and M. J. Lilley, eds.), p. 148. Springer-Verlag, Berlin, Heidelberg, and New York, 1989.
21. G. A. Dover, Genetics 122, 249 (1989).
22. A. K. Srivastava and D. Schlessinger, Biochimie 73, 631 (1991).
23. E. O. Long and I. B. Dawid, ARB 49, 727 (1980).
24. P. Zahradka, D. E. Larson and B. H. Sells, MCBchem 104, 189 (1991).
25. F. Amaldi, I. Bozzoni, E. Beccari and P. Pierandrei-Amaldi, TIBS 14, 175 (1989).
26. S. A. Liebhaber, S. Wolf and D. Schlessinger, Cell 13, 121 (1978).
27. W. H. Mager and R. J. Planta, MCBchem 104, 181 (1991).
28. C. Presutti, S. A. Ciafrè and I. Bozzoni, EMBO J. 10, 2215 (1991).
29. M. R. Paule, C. T. Iida, P. J. Perna, G. H. Harris, D. A. Knoll and J. M. D’Alessio, NARes 12, 8161 (1984).
30. E. Bateman and M. R. Paule, Cell 47, 445 (1986).
31. J. Tower and B. Sollner-Webb, Cell 50, 873 (1987).
32. D. Buttgereit, G. Pflugfelder and I. Grummt, NARes 13, 8165 (1985).
33. A. Schnapp, C. Pfleiderer, H. Rosenbauer and I. Grummt, EMBO J. 9, 2857 (1990).
34. A. H. Cavanaugh, P. K. Gokal, R. P. Lawther and E. A. Thompson, Jr., PNAS 81, 718 (1984).
35. A. H. Cavanaugh and E. A. Thompson, Jr., NARes 13, 3357 (1985).
36. P. B. Mahajan and E. A. Thompson, JBC 265, 16225 (1990).
37. P. K. Gokal, P. B. Mahajan and E. A. Thompson, JBC 265, 16234 (1990).
38. P. B. Mahajan, P. K. Gokal and E. A. Thompson, JBC 265, 16244 (1990).
39. K. K. Yamamoto and M. Pellegrini, Bchem 29, 11029 (1990).
40. T. Haneda and P. J. McDermott, MCBchem 104, 169 (1991).
41. H. W. Weber, S. Vallett, L. Neilson, M. Grotke, Y. Chao, M. Brudnak, A. S. Juan and M. Pellegrini, MCBchem 104, 201 (1991).
42. S. N. Allo, P. J. McDermott, L. L. Carl and H. E. Morgan, JBC 266, 22003 (1991).
43. S. M. Vallett, M. Brudnak, M. Pellegrini and H. W. Weber, MCBiol 13, 928 (1993).
44. Y. Chao and M. Pellegrini, MCBiol 13, 934 (1993).
45. D. E. Larson, W. Xie, M. Glibetic, D. O’Mahony, B. H. Sells and L. I. Rothblum, PNAS 90, 7933 (1993).
46. P. J. McDermott, L. I. Rothblum, S. D. Smith and H. E. Morgan, JBC 264, 18220 (1989).
47. S. M. Vallet, K. K. Yamamoto, H. W. Weber and M. Pellegrini, Insect Mol. Biol. (1994). In press.
48. D. M. Prescott, “Reproduction of Eukaryotic Cells.” Academic Press, New York, 1976.
49. P. W. Melera, in “Growth and Differentiation in Physarum polycephalum” (W. F. Dove and H. P. Rusch, eds.), p. 64. Princeton University Press, Princeton, New Jersey, 1980.
50. D. D. Brown and E. Littna, JMB 8, 669 (1964).
51. K. Shiokawa, Y. Misumi, Y. Yasuda, Y. Nishio, S. Kurata, M. Sameshima and K. Yamana, Dev. Biol. 68, 503 (1979).
52. J. Newport and M. Kirschner, Cell 30, 675 (1982).
53. M. Derenzini and D. Treré, Virchows Arch. [B] 61, 1 (1991).
54. A. Lazaris-Karatzas, K. S. Montine and N. Sonenberg, Nature 345, 544 (1990).
55. S. O. Rogers and A. J. Bendich, Plant Mol. Biol. 9, 509 (1987).
56. U. Scheer and H. Zentgraf, in “The Cell Nucleus” (H. Busch and L. I. Rothblum, eds.), p. 143. Academic Press, New York, 1982.
57. A. P. Bird, CSHSQB 42, 1179 (1978).
58. D. K. Butler and R. L. Metzenberg, Chromosoma 102, 519 (1993).
59. B. Lewin, “Gene Expression 2.” Wiley-Interscience, New York, 1980.
60. H. U. Goringer, K. A. Hijazi, E. J. Murgola and A. E. Dahlberg, PNAS 88, 6603 (1991).
61. T. Powers and H. F. Noller, EMBO J. 10, 2203 (1991).
62. C. G. Clark, B. W. Tague, V. C. Ware and S. A. Gerbi, NARes 12, 6197 (1984).
63. J. Tower, S. L. Henderson, K. M. Dougherty, P. J. Wejksnora and B. Sollner-Webb, MCBiol 9, 1513 (1989).
64. D. L. Mroczka, B. Cassidy, H. Busch and L. I. Rothblum, JMB 174, 141 (1984).
65. B. G. Cassidy, H. F. Yang-Yen and L. I. Rothblum, MCBiol 6, 2766 (1986).
66. V. M. Dumenco and P. J. Wejksnora, Gene 46, 227 (1986).
67. J. E. Sylvester, R. Petersen and R. D. Schmickel, Gene 84, 193 (1989).
68. A. Simeone, A. La Volpe and E. Boncinelli, NARes 13, 1089 (1985).
69. G. D. Baldridge and A. M. Fallon, DNA Cell Biol. 11, 51 (1992).
70. D. Tautz, C. Tautz, D. Webb and G. A. Dover, JMB 195, 525 (1987).
71. N. C. P. Cross and G. A. Dover, JMB 195, 63 (1987).
72. D. C. Hayward and D. M. Glover, Gene 77, 271 (1989).
73. V. V. Ashapkin, T. T. Antoniv and B. F. Vanyushin, Biochem. Mol. Biol. Int. 30, 755 (1993).
74. R. N. Beech and C. Strobeck, Plant Mol. Biol. 22, 887 (1993).
75. F. Grellet, D. Delcasso-Tremousaygue and M. Delseny, Plant Mol. Biol. 12, 695 (1989).
76. R. J. Kelly and A. Siegel, Gene 80, 239 (1989).
77. W. Schmidt-Puchta, I. Gunther and H. L. Sanger, Plant Mol. Biol. 13, 251 (1989).
78. J. Rathgeber and I. Capesius, NARes 18, 1288 (1990).
79. J. D. Procunier and K. J. Kasha, Plant Mol. Biol. 15, 661 (1990).
80. F. Takaiwa, S. Kikuchi and K. Oono, Plant Mol. Biol. 15, 933 (1990).
81. R. I. Bennett and A. G. Smith, Plant Mol. Biol. 16, 1095 (1991).
82. P. Gruendler, I. Unfried, K. Pascher and D. Schweizer, JMB 221, 1209 (1991).
83. P. Gruendler, I. Unfried, R. Pointner and D. Schweizer, NARes 17, 6395 (1989).
84. M. Ueki, E. Uchizawa and K. Yakura, Plant Mol. Biol. 18, 175 (1992).
85. D. Tremousaygue, M. Laudie, F. Grellet and M. Delseny, Plant Mol. Biol. 18, 1013 (1992).
86. I. Dornreiter, L. F. Erdile, I. U. Gilbert, D. Von Winkler, T. J. Kelly and E. Fanning, EMBO J. 11, 769 (1992).
87. N. Borisjuk and V. Hemleben, Plant Mol. Biol. 21, 381 (1993).
88. H. Vahidi and B. M. Honda, MGG 227, 334 (1991).
89. E. M. Novak, M. P. De Mello, H. B. M. Gomes, I. Galindo, P. Guevara, J. L. Ramirez and J. F. Da Silveira, Mol. Biochem. Parasitol. 60, 273 (1993).
90. P. Dietrich, M. B. Soares, M. H. T. Affonso and L. M. Floeter-Winter, Gene 125, 103 (1993).
91. G. R. Klassen and J. Buchko, Curr. Genet. 17, 125 (1990).
92. S. K. Dutta and M. Verma, BBRC 170, 187 (1990).
93. A. M. Weiner and H. S. Emery, in “The Cell Nucleus” (H. Busch and L. I. Rothblum, eds.), p. 127. Academic Press, New York, 1982.
94. R. A. Cole and K. L. Williams, Genetics 130, 757 (1992).
95. L. Pape, L. Chen, Z. Liu and Z. Zhoa, J. Cell Biochem. Suppl. 18C, 74 (1994).
96. L. J. Degennaro, F. Weinberg and W. J. Rutter, JBC 252, 8126 (1977).
97. K. G. Skryabin, M. A. Eldarov, V. L. Larionov, A. A. Bayev, J. Klootwijk, V. C. H. C. deRegt, G. M. Veldman, R. J. Planta, O. I. Georgiev and A. A. Hadjiolov, NARes 12, 2955 (1984).
98. T. Moss and M. L. Birnstiel, NARes 6, 3733 (1979).
99. P. Boseley, T. Moss, M. Machler, R. Portmann and M. Birnstiel, Cell 17, 19 (1979).
100. V. L. Murtif and P. M. M. Rae, NARes 13, 3221 (1985).
102. J. R. Miller, D. C. Hayward and D. M. Glover, NARes 11, 11 (1983).
103. G. P. Smith, CSHSQB 38, 507 (1973).
104. G. P. Smith, Science 191, 528 (1976).
105. E. S. Coen and G. A. Dover, Cell 33, 849 (1983).
106. E. S. Coen, J. M. Thoday and G. A. Dover, Nature 295, 564 (1982).
107. T. D. Petes, Cell 19, 765 (1980).
108a. G. A. Dover, Bioessays 14, 281 (1992).
108b. R. A. Kim and J. C. Wang, Cell 57, 975 (1989).
108c. M. F. Christman, F. S. Dietrich, N. A. Levin, B. U. Sadoff and G. R. Fink, PNAS 90, 7637 (1993).
109. M. Buongiorno-Nardelli, F. Amaldi and P. Lava-Sanchez, Nature NB 238, 134 (1972).
110. H. G. Callan, J. Cell Sci. 2, 1 (1967).
111. H. L. K. Whitehouse, J. Cell Sci. 2, 9 (1967).
112. R. Miesfeld and N. Arnheim, MCBiol 4, 221 (1984).
113. Y. Mishima, I. Financsek, R. Kominami and M. Muramatsu, NARes 10, 6659 (1982).
114. I. Grummt, E. Roth and M. R. Paule, Nature 296, 173 (1983).
115. C. S. Pikaard, L. K. Pape, S. L. Henderson, K. Ryan, M. H. Paalman, M. A. Lopata, R. H. Reeder and B. Sollner-Webb, MCBiol 10, 4816 (1990).
116. J. H. Doelling, R. J. Gaudino and C. S. Pikaard, PNAS 90, 7528 (1993).
117. V. C. Culotta, J. K. Wilkinson and B. Sollner-Webb, PNAS 84, 7498 (1987).
118. T. Moss, Cell 30, 835 (1982).
119. O. Yamamoto, N. Takakusa, Y. Mishima, R. Kominami and M. Muramatsu, PNAS 81, 299 (1984).
120. R. M. Learned, S. T. Smale, M. M. Haltiner and R. Tjian, PNAS 80, 3558 (1983).
121. I. Grummt, PNAS 79, 6908 (1982).
122. P. Kownin, C. T. Iida, S. Brown-Shimer and M. R. Paule, NARes 13, 6237 (1985).
123. B. D. Kohorn and P. M. M. Rae, PNAS 79, 1501 (1982).
124. B. D. Kohorn and P. M. Rae, Nature 304, 179 (1983).
125. P. Kownin, E. Bateman and M. R. Paule, Cell 50, 693 (1987).
126. D. C. Hayward and D. M. Glover, NARes 16, 4253 (1988).
127. W. Musters, J. Knol, P. Maas, A. F. Dekker, H. Van Heerikhuizen and R. J. Planta, NARes 17, 9661 (1989).
128. S. Y. Choe, M. C. Schultz and R. H. Reeder, NARes 20, 279 (1992).
129. R. H. Reeder, D. Pennock, B. McStay, J. Roan, E. Tolentino and P. Walker, NARes 15, 7429 (1987).
130. W. Q. Xie and L. I. Rothblum, MCBiol 12, 1266 (1992).
131. J. J. Windle and B. Sollner-Webb, MCBiol 6, 4585 (1986).
132. K. G. Miller, J. Tower and B. Sollner-Webb, MCBiol 5, 554 (1985).
133. M. M. Haltiner, S. T. Smale and R. Tjian, MCBiol 6, 227 (1986).
134. T. Kishimoto, M. Nagamine, T. Sasaki, N. Takakusa, T. Miwa, R. Kominami and M. Muramatsu, NARes 13, 3515 (1985).
135. M. H. Jones, R. M. Learned and R. Tjian, PNAS 85, 669 (1988).
136. S. Firek, C. Read, D. R. Smith and T. Moss, NARes 18, 105 (1990).
137. C. Read, A. M. Larose, B. Leblanc, A. J. Bannister, S. Firek, D. R. Smith and T. Moss, JBC 267, 10961 (1992).
138. J. A. Skinner, A. Ohrlein and I. Grummt, PNAS 81, 2137 (1984).
139. P. Kownin, E. Bateman and M. R. Paule, MCBiol 8, 747 (1988).
140. B. M. Tyler and N. H. Giles, NARes 13, 4311 (1985).
141. B. Sollner-Webb, J. A. Wilkinson, J. Roan and R. H. Reeder, Cell 35, 199 (1983).
142. S. L. Henderson and B. Sollner-Webb, MCBiol 10, 4970 (1990).
143. W. Xie, D. J. O’Mahony, S. D. Smith, D. Lowe and L. I. Rothblum, NARes 20, 1587 (1992).
144. R. M. Learned, T. K. Learned, M. M. Haltiner and R. T. Tjian, Cell 45, 847 (1986).
145. G. Safrany, N. Tanaka, T. Kishimoto, Y. Ishikawa, H. Kato, R. Kominami and M. Muramatsu, MCBiol 9, 349 (1989). (Abstract)
146. N. Tanaka, H. Kato, Y. Ishikawa, K. Hisatake, K. Tashiro, R. Kominami and M. Muramatsu, JBC 265, 13836 (1990).
147. S. P. Bell, H.-M. Jantzen and R. Tjian, Genes Dev. 4, 943 (1990).
148. S. D. Smith, E. Oriahi, D. Lowe, H.-F. Yang-Yen, D. O’Mahony, K. Rose, K. Chen and L. I. Rothblum, MCBiol 10, 3105 (1990).
149. S. D. Smith, E. Oriahi, H.-F. Yang-Yen, W. Xie, C. Chen and L. I. Rothblum, NARes 18, 1677 (1990).
150. A. Schnapp, J. Clos, W. Hadelt, R. Schreck, A. Cvekl and I. Grummt, NARes 18, 1385 (1990).
151. S. P. Bell, C. S. Pikaard, R. H. Reeder and R. Tjian, Cell 59, 489 (1989).
152. S. P. Bell, R. M. Learned, H. M. Jantzen and R. Tjian, Science 241, 1192 (1988).
153. J. Tower, V. C. Culotta and B. Sollner-Webb, MCBiol 6, 3451 (1986).
154. R. M. Learned, S. Cordes and R. Tjian, MCBiol 5, 1358 (1985).
155. L. Comai, N. Tanese and R. Tjian, Cell 68, 965 (1992).
156. D. Eberhard, L. Tora, J. M. Egly and I. Grummt, NARes 21, 4180 (1993).
157. A. Schnapp, G. Schnapp, B. Erny and I. Grummt, MCBiol 13, 6723 (1993).
158. C. T. Iida and M. R. Paule, NARes 20, 3211 (1992).
159. C. A. Radebaugh, J. L. Matthews, G. K. Geiss, F. Liu, J. M. Wong, E. Bateman, S. Camier, A. Sentenac and M. R. Paule, MCBiol 14, 597 (1994).
160. E. Bateman, C. T. Iida, P. Kownin and M. R. Paule, PNAS 82, 8004 (1985).
161. E. Bateman and M. R. Paule, MCBiol 8, 1940 (1988).
162. D. Bachvarov and T. Moss, NARes 19, 2331 (1991).
163. B. McStay, C. H. Hu, C. S. Pikaard and R. H. Reeder, EMBO J. 10, 2297 (1991).
164. D. Bachvarov, M. Normandeau and T. Moss, FEBS Lett. 288, 55 (1991).
165. M. C. Schultz, S. Y. Choe and R. H. Reeder, PNAS 88, 1004 (1991).
166. D. L. Riggs and M. Nomura, JBC 265, 7596 (1990).
167. N. F. Lue and R. D. Kornberg, JBC 265, 18091 (1990).
168. A. Schnapp and I. Grummt, JBC 266, 24588 (1991).
169. A. Kuhn and I. Grummt, PNAS 89, 7340 (1992).
170. S. D. Smith, D. J. O’Mahony, B. J. Kinsella and L. I. Rothblum, Gene Expr. 3, 229 (1993).
171. M. C. Schultz, R. H. Reeder and S. Hahn, Cell 69, 697 (1992).
172. B. P. Cormack and K. Struhl, Cell 69, 685 (1992).
173. P. W. J. Rigby, Cell 72, 7 (1993).
174. N. Hernandez, Genes Dev. 7, 1291 (1993).
175. P. A. Sharp, Cell 68, 819 (1992).
176. K. Struhl, Science 263, 1103 (1994).
177. B. R. Braun, B. Bartholomew, G. A. Kassavetis and E. P. Geiduschek, JMB 228, 1063 (1992).
178. D. Eberhard, U. Rudloff and I. Grummt, J. Cell Biochem. Suppl. 18C, L501 (1994).
179. D. B. Nikolov, S. H. Hu, J. Lin, A. Gasch, A. Hoffmann, M. Horikoshi, N. H. Chua, R. G. Roeder and S. K. Burley, Nature 360, 40 (1992).
180. J. L. Kim, D. B. Nikolov and S. K. Burley, Nature 365, 520 (1993).
181. R. Coleman, T. Fisher, A. Jackson, J. Chicca, A. Taggart, R. Carter and B. F. Pugh, J. Cell Biochem. Suppl. 18C, L014 (1994).
182. B. F. Pugh and R. Tjian, Genes Dev. 5, 1935 (1991).
183. G. Wistow, Nature 364, 107 (1993).
184. M. R. Paule, C. A. Radebaugh, H. Li, J. L. Matthews, G. K. Geiss, F. Liu, J.-M. Wong and E. Bateman, J. Cell Biochem. Suppl. 18C, L018 (1994).
185. H.-M. Jantzen, A. Admon, S. P. Bell and R. Tjian, Nature 344, 830 (1990).
186. R. Grosschedl, K. Giese and J. Pagel, Trends Genet. 10, 94 (1994).
187. C. S. Pikaard, B. McStay, M. C. Schultz, S. P. Bell and R. H. Reeder, Genes Dev. 3, 1779 (1989).
188. D. J. O’Mahony and L. I. Rothblum, PNAS 88, 3180 (1991).
189. M. Dunaway, Genes Dev. 3, 1768 (1989).
190. B. McStay, M. W. Frazier and R. H. Reeder, Genes Dev. 5, 1957 (1991).
191. R. M. Rodrigo, M. C. Rendón, J. Torreblanca, G. Garcia-Herdugo and F. J. Moreno, J. Cell Sci. 103, 1053 (1992).
192. G. Schnapp, F. Santori, C. Carles, M. Riva and I. Grummt, EMBO J. 13, 190 (1994).
193. D. J. O’Mahony, S. D. Smith, W. Xie and L. I. Rothblum, NARes 20, 1301 (1992).
194. R. Voit, A. Schnapp, A. Kuhn, H. Rosenbauer, P. Hirschmann, H. G. Stunnenberg and I. Grummt, EMBO J. 11, 2211 (1992).
195. H. M. Jantzen, A. M. Chow, D. S. King and R. Tjian, Genes Dev. 6, 1950 (1992).
196. C. S. Pikaard, S. D. Smith, R. H. Reeder and L. Rothblum, MCBiol 10, 3810 (1990).
197. B. Leblanc, C. Read and T. Moss, EMBO J. 12, 513 (1993).
198. G. P. Copenhaver, C. D. Putnam, M. L. Denton and C. S. Pikaard, NARes 22, 2651 (1994).
199. E. Li, T. H. Bestor and R. Jaenisch, Cell 69, 915 (1992).
200. D. J. O’Mahony, W. Xie, S. D. Smith, H. A. Singer and L. I. Rothblum, JBC 267, 35 (1992).
201. S. I. Dimitrov, D. Bachvarov and T. Moss, DNA Cell. Biol. 12, 275 (1993).
202. Y. Maeda, K. Hisatake, T. Kondo, K. Hanada, C. Z. Song, T. Nishimura and M. Muramatsu, EMBO J. 11, 3695 (1992).
203. D. P. Bazett-Jones, B. Leblanc, M. Herfort and T. Moss, Science 264, 1134 (1994).
204. K. Hisatake, T. Nishimura, Y. Maeda, K. Hanada, C. Z. Song and M. Muramatsu, NARes 19, 4631 (1991).
205. E. K. L. Chan, H. Imai, J. C. Hamel and E. M. Tan, J. Exp. Med. 174, 1239 (1991).
206. A. Guimond and T. Moss, NARes 20, 3361 (1992).
64
TOM MOSS AND VICTOR Y. STEFANOVSKY
207. A. Kuhn, R. Voit, V. Stefanovsky, R. Evers, M. Bianchi and I. Grummt, EMBOJ. 13,416 (1994). 208. R. G. Roeder, in “RNA Polymerases” (R. Losick and M. Chamberlin, eds.), p. 285. CSHLab, Cold Spring Harbor, New York, 1976. 209. A. Sentenac, CRC Crit. Reu. Biochem. 18, 31 (1985). 210. D. Lalo, C. Carles, A. Sentenac and P. Thuriaux, PNAS 90, 5524 (1993). 211. N . A. Woychik, S.-M. Liao, P. A. Kolodziej and R. A. Young, Genes Deu. 4, 313 (1990). 212. M. Nomura, Y. Nogi, R. Yano, M. Oakes, D. A. Keys, L. Vu and J. A. Dodd, in “The Translation Apparatus” (K. H. Neirhaus, ed.). Plenum Press, New York, 1995. In press. 213. S. Memet, W. Saurin and A. Sentenac, IBC 263, 10048 (1988). 214. A. K. Lofquist, H. Li, M. A. Imboden and M. R. Paule, NARCS21, 3233 (1993). 215. M. A. Parisi and D. A. Clayton, Science 252, 965 (1991). 216. J. M a and M. Ptashne, Cell 48, 847 (1987). 217. M. Ptashne and A. AF. Gann, Nature 346, 329 (1990). 218. 11. Landsman and M. Bustin, MCBiol 11, 4483 (1991). 219. 1). M. J. Lilley, Nature 357, 282 (1992). 220. T. Moss, D. P. Bazett-Jones and B. Leblanc, J. Cell Biochem. Suppl. 18C, L505 (1994). 221. C. H . Hu, B. McStay and R. H. Reeder, MCBioZ 14, 2871 (1994). 222. C. D. Putnam, G. P. Copenhaver, M. L. Denton, and C. S. Pikaard, MCBiol. 14, 6476 (1994). 223. H. M. Weir, P. J. Kraulis, C. S. Hill, A. R. C. Raine, E. D. Laue and J. 0. Thomas, E M B O J. 12, 1311 (1993). 224. C. M . Read, P. D. Cary, C. Crane-Robinson, P. C. Driscoll and D. G. Norman, NARCS21, 3427 (1993). 225. A. Bhattacharyya, A. I. H. Murchie, E. von Kitzing, S. Diekmann, B. Kemper and 1). M. J. Lilley, J M B 221, 1191 (1991). 226. T. T. P a d , M. J. Haykinson and R. C. Johnson, Genes Deu. 7, 1521 (1993). 227. P. M. Pil, C. S. Chow and S. J. Lippard, PNAS 90, 9465 (1993). 228. D. P. Bazett-Jones, Microbeam Anal. 2, 69 (1993). 229. D. P. Bazett-Jones and M. L. Brown, MCBiol 9, 336 (1988). 230. K. Giese, A. Amsterdam and R. Grosschedl, Genes Dew. 5, 2567 (1991). 231. C. 
Icard-Liepkalns, BBRC 193, 453 (1993). 232. T.Moss, Nature 304, 562 (1983). 233. A. Kuhn, U. Deppert and I. Grummt, PNAS 87, 7527 (1990). 234. P. Lahhart and R. H. Reeder, Cell 37, 285 (1984). 235. R. F. J. De Winter and T. Moss, J M B 196, 813 (1987). 236. G . Grimaldi and P. P. Di Nocera, PNAS 85, 5502 (1988). 237. G . Grimaldi, P. Fiorentini and P. P. Di Nocera, MCBiol 10, 4667 (1990). 238. S. T. Jacob, J. Zhang, L. C. Garg and C. B. Book, MCBchem 104, 155 (1991). 239. A. K. Gosh, C. M. Hoff and S. T. Jacob, Gene 125, 217 (1993). 240. R. F. J. De Winter and T. Moss, Cell 44, 313 (1986). 241. S. I. Dimitrov, V. Y. Stefanovsky, L. Karagyozov, D. Angelov and I. 6. Pashev, NARCS18, 6393 (1990). 242. S. I. Dimitrov, H. N. Tateossyan, V. Y. Stefanovsky, V. R. Russanova, L. Karagyozov and 1. G. Pashev, EJB 204, 977 (1992). 243. C. Spadafora and M. Crippa, NARCS12, 2691 (1984). 244. S. J. Busby and R. H. Reeder, Cell 3, 989 (1983). 245. C. D. Putnam and C. S. Pikaard, MCBiol 12, 4970 (1992). 246. E. A. Elion and J. R. Warner, Cell 39, 663 (1984). 247. E. A. Elion and J. R. Warner, MCBiol6, 2089 (1986).
RIBOSOMAL TRANSCRIPTION
248. 249. 250. 251. 252. 253.
65
S. P. Johnson and J. R. Warner, MCBiol 9, 4986 (1989). M. C. Schultz, S. Young Choe and R. H. Reeder, MCBiot 13, 2644 (1993). B. E. Morrow, S. P. Johnson and J. R. Warner, MCBiol 13, 1283 (1993). M. E. Swanson and M. J. Holland, JBC 258, 3242 (1983). M. E. Swanson, M. Yip and M. J. Holland, JBC 260, 9905 (1985). R. Voets, A. Lagrou, H. Hilderson, G. Van Dessel and W. Dierick, Znt. J . Biochem. 15,87
(1983). 254. S. P. Johnson and J. R. Warner, MCBchem 104, 163 (1991). 255. C. A. F. M. Van der Sande, T. Kulkens, A. B. Kramer, I. J. De Wijs, H. Van Heerikhuizen, J. Klootwijk and R. J. Planta, NARes 17, 9127 (1989). 256. W. H . Lang and R. H. Reeder, MCBiol 13, 649 (1993). 257. B. J. Brewer, D. Lockshon and W. L. Fangman, Cell 71, 267 (1992). 258. T. Kobayashi, M. Hidaka, M. Nishizawa and T. Horiuchi, MGG 233, 355 (1992). 259. K. Voelkel-Meiman, R. L. Keil and G. S. Roeder, Cell 48, 1071 (1987). 260. T. Kulkens, C. A. F. M. Van der Sande, A. F. Dekker, H. Van Heerikhuizen and R. J. Planta, EMBO J. 11, 4665 (1992). 261. K. Mitchelson and T. Moss, NARes 15, 9577 (1987). 262. T. Moss, A.-M. Larose, K. Mitchelson and B. Leblanc, Biochem. Cell Biol. 70,324 (1992). 263. J. Zhang, H. Niu and S. T. Jacob, PNAS 88, 8293 (1991). 264. C. M. Hoff and S. T. Jacob, BBRC 190, 747 (1993). 265. C. M. Hoff, A. K. Ghosh, B. S. Prabhakar and S. T. Jacob, €“AS 91, 762 (1994). 266. A. Kuhn, A. Normann, I. Bartsch and I. Grummt, EMBOJ. 7, 1497 (1988). 267. P. Labhart and R. H. Reeder, MCBiol 7, 1900 (1987). 268. S. Firek, C. Read, D. R. Smith and T.Moss, MCBiol9, 3777 (1989). 269. I. Bartsch, C. Schoneberg and I. Grummt, MCBiol8, 3891 (1988). 270. B. McStay and R. H. Reeder, MCBiol 10, 2793 (1990). 271. A. Kuhn, I. Bartsch and I. Grummt, Nature 344, 559 (1990). 272. A. Kuhn and I. Grummt, Genes Dew. 3, 224 (1989). 273. P. Labhart and R. H. Reeder, Genes Dew. 4, 269 (1990). 274. P. Labhart and R. H. Reeder, Cell 45, 431 (1986). 275. R. F. J. D e Winter and T. Moss, NARes 14, 6041 (1986). 276. D. Tautz and G. A. Dover, EMBO J. 5, 1267 (1986). 277. B. E. Morrow, Q. Ju and J. R. Warner, JBC 265, 20778 (1990). 278. Q. Ju, B. E. Morrow and J. R. Warner, MCBiol 10, 5226 (1990). 279. B. McStay and R. H. Reeder, Cell 47, 913 (1986). 280. I. Grummt, A. Kuhn, I. Bartsch and H. Rosenhauer, Cell 47, 901 (1986). 281. S. Henderson and 3 . Sollner-Webb, Celt 47, 891 (1986). 282. S. L. Henderson, K. 
Ryan and B. Sollner-Webb, Genes Dew. 3, 212 (1989). 283. E. Bateman and M. R. Paule, Cell 54, 985 (1988). 284. B. McStay and R. H . Reeder, Genes Dev. 4, 1240 (1990). 285. D. E. Muscarella, V. M. Vogt and S. E. Bloom, J. Cell Biol. 105, 1501 (1987). 286a. A. Conconi, R. M. Widmer, T. Koller and J. M. Sogo, Cell 57, 753 (1989). 286b. V. E. Foe, C S H S Q B 42(2) 723 (1978). 287. P. Labhart, MCBiol 14, 2011 (1994). 288. H . Ln, L. Zawel, L. Fisher, J. M. Egly and I>. Reinberg, Nature 358, 641 (1992). 289. R. C. Conaway and J. W. Conaway, ARB 62, 161 (1993). 290. A. Kuhn, V. Stefanovsky and I. Grummt, NARes 21, 2057 (1993). 291. M. Kermekchiev and M. Muramatsu, NARes 21, 447 (1993). 292. Y. Mishima, T. Nishimura, M. Muramatsu and R. Kominami, J. Biochem. (Tokyo)113,36 (1993).
66
TOM MOSS AND VICTOR Y. STEFANOVSKY
293. G . Felsenfeld, Nature 355, 219 (1992). 294. 0. V. Zatsepina, R. Voit, I. Grummt, H. Spring, M. V. Semenov and M. F. Trendelenburg, Chromosoma 102, 599 (1993). 295. P. Roussel, C. AndrB, C . Masson, G. GBraud and D. Hernandez-Verdun, J. Cell Sci. 104, 327 (1993). 296a. M. Dunaway and P. Droge, Nature 341, 657 (1989). 296b. L. K. P a p , J. J. Windle, E. B. Mougey and B. Sollner-Webb, MCBiol9, 5093 (1989). 297. 0. L. Miller and 8 . R. Beatty, Genetics 61, 133 (1969). 298. 0. L. Miller and A. H . Bakken, Acta Endocrinol. (Copenhogen) 168, 155 (1972). 299. M. P. Verbeet, J. Klootwijk, H. Van Heerikhuizen, R. D. Fontijn, E. Vreugdenhil and R. Planta, NARes 12, 1137 (1984). 300. V. L. Murtif and P. M. M. Rae, J. Cell B i d . 95, 471A (1982). 301. E. S. Coen and G. A. Dover, NARes 10, 7017 (1983). 302. A. Simeone, A. DeFalco, G. Macino and E. Bonicinelli, NARes 10, 8263 (1982). 303. R. Miesfeld and N. Arnheim, NARes 10, 3933 (1982). 304. G. N . Wilson, L. L. Szura, C. Rushford, D. Jackson and J. Erickson, Am. J. Human Genet. 34, 32 (1982). 305. I. Financsek, K. Mizumoto and Y. M . Muramatsu, PNAS 79, 3092 (1982). 306. R. Bach, I. Grummt and B. Allet, NARes 9, 1559 (1981). 307. Y. Urano, R. Kominami, Y. Mishima and M. Muramatsn, NARes 8, 6043 (1980). 308. T. Moss, P. G. Boseley and M. L. Birnstiel, NARes 8, 467 (1980). 309. €3. Sollner-Webb and R. H . Reeder, Cell 18, 485 (1979). 310. J. Devereux, P. Haeberli and 0. Smithies, NARes 12, 387 (1984).
Targeting and Regulation of Immunoglobulin Gene Somatic Hypermutation and Isotype Switch Recombination¹

MARKUS HENGSTSCHLÄGER AND NANCY MAIZELS²
Department of Molecular Biophysics and Biochemistry, Yale University School of Medicine, New Haven, Connecticut 06510

AND

HELIOS LEUNG
Bristol-Myers Squibb Pharmaceutical Research Institute, Seattle, Washington 98121
I. Somatic Hypermutation
   A. Somatic Hypermutation Is Targeted to Rearranged Variable Regions
   B. Somatic Hypermutation and Affinity Selection Occur in the Specialized Microenvironment of the Germinal Centers
   C. What Is the Mechanism of Somatic Hypermutation?
   D. cis-Acting Elements in the κ Locus That Regulate Hypermutation
   E. Transcription Is Not Sufficient to Activate Somatic Hypermutation
   F. Targeting of Hypermutation by Heavy-chain Regulatory Elements
   G. Targeting of Hypermutation to Reporter Genes
   H. Future Directions
II. Isotype Switch Recombination
   A. Guanine-rich Sequences Are Involved in Switch Recombination
   B. Isotype Switch Recombination Is Region-specific but Not Sequence-specific
   C. Regulation and Targeting of Switch Recombination
   D. Proteins Implicated in Switch Recombination
   E. Extrachromosomal Switch Substrates to Analyze Elements Critical to Switch Recombination
   F. Transcriptional Regulatory Elements, but Not Transcription, Stimulate Switch Substrate Recombination
   G. Activation of Recombination by Transcription Factors
   H. Targeting of Recombination by cis-acting Elements
   I. Are the Switch Recombination Enzymes Cell-type Specific?
¹ A list of abbreviations appears on page 95.
² To whom correspondence may be addressed.
Progress in Nucleic Acid Research and Molecular Biology, Vol. 50
Copyright © 1995 by Academic Press, Inc. All rights of reproduction in any form reserved.
III. Conclusions and Future Directions
References
Regulated changes in genomic structure occur both early and late in B-cell development. Early, in pre-B cells, rearrangement of the V(D)J segments produces the variable regions of an immunoglobulin molecule, which encode the portion of the polypeptide responsible for antigen recognition (see Fig. 1). Later, following activation by antigen, somatic hypermutation and isotype switch recombination collaborate to fine-tune the immune response. Somatic hypermutation introduces single-base changes in the rearranged variable regions, resulting in the production of immunoglobulin molecules with dramatically increased affinity for antigen. Isotype switch recombination joins to the expressed variable region a constant region of a different class or isotype, which increases the efficiency with which antigen is destroyed or cleared from the system after being bound by antibody.

Somatic hypermutation and switch recombination occur during a restricted stage of B-cell development. Although these two processes are not necessarily coupled, most B cells that have undergone somatic hypermutation have also carried out switch recombination, and vice versa. Extracellular signals activate both processes, and the definition of these signals may eventually lead to the development of cell lines in which somatic hypermutation and switch recombination are ongoing or efficiently inducible. At present, however, genetic analysis of elements that regulate somatic hypermutation and switch recombination must rely largely on experiments carried out in vivo or in primary cells. This review summarizes some of the recent work that has addressed the mechanisms of these two processes, with particular emphasis on the use of engineered constructs to identify genetic elements critical to activation or targeting.

FIG. 1. The murine immunoglobulin heavy chain and κ light-chain loci before and after rearrangement. The figure shows the V segments, D and J regions that undergo rearrangement in pre-B cells, constant regions, and intron and 3' enhancers (E). A more complete diagram of the heavy-chain locus is shown in Fig. 6.
I. Somatic Hypermutation
A. Somatic Hypermutation Is Targeted to Rearranged Variable Regions
Somatic hypermutation is a targeted process of sequence diversification that introduces single-base changes into rearranged immunoglobulin heavy- and light-chain variable (V) regions at a rate that approaches one mutation per thousand bases per generation. Somatic hypermutation is aptly named, because this rate is 10^5- to 10^6-fold the typical rate of mutation in mammalian somatic cells. Somatic hypermutation gives rise to clones that produce immunoglobulin molecules with 10- to 50-fold increased affinity for antigen, and thereby enhances both the efficiency and the specificity of the mammalian immune response.

Most of our current understanding of somatic hypermutation depends on extensive sequence analysis that has been carried out on hypermutated V regions. In certain antihapten responses in inbred mice, a limited number of germ-line genes give rise to most of the antibodies, and this makes it possible to study somatic hypermutation independent of V-region repertoire use (see 1-10 for references and reviews). Kinetic analyses show that hypermutation starts very soon after challenge with antigen and accumulates as the response progresses. Mutation of both the heavy- and light-chain loci is localized to the rearranged V regions, occurs infrequently in J regions and the J-C introns, and does not alter the promoter, leader, and constant (C) region, or unrearranged germ-line V segments (11-19).

Mutation exhibits a clear structural pattern within the V regions, where mutated bases appear in clusters within the complementarity-determining regions (CDRs), the regions of the polypeptide that participate directly in antigen binding; mutations are much more sparsely scattered in framework (FR) regions of the antibody molecules (see Fig. 2). This reflects selection for mutations in the CDRs, which increase affinity for antigen, as well as selection against mutations in the framework regions, which disrupt the structural integrity of the immunoglobulin molecule.

FIG. 2. Rearranged heavy-chain and κ light-chain variable regions. Variable regions are composed of complementarity-determining regions (CDRs), where most residues that contact antigen are located; and framework regions, which determine the overall structure of the variable region.
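The rates quoted above can be checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: it derives the background somatic mutation rate implied by the stated fold difference and the expected number of mutations accumulated in a V region. The V-region length of 350 nucleotides and the function name `expected_mutations` are assumptions introduced here for illustration only.

```python
# Back-of-the-envelope arithmetic for the rates quoted in the text.
HYPERMUTATION_RATE = 1e-3            # mutations per base per generation (from the text)
FOLD_OVER_BACKGROUND = (1e5, 1e6)    # fold increase over the typical somatic rate (from the text)

# Background somatic mutation rate implied by the two figures above.
background_rates = [HYPERMUTATION_RATE / fold for fold in FOLD_OVER_BACKGROUND]


def expected_mutations(region_length_nt, generations, rate=HYPERMUTATION_RATE):
    """Expected number of mutations, assuming independent events at a uniform rate."""
    return region_length_nt * generations * rate


V_REGION_LENGTH = 350  # ASSUMED length in nucleotides; illustrative, not from the text

if __name__ == "__main__":
    print("implied background rates:", background_rates)
    for generations in (1, 5, 10):
        print(generations, "generations:",
              expected_mutations(V_REGION_LENGTH, generations), "expected mutations")
```

Under these assumptions a V region would be expected to acquire a few mutations within ten generations, whereas the implied background rate would leave the same region essentially unchanged.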
B. Somatic Hypermutation and Affinity Selection Occur in the Specialized Microenvironment of the Germinal Centers
Germinal centers are specialized microenvironments that form in follicles of the lymph nodes and other secondary lymphoid tissues following challenge with T-cell-dependent antigens. Germinal centers were first described over 100 years ago, and although evidence accumulated suggesting that they were the site of somatic hypermutation and affinity selection, it is only in the past few years that this has been unambiguously demonstrated (20-26). A germinal center consists of three histologically distinct regions: a dark zone, a light zone, and a mantle, organized around a network of follicular dendritic cells (reviewed in 4, 27-30). During the first days of an immune response, activated B cells congregate in the follicle and proliferate. The dark zone of a germinal center fills with descendants of these proliferating cells, which do not express cell-surface immunoglobulin. Cells within the dark zone constantly divide, yet their numbers do not increase. Instead, their progeny migrate to the light zone, where they display cell-surface immunoglobulin and interact with antigen displayed on the long hairlike processes of the follicular dendritic cells; this is where cells undergo selection for antigen binding. Germinal-center B cells bind avidly to the lectin peanut agglutinin (PNA), and this property can be exploited to isolate hypermutating B cells from lymphoid organs (9, 20, 31, 32).

T cells are essential for germinal-center development, and germinal centers do not appear following immunization of athymic (nude) mice. Nonetheless, only a small fraction (5-10%) of cells within the germinal center are T cells. These are concentrated in the light zone, but their function there is not yet understood. T-cell-independent antigens do not stimulate formation of classical germinal centers (33) nor do they induce somatic hypermutation (34, 35).
C. What Is the Mechanism of Somatic Hypermutation?

The molecular mechanism of somatic hypermutation has not been defined. It is anticipated that an understanding of the mechanism will ultimately explain some of the most striking features of hypermutated V regions:
1. Somatic hypermutation is targeted exclusively to rearranged immunoglobulin V regions, and does not affect unrearranged V segments or other genes in the activated B cell. Both productively and nonproductively rearranged V regions hypermutate at comparable frequency (11, 13). Rearrangement of V regions therefore seems to be important for activating or targeting the mutator machinery. In addition, heavy chains that have completed only D-J joining show levels of mutation much lower than found in comparable V-D-J-joined alleles (36, 37). This suggests that elements near the V region might play a role in targeting somatic hypermutation.

2. Somatic hypermutation is primarily restricted and/or targeted to sequences downstream of the promoter (12, 18, 19). Most mutations are found in variable regions, where they are concentrated in the CDRs, but some mutations also appear in the J-C intron. The 3' boundary of somatic hypermutation is not well-defined, but hypermutation does not extend into the C regions (38).

3. It has often been observed that identical silent mutations are found in independently isolated V regions (reviewed in 3), and recurrent mutations continue to be noted (e.g., 39, 40). Although recurrent replacement mutations may reflect antigen selection, recurrent silent mutations can best be explained by a mutational mechanism that is templated or characterized by unusually active mutational hotspots. Experiments that analyze hypermutation of passenger transgenes that are not contributing to affinity selection suggest that there are active hotspots for hypermutation (9, 41).

4. Somatic hypermutation does not display the 5- to 10-fold transition:transversion bias typical of meiotic mutations; furthermore, unselected mutations do not occur randomly, but seem to occur at a higher rate in A·T pairs than in G·C pairs (reviewed in 42).
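The distinction drawn in point 4 can be made concrete with a short tally. The sketch below is illustrative only: it classifies single-base substitutions as transitions (purine to purine or pyrimidine to pyrimidine) or transversions, and counts how many fall at A·T versus G·C pairs. The function names and the example list of substitutions are hypothetical and are not data from the studies cited.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change ref -> alt."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    # A transition keeps the base within its chemical class.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def summarize_substitutions(substitutions):
    """Count transitions, transversions, and changes at A/T versus G/C positions."""
    counts = {"transition": 0, "transversion": 0, "at_AT": 0, "at_GC": 0}
    for ref, alt in substitutions:
        counts[classify_substitution(ref, alt)] += 1
        counts["at_AT" if ref.upper() in "AT" else "at_GC"] += 1
    return counts


# HYPOTHETICAL substitutions (original base, mutated base), for illustration only.
EXAMPLE = [("A", "G"), ("C", "T"), ("A", "T"), ("G", "C"), ("T", "A")]

if __name__ == "__main__":
    print(summarize_substitutions(EXAMPLE))
```

Because each base can undergo one transition but two transversions, substitutions drawn without bias would give a transition:transversion ratio near 1:2; a tally of this kind is one way to compare an observed spectrum against that expectation.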
The models for mechanism that have been proposed can be grouped into two categories that are biochemically and genetically distinct. One category invokes unfaithful copying by a template-dependent polymerase as the critical step in mutagenesis, and includes models based on error-prone replication (43, 44), transcription, or reverse transcription (45). The second category envisions heteroduplex formation, or gene conversion, as the critical step, and predicts that some fraction of mutations will be templated by germ-line sequences (8, 46, 47). Enzymatic repair of the altered DNA is a common feature of both classes of models. An excellent review evaluates these alternative possibilities in detail (48).

The possibility that an error-prone DNA polymerase targeted to the immunoglobulin loci could alter V-region sequences was initially suggested to explain antibody variability (49, 50) and was later applied specifically to somatic hypermutation (see, for example, 12, 51). The ability of mammalian polymerases to replicate nucleic acid in an error-prone fashion in vitro has been reviewed in detail (52). Transcription-based models similarly postulate unfaithful copying by a template-directed enzyme, but this class of model envisions that errors are introduced during transcription or reverse transcription and find their way back into genomic DNA by targeted retrotransposition (15, 48). The experimental proof of replication- or transcription-based models would lie in identification of a polymerase with an in vivo error rate of about 10^-3, and of a mechanism that targets this polymerase to rearranged V regions in activated B cells. To date, such polymerases or targeting mechanisms have not been identified.

Gene conversion is a templated process of recombination in which sequence information is transferred from donor to recipient gene via formation of a heteroduplex intermediate.
The notion that the mechanism of somatic hypermutation may depend on gene conversion originated in proposals that some sort of segmental recombination might contribute to antibody structure (53, 54). Seidman et al. (55) suggested that shared homology among V regions might facilitate intergenic recombination in somatic cells, and Baltimore (46) suggested that multiple rounds of gene conversion could explain the patchy sequence homology observed among members of immunoglobulin germ-line heavy- and light-chain V-region families. The possibility that gene conversion might play a role in somatic hypermutation fell into disfavor when donors for particular mutations in murine immunoglobulin genes could not be found at allelic or highly homologous loci (10, 51, 56, 57).

We were led to reopen the question of whether gene conversion plays a role in somatic hypermutation for several reasons (8). In particular, experimental data from two different systems had shown that gene conversion could induce or be accompanied by untemplated mutations (58, 59). This suggested that untemplated mutations, which had been taken as counterexamples to a possible role for gene conversion in hypermutation of immunoglobulin genes, might actually be the result of errors in repair of a duplex formed between donor and recipient sequences during gene conversion. Further consistent with a templated mutational mechanism were data showing that particular silent mutations recur frequently in independently isolated antibodies (3).

The earliest evidence for a role of gene conversion in diversification of immunoglobulin V regions comes from the chicken λ light chain, where potential germ-line donors can be found for nearly every observed mutation (58, 60-63). Gene conversion has also been shown to be the mechanism of targeted, regulated sequence diversification in the rabbit (64, 65). Although the molecular details of targeting have not been defined, it is evident that a mechanism does exist that can target gene conversion to rearranged immunoglobulin genes.

Several different groups have attempted to determine the role of templating in somatic hypermutation of mammalian immunoglobulin genes by carrying out hybridization and cloning to identify germ-line V genes that contained segmental matches to hypermutated regions. In one case, no germ-line matches were found by hybridization with two different oligonucleotide probes (57), and the authors concluded that hypermutation is not templated. Another group found that, although a probe hybridized to genomic blots under apparently stringent conditions, the only matching clones that they could identify were not from V regions (66). A limitation of experiments that attempt to identify donors for gene conversion by hybridization is that the results are sensitive to the design of the oligonucleotide.
Mismatch-sensitive hybridization requires that the labeled oligonucleotide anneal throughout its entire length, and if the boundary of conversion is within the oligonucleotide, or if the segment transferred is small compared to the size of the oligonucleotide, no hybridization will be apparent. We therefore tested hybridization of a panel of 10 oligonucleotides to digests of germ-line DNA (47). DNA digests were probed with 32P-labeled, synthetic 20-base oligonucleotides, and then washed in concentrated tetramethylammonium chloride at elevated temperature to disrupt any hybrids that were not perfect matches (67). These hybridization experiments identified germ-line sequences identical to 7 of the 10 oligonucleotide probes tested. V-region clones were isolated that matched some of the probes, and comparison of the sequences of cloned germ-line V segments and hypermutated V regions showed that the regions of identity ranged in size from 7 to over 50 nucleotides, in both the κ and heavy-chain loci (47, 68). Several examples of germ-line sequences that match hypermutated sequences in VκOx antibodies are shown in Fig. 3. The lengths of the matching segments are similar to lengths transferred in other targeted processes of sequence diversification by gene conversion (58, 60, 61, 65). These data are consistent with a templated mechanism of mutation, but they do not constitute proof of mechanism.

FIG. 3. Germ-line Vκ regions that match hypermutated sequences in genes encoding anti-Ox antibodies. The top line shows the germ-line VκOx-1 sequence from codons 21-40; the complementarity-determining region, CDR1, includes codons 24-34. The sequences of anti-Ox hybridomas NQ22-16.4 (2), NQ7-34.3, and NQ7-1.3 (142) are compared to germ-line V regions VK~-4 and VK~-5 (47); VK~-4 is probably the same as R2 (117).
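The logic of the donor search described above can be sketched computationally. The example below is a simplified illustration under stated assumptions, not the procedure used in the cited experiments: it assumes three pre-aligned sequences of equal length (the unmutated parent V region, the hypermutated V region, and a candidate germ-line donor) and reports segments where the hypermutated sequence is identical to the candidate donor and that contain at least one difference from the parent. The function names and the toy sequences are hypothetical; the 7-nucleotide minimum follows the shortest region of identity mentioned in the text.

```python
def has_perfect_match(probe, target):
    """Mimic mismatch-sensitive hybridization: the probe must match over its full length."""
    return probe.upper() in target.upper()


def donor_matched_segments(parent, mutated, donor, min_length=7):
    """Return (start, length) of runs where mutated == donor that include a change from parent.

    All three sequences are ASSUMED to be aligned and of equal length.
    """
    if not (len(parent) == len(mutated) == len(donor)):
        raise ValueError("sequences must be aligned and of equal length")
    segments = []
    run_start = None
    # Iterate one position past the end so that a run reaching the end is closed.
    for i in range(len(mutated) + 1):
        same = i < len(mutated) and mutated[i] == donor[i]
        if same and run_start is None:
            run_start = i
        elif not same and run_start is not None:
            length = i - run_start
            explains_change = any(mutated[j] != parent[j] for j in range(run_start, i))
            if length >= min_length and explains_change:
                segments.append((run_start, length))
            run_start = None
    return segments


# HYPOTHETICAL toy sequences, for illustration only.
PARENT = "ACGTACGTACGTACGTACGT"
MUTATED = "ACGTACGAACGTACGTACGT"   # one change from the parent at index 7
DONOR = "TTGTACGAACGTACGTTTTT"     # candidate germ-line donor

if __name__ == "__main__":
    print(donor_matched_segments(PARENT, MUTATED, DONOR))
    # A probe lying within the shared segment matches; one spanning its boundary does not.
    print(has_perfect_match(MUTATED[2:12], DONOR), has_perfect_match(MUTATED[0:10], DONOR))
```

The second check illustrates the limitation noted in the text: a probe that extends beyond the boundary of the shared segment fails to find the donor even though a matching segment is present.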
D. cis-Acting Elements in the κ Locus That Regulate Hypermutation

Unfortunately, no B-cell line has been described in which somatic hypermutation is active or inducible. Furthermore, despite many attempts, no culture conditions have been defined that mimic the microenvironment of the germinal center and induce somatic hypermutation in primary B cells. Transgenic mice carrying productively rearranged immunoglobulin genes are therefore the only system to identify elements that regulate and target this process. Figure 4 is a compilation of the transgenic constructs that have been used thus far to analyze somatic hypermutation; Table I summarizes the results of the analyses using these constructs.

FIG. 4. Transgenic constructs used to analyze elements critical to somatic hypermutation. Numbered constructs are described in detail in the references indicated: (1), 69, 70; (2), 71; (3), 72; (4), 72a; (5), 77; (6), 78; (7), 81; (8), 82, 84; (9), 85, 86; (10), 87; (11), 88. Eκi and Eκ3' denote the κ intron and 3' enhancer; Eμ, the heavy-chain intron enhancer; PH, a noncognate heavy-chain promoter (all other V regions are regulated by their own promoters); SupF, a suppressor tRNA gene; CAT, a chloramphenicol acetyltransferase reporter gene.

The importance of cis-acting regulatory elements in activating somatic hypermutation was established in experiments that studied hypermutation of rearranged κ transgenes. Storb's laboratory was the first to show that a transgene could undergo hypermutation, in experiments that analyzed a construct that carried a rearranged κ locus from MOPC167, including sequences from upstream of Vκ through 9 kb downstream of Cκ (see Fig. 4) (69, 70). Hybridomas were generated from mice immunized with the hapten phosphocholine, which induces a strain-specific response dominated by the T15 VH region and the Vκ167 light chain. The endogenous T15 VH region had clearly mutated in three IgG hybridomas, and the κ transgenes in these hybridomas had also undergone hypermutation, although the mutation frequency, 13 mutations in 1044 bp of Vκ167 (Table I), was about a fifth of that typically observed in endogenous κ light chains. This report clearly demonstrated that transgene DNA can be targeted for somatic hypermutation. It further showed that ongoing rearrangement is not required for hypermutation, and that the injected transgene carries all information necessary to target the mutation process specifically to its V region, independent of its chromosomal integration site.

TABLE I
TRANSGENES STUDIED FOR SOMATIC HYPERMUTATION

Number(a) | V region | Source | Copies(b) | Promoter | Enhancer | Antigen(c) | Mutation frequency(d)
----------|----------|--------|-----------|----------|----------|------------|----------------------
1 | Vκ167 | MOPC-167 (myeloma) | 3 and 13 | Vκ167 | Eκi and Eκ3' | Phosphorylcholine | 1.25%
2 | VκOx-1 | NQ2.48.2.2 (hybridoma) | 2 and 4 | VκOx-1 | Eκi (3' deletion) | 2-Phenyl-oxazalone | 0
3 | VκOx-1 | NQ2.48.2.2 (hybridoma) | 3 and 5 | VκOx-1 | Eκi and Eκ3' | 2-Phenyl-oxazalone | 0.94
4 | VκOx-1 | NQ2.48.2.2 (hybridoma) | 2 and 5 | VκOx-1 | Eκi | Environmental (gut Peyer's patch cells) | <0.22(e)
  |  | NQ2.48.2.2 (hybridoma) | 4 and 7 | VκOx-1 | Eκ3' | Environmental (gut Peyer's patch cells) | <0.07(e)
  |  | NQ2.48.2.2 (hybridoma) | 7 | β-Globin | Eκi and Eκ3' | Environmental (gut Peyer's patch cells) | 0.8
5 | Vκ8 | H220-17 (hybridoma) | 1 and 3 | Vκ8 | κ intron | Hemagglutinin | 0
6 | Vλ1 | J558L (myeloma) | 1 | VH186.2 | Eμ | NP | 0
7 | V2B4β-TCR | Mouse TCR gene | 2 | V2B4β-TCR |  | Phosphorylcholine | 0.02
8 | VHR16.7 | R16.7 (hybridoma) | 50 and 100 | VHR16.7 |  | p-Azophenyl arsonate | 0.56
  | VHR16.7 | R16.7 (hybridoma) | 30-50 | VHR16.7 | Eμ | p-Azophenyl arsonate | 0.2
9 | VH36-65 | 36-65 (hybridoma) | 2-10 | VH36-65 | Eμ (not recombined) | p-Azophenyl arsonate | 0
  |  |  |  |  | Eμ and heavy chain 3' (recombined) | p-Azophenyl arsonate | 2.1
10 |  |  | Multiple |  | κ intron |  | 0.1% of supF
11 | CAT |  | 3-4 | VH17.2.25 |  | NP | 0.05

(a) References are given in Fig. 4 legend.
(b) Number of transgene copies stably integrated in the mouse genome.
(c) Antigen used for hyperimmunization of transgenic mice before hybridomas were generated.
(d) Mutation frequency is given in mutated base pairs per 100 sequenced bases; where the rate shown is 0, at least 2700 bases were sequenced.
(e) Mutation frequency determined by PCR and the PCR error rate of 0.07 has not been subtracted.

Shortly after publication of this first report on somatic hypermutation of an immunoglobulin transgene, it was shown that the region downstream of Cκ plays a critical role in targeting and/or stimulating κ somatic hypermutation. Neuberger, Milstein, and their collaborators (71) assayed hypermutation of a rearranged VκOx-1 transgene that carried only 1 kb of sequence from downstream of Cκ (Fig. 4). Hybridomas were generated after hyperimmunization with the hapten 2-phenyl-oxazalone, which induces a strain-specific immune response dominated by the VκOx-1 and the VHOx-1 genes. Sequences of 48 kb of transgenic Vκ regions from these hybridomas showed no evidence that the transgene had been altered by somatic hypermutation, even in B cells in which the endogenous and expressed heavy- and light-chain genes had been mutated. However, this same rearranged VκOx-1 region underwent active hypermutation when carried in a transgenic construct that included 9 kb of sequence from downstream of Cκ, including the 3' enhancer, and the rate of hypermutation of this longer κ transgene was comparable to that of endogenous rearranged immunoglobulin κ genes (72) (Table I).

Because hypermutation of a rearranged κ transgene had been shown to occur in several different transgenic lines, it was clear that there are one or more cis-acting elements in the κ locus that can target hypermutation in a manner largely independent of chromosomal position. In the transgenic κ constructs described above, hypermutation appeared to correlate with gene expression.
The shorter K construct, which did not hypermutate, was poorly expressed, whereas the longer K transgene, which did hypermutate, was actively expressed (71, 72). There are two transcriptional enhancers that regulate K expression, a relatively weak enhancer, h i , in the JK-CK intron (73, 74), and a more potent enhancer, E K ~ about ', 8 kb downstream of CK (75, 76) (see Fig. 1). This raised the possibility that one or both of these regulatory elements might regulate hypermutation. In addition, the possibility that transcriptional regulatory elements might regulate somatic hypermutation fits with data showing that only rearranged immunoglobulin genes undergo hypermutation and that the promoter is the upstream boundary for somatic hypermutation (11-19). The function of the K locus transcriptional regulatory elements in hypermutation was directly tested in recent experiments, which showed that both E K ~and ' EKi are necessary to target hypermutation to a rearranged transgene in cis (72a) (Table I). Addition of the E K ~enhancer ' to a rearranged K
gene construct carrying a 3' truncation (71) restored transgene hypermutation to the normal level; conversely, deletion of Eκ3' from a longer transgene diminished hypermutation to near background levels. In the same set of experiments, deletion of the κ intron enhancer-matrix attachment region also reduced hypermutation dramatically, without affecting transgene transcription. Perhaps surprisingly, because the promoter is the upstream boundary of hypermutation, substitution of a β-globin promoter for the Vκ promoter upstream of the rearranged transgene had no effect on hypermutation. Therefore the signals that define the 5' boundary of hypermutation are not supplied by the V-region promoter.
E. Transcription Is Not Sufficient to Activate Somatic Hypermutation
Analysis of the role of κ transcriptional regulatory elements in somatic hypermutation produced one additional surprising conclusion. Although the κ intron enhancer and matrix attachment region (Eκi) are necessary for hypermutation, they do not appear to be necessary for transgene transcription, at least in very late stages of B-cell development. Transgenes that lacked this element were actively expressed in hybridomas, although they did not hypermutate (72a). Observations with another κ transgene had also suggested that transcription alone does not target a rearranged immunoglobulin gene for hypermutation. A rearranged Vκ8 gene, specific for the influenza virus hemagglutinin and carrying 1.2 kb downstream of Cκ, was expressed at high levels but appeared not to undergo hypermutation (Fig. 4 and Table I) (77). The interpretation of these results is complicated, however, because expression of this transgene appeared to depend on elements at its integration site, and because the endogenous heavy-chain V regions in the same cells were not hypermutated. To address directly the question of whether transcription is sufficient to activate hypermutation, we created transgenics carrying a λ light-chain minilocus construct in which a rearranged λ1 light-chain gene was regulated by the heavy-chain intron enhancer, Eμ, and the PH186.2 promoter from the heavy-chain germ-line VH186.2 segment (Fig. 4) (78). The λ genes do not contain any J-C intron enhancer element (74, 79), and the expression of the transgene was therefore exclusively regulated by Eμ and PH186.2. C57BL/6 mice immunized with the hapten NP mount an immune response dominated by the VH186.2 heavy chain and the λ1 light chain, and the same transcriptional regulatory elements therefore control both heavy- and light-chain expression in the transgenics. 
One line that carried a single copy of the λ1 transgene was chosen for further analysis, and 13 hybridomas generated following immunization with
NP were analyzed for gene expression and hypermutation. All these hybridomas secreted high levels of NP-specific antibodies and actively expressed the endogenous VH186.2 heavy-chain gene and the transgenic λ1 light-chain gene. Somatic hypermutation was active in the B cells that underwent fusion: 46 mutations were apparent in the six different VH regions that we sequenced (1710 bp sequenced, an average of 7.6 mutations per VH186.2 gene). Furthermore, the frequency, distribution, and pattern of hypermutation of the heavy-chain variable regions in these hybridomas were typical for the C57BL/6 NP response (15, 80). In contrast, the transgenic Vλ1 region in the same B cells was not a substrate for hypermutation: there was not a single mutation in 4279 bases sequenced from 13 different hybridomas (Table I). Because both the heavy-chain gene and the λ1 transgene are rearranged and actively expressed in the hybridomas, but only the heavy-chain gene is mutated, transcription alone is not sufficient to activate hypermutation of a rearranged immunoglobulin gene. Taken together with the analysis of κ transgene hypermutation described above, these results show that regulatory elements that activate somatic hypermutation are functionally, if not physically, distinct from elements that enhance transcription in late stages of B-cell development. As discussed below, it should now be possible to identify the genetic elements that enhance somatic hypermutation using a new methodology that streamlines the analysis of hypermutation (26, 72a, 90).
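The comparison between the endogenous heavy-chain genes and the λ1 transgene rests on simple arithmetic over the reported counts. The Python sketch below computes the per-base mutation frequency for each and adds an upper bound for the case in which no mutation was seen. The function names are hypothetical, and the 95% bound assumes that each sequenced base is an independent trial; neither is part of the published analysis.

```python
def mutation_frequency(mutations: int, bases_sequenced: int) -> float:
    """Return mutations per base pair sequenced."""
    if bases_sequenced <= 0:
        raise ValueError("bases_sequenced must be positive")
    return mutations / bases_sequenced


def zero_event_upper_bound(bases_sequenced: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-base frequency when no mutation was observed.

    Assumption (not from the text): bases behave as independent trials, so
    P(no mutation in n bases) = (1 - f) ** n, solved for f at `confidence`.
    """
    if bases_sequenced <= 0:
        raise ValueError("bases_sequenced must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / bases_sequenced)


# Counts quoted in the text above.
heavy_chain = mutation_frequency(46, 1710)       # endogenous VH186.2 genes
transgene = mutation_frequency(0, 4279)          # transgenic V-lambda-1 region
transgene_bound = zero_event_upper_bound(4279)   # bound for the zero-mutation case

print(f"VH186.2 genes:    {heavy_chain:.2%} of bases mutated")
print(f"lambda1 transgene: {transgene:.2%} observed, below {transgene_bound:.3%} at 95%")
```

Under these assumptions the bound for the transgene lies well below the frequency observed in the heavy-chain genes of the same cells, which is the contrast the argument depends on.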
F. Targeting of Hypermutation by Heavy-chain Regulatory Elements
The experiments described above show that the heavy-chain promoter and Eμ intron enhancer are not sufficient to target hypermutation to a rearranged λ gene. Eμ was also shown to be unable to target somatic hypermutation to a rearranged T-cell receptor (TCR) transgene generated by introducing Eμ into the intron of a rearranged TCR-β gene (Fig. 4) (81). A line of transgenic mice carrying two copies of this transgene was established and hybridomas were generated after hyperimmunization with the hapten phosphocholine. The endogenous VH genes in these hybridomas displayed a frequency, distribution, and pattern of somatic hypermutation characteristic of the antiphosphocholine response. However, although the TCR-β gene was expressed in these hybridomas, sequence analysis showed that 29 of 32 TCR transgenes carried no mutation, and one single-base change was evident in each of the three other TCR transgenes (Table I). This report clearly demonstrates that Eμ alone is not sufficient to target somatic hypermutation to a variable region of a rearranged TCR gene. However, as somatic hypermutation does not appear to play a role in diversification of TCR genes, a TCR transgene could lack other essential elements that target hypermutation.
Only one rearranged heavy-chain gene has been introduced into the genome of transgenic mice with the aim of studying somatic hypermutation. This transgene was created by joining a VHR16.7-DJ segment from an anti-p-azophenyl arsonate antibody to a genomic Cμ gene region (Fig. 4) (82). The transgene did not span the region that includes the 3' enhancer (83) and was therefore regulated by the Eμ intron enhancer alone. The transgenic lines studied carried at least 50 copies of the transgene per cell (82, 84), and the μ transgene in these lines appeared to undergo switch recombination with the endogenous γ regions, although the frequency of these recombination events was not determined. Somatic hypermutation was analyzed in both IgM and IgG hybridomas from mice that had been immunized with p-azophenyl arsonate. In one line, where only IgG antibodies were studied, one expressed V region carried no mutations, another carried one mutation, and a third carried seven mutations (Table I). Within the transgenic VH segments, the authors observed five mutations in 900 bp. In another line, VH segments in μ chains appeared to mutate at a rate of about 0.24% (this is only six times the error rate of PCR), whereas VH segments in γ chains mutated at a rate about ten times higher (84). The frequencies of mutation observed with this transgene are a fifth to a tenth the frequency observed in nontransgenic mice after immunization with p-azophenyl arsonate (2.4 to 2.8%; Table I; reviewed in 43). Until it is known whether the hypermutation mechanism is affected by gene dosage, the very high number of transgene copies makes it difficult to establish whether the few mutations observed were biologically meaningful. 
To test the role of sequences flanking the V regions in activating hypermutation, transgenic mice were generated bearing multicopy plasmid constructs containing DNA of antibody heavy-chain variable-diversity-joining regions (VH-D-JH2-4) flanked by various amounts of 5' and 3' sequence (85). These transgenic constructs recombined with endogenous heavy-chain locus sequences, leading to the expression of anti-arsonate antibodies partially encoded by the transgenic VH (85). The functional heavy chain in hybridomas isolated from immunized transgenic mice was often encoded by a "hybrid" transgene-IgH locus, and other copies of the original transgene were localized on other chromosomes in the same cells. Sequence analyses of the copies that were rearranged to the endogenous heavy-chain loci revealed that these transgenic V regions were targeted for somatic hypermutation (86). The observed mutation frequency was about 2% (Table I), and the distribution and type of mutation were characteristic of natural immunoglobulin loci. Strikingly, the transgenic V region copies that integrated outside the heavy-chain locus were not targeted for somatic hypermutation, independent of the amount of 5' flanking sequence and the presence of Eμ. Collectively, these observations provide evidence that a VH promoter and
the heavy-chain intronic enhancer Eμ are not sufficient, and that a region 3' of the heavy-chain constant regions is necessary to target somatic hypermutation to Ig VH transgenes.
G. Targeting of Hypermutation to Reporter Genes
Analysis of hypermutation is tedious, because it involves sequencing mutated V regions, and several laboratories have attempted to develop reporter gene assays that will permit rapid identification of hypermutated sequences. Results of two such attempts have been published, and from these reports some of the difficulties in setting up such a system are evident. One group (87) generated transgenics that carried a rearranged Vκ167 variable region, Cκ, and 1.2 kb of sequence from downstream of Cκ; a supF tRNA amber-suppressor gene was inserted into the J-Cκ intron as a reporter for hypermutation (Fig. 4). The transgene also carried vector ori and amp genes, and hypermutation was assayed by rescuing plasmids from B cells of hyperimmunized mice and transforming a lacZ amber strain of Escherichia coli. On X-gal indicator plates, colonies transformed by supF+ plasmids will be blue, and colonies transformed by supF- plasmids will be white: the white:blue ratio should therefore provide an estimate of hypermutation. The authors recovered 17 white colonies and over 13,000 blue colonies, and they claimed that this represented a hypermutation frequency of 0.1% (Table I). Sequencing of the supF regions showed that inactivation of the suppressor was in every case due to a small (1-2 nt) deletion, and that there were no examples of the single-base changes that are characteristic of somatic hypermutation. This raises the question of whether mutation of the reporter reflects bona fide somatic hypermutation. Furthermore, because the transgenic construct lacked the downstream elements essential to activate hypermutation (71, 72), it would be surprising if the Vκ transgene had hypermutated; the authors did not sequence the transgenic Vκ regions to address this issue. Another group claimed to observe somatic hypermutation in transgenic mice carrying a chloramphenicol acetyltransferase (CAT) reporter gene (Fig. 4) (88). The CAT coding region was regulated by the promoter of the VH17.2.25 segment, which dominates the BALB/c response to NP just as VH186.2 dominates the C57BL/6 response to this hapten (89). Transgenic C57BL/6 mice carrying three copies of this reporter construct were immunized with the hapten NP and hybridomas were analyzed for hypermutation. In 8 of the 11 hybridoma lines analyzed, none of the three transgene copies was mutated. In the other three hybridomas, six mutations were evident that did not appear to arise during PCR because they were shared by clones obtained from two independent PCR cloning procedures. The overall frequency of mutation was 0.08% (Table I). The authors did not determine the
mutation rate of the endogenous VH gene in these hybridomas, but assuming that these B cells mutated at a rate typical of the C57BL/6 anti-NP response, the mutation frequency of the CAT gene is about a fiftieth of the expected mutation frequency of an immunoglobulin variable region.
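The two reporter estimates discussed in this section can be restated as a short calculation. The Python sketch below derives the colony-based frequency from the supF counts and expresses the CAT result as a fold reduction. The function names are hypothetical, the blue-colony count is taken as exactly 13,000 although the text says "over 13,000" (so the result is an upper estimate), and the 4% reference value is simply what "a fiftieth" implies for a 0.08% reporter frequency rather than a measured number.

```python
def reporter_mutation_frequency(white: int, blue: int) -> float:
    """Fraction of rescued plasmids with an inactivated reporter (white colonies)."""
    total = white + blue
    if total <= 0:
        raise ValueError("at least one colony is required")
    return white / total


def fold_reduction(reference_frequency: float, observed_frequency: float) -> float:
    """How many times lower the observed frequency is than the reference."""
    if observed_frequency <= 0:
        raise ValueError("observed_frequency must be positive")
    return reference_frequency / observed_frequency


# supF reporter: 17 white colonies, blue colonies taken as 13,000 (assumption).
supf_frequency = reporter_mutation_frequency(17, 13_000)

# CAT reporter: 0.08% observed; 4% is the reference implied by "a fiftieth".
cat_fold = fold_reduction(0.04, 0.0008)

print(f"supF reporter: {supf_frequency:.2%}")
print(f"CAT reporter: about {cat_fold:.0f}-fold below the implied reference")
```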
H. Future Directions
Analyses of immunoglobulin transgenes to date have revealed that cis-acting elements must play a role in targeting somatic hypermutation. The identification of these elements and the factors that bind them will be a major step in elucidating the mechanism responsible for targeting somatic hypermutation. In principle, it is possible to carry out a screen for such regulatory elements by analyzing hypermutation of rearranged immunoglobulin genes in which potential regulatory elements have been mutated or deleted. In practice, this sort of experiment involves two steps that are time-consuming, expensive, and require real technical expertise: creating and maintaining transgenics carrying the test substrate for hypermutation, and generating hybridomas from immunized mice. As long as there are no cell lines in which hypermutation is ongoing or inducible, work on somatic hypermutation will have to rely on transgenics, but a number of laboratories have attempted to streamline subsequent steps in the analysis of hypermutation. As described above, results with reporter genes for hypermutation do not yet look promising. Nonetheless, other recent advances do have real potential to simplify the analysis of somatic hypermutation. The time, effort, and expense required to generate hybridomas have been one obstacle to carrying out rapid screens for elements critical to hypermutation. Several laboratories have recently reported that, using single-cell PCR, both heavy- and light-chain DNA can be amplified from individual B cells and subjected to sequence analysis (26, 90). Using single-cell PCR, one should now be able to bypass hybridoma production completely and analyze hypermutation in activated B cells isolated directly from the Peyer's patches (91) of unimmunized mice, or from lymph nodes of mice immunized with T-cell-dependent antigens (31).
II. Isotype Switch Recombination
Immunoglobulin isotype switch recombination is a regulated recombination event that joins an expressed heavy-chain variable (VDJ) region to a new downstream constant (C) region (see Fig. 5). Each of the different classes of constant region removes antigen in a specialized way, so the result of switch recombination is to alter the antigen clearance properties of an immunoglobulin molecule without affecting its specificity for antigen. Switch recombination increases the efficiency and versatility of the immune response. The importance of switch recombination is evidenced by the fact that certain immunodeficiency diseases are characterized by an inability to carry out isotype switch recombination (see, for example, 92).
FIG. 5. Switching from Cμ to Cγ1. Switch recombination juxtaposes a rearranged variable region (VDJ) with a downstream C region, deleting the DNA between as a circle. (Symbols as in Fig. 4.) [Diagram: LVDJ, Eμ, Sμ, Cμ, Cδ, Sγ3, Cγ3, Sγ1, Cγ1 before switching; LVDJ, Eμ, Sμ/Sγ1, Cγ1 after switching.]
A. Guanine-rich Sequences Are Involved in Switch Recombination
The organization of the murine heavy-chain locus, shown in Fig. 6, reflects the functional requirements of switch recombination. G-Rich regions of sequence, from 2 to 8 kb in length, appear to be critical to switch recombination. In the heavy-chain locus, these G-rich switch regions, or S regions, occur upstream of each C region that can undergo switching: Cμ, Cγ, Cα, and Cε. RNA processing rather than DNA recombination regulates Cδ expression, and there is no G-rich switch region upstream of Cδ. The G-rich S regions consist primarily of repetitive sequences. The consensus repeat of each S region is distinctive in size (5 to 50 nt) and sequence. G-Rich sequences are implicated in interactions that occur during meiosis and telomere association, as well as in switch recombination (93-96). The observation that synthetic oligonucleotides that match consensus switch repeat sequences spontaneously form four-stranded structures in vitro (93) raises the intriguing possibility that, by virtue of their chemistry, G-rich sequences can promote interstrand associations that are essential to chromosome pairing and recombination.
FIG. 6. Organization of the murine immunoglobulin heavy-chain locus. A rearranged VDJ region is shown just upstream of the heavy-chain intron enhancer, Eμ; switch regions (S) are shown by striped boxes and variable and constant (C) regions by filled boxes.
B. Isotype Switch Recombination Is Region-specific but Not Sequence-specific
Although switch recombination can be targeted to a particular constant region, the actual recombination event is region-specific, not sequence-specific. Chromosomal junctions do not cluster within the S regions, and junctions show no evidence of sequence-specificity or of homologous pairing during switch recombination (97). Although an occasional base or two of homology is evident at some switch junctions, even such limited homology is not the rule. This apparent imprecision distinguishes isotype switching from other targeted, regulated recombination events in both eukaryotic and prokaryotic cells. Because switch junctions lie within introns, the imprecision of the recombination event leaves no mark on the heavy-chain polypeptide. One puzzling feature of switch recombination junctions is that about half of the upstream chromosomal junctions are located not within, but 5' of the Sμ region. Switching B cells produce circular DNA molecules containing the deleted C-region sequences, as shown in Fig. 5 (98-100). The junctions carried by the circles do appear somewhat more clustered than the chromosomal junctions and, in particular, almost all upstream junctions mapped in circles deleted during switching fall within the Sμ region. It is not yet known whether switch circles are products of reciprocal recombination, but if they are, deletion subsequent to circle excision may be responsible for loss of repetitive Sμ sequences from chromosomal junctions. Whether such deletion happens soon after switching or during subsequent cell culture is clearly relevant to the mechanism of recombination. It is also possible that upstream and downstream breakpoints produced during switch recombination are not equivalently processed, and that the upstream breakpoint may undergo further deletion while the downstream breakpoint remains intact. In fact, signal and coding joints are not identically processed during V(D)J recombination (reviewed in 101, 102). 
It is important to emphasize that switch recombination is developmentally and mechanistically distinct from V(D)J joining. V(D)J joining occurs in pre-B cells that have not encountered antigen, and switching occurs in B cells that have accomplished V(D)J joining and have subsequently been activated by antigen. Heptamer-nonamer recognition elements precisely
target V(D)J joining (103), whereas, as described above, switch recombination is imprecise and no conserved sequences comparable to the heptamer-nonamer elements are apparent at switch recombination sites. Switching does, however, share two features with V(D)J joining. Deletion circles are produced during V(D)J joining, as in switch recombination (98-100), and, as discussed below, in both processes sterile transcription of the region targeted for recombination correlates temporally with recombination.
C. Regulation and Targeting of Switch Recombination
Switching occurs in response to extracellular signals mediated by cytokines and lymphokines. Primary B cells cultured with the polyclonal mitogen lipopolysaccharide (LPS) undergo switching mainly to γ3; with LPS + IL-4, to γ1 and ε; with LPS + IFN-γ, to γ2a; and with LPS + TGF-β, to α (reviewed in 104). In all cases examined, sterile transcripts are produced from the constant region targeted for switching (reviewed in 105). Alt and co-workers have suggested that transcriptional activation might regulate recombination, because regions of DNA that are activated for transcription would be especially accessible to the lesions that must initiate recombination (106, 107). The sterile transcripts, produced at a fairly low level, initiate upstream of the targeted switch region, often at heterogeneous start sites, and proceed through the downstream S and C regions. Sterile C-region transcripts are spliced to produce an RNA in which a 5' exon from just upstream of the S region is joined to the downstream C-region sequence. These spliced RNAs appear not to encode any polypeptide; it is not clear if they have any function in activated B cells. Switch recombination is targeted to specific regions of DNA by cis-acting regulatory elements. Gene replacement or "knockout" experiments have shown that deletion of regions that regulate sterile transcription of the Sγ1 and Sγ2b switch regions specifically impairs switching to these isotypes, but not to other isotypes (108, 109). However, while deletion of the Eμ heavy-chain intron enhancer eliminated production of Sμ sterile transcripts, it impaired but did not completely eliminate recombination involving the Sμ region (110), suggesting that transcription and recombination are not coupled mechanistically. A better understanding of the specific factors that regulate recombination should clarify what connection there may be between recombination and transcription.
D. Proteins Implicated in Switch Recombination
The switch regulatory elements may very well contain binding sites for one or more inducible factors that target switch recombination to specific isotypes. Two IL-4-inducible factors likely to be involved in targeting of ε switching have been described (111, 112), and a match to the binding-site
consensus for one of these factors has been identified in the region implicated in regulation of sterile Sγ1 transcription (113). In the near future, it is likely that additional switch targeting factors will be identified that are inducible by culture with specific lymphokines and cytokines. Several different DNA-binding proteins have been identified that appear to be involved in the more general regulation of switch recombination. We have characterized one such protein, LR1, a 106-kDa DNA-binding protein that is induced in primary cultured B cells with kinetics parallel to switch recombination (114-116). We have shown that LR1 binds sites in the Eμ intron enhancer and the Sγ1, Sγ3, and Sα switch regions. These sites define a consensus binding sequence, and computer search for sequences that match this consensus shows that there are from 15 to 25 potential LR1 sites per kilobase of DNA in each of the S regions. In the Sγ3 switch region, for example, there are 49 potential binding sites in 2.5 kb of switch region DNA; as shown in Fig. 7, all but three of these potential sites are in G-rich S-region sequences. The multiplicity of sites is consistent with LR1 functioning in a recombination process that is region-specific but not site-specific. Several properties of LR1 in vitro may be especially relevant to its function in recombination. LR1 bends DNA dramatically on binding, and LR1-LR1 complexes form readily at high protein concentration (L. Hanakahi and N. Maizels, unpublished data). Interactions between LR1 molecules bound at distant chromosomal sites may therefore juxtapose sequences that have been activated for recombination. LR1 is not only implicated in regulation of switch recombination, but has also been shown to regulate transcription of c-myc in B-cell lymphomas (118). Translocations of c-myc to the immunoglobulin locus characterize B-cell lymphomas, and it is an intriguing possibility that some of these aberrant recombination events are targeted by interactions between LR1 bound at the switch regions and LR1 bound at c-myc. Other factors independently identified as binding sites in Sμ (119), Sα (120), and near Sγ2a (121) are all identical to the B-cell-specific transcription factor, BSAP (122, 123). BSAP, a 50-kDa protein, is distinct from LR1 not only in molecular weight but also in binding specificity (116). Furthermore, unlike LR1, BSAP DNA-binding activity is present in resting B cells; it is induced severalfold when cells are cultured under conditions that stimulate cell proliferation (124). A factor ("SNIP/SNAP") that may include a polypeptide related to NF-κB has been reported to bind sequences from the Sγ repeats; it has been further suggested that binding by this factor may define regions where junctions are likely to occur (125, 126).
FIG. 7. LR1 sites in the Sγ3 switch region. A search for matches to the binding consensus, GGNCNAGGCTGR, at a stringency of two or fewer mismatches, revealed 45 potential LR1 sites in the 2.5-kb Sγ3 switch region. This is comparable to the density of sites found in other S regions. All sites are in the same orientation; this may reflect the fact that it is the top strand of S-region DNA that is G-rich. [Diagram: map of the Sγ3 region; vertical bars mark LR1 sites; scale bar, 1 kb.]
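The consensus search described for LR1 sites can be illustrated with a short program. The Python sketch below scans one strand of a sequence for windows that match GGNCNAGGCTGR with two or fewer mismatches, reading N as any base and R as a purine under the standard IUPAC code. It is a minimal illustration, not the search program used by the authors: the function names and the toy sequence are hypothetical, and overlapping windows are reported separately, which the original search may have handled differently.

```python
# Standard IUPAC nucleotide codes needed for the consensus (subset).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}


def count_mismatches(window: str, consensus: str) -> int:
    """Number of positions where the window base is not allowed by the consensus."""
    return sum(base not in IUPAC[symbol] for base, symbol in zip(window, consensus))


def find_sites(sequence: str, consensus: str = "GGNCNAGGCTGR", max_mismatches: int = 2):
    """Return (position, matched window, mismatches) for every window on the
    given strand that matches the consensus within the mismatch limit."""
    sequence = sequence.upper()
    width = len(consensus)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = count_mismatches(window, consensus)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


def sites_per_kb(site_count: int, length_bp: int) -> float:
    """Site density per kilobase."""
    return site_count / (length_bp / 1000)


# Hypothetical toy sequence containing one perfect match to the consensus.
toy_sequence = "TTGGACTAGGCTGATT"
for position, window, mismatches in find_sites(toy_sequence):
    print(position, window, mismatches)

# Density implied by the figure legend: 45 sites in 2.5 kb.
print(sites_per_kb(45, 2500))
```

Only the strand supplied is searched, in keeping with the legend's remark that all sites lie in the same orientation; scanning the reverse complement as well would be a straightforward extension.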
E. Extrachromosomal Switch Substrates to Analyze Elements Critical to Switch Recombination
We have developed a sensitive genetic assay that uses extrachromosomal substrates carrying switch region sequences to test the roles of particular sequences and regulatory elements in switch recombination (127, 128). This assay is diagrammed in Fig. 8. The switch substrates contain sequences from the Sμ and Sγ3 switch regions flanking a conditionally lethal sequence, the leftward (PL) promoter of bacteriophage λ. The recombination substrate is carried in a shuttle vector that also contains the polyoma virus early region to drive replication in murine B cells, and the pBR322 plasmid-origin and ampicillin-resistance marker. The constructs can be propagated in lysogenic strains of E. coli, where the endogenous cI repressor shuts off transcription from λPL, but they cannot give rise to ampicillin-resistant colonies in nonlysogens.
FIG. 8. Switch substrate recombination assay: (1) Transfect primary LPS-stimulated B cells. (2) Sμ/Sγ3 recombination will delete the conditionally lethal marker, λPL. (3) Recover low-molecular-weight DNA after 40 hours. (4) Digest with DpnI to destroy molecules that have not replicated. (5) Transform E. coli strains DH10B and DH10B(λ). Recombination frequency = R = [ampR transformants of DH10B]/[ampR transformants of DH10B(λ)].
In order to assay recombination, switch substrates are first transfected into primary murine spleen cells cultured with LPS to induce switching at the chromosomal heavy-chain loci. Recombination between S-region sequences in the extrachromosomal constructs results in deletion of λPL and produces a molecule that can transform a nonlysogen to ampicillin resistance. Low-molecular-weight DNA is isolated from the transfected primary B cells, a compatible plasmid pACYC184 (cmR) is added as an internal control, DNA is treated with the restriction enzyme DpnI to destroy unreplicated molecules, and deletion of λPL (as well as plasmid recovery) is assayed by transformation. Recombination frequency (R) is expressed as the ratio of ampicillin-resistant transformants in the nonlysogen and lysogen, normalized for the relative transformation efficiencies of the two strains. Several important controls show that the assay quantitates recombination events that occur during transfection of the mammalian cells:
1. Constructs carrying Sμ-λPL-Sγ3 are stable during propagation in λ lysogens. The rate of spontaneous mutation of λPL is less than 2 × 10⁻⁵, about 3 orders of magnitude lower than the recombination rate after transfection of B cells; moreover, there is no recombination between S regions during propagation in E. coli.
2. The genetic selection does not bias the assay. Hybridization of a λPL probe to DH10B(λ) transformants provides an independent measure of recombination frequency, and numbers produced in this fashion are comparable to those obtained by counting ampicillin-resistant transformants of DH10B.
3. Deletion of λPL does not bestow a replication advantage on the plasmid constructs in primary mammalian B cells.
Experiments with the substrates further showed that substrate recombination reflects recombinational activities involved in switch recombination:
1. Efficient recombination of the switch substrates required the presence of S-region sequences.
2. All recombinants carried large, single deletions that removed λPL, and deletion endpoints were heterogeneous, as expected from independent recombination events mediated by the switch recombination apparatus.
3. Switch substrate and chromosomal junction sequences are similar, consistent with recombination relying on similar enzymatic machinery. As in some chromosomal junctions, about one-third of the substrate junctions did display from one to four bases of homology between donor and acceptor, and junctions occasionally carried short insertions (≤4 nt).
4. Substrate recombination was highest in LPS-cultured primary spleen cells, in which chromosomal switch recombination is ongoing. Recombination was reduced to a third in substrates transfected in the T cell thymoma EL4. Most switch substrate recombination in the LPS-stimulated primary cells thus appears to be due to activities that are unique to or induced in switching B cells. It is important to emphasize, however, that the enzymes that carry out switch recombination in B cells may participate in other recombination processes in other cell types. This point is discussed in greater detail below.
The switch substrate recombination assay is rapid, and the recombination substrates can be readily manipulated to test the effect of different sequences and regulatory elements on recombination. It further allows easy recovery of recombination products for analysis. For these reasons, it has provided a valuable tool in the study of cis-acting elements that regulate switch recombination.
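The recombination frequency R defined for this assay can be written out as a small calculation. The Python sketch below takes ampicillin-resistant colony counts from the nonlysogen and the lysogen and normalizes them for relative transformation efficiency. The text does not spell out how that normalization is done; here it is assumed to be the ratio of chloramphenicol-resistant colonies given by the pACYC184 internal control in the two strains. The function name and all colony counts are hypothetical.

```python
def recombination_frequency(
    amp_nonlysogen: int,
    amp_lysogen: int,
    cm_nonlysogen: int,
    cm_lysogen: int,
) -> float:
    """Recombination frequency R for the switch substrate assay.

    amp_* : ampicillin-resistant transformants in DH10B (nonlysogen, scores
            recombinants only) and in DH10B(lambda) (lysogen, scores all
            recovered substrate plasmids).
    cm_*  : chloramphenicol-resistant transformants from the pACYC184
            internal control in each strain (assumed normalization).
    """
    if amp_lysogen <= 0 or cm_nonlysogen <= 0 or cm_lysogen <= 0:
        raise ValueError("lysogen and control colony counts must be positive")
    # How much more (or less) efficiently the nonlysogen is transformed.
    relative_efficiency = cm_nonlysogen / cm_lysogen
    raw_ratio = amp_nonlysogen / amp_lysogen
    return raw_ratio / relative_efficiency


# Hypothetical counts: the nonlysogen transforms twice as efficiently.
R = recombination_frequency(
    amp_nonlysogen=50, amp_lysogen=1000, cm_nonlysogen=400, cm_lysogen=200
)
print(f"R = {R:.1%}")
```

Dividing by the control ratio keeps a strain that simply takes up DNA more readily from inflating the apparent recombination frequency.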
F. Transcriptional Regulatory Elements, but Not Transcription, Stimulate Switch Substrate Recombination
Activation and targeting of recombination clearly depend on recognition of specific sequences by DNA-binding proteins, just as transcriptional activation does. Furthermore, some of the physical properties that characterize recombinational activators, particularly DNA bending, are shared by proteins that activate transcription. As described above, considerable published evidence correlated imminent switch recombination with sterile transcription of the region about to recombine, suggesting some link between transcriptional and recombinational regulation; the immunoglobulin intron enhancers function not only as transcriptional regulatory elements, but also as regulators of V(D)J joining (see, for example, 129). We therefore used the switch substrates shown in Fig. 9 to determine whether transcription or transcriptional activation is essential for switch substrate recombination. A series of experiments that assayed recombination of substrates carrying combinations of transcriptional regulatory elements showed that the presence of the heavy-chain intron enhancer (Ep) and a minimal heavy-chain promoter (PIX)stimulate recombination 12-fold or more, so that as many as 25% of replicated molecules recombined. Activation appeared to depend on the transcriptional regulatory element, but not on transcription per se. A construct containing P, alone recombined at a very low frequency, but the recombination frequency of a construct containing E p but no promoter was
91
IMMUNOGLOBULIN GENES
Sk
hPL
pHL22
sr3
PY
amp ori
2.1%
pHL422
pHL122
R
1.2 EkPH
%
xpL
s@
pY
amp ori 25.6
pHL322
24.1
pBHL352
11.8
pCMVl22
21.1
pCMV152
12.2
FIG. 9. Extrachromosomal switch substrate recombination. All substrates carry S p and Sy3 switch region sequences flanking a conditionally lethal marker, the leftward promoter of phage A (AP,). Substrates are propagated on a shuttle vector carrying both a bacterial origin (ori) and ampicillin-resistance marker (amp)and polyoma sequences to support replication in murine B cells (Py).Transcriptional regulatory elements tested for activation and targeting of recombination included the immunoglobulin heavy-chain intron enhancers (Ep), E p and the heavy chain promoter (PtJ, and the cytomegalovirus enhancer-promoter EP,,,. Recombination frequencies ( R ) are shown in the column at the right.
24.1%, comparable to the recombination frequency of constructs containing both Eμ and PH. Like transcriptional activation, stimulation of recombination by the enhancer was largely orientation-independent. Assays with a reporter construct verified that Eμ alone cannot drive detectable transcription in primary cells: reporter gene expression from a construct driven by Eμ alone was at least 100-fold below expression from a construct driven by Eμ + PH. To determine what elements within Eμ were critical for recombinational enhancement, recombination of constructs with linker substitutions in separate binding sites in Eμ was assayed (130). Mutation of a single core-binding site, which does not affect transcriptional activation, reproducibly diminished recombination to about two-thirds the level observed in substrates carrying the wild-type enhancer. Other constructs that carried mutations in elements critical for transcriptional activation were tested, but no single mutation diminished recombination to less than a half, although combinations of mutations completely abolished recombinational enhancement. A construct mutant at the core, E4, and octamer sites recombined at a frequency of 9.0%, a construct carrying substitutions at the core, E1, E2, and E3 sites recombined at a frequency of 12.4%, and a construct in which the core, E1, E2, E3, E4, and octamer mutations were combined recombined at a frequency characteristic of constructs lacking any transcriptional activator (1.9%). We therefore concluded that factor-binding, not transcription, is responsible for enhancement of recombination in the extrachromosomal substrates.
Other recent observations support the notion that there is no necessary mechanistic connection between transcription and recombination. Culture of Epstein-Barr virus (EBV)-transformed B lymphocytes with IL-4 induces synthesis of sterile ε transcripts but does not induce switch recombination (131); conversely, overproduction of the transcription factor E47 in a pre-T-cell line induces D-J joining but not sterile transcription of the D-J region (132). The genetic knockout that eliminated the γ2b switch targeting element replaced it with a transcriptionally active neo gene that was inducible in response to LPS activation; nonetheless, neo transcription did not substitute for the regulatory element to activate γ2b recombination (109).
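The mutational data quoted above can be put on a common scale with a little arithmetic. The following is only a sketch: the labels are shorthand invented here, and it assumes that the enhancer-only construct (24.1%) is the appropriate wild-type reference and that 1.9% is the activator-free baseline, as the text suggests.

```python
# Recombination frequencies (%) as quoted in the text. The dictionary keys are
# shorthand labels invented for this sketch, not the authors' construct names.
FREQUENCIES = {
    "wild-type enhancer": 24.1,          # assumed reference: enhancer-only construct
    "core + E4 + octamer mutant": 9.0,
    "core + E1 + E2 + E3 mutant": 12.4,
    "all six sites mutant": 1.9,
}
BASELINE = 1.9  # quoted for constructs lacking any transcriptional activator


def fold_over_baseline(freq, baseline=BASELINE):
    """Recombination frequency expressed as a multiple of the baseline."""
    return freq / baseline


def enhancement_retained(freq, wild_type=24.1, baseline=BASELINE):
    """Fraction of the enhancer-dependent increase that a mutant retains."""
    return (freq - baseline) / (wild_type - baseline)


if __name__ == "__main__":
    for label, freq in FREQUENCIES.items():
        print(f"{label}: {fold_over_baseline(freq):.1f}-fold over baseline, "
              f"{enhancement_retained(freq):.0%} of enhancement retained")
```

Subtracting the baseline before taking the ratio is a choice made for this illustration; the text itself compares raw frequencies.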
G. Activation of Recombination by Transcription Factors
The Eμ enhancer is located in the J-C intron, just upstream of the Sμ region. The chromosomal position and the ability of this element to stimulate recombination of the extrachromosomal switch substrates suggest this enhancer may function as a regulator of isotype switch recombination in vivo. However, as is clear from the mutational data, recombinational enhancement is not the sole property of any single factor that binds Eμ. To determine whether transcriptional regulatory regions other than Eμ may also activate recombination, we assayed recombination of constructs that carried the cytomegalovirus (CMV) IE1 enhancer-promoter (EPCMV), and found that this regulatory element supports recombination at a frequency of 21.1%, comparable to Eμ. EμPH and EPCMV are not known to bind any common transcription factors (133, 134), suggesting that recombinational enhancement is due to general properties of transcriptional activators, rather than the binding of particular factors in EμPH or EPCMV. The interchangeability of factors that activate recombination may initially be surprising, but a precedent for this has been established in biochemical
experiments. Either the prokaryotic cAMP-response factor CAP (135) or the eukaryotic HMG domain (136) can substitute for IHF in stimulating λ Int-promoted recombination in vitro. CAP, the HMG domain, and IHF all bend DNA on binding, and it may be that this shared activity explains the ability of these different factors to perform an identical function in stimulating recombination. The role of the transcriptional regulatory elements on the switch substrates appears to be to provide high-affinity binding sites for proteins that are present in LPS-activated B cells. The μ, γ2b, and γ1 regulatory elements identified in the genetic knockout mice (108-110) may perform a similar function at the chromosomal heavy-chain locus. A dual role in transcription and recombination has been established for the prokaryotic factor IHF (137, 138) and appears to extend to at least two eukaryotic factors, LEF-1 (136, 139) and LR1 (114, 118). Such dual function may prove to be an even more general property of nuclear DNA-binding proteins.
H. Targeting of Recombination by cis-acting Elements
Mapping of switch substrate recombinants suggested that an enhancer not only activates but also targets recombination. Recombination junctions from constructs carrying a functional enhancer downstream of Sμ mapped within the Sμ region (pBHL352, pCMV152; see Fig. 9 and Table II), whereas constructs carrying the enhancer upstream of Sμ (pHL122, pCMV122, pHL322) typically carried sequences from upstream of the transcriptional activator joined to sequences in Sγ3. This appears to reflect processes that also direct chromosomal switch recombination because, as described above, the upstream endpoint often does not map within repetitive Sμ sequences,
TABLE II
TARGETING OF RECOMBINATION BY TRANSCRIPTIONAL REGULATORY ELEMENTSᵃ

                                            5' Endpoints (%)    3' Endpoints (%)
Plasmid    Elements                R (%)    5'Sμ     Sμ         Sγ3      3'Sγ3
pBHL352    Sμ--Eμ--λPL--Sγ3        11.8     11       89         45       55
pCMV152    Sμ--EPCMV--λPL--Sγ3     12.2     12       88         82       18
pHL122     EμPH--Sμ--λPL--Sγ3      25.6     78       22         86       14
pCMV122    EPCMV--Sμ--λPL--Sγ3     21.1     100      0          48       52

ᵃ Recombinants of constructs shown in Fig. 9 were selected as ampicillin-resistant transformants of DH10B and mapped by restriction digestion.
but 5' of the Sμ region. These observations suggest that DNA-binding factors can define a boundary for switch deletion. If so, then the heterogeneous distribution of switch junctions could be due to the multiple binding sites available in the repetitive S-region DNA. In this case, the clustering of recombination sites in switch circles isolated from cells cultured with LPS and TGF-β (140) might reflect induction in these culture conditions of a factor or factors that bind S-region elements that are not occupied in other culture conditions.
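The targeting effect summarized in Table II can be checked by grouping the substrates according to where the enhancer sits relative to Sμ. This is a minimal sketch, assuming the endpoint percentages as read from Table II; the `Substrate` record, its field names, and the `mean_within_smu` helper are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substrate:
    plasmid: str
    enhancer_position: str   # position relative to Smu: "upstream" or "downstream"
    r_percent: float         # recombination frequency R (%)
    five_prime_of_smu: int   # % of 5' endpoints mapping 5' of Smu
    within_smu: int          # % of 5' endpoints mapping within Smu


# Values as read from Table II (5' endpoint columns only).
TABLE_II = [
    Substrate("pBHL352", "downstream", 11.8, 11, 89),
    Substrate("pCMV152", "downstream", 12.2, 12, 88),
    Substrate("pHL122", "upstream", 25.6, 78, 22),
    Substrate("pCMV122", "upstream", 21.1, 100, 0),
]


def mean_within_smu(position):
    """Mean % of 5' endpoints within Smu for substrates with the given enhancer position."""
    rows = [s for s in TABLE_II if s.enhancer_position == position]
    return sum(s.within_smu for s in rows) / len(rows)


if __name__ == "__main__":
    for position in ("downstream", "upstream"):
        print(position, mean_within_smu(position))
```

Averaging two substrates per group is only a convenience for display; with so few constructs it carries no statistical weight.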
I. Are the Switch Recombination Enzymes Cell-type Specific?
Switch recombination is restricted to a limited stage in B-cell development and could, in principle, require enzymes present only in those cells. In fact, an early view of switch recombination held that specific "recombinases," inducible by cytokines or lymphokines, both targeted switch recombination to a specific C region and effected the recombination process. One report did claim that switch recombination was cell-type specific (141), but the retroviral vectors used to assay recombination between Sμ and Sγ2b sequences lacked the γ2b regulatory region now thought critical to targeted recombination (109); the genetic assay used in these experiments assumed that only one copy of the vector integrated per cell, an assumption that, in retrospect, is unlikely to be correct. It now appears that much of the regulation of switch recombination is conferred by regulatory elements that function in cis. The importance of the cis-acting regulatory elements in switch recombination is shown by the observation that mice hemizygous for the γ1 switch regulatory element were 50% disabled for γ1 switch recombination (110). In these mice, switch recombination did not occur on the chromosome on which the element was deleted, although the recombination enzymes were clearly in the cells and able to effect recombination on the normal chromosome. These observations suggest that, in the absence of positive activation, the chromosome is shut off for recombination. If switch recombination is regulated largely by local chromatin structure, the actual recombination reaction may be carried out by enzymes having general recombination functions and present in most, if not all, cells. Although this model is speculative at present, it is consistent with the fact that switch recombination sites lack recognition signals characteristic of other site-specific recombination events. It is further supported by our observations that switch substrates undergo recombination at a reduced but significant level in the EL4 T cell line.
III. Conclusions and Future Directions
The identification and characterization of the elements and factors that activate and target somatic hypermutation and isotype switch recombination comprise a critical first step toward understanding the mechanisms of these processes. Although somatic hypermutation and isotype switch recombination occur with remarkable efficiency in activated B cells in vivo, there are no cell lines in which either process is efficiently inducible. This presents a serious limitation to the study of molecular mechanisms. Overexpression of an appropriate regulatory factor or factors may permit the establishment of cell lines in which efficient hypermutation or switch recombination is ongoing or inducible, and such lines would be an invaluable tool for studying regulation and mechanism. Eventually, detailed molecular understanding of somatic hypermutation and switch recombination will depend on establishing in vitro systems that can carry out these processes.
ACKNOWLEDGMENTS
M. H. is the recipient of an Erwin Schrödinger Postdoctoral Fellowship from the Austrian Government. Research was supported by NIH Grants R01 GM39799 and GM41712.
Abbreviations
CDR      complementarity-determining region
FR       framework region
PNA      peanut agglutinin
Eμ       immunoglobulin heavy-chain intron enhancer
Eκi      immunoglobulin κ light-chain intron enhancer and adjacent matrix-attachment region
Eκ3'     immunoglobulin κ light-chain 3' enhancer
TCR      T-cell receptor
V        immunoglobulin variable-region gene segment
D        immunoglobulin heavy-chain gene-diversity region gene segment
J        immunoglobulin joining-region gene segment
VDJ      a rearranged immunoglobulin heavy-chain variable region containing V-, D-, and J-region segments
VJ       a rearranged immunoglobulin light-chain variable region containing V- and J-region segments
BSAP     B-cell-specific activator protein
EPCMV    cytomegalovirus immediate early-region enhancer and promoter
HMG      high-mobility group
IHF      integration host factor
TGF      transforming growth factor
LR1      lipopolysaccharide-responsive factor 1
LEF-1    lymphocyte enhancer factor 1
REFERENCES
1. D. Allen, A. Cumano, C. Kocks, K. Rajewsky, N. Rajewsky, J. Roes, F. Sablitzky and M. Siekevitz, Immunol. Rev. 96, 5 (1987).
2. C. Berek, J. M. Jarvis and C. Milstein, Eur. J. Immunol. 17, 1121 (1987).
3. C. Berek and C. Milstein, Immunol. Rev. 96, 23 (1987).
4. C. Berek, Immunol. Rev. 126, 5 (1992).
5. C. Dell and J. L. Claflin, in “Somatic Hypermutation in V-Regions” (E. J. Steele, ed.), p. 69. CRC Press, Boca Raton, Florida, 1990.
6. P. J. Gearhart and N. S. Levy, in “Somatic Hypermutation in V-Regions” (E. J. Steele, ed.), p. 29. CRC Press, Boca Raton, Florida, 1990.
7. N. Maizels, in “Somatic Hypermutation in V-Regions” (E. J. Steele, ed.), p. 129. CRC Press, Boca Raton, Florida, 1990.
8. N. Maizels, Trends Genet. 5, 4 (1989).
9. A. G. Betz, M. S. Neuberger and C. Milstein, Immunol. Today 14, 405 (1993).
10. L. J. Wysocki and M. L. Gefter, ARB 58, 509 (1989).
11. J. Gorski, P. Rollini and B. Mach, Science 220, 1179 (1983).
12. P. J. Gearhart and D. F. Bogenhagen, PNAS 80, 3439 (1983).
13. M. Pech, J. Hochtl, H. Schnell and H. G. Zachau, Nature 291, 668 (1981).
14. C. Clarke, J. Berenson, J. Goverman, P. D. Boyer, S. Crews, G. Siu and K. Calame, NARes 10, 7731 (1982).
15. G. W. Both, L. Taylor, J. W. Pollard and E. J. Steele, MCBiol 10, 5187 (1990).
16. S. C. Lebecque and P. J. Gearhart, J. Exp. Med. 172, 3652 (1990).
17. J. S. Weber, J. Berry, T. Manser and J. L. Claflin, J. Immunol. 146, 3652 (1991).
18. H. S. Rothenfluh, L. Taylor, A. L. M. Bothwell, G. W. Both and E. J. Steele, Eur. J. Immunol. 23, 2152 (1993).
19. B. Rogerson, Mol. Immunol. 31, 83 (1994).
20. C. Berek, A. Berger and M. Apel, Cell 67, 1121 (1991).
21. J. Jacob, G. Kelsoe, K. Rajewsky and U. Weiss, Nature 354, 389 (1991).
22. J. Jacob, R. Kassir and G. Kelsoe, J. Exp. Med. 173, 1165 (1991).
23. J. Jacob and G. Kelsoe, J. Exp. Med. 176, 679 (1992).
24. Y.-J. Liu, G. D. Johnson, G. Gordon and I. C. M. MacLennan, Immunol. Today 13, 17 (1992).
25. J. Jacob, J. Przylepa, C. Miller and G. Kelsoe, J. Exp. Med. 178, 1293 (1993).
26. R. Kuppers, N. Zhao, M.-L. Hansmann and K. Rajewsky, EMBO J. 12, 4955 (1993).
27. D. Gray, H. Skarvall, Y.-J. Liu, I. C. M. MacLennan and T. Leanderson, in “Somatic Hypermutation in V-Regions” (E. J. Steele, ed.), p. 83. CRC Press, Boca Raton, Florida, 1990.
28. I. C. M. MacLennan, Nature 354, 352 (1991).
29. I. C. M. MacLennan, Curr. Biol. 70 (1994).
30. G. J. V. Nossal, Cell 68, 1 (1992).
31. E. C. Butcher, R. V. Rouse, R. L. Coffman, C. N. Nottenburg, R. R. Hardy and I. L. Weissman, J. Immunol. 129, 2698 (1982).
32. G. Kraal, R. R. Hardy, W. M. Gallatin, I. L. Weissman and E. C. Butcher, Eur. J. Immunol. 16, 829 (1986).
33. E. B. Jacobson, L. H. Caporale and G. J. Thorbecke, Cell. Immunol. 13, 416 (1974).
34. N. Maizels and A. Bothwell, Cell 43, 715 (1985).
35. N. Maizels, J. C. Lau, P. R. Blier and A. Bothwell, Mol. Immunol. 25, 1277 (1988).
36. J. Roes, K. Huppi, K. Rajewsky and F. Sablitzky, J. Immunol. 142, 1022 (1989).
37. F. Sablitzky, G. Wildner and K. Rajewsky, EMBO J. 4, 345 (1985).
38. E. J. Steele, H. S. Rothenfluh and G. W. Both, Immunol. Cell Biol. 70, 129 (1992).
39. S. Clarke, R. Rickert, M. K. Wloch, L. Staudt, W. Gerhard and M. Weigert, J. Immunol. 145, 2286 (1990).
40. N. van der Stoep, J. van der Linden and T. Logtenberg, J. Exp. Med. 177, 99 (1993).
41. A. G. Betz, C. Rada, R. Pannell, C. Milstein and M. S. Neuberger, PNAS 90, 2385 (1993).
42. M. Kaartinen, S. Kulp and O. Makela, in “Somatic Hypermutation in V-Regions” (E. J. Steele, ed.), p. 105. CRC Press, Boca Raton, Florida, 1990.
43. T. Manser, in “Somatic Hypermutation in V-Regions” (E. J. Steele, ed.), p. 41. CRC Press, Boca Raton, Florida, 1990.
44. B. Rogerson, J. J. Hackett, A. Peters, D. Haasch and U. Storb, EMBO J. 10, 4331 (1991).
45. E. J. Steele and J. W. Pollard, Mol. Immunol. 24, 667 (1987).
46. D. Baltimore, Cell 24, 592 (1981).
47. V. David, N. L. Folk and N. Maizels, Genetics 132, 799 (1992).
48. E. J. Steele, J. W. Pollard, L. Taylor and G. W. Both, in “Somatic Hypermutation in V-Regions” (E. J. Steele, ed.), p. 137. CRC Press, Boca Raton, Florida, 1990.
49. J. Lederberg, Science 129, 1649 (1959).
50. S. Brenner and C. Milstein, Nature 211, 242 (1966).
51. S. Crews, J. Griffin, H. Huang, K. Calame and L. Hood, Cell 25, 59 (1981).
52. T. A. Kunkel, in “Somatic Hypermutation in V-Regions” (E. J. Steele, ed.), p. 159. CRC Press, Boca Raton, Florida, 1990.
53. G. M. Edelman and J. A. Gally, PNAS 57, 353 (1967).
54. O. Smithies, Science 157, 267 (1967).
55. J. G. Seidman, A. Leder, M. Nau, B. Norman and P. Leder, Science 202, 11 (1978).
56. O. Bernard, N. Hozumi and S. Tonegawa, Cell 15, 1133 (1978).
57. N. C. Chien, R. R. Pollock, C. Desaymard and M. D. Scharff, J. Exp. Med. 167, 954 (1988).
58. C.-A. Reynaud, V. Anquez, H. Grimal and J.-C. Weill, Cell 48, 379 (1987).
59. K. R. Thomas and M. R. Capecchi, Nature 324, 34 (1986).
60. W. T. McCormack and C. B. Thompson, Genes Dev. 4, 548 (1990).
61. L. M. Carlson, W. T. McCormack, C. E. Postema, E. H. Humphries and C. B. Thompson, Genes Dev. 4, 536 (1990).
62. C. B. Thompson and P. E. Neiman, Cell 48, 369 (1987).
63. W. T. McCormack, L. W. Tjoelker and C. B. Thompson, This Series 45, 27 (1993).
64. K. L. Knight and R. S. Becker, Cell 60, 963 (1990).
65. R. S. Becker and K. L. Knight, Cell 63, 987 (1990).
66. L. J. Wysocki, M. L. Gefter and M. N. Margolies, J. Exp. Med. 172, 315 (1990).
67. W. I. Wood, J. Gitschier, L. A. Lasky and R. M. Lawn, PNAS 82, 1585 (1985).
68. N. Maizels, Res. Immunol. 144, 459 (1993).
69. R. L. O’Brien, R. L. Brinster and U. Storb, Nature 326, 405 (1987).
70. J. J. Hackett, B. Rogerson, R. L. O’Brien and U. Storb, J. Exp. Med. 172, 131 (1990).
71. M. J. Sharpe, M. S. Neuberger, R. Pannell, M. A. Surani and C. Milstein, Eur. J. Immunol. 20, 1379 (1990).
72a. M. J. Sharpe, C. Milstein, J. M. Jarvis and M. S. Neuberger, EMBO J. 10, 2139 (1991).
72b. A. G. Betz, C. Milstein, A. Gonzalez-Fernandez, R. Pannell, T. Larson and M. S. Neuberger, Cell 77, 239 (1994).
73. C. Queen and D. Baltimore, Cell 33, 741 (1983).
74. D. Picard and W. Schaffner, Nature 307, 80 (1984).
75. K. B. Meyer and M. S. Neuberger, EMBO J. 8, 1959 (1989).
76. K. B. Meyer, M. J. Sharpe, M. A. Surani and M. S. Neuberger, NARes 18, 5609 (1990).
77. C. E. Carmack, S. A. Camper, J. J. Mackle, W. U. Gerhard and M. G. Weigert, J. Immunol. 147, 2024 (1991).
78. M. Hengstschläger, M. Williams and N. Maizels, Eur. J. Immunol. 24, 1649 (1994).
79. J. Hagman, C. M. Rudin, D. Haasch, D. Chaplin and U. Storb, Genes Dev. 4, 978 (1990).
80. N. Motoyama, T. Miwa, Y. Suzuki, H. Okada and T. Azuma, J. Exp. Med. 179, 395 (1994).
81. J. J. Hackett, C. Stebbins, B. Rogerson, M. M. Davis and U. Storb, J. Exp. Med. 176, 225 (1992).
82. J. Durdik, R. M. Gerstein, S. Rath, P. F. Robbins, A. Nisonoff and E. Selsing, PNAS 86, 2346 (1989).
83. S. Pettersson, G. P. Cook, M. Bruggemann, G. T. Williams and M. S. Neuberger, Nature 344, 165 (1990).
84. J. Sohn, R. M. Gerstein, C.-L. Hsieh, M. Lemer and E. Selsing, J. Exp. Med. 177, 493 (1993).
85. A. M. Giusti, R. Coffee and T. Manser, PNAS 89, 10321 (1992).
86. A. M. Giusti and T. Manser, J. Exp. Med. 177, 797 (1993).
87. A. Umar, P. A. Schweitzer, N. S. Levy, J. D. Gearhart and P. J. Gearhart, PNAS 88, 4902 (1991).
88. T. Azuma, N. Motoyama, L. E. Fields and D. Y. Loh, Int. Immunol. 5, 121 (1993).
89. D. Y. Loh, A. L. M. Bothwell, M. E. White-Scharf, T. Imanishi-Kari and D. Baltimore, Cell 33, 85 (1983).
90. A. Liu, G. Creadon and L. J. Wysocki, PNAS 89, 7610 (1992).
91. A. Gonzalez-Fernandez and C. Milstein, PNAS 90, 9862 (1993).
92. A. Aruffo, M. Farrington, D. Hollenbaugh, X. Li, A. Milatovich, S. Nonoyama, J. Bajorath, L. S. Grosmaire, R. Stenkamp and M. Neubauer, Cell 72, 291 (1993).
93. D. Sen and W. Gilbert, Nature 334, 364 (1988).
94. D. Sen and W. Gilbert, Nature 344, 410 (1990).
95. J. R. Williamson, M. K. Raghuraman and T. R. Cech, Cell 59, 871 (1989).
96. W. I. Sundquist and A. Klug, Nature 342, 825 (1989).
97. W. Dunnick, G. Z. Hertz, L. Scappino and C. Gritzmacher, NARes 21, 365 (1993).
98. T. Iwasato, A. Shimizu, T. Honjo and H. Yamagishi, Cell 62, 143 (1990).
99. M. Matsuoka, K. Yoshida, T. Maeda, S. Usuda and H. Sakano, Cell 62, 135 (1990).
100. U. von Schwedler, H.-M. Jack and M. Wabl, Nature 345, 452 (1990).
101. S. Lewis and M. Gellert, Cell 59, 585 (1989).
102. M. R. Lieber, FASEB J. 5, 2934 (1991).
103. J. E. Hesse, M. R. Lieber, K. Mizuuchi and M. Gellert, Genes Dev. 3, 1053 (1989).
104. R. L. Coffman, B. W. P. Seymour, D. A. Lebman, D. D. Hiraki, J. A. Christiansen, B. Shrader, H. M. Cherwinski, H. F. J. Savelkoul, F. D. Finkelman and M. W. Bond, Immunol. Rev. 102, 5 (1988).
105. S. Lutzker and F. Alt, MCBiol 8, 1849 (1988).
106. T. K. Blackwell and F. Alt, Annu. Rev. Genet. 23, 605 (1989).
107. G. D. Yancopoulos and F. W. Alt, Cell 40, 271 (1985).
108. J. Zhang, A. Bottaro, S. Li, V. Steward and F. W. Alt, EMBO J. 12, 3529 (1993).
109. S. Jung, K. Rajewsky and A. Radbruch, Science 259, 984 (1993).
110. H. Gu, Y.-R. Zou and K. Rajewsky, Cell 73, 1155 (1993).
111. M. Boothby, E. Gravallese, H.-C. Liou and L. H. Glimcher, Science 242, 1559 (1988).
112. P. Rothman, S. C. Li, B. Gorman, L. Glimcher, F. Alt and M. Boothby, MCBiol 11, 5551 (1991).
113. M. Xu, R. E. Hammer, V. C. Blasquez, S. L. Jones and W. T. Garrard, JBC 264, 21190 (1989).
114. M. Williams and N. Maizels, Genes Dev. 5, 2353 (1991).
115. M. Williams, A. Brys, A. M. Weiner and N. Maizels, NARes 20, 4935 (1992).
116. M. Williams, L. A. Hanakahi and N. Maizels, JBC 268, 13731 (1993).
117. J. Even, G. M. Griffiths, C. Berek and C. Milstein, EMBO J. 4, 3439 (1985).
118. A. Brys and N. Maizels, PNAS 91, 4915 (1994).
119. R. Wuerffel, A. T. Nathan and A. L. Kenter, MCBiol 10, 1714 (1990).
120. S. H. Waters, K. U. Saikh and J. Stavnezer, MCBiol 9, 5594 (1989).
121. F. Liao, S. L. Giannini and B. K. Birshtein, J. Immunol. 148, 2909 (1992).
122. B. Adams, P. Dorfler, A. Aguzzi, Z. Kozmik, P. Urbanek, I. Maurer-Fogy and M. Busslinger, Genes Dev. 6, 1589 (1992).
123. A. Barberis, K. Widenhorn, L. Vitelli and M. Busslinger, Genes Dev. 4, 849 (1990).
124. Y. Wakatsuki, M. F. Neurath, E. E. Max and W. Strober, J. Exp. Med. 179, 1099 (1994).
125. A. L. Kenter, R. Wuerffel, R. Sen, C. E. Jamieson and G. V. Merkulov, J. Immunol. 151, 4718 (1993).
126. R. Wuerffel, C. E. Jamieson, L. Morgan, G. V. Merkulov, R. Sen and A. L. Kenter, J. Exp. Med. 176, 339 (1992).
127. H. Leung and N. Maizels, MCBiol 14, 1450 (1994).
128. H. Leung and N. Maizels, PNAS 89, 4154 (1992).
129. Y.-R. Zou, S. Takeda and K. Rajewsky, EMBO J. 12, 811 (1993).
130. M. Kiledjian, L.-K. Su and T. Kadesch, MCBiol 8, 145 (1988).
131. J.-F. Gauchat, H. Gascan, R. De Waal Malefyt and J. E. De Vries, J. Immunol. 148, 2291 (1992).
132. M. Schlissel, A. Voronova and D. Baltimore, Genes Dev. 5, 1367 (1991).
133. L. Hennighausen and B. Fleckenstein, EMBO J. 5, 1367 (1986).
134. C. L. Peterson, K. Orth and K. L. Calame, MCBiol 6, 4168 (1986).
135. S. D. Goodman and H. A. Nash, Nature 341, 251 (1989).
136. K. Giese, J. Cox and R. Grosschedl, Cell 69, 185 (1992).
137. D. I. Friedman, Cell 55, 545 (1988).
138. T. R. Hoover, E. Santero, S. Porter and S. Kustu, Cell 63, 11 (1990).
139. M. L. Waterman, W. H. Fischer and K. A. Jones, Genes Dev. 5, 656 (1991).
140. T. Iwasato, H. Arakawa, A. Shimizu, T. Honjo and H. Yamagishi, J. Exp. Med. 175, 1539 (1992).
141. D. E. Ott, F. W. Alt and K. B. Marcu, EMBO J. 6, 577 (1987).
142. G. M. Griffiths, C. Berek, M. Kaartinen and C. Milstein, Nature 312, 271 (1984).
Capping Enzyme in Eukaryotic mRNA Synthesis¹
STEWART SHUMAN
Molecular Biology Program
Sloan-Kettering Institute
New York, New York 10021
I. Domain Structure of Vaccinia Virus Capping Enzyme
II. What Is the Rate-limiting Step in Cap Formation?
III. Cotranscriptional Capping of Nascent mRNA
IV. Direct Role for Capping Enzyme in Vaccinia Virus Transcription
V. Yeast mRNA Capping Enzyme
VI. Capping Enzyme from Schizosaccharomyces pombe
VII. Sequence Conservation among Capping Enzymes and Ligases
VIII. Capping Enzyme and mRNA Identity
IX. Genetic Link between Capping Enzyme and Pre-mRNA Splicing
X. Perspective
References
Eukaryotic mRNAs contain a modified 5'-terminal "cap" structure consisting of 7-methylguanosine linked to the 5' end of the transcript by a 5'-5' triphosphate bridge (1). A growing body of evidence indicates that the cap plays an important role in RNA synthesis and function, by facilitating posttranscriptional processing, nucleocytoplasmic transport, and ultimately the recognition of mature mRNA by the translation machinery. Capping may also serve to protect the mRNA from nucleolytic degradation. Elucidation of the mechanism and potential regulation of cap formation is thus pertinent to our understanding of eukaryotic gene expression. Capping occurs by a series of three enzymatic reactions in which the 5' triphosphate terminus of a primary transcript is first cleaved to a diphosphate-terminated RNA by RNA triphosphatase, then capped with GMP by RNA guanylyltransferase, and methylated at the N7 position of guanine by RNA (guanine-7)methyltransferase.
1 Abbreviations: VTF, vaccinia virus termination factor; EpG, enzyme-GMP covalent complex; 5-FOA, 5-fluoroorotic acid.
Progress in Nucleic Acid Research and Molecular Biology, Vol. 50. Copyright © 1995 by Academic Press, Inc. All rights of reproduction in any form reserved.
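(i) $\mathrm{pppN(pN)}_n \rightarrow \mathrm{ppN(pN)}_n + \mathrm{P_i}$
(ii) $\mathrm{ppN(pN)}_n + \mathrm{pppG} \rightleftharpoons \mathrm{G(5')pppN(pN)}_n + \mathrm{PP_i}$
(iii) $\mathrm{G(5')pppN(pN)}_n + \mathrm{AdoMet} \rightarrow \mathrm{m^7G(5')pppN(pN)}_n + \mathrm{AdoHcy}$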
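The three reactions above form a fixed sequence in which each enzyme acts on the product of the previous one. As a rough illustration only, the pathway can be written as a lookup from each 5' end state to the enzyme that acts on it and the product formed; the state names (`"ppp"`, `"pp"`, `"Gppp"`, `"m7Gppp"`) and the `capping_path` function are hypothetical shorthand for this sketch, and the model ignores reversibility and kinetics.

```python
# Each 5' end state maps to (enzyme acting on it, resulting 5' end state).
STEPS = {
    "ppp": ("RNA triphosphatase", "pp"),
    "pp": ("RNA guanylyltransferase", "Gppp"),
    "Gppp": ("RNA (guanine-7)methyltransferase", "m7Gppp"),
}
FINAL_STATE = "m7Gppp"


def capping_path(five_prime_end):
    """List the (enzyme, product) steps needed to reach the methylated cap.

    Raises ValueError for any 5' end that is not part of the scheme.
    """
    if five_prime_end != FINAL_STATE and five_prime_end not in STEPS:
        raise ValueError(f"no capping route from 5' end {five_prime_end!r}")
    path = []
    state = five_prime_end
    while state in STEPS:
        enzyme, state = STEPS[state]
        path.append((enzyme, state))
    return path


if __name__ == "__main__":
    for enzyme, product in capping_path("ppp"):
        print(f"{enzyme} -> {product}")
```

Starting from `"pp"` instead of `"ppp"` simply skips the triphosphatase step, which mirrors the ordering of reactions (i) to (iii).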
Viral systems have played a pivotal role in defining the structure of the cap and the biochemistry of cap formation (2, 3). A multifunctional capping enzyme, which catalyzes all three steps in cap formation, has been purified to homogeneity from vaccinia virus particles (4-6). The vaccinia enzyme is a heterodimer containing subunit polypeptides of 95 and 33 kDa (encoded by the viral D1 and D12 genes, respectively). In addition to its role in RNA 5' processing, the vaccinia capping enzyme participates directly in the transcription of viral mRNAs, as a transcription termination factor in early mRNA synthesis (7), and as an initiation factor during transcription of vaccinia intermediate genes (8).
Cellular transcripts are capped by the same series of reactions as vaccinia mRNAs. Enzymes that catalyze these steps have been isolated from a number of cellular sources (9). A distinctive feature of the cellular enzymes is the lack of tight physical association between the guanylyltransferase and 7-methyltransferase activities. Unlike the vaccinia capping enzyme, for which capping and methylation functions are never dissociable during conventional purification, these two activities readily separate during chromatography of cell extracts (10). The cap methyltransferase isolated from rat liver has a native size of 130 kDa, estimated by gel filtration (10). Methyltransferase has been partially purified from HeLa cells; a native molecular weight of 56 kDa is estimated, based on sedimentation and gel filtration (11). In no case has the cellular cap methyltransferase been purified to homogeneity, and there has been no identification of a cellular gene encoding cap methyltransferase.
RNA guanylyltransferase has been purified from HeLa cells (12-15), rat liver (10), calf thymus (16), mouse myeloma (13), brine shrimp (17), wheat germ (18), and yeast (19-21). Each enzyme catalyzes the transfer of GMP from GTP to the diphosphate end of RNA to form a GpppN structure. Triphosphate-terminated RNAs can also serve as cap acceptors with varying efficiency, depending on the source of the enzyme. Monophosphate termini cannot be capped. The guanylyltransferases from rat liver, brine shrimp, and yeast copurify with an RNA triphosphatase activity, thereby accounting for utilization of triphosphate-terminated RNA substrates (17, 21, 22). The cellular capping enzymes thus resemble the vaccinia protein with respect to the tight association of triphosphatase and guanylyltransferase activities. The inefficient use of triphosphate termini noted with capping enzymes from HeLa cells and wheat germ (18, 23) may be related to the in vitro reaction conditions. The RNA triphosphatase activity of the rat liver and brine shrimp
enzymes is optimal in the absence of divalent cation, and is actually inhibited >95% by divalent cation concentrations used routinely to assay guanylyltransferase (17, 19, 22). On the other hand, the RNA triphosphatase activity of the yeast guanylyltransferase (19), like that of the vaccinia enzyme (6), depends absolutely on a divalent cation cofactor.
Most cellular guanylyltransferases are monomeric and bifunctional, i.e., guanylyltransferase and triphosphatase activities are contained within a single polypeptide. In this regard, they again resemble the vaccinia enzyme, in which the guanylyltransferase and triphosphatase domains both reside entirely within the 95-kDa subunit (24, 25). The sizes of the guanylyltransferases from human cells (68 kDa), rat liver (69 kDa), calf thymus (65 kDa), brine shrimp (73 kDa), and wheat germ (77 kDa) have been determined by analysis of the covalent enzyme-GMP catalytic intermediate (9; see Section I,A). The yeast protein is exceptional in that it is a heterodimeric enzyme in which the triphosphatase and guanylyltransferase activities are separate polypeptide chains (21). The gene for the guanylyltransferase subunit of the yeast enzyme has been identified recently (26). No gene encoding a mammalian guanylyltransferase has been reported to date.
This review focuses on the enzymatic mechanism of the individual capping reactions and the organization of functional domains within the relevant enzymes. The greatest attention is devoted to two model systems, vaccinia virus and Saccharomyces cerevisiae, in which biochemistry, molecular genetics, and protein engineering have fueled considerable progress since the last review of RNA capping in this series (9). I also elaborate on genetic studies in yeast that shed light on the physiologic role of the cap in eukaryotic RNA metabolism. I take some license with respect to critical commentary and hermeneutic zeal in order to enliven the issues under consideration.
I. Domain Structure of Vaccinia Virus Capping Enzyme
The vaccinia capping enzyme, a heterodimer of 95- and 33-kDa subunits, catalyzes all three steps in cap formation. Delineation of functional domains within the vaccinia capping enzyme has been accomplished through the isolation of active subdomains generated by partial proteolysis (27) and through the expression of the capping enzyme subunits (encoded by the D1 and D12 genes) in Escherichia coli (24, 25, 28, 29). Coexpression studies showed that the 95- and 33-kDa viral gene products were together sufficient to catalyze all three enzymatic steps in cap synthesis. By expressing the D1 gene alone, the RNA triphosphatase and guanylyltransferase domains were localized to the large (D1) subunit (24). The small (D12) subunit was required for methyltransferase (25, 30). Expression of carboxyl-deleted forms of the large enzyme subunit in E. coli further localized the guanylyltransferase domain to the amino two-thirds of the 95-kDa polypeptide (25). An amino-terminal 59-kDa tryptic fragment of the large subunit contained both RNA triphosphatase and guanylyltransferase activities (25, 27). A model for the domain structure of the vaccinia capping enzyme based on these findings is illustrated in Fig. 1, and depicts the following features: (1) the RNA triphosphatase and RNA guanylyltransferase domains colocalize within an amino-terminal 59-kDa fragment of the large subunit; (2) the methyltransferase domain resides on a heterodimer of the small subunit and the C-terminal portion of the large subunit; and (3) the guanylyltransferase domain is linked in cis to the methyltransferase domain by a protease-sensitive bridge within the large subunit.
A. Catalytic Mechanism and Active Site of Vaccinia RNA Guanylyltransferase
Of the three catalytic steps in cap formation, only the guanylyltransferase reaction mechanism has been dissected in detail. Transfer of GMP from GTP to the 5'-diphosphate terminus of RNA occurs in a two-stage reaction involving a covalent enzyme-GMP intermediate (31). Both steps are readily reversible:
(i) $\mathrm{E + pppG \rightleftharpoons EpG + PP_i}$
(ii) $\mathrm{EpG + ppRNA \rightleftharpoons GpppRNA + E}$
[Figure 1 (graphic not reproduced). Panel labels: triphosphatase, guanylyltransferase, methyltransferase, and termination factor (native enzyme); triphosphatase and guanylyltransferase (N-D1 domain); methyltransferase (C-D1/D12 domain).]
FIG. 1. Domain structure of vaccinia virus capping enzyme. Activities associated with the native D1/D12 heterodimeric capping enzyme are indicated at the left. The active site lysine of the guanylyltransferase domain (Lys-260 of the D1 subunit) forms a covalent intermediate with GMP as shown. The structure of autonomous catalytic domains is illustrated at the right. The amino segment of the large subunit is indicated by N-D1, and the carboxyl portion, by C-D1. The two catalytic domains are linked in cis within the native protein by a protease-sensitive hinge region of the D1 subunit.
The GMP residue is linked to the large subunit of the vaccinia capping enzyme through a phosphoamide bond to the ε-amino group of a lysine residue (31-33). An equivalent mechanism of covalent catalysis applies to cellular mRNA guanylyltransferases, including enzymes isolated from human (13-15), mouse (13), rat (34), calf (16), wheat germ (33), brine shrimp (17), and yeast (20, 21). In looking for clues to the location of the active site, Cong and Shuman (35) noted a region of local conservation between large subunits of two poxvirus capping enzymes [from vaccinia virus and Shope fibroma virus (SFV) (36, 37)] and the sequence of the guanylyltransferase subunit of yeast capping enzyme (26).
Vaccinia virus   Tyr Ala Val Thr Lys Thr Asp Gly
SFV              Tyr Val Thr Thr Lys Thr Asp Gly
S. cerevisiae    Tyr Val Cys Glu Lys Thr Asp Gly
RNA ligase       Tyr Ile Leu Thr Lys Glu Asp Gly
The motif Lys-X-Asp-Gly is also conserved at the known active site regions of DNA ligases (38) and T4 RNA ligase (39) (as shown). This is particularly striking because the ligase reaction entails formation of a covalent enzyme-nucleotidyl intermediate that consists of an AMP moiety bound to the ε-amino group of a lysine residue (40, 41).
(i) $\mathrm{E} + \mathrm{pppA}\ (\text{or NAD}) \rightleftharpoons \mathrm{EpA} + \mathrm{PP_i}\ (\text{or NMN})$
(ii) $\mathrm{EpA} + \mathrm{pRNA}\ (\text{or DNA}) \rightleftharpoons \mathrm{AppRNA}\ (\text{or AppDNA}) + \mathrm{E}$
The AMP is then transferred to the 5' end of a monophosphate-terminated polynucleotide to generate a blocked 5'-5' phosphoanhydride bridge structure (AppN) analogous to the unmethylated RNA cap. The conserved lysine residue (Lys-260 of vaccinia D1 protein) within the KTDG motif is essential for nucleotidyl transfer, suggesting that Lys-260 is the active site (35). This was confirmed by direct mapping of the GMP-bound peptide (42). Any of several conservative amino acid substitutions at Lys-260 abrogated the ability of the D1 protein to form a covalent adduct with GTP (35). Mutation of Gly-263 in the KTDG motif to Val or Ala also completely abolished EpG formation. In contrast, mutation of Asp-262 to Asn did not inhibit guanylyltransferase activity. The effects of mutations in the vaccinia capping enzyme KXDG sequence on nucleotidyl transfer are similar to those reported for mutations at corresponding positions in T4 RNA ligase and mammalian DNA ligase (39, 43). Comparison of the sequences of capping enzymes and polynucleotide ligases from diverse sources suggested that KX(D/N)G may be a signature element for covalent catalysis in nucleotidyl
transfer. More recent experiments support the prediction that KTDG constitutes the active site of yeast capping enzyme (see Section V,A).
The limits of the guanylyltransferase domain within the D1 protein have been defined crudely by partial proteolysis. A 59-kDa amino-terminal proteolytic fragment of the D1 polypeptide constitutes an autonomous guanylyltransferase/triphosphatase domain that lacks methyltransferase activity. Recently, Myette and Niles (44) expressed and purified a fully active guanylyltransferase domain consisting of the region of D1 from residues 1 to 545. Further truncation to produce the derivative D1(1-520) resulted in total loss of guanylyltransferase activity.
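The proposed KX(D/N)G signature can be illustrated with a short pattern search. The sketch below is only an illustration: the one-letter strings are translations of the four aligned segments quoted above, and the names `SEGMENTS`, `KXDG`, and `find_motif` are hypothetical, introduced here for the example.

```python
import re

# One-letter translations of the aligned active-site segments quoted in the text.
SEGMENTS = {
    "vaccinia D1": "YAVTKTDG",
    "Shope fibroma virus": "YVTTKTDG",
    "S. cerevisiae": "YVCEKTDG",
    "T4 RNA ligase": "YILTKEDG",
}

# K-X-(D/N)-G: lysine, any residue, Asp or Asn, then glycine.
KXDG = re.compile(r"K.[DN]G")


def find_motif(seq):
    """Return (0-based offset, matched text) for every KX(D/N)G hit in seq."""
    return [(m.start(), m.group()) for m in KXDG.finditer(seq)]


hits = {name: find_motif(seq) for name, seq in SEGMENTS.items()}
for name, found in hits.items():
    print(name, found)
```

A match only flags a candidate site; as the mutational data above show, assignment of the active-site lysine still rests on biochemical evidence.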
B. Methyltransferase Domain of Vaccinia Capping Enzyme
Cap methyltransferase catalyzes methyl group transfer from S-adenosylmethionine to the 5' guanine nucleoside of the cap, as follows:
$\mathrm{GpppRNA} + S\text{-adenosylmethionine} \rightarrow \mathrm{m^7GpppRNA} + S\text{-adenosylhomocysteine}$
The methylation step is essentially irreversible. Because reversal of the guanylyltransferase reaction is blocked by the addition of the methyl group, the effect of concomitant methylation is to pull the overall reaction equilibrium to the right, i.e., in the direction of cap formation. Cap methylation is also critical for cap function in promoting translation.
The methyltransferase domain of the vaccinia capping enzyme was initially localized to a complex consisting of the small subunit and a 347-amino-acid carboxyl-terminal portion of the large subunit (aa 498-844). The small subunit alone did not suffice for methyltransferase activity (30). It was proposed that the requirement for both subunits may explain the tight physical association of the two polypeptides in vivo. Subsequent studies showed that the purified carboxyl segment of the large subunit, D1(498-844), has a very weak intrinsic methyltransferase activity in the absence of the D12 protein (45, 46). Thus, the active site of the methyltransferase must reside in the D1 polypeptide per se. The basal level of activity of D1(498-844) is stimulated 50- to 100-fold by addition of purified D12 protein, which is catalytically inert (45, 46). Stimulation of methyltransferase activity by the D12 protein apparently requires that the two subunits form a complex. The D1(498-844) protein can heterodimerize with the D12 subunit when the subunits are coexpressed in vivo or in vitro, and the proteins can interact functionally when mixed in vitro (30, 45, 46). By expressing a more extensively truncated version of the large subunit in bacteria, it was shown that a 305-amino-acid region of D1
(residues 540-844) suffices for reconstitution of the methyltransferase domain, together with the D12 protein (46). This same D1(540-844) segment has weak intrinsic methyltransferase activity and can heterodimerize with the small subunit in vivo. A more extensively deleted protein, D1(579-844), was inactive for cap methylation.
Further dissection of the methyltransferase domain, by mapping the sites of substrate binding and by targeted mutagenesis, is in progress. AdoMet (labeled) can be specifically cross-linked by UV light to the D1 capping enzyme subunit (47). Because photoadduct formation was inhibited by AdoHcy, it was inferred that cross-linking had occurred at the methyltransferase active site. Peptide mapping studies localized the site of photocrosslinking to two fragments derived from the carboxyl region of the D1 protein: one fragment from amino acids 499-579 and a second from residues 806-844 (47). Significantly, the cross-linking of AdoMet to D1 was unaffected by association with the D12 subunit, which indicates that the stimulation of methyltransferase activity by the D12 protein is not attributable to enhanced affinity for the methyl donor. Cross-linking of GTP (a methyl acceptor) to the carboxyl segment of the D1 protein has also been demonstrated (48).
Our initial efforts to map precisely the methyltransferase domain by mutagenesis were guided by the alignment of the carboxyl regions of the vaccinia virus D1 polypeptide with related polypeptides encoded by Shope fibroma virus (37) and African swine fever virus (49) (Fig. 2). Nine mutated alleles were created that contained single or clustered alanine substitutions at amino-acid residues conserved between all three viral proteins (indicated by asterisks in Fig. 2).
Because the choice of mutated residues was dictated by identity among three viral capping enzyme large subunits, it was anticipated that many of these mutations would have functional consequences, and, indeed, seven of nine mutated proteins were defective for methyltransferase activity in the presence of the D12 subunit. In most cases, the lack of methyltransferase activity could be explained simply by the inability of the alanine-substituted D1 protein to heterodimerize with D12 (46). However, in the case of the H682A-Y683A substitution, the mutant protein was defective specifically for methyltransferase activity, but not for subunit interaction (46, 50). The D1(498-844)H682A-Y683A and D12 subunits were coexpressed in bacteria and purified as a 1:1 heterodimer, which was inactive in methyl transfer (50). Presumably, the residues H682/Y683 constitute part of the methyltransferase active site. The effects of single alanine substitutions at each residue within the conserved IHY motif of the D1 protein confirm this view (50). The single mutation of His-682 to Ala resulted in a reduction to 1/40th of the specific activity of the heterodimeric methyltransferase domain. The single mutation of Tyr-683 to Ala reduced activity to 1/3000th. Mutation of the conserved upstream vicinal residue Ile-681 to Ala had a milder effect, resulting in a decrease to 1/8th in the activity. It remains to be determined whether these mutations at the putative active site are affecting substrate binding or reaction chemistry, or both.
Vac  YA-NDKYRLNPDVSYFTNKRTRGPLGILSNYVK-------TLLISLYCSKTFLDNSNK~
SFV  YA"DKFRLNPEVSYFTNKRTRGPLGILSNYVK-------TLLISMYCSKTFLDDSNKRK
ASF  FKTAELTWLNYMDPFSFEELAKGPSGMYFAGAKTGIYRAQTALISFIKQEIIQKISHQSW

Vac  VLAIDFGNG--ADLEKYFYGEIALLVATDPDADAIARGNERYNKLNSGIKTKYYKFDYI--Q
SFV  VLAIDFmJG--ADLEKYFYGEISSLVATDPDKEAIGRCIERYNSLNSGIKSKYYKFDYI--Q
ASF  G--IDLGIGKGQDLGRYLDAGGRHLVGIDKDQTALAELVYRKFSHATTRQH~ATNIYVLHQ

Vac  ETIRSDTFVS-SVREVFYFGKFNI--IDWQFAIHYSFHPRHYATVNLSE-LTASGG~
SFV  ETIRSVTYVS-SVREVFFFGKFDL--VDWQFAIHYSFHPKHYAT~NLTE-LTASGG~
ASF  DLAEPAKEISEKVHQIYGFPKEGASSIVSNLFIHYLMKNTQQVENLAVLCHKLLQPOOMV

Vac  LITTMDGDKLSKLTDKKTFIIHKNLPSSENYM
SFV  LITTMDGDLLSQLTDKKTFVIHKNLPSSENYM
ASF  WFTTMLGEQVLELLHENRIELNEVWEARENEV
[Identity dots, asterisks, and bold or underline marking are not reproduced.]
FIG. 2. Alignment of the vaccinia virus (Vac) D1 protein sequence with that of homologous polypeptides encoded by Shope fibroma virus (SFV) and African swine fever (ASF) virus. The predicted amino-acid sequences of the capping enzyme large subunits encoded by vaccinia virus, Shope fibroma virus, and African swine fever virus are aligned over a region corresponding to residues 541-739 of the vaccinia protein. Identical residues are indicated by a double dot between the lines whereas conserved residues are denoted by a single dot. Residues of the vaccinia D1 gene product subjected to alanine substitution mutagenesis are shown in bold type and are marked by an asterisk. The location of the conserved IHY motif essential for methyltransferase activity is underlined in bold.
C. Triphosphatase Domain of Vaccinia Capping Enzyme
Vaccinia capping enzyme has an intrinsic triphosphatase activity that hydrolyzes the γ phosphate from triphosphate-terminated mRNAs or from synthetic triphosphate-terminated homopolymeric RNA substrates (5, 6). The enzyme also hydrolyzes the γ phosphate from nucleoside triphosphates to yield nucleoside diphosphate and Pi (6, 24). The NTPase activity is strongly purine-specific, but displays no preference for deoxy versus ribonucleotides (6). GTP and dGTP are the preferred substrates, with ATP and dATP being hydrolyzed about one-third to one-half as well as guanine nucleotides. Activity with pyrimidine NTPs is 4-10% of that with GTP. It has been hypothesized that the RNA triphosphatase and NTPase activities reflect a common active site for γ-phosphate cleavage (6). The fact that the Km for RNA ends (0.6 µM) is three orders of magnitude lower than the Km for ATP (0.8 mM) suggests that RNA ends are the preferred substrate for the triphosphatase (6). The turnover number for hydrolysis of ATP by the full-sized D1/D12 heterodimeric enzyme is 6-10/second (51).
The triphosphatase domain was colocalized initially with the guanylyltransferase domain to the amino-terminal 59-kDa segment of the large capping enzyme subunit. The deletion derivative D1(1-545) is fully active in ATP hydrolysis, with a turnover number of 8/second (44). However, the triphosphatase and guanylyltransferase domains are clearly not identical, because mutation of the active-site lysine of the guanylyltransferase (Lys-260), which abrogates enzyme-GMP complex formation, has no effect on the triphosphatase activity of the enzyme (51). No mutations that selectively affect triphosphatase activity have been described as yet. Thus, the location of the triphosphatase active site remains obscure. It is worth noting that the sequence of the D1 protein does not include any of the motifs commonly implicated in nucleotide binding.
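The "three orders of magnitude" comparison of the two Km values can be checked with a few lines of arithmetic. This is a minimal sketch using only the values quoted above; the variable names are illustrative.

```python
import math

KM_RNA_ENDS_M = 0.6e-6  # 0.6 µM, Km for triphosphate-terminated RNA ends (6)
KM_ATP_M = 0.8e-3       # 0.8 mM, Km for ATP (6)

# Fold difference between the two Km values and its base-10 logarithm.
fold_difference = KM_ATP_M / KM_RNA_ENDS_M
orders_of_magnitude = math.log10(fold_difference)

print(f"Km(ATP) / Km(RNA ends) = {fold_difference:.0f}-fold "
      f"(about {orders_of_magnitude:.1f} orders of magnitude)")
```

The ratio comes out a little above 1300-fold, consistent with the statement in the text.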
II. What Is the Rate-limiting Step in Cap Formation?
Synthesis of the cap entails four separate chemical steps: (1) γ-phosphate cleavage, (2) GMP transfer from GTP to enzyme, (3) GMP transfer from enzyme to diphosphate-terminated RNA, and (4) methyl transfer from AdoMet to the cap guanosine. In order to achieve any understanding of the regulation of cap formation, we must first define the rate-limiting step and the kinetic parameters for each reaction. Unfortunately, there has been no rigorous kinetic study under single-turnover conditions that has addressed the individual rate constants for all steps. Nor has there been an adequate study under multiple-turnover (steady-state) conditions. Nonetheless, review of the literature allows some important conclusions and predictions about which steps might be rate-limiting.
For example, it is known from early studies (5, 6) that the RNA triphosphatase reaction is much faster than GMP transfer from GTP to RNA. Although crude estimates of turnover numbers based on the data of Venkatesan et al. (5) (20/minute for Pi release from RNA and 0.5/minute for GMP transfer to RNA) yield values that probably underestimate the reaction rates, their key finding was that the specific activity of vaccinia capping enzyme for RNA triphosphatase was at least 50 times the specific activity of the guanylyltransferase (5). Thus, it is quite unlikely that capping is subject to regulation at the first chemical step. The turnover number of the capping enzyme in hydrolysis of ATP, which is on the order of 6-10/second, is presumably more reflective of the actual triphosphatase reaction rate (44, 51).
The guanylyltransferase reaction involves two steps. The formation of the covalent enzyme-GMP complex is certain to be much more rapid than GMP transfer from enzyme to the RNA end; this inference is based on early studies of GTP-PPi exchange by purified capping enzyme (6). Because the molar concentration of active enzyme could be accurately determined by titration of enzyme-GMP complex formation, the turnover number in GTP-PPi exchange of 1.8/second [calculated from the data of Shuman and Hurwitz (31)] affords an estimate of the reaction rate. The relative specific activities of purified vaccinia capping enzyme for ATPase, GTP-PPi exchange, and (guanine-7)methyltransferase reported by Shuman et al. (6) provide the following information. The ratio of the specific activity of ATPase to that of GTP-PPi exchange was 7:1. This value is in the same range as the ratios of the respective turnover numbers derived from separate studies using different enzyme preparations (i.e., 6-10/second for ATPase compared with 1.8/second for GTP-PPi exchange). The specific activity of methyltransferase was 1/270th that of GTP-PPi exchange. Although this study may have underestimated the rate of transmethylation, it is nevertheless clear that the rate-limiting reaction in capping is either the transmethylation step or the transfer of GMP from enzyme to RNA and not the formation of the enzyme-GMP intermediate.
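The comparison of rates in the preceding paragraph can be laid out numerically. The sketch below uses only the figures quoted in the text; applying the 1/270 ratio to the 1.8/second exchange rate combines numbers from separate studies and is an assumption made here purely for illustration, as are the variable names.

```python
ATPASE_TURNOVER_PER_S = (6.0, 10.0)     # range quoted for ATP hydrolysis (44, 51)
GTP_PPI_EXCHANGE_PER_S = 1.8            # calculated from the data of (31)
REPORTED_SPECIFIC_ACTIVITY_RATIO = 7.0  # ATPase : GTP-PPi exchange (6)
MTASE_FRACTION_OF_EXCHANGE = 1 / 270    # methyltransferase relative to exchange (6)

# Ratio of turnover numbers taken from the separate studies.
ratio_low = ATPASE_TURNOVER_PER_S[0] / GTP_PPI_EXCHANGE_PER_S
ratio_high = ATPASE_TURNOVER_PER_S[1] / GTP_PPI_EXCHANGE_PER_S

# Assumption: scale the exchange rate by 1/270 to get an implied
# methyltransferase rate, expressed per minute.
implied_mtase_per_min = GTP_PPI_EXCHANGE_PER_S * MTASE_FRACTION_OF_EXCHANGE * 60

print(f"ATPase/exchange from turnover numbers: {ratio_low:.1f} to {ratio_high:.1f}")
print(f"ATPase/exchange from specific activities: {REPORTED_SPECIFIC_ACTIVITY_RATIO:.0f}")
print(f"Implied methyltransferase rate: {implied_mtase_per_min:.1f}/minute")
```

The turnover-number ratios (roughly 3 to 6) sit in the same range as the reported 7:1 specific-activity ratio, which is the point made in the text.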
In their initial characterization of the capping enzyme, Martin and Moss assayed by double-labeling the kinetics of GMP and methyl-group addition to diphosphate-terminated RNA. They found both reactions to be linear with respect to time but that only one methyl group was incorporated for every four to five residues of GMP (52). The concentration of AdoMet in this reaction was near the Km value and may have partially limited the rate of methylation of newly guanylylated ends. More recent experiments with recombinant capping enzyme, in which capping is assayed by incorporation of labeled GMP into RNA, followed by direct product analysis of the capped ends, suggest that virtually all caps formed in the presence of unlabeled AdoMet are indeed methylated (25, 28). In other words, the methylation reaction is probably not slower than the GMP transfer step to RNA. The turnover number for the D1(498-844)/D12 methyltransferase domain is 9 mol of cap methylated per mole of enzyme per minute, which is essentially identical to the value of 8/minute determined for “full-length”
D1/D12 heterodimer purified from bacteria (46). Thus, the methyltransferase domain is no less potent than the whole enzyme. These values for the methyltransferase turnover number are in the same range as the Vmax determinations (0.5-1.6/minute) reported using a different assay procedure (45). Using these values, it appears that the rate of methyltransferase is at least 1/12th the rate of GTP-PPi exchange.
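The 1/12th figure follows from putting the two turnover numbers quoted in the text on a common time base. In the worked equation below, the symbols $k_{\text{methyl}}$ (9/minute for the methyltransferase domain) and $k_{\text{exchange}}$ (1.8/second for GTP-PPi exchange) are labels introduced here for convenience.

```latex
\[
\frac{k_{\text{methyl}}}{k_{\text{exchange}}}
  = \frac{9\ \text{min}^{-1}}{1.8\ \text{s}^{-1} \times 60\ \text{s}\,\text{min}^{-1}}
  = \frac{9}{108}
  = \frac{1}{12}
\]
```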
III. Cotranscriptional Capping of Nascent mRNA
Biochemical studies of vaccinia mRNA capping are performed in solution using purified enzymes and free RNA substrates. In the “real world,” however, capping of viral early mRNAs occurs within the vaccinia virion, a very large (0.2 × 0.3 µm), highly compacted, and presumably highly constrained nucleoprotein complex. Indeed, the virus core particle contains all the enzymes necessary for transcription of approximately 80-100 early genes encoded by the 192-kbp DNA genome (53). These enzymes include a multisubunit DNA-dependent RNA polymerase and the several proteins required for 5’ capping and 3’ polyadenylylation of the early mRNAs. Capping does not normally occur on free RNA, and thus should not be considered a posttranscriptional event. Rather, capping is cotranscriptional, and the true substrate for the capping enzyme is the ternary complex of template DNA, RNA polymerase, and nascent RNA.
To understand how and when a capping enzyme acts on the transcription elongation complex, Hagler and Shuman (54, 55) prepared homogeneous populations of ternary complexes paused at unique template positions downstream of a vaccinia early promoter, and probed the structure of these halted complexes in solution using several footprinting approaches. Comparing the properties of complexes halted at varying distances from the start site of transcription affords a “freeze-frame” view of the elongation complex as it makes incremental progress along the template (54, 55). The aspects of this work pertinent to the capping question involve the configuration of the nascent RNA within the complexes and the time of 5’-end modification. RNase A was used to footprint the labeled nascent transcript. RNA polymerase protected an 18-base RNA segment extending back from the 3’ growing point of the chain (54). This protection was attributed to an RNA-binding domain within the RNA polymerase.
The dimensions of the binding domain (18 bases of RNA) did not change as the polymerase translocated down the template. The size of the RNA-binding domain determined the accessibility of the nascent transcript to modification by the mRNA capping enzyme. Cotranscriptional capping was confined to RNAs ≥31 nucleotides long, whereas transcripts ≤27 nucleotides were uncapped (54). This is not
attributable to an inherent RNA size preference on the part of the capping enzyme per se. Rather, the results indicate that a critical chain length must be extruded from the polymerase before capping enzyme can interact with the 5’ end. It was posited that capping enzyme might interact with RNA polymerase at, or shortly after, the time of transcription initiation and thereby be poised to cap the 5’ end as soon as it is extruded from the RNA polymerase. In support of this model, we have shown that purified capping enzyme forms a binary complex with vaccinia RNA polymerase in solution (54). Capping enzyme and polymerase interacted in the absence of nucleic acid. No complex could be detected between capping enzyme and purified E. coli RNA polymerase.
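The chain-length dependence described above can be summarized as a small lookup. This is a sketch under stated assumptions: the three constants come from the text (54), the function names are hypothetical, and lengths between the two thresholds are reported as unresolved because the text gives no data for them.

```python
PROTECTED_3PRIME_NT = 18  # RNA segment protected by the polymerase (RNase A footprint)
MAX_UNCAPPED_NT = 27      # transcripts this long or shorter were uncapped
MIN_CAPPED_NT = 31        # transcripts this long or longer were capped


def exposed_5prime_nt(length):
    """Nucleotides extending beyond the protected 3' segment (never negative)."""
    return max(0, length - PROTECTED_3PRIME_NT)


def capping_status(length):
    """Classify a nascent transcript by the thresholds quoted in the text."""
    if length >= MIN_CAPPED_NT:
        return "capped"
    if length <= MAX_UNCAPPED_NT:
        return "uncapped"
    return "not resolved"  # lengths between the thresholds are not reported


for n in (20, 27, 29, 31, 40):
    print(n, exposed_5prime_nt(n), capping_status(n))
```

On these numbers, capping is first seen when about 13 nucleotides lie outside the protected segment, whereas 9 exposed nucleotides are not enough.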
IV. Direct Role for Capping Enzyme in Vaccinia Virus Transcription
As if the vaccinia capping enzyme were not busy enough catalyzing three distinct functions in 5’ processing of mRNAs, nature has endowed it with direct roles in the transcription of the early and intermediate classes of vaccinia genes. The 3’ ends of vaccinia early mRNAs arise from true termination rather than endonucleolytic cleavage. Termination requires a cis-acting heptamer sequence, UUUUUNU, in the nascent RNA strand that is sufficient to induce termination at heterogeneous sites downstream of the signal (56, 57). Vaccinia RNA polymerase by itself cannot terminate in response to the UUUUUNU signal, but requires a separate viral termination factor (VTF) that is identical to the vaccinia mRNA capping enzyme (7). Although a detailed consideration of the termination mechanism is beyond the scope of this review, several features of the termination reaction are noteworthy. First, it is likely that the capping enzyme elicits termination through interactions with the nascent RNA and perhaps with one or more polypeptide constituents of the transcription elongation complex (58). Contacts between the capping enzyme and the nascent RNA have been localized by UV crosslinking to the large capping enzyme subunit (58). Second, the 5’ cap structure is not required in any way for transcription termination (58). Third, the termination event is subject to temporal control during the vaccinia life cycle, insofar as the VTF-dependent termination pathway applies only to viral early genes.
The autonomous catalytic domains for the individual capping reactions are incapable of promoting termination. Thus, the D1 subunit by itself, which is fully active in triphosphatase and guanylyltransferase functions, has no demonstrable VTF activity in vitro (51). Similarly, the heterodimeric
methyltransferase domain of D1(498-844) and D12 subunits has no VTF activity (51). Neither does the D12 subunit by itself. Apparently, both full-length subunits are required for transcription termination. A single amino-acid substitution at the active site Lys-260 of the vaccinia capping enzyme D1 subunit (which abolishes enzyme-GMP complex formation) has no effect on the termination factor activity of the mutant D1/D12 heterodimeric enzyme in vitro (51). In the same vein, the H682/Y683 mutation of the D1 subunit, which completely abrogates methyltransferase activity, has no effect on VTF activity of the mutant D1(H682/Y683)/D12 heterodimer (50). Thus, there must exist a domain for termination distinct from the catalytic domains for nucleotidyl transfer and methyl transfer. A key question is whether the triphosphatase function of the capping enzyme might play a role in termination, given the finding that VTF-dependent termination requires ATP hydrolysis (58). Presently, no mutation has been identified that selectively affects the transcription termination factor activity of the capping enzyme.
Transcription of vaccinia intermediate genes is driven by a distinct class of promoter element (59, 60). The choice of promoters is dictated by class-specific transcription initiation factors that act on a common “core” RNA polymerase. Whereas the vaccinia capping enzyme appears to have no role in the initiation phase of early transcription, the capping enzyme is actually required as a transcription initiation factor for intermediate transcription (8). Intermediate transcription initiation requires at least two other protein factors besides capping enzyme and RNA polymerase (8, 61-63). Although the specific protein component that binds to the intermediate promoter element has not been defined, it is suggested that protein-protein interactions between capping enzyme and the polymerase are relevant to the initiation event.
Cap formation is not required for intermediate transcription (64). Structure-function relationships for the initiation factor activity of capping enzyme remain to be explored.
Can we anticipate additional roles for capping enzyme in vaccinia biology? At this point, nothing about this protein would come as too great a surprise, especially after the suggestion that capping enzyme plays a role in formation of the hairpin telomeres at the ends of vaccinia genomic DNA (65). Of particular interest is whether capping enzyme plays some direct role in the transcription of the late class of vaccinia virus genes. In my view, the key unanswered question regarding the role of capping enzyme in vaccinia transcription is this: Why is it that only early mRNAs are terminated in response to the UUUUUNU signal, whereas mRNAs transcribed from intermediate and late genes (which contain one or more copies of the signal) are not? I would predict that capping enzyme is necessary, but not sufficient, to elicit transcription termination by vaccinia RNA polymerase, and that coupling of termination to initiation at early promoters reflects a requirement for additional protein factors present only on early transcription units. Current efforts are focused on testing this model.
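Locating copies of the UUUUUNU heptamer in a transcript is a simple pattern search, sketched below. The function name and the example sequences are hypothetical. As the text stresses, the presence of the signal does not by itself predict termination, since intermediate and late transcripts also carry it.

```python
import re

# UUUUUNU with N = any nucleotide; the lookahead reports overlapping copies too.
TERMINATION_SIGNAL = re.compile(r"(?=(U{5}[ACGU]U))")


def find_termination_signals(rna):
    """Return 0-based start positions of every UUUUUNU heptamer in rna.

    DNA-sense input is accepted by converting T to U first.
    """
    rna = rna.upper().replace("T", "U")
    return [m.start() for m in TERMINATION_SIGNAL.finditer(rna)]


print(find_termination_signals("GGAUUUUUAUCC"))  # made-up example transcript
print(find_termination_signals("GGAUUUUAUCC"))   # only four U's before the A
```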
V. Yeast mRNA Capping Enzyme
The mechanism of mRNA capping, the role of capping enzyme in transcription, and the role of the cap in RNA metabolism are major issues that are most effectively approached in a system that permits biochemical and genetic analysis. Whereas much has already been achieved in studies of the vaccinia system, which is powerful biochemically, it has not been easy to examine vaccinia capping enzyme function in vivo. Therefore, in collaboration with Beate Schwer’s laboratory at Rutgers/UMDNJ, we have turned our attention to yeast. The capping enzyme purified from S. cerevisiae is a bifunctional complex consisting of two polypeptides of 80 and 52 kDa (21). RNA triphosphatase activity is intrinsic to the 80-kDa subunit, whereas the 52-kDa subunit contains guanylyltransferase activity (19-21). Extensive analysis of this enzyme by Mizumoto and colleagues culminated in the isolation of the CEG1 gene encoding the 52-kDa (459-amino-acid) guanylyltransferase subunit and the demonstration that this gene is essential for cell viability (26).
A. Active Site of Yeast Guanylyltransferase: Mutational Analysis
A strong clue to the location of the active site lysine within the CEG1 protein was provided by studies of the vaccinia enzyme, whose active site was assigned by mutational analysis (35) to Lys-260, a result confirmed by peptide mapping (42). This lysine is situated within a motif, KTDG, that is conserved among the guanylyltransferases from yeast (KTDG), vaccinia (KTDG), Shope fibroma virus (KTDG), and African swine fever virus (KADG). A similar motif occurs at the active sites of T4 RNA ligase (KEDG), mammalian DNA ligase (KYDG), and yeast tRNA ligase (KANG) (38, 39, 66). To evaluate the hypothesis that Lys-70 of the CEG1 protein is the site of covalent guanylylation, this residue was changed to alanine via oligonucleotide-directed mutagenesis of the cloned CEG1 gene. Single alanine mutations were also created at conserved residues Thr-71, Asp-72, and Gly-73. These mutated alleles were expressed in bacteria in parallel with the wild-type gene. Assay of enzyme-GMP complex formation by soluble bacterial extracts indicated that the K70A and G73A mutations completely abolished enzyme activity (67). In contrast, the T71A and D72A mutant proteins retained EpG-forming activity, albeit at a reduced level compared to the wild-type protein. Thus, K70 and G73 of the yeast capping enzyme were essential
for transguanylylation in vitro. The concordance of these findings with previous studies of nucleotidyl transfer by mutated versions of vaccinia capping enzyme and of mammalian DNA ligase suggests that Lys-70 is indeed the active site of the yeast capping enzyme. Substitution of Lys-70 of CEG1 with either Ile or Thr abrogates EpG formation in vitro (68).
B. Structure Probing of Yeast Guanylyltransferase by Limited Proteolysis
Limited proteolysis is a classical approach to probing the conformation of proteins in the native state, and one that has been used successfully for domain mapping of the capping enzymes from vaccinia virus (27) and from brine shrimp (17). We have analyzed the susceptibility of the yeast capping enzyme-GMP complex to limited digestion with trypsin, chymotrypsin, and Staphylococcus V8 proteinases. Digests of purified “His-tagged” enzyme
[Figure 3 diagram: V8, chymotrypsin (Chymo), and trypsin cleavage sites marked by arrows on a linear map of the 459-residue CEG1 polypeptide.]
Peptide (MW): p28-GMP, p25, p18, p13-GMP, p35-GMP, p20, p39, p26, p16, p15-GMP
Amino terminus: 4; 231; 93; 4; N-His; 305, 307; 122, 119; 254; 119; N-His
FIG. 3. Structure probing of the yeast guanylyltransferase by limited proteolysis. Sites of cleavage of purified His-tagged CEG1 protein by V8 protease, trypsin, and chymotrypsin under limiting digestion conditions are indicated by arrows above and below a linear diagram of the CEG1 polypeptide. Below the diagram, the major proteolytic products are indicated by “p” followed by the approximate size of the peptide (in kDa, as estimated by SDS-PAGE). Peptides that contained a covalently bound radiolabeled guanylate moiety are denoted as such. The residues found at the experimentally determined amino termini of the peptide fragments are shown. “N-His” indicates the proteolysis product derived from the original amino terminus of the His-tagged CEG1 protein.
were performed at room temperature for 15 minutes at several concentrations of proteinase, followed by denaturation and electrophoretic analysis of proteolytic products. Stable proteolytic products were detected by Coomassie-blue staining, and the polypeptide fragments containing covalently bound [32P]GMP were identified by autoradiography. After transfer to PVDF membranes, the N-terminal sequences of the proteolytic fragments were determined. The location of protease-sensitive sites, illustrated in Fig. 3, provides a crude map of accessible regions of the native protein.
The sizes of the major proteolytic fragments generated by chymotrypsin, trypsin, and V8 are listed in Fig. 3. The amino-terminal position of the sequenced fragment is specified. Fragments retaining the intact amino terminus of the His-tagged capping enzyme are indicated as “N-His”. All internal cleavage sites are consistent with the known specificities of the proteinases employed. Polypeptides containing covalently bound GMP are indicated. Although the carboxyl position of these species was not directly mapped, it was surmised, based on size and the locations of internal cleavage sites, that the limited proteolysis generated the family of polypeptides illustrated in Fig. 3. Peptides with covalently bound GMP are darkly shaded; those without nucleotide are lightly shaded. A comparison of peptides containing bound GMP with those lacking nucleotide makes clear that the active site for covalent catalysis is located between amino-acid positions 4 and 93. Thus, the evidence from partial proteolytic mapping is consistent with assignment of the active site to Lys-70 within the conserved KTDG motif. Localization of the active site to Lys-70 by exhaustive proteolysis and sequencing of the GMP-bound peptide has been accomplished (68).
C. CEG1 Function in Vivo: Capping Activity Is Essential for Cell Growth
A plasmid-shuffle strategy was employed to assess the ability of mutated CEG1 alleles to support cell growth. Wild-type and alanine-substituted CEG1 alleles on centromeric plasmids marked with TRP1 were introduced into a haploid strain in which the chromosomal copy of CEG1 was deleted (and whose viability was contingent on maintenance of a CEG1 allele on an extrachromosomal CEN/URA3 plasmid). The transformants containing both CEG1 plasmids were then plated on 5-FOA to select against retention of the wild-type CEG1/URA3 plasmid. Cells bearing the wild-type CEG1/TRP1 plasmid grew readily (67). The T71A and D72A alleles, the protein products of which retained partial activity in vitro, also supported growth on 5-FOA. In contrast, the K70A and G73A alleles, which encoded catalytically inert guanylyltransferases, were unable to sustain growth under counterselective conditions (67). These results indicate that guanylyltransferase activity is essential for yeast viability.
A strict correlation of in vitro enzymatic activity and cell growth was also noted during analysis of deletion mutants of the yeast capping enzyme. The CEG1 gene was altered via site-directed mutagenesis, such that the mutated alleles, CEG(29-459) and CEG(1-431), would express amino- and carboxyl-truncated versions of the yeast capping enzyme in E. coli. Deletion of 28 residues from the amino terminus abrogated enzyme-guanylate formation in vitro. In contrast, the removal of 28 residues from the carboxyl end was benign (67). When tested for in vivo function in yeast, the catalytically impaired CEG(29-459) allele was lethal, whereas the active CEG(1-431) allele was viable (67). A more extensive carboxyl-deletion allele, CEG(1-366), was inactive in vitro and in vivo. A conditional lethal growth phenotype was elicited by placing the CEG1 open reading frame under the transcriptional control of a galactose-inducible promoter. Cells containing a single wild-type CEG1 allele under GAL control grew well on galactose but were unable to grow in the presence of glucose, i.e., when expression of the CEG1 gene is transcriptionally repressed (67). The conditional growth phenotype of the GAL-CEG1 strain, together with the isolation of temperature-sensitive mutant alleles of CEG1, will facilitate studies aimed at defining which aspects of RNA biogenesis and function are cap-dependent in vivo.
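The correlation between in vitro activity and viability described above can be laid out as a small Python sketch. The allele data are those reported in the text (67); treating "partial activity" as active, the 459-residue full length (taken from the allele names and the Fig. 4 numbering), and the helper name `residues_removed` are assumptions made purely for illustration.

```python
# Outcomes as reported in the text; "partial" in vitro activity is counted as active.
FULL_LENGTH = 459  # residues in the full-length yeast capping enzyme

alleles = {
    # name: (first residue, last residue, active in vitro, supports growth)
    "CEG1 (wild type)": (1, 459, True, True),
    "K70A": (1, 459, False, False),
    "T71A": (1, 459, True, True),
    "D72A": (1, 459, True, True),
    "G73A": (1, 459, False, False),
    "CEG(29-459)": (29, 459, False, False),
    "CEG(1-431)": (1, 431, True, True),
    "CEG(1-366)": (1, 366, False, False),
}

def residues_removed(first, last, full_length=FULL_LENGTH):
    """Return (amino-terminal, carboxyl-terminal) residues deleted."""
    return first - 1, full_length - last

# The correlation is "strict" if activity and viability agree for every allele.
concordant = all(active == viable for _, _, active, viable in alleles.values())

for name, (first, last, active, viable) in alleles.items():
    n_del, c_del = residues_removed(first, last)
    print(f"{name:18s} N-del={n_del:3d} C-del={c_del:3d} active={active} viable={viable}")
print("activity and viability concordant:", concordant)
```

The two 28-residue deletions fall out of the allele names directly (29 - 1 and 459 - 431), and the larger carboxyl deletion removes 93 residues.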
VI. Capping Enzyme from Schizosaccharomyces pombe
The conditional growth phenotype of the GAL-CEG1 strain has been exploited to isolate the gene encoding the capping enzyme from the fission yeast, S. pombe (69). The GAL-CEG1 strain was transformed with a S. pombe cDNA-2μ library. Transformants capable of growth on glucose medium were selected. Retransformation with plasmid DNA recovered from these isolates confirmed their ability to complement the growth defect of GAL-CEG1 on glucose. These isolates also complemented the Δceg1 null mutation in a plasmid-shuffle experiment. Restriction analysis of several independent clones indicated that a single gene is responsible for complementation. The 1.7-kbp cDNA contains a single long open reading frame encoding a predicted 402-amino-acid polypeptide that initiates at the first available ATG codon. The polypeptide from S. pombe (Sp) is obviously related to the S. cerevisiae capping enzyme (Sc), with 152 out of 402 identical residues, as indicated in the primary sequence alignment shown in Fig. 4. The alignment, which extends nearly the entire length of the two proteins, is punctuated by several discontinuities in which a sequence present in the CEG1 protein is not represented in the S. pombe polypeptide.
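How an identity count such as "152 out of 402" is obtained from a gapped alignment can be illustrated with a minimal Python sketch. It assumes the two rows below, transcribed from the first block of the Fig. 4 alignment, and a helper name (`count_identities`) that is hypothetical; the full-length figure is taken from the text rather than recomputed.

```python
def count_identities(row_a, row_b, gap="-"):
    """Count identical residues and gap-free columns in two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    identical = aligned = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # a discontinuity in either sequence is not scored
        aligned += 1
        if a == b:
            identical += 1
    return identical, aligned

# First alignment block of Fig. 4 (Sp above, Sc below), gaps as printed.
sp_block = "MAPSEKDIEEVSVPGVLAPRDDVRVLKTRIAKLLGTSPD---TFPGSQPVSFSKKHLQA-"
sc_block = "MVLAMESRVAPEIPGLIQPGNVTQDLKMMVCKLLN-SPKPTKTFPGSQPVSFQHSDVEEK"

identical, aligned = count_identities(sp_block, sc_block)
print(identical, aligned)  # identities and gap-free columns in this block

# Overall identity quoted in the text, relative to the 402-residue Sp protein.
overall_percent = round(100 * 152 / 402, 1)
print(overall_percent)
```

Expressed relative to the shorter S. pombe polypeptide, the quoted 152 identities correspond to roughly 38% identity.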
Sp  MAPSEKDIEEVSVPGVLAPRDDVRVLKTRIAKLLGTSPD---TFPGSQPVSFSKKHLQA-
Sc  MVLAMESRVAPEIPGLIQPGNVTQDLKMMVCKLLN-SPKPTKTFPGSQPVSFQHSDVEEK

Sp  LKEKNYFVCEKSDGIRCLLYMTEHPRY-ENRPSVYLFDRKIFYP-VENDKSG
Sc  LLAHDYYVCEKTDGLRVLMFIVINPVTGEQGC--FMIDRENNYYLVNGFRFPRLPQKKKE

Sp  --KKYHVD-TLLDGELVLDIYPGGKKQ-LRYLVFDCLACDG-----IVYMSRLLDKRLGI
Sc  ELLETLQDGTLLDGELVIQTNPMTKLQELRYLMFDCLAINGRCLTQSPTSSRLAH--LGK

Sp  FAKSIQKPLDEYTKTHM-RETAIFPFLTSLKKMELGHGILKLFNEVIPRLRHGNDGLIFT
Sc  EFF---KPYFDLRAAYPNRCT-TFPFKISMKHMDFSYQLVKVAKSLD-KLPHLSDGLIFT

Sp  CTETPYVS-GTDQSLL-KWKPKEMNTIDFMLKLEFAQPEE----------GDIDYS~PE
Sc  PVKAPYTAGGKD-SLLLKWKPEQENTVDFKLILDIPMVEDPSLPKDDRNRWYYNYDVKPV

Sp  FQLGWJEG-RNMYS-FFAFMYV-DEKE--------------------WEKLKSF~PLSE
Sc  FSLYWJQGGADVNSRLKHFDQPFDRKEFEILERTYRKFAELSVSDEEWQNLKNLEQPLNG

Sp  RIVACYLDENR--WRFLRFRDDKRDANHISTVKSVLQSIEDGVSKEDLLKEMPIIREAYY
Sc  RIVECAKNQETGAWEMLRFRDDKLNGNHTSWQKVLESINDSVSLEDLEEIVGDIKRCWD

Sp  NRKKPSVTK--RKLDETSNDDAPAIKKVAKESEKEI (402)
Sc  E&MAGGSG;~P~PSQ&~ATLSTS~PVHSQPPSNDKEPKYVDEDDWSD (459)

FIG. 4. Sequence alignment of mRNA capping enzymes from Schizosaccharomyces pombe (Sp) and Saccharomyces cerevisiae (Sc). Identical amino acids are indicated by a double dot between the lines and conserved residues are denoted by a single dot. Discontinuities in the alignment are indicated by dashes (-). The active site lysine (K) residues are indicated in bold. There are 152 out of 402 identical residues.
Confirmation that the cDNA isolated by complementation actually encodes a functional guanylyltransferase was obtained by expressing the ORF in E. coli (69). The expressed protein can form a 47-kDa enzyme-GMP complex in vitro. [Accordingly, the gene encoding the S. pombe capping enzyme has been designated PCE1.] The 47-kDa PCE1 protein includes the sequence KSDG, which is related to the KTDG motif at the active site of the guanylyltransferases from S. cerevisiae and vaccinia, implying that residue K67 is the active site of the S. pombe capping enzyme. This is supported by the finding that mutation of residue K67 to alanine abrogates PCE1 function in vivo.
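The assignment of the active-site lysine by motif can be sketched in Python as a simple pattern search. The sequences are the first two alignment blocks of Fig. 4 with gaps removed (the S. pombe tetrapeptide is entered as KSDG, as stated in the text); the regular expression and the function name `first_kxdg` are illustrative assumptions, not a published method.

```python
import re

# First two alignment blocks of Fig. 4, gaps included as printed.
ALIGNED = {
    "Sp": "MAPSEKDIEEVSVPGVLAPRDDVRVLKTRIAKLLGTSPD---TFPGSQPVSFSKKHLQA-"
          "LKEKNYFVCEKSDGIRCLLYMTEHPRY-ENRPSVYLFDRKIFYP-VENDKSG",
    "Sc": "MVLAMESRVAPEIPGLIQPGNVTQDLKMMVCKLLN-SPKPTKTFPGSQPVSFQHSDVEEK"
          "LLAHDYYVCEKTDGLRVLMFIVINPVTGEQGC--FMIDRENNYYLVNGFRFPRLPQKKKE",
}

KXDG = re.compile(r"K.DG")  # Lys, any residue, Asp, Gly

def first_kxdg(aligned_seq):
    """Return (1-based position of the Lys, matched tetrapeptide), or None."""
    ungapped = aligned_seq.replace("-", "")
    match = KXDG.search(ungapped)
    if match is None:
        return None
    return match.start() + 1, match.group()

for name, seq in ALIGNED.items():
    print(name, first_kxdg(seq))
```

With these inputs the first match falls at Lys-67 (KSDG) in the S. pombe sequence and at Lys-70 (KTDG) in the S. cerevisiae sequence, in agreement with the residue numbers used in the text.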
VII. Sequence Conservation among Capping Enzymes and Polynucleotide Ligases
The conserved KTDG element at the guanylyltransferase active site was first noted during scanning “by eye” for sequence similarities between capping enzyme and various polynucleotide ligases (35). It had been predicted in 1980 (6) that capping enzyme and ligase would share a common mechanism of covalent catalysis. That the active sites are so similar suggested that other structural features may be conserved, thus prompting further sequence-gazing to root out candidate motifs. This was done by first inspecting the regions of conservation between the CEG1 and PCE1 proteins, then searching by eye for similar elements in the capping enzymes of vaccinia virus (36), Shope fibroma virus (37), and African swine fever virus (49). In addition to the active site KTDG (denoted as Motif I), four other conserved sequence elements were discerned, which are referred to as Motifs II-V, and which are situated within the CEG1 polypeptide as shown in Fig. 5. Remarkably, these motifs are also conserved among the numerous members of the polynucleotide ligase family (41). The aligned amino-acid sequences of the five conserved regions are shown in Fig. 5 for the capping enzyme (CE) and polynucleotide ligase (DNA or RNA) from the indicated sources. What is most striking about these sequence motifs is that they are arranged in the same order, and with nearly identical spacing, in all capping enzymes and in most of the polynucleotide ligases. Motif I encompasses the KXDG element at the active site of covalent enzyme-NMP adduct formation. The X residue is not strictly conserved, but there is a preference for Thr among the guanylyltransferases and for Tyr in the ligases. Within the capping enzyme family, a Tyr located four residues upstream of the active site Lys is also conserved. Motif II, consisting of RFP, or closely related triplets, is found in some of the family members, but not in others, as indicated in Fig. 5. Motifs III and IV are highly conserved.
Motif V, which displays a more subtle pattern of conservation, can be viewed as bipartite. The KWKP sequence in the upstream half of Motif V is identical in the capping enzymes from the two yeasts and the African swine fever virus; the closely related sequence KLKP is found in the African swine fever virus DNA ligase. The poxvirus capping enzymes and the other DNA ligases contain an invariant Lys in this region (xxKx). The downstream portion of Motif V includes the sequence (E/D)NTVD, which is highly conserved among the five capping enzymes. The ligases have an invariant Asp residue in this region (Dxxxx). Is primary sequence conservation between capping enzyme and ligases relevant to the common catalytic mechanism? To test this, alanine substitution mutations were introduced into the CEG1 protein at residues within conserved Motifs I-V (69). (Residues mutated are indicated by asterisks in Fig. 5). CEG1-Ala alleles in CEN:TRP1 plasmids were tested for in vivo function using the plasmid-shuffle procedure. Inability of CEG1-Ala alleles to sustain cell growth on medium containing 5-FOA (which selects against a resident CEG1:URA3 plasmid) indicates that the side chain of the affected residue is essential for protein function. It was anticipated that some of the Ala-substitution mutations in the conserved motifs might be tolerated, while others would be lethal, and still others might confer a conditional growth defect. Consequently, all mutated CEG1 alleles were screened initially for growth at 25°C. CEG1-Ala strains viable at 25°C were screened secondarily for growth at 37°C. The results are shown in Table I. As mentioned earlier, the K70A and G73A mutations in Motif I were lethal, whereas the T71A and D72A mutants were viable at both 25 and 37°C. Replacement of the active site lysine with arginine (K70R) was also lethal, suggesting a stringent requirement for lysine as the nucleophile during attack by enzyme on the α phosphate of GTP. The Y66A substitution in Motif I caused a temperature-sensitive defect, seen as normal growth at 25°C, but severely slowed growth at 37°C. The RFP triplet of Motif II was substituted simultaneously at all three positions; this caused a slow-growth defect at 25°C, and complete lethality at 37°C. Five single alanine mutations in Motif III were examined. Two of these involving aliphatic residues, L129A and V134A, were viable, whereas the alterations of charged residues, D130A and E132A, were lethal. The G131A mutant was strongly temperature-sensitive. In Motif IV, D225A and G226A substitutions were lethal. Replacement of universally conserved K249 in Motif V with alanine was lethal, as was substitution of D257, a residue conserved only in the capping enzyme family. The T255A mutant (affecting a residue common to all capping enzymes) caused a temperature-sensitive phenotype. The E253A and N254A mutants were fully viable.

Motif   I               II         III             IV           V
        *   ****        ***          **** *        **             *   *** *
Sc CE   YVCEKTDG  -33-  RFP  -17-  TLLDGELV  -90-  DGLIF  -17-  KWKPEQENTVD
Sp CE   FVCEKSDG  -34-  FYP  -13-  TLLDGELV  -90-  DGLIF  -16-  KWKPKEMNTID

ASF CE  YVTDKADG  -32->            TILDGEFM  -78-  DGIIL  -13-
VAC CE  YAVTKTDG  -23-  RYP  -8-   WVFGEAV   -69-  EGVIL  -10-
SFV CE  YVTTKTDG  -23-  RYD  -8-   VTLYGEAV  -68-  EGWL   -9-

Sc DNA      KYDG  -25-  RYP  -16-  LILDCEAV  -96-  EGLMV  -18-  WLKLKKQYLEG
Sp DNA      KYDG  -25-  RYP  -16-  FILDCEAV  -96-  EGLMV  -18-  WLKVKKRYLSG
Hu DNA      KYDG  -25-  KYP  -16-  FILDTEAV  -96-  EGLMV  -17-  WLKLKKQYLDG
VAC DNA     KYDG  -43->            IVLDSEIV  -91-  EGLVL  -13-  WLKIKRPYLNE
SFV DNA     KYDG  -43->            FILDAELV  -92-  EGLML  -13-  WLKIKKRHLKT
ASF DNA     KRNG  -43->            VYLDGELY  -85-  EGAIV  -20-  KLKPLLRAEFI
T4 DNA      KADG  -49->            VLIDGELV  -124- EGIIL  -16-  &FKEVIQVDLK
T4 RNA      KEDG  -4->             TYLDGDEI  -112- EGYVA  -6-   HFKIKSRWYVS

FIG. 5. Regions of conservation between capping enzymes and polynucleotide ligases. Five colinear conserved sequence elements, designated Motifs I-V, were discerned by visual inspection of the amino-acid sequences of capping enzymes (CE), DNA ligases (DNA), and RNA ligases (RNA) from S. cerevisiae (Sc), S. pombe (Sp), African swine fever virus (ASF), vaccinia virus (VAC), Shope fibroma virus (SFV), human (Hu), and bacteriophage T4. The number of intervening amino-acid residues is indicated (-n-). Residues in the CEG1 protein targeted for mutational analysis are indicated by asterisks above the aligned sequence. The location and spacing of the motifs within the CEG1 protein are depicted above the alignment.

TABLE I
MUTATIONS IN CONSERVED MOTIFS I-V AFFECT CEG1 FUNCTION in Vivo (a)

Motif   Mutation    Growth phenotype
I       Y66A        ts
        K70A        Lethal
        K70R        Lethal
        T71A        +++
        D72A        +++
        G73A        Lethal
II      RFP→AAA     ts
III     L129A       +++
        D130A       Lethal
        G131A       ts
        E132A       Lethal
        V134A       +++
IV      D225A       Lethal
        G226A       Lethal
V       K249A       Lethal
        E253A       +++
        N254A       +++
        T255A       ts
        D257A       Lethal

(a) Yeast strain YBS2 was transformed with plasmid-borne derivatives of CEG1 containing the indicated amino-acid substitution mutants of CEG1. Lethal mutations were those that precluded growth in a plasmid shuffle under counterselection with FOA. Strains that grew on FOA were streaked on YPD plates at 25°C. Single colonies were restreaked on YPD plates and incubated at either 25 or 37°C for 3-4 days. Temperature-sensitive (ts) alleles either failed to form colonies or else formed only pinpoint colonies at 37°C. Alleles that supported “wild-type” growth are indicated by +++.
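The residue numbers used for the mutations follow from the motif spacings given for the S. cerevisiae enzyme in Fig. 5. A minimal Python sketch of that bookkeeping is given here. It assumes that Motif I begins at Tyr-66 (from the Y66A designation) and that Motif V reads KWKPEQENTVD, as in the Sc sequence of Fig. 4; the function name `residue_map` is hypothetical.

```python
# Sc CE row of Fig. 5: motif sequences and the residues between them.
MOTIFS = [("I", "YVCEKTDG"), ("II", "RFP"), ("III", "TLLDGELV"),
          ("IV", "DGLIF"), ("V", "KWKPEQENTVD")]
GAPS = [33, 17, 90, 17]   # intervening residues between successive motifs
START = 66                # assumed first residue of Motif I (Tyr-66)

def residue_map(motifs, gaps, start):
    """Map residue number -> amino acid for every position inside a motif."""
    positions, pos = {}, start
    for i, (_, seq) in enumerate(motifs):
        for aa in seq:
            positions[pos] = aa
            pos += 1
        if i < len(gaps):
            pos += gaps[i]  # skip the residues that separate two motifs
    return positions

# Single-residue substitutions listed in Table I.
SINGLE_MUTATIONS = ["Y66A", "K70A", "K70R", "T71A", "D72A", "G73A",
                    "L129A", "D130A", "G131A", "E132A", "V134A",
                    "D225A", "G226A",
                    "K249A", "E253A", "N254A", "T255A", "D257A"]

positions = residue_map(MOTIFS, GAPS, START)
# A mutation such as "L129A" is consistent if residue 129 is Leu in the map.
mismatches = [m for m in SINGLE_MUTATIONS
              if positions.get(int(m[1:-1])) != m[0]]
print("RFP triplet:", [positions[n] for n in (107, 108, 109)])
print("mismatches:", mismatches)
```

Under these assumptions every mutated residue in Table I lands on the expected amino acid, and the RFP triplet of Motif II occupies positions 107-109.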
In summary, the mutational analysis indicates that conserved Motifs I, III, IV, and V are essential for capping enzyme function. Seventeen residues in these motifs were singly substituted (not counting the RFP mutation in Motif II). Mutations at eight residues were lethal, three were temperature-sensitive, and only six were viable (69). The conservation of essential motifs among ligases and capping enzymes has important evolutionary implications. Both types of enzymes catalyze single-nucleotide transfer reactions to activate the ends of polynucleotide chains. We propose that the ligases and guanylyltransferases evolved from an ancestral nucleotidyltransferase that employed a phosphoramidate intermediate, but which may have lacked NTP specificity. Indeed, single-step nucleotidyltransferases may have antedated the evolution of processive template-directed DNA and RNA polymerases as agents of polynucleotide synthesis. Phosphoramidate catalysis in nucleotide transfer is not merely a molecular fossil; this mechanism is likely to pertain to many other nucleotidyl transfer reactions for which a covalent intermediate has been either demonstrated or proposed. For example, tRNAHis guanylyltransferase catalyzes ATP-dependent addition of a nontemplated GTP moiety to the 5' terminus of tRNAHis molecules. This is a multistep ligaselike reaction in which ATP binds enzyme to form a covalent protein-AMP intermediate; AMP is transferred to the 5' end of the tRNA to form an activated A(5')pp(5')N structure that is attacked by the 3'-OH of GTP (70). In another case, GTP-GTP guanylyltransferase from brine shrimp catalyzes synthesis of a GppppG dinucleotide from two GTP molecules via a capping-enzymelike mechanism employing an enzyme-GMP phosphoramidate intermediate (71). An ATP-dependent RNA ligase from kinetoplastid mitochondria is thought to play a role in RNA editing (72).
The cloning of genes encoding these proteins, and of additional members of the guanylyltransferase family, will undoubtedly shed light on the structural basis for covalent catalysis.
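The tally quoted in the summary of the mutational analysis (seventeen residues: eight lethal, three temperature-sensitive, six viable) can be reproduced from the Table I entries with a short Python sketch. The dictionary below simply transcribes the single-residue rows of Table I; grouping K70A and K70R under one residue is the only interpretive step, and the variable names are illustrative.

```python
from collections import Counter

# Single-residue substitutions and growth phenotypes, transcribed from Table I.
# The RFP->AAA triple substitution in Motif II is deliberately left out.
TABLE_I = {
    "Y66A": "ts", "K70A": "lethal", "K70R": "lethal", "T71A": "+++",
    "D72A": "+++", "G73A": "lethal",
    "L129A": "+++", "D130A": "lethal", "G131A": "ts", "E132A": "lethal",
    "V134A": "+++",
    "D225A": "lethal", "G226A": "lethal",
    "K249A": "lethal", "E253A": "+++", "N254A": "+++", "T255A": "ts",
    "D257A": "lethal",
}

# Collapse mutations onto residues: "K70A" and "K70R" both map to "K70".
by_residue = {}
for mutation, phenotype in TABLE_I.items():
    by_residue.setdefault(mutation[:-1], set()).add(phenotype)

# Every residue should carry a single, unambiguous phenotype.
assert all(len(p) == 1 for p in by_residue.values())

counts = Counter(next(iter(p)) for p in by_residue.values())
print(len(by_residue), dict(counts))
```

Counting by residue rather than by allele is what reconciles the eighteen single substitutions in Table I with the seventeen residues cited in the text.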
VIII. Capping Enzyme and mRNA Identity
RNA capping in vivo is coordinated temporally and physically with transcription. Capping occurs on nascent RNAs as soon as they achieve a critical chain length that allows access of capping enzyme to the 5' end (54, 73). In cellular systems, capping is targeted to RNAs synthesized by RNA polymerase II; these include pre-mRNAs and many snRNAs (e.g., U1, U2, U4, and U5). How is this achieved? A 5' triphosphate or diphosphate RNA terminus is all that is needed to permit cap formation in vitro by purified guanylyltransferase. Such termini are not restricted to pre-mRNAs or snRNAs, yet only polymerase II transcripts are capped with the standard m7GpppN structure. [The U6 snRNA, which is transcribed by RNA polymerase III, contains a blocked 5' γ-monomethyl phosphate terminus, MepppN (74, 75); this U6 “cap” structure is formed by enzymes unrelated to those involved in capping of mRNA (76).] In order to account for this specificity, one might predict that the cellular capping enzyme interacts specifically with RNA polymerase II or some other component of the mRNA transcription apparatus. Precedent for such interaction is provided by the vaccinia system, where vaccinia capping enzyme forms a binary complex in solution with vaccinia RNA polymerase (54). The vaccinia polymerase is a virus-encoded homolog of cellular RNA polymerase II. It is proposed that the timely acquisition of the cap structure by RNA polymerase II transcripts as they are being made may actually target nascent pre-mRNAs for further processing events (54). [Splicing, for example, occurs cotranscriptionally in vivo (77, 78).] How the various processing enzymes identify pre-mRNAs among other classes of transcripts is unclear. “mRNA identity” may be established by recognition of the RNA polymerase II elongation apparatus (by protein-protein interactions) or may be conferred on the nascent RNA, perhaps through an RNA-polymerase-II-specific modification (e.g., capping). There is evidence that the cap may facilitate RNA splicing (79, 80) and RNA transport (81, 82), in which case the capping event would be crucial (if not actually sufficient) to establish mRNA identity. A prediction of this model is that failure to cap should have an effect on “downstream” RNA transactions. Although it has often been suggested that the cap plays a role in mRNA processing, translation, and mRNA stability, there has been no definitive genetic test of cap function in vivo. It is likely that the cap plays more than one role in mRNA metabolism, as suggested in Fig. 6. Accordingly, the failure to cap may produce a complex phenotype. Using conditional ceg1 mutants, we hope to answer the question: What is the fate of newly transcribed yeast pre-mRNAs that lack the cap? Some initial results pertinent to RNA splicing are described in Section IX.

FIG. 6. Multiple potential roles for the RNA cap in eukaryotic mRNA metabolism. Diagram labels: Capped pre-mRNA; Uncapped pre-mRNA; Splicing; Polyadenylation; Transport; Decay; Nucleus; Cytoplasm; Translation; mRNA Decay; De-adenylation; De-capping.
IX. Genetic Link between Capping Enzyme and Pre-mRNA Splicing
Studies of pre-mRNA splicing in mammalian cell extracts suggest that the cap structure is required for splicing in vitro (79, 80), apparently at the level of spliceosome assembly (79, 83). Another report indicated that although the cap structure enhances splicing efficiency in vitro, splicing of uncapped RNAs occurs readily (84). It has been reported that the cap structure selectively enhances in vitro excision of the 5' proximal intron from pre-mRNAs containing multiple introns (85). Capped pre-mRNAs are also spliced more efficiently than uncapped precursors when injected into Xenopus oocyte nuclei, where the cap structure again exerts its effect primarily on the 5' proximal intron (86). In these studies, the splicing of exogenous RNAs is uncoupled from transcription. Splicing in vivo apparently requires that the RNA precursor be transcribed by RNA polymerase II, because RNAs containing consensus 5' and 3' splicing signals are not spliced when transcribed in vivo by RNA polymerase III (87). It is suggested that pol-II-specific RNA capping is the basis for this effect (87). One prediction of the hypothesis that capping confers mRNA identity is that mutations in the yeast capping enzyme would affect pre-mRNA splicing in vivo. prp (pre-mRNA-processing) mutants of S. cerevisiae have been isolated from several collections of yeast mutants that display a thermosensitive growth phenotype (reviewed in 88). The screen for prp phenotype typically entails demonstration of the accumulation of unprocessed pre-mRNA precursor in cells maintained at the nonpermissive temperature. Characterization of stage-specific blocks to mRNA processing caused by the prp mutations, and the development of yeast in vitro splicing systems, have provided a wealth of mechanistic insights (88). Many PRP genes have been cloned and most are essential.
Indeed, many gene products identified genetically by the prp phenotype have ultimately been shown to be direct participants in the RNA splicing reaction. The number of existing PRP genes falls short of the number of gene products that would be expected to participate in RNA splicing in vivo (based, for example, on the polypeptide complexity of the spliceosome), suggesting that the prp collection is not yet saturated. This has fueled additional screening efforts in several laboratories where new ts or cs mutant collections have been generated.
New prp alleles have been isolated in this fashion by John Woolford (Carnegie Mellon University). We have tested several of Woolford's prp mutants for their ability to be rescued by a plasmid-borne copy of the CEG1 gene. One mutant, prp33-1, was complemented by CEG1 on a CEN plasmid; growth of the prp33-1, CEN CEG1 strain at 37°C (the nonpermissive temperature for prp33-1) was indistinguishable from that of a wild-type strain. Subsequent experiments have proven that PRP33 is allelic to CEG1, as follows: (1) the ceg1 gene was isolated from the prp33-1 mutant strain and was shown, after plasmid shuffle, to confer a temperature-sensitive growth phenotype on the Δceg1 null strain; (2) the ts growth phenotype could be transferred by exchange of a restriction fragment of the ceg1 coding sequence; (3) sequencing this region of the ceg1 gene from the prp33-1 mutant strain revealed a single nucleotide substitution that causes a Cys to Tyr amino-acid substitution at residue 354 (89). Woolford's group has cloned PRP33 from a genomic yeast library (by complementation of the ts growth phenotype of prp33-1) and has found the gene, by sequence analysis, to be identical to CEG1 (90). A second ts allele of yeast capping enzyme (containing a Tyr-to-Ala substitution at residue 66 of CEG1) also displays a pre-mRNA splicing defect at the nonpermissive temperature (89). Guanylyltransferase activity of the Y66A and the C354Y (prp33-1) proteins is thermolabile in vitro; both enzymes are inactivated by heating for 5 minutes at 37°C, whereas wild-type CEG1 is unaffected by this treatment (89). The finding that mutations in capping enzyme elicit a conditional defect in mRNA splicing provides the first genetic link between capping and other mRNA processing events. A key question is whether the link is direct or indirect, and whether it involves the cap structure, or the capping enzyme, or both.
X. Perspective
The cap structure of eukaryotic mRNA was discovered 20 years ago. The latter 1970s witnessed an enormous outpouring of studies from many laboratories documenting the presence of various forms of the cap on mRNAs from all eukaryotic organisms and on most viral mRNAs. Though this review has dealt only with formation of the “cap zero” structure m7GpppN, it must be kept in mind that the 5' ends of eukaryotic RNAs are subject to additional methylation reactions. The cap, in its protean forms, may contain methyl groups at the 2' ribose sugars of the first and second nucleosides, at the N6 position of an initiating adenosine nucleoside base, or, in the case of snRNAs, two additional methyl groups at the N2 position of the cap guanosine moiety. If the functions of the cap zero structure are just beginning to be understood in biochemical and genetic terms, we must admit that we are essentially clueless as to the biological roles of the other base and sugar methylation events. The pathway of cap formation was defined at first by studies of viral transcription/modification systems. In a tour de force of enzymology, Moss and colleagues quickly purified and extensively characterized the capping and methylating enzymes from vaccinia virions, while also partially purifying the cellular enzymes responsible for guanylylation and many of the ribose and base methylation steps (4, 11, 12, 18, 91-94). Studies by Shatkin's group using in vitro translation systems showed that the methylated cap zero structure is important for recruitment of mRNA to the 40-S ribosome during translation initiation (95-97). Capping enzyme was soon relegated to “reagent” status for capping and cap-labeling of RNAs. Although the description of the covalent mechanism of transguanylylation in 1981 (31) provided transient impetus for collateral studies of cellular capping enzymes, the focus of the RNA processing field had already shifted away from issues of RNA 5' end modification. Even the reagent uses of capping enzyme were supplanted somewhat by the introduction of bacteriophage RNA polymerases for large-scale synthesis of capped transcripts primed by m7GpppG cap dinucleotides (98). During the 1980s, several groups continued to make important contributions in eukaryotic 5' end-formation. I would particularly credit Mizumoto and colleagues, whose work on the yeast guanylyltransferase has been instrumental in bringing cellular capping enzyme into the modern era of molecular genetics. I would also single out Reddy's work on the γ-methyl phosphate cap of U6 RNA as opening up a fascinating new area in RNA 5' processing (74-76).
The vaccinia capping enzyme has been rejuvenated as a focus for research in my own lab as well as that of Niles. Moss et al. also identified the vaccinia gene encoding the cap nucleoside-2'-O-methyltransferase and initiated a molecular genetic analysis of this protein via targeted mutagenesis (99, 100). The identification of the yeast guanylyltransferase gene has drawn in several investigators new to the capping field. I anticipate that the next several years will be a very exciting period for studies of RNA capping as more investigators join the fray. It is time to revisit the enzymology of capping and methylation, to purify the cellular triphosphatase and methyltransferases in quantity sufficient for reverse genetics, and to examine the impacts of mutations in the relevant genes on eukaryotic RNA metabolism.
ACKNOWLEDGMENTS
Members of my laboratory who have contributed to the capping and transcription work described herein include present and former graduate students Peijie Cong, Xiangdong Mao, Yan Luo, Jerry Hagler, Liang Deng, and Scott Morham. Financial support has been provided by grants from the National Institutes of Health, the American Cancer Society, and the Pew Charitable Trusts.
REFERENCES
1. A. K. Banerjee, Microbiol. Rev. 44, 175 (1980).
2. B. Moss, A. Gershowitz, C. Wei and R. Boone, Virology 72, 341 (1976).
3. Y. Furuichi, S. Muthukrishnan, J. Tomasz and A. J. Shatkin, JBC 251, 5043 (1976).
4. S. A. Martin, E. Paoletti and B. Moss, JBC 250, 9322 (1975).
5. S. Venkatesan, A. Gershowitz and B. Moss, JBC 255, 903 (1980).
6. S. Shuman, M. Surks, H. Furneaux and J. Hurwitz, JBC 255, 11588 (1980).
7. S. Shuman, S. S. Broyles and B. Moss, JBC 262, 12372 (1987).
8. J. C. Vos, M. Sasker and H. G. Stunnenberg, EMBO J. 10, 2553 (1991).
9. K. Mizumoto and Y. Kaziro, This Series 34, 1 (1987).
10. K. Mizumoto and F. Lipmann, PNAS 76, 4961 (1979).
11. M. Ensinger and B. Moss, JBC 251, 5283 (1976).
12. S. Venkatesan, A. Gershowitz and B. Moss, JBC 255, 2829 (1980).
13. S. Shuman, JBC 257, 7237 (1982).
14. D. Wang, Y. Furuichi and A. Shatkin, MCBiol 2, 993 (1982).
15. S. Venkatesan and B. Moss, PNAS 79, 340 (1982).
16. Y. Nishikawa and P. Chambon, EMBO J. 1, 485 (1982).
17. Y. Yagi, K. Mizumoto and Y. Kaziro, JBC 259, 4695 (1984).
18. J. Keith, S. Venkatesan, A. Gershowitz and B. Moss, Bchem 21, 327 (1982).
19. N. Itoh, K. Mizumoto and Y. Kaziro, JBC 259, 13930 (1984).
20. D. Wang and A. J. Shatkin, NARes 12, 2303 (1984).
21. N. Itoh, H. Yamada, Y. Kaziro and K. Mizumoto, JBC 262, 1989 (1987).
22. Y. Yagi, K. Mizumoto and Y. Kaziro, EMBO J. 2, 611 (1983).
23. S. Venkatesan and B. Moss, JBC 255, 2835 (1980).
24. S. Shuman, JBC 265, 11960 (1990).
25. S. Shuman and S. G. Morham, JBC 265, 11967 (1990).
26. Y. Shibagaki, N. Itoh, H. Yamada, S. Hagata and K. Mizumoto, JBC 267, 9521 (1992).
27. S. Shuman, JBC 264, 9690 (1989).
28. P. Guo and B. Moss, PNAS 87, 4023 (1990).
29. M. A. Higman, N. Bourgeois and E. G. Niles, JBC 267, 16430 (1992).
30. P. Cong and S. Shuman, JBC 267, 16424 (1992).
31. S. Shuman and J. Hurwitz, PNAS 78, 187 (1981).
32. M. Roth and J. Hurwitz, JBC 259, 13488 (1984).
33. R. Toyama, K. Mizumoto, Y. Nakahara, T. Tatsuni and Y. Kaziro, EMBO J. 2, 2195 (1983).
34. K. Mizumoto, Y. Kaziro and F. Lipmann, PNAS 79, 1693 (1982).
35. P. Cong and S. Shuman, JBC 268, 7256 (1993).
36. E. G. Niles, R. C. Condit, P. Caro, K. Davidson, L. Matusick and J. Seto, Virology 153, 96 (1986).
37. C. Upton, D. Stuart and G. McFadden, Virology 183, 773 (1991).
38. A. E. Tomkinson, N. F. Totty, M. Ginsburg and T. Lindahl, PNAS 88, 400 (1991).
39. S. Heaphy, M. Singh and M. J. Gait, Bchem 26, 1688 (1987).
40. I. R. Lehman, Science 186, 790 (1974).
41. T. Lindahl and D. E. Barnes, ARB 61, 251 (1992).
42. E. G. Niles and L. Christen, JBC 268, 24986 (1993).
43. K. Kodoma, D. E. Barnes and T. Lindahl, NARes 19, 6093 (1991).
44. J. Myette and E. G. Niles, personal communication.
45. M. A. Higman, L. A. Christen and E. G. Niles, JBC 269, 14974 (1994).
46. X. Mao and S. Shuman, JBC 269, 24472 (1994).
47. M. A. Higman and E. G. Niles, JBC 269, 14892 (1994).
48. E. G. Niles, L. Christen and M. A. Higman, Bchem in press (1994).
49. L. Pena, R. Yanez, Y. Revilla, E. Vinuela and M. L. Salas, Virology 193, 319 (1992).
50. X. Mao and S. Shuman, unpublished.
51. X. Mao, Y. Luo, P. Cong, L. Deng and S. Shuman, unpublished.
52. S. A. Martin and B. Moss, JBC 250, 9330 (1975).
53. B. Moss, B. Ahn, B. Amegadzie, P. D. Gershon and J. G. Keck, JBC 266, 1355 (1991).
54. J. Hagler and S. Shuman, Science 255, 983 (1992).
55. J. Hagler and S. Shuman, PNAS 89, 10009 (1992).
56. L. Yuen and B. Moss, PNAS 84, 6417 (1987).
57. S. Shuman and B. Moss, JBC 263, 6220 (1988).
58. J. Hagler, Y. Luo and S. Shuman, JBC 269, 10050 (1994).
59. P. Hirschmann, J. C. Vos and H. G. Stunnenberg, J. Virol. 64, 6063 (1990).
60. C. J. Baldick, J. G. Keck and B. Moss, J. Virol. 66, 4710 (1992).
61. J. C. Vos, M. Sasker and H. G. Stunnenberg, Cell 65, 105 (1991).
62. R. Rosales, G. Sutter and B. Moss, PNAS 91, 3794 (1994).
63. R. Rosales, N. Harris, B. Ahn and B. Moss, JBC 269, 14620 (1994).
64. N. Harris, R. Rosales and B. Moss, PNAS 90, 2860 (1993).
65. M. S. Carpenter and A. M. DeLange, J. Virol. 65, 4142 (1991).
66. Q. Xu, D. Teplow, T. D. Lee and J. Abelson, Bchem 29, 6132 (1990).
67. B. Schwer and S. Shuman, PNAS 91, 4328 (1994).
68. L. D. Fresco and S. Buratowski, PNAS 91, 6624 (1994).
69. S. Shuman, Y. Liu and B. Schwer, PNAS 91, 12046 (1994).
70. D. Jahn and S. Pande, JBC 266, 22832 (1991).
71. J. J. Liu and A. G. McLennan, JBC 269, 11787 (1994).
72. N. Bakalara, A. M. Simpson and L. Simpson, JBC 264, 18679 (1989).
73. E. B. Rasmussen and J. Lis, PNAS 90, 7923 (1993).
74. R. Singh and R. Reddy, PNAS 86, 8280 (1989).
75. S. Gupta, R. Singh and R. Reddy, JBC 265, 9491 (1990).
76. S. Shimba and R. Reddy, JBC 269, 12419 (1994).
77. A. L. Beyer and Y. N. Osheim, Genes Dev. 2, 754 (1988).
78. G. Bauren and L. Wieslander, Cell 76, 183 (1994).
79. M. M. Konarska, R. A. Padgett and P. A. Sharp, Cell 38, 731 (1984).
80. I. Ederly and N. Sonenberg, PNAS 82, 7590 (1985).
81. J. Hamm and I. Mattaj, Cell 63, 109 (1990).
82. M. P. Terns, J. E. Dahlberg and E. Lund, Genes Dev. 7, 1898 (1993).
83. E. Patzelt, E. Thalman, K. Harmuth, D. Blaas and E. Keuchler, NARes 15, 1387 (1987).
84. A. R. Krainer, T. Maniatis, B. Ruskin and M. R. Green, Cell 36, 993 (1984).
85. M. Ohno, H. Sakamoto and Y. Shimura, PNAS 84, 5187 (1987).
86. K. Inoue, M. Ohno, H. Sakamoto and Y. Shimura, Genes Dev. 3, 1472 (1989).
87. S. S. Sisodia, B. Sollner-Webb and D. W. Cleveland, MCBiol 7, 3602 (1987).
88. M. J. Moore, C. C. Query and P. A. Sharp, in “The RNA World,” p. 303. CSHLab Press, Cold Spring Harbor, New York, 1993.
89. B. Schwer and S. Shuman, unpublished.
90. J. Woolford, personal communication.
91. E. Barbosa and B. Moss, JBC 253, 7692 (1978).
92. E. Barbosa and B. Moss, JBC 253, 7698 (1978).
93. J. M. Keith, M. J. Ensinger and B. Moss, JBC 253, 5033 (1978).
94. S. R. Langberg and B. Moss, JBC 256, 10054 (1981).
95. S. Muthukrishnan, G. W. Both, Y. Furuichi and A. J. Shatkin, Nature 255, 33 (1975).
96. G. W. Both, A. K. Banerjee and A. J. Shatkin, PNAS 72, 1189 (1975).
97. A. J. Shatkin, Cell 40, 223 (1985).
98. D. A. Melton, P. A. Krieg, M. R. Rebagliati, T. Maniatis, K. Zinn and M. R. Green, NARes 12, 7035 (1984).
99. B. S. Schnierle, P. D. Gershon and B. Moss, PNAS 89, 2897 (1992).
100. B. S. Schnierle, P. D. Gershon and B. Moss, JBC 269, 20700 (1994).
Rearrangement of snRNA Structure during Assembly and Function of the Spliceosome¹
MANUEL ARES, JR.² AND BRYN WEISER
Biology Department, Sinsheimer Laboratories, University of California, Santa Cruz, Santa Cruz, California 95064
I. General Features of Splicing
II. Dynamic RNA: Technical Considerations
III. RNA-RNA Interactions Early in Spliceosome Assembly
IV. RNA-RNA Interactions in the Assembling Spliceosome Prior to the First Chemical Step
V. Rearrangement of Spliceosomal RNA Structure during the Catalytic Steps of Splicing
VI. Conserved Residues in the Core of the Assembled Spliceosome
VII. Interactions during the Second Catalytic Step
VIII. Modified Nucleotides in Splicing
IX. Release of Spliced mRNA and Regeneration of snRNA Structures
X. Conclusions
References
In the past 15 years, an RNA Weltanschauung has evolved, progressing quickly from an initial astonished disbelief at the catalytic ability of naked RNA to the jaded assumption that there is little that RNA cannot be made to do with a clever in vitro evolution experiment. Interest in RNA is natural; in the two central processes of the eukaryotic gene expression pathway, splicing and translation, RNA is both the substrate and part of the enzyme. These “enzymes,” the ribosome and the spliceosome, are large ribonucleoprotein complexes (RNPs), assembled by ordered and regulated means to achieve specific recognition of RNA substrates and carry out multiple chemical reactions. Our preoccupation with the RNA moieties of these enzymes stems from the hypothesis that they compose intrinsically, if not exclusively, the catalytic elements.
The chemical reactions of nuclear pre-mRNA splicing are not beyond the reach of RNA: the autocatalytic group-II intron RNAs perform them without the benefit of protein. Yet the spliceosome meets the task with dramatic stepwise changes in snRNA composition and conformation, escorted by an entourage of protein factors. Why does the spliceosome operate like a Rube Goldberg machine, performing such complex and unlikely events? The complexity of the spliceosome may be necessary to accommodate the vast numbers of different substrates and regulatory influences that complicate its work.
The core of the spliceosome contains a highly conserved RNA structure, suggesting that the chemical events of splicing are based heavily on RNA-RNA interactions. Yet spliceosomes from different organisms are not exactly the same. Most studies on splicing have used human (HeLa) cells or the yeast Saccharomyces cerevisiae. Differences in splicing, and the intrinsic strengths and weaknesses of the experimental materials, have restricted some findings to one or the other system, but the conservation of snRNA and splicing factor structure and function means that the yeast and human spliceosomes are very similar. Some RNA-RNA interactions are essential in both systems. Some make more important contributions to function in one system than in the other, at least as assayed in the laboratory. Other interactions have not yet been shown to contribute to function, but are inferred to be critical for success in the ultimate bioassay, evolution. In this review, the idea that spliceosomal RNA works in fundamentally the same way in all systems is used to develop a cohesive representation of the rearrangement of the most conserved RNA-RNA interactions during splicing.
¹ Abbreviations: RNP, ribonucleoprotein particle; snRNA, small nuclear RNA; snRNP, small nuclear ribonucleoprotein particle; 4-sU, 4-thiouridine; SL-RNA, spliced leader RNA.
² To whom correspondence may be addressed.
I. General Features of Splicing
In this section, we briefly review the general features of splicing needed as a context in which to place the RNA-RNA interactions described so far. More extensive reviews on splicing that discuss the roles of RNA and protein factors, the effects of substrate mutations on splicing, the cellular localization of splicing factors, and the structure and function of snRNPs are recommended (1-8, 81-83).
A. Two Transesterifications
With the development of cell-free splicing extracts, it was determined that two chemical reactions are performed by the splicing machinery on the substrate pre-mRNA (1). The first chemical reaction is a transesterification consisting of an in-line nucleophilic attack on the phosphate between the 3' nucleotide of the first exon and the 5' nucleotide of the intron (the 5' splice site) by a 2' hydroxyl within the intron (the branchpoint) with the first exon as the leaving group (1, 9; see Fig. 1). This “first-step” reaction produces the two splicing intermediates: free exon 1 and an intron-exon 2 molecule in “lariat” form, so called because the 5' phosphate of the intron is now covalently attached to the branchpoint nucleotide, forming a closed circle of RNA with a 3' tail (1).
FIG. 1. Stereochemical course of splicing at the reactive phosphates. In the first step of splicing, the branchpoint (BP) 2' hydroxyl (2'OH) attacks the phosphate at the 5' splice site (5'SS) to form the lariat intermediate, and the 5' exon (E1) is the leaving group. In the second step, the 3' hydroxyl (3'OH) of E1 attacks the phosphate at the 3' splice site (3'SS) to form the spliced exons, with the intron lariat product as the leaving group. Both reactions occur with the Sp phosphorothioate isomer, but not with the Rp isomer (9, 80). Exon 1 (E1), white bar; intron, thick black line; exon 2 (E2), black bar; O-, oxygen substituted by S in the phosphorothioate Rp isomer, not tolerated; O', oxygen substituted by S in the functional Sp isomer. The panels show the precursor, the intermediates (after step 1), and the products (after step 2). (Redrawn from 9.)
The second chemical reaction is a similar transesterification, but this time the in-line attack is by the 3' hydroxyl of exon 1 (the leaving group in the first reaction) on the phosphate between the last nucleotide of the intron and the first nucleotide of exon 2 (the 3' splice site). This “second-step” reaction produces the two splicing products: the spliced exons (with the phosphate conserved from the first nucleotide of exon 2) and the intron in lariat form with its 5' phosphate joined to the 2' hydroxyl of the branchpoint (1; Fig. 1). The number of phosphodiester bonds broken (one at the 5' splice site of the pre-mRNA and one at the 3' splice site of the lariat-exon 2 intermediate) is equal to the number of phosphodiester bonds formed (one at the branchpoint and one between the spliced exons; Fig. 1).
The phosphorothioate stereoselectivity of the active site(s) for the two reactions is similar, and both reactions proceed with phosphate inversion (9; Fig. 1). This suggests that the second reaction is not carried out by removing the branchpoint from the active site, replacing it with the 3' splice site, and carrying out a reaction similar to the reverse of the first step (9). These and other data suggest that the spliceosome contains two distinct (but possibly overlapping) catalytic sites (discussed in 9).
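The bookkeeping of the two transesterifications can be made concrete with a small sketch. The Python below is only an illustration, not a model of the spliceosome: the function `splice`, the class `SplicingResult`, and the toy sequence in the usage note are hypothetical names and data introduced here. It treats the pre-mRNA as a string, records which molecules exist after each step, and counts bonds broken and formed.

```python
from dataclasses import dataclass


@dataclass
class SplicingResult:
    exon1: str              # free 5' exon released by step 1
    lariat_exon2: str       # intron plus exon 2 (sequence of the lariat intermediate)
    branch_nucleotide: str  # nucleotide whose 2' OH attacks in step 1
    spliced_exons: str      # exon 1 joined to exon 2 by step 2
    lariat_intron: str      # excised intron (sequence of the lariat product)
    bonds_broken: int
    bonds_formed: int


def splice(pre_mrna: str, five_ss: int, branch: int, three_ss: int) -> SplicingResult:
    """Apply the two transesterifications to a pre-mRNA given as a string.

    five_ss  : index of the first intron nucleotide
    branch   : index of the branchpoint nucleotide (inside the intron)
    three_ss : index of the first nucleotide of exon 2
    """
    if not (0 < five_ss < branch < three_ss < len(pre_mrna)):
        raise ValueError("expected 0 < five_ss < branch < three_ss < len(pre_mrna)")

    broken = formed = 0

    # Step 1: the branchpoint 2' OH attacks the phosphate at the 5' splice site.
    exon1 = pre_mrna[:five_ss]
    lariat_exon2 = pre_mrna[five_ss:]
    broken += 1  # exon 1 - intron phosphodiester bond
    formed += 1  # 2'-5' bond between the branchpoint and the intron 5' phosphate

    # Step 2: the 3' OH of exon 1 attacks the phosphate at the 3' splice site.
    intron_length = three_ss - five_ss
    lariat_intron = lariat_exon2[:intron_length]
    exon2 = lariat_exon2[intron_length:]
    broken += 1  # intron - exon 2 phosphodiester bond
    formed += 1  # exon 1 - exon 2 phosphodiester bond

    return SplicingResult(
        exon1=exon1,
        lariat_exon2=lariat_exon2,
        branch_nucleotide=pre_mrna[branch],
        spliced_exons=exon1 + exon2,
        lariat_intron=lariat_intron,
        bonds_broken=broken,
        bonds_formed=formed,
    )
```

For a made-up substrate such as `"AAGG" + "GUAUGUACUAACUUAG" + "CCAA"` with `five_ss=4`, `branch=14`, and `three_ss=20`, the sketch returns the joined exons and the excised intron, with equal numbers of bonds broken and formed, as stated in the text.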
B. The Enzyme Is Built on the Substrate
The spliceosome is built of small nuclear ribonucleoprotein particles (snRNPs). The snRNPs are named after the snRNA they contain (U1, U2, ...), and are classified by the common core proteins they contain (6-8). Not all snRNPs are involved in splicing, but those that are (U1, U2, U4/U6, and U5) share a common “core” set of proteins (the “Sm” proteins) that are tightly associated with the snRNA through a shared RNA sequence motif (6-8, the “Sm-binding site”: AU₄₋₆G). Additional proteins may be found associated only with a particular snRNP; these are the “snRNP-specific proteins” (6-8). The protein composition of the different snRNPs remains an active area of research. As methods for the purification of snRNPs are improved, more and more snRNP-specific proteins are being identified.
The chemical reactions of splicing occur with great efficiency and accuracy, although the reactive phosphates and hydroxyls may be as many as tens of thousands of nucleotides apart in the primary transcript. The identification of splice sites in constitutive splicing and the regulation of alternative splicing are constrained by spliceosome assembly events that bring the regions of the pre-mRNA transcript together and into the catalytic structure. Spliceosome assembly follows an ordered pathway of snRNP binding, and is influenced by the activity of numerous splicing factors (1-6). First, the U1 snRNP delimits the 5' splice-site region. Then U2 binds near where the branchpoint will be to form the “prespliceosome.” Next, a “tri-snRNP” containing U4, U5, and U6 snRNAs joins the complex. In the tri-snRNP complex, U4 is extensively base-paired to U6, and U5 association is mediated by proteins. The association of U1 with the complex is weak, and special measures must be taken to detect it (1). Prior to the first chemical step, the association of U4 also becomes weaker (1-5). The extent of the base-pairing between U4 and U6 suggests that an active process must take place to disrupt tight U4 association with the complex (1-5). The chemical steps of splicing follow.
In the yeast system, progression of the complex through these functional states is mediated by a set of proteins with amino-acid sequence similarity to ATP-dependent RNA helicases (1-5). The splicing protein members of the family consist of two subgroups, the “DEAD³-box” proteins (PRP5, PRP28, SPP81) and the related DEAH proteins (PRP2, PRP16, PRP22), named for a conserved amino-acid motif that is part of the nucleotide-binding domain. None has yet been shown to have helicase activity; however, many have RNA-dependent ATPase activity (1, 2). The proteins are required at specific stages: PRP5 and PRP28 during spliceosome assembly, PRP2 after assembly but before the first step, PRP16 after the first step and before the second step, and PRP22 after the second step but before release of spliced mRNA from the complex (1-5). Presumably the events catalyzed by each are unique and use ATP to drive the complex forward, possibly by initiating conformational changes. They may also play a role in the fidelity of splicing (10). The nature of their substrates remains mysterious.
C. U1 and U4 snRNPs Have Essential Functions Limited to Spliceosome Assembly
U1 and U4 are weakly associated with the spliceosome by the time the chemical steps of splicing begin. Recent evidence indicates that they are dispensable for the chemical steps of splicing (11; R. J. Lin, personal communication), showing that U1 and U4 are not essential parts of the catalytic apparatus of the spliceosome. As described in more detail in Sections IV,B, V,A, and V,B, interactions between U1 and the 5' splice site, as well as between U4 and U6, are dissolved in favor of other important, mutually exclusive interactions. Thus U1 and U4 can be considered spliceosome assembly factors with roles in the construction of the catalytic apparatus of the spliceosome.
³ DEAD = Asp-Glu-Ala-Asp; H = histidine; PRP = pre-RNA processing mutants; SPP81 = suppressor of prp8-1.
D. trans-Splicing: A Modified Spliceosome Built with an Exon-snRNA “Hybrid” snRNP
Certain protozoa and nematode worms carry out trans-splicing, whereby an RNA called the spliced leader (SL) RNA, which contains a 5' splice site, is assembled into a snRNP, enters the spliceosome, and participates in the splicing reactions (reviewed in 12). The SL-RNA donates its 5' nucleotides, as though they were exon 1, to a transcript carrying a 3' splice site and a downstream exon. Branched molecules equivalent to the lariat of cis-splicing are observed, and the process requires U2, U4, and U6. In organisms where both cis- and trans-splicing take place, some of the same snRNAs are used for both processes, suggesting that the trans-spliceosome shares fundamental properties with the cis-spliceosome (12). In trans-splicing organisms not known to perform cis-splicing, neither U1 nor U5 has been found, suggesting that the snRNP component of the spliced leader may supply any necessary functions carried out by these snRNPs in cis-splicing (12).
II. Dynamic RNA: Technical Considerations
A. How Are Interactions Identified and Placed in Time?
Conformational rearrangements in the spliceosome are deduced or inferred from data that identify temporal changes in the interaction between different sets of snRNA and pre-mRNA nucleotides during splicing. Two main technical issues arise: how are the RNA-RNA interactions identified, and how are they placed in time? Identification of an RNA-RNA interaction in the spliceosome usually involves one or more of the following observations: (1) phylogenetic variation in sequence that conserves the potential for equivalent kinds of base-base interactions, (2) genetic experiments that identify functional combinations of non-wild-type nucleotides, (3) chemical or enzymatic probes of RNA secondary structure, and (4) cross-linking studies that position specific RNA residues near each other in splicing complexes.
Placing the requirement for a particular dynamic RNA-RNA interaction within the sequence of events during splicing is more difficult. An interaction may be required to reach or pass a particular landmark, or may form only after a particular landmark is passed. Failing definitive placement, correlation of the appearance and disappearance of the interaction relative to other splicing events must be used. A number of landmarks are available. The timing of events relative to the first and second chemical steps is informative. In yeast, temperature-sensitive mutations in splicing-factor genes provide numerous landmarks in the process (2-4). For the many assembly events prior to the first step, time is marked with respect to the formation of presplicing complexes, including the commitment complex and the prespliceosome. In addition, spliceosome assembly can be blocked at specific stages by various treatments, such as ATP depletion, EDTA addition, mild heat treatment, or removal of specific snRNPs (1). Thus far, the number of discrete steps in the splicing pathway is unknown. In many cases the order in which certain landmarks occur relative to each other is also unclear.
To generate images of the changes in RNA-RNA interactions in the spliceosome, we have reviewed data that can be interpreted to constrain RNA-RNA interactions in space and time. Because these data originate from distinct experimental approaches, we summarize and compare below the strengths and limitations of phylogenetic, genetic, structure-mapping, and cross-linking approaches.
B. Phylogenetic Variation and Dynamic RNA
Of the means for detecting interactions listed above, phylogenetic variation consistent with conservation of potential for base-base interaction comprises the bulk of the data used to determine the secondary structures of the snRNAs, much of which has been reviewed elsewhere (13). Such variation is useful for modeling the structure of regions of the spliceosome where the functional demands on RNA structure are sufficiently flexible that variation in sequence is allowed. Phylogenetic variation alone is inadequate for complete modeling of dynamic RNAs for two reasons. First, an extreme lack of variation characterizes regions of the RNA where multiple functional constraints are at play. Informative sequence variation will occur much less frequently in an RNA strand that interacts with two other strands at different times. Second, where variation does exist in dynamic RNA, it will be consistent with multiple structures, some of which may not be relevant to function. For example, phylogenetic data cannot be used to distinguish between a static requirement for a pseudoknot (14) and a dynamic requirement for both of the two component stem loops that may be derived from it. The limitations of the phylogenetic approach seem especially severe in the spliceosome, where large-scale changes in secondary structure occur during function.
C. Genetic Analysis of RNA-RNA Interactions in Splicing
In instances where strong conclusions elude the phylogenetic approach, reverse genetic techniques can be used to test models based on specific interactions. The power of this approach is based on in vivo function rather than physical association. An interaction is detected when a phenotype caused by a mutation in one region of RNA can be suppressed by a compensating mutation in the interacting region of RNA (1-5). Thus, double-mutant combinations that function more like wild type than the single mutations are taken as evidence for a functional interaction between the altered nucleotides. The argument against indirect modes of suppression, rather than suppression by direct restoration of RNA-RNA interactions, is strengthened when allele specificity of suppression can be demonstrated. If several functional combinations of nucleotides are consistent with Watson-Crick (or other) base-base interactions, specific hydrogen bonds may be predicted.
For studying the timing of RNA rearrangements in splicing, the genetic approach is only partly satisfying. First, as with phylogenetics, genetic suppression results are most often displayed as in vivo phenotypes. Unless the interaction is specifically required for the second step, it will be difficult to place in time. Many conserved nucleotides have both an early and a late function, making mutational approaches to testing their late function cumbersome or impossible. Finally, there remains the nagging uncertainty that the observed suppression results from new or unusual compensating activities that wild-type nucleotides do not carry out during normal function. Nonetheless, this approach has been extremely successful, and has been used to study the U1-5' splice site, U2-branchpoint, U4-U6, U2-U6, U5-exon, 5' splice site-U6, and U1-3' splice site interactions, as well as interactions that promote conserved internal structure in U2 and U6 (1, 5).
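The logic of a compensatory-mutation test at a single proposed base pair can be written out as a short sketch. The Python below is an illustration under simplifying assumptions: only Watson-Crick pairs count as an interaction (G-U wobble pairs and other appositions are ignored), and the names `can_pair` and `suppression_pattern` are hypothetical, introduced only for this example. It reports which of the four genotypes could pair and whether the pattern is the one expected for direct restoration of pairing.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def can_pair(x: str, y: str) -> bool:
    """True if nucleotides x and y form a Watson-Crick pair."""
    return (x, y) in WATSON_CRICK


def suppression_pattern(wt_a: str, wt_b: str, mut_a: str, mut_b: str) -> dict:
    """Predict pairing for wild type, both single mutants, and the double mutant.

    wt_a, wt_b   : wild-type nucleotides at the two proposed partner positions
    mut_a, mut_b : mutant nucleotides introduced at those positions
    """
    pattern = {
        "wild_type": can_pair(wt_a, wt_b),
        "single_mutant_a": can_pair(mut_a, wt_b),
        "single_mutant_b": can_pair(wt_a, mut_b),
        "double_mutant": can_pair(mut_a, mut_b),
    }
    # Direct restoration predicts: pairing lost in each single mutant,
    # regained only when both changes are combined.
    pattern["consistent_with_direct_pairing"] = (
        pattern["wild_type"]
        and not pattern["single_mutant_a"]
        and not pattern["single_mutant_b"]
        and pattern["double_mutant"]
    )
    return pattern
```

Running the same function over several different mutant combinations gives a simple view of allele specificity: a G-C pair changed to A-U fits the pattern, whereas G-C changed to A-G does not, so suppression by the second combination would point toward an indirect mechanism.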
D. Cross-linking and Chemical Accessibility of RNA in the Spliceosome
Detection of RNA-RNA interactions by UV cross-linking and UV-induced psoralen cross-linking, as well as detection of higher order structure by chemical cross-linking, has often been the first clue identifying interacting partners in the spliceosome. Cross-linking data can be very useful because the cross-linked partners must be very close.⁴ The yield of cross-linked material is often low, and a constant worry is that a small percentage of aberrantly structured RNA complex is cross-linking very efficiently to produce a signal with little functional meaning. Occasionally such difficulties can be overcome by demonstrating functionality of cross-linked material, but more often other kinds of evidence supporting the interaction must be obtained to solidify the importance of a particular cross-linking result.
Other methods of probing RNA structure have been developed, but application to the study of RNA structural rearrangements during splicing has not been common. The main difficulty in the application of these methods to nuclear pre-mRNA splicing has been in the preparation of sufficient amounts of splicing complexes at discrete, known steps in the splicing process. Interpretation of the data where multiple conformations are present is difficult, and often the most interesting complexes are not abundant. So far, chemical probing has been useful in analyzing the structure of snRNAs within free snRNPs or snRNP complexes as they await entry into the spliceosome, but little of the dynamics of spliceosomal RNA has been revealed.
⁴ “Closeness” in UV cross-linking is discussed by Budowsky and Abdorashidova in Vol. 37 of this series [Eds.].
E. Cross-linking of Site-specifically Photoactivatable Premessenger RNAs
Recently, intrinsic photolabels placed within pre-mRNA have been used to study the question of splice-site recognition and the dynamics of RNA in the spliceosome (15, 16). In this approach, a photoactivatable 4-thiouridylate residue (4-sU) is built into the pre-mRNA substrate at a unique position near one of the splice sites. At times after splicing is begun, the reaction is pulsed with UV light, whereupon elements of the spliceosome near the substituted residue become cross-linked to the pre-mRNA at the substituted uridine. Analysis of the cross-linked material identifies the cross-linked partner, and primer extension can be used to map the sites of cross-linking. This methodology is extremely informative due to the unique placement of the photoactivatable nucleotide as well as the potential to “chase” cross-linked material through the splicing pathway (16).
III. RNA-RNA Interactions Early in Spliceosome Assembly
A. Interaction of U1 snRNA with the Pre-mRNA
The secondary structure of U1 snRNA is well illuminated by phylogenetic variation among the metazoans, but U1 from S. cerevisiae shares limited similarity with more typical U1 (14), including the conserved 5'-most stem loop. The sequence near the 5' end of U1 is invariant and engages in base-pairing with 5' splice sites during splicing in order to identify the region containing the bond to be attacked in the first step (1-6; Fig. 2). The precise definition of the splice site is not made relative to the U1 complementary sequence in the pre-mRNA; rather, U6 seems to help make this decision later (17, 18). The extent of complementarity between the 5' splice site and the 5' end of U1 influences its efficiency in splicing. The lower efficiency of some mutant 5' splice sites can be increased by suppressor U1 snRNAs with compensatory mutations in their 5' ends that improve pairing to the mutant splice site (19-21). Because later requirements for splicing are superimposed on the 5' splice-site consensus sequence, U1 suppression experiments are not effective at all positions in the U1-5' splice-site helix.
FIG. 2. RNA-RNA interactions in the prespliceosome. The sequences of human U1 and U2 are folded to show internal interactions as well as interaction between U1 and a 5' splice site (5'-GUAUGU-3') and between U2 and a branchpoint sequence (5'-UACUAAC-3'). The intron sequences connecting the 5' splice site to the branchpoint and the branchpoint to the 3' splice site are not shown. Exon 1, 5' black bar; exon 2, 3' black bar.
U1 also contains a sequence (nucleotides 9-11) complementary to the 3' splice site (Fig. 2) that is important in the identification of the 3' splice site in the processing of introns in Schizosaccharomyces pombe. This interaction influences the first step of splicing in the “AG-dependent” class of introns in vivo (22, 23). In S. cerevisiae, spliceosome assembly and the first step of splicing are not strictly dependent on a 3' splice-site AG (1, 2), and a test of this interaction in S. cerevisiae did not identify its role (24). However, complementarity between U1 nucleotides 9 and 10 and the last two nucleotides of the first exon (essentially extending the U1-5' splice-site helix to include some of exon 1) did improve splicing efficiency at debilitated 5' splice sites (24). The negative results in S. cerevisiae do not strictly exclude a contribution to splicing, but the generality of the interaction between U1 nucleotides 9 and 10 and the 3' splice site has yet to be established.
The function of U1 in identifying the 5' splice site may be bypassed in certain instances. U1 depletion in splicing extracts of the nematode Ascaris blocks cis-splicing but does not block trans-splicing (12). To date, trypanosomes are known to carry out only trans-splicing; neither cis-introns nor U1 snRNA has been discovered in these organisms. Artificial cis constructs carrying SL RNA sequences in place of the 5' end and the 3' splice site of adenovirus or globin introns can be spliced in mammalian extracts in which U1 snRNA has had its 5' end trimmed or blocked using complementary oligonucleotides (25, 26). Spliceosome assembly occurs in response to the addition of model 5' splice-site oligonucleotides to extracts, apparently without the participation of the 5' end of U1 snRNA (27, 28). These studies demonstrate that mechanisms may exist to bypass the need for U1-5' splice-site interactions in certain instances.
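The idea that the extent of complementarity between U1 and the 5' splice site can be lowered by a splice-site mutation and restored by a suppressor U1 can be illustrated by counting base pairs in an antiparallel alignment. The Python below is a minimal sketch under stated assumptions: pairing is scored position by position with no bulges, only Watson-Crick pairs count unless `allow_gu` is set, and the function name `paired_positions` is hypothetical. The sequences in the usage note are supplied for illustration and are not taken from the experiments cited above.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def paired_positions(snrna_5to3: str, site_5to3: str, allow_gu: bool = False) -> list:
    """Return one boolean per position of site_5to3: can it pair with the snRNA?

    Both strands are written 5' to 3'. The helix is antiparallel, so the first
    nucleotide of the site faces the last nucleotide of the snRNA segment.
    """
    if len(snrna_5to3) != len(site_5to3):
        raise ValueError("this sketch aligns segments of equal length only")
    allowed = WATSON_CRICK | WOBBLE if allow_gu else WATSON_CRICK
    snrna_3to5 = snrna_5to3[::-1]
    return [(s, u) in allowed for s, u in zip(site_5to3, snrna_3to5)]
```

With an illustrative U1 segment `"ACUUACCUG"` and splice-site region `"CAGGUAAGU"`, all nine positions pair. Changing the site to `"CAGGUAAAU"` removes one pair, and the compensating change in the snRNA segment (`"AUUUACCUG"`) restores it, which is the pattern exploited in the suppressor experiments described above.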
B. Structure of U2 snRNA in snRNPs and Early Splicing Complexes
The second snRNP to bind the pre-mRNA substrate is U2. The stable binding of U2 snRNP is ATP dependent and requires branchpoint (yeast) or polypyrimidine tract (mammalian) sequences in the pre-mRNA (1-6). Although in mammalian extracts U2 snRNP can form complexes with substrates lacking a 5' splice site, substrate commitment experiments argue that normally the binding of U2 snRNP occurs after formation of the U1-containing “commitment complex” (1, 2). Binding of U2 snRNP to the commitment complex forms the “prespliceosome.” As analyzed by nondenaturing gel electrophoresis, the prespliceosome (complex III or B in yeast; complex A in mammals) contains U2 snRNP, but not the U4/U6.U5 snRNP (1, 2). The association of U1 snRNP with the prespliceosome is less stable and more difficult to demonstrate by native gel electrophoresis, but is detectable using less stringent conditions (1, 2).
Studies on U2 snRNA structure reveal that U2 snRNA must adopt different secondary structures during the course of its function in splicing. Chemical probing in yeast (29), and of both the 12-S and 17-S forms of the U2 snRNP in mammalian extracts (30), indicates that the RNA is folded as shown in Fig. 2. Pairing of nucleotides 7-14 with 19-26 (human numbering) forms “stem I,” a structure consistent with chemical probing data (29, 30). Nucleotides forming this stem are highly conserved, and phylogenetic variation consistent with Watson-Crick pairing is observed only in the equivalent of the 14-19 base-pair in kinetoplastid organisms (13, 31). Genetic experiments designed to test for a contribution of the stem structure to function in splicing have been negative, although the nucleotides composing the stem have other roles (32-34). An unusual conserved feature is the presence of non-Watson-Crick base appositions in the core of the stem. Hyperstabilization of the stem blocks splicing, revealing a requirement for appropriately tuned stability (32). These and other data (see below) indicate that this part of U2 must unfold later in splicing.
A phylogenetically supported stem-loop formed by pairing U2 nucleotides 47-52 with 61-66 (human numbering, “stem-loop IIa”) (13, 31; Fig. 2) is essential for U2 function (31). Also supported by phylogenetic variation is a base-pairing between the 8-base “loop IIa” (nucleotides 53-60) and a sequence downstream (the “conserved complementarity,” nucleotides 88-95; dotted line in Fig. 2) (13, 31). Genetic experiments do not demonstrate a function for the complementarity to the loop (29), and structure-probing experiments show that the loop and the conserved complementarity are accessible to chemical probes in the bulk of snRNPs. Mutations in yeast that destabilize stem IIa cause the RNA to be misfolded so that the conserved complementarity is paired to loop IIa (35), causing cold-sensitive splicing and growth defects that can be suppressed by destroying the conserved complementarity (36). Analysis of yeast splicing extracts made from cold-sensitive U2 strains shows that prespliceosome formation is blocked at restrictive temperatures (35). Contrasting results obtained in HeLa-cell extracts show that oligonucleotides directed against loop IIa and adjacent stem sequences allow spliceosome assembly and the first step of splicing, but interfere with the second step of splicing in vitro (37). Chemical cross-links can be formed within human U2 RNA that are compatible with the pairing of the loop to the conserved complementarity (38). One consistent interpretation of all the data argues that this region of U2 adopts more than one structure during the functional cycle. First, stem IIa is required for a spliceosome assembly step; then, after the first step of splicing but before the second step, this part of the RNA refolds and becomes accessible to the oligonucleotide. How or why this region of U2 might rearrange remains to be determined.
C. Interaction between U2 and Pre-mRNA
U2 base-pairs with the intron branchpoint region. Mutations in nucleotides 33-38 (human numbering) of U2 that increase complementarity to mutant branchpoint regions greatly improve their use in yeast (39) and mammals (40, 41). In yeast, certain mutations in the branchpoint interaction region of U2 cause dominant slow-growth defects (42). Presumably this phenotype is due to a partial function of the mutant U2 snRNP that prevents wild-type U2 snRNP from helping remove introns at a rate consistent with growth, and suggests that stable complexes involving the mutant snRNP sequester or deplete a limiting substoichiometric splicing factor in vivo. Because branchpoint complementarity to U2 snRNA is not required for the dominant phenotype, steps leading to stable addition of U2 snRNP may occur prior to recognition of the branchpoint by base-pairing to U2 in yeast in vivo (42). Although there is an intrinsic branchpoint binding ability of mammalian U2 snRNPs (43), stable U2 binding to the branchpoint region in mammalian extracts is ATP dependent and is assisted by U2 auxiliary factor (U2AF) through adjacent polypyrimidine tracts in the intron (1, 5). Thus, in both yeast and mammals there is evidence that base-pairing between U2 and the branchpoint region is not the only factor that contributes to the initial selection of the U2 binding site in pre-mRNA.
Psoralen cross-linking experiments demonstrate that the U2-branchpoint interaction is established early in the course of splicing in mammalian extracts, well before the first chemical step (44). Surprisingly, the U2-branchpoint cross-link occurs independently of ATP, suggesting that this RNA-RNA interaction may take place before U2 snRNP binding is defined as “stable” by the biochemical assays normally employed (44). Many complexities of mammalian branchpoint selection remain to be explained, such as why very poor complementarity to U2 is tolerated in some mammalian introns (45), and how several different adenosine residues in the same region can be used as branchpoints (46). Although the conserved yeast branchpoint is often used as a model because of extensive complementarity to U2 and the provocative adenosine, there is as yet little direct evidence to support the extent of interaction commonly depicted (Fig. 2), despite evidence for the bulged attacking residue (47).
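The notion of a bulged residue within the U2-branchpoint helix can be explored with a small search over single-nucleotide bulges. The Python below is a sketch under stated assumptions: only Watson-Crick pairs are counted, exactly one intron nucleotide is left unpaired, and the function name `best_single_bulge` is hypothetical. The U2 segment used in the usage note (`"GUAGUA"`) is supplied here for illustration; the branchpoint sequence is the one quoted in the legend to Fig. 2.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def best_single_bulge(intron_5to3: str, snrna_5to3: str):
    """Find which single intron nucleotide, left unpaired, maximizes pairing.

    Returns (max_pairs, positions), where positions lists every 0-based
    intron index that gives the maximum when bulged out of the helix.
    """
    if len(intron_5to3) != len(snrna_5to3) + 1:
        raise ValueError("intron segment must be one nucleotide longer than the snRNA segment")
    snrna_3to5 = snrna_5to3[::-1]  # antiparallel partner strand
    best_score, best_positions = -1, []
    for i in range(len(intron_5to3)):
        paired_strand = intron_5to3[:i] + intron_5to3[i + 1:]  # nucleotide i is bulged
        score = sum((x, y) in WATSON_CRICK for x, y in zip(paired_strand, snrna_3to5))
        if score > best_score:
            best_score, best_positions = score, [i]
        elif score == best_score:
            best_positions.append(i)
    return best_score, best_positions
```

For `best_single_bulge("UACUAAC", "GUAGUA")`, six pairs can form with either of the two adjacent adenosines bulged. Pairing potential alone therefore cannot say which adenosine is the unpaired one, in keeping with the caution expressed above about how much of the commonly depicted interaction is directly supported.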
IV. RNA-RNA Interactions in the Assembling Spliceosome Prior to the First Chemical Step
A. Extensive Interaction between U4 and U6 snRNAs The prespliceosome is the substrate for the next step of spliceosome assembly, addition of the U4/U6.U5 tri-snRNP (1-6). The tri-snRNP is formed by the association of the U4/U6 snRNP with the U5 snRNP (1-6). A site of association between U4 and U 6 48, 49) was first mapped by psoralen cross-linking (50). In the U4/U6 snRNP (and the tri-snRNP) the interaction is so stable that deproteinized U4-U6 hybrids are efficiently recovered after cold phenol extraction and can be analyzed by nondenaturing gel electrophoresis (49, 50; Fig. 3). U6 lacks the Sm binding-site consensus sequence;
MANUEL ARES, Jr. AND BRYN WEISER
FIG. 3. RNA-RNA rearrangements in the assembling spliceosome. The sequences of the human spliceosomal snRNAs are folded to show interaction established during spliceosome assembly. Extensive interaction between U4 and U6, as well as between the 5' end of U2 and the 3' end of U6, is shown. The intron sequences connecting the 5' splice site to the branchpoint and the branchpoint to the 3' splice site are not shown. Exon 1, 5' black bar; exon 2, 3' black bar.
however, it appears in Sm antibody immunoprecipitates of snRNPs by virtue of its interaction with U4 (6). Phylogenetic data allowed the development of a model for the interaction between U4 and U6 snRNAs (51; Fig. 3). There are two stem regions involving U4 and U6, separated by an internal U4 stem structure that creates a Y junction. The U6 sequences that interact with U4 in the U4/U6 snRNP and in the tri-snRNP are also required to participate in other RNA-RNA interactions later in spliceosome assembly. The constraints on the U6 sequence to maintain the ability to participate in both sets of interactions must account for some of the high degree of U6 sequence conservation through evolution.
DYNAMIC RNA IN THE SPLICEOSOME
B. Structure of U5 snRNA
The most prominent feature of U5 snRNA is the conserved 11-nucleotide loop at the end of a conserved stem structure (loop I in Fig. 3). The loop is accessible to solvent, as shown by chemical probing experiments (52). The extended stem is punctuated by an internal loop containing an invariant CCG sequence on the 3' side and a longer, less well-conserved sequence on the 5' side (13, 53; Fig. 3). At the base of the stem to the 3' side is the Sm binding site, and following the Sm site there is a 3'-terminal stem loop. An extra sequence can be found at the 5' end of U5 in mammalian cells, encoded by variant genes (54). The single yeast U5 gene produces two RNAs that differ at their 3' ends, with the shorter version lacking the 3' stem loop (13). Yeast U5 also has an extra stem loop projecting from the main stem near the conserved internal loop (13, 53). Genetic experiments have identified an interaction between the nucleotides of the U5 loop and the exon sequences of pre-mRNA (discussed in Sections V,E and VII,A).
C. Interaction between the 5' End of U2 and the 3' End of U6
Psoralen cross-linking of snRNAs in mammalian cells first identified an interaction between the 5' end of U2 and the 3' end of U6 (55; helix 2-6 II in Fig. 3). The potential to form this interaction is phylogenetically conserved from humans to yeast and trypanosomes. Genetic studies in transfected mammalian cells have measured the activity of a suppressor U2 (able to recognize a splicing substrate with a mutant branchpoint) carrying second mutations in the sequence complementary to U6. Such second mutations block suppressor U2 function, but function is regained on transfection of a U6 gene containing a compensatory mutation that restores the helix (56, 57). In yeast, U2 or U6 mutations in the component strands of the helix do not block growth, suggesting that this interaction is not absolutely essential for splicing (42, 58-60). Its broad conservation and the results in mammalian cells suggest that it contributes to the efficiency of splicing. In the assembling spliceosome, the interaction between the 5' end of U2 and the 3' end of U6 may help the tri-snRNP bind to the prespliceosome, or serve to position the assembling RNA elements within the complex (1). Splicing extracts probably contain abundant U2.U4/U6.U5 snRNP complexes in the absence of added pre-mRNA, because psoralen cross-linking experiments reveal the double cross-linked U4-U6-U2 snRNA product in amounts expected given the efficiency of the individual U4-U6 and U6-U2 cross-linked species (61). For this reason, U6 is shown paired with U4 and U2 simultaneously (Fig. 3).
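The yield argument above is simple arithmetic and can be sketched in a few lines. The sketch assumes that the two psoralen cross-links form independently and that essentially all U6 sits in a complex containing both U4 and U2; under those assumptions the expected yield of the doubly cross-linked species is the product of the two individual efficiencies. The function name and the efficiencies used in the demonstration are hypothetical placeholders, not values taken from ref. 61.

```python
def expected_double_crosslink(eff_u4_u6: float, eff_u6_u2: float) -> float:
    """Expected fraction of U6 cross-linked to both U4 and U2.

    Assumes the two cross-links form independently and that every U6
    molecule is in a complex containing both partners (an idealization).
    """
    for eff in (eff_u4_u6, eff_u6_u2):
        if not 0.0 <= eff <= 1.0:
            raise ValueError("cross-linking efficiencies must lie in [0, 1]")
    return eff_u4_u6 * eff_u6_u2


if __name__ == "__main__":
    # Placeholder efficiencies, chosen only to illustrate the calculation.
    print(expected_double_crosslink(0.10, 0.05))
```

If a much smaller fraction of U6 were simultaneously associated with U4 and U2, the observed double cross-link would fall well below this product, which is the logic behind the inference drawn in the text.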
D. The Departure of U1
Demonstrating association of U1 with splicing complexes containing other snRNPs has been technically challenging (1). The weak association of U1 with splicing complexes, as well as the interaction of U6 with the 5' splice site, has led to the proposal that U1 leaves the splicing complex some time after defining the 5' splice site. Thus, although Fig. 3 is drawn to contain all the splicing snRNAs currently known, evidence for a stable complex containing all of them simultaneously is scant. In a direct test of the involvement of U1 in the splicing reactions, assembled yeast spliceosomes blocked at the PRP2 step were isolated and stripped of detectable U1 snRNA (R. J. Lin, personal communication). Such spliceosomes could be chased through the splicing reactions on addition of appropriate factors in the absence of U1, indicating that U1 is not necessary for the catalytic steps of splicing. Recently, a mutually exclusive interaction between U6 and the 5' splice site has been identified (27, 28). These observations, as well as the weak biochemical association of U1 with assembled spliceosomes, suggest that U1 snRNP leaves the spliceosome prior to the first step of splicing.
V. Rearrangement of Spliceosomal RNA Structure during the Catalytic Steps of Splicing
A. An Interaction between U6 and the 5’ Splice Site
UV cross-linking first identified an interaction between U6 and pre-mRNA (Figs. 4 and 5), as well as between U6 and the lariat intermediate, near substrate sequences at the 5' splice site in both yeast (62, 63) and mammalian (16, 61) extracts. Two distinct base-pairing models between U6 and the 5' splice site region were tested, and one, in which the 4, 5, and 6 positions of the splice site pair with the first three nucleotides of the invariant ACAGAGA box of U6, is supported by compensatory mutations in both RNAs (17, 18; Fig. 5). This interaction would seem to be mutually exclusive with pairing of the 5' splice site with U1 (see Fig. 4; compare Fig. 3 with Fig. 5), and it is inferred that after identification of the 5' splice site region by U1, U6 takes over from U1 and assists in specifying the precise bond to be attacked in the first step of splicing. This interaction explains why certain suppressor U1-5' splice site mutations do not function: U6 interactions with the mutant substrate remain perturbed and result in inaccurate specification of the bond to be cleaved (17, 18). Model oligonucleotides representing the 5' splice site can induce the assembly of U2.U4/U6.U5 complexes, even when the 5' end of U1 is
FIG. 4. Rearrangement of snRNA structure during activation of the spliceosome. A representation of human snRNAs as given in Fig. 3, showing changes that occur between assembly and activation. During this time, U1 leaves the 5' splice site to U6, U5 establishes contact with exon 1, and U4 leaves U6, which refolds on itself, as well as establishing new contacts with U2.
blocked (27, 28). Studies using 4-sU-substituted transcripts show that the second nucleotide of the intron (the U of the conserved GU) can be cross-linked to the third A of the U6 ACAGAGA sequence, but only after the first step of splicing (16). This suggests that the interaction between U6 and the 5' splice site region may be altered between the first and second steps of splicing.
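The supported pairing model (positions 4, 5, and 6 of the 5' splice site against the first three nucleotides of the U6 ACAGAGA box) can be checked with a short antiparallel base-pairing routine. This is only a sketch: the intron sequence used is the yeast 5' splice site consensus GUAUGU, which is an assumption brought in for illustration and is not quoted in this section, and only canonical Watson-Crick pairs are scored (G-U wobble pairs are deliberately ignored).

```python
from typing import List, Tuple

# Canonical Watson-Crick pairs only; wobble pairs are not counted here.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def antiparallel_pairs(strand_a: str, strand_b: str) -> List[Tuple[str, str, bool]]:
    """Pair strand_a (read 5'->3') against strand_b read 3'->5'.

    Both strands are supplied 5'->3'; strand_b is reversed internally so
    that the first base of strand_a faces the last base of strand_b.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    return [(a, b, (a, b) in WATSON_CRICK)
            for a, b in zip(strand_a, reversed(strand_b))]


if __name__ == "__main__":
    # Assumed yeast 5' splice site consensus (intron positions 1-6).
    intron_start = "GUAUGU"
    splice_site_4_to_6 = intron_start[3:6]   # intron positions 4, 5, 6
    u6_box_first_three = "ACAGAGA"[:3]       # first three bases of the U6 box
    for base_a, base_b, ok in antiparallel_pairs(splice_site_4_to_6,
                                                 u6_box_first_three):
        print(base_a, base_b, "pairs" if ok else "mismatch")
```

The same routine can be applied to a mutant splice site and a candidate compensatory U6 mutation to see whether complementarity is restored, which is the logic of the compensatory-mutation experiments cited above.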
B. U4 Is Unwound from U6
Before the first catalytic step of splicing occurs, the association of U4 snRNA with the splicing complex becomes greatly weakened or lost (1-5; Fig. 4). As with U1, the question of whether U4 is necessary for the catalytic steps of splicing has been addressed by assembling yeast spliceosomes blocked before the first step and stripping them of U4 snRNA. Spliceosomes lacking U4 can carry out the splicing reactions on addition of the appropriate factors (11), arguing that U4 snRNA does not participate in the catalytic steps of splicing. Because of the extent of interaction between U4 and U6, the destabilization of U4 at physiological temperatures must be an active process (51). In addition, the dramatic loss of such a significant segment of structured RNA argues that U6 snRNA must adopt other structures in the absence of U4, and that these structures may be particularly important to the catalysis of the splicing reactions (51).
FIG. 5. RNA-RNA interactions in the spliceosome after rearrangement. Sequences of human U2, U6, and U5 are folded to show interactions important for the catalytic steps of splicing. As modeled in Fig. 4, an internal stem has formed in U6 (residues 57-78), and residues 49-55 of U6 interact with residues 20-28 of U2 to generate the U2-U6 helix Ia and Ib structures. In addition, U5 residues 40-42 interact with the last three residues of exon 1. The arrow indicates the attack of the 2' hydroxyl of the branchpoint on the 5' splice site. The intron sequences connecting the 5' splice site to the branchpoint and the branchpoint to the 3' splice site are not shown. Exon 1, 5' black bar; exon 2, 3' black bar.
C. The Refolding of U6
What happens to the U6 structure when U4 leaves? Phylogenetic, genetic, and structure-probing data indicate formation of a conserved internal stem in U6 snRNA when U4 is not bound (64, 65; Figs. 4 and 5). The
sequences of U6 that form this internal stem (sometimes called “the 3' stem”) are made up of those that form stem II of the U4/U6 structure (Figs. 3-5). The internal stem of U6 is balanced in stability relative to the U4/U6 interaction (64, 65). Mutations that hyperstabilize the stem result in a cold-sensitive growth phenotype in yeast (65). Suppressors of the hyperstabilizing mutations are similar to suppressors of a cold-sensitive mutation in U4 that destabilizes the U4/U6 interaction (66). These results argue that the relative stabilities of the U4/U6 interaction and the U6 internal stem are so exquisitely tuned that neither has a large stability advantage over the other. This might be expected of nucleic acids that must interconvert between structures.
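The idea of two structures balanced in stability can be illustrated with a two-state Boltzmann calculation. This is a cartoon under stated assumptions rather than a model of the spliceosome: U6 is treated as partitioning at equilibrium between the internal-stem form and the U4-paired form, the free-energy difference is taken to be independent of temperature, and the values used are invented for illustration. The actual interconversion is protein assisted and, as noted in Section V,B, is likely to be an active process.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_internal_stem(ddg_kcal: float, temp_celsius: float) -> float:
    """Equilibrium fraction of U6 in the internal-stem form (two-state model).

    ddg_kcal is G(internal stem) minus G(U4/U6 paired); a negative value
    means the internal stem is the more stable of the two structures.
    """
    rt = R_KCAL_PER_MOL_K * (temp_celsius + 273.15)
    return 1.0 / (1.0 + math.exp(ddg_kcal / rt))


if __name__ == "__main__":
    # Hypothetical free-energy differences: balanced versus hyperstabilized stem.
    for ddg in (0.0, -2.0):
        for temp in (30.0, 16.0):
            frac = fraction_internal_stem(ddg, temp)
            print(f"ddG={ddg:+.1f} kcal/mol, T={temp:.0f} C: "
                  f"stem fraction={frac:.3f}")
```

With no free-energy difference the two forms are equally populated at any temperature, whereas a stem that is favored by even a few kcal/mol dominates, and more so in the cold. That trend is qualitatively in line with the cold-sensitive phenotype of the hyperstabilizing mutations, although the real effect may well be kinetic rather than thermodynamic.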
D. Establishment of Additional Interactions between U2 and U6
Formation of the internal U6 stem replaces many but not all of the U4/U6 interactions. The region of U6 that participates in U4/U6 stem I is highly conserved and shows complementarity to U2 nucleotides 22-28 (human numbering), including two bulged residues (4-6 I in Fig. 3). Genetic experiments in yeast show that complementarity between U6 and U2 in these regions is required for function (33; Figs. 4 and 5). Mutations in several of the U2 residues that form these interactions are lethal and interfere with both steps of splicing in vivo (33). In vitro, such mutations preferentially block the second step (34). The extended interaction is referred to as U2/U6 helix I, with the longer of the two segments designated helix Ia and the shorter helix Ib (2-6 Ia and 2-6 Ib in Fig. 5). Mutation of the corresponding residues in mammalian U6 (67, 68) and U2 (32) causes a slight loss of function. Mutations that hyperstabilize U2 stem I (Fig. 2), which must be disrupted to form both the U2/U6 helix II and helix I interactions, can block splicing (32), suggesting that U2/U6 helix I is also important for mammalian splicing. The dependence on helix I and helix II for efficient splicing seems different in yeast and mammalian cells: mammalian splicing is more sensitive to changes in helix II than in helix I, but yeast growth is more dependent on helix I than on helix II. These differing dependencies could be due to the assays used, or to intrinsic differences in the splicing machineries of the two organisms.
E. U5 Interacts with the First Exon
The highly conserved U5 loop nucleotides interact with the exon regions of the pre-mRNA (Figs. 4 and 5). In yeast, improvement in Watson-Crick complementarity between U5 U residues 96-98 (40-42 in human U5) and the three exon nucleotides just upstream of the splice site positively influences 5' splice site cleavage (69, 70). U5 genes from a library in which the U5 loop sequence was randomized were selected on the basis of their ability to
suppress a 5' splice-site mutation. The activities of a number of such suppressor U5 genes are explained by the hypothesis that residues 96-98 (40-42 in human U5) stabilize an interaction between the splicing machinery and exon 1 of the mutant substrate (69, 70). Using mammalian extracts and a pre-mRNA substrate carrying a photoactivatable 4-sU residue as the next to last (15) or last nucleotide of the first exon, cross-links can be detected between the substituted exon residue and U40 and U41 of U5 (16). The time course of appearance of these cross-links places them before the first step of splicing, but after formation of a detectable U1-substrate cross-link (16). As the splicing reaction proceeds, the early cross-link with U1 becomes less efficient as the U5 cross-link appears. This pattern is consistent with establishment of contacts between U5 and the exon sequences as U1 is displaced from the 5' splice site (Figs. 4 and 5). Presumably the interaction between U6 and the 5' splice site is also established at this time, because it becomes detectable with similar kinetics using a substrate with the photoactivatable residue at a different site (16).
F. The First Chemical Step
Once the appropriate structures are formed, the transesterification reaction can proceed. The 2' OH selected as the nucleophile attacks the phosphate at the 5' end of the intron (arrow, Fig. 5), releasing the first exon and creating a (5')A 2'-5' G(3') dinucleotide that is uniquely present in spliceosomes having completed the first step of splicing. The free exon may be held in position in part through its interaction with U5 (16, 69, 70).
VI. Conserved Residues in the Core of the Assembled Spliceosome
Figure 6 shows a model for the secondary structure elements formed by rearrangement of the snRNAs during formation of the spliceosome. The sequence of human snRNAs is used to create the model. Conserved secondary structure elements that vary widely in primary sequence are represented by stem loops. Nucleotide differences between human and yeast (13) are indicated by arrows. Because the lengths of mammalian and yeast U2, U6, and U5 differ, two numbering systems are superimposed on the model. It is obvious that the most highly conserved residues are in the core of this folded RNA structure, as are the sites of the chemical reactions. Variation is restricted to peripheral elements, or is consistent with the maintenance of secondary structure elements (Fig. 6).
FIG. 6. Sequence variation between yeast and humans in core RNA elements of the spliceosome. The human sequence is folded as in Fig. 5. Regions of structural conservation, but little primary sequence conservation, are represented without sequence. The highly conserved sequences are shown, and substitutions in the yeast snRNAs are indicated by arrows and bold italics. Standard numbering is human; numbers followed by Y indicate the number of the homologous nucleotide in the yeast spliceosome.
VII. Interactions during the Second Catalytic Step
A. U5 Interacts with the Second Exon
Splicing of yeast pre-mRNAs is blocked by changes in either the first or last G of the intron. If an A is selected as the 5' intron nucleotide in the first step, then the second step is blocked, producing “dead-end” lariats. This defect can be suppressed by mutations in U5 that improve Watson-Crick complementarity between U5 nucleotides 95 and 96 (human nucleotides 39 and 40) and the first two nucleotides in the second exon (70; Fig. 7). Suppression is not limited to dead-end lariats formed by improper 5' splice-site selection: substrates blocked at the second step by mutation of the 3' splice
FIG. 7. RNA-RNA interactions during the second step of splicing. Human sequences are folded to show new interactions that form between U5 residues 39 and 40 and the first two nucleotides of the second exon. An interaction between the first and last nucleotides of the intron (circled) is indicated by a dashed line. The intron sequences connecting the 5' splice site to the branchpoint and the branchpoint to the 3' splice site are not shown. Exon 1, 5' black bar; exon 2, 3' black bar.
site AG to AA can also be processed in cells carrying U5 with improved complementarity to the second exon. These observations indicate that interactions between U5 and the second exon are important in the selection of the 3' splice site for the second step of splicing (70). U5 position 96 (human U5 residue 40) complementarity to the last nucleotide of exon 1 improves the efficiency of (aberrant) 5' splice-site cleavage, and complementarity of the
same U5 nucleotide to the first nucleotide of exon 2 improves efficiency of 3' splice-site cleavage when either the branch or the 3' splice site is noncanonical (69, 70). This suggests that the register of the U5 loop with respect to exon 1 may change between the first and second step (70). In mammalian cell extracts, arguments for interaction between U5 and both exons are more direct. Pre-mRNA substrates carrying a single 4-sU residue at the last nucleotide of exon 1 can be cross-linked to U5 before the first catalytic step, and the cross-linked material can be chased through the splicing reactions (16). Furthermore, substrates carrying the 4-sU substitution at the first nucleotide of the second exon become cross-linked to U5 only after the first step of splicing, consistent with a role for the U5 loop in binding the second exon (16).
B. Second-Step Function of Nucleotides in the U6 ACAGAGA Sequence
The ability to reconstitute splicing extracts depleted of endogenous U6 snRNA using synthetic U6 allows mutational analysis of U6 function in splicing in vitro. Among other things, the data identify nucleotides that, when altered, allow spliceosome assembly and the first step of splicing, but block the second step in both yeast (71) and mammalian (68) systems. The third A and (in yeast) the following G of the ACAGAGA sequence (A51, G52 in yeast, 71; A45 in human U6, 68) appear to be specifically required for the second step because their mutation results in a strong accumulation of splicing intermediates but little or no product. Cross-linking data, using 4-sU substitution at the U of the 5' splice site GU dinucleotide, show that, after the first step of splicing in human extracts, A45 of U6 is very near the 5' splice site (16).
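Because yeast and human residue numbers are quoted side by side throughout, a small conversion helper can make the correspondences easier to follow. The sketch below infers a constant offset for each snRNA from the residue pairs stated in the text (U5 96-98 in yeast against 40-42 in human; U6 A51 in yeast against A45 in human). Treating the offset as constant is an assumption that can hold only within the conserved regions discussed here, since the yeast and human RNAs differ in length elsewhere; the names used are hypothetical.

```python
# Offsets (yeast position minus human position) inferred from residue pairs
# quoted in the text. They are assumed to hold only near those residues.
YEAST_MINUS_HUMAN = {
    "U5": 96 - 40,  # U5 loop: yeast 96-98 correspond to human 40-42
    "U6": 51 - 45,  # U6 ACAGAGA box: yeast A51 corresponds to human A45
}


def yeast_to_human(snrna: str, yeast_pos: int) -> int:
    """Convert a yeast residue number to the human number (local offset only)."""
    return yeast_pos - YEAST_MINUS_HUMAN[snrna]


def human_to_yeast(snrna: str, human_pos: int) -> int:
    """Convert a human residue number to the yeast number (local offset only)."""
    return human_pos + YEAST_MINUS_HUMAN[snrna]


if __name__ == "__main__":
    print("yeast U5 95 -> human U5", yeast_to_human("U5", 95))
    print("human U6 45 -> yeast U6", human_to_yeast("U6", 45))
```

No offset is included for U2 because the text does not quote a matched yeast and human residue pair for it.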
C. U2/U6 Helix Ia Has an Important Second-Step Function
Nucleotides in U6 and U2 interact by Watson-Crick base-pairing to comprise helix Ia (33). In reconstituted yeast splicing extracts in vitro, mutation of either a conserved pyrimidine (C58 yeast, 71; U52 human, 68) or an invariant A (A59 yeast, 71) in U6 produces a strong block to the second step. Mutations in yeast U2 G26 or A27 also allow the first step but block the second step of splicing in vitro (34). These residues form the two base pairs of the U2.U6 helix Ia nearest the two bulged bases that separate helix Ia from helix Ib (2-6 Ia, Figs. 5 and 7, 33). This structure may form prior to the first step as proposed (33); however, its first-step function (if any) is not essential (34). Considering the dynamic properties of the spliceosome, the possibility remains that critical elements of this structure do not form until after the first step.
An interaction between the bulged U2 residue A25 and the U6 G52 in the U6 ACAGAGA sequence has been proposed, on the basis of suppression data in yeast (Fig. 7; 72). Suppression could be indirect, but if it represents direct contact between the two bases, it suggests that the U2.U6 helix I may be folded back onto the ACAGAGA element. The role of these nucleotides in the second step is mentioned above; here, suppression of 3' splice-site mutations provides additional support for a second-step function (72). As the secondary-structure elements of the spliceosome become defined, tertiary-structure elements will provide spatial relationships between the known helices. In this case, a model for the spatial relationship between these regions also links the key nucleotides specifically involved in the second step (72).
D. The Identities of the Branchpoint Bases and Last Intron Base Influence the Second Step
The first intron base and the base at the branch can influence the efficiency of the second catalytic step (1-5). Normally, the branch is A(2'-5')GU. First-step reactions leading to, for example, A(2'-5')A, A(2'-5')U, or C(2'-5')G branches are blocked for the second step of splicing in yeast (2-4). In the case of A(2'-5')AU branches, the second step can proceed with modest efficiency if the 3' splice site is AC rather than AG, arguing that the branched bases influence 3' splice site selection (Fig. 7; 73). Branched C(2'-5')G substrates can be encouraged to participate in the second step by virtue of the dominant suppressor activity of certain PRP16 mutations (2-4). In mammalian cells, A(2'-5')G is the most common branch dinucleotide, but branches with U(2'-5')G are also observed. In certain instances when G carries the attacking 2' hydroxyl [forming a G(2'-5')G branched dinucleotide], the second step is inhibited in vitro (47). Whether the branchpoint base influences the second step by virtue of a base-specific interaction has not been investigated.
VIII. Modified Nucleotides in Splicing
Nucleotide modification is common in structural RNAs such as tRNA and ribosomal RNA, and the spliceosomal RNAs are no exception (75). Figure 8 shows the known nucleotide modifications in vertebrate snRNAs, folded as they might be in the activated spliceosome. Strikingly, the majority of the modified nucleotides are located in the core of the structure. Numerous invariant U residues in U2 and U5 are modified to pseudouridine, adding a potential hydrogen bond donor to the 5 position of the pyrimidine ring at all
FIG. 8. Sites of nucleotide modification in the activated spliceosome. Figure 4 is redrawn here to show the modifications internal to and at the ends of the human spliceosomal snRNAs. Greek letter psi (Ψ), pseudouridine; A6, N6-methyladenine; G2, N2-methylguanosine; circled residues, 2'-OCH3 sugar modification. The intron sequences connecting the 5' splice site to the branchpoint and the branchpoint to the 3' splice site are not shown. Exon 1, 5' black bar; exon 2, 3' black bar.
these residues. The A43 in the conserved ACAGAGA sequence of U6 is methylated on the N6 position, possibly restricting certain interactions with this base (Fig. 8). Likewise, the function of the invariant A30 of U2 could be influenced by N6 methylation. An N2 methyl group on G71 of U6 in the central stem is also present. Methylation of 2'-OH groups of the ribose moieties is also common in the conserved core.
So far, no nucleotide modification in the spliceosomal snRNA has been ascribed a function. As suggested, methylation of potential hydrogen bond donors may prevent competing nonfunctional RNA-RNA interactions from forming. Alternatively, such modifications may increase the hydrophobic character of certain RNA structural elements and help exclude water from the catalytic apparatus as it folds. Finally, methylation of 2' hydroxyls may protect adjacent phosphodiester bonds from attack, as well as preventing certain 2'-OH groups from being misidentified as nucleophiles (Fig. 8). Little information is available about modifications in the yeast spliceosomal snRNAs; however, mutation of many yeast U2 U residues equivalent to those present as pseudouridine in mammalian U2 is not lethal (31, 34; D. Yan and M. Ares, unpublished), suggesting that many modifications may not be essential.
IX. Release of Spliced mRNA and Regeneration of snRNA Structures
Once the splicing reaction is complete, the mRNA must be released from the splicing complex, the snRNA interactions with the intron and exons must be melted, and snRNA structures important for spliceosome assembly must be regenerated if the snRNAs are to be used for another round of splicing. Release of spliced mRNA requires PRP22 protein (74), possibly to disrupt interactions between the spliced exons and U5. U2 and U6 remain associated with the intron product as indicated by cross-linking; these interactions must also be disrupted to release the snRNA from the intron. In addition, the base-pairs between U2 and U6 must be disrupted to release these snRNAs from each other. Following its release, U6 must unfold and reassociate with U4 to regenerate the U4/U6 snRNP. Proteins that may be candidates for factors that function in this way have been identified by genetic approaches in yeast. Mutations in the PRP24 gene act as recessive suppressors of a U4 mutation that destabilizes the U4/U6 stem I (66). The PRP24 protein binds U6 snRNA, but not the U4/U6 complex, except in the mutant U4 strain. One hypothesis is that PRP24 is required to regenerate the U4/U6 snRNP and subsequently U4/U6.U5 snRNP for spliceosome assembly (66). It is difficult to tell from the genetic studies performed thus far whether the defect in these mutants is in regeneration of sufficient amounts of U4/U6.U5 snRNP or is manifested during spliceosome assembly or function of the tri-snRNP components once assembled.
X. Conclusions and Perspectives
The structure of snRNA appears to contribute to the function of the spliceosome in several different ways, serving to recognize substrate, position attacking groups, and possibly stabilize the association of intermediates. If the snRNAs of the assembled and activated spliceosome represent a ribozyme, then what is their relationship to other ribozymes, especially the self-splicing group-II introns with which they share mechanistic similarity (1, 76)? Divalent metal ions (notably Mg2+) are critically important in the function of ribozymes, both for folding and possibly catalysis (77-79). If the spliceosome is a metalloribozyme, then how and by what RNA elements are the catalytic metals positioned? Thus far, the demonstration of a catalytic activity related to splicing by RNA derivatives of the snRNAs in the absence of protein has not been possible. How the proteins of the spliceosome enhance the operation of its RNA elements is a matter of conjecture: despite the sequence similarity between RNA helicases and some splicing factors, no RNA helicase activity has been identified (1-5). Thus, the picture of the spliceosome that emerges from the available data remains blurred in several important areas. Further technical breakthroughs in purification and biochemical analysis of splicing components and in resolving events during splicing will be necessary to refine this picture and address many of the pressing mechanistic questions.
ACKNOWLEDGMENTS
We thank R. J. Lin, Hitten Madhani, and Christine Guthrie for communicating results prior to publication. We thank Doug Black, Michelle Haynes, Scott Seiwert, and Al Zahler for making many valuable comments on the manuscript. Supported by National Institute of General Medical Sciences Grant GM40478 and a Research Career Development Award to M.A.
REFERENCES
1. M. Moore, C. Query and P. Sharp, in “The RNA World” (R. F. Gesteland and J. F. Atkins, eds.), p. 303. CSHLab Press, Plainview, New York, 1993.
2. B. Rymond and M. Rosbash, in “The Molecular and Cellular Biology of the Yeast Saccharomyces” (E. W. Jones, J. R. Pringle and J. R. Broach, eds.), p. 143. CSHLab Press, Plainview, New York, 1992.
3. S. Ruby and J. Abelson, Trends Genet. 7, 79 (1991).
4. C. Guthrie, Science 253, 157 (1991).
5. M. R. Green, Annu. Rev. Cell Biol. 7, 559 (1991).
6. J. Steitz, D. L. Black, V. Gerke, K. A. Parker, A. Kramer, D. Frendewey and W. Keller, in “Structure and Function of Major and Minor Small Nuclear Ribonucleoprotein Particles” (M. L. Birnsteil, ed.), p. 100. Springer-Verlag, Berlin and New York, 1988.
7. I. Mattaj, in “Structure and Function of Major and Minor Small Nuclear Ribonucleoprotein Particles” (M. L. Birnsteil, ed.), p. 100. Springer-Verlag, Berlin and New York, 1988.
8. R. Luhrmann, in “Structure and Function of Major and Minor Small Nuclear Ribonucleoprotein Particles” (M. L. Birnsteil, ed.), p. 71. Springer-Verlag, Berlin and New York, 1988.
9. M. Moore and P. Sharp, Nature 365, 364 (1993).
10. S. Burgess and C. Guthrie, Cell 73, 1377 (1993).
11. S. L. Yean and R. J. Lin, MCBiol 11, 5571 (1991).
12. T. Nilsen, Annu. Rev. Microbiol. 47, 413 (1993).
13. C. Guthrie and B. Patterson, ARGen 22, 387 (1988).
14. J. Puglisi, J. Wyatt and I. Tinoco, JMB 214, 437 (1990).
15. J. Wyatt, E. Sontheimer and J. Steitz, Genes Dev. 6, 2542 (1992).
16. E. Sontheimer and J. Steitz, Science 262, 1989 (1993).
17. C. Lesser and C. Guthrie, Science 262, 1982 (1993).
18. S. Kandels-Lewis and B. Seraphin, Science 262, 2035 (1993).
19. Y. Zhuang and A. M. Weiner, Cell 46, 827 (1986).
20. P. Siliciano and C. Guthrie, Genes Dev. 2, 1258 (1988).
21. B. Seraphin, L. Kretzner and M. Rosbash, EMBO J. 7, 2533 (1988).
22. R. Reed, Genes Dev. 3, 2113 (1989).
23. C. Reich, R. VanHoy, G. Porter and J. A. Wise, Cell 69, 1159 (1992).
24. B. Seraphin and S. Kandels-Lewis, Science 262, 2035 (1993).
25. J. Bruzik and J. Steitz, Cell 62, 889 (1990).
26. S. Seiwert and J. Steitz, MCBiol 13, 3135 (1993).
27. K. Hall and M. Konarska, PNAS 89, 10969 (1992).
28. B. Konforti, M. Koziolkiewicz and M. Konarska, Cell 75, 863 (1993).
29. M. Ares and A. H. Igel, Genes Dev. 4, 2132 (1990).
30. S.-E. Behrens, K. Tyc, B. Kastner, J. Reichelt and R. Luhrmann, MCBiol 13, 307 (1993).
31. M. Ares and A. H. Igel, in “The Molecular Biology of RNA” (T. Cech, ed.), p. 13. Alan R. Liss, New York, 1989.
32. J. Wu and J. Manley, MCBiol 12, 5464 (1992).
33. H. Madhani and C. Guthrie, Cell 71, 803 (1992).
34. D. McPheeters and J. Abelson, Cell 71, 819 (1992).
35. M. Zavanelli and M. Ares, Genes Dev. 5, 2521 (1991).
36. M. Zavanelli, J. Britton, A. H. Igel and M. Ares, MCBiol 14, 1689 (1994).
37. S. Barabino, B. Sproat and A. Lamond, NARes 20, 4457 (1992).
38. B. Datta and A. M. Weiner, JBC 267, 4497 (1992).
39. R. Parker, P. Siliciano and C. Guthrie, Cell 49, 229 (1987).
40. Y. Zhuang and A. M. Weiner, Genes Dev. 3, 1545 (1989).
41. J. Wu and J. Manley, Genes Dev. 3, 1553 (1989).
42. L. Miraglia, S. Seiwert, A. H. Igel and M. Ares, PNAS 88, 7061 (1991).
43. K. Nelson and M. Green, Genes Dev. 3, 1562 (1989).
44. D. Wasserman and J. Steitz, Science 257, 1918 (1992).
45. K. Hartmuth and A. Barta, MCBiol 8, 2011 (1988).
46. J. Noble, Z.-Q. Pan, C. Prives and J. Manley, Cell 50, 227 (1987).
47. C. Query, M. Moore and P. Sharp, Genes Dev. 8, 587 (1994).
48. P. Bringmann, B. Appel, J. Rinke, R. Reuter and H. Theissen et al., EMBO J. 3, 1357 (1984).
49. C. Hashimoto and J. Steitz, NARes 12, 3283 (1984).
50. J. Rinke, B. Appel, M. Digweed and R. Luhrmann, JMB 185, 721 (1985).
51. D. Brow and C. Guthrie, Nature 334, 213 (1988).
52. D. L. Black and A. Pinto, MCBiol 9, 3350 (1989).
53. D. Frank, H. Roiha and C. Guthrie, MCBiol 14, 2180 (1994).
54. E. Sontheimer and J. Steitz, MCBiol 12, 734 (1992).
55. T. Hausner, L. Giglio and A. M. Weiner, Genes Dev. 4, 2146 (1990).
56. B. Datta and A. M. Weiner, Nature 352, 821 (1991).
57. J. Wu and J. Manley, Nature 352, 818 (1991).
58. P. Fabrizio, D. McPheeters and J. Abelson, Genes Dev. 3, 2137 (1989).
59. H. Madhani, R. Bordonne and C. Guthrie, Genes Dev. 4, 2264 (1990).
60. E. Shuster and C. Guthrie, Nature 345, 270 (1990).
61. D. Wasserman and J. Steitz, PNAS 90, 7139 (1993).
62. H. Sawa and Y. Shimura, Genes Dev. 6, 244 (1992).
63. H. Sawa and J. Abelson, PNAS 89, 11269 (1992).
64. T. Wolff and A. Bindereif, Genes Dev. 7, 1377 (1993).
65. D. Fortner, R. Troy and D. Brow, Genes Dev. 8, 221 (1994).
66. K. Shannon and C. Guthrie, Genes Dev. 5, 773 (1991).
67. B. Datta and A. M. Weiner, MCBiol 13, 5377 (1993).
68. T. Wolff, R. Menssen, J. Hammel and A. Bindereif, PNAS 91, 903 (1994).
69. A. Newman and C. Norman, Cell 65, 115 (1991).
70. A. Newman and C. Norman, Cell 68, 743 (1992).
71. P. Fabrizio and J. Abelson, Science 250, 404 (1990).
72. H. Madhani and C. Guthrie, Genes Dev. 8, 1071 (1994).
73. R. Parker and P. Siliciano, Nature 361, 660 (1993).
74. M. Company, J. Arenas and J. Abelson, Nature 349, 487 (1991).
75. R. Reddy and H. Busch, in “Structure and Function of Major and Minor Small Nuclear Ribonucleoprotein Particles” (M. L. Birnsteil, ed.), p. 1. Springer-Verlag, Berlin and New York, 1988.
76. A. M. Weiner, Cell 72, 161 (1993).
77. M. Yarus, FASEB J. 7, 31 (1993).
78. A. Pyle, Science 261, 709 (1993).
79. T. Steitz and J. Steitz, PNAS 90, 6498 (1993).
80. K. Marchhoff and R. Padgett, NARes 20, 1949 (1992).
81. P. Sharp, Cell 77, 805 (1994).
82. T. Nielsen, Cell 78, 1 (1994).
83. H. Madhani and C. Guthrie, Annu. Rev. Genet. in press (1994).
Transcriptional Control of the Human Apolipoprotein B Gene in Cell Culture and in Transgenic Animals¹
BEATRIZ LEVY-WILSON
Palo Alto Medical Foundation Research Institute
Palo Alto, California 94301
I. Boundaries and Chromatin Organization of the Human apo-B Gene
II. Identification of apo-B Regulatory Elements, Using Cell-Culture Models
III. Probing the in Vivo Function of the apo-B Gene Regulatory Elements
IV. Concluding Remarks
References
Apolipoprotein B (apo-B) is the major and perhaps the sole protein component of low-density lipoproteins (LDLs), which play a central role in the metabolism and transport of cholesterol. Apo-B is the ligand responsible for the uptake and clearance of low-density lipoproteins from the circulation via the LDL receptor pathway (1-3). The delivery of LDL to its receptor and the ensuing catabolism regulate cellular cholesterol biosynthesis. Thus, apo-B and the LDL receptor are crucial to cholesterol homeostasis, and deficiencies in either protein may contribute to the development of atherosclerosis. Elevated levels of apo-B and LDL-cholesterol in plasma are a major risk factor for premature heart disease, whereas low levels appear to be protective (3-5). The primary structure of the 500-kDa apo-B protein has been determined by cDNA cloning and sequencing (6). Apo-B mRNA has been detected only in the liver, intestine, and placenta in mammals (7, 8), thus making the human apo-B gene a suitable model system for studying the regulation of tissue-specific gene expression. The complete exon-intron structure and DNA sequence of the human apo-B gene have been determined (9); the coding portion of the gene extends over 43 kb and contains 29 exons and 28 introns (10). One exon, exon 26, is unusually long (7572 bp) and codes for the main portion of the LDL receptor-binding domain of apo-B (11).
In this article, I review recent progress in three major areas of work with the apo-B gene, namely, (1) mapping regulatory elements required for hepatic and intestinal expression of the human apo-B gene, using cell-culture models; (2) ascertaining the functions of these regulatory elements in the expression of the apo-B gene in vivo, using transgenic mice; and (3) using this knowledge to generate transgenic mice that overproduce human apo-B, to test its role in atherogenesis.
¹ A list of abbreviations appears on page 188.
Progress in Nucleic Acid Research and Molecular Biology, Vol. 50. Copyright © 1995 by Academic Press, Inc. All rights of reproduction in any form reserved.
I. Boundaries and Chromatin Organization of the Human apo-B Gene
With the structure of the coding portion of the gene determined, one of our initial goals was to determine the length of the chromosomal domain in which the human apo-B gene resides in nuclei from transcriptionally active and inactive cell types. The intent was to identify the 5' and 3' borders of the gene, so that we could concentrate our efforts on mapping important control regions within this domain.
In eukaryotic cells, chromatin is organized as domains or loops that are generated by periodic attachment of the chromatin fiber to protein components of a nuclear matrix, or scaffold. These chromosomal loops may exert a function in gene regulation (12-16). The length of the chromatin domain encompassing the human apolipoprotein-B gene was evaluated by determining the locations of nuclear matrix attachment sites as well as the boundaries of the DNase-I-sensitive domain in cells that express the gene (such as HepG2 and CaCo-2 cells) and in those that do not (HeLa cells) (Fig. 1) (17). Three nuclear matrix attachment regions (MARs) of the human apolipoprotein-B gene were localized: a 3'-proximal MAR, between nucleotides +43,186 and +43,850; a 5'-proximal MAR, between nucleotides -2765 and -1801; and a 5'-distal MAR, between nucleotides -5262 and -4048. Both the 3'-proximal and the 5'-distal MARs were present in cells that express the gene (HepG2 and CaCo-2 cells) as well as in cells that do not (HeLa cells), whereas the 5'-proximal MAR was detected only in HepG2 cells. These MARs were located at the bases of chromosomal loops in histone-extracted nuclei in all three cell lines. Various classes of (A+T)-rich sequences resembling the recognition site for topoisomerase II were present within the MAR-containing fragments. The boundaries of the DNase-I-sensitive domain coincided with the positions of the 3'-proximal and 5'-distal matrix attachment sites.
These results suggested the existence of a 47.5-kb domain representing a topologically sequestered functional unit containing the coding region and all known cis-acting regulatory elements of the human apolipoprotein-B gene (17).
FIG. 1. The 5' and 3' boundaries of the human apo-B gene. The horizontal line represents the length of the gene, and the vertical bars crossing the horizontal line represent its 29 exons. ATG denotes the translational start site. The MARs and SARs are represented below the gene by two kinds of triangles, one marking the positions of MARs and SARs found in all three cell types and the other marking the positions of MARs and SARs present only in HepG2 cells. Boundaries of the DNase-I-sensitive domain are represented by the bars below the triangles. Reproduced with permission from Ref. 17.
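As a rough check on the domain size, the MAR coordinates quoted above can be turned into a span. The sketch below is illustrative only: the variable and function names are hypothetical, and treating the inner and outer MAR edges as lower and upper bounds on the domain is an assumption, since the text does not say where within each MAR the boundary lies.

```python
# MAR coordinates (nucleotides relative to the transcriptional start site),
# copied from the text. Names are illustrative, not from the original work.
FIVE_PRIME_DISTAL_MAR = (-5262, -4048)
THREE_PRIME_PROXIMAL_MAR = (43186, 43850)


def span_kb(start, end):
    """Length in kb between two coordinates on the same axis."""
    return (end - start) / 1000.0


# Assumption: the domain boundary lies somewhere inside each MAR, so the
# inner and outer MAR edges bracket the possible domain length.
inner = span_kb(FIVE_PRIME_DISTAL_MAR[1], THREE_PRIME_PROXIMAL_MAR[0])
outer = span_kb(FIVE_PRIME_DISTAL_MAR[0], THREE_PRIME_PROXIMAL_MAR[1])

print(f"inner-edge span: {inner:.1f} kb")
print(f"outer-edge span: {outer:.1f} kb")
```

Under that assumption the two estimates are about 47.2 and 49.1 kb, so the 47.5-kb figure quoted in the text lies between them.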
II. Identification of apo-B Regulatory Elements, Using Cell-Culture Models
A. Localization of the apo-B Promoter
The localization of a segment of the apo-B gene exhibiting tissue-specific promoter activity was accomplished by a combination of two experimental approaches. The first involved the determination of DNase-I-hypersensitive (DH) sites in the region of the gene flanking the transcriptional start site; the second involved testing DNA fragments from the 5' end of the gene for their ability to initiate transcription of a reporter gene, the chloramphenicol acetyltransferase (CAT) gene, in transient transfection assays with various cultured cell lines. In most studies, we used liver-derived HepG2 cells and intestine-derived CaCo-2 cells as models representing transcriptionally active liver-like and intestine-like cells, respectively. HeLa cells were used as a model for transcriptionally inactive cells.
The DNase-I-hypersensitivity studies (18) revealed the presence of three DH sites within a DNA sequence extending from -899 to +1 of the human apo-B gene, plus a fourth one within the second intron (Fig. 2). These sites are near positions -120 (DH1), -440 (DH2), -700 (DH3), and +760 (DH4). DNase-I-hypersensitive sites 1-3 coincided with the locations of three micrococcal-nuclease-hypersensitive (MH) sites, but only in nuclei from transcriptionally active HepG2 and CaCo-2 cells. None of these DH or MH sites was detected in nuclei from transcriptionally inactive HeLa cells. Comparison of the DNA sequence of the 5' flanking regions of the human and mouse apo-B genes revealed a high degree of evolutionary conservation of
short stretches of sequences in the immediate vicinity of each of the DH and most of the MH sites (18). Of note is DH1, the strongest DH site in the promoter region, which mapped to the segment from -111 to -88 of the gene, where an important hepatocyte nuclear factor-3 (HNF-3) binding site resides (19). DH4, on the other hand, mapped to an NF-1 binding site that is also highly conserved (20).
FIG. 2. DNase-I-hypersensitive sites in the 5' end of the human apo-B gene. The upper portion of the figure shows the location of the DH sites, indicated by the numbered arrows. The main horizontal line shows the 5' end of the gene with exons 1 and 2 as open boxes. Restriction endonuclease sites are indicated by the small arrows below the line. The positions of the TATA box and of nucleotide -899 are also indicated [18].
To map promoter elements in the 5' region of the human apo-B gene, a 1-kb PvuII fragment immediately preceding the translational start site (-899 to +121) was inserted in front of the bacterial CAT gene and transfected into six different cell lines. The results (Fig. 3) show that there was CAT activity in HepG2 and CaCo-2 cells, but not in HeLa, U937, L, or CHO cells, suggesting that the PvuII fragment contains DNA sequence elements with promoter activity specific for liver-derived and intestine-derived cell lines (18). Several CCGG and GCGC sequences present within the promoter region are undermethylated in HepG2 and CaCo-2 cells, but are fully methylated in HeLa cells, suggesting a correlation between undermethylation, nuclease sensitivity, and promoter activity in this region (Fig. 4) (21). Extensive studies of this region (22-24) revealed binding sites for a number of liver-enriched transcription factors, such as HNF-4 (25), C/EBP (26), and HNF-3 (19). The in vivo functional roles of many nuclear proteins that bind to the apo-B promoter sequence remain to be determined.
The similarities and differences in the function of regulatory elements in this promoter region (-898 to +121) of the human apo-B gene in liver-derived and intestine-derived cells were examined in transient transfection experiments using a series of 5' deletions of the promoter region (Fig. 5) (22). The overall distribution of positive and negative regulatory segments was very similar in two hepatoma cell lines (HepG2 and Hep3B) but different from that observed in colon carcinoma cells (CaCo-2). For example, whereas
260 bp of 5'-flanking sequence suffice for maximal expression of the promoter in HepG2 cells, only 139 are required for maximal expression in CaCo-2 cells. A low level of promoter activity was observed in CHO cells but only with short constructs, with maximal activity for the -85 construct (Fig. 5).
FIG. 3. Cell specificity of apo-B promoter expression. A 1020-bp fragment extending from -899 to +121 of the apo-B gene was inserted into pLSl upstream of a promoterless CAT gene. The plasmid was transfected into the various cultured cell lines shown below, and the level of CAT activity in each cell line was determined [18].
Gel retention experiments using the region from -262 to -88 (the region of greatest contrast between HepG2 and CaCo-2 cells) revealed interesting variations in the relative abundance of various nuclear proteins between the two cell types (22). A major functional difference between HepG2 and CaCo-2 cells was localized to the region between -111 and -88, which harbors the sequence TGTTTGCT, a motif present in the promoter region of
several liver-specific genes. The molecular basis for the functional differences between these two cell types may be attributable to a difference in the relative abundance of three proteins that bind to sequences between -111 and -88 (22).
FIG. 4. Summary of the DNA methylation status in the apo-B promoter region. First panel: the distribution of methylated CCGG sequences (MspI sites), shown as circles, in HeLa DNA; the three DH arrows above the first and third panels illustrate the positions of DNase-I-hypersensitive (DH) sites in that region. Second panel: the distribution of undermethylated and methylated CCGG sequences (HpaII sites), shown as circles, in HepG2 and CaCo-2 DNA; shaded circles represent sites whose methylation status is unknown. Third panel: the locations of methylated GCGC sequences (HhaI sites), shown as squares, in HeLa DNA. Fourth panel: the locations of undermethylated and methylated GCGC sequences (HhaI sites), shown as squares, in HepG2 and CaCo-2 DNA; shaded squares represent sites whose methylation status is unknown. The horizontal line at the bottom represents a scale in base-pairs; the position of the transcriptional start site (+1) is indicated in the first and third panels. Reproduced with permission from Ref. 21.
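The relative CAT activities reported in this section are ratios: each CAT reading is corrected for transfection efficiency using a cotransfected β-galactosidase plasmid and then expressed relative to a reference construct. A minimal sketch of that arithmetic follows. It assumes the correction is a simple division by the β-galactosidase reading, which the text does not spell out; the function name and the numbers are hypothetical and are not data from the experiments described here.

```python
def relative_cat_activity(cat, beta_gal, ref_cat, ref_beta_gal):
    """Return CAT activity relative to a reference construct.

    Each CAT reading is divided by the beta-galactosidase reading from the
    same flask (used here as a proxy for transfection efficiency), and the
    corrected value is expressed as a ratio to the corrected reference.
    The division-based correction is an assumption made for illustration.
    """
    if beta_gal <= 0 or ref_beta_gal <= 0:
        raise ValueError("beta-galactosidase readings must be positive")
    corrected = cat / beta_gal
    corrected_ref = ref_cat / ref_beta_gal
    return corrected / corrected_ref


# Hypothetical readings in arbitrary units, for illustration only.
ratio = relative_cat_activity(cat=600.0, beta_gal=1.5,
                              ref_cat=200.0, ref_beta_gal=1.0)
print(round(ratio, 2))
```

Multiplying the returned ratio by 100 would give the percentage form used for the promoter deletion series.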
B. The Second Intron Enhancer
The DNase-I hypersensitivity studies revealed, in addition to the aforementioned "promoter" DH sites, a strong, tissue-specific DH site designated as DH4 (18), localized in the second intron of the gene in a segment of highly conserved DNA sequence reminiscent of an NF-1 binding site (20). This observation prompted an examination of the region containing and surrounding DH4 for the presence of additional regulatory elements. A 704-bp PstI fragment comprising sequences from the first and second introns of the human apolipoprotein B gene (positions +360 to +1064) possesses
tissue-specific transcriptional enhancer elements when assayed in transient transfection experiments using either the apo-B or the thymidine kinase promoter (Fig. 6) (20).
FIG. 5. Relative CAT activities of apo-B promoter deletions in HepG2, Hep3B, CaCo-2, and CHO cells. Deletions from the 5' end of the apo-B promoter were made as described (22). In each experiment, 15 µg of DNA was cotransfected with 5 µg of plasmid containing the Rous sarcoma virus β-galactosidase gene, and the CAT activity was assayed after correction for differences in transfection efficiency. (I) The left portion shows a scale, in base-pairs, representing the region from -900 to +121 of the apo-B gene. Below the scale, the horizontal bars illustrate the deletions, with the numbers indicating the positions of the 5' ends of each deletion. On the right are the values of CAT activity, expressed as percentage relative to that of the -898 construct; N is the number of independent transfections. Because the expression level of the -898 construct was very low in CHO cells, the data are expressed as percentages of CAT activity relative to the construct with the highest activity for this cell line (-85). Chloramphenicol acetyltransferase activities shown in this figure can only be compared vertically, not horizontally. (II) The data from panel I are drawn as bar diagrams to facilitate visualization of the differences between the cell lines. The position of the vertical bars on the abscissa indicates the 5' end point of each deletion. The size of the bars in each panel represents the relative CAT activity of the deletion constructs. Reproduced with permission from Ref. 22.
Most of the enhancer activity, which was observed in transcriptionally active HepG2 and CaCo-2 cells, but not in transcriptionally inactive CHO or HeLa cells, was subsequently located in a 443-bp SmaI-PvuII fragment (positions +621 to +1064) within the second intron of
the apo-B gene. The actual sequence elements that bind to nuclear proteins from HepG2 cells were identified by DNase-I footprinting.
FIG. 6. Enhancer activity in the apo-B gene second intron. (A-F) Autoradiographs representing chloramphenicol acetyltransferase assays performed with HepG2 (A, B, D, and E) and CaCo-2 (C and F) cells. In every case, the lower spots represent the substrate and the upper spots represent the product of the CAT enzymatic reaction. The numbers below the autoradiographs indicate the relative CAT activity, with the Pvu-CAT and TK-CAT control values set at 1.0. All values are the average of at least four independent transfections and have been corrected for the differences in transfection efficiencies of the various flasks, as determined by the β-galactosidase activities. (A) Enhancer effect of the 443-bp fragment in both orientations on the apo-B promoter in HepG2 cells; (B) similar experiments with the thymidine kinase promoter; (C) results obtained with the apo-B promoter in CaCo-2 cells; (D) effect of the 275-bp fragment in both orientations on the apo-B promoter in HepG2 cells; (E) similar data with the thymidine kinase promoter; (F) results with the apo-B promoter in CaCo-2 cells. Shown below is a restriction map of the relevant region, with the numbers in base-pairs below the map indicating the positions along the apo-B gene sequence of the two PvuII sites and the SmaI site. Exons are shown as black rectangles below the map. The locations of the 275- and 443-bp fragments are shown at the bottom. Reproduced with permission from Ref. 20.
Deletion experiments were performed to distinguish those protein-binding regions involved in the enhancer effect. Our data showed that sequences between positions +806 and +940 are essential for this enhancer activity (Fig. 7). This segment contains one large 97-bp footprint, whose sequence has been conserved between the human and mouse genes (27). We then showed that this footprint contains functional binding sites for at least three liver-enriched transcription factors (Fig. 8).
FIG. 7. Localization of enhancer activity within the 443-bp SmaI-PvuII fragment of intron 2. Deletions of the 443-bp SmaI-PvuII fragment (Xba 1 to Xba 3/8) were constructed as described previously (20) and inserted in both the forward (F) and reverse (R) orientations upstream of the apo-B promoter; the thick line in the center represents the 443-bp SmaI-PvuII fragment. (A-F) Binding sites for HepG2 nuclear proteins, as determined by DNase-I footprinting (20). The mean value of relative CAT activity in HepG2 and CaCo-2 cells is shown for the forward and reverse orientations of each deletion and the 443-bp fragment. Standard deviations are also shown for experiments reflecting more than two transfections. N, the number of times a particular construct was transfected; ND, not determined. Reproduced with permission from Ref. 20.
One of these proteins was identified as hepatocyte nuclear factor 1 (HNF-1), which binds with relatively low affinity to the 5' half of a 20-bp palindrome located at the 5' end of the 97-bp footprint. A binding site for C/EBP (or one of the related proteins that recognize similar sequences) was identified in the center of the 97-bp footprint (28). This binding site coincides or overlaps with the binding sites for five other proteins, two of which appear to be distinct from the C/EBP-related family of proteins (28). The binding site for a nuclear factor (designated protein I) is located between the HNF-1 and C/EBP binding sites. The function of protein I, if any, remains obscure. Finally, the 3'-most 15 bp of the footprinted sequence contains a binding site for another nuclear protein, which we have called protein II. Mutations that abolish the binding of either HNF-1, protein II, or the C/EBP-related proteins severely reduce enhancer activity. However, deletion experiments demonstrated that neither the HNF-1 binding site alone, nor the combination of binding sites for HNF-1, protein I, and C/EBP, nor the C/EBP binding site plus the protein II binding site is sufficient to enhance transcription from a strong apolipoprotein-B promoter (Fig. 9). Rather, HNF-1 and C/EBP act synergistically with protein II to enhance transcription of the apolipoprotein-B gene (28).
FIG. 8. Proposed arrangement of protein factors binding to the apo-B core enhancer. The thick black line represents the 97-bp footprint of the core enhancer. The binding sites for HNF-1, C/EBP, and protein II are shown [28].
C. The Third Intron Enhancer
Using methodology similar to that employed in the detection and analysis of the second intron enhancer, we uncovered another sequence with tissue-specific enhancer activity in intron 3 (29). This enhancer is weaker than the second intron enhancer. Deletion experiments identified a minimal or "core" enhancer, a 155-bp fragment extending from +1803 to +1958, that contains binding sites for three liver-enriched nuclear proteins and is flanked by two tissue-specific DH sites, DH-IV and DH-V (Fig. 10).
D. The Reducer
Similarly, there appears to be a negative regulatory region between positions -1802 and -3211 of the human apolipoprotein-B gene that reduces
expression of the gene (30). This "reducer" effect was detected in transient transfection experiments performed with hepatic (HepG2 and Hep3B) cells as well as intestinal (CaCo-2) cells. It appears to be specific for the apolipoprotein-B promoter because it does not affect expression from the thymidine kinase promoter.
FIG. 9. Deletion analysis of the apo-B core enhancer in HepG2 cells. The left side of the figure shows a map of the core enhancer (27) in which the location of the 97-bp footprint (E) is shaded. Locations of the binding sites for HNF-1, C/EBP, and proteins I and II are indicated. Below this are illustrated three deletion clones of the core enhancer. Protein I binds between the HNF-1 and C/EBP binding sites but it plays no functional role in the enhancer activity. Each deletion was cloned in either the forward (F) or reverse (R) orientation upstream of either the -898 or the -85 apo-B promoter CAT plasmids and transiently transfected into HepG2 cells. The results are presented as CAT activity relative to that of the promoter alone, whose activity was set to 1.0. S.D., standard deviation; N, number of independent transfections; *, the orientation of this clone was not determined. Reproduced with permission from Ref. 28.
This reducer segment operates either in the
presence or the absence of the transcriptional enhancers from the second and third introns of the apo-B gene, suggesting that the protein factors involved in the enhancer and reducer effects can interact with the transcriptional machinery independently of each other (29, 31). A summary of all of the tissue-specific regulatory elements of the human apo-B gene identified with cultured cells is shown in Fig. 11.
FIG. 10. Locations of DH sites and enhancer in intron 3. The upper panel shows the positions of the DH sites in CaCo-2 and HepG2 cells, with the roman numerals that designate the DH sites located below the arrows. The central horizontal line shows a restriction map of the area of interest, with exons 2 and 3 as white boxes and the third intron core enhancer as a black box. DH-IV and -V flank the third intron core enhancer. DH-I maps at the protein II binding site of the second intron core enhancer [29].
FIG. 11. Summary of regulatory elements of the human apo-B gene. The map marks the second and third intron enhancers, exons 1 to 4, and positions -5 kb, -3211, and -1802. Triangles, matrix anchorage regions.
III. Probing the in Vivo Function of the apo-B Gene Regulatory Elements
Having identified a number of control elements required for high-level, tissue-specific expression of the apo-B gene in cultured cells, we asked whether these regulatory regions also function to control transcription of the apo-B gene in transgenic animals. To answer this question, a number of constructs were designed in which one or more of these control regions were linked to the bacterial β-galactosidase (β-gal) reporter gene. We chose the β-gal gene as a reporter gene rather than the apo-B gene for several reasons. The human apo-B gene is very large (10, 17), and the smallest available minigene construct is 18 kb and lacks unique restriction sites required for making the proposed constructs (32). The β-gal gene is not expressed in most mammalian cells, thus providing a low background for detection of the expressed protein in mouse tissues. Furthermore, β-gal expression can easily be detected in tissue slices with a simple histological stain (33).
These constructs were used to generate transgenic mice, which were analyzed both for tissue specificity and for the relative level of transgene expression (34). Whenever possible, founder mice (F0) were bred to generate F1 mice. In a few cases the F1 progeny from a single founder exhibited differences in transgene copy number, presumably due to integration of the transgene at more than one chromosomal site within the germ line of the founder animal, followed by segregation in the F1 generation. (In cases where Southern-blot analysis or differences in transgene expression supported the idea that F1 mice from the same founder had different transgene integration sites, they were considered independent lines of mice.)
A. Expression of the Transgenic Constructs in Cell Culture
As a test for structural integrity, all of the constructs to be tested in transgenic animals (shown in Fig. 12) were transiently transfected into both the human hepatoma cell line HepG2 and the human intestinal carcinoma cell line CaCo-2. All constructs contained the apo-B promoter (spanning nucleotides -898 to +121), fused to a β-galactosidase expression cassette (30) containing the polyadenylation and splicing signals from simian virus 40 (SV40). Transcription from the apo-B promoter was quantitated by assaying β-galactosidase activity in extracts made from the transfected cells; the results are presented in Fig. 12. As previously described (18, 22), the construct that contained just the apo-B promoter linked to the β-gal gene (pβ-gal.P) was efficiently expressed in both HepG2 and CaCo-2 cells. Addition of sequences from +360 to +1064 of the apo-B gene (pβ-gal.PE), including the second intron enhancer, resulted in 7- and 5.3-fold increases in β-galactosidase activity in HepG2 and CaCo-2 cells, respectively. In pβ-gal2.7E, sequences spanning positions -2762 to -899 from the apo-B gene, containing the 5'-proximal MAR (17) and a negative regulatory region (30), were inserted upstream of the enhancer. The β-gal activity of this construct was reduced to half that of the pβ-gal.PE construct, probably due to the effect of the negative control region (-2738 to -1802). We then tested the promoter activity of pβ-gal5.2E, which is identical to pβ-gal2.7E except for the addition of sequences from -2738 to -5262, including the 5'-distal MAR. Addition of these sequences had only a minimal negative effect on promoter activity in the two cell lines, as compared to the pβ-gal2.7E construct. To test the role of the 3' MAR, a DNA fragment spanning nucleotides +43,104 to +44,329 and containing the 3' MAR was inserted downstream of the β-galactosidase cassette in pβ-gal5.2E to create pβ-gal5.2EM. The 3' MAR had a minor negative effect on promoter activity in transient assays. To evaluate the role of the second intron enhancer, the plasmid pβ-gal5.2M was created. This construct is identical to pβ-gal5.2EM except that it lacks the second intron enhancer. The removal of the enhancer resulted in reductions to one-seventh and one-third of promoter activity in HepG2 and CaCo-2 cells, respectively. Finally, to examine the function of the third intron enhancer, it was introduced together with intron 2 upstream of the promoter to create the pβ-gal5.2E3M construct (Fig. 12). Comparison of the activity of this construct to that of pβ-gal5.2EM showed that the third intron enhancer had a small positive effect on promoter activity in CaCo-2
cells. In HepG2 cells, on the other hand, promoter activity was stimulated about threefold, suggesting that additional hepatic enhancer elements are present within intron 3. These experiments indicate that all of the β-gal constructs are functional in transiently transfected cells of both hepatic and intestinal origin, and that the enhancer and reducer elements function as expected (34).
B. The Second Intron Enhancer Is Required for Expression in the Livers of Transgenic Mice
The 898-bp sequence of the apo-B gene immediately upstream of the transcriptional start site (the promoter) is sufficient to direct correct cell-type-specific transcription of a reporter gene when transiently transfected into cultured cells (Fig. 3). To test whether the promoter is sufficient for expression in mice, the construct pβ-gal.P (Fig. 12) was used to generate transgenic mice. Two founder mice were generated with this construct, one of which (F0 #2) transmitted the transgene to its offspring; analysis of RNA from various tissues of four F1 mice by ribonuclease protection showed that the transgene was not transcribed in any of 12 tissues analyzed (Table I). Analysis of the same RNA samples in parallel with a mouse apo-B probe indicated that the endogenous mouse apo-B mRNA is expressed in the liver and intestine and that it is intact (data not shown), thus eliminating RNA degradation as a reason for the absence of a signal from the transgene. The second F0, designated #6, failed to transmit the transgene to its offspring, presumably due to mosaicism. Analysis of RNA from six tissues of F0 #6 showed that the transgene was not transcribed in any tissue examined, despite the fact that the transgene was present in genomic DNA prepared from the liver and intestine at the time of sacrifice. Extensive analysis of the integrity and arrangement of the transgenes in the pβ-gal.P mice by Southern-blot analysis revealed that most copies of the transgene were intact and in a head-to-tail tandem array (data not shown). Taken together, our data suggest that the apo-B promoter alone may not contain sufficient information to initiate tissue-specific transcription in vivo.
FIG. 12. Transcriptional activity in HepG2 and CaCo-2 cells of constructs used to generate transgenic mice. Construct pβ-gal.P contains the apo-B promoter (-898 to +121) (shown as a cross-hatched rectangle) fused to a β-galactosidase expression cassette. Plasmid pβ-gal.PE is identical to pβ-gal.P except for the presence of the second intron enhancer, shown as a black rectangle. pβ-gal2.7E is similar to pβ-gal.PE, except for the addition of sequences from -2762 to -899 of the human apo-B gene, including the 5' proximal MAR (shown as a stippled box) cloned upstream of the enhancer. pβ-gal5.2E is identical to pβ-gal2.7E except for the addition of the sequences from -5262 to -2762 from the human apo-B gene that include the 5' distal MAR (shown as a stippled rectangle). pβ-gal5.2EM is identical to pβ-gal5.2E except for the insertion of the 3' MAR at its 3' end. pβ-gal5.2M is similar to pβ-gal5.2EM but lacks the second intron enhancer. pβ-gal5.2E3M is identical to pβ-gal5.2M except for the addition of sequences from +621 to +2805, containing both the second (black box) and third intron (checkered rectangle) enhancers. The right side of the figure shows the relative values of β-galactosidase activity obtained when the constructs shown on the left were transiently transfected into HepG2 and CaCo-2 cells. The data are expressed as β-galactosidase activity relative to that of the pβ-gal.P construct and represent the average of several independent transfections, the number of which are shown in parentheses. β-Galactosidase activities were corrected for differences in transfection efficiencies as determined by cotransfection with an apo-B-enhancer-CAT construct as described previously. Reproduced with permission from Ref. 34.
Relative β-gal activity
Construct       HepG2             CaCo-2
pβ-gal.P        1.0               1.0
pβ-gal.PE       7.0 ± 1.0 (4)     5.3 ± 0.95 (4)
pβ-gal2.7E      3.5 ± 0.13 (4)    2.3 ± 0.23 (4)
pβ-gal5.2E      3.0 ± 0.36 (4)    2.2 ± 0.14 (4)
pβ-gal5.2EM     2.2 ± 0.24 (4)    1.5 ± 0.14 (4)
pβ-gal5.2M      0.3 ± 0.02 (4)    0.5 ± 0.02 (4)
pβ-gal5.2E3M    6.3 (2)           2.2 (2)
TABLE I
QUANTITATION OF TRANSGENE COPY NUMBER AND EXPRESSION FOR THE pβ-gal.P AND pβ-gal5.2M TRANSGENIC MICE
Transgene     F0 number   F1 number(a)   Copy number(b)   Tissues analyzed(c)   Transgene expression
pβ-gal.P      2           27 (M)         20               12                    None
                          128 (M)        20               1                     None
                          129 (M)        20               6                     None
                          130 (M)        20               1                     None
              6 (F)       -              3                6                     None
pβ-gal5.2M    231         261 (M)        73               12                    None
                          264 (F)        73               2                     None
                          267 (F)        127              2                     None
              277         321 (M)        70               12                    None
              278         348 (F)        15               6                     None
              281         325 (M)        15               6                     None
                          332 (F)        15               6                     None
a Mice are identified individually by number, and the sex of each mouse analyzed for expression is indicated in parentheses following the mouse number.
b Quantitation of transgene copy number was achieved by radioanalytical imaging of the Southern-blot filters on an AMBIS scanner.
c The number of tissues from each mouse analyzed for expression of the transgene is indicated. In every case, RNA from the liver was analyzed for expression, and in cases where only two tissues were examined, the second tissue was the small intestine. When six tissues were examined, they were liver, duodenum, jejunum, ileum, kidney, and spleen. The remaining tissues analyzed were brain, salivary gland, pancreas, stomach, muscle, and heart.
Six founder animals were generated with the construct pβ-gal.PE, in which the second intron enhancer lies upstream of the promoter (Fig. 12 and Table II). Four of these six lines, #384, #452, #464, and #466, expressed the transgene in the liver, but not in any other tissue examined, at levels varying between 0.1 and 10% of the endogenous mouse apo-B mRNA levels (Table II). Southern-blot analysis, using probes B and C, of genomic DNA prepared from the liver and intestine of the six pβ-gal.PE founder animals at the time of sacrifice showed that the transgene was present in both tissues in all animals. Extensive Southern analysis showed that the transgenes were, by and large, in head-to-tail tandem arrays and that most copies of the transgenes were intact. Therefore, the lack of transgene expression exhibited by the pβ-gal.P mice and by the pβ-gal.PE mice #145, #66, and #69 cannot be attributed to rearrangement of the transgenes.
These results suggest that transcription from the apo-B promoter in the liver requires the presence of the second intron enhancer, because neither of the two lines of mice created with the promoter alone (pβ-gal.P) expressed the transgene. As described below, this conclusion was validated by results with construct pβ-gal5.2M, which contains 5262 bp of apo-B 5' flanking sequence, including the 5' MARs and the 3' MAR, but lacks the second intron enhancer. Four founder mice were generated with pβ-gal5.2M; the F1 progeny from one of these (#231) had two different copy numbers and therefore were considered to be distinct lines (Table I). None of the five lines of mice generated with the pβ-gal5.2M construct expressed the transgene in any of the tissues examined (Table I). The integrity and arrangement of the transgenes were examined by Southern-blot analysis of tail DNA from F1 mice representative of all five lines of pβ-gal5.2M (data not shown). From these experiments, we concluded that the transgene DNA was intact and in head-to-tail tandem array in these pβ-gal5.2M mice. Therefore, our inability to detect liver expression is attributable to the absence of the intron-2 enhancer.
Based on the results presented so far, one would predict that liver expression would be restored to such a construct by the addition of the second intron enhancer, as in pβ-gal5.2EM. To test this hypothesis, several founder mice were generated with pβ-gal5.2EM. Five of the eight lines of pβ-gal5.2EM mice expressed the transgene in the liver at levels varying between 3 and 120% of the endogenous apo-B mRNA levels (Table II). This contrasts with the results obtained with pβ-gal5.2M, where all five lines failed to express the transgene (Table I). These data demonstrate that the second intron enhancer is required for liver expression in transgenic mice.
The fact that pβ-gal.PE was also expressed in the liver of transgenic mice, although at lower levels, indicates that the promoter together with the second intron enhancer is sufficient to initiate low-level transcription in the liver.
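The line counts quoted in this section can be tallied in a few lines of code. The sketch below is only a bookkeeping aid, not part of the original analysis: the dictionary layout and the function name `tally` are illustrative choices, and the counts are those stated in the text (two P lines, five 5.2M lines, six PE lines, eight 5.2EM lines).

```python
# Counts of independent transgenic lines, as stated in the text and in Tables I and II.
# Keys are the short construct names used in the text (P, 5.2M, PE, 5.2EM).
LINES = {
    "P":     {"enhancer": False, "total": 2, "expressing": 0},
    "5.2M":  {"enhancer": False, "total": 5, "expressing": 0},
    "PE":    {"enhancer": True,  "total": 6, "expressing": 4},
    "5.2EM": {"enhancer": True,  "total": 8, "expressing": 5},
}


def tally(with_enhancer):
    """Return (expressing lines, total lines) for constructs with or without
    the second intron enhancer."""
    rows = [r for r in LINES.values() if r["enhancer"] == with_enhancer]
    return sum(r["expressing"] for r in rows), sum(r["total"] for r in rows)


if __name__ == "__main__":
    for flag in (False, True):
        expressing, total = tally(flag)
        label = "with" if flag else "without"
        print(f"{label} second intron enhancer: {expressing}/{total} lines express in liver")
```

Grouped this way, none of the seven lines lacking the enhancer expressed the transgene, whereas nine of the fourteen lines carrying it did.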
C. Tissue Specificity of Transgene Expression
The apo-B promoter and the second intron enhancer together are sufficient to direct expression of a reporter gene in the liver of transgenic mice. Expression was not detected in any of 11 other tissues examined, including the small intestine. The addition of sequences extending from -899 to -5262 and +43,104 to +44,329, as in pβ-gal5.2EM, had no effect on the tissue specificity of expression. In an attempt to localize the apo-B intestinal-specific sequences, we constructed pβ-gal5.2E3M, incorporating the third intron enhancer, together with the second intron enhancer upstream of the promoter, in a construct flanked by the 5' and 3' MARs (Fig. 1). Two independent transgenic lines were generated (#205 and #195) (Table II); these
TABLE II QUANTITATION OF TRANSGENE COPY NUMBER AND EXPRESSION LEVELS FOR TRANSGENIC MICE HARBORING pβ-gal.PE, pβ-gal2.7E, pβ-gal5.2E, pβ-gal5.2EM, AND pβ-gal5.2E3M CONSTRUCTS
Transgene
pp-gal. PE
pp-gal2.7E
Number of tissues analyzedc
Endogenous mouse liver apo-B (%)"
<1
12 6 6 12 4 4 4
None None None 1.2 9.6 0.16 2.4 3.3 ± 3.1 (4)
2 10
11 11
9.4 7.4 8.4(2)
184 (F)
9
6
3.2
185 (F) 188 (M) 189 (M)
4 4 4
3 6 6
51.0 19.8 20.0
159 (M) 163 (M)
1 1
6 3
<2.0e 0.7
165 (F) 166 (F) 167 (F) 171 (M)
16 16 16 16
3 1 1 3
None None None 0.68
170 (M)
52
6
None 15.9 ± 14.4 (6)
112 (M) 118 (F)
4 4
12 6
None None
88 (M) 125 (F) 124 (F)
10 10 10
6 6 1
120.0 52.0 102.0
105 (M)
18
12
20.0e
106 (M)
30
2
None
107 (M)
40
2
None
111 (F)
5
6
18.0
Animal
Number FIa
13 145 (F) 384 (F) 452 (M) 464 (M) 466 (F)
66 (M) 69 (F)
F"
139
141
44
46 pp-gal5.2EM 48
COPY
number6 33 33 6
7
1 12
441 (F) 480 (M)
136
Transgene
Animal Fo
Transgene
49
pβ-gal5.2E3M
F1a
Transgene COPY numberh
Number of tissues analyzedc
Endogenous mouse liver apo-B (%)d
97 (F) 99 (F) 100 (M)
6 6 6
2 2 6
16.4 3.0 44.0
1
2
Number
10.0
45.7 ± 34.3 (8)
205
237 (F) 243 (M)
12 12
12 6
8.0 2.6
195
249 (F) 290 (M)
11 11
11
12
118.4 106.0 58.8 ± 53.5 (4)
a Mice are identified individually by number, and the sex of each mouse analyzed for expression is indicated in parentheses following the mouse number.
b Quantitation of transgene copy number was achieved by radioanalytical imaging of the Southern-blot filters on an AMBIS scanner.
c The number of tissues from each mouse analyzed for expression of the transgene is indicated. In every case, RNA from the liver was analyzed for expression, and in cases where only two tissues were examined, the second tissue was the small intestine. When six tissues were examined, they were liver, duodenum, jejunum, ileum, kidney, and spleen. The remaining tissues analyzed were brain, salivary gland, pancreas, stomach, muscle, and heart.
d The expression levels are presented as hepatic transgene RNA levels relative to that of the endogenous mouse liver apo-B mRNA, and take into account the differences in the specific activities of the two ribonuclease protection probes. The numbers beneath the last column are the mean ± the standard deviation for each of the constructs, with the number of data points shown in parentheses. Mice that did not express the transgene, or for which the level of expression was not quantitated accurately, were not included in the calculation of the mean.
e These figures are approximate.
expressed the transgene only in the liver, suggesting that the intestinal elements required for apo-B transcription in vivo are not localized within intron-3.
D. The Nuclear Matrix Attachment Sites May Play a Role in Transgene Expression
The results obtained with pβ-gal.PE show that the apo-B gene MARs are not absolutely required for expression in the liver in vivo. It has been suggested that MARs may act as boundary elements within chromatin, thereby protecting the genes they surround from position effects (35). The human apo-B gene contains three MARs: the 5' distal, the 5' proximal, and the 3' MAR.
First, we asked whether inclusion of the 5'-proximal MAR and neighboring sequences upstream of the promoter-enhancer vector, as in pβ-gal2.7E, would have an impact on the expression of the transgenes. Two transgenic lines were generated with this construct, and both expressed the transgene at levels higher than those observed in the absence of the MAR (Table II). We then tested the effect of both the 5'-distal and the 5'-proximal MARs on expression levels in the construct pβ-gal5.2E. Three founders were generated with this construct (#137, #130, and #141), which gave rise to F1 mice with a total of five different copy numbers (Table II). Of these five lines, four expressed the transgene in the liver but not in any other tissue, and one failed to express the transgene in any tissue (#170). In all of the pβ-gal5.2E mice, the transgenes were intact and in head-to-tail tandem arrays. These data show that, in the presence of both 5' MARs, the transgenes are still subject to site-of-integration effects.
The presence of both 5' MARs and the 3' MAR in the construct pβ-gal5.2EM did not guarantee expression of the transgene. Of eight lines carrying different copy numbers of this construct, three failed to express the transgene (Table II). Extensive Southern-blot analysis demonstrated that the lack of expression exhibited by these mice was not due to rearrangement of the transgenes.
The steady-state level of transgene RNA in the liver of the expressing mice was quantitated using a sensitive ribonuclease protection assay (Fig. 13). To control for RNA degradation and to correct for differences in the amount of RNA analyzed, the results were normalized to the level of endogenous mouse apo-B mRNA; the results are presented in Table II. The level of transgene expression varied considerably, even between sibling F1 mice with the same copy number.
When the level of transgene RNA was corrected for the copy number of the transgene, there was little evidence of copy-number-dependent expression for any of the constructs (data not shown). A good example of this is provided by the pβ-gal5.2E3M mice. The two lines generated with this construct had similar copy numbers but expressed the transgene at very different levels: line #195 exhibited high-level expression, whereas line #205 exhibited low-level expression (Table II). Thus, even when the transgene was flanked by the 5' and 3' MARs, as in pβ-gal5.2E3M, expression of the transgene was not copy-number dependent. This suggests that sequences outside the domain examined are required for the elimination of position effects.
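The copy-number correction described above amounts to dividing each mouse's hepatic expression level by its transgene copy number. The following is a minimal sketch of that arithmetic, assuming the values for the two 5.2E3M lines (#205 and #195) as read from Table II; the function name and data layout are illustrative and are not taken from the original study.

```python
# Values for the two 5.2E3M lines, as read from Table II:
# (F0 line, F1 mouse, transgene copy number, hepatic expression as % of endogenous apo-B mRNA)
MICE = [
    ("205", "237", 12, 8.0),
    ("205", "243", 12, 2.6),
    ("195", "249", 11, 118.4),
    ("195", "290", 11, 106.0),
]


def expression_per_copy(copy_number, percent_of_endogenous):
    """Expression normalized to transgene copy number (% of endogenous per copy)."""
    return percent_of_endogenous / copy_number


# Per-copy expression for each F1 mouse, keyed by mouse number.
per_copy = {f1: expression_per_copy(copies, expr) for _, f1, copies, expr in MICE}

# Construct mean over expressing mice, the summary statistic described in the Table II footnote.
mean_expression = sum(expr for *_, expr in MICE) / len(MICE)

if __name__ == "__main__":
    for line, f1, copies, expr in MICE:
        print(f"line #{line}, mouse #{f1}: {copies} copies, "
              f"{expr}% of endogenous, {per_copy[f1]:.2f}% per copy")
    print(f"construct mean: {mean_expression:.2f}%")
```

With nearly identical copy numbers (11 versus 12), the per-copy values for the two lines still differ by more than an order of magnitude, which is what the absence of copy-number dependence means in practice.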
E. DNase-I Hypersensitivity of the Integrated Transgenes
The 5' end of the human apo-B gene is enriched in DNase-I-hypersensitive (DH) sites. Several DH sites have been mapped to the proximal promoter region (18), introns 2 (18, 29) and 3 (29), and to the 5' distal region of the gene (B. Levy-Wilson, unpublished results). It was of interest to ascertain whether any of these previously identified tissue-specific DH sites are created in the livers of expressing (or nonexpressing) mice carrying the human apo-B-driven β-gal constructs. Several transgenic mice were examined for the presence of apo-B gene DH sites. We searched mainly in the promoter and the second and third intron enhancer regions. The restriction enzymes and the probes used in these experiments yielded no background bands with nontransgenic mice that were examined in parallel. In expressing mice harboring the 5.2EM construct, such as line #124 shown in Fig. 14A, we observed one strong hypersensitive region in the liver that was absent in the spleen. The transgenes were expressed in the liver but not in the spleen (Table II). We designated this rather broad region DH4, because it overlaps the previously determined DH4 region in HepG2 and CaCo-2 cells (18). As shown in the map at the bottom of Fig. 14, DH4 maps within the second intron enhancer, in a segment of DNA enriched in transcription factor binding sites and highly conserved in evolution (18). This same hypersensitive site was detected in liver from mice expressing constructs 2.7E (#480) (Fig. 14B) and 5.2E3M (Fig. 14C), suggesting that formation of this hypersensitive site may be a prerequisite for expression. That this may indeed be the case is supported by the fact that nonexpressing mice carrying the 5.2E construct (which shares a similar structure and differs only slightly from the 2.7E and 5.2EM constructs) fail to exhibit a band in the DH4 position (Fig. 14B). Furthermore, the fact that DH4 is not observed in the spleens of animals carrying these constructs also supports the idea that DH4 may be required for expression.
FIG. 13. Analysis of transgene RNA expression. Representative results of RNase protection experiments from mice carrying the transgenes pβ-gal5.2M (A), pβ-gal5.2EM (B), pβ-gal5.2E and pβ-gal2.7E (C), and pβ-gal5.2E3M (D) are shown. In each case, 15 µg of total cellular RNA from the tissues indicated above each lane was analyzed in parallel with a human apo-B probe and a mouse apo-B probe in separate assays. RNA from HepG2 cells and transfer RNA (tRNA) were used as positive and negative controls, respectively. The number in parentheses next to the construct name identifies the mouse. Reproduced with permission from Ref. 34.
A second area of hypersensitivity was detected within the third intron enhancer in mice expressing 5.2E3M transgenes (Fig. 14C). This DH site, termed DH, maps between DH sites IV and V, found in nuclei from HepG2 and CaCo-2 cells (29), within the third intron core enhancer. The transgene DH site is in close proximity to, or overlaps, three DNase-I footprints that are enriched in binding sites for tissue-specific transcription factors (29). Whether formation of the intron 3 DH site plays an important role in transgene expression remains to be determined.
FIG. 14. DNase-I hypersensitivity of the integrated transgenes. Livers and spleens were removed from various transgenic mice. Nuclei were prepared and incubated with increasing concentrations of DNase I followed by digestion with KpnI and analysis as described (34). (A) The representative results for a mouse (#124) carrying the construct pβ-gal5.2EM; (B) the data for a pβ-gal2.7E transgene (mouse #480); (C) the data for a pβ-gal5.2E3M mouse (#297). The amount of DNase I used in each sample is indicated for each lane on top of the gels; M reflects λ-HindIII DNA fragments that were coelectrophoresed with the samples and used as size markers; the probe used is indicated below the autoradiograms. KpnI indicates the position on the gels of the initial KpnI fragment. The terms DH4 and DH are used to illustrate the hypersensitive sites. At the bottom of panels A, B, and C are maps of the relevant region of the transgenes with the apo-B promoter depicted as a black box, the second intron enhancer as a stippled oval, and the third intron enhancer as a rectangle with diagonal lines. DH4 and DH show the locations of the hypersensitive regions. Reproduced with permission from Ref. 34.
IV. Concluding Remarks
Much effort has been put into expressing a wide variety of genes in transgenic mice, not only to understand the physiologic role of those proteins and the effects of their overexpression in animal models, but also to identify and characterize the DNA sequence elements required for expression. Although in some cases transgene expression has been achieved by merely introducing a cDNA or minigene linked to a strong tissue-specific promoter (36, 37), in most cases it appears that correctly regulated, high-level, cell-specific gene expression in transgenic animals is not easy to achieve except with large constructs encompassing many kilobases of upstream and downstream flanking DNA (38, 39). It is clear that the DNA sequences important for conferring high-level, correctly regulated, copy-number-dependent expression must lie somewhere within those large constructs, but the exact nature of the various key sequences remains obscure.
An exception to this is provided by the globin genes. The pioneering work of Grosveld and collaborators has led to the identification of a unique class of regulatory elements termed locus control regions (LCRs) (38). These elements correspond to groups of DNase-I-hypersensitive sites that are required for position-independent, copy-number-dependent expression of all the genes within the globin locus, acting over great distances. LCRs consist of binding sites for both ubiquitous and tissue-specific transcription factors, and function as enhancers in transient assays (40). As yet, it is unclear how LCRs differ from the classical enhancer elements found closer to the transcription start sites of many genes. A similar element, termed the liver control region, is required for liver expression of the linked human apolipoprotein E (apo-E) and apo-CI genes (41).
For most other genes, the requirements for accurate expression in vivo vary widely. Some genes, like the human cholesterol ester transfer protein gene, require only promoter and proximal upstream sequences (37), whereas others, like the mouse albumin gene, require sequences from -8.5 to -10.4 kb upstream of the promoter (42). The human adenosine deaminase gene requires sequences from the first intron of the gene (43).
The apparent discrepancy seen in most systems between the functions of these regulatory elements in cell culture versus transgenic animals almost certainly reflects the effect of integration of the transgene into the host chromatin. Any model of gene activation must include the transition to a more open chromatin structure that allows trans-acting protein factors and RNA polymerase to gain access to the DNA. This initial event in gene activation may involve the binding of transcription factor(s) to specialized elements during DNA replication, when the DNA is transiently free of nucleosomes. Thus, the lack of expression in transgenic animals may result from a requirement for additional positive elements involved in promoting decondensation of the chromatin structure at the gene locus in question.
It has been proposed that transcription units are organized into looped domains in the chromatin of eukaryotic nuclei, each domain corresponding to a functional unit containing all of the regulatory elements required for the correct expression of the gene(s) within that domain. Such an arrangement would also provide a mechanism for isolating transcription units from the influence of nearby enhancers and silencers, some of which can affect transcription over distances as great as 50 kb (38). Nuclear DNA is tightly associated with a proteinaceous structure, referred to as the nuclear matrix, by specific DNA elements termed scaffold attachment regions (SARs) (44) or matrix attachment regions (MARs) (45). It has been proposed that the latter form the bases of these looped domains (44). Whether MARs are also able to act as boundary elements to protect genes from the effects of neighboring enhancers and silencers is unclear. However, there is experimental evidence that MARs can protect transgenes from position effects exerted by nearby sequences. For example, a construct consisting of the complete functional domain of the chicken lysozyme gene as defined by DNase-I sensitivity, and including the flanking 5' and 3' MARs, is correctly transcribed in transgenic mice in a manner that is both copy-number dependent and position independent (46). Furthermore, the lysozyme gene MARs confer correctly regulated, position-independent expression upon a heterologous gene, namely, the whey acidic protein transgene, in mouse mammary tissue (35).
The human apo-B gene is similar to the chicken lysozyme gene in that, for both genes, the structural domain of the gene as represented by the locations of the 5' and 3' MARs coincides with the functional domain of the gene, represented by the DNase-I-sensitive domain. Thus, one would predict that inclusion of all key apo-B control sequences located within this domain may suffice to yield high-level, position-independent expression in transgenic mice.
The results presented herein demonstrate that, unlike the situation in cultured cells, the promoter alone is not sufficient to direct expression of the
apo-B gene in transgenic mice; inclusion of the second intron enhancer upstream of the promoter (pβ-gal.PE) is sufficient for low-level expression of the transgenes, but only in the liver. Nevertheless, expression was not copy-number dependent, perhaps reflecting the exposure of this construct to strong position effects at the site of integration. Addition of one or both of the 5' MARs to the 5' end of the promoter-second intron enhancer construct (as in pβ-gal2.7E and pβ-gal5.2E) appeared to increase the level of hepatic expression of the transgenes (Table II). This contrasts with the effects of these sequences in transient assays, where expression levels were significantly decreased (Fig. 12). Thus, the positive effects of MARs seen in transgenic mice may depend on the details of integration into the host chromatin. Some lines of pβ-gal5.2E transgenic mice, like F0 #136, expressed the transgene at levels as high as 50% of the endogenous mouse apo-B mRNA levels, whereas others, for example, F0 #141, expressed the transgene poorly. Therefore, the MARs clearly do not serve as boundary insulators for the transgenes in these animals. Similarly, inclusion of the 3' MAR at the 3' end of the construct together with the 5' MARs at the 5' end of the construct (as in 5.2EM) did not guarantee expression, although in at least one line (F0 #46), levels comparable to those seen for the endogenous apo-B message were observed. On a quantitative level, the expression of pβ-gal5.2EM in transgenic mice contrasts markedly with the cell culture results. In transiently transfected cells the expression of 5.2EM was one-third that of PE, whereas in transgenic mice the average expression level of 5.2EM was almost 14-fold that of PE. Thus, these sequences behave differently in vivo than they do in transient assays. This is similar to that reported for the chicken lysozyme gene MARs (46).
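The "almost 14-fold" figure can be reproduced from the construct means printed in Table II. The short sketch below assumes those means (3.3% of endogenous apo-B mRNA for PE and 45.7% for 5.2EM) and the one-third ratio quoted for transient transfection; the variable names are illustrative only.

```python
# Construct means (% of endogenous mouse liver apo-B mRNA), as printed in Table II.
MEAN_PE = 3.3       # promoter + second intron enhancer
MEAN_5_2EM = 45.7   # same, flanked by the 5' MARs and the 3' MAR

# Ratio quoted in the text for transiently transfected cells (5.2EM relative to PE).
RATIO_TRANSIENT = 1 / 3

# Ratio of the same two constructs in transgenic mouse liver.
ratio_in_vivo = MEAN_5_2EM / MEAN_PE

# How far the relative ranking of the two constructs shifts between the two assay systems.
shift = ratio_in_vivo / RATIO_TRANSIENT

if __name__ == "__main__":
    print(f"5.2EM / PE in transgenic mice: {ratio_in_vivo:.1f}-fold")
    print(f"5.2EM / PE in transient assays: {RATIO_TRANSIENT:.2f}-fold")
    print(f"relative shift between systems: about {shift:.0f}-fold")
```

The in vivo ratio comes out just under 14, consistent with the statement in the text, and the relative standing of the two constructs differs between the two systems by roughly 40-fold.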
From these observations, we conclude that the 5' and 3' MARs of apo-B are not sufficient to insulate the associated transgenes consistently from position effects at the site of integration, although it appears that constructs harboring one or more MARs are generally expressed at higher levels than those lacking MARs. These results are reminiscent of those of Blazquez et al. (47), who used mouse plasmacytoma cells to examine the role of the immunoglobulin κ gene intronic MAR in the expression of integrated, exogenously added mouse κ genes. They found that deletion of the intronic MAR caused a decrease by one-fourth in expression levels of the integrated κ genes, thus supporting the notion that MARs play a quantitative role in gene expression in vivo.
The results with constructs 2.7E, 5.2E, 5.2EM, and 5.2E3M suggest that the previously characterized negative regulatory region (30) present in all these constructs (-3678 to -1802) may not repress hepatic transcription of the gene in vivo, because levels of expression seen with these constructs
were higher than those seen with the PE construct, which lacks the negative control region. Of course, one could argue that the presence of the MARs in those constructs may serve to counteract the effects of the negative control region. To resolve this question, we are preparing constructs that lack the negative control region but retain the MARs.
The lack of intestinal expression of all the constructs tested clearly implies that the apo-B intestinal-specific regulatory elements are separate and distinct from the hepatic-specific control elements. The locations of the intestinal elements remain unknown.
Our studies of the DNase-I hypersensitivity of the integrated transgenes are very informative. Several tissue-specific DH sites had been observed previously in cell culture studies, within the 5' end of the apo-B gene (18, 29); one such DH site, DH4, localized within the strong intron-2 enhancer (between +700 and +1000 of the gene), is evident in expressing human transgenes that carry intron 2 (2.7E, 5.2EM, 5.2E3M) (Fig. 14). The presence of this DH site in liver nuclei appears to be associated with the expression of the transgenes, because DH4 was not detected in liver nuclei from two mice from nonexpressor 5.2E lines analyzed in this manner (Fig. 14). The original DH4 detected in nuclei from cultured HepG2 and CaCo-2 cells was centered at position +760 (18), in a DNA segment from the second intron enhancer containing a putative binding site for the ubiquitous transcription factor NF-1 (20). Furthermore, this segment between +750 and +770 of the apo-B gene (footprint C in 20) exhibited an 18- out of 25-bp match with a segment of the mouse immunoglobulin Jκ-Cκ intron enhancer region. The DNA region contains a 130-bp segment whose sequence is highly conserved among mouse, rabbit, and humans (48) and that is hypersensitive to DNase I (49, 50).
Furthermore, within the 25-bp homologous region between the apo-B gene enhancer and the mouse immunoglobulin enhancer, there are 10 bp that are homologous to the polyoma virus enhancer (48). Thus, these sequences of the apo-B gene within DH4 may reflect minimum sequence requirements for a generalized enhancer function. The DH4 band seen in the transgenic mouse liver nuclei is broad (approximately +700 to +1000) and encompasses, in addition to the aforementioned region, the whole of the second intron "core enhancer." The latter is a 147-bp segment (+806 to +952) containing a large 97-bp DNase-I footprint with functional binding sites for the transcription factors HNF-1, C/EBP, and protein II (28). The core enhancer exhibits some 75% of the total enhancer activity. Functional studies demonstrate that HNF-1, C/EBP, and protein II act synergistically within the core enhancer to enhance transcription from the apo-B promoter (Fig. 9). The binding site for protein II maps at the center of another tissue-specific, evolutionarily conserved DNase-I-hypersensitive site, DHI (29). Therefore, it appears that in the liver of mice expressing apo-B constructs, a broad hypersensitive region (DH4) is formed that includes the tissue-specific DH4 and DHI sites previously seen within the second intron enhancer in studies with cultured cells. Transgenes containing intron 3 exhibit another DH site, labeled DH in Fig. 14C. This site mapped between two previously described DH sites, DH-IV and DH-V (Fig. 10) (29). It is likely that this segment within the third intron is important in hepatic expression in vivo.
Despite our efforts to reproduce the apo-B gene chromosomal domain in the 5.2EM and 5.2E3M β-gal constructs, we have not totally eliminated position effects at the site of integration, suggesting that additional elements, located further 5' or 3' of the apo-B regulatory domain examined here, may be involved. Future studies with larger chromosomal apo-B fragments harboring large 5' and 3' extensions (50-100 kb) will ultimately be required to answer these questions. In two recent reports (51, 52), apo-B genomic constructs over 80 kb long have been used to generate transgenic mice. Mice harboring the human transgenes express the apo-B protein at varying levels in plasma, with some mice expressing the protein at very high levels. Nevertheless, it is still not known whether the mRNA levels parallel the protein levels, and whether expression of the apo-B mRNA is copy-number dependent. Because not all founder mice carrying the transgene express it, position effects may still play a role in transgene expression. Furthermore, because in these mice expression has been detected only in the liver, the location of intestine-specific control elements remains to be elucidated. The availability of even larger genomic clones in yeast artificial chromosomes (39) should permit identification of elements required for copy-number-dependent, site-of-integration-independent expression as well as the intestinal control elements.
ACKNOWLEDGMENTS
I acknowledge Alan Brooks, Bernhard Paulweber, and Brian Blackhart, together with Brian Nagy and Craig Fortier, for their outstanding contributions to this work. Special thanks to Brian J. McCarthy for his comments on this article and Pamela J. Hogan for typing the manuscript. This research was supported by funds provided by the Cigarette and Tobacco Surtax Fund of the State of California through the Tobacco-Related Disease Research Program of the University of California, Grant Number 4RT-0308A.
Abbreviations
CAT      chloramphenicol acetyltransferase
apo-B    apolipoprotein-B
β-gal    β-galactosidase
MAR      matrix attachment site
DH       DNase-I hypersensitive
MH       micrococcal nuclease hypersensitive
LCR      locus control region
SAR      scaffold-associated region
HNF      hepatocyte nuclear factor
NF       nuclear factor
EBP      enhancer binding protein
REFERENCES
1. J. L. Goldstein and M. S. Brown, ARB 46, 897 (1977).
2. J. P. Kane, Annu. Rev. Physiol. 45, 637 (1983).
3. S. G. Young, Circulation 82, 1574 (1990).
4. J. D. Brunzell, A. D. Sniderman, J. J. Albers and P. O. Kwiterovich, Jr., Arteriosclerosis 4, 79 (1984).
5. A. Sniderman, S. Shapiro, D. Marpole, B. Skinner, B. Teng and P. O. Kwiterovich, Jr., PNAS 77, 604 (1980).
6. T. J. Knott, R. J. Pease, L. M. Powell, S. C. Wallis, S. C. Rall, Jr., T. L. Innerarity, B. Blackhart, W. H. Taylor, Y. Marcel, R. Milne, D. Johnson, M. Fuller, A. J. Lusis, B. J. McCarthy, R. W. Mahley, B. Levy-Wilson and J. Scott, Nature 323, 734 (1986).
7. T. J. Knott, S. C. Rall, Jr., T. L. Innerarity, S. F. Jacobson, M. S. Urdea, B. Levy-Wilson, L. M. Powell, R. J. Pease, R. Eddy, H. Nakai, M. Byers, L. M. Priestley, E. Robertson, L. B. Rall, C. Betsholtz, T. B. Shows, R. W. Mahley and J. Scott, Science 230, 37 (1985).
8. L. A. Demmer, M. S. Levin, J. Elovson, M. A. Reuben, A. J. Lusis and J. I. Gordon, PNAS 83, 8102 (1986).
9. E. H. Ludwig, B. D. Blackhart, V. R. Pierotti, L. Caiati, C. Fortier, T. Knott, J. Scott, R. W. Mahley, B. Levy-Wilson and B. J. McCarthy, DNA (N.Y.) 6, 363 (1987).
10. B. D. Blackhart, E. H. Ludwig, V. R. Pierotti, L. Caiati, M. A. Onasch, S. C. Wallis, L. Powell, R. Pease, T. J. Knott, M.-L. Chu, R. W. Mahley, J. Scott, B. J. McCarthy and B. Levy-Wilson, JBC 261, 15364 (1986).
11. L. F. Soria, E. H. Ludwig, H. R. G. Clark, G. L. Vega, S. M. Grundy and B. J. McCarthy, PNAS 86, 587 (1989).
12. R. Berezney and D. Coffey, BBRC 60, 1410 (1974).
13. J. R. Paulson and U. K. Laemmli, Cell 12, 817-828 (1977).
14. P. Cook and I. Brazell, J. Cell Sci. 22, 287 (1976).
15. C. Benyajati and A. Worcel, Cell 9, 393 (1976).
16. T. Igo-Kemenes and H. G. Zachau, CSHSQB 42, 109 (1977).
17. B. Levy-Wilson and C. Fortier, JBC 264, 21196 (1989).
18. B. Levy-Wilson, C. Fortier, B. D. Blackhart and B. J. McCarthy, MCBiol. 8, 71 (1988).
19. B. Paulweber, F. Sandhofer and B. Levy-Wilson, MCBiol. 13, 1534-1546 (1993).
20. A. R. Brooks, B. D. Blackhart, K. Haubold and B. Levy-Wilson, JBC 266, 7848 (1991).
21. B. Levy-Wilson and C. Fortier, JBC 264, 9891 (1989).
22. B. Paulweber, M. A. Onasch, B. P. Nagy and B. Levy-Wilson, JBC 266, 24149 (1991).
23. H. K. Das, T. Leff and J. L. Breslow, JBC 263, 11452 (1988).
24. D. Kardassis, M. Hadzopoulou-Cladaras, D. P. Ramji, R. Cortese, V. I. Zannis and C. Cladaras, MCBiol. 10, 2653 (1990).
25. J. A. A. Ladias, M. Hadzopoulou-Cladaras, D. Kardassis, P. Cardot, J. Cheng, V. Zannis and C. Cladaras, JBC 267, 15849 (1992).
26. S. Metzger, T. Leff and J. L. Breslow, JBC 265, 9978 (1990).
27. E. H. Ludwig, B. Levy-Wilson, T. Knott, V. D. Blackhart and B. J. McCarthy, Cell Biol. 10, 329 (1991).
28. A. R. Brooks and B. Levy-Wilson, MCBiol. 12, 1134 (1992).
29. B. Levy-Wilson, B. Paulweber, B. P. Nagy, E. H. Ludwig and A. R. Brooks, JBC 267, 18735 (1992).
30. B. Paulweber, A. R. Brooks, B. P. Nagy and B. Levy-Wilson, JBC 266, 21956 (1991).
31. B. Paulweber and B. Levy-Wilson, JBC 266, 24161 (1991).
32. B. D. Blackhart, Z. Yao and B. J. McCarthy, JBC 265, 8358 (1990).
33. D. R. Goring, J. Rossant, S. Clapoff, M. L. Breitman and L. C. Tsui, Science 235, 456 (1987).
34. A. R. Brooks, B. P. Nagy, S. Taylor, W. S. Simonet, J. M. Taylor and B. Levy-Wilson, MCBiol. 14 (1994). In press.
35. R. A. McKnight, A. Shamay, L. Sankaran, R. J. Wall and L. Hennighausen, PNAS 89, 6943 (1992).
36. S. L. Hofmann, D. W. Russell, M. S. Brown, J. L. Goldstein and R. E. Hammer, Science 239, 1277 (1988).
37. X. C. Jiang, L. B. Agellon, A. Walsh, J. L. Breslow and A. Tall, J. Clin. Invest. 90, 1290 (1992).
38. F. Grosveld, G. B. van Assendelft, D. R. Greaves and G. Kollias, Cell 51, 975 (1987).
39. A. Schedl, L. Montoliu, G. Kelsey and G. Schütz, Nature 362, 258 (1993).
40. P. Fraser, S. Pruzina, M. Antoniou and F. Grosveld, Genes Dev. 7, 106 (1993).
41. W. S. Simonet, N. Bucay, S. J. Lauer and J. M. Taylor, JBC 268, 8221 (1993).
42. C. A. Pinkert, D. M. Ornitz, R. L. Brinster and R. D. Palmiter, Genes Dev. 1, 268 (1987).
43. B. J. Aronow, R. N. Silbiger, M. R. Dusing, J. L. Stock, K. L. Yager, S. S. Potter, J. J. Hutton and D. A. Wiginton, MCBiol. 12, 4170 (1992).
44. J. Mirkovitch, M.-E. Mirault and U. K. Laemmli, Cell 39, 223 (1984).
45. P. N. Cockerill and W. T. Garrard, Cell 44, 273 (1986).
46. C. Bonifer, M. Vidal, F. Grosveld and A. E. Sippel, EMBO J. 9, 2843 (1990).
47. V. C. Blasquez, M. Xu, S. C. Moses and W. T. Garrard, JBC 264, 21183 (1989).
48. L. Emorine, M. Kuehl, L. Weir, P. Leder and E. E. Max, Nature 304, 447 (1983).
49. T. G. Parslow and D. K. Granner, Nature 299, 449 (1982).
50. T. G. Parslow and D. K. Granner, NARes. 11, 4775 (1983).
51. M. F. Linton, R. V. Farese, Jr., G. Chiesa, D. S. Grass, P. Chin, R. E. Hammer, H. H. Hobbs and S. G. Young, J. Clin. Invest. 92, 3029 (1993).
52. M. J. Callow, L. J. Stoltzfus, R. M. Lawn and E. M. Rubin, PNAS 91, 2130 (1994).
Early Growth Response Protein 1 (Egr-1): Prototype of a Zinc-finger Family of Transcription Factors

ANDREA GASHLER¹ AND VIKAS P. SUKHATME²

Department of Medicine
Beth Israel Hospital and Harvard Medical School
Boston, Massachusetts 02215
I. Overview of Immediate-early Genes
II. Identification of Egr-1 cDNA by Differential Screening
III. Egr-1 Is Expressed in Response to Diverse Stimuli
   A. Induction by Mitogens
   B. Induction during Development and Differentiation
   C. Induction by Tissue or Radiation Injury
   D. Induction in Neuronal Signaling
IV. Proximal Events
   A. Second Messengers
   B. Egr-1 Promoter Analysis
V. Distal Events
   A. Characterization of the Egr-1 Protein Product
   B. DNA-binding Activity of Egr-1
   C. Structure-Function Analysis
   D. Targets of Egr-1 Regulation
VI. In Vivo Role of Egr-1
VII. Egr-1 Is Part of a Gene Family, Including the Wilms Tumor Suppressor Gene WT1
VIII. Conclusion and Future Perspectives
References
I. Overview of Immediate-early Genes

Extracellular signals in the form of soluble factors, matrix proteins, and adhesion molecules influence the proliferation and differentiation of eukaryotic cells. These long-term responses, mediated by changes in gene expression, are coupled to biochemical events occurring in the plasma membrane and cytosol that follow ligand-receptor interactions or other changes in the extracellular milieu. The so-called immediate-early genes are the earliest downstream nuclear targets for these events. These genes are, by definition, induced in the absence of de novo protein synthesis. In particular, a subclass of these genes encodes transcription factors, and these products form the first step in a cascade of gene-protein interactions. Thus, immediate-early transcription factor genes serve as nuclear couplers of early cytoplasmic events to long-term alterations in gene expression. At present, the best characterized members of this group include c-fos, c-jun, and Egr-1. In turn, each of these genes is a prototype for a family of closely related proteins.

This review focuses on the Egr gene family and its most extensively characterized member, Egr-1, first identified as an immediate-early gene responsive to growth factors and various differentiation cues and later confirmed to encode a transcriptional regulatory protein. Other reviews have focused on changes in gene expression during the cell cycle (1) and transcriptional responses to extracellular signals (2-4). As a group, immediate-early transcription factors have provided important insights into how cellular responses to diverse extracellular signals are mediated.

¹ Present address: Ligand Pharmaceuticals, San Diego, CA 92121.
² To whom correspondence may be addressed.

Progress in Nucleic Acid Research and Molecular Biology, Vol. 50
Copyright © 1995 by Academic Press, Inc.
All rights of reproduction in any form reserved.
II. Identification of Egr-1 cDNA by Differential Screening
One approach to identifying novel genes that play key roles in cellular growth control is to focus on transcripts whose expression is low in nondividing cells but is rapidly up-regulated in cells stimulated by mitogen. Using c-fos as a model of immediate-early gene induction, several groups used similar differential screening strategies to isolate novel genes induced without intervening protein synthesis. Specifically, the following criteria were applied in our screen for important regulators of the G0/G1 transition: (1) transcripts should be induced rapidly by serum stimulation of quiescent fibroblasts; (2) the mitogenic induction should not be affected by inhibitors of protein synthesis, such as cycloheximide; (3) expression should be induced by a spectrum of mitogens in a wide variety of cell types; and (4) the genes should be highly conserved in evolution (5, 6).

In particular, we pursued differential screening of a library from BALB/c 3T3 cells stimulated for 3 hours with serum in the presence of cycloheximide. Clones were identified that hybridized preferentially to cDNA from serum- and cycloheximide-treated fibroblasts as compared to cDNA from quiescent cells. The immediate-early gene c-fos was reisolated by this protocol. In addition, mitogenic stimulation of a variety of cell types from different species induced a 3.4-kb transcript. This novel immediate-early gene, designated Egr-1 (5-7), has been independently cloned by similar differential screening strategies by a number of groups: NGFI-A was isolated as a nerve growth factor-inducible transcript in rat pheochromocytoma PC12 cells (8); zif268 was cloned from serum-stimulated BALB/c 3T3 fibroblasts (9); tis8 was identified as a phorbol-inducible gene in 3T3 cells (10); the chicken homolog, cef5, was cloned as a v-src-inducible gene from chicken embryo fibroblasts (11); and gene 225 was identified as a T-cell-activated transcript (12). Through hybridization to a highly conserved domain of the Drosophila factor Krüppel, Krox24 was isolated from serum-stimulated 3T3 cells (13).
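The four screening criteria can be read as a simple filter over candidate transcripts. The following is a minimal sketch of that logic only: the `Transcript` record, its field names, the thresholds, and the three example clones are all hypothetical and are not taken from the screen itself.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical summary of one candidate clone (all fields are assumptions)."""
    name: str
    fold_serum: float          # signal in serum-stimulated vs. quiescent cells
    fold_serum_chx: float      # signal in serum + cycloheximide vs. quiescent cells
    cell_types_induced: int    # number of cell types showing mitogenic induction
    conserved: bool            # cross-species hybridization detected


def passes_screen(t: Transcript, min_fold: float = 3.0, min_cell_types: int = 2) -> bool:
    # Criterion 1: induced by serum (the timing of induction is not modeled here).
    induced = t.fold_serum >= min_fold
    # Criterion 2: induction persists when protein synthesis is blocked.
    synthesis_independent = t.fold_serum_chx >= min_fold
    # Criterion 3: induced in more than one cell type.
    broadly_induced = t.cell_types_induced >= min_cell_types
    # Criterion 4: evolutionarily conserved.
    return induced and synthesis_independent and broadly_induced and t.conserved


# Invented example clones, for illustration only.
candidates = [
    Transcript("clone_A", 12.0, 15.0, 4, True),
    Transcript("clone_B", 8.0, 1.1, 3, True),    # induction lost with cycloheximide
    Transcript("clone_C", 1.2, 1.0, 1, False),   # not induced
]
selected = [t.name for t in candidates if passes_screen(t)]
print(selected)
```

In this toy data set only the clone that remains induced in the presence of cycloheximide, responds in several cell types, and is conserved passes all four tests.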
III. Egr-1 Is Expressed in Response to Diverse Stimuli
A. Induction by Mitogens

In response to mitogens such as growth factors, hormones, and the tumor promoter TPA (phorbol), Egr-1 induction is universal. In addition, Egr-1 is expressed in diverse physiological contexts in particular cell types. The broad spectrum of extracellular stimuli that induce Egr-1 can be roughly subgrouped into four categories: (1) mitogens, (2) developmental or differentiation cues, (3) tissue or radiation injury, and (4) signals that cause neuronal excitation.

In every cell type examined, Egr-1 expression is rapidly induced by mitogenic stimulation. For example, in quiescent 3T3 cells stimulated with fetal calf serum, Egr-1 expression is seen as early as 10 minutes, peaks around 30 minutes, and decays rapidly thereafter, returning to basal levels by 3-4 hours. Purified growth factors such as platelet-derived growth factor (PDGF), fibroblast growth factor (FGF), and epidermal growth factor (EGF) also stimulate Egr-1 expression in fibroblasts (5, 9, 13). The kinetics of induction are similar to those of c-fos, but the magnitude of Egr-1 induction is typically severalfold greater (5).

In addition to induction in fibroblasts, mitogenic stimulation of Egr-1 has been described in a wide array of cell types, such as kidney and liver epithelial cells and lymphocytes. For example, Egr-1 is induced in regenerating liver within 1 hour after partial hepatectomy (14), in serum-starved BSC-1 monkey kidney epithelial cells in response to the mitogen adenosine diphosphate, in serum-deprived rat hepatoma H35 cells stimulated with serum or insulin, and in human peripheral blood lymphocytes treated with phytohemagglutinin (5).

Egr-1 is also up-regulated by protein tyrosine kinases, whose activity is associated with transformation in culture and tumorigenesis in animals. Egr-1 message levels increase when a temperature-sensitive variant of v-Src is shifted from the nonpermissive to the permissive
temperature. Egr-1 is similarly induced by expression of a second tyrosine kinase, v-Fps (15, 16). Because protein-tyrosine kinase activity has been implicated in events promoting cell division, Egr-1 may be an important component of the mitogenic signal.

An extremely tight correlation between Egr-1 expression and B lymphocyte activation has been established (17). B lymphocytes express surface immunoglobulin that acts as a receptor for antigen. While mature B cells are activated by cross-linking surface immunoglobulin with anti-μ antibodies and respond by proliferating, immature B cells, such as the WEHI-231 cell line, respond to anti-μ by down-regulation of proliferation and eventually cell death. The Egr-1 response in mature and immature B lymphocytes differs accordingly: Egr-1 is rapidly and transiently induced in mature B cells cross-linked with anti-μ but not in WEHI-231 cells treated identically. However, Egr-1 can be induced to respond in WEHI-231 cells exposed to lipopolysaccharide (LPS), a treatment that protects these cells from the antiproliferative effects of anti-μ (17). The participation of Egr-1 in positive versus negative signaling through surface immunoglobulin may be mediated by differential methylation of the gene. Egr-1 is hypermethylated in immature B cells and in the WEHI-231 line. When an Egr-1 reporter is transfected (18) into the WEHI-231 line, it can be activated by anti-μ, in contrast to the endogenous gene. Most convincingly, endogenous Egr-1 can be induced in WEHI-231 cells treated with the inhibitor of methylation, 5-azacytidine (18).

Additional correlation of Egr-1 induction with mitogenicity has been shown in studies (19) in rat kidney mesangial cells. Numerous vasoactive agents, including PDGF, vasopressin, serotonin, and angiotensin II, induce proliferation in these cells, and the induction of Egr-1 mRNA and protein correlates with cell proliferation.

Strong evidence for a role for Egr-1 in proliferation also comes from studies with mouse skeletal muscle Sol8 cells (20). Although Egr-1 message was induced in response to mitogenic stimuli (such as basic fibroblast growth factor, PDGF BB, and fetal calf serum), differentiative stimuli (insulin), and other agents that caused neither proliferation nor differentiation, Egr-1 protein could be detected only in response to mitogenic cues. Translation of Egr-1 may be uncoupled from transcriptional induction, as was in fact suggested by earlier studies with human fibroblasts (21). Although interferons α and γ, tumor necrosis factors α and β, and epidermal growth factor induced Egr-1 message levels to a similar extent, the amount of Egr-1 translated varied with the mitogenicity of the inducing agent. Cao et al. (21) suggest that the mechanism of translational regulation may be through the phosphorylation of cap-binding protein (eIF-4E). Phosphorylation of this factor, which promotes cellular protein synthesis, is enhanced by the mitogenic
agents EGF and tumor necrosis factor (TNF) but not interferon (IFN) (21). Together these studies present an intriguing correlation between the translatability of Egr-1 message and the strength of the mitogenic inducing signal. Given the translational block in Egr-1 production induced by insulin in Sol8 cells, any role for Egr-1 in differentiated muscle must assume a function for the abundantly expressed Egr-1 message, perhaps within its 3' UTR (22). In light of these results, the assumption, implicit in many studies of Egr-1 induction, that Egr-1 mRNA levels correlate with protein levels must be reexamined.

Finally, recent work (23) suggests a role for Egr-1 in the regulation of astrocyte growth. Endothelin 3 (ET-3), a potent growth regulator in these cells, stimulates Egr-1 and basic fibroblast growth factor (bFGF) expression. An antisense oligonucleotide to Egr-1 blocked ET-stimulated thymidine uptake and bFGF gene transcription. Moreover, an antisense oligomer to the bFGF gene significantly blocked ET-stimulated thymidine incorporation. These studies point to a causal role for Egr-1 induction in the proliferation of astrocytes and suggest that the bFGF gene may be a relevant physiological target gene.
B. Induction during Development and Differentiation
In the adult mouse, high levels of Egr-1 mRNA are seen in brain, thymus, heart, muscle, and lung. In particular, the high level of expression in the brain is localized to the cerebral cortex and hippocampus (14). Lower levels are detected in kidney, spleen, and most other tissues, with very low levels in liver (6, 13, 14). A similar pattern of expression has been observed in the adult rat: Egr-1 is most abundant in brain and adrenal gland, and is also highly expressed in superior cervical ganglia and lung (24, 25).

During development, a single Egr-1 transcript is predominantly expressed in cortex, midbrain, and cerebellum; in bone, cartilage, and muscle; and at several sites of epithelial-mesenchymal interactions. Studies in the developing rat suggest a role for Egr-1 in postnatal maturation of the brain: Egr-1 levels are low in neonatal and early postnatal brain, but increase dramatically at later times and in the adult animal, with highest levels detected in the cortex (25). In the developing mouse, Egr-1 expression in 14.5- and 17.5-day fetal skeleton parallels c-fos expression, suggesting a role for these coregulated genes in skeletal development. Egr-1 expression is correlated with the onset of ossification (about day 14.5) and is localized to regions of the embryo undergoing substantial bone formation, including the membranous and alveolar bones of the head and the periosteal and endochondral ossification sites of the developing long bones (26). Like c-fos, Egr-1 is expressed in cartilage at the articular surfaces of joints and in the interstitial
cells that lie in between these elements. In addition, high-level Egr-1 expression is seen in developing striated muscle, showing a patchy distribution. Finally, it has been suggested that Egr-1 may respond to signals that mediate epithelial-mesenchymal interactions during organogenesis: expression is localized to ectodermally derived cells of the inner root sheath in young whisker follicles, to the underlying mesenchymal component of developing salivary and nasal glands of the mouse, and to the mesenchymal component of the developing tooth (26). This initial patterning during tooth organogenesis requires primary signals derived from the dental epithelium. Importantly, recent in vitro reconstitution experiments demonstrate that purified bone morphogenetic protein 4 (BMP-4) can substitute for dental epithelium in inducing morphogenetic changes in the mesenchyme and in up-regulating Egr-1 expression (27). In summary, the developmental profile of Egr-1 is consistent with a role for it in brain maturation, in skeletal development, and in response to epithelial-mesenchymal interactions.

In several cell types, a rise in Egr-1 expression is correlated with differentiative processes, in particular in cardiac, neural, osteoblast, and monocytic differentiation. Differentiation of P19 embryonal carcinoma cells into cardiac muscle, or nerve and glial cells, is induced in the presence of dimethyl sulfoxide (DMSO) or retinoic acid, respectively. In response to either, a biphasic pattern of Egr-1 expression is seen. A transitory increase after 3 days of treatment is followed by high sustained levels of Egr-1 expression after 14 days in culture (6). The expression of Egr-1 in adult heart and brain is consistent with its prolonged response, pointing to a role for it in these differentiated cell types (6, 28). Neuronal differentiation can also be modeled in the rat pheochromocytoma cell line PC12.
Nerve growth factor (NGF) causes an initial mitogenic response in PC12 cells, followed by growth arrest and differentiation into sympathetic neuronlike cells with extended neurites. Egr-1 responds rapidly to NGF in PC12 cells, as to other growth factors; however, the expression is not transient, remaining high for up to 6 days (6, 8). Finally, retinoic acid induces the differentiation of rat calvarial preosteoblastic RCT-1 cells. Egr-1 is induced rapidly and transiently by retinoic acid in RCT-1 cells or primary cultures of embryonal calvarial cells, but not in the more mature RCT-3 line, which already expresses many osteoblastic markers (29). These observations, together with the expression of Egr-1 in developing bone and cartilage described above, support a role for Egr-1 in osteoblast differentiation (26, 29).

As described above, Egr-1 induction has been correlated with the onset of differentiation in several cell types. In particular, monocytic differentiation of U-937 and HL-60 myeloid leukemia cells induces Egr-1 expression (30, 31). Interestingly, dexamethasone, an inhibitor of monocytic differentiation, blocks the Egr-1 induction (31). Recent exciting results with myeloid cells provide the first demonstration that Egr-1 expression is necessary for differentiation (32). The human myeloid leukemia cell line HL-60 can be induced to differentiate along either macrophage or granulocyte lineages by treatment with phorbol or DMSO, respectively. Egr-1 expression is seen exclusively on induction of macrophage differentiation in HL-60 cells and primary myeloblasts. Egr-1 antisense oligomers added to the culture medium prevent macrophage differentiation, and constitutive expression of Egr-1 limits the differentiative capacity of HL-60 cells such that these multipotent cells can no longer be induced for granulocyte differentiation (32). These results convincingly demonstrate that Egr-1 expression is essential for differentiation along the macrophage lineage and restricts differentiation to that lineage.

Mapping of the human EGR1³ gene to chromosomal locus 5q31.1 is particularly intriguing with respect to these studies. The human EGR1 locus has been localized to a 2.8-megabase region defined by overlapping chromosomal deletions from patients with therapy-related acute myeloid leukemia (6, 33). The suggestion that Egr-1 is a myeloid tumor-suppressor gene is consistent with a role for Egr-1 in promoting myelogenesis.
C. Induction by Tissue or Radiation Injury

In a third context, Egr-1 is induced in response to tissue or radiation injury. Ischemic injury to the kidney results in alterations in epithelial cell polarity, tissue damage, and cell death. Restoration of differentiated function after ischemic injury sets the kidney apart from the heart and brain, two organs that are irreversibly damaged by oxygen deprivation. Ischemic injury to rat kidney followed by reoxygenation induces a transient 30-fold increase in Egr-1 expression that does not require protein synthesis. Moreover, because induction requires reoxygenation, Egr-1 is not induced by the injury per se, but may rather act in response to postischemic events to mediate the subsequent processes of cell differentiation or proliferation (34).

A second example of Egr-1 induction as a consequence of cell injury is the cellular response to X-ray irradiation. Ionizing radiation has pleiotropic effects, including growth arrest, the repair of damaged DNA, and proliferation. Egr-1 responds by a transient induction within 0.5 to 3 hours of exposure to X-rays in the absence of protein synthesis (35).
D. Induction in Neuronal Signaling

Immediate-early genes, by analogy to their part in the mitogenic response, may also play an important role in stimulus-transcription coupling in neurons (36). Several lines of experimentation indicate that immediate-early genes, including Egr-1, participate in the rapid response of neurons to transsynaptic stimuli. In vivo, Egr-1 levels increase rapidly in the brain following seizure activity, with kinetics similar to c-fos (6). Membrane depolarization of PC12 cells by treatment with potassium chloride also results in rapid and transient induction of Egr-1 (6, 37). In dark-reared cats, a brief 1-hour visual stimulation causes dramatic and transient induction of Egr-1, c-fos, and junB mRNAs that is specific to the visual cortex, i.e., absent from the frontal cortex. The magnitude of the induction, greatest in young animals, is consistent with the idea that Egr-1 expression plays a fundamental role during the critical period of development in the visual cortex (38, 39). A role for Egr-1 in postnatal maturation of the brain is supported by the dramatic increase in Egr-1 message levels in all sections of postnatally developing rat brain, especially cortex (25). Finally, high-frequency stimulation of the perforant path-granule cell synapse results in induction of Egr-1 in the postsynaptic cells. The response of Egr-1 is highly reproducible, as compared to the variable response of other immediate-early genes. Interestingly, induction of Egr-1 is correlated with long-term potentiation, because both responses require the N-methyl-D-aspartate receptor and a stimulus of similar frequency and intensity (40). Additional studies show Egr-1 induction following electroconvulsive shock therapy, D1 dopamine receptor activation, and opiate withdrawal (41). Transient Egr-1 induction has also been noted in the peripheral nervous system, e.g., sciatic nerve transection provokes an increase in Egr-1 protein in neurons of the spinal dorsal horn (42). These studies, and the expression of Egr-1 in developing and adult brain and in the peripheral nervous system, are consistent with a role for Egr-1 in neurophysiological processes.

This summary of the contexts in which Egr-1 is expressed emphasizes the diversity of signals that induce Egr-1 (Fig. 1). Egr-1 is induced by mitogenic stimuli in all cell types; during differentiation of nerve, cardiac, bone, and myeloid cells; after tissue injury due to ischemia or irradiation; and by signals that result in neuronal excitation, such as membrane depolarization or brain seizures. There has been one demonstration, in the differentiation-inducible HL-60 cell line, of a phenotype resulting from inappropriate Egr-1 expression (32). Egr-1 promotes and restricts differentiation of myeloid precursors along the macrophage lineage; beyond this, the enormous complexity of the Egr-1 response hints that this protein may play diverse roles in different cellular contexts.

³ EGR is the human factor; Egr is the mouse or rat factor.

FIG. 1. Biological processes in which Egr-1 expression has been described. Each entry gives the stimulus, the cell type or system, and the references.

Mitogenic
   serum; fibroblasts; 5, 6, 9, 13
   PDGF, EGF, FGF; fibroblasts; 5, 6, 9, 13, 43
   insulin; hepatocytes; 5
   phytohemagglutinin; peripheral blood lymphocytes; 5, 12
   anti-μ; B lymphocytes; 17
   adenosine diphosphate; kidney epithelial cells; 5
   PDGF, vasopressin; kidney mesangial cells; 19, 114
   serum; skeletal muscle Sol8 cells; 20
   bFGF, PDGF BB; skeletal muscle Sol8 cells; 20
   partial hepatectomy; liver; 9, 119
   GM-CSF, LPS; peritoneal macrophages; 121, 115
   endothelin; astrocytes; 23
   angiotensin II; vascular smooth muscle cells; 120
Hypertrophic
   endothelin, angiotensin II; myocyte; 101, 117
Differentiative
   NGF; pheochromocytoma PC12 (neural); 6, 8
   retinoic acid, DMSO; embryonal carcinoma P19; 6
   retinoic acid; embryonal calvarial cells, RCT-1 (osteoblast); 29
   TPA, DMSO; myeloid leukemia HL-60 and U-937; 30, 31, 32
Tissue/radiation injury
   ischemia; kidney; 34
   ionizing radiation; 293, SQ-20B; 35, 53, 116
Neuronal excitation
   potassium ions; PC12 (depolarization); 36, 6, 37
   metrazole; seizures in vivo; 6
   NMDA; hippocampus; 40
   visual stimuli; visual cortex; 38, 39
   electroconvulsive shock therapy, dopamine receptor activation, opiate withdrawal; CNS; 41
   sciatic nerve transection; peripheral nervous system; 42
Other
   urea; MDCK, LLC-PK1 renal epithelial cells
IV. Proximal Events
A. Second Messengers

Two strategies have yielded insight into the complex regulation of the Egr-1 gene: activation or inhibition of specific second-messenger pathways and a molecular genetic dissection of the Egr-1 promoter. Multiple intracellular pathways appear to contribute to the regulation of Egr-1 expression.

Both protein-kinase-C (PKC)-dependent and -independent mechanisms are integral in linking extracellular signals to transcriptional activation of Egr-1. Clearly, the PKC pathway can relay extracellular stimuli to a nuclear response resulting in Egr-1 induction, because direct activation of the pathway by phorbol ester (TPA) induces Egr-1 (5, 43). In addition, non-PKC pathways also play a role: fibroblasts rendered deficient in PKC signaling by long-term exposure to phorbol retain a robust Egr-1 response to serum and epidermal growth factor (43).
In the response of Egr-1 to tumor necrosis factor and interferon in human fibroblasts, the PKC pathway appears instrumental. Treatment with H7 (a nonspecific inhibitor of protein kinases including PKC) or the PKC inhibitor staurosporine effectively blocks much of the Egr-1 response. The selective inhibitor of cyclic-nucleotide-dependent protein kinases, HA1004, does not modify the Egr-1 response (21).

Stimulation of B lymphocytes with phorbol or the PKC agonist SC-9 also up-regulates Egr-1 expression, implying that surface immunoglobulin (Ig)-generated signals work through PKC. Evidence for the PKC pathway as a requisite component of anti-μ induction of Egr-1 comes from studies with inhibitors of PKC. Prior treatment with either H7 or sangivamycin, effective inhibitors of PKC, blocks the increase in Egr-1 mRNA levels in response to anti-μ. Again, the cyclic-nucleotide-dependent protein kinase inhibitor HA1004 had no effect. These studies demonstrate that activation of PKC is involved in coupling surface Ig stimulation in B lymphocytes to the transcriptional response of the Egr-1 gene (44).

The PKC pathway appears fundamental in mediating Egr-1 induction in response to X-irradiation. First, prolonged stimulation with micromolar concentrations of phorbol depletes PKC and virtually blocks the X-ray inducibility of Egr-1 in SQ-20B cells. In addition, pretreatment with the inhibitor H7 but not HA1004 markedly attenuates the X-ray inducibility of Egr-1 in SQ-20B or 293 cells (35).

In contrast, an intracellular pathway involving c-Raf plays a central role in the v-Src induction of Egr-1. c-Raf-1 is a serine-threonine protein kinase, and v-Raf up-regulates the Egr-1 promoter. Moreover, expression of a kinase-defective mutant of c-Raf-1 blocks induction of Egr-1 upon regulation of the Egr-1 gene.
B. Egr-1 Promoter Analysis

The architecture of the Egr-1 promoter has been described by several groups who have cloned the murine (14, 46), rat (47), and human Egr-1 genes. In particular, the coregulation of c-fos and Egr-1 in several contexts has prompted a comparison of their promoter sequences. Six CC(A/T)6GG elements (CArG boxes), the functional core of the serum response element (SRE), are present in the Egr-1 promoter; however, none of these potential SREs shares the extended symmetry outside of the core sequence that typifies the c-fos SRE (48). In addition to the CArG boxes, putative regulatory elements in the Egr-1 promoter include cAMP response elements, AP1, CREB, and Sp1 sites as well as a CCAAT box and TATA motif (14, 46, 47, 49), as illustrated in Fig. 2.

The demonstration that 1 kb of murine 5' sequence confers serum and phorbol responsiveness to a CAT reporter in mouse fibroblasts opened the
door to delineation of the functional elements (14, 50, 51). Similarly, NGF inducibility was observed with the sequence from -532 to +100 of the rat gene in PC12 cells (47). Deletion analysis of the Egr-1 promoter showed that a construct with sequence to -594 (and all six CArG boxes) retains full serum inducibility, whereas deletion to -166 (with the two proximal CArG elements) has partial serum responsiveness as compared to a minimal promoter construct. Moreover, synthetic constructs with a single Egr-1 CArG box confer serum inducibility on the heterologous thymidine kinase promoter (49). These results show clearly that the decanucleotide inner core of the previously defined c-fos SRE functions as a serum response element in the Egr-1 promoter. In a gel-shift assay, the core Egr-1 SRE can compete for binding against the c-fos SRE with its more extensive dyad symmetry. Like the c-fos SRE, the Egr-1 CArG boxes bind to in vitro-translated serum response factor. Further experiments with synthetic constructs indicate that tandem copies of the CArG boxes are more strikingly inducible than an individual element (49). Given these observations, the greater serum inducibility of Egr-1 versus c-fos may be explained by the multiple elements in the Egr-1 promoter as compared to the single SRE regulating c-fos expression.

The CArG box appears to play a central role in the broad responsiveness of Egr-1 to mitogens, because this motif directs induction by PDGF, phorbol, v-src, and v-fps, as well as serum (16, 49, 50, 52). These elements, especially the three most 5' ones, are also responsible for the activation of Egr-1 by ionizing radiation (53). Finally, the CArG boxes in the Egr-1 promoter mediate the down-regulation of Egr-1 transcription following mitogenic stimulation. In particular, the Fos protein effects this transcriptional repression; Fos mutants lacking a leucine zipper also function in this assay, and the C-terminal region of Fos is sufficient for this function (51).
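The CArG consensus, CC(A/T)6GG, is simple enough to locate by pattern matching. The following is a minimal sketch, assuming a plain string of unambiguous bases; the function name and the short test sequence are invented for illustration, and the sequence is not the published promoter.

```python
import re
from typing import List, Tuple

# CArG consensus CC(A/T)6GG. The lookahead makes the scan report every
# start position, including matches that overlap one another.
CARG_BOX = re.compile(r"(?=(CC[AT]{6}GG))")


def find_carg_boxes(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched decanucleotide) for each CArG box."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in CARG_BOX.finditer(seq)]


# Hypothetical toy sequence containing two CArG-like decanucleotides.
toy = "TTGACCATATAAGGAGCTACCTTATTTGGGCAGT"
boxes = find_carg_boxes(toy)
print(boxes)  # [(4, 'CCATATAAGG'), (19, 'CCTTATTTGG')]
```

Because the reverse complement of CC(A/T)6GG again fits CC(A/T)6GG, scanning one strand is enough to find every site. Counting the matches upstream of a chosen deletion endpoint would give the number of CArG boxes retained in a deletion construct.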
V. Distal Events
A. Characterization of the Egr-1 Protein Product
Immediate-early genes encode several types of proteins, including growth factors, growth factor receptors, cytoskeletal proteins, and transcription factors. Sequence analysis of Egr-1 revealed a protein with three tandemly repeated Cys2His2 zinc-finger motifs that presage the function of this protein (6, 8, 13, 14). The zinc finger (see below), a highly conserved eukaryotic DNA-binding motif, is a compact domain that uses conserved pairs of cysteine and histidine residues to coordinate a central zinc ion (54). The importance of the Egr-1 gene product is also suggested by the conservation
ANDREA GASHLER AND VIKAS P. SUKHATME
[Fig. 2 graphic. (A) Promoter schematic, scale -900 to -100, marking the Sma I, BamH I, and Apa I sites, the TATA element, the serum response elements (SRE), and the Egr-1 binding sites (EBS). (B) Nucleotide sequence of the promoter, numbered from -935 to +145.]
Egr-1
TRANSCRIPTION FACTOR FAMILY
of the coding sequence across vertebrate evolution: human (7), rat (8), mouse (6, 13, 14), chicken (11), and zebrafish (55, 56) cDNAs are highly homologous. From the deduced amino-acid sequence of Egr-1 protein, several interesting features have been predicted (Fig. 3A). Basic residues cluster in the three zinc fingers and adjacent sequence. The amino-terminal 300 amino acids are rich in proline (14%) and serine/threonine residues (24%). Several stretches of five to seven consecutive serine or threonine residues are present, with one series of seven serine/threonine residues followed by seven glycines (Fig. 3B). It has been noted that the repeating trinucleotide motifs that encode these poly(amino acid) stretches are similar to those whose expansion has been implicated in human disease (55, 57, 58). The region on the carboxy-terminal side of the zinc-finger motifs is also rich in proline (15%) and serine plus threonine (37%), but this region is distinguished by a repeated motif of eight amino acids with the consensus Ser/Thr-Ser/Thr-Phe/Tyr-Pro-Ser-Pro-X-X. The composition of this reiterated sequence is reminiscent of the heptapeptide repeat in the carboxy-terminal domain of the RNA polymerase II large subunit (59). The proline-rich regions of Egr-1 are predicted to lack α-helical secondary structure, whereas the high content of serine, threonine, and tyrosine residues suggests that Egr-1 may be phosphorylated. Characterization of the Egr-1 gene product showed it to be a short-lived protein with an anomalous electrophoretic mobility of 80-82 kDa. In fibroblasts, Egr-1 protein is rapidly induced by serum, accumulating within 30 minutes and reaching maximum levels at 1-2 hours poststimulation (50). Consistent with its putative DNA-binding function, immunocytochemistry and cell fractionation studies show that Egr-1 is located in the nucleus (50, 60, 61).
Studies (60) have characterized the rat homolog in PC12 cells with several antisera directed against various regions of the protein. In particular, a truncated species of 54 kDa is cytoplasmic. This 54-kDa species is recognized by antisera directed against the basic region immediately 5’ of the first zinc finger but not by sera against a C-terminal peptide. These results were an early indication that sequences within or C-terminal to the zinc-finger
FIG. 2. The 5’ upstream sequence of the murine Egr-1 gene. (A) Schematic of promoter depicting putative regulatory elements. The positions of the six serum response elements within approximately 1 kb of promoter sequence are depicted as darkened boxes. The locations of the Egr-1 binding sites within the promoter are indicated as open boxes. (B) Nucleotide sequence of the Egr-1 promoter. [Reprinted from NARes (Ref. 46) by permission of Oxford University Press.] The nucleotides are numbered from the cap site, which is +1. The putative TATA element is underlined and three Egr-1 binding sites in the 5’ promoter region are boxed.
[Fig. 3 graphic. (A) Schematic of Egr-1 residues 1-533. (B) Amino-acid sequence in three-letter code, ending at residue 533.]
domain may participate in nuclear targeting (60). The Egr-1 gene product is also phosphorylated: alkaline phosphatase converts the two closely spaced Egr-1 species seen on SDS-PAGE analysis of NGF-stimulated PC12 cells to the faster migrating form (50, 60). Immunoprecipitation of Egr-1 from phosphate-labeled HeLa cells and subsequent analysis of phosphoamino acid content indicate that the phosphorylation is on serine (62).
B. DNA-binding Activity of Egr-1
Hundreds of eukaryotic transcription factors share the highly conserved DNA-binding motif known as the zinc finger. First identified as a compact zinc-binding domain in the Xenopus transcription factor IIIA (54), this well-conserved motif also occurs in the yeast proteins SWI5 and ADR1, the Drosophila factors Krüppel and Hunchback, and mammalian regulatory proteins such as the testis-determining factor ZFY, the enhancer-binding protein Sp1, and the Wilms tumor suppressor WT1. TFIIIA-like fingers are distinguished by pairs of conserved cysteine and histidine residues and are evolutionarily and structurally distinct from the cysteine-rich zinc-binding motifs in the steroid receptors and in the yeast factor GAL4. A variable number of tandem repeats of this domain of 28-30 amino acids act in concert to recognize a specific DNA sequence (63). Residues fundamental to the structural integrity of the finger domain are conserved among all Cys2His2 zinc-finger proteins, whereas other amino acids involved in base sequence discrimination may be unique or confined to a subset of this large family of proteins. Pairs of cysteine and histidine residues are absolutely conserved, as usually are the hydrophobic amino acids phenylalanine and leucine (Fig. 4A). The region connecting the histidine of one finger to the cysteine of the following finger, designated the H-C link, has the highly conserved consensus His-Thr-Gly-Glu-Lys/Arg-Pro-Phe/Tyr-X-Cys (63). In addition, three variable residues, discussed below, appear to participate in sequence-specific interactions with DNA. NMR and crystallographic studies suggest that each zinc-finger motif consists of an antiparallel β-sheet that includes the two consensus cysteines, and an α-helix that contains the two conserved histidine residues (Fig. 4B). Each
FIG. 3. Schematic structure and amino-acid sequence of the Egr-1 protein. (A) Structural features of Egr-1. Each zinc-finger motif is designated by a black bar. The basic region of Egr-1 is indicated (+ +). The serine/threonine-rich N-terminal domain of Egr-1 is shown on the left and the proline/serine/threonine-rich C-terminus (P/S/T) is on the right. (B) Coding sequence of murine Egr-1. [Reprinted with permission from Ref. 6, copyright 1988 Cell Press.] The three zinc-finger motifs are enclosed. Conserved cysteine and histidine residues in the zinc fingers are circled. Serine, threonine, and tyrosine residues in the N-terminal domain are underlined.
[Fig. 4 graphic. (A) Zinc-finger consensus residues. (B) Folding diagram labeling the antiparallel β-sheet, the linker, and the DNA-binding region. (C) Finger-to-subsite alignment for the EGR family and Sp1.]
FIG. 4. The zinc finger as a modular DNA-binding motif. (A) Zinc-finger consensus residues. Invariant cysteine (C) and histidine (H) residues that coordinate a zinc ion are circled, as are the conserved hydrophobic residues phenylalanine and leucine. Residues that are part of the highly conserved His-Cys link are enclosed by hexagons. Amino acids that determine the sequence-specificity of binding are shown in the shaded boxes. (B) Diagram of zinc-finger folding (from 113). Each zinc-finger domain is composed of a β-sheet and an α-helix. Hydrogen bonds are depicted as dotted lines. (C) Each zinc finger contacts a three-nucleotide subsite. [Reprinted with permission from Nature (Ref. 66), copyright 1991 Macmillan Magazines Limited.] Fingers 1 and 3 of Egr-2 are postulated to bind the same three-nucleotide subsite as finger 2 of Sp1.
zinc finger incorporates these secondary structures into a compact globular domain with the invariant cysteine and histidine residues coordinating a central zinc ion. A hydrophobic core including the conserved phenylalanine and leucine residues and the first histidine stabilizes the domain. In a manner similar to prokaryotic helix-turn-helix motifs and eukaryotic homeodomains, the α-helix of the zinc finger lies within the major groove of DNA. Multiple interactions between amino-acid side-chains of the helix and DNA base-pairs combine to discriminate among nucleic-acid sequences (64).
The search for the DNA element recognized by Egr-1 focused on fragments derived from the 5' upstream flanking sequence of the Egr-1 gene (49), because autoregulation of other immediate-early genes such as c-fos, together with cycloheximide superinduction of Egr-1, was consistent with the hypothesis that Egr-1 regulates its own expression. Using gel mobility shift assays with Egr-1 protein purified from bacteria, specific binding to one promoter fragment was observed (49). DNase-I footprinting identified the sites of contact, revealing that Egr-1 binds the 9-bp sequence GCG-GGG-GCG. Further gel shifts comparing the affinity of this sequence to sites altered at various positions generated the consensus sequence GCG-KGG-GCG (49). Gel shift assays with zinc-chelating agents demonstrated that zinc cations are required for DNA binding (50).
Two types of experiments support a similar model for the determination of DNA-binding specificity by EGR fingers and proteins with related zinc-finger domains (reviewed in 65). Mutagenesis experiments were guided by the similar but distinct zinc-finger domains of Sp1 and Egr-2, a gene whose three zinc fingers are identical to those of Egr-1 except for four conservative amino-acid substitutions. These mutagenesis studies foreshadowed the results obtained by cocrystallization of the Egr-1 zinc-finger domain and its cognate binding site. It has been observed that Sp1 and Egr-2 each contain three zinc fingers and bind to a (G+C)-rich 9-bp binding site (66). If each motif interacts with DNA in an analogous manner, then a zinc finger is predicted to contact 3 bp of DNA.
Furthermore, comparison of the Egr consensus GCG-GGG-GCG to the Sp1 consensus GGG-GCG-GGG suggested that fingers 1 and 3 of Egr-2 might have the same specificity-determining residues as finger 2 of Sp1 (Fig. 4C). Fingers 1 and 3 of Egr-2 share Glu18 and Arg21 with finger 2 of Sp1 (Fig. 5A). In addition, finger 2 of Egr-2 and fingers 1 and 3 of Sp1 each have a histidine residue at position 18 of the finger. It was predicted (66) that the residues at positions 18 and 21 discriminate between GGG and GCG subsites. In accordance with this hypothesis, mutagenesis of Egr-2 finger 2 residues His18 to Glu and Thr21 to Arg created a protein that did not bind the Egr-2 cognate sequence but instead recognized the novel sequence GCG-GCG-GCG (66). The model thus constructed (66), in which variable residues at positions 18 and 21 were postulated to be the determinants of base-sequence discrimination, has been substantiated (64). Solution of the Egr-1 zinc-finger domain-DNA crystal structure provided a framework for understanding how proteins with tandemly repeated Cys2His2 zinc fingers interact with DNA.
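The subsite comparison between the Egr and Sp1 consensus sites can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the two 9-bp strings are the consensus sites quoted in the text, and the assignment of finger 3 to the 5'-most subsite assumes the antiparallel orientation reported for the Egr-1 cocrystal.

```python
# Split a 9-bp zinc-finger site into three 3-bp subsites and compare two sites.
# Helper names are illustrative; they do not come from the source.

EGR_CONSENSUS = "GCGGGGGCG"   # GCG-GGG-GCG, as quoted in the text
SP1_CONSENSUS = "GGGGCGGGG"   # GGG-GCG-GGG, as quoted in the text


def subsites(site: str) -> list[str]:
    """Return the three 3-bp subsites of a 9-bp site, 5' to 3'."""
    if len(site) != 9:
        raise ValueError("expected a 9-bp site")
    return [site[i:i + 3] for i in range(0, 9, 3)]


def fingers_by_subsite(site: str) -> dict[int, str]:
    """Map finger number -> subsite, assuming antiparallel binding:
    finger 3 reads the 5'-most subsite and finger 1 the 3'-most."""
    five_to_three = subsites(site)
    return {3: five_to_three[0], 2: five_to_three[1], 1: five_to_three[2]}


egr = fingers_by_subsite(EGR_CONSENSUS)
sp1 = fingers_by_subsite(SP1_CONSENSUS)
for finger in (1, 2, 3):
    print(f"finger {finger}: Egr {egr[finger]}  Sp1 {sp1[finger]}")
```

Under these assumptions, fingers 1 and 3 of the Egr site and finger 2 of the Sp1 site all read GCG, while the remaining fingers read GGG, which is the pairing that motivated the mutagenesis.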
[Fig. 5 graphic. (A) Alignment of the basic sequence flanking the zinc fingers (one-letter code), followed by an alignment of Egr-1 finger 1, Egr-2 finger 1, Sp1 finger 2, and MIG1 finger 1 that is not legible here:
Egr-1  KPSRMRKYPNRPSKTPPHERPYA
Egr-2  PILRPRKYPNRPSKTPVHERPYP
Egr-3  KPIRPRKYPNRPSKTPLHERPHA
WT1    KRYFKLSHLQMHSRKHTGEKPYQ
Sp1    TCPYCKDSEGRGSGDPGKKKQHI
(B) Fingers 1, 2, and 3 with residues 15, 18, and 21 marked, aligned against the guanine-rich strand drawn 3'-GnG-GGn-GnG-5'.]
FIG. 5. DNA-binding domains of the EGR family. (A) Comparison of DNA-binding domains related to Egr-1. Zinc fingers and the flanking sequence of murine Egr-1, human Egr-2, human Egr-3, the Wilms tumor gene WT1, Sp1, and the yeast protein MIG1 are aligned for comparison. The position of each finger motif is noted in parentheses. Conserved cysteine and histidine residues are marked (.). The helical region is underlined and the residues important in determining binding specificity are enclosed. Conserved basic residues flanking the zinc-finger domains are denoted (+). (B) Residues determining sequence specificity of Egr-1 binding. [Adapted with permission from R. E. Klevit, Science 253, 1367 (1991) (Ref. 65). Copyright 1991 American Association for the Advancement of Science.] The Egr-1 zinc-finger domain interacts with the guanine-rich strand of DNA in an antiparallel manner. Fingers 1 and 3 contact the same 3-bp subsite. Arrows represent specific interactions between arginine or histidine residues and the guanine bases.
This structure showed that each finger has a similar relation to the DNA and interacts primarily with a 3-bp subsite. The α-helix of each finger fits directly into the major groove so that residues in the amino-terminal part of the helix
are available for hydrogen-bonding interactions with the base-pairs. The β-sheet is on the backside of the helix away from the base-pairs; the second β-strand of the sheet contacts the sugar-phosphate backbone, serving to orient the α-helix in relation to the DNA. An arginine immediately preceding the helix makes important DNA contacts, as do the second, third, and sixth residues of the α-helices. Each of the hydrogen bonds is made to a guanine of the G-rich strand of the DNA. The orientation of the three fingers is antiparallel with respect to the G-rich strand; that is, the 5’-most subsite of the sequence is recognized by the carboxy-terminal finger. In addition, the α-helix lies in the major groove in an antiparallel manner, so that the carboxy-terminal portion of each helix interacts with the 5’-most base of each subsite (64). A select number of residues at defined positions in each finger interact specifically with the DNA. The residue preceding the α-helix and the third and sixth residues of the helix make the specific contacts. In each finger of Egr-1, Arg15 precedes the helix and Asp17 is the second residue of the helix (Fig. 5A). The arginine hydrogen bonds through its long side-chain with a guanine at the third position of each subsite and is stabilized by the conserved aspartate residue. Thus, the third position, G, is common to each subsite and is recognized in an identical manner (Fig. 5B). The third residue of the helix varies between fingers; Glu18 is present in fingers 1 and 3 whereas a histidine is present at the same position of finger 2. The structure solved by Pavletich and Pabo (64) shows that these glutamate residues do not contact the DNA. In contrast, the histidine of finger 2 participates in a hydrogen bond with the guanine in the center of its subsite. The sixth residue of the helix, Arg21 in fingers 1 and 3, forms a specific bond with the guanine occupying the first position of the subsite.
A threonine, which is the sixth residue in the helix of finger 2, is incapable of this interaction (64). In summary, a relatively simple pattern has emerged for Egr-1:DNA recognition from this work (64). The common arginine immediately preceding the helix specifies a guanine at the third position of each subsite. The third residue of the helix may contact the middle base of the subsite, and the sixth residue of the helix may contact the first base of the subsite. Moreover, the Egr-1 zinc fingers utilize only arginine or histidine residues to contact guanines; in the absence of these amino acids, such as Glu18 in fingers 1 and 3 and Thr21 in finger 2, there is no specific recognition of the DNA sequence (64). A complementary analysis (49) of variant Egr-1 binding sites has confirmed the lack of specific interactions with the fourth nucleotide: in gel shift assays, GCG-TGG-GCG competed as well as GCG-GGG-GCG. However, the sequences GAG-GGG-GCG and GCG-GGG-GAG were not efficient competitors, showing that not all nucleotides are permissible at positions 2
and 8 (49). Although no specific contacts were observed at these positions in the Egr-1:DNA cocrystal (64), it is possible that substitution of the bulkier adenosine for cytosine is disruptive at these positions. These results (64, 66) emphasize the modularity of the zinc-finger motif in which each zinc-binding domain recognizes a three-nucleotide sequence. In particular, an implicit assumption has been that each finger makes an equal contribution to the overall affinity of binding. A complementary in vivo mutational analysis of the Egr-1 zinc-finger domain hints that each finger may not make the same contribution to binding. Specifically, many more DNA-binding-impaired mutants are recovered with alterations in the second finger than with alterations in the first or third (67). Moreover, the two His-Cys links connecting the finger motifs also showed a disparity in the number of DNA-binding mutants recovered. The second linker was mutated 17 times whereas the first was altered three times, suggesting that the linkers may not play identical roles in orienting the fingers (67). The recognition code outlined by the crystallographic studies (66) indicates similar interactions for all three fingers of Egr-1 and implies that other Cys2His2 zinc-finger proteins will use residues at analogous positions to make their base-specific contacts. Studies with the Drosophila finger protein Tramtrack reveal an extension to the formula derived from Egr-1 DNA-protein interactions whereby residues at three positions determine DNA binding specificity. The first finger of Tramtrack uses an additional amino-acid contact to recognize its DNA binding site (68). In conclusion, the model developed from Egr-1 studies will generalize to some other zinc-finger proteins, but it does not describe the complete repertoire of all possible protein-DNA contacts in Cys2His2 zinc-finger proteins.
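The recognition pattern summarized in this section (residue 15 reads the third base of a subsite, residue 18 the middle base, residue 21 the first base, and only arginine or histidine contacts guanine) can be written as a toy model. The sketch below is a simplification under stated assumptions, not a predictive tool: the function names are hypothetical, uncontacted bases are written as N, and the model encodes only the positive contacts described in the text.

```python
# Toy model of the Egr-1 recognition pattern described in the text.
# Only Arg (R) or His (H) is treated as making a base-specific contact with
# guanine; any other residue leaves that base unspecified (N).


def contact(residue: str) -> str:
    """Return 'G' if the residue can contact guanine in this model, else 'N'."""
    return "G" if residue in ("R", "H") else "N"


def subsite_pattern(res18: str, res21: str, res15: str = "R") -> str:
    """Predict one finger's 3-bp subsite (5'->3') from residues 15, 18, 21.

    residue 21 (sixth helix residue) -> first base of the subsite
    residue 18 (third helix residue) -> middle base
    residue 15 (precedes the helix)  -> third base
    """
    return contact(res21) + contact(res18) + contact(res15)


def site_pattern(fingers: list[tuple[str, str]]) -> str:
    """fingers = [(res18, res21)] for fingers 1, 2, 3 in that order.
    Binding is antiparallel, so finger 3 reads the 5'-most subsite."""
    return "-".join(subsite_pattern(r18, r21) for r18, r21 in reversed(fingers))


def matches(pattern: str, site: str) -> bool:
    """True if a site such as 'GCG-TGG-GCG' fits a pattern with N wildcards."""
    return len(pattern) == len(site) and all(
        p in ("N", s) for p, s in zip(pattern, site)
    )


wild_type = [("E", "R"), ("H", "T"), ("E", "R")]  # fingers 1, 2, 3
mutant_f2 = [("E", "R"), ("E", "R"), ("E", "R")]  # finger 2: His18->Glu, Thr21->Arg

print(site_pattern(wild_type))  # GNG-NGG-GNG
print(site_pattern(mutant_f2))  # GNG-GNG-GNG
print(matches(site_pattern(wild_type), "GCG-TGG-GCG"))  # True
```

Because the sketch records only which bases are contacted, it is more permissive than the experiments: it accepts GAG-GGG-GCG, which competed poorly in gel shifts, and it does not reproduce the reported failure of the finger 2 mutant to bind the wild-type site.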
C. Structure-Function Analysis
1. DEFINING THE Egr-1 trans-ACTIVATION DOMAINS
Definition of a DNA-binding site for Egr-1 set the stage for assessing whether Egr-1 could regulate transcription through the GCG-GGG-GCG sequence. Data from transient transfection assays show that Egr-1 can activate a minimal promoter with multiple Egr-1 binding sites 10-fold in a dose-dependent manner (62, 69). Like that of classical transcription factors, the organization of Egr-1 is modular in nature, with functional domains that are structurally independent and able to confer activity on heterologous proteins. We and others have used deletion analysis and gene fusions to dissect the functional domains of Egr-1, delineating modular activation, repression, and nuclear localization activities. Deletion analysis of murine Egr-1 indicates that the extensive serine- and threonine-rich N-terminal domain is a robust transcriptional activator. A
constructive approach shows that several Egr-1 activation sequences are independent domains capable of functioning in a heterologous context when fused to the DNA-binding domain of the yeast factor GAL4. Residues 3-281, or subregions from 3 to 138 or 138 to 281, activate transcription 100-fold as GAL4 fusions (70). Deletion analysis of the rat homolog of Egr-1 further suggests that residues 13-38 and 223-264 may be most important for the activation function (57). Serine, threonine, and tyrosine make up 30% of the N-terminal domain over a span of 180 residues; the large size of the activation domain may contribute to its potency relative to the smaller, previously described serine/threonine-rich activator Pit-1/GHF-1 (71). Moreover, the trans-activation domain is impervious to mutation in that substantial deletions in the extensive N-terminal domain do not destroy transcriptional activity. Finally, work from several laboratories maps a weak trans-activation function to the C-terminus of Egr-1, which contains the octapeptide repeats reminiscent of the phosphorylated YSPTSPS reiterations in the carboxy-terminal domain of RNA polymerase II (57, 59, 70).
2. LOCALIZATION OF AN Egr-1 REPRESSION DOMAIN
An unexpected result of deletion analysis is that a small internal deletion immediately 5’ of the zinc-finger domain (Δ284-330) enhances trans-activation some fivefold in HeLa cells. Western-blotting and gel-shift analyses showed that this superactivation cannot be explained simply by overexpression or enhanced DNA binding of the deletion derivative relative to full-length Egr-1. The superactivation observed with Δ284-330 is consistent with the loss of a region important for repression or for negatively regulating the trans-activation function of Egr-1. Further experiments have shown that Egr-1 encodes a portable repression domain. Initial work demonstrated that a domain of 34 amino acids (281-314) can repress transcription 7- to 10-fold when fused to the GAL4 DNA-binding domain and assayed for effect on a reporter with five GAL4 binding sites. Repression by this compact domain was dependent on a DNA binding anchor (70). A further definition of the essential region showed that residues 281-304 repress and that residues 290-314 are inactive (72). This domain, highly conserved throughout vertebrate evolution (55), represents a novel motif distinct from the previously described alanine- and glycine-rich repression module in Krüppel (73, 74); the hydrophobic and proline-rich Even-skipped repressor (75); the glutamine-, alanine-rich factor Dr1; and the proline-, glycine-rich repressor of WT1 (76). In the Egr-1 repression domain, depicted in Fig. 6A, 7 of 24 residues are serine or threonine. In light of the fact that Egr-1 is known to be phosphorylated (14, 50, 60, 61), this raises the question of whether the Egr-1 repression function may be regulated by this modification (see below). Repression by Egr-1 may involve an interaction with a cellular factor. A
competition assay showed that overexpression of Egr-1 amino acids from 266 to 301 results in a dramatic increase in activation from an Egr-1 molecule whose DNA-binding domain has been replaced with that of GAL4 (57). These results suggest that the region 266-301 is sufficient for an interaction with a titratable cellular factor that normally inhibits Egr-1 activity. A single isoleucine-to-phenylalanine substitution at position 290 renders the 266-301 domain nonfunctional. As predicted, this Ile290Phe mutation in the context of the native Egr-1 protein results in dramatic superactivation such that this variant activates about 15 times better than wild-type Egr-1. It is suggested that the cellular factor that interacts through this domain is present in a wide variety of mammalian cells, although apparently not in Drosophila Schneider cells because there is no superactivation in this cell type (57). Elucidation of the mechanism of Egr-1 repression has begun with the definition of the minimal promoter elements required. Initial work had demonstrated repression with an Egr-1/GAL4 chimera on a reporter containing a portion of the thymidine kinase promoter with multiple protein-binding elements in addition to a TATA box. However, both in vivo and with an in vitro transcription assay using bacterially expressed fusion proteins, minimal promoter constructs containing only a TATA or initiator element in addition to binding sites to direct the Egr-1/GAL4 chimera are sufficient for repression (72). Although these observations suggest that Egr-1 repression is mediated by some type of interaction with the basal transcription machinery, preliminary experiments indicate that Egr-1 does not bind directly to TBP, TFIIB, or TFIIE in vitro (77). Therefore, the Egr-1 repression domain may bind to one of the many other proteins involved in complex formation or to an associated protein, presumably the widely expressed cellular factor titrated by Russo et al. (57).
The compact Egr-1 repressor is serine- and threonine-rich, and in particular Thr-289 has homology to known PKC phosphorylation sites (Fig. 6A). Phosphorylation is clearly not required for repression, because bacterially expressed Egr-1 efficiently represses transcription in vitro (78). This work is consistent with the suggestion that phosphorylation inactivates the Egr-1 repression domain, preventing an interaction needed for the transcriptional inhibition. Importantly, an Ile-to-Phe mutation at the position analogous to Egr-1 residue 290 in the PKC substrate neurogranin makes it a better substrate for the kinase (79). The corresponding mutation in Egr-1, which may similarly promote phosphorylation on Thr-289, renders the repression domain nonfunctional (57). The role of phosphorylation may therefore be to enhance the ability of Egr-1 to work as an activator, by muting its repression function. Egr-1 is one of only a small number of factors that contain modular domains capable of regulating transcription both positively and negatively.
Other examples include the Drosophila factor Krüppel (74), YY1/NF-E1/δ (reviewed in 80), and the immediate-early factors Fos and Jun (81). This work provocatively suggests that native Egr-1 may be a bifunctional protein, capable of alternatively activating or repressing transcription. Such a property may be common to immediate-early genes to allow for versatility of effector functions. Posttranslational modifications as discussed above can be envisioned to enable complex factors such as these to regulate transcription either positively or negatively. In the case of Egr-1, we can speculate that Egr-1 may either activate or repress transcription, depending on whether it is induced in response to positive growth or to differentiation cues, or that Egr-1 may activate and repress multiple target genes depending on their promoter context, thereby mediating multiple transcriptional effects in response to a single inducing agent. HL-60 cell differentiation by phorbol may exemplify the latter type of bimodal Egr-1 function. Because Egr-1 expression both promotes macrophage differentiation and prevents granulocytic differentiation, the bifunctional role of Egr-1 may be to stimulate genes essential for macrophage differentiation while repressing genes required for specialized granulocytic functions.
3. MAPPING THE Egr-1 NUCLEAR LOCALIZATION SIGNAL
Consistent with its role as a transcriptional regulator, Egr-1 has been shown by several groups to be localized in the nucleus (50, 60, 62). Small molecules and proteins less than 40-60 kDa may passively diffuse across the nuclear pores into the nucleus, whereas larger proteins are targeted to the nucleus by an active, two-step process. The first step is a rapid, signal-dependent binding to the nuclear pore periphery, and the second step is a slower, ATP- and temperature-dependent translocation across the pore. In a number of nuclear proteins, the nuclear localization signal (NLS) is generally a short stretch of 8-10 amino acids characterized by basic residues as well as proline (reviewed in 82 and 83). In Egr-1, basic residues cluster only in the three zinc fingers and adjacent sequences (Fig. 5), hinting that the karyophilic signal of Egr-1 resides here. Using subcellular fractionation/Western analysis or immunocytochemistry to analyze deletion derivatives of Egr-1, we have demonstrated that ΔN314 and ΔC430 are properly targeted to the nucleus, whereas ΔC314 is cytoplasmic. From these results, amino acids 315 to 429, encoding the three zinc fingers and adjacent basic sequences, appear essential for proper nuclear targeting. These results agree with early suggestions that the C-terminus of Egr-1 is required for nuclear localization (60). A series of fusions of segments of Egr-1 to the large bacterial protein β-galactosidase were further used to show that the zinc-finger domain itself cannot function as an NLS. However, the zinc fingers in conjunction with
the 5' basic sequence 315-330, but not the 3' basic sequence, were sufficient to target the bacterial protein β-galactosidase to the nucleus. This 5' basic stretch of residues 315-330, KPSRMRKYPNRPSKTP, is shared by other members of the EGR family, Egr-2 and Egr-3, which have conserved DNA-binding domains but generally diverge outside this region (Fig. 4A). Additional analyses showed that the entire zinc-finger domain is not required: either finger 2 or 3, yet not finger 1, could work with the 5' basic sequence to form a bipartite NLS (70). Precedents for the incorporation of nuclear targeting signals within a DNA-binding domain include Fos (84); the progesterone receptor, in which the second finger but not the first functions as an NLS (85); GAL4 (83); and the homeodomain proteins α2 and Pit-1/GHF-1 (71, 86). Egr-1 may be a prototypical Cys2His2 zinc-finger protein whose DNA-binding and nuclear localization functions have coevolved as a composite domain rich in basic residues. Other bipartite nuclear localization signals with two basic regions separated by a short variable spacer have been characterized in nucleoplasmin (87); SWI5 (88); the Xenopus protein N1 (89); the steroid hormone receptors; and polymerase basic protein 1 of influenza virus (90). In addition, discontinuous nuclear targeting signals are found in adenovirus DNA-binding protein (91) and the yeast repressor α2, which has two nonhomologous signals, i.e., a basic NLS found at the N terminus as well as a signal located in the homeodomain (86, 92). In these proteins, as in Egr-1, the essential domains are discontinuous in the primary sequence, and it has been suggested that the two parts of the signal may mediate separate steps in nuclear accumulation (86). Several Egr-1-β-galactosidase mutants containing the 5' basic sequence (but neither finger 2 nor 3 intact) and showing staining ringing the nucleus may contain the portion of the signal for binding to, but not translocation across, the nuclear pore (70).
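The deletion logic used to map the nuclear-targeting region to residues 315-429 amounts to intersecting the residues retained by every construct that still reaches the nucleus. The sketch below illustrates that reasoning only. It assumes, because the text does not spell it out, that ΔN314 retains residues 315-533, ΔC430 retains 1-429, and ΔC314 retains 1-314; the dictionary keys and function names are hypothetical.

```python
# Deduce the region required for nuclear targeting from deletion constructs.
# Each construct is described by the residues it retains (inclusive).
# ASSUMPTION: the retained ranges below are inferred from the construct names
# and from the 315-429 conclusion in the text; they are not stated explicitly.

FULL_LENGTH = 533  # length of Egr-1 as shown in the figures

constructs = {
    "dN314": {"retained": (315, FULL_LENGTH), "nuclear": True},
    "dC430": {"retained": (1, 429), "nuclear": True},
    "dC314": {"retained": (1, 314), "nuclear": False},
}


def intersect(a, b):
    """Intersection of two inclusive intervals, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def shared_region(table):
    """Residues present in every construct that still reaches the nucleus."""
    region = (1, FULL_LENGTH)
    for info in table.values():
        if info["nuclear"]:
            region = intersect(region, info["retained"])
            if region is None:
                return None
    return region


candidate = shared_region(constructs)
print(candidate)  # (315, 429)

# Consistency check: a cytoplasmic construct should lack the candidate region.
for name, info in constructs.items():
    if not info["nuclear"]:
        print(name, "overlap:", intersect(candidate, info["retained"]))
```

This equilibrium-style reasoning identifies a region that is necessary in these constructs; the β-galactosidase fusions described above were still needed to show which parts of it are sufficient.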
Each of the assays used to define the Egr-1 NLS measured the equilibrium nuclear/cytoplasmic distribution of protein. Future kinetic analyses may reveal additional sequences required for prompt nuclear localization. Although a signal of seven predominantly basic amino acids suffices for the nuclear accumulation of SV40 T antigen over a period of hours, a more extensive sequence resulted in nuclear targeting within minutes (93). Serum-dependent nuclear import has been described for the immediate-early transcription factors c-Fos and, reportedly, c-Jun (94). Although Egr-1 is clearly nuclear in serum-stimulated or exponentially growing cells (maintained in 10% calf serum), staining of Egr-1 derivatives or fusion proteins in serum-starved cells should be examined to assess the possibility of conditional nuclear localization.
In conclusion, deletion analysis and Egr-1-β-galactosidase fusions demonstrate that nuclear localization of Egr-1 requires a bipartite signal consisting of basic residues 315-330, which flank the zinc-finger domain, in addition to sequences within finger 2 or 3. These results are notable because relatively few Cys2His2 zinc-finger proteins have been characterized with respect to their requirements for nuclear targeting. The incorporation of an NLS within or adjacent to the DNA-binding domain suggests a conserved composite motif in Cys2His2 zinc-finger transcription factors (see Fig. 6). Finally, Egr-1 is a member of a small class of proteins that have bipartite nuclear localization signals in which the essential subdomains are separated by more than a few amino acids.
FIG. 6. Summary of Egr-1 domains (modified from 70). (A) Sequence of Egr-1 repression domain and zinc fingers. The repression domain is shaded and the 5' basic region involved in nuclear localization is underlined. The threonine residue whose phosphorylation may prevent repression is circled. The three zinc fingers of Egr-1 are aligned for comparison, with residues conserved among Cys2His2 zinc fingers enclosed. (B) Functional domains of Egr-1. The serine-/threonine-rich N-terminus of Egr-1 is shown. The basic region of Egr-1 is indicated (+ +). Each zinc finger is designated by a black bar and the proline/serine/threonine-rich C-terminal domain (P/S/T) is indicated. Residues 3-281 activate transcription 100-fold and the C-terminus of Egr-1 (residues 420-533) encodes a weaker trans-activation function. Amino acids 281-314 suffice to act as a repressor of transcription when fused to a heterologous DNA-binding domain. The DNA-binding activity of Egr-1 has been mapped to amino acids 331-419. The NLS of Egr-1 is bipartite: a basic region (amino acids 315-330) and part of the zinc-finger domain suffice to target Egr-1 to the nucleus.
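The domain boundaries summarized in the Fig. 6 caption can be written down as a small lookup table, which makes it easy to check which functions a given residue falls into and how basic the 5' NLS segment is. The following is only an illustrative sketch: the names `EGR1_DOMAINS`, `domains_at`, and `basic_count` are hypothetical, the coordinates are copied from the caption, and counting only lysine and arginine as "basic" is an assumption made here.

```python
# Domain boundaries copied from the Fig. 6 caption (1-based, inclusive).
# The dictionary and function names are illustrative, not from the source.
EGR1_DOMAINS = {
    "activation": (3, 281),
    "repression": (281, 314),
    "NLS basic region": (315, 330),
    "DNA binding (zinc fingers)": (331, 419),
    "weak activation": (420, 533),
}

# Residues 315-330 as quoted in the text.
NLS_BASIC = "KPSRMRKYPNRPSKTP"


def domains_at(residue):
    """Return the names of all mapped domains that contain this residue."""
    return [
        name
        for name, (start, end) in EGR1_DOMAINS.items()
        if start <= residue <= end
    ]


def basic_count(seq):
    """Count lysine (K) and arginine (R) residues (assumed definition of 'basic')."""
    return sum(seq.count(aa) for aa in "KR")


if __name__ == "__main__":
    print(domains_at(320))
    print(basic_count(NLS_BASIC), "basic residues out of", len(NLS_BASIC))
```

Because the caption assigns residue 281 to both the activation and repression domains, a lookup at that position returns two entries; the table reproduces the caption as written and does not try to resolve that overlap.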
D. Targets of Egr-1 Regulation
1. GENES REGULATED IN THE CONTEXT OF CELLULAR PROLIFERATION
Consistent with its induction by mitogenic cues and during terminal differentiation in a few cell types, Egr-1 may bind and regulate genes involved in mitosis or needed for specialized cell functions. The universality of Egr-1 expression in response to growth signals suggests that genes downstream of Egr-1 in the cascade governing cellular proliferation will be widely expressed. Several genes belong to this first class of Egr-1 targets, whose regulation presumably directs a cellular response to growth induction. The expression of the thymidine kinase (tk) gene peaks during late G1, after Egr-1 induction, kinetics consistent with regulation by Egr-1. Enzymes such as thymidine kinase, integral to the biosynthesis of DNA, are regulated depending on the growth state of the cell, and as such thymidine kinase represents a physiologically relevant target for Egr-1. First, the use of specific α-Egr-1 antiserum (95) has demonstrated that Egr-1 is a component of the tk promoter-binding complex derived from serum-stimulated nuclear extract. Second, transient transfections in CV-1 cells show that Egr-1 activates a reporter driven by a tk promoter fragment from -174 to +159. Egr-1 activation appears to work through a lower-affinity binding site, CCG-TGG-GTG. However, because tk is also highly expressed in actively cycling cells (in the absence of Egr-1 induction), high-level expression of tk apparently does not require Egr-1. A second target for Egr-1 may be the PDGF A chain, a potent mitogen for cells of mesenchymal origin. PDGF-A is also found at high levels in a number of transformed cell lines. In normal cultured cells, levels of PDGF-A mRNA rise in response to growth factors or cytokines, but peak later than Egr-1 induction.
A region of hypersensitivity to the single-strand-specific nuclease S1 in the 5’ untranslated region of PDGF-A that may be involved in regulating transcription of this growth factor has recently been defined (96).
Gel-shift competitions with purified Egr-1 showed that this homopurine/homopyrimidine site competes as well as the Egr-1 consensus. Although the S1-sensitive sequence GAG-GAG-GAG-GAGGA deviates at only the underlined position from the contacts determined by crystallography to be important for binding (64), the high affinity of the S1-sensitive site is surprising considering that previous studies have shown that GAG in the first or third subsite is not optimal for Egr-1 binding (97). Nevertheless, this homopurine/homopyrimidine sequence may be of widespread importance, because similar motifs derived from the promoters of other growth-related genes, such as the epidermal growth factor receptor, the insulin receptor, c-Ki-ras, c-myc, and TGF-β3, are also good competitors of Egr-1 binding (96). Future studies will determine whether these provocative in vitro results are of physiological significance by assessing whether Egr-1 can regulate transcription of the PDGF-A gene through this variant motif. A third Egr-1 target in primary fetal astrocytes may be bFGF. An antisense oligomer to Egr-1 blocks bFGF induction following addition of a mitogen, ET-3 (23).
2. GENES REGULATED IN THE CONTEXT OF CELL DIFFERENTIATION
The α-myosin heavy chain gene (α-MHC) and Egr-1 are coregulated in serum-deprived primary cultures of cardiac myocytes stimulated with serum and when the embryonal carcinoma cell line P19 differentiates into cardiac cells in response to dimethyl sulfoxide, prompting investigation of α-MHC as a target of Egr-1 regulation. A CAT reporter containing 1.7 kb of the α-myosin heavy chain promoter is activated 10-fold by an Egr-1 expression vector in transfected primary cultures of fetal rat cardiac myocytes. Northern analysis shows that the endogenous α-MHC gene is also stimulated three- to fourfold by Egr-1 (98).
Induction of α-MHC in response to Egr-1 was observed in the myogenic Sol8 cell line, but not in NIH3T3 fibroblasts, suggesting tissue-specific induction; α-MHC expression was unchanged in response to Egr-1 in another muscle cell line, L6E9, showing that Egr-1 is not sufficient for α-MHC gene activation. The Egr-1-responsive region of the rat α-MHC promoter has been delimited to a segment from -1698 to -1283 (98). A potential Egr-1 binding site, GTG-GGG-GTG, is located within this promoter fragment, but has not yet been shown to be the functional element (98). In light of the study showing that Egr-1 translation is blocked during Sol8 differentiation in response to insulin (20), Egr-1 protein levels in cardiac myocytes remain to be analyzed. A functional role for Egr-1 in adrenergic differentiation, suggested by high-level expression in the rat adrenal gland and in PC12 cells, may be the
regulation of phenylethanolamine N-methyltransferase (PNMT), the adrenal enzyme that converts norepinephrine to epinephrine. In vivo, neural stimulation causes an increase in Egr-1 protein in the adrenal medulla corresponding to a rise in PNMT expression in the same cell type (99). Transient transfections in the highly transfectable PC12 subline RS1 reveal that Egr-1 can modestly stimulate (fourfold) a PNMT reporter with 442 bp of 5' sequence. This region includes two potential Egr-1 binding sites, an optimal consensus sequence at -165 and a proximal site GCG-GGG-GGG at -45. Cold competition experiments show that this 8 of 9 match to the optimal Egr-1 consensus is a weak but specific competitor (99). It has been postulated that Egr-1 negatively regulates the widely expressed adenosine deaminase (ADA) gene. ADA has a (G + C)-rich promoter, lacking TATA and CCAAT box elements, typical of classical housekeeping gene promoters. As discussed above, Egr-1 and Sp1 bind to distinct, although similarly (G + C)-rich, DNA-binding sites (97). Notably, deletion analysis of the ADA promoter reveals a cis-acting repressor element that maps to an Egr-1 site. Mutations that destroy Egr-1 binding but do not affect the Sp1 site in the 13-bp overlapping Egr-1/Sp1 motif GCG-TGG-GCGGGGC result in a 15-fold enhancement in promoter activity. In vitro, Egr-1 and Sp1 protect overlapping segments of this complex 13-bp sequence. One hypothesis is that Egr-1 negatively regulates ADA transcription by competitively occupying the motif and displacing Sp1. Alternatively, Egr-1 may repress the ADA promoter by an active mechanism independent of Sp1, also consistent with results described above. Evidence for this proposal is that even in the absence of an Sp1 site, mutation of the Egr-1 motif results in higher promoter activity.
Future studies with varying ratios of Egr-1 and Sp1 expression vectors, as well as experiments addressing whether the Egr-1 DNA-binding domain is sufficient for the negative regulation, will be informative. The definition of a consensus binding site for Egr-1 has propelled investigations to identify the genes that Egr-1 binds and regulates. The tk gene represents a physiologically relevant target for Egr-1 in the context of cell growth. The induction of the tk gene subsequent to the Egr-1 serum response, the ability of Egr-1 to bind to a site in the tk 5' sequence, and transcriptional activation of the tk promoter by Egr-1 in transient transfections all support the idea that thymidine kinase is an important Egr-1 target. A second Egr-1 target may be the mitogen PDGF, because Egr-1 can bind to a site in the PDGF-A gene (100). Other potential Egr-1 target genes are clearly not relevant to cellular proliferation. The expression pattern of Egr-1 in the adult animal as well as its induction during terminal differentiation in some cell types suggest that Egr-1 plays a role in specialized cells that is distinct from its function during the G0 to G1 transition. In cardiac cells, the
endogenous α-myosin heavy chain gene or a transfected construct containing the α-MHC promoter is stimulated by Egr-1 (98). And in adrenal cells, Egr-1 can regulate the phenylethanolamine N-methyltransferase gene, supporting a role for Egr-1 in adrenergic differentiation (99). Additional targets of Egr-1 regulation in other differentiated cell types, for example, specific to osteoblasts or to macrophages, remain to be identified.
VI. In Vivo Role of Egr-1
The challenge remaining in current Egr-1 research is to relate correlative expression data and in vitro studies to a biological role for Egr-1. In a few instances, overexpression or antisense analyses have shown a phenotype for Egr-1. These studies have focused on differentiated cell types; despite the abundance of data showing Egr-1 induction by mitogenic signals, a role for Egr-1 in cell growth/division remains to be established. These phenotypic analyses are complicated by potential functional redundancy contributed by related members of the EGR family (see Section VII). With virtually identical DNA-binding domains, the expression of related family members may serve to mask a phenotype in Egr-1 loss-of-function experiments. A clear-cut biological role for Egr-1 has been demonstrated in three systems. As discussed above, antisense oligomers preventing Egr-1 expression in myeloid cells block macrophage differentiation. Further, constitutive Egr-1 expression restricts the potential of HL-60 cells, rendering them incapable of differentiation along the granulocyte lineage (32). A second phenotype for Egr-1 involves its role as a positive regulator of astrocyte proliferation as discussed earlier (23). A third system in which Egr-1 plays a causal role involves the hypertrophic growth of cardiac myocytes in response to endothelin-1. Egr-1, as a gene rapidly induced by endothelin, was proposed to mediate cardiac hypertrophy. It has been definitively shown that endothelin-1-induced hypertrophic growth in adult rat cardiomyocytes, as assayed by increased protein synthesis, is blocked by oligomers complementary to the Egr-1 message (101). Additional phenotypes for Egr-1 await experimentation in other systems. Perhaps the most exciting unanswered question is whether Egr-1 functions as a cellular proto-oncogene in a manner analogous to c-fos.
VII. Egr-1 Is Part of a Gene Family, Including the Wilms Tumor Suppressor Gene WT1
Egr-1 shares a highly conserved domain, encoding the three zinc-finger motifs, with several other immediate-early genes as well as genes that function in unrelated contexts. Zinc-finger proteins of the type first described for TFIIIA contain invariant residues, including conserved cysteines and histidines, but also include nonconserved residues that presumably dictate the specificity of binding. Egr-2/Krox20 (102, 103), Egr-3 (69), and Egr-4/NGFI-C/pAT133 (69, 104, 105) encode proteins with zinc-finger domains virtually identical to that of Egr-1. The Egr-1 zinc-finger domain is over 95% identical to that of Egr-2 and 91% identical to that of Egr-3 at the amino-acid level. Most of the changes are conservative substitutions, and residues important in determining the sequence specificity of binding are absolutely conserved (Fig. 5A). The homology extends to adjacent basic sequences but drops abruptly outside this region. Egr-2/Krox20 and Egr-3 are strikingly induced by growth factors whereas Egr-4/NGFI-C/pAT133 is more weakly inducible (105). The expression of Egr-2/Krox20, restricted to the nervous system during mouse embryogenesis, generates a segment-specific pattern in the developing hindbrain (69, 103, 106). Importantly, disruption of Egr-2/Krox20 by homologous recombination in the mouse results in postnatal death of the animal, with anatomical analysis showing severely reduced or absent rhombomeres 3 and 5 in the hindbrain (107). Finally, Egr-2/Krox20 brain expression is also transiently activated by electroconvulsive shock treatment, D1 dopamine receptor activation, and opiate withdrawal, in a pattern similar to that noted for Egr-1/Zif268 (41). The Wilms tumor suppressor gene WT1, implicated in the genesis of this pediatric kidney malignancy, has four zinc fingers, three of which are highly homologous (67% identical) to the Egr-1 zinc-finger domain (108, 109). The WT1 protein binds to the EGR consensus binding sequence GCG-GGG-GCG but with lower affinity.
Moreover, the first finger of WT1 and the presence of KTS (an alternatively spliced variant) between fingers 3 and 4 dictate other sequence requirements for DNA binding (55). The mammalian activator Sp1 also has three related zinc fingers, with finger 2 most similar to EGR fingers 1 and 3 (110). The EGR family of proteins is also distantly related to MIG1, a yeast protein that responds to glucose repression (111). As suggested by the homology of the zinc-finger motifs, the sequences recognized by the EGR proteins, WT1, and Sp1 are related. Interestingly, flanking (A + T)-rich sequences play critical roles in target site recognition by MIG1. These flanking sequence preferences may reflect local DNA binding (112).
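The "8 of 9 match" quoted earlier for the proximal PNMT site can be checked by counting position-by-position identities against the EGR consensus GCG-GGG-GCG. The sketch below does this for the candidate sites named in the text (tk, PNMT proximal, α-MHC). The function and dictionary names are hypothetical, and a match count is only a crude descriptor: it is not a measure of binding affinity, which the text notes can depend strongly on which subsite is altered.

```python
# EGR consensus site as given in the text (hyphens between triplets removed).
CONSENSUS = "GCGGGGGCG"

# Candidate sites quoted in the text; the labels are illustrative only.
SITES = {
    "tk": "CCGTGGGTG",             # CCG-TGG-GTG
    "PNMT proximal": "GCGGGGGGG",  # GCG-GGG-GGG
    "alpha-MHC": "GTGGGGGTG",      # GTG-GGG-GTG
}


def matches(site, consensus=CONSENSUS):
    """Count positions at which a site is identical to the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a == b for a, b in zip(site, consensus))


if __name__ == "__main__":
    for name, site in SITES.items():
        print(f"{name}: {matches(site)} of {len(CONSENSUS)} positions match")
```

Under this simple count the PNMT proximal site matches at 8 of 9 positions, in agreement with the text, whereas the tk and α-MHC sites match at fewer positions.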
VIII. Conclusion and Future Perspectives
The genomic response of a cell to changes in its extracellular environment includes the induction of immediate-early transcription factor genes.
The most extensively characterized members of this group are the fos, jun, and Egr family members. Their discovery has allowed delineation of the "proximal" events from cell surface to nucleus that induce them: definition of intracellular signaling pathways and the downstream promoter elements they target. More recent efforts have focused on events "distal" to transcription factor gene induction: characterization of the proteins involved, their interactions with each other, definition of the target DNA sequences to which they bind, structure-function analyses, negative regulation following induction, and other forms of cross-talk between these family members. Collectively, these investigations have enhanced our knowledge of signal transduction pathways, general mechanisms of transcriptional activation and repression, and protein-DNA interactions. The most critical questions for future analysis involve the further identification of phenotypes, either by ectopic overexpression or by "underexpression" using dominant negative, antisense, or homologous recombination methodologies. Unfortunately, many phenotypes may be masked by redundant pathways. Nevertheless, a search for such systems will be critical to provide the substrate by which to characterize suitable physiological target genes for immediate-early transcription factor action.
ACKNOWLEDGMENT
This work was partially supported by NIH Grant CA40046 to V. P. S.
REFERENCES
1. D. Denhardt, D. Edwards and C. Parfett, BBA 865, 83 (1986).
2. H. Herschman, TIBS 14, 455 (1989).
3. H. Herschman, ARB 60, 281 (1991).
4. L. Lau, Curr. Opin. Cell Biol. 2, 280 (1990).
5. V. P. Sukhatme, S. Kartha, F. Toback, R. Taub, R. Hoover and C.-H. Tsai-Morris, Oncogene Res. 1, 343 (1987).
6. V. Sukhatme, X. Cao, L. Chang, C.-H. Tsai-Morris, D. Stamenkovich, P. Ferreira, D. Cohen, S. Edwards, T. Shows, T. Curran, M. Le Beau and E. Adamson, Cell 53, 37 (1988).
7. S. Suggs, J. Katzowitz, C. Tsai-Morris and V. Sukhatme, NARes 18, 4283 (1990).
8. J. Milbrandt, Science 238, 797 (1987).
9. L. Lau and D. Nathans, PNAS 84, 1182 (1987).
10. R. Lim, B. Varnum and H. Herschman, Oncogene 1, 263 (1987).
11. D. Simmons, D. Levy, Y. Yannoni and R. Erikson, PNAS 86, 1178 (1989).
12. J. Wright, K. Gunter, H. Mitsuya, S. Irving, K. Kelly and U. Siebenlist, Science 248, 588 (1990).
13. P. Lemaire, O. Revelant, R. Bravo and P. Charnay, PNAS 85, 4691 (1988).
14. B. Christy, L. Lau and D. Nathans, PNAS 85, 7857 (1988).
15. S. Qureshi, C. Joseph, M.-H. Rim, A. Maroney and D. Foster, Oncogene 6, 995 (1991).
16. K. Alexandropoulos, S. A. Qureshi, M. Rim, V. P. Sukhatme and D. A. Foster, NARes 20, 2355 (1992).
17. V. Seyfert, V. Sukhatme and J. Monroe, MCBiol 9, 2083 (1989).
18. V. Seyfert, S. McMahon and W. Glenn, Science 250, 797 (1990).
19. H. D. Rupprecht, P. Dann, V. P. Sukhatme, R. B. Sterzel and D. L. Coleman, Am. J. Physiol. 263, F623 (1992).
20. B. Wollnik, C. Kubisch, A. Maass, H. Vetter and L. Neyses, BBRC 194, 642 (1993).
21. X. M. Cao, G. R. Guy, V. P. Sukhatme and Y. H. Tan, JBC 267, 1345 (1992).
22. F. Rastinejad and H. Blau, Cell 72, 903 (1993).
23. R.-M. Hu and E. R. Levin, J. Clin. Invest. 93, 1820 (1994).
24. J. Milbrandt, Neuron 1, 183 (1988).
25. M. Watson and J. Milbrandt, Development 110, 173 (1990).
26. A. McMahon, J. Champion, J. McMahon and V. Sukhatme, Development 108, 281 (1990).
27. S. Vainio, I. Karavanova, A. Jowett and I. Thesleff, Cell 75, 45 (1993).
28. T. Darland, S. Samuels, V. P. Sukhatme and E. Adamson, Oncogene 6, 1367 (1991).
29. L. Suva, M. Ernst and G. Rodan, MCBiol 11, 2503 (1991).
30. S. Kharbanda, E. Rubin, R. Datta, R. Hass, V. Sukhatme and D. Kufe, Cell Growth Differ. 4, 17 (1993).
31. S. Kharbanda, T. Nakamura, R. Stone, R. Hass, S. Bernstein, R. Datta, V. P. Sukhatme and D. Kufe, J. Clin. Invest. 88, 571 (1991).
32. H. Nguyen, B. Hoffman-Liebermann and D. Liebermann, Cell 72, 197 (1993).
33. M. Le Beau, R. Espinosa III and W. Neuman, PNAS 90, 5484 (1993).
34. J. V. Bonventre, V. P. Sukhatme, M. Bamberger, A. J. Ouellette and D. Brown, Cell Reg. 2, 251 (1991).
35. D. Hallahan, V. P. Sukhatme, M. Sherman, S. Virudachalam, D. Kufe and R. R. Weichselbaum, PNAS 88, 2156 (1991).
36. J. Morgan and T. Curran, Trends Neurosci. 12, 459 (1989).
37. D. Bartel, M. Sheng, L. Lau and M. Greenberg, Genes Dev. 3, 314 (1989).
38. K. Rosen, M. McCormack and L. Villa-Komaroff, PNAS 89, 5437 (1992).
39. P. Worley, B. Christy, Y. Nakabeppu, R. Bhat, A. Cole and J. Baraban, PNAS 88, 5106 (1991).
40. A. Cole, D. Saffen, J. Baraban and P. Worley, Nature NB 340, 474 (1989).
41. R. Bhat, P. Worley, A. Cole and J. Baraban, Brain Res. Mol. Brain Res. 13, 263 (1992).
42. T. Herdegen, K. Kovary, J. Leah and R. Bravo, J. Comp. Neurol. 313, 178 (1991).
43. G. Jamieson, R. Mayforth, M. Villereal and V. P. Sukhatme, J. Cell Physiol. 139, 262 (1989).
44. V. Seyfert, S. McMahon, W. Glenn, X. Cao, V. Sukhatme and J. Monroe, J. Immunol. 145, 3647 (1990).
45. S. A. Qureshi, M. Rim, J. Bruder, W. Kolch, U. Rapp, V. P. Sukhatme and D. A. Foster, JBC 266, 20594 (1991).
46. C.-H. Tsai-Morris, X. Cao and V. P. Sukhatme, NARes 16, 8835 (1988).
47. P. Changelian, P. Feng, T. King and J. Milbrandt, PNAS 86, 377 (1989).
48. R. Treisman, Cell 46, 567 (1986).
49. B. Christy and D. Nathans, MCBiol 9, 4889 (1989).
50. X. Cao, R. Koski, A. Gashler et al., MCBiol 10, 1931 (1990).
51. D. Gius, X. Cao, F. Rauscher, D. Cohen, T. Curran and V. Sukhatme, MCBiol 10, 4243 (1990).
52. S. Qureshi, X. Cao, V. Sukhatme and D. Foster, JBC 266, 10802 (1991).
53. R. Datta, E. Rubin, V. Sukhatme, S. A. Qureshi, R. Weichselbaum and D. W. Kufe, PNAS 89, 10149 (1992).
54. J. Miller, A. McLachlan and A. Klug, EMBO J. 4, 1609 (1985).
55. I. Drummond, H. Rupprecht, P. Rohwer-Nutter, J. Lopez-Guisa, S. Madden, F. Rauscher III and V. Sukhatme, MCBiol 14, 3800 (1994).
56. J. Lanfear, T. Jowett and P. Holland, BBRC 179, 1220 (1991).
57. M. Russo, C. Matheny and J. Milbrandt, MCBiol 13, 6858 (1993).
58. R. Richards and G. Sutherland, Cell 70, 709 (1992).
59. J. Corden, TIBS 15, 383 (1990).
60. M. Day, T. Fahrner, S. Aykent and J. Milbrandt, JBC 265, 15253 (1990).
61. C. Waters, D. Hancock and G. Evan, Oncogene 5, 669 (1990).
62. P. Lemaire, C. Vesque, J. Schmitt, H. Stunnenberg, R. Frank and P. Charnay, MCBiol 10, 3456 (1990).
63. J. Berg, Annu. Rev. Biophys. Biophys. Chem. 19, 405 (1990).
64. N. P. Pavletich and C. O. Pabo, Science 252, 809 (1991).
65. R. E. Klevit, Science 253, 1367 (1991).
66. J. Nardelli, T. Gibson, C. Vesque and P. Charnay, Nature NB 349, 175 (1991).
67. T. Wilson, M. Day, T. Pexton, K. Padgett, M. Johnston and J. Milbrandt, JBC 267, 3718 (1992).
68. L. Fairall, J. Schwabe, L. Chapman, J. Finch and D. Rhodes, Nature NB 366, 483 (1993).
69. S. Patwardhan, S. Gashler, M. Siegel, L. Chang, L. Joseph, T. Shows, M. Le Beau and V. Sukhatme, Oncogene 6, 917 (1991).
70. A. L. Gashler, S. Swaminathan and V. P. Sukhatme, MCBiol 13, 4556 (1993).
71. L. Theill, J.-L. Castillo, D. Wu and M. Karin, Nature NB 342, 945 (1989).
72. N. Zeleznik-Le, A. Gashler and V. Sukhatme, unpublished (1994).
73. J. Licht, M. Grossel, J. Figge and U. Hansen, Nature NB 346, 76 (1990).
74. P. Zuo, D. Stanojevic, J. Colgan, K. Han, M. Levine and J. Manley, Genes Dev. 5, 254 (1991).
75. K. Han and J. Manley, Genes Dev. 7, 491 (1993).
76. S. Madden, D. Cook and F. I. Rauscher, Oncogene 8, 1713 (1993).
77. N. Zeleznik-Le, V. Sukhatme, R. Drapkins and D. Reinberg, unpublished (1993).
78. N. Zeleznik-Le and V. Sukhatme, unpublished (1993).
79. S.-J. Chen, E. Klann, M. Cower, C. Powell, J. Sessoms and J. Sweatt, BJ 32, 1032 (1993).
80. S. Hahn, Curr. Biol. 2, 152 (1992).
81. C. Abate, D. Luk and T. Curran, MCBiol 11, 3624 (1991).
82. J. Garcia-Bustos, J. Heitman and M. Hall, BBA 1071, 83 (1991).
83. P. Silver, L. Keegan and M. Ptashne, PNAS 81, 5951 (1984).
84. I. Tratner and I. Verma, Oncogene 6, 2049 (1991).
85. A. Guichon-Mantel, P. Lescop, S. Christin-Maitre, H. Loosfelt, M. Perrot-Appanat and E. Milgrom, EMBO J. 10, 3851 (1991).
86. M. Hall, C. Craik and Y. Hiraoka, PNAS 87, 6954 (1990).
87. J. Robbins, S. Dilworth, R. Laskey and C. Dingwall, Cell 64, 615 (1991).
88. T. Moll, G. Tell, U. Surana, H. Robitsch and K. Nasmyth, Cell 66, 743 (1991).
89. J. Kleinschmidt and A. Seiter, EMBO J. 7, 1605 (1988).
90. S. Nath and D. Nayak, MCBiol 10, 4139 (1990).
91. N. Morin, C. Delsert and D. Klessig, MCBiol 9, 4372 (1989).
92. M. Hall, L. Hereford and I. Herskowitz, Cell 36, 1057 (1984).
93. H.-P. Rihs and R. Peters, EMBO J. 8, 1479 (1989).
94. P. Roux, J.-M. Blanchard, A. Fernandez, N. Lamb, P. Jeanteur and M. Piechaczyk, Cell 63, 341 (1990).
95. A. Crozat, G. Molnar and A. B. Pardee, MCBiol (1995). Submitted.
96. Z. Wang and T. Deuel, BBRC 188, 433 (1992).
97. B. Christy and D. Nathans, PNAS 86, 8737 (1989).
98. M. P. Gupta, G. Gupta, R. Zak and V. P. Sukhatme, JBC 266, 12813 (1991).
99. S. Ebert, S. Balt, J. Hunter, A. Gashler, V. Sukhatme and D. Wong, JBC 269, 20885 (1994).
100. Z. Wang, S. Madden, T. Deuel and F. Rauscher, JBC 267, 21999 (1992).
101. L. Neyses, J. Nouskas and H. Vetter, BBRC 181, 22 (1991).
102. P. Chavrier, M. Zerial, P. Lemaire, J. Almendral, R. Bravo and P. Charnay, EMBO J. 7, 29 (1988).
103. L. Joseph, M. Le Beau, G. Jamieson et al., PNAS 85, 7164 (1988).
104. S. Crosby, J. Puetz, K. Simburger, T. Fahrner and J. Milbrandt, MCBiol 11, 3835 (1991).
105. H.-J. Muller, C. Skerka, A. Bialonski and P. Zipfel, PNAS 88, 10079 (1991).
106. D. Wilkinson, S. Bhatt, P. Chavrier, R. Bravo and P. Charnay, Nature NB 337, 461 (1989).
107. S. Schneider-Maunoury, P. Topilko, T. Seitandou et al., Cell 75, 1199 (1993).
108. M. Gessler, A. Poustka, W. Cavenee, R. Neve, S. Orkin and G. Bruns, Nature NB 343, 774 (1990).
109. K. Call, T. Glaser, C. Ito, A. Buckler, J. Pelletier, D. A. Haber, E. A. Rose, A. Kral, H. Yeger, W. H. Lewis, C. Jones and D. E. Housman, Cell 60, 509 (1990).
110. J. Kadonaga, K. Carner, F. Masiarz and R. Tjian, Cell 51, 1079 (1987).
111. J. Nehlin and H. Ronne, EMBO J. 9, 2891 (1990).
112. M. Lundin, J. Nehlin and H. Ronne, MCBiol 14, 1979 (1994).
113. M. H. Little, J. Prosser, A. Condie, P. J. Smith, V. Van Heyningen and N. D. Hastie, PNAS 89, 4791 (1992).
114. H. Rupprecht, V. P. Sukhatme, J. Lacy, R. B. Sterzel and D. L. Coleman, Am. J. Physiol. 265, F351 (1993).
115. D. L. Coleman, A. H. Bartiss, V. P. Sukhatme, J. Liu and H. D. Rupprecht, J. Immunol. 149, 3045 (1992).
116. R. Datta, N. Taneja, V. P. Sukhatme, S. A. Qureshi, R. Weichselbaum and D. W. Kufe, PNAS 90, 2419 (1993).
117. L. Neyses, J. Nouskas, J. Luyken, S. Fronhoffs, S. Oberdorf, U. Pfeifer, R. S. Williams, V. P. Sukhatme and H. Vetter, J. Hypertens. 11, 927 (1993).
118. D. M. Cohen and S. R. Gullans, Am. J. Physiol. 264, F593 (1993).
119. B. A. Haber, K. L. Mohn, R. H. Diamond and R. Taub, J. Clin. Invest. 91, 1319 (1993).
120. A. Sachinidis, P. Weisser, Y. Ko, K. Schulte, K. Meyer zu Brickwedde, L. Neyses and H. Vetter, FEBS Lett. 313, 109 (1992).
121. J. W. Liu, J. Lacy, V. P. Sukhatme and B. L. Coleman, JBC 266, 5629 (1991).
Two New Collagen Subgroups: Membrane-associated Collagens and Types XV and XVIII
TAINA PIHLAJANIEMI AND MARKO REHN
Collagen Research Unit, Biocenter and Department of Medical Biochemistry, University of Oulu, FIN-90220 Oulu, Finland
I. The Collagen Superfamily
A. Fibrillar Collagens
B. Nonfibrillar Collagens
II. Membrane-associated Collagenous Proteins
A. Macrophage Scavenger Receptors
B. Complement Subcomponent C1q
C. Type XVII Collagen
D. Type XIII Collagen
E. Deliberations on Membrane-associated Collagenous Proteins
III. Collagen Types XV and XVIII
A. Structural Characteristics of the α1(XV) and α1(XVIII) Chains
B. Sequence Homologies between Collagen Types XV and XVIII
C. Genes Encoding Collagen Types XV and XVIII
D. Tissue Distribution of mRNAs for Collagen Types XV and XVIII
E. Variant Type XVIII Collagen Chains Are Homologous with Tissue Polarity Gene Products ("Frizzled" Proteins)
F. Deliberations on Collagen Types XV and XVIII
IV. Conclusions and Perspectives
References
The collagens comprise a large family of genetically distinct but structurally related proteins that are found in essentially all connective tissues of most multicellular organisms, being particularly abundant in cartilage, demineralized bone, ligaments, placenta, tendon, skin, and most blood vessels. A prominent function of collagens is to maintain the architecture of tissues and organs and to confer strength on them, but they are also involved in early development and organogenesis, cell attachment, chemotaxis, and filtration through basement membranes.
Progress in Nucleic Acid Research and Molecular Biology, Vol. 50. Copyright © 1995 by Academic Press, Inc. All rights of reproduction in any form reserved.
The collagens are defined as proteins that contain one or more characteristic triple-helical domains consisting of three polypeptides with repeated Gly-X-Y amino-acid triplets. The triple-helical conformation is made possible by the occurrence of glycine as every third amino-acid residue, because only this residue is small enough to occupy the restricted space at the center of the collagen helix. Furthermore, collagens are regarded as structural components of the extracellular matrix. The last decade has witnessed a marked extension of the list of collagen molecules, and molecular cloning techniques, in particular, have greatly facilitated identification of the members of the collagen family. At the DNA level, the repeated Gly-X-Y sequence allows for cross-hybridization between clones encoding different collagen types, and this has led to the isolation of several clones encoding previously unidentified collagens. The triple-helical domains of collagens are resistant to proteolytic digestion, which has likewise resulted in the identification of new collagens. This discussion focuses on two new subgroups among the large collagen gene family, the membrane-associated collagens, types XIII and XVII, and the subgroup formed by collagen types XV and XVIII. Variant transcripts of collagens are briefly reviewed, because this phenomenon is pertinent for members of both new subgroups. The categorization of all known members of the collagen family into subgroups is also outlined, but a more comprehensive overview of the structures, types, and biosynthesis of collagens and their mutations in connective tissue disorders can be found in several excellent reviews addressing one or more of these topics (1-12); therefore, the various collagens are discussed here only selectively, as they pertain to the specific topics of this essay.
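The defining Gly-X-Y repeat lends itself to a simple sequence scan: a candidate collagenous stretch is a run of consecutive triplets that each begin with glycine. The following minimal sketch, in which the function name `longest_gly_xy_run` and the toy sequence are invented for illustration, reports the longest such run in a one-letter amino-acid string. It tests only the glycine periodicity and makes no attempt to assess proline or hydroxyproline content, helix interruptions, or any other feature of real collagen chains.

```python
def longest_gly_xy_run(seq):
    """Return (start_index, n_triplets) for the longest run of Gly-X-Y triplets.

    A triplet counts if its first residue is 'G' and two more residues follow.
    Indices are 0-based; an empty result is reported as (0, 0).
    """
    best_start, best_len = 0, 0
    for i in range(len(seq)):
        n = 0
        j = i
        # Extend the run one full triplet at a time while glycine leads it.
        while j + 3 <= len(seq) and seq[j] == "G":
            n += 1
            j += 3
        if n > best_len:
            best_start, best_len = i, n
    return best_start, best_len


if __name__ == "__main__":
    # Toy sequence (not a real collagen): three Gly-X-Y triplets flanked by
    # short non-repetitive segments.
    toy = "MKT" + "GPPGAPGEK" + "LLS"
    start, n = longest_gly_xy_run(toy)
    print(f"longest run: {n} triplets starting at index {start}")
```

The scan is quadratic in the worst case, which is adequate for single protein sequences; a production tool would more likely use a single linear pass or a regular expression.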
I. The Collagen Superfamily
All collagen molecules consist of three polypeptide chains, called α chains, that wind around each other in a right-handed helix to form the characteristic collagen triple helix (1-12). In addition, all collagen molecules contain noncollagenous sequences at their termini, and several types also have them as interruptions separating adjacent triple-helical regions. A triple-helical region has a high proline content in the X position of the repeated Gly-X-Y triplet, and hydroxyproline in the Y position. These residues are required for the correct conformation of the helix, the hydroxyprolines also being essential for the thermal stability of the helix. Several other characteristic features are also involved in the remarkably complex posttranslational processing of collagens (1, 6, 7, 11, 12).
Nineteen distinct collagen types have been identified in vertebrates (1-11), and this list is expected to grow, although, somewhat surprisingly, no new collagens have been discovered during the past 2 years. The collagen types are numbered by Roman numerals in order of their discovery, and the α chains found in each collagen type are identified with Arabic numerals. Because some collagens consist of two or even more genetically distinct α chains, over 30 genes exist to code for the constituent chains. In addition, several other proteins contain triple-helical Gly-X-Y sequences but are not formally classified as collagens because they are not structural components of the extracellular matrix (5, 7, 10, 11). The rapid expansion of the list of proteins included in the collagen family has not allowed this criterion to be ascertained for some of the proteins classified as collagens; some proteins have been classified as collagens because they were discovered in a laboratory engaged in collagen research. Moreover, some of the new collagens are composed mostly of noncollagenous sequences, and thus differ strikingly from the classical collagens with large triple-helical domains. These matters have inevitably led to some inconsistencies in the nomenclature of collagens, and as the distinction between collagens and other proteins with collagenous sequences has become blurred, it has been proposed that all proteins with a collagenous triple helix should be considered members of the collagen superfamily regardless of their functions (5).
As has been outlined in detail (10), the complexity of collagens is evident at many levels, from the set of over 30 genes needed to code for the individual chains to the alternate promoters existing for some genes and the alternative splicing of primary transcripts required for others, both affecting the structures of the ensuing polypeptides. The tissue distribution is also heterogeneous; some collagens are found in most connective tissues whereas others have a more restricted distribution.
Complexity is observed at the protein level in the extent of the posttranslational modifications, and even the chain composition of the collagen molecules shows variability. Finally, the supramolecular aggregates of collagens are more complex than was previously appreciated, the stereotypical banded fibrils being only one of the several forms of aggregate, and the aggregates usually being composed of several collagen types arranged in a highly precise manner.
Although all collagens, by definition, contain repeated Gly-X-Y sequences, the types differ in their precise amino-acid sequence and the lengths of these domains (1-11). It can be inferred from the occurrence of the collagen sequence in various other proteins, such as acetylcholinesterase and the C1q subcomponent of the first component of the complement cascade, that this repeated rodlike configuration is useful in proteins outside the collagen family as well, perhaps for separating noncollagenous functional domains in some of these proteins. The noncollagenous parts of collagens consist of unique sequences, modular structures present in one or more of the other collagens, and motifs originally found in noncollagenous proteins (13). These noncollagenous domains comprise the bulk of the structure in some of the nonfibrillar collagens, yet in most cases their functions are poorly understood. The collagens can be divided into two major groups, fibrillar and nonfibrillar, in terms of their structural and functional characteristics, and the latter into several subgroups.
A. Fibrillar Collagens
Collagen types I-III, V, and XI are collectively known as the fibrillar collagens because they aggregate into fibrils, structures that are prominent in many collagen-containing tissues. These molecules contain continuous collagenous domains of about 1000 amino-acid residues, highly conserved C-terminal noncollagenous domains of about 250 residues, and variable N-terminal noncollagenous domains of about 50-520 residues; they are derived from a common ancestral gene (14, 15). Each of the genes encoding the structurally homologous fibrillar collagens contains over 50 exons, and a distinctive feature of the exons encoding the repeated Gly-X-Y sequences is that they are either 54 bp in size or derivatives of this. The fibrils are formed by a highly ordered quarter-staggered arrangement of molecules, and provide tensile strength for the tissues. A recent observation is that the fibrils are heterotypic in vivo, i.e., they consist of more than one collagen type (7, 10).
B. Nonfibrillar Collagens
Collagen types IV, VI-X, and XII-XIX do not form fibrils, and are hence known as the nonfibrillar collagens (1-11). These molecules display great heterogeneity in structure, tissue location, macromolecular organization, and function. One common feature is that they all have one or more interruptions in the collagenous sequence. Their collagenous sequences vary in length between about 330 and 1400 amino-acid residues, the shortest being found in type VI collagen molecules and the longest in type VII. Their C-terminal and N-terminal noncollagenous domains are highly variable both in sequence and length, the former varying from less than 20 residues to several hundred and the latter from 25 residues to more than 2700.
1. THE FACIT COLLAGENS
The largest subgroup among the nonfibrillar collagens is formed by the FACITs (fibril-associated collagens with interrupted triple helices), types IX, XII, and XIV (1-11). Each of these contains one or two collagenous domains that adhere to fibrils of the fibrillar collagens, one other collagenous domain, and a noncollagenous region that projects out of the fibril. It has been suggested that the latter region interacts with other matrix components, and that the FACITs serve in this way to connect fibrils to other matrix structures. Type IX collagen is expressed in cartilage, and covalent linkage to the major fibrillar collagen in cartilage, type II, has recently been demonstrated (16). Types XII and XIV are found in tissues containing type I collagen. Interestingly, types IX, XII, and XIV collagens contain glycosaminoglycan side chains (17-19). Two other collagens, types XVI (20, 21) and XIX (22, 23), contain sequences homologous with the FACITs. Limited information exists on the tissue distribution of type XVI collagen (24), but otherwise the tissue distribution of these two types and their associations with other matrix components are still to be determined.
2. THE TYPES VIII AND X SUBGROUP
Another subgroup is formed by the homologous types VIII and X collagens, also known as the short-chain collagens because their constituent polypeptide chains are only about 700 residues in size, or about half the size of the fibrillar collagen chains. Type X collagen molecules assemble into hexagonal lattices in the matrix surrounding hypertrophic chondrocytes in vivo, and a similar organization has been reported for type VIII collagen in vitro (7, 9, 10). The biological significance of the latter is not known, but the restricted expression of type X collagen by hypertrophic chondrocytes suggests a role for it during endochondral bone formation.
3. COLLAGEN TYPES IV, VI, AND VII
Type IV collagen molecules are major components of basement membranes. They aggregate to form the networklike structures essential for creating the sheetlike architecture of basement membranes, and they participate in tissue integrity and filtration. Type IV collagen forms a complex entity by itself and must be present in more than one type of molecule, because six genetically distinct α chains have been identified (11, 24). Type VI collagen is a heterotrimer consisting of three different α chains; it forms a subgroup of its own among the nonfibrillar collagens (1, 4-11), and is present in most connective tissues. The molecules aggregate to form supramolecular structures that have the appearance of beaded filaments. Type VII collagen is a homotrimer composed of unique chains, and its molecules are the major constituents of the anchoring fibrils that attach the epithelium to the underlying stroma (1, 4-11).
4. NEW COLLAGEN SUBGROUPS
Results obtained in the past one or two years lead us to suggest two new subgroups of collagenous molecules, which we discuss in detail in the following paragraphs. One of these comprises the membrane-associated collagens, types XIII and XVII, both of which have interrupted triple helices. Hence, they could be called MACITs (membrane-associated collagens with interrupted triple helices) by analogy with the previously coined term FACITs (see Section I,B,1). The other new subgroup consists of the homologous collagens XV and XVIII.
II. Membrane-associated Collagenous Proteins
The first integral membrane proteins found to contain collagenous sequences were the types I and II macrophage scavenger receptors (25, 26), which, as suggested by their broad polyanion-binding ability, participate in a variety of macrophage-associated functions, including host defense and inflammation; they appear to play a key part in the development of atherosclerosis (27-29). C1q, the collagen-like subcomponent of the first complement component, C1, is also a membrane protein thought to serve as an immunoglobulin receptor mediating macrophage adherence and phagocytosis (30, 31). Because they are receptors participating in host defense, rather than structural components of the extracellular matrix, these are not included in the list of collagens, but they are discussed in this paper along with the MACITs (see Section I,B,4).
The 180-kDa bullous pemphigoid antigen, BPAG2, is a hemidesmosomal protein recognized as an autoantigen in bullous pemphigoid and herpes gestationis (32-34). The cloning of this protein predicted an interrupted collagenous domain and a transmembrane segment (35-38), and the molecule was subsequently designated as type XVII collagen (38). Our interest in the membrane-associated collagenous proteins stems from the recent further characterization of type XIII collagen, suggesting the presence of a transmembrane segment (39). The macrophage scavenger receptors, C1q, and types XVII and XIII collagens are not homologous in sequence or in the general appearance of their molecules (Fig. 1), but they are grouped together in this discussion on the basis of their putative plasma membrane location.
[Figure 1: domain diagrams of type XIII collagen, type XVII collagen, the macrophage scavenger receptor type I, and the B chain of the C1q complex, each drawn from NH2 to COOH across the plasma membrane, with cytosolic and extracellular sides marked and a scale in amino acids.]
FIG. 1. Schematic comparison of the domain organization of membrane-associated polypeptides containing collagenous sequences. The domain structures are shown for the human α1(XIII) collagen chain, the human α1(XVII) collagen chain, the bovine macrophage scavenger receptor type I, and the mouse B chain of C1q (derived from 25, 36, 39, 57, and 70). The numbering of the noncollagenous and collagenous domains is shown above the corresponding polypeptide. Black boxes are noncollagenous domains; white boxes are collagenous domains; shaded boxes are transmembrane domains; striped boxes are coiled-coil domains; the plasma membrane is represented by the vertical dashed line. In the case of the mouse B chain of C1q, the signal sequence has been proposed to serve as a transmembrane domain in guinea pig (31). C, Cysteine residue; N, potential extracellular N-glycosylation site. The scale shown below the polypeptides is in amino acids originating from the C termini of the transmembrane domains, extending in both the cytosolic and extracellular directions.
A. Macrophage Scavenger Receptors
The bovine scavenger receptor is a trimeric protein with a molecular weight of 220 kDa composed of subunits of 77 kDa, with N-linked sugars accounting for 15-20 kDa of the M_r of the monomer (40, 41). cDNAs have been isolated from bovine, murine, human, and rabbit sources, and two types of receptor, types I and II, have been defined by cloning (25, 26, 42-45). The bovine type I receptor is a 453-residue polypeptide with six domains (Fig. 1): I, a 50-residue N-terminal cytoplasmic domain; II, a 26-residue single transmembrane domain; III, a 32-residue short spacer domain; IV, a 163-residue α-helical domain; V, a 72-residue collagenous domain; and VI, a 110-residue C-terminal cysteine-rich domain (25). The bovine type II receptor is a 349-residue truncated form, identical except that the cysteine-rich domain is replaced by a 6-residue C terminus (26). The two types of receptor derive from alternative splicing of transcripts from a single gene (42).
Based on the observation that functional scavenger receptors are trimeric structures (41) and on the nature of the amino-acid sequences in domains IV and V, it can be predicted that these domains fold into coiled-coil structures of two types. Domain IV contains 23 seven-amino-acid repeats that, in other proteins with similar heptad repeats, fold into right-handed amphipathic α-helices. Three of these helices wrap around each other to form a left-handed coiled-coil structure. Domain V consists of 24 uninterrupted Gly-X-Y triplet repeats in which 14 of the Y residues are prolines or lysines, residues often found in these positions in collagens. Although folding of domain V into a collagen helix has not been demonstrated, this conformation is extremely likely based on the sequence and in view of the observed hydroxylation of the proline residues of scavenger receptors expressed in Chinese hamster ovary cells (46).
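The domain lengths quoted above for the bovine scavenger receptor can be checked with simple arithmetic. The sketch below is only a bookkeeping aid with hypothetical variable names; the spacer is taken as 32 residues, an assumption chosen because it is the value at which the six domains sum to the stated 453 residues.

```python
# Domain lengths (residues) of the bovine type I receptor as quoted in the text.
# The 32-residue spacer is an assumption: it is the length that makes the
# six domains add up to the stated total of 453.
TYPE_I_DOMAINS = [
    ("I cytoplasmic", 50),
    ("II transmembrane", 26),
    ("III spacer", 32),
    ("IV alpha-helical coiled coil", 163),
    ("V collagenous", 72),
    ("VI cysteine-rich", 110),
]

type_i_length = sum(length for _, length in TYPE_I_DOMAINS)

# Type II: the 110-residue cysteine-rich domain is replaced by 6 residues.
type_ii_length = type_i_length - 110 + 6

# Domain V: 72 residues of Gly-X-Y triplets; domain IV: 23 heptad repeats.
collagen_triplets = 72 // 3
heptad_residues = 23 * 7

print(type_i_length)      # 453
print(type_ii_length)     # 349
print(collagen_triplets)  # 24
print(heptad_residues)    # 161 of the 163 residues in domain IV
```

The totals agree with the 453-residue type I and 349-residue type II polypeptides and with the 24 Gly-X-Y triplets described for domain V.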
In contrast to typical cell-surface receptors with both high ligand specificity and affinity, the types I and II macrophage scavenger receptors exhibit an unusually broad ligand specificity in conjunction with high affinity, one common feature of these ligands being their polyanionic nature. The polyanions they can bind include chemically modified proteins such as acetylated and oxidized low-density lipoprotein (LDL), polyribonucleotides, including poly(I) and poly(G), polysaccharides, acidic phospholipids, and certain other molecules such as asbestos, bacterial lipopolysaccharide, and glucose-modified type IV collagen (28, 29, 47). Interestingly, there are many polyanions that the receptors do not bind, notably unmodified LDL. Differences have been observed in the binding properties of receptors derived from different cells and species and with respect to type I versus type II receptors (28, 29). Thus multiple binding sites on a single receptor, multiple conformations of the known receptors with differences in their binding properties, and as yet unidentified receptors may influence the variable binding activities.
Identification of the C-terminal cysteine-rich domain of the type I receptor defined a previously uncharacterized conserved sequence motif found in various proteins (42), and this seemed initially an attractive candidate for ligand-binding activity. However, the surprising finding that the truncated type II receptor, which lacks this domain, mediates endocytosis of chemically modified LDL in a manner similar to the type I receptor excluded this possibility (26). The fact that all known ligands are anionic, together with the observation that the nucleotide-derived sequences predicted that all the Gly-X-Y triplets would be either neutral or positively charged at physiological pH, suggested that the collagenous domain may play a critical role in binding the anionic ligands. Subsequently, mutant receptors lacking most of the collagenous sequences were indeed found to be unable to bind modified LDL, while trimerization and plasma membrane location were unaffected (48). Furthermore, point mutations introduced into the most C-terminal 22 residues of the collagenous sequences indicate that this sequence contains certain lysine residues, conserved between several species, that are critical for the binding activity (49). Thus, the extreme C-terminal end of the collagenous domain contains a positively charged groove that specifically interacts with negatively charged ligands. Interestingly, the substitutions needed to abolish binding of acetylated LDL differed from those needed for oxidized LDL, directly demonstrating that different ligands are recognized in different ways. Additional data suggest that conserved negatively charged residues may play a role in differentiating between polyanions that bind and those that do not (45).
Scavenger receptor expression in Chinese hamster ovary cells leads to their conversion to foamlike cells that accumulate lipids when incubated with modified LDL (50). Furthermore, atherosclerotic plaques contain scavenger receptor mRNA and protein (43, 51). Many sets of results collectively suggest that macrophage scavenger receptors are involved in the transformation of arterial wall macrophages into lipid-laden foam cells, so that they have a key role in the deposition of lipoprotein cholesterol in the artery walls during the formation of atherosclerotic plaques (27-29). The broad ligand specificity of the scavenger receptors also implies involvement in macrophage-associated host defense activities such as pathogen clearance and participation in immune reactions (28, 29). Scavenger receptors identical to those present in macrophages are also expressed by cultured smooth muscle cells (44, 52), but none have been identified in endothelial cells, even though these also possess scavenging activity (29).
B. Complement Subcomponent C1q
Complement subcomponent C1q associates with the proenzymes C1r and C1s to yield the first component of the serum complement system, C1 (30). As a result of this activation, the globular heads of C1q are able to interact with the Fc regions of IgG and IgM antibodies present in immune complexes. C1q is a glycoprotein of about 460 kDa consisting of 18 polypeptide chains arranged in six trimeric subunits that form a hexameric oligomer structure. The subunits are heterotrimers composed of an A chain, a B chain, and a C chain, the structures of which have been elucidated by protein sequencing (53) as well as by cDNA and genomic cloning (54-57). The human A, B, and C chains are about 225 amino-acid residues long and have 22- to 28-residue signal peptides, collagenous regions of about 81 residues, and C-terminal noncollagenous sequences of about 136 residues (Fig. 1). The C-terminal globular domains of the C1q chains are homologous in sequence with the C-terminal noncollagenous domains of the type VIII and X collagen chains, indicating an evolutionary relationship among the three proteins (56, 58). Electron microscopy shows that the C1q hexamers resemble bouquets of tulips, the collagenous domains forming the stems, and the C-terminal globular domains, the bulbs (30). A recent analysis of ligand binding to the short collagenous domain of C1q showed it to have broad polyanion-binding specificity similar to that of the macrophage scavenger receptor (48).
We include C1q in the group of collagenous membrane proteins because it has recently been suggested that at least a portion of it may be a membrane protein in guinea pig macrophages (31). This is based on immunochemical analyses showing that it is located in the membrane throughout the biosynthetic pathway and is expressed on the cell surface, from which it is also secreted into the medium, the latter being accompanied by a change in the apparent molecular mass of the B chain. The only hydrophobic portions of C1q chains that could serve as membrane domains are the signal peptides. In view of both this and the difference in apparent molecular weight between the membrane-bound and secreted B chains, it has been proposed that at least the B chain of C1q is an integral membrane component that is not cleaved from the signal sequence after translocation of the rest of the molecule into the lumen of the endoplasmic reticulum (31).
In addition to scavenger receptors and C1q, the members of a group recently termed collectins have similarly short collagenous domains and are likewise thought to be important in immune defense (59, 60). The collectins are hybrid proteins containing a collagen-like domain and lectin domains characterized by carbohydrate-binding activity; they include three plasma proteins (mannan-binding protein, conglutinin, and collectin-43) and the lung surfactant apoproteins SP-A (with at least two forms; see Section II,E) and SP-D (59, 60). Although the collectins are not structurally homologous with C1q, some of them assemble into hexameric or tetrameric oligomers resembling those formed by the latter. It has also been suggested that the scavenger receptors associate to form similar oligomeric structures (28, 29). All in all, a whole set of collagenous proteins, both membrane-bound and soluble, participate in host defense, involving the clearance of debris from the extracellular matrix. In the light of studies with the scavenger receptor and C1q, it is predicted that the collagenous domains of the entire set of proteins contribute significantly to their host defense functions (28, 29).
C. Type XVII Collagen
Type XVII collagen is a major component of hemidesmosomes, plasma-membrane-associated organelles involved in the adherence of certain epithelia, notably stratified ones, to the underlying basement membrane (8, 9, 11). Type XVII collagen has been identified in the epidermis, but it may have a more generalized distribution in stratified squamous epithelia, because both mRNA and protein have been demonstrated in the cornea (61-63). Complete cDNA-derived primary structures have been determined for human (35-37) and mouse type XVII collagen (38), the deduced M_r of the α chain being 144 kDa. The human polypeptide contains 1497 residues with the following domains (see Fig. 1): I, a 466-residue N-terminal cytoplasmic domain (the major part of the NC1 domain); II, a 23-residue single transmembrane domain; III, a 15-residue spacer; IV, a 58-residue heptad repeat domain; V, a 920-residue primarily collagenous segment; and VI, a 15-residue C-terminal noncollagenous domain (NC16) (35-37). Computer predictions suggest that the major antigenic sites likely to be involved in eliciting an autoantigenic response are located downstream of the transmembrane domain (38). The orientation of the molecule across the membrane has been demonstrated by electron-microscope immunolocalization using both polyclonal and monoclonal antibodies reacting with epitopes that reside on either the N-terminal or the C-terminal side of the transmembrane domain, and by proteolytic cleavage of the extracellular domain, clearly demonstrating that the N-terminal end is cytoplasmic and the C-terminal end is extracellular (37, 64).
The large cytoplasmic domain, nearly 500 residues in size, is characterized by four closely located cysteines (Fig. 1), a region rich in glycine, and several putative sites for phosphorylation (37, 38). The extracellular portion contains eight heptad repeats that may form a coiled-coil trimer structure, as suggested for the scavenger receptor (see Section II,A). Immediately adjacent to this structure is a collagenous sequence of 920 residues distinguished by a large number of interruptions in the repeated Gly-X-Y sequences, with 36% of the residues residing in the interruptions. All in all, 15 collagenous domains can be detected in the human chain, varying from 15 to 242 residues, while the interruptions vary from 6 to 58 residues (COL1-15 and NC2-15 in Fig. 1; the domains are numbered from the N to the C terminus). In addition, several of the collagenous domains contain short imperfections of two or three residues in the repeat sequence. It should be noted that there are several differences between the human (37) and mouse (38) sequences, possibly due to species variations or the existence of alternative splicing of transcripts, resulting in differences in the number of collagenous and noncollagenous domains in the two species. Although this has not been experimentally demonstrated, the primary sequence analysis suggests that the major extracellular portion of type XVII collagen is formed by a collagen triple helix with multiple interruptions that provide flexibility in the molecules, whereas the C-terminal noncollagenous domain is very short (Fig. 1).
A partial exon-intron organization for the human gene encoding type XVII collagen has been elucidated. Characterization of the four extreme 5' exons of the gene, covering the 67 N-terminal residues, indicates that they vary from 48 to 295 bp, the first exon being the longest, and are contained within about 6.5 kbp of genomic DNA (38). The first exon corresponds solely to noncoding sequences, and contains two segments of T,,AA and TIA,, that may form a hairpinlike secondary structure, whereas exon 2 contains the translation-initiation codon. Furthermore, sequences spanning 12 kbp of human genomic DNA and corresponding to residues 489-850, namely, the end of the NC1 domain, the heptad-repeat domain, and the beginning of the collagenous sequences, have been characterized (62).
This coding segment consisted of 19 exons varying in size from 27 to 222 bp. The extreme 5' exon of these, the 222-bp exon, covered sequences immediately downstream of the transmembrane domain as far as the end of the last heptad repeat, thus fully encompassing one putative structural unit. All the exons encoding collagenous sequences without interruptions or imperfections were multiples of 9 bp, the predominant exon sizes in the collagenous segment being 36, 27, and 63 bp (in order of frequency of occurrence), but other sizes were also found, including 54 bp. The exon sizes are very similar to those observed for the types VI and XIII genes (65-68), with the notable difference that all the exons of the type XVII gene start with the second base of the codon for glycine instead of a complete codon. The exon-intron organization of the type XVII collagen gene is thus clearly different from that of the fibrillar collagen genes, which contain 54-bp exons or multiples of this.
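The exon-size rules quoted above reduce to modular arithmetic: one Gly-X-Y triplet is three codons, or 9 bp, and the fibrillar collagen unit of 54 bp corresponds to six triplets. The following sketch, with a hypothetical helper name `describe_exon`, classifies the exon sizes mentioned in the text on that basis; it is an illustration, not an analysis taken from the cited studies.

```python
TRIPLET_BP = 9          # one Gly-X-Y triplet = 3 codons = 9 bp
FIBRILLAR_UNIT_BP = 54  # exon unit of the fibrillar collagen genes (6 triplets)


def describe_exon(size_bp):
    """Summarize how an exon size relates to the 9-bp and 54-bp units."""
    whole = size_bp % TRIPLET_BP == 0
    return {
        "size_bp": size_bp,
        "whole_triplets": whole,
        # Number of Gly-X-Y triplets only if the size is a multiple of 9 bp.
        "triplets": size_bp // TRIPLET_BP if whole else None,
        "multiple_of_54": size_bp % FIBRILLAR_UNIT_BP == 0,
    }


# Sizes quoted for the type XVII gene: the predominant collagenous exons
# (36, 27, 63 bp), the 54-bp exon, and the 222-bp heptad-repeat exon.
for size in (36, 27, 63, 54, 222):
    print(describe_exon(size))
```

On these numbers the 36-, 27-, and 63-bp exons encode 4, 3, and 7 whole triplets, only the 54-bp exon fits the fibrillar pattern, and the 222-bp exon, which encodes noncollagenous sequence, is not a multiple of 9 bp.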
D. Type XIII Collagen
1. STRUCTURAL FEATURES OF THE α1(XIII) CHAIN
Type XIII collagen was fortuitously identified when screening a human cDNA library with a mouse type IV collagen clone for the purpose of finding human type IV collagen cDNAs. Instead, a positive clone was found that coded for a previously unidentified collagenous sequence (69). We subsequently reported the complete primary structure of the mainly collagenous polypeptide that was designated the α1 chain of human type XIII collagen (70) and recently isolated cDNAs encoding the full-length mouse counterpart (39). Surprisingly, the mouse clones extended further in the 5' direction than the human clones and encoded a longer N-terminal noncollagenous domain than we previously reported for the human sequences. The longer domain initiates at an upstream ATG that is in-frame with the ATG previously thought to represent the initiation codon of translation, and results in an N-terminal noncollagenous domain that is 81 residues longer and has a highly hydrophobic sequence (39). Comparison of human and mouse type XIII collagen amino acid sequences shows marked conservation between the two species. The previously reported human cDNA clones have not been shown to extend as far upstream as the mouse cDNAs, but the presence of the upstream ATG and the transmembrane domain can be predicted from the available human genomic sequences (68) as well as from the recent studies with human RNA demonstrating the presence of the upstream ATG also in human mRNAs (39).
It has been evident from the very beginning of the characterization of type XIII collagen that transcripts for this collagen undergo alternative splicing (see Section II,D,3). Based on overlapping cDNAs, the longest possible splice variant consists of 726 amino-acid residues and has a predicted M_r of about 70,900 (39, 69-71). The following domains can be discerned in the human α1(XIII) chain (see Fig. 1): I, a 40-residue cytosolic N-terminal noncollagenous domain; II, a 21-residue putative transmembrane domain; III, a 58-residue noncollagenous domain (I-III collectively form the 101-residue NC1 domain seen in Fig. 1); IV, a 104-residue collagenous domain (COL1); V, a 34-residue noncollagenous domain (NC2); VI, a central 172-residue collagenous domain (COL2); VII, a 22-residue noncollagenous domain (NC3); VIII, a 236-residue collagenous domain (COL3); and IX, an 18-residue C-terminal noncollagenous domain (NC4). However, the lengths of the COL1, COL3, NC2, and NC4 domains may vary considerably (see Section II,D,3). In comparison to other collagens, type XIII markedly resembles type IX in its overall arrangement, having three collagenous domains separated by short noncollagenous segments, and in the length of its polypeptide, but the two do not show homology at the sequence level.
Very few data exist on the type XIII collagen protein, on account of a combination of a low level of expression and difficulties in obtaining antibodies. Antipeptide antibodies against the COL3/NC4 junction recognize collagenous bands of about 67 and 54 kDa in Western blot analysis of cultured HT-1080 human fibrosarcoma cells (69). These sizes are in good agreement with the predicted M_r for the polypeptide, and the presence of more than one band is to be expected in light of the alternative splicing of the primary transcripts. The available antibodies have not been suitable for immunohistochemical localization of type XIII collagen molecules, and thus the putative transmembrane location remains to be demonstrated. However, the hydrophobic region of the NC1 domain (39) clearly fulfills the criteria for a transmembrane domain when analyzed by the parameters of Engelman et al. (72). Neither the upstream ATG nor the downstream one is followed by a classical signal peptide sequence, the lack of such a sequence being typical of those transmembrane proteins that have an intracellular N-terminal portion and an extracellular C-terminal portion, also known as class II transmembrane proteins (73). Taken together, these findings suggest that type XIII collagen may be located on the plasma membrane, with the short N-terminal portion cytosolic and the long collagenous portion extracellular. Preliminary immunofluorescence studies using recently obtained antipeptide antibodies specific for type XIII collagen indeed support the postulated membrane location in cultured cells (P. Hagg, S. Peltonen, T. Vaisanen and T. Pihlajaniemi, unpublished results).
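The kind of hydrophobicity analysis used to predict a transmembrane segment can be illustrated with a sliding-window scan. The sketch below does not use the Engelman et al. parameters cited in the text; as a stated substitution it uses the widely tabulated Kyte-Doolittle hydropathy values, with a 19-residue window and a cutoff of 1.6 as common rule-of-thumb assumptions. The function names and the test sequence are hypothetical and do not represent the type XIII sequence.

```python
# Kyte-Doolittle hydropathy values (substituted here for the Engelman et al.
# parameters cited in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KD[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Start indices of windows hydrophobic enough to span a membrane.

    The window length and threshold are rule-of-thumb assumptions.
    """
    profile = hydropathy_profile(seq, window)
    return [i for i, h in enumerate(profile) if h >= threshold]


# Invented sequence: a charged N terminus, a 21-residue hydrophobic stretch,
# and a Gly-X-Y-like tail.
example = "K" * 10 + "L" * 21 + "GPK" * 5
hits = candidate_tm_windows(example)
print(hits[0], hits[-1])
```

A cluster of consecutive qualifying windows, as in the invented example, is what would be read as a single putative membrane-spanning segment; a collagenous Gly-X-Y region scores well below the cutoff.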
2. THE GENEENCODING RPEXI11 COLLAGEN The human gene encoding type XI11 collagen is at least 130 kb in size, but two gaps between the otherwise overlapping genomic clones mean that its exact size is not known (67, 68). This large collagen gene encodes mRNAs of 2.5-2.8 kb (70), and thus the coding sequences represent only 1.8% of the genomic sequences. The combined results of the analyses of genomic clones (67, 68) and cDNAs (71) allow us to estimate that the gene contains at least 41 exons. The exon sizes deduced from the genomic clones vary from 24 to 153 bp for the coding regions, and indirect evidence from the cDNAs suggests that the third exon is only 8 bp in size, being the smallest known among the various collagen genes (68, 71). The exons coding for collagenous domains are, in most cases, multiples of 9 bp in length, eight being of 27 bp, five of 54 bp, three of 45 bp, two of 42 bp, and the rest of unique sizes varying between 8 and 153 bp. All the exons except one begin with a complete codon for an amino acid, the extreme 3’ exon being unusual in that it begins with the stop
codon for translation and encodes solely 3' untranslated sequences, while the extreme 5' exon encodes most of the NC1 domain. The transcription initiation site mapped by nuclease S1 analysis and primer extension using human HT-1080 RNA is preceded by a TATA-box-like sequence that is repeated four times and contains CCAAT boxes at -10 and -180 (68). However, the mouse cDNAs extend about 400 bases upstream of the presumed transcription initiation site, and the upstream ATG (see Section II,D,1) is located at +9 with respect to the mapped initiation of transcription. Thus the data on the mouse clones suggest that the 5' untranslated region is markedly longer than was previously assumed, and that the promoter may be located further upstream.
3. ALTERNATIVE SPLICING OF TYPE XIII COLLAGEN TRANSCRIPTS AND VARIANT TRANSCRIPTS FOR OTHER COLLAGEN TYPES
a. Variant Collagen Transcripts. Type XIII was the first collagen shown to be modified by variations in the sequences of its RNA transcripts (69), but subsequent results indicate that variant collagen transcripts are the rule rather than the exception, because such variations have now been reported for 14 α chains in 11 collagen types, as outlined in Table I. The mode of generation of the alternative collagen transcripts varies from alternative promoters to the use of alternative splice acceptor or donor sites or simple exon skipping (for references, see Table I). However, the significance of these modifications is not fully understood at present. Utilization of the alternative, cartilage-specific transcription start site in the gene encoding the α2 chain of type I collagen results in synthesis of a cartilage form of the mRNA that is not translated into a collagen chain but is thought to direct the synthesis of a short, noncollagenous protein that may control the expression of α2(I) chains (74). The alternative transcript appears early in embryogenesis in tissues derived from the neuroectoderm, but at later stages of chick development it disappears from the neural tissues and is found almost exclusively in the hyaline cartilage (96). An alternative transcript of the α1(III) gene in which exons 1-23 are replaced by the initiation of transcription at intron 23 results in two open reading frames that are out of phase with the collagen reading frame, and a third that encodes the C-terminal two-thirds of the collagenous sequences (76). These results obtained with the genes encoding the types I and III collagen chains suggest that some collagen genes may have additional functions independent of their roles in collagen synthesis. The two transcription start sites found for the gene encoding the α1 chain of type IX collagen are used in a tissue-specific manner,
TABLE I
VARIATIONS OCCURRING IN TRANSCRIPTS OF COLLAGEN α CHAINS DUE TO ALTERNATIVE SPLICING OR ALTERNATIVE PROMOTERS

Type | Mode of variation | Effect on gene product | Ref.
Type I collagen, α2 chain | Cartilage-specific promoter in intron 2 | Production of a noncollagenous protein | 74
Type II collagen, α1 chain | Alternatively spliced exon 2 | Encodes a cysteine-rich domain of the N-propeptide | 75
Type III collagen, α1 chain | Alternative transcription initiation site in intron 23 | Production of noncollagenous proteins or a truncated collagen chain | 76
Type IV collagen, α3 chain | Alternatively spliced exons 4 and 2 in the 3' end of the gene | Altered lengths of C-terminal NC1 domains | 77
Type IV collagen, α5 chain | Alternatively spliced exon 2 in the 3' end of the gene | Premature termination of the NC1 domain | 78
Type VI collagen, α2 chain | Alternative splicing by mutually exclusive utilization of the last two exons in conjunction with the usage of an internal acceptor site in the penultimate exon | Three variant chains with altered lengths of the C-terminal globular domains | 79
Type VI collagen, α2 chain | Alternative promoter with three alternative splice donor sites 650 bp downstream from exon 1 | Production of transcripts with four variant 5' untranslated sequences | 80
Type VI collagen, α3 chain | Alternative splicing of the first, second, fourth, and eighth exons encoding the N-terminal globular domain | Altered lengths of the large N-terminal globular domain | 81-83
Type IX collagen, α1 chain | Cornea-specific promoter in intron 6 | Production of a short N-terminal NC4 domain | 84
Type XI collagen, α1 chain | Three alternatively spliced exons in the 5' end of the gene | Variant lengths of the N-propeptide | 85, 86
Type XI collagen, α2 chain | Three alternatively spliced exons in the 5' end of the gene | Variant lengths of the N-propeptide | 86, 87
Type XII collagen, α1 chain | Alternatively spliced exons in the 5' end of the gene | Absence of eight fibronectin type III repeats and three von Willebrand factor A repeats in the N-terminal NC3 domain results in a short form of type XII collagen lacking a glycosaminoglycan side chain | 88
Type XIII collagen, α1 chain | Ten alternatively spliced exons: exons 3B, 4A, 4B, 5, 12, 13, 29, 30, 33, and 37 in the human gene | Altered lengths of both noncollagenous and collagenous sequences | 68, 70, 71, 89, 90, 91
Type XIV collagen, α1 chain | Alternatively spliced exons in the 5' end of the gene | Absence of one fibronectin type III repeat in the N-terminal NC3 domain | 92, 93
Type XIV collagen, α1 chain | Lack of sequences corresponding to the 3' end of the gene | Production of a truncated form of the type XIV collagen lacking the collagenous portion of the polypeptide, i.e., production of the undulin 1 protein | 94
Type XIV collagen, α1 chain | Alternatively spliced region in the 3' end of the gene | Presence of 31 additional amino-acid residues in the C-terminal NC1 domain | 93
Type XVIII collagen, α1 chain | Alternative promoters | Short and long N-terminal NC1 domain | 95
Type XVIII collagen, α1 chain | Alternatively spliced region in the 5' end of the gene | Production of a cysteine-rich sequence, termed fz | 95
resulting in the synthesis of α1(IX) chains possessing either long or short N-terminal noncollagenous domains that may affect the abilities of the variant molecules to interact with other matrix components (84). In most cases,
variant transcripts result in the synthesis of collagen chains that have either N- or C-terminal noncollagenous domains with altered lengths (Table I).
b. Alternative Splicing of Type XIII Transcripts in Human and Mouse Tissues. The alternative splicing of type XIII collagen transcripts is surprisingly complex because sequences corresponding to 10 exons of the human gene are alternatively spliced (68, 70, 71, 89-91). This affects not only noncollagenous sequences but, uniquely, also collagenous sequences, namely, the noncollagenous domains NC2 and NC4 and the collagenous domains COL1 and COL3 (Fig. 2). In the case of the COL1 domain, the four successive alternative exons, 3B, 4A, 4B, and 5, encode half of the domain. We have found 6 of the 16 theoretically possible combinations of exons 3B-5 when analyzing human RNAs by reverse transcriptase polymerase chain reaction (PCR) (Fig. 2B), leading to the conclusion that the length of the COL1 domain may vary between 57 and 104 amino-acid residues (89). Most of the NC2 domain is encoded by the alternatively spliced exons 12 and 13, and mRNAs exist that contain either exon 12 or 13 or lack both exons (Fig. 2B). Thus, the predicted length of this domain is either 12, 31, or 34 residues (89). The structure of the COL3 domain is altered due to alternative splicing of sequences corresponding to exons 29, 30, 33, and 37, the latter exon encoding both the end of the collagenous sequences and the beginning of the NC4 domain (90, 91). Twelve of the 16 potential variants of these four exons have been identified (P. Hagg and T. Pihlajaniemi, unpublished observations) and the length of the COL3 domain may vary between 175 and 234 residues, whereas the NC4 domain is either 18 or only 7 residues long. Thus alternative splicing affects the outer thirds of the α1(XIII) chains, leaving the central collagenous domain and the NC1 and the NC3 domains constant. We have used reverse transcriptase-PCR and RNase mapping to analyze the occurrence of the various splice forms of type XIII collagen in human tissues and cultured cells.
With respect to the COL1 domain, most samples contained notable amounts of four different mRNAs encoding either two long COL1 variants of 95 and 104 residues or two short ones of 57 and 66 residues (89). In particular, the extent of inclusion or exclusion of the NC2-coding exons 12 and 13 varied according to the type of tissue or cell analyzed. Bone, cartilage, and colon adenocarcinoma samples contained little or none of the mRNA corresponding to the long NC2 variants, whereas these mRNAs were the major variants in fibroblasts, lung, muscle, placental villi, and osteosarcoma cells (89, 91). The expression of COL3 variants has been analyzed only in placental tissue, where the major mRNA variants both in villi and in decidual samples lacked exon 29 sequences, and either lacked or contained exon 33 sequences, whereas no splicing out of exon 30 and 37
[Figure 2 graphic not reproduced. Panel A: domain map of the human α1(XIII) chain with the alternatively spliced exons 3B, 4A, 4B, 5, 12, 13, 29, 30, 33, and 37 marked; the variable domain lengths shown are COL1, 57-104; NC2, 12-34; COL3, 175-234; and NC4, 7-18 amino acids. Panel B: variant exon combinations with the resulting COL1, NC2, COL3, and NC4 lengths in amino acids.]
FIG. 2. Different human type XIII collagen variants identified in various sources. (A) The structure of the human α1(XIII) collagen chain indicating the lengths of the various domains in amino acids and the exons encoding the polypeptide (39, 70, 71). Exons, or parts of them, encoding the collagenous sequences are shown as white boxes and those encoding the noncollagenous sequences are shown as black boxes. Arrowheads denote the alternatively spliced exons, which are also numbered beginning from the most 5' exon of the gene. (B) Variant exon combinations encoding the COL1, NC2, COL3, and NC4 domains of the human type XIII collagen chains (70, 71, 91). It should be noted that the different splice combinations encoding the COL3 and NC4 domains have not yet been extensively characterized. Because the domains have been studied separately, the domain combinations with respect to each other are not shown.
sequences was detected (91). All in all, these findings indicate that tissue-specific differences occur in the expression of variant NC2 sequences, and to a lesser extent variant COL1 and COL3 sequences. The alternative splicing does not remove sequences with known potential biological functions. Furthermore, “nonpermitted” combinations of the alternative exons do not appear to exist. Characterization of the proportions of full-length variant mRNAs with alternative sequences corresponding to 10 exons is difficult due to the complexity of the alternative splicing. We have therefore attempted to estimate the amounts of variant mRNAs by simultaneously analyzing the relative occurrence of only some of the variant exons. Thus, we analyzed the combinatorial splicing pattern covering the COL1 and NC2 domains of α1(XIII) chains, a region that is affected by alternative splicing of six exons that theoretically can result in 64 variants (71). Twelve combinations of the alternatively spliced exons 3B, 4A, 4B, 5, 12, and 13 were found, yielding clear differences in the relative occurrence of the variants between the cells and tissues analyzed. Furthermore, these experiments suggest that each cell synthesizes only a few type XIII collagen mRNA variants in major amounts. The availability of mouse type XIII collagen clones has made it possible to analyze the alternative splicing of type XIII collagen transcripts in mouse tissues in order to see if the splicing is conserved between the human and mouse species (97). These results indicate that several exons are also alternatively spliced in the mouse, and that splicing affects the structures of the same domains as previously observed in the human type XIII collagen chains.
The same NC2-coding exons are variable in the mouse as in the human collagen, and the expression of variant NC2 exons similarly shows marked tissue differences, but surprisingly, only some of the alternative exons identified in the COL1 and COL3 domains in the various mouse samples were the same as in the human collagen. It thus appears that, with respect to the collagenous domains, it is not always critical which exons are alternatively spliced, as long as the net effect is that the lengths of the collagenous domains can be altered.
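The counts of "theoretically possible" variants quoted above (16 for exons 3B-5 and 64 for the six exons covering COL1 and NC2) follow from treating each alternatively spliced exon as independently included or skipped, which gives 2^n patterns for n exons. The following is a minimal sketch of that counting only; the function name and the independence assumption are illustrative and are not taken from the cited studies.

```python
from itertools import product

def splice_combinations(alternative_exons):
    """Enumerate every include/skip pattern for a list of alternative exons.

    Assumption (illustrative): each exon is included or skipped
    independently of the others, so n exons give 2**n patterns.
    Each pattern is returned as a tuple of the exons that are retained.
    """
    patterns = []
    for mask in product((False, True), repeat=len(alternative_exons)):
        retained = tuple(e for e, keep in zip(alternative_exons, mask) if keep)
        patterns.append(retained)
    return patterns

# Exon names as given in the text for the human gene.
col1_exons = ["3B", "4A", "4B", "5"]
col1_nc2_exons = col1_exons + ["12", "13"]

col1_variants = splice_combinations(col1_exons)
col1_nc2_variants = splice_combinations(col1_nc2_exons)

print(len(col1_variants))      # 16 possible patterns for exons 3B-5
print(len(col1_nc2_variants))  # 64 possible patterns for the six exons

# Fractions of the possible patterns reported as observed in the text.
print(6 / len(col1_variants))       # 6 of 16 for COL1
print(12 / len(col1_nc2_variants))  # 12 of 64 for COL1 plus NC2
```

The enumeration makes no statement about which patterns actually occur; it only shows that the observed variants are a small subset of the combinatorial space.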
c. Possible Significance of Variant Type XIII Collagen Chains. Because we do not know the function of type XIII collagen, we can only speculate about the significance of the complex alternative splicing of its transcripts. Particularly intriguing is the fact that polypeptides with collagenous domains of different lengths are synthesized in the same cell, as this immediately raises the question of the effect of such variation on chain assembly. It is not known whether polypeptides with COL1 and COL3 domains of different length occur in the same trimeric molecule, but if such associations do occur,
the different collagenous domains will not be in phase throughout their lengths, and thus the “excess” Gly-X-Y triplets of the longer chain(s) must participate in forming an extended noncollagenous domain. The lengths of the collagenous domains of such heterotrimeric molecules would be dictated by the chain with the shortest collagenous domains, and therefore the need for altering the lengths of the type XIII collagen domains remains obscure. Recent studies with recombinant type XII collagen molecules suggest that the extreme C-terminal collagenous domain, and not only the C-terminal noncollagenous domains, as is the case with the fibrillar collagens, is important for the recognition and mutual alignment of the three α chains in trimer formation (98). This brings up another possibility regarding the association of type XIII collagen chains with collagenous domains of variable lengths, namely, that the collagenous domains of the polypeptides would determine the association of chains in such a way that only chains with collagenous domains of identical length would associate to form trimers. This would result in type XIII collagen molecules that differed in the lengths of their collagenous domains. In general, the triple-helical domains of collagens are rodlike structures in which the side chains of each X and Y position residue are at the surface of the helix, giving the collagen molecules a substantial capacity for lateral interactions with other molecules of the extracellular matrix and resulting in formation of various supramolecular assemblies (1, 4-11). Because the presumably extracellular domain of the membrane-bound type XIII collagen is composed almost entirely of collagenous sequences, the collagenous domains can be expected to be important in the functions that this collagen performs, probably including interactions with extracellular components.
Critical residues of the collagenous domains may possess the binding specificity of the type XIII collagen molecule, as in the case of the macrophage scavenger receptor (48, 49), allowing the alternative splicing to modulate the structures of the binding sites. The alternative splicing could thus alter the properties of the type XIII collagen molecules with respect to interactions based on collagenous sequences. On the other hand, alternative splicing of collagenous sequences may modulate type XIII collagen functions in another way. If the putative binding properties of type XIII collagen molecules are based on their noncollagenous domains, the collagenous domains may adjust the functional noncollagenous domains to the different distances needed for the binding properties at various locations. Here it does not matter which of the exons are alternatively spliced as long as the distance between the noncollagenous domains can be altered. Both mechanisms may be operating in the type XIII collagen molecules, explaining why only part of the alternatively spliced exons are
conserved between the human and mouse type XIII collagen genes (97). As far as the NC2 domain is concerned, it is quite possible that the shortest variant may lack some functional property that a long form possesses.
4. TISSUE DISTRIBUTION OF α1(XIII) mRNAs
Northern and in situ hybridization experiments with human fetal tissues and human placentas have revealed a wide tissue distribution of type XIII collagen mRNAs; low amounts of these mRNAs could be found in all tissues examined (91, 99). In situ hybridization experiments suggest that type XIII mRNAs are expressed in the epidermis, hair follicles, and nail root cells of the skin, the endomysium separating the muscle fibers in striated skeletal muscle, and the mucosal layer of the colon and small intestine. Furthermore, expression of the mRNA is detected in developing bone in both cartilaginous and osseous areas. A clear hybridization signal is observed in the proliferating and hypertrophic chondrocytes near the perichondrium or articular surfaces, whereas the type XIII collagen mRNAs in osseous regions are located in the mesenchymal cells forming the reticulin fibers between bone spicules. In the placenta, type XIII collagen mRNAs were observed in the fibroblastoid stromal cells of the villi, the endothelial cells of the developing capillaries, and the cells of the cytotrophoblastic columns. These cells also synthesize mRNAs for type IV collagen and laminin (100). The cells of the double-layered trophoblastic epithelium are negative for type XIII collagen expression, but synthesize marked amounts of mRNAs for laminin and type IV collagen, essential components of the basement membrane underlying the epithelium. This may indicate that type XIII collagen is not a constituent that always occurs in conjunction with basement membranes.
Furthermore, the large decidual cells of the decidual membrane and the stromal cells of the gestational endometrium express type XIII collagen mRNAs, whereas the epithelial cells of the endometrial glands are devoid of these mRNAs.
E. Deliberations on Membrane-associated Collagenous Proteins
The macrophage scavenger receptor, C1q, and type XVII collagen all reside on the plasma membrane, so that their collagenous domains are in the extracellular space. Assuming that type XIII collagen also has this orientation, each of the four proteins will have an N-terminal intracellular portion and a C-terminal collagenous portion. This “type II orientation” is rare for plasma membrane proteins, occurring in only 5% of them (73). Thus the type II orientation of collagenous proteins may be more than coincidental, particularly with respect to formation of the collagenous triple helix.
In the case of fibrillar collagens the correct three polypeptides associate via the C-terminal noncollagenous domains after the polypeptides have been released into the lumen of the endoplasmic reticulum. After that, triple-helix formation initiates from the C-terminal ends of the associated polypeptides and proceeds in a zipperlike fashion toward the N terminus (101-103). Polypeptides with transmembrane domains are likely to remain inserted in the membrane throughout the biosynthetic phases, their collagenous domains extending into the lumen and being subject to the various posttranslational modifications typical of collagens. Because the N termini of the polypeptides are anchored, triple-helix formation proceeding from the C terminus to the N terminus is not as readily conceivable as it is for the nonmembrane collagens. Thus the macrophage scavenger receptors, C1q, and collagen types XIII and XVII may differ strikingly from the known mode of triple-helix formation, in that association of the correct three chains occurs while the newly synthesized polypeptides are being inserted in the rough endoplasmic reticulum membrane and the folding into a triple-helical conformation proceeds in the opposite orientation. A function is known only for the macrophage scavenger receptors and C1q; in the case of types XIII and XVII collagens it can merely be postulated that their physiological role may be to interact with various extracellular matrix proteins, such as other collagens, proteoglycans, integrins, etc. As type XIII collagen is found ubiquitously in the matrix, it may have an essential role in all connective tissues. Due to the shortness of the cytosolic domain, it probably does not have any enzymatic activity and thus is not likely to be directly involved in transmembrane signaling.
One factor of importance may be the highly charged nature of the most obviously protruding collagenous domain, COL3, inasmuch as the extreme C terminus of this domain contains 10 Gly-X-Y triplets with charged residues, the net charge being positive. Furthermore, the 100 extreme C-terminal residues are completely conserved between the human and mouse forms (39). It is thus possible that this region functions in interactions with anionic components of the matrix. The restricted location of type XVII collagen in squamous epithelia, particularly in the epidermal region, suggests that it interacts with the basement membrane components of the epidermal zone. Furthermore, the blisters occurring in the skin of patients with autoantibodies against type XVII collagen point to a possible anchoring function. The genes encoding the various collagen chains are mostly dispersed in the genome, but in a few cases two genes have been mapped to the same region (7, 8). The human gene encoding the scavenger receptors is located on 8p22 (104) and those encoding the three chains of C1q are adjacent to each other on 1p34.1-36.3 (56, 105). Correspondingly, those encoding type XIII collagen (106) and type XVII collagen (62) are on chromosome 10, in
regions q22 and q24.3, respectively. The long arm of chromosome 10 contains the genes for the surfactant-associated proteins SP-A1 and SP-D at q22-23 (107), a second SP-A gene, SP-AII (108), an SP-A pseudogene (109), and the gene for the mannose-binding protein at q21 (107). Interestingly, the gene coding for the α subunit of prolyl 4-hydroxylase, the critical posttranslational enzyme of collagen synthesis, maps at 10q21.3-23.1 (110). Thus there exists an unusual clustering of genes encoding several collagenous proteins and the catalytically active subunit of the key enzyme of collagen biosynthesis. This clustering and the notion that membrane collagens serving to attach the cell to its surroundings may be needed in even the simplest living forms make it tempting to suggest that collagen types XIII and XVII represent ancient evolutionary forms of collagenous proteins.
III. Collagen Types XV and XVIII
A. Structural Characteristics of the α1(XV) and α1(XVIII) Chains
The second new subgroup of collagens discussed here is formed by the recently identified types XV and XVIII, which share homology at the amino acid level (111-115). Both collagens are characterized by extensive interruptions in their collagenous sequences, the collagenous domain of the α1(XV) chain having eight interruptions containing 33% of the collagenous domain residues, and the α1(XVIII) chain nine interruptions hosting 19% of the collagenous residues. Two other collagen chains contain multiple interruptions in their collagenous sequences, namely, the α1(XVI) chain (20, 21) with nine interruptions containing 15% of the collagenous portion residues, and the α1(XVII) chain (see Section II,C,1), with 14 interruptions hosting 36% of the residues. Types XVI and XVII collagens do not show sequence homology with the α1(XV)/α1(XVIII) pair, however, except for a thrombospondin motif present in all but type XVII collagen. The first cDNA for type XV collagen was identified by screening a human placenta library at low stringency with a DNA fragment encoding an unidentified avian fibrillar collagen (116). Subsequently, others have independently identified clones for the same collagenous polypeptide (112, 113). Screening of a cDNA library with a mouse type XIII collagen clone for the purpose of finding full-length mouse clones again resulted in the localization of a clone encoding a previously unidentified collagenous chain (114). The polypeptide was identified at the same time by low-stringency screening of cDNA libraries in two other laboratories (113, 117), and was promptly named the α1 chain of type XVIII collagen (113, 114). The finding of type XVIII collagen
was of particular interest to us because it proved to be homologous with type XV collagen, which was already under investigation in our laboratory. The cDNA-deduced molecular mass of the α1(XV) chain is 141 kDa (111) and that of the α1(XVIII) chain between 134 and 182 kDa, depending on the variant in question (95, 115). The human α1(XV) chain consists of 1388 residues with the following domains (see Fig. 3): I, a 25-residue putative signal peptide; II, a 530-residue N-terminal noncollagenous domain; III, a 577-residue collagenous sequence; and IV, a 256-residue C-terminal noncollagenous domain. The collagenous sequence consists of nine collagenous domains, COL1-COL9, varying in length between 15 and 114 residues and separated by eight noncollagenous sequences, NC1-NC8, which vary in length between 7 and 45 residues. Four of the collagenous domains contain altogether five imperfections of 2 to 3 residues. The shortest mouse α1(XVIII) chain variant consists of 1315 residues with the following structure (see Fig. 3): I, a 25-residue putative signal peptide; II, a 301-residue N-terminal noncollagenous domain; III, a 674-residue collagenous sequence; and IV, a 315-residue C-terminal noncollagenous domain. The collagenous sequence contains ten collagenous domains varying in length between 18 and 122 residues, and five of them contain altogether seven imperfections of two residues. The collagenous domains are separated by nine noncollagenous domains varying in length between 12 and 24 residues. Sequences covering part of the human α1(XVIII) chain have also been reported (118). Due to the presence of an
[Figure 3 graphic not reproduced. Upper diagram: human α1(XV) chain; lower diagram: mouse α1(XVIII) chain.]
FIG. 3. Schematic comparison of polypeptide structures of the human α1(XV) collagen chain and the short form of the mouse α1(XVIII) chain (from 111, 114, and 115). The numbering of the noncollagenous and collagenous domains is shown above the corresponding polypeptides. Homologous noncollagenous sequences are connected with dashed lines. Black boxes are noncollagenous sequences, lightly shaded boxes are putative signal sequences, darkly shaded boxes are homologous collagenous sequences, and white boxes are nonhomologous collagenous sequences. tsp, Thrombospondin sequence motif; C, cysteine residue; N, potential N-glycosylation site; O, potential O-glycosylation site (modified from 115).
alternative promoter and alternative splicing of the ensuing transcripts, two longer α1(XVIII) chains exist with lengths of 1527 or 1774 residues (95). These differ from the 1315-residue chain by having a different signal peptide of 21 residues in length and an N-terminal noncollagenous domain of either 517 or 764 residues (see Section III,E). Using antipeptide antibodies directed against the beginning portion of the C-terminal noncollagenous domain of type XV collagen, we detected a polypeptide of 125 kDa by Western blotting in cell homogenates of cultured HeLa cells (116), its size being in reasonably good agreement with the cDNA-derived molecular mass of 141 kDa (111). Furthermore, only one discrete band was observed in the Western blotting, precluding extensive posttranslational modifications such as glycosylation (see Section III,B), or suggesting that the antibodies react only with a portion of each polypeptide chain.
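The chain lengths quoted above can be checked by summing the four regions given for each polypeptide (signal peptide, N-terminal noncollagenous domain, collagenous sequence, C-terminal noncollagenous domain). The sketch below does only this arithmetic. For the two longer α1(XVIII) variants it assumes, as the text implies but does not state outright, that the 674-residue collagenous sequence and the 315-residue C-terminal domain are unchanged; the dictionary keys are illustrative labels.

```python
# Region lengths in residues, in the order:
# signal peptide, N-terminal noncollagenous domain,
# collagenous sequence, C-terminal noncollagenous domain.
chains = {
    "human alpha1(XV)": ((25, 530, 577, 256), 1388),
    "mouse alpha1(XVIII), short": ((25, 301, 674, 315), 1315),
    # Assumption: only the signal peptide and the N-terminal domain
    # differ in the two longer variants.
    "mouse alpha1(XVIII), 517-residue N-terminal domain": ((21, 517, 674, 315), 1527),
    "mouse alpha1(XVIII), 764-residue N-terminal domain": ((21, 764, 674, 315), 1774),
}

totals = {name: sum(regions) for name, (regions, _) in chains.items()}

for name, (regions, reported) in chains.items():
    status = "matches" if totals[name] == reported else "differs from"
    print(f"{name}: {totals[name]} residues, {status} the reported {reported}")
```

Under these assumptions all four sums reproduce the reported lengths, which is consistent with the longer variants differing from the short form only at the N-terminal end.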
B. Sequence Homologies between Collagen Types XV and XVIII
Comparison of the primary structures of collagen types XV and XVIII revealed homologies in both their noncollagenous and collagenous domains (Fig. 3). The N-terminal noncollagenous domains of both collagen chains contain an approximately 200-residue sequence that is homologous with a large N-terminal segment of thrombospondin-1, a multifunctional glycoprotein with affinity for several molecules (119). The thrombospondin-1 sequence motif has been identified previously in collagen types V, IX, XI, XII, XIV, XVI, and XIX (13, 93, 111), being found in the N-terminal noncollagenous domain of each molecule and in most cases immediately adjacent to the N-terminal end. This sequence represents the N-terminal heparin-binding domain of thrombospondin but the positions thought to be involved in heparin binding are not conserved in any of the collagens. Thus the significance of this thrombospondin homology for the various collagen chains is not known. The thrombospondin motifs present in each of the FACIT collagens and in some of the fibrillar collagens were aligned and a dendrogram was calculated (Fig. 4). The ten collagen chains fall into two major groups, each of which can be divided into two subgroups. One major group is formed by the FACIT collagens and the other by the fibrillar collagens V and XI and the XV/XVIII subgroup. The thrombospondin motif of type XV collagen showed the highest homology with type XVIII, the degree of identity being 45% between the two collagens. An additional feature pertinent only to the α1(XV) collagen chains is the occurrence of a 45-amino-acid residue sequence that is repeated four times
[Figure 4 graphic not reproduced. Legible branch labels include α1(XII), α1(XIV), α1(IX), α1(XI), α1(V), and α2(XI) (=PARP).]
FIG. 4. Dendrogram calculated from the aligned thrombospondin-1 sequence motif in ten collagen chains. The lengths of the horizontal lines are proportional to the similarity between the aligned sequences. Reproduced with permission (111).
within the C-terminal third of the N-terminal noncollagenous domain of these chains (112). The homology between collagens XV and XVIII includes seven of their collagenous domains (Fig. 3). More accurately, it involves the six C-terminal collagenous domains preceded by a collagenous domain clearly differing in size between the two chains and the homologous collagenous domains 2 and 3 of the α1(XV) and α1(XVIII) chains, respectively (114, 115). The α1(XVIII) chain contains one more of the collagenous domains at the beginning of the collagenous sequence than does the α1(XV) chain. The extreme N-terminal collagenous domain of the α1(XV) chain does not correspond in either size or sequence to either of the extreme N-terminal collagenous domains of the α1(XVIII) chain. It is thus not possible to fully align the two homologous polypeptides, suggesting that they cannot represent different α chains of the same collagen type. The most striking homology between collagens XV and XVIII is found in the C-terminal noncollagenous domains, which have primary structures unique to these two collagens (113, 115). These domains contain two regions of homology separated by a variable portion (Fig. 3). One such region involves the first 47 residues of the α1(XV) chain and the first 48 residues of the α1(XVIII) chain, sharing 31% identity. This is followed by a variable portion of 36 residues in type XV and 91 residues in type XVIII and a region of homology involving the last 173 residues of type XV and the last 177 residues of type XVIII. The degree of identity in the second homologous portion is high, 63%, and it is notable that this portion contains four cysteine residues conserved between the two chains.
Furthermore, both collagens are characterized by several putative sites for N-linked glycosylation and sites that conform to the consensus sequence for O-linked glycosaminoglycan attachment (Fig. 3). The possibility of types XV and XVIII containing glycosaminoglycan side chains is supported by the recent findings of such side chains in a few of the FACIT collagens (18-19). On the other hand, our Western blot data regarding type XV collagen chains revealed a discrete band instead of the broad band expected for substantially glycosylated molecules (see Section III,A).
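The identity percentages quoted above (31% and 63% for the two homologous regions of the C-terminal noncollagenous domains) are counts of matching residues over aligned positions. As a minimal sketch of such a calculation, and not the procedure used in the cited studies, the Python fragment below scores a pair of pre-aligned sequences. The function names, the choice to leave gapped positions out of the denominator, and the two toy sequences are assumptions made purely for illustration; the similarity groups follow a reading of the legend to Fig. 6, parts of which are unclear in the scan.

```python
# Similarity groups as read from the legend to Fig. 6. The groups
# "K = R = H" and "D = E = Q = N" are an assumed reading of a damaged legend.
SIMILARITY_GROUPS = [
    set("GAS"), set("AV"), set("VILM"), set("ILMFYW"),
    set("KRH"), set("DEQN"), set("STQN"),
]


def is_similar(a, b):
    """Return True if residues a and b share at least one similarity group."""
    return any(a in group and b in group for group in SIMILARITY_GROUPS)


def score_alignment(seq1, seq2, gap="-"):
    """Return (% identity, % identity plus similarity) for two aligned sequences.

    Positions where either sequence has a gap are skipped (an assumed
    convention; other denominators are also in common use).
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(seq1, seq2):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
        elif is_similar(a, b):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100 * identical / compared, 100 * (identical + similar) / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from any collagen sequence.
    print(score_alignment("CQPI-SLCK", "CEPITALCR"))  # (62.5, 100.0)
```

In the toy pair, five of the eight ungapped positions are identical and the remaining three fall within a similarity group, which is why the second figure reaches 100%.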
C. Genes Encoding Collagen Types XV and XVIII

Characterization of the exon-intron organizations of the genes encoding the homologous collagens XV and XVIII has been initiated, and because the extreme 3' sequences have been examined for both genes, their organizations can be compared. The last 213 274 residues of the human type XV collagen chains are encoded by 7 exons (111) and the last 493 residues of the mouse type XVIII collagen chains are encoded by 12 exons (115). Exons 7-12 of the type XVIII collagen gene (numbered from the extreme 3' exon) encode collagenous sequences and are highly variable in size due to the presence of noncollagenous sequences. Both exons that begin with a complete codon for glycine and ones that begin with a split codon for such a glycine were identified. The highly homologous C-terminal noncollagenous domains of the two collagens are encoded by 7 exons in both genes. The coding portions of exon 1 in each gene are identical in length, and both exons begin with a split codon for tryptophan. Furthermore, the lengths of exons 2, 3, 6, and 7 and the locations of the respective exon-intron junctions are identical or almost identical in the two genes. The exons coding for sequences containing the four cysteines of the C-terminal noncollagenous domains, exons 1 and 3 in both genes, are the best conserved. Two exons, exons 4 and 5, encode the variable region in both genes. Exon 7 begins with a split codon in both and is a junction exon mostly covering collagenous sequences and only one or two residues of the terminal domains. All in all, partial characterizations of the α1(XV) and α1(XVIII) genes have indicated a conserved exon-intron organization, suggesting that the two genes derive from a common ancestor.
The genes for the two homologous collagens reside on different human chromosomes, that for the α1(XV) chain on chromosome 9, region q21-22 (120), and that for the α1(XVIII) chain on chromosome 21, region q22.3 (118). The chromosomal region containing the type XVIII collagen gene is known to contain the α1(VI) and α2(VI) genes as well, located on 21q22.3 (121). However, the clustering of the types XVIII and VI collagen genes does not reflect any close evolutionary relationship, because these collagens are not homologous.
D. Tissue Distribution of mRNAs for Collagen Types XV and XVIII

Detailed characterization of the tissue distribution of types XV and XVIII collagen transcripts and the corresponding proteins is still in its early stages. Cultured fibroblasts and HeLa cells express type XV collagen mRNAs showing one major transcript of 5.3 kb and very minor transcripts of 4.7 and 4.4 kb (116). Northern blot analysis of a limited number of human tissues showed type XV collagen mRNAs in the adrenal, kidney, and pancreas, whereas lung and liver tissues were lacking in these mRNAs (112). Our initial in situ hybridization experiments suggest that type XV collagen mRNAs are expressed in fibroblasts in a wide range of tissues (122), and by muscle and endothelial cells (S. Kivirikko, J. Saarela, J. C. Myers, H. Autio-Harmainen and T. Pihlajaniemi, unpublished results). Northern blot analysis of mouse tissues suggests an unusual distribution of type XVIII collagen mRNAs because the highest mRNA levels are detected in liver and kidney and the next highest amounts in lung, muscle, and testis, whereas several other tissues contain lower mRNA levels (113, 114, 117). Furthermore, Northern blots reveal multiple type XVIII collagen transcripts ranging in size between 4, 5, and 7 kb (113, 114). Thus, mRNAs for both collagen types are often expressed in the same tissues but there are marked differences in the proportions of the transcripts for the two collagens. Furthermore, in the light of the strong liver expression, the data suggest that the distribution of the type XVIII collagen transcripts is unlike that of any other collagen.
E. Variant Type XVIII Collagen Chains Are Homologous with Tissue Polarity Gene Products (“Frizzled” Proteins)
Our initial characterization of cDNA clones for type XVIII collagen (114) and that by Olsen’s group (113) suggested heterogeneity at the N-terminal end of the corresponding polypeptide because the clones differed with respect to the first 27 amino-acid residues of the α1(XVIII) chains. The nature of the heterogeneity was further evaluated by screening a primer-extended mouse embryo cDNA library for additional 5' clones and by employing PCR specifically to amplify the 5' ends of template cDNAs. This resulted in the identification of three variant N-terminal ends of the ensuing 1315-, 1527-, or 1774-residue collagen chains, with sequence-derived molecular weights of 134, 156, and 182 kDa, respectively (95). The variants appear to originate from the utilization of two alternative promoters of the type XVIII collagen gene, resulting in synthesis of either short or long N-terminal noncollagenous domains, the latter being further subject to alternative splicing of transcripts. As a result, the 1527- and 1774-residue chains have the same signal peptide, postulated to be 21 residues in length, and NC1 domains 517 or 764 residues in length, whereas the 1315-residue polypeptide has a different signal peptide and a 301-residue NC1 domain (Fig. 5). All three polypeptides are thought to be identical with respect to the 299 C-terminal residues of their NC1 domains, the collagenous domains, and the C-terminal noncollagenous domains. Thus, in essence, the 301-residue NC1 contains the sequences common to all three NC1 variants whereas the two longer ones contain additional sequences. All in all, significant differences occur at the N-terminal ends of the variant type XVIII collagen chains, and it may therefore be presumed that the variant molecules possess different functional properties. The 5' end of the type XVIII collagen gene has not been fully characterized and we therefore do not yet know how the two presumed promoters are arranged with respect to each other or what the mode of alternative splicing is. Both long NC1 variants are characterized by a highly acidic sequence at their N terminus. Furthermore, the 247-residue stretch present only in the longest NC1 domain is strikingly characterized by a 110-residue sequence containing ten cysteines that are lacking from the other two NC1 variants (95). Interestingly, the cysteine-rich domain of the α1(XVIII) chain showed 26-27% identity and 50-51% homology (Fig. 6) with a cysteine-rich domain found in each of the three previously characterized “frizzled” proteins,
[Figure 5 schematic. Labels recoverable from the scan: N-terminal regions of 25 + 2 aa, 21 + 218 aa, and 21 + 465 aa; common noncollagenous region, 299 aa; collagenous region, 674 aa; C-terminal noncollagenous region, 315 aa.]
FIG. 5. Schematic structures of the variant mouse α1(XVIII) collagen chains. Collagenous sequences are shown in white and noncollagenous domains in black, except for the long variant NC1 domain portions, which are shaded. Two different putative signal peptides are indicated with different stripes. The lengths of the amino-acid sequences (aa) specific for each variant are given, as well as the lengths of the common regions. C, Cysteine residue; 10C, cluster of 10 cysteine residues; N, potential N-glycosylation site; 2N, two adjacent N-glycosylation sites; O, potential O-linked glycosylation site; fz and tsp, frizzled and thrombospondin sequence motifs, respectively; the acidic domains are indicated by the overbars over the shaded regions (modified from 95).
[Figure 6 alignment not recoverable from the scan. Rows: mouse α1(XVIII), rat fz-1, rat fz-2, and Drosophila fz; the conserved cysteines are numbered 1-10.]
FIG. 6. Comparison of the cysteine-rich sequences identified in the long NC1 domain of the mouse α1(XVIII) collagen chain with rat frizzled-1 and frizzled-2 proteins and the Drosophila frizzled protein. The numbering of the amino-acid residues begins from the N terminus of each protein (95, 123). Note that the last α1(XVIII) chain amino-acid residue in the aligned sequence represents the extreme C-terminal residue in the cysteine-rich NC1 splice variant. The amino-acid residues that are identical between the mouse α1(XVIII) collagen and one or more of the frizzled (fz) proteins are in black and similar residues are shaded. The additional identities and homologies that exist only between the frizzled proteins are not indicated. Similar amino acids: G = A = S; A = V; V = I = L = M; I = L = M = F = Y = W; K = R = H; D = E = Q = N; S = T = Q = N. The similarly located cysteine residues are numbered from the N direction. Gaps are introduced to maximize the homology (modified from 95).
namely, rat frizzled proteins 1 and 2 (123) and the Drosophila frizzled protein (124). The frizzled proteins vary in size between 570 and 641 residues, and all three contain the cysteine-rich domain within their N-terminal one-third and seven putative transmembrane domains within their C-terminal two-thirds (123, 124). They resemble G-protein-coupled receptors in their overall structure, and thus are likely to be involved in transmembrane signaling. The extracellular domains of the frizzled proteins consist mainly of the cysteine-rich sequence, and therefore this portion is likely to be important for ligand interaction, although the nature of their ligand(s) is not known. Genetic experiments with Drosophila indicate that mutations in the frizzled locus cause abnormal wing hair and bristle orientation and it has therefore been suggested that the frizzled proteins are important for correct polarity. Collagens are mosaic proteins that contain a number of shuffled domains also present in noncollagenous proteins including thrombospondin-1, thrombospondin-2, complement subcomponent C1q, fibronectin type III, von Willebrand factor type A, and Kunitz and salvage protein modules (13).
The list of cysteine-rich motifs found in many noncollagenous membrane proteins and secreted proteins includes, in addition to the fibronectin type III and Kunitz motifs also present in some of the collagens, an epidermal-growth-factor-like domain, an immunoglobulin superfamily domain, and a low-density lipoprotein receptor domain (13). These disulfide cross-linked motifs appear to provide proteins with stable domains that may be involved in binding with other molecules, as is the case with the low-density lipoprotein receptor. The homology identified between type XVIII collagen and the frizzled proteins leads us to suggest that the cysteine-rich sequence, which we tentatively call the fz motif, represents another sequence motif present in both collagenous and noncollagenous proteins. At present one can only speculate as to the significance of the fz motif in some of the type XVIII collagen molecules. It is possible that the occurrence of the fz motif in type XVIII collagen is entirely fortuitous and its presence does not confer on these molecules properties similar to those of the frizzled proteins. On the other hand, the fz motif of type XVIII collagen molecules may interact with the same or a similar ligand to that interacting with the frizzled proteins, and thus type XVIII collagen may be involved in a complex signal transduction pathway that extends from the extracellular matrix into the cell, or else type XVIII collagen molecules may simply serve to sequester the frizzled protein ligand and release it on changes in binding conditions. Northern blot analysis of the variant type XVIII collagen transcripts showed striking differences in tissue distribution (95). Of the eight mouse tissues analyzed, marked amounts of mRNAs for the two long NC1 variants were found in the kidney, liver, lung, and skeletal muscle, whereas the brain, heart, spleen, and testis contained little or none of these mRNAs.
The very strong liver signal was largely due to the presence of transcripts encoding the long NC1 domain, which lacks the cysteine-rich sequence. Transcripts for the shortest NC1 variant were detected ubiquitously at low levels, except in the kidney and testis, which contained higher mRNA levels. Thus kidney tissue contained mRNAs for all three NC1 variants whereas testis had only mRNAs encoding the short NC1 variant. The clear differences in the expression of the variant mRNAs argue in favor of a possible functional significance for the utilization of two alternate promoters and alternative splicing of the α1(XVIII) transcripts.
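The domain lengths given in this section for the three NC1 variants can be cross-checked against the total chain lengths. The sketch below is only a consistency check under stated assumptions: the signal-peptide and variant-specific lengths (25 + 2, 21 + 218, and 21 + 465 residues), the 674-residue collagenous region, and the 315-residue C-terminal noncollagenous domain are read from the labels of Fig. 5, and the variant names and helper functions are hypothetical.

```python
# Region lengths shared by all three variants (residues). The collagenous and
# C-terminal values are assumed from the labels of Fig. 5.
COMMON_NC1 = 299
COLLAGENOUS = 674
C_TERMINAL_NC = 315

# Per-variant values: signal peptide and variant-specific NC1 residues are an
# assumed reading of Fig. 5; chain lengths and masses (kDa) are from the text.
VARIANTS = {
    "short": {"signal": 25, "specific": 2, "chain": 1315, "kda": 134},
    "medium": {"signal": 21, "specific": 218, "chain": 1527, "kda": 156},
    "long": {"signal": 21, "specific": 465, "chain": 1774, "kda": 182},
}


def nc1_length(name):
    """NC1 length: variant-specific residues plus the common 299 residues."""
    return VARIANTS[name]["specific"] + COMMON_NC1


def chain_length(name):
    """Total chain length summed over all regions, signal peptide included."""
    return VARIANTS[name]["signal"] + nc1_length(name) + COLLAGENOUS + C_TERMINAL_NC


def mean_residue_mass(name):
    """Sequence-derived mass divided by chain length, in Da per residue."""
    variant = VARIANTS[name]
    return 1000 * variant["kda"] / variant["chain"]


if __name__ == "__main__":
    for name, variant in VARIANTS.items():
        # The summed regions should reproduce the chain length given in the text.
        assert chain_length(name) == variant["chain"]
        print(name, nc1_length(name), chain_length(name),
              round(mean_residue_mass(name), 1))
```

Under these assumptions the sums reproduce the 1315-, 1527-, and 1774-residue totals, the NC1 lengths of 301, 517, and 764 residues, and the 247-residue difference between the two long NC1 domains; the quoted masses correspond to roughly 102 Da per residue.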
F. Deliberations on Collagen Types XV and XVIII

The α1(XV) and α1(XVIII) chains share homology throughout their sequences and clearly form a new subgroup within the family of collagens. It has been suggested (113) that these two homologous collagens should be
called MULTIPLEXINs, derived from collagens with multiple triple-helix domains and interruptions. The two polypeptides contain long N-terminal noncollagenous domains that have a thrombospondin sequence motif, but are not otherwise homologous, whereas the highest homology is observed between their C-terminal noncollagenous domains, structures unique to these two collagens. Both polypeptides are characterized by a collagenous sequence with frequent interruptions, and although several of their collagenous domains are homologous, some are not, making alignment of the two chains in the same molecule unlikely. mRNAs for collagen types XV and XVIII can be expressed in the same tissues but the proportions of the two types of transcript vary markedly, and some tissues, notably liver, contain solely or predominantly one of these transcripts, supporting the conclusion that the α1(XV) and α1(XVIII) chains are not constituents of the same molecule.
IV. Conclusions and Perspectives

Two new subgroups within the large collagen superfamily are described here, namely, the membrane-associated collagenous proteins and the types XV/XVIII subgroup. Members of the former group, collagen types XIII and XVII, and the collagen-like macrophage scavenger receptors and C1q, are not homologous with each other in terms of structure, and in fact show very marked structural and probably also functional differences, but we propose that they should be grouped together because each of them is an integral membrane protein. The two collagen-like proteins are involved in host defense processes. Type XVII collagen is known to be a hemidesmosomal component, probably needed for the attachment of certain epithelia to the stroma, while direct evidence for the membrane location and function of type XIII collagen is still lacking. Type XIII collagen is characterized by complex alternative splicing of its transcripts, this affecting not only noncollagenous sequences, as has been reported for several other collagens, but uniquely also collagenous sequences. An important question to be solved is how the type XIII collagen polypeptides with collagenous domains of different lengths, which are synthesized in the same cell, assemble into triple-helical molecules. A question related to each of the members of the group of membrane-associated collagenous proteins is how the folding of three chains into a triple-helical conformation occurs in the rough endoplasmic reticulum in view of their N-terminal portions being on the cytosolic side and their C-terminal portions with the collagenous sequences on the luminal (extracellular) side. This topography
suggests that triple-helix formation proceeds from the N terminus to the C terminus in the case of membrane-associated collagens, the opposite orientation to that demonstrated for the fibrillar collagens. The members of the second new subgroup, collagen types XV and XVIII, are closely related structurally and possibly have similar functions. A characteristic feature of this subgroup concerns their highly conserved C-terminal noncollagenous domains with four cysteines. Additional features regarding type XVIII collagen are the occurrence of two alternative promoters and alternative splicing affecting the transcripts of one of the promoters. As a result, three variant polypeptides are predicted to occur, markedly differing with respect to their N-terminal noncollagenous domains. Interestingly, characterization of the variant transcripts defined a new cysteine-rich sequence, termed fz, present in one of the variant N termini of type XVIII collagen and in the three frizzled proteins, members of the family of G-protein-coupled receptors. These seven-span membrane proteins are thought to be important for polarity determination, at least in the case of Drosophila wing hairs and bristles, and their extracellular cysteine-rich sequence can be assumed to be involved in ligand binding. This raises the possibility that at least some type XVIII collagen molecules may also be involved in interactions with the same or similar ligands. A certain amount of knowledge has thus been gathered about the primary structures of the polypeptides that make up the members of the new collagen subgroups, the tissue distribution of the corresponding mRNAs and proteins, and the respective genes, but it is clear from this review that knowledge of their protein structure, biosynthesis, and function is still in its infancy.
The significance of collagen types XIII, XV, and XVIII, identified solely on the grounds of their repetitive Gly-X-Y sequence, can only be fully assessed from a genetic analysis of mutations in the respective genes. It is thus hoped that gene “knockout” experiments in transgenic mice and the introduction of dominant-negative collagen mutations into mice will result in increased understanding of the function of these collagens. Furthermore, much information is needed on the chemical and physical nature of the proteins and their interactions in their native environment. Various recombinant expression systems must be used for many of these purposes, although the expression of collagens is always hampered by the fact that many of them consist of more than one subunit and all of them require complex posttranslational modifications.

ACKNOWLEDGMENTS

Our work is supported by grants from the Research Council for Medicine within the Academy of Finland and the Sigrid Juselius Foundation.
REFERENCES

1. R. Mayne and R. E. Burgeson, “Biology of Extracellular Matrix: A Series. Structure and Function of Collagen Types” Academic Press, Orlando, FL, 1987.
2. M. Gordon and B. R. Olsen, Curr. Opin. Cell Biol. 2, 833 (1990).
3. L. M. Shaw and B. R. Olsen, TIBS 16, 191 (1991).
4. M. van der Rest and R. Garrone, FASEB J. 5, 2814 (1991).
5. D. J. S. Hulmes, Essays Biochem. 27, 49 (1992).
6. R. E. Burgeson and M. E. Nimni, Clin. Orthop. 282, 250 (1992).
7. C. M. Kielty, I. Hopkinson and M. E. Grant, in “Connective Tissue and Its Heritable Disorders. Molecular, Genetic and Medical Aspects” (P. M. Royce and B. Steinmann, eds.), p. 103. Wiley-Liss, New York, 1993.
8. K. I. Kivirikko, Ann. Med. 25, 113 (1993).
9. R. Mayne and R. G. Brewton, Curr. Opin. Cell Biol. 5, 883 (1993).
10. M. van der Rest and P. Bruckner, Curr. Opin. Struct. Biol. 3, 430 (1993).
11. D. J. Prockop and K. I. Kivirikko, ARB 64, in press.
12. K. I. Kivirikko, in “Principles of Medical Biology” (E. E. Bittar and N. Bittar, eds.), JAI Press, in press.
13. P. Bork, FEBS Lett. 307, 49 (1992).
14. E. Vuorio and B. de Crombrugghe, ARB 59, 837 (1990).
15. L. J. Sandell and C. D. Boyd, in “Extracellular Matrix Genes” (L. J. Sandell and C. D. Boyd, eds.), p. 1. Academic Press, New York, 1990.
16. J.-J. Wu, P. E. Woods and D. R. Eyre, JBC 267, 23007 (1992).
17. P. Bruckner, L. Vaughan and K. H. Winterhalter, PNAS 82, 2608 (1985).
18. D. McCormick, M. van der Rest, J. Goodship, G. Lozano, Y. Ninomiya and B. R. Olsen, PNAS 84, 4044 (1987).
19. S. L. Watt, G. P. Lunstrum, A. M. McDonough, D. R. Keene, R. E. Burgeson and N. P. Morris, JBC 267, 20093 (1992).
20. T.-C. Pan, R.-Z. Zhang, M.-G. Mattei, R. Timpl and M.-L. Chu, PNAS 89, 6565 (1992).
21. N. Yamaguchi, S. Kimura, O. W. McBride, H. Hori, Y. Yanada, T. Kanamori, H. Yamakoshi and Y. Nagai, J. Biochem. 112, 856 (1992).
22. H. Yoshioka, H. Zhang, F. Ramirez, M.-G. Mattei, M. Moradi-Ameli, M. van der Rest and M. K. Gordon, Genomics 13, 884 (1992).
23. J. C. Myers, M. J. Sun, J. A. D’Ippolito, E. W. Jabs, E. G. Neilson and A. S. Dion, Gene 123, 211 (1993).
24. T. Pihlajaniemi, in “Molecular Pathology and Genetics of Alport Syndrome. Contributions to Nephrology” (K. Tryggvason, ed.), Karger, Basel, in press.
25. T. Kodama, M. Freeman, L. Rohrer, J. Zabrecky, P. Matsudaira and M. Krieger, Nature 343, 531 (1990).
26. L. Rohrer, M. Freeman, T. Kodama, M. Penman and M. Krieger, Nature 343, 570 (1990).
27. J. L. Goldstein and M. S. Brown, in “The Metabolic Basis of Inherited Disease” (C. R. Scriver, A. L. Beaudet, W. S. Sly and D. Valle, eds.), 6th Ed., p. 1215. McGraw-Hill, New York, 1989.
28. M. Krieger, TIBS 17, 141 (1992).
29. M. Krieger, S. Acton, J. Ashkenas, A. Pearson, M. Penman and D. Resnick, JBC 268, 4569 (1993).
30. S. Thiel and K. B. M. Reid, FEBS Lett. 250, 79 (1989).
31. M. Kaul and M. Loos, Eur. J. Immunol. 23, 2166 (1993).
32. D. F. Mutasim, Y. Takahashi, R. S. Labib, G. J. Anhalt, H. P. Patel and L. A. Diaz, J. Invest. Dermatol. 84, 47 (1985).
33. L. H. Morrison, R. S. Labib, J. J. Zone, L. A. Diaz and G. J. Anhalt, J. Clin. Invest. 81, 2023 (1988).
34. D. H. Klatte, M. A. Kurpakus, K. A. Grelling and J. C. R. Jones, J. Cell Biol. 109, 3377 (1989).
35. L. A. Diaz, H. Ratrie III, W. S. Saunders, S. Futamura, H. L. Squiquera, G. J. Anhalt and G. J. Giudice, J. Clin. Invest. 86, 1088 (1990).
36. G. J. Giudice, D. J. Emery and L. A. Diaz, J. Invest. Dermatol. 99, 243 (1992).
37. S. B. Hopkinson, K. S. Riddelle and J. C. R. Jones, J. Invest. Dermatol. 99, 264 (1992).
38. K. Li, K. Tamai, E. M. L. Tan and J. Uitto, JBC 268, 8825 (1993).
39. M. Rehn and T. Pihlajaniemi, Matrix Coll. Rel. Res. 13, 12 (1993).
40. H. A. Dresel, E. Friedrich, D. P. Via, H. Sinn, R. Ziegler and G. Schlettler, EMBO J. 6, 319 (1987).
41. T. Kodama, P. Reddy, C. Kishimoto and M. Krieger, PNAS 85, 9238 (1988).
42. M. Freeman, J. Ashkenas, D. J. G. Rees, D. M. Kingsley, N. G. Copeland, N. A. Jenkins and M. Krieger, PNAS 87, 8810 (1990).
43. A. Matsumoto, M. Naito, H. Itakura, S. Ikemoto, H. Asaoka, I. Hayakawa, H. Kanamori, H. Aburatani, F. Takaku, H. Suzuki, Y. Kobari, T. Miyai, K. Takahashi, E. H. Cohen, R. Wydro, D. E. Housman and T. Kodama, PNAS 87, 9133 (1990).
44. P. E. Bickel and M. W. Freeman, J. Clin. Invest. 90, 1450 (1992).
45. J. Ashkenas, M. Penman, E. Vasile, S. Acton, M. Freeman and M. Krieger, J. Lipid Res. 34, 983 (1993).
46. M. Penman, A. Lux, N. J. Freedman, L. Rohrer, Y. Ekkel, H. McKinstry, D. Resnick and M. Krieger, JBC 266, 23985 (1991).
47. J. El Khoury, C. A. Thomas, J. D. Loike, S. E. Hickman, L. Cao and S. C. Silverstein, JBC 269, 10197 (1994).
48. S. Acton, D. Resnick, M. Freeman, Y. Ekkel, J. Ashkenas and M. Krieger, JBC 268, 3530 (1993).
49. T. Doi, K.-I. Higashino, Y. Kurihara, Y. Wada, T. Miyazaki, H. Nakamura, S. Uesugi, T. Imanishi, Y. Kawabe, H. Itakura, Y. Yazaki, A. Matsumoto and T. Kodama, JBC 268, 2126 (1993).
50. M. Freeman, Y. Ekkel, L. Rohrer, M. Penman, N. J. Freedman, G. M. Chisolm and M. Krieger, PNAS 88, 4931 (1991).
51. S. Yla-Herttuala, M. E. Rosenfeld, S. Parthasarathy, E. Sigal, T. Sarkioja, J. L. Witztum and D. Steinberg, J. Clin. Invest. 87, 1146 (1991).
52. R. E. Pitas, JBC 265, 12722 (1990).
53. K. B. Reid, J. Gagnon and J. Frampton, BJ 203, 559 (1982).
54. K. B. M. Reid, BJ 231, 729 (1985).
55. L. Wood, S. Polaski and G. Vogeli, Immunol. Lett. 17, 115 (1988).
56. G. C. Sellar, D. J. Blake and K. B. M. Reid, BJ 274, 481 (1991).
57. F. Petry, K. B. M. Reid and M. Loos, EJB 209, 129 (1992).
58. A. Brass, K. E. Kadler, J. T. Thomas, M. E. Grant and R. P. Boot-Handford, FEBS Lett. 303, 126 (1992).
59. K. B. M. Reid, Biochem. Soc. Transact. 21, 464 (1993).
60. J. C. Jensenius, S. B. Laursen, Y. Zheng and U. Holmskov, Biochem. Soc. Transact. 22, 95 (1994).
61. G. J. Anhalt, H. D. Jampel, H. P. Patel, L. A. Diaz, D. A. Jabs and D. F. Mutasim, Invest. Ophthalmol. Visual Sci. 28, 903 (1987).
62. K. Li, D. Sawamura, G. J. Giudice, L. A. Diaz, M.-G. Mattei, M.-L. Chu and J. Uitto, JBC 266, 24064 (1991).
63. J. K. Marchant, T. F. Linsenmayer and M. K. Gordon, PNAS 88, 1560 (1991).
64. Y. Nisizawa, J. Uematsu and K. Owaribe, J. Biochem. 113, 493 (1993).
65. A. R. Hayman, J. Koppel, K. H. Winterhalter and B. Trüeb, JBC 265, 9864 (1990).
66. C. Walchli, E. Koller, J. Trueb and B. Trueb, EJB 205, 583 (1992).
67. L. Tikka, T. Pihlajaniemi, P. Henttu, D. J. Prockop and K. Tryggvason, PNAS 85, 7491 (1988).
68. L. Tikka, O. Elomaa, T. Pihlajaniemi and K. Tryggvason, JBC 266, 17713 (1991).
69. T. Pihlajaniemi, R. Myllyla, J. Seyer, M. Kurkinen and D. J. Prockop, PNAS 84, 940 (1987).
70. T. Pihlajaniemi and M. Tamminen, JBC 265, 16922 (1990).
71. M. Juvonen and T. Pihlajaniemi, JBC 267, 24693 (1992).
72. D. M. Engelman, T. A. Steitz and A. Goldman, Annu. Rev. Biophys. Biophys. Chem. 15, 321 (1986).
73. S. J. Singer, Annu. Rev. Cell Biol. 6, 247 (1990).
74. V. D. Bennett and S. L. Adams, JBC 265, 2223 (1990).
75. M. C. Ryan and L. J. Sandell, JBC 265, 10334 (1990).
76. H.-D. Nah, Z. Niu and S. L. Adams, JBC 269, 16443 (1994).
77. L. Feng, Y. Xia and C. B. Wilson, JBC 269, 2342 (1994).
78. A. Saito, M. Sakatsume, H. Kimura, H. Shimada and M. Arakawa, J. Am. Soc. Nephrol. 2, 560 (1991).
79. B. Saitta, D. G. Stokes, H. Vissing, R. Timpl and M.-L. Chu, JBC 265, 6473 (1990).
80. B. Saitta, R. Timpl and M.-L. Chu, JBC 267, 6188 (1992).
81. R. Doliana, P. Bonaldo and A. Colombatti, J. Cell Biol. 111, 2197 (1990).
82. D. G. Stokes, B. Saitta, R. Timpl and M.-L. Chu, JBC 266, 8626 (1991).
83. S. Zanussi, R. Doliana, D. Segat, P. Bonaldo and A. Colombatti, JBC 267, 24082 (1992).
84. I. Nishimura, Y. Muragaki and B. R. Olsen, JBC 264, 20033 (1989).
85. N. Morris, J. Thom Oxford and K. Doege, Matrix Biol. 14, 361 (1994).
86. N. I. Zhidkova, S. K. Justice and R. Mayne, Matrix Biol. 14, 365 (1994).
87. N. Tsumaki and T. Kimura, Matrix Biol. 14, 364 (1994).
88. J. Trueb and B. Trueb, BBA 1171, 97 (1992).
89. M. Juvonen, M. Sandberg and T. Pihlajaniemi, JBC 267, 24700 (1992).
90. T. Pihlajaniemi, P. Hagg, M. Sandberg and M. Juvonen, Matrix Coll. Rel. Res. 13, 11 (1993).
91. M. Juvonen, T. Pihlajaniemi and H. Autio-Harmainen, Lab. Invest. 69, 541 (1993).
92. D. R. Gerecke, J. W. Foley, P. Castagnola, M. Gennari, B. Dublet, R. Cancedda, T. F. Linsenmayer, M. van der Rest, B. R. Olsen and M. K. Gordon, JBC 268, 12177 (1993).
93. C. Walchli, J. Trüeb, B. Kessler, K. H. Winterhalter and B. Trüeb, EJB 212, 483 (1993).
94. J. Trueb and B. Trueb, EJB 207, 549 (1992).
95. M. Rehn and T. Pihlajaniemi, JBC (1995). In press.
96. S. L. Adams, H.-D. Nah, A. J. Cohen, Z. Niu and K. M. Palante, Matrix Biol. 14, 367 (1994).
97. S. Peltonen, M. Rehn and T. Pihlajaniemi, Matrix Biol. 14, 352 (1994).
98. M. Mazzorana, H. Gruffat, A. Sergeant and M. van der Rest, JBC 268, 3029 (1993).
99. M. Sandberg, M. Tamminen, H. Hirvonen, E. Vuorio and T. Pihlajaniemi, J. Cell Biol. 109, 1371 (1989).
100. H. Autio-Harmainen, M. Sandberg, T. Pihlajaniemi and E. Vuorio, Lab. Invest. 64, 483 (1991).
101. H. P. Bächinger, P. Bruckner, R. Timpl, D. J. Prockop and J. Engel, EJB 106, 619 (1980).
102. P. Bruckner, E. F. Eikenberry and D. J. Prockop, EJB 118, 607 (1981).
103. K. J. Doege and J. H. Fessler, JBC 261, 8924 (1986).
104. M. Emi, H. Asaoka, A. Matsumoto, H. Itakura, Y. Kurihara, Y. Wada, H. Kanamori, Y. Yazaki, E. Takahashi, M. Lepert, J.-M. Lalouel, T. Kodama and T. Mukai, JBC 268, 2120 (1993).
105. G. C. Sellar, D. Cockburn and K. B. M. Reid, Immunogenetics 35, 214 (1992).
106. T. B. Shows, L. Tikka, M. G. Byers, R. L. Eddy, L. L. Haley, W. M. Henry, D. J. Prockop and K. Tryggvason, Genomics 5, 128 (1989).
107. K. Kolble, J. Lu, S. E. Mole, S. Kaluz and K. B. M. Reid, Genomics 17, 294 (1993).
108. S. L. Katyal, G. Singh and L. Locker, Am. J. Resp. Cell Mol. Biol. 6, 446 (1992).
109. T. R. Korthagen, S. W. Glasser, M. D. Bruno, M. J. McMahan and J. A. Whitsett, Am. J. Respir. Cell Mol. Biol. 4, 463 (1991).
110. L. Pajunen, T. A. Jones, T. Helaakoski, T. Pihlajaniemi, E. Solomon, D. Sheer and K. I. Kivirikko, Am. J. Human Genet. 45, 829 (1989).
111. S. Kivirikko, P. Heinamaki, M. Rehn, N. Honkanen, J. C. Myers and T. Pihlajaniemi, JBC 269, 4773 (1994).
112. Y. Muragaki, N. Abe, Y. Ninomiya, B. R. Olsen and A. Ooshima, JBC 269, 4042 (1994).
113. S. P. Oh, Y. Kamagata, Y. Muragaki, S. Timmons, A. Ooshima and B. R. Olsen, PNAS 91, 4229 (1994).
114. M. Rehn and T. Pihlajaniemi, PNAS 91, 4234 (1994).
115. M. Rehn, E. Hintikka and T. Pihlajaniemi, JBC 269, 13929 (1994).
116. J. C. Myers, S. Kivirikko, M. K. Gordon, A. S. Dion and T. Pihlajaniemi, PNAS 89, 10144 (1992).
117. N. Abe, Y. Muragaki, H. Yoshioka, H. Inoue and Y. Ninomiya, BBRC 196, 576 (1993).
118. S. P. Oh, M. L. Warman, M. F. Seldin, S.-D. Cheng, J. H. M. Knoll, S. Timmons and B. R. Olsen, Genomics 19, 494 (1994).
119. J. Adams and J. Lawler, Curr. Biol. 3, 188 (1993).
120. K. Huebner, L. A. Cannizzaro, E. W. Jabs, S. Kivirikko, H. Manzone, T. Pihlajaniemi and J. C. Myers, Genomics 14, 220 (1992).
121. D. Weil, M.-G. Mattei, E. Passage, V. C. N'Guyen, D. Pribula-Conway, K. Mann, R. Deutzmann, R. Timpl and M.-L. Chu, Am. J. Human Genet. 42, 435 (1988).
122. T. Pihlajaniemi, S. Kivirikko, J. Saarela, M. K. Gordon, H. Autio-Harmainen, A. S. Dion and J. C. Myers, XIIIth Mtg. Fed. Eur. Connective Tissue Soc., Davos, Switzerland, July 12-17, 1992, Abstr. 10117.
123. S. D. H. Chan, D. B. Karpf, M. E. Fowlkes, M. Hooks, M. S. Bradley, V. Vuong, T. Bambino, M. Y. C. Liu, C. D. Arnaud, G. J. Strewler and R. A. Nissenson, JBC 267, 25202 (1992).
124. C. R. Vinson, S. Conover and P. N. Adler, Nature 338, 263 (1989).
Genetic Dissection of Synthesis and Function of Modified Nucleosides in Bacterial Transfer RNA¹

GLENN R. BJÖRK
Department of Microbiology, Umeå University, S-90187 Umeå, Sweden
I. Mutants (hisT) Defective in the Synthesis of Ψ in Positions 38, 39, and 40
   A. Isolation of the hisT Mutant and Organization of the hisT Operon
   B. Function in Translation
   C. Metabolic Consequences of Ψ Deficiencies
II. Mutants (trmA) Defective in the Synthesis of m5U in Position 54
   A. Isolation of trmA Mutants and Regulation of the Synthesis of tRNA(m5U54)methyltransferase
   B. Function of m5U54 in Translation
   C. The Structural Gene, trmA, for the tRNA(m5U54)methyltransferase Is Essential, Although the Catalytic Product, m5U54, of the …
III. Mutants (trmC, trmE, asuE) Defective in the Synthesis of mnm5s2U34
   A. …
   B. Function of mnm5s2U34 in Translation
   C. Replacement of the s2 Group of mnm5s… Influenced by the selD Gene Product
IV. …
   A. Isolation of nuvA and nuvC Mutants and Synthesis of s4U8
   B. s4U8 as Sensor for Near-UV Light; Function in Translation
V. Mutants (miaA, mia…, mia…) Defective in the Synthesis of ms2i6A or ms2io6A37 in Position 37
   A. Presence of ms2i6A and ms2io6A37 in tRNA of Different Organisms: Isolation of Mutants mia…, mia…, and mia… of E. coli and S. ty…
   B. …
   C. Function of ms2i6A37 and ms2io6A37 in Translation
   D. Metabolic and Physiological Consequences Induced by Lack of ms2io6A37 …
VI. Mutants (trmD) Defective in the Synthesis of m1G in Position 37
   A. Isolation of trmD Mutants: Regulation of the Synthesis of the tRNA(m1G37)methyltransferase and Enzymatic Recognition Mechanism
A discussion of the abbreviations used in this chapter appears on page 328.
Progress in Nucleic Acid Research and Molecular Biology, Vol. 50
263
Copyright 0 1995 by Academic Press, Inc.
All rights of reproduction in any form reserved.
264
GLENN R. BJC)RK
B. Function of m C 3 7 in Translation ...................... C. Metabolic Consequences of Lack of m'G37 in tRNA . . . . . . . . . . . VII. Mutants (queA, queB, tgt) Defective in the Synthesis of Queuosine in
......................................
313 318 318
queB, and tgt Mutants: The Synthesis of
Queuosine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
............ B. Function of Q34 in Translation . . . ............ C. Metabolic Consequences of Lack of VIII. Mutants ( a m ) Defective in the Synthesis of cmo5U (V-Nucleoside) and Its Methyl Ester in Position 34 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A. Isolation of Mutants Defective in the Synthesis of cmoW34 and Discovery of a Link between Biosynthesis of Aromatic Amino Acids and tRNA Modification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B. Function of cmo5U34 in Translation . IX. Conclusions and Perspectives . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
318 320 322 322
322 323 324 329
Transfer RNA (tRNA), a central cellular macromolecule and the decoding entity of the cell, was discovered in 1957 (1, 2), and soon thereafter modified nucleosides were identified as components of it (3, 4). Modified nucleosides are derivatives of the four common ribonucleosides adenosine (A), guanosine (G), uridine (U), and cytidine (C) and were discovered to be parts of nucleic acids as early as 1948 (5). To date, 79 different modified nucleosides have been characterized as components of tRNA from many different organisms (6-8). In this review only a few of them are discussed. Their structures and symbols (8) are shown in Fig. 1. Transfer RNA from all three phylogenetic domains, Archaea, Bacteria, and Eucarya [earlier called the kingdoms of archaebacteria, eubacteria, and eukaryotes, respectively (9)], contains modified nucleosides (10). A subset of these modified nucleosides (D, Ψ, Um, ac4C, Cm, m1G, m7G, Gm, m1A, t6A, m6t6A, and I; see Section X for an explanation of abbreviations) is present in tRNAs from all organisms (6). Moreover, some (Ψ13, Cm32, m1G37, t6A37, Ψ38, Ψ39, Ψ55, and m1A58) are even present in comparable positions in the tRNAs from all three domains (11). This suggests a common evolutionary origin and function for these modified nucleosides, unless convergent evolution has occurred. Therefore, the modified nucleosides may have been present in the tRNA of the progenitor. However, some modified nucleosides are domain-specific, demonstrating that these modified nucleosides evolved after the three phylogenetic domains had separated (6, 11, 12). Some domain-specific modifications with different structures may have
FIG. 1. Structures and abbreviations of modified nucleosides. [Structural drawings omitted; legible labels: m6t6A, ms2i6A, ms2io6A, m1G, mnm5s2U.]
the same function due to convergent evolution. Most tRNAs from both Eucarya and Bacteria contain m5U54, whereas most tRNAs in Archaea contain m1Ψ in the same position of the tRNA. However, the spatial orientation of the methyl group in m1Ψ54 and m5U54 is the same relative to the ribose moiety and the phosphate-ribose backbone. Therefore, an evolutionary convergence for the presence of this methyl group may have occurred (13). Furthermore, a species-specificity of tRNA modification exists within a domain; e.g., mo5U34 is present in the tRNAs specific for Val, Ala, Thr, and Ser from gram-positive organisms, whereas the corresponding tRNAs from gram-negative organisms have cmo5U34 and/or mcmo5U34. These examples show that tRNA modification is ubiquitous and is both domain- and species-specific.

Twenty-nine different modified nucleosides, and two for which structures have not been determined, are present in tRNA from Escherichia coli and Salmonella typhimurium (10, 14). Some of these are hypermodified nucleosides, such as ms2io6A37 and Q, which have complicated structures; several enzyme activities are therefore required for their synthesis. The formation of modified nucleosides such as Ψ, which occur in more than one position in the tRNA, is catalyzed by distinct site-specific enzymes (15). Based on such considerations, about 45 different tRNA-modifying enzymes may exist in the bacterial cell. Assuming the average size of a structural gene is about 1 kb, as much as 1% of the bacterial genome is allocated to the synthesis of these enzymes. However, because these enzymes are present in small amounts in the cell, the energy cost of their synthesis is modest for the bacterium. The 46 tRNA species present in E. coli are encoded by 79 tRNA genes distributed among 40 different operons (16).
If one assumes an average tRNA gene to be 150 nucleotides (the mature tRNAs are 75-95 nucleotides long), the genetic information devoted to the synthesis of the primary tRNA transcripts is about 12 kb, which represents about 0.25% of the genome size of E. coli. Thus, in bacteria, about four times more genetic information is devoted to the synthesis of the tRNA-modifying enzymes than to the synthesis of the primary tRNA transcripts.

Although modified nucleosides are found in various positions in the tRNA, two positions in the anticodon region, 34 and 37, contain the largest variety of modified nucleosides. There is a good correlation between the kind of modified nucleoside present in either of these two positions and the coding properties of the tRNA (17). Modification at these two positions also has a strong influence on the efficiency and codon-context sensitivity of the tRNA (17, 18).

In this review, I place emphasis on what we have learned about the synthesis and function of modified nucleosides from the analysis of mutants of E. coli and S. typhimurium deficient in tRNA modification. The order in which I discuss these aspects follows the chronological order in which the corresponding mutants were isolated, as shown in Table I. Primarily, results obtained from in vivo analyses are discussed, but they are related to results obtained from in vitro analyses. Because tRNAs from all organisms contain modified nucleosides, and some of them are in fact conserved in the tRNAs from different domains, knowledge from analyses of bacterial model systems will no doubt have general biological implications. Several reviews (19-30) have appeared in this series that give an interesting insight into the development of this field following the discovery of the modified nucleosides. The reader is also referred to a number of reviews that deal with specific aspects of tRNA modification (25, 26a,b, 29-38) or that give a general overview of this fascinating subject (17, 18, 20, 39).
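The genome-allocation estimate in the introduction is simple arithmetic, and a short calculation makes the inputs explicit. The sketch below uses the figures quoted in the text (about 45 enzymes at roughly 1 kb each; 79 tRNA genes at roughly 150 nucleotides each). The genome size of about 4700 kb is an assumption supplied here for illustration, not a value given in the text, and the function name is hypothetical.

```python
# Back-of-envelope check of the genome-allocation estimate.
# From the text: ~45 modifying enzymes, ~1 kb per structural gene,
# 79 tRNA genes, ~150 nucleotides per tRNA gene.
# GENOME_KB is an ASSUMPTION (approximate E. coli genome size), not from the text.
GENOME_KB = 4700.0

N_ENZYMES = 45
GENE_KB = 1.0
N_TRNA_GENES = 79
TRNA_GENE_KB = 0.150


def allocation(n_genes, kb_per_gene, genome_kb=GENOME_KB):
    """Return (total kb, fraction of genome) for a set of genes."""
    total = n_genes * kb_per_gene
    return total, total / genome_kb


enzyme_kb, enzyme_frac = allocation(N_ENZYMES, GENE_KB)
trna_kb, trna_frac = allocation(N_TRNA_GENES, TRNA_GENE_KB)
ratio = enzyme_kb / trna_kb

print(f"modifying enzymes: {enzyme_kb:.1f} kb ({100 * enzyme_frac:.2f}% of genome)")
print(f"tRNA genes:        {trna_kb:.2f} kb ({100 * trna_frac:.2f}% of genome)")
print(f"enzyme/tRNA ratio: {ratio:.1f}")
```

With these inputs the enzymes account for roughly 1% of the genome and the tRNA genes for about 0.25%, so the ratio comes out close to fourfold.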
I. Mutants (hisT) Defective in the Synthesis of Ψ in Positions 38, 39, and 40

A. Isolation of the hisT Mutant and Organization of the hisT Operon

Pseudouridine (Ψ) was the first modified nucleoside detected in tRNA. It was found in alkaline hydrolyzates of calf-liver RNA as a new nucleotide (40) and its structure was established later (41-44a). It is the most common modified nucleoside, and it is present not only in tRNA (10, 26b) and rRNA (44b, 44c) from all organisms but also in many other small RNAs (45). The hisT mutant, which lacks Ψ in the anticodon region, was the first mutant isolated (46) that was later shown to be deficient in tRNA modification (15). Cells of S. typhimurium grown on minimal medium have an almost fully repressed his operon and grow well in the presence of the histidine analog 1,2,4-triazole-3-alanine (TRA), although this analog acts as a false corepressor. However, if another histidine analog, e.g., 3-amino-1,2,4-triazole, which inhibits the activity of one of the histidine biosynthetic enzymes, is added in conjunction with TRA, wild-type cells cannot grow, and only derepressed his mutants survive such a selection scheme (46). Several different classes of derepressed his mutants have been isolated, and one of them, hisT, was later shown to lack Ψ in the anticodon region of tRNAHis (15). The discovery that tRNAHis from the hisT mutants accepts histidine as well as wild-type tRNAHis does, both in vivo (47) and in vitro (48), and that the hisT mutant still has a derepressed his operon, were central contributions in unraveling the mechanism by which the attenuation-mediated regulation of the his operon occurs (49). The hisT gene is the structural gene for the tRNA(Ψ38,39,40)synthetase (also denoted tRNA pseudouridine synthetase I) (15, 50, 51), which catalyzes the formation of Ψ38, Ψ39, and Ψ40 in 17 of the 40 tRNA species in E. coli
and S. typhimurium (10). The hisT gene lies close to the purF gene in Salmonella (Fig. 2) (46) and in E. coli (53). The hisT gene from E. coli was cloned by complementing a hisT mutation in an S. typhimurium strain (54); it was then sequenced (55, 56), and the complex organization and transcription pattern of the hisT operon were determined (56-59). This operon may consist of up to six unrelated genes, and an overview of its organization and tentative transcription pattern is shown in Fig. 3 (see 17 for a more extensive discussion). The complex organization and transcription pattern of the hisT operon may have implications for the effects observed on the metabolism of hisT mutants (see Section I,C). Interestingly, the hisT gene is translationally coupled with the upstream asd gene, the product of which is, however, made in 10-14 times higher amounts than is the HisT peptide and is structurally, functionally, and evolutionarily unrelated to the HisT peptide (55). None of the other peptides encoded by the hisT operon is related to the HisT peptide or tRNA biosynthesis (58).

TABLE I
ISOLATION OF MUTANTS OF BACTERIA DEFECTIVE IN tRNA MODIFICATIONᵃ

Year | Nucleoside | Gene | Procedure | Map locationᵇ (min) | Method used to score the mutant phenotype | Ref.
1966 | Ψ38, 39, 40 | hisT | Selection | 50 | Derepressed his operon | 46
1970 | m5U54 | trmA | Screening | 89 | In vitro methylation of tRNA | 105
1975 | m7G46 | trmB | Screening | 6-8 | In vitro methylation of tRNA | 166
1975 | mnm5s2U34 | trmC | Screening | 50 | In vitro methylation of tRNA | 166, 167
1976 | s4U8 | nuvA | Selection | 9 | Resistance to near-UV light | 198, 199
1977 | ms2i6A | miaA | Selection | 96 (Ec) | Derepressed trp operon | 238
1978 | m1G37 | trmD | Screening | 56 | In vitro methylation of tRNA | 167
1978 | Q34 | queA | Screening | 9 | TL chromatography of tRNA digest | 324
1980 | cmo5U34 | aro | Screening | c | TL chromatography of tRNA digest | 349
1982 | Q34 | tgt | Screening | 9 | TL chromatography of tRNA digest | 327
1982 | s4U8 | nuvC | Selection | 44 | Resistance to near-UV light | 174
1984 | mnm5s2U34 | trmE | Selection | 83 | Resistance to phage T4 containing a UAG codon in the lysozyme gene and an ochre suppressor | 168
1984 | mnm5s2U34 | trmF | Selection | 83 | As above | 168
1985 | mnm5s2U34 | asuE | Selection | 25 | Antisuppressor of supL-mediated suppression of a nonsense mutation in lacI; selection of Lac- colonies | 169
1986 | ms2io6A37 | miaA | Selection | [illegible] (St) | Antisuppressor of supF-mediated UAG readthrough in hisD; selection of salt-resistant colonies due to increased polarity of expression on downstream genes that, when highly expressed, cause salt sensitivity | 241
1988 | mnm5se2U34 | selD | Screening | 38 (Ec), 21 (St) | Scoring for mutants defective in the selenocysteine-containing FDH_H; isolation of acidifying colonies at anaerobic conditions | 188, 189
1993 | ms2io6A37 | miaB | Screening | 17 | Antisuppression of supF-mediated readthrough of UAG codon in lacI; screening for white colonies on X-gal plates | B. Esberg and G. Björk (unpublished)
1993 | ms2io6A37 | miaE | Screening | [illegible] (St) | HPLC analysis of tRNA digest | 247
1993 | ms2io6A37 | miaA | Screening | - | Derepressed vir genes in Agrobacterium | 301
1993 | s2C32 | stcA | Fortuitous | - | HPLC analysis of tRNA digest | Q. Qian and G. Björk (unpublished)
1993 | [illegible]A37 | trm[illegible] | Fortuitous | 56-61 | HPLC analysis of tRNA digest | Q. Qian and G. Björk (unpublished)
1993 | [illegible] | mtaA | Fortuitous | 6-8 | HPLC analysis of tRNA digest | Q. Qian and G. Björk (unpublished)

ᵃ The screening procedure is a method in which a phenotype is scored at random among a large population of cells; the phenotype that distinguishes a desired mutant in tRNA modification may be an ability of the tRNA to accept a methyl group in vitro or a lack of a modified nucleoside in tRNA as shown by HPLC or thin-layer chromatography of digested tRNA, or it may be detected by the color of bacterial colonies on indicator plates. The selection procedure is the application of a condition that allows growth of specific mutants, but not of the parental cells.
ᵇ Numbers indicate at which minute on the bacterial chromosome (0-100 min) the genes are located.
ᶜ aroA, 20 min; aroB, 75 min; aroC, 51 min; aroD, 37 min; aroE, 72 min.
FIG. 3. Overview of the genetic organization of structural genes for tRNA-modifying enzymes (modified from 298). A more detailed description of these operons and references can be found in 17. [Gene maps omitted; legible labels: trmA, trmC, trmD; scale bar, 1 kb.]

B. Function in Translation

The first hisT mutants were isolated as having a derepressed his operon (46). This operon is regulated entirely by an attenuation mechanism, in which the rate of translation of the seven his codons in the his leader mRNA determines the level of expression of the operon. Thus, a slow rate of translation of these seven his codons by the Ψ38- and Ψ39-deficient tRNAHis would result in a derepressed operon according to the model presented (49). This is supported by the observation that the peptide chain-elongation rate of β-galactosidase in vivo is reduced 23% in a hisT mutant, and that the effect is primarily on the rate of translation and not on the rate of transcription (60). A hisT mutation also affects the regulation of the leu and ilv operons (61, 62). These two operons are also regulated by an attenuation mechanism, in which the rate of translation of leu and ilv codons present in the leader mRNA is critical for the level of transcription (63-65). Accordingly, derepression of these operons in a hisT mutant is consistent with a reduced translation rate by Ψ-deficient tRNALeu and tRNAIle, respectively. In summary, these results show that tRNA lacking Ψ is less efficient in translating cognate codons, but they do not show which step in the translation cycle is affected. Note also that the codon context in these leader mRNAs is very unusual, with
several codons in a row all translated by the same tRNA species. Thus, the effect of Ψ deficiency may be augmented by these unusual codon contexts. However, to date there is no experimental evidence that lack of Ψ induces increased codon-context sensitivity (see below).

Different kinds of nonsense, missense, or frameshift tRNA suppressors can be used to monitor the activity of one specific tRNA in vivo. The supE amber suppressor, which is a tRNAGln species with a changed anticodon sequence, contains Ψ in positions 38 and 39 (66). The efficiency of this amber suppressor is reduced to 1/20th if it lacks the two Ψs in the anticodon region. The reduction is the same irrespective of whether the amber codon is followed by A or C, i.e., it is codon-context insensitive, at least with respect to the kind of nucleotide present 3' of the cognate amber codon. The supF amber suppressor, a stronger suppressor than the supE amber suppressor, is a tRNATyr with a changed anticodon sequence (66-70). The supF tRNATyr has a Ψ at position 39, i.e., in the anticodon stem, and its activity is reduced by only 40% and, like the supE-mediated suppression, in a codon-context-independent manner (71). Therefore, Ψ in the anticodon region, i.e., both in the stem and in the loop, improves the efficiency of the tRNA. Note also that the effect of Ψ deficiency is quantitatively different when Ψ is present only in position 39 (in the anticodon stem, as in tRNATyr(CUA)) or in both positions 38 (in the anticodon loop) and 39 (as in tRNAGln(CUA)).

It was noted that a hisT mutation also decreases the aminoglycoside-mediated suppression of several his or lac missense or nonsense mutations (72, 73), suggesting that Ψ deficiency also increases translation fidelity. One way to monitor translation fidelity is by starving cells for a required amino acid, which results in a severe misincorporation of the near-cognate amino acid.
Such a misincorporation can be followed by isoelectric focusing, provided that the amino acid to be misincorporated differs in charge from the "sense" amino acid (74, 75). The differently charged amino acids His and Gln share the same four-codon box (CAN), and the differently charged amino acids Asn and Lys share another four-codon box (AAN). All tRNAs reading these eight codons have Ψ in position 38 or 39 or both. On starvation for Asn (codons AAY) in a relA mutant, the near-cognate amino acid lysine (codons AAR) is misincorporated. The misincorporation is the same in hisT1504 and hisT+ cells. Starvation for His (codons CAY) results in misincorporation of Gln (codons CAR). However, this misincorporation is drastically reduced in hisT1504 cells (76). Whereas tRNAHis and the two tRNAGln species all have Ψ38 (in addition, tRNAHis and one of the tRNAGln species also have Ψ39), tRNALys and tRNAAsn have only Ψ39. Thus, the reduced misincorporation is correlated with the hypomodification of Ψ38. One explanation of these results may be that when cells are starved for His, the empty A-site is prone to accept the near-cognate tRNAGln. This event is drastically reduced when
tRNAGln lacks Ψ38, suggesting that this modification increases the activity of the tRNAGln. However, this is not the case for the Ψ39-containing tRNALys, because no effect of Ψ deficiency was observed during Asn starvation. Apparently, Ψ38 has a stronger influence than Ψ39 on the efficiency of the tRNA. This is consistent with the results obtained with the supF and supE amber suppressors discussed above, in which undermodification of the Ψ38-containing supE amber suppressor has a more severe effect on function than does undermodification of the Ψ39-containing supF suppressor.

The first step in the elongation cycle is the aminoacyl-tRNA selection step. Charged tRNA is bound to GTP and EF-Tu in a complex that binds to the A-site on the mRNA-programmed ribosome. An assay has been devised that measures this activity as competition between a frameshifting tRNA and the incoming ternary (or pentameric; [77]) complex (78) (Fig. 4). The synthesis of release factor 2 (RF2) requires a +1 frameshift during the translation of its mRNA (79). If the lacZ gene is fused into the +1 frame, the β-galactosidase activity is a measure of the frameshifting efficiency. The degree of frameshifting is dependent on the presence of a Shine-Dalgarno sequence upstream of the frameshifting site (80) and on the tRNA at the frameshifting point (81). The selection rate of the aa-tRNA·EF-Tu·GTP complex cognate to the sense codon just downstream of the frameshifting site influences the degree of frameshifting. The faster the selection of the cognate aa-tRNA for the test codon, the lower the chance of frameshifting, which results in a lower β-galactosidase activity. The relative rate of aa-tRNA selection for 29 different codons was reported using this assay system (78). We have used this "speedometer" assay system to evaluate to what extent Ψ in the anticodon region of leucine and proline tRNAs affects the tRNA selection step.
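The reasoning behind the "speedometer" assay can be illustrated with a toy kinetic model. The sketch below assumes, purely for illustration, that frameshifting and selection of the cognate ternary complex are two competing first-order events at the test codon, so that the frameshifted fraction is k_fs / (k_fs + k_sel). The function names and the rate values are hypothetical and are not taken from the cited work (78).

```python
def frameshift_fraction(k_sel: float, k_fs: float) -> float:
    """Fraction of ribosomes that shift into the +1 frame.

    ASSUMPTION: two competing first-order events at the test codon,
    ternary-complex selection (rate k_sel) and frameshifting (rate k_fs).
    """
    if k_sel < 0 or k_fs <= 0:
        raise ValueError("k_sel must be >= 0 and k_fs must be > 0")
    return k_fs / (k_fs + k_sel)


def relative_selection_rate(fraction: float) -> float:
    """Invert the model: return k_sel / k_fs for an observed frameshift fraction."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return (1 - fraction) / fraction


# Hypothetical rates in arbitrary units (not measured values).
K_FS = 1.0
fast = frameshift_fraction(k_sel=4.0, k_fs=K_FS)  # efficient selection
slow = frameshift_fraction(k_sel=2.0, k_fs=K_FS)  # less efficient selection

print(f"fast selection: {fast:.2f} of ribosomes frameshift")
print(f"slow selection: {slow:.2f} of ribosomes frameshift")
```

In this picture, a tRNA that is selected more slowly gives a larger frameshifted fraction and hence more β-galactosidase, which is the direction of effect the assay relies on. Real frameshifting kinetics may well be more complex than this two-pathway competition.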
Ψ is present in tRNALeu1 (positions 38 and 40), tRNALeu2 (position 38), tRNALeu3 (positions 38 and 39), and tRNAPro2 (position 38) (10). The codons cognate to these tRNAs were tested in strains differing only in the allelic state of the hisT gene. For all CUN leucine codons, the presence of Ψ in the cognate tRNAs increased the rate of selection of the corresponding tRNAs (82a,b). Therefore, assuming that the rate of selection at the test codon is the sole or the dominating factor in this competition assay between the frameshift event and the competing tRNAs, Ψ in the anticodon region plays a significant role at this step of the translation cycle.

Supporting evidence has also been obtained using another experimental system. The frameshift suppressor sufJ is a tRNAThr that has an extra nucleotide in the anticodon region and reads the quadruplet ACCN instead of the usual triplet ACC (83). Despite the fact that this tRNA does not have Ψ in the anticodon region, its efficiency is increased in a specific context by a mutation in the hisT gene (84). In this particular context, ACCCUGC, the competing tRNA is either the Ψ38-containing tRNALeu1, which reads CUG,
FIG. 4. The "speedometer" assay (78), which measures the rate with which the ternary [or pentameric (77)] complex successfully enters the A-site in competition with the ability of the tRNA in the P-site to shift into the +1 frame. Downstream of the test codons (NNN) a lacZ gene is fused in the +1 frame. In panel A, a successful binding of the entering ternary complex has occurred, the reading frame is maintained, and no β-galactosidase is synthesized from the lacZ gene. In panel B, binding of the ternary complex to the A-site (NNN codon) has not occurred, the tRNA in the peptidyl site has shifted into the +1 frame, and β-galactosidase is synthesized from the lacZ gene. An interaction between the anti-Shine-Dalgarno sequence in the 16S rRNA and the Shine-Dalgarno sequence present three nucleotides upstream of the codon in the peptidyl site (XXX) forces the tRNA to shift frame. Translation in the +1 frame encounters the lacZ gene further down, and β-galactosidase is produced (80). (Modified from 82a.)
or the Ψ38- and Ψ39-containing tRNALeu3, which reads CUR. It was suggested that the Ψ-deficient tRNAs are less efficient and thus compete less well with the sufJ frameshifting tRNA (ACCC), which results in increased frameshifting by the sufJ tRNAThr. These results are analogous to our results with the RF2 system and support the above notion that Ψ increases the efficiency of tRNALeu1,3. However, in the case of tRNAPro2, which contains Ψ in position 38 and which is the only tRNAPro that reads the CCC codon, no effect was observed when this codon was tested (82b). Thus, whereas Ψ38 alone or Ψ38 and Ψ39 in tRNALeu species affect the aa-tRNA selection rate, Ψ38 in tRNAPro2 does not, implying a tRNA species-specific influence of Ψ.

The above results show that Ψ in positions 38, 39, and 40 has an overall effect on the translation elongation cycle, which may be attributed at least partly to the effect exerted at the first step in the elongation cycle, the aminoacyl-tRNA selection step. Can these results be explained in molecular terms? What kind of structural alterations can the Ψ modification exert on the conformation of the anticodon that can explain these results? It has been known for some time that a helix of Ψ·A ribopolynucleotides has a higher melting temperature (Tm) than does a helix of U·A polymers (85) and that Ψ in a Ψ·A pair adopts the anti conformation (86) (Fig. 5). An anti U·A Watson-Crick base pair has the C(2) carbonyl of U anti to the ribose, and the N(3) imino and C(4) carbonyl groups are engaged in hydrogen bonding. Ψ in the anti conformation can form a structure whose topology is almost identical to that of U, and its C(2) carbonyl and N(3) imino groups form hydrogen bonds with A in a fashion similar to the hydrogen bonds between the C(4) and N(3) of U (Fig. 5). However, Ψ has an additional imino hydrogen at the nitrogen atom [N(1)], which in U is occupied by the glycosyl bond. This N(1) imino group has the potential to make a hydrogen-water bridge to an oxygen in the 5'-phosphate. Such an intramolecular bond makes the structure less flexible and more stable (87). Because Ψ in position 38 is not involved in Watson-Crick base pairing, the modification at this position may exert its stronger influence on the activity of the tRNA (Section I,B) by creating new bonds
FIG. 5. Structure of Ψ in the anti and syn conformations compared to the structure of U, which preferentially adopts the anti conformation. The potential Watson-Crick base-pairings to A are depicted by arrows. (Modified from 87.)
within the anticodon loop that stabilize it and make it less flexible and better adapted to interact with the mRNA in the A-site. Indeed, Ψ32, which is part of the anticodon loop of tRNAPhe, forms an internal hydrogen bond to a bridging water molecule or to a 2'-hydroxyl of a nearby nucleoside (86). An NMR or crystal analysis of a tRNA hypomodified with respect to Ψ38 should reveal how the Ψ modification in this position changes the conformation of the anticodon loop. Such an analysis may further reveal whether Ψ38 has the potential to make intramolecular bonds different from those of Ψ39, which may explain the observed differential effect of Ψ in positions 38 and 39 on the efficiency of the tRNA.
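The codon families used in the amino-acid starvation experiments of this section can be written out explicitly. The sketch below expands the degenerate codon symbols (Y = U or C, R = A or G, N = any base) and looks up which amino acids share a four-codon box. The table holds only the eight standard-code codons of the CAN and AAN boxes, and the helper names are illustrative.

```python
from itertools import product

# Degenerate nucleotide symbols used in the text (RNA alphabet).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "AG", "Y": "CU", "N": "ACGU"}

# Standard genetic code, restricted to the CAN and AAN codon boxes.
CODON_TABLE = {
    "CAU": "His", "CAC": "His", "CAA": "Gln", "CAG": "Gln",
    "AAU": "Asn", "AAC": "Asn", "AAA": "Lys", "AAG": "Lys",
}


def expand(pattern):
    """Expand a degenerate codon such as 'CAY' into its concrete codons."""
    return ["".join(bases) for bases in product(*(IUPAC[s] for s in pattern))]


def box_partners(amino_acid):
    """Amino acids encoded in the same four-codon box as `amino_acid`."""
    boxes = {codon[:2] for codon, aa in CODON_TABLE.items() if aa == amino_acid}
    return {aa for codon, aa in CODON_TABLE.items()
            if codon[:2] in boxes and aa != amino_acid}


print(expand("CAY"), "->", {CODON_TABLE[c] for c in expand("CAY")})
print(expand("AAR"), "->", {CODON_TABLE[c] for c in expand("AAR")})
print("box partner of His:", box_partners("His"))
```

The box partner returned for each starved amino acid is the near-cognate amino acid whose misincorporation was followed: Gln during His starvation and Lys during Asn starvation.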
C. Metabolic Consequences of Ψ Deficiencies
Various perturbations of cellular metabolism also influence the modification of tRNA (17). Leucine starvation induces a deficiency of several modified nucleosides in tRNAPhe and tRNALeu, among them Ψ in positions 38 and 39 (88), which then parallels the effect of a mutation in the hisT gene. Unbalanced cellular metabolism, as such, is probably not the primary reason for the appearance of undermodified tRNA, because amino-acid limitation during balanced growth also induces similar undermodification of tRNA (89). Thus, such perturbation of cellular metabolism can induce altered modification of tRNA that, in turn, may alter the regulation of several metabolic pathways through the roles tRNAs have in attenuation-mediated regulation. In fact, mutations in the hisT gene invoke altered regulation of many amino-acid biosynthetic operons (52).

Mutations in the hisT gene may also alter the expression of some metabolic pathways in ways not easily reconciled with an attenuation-mediated regulatory mechanism. On starvation or growth in a leucine-limited chemostat, the ilv and leu operons are derepressed 13- to 117-fold (i.e., 13- to 117-fold more enzyme molecules are synthesized from the operons), depending on which enzyme is monitored (90). However, in the hisT1504 mutant, the derepression is less pronounced and is only 4- to 27-fold for the corresponding enzymes. These results may be caused by bypassing the attenuation control or by altering it. Analysis of strains deleted for the attenuator should distinguish between these two alternatives. Unfortunately, as yet no such test has been made. Other mutations have been characterized that influence the level of expression of these operons independently of the attenuation-mediated regulation (91). Therefore, the hisT-mediated inhibition of derepression may be another such example.
For instance, hisT may influence the level of ppGpp under the investigated conditions (see below) and thus indirectly cause these abnormalities in the derepression of the ilv and leu operons.

Another effect not easily reconciled with a tRNA-mediated attenuation mechanism is how Ψ deficiency influences the synthesis of ppGpp. Starvation of a stringent (relA+) hisT mutant for histidine does not provoke ppGpp synthesis (92). In contrast, in the same hisT mutant, synthesis of ppGpp occurs on starvation for serine, threonine, or arginine, whose cognate tRNAs do not contain Ψ38, Ψ39, or Ψ40. This anomaly may be due to the fact that the Ψ-deficient uncharged tRNAHis, which normally contains Ψ38 and Ψ39, either cannot bind to the ribosomal A-site (a prerequisite for the relA+-mediated ppGpp synthesis) or, if bound, cannot activate the RelA protein to synthesize ppGpp. Nevertheless, histidine starvation of a hisT mutant leads to an abrupt cessation of stable RNA synthesis in a relA+-dependent manner. Thus, histidine starvation of a hisT mutant provokes the stringent response without ppGpp accumulation (92). However, starvation of a relA+ hisT mutant for lysine, whose cognate tRNA contains Ψ39, results in accumulation of ppGpp (93). Apparently, Ψ38, normally present in tRNAHis, is the key position where the presence or absence of a Ψ determines whether synthesis of ppGpp occurs. As noted in Section I,B, the effect caused by a Ψ38 deficiency on the activity of suppressors as well as on the rate of aminoacyl-tRNA selection is also more pronounced than the effect of a Ψ39 deficiency.

When Salmonella hisT mutants are tested for growth on arginine or proline as nitrogen sources and glucose as the carbon source, the hisT mutants grow more rapidly than wild-type cells (94). The activity of the ammonia-assimilatory enzyme glutamate synthetase is lower, but the activity of the glutamate dehydrogenase is higher, in the hisT mutant than in the isogenic wild-type strain when grown in a glucose-arginine medium. Furthermore, the transport of arginine is elevated in the hisT strain, which partly explains the more rapid growth of the mutant on glucose-arginine medium.
Although the molecular mechanism for these observations has not been established, the results suggest that undermodified tRNA or the hisT gene product may have a role in the regulation of some aspects of nitrogen metabolism.

Microcin B17 is a peptide that inhibits DNA replication. Its production depends on six plasmid-encoded genes. A chromosomal Tn5 insertion in the hisT gene results in a decreased production of this peptide (95). Using a lacZ insertion in the structural gene (mcbA) for the peptide, it was shown that the typical induction in stationary-phase cells is delayed in the hisT mutant. However, the β-galactosidase activity reaches the same level in the mutant as in the wild type, and it was suggested that the hisT effect on mcbA expression is too weak to explain the reduced synthesis of the microcin B17 peptide. The synthesis of other antibiotic peptides is also reduced in hisT mutant cells. The hisT-mediated effect on the synthesis of these antibiotic peptides was not explained at the molecular level, but most likely it is caused indirectly by a hisT-mediated effect on cellular metabolism.
Several frameshift and nonsense mutations have been isolated in the hisT gene, which suggests that the hisT gene product is not essential for growth even on minimal medium (50). However, hisT::Kmr knockout mutants in two E. coli K-12 strains, MG1655 and W3110, which are frequently used as wild-type E. coli control strains, show a prolonged lag when shifted from rich to glucose salt medium (96). The mutant cells also show a defect in cell division that results in filament formation. These phenotypes are reversed by the addition of uracil to the glucose salt medium, suggesting that the effect observed is the result of a hisT-mediated uracil requirement. It was suggested that Ψ-deficient tRNAs translate a stretch of codons in the pyrBI leader peptide more slowly, thereby reducing pyrBI expression, which results in uracil deficiency. Although this may be so, the uracil requirement on introduction of the hisT mutation may be specific for these two strains. The optimal expression of another gene (pyrE) involved in the synthesis of uracil, downstream from the gene for RNase PH, requires a close coupling between transcription and translation at the pyrE attenuator located in the intercistronic region. The two "wild-type" strains, MG1655 and W3110, used in the above-mentioned study, have a frameshift mutation in the end of the RNase PH gene that results in a very low level of pyrE expression; accordingly, growth is stimulated on addition of uracil to the medium (97). The introduction of the hisT mutation may augment this uncoupling between transcription and translation, resulting in a Ura- phenotype. Still, these results show how easily the metabolism of a cell can be perturbed by undermodification of tRNA, demonstrating the potential of tRNA modification as a regulatory device.
Deficiency of Ψ in the anticodon region can change the level of expression of many amino-acid operons through a reduced translation of the leader mRNA. However, a reduced translation of a structural gene may also, in some instances, influence the level of an enzyme. The 6-phosphogluconate dehydrogenase (6PGD), which is encoded by the gnd gene, participates in the conversion of 6-phosphogluconate to pentose 5-phosphate. The transcription of the gnd gene is regulated in a growth-rate-dependent manner, and the functional half-life or translational efficiency of the gnd mRNA is dependent on an “internal complementary sequence” (ICS) present between codons 67 and 78 (98). The ICS is complementary to the ribosomal binding site of gnd mRNA, including the Shine-Dalgarno sequence, and appears to function as a cis-acting antisense RNA, regulating the initiation of translation of the gnd mRNA (99). The growth rate inducibility of the 6PGD is reduced by one-third by a hisT mutation, but this hisT-mediated regulation is not dependent on the ICS (100). Furthermore, in miaA mutants, which lack ms2i6A37 (E. coli) or ms2io6A37 (S. typhimurium) in their tRNAs, the specific activity of the 6PGD is twice that in miaA+ cells, but the rate of accumulation of the
enzyme is still growth-rate dependent. Thus, two different undermodifications change the expression of the gnd gene. This effect is specific, because neither of them influences the level of another enzyme in the central metabolic pathway, glucose 6-phosphate dehydrogenase, the zwf gene product (100). Thus, although the mechanism(s) is not known, the level of tRNA modification can specifically influence the level of enzymes in the central metabolic pathways and thereby exert a regulatory role in cellular metabolism. In enterobacteria, there are several examples of genes that are grouped into operons and also involved in the same metabolic pathway. Such an organization ensures a coordinate regulation, which is generally believed to be the reason for such a grouping of genes into operons. However, there are also several examples of multifunctional operons, e.g., ribosomal operons (101). Although the reason for this other kind of grouping is not presently well understood, it may be that such an organization allows coordinate regulation under certain physiological conditions. This notion is supported by the fact that the organization of these operons is evolutionarily conserved. Alternatively, the organization reflects a possible involvement of the nonribosomal proteins present in these operons in the assembly of the ribosome (101). The hisT gene is part of a complex operon that also encodes a gene (pdxB) involved in the synthesis of pyridoxine (vitamin B6). This organization is evolutionarily conserved, suggesting a functional reason for this grouping (102) or that this gene has been brought into the bacteria by horizontal gene transfer. Pyridoxal phosphate, the end product of the pdx pathway, participates as a cofactor in the synthesis of nearly all amino acids. As noted above, different stress conditions, such as amino-acid limitation, result in undermodification of tRNA, and among the modified nucleosides lacking is Ψ, which is synthesized by the hisT gene product (88).
The level of hisT-mediated Ψ synthesis regulates the expression of many amino-acid biosynthetic operons through its capacity to modulate attenuation control (52). Thus, one reason for the grouping of pdxB and hisT in the same operon may be that both gene products participate in the global regulation of amino-acid biosynthesis (102).
II. Mutants (trmA) Defective in the Synthesis of m5U in Position 54
A. Isolation of trmA Mutants and Regulation of the Synthesis of tRNA(m5U54)methyltransferase
5-Methyluridine (ribothymidine, m5U) was first detected in RNA in 1958 (103) and was later shown to be part of the sequence T(m5U54)-Ψ-C-G (104)
in most eukaryotic tRNAs, in all tRNAs from E. coli, and very likely in all tRNAs from S. typhimurium, although only a few tRNAs from the latter species have been sequenced (10). As pointed out in my opening remarks, tRNA from the domain Archaea contains 1-methylpseudouridine (m1Ψ) in position 54. Because the spatial orientation of the methyl group in m1Ψ is the same as that of the methyl group of m5U54, a convergent evolution of a methyl group at this position of the tRNA may have occurred (13). The ubiquitous presence and the postulated convergent evolutionary requirement of this particular methyl group (13) imply an important functional role for it. It was therefore a great surprise when bacterial mutants (trmA) devoid of m5U54 were isolated and shown to be viable and amazingly healthy. Unlike the hisT mutant of S. typhimurium, which was selected as having a derepressed his operon, the trmA mutants of E. coli, which lack m5U54 in their tRNAs, were isolated by a screening procedure (105). It was anticipated that if tRNA lacks a methylated nucleoside in vivo, such tRNA would serve as a substrate in vitro provided that the corresponding tRNA methyltransferase is active in vitro and the proper methyl donor is present. Following mutagenesis, total RNA was prepared and methylated in vitro, using an enzyme extract from wild-type E. coli cells in the presence of the methyl donor AdoMet. Because several cultures were pooled, it was possible to screen a large number of RNA preparations for methyl group acceptor activity. Among 3000 clones tested, 10 clones whose total RNA accepted methyl groups in vitro were found; 7 of them were deficient in m5U54 in their tRNAs, and these mutants have been fundamental tools in the studies of the synthesis and function of this modified nucleoside. The trmA mutations were located on the E. coli chromosomal map (Fig. 2), and isogenic strains differing only in the allelic state of the trmA gene were constructed (106).
The trmA gene encodes the tRNA(m5U54)methyltransferase, and an overview of the trmA operon and its transcription pattern is shown in Fig. 3. The enzyme is present in very low amounts. Surprisingly, this lowly expressed trmA gene is regulated in a growth-rate-dependent manner, similar to the highly expressed rrn genes (encoding rRNA). The kinetics of accumulation of the enzyme when cells are shifted to higher growth rates is the same as for rRNA. On a shift to a lower growth rate, the synthesis of both tRNA(m5U54)methyltransferase and rRNA stops immediately, but the resumption of the accumulation of the enzyme significantly precedes that of rRNA. The expression of the trmA gene responds like rRNA to mutations in fusA (encodes EF-G), fusB (unknown function; a mutant allele results in a “reversed relaxed” phenotype at a nonpermissive temperature) (107), relA (encodes the stringent factor), cya (encodes adenylyl cyclase), and leuA (a temperature-sensitive leucine biosynthetic enzyme) (108, 109). However, the expression of the trmA gene responds to gene dosage (108, 110), whereas the rrn genes do not (111). Thus, the synthesis of tRNA(m5U54)methyltransferase and rRNA is similarly regulated under all physiological conditions tested and responds similarly to different genetic alterations, except gene dosage. The transcription of the rrn genes is governed by two promoters, P1 and P2; promoter P1 is growth-rate and stringently regulated and is the source of most of the transcripts (112-119). A molecular explanation for the similar regulation of expression of the trmA and the rrn genes is evident from the extensive similarity between the trmA and the rrn P1 promoters (120). As for the rrn P1 promoter, a (G+C)-rich region is located between the -10 Pribnow box and the +1 transcriptional start site of the trmA promoter. This region, which is called the “discriminator,” is shared by stringently controlled promoters (121). Furthermore, a putative FIS (factor for inversion stimulation) binding site is centered around base pair -56 upstream of the trmA promoter. The location of this FIS binding site differs from that in the rrn promoters, where it is centered around base pair -71. The FIS protein was first identified as a protein involved in site-specific recombination (122, 123), but was later shown also to be involved in the activation of rRNA and tRNA promoters (124, 125). Also, the sequence TCCC is located just upstream of the -10 region in the trmA promoter, in all seven rrn P1 promoters, and in some tRNA promoters, but not in any other promoter (120). Although the significance of this TCCC sequence in the trmA promoter is not clear, the TCCC motif in the rrnB P1 promoter controls both the level of expression and the growth-rate-dependent regulation, perhaps in conjunction with the “discriminator” (126, 127).
These features of the trmA promoter explain, at least partly, the similarity between the rrn and trmA genes regarding growth-rate-dependent (108, 120) and stringent regulation (109). Although the regulation of the synthesis of several proteins is similar to that of rRNA, to my knowledge the tRNA(m5U54)methyltransferase is the only one whose expression also seems to share a regulatory mechanism similar to that of rRNA. All other proteins for which the rate of accumulation of the gene product is similar to the rate of accumulation of rRNA, like most of the ribosomal proteins (128), use mechanisms distinct from that used to regulate the synthesis of rRNA. Why the expression of the trmA and the rrn genes is regulated similarly is not clear, but it does result in a constant amount of tRNA(m5U54)methyltransferase relative to that of bulk tRNA and rRNA. Although this regulation of trmA gene expression results in a balance of the levels of tRNA(m5U54)methyltransferase and bulk tRNA, each tRNA species, all containing m5U54, is differentially regulated (129). One possible reason for the coordinated regulation of the synthesis of tRNA(m5U54)methyltransferase with that of rRNA/tRNA could be that this enzyme has an unusual feature: it exists in two forms, the native form and a form covalently bound to rRNA/tRNA (see Section II,C) (130).
B. Function of m5U54 in Translation
When the trmA mutants were isolated, it was known that m5U54 is present not only in all tRNAs of E. coli but also in most eukaryotic tRNAs, implying that it was of utmost importance. Therefore, it was a great surprise that most of the mutants isolated in our screening procedure, which required that the mutants could grow, were defective in the synthesis of this ubiquitously present modified nucleoside (105). Of course, all these original mutants, which were isolated after heavy mutagenesis, could harbor compensatory suppressors. However, on analysis of isogenic pairs differing only in the allelic state of the trmA gene, no effect caused by m5U54 deficiency was observed on the growth rate as determined during nonrestricted growth (131). Special efforts were made to detect a small residual level of m5U54 in bulk tRNA, because a small subpopulation of m5U54-containing tRNA, which may be required for growth, might still be present in the mutant. However, no m5U (detection level 0.2% of the level in the wild-type tRNA) is present in the tRNA of the trmA5 mutant (132). Assays of m5U54 in individual tRNA species also support this conclusion (133, 134). Of course, it could be argued that a small subpopulation of tRNA containing m5U54 below this detection level is still present in the trmA5 mutant. If such a small level were still significant for growth, it would imply that fewer than 400 of the 200,000 tRNA molecules present in the cell (135) require m5U54 for their function. If so, the mutant TrmA5 peptide must distinguish these 400 tRNA molecules from the remaining more than 199,600. This is unlikely, because the identity element required for the tRNA(m5U54)methyltransferase is 11 nucleotides in the highly conserved TΨ loop (136). Furthermore, all tRNA species in Mycoplasma capricolum are devoid of m5U54 (137), showing that m5U54 in tRNA of this bacterium is not essential for growth.
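The figure of 400 molecules quoted above follows directly from the detection limit (0.2%) and the approximate number of tRNA molecules per cell (200,000). A minimal sketch of that arithmetic is given here; the function and constant names are illustrative assumptions and do not come from the original work.

```python
# Values taken from the text: detection limit of the m5U assay and the
# approximate number of tRNA molecules per E. coli cell.
DETECTION_LIMIT = 0.002      # 0.2% of the wild-type m5U level
TRNA_PER_CELL = 200_000      # tRNA molecules per cell (ref. 135 in the text)


def max_undetected_molecules(total: int, limit: float) -> int:
    """Upper bound on m5U54-containing tRNAs that could escape detection."""
    return round(total * limit)


undetected = max_undetected_molecules(TRNA_PER_CELL, DETECTION_LIMIT)
remaining = TRNA_PER_CELL - undetected
print(undetected, remaining)
```

The calculation assumes, for simplicity, one m5U54 per tRNA molecule in the wild type, so that a fraction of the wild-type m5U level translates directly into a number of modified tRNA molecules.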
Also, a yeast mutant (trm2) lacking m5U54 in its tRNA grows normally (138). Taken together, these results show that m5U54 in tRNA is not essential for growth. Direct measurements of many different physiological parameters, such as polypeptide chain elongation rate in vivo, growth of phage T4 and several amber, ochre, and UGA mutants of this phage (131, 139), efficiency of an amber suppressor tRNA in vivo (133), and aminoacylation in vitro and poly(U)-directed poly(Phe) synthesis in vitro (133, 134, 140) showed no effect of m5U54 in tRNA. However, lack of m5U54 in tRNALys from E. coli enhanced poly(A)-directed poly(Lys) synthesis (134). This effect is caused by a reduced binding to the A-site and an improved translocation step, whereas the formation of the ternary complex, the P-site binding, and the peptidyltransferase reaction are not dependent on m5U54 (134). Effects of m5U54 deficiency on translation in vitro with eukaryotic tRNAs have also been reported. On the one hand, m5U54-deficient wheat-germ tRNAGly is more active in a wheat-germ system than m5U54-containing tRNAGly (141); on the other hand, mammalian tRNAPhe lacking m5U54 is less active in overall poly(Phe) synthesis in a rat-liver system (142). Therefore, the presence of m5U54 may either increase or decrease the rate of the eukaryotic elongation cycle, depending on which tRNA it is part of. Because the level of m5U54 varies in individual eukaryotic tRNA species, the level of m5U54 may be a regulatory device in these organisms or may simply be nonessential. In an optimized translation system (143), m5U54-deficient tRNAPhe from the trmA5 E. coli mutant is also less efficient than wild-type tRNAPhe under nonsaturating conditions (20). Furthermore, when leucine misreading is monitored in an optimized poly(U)-directed poly(Phe) synthesis, lack of m5U54 in both tRNAPhe and tRNALeu induces a 10-fold higher level of misreading of leucine (134). The methyl group of m5U54 increases the stability of the tRNA (144), which may influence certain tRNA species differently and may partly explain the different results obtained when the activity of different tRNA species is monitored. Thus, m5U54 in vitro seems to influence (under some conditions and for some tRNA species) the elongation cycle, translational fidelity, and stability of the tRNA. These small effects observed in vitro may at least partly explain why the wild-type cell outgrows the trmA5 mutant within 20 generations in a mixed-population experiment (131). The difference in growth rate observed in a glucose-limited chemostat in a mixed-population experiment is only 4%, which, although small, is of evolutionary significance (G. R. Björk, unpublished results).
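To give a feeling for what a 4% difference in growth rate means in a mixed population, the following sketch computes how the ratio of wild-type to mutant cells would shift under a deliberately simple model. The model is an assumption made here for illustration (pure exponential growth, with the mutant growing at a fixed fraction 1 - s of the wild-type rate); it is not the analysis used in the cited experiments, and the function names are hypothetical.

```python
import math


def ratio_fold_change(s: float, generations: float) -> float:
    """Fold change in the wild-type/mutant ratio after a number of
    wild-type generations, assuming the mutant grows at (1 - s) times
    the wild-type rate: N_wt = 2**n and N_mut = 2**((1 - s) * n)."""
    return 2.0 ** (s * generations)


def generations_for_fold(s: float, fold: float) -> float:
    """Wild-type generations needed for the ratio to shift by `fold`."""
    return math.log2(fold) / s


# s = 0.04 corresponds to the 4% growth-rate difference quoted in the text.
print(round(ratio_fold_change(0.04, 20), 2))
print(round(generations_for_fold(0.04, 10)))
```

Under these assumptions the ratio shifts about 1.7-fold in 20 generations and 10-fold in roughly 83 generations, which illustrates why even a small growth-rate difference can be decisive over evolutionary time.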
Thus, the ubiquitously present m5U54 in tRNA is not essential for growth but influences the growth of the cell in a minor but beneficial way, and it influences the efficiency and fidelity of translation in vitro. As do most other bacteria, E. coli and S. typhimurium use formylmethionyl-tRNAfMet to initiate protein synthesis (145-147), and formylation depends on the presence of tetrahydrofolate. Streptococcus faecalis cannot synthesize folate or its derivatives de novo and consequently requires folic acid for growth. However, if all the end products of C1 metabolism are supplied in the medium, S. faecalis can grow in the absence of folic acid. The reason is that this bacterium can initiate protein synthesis without formylation (148, 149), provided that the nonformylated Met-tRNAfMet lacks m5U54 (150, 151). This is also true for tRNAfMet from another gram-positive organism, Bacillus subtilis (152). In most bacteria and in Eucarya, the methyl donor in the synthesis of m5U54 is AdoMet, but in S. faecalis and B. subtilis, the synthesis of m5U54 depends on a tetrahydrofolate derivative (153, 154).
Thus, tRNAfMet from folate-deficient S. faecalis and B. subtilis lacks both m5U54 and the formyl group, which results in the ability to initiate protein synthesis without formylation of the initiator tRNA. Because tetrahydrofolate is required for both the formylation of the initiator tRNA and the formation of m5U54, a metabolic link exists in these gram-positive organisms between C1 metabolism and translation. In E. coli, folate is synthesized de novo and, unlike in gram-positive organisms, the formation of m5U54 is dependent on AdoMet as the methyl donor. Still, it was of interest to see if mutants able to initiate protein synthesis without formylation could be isolated. Using a p-aminobenzoic acid (PABA)-requiring strain of E. coli, mutants were isolated that could grow in the absence of PABA (and thus tetrahydrofolate) (155). Such mutants are deficient in m5U54 of tRNAfMet, and they start protein synthesis without formylation of Met-tRNAfMet. Thus, m5U54 deficiency in E. coli may facilitate the initiation of protein synthesis without formylation, in agreement with what has been shown for S. faecalis. Because only one structural gene exists for tRNA(m5U54)methyltransferase (105, 156), the PABA-independent mutation may be in the trmA gene, but unfortunately the mutations were not mapped. Furthermore, a trmA5 mutant, unlike the trmA+ control, can grow in the absence of folate imposed by the addition of sulfathiazole and trimethoprim, which inhibit the synthesis of tetrahydrofolate. These results imply that the trmA5 mutant is also able to initiate protein synthesis without formylation (155). Thus, the level of m5U54 and the ability to formylate initiator tRNAfMet may be part of a regulatory interplay between C1 metabolism and translation, as discussed below and earlier (20).
The start of protein synthesis at the first cistron in a polycistronic mRNA uses dissociated ribosomal subunits and is only weakly dependent on formylated Met-tRNAfMet. On the other hand, reinitiation at downstream cistrons may occur by nondissociated ribosomes and, if so, it is strongly dependent on formylated Met-tRNAfMet (157). If the absence of m5U54 in tRNAfMet facilitates the start of protein synthesis with nonformylated Met-tRNAfMet, the level of m5U54 in tRNA may be part of a regulatory device to set the level of translation of downstream cistrons in a polycistronic mRNA. If so, the level of m5U54 regulates polarity. Although there are several examples of how different physiological conditions may alter the level of modification in tRNA (17), the level of m5U54 in E. coli tRNA seems to be very strictly regulated. So far, m5U54 has always been found in a proper amount in bulk tRNA as well as in some individual tRNA species, irrespective of which physiological stress conditions are imposed (see 17 for references). However, the level of m5U54 in tRNAfMet has not been assayed under different conditions, and it is quite possible that its level may vary as a function of the physiological state of the cell and consequently be part of a regulatory device. Such regulatory devices often
possess an element of specificity, and it is difficult today to envision such a specificity in this postulated regulatory interplay between C1 metabolism and translation. More detailed knowledge of the process of translation initiation may in the future reveal such a specificity in which the level of m5U54 together with the level of formylation of Met-tRNAfMet play a fundamental role.
C. The Structural Gene, trmA, for the tRNA(m5U54)methyltransferase Is Essential, Although the Catalytic Product, m5U54, of the Enzyme Is Not
In the course of our studies of the regulation of tRNA(m5U54)methyltransferase, we showed that the regulation of the peptide is similar to the regulation of the synthesis of rRNA. Using a trmA-cat transcriptional fusion on a plasmid, we also showed that this regulatory feature operates at the transcriptional level (120). However, we wanted to analyze this transcriptional regulation further by transferring the trmA-cat fusion to the chromosomal copy of the trmA gene. Taking into account the nonessential nature of m5U54 (discussed above), such a fusion should be viable and suitable for such regulatory studies. Furthermore, a viable Tn5 insertion in the trmA gene had been isolated earlier, which, as expected, shows a lack of tRNA(m5U54)methyltransferase activity in vitro and no m5U54 in the tRNA in vivo, strengthening our conclusion that m5U54 in tRNA is not essential for growth. Therefore, the trmA::cat fusion was first transferred to a λ phage of the Kohara library (158), and this phage was then used to transfer the trmA::cat fusion to the chromosome (159). However, viable cells containing a trmA::cat copy on the chromosome were obtained only if the recipient cells contained a wild-type copy of the trmA gene on a plasmid (160). This was very surprising, because the trmA operon is monocistronic (Fig. 3), and a polarity effect on downstream genes was ruled out as the cause of the observed effect. Because the phenotype is trans-complemented and m5U54 in tRNA is not essential, these results suggest that the trmA gene is essential for viability and that this is due to a function residing in the TrmA peptide or in the trmA mRNA. If the latter is true, trmA mRNA or part of it may have a catalytic activity or function as a structural component in a multimeric complex.
Apparently, insertion in different parts of the trmA gene results in different phenotypes, of which one is nonviability. The cat gene is inserted following codon 160, whereas the aforementioned Tn5 is inserted following codon 337. The essential function is therefore located between these two codons in the trmA mRNA or between the corresponding amino acids in the TrmA
peptide. Southern blot analysis of the seven trmA mutations isolated earlier (105) revealed that none contains any deletions or insertions, suggesting that they all are base substitutions (156). The tRNA(m5U54)methyltransferase is present in vivo both as a native polypeptide of 42 kDa and as a TrmA-RNA complex of two sizes, a 54- and a 62-kDa complex (130). The 54-kDa TrmA-RNA complex contains the 3’ end of the 16S rRNA and some undermodified tRNA, whereas the 62-kDa TrmA-RNA complex contains undermodified tRNA. Western blot analyses using antibodies directed toward the native 42-kDa TrmA peptide revealed that the three kinds of TrmA peptides, including the TrmA-RNA complexes, are also present in the trmA5 mutant (160). This is consistent with the aforementioned suggestion that the essential function of the TrmA peptide is located between amino acids 160 and 337. Note that about 50% of the tRNA(m5U54)methyltransferase activity in wild-type cells is bound to RNA (130). The reason for the presence of these different forms of the TrmA peptide is unknown. Because the transfer of a methyl group from AdoMet to tRNA involves a covalent intermediate between Cys-324 and C6 of U54 in tRNA (162, 163), some of the covalently bound tRNAs may be such intermediates (130). However, we find this unlikely, because all three forms of the TrmA peptide are present in the catalytically inactive trmA5 mutant in the same ratio as in the wild type (160). Furthermore, the methylation is a rapid process (136), and only a small proportion (not 50%!) of the TrmA peptide should be present as such an intermediate. There is an extensive homology between the TΨC-arm of all E. coli tRNAs, which is the recognition target for the tRNA(m5U54)methyltransferase, and the last stem-loop of the 16S rRNA (8 out of 10 bases are identical). Therefore, the same binding site could be used in the binding to U54 of the tRNA and the binding of a uridine in the 16S rRNA.
Addition of either of the two substrates (AdoMet or tRNA lacking m5U54) to the TrmA-16S rRNA complex did not result in any release of 16S rRNA. Therefore, the covalently bound piece of 16S rRNA is probably not linked through the Cys-324 catalytic nucleophile, but to some other part of the TrmA peptide. This would allow the complex to be catalytically active with respect to the formation of m5U54 in the tRNA. Because these three forms of the TrmA peptide are all present in the viable trmA5 mutant, we speculate that the essential unknown function of the TrmA peptide is associated with the ability to form these TrmA-RNA complexes. Perhaps these observed TrmA-RNA complexes are part of a ribosomal component involved in the assembly of the ribosome or in RNA maturation, e.g., as an RNA chaperone or an RNA helicase. Future analysis of suitable conditional mutations in the trmA gene may reveal the unknown but essential function of the TrmA peptide.
III. Mutants (trmC, trmE, asuE) Defective in the Synthesis of mnm5s2U34
A. Isolation of Mutants and the Stepwise Synthesis of mnm5s2U34
The modified nucleoside 5-methylaminomethyl-2-thiouridine (mnm5s2U34) was first identified in tRNA in 1968 (164). It is present in the “wobble” position of the single tRNALys species and in all three tRNAGlu species present in E. coli (10). Furthermore, it is most likely present in one of the two tRNAGln species, but definite identification of the wobble nucleoside in this tRNA has not been established (10). Derivatives of it, cmnm5Um in tRNAkzA and mnm5U in tRNA#$, are present in other tRNA species and may be synthesized by the same enzymes that participate in the synthesis of mnm5s2U34. Similar modifications, i.e., 2-thio- and 5-substituted uridines (xm5s2U; x can be, e.g., H or CH,NH, groups), are found in the corresponding tRNA species from mitochondria, in tRNA from the two domains Bacteria and Eucarya, and perhaps also in tRNA from the domain Archaea (6). Thus, mnm5s2U34 belongs to a group of modified nucleosides present in all organisms, and the structural change imposed on tRNAs by these kinds of modifications seems to be similar, suggesting that their function may be the same irrespective of the origin of the tRNA. During the course of isolating DNA methyltransferase mutants (165) with a method like ours (105), several tRNA methyltransferase mutants were also isolated (166). One such tRNA methyltransferase mutant (trmC1) is defective in the synthesis of mnm5s2U34. Using another in vitro screening procedure (see Section VI), we isolated another mutant (trmC2), also defective in the synthesis of mnm5s2U34 (167). This modified nucleoside is present in the phage T4 ochre and amber suppressor psu2+. By selecting for survivors following infection by phage T4psu2+ of a bacterial strain that also is arg(UAG) and lacZ(UAG), rare bacterial mutants that concomitantly had become Arg- and Lac- were isolated.
These antisuppressor mutants (trmE and trmF) contain s2U34 instead of mnm5s2U34 in their tRNA (168). Screening for mutants with a reduced supL (tRNALys)-mediated suppression of a lacI nonsense mutation resulted in several classes of antisuppressors, of which one is asuE (169). These asuE mutants are apparently deficient in the thio group of mnm5s2U34 (169) and have instead mnm5U34 in their tRNA (T. G. Hagervall and J. McCloskey, unpublished results). The trmC1 and the trmC2 mutations map at the same location on the chromosome (Fig. 2), and tRNA from both mutants accepts methyl groups in vitro and generates the same product, mnm5s2U (167). However, in vivo, the tRNAs from these two trmC mutants contain two different compounds, cmnm5s2U
(trmC1) and nm5s2U (trmC2) (170, 171). Cloning of the trmC+ gene and purification of the tRNA(mnm5s2U34)methyltransferase established that these two mutations are indeed in the same gene and thus that the TrmC peptide has two different enzymatic activities (171, 172). The trmC1 mutation generates a UGA stop-codon early in the gene at position UGG-131; the trmC2 mutation results in a missense mutation in the putative AdoMet binding site (T. G. Hagervall and P. M. Wikstrom, unpublished results). Both mnm5s2U and cmnm5s2U are present in wild-type cells, implying either that the latter is the final product in some tRNA species or that the conversion from cmnm5s2U to the final product mnm5s2U34 is slow, resulting in the presence of this intermediate in the tRNA. Because a purified TrmC peptide converts cmnm5s2U present in trmC1 tRNA to nm5s2U in the absence of AdoMet, this step can precede the methylation step of nm5s2U to mnm5s2U. The TrmE peptide must catalyze a step prior to the steps catalyzed by the TrmC peptide, because the tRNA of a trmE mutant contains s2U and no modification in position 5 of U (168). The thiolation reaction, catalyzed by the AsuE peptide, is independent of the modification at position 5, based on the presence of mnm5U in an asuE mutant (T. G. Hagervall and J. A. McCloskey, unpublished results). Therefore, these genetic and biochemical analyses suggest that the biosynthetic pathway is as follows:
(Several biosynthetic steps may be involved, although only one arrow is shown. Although mutant alleles are named, the enzymatic steps are performed by the corresponding wild-type activity. The named genes are scattered on the chromosome; see Fig. 2.) A trmE mutant has s2U34; tRNA from this mutant is not a substrate for a methylating enzyme using AdoMet as methyl donor (168). Moreover, starvation for methionine in a relA1 mutant results in the accumulation of tRNAs containing both cmnm5s2U34 and nm5s2U34 but not s2U (171). Consequently, only one of the two carbon atoms present in the side chain of mnm5s2U34 originates from the methyl donor AdoMet (171). The methylene group may come from tetrahydrofolate (173). If so, the TrmE peptide may be a tRNA methyltransferase using a tetrahydrofolate derivative as donor, similar to the tRNA(m5U54)methyltransferase of gram-positive organisms (153, 154). The origin of the N atom is not known. Obviously, many different building blocks are required for the synthesis of the mnm5 side chain. Its synthesis is complex and may require more enzymatic steps than depicted above. There are several kinds of thiolated nucleosides in tRNA, for which the sulfur originates from cysteine. Moreover, this thiolation reaction proceeds by a different pathway, because the nuvC mutant (see Section IV,A) lacking s4U8 has a normal level of mnm5s2U34. Therefore, the putative common activated sulfur compound required for the thiolation to s4U8 and a step in the synthesis of thiamine (174) is not the primary sulfur donor in the synthesis of mnm5s2U34.
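The genetic argument of this section can be summarized as data: each mutant accumulates a characteristic wobble nucleoside, and the features that are missing indicate which step is blocked. The sketch below records the nucleosides named in the text and extracts two features from their abbreviations. The parsing helper and its name are illustrative assumptions that rely only on the abbreviation convention used in this chapter (a prefix before "5" for the side chain at position 5, "s2" for the 2-thio group); the sketch is not a reconstruction of the pathway scheme itself.

```python
# Wobble nucleoside found in tRNA of each strain, as stated in the text.
WOBBLE_NUCLEOSIDE = {
    "wild type": "mnm5s2U",
    "trmC1": "cmnm5s2U",
    "trmC2": "nm5s2U",
    "trmE": "s2U",
    "asuE": "mnm5U",
}


def describe(nucleoside: str) -> dict:
    """Split an abbreviation into its position-5 side chain and 2-thio status.

    Hypothetical helper: assumes the chapter's naming convention only.
    """
    side_chain = nucleoside.split("5")[0] if "5" in nucleoside else None
    return {"side_chain": side_chain, "thiolated": "s2" in nucleoside}


for strain, nucleoside in WOBBLE_NUCLEOSIDE.items():
    print(strain, nucleoside, describe(nucleoside))
```

Read this way, the asuE entry (complete mnm side chain, no thio group) and the trmE entry (thio group, no side chain) show directly why thiolation and side-chain synthesis are considered independent of each other.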
B. Function of mnm5s2U34 in Translation
The mnm5s2U34 nucleoside is present in tRNAs that read codons ending with purines in “split” codon boxes (more than one amino acid is encoded by codons in one codon box, e.g., the CAN box in which CAY encodes His and CAR encodes Gln). Therefore, the major function of mnm5s2U34 may be to prevent misreading of the codons ending with pyrimidine nucleosides (Y) in these codon boxes, because such misreading would result in the incorporation of a wrong amino acid. Although this hypothesis has not been experimentally tested in vivo, it was recognized that the binding properties of tRNAs with mnm5s2U in the wobble position are not consistent with the wobble hypothesis (175), because in vitro these tRNAs bind to or translate codons ending with A in preference to codons ending with G (176, 177). Based on a conformational analysis of such nucleosides, nucleotides, or dinucleotide derivatives, a model has emerged that explains their base-pairing properties. The presence of a sulfur at position 2 of uridine or a methyl group at the 2’ position of the ribose enhances the intraresidual steric repulsion between the groups in these two positions of the nucleoside in the C2’-endo form and therefore stabilizes the C3’-endo form (178-180) (for a recent review, see 18). In the stabilized C3’-endo conformation, base-pairing with U and G is prohibited (180). On the other hand, modifications in position 5 may (180) or may not (178, 179) enhance the stability of the C3’-endo form. This matter is not yet settled. Furthermore, the s2 group per se stabilizes the anticodon-codon interaction by changing both the stacking interaction and the flexibility of the base (181). All hypomodified forms of mnm5s2U34 decrease the efficiency of suppressor tRNAs. As mentioned above, tRNA from the trmC1 and trmC2 mutants contains cmnm5s2U34 and nm5s2U34, respectively.
The effect of these trmC mutations on the coding properties of an ochre suppressor (supG) derived from tRNALys, which in trmC+ cells has mnm5s2U in the wobble position, was studied with respect not only to the efficiency of the tRNA but also to how the undermodified forms of mnm5s2U34 influence the codon-context sensitivity of the tRNA (170). The efficiency of this suppressor in wild-type cells and in the trmC1 and trmC2 mutants was determined for UAG and UAA codons at six different sites in the lacZ mRNA. Both mutations render the tRNA more codon-context sensitive and reduce (by 25-70%) the supG-mediated UAG suppression somewhat more than the UAA suppression, i.e.,
the wobble reading of G is preferentially affected by the mnm5 side-chain. Therefore, the modification in position 5 apparently increases the efficiency to wobble toward G rather than decreases it as suggested by the above hypothesis. The trmE mutant, which has s2U as wobble nucleoside instead of mnm5s2U, was isolated as an antisuppressor of endogenous misreading of a lacZ(UAG) codon. The reduction is more than 90% at a UAG codon and 70% at a UAA codon, whereas UGA misreading is not affected at all (168). Provided that the misreading tRNA is one that normally has mnm5s2U34, these results are consistent with the results mentioned above in that the trmC1 and trmC2 mutations, which also affect the synthesis of the side-chain in position 5, preferentially decrease the reading of UAG. However, because the misreading tRNA is not known in the trmE mutant, a firm conclusion is not permitted. The observed effect on misreading may be indirect, because the degree of modification of the wobble nucleoside of the tRNA in the P-site may affect the misreading by the tRNA in the A-site through tRNA-tRNA interactions (182). However, tRNALys that contains s2U34 binds only 10% as well to AAG-programmed ribosomes compared to the fully modified tRNA, whereas this reduction is only 50% in the binding to AAA-programmed ribosomes (168). Thus, the mnm5 modification affects the efficiency of ribosomal binding and preferentially increases the binding toward G-ending and, to a lesser extent, toward A-ending codons. Thus, in recognizing both sense codons (AAR) and nonsense codons (UAR), as discussed above, the mnm5 side-chain increases the efficiency to wobble toward G. This is not consistent with the suggestion that the modification of mnm5s2U increases the rigidity of the nucleoside and induces a preferential base pairing with A at the expense of pairing with G.
However, if the main function of the mnm5 modification is to increase the stacking interaction of the wobble base and consequently to stabilize the anticodon-codon interaction, one would expect a larger influence on the weaker binding to a G-ending codon compared to the more efficient binding to an A-ending codon. Indeed, the mnm5 modification stabilizes the 3'-endo conformation much less than the s2 group (180) or not at all (178, 179). If so, the major function of the mnm5 modification is not to impose the selective reading toward A, but to increase the efficiency of the anticodon-codon interaction to both kinds of codons and preferentially to G-ending codons. The restrictive codon reading by tRNALys containing mnm5s2U34 is then mainly imposed by the s2 group. Though this may be so for the E. coli tRNALys, the presence of only a 5 modification and thus lack of an s2 group in the wobble nucleoside is known to restrict wobble in another tRNA (183). Therefore, the influence by the wobble base on the rigidity may be tRNA-specific. The asuE mutation results in mnm5U instead of mnm5s2U34 in the tRNA
(169) (T. G. Hagervall and J. A. McCloskey, unpublished results). It was originally isolated as an antisuppressor mutation toward supl-mediated amber suppression and it reduces the suppression of both amber and ochre codons (169). Unfortunately, no systematic studies of the impact of the asuE mutation on suppressor tRNA have yet been made. Such analysis should be revealing in our understanding of the function of the s2 group of mnm5s2U34. However, tRNAGlu lacking the s2 group of mnm5s2U34 recognizes GAG in vitro much better than GAA, although the normal tRNAGlu recognizes preferentially GAA (184). Although these experiments were performed with a mixture of thiolated and nonthiolated wobble-nucleoside-containing tRNAs, the results support the hypothesis that the s2 group imposes a restricted wobbling toward G. It is clear that mnm5s2U34 has a strong influence on the decoding efficiency of a tRNA. An anticodon-anticodon pair in which mnm5s2U34 pairs with A [lifetime (τ) = 1640 msec] is much more stable than when mnm5s2U34 pairs with G (τ = 6 msec) (185). Thus, a replacement of the mnm5s2U34·A pair with an mnm5s2U34·G pair reduces the lifetime considerably. In fact, this reduction in stability is much greater than that induced by a G·C to a G·U wobble pair replacement (τ reduced from 530 to 120 msec). This preferential binding of mnm5s2U34 to A in these model experiments is consistent with the findings that tRNAs containing mnm5s2U34 preferentially read codons ending with A. The s2 group enhances base-stacking mainly by its hydrophobic character, and lowers the flexibility of the thiolated nucleoside. This has been shown by lifetime measurements of tRNA·tRNA dimers in which an mnm5s2U34·A pair was compared with an mnm5U·A pair (181). Results from these model experiments are consistent with the antisuppressor phenotype of the asuE mutant. One would expect that the dethiolation should preferentially affect pairing with G compared to pairing with A.
Unfortunately, this aspect was not tested (181). Clearly, the in vivo data generated by using the different mnm5s2U34-deficient mutants have established that both the mnm5 and the s2 group of mnm5s2U34 improve the efficiency of the tRNA, but they have so far not verified the hypothesis that these modifications also impose a restrictive coding ability compared to an unmodified U. If the primary function of the modification of mnm5s2U34 is to prevent misreading, one would expect that a double trmE, asuE mutant, which should result in an unmodified U in the wobble position, would be lethal because of extensive misreading. One such mutant may have been isolated (186). This bacterial mutant is temperature-sensitive for growth and its ochre suppression, mediated by a phage-T4-encoded tRNA (psu2), is decreased. Moreover, the T4-encoded psu2 tRNA lacks a modified nucleoside, which may be mnm5s2U, in the wobble position. Although the T4-encoded tRNAs are also deficient in m2A37, Gm18, m1G37, and D19, 20, the level of
mnm5s2U34 in the bacterially encoded tRNAGlu is reduced in the mutant to 30% and the modification of tRNALeu1, which normally does not contain mnm5s2U34, is not affected. The temperature sensitivity of the bacterial cell, the efficiency of phage T4 psu2 ochre suppression, and tRNA modification are probably caused by a single genetic lesion. Lack of this putative mnm5s2U34 nucleoside in psu2 tRNA apparently affects both UAG and UAA readthrough in a codon-context-dependent manner. A more detailed analysis might have revealed a possible regulatory gene that influences not only the synthesis of mnm5s2U34 but also some other modified nucleosides. Although the relationship between undermodification and temperature sensitivity for growth is at best circumstantial, the temperature-sensitive phenotype of this bacterial mutant, which may have a completely unmodified U instead of mnm5s2U34 in some of its tRNAs, implies that lack of modification at both the 5 and the 2 positions of U34 in tRNAs that normally have mnm5s2U34 is detrimental to the cell. Therefore, construction of a trmE, asuE double mutant should give important information on whether this modification prevents misreading as predicted by the above-mentioned hypothesis. Also, direct measurements of misreading induced by the different mutations that affect the synthesis of mnm5s2U34 will be required to verify the model. Further ribosomal binding experiments with tRNA from the asuE mutant in conjunction with measurements of the tRNA selection step (as described in Fig. 4) in the asuE mutant would help to clarify the in vivo role of the s2 group.
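The misreading that such a double mutant is expected to show can be made concrete with a short sketch. The Python fragment below is an illustration only, assuming the standard genetic code; the function names (`codon_box`, `is_split`, `misreading_consequence`) are hypothetical and do not come from the cited work. For a given codon box it reports what the pyrimidine-ending (NNY) and purine-ending (NNR) codons encode, i.e., which amino acid would be inserted in error if a tRNA that normally reads NNR also read NNY.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_box(prefix):
    """Return {third base: amino acid} for the four codons sharing `prefix`."""
    return {third: CODE[prefix + third] for third in BASES}


def is_split(prefix):
    """A box is 'split' when its four codons do not all have the same meaning."""
    return len(set(codon_box(prefix).values())) > 1


def misreading_consequence(prefix):
    """Return (meaning of NNY codons, meaning of NNR codons) as sorted strings."""
    box = codon_box(prefix)
    pyrimidine_ending = "".join(sorted({box["U"], box["C"]}))
    purine_ending = "".join(sorted({box["A"], box["G"]}))
    return pyrimidine_ending, purine_ending


if __name__ == "__main__":
    # Boxes discussed in the text: CAN (His/Gln), AAN (Lys), GAN (Glu).
    for prefix in ("CA", "AA", "GA"):
        y, r = misreading_consequence(prefix)
        print(f"{prefix}N box: NNY -> {y}, NNR -> {r}, split = {is_split(prefix)}")
```

The sketch says nothing about how often such misreading occurs; it only shows why reading a Y-ending codon in these boxes would change the amino acid, which is the premise of the hypothesis that still awaits direct measurement.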
C. Replacement of the s2 Group of mnm5s2U34 by an Se2 Group Is Influenced by the selD Gene Product
When selenium is present in the growth medium for E. coli (187) and S. typhimurium (188), 40% of the mnm5s2U34 in the tRNA is replaced by the selenium derivative, mnm5Se2U34. In anaerobically grown E. coli, two peptides, both part of the formate dehydrogenase (FDH), contain selenium as selenocysteine. This enzyme exists in two forms, of which the nitrate-reductase-linked form (FDHN) is induced by anaerobic growth in the presence of nitrate, and the hydrogenase-linked form (FDHH) is induced anaerobically in the absence of external electron acceptors. Both forms have a selenium-containing subunit and the presence of selenium as selenocysteine is required for their activity. Mutants defective in FDH activity were isolated, and one class is defective in the gene selD (selA in S. typhimurium) (188, 189). Although the selA gene maps at 21 min on the S. typhimurium chromosome and the selD gene at 38 min on the E. coli chromosome, only a single gene in wild-type S.
typhimurium hybridizes to the selD gene of E. coli (190) (see Fig. 2). Because the selD gene of E. coli also complements the selA1 mutation in S. typhimurium, the two genes are functionally equivalent (190). The selD peptide catalyzes a selenium-dependent ATP-cleavage reaction generating the labile phosphoselenoate (H2PO3SeH) (191, 192). The selD mutants are defective in the incorporation of selenium as selenocysteine in the FDH peptides and in the formation of mnm5Se2U in tRNA (188, 189), showing that the labile phosphoselenoate acts as the Se donor for both pathways. Thus, in a selenium-containing medium, in the selD1 (denoted selA1 in 188) mutant of Salmonella, an ochre suppressor tRNALys derivative encoded by supG contains only mnm5s2U34, whereas in the wild type (selD+), 40% of the wobble nucleoside is mnm5Se2U34. The efficiency of this ochre suppressor to read UAG or UAA codons at four different sites in the lacZ mRNA was determined in selD1 and in selD+ strains (188). At all amber sites, the efficiency in the selD1 mutant is reduced to one-half irrespective of the mRNA context. Although UAA suppression is reduced in the selD1 mutant at one site, the UAA suppression is the same in wild type and mutant at two sites. Apparently, the presence of selenium instead of sulfur at position 2 of uridine improves the ability to wobble toward G, at least at two of the three sites tested, because the selD1 mutation imposes a 50% reduction in the reading of UAG but does not affect the reading of UAA. This is consistent with results obtained in vitro in triplet binding experiments (193). Transfer RNAGlu and tRNALys containing an s2 group preferentially bind to A-ending codons, whereas for the Se2-containing tRNAs, this preference is either reversed (tRNALys) or diminished (tRNAGlu). Because a Se2 group would be expected to form an even weaker hydrogen bond than an s2 group, these results seem surprising.
However, a greater ionization of the Se2 group compared to the s2 group, a differential effect of the mnm5 side-chain, or the larger atomic radius of selenium as compared to sulfur may contribute to the increased wobble interaction with G by the selenium-containing derivative (193).
IV. Mutants (nuvA, nuvC) Defective in the Synthesis of s4U in Position 8
A. Isolation of nuvA and nuvC Mutants and Synthesis of s4U8
Although the base s4Ura was chemically synthesized as early as 1908 (194), it was not until 1965 that its nucleoside, 4-thiouridine (s4U), was isolated as part of tRNA (195). This nucleoside, unlike most other nonthiolated nucleosides, has a spectrum that extends into the near-UV (300-400
nm; λmax of s4U is 334 nm). Irradiation in vitro of those tRNAs that have s4U8 and a C in the nearby position 13 results in a cross-link between these two residues (196); such cross-linking also occurs in vivo (197). Among near-UV-resistant mutants, some (nuvA and nuvC) are deficient in s4U8 (174, 198-200). A tRNA(s4U8)synthetase has been partially purified; it is composed of two subunits, “factor A” and “factor C” (201). The activity of these factors is reduced in nuvA and nuvC mutants (174, 202), suggesting that these two genes are the structural genes for the corresponding factors. The nuvA and nuvC genes, common to the synthesis of s4U8, are widely separated on the E. coli chromosome (Fig. 2). The NuvA protein converts the tRNA in an ATP- and Mg2+-dependent manner into an unidentified intermediate, perhaps an activated 4-hydroxyl group of U8. One might envision that the NuvA peptide is specific for the synthesis of s4U8 and is the protein that requires a specific interaction with the tRNA to activate only the U8 in the tRNA. Indeed, all other sulfur-containing nucleosides in tRNA are present in normal amounts in a nuvA mutant (199). The NuvC protein, a cysteine-s4U8 sulfur transferase, catalyzes the second step, which is transfer of a sulfide from cysteine to the putative activated ho4U8, resulting in the formation of s4U. This sulfur transferase requires bound pyridoxal 5'-phosphate as cofactor (203). The NuvC protein also participates in the synthesis of thiamine, because a mutation in nuvC also results in auxotrophy for this vitamin (174). Therefore, the NuvC protein, which transfers sulfur from cysteine to tRNA, may also transfer sulfur to an intermediate in the thiazole biosynthesis (174). Among all sulfur-containing nucleosides in tRNA, only the synthesis of s4U8 is mediated by the NuvC peptide (174).
The key event may be that the NuvC peptide generates a suitable activated small sulfur donor, which is transferred by the NuvC peptide to different activated targets. If so, the NuvC peptide may be analogous in some aspects to the SelD peptide, which also generates an activated donor molecule that participates in two different pathways, the selenation of mnm5s2U and of Ser-tRNASec. Thus, thiolation and selenation of a uridine in tRNA may proceed by similar mechanisms.
B. s4U8 as Sensor for Near-UV Light; Function in Translation
When bacteria are illuminated by near-UV light, growth stops well before cell death (204). However, this growth delay does not occur in bacteria (nuvA mutants) lacking s4U8 in their tRNA (188, 199, 205, 206). Such mutants are killed by broad-band near-UV more easily than the wild type, implying that s4U8 in tRNA protects the cell from such stress (200), although there are conflicting results using irradiation with monochromatic UV of 334
nm (207). On irradiation either in vivo or in vitro, s4U8 is cross-linked to the nearby nucleoside C13 (196, 197) and such cross-linked tRNA triggers the accumulation of ppGpp (205). Because this nucleotide inhibits cell growth and rRNA synthesis (208), the growth delay is at least partly due to ppGpp accumulation. Accordingly, mutants (relA) that are unable to accumulate ppGpp are also more sensitive to near-UV, showing that ppGpp at least partly protects cells from such stress (200). Moreover, the s4U8-C13 cross-link mediates the induction of some oxidative-stress proteins, ppGpp-inducible proteins, and the dinucleotide ApppGpp (200). This dinucleotide is synthesized by aminoacyl-tRNA synthetases in a back-reaction in which aminoacyl-AMP is condensed with ppGpp. Defective tRNAs, e.g., undermodified tRNAs, have been proposed as stimulators of such a back-reaction because of aberrant aa-tRNA-aaRS interactions (209) and, indeed, undermodified tRNALys does so (210, 211). The back-reaction may be stimulated by the formation of the s4U8-C13 adduct giving ApppGpp (200), because the nuvA mutant, which lacks s4U in its tRNA and cannot form the s4U8-C13 adduct, is defective in the accumulation of ApppGpp (200). Thus, s4U8 may act as a sensor of near-UV irradiation and mediate, through the synthesis of ppGpp and ApppGpp, the induction of specific proteins as a cellular response to this stress (200). Some modified nucleosides in tRNA, notably the thiolated or selenated nucleosides, may therefore be sensors for different irradiation or oxidative stresses and mediate the synthesis of different phosphorylated nucleotides through their interaction with the aminoacyl-tRNA synthetases (209). However, the correlation between induction of certain oxidative-stress proteins and adenylylated nucleotides has been questioned (212, 213). Still, the protective physiological response imposed by such a stress condition may be brought about by thiolated nucleosides in tRNA.
Prior exposure of cells to near-UV antagonizes the mutagenic effect of 254-nm UV and inhibits subsequent induction of the SOS response by irradiation with UV (reviewed in 214). The SOS response inhibits cell division and induces several DNA repair systems. This photoprotection by near-UV requires s4U in the tRNA (199, 214, 215). Although s4U8 is the most prevalent thiolated nucleoside in tRNA, there are several others, e.g., mnm5s2U34. Such thiolated nucleosides could also be part of the sensing mechanism for near-UV, because ppGpp is also accumulated on irradiation of a nuv mutant, although not to the same extent as on similar treatment of a nuv+ strain (206). Taken together, the results strongly support the hypothesis that s4U in tRNA and perhaps some other modified nucleosides act as sensors for near-UV and mediate the cellular responses to protect the cell from such stress. The nuvA mutant, which completely lacks s4U8 in its tRNA, grows at a
rate similar to that of the wild type (188, 198, 199). However, the level of s4U8 in the tRNAs varies dramatically at different bacterial growth rates (216). Five tRNA species are completely thiolated at all growth rates, whereas in another eight tRNA species, the fraction of s4U8 decreases at increasing growth rates. Neither the aminoacylation of several tRNA species nor the EF-Tu and the EF-G cycles for tRNAPhe in a poly(U)-primed translation system are affected by s4U8 (216). So far, the only function of s4U8 shown is its participation as a sensor for near-UV.
V. Mutants (miaA, miaB, miaE) Defective in the Synthesis of ms2i6A or ms2io6A37 in Position 37
A. Presence of ms2i6A and ms2io6A37 in tRNA of Different Organisms: Isolation of Mutants miaA, miaB, and miaE of E. coli and S. typhimurium
The nucleoside isopentenyladenosine (i6A), characterized in 1966 (217, 218; for early reviews see also 24, 219), and its closely related derivatives, such as N6-isopentenyl-2-thiomethyladenosine (ms2i6A) and the hydroxylated derivative of ms2i6A, N6-(cis-4-hydroxyisopentenyl)-2-methylthioadenosine (ms2io6A; also called 2-methylthio-cis-ribosylzeatin), are potent plant growth-factors (cytokinins) that promote cell division (220). In 1968, ms2i6A was identified in bulk tRNA from E. coli (221, 222) and is present adjacent to the anticodon in tRNATyr (223). The hydroxylated derivative of ms2i6A37 was detected in tRNA from plants (224) and from Pseudomonas aeruginosa (225) and other plant-associated bacteria (226). It was thought to be restricted to tRNA from plants and from bacteria associated with plants (227, 228), but was later also found in tRNA from S. typhimurium (229) and from several other Enterobacteriaceae (230; M. Buck, personal communication) but not in E. coli. The conformation of the ms2io6A present in tRNA is cis, whereas it is the trans-isomer of io6A, i.e., trans-ribosylzeatin, that is found as the free base (trans-zeatin) in plants (220). Therefore, a tRNA-independent pathway for cytokinin synthesis may exist (231). Indeed, in the S. typhimurium miaA mutant, which completely lacks ms2io6A in its tRNA, cytokinins are still present in the cytoplasm, which suggests that a tRNA-independent synthetic pathway also exists in bacteria (232). The trans-derivative of ms2io6A37 has also been detected in tRNA from the bacterium Azotobacter vinelandii (233). Soon after the discovery of ms2i6A37 in tRNA from E. coli, Nishimura
(234) suggested that such modified nucleosides are components of tRNAs that recognize codons starting with U. Cytokinin activity, and thereby the presence of i6A derivatives, of yeast and E. coli tRNAs is correlated to those tRNAs that recognize such a subset of codons (235, 236). Sequences of several tRNA species have indeed verified this hypothesis for E. coli and many other bacteria, and these kinds of modified nucleosides are, when present, always part of tRNAs that recognize UNN codons. However, corresponding tRNAs from another bacterium, Mycoplasma capricolum, do not contain any of these derivatives; instead, they have m6A37 or m1G37 in this subset of tRNAs (137). Although several eukaryotic tRNAs recognizing codons starting with U always have a hydrophobic modified nucleoside adjacent to the anticodon, it is not always an i6A-derivative (17). The synthesis of enzymes synthesizing tryptophan is regulated both by a repression and a transcriptional attenuation mechanism. If the trpR gene, which is the structural gene for the repressor, is deleted, the regulation of trp-operon expression is mainly dependent on the attenuation mechanism, which senses the availability of charged tRNATrp and the rate with which this tRNA traverses two UGG (Trp) codons in the trp-leader RNA (237). Among polarity suppressor strains isolated, one is denoted trpX (238). The trpX (now designated miaA) mutant is defective in the synthesis of ms2i6A37, and this mutant has an unmodified A37 instead in tRNATrp as well as in all other tRNAs that recognize codons starting with U (239). According to the model for transcriptional attenuation of the trp-operon, a retarded translation by tRNATrp results in a derepressed trp-operon. The miaA mutation derepresses the trp-operon four- to fivefold.
Lack of ms2i6A37, which does not influence the charging of tRNATrp (240), retards the rate of translation of the two UGG (Trp) codons in the trp-leader mRNA and thereby derepresses the trp-operon (238). The miaA1 mutant of Salmonella was isolated as an antisuppressor mutation toward the supF suppressor [tRNATyr(CUA)] (241). Salmonella typhimurium strains with a derepressed his-operon (e.g., containing the hisO1242 attenuator deletion) are unable to grow on minimal agar plates containing high salt concentration. The reason for this is not known but may be caused by the overproduction of the distal HisH and HisF peptides. An amber mutation in a promoter-proximal gene (hisD6404(UAG)) exerts a polar effect on the synthesis of the HisH and HisF peptides; such a strain can grow on high-salt agar plates provided that histidine is included in the growth medium. Introduction of a strong amber suppressor, such as supF30, renders such a strain salt-sensitive. The miaA1 mutant was selected as a salt-resistant mutant of a strain harboring both the hisD6404(UAG) mutation and supF30. This phenotype of the miaA1 mutant suggests that the miaA1 mutation lowers the
efficiency of the tRNATyr(CUA) encoded by the supF30 gene. As in the case of the miaA mutation in E. coli, the miaA1 mutation of S. typhimurium also results in an unmodified A37 in the tRNA (241), consistent with the fact that the miaA gene is the structural gene for the tRNA(i6A37)synthetase (242, 243). The miaA gene is part of a complex operon (Fig. 2) consisting of perhaps six genes, of which the first two genes encode proteins with unknown functions; the third gene, amiB, may encode a periplasmic N-acetylmuramoyl-L-alanine amidase (244); the fourth gene is mutL, which encodes a protein involved in mismatch repair; the fifth gene is miaA (243, 245a); and the sixth gene is hfq, which encodes the host factor, HF-I, required for phage Qβ RNA-directed synthesis (246a). The miaB1 mutation was also isolated as an antisuppressor but the isolation was based on a screening procedure. A strain of S. typhimurium that harbors an F’ plasmid containing the lacI gene fused to the lacZ gene with a maintained translational reading frame was used. In such a strain, the hybrid LacI-LacZ protein has β-galactosidase activity. The F’ plasmid used contained an amber codon in the lacI part and such a strain has no β-galactosidase activity and is white on X-gal plates. Introduction of the supF30 suppressor results in β-galactosidase activity and makes the colonies blue. Transposon mutagenesis on a strain harboring this F’ lacI(UAG)-lacZ plasmid and the supF30 mutation on the chromosome gave light blue colonies on X-gal plates and one was found to have i6A instead of ms2io6A37 in its tRNA (246b). The mutation is denoted miaB1 and is located at min 17, far from other genes involved in the synthesis of ms2io6A37 [the miaA (min 94) and the miaE (min 99; see below)] on the S. typhimurium chromosome (Fig. 2). As stated above, ms2i6A37 is present in E. coli tRNAs that recognize codons starting with U, except tRNASer (species I and V). In S.
typhimurium, the hydroxylated derivative of ms2i6A, ms2io6A, is present in the corresponding tRNA species that have ms2i6A in E. coli. Thus, one can consider E. coli as a naturally occurring ms2io6A37-deficient mutant of S. typhimurium. A S. typhimurium plasmid bank was introduced into E. coli and total RNA was prepared and digested to nucleosides. The digest was analyzed by liquid chromatography and screened for clones in which the ms2i6A had been converted into ms2io6A (247). Subcloning and sequencing of the plasmid, giving in vivo tRNA(ms2io6A37)hydroxylase activity, established that the second gene (miaE) of a dicistronic operon is necessary for hydroxylation of ms2i6A. This operon is close to the argI gene at min 99. The miaE gene is absent from E. coli, consistent with the lack of the hydroxylated derivative of ms2i6A in the tRNA. MudJ insertions in the Salmonella miaE gene were also characterized by a screening procedure.
B. The Stepwise Synthesis of ms2io6A37
The synthesis of ms2io6A37 in tRNA may occur in the following steps:
A37 --(miaA)--> i6A37 --(miaB)--> s2i6A37 --(miaC)--> ms2i6A37 --(miaE)--> ms2io6A37
[The miaD gene, whose product was postulated to be involved in a demodification step or in the regulation of the synthesis of the miaA gene product (245a), has been shown to be allelic to the structural gene (prfC) for release factor 3 (245b,c). Synthesis of the ms2 group requires iron, but it is not yet known if the MiaB or MiaC peptide catalyzes the reaction that requires iron. The last step occurs in S. typhimurium and in many other bacteria, but not in E. coli (229, 230). These genes are not clustered on the chromosome (Fig. 2)]. The 6-isopentenyl group (i6) is derived from mevalonic acid (248, 249; reviewed in 24), which is the precursor to more than a dozen end-products through its conversion to isopentenyl pyrophosphate (IPP; used in tRNA modification), farnesyl pyrophosphate (used in synthesis of heme A, ubiquinone, and farnesylated proteins, for example), and cholesterol (used in synthesis of steroid hormones, vitamin D, etc.) (250). The tRNA(i6A37)synthetase transfers an isopentenyl group from IPP to A37 in tRNA (251). The sulfur group comes from cysteine, the methyl group from AdoMet (252). The tRNA(i6A37)synthetase has been partially purified from yeast and E. coli
(253-255). The postulated biosynthetic pathway is derived from precursor analysis with methionine- or cysteine-starved E. coli cells (256) and analysis of mutants defective in the pathway (239, 241, 246b, 247). A mutation in the miaA gene results in the accumulation of unmodified A37, but no ms2A37. Overexpression in E. coli of tRNATyr on induction of a phage φ80Su3 lysogen results in tRNATyr containing either A37, i6A37, or ms2i6A37 (257). Again, no ms2A37-containing tRNATyr species were formed under these conditions. Thus, the MiaB enzyme has a strong requirement for the isopentenyl group. This conclusion is also consistent with some early experiments on the biosynthesis of ms2i6A (reviewed in 24, 219). However, the MiaB enzyme requires additional features, because the selenocysteine-inserting tRNA, which has several unusual structural features, lacks the ms2 group but does have the i6 group (258). The starvation of E. coli (rel, met, cys) for methionine (256) results in accumulation of a precursor to ms2i6A, which may be s2i6A37, whereas starvation for cysteine (256) or iron (259-261) results in i6A accumulation. Cysteine- or iron-starvation in S. typhimurium also results in the accumulation of i6A37 and only a small proportion (12%) is in the hydroxylated form (io6A37) (262). A mutation likely to be in the miaB gene of S.
typhimurium results in a tRNA containing i6A37, and again only a small amount of io6A37 is present (246b). Thus, the hydroxylation reaction in S. typhimurium requires the presence of the ms2 group, and the formation of this group requires the presence of the i6 group. Therefore, the modifying enzymes synthesizing ms2io6A act strictly sequentially and depend on modifications at other positions of the nucleoside.
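The strictly sequential order of the postulated pathway can be summarized as a small data structure. The Python sketch below is an illustration only: it assumes the gene order given in the scheme at the start of this section (miaA, miaB, miaC, miaE), treats each step as all-or-nothing, and therefore ignores the small proportion of io6A37 that is reported when the ms2 group is missing. The function name `accumulated_nucleoside` is hypothetical.

```python
# Ordered steps as postulated in the text: (gene, form of position 37 it produces).
PATHWAY = [
    ("miaA", "i6A37"),
    ("miaB", "s2i6A37"),
    ("miaC", "ms2i6A37"),
    ("miaE", "ms2io6A37"),
]


def accumulated_nucleoside(functional_genes, start="A37"):
    """Return the form of position 37 expected to accumulate.

    Walks the pathway in order and stops at the first step whose gene is not
    functional, because each enzyme is assumed to need the preceding product.
    """
    form = start
    for gene, product in PATHWAY:
        if gene not in functional_genes:
            break
        form = product
    return form


if __name__ == "__main__":
    all_genes = {"miaA", "miaB", "miaC", "miaE"}
    print("S. typhimurium wild type:", accumulated_nucleoside(all_genes))
    print("E. coli (no miaE):      ", accumulated_nucleoside(all_genes - {"miaE"}))
    print("miaA mutant:            ", accumulated_nucleoside(all_genes - {"miaA"}))
    print("miaB mutant:            ", accumulated_nucleoside(all_genes - {"miaB"}))
```

Under these assumptions the sketch reproduces the phenotypes described above: unmodified A37 in miaA mutants, i6A37 in the miaB mutant, and ms2i6A37 in E. coli, which lacks miaE.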
C. Function of ms2i6A37 and ms2io6A37 in Translation
The apparent correlation of a hypermodified nucleoside in position 37 and an adenosine in position 36 paired with the first U in the codon suggests that these hydrophobic modifications stabilize the intrinsically weak A36·U interaction (19). Such an effect would improve the efficiency of the tRNA, but would also decrease the possibility for A36 to wobble (31, 263), which would prevent a first-position misreading. Analysis of the lifetime of a tRNA·tRNA dimer with complementary anticodons containing the same G·C and A·U pairs, but differing in the degree of modification of A37, revealed that the presence of the ms2i6 group considerably stabilizes such complexes, mainly by improved stacking of the hypermodified nucleoside (264, 265). Furthermore, under these conditions this stabilization is mainly an effect of the presence of the ms2 group (264). This would imply that the ms2 group, and not the isopentenyl group, is the major determinant of the anticodon-codon interaction between tRNAs having the ms2i6A modification next to the anticodon. Thus, the results of these model experiments suggest that the modification of A37 to ms2io6A37 stabilizes the 3’ stacking feature of the anticodon and thereby improves its interaction with the codon.
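Dimer lifetimes of this kind are easiest to compare as fold changes. As a small worked example, the Python sketch below uses the lifetimes quoted in Section III,B for anticodon-anticodon pairs involving mnm5s2U34 (185); no lifetimes are given in the text for the ms2i6A case, so none are assumed here. The helper name `fold_change` is hypothetical.

```python
def fold_change(lifetime_before_ms, lifetime_after_ms):
    """Return how many times shorter the lifetime becomes after a replacement."""
    return lifetime_before_ms / lifetime_after_ms


# Lifetimes in msec as quoted in Section III,B (185): (before, after replacement).
REPLACEMENTS = {
    "mnm5s2U34·A -> mnm5s2U34·G": (1640, 6),
    "G·C -> G·U": (530, 120),
}

if __name__ == "__main__":
    for name, (before, after) in REPLACEMENTS.items():
        print(f"{name}: {fold_change(before, after):.1f}-fold shorter lifetime")
```

The comparison is purely arithmetic; it does not convert lifetimes into binding energies, which would need assumptions not stated in the text.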
1. INFLUENCE OF ms2io6A37 ON TRANSLATIONAL EFFICIENCY AND CODON-CONTEXT SENSITIVITY OF THE tRNA
The miaA (trpX) mutant of E. coli has a derepressed trp operon (238); the undermodified tRNATrp in this mutant slows down the movement of the ribosome at the tandem UGG-UGG Trp codons in the 14-codon-long leader-peptide-coding region and thereby increases the readthrough at the trp attenuator, which results in derepression (239). Similarly, ms2i6A37 also influences the expression of the tryptophanase operon through its impact on the efficiency of tRNATrp (266). Replacement of the Trp control sequence UGG-UGG in the trp operon with AGG-UGC (Arg-Cys) abolishes the miaA-dependent increase in transcriptional readthrough (267). This result is somewhat surprising, because tRNACys, which reads the UGC codon, normally has ms2i6A37 and, therefore, the miaA effect would not be expected to be entirely abolished. The explanation may be that an extremely slow translation of the rare AGG codon (267) is epistatic over a reduced translation of the UGC
codon due to a codon-context effect, or, less likely, that ms2i6A37 does not influence the efficiency of tRNACys in a manner similar to that of tRNATrp. The way the miaA mutants were isolated suggests that the efficiency of a tRNA lacking the ms2i6A37 or ms2io6A37 is reduced, which was also shown experimentally early on, both in vivo and in vitro (265, 268). In an effort to analyze not only the efficiency of the tRNA but also how the ms2i6A37 or ms2io6A37 influences the intrinsic codon-context sensitivity of the tRNA (269; reviewed in 270, 271), we combined genetically several E. coli and S. typhimurium amber suppressor tRNAs with F' lacI-lacZ plasmids, which have UAG codons at six different sites of the lacI mRNA, and the miaA1 mutation (68). At all sites tested, the efficiency is decreased by 47 to 99.4%, depending on sites and suppressors monitored. For the supC80 ochre suppressor, lack of ms2io6A37 also results in a reduction of the same magnitude as observed for the 10-times stronger supF30 amber suppressor (241). However, especially at one site in the lacI mRNA, at which there is a C 3' of the amber codon, the amber suppressor tRNAs lacking ms2io6A37 are all significantly more affected compared to suppression at other sites, suggesting that the undermodified tRNA is more codon-context sensitive than the fully modified tRNA (68). Although a role for a pyrimidine nucleoside (Y) 3' of the UAG codon was suggested in the codon-context effects, other features of the mRNA could not be ruled out. Similar experiments have also been performed using different allelic states of the miaB gene. The miaB mutation, and thus the presence of i6A37 instead of ms2io6A37 in the tRNA, reduces suppression by the supF30 [tRNATyr(CUA)] suppressor by between 34 and 85%, depending on the location of the UAG codon (71). Thus, the major effect on tRNA efficiency seems to be due to the presence of the i6 group and not the ms2 group.
This result is surprising, because the major stabilizing effect of ms2i6A in the tRNA·tRNA interaction by complementary anticodons was attributed to the ms2 group (264). However, our results obtained in vivo are consistent with some early in vitro suppression experiments (257). Overproduction of tRNATyr in E. coli results in the production of three types of undermodified forms, which have an unmodified A37, i6A37, or ms2i6A37, respectively (257). Compared to the fully modified tRNATyr, the undermodified form having A37 is almost unable to read UAG in vitro, whereas the i6A37-containing tRNA reads the UAG codon with about 50% of the efficiency of the fully modified tRNATyr. Furthermore, lack of the ms2 group decreases the poly(U)-directed poly(Phe) synthesis by 70-80% and the poly(U,C)-directed poly(Phe) synthesis by 50-60% (240). Interestingly, no effect could be attributed to the ms2 group when translation of naturally occurring mRNA was monitored in vitro (240). Taken together, these in vitro results suggest that the ms2 group affects the efficiency of the tRNA, but to a
lesser extent than does the i6 group, and also that this effect may depend on the codon context. Thus, the results obtained with the miaB mutant in vivo are fully consistent with results obtained in vitro. Apparently, the structure of the anticodon, during the translation-elongation cycle or as part of a ternary or pentameric (77) complex entering the A-site, is different from that measured as part of a tRNA·tRNA dimer, which was measured in the absence of ribosomes and elongation factors. The results above, which were obtained utilizing UAG mutations at different sites in the lacI mRNA, suggested that a tRNA lacking the ms2io6 group is particularly sensitive to a context with a C at the 3' side of the codon. Although indicative, no firm conclusion could be drawn, because other features of the mRNA could result in these codon-context effects. Therefore, a system was constructed in which the amber codon is located at the same site in the mRNA, in a codon context that differs only in the identity of the nucleoside 3' of the UAG codon (67) (Fig. 6). Downstream from the UAG codon in the S. typhimurium hisD gene, a MudK transposon (protein fusion) is inserted. Building strains with different suppressors and different degrees of modification of A37, we monitored the readthrough at this UAG as β-galactosidase activity of the hybrid HisD-LacZ peptide. Using two strains that differ only in the nucleoside 3' of the UAG codon, we analyzed the effect of this 3' nucleotide on the UAG readthrough. Compared to fully modified wild-type tRNATyr, the efficiency of suppression drops by 90%, by 50%, and not at all when the suppressor tRNAs are deficient in the entire modification, lack the ms2 group, or lack the hydroxyl group, respectively (Fig. 6).
These results with the miaA and the miaB mutants are similar to those obtained using the lacI-lacZ system mentioned above, but they also show that the hydroxyl group, which is lacking in the miaE mutant, does not influence the activity of tRNATyr (247). The ratio between the suppression at the two different contexts (UAG-A/UAG-C) is about 1.5 in the wild-type miaA+ cells for all three amber suppressors tested, but is >5 in the miaA1 background irrespective of the strength of the suppressor (67). Introduction of the miaB1 mutation, the tRNA of which has i6A37, also results in a higher ratio between the two contexts than in wild type, but a lower ratio than that obtained on introduction of the miaA1 mutation (71). These results are not due to a codon-context sensitivity of the release factor, because RF1 has similar affinity for UAG-A and UAG-C (272). The results are also not consistent with a tRNA-tRNA interaction on the ribosome, because the ms2io6A37 of the A-site tRNA is oriented toward the 5' side of the mRNA in the P-site and the 5' context is the same in the system used. Furthermore, structural changes on the 3' side of a tRNA have only minor effects on tRNA-tRNA interaction (182). As stated above, the ms2i6 modification stabilizes the anticodon-codon
[Figure 6 graphic: hisD-lacZ (MudK) fusion constructs with the UAG-A and UAG-C contexts; supF30 readthrough in wild-type, miaA1, miaB1, and miaE1 backgrounds.]
FIG. 6. Efficiency of an amber suppressor tRNA measured as readthrough of a UAG codon in the S. typhimurium hisD gene. A MudK transposon, which generates a translational fusion, has been positioned downstream of a UAG codon, which results in the production of a HisD-LacZ hybrid protein with β-galactosidase activity (67). Two different derivatives have been constructed in which the only difference is the nature of the nucleoside (C or A) 3' of the UAG codon. Different allelic states of the miaA, miaB, and miaE genes have been combined with these two hisD::MudK derivatives as well as with different amber suppressors. Results using the supF30 amber suppressor (tRNATyr) are shown. Note that the ratio of suppression at UAG-A and UAG-C is a comparison of the readthrough of strains differing only in which nucleoside is present 3' of the UAG codon.
interaction by improved stacking (265). Based on this fact and the anticodon-codon model (273) (Fig. 7), we (67) proposed a model in which the ms2i6 modification stabilizes the anticodon-codon interaction by improving stacking on A36 in the anticodon but also on the first base (U) of the codon (interstrand stacking). Furthermore, the wobble base (C34) stacks on the base (A or C in the above experiment) 3' of the codon. The free energy increments for an unpaired dangling base H, where H is A or C, are -1.1 and -0.4, respectively (274). Thus, the most stable anticodon-codon complex is one with a fully modified A in position 37 (ms2io6A37) and an A 3' of the codon. The least stable complex would be one with the unmodified A37 and a C 3' of the codon. A correlation between the stability of such anticodon-codon interactions and the efficiency of suppression was observed (67). The ratio between the suppression at UAG-A and UAG-C is higher (5) for the unmodified tRNAs than for the modified tRNA (1.5), but is the same irrespective of the
[Figure 7 graphic: stacking diagram with the ANTICODON and CODON strands labeled.]
FIG. 7. A model based on 67 and 273, showing the stacking pattern for an amber suppressor tRNA-amber codon complex including the nucleotide (C or A) 3' of the codon. (From 67 with permission.)
strength of the normal as well as the ms2io6A-deficient suppressors. This shows that this effect is not correlated with the strength of the suppressor, as has been suggested (275). Thus the modification is of greater importance when the 3' base is C than when it is A. The ms2io6A37 may induce a conformational change of the anticodon that also qualitatively alters the interstrand stacking of the wobble nucleoside on the nucleoside 3' of the codon, differentially affecting the efficiency of suppression. The Hirsh (276) suppressor [trpT(Su9)] is a tRNATrp with a G24-to-A24 substitution that results in unusual features (277) and is also able to read UGA (nonsense), UGG (Trp), and UGU (Cys) codons (278). Introduction of trpT(Su9) does not affect the regulation of the trp operon (238), which suggests that the trpT(Su9) tRNATrp reads the Trp codon UGG as efficiently as wild-type tRNATrp. Introduction of the miaA mutation, which results in an unmodified A37 instead of ms2i6A37 in otherwise wild-type tRNATrp, derepresses the trp operon four- to fivefold. However, introduction of the trpT(Su9) suppressor into a miaA strain restores normal regulation of the trp operon, suggesting that this unusual tRNATrp lacking the ms2i6 modification (239) can read the tandem Trp codons (UGG) in the trp leader as efficiently as the fully modified wild-type tRNATrp. The miaA mutation also increases the transport of cognate amino acids, but in this case too, the G24-to-A24 base substitution counteracts this effect (279). It seems as if the changed conformation of the anticodon loop induced by the lack of the ms2i6 modification is compensated by the G24-to-A24 base substitution when reading the cognate codon UGG. On the other hand, the efficiency with which the trpT(Su9) tRNA reads the UGA nonsense codon is severely reduced both in vivo and in vitro if the tRNA lacks the ms2i6 modification (265, 268, 280).
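The UAG-A/UAG-C context ratio used in this section is a simple quotient of two readthrough measurements made in strains that differ only in the nucleoside 3' of the UAG codon. A minimal sketch of that calculation follows. The function name and the numerical values are hypothetical placeholders, chosen only to give ratios of the order quoted in the text (about 1.5 for wild type, above 5 for miaA1); they are not measured data.

```python
def context_ratio(readthrough_a: float, readthrough_c: float) -> float:
    """Return the UAG-A/UAG-C ratio of readthrough (dimensionless)."""
    if readthrough_c <= 0:
        raise ValueError("readthrough at UAG-C must be positive")
    return readthrough_a / readthrough_c


# Hypothetical readthrough values in arbitrary units; NOT data from the text.
strains = {
    "wild type": {"UAG-A": 30.0, "UAG-C": 20.0},
    "miaA1": {"UAG-A": 3.0, "UAG-C": 0.5},
}

# One ratio per strain background.
ratios = {
    name: context_ratio(values["UAG-A"], values["UAG-C"])
    for name, values in strains.items()
}

for name, ratio in ratios.items():
    print(f"{name}: UAG-A/UAG-C = {ratio:.1f}")
```

A ratio that stays constant across suppressors of different strength, as reported in the text, would show up here as the same quotient for different absolute readthrough values.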
As discussed above for amber suppressors, the lack of ms2i6A37 also increases the codon-context sensitivity to the nature of the codon on the 3' side
of the UGA codon (280). Readthrough of UGA by fully modified trpT(Su9) tRNA depends on both the 5' codon and the 3' codon but not in a cooperative manner (280). However, removal of the ms2i6 modification from the suppressor tRNA induces a cooperative dependence on the nature of the codons on both sides of the UGA codon. Thus, in the undermodified but not the fully modified form, the codon-context sensitivity of the trpT(Su9) suppressor tRNA spans three codons: codons in the P-site, in the A-site, and downstream of the A-site (280). However, one cannot generalize this result to other tRNAs, because the G24-to-A24 mutation in the trpT(Su9) tRNA makes its interaction with the ribosome unusual (277) and induces an altered anticodon loop conformation that results in unusual decoding ability (278). Thus, the influence of the ms2i6 modification on the unusual anticodon conformation of trpT(Su9) tRNATrp may differ from that imposed on other anticodons present in nonmutated tRNAs containing this hypermodified nucleoside. The 5' effect and the cooperativity between the 5' and 3' elements may be mediated by an unusual tRNA-tRNA interaction, in which this mutated tRNATrp may be involved. An analysis of how the 5' codon context influences hypomodified amber suppressors would clarify whether the observed effects on the undermodified trpT(Su9) suppressor tRNA are valid for other undermodified nonmutated tRNAs. In order to investigate whether ms2io6A37 exerts its function on the ability of the aa-tRNA·EF-Tu·GTP complex to enter the A-site, we utilized the “speedometer” assay system (Fig. 4) to monitor the rate of the aa-tRNA selection step (78). Using different allelic states of the miaA and miaB genes, we assayed the efficiency of the selection step of Phe-tRNAPhe having either ms2io6A37, i6A37, or A37.
When one (UUC) of the two cognate codons was tested, no difference in the tRNA selection was observed between wild-type, miaA1, or miaB1 mutants (82a,b). These results were unexpected, because ms2i6A37 improves stacking (265) and can be envisioned to have an effect on the aa-tRNA selection step. We note that ms2i6A37 has no effect on the selection of Phe-tRNAPhe in vitro (275). These results imply that the conformation of the anticodon is such that the modification does not improve its 3'-stacking feature. Indeed, the binding of EF-Tu·GTP induces an altered anticodon structure as compared to the noncomplexed tRNA (281). Thus, the dramatic effect of ms2io6A37 on translation must be in a step of the translation cycle following the aa-tRNA selection step, when EF-Tu has left the ribosome. The Tet(M) protein interacts with the translation apparatus to render this process resistant to the antibiotic tetracycline (282). The Tet(M) protein has sequence similarities to EF-G and is functionally similar to it, but nevertheless the molecular mechanism of its function is not known. A tetracycline-sensitive mutant of E. coli was isolated and shown to be mutated in the miaA
gene (283). The similarity of the Tet(M) protein to EF-G and the influence ms2i6A37 has on its function suggest that this protein and tRNA may interact and in conjunction affect the efficiency of translation.
2. INFLUENCE ON TRANSLATIONAL FIDELITY BY ms2io6A37
a. Third Position Misreading. Misreading of the nonsense codon UGA, most likely by the ms2i6A37-containing near-cognate tRNATrp (284, 285), is reduced in ms2i6A37-deficient strains in vivo (68, 268, 280) and in vitro (265). Quantitatively, this decreased misreading of UGA in vivo is dependent on codon context (68, 280). Perhaps more interesting is how ms2i6A37 influences missense errors in vivo, because such misreading does not involve any competition with release factors. Erroneous reading of the Trp codon UGG in ribosomal protein S6 by tRNACys was monitored (68). The tRNACys normally reads UGU/C codons and has the anticodon sequence U33-GCA-ms2i6A37. When misreading UGG (Trp), the tRNACys (anticodon GCA) competes with the cognate tRNATrp (anticodon CCA), which also has ms2i6A37. This misreading by tRNACys requires a G34-G mismatch, i.e., a third-codon-position mismatch. Such misreading was reduced to 1/30th in an ms2i6A37-deficient (miaA) mutant. Thus, the misreading of UGA nonsense codons, probably by tRNATrp (anticodon CCA), and missense reading of UGG by tRNACys (anticodon GCA) are both caused by a third-codon-position mismatch. The A37 base may be in its normal stacking position when a mismatch in the third codon position occurs, and if this is so, then the modification will improve the stability of the anticodon-codon interaction, as has been shown for the amber suppressors (see Section V,C,1), resulting in an improved efficiency of the cognate interaction. When tRNACys misreads the Trp codon UGG, both tRNAs compete for the same codon and both tRNAs normally carry the ms2i6A modification. A reasonable explanation is that the error rate is lower in the miaA strain than in the wild-type strain, because the modification deficiency has a larger effect on the weaker, near-cognate interaction between tRNACys (anticodon GCA) and the codon UGG than on the stronger, cognate interaction between tRNATrp (anticodon CCA) and the codon UGG.
b. First Position Misreading. Erroneous reading of the Arg codon CGU in the rplL gene (encoding ribosomal protein L7/L12) by tRNACys (anticodon GCA) requires a first-codon-position (A36-C) mismatch (68). In this case, the modified or unmodified tRNACys competes with tRNAArg, which normally does not have the ms2i6A37 modification. No effect of the miaA mutation is observed (68). The lack of effect of ms2i6A37 deficiency in a first-codon-position mismatch suggests that ms2i6A37 does not contribute to the efficiency of reading CGU (Arg) by tRNACys. Perhaps the
neighboring mismatch A36-C forces the A37 base, either modified or unmodified, out of the normal 3' anticodon stack. If this is so, then the ms2i6 group does not influence the stability of the first-codon-position base pair. Because the misreading of CGU is unusually high (10^-3), the lack of response to the miaA mutation may be codon-context dependent. This example of no effect of ms2i6A37 on a first-position mismatch in vivo resembles the following observation made in vitro. When the E. coli tRNAPhe, lacking the ms2i6 modification, is added in a 20-fold excess to an in vitro protein-synthesizing system, Phe is incorporated in response to the CUU (Leu) codon as well as to the normal phenylalanine codons UUY (286). No misincorporation is observed when the fully modified tRNAPhe is added under similar conditions. This misincorporation also requires a first-codon-position mismatch and resembles in this respect the Cys/Arg misincorporation discussed above. Furthermore, the competing tRNA (tRNALeu, anticodon U33-GAG-m1G37) does not normally carry the ms2i6A modification. It is suggested (286) that the lack of the ms2i6A modification induces a larger probability for wobble in the third anticodon position, supporting the earlier suggestion that a hypermodified nucleoside located next to the anticodon would prevent a first-codon-position wobble (27, 263, 287). Accordingly, modification of tRNAPhe increases the accuracy of translation in vitro by increasing the rate of rejection with noncognate codons (288). The premature release of peptidyl-tRNA may be a result of an active editing mechanism, which releases noncognate tRNAs at the P-site that have escaped the proofreading step in the A-site (289). A specific reduction in peptidyl-tRNA accumulation is observed in miaA mutants for those tRNAs that normally contain ms2i6A, but not for other tRNAs (290).
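The codon position of the mismatch in each misreading example of this section can be worked out by aligning anticodon positions 36, 35, and 34 with codon positions 1, 2, and 3. The sketch below does this under simplifying assumptions that are not part of the text: only Watson-Crick pairs are accepted, G-U wobble is allowed only at the third codon position, and modified nucleosides are ignored. The function name is illustrative, and the tRNAPhe anticodon GAA is taken from general knowledge rather than from this chapter.

```python
from typing import List

# Watson-Crick pairs, written as (anticodon base, codon base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
# Assumption: G-U wobble is tolerated only between anticodon position 34
# and the third codon base; modified wobble bases are not modeled.
WOBBLE_34 = {("G", "U"), ("U", "G")}


def mismatched_codon_positions(anticodon: str, codon: str) -> List[int]:
    """Return the codon positions (1-3) that fail to pair with the anticodon.

    Both sequences are written 5'->3'. The anticodon occupies tRNA positions
    34-36, and pairing is antiparallel: 36 with codon base 1, 35 with 2,
    34 with 3.
    """
    if len(anticodon) != 3 or len(codon) != 3:
        raise ValueError("anticodon and codon must each have three bases")
    mismatches = []
    for codon_pos in (1, 2, 3):
        codon_base = codon[codon_pos - 1]
        anticodon_base = anticodon[3 - codon_pos]  # index 2 is position 36
        pair = (anticodon_base, codon_base)
        paired = pair in WATSON_CRICK or (codon_pos == 3 and pair in WOBBLE_34)
        if not paired:
            mismatches.append(codon_pos)
    return mismatches


# Misreading events discussed in the text: (tRNA, anticodon, misread codon).
examples = [
    ("tRNACys", "GCA", "UGG"),  # Trp codon read as Cys
    ("tRNATrp", "CCA", "UGA"),  # nonsense codon read as Trp
    ("tRNACys", "GCA", "CGU"),  # Arg codon read as Cys
    ("tRNAPhe", "GAA", "CUU"),  # Leu codon read as Phe
]
for trna, anticodon, codon in examples:
    print(trna, anticodon, codon, mismatched_codon_positions(anticodon, codon))
```

Under these assumptions the first two events come out as third-position mismatches and the last two as first-position mismatches, which matches the grouping used in subsections a and b.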
If the primary effect of the ms2i6 modification is at the proofreading step in the A-site, less substrate (i.e., near-cognate peptidyl-tRNA) would be present to be released. Thus, these results are consistent with the ribosome-editor model, which links the release of peptidyl-tRNA to the mistranslational events; the results are also consistent with the enhanced proofreading caused by ms2i6A37 deficiency observed in vitro (275) and suggested in vivo (280). Taken together, when a mismatch occurs in the wobble position, an ms2i6A37 deficiency prevents misreading with near-cognate codons, probably by an enhanced proofreading (275, 280). This observation is consistent with a reduced polypeptide elongation rate (241, 291). On the other hand, a lack of the ms2i6A37 modification is neutral or may indeed increase the frequency of errors induced by noncognate tRNAs that require a mismatch in the first and perhaps the second codon position. Noncognate errors might be prevented by ms2i6A37 due to the stabilization of the base pair between the first codon base and the anticodon position next to the modified nucleoside (27, 263, 287). Accordingly, the modification may increase the rate of rejection of
tRNA with noncognate codons (288) whereas it will decrease the rejection with near-cognate codons (275). Therefore, the effect imposed by ms2i6A37 or ms2io6A37 on errors may depend on the error monitored, because a third-position error is increased whereas a first-position error is reduced by the modification, consistent with the hypothesis of Ninio (292). Obviously, more experiments must be done to address these questions and establish the mechanism(s) behind these apparently selective effects imposed by this remarkable and elaborate modification.
D. Metabolic and Physiological Consequences Induced by Lack of ms2io6A37
Complete undermodification of ms2io6A37 or ms2i6A37, as in miaA mutants, induces a 20 to 50% reduction in growth rate, depending on the growth medium (241, 293), but lack of the ms2 group, as in the miaB mutant, or of the hydroxyl group, as in the miaE mutant, does not result in any growth-rate reduction in glucose minimal medium (71, 298). The miaA mutation also induces a reduced polypeptide step-time (241, 293), altered regulation of many operons [most likely through its influence on attenuation mechanisms (241)], and other metabolic changes (100, 294). One step in the synthesis of ms2i(o)6A37 requires iron, and, interestingly, the level of enterobactin, an iron chelator and a mediator of the transport of iron, is derepressed twofold in a miaA mutant (262). Although the mechanism of the influence on enterobactin synthesis is not known, it resembles the miaA-mediated derepression of transport of aromatic amino acids, which was suggested to be caused by ms2-deficient or ms2i6-deficient tRNAs (279). Therefore, the level of ms2 groups in tRNAs might be part of a regulatory system that senses the lack of availability of iron (262, 279). Growth in body fluids usually results in lack of the ms2 group in tRNA, caused by the lack of available iron in such fluids. These induced alterations of tRNA are thus connected with the adaptation of the bacteria to growth on or in host tissues, and this ability may be a required character for the pathogenicity of the bacteria (295). Reversion of trp or lacZ point mutations is higher in the miaA mutant than in wild-type cells (243). Only G·C to T·A transversions are induced by the miaA mutation (245a). Another gene in the miaA operon, mutL, also induces mutations but with a specificity that is distinct from that induced by miaA mutations. Thus, at least two genes of the six in the miaA operon are involved directly or indirectly in mismatch repair.
Iron starvation that results in lack of the ms2 group of ms2i6A37 also results in the same mutator specificity as the miaA mutation, which suggests that the mutator phenotype may be mediated by undermodified tRNA (243). So far only two other mutators, mutY and mutM, have the same mutator specificity as miaA (296, 297).
Therefore, the miaA mutation or undermodified tRNA may alter the level of the products of these genes, because the allelic state of the miaA gene or iron starvation is known to change the expression of several genes. Other indirect metabolic routes may be involved, but clearly the miaA gene product or undermodified tRNA regulates the frequency of occurrence of a specific subset of mutations. Buck and Ames (262) showed that the tRNA(ms2io6A37)hydroxylase is present in cells grown anaerobically but that the hydroxylation of ms2i6A37 to ms2io6A37 requires molecular oxygen. They suggested that the hydroxylation signals the entry of the cell to aerobic conditions (262). One would therefore expect that a miaE mutant, which lacks the tRNA(ms2io6A37)hydroxylase activity, would behave differently from wild type on such a shift, but this is not the case (247). Unexpectedly, the miaE mutant of S. typhimurium does not grow aerobically on the citric-acid-cycle intermediates succinate, fumarate, and malate, and it grows poorly on citrate and acetate (247). This growth defect is not caused by an altered uptake of these carbon compounds, nor is it dependent on the lack of activity of the enzymes of the citric acid cycle, nor on a defect in the respiratory chain. The growth defect of the miaE mutant on malate is suppressed by the addition of cAMP, which suggests that the miaE gene is involved in the metabolism of this regulatory nucleotide. Alternatively, the miaE mutation may alter some of the gluconeogenic pathways required for growth on citric-acid-cycle intermediates (298). Extragenic suppressors of the miaE mutant can grow on malate although they have a nonhydroxylated tRNA. Thus, the suppressors resemble E. coli, which does not have the miaE gene and consequently has the nonhydroxylated form ms2i6A37 in its tRNA and is able to grow aerobically on these citric-acid-cycle intermediates. Apparently, E.
coli may have evolved to cope with the nonhydroxylated form of ms2io6A37 in a manner similar to that of these extragenic suppressors in S. typhimurium. Although the mechanism for this link between tRNA modification and the central metabolic pathways is not understood at the molecular level, this observation supports the idea that such links may be of regulatory significance for the cell. The transfer of T-DNA (transferred DNA) to various plants by the plant pathogen Agrobacterium tumefaciens is governed mainly by the proteins encoded by the vir genes, which are located on the Ti (tumor-inducing) plasmid (299, 300). The vir regulon contains several operons whose expression is regulated by plasmid-encoded genes, but apparently also by several chromosomally encoded factors. In screening for mutants of A. tumefaciens with reduced expression of the vir genes, a miaA mutant was isolated (301). The miaA-mediated reduction of the synthesis of these vir genes affected the virulence of the bacteria only to a limited extent on one of the
five hosts tested. The reduction of vir gene expression may be due to inefficient translation caused by the undermodified tRNA. The phenotype of the miaA mutant of A. tumefaciens shows that modification-deficient tRNA may influence gene expression, thereby influencing the pathogenicity of bacteria in a manner similar to that mentioned above for the requirement of undermodified tRNA to assist growth in body fluids (260).
VI. Mutants (trmD) Defective in the Synthesis of m1G in Position 37
A. Isolation of trmD Mutants: Regulation of the Synthesis of the tRNA(m1G37)methyltransferase and Enzymatic Recognition Mechanism
1-Methylguanosine (m1G) was first characterized in yeast RNA (302) and was later found in tRNA from several sources (4). It is present in position 37 in tRNAs from all three phylogenetic domains and in position 9 in tRNAs from the two domains Eucarya and Archaea. Interestingly, tRNAs from the three domains reading codons of the type CUN (leucine), CCN (proline), and CGG (arginine) all have m1G in position 37, suggesting that this modification was present in the tRNA of the progenitor (11). Therefore, the function of m1G in this position may be the same irrespective of the origin of the tRNA. When cells that harbor both a temperature-sensitive valyl-tRNA synthetase [valS(Ts)] and a relA1 mutation are shifted to a nonpermissive temperature, protein synthesis stops. However, a relaxed synthesis of RNA, imposed by the relA1 mutation, occurs (208). If such cells also have a temperature-sensitive tRNA-modifying enzyme, specifically undermodified tRNA will accumulate. Following a shift from permissive to nonpermissive growth conditions of a valS(Ts), relA1 strain, RNA was prepared and methylated in vitro using an enzyme extract from wild-type cells. By screening for clones whose RNA accepts methyl groups in vitro, it would be possible to isolate mutants in tRNA modifications essential for cell viability (303). In this way, the trmD1 and the trmC2 mutants of E. coli were isolated. The level of m1G37 in tRNA from the E. coli trmD1 mutant grown at high temperature is 20% of the level of m1G37 in tRNA from wild-type cells. Such tRNA is also a specific substrate for the tRNA(m1G37)methyltransferase, a fact that has been pivotal in the study of the regulation of the synthesis of this enzyme as well as in determining the genetic organization of the trmD operon. Mutations in the trmD gene of S. typhimurium were isolated in another
way. The two frameshift suppressor mutations sufA6 and sufB2 in S. typhimurium induce frameshifting at runs of Cs. These mutations result in an insertion of a G in the anticodon of tRNAPro1 and tRNAPro2, respectively (304; J.-N. Li, unpublished results). The wild-type forms of these tRNAPro species contain m1G37. Because a modification at position 37, such as ms2io6A37, improves the efficiency of a tRNA during translation (17, 39, 305), it was anticipated that an m1G37 deficiency in the sufA6 tRNAPro1 would also reduce the efficiency of the sufA6 suppressor. However, no such antisuppressor mutation in the trmD gene was found. Instead, one mutation, trmD3, rendered the sufA6 strain temperature-sensitive for growth but did not decrease the sufA6-mediated frameshift suppression. The trmD3 mutation may instead increase the activity of the sufA6-mediated suppression to a level that is not acceptable for the cell. The S. typhimurium trmD3 mutation reduces the level of m1G37 in tRNA to nondetectable levels in strains grown at high temperature, and to about 40% of the wild-type level when grown at 30°C (306). This mutant is thus a suitable tool for functional analysis of m1G37, as discussed below. The trmD operon (Fig. 3) consists of the following four genes: rpsP, which encodes the ribosomal protein S16; 21K, which encodes a 21-kDa protein that influences the activity of the ribosome (298); trmD, which encodes the tRNA(m1G37)methyltransferase; and rplS, which encodes the ribosomal protein L19. These four genes are transcribed in the above order as a single polycistronic mRNA (307) (Fig. 3). Just upstream of the promoter-proximal gene rpsP (S16), there is an attenuator-like structure that causes about 70% transcription termination in vitro, and there are no internal promoters or processing sites. Transcription terminates at a rho-independent terminator just downstream of the rplS gene (308).
Like that of other ribosomal protein operons (309), the accumulation of the mRNA from the trmD operon is growth-rate dependent and is stringently controlled (308). However, unlike other ribosomal protein operons, the trmD operon is not subject to translational and transcriptional feedback regulation (310). The rates of synthesis of the four proteins are significantly different, resulting in 1/12th and 1/40th the amount of the 21K protein and tRNA(m1G37)methyltransferase, respectively, as compared to the amount of the two ribosomal proteins flanking the 21K and trmD genes (311). This difference in expression is achieved by regulation at the translational level, because the operon is transcribed as a single mRNA molecule. A large stem-and-loop structure is formed by mRNA sequences 100 nucleotides downstream from the start codon of the 21K gene. This may fold back and base-pair to the translational start site of the 21K mRNA, the stem-and-loop structure preventing entry of the ribosome and thus decreasing the frequency of translation initiation. This internal negative control element of the 21K
gene does not control the translational initiation of the downstream trmD gene. However, a similar stem-and-loop structure, which may inhibit the initiation of translation of the trmD mRNA, is also present at the beginning of the trmD gene (312). The tRNA(m1G37)methyltransferase activity is invariant with growth rate, although the amount of mRNA transcript and the TrmD polypeptide increases with growth rate (108, 308, 311, 313). Neither the enzymatic activity (313) nor the synthesis of the polypeptide (311) is stringently regulated, in contrast to the accumulation of the mRNA (308). Accordingly, the trmD promoter contains the “discriminator,” a (G + C)-rich region common to all stringently regulated genes (121). This discrepancy between the regulation at the transcriptional and the translational level suggests that a regulatory device is operating at the translational level. Whether the complex mRNA structure at the beginning of the trmD gene is involved in this translational regulation is not known. Clearly, the trmD operon exhibits many complex regulatory features, resulting in noncoordinate gene expression and different responses to various physiological conditions. Although most studies of the recognition elements for a tRNA biosynthetic enzyme have been performed in vitro using altered substrates, it would be informative to do similar studies in vivo. Such in vivo studies take into account the competition that must exist between the tRNA-modifying enzyme under study, the different exo- and endonucleases, other tRNA-modifying enzymes, and the aminoacyl-tRNA synthetases acting during the maturation process. Moreover, analyses in vivo of the recognition requirements for a tRNA biosynthetic enzyme would also take into account the rate of synthesis of the primary tRNA transcript in relation to the amount and the intrinsic catalytic activity of the tRNA-modifying enzyme of interest. We have therefore devised a method to analyze the recognition process in vivo.
It was shown that a mutation in the structural gene (trmD) for the tRNA(m1G37)methyltransferase, which results in lack of m1G37, also induces suppression of certain frameshift mutations in the his operon (306). It was further shown that tRNAPro2 is one possible suppressing agent (see also Section VI,B,2). Therefore, mutations in the structural gene for tRNAPro2 may result in m1G37 deficiency if the mutation reduces the affinity of the substrate for the tRNA(m1G37)methyltransferase. Such a tRNA would probably also be a frameshift suppressor, because it would be deficient in m1G37. Of course, other classes of mutations may also induce frameshifting although the level of m1G37 in tRNAPro2 might be normal. In this case the frameshift event is caused by a mechanism other than that caused by the lack of m1G37. We selected several such frameshift suppressor mutants of tRNAPro2 with base substitutions that also reduced the level of m1G37 in tRNAPro2. These
cells were grown at a steady state, i.e., under growth conditions in which the tRNA(m1G37)methyltransferase competes with all the cellular entities involved in the synthesis and maturation of the substrate. Single-base substitutions far from the target nucleoside (position 37), in the amino-acid-accepting stem and in the D-stem, severely affected the methylation reaction, suggesting that the tRNA(m1G37)methyltransferase in vivo senses the whole tRNA structure rather than a specific sequence in the tRNA (Q. Qian and G. R. Björk, unpublished results). These genetic results, obtained in vivo under the correct competition conditions for the recognition process of the tRNA(m1G37)methyltransferase, are consistent with results for the same enzyme obtained in vitro (314a).
B. Function of m1G37 in Translation
1. FUNCTION OF m1G37 IN COGNATE CODON-ANTICODON INTERACTIONS
In E. coli and S. typhimurium, tRNALeu1,2,3 (reading CUN codons), tRNAPro1,2,3 (reading CCN codons), and one tRNAArg species (which reads the codon CGG) are the only tRNA species that have m1G37. Lack of m1G37 results in a 32% reduction of the overall polypeptide chain elongation rate (71), which is somewhat more than that caused by the absence of Ψ38, 39, or 40 (73). The leader region of the S. typhimurium leuABCD-operon encodes a 28-amino-acid-long polypeptide containing four consecutive leucine control codons (CUA-CUA-CUA-CUC) (65). The two tRNALeu species (tRNALeu2 recognizing CUC and tRNALeu3 recognizing CUA) both contain Ψ in the anticodon hairpin, but not in the anticodon loop, and m1G in position 37. In a hisT mutant, which has U instead of Ψ in positions 38 and 39 of the anticodon hairpin of the tRNA, the leu-operon is derepressed; consequently the hisT mutant is more resistant to the leucine analog trifluoro-DL-leucine than the wild-type strain (61). On the other hand, the trmD3 mutation, which results in an unmodified G37 in place of m1G37, does not affect leu-operon expression and even confers a slight sensitivity to trifluoro-DL-leucine compared to the wild-type strain. The derepressed leu-operon in a hisT mutant is consistent with a longer step-time of the two tRNALeu species lacking Ψ in the anticodon region and decoding the leucine control codons. However, an m1G37 deficiency of the same tRNALeu species leads to an unchanged, or even a slightly repressed, expression of the leu-operon, which implies an unaffected or even a slightly shorter step-time of the leucine tRNAs (314b). Apparently, some of the other tRNAs (tRNALeu1, tRNAPro1,2,3, and tRNAArg) normally having m1G37 must have a longer step-time, because, as mentioned above, the overall polypeptide chain elongation rate is reduced in a trmD3 mutant. Therefore, these results suggest that the effect imposed on various tRNA species by the presence of m1G37 is not the same but is tRNA-dependent.
Figure 4 shows how the first step in translation, the aa-tRNA selection step, is measured as a competition between the entering EF-Tu·GTP·aa-tRNA complex and the ability of the peptidyl-tRNA (in this case tRNAVal, which does not contain any m1G37) to shift into the +1 frame. We utilized this “speedometer” assay to determine the ability of m1G37-deficient tRNAs to compete with the frameshifting peptidyl-tRNAVal. Transfer RNALeu, deficient in m1G37 and reading the codon CUA, is as efficient as the fully modified tRNA in the tRNA selection step. As mentioned above (Section I,B), the same tRNALeu species lacking Ψ in positions 38 and 39 is less efficient than the corresponding fully modified tRNALeu in selection of the cognate codon (82a). Clearly, Ψ and m1G37, which are present in the same anticodon region, influence the anticodon structure differently, resulting in different effects on the various steps of the translation cycle. These results are also consistent with the effects induced by the Ψ or the m1G37 deficiency on the expression of the leu-operon, because lack of Ψ, but not of m1G37, results in a derepressed leu-operon, implying that the tRNA step-time is increased for the Ψ-deficient tRNALeuCUA but not for the same tRNA species lacking m1G37. However, tRNAPro lacking m1G37 was considerably less efficient than the wild-type tRNA in the selection of the cognate codon CCG (82a). Thus, m1G37 in one tRNA species exerts strong effects on the tRNA selection step whereas the same modified nucleoside in another tRNA species does not. These results are consistent with the suggestion above, that m1G37 influences the function of the various tRNA species differently.
2. FUNCTION OF m1G37 IN THE READING FRAME MAINTENANCE
The trmD3 mutant of S. typhimurium was isolated as temperature-sensitive for growth in a genetic background that also harbored a mutated gene for tRNAPro1 (the sufA6 mutation). During the genetic manipulations, it was noticed that the trmD3 mutation in itself could suppress some frameshift mutations in the his-operon. The lack of m1G37 was correlated with the ability to suppress the frameshift sites CCC-C or CCC-U (the first CCC is in the 0 frame; thus, m1G37 deficiency induced the ability to read formally the first base in the downstream codon, which results in a +1 frameshift). A model (Fig. 8) has been proposed in which G37 forms a Watson-Crick base-pair with the first base in the 0 frame codon (306). To further analyze the mechanism of the trmD3-mediated frameshift suppression, a system was devised in which various potential frameshifting sequences were tested for their ability to be suppressed by m1G37-deficient tRNA (82a) (Fig. 9). All
[Fig. 8 graphic. Recoverable panel labels: upper panel B, prevention of standard base-pairing by the methyl group of m1G; lower panel A, wild type, no frameshift, base-pairing prevented by m1G37; lower panel B, trmD3, +1 frameshift, G37; 0 frame; new +1 frame.]
FIG. 8. Proposed model of how m1G37-deficient tRNA induces a +1 frameshift (modified from 315). (Upper panel, A) Watson-Crick base-pairing between G and C. (Upper panel, B) How a methyl group in position 1 prevents standard base-pairing. (Lower panel, A) Normal triplet decoding by wild-type tRNA. A potential involvement of m1G37 is prevented by the methyl group. (Lower panel, B) Suggested mechanism for +1 frameshifting by tRNA lacking m1G37, as in the trmD3 mutant of S. typhimurium. The unmodified G37 pairs with the first nucleotide in the cognate codon and participates in the decoding by creating a four-base anticodon (nucleotides in positions 34-37).
four cCC-N sites [denoted as a proline site (CCN); the first c would pair to G37, and the capitalized nucleosides would be recognized by the normal anticodon nucleosides according to the model in Fig. 8] were efficiently suppressed in the trmD3 mutant as compared to the wild type (Fig. 9). The largest difference in frameshifting activity was observed at the two proline sites cCC-U (23-fold) and cCC-C (29-fold), which both end with a pyrimidine (Y). The aCC-U site is not suppressed, which demonstrates that the 5' nucleoside (c or a) in the potential frameshifting site is an important determinant, in accordance with the model (Fig. 8). N-terminal amino-acid sequences of hybrid β-galactosidases generated by frameshifting at cCC-U and cCC-A showed that proline is inserted at these sites, which is consistent with
[Fig. 9 graphic. Recoverable fragments: mRNA sequence AUC AAA AGC UUA AAC and A AGC UAA CGC CCC U; amino acids Glu Ala Asn Gly Pro.]
FIG. 9. Frameshifting activities at cCU-N, cCC-N, and cCG-N codons (the first c would be paired to G37; the capitalized nucleosides would be recognized by the normal anticodon nucleosides according to the model in Fig. 8). Different synthetic oligonucleotides were inserted at the beginning of the lacZ gene, creating a frameshift window in such a way that β-galactosidase activity was dependent on a +1 frameshift. Results are shown below as the ratio of the β-galactosidase activity obtained in the trmD3 mutant to that obtained in the wild type (82a, 315). The asterisk denotes a significant (p < 0.05) difference in suppression between trmD+ and trmD3 as deduced from a t-test (315) (T. G. Hagervall, personal communication).
the model in which a quadruplet codon-anticodon interaction by tRNAPro has taken place (Fig. 8). Although amino-acid sequences of the hybrid β-galactosidases generated from the other constructs have not been determined, we favor the same mechanism for the two other proline sites (cCC-C and cCC-G). Interestingly, a correlation exists between the probability of frameshifting and a high intrinsic rate of tRNA selection. Therefore, it was suggested that tRNAs with a high intrinsic rate would be better competitors with the next tRNA for the first nucleoside of the next codon (315). However, the m1G37 deficiency reduces the intrinsic rate of codon selection for all tRNAPro species (82a). Therefore, it is difficult to evaluate the correlation between the high intrinsic rate of the fully modified tRNAPro and frameshifting. Still, the most efficient frameshift suppressor, tRNAPro2, which normally reads CCU/C, also has the highest intrinsic rate of codon selection among all tRNAs tested (78). Although we favor a quadruplet anticodon-codon interaction to explain the obtained results, other mechanisms, such as two- or three-base interactions or steric hindrance, are not ruled out (315).
Similar to the proline sites, the two leucine sites most efficiently suppressed (9- to 11-fold) by m1G37 deficiency (cCU-U and cCU-C) also end with a pyrimidine. At these two leucine sites we also favor a quadruplet anticodon-codon interaction (Fig. 8). Note that no frameshifting is observed at the cCU-G site, which is the only site at which the major tRNALeu1 species would be expected to frameshift (Fig. 9). Although this tRNA species is the most abundant tRNA species in E. coli (316), its intrinsic rate of codon selection is lower than those of the other two tRNALeu species reading codons starting with C (78). Therefore, a correlation exists between the probability of frameshifting and a high intrinsic rate of tRNALeu selection. The intrinsic rate of codon selection by tRNALeu is not dependent on m1G37 (82b). Therefore, the tRNALeu selection rates determined by Curran and Yarus (78) are also relevant for m1G37-deficient tRNALeu species. However, the unexpected inability of the major tRNALeu1 to frameshift may be due to a context effect caused by the next tRNA, which competes with tRNALeu1 for the same nucleoside [G in the cCU-Gca site is the last nucleoside in the potential quadruplet anticodon-codon interaction by tRNALeu1 and the first nucleoside in the codon (GCA) in the 0 frame]. If so, the tRNAAla, which reads the codon GCA, would be a much more efficient competitor than those tRNAs reading the other NCA codons. The proline cCC-G site is also the site least prone to frameshift among the proline sites (Fig. 9); thus a context effect imposed by the selection of the next tRNA in the 0 frame cannot be entirely ruled out.
Still, the fact that frameshifting occurs at the proline cCC-G site suggests that tRNAAla cannot completely inhibit frameshifting by tRNAPro1,3. Therefore, the m1G37-deficient major tRNALeu1 is apparently less prone to frameshift than the undermodified major tRNAPro1.
Only one of the three tRNAArg species has m1G37, and it recognizes only CGG. Therefore, a trmD3-mediated suppression is expected only at the arginine site cCG-G. However, Fig. 9 shows that a low level of trmD3-specific suppression was observed at all cCG-N sites, implying that a low but significant trmD3-induced frameshifting occurred at arginine sites read by tRNAs that do not normally have m1G37 (82a). Thus, a mechanism different from that shown in Fig. 8 must be operating to induce trmD3-mediated frameshifting at the arginine sites (82a) and perhaps also at one of the leucine sites (cCU-A). Although it was speculated (306) that m1G37 deficiency may also induce a -1 frameshift, no evidence for such an event has been obtained (315).
In summary, trmD3 is an efficient +1 frameshift suppressor at all four proline (cCC-N) sites and at two leucine (cCU-Y) sites. For these sites, we
favor a mechanism involving a quadruplet anticodon-codon interaction (Fig. 8) in which the unmodified G37 is involved in base-pairing. However, at the arginine sites (cCG-N) and at one of the leucine sites (cCU-A), an alternative mechanism must be acting to induce the low-level but trmD3-specific frameshift suppression. At one potential site, cCU-G, at which the major tRNALeu1 would be expected to frameshift, no trmD3-specific +1 frameshifting was observed. These results also suggest that the function of m1G37 is not the same in all tRNA species but is tRNA species-dependent, or that the induction of frameshifting by m1G37 deficiency is sensitive to the mRNA context.
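The site notation used above (for example cCC-U) can be made concrete with a small sketch. The Python below is only a toy encoding of the quadruplet model of Fig. 8, under the assumption that a site is written as a lowercase 5' nucleoside followed by the codon read after the +1 shift; the function names are hypothetical and are not taken from the cited work. It checks only the necessary condition stated in the text, namely that an unmodified G37 can pair with the 5' nucleoside. It does not predict frameshifting levels: the cCU-G site satisfies the condition although no trmD3-specific frameshifting was observed there.

```python
# Toy encoding of the quadruplet model (Fig. 8); all names here are hypothetical.
# A site is written as in the text, e.g. "cCC-U": the lowercase 5' nucleoside,
# followed by the bases that form the codon read after a +1 shift.

WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def g37_can_pair(site: str) -> bool:
    """Return True if an unmodified G37 could form a Watson-Crick pair
    with the 5' nucleoside of the site."""
    five_prime = site[0].upper()
    return ("G", five_prime) in WATSON_CRICK


def plus_one_codon(site: str) -> str:
    """Codon recognized by anticodon positions 34-36 after the +1 shift."""
    return site[1:].replace("-", "").upper()


def quadruplet(site: str) -> str:
    """The four bases covered by the proposed four-base anticodon (positions 34-37)."""
    return site[0].upper() + plus_one_codon(site)


if __name__ == "__main__":
    for site in ("cCC-U", "cCC-C", "aCC-U"):
        print(site, quadruplet(site), plus_one_codon(site), g37_can_pair(site))
```

In this encoding the aCC-U site fails the pairing check, in line with the observation that it is not suppressed, whereas all cCC-N sites pass it.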
C. Metabolic Consequences of Lack of m1G37 in tRNA
The growth rate of the trmD3 mutant of S. typhimurium is reduced by 24% at 37°C in glucose minimal medium (71), and this reduction is smaller in media supporting slower growth (314b). The trmD3 mutant shows an altered sensitivity to several amino-acid analogs, implying that the mutation influences the synthesis of some of the enzymes involved in amino-acid metabolism. The trmD3 mutant is especially sensitive to the tyrosine analog nitrotyrosine (314b). Characterization of extragenic suppressors revealed that this compound is transported into the cells by the tryptophan-specific permease encoded by the mtr gene. Therefore, in this respect, nitrotyrosine is apparently a tryptophan analog and not a tyrosine analog. The trmD3 mutation does not influence the uptake of tryptophan, which suggests that the trmD3-mediated sensitivity is not caused by an increased uptake of nitrotyrosine. These results imply that the lack of m1G37 influences the metabolism of the cell and provide an additional example of links between translation and the intermediary or central metabolism of the cell.
VII. Mutants (queA, queB, tgt) Defective in the Synthesis of Queuosine in Position 34
A. Isolation of the queA, queB, and tgt Mutants: The Synthesis of Queuosine
Queuosine (Q) was first discovered in the wobble position of the anticodon of E. coli tRNATyr in 1967 by three independent groups during the course of sequencing (317-320) (see also 25). Q was thought to be a simple derivative of G, but it actually took 8 years before its complex structure was established (321). The base of Q (queuine, Que) consists of a 7-(methylaminomethyl)-7-deazaguanine, called the “base of preQ1,” to
which a cyclopentenediol is linked to the NH2 group (Fig. 1). The bulky 7-substituent group of Q has the planar trans conformation and is fully extended outward, so that it should not interfere with codon-anticodon interaction (322). Therefore, the side-chain of Q may interact with some component of the translational apparatus other than the codon. Moreover, the base-pairing property of the Q-base (queuine) is hardly influenced by replacement of the nitrogen atom by a carbon atom at position 7. This hypermodified nucleoside is present in the wobble position of all tRNA species specific for Tyr (UAY), His (CAY), Asn (AAY), and Asp (GAY). Thus, the corresponding tRNAs recognize NAY codons in the split codon boxes (a split codon box is a codon box in which the four codons are split between two amino acids, or between amino acids and stop codons).
Temperature-sensitive clones were isolated following heavy mutagenesis (323) and their nucleotide composition was determined. One mutant was isolated (324) that was later shown to have an altered queA gene (325). Another mutant (here denoted queB), defective in Q biosynthesis, was also isolated (326). This latter mutant (queB) accumulates a Q precursor, 7-(cyano)-7-deazaguanosine (denoted preQ0), whereas the queA mutant accumulates preQ1 in the tRNA. A third mutant was defective in the tgt gene (327), which is the structural gene for the tRNA(Gua)transglycosylase. This enzyme efficiently inserts either the base of preQ0 or the base of preQ1 into the polynucleotide chain. The insertion occurs by cleavage of the N-C glycosylic bond without breakage of the phosphodiester bond (328). In the tgt mutant, the base of preQ1 accumulates (327), and because this precursor to Q is also found free in wild-type E. coli (328), the normal substrate for the Tgt enzyme is the base of preQ1. Therefore, the conversion of the base of preQ0 to the base of preQ1 occurs at the monomeric level and is catalyzed by the queB gene product.
Queuine (the base of Q) is not a substrate for the tRNA(Gua)transglycosylase, indicating that the cyclopentenediol is added after the insertion of the base of preQ1. Although no carbon unit of methionine is incorporated during the synthesis of Q, starvation for methionine in E. coli results in the accumulation of tRNAs that contain unmodified G as well as three different precursors to Q, one of which is preQ1 (329, 330). Therefore, the cyclopentenediol may originate from the ribose of AdoMet (25) and, indeed, a purified QueA protein transfers and isomerizes the ribose moiety of AdoMet to the epoxycyclopentane of the intermediate epoxy-Q (oQ) (331). The last step in the synthesis of Q is the reduction of the intermediate oQ to Q, which requires vitamin B12 (332). From analyses of the isolated mutants and the nucleosides present in their tRNA, the synthesis of Q in E. coli tRNA may be as follows (several biosynthetic steps may be involved although only one arrow is indicated):
GTP --> base of preQ0 --(queB)--> base of preQ1
base of preQ1 + tRNA(GUN) --(tgt)--> (preQ1)34 --(queA)--> oQ34 --(vitamin B12)--> Q34
Although three mutants have been isolated, only two of the bacterial genes involved in the biosynthesis of Q, queA and tgt, have been localized on the chromosome (Fig. 2) and sequenced (325, 327). They are likely part of an operon of five cistrons (Fig. 3). The first gene (queA) in this operon is preceded by the main promoter, but two internal promoters (PI and PII) also exist. Transcription initiated at the main promoter is predominantly responsible for the expression of the queA and tgt gene products (333). The only rho-independent transcriptional terminator present in the operon is located downstream of the fifth gene, secF, suggesting that the different transcripts initiated at these three promoters may all terminate after the secF gene. Interestingly, the main promoter contains a “discriminator,” which implies that the expression from this promoter, like that from the trmA, trmD, and hisT promoters, is stringently regulated. A high-affinity FIS binding site is centered around nucleotide -58, a location similar to that of the putative FIS binding site in the trmA promoter. The FIS binding site for the queA gene is located in an upstream activating sequence (UAS), the presence of which doubles the transcription (333).
B. Function of Q34 in Translation
Transfer RNATyr from the tgt mutant has G34 instead of Q34. The aminoacylation kinetics for such undermodified tRNA are similar to those of the wild-type tRNA. However, a small but definite difference in the Km values was observed (327). The Q-deficient tRNAHis from rabbit reticulocytes and the E. coli tRNAAsp with or without Q have the same Km and kcat (334, 335). Therefore, Q may influence the aminoacylation reaction in a tRNA-dependent manner, but clearly Q does not have a strong impact on the homologous aminoacylation reaction. The G34-containing tRNATyr, like the Q34-containing tRNATyr, binds preferentially to UAU compared to UAC. However, the G34-containing tRNATyr is about half as efficient as the Q34-containing tRNATyr in binding to triplet-programmed ribosomes (327). Therefore, the presence of Q34 enhances the binding but does not change the codon preference of E. coli tRNATyr, indicating that this feature of tRNATyr function may not depend on the presence of Q34 per se. However, in Xenopus oocytes, tRNAHis with Q34 in the anticodon slightly prefers the
histidine codon CAU to the codon CAC, whereas the G34-containing tRNAHis shows the opposite preference (336). In E. coli the histidine operon is regulated by an attenuation mechanism that monitors the speed with which the seven histidine codons in the leader mRNA are decoded. The level of one of the histidine biosynthetic enzymes is the same in the tgt mutant and in the wild-type control, implying that the Q34-deficient tRNAHis reads these histidine codons as efficiently as the fully modified tRNAHis (327). Because seven histidine codons occur in a row, even a small difference in the tRNA step-time at each of these codons would be amplified. This result therefore strongly suggests that Q in the only tRNAHis species present in E. coli has no effect on the overall elongation cycle. Moreover, two species of tRNAHis from rabbit reticulocytes, containing either Q or G in position 34, also show the same efficiency in hemoglobin biosynthesis in vitro (337). The fact that Q does not affect the efficiency of translation suggests that the base-pairing ability of Q is similar to that of G, which is also predicted by the three-dimensional structure of Q (322). Thus, in E. coli, Q34 has no major impact on the efficiency of translation, and it may therefore interact with some macromolecule other than mRNA.
All tRNAs that contain Q read codons ending with pyrimidines (NAY), which occur in split codon boxes. The other two code words in each split codon box, which end with purines (NAR), are decoded by tRNAs having a thiolated uridine in the wobble position. One function of Q may be to prevent miscoding toward purine-ending code words, whereas the thiolated uridine may prevent misreading of pyrimidine-ending code words. However, no such misreading was observed in vitro for G34-containing tRNAAsn (327).
Several amber (UAG) and ochre (UAA) codons are not suppressed by the tgt mutation, suggesting that such misreading by tRNATyr, which normally reads UAY codons, does not occur in vivo (327). However, a low level of tgt-mediated amber suppression occurs at one specific codon context, suggesting that such misreading is codon-context dependent (338). In vitro, the ability of G34-containing tRNATyr to read UAG has been established for several eukaryotic systems (339-341). Thus, although Q seems to have no pronounced effect on the efficiency of translation in E. coli, this hypermodified nucleoside may prevent some misreadings at specific codon contexts. Because the only misreading observed is the readthrough at the UAG nonsense codon, the Q-deficient tRNATyr may be an efficient competitor to release factor 1, which recognizes the amber codon UAG. It is intriguing that so many organisms have such a complicated hypermodified nucleoside, the synthesis of which is catalyzed by several enzymes in a complicated set of reactions. Perhaps Q functions in a hitherto unknown role of tRNA, or is important for a tRNA function that has not so far been specifically detected.
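The notion of a split codon box used in this section can be illustrated with a short sketch. The Python below tabulates the four NA_ boxes of the standard genetic code and separates the pyrimidine-ending codons (NAY), read by the Q-containing tRNAs, from the purine-ending codons (NAR). The assignments for the NAR codons (Gln, Lys, Glu, and the two stop codons) are textbook values of the standard code rather than statements from this chapter, and the function and variable names are hypothetical.

```python
# Standard genetic code for the four NA_ codon boxes (textbook assignments).
# Keys: first two bases of the codon; inner keys: third base.
NA_BOXES = {
    "UA": {"U": "Tyr", "C": "Tyr", "A": "Stop", "G": "Stop"},
    "CA": {"U": "His", "C": "His", "A": "Gln", "G": "Gln"},
    "AA": {"U": "Asn", "C": "Asn", "A": "Lys", "G": "Lys"},
    "GA": {"U": "Asp", "C": "Asp", "A": "Glu", "G": "Glu"},
}


def is_split(box: str) -> bool:
    """A box is split if its four codons do not all have the same meaning."""
    return len(set(NA_BOXES[box].values())) > 1


def codons_ending_in(third_bases: str) -> dict:
    """Map each codon whose third base is in `third_bases` to its meaning."""
    return {
        box + third: NA_BOXES[box][third]
        for box in NA_BOXES
        for third in third_bases
    }


if __name__ == "__main__":
    print("NAY (read by Q34-containing tRNAs):", codons_ending_in("UC"))
    print("NAR (the other half of each box):", codons_ending_in("AG"))
    print("all boxes split:", all(is_split(box) for box in NA_BOXES))
```

The NAR half of the UA_ box contains the amber (UAG) and ochre (UAA) codons, which is why misreading by a G34-containing tRNATyr would appear as nonsense suppression.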
C. Metabolic Consequences of Lack of Q34 in tRNA
Using transductants presumed to differ only in the allelic state of the tgt gene, it was shown that the tgt mutant of E. coli grows 1.2 times faster than the wild type as revealed by a mixed-population experiment (327). If the mixed population was continuously incubated at stationary phase for a long time, viable cells of the tgt mutant decreased much faster than wild-type cells, suggesting that the viability of the cell under such harsh physiological conditions is dependent on Q-containing tRNA (327). The molecular mechanism for this Q-mediated prevention of cell death at stationary phase is unknown. Using the same transductants, it was observed that the tgt mutation influences the ability of the cell to grow anaerobically on fumarate (342). However, this was later shown to be due to a mutation in the closely linked fnr gene (338). Thus, the aforementioned tgt-mediated growth characteristics may also be dependent on some other unknown mutations present in the tgt mutant, because it was isolated after heavy mutagenesis.
VIII. Mutants (aro) Defective in the Synthesis of cmo5U (V-Nucleoside) and Its Methyl Ester in Position 34
A. Isolation of Mutants Defective in the Synthesis of cmo5U34 and Discovery of a Link between Biosynthesis of Aromatic Amino Acids and tRNA Modification
A modified nucleoside, denoted V (343, 344), was detected in the wobble position of the major tRNAVal (343-345) and was subsequently identified as uridine-5-oxyacetic acid (cmo5U) (346). This nucleoside is found only in tRNAs from E. coli and S. typhimurium reading codon family boxes specific for Val, Ser, Pro, Thr, and Ala. Corresponding tRNAs from G+ organisms such as B. subtilis have 5-methoxyuridine (mo5U) (347).
Several mutants deficient in tRNA modification show no apparent change in phenotype (e.g., trmA). Transposable elements can insert into the chromosome almost randomly and they also harbor a gene that renders the cell resistant to various drugs. If a transposon (e.g., Tn5) is inserted into a nonessential structural gene for a tRNA-modifying enzyme, such a mutant completely lacks the corresponding modified nucleoside in its tRNA; at the same time, it becomes resistant to kanamycin. These mutants should be valuable for genetic experiments because they result from a single mutational event and have an easily scorable phenotype. Therefore, a screening procedure was set up to look for transposon-induced mutations that result in defective tRNA modification (348). Following transposon mutagenesis, cells were labeled with [methyl-14C]-L-methionine, tRNA was prepared and digested to nucleosides, and the nucleoside composition was analyzed by one-dimensional thin-layer chromatography, which resolved at least 17 different methylated nucleosides (348). One mutant defective in the synthesis of two methylated nucleosides was found. Genetic characterization revealed that the transposon Tn5 had been inserted into aroD (349). Further genetic analysis revealed that mutants with mutations in any of the genes (aroA, B, C, D, and E; Fig. 2) common to the synthesis of aromatic amino acids lack these two modified nucleosides, which were identified as cmo5U and its methyl ester, mcmo5U. The aroD gene product catalyzes a reaction upstream of the formation of the intermediate shikimic acid, whereas the aroA gene product catalyzes the formation of chorismic acid, the common precursor to the three aromatic amino acids and four vitamins. When shikimic acid was added to the growth medium, the aroD mutant regained the ability to synthesize cmo5U34 and mcmo5U34, whereas an aroC mutant (AroC acts distal to shikimic acid but prior to chorismic acid) did not (349). Furthermore, addition of early intermediates in the synthesis of the four aromatic vitamins does not restore the formation of cmo5U34 or mcmo5U34 in the tRNA (349, 350). Thus, chorismic acid, or an intermediate in a hitherto unknown pathway from it, is required for the formation of cmo5U34. Transfer RNAVal1 from an aroD mutant has an unmodified U in the wobble position (350), suggesting that this metabolite is required for the first step in the synthesis of cmo5U34. The biosynthetic pathway may be as follows:
[Chorismic acid (Chor) or an unknown derivative of it (X) is required for the first step in the synthesis of cmo5U34. Only one of the two carbon atoms of cmo5U34 originates from methionine. The mo5U34 may be an intermediate, because it occurs in gram-positive organisms in the corresponding tRNA species, and its synthesis is also affected by mutations in the aro genes. For further discussion see 350.]
B. Function of cmo5U34 in Translation
The wobble position is often occupied by a modified nucleoside, and a correlation exists between the kind of modified nucleoside present in this position and the coding capacity of the tRNA (19). Uridines in this position are always modified except in tRNAs from mitochondria, Mycoplasma, and Aro- mutants of E. coli and S. typhimurium, which lack cmo5U34 and/or
mcmo5U34. tRNAs having (m)cmo5U in the wobble position read codons ending in A, G, or U (357-361), because an interaction between the cmo5 group and the 5' phosphate stabilizes the C2'-endo configuration and favors the formation of cmo5U34·U and cmo5U34·G base-pairs. In the C3'-endo configuration, which these uridines can still adopt, they can form a cmo5U34·A base-pair (180). However, within each codon family, the codons ending in U or C are also read by at least one other tRNA. Furthermore, tRNAs from mitochondria as well as from Mycoplasma have an unmodified uridine-34 in the corresponding tRNAs. These tRNAs can read all four codons in the codon families (137, 362-366). Thus, it is not obvious why these modifications are present in this group of tRNAs. An Aro- mutant of E. coli completely lacks the (m)cmo5U modifications and has an unmodified U in the corresponding tRNAs (349, 350). Such an Aro- mutant grows 80% as fast as the wild type when grown in the presence of all known metabolites whose synthesis originates from chorismic acid (G. R. Björk, unpublished results). This result suggests that the presence of this group of modified uridines is of importance under some physiological conditions.
IX. Conclusions and Perspectives
Analyses of genetically well-defined bacterial mutants lacking tRNA modifications have given valuable information not only about the genetic organization of the corresponding genes, but also about the synthesis of modified nucleosides, their function in translation, and how mutations in tRNA-modifying genes influence cell physiology in a variety of unknown ways. Table I shows that half of these mutants have been isolated not by a selection procedure but by a screening method. Although most scientists hesitate to screen for a desired mutant because it appears to require too much boring work, the time spent on screening for the mutants is short compared to the time spent using them in scientifically stimulating functional studies. Of course, selection of a mutant is the method of choice; still, one should not hesitate to screen for a desired mutant, provided that the screening method is simple and specific. Of the about 50 different tRNA-modifying genes predicted to exist in bacteria, less than half have so far been identified. It is imperative to characterize these genes before we can have any comprehensive picture of the impact that modified nucleosides have on tRNA structure and function and on cell physiology and metabolism. As discussed in this essay, the tRNA modification-deficient mutants have been pivotal in unraveling the role of modified nucleosides in tRNA function (summarized in Table II). Unexpectedly, some modified nucleosides (m5U54, s4U8, and Q34), which occur frequently in tRNA, show no apparent functional impact on the
tRNA based on the normal growth rate of the corresponding mutants. Other modified nucleosides (ms2io6A37, m1G37, and Ψ38, 39, 40) influence the function of the tRNA, and their absence results in a reduced growth rate and polypeptide chain elongation rate for the mutant. The presence of two of these modified nucleosides (ms2io6A37 and Ψ38, 39, 40) optimizes anticodon-codon interaction by increasing the efficiency at the expense of an increased third-position fidelity. The ms2io6A37 also decreases the rate of first-position errors. One modified nucleoside, m1G37, directly controls reading-frame maintenance. Recent results also imply that a modified nucleoside may act in a tRNA-dependent manner, i.e., the function of the modified nucleoside is not the same in all tRNA species containing it. Modification at the wobble position may (mnm5s2U34, cmo5U34) or may not (Q34) influence the anticodon-codon interaction. Clearly, modified nucleosides present in the anticodon region, except Q, have, as expected, a profound functional impact on the tRNA, whereas modified nucleosides (m5U54, s4U8) outside this region have so far eluded any attempts to show a functional impact on the tRNA in vivo. These experimental facts are most likely due to the way the function of the tRNA has been monitored in vivo, and indeed m5U54 has been shown to affect the function of tRNA in vitro. Methods to assay the different steps in the elongation cycle in vivo may reveal that these nucleosides cause a structural change in the tRNA. Therefore, it is imperative to develop new in vivo methods that assay, in a specific way, the different tRNA-dependent reactions. As such methods are developed, well-defined mutants defective in tRNA modification will be increasingly valuable to elucidate the molecular mechanisms for the various reactions of tRNA.
Analysis of mutants defective in tRNA modification has, quite unexpectedly, revealed links between translation and intermediary or central metabolism (aromatic amino-acid metabolism and cmo5U34; ms2io6A37 and central metabolic pathways; iron metabolism and oxygen tension; m1G37 and some unidentified metabolism of the tyrosine analog nitrotyrosine; m5U54 and C1 metabolism; s4U8 and UV protection and thiamin metabolism). Furthermore, the genetic organization of operons harboring genes encoding a tRNA-modifying enzyme suggests genetic links between cellular metabolism and translation (hisT and pyridoxal metabolism; miaA and DNA repair). Some of these metabolic links may have a regulatory function. One example is the link between Fe-dependent ms2 formation and the control of the level of enterochelin in the cell, and thereby iron transport and the transport of aromatic amino acids. Whether other links between tRNA modification and metabolism have a similar regulatory impact awaits further analysis of the mutants, but it may be so. The mechanism by which such regulatory circuits work is a challenge for future studies. The isolation of additional mutants defective in tRNA modification will therefore give us not only more
TABLE II SUMMARY OF SOME FUNCTIONAL IMPLICATIONS IMPOSED BY LACK OF MODIFIED NUCLEOSIDES IN BACTERIAL tRNA AS DEDUCED FROM MUTANT ANALYSES
Mutation
Modified nucleoside affected
Modified nucleoside present in the mutant
Growth medium
Relative growth rate of the mutant a (%)
Relative cgr_p a
Effect on aminoacylation
us
Gluc, rich
No effect
nuoA, Ec/St
s4U8
nuoC, Ec trmCl, Ec tnnC2,Ec tnnE, Ec
s4U8 U8 mnm5szU34 cmn5szU34 mnm5s2U34 nm5s2U34 mnm5s2U34 s2U34
-
-
asuE;Ec selD; Ec/St
mnm5s2U34 mnm5U34 mnm5Se2U34 mnm5s2U34
aro; Ec/St
tgt; EC
cmo5U34 Q34
U34 G
queA; Ec t m D ; Ec/St
Q34 m'G37
preQ1 G37
Efficiency of translationb
-
No effect
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-32
-
-
Rich -
-20 No effect
-
GIUC
-
-24
-
-
Reduced Reduced Reduced
Reduced Reduced for G ending codons
Step(s) affected in translation elongation cycle No effect on EFG or EF-Tu cycle
-
-
Reduced binding of tRNALvs to
-
Error
Other
-
UV resistance
-
Thi-
Reduced
-
Reduced binding Weak UAG No effect on tRNAH's stepof tRNATvr to supprestime cognate codons sion; no missense -
-
-
-
-
-
tRNA dependence Frameshifts Pleiotropic; in aa-tRNA sesensitive to lection nitrotyrosine
rniaA; Ec/St
ms2i06A37 , A37
rniaB; St
ms2i06A37
miaE: St
m t d ; Ec
t d ; EC
Gluc
-30
FA37
Gluc
No effect
ms2i06A37
ms2i6A37
Gluc
No effect
mt6A37
t6A37?
Gluc
No effect
1 3 8 , 39, 40
U38, 39, 40 Gluc
mW54
U54
Glu
- 16 No effect
-31
No
Reduced
No effect
-
Reduced No effect
No effect
-
-23
NO
No effect
No
No effect on a- Decrease Pleiotropic tRNA selection third and increase firstposition errors Pleiotropic -
-
Increased steptime for tRNAThr(ACY) Increased step- Reduced efficien- Decreased time for many cy in the tRNA tRNAs selection step No effect Reduced A-site None in binding of oioo; intRNALys;no ef creased in vitro fect on EF-Tu and P-site binding and on peptidyltransferase activity
Unable to grow o n CAC intermediates
-
Deregulates several operons tnnA gene essential
a Relative growth rate and polypeptide chain elongation rate (cgr_p) were calculated as the difference in growth rate or cgr_p, respectively, divided by the values of the wild-type control, expressed as %. A minus sign denotes that the mutant has a lower value.
b Determined from how modification deficiency influences a nonsense suppressor (ms2io6A37, mnm5s2U34, mnm5Se2U34, m5U54, Ψ38, 39, 40) or the regulation of an operon known to be under attenuation-mediated regulation (Ψ38, 39, 40; ms2io6A37; Q34, mWA, m5U54).
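The calculation described in footnote a can be illustrated with a minimal sketch. The function name and the numerical values below are hypothetical and chosen only for illustration; they are not data from the table. The sketch assumes that the "difference" is taken as mutant minus wild type, which is consistent with the statement that a minus sign denotes a lower value in the mutant.

```python
def relative_change_percent(mutant: float, wild_type: float) -> float:
    """Relative difference between mutant and wild type, as a percentage.

    Computed as (mutant - wild_type) / wild_type * 100, so a negative
    result means the mutant has the lower value.
    """
    if wild_type == 0:
        raise ValueError("wild-type value must be non-zero")
    return (mutant - wild_type) / wild_type * 100.0


# Hypothetical example: wild type grows at 1.00 doublings per hour,
# the mutant at 0.75 doublings per hour.
change = relative_change_percent(mutant=0.75, wild_type=1.00)
print(f"{change:.0f}%")  # prints "-25%"
```

The same function applies unchanged to the polypeptide chain elongation rate, since both quantities are normalized to the wild-type control in the same way.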
information on the role that modified nucleosides play in the function of the tRNA, but also unexpected new knowledge of the physiology of the cell and the interplay between translation and central and intermediary metabolism. An exciting future is ahead of us, and studies on tRNA modification will certainly reveal some unexpected and exciting new features of that fascinating molecule, tRNA, and its “odd” nucleosides.
ACKNOWLEDGMENTS
This work was supported by grants from the Swedish Cancer Society (Project 680) and the Swedish National Science Research Council (Project B-BU 2930). The critical reading of the manuscript and stimulating discussions by T. G. Hagervall, K. S. Stråby, and B. Persson (Umeå, Sweden), L. A. Isaksson (Stockholm, Sweden), and M. Buck (Sussex, England) are gratefully acknowledged. I am also indebted to M. Buck for the linguistic improvements of the manuscript. Special thanks are due to B. Esberg (Fig. 4), T. G. Hagervall (Fig. 8), and B. Persson (Figs. 1-3 and 5) for their help with the indicated figures.
Abbreviations
The structures of the different modified nucleosides as well as their abbreviations are described by Nishimura (8). An index and an exponent indicate the number and the position of a substitution, respectively; e.g., 6-dimethyladenosine is abbreviated m62A (exponent 6, index 2); k-, m-, c-, n-, o-, t-, i-, r- and s- indicate lysine, methyl, carbon, amino, oxy, threonine, isopentenyl, ribose, and thio groups, respectively; acp denotes 3-amino-3-carboxypropyl. An abbreviation to the left of the nucleoside symbol denotes modification of the base, whereas a symbol to the right of the nucleoside symbol denotes modification of the ribose. Other abbreviations: D, dihydrouridine; Ψ, pseudouridine; I, inosine; Q, queuosine; oQ, epoxyqueuosine; R, purine nucleoside; and Y, pyrimidine nucleoside. A number following an abbreviation for a modified nucleoside denotes its location in the tRNA. The enzyme catalyzing the formation of m5U at position 54 in the tRNA is denoted tRNA(m5U54)methyltransferase, and likewise for other tRNA-modifying enzymes. Methionyl-tRNA ligase is denoted MetRS, and likewise for other aminoacyl-tRNA ligases. The peptide encoded by the trmA gene is denoted TrmA, and likewise for other gene products. Aminoacyl-tRNA is denoted aa-tRNA. Tn5, MudJ, and MudX are transposons, DNA sequences that can insert themselves into new locations in the genome. They also harbor genes that render the cell resistant to the antibiotic kanamycin.
REFERENCES
1. K. Ogata and H. Nohara, BBA 25, 659 (1957). 2. M. B. Hoagland, P. C. Zamecnik and M. L. Stephenson, BBA 24, 215 (1957). 3. D. B. Dunn, BBA 34, 286 (1959). 4. J. D. Smith and D. B. Dunn, BJ 72, 294 (1959). 5. R. Hotchkiss, JBC 175, 315 (1948). 6. C. G. Edmonds, P. F. Crain, R. Gupta, T. Hashizume, C. H. Hocart, J. A. Kowalak, S. C. Pomerantz, K. O. Stetter and J. A. McCloskey, J. Bact. 173, 3138 (1991). 7. J. A. Kowalak and J. A. McCloskey, in “Translational Apparatus” (K. Nierhaus, ed.), p. 79. Springer-Verlag, Berlin and New York, 1993. 8. P. A. Limbach, P. F. Crain and J. A. McCloskey, NARes 22, 2183 (1994). 9. C. R. Woese, O. Kandler and M. L. Wheelis, PNAS 87, 4576 (1990). 10. S. Steinberg, A. Misch and M. Sprinzl, NARes 21, 3011 (1993). 11. G. R. Bjork, Chem. Scr. 26B, 91 (1986). 12. J. A. McCloskey, Systematic Appl. Microbiol. 7, 246 (1986). 13. H. Pang, M. Ihara, Y. Kuchino, S. Nishimura, R. Gupta, C. R. Woese and J. A. McCloskey, JBC 257, 3589 (1982). 14. M. Buck, M. Connick and B. N. Ames, Anal. Biochem. 129, 1 (1983). 15. C. E. Singer, G. R. Smith, R. Cortese and B. N. Ames, Nature NB 238, 72 (1972). 16. Y. Komine, T. Adachi, H. Inokuchi and H. Ozeki, JMB 212, 579 (1990). 17. G. R. Bjork, in “Transfer RNA” (D. Soll and U. L. RajBhandary, eds.), p. 165. American Society for Microbiology, Washington, D.C., 1994. 18. S. Nishimura and S. Yokoyama, in “Transfer RNA” (D. Soll and U. L. RajBhandary, eds.), p. 207. American Society for Microbiology, Washington, D.C., 1994. 19. S. Nishimura, This Series 12, 49 (1972). 20. H. Kersten, This Series 31, 59 (1984). 21. P. R. Srinivasan and E. Borek, This Series 5, 157 (1966). 22. R. W. Chambers, This Series 5, 349 (1966). 23. E. Goldwasser and R. L. Heinrikson, This Series 5, 399 (1966). 24. R. H. Hall, This Series 10, 57 (1970). 25. S. Nishimura, This Series 28, 49 (1983). 26a. R. P. Singhal, This Series 28, 75 (1983). 26b. R. P. Singhal, E. F. Roberts and V. N. Vakharia, This Series 28, 211 (1968). 27. S. Nishimura, This Series 12, 49 (1972). 28. B. Singer and M. Kroger, This Series 23, 151 (1979). 29. R. W. Adamiak and P. Gornicki, This Series 32, 27 (1985). 30. K. M. Ivanetich and D. V. Santi, This Series 42, 127 (1992). 31. S. Nishimura, in “Transfer RNA: Structure, Properties, and Recognition” (P. R. Schimmel, D. Soll and J. N. Abelson, eds.), p. 59. CSHLab, Cold Spring Harbor, New York, 1979. 32. D. L. Hatfield, J. G. Levin, A. Rein and S. Oroszlan, Adv. Virus Res. 41, 193 (1992). 33. D. L. Hatfield, D. W. Smith, B. J. Lee, P. J. Worland and S. Oroszlan, Crit. Rev. Biochem. Mol. Biol. 25, 71 (1990). 34. W. M. Ching, L. Tsai and A. J. Wittwer, Curr. Top. Cell Regul. 27, 497 (1985). 35. G. Dirheimer, Rec. Results Cancer Res. 84, 15 (1983). 36. H. Kersten and W. Kersten, in “Chromatography and Modification of Nucleosides. Part B: Biological Roles and Function of Modification, J. Chromatography Library” (C. W. Gehrke and K. C. T. Kuo, eds.), 45B, p. B69. Elsevier, Amsterdam, 1990.
37. S. Yokoyama and T. Miyazawa, in “Chromatography and Modification of Nucleosides. Part B: Biological Roles and Function of Modification, J. Chromatography Library” (C. W. Gherke and K. C. T. Kuo, eds.), 45B, p. B303. Elsevier, Amsterdam, 1990. 38. B. C. Person, Mol. Microhid. 8, 1011 (1993). 39. G. R. Bjork, J. U. Ericson, C. E. Gustafsson, T G. Hagervall, Y. H. Jonsson and P. M. Wikstriim, ARB 56, 263 (1987). 40. W. E. Cohn and E. Volkin, Nature 167, 483 (1951). 41. C.-T. Yu and F. W. Allen, BBA 32, 393 (1959). 42. J. P. Scannell, A. M. Crestfield and F. W. Allen, BBA 32, 406 (1959). 43. W. E. Cohn, BBA 32, 569 (1959). 440. W. E. Cohn, JBC 235, 1488 (1960). 44b. B. E. H. Maden, This Series 39, 241 (1990). 44c. H. A. Rau6, J. Klootwijk and W. Musters, Prog. Biophys. Mol. B i d . 51, 77 (1988). 45. R. Reddy, NARes 16, Suppl. r71 (1988). 46. J. R. Roth, D. N . Anton and P. E. Hartinan, ] M B 22, 305 (1966). 47. J. A. Lewis and B. N. Ames, JMB 66, 131 (1972). 48. M . Brenner, J. A. Lewis, D. S. Straus, F. De Lorenzo and 8 . N. Ames, JBC 247, 4333 (1972). 49. H. M. Johnston, W. M. Barnes, F. G . Chumley, L. Bossi and J. R. Roth, PNAS 77, 508 (1980). 50. 6. W. Chang, J. R . Roth and B. N. Ames, J. B a t . 108, 410 (1971). 51. R. Cortese, H. 0. Kammen, S. J. Spengler and B. N. Ames, JBC 249, 1103 (1974). 52. C. L. Turnbough, Jr., R. J. N e d , R. Landsberg and B. N . Ames, JBC 254, 5111 (1979). 53. C. B. Bruni, V. Colantuoni, L. Sbordone, R. Cortese and F. Blasi, J. R u t . 130, 4 (1977). 54. C. C. Marvel, P. J. Arps, B. C. Rubin, H. 0. Kammen, E. E. Penhoet and M. E. Winkler, J. B a t . 161, 60 (1985). 55. P. J. Arps, C. C. Marvel, €3. C. Rubin, D. A. Tolan, E. E. Penhoet and M. E. Winkler, NARes 13, 5297 (1985). 56. M. L. Nonet, C. C. Marvel and D. R. Tolan, JBC 262, 12209 (1987). 57. P. J. Arps and M. E. Winkler, J. B a t . 169, 1061 (1987). 58. P. V. Schoenlein, B. €3. Roa and M. E. Winkler, 1. B a t . 171, 6084 (1989). 59. A. Bognar, C. Pyne, M. Yu and G . 
Basi,J. Buet. 171, 1854 (1989). 60. D. T. Palmer, P. H. Blum and S. W. Artz, J. Bact. 153, 357 (1983). 61. R. Cortese, R. Landsberg, R. A. Haar, H. E. Umbarger and B. N. Ames, PNAS 71, 1857 (1974). 62. A. Rizzino, M. Mastanduno and M. Freundlich, BBA 475, 267 (1977). 63. R. P. Lawther and 6. W. Hatfield, PNAS 77, 1862 (1980). 64. F. E. Nargang, C. S . Subrahmanyam and H. E. Umbarger, PNAS 77, 1823 (1980). 65. R. M . Gemmill, S. R. Wessler, E. B. KeHer and J. M. Calvo, PNAS 76, 4941 (1979). 66. L. Bossi and J. R. Roth, Nature 286, 123 (1980). 67. J. U. Ericson and 6. R. Bjork, J M B 218, 509 (1991). 68. F. Bouadloun, T. Srichaiyo, L. A. Isaksson and 6. R. BjBrk, J. Bact. 166, 1022 (1986). 69. L. Bossi, J M B 164, 73 (1983). 70. J. H. Miller and A. M. Albertini, J M B 164, 59 (1983). 71. T. G. Hagewall, J. U. Ericson, K. B. Esberg, J. N . Li and 6. R. Bjork, BBA 1050, 263 (1990). 72. S . W. Artz and D. Holzschu, in “Amino Acids: Biosynthesis and Genetic Regulation” (K. M. Herrman and R. L. Somerville, eds.), p. 379. Addison-Wesley Publishing Company, Reading, MA, 1983.
73. D. T. Palmer, P. H. Blum and S. W. Artz, FP 40, 1750 (1981). 74. P. H. O’Farrell, Cell 14, 545 (1978). 75. J. Parker, J. W. Pollard, J. D. Friesen and C. P. Stanners, PNAS 75, 1091 (1978). 76. J. Parker, MGG 187, 405 (1982). 77. M. Ehrenberg, A. M. Rojas, J. Weiser and C. G. Kurland, JMB 211, 739 (1990). 78. J. F. Curran and M. Yarus, JMB 209, 65 (1989). 79. W. J. Craigen, R. G. Cook, W. P. Tate and C. T. Caskey, PNAS 82, 3616 (1985). 80. R. B. Weiss, D. M. Dunn, A. E. Dahlberg, J. F. Atkins and R. F. Gesteland, EMBO J. 7, 1503 (1988). 81. J. F. Curran and M. Yarus, JMB 203, 75 (1988). 82a. T. G. Hagervall, B. Esberg, J.-N. Li, T. M. F. Tuohy, J. F. Atkins, J. F. Curran and G. R. Bjork, in “The Translational Apparatus” (K. H. Nierhaus, ed.), p. 67. Plenum, New York, 1983. 82b. J.-H. Li, B. Esberg, J. Curran and G. R. Bjork, unpublished. 83. L. Bossi and J. R. Roth, Cell 25, 489 (1981). 84. L. Bossi, T. Kohno and J. R. Roth, Genetics 103, 31 (1983). 85. D. C. Ward and E. Reich, PNAS 61, 1494 (1968). 86. R. H. Griffey, D. Davis, Z. Yamaizumi, S. Nishimura, A. Bax, B. Hawkins and C. D. Poulter, JBC 260, 9734 (1985). 87. D. R. Davis and C. D. Poulter, Bchem 30, 4223 (1991). 88. G. R. Kitchingman and M. J. Fournier, Bchem 16, 2213 (1977). 89. J. Thomale and G. Nass, EJB 85, 407 (1978). 90. R. S. Bresalier, A. A. Rizzino and M. Freundlich, Nature 253, 279 (1975). 91. H. E. Umbarger, in “Escherichia coli and Salmonella typhimurium. Cellular and Molecular Biology” (F. C. Neidhardt, J. L. Ingraham, K. Brooks Low, B. Magasanik, M. Schaechter and H. E. Umbarger, eds.), p. 352. American Society for Microbiology, Washington, D.C., 1987. 92. A. Spadaro, A. Spena, V. Santonastaso and P. Donini, Nature 291, 256 (1981). 93. D. Nègre, J. C. Cortay, P. Donini and A. J. Cozzone, Bchem 28, 1814 (1989). 94. S. A. Rosenfeld and J. E. Brenchley, J. Bact. 143, 801 (1980). 95. M. C. Rodriguez-Sainz, C. Hernandez-Chico and F. Moreno, J. Bact. 173, 7018 (1991). 96. H.-C. T. Tsui, P. J. Arps, D. M. Connolly and M. E. Winkler, J. Bact. 173, 7395 (1991). 97. K. F. Jensen, J. Bact. 175, 3401 (1993). 98. A. J. Pease and R. E. Wolf, Jr., J. Bact. 176, 115 (1994). 99. P. Carter-Muenchau and R. E. Wolf, Jr., PNAS 86, 1138 (1989). 100. W. R. Jones, G. J. Barcak and R. E. Wolf, Jr., J. Bact. 172, 1197 (1990). 101. G. R. Bjork, Cell 42, 7 (1985). 102. P. J. Arps and M. E. Winkler, J. Bact. 169, 1071 (1987). 103. J. W. Littlefield and D. B. Dunn, BJ 70, 642 (1958). 104. A. Zamir, R. W. Holley and M. Marquisee, JBC 240, 1267 (1965). 105. G. R. Bjork and L. A. Isaksson, JMB 51, 83 (1970). 106. G. R. Bjork, J. Bact. 124, 92 (1975). 107. R. Takata and L. A. Isaksson, MGG 161, 15 (1978). 108. T. Ny and G. R. Bjork, J. Bact. 141, 67 (1980). 109. T. Ny and G. R. Bjork, J. Bact. 130, 635 (1977). 110. T. Ny and G. R. Bjork, J. Bact. 142, 371 (1980). 111. S. Jinks-Robertson, R. L. Gourse and M. Nomura, Cell 33, 865 (1983). 112. S. F. Gilbert, H. A. de Boer and M. Nomura, Cell 17, 211 (1979). 113. R. A. Young and J. A. Steitz, Cell 17, 225 (1979).
114. G. Glaser, P. Sarmientos and M. Cashel, Nature 302, 74 (1983). 115. P. Sarmientos and M. Cashel, PNAS 80, 7010 (1983). 116. P. Sarmientos, J. E. Sylvester, S. Contente and M. Cashel, Cell 32, 1337 (1983). 117. E. Lund and J. E. Dahlberg, PNAS 76, 5480 (1979). 118. W. F. Shen, C. Squires and C. L. Squires, NARes 10, 3303 (1982). 119. G. Glaser and M. Cashel, Cell 16, 111 (1979). 120. G. Gustafsson, P. H. Lindstrom, T. G. Hagervall, K. B. Esberg and G. R. Bjork, J. Bact. 173, 1757 (1991). 121. T. Travers, J. Bact. 141, 973 (1980). 122. R. Kahmann, F. Rudt, C. Koch and G. Mertens, Cell 41, 771 (1985). 123. R. C. Johnson and M. I. Simon, Cell 41, 781 (1985). 124. W. Ross, J. F. Thompson, J. T. Newlands and R. L. Gourse, EMBO J. 9, 3733 (1990). 125. H. Verbeek, L. Nilsson, G. Baliko and L. Bosch, BBA 1050, 302 (1990). 126. R. R. Dickson, T. Gaal, H. A. deBoer, P. L. deHaseth and R. L. Gourse, J. Bact. 171, 4862 (1989). 127. T. Gaal, J. Barkei, R. R. Dickson, H. A. deBoer, P. L. deHaseth, H. Alavi and R. L. Gourse, J. Bact. 171, 4852 (1989). 128. L. Lindahl and J. M. Zengel, Adv. Genet. 21, 53 (1982). 129. V. Emilsson and C. G. Kurland, EMBO J. 9, 4359 (1990). 130. C. Gustafsson and G. R. Bjork, JBC 268, 1326 (1993). 131. G. R. Bjork and F. C. Neidhardt, J. Bact. 124, 99 (1975). 132. G. R. Bjork and F. C. Neidhardt, Virology 52, 507 (1973). 133. S. Yang, E. R. Reinitz and M. L. Gefter, ABB 157, 55 (1973). 134. H. Kersten, M. Albani, E. Mannlein, R. Praisler, P. Wurmbach and K. H. Nierhaus, EJB 114, 451 (1981). 135. F. C. Neidhardt, in “Escherichia coli and Salmonella typhimurium. Cellular and Molecular Biology” (F. C. Neidhardt, J. L. Ingraham, K. Brooks Low, B. Magasanik, M. Schaechter and H. E. Umbarger, eds.), p. 3. American Society for Microbiology, Washington, D.C., 1987. 136. X. R. Gu and D. V. Santi, Bchem 30, 2999 (1991). 137. Y. Andachi, F. Yamao, A. Muto and S. Osawa, JMB 209, 37 (1989). 138. A. K. Hopper, A. H. Furukawa, H. D. Pham and N. C. Martin, Cell 28, 543 (1982). 139. G. R. Bjork and F. C. Neidhardt, Cancer Res 31, 706 (1971). 140. I. Svensson, L. Isaksson and A. Henningsson, BBA 238, 331 (1971). 141. K. B. Marcu and B. S. Dudock, Nature 261, 159 (1976). 142. B. A. Roe and H. Y. Tsen, PNAS 74, 3696 (1977). 143. P. C. Jelenc and C. G. Kurland, PNAS 76, 3174 (1979). 144. P. Davanloo, M. Sprinzl, K. Watanabe, M. Albani and H. Kersten, NARes 6, 1571 (1979). 145. J. M. Adams and M. R. Capecchi, PNAS 55, 147 (1966). 146. R. E. Thach, K. F. Dewey, J. C. Brown and P. Doty, Science 153, 416 (1966). 147. R. E. Webster, D. L. Engelhardt and N. D. Zinder, PNAS 55, 155 (1966). 148. M. J. Pine, B. Gordon and S. S. Sarimo, BBA 179, 439 (1969). 149. C. E. Samuel, L. D’Ari and J. C. Rabinowitz, JBC 245, 5115 (1970). 150. C. E. Samuel and J. C. Rabinowitz, JBC 249, 1198 (1974). 151. A. S. Delk and J. C. Rabinowitz, Nature 252, 106 (1974). 152. A. Hoburg, H. J. Ashhoff, H. Kersten, U. Manderschied and H. G. Gassen, J. Bact. 140, 408 (1979). 153. H. Kersten, L. Sandig and H. H. Arnold, FEBS Lett. 55, 57 (1975). 154. A. S. Delk and J. C. Rabinowitz, PNAS 72, 528 (1975).
155. B. R. Baumstark, L. L. Spreniulli, U. L. RajBhandary and G. M. Brown, J. B a t . 129,457
(1977). 156. C. Gustafsson, “The tRNA Modifying Enzyme tRNA(m5U54)methyltransferase and Its Structural Gene tnnA.” Thesis, Umei University, Umei, 1992. 157. H. U. Petersen, E. Joseph, A. Ullmann and A. Danchin, J. Bact. 135, 453 (1978). 158. Y. Kohara, K. Akiyama and K. Isono, Cell 50, 495 (1987). 159. S. Kulakauskas, P. M. Wikstrom and D. E. Berg, J. B a t . 173, 2633 (1991). 160. B. C. Persson, C. Gustafsson, D. E. Berg and G. R. Bjork, PNAS 89, 3995 (1992). 162. D. V. Santi and L. W. Hardy, Bchem 26, 8599 (1987). 163. J. T. Kealey and D. V. Santi, Bchem 30, 9724 (1991). 164. J. Carbon, H. David and M. H. Studier, Science 161, 1146 (1968). 165. M. G. Marinus and N. R. Morris, J. B u t . 114, 1143 (1973). 166. M. G. Marinus, N. R. Morris, D. Sol1 and T. C. Kwong, J. B u t . 122, 257 (1975). 167. 6. R. Bj6rk and K. Kjellin-Striby, J. Bmt. 133, 508 (1978). 168. D. Elseviers, L. A. Petrullo and P. J. Gallagher, NARes 12, 3521 (1984). 169. M. A. Sullivan, J. F. Cannon, F. H. Webb and R. M. Bock, J. Bact. 161, 368 (1985). 170. T. G. Hagervall and 6. R. Bjork, MGG 196, 194 (1984). 171. T. G. Hagervall, C. G. Edmonds, J. A. McCloskey and G. R. Bjork, JBC 262,8488 (1987). 172. T. G. Hagervall and G. R. Bjork, MGG 196, 201 (1984). 173. Y. Taya and S. Nishimura, in “Biochemistry of Adenosylrnethionine” (F. Salvatore, E. Borek, V. Zappia, H. G. Williams-Ashman and F. Schlenk, eds.), p. 251. Columbia University Press, New York, 1977. 174. J. Ryals, R. Y. Hsu, M. N. Lipsett and H. Bremer, J. Bact. 151, 899 (1982). 175. F. H. C. Crick, J M B 19, 548 (1966). 176. T Sekiya, K. Takeishi and T Ukita, BBA 182, 411 (1969). 177. F. Lustig, P. Elias, T.Axberg, T. Samuelsson, I. Tittawella and U. Lagerkvist, JBC 256, 2635 (1981). 178. P. F. Agris, H. Sierzputowska-Gracz, W. Smith, A. Malkiewicz, E. Sochacka and B. Nawrot, JACS 114, 2652 (1992). 179. W. Smith, H. Sierzputowska-Gracz, E. Sochacka, A. Malkiewicz and P. Agris, JACS 114, 7989 (1992). 180. S . Yokoyama, T. Watanabe, K. 
Murao, H. Ishikura, Z. Yamaizumi, S. Nishimura and T.Miyazawa, PNAS 82, 4905 (1985). 181. C. Houssier, P. Degee, K. Nicoghosian and H. Grosjean, J. Bioniol. Struct. Dyn. 5, 1259 (1988). 182. D. Smith and M. Yarus, PNAS 86, 4397 (1989). 183. K. Sakamoto, G. Kawai, T. Niimi, T. Satoh, M. Sekine, Z. Yamaizumi, S. Nishimura, T.Miyazawa and S. Yokoyama, EJB 216, 369 (1993). 184. P. F. Agris, D. So11 and T. Seno, Bchetn 12, 4331 (1973). 185. H. J. Grosjean, S. de Henau and D. M. Crothers, PNAS 75, 610 (1978). 186. D. S . Colby, P. Schedl and C. Guthrie, Cell 9, 449 (1976). 187. A. J. Wittwer and T. C. Stadtman, ABB 248, 540 (1986). 188. G. F. Kramer and 3.N. Ames, J. Bact. 170, 736 (1988). 189. W. Leinfelder, K. Forchhammer, F. Zinoni, 6 . Sawers, M. A. Mandrand-Berthelot and A. BBck, J. Bact. 170, 540 (1988). 190. T. C. Stadtman, J. N. Davis, E. Zehelein and A. Bock, Biofactors 2, 35 (1989). 191. 2.Veres, L. Tsai, T. D. Scholz, M. Politino, R. S. Balaban and T. C. Stadtman, PNAS 89, 2975 (1992). 192. A. Ehrenreich, K. Forchhammer, P. Turmay, B. Veprek and A. Bock, EJB 206,767 (1992).
193. A. J. Wittwer and W. M. Ching, Biofactors 2, 27 (1989). 194. H. L. Wheeler and L. M. Liddle, Am. Chem. J. 40, 547 (1908). 195. M. N. Lipsett, JBC 240, 3975 (1965). 196. A. Favre, M. Yaniv and A. M. Michelson, BBRC 37, 266 (1969). 197. G. Thomas and A. Favre, BBRC 66, 1454 (1975). 198. T. V. Ramabhadran, T. Fossum and J. Jagger, J. Bact. 128, 671 (1976). 199. G. Thomas and A. Favre, EJB 113, 67 (1980). 200. G. F. Kramer, J. C. Baker and B. N. Ames, J. Bact. 170, 2344 (1988). 201. J. W. Abrell, E. E. Kaufman and M. N. Lipsett, JBC 246, 294 (1971). 202. M. N. Lipsett, J. Bact. 135, 993 (1978). 203. M. N. Lipsett, JBC 247, 1458 (1972). 204. J. Jagger, Photochem. Photobiol. Rev. 7, 1 (1983). 205. T. V. Ramabhadran and J. Jagger, PNAS 73, 59 (1976). 206. G. Thomas, K. Thiam and A. Favre, EJB 119, 381 (1981). 207. S. C. Tsai and J. Jagger, Photochem. Photobiol. 33, 825 (1981). 208. M. Cashel and K. E. Rudd, in “Escherichia coli and Salmonella typhimurium. Cellular and Molecular Biology” (F. C. Neidhardt, J. L. Ingraham, K. Brooks Low, B. Magasanik, M. Schaechter and H. E. Umbarger, eds.), p. 1410. American Society for Microbiology, Washington, D.C., 1987. 209. P. C. Lee, B. R. Bochner and B. N. Ames, PNAS 80, 7496 (1983). 210. R. H. Hilderman and B. J. Ortwerth, Bchem 26, 1586 (1987). 211. M. Raba, K. Limburg, M. Burghagen, J. R. Katze, M. Simsek, J. E. Heckman, U. L. RajBhandary and H. J. Gross, EJB 97, 305 (1979). 212. R. A. VanBogelen, P. M. Kelley and F. C. Neidhardt, J. Bact. 169, 26 (1987). 213. P. Plateau, M. Fromant and S. Blanquet, J. Bact. 169, 3817 (1987). 214. A. Favre, E. Hajnsdorf, K. Thiam and A. Caldeira de Araujo, Biochimie 67, 335 (1985). 215. A. Caldeira de Araujo and A. Favre, EMBO J. 5, 175 (1986). 216. V. Emilsson, A. K. Naslund and C. G. Kurland, NARes 20, 4499 (1992). 217. K. Biemann, S. Tsunakawa, J. Sonnenbichler, H. Feldman, D. Dütting and H. G. Zachau, Angew. Chem. Int. Ed. (in English) 78, 600 (1966). 218. R. H. Hall, M. J. Robins, L. Stasiuk and R. Thedford, JACS 88, 2614 (1966). 219. R. H. Hall, “The Modified Nucleosides in Nucleic Acid.” Columbia University Press, New York, 1971. 220. F. Skoog and D. J. Armstrong, Ann. Rev. Plant Physiol. 21, 359 (1970). 221. W. J. Burrows, D. J. Armstrong, F. Skoog, S. M. Hecht, J. T. Boyle, N. J. Leonard and J. Occolowitz, Science 161, 691 (1968). 222. W. J. Burrows, D. J. Armstrong, F. Skoog, S. M. Hecht, J. T. Boyle, N. J. Leonard and J. Occolowitz, Bchem 8, 3071 (1969). 223. F. Harada, H. J. Gross, F. Kimura, S. H. Chang, S. Nishimura and U. L. RajBhandary, BBRC 33, 299 (1968). 224. R. H. Hall, L. Csonka, H. David and B. McLennan, Science 156, 69 (1967). 225. B. Thimmappaya and J. D. Cherayil, BBRC 60, 665 (1974). 226. J. D. Cherayil and M. N. Lipsett, J. Bact. 131, 741 (1977). 227. J. W. Einset and F. K. Skoog, BBRC 79, 1117 (1977). 228. R. O. Morris, D. A. Regier, R. M. Olson, Jr., L. A. Struxness and D. J. Armstrong, Bchem 20, 6012 (1981). 229. M. Buck, J. A. McCloskey, B. Basile and B. N. Ames, NARes 10, 5649 (1982). 230. J. J. Janzer, J. P. Raney and B. D. McLennan, NARes 10, 5663 (1982). 231. F. Skoog and R. Y. Schmitz, in “Biochemical Actions of Hormones” (G. D. Litwak, ed.), Vol. VI, p. 335. Academic Press, New York, 1979.
232. P. H. Blum and B. N. Ames, BBA 1007, 196 (1989). 233. P. Ajitkumar and J. D. Cherayil, J. Bact. 162, 752 (1985). 234. S. Nishimura, Y. Yamada and H. Ishikura, BBA 179, 517 (1969). 235. D. J. Armstrong, W. J. Burrows, F. Skoog, K. L. Roy and D. Soll, PNAS 63, 834 (1969). 236. D. J. Armstrong, F. Skoog, L. H. Kirkegaard, A. E. Hampel, R. M. Bock, I. Gillam and G. M. Tener, PNAS 63, 504 (1969). 237. R. Landick and C. Yanofsky, in “Escherichia coli and Salmonella typhimurium. Cellular and Molecular Biology” (F. C. Neidhardt, J. L. Ingraham, K. Brooks Low, B. Magasanik, M. Schaechter and H. E. Umbarger, eds.), p. 1276. American Society for Microbiology, Washington, D.C., 1987. 238. C. Yanofsky and L. Soll, JMB 113, 663 (1977). 239. S. P. Eisenberg, L. Soll and M. Yarus, JBC 254, 5562 (1979). 240. M. Buck and E. Griffiths, NARes 10, 2609 (1982). 241. J. U. Ericson and G. R. Bjork, J. Bact. 166, 1013 (1986). 242. J. Caillet and L. Droogmans, J. Bact. 170, 4147 (1988). 243. D. M. Connolly and M. E. Winkler, J. Bact. 171, 3233 (1989). 244. H.-C. Tsui, G. Zhao, G. Feng, H.-C. Eastwood Leung and M. E. Winkler, Mol. Microbiol. 11, 189 (1994). 245a. D. M. Connolly and M. E. Winkler, J. Bact. 173, 1711 (1991). 245b. O. Mikuni, K. Ito, K. Matsumura, J. Moffat, T. Nobukuni, K. McCaughan, W. Tate and Y. Nakamura, PNAS 91, 5798 (1994). 245c. G. D. Grentzmann, D. Brechemier-Baey, V. Heurgne, L. More and R. A. Buckingham, PNAS 91, 5848 (1994). 246a. M. Kajitani and A. Ishihama, NARes 19, 1063 (1991). 246b. B. Esberg and G. R. Bjork, unpublished. 247. B. C. Persson and G. R. Bjork, J. Bact. 175, 7776 (1993). 248. F. Fittler, L. K. Kline and R. H. Hall, Bchem 7, 940 (1968). 249. A. Peterkofsky, Bchem 7, 472 (1968). 250. J. L. Goldstein and M. S. Brown, Nature 343, 425 (1990). 251. F. Fittler, L. K. Kline and R. H. Hall, BBRC 31, 571 (1968). 252. M. L. Gefter, BBRC 36, 435 (1969). 253. J. K. Bartz, L. K. Kline and D. Soll, BBRC 40, 1481 (1970). 254. L. K. Kline, F. Fittler and R. H. Hall, Bchem 8, 4361 (1969). 255. N. Rosenbaum and M. L. Gefter, JBC 247, 5675 (1972). 256. P. F. Agris, D. J. Armstrong, K. P. Schäfer and D. Soll, NARes 2, 691 (1975). 257. M. L. Gefter and R. L. Russell, JMB 39, 145 (1969). 258. A. Schon, A. Bock, G. Ott, M. Sprinzl and D. Soll, NARes 17, 7159 (1989). 259. A. H. Rosenberg and M. L. Gefter, JMB 46, 581 (1969). 260. E. Griffiths and J. Humphreys, EJB 82, 503 (1978). 261. F. O. Wettstein and G. S. Stent, JMB 38, 25 (1968). 262. M. Buck and B. N. Ames, Cell 36, 523 (1984). 263. T. H. Jukes, Nature 246, 22 (1973). 264. C. Houssier and H. Grosjean, J. Biomol. Struct. Dyn. 3, 387 (1985). 265. J. Vacher, H. Grosjean, C. Houssier and R. H. Buckingham, JMB 177, 329 (1984). 266. P. Gollnick and C. Yanofsky, J. Bact. 172, 3100 (1990). 267. R. Landick, C. Yanofsky, K. Choo and L. Phung, JMB 216, 25 (1990). 268. L. A. Petrullo, P. J. Gallagher and D. Elseviers, MGG 190, 289 (1983). 269. W. Salser, MGG 105, 1 (1969). 270. M. Yarus and J. Curran, in “Transfer RNA in Protein Synthesis” (D. L. Hatfield, B. J. Lee and R. M. Pirtle, eds.), p. 319. CRC Press, Boca Raton, Florida, 1992.
271. R. H. Buckingham, Experientia 46, 1126 (1990). 272. W. T. Pedersen and J. F. Curran, JMB 219, 231 (1991). 273. E. Bubienko, P. Cruz, J. F. Thomason and P. N. Borer, This Series 30, 41 (1983). 274. S. M. Freier, R. Kierzek, J. A. Jaeger, N. Sugimoto, M. H. Caruthers, T. Neilson and D. H. Turner, PNAS 83, 9373 (1986). 275. I. Diaz and M. Ehrenberg, JMB 222, 1161 (1991). 276. D. Hirsh, JMB 58, 439 (1971). 277. D. Smith and M. Yarus, JMB 206, 489 (1989). 278. R. H. Buckingham and C. G. Kurland, PNAS 74, 5496 (1977). 279. M. Buck and E. Griffiths, NARes 9, 401 (1981). 280. A. Bjornsson and L. A. Isaksson, JMB 232, 1017 (1993). 281. F. Janiak, V. A. Dell, J. K. Abrahamson, B. S. Watson, D. L. Miller and A. E. Johnson, Bchem 29, 4268 (1990). 282. V. Burdett, JBC 266, 2872 (1991). 283. V. Burdett, J. Bact. 175, 7209 (1993). 284. D. Hirsh and L. Gold, JMB 58, 459 (1971). 285. A. M. Weiner and K. Weber, Nature New Biol. 234, 206 (1971). 286. R. K. Wilson and B. A. Roe, PNAS 86, 409 (1989). 287. S. K. Dube, K. A. Marcker, B. F. Clark and S. Cory, Nature 218, 232 (1968). 288. K. M. Harrington, I. A. Nazarenko, D. B. Dix, R. C. Thompson and O. C. Uhlenbeck, Bchem 32, 7617 (1993). 289. J. R. Menninger, Mech Ageing Dev 6, 131 (1977). 290. L. A. Petrullo and D. Elseviers, J. Bact. 165, 608 (1986). 291. I. Diaz, S. Pedersen and C. G. Kurland, MGG 208, 373 (1987). 292. J. Ninio, CSHSQ 52, 639 (1987). 293. R. Mikkola and C. G. Kurland, FEMS Microbiol Lett 56, 265 (1988). 294. P. H. Blum, J. Bact. 170, 5125 (1988). 295. E. Griffiths, J. Humphreys, A. Leach and L. Scanlon, Infect Immun 22, 312 (1978). 296. M. Cabrera, Y. Nghiem and J. H. Miller, J. Bact. 170, 5405 (1988). 297. Y. Nghiem, M. Cabrera, C. G. Cupples and J. H. Miller, PNAS 85, 2709 (1988). 298. B. C. Persson, “Function and Maturation of Translational Components in Salmonella typhimurium and Escherichia coli.” Thesis, Umeå University, Umeå, 1993. 299. W. Ream, Annu. Rev. Phytopathol. 27, 583 (1989). 300. P. Zambryski, ARGen 22, 1 (1988). 301. J. Gray, J. Wang and S. B. Gelvin, J. Bact. 174, 1086 (1992). 302. M. Adler, B. Weissman and A. B. Gutman, JBC 230, 717 (1958). 303. G. R. Bjork and K. Kjellin-Stråby, J. Bact. 133, 499 (1978). 304. G. E. Stroga, F. Nemoto, Y. Kuchino and G. R. Bjork, NARes 20, 3463 (1992). 305. G. R. Bjork, in “Escherichia coli and Salmonella typhimurium. Cellular and Molecular Biology” (F. C. Neidhardt, J. L. Ingraham, K. Brooks Low, B. Magasanik, M. Schaechter and H. E. Umbarger, eds.), p. 719. American Society for Microbiology, Washington, D.C., 1987. 306. G. R. Bjork, P. M. Wikstrom and A. S. Bystrom, Science 244, 986 (1989). 307. A. S. Bystrom, K. J. Hjalmarsson, P. M. Wikstrom and G. R. Bjork, EMBO J. 2, 899 (1983). 308. A. S. Bystrom, A. von Gabain and G. R. Bjork, JMB 208, 575 (1989). 309. L. Lindahl and J. M. Zengel, ARGen 20, 297 (1986). 310. P. M. Wikstrom, A. S. Bystrom and G. R. Bjork, JMB 203, 141 (1988). 311. P. M. Wikstrom and G. R. Bjork, J. Bact. 170, 3025 (1988). 312. P. M. Wikstrom, L. K. Lind, D. E. Berg and G. R. Bjork, JMB 224, 949 (1992).
313. T. Ny, J. Thomale, K. Hjalmarsson, G. Nass and G. R. Björk, BBA 607, 277 (1980).
314a. W. M. Holmes, C. Andraos-Selim, I. Roberts and S. Z. Wahab, JBC 267, 13440 (1992).
314b. J.-H. Li and G. R. Björk, unpublished.
315. T. G. Hagervall, T. M. Tuohy, J. F. Atkins and G. R. Björk, JMB 232, 756 (1993).
316. T. Ikemura, JMB 146, 1 (1981).
317. U. L. RajBhandary, S. H. Chang, H. J. Gross, F. Harada, F. Kimura and S. Nishimura, FP 28, 409 (1969).
318. H. M. Goodman, J. Abelson, A. Landy, S. Brenner and J. D. Smith, Nature 217, 1019 (1968).
319. H. M. Goodman, J. N. Abelson, A. Landy, S. Zadrazil and J. D. Smith, EJB 13, 461 (1970).
320. B. P. Doctor, J. E. Loebel, M. A. Sodd and D. B. Winter, Science 163, 693 (1969).
321. H. Kasai, Z. Ohashi, F. Harada, S. Nishimura and N. J. Oppenheimer, Bchem 14, 4198 (1975).
322. S. Yokoyama, T. Miyazawa, Y. Iitaka, Z. Yamaizumi, H. Kasai and S. Nishimura, Nature 282, 107 (1979).
323. K. Isono, J. Krauss and Y. Hirota, MGG 149, 297 (1976).
324. N. Okada, S. Noguchi, S. Nishimura, T. Ohgi, T. Goto, P. F. Crain and J. A. McCloskey, NARes 5, 2289 (1978).
325. K. Reuter, R. Slany, F. Ullrich and H. Kersten, J. Bact. 173, 2256 (1991).
326. S. Noguchi, Z. Yamaizumi, T. Ohgi, T. Goto, Y. Nishimura, Y. Hirota and S. Nishimura, NARes 5, 4215 (1978).
327. S. Noguchi, Y. Nishimura, Y. Hirota and S. Nishimura, JBC 257, 6544 (1982).
328. N. Okada, S. Noguchi, H. Kasai, N. Shindo-Okada, T. Ohgi, T. Goto and S. Nishimura, JBC 254, 3067 (1979).
329. N. Okada, T. Yasuda and S. Nishimura, NARes 4, 4063 (1977).
330. J. R. Katze, M. H. Simonian and R. D. Mosteller, J. Bact. 132, 174 (1977).
331. R. K. Slany, M. Bösl, P. F. Crain and H. Kersten, Bchem 32, 7811 (1993).
332. B. Frey, J. McCloskey, W. Kersten and H. Kersten, J. Bact. 170, 2078 (1988).
333. R. K. Slany and H. Kersten, NARes 20, 4193 (1992).
334. S. M. Kane, C. Vugrincic, D. S. Finbloom and D. W. Smith, Bchem 17, 1509 (1978).
335. F. Martin, G. Eriani, S. Eiler, D. Moras, G. Dirheimer and J. Gangloff, JMB 234, 965 (1993).
336. F. Meier, B. Suter, H. Grosjean, G. Keith and E. Kubli, EMBO J. 4, 823 (1985).
337. A. L. McNamara and D. W. Smith, JBC 253, 5964 (1978).
338. B. Frey, G. Janel, U. Michelsen and H. Kersten, J. Bact. 171, 1524 (1989).
339. H. Beier, M. Barciszewska, G. Krupp, R. Mitnacht and H. J. Gross, EMBO J. 3, 351 (1984).
340. H. Beier, M. Barciszewska and H.-D. Sickinger, EMBO J. 3, 1091 (1984).
341. M. Bienz and E. Kubli, Nature 294, 188 (1981).
342. G. Jänel, U. Michelsen, S. Nishimura and H. Kersten, EMBO J. 3, 1603 (1984).
343. F. Harada, F. Kimura and S. Nishimura, BBA 182, 590 (1969).
344. F. Harada, F. Kimura and S. Nishimura, BBA 195, 590 (1969).
345. M. Yaniv and B. G. Barrell, Nature 222, 278 (1969).
346. K. Murao, M. Saneyoshi, F. Harada and S. Nishimura, BBRC 38, 657 (1970).
347. K. Murao, T. Hasegawa and H. Ishikura, NARes 3, 2851 (1976).
348. G. R. Björk and A. Olsén, Acta Chem. Scand. B 33, 591 (1979).
349. G. R. Björk, JMB 140, 391 (1980).
350. T. G. Hagervall, Y. H. Jonsson, C. G. Edmonds, J. A. McCloskey and G. R. Björk, J. Bact. 172, 252 (1990).
357. D. A. Kellogg, B. P. Doctor, J. E. Loebel and M. W. Nirenberg, PNAS 55, 912 (1966).
358. H. Ishikura, Y. Yamada and S. Nishimura, BBA 228, 471 (1971).
359. K. Takeishi, T. Takemoto, S. Nishimura and T. Ukita, BBRC 47, 746 (1972).
360. S. K. Mitra, F. Lustig, B. Akesson and U. Lagerkvist, JBC 252, 471 (1977).
361. T. Samuelsson, P. Elias, F. Lustig, T. Axberg, G. Folsch, B. Akesson and U. Lagerkvist, JBC 255, 4583 (1980).
362. Y. S. Guindy, T. Samuelsson and T. I. Johansen, BJ 258, 869 (1989).
363. T. Samuelsson, T. S. Guindy, F. Lustig, T. Borén and U. Lagerkvist, PNAS 84, 3166 (1987).
364. M. W. Kilpatrick and R. T. Walker, NARes 8, 2783 (1980).
365. J. E. Heckman, J. Sarnoff, B. Alzner-DeWeerd, S. Yin and U. L. RajBhandary, PNAS 77, 3159 (1980).
366. B. G. Barrell, S. Anderson, A. T. Bankier, M. H. de Bruijn, E. Chen, A. R. Coulson, J. Drouin, I. C. Eperon, D. P. Nierlich, B. A. Roe, F. Sanger, P. H. Schreier, A. J. Smith, R. Staden and I. G. Young, PNAS 77, 3164 (1980).
367. H. Grosjean and H. Chantrenne, Mol. Biol. Biochem. Biophys. 32, 347 (1980).
Corrigendum
Volume 49 (1994), in the chapter “Interaction of Epidermal Growth Factor with Its Receptor,” by Stephen R. Campion and Salil K. Niyogi, pages 353-383: On page 368, for Leu47→Arg and Leu47→Asp, the “Relative Affinity” values should be 0.09 and 0.06, respectively.
Index
A
Activation
  recombination by transcription factors, 92-93
  ribosomal, growth-regulated, 54-55
Affinity selection, and somatic hypermutation, 70-71
Amino acids, aromatic, biosynthesis, 322-323
B
Bacteria, defective in tRNA modification, 267-268
Base, branchpoint and last intron, 154
Base-pairing, RNA, 16-19
C
Capping enzyme
  genetic link with pre-mRNA splicing, 124-125
  Saccharomyces cerevisiae mRNA, 114-117
  and mRNA identity, 122-124
  role in vaccinia virus transcription, 112-114
  from Schizosaccharomyces pombe, 117-118
  sequence conservation, 119-122
  vaccinia virus, domain structure, 103-109
Cell cultures
  expression of transgenic constructs, 173-175
  models, 163-172
Cell growth, role of CEG1 capping activity, 116-117
Chromatin, organization of human apo-B gene, 162-163
cis-acting elements
  in κ locus regulating hypermutation, 74-79
  in targeting of recombination, 93-94
cmo5U, see Uridine-5-oxyacetic acid
Codon-anticodon interactions, function of m1G37 in, 313-314
Collagen
  fibrillar and nonfibrillar, 228-230
  membrane-associated, 230-248
  types XV and XVIII, 248-257
    sequence homologies, 250-252
  types XVII and XIII, 235-246
Complement, subcomplement C1q, 234-235
Cross-linking, RNA in spliceosome, 138-139
D
Development, Egr-1 induction during, 195-197
Differential screening, for Egr-1 cDNA, 192-193
Differentiation
  cellular, and gene regulation, 217-219
  Egr-1 induction during, 195-197
DNA
  binding activity of Egr-1, 205-210
  complementary, Egr-1, identification, 192-193
DNase-I, hypersensitivity of integrated transgenes, 180-184, 187-188
Domain V residues, in peptide-bond formation, 13-17
E
Enhancement
  single-copy enhancers and terminators, 52-54
  spacer promoters and repetitive enhancers, 49-51
Enhancers
  repetitive, 49-51, 56-57
  second and third intron, 166-170
  second intron, requirement for expression, 175-179
Enhancesome, structure and role in promotion, 45-49
Enzymes
  capping, see Capping enzyme
  switch recombination, cell-type specificity, 94-95
Escherichia coli, mutants, isolation, 296-298
Eukaryotes
  mRNA synthesis, capping enzyme in, 101-129
  ribosomal transcription in, 25-66
Evolution, ribosomal genes, 28-33
G
Genes
  apo-B, human, 161-190
  CEG1, capping activity, 116-117
  Egr-1
    cDNA identification, 192-193
    distal events, 201-219
    expression, 193-199
    in family with WT1 gene, 219-220
    proximal events, 199-201
  encoding types XV and XVIII collagen, 252
  encoding type XIII collagen, 238-239
  globin, 184
  immediate-early, 191-192
  regulation, 216-219
  reporter, 82-83
  ribosomal, 27-33
Genetics, and modified nucleosides in bacterial tRNA, 263-338
Germinal centers, somatic hypermutation in, 70-71
Growth
  regulated repressors of ribosomal genes, 55-56
  regulated ribosomal transcription, 54-55
Guanine, sequences, role in switch recombination, 84-85
Guanylyltransferase
  vaccinia virus RNA, 104-106
  yeast
    active site, 114-115
    structure probing, 115-116
H
Hypermutation, somatic, 69-83
I
IGS, see Intergenic spacer
Immunoglobulins
  isotype switch recombination, 83-94
  rearranged variable regions, 69-70
Intergenic spacer, heterogeneity in, 30-33
N6-Isopentenyl-2-thiomethyladenosine, and ms2io6A37, presence in tRNA, 296-298
Isotypes, immunoglobulins, switch recombination, 83-94
M
Macrophages, scavenger receptor, 232-233, 257
Membranes, associated collagenous proteins, 230-248
5-Methylaminomethyl-2-thiouridine, stepwise synthesis in position 34, 287-293
1-Methylguanosine, synthesis in position 37, 310-318
2-Methylthio-cis-ribosylzeatin
  lack of, 308-310
  stepwise synthesis in position 37, 299-300
Methyltransferase, domain of vaccinia capping enzyme, 106-108
5-Methyluridine, synthesis in position 54, 279-286
m1G, see 1-Methylguanosine
Mitogens, Egr-1 induction in response to, 193-195
mnm5s2U, see 5-Methylaminomethyl-2-thiouridine
Models, cell culture, 163-172
Mouse, transgenic, expression and second intron enhancer, 175-179
mRNA
  α1(XIII), tissue distribution, 246
  identity, and capping enzyme, 122-124
  nascent, cotranscriptional capping, 111-112
  pre-mRNA
    interaction with U1 snRNA, 139-141
    interaction with U2 snRNP, 142-143
    splicing, link with capping enzyme, 124-125
  spliced, release, 156
  synthesis, capping enzyme in, 101-129
ms2i6A, see N6-Isopentenyl-2-thiomethyladenosine
ms2io6A, see 2-Methylthio-cis-ribosylzeatin
m5U, see 5-Methyluridine
Mutants
  aro, 322-324
  hisT, 269-279
  miaA, miaB, and miaE, 296-310
  nuvA and nuvC, 293-296
  queA, queB, and tgt, 318-322
  trmA, 279-286
  trmC, trmE, and asuE, 287-293
  trmD, 310-318
Mutational analysis, yeast guanylyltransferase, 114-115
N
Neurons, signaling, Egr-1 induction in, 197-199
Nuclear localization signal, Egr-1, mapping, 213-216
Nucleosides, modified, in bacterial tRNA, 263-338
Nucleotides
  modified, in splicing, 154-156
  second-step function, 153
O
Operons, hisT, organization, 269-270
P
Peptide-bond formation, ribosome-catalyzed, 1-23
Peptidyltransferase, in peptide-bond formation, 1-23
Polynucleotide ligases, sequence conservation, 119-122
Proliferation, cellular, and gene regulation, 216-217
Promoters
  apo-B, localization, 163-166
  Egr-1, analysis, 200-201
  ribosomal, activation at, 44-49
  RPOI, 33-36
  spacer
    and repetitive enhancers, 49-51
    role in transcription enhancement, 56-57
Promotion, role of enhancesome, 48-49
Proteins
  collagenous, membrane-associated, 230-248
  early growth response, Egr-1
    characterization, 201-205
    DNA binding activity, 205-210
    structure-function analysis, 210-216
    targets of regulation, 216-219
    in vitro role, 219
  frizzled, 253-256, 258
  in switch recombination, 86-88
Proteolysis, limited, 115-116
Puromycin, A-site substrate, 5-9
Q
Queuosine, synthesis in position 34, 318-322
R
Radiation, injury, and Egr-1 induction, 197
Rate-limiting step, in cap formation, 109-111
Reading frame, maintenance, function of m1G37, 314-318
Recombination, switch, immunoglobulin isotypes, 83-94
Reducer effect, 170-172
Region specificity, isotype switch recombination, 85-86
Regulatory elements
  apo-B gene
    identification, 163-172
    in vitro function, 172-184
  heavy-chain, in targeting hypermutation, 80-82
  transcriptional, 90-92
Repression domain, Egr-1, localization, 211-213
Repressors, ribosomal genes, 55-56
Ribonucleoprotein particles, small nuclear
  hybrid, 136
  spliceosome built of, 134-135
  U2, interaction with pre-mRNA, 142-143
Ribosomes
  catalyzed peptide-bond formation, 1-23
  gene transcription, in eukaryotes, 25-66
RNA
  dynamic, 136-137
  messenger, see mRNA
  ribosomal, 2 3 4, 10-17
  -RNA interactions
    in spliceosome assembly, 139-146
    in splicing, 137-138
  small nuclear, see snRNA
  in spliceosome, 138-139
  transfer, see tRNA
RNA polymerase I
  complexity, 43-44
  in growth-regulated activation, 54-55
  promoters, 33-36
  in ribosomal transcription, 25-66
RPOI, see RNA polymerase I
S
Salmonella typhimurium, mutants, isolation, 296-298
Scavenger receptor, macrophage, 232-233, 257
Second messengers, in regulation of Egr-1, 199-200
Sequence motif, conservation, 119-122
Signaling, neuronal, Egr-1 induction in, 197-199
snRNA
  -exon hybrid snRNP, 136
  regeneration, 156
  U1 and U2, 139-143
  U1 departure, 146
  U2, U4, U5, and U6, 146-150
  U2.U6 helix Ia, 153-154
  U4 and U6, extensive interaction, 143-145
  U5, interaction with second exon, 151-153
snRNP, see Ribonucleoprotein particles, small nuclear
Spliceosome
  assembled, 150-151
  built of snRNPs, 134-136
  RNA, in cross-linking, 138-139
  and RNA-RNA interactions, 143-146
Splicing
  alternative, 239-246
  catalytic steps, 146-150
  complexes, early, 141-142
  general features, 132-136
  modified nucleotides in, 154-156
  RNA-RNA interactions in, 137-138
trans-Splicing, 136
Structure-function analysis, Egr-1, 210-216
s4U, see 4-Thiouridine
T
Targeting
  recombination, 93-94
  switch recombination, 86
TATA box binding protein, in RPOI, 35-42
Terminators, of transcription, 52-54
4-Thiouridine
  as sensor for near-UV light, 294-296
  synthesis in position 8, 293-294
Tissues
  distribution of collagen mRNAs, 246, 253
  injury, and Egr-1 induction, 197
  specificity of transgene expression, 179
trans-activation domains, Egr-1, definition, 210-211
Transcription
  ribosomal, in eukaryotes, 25-66
  in somatic hypermutation, 79-80
  units, looped-domain organization, 185
  vaccinia virus, 112-114
Transcription factors
  activation of recombination, 92-93
  basal, 36-44
  zinc-finger family, Egr-1 as prototype, 191-224
Transcripts, and variant transcripts, type XIII collagen, 239-246
Transesterifications, in splicing, 132-134
Transgenes
  expression, 179-180
  immunoglobulin, analyses, 83
  integrated, DNase-I hypersensitivity, 180-184, 187-188
  κ, cis-acting elements, 74-79
Transgenic constructs, expression in cell culture, 173-175
Translation
  elongation phase, 3-4, 9-10, 16
  function of cmo5U34 in, 323-324
  hisT mutants in, 270-276
  m1G37 in, 313-318
  mnm5s2U34 in, 289-292
  ms2i6A37 and ms2io6A37 in, 300-308
  m5U54 in, 282-285
  Q34 in, 320-321
  s4U8 in, 294-296
Triphosphatase, domain of vaccinia capping enzyme, 108-109
tRNA
  bacterial, modified nucleosides in, 263-338
  codon-context sensitivity, 300-306
  lack of
    m1G37, 318
    Q34, 322
  (m1G37)methyltransferase, synthesis, 310-313
  modification, 322-323
  (m5U54)methyltransferase, synthesis, 279-282
  presence of ms2i6A and ms2io6A37, 296-298
U
Upstream binding factor
  in ribosomal activation, 55
  in ribosomal transcription, 36-50
Upstream control element, see Upstream promoter element
Upstream promoter element, in RPOI, 33-38
Uridine-5-oxyacetic acid, synthesis in position 34, 322-324
V
Vaccinia virus
  capping enzyme, 103-109
  transcription, 112-114
Y
Yeast, 114-118 pombe
Z
Zinc-finger, family of transcription factors, 191-224