PROGRESS IN
Nucleic Acid Research and Molecular Biology

edited by

WALDO E. COHN
Biology Division
Oak Ridge National Laboratory
Oak Ridge, Tennessee

KIVIE MOLDAVE
Department of Molecular Biology and Biochemistry
University of California, Irvine
Irvine, California
Volume 54
ACADEMIC PRESS
San Diego New York Boston London Sydney Tokyo Toronto
This book is printed on acid-free paper.

Copyright © 1996 by ACADEMIC PRESS, INC.
All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Academic Press, Inc.
A Division of Harcourt Brace & Company
525 B Street, Suite 1900, San Diego, California 92101-4495

United Kingdom Edition published by Academic Press Limited
24-28 Oval Road, London NW1 7DX

International Standard Serial Number: 0079-6603
International Standard Book Number: 0-12-540054-3

PRINTED IN THE UNITED STATES OF AMERICA
96 97 98 99 00 01 EB 9 8 7 6 5 4 3 2 1
Abbreviations and Symbols
All contributors to this Series are asked to use the terminology (abbreviations and symbols) recommended by the IUPAC-IUB Commission on Biochemical Nomenclature (CBN) and approved by IUPAC and IUB, and the Editors endeavor to assure conformity. These Recommendations have been published in many journals (1, 2) and compendia (3); they are therefore considered to be generally known. Those used in nucleic acid work, originally set out in section 5 of the first Recommendations (1) and subsequently revised and expanded (2, 3), are given in condensed form in the frontmatter of Volumes 9-33 of this series. A recent expansion of the one-letter system (5) follows.

SINGLE-LETTER CODE RECOMMENDATIONS(a) (5)

Symbol   Meaning                Origin of symbol
G        G                      Guanosine
A        A                      Adenosine
T(U)     T(U)                   (ribo)Thymidine (Uridine)
C        C                      Cytidine
R        G or A                 puRine
Y        T(U) or C              pYrimidine
M        A or C                 aMino
K        G or T(U)              Keto
S        G or C                 Strong interaction (3 H-bonds)
W(b)     A or T(U)              Weak interaction (2 H-bonds)
H        A or C or T(U)         not G; H follows G in the alphabet
B        G or T(U) or C         not A; B follows A
V        G or C or A            not T (not U); V follows U
D(c)     G or A or T(U)         not C; D follows C
N        G or A or T(U) or C    aNy nucleoside (i.e., unspecified)
Q        Q                      Queuosine (nucleoside of queuine)

(a) Modified from Proc. Natl. Acad. Sci. U.S.A. 83, 4 (1986).
(b) W has been used for wyosine, the nucleoside of "base Y" (wye).
(c) D has been used for dihydrouridine (hU or H2Urd).

Enzymes
In naming enzymes, the 1984 recommendations of the IUB Commission on Biochemical Nomenclature (4) are followed as far as possible. At first mention, each enzyme is described either by its systematic name or by the equation for the reaction catalyzed or by the recommended trivial name, followed by its EC number in parentheses. Thereafter, a trivial name may be used. Enzyme names are not to be abbreviated except when the substrate has an approved abbreviation (e.g., ATPase, but not LDH, is acceptable).
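The single-letter code tabulated above lends itself to a small lookup table. The following is a minimal sketch only, assuming an RNA alphabet (U written in place of T(U)) and leaving out Q (queuosine); the names `IUPAC`, `matches`, and `find_all` are illustrative and do not come from the Recommendations.

```python
# Ambiguity symbols from the single-letter code table, written for RNA
# (U in place of T(U)). Q (queuosine) is deliberately left out of this sketch.
IUPAC = {
    "G": "G", "A": "A", "U": "U", "C": "C",
    "R": "GA", "Y": "UC", "M": "AC", "K": "GU",
    "S": "GC", "W": "AU",
    "H": "ACU", "B": "GUC", "V": "GCA", "D": "GAU",
    "N": "GAUC",
}


def matches(pattern: str, sequence: str) -> bool:
    """True if each base of `sequence` is allowed by the symbol at the same position of `pattern`."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in IUPAC[symbol]
        for symbol, base in zip(pattern.upper(), sequence.upper())
    )


def find_all(pattern: str, sequence: str) -> list[int]:
    """1-based start positions of every window of `sequence` that matches `pattern`."""
    k = len(pattern)
    return [
        i + 1
        for i in range(len(sequence) - k + 1)
        if matches(pattern, sequence[i:i + k])
    ]


if __name__ == "__main__":
    # "GRG" stands for GGG or GAG under the table above.
    print(find_all("GRG", "CUGGGAGC"))
```

In this sketch a pattern such as `GRG` matches both `GGG` and `GAG` but not `GUG`, because R stands for G or A only.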
REFERENCES

1. JBC 241, 527 (1966); Bchem 5, 1445 (1966); BJ 101, 1 (1966); ABB 115, 1 (1966), 129, 1 (1969); and elsewhere. General.
2. EJB 15, 203 (1970); JBC 245, 5171 (1970); JMB 55, 299 (1971); and elsewhere.
3. "Handbook of Biochemistry" (G. Fasman, ed.), 3rd ed. Chemical Rubber Co., Cleveland, Ohio, 1970, 1975, Nucleic Acids, Vols. I and II, pp. 3-59. Nucleic acids.
4. "Enzyme Nomenclature" [Recommendations (1984) of the Nomenclature Committee of the IUB]. Academic Press, New York, 1984.
5. EJB 150, 1 (1985). Nucleic Acids (One-letter system).

Abbreviations of Journal Titles

Journals                                   Abbreviations used
Annu. Rev. Biochem.                        ARB
Annu. Rev. Genet.                          ARGen
Arch. Biochem. Biophys.                    ABB
Biochem. Biophys. Res. Commun.             BBRC
Biochemistry                               Bchem
Biochem. J.                                BJ
Biochim. Biophys. Acta                     BBA
Cold Spring Harbor                         CSH
Cold Spring Harbor Lab                     CSHLab
Cold Spring Harbor Symp. Quant. Biol.      CSHSQB
Eur. J. Biochem.                           EJB
Fed. Proc.                                 FP
Hoppe-Seyler's Z. Physiol. Chem.           ZpChem
J. Amer. Chem. Soc.                        JACS
J. Bacteriol.                              J. Bact.
J. Biol. Chem.                             JBC
J. Chem. Soc.                              JCS
J. Mol. Biol.                              JMB
J. Nat. Cancer Inst.                       JNCI
Mol. Cell. Biol.                           MCBiol
Mol. Cell. Biochem.                        MCBchem
Mol. Gen. Genet.                           MGG
Nature, New Biology                        Nature NB
Nucleic Acid Research                      NARes
Proc. Natl. Acad. Sci. U.S.A.              PNAS
Proc. Soc. Exp. Biol. Med.                 PSEBM
Progr. Nucl. Acid. Res. Mol. Biol.         This Series
Some Articles Planned for Future Volumes
Minute Virus of Mice cis-acting Sequences Required for Genome Replication and the Role of the Trans-acting Viral Proteins
CAROLINE ASTELL, QINGQUAN LIU, COLIN E. HARRIS, JOHN BRUNSTEIN, HITESH K. JINDAL AND PAT TAM

Structure and Transcription Regulation of Nuclear Genes for the Mouse Mitochondrial Cytochrome c Oxidase
NARAYAN G. AVADHANI, A. BASU, C. SUCHAROV AND N. LENKA

The Large Ribosomal Subunit Stalk as a Regulatory Element of the Eukaryotic Translational Machinery
JUAN P. G. BALLESTA AND MIGUEL REMACHA

General Transcription Factors Controlling the Activity of Mammalian RNA Polymerase II
JANE W. CONAWAY AND RONALD C. CONAWAY

The Internal Structure of the Ribosome
BARRY S. COOPERMAN

Function and Mechanism in Prokaryotic General Recombination Systems
MICHAEL COX

S1 Nuclease Sensitive DNA Structures Contribute to Transcriptional Regulation of the Human PDGF A-Chain
ZHAO-YI WANG AND THOMAS F. DEUEL

Eukaryotic Nuclear RNase P: Structures and Functions
JOEL R. CHAMBERLIN, ANTHONY J. TRANGUCH, EILEEN PAGAN-RAMOS AND DAVID R. ENGELKE

Biochemistry and Molecular Biology of Cobalamin Biosynthesis
JORGE C. ESCALANTE-SEMERENA

Intron-encoded snRNAs
MAURILLE J. FOURNIER AND E. STUART MAXWELL

Mechanisms for the Selectivity of the Cell's Proteolytic Machinery
ALFRED GOLDBERG, MICHAEL SHERMAN AND OLIVER COUX

Structure/Function Relationships of Phosphoribulokinase and Ribulosebisphosphate Carboxylase/Oxygenase
FRED C. HARTMAN AND HILLEL K. BRANDES

The Nature of DNA Replication Origins in Higher Eukaryotic Organisms
JOEL A. HUBERMAN AND WILLIAM C. BURHANS

Function and Regulatory Properties of the MEK Kinase Family
GARY L. JOHNSON et al.

Regulation and Function of Adenosine Deaminase in Mice
MICHAEL R. BLACKBURN AND RODNEY E. KELLEMS

Experimental Analysis of Global Gene Regulation in Escherichia coli
ROBERT M. BLUMENTHAL, DEBORAH W. BORST AND ROWENA G. MATTHEWS

DNA Helicases: Roles in DNA Metabolism
STEVEN W. MATSON AND DANIEL W. BEAM

Bacterial and Eukaryotic DNA Methyltransferases
NORBERT O. REICH

Self-glucosylating Initiator Proteins and Their Role in Glycogen Biosynthesis
PETER J. ROACH

DNA Repair
AZIZ SANCAR

Depletion of Nuclear Poly(ADP-ribose) Polymerase by Antisense RNA Expression: Influence on Genomic Stability, Chromatin Organization, and DNA Repair and DNA Replication
CYNTHIA M. G. SIMBULAN-ROSENTHAL, DEAN S. ROSENTHAL, RUCHUANG DING, JOANY JACKMAN AND MARK E. SMULSON

Chemical Synthesis and Structure of Small RNA Molecules
MATHIAS SPRINZL AND STEFAN LIMMER

Transcriptional Regulation of Small Nuclear RNA Genes
WILLIAM E. STUMPH

Bacillus subtilis as I Know It
NOBORU SUEOKA

Effects of the Ferritin Open Reading Frame on Translational Induction by Iron
ROBERT E. THACH et al.
Structure and Function of the Human Immunodeficiency Virus Leader RNA

BENJAMIN BERKHOUT
Department of Virology
Academic Medical Center
University of Amsterdam
1105 AZ Amsterdam, The Netherlands
I. A Structure Model for HIV-1 and HIV-2 Leader RNA
II. The Trans-acting Responsive Hairpin
III. The Poly(A) Hairpin
IV. The Primer-binding Site
V. The RNA Dimerization Signal
VI. The RNA Packaging Signal
VII. Splicing and Translation Functions
VIII. Base Composition of HIV-SIV Leader RNAs
IX. Concluding Remarks
References
The retrovirus family encompasses a diverse group of viruses characterized by a replication step in which the viral RNA genome is copied into DNA by the virally encoded reverse transcriptase enzyme. Among retroviruses, the lentiviruses have the most complex genome structure and expression strategy. The primate lentiviruses include the human and simian immunodeficiency viruses. There are two types of human immunodeficiency virus, HIV-1 and HIV-2. Simian immunodeficiency viruses (SIVs) have been identified in a number of Old World monkey species: the sooty mangabey (SIVsm), mandrill (SIVmnd), African green monkey (SIVagm), Sykes monkey (SIVsyk), macaque (SIVmac), and chimpanzee (SIVcpz). In general, the primate lentiviruses can be split into five subgroups that are equally distantly related to one another (1, 2). Interestingly, phylogenetic analysis of nucleic-acid or amino-acid sequences strongly suggests that both HIV-1 and HIV-2 result from relatively recent simian-to-human cross-species transmissions. The HIV-1 genome is closely related to that of SIVcpz, and HIV-2 is almost identical to the SIVsm and SIVmac isolates. The three additional groups are represented by the SIVmnd, SIVagm, and SIVsyk isolates. The 5'-untranslated leader region of an HIV-SIV RNA genome encodes
multiple sequences important for viral replication. These sequences do not code for proteins but are the cis-acting sites of recognition by proteins and RNAs responsible for mediating several steps in the viral replication cycle. Reverse transcription of the retroviral genome, for example, is primed by a tRNA bound to an 18-nucleotide complementary region (the primer-binding site, PBS) near the 5' end of the genome. Other leader motifs, the dimerization and packaging signals (DIS and Ψ), are required for genome dimerization and selective encapsidation into assembling virions. Furthermore, processes such as mRNA splicing, polyadenylylation, and translation are controlled by sequence elements in the leader transcript. In addition, complex lentiviruses encode the transcriptional trans-activator protein Tat that binds to the trans-acting responsive (TAR) hairpin in the nascent leader transcript to regulate viral transcription from the long terminal repeat (LTR) promoter.

This article deals with the structure and function of the leader transcript of HIV-1 and HIV-2. Most of the RNA signals encoded by the untranslated leader RNA have specific nucleotide sequences critical for recognition and function [e.g., AAUAAA in the poly(A) site], but there is accumulating evidence that their structural context can also be important (e.g., the TAR hairpin motif). There has been an intense effort to analyze the secondary structure of retroviral leader RNAs using a variety of methods (biochemical analysis, free-energy minimization, sequence comparison, mutant analysis). The phylogenetic approach can be extremely helpful, given the large number of sequenced HIV-1 and HIV-2 isolates and the growing number of more distantly related sequences of members of the simian immunodeficiency viruses. In this article, which is not intended to be encyclopedic, we focus primarily on relationships between the structure of specific leader RNA motifs and their function in the retroviral life cycle.
A secondary structure model for the HIV-1 and HIV-2 leader is presented in Section I, and the individual motifs and their regulatory role in virus replication are discussed in Sections II-VIII.
I. A Structure Model for HIV-1 and HIV-2 Leader RNA

We and others have published RNA secondary structure models for several domains of the HIV-1 and HIV-2 leaders based on a variety of techniques (3-11). These models are generally similar, but there are some significant variations in some regions of the leader RNA. The leader RNA structure models presented in Figs. 1 and 2 are based on published data (see later discussions dealing with the individual structure motifs). In case of conflicting data, we performed extensive phylogenetic analysis of the particular RNA region in all HIV-SIV virus groups in order to reveal structural similarities. We will consider a double-helical element as definitely existing only when it is supported by sufficient comparative data. Specifically, a putative helix is considered to exist when (1) base-pairing covariance can be demonstrated (e.g., G·C changed to A·U) or (2) a similar structure can be folded for other HIV-SIV viruses. Although the comparative evidence is convincing for several hairpins [e.g., TAR, poly(A), DIS], it is obvious that these RNA structure models are by no means final.

Secondary structure models for the complete leader RNA of HIV-1 and HIV-2 are presented in Figs. 1 and 2, respectively. Each hairpin motif is identified by a name that refers to its (putative) function in viral replication. This region contains several important molecular signals. The 5' end folds the characteristic TAR hairpin structure with either one extended stem region (HIV-1) or a more complex, branched structure (HIV-2), and this motif forms the binding site for the Tat trans-activator protein. Further downstream is the poly(A) hairpin, which invariably presents the AAUAAA hexamer involved in polyadenylylation in the single-stranded loop region. The initiation site of reverse transcription (the primer-binding site, PBS) is part of an extended structure that may be involved in the annealing of the primer tRNA molecule. The larger PBS structure is divided into three subdomains: the top part consisting of a stem-loop, the relatively unstructured central domain containing the tRNA primer-binding site, and the bottom part consisting of an extended stem region with several irregularities.
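The covariance criterion stated above (a helix is supported when a pair such as G·C is replaced by A·U in another isolate) can be illustrated with a short script. This is only a sketch under stated assumptions: the two sequences are taken to be already aligned, the helix is supplied as a list of paired positions, G·U wobble pairs are counted as pairs, and the sequences in the example are invented toy hairpins, not HIV or SIV data. The function names are hypothetical.

```python
# Watson-Crick pairs plus the G·U wobble pair (treating wobble as a pair is an assumption).
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(x: str, y: str) -> bool:
    """True if the two bases can form one of the pairs in PAIRS."""
    return (x, y) in PAIRS


def covariation(seq_a: str, seq_b: str, helix: list[tuple[int, int]]):
    """Return the helix positions at which both aligned sequences are paired
    but BOTH bases differ between them (e.g., G·C in one, A·U in the other).

    `helix` holds 0-based (i, j) index pairs of the proposed base pairs.
    A single-sided change such as G·C to G·U keeps the pair but is not
    counted as covariation here.
    """
    support = []
    for i, j in helix:
        pair_a = (seq_a[i], seq_a[j])
        pair_b = (seq_b[i], seq_b[j])
        both_paired = can_pair(*pair_a) and can_pair(*pair_b)
        both_changed = pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]
        if both_paired and both_changed:
            support.append((i, j, pair_a, pair_b))
    return support


if __name__ == "__main__":
    # Toy hairpins: 3-bp stem, 4-nt loop. Invented sequences for illustration only.
    isolate_1 = "GCGAAAACGC"
    isolate_2 = "GCAAAAAUGC"
    stem = [(0, 9), (1, 8), (2, 7)]
    for hit in covariation(isolate_1, isolate_2, stem):
        print(hit)
```

In the toy example only the innermost stem position changes, from G·C to A·U, so it is the single position reported as comparative support for the helix.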
The DIS hairpin is critical for initiation of genome dimerization, but additional dimerization signals as well as the encapsidation signal (Ψ) are believed to be located downstream of the major splice donor (SD). [Throughout this manuscript we refer to these RNA structure models and discuss the different motifs arbitrarily from the 5' (TAR) to the 3' (Ψ) end of the leader.] It is realized that dealing with individual hairpins may be a gross oversimplification, because there may be structural or functional interactions between the different RNA modules. It is likely that the three-dimensional structure is not a collection of hairpin structures connected by single-stranded regions. For instance, the RNA stretches between the stem-loop structures may form long-distance interactions that contract the molecule into a more rigid structure, but such tertiary interactions are not indicated simply because they have not yet been studied. Some stem regions are connected by a few or no nucleotides [in particular, the TAR and poly(A) hairpins], raising the possibility of coaxial stacking of the neighboring stems as in the structure of transfer RNA. Furthermore, some RNA domains may
FIG. 1. Secondary RNA structure model for the HIV-1 leader RNA (LAI isolate). The 5' end of viral RNA (position +1) has a cap (m7G). The transcriptional start site at +1 marks the border between the upstream (untranscribed) U3 region and R region of the LTR. The R region (position 1-97) is the short repeat at each end of the genome (see Fig. 4). The U5 region (98-181) is encoded by the LTR, but unique for the 5' end of HIV-1 transcripts. The leader RNA ends at the AUG initiation codon of the gag open reading frame at position 336. All hairpin motifs have been named after their (putative) function in HIV-1 replication and/or after the sequence elements encoded by them. Several direct repeats in the HIV-1 leader sequence are discussed in the text. These include an 8-mer within the TAR region (CUCUCUGG, positions 4-11 and 36-43), a 10-mer in the TAR and the PBS regions (GGAGCUCUCU, positions 32-41 and 223-232), and a 7-mer in the region encompassing the DIS and SD hairpins (GAGGCGA, positions 270-276 and 280-286). Several important sequence motifs are indicated by shaded boxes (AAUAAA hexamer involved in polyadenylylation, the 18-nucleotide PBS site, the GCGCGC palindrome in the loop of the DIS hairpin, the gag AUG start codon). The cleavage site within the major splice donor is marked by an arrow.
FIG. 2. Secondary RNA structure model for the HIV-2 leader RNA (ROD isolate). The transcriptional start site at +1 marks the border between the upstream (untranscribed) U3 region and R region of the LTR. The R region (position 1-173) is repeated in the 3' end of all HIV-2 transcripts. The U5 region (174-302) is encoded by the LTR, but unique for the 5' end of HIV-2 transcripts. The leader RNA ends at the AUG initiation codon of the gag open reading frame at position 545. All hairpin motifs have been named after their (putative) function in HIV-1 replication and/or after the sequence elements encoded by them. Several important sequence motifs are indicated by boxes (AAUAAA hexamer involved in polyadenylylation, the 18-nucleotide PBS site, the GGUACC palindrome in the loop of the DIS hairpin, the gag AUG start codon). The site of cleavage within the major splice donor is marked by an arrow. [Part of the drawing (region 1-390) is after the model presented in Ref. 8, reproduced by permission of Oxford University Press.]
maintain a level of plasticity by being in an equilibrium between two structures, and such RNA conformational transitions can provide unique regulatory possibilities. We have resisted the temptation to maximize the number of base pairs in our models. Many helices in Figs. 1 and 2 could be extended by a few base pairs on introduction of bulging bases and other destabilizing elements. In the absence of comparative evidence, we preferred to show these segments as single-stranded.

In general, there are several indications that extremely stable RNA structures are avoided in the HIV leader region. First, the stability of some stem regions was found to be fine-tuned, with a clear restriction to fold into excessively stable structures [e.g., the poly(A) hairpin; see Section III]. Second, a notable feature that seems to hold for the complete HIV leader, and in particular the TAR and poly(A) stems, is the frequent occurrence of unpaired, bulged single residues within helical regions. A preference for bulged A residues is observed, as has been reported for other RNA molecules (12). Although bulges can form specific recognition sites for proteins (e.g., Tat protein binds the 3-nucleotide TAR bulge; see Section II), the role of bulges may also be to preclude the formation of excessively stable stem regions that may interfere with replication functions of the viral RNA. In particular, stable hairpins can interfere with the 5' → 3' scanning movement of ribosomes (13-15) or the 3' → 5' movement of an elongating reverse transcriptase enzyme (16, 17).
II. The Trans-acting Responsive Hairpin

The role of TAR RNA in regulating HIV-1 gene expression has been extensively investigated by both in vitro transcription and transient transfection analyses. The role of TAR in Tat-mediated activation of viral transcription from the long-terminal-repeat (LTR) promoter has been discussed extensively in recent reviews (18-20), and we will therefore only summarize some important aspects of the TAR structure and function. Several features within TAR are critical for function, including the stem region, the 3-nucleotide bulge, and the 6-nucleotide loop. The viral transactivator protein Tat binds to the bulge domain of TAR RNA as part of nascently transcribed HIV-1 transcripts and activates the transcription machinery from this "RNA-enhancer" binding site (15, 21). Multiple observations suggest that cellular proteins, which also interact with TAR, are involved in Tat-mediated trans-activation. For instance, the fact that mutations in the loop sequences do not inhibit Tat binding to TAR RNA and yet greatly reduce trans-activation suggests that cellular factors that recognize the TAR loop are important. Indeed, several cellular TAR RNA-binding proteins have
been cloned and/or purified with binding specificities for either the loop, the stem, or the bulge (22-27), although the precise role of these factors during Tat-dependent transcription remains unclear. Other cellular proteins bind to the TAR DNA sequences as part of the LTR transcriptional promoter (28-31). More recently, the role of the TAR RNA motif was analyzed using mutant HIV-1 viruses in tissue culture infections; in particular, the analysis of spontaneous revertant viruses further defined the critical TAR sequences and structural features (31a-34).

A comparative analysis of TAR RNA structures in all human and simian immunodeficiency viruses reveals a conservation of certain structure features, despite significant divergence in both nucleotide sequence and length of the different TAR regions (Fig. 1 for HIV-1, Fig. 2 for HIV-2, and Fig. 3 for SIVagm, SIVmnd, and SIVsyk). In particular, we found a striking structural resemblance between the TAR elements of SIVmnd, SIVsyk, and HIV-2. Furthermore, the TAR structure of SIVagm is intermediate in complexity compared to the single-stem TAR structure in HIV-1 and the duplex TAR structure in HIV-2, SIVmnd, and SIVsyk. Clearly, sequence and structure elements are conserved in the upper parts of both the single and duplex hairpins. This domain consists of a helix with a 2- to 4-nucleotide U-rich bulge, a 6-nucleotide GGG- or GAG-containing loop, and 4 or 5 base pairs in between. The degree of structure variation in this region of the genome of the different HIV-SIV viruses may suggest that a common ancestral virus diverged a long time ago, but an uncertain factor in these calculations is the in vivo mutation rate of this group of viruses. In addition, we previously suggested an unusual RNA recombination event for this repeat (R) region of the HIV-1 genome.
An elongating reverse transcriptase enzyme can prematurely transfer from 5' to 3' TAR-repeat sequences during minus-strand strong-stop cDNA translocation (17). According to this mechanism, a simple TAR structure can convert in a one-step reaction into a complex hairpin and vice versa.

Retroviral RNA genomes are terminally redundant and both the TAR and poly(A) hairpin motifs are contained within each repeat region (Fig. 4). It is generally assumed that TAR is functional as a Tat-binding site in the 5'-R region, whereas the poly(A) signal is functional only in the 3' context. The latter hypothesis is reasonable, because synthesis of a full-length transcript requires bypass of the polyadenylylation signal in the leader, but use of this site in the 3' R. The mechanisms proposed to govern this differential poly(A) site use and the role of RNA structure are discussed in Section III. As part of the 5'-R leader region, however, the poly(A) hairpin structure may perform a different function. Similarly, although most studies have focused on TAR in the context of a 5'-LTR promoter, integrated proviruses can activate transcription from the 3' TAR-LTR enhancer-promoter, thus expressing downstream cellular sequences (35). Furthermore, there is no obvious reason why the 3'-TAR motif cannot play additional roles in the viral replication cycle. A simple possibility is that the TAR and poly(A) stems confer protection against cellular exonucleases. In that case, a hairpin-binding protein may also be required because the presence of a hairpin near the 3' end of an RNA is not by itself sufficient for a longer lifetime of the RNA in vivo (36).

FIG. 3. TAR RNA secondary structure models for the SIVagm, SIVmnd, and SIVsyk virus groups. These hairpin structures form the extreme 5' end of the viral genomes and the transcriptional start site is marked as +1. Representative TAR structures of the HIV-1 and HIV-2 groups are shown in Figs. 1 and 2, respectively. (Part of this figure was reproduced from Ref. 5, by permission of Oxford University Press.)

FIG. 4. RNA secondary structure model for the repeat region at the 5' and 3' ends of HIV-1 transcripts. Shown are both ends of a mature, polyadenylylated HIV-1 transcript. Nucleotide positions in the two repeats are numbered in an identical manner with respect to the transcriptional start site (+1) in the 5' R. The 5'-TAR and 5'-poly(A) hairpins are connected without a single nucleotide between the two stems, raising the possibility of coaxial stacking (see Section I). Polyadenylylation in the 3' R occurs 19 nucleotides downstream of the 3' AAUAAA hexamer between positions 97 and 98. This process truncates the poly(A) hairpin and allows for extension of the 3'-TAR hairpin with two base pairs (using the U-G dinucleotide encoded by the upstream U3 region). This rearrangement results in two stems separated by eight single-stranded nucleotides.

The tendency of viral RNA stem-loop structures not to become excessively stable is discussed in Section I. This restriction may apply in particular for the sequences within the repeat region of the leader, the TAR and poly(A) hairpins. Stable base-pairing interactions in the 3' R can interfere with one of the initial steps in the reverse transcription process, that is, transfer of the minus-strand strong-stop cDNA from the 5'-R template to the 3'-R acceptor.
Reverse transcription may be aborted at this step when the 3'-R domain is occluded in stable base pairing (37). Alternatively, the R-region hairpins could be actively involved in the molecular strand-transfer mechanism. This seems unlikely for the 3'-R structures because deletion of the 30-97 region, which removes half of the TAR sequences and the complete poly(A) hairpin, did not significantly reduce the production of infectious virus (37). These results do not rule out a role for the 5'-R hairpins during strand transfer.

Although the 5'- and 3'-TAR elements are identical in sequence, they differ in flanking sequences and this may differentially affect their structure. In particular, unique base-pair interactions may exist between 3' TAR and the upstream U3 region and between 5' TAR and the downstream leader-gag sequences (Fig. 4). The single-stranded TAR domains (3-nucleotide bulge and 6-nucleotide loop) may also be involved in tertiary base-pair interactions with flanking sequences. In fact, two studies have proposed an interaction between the G-rich loop of 5' TAR and downstream leader sequences (8, 38). The first study proposed an interaction between the second TAR loop in HIV-2 and complementary sequences between the poly(A) and PBS structures (Fig. 2; UGGG at positions 71-74 and CCC at positions 189-191, respectively). Consistent with this proposal, both regions were surprisingly insensitive toward various single-strand-specific reagents (8). This was particularly striking given the high reactivity of the first TAR loop, which contains very similar sequences. The second study identified a small hairpin motif within the HIV-1 gag gene with a 6-nucleotide loop (UCCCAG) that is the perfect complement of the TAR loop (CUGGGA) (38). Because both pseudoknot-like interactions of the TAR loop use sequences that are unique to the 5' end of HIV transcripts, they are expected to influence specifically the 5'-TAR structure.
The 5'- and 3'-TAR elements of HIV-1 were reported to be structurally similar based on RNase T1 accessibility of the G-rich loops, but the 5'-TAR sequence used in this study did not include the proposed gag interaction site (39). There is no additional evidence for these tertiary interactions based on phylogenetic sequence analysis. Covariations were not observed because the sequence elements involved are conserved among virus isolates (38, and data not shown). Clearly, more studies will be required to characterize both these tertiary interactions and, ultimately, their mechanism of action during virus replication.
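The loop-loop complementarity cited in this section (TAR loop CUGGGA and the gag hairpin loop UCCCAG) can be checked mechanically. The sketch below assumes only Watson-Crick pairing and antiparallel strands; the helper names are illustrative and are not taken from the cited studies.

```python
# Watson-Crick complements for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the sequence that pairs with `rna` in antiparallel orientation, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def is_perfect_complement(a: str, b: str) -> bool:
    """True if `b` (5' to 3') can pair with every base of `a` (5' to 3')."""
    return reverse_complement(a) == b.upper()


if __name__ == "__main__":
    tar_loop = "CUGGGA"   # TAR loop sequence quoted in the text
    gag_loop = "UCCCAG"   # gag hairpin loop sequence quoted in the text
    print(reverse_complement(tar_loop))
    print(is_perfect_complement(tar_loop, gag_loop))
```

Reading CUGGGA backwards and complementing each base gives UCCCAG, which is what "perfect complement" means for two antiparallel RNA strands.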
III. The Poly(A) Hairpin

Like most eukaryotic mRNAs, the HIV-1 transcript is post-transcriptionally processed at its 3' end by cleavage and polyadenylylation (reviewed in 40, 41). A poly(A) signal is present in the leader transcript, which apparently
does not function in this context. This sequence is part of the repeat element that is reiterated at the 3' end of the viral transcript, where it functions as an efficient polyadenylylation signal (Fig. 4). Most processing signals characterized are composed of at least two elements, the AAUAAA hexamer, which resides 10-30 nucleotides upstream of the cleavage site, and an amorphous U- or GU-rich element, which is located immediately downstream of the cleavage site. Both the hexamer and an extended GU cluster are present in the genomes of HIV-SIV viruses. For instance, the prototype HIV-1 isolate LAI contains three GU-rich motifs immediately downstream of the cleavage site (Fig. 1; GUGUGUG starting at position 102, UGUUGUGUG, and GUGUG at positions 162-166), of which the most upstream motif was demonstrated to facilitate efficient polyadenylylation (42). More recently, additional regulatory elements upstream of AAUAAA were identified in a variety of viral poly(A) sequences (SV40, adenovirus, cauliflower mosaic virus, hepatitis B virus; reviewed in 43). These enhancer elements function in an orientation- and position-dependent manner and are usually U-rich, but exhibit little sequence similarity. An upstream enhancer element has also been suggested for the HIV-1 poly(A) site, which is of considerable interest with respect to its putative role in the selective activation of the 3'-poly(A) site (see below).

Differential regulation of polyadenylylation has been studied for the animal retroviruses (44, 45), HIV-1 (46-52), and the human T cell leukemia virus type I (HTLV-I) (53, 54). The HTLV-I poly(A) signal is unique in that the AAUAAA hexamer is widely separated from the actual site of cleavage-poly(A) addition, but these two positions are juxtaposed by the folding of an extended RNA structure (53, 54). This example underscores the general notion that replicative functions of the leader may depend on its higher order RNA structure.
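The two-element description of a poly(A) signal given above (an AAUAAA hexamer 10-30 nucleotides upstream of the cleavage site, and a U- or GU-rich element immediately downstream) can be turned into a simple check. The following is a hedged sketch, not a validated predictor: the spacing is measured here from the last nucleotide of the hexamer to the last nucleotide before the cut, the downstream window length of 20 is an arbitrary choice, and the test sequence is a constructed toy, not an HIV sequence. The function name is hypothetical.

```python
def check_poly_a_site(rna: str, cleavage_after: int, window: int = 20):
    """Check the two core poly(A) elements around a proposed cleavage site.

    `cleavage_after` is the 1-based position of the last nucleotide before the cut.
    Returns (hexamer_ok, gu_fraction):
      hexamer_ok  - an AAUAAA ends 10-30 nt upstream of the cleavage site
                    (spacing convention is an assumption of this sketch)
      gu_fraction - fraction of G or U in the `window` nt just downstream of the cut
    """
    rna = rna.upper()
    hexamer_ok = False
    start = rna.find("AAUAAA")
    while start != -1:
        hexamer_end = start + 6  # 1-based position of the last hexamer nucleotide
        if 10 <= cleavage_after - hexamer_end <= 30:
            hexamer_ok = True
        start = rna.find("AAUAAA", start + 1)
    downstream = rna[cleavage_after:cleavage_after + window]
    gu_fraction = (
        sum(base in "GU" for base in downstream) / len(downstream)
        if downstream else 0.0
    )
    return hexamer_ok, gu_fraction


if __name__ == "__main__":
    # Toy sequence: hexamer, a 19-nt spacer, then a GU-rich stretch (invented for illustration).
    toy = "CC" + "AAUAAA" + "C" * 19 + "GUGUGUGUGU"
    print(check_poly_a_site(toy, cleavage_after=27))
    print(check_poly_a_site(toy, cleavage_after=12))
```

With the toy sequence, a cut placed 19 nucleotides after the hexamer satisfies both criteria, whereas a cut only 4 nucleotides after it fails the spacing test and sees little GU content downstream.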
Several models have been proposed to explain differential poly(A)-site usage of HIV RNA. First, the poly(A) signal may merely be inefficient, such that a certain percentage of transcripts will read through. Clearly, no specific regulation of polyadenylylation is required in this mechanism. According to the second model, the 5'-poly(A) site is sufficient for processing, but the close proximity to the mRNA start site (cap site) results in suppression (47, 50). Perhaps a leader RNA of sufficient length is required for binding of proteins involved in polyadenylylation. Like the first model, suppression by cap-site proximity predicts that no regulatory sequence motifs play a role in differential polyadenylylation. The third model uses an upstream activator sequence, present only in the 3'-poly(A) context, to activate the downstream poly(A) site specifically. Indeed, sequences that increase polyadenylylation efficiency have been identified in the U3 region upstream of the 3'-poly(A) site (46-49, 51, 52). The U3-enhancer model has been further modified to include the TAR
RNA stem-loop structure in order to juxtapose spatially the upstream enhancer and the core poly(A) site (51). This molecular mechanism is reminiscent of the RNA structure that bisects the HTLV-I poly(A) site (53, 54). That the TAR stem can act as a spacer was confirmed in artificial poly(A) constructs with the hairpin inserted between the AAUAAA motif and the downstream GU-rich box (55). Interestingly, we reported a severe replication defect for HIV-1 mutants with an opened lower part of the TAR stem region, although such TAR mutants are fully active in Tat-mediated trans-activation assays (32). Virus escape mutants did repair the TAR stem, without restoration of the actual base sequence. These observations are consistent with a spacer role for the full-length TAR stem in polyadenylylation. As an alternative to the U3-enhancer model for activation of the 3'-poly(A) site, it is theoretically possible to inactivate the 5'-poly(A) site specifically by silencer elements located in the leader sequences that are not in proximity to the 3'-poly(A) site. Silencing may occur either through binding of proteins that interfere with recognition and/or use of the upstream poly(A) site, or through RNA structural rearrangements that inactivate the upstream site. For instance, van Gelder et al. (56) convincingly demonstrated that binding of two molecules of the U1A protein to the poly(A) region of its own pre-mRNA interferes with polyadenylylation, presumably by blocking the access of polyadenylylation factors. In fact, specific silencing of the 5'-poly(A) site in HIV-1 was proposed to be mediated by the viral Tat protein, which binds the flanking 5'-TAR hairpin as part of the nascent transcript (57). However, this Tat-induced shift from the upstream to the downstream poly(A) site is probably caused by an effect of Tat protein on the processivity of RNA polymerase, an effect that is particularly pronounced in transfection systems with replicating plasmids (58, 59).
Except for the role of the TAR hairpin as spacer between the AAUAAA motif and the upstream enhancer, is there any additional role for leader RNA structure in the regulated polyadenylylation of HIV-1 transcripts? We performed a comparative sequence analysis of this part of the RNA genome of different groups of immunodeficiency viruses, including human types 1 and 2 and simian types mandrill, African green monkey, and Sykes (11). This analysis revealed the conservation of a hairpin motif despite the divergence in sequence (Fig. 5). In all cases, the AAUAAA signal is flanked by nucleotide segments that can base pair, thus forming a hairpin structure with the poly(A) signal in the single-stranded loop. The thermodynamic stability of this "poly(A) hairpin" was also well conserved, suggesting a biological function for this structure motif.

FIG. 5. Phylogenetic comparison of poly(A)-RNA hairpin structures in different HIV-SIV isolates. The poly(A) signals are denoted by grey boxes. (This figure is adapted from Ref. 11.) [Panels: HIV-1 LAI, HIV-1 ANT70, HIV-2 ROD, SIV-cpz, SIV-mnd, several SIV-agm isolates (including TYO and 677), and SIV-Syk.]

Consistent with this idea, it was shown that stabilization or destabilization of the stem region severely inhibits the replication potential of the HIV-1 virus (11). Fine-tuning of the stability of this hairpin may be essential either for polyadenylylation or for any other biological reaction in which the HIV-1 leader RNA participates. In general, the tendency not to produce perfect stems may reflect the biological role of this RNA molecule as template in reverse transcription and translation (see Section I). Recently, it was reported that HIV-1 polyadenylylation depends on the U3-enhancer because the AAUAAA motif is flanked by a suboptimal sequence context (43). Inspection of the nucleotide sequence of the different HIV-SIV poly(A) hairpins demonstrates a remarkable clustering of CUU triplets flanked by purines (Fig. 5). All viral species contain multiple copies of this RCUUR motif, which may form the binding site for repressor-like proteins. The inhibitory sequences that were identified both 5' and 3' of the hexamer are in fact the sequences that constitute the stem of the poly(A) hairpin, suggesting that it may be the RNA structure that blocks access of the processing enzymes to the poly(A) site. Binding of certain proteins to this RNA structure may also occlude the poly(A) sequences from recognition by the proteins involved in mRNA polyadenylylation. A role for RNA secondary structure in mRNA 3'-end processing was suggested previously for some other mRNA species (60, 61). However, a recent mutational analysis of the adenovirus-2 L4 poly(A) site could not confirm an effect of RNA structure on the efficiency of polyadenylylation (62). Of note, histone mRNAs, the only mRNAs that lack a poly(A) tail, are processed at their 3' end by enzymes that recognize a stem-loop structure (63). The limited mutational data in the HIV-1 system strongly suggest that the RNA hairpin motif encompassing the poly(A) signal plays an important role in the HIV-1 replication cycle (11).
Whether the hairpin is actually involved in regulation of HIV-1 polyadenylylation or is active at some other point in the retroviral life cycle awaits further study. We note that a rather different folding scheme for the region encompassing the poly(A) signal has been proposed by others (7). In our model, palindromic sequence elements flanking the poly(A) site are used to form a local stem-loop structure (11). The alternative model, however, proposes that this segment is involved in a long-distance base-pairing interaction with sequences approximately 165 nucleotides further downstream on the linear map (7). The phylogenetic analysis strongly supports the relatively simple hairpin model (8, 11), whereas helices of the alternative structure model are not validated by covariations. Furthermore, the biochemical probing data in support of the latter model are not necessarily inconsistent with the hairpin model. For instance, high reactivity of the AAAG sequence overlapping the poly(A) signal was reported (7), which is fully compatible with the hairpin structure.
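The hairpin model rests on the observation that the segments flanking AAUAAA can base pair with each other. A minimal sketch of that check is given below. It pairs the 5' arm against the 3' arm read in reverse and accepts Watson-Crick and G-U wobble pairs. The function name and the toy arms are hypothetical and are not taken from any of the sequences in Fig. 5; bulges and internal loops, which the real hairpins contain, are ignored.

```python
# Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def count_stem_pairs(five_prime_arm, three_prime_arm):
    """Count positions at which the two arms of a hairpin can pair.

    The first base of the 5' arm is matched with the last base of the
    3' arm, the second with the second-to-last, and so on. Arms of
    unequal length are compared over the shorter length only.
    """
    left = five_prime_arm.upper()
    right = three_prime_arm.upper()
    return sum((a, b) in ALLOWED_PAIRS for a, b in zip(left, reversed(right)))


# Toy hairpin: 4-bp stem closing a loop that carries the poly(A) signal.
stem_5, loop, stem_3 = "GGCU", "AAUAAA", "AGUC"
print(count_stem_pairs(stem_5, stem_3), "of", len(stem_5), "positions pair")
```

Counting pairs in this way is only a first filter. The comparison in the text relies on covariation across isolates and on thermodynamic stability, neither of which this sketch attempts to model.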
IV. The Primer-binding Site

Reverse transcription of the viral RNA genome is mediated by the virion-associated reverse transcriptase (RT) enzyme in combination with a cellular tRNA molecule as primer, its identity depending on the particular retroviral species. In each case, 16-19 nucleotides at the 3'-CCA end of the tRNA are the exact complement of the primer-binding site (PBS) located in the leader RNA of the corresponding virus genome. The HIV-SIV group of viruses uses tRNALys-3 as primer (64-66). A priori, incorporation of the proper tRNA primer into the virion can be mediated by specific interaction with a viral protein, most likely the RT enzyme, or through base pairing with the PBS site on the viral RNA genome. Several in vitro studies reported a specific interaction of tRNALys-3 with the HIV-1 RT protein (67-69), although nonspecific binding has been reported by others (70-72). Details of this RNA-protein interaction are currently being analyzed. Chemical cross-linking data suggest that the anticodon region of tRNALys-3 is in close proximity to the protein (67), but this tRNA domain does not provide the determinants for binding specificity (73, 74). Further, a specific tRNA-binding site within the C-terminal portion of the p66 subunit of RT has been proposed (75). Accumulating evidence that the RT domain of the precursor Gag-Pol protein is both required and sufficient for tRNA encapsidation comes from studies with infectious virus (76-79). Virion particles lacking the RT enzyme contain reduced levels of the tRNA primer, whereas normal tRNA levels are present in virions lacking a viral RNA genome. Consistent with the idea that the HIV-1 RT enzyme is dedicated to the tRNALys-3 molecule as primer for reverse transcription, viruses that are mutated in the PBS site, such that other tRNA primers should be used, are severely replication-defective (76, 78). We note that the situation may be different for the murine retroviruses.
Although the RT protein is also important for tRNA encapsidation in this case (80, 81), there is some evidence that these viruses can efficiently initiate reverse transcription with primers other than the natural tRNAPro molecule (82, 83). One striking feature of the HIV RNA structure models is the predominant single-strand character of the PBS site (Figs. 1 and 2). Only four (HIV-1) or three (HIV-2) base pairs of the PBS top stem need to open up for optimal base pairing with the 3' end of the tRNA primer. Intriguingly, several reports propose additional base-pairing contacts between the tRNA primer and the HIV RNA genome (8, 71, 84-86). For instance, Fig. 6 shows several potential base-pairing interactions between tRNALys-3 and the HIV-2 genomic RNA (8). As discussed, the PBS sequence is not the major player in selective tRNALys-3 encapsidation, and these additional base pairs are therefore not expected to be involved in selective tRNA packaging. The additional contacts may instead facilitate tRNA-PBS annealing through destabilization and melting of part of the tRNA cloverleaf structure. Alternatively, the RT protein may be actively involved in opening of the tRNA stems (see below). The additional tRNA-vRNA interactions can also trigger the formation of a higher order RNA structure that is specifically recognized and primed by the RT protein, as originally proposed for the Rous sarcoma retrovirus (87-87c). Finally, it cannot be excluded that the multidomain vRNA-tRNA interactions play a role in the maturation of genomic vRNA dimers (see Section V). We performed an extensive comparative sequence analysis of the various proposed interactions. However, because of the absolute conservation of the tRNALys-3 sequence and the relative invariance of several of the viral RNA sequences involved, these interactions are difficult to demonstrate by comparative evidence. Specific HIV mutants can now be designed to test the contribution of additional contacts with the tRNA primer. In this respect, we note the presence of important DNA sequence motifs in this region of the HIV-1 genome that are involved in the integration of the reverse-transcribed retroviral genome into the chromosomal DNA. This step requires the interaction of the virus-encoded integrase protein (In) with sequences located at the ends of viral DNA, the so-called att sites of the U3 region in the 5' LTR and the U5 region in the 3' LTR.
In particular, changes in the highly conserved C-A dinucleotide near the 3' end of U5 have a dramatic effect both on virus infectivity and on in vitro processing by the HIV-1 In protein (88-90). The conserved C-A sequence is, however, not sufficient for integration in vitro (90) and virus replication in tissue culture (91). The latter study suggested that U5 sequences critical for integration are situated within the terminal 16 nucleotides of U5 (91). Thus, the design of mutations in this leader region with specific effects (e.g., on RNA structure) is complicated by the presence of overlapping DNA sequence motifs. Based on RNase footprinting assays, it has been proposed that binding of the RT protein to the tRNALys-3 reverse transcription primer results in an opening of the acceptor stem (68). Recently, we demonstrated annealing of an oligonucleotide mimicking the PBS sequence to tRNALys-3 as part of the RT-tRNA complex, but not with the free tRNA molecule (74). These results suggest that RT opens the acceptor stem to allow intermolecular base pairing with the PBS site.

FIG. 6. Potential base-pairing interactions between the HIV-2 RNA genome and the tRNALys-3 primer. The 18-bp interaction of the 3' end of the tRNA primer with the PBS site is indicated by circles. The putative additional interactions add 8 bp (squares) and 6 bp (triangles and stars). All these additional interactions are hypothetical and should be tested experimentally (by biochemical probing or the analysis of viruses with specific mutations in these domains). (This figure is reproduced from Ref. 8, by permission of Oxford University Press, with some modifications in the number of additional base-pairing contacts.) Based on biochemical probing experiments, a detailed model was recently proposed for the interaction of tRNALys-3 with the HIV-1 genome (85, 86).

Besides the RT protein, other viral factors may be involved in tRNA annealing and/or initiation of reverse transcription. In particular, nucleocapsid (NC) protein promotes annealing of the primer tRNA to viral RNA (92, 92a), and this activity may largely be due to the ability of the NC protein to destabilize secondary structure (93). In fact, the latter
study demonstrated a general property of NC to lower the kinetic barrier for double-strand to single-strand transitions of both DNA and RNA templates. Recent experiments indicate a potential role for the accessory HIV-1 proteins Vif (94) and Nef (95, 96) in the reverse transcription process. Both proteins act in the virus-producing cells to allow the generation of virions that are fully competent for efficient reverse transcription of the RNA genome on entering a host cell. However, the mechanisms of these effects remain unknown, and the cellular targets for these viral proteins remain to be identified. These viral proteins could directly affect the reverse transcription process. Alternatively, they could be involved in the assembly of new virions or the processing of internalized virus (uncoating, incorporation of nucleotides, etc.), and such an effect of Vif on virion maturation has been reported (94, 97, 98). Some peculiar features of the PBS region are apparent from inspection of the RNA structures (Figs. 1 and 2). First, we note the presence of a perfect repeat of a 10-nucleotide sequence downstream of the PBS site and in the TAR element (Fig. 1; GGAGCUCUCU at positions 32-41 and 223-232). Second, a remarkable exclusion of A nucleotides is observed on the left side of the base pairs that constitute the PBS bottom stem (HIV-1, positions 112-153; HIV-2, positions 197-226). Comparing the two structure models in Figs. 1 and 2, it is obvious that the HIV-2 genome is more extended in this domain than the HIV-1 RNA, a situation very similar to that observed for the TAR and poly(A) structures (Sections II and III).
V. The RNA Dimerization Signal

The genome of all retroviruses consists of two identical full-length RNA transcripts noncovalently associated near their 5' ends in a region called the dimer linkage structure (DLS). A hairpin motif involved in the initiation of dimerization was recently described for HIV-1 (the dimerization initiation signal, DIS). Dimerization is generally considered to play an important role in the preferential encapsidation of viral genomes within the budding virus particle and in the process of reverse transcription. In particular, the presence of a diploid genome has been suggested to enhance genetic recombination, which may increase the rate of retroviral evolution. Furthermore, a dimeric genome allows the viral RT enzyme to bypass occasional breaks in one of the RNA genomes (99). The mechanism of retroviral genome dimerization is currently unclear, but several models have been proposed based on in vitro studies with purified RNA segments (100), and several attempts have been made to map the HIV RNA region responsible for dimerization (10, 11a, 102-111). Some
reports have described a crucial role for a trans-acting factor, the viral nucleocapsid protein NCp7, in the formation of RNA dimers (101, 103), but spontaneous RNA dimerization is possible in the absence of any viral or cellular protein. As discussed in Section IV, this effect of NC protein is based on its ability to activate base-pair rearrangements (93). Furthermore, protein is not required to hold the two RNA molecules together because genomic RNA can be phenol-extracted from mature virion particles as a dimer. It was initially proposed that "purine quartets" in the 3' end of the HIV-1 leader RNA are involved in the dimerization process (102, 104, 107). This model was based on the presence of several consensus RGGARA tracts in the DLS region downstream of the major splice donor (SD) of all retroviruses, and this mechanism is similar to dimerization of telomeric DNA through formation of quadruple helical structures stabilized by guanine base tetrads (112). For instance, the HIV-2 leader encodes four such motifs (Fig. 2; at positions 271-276 and 284-289, 447AGGAGA452, and 541GGGAGA546), with an additional motif in the 5' end of the gag open reading frame (573GGGAAA578). However, we reported efficient dimerization of mutant HIV-2 leader transcripts that were deleted for all RGGARA motifs (106). Similarly, several studies with HIV-1 RNA mutants reported the involvement of sequences outside the original DLS region (108-111). In particular, a dimerization initiation site (DIS) upstream of the SD was identified in the 248-270 region of the HIV-1 genome (Fig. 1). The DIS motif consists of a palindromic sequence in the loop of a hairpin structure (Figs. 1 and 2). Dimerization is proposed to be initiated via a loop-loop interaction based solely on Watson-Crick base pairing (108-111). This "loop-loop kissing" mechanism of autocomplementary loop sequences is very similar to RNA-RNA interactions proposed for the regulation of plasmid replication (113, 114).
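The RGGARA consensus mentioned above uses the standard one-letter code in which R stands for a purine (A or G). As an illustration of how such tracts can be located, the sketch below translates the consensus into a regular expression and reports 1-based start positions. The function name and the toy sequence are assumptions for the example; the sequence is not an HIV-2 leader.

```python
import re

# R = purine (A or G), so RGGARA becomes [AG]GGA[AG]A.
# The lookahead makes overlapping occurrences visible as well.
RGGARA = re.compile(r"(?=([AG]GGA[AG]A))")


def find_rggara(rna):
    """Return (1-based start position, matched hexamer) for each RGGARA tract."""
    rna = rna.upper().replace("T", "U")
    return [(m.start() + 1, m.group(1)) for m in RGGARA.finditer(rna)]


toy = "CCAGGAGACCGGGAAAUU"
for position, motif in find_rggara(toy):
    print(position, motif)
```

The motifs quoted in the text (AGGAGA, GGGAGA, GGGAAA) all satisfy this pattern. Finding the tracts is of course a separate matter from showing that they matter, and the deletion mutants cited above (106) indicate that they are dispensable for dimerization in vitro.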
Based on studies with model RNA stem-loops (115), it can be suggested that not only the complementarity between a pair of single-stranded loops, but also the exact loop sequence (and structure) may play a role in determining the stability of this RNA-RNA complex. It is possible that subsequent opening of both stem regions could further stabilize the structure by the formation of additional base-pairing interactions (108-111). Although there is convincing evidence for the critical role of this DIS hairpin in in vitro dimerization, infection experiments with mutant HIV-1 viruses should provide further evidence for the proposed mechanism and verify the role of potential accessory sequences that may activate dimerization. We performed a phylogenetic analysis of the corresponding region of the RNA genome of other HIV-SIV viruses and were able to fold a similar hairpin motif with a 6-mer palindromic sequence in the single-stranded loop for most of the sequences analyzed (Fig. 7).

FIG. 7. Phylogenetic comparison of DIS hairpin structures in the leader RNA of different HIV-SIV isolates. The palindromic motifs in the loop are denoted by grey boxes and the restriction enzymes with the corresponding sequence specificity are listed on top of the hairpins. No restriction enzyme with UCUACA sequence specificity has been identified. No similar hairpin motif could be folded for the SIVagm isolates. [Panels: HIV-1 LAI, ELI, Z2Z6, U455, MAL, ANT-70, and MVP-5180; SIVcpz; SIVsyk; SIVmnd; HIV-2 ROD and NIH-Z; SIVsm pbja. Enzymes shown: BssHII, SnoI, KpnI, HpaI.]

We did not recognize this motif
in the SIVagm sequences. Among the different DIS hairpins identified, there was considerable sequence heterogeneity in both the stem and loop domains. However, base changes on one side of the stem are compensated by base substitutions in the opposite strand ("base-pair covariation"). Remarkably, sequence variation in the loop demonstrates covariation within the palindromes ("palindromic covariation"). For convenience, we listed the restriction endonucleases with sequence specificity that corresponds to the loop palindromes (e.g., HIV-1 with the GCGCGC palindrome corresponds to the BssHII restriction enzyme). Given the variety of palindromes used by the different viruses, it is likely that the exact base sequence has relatively little importance. Based on these structure models, it is now straightforward to test the requirement for these structures and sequences in the context of the replicating virus. Multiple palindromic sequences are present in other single-stranded regions of the leader transcript. For instance, the prototype HIV-1 LAI virus (Fig. 1) contains the palindrome AAGCUU (HindIII site) in the loop of the poly(A) hairpin and the 302AAAAUUUU309 octamer palindrome in between the SD and ψ hairpins (see Section VI). Whether these additional palindromes contribute to the stability of the RNA dimer complex remains unclear. It is possible that the DIS interaction initiates dimerization, whereas other base-pairing contacts subsequently stabilize the complex. There is indeed some evidence that dimerization is a multistep process because dimers have been observed to "mature", that is, to increase their stability during assembly of virion particles (116). Phylogenetic analysis provides little evidence for these accessory palindromes. However, as was observed for the DIS palindromes (Fig. 7), sequence conservation is perhaps not essential for these base-pairing motifs.
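A loop sequence supports the loop-loop kissing interaction only if it is palindromic in the nucleic acid sense, that is, identical to its own reverse complement, so that two copies of the same RNA can pair with each other. A short sketch of that test follows, applied to palindromes quoted in the text. The function names are hypothetical, and only Watson-Crick pairing is considered.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def is_self_complementary(rna):
    """True if the sequence equals its own reverse complement."""
    return rna.upper() == reverse_complement(rna)


# GCGCGC: HIV-1 DIS loop; AAGCUU: poly(A) hairpin loop;
# AAAAUUUU: octamer between the SD and psi hairpins;
# GACG: a loop sequence from the packaging section, included as a
# non-palindromic control.
for loop in ("GCGCGC", "AAGCUU", "AAAAUUUU", "GACG"):
    print(loop, is_self_complementary(loop))
```

Because the test depends only on self-complementarity and not on the particular bases, it is consistent with the observation that different viruses use different palindromes ("palindromic covariation").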
VI. The RNA Packaging Signal

The packaging of retroviral genomes involves the specific interactions of the full-length RNA genome with Gag-derived proteins, in particular the Cys-His boxes of the NC domains (reviewed in 117). Because the sequences located between the major splice donor (SD, HIV-1 position 289) and the gag gene are present in full-length genomes and invariably absent from spliced mRNA forms, this region has received the greatest attention as the primary determinant for encapsidation. Indeed, several laboratories have presented evidence that sequences between the SD and gag open reading frame play a role in genome packaging (118-121). In most cases, this identification has been achieved by deletion mutagenesis, leading to an RNA encapsidation defect. More recently, there is accumulating evidence that
other regions of the HIV genome are also involved. In particular, sequences in the U5 region (98) or the DIS region (122) of the leader, the 5' part of the gag open reading frame (123-125), and env sequences overlapping the Rev-responsive element (RRE) (126) have been reported to contribute to the packaging function. Despite this increased knowledge, the actual RNA-protein interactions involved in packaging are poorly understood (127-130). Efficient binding of the NC protein to a 110-nucleotide HIV-1 RNA domain containing the four stem-loop structures DIS, SD, ψ, and AUG has been reported (128). Others observed efficient binding with a three-hairpin fragment (SD, ψ, and AUG) (11a) or with the single SD hairpin (129), with an essential role for both the loop sequence and the structural integrity of the SD stem. These authors reported an effect of this region on RNA dimerization as well. In fact, the sequence between the SD and ψ hairpins may be one of the accessory palindromes discussed in Section V. Eight intermolecular base pairs can form between two HIV-1 RNA molecules by means of the typical 302AAAAUUUU309 motif, and this interaction is not expected to disturb the intramolecular base pairs in the two flanking hairpin structures. There is some conservation of a sequence and/or structure motif in the packaging signals of other retroviruses. There is a hairpin motif with a conserved GACG loop sequence in type C murine retroviruses (131) and a similar structure with a GAPyC loop sequence conserved in some type D retroviruses (132). No similar sequence motifs can be found in the HIV-1 and HIV-2 leaders (Figs. 1 and 2), but we note the occurrence of purine-rich tetraloops in the HIV leader RNA. These structures resemble the "tetraloops" that account for the majority of hairpins in ribosomal RNA (12).
Three predominant tetraloop variants are present in ribosomal RNA (UUCG enclosed by a C-G base pair, GA/CAA with G-C as closing pair, and CUUG with a terminal R-Y), and their remarkable structural similarity was elucidated by NMR studies (133, 134). This part of the HIV-2 genome is more extended than the HIV-1 counterpart, as observed for the upstream leader motifs (e.g., TAR; Section II). Additional hairpin structures are predicted for the HIV-2 RNA (e.g., ψ2, ψ3), with loops that are purine-rich but do not consist of four nucleotides. HIV-2 RNA is predicted to fold two purine tetraloops in the PBS region (Fig. 2; 273GAAA276 and 333GAGA336). Thus, small hairpins with purine-rich loops may be involved in RNA packaging, but it is clear that additional sequence and/or structure elements are likely to be required for the selective encapsidation of retroviral genomes. In particular, we note the presence of extended polypurine stretches in this region of the HIV-SIV genomes (e.g., HIV-1, AGAAGGAGAGAGA; HIV-2, GGGAGCAGAAGAGG; a conserved 6-mer is underlined).
It is expected that an intricate and subtle network of tertiary interactions is involved in RNA dimerization, primer tRNA annealing, and encapsidation of the HIV genome into assembling virions. The temporal relationship of these processes has not been characterized rigorously. A potential link between dimerization and encapsidation of HIV-1 genomic RNA has been proposed, and the RNA signals involved may overlap in the 3' part of the leader transcript. Recent studies with HIV-1 and other retroviruses suggest that their genomes are already joined into some dimeric structure at the time of virus assembly, which is consistent with the notion that a dimeric genome is specifically recognized during virion assembly (12). The RNA signals for both dimerization and packaging should be mapped in further detail. Positioning of a critical dimerization signal (the DIS hairpin) upstream of the SD does not necessarily prove the overlap hypothesis to be wrong. First, there still may be sequences downstream of the SD that stimulate dimerization or stabilize the dimer configuration (see Section V). Second, it cannot formally be excluded that part of the packaging signal is also positioned upstream of the SD. A deletion in the DIS region has previously been shown to reduce the amount of intact genomic RNA present per virion (122), which suggests that this region is indeed involved in packaging. An alternative interpretation is that encapsidation of dimeric genomes takes precedence over encapsidation of monomeric RNA. It is also possible that the mutant RNA was packaged, but rapidly degraded due to the absence of stable dimers.
VII. Splicing and Translation Functions

Splicing of HIV-1 RNA is extremely complex because of the presence of multiple, alternatively used splice sites (reviewed in 135). In particular, numerous weak acceptor sites, located toward the center of the genomic RNA, are competing points of ligation for splicing. The leader encodes the major splice donor used to generate most subgenomic HIV transcripts (HIV-1, 287CUG↓GUG292; HIV-2, 468AAG↓GUA473), and these sequence motifs are both present in one of the small hairpin motifs upstream of the gag gene. Mutation of the major SD in the HIV-1 virus slowed the kinetics of RNA and protein synthesis and the kinetics of virus spread (135). No complete loss of virus infectivity was observed because a cryptic SD site, four nucleotides further downstream, was activated in this mutant (291UGA↓GUA296). The sequence of this cryptic SD site is strongly conserved among all HIV-1 isolates, suggesting some kind of selective pressure on this sequence motif, perhaps as part of the ψ packaging signal. Induction of a nearby cryptic splice site
suggests that certain features of this leader region (sequences and/or structures) direct the splicing machinery to this part of the HIV genome. An interesting splicing pattern was described for the leader transcript of the HIV-2 virus group (136). In addition to the major SD, a minor SD inside the TAR sequences (60CAG↓GUA65) was used in combination with a splice acceptor (SA) site in the 5' stem region of the PBS structure (200UAG↓UCG205). Use of this splice generates a unique transcript that lacks part of the TAR structure and the complete poly(A) hairpin. The biological significance of this alternative transcript is currently unclear. However, we would predict that such transcripts would remain Tat-inducible because the transcriptional function of TAR is completed prior to the splicing event (15, 137). Although it is clear that TAR RNA functions primarily in transcriptional activation from the LTR promoter, there is compelling evidence to suggest that this RNA motif has separate roles in translational regulation via cis- and trans-acting mechanisms.
1. The TAR RNA structure blocks movement of the scanning ribosome, leading to cis-inactivation of HIV-1 mRNAs (14, 138). Mutations that disrupt predicted secondary structure within the TAR hairpin relieve the inhibition and increase accessibility of the 5'-cap structure of the mRNA to translation initiation factors. Other leader regions may also influence translation; the RNA secondary structure model predicts the gag AUG codon to be occluded in a local hairpin structure that may reduce the efficiency of translation initiation. Dimerization of retroviral RNAs also blocks translation in cell-free assay systems (100). Specific leader mutants should be tested in translation assays to further define such effects.

2. Recent biochemical data indicate that the human autoantigen La is involved in regulation of HIV-1 translation through binding of the TAR RNA (139, 140). La is an RNA-binding protein that elicits an autoimmune response in patients with systemic lupus erythematosus and Sjogren's syndrome. La binds to the 5' leader of poliovirus as well, and in vitro translation studies implicate this protein in poliovirus internal translation initiation (141). These results, combined with the observation that scanning ribosomes are inhibited by structure in the HIV-1 leader (see above), may indicate that HIV is using an internal ribosomal entry site (IRES), as originally proposed for poliovirus (142). Evidence for an internal ribosome entry mechanism was recently reported for another retroviral species, the murine leukemia virus (143). However, there is no direct evidence to support a nonscanning mechanism of translation for the HIV-SIV viruses. In fact, inspection of all HIV-SIV leader sequences indicates that upstream AUG triplets, which can potentially usurp
the scanning ribosomes, are excluded from the leader region. This bias against upstream initiation codons is the expected condition for a scanning mechanism of translation. Furthermore, we have found that HIV-1 replication is severely inhibited by the introduction of an upstream AUG-initiation codon (unpublished data).
3. Viral RNA can also regulate translation in trans. Several viruses affect the activity of the interferon-induced RNA-dependent protein kinase (PKR), which catalyzes the phosphorylation of protein synthesis initiation factor eIF-2 (144). For HIV-1, the TAR RNA hairpin activates two interferon-induced enzymes, PKR and (2-5)A-synthetase, of which the latter is able to activate a cellular RNase (138, 145). Like other viruses such as adenovirus, influenza virus, and vaccinia virus, HIV-1 has acquired a mechanism for evading the antiviral activity of PKR and (2-5)A-synthetase. The HIV-1 Tat protein inhibits TAR-mediated activation of PKR and (2-5)A-synthetase (145, 146), suggesting an escape mechanism for the virus. There has been some doubt that TAR could activate PKR because maximal PKR activation requires a stem region of about 65 to 85 base pairs (147, 148). TAR has only 23 base pairs, but we mentioned in Section I the possibility of coaxial stacking with the neighboring poly(A) stem, which would result in a duplex structure of 40 base pairs. However, most studies used only partial HIV-1 leader transcripts with sequences up to position +82, where a convenient HindIII restriction site is located, and such RNA fragments contain only the TAR hairpin structure. Because the HIV leader transcript contains multiple hairpin structures of considerable stability (Figs. 1 and 2), the cis- and trans-inhibitory effects on translation should be tested with full-length leader RNAs.
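The argument in point 2, that a scanning mechanism predicts a leader free of upstream AUG triplets, can be checked on any leader sequence with a few lines of code. The following is a minimal sketch, not the procedure used for the analysis described here; the function name `upstream_augs` and the toy sequences are hypothetical, and the position of the authentic start codon is assumed to be known.

```python
def upstream_augs(mrna, start_index):
    """List AUG triplets located 5' of the authentic start codon.

    mrna        : RNA (or DNA) sequence, 5' to 3'; T is read as U.
    start_index : 0-based position of the A of the authentic AUG.
    Returns (position, frame_offset) pairs, where frame_offset is the
    distance to the authentic AUG modulo 3 (0 means same reading frame).
    """
    seq = mrna.upper().replace("T", "U")
    if seq[start_index:start_index + 3] != "AUG":
        raise ValueError("no AUG at the given start position")
    hits = []
    for i in range(start_index):
        if seq[i:i + 3] == "AUG":
            hits.append((i, (start_index - i) % 3))
    return hits


# Toy leader with one out-of-frame upstream AUG at position 3.
print(upstream_augs("GGCAUGCCGGAUGAAA", 10))
# Toy leader without upstream AUG triplets.
print(upstream_augs("GGCCCGGAUGAAA", 7))
```

An empty list for every leader examined is what a scanning model predicts; a nonempty list would flag triplets that could capture scanning ribosomes before they reach the authentic start codon.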
VIII. Base Composition of HIV-SIV Leader RNAs

We analyzed the nucleotide bias and the compositional tendencies of short oligonucleotides in the leader in order to highlight possible sequence motifs that may play a role in any of the biological functions of this RNA molecule. Lentiviral genomes, including HIV-1 RNA, have an unusual base composition (149-153). In particular, the HIV-1 genome is A-rich (35.6%) and C-poor (17.9%). This points to one or more mutational and selective constraints not yet identified. The biased base composition is present in all open reading frames, and dictates the typical codon usage of these viruses (154). Interestingly, when we compared the amino-acid composition of several HIV-1 and HIV-2 proteins to homologous functions of the human T cell leukemia viruses HTLV-I and HTLV-II (which do not have an A-rich genome), we found significant differences in total amino-acid content that correlate with the preferential use of amino-acid residues encoded by A-rich codons in
HIV (155). Furthermore, direct alignment of protein domains indicated that many conservative and some nonconservative amino-acid changes can be explained by strong “A-pressure” in HIV. These examples underscore the magnitude of “A-pressure” in the HIV-SIV RNA genomes.

We analyzed 17 complete HIV-SIV viral genomes and compared the base composition with that of the corresponding leader regions (7003 nucleotides in total). Surprisingly, we found that the nucleotide composition of the leader is more balanced (Table I), without a preference for the A nucleotide (24.4%) or bias against the C nucleotide (23.8%). Next, the complete genomes and leader regions were evaluated for extremes of dinucleotide relative abundances. A common assessment of dinucleotide bias in a sequence is via the odds-ratio measure, $\rho_{XY} = f_{XY}/(f_X f_Y)$, where $f_X$ and $f_Y$ denote the frequency of mononucleotide X and Y, respectively, and $f_{XY}$ the frequency of dinucleotide X-Y in the sequence (156). As a conservative criterion, for $\rho_{XY} > 1.25$ (or $< 0.78$), the X-Y pair is regarded to be of high (or low) relative abundance compared with a random association of mononucleotides (156). Table II lists the abundance calculated for the complete genomes and the values obtained for the leader regions. Most strikingly, we observed a strong discrimination against C-G in HIV-SIV genomes ($\rho$ = 0.27), although the discrimination is somewhat relieved in the leader region ($\rho$ = 0.59). Furthermore, there is a leader-specific trend to cluster purines, but in an alternating manner (e.g., AGAG). Similarly, pyrimidine clusters are favored (e.g., CUCU). This holds in particular for the sequence A-G ($\rho$ = 1.47 for the leader vs. 1.19 for the genome in total) and C-U (1.76 vs. 1.20), but also to some extent for G-A (1.16 vs. 0.96) and U-C (1.06 vs. 0.90).
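The odds-ratio measure defined above is straightforward to compute for a single sequence. The sketch below is an illustration under stated assumptions rather than the procedure used to produce the tables: it treats the RNA as a single strand (no averaging over the complementary strand), counts overlapping dinucleotides, and uses hypothetical function names. The thresholds 1.25 and 0.78 are those quoted in the text.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGU"


def dinucleotide_odds_ratios(rna):
    """Return {XY: f_XY / (f_X * f_Y)} for all 16 dinucleotides."""
    seq = rna.upper().replace("T", "U")
    n = len(seq)
    mono = Counter(seq)                                # counts of X
    di = Counter(seq[i:i + 2] for i in range(n - 1))   # counts of XY
    ratios = {}
    for x, y in product(BASES, repeat=2):
        fx, fy = mono[x] / n, mono[y] / n
        fxy = di[x + y] / (n - 1)
        # The ratio is undefined when one of the bases is absent.
        ratios[x + y] = fxy / (fx * fy) if fx and fy else math.nan
    return ratios


def classify(rho, high=1.25, low=0.78):
    """Apply the conservative high/low criterion quoted in the text."""
    if math.isnan(rho):
        return "undefined"
    if rho > high:
        return "high"
    if rho < low:
        return "low"
    return "normal"


ratios = dinucleotide_odds_ratios("AGAGAGAG")
print(round(ratios["AG"], 2), classify(ratios["AG"]))  # 2.29 high
```

For the toy sequence AGAGAGAG, A and G each have frequency 0.5 and A-G accounts for 4 of the 7 dinucleotides, so the ratio is (4/7)/0.25, about 2.29. Real leader sequences are short, so individual ratios carry considerable sampling noise.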
TABLE I
BASE COMPOSITIONS OF COMPLETE HIV-SIV GENOMES AND LEADER REGIONS^a

                    Average base composition (%)
Region              A       G       C       U
Complete genomes    35.0    24.3    18.6    22.0
Leader region       24.4    31.0    23.8    20.8

^a Viral strains analyzed (1) belong to the HIV-1 group (isolates ANT-70, MVP5180, ELI, Z2Z6, LAI, MAL, U455, SIVcpz), the HIV-2 group (isolates NIHZ, ROD, SIVsmmpbja), the SIVagm group (isolates 155, 3, XX, AA), SIVmnd, and SIVsyk (comgnm).
TABLE II
RELATIVE ABUNDANCES OF DINUCLEOTIDES IN COMPLETE HIV-SIV GENOMES AND LEADER REGIONS

HIV-SIV genomes
5'\3'    A       G       C       U
A        0.95    1.19    0.89    0.96
G        0.96    1.20    1.04    0.83
C        1.26    0.27    1.26    1.20
U        0.90    1.13    0.90    1.13

HIV-SIV leaders
5'\3'    A       G       C       U
A        1.14    1.47    0.86    0.38
G        1.16    0.92    1.03    0.88
C        0.81    0.59    1.02    1.76
U        0.88    1.05    1.06    0.98
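As a quick consistency check, the criterion of high (> 1.25) or low (< 0.78) relative abundance can be applied to the tabulated values. The sketch below assumes that rows give the 5' nucleotide and columns the 3' nucleotide, in the order A, G, C, U, which is the reading that reproduces the values quoted in the text; the variable and function names are illustrative only.

```python
HIGH, LOW = 1.25, 0.78
BASES = "AGCU"  # row (5') and column (3') order assumed for Table II

GENOMES = [
    [0.95, 1.19, 0.89, 0.96],
    [0.96, 1.20, 1.04, 0.83],
    [1.26, 0.27, 1.26, 1.20],
    [0.90, 1.13, 0.90, 1.13],
]
LEADERS = [
    [1.14, 1.47, 0.86, 0.38],
    [1.16, 0.92, 1.03, 0.88],
    [0.81, 0.59, 1.02, 1.76],
    [0.88, 1.05, 1.06, 0.98],
]


def extremes(table):
    """Return (high, low) lists of dinucleotides outside the thresholds."""
    high, low = [], []
    for i, five in enumerate(BASES):
        for j, three in enumerate(BASES):
            rho = table[i][j]
            if rho > HIGH:
                high.append(f"{five}-{three}")
            elif rho < LOW:
                low.append(f"{five}-{three}")
    return high, low


print("genomes:", extremes(GENOMES))
print("leaders:", extremes(LEADERS))
```

Under this reading the only low-abundance dinucleotides are C-G (genomes and leaders) and A-U (leaders only). The high-abundance entries are C-A and C-C in the genomes (both 1.26, marginally above the threshold) and A-G and C-U in the leaders.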
The leader regions restrict the formation of homopolymeric stretches (G-G, 0.92 vs. 1.20; C-C, 1.02 vs. 1.26; U-U, 0.98 vs. 1.13). In this context, the 277GGGG280, 301AAAAA305, and ,,UUUU3, stretches present in the HIV-1 ψ-region are rather unusual and may constitute a signal for RNA dimerization or packaging (see Sections V and VI). Another remarkable difference between the leader region and the complete genome is the occurrence of the A-U dinucleotide. Whereas unbiased levels are present in the complete genomes ($\rho$ = 0.96), the leader is decisively A-U suppressed ($\rho$ = 0.38). Apart from the C-G and A-U deficiencies, there are no other pervasive significant dinucleotide extremes in the HIV-SIV genomes and the leader region in particular. We will discuss A-U and C-G motifs in more detail below.

Scarcity of the A-U dinucleotide may reflect a general requirement for protection of the viral RNA against ribonucleases, as suggested for cellular transcripts (157, reviewed in 158). This suggestion was based on the discovery of a stereotypic, repeating UUAUUUAU sequence in the 3'-untranslated region of certain genes (159) and the observation that this motif destabilizes mRNA molecules that contain it (160). However, there is some experimental evidence that U-A and not A-U is the RNA dinucleotide most susceptible to RNase activity (157). Furthermore, protection against ribonucleases does not easily explain the selective rejection of A-U in the leader region only. From another perspective, in view of the prominent role of the “AAUAAA box” in mediating transcription termination (see Section
III), occurrences of A-U might be minimized to avoid inappropriate binding of polyadenylylation factors. At the DNA level, the presence of the A-T-containing motif that regulates transcription initiation from the upstream LTR promoter (the “TATAA box”) may further limit the usage of the A-T sequence in the flanking leader region.

The C-G level of the lentiviral genome, including that of HIV-1, has been reported to be extremely low (152, 161, 162). In contrast, there is no evidence for selection against C-G in oncoretroviruses such as HTLV-I (162), suggesting that low C-G levels are of biological importance specifically for the HIV-like viruses. In vertebrate genomes, methylation of cytosines occurs in C-G dinucleotides, and this modification often correlates negatively with gene expression (reviewed in 163). HIV-1 LTR-directed gene expression is also susceptible to transcriptional inactivation by methylation (164). Of the 81 C-G dinucleotides present in the genome of the prototype HIV-1 LAI isolate, 21 are located in the 113-469 region, which forms the 3' part of the leader and the 5' end of the gag open reading frame. The reason for this C-G clustering is currently unclear.

The dinucleotide analysis demonstrated a preference for R-R and Y-Y motifs without iterations of the same nucleotide (A-G, G-A, U-C, C-U). This trend results in the frequent occurrence of the typical sequence motifs $R_{n\geq 2}Y_{n\geq 2}$ and $Y_{n\geq 2}R_{n\geq 2}$. In the prototype HIV-1 LAI strain, we found 26 $R_{n\geq 2}Y_{n\geq 2}$ and 29 $Y_{n\geq 2}R_{n\geq 2}$ motifs, compared to only 8 motifs of the alternating type (R-Y)$_n$ and (Y-R)$_n$. The latter motifs are restricted not only in number but also in nucleotide composition; they frequently use G but not A, which is combined either with U to form GUGU motifs or with C to form GCGC sequences. All three G-U repeats are located in the U5 region downstream of the HIV poly(A) site and function as enhancers of polyadenylylation (see Section III).
The C-G repeats account for 6 of the 21 C-G motifs present in the leader region. The abundance of the R-Y-clustered motifs may suggest that they provide a signal as sequence or structure element. The average length of the clustered motifs is R,,7Y3.5 and Y3.4R3.4 for the prototype HIV-1 LAI leader RNA, with the sequences at the R-Y borders conforming to the dinucleotide bias described in Table II. Two extended Y-R motifs in HIV-1 are repeated in the leader. An 8-nucleotide motif is present twice in the TAR domain (3uCUCUCU-GG,,/,,CUCUCU-GG,,) and a 10-nucleotide segment from TAR is copied in the PBS region (3,gGGAG-CUCUCU,1/zz,agaGGAG-CUCUCUc233). Although there is no evidence that any of these motifs is involved in one of the molecular functions of the leader RNA during viral replication, we include this analysis to highlight the idea that the leader RNA sequences are distributed in a nonrandom manner. Perhaps a given nucleotide content or pattern of distribution can cause a specific RNA structure, or certain Y-R
arrangements may be part of a motif that is recognized by leader RNA-binding proteins.
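The comparison of clustered purine/pyrimidine motifs with alternating ones can be illustrated with regular expressions. This is a sketch under explicit assumptions, not the counting procedure behind the numbers reported in this section: a clustered motif is taken to be a run of at least two purines (R = A or G) followed by at least two pyrimidines (Y = C or U), or the reverse; an alternating motif is taken to be at least two consecutive R-Y (or Y-R) repeats; matches are counted without overlap within each class. The names are hypothetical.

```python
import re

CLUSTERED_RY = re.compile(r"[AG]{2,}[CU]{2,}")     # e.g. GGAGCUCUCU
CLUSTERED_YR = re.compile(r"[CU]{2,}[AG]{2,}")     # e.g. CUCUCUGG
ALTERNATING_RY = re.compile(r"(?:[AG][CU]){2,}")   # e.g. GUGU, GCGC
ALTERNATING_YR = re.compile(r"(?:[CU][AG]){2,}")   # e.g. UGUG, CGCG


def count_motifs(rna):
    """Count clustered and alternating R/Y motifs in one sequence."""
    seq = rna.upper().replace("T", "U")
    return {
        "RnYn": len(CLUSTERED_RY.findall(seq)),
        "YnRn": len(CLUSTERED_YR.findall(seq)),
        "(RY)n": len(ALTERNATING_RY.findall(seq)),
        "(YR)n": len(ALTERNATING_YR.findall(seq)),
    }


# The two repeated segments quoted in the text, without position numbers.
print(count_motifs("GGAGCUCUCU"))
print(count_motifs("CUCUCUGG"))
```

Note that one alternating stretch such as GUGUGCGC is matched by both the (RY)n and the (YR)n pattern, so the two alternating counts should not simply be added without first settling a counting convention.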
IX. Concluding Remarks

Understanding the three-dimensional structure formed by the HIV leader RNA molecule, both free and in the ribonucleoprotein complex of the virion, is crucial for analyzing its specific recognition by proteins and its interaction with other RNA molecules during virus replication. The number of HIV-SIV sequences now known is sufficiently large that comparative analysis can be used effectively to deduce some of the basic design principles underlying HIV RNA structure. Further, experimental studies on viral RNA seem to be about to enter a new phase with the possibility of performing in vitro selection experiments to analyze details of RNA-protein interactions (165) and in vivo “forced evolution” systems that describe repair pathways for viruses mutated in specific RNA sequences/structures (32, 166).

By understanding the molecular basis of the interactions that govern critical steps in the retroviral replication cycle, it may be possible to develop methods to intervene therapeutically in the process. For instance, gene therapy has been proposed for treatment of AIDS (167), for which there are currently no effective chemotherapeutic or vaccine therapies, and several molecular strategies have been designed and shown to inhibit replication of the HIV-1 retrovirus in tissue-culture systems. These anti-HIV approaches include RNA molecules such as antisense transcripts, ribozymes, and sense/decoy motifs that mimic important HIV-1 RNA structures (168-180). For instance, a TAR RNA decoy transcribed from a retroviral vector is currently being tested in clinical trials (180a). Combination of different leader RNA motifs can be used to increase further the efficiency or specificity of antiviral RNA molecules. Addition of the retroviral packaging signal ψ may result in colocalization of the inhibitor transcript and the target HIV-1 genomic RNA within viral particles (181).
Alternatively, it may be possible to inhibit HIV expression specifically in trans by leader-encoded functions like the dimerization signal (182).

ACKNOWLEDGMENTS

I thank Atze Das for critical reading of the manuscript and members of my laboratory for suggestions and helpful discussions. I am grateful to Jan van der Noordaa for support and encouragement. I thank Wim van Est for excellent artwork. The research of my group has been supported by grants from the Netherlands Organization for Scientific Research (NWO), the Dutch Cancer Society (KWF), and the Dutch AIDS Foundation.
REFERENCES

1. G. Myers, S. Wain-Hobson, G. N. Pavlakis, B. Korber and R. F. Smith, eds., in “Human Retroviruses and AIDS: A Compilation and Analysis of Nucleic Acid and Amino Acid Sequences.” Los Alamos National Laboratory, Los Alamos, NM, 1994.
2. P. M. Sharp, D. L. Robertson, F. Gao and B. H. Hahn, AIDS 8, S27 (1994).
3. M. A. Muesing, D. H. Smith and D. J. Capon, Cell 48, 691 (1987).
4. J.-I. Sakuragi, M. Fukasawa, R. Shibata, H. Sakai, M. Kawamura, H. Akari, T. Kiyomasu, A. Ishimoto, M. Hayami and A. Adachi, Virology 185, 455 (1991).
5. B. Berkhout, NARes 20, 27 (1992).
6. G. P. Harrison and A. M. L. Lever, J. Virol. 66, 4144 (1992).
7. F. Baudin, R. Marquet, C. Isel, J.-L. Darlix, B. Ehresmann and C. Ehresmann, JMB 229, 382 (1993).
8. B. Berkhout and I. Schoneveld, NARes 21, 1171 (1993).
9. T. Hayashi, Y. Ueno and T. Okamoto, FEBS Lett. 327, 213 (1993).
10. K. Sakaguchi, N. Zambrano, E. T. Baldwin, B. A. Shapiro, J. W. Erickson, J. G. Omichinski, G. M. Clore, A. M. Gronenborn and E. Appella, PNAS 90, 5219 (1993).
11. B. Berkhout, B. Klaver and A. T. Das, Virology 207, 276 (1995).
11a. J. Clever, C. Sassetti and T. G. Parslow, J. Virol. 69, 2101 (1995).
12. R. R. Gutell, N. Larsen and C. R. Woese, Microbiol. Rev. 58, 10 (1994).
13. M. Kozak, Cell 34, 971 (1983).
14. N. T. Parkin, E. A. Cohen, A. Darveau, C. Rosen, W. Haseltine and N. Sonenberg, EMBO J. 7, 2831 (1988).
15. B. Berkhout, R. Silverman and K.-T. Jeang, Cell 59, 273 (1989).
16. V. K. Pathak and H. M. Temin, J. Virol. 66, 3093 (1992).
17. B. Klaver and B. Berkhout, NARes 22, 137 (1994).
18. M. L. Hammarskjold, D. Rekosh, B. Berkhout, Y.-N. Chang and K.-T. Jeang, AIDS 5, S3 (1992).
19. J. A. Garcia and R. B. Gaynor, AIDS 8, S3 (1994).
20. K. A. Jones and B. M. Peterlin, ARB 63, 717 (1994).
21. C. Dingwall, I. Ernberg, M. J. Gait, S. M. Green, S. Heaphy, J. Karn, A. D. Lowe, M. Singh, M. A. Skinner and R. Valerio, PNAS 86, 6925 (1989).
22. R. A. Marciniak, M. A. Garcia-Blanco and P. A. Sharp, PNAS 87, 3624 (1990).
23. A. Gatignol, A. Buckler-White, B. Berkhout and K.-T. Jeang, Science 251, 1597 (1991).
24. C. T. Sheline, L. H. Milocco and K. A. Jones, Genes Dev. 5, 2508 (1991).
25. F. Wu, J. Garcia, D. Sigman and R. Gaynor, Genes Dev. 5, 2128 (1991).
26. M. P. Rounseville and A. Kumar, J. Virol. 66, 1688 (1992).
27. T. R. Reddy, M. Suhasini, J. Rappaport, D. J. Looney, G. Kraus and F. Wong-Staal, AIDS Res. Hum. Retroviruses 11, 663 (1995).
28. K. A. Jones, P. A. Luciw and N. Duchange, Genes Dev. 2, 1101 (1988).
29. F. K. Wu, J. A. Garcia, D. Harrich and R. B. Gaynor, EMBO J. 7, 2117 (1988).
30. J. A. Garcia, D. Harrich, E. Soultanakis, F. Wu, R. Mitsuyasu and R. B. Gaynor, EMBO J. 8, 765 (1989).
31. H. Kato, M. Horikoshi and R. G. Roeder, Science 251, 1476 (1992).
31a. B. Berkhout and B. Klaver, NARes 21, 5020 (1993).
32. B. Klaver and B. Berkhout, EMBO J. 13, 2650 (1994).
33. B. Berkhout and B. Klaver, J. Gen. Virol. 76, 845 (1995).
34. D. Harrich, G. Mavankal, A. Mette-Snider and R. B. Gaynor, J. Virol. 69, 4906 (1995).
35. B. Klaver and B. Berkhout, J. Virol. 68, 3830 (1994).
36. R. S. McLaren, S. F. Newbury, G. S. C. Dance, H. C. Causton and C. F. Higgins, JMB 221, 81 (1991).
37. B. Berkhout, J. L. B. van Wamel and B. Klaver, JMB 252, 59 (1995).
38. K.-Y. Chang and I. Tinoco, Jr., PNAS 91, 8705 (1994).
39. P. Wang, M.-C. Rouyez, S. Ducamp, S. Saragosti and M. Ventura, BBRC 195, 565 (1993).
40. E. Wahle and W. Keller, ARB 61, 419 (1992).
41. W. Keller, Cell 81, 829 (1995).
42. S. Bohnlein, J. Hauber and B. R. Cullen, J. Virol. 63, 421 (1989).
43. G. M. Gilmartin, E. S. Fleming, J. Oetjen and B. R. Graveley, Genes Dev. 9, 72 (1995).
44. J. M. Coffin, in “RNA Tumor Viruses” (R. Weiss et al., eds.), p. 261. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1984.
45. H. Varmus and R. Swanstrom, in “RNA Tumor Viruses” (R. Weiss et al., eds.), p. 369. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1984.
46. P. H. Brown, L. S. Tiley and B. R. Cullen, J. Virol. 65, 3340 (1991).
47. J. Cherrington and D. Ganem, EMBO J. 11, 1513 (1992).
48. J. D. Dezazzo, J. E. Kilpatrick and M. J. Imperiale, MCBiol 11, 1624 (1991).
49. A. Valsamakis, S. Zeichner, S. Carswell and J. C. Alwine, PNAS 88, 2108 (1991).
50. C. Weichs an der Glon, J. Monks and N. J. Proudfoot, Genes Dev. 5, 244 (1991).
51. G. M. Gilmartin, E. S. Fleming and J. Oetjen, EMBO J. 11, 4419 (1992).
52. A. Valsamakis, N. Schek and J. C. Alwine, MCBiol 12, 3699 (1992).
53. M. Seiki, S. Hattori, Y. Hirayama and M. Yoshida, PNAS 80, 3618 (1983).
54. Y. F. Ahmed, G. M. Gilmartin, S. M. Hanly, J. R. Nevins and W. C. Greene, Cell 64, 727 (1991).
55. P. H. Brown, L. S. Tiley and B. R. Cullen, Genes Dev. 5, 1277 (1991).
56. C. W. G. van Gelder, S. I. Gunderson, E. J. R. Jansen, W. C. Boelens, M. Polycarpou-Schwarz, I. W. Mattaj and W. J. van Venrooij, EMBO J. 12, 5191 (1993).
57. C. Weichs an der Glon, M. Ashe, J. Eggermont and N. J. Proudfoot, EMBO J. 12, 2119 (1993).
58. K.-T. Jeang, B. Berkhout and B. Dropulic, JBC 268, 24940 (1993).
59. P. Nahreini and M. B. Mathews, J. Virol. 69, 1296 (1995).
60. R. P. Woychik, R. H. Lyons, L. Post and F. M. Rottman, PNAS 81, 3944 (1984).
61. E. R. Gimmi, M. E. Reff and I. C. Deckmann, NARes 17, 6983 (1989).
62. A. Sittler, H. Gallinaro and M. Jacob, JMB 248, 525 (1995).
63. A. S. Williams and W. F. Marzluff, NARes 23, 654 (1995).
64. M. Jiang, J. Mak, M. A. Wainberg, M. A. Parniak, E. Cohen and L. Kleiman, BBRC 185, 1005 (1992).
65. M. Jiang, J. Mak, A. Ladha, E. Cohen, M. Klein, B. Rovinski and L. Kleiman, J. Virol. 67, 3246 (1993).
66. A. T. Das, S. E. C. Koken, B. B. Oude Essink, J. L. B. van Wamel and B. Berkhout, FEBS Lett. 341, 49 (1994).
67. C. Barat, V. Lullien, O. Schatz, G. Keith, M. T. Nugeyre, F. Gruninger-Leitch, F. Barre-Sinoussi, S. F. J. LeGrice and J.-L. Darlix, EMBO J. 8, 3279 (1989).
68. L. Sarih-Cottin, B. Bordier, K. Musier-Forsyth, M. Andreola, P. J. Barr and S. Litvak, JMB 226, 1 (1992).
69. S. Weiss, B. Konig, H. J. Muller, H. Seidel and R. S. Goody, Gene 111, 183 (1992).
70. R. W. Sobol, R. J. Suhadolnik, A. Kumar, B. J. Lee, D. L. Hatfield and S. H. Wilson, Bchem 30, 10623 (1991).
71. L. A. Kohlstaedt and T. A. Steitz, PNAS 89, 9652 (1992).
72. M. D. Delahunty, S. H. Wilson and R. L. Karpel, JMB 236, 469 (1994).
73. C. Barat, S. F. J. LeGrice and J.-L. Darlix, NARes 19, 751 (1991).
74. B. B. Oude Essink, A. T. Das and B. Berkhout, JBC 270, 23867 (1995).
75. Y. Mishima and J. A. Steitz, EMBO J. 14, 2679 (1995).
76. X. Li, J. Mak, E. J. Arts, Z. Gu, L. Kleiman, M. A. Wainberg and M. A. Parniak, J. Virol. 68, 6198 (1994).
77. J. Mak, M. Jiang, M. A. Wainberg, M. L. Hammarskjold, D. Rekosh and L. Kleiman, J. Virol. 68, 2065 (1994).
78. A. T. Das, B. Klaver and B. Berkhout, J. Virol. 69, 3090 (1995).
79. A. T. Das and B. Berkhout, NARes 23, 1319 (1995).
80. B. Gerwin and J. G. Levin, J. Virol. 24, 478 (1977).
81. J. G. Levin and J. G. Seidman, J. Virol. 29, 328 (1979).
82. J. Colicelli and S. P. Goff, J. Virol. 57, 37 (1986).
83. A. H. Lund, M. Duch, J. Lovmand, P. Jorgensen and F. S. Pedersen, J. Virol. 67, 7125 (1993).
84. A. Aiyar, D. Cobrinik, Z. Ge, H.-J. Kung and J. Leis, J. Virol. 66, 2464 (1992).
85. C. Isel, R. Marquet, G. Keith, C. Ehresmann and B. Ehresmann, JBC 268, 25269 (1993).
86. C. Isel, C. Ehresmann, G. Keith, B. Ehresmann and R. Marquet, JMB 247, 236 (1995).
87. D. Cobrinik, L. Soskey and J. Leis, J. Virol. 62, 3622 (1988).
87a. D. Cobrinik, A. Aiyar, Z. Ge, M. Katzman, H. Huang and J. Leis, J. Virol. 65, 3864 (1991).
87b. A. Aiyar, Z. Ge and J. Leis, J. Virol. 68, 611 (1994).
87c. E. J. Arts, X. Li, Z. Gu, L. Kleiman, M. A. Parniak and M. A. Wainberg, JBC 269, 14672 (1994).
88. A. M. Borman, C. Quillent, P. Charneau, C. Dauguet and F. Clavel, J. Virol. 69, 2058 (1995).
89. R. L. LaFemina, P. A. Callahan and M. G. Cordingley, J. Virol. 65, 5624 (1991).
90. C. Vink, D. C. van Gent, Y. Elgersma and R. H. A. Plasterk, J. Virol. 65, 4636 (1991).
91. A. D. Leavitt, R. B. Rose and H. E. Varmus, J. Virol. 66, 2359 (1992).
92. A. C. Prats, L. Sarih, C. Gabus, S. Litvak, G. Keith and J.-L. Darlix, EMBO J. 7, 1777 (1988).
92a. R. Khan and D. P. Giedroc, JBC 267, 6689 (1992).
93. Z. Tsuchihashi and P. O. Brown, J. Virol. 68, 5863 (1994).
94. U. von Schwedler, J. Song, C. Aiken and D. Trono, J. Virol. 67, 4945 (1993).
95. O. Schwartz, V. Marechal, O. Danos and J.-M. Heard, J. Virol. 69, 4053 (1995).
96. C. Aiken and D. Trono, J. Virol. 69, 5048 (1995).
97. D. H. Gabuzda, K. Lawrence, E. Langhoff, E. Terwilliger, T. Dorfman, W. A. Haseltine and J. Sodroski, J. Virol. 66, 6489 (1992).
98. E. Vicenzi, D. S. Dimitrov, A. Engelman, T.-S. Migone, D. F. J. Purcell, J. Leonard, G. Englund and M. A. Martin, J. Virol. 68, 7879 (1994).
99. W.-S. Hu and H. M. Temin, PNAS 87, 1556 (1990).
100. E. Bieth, C. Gabus and J.-L. Darlix, NARes 18, 119 (1990).
101. J. L. Darlix, C. Gabus, M.-T. Nugeyre, F. Clavel and F. Barre-Sinoussi, JMB 216, 689 (1990).
102. R. Marquet, F. Baudin, C. Gabus, J. L. Darlix, M. Mougel, C. Ehresmann and B. Ehresmann, NARes 18, 2349 (1991).
103. H. De Rocquigny, C. Gabus, A. Vincent, M.-C. Fournie-Zaluski, B. Roques and J.-L. Darlix, PNAS 89, 6472 (1992).
104. G. Awang and D. Sen, Bchem 32, 11453 (1993).
106. B. Berkhout, B. B. Oude Essink and I. Schoneveld, FASEB J. 7, 181 (1993).
107. W. Sundquist and S. Heaphy, PNAS 90, 3393 (1993).
108. M. Laughrea and L. Jetté, Bchem 33, 13464 (1994).
109. J.-C. Paillart, R. Marquet, E. Skripkin, B. Ehresmann and C. Ehresmann, JBC 269, 27486 (1994).
110. E. Skripkin, J.-C. Paillart, R. Marquet, B. Ehresmann and C. Ehresmann, PNAS 91, 4945 (1994).
111. D. Muriaux, P.-M. Girard, B. Bonnet-Mathonière and J. Paoletti, JBC 270, 8209 (1995).
112. J. R. Williamson, M. K. Raghuraman and T. R. Cech, Cell 59, 871 (1989).
113. Y. Eguchi, T. Itoh and J. I. Tomizawa, ARB 60, 631 (1991).
114. C. Persson, E. Gerhart, H. Wagner and N. Nordstrom, EMBO J. 9, 3767 (1990).
115. R. S. Gregorian, Jr. and D. M. Crothers, JMB 248, 968 (1995).
116. W. Fu, R. J. Gorelick and A. Rein, J. Virol. 68, 5013 (1994).
117. E. Hunter, Semin. Virol. 5, 71 (1994).
118. A. M. L. Lever, H. Gottlinger, W. Haseltine and J. Sodroski, J. Virol. 63, 4085 (1989).
119. A. Aldovini and R. A. Young, J. Virol. 64, 1920 (1990).
120. F. Clavel and J. M. Orenstein, J. Virol. 64, 5230 (1990).
121. T. Hayashi, T. Shioda, Y. Iwakura and H. Shibuta, Virology 188, 590 (1992).
122. H.-J. Kim and J. J. O'Rear, Virology 198, 336 (1994).
123. G. L. Buchschacher and A. T. Panganiban, J. Virol. 66, 2731 (1992).
124. J. Luban and S. P. Goff, J. Virol. 68, 3784 (1994).
125. C. Parolin, T. Dorfman, G. Palu, H. Gottlinger and J. Sodroski, J. Virol. 68, 3888 (1994).
126. J. H. Richardson, L. A. Child and A. M. L. Lever, J. Virol. 67, 3997 (1993).
127. R. D. Berkowitz, J. Luban and S. P. Goff, J. Virol. 67, 7190 (1993).
128. R. D. Berkowitz and S. P. Goff, Virology 202, 233 (1994).
129. K. Sakaguchi, N. Zambrano, E. T. Baldwin, B. A. Shapiro, J. W. Erickson, J. G. Omichinski, G. M. Clore, A. M. Gronenborn and E. Appella, PNAS 90, 5219 (1993).
130. J. Dannull, A. Surovoy, G. Jung and K. Moelling, EMBO J. 13, 1525 (1994).
131. D. A. Konings, M. A. Nash, J. V. Maizels and R. B. Arlinghaus, J. Virol. 66, 632 (1992).
132. G. P. Harrison, E. Hunter and A. M. L. Lever, J. Virol. 69, 2175 (1995).
133. H. A. Heus and A. Pardi, Science 253, 191 (1991).
134. G. Varani, C. Cheong and I. Tinoco, Jr., Bchem 30, 3280 (1991).
135. D. F. J. Purcell and M. A. Martin, J. Virol. 67, 6365 (1993).
136. G. A. Viglianti, P. L. Sharma and J. I. Mullins, J. Virol. 64, 4207 (1990).
137. B. Berkhout and K.-T. Jeang, in “Genetic Structure and Regulation of HIV” (W. A. Haseltine and F. Wong-Staal, eds.), p. 205. Raven Press, New York, 1991.
138. D. N. Sengupta, B. Berkhout, A. Gatignol, A. Zhou and R. H. Silverman, PNAS 87, 7492 (1990).
139. Y.-N. Chang, D. J. Kenan, J. D. Keene, A. Gatignol and K.-T. Jeang, J. Virol. 68, 7008 (1994).
140. Y. V. Svitkin, K. Meerovitch, H. S. Lee, J. N. Dholakia, D. J. Kenan, V. I. Agol and N. Sonenberg, J. Virol. 68, 1544 (1994).
141. Y. V. Svitkin, A. Pause and N. Sonenberg, J. Virol. 68, 7001 (1994).
142. J. Pelletier and N. Sonenberg, Nature 334, 320 (1988).
143. C. Berlioz and J.-L. Darlix, J. Virol. 69, 2214 (1995).
144. J. Galabru and A. G. Hovanessian, JBC 262, 15538 (1987).
145. H. C. Schroder, D. Ugarkovic, R. Wenger, P. Reuter, T. Okamoto and W. E. G. Muller, AIDS Res. Hum. Retroviruses 6, 659 (1990).
146. R. K. Maitra, N. A. J. McMillan, S. Desai, J. McSwiggen, A. G. Hovanessian, G. Sen, B. R. G. Williams and R. H. Silverman, Virology 204, 823 (1994).
147. M. A. Minks, D. K. West, S. Benvin and C. Baglioni, JBC 254, 10180 (1979).
148. L. Manche, S. R. Green, C. Schmedt and M. B. Mathews, MCBiol 12, 5238 (1992).
149. R. Grantham and P. Perrin, Nature 319, 727 (1986).
150. P. M. Sharp, Nature 324, 114 (1986).
151. J. Kypr and J. Mrázek, Nature 327, 20 (1987).
152. J. Kypr, J. Mrázek and J. Reich, BBA 1009, 280 (1989).
153. K.-C. Chou and C.-T. Zhang, AIDS Res. Hum. Retroviruses 8, 1967 (1992).
154. F. J. van Hemert and B. Berkhout, J. Mol. Evol. 41, 132 (1995).
155. B. Berkhout and F. J. van Hemert, NARes 22, 1705 (1994).
156. S. Karlin, W. Doerfler and L. R. Cardon, J. Virol. 68, 2889 (1994).
157. E. Beutler, T. Gelbart, J. Han, J. A. Koziol and B. Beutler, PNAS 86, 192 (1989).
158. C. J. Decker and R. Parker, Trends Biochem. Sci. 19, 336 (1994).
159. D. Caput, B. Beutler, K. Hartog, S. Brown-Shimer and A. Cerami, PNAS 83, 1670 (1986).
160. G. Shaw and R. Kamen, Cell 46, 659 (1986).
161. S. Ohno and T. Yomo, PNAS 87, 1218 (1990).
162. E. G. Shpaer and J. I. Mullins, NARes 18, 5793 (1990).
163. D. N. Cooper, Hum. Genet. 64, 315 (1983).
164. D. P. Bednarik, J. D. Mosca and N. B. K. Raj, J. Virol. 61, 1253 (1987).
165. C. Tuerk and L. Gold, Science 249, 505 (1990).
166. R. C. L. Olsthoorn, N. Licis and J. van Duin, EMBO J. 13, 2660 (1994).
167. D. Baltimore, Nature 335, 395 (1988).
168. G. J. Graham and J. J. Maio, PNAS 87, 5817 (1990).
169. A. Rhodes and W. James, J. Gen. Virol. 71, 1965 (1990).
170. N. Sarver, E. M. Cantin, P. S. Chang, J. A. Zaia, P. A. Ladne, D. A. Stephens and J. J. Rossi, Science 247, 1222 (1990).
171. G. Sczakiel, M. Pawlita and A. Kleinheinz, BBRC 169, 643 (1990).
172. B. A. Sullenger, H. F. Gallardo, G. E. Ungers and E. Gilboa, Cell 63, 601 (1990).
173. S. Joshi, A. van Brunschot, S. Asad, I. van der Elst, S. E. Read and A. Bernstein, J. Virol. 65, 5524 (1991).
174. K. Rittner and G. Sczakiel, NARes 19, 1421 (1991).
175. G. Sczakiel and M. Pawlita, J. Virol. 65, 468 (1991).
176. B. A. Sullenger, H. F. Gallardo, G. E. Ungers and E. Gilboa, J. Virol. 65, 6811 (1991).
177. M. Weerasinghe, S. E. Liem, S. Asad, S. E. Read and S. Joshi, J. Virol. 65, 5531 (1991).
178. B. Dropulic, N. H. Lin, M. A. Martin and K.-T. Jeang, J. Virol. 66, 1432 (1992).
179. F. Y. Tung and M. D. Daniel, Arch. Virol. 133, 407 (1993).
180. F. Lori, J. Lisziewicz, J. Smythe, A. Cara, T. A. Bunnag, D. Curiel and R. C. Gallo, Gene Ther. 1, 27 (1994).
180a. E. Gilboa and R. Smith, Trends Genet. 10, 139 (1994).
181. B. A. Sullenger and T. R. Cech, Science 262, 1566 (1993).
182. B. Berkhout and J. L. B. van Wamel, Antiviral Res. 26, 101 (1995).
High-Mobility-Group Chromosomal Proteins: Architectural Components That Facilitate Chromatin Function

MICHAEL BUSTIN* AND RAYMOND REEVES†
*Laboratory of Molecular Carcinogenesis, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892
†Department of Biochemistry and Biophysics, Department of Genetics and Cell Biology, Washington State University, Pullman, Washington 99164
I. The HMG-1/-2 and HMG-1 Box Proteins
   A. Structure of the Proteins
   B. Interactions with DNA and Chromatin
   C. Cellular Functions
II. The HMG-I(Y) Family
   A. Structure of the Proteins
   B. Interactions with DNA and Chromatin
   C. Cellular Functions
III. The HMG-14/-17 Family
   A. Structure of the Proteins
   B. Interaction with DNA and Chromatin
   C. Cellular Function and Mechanism of Action
IV. Summary and Perspective
References
Precise interactions between proteins and DNA in chromatin facilitate the orderly progression of complex processes such as transcription, replication, recombination, and repair. Most of the studies on the structure and function of chromatin have focused on interactions occurring between histones and DNA (1-4). It is now clear that the chromatin fiber serves not only to package the DNA into the nucleus but also to control the accessibility of specific sequences to regulatory factors and to potentiate interactions between distant regulatory elements (5, 6). Thus, most of the
cellular processes involving DNA have to be considered in the context of chromatin. From this point of view, nonhistone chromosomal proteins, which are either part of, or associated with, the chromatin fiber, provide an additional level of structural and functional complexity. The term “nonhistone” is applied to all the proteins that can be extracted from chromatin and are not histones. In the broadest sense this definition is problematic because it includes many molecular species and it is difficult to ascertain which of the proteins are bona fide chromosomal components and which are nucleoplasmic or cytoplasmic contaminants. Traditionally this term is applied to “structural proteins” and does not include such proteins as modifying enzymes or regulatory factors that affect transcription or replication.

The high-mobility-group (HMG) proteins constitute one of the largest and best characterized groups of nonhistone chromosomal proteins. Members of this protein group are found in all the cells of higher eukaryotes. HMG proteins are defined as nuclear proteins that can be extracted from nuclei or chromatin with 0.35 M NaCl, are soluble in 5% perchloric or trichloroacetic acid, have a high content of charged amino acids, and have a molecular mass lower than 30,000 Da (7-9). Currently, the HMG proteins are grouped into three families: the HMG-1/-2 family, the HMG-14/-17 family, and the HMG-I(Y) family. Although the structure of the proteins is well defined, their cellular function is not fully understood. Most of the data suggest that these proteins serve as “architectural” elements in chromatin. They are structural elements that bind to specific structures in DNA or in chromatin with little or no specificity for the target DNA sequence. They facilitate, rather than perform, a specific function in chromatin. For example, HMG-1 facilitates the binding of the progesterone receptor by inducing a structural change in the target DNA (10).
HMG-14/-17 proteins facilitate transcription from chromatin templates but are not part of the transcription complex (11). HMG-I(Y) proteins modify the structure of the DNA to facilitate protein:protein interactions in the preinitiation complex formed in A-T-rich promoter/enhancer sequences of several genes (12-14).
The purpose of this review is to summarize recent information on the function of the HMG proteins. Advances in this field were made primarily by elucidating the structure of the proteins and by understanding their mode of interactions with DNA and chromatin. Therefore, we concentrate mainly on these aspects of HMG proteins. Renewed interest in these proteins is due to the finding that the DNA-binding domains of many regulatory proteins share common elements with the HMG-1/-2 chromosomal protein family. Likewise, recent results with chromatin assembly systems provided evidence that HMG-14/-17 may indeed enhance the transcriptional potential of a chromatin template and that HMG-I(Y) proteins facilitate protein interactions in certain transcription preinitiation complexes.
The scope of this review is limited; for a comprehensive background on the isolation and chemistry of the proteins it is best to consult the book edited by Johns (8) as well as several other reviews (7, 9, 15-16a). Information pertaining to the expression of HMG proteins during the cell cycle and differentiation has been reviewed by Bustin et al. (15). Information pertaining to the HMG-1 domain proteins can be found in several reviews and articles (16-19). This review also presents information on the structure of the HMG-I(Y) gene and its alternative splicing. A full description of the genes coding for HMG-14/-17 proteins has already been presented elsewhere (7). For descriptions of the genes coding for the mammalian HMG-1/-2 proteins and their homologs in various species, it is necessary to consult original references (20-36). The review covers information available up to May, 1995. The limited scope of the review and the recent widespread activity in the field do not allow us to cite all the references in the HMG field. We do apologize to those whose work we may have inadvertently failed to mention.
I. The HMG-1/-2 and HMG-1 Box Proteins
Members of the HMG-1/-2 family are the largest (Mr 25,000) and most abundant (~1 molecule per 10-15 nucleosomes) of the “high-mobility-group” of DNA-binding proteins. Proteins in this family are highly conserved. For example, the human HMG-1 (215 amino acids) and HMG-2 (209 amino acids) proteins are coded for by separate genes (21, 22), but nevertheless share >82% amino-acid sequence identity. Related family members have also been identified in, and their cDNAs and genes cloned from, various other vertebrates (20, 23, 24, 36), insects (25-27), plants (28, 29), protozoans (30, 31), and yeast (32, 33).
Although many functions have been proposed for the HMG-1/-2 proteins, their actual biological roles remain elusive (reviewed in 7). Nevertheless, their relative abundance, conservation between species, and apparent lack of sequence specificity of DNA binding suggest that in vivo the HMG-1 and -2 proteins probably perform some general function(s) in the cell, for example, as structural components of chromatin and/or as ancillary transcription factors. Renewed interest in this group of nonhistone proteins also stems from the recent discovery of a large and highly diverse group of additional DNA-binding proteins, the so-called “HMG-1 box” family, related to the HMG-1 and -2 chromatin proteins by virtue of shared sequence homologies in their respective DNA-binding domains (reviewed in 16, 19, 37; see below).
A. Structure of the Proteins
The HMG-1/-2 proteins have a tripartite structure (7, 38) originally defined by limited proteolysis under high ionic strength “structuring conditions” (39, 40). The evolutionarily conserved N-terminal A domain and the central B domain, each of ~80-90 residues, are internal repeats of similar amino acid sequence (~43% homologous), are extremely basic (net charge ~+20), and constitute the nonspecific DNA-binding regions of the protein (41). The highly acidic C-terminal domain contains ~30 consecutive aspartate or glutamate residues and is involved in interactions with other proteins, particularly histones (40, 42-45), as well as functioning to regulate the DNA-binding affinity of the HMG-1/-2 proteins (46).
Regions of ~70-80 amino acid residues homologous to the A and B domains of HMG-1 [the so-called “HMG box” (47) or, more appropriately, the “HMG-1 box” motif (19)] have been observed in numerous other proteins, many of which are gene transcription factors (reviewed in 16, 19, 37). The HMG-1 box superfamily, with animal, plant, and yeast members, is of ancient evolutionary origin (dating back at least 10^9 years) (48) and contains both sequence-specific DNA-binding proteins and proteins that bind to DNA without sequence specificity. Analysis of the alignments of a large number of proteins has defined the following distinct amino-acid sequence motif as a “signature” for the HMG-1 box DNA-binding domain (19):
(G,S,A) (Y,F)* * (Y,F,W)*(G,S,A) * * (W,Y,F) .* * -..(K,R,Q) * - (Y,F,W) * * ....* (K,R,Q) * (Y,F,H)* ...* * (Y,F,W) 9
The most noticeable characteristic of this motif (the parentheses enclose equivalent residues) is the conservation of the position and spacing of the hydrophobic aromatic tyrosine (Y), tryptophan (W), and phenylalanine (F). The asterisk indicates that spacing is not fixed.
Phylogenetic analysis (48) distinguishes two subgroups of proteins containing the HMG-1 box motif. One subgroup, including the HMG-1/-2 proteins, as well as the nucleolar RNA polymerase I transcription factor known as UBF (47) and the mitochondrial transcription factors mtTF (49) and ABF2 (33), contains two or more HMG-1 boxes. The other subgroup, as exemplified by the mammalian testes-determining factor SRY (50, 51), the lymphoid enhancer binding factor LEF-1/TCF-1α (52, 53), the yeast nonhistone proteins NHP6A/B (54, 54a), a structure-specific recognition protein that binds cisplatin-modified DNA (55), a component of the V-(D)-J recombinase, T160 (56), as well as numerous other known or suspected transcription factors, many of which are involved in mating-type determinations and sexual development (16, 19, 37), typically contains a single box embedded in a
larger protein. Outside of the signature motifs, these different HMG-1 box-containing proteins usually have little or no sequence homology.
1. STRUCTURE OF THE HMG-1 Box
The tertiary structure of one HMG-1 box domain, the box B of mammalian HMG-1 proteins, has been determined independently by two different groups using 2D 1H NMR and 3D 15N-1H NMR solution spectroscopic techniques (57, 58). More recently, the solution NMR structures of the HMG-1 boxes from the Drosophila HMG-D protein (59) and the human testes-determining protein (hSRY-HMG) (60) have also been established. Although there are minor differences in detail, the structures of all of these HMG-1 boxes from both the mammalian and insect proteins are remarkably similar. Figure 1 shows a schematic representation of a coordinate-averaged three-dimensional structure of the rat HMG-1 box B (57). As illustrated, the HMG-1 box is composed of three α-helices and an extended N-terminal peptide segment that have an unusual twisted L (57), or V (58), shape consisting of two arms, one shorter (~31 Å) than the other (~36 Å), with an angle at the apex between the arms of ~70-80°. The shorter arm of this boomerang-shaped structure consists of helices I and II and the longer arm is composed of the extended N-terminal region packed against helix III. The relative
FIG. 1. Schematic representation of the three-dimensional structure of the B-domain box of rat HMG-1 as determined by solution NMR [redrawn with modifications from Weir et al. (57)]. The extended segment with its highly conserved amino-acid core sequence of P7-K8-R9-P10 is proposed to be the region of the box that binds to the minor groove of DNA (see Section I,A,1).
positions of the two arms are maintained by a cluster of conserved hydrophobic amino acid residues at the apex of the V. Thus, the apex of the fold contains the hydrophobic core around which the three helices are arranged. In addition, conserved hydrophobic residues stabilize the intersection of helices I and II. Helices II and III together with the extended N-terminal region lie approximately in a plane, forming a rather flat surface to one side of the domain, with helix I protruding from the opposite side. The first 12 residues of the HMG-1 box (employing the numbering system of 57) are in an extended configuration lying antiparallel to helix III, such that the N-terminus of the box and C-terminus of helix III lie close together and are stabilized by interactions of hydrophobic residues on the inner amphipathic face of helix III with three proline residues of the extended N-terminal region. This stable structural element, composed of the extended segment and part of helix III, has been called the “terminal unit” (58) and forms the long arm of the L-shaped box.
Outside of the extended N-terminal peptide region of the HMG-1 box, which has a remarkable sequence and structural similarity to the extended DNA-binding domain of the HMG-I(Y) proteins (61; see Section III,A), there is no discernible relationship between the HMG-1 box and other previously described DNA-binding structural folds, such as those found in the helix-turn-helix proteins. In toto, the highly conserved HMG-1 box appears to be a novel DNA-binding motif (57, 58). Its three-dimensional configuration (Fig. 1) provides an explanation for most of the sequence identities and homologies found conserved in various HMG-1 box proteins. For example, several of the highly conserved “signature” amino acids (19) are internal hydrophobic residues important for maintaining the integrity of the folds and arms of the box structure (57, 58, 62).
Furthermore, the conservation of basic and acidic residues in different HMG boxes (16, 19, 37) suggests that common surface features, such as asymmetric charge distributions, are functionally important. For example, most of the positively charged basic residues are on, or close to, the concave surface formed between the two arms of the box (Fig. 1), including both the extended N terminus and part of helix I, suggesting to early workers (57, 58) that this was the region involved in DNA binding. As shown in Fig. 2, this prediction has been confirmed (60) with the determination of the three-dimensional solution structure of the hSRY-HMG box DNA cocomplex (see Section I,B).
These and other observations have led to the notion that HMG-1 box structure is conserved to a greater extent than amino-acid sequence. The overall validity of this idea is also attested to by the findings from recent homology model-building experiments in which a large number of HMG-1 box sequences were “threaded” through the solution-NMR structure of the rat HMG-1 B box (62). These model-building studies indicated
that whereas the HMG-1 box does not have rigid sequence requirements for its formation, its overall tertiary domain structure is highly conserved and can be used as a basis for establishing phylogenetic relationships between HMG box protein family members in the absence of statistically significant sequence similarities (62).
2. HMG-1 Box BINDING AND SPECIFICITY
The selectivity and specificity of binding by different types of HMG-1 box proteins to linear B-form DNA vary considerably. Binding of the mammalian HMG-1 and -2 and the yeast NHP6A/B chromatin proteins, for example, seems for the most part to be indifferent to DNA sequence. On the other hand, the binding of other types of HMG-1 box proteins, such as the nucleolar transcription factor UBF and the mitochondrial transcription factors mtTF and ABF2, produces specific DNA footprints but the protected sites do not have a recognizable consensus sequence (49, 63-65). In contrast, the class of HMG-1 box-containing “specific transcription factors,” such as SRY, LEF-1/TCF-1α, and others, as well as the T160 V-(D)-J recombinase, produce specific footprints on DNA spanning sequences with a recognizable consensus (reviewed in 16, 19). In general, all of the specific binding sites for the HMG-1 box-containing transcription factors are A-T-rich and the same sequences are often recognized by several different proteins within a related group of factors.
Compared to classical transcription factors, the sequence specificity of the HMG-1 box-containing transcription factors is fairly low (66). However, the mere fact that they do possess the ability to recognize and bind to specific DNA sequences is remarkable for several reasons. First of all, methylation-interference, base-substitution, diethyl-pyrocarbonate protection, and hydroxyl-radical cleavage experiments indicate that all HMG-1 boxes interact with DNA primarily through contacts with the minor groove on one side of the duplex (67-69).
Except for the well-known case of the TBP protein binding to the TATA element (70, 71), such a mode of interaction is unusual for sequence-specific DNA binding proteins because the minor groove provides little opportunity for base-specific contacts and hydrogen bonding cannot distinguish T from C residues (72) or A·T from T·A base pairs (73). These physical limitations on specific protein/DNA interactions in the minor groove are thus probably responsible for the modest sequence selectivity of the HMG-1 box transcription factors. Nevertheless, as will be seen below, hydrogen bonding in the minor groove is well suited for structure-directed recognition because the phosphates on either side of the groove are often spaced at favorable distances for selective interactions.
A second reason that the sequence-recognizing ability of the HMG-1 box transcription factors comes somewhat as a surprise is that, as noted above,
the tertiary structures of the DNA-binding domains of all of the HMG-1 box proteins so far investigated are nearly identical (Figs. 1 and 2). Thus, the physical basis for this sequence selectivity must reside in subtleties of either the domain structure itself and/or differences in particular amino acid residues that interact with DNA. In this connection, the long arm (i.e., the terminal unit) of the HMG-1 box has been directly implicated in sequence-specific recognition. In a series of domain-swapping experiments, Crane-Robinson and colleagues (74) switched the long and short arms of the sequence-specific HMG box of TCF-1α into the equivalent positions in the non-sequence-specific B box of HMG-1, and demonstrated that only chimeric proteins that contained the long arm of the TCF-1α protein (i.e., the “extended” 12 amino-terminal residues and the last 25 C-terminal residues of helix III; Fig. 1) formed a sequence-specific complex with DNA. These experiments also clearly demonstrated the additional point that not all HMG-1 boxes are equivalent or interchangeable.
The results of these domain-swapping experiments are also entirely consistent with earlier reports showing that certain of the highly conserved amino acids in the first 12 residues of the extended N-terminal region of the box (numbering system of 57) are directly involved in DNA binding because mutations of these residues in the HMG-1 boxes of SRY (69, 75) and LEF-1 (68) significantly reduce, or abolish, binding without obviously interfering with the structural protein folding interactions of the box. In particular, the three mutations, V7L, R9G, and M11I, in SRY that result in sex reversal, and the double mutant K8E and K9E in LEF-1, all fail to bind DNA. Furthermore, a clear distinction has now emerged between sequence-specific and nonspecific HMG boxes in the extended N-terminal segment at positions 7 and 12 (74).
The residue at position 7 is proline in all non-sequence-specific boxes, whereas in sequence-specific boxes a hydrophobic residue (valine or isoleucine) is common. The hydrophobic residue at position 7 could be involved in sequence recognition whereas a conserved proline at this position would be expected to have relaxed sequence dependence (61, 74). All presently known sequence-specific HMG boxes also have an asparagine residue at position 12 (Asn-12) whereas a serine at this position is typical for non-sequence-specific boxes. Because the hydrogen-bonding potential of asparagine residues for base recognition is well established (76), substitution of this amino acid at position 12 could reasonably be expected to contribute to altered HMG box sequence specificities. As illustrated in Fig. 2, and consistent with these predictions, Clore and colleagues (60), in their determination of the structure of the cocomplex of hSRY-HMG with DNA, identified seven different amino-acid residues (among them Asn-12, or, in their designation, N10) distributed along the entire binding surface of the box
that make direct contacts to particular bases and hence would be expected to mediate sequence specificity.
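The distinction drawn above between sequence-specific and non-sequence-specific boxes at positions 7 and 12 can be written down as a small rule of thumb. The sketch below is illustrative only and is not a published algorithm: the function name `classify_hmg_box`, the "undetermined" fallback, and the toy 12-residue strings are assumptions introduced here, and positions are counted from 1 following the box numbering of reference 57.

```python
def classify_hmg_box(n_terminal_segment: str) -> str:
    """Apply the position-7 / position-12 heuristic to the start of an HMG-1 box.

    The argument is the one-letter sequence of the box, beginning at box
    residue 1, so that index 6 is position 7 and index 11 is position 12.
    """
    if len(n_terminal_segment) < 12:
        raise ValueError("need at least the first 12 residues of the box")

    pos7 = n_terminal_segment[6].upper()
    pos12 = n_terminal_segment[11].upper()

    # Hydrophobic Val/Ile at position 7 together with Asn at position 12:
    # the pattern described for sequence-specific boxes.
    if pos7 in {"V", "I"} and pos12 == "N":
        return "sequence-specific"

    # Pro at position 7 together with Ser at position 12:
    # the pattern described for non-sequence-specific boxes.
    if pos7 == "P" and pos12 == "S":
        return "non-sequence-specific"

    # Anything else is left unclassified rather than guessed.
    return "undetermined"


# Toy strings (X = any residue); these are NOT real protein sequences.
print(classify_hmg_box("XXXXXXPXXXXS"))
print(classify_hmg_box("XXXXXXVXXXXN"))
```

Because the text describes a hydrophobic residue at position 7 as common rather than invariant, and serine at position 12 as typical, such a rule should be treated as a first-pass filter and not as a substitute for binding experiments.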
B. Interactions with DNA and Chromatin
1. HMG Box PROTEINS RECOGNIZE BENT AND DISTORTED DNAs
The HMG-1 and -2 proteins have long been known to bind nonspecifically to both double- and single-stranded DNAs, with a marked preference for the latter. Additionally, they can unwind and introduce supercoils in plasmid DNAs, can preferentially bind to cruciform structures as well as to B-Z DNA junctions and also apparently possess the ability to distinguish between different conformations of single-stranded molecules (reviewed in 7). Recent results indicate that the HMG-1/-2 proteins (77, 78), the sequence-specific HMG box-containing SRY protein (66), and the HMG-1 boxes from a number of other proteins recognize the sharp angles present in synthetic four-way junction (4WJ) DNA molecules. In fact, it now appears likely that 4WJs are the universal target for all HMG box proteins (79-82).
The physical basis for this specific structural recognition remains a matter of speculation because neither the actual structure of 4WJs nor the mode of interaction of HMG-1 boxes with these structures is currently known. Nevertheless, models have been proposed (58, 83) suggesting that the terminal unit (i.e., the long arm) of the HMG box interacts with the minor groove in the two acute angles of such structures. Indirect support for this model comes from recent hydroxyl radical footprinting experiments that show that the bacterial HU protein preferentially and symmetrically binds to the two acute angles of 4WJ DNA (84). Because, in many respects, HU, a homolog of the bacterial IHF protein, is similar to HMG-1 in its ability to interact with and bend DNA and, in fact, can actually replace the HMG-1 protein in certain functional assays (79, 85), these footprinting results suggest that the HMG-1/-2 proteins may interact with 4WJ DNA in a similar fashion.
The inherent capacity of HMG-1/-2 proteins, and the isolated HMG boxes, to bind to already bent or distorted DNA (78, 86) is also attested to by their ability to bind to both the major 1,2-intrastrand d(GpG) and to the minor 1,3-intrastrand d(GpTpG), DNA adducts of the antitumor drug cisplatin (87-89). These adducts are known to bend DNA by ~32-34° (90). Analogous to the situation with 4WJ DNAs, isolated HMG box domains can also preferentially bind to cisplatin-modified DNAs, and DNase I footprinting indicates that both strands of the DNA around the adduct are bound by the box peptides (91). In normal cells, both the major and minor cisplatin DNA adducts are thought to be repaired in vivo by the human excision nuclease system (92). The biological significance of HMG-1/-2, or HMG box
protein, binding to cisplatin adducts is not known, but in vitro HMG-1 binding to such adducts inhibits repair of the major intrastrand cross-linked products by the human excision nuclease system, suggesting that the types and levels of HMG-domain proteins in a tumor may influence the responsiveness of that cancer to chemotherapy (92). Alternatively, cisplatin adducts may function by nonspecifically trapping or "hijacking" essential HMG box-containing regulatory proteins, such as the ribosomal gene transcription factor hUBF (93), thereby leading to cellular toxicity.
2. HMG Box PROTEINS BEND, LOOP, AND SUPERCOIL DNAs
In addition to recognizing bent DNA, both sequence-specific and non-sequence-specific HMG box proteins are capable of inducing bends in DNA (16, 60, 86). In the case of sequence-specific HMG box proteins this has generally been established from circular permutation assays, and bend angles of ~130° for LEF-1 (68, 94, 95) and ~85° for mouse and human SRY (66) have been reported. In the case of non-sequence-specific proteins such as the mammalian HMG-1/-2 proteins and the yeast NHP6A/B nonhistone protein (96, 97), DNA bending to varying degrees has usually been demonstrated both by permutation assays (98) and by ring closure, or circularization, assays (10, 96, 99), with the best bending results being obtained with reduced "native" proteins that have never been denatured or exposed to acids (10, 100, 101).
The most definitive information so far available on the molecular mechanisms involved in HMG-1 box-induced DNA bending comes, however, from solution NMR studies of a complex of the human SRY-HMG box with its specific recognition sequence (60, 102, 103). As shown by several views of the hSRY-HMG-DNA cocomplex illustrated in Fig. 2, on binding to the minor groove of its recognition sequence the hSRY-HMG box induces a large conformational change in the duplex DNA from a B type in the free state to a markedly bent and underwound form that follows the contours of the concave binding surface of the box perfectly. Hence, this protein-DNA interaction represents a classical example of induced fit. The DNA in the complex is bent by ~70-80° in the direction of the major groove, which is accomplished by induction of large positive local interbase pair roll angles for six of the seven base steps present in the octamer substrate.
In addition, the DNA is also severely underwound (with an average interbase pair helical twist of ~26°) and, as a result, the minor groove is shallow and significantly expanded, with a width of ~9.4 Å compared with ~4.0 Å in B-DNA. Concomitantly, the major groove is substantially compressed. As originally predicted (102), a principal factor in the bending of the DNA is the partial intercalation of an isoleucine residue (I13) between base pairs near the center of the DNA substrate. Widening of the
minor groove appears to be mediated by five amino acid residues that form a T-shaped wedge in direct contact with the central base pairs of the DNA octamer. The overall structure of the hSRY-HMG-DNA cocomplex with its widened minor groove and DNA bent toward the major groove is strongly reminiscent of the structure of another minor groove binding protein, TBP, the TATA box binding protein, in complex with its DNA substrate (70, 71). Although the molecular mechanisms involved in the formation of these two types of complexes are quite different, what is clear is that very different protein binding surfaces placed within a widened minor groove can bend and unwind DNA in a similar manner. In contrast to these examples, the means by which non-sequence-specific HMG-1 box proteins induce bends in DNA are unknown. Such bending may involve a combination of several of the above mechanisms and may also include others, such as asymmetric DNA charge neutralizations (104). In addition to their capacity to induce DNA bending, both HMG-1/-2 (7, 46, 105-107) and the non-sequence-specific HMG box proteins, such as the ribosomal gene transcription factor UBF (108-110), have the potential to induce (in the presence of topoisomerase I) supercoils in topologically closed domains of DNA. Furthermore, these proteins also can introduce loops in either linear DNAs or relaxed circular plasmids in the absence of other factors. The efficiency of HMG-1 protein-induced looping and supercoiling is modulated by its acidic C-terminal domain with a four- to fivefold reduction in both DNA binding affinity and supercoiling ability when the tail is present (46, 105). 
The ability of HMG-1 box proteins to bend and modulate the topological configuration of DNA substrates has led to the idea that the HMG box is an all-purpose “DNA bender/wrapper/looper” domain (81, 82) that in many ways acts like a eukaryotic equivalent of the bacterial IHF and HU proteins (which also have these capabilities) and has therefore been recruited by different proteins in order to facilitate a variety of DNA biological functions, including transcription, repair, and recombination (see Section I,C). In considering the probable biological validity of such a proposed in vivo function for the HMG-1 box, it should also be kept in mind that superimposed on these manipulative abilities for DNA substrates is an even more fundamental ability of HMG-1 boxes: namely, their generalized capacity to recognize and bind tightly to altered DNA conformations, such as intrinsically bent or underwound structures, stem-loops, 4WJs, and cisplatin adducts, regardless of their nucleotide sequences. Importantly, as noted above, in most instances the HMG-1 box proteins actually possess considerably greater in vivo binding affinities for such distorted DNA structures than they do for normal B-form DNA (66, 79, 83, 111). For instance, the sequence-specific SRY protein has about the same
FIG. 3. Diagram depicting the various functional capacities of an individual HMG-1 box with respect to DNA structure recognition and bending in vitro. As shown by the large arrows, HMG boxes have an inherent ability to nonspecifically, yet very tightly, bind to altered DNA structures such as those that are either intrinsically bent, underwound, or adducted by cisplatin (pathway 1) or structures formed by four-way junctions, cruciforms, or DNA cross-overs (pathway 2). Pathway 3 indicates that HMG boxes also have the ability to nonspecifically bind to B-form DNA and induce bends, but, as shown by the smaller arrow, such binding is of considerably lower affinity than that observed for binding to previously distorted structures. Pathway 4 indicates binding to B-form DNA of sequence-specific HMG box transcription factors with a subsequent introduction of a bend in the substrate. As in the case with the binding shown in pathway 3, the smaller arrow in pathway 4 indicates that the affinity of binding of sequence-specific HMG boxes to linear B-form DNA is often less than that observed for binding of the same box to DNA that is already intrinsically bent or distorted. The solid boxes in pathway 4 indicate the defined sequence binding sites on the DNA.
nonspecific binding affinity for 4WJ DNAs as does the HMG-1 protein (with Kd values between 10^-8 and 10^-9 M) and this affinity is even greater than the affinity of the SRY protein for its normal recognition sequence in B-form DNA (66). Thus, as depicted in Fig. 3, the HMG-1 box proteins in vivo are likely to bind selectively to previously bent or altered DNA structures in preference to B-forms of DNA and therefore, by inference, to favor selectively DNA structural recognition and/or stabilization over induction of DNA bending. As first suggested (66), such an in vivo situation for HMG-1
FIG. 4. Competition between sequence-specific and conformation-specific DNA binding by an HMG-1 box-containing transcription factor [redrawn with modifications from Landsman and Bustin (19) and based on an original model by Ferrari et al. (66)]. A sequence-specific HMG-1 protein can bind to linear B-form DNA containing its recognition sequence (filled box) and introduce a bend or conformational change in the target DNA. The same HMG box can also, and often with higher affinity, bind nonspecifically to previously bent or distorted DNA. Thus, when both types of DNA are present in a given reaction there is a competition, based on their relative binding affinities, between the specific sequence-containing DNA and the non-sequence-specific DNA for binding by the sequence-recognizing HMG box protein. Nonspecific HMG box proteins such as HMG-1 and HMG-2 also recognize and bind to bent DNA with high affinity. The cellular concentrations of the latter proteins are orders of magnitude higher than that of the sequence-specific HMG box proteins. Therefore, the nonspecific HMG box proteins will inhibit the binding of the sequence-specific proteins to bent DNA and thus facilitate preferential binding to their recognition sequences on linear B-form molecules.
box-containing transcription factors (for example, the male sex-determining protein SRY) could potentially have disastrous biological and/or developmental consequences because the effective cellular concentrations of essential sequence-specific proteins could be substantially reduced by their nonspecific trapping by bent or distorted DNA structures that transiently exist in cells for a variety of reasons. As illustrated in Fig. 4, it has been suggested (19, 66) that one, but
obviously not the only, possible function for the existence of nonspecific HMG box proteins in cells is to provide a biological solution to this differential substrate competition problem. Because the normal concentration of nonspecific DNA-binding HMG box proteins, such as HMG-1 and -2, in cells is orders of magnitude higher than that of sequence-specific HMG box transcription factors, these nonspecific proteins would be expected to saturate preferentially the multitude of nonspecific DNA-binding sites, thereby ensuring that the concentrations of these sequence-specific proteins remain high enough to successfully find their targets.
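The competition argument can be made concrete with a minimal equilibrium sketch. It assumes simple 1:1 competitive binding of a sequence-specific protein and a nonspecific protein to a single pool of distorted DNA sites, and solves the mass-balance equation for the free site concentration by bisection. The function name `fraction_specific_trapped` and every concentration below are hypothetical values chosen for illustration; only the Kd of about 10^-9 M is taken from the range quoted in the text.

```python
def fraction_specific_trapped(s_total, n_total, d_total, kd_s, kd_n, iterations=200):
    """Fraction of the sequence-specific protein bound to distorted DNA sites.

    s_total, n_total: total specific / nonspecific protein concentrations (M)
    d_total: total concentration of distorted DNA sites (M)
    kd_s, kd_n: dissociation constants of each protein for those sites (M)
    """
    def excess(d_free):
        # Sites accounted for (free + bound by each protein) minus total sites.
        bound_s = s_total * d_free / (kd_s + d_free)
        bound_n = n_total * d_free / (kd_n + d_free)
        return d_free + bound_s + bound_n - d_total

    # excess() increases monotonically with d_free, so bisection on
    # the interval [0, d_total] converges to the unique root.
    low, high = 0.0, d_total
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if excess(mid) > 0.0:
            high = mid
        else:
            low = mid
    d_free = 0.5 * (low + high)
    return d_free / (kd_s + d_free)


# Hypothetical numbers: equal affinities, nonspecific protein in 1000-fold excess.
KD = 1e-9          # M, within the range quoted for 4WJ binding
SPECIFIC = 1e-9    # M, assumed
NONSPECIFIC = 1e-6 # M, assumed
SITES = 1e-7       # M, assumed pool of distorted DNA sites

alone = fraction_specific_trapped(SPECIFIC, 0.0, SITES, KD, KD)
with_competitor = fraction_specific_trapped(SPECIFIC, NONSPECIFIC, SITES, KD, KD)
print(f"trapped without nonspecific protein: {alone:.2f}")
print(f"trapped with nonspecific protein:    {with_competitor:.2f}")
```

Under these assumed values nearly all of the sequence-specific protein is trapped when it is alone, and only a small fraction is trapped once the abundant nonspecific protein occupies most of the distorted sites. The model ignores cooperativity, multiple site classes, and binding to B-form DNA, so it illustrates the direction of the effect and not its magnitude in cells.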
3. INTERACTION OF HMG-1/-2 PROTEINS WITH CHROMATIN
Contradictory results have often clouded attempts to elucidate the role, if any, played by HMG-1/-2 proteins in the regulation of chromatin structure. For example, there are early conflicting reports (reviewed in 7) of both the mediation and the repression of in vitro nucleosome assembly by HMG-1/-2 proteins. However, early studies did clearly demonstrate that, in vivo, HMG-1/-2 proteins, like the lysine-rich histones H1 and H5, are bound preferentially to linker DNA between adjacent nucleosomes in the bulk of eukaryotic chromatin (reviewed in 1, 2). Nevertheless, mononucleosomes can be isolated from a subfraction of total chromatin that contains near stoichiometric amounts of HMG-1/-2 proteins, but lacks histone H1, suggesting that a major function of HMG-1 and -2 is to replace H1 in restricted linker regions so as to promote the accessibility of local chromatin domains (112, 113), presumably those involved in transcription.
Recent investigations reveal that whereas H1 is a repressor of transcription (reviewed in 114), the HMG-1/-2 proteins appear to be general chromatin factors that can either stimulate (10, 115-119) or reversibly repress (120, 121) polymerase II transcription, depending on the experimental conditions (see Section I,C). The molecular mechanisms by which these two classes of highly basic nuclear proteins either repress or activate transcription in vivo are not known. Nevertheless, recent findings provide some novel support for the long-held suspicion that HMG-1/-2 proteins compete with histone H1 for binding to localized regions of chromatin, thereby potentially affecting their functional activity. There is general agreement that the linker histones H1 and H5 interact with DNA at the cross-over point where it enters and exits the nucleosome (1, 122).
Furthermore, like histone H1, HMG-1 induces a chromatosome stop in reconstituted chromatin digested with micrococcal nuclease (36), suggesting that both classes of proteins bind to similar regions on the front face of nucleosome particles. Both H1 and H5 bind to cross-overs of double-stranded DNA (123), as well as to synthetic four-way junctions (124, 125) that
structurally mimic cross-overs (86, 126), in preference to regions of linear double-stranded DNA. Furthermore, the same workers demonstrated that HMG-1 can compete effectively with H1, but not histone H5, for binding to four-way junctions (4WJs) in vitro, suggesting that replacement of histone H1 by HMG-1 may play a part in the putative transcriptional activation of chromatin by HMG-1 (127). Although of considerable intrinsic interest, and of possible heuristic value, the biological relevance of these in vitro observations remains to be confirmed. There is still no direct evidence that either cross-over DNAs (86) or 4WJs (126) do, indeed, mimic the structure of DNA on the front face of nucleosomes and, so far at least, all of the evidence relating the preferential association of HMG-1/-2 proteins with transcriptionally active regions of chromatin in vivo is of a correlative nature (reviewed in 128).
C. Cellular Functions
Although the cellular function of several HMG-1 box-containing transcription factors has been firmly established, the in vivo roles played by the HMG-1/-2 proteins are less clear owing to often conflicting in vitro experimental results (reviewed in 7). Nevertheless, numerous lines of evidence suggest that the HMG-1/-2 proteins participate in the regulation of chromatin structure as well as being involved, either as positive or negative factors, with various aspects of DNA replication, transcription, and repair. As previously noted, perhaps the most widely accepted function for the HMG-1/-2 proteins is their ability to bind preferentially to, as well as induce, bent or distorted DNA structures and to facilitate the formation of supercoils and loops in topologically restricted DNA domains. This ability of HMG box proteins to recognize and modulate DNA structure, as well as participate in specific protein-protein interactions, has led to their designation as “architectural transcription factors” (reviewed in 16), implying that they are involved in the formation of stereospecific nucleoprotein complexes involved in gene transcriptional activation (16, 95, 129). This is not necessarily always the case, however, because these same capabilities can just as easily be employed to regulate other aspects of nuclear DNA structure and function (96, 97, 130). The uncertainties surrounding the biological role of the HMG-1/-2 proteins are well illustrated by the continuing controversy over the role played by these nonspecific DNA-binding proteins in regulating transcription. Early reports indicated that HMG-1/-2 proteins could significantly stimulate specific in vitro transcription from the adenovirus major-late promoter in HeLa cell lysates (116) and suggested that this effect is caused, in part, by an HMG-mediated increase in the rate of binding of a viral transcription factor (MLTF or USF) to a 5′ upstream promoter element (115, 117).
MICHAEL BUSTIN AND RAYMOND REEVES
More recently, HMG-1/-2 proteins have likewise been reported to stimulate the in vitro transcription of a number of other nonviral genes (131, 132), possibly by acting to stabilize an activated conformation of the transcription factor TFIID-TFIIA initiation complex (133) on the promoters of such genes. HMG-1 and -2 also appear significantly and specifically to stimulate the binding of other nonviral transcription factors to their cognate promoter/enhancer sequences (10, 118, 134). For example, in vitro HMG-1 stimulates by over 10-fold the sequence-specific binding of a complex of purified human progesterone/progesterone receptor proteins to oligonucleotides containing progesterone-response elements (PREs), most likely as a consequence of HMG-induced bending of the PRE-containing DNA substrates (10). In addition, HMG-2 specifically interacts in vitro with the POU domains of the octamer transcription factors Oct1 and Oct2, thereby increasing the sequence-specific DNA binding of these proteins (135). Perhaps more importantly, the results of cell transfection experiments involving an octamer-reporter gene construct cotransfected with either an antisense HMG-2 expression vector or a vector expressing a VP-16/HMG-2 chimeric protein also strongly suggest that the Oct and HMG-2 proteins physically interact with each other in vivo and thereby stimulate octamer-dependent gene transcriptional activity (135). In contrast to these findings, purified HMG-1/-2 proteins repress transcription in vitro by RNA polymerase II (Pol-II) as a consequence of specifically interacting with components of the basal transcription initiation complex at two different steps in its formation.
At the initial stages of initiation complex formation HMG-1 can interact with the TATA-binding protein (TBP) in the presence of a TATA-box-containing oligonucleotide to form a specific HMG-1·TBP·promoter complex (120). This quaternary complex prevents factor TFIIB from binding to TBP and, consequently, blocks both formation of the preinitiation complex and in vitro transcription from the substrate DNA. Furthermore, transcription factor TFIIA can, in a concentration-dependent manner, compete with HMG-1 for TBP binding and thus reverse the HMG-mediated in vitro repression of Pol-II basal transcription. In addition, purified HMG-2 proteins inhibit basal transcription by binding later in the assembly process, after the assembly of the TBP·TFII·promoter complex but before formation of the fourth phosphodiester bond by Pol-II (121). Interestingly, this basal repression of transcription by HMG-2 can be counteracted in an ATP-dependent process that is mediated by a TFIIH-associated factor, possibly a helicase. In vivo experiments have also resulted in apparently conflicting effects of the HMG-1/-2 proteins on transcription. For example, two types of cell transfection experiments indicate that HMG-1 proteins can stimulate transcription in vivo (119). In one type of experiment, HMG-1 protein introduced into COS-1 cells as a complex with an expression plasmid carrying the bacterial lacZ gene was found to enhance the level of reporter gene expression. In the second type of experiment, cells were cotransfected with an expression vector carrying the HMG-1 cDNA and the lacZ gene reporter plasmid and, again, the transcriptional activity from the reporter plasmid was enhanced. Significantly, in these cotransfection experiments the acidic C-terminal region of the HMG-1 protein was essential for the observed enhancement of reporter gene expression, suggesting that this region of the protein acts as a transcriptional activator (119). Furthermore, overexpression of HMG-1 (but not HMG-2) protein in cells stably transfected with cDNA-expressing bovine papilloma virus vectors leads to increased expression of reporter genes transfected into these cells as well as a loosening or “relaxation” of the chromatin structure of the minichromosomes derived from the transfected reporter gene plasmids (136). Nevertheless, these in vivo results obtained with mammalian cells stand in marked contrast to the situation in yeast cells, where the C-terminal end of the mammalian HMG-1 protein has been demonstrated not to act as a transcriptional activator (137), suggesting that the acidic terminal region of this protein probably functions in a different manner in these highly divergent organisms.
II. The HMG-I(Y) Family
The mammalian HMG-I(Y) protein family consists of three members (Fig. 5): the isoform proteins HMG-I [also called α-protein (138, 140-143)] and HMG-Y (142, 144) and the closely related protein HMGI-C (145, 146). Complementary DNA clones have been isolated for the mouse (144) and human (142, 147) HMG-I and -Y proteins, as well as for mouse (145) and human (146) HMGI-C. The HMG-I (107 amino acids; ~11.9 kDa) and HMG-Y (96 amino acids; ~10.6 kDa) proteins are identical in sequence except for an 11-amino-acid internal deletion in the latter and are produced by alternative splicing (142, 144) of transcripts from a single gene (148) (Fig. 6). The HMGI-C protein (109 amino acids; 12 kDa) has high amino-acid-sequence homology (~50% overall) with the HMG-I and HMG-Y proteins, has the internal deletion of 11 amino acids characteristic of HMG-Y (Fig. 5), but is the product of a separate gene (145, 146, 148). In vivo, members of the HMG-I(Y) family exhibit considerable additional heterogeneity as a result of secondary biochemical modifications (143, 144), certain of which (for example, reversible phosphorylations) (150-154) are cell cycle regulated (see Section II,B,3). The human HMG-I(Y) gene (Fig. 6) is located on the short arm of chromosome 6 (at 6p21) in a region involved in rearrangements, translocations,
[FIG. 5: aligned amino-acid sequences of human (hu) HMG-I, human HMG-Y, mouse (mu) HMG-Y, and human HMGI-C, with the DNA-binding domains BD-I, -II, and -III marked.]
A·T-DNA Binding Domain Consensus: TP-KRPRGRPKK (the A·T-Hook Motif)
FIG. 5. Comparison of the amino-acid sequences of members of the mammalian HMG-I(Y) family of nonhistone chromatin proteins. The human (*, 142, 148) and mouse (+, 144) HMG-I and HMG-Y are isoform proteins produced by alternative mRNA splicing from a single gene, whereas the closely related human HMGI-C protein (146) is the product of a separate gene. Both the HMG-Y and the HMGI-C proteins are missing an internal stretch of 11-12 amino acid residues (......) that is present in the HMG-I protein. The DNA-binding domains (BD-I, -II, and -III), also called the A·T-hooks (61), of the HMG-I and HMG-Y proteins are indicated, as is the “consensus” amino-acid sequence for these motifs. The amino-acid sequences of the DNA-binding domains of the HMGI-C protein are quite similar to corresponding regions of the HMG-I and HMG-Y proteins, but these proteins diverge considerably elsewhere in their sequences, hence the necessity of introducing blank “gaps” to facilitate comparisons of maximal amino-acid similarities. The diamonds (◆) indicate the sites of in vivo phosphorylation of the human HMG-I and HMG-Y proteins by cdc2 kinase (151, 152); the double circles (○○) indicate the sites of in vitro phosphorylation by casein kinase II.
and other abnormalities correlated with a number of human cancers (148). In the mouse the cognate gene, Hmgi, is located in the t-complex region of chromosome 17 in an area containing a number of genes that, when mutated,
FIG. 6. Diagram of the human HMG-I(Y) gene showing patterns of transcription and alternative splicing [redrawn, by permission of Oxford University Press, from Friedmann et al. (148) with modifications]. The human gene is longer than 10 kb and contains eight exons (Roman numerals I-VIII) and seven introns (numbers 1-7). Curved arrows show the four different in vivo start sites (labeled 1A-10A, 2B-7C, 6A, and 11D) for transcription, and the solid lines connecting the various exons indicate different alternative splicing patterns that result in the production of different mRNA species, including those coding for the HMG-I and HMG-Y isoform proteins. Note that the three independent DNA-binding domains of the HMG-I(Y) proteins (BD-I, -II, and -III) are located on different exons.
cause embryonic lethality, suggesting that Hmgi is a good candidate locus for embryonic lethal mutations (155). In contrast, studies of transgenic insertional mutations in mice have localized the HMGI-C gene to the pygmy (or “minimouse”) locus on chromosome 10 (156-158). Because the pygmy phenotype does not result from lack of growth hormone or its receptor, it seems likely that this growth defect is due to a reduced response to an embryonic growth factor such as IGF-1. This observation therefore suggests that HMGI-C may be involved in the regulation of genes activated by embryonic growth factors and/or be specifically responsive to such factors
(156-158). Of interest in this connection is the recent demonstration that stimulation of quiescent cultured mammalian cells by a variety of growth factors (e.g., PDGF, FGF, EGF, phorbol esters, or serum) leads, within a few (1-4) hours, to the induced expression of a number of “delayed early response” (DER) genes (159), among them HMG-I, HMG-Y, and HMGI-C (148, 159, 160). Such growth-factor induction of gene expression can be quite specific. For example, of the four different promoters and mRNA transcription start sites present in the complex human HMG-I(Y) gene (148) (Fig. 6), only one site is specifically induced by phorbol ester stimulation of quiescent cells (160), whereas stimulation by EGF leads to induced transcription from only two of the four sites (161). These results indicate that the different promoter/enhancer sequences are individually and specifically stimulated in response to particular growth factors, a fact that may have biological significance not only for embryonic development but also for regulation of the HMG-I(Y) gene in normal somatic cells and in transformed cancerous cells (see below).
A. Structure of the Proteins
The peptide domains of the HMG-I(Y) proteins that preferentially interact with B-form A·T-DNA (see Section II,B,2) have been experimentally determined, and a short synthetic peptide (T1-P2-K3-R4-P5-R6-G7-R8-P9-K10-K11) corresponding to a “consensus” binding domain (BD) sequence was found to footprint to the minor groove of a stretch of 5-6 bp (or one-half a helical turn) of A·T-DNA in a manner similar to binding of the intact protein (61). Each HMG-I(Y) protein has three separate BD motifs (also referred to as “A·T-hook” motifs) (Fig. 5) separated by stretches of flexible peptide backbone sequences. Thus, the tandem binding of all three BDs in an HMG-I(Y) protein should occupy the minor groove of ~15-18 bp (or about one and one-half helical turns) of contiguous A·T residues. Such a DNA-binding arrangement is predicted to induce secondary structural changes in the HMG-I(Y) protein, particularly in the flexible peptide regions
between BDs (61, 162), a speculation supported by preliminary two-dimensional solution ¹H NMR studies (163). Analogous to the situation for the “HMG-1 box” motif of the HMG-1/-2 family, amino acid sequences similar to the BD domain (or A·T-hook) of HMG-I(Y) are found in numerous other DNA-binding proteins present in many different organisms, including yeast, plants, sea urchins, insects, and mammals. Often multiple copies of these BD-like sequences are present within otherwise unrelated proteins. Many of these proteins bind preferentially to A·T-rich DNA sequences in vitro, and most are suspected of being transcription factors involved in gene regulation. A palindromic BD-like sequence “P-R-G-R-P,” flanked by basic residues (arginines or lysines), is present in most of these conserved motifs and likely represents the consensus “core” of the A·T-DNA-binding domains of these proteins (61, 164). As illustrated in Fig. 7, the peptide backbone of the consensus BD peptide is predicted (61) to have a planar, crescent-shaped structure that has general similarities to distamycin A and netropsin and to the fluorescent dye Hoechst 33258, ligands that also bind to the minor groove of A·T sequences. Spaced along this crescent peptide backbone, and projecting above and below its plane, are the positively charged side chains of Arg and Lys residues. When the BD is bound to the minor groove of A·T-rich sequences, these side chains are positioned so that they can interact with and neutralize the negatively charged phosphate residues on the two antiparallel strands of DNA. Evidence supporting a structural relatedness of the above minor groove ligands to the planar backbone of the BD peptide of HMG-I(Y) is provided by the striking similarity of their footprints on A·T-DNAs (165) and by their competition with each other for substrate binding both in vitro (61, 165, 166) and in vivo (162; unpublished data).
Indeed, in vivo displacement of the HMG-I(Y) proteins by the antiviral and antitumor drugs netropsin and distamycin has been suggested to be, at least partially, the basis for their marked cellular toxicity (167). Two-dimensional ¹H NMR solution studies (168-170) have also directly confirmed crucial features of the proposed planar crescent-shaped backbone structure of the BD peptide, particularly demonstrating the existence of all of the proline residues in the expected all-trans configuration, as well as showing its minor groove binding to B-form linear A·T-DNA substrates (170). As discussed in Section II,B,2, the HMG-I(Y) protein, as well as the BD peptide itself, can bind preferentially to non-B-form DNAs, such as four-way junctions and supercoiled plasmids. How this is accomplished is unknown, but it is tempting to speculate that the inherent rotational flexibility of the glycine residue in the middle of the BD peptide allows for enough pliancy to adopt certain alternative, thermally stable, backbone configurations (169) that could potentially accommodate
FIG. 7. Comparison of the predicted planar crescent-shaped backbone structure of (A) the consensus DNA-binding domain peptide of the HMG-I(Y) family of proteins with those of the minor groove A·T-DNA-binding ligands netropsin (B) and Hoechst 33258 (C). [Redrawn with modifications from Reeves and Nissen (61).]
TABLE I
PROTEINS WITH SEQUENCES SIMILAR TO THE HMG-I(Y) DNA-BINDING MOTIF
Protein
Peptide sequence
HMG-I/Y (human)
MLL (ALL; HRX) (human)
MIF2 (yeast)
Datin (yeast)
D1 (Drosophila)
cHMGI (insect)
Histone H1 (sea urchin sperm)
Histone H2B (sea urchin sperm)
CHD1 (mouse)
SB16A,B (soy bean)
ATBP-1 (pea)
PF1 (oat)
TPKRPRGRPKK SPRKPRGRPRIK
Consensus
KIRPRGRPKIR
Binds A·T-DNA
Suspected transcription factor
Ref.
+
+ +
61 186, 203a-c
+
RPRGRPKK (GRKP . . KIRRGRPKK RPRGRP (SITPRKIR) (SITPRKIR) KRPKKRGRPR KRPIGRGRPKI PK KI RRRIPGRPRI PK RPRGRPKK
203d
203e 203f
187 171, 203g,
+
+ + + + +
h 171, 203g,
h 203i 203j 203k
2031
61, 164
binding to such altered DNA structures. On the other hand, in analogy with the proposed mode of binding of isolated HMG-boxes to 4WJ DNAs (58), the extended BD peptides of the HMG-I(Y) proteins may not have to vary much in overall conformation to accomplish minor groove binding to non-B-form structures. Future structural studies of HMG-I(Y) proteins complexed to different types of DNA substrates should resolve these issues. The 11 amino acids that comprise the “consensus” sequence of each of the three independent DNA-binding domains of the HMG-I(Y) proteins (Table I) seem to form a “unit” that is modular in both structure and function. The planar, extended conformation of the BD peptide backbone (Fig. 7) facilitates tight, general structural recognition of the minor groove of DNA (61). On the other hand, the conserved palindromic “P-R-G-R-P” core of the BD peptide (along with positively charged flanking sequence) (Table I) probably imparts specificity in determination of the structure of the narrower minor groove of A·T-rich sequences (61). The generality of a consensus of this type for recognizing minor groove structure has been recognized (171) and termed the GRP motif. Finally, as discussed below, the amino-terminal threonine residue of the BD peptide appears to function as a “regulatory” residue involved in modulating the affinity of association of the protein with substrate DNA as a result of reversible phosphorylations.
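As a rough illustration of how the modular consensus described above might be located in a sequence, the following Python sketch scans the human HMG-I sequence, as transcribed from Fig. 5, for the palindromic P-R-G-R-P core flanked by basic residues. Three things are assumptions made here for illustration rather than definitions from the cited work: the relaxed pattern that also admits G-R-G-R-P (as appears in BD-I), the numbering that counts the initiator Met as residue 1, and the value of 10.5 bp per helical turn of B-form DNA.

```python
import re

# Human HMG-I (107 residues, initiator Met counted as residue 1),
# transcribed from Fig. 5; transcription errors are possible.
HMG_I = (
    "M" "SESSSKSSQPLASKQEKDGT" "EKRGRGRPRKQPP"
    "VSPGTALVGSQKEPSEVPTPKRPRGRPKGSKNKGAAKT" "RKTTT"
    "TPGRKPRGRPKK" "LEK" "EEEEGISQESSEEEQ"
)

STRICT_CORE = re.compile(r"[KR]PRGRP[KR]")      # P-R-G-R-P flanked by Lys/Arg
RELAXED_CORE = re.compile(r"[KR][PG]RGRP[KR]")  # hypothetical relaxation: Pro or Gly first

def core_positions(pattern, sequence):
    """Return the 1-based start position of each non-overlapping match."""
    return [m.start() + 1 for m in pattern.finditer(sequence)]

def footprint_bp(n_domains, bp_per_domain=(5, 6)):
    """Minor-groove footprint range (bp) if every domain binds in tandem."""
    low, high = bp_per_domain
    return n_domains * low, n_domains * high

strict = core_positions(STRICT_CORE, HMG_I)
relaxed = core_positions(RELAXED_CORE, HMG_I)
low_bp, high_bp = footprint_bp(len(relaxed))
BP_PER_TURN = 10.5  # assumed average for B-form DNA

print("strict core starts:", strict)
print("relaxed core starts:", relaxed)
print(f"footprint: {low_bp}-{high_bp} bp, "
      f"{low_bp / BP_PER_TURN:.1f}-{high_bp / BP_PER_TURN:.1f} helical turns")
```

With the sequence as transcribed, the strict pattern picks out the second and third binding domains, and the relaxed pattern adds the first. The three-domain footprint then comes to 15-18 bp, in line with the estimate of about one and one-half helical turns.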
B. Interactions with DNA and Chromatin
1. HMG-I(Y) PREFERENTIALLY BINDS A·T-RICH DNA in Vivo AND in Vitro
Although one member of the HMG-1/-2 family, HMG-2a, also displays preferential binding in vitro to A·T-rich DNA fragments from a variety of sources (172), of all of the other known HMG proteins only HMG-I and HMG-Y preferentially bind to A·T-rich DNA both in vitro and in vivo. By a combination of methylation interference, dI·dC base-pair substitutions, minor groove ligand-binding competition studies, and a variety of DNA footprinting techniques, these proteins have been shown to bind, in vitro, to the narrow minor groove of short stretches of A·T-rich B-form DNA (61, 140, 141, 165, 173). In vivo, the HMG-I(Y) proteins have been immunolocalized to the A·T-rich G/Q and C bands of mammalian metaphase chromosomes (174), suggesting that they may play an important role in chromosome structural changes during the cell cycle (162, 165). In vivo experiments employing high-resolution confocal laser microscopy and immunolocalization techniques have shown HMG-I(Y) to colocalize, along with topoisomerase II, to A·T-rich scaffold-associated regions (SARs) of mitotic chromosomes (175-177). Careful microscopic analyses have revealed that HMG-I(Y) is distributed along the longitudinal length of the backbone scaffolding, or “A·T queue,” of native chromosomes, including colocalization in the G/Q bands and C bands, postulated to represent tightly coiled SAR sequences (175, 176). These in vivo observations confirm and extend earlier in vitro data showing that purified HMG-I(Y) proteins preferentially bind to isolated SAR fragments (178) and, in fact, effectively out-compete histone H1 for binding to such A·T-rich sequences (162, 179).
2. HMG-I(Y) PROTEINS RECOGNIZE DNA STRUCTURE
DNA footprinting experiments employing purified proteins indicate that in vitro the HMG-I(Y) proteins do not bind to all stretches of A·T-rich DNA equally well, or with equal affinity, indicating that these proteins recognize the structure, rather than the sequence, of such DNA (61, 143, 165, 180-182). Recent polymerase chain reaction (PCR)-based DNA selection techniques (183) also demonstrate the marked differences in binding affinity of HMG-I(Y) for different types of A·T-DNA (184). In linear duplex B-form DNA, the affinity and specificity of HMG-I(Y) structural recognition is significantly influenced by both the length and sequence of the particular A·T stretches (141, 180, 185) and by the “context” of flanking or adjacent nucleotide sequences (165, 180-182, 185). Interestingly, HMG-I(Y) also has the capacity to recognize and preferentially bind to certain types of structures formed by non-A·T-rich DNA sequences. For example, in vitro, the whole HMG-I(Y) protein (80), as well as the DNA-binding domain (186, 187), binds to synthetic four-way junction (cruciform) structures in preference to linear duplex DNA molecules of identical sequence. Likewise, HMG-I(Y) recognizes and binds to non-B-form structures in supercoiled plasmids (188) as well as to distorted regions of DNA found on isolated nucleosome core particles (189). The mode of interaction of the HMG-I(Y) protein, or its DNA-binding domains, with these non-B-form DNA structures is presently unknown.
3. PHOSPHORYLATION OF HMG-I(Y) BY Cdc2 KINASE ALTERS ITS BINDING AFFINITY
The HMG-I(Y) proteins, along with histone H1, are among the most highly phosphorylated proteins in the nucleus and the extent of such phosphorylation is cell cycle dependent (reviewed in 162). In mammals the extensive phosphorylation of histone H1 that occurs in proliferating cells is catalyzed by an enzyme homolog of the yeast cyclin-dependent kinase (cdk) p34cdc2/CDC28 [also called Cdc2 kinase; formerly referred to as growth-associated histone-H1 kinase (190)], the activity of which is sharply elevated at mitosis. Activated Cdc2 kinase phosphorylates serine or threonine residues within the consensus sequence Ser/Thr-Pro-(Xaa)-Lys/Arg, where the presence of Xaa is variable but, when present, is often a polar residue (191). An inspection of the sequences of the three DNA-binding domains found in different HMG-I proteins (Fig. 5) reveals that in the human protein, two of the three BDs (at residues Thr-53 and Thr-78) have potential Cdc2 kinase phosphorylation sites, whereas, in the mouse protein, only one site (at residue Thr-53) conforms to the consensus phosphorylation sequence. Activated Cdc2 kinase isolated from mammalian cells (151, 152), as well as from starfish oocytes and sea urchin eggs (154), efficiently phosphorylates both human and murine HMG-I and HMG-Y proteins in vitro at the expected modification sites. Furthermore, in vivo 32P-labeling studies of synchronized human and mouse cells show that these same Cdc2 consensus phosphorylation sites are radiolabeled in HMG-I(Y) proteins isolated from metaphase cells (but not from nonproliferating, G1, or S phase cells) (151, 152, 154). These results clearly indicate that the mammalian HMG-I(Y) proteins are in vivo substrates for Cdc2 kinase and demonstrate that the extent of DNA-binding domain phosphorylation varies in a cell cycle-dependent manner.
The in vivo effect of such modifications is uncertain, but in vitro phosphorylation of purified human recombinant HMG-I proteins by Cdc2 kinase results in a greatly reduced binding affinity (to 1/20 at physiological ionic strength) of the phosphorylated protein for A·T-DNA substrates, probably as a result of negative charge repulsions (152, 162). Nevertheless, as noted earlier, even during mitosis, when the HMG-I(Y)
proteins are most highly phosphorylated, they do not completely dissociate from metaphase chromosomes (174), although their strength of DNA binding may well be weakened. Because in vitro mutagenesis experiments show that replacement of the two conserved Cdc2 kinase-modifiable threonine residues in human HMG-I with nonphosphorylatable alanine residues does not change the binding affinity of the mutant protein for substrate DNA (192), it is likely that the threonine residues at the N-terminal ends of BD peptides are “regulatory residues” involved in reversibly modulating the affinity of association of the protein with substrate DNA at specific points in the cell cycle. Such modulations of binding affinity as a result of reversible Cdc2 kinase phosphorylations can reasonably be expected to have significant effects on the in vivo function(s) of HMG-I(Y) proteins, for example, during the extensive condensation and decondensation of chromosomes accompanying cell division (162).
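The Cdc2 consensus discussed in this section, Ser/Thr-Pro-(Xaa)-Lys/Arg with Xaa optional, lends itself to a simple pattern search. The Python sketch below is offered only as an illustration: it applies a regular expression to the human HMG-I sequence as transcribed from Fig. 5, with the initiator Met counted as residue 1 (an assumption about the numbering). A consensus match marks a candidate site only and says nothing about whether the residue is actually modified.

```python
import re

# Human HMG-I (107 residues, initiator Met counted as residue 1),
# transcribed from Fig. 5; transcription errors are possible.
HMG_I = (
    "M" "SESSSKSSQPLASKQEKDGT" "EKRGRGRPRKQPP"
    "VSPGTALVGSQKEPSEVPTPKRPRGRPKGSKNKGAAKT" "RKTTT"
    "TPGRKPRGRPKK" "LEK" "EEEEGISQESSEEEQ"
)

# Ser/Thr-Pro-(Xaa)-Lys/Arg; the lookahead lets overlapping candidates be reported.
CDC2_CONSENSUS = re.compile(r"(?=([ST]P.?[KR]))")

def candidate_cdc2_sites(sequence):
    """Return (1-based position, residue, matched peptide) for each consensus match."""
    return [
        (m.start() + 1, sequence[m.start()], m.group(1))
        for m in CDC2_CONSENSUS.finditer(sequence)
    ]

for position, residue, peptide in candidate_cdc2_sites(HMG_I):
    print(f"{residue}{position}: {peptide}")
```

With the sequence as transcribed, the search returns Thr-53 and Thr-78, the two sites named in the text for the human protein.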
4. HMG-I(Y) INDUCES BENDS AND SUPERCOILS IN DNA
Circular dichroism measurements (193), circular permutation DNA bending analyses (184), and topoisomerase-I-mediated plasmid supercoiling assays (188) all indicate that HMG-I(Y) binding markedly alters DNA conformation by introducing bends, supercoils, and possibly other distortions in the substrates. Given the mode of interaction of the individual binding domains with the minor groove of linear DNA or relaxed plasmids, the most likely physical explanation for at least some of the HMG-I(Y)-induced bending is asymmetric charge neutralization (104) of the negative phosphate residues located on one face of the DNA helix by the positively charged Arg and Lys residues of the BD peptides (61, 162). In addition, HMG-I(Y)-mediated strand unwinding also appears to contribute significantly to the ability of the protein to introduce distortions in DNA (188). For example, recent studies employing relaxed circular plasmid DNAs, topoisomerase I, and HMG-I(Y) indicate that increasing concentrations of the nonhistone protein in the assay result in the introduction of increasing numbers of supercoils in the plasmid DNAs (188). Interestingly, at low input ratios, HMG-I(Y) introduces positive supercoils in the plasmids, whereas at progressively higher concentrations the protein induces increasing numbers of negative supercoils. Detailed analyses of this phenomenon reveal that such changes in the sign of plasmid supercoiling probably result from a combination of both HMG-I(Y)-induced DNA bending and strand unwinding. An additional finding of considerable interest from these studies is that an in vitro-produced mutant HMG-I protein, lacking the negatively charged carboxyl-terminal domain, binds A·T-DNA with approximately the same affinity as the full-length wild-type protein and yet is 8- to 10-fold more
effective in introducing negative supercoils. This suggests that the highly acidic C-terminal region of the HMG-I(Y) proteins may function as a regulatory domain influencing the amount of topological change induced in DNA substrates by protein binding (188).
5. HMG-I(Y) BINDING TO CHROMATIN AND NUCLEOSOMES
Early studies (140, 194) investigating the chromatin organization of A·T-rich α-satellite DNA in CV-1 monkey cells demonstrated by two-dimensional electrophoretic methods that a distinct subpopulation of isolated monomer nucleosome core particles contained α-protein (also called HMG-I), in addition to HMG-14 and -17. In subsequent experiments, the same workers found that the pattern of in vitro binding of α-protein to bulk CV-1 mononucleosomes is strikingly similar to that of HMG-14/-17 binding (140). Both native and recombinant HMG-I(Y) proteins also bind to preferred regions on isolated avian nucleosome core particles containing ~146 bp of random sequence DNA (189). Up to four discrete HMG-I(Y)·core particle complexes can be detected by electrophoretic mobility shift assays when increasing molar ratios of protein are associated with cores. In vitro and in vivo chemical cross-linking investigations indicate that HMG-I(Y) proteins bind to nucleosome core particles in close proximity to histones H2A, H2B, and H3. Thermal denaturation and DNase I protection studies in vitro show that when HMG-I(Y) is present in less than equal molar concentrations with mononucleosomes the protein initially binds to DNA in the vicinity of the DNA termini at the entrance and exit points on the face of the particle. With increasing molar ratios of bound protein (up to ~4:1), DNase I footprinting shows that other preferred regions of DNA along the sides of the nucleosome particle are also protected. Both protein-DNA and protein-protein interactions are involved in HMG-I(Y) core particle association. These findings, combined with other information, suggest that HMG-I(Y), like HMG-14 and -17 (195, 196), selectively binds to the front face of nucleosome core particles near the dyad axis, as well as near the entrance and exit of DNA from core particles, when the protein is bound at low molar ratios (<1:1 HMG-I(Y):core particles) (189).
Because not all random sequence nucleosomes are expected to have A·T-rich sequences located in the preferred binding sites on the front face of core particles noted above, it seems plausible that the HMG-I(Y) protein is recognizing and binding to altered DNA structures in these locations (189). Additional support for this idea comes from subsequent studies (197) involving binding of HMG-I(Y) to in vitro reconstituted mono- and dinucleosomes containing DNAs of defined sequence that have various types of A·T stretches (bent, rigid, flexible) located at a particular site in the reconstituted substrates (198). The principal finding from these investigations is that
HMG-I(Y) protein preferentially binds to different sites on defined-sequence DNA depending on whether the duplex substrate is free in solution or has been distorted by being wrapped around a histone octamer core (197). In addition, these studies show that (1) HMG-I(Y) has the capacity to associate with certain types of A·T sequences even when they are located on the lateral sides of the reconstituted nucleosome and (2) on binding, the protein can induce a localized change in the rotational setting of the DNA on the core particle surface. In toto, these studies indicate that HMG-I(Y) binding to DNA associated with chromatin core particles in vitro is mediated, just as in the case of binding of the protein to free DNA substrates, by recognition of preferred DNA structures. Although HMG-I(Y) and HMG-14/-17 proteins do share certain similarities in the way they bind nucleosomes, these two families of HMG proteins are distinctly different in many other important respects. For example, whereas HMG-14 and -17 bind to only two specific sites on each core particle (196, 199, 200), at high molar ratios (~4:1) HMG-I(Y) can form up to four discrete complexes with random sequence core particles in vitro (189). Furthermore, in contrast to HMG-14 and -17, which bind more tightly to core particles than to naked DNA (7, 195, 196, 199, 200) and which also bind to nucleosomes in a cooperative manner (199-203), HMG-I(Y) binds more tightly to naked A·T-rich substrates (61) than to random sequence core particles (189) and, so far, there is no evidence for cooperative HMG-I(Y) binding to core particles (189). Based on these differences in binding characteristics, it is expected that in chromatin containing A·T-rich linker regions, HMG-I(Y) would preferentially associate with the linker DNA whereas HMG-14 and -17 would bind to nucleosomes.
On the other hand, in chromatin in which both the nucleosome and linker DNAs are of random sequence it would not be unreasonable to expect simultaneous binding of both HMG-14/-17 and HMG-I(Y) to at least some fraction of the nucleosome core particles, as has previously been reported for the α-protein (140, 194).
6. SIMILARITIES OF THE HMG-I(Y) AND HMG-1/-2 PROTEIN FAMILIES
Given the marked differences in their amino-acid sequences and their folded peptide structures, there is a remarkable similarity in many of the in vitro DNA-binding characteristics of the HMG-1/-2 and HMG-I(Y) proteins. Both families of proteins bind to the minor groove of DNA and have the ability to induce bends and supercoils in DNA, as well as possessing the ability to recognize and preferentially bind to altered DNA structures, e.g., four-way junctions, cruciforms, and certain types of adducted, or non-B form, DNA conformations. This unusual constellation of shared capabilities suggests that the DNA-
HMG PROTEINS
binding domains of the two families of proteins probably also share some important common features. At first glance, however, the three-dimensional L- or V-shaped arrowhead structure of the HMG box of the HMG-1/-2 proteins (Fig. 1) appears superficially to be quite different from the planar, crescent-shaped BD peptide of the HMG-I(Y) proteins (Fig. 7). Nevertheless, on closer inspection of these two motifs, there does appear to be a significant commonality in both the structure and the sequence of the peptides that actually interact with the minor groove of DNA. As outlined above (Section I,A,2), the first 12 residues of the N-terminal region of the HMG-1 box have been strongly implicated in binding to the minor groove of DNA and, significantly, just as in the case of the BD peptides of the HMG-I(Y) proteins (61), the peptide backbone of this region of the box is in an extended configuration compatible with preferential binding to a narrow minor groove (57, 58). Additionally, there is a highly conserved consensus sequence, P7-K8-R9-P10, present in the extended N-terminal peptide of HMG-1 boxes (Fig. 1) (57) that is also faithfully conserved (P2-K3-R4-P5) (Table I; 203a-2) in the BD motif of many HMG-I(Y) proteins. And, most importantly, all of the prolines present in both the BD peptide (61, 168, 169) and the N-terminal region of the HMG-1 box (57, 60) are in the trans configuration, a situation that facilitates both an extended peptide structure and minor groove binding (61). The available information therefore strongly argues for a preservation of similar peptide backbone structures as well as conservation of particular amino acid residues and conformations in the minor groove DNA-binding regions of the HMG-1 box and HMG-I(Y) proteins.
The preferential recognition capabilities of individual proteins, for either bent or four-way junction DNAs, for specific DNA sequences, for certain stretches of A·T-DNA, or for other types of unusual DNA structures, are probably imparted by a combination of the subtleties of the actual amino acid sequence and structure of a given HMG DNA-binding domain as well as by the particular flanking, or adjacent, peptide residues.
C. Cellular Functions
1. HMG-I(Y) IS AN in Vivo STRUCTURAL TRANSCRIPTION FACTOR
The in vivo function of the HMG-I(Y) family is much better understood than that of either the HMG-1/-2 or HMG-14/-17 families. Earlier studies (summarized in 152, 162), postulating a role for the HMG-I(Y) proteins in nucleosome phasing, metaphase chromosome condensation, DNA replication, and 3'-end processing of mRNA transcripts, have all generally been of a correlative nature, thus leaving unanswered the question of whether such
observations have in vivo biological significance. Recently, however, a series of reports have presented compelling evidence directly implicating HMG-I(Y) proteins in the in vivo transcriptional regulation (either positive or negative) of a number of mammalian genes lying in close proximity to A·T-rich promoter/enhancer sequences (Fig. 8). The first example of in vivo transcriptional regulation by HMG-I(Y) was reported (12) in studies of the promoter region of the murine lymphotoxin (LT; also called tumor necrosis factor-β) gene that is constitutively expressed in transformed B-cell lines. Mutation and promoter deletion analysis delineated a 5' poly(dA·dT) upstream activating sequence (UAS), an essential component of LT transcriptional activation in vivo. Additional experiments showed that recombinant HMG-I specifically binds this UAS element in vitro and that nuclear extracts from LT-expressing mouse cells contain an HMG-I-like protein with identical UAS binding characteristics. Electrophoretic mobility shift analyses (EMSAs) using LT promoter DNA incubated in nuclear extracts demonstrated that anti-HMG-I(Y) antibodies gave band “supershift” patterns identical to those observed when the antibodies reacted with recombinant HMG-I protein alone bound to the promoter DNA. And, finally, EMSA combined with antibody reactivity analyses revealed that at least one additional protein was present in the nuclear extracts that bound to both HMG-I and the UAS, suggesting that HMG-I (probably in combination with other proteins) facilitates the formation of an active promoter/enhancer transcription complex necessary for LT gene expression in vivo (12). Since this initial report, additional examples documenting the in vivo involvement of HMG-I(Y) in the positive induction of gene transcription have appeared. These include the human genes coding for β-interferon (13, 173), for the α-subunit of the interleukin-2 receptor (14), and for E-selectin (204, 205).
Examples are also known of instances where HMG-I(Y) binding to promoter regions seems to be involved in negative regulation of transcription, including the genes coding for human interleukin-4 (206) and GP91-PHOX (185), a component of the respiratory burst NADPH-oxidase complex of phagocytes, as well as the murine gene coding for heavy chain embryonic ε-immunoglobulin (ε-IgG) (207) (Fig. 8).

Positive Regulation
Murine tumor necrosis factor-β (TNF-β) (12)
Human interferon-β (IFN-β) (13, 173)
Human IL-2 receptor-α (IL-2Rα) (14)
Human E-selectin (204, 205)

Negative Regulation
Human interleukin-4 (IL-4) (206)
Human GP91-PHOX (185)
Murine ε-immunoglobulin (ε-IgG) (207)

FIG. 8. Positive and negative in vivo regulation of gene transcription by HMG-I(Y) proteins.
Several of the reports supporting an in vivo role for HMG-I(Y) in positive gene regulation suggest that the protein probably functions as an “architectural transcription factor” (16, 19, 208) both by bending DNA and by directly interacting with other transcription factors to facilitate formation of a stereospecific, multiprotein complex that brings together upstream promoter/enhancer elements with the proximal basal transcription apparatus during the process of transcription induction. Consistent with the basic tenets of such models is the fact that, in vitro, HMG-I(Y) bends and unwinds DNA substrates (see Section II,B,2). Furthermore, HMG-I(Y) also specifically associates, either free in solution or as part of a complex in nuclear extracts, with a number of known sequence-specific transcription factors, including NF-κB, ATF-2, IRF, and c-Jun (13, 173, 209, 210), and the lymphoid-specific factor Elf-1, an Ets family member (14). It should be noted, however, that direct experimental evidence for the presence of such stereospecific protein-DNA transcription initiation complexes in living mammalian cells has yet to be obtained. Nevertheless, two examples supporting the in vivo existence of such inducible HMG-I(Y) promoter complexes are of particular interest. One example comes from the recent studies of John et al. (14), who investigated the inducible expression of the gene coding for the α-subunit of the IL-2 receptor (IL-2Rα) in human T cells in response to mitogenic stimuli (Fig. 9). These workers identified and characterized a new positive regulatory region (PRRII) in the gene's promoter (nucleotides -137 to -64) that binds both HMG-I(Y) and the lymphoid cell-specific factor Elf-1. Cell transfection experiments with an expression vector containing the IL-2Rα promoter ligated to the bacterial CAT reporter gene (Fig. 9A) demonstrated that mitogen-inducible expression of the promoter is inhibited when either the Elf-1 or the HMG-I(Y) binding sites in PRRII are specifically mutated. Furthermore, coexpression of both Elf-1 and HMG-I(Y) proteins in nonlymphoid COS-7 cells (which normally lack the Elf-1 protein) containing the same CAT reporter construct activated transcription from the PRRII element. Previous work from the same group had also identified another mitogen-inducible promoter element (PRRI) farther upstream of the transcription start site (at nucleotides -276 to -244) that contained binding sites for two additional transcription factors, serum response factor (SRF) and NF-κB. Importantly, when specific antibodies [anti-Elf-1, anti-HMG-I(Y), anti-NF-κB, etc.] against various putative components of the transcriptional system were employed in coimmunoprecipitation or EMSA supershift assays using either nuclear extracts or recombinant proteins free in solution, a direct physical interaction was found between Elf-1 and HMG-I(Y) as well as between Elf-1 and the NF-κB p50/c-rel heterodimer, suggesting that protein-protein interactions functionally coordinate the actions of the upstream
(PRRI) and downstream (PRRII) positive regulatory elements to form a protein complex necessary for inducible IL-2Rα gene expression (Fig. 9B).

FIG. 9. (A) Diagram of the human IL-2Rα gene 5' regulatory region between nucleotides -472 and +109, including the upstream and downstream positive regulatory regions (PRRI and PRRII) attached to a bacterial chloramphenicol acetyltransferase (CAT) reporter gene used for in vivo expression assays. The binding sites for transcription factors NF-κB, serum response factor (SRF), Elf-1, and HMG-I(Y) are indicated. [Redrawn with modification from John et al. (14).] (B) Diagrammatic model of the promoter region of the human interleukin-2 receptor α chain gene (IL-2Rα) before (resting T cells) and after (activated T cells) mitogen stimulation, indicating direct interactions between NF-κB, Elf-1, and HMG-I(Y) proteins. Two possibilities are indicated for the activated state: the upper schematic depicts direct Elf-1-NF-κB interactions, whereas the lower diagram additionally shows the possibility that HMG-I(Y) may also enhance Elf-1-NF-κB interactions. It is possible that both models depicting the activated state exist at the same time. [Redrawn with modification from John et al. (14).]

Another example comes from Maniatis and colleagues (13, 173, 209, 210), who demonstrated in vivo that HMG-I(Y) plays a causal role in the virus-induced expression of the human β-interferon gene (IFN-β). Induction of IFN-β depends on the simultaneous binding of both HMG-I(Y) and transcription factors NF-κB and ATF-2/c-Jun to two separate “positive regulatory domains” (PRDII and PRDIV) located in the gene's 5' promoter/enhancer region. HMG-I(Y) also interacted directly with both NF-κB and ATF-2 as free proteins in solution and thereby significantly increased the binding affinity of these transcription factors for their cognate DNA recognition sites in vitro. In this experimental system the HMG-I(Y) protein is also proposed to function as a mediator for the assembly of a stereospecific protein complex [including NF-κB, ATF-2, c-Jun, and HMG-I(Y)] involving the two different upstream enhancer domains, as well as the basal promoter region that is required for virus-induced transcription of the IFN-β gene. In this system, HMG-I(Y) can either stimulate or inhibit the in vitro binding of different ATF-2 isoform proteins to the PRDI site, depending on whether these isoforms contain a short stretch of basic amino-acid residues, located near the leucine zipper dimerization motif, that is necessary for HMG-I(Y) binding (209). This differential association of HMG-I(Y) with different ATF-2 isoforms determines whether a functional ATF-2 dimer is formed that is capable of PRDI enhancer binding and thus, by inference, whether a functional, inducible transcription complex is formed on the IFN-β promoter. The HMG-I(Y) protein significantly increases the affinity of binding of both NF-κB (13, 173) and ATF-2 (209, 210) for their recognition sequences in the IFN-β promoter.
In the case of the NF-κB site in PRDII, various footprinting techniques have shown that the NF-κB p50/p65 heterodimer binds to the terminal regions of a 10-bp regulatory sequence through contacts in the major groove, while HMG-I(Y) recognizes the central region of the same sequence through contacts in the minor groove; thus, the recognition sites of these two proteins overlap but their binding occurs in opposite grooves of the DNA (173). Because both proteins are proposed to occupy their respective PRDII binding sites simultaneously during initiation complex formation, a necessary prediction of such a model is that binding of NF-κB to the major groove will not interfere with HMG-I(Y) binding to the minor groove. That this prediction may indeed be correct is suggested by the recently determined X-ray crystallographic structure of an NF-κB p50 homodimer bound to a κB site (211, 212) showing that binding of the butterfly-shaped dimer to the major groove leaves the minor groove open for potential binding by HMG-I(Y) (Fig. 10, see color plate). These X-ray structures do not, unfortunately, provide any clues as to how HMG-I(Y) binding in the minor groove might facilitate increased NF-κB affinity for binding in the major groove. HMG-I(Y) is not the only HMG protein that facilitates increased binding affinity of NF-κB for its recognition site. Purified HMG-1 (or HMG-2) stimulates, by greater than 19-fold, the site-specific binding of all forms of NF-κB (p50, p52, and p65 homodimers as well as p50/p65 heterodimers), with significant binding enhancements being observed with nearly stoichiometric amounts of HMG-1 to NF-κB protein (134). Intriguingly, although HMG-1 greatly facilitates the binding of NF-κB to its recognition sequence, based on the failure of anti-HMG-1-specific antibodies to cause an electrophoretic “supershift” of the NF-κB-DNA complex, it does not appear that HMG-1 is part of the final ternary complex formed in these in vitro experiments (134). These findings are reminiscent of a previous report (10) describing the capacity of HMG-1 to enhance dramatically (>10-fold) the binding affinity of purified human progesterone receptor (PR) for DNA fragments containing the progesterone response element (PRE) without being incorporated into the final PR-PRE complex.
One interpretation of these combined experiments is that HMG-1 perhaps functions by a “hit-and-run” mechanism whereby the protein induces some type of structural change in the target DNA that facilitates transcription factor binding, but thereafter is not required for the maintenance of such binding and therefore readily dissociates from the complex. An alternative possibility, however, is that HMG-1 is, in reality, part of the final ternary transcription factor/DNA complex but is so loosely associated that it readily dissociates from the complex during gel electrophoresis. In either case, the remarkable fact that both the HMG-I(Y) and HMG-1/-2 protein families are able to facilitate enhanced transcription factor binding in vitro again reinforces the notion of an overall general similarity of DNA-binding capacities and possible biological functions of these two groups of proteins. A certain degree of caution should be exercised, nevertheless, in interpreting the results of experiments in which basic proteins such
as HMG-I(Y) or HMG-1 are shown to increase the in vitro DNA-binding affinity of NF-κB. In several reported cases such in vitro results have been interpreted as demonstrating that the observed increase in NF-κB binding affinity is the direct result of ancillary protein-induced DNA bending (13, 14, 173, 209, 210). However, because similar stimulations of NF-κB binding affinity can also be induced in vitro by certain proteins that do not cause DNA bending (134), the question of the actual role played by such ancillary proteins in stimulating NF-κB binding remains unclear.
2. HMG-I(Y) PROTEINS AND CANCER
In light of the compelling evidence demonstrating that HMG-I(Y) proteins are structural transcription factors in vivo, it is not surprising that a number of laboratories have observed a striking correlation between high levels of HMG-I(Y) gene expression and neoplastic transformation of normal cells and/or increased metastatic potential of tumor cells. In normal differentiated somatic cells, HMG-I(Y) mRNAs and proteins are expressed at only very low (142-144, 213, 214), or nondetectable (215, 216), levels. In contrast, in neoplastically transformed cells (215, 217-223), as well as in embryonic cells that have not yet undergone differentiation (215, 216, 224), levels of HMG-I(Y) gene products are often exceptionally high. Spontaneously derived tumors, or normal cells experimentally transformed by chemicals, by ionizing or UV radiation, or by viral oncogenes (v-src, v-ras, v-mos, v-myc), contain abnormally high levels of HMG-I(Y) proteins and mRNAs. Cellular levels of HMG-I(Y) mRNAs are known to vary with the rate of proliferation in normal cells, being very low in nondividing or quiescent cells and increasing about fourfold during exponential growth (213). It is therefore important to emphasize that the elevated HMG-I(Y) product levels found in tumors appear to be relatively independent of cellular growth rates, because untransformed normal cells proliferating at about the same rate as their transformed counterparts consistently contain much lower levels of HMG-I(Y) (220-222). Estimates have been made (142, 144, 213) that certain malignant cell lines constitutively contain 15-50 times the level of HMG-I(Y) mRNAs found in nontransformed normal cells.
The correlation between cancerous transformation and high constitutive levels of HMG-I(Y) gene products is so striking that Goodwin and colleagues (215, 217, 218) have suggested that elevated concentrations of these proteins are a characteristic and diagnostic feature of the transformed cellular phenotype. Schalken's laboratory (220) has also identified increased levels of HMG-I(Y) mRNAs as a progression marker for prostate cancer metastasis in the Dunning rat model system, demonstrating that the extent of HMG-I(Y) overexpression directly correlates with the degree of metastatic aggressiveness of the tumors rather than with their growth rates. More recent studies
have extended these findings to human prostate cancers in a retrospective in situ RNA hybridization study of HMG-I(Y) mRNA levels in paraffin-embedded materials obtained from patients presenting different Gleason grades of metastatic prostate cancer (222). Likewise, retrospective studies have also correlated high levels of HMG-I(Y) protein expression with the malignant phenotype of human thyroid neoplasias (225). Similar correlations, in which increased levels of HMG-I(Y) mRNA and protein serve as reliable biochemical markers for different stages of tumor progression, have been reported for a well-characterized mouse mammary epithelial cell system (221). The reverse situation also appears to be true, namely, that when undifferentiated, highly aggressive mouse teratocarcinoma cells are induced to undergo overt cellular differentiation, they lose both their high constitutive levels of HMG-I(Y) gene products and their in vivo tumorigenic potential (224). But perhaps of greater biological significance is the recent report (223) that inhibition of HMG-I(Y) protein synthesis by gene antisense methodology suppresses the ability of transforming retroviruses (carrying v-mos or v-ras-Ki) to induce neoplastic transformation in rat thyroid cells. Together these reports provide strong experimental support for involvement of the HMG-I(Y) proteins in both neoplastic transformation and increased metastatic tumor potential. However, HMG-I(Y) genes do not behave like classical transforming oncogenes in that their transfection into normal cells does not usually lead to transformation (223), suggesting that in many cases their overexpression may be necessary, but not sufficient, to achieve the neoplastic phenotype; the activation of other factors, as well as alterations in the way the HMG-I(Y) protein functions as an architectural transcription factor, may also be required.
Specific chromosome translocations are frequently found in human lymphomas and leukemias (139, 226) and recently the human mixed-lineage leukemia (MLL) gene (186) [also called ALL-1 (203a) or HRX (203b)] involved in a number of such rearrangements has been isolated and sequenced. Significantly, the N-terminal region of the MLL (ALL/HRX) gene was found to code for an amino-acid sequence almost identical to the “A·T-hook” DNA-binding motif of the HMG-I(Y) proteins and it is this region of the gene that is frequently translocated in human leukemias (139, 203a,b). These findings raise the intriguing possibility that in certain human cancers, chromosomal translocation and fusion of an A·T-hook-like motif to a new cellular protein may convert the resulting hybrid into a transforming oncoprotein as a result of DNA mistargeting. Compelling support for such a scenario has recently been provided by two additional observations: (1) the demonstration that the HMG-I(Y) A·T motif peptide found in the MLL gene, which is involved in many aberrant chromosomal translocations (reviewed in 139), can specifically bind to both A·T-rich sequences and to cruciform structures in vitro (186); and (2) chromosomal rearrangements at the site of the HMGI-C gene on human chromosome 12 result in the fusion of the A·T-hook motifs of this HMG-I(Y) family member to new transcriptional trans-activating regulatory domains during the formation of benign lipomas (227).
3. HMG-I(Y), HISTONE H1, AND THE OPENING OF CHROMATIN DOMAINS
Another recently postulated function of the HMG-I(Y) proteins relates to their in vivo roles as structural transcription factors and their intimate relationship to the binding of histone H1 and nucleosomes to substrate DNAs. It has been known for some time that if H1 histones (228, 229) and/or nucleosomes (reviewed in 2, 230, 231) bind to gene promoter/enhancer regions, transcription of the associated gene by RNA polymerase is usually either repressed or greatly inhibited. It is of some importance, then, that, like the BD peptides of HMG-I(Y), the peptide tails of H1 histones also bind preferentially to the narrow minor groove of stretches of A·T-DNA (reviewed in 171). Furthermore, in vitro, HMG-I(Y) out-competes histone H1 for such DNA binding (162, 179). And, as previously mentioned, HMG-I(Y) also binds ~50 times more tightly to free A·T-DNA than to chromatin core particles. It was therefore suggested (162) that one of the likely in vivo functions of the HMG-I(Y) proteins is to act as an antirepressor molecule that out-competes, or displaces, inhibitory histone H1 and/or nucleosomes for A·T-DNA binding, thus assisting in the establishment of an open or accessible chromatin structure over important gene regulatory regions. Once such an “open” chromatin structure has been formed by HMG-I(Y) binding, this accessible configuration can potentially be propagated from one cellular interphase to the next as both HMG-I(Y) and histone H1 change their Cdc2 kinase-induced phosphorylation levels, and hence their relative DNA-binding strengths, in a coordinated manner during mitosis (162). Considerable support for the above scenario has recently come from the in vitro demonstration (179) that HMG-I(Y) not only acts as an antirepressor molecule by preventing histone H1 binding to isolated SAR sequences, but also functions as a true derepressor by displacing previously bound proteins, thereby relieving histone H1-mediated repression of reporter gene transcription.
Based on the ability of HMG-I(Y) to function as a derepressor molecule in vitro, a model has been presented (166, 179) for the involvement of both SARs and HMG-I(Y) in establishing the overall pattern(s) of inactive and transcriptionally competent chromatin domains during cellular differentiation.
In this model, inactive chromosome loops or domains (232, 233) are proposed to be compacted and stabilized by “nucleating” histone H1 molecules that initially bind tightly to A·T-rich SAR sequences located at the base of chromatin loops and then, through subsequent cooperative H1-H1 protein interactions, “spread” their inhibitory influence throughout a topologically defined domain. The compact, H1-containing domains thus formed remain transcriptionally inactive until HMG-I(Y) (or another “distamycin-like” D-protein) binds to the SARs and “mobilizes” or displaces histone H1; i.e., HMG-I(Y) binding is proposed to interfere with the ability of SARs to serve as nucleation sites for cooperative histone H1 assembly, leading to chromatin domain activation (179). As a consequence of HMG-I(Y) binding, the equilibrium of histone H1 association is postulated to shift toward a reduction in occupancy of nucleosome linker regions in the domain, thus resulting in its “opening” into a transcriptionally competent or active region (166, 179). Although this attractive model of domain activation is of considerable intrinsic interest, it should be kept in mind that the in vitro experiments on which it is based were not performed in a nucleosomal chromatin context and therefore the in vivo biological relevance of the findings remains to be established.
III. The HMG-14/-17 Family
Chromosomal proteins HMG-14 and HMG-17 are closely related proteins present in the cells of most higher eukaryotes. They have a high content of lysine, alanine, and proline and lack aromatic amino acid residues. Their amino-acid composition is reminiscent of the H1 linker histones, except that they have a significantly lower ratio of basic to acidic amino acids. Although they are ubiquitous in higher organisms, the HMG-14/-17 proteins have not been detected in yeast or other lower eukaryotes. Fish tissues have one protein, named H6, which contains all of the evolutionarily conserved domains of this protein family (see Section III,A,1). Avian erythrocytes contain two types of HMG-14 proteins. The main component, HMG-14a, has a higher molecular weight than most HMG-14/-17 proteins, whereas the minor component, named HMG-14b, is the homolog of mammalian HMG-14. In the chicken genome single-copy genes code for each of the HMG-14/-17 proteins. The functional genes coding for both the human and chicken HMG-14 and HMG-17 have been isolated and fully sequenced (see 7). Structural analyses of these genes suggest that they evolved from a common ancestor. Mammalian genomes contain multiple retropseudogenes for either HMG-14 or HMG-17; these are among the largest known retropseudogene families in mice and humans (234). The presence of HMG-14 and HMG-17 proteins in all the tissues of higher eukaryotes is perhaps the strongest argument favoring the possibility that this HMG family is necessary for proper cellular function. Furthermore, all cells contain both HMG-14 and HMG-17, suggesting that the proteins are involved in distinguishable functions. Although their exact cellular function and mode of action are still not fully understood, results from many types of experiments are consistent with the possibility that the HMG-14/-17 proteins modulate the effect of chromatin on transcription. Insights into their cellular function have been obtained from studies on their structure, their mode of interaction with the nucleosome cores, and their effect on the transcriptional potential of chromatin templates assembled under controlled conditions.
A. Structure of the Proteins
1. CONSERVED STRUCTURAL DOMAINS IN THE HMG-14/-17 PROTEIN FAMILY
Alignment of all the HMG-14/-17 protein sequences reveals structural motifs that are characteristic of this protein family. A sequence logo (234a) depicting the conserved amino-acid positions is shown in Fig. 11. This logo is based on a multiple alignment of the 12 known HMG-14/-17 protein sequences. Gaps have been introduced to maximize the homology between the members of the HMG-14/-17 protein group. Therefore, the sequence logo contains more amino-acid positions than an alignment of either the HMG-14 or HMG-17 protein subgroup alone, which contain 98 and 89 amino acids, respectively. From the sequence logo, it is apparent that the HMG-14/-17 protein group has four regions with high sequence information content. The first region, with the sequence PKRK, consists of the first 4 amino acids from the N terminus of the proteins. The second conserved region consists of amino acids 17 to 47; the third region, spanning positions 64 to 69, consists of 5 amino acids with the sequence GK(KR)G, and the fourth region, positions 87 to 94, consists of 8 amino acids. In addition, residues 109 to 111 are also highly conserved. Residue 109 is negatively charged except in H6, where it is an asparagine. Residue 110 is invariably alanine except in the chicken HMG-14b, where it is a valine. Further analysis of the alignment indicates an uneven distribution of charged amino-acid residues along the polypeptide chain. The HMG-14/-17 proteins can be subdivided into three regions. The first region, containing
FIG. 11. Sequence logo of multiple alignment of HMG-14/-17 proteins. The sequence logo is derived from a multiple alignment of the sequences obtained from SWISSPROT version 31.0. (For accession numbers see Fig. 13.) The information content, in bits, is determined at each position. The size of each letter is proportional to the information content (in bits) for that amino acid, which is a graphical representation of the frequency of an amino acid at a given position. Thus, taller letters represent high information content (i.e., positions 1-3). Shorter letters, or the absence of a letter, indicate positions with a variable content of amino acids, i.e., low information content. The logo was constructed by David Landsman (NCBI, NLM, NIH), using the methods described by Schneider and Stephens (234a).
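The per-position information content described in the legend to Fig. 11 can be illustrated with a short Python sketch. This is only a simplified reading of the sequence-logo approach, not the procedure actually used to build the figure: it assumes a 20-letter amino-acid alphabet, ignores gap characters, and omits the small-sample correction. The function names and the toy alignment are hypothetical and are not taken from the aligned HMG-14/-17 sequences.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Return (information in bits, {residue: letter height in bits}) for one column.

    Assumptions of this sketch: gaps ('-') are ignored and no small-sample
    correction is applied.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0, {}
    total = len(residues)
    freqs = {aa: n / total for aa, n in Counter(residues).items()}
    # Shannon entropy of the observed residue frequencies at this position.
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    # Information content = maximum possible entropy minus observed entropy.
    info = math.log2(alphabet_size) - entropy
    # Each letter is drawn with a height of (frequency x information content).
    heights = {aa: p * info for aa, p in freqs.items()}
    return info, heights


def logo_table(alignment):
    """Compute column_information for every column of equal-length aligned sequences."""
    return [column_information(col) for col in zip(*alignment)]


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["PKRKV", "PKRKA", "PKRRS", "PKRK-"]
for pos, (info, heights) in enumerate(logo_table(toy_alignment), start=1):
    print(pos, round(info, 2), {aa: round(h, 2) for aa, h in heights.items()})
```

In this sketch an invariant column reaches the maximum of log2(20), about 4.32 bits, and is drawn as a single tall letter, whereas a variable column yields a lower total and correspondingly shorter letters.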
residues 1-17, has a slight net positive charge of +2. The central region of the proteins, from residue 17 to residue 73, has a net positive charge of +16 for HMG-14 and +13 for the HMG-17 subgroup. The C-terminal region of the proteins is negatively charged and has a net charge of -8 and -3, respectively, for HMG-14 and HMG-17. An outline of the conserved domains and the charge distribution in the HMG-14/-17 protein family is presented in Fig. 12. The asymmetric distribution of charged residues along the polypeptide chain is reminiscent of the structure of certain transcription
[Figure 12 schematic. Net charges of the N-terminal, central, and C-terminal regions as labeled in the diagram: HMG-14, +1, +16, -8; HMG-17, +2, +13, -3. Exons I-VI are drawn beneath the protein domains.]
FIG. 12. The structure of the HMG-14/-17 protein family. The evolutionarily conserved amino-acid residues are clustered into four major domains. The positions of the domain boundaries are indicated (the amino-acid positions correspond to those of the sequence logo in Fig. 11). Note the correspondence between these domains and the organization of the gene. Thus, domain A is at the 3' end of exon I; domain B is encoded by exons III and IV and domain D is located at the 3' end of exon V. The charged residues are also clustered, giving rise to regions of low and high cationic charge. The C-terminal regions of the molecules are negatively charged.
factors in which the positively and negatively charged residues are clustered into domains. Furthermore, as in the case of acidic transcription factors, the negatively charged C-terminal regions of the HMG-14/-17 proteins have the potential to form α-helices with negatively charged surfaces. However, in spite of these structural similarities, experimental evidence suggests that HMG-14/-17 proteins do not act as “classical” transcriptional activators (137).

Figure 12 also illustrates an interesting correlation between the structure of the HMG-14/-17 genes and the conserved protein domains. The 3' end of exon I codes for domain A, the 3' end of exon II codes for 3 amino acids at the N-terminal region of domain B, the 3' end of exon V codes for domain D, and the 3' end of exon VI codes for the conserved residues at positions 109-111. Exons III and IV code for most of domain B, a 30-amino-acid evolutionarily conserved sequence, which is the nucleosomal binding domain of the HMG-14/-17 protein family (195). Exon III codes for a decapeptide in which 9 positions are absolutely conserved.

HMG-14 and -17 are positively charged proteins. HMG-14 contains 21 lysine and 5 arginine residues and HMG-17 contains 21 lysine and 4 arginine residues. The N-terminal half of domain B, encoded by exon III, contains 3 of the arginine residues and therefore can be considered as an arginine-rich cassette inserted into lysine-rich proteins. The 17 amino acids in the C terminus of domain B are encoded by exon IV. This region contains an invariant motif, KPKKA, which is also present in H1 histones but not in any other known protein. This motif is similar to that of domain D, KGK(KR)G.
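The residue counts discussed above can be checked with a short sketch applied to the HMG-17 consensus of the nucleosomal binding domain as printed in Fig. 14. The charge rule (lysine and arginine count as +1, aspartate and glutamate as -1, with histidine and the chain termini ignored) is a simplifying assumption of this example, and the function and variable names are hypothetical.

```python
def charge_summary(sequence):
    """Count basic and acidic residues and return a simple net charge.

    Assumption of this sketch: K and R count as +1, D and E as -1;
    histidine and the chain termini are ignored.
    """
    seq = sequence.upper()
    lys, arg = seq.count("K"), seq.count("R")
    asp, glu = seq.count("D"), seq.count("E")
    return {"K": lys, "R": arg, "D": asp, "E": glu,
            "net": lys + arg - asp - glu}

# HMG-17 consensus of domain B, taken from Fig. 14 (case marks removed).
EXON_III_PART = "PQRRSARLSA"
EXON_IV_PART = "KPAPPKPEPKPKKAPAK"

exon3 = charge_summary(EXON_III_PART)
exon4 = charge_summary(EXON_IV_PART)
domain_b = charge_summary(EXON_III_PART + EXON_IV_PART)
has_motif = "KPKKA" in EXON_IV_PART
```

With this rule the exon III decapeptide carries the three arginines and no lysine, whereas the exon IV portion is lysine-rich and contains the KPKKA motif, in line with the description of an arginine-rich cassette within a lysine-rich protein.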
[Fig. 13 alignment. Panel A: HMG-14 from human, calf, mouse, and chicken (HMG-14a and HMG-14b). Panel B: HMG-17 from chicken, human, calf, rat, pig, and mouse, and trout H6. The regions encoded by exons I-VI are marked beneath each panel.]
In summary, the HMG-14/-17 family of proteins contains four evolutionarily conserved domains. The charged amino-acid residues are unevenly distributed along the polypeptide chain. There seems to be a correlation between the structure of the gene and that of the protein; some of the evolutionarily conserved protein domains are encoded by distinct exons. According to the “exon shuffling” hypothesis (235), it is conceivable that structural motifs similar to those present in HMG-14/-17 may be found in other proteins. Indeed, one of the proteins interacting with the thyroid hormone receptor in a hormone-dependent manner is highly homologous to HMG-14/-17 (236).
2. STRUCTURAL SPECIFICITY OF HMG-14 AND HMG-17 PROTEINS

Although HMG-14 and HMG-17 proteins may have evolved from a common ancestor and have many features in common, structural analysis reveals a clear distinction between them. The two subgroups have less than 60% of their sequence in common. Multiple alignment of the protein sequence of each group (Fig. 13) indicates a high degree of sequence conservation among the HMG-17 and the HMG-14 proteins. In the HMG-17 group the sequences of the chicken, human, calf, rat, pig, and mouse differ from each other by less than 3%. Trout H6 is 62-67% similar to the various members of the HMG-17 group. The HMG-14 group is less conserved. The hydropathy index of the two protein groups is about 20 (indicative of a high content of hydrophilic amino acids); however, the hydropathy index profiles are clearly different, suggesting that the structures of the proteins are distinct from each other (237). Particularly noteworthy is the difference between the two protein groups in the 17 amino acids comprising the C-terminal half of their nucleosomal binding domain, which is encoded by exon IV (Fig. 14). In the HMG-17 group this region contains 7 prolines, whereas the HMG-14 group contains only 3 prolines. In summary, although the HMG-14 and HMG-17 chromosomal proteins
FIG. 13. Multiple alignment of HMG-14 and HMG-17 proteins. The protein sequences obtained from SWISSPROT version 31.0 were aligned with the MACAW program and the alignments were optimized visually. The accession numbers of the sequences are as follows: P02316, HMG14_BOVIN; P12274, HMG14_CHICK; P12902, HMG15_CHICK; P05114, HMG14_HUMAN; P18608, HMG14_MOUSE; P02313, HMG17_BOVIN; P02314, HMG17_CHICK; P05204, HMG17_HUMAN; P09602, HMG17_MOUSE; P80272, HMG17_PIG; P18437, HMG17_RAT; P02315, H6_ONCMY. Amino acids in the conserved domains are indicated by bold letters. Note that in chicken HMG-14a the region encoded by exon IV is identical to that encoded by exon IV of the HMG-17 group.
MICHAEL BUSTIN AND RAYMOND REEVES
HMG-17 (FROM RESIDUE 19)   PqRRSARLSA   KPAPpKpEpKPKKApAK
HMG-14 (FROM RESIDUE 14)   PkRRSARLSA   KPAPaKvE( )KPKKAaGK
Exon:                      III          IV
FIG. 14. Differences between the HMG-14 and HMG-17 protein groups in the consensus sequence of their nucleosomal binding domains. Lowercase letters indicate positions at which the amino-acid residues differ between HMG-14 and HMG-17. Note that in the C-terminal portion encoded by exon IV, all the differences involve proline residues.
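The proline counts quoted for the exon IV region can be reproduced from the consensus sequences of Fig. 14 with a minimal sketch. The sequences are written here in uppercase, the gap shown as ( ) is represented by '-', and the function names are hypothetical.

```python
# Exon IV portion of the consensus sequences in Fig. 14.
HMG17_EXON_IV = "KPAPPKPEPKPKKAPAK"
HMG14_EXON_IV = "KPAPAKVE-KPKKAAGK"  # '-' stands for the gap printed as ( )

def proline_count(seq):
    """Number of proline residues in a sequence."""
    return seq.count("P")

def prolines_not_shared(reference, other):
    """1-based positions where `reference` has proline and `other` does not."""
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have equal length")
    return [i for i, (a, b) in enumerate(zip(reference, other), start=1)
            if a == "P" and b != "P"]

n17 = proline_count(HMG17_EXON_IV)
n14 = proline_count(HMG14_EXON_IV)
lost_prolines = prolines_not_shared(HMG17_EXON_IV, HMG14_EXON_IV)
```

The counts are 7 prolines for HMG-17 and 3 for HMG-14, and the four prolines absent from HMG-14 fall at positions 5, 7, 9, and 15 of the 17-residue segment.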
are similar in many respects, the two subgroups are clearly distinct. The high degree of sequence conservation, especially in the HMG-17 subgroup, suggests that the proteins are architectural elements in chromatin and that most of the primary sequence is necessary for their proper function. The structural differences between the proteins and their copresence in every tissue raise the possibility that the two proteins participate in specific interactions, each of which is necessary for proper cellular function.
B. Interaction with DNA and Chromatin

1. COOPERATIVE INTERACTIONS WITH NUCLEOSOME CORES

Chromosomal proteins HMG-14 and HMG-17 are located in the nucleus associated with the chromatin fiber. HMG-14/-17 are the only nuclear proteins known that specifically recognize the 146-bp nucleosomal core particle (199, 200, 238). Both proteins bind to nucleosome cores without any specificity for the underlying DNA sequence, suggesting that they recognize structural features specific to these chromatin subunits. Specific interactions between these proteins and nucleosomal core particles can be detected by mobility shift assays. At low ionic strength the binding of HMG-14 or HMG-17 protein to the nucleosomal cores produces two additional bands of lower mobility corresponding to complexes containing either one or two molecules of HMG protein per core particle. Under cooperative conditions only complexes containing two HMG molecules per core particle are observed. The dissociation constant for the binding of the proteins to cores at low ionic strength (1.0 × 10⁻⁹) is about 1/100th of that at higher ionic strength (1.0 × 10⁻⁷) (201). The ionic-strength-dependent differences in the affinity constants could be explained by assuming that the binding at low ionic strength is stabilized by nonspecific ionic interactions between the protein and the charged residues in the nucleosome core particle. Higher ionic strengths would weaken these interactions and increase the dependence of binding on stringent conservation of the residues in the binding domain. Indeed, the nucleosomal binding domain of the protein is highly conserved during evolution, and single-point mutations in this domain reduce the binding constant of the
proteins to nucleosomes (201). These results suggest that a distinct protein conformation is required for proper binding. Because in solution the proteins behave as random coils, it seems likely that the nucleosomal binding site induces a conformational change in the proteins. The ion concentration required for cooperative binding is close to physiological, suggesting that in the nucleus HMG proteins bind to chromatin in a cooperative fashion. Post-translational modifications of the HMG-14/-17 proteins may affect their interaction with nucleosomes. Of particular interest is phosphorylation of Ser-6 in HMG-14, which is one of the first molecular events associated with the induction of immediate-early genes on mitogenic stimulation (239). Phosphorylation reduces the affinity of HMG-14 for nucleosome core particles (240); therefore, this post-translational modification might result in structural changes in chromatin regions containing HMG-14 protein. As shown in Fig. 15 the cooperative interaction of HMG-14/-17 proteins
[Fig. 15 schematic. Core particles (CP) and the three possible CP-HMG complexes: only heterodimers, random mixture, only homodimers.]
FIG. 15. Possible complexes between HMG-14/-17 and core particles. Under cooperative binding conditions, at ionic strength closer to physiological, HMG-14/-17 proteins form nucleosome complexes containing two molecules of HMG protein. The interaction of core particles with an equimolar mixture of HMG-14 and HMG-17 could potentially lead to three types of complexes. A nucleosome core could bind exclusively one molecule of HMG-14 and one of HMG-17 to form heterodimers. A second possibility is that the binding is totally random. The third possibility is that the proteins segregate to form homodimer complexes. Recent results indicate that the interaction of core particles with an equimolar mixture of HMG-14 and HMG-17 proteins yields complexes containing, exclusively, either two molecules of HMG-14 or two molecules of HMG-17. The proteins "cross-talk" by inducing allosteric transitions in the nucleosome core particle (241).
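The three possibilities outlined in Fig. 15 make different predictions for the composition of doubly occupied core particles, which can be sketched as follows. The example assumes that every core carries exactly two HMG molecules and, for the random and homodimer cases, that HMG-14 and HMG-17 bind with equal affinity; these assumptions, the function name, and the model labels are illustrative only and are not taken from the cited experiments.

```python
from fractions import Fraction

def complex_fractions(model, f14=Fraction(1, 2)):
    """Expected fractions of core particles carrying two HMG molecules.

    f14 is the fraction of HMG-14 in the HMG pool (1/2 for an equimolar
    mixture). Equal affinity of both proteins is assumed in this sketch.
    """
    f17 = 1 - f14
    if model == "random":
        # The two sites are filled independently of each other.
        return {"14/14": f14 * f14, "14/17": 2 * f14 * f17, "17/17": f17 * f17}
    if model == "heterodimers":
        return {"14/14": Fraction(0), "14/17": Fraction(1), "17/17": Fraction(0)}
    if model == "homodimers":
        return {"14/14": f14, "14/17": Fraction(0), "17/17": f17}
    raise ValueError(f"unknown model: {model}")

random_mix = complex_fractions("random")
hetero_only = complex_fractions("heterodimers")
homo_only = complex_fractions("homodimers")
```

For an equimolar mixture, random binding would place one molecule of each protein on half of the cores, so the exclusive recovery of complexes with two molecules of HMG-14 or two of HMG-17 distinguishes the homodimer case from the other two.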
with nucleosome cores could lead to nucleosome complexes containing either a random mixture of these HMGs, complexes containing exclusively heterodimers (i.e., one molecule of HMG-14 and one of HMG-17), or complexes containing exclusively homodimers of either HMG-14 or HMG-17. Recent results indicate that the binding of HMG-14/-17 to nucleosome cores is not random and that this interaction produces complexes containing either two molecules of HMG-14 or two molecules of HMG-17 (241). These results suggest that in chromatin these proteins may be clustered and associated with specific DNA sequences. Studies with deletion mutants suggest that the formation of homodimeric HMG complexes is not dependent on contacts between the nucleosome-bound HMG-14/-17 proteins. Most probably the nucleosome-bound proteins “cross-talk” by inducing specific allosteric transitions in the chromatin subunits.

2. THE NUCLEOSOMAL BINDING DOMAIN OF THE HMG-14/-17 PROTEINS

The HMG-14/-17 proteins bind to nucleosomes through a positively charged domain spanning residues 17 to 47 in the HMG-17 family and residues 12 to 41 in the HMG-14 family (195, 242). This region is evolutionarily conserved and has a characteristic amino acid composition; however, the HMG-14 subgroup is clearly distinct from the HMG-17 subgroup (see Section III,A,2). Studies with synthetic peptides indicated that a 30-amino-acid peptide, corresponding to the nucleosomal binding domain of HMG-17, binds specifically to nucleosome cores and retains many of the binding characteristics of the intact protein. Point mutations in this protein region reduced the affinity of the protein for cores (201). Removal of histone tails by trypsin digestion of nucleosomes abolishes the binding of both the peptide and the intact protein, suggesting that the histone tails are required for binding (195).
The finding that a protein region can act as an independent functional domain suggests that the HMG proteins are modular proteins containing several functional motifs. Experiments in progress indicate that the negatively charged C-terminal domain is involved in transcriptional activation (243).

3. THE ORGANIZATION OF HMG-14/-17 IN NUCLEOSOME CORES
A model of the location of HMG-14/-17 proteins in nucleosomes is presented in Fig. 16. This model is based mainly on data obtained by DNase-I and hydroxyl-radical footprinting (196) and on the analysis of DNA-protein and protein-protein cross-links in HMG-nucleosome core complexes. In this schematic model two HMG molecules are bound by their N-terminal regions to the DNA 20 to 30 base pairs from the ends of the core particle DNA, in the region where the DNA starts and ends looping around the histone octamer. The protein loops under one of the DNA strands and emerges on the surface of the central DNA strand in the major groove neighboring the nucleosomal dyad axis. In this way, the protein forms a bridge across two adjacent DNA strands on the front surface of the core particle.

FIG. 16. A model of the organization of HMG-14/-17 proteins in nucleosome core particles. Two molecules of HMG contact the DNA approximately 25 bp from the entry/exit point of the core (the histones in the octamer are depicted as spheres) and in the two major grooves flanking the dyad axis of the particle (+). Thus, the HMG proteins may stabilize the structure of the nucleosome by bridging the two DNA strands looping around the histone octamer. Part of the HMG proteins may be in contact with, and cause structural changes in, the histone octamer.

As elaborated elsewhere (196), this model is based on the following experimental results: (1) mobility shift assays and DNA cross-linking experiments that indicate that each core particle has two binding sites for either HMG-14 or HMG-17 (195, 199-201, 203, 238); (2) DNase-I digestion and DNA-protein cross-linking experiments that indicate that the two HMGs bind to a region about 20 base pairs away from the end of the core particle DNA (195, 196, 199, 200, 244); (3) DNA-protein cross-linking experiments that indicate that part of the HMG proteins is located at the inner surface of the DNA that faces the histone octamer (244); (4) immunochemical experiments that indicate that the DNA-binding domain of the protein is sterically hindered, and the C-terminal region exposed, to antibody binding (245); (5) NMR spectroscopy experiments that indicate that the proteins interact with the core particles through their central, positively charged region (242, 246); (6) mobility shift, thermal denaturation and DNase-I digestion assays that indicate that a peptide corresponding to the positively charged binding domain (residues 17-47 of HMG-17) of the HMGs mimics the binding of the entire molecule (195); (7) protein cross-linking experiments that indicate preferential interaction with histone H2A (247) and H3 (248); (8) protein cross-linking experiments that indicate that the central region of histone H3 is near the central region of the HMGs, suggesting that they are located near the dyad axis of the core particle (248).

This model is consistent with some of the observations on the effect of HMG-14/-17 on the structure of nucleosome cores and chromatin (see Section III,B,4). In addition, as discussed in Section III,C, the model raises the possibility that interactions between histone H1 and HMG-14/-17 may affect the structure and the transcription potential of the chromatin fiber.

4. EFFECT OF HMG-14/-17 ON THE STRUCTURE OF NUCLEOSOMES AND CHROMATIN

The binding of HMG-14/-17 to chromatin subunits increases the stability of these particles and is accompanied by only small changes in the radius of gyration of the chromatin subunit, perhaps due to minor conformational changes (reviewed in 7, 196). The model in Fig. 16 is consistent with these findings. HMG-14/-17 proteins bridge two adjacent DNA strands on the surface of the core, and therefore could stabilize the structure of the nucleosome core particles by inhibiting the unraveling of the DNA from the histone octamer. The binding of the proteins would not necessarily cause significant changes in the size or structure of the particle. Neutron scattering experiments on the binding of HMG-14/-17 to salt-washed chromatin suggest that the proteins decrease the mass per unit length of the chromatin fiber without changing the chromatin fiber repeat distance (249).
These results are in agreement with studies suggesting that the proteins render the chromatin fiber more susceptible to digestion by several nucleases (11). However, the proteins do not prevent the formation of higher order chromatin structure (250). In summary, the binding of HMG-14/-17 proteins to nucleosomes induces minor structural changes in these particles. These proteins stabilize the structure of the nucleosome subunits and at the same time destabilize the higher order structure of the chromatin fiber.

In vitro studies in which HMGs are added to preassembled chromatin may result in structures different from those assembled in the intact cells. Chromatin assembly and maturation is an orderly process involving sequential deposition of the H3-H4 histone tetramer followed by the deposition of two H2A-H2B dimers and establishment of proper nuclear spacing (reviewed in 251, 252). Furthermore, the assembly of components into the final chromatin structure may be facilitated by specific factors and could depend
on the concentration of the components in the assembly mixture. For example, competition between binding of transcription factors and histones during chromatin assembly on replicating DNA affects the transcriptional potential of the resulting chromatin template (253-254). Therefore, studies on the effect of HMG-14/-17 on the structure of chromatin must take into account that these proteins are an integral part of the chromatin fiber and that the kinetics of their assembly into the nucleosome may determine their effect on the structure of chromatin. Indeed, recent studies with chromatin assembled in extracts prepared from Xenopus eggs indicate that HMG-14/-17 proteins are incorporated into nucleosomes prior to completion of chromatin assembly (11, 255).

At present, the effect of HMG-14/-17 on the nucleosomal repeat is controversial. Assembly of minichromosomes from double-stranded DNA and an extract prepared from either Xenopus eggs (11, 255) or from Drosophila embryos (256) suggests that the proteins increase the length of the nucleosomal repeat and may serve as spacing factors (259, 260). On the other hand, studies in which minichromosomes were assembled from single-stranded M13 plasmids and an extract prepared from Xenopus eggs suggest that the proteins do not affect the nucleosomal repeat (11, 255, 257). The differences in the interpretation of the results may reflect minor differences in the experimental systems. In addition, interpretations of the effects of HMG on the nucleosomal repeat must take into account the molecular effects known to occur during the digestion of chromatin by micrococcal nuclease. As elaborated elsewhere (11, 257), it is known that due to the exonucleolytic activity of this enzyme and the tendency of nucleosome cores to slide, the length of the nucleosomal repeat gradually decreases during the course of digestion (258).
Because HMG-14/-17 stabilize the position of the nucleosome core, they could protect the core from exonucleolytic attack and minimize nucleosome “sliding.” Thus, the oligonucleosomes derived from chromatin assembled in the presence of these proteins would be somewhat longer than those assembled in the absence of the proteins. The HMG-dependent increase in the length of the nucleosome multimers could be interpreted as an indication that HMG-14/-17 can act as nucleosomal spacing factors (259, 260). However, as elaborated above and elsewhere (11, 255), this interpretation is difficult to reconcile with the kinetics of chromatin digestion by micrococcal nuclease, and with other contradictory results. Further studies are needed to determine whether HMG-14/-17 proteins alter the nucleosomal spacing in the nucleus. The minichromosomes assembled from M13 DNA, in the presence of HMG proteins, have a more extended conformation than those assembled in the absence of the proteins (11). It has been suggested that the HMGs could
[Fig. 17 schematic. Chromatin assembly from histones and DNA, yielding structures A and B.]

FIG. 17. Effect of HMG-14/-17 proteins on chromatin structure. Cellular chromatin is assembled during replication. Assembly in the absence of HMG yields structure B, which is more compact than structure A, which represents chromatin assembled in the presence of HMG. It is important to note that the length x of the linker region (i.e., the nucleosomal repeat) has not changed. The concept is similar to that presented by Hansen and Ausio (261) for core histone termini. HMG-14/-17 may unfold chromatin by interacting with the termini of core histones (11), with histone H1 (263), or with both. By unfolding the chromatin template, HMG-14/-17 proteins enhance the transcriptional potential of chromatin.
unfold the minichromosomes, without changing the nucleosomal repeat, by interacting with core histone tails, which may play a role in chromatin folding (11, 255, 261). Likewise it is possible that HMG-14/-17 proteins unfold the chromatin fiber by modifying the interaction of the linker histone H1 with nucleosomes near the dyad axis (196, 262). Indeed, recent studies with SV40 minichromosomes provide direct evidence that an interplay between HMG-14 and histone H1 affects the rate of RNA polymerase II elongation on the chromatin template (263). Figure 17 presents a scheme of the effect of HMG-14/-17 on chromatin structure.

In summary, studies on the interaction of HMG-14/-17 with chromatin have to take into account the kinetics of chromatin assembly that occurs during DNA replication. Addition of HMG to preassembled chromatin may give a structure similar, but not identical, to that assembled under more physiological conditions (see also Section III,C). Incorporation of HMG-14/-17 into chromatin during replication unfolds the chromatin fiber without significantly affecting the nucleosomal repeat. These effects may be mediated by interaction with the termini of the core histones or with histone H1. Conceivably, by unfolding the higher order chromatin structure, the proteins may increase the accessibility of target sequences to the transcriptional apparatus and facilitate transcription through a nucleosome.
C. Cellular Function and Mechanism of Action

1. HMG-14/-17 IN ACTIVE GENES
The presence of HMG-14 and HMG-17 proteins in all the cells of higher eukaryotes suggests that both of these proteins are necessary for proper cellular function; however, in spite of numerous experiments, their role is not fully understood. Most probably, their role in cellular function depends on specific interactions with nucleosomes in chromatin, perhaps through the evolutionarily conserved domains characteristic of this protein family (see Section III,A). Many of the experimental data available (for a comprehensive review of previous experiments see 1-9) are consistent with the possibility that the proteins are involved in some aspect of transcriptional regulation.

Weintraub and collaborators were first to suggest that HMG-14/-17 may modulate the chromatin structure of active genes (264). This proposal remained controversial because differences between HMG-free and HMG-bound particles could not be demonstrated, and because these proteins did not always affect the DNase-I sensitivity of active genes. The finding that the structure and transcriptional potential of chromatin are dependent on the kinetics of chromatin assembly (11, 255), rather than on the composition of the assembled chromatin, and the tendency of these HMG proteins to migrate and rearrange even at low ionic strength (265) could account for some of the discrepancies in the experimental results obtained by various laboratories.

Reconstitution experiments with isolated nucleosomes revealed that HMG-14/-17 proteins preferentially bind to particles enriched in sequences from transcribed genes (199, 266). However, studies with mononucleosomes of the avian β-globin cluster suggested that, although HMG-17 binds to isolated nucleosome cores in a tissue-specific manner, this interaction is not always correlated with DNase-I hypersensitivity or active gene transcription (267). Thus, nucleosomes containing HMG-14/-17 may have unique features that are preserved even when the proteins have been removed.
For example, HMGs may recognize particles enriched in acetylated histones or with an increased length of linker DNA (238, 268). In these reconstitution experiments it is not clear whether the HMG-14/-17 proteins indeed reassociated with the same sequences they were originally bound to in chromatin.

Immunochemical approaches have been used to assess the intracellular distribution of nucleosome-bound HMG proteins. Immunofluorescence studies indicated that antibodies against HMG-14 preferentially stain transcriptionally active regions in polytene chromosomes of Chironomus pallidivittatus (269). Microinjection of antibodies to HMG-17 into human fibroblasts inhibited transcription (270). These results are in agreement with
the suggestion that the two proteins are preferentially associated with transcriptionally active chromatin. Immunoaffinity chromatography experiments indicate that chromatin regions containing transcribable genes are only two- to threefold enriched in HMG-14/-17 as compared to total nuclear DNA (271-273). Immune precipitation experiments suggested that HMG-17 protein is clustered downstream from the start of transcription, which is depleted of nucleosomes and HMG proteins (272). These experiments must be viewed with caution because the ionic conditions used could have led to protein rearrangements.

The problems associated with protein rearrangements can be minimized by cross-linking the proteins prior to fractionating the chromatin. Using this approach it was found that the transcribed chromatin of the chicken embryonic β-globin gene has a 1.5- to 2.5-fold increase in HMG-14/-17 content and a 2-fold lower density of H1 (274). Because histone H1 compacts the structure of the chromatin fiber, whereas HMG-14/-17 may induce a more open conformation, these compositional differences suggest that the chromatin structure of a transcriptionally active gene is indeed significantly different from that of untranscribed genes. The results are also consistent with nucleosome footprinting studies (Section III,B,3) and recent studies with SV40 minichromosomes (263), which indicate that an interplay between HMG-14/-17 and histone H1 may affect the transcription potential of chromatin.

2. CHANGES IN HMG-14/-17 DURING CELLULAR DIFFERENTIATION
Cellular differentiation is often accompanied by a programmed change in the repertoire of expressed genes. In view of the putative role of HMG-14/-17 in chromatin structure and gene expression, it was of interest to study the expression of these HMGs during differentiation (reviewed in 15). Analyses of the mRNA levels during the course of erythropoiesis (275), myogenesis (276), osteoblast differentiation (277), and the differentiation of several additional cell lines (278) indicate that undifferentiated cells synthesize more HMG mRNA than do differentiated cells. The differentiation-related down-regulation in HMG-14/-17 mRNA levels is not due to cell-cycle-associated events. Inhibitors of DNA synthesis do not significantly affect the HMG-14/-17 mRNA levels. However, there seems to be a positive correlation between the rate of cellular DNA synthesis and the rate of HMG mRNA synthesis, suggesting that the levels of HMG-14/-17 mRNA may also be regulated by cell-cycle events.

The biological significance of the differentiation-related down-regulation in HMG-14/-17 expression is not obvious, in that it is difficult to ascertain whether these changes are a prerequisite, or a consequence, of the differentiation program. This question was addressed in a study in which myoblasts
were transfected with plasmids expressing HMG-14 under the control of the dexamethasone-sensitive MMTV promoter (279). Low levels of dexamethasone do not affect the differentiation of myoblasts into myotubes. The transfected cells differentiated normally in the absence of the inducer. However, addition of dexamethasone to these cells induced the synthesis of HMG-14 mRNA and inhibited the myogenic process. Revertants of these cells, which lost the ability to synthesize HMG-14 mRNA, were not affected by addition of dexamethasone. These results suggest that myogenic differentiation may require regulated levels of HMG-14 protein.

The gene coding for human HMG-14 protein is located on chromosome 21 in a region whose triplication is associated with the etiology of Down syndrome, one of the most common human birth defects. The levels of HMG-14 mRNA and protein are elevated in tissues taken from individuals suffering from Down syndrome (280) and in trisomy-16 mice, an animal model for this human syndrome (279). Because HMG-14 may modulate the structure of active chromatin, an imbalance in this gene may have pleiotropic effects on gene expression, resulting in the complex phenotype characteristic of Down syndrome. However, recent studies indicate that transgenic mice overexpressing human HMG-14 have only very mild abnormalities in their thymus (287). Thus, the experimental data do not suggest that overexpression of HMG-14 by itself has a deleterious effect on differentiation. Perhaps synergistic interactions between elevated levels of HMG-14 and other proteins encoded by genes located on chromosome 21 contribute to the etiology of Down syndrome.
3. HMG-14/-17 ARE NOT CLASSICAL TRANSCRIPTION FACTORS

Because the structure of HMG-14/-17 proteins is reminiscent of that of certain transcription factors and because HMG-14/-17 proteins enhance the transcription potential of chromatin templates (see Section III,C,4), it is possible that these proteins can function as transcription factors. The possibility has been examined in Saccharomyces cerevisiae cells expressing LexA-HMG fusion proteins, which bind to reporter plasmids containing the β-galactosidase gene downstream from the lexA operator (137). The LexA-HMG fusion protein did not elevate the level of β-galactosidase expressed in the yeast cells, suggesting that the HMG proteins do not function as classical transcription activators.

4. HMG-14/-17 INCREASE THE TRANSCRIPTIONAL POTENTIAL OF CHROMATIN TEMPLATES
New insights into the possible role of HMG-14/-17 in affecting the structure and transcriptional potential of chromatin were obtained using minichromosomes assembled in extracts obtained from Xenopus eggs or Drosophila embryos and in SV40 minichromosomes isolated from CV-1 cells. Although some of the components in these assembly systems are not fully characterized, chromatin assembly in cell extracts may provide additional insights that cannot be obtained from chromatin templates reconstituted from purified components. Using a reconstituted Xenopus laevis egg extract chromatin assembly system, in which Xenopus N1/N2·(H3,H4) complexes and chicken H2A and H2B histones were assembled onto double-stranded DNA, it was found that phosphorylated HMG-14/-17 extracted from human placenta can stimulate transcription, perhaps by replacing histones H2A and H2B (281). However, other studies with similar extracts, in which the minichromosomes were assembled from single-stranded templates (11, 255), as well as studies in which Drosophila embryo extracts were used to assemble minichromosomes from double-stranded DNA (256), did not find a requirement for phosphorylation and failed to detect an HMG-14/-17-related decrease in the amount of histones H2A and H2B present in the chromatin templates.
Ding et al. introduced the human HMG-14 cDNA into CV-1 cells, which are permissive to SV40 infection, and established cell lines expressing elevated levels of HMG-14 (282). Minichromosomes isolated from these cell lines contain elevated levels of HMG-14 protein. In these minichromosomes, transcription from the early and late SV40 promoters was increased 2.5-fold and 5.5-fold, respectively, compared to control minichromosomes. Transcription was elevated from chromatin, but not from deproteinized DNA templates. HMG-14 stimulated the rate of RNA polymerase II elongation but not the level of initiation of transcription. Transcriptional enhancement was also observed in experiments in which recombinant HMG-14 protein was added to purified minichromosomes isolated from nontransfected, parental CV-1 cells.
In this experimental protocol, a HeLa cell extract supplies all the components necessary to support RNA polymerase II transcription from SV40 chromatin templates. HMG-14 may alleviate the inhibitory effects of a component present either in the HeLa extract or in the isolated minichromosomes. Recent results suggest that HMG-14 stimulates transcription by negating the repressive effects of the linker histone H1 (263).
Similar results were obtained by analyzing the effects of HMG-14/-17 proteins on the polymerase III-driven transcription of the Xenopus borealis 5S RNA gene, which was assembled into minichromosomes in a Xenopus laevis egg extract (11, 255). In these extracts, single-stranded M13 plasmids carrying the 5S RNA gene are converted into double-stranded DNA and assembled into minichromosomes. During this process transcription factors compete with histones for binding to promoter regions. Transcription occurs
from only a small fraction of the templates in which the transcription factors prevent the assembly of nucleosomes on the promoter regions. Addition of recombinant human HMG-14 or HMG-17 protein to the extracts increases the transcription potential of these minichromosomes, but not that of “naked” double-stranded DNA. The increase in transcription potential is observed only if the HMG proteins are present in the extract during chromatin assembly. Addition of HMG-14/-17 to preassembled minichromosomes did not affect the transcription potential of the minichromosomes. Single-round transcription assays indicated that the proteins stimulate transcription by increasing the specific activity, and not the number, of transcribed templates. Structural analysis of these minichromosomes suggested that the specific activity of the template increased because the HMG-14/-17 proteins reduced the compactness of the template. By decreasing the compactness of the templates, the proteins facilitate the access of RNA polymerase, and perhaps additional transcription factors, to their target sequences.
Similar results were recently described in another experimental system, in which minichromosomes were assembled by a Drosophila embryo extract using double-stranded DNA and exogenously added histones (256). In these experiments recombinant HMG-17 protein, in conjunction with the sequence-specific activator GAL4-VP16, stimulated transcription by RNA polymerase II from chromatin, but not from DNA templates. In agreement with the previous results, the protein stimulated transcription initiation only when assembled into chromatin together with histones. Thus, experiments using various assembly systems indicate that HMG-14/-17 proteins can stimulate transcription from chromatin, but not from DNA templates. In most cases the timing of incorporation of the HMGs into chromatin is important.
In spite of some variations in the results, most of the data are consistent with the possibility that HMG-14/-17 proteins stimulate transcription by unfolding the chromatin template (11).
The ability of HMG-14/-17 to enhance transcription from chromatin templates provides a functional assay for these proteins. Studies with N-terminal and C-terminal deletion mutants revealed that the negatively charged C-terminal region of the proteins is involved in the transcription activation function (11). A peptide corresponding to the nucleosomal binding domain of the protein failed to enhance transcription. In fact, addition of this peptide to an assembly system inhibited the ability of the intact proteins to enhance the transcription potential of chromatin, suggesting that the peptide competitively inhibited the assembly of the intact protein into chromatin. Subsequent studies with shorter peptides indicated that the minimal nucleosomal binding domain spans residues 17-40 of HMG-17. These results suggest that HMG-14/-17 proteins are modular and that the structural domains of this
protein family (see Fig. 12) may correspond to distinct functional motifs. A modular structure may be advantageous for proteins that participate in multiple cooperative interactions.
What is the mechanism whereby HMG-14/-17 proteins reduce the compactness of the chromatin fiber? One possibility is that the proteins increase the nucleosomal spacing and reduce the density of the nucleosomes along the DNA fiber (257, 259, 260). Most of the physical measurements and the micrococcal nuclease digestion studies are not consistent with this possibility (11, 255). A more plausible possibility is that the proteins modify the interaction of histones with DNA. HMG-14/-17 may affect either the interaction of the core histone tails with DNA (11) or the binding of the linker histone H1 to nucleosomes. The latter interaction is suggested by footprinting studies indicating that both histone H1 and HMG-14/-17 interact with nucleosomes near the dyad axis (196, 262), by immunofractionation studies suggesting that chromatin regions enriched in HMG-14/-17 are depleted of H1 (274), and by recent experiments with SV40 minichromosomes which demonstrate that HMG-14 relieves an H1-mediated inhibition of transcriptional elongation (263). Interactions of HMG-14/-17 proteins with histone H1 and with the termini of core histones are not mutually exclusive. Both of these interactions could act synergistically to reduce the compactness of the chromatin fiber and enhance the transcriptional potential of a chromatin template.
In view of the many similarities between HMG-14 and HMG-17 it is puzzling that all cells contain both of the proteins. It is well documented that the binding of HMG-14/-17 to nucleosomes is associated with structural changes in these chromatin subunits.
Recent findings that these proteins bind to nucleosomes to form specific complexes that contain either two molecules of HMG-14 or two molecules of HMG-17 (241) suggest that each of the proteins induces specific allosteric transitions in the particles. Thus, HMG-14 and HMG-17 may be involved in different functions or affect the transcription of different sets of genes. Indeed, mitogenic stimulation of immediate-early gene transcription is associated with rapid and extensive phosphorylation of HMG-14 but not of HMG-17 (239). Apparently both HMG-14 and HMG-17 are necessary for proper function; however, a gene deletion experiment suggests that HMG-17 protein is not necessary for the in vitro growth of chicken DT40 cells (283).
How do these proteins, which bind to chromatin without any specificity for the DNA sequence, recognize transcriptionally active regions in chromatin? One possibility is that the proteins bind to unique regions in chromatin, perhaps those with a unique nucleotide composition or those enriched in histone variants. Indeed, immunoaffinity chromatographic studies suggest that the proteins are preferentially associated with nucleosomes enriched in
acetylated histones (268). A second possibility is that the deposition of HMG into chromatin is regulated by cell-cycle events. Because the levels of HMG-14/-17 mRNA rise sharply at the G1/S boundary (284) it is conceivable that, at this point in the cell cycle, the level of newly synthesized HMG protein also increases. Transcriptionally active genes are preferentially replicated early in S phase; therefore, it is possible that they preferentially assemble into nucleosomes containing HMG proteins. Thus, the HMG content of various chromatin regions may depend on a coupling between the synthesis of the protein and the replication of specific DNA sequences. A coupling between the timing of protein synthesis and chromatin assembly may provide a general mechanism whereby structural proteins can be targeted to chromatin regions containing specific DNA sequences.
In summary, most of the data suggest that HMG-14/-17 proteins indeed are associated with transcriptionally active regions in chromatin and that they modify the structure of chromatin so as to facilitate transcription. The content of HMG-14/-17 in active chromatin is approximately twice that in inactive chromatin. This seemingly small enrichment may have significant effects on the local chromatin structure, especially if the presence of the proteins interferes with, or modulates, the binding of histone H1 and is associated with regions enriched in acetylated core histones. Most of the data suggest that the proteins are not functioning as classical transcription factors. The proteins seem to function as architectural components in chromatin, that is, they modify the structure so as to facilitate a function. By reducing the compactness of chromatin they facilitate transcription without actually being a part of the transcription complex.
IV. Summary and Perspective
A survey of the literature pertaining to the function of the HMG proteins does not provide a clear answer as to the particular function of these proteins. Most of the data suggest that they are associated with selected regions in chromatin; however, the binding does not seem to be dependent on the DNA sequence. Thus, HMG-1/-2 bind preferentially to regions containing unique DNA conformations or bends. HMG-14/-17 recognize structural features specific to nucleosomes, whereas HMG-I(Y) preferentially binds to regions enriched in AT. From quantitative considerations it is obvious that the proteins are associated with only a subset of the genome. Thus, a major question pertaining to the HMG proteins is elucidation of the mechanisms whereby these proteins are targeted to restricted regions in chromatin. We have suggested that cell-cycle events, in which protein synthesis, or modification, is coupled to chromatin assembly, may serve as a mechanism whereby architectural proteins can be targeted to specific regions in a fashion independent of the DNA sequence (257).
Historically, the HMG proteins were somewhat arbitrarily categorized as a protein group based on certain shared chemical and physical properties (7, 8), without any preconceived notion that various members of the group might be related in other ways as well, such as by their common ability to recognize variations in DNA structure. Furthermore, it is now apparent that these proteins, as a group, also have the ability to modify the structure of DNA or chromatin and by doing so facilitate specific functions. The question arises whether HMG proteins function only as nonessential “facilitators” to improve a cellular process or if they are components that are necessary for cell survival. For example, it has been suggested that HMG-1/-2 proteins function as DNA chaperones, to bend the DNA and facilitate chromatin assembly (285); yet, nucleosome cores and even chromatin can be assembled in the absence of these proteins. Likewise, HMG-14/-17 proteins enhance the transcription potential of chromatin (11, 255, 282); yet, transcription also occurs from templates lacking these proteins. The widespread occurrence of these proteins seems to argue that their presence is obligatory for cell survival; yet HMG-17 is not necessary for survival of chicken DT40 cells (283).
All higher cells contain not only all the classes of HMG proteins, but also each of the structural homologs (i.e., HMG-1 and HMG-2; HMG-14 and HMG-17; HMG-I and HMG-Y). This strongly suggests that all these proteins are in fact obligatory components and that each member of the family is involved in a particular function or associated with a discrete set of genes. Indeed, immunofluorescence studies indicated that the HMG-1/-2 variants are differentially distributed in Chironomus polytene chromosomes (286).
Likewise, in vitro binding studies revealed that HMG-14 and HMG-17 bind to nucleosomes to form complexes containing either two molecules of HMG-14 or two molecules of HMG-17 (241). Thus, a second important problem is classification of the genomic regions associated with each type of HMG protein and determination of whether some of these interactions can be altered. However, it is important to note that in some cases the effect of HMG on the activity of a template depended on their kinetics of assembly into chromatin (11, 255). Thus, studies on the function of HMG must take into account not only their location in the genome but also their pathway of assembly into the final chromatin structure. In conclusion, most of the data available on HMG proteins suggest that these proteins are associated with chromatin and that this association affects the architecture and increases the structural complexity of the chromatin fiber. Studies on their function are relevant to the understanding of the role of chromatin in regulating the genetic information encoded in DNA.
ACKNOWLEDGMENT
We thank Ms. Sabrina Ferguson for editorial assistance.
REFERENCES
1. K. E. van Holde, “Chromatin.” Springer-Verlag, New York, 1989.
2. A. Wolffe, “Chromatin: Structure and Function.” Academic Press, San Diego, CA, 1992.
3. T. Owen-Hughes and J. L. Workman, CRC Crit. Rev. Gene Expression 11, 1 (1994).
4. S. M. Paranjape, R. T. Kamakaka and J. T. Kadonaga, ARB 63, 265 (1994).
5. M. Grunstein, Annu. Rev. Cell Biol. 6, 643 (1990).
6. A. P. Wolffe, Cell 77, 13 (1994).
7. M. Bustin, D. A. Lehn and D. Landsman, BBA 1049, 231 (1990).
8. E. W. Johns, “The HMG Chromosomal Proteins.” Academic Press, London, 1982.
9. L. Einck and M. Bustin, Exp. Cell Res. 156, 295 (1985).
10. S. A. Onate, P. Prendergast, J. P. Wagner, M. Nissen, R. Reeves, D. E. Pettijohn and D. P. Edwards, MCBiol 14, 3376 (1994).
11. L. Trieschmann, P. J. Alfonso, M. P. Crippa, A. P. Wolffe and M. Bustin, EMBO J. 14, 1478 (1995).
12. S. J. Fashena, R. Reeves and N. H. Ruddle, MCBiol 12, 894 (1992).
13. D. Thanos and T. Maniatis, CSHSQB 58, 73 (1993).
14. S. John, R. Reeves, J.-X. Lin, R. Child, J. M. Leiden, C. B. Thompson and W. J. Leonard, MCBiol 15, 1786 (1995).
15. M. Bustin, M. P. Crippa and J. M. Pash, CRC Crit. Rev. Eukaryotic Gene Expression 2, 137 (1992).
16. R. Grosschedl, K. Giese and J. Pagel, Trends Genet. 10, 94 (1994).
16a. G. H. Goodwin and M. Bustin, in “Architecture of Eukaryotic Genes” (G. Kahl, ed.), p. 187. VCH Press, Germany, 1988.
17. A. D. Baxevanis, S. H. Bryant and D. Landsman, NARes 23, 1019 (1995).
18. A. D. Baxevanis and D. Landsman, NARes 23, 1604 (1995).
19. D. Landsman and M. Bustin, BioEssays 15, 539 (1993).
20. M. Stros, S. Nishikawa and G. H. Dixon, EJB 225, 581 (1994).
21. L. Wen, J. K. Huang, B. H. Johnson and G. R. Reeck, NARes 17, 1197 (1989).
22. A. Majumar, D. Brown, S. Kerby, I. Rudzinski, T. Polte, Z. Randawa and M. M. Seidman, NARes 19, 6643 (1991).
23. M. Kinoshita, S. Hatada, M. Arashima and M. Noda, FEBS Lett. 352, 191 (1994).
24. H. Shirakawa, K.-I. Tsuda and M. Yoshida, Bchem 29, 4419 (1990).
25. C. R. Wagner, K. Hamana and S. C. R. Elgin, MCBiol 12, 1915 (1992).
26. J. R. Wisniewski and E. Schulze, JBC 267, 17170 (1992).
27. S. S. Ner, M. E. A. Churchill, M. A. Searles and A. A. Travers, NARes 21, 4369 (1993).
28. K. D. Gasser and G. Felix, NARes 19, 2573 (1991).
29. K. D. Grasser, Plant J. 7, 185 (1995).
30. T. Hayashi, H. Hayashi and K. Iwai, J. Biochem. 105, 577 (1989).
31. I. G. Schulman, T. Wang, M. Wu, J. Bowen, R. G. Cook, M. A. Gorovsky and C. D. Allis, MCBiol 11, 166 (1991).
32. D. Kolodrubetz and A. Burgum, JBC 265, 3234 (1990).
33. J. F. X. Diffley and B. Stillman, PNAS 88, 7864 (1991).
34. S. Ferrari, L. Ronfani, S. Calogero and M. E. Bianchi, JBC 269, 28803 (1994).
35. H. Shirakawa and M. Yoshida, JBC 267, 6641 (1992).
36. K. Nightingale, S. Dimitrov, R. Reeves and A. P. Wolffe, unpublished (1996).
37. S. S. Ner, Curr. Biol. 2, 208 (1992).
38. G. R. Reeck, P. J. Isackson and D. C. Teller, Nature 300, 76 (1982).
39. M. Carballo, P. Puigdomenech and J. Palau, EMBO J. 2, 1759 (1983).
40. P. D. Cary, C. H. Turner, E. Mayes and C. Crane-Robinson, EJB 131, 367 (1983).
41. M. E. Bianchi, L. Falciola, S. Ferrari and D. M. Lilley, EMBO J. 11, 1055 (1992).
42. M. Stros and M. Vorlickova, Int. J. Biol. Macromol. 12, 282 (1990).
43. L. A. Kohlstaedt, E. C. Sung, A. Fujishige and R. D. Cole, JBC 262, 524 (1987).
44. L. A. Kohlstaedt and R. D. Cole, Bchem 33, 570 (1994).
45. L. A. Kohlstaedt and R. D. Cole, Bchem 33, 12702 (1994).
46. M. Stros, J. Stokrova and J. O. Thomas, NARes 22, 1044 (1994).
47. H. M. Jantzen, A. Admon, S. P. Bell and R. Tjian, Nature 344, 830 (1990).
48. V. Laudet, D. Stehelin and H. Clevers, NARes 21, 2493 (1993).
49. M. A. Parisi and D. A. Clayton, Science 252, 965 (1991).
50. A. H. Sinclair, P. Berta, M. S. Palmer, J. R. Hawkins, B. L. Griffiths, M. J. Smith, J. W. Foster, A. M. Frisch, B. R. Lowell and P. N. Goodfellow, Nature 346, 240 (1990).
51. J. Gubbay, J. Collignon, P. Koopman, B. Capel, A. Economou, A. Musterberg, N. Vivian, P. Goodfellow and B. R. Lovell, Nature 346, 245 (1990).
52. A. Travis, A. Amsterdam, C. Bélanger and R. Grosschedl, Genes Dev. 5, 880 (1991).
53. M. van de Wetering, M. Oosterwegel, D. Dooijes and H. Clevers, EMBO J. 10, 123 (1991).
54. D. Kolodrubetz, W. Haggren and A. Burgum, FEBS Lett. 238, 175 (1988).
54a. D. Kolodrubetz and A. Burgum, JBC 265, 3234 (1990).
55. S. L. Bruhn, P. M. Pil, J. M. Eissigman, D. E. Houseman and S. J. Lippard, PNAS 89, 2307 (1992).
56. M. Shirakata, K. Huppi, K. Okazaki, K. Yoshida and H. Sakano, MCBiol 11, 4528 (1991).
57. H. Weir, P. J. Kraulis, C. S. Hill, A. R. C. Raine, E. D. Laue and J. O. Thomas, EMBO J. 12, 1311 (1993).
58. C. M. Read, P. D. Cary, C. Crane-Robinson, P. C. Driscoll and D. G. Norman, NARes 21, 3427 (1993).
59. D. N. Jones, M. A. Searles, G. L. Shaw, M. E. Churchill, S. S. Ner, J. Keeler, A. Travers and D. Neuhaus, Structure 2, 609 (1994).
60. M. H. Werner, J. R. Huth, A. M. Gronenborn and G. M. Clore, Cell 81, 705 (1995).
61. R. Reeves and M. S. Nissen, JBC 265, 8573 (1990).
62. A. D. Baxevanis, S. H. Bryant and D. Landsman, NARes 23, 1019 (1995).
63. S. P. Bell, C. S. Pikaard, R. H. Reeder and R. Tjian, Cell 59, 489 (1989).
64. R. P. Fisher, M. A. Parisi and D. A. Clayton, Genes Dev. 3, 2202 (1989).
65. C. S. Pikaard, L. K. Pape, S. L. Henderson, K. Ryan, M. Paalman, M. A. Lopata, R. H. Reeder and B. Sollner-Webb, Cell Mol. Biol. 10, 4816 (1990).
66. S. Ferrari, V. R. Harley, A. Pontiggia, P. N. Goodfellow, R. Lovell-Badge and M. E. Bianchi, EMBO J. 11, 4497 (1992).
67. M. van de Wetering and H. Clevers, EMBO J. 11, 3039 (1992).
68. J. Guesem, A. Amsterdam and R. Grosschedl, Genes Dev. 5, 2567 (1991).
69. N. Nasrin, C. Buggs, X. F. Kong, J. Carnazza, M. Goebl and M. Alexander-Bridges, Nature 354, 317 (1991).
70. J. L. Kim, D. B. Nikolov and S. K. Burley, Nature 365, 520 (1993).
71. Y. C. Kim, J. H. Geiger, S. Hahn and P. B. Sigler, Nature 365, 512 (1993).
72. D. B. Starr and D. K. Hawley, Cell 67, 1231 (1991).
73. N. C. Seeman, J. M. Rosenberg and A. Rich, PNAS 73, 804 (1976).
74. C. M. Read, P. D. Cary, N. S. Preston, M. Lenicek-Allen and C. Crane-Robinson, EMBO J. 13, 5639 (1994).
75. V. R. Harley, D. I. Jackson, P. J. Hextall, J. R. Hawkins, G. D. Berkovitz, S. Sockanathan, R. Lovell-Badge and P. Goodfellow, Science 255, 453 (1992).
76. C. O. Pabo and R. T. Sauer, ARB 61, 1053 (1992).
77. M. E. Bianchi, EMBO J. 7, 843 (1988).
78. M. E. Bianchi, M. Beltrame and G. Paonessa, Science 243, 1056 (1989).
79. M. E. Bianchi, Mol. Microbiol. 14, 1 (1994).
80. L. Falciola, D. Hill, R. Reeves and M. E. Bianchi, unpublished observations (1995).
81. M. E. Bianchi, in “DNA-Protein: Structure Interactions” (D. M. J. Lilley, ed.). IRL, Oxford, 1995.
82. M. E. Bianchi and D. M. J. Lilley, Nature 375, 532 (1995).
83. L. Falciola, A. I. H. Murchie, D. M. J. Lilley and M. E. Bianchi, NARes 22, 285 (1994).
84. E. Bonnefoy, M. Takahashi and J. R. Yaniv, JMB 242, 116 (1994).
85. A. M. Segall, S. D. Goodman and H. A. Nash, EMBO J. 13, 4536 (1994).
86. D. M. J. Lilley, Nature 357, 282 (1992).
87. S. L. Bruhn, P. M. Pil, J. M. Eissigman, D. E. Houseman and S. J. Lippard, PNAS 89, 2307 (1992).
88. C. S. Chow, C. M. Barnes and S. J. Lippard, Bchem 34, 2956 (1995).
89. P. M. Pil and S. J. Lippard, Science 256, 234 (1992).
90. S. F. Bellon and S. J. Lippard, Biophys. Chem. 35, 179 (1990).
91. D. Locker, M. Decoville, J. C. Maurizot, M. E. Bianchi and M. Leng, JMB 246, 243 (1995).
92. J. C. Huang, D. B. Zamble, J. T. Reardon, S. J. Lippard and A. Sancar, PNAS 91, 10394 (1994).
93. D. K. Treiber, X. Zhai, H.-M. Jantzen and J. M. Eissigman, PNAS 5672, 5676 (1994).
94. K. Giese, J. Cox and R. Grosschedl, Cell 69, 185 (1992).
95. K. Giese, C. Kingley, J. R. Kirshner and R. Grosschedl, Genes Dev. 9, 995 (1995).
96. T. T. Paull, M. J. Haykinson and R. C. Johnson, Genes Dev. 7, 1521 (1993).
97. T. T. Paull and R. C. Johnson, JBC 270, 8744 (1995).
98. C. S. Chow, J. P. Whitehead and S. J. Lippard, Bchem 33, 15124 (1994).
99. P. M. Pil, C. S. Chow and S. J. Lippard, PNAS 90, 9465 (1993).
100. J. P. Wagner, D. M. Quill and D. E. Pettijohn, JBC 270, 7394 (1995).
101. T. S. Elton and R. Reeves, Anal. Biochem. 149, 315 (1985).
102. C.-Y. King and M. A. Weiss, PNAS 90, 11990 (1993).
103. C. M. Haqq, C.-Y. King, E. Ukiyama, S. Falsafi, T. N. Haqq, P. K. Donahoe and M. A. Weiss, Science 266, 1494 (1994).
104. K. Strauss and J. Maher, Science 266, 1829 (1994).
105. L. G. Sheflin, N. W. Fucile and S. W. Spaulding, Bchem 32, 3238 (1993).
106. L. G. Sheflin and S. W. Spaulding, Bchem 28, 5658 (1989).
107. M. Stros, J. Reich and A. Kolibalova, FEBS Lett. 344, 201 (1994).
108. C. H. Hu, B. McStay, S.-W. Jeong and R. H. Reeder, MCBiol 14, 2871 (1994).
109. D. P. Bazett-Jones, B. Leblanc, M. Herfort and T. Moss, Science 264, 1134 (1994).
110. C. D. Putnam, G. P. Copehaver, M. L. Denton and G. S. Pikkard, MCBiol 14, 6476 (1994).
111. A. Pontiggia, R. Rimini, V. R. Harley, P. N. Goodfellow, R. Lovell-Badge and M. E. Bianchi, EMBO J. 13, 6115 (1994).
112. J. B. Jackson, J. M. Pollock, Jr. and R. L. Rill, Bchem 18, 3739 (1979).
113. J. B. Jackson and R. L. Rill, Bchem 20, 1042 (1981).
114. J. Zlatanova and K. E. van Holde, J. Cell Sci. 103, 889 (1992).
115. F. Watt and P. Molloy, NARes 16, 1471 (1988).
116. D. J. Tremethick and P. L. Molloy, JBC 261, 6986 (1986).
117. D. J. Tremethick and P. L. Molloy, NARes 16, 11, 1107 (1988).
118. J. Singh and G. H. Dixon, Bchem 29, 6295 (1990).
119. S. Aizawa, H. Nishino, K. Saito, K. Kimura, H. Shirakawa and M. Yoshida, Bchem 33, 14690 (1994).
120. H. Ge and R. G. Roeder, JBC 269, 17136 (1994).
121. G. Seltzer, A. Goppelt, F. Lottspeich and M. Meisterernst, MCBiol 14, 4712 (1994).
122. K. E. van Holde and J. Zlatanova, BioEssays 16, 59 (1994).
123. D. Krylov, S. Lube, K. E. van Holde and J. Zlatanova, PNAS 90, 5052 (1993).
124. P. Varga-Weisz, K. E. van Holde and J. Zlatanova, JBC 268, 20699 (1993).
125. P. Varga-Weisz, J. Zlatanova, S. Leuba, G. P. Schroth and K. E. van Holde, PNAS 91, 3525 (1994).
126. E. von Kitzing, D. M. J. Lilley and S. Diekman, NARes 18, 2671 (1990).
127. P. Varga-Weisz, K. E. van Holde and J. Zlatanova, BBRC 203, 1904 (1994).
128. R. Tsanev, G. Russev, G. Pashev and J. Zlatanova, in “Replication and Transcription of Chromatin,” p. 124. CRC Press, Boca Raton, FL, 1992.
129. R. Tjian and T. Maniatis, Cell 77, 5 (1994).
130. A. A. Travers, S. S. Ner and M. E. A. Churchill, Cell 77, 167 (1994).
131. S. Waga, S. Mizuno and M. Yoshida, BBRC 153, 334 (1988).
132. S. Waga, S. Mizuno and M. Yoshida, JBC 265, 19424 (1990).
133. B. M. Shykin, J. Kim and P. A. Sharp, Genes Dev. 9, 1354 (1995).
134. J. P. Wagner, C. Kunsch and D. E. Pettijohn, in preparation (1996).
135. S. Zwilling, H. Konig and T. Wirth, EMBO J. 14, 1198 (1995).
136. Y. Ogawa, S. Aizawa, H. Shirakawa and M. Yoshida, JBC 270, 9272 (1995).
137. D. Landsman and M. Bustin, MCBiol 11, 4483 (1991).
138. T. Lund, J. Holtlund, M. Fredriksen and S. G. Laland, FEBS Lett. 152, 163 (1983).
139. T. H. Rabbits, Cell 67, 641 (1991).
140. F. Strauss and A. Varshavsky, Cell 37, 889 (1984).
141. M. Solomon, F. Strauss and A. Varshavsky, PNAS 83, 1276 (1986).
142. K. A. Johnson, D. A. Lehn and R. Reeves, MCBiol 9, 2114 (1989).
143. T. S. Elton and R. Reeves, Anal. Biochem. 157, 53 (1986).
144. K. R. Johnson, D. A. Lehn, T. S. Elton, P. J. Barr and R. Reeves, JBC 263, 18338 (1988).
145. G. Manfioletti, V. Giancotti, A. Bandiera, E. Buratti, P. Sautiere, P. Cary, C. Crane-Robinson, B. Coles and G. A. Goodwin, NARes 19, 6793 (1991).
146. U. A. Patel, A. Bandiera, G. Manfioletti, V. Giancotti, K.-Y. Chau and C. Crane-Robinson, BBRC 201, 63 (1994).
147. R. Eckner and M. L. Birnstiel, NARes 17, 5947 (1989).
148. M. Friedmann, L. T. Holth, H. Y. Zoghibi and R. Reeves, NARes 21, 4259 (1993).
149. T. Lund, J. Holtlund and S. G. Laland, FEBS Lett. 180, 275 (1985).
150. T. Lund, B. S. Skalhegg, J. Holtlund, H. K. Blomhoff and S. G. Laland, EJB 166, 21 (1987).
151. R. Reeves, T. A. Langan and M. S. Nissen, PNAS 88, 1671 (1991).
152. M. S. Nissen, T. A. Langan and R. Reeves, JBC 266, 19945 (1991).
153. T. Lund and S. G. Laland, BBRC 171, 342 (1990).
154. L. Meijer, A.-C. Ostvold, S. I. Walaas, T. Lund and S. G. Laland, EJB 196, 557 (1991).
155. K. R. Johnson, S. A. Cook and M. T. Davisson, Genomics 12, 503 (1992).
156. X. Xiang, K. F. Benson and K. Chada, Science 247, 967 (1990).
157. K. F. Benson and K. Chada, Genet. Res. 64, 27 (1995).
158. X. Zhou, K. F. Benson, H. R. Ashar and K. Chada, Nature 376, 771 (1995).
159. A. Lanahan, J. B. Williams, L. K. Sanders and D. Nathans, MCBiol 12, 3919 (1992).
160. S. A. Ogram and R. Reeves, JBC 270, 14235 (1995).
161. L. T. Holth and R. Reeves, unpublished.
162. R. Reeves, Curr. Opin. Cell Biol. 4, 413 (1992).
163. R. Reeves and J. N. S. Evans, unpublished observations (1995).
164. J. R. Karlson, E. Mork, J. Holtlund, S. Laland and T. Lund, BBRC 158, 646 (1989).
165. M. Z. Radic, M. Saghbini, T. S. Elton, R. Reeves and B. Hamkalo, Chromosoma 101, 602 (1992).
166. E. Kas, L. Poljak, Y. Adachi and U. K. Laemmli, EMBO J. 12, 115 (1993).
167. M. Wegner and F. Grummt, BBRC 166, 1110 (1990).
168. J. N. S. Evans, M. S. Nissen and R. Reeves, Bull. Magn. Reson. 14, 171 (1992).
169. J. N. S. Evans, J. Zajicek, M. S. Nissen, G. Munske, V. Smith and R. Reeves, Int. J. Pept. Protein Res. 45, 554 (1995).
170. B. H. Geierstanger, B. F. Volkman, W. Kremer and D. E. Wemmer, Bchem 33, 5347 (1994).
171. M. E. A. Churchill and A. A. Travers, TIBS 16, 92 (1991).
172. J. W. Brown and J. A. Anderson, JBC 261, 1349 (1986).
173. D. Thanos and T. Maniatis, Cell 71, 777 (1992).
174. J. E. Disney, K. R. Johnson, N. S. Magnuson, S. R. Sylvester and R. Reeves, JCBiol 109, 1975 (1989).
175. Y. Saitoh and U. K. Laemmli, Cell 76, 609 (1994).
176. Y. Saitoh and U. K. Laemmli, CSHSQB 58, 755 (1993).
177. S. M. Gasser and U. K. Laemmli, Trends Genet. 3, 16 (1987).
178. T. S. Elton, Ph.D. Thesis, Washington State University, Pullman (1986).
179. K. Zhoa, E. Kas, E. Gonzalez and U. K. Laemmli, EMBO J. 12, 3237 (1993).
180. R. Reeves, T. S. Elton, M. S. Nissen, D. Lehn and K. R. Johnson, PNAS 84, 6531 (1987).
181. T. S. Elton, M. S. Nissen and R. Reeves, BBRC 143, 260 (1987).
182. R. H. Russnak, E. P. M. Candido and C. R. Astell, JBC 263, 6392 (1988).
183. C. Tuerk and L. Gold, Science 249, 505 (1990).
184. G. Schroth and R. Reeves, unpublished data (1991).
185. D. G. Skalnik and E. J. Nenfeld, BBRC 187, 563 (1992).
186. N. J. Zeleznik-Le, A. M. Harden and J. D. Rowley, PNAS 91, 10610 (1994).
187. P. Claus, E. Schultze and J. R. Wisniewski, JBC 269, 33042 (1994).
188. M. S. Nissen and R. Reeves, JBC 270, 4355 (1995).
189. R. Reeves and M. S. Nissen, JBC 268, 21137 (1993).
190. T. A. Langan, J. Gautier, M. Lohka, R. Hollingworth, S. Moreno, P. Nurse, M. Mallet and R. A. Sclafani, MCBiol 9, 3860 (1989).
191. S. Moreno and P. Nurse, Cell 61, 549 (1990).
192. S. Siino, M. S. Nissen and R. Reeves, BBRC 207, 497 (1995).
193. D. A. Lehn, T. S. Elton, K. R. Johnson and R. Reeves, Biochem. Int. 16, 963 (1988).
194. K. Wu, F. Strauss and A. Varshavsky, JMB 170, 93 (1983).
195. M. P. Crippa, P. J. Alfonso and M. Bustin, JMB 228, 442 (1992).
196. P. J. Alfonso, M. P. Crippa, J. J. Hayes and M. Bustin, JMB 236, 189 (1994).
197. R. Reeves and A. P. Wolffe, unpublished.
198. A. P. Wolffe and H. R. Drew, PNAS 86, 9817 (1989).
199. G. Sandeen, W. I. Wood and G. Felsenfeld, NARes 8, 3757 (1980).
200. J. K. W. Mardian, A. E. Paton, G. J. Burnick and D. E. Olins, Science 209, 1534 (1980).
201. Y. V. Postnikov, D. Lehn, R. C. Robinson, F. K. Friedman, J. Shiloach and M. Bustin, NARes 22, 4520 (1994).
202. A. E. Paton, S. E. Wilkinson and D. E. Olins, JBC 258, 13221 (1983).
203. H. Schroter and J. Bode, EJB 127, 429 (1982).
203a. Y. Gu, T. Nakamura, H. Alder, R. Prasad, O. Canaani, G. Cimino, C. M. Croce and E. Canaani, Cell 71, 701 (1992).
203b. D. C. Tkachuk, S. Kohler and M. L. Cleary, Cell 71, 691 (1992).
203c. N. R. McCabe, R. C. Burnett, H. J. Gill, M. J. Thirman, D. Mbangkollo, M. Kipiniak, E. van Melle, S. Ziemin-van der Poel, J. D. Rowley and M. Diaz, PNAS 89, 11794 (1992).
203d. M. T. Brown, L. Goetsch and L. H. Hartwell, JCBiol 123, 387 (1993).
203e. E. Winter and A. Varshavsky, EMBO J. 18, 1876 (1989).
203f. C. T. Ashley, C. G. Pendelton, W. W. Jennings, A. Saxena and C. V. C. Glover, JBC 264, 8394 (1989).
203g. D. L. Poccia and G. R. Green, TIBS 17, 223 (1992).
203h. M. Suzuki, EMBO J. 8, 797 (1989).
203i. V. Delmas, D. G. Stokes and R. P. Perry, PNAS 90, 2414 (1993).
203j. T. Laux, J. Seurinck and R. B. Goldberg, NARes 19, 4768 (1991).
203k. G. Tjaden and G. M. Coruzzi, Plant Cell 6, 107 (1994).
203l. J. Nieto-Sotelo, A. Ichida and P. H. Quail, NARes 22, 1115 (1994).
204. M. Z. Whitley, D. Thanos, M. A. Read, T. Maniatis and T. Collins, MCBiol 14, 6464 (1994).
205. H. Lewis, W. Kaszubska, J. F. DeLamarter and J. Whelan, MCBiol 14, 5701 (1994).
206. S. Chuvpilo, C. Schomberg, R. Gerwig, A. Heinfling, R. Reeves, F. Grummt and E. Serfling, NARes 21, 5694 (1993).
207. J. Kim, R. Reeves, P. Rothman and M. Boothby, Eur. J. Immunol. 25, 298 (1995).
208. D. Thanos and T. Maniatis, Cell 80, 529 (1995).
209. W. Du and T. Maniatis, PNAS 91, 11318 (1994).
210. W. Du, D. Thanos and T. Maniatis, Cell 74, 887 (1993).
211. G. Ghosh, G. van Duyne, S. Ghosh and P. B. Sigler, Nature 373, 303 (1995).
212. C. W. Mueller, F. A. Rey, M. Sodeoka, G. L. Verdine and S. C. Harrison, Nature 373, 311 (1995).
213. K. R. Johnson, J. E. Disney, C. R. Wyatt and R. Reeves, Exp. Cell Res. 187, 69 (1990).
214. J. R. Lundberg, J. R. Karlson, K. Ingebrigtsen, J. Holtlund, T. Lund and S. G. Laland, BBA 1009, 277 (1989).
215. V. Giancotti, B. Pani, P. D'Andrea, M. T. Berlingieri, P. P. DiFiore, A. Fusco, G. Vecchio, R. Philip, C. Crane-Robinson, R. H. Nicolas, C. A. Wright and G. H. Goodwin, EMBO J. 6, 1981 (1987).
216. B. V. Giancotti, M. T. Berlingieri, P. P. DiFiore, A. Fusco, G. Vecchio and C. Crane-Robinson, Cancer Res. 45, 6051 (1985).
217. V. Giancotti, E. Buratti, L. Perissin, S. Zorzet, A. Balmain, G. Portella, A. Fusco and G. H. Goodwin, Exp. Cell Res. 184, 538 (1989).
218. V. Giancotti, A. Bandiera, E. Buratti, A. Fusco, R. Marzari, B. Coles and G. H. Goodwin, EJB 198, 211 (1991).
219. S. D. Goodman, S. C. Nicholson and H. A. Nash, PNAS 89, 11910 (1992).
220. M. J. G. Bussemakers, W. J. M. van de Ven, F. M. J. Debruyne and J. A. Schalken, Cancer Res. 51, 606 (1991).
221. T. Ram, R. Reeves and H. Hosick, Cancer Res. 53, 2655 (1993).
222. Y. Tamimi, H. G. van der Poel, M. Denyn, R. Umbas, H. F. M. Karthaus, F. M. J. Debruyne and J. A. Schalken, Cancer Res. 53, 5512 (1993).
223. M. T. Berlingieri, G. Manfioletti, M. Santoro, A. Bandiera, R. Visconti, V. Giancotti and A. Fusco, MCBiol 15, 1545 (1995).
224. E. Vartiainen, J. Palvimo, A. Mahonen, A. Linnala-Kankkunen and P. Maenpaa, FEBS Lett. 228, 45 (1988).
225. G. Chiappetta, A. Bandiera, M. T. Berlingieri, R. Visconti, G. Manfioletti, S. Battista, F. J. Martinez-Tello, M. Santoro, V. Giancotti and A. Fusco, Oncogene 10, 1307 (1995).
226. M. L. Cleary, Cell 66, 619 (1991).
227. H. R. Ashar, M. S. Fejzo, A. Tkachenko, X. Zhou, J. A. Fletcher, S. Weremowicz, C. C. Morton and K. Chada, Cell 82, 1 (1995).
228. G. E. Croston, L. A. Kerrigan, L. M. Lira, D. R. Marshak and J. T. Kadonaga, Science 251, 643 (1991).
229. P. J. Laybourn and J. T. Kadonaga, Science 254, 238 (1992).
230. M. Grunstein, Trends Genet. 6, 395 (1990).
231. G. Felsenfeld, Nature 355, 219 (1992).
232. U. K. Laemmli, E. Kas, L. Poljak and Y. Adachi, Curr. Opin. Genet. Dev. 2, 275 (1992).
233. W. T. Garrard, in "Nucleic Acids and Molecular Biology" (F. Eckstein and D. M. Lilley, eds.), p. 163. Springer-Verlag, Heidelberg, 1990.
234. T. D. Srikantha and M. Bustin, JMB 197, 405 (1987).
234a. T. D. Schneider and R. M. Stephens, NARes 18, 6097 (1990).
235. W. Gilbert, CSHSQB 52, 901 (1987).
236. J. W. Lee, H. S. Choi, J. Gyuris, R. Brent and D. D. Moore, Mol. Endocrinol. 9, 243 (1995).
237. D. Landsman and M. Bustin, JBC 261, 16087 (1986).
238. S. C. Albright, J. M. Wiseman, R. A. Lange and W. T. Garrard, JBC 255, 3673 (1980).
239. J. M. Barratt, C. A. Hazzalin, E. Cano and L. C. Mahadevan, PNAS 91, 4781 (1994).
240. S. W. Spaulding, N. W. Fucile, D. P. Bofinger and L. G. Sheflin, Mol. Endocrinol. 5, 42 (1991).
241. Y. V. Postnikov, L. Trieschmann, A. Rickers and M. Bustin, JMB 252, 423 (1995).
242. G. R. Cook, M. Minch, G. P. Schroth and E. M. Bradbury, JBC 264, 1799 (1989).
243. L. Trieschmann, Y. Postnikov, A. Rickers and M. Bustin, Mol. Cell Biol. 15, 6663 (1995).
244. V. V. Shick, A. V. Belyavsky and A. D. Mirzabekov, JMB 185, 329 (1985).
245. M. Bustin, M. P. Crippa and J. M. Pash, JBC 265, 20077 (1990).
246. B. D. Abercrombie, G. G. Kneale, C. Crane-Robinson, E. M. Bradbury, G. H. Goodwin, J. M. Walker and E. W. Johns, EJB 84, 173 (1978).
247. G. R. Cook, P. Yau, H. Yasuda, R. R. Traut and E. M. Bradbury, JBC 261, 16185 (1986).
248. J. V. Brawley and H. G. Martinson, Bchem 31, 364 (1992).
249. V. Graziano and V. Ramakrishnan, JMB 214, 897 (1990).
250. J. D. McGhee, D. C. Rau and G. Felsenfeld, NARes 10, 2007 (1982).
251. G. Almouzni and A. P. Wolffe, Exp. Cell Res. 205, 1 (1993).
252. S. Smith and B. Stillman, EMBO J. 10, 971 (1991).
253. G. Almouzni, M. Mechali and A. P. Wolffe, EMBO J. 9, 573 (1990).
254. J. Svaren and R. Chalkley, Trends Genet. 6, 52 (1990).
255. M. P. Crippa, L. Trieschmann, P. J. Alfonso, A. P. Wolffe and M. Bustin, EMBO J. 12, 3855 (1993).
256. S. M. Paranjape, A. Krumm and J. T. Kadonaga, Genes Dev. 9, 1978 (1995).
257. M. Bustin, L. Trieschmann and Y. V. Postnikov, Semin. Cell Biol. 6, 267 (1995).
258. J. S. Godde and J. Widom, JMB 226, 1009 (1992).
259. H. R. Drew, JMB 230, 824 (1993).
260. D. J. Tremethick and H. R. Drew, JBC 268, 11389 (1993).
261. J. C. Hansen and J. Ausio, TIBS 17, 187 (1992).
262. D. Z. Staynov and C. Crane-Robinson, EMBO J. 7, 3685 (1988).
263. H. F. Ding, M. Bustin and U. Hansen, unpublished (1995).
264. S. Weisbrod and H. Weintraub, PNAS 76, 630 (1979).
265. D. Landsman, E. Mendelson, S. Druckmann and M. Bustin, Exp. Cell Res. 163, 95 (1986).
266. T. W. Brotherton and G. D. Ginder, Bchem 25, 3447 (1986).
267. T. W. Brotherton, J. Reneker and G. D. Ginder, NARes 18, 2011 (1990).
268. N. Malik, M. Smulson and M. Bustin, JBC 259, 699 (1984).
269. R. Westermann and U. Grossbach, Chromosoma 90, 355 (1984).
270. L. Einck and M. Bustin, PNAS 80, 6735 (1983).
271. T. Dorbic and B. Wittig, NARes 14, (1986).
272. T. Dorbic and B. Wittig, EMBO J. 6, 2393 (1987).
273. S. Druckmann, E. Mendelson, D. Landsman and M. Bustin, Exp. Cell Res. 166, 486 (1986).
274. Y. V. Postnikov, V. V. Shick, A. V. Belyavsky, K. R. Khrapko, K. L. Brodolin, T. A. Nikolskaya and A. D. Mirzabekov, NARes 19, 717 (1991).
275. M. P. Crippa, J. M. Nikol and M. Bustin, JBC 266, 2712 (1991).
276. J. M. Pash, J. S. Bhorjee, B. M. Patterson and M. Bustin, JBC 265, 4197 (1990).
277. A. R. Shakoori, T. A. Owen, V. Shalhoub, J. L. Stein, M. Bustin, G. S. Stein and J. B. Lian, J. Cell. Biochem. 51, 479 (1993).
278. M. P. Crippa, J. M. Pash, B. I. Gerwin, T. E. Smithgall, R. I. Glazer and M. Bustin, Cancer Res. 50, 2022 (1990).
279. J. M. Pash, P. J. Alfonso and M. Bustin, JBC 268, 13632 (1993).
280. J. M. Pash, T. Smithgall and M. Bustin, Exp. Cell Res. 193, 232 (1991).
281. D. J. Tremethick, JBC 269, 28436 (1994).
282. H. F. Ding, S. Rimsky, S. C. Batson, M. Bustin and U. Hansen, Science 265, 796 (1994).
283. Y. Li and J. B. Dodgson, Mol. Cell Biol. 15, 5516 (1995).
284. M. Bustin, N. Soares, D. Landsman, T. Srikantha and J. M. Collins, NARes 15, 3549 (1987).
285. A. A. Travers, S. S. Ner and M. E. A. Churchill, Cell 77, 167 (1994).
286. U. Grossbach, Semin. Cell Biol. 6, 237 (1995).
287. M. Bustin et al., DNA Cell Biol. 14, 997 (1995).
FIG. 2. The interaction of hSRY-HMG with DNA as determined by solution NMR [reproduced with permission from Werner et al. (60)]. Three views (A-C) of the co-complex between the hSRY-HMG peptide and its specific recognition sequence (5'-dGCACAAAC) are displayed. The protein is shown as a schematic ribbon drawing in green, and the color coding used for the DNA bases is red for A, lilac for T, dark blue for G, and light blue for C. Side chains that contact the DNA bases are depicted in yellow in (C). (D) shows the same view as in (C) with the molecular surface of the protein shown in gray and the DNA atoms in yellow. The patches of blue on the protein surface indicate the location of the side chains of four of the seven residues that interact with the DNA bases.
FIG. 10. Surface representation of an X-ray crystallographic image of the butterfly-shaped NF-κB p50 homodimer protein (composed of monomer subunits I and II) bound to its recognition site in the major groove as viewed down the longitudinal axis of DNA. The unobstructed minor groove shown at the bottom of the figure (shown by an arrow) is the putative binding site for the DNA-binding domain of the HMG-I(Y) proteins in the human β-interferon promoter (13). Reprinted with permission from Nature (Ref. 209). Copyright 1995 Macmillan Magazines Limited.
Homologous Genetic Recombination in Xenopus: Mechanism and Implications for Gene Manipulation¹
DANA CARROLL
Department of Biochemistry, University of Utah School of Medicine, Salt Lake City, Utah 84132
I. Recombination of DNAs Injected into Xenopus Oocyte Nuclei
II. Mechanism of Recombination in Oocytes
III. Marker Recovery and Mismatch Repair
IV. Recombination Activities during Xenopus Development
V. Natural Function of SSA
VI. A Model Gene-targeting Experiment
VII. Summary
References
¹ Abbreviations: DSB, double-strand break; GV, germinal vesicle (the oocyte nucleus); SSA, single-strand annealing.
There are many styles and at least two functions of homologous recombination of chromosomal DNA. In meiosis, crossing over between homologs is required for proper chromosome alignment and segregation (1). In somatic or vegetative cells of many different organisms, recombination is one mode available for repair of damage incurred by DNA, particularly double-strand breaks (DSBs) (2). Although we seek common features in these processes, it is certainly true that recombination events detected in different settings may be mediated by many different mechanisms. In addition, within any particular cell the applicable mechanism depends on the substrates presented: the answer you get depends on how you phrase the question.
In this essay, I describe the capabilities of oocytes and eggs from the South African clawed frog, Xenopus laevis, for causing the recombination of exogenous DNA molecules. The focus is on the mechanism of homology-dependent recombination as elucidated mostly through experimental results obtained in my laboratory. Recombination in oocytes proceeds by exonuclease resection and the annealing of complementary strands. This mechanism, which is usually called single-strand annealing (SSA), is nonconservative in the sense that two homologous parental sequences yield only one copy as a recombination product. It has been possible to investigate many specific aspects of this process by taking advantage of the unique features of oocyte injection. Because the same or a very similar mechanism operates in a wide variety of organisms and cell types, we have gone on to examine its implications for directed genetic manipulations, with particular reference to gene targeting protocols.
Before our first oocyte injections, there was already considerable evidence that at least some processes of normal DNA metabolism are experimentally accessible in these very large cells (3). The intracellular volume of a full-grown, stage VI oocyte is approximately 1 µL, which is the equivalent of about 10⁵ normal vertebrate somatic cells, and its contents of many cellular components are similarly scaled (4). The oocyte nucleus (the germinal vesicle, or GV) is several hundred micrometers in diameter, has a volume of approximately 50 nL, and can be directly microinjected.
Encouraged by the general capabilities of oocytes and by two reports of the formation of recombination intermediates (5, 6), we considered it reasonable to expect that oocytes might support the recombination of injected DNAs. The initial motivation was to develop a means of testing the recombination properties of repeated sequences from Xenopus (7), but attention was soon redirected to questions regarding molecular mechanism. As it turns out, the oocytes are particularly well suited to such studies because (1) they process large amounts of injected substrate over a convenient time course, (2) the process can be interrupted at any time for observations of intermediates and products, and (3) the substrate can be manipulated in vitro before direct delivery to the cell nucleus.
The oocytes have provided more direct information about the structure of recombination intermediates than can be obtained from other systems, and we feel confident in the conclusions about mechanism. Although the behavior of DNA strands in oocytes is clear, we have much less information about the enzymatic catalysts of the process.
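The size figures quoted above can be checked with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the 450-µm diameter is an assumed value standing in for "several hundred micrometers".

```python
import math

def sphere_volume_nl(diameter_um: float) -> float:
    """Volume of a sphere in nanoliters, given its diameter in micrometers."""
    volume_um3 = math.pi / 6.0 * diameter_um ** 3  # V = (pi/6) * d^3
    return volume_um3 / 1e6                        # 1 nL = 1e6 cubic micrometers

def volume_per_cell_pl(oocyte_volume_ul: float, cell_equivalents: float) -> float:
    """Volume available per somatic-cell equivalent, in picoliters."""
    return oocyte_volume_ul * 1e6 / cell_equivalents  # 1 uL = 1e6 pL

if __name__ == "__main__":
    # Assumed GV diameter of 450 um (not a value given in the text).
    print(f"GV volume: {sphere_volume_nl(450.0):.1f} nL")
    print(f"Volume per cell equivalent: {volume_per_cell_pl(1.0, 1e5):.1f} pL")
```

With the assumed 450-µm diameter the sphere volume comes to roughly 48 nL, in line with the approximately 50 nL quoted for the GV, and 1 µL divided among 10⁵ cell equivalents corresponds to 10 pL per cell.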
I. Recombination of DNAs Injected into Xenopus Oocyte Nuclei
A. Experimental Protocol
The steps in an oocyte injection experiment are illustrated in Fig. 1. Oocytes are collected by surgical dissection of ovary segments and dissociated by collagenase treatment (8). They are readied for injection by gentle centrifugation, which causes the GV to press against the inner cortical surface and mark its location by displacing pigment granules (9). This makes it very easy to accomplish direct nuclear injection. A DNA substrate is prepared, usually by cloning desired sequence combinations in bacteria and by treatment with various enzymes in vitro. The DNA is concentrated, typically to about 250-500 µg/mL, and 20 nL of the resulting solution is delivered to the GV with a simple glass micropipette (10). The oocytes are incubated at 19°C for an arbitrary period that can be as short as a minute or two or as long as several days. Recovery of the injected DNA is initiated in most of the experiments by manual dissection of the GV; this avoids contamination with cytoplasmic (especially yolk) components, which are difficult to remove and may interfere with subsequent analyses (11, 12). Following standard extraction and precipitation steps, the recovered DNA can be analyzed by gel electrophoresis, electron microscopy, transformation into bacteria, or any other desired method.
FIG. 1. Oocyte injection protocol. After design, preparation, and characterization, substrate DNAs are injected directly into the oocyte nucleus (GV). After an incubation of arbitrary duration, recovery of the injected DNA is initiated by manual isolation of the GV, followed by standard extraction procedures, and then analyses that are dictated by the goals of the experiment.
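The amount of DNA delivered per oocyte follows directly from the concentration and injection volume given above. A minimal sketch of that unit conversion, with a hypothetical function name:

```python
def injected_mass_ng(conc_ug_per_ml: float, volume_nl: float) -> float:
    """Mass of DNA delivered in one injection, in nanograms."""
    # 1 ug/mL is the same as 1 ng/uL, and 1 uL = 1000 nL.
    return conc_ug_per_ml * volume_nl / 1000.0

if __name__ == "__main__":
    for conc in (250.0, 500.0):
        print(f"{conc:.0f} ug/mL x 20 nL = {injected_mass_ng(conc, 20.0):.1f} ng")
```

For the stated range of 250-500 µg/mL and a 20-nL injection, this gives 5-10 ng of DNA per nucleus.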
B. DNA Substrates and Basic Observations
As is the case for monitoring any biochemical process, the choice of a substrate is critical, and ours has evolved with experience and with the aspect of recombination we set out to test. Our first injection experiments were performed with bacteriophage λ DNAs (11). Phage λ was chosen partly because of the ease of scoring recombinants and partly because it provided a large (48.5 kb) target for crossing over. Pairs of marked phage chromosomes (Fig. 2A) were mixed and injected, as naked DNA, into oocytes. After several hours of incubation, DNA was recovered and packaged into phage particles, which were plated onto permissive and selective hosts. Generation of recombination products dependent on oocyte injection was readily observed (11).
To gain insight into the types of events stimulated by oocytes, we made intentional double-strand breaks in the λ DNAs before injection (12). Cleavage of the substrates stimulated recombination, particularly when both DNAs were cut; the location of the cuts, however, was crucial. Recombinants were recovered only when there was a substantial overlap of homologous sequences between the selected markers (Fig. 2B). When the chromosomes were cut at the same site or in such a way as to leave a gap, no recombination products were recovered. This indicated that the oocytes could support homologous recombination stimulated by DSBs, but not nonhomologous end joining. It seems likely that the low level of recombination observed in the earlier experiments with "uncut" chromosomes (11) depended on fortuitous random breakage of the DNAs during preparation or injection.
FIG. 2. Recombination of λ DNA in oocytes. (A) Marked bacteriophage λ chromosomes used for injections. Parental DNAs carried an amber mutation in either the A (Aam) or the S (Sam) gene. Recombinants that were wild type at both sites (am+) were selected by growth on a suppressorless host. Cross-overs in the intervals on either side of the cI gene were distinguished by plaque morphology, as indicated. (B) Effect of DNA cleavage and homologous overlaps on recombination of λ DNAs. The same chromosomes illustrated in panel A (drawn on a smaller scale) first had their cohesive ends joined, as indicated by the boxes with a diagonal line; then they were cut with selected restriction enzymes. The A+ chromosome was cleaved with XhoI and coinjected with S+ DNAs cut with XhoI, SalI, SstI, or KpnI (top to bottom), producing the homologous overlaps shown. In the rightmost column, yields of A+S+ recombinants are indicated qualitatively. (Adapted from Refs. 11 and 12, with permission.)
C. Plasmid Substrates
When we shifted to the use of plasmid DNA substrates, we were able to test more directly some of the requirements for oocyte recombination (12). We often use derivatives of the plasmid pRW4 (Fig. 3) (13, 14), which contains pBR322 sequences and a direct duplication of nearly all of the tet gene. The repeats are separated by a unique XhoI site, and the duplicated region is 1246 bp long. A similar substrate was found by others to recombine in oocytes (15). After cleavage with XhoI, pRW4 can undergo inter- or intramolecular homologous recombination through the tet repeats (Fig. 3). This reaction is very efficient; approximately 2 × 10⁹ molecules (10 ng of linear pRW4) injected into one oocyte recombine to completion within a few hours with good yield (Fig. 4) (14). In contrast, injected circular pRW4 is recovered unchanged after hours or days of incubation. Circular DNAs are assembled into chromatin and are quite stable in oocytes (16), but they are inert in recombination (12). In later experiments, we used this feature to advantage (see Section VI).
In addition to being linear, an effective substrate must have homologous overlaps to support recombination. Simple linear DNAs are degraded in oocytes (12, 16). With some linear versions of the plasmid pBR322, we recovered low levels of apparent recombination products (17). When the junctions were sequenced, it turned out that adventitious, short repeats near the cut ends were responsible for (inefficient) recombination by the same, homology-dependent mechanism.
FIG. 3. Structure of the plasmid pRW4 and illustration of intra- and intermolecular recombination events. The open boxes with arrows are the direct 1246-bp repeats. The unique site for XhoI is used to cut pRW4 between the repeats prior to injection. The unique site for PvuII is one that is commonly used for analysis of recombination products after DNA has been recovered from oocytes. Substrate that has not recombined yields two fragments after PvuII digestion, whereas one larger fragment is produced for each recombination event that has occurred, whether it was intra- or intermolecular.
FIG. 4. Electrophoretic analysis of recombination in oocytes. pRW4/XhoI was incubated in oocytes for the times indicated. Recovered samples were treated with PvuII, and the DNA was subjected to electrophoresis in a 1.0% agarose gel, then blot-hybridized with a pBR322 probe. The two bands resulting from unrecombined substrate are indicated by S, recombination products by P. Smears running just ahead of S and just behind P are recombination intermediates, as discussed in the text; C shows the position of a fragment from a nonhomologous circular plasmid that was coinjected as a control for recovery. (Figure provided by R. J. Dawson.)
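The correspondence between 10 ng of linear pRW4 and roughly 2 × 10⁹ molecules can be reproduced with a short calculation. This is a sketch under stated assumptions: the plasmid length of about 5.6 kb is not given in the text (it is a guess based on pBR322 plus one extra 1246-bp repeat), and 650 g/mol is the usual approximate average mass of a base pair. The function name is hypothetical.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average molar mass of one base pair

def dna_copies(mass_ng: float, length_bp: float) -> float:
    """Approximate number of double-stranded DNA molecules in a given mass."""
    moles = mass_ng * 1e-9 / (length_bp * BP_MASS_G_PER_MOL)
    return moles * AVOGADRO

if __name__ == "__main__":
    # Assumed pRW4 length of 5600 bp (hypothetical; not stated in the text).
    print(f"{dna_copies(10.0, 5600.0):.2e} molecules in 10 ng")
```

Under these assumptions 10 ng corresponds to about 1.7 × 10⁹ molecules, consistent with the approximate figure of 2 × 10⁹ quoted above.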
D. 5' → 3' Exonuclease and 3' Tails
Because molecular ends are required for recombination, we examined the fate of DNA termini in oocytes. We found that all linear double-stranded DNAs are subjected to exonucleolytic degradation of one strand and that the polarity of degradation is uniquely 5' → 3' (13). In contrast, exposed 3' ends are relatively stable. To test the importance of exonuclease resection, we prepared substrate DNA with 300-nt single-stranded 3' tails by digesting pRW4 in vitro with T7 gene 6 exonuclease (a 5' → 3' exonuclease that is commercially available). Substrate prepared in this fashion appeared in recombination products more rapidly than substrate that was simply cut prior to injection, which indicated that 3' tails are on the pathway to recombination (18).
Armed with the information that single-stranded 3' ends are functional intermediates, one can readily produce a panoply of models that lead from there to completed products. Some of the mechanisms we considered are illustrated in Fig. 5 (18). Many models begin with the invasion of homologous duplex sequences by the 3' tail. The Escherichia coli RecA protein, for example, can catalyze such a reaction (19). The invasion step can be followed by branch migration and the formation of a classical Holliday junction (Fig. 5A). Cellular activities that can resolve Holliday junctions have been identified in a number of organisms (20), and the final, covalently closed products are presumably generated by the action of DNA ligase. Alternatively, the invading 3' end can be utilized as a primer by DNA polymerase to extend the invasion loop by DNA synthesis (Fig. 5B). Depending on available activities, this intermediate can be resolved by cleavage of a Holliday junction, by pull-out of the synthesized strand and hybridization to the tail at the other end of the substrate, or by double-strand synthesis all the way around the circular intermediate.
A different class of mechanism envisions more extensive 5' → 3' degradation before the homology-dependent step (Fig. 5C). Once complementary single strands are exposed, they anneal by simple Watson-Crick base-pairing. The protruding single-stranded branches can be removed by direct endo- or exonuclease action, or they can be assimilated into the structure by continued 5' → 3' degradation of the duplex strands. At this stage the nicks or gaps that remain are repaired by DNA polymerase and DNA ligase. This mechanism is commonly called single-strand annealing (SSA) (21), although we have sometimes used the term resection-annealing to emphasize the role of the exonuclease (22, 23).
FIG. 5. Recombination models utilizing single-stranded 3' tails that can, in principle, account for homologous recombination events involving pRW4. Terminal direct repeats are represented by thick lines; half arrowheads denote 3' ends. (A) Invasion. One 3' end invades homologous sequences in the other repeat. Branch migration creates a Holliday junction, which can be resolved by cleavages and ligations. (B) Invasion-primed DNA synthesis. After initiation as in A, the invading 3' end acts as a primer for DNA synthesis. Several routes to resolution of this intermediate are possible. (C) Single-strand annealing. Exonuclease resection continues until complementary single strands are exposed. These anneal, the 3' branches are removed or assimilated, and the strands are eventually sealed by DNA ligase. (Adapted from Ref. 14, with permission.)
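The nonconservative outcome of SSA (Fig. 5C) can be illustrated on a toy sequence. The sketch below is not a model of the enzymology; it only shows the bookkeeping, namely that a linear molecule with a direct repeat at each end gives a circular product that retains a single copy of the repeat. The function name and the short sequences are hypothetical.

```python
def ssa_product(linear: str, repeat_len: int) -> str:
    """Circular SSA product of a linear duplex carrying terminal direct repeats.

    The molecule is written as its top strand, 5' to 3'. The first and last
    repeat_len bases must be identical (the direct repeat). Annealing of the
    resected ends collapses the two copies into one, so the returned circle
    (written linearly, starting at the retained repeat) has one repeat only.
    """
    if repeat_len <= 0 or len(linear) < 2 * repeat_len:
        raise ValueError("repeat length is not compatible with this molecule")
    if linear[:repeat_len] != linear[-repeat_len:]:
        raise ValueError("ends do not carry a direct repeat of that length")
    return linear[:-repeat_len]

if __name__ == "__main__":
    repeat = "ACGTTGCA"          # hypothetical stand-in for the 1246-bp repeat
    backbone = "GGGCCCAAATTT"    # hypothetical stand-in for vector sequence
    substrate = repeat + backbone + repeat
    product = ssa_product(substrate, len(repeat))
    print(len(substrate), "->", len(product))
```

The product is shorter than the substrate by exactly one repeat length, which is the sense in which two homologous parental sequences yield only one copy.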
II. Mechanism of Recombination in Oocytes
A. Evidence for SSA
To distinguish the various recombination mechanisms, we undertook experiments in which we modified the substrate and/or characterized intermediates in the process. The success of these efforts was facilitated by the large amounts of substrate processed in each oocyte and by the convenient time course of recombination.
Smears near bands in a DNA electrophoresis experiment are often a sign of problems, but in these experiments they turn out to be very revealing. As indicated in Fig. 3, the substrate we normally inject appears as two bands on a gel after cleavage with a diagnostic restriction enzyme; the recombination products yield a single, larger band after the same treatment. In a typical experiment, substrate is converted to product over a period of a few hours (Fig. 4) (13, 14). At intermediate times, there are smears running faster than the substrate fragments and slower than the product band. These smears have the kinetic properties of recombination intermediates: they always appear during the course of the reaction when substrates and products coexist, and they are absent both before injection and after recombination is complete (14).
Using two-dimensional gel electrophoresis and oligonucleotide probes, it was shown (14) that the smears ahead of substrate fragments are the result of 5' → 3' resection. Molecules in these smears have full-length 3'-ending strands, but 5'-ending strands of decreasing length. The species running slower than product have the strand composition expected for intermediates on the SSA pathway (14). Their 3' ends are intact; their 5' ends are degraded. The slowest moving intermediates have 5' ends resected about half the length of the original homologous overlap, while those closest to product in mobility have been resected through the whole length of the homology. Digestion with S1 nuclease removed the 3' branches from the annealed intermediates and caused them to migrate like linear products in the first (nondenaturing) dimension, confirming their partially single-stranded character (14).
Compelling evidence for the SSA mechanism came from direct visualization of recombination intermediates in the electron microscope (22). These experiments made use of a substrate with long homologous overlaps (3.4 kb), which slowed the time course of the reaction and produced relatively large amounts of intermediates with more readily characterized features. In addition, the DNA was cross-linked with psoralen before isolation from the oocytes to prevent the loss of invasion intermediates or other unstable structures by branch migration during recovery. Like the example shown in Fig. 6, all of the intermediates we observed had structures predicted by SSA. They were double-stranded DNAs with simple single-stranded branches, and the measured lengths of various features corresponded very closely to expectation (22). The invasion models predicted intermediates with internal loops and three or four double-stranded branches, but no such structures were observed among hundreds of molecules examined. The steps in SSA recombination are illustrated in more detail in Fig. 7 to aid in understanding the tests of the model that we performed.
FIG. 6. Characterization of recombination intermediates by electron microscopy. Linear substrate was injected into oocytes; DNA was recovered after 4 hours, cut with a diagnostic restriction enzyme, and prepared for electron microscopy. The molecule shown is a linear double-stranded DNA (segments a-c) with two single-stranded branches (1 and 2), as predicted by the SSA mechanism. The single strands were coated with RecA protein, which makes them appear thicker. Bar = 0.2 µm. [Reproduced from EMBO J. 12, 23-34 (1993), by permission of Oxford University Press.]
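The observation that the earliest annealed intermediates are resected by about half the homology follows from simple geometry, sketched below under the assumption that each end is resected independently in the 5' → 3' direction. Resection of x nt at one end exposes the first x nt of one strand of the repeat; resection of y nt at the other end exposes the last y nt of the complementary strand; the two exposed stretches can pair over x + y - H nt, where H is the length of the homology. The function name is hypothetical.

```python
def annealed_joint_bp(homology_bp: int, resected_left: int, resected_right: int) -> int:
    """Length of complementary single strand exposed on both ends, in nt.

    Resection beyond the end of the homology exposes no additional
    complementary sequence, so each value is capped at the homology length.
    """
    x = min(resected_left, homology_bp)
    y = min(resected_right, homology_bp)
    return max(0, x + y - homology_bp)

if __name__ == "__main__":
    H = 1246  # length of the pRW4 repeats
    for resected in (300, 623, 900, 1246):
        print(resected, "nt from each end ->", annealed_joint_bp(H, resected, resected), "nt joint")
```

With equal resection from both ends, no complementary sequence is exposed until each end has lost more than H/2 nt, and the joint reaches the full homology only when both ends have been resected through its whole length. The same arithmetic accounts for the exonuclease-treated substrate described in the next subsection: removing about 900 nt from each end of the 1246-bp repeats leaves 554 nt able to anneal, in agreement with the 550-bp joints reported there.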
B. Testing the SSA Model
The injection of several types of modified substrates confirmed the SSA mechanism and elucidated some of its additional properties as manifested in oocytes. First, circular DNAs with annealed junctions were prepared by digestion with T7 gene 6 exonuclease (14). In this case, approximately 900 nt were removed from each molecular end, revealing homologous single strands that formed 550-bp intramolecular joints on annealing at low DNA concentration. After injection, these molecules were very rapidly converted to recombination products, which showed they are genuine intermediates. In addition, we learned that the remaining redundancy is removed (step 4 in Fig. 7) by continued 5' → 3' digestion, not by cleavage of the 3' tails.
Substrates carrying nonhomologous sequences on one or both ends recombined inefficiently (12, 24). Instead, annealed intermediates with substantial single-strand gaps accumulated (24), as diagrammed in Fig. 8. This was apparently due to the absence in oocytes of an activity that can efficiently remove the 3'-ending strand of the nonhomology. We believe that 5' → 3' exonuclease continues to degrade beyond the end of the homology, because neither ligation nor priming of DNA synthesis from the 3' end is possible in the presence of the nonhomology. These "stuck" intermediates are probably protected from complete degradation by occasional initiation of DNA synthesis at random sites in the single-stranded gap left by the 5' → 3' exonuclease (25). Thus, there is a balance between degradation and resynthesis. In fact, short terminal nonhomologies (in the range of 60-400 nt) are eventually removed, and the block to recombination is seen to be kinetic, rather than absolute (24). This suggests that there is a very weak 3' → 5' exonuclease activity that can eventually remove the end block. The fact that long (600-2000 nt) end blocks are not removed in the course of a typical experiment indicates that resolution is not accomplished by a debranching endonuclease.
A direct test of the capability of oocytes to perform the alternative, invasion-type recombination utilized a substrate designed to yield products via such a pathway (Fig. 9) (26). In fact, nearly all of the products recovered resulted from the annealing pathway, and recombination was inefficient because one end of the substrate behaved as an end block (Fig. 9). Two products were recovered that could have been generated by an invasion mechanism, but they could also have been produced by SSA from a low level of fragmented substrate. Considering the extremely low yields of such products and the uncertainty as to their source, we concluded that invasion-initiated recombination is essentially nonexistent in oocytes.
FIG. 7. Detailed steps in the SSA mechanism. In oocytes, steps 1, 2, and 4 are mediated by 5' → 3' exonuclease activity. It is not known whether an annealing catalyst assists in step 3. Completion of covalently closed products (step 5) requires DNA polymerase and DNA ligase activities. Conventions are as in Fig. 5. (Reproduced from Ref. 14, with permission.)
FIG. 8. SSA with a substrate carrying a nonhomologous end block. The terminal nonhomology is illustrated with dotted lines. The 5' → 3' resection (1) and annealing (2) steps proceed as in Fig. 7, and the unblocked joint at the left of the homology is completed (3, 4). Because there is no activity to remove the 3'-ending strand of the nonhomology, the blocked end cannot be joined. The exonuclease continues to degrade, creating a gap (4), in which DNA synthesis can be initiated at random sites (dashed line) by DNA polymerase/primase (5). (Modified from Ref. 24, with permission.)
FIG. 9. Substrate designed to test conservative recombination in oocytes. pDC10 has the same 1246-bp repeats as pRW4, but they are separated by an insertion of 1691 bp of adenovirus 2 DNA (dashed line), and the two repeats are distinguished by several restriction sites (indicated by shading). Cleavage with SacI creates a linear DNA that could recombine via a conservative mechanism that would recreate the original plasmid, but with a patch of gene conversion from the uncut repeat where the DSB had been. If recombination occurs via a nonconservative mechanism, like SSA, only one small plasmid would be produced from each substrate molecule. The conservative product was not seen in oocytes (indicated by the interrupted arrow), while the yield of SSA products was quite low (indicated by the dashed arrow). (Modified from Ref. 26, with permission.)
C. Recombination Kinetics
Because recombination of injected substrates occurs in oocytes with a convenient time course, it has been possible to test the effects of several variables on the kinetics of recombination. The mixture of activities in whole oocytes is quite complex, and such studies are not equivalent to kinetic analyses using purified components, but we have gained some insights into what factors limit the rate and yield of products. Substrates with different lengths of homologous overlaps (0.3, 1.25, 2.0, and 3.4 kb) were prepared and examined for their time courses of recombination (R. J. Dawson and D. Carroll, unpublished). At high concentrations of injected DNA (around 10⁹ molecules per oocyte), all four substrates gave good yields of recombinants, predictions of the SSA mechanism were fulfilled, and it appeared that the rate of recombination is determined by the action of the 5′ → 3′ exonuclease. The latter conclusion was based on the
observation that longer overlaps took longer to complete recombination because the exonuclease had to degrade more DNA. At much lower DNA concentrations (10⁶ to 10⁷ molecules per oocyte), recombination was much faster, taking minutes rather than hours to complete, presumably because the ratio of substrate to exonuclease molecules was reduced (R. J. Dawson and D. Carroll, unpublished). The length-dependent kinetic differences were still seen, but the yield of recombination products was surprisingly diminished. It appears that some other step in the overall SSA process had become slow relative to exonuclease digestion, and the enzyme often degraded substrates to completion before the slow step was accomplished. We do not know what the slow step revealed at low concentration is, but the leading candidates are the annealing of complementary strands (step 3 in Fig. 7) or removal of a few unmatched nucleotides from the 3′ ends to allow DNA polymerase to initiate synthesis. Full characterization of the steps in recombination awaits purification of the catalysts involved and reconstitution of the reaction in vitro. Nonetheless, we have the following picture based on injection experiments. Because undegraded substrate, partially degraded intermediates, and completed products coexist at intermediate stages of the reaction, the 5′ → 3′ exonuclease is partially processive. The enzyme could have a long processive length, or it could be literally distributive but functionally processive due to a preference for binding to partially resected substrates. Completion of SSA recombination requires the exonuclease to degrade through the entire homology on both interacting ends. The exonuclease does not stop after traversing the homology, but continues to produce gaps that must be filled by DNA polymerase and ultimately sealed by DNA ligase. The exonuclease moves slowly compared to polymerase, so polymerase can catch up and fill the gap.
To provide a primer for DNA polymerase, the complementary 3′-ending strands must be annealed and their termini must be exactly matched to the template strand. When the exonuclease is occupied with a large excess of substrate molecules, the overall rate of degradation is slow enough that annealing and 3′ nibbling are easily accomplished in good time. At low substrate concentration, however, the exonuclease chews up the substrate quickly, and the steps necessary for priming of DNA synthesis may become limiting.
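The concentration argument in this section can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions: it supposes that a fixed pool of exonuclease is shared evenly among all DNA ends and that each engaged enzyme removes nucleotides at a constant rate. The enzyme count and the rate are hypothetical values chosen for illustration, not measurements from the experiments described.

```python
def resection_time_min(overlap_nt, substrate_molecules,
                       enzyme_molecules=1e8, rate_nt_per_min=100.0):
    """Toy estimate of the time for 5'->3' resection to cross the overlap.

    Hypothetical assumptions: the enzyme pool is shared evenly over all
    DNA ends (two per linear molecule), and when enzyme is in excess
    each end is resected at the full single-enzyme rate.
    """
    ends = 2.0 * substrate_molecules
    # Fraction of the time a given end has an enzyme working on it.
    occupancy = min(1.0, enzyme_molecules / ends)
    effective_rate = rate_nt_per_min * occupancy
    return overlap_nt / effective_rate


# Overlap lengths and DNA doses are those quoted in the text.
for overlap_kb in (0.3, 1.25, 2.0, 3.4):
    high = resection_time_min(overlap_kb * 1000, 1e9)
    low = resection_time_min(overlap_kb * 1000, 1e6)
    print(f"{overlap_kb:>5} kb  high DNA: {high:7.1f} min  low DNA: {low:5.1f} min")
```

With these made-up parameters the model reproduces only the qualitative trends reported above: longer overlaps take longer, and lowering the DNA dose shortens resection from hours to minutes. It says nothing about the reduced yield at low concentration, which depends on the unidentified slow step.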
D. Recombination in GV Extracts
To facilitate analysis of the requirements for oocyte recombination and to provide a starting point for purification of the required enzymatic activities, extracts of isolated GVs were tested for their recombination capabilities (27). Both manually dissected (27) and bulk isolated (28) GVs supported homologous recombination by a process essentially indistinguishable from SSA in
injected oocytes, although the overall capacity was reduced, presumably due to the approximately 10-fold dilution inherent in the isolation procedures. In these extracts, exonuclease activity was dependent only on the presence of an appropriate divalent metal ion (Mg²⁺ or Mn²⁺), but generation of completed recombination products required an added nucleoside triphosphate (27). ATP or dATP alone supported a full level of SSA recombination, but nonhydrolyzable analogs did not substitute. In the presence of all four dNTPs, both SSA and nonhomologous end joining were observed; the latter products are discussed more fully in Section IV,B.
III. Marker Recovery and Mismatch Repair
If parental sequences are largely homologous, but differ from each other at a few marked sites, the pattern of recovery of those markers in the products of recombination can often shed light on the underlying mechanism. We designed a multiply marked substrate to help distinguish possible mechanisms in oocytes, but by the time the study was completed, we had already accumulated convincing support for the SSA model. Instead of helping to decide among mechanisms, the results gave us insight into how oocytes deal with mismatches in recombination intermediates. The substrate used in these studies was the plasmid pDC10, a derivative of pRW4 in which one of the 1246-bp direct repeats contained eight single-base-pair substitutions, each of which created or destroyed a restriction site (Fig. 10A) (29). After recombination was complete, the pattern of parental markers in the products was determined by two methods, which were in good agreement: isolation of individual products by transformation of bacteria, and direct electrophoretic analysis of mixed products. Most of the recombinants from pDC10 showed a single apparent exchange between blocks of parental markers, and most of these apparent exchanges were located near the original molecular ends (Fig. 10B). A predicted intermediate in SSA recombination is a full-length heteroduplex molecule in which the original region of homology is composed of one strand from each parent (see Fig. 7). In this interpretation, the pattern of marker recovery reflects not points of literal breakage and reunion, but the processing of mismatches in such a heteroduplex. A predominance of apparent exchanges near the ends of the overlap corresponds to cocorrection of all the markers on one strand or the other in each product molecule. To confirm the validity of this perspective, we constructed synthetic heteroduplexes that mimicked the predicted recombination intermediate (Fig. 11) (30). After incubation of these molecules in oocytes, the pattern of marker recovery was similar to that obtained by recombination of pDC10 in
FIG. 10. (A) Diagram of pDC10. The starting plasmid is the same as the one diagrammed in Fig. 9, but it has been cut at both boundaries between pBR322 and adenovirus sequences to generate the structure shown. One of the 1246-bp repeats carries eight single-base-pair substitutions that create or destroy restriction sites as indicated: H, HindIII; V, EcoRV; B, BamHI; S, SphI; L, SalI; C, SacI; N, NruI; M, MluI. Determination of which sites are present in products after recombination in oocytes reveals where the exchange between parental sequences has occurred. (B) Distribution of apparent exchanges on recombination of pDC10 in oocytes, plotted against the midpoint (bp) of each exchange interval. This distribution was determined by detailed restriction analysis of 180 individual cloned products. (Adapted from Ref. 29, with permission.)
that apparent exchanges were concentrated near the ends of the interval (Fig. 11, open symbols). There were, however, many fewer internal exchanges in the case of the synthetic heteroduplexes. We think this is due to occasional loss of parts of the exposed single-stranded 3′ tails during recombination of pDC10, whereas the corresponding strands are completely annealed in the synthetic heteroduplexes prior to injection. Like the SSA intermediates we envision, the first heteroduplexes we injected had nicks in both strands at the ends of the overlap. Nicks are initiation sites for mismatch repair in both bacterial (31) and mammalian (32, 33) cell extracts, and we believe that they serve the same function in oocytes (30). With the nicked substrate, there was a slight bias in favor of recombination near the right end of the overlap; this was due to a growth advantage in the transformed bacteria that favored the mutant tet gene. To determine whether mismatch repair could be initiated in oocytes in the absence of preexisting nicks, we prepared covalently closed heteroduplexes by ligation (30). The result was that most heteroduplexes were still efficiently repaired,
FIG. 11. Distribution of apparent exchanges from injected synthetic heteroduplexes. Appropriate DNAs were digested, denatured, and annealed to generate heteroduplex molecules, as illustrated at the left, having different parental strands (black and textured) across the whole 1246-bp region of homology and nicks on opposite strands lying just outside the heteroduplex region. One sample was left nicked and another was ligated prior to injection. The distribution of apparent exchanges, judged by patterns of parental markers, is illustrated for both samples, just as in Fig. 10. (Adapted from Ref. 30, with permission.)
the pattern was still long-patch (i.e., one strand or the other was essentially completely replaced in each product molecule), but the distribution was altered to favor the wild-type strand (Fig. 11, closed symbols). The reason for this altered pattern is not known, but it could reflect a preference for initiating repair at specific mismatches. An important conclusion is that, unlike extracts from mammalian cells (32, 33), oocytes can initiate correction on covalently closed substrates, and this may be a favorable system in which to investigate the initiation process and sources of strand selectivity.
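The scoring of apparent exchanges used in Figs. 10 and 11 can be sketched in a few lines. The example below is a hedged illustration, not the authors' analysis procedure: it assumes each product is recorded as one letter per marker ('L' or 'R', for the parental repeat that supplied the site), that the flank to the left of the overlap always derives from parent L and the right flank from parent R, and it uses hypothetical marker coordinates, because the actual positions of the eight sites are not given in this section.

```python
OVERLAP_BP = 1246  # repeat length given in the text

# Hypothetical marker coordinates within the repeat (illustration only).
MARKER_POS = [100, 250, 400, 550, 700, 850, 1000, 1150]


def exchange_midpoints(pattern, positions=MARKER_POS, overlap=OVERLAP_BP):
    """Return midpoints (bp) of the intervals where parental origin switches.

    pattern: string of 'L'/'R', one letter per marker. The left flank is
    treated as 'L' and the right flank as 'R', so a product with all
    markers from one parent scores a single exchange in a terminal
    interval, as expected for co-correction of a whole strand.
    """
    if len(pattern) != len(positions):
        raise ValueError("one letter per marker is required")
    labels = "L" + pattern + "R"
    coords = [0] + list(positions) + [overlap]
    return [(coords[i] + coords[i + 1]) / 2
            for i in range(len(labels) - 1)
            if labels[i] != labels[i + 1]]


print(exchange_midpoints("LLLLLLLL"))  # all markers from parent L
print(exchange_midpoints("RRRRRRRR"))  # all markers from parent R
print(exchange_midpoints("LLLRRRRR"))  # one internal switch
```

Under these assumptions, long-patch repair shows up as midpoints in the two terminal intervals, while short-patch or mixed repair would produce midpoints in the interior or more than one switch per product.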
IV. Recombination Activities during Xenopus Development
A. Staged Oocytes
Given that oocytes have such a large capacity for SSA recombination, we are interested in what the normal function of this process might be in frogs. One way to address this issue is to determine whether SSA activity is correlated with events of chromosomal DNA metabolism during oogenesis, most
of which occur at early stages (34), or accumulated in preparation for embryogenesis. The ovary of a mature female frog contains a broad range of oocyte sizes, from stage I to stage VI (34). The standard injection procedures were adapted to deliver linear pRW4 to oocytes of all stages (23). We ensured that we only assayed the results of successful nuclear injections by manually isolating GVs before initiating DNA purification (see Fig. 1). The result (presented in Fig. 12A) was that the capability for homologous (SSA) recombination was absent from stage I and II oocytes; it began to appear during stage III, continued to increase in stage IV, and was essentially fully developed by stages V and VI. The earlier stages produced a low level of end-joined products, which we showed were the result of ligation of the XhoI site at the
FIG. 12. (A) Recombination in oocyte stages. pRW4 was injected into oocytes of each of the indicated stages and incubated overnight. Recovered DNA was cut with PvuII and analyzed as before. The positions of bands arising from substrate (S) and homologous recombination products (P) are as shown in Fig. 4. EJ shows the positions of the three (head-tail, head-head, tail-tail) end-joining products that result from ligation of XhoI sticky ends. pHSS6 is a nonhomologous circular DNA that was injected as a recovery control. (B) Recombination in GV extracts from stage II and stage VI oocytes. pRW4/XhoI was incubated in full-strength extracts (II, VI), extracts diluted 1:1 with buffer (+b), or extracts mixed in equal proportions (II + VI). Analysis was as in panel A. Recombination intermediates can be seen as smears ahead of band S and behind band P in samples that included the stage VI extract. (Adapted from Ref. 23, with permission.)
original molecular ends. The accumulation of SSA recombination capacity during oogenesis suggests that, like many activities of full-grown oocytes, it is stored for embryogenesis and does not reflect oocyte-specific requirements (23).
B. Extracts and Eggs
We also tested the recombination capabilities of GV extracts from all oocyte stages, and the results mirrored those obtained by injection (23). The availability of extracts allowed us to ask whether the absence of SSA in early oocytes was due to the presence of an inhibitor or to the absence of a required component. We addressed this question by mixing GV extracts of stage II and stage VI oocytes. As seen in Fig. 12B, the mixture clearly had the properties of the stage VI extract, ruling out the action of an inhibitor at early stages. What could the missing component(s) be? Because we knew that 5′ → 3′ exonuclease activity was required for recombination, we assayed for its presence in GV extracts from the various stages (23). The activity was very low in extracts of stage I and II GVs, began to rise in stage III, and was high in stages IV, V, and VI (Fig. 13). The exonuclease, at least, accumulates in parallel with the capability for recombination. The next stage of development beyond the stage VI oocyte is the unfertilized egg. The transition is called oocyte maturation, and it is initiated, in vivo and in vitro, by steroid hormones (35). Eggs are well-known to catalyze nonhomologous end-joining (an activity not shown by oocytes), and this capability develops very late in the maturation process (36). Prior to our studies, no one had addressed the issue of homologous recombination in eggs. Injection of linear pRW4 showed that eggs support both homologous and nonhomologous modes of recombination on the same substrate (23). Both activities were enhanced by activation of the eggs with calcium ionophore, a process that mimics fertilization. The persistence of SSA capability beyond fertilization reinforces the idea that it is present to serve essentially somatic functions during and/or after the early stages of development.
FIG. 13. Activities during oocyte development. The percent of recovered DNA that was identifiable as products of homologous recombination (SSA) or nonhomologous end joining (EJ) is plotted for various stages of oocytes and eggs. Exonuclease activity (Exo) measured in GV extracts from each oocyte stage is also shown. (Modified from Ref. 23, with permission.)
To complete our analysis of recombination products, we sequenced the junctions of molecules resulting from nonhomologous end-joining in eggs and in GV extracts in the presence of dNTPs (37). We found some products that resulted from simple ligation of the XhoI ends on the substrate. Other molecules had deletions from one or both interacting ends and showed very short matches (1-5 bp) at the junction, which were apparently sufficient to set the joining register. Taken together with other studies on end joining in egg extracts (38-40) and our observation of SSA recombination (23), these findings show that Xenopus eggs have recombination capabilities very similar to those in cultured mammalian cells. In that setting, homologous recombination competes with end-joining (41), end-joints are mediated by microhomologies (42), and the mechanism of homologous recombination is nonconservative (43), likely SSA (21).
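The short junctional matches described for end-joined products can be scored mechanically once the two parental end sequences and the join point are known. The function below is a minimal sketch of one common way to count such microhomology; the function name, the coordinate convention, and the example sequences are all hypothetical and are not taken from the sequencing study cited above.

```python
def junction_microhomology(left, right, i, j):
    """Count bases at a junction that are identical in both parents.

    The joined product is assumed to be left[:i] + right[j:]. Bases on
    either side of the join that match in the two parental sequences
    cannot be assigned to one parent and are counted as microhomology.
    """
    # Extend backward from the join through bases shared by both parents.
    back = 0
    while back < min(i, j) and left[i - 1 - back] == right[j - 1 - back]:
        back += 1
    # Extend forward from the join in the same way.
    fwd = 0
    while (i + fwd < len(left) and j + fwd < len(right)
           and left[i + fwd] == right[j + fwd]):
        fwd += 1
    return back + fwd


# Made-up parental ends that share the tetranucleotide GGAC.
left_end = "AAGCTTGGACCT"
right_end = "TTCAGGACGTAA"
print(left_end[:10] + right_end[8:])                       # joined product
print(junction_microhomology(left_end, right_end, 10, 8))  # shared bases
```

A result of zero corresponds to a join with no shared bases at the junction; values of 1-5 correspond to the short matches reported in the text.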
V. Natural Function of SSA
Because SSA recombination is inherently nonconservative, it is not suited to the task of meiotic recombination. Our findings regarding recombination during oogenesis suggest a somatic function for SSA (23). Nonconservative recombination that may well proceed via SSA has been observed in somatic cells from a wide variety of organisms. Among prokaryotic pathways, E. coli RecE- (44-46) and RecF- (47) mediated events have been suggested to proceed via SSA, and the bacteriophage λ Red pathway may do the same (although evidence to the contrary has also been produced (48)). The bacterial recE and λ redα genes encode 5′ → 3′ exonucleases (49). In the yeast Saccharomyces cerevisiae, DSB-stimulated recombination between direct repeats, both in plasmids and at chromosomal loci, apparently goes largely by SSA (50-55), and 5′ → 3′ resection is observed (56-60). The SSA model was proposed to explain extrachromosomal recombination in mammalian
cells (21), and most investigators agree that this is the predominant pathway. Similarly, recombination of DNAs introduced into cultured plant cells is nonconservative and has been attributed to SSA (61). Given the ubiquity of SSA capabilities, what function does this mechanism normally perform in cells? One hypothesis is that it provides a pathway for repairing double-strand breaks. It certainly has this capability when a substrate with appropriately placed repeats is provided. Furthermore, targeted DSBs are survivable in rad52 mutant yeast cells only when they occur in tandemly repeated sequences that are capable of SSA (53). This confers an unequivocal advantage based on repair of DNA damage. SSA is not capable of repairing breaks in unique-sequence DNA (53), which is in accordance with mechanism-based predictions. Do cells retain a repair mechanism that is useful only against breaks in repeated DNA? In higher organisms like Xenopus and mammals, tandemly repeated sequences constitute a substantial fraction of the genome (62), so this would not be a trivial contribution. Furthermore, satellite DNAs and other repeats are sequestered in heterochromatin, where their availability for interchromosomal interactions may be quite limited. SSA between sequences on opposite sides of a single DSB is a reasonable alternative. In addition, dispersed repeated sequences provide opportunities for SSA repair beyond those in tandem repeats. Recombination between dispersed repeats in the correct orientation would lead to deletion of the sequences between the repeats. This would not be tolerable in the germ line, but in somatic cells the loss of one allele in one cell may well be preferable to the loss of a chromosome or chromosome arm that would result from the failure to repair the break.
In mammalian cells a few examples of deletion events mediated by repeated sequences, such as Alu repeats, have been documented (63). The vast majority of deletion junctions, however, including those produced following intentional DSBs (64), do not have the appearance predicted for homologous recombination. Instead they show very short matches at the junctions like those made during nonhomologous end joining (63, 64). This discourages attempts to attribute to SSA a dominant function in DNA repair in mammals. Specialized functions may also be mediated by SSA. One example that we have considered is the resolution of rolling-circle tails during the amplification of ribosomal DNA in Xenopus oocytes (14, 65, 66). Discouragingly, the timing of accumulation of high levels of SSA activity in oogenesis is anticorrelated with rDNA amplification (23, 67). This observation is inconclusive, however, because the number of events required (about 10⁵ per oocyte) is much lower than we typically measure (10⁸-10⁹ events per oocyte) and might have escaped detection; and we have made no attempt to measure
SSA activity in oocytes that are at the peak of rDNA amplification, where sufficient activity might be present. An alternative view is that SSA per se does not have a natural function. Instead it may reflect activities of enzymes involved in other cellular processes, but accessible to exogenous DNA. For example, 5′ → 3′ exonuclease, DNA polymerase, and DNA ligase are all activities involved in normal DNA replication (68). In this view, a homologous recombination outcome is seen as a consequence of substrate design, rather than a normal cellular intent. The issue of a function for SSA will be decided only by identifying genes and enzymes required for the process and examining the consequences of disabling them. In yeast, no genes have yet been identified as being uniquely required for SSA recombination; but if this is possible, the phenotypes of the corresponding mutants will be very informative. In Xenopus, techniques of standard genetic analysis are not readily available, but we still have the advantage of large, manipulable cells to work with. In some cases, it has been possible to disrupt the function of specific proteins by injecting antibodies raised against them (69-71). Our current approach is to purify required components from recombination-competent GV extracts, to characterize their activities, to raise antibodies that block their function, then to examine the effects on recombination and on other aspects of DNA metabolism of injecting the antibodies into oocytes and embryos.
VI. A Model Gene-targeting Experiment
Whatever its natural function, SSA is the recombination mechanism that operates most efficiently on extrachromosomal DNAs in mammalian cells, just as in frog oocytes. Given this fact, we can speculate as to why targeted gene replacements are so inefficient in mammalian cells. SSA is dependent on exonucleolytic resection of the parental DNAs to produce single-stranded tails that can ultimately anneal with each other. In a typical gene targeting experiment (72, 73), the introduced DNA (vector) is linear, but the chromosomal target site is uninterrupted and is thus inaccessible to exonuclease (Fig. 14A). In this configuration, the vector is degraded, while the target is unchanged. In fact, the rare successful targeting events seem to occur by a conservative recombination mechanism, not SSA (74). Now examine what would happen if the target could be cleaved specifically at the place where recombination was desired to occur (Fig. 14B). Exonuclease would enter the target ends and the vector ends, generating single-stranded tails that would support two annealing events and result in the incorporation of the vector fragment between heteroduplex joints. In this view, making a targeted DSB
FIG. 14. Diagram of SSA activities in gene targeting experiments. Heavy lines represent the strands of the incoming vector DNA, the thinner lines the strands of the chromosomal target. (A) Normal targeting protocol. (B) Targeting with a DSB at the target. See Section VI for details.
or gap would greatly stimulate homologous recombination between exogenous DNA and a chromosomal sequence. We used the oocyte system to perform a model targeting experiment that tests this prediction (75). As stated above, circular DNAs injected into oocytes are inert in recombination; they are like recalcitrant chromosomal sites in this respect. A cleavage site for the specific endonuclease, I-SceI, was placed in a circular target molecule, and injection of the enzyme showed that introduction of a DSB in the oocytes is sufficient to stimulate both intra- and intermolecular recombination. The intermolecular substrates were designed to mimic a targeting experiment as closely as possible. Both I-SceI cleavage and exonucleolytic resection occurred even though the circular DNA was allowed to assemble into chromatin before the enzyme and recombination partner were injected (75). Similar experiments with site-specific endonucleases have been performed in yeast (50-53) and in cultured mammalian (76-78) and plant (79) cells. Both extrachromosomal and chromosomal homologous recombination were stimulated substantially. Cleavage of an I-SceI site in a mouse chromosome stimulated homologous integration of an exogenous linear DNA at least 100-fold (77, 78). These studies show the feasibility of inducing recombination with targeted DSBs, but the reagents do not yet have much general applicability. In order to stimulate targeted recombination with I-SceI, for example, it is necessary first to insert the recognition site at the desired locus using standard, low-efficiency targeting techniques. As cleavage reagents are
developed that can be directed to arbitrary, but specific targets, they will find immediate application in experimental gene manipulation.
VII. Summary
Appropriately designed DNA substrates undergo very efficient homologous recombination after injection into the nuclei of Xenopus laevis oocytes. The requirements for this process are that the substrate be linear, that it have direct repeats to support recombination, and that these repeats be at or very near the molecular ends. Taking advantage of direct nuclear injection, the large amounts of DNA processed in a single oocyte, and the accessibility of recombination intermediates, we were able to analyze the mechanism of recombination in detail. Molecular ends are resected by a 5′ → 3′ exonuclease activity. When complementary sequences are exposed from two ends, they anneal. Continued 5′ → 3′ degradation removes the redundant strands; the 3′ ends pair with their complements and can be extended by DNA polymerase to fill any gap left by the exonuclease. Joining of strands by DNA ligase completes the process. This mechanism is nonconservative, in that only one of the two original repeats is retained, and it has been dubbed single-strand annealing, or SSA. The capability for SSA accumulates during the later phases of oogenesis and persists into the egg. This pattern suggests that, like many activities of full-grown oocytes, SSA is stored for use during embryogenesis. The same or a very similar mechanism is prevalent in many other species, including bacteria, yeast, plants, and mammals, where it often provides the predominant mode of recombination of extrachromosomal DNA. Lessons learned about SSA are applicable to methods of gene manipulation. It is plausible that SSA has a normal function in the repair of double-strand breaks, but proof of this awaits identification of genes and enzymes uniquely involved in this style of recombination.
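The nonconservative outcome summarized above can be illustrated with a deliberately simple string model. This is a sketch under stated assumptions rather than a description of the enzymology: the function name and example sequence are hypothetical, the repeats are assumed to sit exactly at the molecular ends and to be identical (no mismatches), and resection, annealing, gap filling, and ligation are collapsed into a single step.

```python
def ssa_product(linear, repeat_len):
    """Collapse the terminal direct repeats of a linear molecule.

    Hypothetical string model: the first and last `repeat_len` bases of
    `linear` are identical direct repeats. Annealing through the repeat
    joins the two ends, so the circular product keeps one repeat copy
    and loses the other.
    """
    if not 0 < repeat_len <= len(linear) // 2:
        raise ValueError("repeat length must fit twice within the molecule")
    if linear[:repeat_len] != linear[-repeat_len:]:
        raise ValueError("terminal sequences are not identical direct repeats")
    # Circular product, written starting from the retained repeat.
    return linear[:-repeat_len]


substrate = "ACGTAC" + "ttttgggg" + "ACGTAC"  # repeat, unique DNA, repeat
product = ssa_product(substrate, 6)
print(product, len(substrate) - len(product))
```

The product is shorter than the substrate by exactly one repeat length, which is the sense in which the mechanism is nonconservative. A model that allowed mismatched repeats would also have to decide which parental marker survives at each site, as discussed in Section III.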
ACKNOWLEDGMENTS
I am particularly grateful to the people in my laboratory, past and present, whose work is summarized here: Ed Maryon, Sunjoo Jeong-Yu, Geneviève Pont-Kingdon, Chris Lehman, Renée Dawson, David Segal, Mike Clemens, Jon Trautman, Elzbieta Grzesiuk, and Scott Wright. This work has relied on support by research grants from the National Science Foundation (DCB-8718227, DMB-9019139, MCB-9315959) and from the National Institutes of Health (GM22232, GM41747, GM50739). I thank David Segal, Renée Dawson, Geneviève Pont-Kingdon, and Shawn Christensen for thoughtful comments that helped improve the manuscript. I am grateful to the people in my laboratory and to my colleagues in the recombination field for continuing discussions and experiments over many years.
REFERENCES
1. R. S. Hawley, in “Genetic Recombination” (R. Kucherlapati and G. R. Smith, eds.), p. 497. American Society for Microbiology, Washington, DC, 1988.
2. E. C. Friedberg, G. C. Walker and W. Siede, “DNA Repair and Mutagenesis.” ASM Press, Washington, DC, 1995.
3. J. B. Gurdon and D. A. Melton, ARGen 15, 189 (1981).
4. J. B. Gurdon and M. P. Wickens, Methods Enzymol. 101, 370 (1983).
5. D. G. Attardi, E. Mattoccia and G. P. Tocchini-Valentini, Nature 270, 754 (1977).
6. R. M. Benbow and M. R. Krauss, Cell 12, 191 (1977).
7. D. Carroll, S. H. Wright, R. S. Ajioka and C. E. J Hussey, JMB 178, 155 (1984).
8. A. Colman, in “Transcription and Translation” (B. D. Hames and S. J. Higgins, eds.), p. 271. IRL Press, Washington, DC, 1984.
9. A. Kressmann and M. L. Birnstiel, in “Transfer of Cell Constituents into Eukaryotic Cells” (J. E. Celis, A. Graessmann and A. Loyter, eds.), p. 383. Plenum, New York, 1980.
10. D. L. Stephens, T. J. Miller, L. Silver, D. Zipser and J. E. Mertz, Anal. Biochem. 114, 299 (1981).
11. D. Carroll, PNAS 80, 6902 (1983).
12. D. Carroll, S. H. Wright, R. K. Wolff, E. Grzesiuk and E. B. Maryon, MCBiol 6, 2053 (1986).
13. E. Maryon and D. Carroll, MCBiol 9, 4862 (1989).
14. E. Maryon and D. Carroll, MCBiol 11, 3278 (1991).
15. J.-P. Abastado, S. Darche, F. Godeau, B. Cami and P. Kourilsky, PNAS 84, 6496 (1987).
16. A. H. Wyllie, R. A. Laskey, J. Finch and J. Gurdon, Dev. Biol. 64, 178 (1978).
17. E. Grzesiuk and D. Carroll, NARes 15, 971 (1987).
18. E. Maryon and D. Carroll, MCBiol 11, 3268 (1991).
19. C. M. Radding, in “Genetic Recombination” (R. Kucherlapati and G. R. Smith, eds.), p. 193. American Society for Microbiology, Washington, DC, 1988.
20. S. C. West, Cell 76, 9 (1994).
21. F.-L. Lin, K. Sperle and N. Sternberg, MCBiol 4, 1020 (1984).
22. G. Pont-Kingdon, R. J. Dawson and D. Carroll, EMBO J. 12, 23 (1993).
23. C. W. Lehman, M. Clemens, D. Worthylake, J. K. Trautman and D. Carroll, MCBiol 13, 6897 (1993).
24. S. Jeong-Yu and D. Carroll, MCBiol 12, 5426 (1992).
25. R. Cortese, R. Harland and D. Melton, PNAS 77, 4147 (1980).
26. S. Jeong-Yu and D. Carroll, MCBiol 12, 112 (1992).
27. C. W. Lehman and D. Carroll, PNAS 88, 10840 (1991).
28. C. W. Lehman and D. Carroll, Anal. Biochem. 211, 311 (1993).
29. D. Carroll, C. W. Lehman, S. Jeong-Yu, P. Dohrmann, R. J. Dawson and J. K. Trautman, Genetics 138, 445 (1994).
30. C. W. Lehman, S. Jeong-Yu, J. K. Trautman and D. Carroll, Genetics 138, 459 (1994).
31. R. S. Lahue, K. G. Au and P. Modrich, Science 245, 160 (1989).
32. J. J. Holmes, S. Clark and P. Modrich, PNAS 87, 5837 (1990).
33. D. C. Thomas, J. D. Roberts and T. A. Kunkel, JBC 266, 3744 (1991).
34. J. N. Dumont, J. Morphol. 136, 153 (1972).
35. L. D. Smith, Development 107, 685 (1989).
36. W. Goedecke, W. Vielmetter and P. Pfeiffer, MCBiol 12, 811 (1992).
37. C. W. Lehman, J. K. Trautman and D. Carroll, NARes 22, 434 (1994).
38. P. Pfeiffer and W. Vielmetter, NARes 16, 907 (1988).
39. S. Thode, A. Schäfer, P. Pfeiffer and W. Vielmetter, Cell 60, 921 (1990).
40. P. Pfeiffer, S. Thode, J. Hancke and W. Vielmetter, MCBiol 14, 888 (1994).
41. D. B. Roth and J. H. Wilson, PNAS 82, 3355 (1985).
42. D. Roth and J. Wilson, in “Genetic Recombination” (R. Kucherlapati and G. R. Smith, eds.), p. 621. American Society for Microbiology, Washington, DC, 1988.
43. S. Subramani and B. L. Seaton, in “Genetic Recombination” (R. Kucherlapati and G. R. Smith, eds.), p. 549. American Society for Microbiology, Washington, DC, 1988.
44. L. Symington, P. Morrison and R. Kolodner, JMB 186, 515 (1985).
45. Z. Silberstein, M. Shalit and A. Cohen, Genetics 133, 439 (1993).
46. Z. Silberstein, Y. Tzfati and A. Cohen, J. Bact. 177, 1692 (1995).
47. N. K. Takahashi, K. Yamamoto, Y. Kitamura, S.-Q. Luo, H. Yoshikura and I. Kobayashi, PNAS 89, 5912 (1992).
48. T. Yokochi, K. Kusano and I. Kobayashi, Genetics 139, 5 (1995).
49. S. C. West, in “Nucleases” (S. M. Linn, R. S. Lloyd and R. J. Roberts, eds.), 2nd ed., p. 145. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1993.
50. N. Rudin and J. E. Haber, MCBiol 8, 3918 (1988).
51. N. Rudin, E. Sugarman and J. E. Haber, Genetics 122, 519 (1989).
52. A. Plessis, A. Perrin, J. E. Haber and B. Dujon, Genetics 130, 451 (1992).
53. B. A. Ozenberger and G. S. Roeder, MCBiol 11, 1222 (1989).
54. J. Fishman-Lobell, N. Rudin and J. E. Haber, MCBiol 12, 1292 (1992).
55. C. Mézard and A. Nicolas, MCBiol 14, 1278 (1994).
56. J. A. Haber, R. H. Borts, B. Connolly, M. Lichten, N. Rudin and C. I. White, This Series 35, 209 (1988).
57. H. Sun, D. Treco, N. P. Schultes and J. W. Szostak, Nature 338, 87 (1989).
58. C. I. White and J. E. Haber, EMBO J. 9, 663 (1990).
59. H. Sun, D. Treco and J. W. Szostak, Cell 64, 1155 (1991).
60. N. Sugawara and J. E. Haber, MCBiol 12, 563 (1992).
61. M. Baur, I. Potrykus and J. Paszkowski, MCBiol 10, 492 (1990).
62. B. Lewin, “Genes V.” Oxford Univ. Press, Oxford, 1994.
63. M. Meuth, in “Mobile DNA” (D. E. Berg and M. M. Howe, eds.), p. 833. American Society for Microbiology, Washington, DC, 1989.
64. J. W. Phillips and W. F. Morgan, MCBiol 14, 5794 (1994).
65. D. Hourcade, D. Dressler and J. Wolfson, PNAS 70, 2926 (1973).
66. J.-D. Rochaix, A. P. Bird and A. Bakken, JMB 87, 473 (1974).
67. L. W. Coggins and J. G. Gall, J. Cell Biol. 52, 569 (1972).
68. S. Waga, G. Bauer and B. Stillman, JBC 269, 10923 (1994).
69. A. E. Warner, S. C. Guthrie and N. B. Gilula, Nature 311, 127 (1984).
70. A. Warner and J. B. Gurdon, J. Cell Biol. 104, 557 (1987).
71. C. V. E. Wright, K. W. Y. Cho, J. Hardwicke, R. H. Collins and E. M. De Robertis, Cell 59, 81 (1989).
72. M. Capecchi, Science 244, 1288 (1989).
73. B. H. Koller and O. Smithies, Annu. Rev. Immunol. 10, 705 (1992).
74. S. L. Pennington and J. H. Wilson, PNAS 88, 9498 (1991).
75. D. J. Segal and D. Carroll, PNAS 92, 806 (1995).
76. P. Rouet, F. Smih and M. Jasin, PNAS 91, 6064 (1994).
77. P. Rouet, F. Smih and M. Jasin, MCBiol 14, 8096 (1994).
78. A. Choulika, A. Perrin, B. Dujon and J.-F. Nicolas, MCBiol 15, 1968 (1995).
79. H. Puchta, B. Dujon and B. Hohn, NARes 21, 5034 (1993).
Hormonal and Cell-specific Regulation of the Human Growth Hormone and Chorionic Somatomammotropin Genes

NORMAN L. EBERHARDT,*,‡ SHI-WEN JIANG,* ALLAN R. SHEPARD,† ANDREW M. ARNOLD* AND MIGUEL A. TRUJILLO*

Endocrine Research Unit, Departments of *Medicine, †Physiology and ‡Biochemistry/Molecular Biology, Mayo Clinic, Rochester, Minnesota 55905
I. Members of the Growth Hormone Gene Family
   A. Evolution of the GH/CS Genes
   B. Receptors for the Placental GH/CS Genes
   C. Physiological Actions of Placental GH/CS
II. Control of hGH Gene Expression by cAMP
   A. The GHRH/Somatostatin-cAMP-PKA Axis
   B. cAMP/PKA-Responsive Transcription Factors
   C. Involvement of GHF-1 in Mediating cAMP Response
   D. cAMP Response Elements on the GH Promoter
   E. cAMP Response Elements on GH-Related Promoters
III. Control of GH/CS Gene Expression by Thyroid Hormone
   A. Positive Thyroid Hormone Regulation of the hCS Gene
   B. TR-induced DNA Bending and T3 Responsiveness
   C. Negative T3 Regulation of the hGH Gene
   D. Negative Control Mechanisms
IV. Cell-specific Control of Placental GH/CS Gene Expression
   A. The Placental Enhancer, CSEn
   B. A Placental Factor, CSEF-1, Controls CSEn Function
   C. The hCS Initiator Element, InrE
   D. Negative Control of Placental hGH/CS Genes in Pituitary
   E. Hormonal Control of Placental hGH/CS Gene Expression
References
Progress in Nucleic Acid Research and Molecular Biology, Vol. 54
Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.
We have been interested in the regulation of human growth hormone (hGH) and chorionic somatomammotropin (hCS) gene expression for the last decade. In this article, we summarize the knowledge gained from studies in our own and other laboratories, emphasizing the hormonal and cell-specific regulation of these genes. The pituitary hGH-1 and the placental hCS-1, hCS-2, and hGH-2 genes are unique in primates, because they have evolved very recently and have maintained ~95% nucleotide sequence identity. Consequently, they form a particularly interesting gene family for the study of cell-specific regulatory mechanisms, and may represent an important set of genes that played a unique role in primate evolution. In contrast to pituitary hGH-1, the exact functional roles of the placental GH/CS genes have not been firmly established. Indeed, it has been questioned whether placental hGH or hCS has essential functions, at least under normal physiologic conditions. Accordingly, the placental GH/CS genes represent a fascinating evolutionary event that has forced us to consider some rather startling scientific paradoxes, including convergent evolutionary mechanisms, and to consider how nearly identical genes became so differentially regulated. To provide some perspective on, and rationale for, our interest in the regulation of the GH/CS genes, we also summarize the status of the physiology of placental GH/CS action in the first section below.
I. Members of the Growth Hormone Gene Family
A. Evolution of the GH/CS Genes

GH and prolactin (PRL) are vertebrate hormones that originated from a common progenitor gene ~350 million years ago (1). Subsequent duplication of the GH gene in primates or the PRL gene in rodents and ruminants produced the chorionic somatomammotropin (CS, also known as placental lactogen, PL) genes (1). Thus hGH and hCS mRNA sequences are 93.5% identical, whereas hPRL shares 42 and 41% identity with hGH and hCS mRNAs, respectively. However, rodent and ruminant CS mRNA sequences are ~50% similar to their cognate PRL mRNAs, but only ~35% similar to their cognate GH mRNAs (1, 2). This raises the question of whether the mammalian CS genes represent an example of convergent evolution. In mice, two additional members of the GH/PRL family, proliferin and proliferin-related protein, are structurally distinct from CS (3), but may have similar functions (4).

The hGH/CS locus contains two GH genes and three CS genes that span ~50 kb of DNA on chromosome 17q22/25 (5, 6). The genes are arranged as shown in Fig. 1 with the pituitary hGH-1 gene occupying the 5' end of the locus. The hCS-5 gene appears to be a pseudogene based on a mutation of the 5'-splice donor site at the exon II/intron B boundary that interferes with intron processing (6). The remaining three genes, hCS-1, hGH-2, and hCS-2, are active and expressed exclusively in the placenta. The hGH-2 gene encodes a placental-specific hGH variant that contains 13 amino-acid substitutions relative to hGH-1 (7). The placental-specific hCS-1 and hCS-2 genes are both functional and encode identical polypeptides (8), which share ~85% amino-acid similarity with hGH-1 (1). All of the genes contain four introns, and the intron/exon boundaries have been highly conserved in the hGH/CS genes and PRL gene (1). The hGH/CS genes have evolved via a series of duplications, insertions, and concerted mechanisms that are estimated to have occurred between 15 and 60 million years ago and may have involved recombination events with the multiple repetitive Alu elements present on the locus (1, 5, 6). Consequently, the individual hGH/CS genes maintain very high sequence similarity (93.5-96%) over their exonic, intronic, and much of their 5'- and 3'-flanking DNA.

FIG. 1. Schematic representation of the hGH/CS gene locus localized on chromosome 17q22-qter. The locations of placental-specific enhancers (En5, En1, and En2) and pituitary-specific repressor elements (P1-P4) that regulate expression of the CS-5, CS-1, GH-2, and CS-2 genes in placenta and pituitary, respectively, are indicated.

Pituitary hGH-1 transcripts are spliced into three different mature mRNAs, resulting in the major active 22-kDa GH, a 20-kDa variant lacking 15 amino acids from exon 3, and a 17.5-kDa variant lacking 40 amino acids due to the deletion of exon 3 (2, 9). The hGH-2 (GH variant), hCS-1, and hCS-2 genes are expressed in placental syncytiotrophoblasts (2, 9). CS mRNA is expressed uniformly in all syncytiotrophoblasts, whereas GH-2 mRNA is expressed in only a subpopulation of cells (10), suggesting differential control of these genes. The 22-kDa GH-2, unlike GH-1, can undergo glycosylation to yield a 25-kDa isoform. Also, because the amino-acid differences between GH-1 and GH-2 are largely nonconservative, the biological properties of the resulting proteins may vary significantly (9).
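As a rough consistency check on the isoform sizes quoted above, the following minimal sketch subtracts the mass of the missing residues from the 22-kDa form. It assumes the common rule of thumb of about 110 Da per amino-acid residue, which is not taken from the text; the function and constant names are purely illustrative.

```python
# Rule-of-thumb average mass of one amino-acid residue (an assumption,
# not a value from the chapter).
AVG_RESIDUE_MASS_KDA = 0.110

def estimated_mass_kda(full_length_kda: float, residues_removed: int) -> float:
    """Estimate an isoform mass after removing a number of residues."""
    return full_length_kda - residues_removed * AVG_RESIDUE_MASS_KDA

# Residues missing from each variant, as stated in the text.
VARIANTS = {"20-kDa variant": 15, "17.5-kDa variant": 40}

for name, removed in VARIANTS.items():
    estimate = estimated_mass_kda(22.0, removed)
    print(f"{name}: estimated {estimate:.2f} kDa")
```

Under this assumption the estimates fall close to the nominal 20-kDa and 17.5-kDa sizes, which is all such a back-of-the-envelope calculation can be expected to show.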
Another hGH isoform (hGH-V2) with a completely different carboxyl terminus arises via an alternative splicing event. The 26-kDa hGH-V2 is not glycosylated and is bound to the membrane of the syncytium as a membrane-associated, nonsecretory protein (9). Because of these differences, GH-2 may have potentially unique actions that are important for pregnancy. The hCS-1 and hCS-2 genes are 98% homologous and encode identical mature proteins, but differ by one amino acid in their signal peptides. Although the hCS-5 gene was thought to be a pseudogene (6, 11), recent studies indicate that the hCS-5 gene produces five alternatively spliced mRNAs and some additional minor forms (12). Most of these mRNAs lack exon 2 and are nonfunctional, but some retain the potential for synthesis of a functional gestational hormone.
B. Receptors for the Placental GH/CS Genes

GH, PRL, and CS receptors are members of the cytokine/hematopoietin receptor family, a group of cell-surface receptors that induce their biological responses via signal transduction (13, 14). The hGH receptor (hGHR) consists of a 246-amino-acid, ligand-binding extracellular domain, a single 24-residue transmembrane domain, and a cytoplasmic region of 250 amino acids. Members of the GH/PRL receptor family are characterized by overall low amino-acid similarity. However, they retain a significant (14-25%) identity over ~200 amino acids of their extracellular domains (15). The different receptors also maintain characteristic pairs of N-terminal cysteine residues and, except for the GH receptor, a WSXWS motif (15). In the intracellular domain, three somewhat conserved regions known as Box 1, Box 2, and Box 3 are found in many members of this receptor family, and proline residues in GHR Box 1 are required for signal transduction (16). The intracellular region of these receptors uniformly lacks typical tyrosine kinase and receptor-G-protein coupling motifs (13).

Interestingly, the GH receptor dimerizes on ligand binding (13, 14). The GH molecule has two binding sites, enabling it to bind two separate receptor molecules (16). The receptor dimer then associates with the tyrosine kinase JAK2, which leads to the phosphorylation of intracellular substrates, including both JAK2 and the GH receptor. With the phosphorylation of mitogen-activated protein (MAP) kinases and STAT91/p91 (a Src-homology domain-2-containing latent cytoplasmic transcription factor), these proteins become translocated to the nucleus and mediate some of the downstream actions of GH. GH has been implicated in the induction of transcription of c-fos, IGF-I, and the serine protease inhibitor, Spi 2.1. The induction of c-fos is known to be mediated by STAT91/p91. Tyrosine phosphorylation of a nuclear factor is required for GH-induced binding to a GH response element, but STAT91/p91 is not present in the binding complex necessary to
activate Spi 2.1 transcription (17). This suggests that there may be STAT91/p91-dependent and -independent pathways. Also, the phosphoinositol pathway may be involved in mediating certain GH intracellular effects (18).

Although hCS and hGH share 85% primary amino-acid sequence identity, hCS binds the hGH receptor only 1/2300 as well (19). Specific hCS receptors have been found in fetal liver and skeletal muscle, whereas hGH receptors were found only in fetal liver (20). An alternatively spliced hGHR lacking exon 3 (21) was initially thought to encode a specific receptor for hCS or hGH-2; however, further investigation revealed that the hGHR splice variant bound hGH with the same affinity as the normal hGHR (22). Also, the expression of these GHR isoforms is ubiquitous among human tissues, although the relative proportions of the receptors may vary (23).

Evidence for unique bovine CS receptors (bCSR) with a 100-fold lower affinity for bGH binding has been found in endometrial membranes from midpregnant heifers (24). Bovine CS binds to the liver bGH receptor with a 1:1 stoichiometry even when the bGH binding site is blocked by an antibody, indicating that bCS binds to a different domain (25). Although this report is intriguing, its significance is unknown, because no specific 125I-labeled bGH binding could be measured in bovine endometrium (24). Also, bCS and bPRL bind with equal affinity to a recombinant bPRLR (26). Taken together, these results suggest that bCS may mediate responses through a combination of bGH, bPRL, and unique bCS receptors or possibly heterodimeric forms of these receptors (27).

Preferential binding of ovine CS (oCS) has been observed in fetal liver membranes (28, 29), where no specific binding of oGH or oPRL occurs until parturition (29). Similar results have been observed in fetal fibroblasts, suggesting that oCS may exert anabolic effects on the fetus (30). Cross-linking and immunoprecipitation analyses of 125I-labeled receptor complexes with monoclonal antibodies against the extracellular domain of the rabbit GHR indicate that oCS and oGH bind an identical receptor in fetal liver microsomes obtained at 125-135 days of gestation (31). Interestingly, the oGHR mRNA present during midgestation (60-120 days) is larger (5.8 compared to 5.5 kbp) and differs within the 5'-untranslated region from the oGHR mRNA present at 135 days of gestation or in adult liver (32). This suggests there is a developmental switch in the structure of oGHR, allowing the receptor to preferentially bind oGH and preparing the fetus to respond to GH postnatally (32).
C. Physiological Actions of Placental GH/CS

Both hCS and hGH-2 are found in the maternal circulation and their concentrations increase throughout pregnancy, reaching their respective
maxima near term (hCS = 5-15 µg/mL; hGH-2 = 15 ng/mL). In the second half of pregnancy, GH-2 progressively supplants GH-1, which eventually is suppressed to undetectable levels in the maternal circulation (9). Unlike pituitary GH secretion, placental GH secretion is not stimulated by GHRH or inhibited by IGF-I. Although the exact physiological role of GH-2 is unknown, its effects are largely somatogenic (33). Increases in GH-2 levels during pregnancy are paralleled by increased IGF-I during the second half of gestation that declines after birth (1), suggesting that GH-2 affects maternal metabolism. Significantly lower GH-2 and IGF-I concentrations have been measured in pregnancies with intrauterine growth retardation (34) and pathological pregnancies involving disorders of the fetoplacental unit (35). In normal pregnancies, birth weight is positively correlated with maternal IGF-I values (35). Also, hGH-2 may evoke an autocrine/paracrine effect on the placenta, in that syncytia exhibit increased hGHR levels from 10 to 12 weeks of gestation through term (11). Accordingly, GH-2 may act indirectly by maintaining high levels of energy-rich nutrients in the maternal circulation that are required for development of the fetoplacental unit (34).

CS appears to have both lactogenic and somatogenic effects (34). CS may not have a primary role in mammary development, because individuals with isolated deficiencies of CS, GH, or prolactin appear to have normal breast development and lactation (36). However, in ruminants, recombinant bCS significantly increased the DNA content of mammary glands in steroid-primed peripubertal, nonpregnant heifers and induced mammary secretions (37). Recombinant bPRL was similarly lactogenic but did not exhibit the mitogenic effect of bCS (37). Older studies in humans have indicated that fasting during pregnancy results in increased CS levels (38) and increased lipolysis (39). Fasting, glucose, and other nutritional factors appear to regulate the expression of the CS receptor in fetal and maternal sheep liver, suggesting that CS plays a role in the metabolic adaptation of the mother and fetus to nutritional deprivation and stress (40). In nonpregnant sheep, increases in nonesterified fatty acids (NEFAs), glucose, and urea are consistent with gluconeogenic and lipolytic activities of CS (41). However, recombinant, nonglycosylated bCS had no effect on NEFA, glucose, or urea concentrations in either pregnant or nonpregnant cows (42), although IGF-I levels were increased and IGF-I binding protein 2 (IGFBP-2) levels were decreased. In addition to hGH-2, hCS has been implicated in increasing maternal IGF-I levels in the third trimester of human pregnancy (43). A recent study (44) indicates that CS directly regulates pancreatic islet function in humans, rats, and mice, suggesting that CS is responsible for the elevated insulin secretory response observed in normal pregnancy and may be implicated in pregnancy-associated diabetes.
Pituitary GH is present in the fetal pituitary from at least 12 weeks of gestation and increases until term (1). There are specific binding sites for hGH in isolated second-trimester fetal hepatocytes (20). The affinity of the receptor for hGH was considerably lower than that observed for the postnatal GHR. Both hGH and hGH-2 have equal ability to displace 125I-labeled hGH from the receptor, indicating there is not a specific hGH-2 receptor. In addition, no hGH receptor was isolated from fetal skeletal muscle (20). hGH potentiates the release of IGF-I and stimulates DNA synthesis in fetal hepatocytes (45). This latter action was neutralized by an antibody recognizing IGF-I and IGF-II, indicating that IGFs mediate the mitogenic action of GH. Nevertheless, the absence of endogenous fetal GH does not produce intrauterine growth retardation (36). Thus, GH may have a restricted anabolic role in the fetus (1).

Although hCS is predominantly a maternal hormone, it is also present in the fetal circulation at levels 1/50th to 1/100th that of the maternal circulation (43). Treatment of human fetal fibroblasts with CS stimulates release of IGF-I, IGF-II, and IGFBP in vitro (46). Also, infusion of 1.2 mg/day of oCS from day 122 through 135 of gestation into fetal sheep vasculature increased fetal serum IGF-I concentrations but did not affect serum IGFBP-2 concentrations (47). These results suggest that many of the somatogenic actions of CS are mediated through IGF (43); however, definitive evidence that CS controls fetal growth in utero is lacking. Significant correlations have been observed between maternal serum CS and fetal weight in sheep. In ewes with singleton pregnancies, placental weight plus fetal oCS levels could explain 81% of the variation seen in fetal weight, suggesting that maternal and fetal oCS release are independently controlled and that fetal oCS affects fetal growth by a mechanism unrelated to placental size (48). Fasting of the pregnant ewe reduces the number of CS receptors in ovine fetal liver, which may contribute to the depletion of fetal liver glycogen stores (28). Accordingly, CS may play a role in the pathogenesis of the fetal growth retardation that accompanies maternal caloric deprivation (28). Fasting has also been shown to impact the fetal somatotropic axis, suggesting that it is functional in utero and that it plays a role during fetal adaptation to nutritional limitation (49).

It is debatable whether hCS and hGH-2 are essential hormones, because individuals lacking the hCS/GH-2 genes have histories of normal pregnancies and normally developed offspring (50, 51). However, due to the partially redundant nature of the growth hormone gene family, their somewhat related bioactivities, and potential for cross-talk at the receptor level, CS and GH-2 deficiency may be functionally compensated by GH or PRL. In addition, it cannot be excluded that the alternatively spliced hCS-5 gene transcripts (12) may be involved in essential functions, because this gene is
present in individuals that lack the GH-2, CS-1, and CS-2 genes (51). Taken together, the above data provide strong evidence that the placental GH/CS genes have diverse functions throughout pregnancy, creating homeorrhetic controls that allow maternal metabolism to incur only minimal fluctuations, thereby ensuring the conceptus a continual supply of nutrients.
II. Control of hGH Gene Expression by cAMP
A. The GHRH/Somatostatin-cAMP-PKA Axis

Control of GH gene expression by cAMP in anterior pituitary somatotrophs occurs through a cascade mechanism initiated primarily by the action of two antagonistic hypothalamic neurohormones (52). Somatostatin inhibits GH release, and growth hormone-releasing hormone (GHRH) regulates both GH synthesis and secretion. Somatostatin binds to a cell-surface seven-transmembrane-helix G-protein-coupled receptor (53) and works by inhibiting protein kinase A (PKA) activity (54) or modulating the steady-state levels of GHRH (55). GHRH exerts its influence over GH transcription indirectly by binding to a cell-surface seven-transmembrane-helix G-protein-coupled receptor (GHRHR) that activates adenylate cyclase (56). Evidence for GHRH-GHRHR involvement in GH synthesis came from studies of transgenic mice and primary cultures of acromegalic pituitary tumor cells. GHRH overexpression in vivo leads to increased GH synthesis and pituitary hyperplasia in transgenic mice (57). Mutations in the GHRH-coupled Gs subunit gene (gsp oncogene) have been identified as the cause of constitutive activation of adenylate cyclase in a small proportion (<25%) of GH-secreting human pituitary tumors (58, 59). Subsequent to GHRHR-mediated Gs protein activation, adenylate cyclase catalyzes the conversion of ATP to cAMP. Cytoplasmic cAMP binds to the regulatory subunits of PKA, thereby releasing the catalytic subunits (54). The catalytic subunits of PKA diffuse into the nucleus (60), where they specifically phosphorylate a variety of nuclear transcription factors and confer transcriptional transactivation on genes that contain cAMP response elements (CREs).
B. cAMP/PKA-Responsive Transcription Factors
PKA-responsive transcription factors comprise three subfamilies: jun/AP-1, AP-2, and CREB/ATF (reviewed in 61). These subfamilies bind to a wide array of promoter elements and initiate transcription in a cAMP-dependent fashion. The CREB/ATF and jun/AP-1 families belong to the basic region/leucine zipper (bZIP) class of DNA-binding proteins (62). The
bZIP proteins bind to variations of the palindromic sequence TGACGTCA, whereas AP-2 recognizes the TCCCCANGCG sequence. Cyclic AMP-responsive genes usually contain these or highly related cis elements in their promoter regions. The importance of CREB in somatotrophic cell development and GH gene expression was indicated by transgenic mice expressing a dominant negative mutant CREB that resulted in anterior pituitary somatotroph hypoplasia and a dwarf phenotype (63). One of the targets of CREB action may be the GHF-1 (also referred to as Pit-1) gene, which contains autoregulatory GHF-1 and cAMP response elements (64).
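The sense in which TGACGTCA is palindromic, and the role of the N position in the AP-2 sequence, can be illustrated with a short sketch. The helper names below are illustrative only and the example site passed to the consensus check is hypothetical; the code simply tests whether a site equals its own reverse complement and whether a site matches a consensus in which N stands for any base.

```python
# Complement table; N is kept as N so consensus strings can be reversed too.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_palindromic(seq: str) -> bool:
    """True when a site reads identically 5'->3' on both strands."""
    return seq.upper() == reverse_complement(seq)

def matches_consensus(site: str, consensus: str) -> bool:
    """Compare a site with a consensus in which N matches any base."""
    if len(site) != len(consensus):
        return False
    return all(c == "N" or c == s
               for s, c in zip(site.upper(), consensus.upper()))

CRE = "TGACGTCA"          # palindromic CRE quoted in the text
AP2 = "TCCCCANGCG"        # AP-2 recognition sequence quoted in the text

print(is_palindromic(CRE))                   # the CRE equals its reverse complement
print(reverse_complement("CGTCA"))           # the half-site as read on the other strand
print(matches_consensus("TCCCCATGCG", AP2))  # hypothetical site with T at the N position
```

Because the full CRE is its own reverse complement, each strand carries one CGTCA half-site, which is why a partial CRE can be described as a half-site.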
C. Involvement of GHF-1 in Mediating cAMP Response
GHF-1 is a pituitary-specific protein (56, 65) that is involved in the development of somatotrophs, lactotrophs, and thyrotrophs. It is responsible for GH, PRL, thyroid-stimulating hormone (TSH), GHRH, and autologous GHF-1 gene transcription. GHF-1 expression is critical for somatotroph development and cell-specific GH expression because endogenous GHF-1 gene mutations result in a dwarf phenotype (66). GH synthesis coincides temporally and spatially with the onset of GHF-1 expression (67). PKA may also phosphorylate GHF-1 and affect its DNA-binding status (68).
D. cAMP Response Elements on the GH Promoter

Efforts to identify cAMP-responsive elements in GH genes focused on the proximal 5'-flanking region (69-71). The hGH cAMP-responsive region was localized to nucleotides -212 to -83 by Brent et al. (71), whereas Dana and Karin (69) localized the region to within 82 bp upstream of the transcriptional start site. Copp and Samuels (70) localized the rGH cAMP-responsive region to nucleotides -104 to +11 relative to the transcription start site. This suggested involvement of GHF-1 in the cAMP response, because all the identified regions contained GHF-1 binding sites, although elimination of these sites did not extinguish cAMP responsiveness of the hGH promoter (69). GHF-1 mediation of cAMP responsiveness was subsequently shown for the GHF-1-responsive hTSHβ gene (72) and the normally GHF-1-unresponsive renin gene (73). Also, the cAMP-responsive regions of the GH promoters did not contain canonical CREs, and the AP-2 binding sites on the hGH and rGH promoters were not functional (69, 70, 74). Thus a novel element might be involved in mediating hGH and rGH cAMP responsiveness.

Figure 2 illustrates the cAMP-response unit (CRU) in the hGH promoter that we recently defined by mutational analysis (74). Two partial CREs or half-sites (CGTCA) at nucleotides -187 to -183 and -99 to -95 that cooperated with a GHF-1 binding site at nucleotides -123 to -112 were required to render the hGH promoter cAMP responsive (74). The partial CREs were occupied in vitro by CREB-like factors based on various analyses using extracts of rat anterior pituitary GC cells (74). Antibodies to the CREB/ATF-1 subfamily but not the ATF-2 subfamily specifically "supershifted" the partial CRE-protein complex in a gel-shift assay (74). The protein binding in this gel-shift complex was also competed by a consensus CREB oligonucleotide, suggesting that CREB-like factors might be involved in mediating this response. Indeed, purified CREB bound the partial CRE with the same affinity and mobility as GC cell extract (74). A proteolytic clipping gel-shift assay of the partial CRE complex with pituitary cell extract or purified CREB demonstrated a CREB-like protein activity in the extract and an additional unknown factor. UV cross-linking and Southwestern blot analyses revealed multiple DNA-protein interactions. The 45-kDa CREB-like factor and a 100-kDa unidentified factor bound the hGH partial CREs (74). Thus, the CGTCA motifs in the hGH promoter are recognized by members of the CREB/ATF-1 transcription factor subfamily, but the exact identity of these factors remains to be elucidated.

FIG. 2. Structure of the cAMP response unit in the hGH 5'-flanking region. The unit consists of two CGTCA half-sites of the palindromic CRE, TGACGTCA, which binds CREB, and the distal GHF-1 binding site. Mutation of any one of these sites destroys cAMP responsiveness of the hGH promoter. [Adapted from Shepard et al. (74), with permission from The Journal of Biological Chemistry.]

GHF-1 overexpression studies in transfected GC cells indicated that GHF-1 levels do not limit the hGH transcriptional response to cAMP (74). Thus GHF-1 functions on the hGH promoter by post-translational modification and/or through structural perturbation of the hGH promoter, facilitating or allowing trans-activation by adjacent transcription factors (74). In fact, GHF-1 is capable of bending DNA in vitro (75), as is CREB (76), suggesting a possible mechanism for altering promoter structure and function. This mechanism gains intriguing support from the fact that the only
GHF-1 binding site that functions in the CRU is the upstream site between the two CRE half-sites. Thus GHF-1-induced DNA bending might facilitate interactions between factors bound at the upstream and downstream CREs.
E. cAMP Response Elements on GH-Related Promoters

High sequence homology between two genes does not guarantee similar control of gene expression. The hCS-1 and hCS-2 genes are ~95% identical to the hGH gene, yet the hCS promoters respond only marginally to cAMP (74). Both of the hCS genes have mutations in their upstream CGTCA motifs that account for this difference, suggesting that there is no selective advantage in preserving this regulatory mechanism. In contrast, the proximal promoter regions of the rGH and hGH genes that contain their respective CREs are only ~75% identical. The rGH promoter lacks CGTCA motifs, but does contain GHF-1 binding sites; however, the exact CREs have yet to be identified. Thus, despite claims of conserved transcriptional responsiveness by the GH genes (77), the rGH and hGH genes utilize different CREs and may operate through different mechanisms.

Interestingly, the PRL gene, which retains only ~42% sequence similarity to the GH gene, is regulated by cAMP in a manner very similar to the hGH gene (78-80). Pituitary lactotrophs express PRL and require GHF-1 for cell-specific expression (80). The PRL promoter contains several GHF-1 binding sites and a CGTCA motif required for cAMP regulation (78-80) that binds a ~100-kDa protein (78). However, the PRL promoter CGTCA motif does not bind purified CREB significantly (78-80). The flanking nucleotides of the PRL and hGH CGTCA motifs are quite different, suggesting that the context of the CGTCA motifs may stipulate which CREB family member binds the motif. Also, targeted disruption of the CREB gene in mice was compensated by CREM and ATF-1 (81), suggesting that different members of the CREB family may act to relay the cAMP signal to the GH or PRL genes.

The generality of CGTCA motifs in cAMP-responsive genes is illustrated by the variety of genes containing these elements. The human vasoactive intestinal polypeptide gene requires two CGTCA motifs for cAMP responsiveness (82). The glucagon and somatostatin gene 5'-flanking regions contain CGTCA motifs that mediate cAMP-dependent PKA responsiveness (83). Even the CREB gene promoter contains CGTCA motifs that mediate its autoregulation (84). The CGTCA motif may also confer responsiveness of a gene to a variety of inducers such as phorbol ester, serum, IL-1α, and tumor necrosis factor (85, 86). The hGH gene response to these agents is unknown.
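Locating CGTCA motifs in a promoter and reporting them in the usual numbering relative to the transcription start site can be sketched as follows. This is a minimal illustration, not the procedure used in the cited studies: the toy sequence is invented, the function names are hypothetical, and the numbering assumes the common convention that skips zero (..., -2, -1, +1, +2, ...).

```python
HALF_SITE = "CGTCA"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def to_promoter_coordinate(index: int, tss_index: int) -> int:
    """Convert a 0-based index to promoter numbering with no position 0."""
    offset = index - tss_index
    return offset if offset < 0 else offset + 1

def find_sites(sequence: str, motif: str, tss_index: int):
    """Return sorted (start, end, strand) hits for a motif on both strands.

    Note: a motif that is its own reverse complement would be reported
    once per strand at the same coordinates.
    """
    sequence = sequence.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = sequence.find(pattern)
        while start != -1:
            end = start + len(pattern) - 1
            hits.append((to_promoter_coordinate(start, tss_index),
                         to_promoter_coordinate(end, tss_index),
                         strand))
            start = sequence.find(pattern, start + 1)  # allow overlaps
    return sorted(hits)

# Invented 18-nt upstream fragment; the start site (+1) lies just past its end.
TOY_PROMOTER = "AAACGTCAAATGACGAAA"
for start, end, strand in find_sites(TOY_PROMOTER, HALF_SITE, len(TOY_PROMOTER)):
    print(f"CGTCA half-site at {start} to {end} on the {strand} strand")
```

Scanning both strands matters because a half-site may be written as CGTCA or, on the opposite strand, as TGACG; comparing the nucleotides flanking each hit is then a natural next step when asking why related promoters bind different factors.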
III. Control of GH/CS Gene Expression by Thyroid Hormone
We were prompted initially to examine thyroid hormone regulation of the GH and GH-related genes because of its dominant effect in controlling rGH gene transcription (87). Surprisingly, in transfected rat pituitary GC cells, the intact hGH gene was negatively regulated by triiodothyronine (T3), and a negative thyroid hormone response element (nTRE) was subsequently localized to the 3'-untranslated region (3'-UTR) of the gene (88, 89). While examining T3 regulation of reporter genes containing the 5'-flanking region (5'-FR) of the hGH and hCS genes, we discovered the presence of a positive TRE in the hCS proximal promoter (90, 91). Curiously, the proximal promoter regions of both the hGH and hCS genes bound the thyroid hormone receptor (TR) with only small differences in binding affinity, but the respective TR-DNA complexes migrated differently from one another. We found that the differences in TR-DNA complex migration were due to TR-induced DNA conformational changes that were correlated with TRE function, suggesting that the DNA structural perturbation was involved in mediating TR responses. Each of these findings is considered in detail below.
A. Positive Thyroid Hormone Regulation of the hCS Gene
In initial studies, we found that the hCS, but not the hGH, 5'-FR conferred positive T3 regulation on a chloramphenicol acetyltransferase (CAT) reporter gene in GC cells (90). Although the hCS gene is normally expressed only in the placenta, the conservation of GHF-1 binding sites in the hCS promoter allows its efficient expression in GC cells (92). Nevertheless, such T3 regulation may be physiologically relevant, because T3 stimulates hCS transcription in human choriocarcinoma cells (BeWo) (93). Two TREs that bind TR with reasonably high affinity and that occur between nucleotide positions -100/-80 and -70/-50 in both hGH and hCS promoters were identified (91). The hCS TRE(-70/-50) binds TR somewhat more avidly than the TRE(-100/-80). Deletion constructs lacking the distal TRE(-100/-80) exhibited reduced T3 responsiveness, whereas deletion of both TREs eliminated the response, suggesting that both TREs are required for full T3 responsiveness. Nevertheless, site-directed mutagenesis of the proximal TRE(-70/-50) coupled to functional assays indicates that this sequence is essential for T3 responsiveness (Table I). In addition, the sequence divergence between the hGH and hCS genes occurring at positions -54 (G insertion in the hCS), -50 (G in hCS versus T in hGH), and -48 (A in hCS versus G in hGH) was sufficient to account for the absence of T3 response by the hGH promoter
TABLE I
PROXIMAL TREs IN THE hCS AND hGH GENES(a)

CONSTRUCT     SEQUENCE                        BASAL ACTIVITY (x 10^-3)   T3 REGULATION (FOLD ± S.E.)   N
GHpCAT        GGTGGGGTcAA:CAGtGgGAGAGAAgg     2.95 ± 1.25                0.6 ± 0.1                     4
CSpCAT        GGTGGGGTCAAgCAGgGaGAGAGAAct     4.45 ± 0.88                2.6 ± 0.3                     9
CS(1X)pCAT    ...............T.G.........     6.36 ± 1.40                0.7 ± 0.1                     9
CS(3X)pCAT    ...........(-)...............   5.21 ± …                   0.7 ± 0.1                     8

(a) The represented sequences are hGH (nts -64 to -39) and hCS (nts -65 to -39). Basal activity is expressed in cpm min^-1 (mg protein)^-1 x 10^-3. The data were adapted from Leidig et al. (91) with the permission of the Journal of Biological Chemistry.
(90). Mutation of the proximal hCS GHF-1 binding site at -96/-62 decreased the basal promoter activity but did not modify the T3 response, indicating that T3 induction of the hCS promoter does not require GHF-1 (91). To discern the molecular basis of the difference in the T3 response, TR binding in the -87 to +2 region was carefully examined. Competition and Scatchard analysis of a 32P-labeled hCS-70/-40 oligonucleotide using gel-shift assays revealed that TR bound three- to fourfold more tightly to the hCS-70/-40 than to the hGH-70/-40 DNA. These data raised the question whether the difference in TR affinity between the hGH and hCS TRE-70/-40 adequately accounted for the “all or none” functional difference detected between these two elements. That qualitative differences might be involved in determining the differences in T3 responses between these promoters was indicated by the observation that hGH and hCS DNA fragments of equal length (89 bp) migrated differently in standard gel-shift assays. This finding led to the study of TR-induced DNA conformational changes.
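The Scatchard analysis mentioned above can be illustrated with a minimal Python sketch. It assumes a single class of independent binding sites, so that bound/free plotted against bound is a straight line of slope -1/Kd. The dissociation constants (1.0 and 3.5, arbitrary units) and the binding capacity are hypothetical numbers chosen only to mimic a three- to fourfold affinity difference; they are not data from reference 91.

```python
def simulate_binding(kd, bmax, free_concs):
    """Bound ligand for a single-site model: B = Bmax * F / (Kd + F)."""
    return [bmax * f / (kd + f) for f in free_concs]


def scatchard_fit(bound, free):
    """Least-squares line through (B, B/F); returns (Kd, Bmax)."""
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # equals -1/Kd
    intercept = mean_y - slope * mean_x  # equals Bmax/Kd
    return -1.0 / slope, -intercept / slope


# Hypothetical free-ligand concentrations (arbitrary units)
free = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]

# Hypothetical "tight" and "weak" sites, standing in for hCS and hGH probes
kd_tight, bmax_tight = scatchard_fit(simulate_binding(1.0, 10.0, free), free)
kd_weak, bmax_weak = scatchard_fit(simulate_binding(3.5, 10.0, free), free)

print(f"Kd (tight site): {kd_tight:.2f}")
print(f"Kd (weak site):  {kd_weak:.2f}")
print(f"fold difference: {kd_weak / kd_tight:.1f}")
```

With real gel-shift data the points scatter around the line, and curvature in the plot would indicate that the single-site assumption does not hold.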
B. TR-induced DNA Bending and T3 Responsiveness
Since the pioneering work of Wu and Crothers (94), protein-induced DNA bending has become a key issue in a variety of biological processes. According to electrophoresis theory, polyanion migration in an electric field is dependent on the mean-square end-to-end distance of the fragment (94, 95). Accordingly, maximal retardation in migration is observed when DNA contains bends near the middle of the molecule and minimal retardation occurs when the bend is near the end of the fragment. This led to the development of circular permutation analysis, whereby a protein binding site is engineered to reside at differing positions within a DNA fragment of constant length. A plot of the relative mobility versus position of the binding
site relative to the DNA end gives a sinusoidal curve, where the peak locates the DNA bending center (94). To establish whether TR binding can induce a DNA conformational change, circularly permuted DNA fragments containing the hGH and hCS TRE-70/-50 were bound to TR; the complexes migrated different distances, indicating that receptor binding alters DNA conformation (91). DNA containing the hCS TRE migrated significantly less than the corresponding fragments containing the hGH TRE, indicating that TR induces a greater degree of DNA deformation on binding the hCS TRE (Fig. 3A). Circular permutation analyses were extended to include two functionally deficient TRE mutants (Fig. 3B and C). In mutant hCS-1X, T and G were incorporated at positions -50 and -48, respectively, and in mutant hCS-3X a G at position -54 was deleted. The magnitude of the change in gel-shift mobility decreased in the order CS > CS-3X ≈ CS-1X > GH. T3 responsiveness of the CS-1X and CS-3X mutants was abolished despite the fact that TR bound all mutants (91). Because the hGH TRE contains all of the sequence differences represented by the CS-1X and CS-3X binding sites, it was postulated that each of these sequence differences contributes about equally to the overall conformational change and that T3 induction is dependent on a DNA deformation threshold (91). Additional indication that TR induces DNA bending on binding to its cognate sequence was obtained using phasing analysis. This analysis places an intrinsic DNA bend at different distances from the structure under investigation, thus varying the helical phasing of the two structures. If the structure under examination contains a DNA bend, the mobility of the complex will be lowest when the two bends cooperate and highest when the two bends counteract each other (96, 97). 
Phasing analysis of TR-RXR heterodimers bound to the malic enzyme TRE and a synthetic palindromic TRE (TREpal) revealed a bending angle of 64.4° for the former and 84.3° for the latter, with the bend directed toward the major groove at the center of the TRE (98). The TR-induced DNA deformation also occurs with the rGH TRE (99); however, certain TREs do not appear to support DNA bending. The frog vitellogenin-2 gene (vit-2) TRE is involved in estrogen receptor (ER)-mediated activation and TR-mediated inhibition of vit-2 promoter activity, yet neither ER nor TR appears to affect DNA conformation (99). Thus TR-mediated DNA bending may be dependent on TRE structure. Although ER does not appear to bend the vit-2 TRE, it is involved in
FIG. 3. The TR-induced DNA bending revealed by circular permutation analysis of several TREs derived from the hGH, hCS, or mutated hCS 5’-flanking region. The structures of the various mutants (CS-1X and CS-3X) are indicated in Table I. [Reprinted from Leidig et al. (91), with permission from The Journal of Biological Chemistry.]
[Figure not reproduced; panels A to C, x-axis: TRE position (bp).]
DNA bending with certain estrogen response elements (EREs). Recently Nardulli et al. (100) demonstrated that transcription could be activated by substitution of intrinsic DNA bending sequences that introduced 54° bends comparable to that produced by the ER (56°). Several basal transcriptional factors, including TBP, TFIIIA, TFIIIB, and TFIIIC, induce DNA bending (101-103). Factors that bind to the upstream activation sequences (UAS), such as NF-κB, AP-1, POU-homeodomain proteins, SRF, SP1, and CREBP1, also induce DNA deformation (96, reviewed in 104). Some eukaryotic promoters possess binding sites for proteins whose only purpose may be the induction of a conformational change in the promoter. The T cell receptor (TCR) gene enhancer binds lymphoid enhancer-binding factor 1 (LEF-1), whose DNA binding domain contains a high-mobility-group (HMG) domain (105). The HMG proteins bind to the minor groove of DNA, inducing a sharp DNA bend (106). The LEF-1 binding site functions only in its natural context and enhancer function depends on other TCR enhancer-binding proteins, suggesting that the LEF-1-induced bend facilitates interactions among these factors (104, 107). Also, SRY, a protein involved in activation of Müllerian inhibiting substance, is a DNA-bending protein that contains an HMG domain that can partially replace LEF-1 function in the TCR promoter (108). Mutations in the SRY gene lead to female sex reversal (109). Analysis of SRY protein DNA binding and bending from several patients with complete gonadal dysgenesis reveals a complex pattern of phenotypes, suggesting that the precise conformation of the SRY-mediated nucleoprotein complex is essential for sex determination (110). In the case of certain prokaryotic regulatory factors, DNA bending by itself may be sufficient to promote transcription (111, 112). 
In eukaryotes, the fact that most transcriptional factors possess activation domains and may require other protein interactions to be functional may prevent functional substitution by curved DNA. However, intrinsically bent DNA can substitute for the ERE in a minimal promoter (100), raising the question whether DNA bending and activation by a transcriptional activation domain might not be independent mechanisms. For example, basal transcription factors such as TBP might preferentially bind to promoters that contain bent DNA. Accordingly, transcription factor-induced DNA bends might act by similar mechanisms as well as through their activation domains. In the case of the TR, protein-protein interactions with a variety of transcriptional factors, especially the RXR, are required for its function (reviewed in 113). In addition, the TR is coupled to the basal transcription apparatus via the core promoter binding protein TFIIB (114). Thus, TR-induced DNA deformation may provide an independent activation signal or it may facilitate the TR-TFIIB or TR-RXR-TFIIB interaction. Alternatively, TR-induced DNA
bends might promote formation of a stable preinitiation complex by serving as a signal for proteins sliding along the DNA to find their site of action (115).
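The circular permutation analysis described earlier in this section can be sketched in a few lines of Python. The sketch makes two assumptions that are not taken from the text: it uses the empirical relation mu_M/mu_E = cos(alpha/2), commonly attributed to Thompson and Landy, to convert the ratio of the slowest (site near the middle) to the fastest (site near the end) complex mobility into an apparent bend angle alpha, and it takes the slowest probe directly from the data instead of fitting a curve. The positions and mobilities are hypothetical.

```python
import math


def bend_from_permutation(site_positions_bp, relative_mobility):
    """Return (position of slowest complex, apparent bend angle in degrees).

    The probe whose complex runs slowest is the one in which the bend lies
    closest to the middle of the fragment. The angle uses the assumed
    relation mu_middle / mu_end = cos(alpha / 2).
    """
    slowest = min(range(len(relative_mobility)),
                  key=lambda i: relative_mobility[i])
    mu_middle = relative_mobility[slowest]
    mu_end = max(relative_mobility)
    angle_deg = 2.0 * math.degrees(math.acos(mu_middle / mu_end))
    return site_positions_bp[slowest], angle_deg


# Hypothetical data: binding-site position in each permuted probe (bp) and
# the mobility of the protein-DNA complex relative to the fastest complex.
positions = [10, 30, 50, 70, 90, 110]
mobility = [1.00, 0.93, math.cos(math.radians(30.0)), 0.90, 0.95, 0.99]

center, angle = bend_from_permutation(positions, mobility)
print(f"slowest complex at site position {center} bp")
print(f"apparent bend angle: {angle:.1f} degrees")
```

In practice the mobilities would be fitted with a smooth curve before the minimum is read off, and apparent angles obtained this way are estimates that depend on gel conditions.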
C. Negative T3 Regulation of the hGH Gene
In view of the dominant positive effect of T3 in regulating rGH gene expression (87), our finding that the hGH gene is negatively regulated by T3 in transfected GC cells was initially surprising (88). Nevertheless, some observations have suggested that this effect might be of physiologic significance under some circumstances. GH secretion from primary cultures of pituitary adenomas is inhibited by T3 (116). Also, the release of hGH by provocative stimuli is blunted in hypo- and hyperthyroid individuals (117, 118), suggesting that hGH gene expression involves both positive and negative T3 control. In our initial studies, the regulation of the intact hGH structural gene, including introns and exons and containing 493 and 628 bp of 5’- and 3’-FR, respectively, was studied in stably transfected GC cells (88). T3 treatment of transfected GC cells decreased hGH mRNA and hGH production, whereas it activated the endogenous rGH gene. To localize the sequences containing the negative element, we examined constructs containing selected regions of the hGH gene in transiently transfected GC cells (89). When intron sequences were removed by substitution of the hGH cDNA for the structural gene and coding sequences were replaced by CAT sequences, negative regulation was preserved. Total deletion of the 3’-UTR abolished the negative T3 regulation, and subsequent analysis demonstrated that the nTRE resides within a short region of the 3’-UTR between the stop codon and the polyadenylation signal (nts 2030 to 2158). Gel-shift analysis with purified TR revealed strong binding to the hGH 3’-UTR, comparable in binding affinity to the hCS TRE-70/-50. Deletion of the 3’-UTR to nt 2051 resulted in a significant drop in the binding of TR to the DNA, suggesting that the nTRE was localized between nts 2051 and 2158. The hGH 3’-UTR contains several sequences that appear to be related to known consensus sequences for TR binding. 
FIG. 4. Putative nTREs involved in negative control of hGH gene transcription by thyroid hormone; they are localized in the 3’-untranslated region of the gene. Transcription termini are indicated by vertical arrows in the highlighted boxes. Arrows designate the putative TRE half-site structures (5’ to 3’) based on the general form GGNNN. Half-sites designated with double asterisks conform to known functional TRE structures; single asterisks represent half-sites whose sequence in the last three positions differs by one nucleotide; all other half-sites differ by two nucleotides. [Reprinted from Zhang et al. (89), with permission from The Journal of Biological Chemistry.]
[Figure not reproduced; sequence map of the hGH 3’-UTR, approximately nts 2050 to 2170, marking the stop codon and the transcription termini.]
Figure 4 illustrates the putative nTREs that may account for TR binding and TR-mediated negative responsiveness within the hGH 3’-UTR. Three of them match the canonical AGGWCA half-site structure proposed by Brent et al. (119), and several others are closely related. The half-sites are arranged as direct repeats with 0 or 2 spacer nucleotides, palindromes, and inverted palindromes. We next examined the promoter- and position-dependence of the nTRE (89). When the nTRE was maintained in the 3’-UTR, induction of the positive TRE in the hCS promoter was abolished in constructs harboring either the GH or the CS structural gene as reporters, but neither the metallothionein nor the thymidine kinase (TK) promoter-driven expression of the GH structural gene was affected by the hGH nTRE. In addition, the hGH 3’-UTR did not affect expression of the human β-actin or Rous sarcoma virus (RSV) promoters in plasmids where the GH cDNA was the reporter gene. However, in a construct where the TK promoter was associated with the GH cDNA, the hGH nTRE reduced gene expression by about 46% with T3 treatment. Thus it is possible that intron sequences of the hGH gene may affect the behavior of the TK promoter in conjunction with the hGH 3’-UTR. The data do indicate that nTRE function is somewhat promoter-dependent, suggesting a promoter-coupling mechanism with the nTRE in the 3’-UTR. When the sequences defining the nTRE were cloned upstream of the GH promoter, they conferred a positive T3 response upon the promoter, indicating that nTRE function is position-dependent (89). Also, insertion of the SV40 3’-UTR between the reporter gene and the nTRE abolished the T3
effect on the hCS promoter. In contrast, placement of the SV40 3’-UTR downstream from the GH 3’-UTR did not alleviate T3-dependent negative regulation. Thus, the location of the nTRE in the proximal region of the 3’-UTR upstream of the polyadenylylation site is required for its negative function. This latter finding, coupled with the promoter-dependence of nTRE function, suggests that it operates via a transcription pausing and/or termination mechanism that is coupled to the promoter. This conclusion is supported by studies demonstrating that T3 does not affect hGH mRNA stability or the polyadenylylation site in transfected GC cells. The above data distinguish the hGH nTRE from those present in the TSHβ gene. The TSHβ gene contains an intrinsic nTRE located between nts 18 and +27 that functions on heterologous promoters in a position-independent manner (120, 121) and consists of motifs that resemble TRE half-sites (122). Another nTRE that consists of half-site structures is present in the human thyrotropin-releasing hormone (TRH) promoter (123). In this case, ligand-independent and -dependent regulation of the TRH promoter is due to TRβ1 isoform-specific binding at three separate half-sites located between nts -60 and +42. The individual half-sites display differential abilities to bind TR monomers, homodimers, and TR-RXR heterodimers. At least one of the sites (Site 4) appears to be an intrinsic nTRE, because it mediates negative T3 regulation of both the TRH promoter and a heterologous promoter in the presence of RXR. Although the hGH 3’-UTR contains multiple TRE half-sites analogous to the structures of the TSHβ and TRH nTREs, it is not an intrinsic nTRE that functions in a position-independent manner. Consequently, the mechanism of hGH nTRE action may be unique.
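The search for TRE half-sites discussed in this section can be sketched as a simple motif scan. The Python example below looks for the AGGWCA half-site (reading W as the IUPAC code for A or T, which is an assumption about the notation) on both strands of a sequence. The test sequence is a hypothetical toy string, not the hGH 3’-UTR, and the function names are illustrative only.

```python
import re

# IUPAC codes needed for the motif; W = A or T, N = any base
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_half_sites(seq, motif="AGGWCA"):
    """Return sorted (start, strand, matched_sequence) tuples.

    start is the 0-based position on the given (top) strand; for '-' hits
    the matched sequence is reported 5' to 3' on the bottom strand.
    """
    seq = seq.upper()
    # Lookahead so that overlapping matches are also reported
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    bottom = reverse_complement(seq)
    for m in pattern.finditer(bottom):
        start = len(seq) - (m.start() + len(motif))
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


# Hypothetical toy sequence: two top-strand half-sites separated by
# 2 nucleotides, followed by one half-site on the bottom strand.
toy = "TTAGGTCAGGAGGACAAATGACCTTT"
half_sites = find_half_sites(toy)
for start, strand, match in half_sites:
    print(start, strand, match)
```

Spacing and orientation of neighbouring hits can then be compared to classify arrangements such as direct repeats or palindromes; relaxing the motif (for example to GGNNN-type patterns) would return the more degenerate candidates.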
D. Negative Control Mechanisms
Mounting evidence indicates that repression of gene expression is widely used. Several mechanisms have been proposed to account for negative regulation of gene expression by trans-acting factors (124, 125). Because the mechanism that accounts for T3 regulation of the hGH gene is not known in detail and has properties suggesting a unique mechanism, we correlate known information about TR-mediated negative regulation of the hGH gene with current models of negative regulation with emphasis on those that relate to thyroid hormone action.
1. NEGATIVE REGULATION OF TRANSCRIPTION INITIATION
a. Competitive DNA Binding. Competition of a repressor for an activator binding site can result in inhibition of activator-mediated transcription. Unliganded TR can repress ER activation of a TRE by competitive binding; however, the inhibition is blocked by T3 (126). Competitive DNA binding has also been demonstrated in the case of TRα2 inhibition (reviewed in 127).
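The idea of competitive DNA binding can be made concrete with a minimal equilibrium sketch. It assumes a single site that the activator and the repressor bind in a mutually exclusive manner, with free concentrations and dissociation constants expressed in the same arbitrary units. All numbers are hypothetical and the function name is illustrative.

```python
def activator_occupancy(activator, kd_activator, repressor=0.0, kd_repressor=1.0):
    """Fraction of sites occupied by the activator at equilibrium.

    Simple competition model: occupancy = a / (1 + a + r), where
    a = [activator]/Kd_activator and r = [repressor]/Kd_repressor.
    """
    a = activator / kd_activator
    r = repressor / kd_repressor
    return a / (1.0 + a + r)


# Hypothetical values: activator present at its Kd, with and without
# a competitor present at twice its own Kd.
alone = activator_occupancy(1.0, 1.0)
competed = activator_occupancy(1.0, 1.0, repressor=2.0, kd_repressor=1.0)
weak_competitor = activator_occupancy(1.0, 1.0, repressor=2.0, kd_repressor=10.0)

print(f"activator alone:        {alone:.2f}")
print(f"with competitor:        {competed:.2f}")
print(f"with weaker competitor: {weak_competitor:.2f}")
```

In this simple model a competitor with lower affinity for the site reduces activator occupancy less, which is one way to picture a factor that modulates rather than abolishes a response.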
TRα exists as a single-copy gene that is differentially spliced, producing two isoforms. TRα1 binds T3 and forms homodimers and heterodimers with RXRs. TRα2 lacks the last highly conserved 40 amino acids of TRα1, contains an additional 120-122 amino acids, and cannot bind T3. TRα2 inhibits T3 stimulation through TRα1 by competition for TRE binding of TRα1 homodimers and TRα1/RXR heterodimers, albeit with a lower binding affinity compared to the TRβ isoforms (128). Therefore, TRα2 is thought to modulate rather than repress T3 effects. In the rat anterior pituitary, the mRNA levels of TRβ1, TRβ2, and TRα2 all decrease by about 45-55% in response to T3 (129), suggesting that negative regulation of the hGH gene by specific receptor isoforms is unlikely.
b. Quenching. Quenching results from a repressor-activator interaction that inactivates the activator. In transfection studies with the TK promoter containing TREs and the fos promoter containing an AP-1 site, it was observed that c-jun and c-fos inhibited TR-mediated activation of TRE-TK promoter constructs (130). Conversely, TRs inhibited AP-1 site-dependent activation of the c-fos promoter (130). The inhibition was abolished by removal of the 17 carboxy-terminal amino acids from the TR and was correlated with mutual inhibition of DNA binding by the respective factors. Synergistic interactions at a unique AP-1 site in pUC plasmids, but not the collagenase promoter, were observed between jun or jun/fos and unliganded TR; these were converted to inhibitory responses by thyroid hormone (131). Thus the sequence context of the AP-1 site may determine whether TR-jun or TR-jun/fos interactions are functional. Also, TR-jun interactions have been observed with the TSHβ promoter, which contains a composite element that binds jun and TR (132). 
Mutational analysis showed that amino acids 201 to 380 of TR and the leucine zipper and the DNA binding domains of jun are critical for the TR-jun interaction, suggesting that TR-jun interactions require jun binding to DNA. Addition of T3 disrupts the TR-jun interaction, resulting in a negative effect on the promoter. It is doubtful that quenching accounts for the mechanism of the hGH nTRE function, because, as discussed above, its activity is position-dependent, becoming an activator sequence when placed upstream of the promoter (90). Therefore, it would be difficult to explain how TR bound to the nTRE in the 3’-UTR could mask the activation domain of any transcriptional factors that bind to the hGH promoter in a position-dependent manner.
c. Active Repression. In active repression, the repressor acts directly on the basal transcriptional machinery, providing the cell with a highly efficient mechanism to turn off gene expression independent of activators. There is evidence that hTRβ can act as a transcriptional silencer through a
direct interaction with TFIIB specified by the carboxy-terminal ligand binding domain of hTRβ (115). In in vitro transcription assays, unliganded TR inhibits a minimal promoter containing only a TATA box, and TR inhibits an early step in transcription initiation, because preassembled preinitiation complexes are refractory to the inhibitory effect of TR (82). T3 treatment relieved repression, suggesting that liganded TR may undergo a conformational change that masks or disrupts the repressor domain. In fact, binding of TRβ to the TFIIB amino-terminal domain is hindered by T3 treatment (133). Such a mechanism is considered unlikely to account for hGH nTRE function, because repression is T3-dependent. Also, active repression implies that repression is position-independent.
d. Squelching. Squelching results from the sequestration of transcriptional cofactors or adaptors that an activator requires. Squelching requires neither a specific binding domain for the repressor nor an intact DNA binding domain within the repressor. In many cases, squelching results from the overexpression of an activator that ties up a limiting factor. A variation of the squelching mechanism intrinsic to nuclear receptors involves the ability of one ligand to squelch the effect of another ligand (134). For example, TR/RXR dimers induce gene expression through binding to a TRE. However, in the presence of 9-cis-retinoic acid, RXR/RXR homodimer formation is favored, resulting in reduction of TR/RXR heterodimer levels and reduced T3 responses. Squelching is not likely to account for hGH nTRE function for several reasons: (1) binding of TR to the hGH 3’-UTR is correlated with repression; (2) changing the precise location of the hGH 3’-UTR would not be predicted to affect the T3 response; and (3) other ligands would have been depleted in the stripped serum used in our experimental conditions (89).
2. REGULATION OF TRANSCRIPTION ELONGATION AND TERMINATION
Obviously, the synthesis of mRNA can be regulated at multiple stages. The location of the nTRE sequence within the hGH 3’-UTR provides novel possibilities to account for negative T3 regulation.
a. Conditional Transcription Elongation. Conditional transcription elongation involves at least two pathways. First, intrinsic DNA or RNA sequences may form secondary structures that result in paused elongation, increasing the frequency of premature termination. Second, modifications to RNA polymerase II (RNA Pol-II) that are mediated by other factors might influence the elongation reaction with similar consequences (135). Elongation by Escherichia coli RNA Pol-II can be blocked in vitro by a site-specific
DNA-binding protein (136). TR binding to the hGH 3'-UTR would present an obstacle to RNA elongation. Interestingly, in light of TR-induced DNA bending (91), DNA bends have been implicated as signals for termination of transcription. A protein termed MAZ binds between P1 and P2 in the myc promoter, and MAZ-induced DNA bends may be implicated in the pausing that occurs after initiation from the P1 promoter (137). Termination signals are also coupled to promoter-specific influences (138, 139), providing an attractive mechanism to account for the promoter- and location-dependent behavior of the hGH nTRE as discussed above. This might occur through a TR interaction with RNA Pol-II or an associated factor (reviewed in 140).
b. 3'-End Formation and Terminal Processing. Although TRs are not known to bind RNA, it is conceivable that the receptor acts on RNA terminal processing events such as 3'-end cleavage and polyadenylylation. Other members of the nuclear receptor family have been implicated in interactions with proteins whose structural features include those of RNA-binding proteins. The RAR receptor is capable of forming heterodimers with PML, a protein that contains a Cys/His-rich RING finger domain, which is found in RNA-binding proteins, transcription factors, and oncoproteins (141). If the TR were bound to the RNA it might affect terminal processing either by steric hindrance or direct interaction with polyadenylylation factors. Also, T3 has important effects at the post-transcriptional level to regulate the expression of several genes, including GH and TSHβ (120, 142). A sequence in the TSHβ 3'-UTR that is conserved among different species appears to confer RNA instability. This sequence is specifically bound by a protein factor that is under the control of thyroid hormone (142). This mechanism is not applicable to nTRE function, because T3 treatment does not alter the stability of the hGH mRNA in transfected GC cells (89).
IV. Cell-specific Control of Placental GH/CS Gene Expression
As discussed in Section I, the hCS-1, hCS-2, and hGH-2 genes are expressed in primate placenta (1, 2). Copious amounts (~1 g/day) of hCS are produced during the third trimester of pregnancy. This accounts for 10% of all placental protein production and corresponds to more than 5% of the total poly(A)+ RNA in placenta (2). Cell-specific enhancers are required for a number of genes that are expressed at very high levels, including silkworm fibroin (143), mammalian insulin (144), and rat albumin (145). Accordingly, enhancer involvement was proposed to account for high-level hCS expression. Evidence supporting this concept was found by screening DNA fragments from the hGH/CS locus in an enhancerless vector containing the SV40 promoter fused to the CAT gene (146-148). The enhancer, which we have designated CSEn, was localized in a 1022-bp DNA fragment located about 2 kb downstream of the hCS-2 termination codon (146) and conferred strong cell-specific transcriptional activity in human choriocarcinoma (JEG-3 and BeWo) cells. Further deletion analysis of CSEn indicated that nearly all the activity is located within the first 240 bp of the original 1022-bp fragment (147). CSEn functions in BeWo and JEG-3, but not HeLa or HepG2 cells, indicating that it is a cell-specific enhancer. Detailed studies of CSEn (149-151) indicate that it is composed of multiple DNA elements, or enhansons, that share remarkable similarities with those of the SV40 enhancer (149). The hCS promoter requires Sp1, a TATA box, and an initiator site (InrE) located downstream of the TATA element for maximal promoter and enhancer-stimulated activities (150). The InrE binding factor appears to be abundantly expressed in choriocarcinoma cells, indicating that it may participate in determining cell-specific expression (150). Although the close homology with the SV40 GT-IIC enhanson has suggested that CSEn activity might be mediated by transcription enhancer factor-1 (TEF-1), a novel factor, designated CSEF-1, that is present in BeWo and COS-1 cells recognizes the GT-IIC enhanson and is correlated with CSEn activity in BeWo and COS-1 cells (151).
A. The Placental Enhancer, CSEn
CSEn activity has been characterized extensively in both placental and nonplacental cells. Using a CAT reporter gene, Walker et al. (147) demonstrated that CSEn is active in JEG-3 and JAR cells but not HeLa, HepG2, or U-373MG cells. We confirmed these results with the more sensitive luciferase (LUC) reporter gene (149). Because CSEn functions with the heterologous TK and RSV promoters, it does not appear to be promoter-specific (147, 1 4 4), suggesting that enhancer factors may interact with basal transcription factors. The region encompassing nts 116 to 134 of the 240-bp minimal enhancer is protected by choriocarcinoma and HeLa cell nuclear extracts (147). This footprint region corresponds to a sequence with significant matches (8 of 9) to the SV40 GT-IIC enhanson that binds TEF-1 (152, 153). We inserted a single copy of a synthetic oligonucleotide corresponding to CSEn nts 103 to 151 that contained the TEF-1 binding site and examined the activity of the construct in transiently transfected BeWo cells. This fragment failed to confer any enhancer activity, indicating that a single GT-IIC enhanson is insufficient for CSEn activity. Deletion of the 3’-flanking, 5’-flanking, or middle part of CSEn seriously diminished enhancer function, providing further support that the enhancer requires multiple elements for its activity.
We performed a more detailed DNase-I footprinting analysis to locate other putative transcription factor binding regions (Fig. 5). Five protected regions, designated FP-1 to FP-5, were observed with nuclear extracts from BeWo, JEG-3, HeLa, and GC cells (149). In no case was a cell-specific footprint observed. FP-3 was identical to that described by Walker et al. (147), which contained the putative TEF-1 binding site. In addition, two other regions, FP-2 and FP-4, which flank FP-3, were routinely protected with extracts from all four cell types. FP-2 occurs in an (A + T)-rich region whose sequences on the alternate strand are nearly identical to those within FP-4. Thus the core elements in FP-2 and FP-4 appear to comprise an inverted repeat (IR). FP-5, located at the 3'-end of the enhancer, protects a region of DNA that includes a direct repeated sequence (DR). A final protected region, FP-1, was localized to the 5'-end of the enhancer and was found to contain a sequence homologous to the GT-IIC enhanson. We compared the CSEn sequence with the SV40 enhancer (Fig. 5) to determine if there is homology with other SV40 enhansons (149). Extensive similarities were shared with the SV40 GT-I, Sph-I/Sph-II, and GT-IIC enhansons: FP-1 straddles sequences with homology to GT-IIC (7/9 matches) and GT-I (8/9 matches); FP-2 contains the IR and two sequences related to the SV40 Sph-I/Sph-II motifs (7/9 and 8/9 matches); FP-3 contains an almost perfect GT-IIC motif (8/9 matches); and FP-4 contains the second member of the IR, which itself contains an Sph-I-related structure (7/9 matches). It is noteworthy that the positional arrangement of the individual enhansons is similar between CSEn and the SV40 enhancer, suggesting that they may have evolved from a common ancestor. To ascertain the functional significance of these footprints and SV40-related enhansons, a site-specific mutational analysis of each of the individual regions was performed (149). 
Mutation of the upstream member of the IR (EM 3 in Fig. 5) and the GT-IIC motif (EM 5 in Fig. 5) virtually eliminated enhancer activity, indicating that these two elements are essential for enhancer activity. Mutation of the upstream, more degenerate copy of the GT-IIC motif in the FP-1 region (EM 1 in Fig. 5) resulted in loss of 45% of enhancer activity, suggesting that the binding of a TEF-1-like factor at multiple sites may be required for maximal enhancer activity. Similarly, mutations of the downstream members of the IR (EM 6 in Fig. 5) and DR (EM 7 and EM 8 in Fig. 5) reduced the enhancer activity 51 and 34%, respectively. Interestingly, the mutation of the upstream DR caused only a 36%, nonsignificant reduction in activity in the sense orientation, but an 89% reduction in the antisense orientation, suggesting that individual elements within an enhancer may exert orientation-dependent effects on enhancer activity. Taken together, these data demonstrate that multiple enhancer elements
FIG. 5. Multipartite structure of the placental specific enhancer, CSEn. Regions protected by DNase-I footprinting analysis are designated (FP-1 to FP-5). Interesting structural motifs (DR and IR), sequences with homology to SV40 enhansons (GT-I, GT-IIC, Sph-I, and Sph-II), or sequences with homology to other transcription factor binding sites (Oct) are designated with arrows. Mutations that affected enhancer function (EM 1, EM 3, EM 8) are indicated. Only enhancer mutation EM 2 was without significant effect on overall enhancer activity. Mutations EM 3 and EM 5 eliminated 90% of the enhancer activity. [Data reprinted from Jiang and Eberhardt (149), with permission from The Journal of Biological Chemistry.]
[Figure not reproduced; double-stranded CSEn sequence, nts 1 to 240, annotated with footprints, enhanson homologies, and EM mutations.]
related to individual SV40 enhansons interact cooperatively to mediate maximal placental enhancer activity.
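The "7/9" and "8/9" match counts quoted above for enhanson homologies can be reproduced by sliding a 9-nt motif along a sequence and counting identical positions. The Python sketch below does this; the motif string used for GT-IIC (GTGGAATGT) is an assumption to be checked against the primary literature, and the target sequence is a hypothetical toy string, not the CSEn sequence.

```python
def best_motif_match(sequence, motif):
    """Slide motif along sequence; return (matches, start) of the best window.

    Only the given strand is scanned, and ties are resolved in favour of
    the first (leftmost) window.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    best_score, best_start = -1, -1
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        score = sum(a == b for a, b in zip(window, motif))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


GT_IIC_ASSUMED = "GTGGAATGT"   # assumed motif sequence, for illustration
toy_enhancer = "CCGACTGGAATGTGGTCC"  # hypothetical sequence

score, start = best_motif_match(toy_enhancer, GT_IIC_ASSUMED)
print(f"best match: {score}/{len(GT_IIC_ASSUMED)} at position {start}")
```

A fuller comparison would also scan the reverse complement, because enhansons can occur in either orientation.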
B. A Placental Factor, CSEF-1, Controls CSEn Function
TEF-1 from HeLa cells binds to the GT-IIC and Sph-I/Sph-II enhansons (152, 153). Cloned TEF-1 is a 57-kDa polypeptide containing a TEA DNA-binding domain. TEF-1 is highly conserved in chicken, mouse, and human, suggesting that it plays a fundamental biological role. TEF-1 is a widely distributed transcription factor and appears to be involved in mediating the regulation of a diverse set of genes, including SV40 enhancer function (152-154), human papillomavirus-16 E6 and E7 oncogene transcription (155), and cardiac and skeletal-muscle-specific gene expression (156-158). The mechanism of TEF-1 action is complex. Overexpression of TEF-1 in a variety of cells squelches transcriptional activity (154-156), suggesting that other limiting transcription factors or adapters are required for TEF-1 function. TEF-1 involvement in CSEn function has been considered likely, because mutation of the CSEn GT-IIC enhanson abolished enhancer activity (147, 149, 159). We established the presence of TEF-1 in the BeWo choriocarcinoma cell line by Western blot and gel-shift assays (151). However, in the functional studies, no TEF-1-mediated stimulation of CSEn was ever observed. Overexpression of TEF-1 in BeWo cells resulted in marked inhibition of basal hCS promoter activity without affecting the relative enhancer-mediated stimulation of transcription (151). Moreover, down-regulation of TEF-1 expression with antisense oligonucleotides in BeWo cells resulted in increased basal promoter activity and increased enhancer-mediated stimulation (151). Similar effects were observed with an artificial enhancer containing multiple GT-IIC enhansons. These data point to alternate models for TEF-1 action, including the possibility that it mediates negative promoter and/or enhancer functions (151). Several other studies suggest that there are other GT-IIC binding activities in cells that might be involved in mediating GT-IIC enhanson function.
In the myosin heavy-chain-β (MHCβ) gene, two distinct factors, a mouse TEF-1 homolog and an unrelated muscle-specific factor, bind to the GT-IIC-related element. The ubiquitous TEF-1 homolog failed to transactivate MHCβ gene expression (160). Also, GAL4 chimeras containing a novel isoform of chicken TEF-1 that has 13 additional C-terminal amino acids can transactivate GAL4-dependent reporter genes, whereas chimeras corresponding to the ubiquitous hTEF-1 isoform exhibit only inhibitory activity. Thus the dominant function of the ubiquitous form of TEF-1 may be repression. In the case of CSEn, a DNA-protein complex (complex f) migrating faster than that of TEF-1 was observed in HeLa, JEG-3, and human placental tissue extracts (159). We compared the GT-IIC binding factors in BeWo, GC, HeLa, and COS-1 cells (Fig. 6). TEF-1 complexes were detected in BeWo, GC, and HeLa cells, but very weak complexes, if any, were formed in COS-1 cells. On the other hand, a fast-moving complex, designated CSEF-1, very similar to complex "f", was found in BeWo and COS-1 cells. Using UV cross-linking and Southwestern blot analysis, we identified CSEF-1 as a 30-kDa polypeptide (151). Strong evidence that CSEF-1 is involved in mediating CSEn function was obtained from the finding that CSEn is a strong enhancer in COS-1 cells (151). COS-1 cells contain virtually no TEF-1 binding activity and have high levels of a factor with properties identical to CSEF-1. Also, no significant CSEn activity was observed in HeLa cells, which contain very low levels, if
[Figure 6: gel-shift autoradiograph. Extracts: TnT, TnT-TEF, BeWo, COS, and HeLa, each tested with probe M (GT-IICsvMUT) and probe W (GT-IICsv); the position of the TEF-1 complex is marked.]
FIG. 6. Gel-shift analysis showing that a protein, CSEF-1, that is distinct from TEF-1, binds to the GT-IIC enhanson. CSEF-1 is correlated with enhancer function in COS-1 and BeWo cells, because CSEn enhancer function is very active in COS-1 cells, which contain exceedingly low levels of TEF-1. [Data reprinted from Jiang and Eberhardt (151), with permission from The Journal of Biological Chemistry.]
any, of CSEF-1, but contain abundant TEF-1 binding activity. Also, in the absence of other enhansons, GT-IIC multimers stimulate hCS promoter activity significantly in BeWo and COS-1 cells (151). The GT-IIC multimers were more active in COS-1 cells than in BeWo cells, suggesting that the TEF-1 present in BeWo cells might down-regulate the GT-IIC enhanson activity. This possibility is supported by the observation that CSEF-1 and TEF-1 have the same DNA binding specificity and directly compete with each other in binding to a GT-IIC probe (151). Several lines of evidence indicate that CSEF-1 is not related to TEF-1 (151). CSEF-1 is extremely thermostable, whereas TEF-1 DNA-binding activity is very heat-sensitive, and CSEF-1 is eluted from heparin-Sepharose columns after the bulk of the TEF-1 binding activity. Finally, a rabbit antibody raised against chicken TEF-1 (157) did not cross-react with CSEF-1. However, it remains possible that CSEF-1 is a member of the family of transcription factors containing the TEA DNA-binding domain (161). This domain has been identified in a diverse set of regulatory genes, including the scalloped gene, a neuronal-specific Drosophila gene that is 68% identical to TEF-1 (162), the abaA gene of Aspergillus nidulans that regulates differentiation of asexual spores (163), and yeast TEC1 that regulates transposon Ty1 enhancer activity in Saccharomyces cerevisiae (164). Taken together, our results parallel those found with the MHCβ promoter, in which both the ubiquitous mTEF-1 and a distinct muscle-specific factor, A1, bind to the GT-IIC variant motif, M-CAT. Like those of CSEF-1 and TEF-1 (151), the binding specificities of A1 and mTEF-1 are indistinguishable (160). Overexpression of mTEF-1 had no effect on MHCβ promoter activity, indicating a role for A1 in mediating MHCβ muscle-specific gene expression. It has been suggested that stimulation might involve the heterodimerization of A1 and mTEF-1 (160).
Our data do not support such a mechanism, because down-regulation of TEF-1 levels in BeWo cells increased CS promoter and enhancer activity, COS-1 cells lack TEF-1, and no evidence for an interaction between TEF-1 and CSEF-1 was observed in the gel-shift experiments (151). The simplest model to explain our results is that CSEF-1 mediates positive enhancer activity and that TEF-1 may act to modulate the enhancer activity through its repressor actions.
C. The hCS Initiator Element, InrE
Enhancer effects depend on interactions between transcriptional factors attached to individual enhansons as well as the interaction of these complexes with promoter factors. Looping mechanisms to account for enhancer-promoter interaction have been clearly demonstrated by electron microscopic visualization in the case of Sp1-containing enhancers (165). The fact that CSEn activity relies completely on the presence of a promoter (S.-W.
Jiang and N. L. Eberhardt, unpublished data) suggests that specific promoter elements may play critical roles in the enhancer-stimulated transcription. Because CSEn functions well with many promoters, such interactions may involve basal transcription factors. Consequently we wanted to analyze the CS promoter elements that might be crucial for interacting with the enhancer. Fitzpatrick et al. (148) studied the hCS promoter elements required for its activity in JEG-3 cells and found that, in the presence of CSEn, deletion of nts -129 to -142, containing the Sp1 site, reduced activity to one-eighth. These results were interpreted to suggest that the Sp1 site is required for enhancer activity. We confirmed the important role of the Sp1 element in basal hCS promoter activity (150). Mutation of the Sp1 site reduced basal activity to 25%. However, the relative enhancer-stimulated promoter activity was unchanged on mutation of the Sp1 site, suggesting that other factors are essential for enhancer function (150). Deletion of the TATA box resulted in a very low basal activity and decreased CSEn-stimulated activity. Also, substitution of sequences downstream of the TATA box had a deleterious effect on the basal and enhancer-stimulated activity. These results suggest that TATA and initiator elements are important for hCS promoter function and that these elements might be required for enhancer activity. Subsequent mutagenesis studies led to the localization of a transcription initiator element (InrE) to nts -15 to +1 (150). A 70-kDa InrE-binding factor that is preferentially expressed in placental cells was identified by gel-shift, Southwestern, and UV cross-linking experiments (150). Mutations between nts -10 and +5 of the hCS promoter affected the positions of the transcriptional start sites, providing evidence that the InrE-binding factor is involved in transcription initiation.
These data indicate that the InrE accounts in part for cell-specific expression of the hCS promoter and suggest that the initiator as part of the basal transcription apparatus is essential for enhancer function.
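The distinction drawn above between basal promoter activity and relative enhancer-stimulated activity can be illustrated with a short calculation. The following is a minimal sketch only: the reporter values are hypothetical arbitrary units chosen for illustration, not data from the cited studies, and the function name `fold_stimulation` is an assumption introduced here. It shows how a mutation can reduce basal activity to 25% while leaving the relative enhancer stimulation unchanged.

```python
def fold_stimulation(with_enhancer: float, basal: float) -> float:
    """Relative enhancer stimulation: activity with the enhancer divided by basal activity."""
    if basal <= 0:
        raise ValueError("basal activity must be positive")
    return with_enhancer / basal


# Hypothetical reporter activities in arbitrary units (illustrative only).
wild_type = {"basal": 100.0, "with_enhancer": 2000.0}

# Hypothetical Sp1-site mutant: basal activity falls to 25% of wild type,
# but the enhancer still multiplies promoter activity by the same factor.
sp1_mutant = {"basal": 25.0, "with_enhancer": 500.0}

wt_fold = fold_stimulation(**wild_type)
mut_fold = fold_stimulation(**sp1_mutant)
relative_basal = sp1_mutant["basal"] / wild_type["basal"]

print(f"wild-type fold stimulation: {wt_fold}")
print(f"mutant fold stimulation:    {mut_fold}")
print(f"mutant basal activity relative to wild type: {relative_basal:.0%}")
```

In this illustration the two fold-stimulation values are equal even though absolute activities differ fourfold, which is the pattern that separates an effect on the basal promoter from an effect on enhancer function.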
D. Negative Control of Placental hGH/CS Genes in Pituitary
Several studies indicate that the hCS promoter is as active as the hGH promoter in transfected pituitary cells, due to the conservation of GHF-1 binding sites (92). In contrast, the endogenous hCS gene is not transcribed in the pituitary in vivo, pointing to the presence of a pituitary-specific repressive mechanism mediated by DNA sequences outside of the promoter. The absence of placental-specific factors (e.g., initiators) contributing to the basal and enhancer-mediated promoter activity provides a passive model to partially explain pituitary silencing of hCS gene expression. Also, the presence of TEF-1 in the pituitary may ensure that the hCS enhancer is nonfunctional and may contribute to hCS promoter silencing. Whereas this passive mechanism explains the lack of activation, active repression must take place in pituitary cells to overcome the potential influence of GHF-1 on hCS promoter activity. Evidence obtained by Nachtigal et al. (166) provides such a mechanism through their discovery that P elements located upstream of each of the placental growth hormone genes are involved in repressing hCS promoter activity. A highly conserved 1.0-kb region of DNA, designated P sequences, located 1.7-2.1 kb upstream of each of the placental hGH/CS members, was noted in the sequence analysis of the hGH/CS locus (167). Nachtigal et al. (166) investigated the possibility that repressor elements are contained in the P sequences and identified a 263-bp DNA element that blunts hCS promoter activity in pituitary but not placental cells. Two major footprints, PSF-A and PSF-B, were observed in DNase-I analysis using GC cell extracts. These two elements compete with each other in footprinting, and a 10-bp palindrome, TGTTGCAACA, and related sequences were found in PSF-A and PSF-B, suggesting that the same or related proteins interact with these regions. The two footprints can be competed away by a DNA fragment containing a GHF-1 site, suggesting that GHF-1 might be involved in the generation of the footprints. However, direct GHF-1 binding to PSF-A and PSF-B is unlikely, because they do not share any similarity with the GHF-1 consensus sequence and the footprints are observed in cells lacking GHF-1 expression. Therefore, it is not clear what functional role, if any, GHF-1 plays in hCS gene repression. Further analysis of these sequences and their relationship to hCS promoter control will be required to understand their function.
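The palindromic character of the 10-bp sequence TGTTGCAACA noted above can be checked directly: a DNA palindrome reads the same on both strands in the 5' to 3' direction, that is, it equals its own reverse complement. The sketch below is illustrative only; the helper names `reverse_complement` and `is_palindromic` are assumptions introduced here and are not taken from the cited work.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence is identical to its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


psf_palindrome = "TGTTGCAACA"  # 10-bp sequence reported in the text
print(reverse_complement(psf_palindrome))
print(is_palindromic(psf_palindrome))
```

The same check could be applied to the "related sequences" mentioned in the text to see how far each departs from perfect dyad symmetry, although those sequences are not given here.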
E. Hormonal Control of Placental hGH/CS Gene Expression
A number of hormonal stimuli are involved in adjusting the expression of the placental hGH/CS genes to meet the physiological requirements of the fetal-placental-maternal unit. Among others, T3, estrogen, phorbol esters, and several growth factors affect hCS expression. Because of the complexity of these pathways, most studies are still limited to observations of changes in hCS secretion, and the molecular mechanisms remain undefined. Induction by T3 of the endogenous hCS gene has been observed in BeWo cells (93), suggesting that it plays a physiologic role in modulating hCS gene expression. Although, as discussed above, we have defined proximal TREs in the hCS promoter, there is evidence for additional TREs and RAREs between nts -1200 and -500 that may be required to achieve maximal responses (168). Thus more work will be required to define the precise mechanism of T3 action.
The protein-kinase-C pathway has been suggested to play an active role in hCS gene regulation. Incubation of term placental trophoblasts with phorbol 12-myristate 13-acetate (PMA) increases hCS release and synthesis (169). When the intact hCS gene was stably introduced into GC cells, PMA stimulated hCS mRNA and protein production (170). However, when the hCS promoter was linked to a CAT reporter gene, PMA inducibility was lost, suggesting that the PKC pathway required structural elements not present on the promoter. Related to these findings, we recently observed that PMA induced the hCS promoter linked to a LUC reporter gene only in BeWo cells, not in GC cells, suggesting the presence of cell-specific PKC-dependent regulation (S.-W. Jiang and N. L. Eberhardt, unpublished data).
The placenta can carry out the initial and terminal steps in estrogen synthesis. After the first trimester, the placenta becomes the major site of estradiol biosynthesis (171). There is a good correlation of increased plasma estrogen levels with placental weight, suggesting that an estrogen-responsive paracrine and/or autocrine mechanism is operative (172). Unfortunately, little is known about placental ER expression and function. Using ligand binding assays, some early experiments demonstrated specific estradiol and progesterone binding activities in term placental cytosol (173). Recent experiments suggest that estrogen might participate in the regulation of a variety of specific placental functions. For example, estrogen significantly increases release of gonadotropin-releasing hormone (GnRH) from placental primary culture (174).
Finally, the placenta expresses IGF-I, EGF, and their respective binding proteins (175, 176). EGF induces differentiation of term placental trophoblasts in primary culture (177). By promoting formation of syncytiotrophoblasts, EGF indirectly stimulates hCS production. EGF or IGF-I alone enlarged mononucleated trophoblast cells, and both hormones together caused more marked enlargement of the trophoblast cells, aggregation of the enlarged cells, and early development of syncytiotrophoblasts (178, 179). Similar effects of human macrophage colony-stimulating factor on hCS production have been observed in early-stage placental cells (180). Thus it is possible that these as well as the other signal transduction pathways discussed above may interact with each other to control hCS gene regulation cooperatively in the placenta. Elucidation of the molecular mechanisms involved in these processes may be the object of future studies.
ACKNOWLEDGMENTS
The authors are particularly indebted to Graeme Bell, Peter Seeburg, John Shine, John Parks, John Phillips, Peter Cattini, Pierre Chambon, Irwin Davidson, Ian Farrance, Charles Ordahl, Richard Goodman, Arthur Gutierrez-Hartmann, Michael Green, Richard Maurer, Michael Karin, William Wood, Peter Kushner, Jim Apriletti, Tom Lavin, John Baxter, Tom Kerppola, Tom Curran, Tom Maniatis, John Collins, Richard Gelinas, Melvin Grumbach, Selna Kaplan and Walter Miller for promoting this work through encouragement, scientific discussion, and generous sharing of materials. We extend our apologies to those who have made significant contributions to our knowledge in this area but whose work the limitations of space have made it impossible to cite. This work was supported by NIH Grant DK41206.
REFERENCES
1. W. L. Miller and N. L. Eberhardt, Endocr. Rev. 4, 97 (1983).
2. W. H. Walker, S. L. Fitzpatrick, H. A. Barrera-Saldana, D. Resendez-Perez and G. F. Saunders, Endocr. Rev. 12, 316 (1991).
3. D. I. Linzer and D. Nathans, PNAS 81, 4255 (1984).
4. J. T. Nelson, N. Rosenzweig and M. Nilsen-Hamilton, Endocrinology 136, 283 (1995).
5. G. S. Barsh, P. H. Seeburg and R. G. Gelinas, NARes 11, 3939 (1983).
6. H. Hirt, J. Kimelman, M. J. Birnbaum, E. Y. Chen, P. H. Seeburg and N. L. Eberhardt, DNA 6, 59 (1987).
7. P. H. Seeburg, DNA 1, 239 (1982).
8. M. Selby, A. Barta, J. D. Baxter, G. I. Bell and N. L. Eberhardt, JBC 259, 13131 (1984).
9. G. Baumann, Endocr. Rev. 12, 424 (1991).
10. M. L. Scippo, F. Frankenne, E. L. Hooghe-Peters, A. Igout, B. Velkeniers and G. Hennen, Mol. Cell. Endocrinol. 92, R7 (1993).
11. D. J. Hill, Horm. Res. 38 (Suppl. 1), 28 (1992).
12. A. Misra-Press, N. E. Cooke and S. A. Liebhaber, JBC 269, 23220 (1994).
13. P. Rotwein, A. M. Gronowski and M. J. Thomas, Horm. Res. 42, 170 (1994).
14. A. A. Kossiakoff, J. Nucl. Med. 36, S14 (1995).
15. P. A. Kelly, L. Goujon, A. Sotiropoulos, H. Dinerstein, N. Esposito, M. Edery, J. Finidori and M. C. Postel-Vinay, Horm. Res. 42, 133 (1994).
16. M. C. Postel-Vinay, J. Finidori, A. Sotiropoulos, H. Dinerstein, J. F. Martini and P. A. Kelly, Ann. Endocrinol. 56, 209 (1995).
17. S. A. Berry, P. L. Bergad, C. D. Whaley and H. C. Towle, Mol. Endocrinol. 8, 1714 (1994).
18. A. C. Herington, S. I. Ymer, J. L. Stevenson and P. Roupas, PSEBM 206, 238 (1994).
19. H. B. Lowman, B. C. Cunningham and J. A. Wells, JBC 266, 10982 (1991).
20. D. J. Hill, M. Freemark, A. J. Strain, S. Handwerger and R. D. G. Milner, J. Clin. Endocrinol. Metab. 66, 1283 (1988).
21. M. Urbanek, J. N. MacLeod, N. E. Cooke and S. A. Liebhaber, Mol. Endocrinol. 6, 279 (1992).
22. M. Urbanek, J. E. Russell, N. E. Cooke and S. A. Liebhaber, JBC 268, 19025 (1993).
23. M. Mercado, N. DaVila, J. F. McLeod and G. Baumann, J. Clin. Endocrinol. Metab. 78, 731 (1994).
24. S. S. Galosy, A. Gertler, G. Elberg and D. M. Laird, Mol. Cell. Endocrinol. 78, 229 (1991).
25. N. R. Staten, J. C. Byatt and G. G. Krivi, JBC 268, 18467 (1993).
26. P. Scott, M. A. Kessler and L. A. Schuler, Mol. Cell. Endocrinol. 89, 47 (1992).
27. R. V. Anthony, R. Liang, E. P. Kayl and S. L. Pratt, J. Reprod. Fertil. 49 (Suppl.), 83 (1995).
28. M. Freemark, M. Comer, T. Mularoni, A. J. D’Ercole, A. Grandis and L. Kodack, Endocrinology 125, 1504 (1989).
29. M. Freemark and S. Handwerger, Endocrinology 118, 613 (1986).
30. J. Fowlkes and M. Freemark, Pediatr. Res. 32, 200 (1992).
31. B. H. Breier, B. Funk, A. Surus, G. R. Ambler, C. A. Wells, M. J. Waters and P. D. Gluckman, Endocrinology 135, 919 (1994).
32. R. V. Anthony and S. L. Pratt, Endocrinology 136, 2150 (1995).
33. A. Igout, F. Frankenne, M. L’Hermite-Baleriaux, A. Martin and G. Hennen, Growth Regul. 5, 60 (1995).
34. D. Evain-Brion, Horm. Res. 42, 207 (1994).
35. A. Caufriez, F. Frankenne, G. Hennen and G. Copinschi, Horm. Res. 42, 62 (1994).
36. I. A. Forsyth, Exp. Clin. Endocrinol. 102, 244 (1994).
37. J. C. Byatt, P. J. Eppard, J. J. Veenhuizen, T. L. Curran, D. F. Curran, M. F. McGrath and R. J. Collier, J. Endocrinol. 140, 33 (1994).
38. J. E. Tyson, K. L. Austin and J. W. Farinholt, Am. J. Obstet. Gynecol. 109, 1080 (1971).
39. C. Williams and T. M. Coltart, Br. J. Obstet. Gynaecol. 85, 43 (1978).
40. M. Freemark, A. Keen, J. Fowlkes, T. Mularoni, M. Comer, A. Grandis and L. Kodack, Endocrinology 130, 1063 (1992).
41. G. Thordarson, G. H. McDowell, S. V. Smith, S. Iley and I. A. Forsyth, J. Endocrinol. 113, 277 (1987).
42. J. C. Byatt, P. J. Eppard, J. J. Veenhuizen, R. H. Sorbet, F. C. Buonomo and D. F. Curran, J. Endocrinol. 132, 185 (1992).
43. S. Handwerger, Endocr. Rev. 12, 329 (1991).
44. T. C. Brelje, D. W. Scharp, P. E. Lacy, L. Ogren, F. Talamantes, M. Robertson, H. G. Friesen and R. L. Sorenson, Endocrinology 132, 879 (1993).
45. A. J. Strain, D. J. Hill, I. Swenne and R. D. Milner, J. Cell. Physiol. 132, 33 (1987).
46. D. J. Hill, C. Camacho-Hubner, P. Rashid, A. J. Strain and D. R. Clemmons, J. Endocrinol. 122, 87 (1989).
47. P. A. Schoknecht, M. A. McGuire, W. S. Cohick, W. B. Currie and A. W. Bell, J. Anim. Sci. 70 (Suppl. 1), 212 (1992).
48. P. A. Schoknecht, S. N. Nobrega, J. A. Petterson, R. A. Ehrhardt, R. Slepetis and A. W. Bell, J. Anim. Sci. 69, 1059 (1991).
49. M. K. Bauer, B. H. Breier, J. E. Harding, J. D. Veldhuis and P. D. Gluckman, Endocrinology 136, 1250 (1995).
50. P. V. Nielsen, H. Pedersen and E. M. Kampmann, Am. J. Obstet. Gynecol. 135, 322 (1979).
51. J. M. Wurzel, J. S. Parks, J. E. Herd and P. V. Nielsen, DNA 1, 251 (1982).
52. M. Barinaga, L. M. Bilezikjian, W. W. Vale, M. G. Rosenfeld and R. M. Evans, Nature 314, 279 (1985).
53. J. Epelbaum, P. Dournaud, M. Fodor and C. Viollet, Crit. Rev. Neurobiol. 8, 25 (1994).
54. L. M. Bilezikjian, J. Erlichman, N. Fleischer and W. W. Vale, Mol. Endocrinol. 1, 137 (1987).
55. H. Sugihara, S. Minami, K. Okada, J. Kamegai, O. Hasegawa and I. Wakabayashi, Endocrinology 132, 1225 (1993).
56. C. Lin, S. C. Lin, C. P. Chang and M. G. Rosenfeld, Nature 360, 765 (1992).
57. R. E. Hammer, R. L. Brinster, M. G. Rosenfeld, R. M. Evans and K. E. Mayo, Nature 315, 413 (1985).
58. A. Spada, M. Bassetti, P. Gildelalamo, K. Saccomanno and A. Lania, Metab. Clin. Exp. 44, 31 (1995).
59. C. A. Landis, S. B. Masters, A. Spada, A. M. Pace, H. R. Bourne and L. Vallar, Nature 340, 692 (1989).
60. M. Hagiwara, P. Brindle, A. Harootunian, R. Armstrong, J. Rivier, W. Vale, R. Tsien and M. R. Montminy, MCBiol 13, 4852 (1993).
61. T. E. Meyer and J. F. Habener, Endocr. Rev. 14, 269 (1993).
62. C. R. Vinson, P. B. Sigler and S. L. McKnight, Science 246, 911 (1989).
63. R. S. Struthers, W. W. Vale, C. Arias, P. E. Sawchenko and M. R. Montminy, Nature 350, 622 (1991).
64. A. McCormick, H. Brady, L. E. Theill and M. Karin, Nature 345, 829 (1990).
65. S. Li, E. B. Crenshaw, III, E. J. Rawson, D. M. Simmons, L. W. Swanson and M. G. Rosenfeld, Nature 347, 528 (1990).
67. P. Dolle, J. L. Castrillo, L. E. Theill, T. Deerinck, M. Ellisman and M. Karin, Cell 60, 809 (1990).
68. M. S. Kapiloff, Y. Farkash, M. Wegner and M. G. Rosenfeld, Science 253, 786 (1991).
69. S. Dana and M. Karin, Mol. Endocrinol. 3, 815 (1989).
70. R. P. Copp and H. H. Samuels, Mol. Endocrinol. 3, 790 (1989).
71. G. A. Brent, J. W. Harney, D. D. Moore and P. R. Larsen, Mol. Endocrinol. 2, 792 (1988).
72. H. J. Steinfelder, S. Radovick, M. A. Mroczynski, P. Hauser, J. H. McClaskey, B. D. Weintraub and F. E. Wondisford, J. Clin. Invest. 89, 409 (1992).
73. M. T. Gilbert, J. Sun, Y. Yan, C. Oddoux, A. Lazarus, W. P. Tansey, T. N. Lavin and D. F. Catanzaro, JBC 269, 28049 (1994).
74. A. R. Shepard, W. Zhang and N. L. Eberhardt, JBC 269, 1804 (1994).
75. C. P. Verrijzer, J. A. van Oosterhout, W. W. van Weperen and P. C. van der Vliet, EMBO J. 10, 3007 (1991).
76. R. P. de Groot, V. Delmas and P. Sassone-Corsi, Oncogene 9, 463 (1994).
77. F. Argenton, S. Vianello, S. Bernardini, P. Jacquemin, J. Martial, A. Belayew, L. Colombo and M. Bortolussiet, BBRC 192, 1360 (1993).
78. B. Peers, A. M. Nalda, P. Monget, M. L. Voz, A. Belayew and J. A. Martial, EJB 210, 53 (1992).
79. J. Liang, K. E. Kim, W. E. Schoderbek and R. A. Maurer, Mol. Endocrinol. 6, 885 (1992).
80. C. A. Keech, S. M. Jackson, S. K. Siddiqui, K. W. Ocran and A. Gutierrez-Hartmann, Mol. Endocrinol. 6, 2059 (1992).
81. E. Hummler, T. J. Cole, J. A. Blendy, R. Ganss, A. Aguzzi, W. Schmid, F. Beermann and G. Schutz, PNAS 91, 5647 (1994).
82. J. S. Fink, M. Verhave, S. Kasper, T. Tsukada, G. Mandel and R. H. Goodman, PNAS 85, 6662 (1988).
83. M. R. Montminy and L. M. Bilezikjian, Nature 328, 175 (1987).
84. T. E. Meyer, G. Waeber, J. Lin, W. Beckmann and J. F. Habener, Endocrinology 132, 770 (1993).
85. S. E. Hyman, M. Comb, J. Pearlberg and H. M. Goodman, MCBiol 9, 321 (1989).
86. A. Ray, P. Sassone-Corsi and P. B. Sehgal, MCBiol 9, 5537 (1989).
87. J. A. Martial, P. H. Seeburg, D. Guenzi, H. M. Goodman and J. D. Baxter, PNAS 74, 4293 (1977).
88. P. A. Cattini, T. R. Anderson, J. D. Baxter, P. Mellon and N. L. Eberhardt, JBC 261, 13367 (1986).
89. W. Zhang, R. L. Brooks, D. W. Silversides, B. L. West, F. Leidig, J. D. Baxter and N. L. Eberhardt, JBC 267, 15056 (1992).
90. P. A. Cattini and N. L. Eberhardt, NARes 15, 1297 (1987).
91. F. Leidig, A. R. Shepard, W. Zhang, A. Stelter, P. A. Cattini, J. D. Baxter and N. L. Eberhardt, JBC 267, 913 (1992).
92. M. W. Nachtigal, B. E. Nickel, M. E. Klassen, W. G. Zhang, N. L. Eberhardt and P. A. Cattini, NARes 17, 4327 (1989).
93. B. E. Nickel and P. A. Cattini, Endocrinology 128, 2353 (1991).
94. H.-M. Wu and D. M. Crothers, Nature 308, 509 (1984).
95. L. S. Lerman and H. L. Frisch, Biopolymers 21, 995 (1982).
96. T. K. Kerppola and T. Curran, Cell 66, 317 (1991).
97. S. S. Zinkel and D. M. Crothers, Nature 328, 178 (1987).
98. X. P. Lu, N. L. Eberhardt and M. Pfahl, MCBiol 13, 6509 (1993).
99. I. N. King, T. de Soyza, D. F. Catanzaro and T. N. Lavin, JBC 268, 495 (1993).
100. A. M. Nardulli, C. Grobner and D. Cotter, Mol. Endocrinol. 9, 1064 (1995).
101. M. Horikoshi, C. Bertuccioli, R. Takada, J. Wang, T. Yamamoto and R. G. Roeder, PNAS 89, 1060 (1992).
102. G. P. Schroth, G. R. Cook, E. M. Bradbury and J. M. Gottesfeld, Nature 340, 487 (1989).
103. T. Léveillard, G. A. Kassavetis and E. P. Geiduschek, JBC 266, 5162 (1991).
104. P. C. van der Vliet and C. P. Verrijzer, BioEssays 15, 25 (1993).
105. A. Travis, A. Amsterdam, C. Bélanger and R. Grosschedl, Genes Dev. 5, 880 (1991).
106. K. Giese, J. Cox and R. Grosschedl, Cell 69, 185 (1992).
107. K. Giese and R. Grosschedl, EMBO J. 12, 4667 (1993).
108. K. Giese, C. Kingsley, J. R. Kirshner and R. Grosschedl, Genes Dev. 9, 995 (1995).
109. P. N. Goodfellow and R. Lovell-Badge, ARGen 27, 71 (1993).
110. A. Pontiggia, R. Rimini, V. R. Harley, P. N. Goodfellow, R. Lovell-Badge and M. E. Bianchi, EMBO J. 13, 6115 (1994).
111. L. Bracco, L. Kotlarz, A. Kolb, S. Diekmann and H. Buc, EMBO J. 8, 4289 (1989).
112. M. R. Gartenberg and D. M. Crothers, JMB 219, 217 (1991).
113. M. Pfahl, Endocr. Rev. 14, 651 (1993).
114. A. Baniahmad, I. Ha, D. Reinberg, M.-J. Tsai and B. W. O’Malley, PNAS 90, 8832 (1993).
115. D. J. Shuey and C. S. Parker, Nature 323, 459 (1986).
116. E. F. Adams, I. E. Brajkovich and K. Mashiter, J. Clin. Endocrinol. Metab. 53, 381 (1981).
117. M. H. MacGillvray, T. Aceto, Jr. and L. A. Frohman, Am. J. Dis. Child. 115, 273 (1968).
118. H. P. Katz, R. Youlton, S. L. Kaplan and M. M. Grumbach, J. Clin. Endocrinol. Metab. 53, 381 (1969).
119. G. A. Brent, J. W. Harney, Y. Chen, R. L. Warne, D. D. Moore and P. R. Larsen, Mol. Endocrinol. 3, 1996 (1989).
120. W. W. Chin, F. E. Carr, J. Burnside and D. S. Darling, Recent Prog. Horm. Res. 48, 393 (1993).
121. F. E. Carr, L. L. Kaseem and N. C. Wong, JBC 267, 18689 (1992).
122. F. E. Carr and N. C. Wong, JBC 269, 4175 (1994).
123. A. N. Hollenberg, T. Monden, T. R. Flynn, M.-E. Boers, O. Cohen and F. E. Wondisford, Mol. Endocrinol. 9, 540 (1995).
124. A. D. Johnson, Cell 81, 655 (1995).
125. M. Levine and J. L. Manley, Cell 59, 405 (1989).
126. G. Graupner, X.-K. Zhang, M. Tzukerman, K. Wills, T. Hermann and M. Pfahl, Mol. Endocrinol. 5, 365 (1991).
127. M. A. Lazar, Endocr. Rev. 14, 184 (1993).
128. D. Katz and M. A. Lazar, JBC 268, 20904 (1993).
129. K. H. Hupart, R. A. Hodin, M. A. Lazar, L. E. Shapiro, W. W. Chin and M. I. Surks, Thyroid 3, 55 (1993).
130. X. K. Zhang, K. N. Wills, M. Husmann, T. Hermann and M. Pfahl, MCBiol 11, 6016 (1991).
131. G. Lopez, F. Schaufele, P. Webb, J. M. Holloway, J. D. Baxter and P. J. Kushner, MCBiol 13, 3042 (1993).
132. F. E. Wondisford, H. J. Steinfelder, M. Nation and S. Radovick, JBC 268, 2749 (1993).
133. J. D. Fondell, A. L. Roy and R. G. Roeder, Genes Dev. 7, 1400 (1993).
134. J. M. Lehmann, X.-K. Zhang, G. Graupner, M.-O. Lee, T. Hermann, B. Hoffmann and M. Pfahl, MCBiol 13, 7698 (1993).
135. C. A. Spencer and M. Groudine, Oncogene 5, 777 (1990).
136. P. A. Pavco and D. A. Steege, JBC 265, 9960 (1990).
137. J. L. Manley and N. J. Proudfoot, Genes Dev. 8, 259 (1994).
138. H. D. Parry, G. Tebb and I. W. Mattaj, NARes 17, 3633 (1989).
139. C. Spencer, R. LeStrange, W. Hayward, U. Novak and M. Groudine, Genes Dev. 4, 75 (1990).
140. J. Greenblatt, Trends Biochem. Sci. 16, 408 (1991).
141. C. Lavau, J. Jansen and A. Dejean, Pathol. Biol. 43, 188 (1995).
142. P. J. Leedman, A. R. Stein and W. W. Chin, Mol. Endocrinol. 9, 375 (1995).
143. M. Tsuda and Y. Suzuki, PNAS 80, 7442 (1982).
144. M. D. Walker, T. Edlund, A. M. Boutlet and J. Rutter, Nature 306, 557 (1983).
145. M.-O. Ott, L. Sperling, P. Herbomel, M. Yaniv and M. C. Weiss, EMBO J. 3, 2505 (1984).
146. B. L. Rogers, M. G. Sobnosky and G. F. Saunders, NARes 14, 7647 (1986).
147. W. H. Walker, S. L. Fitzpatrick and G. F. Saunders, JBC 265, 12940 (1990).
148. S. L. Fitzpatrick, W. H. Walker and G. F. Saunders, Mol. Endocrinol. 4, 1815 (1990).
149. S.-W. Jiang and N. L. Eberhardt, JBC 269, 10384 (1994).
150. S.-W. Jiang, A. R. Shepard and N. L. Eberhardt, JBC 270, 3683 (1995).
151. S.-W. Jiang and N. L. Eberhardt, JBC 270, 13906 (1995).
152. I. Davidson, J. H. Xiao, R. Rosales, A. Staub and P. Chambon, Cell 54, 931 (1988).
153. J. H. Xiao, I. Davidson, H. Matthes, J.-M. Garnier and P. Chambon, Cell 65, 551 (1991).
154. J.-J. Hwang, P. Chambon and I. Davidson, EMBO J. 12, 2337 (1993).
155. T. Ishiji, M. J. Lace, S. Parkkinen, R. D. Anderson, T. H. Haugen, T. P. Cripe, J. H. Xiao, I. Davidson, P. Chambon and L. P. Turek, EMBO J. 11, 2271 (1992).
156. I. K. G. Farrance, J. H. Mar and C. P. Ordahl, JBC 267, 17234 (1992).
157. K. Kariya, I. K. Farrance and P. C. Simpson, JBC 268, 26658 (1993).
158. A. F. Stewart, S. B. Larkin, I. K. G. Farrance, J. H. Mar, D. E. Hall and C. P. Ordahl, JBC 269, 3147 (1994).
159. P. Jacquemin, C. Oury, B. Peers, A. Morin, A. Belayew and J. A. Martial, MCBiol 14, 93 (1994).
160. N. Shimizu, G. Smith and S. Izumo, NARes 21, 4103 (1993).
161. T. R. Bürglin, Cell 66, 11 (1991).
162. S. Campbell, M. Inamdar, V. Rodrigues, V. Raghavan, M. Palazzolo and A. Chovnick, Genes Dev. 6, 367 (1992).
163. P. M. Mirabito, T. H. Adams and W. E. Timberlake, Cell 57, 859 (1989).
164. I. Laloux, E. Dubois, M. Dewerchin and E. Jacobs, MCBiol 10, 3541 (1990).
165. W. Su, S. Jackson, R. Tjian and H. Echols, Genes Dev. 5, 820 (1991).
166. M. W. Nachtigal, B. E. Nickel and P. A. Cattini, JBC 268, 8473 (1993).
167. E. Y. Chen, Y. Liao, D. H. Smith, H. A. Barrera-Saldana, R. E. Gelinas and P. H. Seeburg, Genomics 4, 479 (1989).
168. A. Stephanou and S. Handwerger, Endocrinology 136, 933 (1995).
169. I. Harman, P. Zeitle, B. Ganong, R. M. Bell and S. H. Handwerger, Endocrinology 119, 1239 (1986).
170. P. A. Cattini, M. Klassen and M. Nachtigal, Mol. Cell. Endocrinol. 60, 217 (1988).
171. E. D. Albrecht and G. L. Pepe, Endocr. Rev. 11, 124 (1990).
172. D. L. Loriaux, H. J. Rudler, D. R. Knab and M. B. Lipsett, J. Clin. Endocrinol. Metab. 35, 887 (1972).
173. M. A. Younes, N. F. Besch and P. K. Besch, Am. J. Obstet. Gynecol. 141, 170 (1981).
174. S. C. Sharma, P. Purohit and A. J. Kao, J. Mol. Endocrinol. 11, 91 (1993).
175. G. E. Ringler and J. F. Strauss, III, Endocr. Rev. 11, 105 (1990).
176. E. J. Mitchell, K. Lee and M. D. O’Connor-McCourt, Mol. Biol. Cell 3, 1295 (1992).
177. D. W. Morrish, D. Bhardwaj, L. K. Dabbagh, H. Marusyk and O. Siy, J. Clin. Endocrinol. Metab. 65, 1282 (1987).
178. B. Bhaumick, E. P. Dawson and R. M. Bala, BBRC 144, 674 (1987).
179. B. Bhaumick, D. George and R. M. Bala, J. Clin. Endocrinol. Metab. 74, 1005 (1992).
180. S. Saito, M. Saito, M. Enomoto, A. Ito, K. Motoyoshi, T. Nakagawa and M. Ichijo, Growth Factors 9, 11 (1993).
Role of Translation Initiation Factor eIF-2B in the Regulation of Protein Synthesis in Mammalian Cells
SCOT R. KIMBALL, HARRY MELLOR, KEVIN M. FLOWERS AND LEONARD S. JEFFERSON¹
Department of Cellular and Molecular Physiology
College of Medicine
The Pennsylvania State University
Hershey, Pennsylvania 17033
I. Function of eIF-2B
   A. Overview
   B. Discovery of eIF-2B; Historical Perspective
II. Regulation of eIF-2B Activity by Phosphorylation of the α-Subunit of eIF-2/eIF-2α Kinases
III. Other Potential Mechanisms of Regulation of eIF-2B Activity
IV. Structure/Function of Individual Subunits of eIF-2B; Studies from Yeast
V. Regulation of eIF-2B Activity; Summary
VI. Cloning of eIF-2α Kinases
   A. Strategy
   B. HCR
   C. dsRNA-activated Protein Kinase
   D. eIF-2α Kinases; Summary
VII. Cloning of the α-Subunit of eIF-2B
   A. cDNA
   B. Genomic Clone
VIII. Cloning of the β-Subunit of eIF-2B
IX. Cloning of the γ-Subunit of eIF-2B
X. Cloning of the δ-Subunit of eIF-2B
XI. Cloning of the ε-Subunit of eIF-2B
   A. cDNA
   B. Genomic Clone
XII. Future Directions
References
¹ To whom correspondence may be addressed.
Progress in Nucleic Acid Research and Molecular Biology, Vol. 54
Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.
I. Function of eIF-2B
A. Overview
Translation of mRNA occurs through a complex series of reactions that are conventionally divided into three phases: initiation, elongation, and termination (reviewed in 1). The initiation phase is particularly important, because it is generally accepted as the rate-limiting step of translation, and competition of mRNA for translation is thought to be mediated through the activity of certain initiation factors (2, 3). Initiation of protein synthesis in mammalian cells is, in itself, a complex process involving more than a dozen initiation factors (abbreviated eIF) and numerous steps that lead to the formation of an 80-S initiation complex (reviewed in 2, 4). The first step in initiation is the formation of a ternary complex consisting of eIF-2, GTP, and initiator methionyl-tRNAᵢ (Met-tRNAᵢ). The ternary complex then binds to a 40-S ribosomal subunit that already has two other initiation factors, eIF-1A and eIF-3, bound to it, to form a 43-S preinitiation complex. The binding of the 43-S preinitiation complex to an mRNA molecule involves three other initiation factors, eIF-4A, eIF-4B, and eIF-4F, and results in formation of a 48-S preinitiation complex. Addition of the 60-S ribosomal subunit to form the 80-S initiation complex is preceded by the release of the initiation factors bound to the 48-S preinitiation complex and requires eIF-5. During this step, the GTP bound to eIF-2 is hydrolyzed and the eIF-2·GDP binary complex is released. Before eIF-2 can bind Met-tRNAᵢ and participate in another round of initiation, the GDP bound to eIF-2 must be exchanged for GTP. At cellular concentrations of magnesium ion, the exchange of GTP for GDP on eIF-2 occurs at a very slow rate. The initiation factor eIF-2B (also termed GEF, for guanine nucleotide exchange factor) catalyzes the exchange of GDP bound to eIF-2 for GTP, allowing eIF-2 to reform the ternary complex.
Based on the relative affinity of eIF-2 for GDP and GTP, and the stability of the eIF-2·GDP complex in the presence of Mg²⁺, if eIF-2B were not active, most of the eIF-2 in the cell would exist in an inactive complex with GDP. The importance of eIF-2B in the regulation of protein synthesis is highlighted by the number of studies reporting a change in its activity in response to various cellular perturbations. In many cases, the change in eIF-2B activity is a result of an increase in phosphorylation of the α-subunit of eIF-2. In other cases, the basis for the change in activity is unknown, but is independent of changes in eIF-2α phosphorylation. In spite of these numerous studies, little is known about how the various perturbations result in modulation of eIF-2B activity or even what the functions of the five individual subunits of the factor are. However, an understanding of the mechanisms
involved in the regulation of eIF-2B activity is of particular importance because, as discussed below, the activity of the factor becomes rate-limiting for protein synthesis under a variety of conditions. In the present essay, we review the literature involving the rate-limiting reaction catalyzed by eIF-2B as well as describe our studies of the regulation of the activity of the factor in mammalian cells.
B. Discovery of eIF-2B; Historical Perspective
The search for, and subsequent identification of, eIF-2B was prompted by an attempt to explain the mechanism involved in the inhibition of protein synthesis that occurs in rabbit reticulocyte lysates deprived of hemin (reviewed in 5, 6). In reticulocyte lysate supplemented with hemin, protein synthesis is maintained at near the in vivo rate for extended periods of time. In contrast, the rate of protein synthesis is reduced to less than 5% within 5-10 minutes in unsupplemented lysate. The inhibition of translation caused by hemin deprivation is preceded by activation of a protein kinase, alternatively termed the heme-controlled repressor (HCR) or hemin-regulated inhibitor (HRI), which specifically phosphorylates the α-subunit of eIF-2. Furthermore, the inhibition is rapidly reversed following the addition of purified eIF-2 to the hemin-deprived lysate and is mimicked by addition of HCR to hemin-supplemented lysates. These findings led to the proposal that phosphorylation of eIF-2 on its α-subunit results in inactivation of the factor. However, later findings suggested that phosphorylation of eIF-2α does not directly inhibit the ability of the factor to participate in initiation. First, although protein synthesis is inhibited by more than 95% following hemin deprivation, only about 30% of the total eIF-2α present in the lysate becomes phosphorylated. Second, addition of GTP to hemin-deprived lysate restores protein synthesis to control rates. Finally, phosphorylation of highly purified eIF-2 by HCR has no apparent effect on the ability of the factor to bind Met-tRNAᵢ or to form the 43-S preinitiation complex in in vitro reactions (reviewed in 7). In contrast, phosphorylation of crude preparations of eIF-2 inhibits its activity. The search for the factor that mediates the inhibition of eIF-2 activity in crude preparations of the factor led to the identification of a guanine nucleotide exchange activity that was initially referred to as GEF and subsequently as eIF-2B.
eIF-2B was initially purified from rabbit reticulocytes as a factor distinct from eIF-2 that could reverse the inhibition of protein synthesis in hemin-deprived lysate (8). Subsequently, eIF-2B was purified from a number of sources, including Ehrlich ascites cells (9), HeLa cells (10), and rat liver (11). Initially, eIF-2B was shown to stimulate the binding of Met-tRNAᵢ to eIF-2
and to enhance formation of 43-S preinitiation complexes in in vitro assays using nonphosphorylated, but not phosphorylated, eIF-2 (reviewed in 7). Later studies demonstrated that the function of eIF-2B is to catalyze the exchange of GDP bound to eIF-2 for free GTP, and that phosphorylation of eIF-2 by HCR inhibited the exchange reaction. The mechanism involved in the inhibition is an increase in the affinity of eIF-2 for eIF-2B following phosphorylation of eIF-2α by HCR (12-14). However, an explanation for the observation that phosphorylation of only 30% of the total eIF-2 in reticulocyte lysate is associated with a greater than 95% inhibition of protein synthesis was still lacking. Subsequent studies revealed that the molar concentration of eIF-2B in reticulocytes is only approximately 25% of that of eIF-2 (11, 15, 16). Thus, because of the greatly increased affinity of eIF-2B for eIF-2 phosphorylated on the α-subunit, phosphorylation of only 30% of the cellular eIF-2 is sufficient to inhibit essentially all of the eIF-2B in reticulocytes. In contrast, the molar ratio of eIF-2 to eIF-2B is significantly greater in liver than in reticulocytes (11, 15). Thus, phosphorylation of 30% of the eIF-2 in rat liver only results in a 50% inhibition of the rate of protein synthesis (17). Therefore, the molar ratio of the two factors is critically important in the modulation of the rate of protein synthesis in response to changes in eIF-2α phosphorylation.
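The stoichiometric argument in the preceding paragraph can be illustrated with a short calculation. The sketch below is not taken from the studies cited. It assumes, as a deliberate simplification, that each molecule of eIF-2 phosphorylated on the α-subunit traps one molecule of eIF-2B and does not release it. The function name and the second example ratio are hypothetical.

```python
def fraction_eif2b_sequestered(phospho_fraction: float, eif2b_per_eif2: float) -> float:
    """Estimate the fraction of eIF-2B trapped by phosphorylated eIF-2.

    Assumption (a simplification, not a measured binding model): each
    eIF-2(alphaP) binds one eIF-2B and does not release it.

    phospho_fraction: fraction of total eIF-2 that is phosphorylated (0 to 1).
    eif2b_per_eif2:   molar amount of eIF-2B relative to eIF-2 (must be > 0).
    """
    if not 0.0 <= phospho_fraction <= 1.0:
        raise ValueError("phospho_fraction must lie between 0 and 1")
    if eif2b_per_eif2 <= 0.0:
        raise ValueError("eif2b_per_eif2 must be positive")
    # Moles of eIF-2(alphaP) per mole of eIF-2B, capped at complete sequestration.
    return min(1.0, phospho_fraction / eif2b_per_eif2)


if __name__ == "__main__":
    # Reticulocyte values quoted in the text:
    # eIF-2B at about 25% of eIF-2, with 30% of eIF-2 phosphorylated.
    print(fraction_eif2b_sequestered(0.30, 0.25))
    # Hypothetical case in which eIF-2B is half as abundant as eIF-2.
    print(fraction_eif2b_sequestered(0.30, 0.50))
```

Under this simplification the reticulocyte values saturate eIF-2B, whereas a higher proportion of eIF-2B leaves part of the factor free to catalyze exchange.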
II. Regulation of eIF-2B Activity by Phosphorylation of the α-Subunit of eIF-2/eIF-2α Kinases
As described above, an important mechanism for regulating initiation in eukaryotic cells involves phosphorylation of the α-subunit of eIF-2. An increase in the phosphorylation state of eIF-2α is associated with an inhibition of protein synthesis in response to such diverse stimuli as heat shock (18, 19), heavy metals (20), calcium-mobilizing hormones (21), agents that interfere with protein processing and transport in the endoplasmic reticulum (ER) (22, 23), viral infection (24), and deprivation of amino acids (19, 25, 26), glucose (19), purines (27), or serum (19). However, phosphorylation of eIF-2α does not directly inhibit formation of either the eIF-2·GTP·Met-tRNAᵢ ternary complex or the 43-S preinitiation complex, as these reactions proceed efficiently in vitro with the phosphorylated factor (28). Instead, eIF-2 phosphorylated on the α-subunit is thought to act as a competitive inhibitor of eIF-2B. Three lines of evidence support this assumption: (1) eIF-2B does not catalyze GDP-exchange on eIF-2 phosphorylated on the
α-subunit (14); (2) eIF-2(αP) can displace unphosphorylated eIF-2 bound to eIF-2B, but unphosphorylated eIF-2 does not displace eIF-2(αP) bound to eIF-2B (5); and (3) eIF-2(αP) inhibits the activity of eIF-2B in the presence of low concentrations of substrate [i.e.,
III. Other Potential Mechanisms of Regulation of eIF-2B Activity
In addition to an inhibition of eIF-2B in response to phosphorylation of eIF-2α, the activity of eIF-2B can be regulated through mechanisms independent of changes in eIF-2α phosphorylation. For example, insulin stimulates the activity of eIF-2B in the skeletal muscle of diabetic rats (42, 43) as well as in cultures of serum-deprived Swiss 3T3 fibroblasts (44). Several studies report no change in the phosphorylation state of eIF-2α in skeletal muscle in response to insulin deprivation caused by either starvation (45, 46) or diabetes (43). Likewise, insulin has no effect on the extent of eIF-2α phosphorylation in Swiss 3T3 fibroblasts (44). Therefore, the action of insulin to stimulate the activity of eIF-2B is not associated with any change in phosphorylation of eIF-2α.
The activity of eIF-2B can also be regulated in vitro in the absence of a change in eIF-2α phosphorylation. eIF-2B purified from rabbit reticulocytes was associated with NADPH (47). The GDP-exchange activity of the factor in in vitro assays is almost completely inhibited in the presence of either NAD+ or NADP+ (43, 47). The inhibitory effects of oxidized pyridine dinucleotides are prevented by equimolar amounts of either NADH or NADPH. Despite these results obtained in vitro, there is little evidence that pyridine dinucleotides are involved in the regulation of eIF-2B activity in vivo. In fact, the cytosolic concentration of NAD+ in rat liver is approximately 1000-fold higher than that of NADH (48). Furthermore, although the relative cytosolic concentration of NADPH is 10-fold higher than that of NADP+ (48), the actual concentration of NADP+ is significantly less than needed to inhibit the activity of eIF-2B in vitro (43, 47). The only exception may occur during the fertilization of sea urchin eggs, where activation of eIF-2B correlates with increasing NADPH concentrations (49).
However, even in this case, the evidence suggests that the stimulation of eIF-2B activity is probably not due to a direct effect of NADPH on eIF-2B.
Another mechanism by which the activity of eIF-2B might be regulated in the absence of a change in eIF-2α phosphorylation involves phosphorylation of the ε-subunit of eIF-2B. The ε-subunit of eIF-2B can be phosphorylated in vitro by at least three different protein kinases: casein kinase I (CK-I) (50), CK-II (50, 51), and glycogen synthase kinase-3 (GSK-3) (44). Presently, there is no evidence that phosphorylation of eIF-2Bε by CK-I causes any change in guanine nucleotide exchange activity. However, indirect evidence suggests that phosphorylation of the factor by GSK-3 results in inhibition of eIF-2B in vivo. In particular, insulin both stimulates the activity of eIF-2B (44) and inhibits the activity of GSK-3 (52) in cells in culture. Furthermore, an insulin-inhibited eIF-2Bε kinase co-purifies with GSK-3 from CHO.T
cells (52). These results suggest that insulin might regulate the activity of eIF-2B by changes in the phosphorylation state of eIF-2Bε caused by modulation of the activity of GSK-3. It has also been proposed that the ε-subunit of eIF-2B is a substrate for at least one additional kinase, because phosphorylation of many of the substrates of GSK-3 requires prior phosphorylation by a separate kinase (53). The identity of the "priming kinase" is unknown; however, if it exists it is distinct from CK-I or -II (unpublished observations).
Three separate studies have shown that the ε-subunit of eIF-2B can be phosphorylated by CK-II in vitro (12, 50, 54). Dholakia and Wahba (51) reported that phosphorylation of eIF-2B by CK-II results in a fivefold increase in the activity of the factor. Following dephosphorylation of eIF-2B by alkaline phosphatase, the activity of the factor was reduced by a factor of five. In contrast to this study, Oldfield and Proud (50) found that phosphorylation of eIF-2B by either CK-I or -II had no effect on guanine nucleotide exchange activity. Likewise, we have failed to observe any change in eIF-2B activity following either phosphorylation of eIF-2Bε by CK-II or dephosphorylation of the phosphorylated factor by alkaline phosphatase (S. R. Kimball and L. S. Jefferson, unpublished observations).
IV. Structure/Function of Individual Subunits of eIF-2B; Studies from Yeast
eIF-2B is a heteropentamer consisting of subunits termed α, β, γ, δ, and ε. Each of the subunits of eIF-2B has been cloned from S. cerevisiae. In addition, the mammalian α-, β-, δ-, and ε-subunits of the factor have been cloned. The molecular masses of these four subunits, as determined from amino-acid sequences derived from cDNA clones, are similar in mammals [33.6, 39.0, 57.2, and 82 kDa (55-58), respectively] and S. cerevisiae [34, 42.6, 71, and 81.2 kDa (59-61), respectively]. The overall amino-acid sequence identity of mammalian and yeast open reading frames is 40, 36, 36, and 30% for the α-, β-, δ-, and ε-subunits, respectively. Furthermore, the apparent molecular mass of the mammalian γ-subunit, as determined by SDS-polyacrylamide gel electrophoresis [58 kDa (11)], is similar to that predicted for the corresponding subunit from yeast [68 kDa (62)], as derived from the cDNA clone. The functions of the individual subunits of eIF-2B have not been delineated, although recent work in yeast suggests that the α-subunit of eIF-2B may be involved in recognition of the phosphorylation state of the α-subunit of eIF-2 (35). Furthermore, recent studies in yeast suggest that the β- and δ-subunits of eIF-2B might also be involved in
responding to phosphorylation of eIF-2α because point mutations in these two subunits result in the same phenotype as does deletion of eIF-2Bα (63). It is noteworthy that both the β- and δ-subunits of eIF-2B exhibit regions of amino-acid sequence similarity with eIF-2Bα (59). It has been proposed that all of the mutations in the α-, β-, and δ-subunits of eIF-2B weaken the stable interaction between eIF-2(αP) and eIF-2B that is required for sequestration of eIF-2B into an inactive complex (63). Because the mutations have no effect on cellular growth when GCN2 is inactive, they probably do not substantially alter the guanine nucleotide exchange activity of eIF-2B under control conditions.
V. Regulation of eIF-2B Activity; Summary
As described above, the activity of eIF-2B is regulated through a variety of mechanisms. This regulation is currently being studied in a number of species, including yeast, rat, rabbit, and human. A current aim of the investigations ongoing in our laboratory is to derive a system based on a single species that can be used for the genetic analysis of the regulation of eIF-2B activity. The components of this system include not only the individual subunits of eIF-2 and eIF-2B, but also the kinase(s) and phosphatase(s) that are involved in regulating the phosphorylation state of the α-subunit of eIF-2 and the ε-subunit of eIF-2B. Toward this end, we, and others, have begun to clone and characterize cDNAs for these various proteins. The remainder of this article comprises a review of the current status of these efforts.
VI. Cloning of eIF-2α Kinases
A. Strategy
In order to investigate the mechanism(s) involved in the regulation of eIF-2α phosphorylation, we have initiated studies designed to identify and clone the eIF-2α kinases present in rat tissues. We have used the polymerase chain reaction (PCR) to achieve this aim. As described previously (34), rabbit reticulocyte HCR, human and murine PKR, and yeast GCN2 exhibit regions of amino-acid sequence identity between the conserved protein kinase domains IV and VI. Degenerate sense and antisense primers have been designed based on these regions of homology and used to amplify eIF-2α kinases starting from cDNA produced from reverse transcription of poly(A)+ RNA from rat liver (30, 64). Three major cDNA products were produced in the reactions. The three cDNAs were purified on a low-melting-point
agarose gel, subcloned, and sequenced. When one of the products was subcloned, two independent clones were obtained, both of approximately equal length. One of these showed strong (88%) identity to rat testosterone 6β-hydroxylase; the second cDNA had no homology to any eIF-2α kinase or to any reported DNA sequence. The other two cDNAs exhibited sequence identity with HCR (30) and PKR (64). The amplification of a PCR product from rat liver RNA that exhibited sequence homology with PKR was not surprising because of the ubiquitous nature of this kinase. However, we were surprised that the other product was homologous to HCR because HCR had been reported to be present only in tissues of erythroid origin (37). Therefore, initial efforts were directed toward cloning the HCR-like eIF-2α kinase.
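The use of degenerate primers described above can be illustrated with a brief sketch. A degenerate primer is a mixture of oligonucleotides written with the standard IUPAC single-letter ambiguity codes, and the code below expands such a primer into every sequence it represents. The example primer is hypothetical and is not one of the primers used in the studies cited; the function names are likewise illustrative.

```python
from itertools import product
from typing import List

# Standard IUPAC single-letter nucleotide codes and the bases each represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Return the number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand_primer(primer: str) -> List[str]:
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # Hypothetical primer: GAY TTY GG covers both codons for Asp and for Phe.
    primer = "GAYTTYGG"
    print(degeneracy(primer))
    for sequence in expand_primer(primer):
        print(sequence)
```

Keeping the degeneracy low, for example by choosing conserved regions rich in amino acids with few codons, limits the number of distinct oligonucleotides in the primer pool.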
B. HCR
The cDNA product that exhibited 84% homology to rabbit HCR (29) was used to screen a rat brain cDNA library (30). From approximately 400,000 clones, 30 positive plaques were obtained. As shown in Fig. 1, one of these clones (RBH4) contained an open reading frame of 1863 bp. The derived amino-acid sequence was 81% homologous to the full coding sequence of rabbit HCR. The other clones were truncations of the full-length clone. The amino-acid sequence deduced from the rat cDNA clone is 82% homologous to rabbit reticulocyte HCR overall and contains the 11 conserved catalytic subdomains of protein kinases (65). Variations between the derived amino-acid sequences for rabbit and rat are largely confined to the N terminus and to an approximately 135-amino-acid region located between protein kinase subdomains IV and VI, where the homology is reduced to only 56%. The rat sequence also contains a frameshift mutation toward the C terminus that leads to a seven-amino-acid truncation compared to the rabbit sequence.
The rat brain HCR clone was expressed as a histidine-tagged fusion in a prokaryotic expression system (30). The histidine tag allowed one-step affinity purification on a Ni²⁺-charged agarose column. eIF-2α kinase activity was eluted from this column with 250 mM imidazole and was judged to be approximately 30% pure by SDS/PAGE analysis and Coomassie staining. No eIF-2α kinase activity was detected in extracts of nontransfected cells. The recombinant enzyme was fully active in phosphorylating eIF-2α, but no autophosphorylating activity was detected. It is possible that HCR isolated from a prokaryotic system is fully active and, therefore, already phosphorylated. The recombinant rat brain enzyme was hemin-sensitive, because addition of 10 µM hemin to the reaction caused approximately 50% inhibition of eIF-2α kinase activity. Rabbit reticulocyte HCR is inhibited by hemin with a Ki of approximately 3 µM (66).
FIG. 1. The cDNA sequence and predicted amino-acid sequence of rat HCR. The cDNA for rat HCR was cloned from a rat brain cDNA library (30). The upper sequence is the nucleotide sequence of the cDNA clone and the lower sequence is the predicted amino-acid sequence. Nucleotide residues and amino acids are numbered on the left. [Panels A and B: sequence listing not reproduced.]

The expression of HCR mRNA in various rat tissues was examined by Northern blot analysis of poly(A)+ RNA (30). The HCR probe produced a strong signal with the rabbit and rat reticulocyte poly(A)+ RNA samples. In most rat tissues, HCR message was expressed at approximately one-tenth of the level found in reticulocytes. The exception to this was psoas muscle, where the amount of HCR mRNA was approximately equal to that found in reticulocytes. The membrane was stripped and reprobed with an erythroid-specific βMaj-globin cDNA. This probe corresponded to nucleotides 1-444 of the rat βMaj-globin cDNA (67). The globin probe produced a strong signal with the two reticulocyte samples, but did not detect globin message in the remaining samples. This result confirmed that the HCR signal in the nonerythroid samples was not due to contamination with reticulocyte mRNA. The finding that HCR mRNA is expressed in all rat tissues examined is in conflict with a report (68) that a monoclonal antibody against rabbit reticulocyte HCR failed to detect HCR in nonerythroid tissues. One explanation
for the apparent conflict may be the relatively low expression of HCR mRNA in nonerythroid tissues compared to reticulocytes. Our studies show that HCR mRNA is expressed in other tissues at approximately one-tenth of the amount found in reticulocytes. Although the levels of expressed protein may be higher or lower, the differences in mRNA appear to correlate with HCR activity in nonerythroid cells (unpublished observation).

FIG. 2. The cDNA sequence and predicted amino-acid sequence of rat PKR. The cDNA for rat PKR was cloned from a rat brain cDNA library (64). The upper sequence is the nucleotide sequence of the cDNA clone and the lower sequence is the predicted amino-acid sequence. Only the coding region of the cDNA is shown; the full-length cDNA is 3826 nucleotides (64). Nucleotide residues and amino acids are numbered on the left. [Panels A and B: sequence listing not reproduced.]
C. dsRNA-activated Protein Kinase
The cDNA product that exhibited 84% homology to murine PKR (31, 69) was used to screen a rat brain cDNA library (64). From approximately 200,000 clones, one positive plaque was obtained. This was purified and the cDNA insert was subcloned and sequenced. Analysis of the cDNA clone revealed that it lacked the full coding sequence of PKR. Additional sequence was obtained using 5'-rapid amplification of cDNA ends (RACE) and primers designed from the sequence of the partial cDNA. The cDNA sequences of the products of three separate reactions were compared to detect errors that occurred during the reverse transcription and the PCR amplification. The rat brain PKR clone is a 3841-bp cDNA with an exceptionally long (2145 bp) 3'-untranslated region (UTR) terminating in a poly(A) tail (64). It is unlikely that this long UTR represents a cloning artifact because we have isolated an essentially identical cDNA from a rat liver cDNA library. As shown in Fig. 2, the complete rat brain PKR cDNA contains an open reading frame of 1542 bp that encodes a 514-amino-acid protein. The derived amino-acid sequence displays 76% identity to murine PKR (31, 69) and 61% to the human enzyme (32, 70). Sequence identity is greatest between the conserved protein kinase domains (65) and the two RNA-binding elements (71, 72). Although the rat enzyme shows greater homology to murine PKR overall, the RNA-binding domains are more conserved between the human and rat enzymes. Using a cDNA probe corresponding to nucleotides 1-1151 of rat PKR, we detected a single mRNA species by Northern blot analysis of various rat tissues (64). The apparent size of this RNA (4.4 kb) is similar to that of our cDNA clone. The expression of PKR message was very similar among rat tissues with the exception of testis and reticulocytes, in which PKR mRNA was barely detectable. The expression of β-actin mRNA was also significantly lower in reticulocytes than in other tissues; however, this is a situation we consistently observe with this tissue.
D. eIF-2α Kinases: Summary
To date, two eIF-2α kinases have been identified and characterized in mammalian cells, PKR and HCR. PKR has a clearly defined role in the cellular response to viral infection and a newly identified role in the response to the inhibition of protein processing in the endoplasmic reticulum. In addition, the regulation of HCR by heme in cells of erythroid origin is well characterized. In contrast, the identity of the eIF-2α kinase responsible for changes in eIF-2α phosphorylation in response to other stimuli in mammalian cells is unknown. Likewise, the regulation of the HCR-like kinase identified in nonerythroid tissues has yet to be defined. The availability of cDNA clones for these kinases will allow further investigation of the role of the individual kinases in the regulation of the phosphorylation state of the α-subunit of eIF-2 in mammalian cells.
VII. Cloning of the α-Subunit of eIF-2B
A. cDNA
In yeast, the α-subunit of eIF-2B is the product of the GCN3 gene and is the only subunit of the factor that is not essential for viability (reviewed in 53). A series of studies suggests that the α-subunit is involved in recognizing the phosphorylation status of eIF-2α (reviewed in 73). In yeast deprived of amino acids, the eIF-2α kinase GCN2 is activated, resulting in an increase in the phosphorylation state of eIF-2α (40). The increase in eIF-2α phosphorylation results in induction of GCN4, a transcriptional activator of genes encoding more than 30 enzymes involved in nine different amino-acid biosynthetic pathways (reviewed in 73, 74). Deletion of eIF-2Bα has no effect on cellular growth rate in yeast under nonstarvation conditions (75). However, eIF-2Bα is required for induction of GCN4 under amino-acid starvation conditions (76). A more recent study has identified point mutations in the α-subunit of eIF-2B that cause the same phenotype observed in cells deleted for eIF-2Bα (63). These results suggest that the primary function of the α-subunit of eIF-2B is to mediate the inhibitory effects of eIF-2α phosphorylation on the activity of eIF-2B.
The α-subunit of eIF-2B was initially cloned from yeast as a positive regulator required for increased synthesis of GCN4 during amino-acid deprivation (60). In the same study, it was found that GCN3 could suppress the effects of gcd1 mutations under nonstarvation conditions. GCD1 was later shown to be the yeast equivalent of the γ-subunit of eIF-2B (77). A combination of 5' and 3' mapping revealed that the mRNA for GCN3 is 1.2 kb long and codes for a protein with a molecular mass of 34.0 kDa (60).
Mammalian eIF-2Bα was cloned using PCR to amplify a partial eIF-2Bα cDNA, using degenerate oligonucleotides corresponding to peptide sequences derived from eIF-2B purified from bovine liver (55). The products of the initial reaction were reamplified using "nested" oligonucleotides, and three products from this reaction were subcloned and sequenced. One of the cDNA fragments contained a sequence that encoded additional eIF-2Bα peptide sequence (i.e., sequence not used to design the primers). The other PCR-generated cDNAs showed no significant homology to any of the eIF-2Bα peptides or to GCN3. To isolate a cDNA corresponding to the entire coding region of eIF-2Bα, the cDNA fragment that encoded eIF-2Bα peptide sequence was radiolabeled and used to screen a rat brain cDNA library. From 3.0 × 10⁵ recombinant phages screened, seven positive phage clones were purified and the inserts excised as pBluescript SK(-) vectors.
These clones were analyzed by multiple restriction digests and partial sequencing and were found to contain overlapping cDNA fragments. The largest of these clones (pBra,,,) contained a 1.5-kb insert that was sequenced completely across both strands. The eIF-2Bα cDNA is 1510 bp long with an open reading frame (ORF) encoding a protein of 305 amino-acid residues with a predicted molecular mass of 33.7 kDa. The initial ATG codon of this ORF is contained within a translation initiation consensus site (78). The TGA termination codon is located at nucleotide position 980. The cDNA contains a 3'-terminal poly(A) region 17 bp long but has no classical polyadenylylation signal (79). The deduced amino-acid sequence of this cDNA contains all of the amino-acid sequences of the peptides derived by proteolytic digestion of bovine eIF-2Bα. As shown in Fig. 3, the derived amino-acid sequence of the rat eIF-2Bα
FIG. 3. Comparison of the deduced amino-acid sequences of the α-subunit of eIF-2B. The upper sequence is the predicted amino-acid sequence of rat eIF-2Bα (55) and the lower sequence is that of S. cerevisiae GCN3 (59). Identical residues are boxed. Residues are numbered on the left. The alignment was constructed with the MEGALIGN computer program (DNAStar).
cDNA displays 42% identity to that of S. cerevisiae GCN3 and 40% identity to a hypothetical open reading frame from a Caenorhabditis elegans genomic cosmid clone (80). The three derived amino-acid sequences display 27% identity overall, with the greatest variability occurring at the N termini of the proteins. The size of the rat eIF-2Bα mRNA was determined by Northern blot analysis of rat liver total and poly(A)+ RNA. A single band was observed at a position of approximately 1575 nucleotides following stringent washing. A 5' RACE procedure (81) was performed using several nested antisense nondegenerate oligonucleotide primers in an attempt to obtain further upstream sequences. This yielded no additional sequence other than that already contained in the cDNA. Taken together, these results suggest that nearly all of the 5'-untranslated mRNA sequence is contained within the eIF-2Bα cDNA.
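The identity figures quoted above depend on how aligned positions are counted. As a minimal sketch, assuming identity is computed over alignment columns in which neither sequence has a gap (MEGALIGN may well use a different denominator), the calculation can be written in Python as follows. The function name and the two short aligned strings are hypothetical and are not taken from the rat or yeast sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence is gapped.

    Assumption: gapped columns are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Invented toy alignment (hypothetical, for illustration only).
toy_a = "MKT-LLVAG"
toy_b = "MRTALL-AG"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

Dividing by the full alignment length rather than by the ungapped columns would give a lower value for the same alignment, which is one reason identity figures from different programs are not always directly comparable.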
The pBra,,, plasmid was translated in vitro using a coupled transcription/translation system (Promega). An aliquot of the 35S-labeled translation reaction was size-fractionated by SDS-PAGE and visualized by autoradiography. A predominant peptide product was observed with an apparent molecular mass of 34 kDa that comigrated with eIF-2Bα purified from rat liver. This peptide was not observed in reactions using the nonrecombinant pBluescript vector.
To determine whether rat liver eIF-2Bα is functionally homologous to GCN3, the rat cDNA was inserted 3' of a galactose-inducible promoter on a high-copy-number autonomously replicating plasmid. The resulting construct, paYEX4, was introduced into strains of yeast containing a chromosomal deletion of either GCN3 or GCN2. Introduction of paYEX4 into the gcn3Δ strain conferred increased resistance to 3-amino-1,2,4-triazole (3-AT) relative to vector alone, at a level somewhat reduced from that given by plasmid mp 116 bearing GCN3. The 3-AT-resistant phenotype of the paYEX4 transformants was observed when cells were grown with galactose, but not glucose, as carbon source. These results suggest that the rat eIF-2Bα protein is substituting for GCN3 in the yeast eIF-2B complex and restoring the induction of GCN4 and its target genes in the histidine pathway. Introduction of paYEX4 into the gcn2Δ strain did not confer 3-AT resistance relative to vector alone, indicating that rat eIF-2Bα, like GCN3, is dependent on phosphorylation of eIF-2α by GCN2 for its stimulatory effects on histidine biosynthesis. Thus, rat eIF-2Bα is functionally substituting for GCN3 rather than bypassing the requirement for eIF-2α phosphorylation, e.g., by interfering with eIF-2 or eIF-2B function.
B. Genomic Clone
A rat eIF-2Bα cDNA probe was used to screen a rat genomic library, and four overlapping positive clones were isolated. These clones were analyzed by Southern blot hybridization, and positive fragments were subcloned and then sequenced to compile the rat eIF-2Bα gene. The rat eIF-2Bα gene is composed of 9 exons that range in size from 68 to 677 bp and are contained within 8.5 kb of genomic DNA. The positions of the exons were derived by aligning the genomic sequence with that of the rat eIF-2Bα cDNA (82). The AUG start codon and 5'-UTR are encoded by exon 1. Exon 9 encodes the TGA stop codon and the 3'-UTR. The cleavage of the RNA transcript occurs at a GA and this is followed in the genomic sequence by three copies of the trinucleotide TGT, a common feature of mammalian genes (79). The transcriptional start site of the eIF-2Bα gene was mapped by primer extension. A single radiolabeled product was observed, indicating that transcription starts at a single site. The size of this product places the transcriptional start site 34 bp upstream from the 5' end of the rat eIF-2Bα cDNA
clone previously isolated (82). The eIF-2Bα mRNA would therefore be 1544 nucleotides long, including the poly(A) tail. The eIF-2Bα gene promoter region lacks a TATA box (TATAA), which is generally located approximately -30 bp from the transcriptional start site (83). TATA-less genes are often housekeeping genes, and the eIF-2Bα promoter region shares some other characteristics of such genes. There are two copies of the CCAAT box (84) present in reverse complement at positions -359 and -457 and a single copy of the consensus binding site for the SP1 factor, GGGCGG (85), also in reverse complement, at position -55. Immediately preceding the transcriptional start site in the eIF-2Bα gene is a potential binding site for the CREB/ATF family of transcription factors (GTGACGYMR at position -13) that mediate the transcriptional response to cAMP (86). Recently, a novel transcription factor has been characterized that binds to a specific sequence element in the promoter region of the eIF-2α gene (87). A consensus binding site for this factor, termed α-Pal, has been located in the promoter region of the gene encoding the β-subunit of eIF-2 (87). The promoter region of the rat eIF-2Bα gene contains a partially conserved (10 out of 12 bases) copy of the α-Pal consensus binding motif, TGCGCATGCGCA (87), at position -341. Pairwise comparison of the promoter regions of the rat eIF-2Bα and eIF-2Bε genes reveals five short regions of identity. Although the significance of these homologies is uncertain, it would be interesting to examine the sequence of the recently cloned human eIF-2Bδ gene (88) to see whether any of these elements are general to eIF-2B genes.
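Promoter elements of the kind described above are reported both in reverse complement and as partial matches to a consensus (for example, 10 out of 12 bases). The Python sketch below illustrates one simple way such sites could be located: each window of the sequence is compared with the consensus and with its reverse complement, and windows meeting a minimum number of matching bases are returned. It is not the procedure used in the original work; the function names and the 26-nt test sequence are hypothetical, and offsets are 0-based positions within that string rather than positions relative to a transcriptional start site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_matches(window: str, motif: str) -> int:
    """Number of positions at which window and motif carry the same base."""
    return sum(1 for w, m in zip(window, motif) if w == m)


def find_sites(seq: str, consensus: str, min_matches: int):
    """Return (offset, strand, matches) for every window with >= min_matches.

    Strand "-" means the window matches the reverse complement of the consensus.
    A palindromic consensus is scanned once to avoid duplicate hits.
    """
    seq = seq.upper()
    k = len(consensus)
    motifs = [("+", consensus)]
    rc = reverse_complement(consensus)
    if rc != consensus:
        motifs.append(("-", rc))
    hits = []
    for strand, motif in motifs:
        for i in range(len(seq) - k + 1):
            matches = count_matches(seq[i:i + k], motif)
            if matches >= min_matches:
                hits.append((i, strand, matches))
    return hits


# Hypothetical sequence: an SP1 site in reverse complement (CCGCCC) and a
# 12-mer differing from the alpha-Pal consensus at two positions.
toy_promoter = "AATTCCGCCCTTTGCGCTTGCGTAGG"
sp1_hits = find_sites(toy_promoter, "GGGCGG", min_matches=6)
pal_hits = find_sites(toy_promoter, "TGCGCATGCGCA", min_matches=10)
print(sp1_hits, pal_hits)
```

Lowering `min_matches` increases sensitivity at the cost of more chance matches, so any threshold used in practice would need to be judged against the length and base composition of the region searched.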
VIII. Cloning of the β-Subunit of eIF-2B
Recent studies suggest that, in addition to eIF-2Bα, the β- and δ-subunits of yeast eIF-2B might also be involved in responding to phosphorylation of eIF-2α because point mutations in these two subunits result in the same phenotype as does deletion of eIF-2Bα (63). It is noteworthy that both the β- and δ-subunits of yeast eIF-2B exhibit regions of amino-acid sequence similarity with the α-subunit (59). It has been proposed that all of the mutations in the α-, β-, and δ-subunits of eIF-2B weaken the stable interaction between eIF-2(αP) and eIF-2B that is required for sequestration of eIF-2B into an inactive complex (63). Because the mutations have no effect on cellular growth when GCN2 is inactive, they probably do not substantially alter the guanine nucleotide exchange activity of eIF-2B under control conditions. However, unlike deletion of eIF-2Bα, deletion of eIF-2Bβ is unconditionally lethal (62), suggesting that the β-subunit of the factor may have additional role(s) in the structure and/or function of the protein. A study showing that
eIF-2Bβ is radiolabeled following photoaffinity labeling of the protein with 8-[32P]azidoGTP (89) suggests that the β-subunit may play a direct role in the guanine nucleotide exchange reaction. Photoaffinity labeling of eIF-2Bβ is prevented by incubation with excess nonradiolabeled GTP or GDP but not ATP or NADP+, indicating that the affinity labeling is specific for guanine nucleotides. The β-subunit of eIF-2B was initially cloned from yeast as a product of the GCD7 gene (62). Like the γ-, δ-, and ε-subunits of yeast eIF-2B, the β-subunit of the factor was originally identified as a protein involved in the increased translation of GCN4 in response to amino-acid deprivation. For
FIG. 4. Comparison of the deduced amino-acid sequences of the β-subunit of eIF-2B. The upper sequence is the predicted amino-acid sequence of rat eIF-2Bβ (90) and the lower sequence is that of S. cerevisiae GCD7 (61). Identical residues are boxed. Residues are numbered on the left. The alignment was constructed with the MEGALIGN computer program (DNAStar).
each of these proteins, nonlethal mutations cause a constitutive induction of GCN4 and result in the concomitant derepression of a number of genes in the amino-acid biosynthetic pathway (reviewed in 73, 74). This phenomenon has been referred to as the general control response, and the proteins involved in the response are referred to as the GCD, or general control derepressing, proteins. Unlike the case for GCN3 and GCD1, it was suspected before GCD7 was cloned that its product might be a subunit of eIF-2B (59). This assumption was based on the prior identification of GCN3, GCD1, and GCD2 as part of a GCD complex associated with eIF-2 that had an essential role in peptide-chain initiation, and the observation that GCD7 exhibits several properties in common with GCD1 and GCD2 (75). The identification of GCD7 as the β-subunit of eIF-2B was confirmed when the corresponding mammalian subunit was cloned.
The cDNA for the mammalian β-subunit of eIF-2B was first cloned from a rabbit liver cDNA library in λgt11 (56). The rabbit cDNA has a predicted open reading frame encoding a protein of 351 amino acids with a molecular mass of 39.0 kDa. The predicted length and molecular weight of the β-subunit of rat eIF-2B are identical to those of the rabbit protein, although the sequences are only 88% identical (90). By comparison, the yeast cDNA is predicted to encode a protein of 381 amino acids with a molecular weight of 42.6 kDa (59). As shown in Fig. 4, the deduced amino-acid sequence of rat eIF-2Bβ shows 35% identity and 65% similarity to GCD7, strongly suggesting that GCD7 is the yeast equivalent of rabbit eIF-2Bβ.
IX. Cloning of the γ-Subunit of eIF-2B
The γ-subunit of eIF-2B is one of the two subunits of the factor that are radiolabeled following incubation with 8-[32P]azidoATP, the other being the δ-subunit (89). Radiolabeling is blocked by coincubation with nonradiolabeled ATP or NADP+, although fivefold higher concentrations of NADP+ than of ATP are required to attain similar levels of blocking. Furthermore, ATP binds to eIF-2B in in vitro reactions, and this binding is prevented by NADP+ (89). These results suggest that the γ- and δ-subunits may be involved in the regulation of the activity of the factor by pyridine dinucleotides (43, 47). The studies also suggest that the two subunits might both contain nucleotide binding sites for ATP and pyridine dinucleotides.
The γ-subunit of eIF-2B has not yet been cloned from any mammalian cell. The yeast equivalent of eIF-2Bγ, GCD1, was initially cloned by Hill and Struhl (62). GCD1 was identified by a mutation that caused constitutive induction of GCN4 and the amino-acid biosynthetic genes as well as temperature-sensitive growth. GCD1 is one of a set of proteins that exhibit this phenotype; i.e., mutations in the proteins result in a derepression of the genes in the amino-acid biosynthetic pathway (reviewed in 73, 74). The cDNA has a predicted open reading frame encoding a protein of 511 amino acids with a molecular mass of 57.8 kDa (62). The predicted molecular weight of yeast eIF-2Bγ is similar to the apparent molecular weight of the protein from rat (55.7 kDa) and rabbit (56.0 kDa) as measured by SDS-polyacrylamide gel electrophoresis (11).
The amino-acid sequence of GCD1 shows regions of homology to GCD6, the yeast equivalent of the ε-subunit of eIF-2B (59). GCD1 also exhibits sequence similarity to the mammalian eIF-2Bε, although to a lesser extent than for GCD6. It is interesting that the residues conserved between GCD1 and GCD6 are conserved in mammalian eIF-2Bε more frequently than would be expected based on the degree of identity between GCD6 and eIF-2Bε (53). This finding suggests that the common residues may play an important role in the function of the protein.
X. Cloning of the δ-Subunit of eIF-2B
The δ-subunit of eIF-2B was initially cloned from yeast as the product of the GCD2 gene (61). It was later cloned from rabbit (57), mouse (91), and rat (90). It is noteworthy that the size of the yeast protein predicted from the open reading frame of the cDNA is significantly larger than that predicted for the proteins from rat, mouse, or rabbit (Table I). As shown in Fig. 5, most of the difference in size is due to a stretch of approximately 60 amino acids present in the carboxyl end of the yeast protein that is not present in the mammalian protein. The derived amino-acid sequence of the eIF-2Bδ protein from rat exhibits 87, 94, and 33% identity to the corresponding proteins from rabbit, mouse, and yeast, respectively. The sequences from rabbit and mouse likewise show approximately 33% identity to the yeast protein. However, portions of the sequence near the C terminus exhibit higher levels of identity among species than the rest of the protein. For example, one stretch of 37 amino acids in the yeast protein is approximately 60% identical to a stretch of corresponding amino acids in the mammalian proteins. The sections of the protein that demonstrate high homology among species may be important in structural or functional aspects of the protein.
TABLE I
COMPARISON OF THE PREDICTED SIZE OF THE SUBUNITS OF eIF-2B FROM VARIOUS SPECIES
Subunit   Species      Number of amino-acid residues   Predicted Mr (kDa)
α         Yeast        306                             34.0
          Rat          305                             33.7
β         Yeast        381                             42.6
          Rat          351                             39.0
          Rabbit       351                             39.0
γ         Yeast        511                             57.8
δ         Yeast        651                             70.9
          Rat          524                             57.8
          Rabbit       523                             57.0
          Mouse (a)    544                             59.6
          Mouse (b)    524                             57.6
ε         Yeast        712                             81.2
          Rat          717                             80.2
          Rabbit       721                             80.1
The amino-acid sequences of the α-, β-, and δ-subunits of yeast eIF-2B exhibit a significant level of similarity, leading to the conclusion that the proteins constitute a family of proteins with related functions (59). The greatest similarity is observed in the C-terminal portion of the proteins. A number of mutations in yeast eIF-2Bα have been described that lead to a constitutive increase in the expression of GCN4 (60, 63). A similar phenotype has also been observed in cells containing mutations in the β- and δ-subunits of the factor (63). It is noteworthy that many of the mutations occur at amino acids conserved among the three subunits and that mutations of conserved amino acids seem to have a greater effect on GCN4 expression than mutations of amino acids that are not conserved. In agreement with the sequence homology reported in yeast, alignment of the sequences of the rat eIF-2Bα and eIF-2Bβ proteins reveals 31% identity between the two proteins. However, in contrast to the results in yeast, the C termini of the rat proteins do not appear to be more similar to one another than the remainder of the proteins.
As discussed under Section IX above, the δ-subunit of eIF-2B is the second of the two subunits of the mammalian factor that can be cross-linked to ATP in vitro (89). However, the potential nucleotide binding domain noted previously in yeast eIF-2Bδ (GnkigGK) (57) is not conserved in the δ-subunit from rat, rabbit, or mouse. These results suggest that the γ-subunit of the mammalian factor may be the primary binding site for adenine nucleotides and that the δ-subunit may be photolabeled by 8-[32P]azidoATP because of the proximity of the two subunits in the holoprotein.
FIG. 5. Comparison of the deduced amino-acid sequences of the δ-subunit of eIF-2B. The upper sequence is the predicted amino-acid sequence of rat eIF-2Bδ (90) and the lower sequence is that of S. cerevisiae GCD2 (61). Identical residues are boxed. Residues are numbered on the left. The alignment was constructed with the MEGALIGN computer program (DNAStar).
XI. Cloning of the ε-Subunit of eIF-2B
A. cDNA
The ε-subunit of eIF-2B is of particular interest because it is the only one of the five subunits of the factor that has been shown to be phosphorylated (11, 50, 51, 54, 59, 92). The available evidence suggests that phosphorylation may modulate the activity of the holoprotein (51, 92). As discussed above, the ε-subunit of eIF-2B can be phosphorylated by both CK-I and CK-II in vitro (50, 51, 54). A third protein kinase that phosphorylates eIF-2Bε in vitro is GSK-3 (52). This finding is particularly exciting because GSK-3 is regulated by insulin and could therefore serve as a transducer of the insulin-stimulated increase in the synthesis of both protein and glycogen. A caveat to this idea is that, to date, it has not been possible to demonstrate a change in eIF-2B activity in response to phosphorylation by GSK-3. In part, this failure is probably a result of the fact that GSK-3 does not stoichiometrically phosphorylate purified eIF-2B in in vitro assays. The consensus phosphorylation site for GSK-3 is Ser/Thr-[X]n-Ser(P)/Thr(P), where the second serine/threonine in the sequence is referred to as the "priming" site (93, 94). If the priming site is not phosphorylated, then GSK-3 does not phosphorylate the first serine or threonine in the sequence. The number of amino acids between the GSK-3 phosphorylation site and the priming site varies between substrates, but in most cases (e.g., glycogen synthase, phosphatase G-subunit, and L-myc) the number is three. We have found that dephosphorylation of eIF-2B with alkaline phosphatase significantly reduces the subsequent phosphorylation by GSK-3 (unpublished observation). These results suggest that the failure to observe a change in the activity of purified eIF-2B following phosphorylation by GSK-3 may be a consequence of dephosphorylation of the priming site during purification of eIF-2B.
At the present time, neither the identity of the kinase that phosphorylates the priming site nor the site phosphorylated by GSK-3 is known. Therefore, one of the reasons for cloning the cDNAs for the ε-subunit of eIF-2B was to aid in the identification of the sites phosphorylated by CK-I, CK-II, and GSK-3. The ε-subunit of eIF-2B was cloned simultaneously from rabbit and yeast (59). The yeast protein is a product of the GCD6 gene and is required for the translational control of GCN4 expression during amino-acid deprivation (59). As shown in Fig. 6, the yeast cDNA has a predicted open reading frame encoding a protein of 712 amino acids with a molecular mass of 81.2 kDa (59). The initial report of the cloning of the rabbit eIF-2Bε subunit included only a partial amino-acid sequence. Subsequently, we obtained full-length cDNAs for both rabbit and rat (58, 95).
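Because the GSK-3 consensus described above is defined by the spacing between a target serine/threonine and a downstream priming serine/threonine, candidate sites in a deduced protein sequence can be enumerated with a simple pattern search. The Python sketch below is offered only as an illustration under stated assumptions: the function name and the toy peptides are hypothetical, the default spacing of three residues reflects the most common case mentioned in the text, and a sequence search of this kind cannot tell whether the priming residue is actually phosphorylated in the cell.

```python
import re


def candidate_gsk3_sites(protein: str, spacing: int = 3):
    """List (target, priming) residue numbers (1-based) matching S/T-X{spacing}-S/T.

    A zero-width lookahead is used so that overlapping candidates are reported.
    """
    pattern = re.compile(r"(?=[ST].{%d}[ST])" % spacing)
    sites = []
    for match in pattern.finditer(protein.upper()):
        target = match.start() + 1            # residue GSK-3 would phosphorylate
        priming = match.start() + spacing + 2  # downstream priming residue
        sites.append((target, priming))
    return sites


# Invented peptides (hypothetical, not derived from eIF-2B-epsilon).
single = candidate_gsk3_sites("AASPPESAAT")
chained = candidate_gsk3_sites("SAAASAAAS")
print(single, chained)
```

The second example shows why overlapping matches matter: in substrates such as glycogen synthase, a residue phosphorylated by GSK-3 can itself act as the priming site for the next residue upstream.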
FIG. 6. Comparison of the deduced amino-acid sequences of the ε-subunit of eIF-2B. The upper sequence is the predicted amino-acid sequence of rat eIF-2Bε (95) and the lower sequence is that of S. cerevisiae GCD6 (59). Identical residues are boxed. Residues are numbered on the left. The alignment was constructed with the MEGALIGN computer program (DNAStar).
The cDNA for rabbit eIF-2Bε was isolated from a rabbit reticulocyte cDNA expression library with an antibody against the rabbit eIF-2/eIF-2B complex (58). A single immunopositive clone was isolated from approximately 3.4 × 10⁵ recombinant clones screened. The cDNA insert is 2508 bp long and contains an open reading frame predicted to encode a protein containing 714 amino acids with a molecular mass of 79.5 kDa. However, the open reading frame does not start with an AUG, suggesting that the clone is not full length. The amino-acid sequence predicted by the partial cDNA is 30% identical and 55% similar to that of the yeast Gcd6 protein. The 5' end of the rabbit eIF-2Bε mRNA was amplified and cloned using a set of nested primers and 5' RACE. The mRNA was found to contain an additional 92 nucleotides and an in-frame AUG start codon at position 72. The coding region of the composite cDNA is predicted to encode a protein containing 721 amino acids with a molecular mass of 80.1 kDa.
The cDNA for rat eIF-2Bε was cloned from a rat liver cDNA library using a radiolabeled partial rabbit eIF-2Bε cDNA as probe (59). From 1.5 × 10⁶ recombinant phages screened, six positive phage clones were purified and were found to contain identical 1.8-kb inserts. Nested antisense oligonucleotides were synthesized based on this cDNA sequence and used in 5' RACE protocols with total RNA of rat liver as template. Three sequential reactions produced cDNAs that together yielded an additional 640 bp of upstream rat liver eIF-2Bε sequence. The 1.8-kb and 5'-RACE-derived cDNAs were radiolabeled and used to screen a rat brain cDNA library. Seven unique clones were isolated from the brain cDNA library; the largest of these clones (pBres) contained a 2475-base insert, which was sequenced completely across both strands. The cDNA contained a 3'-terminal poly(A) region 57 bp long but lacked an AUG start codon.
The sequence of pBre, was used to design oligonucleotide primers for RACE and a further 47 bp of cDNA sequence was obtained. The compiled 2522-bp cDNA contains a 2148-bp open reading frame and terminates in a 57-bp poly(A) tail. The 3'-UTR lacks the canonical AATAAA polyadenylylation signal but contains the sequence AGTAAA located 14 nucleotides before the poly(A) tail, and this has been shown to be an effective polyadenylylation signal (79). The 5' end of the cDNA has a very high G + C content (a 120-nucleotide stretch of 75% G + C), which may explain the difficulty experienced in obtaining a full-length cDNA. The polypeptide predicted by the open reading frame is 715 amino acids in length and contains sequences essentially identical to those of the three proteolytic peptides obtained from purified bovine eIF-2Bε. The AUG start codon is in a good consensus sequence for translation initiation (78) and is
preceded by an in-frame TGA termination codon 15 nucleotides upstream of the AUG. The rat eIF-2Bε peptide sequence was used to search the SwissProt sequence database using the Smith-Waterman algorithm (96). As shown in Fig. 6, rat eIF-2Bε shows 87% identity to the predicted amino-acid sequence of a rabbit eIF-2Bε cDNA (58) and 30% identity to the yeast Gcd6 protein.
The eIF-2Bε cDNA encodes a polypeptide with a predicted molecular mass of 80.2 kDa. In contrast, purified rat eIF-2Bε comigrates with the 97-kDa phosphorylase b marker on SDS-polyacrylamide gels (11). To investigate this discrepancy, the rat eIF-2Bε cDNA was transcribed and translated in the reticulocyte lysate coupled transcription/translation system and the products were analyzed by SDS-polyacrylamide gel electrophoresis. Translation yielded one major product that comigrated with both phosphorylase b and 32P-labeled eIF-2Bε. This product was never observed in a reaction using the nonrecombinant pBluescript II vector. The observation is consistent with the rat eIF-2Bε cDNA encoding a full-length protein and suggests that eIF-2Bε migrates anomalously during SDS-polyacrylamide gel electrophoresis. This may explain the variations in mobility of eIF-2Bε on SDS-polyacrylamide gel electrophoresis that have been observed among different species (11, 50).
The expression of eIF-2Bε mRNA in various tissues was examined by Northern blot analyses of poly(A)+ RNA using a rat eIF-2Bε cDNA probe. Two hybridizing species were detected at high stringency: a strong band at approximately 2.7 kb and a weaker band at approximately 3.5 kb. The 2.7-kb species is similar in size to the cDNA clone (2.5 kb); however, the identity of the 3.5-kb species is unclear. As expected for an essential part of the protein synthetic machinery, eIF-2Bε message was expressed in all rat tissues examined. The 2.7-kb species showed a higher level of expression in testis than we have observed for other eIF-2B subunit mRNAs (unpublished observations).
Consistent with this observation is the finding that the expression of eIF-2Bε in rabbit testes is approximately threefold higher than that observed in other tissues. The relevance of this finding is unclear, although this tissue is highly active in protein synthesis. The relative expression of both species was greatly reduced in reticulocytes, as was that of the control β-actin mRNA. This is a situation that is consistently observed with reticulocytes and probably reflects the predominance of erythroid-specific mRNAs in these cells. The 2.7-kb species was also detected in the two rabbit tissues examined. The signal was weaker using RNA from either rabbit tissue compared to the signal obtained using RNA isolated from any rat tissues. This finding probably reflects a reduced affinity of the rat cDNA probe for the rabbit mRNA.
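The database search mentioned in this section relies on the Smith-Waterman local alignment algorithm (96). For readers unfamiliar with it, the Python sketch below computes the best local alignment score between two sequences by dynamic programming. It is deliberately simplified and is not the implementation used for the SwissProt search: it assumes fixed match and mismatch scores and a linear gap penalty, whereas protein searches normally use an amino-acid substitution matrix and affine gap penalties. The function name, scoring values, and test strings are all illustrative assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Each cell holds the best score of any local alignment ending at that pair
    of positions; scores are floored at zero so an alignment can start anywhere.
    """
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy strings (hypothetical): a shared core flanked by unrelated residues.
shared_core = smith_waterman_score("XXACGTXX", "YYACGTYY")
unrelated = smith_waterman_score("AAAA", "TTTT")
print(shared_core, unrelated)
```

Only two rows of the matrix are kept in memory because the score alone is wanted; recovering the alignment itself would require storing the full matrix (or traceback pointers) and walking back from the highest-scoring cell.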
B. Genomic Clone
A rat genomic library was screened with a rat eIF-2Bε cDNA probe and an 11.5-kb positive clone was isolated. Southern blot analysis of this clone using the rat eIF-2Bε cDNA probe identified a single 5.5-kb EcoRI fragment, which was subcloned and sequenced. The fragment contained eIF-2Bε cDNA sequence from the 3' end of the clone. The 5.5-kb partial genomic clone was used to design oligonucleotide primers for a PCR-based screen of a rat P1 genomic library (97). A single positive clone was isolated that comprised approximately 85 kb of genomic DNA. This clone was digested with EcoRI or BamHI and fragments were analyzed by Southern blot hybridization with the eIF-2Bε cDNA probe. Three positive fragments were isolated and subcloned, and the sequence of the gene was compiled from these overlapping clones. The rat eIF-2Bε gene is contained within 9.5 kb of DNA and is divided into 16 exons, ranging in size from 78 to 357 bp. The positions of the exons were derived by comparing the cDNA sequence to that of the gene and by computer analysis of the gene sequence using the GRAIL algorithm (98). The intron/exon boundaries all conform to the consensus pattern for mammalian genes (99). Exon 1 encodes the 5'-UTR and contains the AUG start codon. The 3'-flanking region exhibits features common to mammalian genes (79); cleavage of the RNA transcript occurs 13 nucleotides downstream from the polyadenylylation signal at a TA and this is followed by a G + T-rich region that contains three copies of the trinucleotide TGT. The transcriptional start site of the eIF-2Bε gene was mapped by primer extension and a single radiolabeled product was observed. The length of this product places the transcriptional start site 65 bp upstream from the 5' end of the longest 5' RACE product obtained. The rat eIF-2Bε mRNA would therefore be 2530 nucleotides long, excluding the poly(A) tail.
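The transcript lengths quoted in this chapter follow from simple bookkeeping on the cDNA length, the distance from the mapped start site to the 5' end of the cDNA, and the poly(A) tail. The short Python sketch below reproduces that arithmetic using only figures given in the text; the function names are hypothetical, and the conversion from open reading frame length to residue number assumes that the quoted 2148-bp open reading frame includes the stop codon.

```python
def transcript_length(cdna_bp: int, upstream_bp: int,
                      poly_a_bp: int, include_poly_a: bool) -> int:
    """mRNA length from cDNA length plus the distance to the mapped start site.

    cdna_bp is taken to include the poly(A) tract present in the clone.
    """
    length = cdna_bp + upstream_bp
    if not include_poly_a:
        length -= poly_a_bp
    return length


def residues_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino-acid residues encoded by an ORF of the given length."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# eIF-2B-epsilon: 2522-bp cDNA with a 57-bp poly(A) tail, start site 65 bp upstream.
epsilon_mrna = transcript_length(2522, 65, 57, include_poly_a=False)
# eIF-2B-alpha: 1510-bp cDNA with a 17-bp poly(A) tract, start site 34 bp upstream.
alpha_mrna = transcript_length(1510, 34, 17, include_poly_a=True)
# Assumption: the 2148-bp ORF is counted with its stop codon.
epsilon_residues = residues_from_orf(2148)
print(epsilon_mrna, alpha_mrna, epsilon_residues)
```

Under these assumptions the values agree with the lengths of 2530 and 1544 nucleotides and the 715-residue polypeptide stated in the text.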
Examination of the sequence of the 5'-flanking region of the gene reveals a TATA-like element (TATACA) at position -29 relative to the transcriptional start site. Immediately adjacent to this is a copy of the consensus binding sequence for the α-Pal transcription factor, a protein first isolated as a possible regulator of eIF-2α gene transcription (87). The promoter element is conserved at 9 out of 12 (9/12) positions, and the core element (ATGCGCA) is conserved exactly. Two different cytokine response elements are present in the 5'-flanking region. There is a possible type-2 IL-6 response element (100) at position -722 (8/12 bases conserved) and a cytokine-1 element (101) at position -319 (8/10). There is also a cytokine-2 element at position -837 (7/7). The cytokine-2 element is found a short distance downstream from the cytokine-1 element in the promoter region of the granulocyte-macrophage colony-stimulating factor gene and is highly conserved among species (101). Its function as a possible promoter element is unclear. Because of the relatively higher expression of eIF-2Bε message in testis compared to other tissues, the gene promoter region was examined for testis-specific promoter/enhancer elements. There is one copy of a GGGTGGGG element at position -41 (8/8). This is an element of unknown function that is found in the promoter region of several testis-specific genes, including the genes for human phosphoglycerate kinase-2 and murine protamine-1 and -2 (102). The 5'-flanking region also contains two potential Ad4 binding sites (103) at positions -408 (7/9) and -1645 (8/9). The Ad4 transcription factor is involved in regulating the expression of cytochrome P-450 genes, which have a role in steroid hormone biosynthesis, and is present in various steroidogenic tissues (103). There is no apparent consensus binding site for the testis-specific transcription factor Tet-1 (104, 105). Finally, the 5'-flanking region contains a copy of the type-4 ID element, a rat repetitive element ancestrally derived from a tRNA gene (106), which has no known function.
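The conservation scores quoted above (for example 9/12 or 8/10) are position-by-position match counts between a candidate site and a consensus sequence. The following is a minimal sketch of that count, assuming ungapped, equal-length sequences. The function name and both 12-nucleotide strings are hypothetical placeholders chosen for illustration; they are not the rat promoter sequence or the published α-Pal consensus, although the placeholder strings do embed the core element ATGCGCA mentioned in the text.

```python
def conserved_positions(site: str, consensus: str) -> tuple[int, int]:
    """Count identical positions between a candidate site and a consensus.

    Both strings must have the same length; no gaps are considered.
    Returns (number_of_matches, consensus_length).
    """
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    matches = sum(a == b for a, b in zip(site.upper(), consensus.upper()))
    return matches, len(consensus)


# Hypothetical sequences (assumptions for illustration only).
hypothetical_consensus = "TGATGCGCATGA"
hypothetical_site = "TCATGCGCATCT"

matched, total = conserved_positions(hypothetical_site, hypothetical_consensus)
print(f"{matched}/{total} positions conserved")

# Check whether a core element is retained exactly in the candidate site.
core = "ATGCGCA"
print("core conserved:", core in hypothetical_site)
```

A score of this kind says nothing about whether a factor actually binds; it only ranks candidate sites for experimental follow-up.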
XII. Future Directions
Over the 16 years since eIF-2B was initially isolated, significant progress has been made toward understanding the regulation and mechanism of action of the factor in its role in initiation. Yet many questions remain unanswered. Although several protein kinases that phosphorylate either the α-subunit of eIF-2 or the ε-subunit of eIF-2B have been identified in mammalian cells, it seems likely that future studies will reveal additional kinases that phosphorylate the two factors. In particular, it seems likely that a mammalian equivalent of the yeast GCN2 kinase will be found. In addition, although the ε-subunit of eIF-2B is a substrate for at least three different protein kinases, the effect of phosphorylation by these kinases on the activity of the factor is still in question. Furthermore, our unpublished studies suggest that at least one other as yet unidentified eIF-2Bε kinase is present in extracts of rat skeletal muscle. Other studies are needed to define the role that the individual subunits of eIF-2B play in catalyzing the exchange of guanine nucleotides on eIF-2. Although recent studies in yeast suggest that the α-subunit of eIF-2B may play a regulatory role, the function of the other four subunits is still a mystery. The results of these studies will be of particular interest because other guanine nucleotide exchange factors are much smaller and consist of fewer subunits than does eIF-2B. It seems likely that several of the subunits of the factor may function in roles other than catalyzing the guanine nucleotide exchange reaction. Finally, a number of studies have reported the modulation of eIF-2B activity in vitro by compounds such as polyamines, heparin, adenine nucleotides, and pyridine dinucleotides. The importance of these observations to the regulation of eIF-2B activity in vivo is as yet undefined.
ACKNOWLEDGMENTS
This work was supported by Grants DK13499 and DK15658 from the National Institutes of Health.
REFERENCES
1. K. Moldave, ARB 54, 1109 (1985).
2. J. W. B. Hershey, ARB 60, 717 (1991).
3. C. G. Proud, Curr. Top. Cell. Regul. 32, 243 (1992).
4. V. M. Pain, BJ 235, 625 (1986).
5. B. Safer, R. Jagus, A. Konieczny and D. Crouch, in “The Mechanism of Translational Inhibition in Hemin-deficient Lysates” (M. Grunberg-Manago and B. Safer, eds.), p. 311. Elsevier, Amsterdam, 1982.
6. B. Safer, Cell 33, 7 (1983).
7. S. Ochoa, ABB 223, 325 (1983).
8. H. Amesz, H. Goumans, T. Haubrich-Morree, H. O. Voorma and R. Benne, EJB 98, 513 (1979).
9. R. Panniers and E. C. Henshaw, JBC 258, 7928 (1983).
10. T. M. Mariano, J. Siekierka and S. Ochoa, BBRC 134, 1160 (1986).
11. S. R. Kimball, A. M. Karinch, R. C. Feldhoff, H. Mellor and L. S. Jefferson, BBA 1201, 473 (1994).
12. J. N. Dholakia and A. J. Wahba, JBC 264, 546 (1989).
13. D. J. Goss, L. J. Parkhurst, H. B. Mehta, C. L. Woodley and A. J. Wahba, JBC 259, 7374 (1984).
14. A. G. Rowlands, R. Panniers and E. C. Henshaw, JBC 263, 5526 (1988).
15. S. Oldfield, B. L. Jones, D. Tanton and C. G. Proud, EJB 221, 399 (1994).
16. A. G. Rowlands, K. S. Montine, E. C. Henshaw and R. Panniers, EJB 175, 93 (1988).
17. S. R. Kimball and L. S. Jefferson, JBC 265, 16794 (1990).
18. R. Duncan and J. W. B. Hershey, JBC 259, 11882 (1984).
19. K. A. Scorsone, R. Panniers, A. G. Rowlands and E. C. Henshaw, JBC 262, 14538 (1987).
20. R. Hurst, J. R. Schatz and R. L. Matts, JBC 262, 15939 (1987).
21. S. R. Kimball and L. S. Jefferson, Am. J. Physiol. 263, E958 (1992).
22. C. R. Prostko, M. A. Brostrom, E. M. Malara and C. O. Brostrom, JBC 267, 16751 (1992).
23. C. R. Prostko, M. A. Brostrom and C. O. Brostrom, MCBchem 127-128, 255 (1993).
24. R. J. Schneider and T. Schenk, ARB 56, 317 (1987).
25. S. R. Kimball and L. S. Jefferson, JBC 266, 1969 (1991).
26. M. J. Marton, D. Crouch and A. G. Hinnebusch, MCBiol 13, 3541 (1993).
27. R. J. Rolfes and A. G. Hinnebusch, MCBiol 13, 5099 (1993).
28. H. Trachsel and T. Staehelin, PNAS 75, 204 (1978).
29. J.-J. Chen, M. S. Throop, L. Gehrke, I. Kuo, J. K. Pal, M. Brodsky and I. M. London, PNAS 88, 7729 (1991).
30. H. Mellor, K. M. Flowers, S. R. Kimball and L. S. Jefferson, JBC 269, 10201 (1994).
31. G. S. Feng, K. Chong, A. Kumar and B. R. G. Williams, PNAS 89, 5447 (1992).
32. E. F. Meurs, K. Chong, J. Galabru, N. S. B. Thomas, I. Kerr, B. R. G. Williams and A. G. Hovanessian, Cell 62, 379 (1990).
33. R. C. Wek, B. M. Jackson and A. G. Hinnebusch, PNAS 86, 4579 (1989).
34. C. E. Samuel, JBC 268, 7603 (1993).
35. T. E. Dever, J.-J. Chen, G. N. Barber, A. M. Cigan, L. Feng, T. F. Donahue, I. M. London, M. G. Katze and A. G. Hinnebusch, PNAS 90, 4616 (1993).
36. R. Jagus, W. F. Anderson and B. Safer, This Series 25, 127 (1982).
37. J. K. Pal, J.-J. Chen and I. M. London, Bchem 30, 2555 (1991).
38. C. R. Prostko, J. N. Dholakia, M. A. Brostrom and C. O. Brostrom, JBC 270, 6211 (1995).
39. A. Kumar, J. Haque, J. Lacoste, J. Hiscott and B. R. Williams, PNAS 91, 6288 (1994).
40. T. E. Dever, L. Feng, R. C. Wek, A. M. Cigan, T. F. Donahue and A. G. Hinnebusch, Cell 68, 585 (1992).
41. R. C. Wek, M. Ramirez, B. M. Jackson and A. G. Hinnebusch, MCBiol 10, 2820 (1990).
42. S. R. Kimball and L. S. Jefferson, BBRC 156, 706 (1988).
43. A. M. Karinch, S. R. Kimball, T. C. Vary and L. S. Jefferson, Am. J. Physiol. 264, E101 (1993).
44. G. I. Welsh and C. G. Proud, BJ 284, 19 (1992).
45. I. W. Jeffrey, F. J. Kelly, R. Duncan, J. W. B. Hershey and V. M. Pain, Biochimie 72, 751 (1990).
46. S. Cox, N. T. Redpath and C. G. Proud, FEBS Lett. 239, 333 (1988).
47. J. N. Dholakia, T. C. Mueser, C. L. Woodley, L. J. Parkhurst and A. J. Wahba, PNAS 83, 6746 (1986).
48. A. L. Greenbaum, K. A. Gumaa and P. McLean, ABB 143, 617 (1971).
49. G. R. Akkaraju, L. J. Hansen and R. Jagus, JBC 266, 24451 (1991).
50. S. Oldfield and C. G. Proud, EJB 208, 73 (1992).
51. J. N. Dholakia and A. J. Wahba, PNAS 85, 51 (1988).
52. G. I. Welsh and C. G. Proud, BJ 294, 625 (1993).
53. N. Price and C. G. Proud, Biochimie 76, 748 (1994).
54. A. R. Aroor, N. D. Denslow, L. P. Singh, T. W. O'Brien and A. J. Wahba, Bchem 33, 3350 (1994).
55. K. M. Flowers, S. R. Kimball, R. C. Feldhoff, A. G. Hinnebusch and L. S. Jefferson, PNAS 92, 4274 (1995).
56. B. L. Craddock, N. T. Price and C. G. Proud, BJ 309, 1009 (1995).
57. N. T. Price, G. Francia, L. Hall and C. G. Proud, BBA 1217, 207 (1994).
58. A. I. Asuru, H. Mellor, N. S. B. Thomas, L. Yu, J.-J. Chen, J. S. Crosby, S. D. Hartson, S. R. Kimball, L. S. Jefferson and R. L. Matts, BBA in press (1996).
59. J. L. Bushman, A. I. Asuru, R. L. Matts and A. G. Hinnebusch, MCBiol 13, 1920 (1993).
60. E. M. Hannig and A. G. Hinnebusch, MCBiol 8, 4808 (1988).
61. C. J. Paddon, E. M. Hannig and A. G. Hinnebusch, Genetics 122, 551 (1989).
62. D. E. Hill and K. Struhl, NARes 16, 9253 (1988).
63. C. R. Vazquez de Aldana and A. G. Hinnebusch, MCBiol 14, 3208 (1994).
64. H. Mellor, K. M. Flowers, S. R. Kimball and L. S. Jefferson, BBA 1219, 693 (1994).
65. S. K. Hanks, A. M. Quinn and T. Hunter, Science 241, 42 (1988).
66. H. Mellor, N. T. Price, T. F. Sarre and C. G. Proud, EJB 211, 529 (1993).
67. H. Satoh, H. Fujii and T. Okazaki, BBRC 146, 618 (1987).
68. J. S. Crosby, K. Lee, I. M. London and J.-J. Chen, MCBiol 14, 3906 (1994).
69. P. L. Icely, P. Gross, J. J. M. Bergeron, A. Devault, D. E. H. Afar and J. C. Bell, JBC 266, 16073 (1991).
70. D. C. Thomis, J. P. Doohan and C. E. Samuel, Virology 188, 33 (1992).
71. S. R. Green and M. B. Mathews, Genes Dev. 6, 2478 (1992).
72. S. J. McCormack, D. C. Thomis and C. E. Samuel, Virology 188, 47 (1992).
73. A. G. Hinnebusch, Mol. Microbiol. 10, 215 (1993).
74. A. G. Hinnebusch, TIBS 19, 409 (1994).
75. A. M. Cigan, M. Foiani, E. Hannig and A. G. Hinnebusch, MCBiol 11, 3217 (1991).
76. A. G. Hinnebusch and R. D. Klausner, in “Examples of Eukaryotic Translational Control: GCN4 and Ferritin” (H. Trachsel, ed.), p. 243. CRC Press, Boca Raton, FL, 1991.
77. A. M. Cigan, J. L. Bushman, T. R. Boal and A. G. Hinnebusch, PNAS 90, 5350 (1993).
78. M. Kozak, J. Cell Biol. 108, 229 (1989).
79. M. L. Birnstiel, M. Busslinger and K. Strub, Cell 41, 349 (1985).
80. M. A. Frohman, M. K. Dush and G. R. Martin, PNAS 85, 8998 (1988).
81. J. Sulston et al., Nature NB 356, 37 (1992).
82. K. M. Flowers, H. Mellor, S. R. Kimball and L. S. Jefferson, BBA in press (1995).
83. R. Breathnach and P. Chambon, ARB 50, 349 (1981).
84. L. A. Chodosh, A. S. Baldwin, R. W. Carthew and P. A. Sharp, Cell 53, 11 (1988).
85. W. S. Dynan and R. Tjian, Nature NB 316, 774 (1985).
86. J. L. Meinkoth, A. S. Alberts, W. Went, D. Fantozzi, S. S. Taylor, M. Hagiwara, M. Montminy and J. R. Feramisco, MCBchem 127, 179 (1993).
87. B. J. S. Efiok, J. A. Chiorini and B. Safer, JBC 269, 18921 (1994).
88. S. Henderson and B. Sollner-Webb, Cell 47, 891 (1986).
89. J. N. Dholakia, B. R. Francis, B. E. Haley and A. J. Wahba, JBC 264, 20638 (1989).
90. N. T. Price, H. Mellor, B. L. Craddock, K. M. Flowers, T. Wilmer, S. R. Kimball, L. S. Jefferson and C. G. Proud, unpublished.
91. R. A. Henderson, G. W. Krissansen, R. Y. Y. Yong, E. Leung, J. D. Watson and J. N. Dholakia, JBC 269, 30517 (1994).
92. L. P. Singh, A. R. Aroor and A. J. Wahba, Bchem 33, 9152 (1994).
93. C. J. Fiol, A. Wang, R. W. Roeske and P. J. Roach, JBC 265, 6061 (1990).
94. S. E. Plyte, K. Hughes, E. Nikolakaki, B. J. Pulverer and W. J. R. Biomed. Biochim. Acta 1114, 147 (1992).
95. K. M. Flowers, H. Mellor, R. L. Matts, S. R. Kimball and L. S. Jefferson, BBA in press (1995).
96. T. F. Smith and M. S. Waterman, Adv. Appl. Math. 2, 482 (1981).
97. N. Sternberg, Trends Genet. 8, 10 (1992).
98. E. C. Uberbacher and R. J. Mural, PNAS 88, 11261 (1991).
99. R. A. Padgett, P. J. Grabowski, M. M. Konarska, S. Seiler and P. A. Sharp, ARB 55, 1119 (1986).
100. S. Akira, Y. Nishio, M. Inoue, X. J. Wang, S. Wei, T. Matsusaka, K. Yoshida, T. Sudo, M. Naruto and T. Kishimoto, Cell 77, 62 (1993).
101. M. F. Shannon, J. R. Gamble and M. A. Vadas, PNAS 85, 674 (1988).
102. M. O. Robinson, J. R. McCarrey and M. I. Simon, PNAS 86, 8437 (1989).
103. K. Morohashi, S. Honda, Y. Inomata, H. Handa and T. Omura, JBC 267, 17913 (1992).
104. T. Howard, R. Balogh, P. Overbeek and K. E. Bernstein, MCBiol 13, 18 (1993).
105. T. Tamura, Y. Makino, K. Mikoshiba and M. Muramatsu, JBC 267, 4327 (1992).
106. J. Kim, J. A. Martignetti, M. A. Shen, J. Brosius and P. Deininger, PNAS 91, 3607 (1994).
Enzymology of DNA Transfer by Conjugative Mechanisms
WERNER PANSEGRAU AND ERICH LANKA
Max-Planck-Institut für Molekulare Genetik, D-14195 Berlin, Germany
I. Model of the Transfer Process
II. Organization of IncP Transfer Regions
III. Mating Aggregate Formation
A. Definition of a Core System for Conjugative Transfer
B. A Single Membrane Protein Complex Functions as P... and Sustains Pilus Assembly and Conjugative DNA Transport
C. TraC-like Proteins
D. The IncP Entry Exclusion Function Is Specified by trbK
E. Biochemical Analysis of Mpf Functions
IV. DNA Processing Reactions
A. Relaxosome Assembly at the IncP Transfer Origin
B. DNA Relaxases
C. Accessory Proteins
D. Biochemical Methods for ... Relaxosomes
E. DNA Primases
V. Phylogenetic Relationships to Other Systems
VI. Conclusions and Perspectives
References
Abbreviations: C-terminus, carboxy terminus; Dtr, DNA processing; Hfr, high frequency of recombination; Inc, incompatibility group; IPTG, isopropyl-β-D-thiogalactopyranoside; IS element, insertion element; Mpf, mating aggregate formation; N-terminus, amino terminus; PTH amino acid, phenylthiohydantoin amino acid; RBS, ribosomal binding site; R-strand, retained strand; SDS, sodium dodecyl sulfate; T-complex, nucleoprotein complex of T-DNA; T-DNA, tumorigenic DNA; Tra, transfer region; T-strand, transferred strand.
Corresponding author: Erich Lanka.
I. Model of the Transfer Process
Bacterial conjugation is one of the major routes of genetic exchange in prokaryotes. Although the process has been studied for 50 years, the enzymology of many of its steps is still an enigma. In 1946, Lederberg and Tatum (1) discovered that Escherichia coli K12 can act as a donor for chromosomal genes. A cryptic conjugative plasmid (the F-factor) was later recognized to be responsible for the donor activity of E. coli K12 (2, 3). The F-factor contains several copies of insertion (IS) elements that also occur in the chromosome of E. coli and other Gram-negative bacteria (4). These IS elements are “hot spots” for chromosomal integration of F via homologous recombination. In the integrated state, designated as Hfr (high frequency of recombination), transfer of chromosomal genes occurs with high probability. The gene transfer commences within the integrated plasmid and is unidirectional. This property can be used to map chromosomal genes by determining the period of time required for a certain marker to arrive in the recipient. By the same technique, it was also shown that the E. coli chromosome is a circular entity (5). Bacterial conjugation is still used as a tool for introducing genetic information into organisms for which transformation procedures do not exist. By using shuttle vectors with alternative origins of vegetative replication, genetic information can be transferred and stably established across species boundaries between organisms as phylogenetically remote as E. coli and Saccharomyces cerevisiae (6). Recently, tumorigenic DNA (T-DNA) transfer from Agrobacterium tumefaciens to plant cells has been recognized as a special form of bacterial conjugation, adapted to the requirements of transkingdom gene transfer (7, 8). The T-DNA transfer system is extensively used for the genetic manipulation of plants. The major drawback of this method, however, is the low susceptibility of monocotyledonous plants to Agrobacterium-mediated gene transfer (9).
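The time-of-entry mapping mentioned above (ordering chromosomal markers by how long they take to arrive in the recipient during Hfr transfer) can be illustrated with a short sketch. It assumes that transfer is unidirectional from a fixed origin, so that earlier entry means closer to the origin. The marker names and entry times are hypothetical values invented for illustration; they are not data from the chapter.

```python
def order_by_entry_time(entry_minutes: dict[str, float]) -> list[tuple[str, float]]:
    """Sort markers by the time (in minutes) at which they first appear in recipients."""
    return sorted(entry_minutes.items(), key=lambda item: item[1])


def adjacent_intervals(ordered: list[tuple[str, float]]) -> list[tuple[str, str, float]]:
    """Return the map distance, in minutes of transfer, between neighbouring markers."""
    return [(a[0], b[0], b[1] - a[1]) for a, b in zip(ordered, ordered[1:])]


# Hypothetical interrupted-mating data (assumption, illustration only).
hypothetical_entry = {"marker_C": 25.0, "marker_A": 8.0, "marker_B": 17.0}

gene_order = order_by_entry_time(hypothetical_entry)
for name, minutes in gene_order:
    print(f"{name}: enters after {minutes} min")
for left, right, gap in adjacent_intervals(gene_order):
    print(f"{left} to {right}: {gap} min apart")
```

Repeating such a measurement with Hfr strains whose integrated plasmid sits at different positions gives overlapping marker orders, which is the kind of evidence that supports a circular map.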
Shortly after antibiotics were introduced for treatment of infectious diseases and as a supplement in animal food, bacterial strains with multiple antibiotic resistance appeared (10). These strains contained extrachromosomal elements, conjugative plasmids, and resistance (R-) factors that carried the genetic information for the antibiotic-resistance phenotype (11). The resistance genes, in most cases, were parts of transposable elements, suggesting that the plasmids had acquired these genes only recently and in response to the environmental challenge imposed on their hosts by the antibiotics. The spread of antibiotic resistance might serve as an example of the potential of prokaryotes to adapt rapidly to environmental changes. Other examples of genes that are located on plasmids and that might help a host to exist under special conditions are virulence genes, genes for the utilization of certain carbohydrates, and genes for the biodegradation of aromatic hydrocarbons (12). Recombination and transposition events may also result in incorporation of plasmid-encoded genes into the bacterial genome. Thus, plasmids and their exchange by conjugation play a prominent role in the evolution of bacterial species (12-14).
A. Model Systems for Studying Bacterial Conjugation
Two conjugative plasmid systems have been studied in detail: the F-plasmid of incompatibility group (Inc) FI (IncFI) and the antibiotic resistance transfer factor RP4 (IncPα). RP4 is considered to be identical to the plasmids R18, R68, RK2, and RP1 that, due to the geographic location of their place of isolation, are designated as Birmingham plasmids (15). Other systems under investigation are the non-self-transmissible IncQ plasmids (R1162, RSF1010) (16, 17). These plasmids can, however, be mobilized in the presence of certain conjugative plasmids that provide the functions required for DNA transport across bacterial cell membranes, which are not encoded by IncQ plasmids. IncP plasmids are of particular interest because they are broad host-range plasmids, capable of transfer between and stable inheritance in a wide variety of Gram-negative bacterial species. The broad host-range character includes the replication, maintenance, and transfer properties of the respective plasmid. In contrast, the host range of the vegetative replication machinery of IncF plasmids in general is confined to the Enterobacteriaceae. However, the host range of the transfer apparatus in both systems is considerably greater, illustrated impressively by the fact that both IncP and IncF plasmids can direct the transfer of DNA to yeast (6). Nevertheless, there could be some kind of specific interaction with potential recipient cells: the efficiency of DNA transfer varies considerably depending on the type of organism that functions as mating partner (6) and on the type of plasmid that directs the interkingdom DNA transfer (B. M. Wilkins, personal communication). Apart from the possibility that host-specific DNA restriction systems are involved, this could be due either to varying efficiency of interaction of the mating aggregate formation (Mpf) system with structures on the recipient's cell surface, or to the inability of the donor to penetrate the cell wall of the recipient for DNA passage. The latter possibility seems especially reasonable when phylogenetically remote organisms such as yeast are involved. The enzymology of transfer DNA replication has been studied in all three systems. Reconstitution studies using purified plasmid-encoded transfer gene products allowed mimicking the initiation reaction in vitro (18-20).
B. Steps in Bacterial Conjugation
Bacterial conjugation is a replicative process, during which a DNA molecule is transferred unidirectionally from a donor to a recipient cell. The transfer reaction requires physical contact between donor and recipient cells, possibly initiated by cellular appendages of the donor, called conjugative or sex pili. The current model of bacterial conjugation (Fig. 1) for historical reasons is based primarily on observations with the F-system (21-23). According to that model, pilus retraction leads to intimate cell-cell contact, and a mating bridge between the cells is formed that allows DNA passage. Establishment of stable cell-cell contact is postulated to create a trigger signal that is transmitted to a specialized protein-DNA complex, the “relaxosome” (19, 20). Relaxosomes are the initiation complexes of conjugative transfer DNA replication that form at an origin of transfer (oriT). One relaxosomal component, the “relaxase,” is a DNA strand transferase that, on receiving the mating signal, cleaves the DNA at the nick site of oriT and attaches covalently to the 5' terminus. Rolling-circle-like replication is thought to create the DNA single strand destined for transfer. DNA transfer proceeds with 5' to 3' polarity. During transfer, the 5'-attached relaxase is thought to remain associated within the DNA transport channel, scanning the incoming DNA for the reconstituted nick site (24). When the reconstituted nick site passes the relaxase, a second strand transfer reaction takes place, recircularizing the transferred DNA single strand (19). Discontinuous complementary strand synthesis is initiated either by host-encoded priming mechanisms or by plasmid-encoded DNA primases that may enter the recipient cell noncovalently attached to the imported DNA single strand. Complementary strand synthesis in the donor and recipient cells, supercoiling of the covalently closed plasmids, and active dissociation of the mating partners complete the conjugative process.
FIG. 1. Mechanistic dissection of bacterial conjugation. Steps shown: pilus attachment and pilus retraction; plasmid cleavage at the nick site; ssDNA transfer; recircularization; complementary strand synthesis; DNA supercoiling. See Section I,B for details.
II. Organization of IncP Transfer Regions
A. Clustering of Transfer Functions
Transfer functions of IncP plasmids map in two regions, transfer regions 1 and 2 (Tra1 and Tra2). In most natural isolates of IncPα or β plasmids, these regions are separated by insertion elements and/or antibiotic resistance genes (25-27). It is likely that the IncP backbone sequences at the junction of Tra1 and Tra2 provide a general hot spot for illegitimate recombination. The reason for this could lie in the structure of the region within the context of the whole plasmid; alternatively, the functions located there may not be very important for plasmid propagation or transfer. Thus, disruption of genes by insertion elements in this region would not result in an evolutionary disadvantage for the plasmid. Functions within the IncP Tra regions appear to be highly clustered according to their role (Fig. 2): the core of Tra2 encodes exclusively gene products involved in mating aggregate formation and entry exclusion; Tra1 (15 loci, 3 operons) specifies the gene products required for DNA processing, i.e., relaxosome proteins, a DNA topoisomerase, and a DNA primase (26, 28). Only one gene within Tra1 (traF) seems to be devoted to mating aggregate formation (29, 30). Another Tra1 gene (traG) encodes a product that probably provides a link between the IncP Mpf and Dtr systems (29, 31).
FIG. 2. Genetic maps of RP4 transfer regions, showing the Tra2 core and the Tra1 core (primase, relaxase, and leader operons). Regulatory circuits are indicated by vertical arrows. Transcripts are represented by horizontal arrows. Genes are drawn as boxes and arrowheads mark their 5' ends. P, Promoters; T, ρ-independent transcriptional terminators. The scale (in kb) refers to the standard RP4 coordinates (25). Genes belonging to Dtr or Mpf are shown in dark or light gray, respectively. Regulatory genes are hatched. See Section II,A,B for details.
Tra2 consists of a single operon containing 19 genes (trbA-P, upf31.7, fiwA, upf32.8) (Fig. 2). Transcription of this operon initiates at two possible promoter sites, PtrbA and PtrbB, resulting in polycistronic messengers with maximal sizes of 15.5 and 14.6 kb, respectively (32-34). Transcription may terminate at two terminator/attenuator sequences, one located in trbP and one in the intergenic region between Tra2 and the Par/Mrs region (TRL335) (25). The organization of Tra1 is more complex: this Tra region contains the origin of transfer (oriT), the site where the relaxosome assembles and transfer DNA replication initiates (35). The transfer origin is an intergenic region containing a divergent back-to-back promoter arrangement (26, 36). The promoters within oriT, PtraK and PtraJ, drive transcription of two adjacent operons, the leader operon and the relaxase operon, each containing three genes, traK-M and traH-J, respectively (26). Transcription that initiates at PtraJ might continue into the primase operon containing the genes traA-traG, thus resulting in a polycistronic messenger with a maximal size of 11.7 kb. In the 3'-terminal region of the traE gene, an additional promoter, PtraG, was localized. This promoter might serve to enhance transcription of the primase operon and to provide a mode of regulating gene expression that is independent of that of PtraJ (see Section II,B,2). Termination of transcription takes place at bidirectional terminators at the boundaries of Tra1 (25). Both terminator sequences are located in short intergenic regions downstream of the genes traA (TRL396) and traM (TRL53). The Tra1 region contains two examples of overlapping genes: traC, the structural gene for the DNA primase, encodes two different gene products, TraC1 and TraC2, that result from an in-phase overlapping gene arrangement (28, 37). The smaller one, TraC2, is produced from an internal in-frame initiation codon within traC. The second example is the traH gene, which overlaps along its whole length part of the 3'-terminal region of traI in an out-of-phase arrangement (Fig. 2) (26, 38).
B. Regulation of Transfer Gene Expression
Although the conjugative transfer machinery of IncP plasmids appears to be expressed constitutively, some gene regulation mechanisms must exist to facilitate balanced expression of the large number of transfer loci (33) and to coordinate conjugative transfer functions with vegetative replication and maintenance of the plasmid. IncP plasmids encode a whole collection of regulatory networks that can be classified into three major groups: (1) global networks, consisting of an effector protein and its binding sites, which are spread over the whole plasmid genome and affect different operons; (2) local networks, consisting of an effector that has only one or a few binding sites that are usually in the vicinity of the effector gene (in most cases these are autoregulatory circuits); and (3) gene regulation at the level of translation. Transfer genes can be subject to all these classes of control (25).
1. GLOBAL REGULATION
Three different global regulators are involved in IncP transfer gene regulation: KorA, KorB, and TrbA. Whereas TrbA seems to control only the expression of Tra functions, KorA and KorB are encoded by the IncP central control operon, which coordinates transcription of various operons scattered over the genome and is involved in different functions, such as vegetative plasmid replication, plasmid transfer, and maintenance (39-43). KorA is a typical dimeric repressor protein with a clear helix-turn-helix motif in its amino-acid sequence. The protein recognizes the sequence TTTAGCTAAA, which exists seven times in the IncPα genome (25). Interestingly, the C-terminal halves of KorA and TrbA show significant sequence similarity (33). Possibly, these genes evolved after a gene duplication event. Although the regulation of Tra functions by KorA is indirect, it seems to be of central importance for the expression of Tra2 genes. The promoters PtrfA and PtrbA form a face-to-face arrangement in the intergenic region between the IncP ssb gene and trbA (PtrfA) and in the 5'-terminal region of the ssb structural gene (PtrbA). Binding of KorA to its recognition site at PtrfA stimulates PtrbA (43). Most probably, both promoters compete for RNA polymerase. When separated from each other, PtrfA is 50-fold more active than PtrbA. In the native arrangement, the presence of PtrfA results in a further reduction of PtrbA activity by a factor of 20. Repression of PtrfA by KorA could enhance the availability of RNA polymerase for PtrbA. However, this explanation might not be sufficient, because the effect is unique to KorA; other repressors that reduce transcription from PtrfA (KorB, KorF, KorG, TrbA) have no effect on PtrbA activity (43). It has been suggested that KorA acts on PtrfA and PtrbA as a switch: low KorA levels allow expression of trfA, promoting vegetative plasmid replication and reducing expression of Tra2 genes; high levels of KorA repress trfA expression and stimulate transcription of the Tra2 region, promoting the conjugative spread of the plasmid (43).
The second global regulator that is thought to control tra gene expression is KorB (Fig. 2). KorB is an acidic protein that in solution exists as a dimer or tetramer (39-41). KorB binds to 12 sites on the IncPα genome, named OB, which have the consensus sequence TTTAGCSGCTAAA. Although some of these sites are clearly related to regulation of gene expression, others are not associated with promoter sequences and occur even within the reading frames of structural genes. The latter type of OB sites might play a role in the plasmid's structural organization, i.e., folding or pairing of the plasmid genome (39). However, with lacZ as a reporter gene, a moderate repression of gene expression (twofold) has been found when KorB binds to an OB site within lacZ separated by 197 bp from a constitutive phage P1 ban promoter (44). OB occurs six times within the IncPα Tra regions. OB sites are associated with the promoters PtrfA and PtrbB and are located within the reading frames of trbJ, trbO, traF, and traX (Fig. 2). All OB sites within the IncPα Tra regions are conserved at equivalent positions within the Tra regions of the IncPβ plasmid R751, confirming their importance for the respective plasmids (26; C. M. Thomas, personal communication). Repression of the promoters PtrfA and PtrbB by KorB has been demonstrated experimentally (34, 45). Interestingly, the OB site that seems to be involved in PtrbB repression is separated from PtrbB by almost 200 bp. It has been speculated that this long-range effect on PtrbB might result from loop formation with an additional degenerate OB site within the PtrbB sequence (34). Another interesting situation exists in the relaxase operon of Tra1: traX is involved in regulation of traI expression at the translational level (see Section II,B,3). The OB site within the traX reading frame could provide an alternative pathway for fine-tuning of TraI expression at the transcriptional level (25).
The trbA gene is the first gene of the Tra2 region (32, 33, 46). The protein functions as a repressor for the promoters PtrbB, PtraG, PtraJ, and PtraK. Thus, TrbA might provide a means of coordinating the expression of genes in Tra1 and Tra2. Although the sequence of the TrbA target site has not yet been defined experimentally, a careful inspection of the nucleotide sequences of promoters known to be regulated by TrbA revealed a common feature: the consensus sequence CNGTATATC overlaps the promoters PtrbB (-10 region), PtraG (-10 and -35 region), PtraJ (-35 region), and PtraK (-35 region). Moreover, this sequence occurs only six times on RP4; the only case where it is not associated with a TrbA-regulated promoter is within the tnpA sequence of Tn1 (our unpublished observation).
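Consensus sequences such as the KorB operator (read here as TTTAGCSGCTAAA, where S stands for G or C) and the putative TrbA target CNGTATATC (N stands for any base) are written with IUPAC ambiguity codes. Counting their occurrences on a plasmid amounts to a pattern search on both strands. The sketch below shows one way to do this with Python's standard `re` module; the function names and the short test sequences are hypothetical, and the actual RP4 sequence is not included.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern: str) -> str:
    """Translate an IUPAC consensus into a regular expression."""
    return "".join(IUPAC[base] for base in pattern.upper())


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, pattern: str) -> list[tuple[int, str]]:
    """Return (0-based start on the top strand, strand) for every match.

    A lookahead is used so that overlapping matches are also reported.
    Palindromic sites are reported once per strand.
    """
    seq = seq.upper()
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    width = len(pattern)
    hits = [(m.start(), "+") for m in regex.finditer(seq)]
    bottom = reverse_complement(seq)
    hits += [(len(seq) - m.start() - width, "-") for m in regex.finditer(bottom)]
    return sorted(set(hits))


# Hypothetical test sequences (assumptions, not taken from RP4).
print(find_sites("GGTTTAGCGGCTAAACC", "TTTAGCSGCTAAA"))
print(find_sites("AACAGTATATCTT", "CNGTATATC"))
```

Because the KorB consensus is palindromic, each site is found on both strands at the same coordinate; a count of distinct sites should therefore collapse the two strands, whereas the asymmetric CNGTATATC motif also carries orientation information.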
2. LOCAL REGULATION
Besides the global regulation mechanisms that seem to ensure balanced expression of tra genes, local regulation circuits exist on IncP plasmids. These local circuits provide a means for autoregulation of relaxosomal components, ensuring that enough relaxosome proteins are produced without overburdening the host. Two promoters in the Tra1 region are locally regulated: the oriT promoters PtraJ and PtraK. The TraK protein, a relaxosome component (47, 48), confers the strongest effects on both oriT promoters: PtraJ and PtraK are repressed by a factor of 30 in the presence of TraK (42). The protein winds a 180-bp region of oriT around a core of 15-20 TraK subunits (see Sections IV,A and IV,C,2). Because this region includes both
PtraJ and PtraK, repression is most probably due to exclusion of RNA polymerase from the oriT·TraK nucleoprotein complex. As a consequence, repression of the oriT promoters by TraK results not only in an autoregulatory circuit but also in down-regulation of the relaxase operon (40, 42). The second protein that has a local regulatory effect on tra gene expression is TraJ. TraJ forms another type of nucleoprotein complex with the transfer origin, binding to a 10-bp imperfect palindrome located within the right part of a 38-bp inverted repeat sequence (49). Formation of this nucleoprotein complex is the initial step in relaxosome formation (see Section IV,A). The relaxosome assembles in the intergenic space between the traJ 5' end and the +1 position of PtraJ. Thus, it is conceivable that the presence of the relaxosome or even TraJ alone at oriT results in premature termination of transcription from PtraJ. The result is an autoregulatory circuit that attenuates formation of the relaxosome components TraH, TraI, and TraJ when a relaxosome is present at oriT. Repression of PtraJ by TraJ alone is not as strong as by TraK; a fivefold reduction of transcription activity was measured (42).
3. REGULATION AT THE LEVEL OF TRANSLATION
Another means to achieve balanced expression of components of the conjugative transfer apparatus exists at the level of translation. Many genes in the Tra regions are coupled translationally (25). This mode of regulation is used extensively by IncP plasmids, probably because clustering of genes according to their function favors its development. Five examples of translational coupling in the Tra regions have been demonstrated experimentally: the gene pairs trbB/trbC, trbI/trbJ, traG/traF, traX/traI, and traL/traM (25, 30, 50, 51). A special situation exists in the case of traX/traI. traX codes for a peptide of only 13 amino-acid residues. The termination and initiation codons of traX and traI overlap, and the ribosome binding site (RBS) of traI is located within traX. The mRNA in the traX region has the potential to form a stable secondary structure that masks the RBS of traI as part of a stem in a hairpin structure (51). Thus, it is conceivable that the hairpin structure is opened, and the traI RBS becomes accessible to ribosomes, only when traX is translated. Additionally, the initiation codon of traX overlaps with the termination codon of traJ. Therefore, it is possible that traJ and traX are also coupled translationally. The role of the traX leader gene might consist in functioning as a moderator for adjusting the relative amounts of TraJ and TraI. The presence of an operator site for the KorB protein in the traX reading frame might provide an additional tool for the transfer machinery to respond to different conditions. It is remarkable that TraI has a very low copy number (fewer than 5 per cell) (52).
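Overlapping termination and initiation codons of the kind described for traJ/traX and traX/traI leave a recognizable footprint in the nucleotide sequence. The following Python sketch screens a sequence for three common arrangements. It is an illustration only: the motif list, the function name, and the test sequence are assumptions made here, not data from RP4, and the screen checks motifs only, not reading frames.

```python
import re

# Hypothetical motif set: arrangements in which a stop codon and an ATG
# start codon share nucleotides.
#   ATGA  : ATG start and TGA stop share two nucleotides (TG)
#   TAATG : TAA stop and ATG start share one nucleotide (A)
#   TGATG : TGA stop and ATG start share one nucleotide (A)
OVERLAP_MOTIFS = ("ATGA", "TAATG", "TGATG")


def find_stop_start_overlaps(seq):
    """Return (position, motif) pairs, 0-based, for every motif occurrence.

    Only the motif is checked; whether the codons lie in the reading
    frames of two annotated genes must be verified separately.
    """
    seq = seq.upper()
    # Lookahead so that overlapping occurrences are not skipped.
    pattern = re.compile("(?=(" + "|".join(OVERLAP_MOTIFS) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


# Made-up sequence containing one ATGA and one TAATG arrangement.
toy = "CCATGAGGTAATGCC"
print(find_stop_start_overlaps(toy))
```

A hit is merely a candidate for translational coupling; experimental evidence of the type cited in the text is needed to establish that the two genes are in fact coupled.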
III. Mating Aggregate Formation
The initial step in bacterial conjugation consists in establishing cell-cell contact stable enough to allow passage of DNA molecules through the cellular membrane barriers (Fig. 1). This process has been named “mating aggregate formation” (Mpf). Transport of the DNA single strand through the membranes of the donor and recipient cell requires a hydrophilic channel or pore spanning the inner and outer membranes of the mating partners. Finally, the transport process must be energized, e.g., by hydrolysis of nucleoside triphosphates. Most of the components encoded by the IncP transfer regions are devoted to providing these functions.
A. Definition of a Core System for Conjugative Transfer
To classify IncP tra genes as belonging to Mpf or Dtr, and to define a core system consisting only of essential components, a deletion analysis of the Tra regions has been done. Initially, Tra1 and Tra2 were separated by molecular cloning in two compatible vector plasmids (29, 32). Defined deletions were created using suitable restriction endonucleases, and the resulting phenotypes were analyzed. After Tra1 and Tra2 had been narrowed down to the core regions containing a minimal number of functions required for efficient mobilization of the oriT-containing Tra1 plasmid, Tra1-core and Tra2-core were inserted as gene cassettes into a ColD replicon-based vector plasmid (31). In the resulting plasmid construct, the Tra2-core region is under control of the tac promoter. The vector part of the plasmid carries the lacI repressor gene, thus allowing control of Tra2 gene expression by addition of IPTG to the culture medium. The Tra1 genes are under control of their native promoters (32). The core system (either the reconstituted one-plasmid system or the two-plasmid system) was the basis for a linker insertion analysis of single tra or trb genes. Multiple reading frame insertion (Murfi) linkers containing three termination codons, one for each reading frame, were inserted using suitable restriction sites within the genes. Where these were lacking, restriction endonuclease sites were created by site-directed mutagenesis. Nonpolarity of the insertions was checked by complementation of the inactivated genes in trans (50). Several phenotypes are available to monitor the effects of the deletions and linker insertion mutations.
1. Conjugative DNA transfer. This process requires the complete set of tra genes and the IncP transfer origin.
2. Mobilization of the IncQ plasmid RSF1010. The plasmid relies on the IncP Mpf machinery for its mobilization. It encodes its own relaxosomal components and therefore is independent of the IncP Dtr system.
3. Propagation of the donor-specific bacteriophages PRD1, Pf3, and PRR1. These bacteriophages use as a receptor a structure that forms at the surface of IncP donor cells.
4. Production of filamentous cell appendices (“pili”) that might be required to overcome the like surface potentials of donor and recipient cells in ionic environments and so allow the mating partners to adhere. The filaments therefore are considered as a part of the Mpf system (29, 32, 50).
The Tra core system has been defined for intraspecific E. coli matings. It consists of 20 components: the Tra2 loci trbB-trbL, the Tra1 genes traF-traM, and oriT (Fig. 3). Two genes, traL and traM, are not strictly essential; however, the transfer frequency drops by 2-3 orders of magnitude when these genes are absent, and they are therefore considered as belonging to the Tra core (29). Another nonessential Tra core gene is trbK. The trbK gene encodes the entry exclusion function of IncP plasmids. Although it is not required for self-transfer of IncP plasmids, trbK is indispensable for assembly of IncP-type pili and is thus classified as an Mpf gene (50). The genes traA-traE of the Tra1 region are not required for intraspecific E. coli matings (29). However, two of these are known to code for enzymes: traC specifies a DNA primase (see Section IV,E) and traE an analog of E. coli topoisomerase III (R. J. DiGate, personal communication). The amino-acid sequences of TraE and TopB (topoisomerase III) are quite similar: 40% of the amino acids at equivalent positions are identical and 57% are functionally equivalent. Evidently, TraE is replaceable by a chromosomally encoded protein: the high degree of similarity between TraE and TopB suggests that it is indeed E. coli topoisomerase III that can substitute functionally for TraE.
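Percentages of identical and functionally equivalent residues, such as those quoted for TraE and TopB, are derived from a pairwise alignment. The Python sketch below shows one way such figures could be computed from two already aligned sequences. The grouping of amino acids into equivalence classes is a hypothetical choice made here for illustration (the scheme used for the published comparison is not stated in the text), identical residues are counted as equivalent as well, and the two short sequences are invented.

```python
# Hypothetical equivalence classes; the classification behind the
# published 57% figure is not given in the text.
GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KRH", "G", "P"]
CLASS_OF = {aa: index for index, group in enumerate(GROUPS) for aa in group}


def identity_and_similarity(aligned_a, aligned_b):
    """Return (% identical, % equivalent) over ungapped alignment columns.

    Both arguments must be aligned sequences of equal length in which
    '-' marks a gap. Identical residues also count as equivalent.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = equivalent = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # columns containing a gap are not compared
        compared += 1
        if x == y:
            identical += 1
            equivalent += 1
        elif x in CLASS_OF and CLASS_OF[x] == CLASS_OF.get(y):
            equivalent += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * equivalent / compared


# Invented example: 6 ungapped columns, 3 identical, 5 equivalent.
ident, equiv = identity_and_similarity("MKDE-LV", "MRDQALI")
print(f"{ident:.1f}% identical, {equiv:.1f}% equivalent")
```

The resulting numbers depend on the alignment and on the equivalence classes chosen, so values obtained with this sketch need not reproduce published figures.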
The role of a topoisomerase III analog in the conjugation process can only be speculated upon. Topoisomerase III, in contrast to topoisomerase I, has the ability to decatenate replication intermediates and to substitute for DNA gyrase in nascent chain elongation during θ-type replication (53). It is quite possible that these activities play an important role in converting the relaxosome to the conjugative rolling-circle-type replication intermediate or in sustaining the elongation step of transfer DNA replication. Mobilization of RSF1010 depends on the IncP Mpf system (54). Thus, RSF1010 mobilization provides a tool for identifying and characterizing the IncP conjugative DNA transport machinery. Mobilization of RSF1010, as expected, requires fewer components than conjugative self-transfer: the Tra2 genes trbB-trbL and the Tra1 genes traF and traG (29). One Tra2 gene, trbF, is not essential; however, its inactivation results in a severe reduction
[Figure 3: gene map of the Tra2 core (trbB, trbC, trbD, trbE, trbF, trbG, trbH, trbI, trbJ, trbK, trbL) and the Tra1 core (traF, traG, the relaxase genes traH and traI, traJ, traK, the leader gene traX, traL, traM), with bars for the gene sets required for pili synthesis (Dps), mobilization of IncQ plasmids (Mob), and self-transfer (Tra); classification as Mpf or Dtr; similarity to the Ti plasmid regions VirB, VirD, and VirC; scale in kb, 20.0-28.0 and 46.0-52.0.]
FIG. 3. Classification of IncPα transfer functions. Genes are drawn as boxes and arrowheads mark their 5' ends. Genes belonging to Dtr or Mpf are shown in dark or light gray, respectively. Nonessential genes are marked by hatching. Hatched bars mark sets of genes that are required for the functions listed on the left-hand side. Horizontal lines indicate the extension of regions that are applied for classification of genes. Mpf, Mating aggregate formation; Dtr, DNA processing functions. The scale refers to the standard RP4 coordinates (25).
of the mobilization frequency (3 orders of magnitude) (50). It is known from biochemical experiments with purified components that most of the Tra1 core gene products that are not required for RSF1010 transfer are involved in RP4 relaxosome formation (traH-traK; see Section IV). The remaining nonessential functions traL and traM could play accessory roles in the RP4 relaxosome assembly process. Because traL and traM have no effect on RSF1010 mobilization, analogous functions should be encoded by the IncQ mobilization genes. Thus, the contiguous Tra1 gene cluster traH-traM is considered as belonging to the Dtr system of IncP plasmids.
B. A Single Membrane Protein Complex Functions as Phage Receptor and Sustains Pilus Assembly and Conjugative DNA Transport
1. DONOR-SPECIFIC PHAGE PROPAGATION
Propagation of the donor-specific phage PRD1 requires the genes traF and trbB-trbL (29, 50). Thus, the whole set of Mpf functions is required to display the phage receptor on the bacterial cell surface, indicating that each of the corresponding 12 gene products is involved either as a structural component or as an accessory factor that processes and positions structural components of a larger membrane structure. One gene displays a more differentiated phenotype: mutants in trbK still allow attachment of PRD1; however, the phage cannot propagate, suggesting that injection of the phage DNA into the cell is blocked at some stage.
2. PILUS ASSEMBLY
The same set of genes required for phage reproduction, traF and trbB-trbL, is also required for pilus assembly (50). This indicates that the same structure that functions as receptor for phage PRD1 is also responsible for processing, transport, and assembly of the pilin subunits into extracellular filaments (Fig. 4). Notably, the filamentous structures observed under Mpf overexpression conditions are morphologically identical to those described by Bradley (55).
3. CONJUGATIVE DNA TRANSPORT
Finally, the DNA must also use the same transmembrane structure for its passage through the membranes of the donor: all the genes required for phage propagation or pilus assembly are also required for DNA transfer (plasmid mobilization and self-transfer). The only exception is TrbK, the entry exclusion factor, which is dispensable for DNA transfer but is required for production of extended pilus structures. This finding, in turn, leads to the conclusion that, in the IncP system, an extended pilus is in fact
FIG. 4. Electron microscopy of IncP pili. Cells containing the reconstituted IncP transfer system were mounted as described (50). Bar = 500 nm.
not required for DNA transfer. Of course, this leaves open the possibility that, in the trbK mutant, remnants of the extracellular filaments still exist but escape detection by electron microscopy (50). Furthermore, the pilus-like structures might be required only under natural conditions, for instance in a liquid environment. Under laboratory mating conditions, donor and recipient cells are densely packed on a semisolid agar surface or on a membrane filter. Although such conditions conceivably could also occur in nature (e.g., in biofilms), the pilus-like filaments could be required to make the cells adhere under less favorable conditions.
C. TraG-like Proteins
Remarkably, TraG is the only component that is required for RSF1010 mobilization but for neither pilus assembly nor phage PRD1 propagation (29, 30, 50). Therefore, TraG is likely to be involved in the DNA transport process or in linking the relaxosome to the transmembrane structure encoded by Mpf. Because the TraG primary structure contains nucleotide binding motifs of type A and B (56), the protein is also a reasonably good candidate for providing motive force for transport of the DNA single strand across the bacterial membranes (26, 57). TraG could act as a specialized DNA helicase separating the T- and the R-strand during rolling-circle-type transfer DNA replication. The importance of amino-acid residues in the nucleotide binding motifs for the DNA transfer process has been demonstrated by site-directed mutagenesis: mutants in each of the motifs A or B were shown to be inactive in plasmid self-transfer and RSF1010 mobilization (31). Analogs of TraG exist in all conjugative DNA transfer systems studied so far, even in conjugative plasmids of Gram-positive bacteria (e.g., TrsK of pGO1) (31). An interesting example is the colicinogenic plasmid CloDF13. Although the plasmid is not self-transmissible, it is mobilized efficiently by IncF and IncW plasmids (58; F. de la Cruz, personal communication). The mobilization by F (IncFI) or R388 (IncW) is independent of the RP4 TraG-like proteins TraD (IncF) and TrwB (IncW), encoded by the respective plasmids. Indeed, CloDF13 specifies its own RP4 TraG analog: sequence alignment identified the CloDF13 MobB protein as TraG-like (31). TraD (IncF) has also been proposed to be involved in the DNA transport step of conjugation (59). Analogous to RP4 TraG, TraD is essential for self-transfer of F, whereas the requirement of the helicase domain of the F plasmid-encoded TraI protein for self-transfer of F remains to be demonstrated (58). The IncW trwB gene cannot be complemented by its IncPα analog traG in the R388 self-transfer process (60). However, the IncQ plasmid RSF1010 is efficiently mobilized by a transfer machinery consisting of the IncW Mpf system and IncPα TraG, indicating that IncQ but not IncW relaxosomes interact specifically with IncPα TraG.
On the other hand, the Mpf machinery of IncW plasmids evidently interacts both with IncPα TraG and, of course, with IncW TrwB (60). Thus, two types of specificity exist in TraG-like proteins: a more stringent interaction with the relaxosome and a less stringent one with the Mpf system. The Ti-plasmid-encoded TraG-like protein, VirD4, was localized at the cytoplasmic surface of the inner membrane (61), corroborating the hypothesis that TraG-like proteins provide a connection between the relaxosome and the DNA transport structure in the bacterial membrane. VirD4 is also required for the VirB-mediated interbacterial mobilization of the IncQ plasmid RSF1010 (62), demonstrating that the T-DNA transfer machinery can also be used to transmit DNA among bacteria.
D. The IncP Entry Exclusion Function Is Specified by trbK
Cells that harbor an IncP plasmid are poor recipients in matings with other IncP-type donor cells (21, 63). This phenomenon is designated as entry
exclusion (Eex). Typically, the transfer frequency with an IncP plasmid-carrying recipient is lower by a factor of 50-100 compared to a plasmid-free cell. The entry exclusion function of IncPα plasmids is determined by the trbK gene. In contrast to statements in earlier reports (64, 65), trbK is the only function that is necessary and sufficient to express the IncPα entry exclusion phenotype. The apparent involvement of the preceding gene, trbJ, in entry exclusion is probably due to translational coupling between the two genes. Expression of trbJ and trbK from separate plasmids in trans revealed that trbK alone is sufficient to produce the Eex phenotype, and that trbJ neither gives an entry exclusion phenotype by itself nor stimulates the function of the trbK gene product. Thus, the Eex function of IncP plasmids is specified by a single gene (50). The entry exclusion systems of IncN and IncW plasmids apparently belong to the same class: a single gene is sufficient to produce the Eex phenotype. Moreover, significant sequence similarity has been demonstrated between TrbK (IncPα) and Eex (IncN) (66). However, these systems are different from those of IncFI and IncI plasmids, which encode two-component systems. The F-encoded gene products operate through quite unique mechanisms and therefore their contributions are synergistic. Cells carrying the F plasmid typically show a 100- to 300-fold reduction in their ability to act as recipients in F+ × F+ matings relative to an F- cell. Two plasmid-borne genes, traS and traT, are responsible for the Eex phenotype. The product of traS (16.9 kDa) is an inner membrane protein that is thought to act by inhibiting the triggering of conjugative DNA replication. The traT gene product (23.8 kDa), a lipoprotein located at an exposed site in the outer membrane, blocks conjugation at an earlier stage, before the cells have formed stable mating aggregates.
Two models of TraT action are presently discussed: TraT could block a specific site on the major outer membrane protein OmpA that otherwise would be recognized by the pilus tip of a potential donor to initiate the mating process. Alternatively, TraT could interact with the pilus tip, thereby preventing the normal mating contact (67).
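The strength of entry exclusion quoted above (a factor of 50-100 for IncP, 100- to 300-fold for F) is a ratio of two transfer frequencies. A minimal Python sketch of this arithmetic follows; the function names and the colony counts are hypothetical, and the transfer frequency is assumed here to be expressed as transconjugants per donor cell.

```python
def transfer_frequency(transconjugants, donors):
    """Transfer frequency, assumed here to be transconjugants per donor cell."""
    if donors <= 0:
        raise ValueError("number of donor cells must be positive")
    return transconjugants / donors


def exclusion_index(freq_plasmid_free, freq_plasmid_carrying):
    """Factor by which transfer is reduced when the recipient carries the plasmid."""
    if freq_plasmid_carrying <= 0:
        raise ValueError("transfer frequency must be positive")
    return freq_plasmid_free / freq_plasmid_carrying


# Invented counts for two matings with the same number of donor cells.
freq_free = transfer_frequency(5e7, 1e8)      # plasmid-free recipient
freq_carrying = transfer_frequency(1e6, 1e8)  # plasmid-carrying recipient
print(f"exclusion index: {exclusion_index(freq_free, freq_carrying):.0f}")
```

With these invented counts the index is 50, the lower end of the range given for IncP plasmids.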
Interestingly, the trbK gene product predicted from the nucleotide sequence of the gene, like TraT of F (68), has a lipoprotein signature at its N terminus, suggesting that TrbK, like TraT of F, is exposed at the cell surface (46). Studies with site-directed mutations in trbK show that trbK mutants, although able to adsorb PRD1, cannot propagate the phage. This result leads to the somewhat paradoxical situation that trbK (1) independent of other transfer genes, specifies the entry exclusion function; (2) must interact with the IncP-pilus assembly machinery of Mpf because it is required for production of the filamentous appendices; (3) is not required for RP4 self-transfer despite the fact that cells devoid of TrbK do not produce visible pili (an
extended pilus apparently is not required for conjugative DNA transfer to take place); (4) is required for uptake of phage PRD1 DNA but not for adsorption of the phage. All Eex systems discussed here have in common that they are not required for plasmid self-transfer. TraS- and TraT- mutants of IncFI plasmids are transfer proficient (21) and so are Eex- and TrbK- mutants of IncN (66) and IncP plasmids (50), respectively. Therefore, Eex determinants in a strict sense are not transfer genes. The fact that Eex systems function independently of the DNA transfer machinery conforms with this classification. Therefore, Eex functions should be regarded as accessory functions that prevent unproductive mating. In some cases, however (e.g., IncP), coevolution of the Eex and Tra systems apparently results in a close association of Tra and Eex gene products. This could explain the effects of the trbK mutations on pilus formation and phage propagation.
E. Biochemical Analysis of Mpf Functions
Biochemical analysis of Mpf gene products requires their purification. Usually, protein purification procedures are facilitated by overexpression of the corresponding genes using suitable expression vectors. In fact, overproduction of Tra2 gene products has been achieved in most cases (50). The main problems in working with Mpf components, however, are caused by the fact that all except one (TrbB) are typical membrane proteins, being highly insoluble under native conditions. Therefore, two main approaches are currently applied to overcome these difficulties.
1. Creation of fusion proteins. It has been shown in several cases that N-terminal fusions of originally insoluble proteins with thioredoxin (Trx) are more soluble. Moreover, Trx derivatives containing histidine tags allow rather easy purification on Ni2+-chelate affinity columns. Even if the fusion proteins obtained by this approach are biologically inactive (this can be tested by complementation experiments), at least they can be used for raising antisera against Mpf components.
2. Purification of the protein under denaturing conditions, followed by renaturation and/or incorporation into reconstituted membrane vesicles. Without applying renaturation procedures, proteins obtained in this way have been used for raising antisera (50).
At least two Mpf gene products, TrbB and TrbE, are candidates for having enzymatic activities (70, 71). Both polypeptides contain consensus sequences for nucleotide-binding motifs (Table I) (72-75) and therefore are supposed to be involved either in active positioning of other Mpf components to assemble the DNA transport complex or in energizing the DNA transport process itself by NTP hydrolysis. Consistent with their membrane-protein character, several Mpf components have N-terminal protein export signals, i.e., cleavage sites for signal peptidase I (TraF, TrbC, TrbC, TrbJ, and TrbL). Moreover, two gene products (TrbH and TrbK) have lipoprotein signatures, i.e., cleavage sites for signal peptidase II. TrbD and TrbJ contain bacterial leucine-zipper motifs (46, 50). TrbB is the only Mpf component that can be overproduced and purified under native conditions. In solution, the protein exists as a hexamer; this has been verified by electron microscopy (G. Ziegelin, R. Lurz and E. Lanka, unpublished results). In vitro, the protein exhibits weak ATPase, protein kinase, and autophosphorylating activities. None of these activities is stimulated by the presence of DNA (double- or single-stranded) (E. Scherzinger and E. Lanka, unpublished results). Virtually identical enzymatic activities have been described for VirB11 (71). TrbB shows sequence similarity to gene products from several specialized protein export systems, including nonconjugative pilus assembly systems, the competence system of Bacillus subtilis for uptake of DNA from the environment, protein secretion systems, and toxin export systems (see Section V). These data suggest that TrbB most likely belongs to a class of NTP-binding proteins required for the assembly of trans-periplasmic pilus-like structures. These structures are supposed to exist in all the systems mentioned above. Localization studies with the TrbB analog of the Ti system (VirB11) suggest an association of the protein with the cytoplasmic membrane of A. tumefaciens (76, 77). Fractionation studies with E. coli cells harboring the complete IncP Mpf system support these observations, demonstrating an association of TrbB with the inner membrane (A. M. Grahn and D. H. Bamford, personal communication).
IV. DNA Processing Reactions
A. Relaxosome Assembly at the IncP Transfer Origin
The origin of transfer (oriT) is the site where transfer DNA replication is initiated by a site- and strand-specific cleavage event (19). Obviously, factors required to exert this reaction must not interfere with such plasmid maintenance functions as vegetative replication, partitioning, or the topological state of the DNA. Therefore, sophisticated regulation mechanisms are needed to ensure (1) site-specificity of cleavage and (2) precise timing of the initiation reaction to coordinate it with mating aggregate formation. These requirements are fulfilled by the relaxosome, a specialized high-precision
TABLE I
PHYSICAL PROPERTIES OF IncPα-ENCODED TRANSFER-RELATED GENE PRODUCTS
Designation | Number of residues | Molecular mass (Da) | Mr (× 10^3) | Net charge | Isoelectric point | Multimeric structure in solution | Features of amino-acid sequence | Proposed function | Ref.
FiwA | 231 | 25,590.03 | 28 | 14 | 11.44 | | | Inhibition of IncW plasmid fertility | 72
KorB | 358 | 39,010.79 | 53 | -21 | 4.59 | | Helix-turn-helix motif | Regulation of gene expression | 39-41, 45
TraA | 96 | 10,611.88 | 8.6 | 1 | 8.43 | | | - | 28
TraB | 146 | 15,844.07 | 16.5 | 7 | 10.55 | | | - | 28
TraC1 | 1061 | 116,721.49 | 118 | -29 | 5.72 | | | DNA primase, single-stranded DNA-binding protein | 28, 73
TraC2 | 746 | 81,647.41 | 80 | -19 | 5.59 | | | DNA primase | 28, 73
TraD | 87 | 9218.66 | | -18 | 3.53 | | | - |
TraE | 737 | 82,022.40 | | 7 | 8.22 | | Similar to E. coli topoisomerase III | DNA topoisomerase | R. J. DiGate, pers. comm.
TraF | 177 | 18,901.66 | 20 | 6 | 10.14 | | | Mating pair formation | 30
TraG | 635 | 69,857.45 | 72 | 8 | 9.45 | | | DNA transport during conjugation | 26, 31
TraH | 119 | 12,869.13 | 22 | -11 | 4.21 | | | Relaxosome stabilization | 26, 36, 74
TraI | 732 | 81,562.43 | 82 | 32 | 10.78 | | RCR-initiator signature | DNA relaxase | 26, 74
TraJ | 123 | 13,463.50 | 11 | 1 | 8.40 | | | oriT-recognizing protein | 49
TraK | 134 | 14,716.66 | 17 | 8 | 10.65 | | | oriT-binding protein | 26
TraL | 241 | 26,566.32 | 26 | -6 | 5.40 | | ATP/GTP binding site motif A (P-loop) | | 26, 36
TraM | 145 | 15,562.88 | 14 | -3 | 5.78 | | | | 26
TraX | 13 | 1241.47 | | 1 | 8.43 | | | Translational regulation of Tra gene expression | 25, 51
TrbA | 103 | 11,307.92 | 12 | 1 | 9.14 | Dimer | Helix-turn-helix motif | Regulation of Tra gene expression | 33
TrbB | 319 | 35,027.15 | 36 | -4 | 6.69 | Hexamer | ATP/GTP binding site motif A (P-loop) | ATPase, autophosphorylase | 34, 46, 75
TrbC | 145 | 15,011.57 | 14.8 | 2 | 10.83 | | Export signal sequence | Pilin (?) | 34, 46
TrbD | 103 | 12,085.11 | | 12 | 11.77 | | Bacterial zipper | | 34, 46
TrbE | 852 | 94,361.45 | 90 | -9 | 6.53 | | ATP/GTP binding site motif A (P-loop) | | 46
TrbF | 252 | 27,404.16 | 31 | 5 | 10.23 | | - | Mating pair formation | 46
TrbG | 297 | 32,582.76 | 34 | -1 | 6.94 | | Export signal sequence | Mating pair formation | 46
TrbH | 160 | 16,941.21 | 20.2 | 4 | 10.00 | | Export signal sequence, lipoprotein signature | |
TrbI | 463 | 48,852.66 | 61.5 | 2 | 9.43 | | - | Mating pair formation | 46
TrbJ | 258 | 28,077.46 | 26 | 5 | 9.77 | | Export signal sequence | Mating pair formation | 46, 64
TrbK | 69 | 7333.45 | | 3 | 8.79 | | Export signal sequence, lipoprotein signature | Entry exclusion | 46, 64
TrbL | 528 | 52,182.68 | 58 | -4 | 4.91 | | Export signal sequence | | 46
TrbM | 199 | 22,155.54 | | 3 | 8.21 | | | | 46
TrbN | 234 | 25,228.41 | | 8 | 10.32 | | | | 46
TrbO | 87 | 9440.20 | | 6 | 11.22 | | | | 46
TrbP | 244 | 26,848.88 | | 9 | 10.70 | | | | 46
nucleoprotein complex that forms at oriT and that exists stably throughout the cell cycle. IncP relaxosomes consist of at least four plasmid-encoded gene products (TraH, TraI, TraJ, and TraK) and the supercoiled oriT DNA (Fig. 5) (52, 78). The IncP origin of transfer is located within an intergenic region of the Tra1 region (Fig. 6). The main features of the IncP oriTs are (1) a pair of divergent promoters in a back-to-back arrangement directing transcription of the relaxase and leader operons (see Section 1,A); (2) a set of inverted sequence repetitions functioning as recognition sites for relaxosome components or forming defined hairpin structures in the single-stranded transfer intermediate; the hairpin structures possibly represent signals for specific termination of transfer DNA replication that occurs after a plasmid unit length has been transmitted to a recipient cell (79); (3) an intrinsically bent DNA region specifically recognized by the TraK protein (srk) (47). The cleavage site (nic) is located eight nucleotides downstream from the right end of an imperfect 19-bp inverted repeat sequence (52, 80). Only the right part of this inverted repeat is required for relaxosome assembly: it contains the recognition site for the TraJ protein (srj), one of the specificity determinants, which forms the initial complex with the transfer origin. Nucleotides between nic and the inverted repeat form the recognition site for the TraI protein (sri), the IncP-encoded DNA relaxase that catalyzes the specific cleaving-joining reaction at nic (30, 81).
B. DNA Relaxases
1. INITIATION AND TERMINATION REACTIONS IN CONJUGATIVE DNA TRANSFER
Initiation of replication by a rolling-circle-type mechanism requires cleavage of one plasmid strand within the origin of replication. In conjugative DNA processing, this reaction is catalyzed by a DNA relaxase (26). The IncP-encoded relaxase (TraI) can in effect catalyze two different DNA cleaving-joining reactions: (1) cleaving-joining of a DNA single strand in a double-stranded, superhelical substrate containing at least srj and sri and (2) cleaving-joining of single-stranded DNA containing at least sri (Table II). Whereas the former reaction requires the presence of the oriT-binding protein TraJ as accessory factor, the latter requires no additional proteins (49, 52, 81). The only cofactor required for all types of relaxase-mediated cleaving-joining reactions is Mg2+ ions (Table II). Initiation of DNA processing in other conjugative systems follows a similar scheme: The systems that are closely related to the IncP system (R64, pTF-FC2, the T-DNA transfer systems of agrobacterial pTi and pRi plasmids) all use a two-component system for dsDNA cleavage-joining (20, 82).
FIG. 5. Model of the IncP relaxosome. See Section V,A for explanation.
[Figure 6: map of the IncP transfer origin showing the 5'-terminal regions of traJ (relaxase operon) and traK (leader operon), the nick site (nic), the direction of DNA transfer, a 100-bp scale bar, the binding sites srj (TraJ binding) and sri (TraI recognition), part of the oriT nucleotide sequence, and the regions involved in specific termination and in relaxosome formation/initiation.]
FIG. 6. Modular structure of the IncP transfer origin. Transcription of relaxase and leader operons initiating at divergent promoter sites within oriT is indicated by horizontal arrows. An inverted repeat sequence adjacent to the nick site (nic) is marked by bold horizontal arrows. Binding sites for transfer gene products (sri, srj, srk) are drawn as shaded bars. The 5'-terminal regions of the transfer genes traJ and traK are represented by open bars. Arrowheads show the 5' ends. Part of the nucleotide sequence of oriT is depicted below: inverted repeat sequences are indicated by horizontal arrows, dots mark deviations from the symmetry. Shaded regions within srj indicate nucleotides that, in the presence of TraJ, are protected against attack by hydroxyl radicals (49). Nucleotides recognized by TraI are drawn in white with a dark background. The position of the cleavage site is marked by a wedge.
TABLE II
PROPERTIES OF RELAXASE-MEDIATED DNA CLEAVING-JOINING REACTIONS

                                          Relaxase substrate
                                          dsDNA                            ssDNA
Substrate DNA                             Negative superhelical;           Single-stranded;
                                          sri, srj (srk)                   sri
Optimum conditions
  Mg2+                                    5 mM                             5 mM
  NaCl                                    50 mM                            50 mM
  pH                                      8.5                              8.5
Cofactor requirements
  TraI                                    +                                +
  TraJ                                    +                                not required
  TraK                                    (+)a                             not required
Fraction of cleaved DNA in equilibrium    0.9 (0.3 when TraK omitted)      0.3
Release of cleaved reaction products      Addition of protein denaturant   Spontaneous
                                          (SDS, proteinase K) required

a TraK is not essential in vitro; srk is not essential in vivo or in vitro. TraK and srk together increase the yield of cleaved dsDNA in vitro.
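The equilibrium fractions listed in Table II can be converted into apparent equilibrium constants, which makes the effect of TraK easier to compare. The following is only a rough sketch: it assumes a simple two-state equilibrium between closed and cleaved plasmid (an idealization, not a statement from the source), and the temperature of 37 degrees C is an assumed value that does not appear in the table. The function and variable names are illustrative.

```python
import math


def equilibrium_constant(fraction_cleaved: float) -> float:
    """Apparent K = [cleaved] / [closed] for an assumed two-state equilibrium."""
    if not 0.0 < fraction_cleaved < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return fraction_cleaved / (1.0 - fraction_cleaved)


# Fractions of cleaved dsDNA at equilibrium, taken from Table II.
k_with_trak = equilibrium_constant(0.9)     # TraI + TraJ + TraK
k_without_trak = equilibrium_constant(0.3)  # TraK omitted

# Factor by which TraK shifts the apparent equilibrium toward the cleaved form.
fold_shift = k_with_trak / k_without_trak

# Corresponding free-energy difference; the temperature is an assumption.
GAS_CONSTANT = 8.314          # J mol^-1 K^-1
ASSUMED_TEMPERATURE = 310.15  # K (37 degrees C), not stated in Table II
ddg_kj_per_mol = -GAS_CONSTANT * ASSUMED_TEMPERATURE * math.log(fold_shift) / 1000.0

print(f"K with TraK:    {k_with_trak:.2f}")
print(f"K without TraK: {k_without_trak:.2f}")
print(f"Fold shift:     {fold_shift:.1f}")
print(f"Apparent ddG:   {ddg_kj_per_mol:.2f} kJ/mol")
```

Under these assumptions, the change from 0.3 to 0.9 cleaved corresponds to a roughly 20-fold shift in the apparent equilibrium constant, which is a larger effect than the raw fractions suggest.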
For the Ti system, cleaving-joining of double-stranded T-border sequences by the combination VirD1/VirD2 has recently been demonstrated in vitro (83). Moreover, cleaving-joining of single-stranded DNA by VirD2 alone has also been shown (84). The relaxase of F (TraI) cleaves and joins double- and single-stranded DNA in vitro (85-87). In cleavage of dsDNA, the oriT-binding protein TraY and the host-encoded histone-like protein IHF are involved as cofactors (88). Another well-studied example is the MobA protein of the mobilizable plasmid RSF1010 (89, 90). However, in this case MobA alone is sufficient to carry out both dsDNA and ssDNA cleavage, indicating that MobA contains both the DNA double-strand and single-strand recognition domains in a single polypeptide chain. What is the biological relevance of these two reactions? Obviously, dsDNA cleavage is required in the initiation reaction. In contrast, further processing of the single-stranded transfer intermediate to terminate conjugative DNA replication involves cleaving and joining of a DNA single strand. Termination of rolling-circle-like transfer DNA replication is thought to include a so-called second cleavage reaction that occurs after a unit length of the DNA molecule to be transferred has entered the recipient cell. In this model, the joining reaction would then be used to recircularize the exported DNA single strand (19). A more detailed model of the termination reaction begins to emerge
from experiments using relaxase-oligonucleotide adducts immobilized via the oligonucleotide moiety to magnetic beads. Under the experimental conditions applied, only TraI monomers covalently attached to the 5’ terminus of the oligonucleotides were retained by the beads. These TraI monomers could not catalyze a second cleavage reaction with a second oligonucleotide containing sri and several nucleotides downstream of nic. In contrast to that result, another oligonucleotide containing sri but ending at the 3’-terminal nucleotide of nic was efficiently joined to the oligonucleotide moiety in the covalent relaxase adduct, demonstrating the biochemical activity of the immobilized TraI protein (our unpublished results). The inability of a TraI monomer to catalyze second cleavage can be interpreted in two ways.
1. Second cleavage is not necessary because there is no elongation at the nic 3' hydroxyl terminus and hence only a unit length of the plasmid is transmitted. Leading strand synthesis in the donor could initiate by a special priming event and the 3' hydroxyl at nic is protected in some way. In fact, this model has already been proposed for initiation of donor complementary strand synthesis during conjugative transfer of the F plasmid (91). In this model, termination would occur by a simple joining reaction catalyzed by the relaxase that transfers the covalently attached 5' end of the T-strand to the 3' terminus (Fig. 7, I).
2. Elongation at the 3' end takes place. Therefore, after a unit length of the plasmid has been transferred to the recipient, second cleavage must occur. Obviously, the relaxase subunit covalently linked to the T-strand's 5' terminus cannot catalyze this reaction, because its unique active-site tyrosine is already occupied by the attached DNA. For the bacteriophage φX174 gene A protein, a tandem arrangement of two active-site tyrosine residues alternates in cleaving and joining the (+)-strand during rolling-circle-type replication of the phage genome (Fig. 7, II) (92, 93). Such a mechanism seems unlikely for IncP-type relaxases: linkage between TraI and DNA was only detectable at tyrosine-22, and a tandem arrangement of tyrosine residues resembling that in φX174 gpA is not present in the TraI sequence. Finally, attempts to demonstrate second cleavage by a TraI monomer in vitro failed.
Plasmids of Gram-positive bacteria that undergo rolling-circle-type replication terminate after a single round of replication by a second cleavage mechanism that involves a dimer of the initiator protein (94). Whereas the first subunit of the dimer catalyzes initiation by site-specific cleavage of the origin, the second subunit cleaves the origin after the first round of replication, following restoration of the cleavage site.
FIG. 7. Alternative models for termination of transfer DNA replication. Protein subunits are represented by ellipsoids. Single-stranded DNA is drawn as a black line. The active-site tyrosines are symbolized by "Y." The encircled P depicts the phosphodiester moiety at the nick site. Bent arrows indicate nucleophilic attacks. Panel I: closing of the T-strand without second cleavage; Panel II: second cleavage and recircularization reaction catalyzed by a tandem arrangement of active-site tyrosines; Panel III: second cleavage and recircularization reaction catalyzed by a TraI dimer.
The 3' hydroxyl that is created by the second cleavage event makes a nucleophilic attack on the phosphodiester between the DNA 5' terminus and the first subunit of the initiator protein, resulting in recircularization of the DNA. Rapid dissociation of the initiator complex from the DNA leads to formation of an inactive initiator heterodimer in which one of the protein subunits remains linked to
a short oligonucleotide. This mechanism prevents uncontrolled reinitiation of plasmid replication, and the plasmid copy-number is directly linked to the copy-number of the initiator protein. The latter type of mechanism seems to be the most likely one to occur during conjugative transfer DNA replication (Fig. 7, III). It is even conceivable that formation of an inactive relaxase heterodimer could trigger active dissociation of the mating partners. There is evidence that second cleavage occurs also with plasmid substrates that contain tandem arrangements of oriT (79, 95). In these constructs, initiation and termination take place at different transfer origins on the same plasmid. Thus, mating results in transfer of only the intervening sequence and the rest of the plasmid is deleted (20). An intact inverted repeat sequence in the oriT of RSF1010-like plasmids seems to be a requirement for specific termination to occur (79). Site-specific cleavage-joining of DNA single strands by the IncP relaxase requires a core sequence of only six to seven nucleotides (Fig. 6) (81). This sequence occurs several times on IncP plasmids. Moreover, certain nucleotides within the core sequence can be exchanged without losing cleavage-joining activity with TraI. Therefore, the inverted repeat sequence near nic could provide a clue for specific termination also in IncP plasmids. The hairpin structures that could form in the single-stranded transfer intermediate could contribute to the specificity of the reaction, providing, in addition to sri, a second signal for termination.
2. DOMAIN STRUCTURE OF DNA RELAXASES
Relaxases of IncP-type plasmids are multidomain proteins (Fig. 8) (96). The N-terminal fifth of the TraI amino-acid sequence (the full-length protein comprises 732 amino acids, 81.6 kDa) contains the DNA cleaving-joining activity.
The remaining part of the protein is thought to be involved in making protein-protein contacts with the accessory proteins TraH and TraJ and in receiving the postulated mating signal proposed to trigger initiation of conjugative DNA processing. In contrast to the N-terminal fifth of TraI, the amino-acid sequence of this region is not conserved among relaxases from other DNA export systems. This may reflect the different specificities in the interactions with other relaxosome components. These components likewise show no or only barely detectable similarity when functionally analogous proteins from different systems are compared (96, 97). In the N-terminal part of TraI, three conserved motifs (I-III) were identified by sequence comparison (Fig. 8). These motifs are conserved in relaxases from different conjugative or mobilizable plasmids of Gram-negative bacteria (R751, R64, pTF-FC2), in the VirD2 relaxases of agrobacterial pTi and pRi plasmids, and in mobilizable plasmids from Gram-positive bacteria (96). Remnants of motifs I and III were also detectable in F-like (F, R100, R388, R46) and in RSF1010-like relaxases, and in other rolling-circle-type replication initiator proteins (97, 98).
[Figure 8 diagram. Recoverable labels: relaxase activity, TraH interaction; residue positions 17, 24, 68, 76, 103, 124, and 732. The motif sequences shown in the original figure are not recoverable here.]
FIG. 8. Domain structure of the RP4 TraI relaxase. The upper bar represents the entire polypeptide. Domains with known activities are marked by shading. The lower bar depicts the relaxase domain of TraI. Three motifs (I, II, and III) that are conserved among relaxases from other systems are drawn as black bars (96). The respective amino-acid sequences are shown below. The active-site tyrosine within motif I is marked by an asterisk. Invariant positions and positions where conservative replacements occur are drawn with a black or gray background, respectively.
A site-directed mutagenesis study with the RP4 TraI protein allowed assignment of specific functions to protein domains corresponding to the three motifs. Side-chains of conserved amino-acid residues were exchanged in such a way that their functionality was altered but the protein's secondary structure should remain unaffected (96). Replacement of the conserved tyrosine residue in motif I (position 22) by leucine resulted in the complete loss of cleavage-joining activity. This residue had been identified previously as the site of covalent attachment of TraI to the 5' terminus of the cleaved DNA (81). By N-terminal sequencing of TraI peptides that remained covalently attached to specifically cleaved oligonucleotides, the hydroxyl group of TraI Tyr-22 was shown to be linked via a phosphodiester moiety to the 5' hydroxyl of the terminal cytidyl residue at the oriT nic site. Therefore, it was concluded that motif I represents a fundamental part of the relaxase active site containing the tyrosine residue that, during cleavage, exerts a nucleophilic attack on the DNA backbone. A mutation in motif II (Ser-74 → Ala) resulted in instability of the relaxosomes and in the production of a variety of partially relaxed topoisomers (96). This finding suggested that motif II is involved in stable binding of the substrate and possibly also in recognition of sri. However, it also indicated that in the relaxosome, even without a trigger signal, continuous cleaving and joining of the DNA takes place, resulting in an equilibrium of open and closed plasmid species (96) (Fig. 9). With the wild-type TraI, the superhelical state of the cleaved DNA species is maintained, because the protein binds sri at the nic 3' terminus very tightly. With the mutant protein TraI
FIG. 9. An equilibrium reaction between open and closed DNA in relaxosomes. Right: Superhelical DNA is represented by a ribbon, the relaxase and accessory proteins by an ellipsoid. Left: DNA single strands are drawn as ribbons. Particular amino-acid residues that participate in the equilibrium reaction are indicated. The phosphodiester group at the nick site is represented as an encircled "P." See Section V,B,2 for explanations.
S74A, this interaction is disturbed, resulting in occasional release of sri and hence in spontaneous relaxation of the plasmid DNA. When superhelical stress has decreased sufficiently for TraI S74A to bind again to sri, the mutant enzyme seals the cleaved DNA strand. This model recently has been confirmed by the finding that the mutant TraI S74A immobilized to magnetic beads binds oligonucleotides containing sri much less efficiently than does the wild-type protein (our unpublished results). Additional evidence for the existence of a cleaving-joining equilibrium in relaxosomes comes from studies with the RSF1010 relaxase (MobA). Incubation of RSF1010 relaxosomes under high-salt conditions allows quantitative recovery of form-I plasmid DNA from the reaction. Protein denaturants, such as sodium dodecyl sulfate or proteinase K, freeze the equilibrium, resulting in capture of form-II plasmid intermediates. In contrast, treatment with salt dissociates from the DNA only the proteins that are noncovalently associated. Plasmid DNA in the open state is covalently associated with the relaxase, which under high-salt conditions might still be able to seal the single-strand incision. Therefore, only when the plasmid enters the covalently closed state (form I) can the relaxase dissociate; consequently the equilibrium is driven completely to a covalently closed plasmid form (89). Motif III of TraI contains two histidine residues separated by one residue and followed by a stretch of hydrophobic amino acids. This subdomain is the only feature conserved in all rolling-circle-type replication initiator proteins,
including the relaxases from F- and RSF1010-like plasmids (97, 98). Accordingly, it was expected that replacement of each of these histidine residues in RP4 TraI by serine would result in a severe loss of relaxase activity, irrespective of whether double- or single-stranded DNA is used as a substrate (96). Because relaxosome assembly is not affected by these mutations, it was concluded that His-116 and His-118 are involved in catalyzing the cleaving-joining reaction. Histidine residues can be involved in activating aromatic and aliphatic hydroxyl groups to become strong nucleophiles (99). Two hydroxyl groups are involved in the cleaving-joining reaction at nic: (1) the aromatic hydroxyl group of Tyr-22, which attacks the phosphodiester backbone of the DNA in a trans-esterification reaction that opens the DNA and links the protein to the 5' terminus, and (2) the 3' hydroxyl that, in the reverse reaction, attacks the phosphodiester between TraI Tyr-22 and the DNA to join the ends of the DNA. It seems reasonable to speculate that each of the histidine residues is involved in the activation of one of these two hydroxyl groups, resulting in a reversible charge relay mechanism (Fig. 10).
C. Accessory Proteins
1. TRAJ
The IncP TraJ protein is the only relaxosome component that, in addition to the relaxase, TraI, is essential for the specific cleaving-joining reaction to take place on superhelical substrate DNA (74). TraJ is a specific DNA-binding protein that recognizes a 10-bp sequence (srj) within the transfer origin (Fig. 6) (49). The protein binds without cofactors such as nucleotides or divalent cations to double-stranded relaxed or negative superhelical DNA. In solution, TraJ exists as a dimer, and estimates on the stoichiometry of the
FIG. 10. Proposed reaction mechanism of the cleaving-joining reaction catalyzed by DNA relaxases. As an example, amino-acid residues are numbered according to the RP4 TraI sequence (26). "B" represents a so-far unidentified basic function. See Section IV,B,2 for explanations.
interaction with DNA imply that a dimer also attaches to srj. In accordance with that, srj has imperfect twofold rotational symmetry (Fig. 6). Binding of TraJ to srj is the first step in an assembly cascade leading to relaxosome formation (49, 74). The ability of TraI to cleave single-stranded oligonucleotides specifically, and the requirement for negative superhelical DNA when TraI acts on a double-stranded substrate, strongly suggest that the TraI recognition site sri must be exposed as a single strand to be recognized and cleaved by the relaxase. Binding of TraJ to srj is thought to distort the DNA structure locally allowing access of TraI to sri. Moreover, the close spacing of sri and srj and the fact that both sites face the DNA from the same side suggest that TraJ also makes direct protein-protein contacts with TraI (Fig. 5).
2. TRAK
TraK is a specific oriT-binding protein that, although not essential for DNA cleavage or relaxosome assembly in vitro, is an essential transfer factor for conjugation to take place in vivo (36, 74, 100). In vitro, the protein binds to an intrinsically bent region of the IncP transfer origin, wrapping a region of about 180 bp around a core of TraK (Fig. 5). Moreover, binding of TraK to its recognition site (srk) dramatically enhances observable bending in this region, indicating that a highly ordered nucleoprotein structure is formed. Estimates of the stoichiometry of this reaction suggest that 15 to 20 TraK monomers are involved in forming this complex. In solution, the protein exists as a tetramer (47). TraK is encoded by one of the two specificity determinants (traJ and traK) that cannot be complemented by the corresponding genes of the IncPβ plasmid R751, indicating that the TraK-oriT interaction is highly specific (36, 101). Nevertheless, it was not possible to define the exact limits of the DNA sequence that forms srk. The reason for this might lie in a requirement not only for a specific DNA sequence forming a nucleation site where complex formation initiates, but also for the ability of the adjacent DNA to follow a path given by the core of TraK around which the DNA is to be wrapped. The sequence-directed bend that overlaps with srk might reflect the increased flexibility of that DNA region being required for forming the TraK-oriT complex. In vitro, the presence of TraK increases the yield of cleaved plasmid DNA that can be isolated from reconstituted relaxosomes, indicating that the cleaving-joining equilibrium in the presence of TraK is shifted to the cleaved plasmid form (48, 96). An explanation for this observation could be a local change of the DNA topology that is imposed by TraK on the transfer origin. Hydroxyl radical footprints show that the helical repeat of the DNA
within the TraK complex drops from 10.6 bp/turn, which is the value for normal B-form DNA, to 10.2 bp/turn. Considering that about 180 bp are wrapped in the TraK complex, a drop by 0.4 bp/turn would be topologically equivalent to about 7 bp of totally unwound DNA. Thus, it is conceivable that TraK acts as a DNA chaperone (102) and that binding of TraK to srk helps to expose sri as a single strand, allowing efficient access of TraI to its recognition site (Fig. 5). The hypothesis that a local change in DNA topology is the explanation for the observed stimulation of plasmid cleavage activity is also confirmed by the observation that the in vitro topoisomerase activity of the relaxase mutant TraI S74A can be completely suppressed by addition of TraK (96). In the presence of TraK, topological stress on the relaxosome should be at least partially compensated, because local strand separation at sri is not only due to negative superhelicity of the substrate but could also result from the structure of the adjacent TraK complex. Thus, in the presence of TraK, binding of TraI S74A to sri obviously can be tight enough to prevent spontaneous plasmid relaxation. Besides the DNA chaperone activity, TraK must have another, perhaps even more important, function. This is implied by the finding that a functional traK gene product is essential for conjugative transfer to take place, even under conditions where, in the absence of TraK, relaxosome formation and specific cleavage at oriT can be demonstrated (100). TraK remains an essential transfer factor even for plasmids that are completely deleted for srk. In the presence of a functional traK gene, this deletion results in a drop of the transfer frequency by two to three orders of magnitude, but transfer is easily detectable.
In the absence of traK, however, no plasmid transfer can be observed (36). Another hint of an additional function of TraK is the finding that TraK is the only Tra gene product that, if overproduced, has a deleterious effect on the host cell (47).
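The estimate of about 7 bp of unwound DNA given above for the TraK complex can be retraced with a short calculation. The sketch below shows one way to arrive at that figure, assuming that the change in helical repeat is applied uniformly over the 180 bp that are wrapped and that the resulting extra twist is expressed in base pairs of B-form DNA; this reading of the estimate and the variable names are assumptions, not taken from the source.

```python
# Values quoted in the text for the TraK-oriT complex.
WRAPPED_BP = 180        # approximate length of DNA wrapped around TraK
B_FORM_REPEAT = 10.6    # bp per helical turn, value given for normal B-form DNA
TRAK_REPEAT = 10.2      # bp per helical turn within the TraK complex


def helical_turns(length_bp: float, repeat_bp_per_turn: float) -> float:
    """Number of helical turns contained in a stretch of DNA."""
    return length_bp / repeat_bp_per_turn


# A smaller helical repeat means more turns over the same length of DNA.
extra_turns = (helical_turns(WRAPPED_BP, TRAK_REPEAT)
               - helical_turns(WRAPPED_BP, B_FORM_REPEAT))

# Express the extra twist as the number of B-form base pairs that would have
# to be fully unwound elsewhere to compensate for it (assumed interpretation).
equivalent_unwound_bp = extra_turns * B_FORM_REPEAT

print(f"Extra helical turns: {extra_turns:.2f}")
print(f"Equivalent unwound DNA: {equivalent_unwound_bp:.1f} bp")
```

With these inputs the calculation gives roughly two thirds of a helical turn, which corresponds to about 7 bp and agrees with the value quoted in the text.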
3. TRAH
The TraH protein is encoded by an unusual out-of-frame overlapping gene arrangement within the traI gene (Fig. 2). TraH is not an essential transfer factor: a site-specific mutation that destroys the initiation codon of traH but does not alter the amino-acid sequence of TraI has no effect on intraspecific E. coli matings (38). TraH is an acidic protein that does not bind by itself to DNA (Table I). In solution the protein forms higher multimers, and electron microscopy of TraH preparations suggests that these multimers consist of stacked disks with sevenfold rotational symmetry. Another unusual feature of the protein is its deep brown color; the chromophore is still unknown (G. Ziegelin, W. Pansegrau and E. Lanka, unpublished results). TraH shows several interesting activities in vitro. In the absence of DNA,
the protein forms specific and very stable complexes with the relaxosome components TraJ and TraI (51). Both complexes, and the multimeric form of TraH itself, are stable at room temperature in the presence of SDS. Only boiling with SDS leads to their disruption into single subunits. Only C-terminal deletion derivatives of TraI that exceed 71 kDa can form the specific TraH complex (our unpublished results). Therefore, the TraH-binding domain within TraI should be located in the C-terminal part of the protein (Fig. 8). Electrophoretic detection of relaxosome formation by specific retention of supercoiled oriT plasmids on agarose gels is possible only in the presence of TraH (74). Specific complex formation of TraH with the relaxosome components TraI and TraJ could be an explanation for this stabilizing effect of TraH on relaxosomes in vitro. According to this model, TraH would act as a clamp between TraI and TraJ, preventing spontaneous dissociation of the relaxosome components.
D. Biochemical Methods for Studying Relaxosomes
In recent years, a number of assay systems have been developed to study the biochemistry of relaxosomes and DNA relaxases and the events that take place at oriTs when conjugative DNA processing proceeds. Two main approaches have been followed: (1) isolation of relaxosomes from bacterial cells that overproduce relaxosomal components and (2) overproduction and purification of relaxosome components and reconstitution of protein-DNA complexes in vitro (36, 74, 82, 86, 89, 103-107). The first approach has been used extensively to study the structure of oriT DNA after cleavage had taken place. The cleavage site (nic) was mapped first by analyzing specifically relaxed plasmid DNA on alkaline agarose gels to separate the plasmid single strands (36, 80). The exact position was determined by identifying the 5'- and 3'-terminal nucleotides at nic by Maxam-Gilbert sequencing (52, 74, 80). The substrate for the sequencing reactions was obtained by incubating relaxosomes in a cleared lysate with SDS and proteinase K. This treatment results in capturing relaxed plasmid intermediates that can be isolated by preparative gel electrophoresis. Cutting the DNA with appropriate restriction endonucleases and end-labeling of the fragments (3' or 5', depending on which terminus is to be sequenced) yielded the substrate for the chemical degradation reactions. To identify the 5'-terminal nucleotide, a second, less laborious method was also applied. Primer extension on specifically relaxed plasmid DNA was done in the presence of dideoxynucleotides using a primer that annealed downstream of nic (52). Analysis of the extension products on a sequencing
gel resulted in a sequence ladder that terminated at the 5' end of the cleavage site, allowing determination of the position of nic unambiguously. However, the use of polymerases other than the large fragment of DNA polymerase I might lead to incorrect results: for T7 Sequenase (TM) Version 2.0 DNA polymerase a terminal transferase (extendase) activity has been reported that could account for some discrepancies in nick sites that were mapped in other systems (108). The extendase activity of T7 Sequenase may lead to a template-independent addition of one or a few nucleotides to the 3' termini of the extension products synthesized with the T-strand as template. Thus, the virtual position of the nick site will be shifted from its actual position by one or a few nucleotides upstream when T7 Sequenase is used. Attempts to label the 5' ends of specifically nicked DNA fragments demonstrated that the 5' terminus is blocked by a covalent modification (80). TraI-specific antiserum identified this modification to be TraI, showing that this protein embodies the catalytic activity of the relaxosome. Alkali-resistance of the covalent linkage between TraI and the DNA suggested that an O4-tyrosyl phosphodiester is formed with the DNA 5' terminus (52, 109). The second main approach used relaxosomes reconstituted in vitro from purified components. Assembly of relaxosomes can be followed by agarose gel electrophoresis under native conditions. Relaxosomes that form on superhelical oriT plasmid DNA diminish the electrophoretic mobility of this plasmid species. Site specificity of the assembly reaction is examined by electron microscopy. The large nucleoprotein complexes that form at oriT can be visualized after fixation by glutaraldehyde followed by linearization of the DNA by an appropriate restriction endonuclease (74).
Intermediates specifically cleaved at nic can be captured if IncP relaxosomes reconstituted in vitro are treated with ionic detergents, such as SDS, or proteases, such as proteinase K. The structure of these intermediates is indistinguishable from the structure of those isolated from cells (74). An additional assay method for IncP-like DNA relaxases makes use of the ability of these enzymes to cleave specifically single-stranded oligonucleotides containing sri (81, 87, 90). This reaction requires only Mg2+ ions and no additional proteins as cofactors (Table II). As in the double-stranded DNA cleavage reaction, a covalent protein-oligonucleotide adduct is formed (Fig. 11). This reaction has been used to determine the amino acid that forms the covalent linkage between the relaxase and the DNA. Following digestion of covalent adducts with specific proteases, peptide-oligonucleotide adducts were separated on polyacrylamide gels. Distinct mobilities of these adducts allowed mapping of the attachment sites for the DNA relaxases TraI and VirD2 (81, 84). A more direct approach involved N-terminal peptide sequencing of peptide-oligonucleotide adducts. At the position of the amino
[Figure 11 diagram. Recoverable labels: cleaving (relaxase, Mg2+), joining, nic, nick region; oligonucleotide lengths 30-mer, 21-mer, 13-mer, and 22-mer.]
FIG. 11. Detection of specific cleaving-joining catalyzed by DNA relaxases. Single-stranded nick region oligodeoxyribonucleotides are represented by shaded bars. See Section V,D for explanations.
acid that forms the linkage with the DNA, a gap in the sequence was found because a non-volatile PTH-amino-acid derivative, formed during Edman degradation, escaped detection (81, 90). However, amino acids that precede or follow this position were detected without disturbance by the DNA moiety, allowing unambiguous determination of the attached amino acid. The oligonucleotide cleavage-joining assay can also be used for the exact determination of the position of the cleavage site. To determine their size, 5'-end-labeled cleavage products can be coelectrophoresed with the partially digested substrate. The resulting ladder serves as a reference to determine the size of the cleavage product. To get a final proof of the position of nic, a synthesized cleavage product containing the sequence upstream of nic can be applied to demonstrate specific joining. Only an oligonucleotide with a 3' end that corresponds to that of the cleaved T-strand will be accepted in the joining reaction (Fig. 11) (81, 84).
E. DNA Primases
DNA primases are enzymes that catalyze de novo synthesis of short oligonucleotides on a single-stranded circular DNA template (110). The oligonucleotides can be elongated by the host replication machinery, allowing complementary strand synthesis to take place on DNA single strands in the absence of other chromosomally encoded priming systems. This ability of DNA primases also led to their discovery: most plasmid-encoded DNA primases suppress a temperature-sensitive E. coli dnaG mutation, indicating that these enzymes can functionally replace the dnaG gene product, the primase of the chromosomally encoded replisome (19, 20). Moreover, the ability for primer synthesis persists in the presence of rifampicin, demonstrating its independence from RNA polymerase. Conjugative plasmids of several incompatibility groups of Gram-negative bacteria specify a DNA primase (78). The best-characterized representatives are enzymes encoded by the IncI1 and IncPα plasmids ColIb-P9 and RP4, respectively (111-113). DNA primases of IncP plasmids are encoded by in-phase overlapping gene arrangements within the Tra1 region (Fig. 2). IncPα plasmids encode two forms of DNA primases, 82 (TraC2) and 117 kDa (TraC1) in size; the smaller form is made from an internal initiation codon within the traC gene (28, 37). IncPβ plasmids (the prototype is R751) encode as many as four distinct forms; the largest one (173 kDa) (TraC1) is produced by readthrough of the traD termination codon. The three smaller forms (159, 134, and 81 kDa) (TraC2, C3, and C4, respectively) are specified by traC, which has two internal initiation codons. Each of the different forms of IncP-encoded primases shows primase activity in vitro, indicating that the primase domain must be completely contained within the smallest forms (28). The organization of primase genes in ColIb-P9 is quite similar to that in RP4: the primase is encoded by the sog gene, which is located within the Tra region of ColIb-P9.
The sog gene also encodes two polypeptides (210 and 160 kDa) by an in-phase-overlapping gene arrangement (112, 114). However, there is one difference from the situation in IncP plasmids: only the large Sog form shows primase activity; therefore, the primase domain must be located in the N-terminal part of the 210-kDa Sog protein. Amino-acid sequence comparison revealed three conserved regions (I-III) in TraC2 (RP4), in TraC4 (R751), and in the N-terminal part of the 210-kDa Sog protein. One sequence motif within region III, -Glu-Gly-Tyr-Ala-Thr-Ala-, is conserved among all known prokaryotic DNA primase sequences (115). Interestingly, the motif was also found in the α protein of the E. coli satellite phage P4. This protein is multifunctional, having DNA primase, DNA helicase, and origin recognition activities on a single polypeptide chain, making replication of the P4 genome independent of host initiation factors (114, 116, 117).
Amino-acid residues within the conserved motif were the targets for site-directed mutagenesis studies of the RP4 TraC2 and the P4 α protein (114). These studies demonstrated that the -Glu-Gly-Tyr-Ala-Thr-Ala- motif apparently is essential for the primase function. The activity pattern obtained with the mutant proteins of RP4 TraC2 and P4 α fits into a general scheme: Glu → Gln and Thr → Ser exchanges abolish or strongly decrease the specific activity, whereas a Tyr → Phe change increases the activity or leaves it unaltered. In vitro, the Glu → Gln exchange results in complete loss of oligonucleotide synthesis. These results suggest that these residues could form a part of a critical domain involved in the primase function. The common feature of plasmid-encoded primases, to be encoded by in-phase-overlapping genes, suggests that these proteins are multidomain enzymes and that the different forms are fulfilling specific functions in the bacterial life cycle, i.e., conjugative transfer or plasmid maintenance. In fact, it has been shown that, during conjugation, plasmid-encoded primases can be transferred from the donor to a recipient cell (118, 119). The transport mechanism is not understood. The N-terminal amino-acid sequences of the proteins lack the typical signal sequences, indicating that the proteins are transferred by some process other than the classical protein export pathway. However, conjugative DNA primase transfer requires specific sequences too: only the RP4 TraC1 protein is detectably transferred, indicating that the N-terminal part of TraC1 absent from TraC2 is required for transfer to take place. Conversely, both forms of Sog are transferred to recipient cells in ColIb-P9 conjugation. In the case of Sog, the transfer domain must be located toward the C terminus of the protein and is therefore common to both the Sog210 and Sog160 polypeptides. What is the biological significance of plasmid-encoded DNA primases and their conjugative transfer?
Under laboratory conditions, neither TraC nor Sog is an essential transfer factor. However, in a suboptimal environment, for example starvation, a specialized transfer DNA primase in the recipient cell could facilitate initiation of complementary strand synthesis. Another possibility is that primases are required only in matings between certain bacterial species but not in intraspecific E. coli matings. Indeed, this has been demonstrated for RP4 TraC, which strongly stimulates DNA transfer to Salmonella spp. and Providencia spp. (113). It has been speculated that primases are transferred to the recipient cell along with the T-strand as a protein-DNA complex (111). Because DNA primases are very abundant gene products in RP4- or ColIb-P9-containing cells, it is conceivable that TraC1 or the Sog proteins coat the DNA single strand during transfer. However, in both systems, the DNA primases do not have to be present during transfer. This would mean that the same transport channel had to be used with nearly the same efficiency both for the naked DNA single strand
and for the primase-DNA complex. Because this seems very unlikely, either another host- or plasmid-encoded protein may substitute for the primase during transfer or the primase in fact uses another independent pathway to reach the recipient. RP4-mediated DNA transfer to yeast does not require the traC gene products (S. Bates, A. Cashmore and B. M. Wilkins, personal communication). The biochemistry of plasmid-encoded DNA primases has been studied extensively in vitro. TraC1 and TraC2 are anisometric molecules that exist in solution as monomers. Both TraC1 and TraC2 bind to DNA single strands with equal affinity. A typical assay for primase activity consists of an E. coli extract that sustains DNA replication but is devoid of a functional host primosome, and a single-stranded circular template DNA, for example, of phage φX174, G4, or fd (111). Additional components are rifampicin to inhibit RNA polymerase activity and the complete set of ribo- and deoxyribonucleoside triphosphates. Suitable strains to prepare the DNA replicating extract are BT308 (dnaG) or, when φX174 DNA is used as a template, BT1304 (dnaB, dnaC). Primase activity is monitored by incorporation of labeled deoxyribonucleotides into acid-insoluble material during the elongation reaction. Analysis of the primers synthesized by the RP4 TraC primase revealed that these consist of 2- to 12-mer oligoribonucleotides. The 5'-terminal nucleotide is always C or pC. The second nucleotide is A or G, with A being preferred. No preference was detectable for the following nucleotides. Experiments with synthetic oligodeoxyribonucleotides as templates confirmed that the dinucleotide sequence d(TG) is the preferred recognition site and sufficient for TraC to initiate primer synthesis. d(CG) is also accepted, although with lower affinity.
The presence of a C or pC residue at the 5‘ terminus instead of pppC provides strong evidence that TraC has two different nucleotide-binding sites, one for the initiating 5’-terminal nucleotide and a second one for the nucleoside triphosphates to be incorporated into the primer chain (73).
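The template preference described above can be illustrated with a small sketch. It assumes, purely for illustration, that the template is supplied as a plain 5'→3' DNA string; the function name and the example sequence are hypothetical and not taken from the cited work. The sketch lists each d(TG) or d(CG) dinucleotide and the first two primer nucleotides expected from antiparallel base pairing.

```python
# Candidate initiation dinucleotides on the template (written 5'->3'), per the text:
# d(TG) is the preferred site, d(CG) is accepted with lower affinity.
SITES = {"TG": "preferred", "CG": "accepted"}

# DNA template base -> complementary ribonucleotide in the primer.
RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def find_initiation_sites(template):
    """Return (position, dinucleotide, label, primer_start) for each candidate site.

    position is the 0-based index of the first base of the dinucleotide in the
    5'->3' template string; primer_start holds the first two primer
    nucleotides, written 5'->3'.
    """
    template = template.upper()
    hits = []
    for i in range(len(template) - 1):
        dinuc = template[i:i + 2]
        if dinuc in SITES:
            # The primer is antiparallel to the template, so its 5' nucleotide
            # pairs with the 3' base of the template dinucleotide.
            primer_start = RNA_COMPLEMENT[dinuc[1]] + RNA_COMPLEMENT[dinuc[0]]
            hits.append((i, dinuc, SITES[dinuc], primer_start))
    return hits


if __name__ == "__main__":
    for hit in find_initiation_sites("AATGCCGTA"):  # hypothetical template
        print(hit)
```

Under these assumptions a d(TG) site yields a primer beginning 5'-CA and a d(CG) site one beginning 5'-CG, in line with the observed 5'-terminal C followed by A or G.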
V. Phylogenetic Relationships to Other Systems
Relationships among conjugative systems or to other systems can be discovered by two basic approaches: by the analysis of mechanistic analogies and by search for sequence similarities between genes or gene products. Both approaches have been successfully applied for the IncP system. In several cases, mechanistic analogies served also as a guide for a careful examination of functionally analogous gene products for sequence similarities to demonstrate an evolutionary relationship.
A. Conjugative Plasmids of Gram-positive and Gram-negative Bacteria
All conjugative systems analyzed so far share mechanistic analogies: adhesion of donor and recipient cells is mediated by extracellular filamentous structures (pili) (120), or, in Gram-positive bacteria, by a fibrillar “adhesion substance” (121); the DNA is transferred as a single strand thought to be generated by rolling-circle-type replication; the leading 5' end is covalently associated with relaxase protein initiating transfer DNA replication by a site- and strand-specific cleavage event. A comparison of the sequences adjacent to the cleavage sites within the transfer origins of conjugative and mobilizable plasmids from Gram-negative and Gram-positive bacteria revealed the existence of at least four groups of sequence-related transfer origins (Fig. 12) (122-138). Based on the prototype plasmids that gave origin to these families, we propose to designate them IncP-, IncF-, IncQ-, and ColE1-like transfer origins (20). The prototype plasmid of each group is the one that was characterized and sequenced first. IncP-like transfer origins seem to be most widely distributed: the core of the relaxase recognition site (sri, Figs. 6 and 13) has been found to be conserved not only among a wide variety of transfer origins from other conjugative and mobilizable plasmids but also among conjugative transposons, the T-border sequences of the Agrobacterium tumefaciens Ti plasmid, vegetative replication origins of plasmids from Gram-positive bacteria, and replication origins from single-stranded bacteriophages (48, 139, 140). In all cases, the position of the cleavage site is conserved and the invariant nucleotide positions are found exclusively upstream of the cleavage site (Fig. 13) (141-147). The transfer origins from the other families (Fig.
12) do not fit into this scheme: conserved positions were detected upstream and downstream of nic, indicating that the mode of substrate recognition by the respective DNA relaxases probably differs significantly from that of the IncP-like relaxases. The sequences around the nic sites are highly conserved within each of these groups, but the conserved positions are not shared among the families. Nevertheless, all transfer origins studied so far are functionally equivalent: Transfer is initiated by a specific single-strand incision and the cleaving enzyme attaches covalently to the 5’ terminus of the interrupted DNA strand. In general, the pattern of relationships among the relaxases from different conjugative or mobilizable plasmids follows that of the transfer origins: relaxases that act on IncP-like transfer origins share three conserved motifs at their N termini [Figs. 8 and 14 (148, 149); see Section IV,B,2] (48, 96).
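The distinction drawn in Figs. 12 and 13 between invariant positions and positions that allow only conservative replacements (purine for purine, pyrimidine for pyrimidine) can be sketched as a column-wise classification of an ungapped alignment. This is a minimal illustration under stated assumptions: the function names are hypothetical, the toy sequences are invented for the example and are not the published nick regions, and gap characters are not treated specially.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def classify_column(bases):
    """Classify one alignment column as 'invariant', 'conservative', or 'variable'."""
    observed = {b.upper() for b in bases}
    if len(observed) == 1:
        return "invariant"
    # Only purines or only pyrimidines observed: a conservative replacement.
    if observed <= PURINES or observed <= PYRIMIDINES:
        return "conservative"
    return "variable"


def classify_alignment(sequences):
    """Classify every column of a list of equal-length, pre-aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return [classify_column(column) for column in zip(*sequences)]


if __name__ == "__main__":
    toy_alignment = ["ATCCTGA", "ATCCTGC", "GTCTTGT"]  # invented example
    for index, label in enumerate(classify_alignment(toy_alignment)):
        print(index, label)
```

Applied to a real alignment, the positions reported as invariant would correspond to the black-background positions of the figures and the conservative ones to the shaded positions.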
FIG. 12. Families of oriT nick regions. Nucleotide positions that are conserved within a family of oriT sequences are drawn with a black background. A shaded background marks positions where conservative replacements (purine ↔ purine or pyrimidine ↔ pyrimidine nucleotide) may occur. In cases where the cleavage site has been determined, it is indicated by a wedge. Consensus sequences are depicted below each block of related sequences. References/GenBank accession numbers: RP4/RK2 (26)/L27758; R751 (26)/X54458; pTF-FC2 (122)/M57717; R64 (123)/D90273; pTiC58 LB (124)/J01818; pTiC58 RB (124)/J01819; NTP16 (125)/L05392; F (126)/X00545; P307 (127)/X06534; R100 (128)/M17148; R46 (129)/M30197; R388 (130)/X51505; RSF1010 (131)/M28829; R1162 (132)/M13380; pTF1 (133)/X52699; pTiC58 oriT (134); pSC101 (135)/X01654; pIP501 (136)/L39769; pGO1 (C. L. Archer, personal communication); ColE1 (137)/J01566; ColA (138)/M37402.
Most notably, motif III partially appears also in relaxases of plasmids from the other families: the two histidine residues that are thought to be involved in activation of the reactive nucleophiles in the cleaving-joining reaction are conserved throughout (97, 98, 150). Motif II was not found in the non-IncP-like relaxases, which is the expected result because this motif is proposed to be required for specific substrate recognition and stable binding (Fig. 9).
Motif I is the least conserved one, also among IncP-like relaxases: except for the reactive tyrosine, all other positions are variable (96). A comparison of the Dtr and Mpf systems from the different plasmids revealed that apparently several combinations of Mpf and Dtr components of diverse origin exist. This finding suggests that the various conjugative transfer systems have formed by combination of exchangeable modules that, in the course of evolution, have adapted to optimal function in the respective context (25). Two examples of module shuffling can be given. The transfer apparatus of IncW plasmids seems to consist of an IncP-like Mpf system (Pil, Fig. 15) and a Dtr system that is composed of an IncF-like transfer origin with the corresponding DNA relaxase/helicase showing similarity to the IncF TraI protein (TrwC), an IncP TraJ-like oriT-binding protein (TrwA), and a protein (TrwB) that belongs to the family of IncP TraG-like proteins (108). The second example is the conjugative system of the Ti plasmid for interbacterial plasmid transfer: the Tra3 region shows extended similarity to the Tra2 core
FIG. 13. Alignment of rolling-circle-type replication origins. Nucleotide positions that are conserved throughout are drawn with a black background. A shaded background marks positions where conservative replacements (purine ↔ purine or pyrimidine ↔ pyrimidine nucleotide) may occur. The consensus sequence is depicted below. References/GenBank accession numbers: Tn4399 (107)/L20975; pC194 (141)/J01754; pUB110 (142)/M19465; phage φX174 (143)/J02482; phage St-1 (144)/J02501; phage α-3 (144)/M10631; phage G4 (145)/V00657; phage G14 (146)/M10632; phage U3 (146)/M10630; phasyl (147)/X56069. Others: see legend to Fig. 12.
FIG. 14. Conserved gene organization among relaxase regions of different DNA transfer systems. Conserved motifs in DNA (nick region sequences) or amino-acid sequences are connected by broken lines. (For details on conserved relaxase motifs see Fig. 8.) Genes proposed to encode functionally related gene products are shown with the same type of hatching or gray tone. References/GenBank accession numbers: IncPα (26)/L27758; pTiA6 VirD/C (148, 149)/M17989, M14480; pTF-FC2 (122)/M57717; R64 (123)/D90273.
region of IncP plasmids (Fig. 15) (151-154). The Dtr system, located in Tra2 of the Ti plasmid, is IncQ-like with a relaxase (TraA) similar to the RSF1010 MobA protein and a transfer origin that fits into the family of IncQ-like oriTs (Fig. 12). However, the C-terminal part of TraA contains motifs that were also found in the helicases TraI (IncF) and RecD (E. coli) and in the E. coli primase DnaG (S. K. Farrand, personal communication). Recently, the sequences of several new systems, including conjugative transposons and integrated elements from Bacteroides and of conjugative plasmids from Gram-positive bacteria, have become available. Interestingly, most of these systems seem to fit into the IncP family: the transfer origin of Tn4399 contains a sequence that matches the consensus for IncP-like oriTs and an invert repeat sequence upstream of the putative nick region (Fig. 13) (107, 155). The “mobilization region” of Tn4399 encodes two proteins, MocB and MocA, that show similarity to the TraJ and TraI proteins of IncP plasmids, respectively. In particular, the three IncP relaxase motifs also occur in the MocA sequence. The mobilization protein (Mob) of NBU1 displays similarity to TraJ as well as to TraI of IncP plasmids (156). The TraI-like domains are located at the N terminus and comprise the TraI motifs I, II, and III.
FIG. 15. Conserved gene organization among specialized bacterial export systems. Genes encoding similar products are connected by broken lines or are shown with the same type of hatching. Tags marked with E represent protein export signals. Tags marked with L indicate lipoprotein signatures at the N terminus of the respective protein. Conserved nucleotide-binding motifs of type A are marked by tags labeled with A. References/GenBank accession numbers: IncPα Tra2 (25)/L27758; pTiC58 Tra3 (151; S. K. Farrand, personal communication); pTiA6 VirB (152)/J03216; pKM101 Tra (153)/U09868; pVT745 Tra (D. Galli and D. Leblanc, personal communication); R388 Pil (F. de la Cruz, personal communication); B. pertussis Ptl (154)/L10720.
However, the arrangements of motifs II and III differ: in the Mob protein these domains form a single block of approximately 20 amino-acid residues. The C-terminal part of Mob contains a region similar to TraJ (IncP), suggesting that this protein embodies relaxase and origin-recognizing functions in a single polypeptide chain (156). The MobA protein of RSF1010 is another example of combining double-strand recognition and cleaving-joining activity in one polypeptide chain (89). The transfer region of the conjugative plasmid pSK4 from Staphylococcus aureus encodes 13 transfer gene products. Two of these display similarity to IncP transfer proteins: TraK of pSK4 shares similarity with the IncP TraG-like proteins (31) and TraI (pSK4) with the topoisomerase TraE of the IncP plasmids (157). The relaxase has not yet been identified; however, the transfer origin of the closely related plasmid pGO1 fits into the IncQ family (Fig. 12), suggesting that the relaxase will probably also be similar to the IncQ MobA protein (158).
The product of gene 19 of the IncFII plasmid R1 is not essential for conjugative DNA transfer; however, mutants in gene 19 are attenuated in DNA transfer and bacteriophage R17 propagation (159). The amino-acid sequence of P19 contains three motifs that are conserved among soluble lytic transglycosylases (e.g., Slt70 of E. coli). The three motifs were also found in the sequences of IpgF (encoded by a virulence plasmid of Shigella flexneri), PilT of R64 (IncI), TrbN of RP4 (IncP), VirB1 of pTi, and TraL of pKM101 (IncN). Mutations in the corresponding genes in most cases result in an attenuation of DNA transfer but not in a Tra- phenotype, suggesting that the proteins might play analogous roles in the DNA transfer process. The three motifs are suggested to be indicative of muramidase activity (159). Thus, it is conceivable that this class of proteins facilitates either the passage of the DNA single strand through the peptidoglycan layer or the insertion of components of the membrane-spanning DNA transport complex into the bacterial cell wall. The attenuated phenotype of mutants in the genes could result from the fact that muramidases provided by the host cell, albeit with lower efficiency, could functionally replace the corresponding gene products. Interestingly, the highest overall similarity exists between the IncP conjugative machinery and the T-DNA transfer system of the Ti plasmids (Figs. 3 and 12-15). Similarities were found between transfer origins and T-borders, between the Tra1 core region (IncP Dtr) and the VirD region (pTi Dtr), and between the Tra2 core region (IncP Mpf) and the VirB region (pTi Mpf) (26, 46, 57, 75, 139).
B. The T-DNA Transfer System of Agrobacterium tumefaciens
Although A. tumefaciens-mediated tumor induction in plants at first sight appears to be rather different from bacterial conjugation, a closer look at the mechanism of tumor induction reveals striking similarities: T-DNAs encoding enzymes for the synthesis of plant hormones and opines are transferred as single-stranded DNA-protein complexes to the plant nucleus (160). In the plant, the DNA arrives coated over its whole length by a specialized single-stranded DNA-binding protein, VirE2 (161, 162), and the 5' end is covalently associated with the VirD2 relaxase (109, 163, 164). This protein-DNA complex has been designated the T-complex. In its structure, it might resemble a filamentous bacteriophage (Fig. 16). The targets for VirD2 relaxase are the border sequences that form a tandem arrangement of 25-bp direct repeats, flanking the T-DNA regions of Ti plasmids on their left and right side. By comparing the T-border sequences with IncP-like transfer origins, it became apparent for the first time that T-borders not only are functionally related to conjugative transfer origins but also are evolutionarily related: a core region of 6-7 nucleotides is identical in both systems (48). Moreover, the positions of the cleavage sites relative to this core are conserved (Figs. 12 and 13).

FIG. 16. Functional analogies between IncP-type bacterial conjugation and T-DNA transfer to plants. Components proposed to fulfill analogous functions are arranged at opposing positions. See Section V,B for explanations.

However, a substantial difference
of T-border sequences from other conjugative transfer origins is the absence of invert repeat structures upstream of the cleavage site. In interbacterial conjugation, these invert repeat structures are thought to be required as a signal for termination of transfer DNA replication and for precise recircularization of the transferred DNA single strand (see Section IV,B,1). Because these reactions are not required for T-DNA transfer (termination takes place by cleavage at the left T-border sequence, and the DNA in the T-complex is integrated into the plant genome as a linear molecule) (165), it is reasonable that invert repeat sequences are not present in T-border sequences. The sequence identity in transfer origins and border sequences is paralleled by sequence similarities of conjugative relaxases and pTi-encoded VirD2 proteins (139). The three relaxase motifs identified as conserved among several conjugative DNA relaxases are also present in VirD2 (Figs. 8 and 14). The importance of motif I for the function of VirD2 has been demonstrated by site-directed mutagenesis: exchange of Tyr-28 for phenylalanine abolishes in vivo cleavage activity of VirD2 (166). Furthermore, Tyr-28 is functionally analogous to Tyr-22 of TraI: peptide mapping of covalent VirD2-DNA adducts identified this residue as forming the covalent linkage to the 5' end of the DNA in the T-complex (84). Purified VirD2 has been applied in vitro for cleaving-joining reactions on single-stranded oligonucleotides containing the nick region of T-border sequences (84, 167) and, together with purified VirD1, for the cleavage of superhelical T-border DNA (83). In the latter reaction VirD1 acts as a functional analog of the IncP TraJ protein, although binding of VirD1 to DNA has not been demonstrated. The characteristics of the in vitro reactions of VirD2 and VirD2/VirD1 are nearly identical to those of TraI and TraI/TraJ of the IncP system.
The only detectable difference is a more relaxed substrate specificity of VirD2 in the single-stranded DNA cleaving-joining reaction. VirD2 cleaves and joins not only oligonucleotides containing the nick region of Ti border sequences, but also those containing the nick region of IncP plasmids (sri, Figs. 6 and 13). In contrast, IncP-encoded TraI protein cleaves and joins only oligonucleotides with the cognate sri sequence (84). VirD2 has also been proposed to be involved in integrating the T-DNA into the plant genome (84). Therefore, the lower stringency on substrate requirement might reflect the ability of VirD2 to use free 3' hydroxyl ends that may occur transiently in the plant genome to initiate the integration reaction, joining these ends with the 5' terminus of the T-DNA. Sequencing the ends of integrated T-DNAs has revealed that in many cases integration is precise with respect to the T-DNA 5' terminus. This finding indeed suggests a functional involvement of VirD2 in integrating the T-DNA into the plant genome (168).
The gene organization of conjugative relaxase operons shows a striking similarity to the VirD region of Ti plasmids (Fig. 14). The similarity continues on the right-hand side of the intergenic region that separates VirD and VirC: the gene products of virC1 and virC2 are sequence-related to the proteins TraL and TraM of the IncP leader operon (25). The function of the products of the VirC region is proposed to consist of interacting with “overdrive” sequences that may be present in the vicinity of right T-border sequences to stimulate border-specific cleavage (169). Although such a function has not been demonstrated for TraL and TraM, both gene products, like VirC1 and VirC2, are nonessential accessory proteins that might stimulate DNA processing reactions at the transfer origin. Most strikingly, a nucleotide-binding motif of type A (56) at the very N terminus of the amino-acid sequences of VirC1 and TraL is conserved (Fig. 14). Sequence alignments of corresponding genes reveal that the IncP-encoded TraG proteins have sequence-related analogs in the pTi-encoded VirD4 proteins (26). Again, nucleotide-binding motifs (types A and B) are well conserved among IncP and pTi-encoded proteins. In both cases, however, as in the other TraG-like proteins, the type A motif does not correspond very well to the consensus (57). The similarity between functions involved in T-DNA transfer and bacterial conjugation continues in the pTi VirB region (7). VirB-encoded gene products are thought to be involved in transporting the T-complex across the bacterial membranes and the plant cell wall into the plant cytoplasm (76, 170). Several products of VirB show sequence similarity to products of the IncP Tra2 core region and related conjugative systems (Fig. 15) and the gene organization of both regions matches over a considerable range (46). In particular, the genes virB2-5 are in the same order as are the IncP Tra2 genes trbC-F (Fig.
15). Because the trbC and virB2 gene products show similarity to the IncF pilin TraA, it has been speculated that a pilus-like structure might also be involved in T-DNA transfer (38, 44). Interestingly, the virB11 gene and its analogous counterpart trbB are located at opposite ends of their respective operons, providing evidence that these transport regions could have evolved from exchangeable modules that may arrange in several ways to yield a functional system.
C. The Toxin Secretion System of Bordetella pertussis
Bordetella pertussis is a human pathogen, the infective agent of whooping cough. A major virulence factor of B. pertussis is the pertussis toxin, consisting of five different subunits that are exported individually to the periplasmic space by the signal peptide-dependent pathway (171). After
assembly, the toxin is secreted from the periplasm into the extracellular environment. The latter step requires the gene products of the chromosomal Ptl operon that is located downstream from the pertussis toxin operon. Ptl consists of at least seven genes (ptlB-H) arranged in a single polycistronic operon. Comparison of the amino-acid sequences of the gene products of ptlB, C, D, E, F, G, and H revealed a striking similarity with gene products virB2, 3, 4, 5, 6, 8, 9, 10, and 11, respectively (Fig. 15) (153, 154). Consequently, ptl gene products are also sequence related to polypeptides encoded by IncP Tra2 core, IncN Tra, and IncW Pil, and to the pTi-encoded Tra region for interbacterial conjugative transfer (Fig. 15). Special features in the amino-acid sequences, such as protein export signals (cleavage sites for signal peptidase I, PtlF), nucleotide-binding motifs (PtlC, PtlH), and leucine zipper motifs (PtlB), are conserved throughout (25, 29). This and the fact that the genes in both Ptl and VirB are colinear demonstrate that the DNA transfer systems and the Ptl toxin secretion system have a common evolutionary origin. Moreover, it shows that the basic principles underlying conjugative DNA transfer can also be applied to protein secretion, and vice versa. Therefore, it is also impossible to decide what a common ancestral system might have been: a delivery system for protein or for DNA.
D. Other Systems
One component of the IncP Mpf system, the trbB gene product, is related not only to proteins involved in conjugative DNA transport but also to proteins from a wide variety of specialized protein export systems and from one DNA import system (7, 75). TrbB, which is sequence related to PtlH (Ptl, B. pertussis) (153), TrbB (pTiC58 Tra3) (151; S. K. Farrand, personal communication), TrwD (R388) (172), TraG (pKM101) (153), and VirB11 (152, 173-175), also displays significant similarity to the gene products XcpR (Pseudomonas aeruginosa) (176), ComG1 (Bacillus subtilis competence system for transformation by exogenous DNA) (177), PulE (Klebsiella pneumoniae pullulanase export system) (178), XpsE (Xanthomonas campestris) (179), OutE (Erwinia chrysanthemi) (180), ExeA (Aeromonas hydrophila) (181), and PilB and PilT (Pseudomonas aeruginosa pilus assembly systems) (182). All these proteins have in common three highly conserved regions, one of which is a nucleotide-binding motif of type A (56). The fact that TrbB-like proteins appear in all these types of transport systems makes it unlikely that TrbB is involved directly in the DNA transport reaction during conjugative transfer. A common feature of the systems from which these proteins originate is, however, that these consist of membrane-located multiprotein complexes, often associated with pilus-like structures. Thus, it is most likely
that the TrbB-like proteins are involved in the assembly of these membrane complexes, possibly functioning as chaperones preparing components of the complex for assembly.
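A search for the type A nucleotide-binding motif shared by the TrbB-like proteins can be sketched with a regular expression. The pattern used here, G-x(4)-G-K-[ST], is the commonly cited form of the type A (Walker A) motif and is an assumption of this sketch; the chapter does not spell out the consensus, and it notes that some TraG-like proteins match it only poorly. The function name and the example sequence are hypothetical.

```python
import re

# Commonly cited form of the type A (Walker A) motif: G-x(4)-G-K-[ST].
# Assumption of this sketch: the chapter itself does not give the pattern.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_type_a_motifs(protein):
    """Return (start, segment) for each non-overlapping match, 0-based start."""
    return [(m.start(), m.group()) for m in WALKER_A.finditer(protein.upper())]


if __name__ == "__main__":
    example = "MKLGPSGSGKSTLLA"  # invented one-letter amino-acid sequence
    print(find_type_a_motifs(example))
```

A strict pattern of this kind would miss degenerate motifs, so a real comparison would more likely rely on alignment-based scoring than on exact matching.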
VI. Conclusions and Perspectives
The mechanistic analysis of the process of bacterial conjugation revealed that the great majority of eubacterial DNA transfer systems studied so far rely on the same basic principle: during initiation of transfer DNA replication, the DNA molecule to be transferred is cleaved in a reversible site- and strand-specific reaction. The cleaving agent, the relaxase, forms a covalent intermediate with the DNA, attaching to the 5' terminus, and a DNA single strand is generated by rolling-circle-like replication, initiating at the strand interruption introduced by the relaxase. The DNA single strand is transferred in a 5' → 3' polar transmission process across the membranes of donor and recipient. Parallel to these functional analogies, sequence comparisons unveiled a phylogenetic relationship between most conjugative systems and also to some other macromolecular transport systems. Most notably, the agrobacterial T-DNA transfer to plants is now recognized to be a special conjugative process adapted to the requirements imposed by the plant acting as a recipient (7). Biochemical studies on relaxases reveal numerous details of the mechanism of the nicking-closing reaction that takes place at the transfer origin. However, a still-open question concerns the conversion of the relaxosome into a structure that can be used by a rolling-circle replication machinery to generate the single strand destined for transfer. This conversion takes place only on mating aggregate formation; thus, some sort of trigger signal must be sent to the relaxosome to initiate transfer DNA replication. The origin and nature of this signal remain to be determined. Generation of a DNA single strand, in general, requires the action of a DNA helicase, a type of enzyme that has not been discovered among the Tra proteins of IncP plasmids.
In those systems known to encode DNA helicases (e.g., IncF and IncW) (183, 184), functional relevance for DNA transfer is still to be demonstrated. Currently, two possibilities are favored: either a host-encoded helicase is engaged in transfer DNA replication, or a specialized enzyme that differs substantially from known DNA helicases in its primary structure and enzymology provides the strand-separating activity. The main enigma of bacterial conjugation, however, remains the transport of the DNA across the cell membranes of the donor and recipient cells. Our knowledge about the structure of the transport channel (Fig. 17) and the function of its components is still marginal. Biochemical studies of Mpf components are beginning now: most components of the IncP Mpf system have been overproduced and several have been purified. This will provide a basis for studying the structure of the individual components and for unraveling interactions between them and with relaxosomes.

FIG. 17. Model of the IncP-type transfer machinery.
ACKNOWLEDGMENTS
We thank our colleagues Gordon J. Archer, A. Marika Grahn, and Dennis H. Bamford, Fernando de la Cruz, Dominique Galli, and Donald Le Blanc, Stephen K. Farrand, Günther Koraimann, Russell J. DiGate, Ron A. Skurray, Christopher M. Thomas, and Brian M. Wilkins for providing manuscripts and data prior to publication. Work in our laboratory was supported by a grant from the Deutsche Forschungsgemeinschaft (SFB 344/B2) to E. L.
REFERENCES
1. J. Lederberg and E. L. Tatum, Nature 158, 558 (1946).
2. L. L. Cavalli, E. M. Lederberg and J. Lederberg, J. Gen. Microbiol. 8, 89 (1953).
3. W. Hayes, J. Gen. Microbiol. 8, 72 (1953).
4. R. C. Deonier, in “Escherichia coli and Salmonella typhimurium. Cellular and Molecular Biology” (F. C. Neidhardt, ed.), p. 982. American Society for Microbiology, Washington, DC, 1987.
5. F. Jacob and E. L. Wollman, Symp. Soc. Exp. Biol. 12, 75 (1958).
6. J. A. Heinemann and G. F. J. Sprague, Nature 340, 205 (1989).
7. M. Lessl and E. Lanka, Cell 77, 321 (1994).
8. S. E. Stachel and P. C. Zambryski, Cell 47, 155 (1986).
9. P. Zambryski, ARGen 22, 1 (1988).
10. T. Watanabe, Microbiol. Rev. 27, 87 (1963).
11. T. Watanabe and T. Fukasawa, BBRC 3, 660 (1960).
12. C. F. Amábile-Cuevas and M. E. Chicurel, Cell 70, 189 (1992).
13. P. Mazodier and J. Davies, ARGen 25, 147 (1991).
14. P. Courvalin, Antimicrob. Agents Chemother. 38, 1447 (1994).
15. E. J. L. Lowbury, A. Kidson, H. A. Lilly, G. A. Ayliffe and R. J. Jones, Lancet 2, 448 (1969).
16. P. Barth and N. J. Grinter, J. Bact. 120, 618 (1974).
17. P. Guerry, J. van Embden and S. Falkow, J. Bact. 117, 619 (1974).
18. D. G. Guiney and E. Lanka, in “Promiscuous Plasmids in Gram-negative Bacteria” (C. M. Thomas, ed.), p. 27. Academic Press, London, 1989.
19. B. M. Wilkins and E. Lanka, in “Bacterial Conjugation” (D. B. Clewell, ed.), p. 105. Plenum, New York, 1993.
20. E. Lanka and B. M. Wilkins, ARB 64, 141 (1995).
21. N. Firth, K. Ippen-Ihler and R. A. Skurray, in “Escherichia coli and Salmonella typhimurium. Cellular and Molecular Biology” (F. C. Neidhardt, ed.). American Society for Microbiology, Washington, DC, 1986 (in press).
22. L. S. Frost, K. Ippen-Ihler and R. A. Skurray, Microbiol. Rev. 58, 162 (1994).
23. D. B. Clewell, “Bacterial Conjugation.” Plenum, New York, 1993.
24. N. Willetts and B. M. Wilkins, Microbiol. Rev. 48, 24 (1984).
25. W. Pansegrau, E. Lanka, P. T. Barth, D. H. Figurski, D. G. Guiney, D. Haas, D. R. Helinski, H. Schwab, V. A. Stanisich and C. M. Thomas, JMB 239, 623 (1994).
26. G. Ziegelin, W. Pansegrau, B. Strack, D. Balzer, M. Kröger, V. Kruft and E. Lanka, DNA Seq. J. 1, 303 (1991).
27. W. Pansegrau and E. Lanka, NARes 15, 2385 (1987).
28. L. Miele, B. Strack, V. Kruft and E. Lanka, DNA Seq. J. 2, 145 (1991).
29. M. Lessl, D. Balzer, K. Weyrauch and E. Lanka, J. Bact. 175, 6415 (1993).
30. V. L. Waters, B. Strack, W. Pansegrau, E. Lanka and D. G. Guiney, J. Bact. 174, 6666 (1992).
31. D. Balzer, W. Pansegrau and E. Lanka, J. Bact. 176, 4285 (1994).
32. M. Lessl, D. Balzer, R. Lurz, V. L. Waters, D. G. Guiney and E. Lanka, J. Bact. 174, 2493 (1992).
33. G. Jagura-Burdzy, F. Khanim, C. A. Smith and C. M. Thomas, NARes 20, 3939 (1992).
34. V. J. Thomson, O. S. Jovanovic, R. F. Pohlman, C. H. Chang and D. H. Figurski, J. Bact. 175, 2423 (1993).
35. D. G. Guiney and E. Yakobson, PNAS 80, 3595 (1983).
248
WERNER PANSEGRAU AND ERICH LANKA
36. J. P. Fiirste, W. Pansegrau, G. Ziegelin, M. Kroger and E. Lanka, PNAS 86, 1771 (1989). 37. E. Lanka, R. Lurz, M. Kroger and J. P. Fiirste, MGG 194, 65 (1984). 38. S. P. Cole, E. Lanka and D. G. Guiney, J. Bact. 175, 4911 (1993). 39. J. A. Kornacki, P. J. Balderes and D. H. Figurski, J M B 198, 211 (1987). 40. D. Balzer, G. Ziegelin, W. Pansegrau, V. Kruft and E. Lanka, NARes 20, 1851 (1992). 41. D. R. Williams, M. Motallebi-Veshareh and C. M. Thomas, NARes 21, 1141 (1993). 42. M. Zatyka, G. Jagura-Burdzy and C. M. Thomas, Microbiology 140, 2981 (1994). 43. 6. Jagura-Burdzy and C. M. Thomas, PNAS 91, 10571 (1994). 44. D. Balzer, Ph. D. Dissertation, Freie Universitat Berlin (1993). 45. B. D. Theophilus and C. M. Thomas, NARes 15, 7443 (1987). 46. M. Lessl, D. Balzer, W. Pansegrau and E. Lanka, JBC 267, 20471 (1992). 47. G. Ziegelin, W. Pansegrau, R. Lurz and E. Lanka, JBC 267, 17279 (1992). 48. V. L. Waters, K. H. Hirata, W. Pansegrau, E. Lanka and D. G. Guiney, PNAS 88, 1456 (1991).
49. G. Ziegelin, J. P. Fiirste and E. Lanka, JBC 264, 11989 (1989). 50. J. Haase, R. Lurz, A. M. Grahn, D. H. Bamford and E. Lanka,J. Bact. 177, 4779 (1995). 51. W. Pansegrau, Ph. D. Dissertation, Freie Universitat Berlin (1991). 52. W. Pansegrau, G. Ziegelin and E. Lanka, JBC 265, 10637 (1990). 53. H. Hiasa and K. J. Marians, JBC 269, 32655 (1994). 54. N. S. Willetts and C. Crowther, Genet. Res. 37, 311 (1981). 55. D. E. Bradley and T. Chaudhuri, in “Plasmids and Transposons. Environmental Effects and Maintenance Mechanisms” (C. Stuttard and K. Rozee, eds.), p. 335. Academic Press, New York, 1980. 56. E. Walker, M. Saraste, M . J. Runswick and N. J. Gay, EMBO J. 1, 945 (1982). 57. M. Lessl, W. Pansegrau and E. Lanka, NARes 20, 6099 (1992). 58. N. Willetts, MGG 180, 213 (1980). 59. M. M . Panicker and E. G . Minkley, Jr., JBC 267, 12761 (1992). 60. E. Cabezh, E. Lanka and F. de la Cruz, J. B a t . 176, 4455 (1994). 61. S . Okamoto, A. Toyoda Yamamoto, K. Ito, I. Takebe and Y. Machida, MGG 228,24 (1991). 62. A. Beijersbergen, A. Den Dulk-Ras, R. A. Schilperoort and P. J. J. Hooykaas, Science 256, 1324 (1992).
63. P. T. Barth, M. J. Grinter and D. E. Bradley, J . Bact. 133, 43 (1978). 64. M. Lessl, V. Krishnapillai and W. Schilf, MGG 227, 120 (1991). 65. D. Lyras, A. W. Chan, J. McFarlane and V. A. Stanisich, Plasmid 32, 254 (1994). 66. R. F. Pohlman, H. D. Genetti and S. C. Winans, Plasmid 31, 158 (1994). 67. S. Sukupolvi and C. D. O’Connor, Microbiol. Reu. 54, 331 (1990). 68. N. B. Perumal and J. E. G. Minkley, JBC 259, 5357 (1984). 69. Deleted in proof. 70. K. Shirasu, Z. Koukolfkovi-Nicola, B. Hohn and C. I. Kado, Mol. Microbiol. 11, 581 (1994).
71. P. J. Christie, J. E. Ward, Jr., M. P. Gordon and E. W. Nester, PNAS 86, 9677 (1989). 72. S. T. Fong and V. A. Stanisich, J . Gen. Microbiol. 135, 499 (1989). 73. E. Lanka and J. P. Fiirste, in “Proteins Involved in DNA Replication” (U. Hiibscher and S. Spadari, eds.), p. 265. Plenum, New York and London, 1984. 74. W. Pansegrau, D. Balzer, V. Kruft, R. Lurz and E. Lanka, PNAS 87, 6555 (1990). 75. M. Motallebi-Veshareh, D. Balzer, E. Lanka, G. Jagura-Burdzy and C. M. Thomas, Mol.
Microbiol. 6, 907 (1992).
76. Y. R. Thorstenson, G. A. Kuldau and P. C. Zambryski, J. B a t . 175, 5233 (1993). 77. P. J. Christie, J. E. Ward, Jr., M. P. Gordon and E. W. Nester, PNAS 86, 9677 (1989). 78. J. P. Fiirste, G. Ziegelin, W. Pansegrau and E. Lanka, in “Mechanisms of DNA Replica-
BACTERIAL CONJUGATION
249
tion and Recombination” (T. Kelly and R. McMacken, eds.), p. 553. Alan R. Liss, New York, 1987. 79. M. Bhattacharjee, X. M. Rao and R. J. Meyer, J. Bact. 174, 6659 (1992). 80. W. Pansegrau, G. Ziegelin and E. Lanka, BBA 951, 365 (1988). 81. W. Pansegrau, W. Schroder and E. Lanka, PNAS 90, 2925 (1993). 82. S. W. Matson, W. C. Nelson and B. S. Morton, J . B u t . 175, 2599 (1993). 83. P. Scheiffele, W. Pansegrau and E. Lanka, JBC 270, 1269 (1995). 84. W. Pansegrau, F. Schoumacher, B. Hohn and E. Lanka, PNAS 90, 11538 (1993). 85. S. W. Matson and B. S. Morton, JBC 266, 16232 (1991). 86. U . Reygers, R. Wessel, H. Muller and H. Hoffmann-Berling, EMBO J . 10, 2689 (1991). 87. J. A. Sherman and S. W. Matson, J B C 269, 26220 (1994). 88. S. Inamoto, H. Fukuda, T. Abo and E. Ohtsubo, J . Biochem. 116, 838 (1994). 89. E. Scherzinger, R. Lurz, S. Otto and B. Dobrinski, NARCS20, 41 (1992). 90. E. Scherzinger, V. Kruft and S. Otto, EJB 217, 929 (1993). 91. A. Kingsman and N. Willetts, JMB 122, 287 (1978). 92. A. D. M. van Mansfeld, H. A. A. M. vanTeeffelen, P. D. BaasandH. S. Jansz, NARCS14, 4229 (1986). 93. R. Hanai and J. C. Wang, JBC 268, 23830 (1993). 94. A. Rasooly, P. Z. Wang and R. P. Novick, E M B O J . 13, 5245 (1994). 95. Q . Gao, Y. Luo and R. C. Deonier, Mol. Microbiol. 11, 449 (1994). 96. W. Pansegrau, W. Schroder and E. Lanka, JBC 269, 2782 (1994). 97. T. V. Ilyina and E. V. Koonin, NARes 20, 3279 (1992). 98. E. V. Koonin and T. V. Ilyina, BioSystem 30, 241 (1993). 99. J. Lee, M.-C. Serre, S.-H. Yang, I. Whang, H. Araki, Y. Oshima and M. Jayaram, J M B 228, 1091 (1992). 100. D. G. Guiney, C. Deiss, V. Simnad, L. Yee, W. PansegrauandE. Lanka,J. Bact. 171, 100 (1989). 101. E. Yakobson and D. G. Guiney, MGG 192, 436 (1983). 102. S. S. Ner, A. A. Travers and M. E. Churchill, Trends Biochem. Sci. 19, 185 (1994). 103. D. B. Clewell and D. R. Helinski, PNAS 62, 1159 (1969). 104. S. Inamoto, Y. Yoshioka and E. Ohtsubo, JBC 266, 10086 (1991). 105. N. Furuya and T. 
Komano, J . Bact. 173, 6612 (1991). 106. P. Mazodier, R. Petter and C. Thompson, J . Bact. 171, 3583 (1989). 107. C. G. Murphy and M. H. Malamy, J. Bact. 177, 3158 (1995). 108. M. Llosa, 6 . Grandoso and F. d e la Cruz, J M B 246, 54 (1995). 109. F. Diirrenberger, A. Crameri, 8. Hohn and Z. Koukolikova-Nicola, PNAS 86, 9154 (1989). 110. A. Kornberg and T. A. Baker, in “DNA Replication,” p. 275. Freeman, New York, 1992. 111. E. Lanka, E. Scherzinger, E. Gunther and H. Schuster, PNAS 76, 3632 (1979). 112. B. M. Wilkins, C . J. Boulnois and E. Lanka, Nature 290, 217 (1981). 113. E. Lanka and P. T. Barth, J. Bact. 148, 769 (1981). 114. B. Strack, M. Lessl, R. Calendar and E. Lanka, JBC 267, 13062 (1992). 115. W. Pansegrau and E. Lanka, NARes 20, 4931 (1992). 116. G. Ziegelin, E. Scherzinger, R. Lurz and E. Lanka, E M B O J .12, 3703 (1993). 117. 6. Ziegelin, N. A. Linderoth, R. Calendar and E. Lanka, J . Bact. 177, 4333 (1995). 118. A. Merryweather, C. E. Rees, N. M. Smith and 8 . M. Wilkins, EMBOJ. 5, 3007 (1986). 119. C. E. Rees and B. M. Wilkins, Mol. Microbiol. 4, 1199 (1990). 120. L. S. Frost, in “Bacterial Conjugation” (D. B. Clewell, ed.), p. 189. Plenum, New York, 1993. 121. D. B. Clewell, Cell 73, 9 (1993).
250
WERNER PANSEGRAU AND ERICH LANKA
122. J. Rohrer and D. E. Rawlings, J. B a t . 174, 6230 (1992). 123. T. Komano, A. Toyoshima, K. Morita and T. Nishioka, J. Bact. 170, 4385 (1988). 124. P. Zambryski, A. Depicker, K. Kruger and H. M. Goodman, J. Mol. Appl. Genet. 1,361 (1982). 125. P. M. Cannon and P. Strike, P l m i d 27, 220 (1992). 126. R. Thompsom, L. Taylor, K. Kelly, R. Everett and N. Willetts, EMBOJ. 3, 1175 (1984). 127. A. Goldner, H. Graus and G. Hogenauer, Plasmid 18, 76 (1987). 128. S. A. McIntire and W. B. Dempsey, J. Bact. 169, 3829 (1987). 129. G. M. Coupland, A. M. C. Brown and N. S. Willetts, MGG 208, 219 (1987). 130. M. Llosa, S. Bolland and F. de la Cruz, MGG 226, 473 (1991). 131. P. Scholz, V. Haring, B. Wittmann-Liebold, K. Ashman, M. Bagdasarian and E. Scherzinger, Gene 75, 271 (1989). 132. M. A. Brasch and R. J. Meyer, J . B a t . 167, 703 (1986). 133. M. Drolet, P. Zanga and P. C. Lau, Mol. Microbiol. 4, 1381 (1990). 134. D. M. Cook and S. K. Farrand, J, B a t . 174, 6238 (1992). 135. A. Bernardi and F. Bernardi, NARes 12, 9415 (1984). 136. A. Wang and F. L. Macrina, J. Bact. 177, 4199 (1995). 137. D. Bastia, J M B 124, 601 (1978). 138. J. Morlon, M. Chartier, M. Bidaud and C. Lazdunski, MGG 211, 231 (1988). 139. W. Pansegrau and E. Lanka, NARes 19, 3455 (1991). 140. V. L. Waters and D. G. Guiney, Mol. Microbiol. 9, 1123 (1993). 141. S. Hironouchi and B. Weisblum, J. B a t . 150, 815 (1982). 142. T. McKenzie, T. Hoshino, T. Tanaka and N. Sueoka, Plasmid 15, 93 (1986). 143. F. Sanger, G. M. Air, B. 6. Barrell, N . L. Brown, A. R. Coulson, J. C. Fiddes, C. A. Hutchison, P. M. Slocombe and M. Smith, Nature 265, 687 (1977). 144. J, Sinis, D. Capon and D. Dressler, J S C 254, 12615 (1979). 145. G. N. Godson, B. G. Barrell, R. Staden and J. C. Fiddes, Nature 276, 236 (1978). 146. F. Heidekamp, P. D. Baas and H. S. Jansz, J. Virol. 42, 91 (1982). 147. A. Gielow, L. Diederich and W. Messer, J. B a t . 173, 73 (1991). 148. M. F. Yanofsky, S. 6. Porter, C. Young, L. M. Albright, M. P. 
Gordon and E. W. Nester, Cell 47, 471 (1986). 149. M. F. Yanofsky and E. W. Nester, J. B a t . 168, 244 (1986). 150. E. V. Koonin and T. V. Ilyina, J. Gen. Virol. 73, 2763 (1992). 151. I. Hwang, P. L. Li, L. Zhang, K. R. Piper, D. M. Cook, M. E. Tate and S. K. Farrand, PNAS 91, 4639 (1994). 152. J. E. Ward, D. E. Akiyoshi, D. Regier, A. Datta, M. P. Gordon and E. W. Nester,JBC 263, 5804 (1988). 153. R. F. Pohlman, H. D. Genetti and S. C. Winans, Mol. Microbiol. 14, 655 (1994). 154. A. A. Weiss, F. D. Johnson and D. L. Burns, PNAS 90, 2870 (1993). 155. C. G. Murphy and M. H. Malamy, J. Bact. 175, 5814 (1993). 156. L.-Y. Li, N. B. Shoemaker, (2.-R. Wang, S. P. Cole, M. K. Hashimoto, J. Wang and A. A. Salyers, J. Bact. 177, 3940 (1995). 157. N. Firth, K. P. Ridgway, M. E. Byrne, P. D. Fink, L. Johnson, I. T. Paulsen and R. A. Skurray, Gene 136, 13 (1993). 158. T. M. Morton, D. M. Eaton, J. L. Johnston and G. L. Archer, J. Bact. 175, 4436 (1993). 159. M. Bayer, R. Eferl, G. Zellnig, A. Teferle, A. Dijkstra, G. Koraimann and G. Hogenauer, J. Bact. 177, 4279 (1995). 160. P. C. Zambryski, Annu. Reu. Plant Physiol. Plant Mol. B i d . 43, 465 (1992). 161. V Citovsky, M. L. Wong and P. Zambryski, PNAS 86, 1193 (1989). 162. V. Citovsky, D. Warnick and P. Zambryski, PNAS 91, 3210 (1994).
BACTERIAL CONJUGATION
163. 164. 165. 166. 167. 168. 169. 170. 171. 172. 173. 174. 175. 176. 177. 178. 179. 180. 181. 182. 183. 184.
251
C. Young and E. W. Nester, ]. B a t . 170, 3367 (1988). E. A. Howard, B. A. Winsor, G. De Vos and P. Zambryski, PNAS 86, 4017 (1989). B. Tinland, B. Hohn and H. Puchta, PNAS 91, 8000 (1994). A. M. Vogel and A. Das, ]. Bact. 174, 303 (1992). F. Jasper, C. Koncz, J. Schell and H.-H. Steinbiss, PNAS 91, 694 (1994). B. Tinland, F. Schoumacher, V. Gloeckler, A. M. Bravo-Angel and B. Hohn, EMBO]. 14, 3585 (1995). N . Toro, A. Datta, 0. A. Carmi, C . Young, R. K. Prusti and E. W. Nester, j . Bact. 171, 6845 (1989). K. Shirasu and C. I. Kado, FEMS Microbiol. Lett. 111, 287 (1993). A. Covacci and R. Rappuoli, Mol. Microbiol. 8, 429 (1993). S. Bolland, M. Llosa, P. Avila and F. de la Cruz, ]. B u t . 172, 5795 (1990). D. V. Thompson, L. S. Melchers, K. B. Idler, R. A. Schilperoort and P. J. Hooykaas, NARes 16, 4621 (1988). G. A. Kuldau, G . De Vos, J. Owen, G. McCafkey and P. Zambryski, MGG 221, 256 (1990). K. Shirasu, P. Morel and C. I. Kado, MoZ. Microbiol. 4, 1153 (1990). M. Bally, A. Filloux, M. Akriin, C . Ball, A. Lazdunskiand J. Tommassen, Mol. Microbiol. 6, 1121 (1992). M. Albano, R. Breitling and D. A. Dubnau, ]. B a t . 172, 5386 (1989). 0. Possot, C. dEnfert, I. Reyss and A. P. Pugsley, Mol. Microbiol. 6, 95 (1992). F. Dums, J. M. Dow and M . J. Daniels, MGG 229, 357 (1991). M. Lindherg and A. Collmer, J . B u t . 174, 7385 (1992). B. Jiang and S. P. Howard, Mol. Microbiol. 6, 1351 (1992). D. Nunn, S. Bergman and S. Lory,]. Bmt. 172, 2911 (1990). M. Abdel-Monem, G . Taucher-Scholz and M. Q. Klinkert, PNAS 80, 4659 (1983). G. Grandoso, M. Llosa, J. C. Zabala and F. de la Cruz, E]B 226, 403 (1994).
recA-independent DNA Recombination between Repetitive Sequences: Mechanisms and Implications¹

XIN BI² AND LEROY F. LIU³

Department of Pharmacology
UMDNJ-Robert Wood Johnson Medical School
Piscataway, New Jersey 08854
I. recA-independent DNA Recombination between Direct Repeats
A. Studies of DNA Recombination in Escherichia coli Using Plasmid Substrates
B. Recombination between Tandem Direct Repeats of DNA Is recA-independent
C. recA-independent Recombination between Direct Repeats of DNA Is Reduced by Increasing the Distance between the Repeats
D. recA-independent Recombination between Direct Repeats of DNA Yields Multiple Forms of Products by an Intramolecular Mechanism(s)
E. Structural Factors That Can Influence the Formation of Various Products of recA-independent Recombination
F. Models for recA-independent Recombination between Direct Repeats
G. ... in Cis at a Distance
H. Short Direct Repeats May Mediate Genome Instability
I. Sister Chromatid Exchange May Be Mediated by Recombination between Direct Repeats at the Replication Fork
II. recA-independent DNA Recombination between Inverted Repeats
A. Possible Outcomes of Recombination between Inverted Repeats of DNA
B. Inversions in the Chromosomes of Bacteria and Phages
C. recA-independent Recombination between Plasmid-borne Inverted Repeats
D. The Major Product of recA-independent Recombination between Plasmid-borne Inverted Repeats Is an Unusual Head-to-Head Dimer
E. The Reciprocal-strand-switching Model for recA-independent Recombination between Inverted Repeats of DNA
F. Implications of the RSS Model in Genome Rearrangement and Gene Amplification
References

1 Abbreviations: CS, chromosomal spiral; Ap, ampicillin; Tc, tetracycline; SCE, sister chromatid exchange; RSS, reciprocal strand switching; EDRC, extrachromosomal double rolling circle.
2 Current address: Department of Molecular Biology, Princeton University, Princeton, NJ 08544.
3 To whom correspondence may be addressed.
Three basic types of genetic recombination, general, site-specific, and illegitimate, have been defined (reviewed in 1). In Escherichia coli, general (or homologous) recombination typically involves extensive stretches of sequence homology and is dependent on the RecA protein and other recombination functions. Site-specific recombination occurs at highly preferred sites and requires special proteins other than RecA and other activities involved in general recombination. Illegitimate recombination requires little or no homology, requires none of the activities promoting general or site-specific recombination, and usually occurs with a relatively low frequency. However, the distinction between general and illegitimate recombination has been blurred by recent findings that recombination between repetitive sequences in close proximity is very efficient but independent of RecA even when the homology is substantial (2-4). In this review, we discuss distinct features of recA-independent recombination between repetitive sequences in bacteria with special emphasis on replicational models and their implications in related processes, such as genome rearrangement and gene amplification in eukaryotic cells.
I. recA-independent DNA Recombination between Direct Repeats
Intramolecular reciprocal exchange between direct repeats can lead to deletion of one of the repeats and any intervening sequence (Fig. 1A), whereas intermolecular unequal crossover can result in deletion in one of the products and addition in the other (Fig. 1B). These processes can be accomplished by RecA-mediated general homologous recombination (5, 6). It has been shown that efficient recA-independent recombination can also occur at direct repeats, resulting in deletion and addition, which are sometimes accompanied by other rearrangement(s) (2, 3). recA-independent recombination is most efficient when the direct repeats are in close proximity (3, 7, 8), and it can form products different from that of recA-dependent homologous recombination (2, 3, 9-12). It has been proposed that DNA replication is involved in recA-independent deletion and addition between direct repeats (2, 3, 7, 11).

FIG. 1. Recombination between direct repeats of DNA can lead to deletion and addition. The two direct repeats in the substrate molecule are shown as open and filled arrows, respectively. (A) Intramolecular reciprocal exchange between the direct repeats leads to deletion (looping out) of one of the repeats plus the intervening sequence. (B) Intermolecular unequal crossover between direct repeats of two substrate molecules leads to deletion in one product and addition in the other.
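The two outcomes described for Fig. 1 can be illustrated with a minimal Python sketch. It is not taken from the studies cited here: it treats a DNA molecule as a linear character string (plasmids are circular), it models only the final products and not any mechanism, and the function names and the toy sequence are hypothetical.

```python
def _first_two_copies(seq: str, repeat: str) -> tuple[int, int]:
    """Return the start positions of the first two direct copies of `repeat`."""
    first = seq.find(repeat)
    if first == -1:
        raise ValueError("repeat not found in sequence")
    second = seq.find(repeat, first + len(repeat))
    if second == -1:
        raise ValueError("sequence must contain two direct copies of the repeat")
    return first, second


def intramolecular_deletion(seq: str, repeat: str) -> str:
    """Product of a reciprocal exchange between two direct repeats in one
    molecule: one repeat copy plus the intervening sequence is lost."""
    first, second = _first_two_copies(seq, repeat)
    # Keep everything before the first copy, then resume at the second copy.
    return seq[:first] + seq[second:]


def unequal_crossover(seq: str, repeat: str) -> tuple[str, str]:
    """Products of an unequal crossover between two identical molecules:
    (deletion product, addition product)."""
    first, second = _first_two_copies(seq, repeat)
    deletion = seq[:first] + seq[second:]   # one repeat copy remains
    addition = seq[:second] + seq[first:]   # three repeat copies
    return deletion, addition


if __name__ == "__main__":
    # Hypothetical substrate: flank "ab", repeat "GATC", spacer "xyz", flank "cd".
    substrate = "abGATCxyzGATCcd"
    print(intramolecular_deletion(substrate, "GATC"))  # abGATCcd
    print(unequal_crossover(substrate, "GATC"))
    # ('abGATCcd', 'abGATCxyzGATCxyzGATCcd')
```

In this sketch the deletion and addition products of the unequal crossover together contain exactly the material of the two substrate molecules, which is the sense in which the exchange is reciprocal.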
A. Studies of DNA Recombination in Escherichia coli, Using Plasmid Substrates
Plasmids provide a convenient system for studying DNA recombination because their small genomes are amenable to detailed physical analysis. Various plasmid systems have been used to examine both inter- and intramolecular recombination.
1. Intermolecular recombination has been studied by examining the formation of multimeric plasmids from monomers (13-16), as well as recombination between limited homologies borne in two different plasmids (13, 17, 18). In the latter case, recombination was usually designed to restore an intact marker gene (e.g., tetA, for tetracycline resistance) that can be positively selected.
2. Intramolecular recombination has been studied by examining the conversion of oligomeric plasmids to lower order ones (13, 14, 16, 19), as well as recombination between plasmid-borne repeated sequences, which results in regeneration of a marker (15, 17, 18, 20). Dimeric plasmids carrying two tetA genes each containing a mutation at a different site have been commonly used as substrates of recombination.

The above systems have been used to examine if plasmid recombination requires host functions involved in general recombination pathways. Generally speaking, plasmid recombination depends on RecA (13-17, 19), RecF (16, 18, 20) with one exception (13), but not RecBCD (13-17). In recBC strains, sbcA mutations that activate the RecE recombination pathway (reviewed in 5, 6, 21, 22) induce plasmid recombination (15-17, 20). Except for recombination between homologies carried by two different plasmids (13), various types of plasmid recombination by the RecE pathway are independent of RecA (15-17, 20). At least intramolecular plasmid recombination is stimulated in recBC strains with sbcBC mutations (18, 20), which activate the RecF pathway of recombination (reviewed in 5, 6, 21, 22).

It was surprising that plasmid recombination by the RecE pathway (in recBC sbcA strains) is independent of RecA, whose essential role in general recombination has been well established (reviewed in 23-26). However, this became less of a surprise when RecT, a RecA-like protein, was shown to be induced by sbcA mutations in recBC strains (27, 28; reviewed in 29).
Therefore, results from studies using the above plasmid systems indicate that plasmid recombination is in general dependent on the function of RecA (or its homologs), though there is variation in the degree of dependence from one type of substrate to another. This held true until recently, when recombination between plasmid-borne tandem repetitive sequences was shown to be recA independent (2-4). We focus the following discussions on the features of recA-independent plasmid recombination.
B. Recombination between Tandem Direct Repeats of DNA Is recA-independent
As discussed above, recombination of dimeric plasmids carrying two tetA genes each containing a mutation at a different site is recA dependent (17, 18, 20, 30). In these substrates, the homologies involved in recombination are separated by relatively long intervening sequences (>3 kb). Recently, intramolecular recombination has been studied by examining deletion between tandem direct repeats within the tetA gene in pBR322 derivatives (Fig. 2). Surprisingly, these tandem direct repeats mediate efficient recA-independent recombination (2, 3) whose homology requirement is very limited (3). As shown in Fig. 3, recombination between tandem repeats (S1) increases sharply when the repeats are lengthened from 14 bp up to 100 bp, with virtually the same frequency in recA- and recA+ cells. Increasing the length of the repeats beyond 100-300 bp gradually induces recombination in recA+ cells without significantly affecting recombination in recA- cells. Therefore, it appears that recombination has a limited dependence on RecA when the repeats are large (>300 bp) (e.g., recombination in recA+ cells is about three- to fivefold more efficient than that in recA- cells when the repeats are ~1 kb long). It is noteworthy that efficient recA-independent recombination can occur between tandem repeats of as short as 14 bp.

FIG. 2. Structure of pBR322-based plasmid substrates for recombination between direct repeats. The coordinates are those of pBR322. Open arrow, the open reading frame of the tetA gene; filled arrows, direct repeats within the tetA gene; thick line, intervening sequence between the direct repeats. See text for description of the D-7 region and ori. The plasmid substrate contains the bla gene for ampicillin (Ap) resistance (ApR), but the tetA gene is disrupted so that the host would be Ap resistant and tetracycline (Tc) sensitive (ApRTcS). Deletion between the direct repeats would regenerate the tetA gene and the host would be ApRTcR.

FIG. 3. recA-independent recombination and recA-dependent recombination between direct repeats are differentially affected by the length of the repeat and the distance between the repeats. Direct repeats in a substrate are shown as open arrows. S1 represents a series of pBR322-based plasmids with tandem direct repeats of various lengths (x) within the tetA gene (see Fig. 2). The vertical bars indicate any pair of homologous segments within the repeats. The distance between them is the length of the repeat (x). S2 represents a series of plasmids derived from S1 plasmids by inserting a 3872-bp sequence between the direct repeats (see Fig. 2). The frequency of recombination (in logarithm) of the S1 and S2 series of plasmids is plotted as a function of the length of the direct repeat (x, in bp). Thin lines, recombination in recA+ cells; thick lines, recombination in recA- cells. This figure is a summary of some of the results described (3).

RecA plays a central role in general homologous recombination both as a structural protein and as a reaction catalyst (reviewed in 23-26). It promotes homologous pairing of DNA molecules and catalyzes strand-exchange reactions leading to the formation of heteroduplex DNA in vitro (reviewed in 23-26). It is not surprising that (illegitimate or nonhomologous) recombination, which requires little or no homology, is independent of RecA (reviewed in 31). However, efficient recA-independent recombination between substantial homologies (up to 1 kb) is unexpected. Moreover, recombination between tandem direct repeats is also independent of other functions that are important for general recombination, including RecBCD and RecF (2).
C. recA-independent Recombination between Direct Repeats of DNA Is Reduced by Increasing the Distance between the Repeats
One important feature about recA-independent recombination between direct repeats is that it is affected by the distance separating the repeats (3, 7, 8). In a recA- strain, recombination between direct repeats of various lengths (from 14 to 606 bp) is sharply reduced (to less than 2%) by inserting a 3872-bp-long sequence between the repeats (3) (Fig. 3; compare the thick curves). Shorter insertions exerted a lesser effect (3). This strongly indicates that recA-independent recombination between direct repeats, long or short, is dependent on the distance between the repeats. In a recA+ strain, however, no such distance effect was observed when the repeats were larger than 300 bp, but increasingly greater distance effect was observed as the repeats were shortened (3) (Fig. 3; compare the thin curves). This is probably because recombination between direct repeats in a recA+ strain consists of two components, one recA independent and the other mediated by RecA. When the repeats are long (>300 bp), RecA-mediated recombination is at least as efficient as, if not predominant over, recA-independent recombination and is insensitive to the distance. When the repeats are short (<300 bp), however, recA-independent recombination, which is proximity-sensitive, becomes predominant over RecA-mediated recombination. Therefore, the overall frequency is reduced by increasing the distance between the repeats. These observations are supported by later results (7) showing that recombination between 100-bp direct repeats decreases to less than 1% when the repeats are separated by up to 7-kb sequences in both recA+ and recA- strains. The distance effect for recombination has also been observed in Bacillus subtilis (8). In summary, it is clear that increasing the distance between the homologies reduces recA-independent recombination without significantly affecting RecA-mediated recombination. This suggests that recA-independent recombination involves mechanism(s) different from that for general recombination.

The discrepancy concerning recA dependence between the above results and results from earlier studies of intramolecular recombination (discussed in Section I,A) can be explained as follows. Because recA-independent but not recA-dependent recombination is reduced by lengthening the distance between the repeats, recombination between tandem direct repeats appears recA independent whereas that between direct repeats separated by a large distance appears recA dependent (Fig. 3). Note that it is actually the distance between each pair of homologous segments within the repeats that is important in affecting recA-independent recombination. Within the tandem direct repeats, each pair of homologous segments is separated by a distance of the length of the repeat (Fig. 3, S1). When the repeat gets longer, the distance between each pair of homologous segments gets larger. Lengthening the repeats is likely to cause the following counteracting effects on recA-independent recombination. On the one hand, it can increase recombination because it increases the choices of homologous segments that can participate in recombination; on the other hand, it can decrease recombination between each pair of homologous segments because the distance between them is larger. The fate of the overall frequency is therefore determined by the relative strength of these two counteracting effects. (a) When the repeats are short, the increasing effect is probably predominant and therefore lengthening the repeats increases recA-independent recombination. (b) When the repeats are moderate, the two effects may be balanced so that the overall frequency of recombination may appear unchanged. (c) When the repeats are long, the decreasing effect is likely to be predominant so that the overall frequency of recombination may be decreased as the repeats get longer and longer. Our results (Fig. 3) are consistent with (a) and (b) (3). The discussion in (c) may explain earlier results that conversion of oligomeric plasmids to lower order ones is recA dependent (13, 14, 16, 19). An oligomeric plasmid consists of tandemly repeated monomers and its conversion to lower order ones is the result of recombination between the tandem monomers. In the early studies (13, 14, 16, 19), the sizes of the monomers were large (>4 kb), and therefore the overall recombination appeared recA dependent.
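The two counteracting effects of repeat length can be made concrete with a toy calculation. The Python sketch below is purely illustrative and is not a model from the work reviewed here. It assumes that the number of candidate homologous segment pairs grows linearly with repeat length, that the contribution of each pair falls off exponentially with the distance between the segments, and that the parameters `min_segment` and `decay_scale` have arbitrary values. None of these assumptions is fitted to the data of Fig. 3.

```python
import math


def relative_frequency(repeat_len: int, spacer_len: int = 0,
                       min_segment: int = 14,
                       decay_scale: float = 300.0) -> float:
    """Toy score for recA-independent recombination between direct repeats.

    Assumptions (hypothetical, for illustration only):
    - a homologous segment must be at least `min_segment` bp long;
    - the number of segment pairs grows linearly with the repeat length;
    - each pair contributes exp(-distance / decay_scale), where the distance
      between paired segments is the repeat length plus any spacer.
    """
    if repeat_len < min_segment:
        return 0.0
    n_pairs = repeat_len - min_segment + 1        # more choices of segments
    distance = repeat_len + spacer_len            # pairs lie farther apart
    return n_pairs * math.exp(-distance / decay_scale)


if __name__ == "__main__":
    for length in (14, 50, 100, 300, 1000, 4000):
        tandem = relative_frequency(length)
        spaced = relative_frequency(length, spacer_len=3872)
        print(f"{length:5d} bp  tandem={tandem:.3g}  with 3872-bp spacer={spaced:.3g}")
```

Under these assumptions the score rises for short repeats, flattens for intermediate ones, and falls for long ones, which mirrors cases (a) to (c) qualitatively. The absolute numbers carry no meaning.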
D. recA-independent Recombination between Direct Repeats of DNA Yields Multiple Forms of Products by an Intramolecular Mechanism(s)

1. THREE BASIC FORMS OF PRODUCTS OF recA-INDEPENDENT RECOMBINATION BETWEEN PLASMID-BORNE DIRECT REPEATS

Intramolecular deletion of plasmid-borne direct repeats is predicted to generate a monomeric product with one of the repeats plus any intervening sequence deleted (see M in Fig. 4). Indeed, M has been found to be a major product of recA-independent recombination of plasmid substrates in several studies (2, 3, 12). However, besides M, dimeric forms of plasmids have also been observed as the result of recA-independent recombination (2, 3, 9-12). In some cases, surprisingly, dimers are the only products of recombination (10-12). Two special dimeric forms named 1+2 and 1+3, respectively (10-12; Fig. 4), have been observed. Form 1+2 is a head-to-tail type of dimer consisting of a monomeric substrate and a monomeric product (M), whereas 1+3 is structurally identical to the product of an intermolecular unequal crossover between the direct repeats. The products M, 1+2, and 1+3 may be formed by different mechanisms or through a common pathway.

FIG. 4. Structures of the products of recA-independent recombination between plasmid-borne direct repeats. Filled arrows, direct repeats; thick line, the intervening sequence between the repeats. The open arrow indicates the orientation of the sequence of the plasmid outside of the direct repeats and the intervening sequence; M, the monomeric product of intramolecular deletion; 1+2 and 1+3, the dimeric products, each with deletion and other rearrangements.

2. recA-INDEPENDENT RECOMBINATION LEADING TO FORMATION OF THE DIMERIC PRODUCTS IS INTRAMOLECULAR

The dimeric nature of the products 1+2 and 1+3 may indicate that their formation involves intermolecular recombination. In theory, 1+3 can be formed by an unequal crossover between the direct repeats of two substrate plasmids, whereas 1+2 can be formed through recombination between M and a substrate molecule. However, intermolecular recombination is unlikely to be responsible for the formation of 1+2 and 1+3 for the following reasons.

1. Intermolecular recombination is rare in recA- strains. Virtually no intermolecular conjugational recombination has been observed in recA strains (32). Oligomer formation from monomeric plasmid is recA-dependent (13, 14), and recombination between compatible plasmids is greatly reduced in recA- strains (13, 17).
2. If the hybrid dimers were formed by an intermolecular recombination event between the repeats, increasing the length of the intervening sequence should not have any effect on their formation. However, as discussed above, recombination between direct repeats is greatly reduced as the intervening sequence increases.
3. By examining recombination of two compatible plasmid substrates in the same cell, more evidence has been obtained for the intramolecular nature of recA-independent formation of dimeric products (2, 32a).
E. Structural Factors That Can Influence the Formation of Various Products of recA-independent Recombination
As discussed in Sections I,B and I,C, the overall frequency of recA-independent recombination between direct repeats is affected by both the length of the repeats and the distance between them. These factors also differentially influence the relative abundance of each form of product of recA-independent recombination (32b) as follows. (1) Recombination between very short tandem repeats (e.g., 14 bp in length) yields exclusively the monomeric product (M). (2) Lengthening the tandem repeats gradually increases the abundance of the dimeric products, most of which is 1+2. For example, when the length of the repeat is in the range of 100-600 bp, 60-70% of the products is M, 20-30% is 1+2, and only 0-3% is 1+3. (3) Increasing the distance separating the repeats sharply reduces the abundance of M and increases the abundance of 1+2. When tandem repeats of 559 bp are separated by intervening sequences of 100 bp or longer, the abundance of M is reduced to only a few percent or to zero, whereas 1+2 becomes the predominant product (>90%), and the abundance of 1+3 remains low (
F. Models for recA-independent Recombination between Direct Repeats
One major problem to address about the mechanism of recA-independent recombination between direct repeats concerns how homologous pairing (searching) and strand exchange can occur without RecA. Annealing between single strands of DNA within the homology has been considered for recA-independent homologous pairing. The prelude to annealing between the single strands is the denaturation of the DNA duplex, which may occasionally occur during DNA replication or repair. The models for recA-independent recombination described below all invoke annealing of single strands of DNA within the homology during DNA replication or repair.
1. THE REPLICATION-SLIPPAGE MODEL AND THE SINGLE-STRAND-ANNEALING MODEL
Two models have been proposed for nonconservative deletion between direct repeats. The replication-slippage model (33, 34; reviewed in 31) requires transient pausing of DNA replication followed by the slippage of the tip of a growing strand from one copy to the other copy of the repeat. Removal of the single-stranded loop generated by the slippage leads to deletion of one of the repeats and the sequence between them. The single-strand-annealing model was originally proposed for deletion formation in eukaryotic cells (35). This model (Fig. 5A) involves an initial double-strand break in the sequence between the homologous segments followed by 5’-to-3’ exonuclease action to expose the complementary strands. Annealing of the exposed complementary sequences from both segments followed by polymerase filling-in and ligation would complete the recombination. In this model, increasing the distance between the homologous sequences is expected to inhibit recombination because the exonuclease may have difficulty exposing complementary sequences located far apart. Both of these models can explain the formation of M but not 1+2 or 1+3 by plasmid recombination.
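The outcome of replication slippage can be illustrated with a short Python sketch. It is a deliberately simplified picture under stated assumptions: the nascent strand is written in the same sense as the template rather than as its complement, and the sequences `REPEAT` and `TEMPLATE` are hypothetical. The sketch shows that wherever within the repeat the growing tip pauses and slips, the resulting strand lacks one repeat copy and the sequence between the repeats.

```python
def find_direct_repeats(template: str, repeat: str) -> tuple[int, int]:
    """Return the start positions of the first two copies of `repeat`."""
    first = template.find(repeat)
    second = template.find(repeat, first + len(repeat)) if first >= 0 else -1
    if first < 0 or second < 0:
        raise ValueError("template must contain two direct copies of the repeat")
    return first, second


def slippage_product(template: str, repeat: str, pause_offset: int) -> str:
    """Strand made when synthesis pauses `pause_offset` bases into the first
    repeat copy, slips to the same offset in the second copy, and resumes.

    Simplification: the nascent strand is written in the template's sense.
    """
    if not 0 <= pause_offset <= len(repeat):
        raise ValueError("pause_offset must lie within the repeat")
    first, second = find_direct_repeats(template, repeat)
    copied_before_pause = template[: first + pause_offset]
    copied_after_slippage = template[second + pause_offset:]
    return copied_before_pause + copied_after_slippage


# Hypothetical sequences, for illustration only.
REPEAT = "GATTACA"
TEMPLATE = "TTGCA" + REPEAT + "CCCGGG" + REPEAT + "ACGTT"

if __name__ == "__main__":
    for offset in range(len(REPEAT) + 1):
        print(offset, slippage_product(TEMPLATE, REPEAT, offset))
```

Every pause position gives the same strand, the flanking sequences joined by a single repeat copy, which corresponds to the monomeric deletion product M.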
2. THE ROLLING-CIRCLE-REPLICATION MODEL

This model has been advanced to explain the formation of the dimeric products 1+2 and 1+3 by recA-independent plasmid recombination (11). It is based on the observation that the frequency of formation of the dimeric products is increased in recBrecC mutants (at least for recombination between short repeats; see Ref. 11). In recBrecC or recD mutants, a considerable fraction of plasmid DNAs exist as linear oligomers, which are suggested to form in the transition from θ replication to rolling-circle replication (36, 37; reviewed in 38). In wild-type cells, transition to rolling-circle replication also exists, albeit greatly inhibited by RecBCD activity (exonuclease V). It was proposed that formation of 1+2 and 1+3 is triggered by the switch from θ replication to rolling-circle replication, as illustrated in Fig. 5B.
[Fig. 5 graphic; surviving panel labels: double-strand break, exonuclease, homologous pairing, repair, 1+2, 1+2 or 1+3.]
FIG. 5. The single-strand-annealing model and the rolling-circle-replication model for recA-independent recombination between plasmid-borne direct repeats of DNA. Each strand of a repeat participating in recombination is shown as an arrow. Open arrows correspond to the 5'-to-3' orientation of the DNA strand and filled arrows to the 3'-to-5' orientation. (A) The single-strand-annealing model for recA-independent intramolecular deletion between direct repeats of DNA. See text for description. (B) The rolling-circle-replication model for the formation of 1+2 and 1+3 (Fig. 4). (a) During rolling-circle DNA replication, a single-stranded repeat (or part of it) may occasionally appear at the end of the linear portion of the replication intermediate. (b) The single-stranded repeat anneals with its complementary sequence in the circular portion of the intermediate. Note that the nascent duplex may be very short and thus unstable. (c) After the gap generated in b is repaired, a σ-shaped intermediate is formed. This intermediate can be further processed to form either the structure in d, which could lead to the formation of 1+2, or the structure in e, which could lead to the formation of 1+2 or 1+3, after completion of DNA replication. (A is adapted from Fig. 8 in Ref. 35; B is adapted from Fig. 4 in Ref. 11, with permission.)
3. THE SISTER-STRAND-EXCHANGE MODEL
This model was also proposed to explain the formation of 1+2 and 1+3 (2). The essence of this model is that, at a stalled replication fork, the nascent leading and lagging strands from different repeats are displaced and then annealed with each other, generating a recombinogenic intermediate. Differential processing of this intermediate leads to the formation of 1+2 or 1+3 (2). According to this model, formation of 1+2 involves a major misalignment that makes the whole recombination process complicated. This step is eliminated in a more concise version of the model (7) (Fig. 6).

4. THE MISALIGNMENT-EXCHANGE MODEL

The misalignment-exchange model (32b) is a unifying model for the formation of all three basic forms of products of recA-independent recombination between plasmid-borne direct repeats. It can accommodate most of the known findings about recA-independent recombination. According to this model, during DNA replication, the strands of the repeats misalign to form a recombinogenic intermediate either by melting of the DNA duplex followed by misalignment (Fig. 7B), or by replication slippage (Fig. 8B). This intermediate can then be further processed differentially to produce the three basic recombination products (Fig. 4).

The model is detailed in Fig. 7. (A) At the replication fork, after replication of the repeats in the leading strands is completed, the region encompassing the direct repeats in the lagging strand remains single-stranded. (B) Misalignment occurs between the repeats in the nascent DNA helix of the leading strand and its template. This generates two single-stranded loops, each consisting of one strand of a repeat and the intervening sequence. (C) If the loops are subsequently removed by endonuclease and ligase, completion of replication will lead to the formation of a monomeric product of deletion (M) together with a substrate plasmid. (D) If the loop originating from the leading-strand template anneals with its complementary sequence in the lagging-strand template, dimeric products will be formed as discussed below. The open arrow indicates the specific junction created in the leading-strand template that is to be cleaved.
As indicated by the filled arrows, the specific junction in the lagging-strand template and that in the leading strand are to be cleaved and rejoined after exchange. Note that these processes do not have to occur simultaneously. Completion of these processes leads to the structure shown in Fig. 7E, which is better illustrated in E′. Up to this point, a crossover is formed and there is only one repeat left in the original lagging strand; therefore, for a circular substrate, both deletion and dimerization are achieved. (F) If the remaining single-stranded loop in E′ is removed, completion of replication would lead to the formation of 1+2. (G) If the loop persists and “slides” back to anneal with its complementary sequence, 1+3 would be formed. It is noteworthy that the processes proposed for the formation of 1+3 are functionally equivalent to interchromatid unequal crossover between the repeats.

Misalignment of the strands of the repeats can also be accomplished by replication slippage (33, 34; reviewed in 31) as shown in Fig. 8B, and the recombinogenic intermediate generated can eventually lead to the formation of M or 1+2 but not 1+3 (compare Figs. 8 and 7). It is possible that both “simple” misalignment (Fig. 7B) and replication slippage (Fig. 8B) can lead to the formation of the products M and 1+2, but only the processes A → B → D → E → G shown in Fig. 7 are responsible for the formation of 1+3.

FIG. 6. The sister-strand-exchange model for recA-independent recombination between direct repeats of DNA. (A) Direct repeats at a replication fork. The symbols used are the same as in Fig. 5. (B) Nascent strands of the repeats are displaced from their templates. (C) Nascent strands of the repeats anneal with each other, whereas their templates also anneal with each other. The open arrowhead indicates that the specific junction is to be cleaved. (D) Both deletion and crossover are accomplished after the nicking shown in (C). Further processing similar to that shown in Fig. 7D would lead to the formation of 1+2 (Fig. 4) after completion of replication. (This figure is adapted from Fig. 3D in Ref. 7, with permission.)
FIG. 7. The misalignment-exchange model for recA-independent recombination between direct repeats of DNA. Each strand of a repeat participating in recombination is shown as an arrow. Note that the repeat shown may be only a segment of the entire duplication in the substrate. Open arrows correspond to the 5'-to-3' orientation of the DNA strand and filled arrows to the 3'-to-5' orientation. The markers A, B, and C are arbitrary. The open and shaded circles represent the leading- and lagging-strand DNA polymerases, respectively. The wavy line represents the RNA primer for lagging-strand DNA synthesis. See Section I,F,4 for description.
FIG. 8. DNA replication slippage proposed in recA-independent recombination between direct repeats. (A) Direct repeats at a replication fork. The symbols used are the same as those in Fig. 7. (B) Slippage of the leading-strand polymerase from one repeat to the other. (C and D) Further processing of the structure generated in B, which is similar to that shown in Fig. 7D and E.
The misalignment-exchange model can accommodate most of the results from studies on recA-independent recombination between direct repeats as follows.
1. It explains why recombination between direct repeats, which is independent of RecA and other recombination functions, can happen (2, 3).
2. It explains why efficient recA-independent recombination can occur between very short tandem repeats (e.g., 14 bp) with relatively high frequency (Fig. 3), and generates exclusively the monomeric product (M). The former occurs probably because misalignment does not require long repeats, and the shorter the repeats, the easier for the region around them to melt before misalignment can occur. The latter occurs because when the repeats are too short, interstrand pairing between the loop (ABC) and the lagging-strand template (Fig. 7D) may not be possible, and thus no 1+2 or 1+3 can be formed.
3. It explains why lengthening the tandem repeats up to ~100 bp increases the frequency of recombination but further lengthening the repeat no longer increases recombination (Fig. 3). As discussed in Section I,C, this may be due to the counteracting effects of lengthening the repeats on recombination. The following is a mechanistic explanation of the counteracting effects in light of the misalignment-exchange model. On the one hand, lengthening the repeats can increase the choices and the lengths of sequences that can participate in misalignment (Fig. 7B). On the other hand, it would also make it more difficult for the region encompassing the repeats to melt before slip-pairing can occur. Because of these counteracting effects, it is not surprising that an optimal length (~100 bp) has been observed for tandem repeats mediating recA-independent recombination (Fig. 3).
4. It explains why there is an increase in the formation of the dimeric products as the repeats get longer. This is because lengthening the repeat leads to the formation of larger loops (Fig. 7B), which in turn increases the rate of annealing between the loop and the lagging-strand template. Increased rate of annealing favors the formation of the dimeric products.
5. It also explains why increasing the intervening sequence reduces recombination between direct repeats (3, 7, 8), and inhibits the formation of M (32b). According to the misalignment-exchange model, increasing the length of the intervening sequence would make it more difficult to melt the region encompassing the direct repeats and thus inhibit the overall recombination frequency. At the same time, it would make the loops in Fig. 7B larger, favoring interstrand annealing of loop ABC and its complementary sequence and thus favoring the formation of the dimeric products.

In summary, the replication-slippage model, the single-strand-annealing model, the rolling-circle-replication model, and the sister-strand-exchange model can each explain the formation of one or two of the products observed, whereas the misalignment-exchange model explains the formation of all three basic products of recA-independent recombination between direct repeats. Whether any of the models is correct remains to be determined.
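The comparison of the five models can be restated as a small lookup table. The Python sketch below is only a bookkeeping aid: the dictionary encodes which products each model was put forward to explain, as stated in Section I,F, and the names `MODEL_PRODUCTS`, `models_explaining`, and `unifying_models` are hypothetical. It makes no claim about which model is correct.

```python
# Products each model can account for, as summarized in Section I,F.
MODEL_PRODUCTS = {
    "replication-slippage": {"M"},
    "single-strand-annealing": {"M"},
    "rolling-circle-replication": {"1+2", "1+3"},
    "sister-strand-exchange": {"1+2", "1+3"},
    "misalignment-exchange": {"M", "1+2", "1+3"},
}

ALL_PRODUCTS = {"M", "1+2", "1+3"}


def models_explaining(product):
    """Return the models (alphabetically) that can account for one product."""
    return sorted(m for m, products in MODEL_PRODUCTS.items() if product in products)


def unifying_models():
    """Return the models that account for all three basic products."""
    return sorted(m for m, products in MODEL_PRODUCTS.items() if products == ALL_PRODUCTS)


if __name__ == "__main__":
    for product in sorted(ALL_PRODUCTS):
        print(product, models_explaining(product))
    print("all three:", unifying_models())
```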
G. A DNA Sequence That Can Affect recA-independent Recombination in Cis at a Distance

Although many recent studies on plasmid recombination employed tandem repeats within the tetA gene, the rest of the substrates were slightly different (2, 3, 7, 10-12, 39). A careful survey of the results of recombination of various substrates reveals that when the tandem repeats are in a pBR322 background, recombination is recA independent (2, 3), whereas when the repeats are in the background of pAT153 or pBR327, recombination is reduced to &th to d t h by the inactivation of recA (10, 39). pAT153 differs from pBR322 in that it lacks a 623-bp fragment (coordinates 1727 to 2349, designated D-7; Fig. 2) of pBR322. pBR327 has an even larger fragment (coordinates 1428 to 2516 of pBR322), including D-7, deleted. It is therefore likely that deletion of D-7 is responsible for the decrease of recA-independent recombination between tandem repeats. Moreover, recA-independent recombination between tandem repeats within tetA in pBR327 or pAT153 yields exclusively dimeric products (10-12), indicating that D-7 might be essential for the mechanism responsible for M formation. A systematic study of the effect of D-7 on recombination between tandemly repeated sequences (12, 32b) reveals the following intriguing features. (1) Deleting D-7 reduces (to about 4 t h ) recA-independent recombination but not recombination in recA+ cells. (2) D-7 is cis-acting. D-7 carried on a compatible plasmid cannot compensate for its deletion from the original substrate. (3) D-7 exerts its effect in a position-dependent but orientation-independent manner. Inverting the D-7 sequence has no effect on recombination. Moving D-7 from its original position to other sites in the substrate has the same effect as deleting D-7. (4) No specific segment of D-7 appears essential for its effect on recombination. However, the length of the deletion appears important.
Shorter deletions in the D-7 region have less effect on recA-independent recombination. Moreover, other sequences of comparable size are able to substitute for D-7 in influencing recombination. (5) D-7 does not affect recombination between tandem direct repeats within the bla gene of pBR322 (Fig. 2). (6) Deletion of D-7 also dramatically changes the spectrum of recombination products of the pBR322-based substrates. For example, for recA-independent recombination between tandem repeats of 559 bp, in the presence of D-7, the majority (~70%) of the products are M with the rest (~30%) being mostly 1+2; when D-7 is deleted, however, all of the products are dimeric, with 1+2 being the majority (~85%).

How does D-7 affect recA-independent recombination at a distance from the homology? The results discussed above indicate that it may serve merely as a DNA spacer between the replication origin (ori; see Ref. 40 for review) and the direct repeats in pBR322. In light of the misalignment-exchange model, a tentative mechanistic explanation is as follows. In this model, the way the critical recombinogenic intermediate (Fig. 7B) is processed is determined by whether the region encompassing the repeats in the lagging-strand template is single-stranded or not. If the region is single-stranded, annealing of the loop (ABC) with the lagging-strand template might be so efficient that B → D (Fig. 7) is the major or only process whereas B → C is rare or absent. The single-strandedness of the region is determined by lagging-strand synthesis, which might in turn be affected by the distance between ori and the repeats. It is possible that in the absence of D-7, when the repeats in the leading strand are being synthesized, their counterparts in the lagging strand are much more likely to be single-stranded. Therefore, deleting D-7 may strongly favor the formation of 1+2 and 1+3.
H. Short Direct Repeats May Mediate Genome Instability

Although most of the recent studies of recA-independent recombination between direct repeats were done with plasmid substrates, some of the results have been reproduced in bacterial chromosomes (2, 7). It is likely that recA-independent recombination in bacteria has its counterpart in eukaryotic cells. Deletion between direct repeats (and accompanying rearrangements) in the chromosome renders it unstable. In light of recent findings (2, 3), this may pose an especially severe problem if there is a cluster of tandem direct repeats in the chromosome. Satellite DNAs in animal cells are a good example of such clusters of tandem direct repeats. It is not surprising that the sizes of satellite DNAs tend to be highly polymorphic, with wide variations between individuals. This polymorphism has been attributed to misalignments between the repeats during chromosome pairing. When cloned into plasmid vectors, satellite DNA is extremely unstable even in recA- hosts (41). This can be explained by the efficient recA-independent recombination between the tandem direct repeats within the cloned satellite DNA. Another example of a cluster of tandem direct repeats is microsatellite DNA. Microsatellites consist of repeating units of 1-5 bp and are abundant and unstable in eukaryotic genomes. Recently, rearrangements at microsatellites have been found to be associated with certain human genetic disorders (reviewed in 42) and colorectal carcinomas (43, 44). Whether the models discussed above, especially the misalignment-exchange model, are applicable to the rearrangements of satellite and microsatellite DNAs awaits examination.
I. Sister Chromatid Exchange May Be Mediated by Recombination between Direct Repeats at the Replication Fork
Reciprocal (sister) chromatid interchange (SCE) at homologous loci is a sensitive cytogenetic indicator of DNA damage and can be induced by various chemical mutagens as well as radiation (reviewed in 45, 46). SCE is an S-phase event (47), but its molecular mechanism remains unclear despite intensive studies. One major question about SCE concerns whether it occurs between the two daughter DNA duplexes or right at the replication fork. The latter seems more plausible and several models of this type have been developed (45). In these models, breaks in the template strand play an important role in initiating SCE. Although this assumption is supported by several observations, it is inconsistent with data from some other studies (45). Interestingly, as discussed in Section I,F, in the misalignment-exchange model for recA-independent recombination, the processes proposed for the formation of 1+3 (Fig. 7) are functionally equivalent to unequal SCE with deletion in one chromatid and addition in the other. Moreover, formation of 1+2 as explained by the misalignment-exchange model and the sister-strand-exchange model (Fig. 7) also involves SCE. In this case, one of the chromatids has a deletion. Therefore, we propose that SCE might occur between direct repeats at the replication fork by the misalignment-exchange mechanism (Fig. 7). Note that in this scenario, SCE is always accompanied by deletion in one chromatid with or without addition in the other (see Fig. 7G and F), and can occur only when the direct repeats are relatively short and in close proximity. Hence the deletion and addition accompanying SCE are very small (at most several kilobases) and cannot be detected by cytogenetic methods. The notion that SCE occurs via recombination between direct repeats at the replication fork by a misalignment-exchange mechanism provides a testable working model for SCE. This model implies the following: (1) Short tandem direct repeats are “hot spots” for SCE. (2) Deletion alone or deletion plus addition accompanies SCE. (3) Events that stall DNA replication may enhance SCE. (4) Breaks in DNA are not necessary in the initiation of SCE. Detailed molecular analysis of SCE junctions is needed to establish whether SCE happens by recombination between direct repeats.
II. recA-independent DNA Recombination between Inverted Repeats
A. Possible Outcomes of Recombination between Inverted Repeats of DNA

Recombination between inverted repeats can alter the orientation of the intervening sequence relative to outside sequences. Intramolecular recombination between inverted repeats can invert the intervening sequence (Fig. 9A). The mechanism responsible for this must be reciprocal. Intermolecular recombination between inverted repeats can result in a variety of forms of products (Fig. 9B and C). As illustrated in Fig. 9B, if two reciprocal crossovers occur, the result would be the inversion of the intervening sequences in both of the product molecules. If none of the repeats is marked by mutation, the products of such an event are formally the same as that of a “simple” intramolecular inversion (Fig. 9A). In the above cases (Fig. 9A and B), two new junctions (AC and BD) are generated in each product molecule but the sequences outside the inverted repeats are not changed. If only one crossover occurs between the two molecules as shown in Fig. 9C, then only one new junction (AC or BD) will be generated in each product molecule and the rest of the molecule is also rearranged. Note that each product consists of an inverted duplication separated by a unique sequence. In theory, the event in Fig. 9C does not invert the intervening sequence “completely” (i.e., inversion relative to both of the outside markers A and D). However, the intervening sequence is inverted relative to the A marker in one of the products (with a sequence of ACBA for the markers), and is inverted relative to the D marker in the other (with a sequence of DBCD for the markers). If the starting molecules are both linear, then the products are separate molecules, each with both deletion and duplication. If one or both of them are circular, however, then the two products are actually parts of one special dimer containing large inverted duplications. (For such an example, see later, Pd in Fig. 11.)
Although the above processes can be accomplished via recA-mediated homologous recombination, efficient recA-independent recombination between inverted repeats in close proximity has been discovered recently (4). Moreover, bacteria and phages have specialized inversion systems that only work on inversely repeated short specific sites. We review briefly bacterial inversion systems and then focus our discussions on recA-independent recombination between inverted repeats and its implications in genome rearrangements and gene amplification.
FIG. 9. Possible outcomes of recombination between inverted repeats of DNA. The sequence of markers A, B, C, and D in the substrate molecule (double stranded) is ABCD. Two inverted repeats (arrows) are located between A and B and C and D, respectively. (A) An intramolecular reciprocal exchange between the inverted repeats results in the inversion of the intervening sequence. The resulting sequence of the markers is ACBD. (B) Two intermolecular exchanges between inverted repeats in different molecules also invert the intervening sequence. The resulting sequence of the markers in both of the products is ACBD. (C) One intermolecular exchange between repeats in different molecules totally rearranges the two molecules. The sequence of markers A, B, C, and D is ACBA in one product, and is DBCD in the other. Each product consists of an inverted duplication separated by a unique sequence (BC). See Section II,A for more explanation.
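The marker orders given for Fig. 9 can be reproduced with a few list operations. The Python sketch below is a toy illustration under stated assumptions: a molecule is a list of marker labels with the placeholder `rep` standing for a repeat copy, the orientation of the repeats is not tracked, and the function names are hypothetical. A single intermolecular exchange is modeled by aligning the second molecule in flipped orientation, because the two participating repeat copies are inverted relative to each other.

```python
# Markers A, B, C, D with one repeat copy between A and B and one between C and D.
SUBSTRATE = ["A", "rep", "B", "C", "rep", "D"]


def marker_order(molecule):
    """Return the order of the markers, ignoring the repeat copies."""
    return "".join(seg for seg in molecule if seg != "rep")


def repeat_positions(molecule):
    first = molecule.index("rep")
    second = molecule.index("rep", first + 1)
    return first, second


def intramolecular_inversion(molecule):
    """Fig. 9A: reciprocal exchange between the repeats inverts what lies between."""
    first, second = repeat_positions(molecule)
    between = molecule[first + 1:second]
    return molecule[:first + 1] + between[::-1] + molecule[second:]


def single_intermolecular_exchange(molecule):
    """Fig. 9C: one exchange between the first repeat of one molecule and the
    second repeat of another copy of the same molecule."""
    first, second = repeat_positions(molecule)
    flipped = molecule[::-1]                    # partner aligned in opposite orientation
    partner = len(molecule) - 1 - second        # index of its second repeat after flipping
    product_1 = molecule[:first + 1] + flipped[partner + 1:]
    product_2 = flipped[:partner + 1] + molecule[first + 1:]
    return product_1, product_2


if __name__ == "__main__":
    print(marker_order(intramolecular_inversion(SUBSTRATE)))
    print([marker_order(p) for p in single_intermolecular_exchange(SUBSTRATE)])
```

With the substrate order ABCD, the inversion gives ACBD and the single exchange gives ACBA and DBCD, the marker orders stated in the caption.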
B. Inversions in the Chromosomes of Bacteria and Phages

In bacteria, specialized enzymatic systems such as those involved in site-specific recombination and transposition can catalyze inversion between specific pairs of inverted repeats. Some inversion systems are employed as genetic switches for the expression of certain genes. Inversion either changes the orientation of an active promoter or places a gene downstream of an external promoter. Several bacterial and phage inversion systems (e.g., hin, vin, fim, pin, gin, and cin) have been described (reviewed in 48, 49). For each system, inversion is mediated by a pair of short, specific inverted repeats and requires a special invertase and certain host proteins. Transposable elements also contain short inverted repeats at their ends that are essential for the transposition process. In some composite transposons (e.g., Tn5), the inverted repeats are actually two copies of an insertion sequence (IS element). It is not surprising that besides the transposition reaction, the inverted repeats can also mediate chromosomal inversions.

Apart from site-specific recombination, recombination between inverted repeats in bacteria has not been studied as extensively as recombination between direct repeats. This might have been due to the usual lack of a phenotype of the inversion of a marker gene, and the thought that the same mechanism might be responsible for recombination at both direct and inverted repeats. The few cases of inversion for the chromosomes of E. coli and phage λ are caused by, or attributed to, recombination between inversely oriented homologous sequences such as transposable elements (50-55), phage Mu (57), and the rrn operons (58, 59). The genetic requirements of inversion between inverted repeats have been examined in the following studies. A genetic switch in phage λ was mediated by the inverted IS10 elements (1.4 kb long) of Tn10, which served as “portable regions of homology” (52). It was shown to be recA and recB dependent in the absence of phage recombination functions (52). Switching occurred efficiently in the RecBCD pathway (53). It was suggested to occur either intra- or intermolecularly (53), as illustrated in Fig. 9A and B. In another study, inversion of an 800-bp-long sequence mediated by short inverted repeats (12 or 23 bp) in the chromosome of E. coli was recA and recBC dependent as well (60).
C. recA-independent Recombination between Plasmid-borne Inverted Repeats
As discussed in Section I, although plasmid recombination has features distinct from those of chromosomal recombination, it provides a convenient system for studying the molecular mechanism of recombination. It was demonstrated that plasmid-borne inverted repeats can mediate efficient recA-independent recombination (4). In this study, a genetic switch for the tetA gene of pBR322 (Fig. 10A) was created by placing its promoter (Ptet) between two inverted repeats of 352 bp (see pHPH in Fig. 10B). In the recombinant plasmid (pHPH), Ptet is in the “wrong” orientation so that the tetA gene cannot be expressed. Recombination between the repeats is expected to cause inversion of the intervening sequence so that the intact tetA gene can be regenerated (see pHPHR in Fig. 10B). Recombination of pHPH as examined in three pairs of recA+/recA- strains of E. coli was at about the same level, with a frequency (defined in Ref. 3) of 0.5-1.0 x. One can conclude that there is a recA-independent mechanism(s) for recombination between (plasmid-borne) inverted repeats. It has also been demonstrated that this recombination is recBC independent (4). Similar results were obtained when recombination between another pair of 651-bp-long inverted repeats flanking a sequence of 232 bp in a pBR322 derivative was examined (4). This supports the notion that the high level of recA-independent recombination was not due to specific sequences involved in recombination. Similar results were also obtained for recombination between inverted repeats in a pACYC184 derivative (4), indicating that the efficient recA-independent recombination is not restricted to ColE1 replicons.

As discussed in Section I,D, recA-independent recombination between direct repeats is sharply reduced by increasing the length of the intervening sequence. Interestingly, a similar distance effect has been observed for recA-independent recombination between inverted repeats (4), indicating that recA-independent recombination between direct repeats and that between inverted repeats may share common mechanistic features.

FIG. 10. A genetic switch for the expression of the tetA gene of pBR322 as a model system for studying recombination between inverted repeats. (A) The structure of pBR322. The tetA gene (coordinates 86 to 1276) and its promoter region are divided into three parts: P (coordinates 1 to 651), H (652 to 1002), and T (1003 to 1276). (B) The genetic switch for tetA. In the plasmid pHPH, Ptet and part of the 5’ end of the open reading frame of tetA (fragment P) were bracketed with two inverted repeats (designated H). Note that the H fragment is also part of tetA. Ptet in the P fragment of pHPH is in the “wrong” orientation so that tetA cannot be expressed. Recombination between the H repeats is expected to cause inversion of the P fragment so that an intact tetA gene can be regenerated (in pHPHR). Therefore, tetA can alternate reversibly between nonexpressed (in pHPH) and expressed (in pHPHR) states as the direct consequence of recombination between the inverted H repeats. In order to examine whether the distance separating the inverted repeats affects recombination, additional DNA fragments (besides the P fragment) of various lengths are also inserted between the H repeats of pHPH (not shown). Note that the conversion from pHPH to pHPHR shown here is the predicted result of the processes shown in Fig. 9A and B. However, this is not the only way by which an intact tetA can be regenerated; intermolecular recombination shown in Fig. 9C can also regenerate tetA, but the product would be a special dimer with the structure of Pd, illustrated in Fig. 11.
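The predicted simple inversion of the switch, from pHPH to pHPHR, can be sketched at the sequence level. The Python example below is a minimal illustration under stated assumptions: the sequences `H` and `P` are short hypothetical stand-ins for the 352-bp repeat and the promoter-bearing fragment, the plasmid is written as a linear string, and the function names are not from the source. It models only the expected intramolecular inversion, not the dimeric product discussed in the next section.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_between_inverted_repeats(plasmid: str, repeat: str) -> str:
    """Invert the segment lying between `repeat` and its inverted copy."""
    left_start = plasmid.find(repeat)
    if left_start < 0:
        raise ValueError("repeat not found")
    left_end = left_start + len(repeat)
    right_start = plasmid.find(reverse_complement(repeat), left_end)
    if right_start < 0:
        raise ValueError("inverted copy of the repeat not found")
    intervening = plasmid[left_end:right_start]
    # A reciprocal exchange between inverted repeats replaces the intervening
    # segment by its reverse complement; the flanking sequences are unchanged.
    return plasmid[:left_end] + reverse_complement(intervening) + plasmid[right_start:]


# Hypothetical stand-ins for the H repeat and the P fragment.
H = "GATTACC"
P = "TTGACAAT"
PHPH_LIKE = "AAA" + H + P + reverse_complement(H) + "CCC"

if __name__ == "__main__":
    switched = invert_between_inverted_repeats(PHPH_LIKE, H)
    print(PHPH_LIKE)
    print(switched)
    print(invert_between_inverted_repeats(switched, H) == PHPH_LIKE)
```

Applying the inversion twice restores the starting sequence, which mirrors the reversible alternation between the nonexpressed and expressed states described for the switch.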
D. The Major Product of recA-independent Recombination between Plasmid-borne Inverted Repeats Is an Unusual Head-to-Head Dimer

The expected product of recombination between inverted repeats is that of a “simple” intramolecular inversion. For a plasmid substrate like pHPH (Fig. 10B; also see S in Fig. 11), such a product would be a plasmid of the same size as the substrate (pHPHR in Fig. 10B; also see Pi in Fig. 11). However, the product of recA-independent recombination of pHPH is almost exclusively a dimeric plasmid with the structure of Pd in Fig. 11 (4). Consistently, the products of the other pBR322-based substrate and the pACYC184-based substrate described above (Section II,C) are also dimers like Pd (4). Why was the simple inversion product (Pi) not observed for recA-independent recombination between plasmid-borne inverted repeats? One reason might be that it is toxic to or unstable in the host cell. This was ruled out for pHPH by the fact that pHPHR, made in vitro, transformed cells and was stably maintained with roughly the same copy number as pHPH (4).

FIG. 11. RecA-independent recombination between plasmid-borne inverted repeats produces a special head-to-head dimer. Filled arrows represent the inverted repeats; open arrows, the intervening sequence. The sequences beside the repeats and the intervening sequence are denoted by a, b, and the shaded arrows. S is the plasmid substrate for recombination; Pi is the predicted product of intramolecular inversion between the repeats; Pd is the observed product of recA-independent recombination.

The dimeric product Pd consists of inverted duplications, but is not a true head-to-head dimer. In theory, it can be generated by a single intermolecular reciprocal exchange between inverted repeats of two substrate monomers, as shown in Fig. 9C. However, this is unlikely to be the case for the same reasons discussed for the intramolecular nature of formation of dimeric products by recombination between direct repeats (see Section I,D,2).

Interestingly, there are a few reports in the literature on dimer formation as a result of recombination between plasmid-borne inverted repeats. In one of the early studies on the hin system of site-specific inversion (61), a 1.5-kb fragment of the Salmonella chromosomal DNA, which encompasses the invertible region (800 bp) involved in the control of phase transition, was cloned in the plasmid pBR322. The recombinant plasmid gave rise to a product whose restriction pattern indicated it to be a special head-to-head dimer of the structure of Pd shown in Fig. 11 (61). Formation of the dimer was independent of RecA, and was proposed to be due to an intramolecular inversion within a tandem (head-to-tail) dimer of the substrate plasmid. In light of later characterization of the specific sites for recombination of the hin system (62) and the results discussed above, it appears very likely that the head-to-head dimer (61) was generated by intramolecular recombination between the short (26 bp) imperfect inverted repeats at which site-specific recombination normally occurs. Whether the hin recombinase participated in this process was not addressed in the report (61) and remained unclear.

The bacterial transposon Tn5 consists of a 2.8-kb central region and two flanking inverted 1.5-kb IS50 insertion elements. When Tn5 was inserted into the plasmid pUC18, it was found that recombination between the IS50 elements gave rise to a product with the size of the dimeric substrate (63). Although the structure of the dimeric product was not characterized, it is very likely the same as Pd (Fig. 11). It should be noted that recombination between the IS50 elements was dependent on RecA (63). This was probably because the distance between the IS50 elements was long (2.8 kb), and when the distance between the inverted repeats was long (>1 kb), recombination appeared to have limited recA dependence (X. Bi and L. F. Liu, unpublished).

The cin gene product of phage P1 can catalyze site-specific recombination between the small inversely repeated (~30 bp) cix sequences. Multicopy plasmids bearing cin and the inverted cix repeats formed dimers in the absence of the host recA function (64). Restriction enzyme digestion and electron-microscope analyses indicated that the dimers consisted of both head-to-tail dimers and special head-to-head dimers like Pd. It was further shown that dimer formation was cin dependent (64). The head-to-head dimer could be formed either from a preexisting head-to-tail dimer by an intramolecular recombination, or from two monomers by an intermolecular recombination at the cix sites as illustrated in Fig. 9C. The latter was supported by the finding that cin+ plasmids were much more efficient than cin- plasmids in cotransduction with P1 markers, indicating cointegration with the P1 genome (64).

Plasmid pBR325 (65) contains an inverted duplication of 482 bp (66). A study on IS-mediated transposons (67) demonstrated that a pBR325 derivative can form a dimer of the structure of Pd.
This was attributed to intermolecular reciprocal recombination at the inverted repeats of pBR325 (67).

In summary, recombination between plasmid-borne inverted repeats can lead to the formation of special head-to-head dimers (Pd in Fig. 11). All the early discoveries about these dimers were made in studies of various site-specific recombination systems. However, only the cin system seemed to be directly involved in dimer formation. It was not completely clear whether site-specific recombinases played any role in the other cases. On the other hand, recent studies on recombination between plasmid-borne inverted repeats that involved only the sequences of the plasmid pBR322 clearly indicate that inverted repeats in close proximity can mediate efficient recA-independent recombination (4). A replicational model is presented below for this recombination.
E. The Reciprocal-strand-switching Model for recA-independent Recombination between Inverted Repeats of DNA
As discussed in Section I,F, DNA replication has been invoked in explaining recA-independent recombination between direct repeats. However, the replicational mechanisms proposed cannot account for the formation of the special head-to-head dimer (Pd) by recA-independent recombination between plasmid-borne inverted repeats. Recently, a replicational mechanism for recA-independent recombination between inverted repeats has been proposed (4). The essence of the model is reciprocal switching of the leading and lagging strands within the inverted repeats when they are being replicated. Resolving the junction formed due to the switching, and completion of replication, could result in major rearrangements.

The reciprocal-strand-switching (RSS) model is described in Fig. 12 using a plasmid substrate (S) as an example. At the replication fork (Fig. 12B) the leading-strand DNA polymerase (68, 69) is copying the repeat proximal to the fork, while the lagging-strand DNA polymerase (68, 69) is copying the repeat distal to the fork. Reciprocal switching of the leading and lagging strands is shown in Fig. 12C. The 3' end of the nascent leading strand dissociates from the leading-strand polymerase and template, and switches to the lagging-strand polymerase and template, and vice versa. Note that strand switching results in a Holliday junction (70) that can migrate within the repeats. It will be resolved by endonuclease and ligase activities. If the nascent strands (thin lines) in the junction are cut and religated after exchange, the original fork structure is regenerated as if nothing has happened. However, if the template strands are cut and religated as illustrated, resolution of the junction will generate a dumbbell-shaped intermediate as shown in Fig. 12D. Completion of replication results in an unusual complex dimer consisting of inverted duplications (Fig. 12E), which is the same as Pd (Fig. 11). Although the plasmid substrate used in the illustration has a unidirectional origin of replication (e.g., the ori of pBR322) (40), a plasmid with a bidirectional origin of replication will also generate the product Pd. Note that the sequence of the markers W, X, Y, and Z in the substrate is WXYZ (Fig. 12A). In the product, two new joints ZX and YW are created. The rest of the dimer consists of a large inverted duplication (also see Fig. 11).

FIG. 12. Reciprocal switching of the leading and lagging strands of DNA replication: a model for recA-independent recombination between inverted repeats of DNA. (A) The plasmid substrate bearing inverted repeats (S, as shown in Fig. 11). The two strands of DNA are shown as thick lines. Each strand of a repeat is shown as an arrow. Open arrows correspond to the 5'-to-3' orientation of DNA and shaded arrows to the 3'-to-5' orientation. The markers W, X, Y, and Z are used to mark the sequence around the inverted repeats. The origin of replication (ori) is shown as a dashed arrow. Note that a unidirectional origin is used in the illustration. The orientation of the sequence outside of the repeats is indicated by an arrow. (B) The inverted repeats at a replication fork. Open circle, the leading-strand DNA polymerase activity; shaded circle, lagging-strand polymerase. Thin lines represent the nascent strands of DNA. Wavy lines represent the RNA primers for lagging-strand DNA synthesis. The dashed line indicates that the two polymerase activities may actually be associated in a complex (the DNA polymerase III holoenzyme). (C) Reciprocal switching of the leading and lagging strands of DNA replication within the inverted repeats. Note that switching occurs when the repeats are aligned as illustrated. Replication continues after the switching. Sooner or later, the junction in C will be resolved by DNA endonuclease and ligase. The arrowheads indicate the positions of endonuclease digestion. (D) Structure of the replicating plasmid after the junction in C is resolved in the manner shown. (E) Completion of replication results in a special dimer (Pd), as is also illustrated in Fig. 11.

In conclusion, reciprocal strand switching within plasmid-borne inverted repeats during replication, followed by resolution of the junction and completion of replication, can produce the special head-to-head dimer (Pd).

The distance effect on recombination between inverted repeats discussed in Section II,C can be explained by the RSS model. Because, in general, syntheses of the leading and lagging strands are coordinated spatially (68, 69), reciprocal strand switching events can occur efficiently only when the two repeats are separated by a relatively short distance, perhaps in the range of an Okazaki fragment.

As illustrated in Fig. 12, reciprocal strand switching within plasmid-borne inverted repeats leads to the formation of a special head-to-head dimer. However, if the substrate is linear, reciprocal strand switching would result in two molecules, both with deletion and inverted duplication (Fig. 9C). The rearrangements in both of the above cases may be lethal to the cell; this may be the reason why this kind of rearrangement has not been found in the genomes of E.
coli and phage λ, although inverted-repeat-mediated recombination (inversion) in them has been studied.

Simple strand switching during replication was first proposed to explain certain deletion events of phage λ (71, 72) and later was applied to explain phage Mu excision by aborted transposition (73). The same concept was also used (74, 75) to explain the generation of inverted duplications in certain gene amplification events in mammalian cells (reviewed in 76-78). The models proposed (71, 74, 75) all hold that replication switches strands (templates) and proceeds around the replication fork. These models can be referred to as “single-strand-switching” models. The RSS model is distinct from the single-strand-switching models in the following features. (1) In the RSS model, reciprocal switching of both nascent strands of replication has been proposed and a Holliday junction is formed after the switching. (2) In
the RSS model, strand switching is proposed to occur within preexisting inverted repeats at the replication fork. Although both the single-strand-switching model and the RSS model can explain the formation of inverted duplications, only the RSS model can explain the direct-repeat-mediated complex rearrangement (4) discussed above.

In the RSS model, it has been proposed that reciprocal strand switching occurs at inverted repeats in close proximity. Short, imperfect inverted repeats, which are common in the genome, might be potential sites for low-efficiency switching. Switching might be enhanced when the movement of the replication fork is blocked by certain lesions in DNA, and therefore, under certain abnormal physiological conditions (e.g., in cells treated with DNA-damaging agents), strand switching may occur at sites without the presence of extensive inverted repeats.
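The marker bookkeeping given for Fig. 12 (substrate order WXYZ; new joints ZX and YW in the product) can be checked with a small segment-level sketch. Everything here is an illustrative assumption rather than the authors' procedure: the plasmid is reduced to a circular list of oriented elements, the helper names are hypothetical, and Pd is assembled formally (the structure that, as noted in Section II,D, could in theory also arise from the exchange of Fig. 9C) instead of by simulating replication.

```python
# Segment-level bookkeeping for S, Pi and Pd (Figs. 11 and 12).
# Each element is (name, strand). A circle is a list whose last element is
# joined back to the first. The layout below is an illustrative assumption.

R = ("R", 1)  # one copy of the repeat; ("R", -1) is the inverted copy

def flip(seg):
    return (seg[0], -seg[1])

def revcomp(segs):
    """Read a stretch of segments on the opposite strand."""
    return [flip(s) for s in reversed(segs)]

def substrate(backbone, spacer):
    return backbone + [R] + spacer + [flip(R)]

def inversion_product(backbone, spacer):
    # Simple inversion: only the spacer between the repeats is flipped.
    return backbone + [R] + revcomp(spacer) + [flip(R)]

def head_to_head_dimer(backbone, spacer):
    # Twice the substrate length; backbone and spacer each occur in both orientations.
    return (backbone + [R] + revcomp(spacer) + [flip(R)]
            + revcomp(backbone) + [R] + spacer + [flip(R)])

def joints(circle):
    """Strand-independent (left, right) flanks of every repeat copy."""
    found = set()
    for i, seg in enumerate(circle):
        if seg[0] != "R":
            continue
        left, right = circle[i - 1], circle[(i + 1) % len(circle)]
        if seg[1] < 0:  # read this joint on the other strand
            left, right = flip(right), flip(left)
        found.add((left, right))
    return found

backbone = [("Z", 1), ("W", 1)]  # sequence outside the repeats, Z ... ori ... W
spacer = [("X", 1), ("Y", 1)]    # intervening sequence, X ... Y

S = substrate(backbone, spacer)
Pi = inversion_product(backbone, spacer)
Pd = head_to_head_dimer(backbone, spacer)
new_joints = joints(Pd) - joints(S)
```

Under these assumptions the two joints gained by Pd place W next to Y and Z next to X across a repeat, which corresponds to the YW and ZX joints named in the text, and Pd carries the repeat joints of both S and Pi.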
F. Implications of the RSS Model in Genome Rearrangement and Gene Amplification

1. GENE AMPLIFICATION ASSOCIATED WITH INVERTED REPEATS

Gene amplification and genome rearrangement are of great biological significance. Gene amplification is often found in drug-treated cultured mammalian cells and advanced tumor cells (reviewed in 76). Amplification is often coupled to genome rearrangements, such as inversion, translocation, and chromosome loss. Initially it was found that amplicons could be arrays of head-to-tail tandem duplications with heterogeneous joints between them. However, it appears that amplicons can also exist as homogeneous arrays of head-to-head and tail-to-tail inverted duplications, and it has been suggested that formation of inverted repeats is involved in the generation of amplicons (reviewed in 77, 78).

An intriguing finding about gene amplification is that some amplicons exist as circular episomes consisting of imperfect inverted duplications. A 500-kb extrachromosomal amplicon in a mouse fibroblast line (B-d) is a circle structurally similar to the circle shown in Fig. 13D (79). Evidence (79) also indicates that the 500-kb circle is the major, if not only, basic structure of all the amplicons that contain ~15% of total DNA of the B-d cell. A better characterized circular amplicon is the H circle in the protozoan parasite Leishmania (80-82). The H circle (see later, Fig. 15) contains a large (30-kb) inverted duplication separated by two unique DNA segments (designated a and b, respectively), both flanked by inverted repeats. It originates from the H locus of the chromosome, which is bracketed by the a and b segments. There is evidence indicating a direct role of the preexisting inverted repeats in mediating the formation of the H circles (80-82). It is both
interesting and puzzling that besides the monomer, only special dimers and tetramers of the H circles were observed, but trimeric circles have never been found (81).

The distinct structures and locations of various amplified sequences indicate that different mechanisms may be involved under different circumstances. A variety of models have been proposed to explain different types of gene amplification in mammalian cells (reviewed in 76-78, 83). All of the models invoke abnormal DNA synthesis and recombination. Two early models, the onionskin-replication model and the unequal-sister-strand-exchange model, cannot explain the presence of amplified inverted duplications containing homogeneous ends. The extrachromosomal-double-rolling-circle model (84) (simplified as the EDRC model) and the chromosomal-spiral (CS) model (75) have been developed to explain the presence of inverted repeats in the amplicons. Both of these models involve the double-rolling-circle type of DNA replication (85), either in an extrachromosomal circle (in the EDRC model) or in the chromosome (the CS model). In the CS model, the amplified DNA is at the original chromosomal locus, whereas in the EDRC model the amplified DNA can integrate into new sites accompanied by rearrangement at the original chromosomal site. The EDRC model can explain many aspects of gene amplification associated with inverted repeats, but no satisfactory explanation has been provided for the critical initial step of the model, excision from the chromosome of a circle containing inverted duplications separated by short (
2. IMPLICATIONS OF THE RSS MODEL IN GENE AMPLIFICATION AND GENOME REARRANGEMENT
The reciprocal-strand-switching (RSS) model for recA-independent recombination between inverted repeats discussed in Section II,E can be used to explain some important aspects of gene amplification and genome rearrangement associated with inverted repeats. It accommodates particularly well the formation of circular amplicons consisting of imperfect inverted duplications, as well as dicentric (and acentric) chromosomal fragments as a result of gene amplification.

FIG. 13. Reciprocal switching of the leading and lagging strands of DNA synthesis: a model for gene amplification and genome rearrangement. (A) A replication bubble. Coordinated replication of the leading and lagging strands at each fork is shown. The symbols used are as described in Fig. 12. (B-E) Reciprocal switching of the leading and lagging strands at both forks of a replication bubble in a chromosome, and subsequent resolution of the junctions, leading to the excision of a circle containing an inverted duplication (indicated by the dashed arrows) separated by unique DNA sequences. The rest of the chromosome is divided into two parts that later form a dicentric and an acentric chromosome, respectively, after completion of replication. Both of the chromosomes are palindromic. (B'-E') Similar to that illustrated in B-E, reciprocal switching of the leading and lagging strands at one fork alone will also lead to the formation of a dicentric and an acentric chromosome. However, no extrachromosomal circle is generated.

In the EDRC model for gene amplification, the critical initial step is the excision of a circle containing an inverted duplication from a replication bubble (84). This was proposed to be accomplished by two asymmetric nonhomologous recombination events of unknown nature. We propose that these two events are reciprocal switchings of the leading and lagging strands at both forks of the replication bubble, followed by resolution of the junctions generated by strand switching (Fig. 13A-E). The circle generated contains an inverted duplication separated by noninverted sequences (shown in Fig. 13D). The RSS model can well accommodate the fact that the noninverted repeats are usually short (150-1000 bp, not explained by the EDRC model), because they originate from the loops (loops A and D' in Fig. 13A) formed during lagging-strand synthesis, which cannot be very large (in the size range of the Okazaki fragment).

As a result of excision of the circle, the rest of the replicating chromosome is divided into two parts, each containing one of the two forks of the original bubble (Fig. 13D). Completion of replication will result in two abnormal palindromic chromosomal fragments, one dicentric and one acentric (Fig. 13E). These structures may be prone to further rearrangements or loss. This is consistent with the fact that gene amplification is often coupled with other chromosomal abnormalities (reviewed in 76-78). Note that the processes discussed above (see Fig. 13A-D) readily explain the formation of the circular amplicon in the B-d cell (79).

The amplification process in the EDRC model is proposed to be accomplished by double-rolling-circle replication of the excised circle (84). Here we propose an alternative pathway for amplification based on the RSS model. As illustrated in Fig.
14, during replication of the excised circle (see Fig. 13D), reciprocal strand switching (RSS) within the inverted repeats would lead to the formation of a dimeric circle (see Fig. 12). In turn, sequential rounds of RSS during replication would lead to tetramers, octamers, and up to 2^n-mers of circular amplicons (Fig. 14). Thus all of the circular amplicons consist of homogeneous arrays of inverted duplications. The circular amplicons may also integrate into the chromosome at various sites (Fig. 14).

FIG. 14. Reciprocal switching of the leading and lagging strands of DNA synthesis: a model for amplification of circular amplicons. (A) An episomal circular amplicon (monomer) consisting of an inverted duplication separated by unique sequences (79; see Fig. 13D). Open arrows, inverted repeats; thin and dashed arrows, different sequences (designated a and b, respectively) separating the inverted repeats. (B) The dimeric amplicon derived from a monomeric circle by reciprocal strand switching (RSS) within the inverted repeats during replication. (C) The tetrameric amplicon derived from the dimer by RSS within any pair of inverted repeats. (D) The octamer formed from the tetramer by RSS. Note that 2^n-mers can be formed by RSS during replication. (E) Circular amplicons can integrate into the chromosome.

The RSS model can also satisfactorily account for the formation of the dicentric palindromic marker chromosome iso(9) (pter → q12, q12 → pter) found in a mentally retarded boy (86). As illustrated in Fig. 13A and B'-E', we propose that reciprocal strand switching at a replication fork in the long arm of chromosome 9 eventually leads to the formation of the dicentric palindromic chromosome and the acentric palindromic fragment (Fig. 13E').

The RSS model provides an especially good explanation for formation of the H circles in Leishmania (80-82). We propose that excision of the H circle is mediated by reciprocal strand switching at the pair of inverted repeats at each end of the H locus during replication (Fig. 15A-D). The distance between the inverted repeats in segment a or b, 3-5 kb (81, 82), also meets
the requirement that the intervening sequence be relatively short for the RSS model (discussed in Section II,C).

The RSS model also explains the formation of the special dimer of the H circle. As shown in Fig. 15D-H, in a replicating H circle monomer, reciprocal switching of the leading and lagging strands between the inverted repeats flanking segment b (or a), and subsequent resolution of the junction, divide the replication bubble into two parts (Fig. 15F and G). Completion of replication of the entire circle generates the special dimer of the H circle (Fig. 15H), which has been observed (81). Theoretically, 2^n-mers can be formed in this manner but trimers and other oligomers cannot (see Fig. 14). This fits remarkably well with the previously unexplained result that only the special dimers and tetramers of H circles have been observed, but trimeric circles have never been found (81).

Our RSS model also explains why H circles can be formed in wild-type cells of Leishmania receiving no drug treatment (81), whereas gene amplification in normal mammalian cells is rare but occurs in tumor cells and drug-treated cells. According to the RSS model, the preexisting inverted repeats at each end of the H locus can mediate the formation of the H circles. Without such extensive inverted repeats, switching may occur only when the replication forks are arrested under abnormal physiological conditions, such as tumorigenesis and drug treatment of the cells.

In conclusion, reciprocal switching of the leading and lagging strands of DNA synthesis may underlie the mechanism(s) of certain genome rearrangement and gene amplification events.
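The counting argument behind the 2^n-mers can be stated as a short sketch. It assumes, as in Fig. 14, that each RSS event during one round of replication converts a circle of n units into one of 2n units and that no other route to multimers is considered; the function names are hypothetical.

```python
# Unit counts reachable from a monomeric circle by successive RSS rounds,
# assuming every round exactly doubles the number of units (Fig. 14).

def rss_series(rounds):
    """Unit counts after 0, 1, ..., `rounds` successive RSS events."""
    sizes = [1]
    for _ in range(rounds):
        sizes.append(2 * sizes[-1])
    return sizes

def reachable_by_rss(units):
    """True only for powers of two (1, 2, 4, 8, ...)."""
    return units >= 1 and (units & (units - 1)) == 0
```

Under this assumption monomers, dimers, tetramers, and octamers are reachable, whereas a trimer is not, in line with the absence of trimeric H circles reported in (81).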
FIG. 15. Reciprocal switching of the leading and lagging strands of DNA replication: a model for formation of the H circles in Leishmania. (A) The two strands of the H locus of the chromosome of Leishmania. Thick bars, inverted repeats flanking the segments a and b. (B) The H locus in a replication bubble. Both pairs of the inverted repeats are shown to be near a replication fork. The symbols are as described in Fig. 12 except that the template strands are also drawn as thin lines here for clarity. (C) Reciprocal switching of the leading and lagging strands of DNA replication within the pair of the inverted repeats at each fork of the bubble. Note that the two switching events at the two forks do not have to occur simultaneously. (D) Resolving the junctions formed due to strand switching leads to excision of the H circle from the chromosome. (E-H) During replication of the H circle (monomer), strand switching within any pair of the inverted repeats can eventually lead to the formation of the special dimer of the H circle. See Fig. 12 for detailed illustration.

ACKNOWLEDGMENTS

We thank Shanhong Wan for assistance in preparing the manuscript and Jiaxi Wu for a critical reading of it. Research on DNA recombination in our laboratory is supported by National Institutes of Health Grant GM27731 to L. F. L.
REFERENCES

1. R. D. Porter, in “Genetic Recombination” (R. Kucherlapati and G. R. Smith, eds.), p. 1. American Society for Microbiology, Washington, DC, 1988.
2. S. T. Lovett, P. T. Drapkin, V. A. Sutera, Jr. and T. J. Gluckman-Peskind, Genetics 135, 631 (1993).
3. X. Bi and L. F. Liu, JMB 235, 414 (1994).
4. X. Bi and L. F. Liu, PNAS 93, 819 (1996).
5. G. R. Smith, Microbiol. Rev. 52, 1 (1988).
6. G. R. Smith, Cell 58, 807 (1989).
7. S. T. Lovett, T. J. Gluckman, P. J. Simon, V. A. Sutera, Jr. and P. T. Drapkin, MGG 245, 294 (1994).
8. F. Chédin, E. Dervyn, R. Dervyn, S. D. Ehrlich and P. Noirot, Mol. Microbiol. 12, 561 (1994).
9. T.-M. Yi, D. Stearns and B. Demple, J. Bact. 170, 2898 (1988).
10. G. L. Dianov, A. V. Kuzminov, A. V. Mazin and R. I. Salganik, MGG 228, 153 (1991).
11. A. V. Mazin, A. V. Kuzminov, G. L. Dianov and R. I. Salganik, MGG 228, 209 (1991).
12. X. Bi, Y. L. Lyu and L. F. Liu, JMB 247, 890 (1995).
13. J. R. Bedbrook and F. M. Ausubel, Cell 9, 707 (1976).
14. H. Potter and D. Dressler, PNAS 74, 4168 (1977).
15. R. A. Fishel, A. A. James and R. Kolodner, Nature 294, 184 (1981).
16. A. A. James, P. T. Morrison and R. D. Kolodner, JMB 160, 411 (1982).
17. A. Laban and A. Cohen, MGG 184, 200 (1981).
18. A. Laban and A. Cohen, MGG 189, 189 (1983).
19. R. Kolodner, PNAS 77, 4847 (1980).
20. C. Luisi-DeLuca, S. T. Lovett and R. D. Kolodner, Genetics 122, 269 (1989).
21. S. K. Mahajan, in “Genetic Recombination” (R. Kucherlapati and G. R. Smith, eds.), p. 87. American Society for Microbiology, Washington, DC, 1988.
22. A. J. Clark and K. B. Low, in “The Recombination of Genetic Material” (K. B. Low, ed.), p. 155. Academic Press, San Diego, CA, 1988.
23. M. M. Cox and I. R. Lehman, ARB 56, 229 (1987).
24. C. M. Radding, in “Genetic Recombination” (R. Kucherlapati and G. R. Smith, eds.), p. 193. American Society for Microbiology, Washington, DC, 1988.
25. S. C. West, ARB 61, 603 (1992).
26. S. C. Kowalczykowski and A. K. Eggleston, ARB 63, 991 (1994).
27. S. D. Hall, M. F. Kane and R. D. Kolodner, J. Bact. 175, 277 (1993).
28. S. D. Hall and R. D. Kolodner, PNAS 91, 3205 (1994).
29. R. D. Kolodner, S. D. Hall and C. Luisi-DeLuca, Mol. Microbiol. 11, 23 (1994).
30. M. J. Doherty, P. T. Morrison and R. Kolodner, JMB 167, 539 (1983).
31. N. D. Allgood and T. J. Silhavy, in “Genetic Recombination” (R. Kucherlapati and G. R. Smith, eds.), p. 309. American Society for Microbiology, Washington, DC, 1988.
32. B. Low, PNAS 60, 160 (1968).
32a. X. Bi and L. F. Liu, unpublished result.
32b. X. Bi and L. F. Liu, unpublished result.
33. G. Streisinger, Y. Okada, J. Emrich, J. Newton, A. Tsugita, E. Terzaghi and M. Inouye, CSHSQB 31, 77 (1966).
34. N. C. Franklin, in “The Bacteriophage Lambda” (A. D. Hershey, ed.), p. 175. CSHLab, CSH, NY, 1971.
35. F.-L. Lin, K. Sperle and N. Sternberg, MCBiol. 4, 1020 (1984).
36. S. N. Cohen and A. J. Clark, J. Bact. 167, 327 (1986).
37. R. Seelke, B. Kline, R. Aleff, P. D. Porter and M. S. Shield, J. Bact. 169, 4841 (1987).
38. J.-F. Viret, A. Bravo and J. C. Alonso, Microbiol. Rev. 55, 675 (1991).
39. M. Matfield, R. Badawi and W. J. Brammar, MGG 199, 518 (1985).
40. P. Balbas, X. Soberon, F. Bolivar and R. L. Rodriguez, in “Vectors: A Survey of Molecular Cloning Vectors” (R. L. Rodriguez and D. T. Denhardt, eds.), p. 5. Butterworth, Boston, 1987.
41. D. Brutlag, K. Fry, T. Nelson and P. Hung, Cell 10, 509 (1977).
42. R. D. Wells and R. R. Sinden, in “Genome Analysis” (K. Davies and S. Warren, eds.), Vol. 7, p. 107. CSHLab, CSH, NY, 1993.
43. S. N. Thibodeau, G. Bren and D. Schaid, Science 260, 816 (1993).
44. Y. Ionov, M. A. Peinado, S. Malkhosyan, D. Shibata and M. Perucho, Nature 363, 558 (1993).
45. L. Thompson, in “Genetic Recombination” (R. Kucherlapati and G. R. Smith, eds.), p. 597. American Society for Microbiology, Washington, DC, 1988.
46. O. Oishi, in “The Recombination of Genetic Material” (K. B. Low, ed.), p. 445. Academic Press, San Diego, CA, 1988.
47. S. Wolff and P. Perry, Chromosoma 48, 341 (1974).
48. N. L. Craig and N. Kleckner, in “Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology” (F. C. Neidhardt et al., eds.), p. 1054. American Society for Microbiology, Washington, DC, 1987.
49. A. C. Glasgow, K. T. Hughes and M. I. Simon, in “Mobile DNA” (D. E. Berg and M. M. Howe, eds.), p. 637. American Society for Microbiology, Washington, DC, 1989.
50. D. J. Savic, J. Bact. 140, 311 (1979).
51. D. J. Savic, S. P. Romac and S. D. Ehrlich, J. Bact. 155, 943 (1983).
52. N. Kleckner and D. G. Ross, JMB 144, 215 (1980).
53. D. G. Ennis, S. K. Amundsen and G. R. Smith, Genetics 115, 11 (1987).
54. J. M. Louarn, J. P. Bouche, F. Legendre, J. Louarn and J. Patte, MGG 201, 467 (1985).
55. J.-E. Robello, V. François and J.-M. Louarn, PNAS 85, 9391 (1988).
56. Y. Komoda, M. Enomoto and A. Tominaga, Genetics 129, 639 (1991).
57. M. Faelen and A. Toussaint, J. Bact. 142, 391 (1980).
58. C. W. Hill and B. W. Harnish, PNAS 78, 7069 (1981).
59. C. W. Hill and J. A. Gray, Genetics 119, 771 (1988).
60. M. A. Schofield, R. Agbunag and J. H. Miller, Genetics 132, 295 (1992).
61. J. Zieg, M. Hilmen and M. Simon, Cell 15, 237 (1978).
62. R. C. Johnson, M. B. Bruist, M. B. Glaccum and M. I. Simon, CSHSQB 49, 751 (1984).
63. P. C. Weber, M. Levine and J. C. Glorioso, J. Bact. 170, 4972 (1988).
64. K. E. Kennedy, S. Iida, J. Meyer, M. Stålhammar-Carlemalm, R. Hiestand-Nauer and W. Arber, MGG 189, 413 (1983).
65. F. Bolivar, Gene 4, 121 (1978).
66. P. Prentki, F. Karch, S. Iida and J. Meyer, Gene 14, 289 (1981).
67. S. Iida, J. Meyer and W. Arber, CSHSQB 45, 27 (1980).
68. A. Kornberg and T. Baker, “DNA Replication,” 2nd ed. Freeman, New York, 1992.
69. K. J. Marians, ARB 61, 673 (1992).
70. R. Holliday, Genet. Res. 5, 282 (1964).
71. L. T. Chow, N. Davidson and D. E. Berg, JMB 86, 69 (1974).
72. D. E. Berg, JMB 86, 59 (1974).
73. D. K. Nag and D. E. Berg, MGG 207, 395 (1987).
74. J. Nalbantoglu and M. Meuth, NARes 14, 8361 (1986).
75. O. Hyrien, M. Debatisse, G. Buttin and B. Robert de Saint Vincent, EMBO J. 7, 407 (1988).
76. G. R. Stark, M. Debatisse, E. Giulotto and G. M. Wahl, Cell 57, 901 (1989).
77. M. Fried, S. Feo and E. Heard, BBA 1090, 143 (1991).
78. M. Fried, S. Feo and E. Heard, in “Gene Amplification in Mammalian Cells” (R. E. Kellems, ed.), p. 447. Dekker, New York, 1993.
79. G. H. Nonet, S. M. Carroll, M. L. DeRose and G. M. Wahl, Genomics 15, 543 (1993).
80. S. M. Beverley, J. A. Coderre, D. V. Santi and R. T. Schimke, Cell 38, 431 (1984).
81. T. C. White, F. Fase-Fowler, H. van Luenen, J. Calafat and P. Borst, JBC 263, 16977 (1988).
82. M. Ouellette, E. Hettema, D. Wust, F. Fase-Fowler and P. Borst, EMBO J. 10, 1009 (1991).
83. M. I. Aladjem and S. Lavi, Mutat. Res. 276, 339 (1992).
84. C. Passananti, B. Davies, M. Ford and M. Fried, EMBO J. 6, 1697 (1987).
85. A. B. Futcher, J. Theor. Biol. 119, 197 (1986).
86. A. W. Sjostedt, M. Alatalo, J. Wahlstrom, U. von Dobeln and R. Olegard, Hereditas 111, 115 (1989).
The Elongation Phase of Protein Synthesis

JOHN CZWORKOWSKI AND PETER B. MOORE

Department of Chemistry
Department of Molecular Biophysics and Biochemistry
Yale University
New Haven, Connecticut 06520
I. The Elongation Cycle
   A. The Two-site Model
   B. How Many Sites Are There?
   C. Alternatives to the Two-site Model
   D. The Fidelity of Protein Synthesis
   E. Factor-free Translation
   F. Rates, States, and Energies
II. Elongation Factors
   A. The EF-Tu Cycle
   B. The EF-G Cycle
   C. Antibiotic Inhibitors
   D. Elongation Factor Interactions: The 1060 Region
   E. Elongation Factor Interactions: The SRL
   F. Elongation Factor Interactions: The 30-S Subunit
   G. Structures of the Factors
III. On the Mechanism of Elongation
   A. On the Placement of Ribosomal Sites
   B. Factor and tRNA Orientations
   C. Models for Translocation
IV. Concluding Remarks
References
Elongation is the phase of the protein-synthesis pathway responsible for the growth of nascent polypeptide chains. Not surprisingly, many reviews of this critically important area of biochemistry have appeared in the four decades since it was discovered (1-3). Another review is appropriate now because the structures of several of the macromolecules involved in elongation have been solved recently: the sarcin/ricin loop from 23-S/28-S rRNAs (4, 5), the elongation factor Tu·GTP complex (6, 7), the elongation factor Tu·GTP·(aminoacyl-tRNA) complex (8), and elongation factor G in both its nucleotide-free (9) and GDP forms (10). Our goal is to integrate this new information with what is already known about elongation.
I. The Elongation Cycle

The emergence of the elongation cycle from obscurity began in the late 1950s as investigations of the mechanism of protein synthesis got under way. By the early 1960s, ribosomes, tRNA, and mRNA (the principal components of the protein-synthesis apparatus) had all been discovered, and their roles understood. The “two-site model” for protein synthesis, elaborated by Watson in 1964 (11), summarized what had been learned, and has provided the conceptual framework for discussions of protein synthesis ever since.
A. The Two-site Model

The two-site model postulates that the ribosome has two sites for tRNA binding: one that binds aminoacyl-tRNAs preferentially [the A(mino acid) site] and a second that is specific for peptidyl-tRNAs [the P(eptide) site]. In the earlier literature, the A site is sometimes called the acceptor site, and the P site the donor site. It was proposed that elongation of a polypeptide chain by a single amino acid is accomplished by the sequence of steps shown in Fig. 1. The cycle starts with a peptidyl-tRNA bound to the ribosome’s P site and with its A site vacant. The A site then is filled by an aminoacyl-tRNA that has an anticodon complementary to the mRNA codon that programs the A site. Peptide transfer then ensues. Attack of the amino group of the aminoacyl-tRNA bound to the A site on the carbonyl group of the ester bond that links the peptide chain to the P-site tRNA transfers the nascent chain to the A-site tRNA and thus extends it by one residue. In the final (translocation) step, the deacylated P-site tRNA is ejected from the ribosome, the peptidyl-tRNA in the A site moves to the P site, and the mRNA advances by one codon. This returns the system to its initial state, and the elongation cycle is repeated as long as the advance of mRNA across the ribosome presents “sense” codons to the A site.

Four important properties of the elongation system were recognized in the years immediately following Watson’s review. First, it was discovered that the large ribosomal subunit has peptidyl transferase activity; the ribosome is an enzyme (12, 13). Second, the division of labor between the two subunits of the ribosome was clarified. The small subunit binds mRNA, and mediates mRNA-tRNA interactions. In addition to being the peptidyl transferase, the large subunit is involved in all aspects of the ribosome-tRNA interaction that are not strictly mRNA dependent. Third, peptidyl-tRNA
FIG. 1. The two-site model for elongation. This diagram of the two-site model for elongation employs an iconography used in other figures in this chapter. The ribosome is represented as a rectangular shape, the upper two-thirds of which is the 50-S subunit, and the lower third is the 30-S subunit. The tRNA binding sites on the 50-S subunit are explicitly identified, in this instance the A site and the P site. When the two subunits are aligned, as they are here, the corresponding 30-S sites lie immediately below their 50-S counterparts. tRNAs are represented as bars, and are distinguished by their shading. Amino acids are small circles, usually found associated with the tops of tRNAs (= aminoacyl-tRNA); they are shaded the same as the tRNAs with which they are associated. mRNA is a line that crosses the 30-S subunit horizontally. In some instances, it includes shaded segments that stand for distinct codons. Note that when a tRNA interacts with a codon shaded the way it is, a cognate codon-anticodon interaction is implied. Factors are squares (EF-Tu) or circles (EF-G). Their shading indicates whether they are in the GTP or GDP conformation.
and mRNA move across the ribosome in synchrony during translocation, as the model requires (14-16). Fourth, two soluble proteins promote elongation in the cell: elongation factor Tu (EF-Tu) (EF-1α in eukaryotes), which delivers aminoacyl-tRNAs to the A site, and elongation factor G (EF-G) (EF-2 in eukaryotes), which catalyzes translocation (17). Both consume GTP.
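The three steps of the two-site cycle (A-site filling, peptide transfer, translocation) can be written out as a small state machine. The Python sketch below is only an illustration of the bookkeeping the model implies: the class `TwoSiteRibosome`, the function `elongate`, and the three-entry codon table are hypothetical names introduced here, and the sketch ignores elongation factors, GTP, kinetics, and initiation (the first codon is simply assumed to have been decoded already).

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Toy codon table (an assumption for illustration; not the full genetic code).
TOY_CODE: Dict[str, str] = {"UUU": "Phe", "AAA": "Lys", "GGG": "Gly"}


@dataclass
class TwoSiteRibosome:
    codons: List[str]                                 # mRNA as codons, 5' to 3'
    a_index: int = 1                                  # codon programming the A site
    p_site: List[str] = field(default_factory=list)   # chain on the P-site tRNA
    a_site: List[str] = field(default_factory=list)   # chain on the A-site tRNA

    def fill_a_site(self) -> bool:
        """Bind the aminoacyl-tRNA matching the A-site codon, if it is a sense codon."""
        if self.a_index >= len(self.codons):
            return False
        residue = TOY_CODE.get(self.codons[self.a_index])
        if residue is None:
            return False
        self.a_site = [residue]
        return True

    def peptide_transfer(self) -> None:
        """Move the nascent chain onto the A-site tRNA, extending it by one residue."""
        self.a_site = self.p_site + self.a_site
        self.p_site = []                              # P-site tRNA is now deacylated

    def translocate(self) -> None:
        """Eject the deacylated tRNA, shift peptidyl-tRNA from A to P, advance one codon."""
        self.p_site, self.a_site = self.a_site, []
        self.a_index += 1


def elongate(codons: List[str]) -> List[str]:
    """Repeat the cycle until the A site no longer sees a sense codon."""
    first = TOY_CODE[codons[0]]                       # one-residue chain already in the P site
    ribosome = TwoSiteRibosome(codons=codons, p_site=[first])
    while ribosome.fill_a_site():
        ribosome.peptide_transfer()
        ribosome.translocate()
    return ribosome.p_site


if __name__ == "__main__":
    print(elongate(["UUU", "UUU", "AAA", "GGG", "UAA"]))
```

Each pass through the loop returns the ribosome to its starting configuration (peptidyl-tRNA in the P site, A site vacant), which is the sense in which the model describes a cycle.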
B. How Many Sites Are There?

By the late 1960s, there were voices arguing in favor of additional tRNA sites. Before reviewing these claims, it is important to remember that “sites” and “states” have always been conflated in the protein-synthesis field. A “site” is a place where tRNA binds to the ribosome. A “state” is a set of biochemical properties characterizing a ribosome-bound tRNA. tRNAs change state during elongation; they enter as aminoacyl-tRNAs, are transformed into peptidyl-tRNAs, and exit as uncharged tRNAs. The number of sites they occupy in the process is less obvious, and the use of “site” as a synonym for “state” has tended to confuse phenomena with their explanations.

Any ribosome-bound peptidyl-tRNA, or peptidyl-tRNA surrogate that reacts with puromycin (an aminoacyl-tRNA analog) to form peptidyl puromycin, is in the P state, by definition. Again by definition, A-state aminoacyl-tRNAs accept peptides from P-state peptidyl-tRNAs, but because peptide transfer normally occurs as soon as an aminoacyl-tRNA enters the A state, it is observable only under conditions that prevent peptide transfer. For that reason, “A site” is often no more than a synonym for “unreactive with puromycin.”

There is compelling evidence that the A state and the P state correlate with distinct ribosomal sites. Nascent peptides are attached to the ribosome at all times through the tRNAs to which they are covalently bonded, and peptidyl-tRNAs must interact with aminoacyl-tRNAs to form new peptide bonds. Because the ribosome must accommodate (at least) two tRNA molecules simultaneously, there must be (at least) two nonoverlapping sites for tRNA binding on the ribosome.

As early as 1965, there was evidence for a third site, which is specific for deacylated tRNA (18). It is called the E(xit) site, and there is clear reason for thinking of it as a site, not just a state; poly(U)-programmed 70-S ribosomes bind three equivalents of deacylated phenylalanine tRNA (19, 20).
The existence of the E state/site has been challenged (21, 22), but is now generally accepted (see 23, 24). The deacylated, P-site-bound tRNAs created by peptide transfer enter the E site during translocation, and leave the ribosome from that site (25).

Some feel that there is a fourth site, through which aminoacyl-tRNAs pass on their way to the A site. It has been called the E(ntry) site (26), or the R(ecognition) site (27), or the T(ransfer) site (28). There are certainly grounds for believing in a T (or R or E) state; aminoacyl-tRNAs bound to ribosomes complexed with EF-Tu do not accept peptides from peptidyl-tRNA (29). It follows that tRNAs in this condition are not in the A state. However, it is not obvious that tRNAs in the T state occupy a T site.
The amino-acid end of aminoacyl-tRNA makes such intimate contact with protein in EF-Tu·GTP·aa-tRNA complexes (8; see Section II,G) that it might not participate in peptide transfer even if T-state aminoacyl-tRNAs occupy the A site. The argument that EF-Tu binds to the ribosome far from the A site (27) is also questionable (see Section III,A). Furthermore, chemical protection data suggest that the anticodon end of T-state tRNAs occupies the A site on the 30-S subunit (28). The T state and the A state are distinguished entirely by differences in acceptor-end protections, which are likely to be affected by the presence of EF-Tu in any case. The only evidence for the T site that we find at all persuasive is the observation that the initial phase of EF-Tu·GTP·aa-tRNA binding to the ribosome is not inhibited when the A site is occupied by tRNA (30), but it is unclear (to us) that the transient state seen when the A site is empty has the same spectral properties as that observed when the A site is occupied by a tRNA. Nor is it obvious that the transient in question is stable enough to justify identifying it with a distinct site.

Entry into both the A site and the P site is controlled by mRNA interactions, which implies that small subunit interactions contribute to both (31, 32). Because both P-site and A-site tRNAs must interact with the peptidyl transferase, those sites must have large subunit components also. All are agreed that the large subunit contributes much of the E site; the binding of tRNA to 50-S subunits reported in the early 1960s (33) is almost certainly E-site binding (34). Furthermore, E-site binding is not seen if alterations are made to the CCA end of tRNA, which interacts with the large subunit during elongation (35, 36). It does appear, however, that E-site binding is stabilized by mRNA interactions, which implies some degree of small subunit involvement (25, 37, 38).
A fifth site, the S site, was adumbrated recently, which is specific for deacylated tRNA, but because this site does not appear to be involved directly in elongation, it is not considered further (39).
C. Alternatives to the Two-site Model

Although it is evident that the two-site model for elongation is no longer adequate, no consensus has emerged about what should replace it. Two models are currently under active consideration: the “hybrid-sites model” (28) and the “allosteric three-sites model” (24).

Study of the effects of tRNA and elongation factor binding on the reactivity of rRNA bases has generated the findings on which the hybrid-sites model is based (40). tRNAs appear to interact with ribosomes primarily at their ends. The 16-S-rRNA-reactivity changes observed when tRNAs bind to ribosomes depend entirely on interactions involving the anticodon stem/loop of tRNA, and the 23-S-rRNA alterations seen reflect interactions with
the CCA end of tRNA only. Further, tRNAs bound to the ribosome in the A, P, E, and T states can be distinguished by their protection patterns. The A, P, and T states have signatures on both subunits whereas the E state has a large subunit fingerprint only. [“State” is used here because there is no evidence that the reactivity patterns observed report directly on tRNA position.] Finally, elongation involves hybrid states, i.e., states that can be explained by postulating that tRNAs bind to one site on the small subunit while they bind to a different site on the large subunit (28). Just prior to peptide-bond formation, for example, the aminoacyl-tRNA is A state at both ends, and a peptidyl-tRNA is P state at both ends. Immediately following peptidyl transfer, the acceptor end of the newly deacylated tRNA is in the E state on the large subunit while its anticodon stem/loop remains in the P state on the small subunit. At the same time, the CCA end of the new peptidyl-tRNA is P state on the large ribosomal subunit, while its anticodon end remains in the A state on the small subunit. EF-G catalyzes the resolution of these hybrid states; after translocation, the deacylated tRNA is exclusively E state while the peptidyl-tRNA is now P state at both ends. If one equates “states” with “sites,” the hybrid-sites model emerges (Fig. 2). There is physical evidence that tRNAs move when peptide transfer occurs, but that the peptide hardly moves at all, as the hybrid-sites model requires (41-43). It should also be noted that it was proposed almost a decade ago that the large ribosomal subunit contains a “tunnel” connecting the peptidyl transferase region with the back side of the subunit where nascent peptides first become exposed to solvent (44-46). If nascent peptides must be threaded through this tunnel, they ought not to move very much during a single iteration of the elongation cycle. 
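The sequence of protection states just described can be tabulated per tRNA and per subunit. The Python sketch below is a minimal restatement of that description, not a model of the ribosome: the stage names, the labels "incoming tRNA" and "resident tRNA", and the helper functions are hypothetical, and `None` is used to encode the observation that the E state has no small-subunit signature.

```python
from typing import Dict, List, Optional, Tuple

# (state on the small subunit, state on the large subunit).
# None means that no protection signature is reported on that subunit.
State = Tuple[Optional[str], str]

HYBRID_CYCLE: Dict[str, Dict[str, State]] = {
    "before peptide transfer": {
        "incoming tRNA": ("A", "A"),   # aminoacyl-tRNA, A state at both ends
        "resident tRNA": ("P", "P"),   # peptidyl-tRNA, P state at both ends
    },
    "after peptide transfer": {
        "incoming tRNA": ("A", "P"),   # now peptidyl-tRNA: anticodon in A, CCA end in P
        "resident tRNA": ("P", "E"),   # now deacylated: anticodon in P, CCA end in E
    },
    "after translocation": {
        "incoming tRNA": ("P", "P"),   # peptidyl-tRNA, P state at both ends
        "resident tRNA": (None, "E"),  # deacylated tRNA, exclusively E state
    },
}


def is_hybrid(state: State) -> bool:
    """A tRNA is in a hybrid state when its two ends report different states."""
    small, large = state
    return small is not None and small != large


def hybrid_trnas(stage: str) -> List[str]:
    """Names of the tRNAs that are in hybrid states at the given stage."""
    return sorted(name for name, state in HYBRID_CYCLE[stage].items() if is_hybrid(state))


if __name__ == "__main__":
    for stage in HYBRID_CYCLE:
        print(stage, "->", hybrid_trnas(stage) or "no hybrid states")
```

Laid out this way, hybrid states appear only between peptide transfer and translocation, which is the interval in which EF-G is said to act.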
The hybrid-sites model explains the processivity of protein synthesis, which the two-site model does not. Transfer RNAs are attached to the ribosome at one end or the other throughout their passage across the surface of the ribosome. In addition, if one hypothesizes that the creation and annihilation of hybrid sites depend on relative motions of the two ribosomal subunits, the two-subunit architecture of the ribosome is rationalized as well. However, it is not clear that ribosomal subunits move during elongation. On the one hand, two states of the ribosome have been identified functionally, a pretranslocational state and a posttranslocational state (see Section I,F), and there is physical evidence that they differ conformationally (47). On the other hand, there is almost nothing in the protection data that correlates with that difference in state; the protection data suggest that tRNAs “walk” across a stationary ribosomal surface rather than being moved by subunit rearrangements.

FIG. 2. A three-site, hybrid-sites model for elongation. This diagram of the three-site, hybrid-sites model for elongation conforms to the iconography described in the legend for Fig. 1 in most respects. Note that three tRNA sites are identified on the 50-S subunit: an A site, a P site, and an E site. Note also that the pretranslocational and posttranslocational states of the ribosome are distinguished. The former is represented as a ribosome whose two subunits do not align, and the latter is represented by a ribosome whose subunits do align. The post- to pretranslocational transition is postulated to occur as aminoacyl-tRNA is delivered to the ribosome, preserving the parallelism that is presumed to exist between the mechanisms of action of EF-Tu and EF-G (see Section II,G). This distorts the way tRNAs associate with the ribosome; the tRNAs are slanted in the diagram. This distortion is relieved on peptide-bond formation; the tRNAs return to the upright position. Release of deacylated tRNA from the E site is postulated to occur as the EF-Tu ternary complex binds to the ribosome. Obviously, other three-site, hybrid-sites models for elongation that are consistent with existing data could be proposed.

Another virtue claimed for the hybrid-sites model is its capacity to explain why puromycin reacts with A-site-bound peptidyl-tRNA, albeit very slowly (48, 49). It reacts, the argument goes, because the CCA end of A-site-bound peptidyl-tRNA occupies the P site on the 50-S subunit immediately following peptide transfer, leaving the 50-S A site open for puromycin. However, until we know where puromycin is when it reacts with peptidyl-tRNA, both before and after translocation, the interpretation of this observation will remain unclear.

The allosteric three-site model was formulated to account both for the existence of the E site, and for evidence that occupancy of the E site by deacylated tRNA reduces the affinity of the A site for aminoacyl-tRNA, and vice versa (24, 50). Among other things, the negative cooperativity of the two sites explains why the number of tRNAs bound to elongating ribosomes never exceeds two. When aminoacyl-tRNA binds to the A site of a ribosome that already has tRNAs in its P and E sites, the affinity of the E site for deacylated tRNA drops and the tRNA leaves the ribosome. However, in vitro protein-synthesizing systems are notoriously sensitive to experimental conditions, and others have found it difficult to confirm that the negative cooperativity on which the model depends actually exists (51).

Although advocates of the hybrid-sites and allosteric three-sites models appear to regard them as being in competition, there is no reason for doing so. The two models explain nonoverlapping bodies of data: chemical probing data in the case of the hybrid-sites model, and enzymatic data in the case of the allosteric three-sites model. Even if tRNAs bind to hybrid sites during elongation, there is no reason why the A site and the E site might not interact the way the allosteric three-site model requires. The problems that need to be addressed are whether tRNAs really do occupy hybrid sites during elongation, and whether the A site and E site really do interact.
How robust are the Moazed-Noller protection experiments (40), for example? Surprisingly, this group’s conclusions about prokaryotic elongation factor binding (52) are not supported by mammalian data reported recently (53). In addition, they trapped the intermediates they characterized under a wide variety of conditions. To what extent do protection patterns depend on conditions? Finally, until the groups studying interactions between the E site and the A site resolve their differences, we should reserve judgement about the allosteric component of the allosteric three-sites model.
D. The Fidelity of Protein Synthesis

It has been understood since the early 1960s that the accuracy with which mRNA sequences are translated into amino-acid sequences depends ultimately on base-pairing between mRNA codons and tRNA anticodons. If the rate at which elongating ribosomes make errors [about one wrong amino
acid incorporated for every $10^3$ to $10^4$ residues of protein synthesized (54)] were determined entirely by codon-anticodon affinities, the difference in binding free energy between cognate tRNAs and closely related, noncognate tRNAs for mRNA-programmed ribosomes would have to be at least 18 kJ/mol (at 310 K). In solution, the entire free energy of codon-anticodon binding is barely that large (55), and the differences in free energy between cognate and near-cognate pairings are 10 kJ/mol or less (56). However, base-pairing energies are context dependent, and in the context provided by the ribosome, codon-anticodon discriminations exceeding 18 kJ/mol can be observed, but only if measurements are made under conditions that prevent peptide bond formation (57).

Although the capacity of mRNA-programmed ribosomes to bind tRNAs differentially is fundamental to fidelity, error rates cannot be predicted on the basis of binding constants alone. Fidelity is a kinetic phenomenon, not a thermodynamic one, and no translating system that makes protein at a finite rate and that uses a single binding step to discriminate between tRNAs can possibly have an error rate as favorable as that predicted from differences in equilibrium binding constants (58, 59). Indeed, as expected, actively translating ribosomes bind tRNAs with a discrimination that is much lower than the thermodynamic ideal (57).

It is easy to understand why fidelity depends on the rate of peptide-bond formation. Suppose a Michaelis-Menten mechanism explained amino-acid incorporation:
$$A_c + R_c \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} A_c R_c \xrightarrow{k_2} P,$$
where $A_c$ is a charged tRNA, $R_c$ is a ribosome carrying a peptidyl-tRNA in its P site that is programmed to accept $A_c$, and $P$ is the ribosome after its nascent polypeptide chain has been extended by one residue. Suppose in addition that the corresponding rate constants for a noncognate aminoacyl-tRNA, $A_n$, binding to the same ribosome are $k'_1$, $k'_{-1}$, and $k_2$. If the two kinds of aminoacyl-tRNA compete for the same site on the ribosome and the rate of peptide-bond formation is the same for all aminoacyl-tRNAs once they reach the A site, then when $[A_c] = [A_n]$, the steady-state ratio of “wrong” amino acids incorporated to “right” amino acids incorporated, $E$, will be:
$$E = \frac{k'_1 (k_{-1} + k_2)}{k_1 (k'_{-1} + k_2)}.$$
If $k_2$ is slow compared to the off-rates for cognate and noncognate aminoacyl-tRNAs, $E = (k'_1 k_{-1})/(k_1 k'_{-1})$, which is the ratio of the equilibrium binding constants for the two aminoacyl-tRNAs, “the thermodynamic limit.” If $k_2$ is
fast compared to the off-rates for the two aminoacyl-tRNAs, the error rate will become the ratio of on-rates ($k'_1/k_1$), which is likely to be close to 1 because on-rates depend on the frequency of tRNA encounter with ribosomes. Thus, the faster such a system makes peptide bonds, the less accurately it translates.

These considerations notwithstanding, mechanisms can be proposed that would enable the translation system to achieve a fidelity exceeding the thermodynamic limit (58, 59). All such “proofreading” mechanisms include one or more intermediate steps between the formation of the initial ribosome·aminoacyl-tRNA complex and the stage at which peptide-bond formation becomes possible. Provided noncognate aminoacyl-tRNAs leave the ribosome faster than cognate aminoacyl-tRNAs at each such step, and provided also that these dissociations are made irreversible by coupling to a free-energy-releasing reaction (e.g., GTP hydrolysis), fidelities significantly greater than the thermodynamic limit can be achieved. For mechanisms with $n$ energy-dissipating branch steps, the maximum fidelity possible, which is still achievable only in the slow-synthesis limit, corresponds to an error ratio of $[(k'_1 k_{-1})/(k_1 k'_{-1})]^{n+1}$.

Some advocates of the concept that proofreading occurs during protein synthesis have proposed that the steps in the elongation cycle responsible for fidelity are (1) the binding of EF-Tu·GTP·aa-tRNA to the ribosome and (2) peptide transfer (60). Because aminoacyl-tRNAs cannot form peptide bonds when complexed with EF-Tu (see Section I,B), and GTP cleavage is required for EF-Tu release, the two steps are cleanly separated, as they must be if proofreading is to occur. Plausible as this hypothesis may seem, it is unlikely to be true. The error rate for the peptidyl-transfer step will be $(k_i + k_p)/(k_j + k_p)$, where $k_p$ is the rate constant for peptide-bond formation, and $k_i$ and $k_j$ are the rate constants for the dissociation of cognate and noncognate aminoacyl-tRNAs from the A site, respectively.
Discrimination will be achieved only if $k_i$ is slow compared to $k_p$ and $k_j$ is fast compared to $k_p$. Unfortunately, the half-lives of (cognate) tRNAs bound to the A and P sites are measured in hours (61) whereas the half-life for peptide-bond formation is tens of milliseconds (see Section I,F). If noncognate tRNAs had half-lives $10^{-6}$ of that of cognate tRNAs, as they would have to if peptide transfer is to be selective, there would be no need for proofreading in the first place (see 62 for further discussion).

Recently, advocates of the allosteric three-site model have suggested that the reason the proofreading step in the elongation cycle has not been identified is that proofreading does not occur (24). Evidence has been advanced that the allosteric decrease in A-site affinity for aminoacyl-tRNAs induced by E-site occupancy affects generic tRNA-ribosome interactions, not the codon-dependent interactions on which tRNA discrimination depends. If the
strength of these nonspecific interactions is reduced, the argument goes, the capacity of the A site to discriminate between cognate and noncognate tRNAs should go up. Consistent with this argument, E-site occupancy does indeed appear to increase fidelity (63). It would be premature to conclude that proofreading does not occur, however. If E-site binding resulted in an increase of $k_{-1}$ and $k'_{-1}$ relative to $k_1$, $k'_1$, and $k_2$, fidelity could indeed increase, but the tRNA dissociation rate data just discussed imply that the effect would have to be huge in order for ribosomes to operate as close to the thermodynamic limit as they must to achieve observed error rates.

Those skeptical of proofreading must also explain why misincorporation is associated with increased GTP consumption by EF-Tu, as the proofreading hypothesis predicts. In nontranslocating systems, incorporation of near-cognate amino acids is associated with a 10-fold increase in GTP hydrolysis per amino acid incorporated, relative to cognate incorporation, and, interestingly, aminoacyl-tRNAs whose anticodon sequences do not pair at all with an mRNA do not stimulate GTP consumption (64). For systems translocating normally, the level of stimulation of GTP cleavage by near-cognate tRNAs is about 50-fold relative to that of cognate tRNAs (65). Furthermore, the “extra” GTP hydrolysis associated with miscoding ceases in the presence of streptomycin, which is known to stimulate miscoding (60, 66). We conclude that proofreading occurs during translation, but are acutely aware of our inability to specify how.
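The quantitative claims in this section can be checked numerically. The Python sketch below evaluates three things: the free-energy gap that would be needed if affinities alone set the error rate, the steady-state error ratio of the single-step Michaelis-Menten scheme as the rate of peptide-bond formation varies, and the bound for a scheme with energy-dissipating proofreading steps. The function names and the rate constants in `RATES` are hypothetical, chosen only so that the noncognate tRNA dissociates 1000 times faster than the cognate one; they are not measured values.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def required_discrimination_kj(error_rate: float, temperature_k: float = 310.0) -> float:
    """Free-energy gap (kJ/mol) needed if binding affinities alone set the error rate."""
    return R * temperature_k * math.log(1.0 / error_rate) / 1000.0


def error_ratio(k1: float, k_minus1: float, k1_nc: float, k_minus1_nc: float, k2: float) -> float:
    """Steady-state ratio of wrong to right incorporations at equal tRNA concentrations."""
    return k1_nc * (k_minus1 + k2) / (k1 * (k_minus1_nc + k2))


def thermodynamic_limit(k1: float, k_minus1: float, k1_nc: float, k_minus1_nc: float) -> float:
    """Error ratio when peptide-bond formation is slow compared to both off-rates."""
    return (k1_nc * k_minus1) / (k1 * k_minus1_nc)


def proofreading_limit(limit: float, n_steps: int) -> float:
    """Best error ratio with n energy-dissipating discard steps (slow-synthesis limit)."""
    return limit ** (n_steps + 1)


# Hypothetical rate constants in arbitrary units (assumed for illustration only).
RATES = dict(k1=1.0, k_minus1=1.0, k1_nc=1.0, k_minus1_nc=1000.0)

if __name__ == "__main__":
    for err in (1e-3, 1e-4):
        print(f"error rate {err:g}: gap of {required_discrimination_kj(err):.1f} kJ/mol")
    for k2 in (1e-3, 1.0, 1e3, 1e6):
        print(f"k2 = {k2:g}: E = {error_ratio(k2=k2, **RATES):.3g}")
    limit = thermodynamic_limit(**RATES)
    print(f"one proofreading step: E >= {proofreading_limit(limit, 1):.1g}")
```

With these assumed numbers the error ratio rises from the thermodynamic limit toward the ratio of on-rates as `k2` grows, which is the trade-off between speed and accuracy described in the text.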
E. Factor-free Translation

In the 1960s, it was discovered that translation can occur in the absence of factors (67-69). Ribosomes bind mRNAs such as poly(U) spontaneously; once an mRNA is in place, the A and the P sites readily accept tRNAs in an mRNA-encoded fashion in the absence of EF-Tu. If the P site is filled with a peptidyl-tRNA or a peptidyl-tRNA surrogate (the P site always fills first) and the A site is filled by an aminoacyl-tRNA, the ribosome will catalyze peptide-bond formation, and translocation can then ensue in the absence of EF-G. This process, which is called “factor-free” or “nonenzymatic” translation, is much slower than factor-assisted translation at physiological temperatures, and it occurs only in the presence of a few mRNAs; poly(U) works, most other messengers do not. However, it is sensitive to antibiotics that inhibit protein synthesis except, predictably, those that interfere with factor function. In addition, nonhydrolyzable GTP analogs, which inhibit elongation factors, have no effect, and the length of the peptides produced depends on mRNA length in the normal manner: one amino acid for every mRNA triplet (70). It is likely, therefore, that factor-free translation is mechanistically similar to normal translation. One concludes that elongation factors must not
endow ribosomes with (qualitative) properties that they would otherwise lack, e.g., the capacity to translocate. Factors facilitate processes that are inherent in the ribosome.

Curiously, the rate of factor-free translation increases significantly if ribosomes are pretreated with p-(chloromercuri)benzoate (PCMB). Ribosomal protein S12 is the target of PCMB action, and ribosomes lacking S12 also show a high rate of factor-free translation (71). It is interesting that S12 can be cross-linked to EF-G (72, 73), and there is also a connection between S12 and fidelity. Many mutants resistant to streptomycin, an antibiotic that stimulates miscoding, are mutants in ribosomal protein S12, and some of these make coding errors at rates below normal (74). Even more curious, PCMB-stimulated, factor-free translation is characterized by a miscoding level well below that seen when factors are present (70). This could be due to the slow rate of factor-free translation, but it is far from obvious that this is the case.
F. Rates, States, and Energies

Ideally, one would like to know the rate constants for all of the steps of the elongation cycle in a translation system that is making protein at physiological rates. Unfortunately, most of the kinetic data in the literature were obtained under suboptimal conditions with systems inhibited from making protein in some way so that individual steps could be studied. The interpretation of those data is further complicated by the difficulty that attends the preparation of ribosome populations in which even a majority of particles are active, and by the sensitivity of rate constants to ionic conditions. The only in vivo rate we know is the overall rate of polypeptide elongation; it is between 10 and 20 residues per second. We also know that elongation rates of that order can be achieved in vitro (75).

Nevertheless, important aspects of protein synthesis have been illuminated by kinetic and thermodynamic measurements. For example, it has long been believed that ribosomes alternate between a pretranslocational state and a posttranslocational state during elongation. The classic experiments done to validate this concept demonstrated the expected variation in the factor-binding properties of ribosomes at different stages in the elongation cycle (76). However, because they were done using ribosomes that had tRNA bound, and because the occupancy of the A and P sites changes during elongation, it was possible that tRNA rearrangements, not ribosomal conformation changes per se, were responsible for what was seen.

Both EF-G and EF-Tu have GTPase activities that are stimulated by tRNA-free ribosomes, their so-called uncoupled (to translation) GTPase activities (17; see Sections II,A and II,B), and the recent finding that the uncoupled GTPase activities of the two factors interact synergistically has
provided persuasive evidence for the validity of the two-state hypothesis. When elongation factors are purified so that cross-contamination is eliminated and the GTPase activities of the two factors are measured under multiturnover conditions where ribosomes are limiting, the activity observed when both EF-Tu and EF-G are mixed with ribosomes is greater than the sum of the activities of the same amounts of the two factors measured separately (77). Because EF-Tu and EF-G compete for the same ribosomal binding site (78-81), the only way they can enhance each other's activities is if each prepares the ribosome to accept the other. Thus EF-Tu must drive empty ribosomes from the posttranslocational state to the pretranslocational state, and EF-G must accomplish the reverse. Furthermore, in order for either factor to have an uncoupled GTPase activity in the absence of the other when ribosomes are limiting (which they do), empty ribosomes must cycle between the two states spontaneously at an appreciable rate. Existing data suggest that at 37°C, the spontaneous, post- to pretranslocational transition rate is of the order of 1.5 sec⁻¹ (77). The data are less clear for the pre- to posttranslocational transition, which may proceed at a rate as low as 0.15 sec⁻¹. This implies that empty ribosomes prefer the pretranslocational state, consistent with earlier observations (82), and the stimulatory effect of EF-G on EF-Tu activity is greater than the reverse, as expected. Data exist for eukaryotic ribosomes that could be interpreted the same way, but the picture is not as clear at this point (83).

The same isomerization has been studied using ribosomes that have mRNA and tRNA bound. The equilibrium constant for the pre- to posttranslocational transition has not been measured directly under these conditions, but it is known that the activation energy for the pre- to posttranslocational transition in the absence of elongation factors is about the same as that for the reverse.
This indicates that the free energy difference between the two states is small relative to the activation energy, but the activation energy is quite large, about 85 kJ/mol (69, 84, 85). The rate of factor-free translocation is about the same as that estimated for the pre- to posttranslocational transition in empty ribosomes, about 0.08 sec⁻¹ at 37°C (85). The activation energy for EF-G-catalyzed translocation is about 30 kJ/mol (69). EF-Tu delivers aminoacyl-tRNAs to the A site of posttranslocational ribosomes. Once this happens, peptide-bond formation usually follows immediately, and is accompanied by a shift from the post- to the pretranslocational conformation. Which comes first is unknown. The activation energy associated with placement of aminoacyl-tRNA in the A site by EF-Tu has been measured using ribosomes carrying a deacylated tRNA in the presumed canonical P site so that peptide-bond formation will not occur. It is about 35 kJ/mol (85). It appears that this step is rate limiting in elongation (85-87).
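The numbers quoted in this section can be combined in a short calculation. The Python sketch below is illustrative only: it treats empty ribosomes as a simple reversible two-state system, and for the comparison of activation energies it assumes that factor-free and EF-G-catalyzed translocation share the same Arrhenius prefactor. Neither assumption is made in the text, and the function names are hypothetical.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def pre_state_fraction(k_post_to_pre, k_pre_to_post):
    """Equilibrium fraction of empty ribosomes in the pretranslocational state.

    Assumes a simple reversible two-state scheme (an illustrative assumption).
    """
    return k_post_to_pre / (k_post_to_pre + k_pre_to_post)


def free_energy_difference_kj(k_post_to_pre, k_pre_to_post, temp_c=37.0):
    """G(pre) - G(post) in kJ/mol, from the ratio of the two rate constants."""
    temp_k = temp_c + 273.15
    equilibrium_constant = k_post_to_pre / k_pre_to_post  # [pre]/[post]
    return -R * temp_k * math.log(equilibrium_constant) / 1000.0


def arrhenius_speedup(ea_uncatalyzed_kj, ea_catalyzed_kj, temp_c=37.0):
    """Rate ratio implied by two activation energies.

    Assumes equal Arrhenius prefactors, which need not hold in practice.
    """
    temp_k = temp_c + 273.15
    delta_ea = (ea_uncatalyzed_kj - ea_catalyzed_kj) * 1000.0  # J/mol
    return math.exp(delta_ea / (R * temp_k))


if __name__ == "__main__":
    # Spontaneous rates quoted for empty ribosomes at 37 degrees C (sec^-1).
    print(pre_state_fraction(1.5, 0.15))
    print(free_energy_difference_kj(1.5, 0.15))
    # Activation energies quoted for factor-free and EF-G-catalyzed translocation.
    print(arrhenius_speedup(85.0, 30.0))
```

With the quoted rates, ten-elevenths of empty ribosomes would sit in the pretranslocational state, a free energy difference of roughly 6 kJ/mol, which is indeed small next to an 85 kJ/mol barrier. The Arrhenius ratio is only as good as the equal-prefactor assumption and should not be read as a prediction of the observed rate enhancement.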
II. Elongation Factors
A. The EF-Tu Cycle
EF-Tu is the most abundant protein in bacterial cells (5 to 10% of the total). There are about 10 copies per ribosome, about as much as there is aminoacyl-tRNA, and most of the EF-Tu in the cell is found complexed with aminoacyl-tRNA (88, 89). In addition to catalyzing the delivery of aminoacyl-tRNAs to mRNA-programmed ribosomes, the complexation of aminoacyl-tRNA with EF-Tu protects it from deacylation (90). Experiments done in the 1960s led to the formulation of an EF-Tu cycle (91, 92) that has endured as a paradigm, much as Watson’s two-state model has for elongation (Fig. 3). EF-Tu is a member of the regulatory GTPase, or “G protein,” family; its N-terminal domain is a guanine-nucleotide-binding domain (93, 94). As with other G proteins, the affinity of EF-Tu for its macromolecular ligands is determined by whether it has GDP or GTP bound to it. When complexed with GTP, EF-Tu binds to aminoacyl-tRNA, and that ternary complex has high affinity for the A site (or T site?) of the ribosome, provided its tRNA interacts properly with the mRNA codon present in the small subunit A site. The ribosome functions as a GTPase activator protein (GAP) for EFTu·GTP·aa-tRNA. Thus shortly after a cognate EFTu·GTP·aa-tRNA binds to the ribosome, its GTP is cleaved. EFTu·GDP dissociates from the ribosome because it has relatively low affinity both for ribosomes and for aminoacyl-tRNA. The ribosomal phase of the EF-Tu cycle has been dissected kinetically into the following steps: (1) initial binding of the ternary complex, which is not codon dependent; (2) codon recognition, which triggers an alteration in the conformation of the tRNA D loop and anticodon; (3) GTP hydrolysis, which is activated by conformational changes induced by codon-anticodon recognition; (4) release of EF-Tu from the ribosome; and (5) full tRNA entry into the A site, which involves a further conformational change in the tRNA (51, 95).
One presumes, tentatively, that proofreading occurs at some point during this sequence of steps. The replacement of EF-Tu-bound GDP by GTP is catalyzed by a specific guanine-nucleotide exchange factor (GEF) called EF-Ts. Without it, EF-Tu does not recycle because the affinity of EF-Tu for GDP is considerably higher than for GTP. [The kinetic and thermodynamic constants for the interactions of the EF-Tu cycle have been determined (96-98).] The GTPase activity of EF-Tu, which is normally activated by the ribosome in the presence of aminoacyl-tRNA and mRNA, can also be induced by monovalent cations (99), free ribosomes (100), 50-S ribosomal subunits, 50-S core particles to which ribosomal protein L7/L12 has been added (101), and
FIG. 3. The EF-Tu cycle. The only new iconography introduced in this diagram is the symbol for EF-Ts, which is represented as an oval. This rendering of the EF-Tu cycle assumes that only one GTP is cleaved per aminoacyl-tRNA delivered to the ribosome, but otherwise conforms to the version of the elongation cycle shown in Fig. 2.
the antibiotic kirromycin (102). The ribosome-dependent GTPase of EF-Tu is stimulated by 3′ fragments of aminoacyl-tRNA as small as aminoacyladenosine, and even by unacylated tRNA missing its 3′ CCA end (103-108). If kirromycin and aminoacyl-tRNA are present, either ribosomal subunit can additionally stimulate EF-Tu’s GTPase (109). As expected, the G domain of EF-Tu has nucleotide binding and GTPase activities in the absence of the rest of the protein, but it does not interact productively with tRNA, EF-Ts, or the ribosome. However, the G domain of EF-Tu can be cross-linked to 23-S rRNA (40, 93, 110). Although it has long been believed that 1:1 complexes between EF-Tu, GTP, and aminoacyl-tRNA deliver aminoacyl-tRNA to the ribosome (e.g., 111, 112), there is evidence that at least under some circumstances it takes two EF-Tus and two GTPs to deliver a single aminoacyl-tRNA. This has been
demonstrated for poly(U)-programmed translation systems both kinetically and by experiments done using EF-Tu mutants specific for xanthosine triphosphate (75, 113-116). There is physical evidence for the existence of complexes of the form (EFTu·GTP)₂·aa-tRNA; the stoichiometry of the EFTu·tRNA complex depends on temperature (117). In this connection, it is interesting that EF-Tu can form large, filamentous polymers (118, 119) that interact with both nucleotides and EF-Ts (120). Furthermore, dimers of EFTu·EFTs have been observed in Thermus thermophilus (121, 122). It has been suggested that multiple GTP cleavages by EF-Tu occur only in response to specific mRNA sequences (123). There may be a relationship between messages that include homopolymeric stretches, “extra” cleavage of GTP, and frameshifting. Be that as it may, it is essential that additional studies be done to determine how many GTPs per amino acid incorporated are normally consumed by EF-Tu. Until this issue is fully resolved, it will be impossible to give a satisfying account of the mechanism of elongation.
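To see what is at stake in the stoichiometry question, the following minimal Python sketch multiplies the elongation rate range quoted in Section I,F (10 to 20 residues per second) by the two candidate values for the number of GTPs cleaved by EF-Tu per aminoacyl-tRNA delivered. The function name is hypothetical, and the calculation covers only EF-Tu-associated hydrolysis.

```python
def ef_tu_gtp_per_second(residues_per_second, gtp_per_aa_trna):
    """GTP molecules cleaved by EF-Tu per ribosome per second."""
    return residues_per_second * gtp_per_aa_trna


if __name__ == "__main__":
    for gtp_per_aa_trna in (1, 2):  # the two stoichiometries under discussion
        low = ef_tu_gtp_per_second(10, gtp_per_aa_trna)
        high = ef_tu_gtp_per_second(20, gtp_per_aa_trna)
        print(f"{gtp_per_aa_trna} GTP per aa-tRNA: {low} to {high} GTP/sec per ribosome")
```

The two hypotheses differ by a factor of two in the EF-Tu-associated energy cost of elongation at any given rate.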
B. The EF-G Cycle
EF-G, like EF-Tu, is a G protein that has a single site that binds both GDP (K_D = 6.7 × 10⁻⁷ M) and GTP (K_D = 1.2 × 10⁻⁵ M). [The parameters quoted are for T. thermophilus (121).] Under physiological conditions, that site has a GTPase activity that is strongly stimulated when EF-G interacts with empty ribosomes. The uncoupled rate is about 1 GTP per EF-G molecule per second in T. thermophilus. Estimates of the degree of its stimulation by ribosomes differ enormously because contaminating GTPase activities make it hard to measure the unstimulated rate; it may be as large as 100,000-fold (124). Both 70-S couples and 50-S subunits can stimulate this activity, provided ionic conditions are appropriate (see 1). The 30-S subunits appear to enhance the 50-S effect by stabilizing the interaction between EF-G and the 50-S subunit (125). Because the GTPase activity of EF-G is also stimulated by solvents like isopropanol, it is probable that EF-G, like EF-Tu, contains all of the groups responsible for catalyzing GTP hydrolysis (126, 127). It is also likely that the uncoupled GTPase activity of EF-G is related to its translocase activity. When protein synthesis is in progress, one GTP is cleaved by EF-G per translocation event (128). G nucleotides modulate the affinity of EF-G for the ribosome by controlling its conformation. The affinity of EF-G for the ribosome is much higher when GTP analogs such as GMPPNP, GMPPCP, and presumably GTP are bound (K_D = 3.6 × 10⁻⁵ M; 129), compared to when it is complexed with GDP (K_D too large to measure). Furthermore, the binding of EFG·GMPPCP to pretranslocational ribosomes causes translocation (1, 130). Because GMPPCP cannot be hydrolyzed, EF-G remains bound to the ribosome, and because EF-Tu cannot bind to a ribosome that has EF-G bound (78, 81), elongation stops. The picture of translocation that emerges is outlined in Fig. 4. It begins classically with a ribosome in the pretranslocational state with an empty E site, a discharged tRNA in its P site, and a peptidyl-tRNA in its A site. [The ribosome-bound tRNAs may well be in hybrid states, of course.] The binding of EFG·GTP causes translocation, the resulting shift in ribosome conformation to its posttranslocational state is sensed by EF-G, and its GTPase is activated. Activation of the GTPase activity of EF-G may be associated with the entry of discharged tRNA into the E site: ribosomes carrying deacylated tRNA in their P sites stimulate the GTPase activity of EF-G more strongly than empty ribosomes (131), but this stimulation disappears if the CCA end of tRNA in the P site is damaged (or missing) so that it cannot bind to the E site (132, 133). Pairing between the CCA end of tRNA in the E site and 23-S rRNA may be part of the mechanism that triggers GTP hydrolysis by EF-G. In addition, EF-G does not stimulate P-site tRNA release unless the A site is occupied, and the E site does not release tRNAs properly unless EF-G is able to cleave GTP and alter its conformation properly (134). As judged by their anticodon-to-anticodon separation, A-site and P-site tRNAs appear to be translocated simultaneously, but after translocation, the distance between the anticodons of what are now E-site and P-site tRNAs slowly increases (82,
135). GTP cleavage stimulates dissociation of EF-G from the ribosome by enabling EF-G to adopt its low-affinity, GDP conformation in solution. The binding of GDP to EF-G is loose enough so that passive exchange will replace it with fresh GTP, readying it for another round of translocation. Note that EF-G, unlike EF-Tu and all other G proteins except the EF-G eukaryotic homolog EF-2, has no guanine nucleotide exchange protein. Replacement of GDP by GTP occurs spontaneously. As is the case for many other large proteins, domains of EF-G can be isolated from the intact molecule by partial proteolysis (136). Systematic investigation of the enzymatic properties of these domains has recently shown that the N-terminal domain of EF-G, which is its G domain, binds GTP, as expected, but lacks GTPase activity both in the presence and absence of ribosomes. Remarkably, the C-terminal half of the molecule, the part that remains after the G domain has been removed, which does not interact with G nucleotides, promotes (slow) translocation (137). It is important to note that peptidyl-tRNAs can be translocated by EF-G in the absence of mRNA; ribosome-catalyzed, factor-dependent synthesis of poly(lysine) from lysyl-tRNA can be achieved in the absence of poly(A) (138).
FIG. 4. EF-G-driven translocation. The events that occur when EF-G interacts with the ribosome are shown in this diagram (see Section II,B). At least one inhibitor is known for each of the steps identified, and one such inhibitor is identified to the right of each step (see Section II,C).
This suggests that the translocation mechanism operates on tRNAs; mRNA is dragged along passively. Also, even though there is no evidence that the free energy of EF-G-associated GTP hydrolysis is captured by the elongating
ribosome directly (see 2), the coupling of the activation of the GTPase activity of EF-G to the posttranslocational state ensures that the posttranslocational state is favored thermodynamically by the presence of EFG·GTP. This guarantees that the factor binding site will be vacated by EF-G when the ribosome is in the posttranslocational state, ready to receive the next EFTu·aa-tRNA complex.
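The dissociation constants quoted at the start of this section for free T. thermophilus EF-G show how spontaneous nucleotide exchange depends on the GTP:GDP ratio. The sketch below is a minimal illustration, assuming simple competitive binding of the two nucleotides at a single site with both nucleotides in large excess over EF-G. The function name and the example concentrations are hypothetical and are not taken from the text.

```python
K_D_GDP = 6.7e-7  # M, EF-G.GDP (T. thermophilus, as quoted in the text)
K_D_GTP = 1.2e-5  # M, EF-G.GTP (T. thermophilus, as quoted in the text)


def gtp_bound_fraction(gtp_conc, gdp_conc, kd_gtp=K_D_GTP, kd_gdp=K_D_GDP):
    """Fraction of nucleotide-bound EF-G that carries GTP.

    Assumes competitive binding at one site and free nucleotide
    concentrations (in M) that are not depleted by binding.
    """
    weight_gtp = gtp_conc / kd_gtp
    weight_gdp = gdp_conc / kd_gdp
    return weight_gtp / (weight_gtp + weight_gdp)


# GTP:GDP concentration ratio at which half of the bound factor carries GTP.
break_even_ratio = K_D_GTP / K_D_GDP

if __name__ == "__main__":
    print(break_even_ratio)
    # Hypothetical concentrations chosen only for illustration.
    print(gtp_bound_fraction(gtp_conc=1.0e-3, gdp_conc=1.0e-4))
```

Under these assumptions GTP must exceed GDP by about 18-fold in concentration before most of the free factor is in the GTP form, which is a reminder that passive exchange depends on the nucleotide pool as well as on the looseness of GDP binding.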
C. Antibiotic Inhibitors
Important insights into factor function have been obtained from studies of antibiotics that inhibit EF-G and EF-Tu. Arguably, the most interesting are fusidic acid and kirromycin (139, 140). Fusidic acid is an EF-G inhibitor. In its presence, a single translocation step occurs, GTP is hydrolyzed, but then elongation stops because EFG·GDP·ribosome·fusidic acid complexes will not dissociate (141). Fusidic acid binds neither to ribosomes nor to EF-G separately, and it will not bind in the presence of GMPPCP. It binds only to ribosome·EFG·GDP complexes (142). Furthermore, all known fusidic acid-resistant mutations are EF-G mutations (143). Thus, at the stage fusidic acid binds, the conformation of EF-G must be different from that of ribosome-free EFG·GTP or EFG·GDP, and the transition from the fusidic acid-competent state to the EFG·GDP state must be required for EF-G discharge. [Note that some believe that fusidic acid acts after EF-G has altered its conformation in response to GTP cleavage (134).] It is interesting that fluoroaluminates, which are inorganic phosphate analogs, have the same effect on elongation as fusidic acid; the EFG·GDP·fluoroaluminate complex sticks to the ribosome (144). The effect of kirromycin on EF-Tu mirrors the effect of fusidic acid on EF-G. It inhibits protein synthesis by preventing the release of EFTu·GDP from the ribosome after it has delivered aminoacyl-tRNA to the A site (145, 146). The amino acid of an aminoacyl-tRNA delivered to the ribosome in the presence of kirromycin cannot participate in peptide transfer; apparently it remains protected by EF-Tu. Nevertheless, puromycin can still react with the peptide moiety of the peptidyl-tRNA in the P site (109). In the absence of ribosomes, kirromycin activates the latent GTPase activity of EF-Tu, and greatly reduces the affinity of EFTu·GTP for aminoacyl-tRNA (102, 147). EF-Tu activity is also inhibited by pulvomycin.
It alters the affinity of EF-Tu for G nucleotides, promoting exchange of GDP for GTP, it inhibits EF-Tu’s GTPase, and it weakens the affinity of EFTu·GTP for aminoacyl-tRNAs (148). Thiostrepton, siomycin, micrococcin, and their relatives have also received a great deal of attention. The members of this family of antibiotics bind to a site that is part of the 1060 loop in the 50-S subunit (see Section II,D) (149). In the presence of thiostrepton, EFG·GTP does not form a stable complex with the ribosome, and the stimulatory effect that ribosomes
normally have on its GTPase activity is abolished. The effects are reciprocal; thiostrepton does not affect ribosomes to which EF-G is already bound (150). Thiostrepton also inhibits the ribosome binding activity of the EFTu·aa-tRNA·GTP complex, and the nonenzymatic binding of aminoacyl-tRNA to the A site, but, revealingly, it has no effect on the uncoupled GTPase activity of EF-Tu, either in the presence or absence of aminoacyl-tRNA (104, 140, 151). Finally, both EF-G-catalyzed and nonenzymatic translocation are inhibited by thiostrepton (152). Thus it appears that the thiostrepton site not only interacts with elongation factors, but is part of the molecular machinery that enables ribosomes to translocate in the first place. Micrococcin competes with thiostrepton for binding to the ribosome, and has many of the same effects. It differs in one interesting respect, however. Micrococcin stimulates the uncoupled GTPase activity of EF-G rather than inhibiting it (153). This may not be as surprising as it sounds. By weakening the interaction of EFG·GDP with the ribosome more than it weakens the binding of EFG·GTP to the ribosome, micrococcin could stimulate turnover rather than inhibiting it. On this hypothesis, the only difference between the two drugs would be that thiostrepton is a stronger inhibitor of binding. (It is worth pointing out that “does not bind” or “does not interact” in one report can be equivalent to “binds weakly” in another. Furthermore, there are species-dependent variations in the sensitivity of translation systems to inhibitors. For both reasons, reports about the physiological effect of an antibiotic can differ quantitatively from species to species, even though it affects the translation system in fundamentally the same way in all.)
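The micrococcin hypothesis can be made concrete with a toy kinetic model. The Python sketch below treats uncoupled GTP turnover on a single ribosome as an irreversible three-step cycle (binding of EFG·GTP, hydrolysis, release of EFG·GDP), for which the steady-state turnover rate is the reciprocal of the summed mean dwell times. This scheme and every rate constant in it are assumptions chosen purely for illustration and are not measured values. Weakened binding is modeled crudely as a slower effective binding step, and weakened EFG·GDP retention as faster release.

```python
def turnover_rate(k_bind, k_hydrolysis, k_release):
    """Steady-state turnover (cycles per ribosome per second) of an
    irreversible three-step cycle; all rate constants are in sec^-1."""
    mean_cycle_time = 1.0 / k_bind + 1.0 / k_hydrolysis + 1.0 / k_release
    return 1.0 / mean_cycle_time


# Hypothetical rate constants: release of EFG.GDP is rate limiting without drug.
no_drug = turnover_rate(k_bind=10.0, k_hydrolysis=50.0, k_release=1.0)

# Micrococcin-like: binding weakened slightly, EFG.GDP release accelerated more.
micrococcin_like = turnover_rate(k_bind=5.0, k_hydrolysis=50.0, k_release=10.0)

# Thiostrepton-like: the same kind of effect, but binding is almost abolished.
thiostrepton_like = turnover_rate(k_bind=0.01, k_hydrolysis=50.0, k_release=10.0)

if __name__ == "__main__":
    print(no_drug, micrococcin_like, thiostrepton_like)
```

In this sketch a modest loss of binding combined with faster release raises turnover, whereas a severe loss of binding suppresses it. That is the qualitative behavior the hypothesis requires, although the model says nothing about whether the hypothesis is correct.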
Viomycin has binding sites on both ribosomal subunits (154); in its presence both initiation and elongation are abnormal (139). In protein-synthesizing systems poisoned with viomycin, peptidyl-tRNA is found in the A state; translocation is inhibited (155). However, EF-G still interacts with the ribosome, and its uncoupled GTPase activity is undiminished. It appears that viomycin allows EFG·GTP to interact with the ribosome, but uncouples translocation from activation of the GTPase activity of EF-G. Viomycin also inhibits EF-Tu-dependent delivery of aminoacyl-tRNA to ribosomes whose E sites are occupied, but not those whose E sites are empty (152). Thus, viomycin may inhibit the pre- to posttranslocational transition in both directions in the presence of factors; it does so in factor-free translocation systems (156). Spectinomycin is a 30-S-related aminoglycoside antibiotic that appears to inhibit the 30-S subunit. Although it does not induce translational errors, many of its resistance mutations map to ribosomal protein S5, which is a member of the S4-S5-S12 group that influences the fidelity of translation. Its mechanism of action has something to do with fusidic acid. Some mutants resistant to spectinomycin show enhanced sensitivity to fusidic acid, and
some fusidic acid-resistant mutants have reduced sensitivity to spectinomycin (157). What this phenomenology emphasizes is that translocation involves both subunits.
D. Elongation Factor Interactions: The 1060 Region
The observations that led to the identification of the 1060 region with factor-binding began with the discovery that ribosomes lacking proteins L7/L12 do not stimulate the uncoupled GTPase activities of the two elongation factors (158-160). Note, however, that EFTu·aa-tRNA·GTP bound to appropriately programmed 30-S subunits will hydrolyze its GTP if 50-S subunits lacking L7/L12 are added, but turnover does not occur (161). Ribosomes normally contain a tetramer of L7/L12, which is the only ribosomal protein present in multiple copies. Not only is L7/L12 important for factor activity, cross-links have also been observed between EF-Tu and L7/L12 (162), and monoclonal antibodies against L7/L12 prevent both EFTu·GTP·aa-tRNA binding to ribosomes and the ribosome stimulation of EF-Tu GTPase activity (163, 164). L7/L12 can also be cross-linked to EF-G in the presence of GMPPCP (165), but not in the presence of fusidic acid and GDP (166, 167). In parallel with this observation, L7/L12 is not required for EF-G binding in the presence of fusidic acid, but is required for formation of the EFG·GMPPCP·ribosome complex (168). There is additional evidence that the conformation of L7/L12 is affected by the state of EF-G. The L7/L12 in EFG·fusidic acid·ribosome complexes is resistant to trypsin, but in a GMPPCP-stabilized complex it is not (169). The interaction of EF-G with L7/L12 can also be visualized spectroscopically. L7/L12 contributes sharp resonances to the otherwise broad-line spectrum of free ribosomes, indicative of independent mobility. In the presence of EF-G and GMPPCP, that signal disappears (170); L7/L12 becomes immobilized. Could it be that conformational changes in L7/L12 induce the GTPase in EF-G and EF-Tu?
Ribosomes contain a tetramer of L7/L12 complexed with L10, another of the proteins that is sometimes picked up in factor cross-linking experiments (171), and L10, in turn, binds directly to the 1060 region of 23-S rRNA (140, 172). Strong evidence exists that factor activity depends directly on the 1060 region. Thiostrepton binds directly to the 1060 region of 23-S rRNA, as pointed out above (Section II,C), and its binding is enhanced by the presence of L11, which binds to the same region. In addition, EF-G has been cross-linked to RNA residues belonging to the 1060 region (173). It seems likely that the 1060 region interacts directly with elongation factors and is directly involved in ribosomal state switching.
E. Elongation Factor Interactions: The SRL
The major RNA in all large ribosomal subunits includes a 12-nucleotide sequence that is totally conserved across taxonomic groups: A2654-A2665 in the 23-S rRNA of Escherichia coli (174) and nucleotides A4318-A4329 in the 28-S rRNA from the rat (175). It is called the “sarcin/ricin loop” (SRL) because most of what we know about its role in protein synthesis has emerged from studies of the toxicity of two proteins, ricin and α-sarcin. Ricin is toxic because it inactivates ribosomes by catalyzing the depurination of a single adenosine residue in 23-S/28-S rRNAs (A2660 in E. coli, A4324 in the rat) (176). α-Sarcin is similarly fastidious in vivo; it catalyzes the hydrolysis of a single phosphodiester bond in 23-S/28-S rRNA, the one between G2661 and A2662 in E. coli (G4325-A4326 in the rat) (175). The effect of α-sarcin on ribosome activity is the same as that of ricin, and cells exposed to both die because they can no longer make protein. Their large ribosomal subunits are inactive. Neither eukaryotic (177, 178) nor prokaryotic (179) ribosomes bind elongation factors properly following ricin or α-sarcin treatment, but they are normal in virtually every other respect. Chemical protection results suggest that factors interact with the SRL, the 1060 region, and remarkably little else in bacterial ribosomes. EF-G binding protects A1067, G2655, A2660, and G2661 (E. coli); EF-Tu protects G2655, A2660, and A2665 (52). Further, the SRL is protected from α-sarcin by the prior binding of EFG·GDP and fusidic acid (180). [It should be noted that quite different results have been obtained with EF-2 in eukaryotes. It protects residues in 5-S rRNA, in 18-S rRNA, in the 28-S rRNA homolog of the 1060 loop, in the helix 72-75 region of 28-S rRNA, and near the peptidyl transferase loop (53). Nothing is seen in the SRL, which is reported to be inaccessible to small reagents, even though the SRL in eukaryotic ribosomes reacts with ricin.]
Although generally referred to as a “loop,” the SRL is highly structured (4, 5). Its GAGA sequence (G2659-A2662), where ricin and α-sarcin attack, is part of a GNRA tetraloop, which is closed by a Watson-Crick CG. The remaining bases in the sequence, four on the 5′ side and three on the 3′ side, form a tightly organized structure in which A2657 pairs side-by-side with G2664, U2656 makes a reversed-Hoogsteen pair with A2665, and G2655 (the “bulged G”) reaches across the major groove so that its imino proton can hydrogen-bond to an oxygen on the phosphate that links G2664 to A2665. [The same bulged G motif occurs in eukaryotic 5-S rRNA in loop E (181).] Ricin recognizes the GAGA tetraloop of the SRL (182), and α-sarcin recognizes its bulged G motif (A. Gluck and I. G. Wool, personal communication). The data obtained on SRL function by site-directed mutagenesis in vivo are puzzling. Replacement of G2661 by a C is tolerated without much effect,
except in strains that include a streptomycin-resistant mutation in ribosomal protein S12, where the interaction of the ribosome with EFTu·tRNA·GTP is abnormal (183). The lethal effect of this combination is abolished by mutations in EF-Tu (184). Even more curious is the recent observation that G2655, the bulged G, can be replaced by an A (but not by a pyrimidine) without killing cells (A. Gluck and I. G. Wool, personal communication). If ribosomes tolerate sequence changes like these, why have variants of the SRL sequences never appeared in nature? SRL function has also been examined by exposing ribosomes to DNA oligonucleotides having complementary sequences. Ribosomes bind such sequences grudgingly, and then only when they are actively translating (185). Binding is an inactivating event (186), but the reproducibility of recent observations that oligonucleotides complementary to the 3′ half of the SRL and its associated helical stem bind strongly to large ribosomal subunits and have dramatic effects on activity (84, 187) has been questioned (I. G. Wool, personal communication). On balance, it seems unlikely that the SRL is part of the machinery that enables ribosomes to switch between the pretranslocational and posttranslocational states in the absence of factors. α-Sarcin-treated ribosomes are active in factor-free translation (179). However, an intact SRL is obviously required for elongation-factor activity. It could be that factor-induced changes in its conformation affect the rate of state-switching, as Wool and colleagues have long proposed (188). Nierhaus has recently proposed that the conformational change in question might be a melting of the secondary structure of the SRL (189). It could also be that changes in the SRL’s tertiary or quaternary interactions with other ribosomal components might be critical for factor-induced state switching.
The SRL may be part of the factor-binding site, but as far as we know, no one has been able to demonstrate an interaction between EF-G (or EF-Tu) and SRL sequences in isolation (I. G. Wool, personal communication). It has been reported recently that the SRL is located in the vicinity of the peptidyl transferase site, not the 1060 region (190).
F. Elongation Factor Interactions: The 30-S Subunit
As we have repeatedly remarked, there are many lines of evidence indicating that the factor site on the ribosome includes 30-S components, and that elongation involves interaction between the two subunits. In the presence of aminoacyl-tRNA and an appropriate mRNA, EF-Tu binds so tightly to isolated 30-S subunits that the complex can be visualized in the electron microscope (191). It binds to the side of the head of the 30-S subunit away from the platform, in the vicinity of S4, S5, and S12. Consistent with this
finding and also with the biochemical observation that factors compete for a single binding site, the reactive cysteine in E. coli EF-G can be cross-linked to S12 in high yield by mild oxidation of the complexes of EFG·GTP·fusidic acid·ribosome (72). S12 can also be cross-linked to EF-G under the same conditions in lesser yield using iminothiolane (167). [Aryl azides attached to that same cysteine also efficiently cross-link EF-G to 23-S rRNA (192).] Skold, too, obtained cross-links between GMPPCP-stabilized complexes of EF-G and ribosomes to both 50-S (L6, L7/L12, L14) and 30-S components (S12, S19) (73). There is also genetic evidence linking S12 with the SRL. As mentioned in Section II,E, sequence changes in the SRL interact with mutations in S12 (183). In addition, some mutants in S12 are abnormal with respect to their capacity to stimulate the GTPase activity of EF-Tu, and others affect the response of translating systems to kirromycin (193, 194). A functional connection between S12 and elongation is also indicated by the observation that reaction of cysteines in S12 with p-chloromercuribenzoate activates factor-free translation (see Section I,E). It should also be noted that mutation of 16-S rRNA at position 530 (in a universally conserved region) prevents EF-Tu-dependent binding of aminoacyl-tRNA to the ribosome, but not EF-Tu-independent (i.e., nonenzymatic) association of aminoacyl-tRNA with the classical A site (195).
G. Structures of the Factors
Although crystals were first reported in the 1960s, it was not until recently that crystal structures for both elongation factors became available. As mentioned earlier, structures are available today for EF-Tu complexed with GDP (196), GTP (6, 7), and both GTP and aminoacyl-tRNA (8), and for nucleotide-free EF-G (9), and EF-G complexed with GDP (10). It had long been realized that their GTP-binding domains are homologous. It is now clear that their homology is far more extensive than that.
FIG. 5. The structures of EF-G and EFTu·GTP. This figure compares ribbon diagrams of the structures of EFG·GDP (left) and EFTu·GTP (right). The two molecules are shown with their nucleotide binding sites facing the reader and their nucleotides in the same orientation. The EFG·GDP coordinates used are those of the Yale group (10). Coordinates for EFTu·GTP were obtained from the group at the University of Aarhus (6). This figure was created using MolScript and Raster3D (197a, b).
EF-Tu is composed of three domains. Its large, N-terminal domain is a classic G-nucleotide binding domain, of which numerous examples are known both in prokaryotes and eukaryotes (197). Its small second and third domains are composed of beta sheet, and their placement relative to the G domain is determined by the nucleotide bound to the G domain. If that nucleotide is a GTP, the three domains assume the compact conformation shown on the right-hand side in Fig. 5 (197a, b). If the nucleotide is GDP, the protein opens up dramatically. This conformational change is triggered by (relatively) small alterations in G-domain conformation that are coupled to the placement of residues in its nucleotide binding site (6, 7). It is interesting that mutations of EF-Tu that confer resistance to kirromycin cluster at the interface between the G domain and the third domain. Perhaps kirromycin “glues” the two domains together, forcing EF-Tu to maintain a GTP-like conformation, regardless of the state of its nucleotide (198, 199). Pulvomycin-resistant mutants cluster in a different region, the region of EF-Tu where its three domains come together (200, 201). tRNA binds to the nucleotide-binding-site side of EF-Tu (8) (Fig. 6, top) (201a). The third domain binds the elbow of the tRNA, and the anticodon stem runs across the third domain toward the second domain. The CCA end tucks into the gap between the second domain and the G domain, where it is hidden from solvent, which explains why EF-Tu inhibits the hydrolysis of
FIG. 6. The EF-Tu ternary complex compared with EF-G. A surface of space-filling models of EF-G (top) and EFTu·GTP·aa-tRNA (bottom) are compared, oriented so that their nucleotide binding sites would superimpose if the two molecules were laid on top of each other, in about the same orientation as shown in Fig. 5. The structure that projects down from the body of the EF-Tu ternary complex is the anticodon stem/loop of the tRNA. Its acceptor stem form is included in the “bottom” of the body of the complex. The structure of the ternary complex is described in Ref. 8. We thank Morten Kjeldgaard for enabling us to make use of its coordinates prior to publication. This figure was drawn using Grasp (201a).
aminoacyl-tRNAs. This structure also explains most prior chemical and enzymatic protection and cross-linking studies (202, 203). EF-Ts appears to interact with all of the domains of EF-Tu (204-207), but its binding does not interfere with the formation of the EFTu·GTP·aa-tRNA complex (205) nor, in E. coli, with the association of the EFTu·GTP·aa-tRNA complex with the ribosome up to the point where GTP is hydrolyzed (208). EF-Ts must bind to EF-Tu somewhere on the surface of the molecule that is distal to the tRNA binding site, and that region must be unimportant for other aspects of EF-Tu activity. EF-G, which is considerably larger than EF-Tu, consists of five domains, the relative arrangement of which is essentially the same in the nucleotide-free protein as it is in the EFG·GDP complex. Preliminary data suggest that the conformational difference between EFG·GTP and EFG·GDP is much smaller than the corresponding difference in EF-Tu (J. Czworkowski and P. B. Moore, unpublished data). This conclusion has been questioned, but it should be pointed out that in the case of EF-Tu, GTP cleavage must reduce not only the affinity of EF-Tu for the ribosome, it must also reduce the affinity of that protein for tRNA. For EF-G, GTP cleavage controls only its affinity for the ribosome. The N-terminal domain of EF-G, like that of EF-Tu, is a G domain, but it is distinguished from others of its class, including that of EF-Tu, by a 90-residue insert, which forms a number of “extra” secondary structure elements at the “top” of the molecule. The second domain of EF-G is homologous to the second domain of EF-Tu, but its remaining three domains are alpha-beta domains that are unrelated to the third domain of EF-Tu. The fourth domain of EF-G has a fold like that of ribosomal protein S5 (209), whereas the fifth domain closely resembles ribosomal protein S6 (9). (The C-terminal fragment of EF-G, whose activity in translocation was noted earlier, corresponds to domains 2, 3, 4, and 5.)
A moderate-sized literature exists describing the properties of EF-G molecules that have been modified chemically one way or another. The mechanistic interpretation of many of those data is problematic at this point because most of them speak to the details of the interactions between EF-G and ribosomes. Suffice it to say that these data provide ample reason for believing that the nucleotide-binding-site face of the G domain of EF-G interacts with the ribosome (1, 210-213). Furthermore, experiments done on the mechanism of action of diphtheria toxin on EF-2 (214-216) and on the effect of tyrosine-modifying reagents on EF-G (217) indicate that the distal end of the fourth domain is also vital for EF-G function. Mutations in EF-G that confer fusidic acid resistance are found in its G domain and in domains 3 and 4, predominantly surrounding an interdomain gap in the structure (143). The failure of EF-G·GDP to release from ribosomes when fusidic acid is present could be the product of a kirromycin-like locking of the relationship between its domains. Fusidic acid may inhibit the conformational change responsible for release, which normally follows GTP cleavage.
The most remarkable discovery to emerge from these studies so far is the finding that EF-G resembles the EF-Tu ternary complex; at low resolution the two are effectively isosteric (8). When EF-G and EF-Tu·aa-tRNA·GTP are aligned to maximize the overlap of the conserved regions of their nucleotide binding sites, their second domains superimpose (Fig. 6). More than that, domains 3, 4, and 5 of EF-G fill nearly the same space as the tRNA of the ternary complex. Domain 4 of EF-G corresponds to the anticodon stem/loop of the tRNA, and domains 5 and 3 correspond to the elbow and the acceptor arm, respectively. The correspondence is so close that it is inconceivable that it is coincidental. EF-G must be an all-protein analog of the EF-Tu ternary complex. EF-G and EF-Tu·GTP·aa-tRNA must bind to the ribosome similarly, and their functions must be related in some way also.
III. On the Mechanism of Elongation
A. On the Placement of Ribosomal Sites
Figure 7 (217a) shows the large ribosomal subunit, with the face of the subunit that interacts with small subunits in ribosomal couples oriented toward the viewer. High-resolution electron-microscopy analyses show that a gap exists between the two subunits large enough to accommodate tRNAs (45, 46), and it is generally agreed that protein synthesis occurs in that gap. The anticodon ends of ribosome-bound tRNAs interact with mRNA at the base of the small subunit cleft (218), and the peptidyl transferase site is located on the 50-S subunit in the region between L1 and the central protuberance (219). It is likely that the anticodon stems of ribosome-bound tRNAs run along the 50-S face of the head of the 30-S subunit, and that the codon-anticodon interaction occurs at the cleft of the 30-S subunit. Their acceptor stems cross the intersubunit gap, placing their CCA ends in the neighborhood of the peptidyl transferase site. There is strong evidence that the anticodon ends of A-site and P-site tRNAs are also closely associated by virtue of their binding to adjacent mRNA codons (220, 221). A recent modeling study supports the now generally accepted conclusion that the two tRNAs are arranged relative to one another in the “S” configuration (222, 223). Thus if the A site is located to the right of the P site (in Fig. 7), then tRNAs in those sites must be oriented so that their anticodons point either toward the 50-S subunit or toward the body
FIG. 7. Ribosomal sites. The 50-S subunit is shown with its subunit-interface surface oriented toward the reader. The subunit-interface surface of the 30-S subunit is oriented away from the reader. The shapes of both subunits have been sketched, using models obtained recently by electron microscopy as a guide (45, 46). (Authority for the positions assigned binding sites on the two subunits may be found in Ref. 217a.)
of the 30-S subunit, depending on their rotational orientations about an axis running through their elbows, normal to the molecular plane. Most commentators believe, as we do, that the A site does indeed lie to the right of the P site in Fig. 7 (217a, 223a). This arrangement is required, among other things, by what is known about the elongation-factor binding site. One speaks in terms of a “site” rather than “sites” because EF-G and EF-Tu compete with each other, and the structural similarity of EF-G and the EF-Tu ternary complex implies that they bind to the ribosome in a similar, if not identical, fashion. There is overwhelming evidence that EF-G binds to ribosomes at the base of the L7/L12 stalk (219), and that EF-Tu binds to the 30-S subunit on the side that faces the L7/L12 stalk in 70-S couples (191), to which EF-Tu can be cross-linked (224). The L7/L12 stalk lies well to the right of the peptidyl transferase site. [Note, however, that demonstrations that something can be cross-linked to L7/L12 do not constrain ribosomal placements very strongly. The L7/L12 stalk is flexible, and one of the two dimers of L7/L12 is associated with the central protuberance of the 50-S subunit, not the stalk itself (167).] Because EF-Tu delivers tRNAs to the ribosome, tRNAs must enter the A site from the right. Monoclonal antibodies against L2 and L9, which are in the peptidyl
transferase neighborhood, also interfere with EF-Tu function (164, 225), and EF-Tu can be cross-linked to proteins in the same region: L1, L5, and L15 (162, 226). EF-G also cross-links to components associated with the same region (160). In our view, these data are not in conflict with the conclusion just drawn. Because the second domain of EF-Tu interacts with the CCA end of aminoacyl-tRNAs, that part of EF-Tu can hardly fail to approach the peptidyl transferase region when the EF-Tu ternary complex binds to the ribosome; EF-Tu must reach deep into the A site. That domain 2 is important in EF-Tu/ribosome interactions is supported by the existence of a domain 2 mutant that interferes with ribosome binding (227). The steric similarity of the EF-Tu ternary complex and EF-G argues that the second domain of EF-G ought to interact with the ribosome in the same region. If the A site lies to the right of the P site, the site where aminoacyl-tRNA first encounters the ribosome, the T site, if it exists, ought to be to the right of the A site. The position of the E site follows almost by elimination. It must be immediately to the left of the P site, on the L1 side of the peptidyl transferase site.
B. Factor and tRNA Orientations
There is ample evidence that EF-Tu delivers tRNAs to the ribosome oriented the same way they are in the A site, and binds in a manner that is compatible with peptide transfer. EF-Tu·GTP·aa-tRNA complexes in which amino acids are cross-linked to EF-Tu are active in mRNA-directed aminoacyl-tRNA binding and in GTP hydrolysis (227a), and a 20-Å cross-link between the variable loop of the aminoacyl-tRNA and EF-Tu permits message-dependent binding, GTP hydrolysis, and peptide transfer (228). The EF-Tu ternary complex has an RNA-rich side, which includes its tRNA and its GTP binding site, which must face the peptidyl transferase when bound to the ribosome. The opposite side of the ternary complex, its protein-rich side, must point toward the L7/L12 stalk. Note that if the tRNA side of the ternary complex were oriented the other way, the CCA end of tRNA would point away from the peptidyl transferase center. Although T-site enthusiasts might find this geometry gratifying, because that rotational difference would definitively distinguish the T site and the A site, it is hard to understand how a tRNA could interact satisfactorily with the same codon in both sites, as it must. Thus we believe that tRNA is delivered to the ribosome in an A-site-like orientation, and works its way across the subunit interface from right to left during elongation. In addition to moving across the face of the ribosome during elongation, tRNAs rotate. The plane of the L of an A-site-bound tRNA intersects the corresponding plane of its P-site-bound neighbor at an angle of about 50° (229-231). Thus after peptide-bond formation, but before the next amino
acid is incorporated, not only must an A-site-bound peptidyl-tRNA move to the P site, it must also rotate about an axis joining its 3’ end and its anticodon. It is generally assumed that this happens during translocation, but, as pointed out recently, rotation could occur when EF-Tu delivers the next aminoacyl-tRNA to the ribosome (K. H. Nierhaus, personal communication). It is generally believed, on the basis of little real evidence, that the rotational orientation of E-site-bound tRNA resembles that of P-site-bound tRNA. These arguments require that EF-G bind to the 70-S ribosome in the same place as the EF-Tu ternary complex, and oriented so that its tRNA-like parts fill the same ribosomal region as the tRNA of the ribosome-bound ternary complex. The experimental data that speak most directly to the orientation of EF-G on the ribosome come from cross-linking studies in which both EF-G residues and ribosomal components have been identified (see Sections II,D and II,F), but they are not decisive. Some findings suggest that the nucleotide binding face of EF-G contacts the 30-S subunit, but other findings could be interpreted as proving the opposite.
C. Models for Translocation
The elongation cycle is driven by a switching of the ribosome between its pre- and posttranslocational states, which is catalyzed by the two elongation factors. The two factors do not operate in a perfectly parallel manner, however. Catalysis of state switching is the sole function of EF-G, whereas EF-Tu both catalyzes state switching and delivers aminoacyl-tRNAs to the ribosome. However, the fact that EF-G and the EF-Tu ternary complex are isosteric (at low resolution) makes it plausible to propose that they facilitate state switching the same way. If during switching the ribosome were to adopt a conformation stabilized by the binding of either the EF-Tu ternary complex or EF-G, and if that intermediate conformation were the transition state for the conformational isomerization in question, factor-binding would lower the activation energy for state switching, and hence increase its rate, as observed. If this concept is correct, then EF-G and EF-Tu need differ as conformational catalysts in only one way. The GTPase of EF-Tu must be triggered when the ribosome is in its pretranslocational state, whereas that of EF-G must be activated in response to the posttranslocational state of the ribosome. (Note that the elongation scheme presented in Fig. 2 shows ternary complex binding to posttranslocational ribosomes, causing the post- to pretranslocational conformation change, consistent with this proposal.) It could be argued that the component of both factors critical for the catalysis of state switching is the second domain. The tRNA portion of the EF-Tu ternary complex cannot be critical because EF-Tu alone appears to be able to trigger the conformational switching of ribosomes in its absence
(Section II,F). In addition, as mentioned earlier, polypeptides consisting of domains 2, 3, 4, and 5 of EF-G catalyze state switching. Domain 2 is the only structure EF-Tu and this fragment of EF-G have in common. (The third EF-Tu domain partially overlaps domain 5 of EF-G when the two molecules are superimposed on their G domains, but they show no structural homology.) Domain 2 ought to be the portion of both factors that approaches the peptidyl transferase site of the 50-S subunit most closely. Note also that the SRL is believed to be located in that neighborhood (190).
It is not hard to devise proposals for translocation consistent with these ideas in the context of the hybrid-sites model for elongation. EF-G binds to the ribosome immediately after peptide-bond formation, when the new peptidyl-tRNA occupies the P site on the 50-S subunit and the A site on the 30-S subunit, and the CCA end of the newly deacylated tRNA is in the 50-S E site and its anticodon end occupies the 30-S P site. If EF-G were to bind so that its third domain occupied the 50-S A site, its fourth domain could displace the anticodon stem/loop of the peptidyl-tRNA from the 30-S A site (Fig. 8). If tRNAs are driven from site to site by motions of the ribosomal subunit, it is easy to understand how tRNA displacement might occur. When EF-G binds to the ribosome, the A site of the 30-S subunit is not aligned with the A site on the 50-S subunit, either because of conformational changes occurring when EF-Tu binds or because of changes that accompany peptide-bond formation. Domain 4, by hypothesis, occupies the region the 30-S A site will arrive at after translocation is complete. If interactions between domain 4 and the 30-S subunit were to stabilize the posttranslocational conformation of the ribosome, its presence would facilitate translocation. The anticodon end of the peptidyl-tRNA would be forced to migrate to the P site because the A site is filled by domain 4.
Its displacement, which drags the mRNA with it, would favor migration of the anticodon end of the deacylated tRNA the same way. It is harder to visualize what happens if tRNAs work their way across a fixed ribosome surface. In that case, EF-G would be unable to bind to the ribosome immediately after peptide-bond formation because the 30-S A site, where domain 4 must go, would be occupied by the anticodon stem/loop of the new peptidyl-tRNA. The only way this would work is if translocation and EF-G binding were simultaneous, and mechanistic proposals of this type are necessarily vague about why EF-G effects the displacement required. Mechanisms of this class have one attractive feature, however; they make EF-G binding the same as translocation, which is consistent with the enzymology of EF-G insofar as we now know it. No one has isolated an EF-G·ribosome·mRNA·tRNA complex that was not in the posttranslocational state; the subunit motion model implies that such things could exist. In both models, the rearrangement of the ribosome brought about by translocation triggers
FIG. 8. The mechanism of elongation. This figure depicts the events that may transpire during the elongation cycle. It is a hybrid-sites model premised on the hypotheses that there is relative motion of the subunits during elongation, and that EF-G is an all-protein mimic of the EF-Tu ternary complex. The elongation cycle is dissected into seven steps. Step 1 is the binding of the EF-Tu ternary complex to a posttranslocational ribosome. This binding induces an adjustment in the relationship between the two subunits (heavy, broken arrow; step 2) that alters the ribosome from its posttranslocational to its pretranslocational state. This conformational change induces the GTPase of EF-Tu; its GTP is cleaved, and EF-Tu leaves the ribosome (step 3). Peptide transfer occurs (step 4), and this causes the two tRNAs bound to the ribosome to enter hybrid states. EF-G·GTP then binds (step 5). Its binding causes the reverse of the conformational change depicted in step 2 (step 6). The ribosome enters the posttranslocational state, which induces the GTPase of EF-G. The GTP bound to EF-G is hydrolyzed, and EF-G leaves the ribosome (step 7) so that the cycle can begin again.
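The seven steps listed in the Fig. 8 caption can be restated as a small state machine, which makes the bookkeeping of the proposal explicit: one amino acid added and two GTP molecules cleaved per cycle, with the ribosome returning to its posttranslocational state. The following is only an illustrative sketch. The names `Ribosome` and `elongation_cycle` and all field names are hypothetical, and the point at which the hybrid states are resolved (step 6 here) is an assumption made for the sketch, not something the caption states.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The two ribosome conformations named in the Fig. 8 caption.
POST = "posttranslocational"
PRE = "pretranslocational"


@dataclass
class Ribosome:
    state: str = POST                 # conformational state of the ribosome
    factor: Optional[str] = None      # elongation factor currently bound, if any
    hybrid: bool = False              # True while the tRNAs occupy hybrid sites
    peptide_length: int = 0           # residues added so far
    gtp_hydrolyzed: int = 0           # GTP molecules cleaved so far
    log: List[str] = field(default_factory=list)


def elongation_cycle(r: Ribosome) -> None:
    """Advance a ribosome through the seven steps of the Fig. 8 caption."""
    if r.state != POST or r.factor is not None:
        raise ValueError("cycle must start from a factor-free posttranslocational ribosome")

    r.factor = "EF-Tu.GTP.aa-tRNA"
    r.log.append("1: EF-Tu ternary complex binds")

    r.state = PRE
    r.log.append("2: post -> pre conformational change")

    r.gtp_hydrolyzed += 1
    r.factor = None
    r.log.append("3: EF-Tu GTPase induced; EF-Tu leaves")

    r.peptide_length += 1
    r.hybrid = True
    r.log.append("4: peptide transfer; tRNAs enter hybrid states")

    r.factor = "EF-G.GTP"
    r.log.append("5: EF-G.GTP binds")

    r.state = POST
    r.hybrid = False  # assumption of this sketch: hybrid states resolve here
    r.log.append("6: pre -> post conformational change")

    r.gtp_hydrolyzed += 1
    r.factor = None
    r.log.append("7: EF-G GTPase induced; EF-G leaves")


ribosome = Ribosome()
for _ in range(3):
    elongation_cycle(ribosome)
print(ribosome.peptide_length, ribosome.gtp_hydrolyzed, ribosome.state)
```

Encoding the precondition of step 1 as a check mirrors the caption's statement that the ternary complex binds to a posttranslocational ribosome; a model in which tRNAs cross a fixed ribosome surface would need a different set of states.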
the latent GTPase activity of EF-G. Cleavage of the GTP bound to EF-G favors a conformation of EF-G that has low affinity for the ribosome, and EF-G departs, leaving the ribosome ready to accept the next EF-Tu ternary complex. We note that because the chirality of the elongation factors is now established beyond question, our analysis of their placement and orientation on
the ribosome depends on the handedness of the ribosome model we have used. This model is derived from projection images obtained by transmission electron microscopy, and left hands look like right hands in projection. Experiments have been done to determine the absolute hand of the ribosome (232; Joachim Frank, personal communication); the enantiomorph shown here is the one believed to be correct. If the chirality of current ribosome models were found to be incorrect, however, our proposal would have to be reformulated.
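The kinetic argument of Section III,C, that a factor binding preferentially to the transition-state conformation of the ribosome would speed state switching, can be made semi-quantitative with standard transition-state theory. The relations below are an illustrative sketch only: the symbols $k$, $\Delta G^{\ddagger}$, and $\Delta\Delta G^{\ddagger}$, the assumption of a transmission coefficient of 1, and the choice of 37 °C are introduced here for illustration and are not taken from the studies reviewed.

```latex
\begin{align}
k &= \frac{k_{\mathrm B}T}{h}\,\exp\!\left(-\frac{\Delta G^{\ddagger}}{RT}\right), \\
\frac{k_{\text{factor}}}{k_{\text{free}}} &= \exp\!\left(\frac{\Delta\Delta G^{\ddagger}}{RT}\right),
\qquad
\Delta\Delta G^{\ddagger} = \Delta G^{\ddagger}_{\text{free}} - \Delta G^{\ddagger}_{\text{factor}}, \\
RT\ln 10 &= \left(8.314\ \mathrm{J\,mol^{-1}\,K^{-1}}\right)\left(310\ \mathrm{K}\right)\left(2.303\right)
\approx 5.9\ \mathrm{kJ\,mol^{-1}}.
\end{align}
```

Under these assumptions, each 5.9 kJ/mol (about 1.4 kcal/mol) by which factor binding stabilizes the transition state relative to the ground state would accelerate state switching roughly tenfold.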
IV. Concluding Remarks
When we decided to write this review, we hoped that insights gained from new elongation factor structures combined with facts gleaned from close reading of the literature would lead us to a definitive proposal for the mechanism of elongation. It is obvious to us now that the existing data are not sufficient to permit a result that grandiose to be achieved. We hope that speculations we have indulged in to fill the gaps will provoke experiments needed to solve this fascinating problem.
ACKNOWLEDGMENTS
We are indebted to Morten Kjeldgaard for supplying us with coordinates for the EF-Tu ternary complex prior to publication, and we thank our many colleagues who responded to our request for reprints and preprints. Those acquainted with the elongation field will have noted that we have not cited all the references relevant to each point raised. Considerations of length made this impossible from the outset, and we apologize to all who feel their contributions have been slighted. We also acknowledge the many discussions about elongation we have had with Anders Liljas and Arnthor Aevarsson. Their insights have shaped our thinking, but they bear no responsibility for the opinions expressed. This review was prepared with support from a grant from the National Institutes of Health (AI09167).
REFERENCES
1. Y. Kaziro, BBA 505, 95 (1978).
2. A. S. Spirin, This Series 32, 75 (1985).
3. O. Nygard and L. Nilsson, EJB 191, 1 (1990).
4. A. A. Szewczak, P. B. Moore, Y.-L. Chan and I. G. Wool, PNAS 90, 9581 (1993).
5. A. A. Szewczak and P. B. Moore, JMB 247, 81 (1995).
6. M. Kjeldgaard, P. Nissen, S. Thirup and J. Nyborg, Structure 1, 35 (1993).
7. H. Berchtold, L. Reshetnikova, C. 0. A. Reiser, N. K. Schirrner, M. Sprinzl and R. Hilgenfeld, Nature 365, 126 (1993). 8. P. Nissen et al., Science 270, 1464 (1995). 9. A. Aevarsson, E. Brazhnikov, M. Garber, J. Zheltonorova, Yu. Chirgadze, S. AI-Karadaghi, L. A. Svensson and A. Liljas, EMBO J. 13, 3669 (1994). 10. J. Czworkowski, J. Wang, T. A. Steitz and P. B. Moore, E M B O J . 13, 3661 (1994). 11. J. D. Watson, Bull. Soc. Chim.Biol. 46, 1399 (1964). 12. R. E. Monro, J M B 26, 147 (1967). 13. B. E. H. Maden, R. R. Traut and R. E. Monro, JMB 35, 333 (1968). 14. S. L. Gupta, J. Waterson, M. L. Sopori, S. M. Weissrnan and P. Lengyel, Bchem 10,4410 (1971). 15. S . S. Thacb and R. E. Thach, PNAS 6S, 1791 (1971). 16. D. Beyer, E. Skripkin, J. Wadzack and K. H. Nierhaus, J B C 269, 30713 (1994). 17. F. Lipmann, Science 164, 1024 (1969). 18. F. 0. Wettstein and H. Noll, J M B 11, 35 (1965). 19. H.-J. Rheinberger and K. H. Nierhaus, Biochem. Znt. 1, 297 (1980). 20. H.-J. Rheinberger, H. Sternback and K. H. Nierhaus, PNAS 78, 5310 (1981). 21. A. S. Spirin, FEBS Lett. 165, 280 (1984). 22. V. I. Baranov and L. A. Rybova, Biochimie 70, 259 (1988). 23. K. H. Nierhaus, Bchem 29, 4997 (1990). 24. H.-J. Rheinberger, U. Geigenrnuller, A. Gnirke, T.-P. Hausner, J. Remrner, H. Saruyama and K. H. Nierhaus, in ”The Ribosome: Structure, Function and Genetics” (W. E. Hill, A. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 318. American Society for Microbiology, Washington, DC, 1990. 25. A. Gnirke, U. Geigenrnuller, H.-J. Rheinberger and K. H. Nierhaus, JBC 264, 7291 (1989). 26. B. Hardesty, W. Culp and W. McKeenan, C S H S Q B 34, 331 (1969). 27. J. A. Lake, PNAS 74, 1903 (1977). 28. D. Moazed and H. F. Noller, Nature 342, 142 (1989). 29. L. Skogerson and K. Moldave, ABB 125, 497 (1968). 30. M. V. Rodnina, R. Fricke and W. Wintermeyer, in “The Translational Apparatus: Structure, Function, Regulation and Evolution” (K. H. Nierhaus, F. Franceschi, A. 
R. Subramanian, V. A. Erdmann and B. Wittrnan-Liebold, eds.), p. 317. Plenum, New York, 1993. 31. S. V. Kirillov, V. I. Makhno and Y. P. Semenkov, NARes 8, 183 (1980). 32. M. C. Ganoza, C. Cunningham, D. G. Chung and T. Neilson, Mol. Biol. Rep. 15, 33 (1991). 33. M. Cannon, R. Krug and W. Gilbert, J M B , 7, 360 (1963). 34. A. Gnirke and K. H. Nierhaus, JBC 261, 14506 (1986). 35. R. Lill, A. Lepier, F. Schwagele, M. Sprinzl, H. Vogt and W. Wintermeyer, J M B 203,699 (1988). 36. D. Moazed and H. F. Noller, PNAS 88, 3725 (1991). 37. H.-J. Rheinberger, H. Sternbach and K. H. Nierhaus, JBC 261, 9140 (1986). 38. R. Lill and W. Wintermeyer, J M B 196, 137 (1987). 39. M. V. Rodnina and W. Wintermeyer, J M B 228, 450 (1992). 40. H. F. Noller, A R B 60, 191 (1991). 41. B. Hardesty, 0. W. Odom and H.-Y. Deng, in “Structure, Function and Genetics of Ribosomes’”(B. Hardesty and 6 . Krarner, eds.), p. 495. Springer-Verlag, New York, 1986. 42. 0. W. Odorn, W. D. Picking and 9. Hardesty, Bchem 29, 10734 (1990). 43. 0. W. Odom and 8. Hardesty, JBC 267, 19117 (1992).
44. A. Yonath, K. R. Leonard, S. Weinstein and H. G . Wittman, C S H S Q B 52, 729 (1987). 45. H. Stark, F. Mueller, E. V. Orlova, M. Schatz, P. Dube, T. Erdemir, F. Zemlin, R. Brimacornbe and M. van Heel, Structure 3, 815 (1995). 46. J. Frank, J. Zhu, P. Penczek, Y. Li, S. Srivastava, A. Verschoor, M. Rademacher, R. Grassucci, R. K. Lata and R. K. Agrawal, Nature 376, 441 (1995). 47. I. Serdyuk, V. Baranov, T. Tsalkova, D. Gulyarnova, M. Pavlov, A. S. Spirin and R. May, Biochimie 74, 299 (1992). 48. Y. Sernenkov, T. Shapkina, V. Makhno and S. Kirillov, FEBS Lett. 296, 207 (1992). 49. K. H. Nierhaus, in preparation (1995). 50. H.-J. Rheinberger and K. H. NierhausJBC 261, 9133 (1986). 51, M. V. Rodnina, R. Fricke and W. Wintermeyer, Bchem 33, 12267 (1994). 52. D. Moazed, J. M. Roberston and H. F. Noller, Nature 334, 362 (1988). 53. L. Holrnberg and 0. Nygard, Bchem 33, 15159 (1994). 54. C. G . Kurland and M. Ehrenberg, Annu. Reu. Biophys. 16, 291 (1987). 55. J. Eisinger, B. Feuer and T. Yamane, Nature N B 231, 120 (1071). 56. H. J. Grosjean, S. de Henau and D. M. Crothers, PNAS, 75, 610 (1978). 57. R. C. Thompson and A. M. Karim, PNAS 79, 4922 (1982). 58. J. J. Hopfield, PNAS 71, 4135 (1974). 59. J. Ninio, Biochimie 57, 587 (1975). 60. R. C. Thompson, D. B. Dix, R. B. Gerson and A. M. Karim, J B C 256, 6676 (1981). 61. R. Lill, J. M. Robertson and W. Wintermeyer, Bchem 25, 3245 (1986). 62. C. G. Kurland and M. Ehrenberg, This Series 31, 191 (1984). 63. U. Geigenmuller and K. H. Nierhaus, E M B O ] . 9, 4527 (1990). 64. R. C. Thompson and P. J. Stone, PNAS 74, 198 (1977). 65. T. Ruusala, M. Ehrenberg and C. 6. Kurland, E M B O J . 1, 741 (1982). 66. T. Ruusala and C. G . Kurland, MGG 198, 100 (1984). 67. J. Gordon and F. Lipmann, JMB 23, 23 (1967). 68. S. Pestka, JBC 243, 2810 (1968). 69. S. Pestka, JBC 244, 1533 (1969). 70. L. P. Gavrilova, 0. E. Kostiashkina, V. E. Koteliansky, N. M. Rutkevitch and A. S. Spirin, J M B 101, 537 (1976). 71. L. P. Gavrilova and A. S. 
Spirin, FEBS JAt. 39, 13 (1974). 72. A. S. Girshovich, E. S. Bochkareva and Y. A. Ovchinnikov, J M B 151, 229 (1981). 73. S. E. Skold, EJB 127, 225 (1982). 74. P. Strigini and L. Gorini, JMB 47, 517 (1970). 75. M. Ehrenberg, N. Bilgin and C. G. Kurland, in “hbosomes and Protein Synthesis: A Practical Approach (G. Spedding, ed.), p. 101. IRL Press, Oxford, 1990. 76. C. Nombela and S. Ochoa, PNAS 70, 3556 (1973). 77. J. R. Mesters, A. P. Potapov, J. M. d e Graaf and B. Kraal, JMB 242, 644 (1994). 78. J. W. Bodley and L. Lin, Nature 227, 60 (1970). 79. D. Richter, BBRC 46, 1850 (1972). 80. D. L. Miller, PNAS 69, 752 (1977). 81. N. Richman and J. W. Bodley, PNAS 69, 686 (1972). 82. J. M. Robertson, C. Urbanke, G. Chinali, W. Wintermeyer and A. Parmeggiani,JMB 189, 653 (1986). 83. 0. Nygard and L. Nilsson, EJB 179, 603 (1989). 84. K. H. Nierhaus, S. Schilling-Bartetzko and T. Twardowski, Biochimie 74, 403 (1992). 85. S. Schilling-Bartetzko, A. Bartetzko and K. H. Nierhaus, JBC 267, 4703 (1992). 86. R. Mikkola and C. 6. Kurland, Biochimie 73, 1061 (1991). 87. I. Tubulekas and D. Hughes, Mol. Microbiol. 7, 275 (1993).
88. A. V. Furano, PNAS 72, 4780 (1975).
89. M. Gouy and R. Grantham, FEBS Lett. 115, 151 (1980).
90. L. Beres and J. Lucas-Lenard, Bchem 12, 3998 (1973).
91. J. Lucas-Lenard and F. Lipmann, ARB 40, 409 (1971).
92. D. L. Miller and H. Weissbach, in “Nucleic Acid-Protein Recognition” (H. J. Vogel, ed.), p. 409. Academic Press, New York, 1977.
93. A. Weijland, K. Harmark, R. H. Cool, P. H. Anborgh and A. Parmeggiani, Mol. Microbiol. 6, 683 (1992).
94. M. Sprinzl, Trends Biochem. Sci. 19, 245 (1994).
95. M. V. Rodnina, R. Fricke, L. Kuhn and W. Wintermeyer, EMBO J. 14, 2613 (1995).
96. G. Romero, V. Chau and R. L. Biltonen, JBC 260, 6167 (1985).
97. K. L. Manchester, Biochem. Int. 5, 929 (1991).
98. K. L. Manchester, Biochem. Int. 27, 311 (1992).
99. O. Fasano, E. De Vendittis and A. Parmeggiani, JBC 257, 3145 (1982).
100. J. Gordon, JBC 244, 5680 (1969).
101. G. Sander, R. C. Marsh, J. Voigt and A. Parmeggiani, Bchem 14, 1805 (1975).
102. H. Wolf, G. Chinali and A. Parmeggiani, PNAS 71, 4910 (1974).
103. G. Parlato, R. Pizzano, D. Picone, J. Guesnet, O. Fasano and A. Parmeggiani, JBC 258, 995 (1983).
104. K. Takahashi, S. Ghang and S. Chladek, Bchem 25, 8330 (1986).
105. M. Tezuka and S. Chladek, BBA 950, 463 (1988).
106. S. Campuzano and J. Modolell, PNAS 77, 905 (1980).
107. G. Sander, EJB 75, 523 (1977).
108. D. Picone and A. Parmeggiani, Bchem 22, 4400 (1983).
109. H. Wolf, G. Chinali and A. Parmeggiani, EJB 75, 67 (1977).
110. A. Parmeggiani, G. W. Swart, K. K. Mortensen, M. Jensen, B. F. Clark, L. Dente and R. Cortese, PNAS 84, 3141 (1987).
111. K. Bensch, U. Pieper, G. Ott, N. Schirmer, M. Sprinzl and A. Pingoud, Biochimie 73, 1045 (1991).
112. R. Leberman, FEBS Lett. 358, 71 (1995).
113. A. Weijland and A. Parmeggiani, Trends Biochem. Sci. 19, 188 (1994).
114. J. Scoble, N. Bilgin and M. Ehrenberg, Biochimie 76, 69 (1994).
115. A. Weijland, G. Parlato and A. Parmeggiani, Bchem 33, 10711 (1994).
116. V. Dincbas, N. Bilgin, J. Scoble and M. Ehrenberg, FEBS Lett. 357, 19 (1995).
117. N. Bilgin and M. Ehrenberg, Bchem 34, 715 (1995).
118. B. D. Beck, P. G. Arscott and A. Jacobson, PNAS 75, 1250 (1978).
119. M. Wurtz, G. R. Jacobson, A. C. Steven and J. P. Rosenbusch, EJB 88, 593 (1978).
120. B. D. Beck, EJB 97, 495 (1979).
121. K. Arai, Y. Otga, N. Arai, S. Nakamura, C. Henneke, T. Oshima and Y. Kaziro, EJB 92, 521 (1978).
122. K. Arai, Y. Otga, N. Arai, S. Nakamura, C. Henneke, T. Oshima and Y. Kaziro, EJB 92, 509 (1978).
123. M. V. Rodnina and W. Wintermeyer, PNAS 92, 1945 (1995).
124. M. S. Rohrbach, M. E. Dempsey and J. W. Bodley, JBC 249, 5094 (1974).
125. A. Parmeggiani and G. Sander, Mol. Gen. Biochem. 31, 129 (1981).
126. E. De Vendittis, M. Masullo and V. Bocchini, JBC 261, 4445 (1986).
127. M. Masullo, G. Parlato, E. de Vendittis and V. Bocchini, BJ 261, 725 (1989).
128. A. R. Dahlfors and C. G. Kurland, JMB 216, 311 (1990).
129. Y. Kaziro, N. Inoe, Y. Kuriki, K. Mizumoto, M. Tanaka and M. Kawakita, CSHSQB 34, 385 (1969).
130. N. Inoue-Yokosawa, C. Ishikawa and Y. Kaziro, JBC 249, 4321 (1974).
131. G. Chinali and A. Parmeggiani, EJB 125, 415 (1982).
132. R. Lill, J. M. Robertson and W. Wintermeyer, EMBO J. 8, 3933 (1989).
133. W. Wintermeyer, R. Lill and J. M. Robertson, in “The Ribosome: Structure, Function and Evolution” (W. E. Hill, A. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 348. American Society for Microbiology, Washington, DC, 1990.
134. J. M. Robertson and W. Wintermeyer, JMB 198, 133 (1987).
135. H. Paulsen and W. Wintermeyer, Bchem 25, 2749 (1986).
136. Y. B. Alakhov, O. A. Stengrevics, V. V. Filirninov and S. Yu. Venyaminov, EJB 99, 585 (1979).
137. C. Borowski, C. Niess and W. Wintermeyer, in preparation (1995).
138. N. V. Belitsina, G. Z. Tnalina and A. S. Spirin, FEBS Lett. 131, 289 (1981).
139. E. F. Gale, E. Cundliffe, P. E. Reynolds, M. H. Richmond and M. J. Waring, “The Molecular Basis of Antibiotic Action.” Wiley, London, 1981.
140. E. Cundliffe, in “The Ribosome: Structure, Function, and Genetics” (W. E. Hill, A. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 479. American Society for Microbiology, Washington, DC, 1990.
141. J. W. Bodley, F. J. Zieve, L. Lin and S. T. Vieve, JBC 245, 5656 (1970).
142. G. R. Willie, N. Richman, W. O. Godtfredsen and J. W. Bodley, Bchem 14, 1713 (1975).
143. U. Johanson and D. Hughes, Gene 143, 55 (1994).
144. J. R. Mesters, J. M. de Graaf and B. Kraal, FEBS Lett. 321, 149 (1993).
145. A. Parmeggiani and G. W. M. Swart, Annu. Rev. Microbiol. 39, 557 (1985).
146. A. Weijland, K. Harmark, P. H. Anborgh and A. Parmeggiani, in “The Translational Apparatus: Structure, Function, Regulation, Evolution” (K. H. Nierhaus, F. Franceschi, A. R. Subramanian, V. A. Erdmann and B. Wittmann-Liebold, eds.), p. 295. Plenum, New York, 1993.
147. J. P. Abraham, M. J. van Raaij, G. Ott, B. Kraal and L. Bosch, Bchem 30, 6705 (1991).
148. H. Wolf, D. Assmann and E. Fischer, PNAS 75, 5324 (1978).
149. E. Cundliffe and P. D. Dixon, Antimicrob. Agents Chemother. 8, 1 (1975).
150. J. H. Highland, L. Lin and J. W. Bodley, Bchem 10, 4404 (1971).
151. J. Modolell, B. Cabrer, A. Parmeggiani and D. Vazquez, PNAS 68, 1796 (1971).
152. T. P. Hausner, U. Geigenmuller and K. H. Nierhaus, JBC 263, 13103 (1988).
153. E. Cundliffe and J. Thompson, EJB 118, 47 (1981).
154. M. Misumi, N. Tanaka and T. Shibata, BBRC 82, 971 (1978).
155. J. Modolell and D. Vazquez, EJB 81, 491 (1977).
156. T.-P. Haussner, U. Geigenmuller and K. H. Nierhaus, JBC 263, 13103 (1988).
157. U. Johanson and D. Hughes, NARes 23, 464 (1995).
158. K. W. Kischa, W. Moller and G. Stoeffler, Nature NB 233, 62 (1971).
159. E. Hamel, M. Koka and T. Nakamoto, JBC 247, 805 (1972).
160. W. Moller and J. A. Maassen, in “Structure, Function, and Genetics of Ribosomes” (B. Hardesty and G. Kramer, eds.), p. 309. Springer-Verlag, New York, 1986.
161. J. A. Langer, F. Jurnak and J. A. Lake, Bchem 23, 6171 (1984).
162. C. San Jose, C. G. Kurland and G. Stoeffler, FEBS Lett. 71, 133 (1976).
163. B. Nag, D. S. Tewari, A. Sommer, H. M. Olson, D. G. Glitz and R. R. Traut, JBC 262, 9681 (1987).
164. B. Nag, S. S. Akella, P. A. Cann, D. S. Tewari, D. G. Glitz and R. R. Traut, JBC 266, 22129 (1991).
165. A. S. Acharya, P. B. Moore and F. M. Richards, Bchem 12, 3108 (1973).
166. J. A. Maassen and W. Moller, PNAS 71, 1277 (1974).
Signals in Eukaryotic DNA Promote and Influence Formation of Nucleosome Arrays
ARNOLD STEIN
Department of Biological Sciences
Purdue University
West Lafayette, Indiana 47906-1392
I. Detection and Analysis of Nucleosome Arrays
II. Models for the Formation of Periodic Nucleosome Arrangements and Their Implications
III. In Vitro Chromatin Assembly Systems
   A. Chromatin Assembly Using Crude Extracts
   B. Chromatin Assembly in a System Consisting of Purified Components
IV. Relationship between ... Array and Chromatin Higher Order Structure
V. Signals in Geno... Nucleosome Alignment
   A. Introns of the Chicken Ovalbumin Gene Promote Nucleosome Alignment in Vitro
   B. Signals in Chicken β-Globin DNA Influence Chromatin A... Vitro
   C. Rat Growth Hormone Gene Introns Stimulate Nucleosome Alignment in Vitro and in Transgenic Mice and Increase Transcription Efficiency
VI. Chromatin Assembly on Plasmids in Transfected Cells
VII. Perspective
References
Progress in Nucleic Acid Research and Molecular Biology, Vol. 54. Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.
One of the hallmarks of chromatin structure is the periodic array of nucleosomes, revealed in electron micrographs of chromatin spread at low ionic strength or by the ladders of bands in gels following micrococcal nuclease (MNase) digestion and electrophoresis of the purified fragmented DNA. The periodic nucleosome arrays must in some way reflect or influence the higher order chromatin structures in which they reside, even though they persist when the higher order structure is unfolded, because of the very tight association of the core histone octamers with DNA. In turn, the higher order structures of chromatin are thought to be responsible both for organizing the folding of DNA in chromosomes and, in cooperation with transcription factors, for regulating access to the genetic material in a cell-type-specific way (1-3). It is known from in vitro studies, for example, that direct addition of linker histone H1 to chromatin that was gently depleted of histone H1 regenerates the native “30-nm fiber,” provided that the native internucleosome spacings were not disturbed (4, 5). Perturbation of the native nucleosome spacings along the DNA leads to nonnative chromatin structures and, generally, nucleoprotein aggregation when the highly basic H1 histone is added.
Despite being a fundamental property of chromatin, which might be intimately associated with chromatin structure and function, nucleosome-array formation is still poorly understood. It seems that there are several reasons why this problem has not been solved. First, it is not obvious what the typical MNase ladder is really telling us about the apparently well-ordered nucleosome arrays. As discussed in Section II,A, the bands on a gel are much broader than DNA restriction fragments, there is a considerable background signal between the bands, and the periodic signal generally damps out (bands fail to be resolved from adjacent ones) at about the decamer of the ladder. These characteristics allow for several interpretations concerning the degree of order actually present in arrays of nucleosomes and how the order arises. Additionally, there are technical difficulties in examining the chromatin structures of single-copy DNA sequences in higher cells by Southern hybridization. Signals are weak, necessitating the application of large amounts of genomic DNA to gel lanes, which can affect band resolution and DNA fragment mobility, and long exposure times are required, which can lead to high background signals. Also, cross-hybridization to similar sequences or the presence of repetitive DNA in the probe can result in detection of essentially the bulk chromatin repeat, masking the signal arising from the desired genomic region.
Finally, completely defined in vitro systems capable of assembling any DNA into chromatin with the properties of bulk cellular chromatin have not been developed. The systems that have been developed will be described here. It seems likely that nucleosome arrays can be formed through several different mechanisms, even though the final products might appear to be very similar (see Section III). In this essay, I review what is currently known about nucleosome-array formation, discuss the implications of different models, and present some speculative new ideas concerning the genomic organization of chromatin.
I. Detection and Analysis of Nucleosome Arrays
Nucleosomes in chromatin consist of histone cores (octamers of two each of the four core histones), about which 146 bp of DNA is tightly wrapped, and DNA linkers that connect adjacent particles (1). The 146 bp associated with the nucleosome core particle is considerably more resistant to cleavage by MNase (as well as by most other nucleases and DNA-cleaving reagents) than is the linker DNA. Hence, limited digestion generates a mixture of differently sized oligonucleosomes, and the purified DNA fragments isolated from these nucleosome oligomers appear to be multiples of a unit repeat, corresponding to the average nucleosome size (core plus linker). Because such experiments involve chromatin or nuclei from a large number of cells, each of which contains a large amount of DNA, and all linkers can be cleaved with approximately equal probability, each band of an “MNase ladder” on a gel contains a large number of different DNA sequences, all derived from a particular oligomer size (e.g., the trimer). Thus, an average oligomer size (in base pairs) is obtained. Of course, attention can be focused upon a particular region of the genome using Southern blotting and specific hybridization to a selected DNA probe. In this case only the oligomers that contain the probe sequence or a portion of the probe sequence will be detected. Generally, the relative nucleosome positions on the sequences corresponding to the probe, as well as contributions to the pattern extending up to approximately 1.5 kb on either side from the midpoint of the probe, will be assessed in such an experiment.
The nucleosome spacing periodicity (repeat length) exhibits differences among some species and cell types (1). For example, the repeat length in baker’s yeast is only about 160 bp, indicating that the nucleosomes are very close together (average linker length of 14 bp); in most animal tissues it is about 200 bp (54-bp average linker), and in sea urchin sperm the repeat is about 240 bp, indicating that the average DNA linkers associated with nucleosomes in this case are approximately 94 bp.
Interestingly, nucleosome periodicity variations have also been detected on different DNA sequences within the same cells (6-9). Measurement of the nucleosome spacing periodicity is best done by first measuring very carefully the mean sizes of the nucleosome oligomer DNA bands on an extended ladder from a gel lane run adjacent to a lane with size standards, and then plotting size versus oligomer number (see later, for example, Fig. 16C). The best straight line through the points gives the repeat length (average number of base pairs per nucleosome). This method (10) largely corrects for the end-trimming of the excised oligonucleosomes, which increases with the extent of the digestion. Thus, as the digestion proceeds, the DNA length contained in each nucleosome oligomer decreases by approximately the same amount, leading to a straight line with the same slope as one obtained from a less extensive digestion, but shifted downward. The negative intercept corresponds to the amount of DNA trimmed from both ends of an oligomer. In general, positive intercepts
should not be obtained; if present, they might reflect some type of heterogeneity in the sample, or the presence of nonnucleosomal protein-DNA complexes close to the hybridization probe. It is also best to omit the monomer from the analysis, because mononucleosomes readily lose histone H1 and become trimmed to 146-bp core particles.
Thus far we have only considered the arrangement of nucleosomes relative to each other. However, even for a highly ordered nucleosome array, there might not necessarily be any particular relationship between the nucleosome positions and the underlying DNA sequence. To examine nucleosome positioning with respect to DNA, the indirect end-label method (11, 12) can be used. In this type of experiment, DNA purified from MNase-digested chromatin is cut to completion with a restriction enzyme, providing a known reference point. Southern hybridization with a short probe abutting the restriction site then allows mapping of the MNase cuts that occurred at particular distances (in one direction) from the reference point. Figure 1 shows such an experiment. Differences between the cutting patterns of chromatin (lanes 1 and 2) and a naked DNA control (lane D) provide information on nucleosome positioning with respect to the base sequence. For example, observing protection of chromatin sites that are preferred cutting sites on naked DNA, and observing distances between chromatin cuts that are consistent with the size of a nucleosome, both indicate nucleosome positioning with respect to DNA. For the data shown in Fig. 1, the region of the gel marked by the vertical bar is evidence for the existence of an array of five positioned nucleosomes that formed on the 3' end of the rat growth hormone gene in an in vitro chromatin assembly system (13); the cuts in the chromatin were 200 bp apart.
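The repeat-length measurement described earlier in this section (mean band size plotted against oligomer number, with the slope taken as the repeat length and the intercept as the end-trimming) can be sketched in Python as follows. This is only a minimal illustration: the band sizes are invented, not read from any gel, and the function name `repeat_length` is hypothetical. The monomer is left out, as the text recommends.

```python
import numpy as np

CORE_BP = 146  # DNA length in the nucleosome core particle


def repeat_length(oligomer_numbers, band_sizes_bp):
    """Least-squares line through band size (bp) versus oligomer number.

    Returns (slope, intercept): the slope estimates the repeat length and
    the intercept the total DNA trimmed from the two ends (negative).
    """
    slope, intercept = np.polyfit(oligomer_numbers, band_sizes_bp, 1)
    return slope, intercept


# Hypothetical data: oligomers 2-8 from a 200-bp repeat with about 30 bp
# trimmed in total from the ends of every oligomer.
n = np.arange(2, 9)
sizes = 200 * n - 30

slope, intercept = repeat_length(n, sizes)
print(f"repeat length: {slope:.0f} bp")
print(f"intercept:     {intercept:.0f} bp")
print(f"mean linker:   {slope - CORE_BP:.0f} bp")
```

With real measurements the points scatter about the line; a more extensive digestion should move the intercept further below zero while leaving the slope essentially unchanged.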
On the other hand, nucleosomes that are randomly positioned with respect to the DNA sequence should generate essentially the same digestion patterns for chromatin and naked DNA (Fig. 1, lower region of gel). This is because each of the preferred MNase cutting sites on naked DNA should be accessible on some molecules. The failure to detect nucleosome arrays that are uniquely positioned with respect to the DNA sequence by the indirect end-label method, however, does not imply that nucleosomes are randomly arranged. For example, the formation of nucleosomes spaced at apparently regular intervals with respect to each other, but not with respect to the DNA sequence, could occur in two ways. First, it could be accomplished through multiple, mutually exclusive, positioning frames on different molecules. In this case, any particular molecule would possess a highly ordered positioned array, using one of the frames. In the hypothetical example illustrated in Fig. 2A, about half of the molecules have frame a and half have frame b (14). Cutting would be detected at all four of the preferred MNase sites (arrows) shown for naked (n)
FIG. 1. Indirect end-label analysis for nucleosome positioning with respect to the rat growth-hormone gene DNA sequence. (Referring to Fig. 13, construct a was assembled into chromatin in vitro. Samples were digested lightly with MNase, deproteinized, then digested to completion with XhoI; the Southern blot was probed with probe 11.) Here, lane D is the naked DNA control; lane M, labeled size markers. The thick line at the left directs attention to the region of the gel (lanes 1 and 2) where cleavage sites differ from those of the naked DNA. These sites are separated by 200 bp, consistent with a regularly spaced, positioned nucleosome array. The positions of these cutting sites on the gene are indicated by arrows on the map in Fig. 13a. (Reprinted with permission from Ref. 13.)
DNA, leading to a result similar to that expected for random nucleosome positioning. Alternatively, the degree of order might be high enough to generate an extended MNase ladder, but not high enough to satisfy the stringent criteria for nucleosome positioning with respect to DNA (complete protection of all of the preferred MNase sites on naked DNA that should be contained in nucleosomes), as illustrated in Fig. 2B. Here, the nucleosome arrangement shown in frame a protects all of the preferred MNase cleavage sites (arrows) shown on the naked (n) DNA. However, in the other arrangements shown (frames b-e), one nucleosome is either missing or displaced, leading to exposure of MNase sites 1-4, respectively, in frames b-e. For a mixture of such arrangements, MNase digestion would generate many fragment lengths that would be multiples of, say, 200 bp, which would generate a ladder in a simple probing experiment. Nevertheless, cutting at all four of
FIG. 2. Hypothetical nucleosome arrangements illustrating how simple probing might detect ordering of nucleosomes with respect to each other, but indirect end-label analysis would indicate a lack of nucleosome positioning with respect to the base sequence. Arrows indicate preferred MNase sites on naked DNA (n). (A) The sample consists of a mixture of molecules with two different “phasing frames” (a and b). (B) The sample consists of a mixture of molecules including a perfectly ordered positioned array (a) and imperfect arrays with one nucleosome either missing or displaced (b-e). (Reprinted with permission from Ref. 14.)
the MNase sites shown would be detected in an indirect end-label experiment, again leading to a result similar to that expected from random nucleosome positioning.
II. Models for the Formation of Periodic Nucleosome Arrangements and Their Implications
The finding that nucleosome linker lengths vary in chromatins from different sources suggested that linker lengths might vary within a particular cell type. This result was consistent with the breadths of the bands seen in MNase ladders. However, an alternative explanation for the appearance of the ladder was that the linkers were really homogeneous, but that cleavage left DNA tails of various sizes on the nucleosome oligomers excised from chromatin. For example, the dimer excised from a homogeneous 200-bp array might contain DNA sizes ranging from 346 bp (two 146-bp completely trimmed cores plus one 54-bp internal linker) to 454 bp (346-bp trimmed dimer plus two 54-bp full-length linker tails), with a mean size of 400 bp
(346-bp trimmed dimer plus two 54/2-bp centrally cut linker tails). This possibility was eliminated in a classic experiment by Prunell and Kornberg (15). Using exonuclease III plus subsequent digestion of the remaining single-stranded DNA, under conditions where the initially broad monomer trimmed to a sharp 146 bp, it was shown that the dimer band remained broad, indicating that linkers in rat liver chromatin must in fact be heterogeneous in length. It has been more difficult to determine whether the linker length variation is simply of a statistical nature, with the full range of variation possible within any group of consecutive nucleosomes, or whether different types of arrays exist that have fairly uniform linker lengths with particular values, or whether linker lengths vary in a way defined by the DNA base sequence.
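The size range quoted above for a dimer cut from a homogeneous array follows from simple bookkeeping, which the short Python sketch below generalizes to any oligomer. The function name `oligomer_size_range` is hypothetical, and the calculation assumes identical linkers throughout, with each flanking linker cut at its midpoint on average.

```python
CORE_BP = 146  # fully trimmed core particle


def oligomer_size_range(k, linker_bp, core_bp=CORE_BP):
    """Smallest, mean and largest DNA length (bp) of a k-mer excised from
    a perfectly uniform array (assumption: all linkers are identical)."""
    trimmed = k * core_bp + (k - 1) * linker_bp  # both tails removed
    mean = trimmed + linker_bp                   # two half-linker tails
    full = trimmed + 2 * linker_bp               # both flanking linkers kept
    return trimmed, mean, full


print(oligomer_size_range(2, 54))  # dimer from a 200-bp array: (346, 400, 454)
```

Under these assumptions the spread from tails alone is two linker lengths for every oligomer, whatever its size.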
A. Statistical Positioning of Nucleosomes
It has been argued that, although a degree of sequence specificity exists in the histone octamer-DNA interaction, the sequence specificity cannot be very high because essentially all DNA sequences in eukaryotes are packaged into nucleosomes. Moreover, the nucleosome spacing periodicity can vary among certain cell types of an organism, which contain the same DNA. For example, in chicken liver the nucleosome repeat is 195 ± 5 bp, whereas in chicken erythrocyte it is 207 ± 5 bp (1). The bands on MNase ladders prepared from these two tissues are clearly seen to go out of phase with each other as one proceeds up the ladder, consistent with this repeat difference. To account for these observations, along with the observations mentioned above that MNase ladders contain a background signal and that the bands generally damp out at about the decamer, it was proposed that nucleosomes might form randomly on DNA, constrained only by the ratio of the total histone to the DNA in the chromatin (16). Apparent ordering then arises simply from the relatively high density of nucleosomes on DNA, even though individual linker lengths can have any value. Only the average linker value, when averaged over a very large number of nucleosomes, corresponds to the physiological value observed for that tissue, for example, 50 bp (196 − 146 bp) for a typical 196-bp nucleosome repeat. To demonstrate the validity of this idea, a statistical mechanical formulation was used to derive expressions that give the probabilities for obtaining DNA fragments of any length, for small values of a simulated random cutting frequency (16). The probability vs. length curves computed can be equated with densitometer scans of MNase ladders, obtained from real experiments.
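The idea can be explored numerically with a small Monte Carlo sketch in Python. It is not the statistical mechanical formulation of Ref. 16. It assumes that linker lengths are independent and exponentially distributed about the chosen mean, that each linker is cut at most once with a small probability, and that the cut falls uniformly within the linker. The function names `simulate_digest` and `scan` and all parameter values are hypothetical.

```python
import numpy as np

CORE_BP = 146  # DNA protected by one histone octamer


def simulate_digest(mean_linker_bp, n_nucleosomes=1_000_000,
                    cut_prob=0.05, seed=0):
    """Fragment lengths (bp) from a light digest of a random array.

    Assumptions (illustrative only): exponential linker lengths, at most
    one cut per linker, cut position uniform within the linker.
    """
    rng = np.random.default_rng(seed)
    linkers = rng.exponential(mean_linker_bp, n_nucleosomes)
    # Layout along the DNA: core 0, linker 0, core 1, linker 1, ...
    linker_starts = (CORE_BP * np.arange(1, n_nucleosomes + 1)
                     + np.cumsum(linkers) - linkers)
    is_cut = rng.random(n_nucleosomes) < cut_prob
    cuts = linker_starts[is_cut] + rng.random(is_cut.sum()) * linkers[is_cut]
    return np.diff(cuts)  # distances between successive cuts


def scan(fragments, max_bp=2000, bin_bp=10):
    """Histogram of fragment lengths, a stand-in for a densitometer scan
    on a linear length scale."""
    bins = np.arange(0, max_bp + bin_bp, bin_bp)
    counts, edges = np.histogram(fragments, bins=bins)
    return edges[:-1], counts


fragments = simulate_digest(mean_linker_bp=50)
lengths, counts = scan(fragments)
```

Plotting `counts` against `lengths` for several values of `mean_linker_bp` gives a rough feel for how the number of resolved peaks and the background between them depend on the average linker length in a model of this kind.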
Figure 3 shows simulated “densitometer scans” from statistically positioned nucleosomes for four different average linker lengths, corresponding approximately to those determined experimentally from a range of organisms (1): 35 bp (HeLa cells), 50 bp (rat liver), 65 bp (chicken erythrocyte), and
FIG. 3. Simulation of an MNase digestion experiment for low extents of digestion. The relative number of DNA fragments (ordinate) with lengths in the range between 0 and 2000 bp (abscissa) produced from a random nucleosome arrangement on a very long DNA molecule was computed for average linker lengths of 35, 50, 65, and 95 bp in a-d, respectively. These plots can be taken to represent densitometer scans of MNase ladders, except that the length scale is linear instead of logarithmic, as would be obtained from an actual gel. [Reproduced from Nucleic Acids Res. 16, 6677-6690 (1988), by permission of Oxford University Press.]
95 bp (sea urchin sperm), in panels a-d, respectively. The simulated scan for the 50-bp linker length (panel b), which is close to the value found in most animal tissues, looks remarkably similar to data from an actual nuclease digestion experiment. Therefore, statistically positioned nucleosomes could account for the periodicities observed experimentally from a typical chromatin. It is important to keep in mind that according to this model, any particular nucleosome array would possess little order. Order is illusory, arising from the large number of nucleosomes being analyzed in the sample.
Despite the fact that completely random nucleosome positioning (constrained by the overall nucleosome density on DNA) can account for the appearance of a typical “200-bp” MNase ladder, there are indications that this model may not, in general, be true. This model necessarily predicts that as the average repeat length becomes shorter, the background signal must diminish and more bands (peaks) should be resolved. Conversely, as the average repeat length becomes longer, the background signal must increase and fewer bands should be resolved. This effect is clearly demonstrated in Fig. 3 for the 35-bp (panel a) and 95-bp (panel d) linker lengths on comparison with the 50-bp linker length (panel b). We have performed a large number of MNase digests on nuclei isolated from HeLa cells and other cell lines where the average linker length is approximately 35 bp, and have never observed a significant reduction in the background signal or resolved any more bands than from rat liver chromatin (Fig. 3b). Also, I am not aware of any published work where this effect has been observed in chromatin from cultured cells. Perhaps more significantly, we also do not see an increase in the background signal or resolve fewer bands from sea urchin sperm chromatin, as simulated in Fig. 3d. Scans from actual MNase ladders from HeLa and sea urchin sperm chromatin are shown in Fig. 4.
In each case, the number of bands resolved and the background signals are about the same as that from rat liver chromatin. An alternative to the “statistical positioning” model, consistent with all of the data, is a “mosaic” model in which well-ordered regions of chromatin, generally containing less than about 10 nucleosomes, exist, interrupted with less ordered regions. Well-ordered chromatin regions possessing different nucleosome spacing periodicities would also contribute to the damping of the MNase ladder signal. I should point out that the experimental conditions chosen are important for obtaining good ladders from sea urchin sperm chromatin. When digestions are performed in the absence of added sodium chloride, relatively poor ladders are obtained, apparently caused by nonspecific binding of excess histone H1, released from small nucleosome oligomers excised during the digestion, to the remaining chromatin. Sea urchin histone H1 binds particularly strongly to DNA.
FIG. 4. Densitometer scans of actual MNase ladders. The HeLa and sea urchin sperm chromatins had nucleosome spacing periodicities of 185 ± 5 and 241 ± 5 bp, corresponding to a and d, respectively, in Fig. 3. Samples were run on different gels.
B. DNA Sequence-directed Chromatin Structures
Why should it matter how nucleosome arrays are formed, as long as the DNA gets packaged? It matters because if information in the DNA can direct the formation of particular chromatin structures in different chromosomal regions, or if different chromatin structures can be induced to form in different cell types, these structures can have functional significance. For example, the pairing of homologous chromosomes that occurs during meiosis might conceivably require the formation of and the interactions between specialized chromosomal structures that are encoded in DNA. Similarly, centromere function might require a specialized chromosomal structure encoded in centromeric DNA. Also, it is very plausible that certain types of more condensed chromatin higher order structures can be induced to spread (or, alternatively, spreading can be inhibited) over adjacent DNA, as is thought to mediate the phenomenon of position effect variegation in Drosophila (17). Here, heterochromatin appears to spread its structure to varying extents over (potentially active) euchromatin, which adventitiously became joined to heterochromatin as a result of a chromosomal rearrangement or an insertion. The extent of heterochromatin spreading in a particular cell determines whether adjoining genes
are encompassed and thereby become inactivated. The active or inactive state then becomes incorporated into the chromatin structure, and it is clonally inherited, giving rise to patches of cells in which a gene is either active or inactive: the “variegated” phenotype. Direct molecular evidence supporting this model has recently been obtained (18). Chromatin analysis by MNase digestion revealed a more regular nucleosome array for a transgene that was inserted next to heterochromatin (and could not be activated when induced) than when the transgene was inserted into euchromatin. This apparent spreading of a more regular chromatin structure over a transgene from the adjoining heterochromatin suggests that DNA sequence context effects can in some way influence chromatin structure and gene expression.
Additionally, the nature of the chromatin higher order structure formed in a particular DNA domain or bounded region could be strongly dependent on the nucleosome linker length and the degree of linker heterogeneity for the nucleosome array contained therein (see Section IV). There is considerable evidence for the existence of cell type-specific DNA domains (19-21). Domains are demarcated by the interactions of the domain boundary sequences with specialized proteins. Hence, if there are signals in DNA that can influence nucleosome array formation, then by apportioning particular DNA regions (or different DNA lengths) into different domains in different cell types, the signals contained in that DNA region could serve as inputs to direct the formation of a particular type of chromatin higher order structure in that domain. This model is attractive because it requires only a small number of regulatory proteins (those that interact with domain boundary sequences) to induce the formation of relatively large chromatin domains with either more “open” or more condensed chromatin structures. This mechanism could be of fundamental importance in gene regulation.
In contrast, if the DNA sequence does not influence nucleosome array formation or higher order chromatin structure, as would be the case if nucleosome arrangements were entirely statistical in nature, then nucleosomes could not transmit information throughout an array, and the type of mechanism discussed above could not operate.
III. In Vitro Chromatin Assembly Systems
A few in vitro systems have been developed. Some assemble chromatin with physiologically spaced nucleosomes (average linker lengths around 50 bp), and generate MNase ladders comparable to or more extended than what is observed in native chromatin. These systems are useful for studying what
factors affect nucleosome array formation, and provide information on the mechanisms involved.
A. Chromatin Assembly Using Crude Extracts
Chromatin containing ordered, physiologically spaced nucleosomes was first assembled in a cell-free system more than 18 years ago (22). It was shown that a high-speed supernatant fraction from Xenopus laevis eggs could assemble relaxed SV40 DNA (a 5.2-kb circle) into chromatin. The egg supernatant contained endogenous histones, which are present in Xenopus eggs as a stored histone pool. Significantly, chromatin assembly did not require DNA replication. Further progress with this system was limited because the reaction worked best with the extract in its crudest form. In fact, the very small reaction volumes used could not even be scaled up without deleterious effects, owing to a requirement for maintaining a large surface-to-volume ratio in order to maintain the proper pH.
Some years later, reproducible scaled-up reaction conditions were defined for the Xenopus oocyte supernatant, after the realization that the reaction requires Mg2+ and ATP and is strongly affected by the concentrations of these components, as well as by temperature (23). It was then demonstrated that chromatin can be assembled well on small plasmids, irrespective of the DNA sequence, in reactions that require incubation times of about 6 hours at 27°C. Interestingly, the oocyte extract does not contain histone H1, and (at 27°C) generates nucleosomes spaced, on average, at approximately 180-bp intervals. At 37°C, nonphysiological 160-bp spacings result. When exogenous histone H1 is added to the extract (at the beginning of the reaction), significantly longer nucleosome repeats, up to 220 bp, are generated; the value of the repeat length depends upon the amount of H1 added. Apparently, the number of nucleosomes present on the plasmid determines the average value of the repeat. Histone H1-containing chromatin with longer repeats has correspondingly fewer nucleosomes on the plasmid.
However, the number of nucleosomes contained on the plasmid, with or without H1, is heterogeneous, and is distributed about a mean value, following a Gaussian distribution. It was suggested (23) that this heterogeneity in nucleosome number is responsible for the smearing (loss of band resolution) that is generally observed about halfway up the extended MNase ladders produced. Thus, circular plasmid molecules containing different numbers of nucleosomes generate ladders with different numbers of bands in such a way that at the extremes of the ladder, the length differences between corresponding oligomers (i.e., 2 or n - 2, where n is the total nucleosome number on the plasmid) are small compared with the length difference between those and the next oligomer (i.e., 3 or n - 3). Thus, bands arising from oligomers 2 and 3 or n - 2 and n - 3 are resolved. However, toward
the center of the ladder, the length differences between corresponding oligomers are too great, and the overlap with the next oligomer is too extensive, to produce discrete bands. This effect has been referred to as a vernier effect by analogy with what occurs on a precision measuring device when the main scale and the vernier scale zero division marks are aligned and the two scales are compared.
Recently, efficient cell-free chromatin assembly systems have been developed from Drosophila embryos (24, 25). Embryos are homogenized in a small volume of extraction buffer in order to minimize dilution of cytoplasmic components and to remain as close as possible to physiological conditions. As with the Xenopus oocyte system, the extract utilizes endogenous histones bound to specialized carrier proteins, although extracts can be supplemented with purified embryonic histones to some extent. The requirements for ATP, Mg2+, and a 26°C incubation temperature, as well as the characteristics of the assembly reactions, are very similar to the Xenopus oocyte system, suggesting that the assembly mechanism is the same in the two systems. The characteristics of these reactions suggest that nucleosomes simply tend to become distributed over the plasmid, constrained only by the nucleosome density on the DNA, in just the way predicted by the statistical positioning model. Perhaps the ATP requirement is associated with a mechanism that induces nucleosome “sliding” along the DNA, allowing the nucleosomes to overcome DNA sequence effects and distribute evenly (possibly randomly) throughout the plasmid. This idea is consistent with the recent exciting findings that ATP-driven nucleosome “sliding” may be involved in generating nuclease-hypersensitive sites in chromatin, important for gene activation (3, 26, 27). A limitation of these systems is that the chromatin assembly extracts are crude, containing a large number of proteins and enzymatic activities.
Fractionation of the extracts, while maintaining good chromatin assembly activity, is far from trivial, and has not yet been accomplished. Additionally, the histones of Xenopus oocyte and Drosophila embryo extracts contain some unusual variants that might be required for the reaction to proceed, and there is no histone H1 in these cells. These differences from the ordinary chromatin of somatic cells may be adaptations for the extremely rapid replication that occurs in both Xenopus oocytes and Drosophila embryos. For example, the fly genome is replicated and packaged once every 9 minutes using a maternal pool of stored histones (28). In contrast, most somatic cells divide roughly every 24 hours, histone pools are absent, and histone H1 is present (29). Thus, the chromatin assembly mechanism used in Xenopus oocytes and Drosophila embryos might be a special histone H1-independent mechanism adapted to rapidly dividing cells, and may differ from the mechanism used in ordinary cells.
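The vernier effect described above for MNase ladders from circular plasmids can be illustrated with a short numerical sketch. The plasmid length (3000 bp) and the nucleosome numbers (14, 15, and 16) are hypothetical values chosen only for illustration, and the nucleosomes are assumed to be evenly spaced on each molecule; the function names are not taken from the cited work.

```python
# Hypothetical values for illustration: a 3000-bp circular plasmid that
# carries 14, 15, or 16 evenly spaced nucleosomes in different molecules.
PLASMID_BP = 3000
NUCLEOSOME_COUNTS = (14, 15, 16)


def oligomer_length(plasmid_bp, n, k):
    """Length (bp) of a k-nucleosome fragment when n nucleosomes are evenly spaced."""
    return k * plasmid_bp / n


def band_spread(plasmid_bp, counts, k, from_top=False):
    """Spread (bp) of corresponding bands among plasmids with different n.

    from_top=False compares the k-mers; from_top=True compares the (n - k)-mers.
    """
    lengths = [
        oligomer_length(plasmid_bp, n, n - k if from_top else k) for n in counts
    ]
    return max(lengths) - min(lengths)


def mean_repeat(plasmid_bp, counts):
    """Average nucleosome repeat (bp) over the population of plasmids."""
    return sum(plasmid_bp / n for n in counts) / len(counts)


repeat = mean_repeat(PLASMID_BP, NUCLEOSOME_COUNTS)
for k in range(1, 8):
    spread = band_spread(PLASMID_BP, NUCLEOSOME_COUNTS, k)
    print(f"{k}-mer: band spread {spread:6.1f} bp; band spacing about {repeat:5.1f} bp")
```

Under these assumptions the spread of a band grows in proportion to k, so bands near either end of the ladder (small k, or n - k) stay narrow relative to the spacing between bands, whereas toward the middle the spread approaches the spacing and the bands merge.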
B. Chromatin Assembly in a System Consisting of Purified Components
1. GENERAL CONSIDERATIONS
It has been known for some time that it is a simple matter to form nucleosomes on DNA in vitro, when the core histones are prepared by salt extraction, avoiding denaturing conditions. One simply gradually lowers the NaCl concentration from 2.0 M, in which histones and DNA do not interact, to physiological salt concentrations (or lower), in which the core histones bind to DNA essentially irreversibly, and nucleosomes form spontaneously (30). This material does not resemble chromatin, however, in that the nucleosome arrangement is highly irregular, with many nucleosomes packed closely together at physiological ratios of histone to DNA. MNase digestion of such material generally produces a continuum of DNA fragment lengths, on which multiples of about 150 bp, the close packing periodicity, can be detected. Including histone H1 with the core histones only makes matters worse, because H1 nonspecifically coats regions of free DNA in a cooperative fashion, and leads to nucleoprotein aggregation (31, 32). Moreover, by 0.50 M NaCl, the core histones are already very tightly bound as octamers, before histone H1 interacts with DNA at all (33).
2. POLYGLUTAMATE-MEDIATED REACTIONS
The early attempts to fractionate Xenopus egg extracts, although not successful with regard to the problem of reconstituting spaced nucleosome arrays, identified a very abundant acidic nuclear protein, nucleoplasmin, involved in chaperoning histones and in maintaining the stored histone pools (29). In vitro, nucleoplasmin serves as a nucleosome assembly factor by discouraging nucleoprotein aggregation, and allowing nucleosomes to form readily at physiological ionic strength. It was subsequently shown that the acidic polypeptide, sodium polyglutamate, has properties very similar to nucleoplasmin with regard to in vitro nucleosome assembly, and is very effective in discouraging nucleoprotein aggregation (34). It was further demonstrated that the presence of polyglutamate permits histone H1 to restore nucleosome ordering and spacing, at physiological salt concentrations, in H1-stripped chromatin for which the native nucleosome arrangement had been perturbed by nucleosome “sliding” at elevated salt concentrations (35). The reaction appears to work by the polyglutamate providing alternative H1 binding sites, thereby discouraging H1 binding to regions of naked DNA, which leads to aggregation. Histone H1 prefers to bind to nucleosomes, rather than to naked DNA (36). H1-nucleosome interactions result in the physiological spacing of closely spaced adjacent nucleosomes, provided that spacing is not hampered by the
presence of another nucleosome on the DNA occupying the same space. This type of interference occurs when one tries to add histone H1 (in the presence of polyglutamate) to nucleosomes reconstituted, at physiological ratios of core histone to DNA, on most plasmids or on randomly sheared mixed-sequence vertebrate DNA. Additionally, some of the randomly deposited nucleosomes remain too far apart. Thus, it is found that only about three or four nucleosomes in a row become properly spaced. The mechanism whereby histone H1 is able to move closely spaced nucleosomes apart is not clear because exactly how H1 interacts with the nucleosome is still uncertain. In one model (37-41), the central globular domain of H1 binds to the DNA regions that enter and exit a nucleosome, causing them to cross over, or stabilizing the crossover. This mode of binding could increase the amount of DNA wrapped around the histone core in each of a pair of closely spaced nucleosomes, requiring these nucleosomes to move apart. In another model (42), histone H1 binds asymmetrically to nucleosomal DNA, extending partially over one nucleosome linker. This mode of binding could also conceivably drive adjacent nucleosomes apart, by stiffening linker DNA, causing it to unwrap from around a closely spaced nucleosome and causing this nucleosome to slide over.
3. SYNTHETIC POLYNUCLEOTIDE POLY(dA-dT)·POLY(dA-dT) SPONTANEOUSLY ASSEMBLES INTO CHROMATIN-LIKE STRUCTURES IN THE POLYGLUTAMATE-MEDIATED SYSTEM
An interesting result was obtained when the synthetic polynucleotide poly(dA-dT) was reconstituted with core histones and then incubated with chicken erythrocyte linker histone H5, an H1 analog, at physiological ionic strength in the presence of polyglutamate (43). In contrast with what was observed with plasmids or sheared vertebrate DNA, the polynucleotide permitted the initially randomly distributed and closely packed nucleosomes to be spaced at physiological intervals, and chromatin-like higher order structures to form. Figure 5B (44) shows a typical MNase ladder, compared with that obtained from HeLa chromatin (Fig. 5A) using the same gel system. A denaturing gel system is required to prevent the poly(dA-dT) fragments from forming a variety of hairpin and secondary structures. The nucleosome repeat obtained from analysis of the gel (B) was 210 ± 5 bp, slightly longer than a typical native chromatin repeat. For example, for native HeLa chromatin, the repeat from the denaturing gel (Fig. 5A) was 185 ± 5 bp, the same as that obtained from an ordinary agarose gel. Figure 6 shows electron micrographs of poly(dA-dT) chromatin before (a) and after (b) incubation with histone H5. Before incubation, only randomly arranged nucleosomes are seen, whereas after incubation, nucleosomes have condensed into solenoid-like structures closely resembling those of native chromatin. Additionally, in the first two panels of Fig. 6b, naked poly(dA-dT) duplex can be seen extending from solenoid-like structures that contain 30 or more nucleosomes. These results show that linker histone H5 (or H1) has a strong tendency to space nucleosomes apart and package the regularly spaced array into chromatin-like structures. In contrast with the Xenopus oocyte or Drosophila embryo systems, nucleosome ordering depends on the presence of linker histone. In the absence of linker histone, nucleosomes do not form an ordered arrangement or higher order structures (Fig. 6a), and in MNase digests only the close packing periodicity can be detected (43).
FIG. 5. MNase ladders from native and reconstituted chromatin on denaturing formamide-containing (4%) polyacrylamide gels. (A) Single-stranded DNA fragments produced from HeLa chromatin digested with micrococcal nuclease for increasing times (lanes 1-6, respectively). (B) Poly(dA-dT) fragments produced from chromatin reconstituted in vitro using chicken erythrocyte histones, and digested with micrococcal nuclease for 30 seconds (lane 1) or 1 minute (lane 2). Lanes labeled L are 123-nucleotide ladders; several marker DNA fragment sizes in nucleotides are indicated. (Reprinted with permission from Ref. 44.)
FIG. 6. Electron micrographs of poly(dA-dT) chromatin lacking (a) or containing (b) linker histone H5. Bar = 200 nm. (Reprinted with permission from Ref. 43.)
There are several reasons why poly(dA-dT) assembles into chromatin spontaneously. First, the monotonous base sequence (repeating A-T) eliminates nucleosome positioning preferences for certain sequences. Unless preferred positioning sites are arranged periodically (see Section V), they would be expected to interfere with the reaction. Second, it turns out that poly(dA-dT) has an unusually high affinity for histone H1 (45). Third, it can be demonstrated (43) that nucleosomes readily slide along the poly(dA-dT) duplex, at physiological ionic strength, in contrast with natural DNA, for which sliding is limited to 10 or 20 bp (46). This property is consistent with the electron micrographs (Fig. 6b) showing that, on some molecules, the nucleosomes appear to have migrated along the polynucleotide and coalesced into a growing solenoid-like higher order structure. Nucleosome sliding may be a feature in common with the Xenopus oocyte and Drosophila embryo systems, although in those systems, ATP may be required for it to occur. Further studies (47) show that nucleosome arrays with spacing periodicities that span the whole physiological range can be obtained using this system. In this system, the value of the spacing periodicity is controlled by the value of the initial average nucleosome packing density, rather than by the histone H1 type. For example, the full range of periodicities observed in nature is accessible to chicken erythrocyte core histones plus histone H5. However, different H1 types differ in the efficiency with which they can recruit nucleosomes into higher order structures. For example, the unusually long and basic sea urchin sperm H1 is particularly efficient and can package nucleosomes at low packing densities, thereby generating arrays
with a long nucleosome repeat (up to 240 bp). At higher packing densities, shorter repeats are generated. In contrast, typical H1 histone types require higher packing densities to form regular arrays and higher order structures. These differences between histone H1 types can be explained by their relative effectiveness in polynucleotide charge neutralization. If arrays with long linkers tend to form due to a low nucleosome packing density, a longer or more basic H1 is required for charge neutralization to allow stable array formation. For short linker lengths, obtained at higher nucleosome packing densities, typical H1 types suffice. Even though the value of the nucleosome spacing periodicity depends on the nucleosome packing density, the mechanism of poly(dA-dT) chromatin formation appears to be different from “statistical positioning.” In poly(dA-dT) chromatin assembly, a particular packing density is required for a particular histone H1 type to nucleate a stable array or structure into which other nucleosomes can condense. In “statistical positioning,” nucleosomes are randomly arranged, constrained by the overall packing density. However, the degree to which the poly(dA-dT) chromatin assembly mechanism resembles that of cellular chromatin is currently not clear.
4. CHROMATIN ASSEMBLY ON NATURAL DNA USING THE POLYGLUTAMATE-MEDIATED SYSTEM
The work presented thus far suggests that although there is a strong tendency for linker histones (H1) to induce spontaneous chromatin assembly (when nucleoprotein aggregation pathways are discouraged), something more is required to package natural DNA sequences. For example, ATP-dependent “machinery” might be required to slide nucleosomes along the DNA in order to overcome nucleosome positioning preferences for sequences that are incompatible with the formation of a regular array. However, a serendipitous finding suggested some additional possibilities. When one particular DNA construct was subjected to the same chromatin assembly procedure described above for poly(dA-dT) and the DNA fragments analyzed by MNase digestion, an unexpected and interesting result was obtained (48). Figure 7a (lane 1) shows a remarkably strong MNase ladder, obtained from the in vitro-assembled chromatin. Seventeen bands that were multiples of 210 ± 4 bp could be resolved. This periodicity is very close to that observed for chicken erythrocyte (CE) chromatin, but the ladder extends considerably further than for the native chromatin. The ladder is also quite different from the digestion pattern obtained from the naked DNA, demonstrating that it is a property of the chromatin. The gel photograph on the right shows a completely independent experiment, where the ratio of core histone to DNA was slightly too high. In the absence of linker histone (Fig. 7b), essentially a continuum of
DNA fragment sizes is observed, and many of the bands correspond to the preferred cleavage sites on the naked DNA (D) for the lowest extent of digestion (lane 1). For the highest extent of digestion (lane 3), multiples of 150 bp, indicative of nucleosome close packing, can be detected. This plasmid construct was pBR327 (3.3 kb) containing a 301-bp insert. It turned out that the size of the insert is very important, but the sequence of the inserted DNA is not. For example, Fig. 7c shows the chromatin assembly reaction obtained using the plasmid vector alone, without an insert. Only three physiologically spaced nucleosomes could be detected, the same result obtained with almost all other DNA samples tested. It seemed significant that the DNA length occupied by 17 nucleosomes with a 210-bp repeat, 17 × 210 bp = 3570 bp, fit nearly perfectly with the size of the length-adjusted circular DNA, 3575 bp. For example, if the 210-bp repeat were invariant, it would not be possible for an integer number of nucleosomes to fit neatly on the pBR327 vector (3274 bp). However, plasmid size alone was not sufficient for chromatin assembly, because adjustment of pUC plasmids or even pBR322, the parent of pBR327, to lengths close to integer multiples of 210 bp with appropriately sized inserts, or by making deletions, was not sufficient for nucleosome alignment. By making a series of deletions from different regions of pBR327, such that all lengths were close to multiples of 210 bp, it was demonstrated that one particular region of the plasmid was necessary. This approximately 800-bp region, termed the chromatin organizing region (COR), fortuitously positioned two nucleosomes about 40 bp apart in the absence of linker histone. In the presence of linker histone (H5 or H1), nucleosomes became precisely positioned on the COR at 210-bp intervals, whereas away from the COR, nucleosome positioning with respect to the DNA sequence rapidly diminished.
These data suggest that the COR nucleated nucleosome-array formation, with a 210-bp periodicity, which then spread around the plasmid. Small plasmids (<4 kbp) with lengths that could not accommodate a 210-bp array were not packaged into a regular array. This mechanism of nucleosome alignment is quite different from what appears to occur on small plasmids using Xenopus oocyte or Drosophila embryo extracts. In those systems, chromatin assembly occurs irrespective of the DNA base sequence or plasmid size, and an apparently continuous range of nucleosome spacing periodicities can occur. The value of the spacing periodicity was consistent with the number of nucleosomes on the plasmid, which were distributed evenly or randomly. In the polyglutamate-mediated system, a specific region of pBR327 appeared to nucleate H1-dependent nucleosome alignment fortuitously with a 210-bp periodicity. When the plasmid length was close to an integer multiple of 210 bp, a nucleosome array appeared to “crystallize” on the plasmid.
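The length argument can be checked with simple arithmetic. The sketch below, offered only as an illustration, takes the two plasmid lengths and the 210-bp repeat quoted in the text and reports how far each circle is from the nearest integer multiple of the repeat; the function name `array_mismatch` is hypothetical.

```python
REPEAT_BP = 210  # nucleosome repeat reported in the text for this system


def array_mismatch(plasmid_bp, repeat_bp=REPEAT_BP):
    """Return (n, mismatch_bp) for the integer number of repeats closest to the circle length."""
    n = round(plasmid_bp / repeat_bp)
    return n, plasmid_bp - n * repeat_bp


for name, length in (("pBR327 + 301-bp insert", 3575), ("pBR327", 3274)):
    n, mismatch = array_mismatch(length)
    print(f"{name}: {n} x {REPEAT_BP} bp leaves {mismatch:+d} bp")
```

With the insert, 17 repeats leave only 5 bp unaccounted for, whereas the vector alone is 86 bp short of 16 repeats, which is more than a full linker length.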
FIG. 7. Histone H5-induced nucleosome alignment on plasmid pBR327 is greatly increased by the presence of a 301-bp DNA insertion. (a) MNase digests of chromatin assembled on pBR327 containing a 301-bp insertion (construct I1). In the experiment depicted on the left, approximately 24 µg of DNA was reconstituted with 23 µg of core histones and subsequently incubated with histone H5. The sample was digested 2, 4, or 6 minutes in lanes 1 to 3, respectively. Lane D shows naked DNA digested for 1.5 minutes under the same conditions, but using 0.6 times the MNase concentration used for chromatin. The arrow identifies a particularly intense band. The original position of lane D on this gel was several slots away from the position shown. In the experiment (using different preparations of all components) depicted on the right, the effective ratio of core histone to DNA was slightly higher than in the experiment on the left. The sample was digested 2, 1, or 0.5 minutes in lanes 4 to 6, respectively. Lane U shows undigested plasmid; the nicked (n) and supercoiled (s) DNA forms are indicated. Lane M shows a 123-bp ladder; the DNA lengths of selected multiples are shown. (b) A sample treated as in a, left, but lacking histone H5. (c) MNase digests of chromatin assembled as in a (left) on plasmid pBR327 containing no insert. Lanes labeled CE on all gels show digests of native chicken erythrocyte chromatin; DNA length multiples of nucleosome oligomers are indicated. (Reprinted with permission from Ref. 48.)
According to this nucleosome alignment model, although a particular region of plasmid pBR327 can nucleate the reaction, alignment does not occur on this plasmid because its length (3274 bp) cannot accommodate a single regular array (going around the entire plasmid) with a 210-bp repeat. Thus, although the 3274-bp plasmid could have accommodated either 18, 17, 16, or 15 evenly spaced nucleosomes, giving repeats of 182, 193, 205, or 218 bp, respectively, no evidence for any of these periodicities was found for any ratio of histone to DNA tested. In contrast, pBR327 containing a 301-bp insert (construct I1) could have contained 19, 18, or 17 evenly spaced nucleosomes, giving repeats of 188, 199, or 210 bp, respectively. In this case, a strong 210-bp periodicity was observed (Fig. 7a). Thus, the value of the repeat would have to be specified and fixed at 210 bp for this phenomenon to occur. For example, if a 205-bp repeat could occur, nucleosomes should have been able to form an ordered array on pBR327. Differences in the repeat length of this magnitude (5 bp) can barely be measured by standard methods. Moreover, this type of boundary constraint could not occur unless the nucleosome spacings in the array are fairly precise. To test this model more rigorously, 30 pBR327 deletion constructs were made, giving circles of different lengths, but each having the COR intact (49). These constructs were tested for their ability to align nucleosomes. The results of this analysis are summarized in Table I. Plasmid lengths are indicated, along with whether they aligned nucleosomes or were nonaligning (N, poor MNase ladder). In cases where good alignment occurred, the measured repeat is indicated. Additionally, “accessible repeats,” defined as the plasmid length divided by the number of nucleosomes contained, are listed. The accessible repeats correspond to the repeat lengths (in the physiological
TABLE I
SUMMARY OF RESULTS AND ACCESSIBLE REPEATS
Construct      Length (bp)   Observed repeat^a (bp)   Accessible repeats (bp)
Construct I1   3575          210 ± 4                  188.2, 198.6, 210.3
pBR327         3274          N                        181.9, 192.6, 204.6, 218.3
S1             3230          N                        190.0, 201.9, 215.3
P1             3171          N                        186.5, 198.2, 211.4
D7             3156          210 ± 5                  185.6, 197.3, 210.4
S2             3154          N                        185.5, 197.1, 210.3
P2             3036          N                        189.8, 202.4, 216.8
A1             2973          (212 ± 6)                185.8, 198.2, 212.4
D5             2950          210 ± 4                  184.4, 196.7, 210.7
P3             2933          210 ± 5                  183.3, 195.5, 209.5
A2             2910          N                        181.9, 194.0, 207.8
A3             2872          N                        191.5, 205.1
A4             2858          N                        190.5, 204.1, 219.8
P4             2816          N                        187.7, 201.1, 216.6
A5             2812          (210 ± 5)                187.5, 200.8, 216.3
A6             2751          209 ± 5                  183.4, 196.5, 211.6
D3             2734          210 ± 5                  182.3, 195.3, 210.3
D4             2733          210 ± 4                  182.2, 195.2, 210.2
A7             2725          209 ± 5                  181.7, 194.6, 209.6
P5             2692          (209 ± 6)                192.3, 207.1
P6             2577          N                        184.1, 198.2, 214.8
P7             2556          N                        182.6, 196.6, 213.0
P8             2453          N                        188.7, 204.4
P9             2401          N                        184.7, 200.1, 218.3
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
A8             2260          202 ± 5                  188.3, 205.4
A9             2243          202 ± 5                  186.9, 203.9
A10            2207          201 ± 5                  183.9, 200.6
P10            2204          (199 ± 6)                183.7, 200.4
A11            2177          N                        181.4, 197.9, 217.7
D6             2116          N                        192.4, 211.6
D2             2109          N                        191.7, 210.9
^a N, Nonaligning constructs. Parentheses denote marginal alignment. The dashed line marks the approximate size range where the observed repeat value changed.
range) that would be obtained if different numbers of nucleosomes on the plasmid were evenly spaced, as discussed above for pBR327 and construct I1. Inspection of Table I reveals that the strongly aligning plasmids with lengths greater than 2.4 kb all had accessible repeats close to 210 bp, a value matching their measured repeat. The smaller (<2.4 kbp) aligning plasmids had accessible repeats around 200 bp, again correlating with their measured
repeats. The reason for the approximately 10-bp shortening of the nucleosome repeat in the small plasmids is not clear, but it appears to be size related. Perhaps it is related to either steric or energetic constraints imposed by the small plasmid size. Additionally, with two apparent exceptions (plasmids P1 and S2), the nonaligning plasmids greater than 2.4 kbp lacked accessible repeats in the range between 208 and 213 bp. Thus, the data strongly support the model. Also, it can be calculated from the data that the statistical variation of individual linkers about the mean value of 64 bp (the 210-bp repeat less the 146 bp occupied by a nucleosome) cannot be more than about 10 bp.
The two apparent exceptions (deletions P1 and S2) to the rule that nucleosome alignment occurs when the plasmid length is close to an integer multiple of 210 bp (or, equivalently, when 210 bp is an accessible repeat) can be understood to be consequences of the effects of two additional strongly positioned nucleosomes on the plasmid. A map of pBR327 and its positioned nucleosomes is shown in Fig. 8. Nucleosomes 1 and 2 are on the COR, and appear to be responsible for nucleating nucleosome alignment with a 210-bp spacing periodicity. Nucleosome 3 is close to 5.0 × 210 bp away from nucleosome 2, and can remain in place as alignment spreads from the COR around the plasmid. Nucleosome 4, however, is incorrectly positioned at 9.6 × 210 bp, measuring counterclockwise from 2, and it inhibits alignment unless a length-adjusting deletion removes it or is made in region A. The length-adjusting deletion P1 was not successful because it was made in region I, altering the 5.0 × 210-bp distance between nucleosomes 1 and 4. Similarly, the length-adjusting deletion S2 was not successful because it was in region J, which altered the 5.0 × 210-bp distance between nucleosomes 2 and 3. Thus, positioned nucleosomes, as well as the overall plasmid length, can impose boundary constraints on the system.
FIG. 8. Arrangement of the four positioned nucleosomes that form on pBR327. Nucleosomes are denoted by circles occupying 146 bp and are numbered from 1 to 4. The lengths between nucleosome centers (indicated by letters) are 5.0 × 210 bp for I and J and 4.6 × 210 bp for A. Plasmid map positions of nucleosome centers are also indicated. Nucleosomes 1 and 2 are within the chromatin-organizing region (shaded). [Reprinted with permission from Biochemistry 32, 489-499 (1993). Copyright 1993 American Chemical Society.]
Finally, boundary constraints due to plasmid size should occur only on relatively small (<4 kbp) plasmids where the length of DNA to be packaged into an ordered nucleosome array is within the range of the COR, and where statistical effects do not prevail. Consistent with these expectations, insertion of a 2-kbp chicken genomic DNA fragment, obtained from an anonymous clone, into pBR327 and several deletion constructs relieved the boundary constraint due to plasmid size in each case (49). Nucleosome ordering with a 210-bp repeat was observed to spread from the COR over a portion of the 2-kbp insert, irrespective of the size of the plasmid vector. The extent of spreading of nucleosome order onto large fragments of DNA inserted into pBR327 depends on the sequence of the insert.
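The accessible repeats listed in Table I follow directly from the definition given above (plasmid length divided by an integer number of nucleosomes). The following sketch is one way to reproduce them; the 180-220 bp bounds used for the physiological range are an assumption that happens to reproduce the tabulated values, the 208-213 bp window is the one quoted in the text, and the function names are hypothetical.

```python
def accessible_repeats(plasmid_bp, low=180.0, high=220.0):
    """Repeats plasmid_bp / n (rounded to 0.1 bp) lying within [low, high].

    The default bounds are an assumed physiological range, not a value
    stated in the text.
    """
    repeats = []
    for n in range(1, plasmid_bp // 100 + 1):
        repeat = plasmid_bp / n
        if low <= repeat <= high:
            repeats.append(round(repeat, 1))
    return sorted(repeats)


def has_repeat_in_window(plasmid_bp, window=(208.0, 213.0)):
    """True if any accessible repeat falls in the 208-213 bp window quoted in the text."""
    return any(window[0] <= r <= window[1] for r in accessible_repeats(plasmid_bp))


for name, length in (("construct I1", 3575), ("pBR327", 3274)):
    print(name, length, accessible_repeats(length), has_repeat_in_window(length))
```

As the text notes for P1 and S2, an accessible repeat near 210 bp is necessary but not sufficient for alignment, because positioned nucleosomes away from the COR can impose additional constraints that this calculation ignores.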
IV. Relationship between the Nucleosome Array and Chromatin Higher Order Structure
Chromatin has generally been thought to possess a well-defined higher order structure at the level of compaction immediately above the nucleosome string. Chromatin fragments in solutions of appropriate ionic strength exhibit a "30-nm fiber" morphology in electron micrographs. These electron micrographs have been interpreted in a variety of ways. The most popular models are helical in nature (38, 50-54), and predict regular nucleosome-nucleosome contacts. For example, in the solenoid model (38) there are six nucleosomes per turn in a helical arrangement. The axes of the disk-like nucleosomes are perpendicular to the axis of the superstructure, and the linker DNA continues along the same superhelical path as the DNA that is wrapped around the core histone octamer. Linker DNA extends into the central hole in the solenoid, allowing the structure to accommodate small changes or variations in linker lengths without any significant structural change, when viewed from the outside. Although relatively short solenoid-like structures can clearly be seen in electron micrographs under certain conditions, a consistent finding in ultrastructural studies over the years, using a variety of electron microscopy techniques, has also been the observation of irregular structures (55, 56). Moreover, recent work suggests that a well-defined structure for the “30-nm fiber” may not exist. Using the new (in its application to biological samples) technique of scanning force microscopy (SFM), chromatin fibers have been imaged at high resolution that appear, in the absence of salt, as irregular loosely organized three-dimensional structures, instead of the familiar “bead on a string” morphology seen in electron micrographs (57). In the presence of salt, these structures condense and appear as irregular segmented structures with intricate folding and twisting, containing abrupt bends. An advantage of SFM over EM is that images can be obtained under much less harsh conditions. Hydrated macromolecules are imaged without staining, or even fixation in some cases. Interestingly, the images of chromatin fibers obtained by SFM closely resemble those recently obtained for chromatin in situ, using electron tomography to generate three-dimensional reconstructions from thin sections of nuclei embedded in a hydrophilic resin at low temperature (58).
A simple chromatin folding model has been proposed (59) that leads to the formation of structures that closely resemble the structures observed in situ and by SFM. A distinction is first made between DNA folding, which is derived from intrinsic properties of nucleosomes plus linkers, and chromatin compaction, which is regarded as an electrostatic effect determined by the local concentration of ions and charged molecules. No assumptions that are implicit in helical models are made, such as imposition of symmetry or the involvement of specific nucleosome-nucleosome contacts. According to the folding model, linker DNA is straight and continues in the plane established within the nucleosome.
The rotation angle between adjacent nucleosomes is determined by the length of the linker (L). Using one nucleosome in the chain as a reference, the plane of an adjacent nucleosome rotates with respect to its neighbor following the helical repeat of the linker DNA (about 10.4 bp/turn). The only other parameter that needs to be specified is the entry-exit angle (α), where the entering and exiting DNA linkers cross, estimated to be between 20° and 60°. By simply specifying L and α, a chromatin fiber (with a constant linker length) can be built. Building a series of fiber models by varying L, while keeping α constant, showed that the structure obtained is highly sensitive to the precise value of L. A 1-bp difference in L leads to the formation of a distinguishable and sometimes very different structure. However, when L is changed by 10 bp (approximately the helical repeat of DNA), the original structure essentially repeats itself. When linker variation is introduced into the model, statistical variations of ±2 or ±3 bp about the mean (or this variation ± 10 bp) lead to model fibers that look remarkably similar to in situ chromatin or the structures
imaged by SFM. Larger statistical variations in the linker generate highly disordered structures that do not resemble native chromatin. The requirement for a well-defined linker length (repeat length) in regions of chromatin, with only small statistical variations, in order to generate model structures that resemble native chromatin, is consistent with the results obtained from the in vitro chromatin assembly system discussed in Section III,B,4. Those studies indicate that, for the DNA being studied, the nucleosome repeat was fixed at 210 bp and statistical variations were small. The model of Woodcock et al. (59) is not consistent with the existence of random linker lengths. Additionally, the strong dependence of the nature of the chromatin fiber on the value of the linker length suggests that many different types of local chromatin structures could form in different regions of DNA, determined by the value of the nucleosome repeat or by the degree of regularity of the nucleosome array in a particular region. The in vitro studies discussed in Section III,B,4 suggest that the DNA base sequence can specify the value of the nucleosome repeat, as well as determine the regularity of the nucleosome array. Hence, in the absence of additional influences, the base sequence could also specify the chromatin higher order structure formed in that region.
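The sensitivity of the folding model (59) to linker length can be made concrete with a rough calculation of the rotation contributed by the linker twist alone. This is a simplified sketch, not the published model: it uses only the 10.4 bp/turn helical repeat quoted above, ignores the entry-exit angle and any contribution from nucleosomal DNA, and takes the 64-bp mean linker from Section III,B,4 as an example; the function names are hypothetical.

```python
HELICAL_REPEAT_BP = 10.4  # bp per helical turn of linker DNA, as quoted in the text


def rotation_deg(linker_bp, helical_repeat=HELICAL_REPEAT_BP):
    """Rotation (0-360 degrees) between adjacent nucleosomes from linker twist alone."""
    return (linker_bp / helical_repeat * 360.0) % 360.0


def angular_difference(a_deg, b_deg):
    """Smallest absolute difference (degrees) between two rotation angles."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


base = rotation_deg(64)  # example linker length of 64 bp
for delta in (1, 2, 3, 10):
    shift = angular_difference(base, rotation_deg(64 + delta))
    print(f"linker 64 -> {64 + delta} bp: rotation changes by {shift:5.1f} degrees")
```

In this simplified picture a 1-bp change rotates the next nucleosome by roughly 35 degrees, whereas a 10-bp change shifts it by only about 14 degrees, in keeping with the observation that the structure nearly repeats when L changes by about one helical turn.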
V. Signals in Genomic DNA Influence Nucleosome Alignment
Eukaryotic DNA requires packaging into chromatin, and it is reasonable to suspect that there might be information present in the base sequence that influences this process. One type of signal might affect nucleosome positioning through variations in the “bendability” of different base sequences (60-66). The exact nature of such signals is not yet clear. The positioning of nucleosomes on DNA is generally thought of in terms of precise nucleosome placement with respect to factor-binding sites in transcriptional promoters. However, there is also evidence for subtle periodic structural variations in DNA on a larger scale, possibly due to variations in DNA bendability, that might influence the formation of whole arrays of nucleosomes (67-69). Signals such as these that could influence the packaging of regions of chromatin might also be functionally important. For example, all regions of DNA might not become packaged into a single highly ordered inert chromatin structure. Thus, DNA control regions, which are often found to have a less ordered nucleosome arrangement (70, 71), might be packaged differently to allow regulatory proteins to compete with histones more effectively (2, 72). It has not been possible to determine whether less-ordered chromatin regions exist to facilitate factor binding, or if such regions arise merely as
a consequence of factor binding. Additionally, whole DNA domains might be packaged in a distinctive way in appropriate cell types to facilitate gene regulation. The in vitro studies described in Section III,B,4 provide further support for the idea that there might be signals in DNA that affect nucleosome alignment. In those studies, an approximately 800-bp (or shorter) region of plasmid DNA, fortuitously present, served to nucleate the alignment of nucleosomes by histone H1. Once nucleated, a highly ordered chromatin structure appeared to spread several kilobase pairs around the plasmid. Additionally, this fortuitous signal, referred to as a chromatin-organizing region, appeared somehow to encode the 210-bp nucleosome spacing periodicity observed in the chromatin assembly. The nucleosome repeat length was not affected by the core histone-to-DNA ratio, by adding excess linker histone, or by changing the linker histone type from H5 to H1 (48). Because of these findings, it was of interest to examine eukaryotic DNA and attempt to find signals with properties similar to those of the chromatin-organizing region of pBR327. It would be anticipated that the fortuitous plasmid signal would only superficially resemble eukaryotic signals that evolved to influence nucleosome alignment. For example, signals in vertebrate genomic DNA might be more extended and, possibly, more potent.
A. Introns of the Chicken Ovalbumin Gene Promote Nucleosome Alignment in Vitro
A map of the 12-kbp chicken genomic DNA fragment containing the ovalbumin gene is shown in Fig. 9. This gene has eight exons, the last of which contains 647 untranslated nucleotides, and seven introns of varying lengths (73-75). Chicken erythrocyte core histones were first added to this 12-kbp fragment contained in pBR322, forming irregularly spaced and close-packed nucleosomes. Then the sample was incubated with chicken linker histone H5 in the presence of sodium polyglutamate in a buffer containing 0.15 M NaCl. These conditions have resulted in extensive nucleosome alignment (extending over 3 kbp), with a spacing periodicity of 210 ± 5 bp, in constructions that contained the chromatin-organizing region from pBR327 (Section III,B,4). In pBR322, used here, the chromatin-organizing region is interrupted and functions poorly. Moreover, the 12-kbp insert is sufficiently large that any influence of the plasmid DNA on it should be minimal. The local nucleosome alignment throughout the chicken DNA insert was assessed by Southern blotting, using probes separated by 2-3 kb, as indicated in Fig. 9 (76).

FIG. 9. Map of 12-kbp chicken genomic DNA clone, pOV12, subfragments, and probes. The solid line denotes the chicken DNA insert containing the ovalbumin gene; dashed lines denote plasmid DNA. The HindIII (Hd) sites mark the end of the chicken sequence. EcoRI (RI) and BamHI (Bm) sites present in pBR322 are indicated. The dark boxes indicate exons L and 1-7; letters A-G identify the corresponding introns. Positions of hybridization probes 1, 1a, 2, 3, 4, and subfragments a-d are indicated. [Reproduced from Nucleic Acids Res. 20, 6589-6596 (1992), by permission of Oxford University Press.]

Proceeding across the insert from left to right, probes 1 and 1a indicated that nucleosomes aligned poorly over the first 3-4 kb, corresponding to 5' flanking DNA. In contrast, probe 2 from intron A detected a regular nucleosome array extending for at least 2.5 kbp, with a spacing periodicity of 202 ± 5 bp. Similarly, probe 3 from intron E/exon 5 detected a strong 196 ± 5-bp periodicity, closely resembling that found for native chicken oviduct chromatin (77). Figure 10 shows the results obtained with the 5'-flanking probe 1a and the intron E/exon 5 probe 3, respectively. Neither the naked DNA nor chromatin containing only core histones generated MNase digests resembling native chromatin, although the core histone reconstituted DNA (lane R in B) reveals some bands that appear to be multiples of the 196-bp repeat in the upper half of the gel. Probe 4 from the 5' end of exon 7 detected little nucleosome alignment, possibly because of the close proximity to plasmid sequences. These data indicate that histone H5 induced nucleosome alignment over most of the ovalbumin gene, but not on the flanking DNA. In order to test the hypothesis that the presence of introns facilitates nucleosome alignment, a 1.9-kbp cDNA contained in pBR322 was examined, using a fragment from exon 5 as a hybridization probe. Figure 11 shows that alignment on the cDNA was poor, compared to that detected on subfragment c, using the same probe. Although it cannot be concluded from this experiment that coding DNA is devoid of nucleosome alignment information, this result is consistent with the idea that the presence of introns facilitates nucleosome alignment over the gene. Interestingly, the 5' flanking region of this gene did not undergo nucleosome ordering, even though approximately 4 kbp of natural flanking sequence was present, suggesting that this flanking DNA region might be resistant or refractory to nucleosome ordering. The nucleosome alignment signals present in the ovalbumin gene functionally resemble the fortuitous alignment signal found in plasmid pBR327
(Section III,B,4). However, one difference between the intron E signal and the pBR327 signal was apparent. These two signals nucleate the assembly of chromatin having different repeat lengths. This property was seen clearly in an experiment where subfragment c was cloned into pBR327 in two orientations, and the plasmid DNA between the insert and the pBR327 chromatin-organizing region was probed for each construct; the two ladders were run on a gel side by side. In one orientation, intron E was closer to the probed region than in the other. When intron E was close to the probed region on one side and the plasmid signal was on the other side, a 196-bp repeat was generated on the plasmid DNA, indicating that the intron E signal was stronger. However, in the other orientation, where intron E was further away from the region of DNA probed, a 210-bp repeat was generated on this same sequence due to the influence of the plasmid signal (76). This result provides support for the idea that the nucleosome repeat length in a region of the chromatin can be specified by a chromatin-organizing signal. As was found before, the repeat length was unaffected by changing the nucleosome density on the DNA, or by adding excess histone H5.

FIG. 10. Nucleosome alignment throughout 12 kbp of chicken ovalbumin DNA assembled into chromatin in vitro. Southern blots of MNase digests with probes 1a and 3 are shown. Lanes 1 and 2 show 1- and 2-minute digests, respectively, for chromatin assembled in the presence of linker histone H5. Lanes R show 30-second digests of chromatin assembled in the absence of histone H5. Lanes D show naked DNA digests. Lanes M contain labeled DNA size markers; lengths (base pairs) are indicated. The nucleosome 2-mer and 8-mer bands of the 195-bp unit repeat are indicated for probe 3. [Reproduced from Nucleic Acids Res. 20, 6589-6596 (1992), by permission of Oxford University Press.]

FIG. 11. Nucleosome alignment on ovalbumin cDNA. Cloned 2-kbp DNA fragments were assembled into chromatin in vitro and digested with MNase. The Southern blot was hybridized to a 202-bp sequence taken from exon 5, approximately in the center of the cDNA (lane 2) and the subfragment c (lane 1) inserts (see Fig. 9). Multiples of the 196-bp unit repeat generated from subfragment c are indicated at the left of the photograph. [Reproduced from Nucleic Acids Res. 20, 6589-6596 (1992), by permission of Oxford University Press.]

The pattern of nucleosome alignment on the ovalbumin gene and flanking DNA observed in vitro may closely resemble that of native chromatin in chicken oviduct, the tissue in which this gene is expressed. First, the nucleosome spacing periodicity in hen oviduct is 196 ± 5 bp (1), virtually the same as what was observed in vitro. Second, in chicken oviduct, the 5' and 3' flanking DNAs for several kilobase pairs exhibit poor nucleosome alignment (78, 79), similar to that observed in vitro, even though these regions are not transcribed. In hen oviduct, poor MNase ladders were also observed within the ovalbumin gene, but this is clearly a consequence of ongoing transcription in this region. In the immature chick, perturbation of nucleosome alignment within the gene was directly related to the level of hormone-induced transcription, and a regular 196-bp repeat was observed in hormone-withdrawn chicks (80). It has been argued that the poor nucleosome alignment observed in the chromatin flanking the gene cannot be attributed to an intrinsic property of the DNA because typical ladders were detected in chicken erythrocytes (79). However, the uniform packaging of essentially all of the DNA into a chromatin structure with an unusually long (210 bp) repeat might be a consequence of an unusual chromatin packaging mechanism related to the “heterochromatization of the nucleus” that occurs in avian erythrocytes (81), or to the use of different DNA packaging signals that are dominant in nucleated erythrocytes.
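The repeat lengths compared in this section (196 bp for the intron E signal, 210 bp for the pBR327 signal) differ by only 14 bp per nucleosome, so the two ladders separate progressively at higher band numbers. The following minimal sketch lists the expected n-mer sizes for an idealized ladder in which band n has length n × repeat. The function names and the 100-bp separation threshold are hypothetical choices for illustration, not values taken from the experiments.

```python
def ladder_sizes(repeat_bp, n_bands):
    """Idealized MNase ladder: band n is assumed to be n * repeat_bp long."""
    return [n * repeat_bp for n in range(1, n_bands + 1)]


def first_band_separated_by(repeat_a, repeat_b, min_gap_bp):
    """Smallest band number at which the two ladders differ by at least
    min_gap_bp; returns None if the repeats are identical."""
    per_band = abs(repeat_a - repeat_b)
    if per_band == 0:
        return None
    return -(-min_gap_bp // per_band)  # ceiling division on integers


if __name__ == "__main__":
    intron_e = ladder_sizes(196, 8)   # repeat reported for the intron E signal
    plasmid = ladder_sizes(210, 8)    # repeat reported for the pBR327 signal
    for n, (a, b) in enumerate(zip(intron_e, plasmid), start=1):
        print(f"{n}-mer: {a} bp vs {b} bp (difference {b - a} bp)")
```

Under these assumptions the 8-mer bands fall at 1568 bp and 1680 bp, a difference of 112 bp, which suggests why comparing the upper bands of ladders run side by side helps to distinguish the two repeats.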
These data suggest that signals present in some of the larger introns of some genes strongly modulate DNA packaging. In the case of the chicken ovalbumin gene, signals in the DNA are such that the regions flanking the gene should be expected to be resistant to packaging into a typical nucleosome array. These flanking regions, which contain recognition sequences for protein factors, might therefore be able to compete more effectively for these protein factors than readily packaged DNA regions. This packaging mechanism might be important for discouraging nonspecific or inappropriate factor binding. Alternatively, the transcribed region may possess a special property that enables the nucleosomes on it to undergo alignment spontaneously (see Section V,C).
B. Signals in Chicken β-Globin DNA Influence Chromatin Assembly in Vitro
Gene regulation in eukaryotic cells is accomplished through the interplay between transcription factors and chromatin structure (2), but the mechanisms involved are still poorly understood. A popular idea is that in the absence of an activation event, DNA is passively packaged by a default mechanism into an inactive chromatin structure. On activation, a gene and its flanking DNA acquire a more open (DNase-I-sensitive) chromatin structure (82). It has not been adequately explained, however, how a localized activation event could prevent the presumed default packaging of large stretches of DNA into inactive chromatin or, alternatively, how an active chromatin structure could propagate over a large distance. It has been suggested, for example, that transcription factor binding in the promoter region of a gene might disrupt one or more nucleosomes and thereby break cooperative interactions among histone H1 molecules, leading to the destabilization of a large region of chromatin (83). However, although histone H1 binds cooperatively to naked DNA (31, 32), cooperative binding in chromatin has not been demonstrated, and highly cooperative binding appears unlikely (4). Moreover, active chicken β-globin chromatin from erythroid cells exists in a compacted state, consistent with either a fully folded 30-nm fiber containing a local disruption at the nucleosome-free hypersensitive region (84) or an only slightly disrupted fiber (85). Additionally, the large (>200 kbp) open chromatin domain flanking the locus control region of the human β-globin gene cluster is highly asymmetric, beginning just upstream of the hypersensitive sites and extending far downstream (86). It is as if proteins interacting with the locus control region had prevented the long-range spreading of an inactive structure from a region upstream rather than merely perturbed the chromatin structure around the factor binding sites by breaking cooperative interactions.
These findings suggest the possibility that the histone H1-induced
spreading of nucleosome alignment is a cellular mechanism, and also that when isolated from the influence of neighboring chromatin, signals within a DNA domain might be involved in generating a distinctive chromatin structure, conducive to the regulation of genes in that domain. The chick β-globin gene cluster is a good system for studying the effects of chromatin structure on gene regulation during development. It is contained within a large DNase-I-sensitive active chromatin domain in erythroid cells (87, 88), and nucleosome arrays throughout this domain have distinctive shortened internucleosome spacings (9). It is plausible that the significantly shortened spacings, relative to bulk or inactive chromatin (approximately 180-bp repeat versus 200-bp repeat), could facilitate the formation of a distinctive higher order structure that is more suited for regulation by erythroid factors (89) than the bulk chromatin structure, and also contribute to the generalized DNase I sensitivity observed. Therefore, it is important to understand how the shortened spacings arise specifically in this DNA domain. A variety of mechanisms can be imagined. For example, shortened internucleosome spacings could conceivably arise from histone modifications (90), an altered mode of DNA replication (91), association with special nonhistone proteins, or signals in the DNA. Such mechanisms are difficult to distinguish or to demonstrate clearly by in vivo studies. Recently, using the in vitro system described in Section III,B, it was shown that the tendency to form a nucleosome array with the characteristic 180-bp repeat is encoded in β-globin DNA (92). Figure 12 shows a map of the cloned 6.2-kbp chicken β-globin DNA fragment. The three exons of the βA gene, about 1 kb of upstream sequence, and the first two exons of the ε gene are contained in this fragment (93, 94).
The region between the two genes contains an enhancer that controls transcription of both of these genes and possibly controls the whole domain (95-98). In the presence of linker histone H5, the chromatin assembled in vitro had a 180 ± 5-bp nucleosome spacing periodicity over the whole 6.2-kbp insert (92). Moreover, around the enhancer (probe E) and the intergenic region (probe I), the 180-bp periodicity could be detected even in the absence of linker histone, and nucleosome positioning with respect to the base sequence could be detected. Interestingly, the cloned 2-kbp subfragment extending from the upstream EcoRI site to the HindIII site about two thirds of the way into the second exon (Fig. 12) aligned nucleosomes poorly. This result suggests that in the 6.2-kbp fragment, the intergenic sequences facilitated the regular packaging of the βA gene. Control experiments using a construct with β-globin DNA inserted next to the fortuitous chromatin-organizing region of
FIG. 12. Map of the 6.2-kb chicken β-globin EcoRI fragment. Restriction sites for EcoRI (R), HindIII (H), and BamHI (B) are indicated. The three exons of the βA gene (β) and two exons of the ε gene are represented as heavy lines. Dashed lines represent plasmid sequence. Locations of the three large and three small hybridization probes used are indicated below the map. (Reprinted with permission from Ref. 92.)
pBR327 (which encodes a 210-bp repeat) clearly demonstrated that the 180-bp nucleosome spacing periodicity observed was encoded in β-globin DNA, because a 210-bp nucleosome spacing periodicity could be detected on the adjacent plasmid sequences in the same construct. It might be significant that nucleosome ordering in vitro on the βA gene promoter region requires cis-acting signals present on downstream flanking DNA. Such signals appear to exist in the intergenic region within and just downstream of the enhancer. It is reasonable to suppose that the binding of protein factors to the enhancer could block the spreading of nucleosome alignment toward the βA promoter. The poor nucleosome alignment inherent in the promoter region could make it more accessible to transcription factors. Alternatively, poor nucleosome alignment throughout the region upstream of the enhancer might prevent the formation of a typical inactive higher order structure and thereby facilitate looping out (96) of the chromatin between the enhancer and the βA promoter. The fact that β-globin chromatin appears to be packaged into chromatin with nucleosome repeats typical of the bulk chromatin in nonerythroid cells (78) is not inconsistent with the presence of a 180-bp periodic signal in β-globin DNA. It is plausible that the 180-bp periodicity present in β-globin DNA could be overridden by the spreading influence of adjacent chromatin. Limited spreading influences have been demonstrated in vitro (Section III,B and here), and such effects are likely to be much more effective in vivo. The
results obtained in vitro for chicken β-globin DNA and for the chicken ovalbumin gene (Section V,A) suggest that the nucleosome spacing periodicity obtained for a gene in vitro might correlate with the spacing periodicity in vivo for cell types in which the gene is expressed. For example, the 180-bp β-globin spacing in chick erythroid cells (9), in which the gene is expressed, corresponds to the 180-bp spacing obtained in vitro. Similarly, the 196-bp spacing for the chicken ovalbumin gene in hen oviduct, in which this gene is expressed (78), corresponds to the in vitro value (Section V,A). The chicken ovalbumin gene is not restricted to this value because, in erythrocytes, it has a 207-bp nucleosome spacing periodicity, the bulk chromatin repeat (9). These considerations suggest that a principle for tissue-specific chromatin organization might apply. Thus, DNA domains of active or potentially active genes may be, in essence, isolated from flanking DNA. There is evidence for the existence of specialized sequences that serve as boundaries and presumably function to isolate transcriptional domains in appropriate cell types (19-21). This isolation then should permit relatively weak nucleosome alignment signals inherent in the DNA of the domain to be realized, as they are when the (isolated) DNA is assembled into chromatin in vitro. The domain-specific packaging that would then occur might facilitate transcriptional regulation of the genes in that domain.
C. Rat Growth Hormone Gene Introns Stimulate Nucleosome Alignment in Vitro and in Transgenic Mice and Increase Transcription Efficiency
The use of cDNA constructs or heterologous promoters in transgenic mice often leads to poor gene expression, even for constructs that permit efficient gene expression when transfected into cultured cells (99, 100). The rat growth hormone (rGH) gene fused to the mouse metallothionein (mMT) promoter has been studied in some detail. In this case, the effect of introns on expression is at the level of transcription. The transcriptional efficiency in transgenic mouse liver increased 10- to 100-fold when the natural rGH introns were included (100). Improvement in both the average expression level and the number of mice that gave detectable expression was observed. The general lack of a marked stimulatory effect on transcription when introns were placed at various unnatural locations with respect to the promoter, along with the lack of a stimulatory effect when constructs were transfected into cultured cells, appears to rule out the existence of ordinary enhancers within introns. Moreover, although the first intron (intron A) alone, in its natural location, rescued expression to a level of 50% in transgenic mice, the
presence of both introns A and B curiously led to lower average expression levels than when no introns were present (101). A possible explanation for these findings is that genomic DNA might contain sequence arrangements that facilitate the packaging of some genes into chromatin (14, 76, 92, 100, 101). Thus, unnatural sequence arrangements might lead to less well-defined chromatin structures that may be deleterious either to transcription initiation or elongation. To test this idea, the same set of mMT-rGH constructs (Fig. 13) that were initially used in the transgenic mouse studies (101) were assembled into chromatin in vitro. Additionally, high-copy-number transgenic mice were made for chromatin analysis. The in vitro system, which contains purified histones as the only cellular components (Section II,B), assesses the inherent tendency of DNA sequences to assemble into chromatin. The natural rGH genomic sequence, compared with an intronless version, stimulated the formation of an ordered nucleosome array, both for chromatin assembled in vitro and in transgenic mice. Also, there was a good correspondence between the nature of the chromatin assembled in vitro for constructs that contained particular combinations of introns, and the expression results in transgenic mice (13).
FIG. 13. Descriptions of mMT-rGH gene constructs (a, a', b-f) containing various combinations of introns. The thin lines and filled boxes (rGH gene exons and fused exons) denote rat sequences; exons are numbered 1-5, introns are identified by letters A-D. The thicker lines denote the 1.8-kb EcoRI-XhoI mMT-I promoter fragment, and dashed lines denote plasmid sequences. Construct a' is the same as a except that sequences within the brackets are deleted. Relevant restriction sites are indicated; in a, the distance between the XhoI and BamHI sites is 5.0 kb. The arrows below construct a correspond to MNase cutting sites between positioned nucleosomes. Hybridization probes I-VI are indicated in a and b. (Reprinted with permission from Ref. 13.)
FIG. 14. Effect of rGH gene introns on nucleosome alignment for chromatin assembled in vitro. (A) Comparison of the constructs (Fig. 13) lacking all introns (b) or containing all introns (a). Lanes labeled M contained HaeIII + AccI φX174 RF fragments as size standards; fragment sizes are indicated. Lanes labeled D show naked DNA digested with 2.5 units of MNase for 30 seconds; lanes labeled C show chromatin digested with 5.0 units of MNase for 1 minute. Hybridization probe I was used. (B) Densitometer scans of the autoradiogram. Lanes C for constructs a and b are shown. (Reprinted with permission from Ref. 13.)

Figure 14A shows that when introns are present, a highly regular 195 ± 4-bp ladder was detected using probe I (see Fig. 13 map). The seventh band of the ladder runs slightly slower than the 1353-bp marker, consistent with 7 × 195 bp = 1365 bp, and the twelfth band runs slightly faster than the 2352-bp marker, consistent with 12 × 195 bp = 2340 bp. A densitometer scan resolved 13 peaks (Fig. 14B, tracing a). Significantly, several strong bands arising from preferred MNase cutting sites on the naked DNA control, in the region of the gel above 1078 bp, are well protected in the chromatin sample, providing strong evidence that the native-like chromatin ladder arose from a nucleosome array that was highly ordered, and not simply from the preferred cleavage sites in DNA. Virtually the same ladder was detected using probe III, the rGH cDNA (not shown). Interestingly, appreciable nucleosome ordering did not occur on the approximately 3 kb of rat sequence flanking the rGH gene on the 3' side, as assessed using probes IV-VI (data not shown). When no introns were present (Fig. 14A), a less regular nucleosome ladder was detected. The densitometer scan (Fig. 14B, tracing b) shows that, although the first four peaks are very similar to those generated with the natural rGH gene, peaks after the fourth are in spurious positions or are not well resolved. Overall, the peaks after the fourth are not at multiples of 195 bp. Also, it can be seen from the gel photograph that, in this case, many of the bands that appear in the chromatin sample (lane C) are also present in the naked DNA control (lane D). This situation is what would be expected for a low degree of nucleosome order, where the preferred cutting sites in DNA dominate the pattern. Mice with high transgene copy numbers were necessary for the chromatin analysis so that the major hybridization signal would derive from the transgenes rather than endogenous genes. Thus, the DNA fragments were ligated to generate head-to-tail arrays before microinjection. Table II provides a summary of the mice, the gene copy number, and the amount of rGH mRNA in liver. Transcription was not induced with zinc because the very active transcription expected for the natural rGH gene might interfere with analysis of the chromatin structure. The six transgenic mice with the natural MT-rGH gene averaged 354 mRNA molecules/transgene/cell compared to 24 for the seven transgenic mice with the intronless version. Sample 6 gave an mRNA level 1/30th of the mean, probably due to the extremely high copy number. Such a high copy number might result in transcription factors being limiting. If this sample is not included, the average rGH mRNA level value
TABLE II
EXPRESSION OF MT-rGH TRANSGENE ± INTRONS IN MOUSE LIVER

Sample no.   Mouse no./sex   Transgenes(a)   rGH mRNA(b)   rGH mRNA(c)
                             (no./cell)      (mol/cell)    (mol/cell/gene)

+Introns
  1          254-3 F          17             1305           76.9
  2          256-1 F           6             1440          240
  3          259-4 M           1             1030         1030
  4          259-5 M          13             8100          623
  5          261-5 F           8             1140          142
  6          262-1 M         163             1970           12

-Introns
  7          266-1 M          23              120            5.2
  8          266-9 F          18             1170           65
  9          271-5 F          25              975           39
 10          273-7 F          43             1110           25.7
 11          274-2 F           6                0            0
 12          274-4 M          49              621           12.6
 13          274-5 M          10              227           22.7

(a) Transgene copy number was measured by dot hybridization by comparing the relative intensity of duplicate dots hybridized with MT promoter probe to a reference gene (HOX locus). The ratio for normal liver was 1.4, which was assumed to equal two genes/cell.
(b) rGH mRNA was determined by solution hybridization using oligo 150 (18). The amount of mRNA per cell was determined using M13 standards and assuming that 1 µg TNA = 0.15 µg DNA = 2.3 × 10^4 cells.
(c) Average value for the six +Introns mice is 354 mol/cell/gene; average value, excluding mouse 6, is 422 mol/cell/gene. Average value for the 7 -Introns mice is 24 mol/cell/gene.
for the natural gene becomes 422 molecules/cell/gene, about 18 times that obtained for the intronless version. To perform the chromatin analysis, nuclei prepared from adult mouse livers were digested with MNase and the DNA was examined by Southern analysis using the same mMT fragment probe as in Fig. 14. To minimize the contribution to the hybridization signal from the endogenous mMT gene, and to better detect the transgenes, only mice that gave significantly stronger hybridization signals for their MNase digests (using probe I), compared to a nontransgenic control, were selected. In general, there was a good correspondence between the measured transgene copy number (Table II) and the intensity of the MNase hybridization signal. As an important control for comparing the chromatin structures of the natural and intronless transgenes, the total chromatin was examined by ethidium bromide staining for the different mouse livers to demonstrate that no perturbations of the bulk chromatin had occurred during sample preparation. MNase ladders of high quality were obtained in all cases; nucleosome repeats were 195 ± 5 bp. In Fig. 15, scans of autoradiograms from Southern blots corresponding to
FIG. 15. Effect of rGH gene introns on nucleosome alignment in transgenic mouse liver chromatin. Assessment of transgene chromatin structure by Southern hybridization. Hybridization probe I was used. Densitometer scans of autoradiograms are shown. Upper scans: mouse numbers 271-5 (b) and 262-1 (a) are 3-minute digests, 273-7 (b) is a 2.5-minute digest. Lower scans: 2-minute digests. (Reprinted with permission from Ref. 13.)
samples containing natural (a) or intronless (b) transgenes are compared. Samples compared were run on the same gel, and were blotted and hybridized to the same radiolabeled probe (probe I). This procedure largely eliminated variations arising from DNA transfer, probe labeling, and membrane washing. It is clear from Fig. 15 (top) that the periodic peaks obtained are more intense for the intron-containing transgene (trace a) than for the
two intronless transgenes (tracings labeled b). Digests from two other transgenic mice are also compared (Fig. 15, bottom). Here, about 13 well-resolved periodic peaks were obtained for the intron-containing sample, mouse 254-3 (a), whereas fewer and less intense peaks were resolved for the intronless sample, mouse 274-4 (b). These results are similar to those obtained in the in vitro experiment (Fig. 14). The skewing of the DNA fragment distribution toward higher molecular weights in nuclear digests compared to the in vitro-assembled chromatin results from the much higher initial molecular weight of the chromatin in nuclei compared with the plasmid construct. Although introns did not serve to phase nucleosomes with respect to promoter sequences, their presence facilitated the formation of regularly spaced nucleosomes over the rGH gene and promoter, both in chromatin from transgenic mice and in chromatin assembled in vitro. In the absence of introns the nucleosome arrangement over the promoter and rGH gene was irregular and haphazard. It is reasonable to suppose that the negative influence on transcription might arise either from inhibition of transcription initiation or transcription elongation by the presence of irregularly arranged nucleosomes. For example, closely packed nucleosomes in the promoter region might be more difficult for transcription factors to displace. Alternatively, irregular nucleosome arrangements might lead to the formation of aberrant higher order structures that occlude transcription factors or interfere with the progression of RNA polymerase. The effect of introns on transcription was clearly evident in that the average level of rGH mRNA per cell per transgene from those mice with the natural rGH gene was about 15-fold higher than those with the intronless construct. More detailed in vitro analysis revealed an array of 5 or 6 strongly positioned nucleosomes over the 3' end of the natural rGH gene, including exons 3, 4, and 5 (Fig. 13). Generation of this positioned array depended on the presence of linker histone in the assembly reaction. The 1-kb region where this positioned array forms may constitute part of the nucleosome alignment signal responsible for the apparent spreading of the 195-bp repeat throughout the rGH gene and the mMT promoter. Because this kilobase of DNA includes exons 3 to 5 and introns C and D, it is easy to understand why removing the introns from this region could impair proper nucleosomal organization. Although the idea of a chromatin-organizing region (Section III,B,4) located at the 3' end of the gene is attractive, other sequences can also influence the nucleosome alignment. For example, if we start with the intronless construct b that is not expressed well and insert only intron A to yield construct c, nucleosome alignment in the promoter region improves (data
not shown) and expression increases to about half that of the natural gene (101). Over the course of these experiments, we observed that the 1.8-kb mMT sequence alone has a weak tendency to align nucleosomes, and also that a positioned nucleosome formed over intron A (data not included). It seems likely that the presence of intron A alone fortuitously strengthened nucleosome alignment over the mMT sequence by adding a positioned nucleosome in phase with the mMT signal. Addition of introns A and B (construct d) inhibited expression and nucleosome alignment, compared with intron A alone. This may be because intron B (718 bp), which contains a 195-bp tandem repeat sequence (102), has a tendency to position nucleosomes in phase with the nucleosomes that form downstream on the natural rGH gene (data not shown), but out of phase with the mMT signal; hence, the promoter region receives two weak conflicting nucleosome alignment signals. Thus, it is not surprising that complex effects can occur when unnatural sequences are juxtaposed.
Nucleosome alignment in vitro appears to be directed over the rGH gene (with spreading over the promoter region), but not over the 3' flanking sequences. Nucleosome alignment in vitro exclusively on the transcribed region was also observed for the chicken ovalbumin gene (Section V,A). In light of these observations, it is plausible that strong nucleosome alignment signals might be present in transcribed regions. There is now strong evidence that the transcription process causes disruption of nucleosome arrays (103, 104). Thus, it makes sense that a replication-independent mechanism that does not rely on chromatin assembly factors evolved to realign nucleosome arrays on transcribed regions of DNA after the disruption incurred by the passage of RNA polymerase. Irregular arrays might otherwise condense into tightly compacted irregular higher order structures that would interfere with subsequent rounds of transcription, similar to what appears to occur in mice for intronless constructs, where the nucleosome alignment signals present in the genomic DNA are perturbed by intron removal.
To summarize this section, it has been shown that vertebrate genomic DNA contains nucleosome-aligning signals. Such signals are not found in Escherichia coli DNA (K. Liu and A. Stein, unpublished observations). The overall density and arrangement of these signals in vertebrate genomic DNA are not yet known. Analysis of several large continuous regions of genomic DNA and cellular chromatin should provide this information. Preliminary data suggest a “mosaic” model of chromatin organization. Well-ordered regions of cellular chromatin (with varying periodicities), generally containing no more than 10 nucleosomes, appear to alternate with less-ordered regions. This arrangement appears to result from nucleosome-aligning signals present in genomic DNA.
VI. Chromatin Assembly on Plasmids in Transfected Cells
Although transient transfection assays have been widely used to study gene regulation, little attention has been paid to the chromatin structure of the transfected DNA template. Because chromatin structure can influence gene expression, it seems important to know how the transfected DNA is packaged on its entry into the cell nucleus. It has been reported, for example, that calcium-phosphate-transfected DNA is sometimes assembled into nonnucleosomal material of an unknown nature (105, 106). It is known that this method leads to formation of large concatamers (107), which apparently facilitate incorporation into genomic DNA, generally required for stable transfection (107-109). On the other hand, DEAE-dextran-transfected DNA remains episomal (110) and has been reported to be efficiently assembled into “typical” chromatin structures (111). Recent studies using the DEAE-dextran method have provided some additional insights.
A. DNA Sequence Affects Nucleosome Ordering on Replicating Plasmids in Transfected COS-1 Cells and in Vitro
Plasmids that contain the SV40 replication origin replicate (112) and are assembled into minichromosomes when transfected into COS-1 cells using the DEAE-dextran technique (111). Chromatin assembly on replicating plasmid DNA in the nuclei of these monkey kidney cells, maintained in culture, should resemble that of the cellular DNA to some extent. Moreover, histone H1 is abundant in the nucleus of these cells and should be expected to interact with and exert its influence on the nucleosomes of the plasmid chromatin. However, in the one transfection study where MNase ladders were reported (111), they were rather poor, suggesting that the nucleosome arrangement was not very regular. This result is in apparent conflict with results obtained with SV40 minichromosomes (113, 114), which exhibit extended MNase ladders.
Because of the apparent differences observed in the regularity of nucleosome spacing in these studies, a number of constructs containing the SV40 replication origin, and in some cases additional SV40 sequences, were transfected into COS-1 cells and the chromatin structures of the transfected DNA were examined by MNase digestion. It was found that constructs containing the SV40 early-region (approximately base-pair numbers 2600-5000 on the 5243-bp circular map) formed nucleosome arrays significantly more ordered than constructs lacking this region. Moreover, this region of SV40 DNA assembled into a highly ordered nucleosome array in vitro, with the same
200-bp repeat observed in transfected cells (14). These results suggest that the SV40 early-region contains nucleosome alignment signals that are largely responsible for forming the well-ordered nucleosome arrays found on SV40 minichromosomes assembled in cell nuclei.
B. Nucleosome Ladders Having Anomalous DNA Lengths Are Generated from Chromatin Assembled on Nonreplicating Plasmids in Transfected Cells
Unexpected and curious results were obtained when nonreplicating plasmids were transfected into a variety of cell types. Highly ordered nucleosome arrays were detected, irrespective of the construct used, but the nucleosome ladders generated were anomalous (Fig. 16). Instead of the typical 180- to 190-bp multiples generated from bulk cellular chromatin (B), ladders of DNA fragments with lengths of approximately 300, 500, 700, 900 bp, etc. were generated (A). Analysis of such ladders (C) shows that mononucleosome bands were absent and all other oligomer lengths were shortened by about 116 bp (115). These anomalous ladders bear a marked similarity to what has been observed in some studies of active chromatin (116, 117). It has been suggested (116) that active nucleosomes have an altered protein composition and consequently are much more susceptible to exonucleolytic trimming by MNase, thereby leading to oligomers shortened by about 50 bp from each end.
Although it is tempting to attribute the differences between replicating and nonreplicating plasmids in transfected cells to the replication process directly, this may not be the best explanation. In this study, it was also demonstrated that nonreplicating plasmid chromatin in transfected cell nuclei is largely insoluble after fragmentation by MNase, in contrast with bulk cellular chromatin or with replicating plasmid chromatin (115). Such insolubility suggests that it is associated with nuclear structures. Association with nuclear structure is also a characteristic of active chromatin (117-120). An interesting hypothesis (115) is that most of the transfected DNA that enters the nucleus through nuclear pores remains associated with nuclear structures. Assembly into chromatin under such conditions then leads to altered chromatin structures with properties similar to those of active chromatin. For replicating plasmid chromatin, the DNA molecules produced by replication are not associated with nuclear structures and are present in large excess. This idea is consistent with the hypothesis of Blobel (121), which proposes that chromatin regions to be activated associate with the nuclear pore complex and associated nuclear structures in a cell-type-specific fashion.
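The arithmetic behind the anomalous ladder can be made explicit with a small sketch. It is offered as an illustration only, not as the analysis actually performed in (115): it fits a straight line, length = repeat × n + offset, to band lengths plotted against oligomer number. The function name `fit_ladder` is hypothetical, and assigning the rounded band sizes quoted above (300, 500, 700, 900 bp) to oligomers 2 through 5 is an assumption based on the reported absence of a mononucleosome band.

```python
def fit_ladder(oligomer_numbers, lengths_bp):
    """Least-squares line length = repeat * n + offset through ladder bands."""
    count = len(oligomer_numbers)
    mean_n = sum(oligomer_numbers) / count
    mean_len = sum(lengths_bp) / count
    sxx = sum((n - mean_n) ** 2 for n in oligomer_numbers)
    sxy = sum((n - mean_n) * (length - mean_len)
              for n, length in zip(oligomer_numbers, lengths_bp))
    repeat = sxy / sxx                  # slope: nucleosome repeat length in bp
    offset = mean_len - repeat * mean_n  # intercept: net trimming in bp
    return repeat, offset


# Rounded band sizes from the text; the oligomer assignment (2 to 5) is an
# assumption made for illustration, since no mononucleosome band is seen.
anomalous_n = [2, 3, 4, 5]
anomalous_bp = [300, 500, 700, 900]
repeat, offset = fit_ladder(anomalous_n, anomalous_bp)
print(f"repeat = {repeat:.0f} bp, offset = {offset:.0f} bp")
# prints: repeat = 200 bp, offset = -100 bp
```

With these rounded values the slope corresponds to a repeat of about 200 bp and the negative intercept to a loss of about 100 bp per oligomer, which is of the same order as the roughly 116-bp shortening reported from the measured band positions. An untrimmed ladder would instead give an intercept near zero.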
[Fig. 16 image not reproduced. Panels A and B: lanes M, 1, and 2. Panel C: axis label “oligomer number.”]
FIG. 16. Nucleosome ladders generated by MNase digestion of nuclei from transfected mouse Ltk- cells. The cells were transfected with pBR327 containing a 1.9-kbp chicken ovalbumin gene PvuII fragment. Nuclei were digested for 2 minutes (lane 1) or 4 minutes (lane 2). Lane M contained 32P-labeled (plus unlabeled) φX174 RF HaeIII + AccI fragments as size markers; lengths in base pairs are indicated. (A) Southern blot specifically detecting transfected DNA; pBR327 DNA was used as the hybridization probe. A mononucleosome band is not present, and the other bands of the ladder appear to be about 100 bp shorter than expected. The arrowhead identifies heterogeneous length DNA
VII. Perspective
It has been demonstrated theoretically that by simply specifying that a nucleosome (generally containing histone H1) should occupy 166 bp and that the average nucleosome repeat should be 196 bp, nucleosomes with entirely random positions and linker lengths would generate MNase ladders essentially indistinguishable from those of prototype rat liver chromatin (16). It needs to be kept in mind that MNase ladders alone contain only a superposition of average excised nucleosome oligomer DNA lengths, which provides a limited amount of information. Also, this “statistical positioning” model is not in good agreement with experiment when applied to cellular chromatin with shorter or longer average nucleosome repeats than that found in rat liver (Fig. 4). Nevertheless, this simple model might satisfactorily describe the nucleosome arrays formed in Xenopus oocytes or Drosophila embryos and in the in vitro systems prepared from extracts of these cells. Random nucleosome formation significantly limits the amount of information that can be contained in chromatin.
However, there is reason to believe that chromatin can be assembled in more than one way, with the net result being the formation of apparently ordered nucleosome arrays. The results obtained with the histone H1-dependent in vitro chromatin assembly system described here indicate that the DNA base sequence can influence chromatin assembly. Formation of highly ordered nucleosome arrays requires the presence of particular DNA sequences that appear to have a tendency to position some nucleosomes about the right distance apart. When reconstituted chromatin was incubated with histone H1 and polyglutamate, nucleosome positioning became stronger and ordered arrays of nucleosomes with physiological spacings formed. Interestingly, the nucleosome spacing periodicity appears to be encoded in the DNA, and in some cases, a nucleosome array with a well-defined periodicity was found to spread from a nucleating region of DNA onto adjacent DNA sequences that by themselves did not form ordered nucleosomes. Effects of certain DNA sequences on chromatin assembly observed in vitro have also been demonstrated on replicating plasmids in transiently transfected cells (14) and in transgenic mice (13).
These observations suggest models in which different regions of DNA along the chromosome possess different higher order chromatin structures, rather than a uniform “30-nm fiber,” consistent with the recent morphological studies suggesting that chromatin in situ may be fairly extended and much more irregular than has generally been thought (Section IV). In a recently proposed structural model consistent with this new idea, small changes in nucleosome linker length or in the degree of linker length variation led to significant differences in DNA folding (59). Thus, the signals present in vertebrate DNA that determine nucleosome array formation should also determine the first level of chromatin higher order structure on those sequences. Additionally, these properties suggest that it might be possible for different higher level chromatin structures to form on the same DNA sequences in different cell types, depending on which flanking sequences are included, or the length of DNA that is included, in a cell-type-specific domain. For example, different groupings of adjacent chromatin regions that possess distinguishable DNA folding patterns would be expected to aggregate into superstructures differently. Through this mechanism, a gene could be made more accessible in the cell type in which it is to be expressed.
Furthermore, the existence of complex DNA-directed chromatin-folding patterns could explain how stable interactions between promoters and distal enhancers, postulated to occur through DNA looping (2), might be possible. Appropriately formed loops or regional interactions could be a natural part of the higher order chromatin structure formed by those sequences.
It is not currently clear how solenoid-like structures could permit long-range DNA looping.
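The statistical positioning argument that opens this section can be illustrated numerically. The sketch below is not the calculation of reference (16); it is a minimal simulation under stated assumptions. It takes the 166-bp nucleosome and the 196-bp average repeat from the text, assumes that linker lengths are independent and exponentially distributed with a mean of 30 bp (the gap distribution expected for randomly placed, non-overlapping rods on a line), and assumes that MNase trims each oligomer back to the edges of its outer nucleosomes. The function names are hypothetical.

```python
import random
import statistics

CORE_BP = 166      # DNA occupied by one H1-containing nucleosome (from the text)
REPEAT_BP = 196    # average nucleosome repeat (from the text)
MEAN_LINKER_BP = REPEAT_BP - CORE_BP  # 30 bp average gap between nucleosomes


def oligomer_lengths(n, samples=20000, rng=None):
    """Simulated DNA lengths of n-nucleosome oligomers with random linkers.

    Assumptions: linker lengths are independent and exponentially
    distributed, and each oligomer is trimmed to its outer nucleosome edges.
    """
    rng = rng or random.Random(0)
    lengths = []
    for _ in range(samples):
        linkers = sum(rng.expovariate(1.0 / MEAN_LINKER_BP)
                      for _ in range(n - 1))
        lengths.append(n * CORE_BP + linkers)
    return lengths


def ladder_summary(max_n=6, samples=20000, seed=0):
    """Return (n, mean length, standard deviation) for oligomers 1..max_n."""
    rng = random.Random(seed)
    rows = []
    for n in range(1, max_n + 1):
        lengths = oligomer_lengths(n, samples, rng)
        rows.append((n, statistics.mean(lengths), statistics.pstdev(lengths)))
    return rows


if __name__ == "__main__":
    for n, mean_bp, sd_bp in ladder_summary():
        print(f"{n}-mer: mean {mean_bp:6.1f} bp, spread {sd_bp:5.1f} bp")
```

Under these assumptions the mean oligomer lengths step up by about 196 bp per nucleosome even though no nucleosome has a defined position, while the spread of each band grows with oligomer number. This is the sense in which a ladder records only average oligomer lengths and cannot by itself distinguish ordered from random arrangements.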
REFERENCES
1. K. E. Van Holde, “Chromatin.” Springer-Verlag, New York, 1989.
2. G. Felsenfeld, Nature 355, 219 (1992).
3. R. D. Kornberg and Y. Lorch, Curr. Opin. Cell Biol. 7, 371 (1995).
4. J. Allen, D. Z. Staynov and H. Gould, PNAS 77, 885 (1980).
5. F. Thoma and T. Koller, JMB 149, 709 (1981).
6. R. D. Smith, R. L. Seale and J. Yu, PNAS 80, 5505 (1983).
7. D. E. Gottschling, T. E. Palen and T. R. Cech, NARes 11, 2093 (1983).
8. I. R. Brown and J. G. Sutcliffe, NARes 15, 3563 (1987).
9. B. Villeponteau, J. Brawley and H. Martinson, Bchem 31, 1554 (1992).
10. J. O. Thomas and R. J. Thompson, Cell 10, 633 (1977).
11. S. A. Nedospasov and G. P. Georgiev, BBRC 92, 532 (1980).
12. C. Wu, Nature 286, 854 (1980).
13. K. Liu, E. P. Sandgren, R. D. Palmiter and A. Stein, PNAS 92, 7724 (1995).
14. S. Jeong and A. Stein, JBC 269, 2197 (1994).
15. A. Prunell and R. D. Kornberg, CSHSQB 42, 103 (1977).
16. R. D. Kornberg and L. Stryer, NARes 16, 6677 (1988).
17. S. Henikoff, Trends Genet. 6, 422 (1990).
18. L. L. Wallrath and S. C. R. Elgin, Genes Dev. 9, 1263 (1995).
19. J. C. Eisenberg and S. C. R. Elgin, Trends Genet. 7, 335 (1991).
20. U. K. Laemmli, E. Kas, L. Poljak and Y. Adachi, Curr. Opin. Genet. Dev. 2, 275 (1992).
21. J. H. Chung, M. Whiteley and G. Felsenfeld, Cell 74, 505 (1993).
22. R. A. Laskey, A. D. Mills and N. R. Morris, Cell 10, 237 (1977).
23. A. Rodriguez-Campos, A. Shimamura and A. Worcel, JMB 209, 135 (1989).
24. P. B. Becker and C. Wu, MCBiol 12, 2241 (1992).
25. R. T. Kamakaka, M. Bulger and J. T. Kadonaga, Genes Dev. 7, 1779 (1993).
26. T. Tsukiyama, P. B. Becker and C. Wu, Nature 367, 525 (1994).
27. M. J. Pazin, R. T. Kamakaka and J. T. Kadonaga, Science 266, 2007 (1994).
28. V. E. Foe and B. M. Alberts, J. Cell Sci. 61, 31 (1983).
29. S. M. Dilworth and C. Dingwall, BioEssays 9, 44 (1988).
30. P. Oudet, M. Gross-Bellard and P. Chambon, Cell 4, 281 (1975).
31. M. Renz, PNAS 72, 733 (1975).
32. D. J. Clark and J. O. Thomas, JMB 187, 569 (1986).
33. H. H. Ohlenbusch, B. M. Olivers, D. Tuan and N. Davidson, JMB 25, 299 (1967).
34. A. Stein, J. P. Whitlock and M. Bina, PNAS 76, 5000 (1979).
35. A. Stein and P. Kunzler, Nature 302, 549 (1983).
36. J. J. Hayes and A. P. Wolffe, PNAS 90, 6415 (1993).
37. R. T. Simpson, Bchem 17, 5524 (1978).
38. F. Thoma, T. Koller and A. Klug, J. Cell Biol. 83, 403 (1979).
39. P. P. Nelson, S. C. Albright, S. C. Wisman and W. T. Garrard, JBC 254, 11751 (1979).
40. J. Allen, P. G. Hartman, C. Crane-Robinson and F. X. Aviles, Nature 288, 675 (1980).
41. D. Krylov, S. Leuba, K. van Holde and J. Zlatanova, PNAS 90, 5052 (1993).
42. J. J. Hayes, D. Pruss and A. P. Wolffe, PNAS 91, 7817 (1994).
43. A. Stein and M. Bina, JMB 178, 341 (1984).
44. A. Stein, Methods Enzymol. 170, 585 (1979).
45. J. Sponar and Z. Sormova, EJB 29, 99 (1972).
46. G. Meersseman, S. Pennings and E. M. Bradbury, EMBO J. 11, 2951 (1992).
47. A. Stein and M. Mitchell, JMB 203, 1029 (1988).
48. S. Jeong, J. D. Lauderdale and A. Stein, JMB 222, 1131 (1991).
49. J. D. Lauderdale and A. Stein, Bchem 32, 489 (1993).
50. J. D. McGhee, J. M. Nickol, G. Felsenfeld and D. C. Rao, Cell 33, 831 (1983).
51. C. L. Woodcock, L.-L. Y. Frado and J. B. Rattner, J. Cell Biol. 99, 42 (1984).
52. S. P. Williams, B. D. Athey, L. J. Muglia, R. S. Schappe, A. H. Gough and J. P. Langmore, Biophys. J. 49, 233 (1986).
53. J. Bordas, L. Perez-Grau, M. H. J. Koch, M. C. Vega and C. Nave, Eur. Biophys. J. 13, 175 (1986).
54. V. Markarov, S. Dimitrov, V. Smirnov and I. Pashev, FEBS Lett. 181, 357 (1985).
55. H. Ris and D. F. Kubai, ARGen 4, 263 (1970).
56. C. L. Woodcock, in “Electron Tomography” (J. Frank, ed.), p. 281. Plenum, New York, 1992.
57. J. Zlatanova, S. H. Leuba, G. Yang, C. Bustamante and K. van Holde, PNAS 91, 5277 (1994).
58. R. A. Horwitz, D. A. Agard, J. W. Sedat and C. L. Woodcock, J. Cell Biol. 125, 1 (1994).
59. C. L. Woodcock, S. A. Grigoryev, R. A. Horwitz and N. Whitaker, PNAS 90, 9021 (1993).
60. E. N. Trifonov and J. L. Sussman, PNAS 77, 3816 (1980).
61. S. C. Satchwell, H. R. Drew and A. A. Travers, JMB 191, 659 (1986).
62. S. C. Satchwell and A. A. Travers, EMBO J. 8, 229 (1989).
63. H. L. Lowman and M. Bina, Biopolymers 30, 861 (1990).
64. S. Muyldermans and A. A. Travers, JMB 235, 855 (1994).
65. D. J. Fitzgerald, G. L. Dryden, E. C. Bronson, J. S. Williams and J. N. Anderson, JBC 269, 21303 (1994).
66. A. V. Sivolob and S. N. Khrapunov, JMB 247, 918 (1995).
67. M. A. Keene and S. C. R. Elgin, Cell 36, 121 (1984).
68. R. L. P. Adams, T. Davis, A. Rinaldi and R. Eason, EJB 165, 107 (1987).
69. J. S. Beckmann and E. N. Trifonov, PNAS 88, 2380 (1991).
70. I. L. Cartwright and S. C. R. Elgin, in “Architecture of Eukaryotic Genes” (G. Kahl, ed.), p. 283. VCH, Weinheim, 1988.
71. S. C. R. Elgin, JBC 263, 19259 (1988).
72. R. D. Kornberg and Y. Lorch, Cell 67, 833 (1991).
73. F. Gannon, K. O. O’Hare, F. Perrin, J. P. Le Pennec, C. Benoist, M. Cochet, R. Breathnach, A. Royal, A. Garapin, B. Cami and P. Chambon, Nature 278, 428 (1979).
74. A. Dugaiczyk, S. C. L. Woo, D. A. Colbert, E. C. Lai, M. L. Mace, Jr. and B. W. O’Malley, PNAS 76, 2253 (1979).
75. R. Heilig, F. Perrin, F. Gannon, J. L. Mandel and P. Chambon, Cell 20, 625 (1980).
76. J. D. Lauderdale and A. Stein, NARes 20, 6589 (1992).
77. J. L. Compton, M. Bellard and P. Chambon, PNAS 73, 4382 (1976).
78. M. Bellard, G. Dretzen, F. Bellard, P. Oudet and P. Chambon, EMBO J. 1, 223 (1982).
79. M. Bellard, G. Dretzen, F. Bellard, J. S. Kaye, S. Prat-Kaye and P. Chambon, EMBO J. 5, 567 (1986).
80. K. S. Bloom and J. N. Anderson, JBC 257, 13018 (1982).
81. N. R. Ringertz and L. Boulund, in “The Cell Nucleus” (H. Busch, ed.), p. 417. Academic Press, New York, 1974.
82. S. Weisbrod, Nature 297, 289 (1982).
83. H. Weintraub, Cell 42, 705 (1985).
84. A. Caplan, T. Kimura, H. Gould and J. Allen, JMB 193, 57 (1987).
85. E. A. Fisher and G. Felsenfeld, Bchem 25, 8010 (1986).
86. W. C. Forrester, E. Epner, M. C. Driscoll, T. Enver, M. Brice, T. Papayannopoulou and M. Groudine, Genes Dev. 4, 1637 (1990).
87. J. Stalder, M. Groudine, J. B. Dodgson, J. D. Engel and H. Weintraub, Cell 19, 973 (1980).
88. J. Stalder, A. Larsen, J. D. Engel, M. Dolen, M. Groudine and H. Weintraub, Cell 20, 451 (1980).
89. T. Evans, G. Felsenfeld and M. Reitman, Annu. Rev. Cell Biol. 6, 95 (1990).
90. J. Ausio, J. Cell Sci. 102, 1 (1992).
91. S. Kumar and M. Leffak, Bchem 25, 2055 (1986).
92. K. Liu, J. D. Lauderdale and A. Stein, MCBiol 13, 7596 (1993).
93. J. B. Dodgson, J. Strommer and J. D. Engel, Cell 17, 879 (1979).
94. G. D. Ginder, W. I. Wood and G. Felsenfeld, JBC 254, 8099 (1979).
95. O.-R. Choi and J. D. Engel, Nature 323, 731 (1986).
96. O.-R. Choi and J. D. Engel, Cell 55, 17 (1988).
97. J. E. Hesse, J. M. Nichol, M. R. Lieber and G. Felsenfeld, PNAS 83, 4312 (1986).
98. M. Reitman, E. Lee, H. Westphal and G. Felsenfeld, Nature 348, 749 (1990).
99. R. D. Palmiter and R. L. Brinster, ARGen 20, 465 (1986).
100. R. L. Brinster, J. M. Allen, R. R. Behringer, R. E. Galinas and R. E. Palmiter, PNAS 85, 836 (1988).
101. R. D. Palmiter, E. P. Sandgren, M. R. Avarbock, D. D. Allen and R. L. Brinster, PNAS 88, 478 (1991).
102. A. Barta, R. I. Richards, J. D. Baxter and J. Shine, PNAS 78, 4867 (1981).
103. G. Cavalli and F. Thoma, EMBO J. 12, 4603 (1993).
104. V. M. Studitsky, D. J. Clark and G. Felsenfeld, Cell 76, 371 (1994).
105. R. Reeves, C. M. Gorman and B. Howard, NARes 13, 3599 (1985).
106. T. K. Archer, P. Lefevre, R. G. Wolford and G. L. Hager, Science 255, 1573 (1992).
107. M. Perucho, D. Hanahan and M. Wigler, Cell 22, 309 (1980).
108. D. M. Robins, S. Ripley, A. S. Henderson and R. Axel, Cell 23, 29 (1981).
109. J. Sambrook, E. F. Fritsch and T. Maniatis, “Molecular Cloning: A Laboratory Manual,” 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1989.
110. H. Weintraub, P. F. Cheng and K. Conrad, Cell 46, 115 (1986).
111. S. Cereghini and M. Yaniv, EMBO J. 3, 1243 (1984).
112. Y. Gluzman, Cell 23, 175 (1981).
113. V. Blasquez, A. Stein, C. Ambrose and M. Bina, JMB 191, 97 (1986).
114. M. Coca-Prados, H. Y.-H. Yu and M.-T. Hsu, J. Virol. 44, 603 (1982).
115. S. Jeong and A. Stein, NARes 22, 370 (1994).
116. Y. L. Sun, Y. Z. Xu, M. Bellard and P. Chambon, EMBO J. 5, 293 (1986).
117. S. M. Rose and W. T. Garrard, JBC 259, 8534 (1984).
118. A. H. Davis, T. L. Reudelhuber and W. T. Garrard, JMB 167, 133 (1983).
119. D. A. Jackson, BioEssays 13, 1 (1991).
120. M. Andreeva, D. Markova, P. Loidl and L. Djondjurov, EJB 207, 887 (1992).
121. G. Blobel, PNAS 82, 8527 (1985).