Progress in Nucleic Acid Research and Molecular Biology, Volume 49


Author: Waldo E. Cohn | Kivie Moldave



PROGRESS IN
Nucleic Acid Research and Molecular Biology

edited by

WALDO E. COHN
Biology Division
Oak Ridge National Laboratory
Oak Ridge, Tennessee

KIVIE MOLDAVE
Department of Molecular Biology and Biochemistry
University of California, Irvine
Irvine, California

Volume 49

ACADEMIC PRESS
San Diego New York Boston London Sydney Tokyo Toronto

This book is printed on acid-free paper.


Copyright © 1994 by ACADEMIC PRESS, INC. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Academic Press, Inc.
A Division of Harcourt Brace & Company
525 B Street, Suite 1900, San Diego, California 92101-4495

United Kingdom Edition published by
Academic Press Limited
24-28 Oval Road, London NW1 7DX

International Standard Serial Number: 0079-6603
International Standard Book Number: 0-12-540049-7

PRINTED IN THE UNITED STATES OF AMERICA


Contents

ABBREVIATIONS AND SYMBOLS
SOME ARTICLES PLANNED FOR FUTURE VOLUMES

The Prosomes (Multicatalytic Proteinases; Proteasomes) and Their Relationship to the Untranslated Messenger Ribonucleoproteins, the Cytoskeleton, and Cell Differentiation
Klaus Scherrer and Fayçal Bey
I. The Biological and Cytological Bases of the Prosome System
II. The Prosomes
III. The Multicatalytic Proteinase Activity of the Prosomes and the 26-S Proteasome
IV. Prosomes, the Cytoskeleton, and the Hypothesis of mRNA Cytodistribution
V. Prosomes Vary in Their Subunit Composition in Relation to Differentiation and Embryonic Development
VI. Variations of Prosome Patterns in Pathology
VII. Attempts at Comprehension
VIII. Glossary
References

Biological Implications of the Mechanism of Action of Human DNA (Cytosine-5)methyltransferase
Steven S. Smith
I. Mechanism of Action of the Human DNA (Cytosine-5)methyltransferase
II. Selectivity of Human DNA Methyltransferases
III. Biological Implications of the Mechanism
IV. Conclusions
References

Molecular Properties and Regulation of G-Protein-Coupled Receptors
Claire M. Fraser, Norman H. Lee, Susan M. Pellegrino and Anthony R. Kerlavage
I. G-Protein-Mediated Signal Transduction
II. G-Protein-Coupled Receptors Are a Large Gene Family
III. Molecular Basis of Receptor-Ligand Interactions
IV. Molecular Basis of Receptor/G-Protein Interactions
V. Identification of Functional Domains Involved in Receptor Desensitization and Down-regulation
VI. Genetic Elements Controlling G-Protein-Coupled Receptor Expression
VII. Identification of Novel G-Protein-Coupled Receptors by Partial cDNA Sequencing
VIII. Conclusions
References

The Human Immunodeficiency Virus Type-1 Long Terminal Repeat and Its Role in Gene Expression
Joseph A. Garcia and Richard B. Gaynor
I. Gene Expression Studies
II. Activation Signals
III. Transcriptional Control Elements
IV. Processing of HIV-1 mRNA
V. Translational Control
VI. tat Studies
VII. Interventional Strategies
VIII. Glossary
References

Processing of Eukaryotic Ribosomal RNA
Duane C. Eichler and Nessly Craig
I. Processing Sites and Processing Pathways
II. The Relationship between Ribosomal-RNA Processing and Post-transcriptional Modifications
III. Summary
References

Adenylyl Cyclases: A Heterogeneous Class of ATP-Utilizing Enzymes
Octavian Bârzu and Antoine Danchin
I. Adenylyl Cyclases of Gram-Negative Facultative Anaerobes
II. The Calmodulin-Activated Bacterial Toxic Adenylyl Cyclases
III. Class III Adenylyl Cyclases
IV. Similarity of Adenylyl and Guanylyl Cyclases
V. Evolution of Adenylyl Cyclases
VI. Are Adenylyl Cyclases Pulse-Generating Enzymes?
VII. Glossary
References

Mutational Spectrometry: Means and Ends
K. Khrapko, P. Andre, R. Cha, G. Hu and W. G. Thilly
I. Goals and Problems
II. Allele-specific PCR (ASP)
III. High-efficiency Restriction Assay (HERA)
IV. Methods Using Differential DNA Melting to Separate Mutants
References

Polynucleotide Recognition and Degradation by Bleomycin
Stefanie A. Kane and Sidney M. Hecht
I. Bleomycin: Structure and Domains
II. Metal Complexes of Bleomycin
III. Chemistry of Fe(II)·Bleomycin
IV. Chemistry of DNA Degradation
V. Other Metallobleomycins
VI. Interaction of Bleomycin with DNA
VII. Cleavage of RNA Mediated by Fe(II)·Bleomycin
VIII. Strand-Scission of Altered DNA Structures Mediated by Fe(II)·Bleomycin
IX. Concluding Remarks
References

Interaction of Epidermal Growth Factor with Its Receptor
Stephen R. Campion and Salil K. Niyogi
I. Sequence and Structure of EGF and EGF Receptor
II. Generation and Characterization of Mutant Human EGF Analogues
III. Effects of Single-site Mutations on Receptor-Ligand Association
IV. Cumulative Effect of Multiple Mutations on Receptor Binding
V. Conclusions
References

INDEX

Abbreviations and Symbols

All contributors to this Series are asked to use the terminology (abbreviations and symbols) recommended by the IUPAC-IUB Commission on Biochemical Nomenclature (CBN) and approved by IUPAC and IUB, and the Editors endeavor to assure conformity. These Recommendations have been published in many journals (1, 2) and compendia (3); they are therefore considered to be generally known. Those used in nucleic acid work, originally set out in section 5 of the first Recommendations (1) and subsequently revised and expanded (2, 3), are given in condensed form in the frontmatter of Volumes 9-33 of this series. A recent expansion of the one-letter system (5) follows.

SINGLE-LETTER CODE RECOMMENDATIONSᵃ (5)

Symbol   Meaning               Origin of symbol
G        G                     Guanosine
A        A                     Adenosine
T(U)     T(U)                  (ribo)Thymidine (Uridine)
C        C                     Cytidine
R        G or A                puRine
Y        T(U) or C             pYrimidine
M        A or C                aMino
K        G or T(U)             Keto
S        G or C                Strong interaction (3 H-bonds)
Wᵇ       A or T(U)             Weak interaction (2 H-bonds)
H        A or C or T(U)        not G; H follows G in the alphabet
B        G or T(U) or C        not A; B follows A
V        G or C or A           not T (not U); V follows U
Dᶜ       G or A or T(U)        not C; D follows C
N        G or A or T(U) or C   aNy nucleoside (i.e., unspecified)
Q        Q                     Queuosine (nucleoside of queuine)

ᵃ Modified from Proc. Natl. Acad. Sci. U.S.A. 83, 4 (1986).
ᵇ W has been used for wyosine, the nucleoside of "Y" (wye).
ᶜ D has been used for dihydrouridine (hU or H2Urd).

Enzymes

In naming enzymes, the 1984 recommendations of the IUB Commission on Biochemical Nomenclature (4) are followed as far as possible. At first mention, each enzyme is described either by its systematic name or by the equation for the reaction catalyzed or by the recommended trivial name, followed by its EC number in parentheses. Thereafter, a trivial name may be used. Enzyme names are not to be abbreviated except when the substrate has an approved abbreviation (e.g., ATPase, but not LDH, is acceptable).
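The single-letter recommendations given earlier in this section amount to a lookup from each symbol to the set of bases it may stand for. As an illustration only, a minimal Python sketch of that lookup follows. It assumes DNA sequences (T written for T(U), with U accepted as input) and leaves out Q, because queuosine is a modified nucleoside rather than an ambiguity code; the names `IUPAC`, `matches`, and `expand` are hypothetical.

```python
from itertools import product

# Base sets for the single-letter symbols, DNA form (T stands for T(U)).
# Q (queuosine) is a modified nucleoside, not an ambiguity code, so it is omitted.
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA",    # puRine
    "Y": "TC",    # pYrimidine
    "M": "AC",    # aMino
    "K": "GT",    # Keto
    "S": "GC",    # Strong interaction (3 H-bonds)
    "W": "AT",    # Weak interaction (2 H-bonds)
    "H": "ACT",   # not G
    "B": "GTC",   # not A
    "V": "GCA",   # not T (not U)
    "D": "GAT",   # not C
    "N": "GATC",  # aNy nucleoside
}


def matches(symbol: str, base: str) -> bool:
    """Return True if a concrete base (A, C, G, T or U) is covered by the symbol."""
    base = "T" if base.upper() == "U" else base.upper()
    return base in IUPAC[symbol.upper()]


def expand(pattern: str) -> list[str]:
    """List every concrete sequence that a degenerate pattern stands for."""
    return ["".join(p) for p in product(*(IUPAC[s] for s in pattern.upper()))]
```

For example, `expand("RY")` yields the four dinucleotides GT, GC, AT, and AC, and `matches("H", "G")` is false because H means "not G".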


REFERENCES
1. JBC 241, 527 (1966); Bchem 5, 1445 (1966); BJ 101, 1 (1966); ABB 115, 1 (1966), 129, 1 (1969); and elsewhere. General.
2. EJB 15, 203 (1970); JBC 245, 5171 (1970); JMB 55, 299 (1971); and elsewhere.
3. "Handbook of Biochemistry" (G. Fasman, ed.), 3rd ed. Chemical Rubber Co., Cleveland, Ohio, 1970, 1975, Nucleic Acids, Vols. I and II, pp. 3-59. Nucleic acids.
4. "Enzyme Nomenclature" [Recommendations (1984) of the Nomenclature Committee of the IUB]. Academic Press, New York, 1984.
5. EJB 150, 1 (1985). Nucleic Acids (One-letter system).

Abbreviations of Journal Titles

Journals                                   Abbreviations used
Annu. Rev. Biochem.                        ARB
Annu. Rev. Genet.                          ARGen
Arch. Biochem. Biophys.                    ABB
Biochem. Biophys. Res. Commun.             BBRC
Biochemistry                               Bchem
Biochem. J.                                BJ
Biochim. Biophys. Acta                     BBA
Cold Spring Harbor                         CSH
Cold Spring Harbor Lab                     CSHLab
Cold Spring Harbor Symp. Quant. Biol.      CSHSQB
Eur. J. Biochem.                           EJB
Fed. Proc.                                 FP
Hoppe-Seyler's Z. Physiol. Chem.           ZpChem
J. Amer. Chem. Soc.                        JACS
J. Bacteriol.                              J. Bact.
J. Biol. Chem.                             JBC
J. Chem. Soc.                              JCS
J. Mol. Biol.                              JMB
J. Nat. Cancer Inst.                       JNCI
Mol. Cell. Biol.                           MCBiol
Mol. Cell. Biochem.                        MCBchem
Mol. Gen. Genet.                           MGG
Nature, New Biology                        Nature NB
Nucleic Acid Research                      NARes
Proc. Natl. Acad. Sci. U.S.A.              PNAS
Proc. Soc. Exp. Biol. Med.                 PSEBM
Progr. Nucl. Acid. Res. Mol. Biol.         This Series

Some Articles Planned for Future Volumes

The Poly(ADP)-ribosylation System of Higher Eukaryotes
FELIX R. ALTHAUS

snRNA Interactions in the Spliceosome
MANUEL ARES, JR. AND BRYN WEISER

Reconstruction of Mammalian DNA Replication
ROBERT A. BAMBARA AND LIN HUANG

Genetic Dissection of Synthesis and Function of Modified Nucleosides in Bacterial Transfer RNA
GLENN BJÖRK

The Rodent BC1 Gene as a Master Gene for the ID Family Retroposition: Evolution and Functional Studies
PRESCOTT DEININGER, JÜRGEN BROSIUS, HENRY TIEDGE AND JOOMYEONG KIM

Transcriptional Regulation of Growth Related Genes
THOMAS F. DEUEL AND ZHAO-YI WANG

Poly(A) Tails, Structure, and Function
MARY EDMONDS

Mechanism of Transcription Fidelity
GUNTHER EICHHORN AND JIM BUTZOW

Egr-1: Prototype of a Zinc-finger Family of Transcription Factors
ANDREA GASHLER AND VIKAS P. SUKHATME

The Mechanics and Specificity of Signal Transduction to the Nucleus: Lessons from c-fos
MICHAEL GILMAN

Regulation of Expression of the Gene for Malic Enzyme
ALAN G. GOODRIDGE

Structure/Function Relationships of Phosphoribulokinase and Ribulosebisphosphate Carboxylase/Oxygenase
FRED C. HARTMAN AND HILLEL K. BRANDES

Targeting and Regulation of Immunoglobulin Gene Somatic Hypermutation and Isotype Switch Recombination
MARKUS HENGSTSCHLÄGER, HELIOS LEUNG AND NANCY MAIZELS

Histone Interactions with Special DNA Structures
KENSAL E. VAN HOLDE AND JORDANKA ZLATANOVA

Transcriptional Control of the Human Apolipoprotein-B Gene in Cell Culture and in Transgenic Animals
BEATRIZ LEVY-WILSON

Molecular Biology of Heme Regulation in Higher Vertebrates
BRIAN K. MAY, C. RAMAN BHASKER, SATISH DOGRA, TIM COX AND TIM SADLON

Promotion and Regulation of Ribosomal Transcription in Eukaryotes by RNA Polymerase I
TOM MOSS AND VICTOR Y. STEFANOVSKY

Drugs That Deplete Mitochondrial DNA in Vertebrates: Basic and Physiological Considerations
RÉJEAN MORAIS

New Members of the Collagen Gene Family
TAINA PIHLAJANIEMI AND MARKO REHN

The Prosomes (Multicatalytic Proteinases; Proteasomes) and Their Relationship to the Untranslated Messenger Ribonucleoproteins, the Cytoskeleton, and Cell Differentiation*

KLAUS SCHERRER AND FAYÇAL BEY
Institute Jacques Monod
CNRS and Université Paris 7
Paris, France

I. The Biological and Cytological Bases of the Prosome System
   A. The Manifold Discovery of the Prosomes (Proteasomes)
   B. Messenger Ribonucleoproteins and Prosomes
   C. Messenger RNA and the Cytoskeleton
II. The Prosomes
   A. The Structure and Biochemical Properties of the Prosome Particles
   B. The Prosomal Protein Genes
   C. The Prosomal RNA
   D. Translational Repression in Vivo and Inhibition of Protein Synthesis in Vitro
III. The Multicatalytic Proteinase Activity of the Prosomes and the 26-S Proteasome
   A. The 26-S Proteasome
   B. The Prosome-MCP Core Enzyme
   C. Structural and Enzymatic Modulation of the Prosome-MCP Core
   D. The LMP-MCP Activity and Antigen Presentation within the Major Histocompatibility Complex (MHC)
IV. Prosomes, the Cytoskeleton, and the Hypothesis of mRNA Cytodistribution
   A. Cytodistribution of Prosomes in Interphase Cells and during the Cell Cycle
   B. Prosomes and the Cytoskeleton
   C. Subnetworks of Prosomes and the Intermediate Filaments
   D. Prosomes at the Cellular Surface and in the Extracellular Space
V. Prosomes Vary in Their Subunit Composition in Relation to Differentiation and Embryonic Development
VI. Variations of Prosome Patterns in Pathology
VII. Attempts at Comprehension
   A. Fascination and Frustration: Much Data and Little Comprehension
   B. The Prosome-MCP Function(s) at the Level of Protein Synthesis and Catabolism
VIII. Glossary
References

* To the memory of the late Nicole Granboulan (1929-1970), expert electron-microscopist and truly devoted investigator and person, who died in an accident shortly after having first observed prosome particles.

Prosomes, called multicatalytic proteinase (MCP) complexes or proteasomes by many enzymologists (see Section VIII for Glossary), are a new type of cellular factor. They are "facultative" ribonucleoproteins of about 720,000 Mr that display a multicatalytic proteinase activity. They show up in the most varied and unexpected contexts of cellular structure and function, from archaebacteria to humans. Early observations, as well as more recent findings concerning their structure, point to a relationship with seemingly unrelated factors that apparently have entirely different functions. Indeed, at the time of the publication of the first electron micrographs of prosomes in messenger-ribonucleoprotein (mRNP) complexes (1), Eduard Kellenberger drew our attention to apparently indistinguishable particles in bacteria reported by Thomas Hohn: the morphogenic bacterial factor Gro-E, known to be involved in phage-λ assembly (2). This unique structure of approximately 800,000 Mr is built of four superimposed layers of differing densities forming a cylinder of about 14 × 16 nm, with a central hole about 4 nm wide; seen from the top, it shows a sevenfold rotational symmetry, which is quite exceptional for biological structures. The recent structure published by Baumeister and collaborators (3, 4) for the proteasome from Thermoplasma acidophilum, as well as new data on prosomes of duck and human erythroblasts, HeLa cells, and placenta (5, 6; A. Arnberg and K. Scherrer, unpublished), show a cylinder of similar dimensions (12 × 17 nm) and a stack of four disks or rings on a heptagonal base. In the meantime, Gro-E was identified as a member of the "chaperon" protein family, which is involved in controlled peptide folding and in preventing the spontaneous assembly of proteins in vivo (7, 8). Like other chaperons, Gro-E is a heat-shock protein (9); strangely, in a way not yet understood, prosomes are also related to the heat-shock complex (10).
Shaped for protein recognition, folding, and eventually degradation, the particular structure of such protein complexes may be related to some fundamental mechanism of specific interaction with substrate proteins. The finding that the MCP first observed by DeMartino and Goldberg (11), Hershko and collaborators (12), and Wilk and Orlowski (13) is identical to the prosomes (14, 15) was extended recently by the proposition (16, 17) that it is involved in antigen processing and presentation (18). Indeed, it has been suggested that yet another protein complex, independently discovered (19) and called the low-molecular-weight protein (LMP) complex, plays a role in the presentation of antigens by the major histocompatibility complex class I (MHC-I) at the surface of human and mouse cells (20, 21). Finally, some of the properties of prosomes, such as their insensitivity to high ionic strength and to non-ionic detergents, their relative resistance to proteases, a tendency to polymerize into filaments 10-15 nm wide (A. Arnberg and K. Scherrer, unpublished), and in particular the presence of a cryptic RNA (22), are reminiscent of yet another kind of biological structure not yet understood, the mysterious infective agents called prions (K. Scherrer, unpublished). The list of unexplained observations and correlations is long and can easily be extended. Prosomes are present in archaebacteria [and possibly in E. coli (23)] as well as in humans, with a similar structure although a different protein composition. They are not only MCPs with highly selective substrate specificity, but also a subcomplex of the untranslated mRNP; in a particular state they incorporate a small RNA that, in the case of mammalian prosomes, turns out to be a reverse primer of the "retroviral" tRNA type (22). Similar particles show up as "chaperons" in a related structure (but built of quite different protein subunits) involved in protein folding; they are somehow related to heat shock; and, as prosomes-MCP, they are associated with the cytoskeleton, which reacts by instantaneous collapse to any kind of cellular stress (or simply to reprogramming of protein synthesis).
In all likelihood, this enigmatic type of particle, present in multiple facets throughout the living world and based, in the case of prosomes-MCP, on a new family of several dozen genes, confronts us with one of the more fascinating biological systems to show up within the last 20 years. Here we try to sort out the facts and, possibly, to find ways of comprehending the underlying functions and the basic biological meaning of such a complex, multifaceted structure, given the present status of data and understanding. We concentrate on the prosome story and its apparent relationship to mRNA and the cytoskeleton, already reported in two minireviews (24, 25), without extensively repeating the data detailed in the MCP-proteasome reviews cited below.

I. The Biological and Cytological Bases of the Prosome System

A. The Manifold Discovery of the Prosomes (Proteasomes)

The prosomes were discovered in our laboratory by Nicole Granboulan, an electron microscopist who joined us in order to observe the untranslated mRNP of HeLa cells and avian erythroblasts under the electron microscope (1). At that time, Jacques Dubochet, Max Herzberg, and Carlos Morel had observed, by dark-field electron microscopy, the translated mRNP isolated from polyribosomes (26). Granboulan did not succeed in observing the untranslated mRNPs as clearly defined structures. But after weeks of scrutiny, one day she showed us a very tiny structure that looked like a raspberry and occasionally like a cylinder with a central hole (Fig. 1C, D); the micrographs we had been looking at for weeks were full of them! Remarkably, these structures were not present in the polyribosomal fraction of the CsCl density gradients in which, after cross-linking by formaldehyde, all types of particles from cytoplasmic cell extracts were analyzed. This clear-cut structure was also present, to some extent, in the 40-S pre-initiation translation complex banding between the ribosome-free untranslated mRNP and the polyribosomes (see Fig. 1A). Some time after the publication of this report (1), J. Harris drew our attention to a structure isolated from human erythrocytes which he named "cylindrin" (for an early review, see 27). However, the different biochemical context, the claim that it was an aminoacyl-tRNA synthetase (28), and the interpretations made of the structure, which in our case more often looked like a raspberry than a cylinder, made it seem unlikely at the time that we were talking about the same entity. Today, it seems probable that his preparations contained prosomes. In the 1970s, several groups observed various structures bearing remarkable similarities to prosomes, for example, the EDTA-sensitive (prosomes are not) nuclear complexes reported by Harris Busch and collaborators (29, 30). However, the previously discussed structure of the bacterial Gro-EL factor made it evident that electron microscopy alone could not be relied on to define the structure we found associated with the untranslated mRNP.

[Figure 1: (A) CsCl density-gradient profile, O.D. 260 and c.p.m. plotted against fraction number; (B) sucrose-gradient profile plotted against fraction number; (C, D) electron micrographs of prosomes.]

FIG. 1. Prosomes and mRNP. (A) Prosome and mRNP distribution in a CsCl density gradient of all cytoplasmic particles in HeLa cells; (B) sucrose gradient of all cytoplasmic particles in duck erythroblasts; (C, D) prosome particles under the electron microscope. (A) All particles in a post-mitochondrial supernatant of HeLa cells, labeled for 6 hours by [3H]uridine in the presence of 0.05 μg of actinomycin D (to suppress rRNA synthesis), were sedimented, resuspended, and cross-linked by formaldehyde prior to CsCl density-gradient centrifugation (adapted from 1). (1) Ribosomes and polyribosomes, (2) pre-initiation translation complex containing mRNA and the 40-S ribosomal subunit, and (3) ribosome-free, untranslated mRNP. (●-●) O.D.260 absorbency, (○- - -○) [3H]uridine incorporation, and (+-+) prosome count under the electron microscope. (Experimental details are given in 1.) (B) All particles in a post-mitochondrial lysate of duck erythroblasts were sedimented and fractionated on a sucrose gradient; note the absence of a peak in the 80-S position of monoribosomes. (A) Free mRNP, (B) free mRNP and ribosomal subunits including the pre-initiation complex, and (C) polyribosomes. (C, D) Electron micrographs (courtesy of A. C. Arnberg and W. Bergsma-Schutter) of prosomes from HeLa cells suspended (C) in 0.5-M KCl, 30-mM TEA (pH 7.6), and stained by 1% uranyl acetate, or (D) suspended in ammonium acetate and stained by 2% sodium phosphotungstate.

Indeed, it took us more than 10 years to sort out facts and artifacts, and to define what untranslated and translated mRNPs really are, in terms of structure and eventual function (31, 32). We then arrived at a biochemical definition of the particles discovered by Granboulan that was clear enough for us to dare to name them "prosomes" (33). At about the same time, Kleinschmidt and colleagues had isolated particles resembling prosomes from the nuclei of Xenopus oocytes (34, 35). Totally independently, a group of enzymologists had worked in the late 1970s on a particular proteinase (11, 12) that was eventually termed the "multicatalytic proteinase (MCP) complex" (13). Two scientists who knew of the prosome story from our data deserve credit for noticing, and eventually demonstrating, that the MCP is identical to the prosome (14, 15) [after a short-lived proposal that prosomes were built of heat-shock proteins (36, 37), contradicted by our data (10)]. It was proposed that these prosomes henceforth be called "proteasomes"; but the entire group of enzymologists concerned suggested, in two letters to the Biochemical Journal (38, 39), that the by then traditional name "multicatalytic proteinase" (MCP) or "multicatalytic proteinase complex" (MCPC) be maintained; therefore, when speaking about the proteinase activity of prosomes, we use this term. We recently confirmed that prosomes isolated from genuine untranslated mRNP display all of the properties attributed to the MCP (40). Two earlier reviews (41, 42), as well as nine recent ones (17, 43-50), give details about the present knowledge concerning the MCP. We therefore discuss the proteinase function here only in relation to the prosome story as a whole, referring the reader to these reviews for details of enzymological mechanisms. Entirely independent of the discovery of the prosomes and the MCP is the story of the LMP complex. It led recently to the proposition that the prosome-MCP might be involved in the generation of the short antigenic peptides to be presented to the lymphocytes by the MHC-I complex at the surface of human and mouse cells (16). Indeed, the isolation of fractions containing the MHC-I complex precipitated by allogenic sera also led to the definition of a complex of low-Mr proteins, the LMPs.
The two-dimensional pattern published in Nature (19) led us to reproduce those experiments; using our anti-prosome monoclonal antibodies, we found that Monaco and McDevitt (19) had indeed isolated prosomes within their LMP (F. Grossi de Sa, M. Seman and K. Scherrer, unpublished). Because we had other priorities, this claim has only been brought to the attention of the group concerned. The more recent finding that some of the genuine LMP genes (Ring 10 and 12, or LMP7 and LMP2) were in fact genes of prosome-MCP proteins, which map in the human and mouse genomes within the MHC-II locus, between the TAP1 and TAP2 transporter gene loci, shed new light on this relationship. However, the hypothesis (16) that the MCP activity of the prosomes was a main mechanism for processing intracellular proteins to small peptides, which are presented by the MHC-I complex as antigens to the immune system, was recently contradicted experimentally (51, 52). A major part of the published work within the prosome field has concentrated for the last few years on the MCP, and recently on the LMP; very few groups have paid attention to the relationship of prosomes to the untranslated mRNP and to the cytoskeleton. A new series of papers relating to the latter field was published recently; here we review primarily these two aspects of the prosome-MCP system, in order to help our colleagues better comprehend the prosome-MCP-LMP story as a whole.

B. Messenger Ribonucleoproteins and Prosomes

For a long time, mRNPs were suspected to be artifacts. Indeed, the demonstration that any kind of nucleic acid dumped into a cellular sap will form an RNP (53), although logical, seemed discouraging. Eventually, others as well as our team succeeded in sorting out most of the artifacts and in giving some meaning to the very complex structures containing the mRNAs, in their translated and untranslated forms. On the basis of simple theoretical considerations, it was evident from the outset that the mRNA itself cannot contain enough information to drive and regulate all of the biochemical steps involved in the subtle and peripheral controls of pre-mRNA processing and of the post-transcriptional stages of the "cascade of regulation" of gene expression (54-56). Therefore, trans-acting factors and higher-order structures at the pre-mRNA and mRNA levels had to be involved. We cannot discuss here the question of the pre-mRNPs (57), which comprise some basic components, the "informofers" of Samarina, Georgiev, and co-workers (58), later extensively characterized by LeStourgeon (59) and by Martin and colleagues (60), as well as a large number of more acidic proteins numbering in the hundreds (61). But we shall try to define here the inter-relationships of the cytoplasmic RNPs by the scheme given in Fig. 2. The mRNA is brought to the cytoplasm as a "transfer RNP" complex of, unfortunately, still unknown composition. Forming the pre-translational mRNP, it is then confronted at the level of the pre-initiation complex (mRNA, 40-S subribosome, and various factors) with the possibility of eventual translation in the polyribosomes. Part of the mRNA remains untranslated for shorter or longer periods; the complexes that include the particularly long-term "masked" mRNA in oocytes were called "informosomes" by Spirin (62).
The work carried out in the 1970s in several laboratories, including our own, established some very clear-cut facts, based on experiments carried out with all the precautions necessary to avoid the above-mentioned artifacts that occur when handling cells and their extracts. Since this matter is basic to the prosome and mRNP story, a true sucrose gradient profile showing the fractionation of the entire population of cytoplasmic RNPs into polyribosomes and (ribosome-)free mRNP in avian erythroblasts is given in Fig. 1B. Those having seen hundreds of such profiles then found


KLAUS SCHERRER AND FAYÇAL BEY

FIG. 2. Inter-relationships of the various mRNP fractions and the cytoskeleton in the cytoplasm (adapted from 31). mRNA enters the cytoplasm as mRNP of still-unknown composition; with the notable exception of the prosomes, none of the nuclear trans-acting factors characterizing pre-mRNP are present in cytoplasmic untranslated or translated mRNP. Once in the cytoplasm, associated with the intermediate filament network, some mRNAs immediately enter the translation machinery, losing the free mRNP proteins (including the prosomes) while integrating into the pre-initiation complex and eventually the polyribosomes associated with the microfilaments. A fraction (5-95%) of every type of mRNA remains, however, untranslated for the short or long term. By definition, we classify as long-term repressed mRNA those that, in a given cell and at a given time, are fully absent from polyribosomes (100% untranslated mRNA). The prototype of the latter mRNP are the informosomes of Spirin (62), that is, the maternal mRNP in the oocytes, which are inactive and are activated months or years later upon fertilization (up to 30 years in humans).

one feature very remarkable: this gradient does not contain the “classical” 80-S peak of single ribosomes; all ribosomes are in polyribosomes or present as 40-S and 60-S subribosomal complexes (63). Such results are obtained by “freezing” the polyribosomes prior to cell fractionation with cycloheximide or, even better, with the irreversible drug emetine, at doses high enough to prevent “run-off” as well as artificial loading of polyribosomes by “run-on” of the ribosomes. The evidence is thus clear-cut that (when kept under normal physiological conditions!) “steady-state” cells have few free 80-S ribosomes,

PROSOMES (MULTICATALYTIC PROTEINASES; PROTEASOMES)


even in the nondividing, terminally differentiating avian erythroblasts. Since 99% of the published polyribosomal profiles of most laboratories, including our own, include a heavy 80-S peak, it is clear that this peak must be created either by physiological run-off of ribosomes in the still-living cells during washing, or in the cell-free extract. Indeed, at intermediate (cold-room!) temperatures of 4-12°C, ribosomes are particularly prone to “run off” (C. Chezzi, J. Grosclaude and K. Scherrer, unpublished). In normal cells, the 80-S ribosome peak is thus an artifact. The consequences for investigations on mRNP and prosomes are fundamental: if there is substantial run-off, the genuine untranslated mRNPs become mixed with those from the polyribosomes, and a variable mixture of mRNAs (with their trans-acting factors) in their active and inactive forms is studied. It is therefore essential to take all possible precautions when studying the function and structure of the translated and ribosome-free mRNP, particularly the prosomes, which exist in both an mRNA-bound and a free form. Superimposed on this problem is the possible leakage of RNPs from the nucleus during cell fractionation, which must be controlled by appropriate means (64). Taking such precautions into account and interpreting the published work accordingly, four basic facts have emerged concerning the cytoplasmic mRNP (cf. reviews 31, 32). (1) The transfer of the mRNA from the nucleus to the cytoplasm is accompanied by a total exchange of the associated proteins, with the possible exception of the poly(A)-binding protein (PABP) of Mr 73,000 (32, 65) and, possibly, the prosomes. (2) No protein is common to both the translated and untranslated forms of mRNP, as judged by one- and two-dimensional gel electrophoresis. In particular, the untranslated mRNPs seem not to contain the PABP present in polyribosomes; other PABPs of different Mr replace the Mr 73,000 component (66, 67).
(3) All proteins and factors associated with the translated mRNP (which can be isolated with the mRNA after dissociation of the ribosomes by EDTA) seem to be ubiquitous, that is, present on all types of mRNA. Indeed, no qualitative differences in the two-dimensional protein pattern of translated mRNP were observed when comparing polyribosomal mRNPs from various sources. None of these factors thus seems to discriminate among different types of mRNA. They might, however, quantitatively favor the translation of specific types of mRNA, possibly in relation to the secondary structure of specific mRNAs, and thus activate them differentially. (4) In contrast, untranslated mRNA associates with a much wider spectrum of proteins and subcomplexes of particular composition, for example, the prosomes. It is particularly important that, as defined by two-dimensional


gel electrophoresis, the pattern of mRNA-associated proteins changes according to the type of (differentiated) cell studied, or even when comparing, within a given cell type, one mRNA population to another, for example, in avian erythroblasts (68). A straightforward recent example of such a system of trans-acting control factors, acting at the level of the ribosome-free mRNP, is the “iron-responsive element-binding protein” (IRE-BP), which binds to iron-responsive elements in the noncoding regions of the ferritin and transferrin receptor mRNAs. This control protein (identified as the venerable aconitase!) blocks translation in the former case and stabilizes the mRNA in the latter (reviewed in 69). An important theoretical concept emerges: if the fundamental mechanisms of mRNA translation are based on ubiquitous factors, the essential differential controls of specific mRNA expression in the cytoplasm are based on a system of mRNA stabilization (acting positively) and of negative controls by trans-acting factors, exerting their effect on the untranslated mRNA. No specific factors positively favoring the translation of individual types of mRNA have been defined. This statement must immediately be qualified since, as pointed out above, the variable secondary structure of individual types of mRNA, and the resulting variability in the dissociation constants of the ubiquitous, positively acting translation factors, might modulate the translation of individual types of mRNA. It is clear, however, that the large number of qualitatively differing factors associated with untranslated mRNA, and their variability in composition in different types of cells and in different mRNA populations within a given cell, must signify that the essential qualitative controls are negative, and must act primarily at the level of the untranslated mRNP.
The careful analysis of these putative control factors associated with ribosome-free mRNA led to the discovery of the prosome particles, which were found to be associated in their majority, although not exclusively, with the untranslated mRNA (1). In this context, it is of prime importance that the prosomes, like the untranslated core mRNPs themselves, show compositional variability in their protein complement in specific mRNA populations (68, 70); this is discussed extensively below (see Section V).

C. Messenger RNA and the Cytoskeleton

The total exchange of trans-acting factors that takes place when mRNA passes from its untranslated to its translated form has recently been given new significance by the observation that polyribosomes and mRNA-poly(A) are found on the actin-containing microfilaments of the cytoskeleton (71-73). Since there is strong reason (see Sections IV, B and C) to suggest that a major fraction of the untranslated mRNA, together with the associated prosomes, is bound to the intermediate filament (IF) networks of the cytoskeleton (74-76), the concept emerges that mRNA may first be distributed in the


cytoplasm by the IF networks (77), to be taken over secondarily by the microfilaments prior to and during translation (73). Penman and collaborators (78, 79) showed many years ago that mRNA in the cytoplasm is bound to cellular structures. The evidence then was that Triton X-100 extracts neither mRNA nor polyribosomes from cells, while RNase treatment releases the ribosomes quantitatively (79). Since the latter treatment also releases the mRNA fragments but not the PABP of Mr 73,000, it was proposed that mRNA is bound to the cytoskeleton by the PABP. This experiment did not discriminate between the mRNAs in their translated and untranslated forms; but on a purely quantitative basis, one could assume that, under such conditions, both types of mRNA remain bound to the cytoskeleton [in HeLa cells, mRNA distributes about 1:1 between the two fractions (1; see Fig. 1A)]. As far as the translated mRNA is concerned, the above-mentioned finding that the polyribosomes and the PABP of Mr 73,000 are selectively associated with the microfilaments gives new significance to these data. We might therefore hypothesize two types of triangular association: (i) among the mRNA in its translated form, the PABP, and the actin-containing microfilaments bearing the actively translating polyribosomes; and (ii) among the untranslated mRNP, the prosomes, and the intermediate filaments. The shunting yard of this variable association must necessarily be at the level of the pre-initiation complex, which contains the 40-S ribosomal subunit and the mRNA, possibly associated with the PABP, the cap-binding complex, and the initiation factors. This exchange of factors may take place on the microfilaments. Indeed, as we show later (Sections IV, B and C), prosomes also align partially with the microfilaments.
Since part of the prosomes can be extracted by Triton X-100, leaving behind the mRNA-bound particles associated with the cytoskeleton, and in particular with the IF networks, it is clear that not all of the prosomes are bound to mRNA; some are bound instead to lipoproteins or other cellular structures that are dissociated by mild, non-ionic detergents such as Triton X-100. Prosomes are therefore only transiently associated with the mRNA, as trans-acting factors possibly having other functions as well, beyond the mRNA and the cytoskeleton. These basic notions might become particularly important in relation to the wealth of data currently emerging, which show that specific types of mRNA occupy particular, functionally significant territories in sectors of the cytoplasm of somatic cells (80) as well as of oocytes (81, 82). As assumed in the “unified matrix hypothesis” (83, 84) and more recently demonstrated by extensive experimental data, genes and their primary transcripts can be located in specific, topologically defined regions of the nucleus (85, 86), and some transcripts may thus be “gated out” into the cytoplasm through specialized nuclear pores (87). Transported in a selective manner on specific cytoskeletal networks to cytoplasmic territories, specific mRNAs will produce proteins locally, to be assembled co-translationally and to carry out specific structural or enzymatic functions. Particularly interesting is the fact that “maternal” mRNA in oocytes is also associated with the IF network (88). Again, on a theoretical level, this implies that pre-mRNA and mRNA are qualitatively and possibly quantitatively controlled, not only in a cell-specific and temporal manner, but also topologically within the cell. The particularly intriguing concept emerges that pre-mRNA and mRNA are directly involved in the dynamic architecture of the cell. They may serve as matrices of transient structural compounds at the level of the nuclear matrix and the cytoskeleton. They would thus bear information beyond the genetic code, organizing, according to their nucleotide sequence, the trans-acting factors, which are in turn placed into the cellular structure (84). This is in addition to their role as carriers of the coding information, the primary message to be translated and subject to the “cascade of regulation” (56). In this context, some of the disconcerting biophysical and biochemical properties of the prosomes might possibly make sense, for example, their complexity and variability in biochemical composition, their putative function at the level of the mRNA, some of their protease activities, and their cytolocation on the nuclear matrix and pre-mRNA, the cytoskeleton, and the untranslated mRNP.

II. The Prosomes

A. The Structure and Biochemical Properties of the Prosome Particles

Prosomes are facultative RNP complexes built of small proteins of Mr 19,000-36,000. In the individual particle, these proteins are present in a variety of specific combinations, according to the cell type and the mRNA population with which they are associated (Section V). A subfraction of prosomes contains, in addition, a small RNA in the 50- to 150-nucleotide range. The Mr of the complex is estimated to be 720,000. This figure is based on various methods, such as electron microscopy, light scattering, neutron scattering, X-ray diffraction, and sedimentation (4, 5, 43). Early electron microscopy (1), as well as recent structural studies on prosomes from the archeobacterium T. acidophilum on the one hand (89), and from eukaryotic cells such as avian erythroblasts, human HeLa cells (A. Arnberg and K. Scherrer, unpublished), and various other sources on the other (6), show an almost identical structure. In T. acidophilum the particle is composed of four superimposed rings, each with seven subunits (Fig. 3A),


[Figure 3. (A) Prosomes in archeobacteria: four rings of seven α- or β-type subunits form a hollow cylinder; T. acidophilum prosomes have α14β14 composition. (B) Eukaryotic prosomes: four rings of six or seven subunits of α- or β-type in variable combinations, based on a large population of genes; eukaryotic prosomes are mosaics of variable composition.]

FIG. 3. Models of prosome-MCP particles. (A) The T. acidophilum proteasomes, according to Baumeister and colleagues (4), are constituted of 28 subunits. (B) Model of vertebrate prosomes according to A. C. Arnberg and K. Scherrer (unpublished). Note that, in the eukaryotic prosomes, the number of subunits may vary from 24 to 28 since, according to mass distribution, some proteins may occupy the place of two subunits in the T. acidophilum model. Furthermore, many of the subunits can be exchanged, resulting in variable combinations; theoretically, a large number of distinct mosaic particles is hence possible.


assembled into a cylinder 12 nm wide and 17 nm high, with a central hole 4 nm wide; it is thus composed of 28 protein subunits.
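As a rough consistency check on the figures quoted in this section, the arithmetic can be sketched in a few lines of Python. This is an illustration only, under simplifying assumptions not made in the text: the particle mass is taken to be the sum of its protein subunits (any bound pRNA is ignored), and a subunit count is called "compatible" if the average subunit Mr it implies falls within the quoted range of 19,000-36,000. The names used are hypothetical.

```python
# Illustrative arithmetic only; assumes the particle mass is the sum of its
# protein subunits (any bound RNA is ignored).
RINGS = 4
SUBUNITS_PER_RING = 7
COMPLEX_MR = 720_000                 # estimated Mr of the whole particle
SUBUNIT_MR_RANGE = (19_000, 36_000)  # quoted Mr range of prosomal proteins


def mean_subunit_mr(n_subunits: int, complex_mr: int = COMPLEX_MR) -> float:
    """Average subunit Mr implied by a given number of subunits."""
    return complex_mr / n_subunits


def compatible(n_subunits: int) -> bool:
    """True if the implied average lies inside the quoted subunit range."""
    low, high = SUBUNIT_MR_RANGE
    return low <= mean_subunit_mr(n_subunits) <= high


total = RINGS * SUBUNITS_PER_RING    # 28 subunits in the archeobacterial model
for n in range(24, total + 1):       # 24-28 subunits proposed for eukaryotes
    print(n, round(mean_subunit_mr(n)), compatible(n))
```

Every count from 24 to 28 passes (implied averages of roughly 25,700 to 30,000), so this check supports the overall model but cannot by itself decide between 24 and 28 subunits.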

The studies on prosomes of higher eukaryotes show six centers of mass (48; A. Arnberg and K. Scherrer, unpublished). However, according to our data, the same sevenfold base as observed in archeobacteria prevails in eukaryotic prosomes; indeed, one center of mass seems elongated, with two centers of density occupying the space of two proteins in the bacterial particle. Therefore, one may tentatively assume the same basic model for eukaryotic prosomes (Fig. 3B). The central tunnel is wide enough to accommodate either small RNA species, which resist RNase digestion in the intact particle (22, 90), or the extended chain of an unfolded protein (8). Prosomes have a tendency to polymerize, particularly in polyethylene glycol solutions (A. Arnberg and K. Scherrer, unpublished), which induce the formation of one- and two-dimensional crystals. This might indicate the existence of an ionic dipole in the particle (a suggestion of E. Kellenberger); another indication of such a dipole is the tendency of the barrel-shaped particle to “stand up” on charged electron microscopy grids, showing the heptagonal ring-like structure from the top. The raspberry-shaped particle, which has been seen by many authors in uranyl-acetate-stained preparations, might therefore correspond to such “standing-up” structures in a partially collapsed form, deformed by the surface tension arising during dehydration. The proteolytically active prosome complex can form higher-order structures by the addition of other factors and complexes, associating at both ends of the basic particle; this is particularly the case for the 26-S proteasome complex (see Section III, A), observed when the particles are isolated, or associated in vitro with cofactors, in the presence of ATP (see electron micrographs in 17). In T. acidophilum, only two different prosomal subunits exist, the α and β proteins. The particle thus forms an α14β14 complex (4).
In higher eukaryotes, given the six centers of mass observed within the basic heptagon, 24-28 subunits may constitute the core particle. Furthermore, their subunit composition is much more complex and may include ten or more different subunits in a variable combination within one particle (Fig. 3B). In yeast, the individual particle may contain 12-14 different proteins (91, 99), while the precise number is not yet known for higher eukaryotes. Whereas the proteolytic MCP complex studied by enzymologists was considered until about 1990 to be an invariable structure, the mRNP-associated prosomes were known from the beginning to show compositional variability, observable even when comparing mRNPs of different kinds (68). The two-dimensional electrophoretic pattern of prosome subunits showed differences when comparing different species (70), and our recent data on prosomes of the various human blood cells confirmed this variability within a species (163). In addition to such biochemical data (70, 92), immunocytological studies (detailed in Section V) also demonstrate this basic property of prosomes in higher eukaryotes. In T. acidophilum, the α-subunits constitute the two outer rings of the particle, while the β-subunits build up the inner core (89, 93). The latter is supposed to contain the protease activity which, at least in T. acidophilum, is of a unique chymotrypsin-like type; thus, no multicatalytic function has been found there as yet (94). The tunnel-like structure, with the protease activity supposedly inside, may explain why prosomes are “poor” proteases (in turnover rates) but of high selectivity; they do not attack proteins in a haphazard fashion. This fact is fundamental, since prosomes are extra-lysosomal proteases and are therefore in contact with a great variety of cellular structures, from the chromatin (95) to the cell surface (96, 97). Their unique, highly compact structure makes the prosomes one of the most stable complexes in the cell. In fact, it is possible to dissolve oocytes and blastula-stage embryos in 1% Sarkosyl: the only higher-order structures that resist and sediment in their characteristic 19-S zone are the prosomes (95). Prosomes resist not only nonionic detergents, but also low concentrations of sodium dodecyl sulfate (SDS), deoxycholate, RNase, to some extent proteinases, and, very interestingly, salt solutions of high ionic strength. Therefore, vertebrate prosomes can be banded without prior fixation in CsCl density gradients (33).
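The potential scale of this compositional variability can be illustrated with a simple count. The Python sketch below is purely hypothetical: it assumes that each subunit position can be filled independently by one of a few interchangeable variants, and it ignores assembly constraints and any symmetry of the stack, none of which the text specifies. The variant counts are invented for illustration, and `mosaic_count` is a hypothetical helper.

```python
from math import prod


def mosaic_count(variants_per_position: list[int]) -> int:
    """Count distinct particles if every position is filled independently.

    variants_per_position[i] is a hypothetical number of interchangeable
    subunits for position i; an invariant position contributes a factor of 1.
    """
    return prod(variants_per_position)


# Hypothetical 28-position particle in which only 4 positions vary,
# each between 2 alternative subunits.
modest = [1] * 24 + [2] * 4
print(mosaic_count(modest))        # 16

# Same assumption, but all 28 positions have 2 variants.
print(mosaic_count([2] * 28))      # 268435456
```

Even modest exchangeability multiplies quickly; real rules about which subunits can co-assemble, or a symmetric arrangement of the rings, would reduce these numbers.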
In view of this extraordinarily complex structure and its stability, it was surprising to find that the two bivalent metal ions Zn2+ and Cu2+, at concentrations of 0.01 to 1 mM, inactivate the MCP instantaneously and dissociate prosomes into their subunits, without modifying or denaturing the individual proteins (90). This property, based on a still-unknown mechanism, is not only an important tool for investigation, but might also point to a possible mechanism of in vivo dissociation and recomposition of prosomes. In this context, it is important to point out that no free prosomal proteins, outside the 19-S complex, have been found in any of the cell types analyzed, with the notable exception of heat-shocked cells. Under the latter condition, preexisting prosomes dissociate and their components become transiently integrated into the complex of the small heat-shock proteins (10). This indicates, on the one hand, that the level of individual types of prosomes is controlled by the biosynthesis of the corresponding subunits, which are immediately integrated into the complex, and, on the other, that physiological recomposition of prosomes is possible under conditions of “stress,” or, more simply and more accurately, upon reprogramming of protein synthesis due to physiological or environmental changes.


B. The Prosomal Protein Genes

Up to now, about 50 cDNA sequences of prosomal proteins have been published; these allow some interesting conclusions to be drawn concerning the phylogenetic history of these novel genes and their diversity (see 43, 98, 99). They have also provided at least some preliminary information concerning the mechanisms of assembly of the complex and its function. Prosomes from a wide spectrum of cells, from the archeobacterium T. acidophilum at one end to the human species at the other, including such important evolutionary landmarks as yeast and Drosophila, have been isolated and their genes sequenced. Concerning the particles' function(s), it is of particular interest that none of the known consensus sequences for proteases has been detected; the prosome-MCP thus represents an entirely new type of protease. Since the particle is a complex of variable subunit composition, the protease activity is most likely created by the interaction of the constitutive proteins (40, 50, 90). Another functionally revealing feature is the presence of an RNA-binding sequence related to the so-called RNP consensus (100-102) in the PROS-27 gene; this protein-RNA interaction was confirmed experimentally (98). A further sequence feature is the presence in many of these proteins of the so-called nuclear localization signal (NLS), which occurs in prosomes of higher eukaryotes but, very intriguingly, also in T. acidophilum (43, 103). The latter fact shows that the NLS relates to a more primordial function than anticipated. The most important feature in the comparative phylogenetic analysis of all of the prosomal protein sequences is that they fall into two major classes, which can be related to the two subunits, α and β, that constitute the prosome particle in T. acidophilum in the form of an α14β14 complex (99). Some of these proteins show extraordinarily high sequence conservation. The α-subunit of T. acidophilum shows up to 40% similarity to some of the human prosomal proteins (99, 104). This conservation score is higher than that of most other types of proteins and points to a very fundamental function of the prosome particles, established early in the evolution of living matter. Extensive sequence comparison has not only allowed the further subdivision of the prosomal protein genes into 14 subfamilies (Table I), which fall into the two large α- and β-related groups; it has also revealed a particular sequence pattern that relates these two subgroups to each other within the entire prosome-MCP-proteasome gene family. The alignments published by our group (98, 99) as well as by others (43) show the existence of a tripartite consensus in the N-terminal part of the α-type prosomal proteins, as was noticed earlier (105). In general, in the α-type superfamily, the N-terminus is highly conserved, while the C-termini diverge. This is also the case to


TABLE I. THE PROSOME PROTEIN GENE FAMILIES(a)

Family name | Key gene | Type | Members (genes included)
Pro-α 1 | Zeta | α | ScPUP2, RrZeta, HsZeta
Pro-α 2 | C3 | α | ScY7, Dm25, XlC3, RnC3, HsC3
Pro-α 3 | PROS28.1 | α | Ata, Dd5, DmPROS28.1
Pro-α 4 | C9 | α | ScY13, Dd4, Dm29, RnC9, HsC9
Pro-α 5 | C2 | α | AtPSM30, Dm35, RnC2, HsC2, HsPROS30
Pro-α 6 | PROS27 | α | ScYC7a, RrIota, HsIota, HsPROS27
Pro-α 7 | C8 | α | ScYC1, RnC8, HsC8
Pro-β 1 | PRE2 | β | ScPRG1, SpPTS1, ScPRE2, MmC13, RnC1, HsRING10, HsLMP7E2, GdPC1
Pro-β 2 | MECL1 | β | ScPUP1, HsMECL1
Pro-β 3 | LMP2 | β | MmLMP2, RrRING12, HsRING12, RrDelta, HsDelta
Pro-β 4 | C7-I | β | ScPRE1, RrC7-I
Pro-β 5 | C5 | β | ScPRS3, DmL3-73Ai, RrC5, HsC5
Pro-β 6 | C10-II | β | ScPZE-6, RrC10-II
Pro-β 7 | N3 | β | ScPRE4, Xlβ, RnN3

(a) Prosome protein gene families are classified according to phylogenetic relationship and particular putative functional sequence criteria, following the dendritic tree shown in Fig. 4 (details in 99). Abbreviations: Ta, Thermoplasma acidophilum; At, Arabidopsis thaliana; Sc, Saccharomyces cerevisiae; Sp, Schizosaccharomyces pombe; Dd, Dictyostelium discoideum; Dm, Drosophila melanogaster; Xl, Xenopus laevis; Gd, Gallus domesticus; Mm, Mus musculus; Rn, Rattus norvegicus; Rr, Rattus rattus; and Hs, Homo sapiens. The following prosome sequences were taken into account: Taα (104); Taβ (8); Ata (201); AtPSM30: Shirley and Goodman, GenBank accession number M W S; ScYC1 and ScYC7a (202); ScY7 and ScY13 (203); ScPRE1 (109); ScPUP1 (204); ScPUP2 (205); ScPRE2 (206); ScPRE4 (207); ScPRS3 (208); ScPRG1 (209); SpPTS1: Stone et al., GenBank accession number D1309Q; DmPROS28.1 (210); DmPROS25 (105); DmPROS35 (211); XlC3 (212); Xlβ (213); GdPC1: S. Sub, GenBank accession number XS7210; MmLMP2 (21); MmC13 (214); RnC1 (215); RnC2 (216); RnC3 (217); RnC5 (218); RnC8 (219); RnC9 (220); Rns7, RnS(, RnS8, and RnRINGlS (221); HsC2, HsC3, HsC5, HsC8, and HsC9 (222); HsSI, HsST, and HsSZ (223); HsPROS27 (98); HsPROS30 (224); HsRING10 (200); HsRING12 (225); and HsLMP7E2 (143). For other sequences, see references in (99).

some extent for the β-type superfamily, in which the β-type prosome-specific consensus consists of four (or possibly more) motifs dispersed throughout the N-terminal part of the sequence. Two of these boxes are present in both superfamilies; they may thus represent the most universal fingerprint allowing recognition of prosomal protein genes (99, 104). Generally, sequence conservation is higher among the members of the α-family than among the β-type genes. At least in T. acidophilum, the protease function is related to the β-type subunit, located in the inner two rings of the particle. The tripartite consensus within the α-family seems to relate to the subunit assembly mechanism. Indeed, deletion of the α-1 box prevents assembly of the particle in E. coli transformed with the T. acidophilum genes and, more generally, the α-proteins, which assemble spontaneously into a torus, seem to be necessary to integrate the β-subunits and form the four-layered stack (106; W. Baumeister, personal communication, 1993). In view of the universality of this 12- to 17-nm cylindrical structure on a heptagonal base, with a central hole about 40 Å wide, in the GroEL-type chaperonins (8) as well as in the prosome-MCP, it is tempting to speculate that the outer surface or external rings, with their free polypeptide “tentacles,” may recognize and unfold target proteins, which might then be biochemically “treated” by the central core, which in turn provides a sequestered environment shielded from the outside (the “Anfinsen cage” model; 8). In view of the combinations theoretically possible in vertebrate prosomes among the subunits of both the α- and β-types, recognition at high resolution of an unlimited number of specific substrates seems possible, as well as extensive modulation of the protease activity (see Section III). Given the availability of 60 prosome-proteasome sequences, we undertook to construct a dendritic tree establishing their phylogenetic relationships (99). This enabled us to test for the existence of the superfamilies and of 14 subfamilies that had first been defined by similarity scores. The tree shown in Fig. 4, based on the computer program Pileup (107; see also 108), confirms, with minor exceptions, the evolutionary relationships among the different families. Furthermore, the phylogenetic relationships within the individual subfamilies can be confirmed by analyzing the sequences of specific types of peptide motifs. For instance, one of the subfamilies shows a box of 20 amino acids that is fully conserved. Other protein motifs common to some subfamilies include chemically modified amino termini, the nuclear target consensus (NLS), tyrosine phosphorylation signals, and, furthermore, putative protein kinase and casein kinase phosphorylation sites.
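The general idea of deriving a dendrogram from pairwise similarity scores can be sketched as follows. This is a minimal illustration, not a reimplementation of Pileup: it assumes already-aligned, equal-length toy sequences (invented here, not real prosome sequences), scores each pair by percent identity, converts the score to a distance (1 minus identity), and joins clusters by average linkage (UPGMA), which is, to our understanding, the clustering strategy Pileup uses for its dendrogram. The functions `percent_identity` and `upgma` are hypothetical helpers.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical residues over aligned columns without gaps ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def upgma(aligned: dict[str, str]) -> str:
    """Cluster aligned sequences by average linkage; return a nested-bracket tree."""
    sizes = {name: 1 for name in aligned}      # cluster label -> member count
    dist = {
        frozenset((p, q)): 1.0 - percent_identity(aligned[p], aligned[q])
        for p, q in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        closest = min(dist, key=dist.get)      # pair with the smallest distance
        p, q = sorted(closest)
        n_p, n_q = sizes.pop(p), sizes.pop(q)
        del dist[closest]
        merged = f"({p},{q})"
        # UPGMA update: size-weighted mean distance to each remaining cluster.
        for r in sizes:
            d_pr = dist.pop(frozenset((p, r)))
            d_qr = dist.pop(frozenset((q, r)))
            dist[frozenset((merged, r))] = (n_p * d_pr + n_q * d_qr) / (n_p + n_q)
        sizes[merged] = n_p + n_q
    return next(iter(sizes))


# Hypothetical pre-aligned toy peptides (not real prosome sequences).
toy = {
    "seqA": "MKTAYIAKQR",
    "seqB": "MKTAYIAKQL",
    "seqC": "MSTGYLAEQR",
    "seqD": "MSTGYLAEHK",
}
print(upgma(toy))   # ((seqA,seqB),(seqC,seqD))
```

The sketch returns only the branching order, not branch lengths, and percent identity is a cruder score than the similarity scores used in the original analysis.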
Finally, the putative cAMP/cGMP-dependent protein kinase phosphorylation sites present, for example, in the PROS-27 gene (98), as well as all putative glycosaminoglycan attachment and tyrosine sulfation sites, are observed in one subfamily, whereas three families harbor most of the putative glycosylation sites (99). The observation of these protein motifs, together with the dendritic tree and the similarity scores, makes the reality of these 14 subfamilies (Table I) highly probable; we propose to name them in the order of their separation from the two archeobacterial genes in the dendrogram (Fig. 4). Nevertheless, the determination of more prosomal protein sequences should obviously add more detail and more specific features; however, it is unlikely that the general patterns outlined will be basically changed. It is possible, although not certain, that these 14 families correspond to a number of different subunits in the prosome core. Indeed, a particularly interesting question concerns the number of prosomal protein genes that exist in any

[Figure 4: Dendrogram and Pro-Gene Families]

FIG. 4. Dendrogram of phylogenetic relationships showing the two super- and 14 subfamilies of prosomal protein genes. The sequences were analyzed for evolutionary relationship using the program Pileup (GCG package; 107) (details in 99). The names of the prosome gene families (see Table I) are indicated on the branches and, to the right, the gene names as given by the individual authors; for references, see the legend to Table I and (99).


given species. One may estimate that several dozen prosome protein genes exist, considering the number of different sequences in mammals that belong to individual subfamilies within a species. Similar estimates are possible when even the fainter spots in two-dimensional electrophoretic protein patterns are taken into account. Since the gene number might exceed the 24-28 positions available in the basic structure of the particle, structural variability of the particle within a species becomes likely; this is in agreement with a concept based originally on biochemical and immunocytological observations (70, 95). In conclusion, sequence analysis has shown that the prosomal protein genes code for a totally novel type of protein, unrelated to any of the known gene families and, in particular, to other proteases. The subdivision of the proteins into 14 subfamilies with specific sequence motifs may relate to structure, while some of the conserved amino-acid motifs may relate to the mechanism of assembly of the particle. Mutation of the α-type consensus boxes on the N-terminal side of the proteins prevents assembly of the complex, showing that these most universally conserved polypeptide segments preserve the basic structure of the particle (W. Baumeister, personal communication, 1993). No information concerning the function of the particle has as yet come from the sequence analysis. However, the availability of these protein sequences in yeast has allowed the initiation of a functional analysis based on molecular genetics. Deletion of individual prosomal genes is lethal, but viability is affected little by point mutations involving individual protease activities (109). Furthermore, such activities can be assigned, in the catalytic process, to the participation of specific subunits.
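Putative sites of the kind discussed above (cAMP/cGMP-dependent protein kinase, casein kinase, and glycosylation sites) are usually found by scanning sequences with short consensus patterns. The sketch below assumes three patterns in PROSITE notation, intended to match the PROSITE entries for N-glycosylation, cAMP- and cGMP-dependent protein kinase, and casein kinase II sites (they should be checked against the PROSITE documentation); the original analysis (99) may have used other tools or patterns, and the toy peptide and the `scan` helper are hypothetical.

```python
import re

# Assumed PROSITE-style patterns, translated to regular expressions.
# Each is wrapped in a lookahead so that overlapping matches are all reported.
PATTERNS = {
    "N-glycosylation (N-{P}-[ST]-{P})": r"(?=(N[^P][ST][^P]))",
    "cAMP/cGMP-dependent kinase ([RK](2)-x-[ST])": r"(?=([RK]{2}.[ST]))",
    "casein kinase II ([ST]-x(2)-[DE])": r"(?=([ST].{2}[DE]))",
}


def scan(sequence: str) -> dict[str, list[tuple[int, str]]]:
    """Return 1-based start positions and matched peptides for each pattern."""
    return {
        name: [(m.start() + 1, m.group(1)) for m in re.finditer(regex, sequence)]
        for name, regex in PATTERNS.items()
    }


# Hypothetical toy peptide, not a real prosomal protein sequence.
toy = "MKRRASLNGTAESEEDK"
for name, found in scan(toy).items():
    print(name, found)
```

As the text stresses, such hits are only putative: patterns this short occur by chance in most proteins, so they become informative mainly when they are conserved across a subfamily.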

C. The Prosomal RNA
Until recently, the RNA content of prosomes (pRNA) has been a highly controversial matter. Indeed, most enzymologists working on the MCP never observed RNA as a component of prosomes, while others, particularly our group, suggested from the beginning that RNA is an integral component of prosomes (33, 70); some others came to the same conclusion (110, 111). The controversy was settled by the demonstration, made first by Dineva et al. (112) and, more recently, by Nothwang et al. (22, 90), that at least some of the pRNAs are protected inside the complex against RNase attack and become digestible after Zn2+ dissociation of the particle. It came as a surprise when the major pRNA in prosomes from humans as well as from an avian species was identified as a tRNA, tRNALys3 (22). From the beginning, the most remarkable property of the pRNA was its capacity to hybridize stably to mRNA (70), particularly to viral mRNA (113), a property unique among nuclear and cytoplasmic small RNAs but obligatory for “retroviral” primer tRNAs. The tRNALys3 accounts for

PROSOMES (MULTICATALYTIC PROTEINASES; PROTEASOMES)

about 80% of the pRNA in humans and ducks, but there are several additional minor species; some of them seem to be tRNAs as well, since they are substrates of the CCA-terminal transferase (22). In addition, there are other pRNA species, in particular a 120-nucleotide-long RNA interacting with the product of the PROS-27 gene (98). tRNALys3 is well known as the primer for the reverse transcription of HIV, the AIDS virus (for reviews, see 114, 115). The presence, in mRNA-associated prosomes, of one of the “retroviral” tRNA primers is one of those disconcerting observations that cannot yet be explained, but it is certainly the beginning of an extremely interesting story. On the other hand, tRNA has also been implicated in the ubiquitin system: treatment with RNase inhibits ubiquitin-dependent proteolysis in reticulocyte lysates, and activity can be restored by addition of tRNA, particularly tRNAHis (116). Since tRNAs are considered to be among the most “archaic” RNAs, operating at the borderline between the RNA and protein “worlds,” our observation seems worth pursuing beyond the prosome-MCP story. Indeed, the identity of the major pRNA, tRNALys3, as a primer for reverse transcription of retroviruses revives, once more, the controversy as to the significance of reverse transcription in uninfected higher eukaryotes, as well as in (archae)bacteria. Reverse transcriptases exist in uninfected eukaryotic cells, some encoded by the open reading frames of LINEs (117), and also in bacteria. In the genome of higher eukaryotes, “cDNA” copies of mRNA have been found integrated as pseudogenes.
Therefore, the first demonstration of a specific reverse-transcription primer at the level of mRNA, outside the polyribosome system, in the prosomes associated in vivo with mRNP, is most intriguing, particularly in view of the fact that some SINEs, repetitive DNA elements, and retroposons (e.g., Alu sequences) include, in addition to their long terminal repeats and (A+T)-rich DNA, some lysyl-tRNA sequence fragments (118). It seems possible, thus, that an entire panel of cellular information exchange and controls still escapes our attention. The surprising presence of a reverse-transcription primer in the mRNA-associated prosomes may provide a key to an experimental approach to this enigma. In the course of the identification of some pRNAs, it has also become evident that only a minor fraction of prosomes (about 14%) contains RNase-resistant pRNAs (5). The initial (over)estimation of 10-15% RNA in the prosome particle was based on its density in cesium sulfate (33) and the straightforward application of the formula of Spirin (119) relating the density of an RNP to its RNA:protein ratio. The true pRNA content and complexity in vivo are not yet known. In fact, crude prosome preparations have a higher RNA content and contain many more RNA species than highly purified and RNase-treated particles. This is particularly evident when prosomes are
immunoprecipitated out of the cellular supernatant, which brings down much more RNA, in both mass and complexity (40, 111). If the initial (over)estimation of RNA content made it theoretically possible that pRNA has a structural function, our present understanding rules out such a possibility. Moreover, successful in vitro reconstitution of T. acidophilum prosomes from their subunits shows that the particle does not need RNA to assemble (106; W. Baumeister, personal communication, 1993). Therefore, the presence of RNA might rather correspond to a particular functional state of the particle. Nevertheless, it is not excluded that, in higher eukaryotes, RNA might trigger the build-up of the particle. Most interestingly, prosomes from T. acidophilum seem to be associated with a tRNA profile similar to that found in human and duck prosomes (22, 70); however, the nature of this pRNA is not yet known (W. Baumeister, personal communication, 1993). It remains entirely open whether a fraction of the nuclear prosomes has an RNA component, in view of the unsuccessful search for such RNA in the nuclei of Xenopus oocytes (35).
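The density-based estimate of RNA content discussed in this section (formula of Spirin, ref. 119) can be illustrated with a simple two-component mixing rule, in which the specific volume (1/ρ) of an RNP is the mass-weighted sum of the specific volumes of its RNA and its protein. This is an assumption made for illustration: the sketch does not reproduce the exact form of Spirin's formula, the function name `rna_mass_fraction` is hypothetical, and the reference densities are placeholders, not measured values for cesium sulfate. Its purpose is only to show how strongly the inferred RNA fraction depends on the measured density and on the reference densities chosen.

```python
def rna_mass_fraction(rho, rho_rna, rho_protein):
    """RNA mass fraction w of a two-component RNP, assuming additive
    specific volumes: 1/rho = w/rho_rna + (1 - w)/rho_protein."""
    if not min(rho_rna, rho_protein) <= rho <= max(rho_rna, rho_protein):
        raise ValueError("density outside the range spanned by the components")
    return (1 / rho - 1 / rho_protein) / (1 / rho_rna - 1 / rho_protein)


# Hypothetical placeholder densities (g/cm^3), for illustration only.
RHO_PROTEIN = 1.25
RHO_RNA = 1.65

for rho in (1.28, 1.30, 1.32):
    w = rna_mass_fraction(rho, RHO_RNA, RHO_PROTEIN)
    print(f"rho = {rho:.2f} g/cm^3 -> RNA fraction ~ {100 * w:.1f}%")
```

With these placeholder values, a shift of only 0.02 g/cm³ in the measured density changes the inferred RNA content by roughly 6 percentage points.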

D. Translational Repression in Vivo and Inhibition of Protein Synthesis in Vitro
The presence of prosomes in untranslated mRNP and their absence from polyribosomes (1, 74, 75) suggested that they might somehow be instrumental in keeping mRNA inactive. In this context, it is particularly interesting that, in sea urchin oocytes, prosomes are among the factors that maintain the “maternal” mRNA in an inactive state, apparently both in vivo and in vitro (121, 122). However, it was known from the beginning of the history of prosomes that the core mRNP, stripped of the prosomes and other associated factors and small RNAs, remains translationally inactive in vitro (31, 123); almost complete removal of these core proteins is necessary to restore full activity to the mRNA. Therefore, at least in somatic cells, these core factors are instrumental in long-term repression, while the prosomes may participate in these mechanisms only transiently. Prosomes are therefore not cytoplasmic repressors per se, but may have the capacity to induce repression. Prosomes and/or the isolated pRNA can inhibit protein synthesis in vitro (124, 125; O. Akhayat, O. Coux and K. Scherrer, unpublished). By hybridizing to mRNA, the pRNA may transiently mediate the interaction between prosome and mRNA. Indeed, computer analysis shows that many mRNA sequences can form more or less extensive complementary stretches with lysine tRNA. Once in place, the prosome-MCP might prevent formation of the initiation complex, which depends on the cap-binding protein included in the initiation factor eIF-4F. Interestingly, this complex is the target of the poliovirus-induced cellular protease (126, 127), which arrests host protein synthesis within 20 minutes of infection (128). The effect of prosomes on in vitro protein synthesis has, so far, not been studied extensively. Early investigations (129) and studies in our laboratory show inhibition of mRNA translation, but the mechanisms of this inhibition remain unknown (O. Coux and K. Scherrer, unpublished). The most interesting study in this context shows that prosomes inhibit in vitro translation of viral mRNAs (adenovirus, tobacco mosaic virus) under conditions in which translation of globin mRNA is not affected. This inhibition bears on initiation: in the presence of prosomes, the 40-S pre-initiation complex does not enter polyribosomes (124). Unfortunately, these studies have not been carried further by any group, primary interest having been concentrated on the protease function and the genes of the prosomes.
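The computer test mentioned in this section, whether an mRNA can base-pair with lysine tRNA, can be sketched as a search for the longest mRNA stretch that is the reverse complement of some segment of the tRNA, i.e., a stretch able to form a perfect antiparallel duplex. The Python sketch below is an illustration, not the program used by the authors: the function names are hypothetical, it counts only Watson-Crick pairs (G·U wobble pairs, which do occur in RNA duplexes, are ignored), and it uses short made-up sequences rather than real tRNA or mRNA sequences.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def longest_complementary_stretch(mrna, trna):
    """Return (length, mRNA start index, mRNA stretch) for the longest mRNA
    segment that could pair perfectly, antiparallel, with part of the tRNA.
    Implemented as a longest-common-substring search against the tRNA's
    reverse complement (dynamic programming, O(len(mrna) * len(trna)))."""
    target = reverse_complement(trna)
    best_len, best_end = 0, 0
    # prev[j] = length of the common suffix of mrna[:i-1] and target[:j]
    prev = [0] * (len(target) + 1)
    for i in range(1, len(mrna) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if mrna[i - 1] == target[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    start = best_end - best_len
    return best_len, start, mrna[start:best_end]


# Made-up sequences, for illustration only.
print(longest_complementary_stretch("AAAUCCGGGAAA", "GCCCGGAUAG"))
```

A realistic scan would also score wobble pairs and short mismatches and would compare the result against shuffled control sequences; those refinements are omitted here.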

III. The Multicatalytic Proteinase Activity of the Prosomes and the 26-S Proteasome
Several recent reviews have dealt extensively with the proteinase function of the prosome-MCP-proteasome as a ubiquitous key system of nonlysosomal protein breakdown. There is no point in repeating these very detailed analyses within the frame of this review, which deals mainly with the particle in the perspective of its association with mRNP and the cytoskeleton; these two latter aspects have not been addressed by authors reviewing the MCP-proteasomes. Nevertheless, for completeness, we develop here some aspects of the protease function that are of particular interest, not only to the enzymologist, but also from the point of view of a possible dual function of the prosomes in the homeostasis of individual proteins, operating at both levels: control of protein biosynthesis as well as degradation (see Fig. 7). The almost total dichotomy between research on the prosome system, on the one hand, and on its protease activity, on the other, has some experimental basis. In fact, when using their own methods to analyze MCP activity, enzymologists did not come across mRNA, and, conversely, when the mRNP-associated prosomes were analyzed, the higher-order enzymatic complexes in their active and inactive forms escaped attention. As in a detective story, it took two insiders to the prosome investigation to bridge the gap and spark the idea that prosomes and the MCP are the same entity (14, 15). This gap was recently closed completely by our demonstration that bona fide prosomes, isolated from mRNP, have all the properties assigned to the MCP (40). Although we had learned rather early about the
proteinase activity of prosomes, we decided to continue studying prosomes in the more complex mRNA- and cytoskeleton-oriented context, and not to intervene in the enzymatic studies. From the start, the prosome approach in our laboratory aimed to stay as close as possible to the in vivo situation, analyzing the particles first in situ, in cells, or isolated from function-related cellular fractions, for example, the mRNP. This was the necessary basis for future, more functional studies involving molecular genetics. In contrast, until very recently, most of the MCP studies used in vitro model systems and artificial fluorogenic oligopeptide substrates rather than in vivo assays or natural proteins, except for the analysis of the ubiquitin-dependent activity, where, for obvious reasons, ubiquitinylated polypeptides had to be used. Understandably, studies on enzymological mechanisms had priority; as a result, surprisingly little is known about the natural substrates of the MCP system, which is very selective. For instance, using detergent-purified prosomes, we found selective degradation of vimentin, but resistance of actin and most of the mRNP proteins, as well as of the prosomes themselves, which, in their native form, are never autodigested (H.-G. Nothwang, O. Coux, F. Bey and K. Scherrer, unpublished). It must be stressed, furthermore, that it is not known for sure whether, in the intact cell, the prosome core particle has an MCP function itself, or whether it acts only at the level of the higher-order complex, the 26-S proteasome. Of the three components of the 26-S proteasome, CF1, CF2, and CF3, the last, representing the prosome-MCP, is much more abundant in reticulocyte extracts than CF1 and CF2; it may constitute 0.5-1% of cellular proteins (42). It is therefore likely that the prosome-MCP particles not included in 26-S proteasomes operate mainly at the mRNP level.
Indeed, about 50% of the prosomes are cytoskeleton-bound and resistant to Triton X-100 extraction, as is mRNA. Recent discussions indicate that the 26-S proteasome may be, in vivo, the exclusive active form of the MCP activities (Conference on “Aspects of Ubiquitin-dependent Protein Degradation,” Copenhagen, January 1994). Prosomes might therefore be distributed between the mRNP and the 26-S proteasome. Surprisingly, in the MCP studies, the compositional variability of the prosome particles was almost entirely ignored until the First International Conference on “Prosomes-MCP-Proteasomes” (Titisee, Germany, 1990), when many first learned about the prosomes and, in particular, about the possible existence of many variants of prosome particles (70, 95, 130). This is quite surprising in view of the supposed substrate specificity of the MCP activity, which, being directed to individual proteins, must be highly selective. Indeed, notwithstanding the selective ubiquitin-labeling system, several laboratories postulated that the MCP activity is involved in the selective degradation of individual proteins in the cell, implicitly assuming a substrate specificity of 10^-3 to 10^-4 in higher eukaryotes; how could a seemingly unique protein complex achieve such a high degree of selectivity? The compositional diversity of the particle and of its higher-order complexes might provide an answer.

A. The 26-S Proteasome
Like the prosomes found associated with mRNA in vivo and in vitro, the MCP particle was reported to exist in a higher-Mr form: the 26-S proteasome. This complex was initially thought to be involved exclusively in the degradation of ubiquitinylated proteins (17); more recently, exceptions to this rule became known (131). Its integrity depends on the presence of ATP and, for that reason, it was systematically disintegrated when the prosome/mRNP methodology was used. Furthermore, in the MCP fractions isolated by enzymological techniques, the core particle analyzed was always associated with other protein factors, such as inhibitors and activators. Until recently (40), no enzymatic or structural definition of the detergent-washed genuine core particle had been reported. There are very few protein complexes and RNP or DNP structures in the cell that resist 1% Sarkosyl. Surprisingly, this property, widely applied in the prosome studies, was never exploited to define the baseline core MCP particle. Therefore, the compositional and biochemical definition of the multiple active and inactive MCP complexes in relation to the genuine core particle is still outstanding, while the 26-S proteasome is relatively well known. On the basis of the initial work of DeMartino (11), Hershko (12), Rechsteiner (132), and colleagues concerning ATP-dependent degradation of proteins in cell lysates, Rechsteiner (133) and Goldberg (134) and co-workers observed, in rabbit reticulocytes, a large ATP-dependent protease complex degrading ubiquitinylated proteins, which was absent in extracts of ATP-depleted cells. This complex, with a sedimentation value of 26 S corresponding to an Mr of about 1,500,000, was extensively purified and shown to contain protein subunits with Mr values between 20,000 and 110,000 (133), including, thus, the prosome-MCP protein subunits with Mr values of 19,000-36,000 (33).
It is composed of three components called CF1, CF2, and CF3; the last corresponds to the prosome-MCP (135, 136). With Mr values of about 600,000, 250,000, and 700,000, respectively, CF1, CF2, and CF3 assemble into the holo-complex of Mr about 1,500,000. More than 15 subunits constitute the CF1 and CF2 MCP cofactors. The very existence of this 26-S proteasome was contested by some groups (137). However, three independent laboratories have reconstituted in vitro the activity that degrades ubiquitin conjugates, by partially purifying the three factors and assembling them in an ATP-dependent manner.

Furthermore, the lag phase observed in vitro, which precedes degradation of ubiquitinylated proteins in lysates of ATP-depleted cells, was shown to correspond to the ATP-dependent re-assembly of the complex (138). Conversely, upon depletion of ATP in vitro, the degradation of ubiquitin conjugates stops immediately. It seems clear, therefore, that the 26-S complex not only exists but that it is a major, if not the exclusive, agent of the degradation of ubiquitinylated proteins in reticulocyte lysates. Its existence has been demonstrated in many tissues and cells, such as leukemic cells, muscle, liver, and brain; this suggests that it may be ubiquitous in higher eukaryotes (45). There is still controversy as to whether all of the proteinase activity of the 26-S proteasome resides in its CF3 (prosome-MCP) subunit, or whether factor CF1 contains an additional independent protease. Interestingly, factors CF1 and CF2 contain two independent ATP-binding sites with different nucleotide-binding specificities; one of them seems to be involved in the assembly of the 26-S complex, while the other is required for proteolysis of the ubiquitin conjugates. These ATPase activities seem to correspond to a new subfamily of eukaryotic ATPases that form an oligomeric ATPase complex attached to the prosome-MCP; they include the components S4, the Tat-binding proteins (TBPs) 1 and 7, the modulator of HIV-Tat-mediated trans-activation MSS1, and the yeast protein SUG1 (W. Dubiel, K. Ferrell and M. Rechsteiner, Abstracts, Conference on “Aspects of Ubiquitin-dependent Protein Degradation,” Copenhagen, January 1994).
Recently, it was directly demonstrated that the 26-S complex degrades a specific non-ubiquitinylated natural substrate, ornithine decarboxylase, one of the most rapidly turning-over proteins in eukaryotes (131). Until now, most studies of the 26-S proteasome were carried out using ubiquitinylated proteins; since it seems to degrade non-ubiquitinylated substrates as well, some of the basic biochemical and enzymological characteristics reported may have to be revised in the future. Among the identified natural substrates of the ubiquitin-26-S-proteasome system are the oncoproteins N-myc, c-myc, c-fos, p53, and E1A, the plant photoreceptor phytochrome, and the MATα2 repressor, a protein involved in mating-type switching in yeast; furthermore, the involvement of the prosome-MCP in the most interesting cell-division-related cyclin systems has been proposed (45, 45b). The biophysical structure of the 26-S complex was studied in several laboratories by electron microscopy. The core of the complex is the prosome-MCP particle discussed above, which, when associating with CF1 and CF2, first forms a champagne-cork-like structure and eventually a symmetrical dumbbell-shaped complex, by the sequential addition of material at both ends of the barrel-shaped core (17, 139, 140). The alternative model of
Rechsteiner et al. (48) suggests that the 26-S proteasome is constituted by a hemi-prosome associated with the higher-Mr protein factors, which may, eventually, dimerize. In view of the exceptional stability of the core particle and its immediate and full disruption upon addition of Cu2+ or Zn2+ (90) into components of less than Mr 100,000, the existence of a hemi-prosome, although not to be excluded, seems unlikely to us. The 26-S proteasome may thus correspond to the champagne cork, the prosome forming the stopper, to be eventually expanded into the dumbbell by the sequential addition of factors CF1 and CF2. The precise interactions of the components in the complex and their functional relevance are still largely unknown (17, 45).

B. The Prosome-MCP Core Enzyme
The operational definition of the prosome-MCP adopted for this report is, on the one hand, the core proteolytic activity of the detergent-washed core particle and, on the other, the complexes of this core with additional factors modulating its activity, excluding the 26-S proteasome already discussed. To discuss the genuine core enzyme, free from any modulating factors, we are limited to the data on prosomes treated with 1% Sarkosyl and purified on detergent-containing gradients (40). However, one may assume that, biochemically, this core particle corresponds to the MCP complex that is active in lysates after the addition of 0.01% SDS; this detergent drastically stimulates the MCP activity, which indicates the presence of inhibitory factors. The core particle contains the MCP activity: all three basic activities were found in the detergent-washed prosomes, with the same pH optima as those reported for the MCP, namely, a chymotrypsin-like activity, a trypsin-like activity, and a peptidylglutamyl-peptide hydrolyzing activity, cleaving on the carboxyl side of hydrophobic, basic, and acidic amino-acid residues, respectively. Some authors add to these three basic activities a fourth one, also found in prosomes, which degrades specific types of proteins such as casein (40, 141, 142). Altogether, the MCP activities seem to be based on five catalytic components (47). No stimulation of the MCP activity of the prosome core particle by 0.01-0.04% SDS was found, while the addition of polylysine resulted in only a 1.4-fold increase (40). This indicates that the positive or negative quantitative modulation of the MCP activity depends on additional factors interacting with the core particle. Furthermore, structural changes or biochemical modifications, including even subunit processing (141, 143, 144), might also influence the protease activities.
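The conventional assignment of the three basic peptidase activities (cleavage on the carboxyl side of hydrophobic residues for the chymotrypsin-like activity, of basic residues for the trypsin-like activity, and of acidic residues for the peptidylglutamyl-peptide hydrolyzing activity) can be made concrete with a toy classifier that lists, for a given peptide, the bonds each activity would be expected to cleave. The residue classes in `P1_CLASSES`, the function name and the example peptide are hypothetical simplifications chosen for illustration; real MCP specificity also depends on neighboring residues and on the modulating factors discussed in the text.

```python
# Simplified residue classes (one-letter codes): an assumption for illustration.
P1_CLASSES = {
    "chymotrypsin-like": set("FYWL"),                    # large hydrophobic
    "trypsin-like": set("KR"),                           # basic
    "peptidylglutamyl-peptide hydrolyzing": set("ED"),   # acidic
}


def candidate_cleavage_sites(peptide):
    """Map each activity to the 0-based indices i for which the bond between
    peptide[i] and peptide[i + 1] follows a residue of that activity's class."""
    sites = {}
    for activity, residues in P1_CLASSES.items():
        # The C-terminal residue has no following peptide bond, so skip it.
        sites[activity] = [i for i, aa in enumerate(peptide[:-1]) if aa in residues]
    return sites


for activity, positions in candidate_cleavage_sites("GLRVEYAKD").items():
    print(f"{activity}: cleaves after positions {positions}")
```

The terminal aspartate in the example is not listed, because no peptide bond follows it.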
A matter of considerable interest is whether the various distinct protease activities are carried out by individual subunits of the core particle. In yeast, mutational analysis showed that the chymotrypsin-like activity depends on a
subunit of Mr 23,000; interestingly, this point mutation showed that the corresponding enzymatic activity is not vital to the cell, whereas disruption of the same gene is lethal (109). Therefore, until recently, it remained theoretically possible that individual subunits carry the individual protease activities. Unfortunately, any attempt to dissociate the complex led inevitably to denaturation of the individual polypeptides or, at low urea concentrations, to autodigestion of the particle (145). It was therefore of considerable interest to find that very low concentrations (0.01-0.1 mM) of Cu2+ or Zn2+ caused almost instantaneous dissociation of the core particle, with concomitant loss of all protease activity, which could not be restored by removal of the metal ions (90). Since the two-dimensional subunit pattern was identical before and after disruption, and no mechanism is known by which such low levels of divalent cations denature proteins, it may be concluded that none of the prosome subunits has proteinase activity per se. The enzymatically active catalytic site must, hence, be constituted by the association of several subunits. It should be noted, at this point, that the variable subunit composition of individual particles, and thus of their MCP activity in higher eukaryotic cells, provides a theoretical basis for extensive modulation of enzymatic specificity. In theory, particular combinations, and possibly permutations, of subunits in the 24- to 28-subunit core particle could generate an almost unlimited range of substrate specificities (see Section VII,B).
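The argument that combinations of subunits could generate many particle variants can be made concrete by simple counting. Purely as a hypothetical illustration (none of the numbers come from the text): if a particle has n subunit positions whose occupancy can vary, and each can be filled independently by any of k alternative subunits, the number of distinct compositions is k to the power n. The sketch ignores assembly constraints and symmetry, and the subunit names and function names are made up.

```python
from itertools import product


def count_variants(n_positions, k_alternatives):
    """Distinct particles if each of n variable positions can be occupied
    independently by any of k alternative subunits (k ** n)."""
    return k_alternatives ** n_positions


def enumerate_variants(options_per_position):
    """List every composition, given the alternative (hypothetical) subunits
    allowed at each variable position."""
    return list(product(*options_per_position))


# Hypothetical example: three variable positions, two alternatives each.
options = [["a1", "a1'"], ["b1", "b1'"], ["b2", "b2'"]]
variants = enumerate_variants(options)
print(len(variants), count_variants(3, 2))  # explicit listing agrees with 2 ** 3
print(count_variants(14, 2))                # 14 variable positions, 2 choices each
```

The count grows exponentially with the number of variable positions; how many such combinations a cell actually produces is an experimental question.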

C. Structural and Enzymatic Modulation of the Prosome-MCP Core
Having discussed the prosome-MCP core in relation to its basic enzymatic activities, we now consider some particularly intriguing properties of these activities, partly modulated by cofactors acting on the MCP core rather than necessarily on the activity of the 26-S proteasome. We recall, however, that it seems unlikely that the MCP core operates as such in vivo; all of the studies reported here therefore concern in vitro effects. In T. acidophilum, the prosome-MCP, composed of two types of subunits (α14β14), has only chymotrypsin-like activity. The active site has not been localized within the particle; it is also not known whether these active sites are shared at the α/β interfaces or reside exclusively in the β-type subunits. Most interestingly, some protozoan prosomes, in flagellates of the Trypanosoma cruzi type, also have a single proteinase activity, but of a different type (C. Martins de Sa, personal communication). Modulation of the MCP in eukaryotes may involve quantitative effects bearing either on all three activities, or on only one of the protease activities,
implying a qualitative structural change at the level of the individual protein. Therefore, individual natural modulators such as activators and inhibitors, as well as non-natural inhibitors of various types, may act primarily on individual MCP activities that depend on specific subunits. As discussed above, nondenatured individual subunits have no protease activity (90); the prosome-MCP is hence not a complex of proteases but a genuine enzyme with individual catalytic sites, which may be based on individual subunits or on several subunits acting in concert. The chymotrypsin-like and the peptidylglutamyl-peptide hydrolyzing activities of the MCP, but not the trypsin-like activity, are latent, as is the proteolytic activity toward particular substrates such as casein (146-148). These latent activities can be revealed by treatment with very low concentrations of SDS or with polylysine. The latency of these activities is, of course, of the utmost physiological importance, since prosomes, in contrast to the lysosomal enzymes, are present everywhere in the cell. Human erythrocytes contain two endogenous inhibitors of Mr 240,000 (149) and 200,000 (150), composed of identical subunits of 40 and 50 kDa, respectively. The subunits of Mr 40,000 form a hexamer (149), and those of Mr 50,000 a tetramer (150); both inhibit the MCP in a noncompetitive manner. The observation that some of the MCP subunits seem to be autoinhibited and have to be cleaved in order to become active is of particular interest. This is the case for an Mr 32,000 protein processed to an Mr 20,000 active form (144), and for an Mr 24,000 subunit processed autocatalytically into an Mr 21,000 form by EDTA treatment and subsequent EDTA-free dialysis (141). Another inhibitor, of Mr 31,000, has been described (151). In addition to inhibitors, some activators of the core MCP activity have also been described.
One contains an Mr 160,000 peptide, while another seems to include an Mr 30,000 protein in addition to the prosomal core proteins (152). The existence of a complex composed of Mr 30,000 subunits, which binds the prosome reversibly and stimulates its activity, has also been reported (153). An endogenous activator was also found in human platelets (154). At present, it is unclear whether any of these activators and inhibitors of the MCP core are present in the 26-S proteasome (for a review, see 48). However, one of these inhibitors is related to the ubiquitin system and seems to be, as such, a component of the 26-S proteasome (155). Some isocoumarins, which are serine protease inhibitors, stimulate the caseinolytic activity of the MCP and inactivate the other three activities (142); moreover, acetylation of the MCP changes this activity further. The caseinolytic activity thus seems basically different from the others, which appear to be serine proteases. Isocoumarins are thought to induce conformational
changes; these data show, thus, that individual MCP activities can be modulated by conformational and biochemical modification of one or more subunits. The reversible, repeatable activation and inactivation of certain activities of the lobster MCP along a triangular scheme have been reported (156). Heating of the isolated basal form activates the (caseinolytic) proteinase activity, while a low concentration of SDS activates the peptidylglutamyl-peptide hydrolase activity and inhibits the chymotrypsin-like activity. The heat-activated form is transformed into the SDS-activated form upon addition of SDS; dialysis of the latter restores the basal form. Because the protein patterns of the three forms are almost identical, it has been suggested that conformational changes in the particle, possibly including associated factors, may induce activation or inactivation of specific MCP activities, as shown below.

Basal form → (+ heat) → Heat-activated form
Heat-activated form → (+ SDS) → SDS-activated form
SDS-activated form → (dialysis) → Basal form
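The triangular scheme reported for the lobster MCP (156) can be read as a small state machine. The Python sketch below encodes only the three transitions described in the text (heating of the basal form, addition of SDS to the heat-activated form, dialysis of the SDS-activated form); the state names, treatment labels and function name are our own hypothetical labels, and any treatment not described in the text is deliberately rejected rather than guessed.

```python
from enum import Enum


class Form(Enum):
    BASAL = "basal form"
    HEAT_ACTIVATED = "heat-activated form"
    SDS_ACTIVATED = "SDS-activated form"


# Only the transitions described in the text: (form, treatment) -> new form.
TRANSITIONS = {
    (Form.BASAL, "heat"): Form.HEAT_ACTIVATED,
    (Form.HEAT_ACTIVATED, "SDS"): Form.SDS_ACTIVATED,
    (Form.SDS_ACTIVATED, "dialysis"): Form.BASAL,
}


def apply_treatments(start, treatments):
    """Follow a sequence of treatments and return the list of visited forms."""
    path = [start]
    for treatment in treatments:
        key = (path[-1], treatment)
        if key not in TRANSITIONS:
            raise ValueError(
                f"no transition described for {treatment!r} on the {path[-1].value}"
            )
        path.append(TRANSITIONS[key])
    return path


# One full turn of the cycle returns the particle to its basal form.
cycle = apply_treatments(Form.BASAL, ["heat", "SDS", "dialysis"])
print(" -> ".join(form.value for form in cycle))
```

Because the cycle closes on the basal form, the treatments can in principle be repeated, which is what makes the reported activation and inactivation reversible and repeatable.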

A final question relates to ATP. Under certain conditions, the prosome-MCP can be isolated in an ATP-activated form, which seems to be formed within the ubiquitin pathway (157, 158). The biochemical mechanism of this dependence has not been defined; partially purified preparations of the MCP fraction of the 26-S proteasome have no significant ATP-hydrolyzing activity. ATP dependence may therefore be restricted to the assembly of the 26-S proteasome and its ATPase complex, and to its ubiquitin-related mechanisms, resulting in an activated MCP particle.

D. The LMP:MCP Activity and Antigen Presentation within the Major Histocompatibility Complex (MHC)
Intracellular antigens are presented to T-lymphocytes in the form of small peptides carried to the cell surface by the MHC-I complex (159, 160), while extracellular antigens are processed by the MHC-II complex (160, 161). The numerous MHC-I and -II genes are encoded within the MHC locus on chromosome 17 in mice, while the human MHC (HLA) genes are on chromosome 6. Interestingly, two transporter genes, which are instrumental in transferring small cell-internal peptides to the MHC-I system, are encoded within the MHC-II cluster. Protein processing provides, at the Golgi level, the MHC-I molecules with the short peptides to be
antigens, which are selectively bound and eventually presented to the cytotoxic T-lymphocytes; their delivery to the endoplasmic reticulum is mediated by the TAP1 and TAP2 (Ring 4/11) transporter gene products just mentioned. Deletion of these genes impairs but does not abolish antigen presentation. The protease system producing these small peptides is not known at present and constitutes a missing link in our knowledge of antigen presentation. Therefore, the discovery that two prosome-MCP genes (LMP2/7 or Ring 10/12) are encoded, interspersed with the two transporter genes, in the MHC-II locus led immediately to the apparently obvious proposition that the prosome-MCP system might represent this missing link (16). For many years, the groups of Monaco and McDevitt had defined and studied the LMP complex, which was discovered through its relationship to the MHC system and was, at an early stage, presented as a fourth class of MHC-related molecules (19). LMP is a multifunctional complex of Mr about 600,000, with a peptide pattern strongly resembling, in two-dimensional analysis, that of prosome subunits, co-precipitated with the MHC complex by allogeneic sera. Repeating these experiments shortly after their publication, we found, using anti-prosome monoclonal antibodies, that the LMP complex did indeed contain prosomal subunits (F. Grossi de Sa, M. Seman and K. Scherrer, unpublished), but we did not then follow up these observations. The MCP system represents the main nonlysosomal system of intracellular protein degradation. Its involvement in immune peptide generation could thus be suspected; the finding that two prosome-MCP subunits are encoded in the MHC-II complex seems to confirm such a hypothesis. The evidence for its involvement in antigen presentation is, however, still circumstantial.
It is mainly built on correlations relating the prosome-MCP to the LMP-MHC system, for instance, its induction by interferon-γ (162). On the other hand, recent data speak against the supposition that antigen presentation relies mainly on the prosome-MCP system. These data show that the expression of stably assembled MHC-I molecules and normal peptide processing can be completely restored in the absence of the LMP2 and LMP7 genes in human lymphoblastoid cell mutants in vitro (51, 52). These gene products seem essentially to modify the proteinase activities of the proteasome (52b, 52c). Therefore, the involvement of the prosome-MCP system in antigen processing must be more subtle than anticipated. The fact that prosomes, as particles, are present at the cell surface of lymphocytes in a differential manner, related to the CD phenotype and to the type of prosomal antigen probed for, may give a new lead to the role of the prosome-MCP in the immune response. A particularly interesting observation is that CD19 (B4) lymphocytes present several times more prosomal antigens at their surface than CD4 and CD8 lymphocytes (96, 97). Indeed, CD19 lymphocytes are involved in the processing of extracellular antigens and the humoral immune response, all related to the MHC-II system. Finally, the changes in prosome subunit composition in response to interferon-γ (162) represent the most clear-cut recent data showing that prosomes of different subunit composition can exist within a given cell, confirming our early and recent results (70, 95, 163). This may also be understood as a response to reprogramming of the cell, with implications at the level of protein biosynthesis, a fact well documented in our studies on differentiation and embryonic development (see Section V). The interferon effect might, thus, not necessarily be directly related to the MHC system.

IV. Prosomes, the Cytoskeleton, and the Hypothesis of mRNA Cytodistribution

Since the early 1980s, more and more evidence has indicated that genes and their products, pre-mRNA and mRNA, may be distributed in the cell in a topologically organized manner (for a review, see 80, 164; see also the discussion in 84). The notion arose that proteins might be synthesized and assembled co-translationally where they are needed, either directly at the site of their function or at certain central points where they engage in the secretory mechanisms of the cell. This concept started to replace the still widely held assumption that proteins might be synthesized anywhere in the cell and then sorted out by post-translational mechanisms, operating possibly at the level of the endoplasmic reticulum, and put in place by selective assembly or “crystallization.” A review of the early experimental evidence and of the theoretical considerations behind the concept of specific localization of genes and transcript processing, as well as of co-translational assembly, is given within the frame of the “unified matrix hypothesis” (84), first published in 1983 (165, 166); similar ideas were developed by Günter Blobel in his “gating hypothesis” (87) and by Uli Laemmli (cf. 167) as well. We do not expand on these ideas here, but it is worth pointing out that they were at the origin of our considerations and attempts at cytolocation of untranslated mRNA in general. Indeed, the prosomes, which are largely absent from polyribosomes, can be considered to serve as cytological markers for mRNAs in their untranslated form.

PROSOMES (MULTICATALYTIC PROTEINASES; PROTEASOMES)

A. Cytodistribution of Prosomes in Interphase Cells and during the Cell Cycle

The most interesting observations of prosome cytodistribution in relation to the cytoskeleton, on the one hand, and to differentiation and development, on the other, were based on the development, first in our laboratory (168) and then in that of K. B. Hendil (169), of prosome subunit-specific monoclonal antibodies. (Some prosomal monoclonal antibodies are now distributed by Organon Teknika, Veedijk 58, 2300 Turnhout, Belgium.) In view of the presence of prosomes in ribosome-free mRNP, this also provided, for the first time, a cytological marker for untranslated mRNP (see discussion in Sections I,C and IV,B). In most reports on the cytodistribution of prosomes relating to their MCP activity, including some recent reviews (46, 47), the fact was neglected that many different prosomes of variable subunit composition exist. Indeed, each of these particles may be distributed individually, in a very specific manner, among the various cell compartments. Cytodistribution probed by monoclonal antibodies therefore teaches a different lesson from results obtained with polyclonal antibodies, which reflect the system as an entity. Except in specific phases of the cell cycle and stages of gameto- and embryogenesis, prosomes never move en bloc. They must hence be dealt with individually; analysis thus has to bear on the cytodistribution of the specific subunit antigens constituting individual particles. Like mRNA, most of the prosomes are in the cytoplasm. However, some prosomes of specific subunit composition may reside primarily in the nucleus in specific physiological situations, again in analogy with some specific transcripts. Nevertheless, the bulk distribution of prosomes is of interest in itself, since it may change according to the physiological status of the cell in development (95), cell differentiation (170), and pathology (171).
Most interestingly, most types of anti-prosome monoclonal antibodies (p-mAbs) produce a specific distribution pattern in indirect immunofluorescence (IIF), either diffuse or in patches: some align with the chromosomes or the nucleoli, some resemble the distribution pattern of satellite DNA and transcriptional centers (172, 173), others are particularly rich at the nuclear membrane, and some are concentrated in specific sectors of the cytoplasm, according to the cell type and physiological situation (95, 168). Furthermore, specific types of prosomes align with functional zones of cells, for example, along the bile canaliculi in the liver (174) or, in muscle, with the sarcomeric structure of myofibrils (175). Cell fractionation indicates that, in the steady state, fewer than 10% of the prosomes are in the nucleus; in the cytoplasm they are bound either to the untranslated mRNP or to the Triton-soluble or -insoluble fractions of the cytoskeleton. In healthy, steady-state cells, a small fraction of less than 5% may be in polyribosomes or the 40-S pre-initiation complex of protein synthesis, according to our data on several types of cells. Others have found, however, a much larger fraction with the polyribosomes (110). In cells synthesizing a large proportion of high-Mr proteins, separation of mRNP and
polyribosomes is very difficult, if not impossible, due to the “giant” mRNP complexes, for example, in muscle cells (S. Missorini and K. Scherrer, unpublished), in contrast to erythroblasts, in which the small globin mRNPs are most abundant (66). Most interestingly, some monoclonal antibodies recognize higher-Mr forms of prosomal proteins exclusively in the polyribosome/pre-initiation complex fractions, as well as in the nucleus (74); such particles may therefore be functionally different from the bulk prosomes in the free mRNP. Since, owing to their extremely compact structure, free prosomes co-sediment with those bound to small mRNPs in the 15- to 20-S zone of gradients, it is not easy to estimate quantitatively the mRNA-associated fraction; again, prosomes of individual subunit composition have to be recorded individually to quantify their partial association with mRNP. Sedimentation analysis of bulk cytoplasmic particles and cytoskeletal breakdown elements indicates that prosomes co-sediment with mRNA and mRNP proteins in a wide zone from 10 to 80 S (70). Upon dissociation by 0.5 M KCl or 0.01 M EDTA, they band exclusively in the single 19-S position; this indicates their previous attachment to mRNA in the 10- to 80-S zone. Upon dissolution of the bulk mRNP by Sarkosyl, they also show a 19-S sedimentation coefficient. On the basis of the qualitative shift of prosomes to the 19-S position upon detergent treatment, the global ratio, prior to dissociation, between genuine free 19-S prosomes and mRNA-bound particles in the 19-S zone may be estimated to be close to 1:1. This is in line with the figure of about 30-70% (according to the antigen probed) of prosomes that are Triton-extractable from the cells. Much attention has been drawn to the NLS in prosomal protein sequences, indicating their ability to move into the nucleus (43, 103). Since prosomes have half-lives of several days (176), they can be expected to circulate within the cell.
However, since prosomal proteins already carry the NLS signal in T. acidophilum, this must relate to a function more fundamental than nuclear transfer. Since no systematic studies of prosome biosynthesis have been carried out as yet, it cannot be excluded a priori that prosomal proteins move into the nucleus individually, to be assembled there. The absence of free prosomal proteins in cell lysates [except in heat shock (10)] speaks against the existence of prosomal protein pools and hence against such a possibility; unfortunately, this question has not yet been systematically studied for the cell nucleus. Most interestingly, in oocyte “interphase” chromosomes of the lampbrush type, the highest concentration of prosomal antigens coincides with the “chromomeres” on the chromosomal axis (from which the lampbrush loops emerge), as well as with the nascent transcripts (95). At least 95% of the prosomal proteins thus seem to be in particles, and most of the latter are bound to cellular structures. No prosomes or prosomal antigens were found in the post-mRNP cell sap after hypotonic shock
(F. Grossi de Sa and K. Scherrer, unpublished). However, Triton X-100 extraction releases between 30% and 70% of a given prosomal antigen from the cell. Under such conditions, mRNA remains attached to the cytoskeleton (see discussion in Section I,C). Therefore, 30-70% of the prosome-MCP may be kept in place by lipoproteins or other Triton-soluble structures at the level of the Golgi apparatus, the endoplasmic reticulum, and the plasma membrane. As we will see later, a (small) fraction of the prosome population is inserted into the outer plasma membrane. The behavior of prosomes during cell division, particularly in metaphase, is of special interest; our early finding that, in cell division, the bulk of prosomes end up on the mitotic spindle and on aster-like structures centered on the centrosomes is most spectacular (95). This was recently confirmed by systematic studies on the cell cycle in ascidians (177) and in ovarian cells in culture (178). Beyond the possible functional significance of this finding in cell division, it also demonstrates that a privileged relationship must exist between prosomes and the tubulin network; such an association is barely perceptible in interphase cells (76). Upon initiation of cell division, prosomes assemble close to the nuclear membrane. In prophase, prosomes start to accumulate in the perichromosomal area (177, 178) and on the centrosomes (S. Missorini and K. Scherrer, unpublished), increasing there strongly in metaphase and early anaphase. Eventually, they form a kind of shell around the metaphase chromosomes, the latter standing out as dark shades. Preliminary observations on metaphase plates of human lymphocytes indicate that prosomes then reside on a filamentous network spanning the individual chromosomes, in a manner reminiscent of the phenomenon of ectopic pairing of polytene chromosomes in Drosophila (J. Lejeune and K. Scherrer, unpublished).
In late anaphase, prosomes are mainly on the spindle fibers and the asters around the centrosomes (95, 177, 178), while in telophase and early interphase they are, for a certain time, in the nuclei, prior to being released to the cytoplasm of the daughter cells. This sequence of events, particularly the persistence of prosomes in the nuclei, agrees with the general finding that in rapidly dividing cancer or leukemia cells, as well as in phytohemagglutinin (PHA)-stimulated lymphocytes, prosome concentration in the nuclei increases (171). The movement of all prosomes to the nucleus in early development, observed precisely at the blastula stage in Pleurodeles and chickens (95, 179), as well as to the nuclear periphery in the nematode C. elegans (R. Schnabel and K. Scherrer, unpublished), is most intriguing. Since in all systems studied, prosome protein synthesis resumes in late blastula and reaches the steady state only during gastrulation (95, 179, 180), the prosomes concerned in early development are maternal. Therefore, the coincidence of their movement to the nucleus, at precisely the stage when in most
embryonic systems de novo transcription of the zygotic genes resumes fully, is particularly interesting. In conclusion, while in somatic interphase cells most prosomes are in the cytoplasm, particles of individual subunit composition may reside primarily in the nucleus, even in interphase cells. Immediately after cell division, most prosomes are in the nuclei of all cells of an organism, before spreading to the cytoplasm. The temporary presence of all prosomes in the nuclei of all cells of an organism seems to be a particular feature of the blastula stage of early embryonic development. In the cytoplasm of somatic cells, prosomes are divided between the liposoluble structures (membranes, etc.) and the cytoskeleton, the latter containing the mRNA-bound prosomes.

B. Prosomes and the Cytoskeleton

Three types of cytoskeletal networks span cells in interphase and metaphase and also during oogenesis, which constitutes a particular type of interphase. [In the oocytes, macromolecular RNP patterns are laid down that are thought to prefigure expression patterns and functional segregation of cells in early stages of embryological morphogenesis (see Section V); it is known that specific types of mRNA segregate in sea urchin embryos as early as the eight-cell stage (181).] The best known of the three cytoskeletal networks is possibly that of the tubulins, forming the microtubule networks. These are involved in gross movements of cellular structures, such as the chromosomes in mitosis or the mitochondria in most interphase cells. The tubulin fibers also play a central role in both static and dynamic cell architecture, since their selective destruction by drugs leads indirectly to the collapse of other types of cytoskeletal networks. But the most fundamental of the cytoskeletal networks is probably that of the actin fibers, which spread throughout the cells in interphase and metaphase and are present in the nucleus and the cytoplasm, constituting, possibly, the basic cellular matrix. Their appearance may vary from thick “stress” fibers, observed rarely in tissues but characteristic of cells cultured in vitro and running in particular under the cellular membrane, to 5-nm-wide microfilaments, highly developed in the cytoplasm of any type of cell. It is most important in the context of this review that more and more clear-cut information has been published over the last 5 years showing that the actin-type filaments carry the ribosomes and the translated mRNA (73) and are thus involved in the translation machinery (71, 72). A characteristic common to both the actin and tubulin networks is that they exist in all types of normal or transformed cells. Their function seems to be vital to the existence of dividing cells.
This is not the case for the third type of cytoskeletal network, the IFs, which seem to be dispensable, at least in some transformed cells. In contrast with actin and tubulins, which are present in all types of cells, characteristic IF-type networks exist in specific
cells and tissues (for a review, see 77). Epithelial cells are characterized by about 20 different types of cytokeratins, which co-polymerize in pairs of type I and type II molecules (reviewed in 182), while fibroblasts are spanned by the vimentin network. In many cell types, IFs of cytokeratin and vimentin types co-exist. Myogenic cells have desmin; nerve cells have IFs of the neurofilament (NF) and peripherin types, while GFAP is characteristic of glial cells (astrocytes). Many types of associated proteins seem to interact with the different kinds of IFs (for a review, see 182). As already discussed (see Section I,C), mRNA remains attached to the cytoskeleton upon Triton X-100 extraction, while “run-off” ribosomes are easily extracted. Although these studies were meant primarily to bear on translated mRNA, purely quantitative considerations made it likely from the outset that the untranslated mRNA was attached to some part of the cytoskeleton as well. In HeLa cells, these two functional types of mRNA are present in about a 1:1 proportion (1). Since ribonuclease treatment released ribosomes with fragments of mRNA, while the poly(A)-binding protein (PABP) remained attached to the cellular structure, it was likely that PABP was somehow involved in the attachment of at least the translated mRNA to the cytoskeleton (79). Since this Mr 73,000 protein was found to be absent from untranslated mRNP (66), it became evident that other types of ubiquitous factors in the mRNP must be involved in the binding of untranslated mRNA to the cellular structure. Experiments carried out over the last 10 years on various types of cells indicate that a major fraction of prosomes, identified as subcomplexes of untranslated mRNP, also remains attached to the cytoskeleton upon Triton extraction (74, 75). Since prosomes are of variable subunit composition, the extractable fraction of prosomes varies, according to the antigen being probed, from about 30% to 70% (see Section IV,A).
Since, as just stressed, mRNA remains attached to cell structures under these conditions as well, it is likely that the prosomes that resist Triton X-100 extraction are those associated with the untranslated mRNA. They may, therefore, serve as cytological markers for the latter, to the same extent that the Mr 73,000 PABP seems to be a useful marker for translated mRNP. Using various monoclonal antibodies as molecular probes in double-label IIF studies, carried out on Triton-extracted cells by optical microscopy, as well as by immunogold cytochemistry using the electron microscope, it became evident that the greater part of the Triton-resistant prosome antigens were associated with the cytoskeletal networks of IF type (24, 74, 183). The first studies, on Triton X-100-extracted HeLa and PtK1 cells, showed extensive co-localization of prosomes and the IF of cytokeratin type (see Fig. 5A and A′), while little correspondence was observed between prosomes and vimentin, also present in these cells. Little correspondence
was found between prosomes and the tubulin or actin networks in HeLa cells; their presence on the actin filaments was later found to be significant in some types of cells. A small fraction resides on the actin-based microfilaments, which also carry a variable fraction of the Triton-soluble prosomes (C. Arcangeletti and K. Scherrer, unpublished). The working hypothesis was therefore proposed that the IFs carry the prosomes and thus, possibly, the untranslated mRNPs of various types. The prosome-IF system might thus be involved in the selective cytodistribution of mRNA prior to translation (24, 74, 183), not only in somatic cells but also in oocytes, where “maternal” mRNA was found to be associated with the cytokeratin network (88). It should be pointed out, as already mentioned above, that in all cells tested, prosomal antigens were never observed as free proteins outside the 19-S complex, except under heat-shock conditions (10). It was therefore possible to conclude that the immunofluorescence patterns observed, and the labeling by IIF or by gold particles in the electron microscopic studies, concerned prosome particles, not free prosomal proteins.

FIG. 5. Coincidence of prosomal and intermediate filament (IF) networks, the existence of prosome subnetworks, and selective distribution of specific types of prosomes. (A, A′) Superimposition of prosome and IF networks. PtK1 cells were labeled by double-label indirect immunofluorescence (IIF) methods using (A) an anti-cytokeratin polyclonal antibody and (A′) the anti-p27K prosome subunit-specific monoclonal antibody (p-mAb clone IB5); experimental details are given in 74. (B, C) Existence of prosome subnetworks. PtK1 cells were labeled by single-label IIF methods using (B) the anti-pSK (clone 7A11) and (C) the anti-p33K (clone 62A33) p-mAb (experimental details are given in 75). (D, E) Selective distribution of specific prosome types in the cytoplasm of hepatocytes. Adult rat liver sections were labeled by IIF methods using (D) an anti-prosome polyclonal antibody (courtesy of K. Tanaka) on adult liver (experiment by D. Péchinot, D. Briane, and J. Foucrier) and (E) the anti-p31kDa (clone AA4) p-mAb (experimental details are given in 174). The prosomal mAbs used are distributed by Organon Teknika, Veedijk, Turnhout, Belgium.

During the last few years, the notion of IF involvement in prosome cytodistribution has been extended to the vimentin and desmin networks (75, 76, 183). These studies, once more, involved various prosomal antigens, tested on PtK cells, human fibroblasts, and LLC-MK2 cells, as well as the C2.7, T10/2, and SOL-8 myoblasts. In all cases, the co-localization of prosomes and the IF of cell-specific type was extensive. It was reduced on vimentin fibers in cells containing primarily cytokeratins or desmin as the differentiation-specific network, but extensive in fibroblasts having only vimentin-type IFs (76, 183). Depending on the type of cells and their physiological condition, co-localization of prosomes and cytokeratin was typically 80-90%, while superimposition on vimentin filaments was considerably more restricted, bearing, possibly, on only 30-40% of the vimentin fibers (76). In human fibroblasts, in which the only IFs known at present are of the vimentin type (183), again 80-90% of the fibers are occupied by prosomes of specific types. The IFs in the nervous system have not yet been studied, but nerve cells of the different types contain prosomes of specific composition as well (184). The concept of IF involvement in the cytodistribution of prosomes may thus be generalized as a working hypothesis at the present time. Dynamic studies have been carried out with drugs such as acrylamide monomer, a neurotoxin that induces the selective collapse of the IF onto the nuclei, leaving largely intact the tubulin and, in particular, the actin networks. These experiments indicate direct association of prosomes and the different IFs, rather than the existence of independent prosome and IF networks running in parallel (75). Indeed, during collapse, the two antigens apparently never dissociate. Interestingly, upon removal of the drug and reconstitution of the IF networks, a slight delay in the insertion of the prosomes into the IF network was observed (M. Olink-Coux and K. Scherrer, unpublished). This observation might be interpreted as indicating the existence of a single dynamic network, constituted by the IFs and dynamically populated by prosomes, which might be inserted with a slight delay into newly constituted IFs. Recent interesting data of Goldman and colleagues (185), on the one hand, and Lazarides (186), on the other, indicate the existence of a vectorial movement of the IFs (77). The data might be interpreted as showing that vimentin-type IFs move toward the plasma membrane from either the nuclear periphery (187) or certain cytoplasmic centers of organization (186). Prosomes and mRNP might therefore be carried along by the moving IFs in a “conveyer belt” mechanism, or in a fashion similar to that of various organelles on the microtubules and, in particular, the condensed chromosomes on the tubulin fibers of the spindle, or the mitochondria on microtubules. It is also possible that an energy-requiring mechanism moves the prosomes along the IFs. The relevance of either type of mechanism to the putative function of the IFs in prosome, and possibly mRNA, cytodistribution is evident. Nevertheless, much more direct evidence must be obtained before any conclusions can be drawn as to the existence of such a system. Another type of dynamic study involves the infection of cells by specific types of viruses, recently shown to lead, eventually, to profound modifications of the cytoskeleton and, in particular, of the IF systems (188, 189).
Most interestingly, just 4 hours after infection, the collapse of the vimentin-related prosome network is obvious in influenza-virus-infected LLC-MK2 cells and is complete by 8 hours, while the cytokeratin system bearing prosomes lasts for 24 hours, until the cells lyse (C. Arcangeletti and C. Chezzi, unpublished). It is evident, therefore, that prosomes must exist in association with IFs of either type; possibly, the different prosome-IF networks each have specific functions. In this context, it is particularly interesting that the housekeeping type of protein synthesis is arrested in these cells early in infection, while the viral proteins are obviously synthesized until cell lysis in a specific, time-related program. Evidently, in order to test the hypothesis of IF involvement in mRNA cytodistribution, the next level of analysis must answer the question of the co-localization of mRNA of a specific type with prosomes of given subunit composition, studied directly in the appropriate cells using in situ hybridization of cDNA probes, immunohistochemistry with antibodies to prosomes and IF constituents, and electron microscopy. A further question relates to the fate of prosomes once they are released from the mRNA. The total exchange of the trans-acting factors associated with untranslated mRNA for those necessary in translation is likely, as discussed above (Section I,C), to operate at the level of the microfilaments. Indeed, when cells are fixed instantaneously at 37°C by formaldehyde in the presence of high Triton X-100 concentrations, a substantial fraction of prosomes, varying in amount with the antigen tested, is found at the level of the microfilaments (C. Arcangeletti and K. Scherrer, unpublished). Systematic studies showed that some of these actin-associated prosomes are released from the microfilaments upon Triton extraction, particularly at temperatures below 37°C. It is hence possible that a thus-far-unknown association of IFs, including the prosomes(-mRNP), with the microfilaments exists that is dissolved at low temperature and/or by Triton X-100 extraction, as are the microtubules. Furthermore, and most interestingly, ladder-like actin-prosome cables of characteristic structure under the electron microscope seem to form in vitro and are also visible in vivo by immunocytochemistry (U. Aebi and K. Scherrer, unpublished), possibly prefiguring the stress fibers. Therefore, upon release from the mRNA to be translated on the microfilaments, prosomes might transiently associate with the actin fibers prior to further migration in the cytoplasm or back to the nucleus. Finally, recalling the association in mitosis of most cellular prosomes with tubulin on the mitotic spindle, the centrosomes, and the asters (discussed in Section IV,A; 177-179), and in view of some preliminary observations in our laboratory indicating a small but significant co-localization of prosomes with tubulin in LLC-MK2 cells (C. Arcangeletti and K.
Scherrer, unpublished), the tubulin network might also somehow be involved in the “prosome cycle.” Further work will be necessary to comprehend the structural and functional relationship of prosomes with the various filamentous networks of the cell, but it seems clear already that prosomes are filament-associated factors present at most levels of the nuclear matrix and the cytoskeleton.

C. Subnetworks of Prosomes and the Intermediate Filaments

The studies just discussed, as well as those reported in Section IV,A on the specific cytodistribution of prosomes, led to the idea that subnetworks of prosomes of specific types might exist, that is, that specific types of prosomes might be associated with subnetworks of the various IF systems. This idea has recently been given substance by three types of particularly interesting observations.

(1) The first relates to the fact that different types of prosomal antibodies produce different staining patterns in a given type of cell (75, 168). It is interesting that, as seen in Fig. 5B, one prosomal antibody stains IFs running generally parallel to the plasma membrane, although crossing over in certain places between individual cells. Analysis of the same type of cells (Fig. 5C) with another p-mAb shows quite a different picture: there, fibers run from organizational centers in the cytoplasm straight to the plasma membrane, which is heavily populated in many places by what might be called “prosome junctions,” outlining the full cell contour. Interestingly, the same antibody also stains structures that might be interpreted as the Golgi apparatus, close to the center of the cells. In these cells also, the prosomes are on the cytokeratin network, as discussed above (see Fig. 5A and A′). One might therefore consider the hypothesis that prosomes of specific types do indeed occupy subnetworks of the cytokeratin-type IF. This might be related to the known heterogeneity of cytokeratins of individual types within the IF network, and to the existence of various IF-associated proteins (190). (2) The second observation of particular interest in this context is the alignment of specific types of prosomes along the bile canaliculi in adult and embryonic rat hepatocytes (174). While a polyclonal antibody directed against all types of prosomes stained the whole cytokeratin network throughout the cell (Fig. 5D) (J. Foucrier and K. Scherrer, unpublished), only prosomes containing the 31-kDa subunit line up along the bile canaliculi (174). This selective cytolocation dissolves into the general staining pattern upon disturbance of liver function (D. Péchinot and J. Foucrier, unpublished).
It therefore seems likely not only that rat hepatocytes contain specific types of prosomes, but also that individual hepatocytes contain various kinds of prosomes, some of which are positioned in functionally significant zones of the cytoplasm where specific proteins are made and/or processed. These observations led, once more, to the conclusion that subnetworks of the cytokeratin-type IF must exist, selectively bearing such types of prosomes. Similar observations were made recently on rat muscle cells, showing alignment of some types of prosomes with the M and Z lines of the sarcomeric structures (175). (3) A most important result concerns the observation of prosomes at the cell surface and in the extracellular space, described in Section IV,D.

D. Prosomes at the Cellular Surface and in the Extracellular Space

Figure 5C indicates that the prosomes might migrate up to cellular junctions at the plasma membrane; this observation led to a further interesting development. The existence of “prosome junctions,” like those shown in Fig. 5C or previously in a “minireview” (24), led to the idea that prosomes might reside at the surface of individual cells and, furthermore, circulate between cells touching each other. Proving the latter putative phenomenon conclusively will require extensive investigation. It was, however, relatively easy to test, in the case of cells that normally never touch each other, whether prosomes are present at their surface and, as a corollary, whether extracellular prosomes exist. Experiments carried out within the last 3 years on human blood cells indicate the presence of prosomal antigens, in the form of 19-S particles, at the outer membrane of human lymphocytes, varying in extent with the CD immunotype (97). Most CD19 (B4) lymphocytes were found to bear surface prosomes of several types, while only 10-30% of CD4- or CD8-positive cells have surface prosomes (96). Similar observations were made in the case of other types of blood cells (163). The search for prosomes in the extracellular space turned out to be positive, in tissue-culture supernatants as well as in bovine and human blood serum. In these body fluids, bona fide prosomes were found and identified by their antigenic composition, protease activity, and pRNA content. Most interestingly, under some physiological conditions, the number of extracellular prosomes was found to be increased, particularly in the serum of cancer patients (191, 192). Whether or not these extracellular prosomes reflect a specific function, possibly in relation to a system of cellular factors of communication or to the MHC system, remains to be tested by future investigation. Nevertheless, the working hypothesis that prosomes are not confined to cells, but are shared by the cells of a clone, a cell compound, or possibly an organ, seems legitimate. The particularly interesting question arises as to whether the prosomes constitute a system of para- or endocrine factors controlling target cells at a distance.
Indeed, their association with mRNA, their protease activity, or any other function inherent in them might be shared by specific cell populations. On the other hand, control of gene expression is not necessarily a problem of the individual cell, but of cell compounds and cooperating cellular systems. Concluding this section on prosomes and cell structure, we may say that prosomes are associated in part with untranslated mRNA and the IFs. In all cell systems tested so far, most prosomes seem to be associated with cellular structures, either Triton-extractable ones, such as the microfilament-associated complexes, the endoplasmic reticulum, and the plasma membrane, or the “hard-core” cytoskeleton. The presence of prosomes of various subunit composition on subnetworks of IFs of various types seems to be established and is particularly evident for specific types of prosomes present in functionally defined sectors, such as the bile canaliculi of hepatocytes or the M and Z lines of the sarcomeric system of striated muscle. Finally, the prosomes at the cell surface and in the exoskeleton are

KLAUS SCHERRER AND FAYÇAL BEY

particularly intriguing, as is their presence in the extracellular space, especially in human blood serum. But the key question still to be solved is whether the prosome-IF system participates in the temporal and topological cytodistribution of specific mRNA. If so, this putative system could provide a conceptual basis for the emerging evidence of site-specific mRNA localization and of cotranslational assembly and function of proteins.

V. Prosomes Vary in Their Subunit Composition in Relation to Differentiation and Embryonic Development

Since 1969, we have investigated the prosome-mRNP system using mainly two different biological models: the highly differentiated avian erythroblasts on the one hand, and human HeLa tissue-culture cells on the other. This led early to the observation that variable forms of prosomes must exist, by analogy with the variability of the mRNP core proteins analyzed in the same cells (1, 68), and in contrast to, for example, the ribosomes. Until recently, most MCP-proteasome studies were based on the assumption that a unique MCP particle was composed of a single set of proteins (see, e.g., 46). The first direct evidence that prosomes are not, like ribosomes, a unique composite structure more or less maintained throughout the animal kingdom was obtained from two-dimensional gel patterns of Sarkosyl-purified prosomes from avian and mouse erythroblasts compared with those of human HeLa cells, which showed significant variations in subunit composition (70). This notion of interspecies differences in subunit composition was confirmed throughout the phylogenetic tree. In view of the apparently conserved structure of the particles, at least at the electron microscopic level, this is quite a remarkable finding. Work in our laboratory showed intraspecies differences in prosome patterns when comparing, for example, erythroblast and brain prosomes in chicks (J. K. Pal and K. Scherrer, unpublished), as well as intracellular differences between globin and nonglobin mRNP (68; F. Bey and C. Martins de Sa, unpublished). In Drosophila, fractionation of MCP activities revealed some differences in the protein composition of three slightly separated diethylaminoethyl (DEAE) column fractions (130).
No further direct biochemical evidence for prosome-MCP variability was published until the recent discovery of allotypic variations in the LMP-MHC-I complex and, in particular, of subunit variations upon treatment of cells with interferon-γ (162). Recent systematic comparison of Sarkosyl-purified prosomes from various human blood cells with those of HeLa cells seems to confirm the notion

that characteristic prosome populations exist in differentiated cells, and that the individual prosome particle is constituted by a variable rather than a unique combination of subunits (163). Furthermore, some reports relate variations of the MCP activities to changes in subunit composition (130, 141, 144). Even though the biochemical data on subunit variability of prosomes in relation to differentiation are still limited, the use of monoclonal antibodies in single- or double-label immunofluorescence studies on embryonic or adult tissues provides seemingly good evidence that each differentiated cell type has a characteristic set of prosomes, constituted by specific types of antigens. In this respect, the observation of an asymmetrical distribution of antigens in hepatocytes and muscle cells, showing that different types of prosomes co-exist within a given cell, is particularly demonstrative (174, 175). Similar data have been obtained in studies of Drosophila embryos (92) and of differentiating cells and tissues in humans and the nematode C. elegans. Data obtained with five and eventually eight monoclonal antibodies on human biopsies, involving about 30 types of cells and tissues, confirm the notion that every cell and tissue has a specific “prosomal immuno-phenotype” (S. Poppema and K. Scherrer, unpublished). In the nematode, 27-kDa-specific prosomes were found in the seam cells, while the 30-33-kDa prosomal subunit is restricted to muscle cells (R. Schnabel and K. Scherrer, unpublished). In fact, it appears that prosomal antigens are among the earliest developmental markers segregating among particular types of cells (95), earlier, for example, than homeobox genes and oncogenes, at the time when early cell lineage determination occurs. Systematic immunocytological studies must, however, include stringent controls.
We have already discussed the monoclonal antibodies, in connection with the prosome-IF correlation, with respect to cross-reactivity, specificity, and power of resolution (Section IV,B). Studies with polyclonal antibodies in several laboratories indicate that all cells contain prosomes, mainly in the cytoplasm but also, in variable proportions, in the nucleus. Therefore, the observation by Foucrier's group (174), discussed above, of a specific prosomal antigen occurring in rat hepatocytes exclusively along the bile canaliculi, whereas prosomes probed with polyclonal antibodies are found everywhere in the same cells (J. Foucrier, unpublished), provides an internal control showing that (i) the monoclonal antibodies are specific, and (ii) prosomes of specific subunit composition are indeed specifically distributed in cells and tissues. Nevertheless, it cannot be fully excluded that some of the antigenic variations observed are due to “shielding” of specific epitopes from the reacting p-mAbs. However, it is unlikely that all of the different patterns observed in a dozen different tissues and species are due to this kind of phenomenon. Biochemical studies and cell fractionation, carried out in parallel with all

of the immunological studies published thus far by us, led to another fundamental observation: In all cells tested, prosomal antigens exist only in 19-S complexes and not as free soluble proteins. Proof that free prosomal antigens can be detected when present in significant amounts was obtained in studies of prosomes and the heat-shock response; there, prosome dissociation does occur and can easily be detected (10). We may, therefore, safely assume that the different prosomal antigen patterns observed in differentiating cells relate to the intact particles and not to free antigens, at least at the detection level usual in such studies. This fact is also of prime importance on theoretical grounds, since it indicates that prosome populations may be controlled by biosynthesis rather than by assembly of particles from pre-existing pools of subunits. Developmental studies have been carried out on the sea urchin, Axolotl, Pleurodeles, chicken, fetal rat liver, Drosophila, and, recently, the nematode C. elegans. From the sum of these studies, the following facts emerge. (1) Prosomes abound in oocytes. In the sea urchin, their number was estimated to be 1W per oocyte (122); immunofluorescence staining patterns confirm this notion (180). (2) Prosomal proteins seem to be imported into the oocytes. Axolotl oocytes, freed of the shell of follicular cells, did not incorporate [35S]methionine into prosomal subunits, while other proteins were labeled (193). (3) During oogenesis, prosomes can be observed at the level of the lampbrush chromosomes and the nuclear matrix in the diplotene stage; at the same time, the bulk of prosomes accumulate in the cytoplasm. In mature oocytes, however, the highest prosome concentration is in the nucleus; upon parthenogenetic induction, prosomes are redistributed all over the ooplasm (193).
(4) In the sea urchin (180), Pleurodeles (95), and the chick (179), de novo synthesis of prosomal proteins initiates only at the blastula stage and extends to all types of prosomal subunits during gastrulation. (5) The first three cell divisions of the sea urchin embryo seem to distribute prosomes symmetrically (180), whereas, most interestingly, at about the 30-cell stage in Pleurodeles (95), as well as in the nematode (R. Schnabel and K. Scherrer, unpublished), the still cytoplasmic prosomal antigens distribute asymmetrically, by analogy with, for example, the homeotic gene products segregating later. (6) Surprisingly, upon blastulation, prosomes that are presumably still maternal concentrate in the nucleus in Pleurodeles, the chicken, and possibly the nematode. This is the stage not only when embryonic prosome synthesis starts, but also when embryonic development becomes dependent on zygotic gene transcription. From this developmental stage on, asymmetric and cell-specific prosomal antigen distribution prevails, as already observed

in the blastula stage at the nuclear level in Pleurodeles (95), chicken (179), and nematode embryonic cells (R. Schnabel and K. Scherrer, unpublished). (7) In later stages of development, tissue- and cell-specific distribution of prosomal antigens predominates and shows restriction to specific germ layers. For example, the 27-kDa prosomal antigen is restricted to the mesoderm in Pleurodeles, being absent from the ectoderm and at the notochord level (95). (8) In further post-gastrula stages of development, in specific types of cells and tissues, specific prosomal antigens move from the nucleus to the cytoplasm and, in some types of cells, occupy specific sectors there. This is particularly evident in the nematode, in which specific antigens are present in specific cell lineages (loc. cit.), in fetal rat hepatocytes, in which specific types of prosomes move to the bile canaliculi at 17 days' gestation (174), and in rat muscle, where specific types of prosomes (but not others) occupy selective zones within the sarcomeric structure (175). These basic notions have been confirmed and extended by studies on Drosophila and other types of embryos and tissues. Particular attention should be drawn to studies showing tissue-specific segregation of specific prosomal antigens in Drosophila embryos (194) and in crustacean striated muscles (195). Although these studies were not carried out with monoclonal antibodies, the data obtained fit well with the observations and ideas developed above. Given that some prosomes are associated with mRNA, the observed patterns of specific prosome appearance and distribution in tissues and in cellular sectors are most interesting. Indeed, prosome distribution closely mimics what we know about the occurrence, storage, and cytodistribution of messenger and pre-mRNA.
This is not proof that prosomes are directly involved in differential gene expression in relation to cell differentiation and embryonic development, but it provides a conceptual basis for the continued investigation of such a working hypothesis. Clearly, many arguments can also be brought forward for differentiation- and developmental-stage-specific MCP action; here too, further investigation is necessary to establish such a correlation.

VI. Variations of Prosome Patterns in Pathology

In view of the variations of specific cytolocation and prosome subunit composition in differentiation and development, it became evident that the prosome system might respond to any type of physiological change and may

thus also be altered in pathology. This supposition was reinforced by accumulating evidence that many parameters of the prosome system reflect those of pre-mRNA and mRNA, in terms of gene-specific nucleocytoplasmic distribution of transcripts, differential expression, and highly specific subcellular cytodistribution. Since prosomes are associated with untranslated mRNA, the prosome system might be sensitive to any modulation of the protein biosynthesis system itself, in terms of transcription, transfer to the cytoplasm, translational activity, and post-transcriptional repression, as well as protein processing and degradation. In view of the combinatorial and variable subunit composition of the individual prosome particles, possibly reflecting the steady state of mRNA and MCP, it was also evident that such a system might be particularly sensitive and precise in a diagnostic sense if analyzed by subunit-specific molecular probes, in particular monoclonal antibodies (Fig. 6). In terms of medical research and diagnostics, one must distinguish between the direct effects of pathologies of the prosome system itself and the possible impact of any pathology on the mRNA and, indirectly, on the prosome and MCP systems. No prosome-related disease is presently known with certainty. However, in view of the surprising immunogenicity of intracellular prosomes (mouse prosomes injected into mice elicit a strong immune response) on the one hand, and the presence of prosomes at the surface of cells and free in the serum on the other, altered prosomes might be causally involved in some autoimmune diseases. The variable mosaic structure of the individual particle and the extensive post-translational modification of the protein subunits, by phosphorylation, glycosylation, etc., suggest that the immune system may be extremely responsive in discriminating between self and non-self types of prosomes and, therefore, also to modified particles.
In support of this hypothesis, anti-prosome autoantibodies have been found in patients (25, 196; M. Olink, W. van Venrooij and K. Scherrer, unpublished). The appearance of such autoantibodies may be a response to modulations of the physiological subunit composition of prosomes presented at the surface of cells or free in the serum, or to biochemical modifications of prosomal proteins. It may also represent a response to the release into the serum of intracellular prosomes in cases of tissue inflammation and cell necrosis due to tumors, cirrhosis, or other types of tissue degradation. Since prosomes are extremely resistant to breakdown, once released they have a fair chance of provoking an immune response before being degraded. Nevertheless, the possibility that the prosome system is causal in some autoimmune diseases should not be neglected. The prosome system's response to pathology can be evaluated diagnostically in two ways. One is purely quantitative, measuring prosome concentration, while the other seeks to exploit, in a qualitative sense, the appearance of

FIG. 6. The use of anti-prosome antibodies in cytology and the concept of clinical diagnostics. Prosomes are made up of a combination of subunits that can be probed by panels of monoclonal antibodies. This multiparameter analysis can recognize with high resolution (compared to a single antibody) individual cells and/or tissues with a specific or altered prosomal immuno-phenotype, that is, a qualitatively or quantitatively modified presence of individual antigens relative to a specific or "normal" standard. Moreover, the intracellular distribution of individual antigens can be normal or abnormal (e.g., nucleocytoplasmic distribution). Particularly important is the dramatic increase in extracellular prosomes found in some pathologies. This analysis is amenable to automatic evaluation, for example, by flow cytometry or cytoscan/image-processing systems. (Prosomal mAbs are distributed by Organon Teknika, Veedijk, Turnhout, Belgium.)

prosomes of nonphysiological subunit composition at specific locations in specific cells and tissues. This multiparameter analysis can be based on cDNA probes specific to the mRNA of prosomal genes, or on sets of monoclonal antibodies, through differential analysis of tissues and cells by histological, cytological, and flow cytometric methods (Fig. 6). The concept of a prosome-based multiparameter diagnostic system is particularly interesting in view of the invariable biophysical structure of the particle and the complexity, variability, and sensitivity to physiological change of its subunit composition. Furthermore, the qualitative and quantitative survey of extracellular prosomes and autoantibodies in the blood serum is particularly interesting and amenable to diagnostic analysis. An increase in prosomal mRNA has been observed in a variety of leukemic cells. Interestingly, this seems to reflect increased mRNA turnover (171), since the absolute level of prosomes, as probed by antibodies, was not increased. On the other hand, increased amounts of prosomal antigens in the nuclei of leukemic (171) and breast cancer cells (197, 198) have been reported. This qualitative response seems to reflect primarily cell proliferation, since PHA-stimulated lymphocytes exhibit the same pattern of response (171).

The multiparameter diagnostic approach based on p-mAbs has been applied to breast cancer in a study of the particularly interesting Indian Parsi population, which was compared with normal European subjects. Among the Parsi, 50% of all female cancers are breast cancer (199; M. G. Deo, personal communication). This study, which involved histology as well as flow cytometry of suspended biopsies, showed modifications in the expression of particular prosomal antigens in the cancer tissue and, interestingly, in the apparently normal tissue of cancer patients. Furthermore, there may be some difference in the differential prosome patterns in the normal breast tissue of Parsi patients compared with non-Parsi Indians and Europeans, as well as differences between benign and malignant tumors in the Indian population (197, 198). Similar studies have compared normal liver, hepatomas, and hepatocarcinomas in rat and human; differential overexpression of a series of prosomal antigens was observed in both species (S. Poppema, M. Olink and K. Scherrer, unpublished). Particularly interesting from a diagnostic perspective is the increase in extracellular prosomes in a variety of pathologies (192), particularly in breast cancer (198) and colon carcinoma (191). Indeed, the old dream of a comprehensive blood-based diagnostic test revealing specific organ-based pathologies “at a distance” may become reality, particularly in relation to cancer, in view of the tissue- and physiological-state-related variable mosaic structure of prosomes and their natural release into body fluids. However, the data published thus far on the relationship between prosomes and pathology are still very limited and should be interpreted with caution, given their restricted scope and statistical validity. Nevertheless, they seem interesting enough to stimulate extensive research in the medical field and, possibly, diagnostic applications.
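The multiparameter logic of Fig. 6 can be illustrated with a toy computation. The Python sketch below is purely illustrative and rests on assumptions not taken from the studies cited: each monoclonal antibody yields a single numeric antigen level per sample, levels are compared with a "normal" reference profile, and a fixed fractional tolerance (here 50%) separates normal from altered readings. The antibody labels (mAb_1, mAb_2, mAb_3) and all values are hypothetical, and the sketch ignores intracellular distribution and extracellular prosomes.

```python
"""Toy sketch of a multiparameter 'prosomal immuno-phenotype' comparison.

Assumptions (hypothetical, not from the review): one numeric level per
antibody per sample, a 'normal' reference profile, and a fixed fractional
tolerance separating normal from altered readings.
"""
from typing import Dict, List

Profile = Dict[str, float]


def classify_profile(sample: Profile, reference: Profile,
                     tolerance: float = 0.5) -> Dict[str, str]:
    """Label each antigen 'increased', 'decreased', or 'normal' versus the reference."""
    labels: Dict[str, str] = {}
    for antigen, ref_level in reference.items():
        observed = sample.get(antigen, 0.0)  # a missing reading is treated as absent
        if observed > ref_level * (1 + tolerance):
            labels[antigen] = "increased"
        elif observed < ref_level * (1 - tolerance):
            labels[antigen] = "decreased"
        else:
            labels[antigen] = "normal"
    return labels


def altered_antigens(labels: Dict[str, str]) -> List[str]:
    """Return, in sorted order, the antigens whose label departs from 'normal'."""
    return sorted(a for a, label in labels.items() if label != "normal")


def distinguishable_phenotypes(n_antibodies: int, states_per_antigen: int = 3) -> int:
    """Upper bound on distinct profiles if each antigen takes one of a few discrete states."""
    return states_per_antigen ** n_antibodies


if __name__ == "__main__":
    # Hypothetical antibody labels and arbitrary units, for illustration only.
    reference = {"mAb_1": 1.0, "mAb_2": 1.0, "mAb_3": 1.0}
    sample = {"mAb_1": 2.0, "mAb_2": 1.1, "mAb_3": 0.2}
    labels = classify_profile(sample, reference)
    print(labels)
    print(altered_antigens(labels))
    print(distinguishable_phenotypes(1), distinguishable_phenotypes(5),
          distinguishable_phenotypes(8))
```

With three discrete states per antigen (an assumption of the sketch), a single antibody distinguishes at most 3 profiles, whereas panels of five or eight antibodies, as in the human biopsy studies mentioned above, could in principle distinguish 243 or 6561. How many profiles are actually resolvable in tissue would depend on assay precision, which the sketch does not address.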

VII. Attempts at Comprehension

A. Fascination and Frustration: Much Data and Little Comprehension

Reviewing the prosome-MCP literature, one is struck not only by the wealth of facts accumulated in a few years and, even more so, by the vastness and complexity of the phenomenon, but also by the lack of comprehension. First of all, prosomes do exist as defined physiological particles and biochemical entities (not a trivial fact for the old-timers in the field), and one may suspect them to be ubiquitous throughout living matter. Yeast genetics has already demonstrated that they have a vital function (109), and T. acidophilum has delivered a first clear-cut biophysical structure (4). Among the hard facts are the sequence data: Unquestionably, a totally new and homogeneous gene family has been discovered which, from archaebacteria to humans, must have a basic physiological function. As such, it is a perfect example of evolution: A cryptic founder gene seems to have been the parent of two genes coding for the first functional complex, which, in its biophysical structure, is strictly maintained throughout living matter. These two genes are highly conserved right up to humans, with about 40% similarity, a degree of evolutionary conservation comparable to that of cytochrome c. On the other hand, there was a rapid gain in complexity through vast and relatively fast diversification within the given physical structure, based on many dozens of genes all related to the two that constitute the archaic functional particle. This gain in complexity of the prosomes seems to reflect perfectly the increased complexity of genomes and organisms. However, even at this level of hard facts, the lack of comprehension is embarrassing.
Thus far, even if some hints as to prosome structure could eventually be drawn from the sequence data, no positive conclusion could be reached as to the basic functions of the complex, not even as a protease. This is a perfect illustration of the futility of the idea, popular a few years ago, that sequencing genes would make it possible to read with ease the book of creation's intentions! On the other hand, the wealth of sequence data now available will certainly facilitate, within a few years, the use of molecular and classical genetics, particularly in yeast, and a rapid advance of functional studies leading, eventually, to comprehension. Another hard fact is the protease activity, which is at present the object of most prosome-MCP papers. Although the basic enzymatic mechanisms are well characterized, we are only beginning to comprehend the purpose of the MCP system in the living cell. Interplay with the ubiquitin system complicates matters, since it is hard to see at the moment where substrate

specificity comes in, either in selective ubiquitinylation or in substrate recognition by the prosome-MCP particle in the 26-S proteasome. As to the biochemical mechanisms generating the protease activity, rapid progress can be expected on the T. acidophilum particle, given its simplicity and ease of manipulation. As for the rest, yeast genetics has already indicated that some subunits are involved in individual MCP activities, and this game will have to continue for some time if the mosaic of subunit interactions relative to the various MCP activities is to be solved. The prospect that we may be dealing with an enzyme “à la carte,” which would be a novelty in nature, is fascinating. Indeed, the compositional variability of the particle's mosaic structure, if it extends to the subunits generating the individual MCP activities, might modulate the protease core and its specificities. Superimposed on this is the substrate recognition mechanism, in all likelihood the task of the two outer rings constituted of α-type subunits in a (supposedly) variable combination. The β-type proteins of the two inner rings would then handle cleavage of the accepted protein in the internal milieu of the primordial “Anfinsen cage” (8). The 26-S proteasome most likely represents the exclusive proteolytically active form of the prosome-MCP in vivo. Built of a variety of additional components, including members of a new family of ATPases and additional proteases, as well as various inhibitors and activators, it might represent another, higher level of combinatorial complexity, possibly allowing highly selective cleavage of individual polypeptide substrates and amino-acid motifs.
The theoretical potential inherent in the variable compositional nature of the basic prosome particle is most interesting; for the first time, to our knowledge, we observe in nature the three fundamentals of a genuine “multikey system”: a strict physical structure, function(s), and vast multiparameter variation. The reality of such variation of the mosaic subunit structure seems well established for prosomes of vertebrates and Drosophila; nevertheless, not enough is yet known about the biochemical variability of individual particles within a given cell, and the multikey game might, in reality, be severely restricted! But, as ever, theory is primordial, and in this case the facts can be tested. Sensing proteins by unfolding them is already a basic property of the GroEL-type chaperonins, which, like the prosome particles, form an “Anfinsen cage” (8). Indeed, the mechanisms of protein recognition may represent precisely the interface of the prosome function(s) at the level of the MCP on the one hand, and of the mRNP and the cytoskeleton on the other, which may or may not be related directly to its protease activity. On the far side of the “Rubicon,” five facts may be taken into account. (1) Prosomes are related to untranslated mRNA; they seem to represent a system of trans-acting factors at the (pre-?)mRNP level of high compositional

variability reflecting the rules of a multikey system. (2) They inhibit protein synthesis in vitro and are released, in vivo, from the mRNA prior to translation. (3) They are associated with the cytoskeleton and, most likely, the nuclear matrix, primarily with the IFs of all types but also, to some extent, with the actin and tubulin networks. (4) Prosomes are highly stable autonomous particles, found not only inside all eukaryotic cells tested but also outside, in body fluids, being particularly abundant in cases of pathology; they may thus have functions beyond the individual cell border. (5) They contain at least one major type of RNA that is, surprisingly, a tRNA of the retroviral primer type; this sequence, tRNALys,3, indicates a relationship between the prosome RNA and the genomic retroposons of the SINE (e.g., Alu) type (remember that tRNAs represent the most “archaic” type of RNA, operating at the borderline between the protein and nucleic acid “worlds”). On all of these levels, we are confronted with a wealth of data, but little is understood as yet. Although the data are highly suggestive, our working hypothesis, that prosomes somehow participate in post-transcriptional control of protein synthesis and gene expression, has still to be tested and proven; beyond the descriptive facts developed in this review and the capacity of prosomes to interfere with in vitro protein synthesis, not much is known at present at the functional level of the mRNP and the cytoskeleton.

B. The Prosome-MCP Function(s) at the Level of Protein Synthesis and Catabolism

One of the most intriguing questions concerning the prosome-MCP particles thus remains their physiological function(s). In Fig. 7, we suggest a theoretical scheme that attempts to reconcile the various aspects of potential prosome function, particularly the apparent dual involvement of prosomes in the protein synthesis machinery and in the catabolic system. Let us briefly develop the fundamentals of our reasoning, recalling some of the facts discussed separately in the preceding pages. Taking together the data on the presence and, most likely, function of prosomes at the mRNP and matrix/cytoskeleton level, and their proteinase activity, one may consider that prosomes have a dual role in the homeostasis of specific proteins in the cell: According to the scheme proposed in Fig. 7, prosomes might serve, on the one hand, in post-transcriptional processing and in the transport, distribution, and control of stored mRNPs, including cytodistribution of specific mRNAs on the cytoskeleton; and, on the other, in the processing and/or degradation of targeted proteins. We have classified the cytoplasmic mRNPs (Section I,B) into two types, depending on their distribution in the cell's functional compartments (31): (i) polyribosome-associated, translationally active mRNPs and (ii) ribosome-free, translationally inactive mRNPs (see Fig. 2). Prosomes are part of the latter and are thus among the trans-acting factors of the free mRNP complexes that control and maintain translational repression. However, since the core mRNPs isolated at high ionic strength, and hence dissociated from the prosomes (70), are still translationally repressed in vitro (123), prosomes do not seem to be a repressor factor per se, but simply have the capacity to induce repression. Other factors, in particular the proteins of the mRNP core, must be responsible for maintaining long-term repression of free mRNPs, at least in somatic cells.

FIG. 7. A model for the possible involvement of the prosome-MCP particles in the homeostasis of specific protein levels. The different roles of prosomes might be hypothesized as follows. Protein synthesis: Prosomes associate with the nascent pre-mRNA complex and migrate on the nuclear matrix with the mRNA to the nuclear pore. In the cytoplasm, they migrate with the mRNP complex on the intermediate filaments to the sites of protein synthesis located in the microfilaments. There, the mRNA is transferred to the microfilaments for translation, the prosomes falling into the general pool for recycling on the nuclear pre-mRNP or cytoplasmic mRNP. Protein cleavage: The same free prosome pool is in equilibrium with higher-order complexes, particularly 26-S proteasomes, cleaving either normal or ubiquitinylated polypeptides. Such highly substrate-specific cleavage may produce functional peptide fragments, or else initiate a chain of selective breakdown completed by nonspecific endo- and/or exopeptidases. The variable subunit composition of the individual prosome particles, their chemical modifications, allosterically acting factors, protein cofactors, and the presence or absence of pRNA would determine functional states and target specificity at the mRNA and polypeptide levels.

Another hypothesis as to the role of prosomes at the mRNA level originates in the observation that prosomes are not only associated with repressed mRNA but also linked to chromosomes and the nuclear matrix (95), and to some of the IF networks of the cytoskeleton (74-76, 183). Specific types of prosomes may thus accompany families of mRNA during their transport on the nuclear matrix and the IF subnetworks, selected according to the physiological state and the specialization of given differentiated cell types (24). At the mRNA level, prosomes might thus serve in the transport, distribution, and, finally, control of the translational activity of specific mRNAs that have to be released on the microfilaments prior to translation (72, 73). This might provide a conceptual basis for the selective transport and positioning of specific mRNAs in restricted areas of the cell (80). Prosomes are thus present in the cell where two distinct types of gene control mechanisms operate: (1) the positive controls regulating pre-mRNA processing and mRNA transport in the nucleus and selective cytodistribution in the cytoplasm; and (2) the negative controls at the level of the untranslated mRNP, to the extent that prosomes, like the other trans-acting mRNA-associated factors, must leave the mRNA prior to its translation. Whether prosomes have an active function in these positive and negative controls remains to be tested. With regard to the function of the prosome-MCP particles in protein turnover, the data accumulated imply enzymatic activities in several proteolytic processes. The function of the 26-S complex, of which prosomes are a constituent, seems to be well established, that is, its involvement in the ubiquitin-dependent and -independent degradation pathways (131, 135, 136), whereas the participation of the particles in the specialized function of antigen presentation by the MHC system (20, 200), recently contested (51, 52), remains to be proven conclusively.
KLAUS SCHERRER AND FAYÇAL BEY

The demonstration that specific patterns of prosomal antigens are found according to cell type and differentiation (95, 168, 174, 193), and that the accumulation and turnover of the particle are differentially regulated in the early development of Pleurodeles (95), the chick (174), and Drosophila (194), suggests that the prosomes, not only those functionally associated with the mRNP and the cytoskeleton but also those with MCP activity, may participate in the development and in the cell-specific activation or inactivation of factors involved in regulatory mechanisms. Beyond these and other possible instances where prosome-MCP activity might be involved in mechanisms of protein synthesis, the more general role of MCP activity is certainly in the processing and catabolism of individual proteins. It is obvious that homeostasis of individual specific proteins must be regulated at the levels of protein synthesis, post-translational processing, and degradation. It is also obvious that these processes must be coordinated, involving recognition, on the one hand, of specific mRNA, or more precisely mRNP, and, on the other, of the individual protein products. Since protein synthesis in bacteria is mainly, if not exclusively, regulated at the transcriptional level, homeostasis of individual proteins in prokaryotes is essentially a problem of catabolism. Hence, the MCP might have evolved by developing its function from the primordial “Anfinsen cage” (8), eventually gaining enzymatic complexity and, most importantly, a capacity to recognize specific proteins. Once eukaryotic systems arose, post-transcriptional regulation became obligatory to control the already extant (pre-)mRNA in time (56) and, later in the evolutionary progression, in cellular space (84). Trans-acting factors became necessary to assume these functions and gave rise to the RNP complexes, subject to selective RNA processing and transport. With an estimated 10⁶ genes in the human genome, about 10⁵ of them (i.e., about 10%) are transcribed in any cell at any time (see, e.g., 63), and 10⁴-10⁵ different polypeptides exist in most cells and must be controlled in a selective manner (see “Cascade Regulation” in 56). This is a formidable task in mechanistic terms. The main mechanistic problem in temporal and spatial regulation of genetic information is, in this case, once more the recognition of specific proteins. Indeed, recognition of specific (pre-)mRNA by trans-acting factors, possibly acting beyond the individual cell, is possible via the associated structural RNP proteins (of which there are probably 200-500 species). Such mobile factors can be subject to control by second messengers or by any type of environmental or humoral agent. Furthermore, since the association of specific proteins in the (pre-)mRNP is dictated by the RNA sequence, such factors might condition the integration of the mRNP into the cell’s dynamic architecture, which represents the physical “milieu” of the information-bearing molecules (84).
Possibly, nature has reduced the load on the genome imposed by this type of complex control through the evolution of a shuttling factor that intervenes in both types of mechanisms, protein synthesis and catabolism, which by necessity must be coordinated, in a manner such as that outlined in Fig. 7. Accordingly, the early prosome-MCP of the prokaryotic systems became integrated, upon emergence of the eukaryotic systems, into the chain of gene expression, recognizing the thousands of information-carrying (pre-)mRNPs at the chromatin, nuclear matrix, and cytoskeletal levels. The known properties of the prosomes (unity of physical structure, compositional diversity of protein and RNA components, and interaction with RNA and protein in complex cellular structures) make them possible candidates for this type of still-hypothetical ambivalent control factor. Only the future, based on many man-years of investigation, will tell whether the present analysis correctly takes into account the biological facts, or whether totally new ways of comprehension will have to emerge in order for us to fully embrace the prosome-MCP phenomenon.

VIII. Glossary

CAP: “cap” structure at the 5′ end of pre-mRNA and mRNA, consisting of 7-methylguanylate joined in a 5′-5′ triphosphate linkage
EM: electron microscopy
GFAP: glial fibrillary acidic protein; type of intermediate filament characteristic of glial cells (astrocytes)
IIF: indirect immunofluorescence
IF: intermediate filament
IRE: iron response element (69)
IRE-BP: iron response element-binding protein (69)
LINE: long interspersed nuclear element; repetitive DNA element encountered in eukaryotic genomes (117)
LMP (complex): low-molecular-weight protein complex; term synonymous with prosomes; used by immunologists for the complex precipitated by allogeneic sera (20, 21)
mAbs: monoclonal antibodies
MCP: multicatalytic proteinase; synonym for “prosome,” used by enzymologists for the particle when observed as a proteinase having multiple proteolytic specificities (14, 15)
MCPC: multicatalytic proteinase complex
MHC-I and -II: major histocompatibility complex of class I and II
NF: neurofilament; type of intermediate filament characteristic of neurons
NLS: nuclear localization signal
PABP: poly(A)-binding protein
p-mAbs: monoclonal antibodies directed against prosomal proteins
PROS: prosomal protein genes
Prosome-MCP: multicatalytic proteinase activity of the prosome
Proteasome: synonym for “prosome,” “MCP,” and “LMP”; introduced a posteriori on the assumption of an exclusive protease function of the prosomes (14, 15, 38, 39)
Ring: genes interspersed with MHC-II genes in the MHC gene cluster (16)
SINE: short interspersed nuclear element; repetitive DNA element encountered in eukaryotic genomes (118)
TAP1 and 2: transporter-associated protein genes interspersed with MHC-II genes in the MHC gene cluster (16, 159, 160)

ACKNOWLEDGMENTS

The authors thank the colleagues who provided preprints and results of unpublished work to be used in this review. We thank all of our collaborators and colleagues who contributed to the prosome story throughout the last 10 years within our laboratory, as well as those from outside, particularly Carlo Chezzi, Jean-Paul Bureau, Jean Foucrier, Annika Arnberg, Bob Van Hengel, François Zajdela, and Hans Bloemendal. We thank F. Gros, F. Jacob, P. Chambon, R. Monier, M. Grunberg-Manago, A. Kahn, and P. Tambourin for long-standing support in the difficult enterprise of conducting unconventional research within an essentially conservative context. The help of R. Rohart, C. Cuisinier, and H. Nguyen Cong in preparing the manuscript and the artwork of R. Schwartzmann are gratefully acknowledged. Our investigations would not have been possible without the special financial help of the Association pour la Recherche contre le Cancer (ARC; President, J. Crozemarie), the Ligue Nationale Française contre le Cancer (M. J.-F. Bach), and the Association Française contre les Myopathies (AFM; MM B. Barataud and F. Gros), as well as the assistance of Anvar (M Ch. Gani), Prosoma Sarl (B. P. Van Hengel), and Organon Teknika (MM W. Van Everdingen and B. Van Weemen), complementing the basic support of CNRS and INSERM.

REFERENCES

1. G. Spohr, N. Granboulan, C. Morel and K. Scherrer, EJB 17, 296 (1970).
2. T. Hohn, B. Hohn, A. Engel, M. Wurtz and P. R. Smith, JMB 129, 359 (1979).
3. R. Hegerl, G. Pfeifer, G. Puhler, B. Dahlmann and W. Baumeister, FEBS Lett. 283, 117 (1991).
4. G. Puhler, S. Weinkauf, L. Bachmann, S. Mueller, A. Engel, R. Hegerl and W. Baumeister, EMBO J. 11, 1607 (1992).
5. O. Coux, H.-G. Nothwang, K. Scherrer, W. Bergsma-Schutter, A. C. Arnberg, P. A. Timmins, J. Langowski and C. Cohen-Addad, FEBS Lett. 300, 49 (1992).
6. F. Kopp, B. Dahlmann and K. B. Hendil, JMB 229, 14 (1993).
7. T. Langer, C. Lu, H. Echols, J. Flanagan, M. K. Hayer and F. U. Hartl, Nature 356, 683 (1992).
8. H. R. Saibil, D. Zheng, A. M. Roseman, A. S. Hunter, G. M. F. Watson, S. Chen, A. Auf der Mauer, B. P. O'Hara, S. P. Wood, N. H. Mann, L. K. Barnett and R. J. Ellis, Curr. Biol. 3, 265 (1993).
9. J. Zeilstra-Ryalls, O. Fayet and C. Georgopoulos, Annu. Rev. Microbiol. 45, 301 (1991).
10. C. Martins de Sa, E. Rollet, M.-F. Grossi de Sa, R. M. Tanguay, M. Best-Belpomme and K. Scherrer, MCBiol 9, 2672 (1989).
11. G. N. DeMartino and A. L. Goldberg, JBC 254, 3712 (1979).
12. I. A. Rose, J. V. Warms and A. Hershko, JBC 254, 8135 (1979).
13. S. Wilk and M. Orlowski, J. Neurochem. 35, 1172 (1980).
14. A. P. Arrigo, K. Tanaka, A. L. Goldberg and W. J. Welch, Nature 331, 192 (1988).
15. P. E. Falkenburg, C. Haass, P. M. Kloetzel, B. Niedel, F. Kopp, L. Kuehn and B. Dahlmann, Nature 331, 190 (1988).
16. M. Robertson, Nature 353, 300 (1991).
17. A. L. Goldberg and K. L. Rock, Nature 357, 357 (1992).
18. P. J. Travers and C. J. Thorpe, Curr. Biol. 2, 679 (1992).
19. J. J. Monaco and H. O. McDevitt, Nature 309, 797 (1984).
20. M. G. Brown, J. Driscoll and J. J. Monaco, Nature 353, 355 (1991).
21. C. K. Martinez and J. J. Monaco, Nature 353, 664 (1991).
22. H.-G. Nothwang, O. Coux, G. Keith, I. Silva-Pereira and K. Scherrer, NARes 20, 1959 (1992).
23. I. Vaithilingam and R. A. Cook, Biochem. Int. 19, 1297 (1989).
24. K. Scherrer, M. Olink-Coux, O. Coux, M.-F. Grossi de Sa, J. K. Pal, C. Martins de Sa and J. F. Buri, in “Structure and Function of the Cytoskeleton” (B. Rousset, ed.), Vol. 171, p. 349. INSERM and Libbey, Paris, 1988.
25. K. Scherrer, Mol. Biol. Rep. 14, 1 (1990).
26. J. Dubochet, C. Morel, B. Lebleu and M. Herzberg, EJB 36, 465 (1973).
27. J. R. Harris, Nouv. Fr. Hematol. 22, 411 (1980).
28. J. R. Harris, Micron Microsc. Acta 14, 193 (1983).
29. N. Domae, F. R. Harmon, R. K. Busch, W. Spohn, C. S. Subramanyam and H. Busch, Life Sci. 30, 469 (1982).
30. F. R. Harmon, W. H. Spohn, N. Domae, C. Soo Ha and H. Busch, Cell Biol. Int. Rep. 7, 333 (1983).
31. A. Vincent, S. Goldenberg, N. Standart, O. Civelli, M.-T. Imaizumi-Scherrer, K. Maundrell and K. Scherrer, Mol. Biol. Rep. 7, 71 (1981).
32. K. Maundrell, E. S. Maxwell, O. Civelli, A. Vincent, S. Goldenberg, J.-F. Buri, M.-T. Imaizumi-Scherrer and K. Scherrer, Mol. Biol. Rep. 5, 43 (1979).
33. H. P. Schmid, O. Akhayat, C. Martins de Sa, F. Puvion, K. Koehler and K. Scherrer, EMBO J. 3, 29 (1984).
34. B. Hugle, J. A. Kleinschmidt and W. W. Franke, Eur. J. Cell Biol. 32, 157 (1983).
35. J. A. Kleinschmidt, B. Hugle, C. Grund and W. W. Franke, Eur. J. Cell Biol. 32, 143 (1983).
36. C. Schuldt and P. M. Kloetzel, Dev. Biol. 110, 65 (1985).
37. A. P. Arrigo, J. L. Darlix, E. W. Khandjian, M. Simon and P. F. Spahr, EMBO J. 4, 399 (1985).
38. B. Dahlmann, L. Kuehn, S. Ishiura, T. Tsukahara, H. Sugita, K. Tanaka, J. Rivett, R. F. Hough, M. Rechsteiner, D. L. Mykles, J. M. Fagan, L. Waxman, S. Ishii, M. Sasaki, P. M. Kloetzel, H. Harris, K. Ray, F. J. Behal, G. N. DeMartino and M. J. McGuire, BJ 255, 750 (1988).
39. M. Orlowski and S. Wilk, BJ 255, 751 (1988).
40. H.-G. Nothwang, O. Coux, F. Bey and K. Scherrer, EJB 207, 621 (1992).
41. A. J. Rivett, JBC 264, 12215 (1989).
42. M. Orlowski, Bchem 29, 10289 (1990).
43. K. Tanaka, T. Tamura, T. Yoshimura and A. Ichihara, New Biologist 4, 173 (1992).
44. J. Driscoll and D. Finley, Cell 68, 823 (1992).
45. A. Hershko and A. Ciechanover, ARB 61, 761 (1992).
45b. B. Richter-Ruoff and D. H. Wolf, FEBS Lett. 336, 34 (1993).
46. A. J. Rivett and E. Knecht, Curr. Biol. 3, 127 (1993).
47. A. J. Rivett, BJ 291, 1 (1993).
48. M. Rechsteiner, L. Hoffman and W. Dubiel, JBC 268, 6065 (1993).
49. M. Orlowski, J. Lab. Clin. Med. 121, 187 (1993).
50. W. Hilt and D. H. Wolf, Mol. Microbiol. 6, 2437 (1992).
51. D. Arnold, J. Driscoll, M. Androlewicz, E. Hughes, P. Cresswell and T. Spies, Nature 360, 171 (1992).
52. F. Momburg, V. Ortiz-Navarrete, J. Neefjes, E. Goulmy, Y. Van de Wal, H. Spits, S. J. Powis, G. W. Butcher, J. C. Howard, P. Walden and G. J. Hammerling, Nature 360, 174 (1992).
52b. J. Driscoll, M. G. Brown, D. Finley and J. J. Monaco, Nature 365, 262 (1993).
52c. M. Gaczynska, K. L. Rock and A. L. Goldberg, Nature 365, 6443 (1993).
53. M. Girard and D. Baltimore, PNAS 56, 999 (1966).
54. K. Scherrer, Abh. Dtsch. Akad. Wiss. Berlin, Kl. Med. 1968, 259 (1968).
55. K. Scherrer, Adv. Exp. Med. Biol. 44, 169 (1974).
56. K. Scherrer, in “Eukaryotic Gene Regulation” (G. Kolodny, ed.), Vol. 1, p. 57. CRC Press, Boca Raton, Florida, 1980.
57. C. Morel, B. Kayibanda and K. Scherrer, FEBS Lett. 18, 84 (1971).
58. E. M. Lukanidin, E. S. Zalmanzon, L. Komaromi, O. P. Samarina and G. P. Georgiev, Nature NB 238, 193 (1972).
59. L. Lothstein, H. P. Arenstorf, S. Y. Chung, B. W. Walker, J. C. Wooley and W. M. LeStourgeon, J. Cell Biol. 100, 1570 (1985).
60. G. P. Leser, J. Escara-Wilke and T. E. Martin, JBC 259, 1827 (1984).
61. K. Maundrell and K. Scherrer, EJB 99, 225 (1979).
62. A. Spirin, EJB 10, 20 (1969).
63. M.-T. Imaizumi-Scherrer, K. Maundrell, O. Civelli and K. Scherrer, Dev. Biol. 93, 126 (1982).
64. G. Spohr, B. Kayibanda and K. Scherrer, EJB 31, 194 (1972).
65. N. Standart, A. Vincent and K. Scherrer, FEBS Lett. 135, 56 (1981).
66. A. Vincent, S. Goldenberg and K. Scherrer, EJB 114, 179 (1981).
67. S. Goldenberg, A. Vincent and K. Scherrer, NARes 6, 2787 (1979).
68. A. Vincent, O. Akhayat, S. Goldenberg and K. Scherrer, EMBO J. 2, 1869 (1983).
69. R. Cammack, Curr. Biol. 3, 41 (1993).
70. C. Martins de Sa, M.-F. Grossi de Sa, O. Akhayat, F. Broders, K. Scherrer, A. Horsch and H. P. Schmid, JMB 187, 479 (1986).
71. J. Hesketh, Biochem. Soc. Trans. 19, 1103 (1991).
72. J. E. Hesketh and I. F. Pryme, BJ 277, 1 (1991).
73. K. L. Taneja, L. M. Lifshitz, F. S. Fay and R. H. Singer, J. Cell Biol. 119, 1245 (1992).
74. M.-F. Grossi de Sa, C. Martins de Sa, F. Harper, M. Olink-Coux, M. Huesca and K. Scherrer, J. Cell Biol. 107, 1517 (1988).
75. M. Olink-Coux, M. Huesca and K. Scherrer, EJB 59, 148 (1992).
76. C. Arcangeletti, M. Olink-Coux, R. Minisini, M. Huesca, C. Chezzi and K. Scherrer, Eur. J. Cell Biol. 59, 464 (1992).
77. O. Skalli and R. D. Goldman, Cell Motil. Cytoskel. 19, 67 (1991).
78. G. Zambetti, E. G. Fey, S. Penman, J. Stein and G. Stein, J. Cell. Biochem. 44, 177 (1990).
79. R. Lenk, L. Ransom, Y. Kaufmann and S. Penman, Cell 10, 67 (1977).
80. R. H. Singer, Curr. Opin. Cell Biol. 4, 15 (1992).
81. L. Mosquera, C. Forristall, Y. Zhou and M. L. King, Development 117, 377 (1993).
82. D. Melton, Science 252, 234 (1991).
83. K. Scherrer, Proc. FEBS Congr., 16th B, 79 (1985).
84. K. Scherrer, Biosci. Rep. 9, 157 (1989).
85. L. Manuelidis and J. Borden, Chromosoma 96, 397 (1988).
86. J. B. Lawrence, R. H. Singer and L. M. Marselle, Cell 57, 493 (1989).
87. G. Blobel, PNAS 82, 8527 (1985).
88. M. D. Pondel and M. L. King, PNAS 85, 7612 (1988).
89. P. Zwickl, A. Grziwa, G. Puhler, B. Dahlmann, F. Lottspeich and W. Baumeister, Bchem 31, 964 (1992).
90. H.-G. Nothwang, O. Coux, F. Bey and K. Scherrer, BJ 287, 733 (1992).
91. J. A. Kleinschmidt, C. Escher and D. H. Wolf, FEBS Lett. 239, 35 (1988).
92. C. Haass and P. M. Kloetzel, Exp. Cell Res. 180, 243 (1989).
93. A. Grziwa, W. Baumeister, B. Dahlmann and F. Kopp, FEBS Lett. 290, 186 (1991).
94. B. Dahlmann, L. Kuehn, A. Grziwa, P. Zwickl and W. Baumeister, EJB 208, 789 (1992).
95. J. K. Pal, P. Gounon, M.-F. Grossi de Sa and K. Scherrer, J. Cell Sci. 90, 555 (1988).
96. J. P. Bureau, M. Olink-Coux, S. Bayle-Julien, P. Vago, O. Coux, M. Huesca, V. Aguilar, M. Herzberg and K. Scherrer, EJCB in press (1995).
97. J. P. Bureau, L. Garrelly, P. Vago, S. Bayle, M. Olink-Coux, V. Aguilar and K. Scherrer, Biol. Cell 67, 22 (1989).
98. F. Bey, I. Silva-Pereira, O. Coux, E. Viegas-Péquignot, F. Recillas Targa, H.-G. Nothwang, B. Dutrillaux and K. Scherrer, MGG 237, 193 (1993).
99. O. Coux, H.-G. Nothwang, I. Silva-Pereira, F. Recillas Targa, F. Bey and K. Scherrer, MGG in press (1994).
100. S. A. Adam, T. Nakagawa, M. S. Swanson, T. K. Woodruff and G. Dreyfuss, MCBiol 6, 2932 (1986).
101. C. C. Query, R. C. Bentley and J. D. Keene, Cell 57, 89 (1989).
102. J. D. Keene and C. C. Query, This Series 41, 179 (1991).
103. K. Tanaka, T. Yoshimura, T. Tamura, T. Fujiwara, A. Kumatori and A. Ichihara, FEBS Lett. 271, 41 (1990).
104. P. Zwickl, F. Lottspeich, B. Dahlmann and W. Baumeister, FEBS Lett. 278, 217 (1991).
105. C. Haass, B. Pesold-Hurt and P. M. Kloetzel, NARes 18, 4018 (1990).
106. P. Zwickl, F. Lottspeich and W. Baumeister, FEBS Lett. 312, 157 (1992).
107. G. C. Group, GCG Package Version 7, Alternate journal (1991).
108. P. Dessen, C. Fondrat, C. Valencien and C. Mugnier, CABIOS 6, 355 (1990).
109. W. Heinemeyer, J. A. Kleinschmidt, J. Saidowsky, C. Escher and D. H. Wolf, EMBO J. 10, 555 (1991).
110. P. M. Kloetzel, P. E. Falkenburg, P. Hossl and K. H. Glatzer, Exp. Cell Res. 170, 204 (1987).
111. H. E. Skilton, I. C. Eperon and A. J. Rivett, Biochem. Soc. Trans. 17, 1124 (1989).
112. B. Dineva, W. Tomek, K. Kohler and H. P. Schmid, Mol. Biol. Rep. 13, 207 (1989).
113. A. Horsch, K. Kohler, M. Ellwart-Tschum and H. P. Schmid, FEBS Lett. 269, 336 (1990).
114. W. A. Haseltine, in “The Human Retroviruses” (R. C. Gallo and G. Jay, eds.), p. 69. Academic Press, San Diego, 1991.
115. M. L. Gougeon, V. Colizzi, A. Dalgleish and L. Montagnier, AIDS Res. Hum. Retroviruses 9, 287 (1993).
116. A. Ciechanover, S. L. Wolin, J. A. Steitz and H. F. Lodish, PNAS 82, 1341 (1985).
117. T. Fanning and M. Singer, NARes 15, 2251 (1987).
118. N. Okada, Current 1, 498 (1991).
119. A. S. Spirin, N. V. Belitsina and M. I. Lerman, JMB 14, 611 (1965).
121. J. L. Grainger and M. M. Winkler, MCBiol 7, 3947 (1987).
122. J. L. Grainger and M. M. Winkler, J. Cell Biol. 109, 675 (1989).
123. O. Civelli, A. Vincent, K. Maundrell, J. F. Buri and K. Scherrer, EJB 107, 577 (1980).
124. A. Horsch, C. Martins de Sa, B. Dineva, E. Spindler and H. P. Schmid, FEBS Lett. 246, 131 (1989).
125. L. Kuehn, B. Dahlmann and F. Kopp, FEBS Lett. 261, 274 (1990).
126. E. E. Wyckoff, D. E. Croall and E. Ehrenfeld, Bchem 29, 10055 (1990).
127. E. E. Wyckoff, J. Hershey and E. Ehrenfeld, PNAS 87, 9529 (1990).
128. S. Penman, K. Scherrer, Y. Becker and J. E. Darnell, PNAS 49, 654 (1963).
129. H. P. Schmid, dissertation, Universität Stuttgart, Stuttgart, Germany, 1982.
130. P. E. Falkenburg and P. M. Kloetzel, JBC 264, 6660 (1989).
131. Y. Murakami, S. Matsufuji, T. Kameji, S. Hayashi, K. Igarashi, T. Tamura, K. Tanaka and A. Ichihara, Nature 360, 597 (1992).
132. M. Rechsteiner, Annu. Rev. Cell Biol. 3, 1 (1987).
133. R. Hough, G. Pratt and M. Rechsteiner, JBC 262, 8303 (1987).
134. L. Waxman, J. M. Fagan and A. L. Goldberg, JBC 262, 2451 (1987).
135. E. Eytan, D. Ganoth, T. Armon and A. Hershko, PNAS 86, 7751 (1989).
136. J. Driscoll and A. L. Goldberg, JBC 265, 4789 (1990).
137. A. Seelig, P. M. Kloetzel, L. Kuehn and B. Dahlmann, BJ 280, 225 (1991).
138. E. Orino, K. Tanaka, T. Tamura, S. Sone, T. Ogura and A. Ichihara, FEBS Lett. 284, 206 (1991).
139. A. Ikai, M. Nishigai, K. Tanaka and A. Ichihara, FEBS Lett. 292, 21 (1991).
140. J. M. Peters, J. R. Harris and J. A. Kleinschmidt, Eur. J. Cell Biol. 56, 442 (1991).
141. B. Yu, M. E. Pereira and S. Wilk, JBC 268, 2029 (1993).
142. M. E. Pereira, T. Nguyen, B. J. Wagner, J. W. Margolis, B. Yu and S. Wilk, JBC 267, 7949 (1992).
143. K. Früh, Y. Yang, D. Arnold, J. Chambers, L. Wu, J. B. Waters, T. Spies and P. A. Peterson, JBC 267, 22131 (1992).
144. D. Weitman and J. D. Etlinger, JBC 267, 6977 (1992).
145. K. Tanaka and A. Ichihara, BBRC 158, 548 (1989).
146. K. Tanaka, K. Ii, A. Ichihara, L. Waxman and A. L. Goldberg, JBC 261, 15197 (1986).
147. M. J. McGuire and G. N. DeMartino, BBRC 160, 911 (1989).
148. M. J. McGuire, M. L. McCullough, D. E. Croall and G. N. DeMartino, BBA 995, 181 (1989).
149. K. Murakami and J. D. Etlinger, PNAS 83, 7588 (1986).
150. X. C. Li, M. Z. Gu and J. D. Etlinger, Bchem 30, 9709 (1991).
151. M. Chu-Ping, C. A. Slaughter and G. N. DeMartino, BBA 1119, 303 (1992).
152. L. Hoffman, G. Pratt and M. Rechsteiner, JBC 267, 22362 (1992).
153. M. Chu-Ping, C. A. Slaughter and G. N. DeMartino, JBC 267, 10515 (1992).
154. M. Yukawa, M. Sakon, J. Kambayashi, E. Shiba, T. Kawasaki, H. Ariyoshi and T. Mori, BBRC 178, 256 (1991).
155. X. S. Li and J. D. Etlinger, Bchem 31, 11963 (1992).
156. D. L. Mykles and M. F. Haire, ABB 288, 543 (1991).
157. M. J. McGuire, J. F. Reckelhoff, D. E. Croall and G. N. DeMartino, BBA 967, 195 (1988).
158. J. Driscoll and A. L. Goldberg, PNAS 86, 787 (1989).
159. P. J. Bjorkman and P. Parham, ARB 59, 253 (1990).
160. R. N. Germain and D. H. Margulies, Annu. Rev. Immunol. 11, 403 (1993).
161. D. Kappes and J. L. Strominger, ARB 57, 991 (1988).
162. Y. Yang, B. Waters, K. Früh and P. A. Peterson, PNAS 89, 4928 (1992).
163. N. Brouard, Caractérisation des Prosomes de différentes cellules sanguines, DEA Thesis, University Paris 7 (1993).
164. B. Russell and D. J. Dix, Am. Physiol. Soc. C1 (1992).
165. K. Scherrer, Symp. Sess., 15th Int. Congr. Genet. p. 139 (1983).
166. K. Scherrer and J. Moreau, Proc. FEBS Congr., 26th B, 105 (1985).
167. U. K. Laemmli, Curr. Opin. Genet. Dev. 2, 275 (1992).
168. M.-F. Grossi de Sa, C. Martins de Sa, F. Harper, O. Coux, O. Akhayat, Y. Florentin, J. K. Pal and K. Scherrer, J. Cell Sci. 89, 151 (1988).
169. M. B. Kaltoft, C. Koch, W. Uerkvitz and K. B. Hendil, Hybridoma 11, 507 (1992).
170. A. J. Rivett, A. Palmer and E. Knecht, J. Histochem. Cytochem. 40, 1165 (1992).
171. A. Kumatori, K. Tanaka, N. Inamura, S. Sone, T. Ogura, T. Matsumoto, T. Tachikawa, S. Shin and A. Ichihara, PNAS 87, 7071 (1990).
172. L. Manuelidis, PNAS 81, 3123 (1984).
173. P. Hozák, A. Bassim Hassan, D. A. Jackson and P. R. Cook, Cell 73, 361 (1993).
174. D. Briane, M. Olink-Coux, J. Vassy, O. Oudiu, M. Huesca, K. Scherrer and J. Foucrier, Eur. J. Cell Biol. 57, 30 (1992).
175. M. C. Grand, F. Pinardi, J. Gautron, C. Chezzi, K. Scherrer and J. Foucrier, Cell Biol. Intern. 18, 426 (1994).
176. K. Tanaka and A. Ichihara, BBRC 159, 1309 (1989).
177. H. Kawahara and H. Yokosawa, Dev. Biol. 151, 27 (1992).
178. A. Amsterdam, F. Pitzer and W. Baumeister, PNAS 90, 99 (1993).
179. J. K. Pal, C. Martins de Sa, P. Gounon, M.-F. Grossi de Sa and K. Scherrer, Int. J. Dev. Biol. in press (1994).
180. O. Akhayat, M.-F. Grossi de Sa and A. A. Infante, PNAS 84, 1595 (1987).
181. D. R. Senger and P. R. Gross, Dev. Biol. 65, 404 (1978).
182. D. A. D. Parry and P. M. Steinert, Curr. Opin. Cell Biol. 4, 94 (1992).
183. M. Olink-Coux, C. Arcangeletti, R. Minisini, M. Huesca, C. Chezzi and K. Scherrer, J. Cell Sci. 107, 353 (1994).
184. J. Ldmuesse, M. Olink-Coux, C. Cwiere, B. Matusiak, P. %sin and K. Scherrer, Eur. J. Cell Biol. in press (1995).
185. J. E. Eriksson, P. Opal and R. D. Goldman, Curr. Opin. Cell Biol. 4, 99 (1992).
186. E. Lazarides, Nature 283, 249 (1980).
187. R. D. Goldman, A. E. Goldman, K. J. Green, J. C. R. Jones, S. M. Jones and H. Y. Yang, J. Cell Sci. Suppl. 5, 69 (1986).
188. E. White and R. Cipriani, MCBiol 10, 120 (1990).
189. J. Doorbar, S. Ely, J. Sterling, C. McLean and L. Crawford, Nature 353, 824 (1991).
190. P. M. Steinert and D. R. Roop, ARB 57, 593 (1988).
191. F. Bey, M. Huesca, H.-G. Nothwang, J. P. Bureau and K. Scherrer, PNAS in press (1995).
192. M. Wada, M. Kosaka, S. Saito, T. Sano, K. Tanaka and A. Ichihara, J. Lab. Clin. Med. 121, 215 (1993).
193. J. Gautier, J. K. Pal, M.-F. Grossi de Sa, J. C. Beetschen and K. Scherrer, J. Cell Sci. 90, 543 (1988).
194. U. Klein, M. Gernold and P. M. Kloetzel, J. Cell Biol. 111, 2275 (1990).
195. J. R. Beyette and D. L. Mykles, Muscle Nerve 15, 1023 (1992).
196. J. Arribas, M. L. Rodriguez, R. A. D. Fonio and J. G. Castaño, J. Exp. Med. 173, 423 (1991).
197. A. Bhui, A. Themath, K. Scherrer and J.-P. Bureau, Growth Differ. in press (1995).
198. A. Bhui, A. Themath, K. Scherrer and J.-P. Bureau, Breast Dis. 7, 109 (1994).
199. K. Dhingra and G. Hortobagyi, Breast Dis. 6, 7 (1993).
200. R. Glynne, S. H. Powis, S. Beck, A. Kelly, L.-A. Kerr and J. Trowsdale, Nature 353, 357 (1991).
201. P. Genschik, G. Philipps, C. Gigot and J. Fleck, FEBS Lett. 309, 311 (1992).
202. T. Fujiwara, K. Tanaka, E. Orino, T. Yoshimura, A. Kumatori, T. Tamura, C. H. Chung, T. Nakai, K. Yamaguchi, S. Shin, A. Kakizuka, S. Nakanishi and A. Ichihara, JBC 265, 16604 (1990).
203. Y. Emori, T. Tsukahara, H. Kawasaki, S. Ishiura, H. Sugita and K. Suzuki, MCBiol 11, 344 (1991).
204. P. Haffter and T. D. Fox, NARes 19, 5075 (1991).
205. E. Georgatsou, T. Georgakopoulos and G. Thireos, FEBS Lett. 299, 39 (1992).
206. W. Heinemeyer, A. Gruhler, V. Mohrle, Y. Mahe and D. H. Wolf, JBC 268, 5115 (1993).
207. W. Hilt, C. Enenkel, A. Gruhler, T. Singer and D. H. Wolf, JBC 268, 3479 (1993).
208. D. H. Lee, K. Tanaka, T. Tamura, C. H. Chung and A. Ichihara, BBRC 182, 452 (1992).
209. H. Friedman, M. Goebl and M. Snyder, Gene 122, 203 (1992).
210. C. Haass, B. Pesold-Hurt, G. Multhaup, K. Beyreuther and P. M. Kloetzel, Gene 90, 235 (1990).
211. C. Haass, H. B. Pesold, G. Multhaup, K. Beyreuther and P. M. Kloetzel, EMBO J. 8, 2373 (1989).
212. G. Fujii, K. Tashiro, Y. Emori, K. Saigo, K. Tanaka and K. Shiokawa, BBRC 178, 1233 (1991).
213. M. C. H. M. Van Riel and G. J. M. Martens, FEBS Lett. 291, 37 (1991).
214. S. Frentzel, U. Graf, G. J. Hammerling and P. M. Kloetzel, FEBS Lett. 302, 121 (1992).
215. M. Aki, T. Tamura, F. Tokunaga, S. Iwanaga, Y. Kawamura, N. Shimbara, S. Kagawa, K. Tanaka and A. Ichihara, FEBS Lett. 301, 65 (1992).
216. T. Fujiwara, K. Tanaka, A. Kumatori, S. Shin, T. Yoshimura, A. Ichihara, F. Tokunaga, R. Aruga, S. Iwanaga, A. Kakizuka and S. Nakanishi, Bchem 28, 7332 (1989).
217. K. Tanaka, T. Fujiwara, A. Kumatori, S. Shin, T. Yoshimura, A. Ichihara, F. Tokunaga, R. Aruga, S. Iwanaga, A. Kakizuka and S. Nakanishi, Bchem 29, 3777 (1990).
218. T. Tamura, K. Tanaka, A. Kumatori, F. Yamada, C. Tsurumi, T. Fujiwara, A. Ichihara, F. Tokunaga, R. Aruga and S. Iwanaga, FEBS Lett. 264, 91 (1990).
219. K. Tanaka, H. Kanayama, T. Tamura, D. H. Lee, A. Kumatori, T. Fujiwara, A. Ichihara, F. Tokunaga, R. Aruga and S. Iwanaga, BBRC 171, 676 (1990).
220. A. Kumatori, K. Tanaka, T. Tamura, T. Fujiwara, A. Ichihara, F. Tokunaga, A. Onikura and S. Iwanaga, FEBS Lett. 264, 279 (1990).
221. T. Tamura, N. Shimbara, M. Aki, N. Ishida, F. Bey, K. Scherrer, K. Tanaka and A. Ichihara, J. Biochem. 112, 530 (1992).
222. T. Tamura, D. H. Lee, F. Osaka, T. Fujiwara, S. Shin, C. H. Chung, K. Tanaka and A. Ichihara, BBA 1089, 95 (1991).
223. G. N. DeMartino, K. Orth, M. L. McCullough, L. W. Lee, T. Z. Muan, C. R. Moomaw, P. A. Dawson and C. A. Slaughter, BBA 1079, 29 (1991).
224. I. Silva-Pereira, F. Bey, O. Coux and K. Scherrer, Gene 120, 235 (1992).
225. A. Kelly, S. H. Powis, R. Glynne, E. Radley, S. Beck and J. Trowsdale, Nature 353, 667 (1991).

Biological Implications of the Mechanism of Action of Human DNA (Cytosine-5)methyltransferase

STEVEN S. SMITH
Department of Cell and Tumor Biology
City of Hope National Medical Center
Duarte, California 91010

I. Mechanism of Action of the Human DNA (Cytosine-5)methyltransferase
   A. Sequence of Catalytic Events
   B. sp²→sp³ Energetics and Stereochemistry at C-6 and C-5 of Cytosine
   C. Conformational Change in the Enzyme-DNA Complex
   D. Potential for Proton-mediated Hydrolytic Deamination
II. Selectivity of Human DNA Methyltransferases
   A. De Novo Methylation
   B. Methyl-directed Methylation
   C. Structurally Induced Methylation
   D. The Three-nucleotide Recognition Motif
   E. Enzyme-DNA Interaction at the Asymmetric DNA-Binding Site
III. Biological Implications of the Mechanism
   A. Specificity of Human DNA Methylation
   B. Pattern Formation as the Key to the Function of Vertebrate DNA Methylation
   C. Key Elements of Pattern Formation Are Demonstrated by the Phenomenon of Concerted Modification
   D. Enzymology of Pattern Formation Mechanisms
   E. Enzymology of Disturbances in Patterning Produced by DNA Damage
   F. Deamination at C-C Dinucleotides
IV. Conclusions
References

A central characteristic of any phenomenon in molecular biology is its associated enzymology. Although the full potential contained in a given enzymology is rarely realized biologically, careful enzymological studies are valuable precisely because the limits on what is possible are difficult to deduce by other means. The following is an attempt to define the enzymological boundaries of the phenomenon of human DNA methylation set by the specificity and mechanism of action of the human DNA methyltransferase(s).

I. Mechanism of Action of the Human DNA (Cytosine-5)methyltransferase

One of the most important early clues in deciphering the mechanism of action of the DNA (cytosine-5)methyltransferases (EC 2.1.1.37) came from studies showing that 5-azacytosine is a potent inhibitor of DNA methylation in both prokaryotes and eukaryotes (1-3). The formation of a tight, and presumably covalent, complex between DNA containing 5-azacytosine and DNA (cytosine-5)methyltransferases from both bacterial (4, 5) and human (6) sources is consistent with a mechanism of action involving the production of a covalent dihydrocytosine intermediate in DNA (7). Extensive analysis of the HhaI methyltransferase (8, 9) provided a detailed description of the mechanism, which now appears to be general for the (cytosine-5)methyltransferases from both prokaryotes (8-12) and eukaryotes (13). The general features of the mechanism (Fig. 1) are nucleophilic attack at C-6 of the target cytosine by a cysteine residue in the enzyme to produce a transient 5,6-dihydrocytosine carbanion that, in turn, attacks the incoming methyl group provided by S-adenosylmethionine (AdoMet). Subsequent abstraction of the proton at C-5 and elimination of the covalent link between the enzyme and C-6 release 5-methylcytosine at the methyl acceptor site and regenerate active DNA methyltransferase.
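As a reading aid, the ordered events of this mechanism, and the way the base analogs discussed in this section interrupt them, can be tabulated in a short Python sketch. The names `Step`, `Outcome`, and `encounter` are hypothetical. The analog behavior follows this section's account (5-azacytosine and 2-pyrimidinone give tight complexes without methyl transfer; stable 5-fluorocytosine complexes require AdoMet), except that the reason 5-fluorocytosine stalls after methyl transfer (fluorine rather than hydrogen at C-5, so no elimination) is an assumption added here. It is a bookkeeping summary, not a kinetic or structural model.

```python
from enum import Enum, IntEnum


class Step(IntEnum):
    """Catalytic events, in the order described in the text."""
    COVALENT_ADDUCT = 1  # Cys attacks C-6, giving the 5,6-dihydrocytosine carbanion
    ADOMET_BOUND = 2     # AdoMet binds after the covalent complex has formed
    METHYL_TRANSFER = 3  # C-5 carbanion attacks the methyl group of AdoMet
    ELIMINATION = 4      # C-5 proton abstracted; enzyme-C-6 bond broken


class Outcome(Enum):
    TURNOVER = "5-methylcytosine left in DNA, active enzyme regenerated"
    REVERSIBLE = "adduct reverts; no stable enzyme-DNA complex"
    TRAPPED = "stable covalent enzyme-DNA complex"


def encounter(base: str, adomet: bool) -> tuple[list[Step], Outcome]:
    """Steps reached, and the outcome, for one encounter with a target base.

    Hypothetical bookkeeping summary of the mechanism, not a kinetic model.
    """
    if base in ("5-azacytosine", "2-pyrimidinone"):
        # Tight complex forms without AdoMet binding or methyl transfer.
        return [Step.COVALENT_ADDUCT], Outcome.TRAPPED
    if base not in ("cytosine", "5-fluorocytosine"):
        raise ValueError(f"unknown base: {base!r}")
    if not adomet:
        # The carbanion forms first; with no methyl donor the adduct reverts.
        # (For cytosine, the C-5 proton exchanges with solvent meanwhile.)
        return [Step.COVALENT_ADDUCT], Outcome.REVERSIBLE
    if base == "5-fluorocytosine":
        # Assumption: the methyl group is transferred, but C-5 carries F
        # instead of H, so the elimination step cannot occur.
        return [Step.COVALENT_ADDUCT, Step.ADOMET_BOUND,
                Step.METHYL_TRANSFER], Outcome.TRAPPED
    # Cytosine with AdoMet: all four steps, ending in turnover.
    return list(Step), Outcome.TURNOVER
```

For example, `encounter("5-fluorocytosine", adomet=True)` returns the first three steps with `Outcome.TRAPPED`, mirroring the trapping experiments with FdC-containing DNA, whereas `encounter("cytosine", adomet=True)` runs all four steps to turnover.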

A. Sequence of Catalytic Events

The formation of a covalent bond between the methyltransferase and DNA containing 5-azadeoxycytidine (4, 5) or 2-pyrimidinone-1-β-D-2'-deoxyribofuranoside (10, 11) in the absence of AdoMet, coupled with the kinetics of proton exchange at C-5 in the absence of AdoMet (8), shows that the dihydrocytosine carbanion forms prior to AdoMet binding. Nucleophilic attack at C-6 generates a covalent bond between C-6 of cytosine and the sulfhydryl of a cysteine residue on the enzyme (9, 12, 14) at a conserved Pro-Cys dipeptide (9, 12) that is identified as a conserved sequence in the 24 DNA (cytosine-5)methyltransferases for which sequence information is currently available (15-18).

FIG. 1. Methyltransferase reaction mechanism. C-5 of cytosine or 5-fluorocytosine can be activated as a methyl acceptor through attack by the DNA methyltransferase nucleophile (Nu:) at C-6. Transfer of ⁺CH₃ from AdoMet is followed by β-elimination to liberate 5-methylcytosine at the methyl acceptor site in DNA and active DNA methyltransferase. [Structures labeled: carbanion; dihydrocytosine.]

Tight complex formation with 5-azadeoxycytidine or 2-pyrimidinone-1-β-D-2'-deoxyribofuranoside occurs without AdoMet binding and without methyl transfer. In the case of 5-azadeoxycytidine, it is thought that N-5 becomes protonated, preventing transfer of the methyl group (5). In the case of 2-pyrimidinone-1-β-D-2'-deoxyribofuranoside, the absence of the C-4 amino group somehow prevents methyl transfer, perhaps because hydrogen bonding between N-4 and the enzyme surface is required for methyl transfer. On the other hand, weak binding of AdoMet to its binding site appears to occur in the absence of DNA (18), because up to 4% of the EcoRII methyltransferase molecules can be photo-cross-linked with AdoMet at Cys-186 in the absence of DNA. Cys-186 of EcoRII is the active-site nucleophile (18) and shows sequence homology with the active-site nucleophile (Cys-71) of HaeIII methyltransferase (12). Because this weak binding site lies so close to the active-site nucleophile, it seems unlikely that it can be considered an allosteric site comparable to the one observed for an adenine methyltransferase (19), as has been suggested elsewhere (11). More probably, the binding of DNA enhances the affinity for AdoMet. Moreover, when defined duplex oligodeoxynucleotide substrates are used, the kinetics of methylation by the human enzyme with respect to AdoMet are hyperbolic and not higher order (20). All of this is consistent with the kinetic analysis of the HhaI mechanism (8), which shows that AdoMet binds after the covalent enzyme-DNA complex has formed.
Subsequently, the carbanion at C-5 attacks the methyl group of AdoMet to form the 5,6-dihydrocytosine intermediate. The sequence of steps up to this point is confirmed by the isolation of a covalent complex between the enzyme and DNA containing 5-fluorodeoxycytidine (FdC) (Fig. 2) (9). Such complexes form generally: they have since been demonstrated for the HaeIII methyltransferase (12), the EcoRII methyltransferase (18), and the human DNA methyltransferase (13) with FdC-containing oligodeoxynucleotides. These complexes are stable to heating in the presence of sodium dodecyl sulfate and require AdoMet for their formation. In the absence of AdoMet, the proton at C-5 exchanges with solvent at about seven times the rate of methylation (8). Presumably, formation of the carbanion raises the pKa at C-5 (21), so that in the absence of AdoMet a solvent proton is likely to add at C-5, which accounts for exchange with water. Tritiated methyl groups are found associated with FdC-containing DNA in the enzyme-DNA complex, as visualized by scintillation counting after gel permeation chromatography (9) or by fluorography after gel electrophoresis (13). Stable complexes are not observed in the absence of AdoMet (9, 12, 13). These findings show that methyl transfer is required for the stability of the complex (13). The complex is stable (Fig. 2) because the enzyme cannot catalyze the elimination of either fluorine or the methyl group from C-5 of the dihydrocytosine intermediate, owing to the stability of the C-C and C-F bonds. In the absence of fluorine (i.e., during normal catalysis), the hydrogen at C-5 is abstracted as β-elimination of the nucleophile (Nu:) at C-6 occurs (Fig. 1). β-Elimination regenerates active enzyme and produces S-adenosylhomocysteine (AdoHcy) along with 5-methylcytosine at the target site in DNA.

FIG. 2. Covalent complex formation with fluorocytosine-containing DNA. [Structure shown: abortive dihydrocytosine complex.] Transfer of the methyl group from AdoMet to 5-fluorocytosine results in a stable abortive covalent complex comprising DNA and the enzyme, because neither the C-F nor the C-CH3 bond at C-5 can be broken to permit β-elimination.
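As a reading aid, the ordered sequence of events described above (carbanion formation before AdoMet binding, methyl transfer, then β-elimination) can be written as a toy state machine. This is a hypothetical sketch, not a kinetic model and not the method of the cited studies: the function name `trace_cycle` and its two inputs (the C-5 substituent, 'H' for cytosine or 'F' for 5-fluorocytosine, and whether AdoMet is bound) are illustrative assumptions that encode only the qualitative outcomes reported in the text. Rates, 5-azacytosine, and other analogs are deliberately left out.

```python
from typing import List, Tuple


def trace_cycle(c5_substituent: str, adomet_bound: bool) -> Tuple[str, List[str]]:
    """Toy walk through the cytosine-5 methyltransferase cycle (hypothetical sketch).

    c5_substituent: 'H' (cytosine) or 'F' (5-fluorocytosine).
    adomet_bound: whether AdoMet occupies its site after the covalent complex forms.
    Returns (outcome, steps).
    """
    if c5_substituent not in ("H", "F"):
        raise ValueError("only 'H' and 'F' are modeled in this sketch")
    # Step 1 needs no cofactor: Cys attacks C-6 and the C-5 carbanion forms.
    steps = ["Cys nucleophile attacks C-6: covalent C-5 carbanion forms"]
    if not adomet_bound:
        # Without a methyl donor the carbanion takes up a solvent proton at C-5
        # and the covalent link can reverse: proton exchange, no stable complex.
        steps.append("C-5 exchanges a proton with solvent; enzyme dissociates")
        return "no stable complex", steps
    steps.append("carbanion attacks AdoMet methyl: 5,6-dihydrocytosine intermediate")
    if c5_substituent == "H":
        # The remaining C-5 hydrogen can be abstracted, allowing beta-elimination.
        steps.append("C-5 proton abstracted; beta-elimination frees the enzyme")
        return "turnover: 5-methylcytosine + AdoHcy", steps
    # With fluorine at C-5, neither the C-F nor the C-CH3 bond can be broken.
    steps.append("no C-5 proton to abstract; beta-elimination blocked")
    return "abortive covalent complex", steps


if __name__ == "__main__":
    for substituent in ("H", "F"):
        for bound in (False, True):
            outcome, _ = trace_cycle(substituent, bound)
            print(substituent, "AdoMet" if bound else "no AdoMet", "->", outcome)
```

Only the combination of 5-fluorocytosine and bound AdoMet ends in the trapped complex, which mirrors why stable FdC complexes require AdoMet.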

B. sp2-to-sp3 Energetics and Stereochemistry at C-6 and C-5 of Cytosine

Nucleophilic attack at C-6 requires sp2-to-sp3 rehybridization at C-6 and is expected to induce sp2-to-sp3 rehybridization at C-5 simultaneously. Both the stereochemistry and the energetics of this step are likely to be dictated by constraints inherent in the structure of the DNA substrate. In B-DNA, the right-handed sense of the screw axis requires that the nucleotide (N) 5' to cytosine in the 5'-N-C-G-3' trinucleotide be displaced forward into the major groove relative to the cytosine, while the 3' guanine is recessed behind the cytosine. Consequently, C-6 of cytosine is more accessible from the face of the ring adjacent to the 3' guanine (3' face) than from the face adjacent to the nucleotide 5' to the cytosine (5' face). In fact, the inability of methyltransferases to methylate Z-DNA (22-24) may be due to the inaccessibility of C-6 from either face when DNA is in the Z conformation (8). Assuming that the intermediate must form within the confines of the B-DNA double helix places several useful constraints on its formation. Molecular models of the intermediate in which β-mercaptoethanol represents the active-site nucleophile within the canonical B-DNA structure (Fig. 3) show no energetic reason for rehybridization at C-5 to be stereospecific: the sp3 orbital carrying the electron pair in the carbanion can be accommodated either cis or trans to the nucleophile at C-6 (25). Once the methyl group is accepted at C-5 to form the dihydrocytosine intermediate, molecular modeling calculations (25) suggest that it may be added cis to the active-site nucleophile without energetically unfavorable interactions with the DNA. These results are consistent with analogous studies of the structurally related pyrimidine cis-thymine glycol in DNA (26); they also suggest that DNA structure need not be altered more dramatically during methyl transfer than during formation of the carbanion. On the basis of these considerations, it seems reasonable to suggest that the enzyme possesses a mechanism for stereospecific production of the carbanion, so that the electron pair in the sp3 orbital at C-5 is positioned cis to the active-site nucleophile on the 3' face of the cytosine ring. This would permit the intermediate to attack the methyl group of AdoMet once AdoMet is bound to the active site. Stereospecific abstraction of the proton would then proceed as trans β-elimination. As pointed out previously (8), cis addition followed by trans elimination is consistent with tritium exchange at C-5 in the absence of AdoMet if the carbanion can accept an unlabeled solvent proton in the cis position. This, in turn, suggests that the overall reaction might proceed by cis addition and trans elimination (8). While cis addition and trans elimination may be obligatory facets of the forward reaction in the presence of AdoMet, the failure of the enzyme to form a stable complex with FdC-containing DNA in the absence of AdoMet (12, 13) strongly suggests that they are not obligatory facets of the reverse reaction before methyl transfer; that is, the reverse reaction can proceed by either cis or trans elimination. Thus, at least part of the proton exchange may occur through random proton loss from a C-5 methylene in the 5,6-dihydrocytosine intermediate during reversal of the reaction when no methyl group has been transferred from AdoMet. Proton loss would transiently regenerate the carbanion before the sp3 carbons at C-5 and C-6 rehybridize to the sp2 state as the nucleophile withdraws.

FIG. 3. Comparison of the crystal structure of the A·C mispair with a molecular model of the carbanion. [Panels: A·C mispair (transition-state analog); β-mercaptoethanol carbanion (transition-state model).] The hybridization state at C-5 of the methyl-accepting cytosine is indicated for each model. (Top) C-5 carbanion formed with CH3-CH2-S– at C-6 of cytosine. (Bottom) The structure of the A·C mispair constructed from the crystallographic coordinates provided by W. Hunter and O. Kennard (Cambridge University). All models are canonical B-DNA.

C. Conformational Change in the Enzyme-DNA Complex

Work with the human methyltransferase uncovered an exceptional acceleration of the de novo reaction by DNA molecules containing unusual structures (20, 27-34). In those unusual structures in which a cytosine ring is activated as a methyl acceptor, the accepting cytosine is protonated at N-3 and/or displaced out of its normal stacked position into the major groove. Acceleration of the reaction by the unprotonated cytosine ring in the A·C mispair (Fig. 3) suggests that a conformational change in DNA occurs during formation of the enzyme-DNA complex (20, 31). This change was expected to accompany either the attack on C-6 by the active-site nucleophile or the attack on the methyl group of AdoMet by the electron pair in the sp3 orbital at C-5 of the carbanion. According to current theory (35), the displaced cytosine is a transition-state analog for which the methyltransferase would have a high affinity. A further acceleration of the reaction by structures containing a C·C mispair (20, 29, 31), in which one of the cytosines is both protonated and displaced into the major groove, suggests that protonation of N-3 facilitates nucleophilic attack at C-6. Ab initio calculations of the charge distribution on the protonated cytosine ring (36) support this notion, since protonation at N-3 does increase the net positive charge at C-6. On this basis, one can understand the acceleration of the reaction by mispairs that are thought to be protonated (29, 36). Molecular modeling (25) of both the carbanion and the 5,6-dihydrocytosine intermediate, with β-mercaptoethanol as an analog of the active-site nucleophile, suggests that both intermediates require a moderate displacement into the major groove regardless of stereochemistry.
Recent crystallographic data on the covalent enzyme-DNA complex formed between the HhaI methyltransferase and an FdC-containing 13-mer (36a) show that the cytosine ring is, in fact, completely extrahelical during catalysis by this methyltransferase. The amino acids that interact with the extrahelical target cytosine in M.HhaI are highly conserved in the human enzyme (17), strongly suggesting that both the carbanion and the dihydrocytosine intermediates are also extrahelical in the human enzyme. Cys-1105 is identified as the active-site nucleophile of the human enzyme (17) by homology with Cys-81 of M.HhaI (8, 15, 17). While the molecular modeling approach predicts a displacement of cytosine into the major groove, the displacement observed in the crystal structure of the HhaI methyltransferase is far more pronounced than that seen in the models. The target cytosine adopts a completely extrahelical conformation in the M.HhaI complex because Gln-237 of M.HhaI displaces the cytosine ring from the helix and forms hydrogen bonds with the DNA that preserve the structure of the bound DNA molecule. One additional point is worth noting. According to the foregoing, AdoHcy, sinefungin, and similar methyltransferase inhibitors (10, 11, 37) probably block the reaction at the carbanion stage, with the cytosine in the extrahelical conformation. This displacement offers an explanation for the inability of the enzyme to move along the Z axis of the helix, as noted by others (8), and also explains the well-known stability of the ternary complexes formed with these inhibitors.

D. Potential for Proton-mediated Hydrolytic Deamination

The potential for hydrolytic deamination of the 5,6-dihydrocytosine intermediate is well known (21, 38, 39), and it was originally thought that cytosine deaminase might operate by a mechanism similar to that described above for the DNA methyltransferase. This mechanism was ruled out for cytosine deaminase, which appears to operate by direct addition of water to the 3,4 double bond (39). Nevertheless, certain chemical mutagens do appear to act by this mechanism (40, 41), and the potential for deamination is inherent in the mechanism of action of the methyltransferase (13, 42-44). The two key requirements in a hypothetical deamination scheme involving the methyltransferase reaction are exposure of the 5,6-dihydrocytosine intermediate to water and exposure to protonation. In the absence of AdoMet, M.HpaII produces uracil at a slow rate (42). This side reaction is prevented by the binding of AdoHcy, suggesting that deamination involves protonation both at N-3 and at the C-5 sp3 orbital carrying the electron pair in the carbanion, followed by attack of water on the 3,4 double bond. Protons and solvent water would reach the intermediate through the route taken by AdoMet to the AdoMet-binding site, and bound AdoHcy would block water access to the intermediate and prevent hydrolysis. In effect, the enzyme could be viewed as sealed against water (and therefore against deamination) when cofactor is bound. For M.HhaI, a channel for AdoMet binding that could also provide a route for water in the absence of AdoMet has been observed (36a). Production of thymine through deamination of the dihydrocytosine intermediate can occur only after methyl transfer, and would require protonation of N-3 followed by attack of water on the 3,4 double bond. Because methyl transfer requires bound AdoMet, and the AdoHcy produced by methyl transfer is present during β-elimination, water would not have access to the intermediate according to the mechanistic considerations given above. Thus, DNA containing thymine is not produced by the human enzyme (13) or by the bacterial HpaII methyltransferase (44). Binding AdoMet and AdoHcy so as to exclude water from the substrate may be a constraint unique to the cytosine methyltransferases. A sequence motif shared with protein and other nucleic acid methyltransferases (45) forms part of a hydrophobic region, including Phe-18 of M.HhaI, that interacts with AdoMet (46); this residue is conserved as Phe-1024 of the human enzyme. However, most of the other amino acids that interact with AdoMet in M.HhaI, while conserved among the cytosine methyltransferases (46), lie outside this more widely distributed motif identified for protein methyltransferases and adenine methyltransferases (45). Although no single methyltransferase has been tested for every phenomenon noted above, it is remarkable that the facts can be assembled into a consistent mechanism that appears to apply to methyltransferases from both bacterial and mammalian sources. This mechanistic consistency is reflected in protein sequence homology (15-17), which suggests that the catalytic and cofactor-binding sites of the cytosine methyltransferases share a common evolutionary origin.
In the older literature on this point (15, 16), sequence comparisons between mouse and bacterial enzymes suggested that sequences highly conserved in bacteria lack the expected homology with sequences from the mouse gene. This discrepancy was particularly confusing with regard to the apparent lack of homology at the active site, which carries the hallmark Gly--ProCys---PheSer in nearly all the bacterial enzymes (15, 16). It now appears that the sequence originally reported for the mouse cDNA (47) contained artifactual insertions and deletions that resulted in a number of errors in the predicted amino acids. The corrected sequence of the mouse gene (17) is nearly 80% homologous with the human sequence, and both known mammalian sequences (human and mouse) now show good homology with the highly conserved sequences of the bacterial enzymes, including the Gly--ProCys---PheSer sequence at the active site (46).


II. Selectivity of Human DNA Methyltransferases

Although the cytosine methyltransferases share a common mechanism of action, the substrate specificities represented in the group are very broad. Primordial methyltransferases might have had either the stringent specificities associated with modern restriction methyltransferases or the more relaxed sequence specificity exhibited by the mammalian enzymes. One well-characterized prokaryotic methyltransferase retains a moderately relaxed specificity for the C-G dinucleotide pair¹ (25, 48, 49). Such enzymes can methylate random DNA sequences at least 16 times more frequently than methyltransferases with a tetrameric recognition site, and 256 times more frequently than those with a hexameric recognition site. Thus, specificity for the dinucleotide gives 5-methylcytosine-related properties the potential for broader influence. Human DNA methyltransferase(s) exhibit a relaxed sequence specificity, recognizing only a three-nucleotide motif within the C-G dinucleotide pair; additional flanking sequence is not specified (31). For the human enzyme, three modes of methylation can be distinguished on the basis of reaction rate: (1) de novo methylation, defined as the slow methylation of accepting cytosine residues in B-DNA that contains the four standard nucleotides but lacks 5-methylcytosine; (2) methyl-directed methylation, defined as the rapid methylation of an accepting cytosine residue in response to a symmetrically placed 5-methylcytosine on the opposite strand; and (3) structurally induced methylation, defined as the rapid methylation of DNA containing an unusual structure.
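The 16-fold and 256-fold figures follow from simple probability, assuming an idealized random sequence with equal, independent base frequencies (real genomes are neither random nor unbiased at C-G). Under that assumption a specific site of length n starts at a given position with probability 1/4^n, so a dinucleotide site is 4^2 = 16 times more frequent than a tetranucleotide site and 4^4 = 256 times more frequent than a hexanucleotide site. A minimal check, with a hypothetical helper name:

```python
from fractions import Fraction


def site_probability(site_length: int) -> Fraction:
    """Chance that a given position starts one specific site of this length,
    assuming independent, equiprobable bases (A, C, G, T)."""
    return Fraction(1, 4 ** site_length)


dinucleotide = site_probability(2)  # e.g., C-G
tetramer = site_probability(4)      # a four-base recognition site
hexamer = site_probability(6)       # a six-base recognition site

print(dinucleotide / tetramer)  # 16
print(dinucleotide / hexamer)   # 256
```

Exact fractions are used so the ratios come out as whole numbers rather than floating-point approximations.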

A. De Novo Methylation

Because the importance of structurally induced methylation has only recently been recognized, the earlier literature does not distinguish it from de novo methylation as defined above. The distinction is most easily understood by focusing on work with well-characterized oligodeoxynucleotide substrates and the human enzyme. Early work showed that oligodeoxynucleotides could serve as substrates for DNA methyltransferase, and that restriction analysis of the products could help assign sites of methylation (50-52). To study the specificity of the human enzyme in detail, duplex oligodeoxynucleotides were designed around restriction sites that would permit unequivocal assignment of the sites of methylation in the DNA products (13, 20, 27-32).

¹ "C-G dinucleotide pair" refers to the self-complementary (C-G)·(G-C) quartet (see Fig. 5).


Studies of the de novo reaction as defined above were carried out with sets of complementary oligodeoxynucleotides designed to adopt an uninterrupted B-DNA conformation. Gel electrophoretic analysis of equimolar mixtures of ³²P-end-labeled complementary strands showed the formation of stoichiometric duplexes under nondenaturing conditions. Single-stranded and duplex DNA were well separated under these conditions, and more than 95% of the label migrated at the position of duplex DNA, with only a trace at the position of single strands (20). Cleavage with a battery of restriction enzymes followed by electrophoretic analysis of the products showed that the synthetic duplexes were recognized and properly cleaved by these enzymes, strongly suggesting that the DNA was bona fide B-DNA (20). Initial-velocity studies with highly purified preparations of the human enzyme showed that it catalyzes de novo methylation of these well-characterized B-DNA duplexes at an exceedingly slow rate (20). The kinetics of the reaction with respect to DNA were hyperbolic, indicating a single DNA-binding site on the enzyme or perhaps multiple noncooperative binding sites for DNA (unpublished). This weak de novo reaction was selective for the C-G dinucleotide, but a low level of methylation occurred at C·G base pairs outside this dinucleotide (20). Each of the cytosines in the C-G dinucleotide pair was a methyl acceptor, although at the short reaction times used in initial-velocity studies the product was not necessarily methylated on both strands.
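For reference, "hyperbolic" kinetics means that initial velocity follows the Michaelis-Menten rectangular hyperbola in substrate concentration, whereas cooperative ("higher-order") binding is commonly described with the empirical Hill equation; with a Hill coefficient of 1 the two coincide. The symbols below are standard textbook notation, not values reported in the studies cited; [S] stands for the concentration of DNA (or of AdoMet, as discussed for the AdoMet dependence earlier).

```latex
\begin{aligned}
v &= \frac{V_{\max}\,[S]}{K_m + [S]}
  && \text{hyperbolic (Michaelis--Menten)}\\[4pt]
v &= \frac{V_{\max}\,[S]^{n_H}}{K_{0.5}^{\,n_H} + [S]^{n_H}}
  && \text{Hill form: } n_H = 1 \text{ gives the hyperbola, } n_H > 1 \text{ a sigmoid}
\end{aligned}
```

Identical, independent (noncooperative) binding sites also yield the hyperbolic form, which is why the data cannot by themselves distinguish one DNA-binding site from several noncooperative ones.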

B. Methyl-directed Methylation

Clear enzymological evidence for methyl-directed methylation was first obtained with murine DNA (cytosine-5)methyltransferase (53). These experiments used enzymatically synthesized duplex φX174 DNA in which every cytosine residue in one strand was replaced by 5-methylcytosine. This hemimethylated substrate was methylated about 100-fold faster than control DNA in which the enzymatically synthesized strand contained only the four common bases. Analysis of the DNA product showed that the methyl groups applied by DNA methyltransferase were confined to C-G dinucleotides on the unmethylated strand (53). The strong stimulation of activity by the 5-methylcytosine-containing strand and the selectivity for the C-G dinucleotide suggested that the activity was that of a methyl-directed methyltransferase (maintenance methylase) that could play a role in the somatic inheritance of mammalian methylation patterns (54, 55). Enzymatically synthesized DNAs prepared by this method were later used to test the specificity of the enzyme for the methyl group in 5-methylcytosine (27). A series of enzymatically synthesized substrates in which all cytosines on one strand were replaced by 5-fluorocytosine, 5-bromocytosine, 5-methylcytosine, or 5-iodocytosine was used to show that the human DNA methyltransferase is optimally suited to respond to a methyl group at C-5 of cytosine. The relative effectiveness of the C-5 substituents in stimulating the reaction was H < F < Br ≤ CH3 > I (Fig. 4). Apparently, the methyl-directed activity possesses a mechanism whereby the size and nature of the substituent at C-5 of cytosine is sensed as a signal that directs methylation to the opposite strand. The optimization of this mechanism for the C-5 methyl group clearly suggests that the human DNA methyltransferase evolved to recognize a methyl group at this position. Experiments with enzymatically synthesized DNAs have the disadvantage that all of the cytosine residues on one strand of the substrate are replaced; in the case of 5-methyl substitution, the affected sites are unavailable as methyl acceptors. Thus, to rule out the possibility that these substrates simply preclude methylation of one strand, it was important to test enzymatic specificity with synthetic oligodeoxynucleotides containing a single cytosine methyl group in either of the two possible orientations in the C-G dinucleotide pair. Oligodeoxynucleotides were designed so that the position of the [³H]methyl group applied by the enzyme could be assigned unequivocally by restriction analysis of the tritiated DNA products produced by methyl transfer from [³H]methyl-AdoMet (13, 20, 31). While cytosines outside the C-G site acted as weak methyl acceptors in the de novo reaction (as discussed above), a lone methyl group on one strand directed methylation to a single methyl acceptor on the other strand: the symmetrically placed cytosine residue in the C-G dinucleotide pair (20, 31). Thus, enzymatic methylation of either of the two asymmetrically methylated duplexes yields a single symmetrically methylated duplex. Moreover, the methyl group stimulated the reaction about 100-fold over the de novo rate from either orientation. This suggests that sequences flanking the C-G dinucleotide are not used by the enzyme, because the sequence outside the C-G in these substrates is related only by a pseudo-twofold axis. Asymmetrically methylated trinucleotides of the type C-N-G were not substrates for the methyl-directed reaction carried out by the human enzyme (20). The mechanism by which a cytosine methyl group on one strand stimulates the reaction is unknown. Rate enhancements may reflect an increased affinity of the enzyme for asymmetrically methylated DNA compared with unmethylated B-DNA. In fact, the HaeIII (56) and MspI (11) methyltransferases appear to have an increased affinity for asymmetrically methylated DNA. We have recently measured the reaction rates of bacterial methyltransferases with asymmetrically methylated substrates and detected only a doubling of the reaction rate induced by appropriate asymmetric methylation for M.HpaII and M.SssI. This increase is detected as a constant net reaction rate coupled with the confinement of methylation to the single unmethylated cytosine in the sequence recognized by these enzymes (25, %a).

FIG. 4. Relative effectiveness of C-5 substituents as methyl directors. Relative methyltransferase reaction rate is plotted as a function of the van der Waals radius of the atom or group at C-5 of cytosine. (Inset) Molecular models of the substituted cytosine ring show the N-4 amino group at 12 o'clock and the group at C-5 at 2 o'clock.

C. Structurally Induced Methylation

Given the selectivity of biological methylation for the C-G dinucleotide pair, one might expect any alteration in this sequence to inhibit DNA methyltransferase. However, careful consideration of the enzyme mechanism suggests that the enzyme should preferentially bind DNA at C-G sites in which cytosines are unstacked or protonated, since these two conditions would, on the one hand, mimic the transition state of the reaction (29, 31, 35) and, on the other, facilitate the nucleophilic attack required to form the transition state (29, 31). Indeed, heteroduplexes that alter the structure of the C-G dinucleotide are excellent substrates for de novo and methyl-directed methylation by the human enzyme (20, 28, 29, 31, 32, 57-62). Relative to the rate observed for Watson-Crick-paired duplexes, enhancements ranging from two- to 100-fold have been observed for heteroduplexes constructed in a single oligodeoxynucleotide sequence background (31). Thus, the nature of the structural anomaly introduced by the heteroduplex not only affects the observed reaction rate but also determines the site of methylation.

[Figure: C-G sites carrying a lesion X; panels labeled "Rate increased by mispair or damage at X" and "Rate decreased by mispair or damage at X"; m marks 5-methylcytosine and * marks the site of methylation.]

FIG. 5. Interaction of a methyl director and a site of damage. DNA methyltransferase generally methylates oligodeoxynucleotides that contain a site of damage (X) at the site normally occupied by the guanine base-paired with the methyl acceptor. Reaction rate increases of up to 100-fold have been observed. On the other hand, the enzyme will not methylate oligodeoxynucleotides that contain a site of damage at the site normally occupied by the guanine base-paired with the methyl director. Methylation, when it is observed, occurs at the asterisk.

When a thymine replaces one of the cytosines in what would otherwise be a C-G dinucleotide, an approximately twofold rate enhancement is observed, with selective methylation of the intact C-G dinucleotide (28). Similarly (Fig. 5), when one of the guanines in the same C-G is replaced by O6-methylguanine (31, 58-62), the paired cytosine in the C-G on the opposite strand is methylated at four times the rate (31). Replacing either guanine residue of this C-G dinucleotide with adenine increases the rate 12- to 14-fold for the cytosine on the opposite strand in the resulting A·C mispair. Introducing tetrahydrofuran (as an abasic-site analog) at a position otherwise occupied by a guanine residue in the C-G dimer gives a similar 13-fold rate enhancement for the C-G cytosine opposite the lesion (31). Introducing a lesion at this same C-G that generates both a C·C mispair and an adjacent A·C mispair results in a 100-fold stimulation and precise selectivity for only one of the cytosines in the C·C mispair. In summary, the unusual structures introduced by heteroduplex formation stimulate the de novo reaction rate of the human enzyme by up to 100-fold over the de novo rate observed with a Watson-Crick-paired duplex. This difference is consistent with the predictions of the enzyme mechanism, and it is significant because the structurally induced de novo rate is comparable to the methyl-directed rate. Interestingly, heteroduplex formation at the C-G dinucleotide induced selectivity for certain cytosine residues present in a three-nucleotide motif (Figs. 5 and 6) at the site of methylation in each substrate (31, 58). The methyl-directed reaction rate was also enhanced by a structural anomaly, as long as the methyl acceptor in the structurally anomalous DNA was symmetrically specified by the methyl director (cf. center and bottom of Fig. 5). Thus, adenine or O6-methylguanine increased the rate when substituted for the guanine adjacent to the 5-methylcytosine, but inhibited the reaction when substituted for the guanine base-paired with the 5-methylcytosine.

FIG. 6. The three-nucleotide motif. The three essential nucleotides in the human DNA methyltransferase recognition site are shown as they would appear from a vantage point above the major groove in B-DNA. The cytosine methyl director is shown with a methyl group at C-5, although H, F, Br, and I act as weaker methyl directors. A guanine is shown base-paired with the cytosine methyl director, although hypoxanthine and 7-deazaguanine can be substituted at this site.

D. The Three-nucleotide Recognition Motif Analysis of all data in which the site of methylation could be unequivocally established showed that the human enzyme recognizes a threenucleotide motif consisting of a cytosine or a 5-methylcytosine residue, its base-paired guanine residue, and a cytosine residue 5' to the paired guanine (Fig. 6). Moreover, the cytosine 5' to the paired guanine invariably served as the methyl acceptor (31, 58). The remaining site in the C-G dinucleotide (i.e., the site normally occupied by a guanine residue base-paired with the methyl acceptor) w a s not an important site of recognition by the enzyme. The site can be occupied by any base thus far tested, an abasic site analog, or even the 3' hydroxyl at the end of a chain. This finding led to the proposal that human methyltransferase(s)possesses a single asymmetric binding site for DNA that is complementary to the three-nucleotide motif(31, 58). Thus, the weak de nmo activity seen for Watson-Crick-paired B-DNA can be viewed as a reflection of the capacity of this asymmetric binding site to accept a hydrogen atom at the site optimally

HUMAN DNA (CYTOSINE-5)METHYLTRANSFERASE

79

suited for a 5-methyl group on cytosine (Fig. 4). Enhanced binding of substrates actually carrying 5-methylcytosine at the site paired with the guanine adjacent to the methyl acceptor would explain the rate enhancement associated with inethyl-directed methylation. Moreover, increased affinity for substrates that are analogs of the transition state for the catalysis or decreased activation energies associated with the formation of the transition state would explain the capacity of the enzyme(s) to respond to heteroduplex molecules in which the methyl acceptor is displaced into the major groove of the helix and/or protonated (31, 58). Strong support for the presence of an asymmetric binding site on the enzyme was provided by studies with foldback structural isomers (Fig. 7), differing only in the placement of 5-inethylcytosine (32).These 48-mers used five thymine residues to link a long block of DNA to a shorter block of DNA, to form isomeric foldback molecules. The shorter block of DNA was chosen so that the 3' end of the uneven foldback molecule would form the cytosine residue norinally occupied by the methyl director in the three-nucleotide motif. This was the only three-nucleotide motif available in the two isomers. In one isomer, 5-inethylcytosine was placed at Cyt-48 (i-e., at the 3' end of the inolecule to form a three-nucleotide motif with a methyl director basepaired with the guanine residue adjacent to the methyl acceptor). In the other isomer, 5-inethylcytosine was placed at Cyt-17 (i.e., at the methyl

FIG. 7. Methylation of isomeric 48-mers. A strong test of the specificity for the three-nucleotide recognition motif is given by experiments with structural isomers that can form foldbacks. The only difference between the isomers shown is the placement of the 5-methylcytosine. The isomer shown in I carries the methylated base at nucleotide 17. This isomer can only form a foldback with an incomplete three-nucleotide motif in which the guanine is missing and the cytosine methyl acceptor is already methylated. It is not an enzyme substrate. The isomer shown in II carries the methylated base at nucleotide 48. This isomer is able to form a foldback that permits the 5-methylcytosine at position 48 to direct methylation to the unmethylated cytosine methyl acceptor at position 17.


STEVEN S. SMITH

acceptor site on the long strand in the uneven foldback). The enzyme recognized the isomer carrying the methyl group at Cyt-48 and rapidly methylated Cyt-17, but it did not recognize the isomer in which 5-methylcytosine was at Cyt-17. This is again consistent with the presence of an asymmetric binding site on the enzyme whose orientation relative to the twofold axis at the C-G dinucleotide can be specified by the presence of a strong methyl director, such as Br or CH3, at C-5 of cytosine. Thus, the enzyme is able to methylate around a gap but not across a gap. Importantly, the gap itself did not inhibit the reaction relative to the rate observed with a duplex. Neither the base-paired guanine adjacent to the cytosine methyl director nor any of the nucleotides 3' to this guanine were required for full enzymatic activity.

Additional support for the notion that the three operationally distinct reactions of human DNA methyltransferase(s) are carried out at a single catalytic site was provided by experiments with duplex oligodeoxynucleotides in which one of the guanosines in the C-G site was replaced with another base, an abasic site, or an O6-methylguanine residue (31, 58, 60-62). As with the gapped molecules, placement of the methyl director at the methyl acceptor site blocked methylation of the substrate, while placement of the methyl director at the position required by the three-nucleotide recognition motif generally enhanced the stimulation of the reaction rate induced by the structural perturbation associated with the lesion in the heteroduplex (Fig. 5). This roughly additive behavior of methyl-directed methylation and structurally induced methylation is most easily understood if a single asymmetric binding site on the enzyme carries out both reactions (31). At the time that this proposal was made, the sequence of the human DNA methyltransferase was unknown.
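The three-nucleotide rule described in this section can be expressed as a small pattern-matching function. The sketch below is purely illustrative and uses a hypothetical encoding that is not part of the original work: the top strand is written 5' to 3', `bottom[i]` is the base paired with `top[i]` (so the bottom strand reads 3' to 5' in this alignment), `'M'` denotes 5-methylcytosine, and `None` marks a missing partner such as a gap or chain end. It only locates candidate acceptors and flags whether the director is methylated; it makes no claim about rates.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: 'A', 'C', 'G', 'T'; 'M' = 5-methylcytosine; None = no partner.
Site = Tuple[str, int, bool]  # (strand holding the acceptor, index, methyl_directed)

DIRECTORS = {"C", "M"}  # a cytosine or a 5-methylcytosine can act as director


def acceptor_sites(top: str, bottom: List[Optional[str]]) -> List[Site]:
    """Find cytosines that complete the three-nucleotide motif.

    Motif: director (C or M), the guanine paired with it, and the
    unmethylated cytosine 5' to that guanine (the acceptor). The base that
    would pair with the acceptor is deliberately not checked.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must be aligned position by position")
    sites: List[Site] = []
    for i in range(len(top)):
        # Director on the top strand: its partner G is bottom[i], and the
        # bottom-strand base 5' to that G sits at index i + 1.
        if (top[i] in DIRECTORS and bottom[i] == "G"
                and i + 1 < len(top) and bottom[i + 1] == "C"):
            sites.append(("bottom", i + 1, top[i] == "M"))
        # Director on the bottom strand: its partner G is top[i], and the
        # top-strand base 5' to that G sits at index i - 1.
        if (bottom[i] in DIRECTORS and top[i] == "G"
                and i >= 1 and top[i - 1] == "C"):
            sites.append(("top", i - 1, bottom[i] == "M"))
    return sites
```

As a toy analog of Fig. 7, `acceptor_sites("ACGT", ["T", None, "M", "A"])` (a 3'-terminal 5-methylcytosine facing an acceptor whose partner guanine is missing) returns `[('top', 1, True)]`, whereas moving the methyl group onto the acceptor, `acceptor_sites("AMGT", ["T", None, "C", "A"])`, returns no sites.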
The sequence and chromosomal location of the human gene have now been determined (17). The human methyltransferase gene is probably unique in the human genome, where it maps to chromosome 19 between 19p13.2 and 19p13.3 (17). Like the sequences of the bacterial enzymes (9, 12), the inferred amino-acid sequence contains several Pro-Cys motifs that provide cysteine residues that are candidates for the active-site nucleophile (17). However, only one of these is within the highly conserved Gly - - ProCys - - - PheSer sequence that is the hallmark of the active site of the cytosine methyltransferases (15, 16). Thus, it appears that the different molecular forms observed for the human enzyme (13, 25, 63) are all products of a single gene. Each active form of the enzyme appears capable of catalyzing each of the operationally distinct reactions at a single catalytic site possessing an asymmetric binding site for the three-nucleotide recognition motif in DNA. A single multifunctional catalytic site was observed in recent experiments in which FdC-containing oligodeoxynucleotide affinity labels (13) were used to study the four species of methyltransferase present in partially purified preparations of human placental enzyme. When FdC was placed at the methyl acceptor site in an asymmetrically methylated duplex, four active species of human DNA methyltransferase were observed (12, 25). When FdC was placed at the methyl acceptor site defined by the three-nucleotide motif in the heteroduplex containing a C.C mispair adjacent to an A.C mispair, as discussed above, the same peptide fingerprint was obtained. In each case, peptides labeled with an affinity label designed to detect the methyl-directed activity were also detected by an affinity label designed to detect the structurally induced de novo activity (25).

E. Enzyme-DNA Interaction at the Asymmetric DNA-Binding Site

1. INTERACTION WITH THE CYTOSINE METHYL ACCEPTOR

Extensive data now available on the substrate requirements of the human methyltransferase(s) allow us to begin assigning potential sites of interaction between the putative asymmetric DNA-binding site and functional groups on the surface of the DNA molecule. Consistent with the general picture of protein-DNA binding, most of the groups identified as essential by this analysis are available for interaction in the major groove of the B-DNA helix. In principle, the requirement of AdoMet for catalysis suggests that it might mediate certain of the potential enzyme-DNA interactions through hydrogen-bond bridging (32). However, the weight of the evidence with the bacterial methyltransferases argues against this hypothesis, since the initial substrate interaction and 5,6-dihydrocytosine carbanion formation do not require AdoMet. Given the conservation of protein sequence and the similarity of mechanism between these enzymes and the human enzyme, it seems unlikely that such models apply to the human enzyme. The FdC affinity-labeling experiments (13) establish that a nucleophile on the enzyme surface attacks C-6 of the accepting cytosine in the three-nucleotide recognition motif. The C-5 of this cytosine need not interact with a group on the enzyme, since electrophilic substitution can occur through attack of the methyl group on AdoMet by the electron pair in the sp3 orbital at C-5 of the carbanion. The N-4 of the accepting cytosine is required for catalysis, because uracil in this position is not methylated by the enzyme (unpublished observations). While an oxygen in place of the amino group at this site might itself impede electrophilic substitution, this result may also indicate interaction of the N-4 amino group with hydrogen-bond acceptors on the enzyme surface, since 2-pyrimidinone-1-β-D-2'-deoxyfuranoside (which lacks the N-4 amino group of cytosine) cannot accept a methyl group from AdoMet in the reaction catalyzed by M.MspI (11).


FIG. 8. Hydrogen-bonding during binding and catalysis. Binding of the cytosine methyl acceptor at the active site establishes several hydrogen bonds between cytosine and conserved hydrogen-bond acceptors (-A) and hydrogen-bond donors (-D-H) provided by amino-acid side-chains at the active site of the enzyme (step 1). One of these forms between the acidic hydrogen of the carboxyl group of a glutamic acid side-chain at the active site and N-3 of the target cytosine. Ionization of this carboxyl group will protonate N-3 of cytosine and activate C-6 for nucleophilic attack (step 2). Nucleophilic attack at C-6 produces a transient carbanion, which is able to attack the methyl group on AdoMet after AdoMet binds at the active site by entering through the channel in the enzyme (step 3). The 5,6-dihydrocytosine intermediate is transiently formed (step 4), and β-elimination generates 5-methylcytosine (step 5).

The crystal structure of the M.HhaI-DNA complex suggests that hydrogen bonds play an important role in the selectivity of the enzyme for cytosine and very probably also in the process of catalysis (Fig. 8). In the crystal structure of the HhaI enzyme-DNA complex, N-4 appears to form hydrogen bonds with two amino acids: the carbonyl oxygen of the side-chain carboxylic acid of Glu-119 and the peptide carbonyl of Phe-79 (36a). In the human enzyme, these bonds could be provided by the side-chain oxygen of the highly conserved Glu-1145 and the peptide carbonyl of Pro-1104. Given the acceleration of the human methyltransferase reaction associated with loosely stacked or extrahelical cytosines at the methyl acceptor site, it is not possible to rule out transient interactions between groups on the enzyme surface and N-3 or O-2 of the accepting cytosine (32). Such interactions would be expected to enhance the binding affinity of the methyltransferase for the conformationally altered transition state. O-2 is, in fact, hydrogen-bonded to Arg-165 in the crystal structure of the HhaI methyltransferase, and a similar interaction is predicted with the conserved Arg-1180 of the human enzyme. Moreover, data on the crystal structure of the HhaI methyltransferase (36a) establish the formation of a hydrogen bond between the ionizable hydroxyl of the acid side-chain of Glu-119 and cytosine N-3. This suggests that nucleophilic attack could be promoted by protonation of N-3 just after hydrogen-bonding (29, 36), since delocalization of the resulting charge at N-3 increases the net positive charge at C-6, as shown in Fig. 8 (panel 2). The formation of the abortive complex with FdC would be expected to follow the same route (Fig. 9).

FIG. 9. Hydrogen-bonding during abortive complex formation. Initial binding, activation, and nucleophilic attack are all shown in the same sequence (steps 1-3) as in Fig. 8. Methyl transfer forms a stable complex because β-elimination cannot occur, owing to the strength of the C-F and C-CH3 bonds at C-5. The abortive 5,6-dihydrocytosine complex is stable and not subject to deamination.
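The catalytic sequence in Figs. 8 and 9 can be summarized as a short state walk. The Python sketch below is a bookkeeping model, not a chemical simulation: the step labels are hypothetical names paraphrasing the figure legends, and the only chemical rule it encodes is the one stated in the text, namely that β-elimination needs a hydrogen at C-5 of the 5,6-dihydrocytosine intermediate. With ordinary cytosine (H at C-5) the walk ends in product release; with FdC (F at C-5) it ends in the abortive complex.

```python
from typing import List


def catalytic_walk(c5_substituent: str) -> List[str]:
    """Return step labels for one reaction at a target cytosine.

    c5_substituent is the group at C-5 before methyl transfer:
    'H' for cytosine or 'F' for 5-fluorocytosine (FdC).
    """
    steps = [
        "bind",                  # step 1: H-bonds to active-site donors/acceptors
        "protonate_N3",          # step 2: Glu carboxyl protonates N-3, activating C-6
        "attack_C6",             # step 3: enzyme nucleophile adds at C-6 (carbanion)
        "methyl_transfer",       # step 3: carbanion attacks the AdoMet methyl group
        "dihydro_intermediate",  # step 4: covalent 5,6-dihydrocytosine intermediate
    ]
    # C-5 of the intermediate now carries the original group plus the methyl.
    c5_groups = {c5_substituent, "CH3"}
    if "H" in c5_groups:
        # step 5: loss of the C-5 hydrogen allows beta-elimination,
        # restoring the 5,6 double bond and freeing the enzyme.
        steps += ["beta_elimination", "release_5mC"]
    else:
        # Fig. 9: C-F and C-CH3 bonds do not break, so the enzyme stays trapped.
        steps.append("abortive_complex")
    return steps


print(catalytic_walk("H")[-1])  # release_5mC
print(catalytic_walk("F")[-1])  # abortive_complex
```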

2. INTERACTION WITH THE CYTOSINE METHYL DIRECTOR

The effectiveness of the methyl group in directing methylation to the target cytosine suggests the existence of a hydrophobic pocket in the putative asymmetric binding site of the enzyme that is complementary to the methyl group of 5-methylcytosine (Fig. 5). The lesser capacity of fluorine, bromine, and iodine to serve as methyl directors (each presumably acting by increasing the affinity of the enzyme for the substrate relative to hydrogen) suggests that the dimensions and properties of the pocket are suited to those of the methyl group (27). The N-4 of the methyl-directing cytosine is required for catalytic activity, since its replacement by oxygen in thymine renders it ineffective as a methyl director (27). Thus, the N-4 of this cytosine may serve as a hydrogen-bond donor in its interaction with the enzyme (32).


3. INTERACTION WITH THE REQUIRED GUANINE

The O-6 of the guanine paired with the methyl director may provide another important hydrogen-bond acceptor site in the DNA substrate. Both the methyl group of O6-methylguanine and the N-6 amino group of adenine appear to block recognition of the three-nucleotide motif by the enzyme. Since 7-deazaguanine and hypoxanthine are accepted by the enzyme in place of guanine at this site, it appears that N-7 in the major groove of the helix is not a required hydrogen-bond acceptor and N-2 in the minor groove is not a required hydrogen-bond donor during interaction with the enzyme (31, 32).
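The reasoning in this subsection is an elimination over functional groups, which can be made explicit with set operations. In the sketch below, the feature labels are a simplified, hypothetical annotation of the groups each base presents at positions 2, 6, and 7; the accepted and blocked lists come from the text. Intersecting the accepted bases isolates the O-6 carbonyl as the only candidate required group, and the guanine groups that some accepted analog lacks (N-7, N-2) come out as dispensable. This shows only that the data are consistent with an O-6 requirement; it cannot tell a missing acceptor apart from an added blocking group such as the O6-methyl or N-6 amino group.

```python
from typing import Dict, Iterable, Set

# Simplified, hypothetical feature annotation of each base at the guanine site.
FEATURES: Dict[str, Set[str]] = {
    "guanine":          {"O6_carbonyl", "N7_acceptor", "N2_amino"},
    "7-deazaguanine":   {"O6_carbonyl", "N2_amino"},     # C replaces N-7
    "hypoxanthine":     {"O6_carbonyl", "N7_acceptor"},  # no 2-amino group
    "O6-methylguanine": {"O6_methyl", "N7_acceptor", "N2_amino"},
    "adenine":          {"N6_amino", "N7_acceptor"},
}
ACCEPTED = ["guanine", "7-deazaguanine", "hypoxanthine"]
BLOCKED = ["O6-methylguanine", "adenine"]


def shared_features(bases: Iterable[str]) -> Set[str]:
    """Features present in every listed base."""
    return set.intersection(*(FEATURES[b] for b in bases))


required = shared_features(ACCEPTED)
dispensable = FEATURES["guanine"] - required
# Every blocked base should lack at least one candidate required feature.
consistent = all(not required <= FEATURES[b] for b in BLOCKED)

print(required)             # {'O6_carbonyl'}
print(sorted(dispensable))  # ['N2_amino', 'N7_acceptor']
print(consistent)           # True
```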

III. Biological Implications of the Mechanism

5-Methylcytosine is an important component of the DNA of representatives of each of the five kingdoms recognized by modern systematics (64). In the kingdom Prokaryotae (comprising the archaebacteria and the eubacteria), 5-methylcytosine may have several functions, but it is often seen as part of restriction-modification systems (65), in which it generally modifies sequences of four or more nucleotides that have twofold symmetry and that, in the unmodified state, are generally susceptible to cleavage by restriction endonucleases. The diversity of modification patterns among the 17 phyla of this group is considerable. Members of the kingdom Protoctista (lower eukaryotes) are either devoid of 5-methylcytosine, as in the case of the acrasiomycote Dictyostelium (66), or heavily methylated, primarily at C-G dinucleotides in repeated sequences, as in the case of the myxomycete Physarum polycephalum (67-70). In the three most highly evolved kingdoms, the picture is also complex. Higher plants (kingdom Plantae) generally possess 5-methylcytosine at the symmetrical trinucleotide sequence C-N-G (71). Fungi (kingdom Fungi) often possess low levels of 5-methylcytosine (72, 73), which is prominent in repeated sequences but not confined to any set di- or trinucleotide (43). Animals (kingdom Animalia) may be devoid of 5-methylcytosine, as are the insect Drosophila (74-76) and the nematode Caenorhabditis (77), or their methylation may be detected primarily at symmetrically methylated C-G (78). Most methylation in animals also occurs in repetitive DNA (79). 5-Methylcytosine occurring at C-G dinucleotides is particularly interesting because this form of methylation appears to have occurred in the common ancestor of the vertebrates and the echinoderms (80).


A. Specificity of Human DNA Methylation

Human DNA is modified primarily at C-G sites (81, 82). As noted above, the common ancestor of the vertebrates and the echinoderms appears to have confined methylation to symmetrically methylated C-G dinucleotides (78). This sequence specificity may already have been present in bacteria, where it is found today in the methyltransferase SssI from Spiroplasma (48, 49). While it seems likely that all cytosine methyltransferases share the same enzyme mechanism, sequence recognition specificity is very diverse and is carried by a region of the protein distinct from the catalytic site (15). The relaxed specificity for the three-nucleotide motif exhibited by the human enzyme is not shared by the bacterial enzymes, and even M.SssI does not actively methylate unusual DNA structures, such as those containing an A.C mispair (25). Most methylated C-G dinucleotides are symmetrically methylated; for example, all sites tested in rDNA are symmetrically methylated (78). Nevertheless, direct sequencing has detected an asymmetrically methylated site that appears to be stable through multiple rounds of replication (83). In organisms for which the location of 5-methylcytosine has been determined by direct sequencing or by ligation-mediated polymerase-chain-reaction methods, methylation has been observed primarily at C-G dinucleotides (84, 85). Unfortunately, these types of experiments have been largely confined to expressed gene sequences or their associated control regions. The situation may be somewhat different in the coding sequences of nonexpressed genes and in repeated sequences, where methylation might also occur outside the C-G site (86). Since repeated sequences account for more than 50% of the methylation observed in vertebrates (79), it is clearly possible that a portion of the methylation in vertebrates lies outside the C-G dinucleotide in repeated sequences.
This is consistent with reports (86, 87) suggesting that a significant portion of the total methylation in vertebrates occurs at other sites.

1. SELECTIVITY FOR C-G IS SUPPORTED BY THE ENZYMOLOGY

The selectivity for the C-G dimer observed in vitro for the human enzyme is consistent with the primary selectivity of human DNA methylation for this site (20, 31), as it is with the specificity exhibited by the mouse enzyme (63, 71). The human enzyme methylates DNA at sequences outside the dimer (20) at a slow de novo rate, a finding supported by observations with the enzyme from the rat (88), but the selectivity of the enzyme for C-G is significantly enhanced when the site is asymmetrically methylated.


2. ENZYMOLOGY OF NON-C-G SELECTIVITY

Interestingly, mispaired cytosines are selectively methylated, albeit at a very low rate, when they lie outside the C-G dinucleotide (53a). The enzymology predicts that mispaired cytosines could be selectively methylated in vivo at a slow rate, since they are analogs of the transition state that do not provide the required three-nucleotide recognition motif. Extrapolating this to unusual DNA structures (e.g., sequence-induced polymorphism in helical parameters, tight bends maintained in chromatin, foldbacks, or multistranded recombination intermediates), one expects methylation to occur at sequences that do not present the normal three-nucleotide motif (e.g., outside the C-G dinucleotide pair) whenever a cytosine moiety is activated as a methyl acceptor by protonation at N-3 or by a significant decrease in its stacking energy. This would also offer a possible explanation for the persistence of asymmetrically methylated sites in DNA through several rounds of replication (83, 85).

B. Pattern Formation as the Key to the Function of Vertebrate DNA Methylation

Many eukaryotes lack cytosine methylation (66, 76, 77, 89). Since some of these organisms are capable of stably controlling gene expression during development, X-inactivation, and genetic imprinting (90), cytosine methylation cannot be an obligatory component of the systems that control these processes. Nevertheless, in organisms that can methylate cytosine, the process does play an important role in cellular (91), developmental (92), and reproductive (70) viability. Current discussion of the function of cytosine methylation in eukaryotes centers on a possible role in the stabilization of transcriptional inactivity (93, 94). Hypotheses of this type address the tendency of transcriptionally inactive genes to become methylated following stable inactivation by suggesting that methylation is a top-level control (or locking mechanism) in a multilevel system for gene silencing (95). Any involvement in the regulation of transcription is necessarily indirect, since transient gene expression is not hindered by cytosine methylation (96-98). Methylation does not always influence transcription factor binding (99), and a lack of DNA methylation does not always imply gene activity (100). Alternative proposals interpret this indirect association with transcriptional activity either as an altered requirement for repair in transcriptionally active and inactive chromatin (101-104) or as a consequence of gene-silencing mechanisms that require the physical deformation or synapsis of DNA sequences capable of association through Watson-Crick and non-Watson-Crick complementarity (33). In the latter case (33), cytosine methylation would function to suppress the formation of unusual structures by promoting their disassembly and also by providing a form of hysteresis, which would help suppress the formation of unusual structures during subsequent cycles of pairing and condensation. The essential features of each of these hypotheses are consistent with the enzymology of human cytosine methylation. Distinguishing among them will ultimately require elucidation of the molecular mechanisms that establish methylation patterns in vertebrates.

C. Key Elements of Pattern Formation Are Demonstrated by the Phenomenon of Concerted Modification

Patterns of methylation are tissue specific (105) and clonally stable (106-110), but they are not generally altered by the actual induction of transcriptional activity (100, 109, 111). Thus, the tools developed for studying the establishment of transcriptional states have not been useful in studying the mechanisms by which tissue-specific patterns of methylation are formed. Fortunately, key elements in pattern formation can be approached by studying clonally established differences among cell lines or tissues. Global changes in clonally stable patterns of methylation are a hallmark of oncogenically transformed cells (112-118). Since more than 50% of all DNA methylation in vertebrates is found in repeated sequence families, it is not surprising that the clonally stable changes in pattern seen in oncogenically transformed cells include changes in the methylation states of both single-copy and multicopy sequences. Patterns established by oncogenic transformation in the broadly interspersed repeated sequence families, of which the L1 family (Fig. 10) is the most carefully studied, are particularly informative.

1. CONCERTED MODIFICATION OF INTERSPERSED REPEATED SEQUENCES

Initial studies characterizing the methylation states of interspersed repeated sequences in Friend erythroleukemia cells (119) indicated that groups of interspersed repeated sequences appear to have lost methylation (relative to normal cells) in a concerted fashion at multiple independent chromosomal locations. Different methylation states were associated with different groups of interspersed repeated sequences. When representatives of one of these groups were cloned and characterized, they were shown to be homologous to about 2% of the mouse genome but did not share homology with the major (A+T)-rich satellite sequences of the mouse genome (119). In situ hybridization confirmed broad interspersion and showed that members of this repeated sequence family are not often found associated with either telomeres or centromeres (120). When representative cloned repeated sequence clusters (pFS-13 and pFS-8) were sequenced, they were shown to share homology with the R-sequence subgroup of the L1 family of interspersed repeated sequences (120).

FIG. 10. Interrelationships within the L1 family. The consensus sequence of the L1 family contains the named subfamilies MIF-1, BAM5, and R arrayed in the linear order shown. Members of the group are truncated at a random distance from the 3' (TAATAAAAAA) end, so that many more copies of the sequences near the 3' end are present in the genome. Sequence variation among the representatives of the group generates polymorphisms at restriction-site recognition sequences (indicated by the vertical arrows). Restriction fragment-length polymorphisms (RFLPs) that identify interspersed subfamilies with closely related sequences are indicated by solid or broken lines between the arrows. The 1330-bp repeated sequence cluster in plasmid pFS-13 carries a complete copy of the R sequence and additional repeated DNA that is 3' to R.

Members of the L1 family are up to 7 kb long (Fig. 10). They appear to be confined to the nucleus and have not been reported as nucleolar or mitochondrial elements. The complete sequences of apparently full-length representatives of this group have been determined in both mice (121) and humans (122). In general, members of these groups are shorter than 7 kb, but all carry a TAATAAAAAA sequence that defines the 3' end of the retroposon thought to be responsible for the dispersion of these sequences during evolution. In humans, the family was defined by a group of sequences initially termed the KPN family; in mice, three named subfamilies were recognized: the R sequences, the BAM5 sequences, and the MIF-1 sequences. Random truncation of L1 members at various distances from the 3' end results in a gradient in copy number for members of the named groups, with sequences nearer the 3' end present in more copies. Thus, there are about 10⁵ copies of the R-sequence group, about 2.5-5.0 × 10⁴ copies of the BAM5 group, and about 2.0 × 10⁴ copies of the MIF-1 group, since R is 3' to BAM5 and BAM5 is 3' to MIF-1 in the L1 family. To assess the methylation state in restriction fragment-length polymorphisms (RFLPs) from L1, the consensus sequence of the L1 family in this region was used to establish a linear map of all possible CCGG sites at the 3' end of the family. Southern-blot walking experiments using subclones of pFS-13 were then used to probe the methylation state at the CCGG sites defining the ends of the RFLPs, using Southern blots prepared with genomic DNA from the Friend cell line that had been cleaved with MspI (109, 119, 120).

None of the RFLPs observed with MspI were revealed by HpaII cleavage of DNA from normal spleen or of DNA from the L1210 cell line. Since HpaII cannot cleave a CCGG site in which the central cytosine is methylated, while MspI can cleave such sites, all of the RFLPs were methylated in normal spleen cells and in L1210 cells. In contrast, HpaII-cleaved DNA from Friend cells revealed that a subset of the RFLPs had lost methylation in this cell line at multiple independent chromosomal locations (109, 116, 120). The pattern observed in Friend cells was stable during cell division and during differentiation of Friend cells induced by hexamethylene bisacetamide (109, 116, 119, 120, 123). The relative abundance of each RFLP was determined for both HpaII and MspI. The analyses showed that an RFLP characterized by an MspI/HpaII length of 690 bp retained methylation at each of its 5000 chromosomal loci, while an RFLP characterized by an MspI/HpaII length of 550 bp retained methylation at each of its 3300 loci. In contrast, an RFLP characterized by a 750-bp band lost methylation at 84% (4800/5700) of its 5700 independent loci in Friend erythroleukemia cells, while an RFLP characterized by an MspI/HpaII length of 600 bp lost methylation at 35% (7400/21,000) of its 21,000 independent loci. All copies of each RFLP were methylated in L1210 lymphoma cells and in the DNA of normal mouse tissues (109, 116, 119, 120).
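The percentages above, and the random-loss baseline of roughly 10% against which the following paragraph compares them, can be checked with a few lines of arithmetic. The sketch assumes, as the text does, that HpaII reveals an RFLP only when the CCGG sites at both of its ends are unmethylated (MspI cuts regardless of methylation at that position), and that under the null model each end site loses methylation independently with one genome-wide probability. The function names are hypothetical.

```python
from typing import Dict, Tuple

# (loci that lost methylation, total loci) in Friend cells, keyed by RFLP length in bp.
RFLP_LOCI: Dict[int, Tuple[int, int]] = {
    690: (0, 5000),
    550: (0, 3300),
    750: (4800, 5700),
    600: (7400, 21000),
}


def hpaii_cuts(site_methylated: bool) -> bool:
    """HpaII is blocked by methylation of the central C in CCGG; MspI is not."""
    return not site_methylated


def rflp_revealed_by_hpaii(left_methylated: bool, right_methylated: bool) -> bool:
    """The fragment appears in an HpaII digest only if both flanking sites are cut."""
    return hpaii_cuts(left_methylated) and hpaii_cuts(right_methylated)


def random_expectation(loss_a: float, loss_b: float) -> float:
    """Expected fraction of an RFLP family revealed if each end site
    independently loses methylation with probability equal to the mean loss."""
    p = (loss_a + loss_b) / 2
    return p * p


for length, (lost, total) in RFLP_LOCI.items():
    print(f"{length} bp: {100 * lost / total:.0f}% of loci lost methylation")
print(f"random baseline: {100 * random_expectation(0.40, 0.24):.0f}%")
```

The loop prints 0%, 0%, 84%, and 35% for the four families against a random baseline of 10%, which is the contrast the argument relies on.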

Analysis of Friend cell DNA showed that 5-methylcytosine levels are about 40% lower in Friend cells than in normal mouse spleen cells (109). Measurement of number-average molecular weights of restriction fragment digests (120) suggests that the level of methylation at the CCGG sequence is about 24% lower in Friend cells than in normal mouse spleen cells. Assuming that the actual level of methylation in Friend cells is about 32% lower than that in mouse spleen cells (i.e., the mean of the values obtained with the two independent methods), uniform random loss of methylation from the end points of each RFLP would generate loss of methylation at (0.32 × 0.32) ≈ 10% of every RFLP family, since an RFLP is revealed by HpaII only when the CCGG sites at both of its ends have lost methylation. Obviously, the loss of methylation from 84% (4800/5700) of the members of the 750-bp group, when contrasted with the complete retention of methylation in the members of the 690-bp group (0/5000 = 0%), indicates that the changes in methylation state reflect a biological, not a random, process (Fig. 11).

FIG. 11. Concerted loss of methylation at interspersed repeated sequences. Concerted loss of methylation from a restriction site at one end of an RFLP indicates a loss of methylation at restriction fragments of related sequence at multiple unlinked loci in the genome. Two classes of RFLP are shown. Both classes are methylated initially, but only one class retains methylation after the cell sustains a general loss of methylation.

2. IMPLICATIONS OF CONCERTED MODIFICATION

The phenomenon of concerted modification of interspersed repeated sequences established the two key facts of biological pattern formation (109, 116, 119, 120, 123). First, pattern formation must ultimately be sequence-specific, since like sequences adopt like methylation patterns at different loci (i.e., target sites behave as cis-acting genetic elements). Second, pattern formation must be mediated by one or more factors capable of modulating methylation or actually methylating DNA (i.e., sequence-specific factors must act in trans to facilitate and/or perform methylation of DNA at cis-acting sequences).

3. SIMILAR PATTERN FORMATION MECHANISMS APPLY TO LOW-COPY AND SINGLE-COPY SEQUENCES

Concerted modification changes also occur in multiple copies of the rDNA of Xenopus. Although these sequences are sequestered in the nucleolus and may not share factors present in the nucleoplasm, the same fundamental rules of pattern formation apply to the different copies of the rDNA as they undergo demethylation during early development in Xenopus (124). In Neurospora crassa, when tandem duplications of the 5-S rRNA genes are introduced into somatic cells in the unmethylated state, they are methylated de novo during gametogenesis (125). This process has also been observed in Ascobolus immersus for tandem duplications of the met2 gene (126). In these cases, methylation requires the duplication, which somehow serves as a sequence-specific (cis-acting) signal for methylation. New methylation patterns are seen after sporulation and are stable during nuclear division. Experiments in which the steroid 21-hydroxylase gene (127) or the adenine phosphoribosyltransferase gene (128) was transfected into mouse cell lines show that there are cis-acting signals for de novo methylation on these sequences as well. Moreover, integrated retroviral sequences are gradually methylated de novo (129, 130) independently of chromosomal location, again suggesting that methylation is sensitive to sequence-specific signals. Human single-copy sequences adjacent to informative variable numbers of tandem repeats, or minisatellites (131), are methylated in allele-specific patterns (132); thus, the sequence of one allele and not another appears to provide a cis-acting signal for methylation in the same way that the sequence of an interspersed L1 RFLP provides a signal for methylation. Moreover, these findings demonstrate that cis-acting signals are not confined to duplicated sequences and are not merely an attribute of DNA introduced into cells by transfection.

D. Enzymology of Pattern Formation Mechanisms

Since the two key aspects of the biology of pattern formation apply to both repetitive and nonrepetitive DNA, the mechanisms proposed for the establishment of concerted differences in methylation state at L1 RFLPs (120, 123) are also candidate mechanisms for the establishment of methylation patterns at single-copy loci. It is useful to consider these proposals in light of the mechanism of action of the methyltransferase. Two broad classes of mechanism can be distinguished.

1. ACTIVE ESTABLISHMENT AND MAINTENANCE

Probably the simplest mechanism for the establishment of new patterns of methylation is the transient expression of a factor that produces symmetrical de novo methylation at the C-G site. The newly established pattern would then be maintained through subsequent cycles of replication by the methyl-directed activity of the enzyme. Studies of the human methyltransferase (31) have established that it responds to either of the two forms of asymmetrically methylated DNA by interacting with the resident methyl group and rapidly converting the symmetrically placed cytosine to 5-methylcytosine as the three-nucleotide motif is tested.
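The maintenance half of this mechanism can be illustrated with a toy model of semiconservative replication followed by methyl-directed methylation. The sketch represents each C-G site as a pair of booleans (top-strand and bottom-strand cytosine methylated); this representation, the function names, and the assumption of perfectly efficient maintenance with no de novo events are simplifications introduced here for illustration.

```python
from typing import List, Tuple

Site = Tuple[bool, bool]  # (top-strand C methylated, bottom-strand C methylated)


def replicate(duplex: List[Site]) -> Tuple[List[Site], List[Site]]:
    """Semiconservative replication: each daughter keeps one parental strand,
    and the newly synthesized strand starts out unmethylated."""
    from_top = [(top, False) for top, _ in duplex]
    from_bottom = [(False, bottom) for _, bottom in duplex]
    return from_top, from_bottom


def maintain(duplex: List[Site]) -> List[Site]:
    """Methyl-directed activity: a resident methyl group on either strand
    directs methylation of the symmetrically placed cytosine. The slow
    de novo reaction is ignored in this sketch."""
    return [(top or bottom, top or bottom) for top, bottom in duplex]


pattern = [(True, True), (False, False), (True, True), (False, False)]
current = pattern
for generation in range(5):
    left, right = replicate(current)
    left, right = maintain(left), maintain(right)
    assert left == right == pattern  # both daughters inherit the parental pattern
    current = left
print("pattern preserved:", current == pattern)
```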

a. Transient Expression of Multiple Sequence-specific de Novo Methyltransferase Activities. One of the earliest proposals for the establishment of methylation patterns was the transient expression of a series of sequence-specific de novo methyltransferase activities that would mediate de novo methylation of selected sequences at a given stage in the development of an organism. This is an attractive model for the changes in methylation pattern observed as tissue-specific patterns of methylation are established during development (54, 55, 133). However, as noted above, sequence-specific de novo methyltransferase activities have not yet been observed. Each of the known mammalian enzymes carries out the methyl-directed reaction, the de novo reaction, and the structurally enhanced reactions (25, 31, 53, 62), which can occur in either a de novo or a methyl-directed fashion. Moreover, the de novo reaction and the structurally induced de novo reaction (25) proceed at the same active site, at least in humans and possibly in other known methyltransferases from mammalian sources (134). Perhaps a constellation of de novo methyltransferases will ultimately be detected in early embryonic tissues through biochemical or genetic analysis, but the current picture developed from the enzymology and biochemical genetics is far more compatible with the existence of a battery of modulatory factors.

b. Transient Expression of Methyltransferase Modulators. The expression of a series of methyltransferase modulators that confer sequence specificity on a single methyltransferase (123) provides an attractive alternative for the establishment of methylation patterns. Such factors would be expected to interact with the methyltransferase itself, either as cofactors that bind to the enzyme or as enzyme systems that covalently modify it, and could have either positive or negative effects (54). The search for modulatory factors has not been extensive, and the studies already performed have not been conclusive. Proteases, protein kinases, or protein factors that bind directly to the enzyme could serve as modulators. For example, proteolysis of the mouse enzyme has been proposed as a covalent alteration associated with differentiation in Friend cells (135). However, despite the induction of some 3800 new gene products, as judged by the appearance of newly induced proteins on two-dimensional polyacrylamide gels (136), associated changes in general methylation levels in Friend cells are below the limit of detection (109, 111, 116, 120). This suggests that if proteolysis occurs biologically, it has little or no effect on methylation levels during differentiation. Similarly, the human enzyme is an excellent substrate for protein kinase C in vitro (137).

HUMAN DNA (CYTOSINE-5) METHYLTRANSFERASE


However, no alteration in activity or sequence specificity has yet been observed following phosphorylation in vitro. Since the tools for isolating proteins that interact with the methyltransferase are at hand (see 138, for example), it will be of interest to isolate and characterize methyltransferase-binding proteins. Preliminary experiments indicate that there are several such proteins of unknown function in the human placenta (53a).

c. Transient Expression of Sequence-specific Factors. Because of the enzyme mechanism, one of the most attractive possibilities for the de novo establishment of methylation patterns is the transient expression of a series of DNA-binding proteins that facilitate methyltransferase action (120, 123, 132, 139). This mechanism is particularly attractive because a conformational change induced by protein-DNA interaction could activate a particular site for methylation by presenting an analog of the catalytic transition state to the methyltransferase. Conformational changes associated with bacterial DNA-binding proteins that activate (140) or repress (141, 142) transcription are well known. On the basis of the mechanism of action of the DNA methyltransferase, proteins like these that bind at specific sequences and produce dramatic conformational changes in DNA would be predicted to produce site-specific methylation (Fig. 12). As trans-acting proteins specific for a cis-acting sequence, like that present on the adenine phosphoribosyltransferase gene, for example (128), these proteins could elicit concerted de novo methylation at all sites possessing a given DNA recognition sequence (Fig. 13). The inhibition or loss of such a factor during tumorigenesis would explain the concerted loss of methylation at RFLPs in L1 (Fig. 11) if methyl-directed methyltransferase function were also transiently impaired during tumorigenesis.

d. Transient Formation of Unusual DNA Structures. The specificity of the human enzyme for the three-nucleotide motif at the C-G dinucleotide pair permits the enzyme to recognize and rapidly methylate DNA structures in which one of the guanines in the C-G site is absent, damaged, replaced, or unusually base-paired (e.g., wobble-paired) at the junction between an unusual DNA structure and a Watson-Crick-paired duplex. Oligodeoxynucleotides containing heteroduplex sites, sites of DNA damage, Watson-Crick-paired foldback structures, foldback structures containing non-Watson-Crick pairs, or gaps are all actively methylated in vitro in a manner consistent with this innate specificity. This important property of the enzyme raises the possibility that DNA methylation has evolved as a biological response to the transient formation of these sorts of unusual DNA structures. Unusual DNA structures might be introduced by factors that induce


FIG. 12. Hypothetical site-specific methyltransferase activator. A hypothetical sequence-specific DNA-binding protein (shown schematically as an ellipsoid carrying two cylindrical appendages) might activate a site for enzymatic methylation by constraining DNA in a conformation that presents a transition-state analog to the methyltransferase. Here, the bend induced in the DNA by protein-DNA interaction is viewed as presenting weakly stacked cytosines in an appropriate three-nucleotide motif to the enzyme.

FIG. 13. Mechanisms for the establishment of methylation patterns consistent with the enzymology. Three hypothetical forms of modulation are depicted. (Top) Factor-specific modulation would be produced by protein or nucleic acid factors that interact with DNA to produce either site-specific methyltransferase activation, by constraining the DNA in a conformation that promotes methyltransferase recognition by presenting a transition-state analog in an appropriate three-nucleotide recognition motif, or site-specific inhibition, by blocking access to the three-nucleotide motif. Factor-specific modulation would be expected if the primary role of DNA methylation is in the control of developmental processes. (Center) Conformation-specific modulation would be produced by global constraints on local DNA conformation imposed by chromosome condensation during mitosis or chromosome pairing during meiosis. Site-specific methyltransferase activation would again be produced by DNA in a conformation that promotes methyltransferase recognition by presenting a transition-state analog in an appropriate three-nucleotide recognition motif. Site-specific inhibition would be produced by conformations that destroy the three-nucleotide motif on a daughter strand. This form of modulation is essentially passive and would be expected if DNA methylation plays a role in the prevention of chromosome damage in condensed chromatin states. (Bottom) Damage-specific modulation would be produced at sites of DNA damage. Activation would be achieved by the presentation of a transition-state analog in the three-nucleotide motif, while inhibition would be achieved by destruction of the motif at the site of damage. This form of modulation would also be passive and would be expected if DNA methylation plays a role in DNA repair.

conformational change in DNA, as noted above, but such conformational change need not be caused by factors whose primary function is the alteration of DNA methylation patterns. Factors involved in stabilizing condensed chromatin (120, 123) might also induce methylation as a consequence of the formation of a required chromosome structure (123) that presents a transition-state analog to the methyltransferase in an appropriate three-nucleotide motif. This possibility is of interest because it suggests that changes in methylation state that follow changes in differentiation state (105), or that are associated with gametogenesis (143, 144), may be consequences of alterations in chromosome structure (33), not a cause of gene silencing (Fig. 13). It is important to recognize that this model does not imply that every condensed or paired DNA sequence would become methylated. DNA methylation patterns generated through the action of trans-acting factors that produce pairing or condensation in a region would be confined to cis-acting sequence patches with the potential for appropriate local structural polymorphism (33). This is because the genome is a sequence mosaic. Its patchiness (145) is reflected in the local properties of DNA sequences at every level, including the potential for local structural polymorphism (146). Although chromosome condensation or pairing might tend to transiently juxtapose sequences from any two points in the genome, unusual DNA structures,


especially those involving non-Watson-Crick pairing, would be confined to selected sites because these structures have special sequence requirements.

2. PASSIVE CONTINUOUS DE NOVO METHYLATION

Most of the mechanisms discussed above require two steps: an initiating de novo methylation event followed by methyl-directed or maintenance methylation events occurring at each subsequent cell cycle (Fig. 14). While two-step models are the most widely discussed (54, 55, 133), they need not be invoked to provide an enzymologically consistent explanation of the biological facts of methylation. Several investigators (see 52 and 147) have suggested that methylation could be maintained by the single-step process of continuous site-specific de novo methylation. This point of view holds that the primary sequence of DNA, not the existing methylation pattern, dictates the methylation pattern at each round of DNA replication. As can be seen from the foregoing, the current state of the enzymology (particularly the ability of the methyl group to stimulate the methyl-directed reaction) clearly argues against this suggestion. Moreover, the biology is also difficult to reconcile with it: tissue-specific patterns of methylation could not arise in cells having the same DNA sequence if sequence were the only requirement for methylation. On the other hand, asymmetrically methylated sites appear in some cases to be maintained through several rounds of cell division (83, 85). Somatic inheritance of the asymmetrically methylated state is most easily rationalized within the constraints of the currently known enzymology as continuous de novo methylation of a site that is activated for de novo methylation because it presents a loosely stacked cytosine at a three-nucleotide motif. Chromatin structure might continuously re-form at each round of replication to produce the activated site, which would subsequently be recognized and methylated de novo.

FIG. 14. Maintenance methylation mechanism consistent with the enzymology. Established methylation patterns can be inherited somatically by the mechanism of maintenance methylation (54, 55), which is clearly supported by the enzymology reviewed here. Regardless of the functional role of methylation, the maintenance of an established pattern would provide a form of hysteresis affecting methylation-sensitive properties of progeny cells.

Early rules for the recognition of oligodeoxynucleotides by the human DNA methyltransferase suggested that actively methylated sequences had a (G+C) content of more than 65% and contained two C-G sites spaced about 13-17 nucleotides apart (50, 52). More recent experiments (32, 53a) show that neither the second C-G site nor the 13- to 17-nucleotide spacing is required for active methylation. Sequences of the type studied in (52) occur in the (G+C)-rich subtelomeric region of human chromosome 11, where the β-globin cluster (148) and the c-Ha-ras gene are located. One-to-one stoichiometric mixtures of complementary sequences from both locations are rapidly methylated de novo by the human DNA methyltransferase to produce a product that is asymmetrically methylated when it is assumed to be methylated as a duplex (30, 148). The potential for unusual DNA structure in the region near codon 12 of the c-Ha-ras gene has long been recognized (30, 149, 150). This has been suggested as an explanation for the mutagenic potential present at this (30, 149, 150) and other (151, 152) sites.
It has also been suggested as an explanation for the asymmetric methylation patterns applied by the human methyltransferase to sequences derived from the region (30). The role of unusual DNA structures in the generation of these asymmetric methylation patterns has now been explored in some detail (30, 53a). In 1:1 mixtures of complementary 30-mers from the region, DNA from the C-rich strand was recognized and methylated at an extremely rapid rate, while that corresponding to the G-rich strand was ignored by the enzyme (30). Preliminary evidence (30) suggested that the enzyme recognizes an eight-stranded G4 structure of the type detected at the immunoglobulin switch region (153). Stoichiometric analysis of the mixture of forms produced by annealing equimolar amounts of the two complementary strands (53a) showed that the predominant form is a complementary duplex. The G-rich strand formed small amounts of quadruplex and Hoogsteen-paired foldback DNA, while the C-rich strand also formed a foldback of the type reported for the C-rich strand of telomeric DNA (154).


This foldback form of the C-rich strand appears to be the primary substrate for the human methyltransferase, since it is also methylated in isolation (53a). Interestingly, when the guanine residues in the C-rich strand are replaced by hypoxanthine residues, the sequence adopts a new structure, and its effectiveness as a methyltransferase substrate is abolished (53a). Since the structures of the substrate oligodeoxynucleotides used in earlier studies were not explored (147), it is reasonable to assume that the methyltransferase recognized unusual structures associated with C-rich single strands. This is clearly the most plausible inference, since active methylation by the human DNA methyltransferase can be taken as evidence for the existence of an activated three-nucleotide motif.
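The early recognition rules quoted above (a (G+C) content above 65% and two C-G sites about 13-17 nucleotides apart) can be written as a simple sequence filter. The sketch below is illustrative only: the function names are hypothetical, the spacing is assumed to be the distance between the cytosines of the two C-G sites with both limits inclusive, and, because later experiments showed that neither the second site nor the spacing is required, a sequence that fails this filter may still be an active substrate if it adopts an unusual structure.

```python
from typing import List


def gc_fraction(seq: str) -> float:
    """Fraction of G + C in a DNA sequence (case-insensitive)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def cg_sites(seq: str) -> List[int]:
    """0-based positions of the C of every C-G dinucleotide on this strand."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def meets_early_rules(seq: str, min_gc: float = 0.65,
                      min_gap: int = 13, max_gap: int = 17) -> bool:
    """Early, since-superseded rule of thumb for active substrates.

    Assumption: the 13-17 nt spacing is the distance between the cytosines
    of two C-G sites, with both limits inclusive.
    """
    if gc_fraction(seq) <= min_gc:
        return False
    sites = cg_sites(seq)
    return any(min_gap <= b - a <= max_gap
               for i, a in enumerate(sites) for b in sites[i + 1:])


# Hypothetical test sequences; the filler "GGCCA" contains no C-G step.
spaced = "CG" + "GGCCA" * 3 + "CG"    # C-G sites at 0 and 17, 84% G+C
too_far = "CG" + "GGCCA" * 4 + "CG"   # C-G sites at 0 and 22
at_rich = "CG" + "ATATA" * 3 + "CG"   # right spacing, but only 21% G+C

for name, s in [("spaced", spaced), ("too_far", too_far), ("at_rich", at_rich)]:
    print(name, round(gc_fraction(s), 2), cg_sites(s), meets_early_rules(s))
```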

3. DEMETHYLATION

While the enzymology of DNA methylation sets strong limits on the processes of de novo and methyl-directed methylation in the establishment of methylation patterns, it sets only weak limits on the process of demethylation. One of these limits is that enzymes that carry out the actual demethylation of the cytosine ring are unlikely to be found because of the unfavorable energetics of demethylation. The single report of an activity of this type (155) has not been confirmed. Demethylation of DNA might occur over a period of several cell cycles if access to a symmetrically methylated C-G dimer were blocked by a site-specific protein or a chromatin structure. Moreover, negative effectors of the methyltransferase might block its activity over a period of cell cycles in a sequence-specific manner. Active demethylation (156, 157) might also occur by a process analogous to very-short-patch repair. In these reactions, demethylation would be carried out by excision of the 5-methylcytosine nucleotide, followed by repair synthesis to introduce cytosine. The enzymology of the methyltransferase indicates that demethylation of this type could introduce stable demethylation at a symmetrically methylated C-G dimer only if excision and repair were repeated sequentially on both strands, since a site repaired on one strand only would be hemimethylated and therefore a substrate for the methyl-directed reaction.
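These limits on demethylation can be checked with a toy model of a single C-G dyad. In this sketch (all names hypothetical), a dyad is a pair of flags recording whether the top and bottom strands carry 5-methylcytosine; replication gives each daughter one parental strand paired with an unmethylated new strand; the methyl-directed activity converts any hemimethylated dyad to the symmetric state; and, as a simplifying assumption, no de novo methylation occurs. Excision repair is modeled as replacing 5-methylcytosine by cytosine on one strand, and blocked access is modeled as skipping the methyl-directed step for a chosen number of cycles.

```python
from typing import List, Tuple

Dyad = Tuple[bool, bool]  # (top strand carries 5mC, bottom strand carries 5mC)
FULL: Dyad = (True, True)


def replicate(d: Dyad) -> List[Dyad]:
    """Semiconservative replication: each daughter keeps one parental strand."""
    top, bottom = d
    return [(top, False), (False, bottom)]


def methyl_directed(d: Dyad) -> Dyad:
    """Hemimethylated dyads are completed; unmethylated ones are left alone
    (no de novo methylation in this toy model)."""
    return FULL if any(d) else d


def excise(d: Dyad, strand: str) -> Dyad:
    """Hypothetical excision repair replacing 5mC by C on one strand."""
    top, bottom = d
    return (False, bottom) if strand == "top" else (top, False)


def fraction_fully_methylated(start: Dyad, cycles: int, blocked: int = 0) -> float:
    """Fraction of descendant dyads that are symmetrically methylated after
    `cycles` replications; the methyl-directed step is skipped during the
    first `blocked` cycles (e.g. access hindered by a bound protein)."""
    population = [start]
    for cycle in range(cycles):
        daughters = [child for d in population for child in replicate(d)]
        if cycle >= blocked:
            daughters = [methyl_directed(child) for child in daughters]
        population = daughters
    return sum(1 for d in population if d == FULL) / len(population)


# One strand excised, then promptly remethylated by the methyl-directed activity.
one_strand = methyl_directed(excise(FULL, "top"))
print(fraction_fully_methylated(one_strand, 5))

# Both strands excised before the enzyme acts: the loss is stable.
both_strands = excise(excise(FULL, "top"), "bottom")
print(fraction_fully_methylated(both_strands, 5))

# Passive loss: methyl-directed step blocked for 2 of 3 cycles.
print(fraction_fully_methylated(FULL, 3, blocked=2))
```

In this toy model a single-strand excision is erased by the methyl-directed activity unless the second strand is excised before the enzyme acts, whereas each blocked cycle halves the fraction of fully methylated dyads that remains once the block is lifted, so passive loss is gradual.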

E. Enzymology of Disturbances in Patterning Produced by DNA Damage

1. BASE ANALOGS

5-Azacytidine, 5-azadeoxycytidine (3), and 5-fluorodeoxycytidine (158) have strong effects on DNA methylation patterns. Base substitutions of this type introduce DNA damage in the form of a hydrogen-bond acceptor at position 5 of the ring that is not present in the normal cytosine residues they replace. This is expected to alter the chromatin structure in regions where the base analogs


have been incorporated, because proteins that normally interact in a sequence-specific fashion through contacts with the major groove of the helix will no longer interact appropriately with their target sequences. Altered methylation patterns might also result from indirect inhibition of the methyltransferase caused by damage to chromosome structure induced by 5-azacytidine and 5-fluorocytidine, in addition to the direct inhibition of the enzyme caused by the formation of covalent complexes with the methyltransferase. Damage to chromosome structure through the alteration of hydrogen-bonding potential in the major groove might also explain the capacity of 5-bromodeoxyuridine to induce the apparent de novo methylation of the thymidine kinase gene in hamster cells (107). The damaging effects of 5-azacytidine on chromosome structure have now been well documented (159-164). In Drosophila, for example, 5-azacytidine is a genotoxin (165), even though this organism has no 5-methylcytosine in its DNA. Since 5-bromodeoxyuridine is not a strong methyl director (28), its effects on chromosome structure might promote de novo methylation by introducing conformational change near C-G sequences.

2. ALKYLATION DAMAGE AND MISINCORPORATED NUCLEOTIDES

Many mutagens generate electrophiles that form adducts with DNA. Tobacco contains several carcinogenic compounds, of which the volatile N-nitroso compounds are among the best studied. Dimethylnitrosamine, diethylnitrosamine (166), and N-nitrosomorpholine (167) have been detected in significant amounts in tobacco. Detailed studies of the cellular fates of several of these compounds suggest that they are metabolized by cellular enzymes to alkylating agents that attack DNA. Dimethylnitrosamine, for example, is activated in target tissues to generate a DNA-methylating agent (168). O6-Methylguanine is thought to be an especially important alkylation product, since it tends to mispair with thymine and thus has the potential for miscoding during DNA replication (169). N7-Methylguanine is relatively unstable, since its glycosyl bond is made labile; it can decay further, either chemically or through the action of a specific glycosylase, to an apurinic site (170). Apurinic sites are also mutagenic intermediates with a potential for miscoding (171). A similar cascade of events has been identified in N-nitrosopyrrolidine activation (172, 173), in which cyclic adducts are also labile and decay chemically to apurinic sites in DNA (173). Guanine residues are the most common sites of alkylation in DNA, and regions high in G and C are especially good targets in genomic DNA (174). Since these regions are often rich in the C-G dinucleotides that are sites of naturally occurring cytosine methylation, it is important to ask what sort of interactions alkylation damage would have with the methylation system.


Work relating the expression and methylation of individual genes has not produced a clear picture. For example, when cultured pituitary cells are exposed to an alkylating agent, cell lines that do not express prolactin can be isolated with high frequency. These cells can exhibit a stable prolactin-negative phenotype, but they can be reverted to the prolactin-positive phenotype by treatment with the DNA methylation inhibitor 5-azacytidine (175). On the basis of this and the observation of a twofold stimulation of methyltransferase activity by alkylated poly(dG·dC), it was concluded (176) that alkylation of DNA stimulates the action of the DNA methyltransferase in vivo, inappropriately silencing the prolactin gene; that is, alkylation produced hypermethylation of this gene. In sharp contrast, another report showed that treatment with the tobacco carcinogen N-nitrosomorpholine, which is similar to N-nitrosopyrrolidine in its ability to alkylate DNA (172, 173, 177), produces hypomethylation of the c-myc protooncogene (177). The enzymology of DNA methyltransferase suggests that alkylation damage can elicit either hypomethylation or hypermethylation at selected sites, depending simply on the orientation of the unusual structure produced by a lesion at the C-G dinucleotide. Thus, any perturbation in helical structure that lowers the stacking energy or protonates N-3 of a cytosine ring at the methyl acceptor position in the three-nucleotide recognition motif will enhance methylation by mimicking the transition state or facilitating nucleophilic attack at C-6. In contrast, any alkylation event that destroys required major-groove interaction sites in the three-nucleotide motif will block methylation.

F. Deamination at C-G Dinucleotides

The C-G dinucleotide is underrepresented in the human genome (81) relative to its expected nearest-neighbor frequency in a random-sequence genome with the human G and C content. Since natural selection maintains the nonrandom nature of the genome, the underrepresentation of this sequence might indicate some form of selection against its presence, acting at the DNA, RNA, or protein level but reflected at the DNA level. For example, coding sequences may exclude C-G and T-A in order to distinguish these sequences from control sequences and sequences outside genes (178-180), where C-G and T-A sequences are more abundant (180-182). Maintenance of this genomic mosaic might explain the underrepresentation of C-G. Alternatively, these dinucleotides might be lost through genetic drift driven by the presence of 5-methylcytosine at the C-G site, because deamination of 5-methylcytosine to thymine presents the cell with a dilemma of strand choice in excision repair: the resulting G·T mispair contains two bases that are both normally present in DNA, so neither is marked as the damaged one. This latter argument was put forward (183) to explain the underrepresentation of C-G in the human


globin genes. Thus, C-G sites might act as mutational hot-spots in human tissues because a methyltransferase specific for this site is expressed by human cells, in the same way that the CCAGG sequence in the lacI gene of Escherichia coli is a hot-spot for mutation when the DNA methyltransferase product of the dcm gene, specific for the CC(A/T)GG sequence, is expressed in E. coli (184). Genetic drift in codon usage might accentuate the differences between coding and noncoding sequences in organisms in which these regions are differentially methylated (178-180). The concept of drift driven by 5-methylcytosine is supported by the clustering of C-Gs in control elements (islands) that lack 5-methylcytosine (181, 182) and by the high frequency of C-to-T transitions at C-G sites in human mutations overall (185, 186). Thus, an important question centers on the role of the human DNA methyltransferase in genetic drift and mutagenesis. Does it play a central role by enzymatically promoting deamination during catalysis, or does it play a passive, secondary role by generating 5-methylcytosine as an enzymatic product that is subsequently deaminated to thymine in DNA by other means?

1. ENZYMATIC DEAMINATION

The capacity of the enzyme for deamination has been detailed above. The enzymatic production of thymine has not yet been observed for either human or bacterial enzymes (13, 44). Thus, the fidelity of the methyl-directed activity (i.e., its ability to copy a resident methylation pattern in vitro without introducing mutations) is at least as good as the fidelity exhibited by the mammalian DNA polymerases in copying a DNA sequence in vitro (13). An error rate below this replication noise level could not significantly alter the mutation rate at C-G sites (13). As noted above, the bacterial M.HpaII permits a low level of deamination when neither AdoMet nor AdoHcy is provided in vitro, suggesting that an unoccupied AdoMet binding site permits protonation of the carbanion at C-5 and subsequent hydrolysis of the 5,6-dihydrocytosine intermediate to form uracil. AdoHcy binding prevents this reaction (44), and so does AdoMet binding (13, 44). In the latter case, the normal product, 5-methylcytosine, is produced, but 5-methyluracil (i.e., thymine) is not (13). With reference to Fig. 8, it can be seen that formation of the carbanion in the absence of AdoMet might allow water to enter through the access channel used by AdoMet, promoting hydrolytic deamination to uracil. Thymine is not produced by the enzyme because the active site is sealed against water; deamination therefore cannot occur in the presence of bound cofactor (steps 3-4). This also illustrates an important point: the deamination reaction is necessarily hydrolytic (13). Work with the crystal structure of the FdC-containing DNA complex formed with HhaI methyltransferase suggests that the N-4 amino group is stable during the long-term manipulation required for crystallization and crystallographic analysis (36a). Thus, the complex probably does not decay directly as suggested (43), but must be permitted to react with water, which can be effectively sealed out by the enzyme during catalysis (13). One can imagine a situation in which AdoMet limitation permits the production of uracil, followed by the subsequent binding of AdoMet and catalysis to produce thymine. This has not yet been observed. In fact, the human enzyme appears to have evolved a fail-safe mechanism that prevents the conversion of uracil to thymine, since it will not methylate uracil placed in an oligodeoxynucleotide at a 5-methylcytosine-targeted methyl acceptor site (53a). Nevertheless, it remains possible that uracil produced by the methyltransferase under conditions of AdoMet limitation might survive until subsequent rounds of replication replaced it with thymine. While this might be possible in bacteria and in mammalian cells lacking uracil glycosylase (44), it is unlikely to play a role in C-to-T transitions at C-G sites in normal cells. It is important to note that uracil production by methyltransferases in bacteria should have at least two consequences in vivo. First, ung- bacteria should have elevated general mutation rates in a dcm+ background. However, the mutation rate in ung- dcm+ bacteria is not elevated beyond that predicted from the spontaneous chemical rate (187). Second, ung- in a dcm+ background should behave as a specific mutator locus for the CC(A/T)GG sequence. In fact, just the opposite was observed (188): C-to-T transitions associated with 5-methylcytosine hot-spots were suppressed in dcm+ ung- bacteria. This would be consistent with the ability of uracil in DNA to inhibit the methyltransferase (53a). If
this occurs in vivo, one would expect CC(A/T)GG to become demethylated, much as if the cells had been treated with 5-azacytidine. The dcm hot-spots in lacI would be suppressed because they would no longer be methylated. One of the major difficulties with the possible involvement of DNA methyltransferase catalysis in the direct production of C-to-T transitions has been pointed out (189). Numerous C-G sites in the human p53 and human low-density lipoprotein receptor genes carry 5-methylcytosine. Nevertheless, only a few of these sites are hot-spots for mutagenesis (189, 190). This is in contrast with the results in lacI, where each sequence corresponding to the dcm methyltransferase recognition site (CC(A/T)GG) also corresponds to a mutational hot-spot compared to surrounding sequences.
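The logic of this objection can be made explicit with a small calculation. Assume, hypothetically, that every methyl-directed catalytic event carries the same small probability p of deaminating the target cytosine, and that a stably methylated C-G undergoes one such event per cell cycle. The probability that a site has suffered at least one C-to-T transition after n cycles is then 1 - (1 - p)^n, which is identical for every maintained site; a hot-spot requires some second, site-specific factor that raises p locally. The function name and probabilities below are illustrative assumptions, not measured values.

```python
def prob_transition(p_per_event: float, cycles: int, events_per_cycle: int = 1) -> float:
    """Probability of at least one deamination after `cycles` cell cycles,
    assuming independent catalytic events with a constant per-event probability."""
    if not 0.0 <= p_per_event <= 1.0:
        raise ValueError("p_per_event must be a probability")
    return 1.0 - (1.0 - p_per_event) ** (cycles * events_per_cycle)


CYCLES = 100
P_BASE = 1e-6  # hypothetical per-event deamination probability

# Catalysis alone: every maintained site is remethylated once per cycle,
# so every site gets the same probability and no hot-spot emerges.
uniform = {site: prob_transition(P_BASE, CYCLES)
           for site in ("hot-spot C-G", "adjacent C-G 1", "adjacent C-G 2")}

# A hypothetical site-specific factor (e.g. a local structure that lets water
# reach the dihydrocytosine intermediate) raising p tenfold at one site.
with_factor = dict(uniform)
with_factor["hot-spot C-G"] = prob_transition(10 * P_BASE, CYCLES)

for site in uniform:
    print(f"{site:15s} {uniform[site]:.2e} {with_factor[site]:.2e}")
```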


Since the hot-spots in p53 are not uniquely methylated, as is the case in lacI, adjacent C-G sites that also carry 5-methylcytosine ought to show a high frequency of mutation compared to unmethylated surrounding sequences. This is because each site must be methylated at the same rate (once in every cell cycle) in order to maintain stable methylation. Since the adjacent C-G sites are not hot-spots (190), one must invoke a second factor in explaining the hot-spot. Two possibilities come to mind. The methylated C-G at the hot-spot might present a special problem to the methyltransferase, perhaps because of its chromatin or DNA structure, that prevents the enzyme from effectively blocking the attack of water on the 5,6-dihydrocytosine intermediate. Alternatively, the methylated C-G at the hot-spot could be susceptible to nonenzymatic deamination because of a special conformation or structure in DNA or chromatin.

2. NONENZYMATIC DEAMINATION

Measured spontaneous rates of chemical deamination of cytosine and 5-methylcytosine are quite low in DNA (191, 192). Deamination of cytosine and 5-methylcytosine is expected to proceed via nucleophilic attack at C-6 (38, 40-42). Currently, one can only speculate on the nature of the nucleophile(s) that might play a role in chemical deamination in vivo. Although a variety of buffer anions can catalyze this deamination (38, 40), the well-known bisulfite-catalyzed deamination of cytosine (44) effectively illustrates the process. The key aspects of the reaction are the requirements for a nucleophile (bisulfite or buffer anion) and low pH. Proposals for the mechanisms of enzymatic catalysis and deamination by DNA methyltransferases share the themes originally set forth to explain these reactions (38, 40-42). The bisulfite-catalyzed deamination reaction can be viewed as essentially identical to that described for the methyltransferase, except that SO₃²⁻ serves as the nucleophile at C-6. Protonation of the carbanion to produce 5,6-dihydrocytosine 6-sulfonate is followed by hydrolysis of the 3,4 double bond. Elimination of ammonia generates 5,6-dihydrouracil 6-sulfonate, and β-elimination across the C-5-C-6 bond generates uracil and bisulfite. The bisulfite-catalyzed deamination of 5-methylcytosine to thymine occurs by the same route (193). Hydrolysis at the 3,4 double bond in cytosine or 5-methylcytosine requires that this double bond be accessible to solvent; thus, the reaction rate is expected to be sensitive to global and local DNA structure (192). In support of this expectation, it has been clearly shown that the deamination rate for double-stranded DNA is about 1/140th of that for single-stranded DNA under the same conditions (192). Moreover, bisulfite has been shown to promote deamination selectively at loop cytosines in tRNA (194). These observations raise the interesting possibility that nucleophiles in the nuclear milieu (e.g., glutathione or another -SH-containing peptide) might promote the deamination of 5-methylcytosine at sites distinguished by special conformations or structures in DNA or chromatin.

AdoMet can methylate DNA nonenzymatically (42, 195, 196). Although most of the methylation observed can be accounted for as 7-methylguanine, 3-methyladenine, or O6-methylguanine, a small amount appears to occur as thymine (42). The reaction was viewed as nonenzymatic methylation and deamination of cytosine catalyzed by a buffer nucleophile. While this reaction was originally thought to be a possible explanation for C-to-T transitions in general (42), it may offer an explanation for C-to-T transitions at the C-G dinucleotides in globin that have not yet been observed to be enzymatically methylated (82).

3. ROLE OF NUCLEIC ACID STRUCTURE IN DEAMINATION

The C-G dimer exhibits several rather unusual structural characteristics. It crystallizes as a parallel minihelix containing C·C and G·G base-pairs (197). Moreover, its characteristics within the confines of B-DNA suggest a high degree of structural polymorphism (198, 199), depending on flanking-sequence context. This has led to the suggestion that the tendency for unusual structure formation at C-G might explain a tendency for modification by 2-acetylaminofluorene (149) and selective binding of hydroxyellipticine (200) at this site. This in turn suggests that, in certain sequence contexts, a required methylation event (e.g., a methyl-directed methylation) would be attempted on a conformationally constrained sequence that would prevent the enzyme from protecting the 5,6-dihydropyrimidine intermediate from water. The same problem might occur as a consequence of damage involving the formation of adducts or abasic sites at the C-G site. Thus, DNA methyltransferase might mediate certain forms of damage-induced deamination at the C-G site. By the same token, selective attack by nucleophiles in the nuclear milieu on 5-methylcytosine in C-Gs with unusual conformations might produce site-specific deamination at C-Gs nonenzymatically.

4. ALTERNATIVES

While the possibility that related enzymatic and nonenzymatic processes might selectively enhance the rate of C-to-T transitions at C-G sites has been raised, as discussed above, current evidence seems to implicate 5-methylcytosine but has not yet clearly implicated DNA methyltransferase catalysis. That is, the problem appears to involve the presence, more than the process, of methylation. Perhaps the forward deamination rate is not really elevated at the C-G dinucleotide but occurs with a frequency equal to that in other dinucleotides. This would suggest that the structural polymorphism

HUMAN DNA (CYTOSINE-5)METHYLTRNSFERASE

105

at the C-G site (198)might hinder the repair of deaminated cytosines in this context. The specificity of the G.T mismatch binding protein (201),thought to function in the repair of deaminated 5-methylcytosine residues, is consistent with this possibility, since it does not require either the C-G dinucleotide or a 5-methylcytosine residue adjacent to the guanine residue in the G.T mispair (201-203).
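The two outcomes compared in this discussion can be made concrete with a minimal Python sketch. It assumes a single DNA strand written 5' to 3' and a caller-supplied set of methylated cytosine positions. It applies only the simplified rule stated in the text: a deaminated cytosine leaves uracil (a G:U pair), while a deaminated 5-methylcytosine leaves thymine (a G:T mispair). The function names are hypothetical, and the model ignores enzymology, kinetics and chromatin structure entirely.

```python
"""Hypothetical sketch: find C-G (CpG) dinucleotides on one DNA strand and
model the mispair left when the cytosine of a CpG is hydrolytically deaminated."""

from typing import List, Set, Tuple

# Simplified rule taken from the discussion in the text:
#   cytosine         -> uracil,  leaving a G:U pair (uracil glycosylase substrate)
#   5-methylcytosine -> thymine, leaving a G:T mispair (G:T mismatch repair substrate)
DEAMINATION = {
    "C": ("U", "G:U"),
    "5mC": ("T", "G:T"),
}


def cpg_sites(seq: str) -> List[int]:
    """Return 0-based positions of the C in each 5'-CG-3' on this strand."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def deaminate_cpg(seq: str, methylated: Set[int], pos: int) -> Tuple[str, str]:
    """Deaminate the cytosine of the CpG at `pos`.

    `methylated` is a hypothetical caller-supplied set of positions carrying
    5-methylcytosine. Returns the modified strand and the mispair that the new
    base forms with the guanine on the complementary strand.
    """
    if pos not in cpg_sites(seq):
        raise ValueError(f"no CpG at position {pos}")
    form = "5mC" if pos in methylated else "C"
    new_base, mispair = DEAMINATION[form]
    return seq[:pos] + new_base + seq[pos + 1:], mispair


if __name__ == "__main__":
    strand = "ATCGGACGTT"
    print(cpg_sites(strand))              # CpGs at positions 2 and 6
    print(deaminate_cpg(strand, {6}, 6))  # methylated CpG -> T, G:T mispair
    print(deaminate_cpg(strand, {6}, 2))  # unmethylated CpG -> U, G:U pair
```

Suppose the G:T mispair is not repaired before replication. Then one daughter duplex inherits a T·A pair in place of the original C·G. This is the C-to-T transition at C-G discussed above.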

IV. Conclusions

It now seems virtually certain that the cytosine methyltransferases from all bacterial and eukaryotic sources operate by a common mechanism. This mechanism involves nucleophilic attack at C-6 followed by formation of a 5,6-dihydrocytosine intermediate. It places certain constraints on the utility of the reaction and, at the same time, offers evolutionary opportunities to organisms that have opted to use cytosine methylation. The principal constraints have two sources. First, the carbanion and 5,6-dihydrocytosine intermediates cannot stack properly in DNA. Second, both are prone to hydrolytic deamination. Experiments probing these properties of the reaction suggest the following.

1. Organisms that use cytosine methylation have effectively dealt with the problem of deamination by sealing the intermediate away from water. This renders methyltransferase-induced deamination to thymine insignificant. The human enzyme, for example, produces no detectable thymine and will not convert uracil to thymine. This provides a fail-safe point at which any small amount of deamination to uracil can be repaired by uracil glycosylase.²

2. Any loosely stacked cytosine in B-DNA is potentially a transition-state analog for DNA methyltransferase. Organisms that use cytosine methylation may therefore have exploited this fact as an opportunity to regulate methylation patterns. Bacterial enzymes, with overriding requirements for methylation of complex sequences, do not appear to have done so. The human enzyme, however, has a relaxed sequence specificity that involves only three of the four nucleotides in the C-G dinucleotide. As a result, it can respond to loosely stacked cytosines in unusual DNA structures as transition-state analogs that activate methylation.

3. The phenomenon of concerted modification of interspersed repeated sequences defines the fundamental requirements for the establishment of methylation patterns: cis-acting sequence elements must be coupled with trans-acting factors that modulate methylation at multiple loci in identical sequences. The activation of the human enzyme by alterations in stacking energy at the C-G sequence predicts the existence of chromosome structures or site-specific DNA-binding proteins. These would activate site-specific de novo methylation by presenting unstacked or protonated cytosines in the C-G site to the DNA methyltransferase.

4. The human DNA methyltransferase actively recognizes unusual DNA structures. This suggests that the evolution of the enzyme, and perhaps of DNA methylation itself, may have been driven by the formation of unusual structures in DNA.

5. Modulation of the formation of unusual DNA structures may ultimately provide the common link among the various biological phenomena with which methylation has been indirectly associated.

6. Distinguishing among the several hypotheses put forward for the function of cytosine DNA methylation will depend on the nature of the factors ultimately found to modulate site-specific methylation. Suppose methylation-specific factors are found that interact with DNA to produce site-specific de novo methylation during development. That would suggest a role in controlling developmental processes. If passive, structurally induced methylation is found instead, a role in preventing damage or promoting repair during development or gametogenesis would be suggested.

²See article by D. W. Mosbaugh and S. E. Bennett in Vol. 48 of this series. (Eds.)

ACKNOWLEDGMENTS

Supported by grant 0388 from the Smokeless Tobacco Research Council, Inc. The author is a member of the Clinical Cancer Center of the City of Hope (CA33572-09). He would like to thank the many colleagues and students who have contributed to the evolution of these concepts over the past 10 years. Molecular models of the A·C mispair were constructed from coordinates supplied by William Hunter and Olga Kennard, Cambridge University. The model of protein-constrained DNA was derived from coordinates for the CAP protein-DNA complex (140), obtained from the Protein Data Bank at Brookhaven National Laboratory. I thank Xiaodong Cheng, Cold Spring Harbor Laboratory, for communicating unpublished results.

REFERENCES

1. S. Friedman, BBRC 89, 1328 (1979).
2. S. Friedman, Mol. Pharmacol. 19, 314 (1981).
3. P. A. Jones and S. M. Taylor, Cell 20, 85 (1980).
4. S. Friedman, JBC 260, 5698 (1985).
5. D. V. Santi, A. Norment and C. E. Garrett, PNAS 81, 6993 (1984).
6. J. K. Christman, N. Schneiderman and G. Acs, JBC 260, 4059 (1985).
7. D. V. Santi, C. E. Garrett and P. J. Barr, Cell 33, 9 (1983).
8. J. C. Wu and D. V. Santi, JBC 262, 4778 (1987).
9. D. G. Osterman, G. D. DePillis, J. C. Wu, A. Matsuda and D. V. Santi, Bchem 27, 5204 (1988).
10. C. Taylor, K. Ford, B. A. Connolly and D. P. Hornby, BJ 291, 493 (1993).
11. K. Ford, C. Taylor, B. Connolly and D. P. Hornby, JMB 230, 779 (1993).
12. L. Chen, A. M. MacMillan, W. Chang, K. Ezaz-Nikpay, W. S. Lane and G. L. Verdine, Bchem 30, 11018 (1991).
13. S. S. Smith, B. E. Kaplan, L. C. Sowers and E. M. Newman, PNAS 89, 4744 (1992).
14. R. Cox, Cancer Res. 40, 61 (1980).
15. R. Lauster, T. A. Trautner and M. Noyer-Weidner, JMB 206, 305 (1989).
16. J. Pósfai, A. S. Bhagwat, G. Pósfai and R. J. Roberts, NARes 17, 2421 (1989).
17. R.-W. C. Yen, P. M. Vertino, B. D. Nelkin, J. J. Yu, W. El-Deiry, A. Cumaraswamy, G. G. Lennon, B. J. Trask, P. Celano and S. B. Baylin, NARes 20, 2287 (1992).
18. S. Friedman and N. Ansari, NARes 20, 3241 (1992).
19. A. Bergerat, W. Guschlbauer and G. V. Fazakerley, PNAS 88, 6394 (1991).
20. S. S. Smith, T. A. Hardy and D. J. Baker, NARes 15, 6899 (1987).
21. N. K. Kochetkov, E. I. Budovskii, E. D. Sverdlov, N. A. Simukova, M. F. Turchinskii and V. N. Shibaev, in "Organic Chemistry of Nucleic Acids" (N. K. Kochetkov and E. I. Budovskii, eds.), p. 159. Plenum, New York, 1972.
22. G. P. Pfeifer, S. Grünwald, T. L. J. Boehm and D. Drahovsky, BBA 740, 323 (1983).
23. L. Vardimon and A. Rich, PNAS 81, 3268 (1984).
24. T. Bestor, NARes 15, 3835 (1987).
25. D. J. Baker, A. Laayoun and S. S. Smith, BBRC 196, 864 (1993).
26. J. M. Clark, N. Pattabiraman, W. Jarvis and G. P. Beardsley, Bchem 26, 5404 (1987).
27. T. A. Hardy, D. J. Baker, E. M. Newman, L. C. Sowers, M. F. Goodman and S. S. Smith, BBRC 145, 146 (1987).
28. D. J. Baker, T. A. Hardy and S. S. Smith, BBRC 146, 596 (1987).
29. D. J. Baker, J. L. C. Kan and S. S. Smith, Gene 74, 207 (1988).
30. S. S. Smith, D. J. Baker and L. A. Jardines, BBRC 160, 1397 (1989).
31. S. S. Smith, J. L. C. Kan, D. J. Baker, B. E. Kaplan and P. Dembek, JMB 217, 39 (1991).
32. S. S. Smith, R. G. Lingeman and B. E. Kaplan, Bchem 31, 850 (1992).
33. S. S. Smith, Mol. Carcinog. 4, 91 (1991).
34. S. S. Smith, J. NIH Res. 5, 18 (1993).
35. A. R. Fersht, Proc. R. Soc. London, Ser. B 187, 397 (1974).
36. F. Jordan and H. D. Sostman, JACS 95, 6544 (1973).
36a. S. Klimasauskas, S. Kumar, R. J. Roberts and X. Cheng, Cell 76, 357 (1994).
37. A. K. Dubey and R. J. Roberts, NARes 20, 3167 (1992).
38. R. Shapiro and R. S. Klein, Bchem 5, 2358 (1966).
39. B. E. Evans, G. N. Mitchell and R. Wolfenden, Bchem 14, 621 (1975).
40. R. Shapiro and R. S. Klein, Bchem 6, 3576 (1967).
41. H. Hayatsu, This Series 16, 75 (1976).
42. A. L. Mazin, O. A. Gimadutdinov, S. I. Turkin, N. N. Burtseva and B. F. Vanyushin, Mol. Biol. 19, 903 (1985).
43. E. U. Selker, ARGen 24, 579 (1990).
44. J.-C. Shen, W. M. Rideout III and P. A. Jones, Cell 71, 1073 (1992).
45. D. Ingrosso, A. V. Fowler, J. Bleibaum and S. Clarke, JBC 264, 20131 (1989).
46. X. Cheng, S. Kumar, J. Pósfai, J. W. Pflugrath and R. J. Roberts, Cell 74, 299 (1993).
47. T. Bestor, A. Laudano, R. Mattaliano and V. Ingram, JMB 203, 971 (1988).
48. P. Renbaum and A. Razin, FEBS Lett. 313, 243 (1992).
49. I. Nur, M. Szyf, A. Razin, G. Glaser, S. Rottem and S. Razin, J. Bact. 164, 19 (1985).
50. A. H. Bolden, C. M. Nalin, C. A. Ward, M. S. Poonian, W. W. McComas and A. Weissbach, NARes 13, 3479 (1985).
51. A. Bolden, C. Ward, J. A. Siedlecki and A. Weissbach, JBC 259, 12437 (1984).
52. A. H. Bolden, C. M. Nalin, C. A. Ward, M. S. Poonian and A. Weissbach, MCBiol 6, 1135 (1986).
53. Y. Gruenbaum, H. Cedar and A. Razin, Nature 295, 620 (1982).
53a. S. S. Smith, unpublished.
54. A. D. Riggs, Cytogenet. Cell Genet. 14, 9 (1975).
55. R. Holliday and J. E. Pugh, Science 187, 226 (1975).
56. L. Chen, A. M. MacMillan and G. L. Verdine, JACS 115, 5318 (1993).
57. S. S. Smith, D. J. Baker, L. Jardines and T. A. Hardy, J. Cell. Biochem., Suppl. I%, 300 (1988).
58. J. L. C. Kan and S. S. Smith, J. Cell. Biochem., Suppl. 13D, 218 (1989).
59. S. S. Smith, J. Cell. Biochem., Suppl. 14B, 138 (1990).
60. N.-W. Tan and B. F. L. Li, Bchem 29, 9234 (1990).
61. C.-W. Wong, N.-W. Tan and B. F. L. Li, JMB 226, 1137 (1992).
62. P. A. Hepburn, G. P. Margison and M. J. Tisdale, JBC 266, 7985 (1991).
63. G. P. Pfeifer, S. Grünwald, F. Palitti, S. Kaul, T. L. J. Boehm, H.-P. Hirth and D. Drahovsky, JBC 260, 13787 (1985).
64. L. Margulis and K. V. Schwartz, "Five Kingdoms: An Illustrated Guide to the Phyla of Life on Earth," 2nd Ed. Freeman, New York, 1 W.
65. D. Nathans and H. O. Smith, ARB 44, 273 (1975).
66. S. S. Smith and D. I. Ratner, BJ 277, 273 (1991).
67. P. A. Whittaker, A. McLachlan and N. Hardman, NARes 9, 801 (1981).
68. H. H. Evans and T. E. Evans, JBC 245, 6436 (1970).
69. H. H. Evans, T. E. Evans and S. Littman, JMB 74, 563 (1973).
70. A. Hildebrandt, Exp. Cell Res. 167, 271 (1986).
71. Y. Gruenbaum, T. Naveh-Many, H. Cedar and A. Razin, Nature 292, 860 (1981).
72. J. M. Magill and C. W. Magill, Dev. Genet. 10, M (1989).
73. P. J. Russell, K. D. Rodland, E. M. Rachlin and J. A. McCloskey, J. Bact. 169, 2902 (1987).
74. P. M. M. Rae and R. E. Steele, NARes 6, 2987 (1979).
75. S. Urieli-Shoval, Y. Gruenbaum, J. Sedat and A. Razin, FEBS Lett. 146, 148 (1982).
76. S. S. Smith and C. A. Thomas, Jr., Gene 13, 395 (1981).
77. V. J. Simpson, T. E. Johnson and R. F. Hammen, NARes 14, 6711 (1986).
78. A. P. Bird, JMB 118, 49 (1978).
79. J. Lewis and A. Bird, FEBS Lett. 265, 155 (1991).
80. A. P. Bird, NARes 8, 1499 (1980).
81. R. L. Sinsheimer, JBC 215, 579 (1955).
82. M. F. Perutz, JMB 213, 203 (1990).
83. H. P. Saluz, J. Jiricny and J. P. Jost, PNAS 83, 7167 (1986).
84. G. P. Pfeifer, S. D. Steigerwald, P. R. Mueller, B. Wold and A. D. Riggs, Science 246, 810 (1989).
85. M. Toth, U. Müller and W. Doerfler, JMB 214, 673 (1990).
86. D. M. Woodcock, P. J. Crowther and W. P. Diver, BBRC 145, 888 (1987).
87. P. J. Crowther, A. L. Cartwright, A. Hocking, S. Jefferson, M. D. Ford and D. M. Woodcock, NARes 17, 7229 (1989).
88. K. Hubrich-Kühner, H.-J. Buhk, H. Wagner, H. Kröger and D. Simon, BBRC 160, 1175 (1989).
89. J. H. Proffitt, J. R. Davie, D. Swinton and S. Hattman, MCBiol 4, 985 (1984).
90. C. Sapienza, Sci. Am. 263, 52 (1990).
91. L. A. Michalowsky and P. A. Jones, MCBiol 9, 1185 (1989).
92. E. Li, T. H. Bestor and R. Jaenisch, Cell 69, 915 (1992).
93. A. Bird, Cell 70, 5 (1992).
94. T. H. Bestor, J. NIH Res. 5, 57 (1993).
95. A. D. Riggs and P. A. Jones, Adv. Cancer Res. 40, 1 (1983).
96. G. Buschhausen, M. Graessmann and A. Graessmann, NARes 13, 5503 (1985).
97. G. Buschhausen, B. Wittig, M. Graessmann and A. Graessmann, PNAS 84, 1177 (1987).
98. J. Yisraeli, R. S. Adelstein, D. Melloul, U. Nudel, D. Yaffe and H. Cedar, Cell 46, 409 (1986).
99. W. S. Dynan, Trends Genet. 5, 35 (1989).
100. L. H. T. van der Ploeg and R. A. Flavell, Cell 19, 947 (1980).
101. L. Ho, V. A. Bohr and P. C. Hanawalt, MCBiol 9, 1594 (1989).
102. M. Lieb, Genetics 128, 23 (1991).
103. M. E. Dar and A. S. Bhagwat, Mol. Microbiol. 9, 823 (1993).
104. F. Hennecke, H. Kolmar, K. Bründl and H.-J. Fritz, Nature 353, 776 (1991).
105. A. Razin and A. D. Riggs, Science 210, 604 (1980).
106. M. Wigler, D. Levy and M. Perucho, Cell 24, 33 (1981).
107. M. Harris, Cell 29, 483 (1982).
108. R. Stein, Y. Gruenbaum, Y. Pollack, A. Razin and H. Cedar, PNAS 79, 61 (1982).
109. M. E. Tolberg and S. S. Smith, FEBS Lett. 176, 250 (1984).
110. M. Busslinger, J. Hurst and R. A. Flavell, Cell 34, 197 (1983).
111. M. Sheffery, R. A. Rifkind and P. A. Marks, PNAS 79, 1180 (1982).
112. J.-N. Lapeyre and F. F. Becker, BBRC 87, 698 (1979).
113. J. G. Reilly, C. A. Thomas, Jr., and A. Sen, BBA 697, 53 (1982).
114. E. S. Diala and R. M. Hoffman, BBRC 107, 19 (1982).
115. A. P. Feinberg and B. Vogelstein, Nature 301, 89 (1983).
116. S. S. Smith, J. C. Yu and C. W. Chen, NARes 10, 4305 (1982).
117. J. C. Cohen, Cell 19, 653 (1980).
118. I. Kuhlmann and W. Doerfler, Virology 118, 169 (1982).
119. M. E. Tolberg and S. S. Smith, BBA 783, 272 (1984).
120. M. E. Tolberg, S. J. Funderburk, I. Klisak and S. S. Smith, JBC 262, 11167 (1987).
121. D. D. Loeb, R. W. Padgett, S. C. Hardies, W. R. Shehee, M. B. Comer, M. H. Edgell and C. A. Hutchison III, MCBiol 6, 168 (1986).
122. A. F. Scott, B. J. Schmeckpeper, M. Abdelrazik, C. T. Comey, B. O'Hara, J. P. Rossiter, T. Cooley, P. Heath, K. D. Smith and L. Margolet, Genomics 1, 113 (1987).
123. S. S. Smith and M. E. Tolberg, in "Biochemistry and Biology of DNA Methylation" (G. L. Cantoni and A. Razin, eds.), p. 11. Liss, New York, 1985.
124. A. Bird, M. Taggart and D. Macleod, Cell 26, 381 (1981).
125. E. U. Selker and J. N. Stevens, MCBiol 7, 1032 (1987).
126. L. Rhounim, J.-L. Rossignol and G. Faugeron, EMBO J. 11, 4451 (1992).
127. M. Szyf, B. P. Schimmer and J. G. Seidman, PNAS 86, 6853 (1989).
128. P. Mummaneni, P. L. Bishop and M. S. Turker, JBC 268, 552 (1993).
129. D. Jähner, H. Stuhlmann, C. L. Stewart, K. Harbers, J. Löhler, I. Simon and R. Jaenisch, Nature 298, 623 (1982).
130. D. Simon, H. Stuhlmann, D. Jähner, H. Wagner, E. Werner and R. Jaenisch, Nature 304, 275 (1983).
131. A. J. Jeffreys, V. Wilson and S. L. Thein, Nature 314, 67 (1985).
132. A. J. Silva and R. White, Cell 54, 145 (1988).
133. R. Holliday, Sci. Am. 260, 60 (1989).
134. T. H. Bestor, EMBO J. 11, 2611 (1992).
135. T. H. Bestor and V. M. Ingram, PNAS 82, 2674 (1985).
136. R. Reeves and P. Cserjesi, JBC 254, 4283 (1979).
137. A. DePaoli-Roach, P. J. Roach, K. E. Zuker and S. S. Smith, FEBS Lett. 197, 149 (1986).
138. J. Momand, G. P. Zambetti, D. C. Olson, D. George and A. J. Levine, Cell 69, 1237 (1992).
139. E. U. Selker, TIBS 15, 103 (1990).
140. S. C. Schultz, G. C. Shields and T. A. Steitz, Science 253, 1001 (1991).
141. C. O. Pabo and M. Lewis, Nature 298, 443 (1982).
142. C. O. Pabo, W. Krovatin, A. Jeffrey and R. T. Sauer, Nature 298, 441 (1982).
143. I. Oberlé, F. Rousseau, D. Heitz, C. Kretz, D. Devys, A. Hanauer, J. Boué, M. F. Bertheas and J. L. Mandel, Science 252, 1097 (1991).
144. R. S. Hansen, S. M. Gartler, C. R. Scott, S.-H. Chen and C. D. Laird, Hum. Mol. Genet. 1, 571 (1992).
145. S. Karlin and V. Brendel, Science 259, 677 (1993).
146. G. P. Schroth, J. S. Siino, C. A. Cooney, J. P. H. Th'ng, P. S. Ho and E. M. Bradbury, JBC 267, 9958 (1992).
147. A. H. Bolden, C. A. Ward, C. M. Nalin and A. Weissbach, This Series 33, 231 (1986).
148. C. Ward, A. Bolden, C. M. Nalin and A. Weissbach, JBC 262, 11057 (1987).
149. D. Burnouf, P. Koehl and R. P. P. Fuchs, PNAS 86, 4147 (1989).
150. M. D. Topal, J. S. Eadie and M. Conrad, JBC 261, 9879 (1986).
151. S. A. Akman, R. G. Lingeman, J. H. Doroshow and S. S. Smith, Bchem 30, 8648 (1991).
152. A. I. H. Murchie and D. M. J. Lilley, NARes 20, 49 (1992).
153. D. Sen and W. Gilbert, Nature 334, 364 (1988).
154. S. Ahmed and E. Henderson, NARes 20, 507 (1992).
155. R. A. Gjerset and D. W. Martin, Jr., JBC 257, 8581 (1982).
156. A. Wilks, M. Seldran and J. P. Jost, NARes 12, 1163 (1984).
157. M. Szyf, L. Eliasson, V. Mann, G. Klein and A. Razin, PNAS 82, 8090 (1985).
158. J. Kaysen, D. Spriggs and D. Kufe, Cancer Res. 46, 4534 (1986).
159. F. K. Zimmermann and I. Scheel, Mutat. Res. 139, 21 (1984).
160. T.-A. Hori, Mutat. Res. 121, 47 (1983).
161. K. M. Call, J. C. Jensen, H. L. Liber and W. G. Thilly, Mutat. Res. 160, 249 (1986).
162. M. Schmid, D. Grunert, T. Haaf and W. Engel, Cytogenet. Cell Genet. 36, 554 (1983).
163. E. Viegas-Pequignot and B. Dutrillaux, Hum. Genet. 57, 134 (1981).
164. B. I. Carr, J. G. Reilly, S. S. Smith, C. Winberg and A. Riggs, Carcinogenesis 5, 1583 (1984).
165. A. J. Katz, Mutat. Res. 143, 195 (1985).
166. K. D. Brunnemann, L. Yu and D. Hoffmann, Cancer Res. 37, 3218 (1977).
167. K. D. Brunnemann, J. C. Scott and D. Hoffmann, Carcinogenesis 3, 693 (1982).
168. P. N. Magee and E. Farber, BJ 83, 114 (1962).
169. B. Singer and J. T. Kuśmierek, ARB 52, 665 (1982).
170. B. Singer, Cancer Invest. 2, 233 (1984).
171. L. A. Loeb, Cell 40, 483 (1985).
172. S. S. Hecht and R. Young, Cancer Res. 41, 5039 (1981).
173. F.-L. Chung, M. Wang and S. S. Hecht, Cancer Res. 49, 2034 (1989).
174. W. B. Mattes, J. A. Hartley, K. W. Kohn and D. W. Matheson, Carcinogenesis 9, 2065 (1988).
175. R. D. Ivarie and J. A. Morris, PNAS 79, 2967 (1982).
176. I. K. Farrance and R. Ivarie, PNAS 82, 1045 (1985).
177. P. A. Münzel, A. Pfohl-Leszkowicz, E. Röhrdanz, G. Keith, G. Dirheimer and K. W. Bock, Biochem. Pharmacol. 42, 365 (1991).
178. S. Ohno, PNAS 84, 6486 (1987).
179. S. Ohno, PNAS 85, 4378 (1988).
180. S. Ohno, PNAS 85, 9630 (1988).
181. M. McClelland and R. Ivarie, NARes 10, 7865 (1982).
182. A. P. Bird, Nature 321, 209 (1986).
183. W. Salser, CSHSQB 42, 985 (1978).
184. C. Coulondre, J. H. Miller, P. J. Farabaugh and W. Gilbert, Nature 274, 775 (1978).
185. D. N. Cooper and M. Krawczak, Hum. Genet. 85, 55 (1990).
186. D. N. Cooper and H. Youssoufian, Hum. Genet. 78, 151 (1988).
187. B. K. Duncan and B. Weiss, J. Bact. 151, 750 (1982).
188. B. K. Duncan and J. H. Miller, Nature 287, 560 (1980).
189. W. M. Rideout III, G. A. Coetzee, A. F. Olumi and P. A. Jones, Science 249, 1288 (1990).
190. M. Hollstein, D. Sidransky, B. Vogelstein and C. C. Harris, Science 253, 49 (1991).
191. M. Ehrlich, X.-Y. Zhang and N. M. Inamdar, Mutat. Res. 238, 277 (1990).
192. L. A. Frederico, T. A. Kunkel and B. R. Shaw, Bchem 32, 6523 (1993).
193. H. Hayatsu, Y. Wataya, K. Kai and S. Iida, Bchem 9, 2858 (1970).
194. Y. Furuichi, Y. Wataya, H. Hayatsu and T. Ukita, BBRC 41, 1185 (1970).
195. L. R. Barrows and P. N. Magee, Carcinogenesis 3, 349 (1982).
196. B. Rydberg and T. Lindahl, EMBO J. 1, 211 (1982).
197. M. Coll, X. Solans, M. Font-Altaba and J. A. Subirana, J. Biomol. Struct. Dyn. 4, 797 (1987).
198. S. El Antri, O. Mauffret, M. Monnot, E. Lescot, O. Convert and S. Fermandjian, JMB 230, 373 (1993).
199. O. Mauffret, B. Hartmann, O. Convert, R. Lavery and S. Fermandjian, JMB 227, 852 (1992).
200. M. Monnot, O. Mauffret, E. Lescot and S. Fermandjian, EJB 204, 1035 (1992).
201. K. Wiebauer and J. Jiricny, PNAS 87, 5842 (1990).
202. K. Wiebauer and J. Jiricny, Nature 339, 234 (1989).
203. J. Jiricny, M. Hughes, N. Corman and B. B. Rudkin, PNAS 85, 8860 (1988).


Molecular Properties and Regulation of G-Protein-Coupled Receptors

CLAIRE M. FRASER,¹ NORMAN H. LEE,

I. G-Protein-Mediated Signal Transduction .... 114
II. G-Protein-Coupled Receptors Are a Large Gene Family .... 115
III. Molecular Basis of Receptor-Ligand Interactions .... 121
   A. Biogenic Amine Receptors .... 121
   B. Peptide Hormone Receptors .... 127
IV. Molecular Basis of Receptor/G-Protein Interactions .... 130
   A. Biogenic Amine Receptors .... 131
   B. Defects in Receptor/G-Protein Coupling in Disease .... 134
   C. Constitutive Activation of G-Protein-Coupled Receptors: Implications for Control of Cellular Growth .... 135
V. Identification of Functional Domains Involved in Receptor Desensitization and Down-regulation .... 136
   A. β-Adrenergic Receptors .... 137
   B. α2-Adrenergic Receptors .... 141
   C. Muscarinic Acetylcholine Receptors .... 141
VI. Genetic Elements Controlling G-Protein-Coupled Receptor Expression .... 143
VII. Identification of Novel G-Protein-Coupled Receptors by Partial cDNA Sequencing .... 147
VIII. Conclusions .... 149
References .... 149

Cells in multicellular organisms must communicate in order to regulate physiological processes and coordinate function. A target cell responds to an extracellular signal by means of specific proteins, called receptors, that bind the signaling molecule and initiate a biological response. Many of the same signaling molecules and receptors are used in endocrine, paracrine, and synaptic signaling. The crucial differences lie in the speed and selectivity with which the signals are delivered to the target cells.

¹To whom correspondence may be addressed.

CLAIRE M. FRASER ET AL.

For many hormones, neurotransmitters, and chemotactic factors, signal transduction is accomplished through the interaction of bioactive molecules (agonists) with cell surface receptors. These receptors couple to one or more species of heterotrimeric guanine-nucleotide-binding regulatory proteins (G proteins).² Receptors that share this mechanism of signal transduction have been termed G-protein-coupled receptors (GPCRs). These receptors play a critical role in many physiological processes and have been targets for drug intervention and therapy in a wide range of diseases. On the basis of molecular cloning experiments, the number of genes encoding GPCRs has been estimated at 500-1000. That is approximately 1-2% of the total number of genes in the human genome.
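Taken together, the two ranges quoted above imply a total gene count for the human genome. The back-of-the-envelope check below uses only those figures. It is an illustration, not an estimate given by the authors.

```latex
N_{\text{total}} \approx \frac{N_{\text{GPCR}}}{f_{\text{GPCR}}}, \qquad
\frac{500}{0.01} = \frac{1000}{0.02} = 5 \times 10^{4}, \qquad
\frac{500}{0.02} = 2.5 \times 10^{4} \;\le\; N_{\text{total}} \;\le\; \frac{1000}{0.01} = 10^{5}
```

Pairing the matching endpoints of the two ranges gives a consistent value of about 50,000 genes. Pairing them the other way bounds the implied total between 25,000 and 100,000.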

²Abbreviations used: G protein, guanine-nucleotide-binding regulatory protein; GPCR, G-protein-coupled receptor; PBCM, propylbenzilylcholine mustard; 5-HT, 5-hydroxytryptamine (serotonin); 8-OH-DPAT, 8-hydroxy-2-(dipropylamino)tetralin; NK, neurokinin; TRH, thyrotropin-releasing hormone; Gs, stimulatory guanine-nucleotide-binding protein; Gi, inhibitory guanine-nucleotide-binding protein; GHRH, growth-hormone-releasing hormone; βARK, β-adrenergic receptor kinase; CHW, Chinese hamster fibroblasts; DDT1MF-2, smooth muscle cells derived from a Syrian hamster leiomyosarcoma of the ductus deferens; EST, expressed sequence tag.

I. G-Protein-Mediated Signal Transduction

Agonist occupation of a GPCR leads to the generation of one or more intracellular second messengers. This results from the activation of one or more effectors, which include effector enzymes (such as adenylyl cyclase, phospholipases A, C, or D, and phosphodiesterases) and, in specialized cells, ion channels. For G-protein-mediated signal transduction, it has been proposed that the signaling molecules are free to move laterally in the plane of the membrane. On this view, they interact in a manner dictated by their relative abundance and affinities for each other (1). The receptor, the inactive G-protein complex composed of three subunits (α, β, and γ), and the inactive effector enzymes are all associated with the plasma membrane. The binding of a hormone or neurotransmitter triggers the association of the G-protein complex with the agonist-occupied receptor. The α-subunit of the G protein then releases its bound GDP. The GDP is quickly replaced by GTP, the more abundant nucleotide in the cell. GTP binding triggers a conformational change in the α-subunit, which dissociates from β and γ and associates with an effector enzyme. The enzyme switches on and continues to synthesize second-messenger molecules until the α-subunit spontaneously hydrolyzes its GTP to GDP, returning to the inactive state. The α-subunit then dissociates from the enzyme and reassociates with β and γ to generate an inactive G-protein complex. Agonist dissociates from the receptor, restoring the unstimulated conformation (see 2 and 3 for reviews). Recent evidence also suggests a role for βγ-dimers in regulating the activity of certain effector enzymes (4, 5).

Thus, G proteins act as molecular switches, alternating between the inactive GDP-bound form and the active GTP-bound form. The intrinsic GTPase activity of the α-subunit ensures that the G protein remains active for only a short period. Mutations in the α-subunit that slow or block its GTPase activity prolong the active state, sometimes indefinitely (6). Such prolonged activation has been associated with uncontrolled growth in some cell types.

In addition to transducing an extracellular signal into an intracellular one, G-protein-mediated signal transduction greatly amplifies the initial signal from a single agonist molecule. Each agonist-occupied receptor can activate many G proteins, each of which can activate many effector enzymes. Each effector in turn can generate a large number of second-messenger molecules. As a result, a single agonist molecule can produce hundreds or thousands of second messengers inside the cell.

Certain specialized types of signal transduction operate through similar G-protein-mediated mechanisms (see 7 and references therein). In the visual system, a protein in the retina (rhodopsin) serves as a receptor for light. The light-absorbing molecule, retinal, is covalently bound to rhodopsin. Acting through a G protein called transducin, rhodopsin converts light absorption into an electrical signal. In the olfactory system, odorant receptors associate with G proteins that couple the receptors to a form of adenylyl cyclase unique to olfactory neurons. Similarly, recent evidence suggests that the perception of tastants also occurs via G-protein-mediated signaling mechanisms.

The structural organization of GPCRs, G proteins, and effector enzymes in the plasma membrane is still unresolved. Recent work in several laboratories suggests that GPCRs, G proteins, and some effector enzymes are associated with the cytoskeletal network, arguing for a restricted mobility of these signal-transduction molecules (8-11). It has been proposed that receptors and G proteins are coupled as large oligomeric structures. In this proposal, agonists and GTP act in concert to release monomers of G protein that interact with effector enzymes to cause activation or inhibition (12).
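The activation cycle described in this section behaves like a small state machine. The Python sketch below is a hypothetical, deliberately coarse model. The state and event names are assumptions, not terminology from this chapter. It steps one G protein through receptor association, GDP release, GTP binding, and GTP hydrolysis. It also shows that an α-subunit whose GTPase is fully blocked stays active. The text also allows mutations that merely slow the GTPase, and the sketch does not model that case. The last function expresses the amplification argument as a product of per-stage gains; any numbers passed to it are purely illustrative.

```python
"""Hypothetical, coarse-grained sketch of the G-protein cycle described in the text.

State and event names are illustrative assumptions, not standard terminology.
"""

from enum import Enum, auto
from typing import List


class GState(Enum):
    INACTIVE = auto()        # alpha-GDP associated with beta-gamma (heterotrimer)
    RECEPTOR_BOUND = auto()  # heterotrimer associated with agonist-occupied receptor
    EMPTY_SITE = auto()      # alpha has released its bound GDP
    ACTIVE = auto()          # alpha-GTP, dissociated from beta-gamma, on an effector


# (current state, event) -> next state, following the sequence in the text
TRANSITIONS = {
    (GState.INACTIVE, "receptor_binds"): GState.RECEPTOR_BOUND,
    (GState.RECEPTOR_BOUND, "gdp_released"): GState.EMPTY_SITE,
    (GState.EMPTY_SITE, "gtp_bound"): GState.ACTIVE,
    (GState.ACTIVE, "gtp_hydrolyzed"): GState.INACTIVE,  # alpha-GDP rejoins beta-gamma
}


def step(state: GState, event: str, gtpase_ok: bool = True) -> GState:
    """Apply one event; with gtpase_ok=False (hypothetical mutant) ACTIVE never ends."""
    if state is GState.ACTIVE and event == "gtp_hydrolyzed" and not gtpase_ok:
        return GState.ACTIVE
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} cannot occur in state {state.name}") from None


def run(events: List[str], gtpase_ok: bool = True) -> List[GState]:
    """Return the sequence of states visited, starting from INACTIVE."""
    state = GState.INACTIVE
    trace = [state]
    for event in events:
        state = step(state, event, gtpase_ok)
        trace.append(state)
    return trace


def second_messengers(g_per_receptor: int, effectors_per_g: int,
                      messengers_per_effector: int) -> int:
    """Amplification as a product of per-stage gains for one agonist-occupied receptor."""
    return g_per_receptor * effectors_per_g * messengers_per_effector


if __name__ == "__main__":
    cycle = ["receptor_binds", "gdp_released", "gtp_bound", "gtp_hydrolyzed"]
    print([s.name for s in run(cycle)])                   # ends back at INACTIVE
    print([s.name for s in run(cycle, gtpase_ok=False)])  # stays ACTIVE
    print(second_messengers(10, 2, 50))                   # illustrative gains only
```

As an example, take hypothetical gains of 10 G proteins per receptor, 2 effectors per G protein, and 50 second messengers per effector. Then `second_messengers(10, 2, 50)` returns 1000, within the "hundreds or thousands" range the text describes.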

II. G-Protein-Coupled Receptors Are a Large Gene Family

During the past decade, the structure of a large number of GPCRs has been elucidated by molecular cloning (Table I). This progress has been based, in large part, on the conservation of primary and secondary structure among GPCRs, particularly within subfamilies. That conservation allows new cDNA and genomic clones to be isolated by cross-hybridization and the polymerase chain reaction. From the available data, it appears that GPCRs may be the largest family of cell surface receptors, containing many hundreds of members. In fact, there are now many examples in which molecular cloning techniques have identified GPCR subtypes whose properties were not predicted, or were only weakly supported, by pharmacological data.

TABLE I
MEMBRANE RECEPTORS THAT INTERACT WITH GUANINE-NUCLEOTIDE REGULATORY PROTEINS

Peptide hormone receptors: Adrenocorticotropin (ACTH); Angiotensin; Antidiuretic hormone (ADH); Bombesin; Bradykinin; C5a anaphylatoxin; Calcitonin; Cholecystokinin (CCK); Corticotropin-releasing hormone (CRF); Endothelin; Gastrin; Glucagon; Glucagon-like peptide; Gonadotropin-releasing hormone (GnRH); Growth-hormone-releasing hormone (GRF); Interleukin-8 (IL-8); Kinins (bradykinin, substances P and K); Luteinizing hormone (LH); Melanocortin; Melanocyte-stimulating hormone (MSH); Neuropeptide tyrosine (NPY); Neurotensin; N-formyl peptide; Opiates; Oxytocin; Parathyroid hormone (PTH); Pituitary adenylate cyclase-activating protein; Secretin; Somatostatin; Thyrotropin-releasing hormone (TRH); Vasoactive intestinal polypeptide (VIP); Vasopressin
Neurotransmitter receptors: Adenosine; α-Adrenergic; β-Adrenergic; ATP; Dopamine; GABA; Glutamate; Histamine; Muscarinic acetylcholine; Octopamine; Serotonin (5-HT); Tyramine
Sensory systems: Vision (rhodopsins); Olfaction; Taste
Other agents: Cannabinoids; IgE; mas oncogene; Platelet-activating factor (PAF); Prostanoids; Thrombin
Glycoprotein hormone receptors: Chorionic gonadotropin; Follicle-stimulating hormone (FSH); Thyrotropin (TSH)

The photoreceptor rhodopsin was the first member of the GPCR family whose DNA and amino-acid sequences were elucidated (13, 14). Immunological data had suggested a structural similarity among GPCRs that bind biogenic amine neurotransmitters (15, 16). It was not until 1986, however, with the cloning and sequence analyses of a β-adrenergic (17) and a muscarinic acetylcholine (18) receptor, that the structural similarities among receptors that mediate G-protein-coupled signal transduction were confirmed.

All GPCRs are integral membrane proteins that range in size from approximately 400 to 1000 amino acids. Their endogenous ligands include such diverse structures as amine neurotransmitters, peptide hormones, large glycoprotein hormones, and sensory molecules, including odorants and tastants. Even so, the GPCRs share a well-conserved structure and topography. The unifying feature of GPCRs is the presence of seven hydrophobic domains, each 20 to 26 amino acids long, containing distinctive amino-acid patterns. These domains are assumed to be transmembrane α-helices oriented roughly perpendicular to the membrane (Fig. 1). This assumption is based on the known seven-helical structure of bacteriorhodopsin, an integral membrane protein from Halobacterium halobium (19). The membrane-spanning regions of GPCRs display significant amino-acid identity, ranging from 20% among unrelated receptors to over 90% among receptor subtypes. The extracellular and intracellular domains of GPCRs display more divergent amino-acid sequences. For both rhodopsin and the β2-adrenergic receptor, the N-terminus is located on the extracellular side of the membrane and the C-terminus on the intracellular side (20, 21).
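Two statements in this passage rest on simple sequence calculations: the seven hydrophobic domains, and the percent amino-acid identity among transmembrane regions. The Python sketch below illustrates both, but it is not the method used in the cited work. It substitutes the Kyte-Doolittle hydropathy scale, with the commonly used 19-residue window and 1.6 threshold, to flag candidate membrane-spanning segments. It then computes percent identity over the gap-free columns of a pre-aligned pair. The function names are hypothetical, only the 20 standard one-letter residue codes are accepted, and real transmembrane prediction needs more than a single threshold.

```python
"""Hypothetical sketch: a Kyte-Doolittle hydropathy scan for candidate
membrane-spanning segments, and percent identity of two pre-aligned sequences."""

from typing import List, Tuple

# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every run of `window` consecutive residues."""
    vals = [KD[aa] for aa in seq.upper()]  # KeyError for non-standard residues
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]


def hydrophobic_segments(seq: str, window: int = 19,
                         threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Half-open (start, end) residue ranges covered by windows above threshold."""
    marked = [False] * len(seq)
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h > threshold:
            for j in range(i, i + window):
                marked[j] = True
    segments, start = [], None
    for i, m in enumerate(marked + [False]):  # sentinel closes a final run
        if m and start is None:
            start = i
        elif not m and start is not None:
            segments.append((start, i))
            start = None
    return segments


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns in which neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    toy = "D" * 10 + "L" * 22 + "D" * 10  # one artificial hydrophobic stretch
    print(hydrophobic_segments(toy))      # one segment covering the L run
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # 9 of 10 columns match
```

The window method blurs segment edges: in the toy example, the flagged segment extends a few residues past the hydrophobic run on each side. Percent identity also has several definitions, for example dividing by alignment length rather than by gap-free columns. The choice made here is an assumption and changes the result for gapped alignments.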
A detailed coinparison of amino-acid sequences of GPCRs reveals that inany of these proteins contain coininon amino acids and domains (Fig. 2). It has been speculated that the most highly conserved amino-acid residues play an essential role in proper protein folding, whereas residues that are conserved only ainong major classes of receptors are responsible for their unique functional properties (22). Conserved polar residues contained within the transmembrane helices are always positioned on the same side of the helices, presumably located internally, and all but one of the conserved aromatic residues are located on the opposite, or external, faces of the helices (22). Several inodels for the three-dimensional structure of GPCRs with regard to the arrangement of helices in the plasma membrane have been proposed. Baldwin (23) examined the structural features of 204 GPCR sequences and deduced an arrangement of helices in three dimensions by

118

CLAIRE M. FRASER ET AL.

FIG.1. Membrane topography of the rat P,-adrenergic receptor, a representative G-protein-couplsdreceptor. The 400 amino acids in the rat p,-adrenergic receptor are shown (0). The N-terminal domain of the &-receptor is located extracellularly and the C-terminal domain is located intracellularly. The seven hydrophobic domains of the receptor are oriented in the plasma membrane (depicted by the boxed area) and are connected by alternating extracellular and intracellular domains. Amino-acid identities among rat PI- &- and p,-adrenergic receptors are indicated.

allocating each helix to a position appropriate to the extent of its lipid-facing surface area. Viewed from the intracellular surface of the membrane, the helices are arranged clockwise with their lipid-facing surfaces pointing outward (Fig. 3). All helices pack against their neighbors at small positive angles, with the exception of the helix III-helix IV pair, which packs at a small negative angle. This arrangement gives a closely packed structure at the intracellular surface, where receptor/G-protein interactions occur, and a more open structure in the extracellular half of the protein (23). From primary sequence data, members of the GPCR gene family can be classified into distinct subfamilies (Fig. 2). These include receptors that bind

[FIG. 2 alignment panel: transmembrane domains TM1 to TM7 for hβ1AR, hβ2AR, ham α1bAR, hα2aAR, hD1DR, h5HT1aR, hm1mAChR, hm2mAChR, mDOR, hSPR, hSKR, mTRHR, hFMLPR, hLH/CGR, hFSHR, hTSHR, rmGluR1, and rmGluR2]

FIG. 2. Alignments of the seven transmembrane domains (TM 1-7) and adjacent residues in representative G-protein-coupled receptors. The amino acids are represented in single-letter code. Residues in boldface type represent highly conserved amino acids. Underlined residues represent conservative substitutions. Shaded residues represent amino acids conserved within a subfamily of receptors. The receptor sequences illustrated include: hβ1AR, human β1-adrenergic receptor (39); ham α1bAR, hamster α1b-adrenergic receptor (194); hα2aAR, human α2a-adrenergic receptor (42); h5HT1aR, human 5-HT1A receptor (195); mTRHR, mouse thyrotropin-releasing hormone receptor (196); rmGluR1, rat metabotropic glutamate receptor 1 (197); rmGluR2, rat metabotropic glutamate receptor 2 (197). References for other sequences are in the legend to Fig. 10. (Reproduced from 198, with permission.)




FIG. 3. Possible orientation of the seven helices in G-protein-coupled receptors. Based on analyses of the sequences of the seven transmembrane helices from more than 200 GPCRs, a possible arrangement of the helices in the membrane has been proposed (23). According to this model, the helices are arranged in a clockwise orientation; this view is from the intracellular side of the membrane. The arrangement of the helices is such that helix III is least exposed to the lipid, and helices I, IV, and V are most exposed to the lipid environment of the membrane.

biogenic amines (e.g., epinephrine, dopamine, and acetylcholine), glycoprotein hormones (e.g., thyrotropin, follicle-stimulating hormone, and lutropin/chorionic gonadotropin), and neurokinins (substance P, substance K, and neuromedin K). The recent cloning of the calcitonin, parathyroid hormone, and secretin receptors has delineated another subfamily of GPCRs. These receptors are more closely related to each other (up to 42% sequence identity) than to any other GPCRs (24-26). In most cases, the receptors within a subfamily can be further divided into subtypes, each encoded by a separate gene. For example, at least five subtypes of dopamine receptors (D1-D5) (27) and at least 10 subtypes of serotonin receptors have been isolated by molecular cloning (28, 29). During the past 7 years, the cloning of genes and cDNAs encoding GPCRs, together with their expression in heterologous cell systems, has allowed characterization of the pharmacological and biochemical properties of individual receptor subtypes. In addition, a wealth of information on the relationship between GPCR structure and function has been obtained using mutagenesis techniques. These approaches have provided insights into such questions as (i) the domains involved in receptor-ligand interactions in neurotransmitter and peptide-hormone GPCRs, (ii) the domains involved in receptor/G-protein interactions, (iii) the molecular determinants of receptor desensitization and down-regulation, and (iv) the mechanisms involved in the transcriptional and post-transcriptional regulation of GPCR expression.

G-PROTEIN-COUPLED RECEPTORS


Such data, together with information from molecular modeling studies, have the potential to impact the process of drug discovery and design by providing a clearer picture of the three-dimensional ligand-binding site of GPCRs at the molecular level (23, 30-32).
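Statements such as "conserved polar residues lie on the same side of the helices," and Baldwin's allocation of helices by their lipid-facing surface, rest on helical-wheel reasoning: in an ideal α-helix each residue is rotated about 100° from the previous one (3.6 residues per turn). The sketch below projects a segment onto such a wheel and reports where its polar residues point and how tightly they cluster. The binary polar/apolar split is a simplifying assumption, and the helper names are hypothetical.

```python
import math

# Assumption: a simple binary polar/apolar classification (one-letter code).
POLAR = set("DEHKNQRSTY")


def wheel_angle(i: int, step: float = 100.0) -> float:
    """Angle in degrees of residue i (0-based) on an ideal alpha-helical wheel.

    An ideal alpha-helix has 3.6 residues per turn, i.e. 100 degrees per residue.
    """
    return (i * step) % 360.0


def polar_face(helix: str) -> tuple[float, float]:
    """Mean direction (degrees) and concentration (0..1) of polar residues.

    Concentration is the length of the mean resultant vector: values near 1
    mean the polar residues cluster on one face; values near 0 mean they are
    spread evenly around the helix.
    """
    xs, ys = [], []
    for i, aa in enumerate(helix.upper()):
        if aa in POLAR:
            theta = math.radians(wheel_angle(i))
            xs.append(math.cos(theta))
            ys.append(math.sin(theta))
    if not xs:
        raise ValueError("segment contains no polar residues")
    x, y = sum(xs) / len(xs), sum(ys) / len(ys)
    return math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y)
```

For a toy segment with serines at positions 0, 4, 7, 11, and 14 (`"SLLLSLLSLLLSLLSLLL"`), the serines fall within 40° of one another on the wheel and the concentration is about 0.88, whereas an all-serine 18-residue segment gives a concentration near zero because its residues are spread evenly around the helix.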

III. Molecular Basis of Receptor-Ligand Interactions

Because each GPCR is unique in its ligand-binding properties, it is likely that each subfamily of receptors has evolved unique domains for interacting with its endogenous ligands. As summarized below, data from mutagenesis studies on many GPCRs are beginning to reveal fundamental differences in the way that various classes of ligands interact with their respective receptors. Much evidence suggests that the determinants of neurotransmitter binding to GPCRs reside in the transmembrane helices, while the determinants of peptide-hormone binding are located primarily in extracellular domains. This difference is perhaps not surprising given the large difference in size between amine neurotransmitters and even small peptide hormones. While small neurotransmitters may be easily accommodated in a ligand-binding pocket formed by the transmembrane helices of GPCRs, peptide hormones are likely too large for such an interaction.

A. Biogenic Amine Receptors

1. β-ADRENERGIC RECEPTORS

To date, most of the work on the structure-function relationships of G-protein-coupled receptors has been carried out using the β-adrenergic receptor as a model. Large regions of the intracellular and extracellular hydrophilic domains of the β2-adrenergic receptor can be deleted without altering agonist and antagonist binding (33, 34). These observations suggest that the determinants of ligand binding in β-adrenergic receptors may reside in one or more of the transmembrane helices. Deletions in the first and second cytoplasmic loops produce receptors that are undetectable by immunoblotting, suggesting that these receptors are not correctly processed or inserted into the membrane (35). The catecholamines, the endogenous agonists for the β-adrenergic receptors, consist of a catechol ring and a protonated amine connected by a β-hydroxyethyl side-chain. Studies with synthetic adrenergic ligands have shown that the amino group and substitutions on the β-hydroxyethyl side-chain are important for both agonist and antagonist binding, and that the catechol ring is essential for agonist



activity (36). It has been suggested that the ligand-binding pocket of the β-adrenergic receptor contains acidic amino-acid residues that serve as counterions for the amine group of agonists and antagonists, and polar amino acids that form hydrogen bonds with the catechol hydroxyl groups (37). To identify amino acids involved in ligand binding to the β-adrenergic receptor, several strategies have been used, including the creation of chimeric receptors, substitution and deletion mutants, and site-directed mutagenesis. Chimeric receptors have proven useful for identifying structural domains that regulate agonist and antagonist specificity, as well as G-protein coupling. Chimeras constructed from α2-adrenergic and β2-adrenergic receptors reveal that the seventh transmembrane domain is a major determinant of antagonist binding (38). An aspartate residue at position 113 (Asp113) in helix III of the β2-adrenergic receptor is conserved among several receptor subtypes that bind biogenic amines, including β-adrenergic (39-41), α-adrenergic (42, 43), dopaminergic (44, 45), and muscarinic cholinergic receptors (46-49), suggesting an interaction between the amine group of the ligand and the carboxylate side-chain of Asp113. Substitution of Asp113 in the β2-adrenergic receptor with asparagine (Asn113) or glutamate (Glu113) significantly reduces receptor affinity for antagonists (50). Furthermore, the Asn113 mutant receptor displays a 10⁵-fold decrease in agonist potency for stimulation of adenylyl cyclase (37). Substitution of Asp113 with glutamate, which retains a carboxylate side-chain, has a less marked effect on receptor activation, resulting in a 10³-fold decrease in agonist potency (37). These data suggest that the carboxylate side-chain of Asp113 serves as a counterion for the amine group of β-adrenergic agonists and antagonists.
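The potency changes quoted for the Asp113 mutants are ratios of concentrations. As a minimal sketch, assuming potency is summarized by an EC50 value measured for each receptor, the fold decrease is EC50(mutant)/EC50(wild type) and its base-10 logarithm is the shift in log units (the drop in pEC50). The function name is hypothetical, and the EC50 values in the usage note are invented only to reproduce a 10⁵-fold shift.

```python
import math


def potency_shift(ec50_wild_type: float, ec50_mutant: float) -> tuple[float, float]:
    """Return (fold_decrease, log_units) in agonist potency for a mutant receptor.

    A larger EC50 at the mutant means lower potency; log_units equals the drop
    in pEC50, where pEC50 = -log10(EC50). Both EC50 values must use the same
    concentration unit.
    """
    if ec50_wild_type <= 0 or ec50_mutant <= 0:
        raise ValueError("EC50 values must be positive")
    fold = ec50_mutant / ec50_wild_type
    return fold, math.log10(fold)
```

With hypothetical EC50 values of 1 nM at the wild-type receptor and 100 µM at a mutant, `potency_shift(1e-9, 1e-4)` gives a fold decrease of about 10⁵, i.e. a shift of 5 log units; a 10³-fold decrease corresponds to 3 log units.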
Similar data have also been obtained from mutagenesis studies with other receptors that bind amine neurotransmitters, including α2-adrenergic receptors (51), muscarinic receptors (48), and histamine receptors (52), suggesting a common functional role for this conserved aspartate. Although the aspartate at position 113 in the β2-adrenergic receptor plays a role in binding the positively charged amino groups of β-adrenergic agonists and antagonists, a negatively charged amino acid at position 113 is not essential for agonist activation of the receptor. Strader et al. (53) substituted a serine residue (Ser113) for Asp113 in the β2-receptor, thus replacing the carboxylate side-chain of aspartate with the hydroxyl group of serine. A series of modified catecholamines was generated by replacing the amino-containing alkyl group with functional groups that could potentially interact with the hydroxyl group of Ser113 (53). Catechol derivatives capable of forming hydrogen bonds, such as catechol esters and ketones, activated the mutant receptor but did not activate the wild-type β-adrenergic receptor (53). Hence, the



negatively charged residue at position 113 in the native receptor appears to relate primarily to the chemical nature of the endogenous ligands for the β2-receptor and not to an absolute requirement for agonist activation (53). Structure-activity studies demonstrate that β-adrenergic agonists require a catechol ring bearing hydroxyl groups at the meta and para positions for full activity (36). Two serine residues (Ser204 and Ser207) in transmembrane domain V of the β2-adrenergic receptor have been identified as potential hydrogen-bonding sites for the hydroxyl groups of the catechol ring (Fig. 4) (54). This hypothesis is supported by the finding that agonists lacking either the meta- or para-hydroxyl group display agonist-binding properties similar to those of the mutant receptors lacking the serine at the corresponding locus (54). These serine residues are conserved in all G-protein-coupled receptors that bind catechol ligands (adrenergic and dopaminergic receptors), but are not found in receptor subtypes whose ligands lack a catechol ring (muscarinic cholinergic receptors and peptide-hormone receptors). However, mutagenesis experiments with the α2A-adrenergic receptor have indicated that the conserved serines do not necessarily play identical roles in all adrenergic receptors (51). Asp79, located in the second transmembrane segment of the β2-adrenergic receptor, is highly conserved among members of this gene family (Fig. 2). Substitution of Asp79 in the human β2-adrenergic receptor with asparagine (Asn79) results in significantly reduced agonist affinities but normal antagonist binding, although the effect on agonist affinity is most likely indirect (55). This mutant receptor (Asn79) does not display guanine-nucleotide-sensitive high-affinity binding of agonists, and, more importantly, agonist binding produces no increase in intracellular cAMP levels (55). This residue is also essential for agonist-induced signal transduction by muscarinic, α2-adrenergic, dopamine, and luteinizing hormone receptors (48, 51, 56, 57). It has been hypothesized that this highly conserved aspartate may be involved in an agonist-induced conformational change that is essential for receptor/G-protein interactions (55).

FIG. 4. Schematic diagram of the β2-adrenergic receptor, illustrating amino-acid residues important in agonist binding. A cross-section of the β2-adrenergic receptor in the plasma membrane, as viewed from the extracellular side of the membrane, is illustrated. The conserved aspartate in helix III and the conserved serines in helix V that have been implicated in agonist binding to the receptor are shown. Epinephrine is shown in the binding pocket of the receptor. The positively charged amino group of epinephrine is involved in an ionic interaction with the negatively charged side-chain of Asp113, and the meta- and para-hydroxyl groups of the catechol ring are hydrogen bonded to Ser204 and Ser207, respectively. (Adapted from 54.)

2. MUSCARINIC ACETYLCHOLINE RECEPTORS

Considerable progress has also been made in mapping the determinants of ligand binding in muscarinic acetylcholine receptors. The conserved aspartate residue in helix III appears to play a role in ligand binding similar to its role in β-adrenergic receptors (48). This hypothesis was confirmed using [3H]propylbenzilylcholine mustard ([3H]PBCM) as an affinity label to identify regions of the muscarinic receptor responsible for binding muscarinic antagonists (58, 59). The aziridine portion of PBCM corresponds to the positively charged onium group of muscarinic ligands; because it undergoes attack by nucleophilic amino acids, it should in theory label the residue that acts as a counterion for the onium moiety. Purification and peptide sequence analyses of labeled rat brain muscarinic receptors indicate that [3H]PBCM labels the conserved aspartate in helix III of the receptor, consistent with the results of mutagenesis experiments (58, 59). Molecular modeling studies suggest that the aspartate in helix III of all biogenic amine receptors is surrounded by three conserved aromatic amino acids that may influence the ion pair of the receptor-ligand complex through charge-transfer interactions (32, 60). Experimental evidence in support of this hypothesis derives from the observation that mutation of Trp192 (helix IV) and Trp503 (helix VI) in the m3 muscarinic receptor produces a marked reduction in ligand affinities (61). Because these amino-acid residues are conserved, it seems reasonable to speculate that they may play similar roles in other G-protein-coupled receptors. Another series of mutagenesis studies attempted to identify amino-acid residues in muscarinic receptors that interact specifically with the acetylcholine ester moiety by means of hydrogen bonding. The hydrophobic core of muscarinic receptors formed by the seven transmembrane helices contains several serine, threonine, and tyrosine residues that are not found in other



FIG. 5. Schematic diagram of the m3 muscarinic receptor illustrating amino-acid residues important in agonist binding. A cross-section of the muscarinic receptor in the plasma membrane, as viewed from the extracellular side of the membrane, is illustrated. Acetylcholine is shown in the binding pocket of the receptor. The positively charged ammonium headgroup of acetylcholine is involved in an ionic interaction with the negatively charged side-chain of Asp147. The polar amino acids that have been implicated in high-affinity agonist binding are also indicated. (Adapted from 63.)

GPCRs, suggesting that some or all of these residues may be involved in binding muscarinic receptor-specific ligands. Consistent with this hypothesis is the finding that two threonine residues (Thr231 and Thr234 in transmembrane helix V of the rat m3 muscarinic receptor) and four tyrosine residues (Tyr148 in helix III, Tyr506 in helix VI, and Tyr529 and Tyr533 in helix VII of the rat m3 muscarinic receptor) are required for high-affinity muscarinic agonist binding (62). Based on the large number of residues that influence agonist binding in the muscarinic receptors, it has been speculated that the receptor-agonist complex is formed by a series of hydrogen-bond interactions rather than a few direct points of contact (Fig. 5) (63).

3. SEROTONIN RECEPTORS

Of all the GPCRs that bind biogenic amine neurotransmitters, perhaps no subfamily is as diverse as the receptors that bind serotonin (5-HT). Molecular biology has confirmed the existence of four distinct types of 5-HT receptor: 5-HT1, 5-HT2, 5-HT3, and 5-HT4. Within each of these groups, multiple 5-HT receptor subtypes exist, and the total number of 5-HT receptors identified by molecular cloning is at least 10 (28). The 5-HT1,



5-HT2, and 5-HT4 receptors are members of the GPCR family, while the 5-HT3 receptors belong to the ligand-gated ion-channel receptor superfamily. Mutagenesis experiments have begun to address the molecular basis of 5-HT1 and 5-HT2 receptor-ligand interactions. Amino-acid residues in the second transmembrane domain of 5-HT1 and 5-HT2 receptors are important for agonist binding. Replacement of the conserved aspartate at position 82 in helix II of the 5-HT1A receptor with alanine produces a receptor without detectable agonist binding, suggesting that this residue is either directly involved in agonist binding or, through its charge, required for maintaining receptor conformation (64). Mutation of the corresponding aspartate in the 5-HT2 receptor also affects agonist binding; however, agonist affinity is reduced but not eliminated (65). The effects of this mutation in the 5-HT2 receptor are similar to those observed with adrenergic and muscarinic acetylcholine receptors. The reason for the different effects of this mutation at 5-HT1A and 5-HT2 receptors is not clear; however, it suggests that these subtypes of serotonin receptors may differ in their interactions with serotonin and other subtype-selective agonists. Amino-acid residues in the seventh transmembrane helix of serotonin receptors have also been implicated in agonist binding. Mutation of a conserved serine residue at position 393 to alanine in the 5-HT1A receptor reduces binding of the agonist [3H]8-OH-DPAT by 86% compared with the wild-type receptor (64). This finding suggests that hydrogen-bond interactions between this serine residue and the ring hydroxyl of [3H]8-OH-DPAT may be essential for binding of this ligand. The rat 5-HT1B receptor differs markedly from its human homologue, the 5-HT1Dβ receptor, in its affinities for various drugs, even though their primary structures are more than 90% identical (28, 29 and references therein).
Within the transmembrane domains, which in other biogenic amine receptors have been shown to participate in ligand binding, the rat and human receptors differ by only eight amino acids. Using site-directed mutagenesis, several laboratories have identified a single amino-acid difference in helix VII that is responsible for most of the known pharmacological discrepancies between the rat and human homologues (66-68). In the rat receptor, there is an asparagine at position 351; in the human receptor, this residue is replaced by a threonine. The asparagine in helix VII of the rat receptor is associated with a much higher affinity for pindolol and its derivatives (66-68). These results illustrate how a single amino-acid difference between species homologues of the same receptor can markedly influence receptor pharmacology. Moreover, they indicate that the ligand-binding properties of a given receptor subtype cannot necessarily be extrapolated across species, even when the overall amino-acid identity is quite high.
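The 5-HT1B comparison above combines two simple sequence measures: overall percent identity and the list of positions at which aligned segments differ. The helpers below compute both for sequences that are already aligned. They are illustrative sketches: conventions for percent identity vary (here, columns containing a gap are simply excluded), and the `differences` helper assumes gap-free segments that share one numbering scheme. The function names are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper())
                if x != gap and y != gap]
    if not compared:
        return 0.0
    identical = sum(x == y for x, y in compared)
    return 100.0 * identical / len(compared)


def differences(a: str, b: str, start: int = 1) -> list[tuple[int, str, str]]:
    """List (residue_number, residue_in_a, residue_in_b) where segments differ.

    Assumes both segments are aligned, gap-free, and numbered from the same
    residue `start`; homologues with numbering offsets need separate handling.
    """
    if len(a) != len(b):
        raise ValueError("segments must be aligned to equal length")
    return [(start + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

For example, with the toy segments `"AVNLG"` and `"AVTLG"` numbered from residue 349, `differences` returns `[(351, "N", "T")]`, and `percent_identity("ACDE", "ACDF")` gives 75.0.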



B. Peptide Hormone Receptors

1. TACHYKININ RECEPTORS

Chimeric and point-mutated tachykinin receptors have begun to shed light on the domains involved in binding peptide agonists and nonpeptide antagonists to this class of receptors. The primary and secondary structures of the tachykinin receptors are similar to those of the adrenergic and muscarinic receptors; however, critical differences must exist in these structures to confer specificity for peptide rather than small amine agonists. The three tachykinins [substance P, substance K (neurokinin A), and neuromedin K (neurokinin B)] all share a common C-terminal sequence, Phe-X-Gly-Leu-Met-NH2, and a similar range of biological activities. The receptors that bind the tachykinins are designated the neurokinin 1, neurokinin 2, and neurokinin 3 (NK1, NK2, and NK3) receptors and differ in their affinities for the peptides. It has been proposed that all three tachykinin receptors may recognize the common C-terminal domain of the peptides, whereas the divergent N-termini may determine receptor subtype selectivity (69, 70). Using chimeric NK1/NK2 receptors, the specificity for substance P was found to be determined primarily by the region of the receptors extending from helix II to the second extracellular loop, together with a small contribution from the N-terminal extracellular domain (71) (Fig. 6). Additional work with NK1/NK2 and NK1/NK3 chimeric receptors and point mutations also demonstrated that multiple extracellular domains in the receptors interact with peptide agonists; however, the three tachykinins do not interact with the same functional groups on each receptor (72). These conclusions are supported by findings that several tachykinin receptor domains contribute to the binding specificity of the tachykinin agonists, but to different degrees for each peptide (73).
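The shared Phe-X-Gly-Leu-Met-NH2 C-terminus can be written as a pattern over one-letter sequences, which makes explicit the split between the conserved C-terminal motif and the divergent N-terminal region discussed above. The sketch below uses a regular expression for F-x-G-L-M at the C-terminus. The C-terminal amide is not represented in a one-letter string, the peptide sequences are the standard mammalian forms written out for illustration, and the helper name is hypothetical.

```python
import re

# Common tachykinin C-terminus Phe-X-Gly-Leu-Met(-NH2), anchored at the end of
# the one-letter sequence; the C-terminal amide itself is not encoded.
TACHYKININ_MOTIF = re.compile(r"F.GLM$")

TACHYKININS = {
    "substance P": "RPKPQQFFGLM",
    "substance K (neurokinin A)": "HKTDSFVGLM",
    "neuromedin K (neurokinin B)": "DMHDFFVGLM",
}


def split_at_motif(peptide: str):
    """Return (divergent N-terminal part, conserved C-terminal motif), or None."""
    match = TACHYKININ_MOTIF.search(peptide)
    if match is None:
        return None
    return peptide[:match.start()], match.group(0)


for name, seq in TACHYKININS.items():
    print(name, split_at_motif(seq))
```

For substance P the split is `("RPKPQQ", "FFGLM")`; for the two neurokinins the motif is `"FVGLM"` and the N-terminal parts are `"HKTDS"` and `"DMHDF"`.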
Five residues conserved among the tachykinin receptors, at positions 23, 24, and 25 (N-terminal domain) and 96 and 108 (first extracellular loop), have been postulated to interact with the common determinants on the three peptide agonists (74). A number of nonpeptide tachykinin receptor antagonists specific for the NK1 and NK2 receptors have recently been described (74). These compounds display marked differences in affinity among the tachykinin receptor subtypes and for the same receptor subtype in different species (74). Using site-directed mutagenesis, two residues in the NK1 receptor, Val116 in helix III and Ile290 in helix VI, have been identified as responsible for the observed differences between rat and human NK1 receptors in binding affinities for the nonpeptide antagonists (75). These amino acids

FIG. 6. Mutational analysis of the substance-P receptor. Highlighted on this schematic diagram of the substance-P receptor are several key amino acids and domains that have been implicated in the binding of peptide agonists and nonpeptide antagonists. The amino-acid sequence of the receptor is given in single-letter code. The area in the middle of the figure represents the plasma membrane; the areas above and below the membrane represent extracellular and intracellular space. (Reproduced from 212, with permission.)



presumably do not interact directly with the antagonist compounds but are probably involved in helical packing of the receptor proteins. Other results suggest that residues in or near the second extracellular loop of the receptor are also involved in determining the affinity for nonpeptide antagonists (73, 76, 77). These findings indicate that the site of interaction of nonpeptide antagonists with the tachykinin receptors differs from that of the peptide agonists (Fig. 6). Furthermore, the interaction of the nonpeptide antagonists with the tachykinin receptors appears to be fundamentally different from the interaction of antagonists with the amine neurotransmitter receptors.

2. OTHER PEPTIDE-HORMONE RECEPTORS

a. Thyrotropin-Releasing Hormone Receptor. Thyrotropin-releasing hormone (thyroliberin, TRH) binds to its receptor as a neutral peptide, suggesting that ionic interactions between receptor and hormone may not be as critical for binding as they are for biogenic amine neurotransmitters. Consistent with this idea is the finding that mutations of conserved aspartate residues in the transmembrane domains and extracellular loops of the TRH receptor have no effect on the binding of the hormone (78).
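Substitutions discussed in this chapter, such as Asp113→Asn in the β2-adrenergic receptor or Asp82→Ala in the 5-HT1A receptor, are often written in one-letter shorthand (D113N, D82A). A minimal sketch of applying such a substitution to a sequence string follows; checking the wild-type residue before substituting guards against numbering errors. The function name and the toy sequence in the usage note are hypothetical.

```python
import re

# One-letter substitution shorthand: wild-type residue, 1-based position, new residue.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_point_mutation(seq: str, mutation: str) -> str:
    """Apply a substitution written as <wt><position><new>, e.g. 'D82A'.

    Positions are 1-based, as in receptor numbering; the wild-type residue is
    checked so that a numbering mismatch fails loudly instead of silently.
    """
    m = MUTATION.match(mutation)
    if m is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} outside sequence of length {len(seq)}")
    if seq[pos - 1] != wt:
        raise ValueError(f"expected {wt} at {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + new + seq[pos:]
```

For instance, `apply_point_mutation("MKDS", "D3A")` returns `"MKAS"`, while `"S3A"` on the same sequence raises `ValueError` because position 3 is an aspartate.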

b. N-Formyl-peptide Receptor. Using chimeric N-formyl-peptide and C5a anaphylatoxin receptors, the structural requirements for the binding of formyl peptides to their specific receptors have been investigated (79). Based on these studies, the ligand-binding pocket of the formyl-peptide receptor is postulated to include the second, third, and fourth extracellular domains together with the first transmembrane domain. The N-terminal domain is also apparently involved in ligand binding, perhaps by providing a lid to the ligand-binding pocket (79).
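Chimeric receptors such as the formyl-peptide/C5a, NK1/NK2, and α2/β2-adrenergic constructs are built by exchanging corresponding segments of two receptors. In the laboratory this is done at the DNA level; the sketch below only models the resulting protein sequence, assuming the two receptors are already aligned (with "-" for gaps) and that junctions are given as alignment column indices. The function and its arguments are hypothetical.

```python
def make_chimera(receptor_a: str, receptor_b: str, junctions: list[int]) -> str:
    """Build a chimera by alternating segments of two aligned receptors.

    junctions are 0-based alignment columns at which the source switches,
    starting with receptor_a. Alignment gap characters ('-') are removed from
    the final sequence.
    """
    if len(receptor_a) != len(receptor_b):
        raise ValueError("receptors must be aligned to equal length")
    bounds = [0, *sorted(junctions), len(receptor_a)]
    parts = []
    for k, (lo, hi) in enumerate(zip(bounds, bounds[1:])):
        source = receptor_a if k % 2 == 0 else receptor_b  # alternate donors
        parts.append(source[lo:hi])
    return "".join(parts).replace("-", "")
```

With placeholder strings, `make_chimera("AAAAAA", "BBBBBB", [2, 4])` returns `"AABBAA"`: the middle segment comes from the second receptor and both flanks from the first, mirroring a domain swap.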

c. Interleukin-8 Receptor. Interleukin-8 (IL-8) is a potent mediator of acute and chronic inflammation, so small nonpeptide antagonists of IL-8 binding may be valuable as anti-inflammatory agents. To better understand how IL-8 binds to its receptor, and ultimately to use this information in designing new IL-8 receptor ligands, site-directed mutagenesis has been used to map the ligand-binding site of the receptor. Both the N-terminal region and the third extracellular loop of the receptor are important functional domains involved in ligand binding (80, 81). A disulfide bridge between cysteine residues in the N-terminus and the third extracellular loop has been postulated to hold these domains in close spatial proximity to form the ligand-binding site (81). An aspartate at position 11 in the N-terminus of the receptor is most likely involved in binding one of the basic residues of IL-8 (81, 82).



IV. Molecular Basis of Receptor/G-Protein Interactions

The nature of the second-messenger pathway(s) activated in response to agonist binding to a GPCR is primarily determined by the type of G protein(s) with which it is able to interact. Through the use of transfected cell systems, it has been possible to study receptor/G-protein interactions in considerable detail and to begin to identify receptor domains that are directly involved in receptor/G-protein coupling. It has been presumed that the cytoplasmic loops of GPCRs form the sites of interaction between receptors and G proteins, and multiple lines of evidence from biochemical, immunological, and genetic approaches support this hypothesis. Because the family of GPCRs interacts with a number of distinct G proteins (83) (more than 16 species of Gα subunits have now been identified), it seems plausible that the sites for receptor/G-protein interactions might be located in cytoplasmic domains that contain divergent sequences. Thus, attention has focused on the third cytoplasmic loop and the C-terminus of this family of proteins, which display the greatest degree of size and sequence heterogeneity among the GPCR subclasses. Amino-acid homology among G-protein-coupled receptors has proven useful for identifying probable domains involved in receptor/G-protein interactions; however, a lack of knowledge of transmembrane protein structure has impeded the definitive identification of these domains. Other important insights into the overall nature of receptor/G-protein coupling have recently been obtained by applying molecular biology to the question of G-protein-mediated signal transduction. Multiple receptor subtypes that bind the same endogenous ligand can be coexpressed in a single cell.
Thus, it can be difficult to ascertain whether agonist-mediated activation of multiple signaling pathways in such cells reflects the ability of a single receptor subtype to couple to more than one second-messenger system or the stimulation of multiple, related receptors that each couple selectively to one signaling mechanism. Initial studies with transfected cells suggested that biogenic amine neurotransmitter receptors are capable of stimulating more than one species of G protein (51, 84). It was at first unclear whether these results were physiologically relevant or were due to overexpression of receptors in transfected cells, with a shift in the normal stoichiometry between receptors and G proteins; subsequent studies with α2-adrenergic (85), muscarinic (86), thyroid-stimulating hormone (87), and somatostatin receptors (88), for example, have confirmed that many classes of GPCRs couple to more than one G protein and activate more than one second-messenger system. In the case of the α2-adrenergic receptor (51, 85) and the thyroid-stimulating hormone receptor (89), single



amino-acid mutations in regions implicated in receptor/G-protein coupling eliminate the ability of each of these receptors to activate a specific intracellular signaling pathway. These findings suggest that the final biological response of a cell to a hormone or transmitter may, in large part, be determined by the species of receptors and G proteins that are expressed therein and the pleiotropy of G-protein-mediated signal transduction.

A. Biogenic Amine Receptors

1. β-ADRENERGIC RECEPTORS

The structural domains of the β-adrenergic receptor involved in G-protein coupling and activation have been examined by numerous approaches, including proteolysis, chimeric receptor construction, and site-directed mutagenesis. Removal of the central portion of the third cytoplasmic loop and of the cytoplasmic tail of the β-adrenergic receptor by limited proteolytic digestion does not impair receptor/G-protein coupling, suggesting that these regions are not involved in G-protein interactions (90). However, deletion of residues 239-272 in the third cytoplasmic loop results in a loss of receptor-mediated stimulation of adenylyl cyclase (33). Further analysis of the third cytoplasmic loop reveals that deletion of a short segment (residues 222-229) within the N-terminal portion of this loop eliminates the ability of the receptor to activate the cyclase (91). In addition, deletion of amino acids within the C-terminal portion of the third cytoplasmic loop (residues 258-270 or residues 267-273) produces mutant receptors with a substantially reduced ability to stimulate the cyclase (91, 92). Hence, the N- and C-terminal portions of the third cytoplasmic loop appear to be the domains essential for receptor/G-protein coupling. The wild-type β-adrenergic receptor typically displays both a high- and a low-affinity binding state for agonists (93). High-affinity agonist binding is associated with coupling of the receptor to the G protein. Mutant receptors containing deletions within either the N-terminal or the C-terminal portion of the third cytoplasmic loop display only a single affinity state for agonists that is not altered by the addition of GTP analogs or NaF (33, 91), suggesting that these mutant receptors are incapable of coupling to Gs, the stimulatory G protein.
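The high- and low-affinity agonist states described above are commonly modeled as two independent classes of binding sites, each obeying a hyperbolic mass-action isotherm, B = Bmax[L]/(Kd + [L]). The function below is a sketch of that two-site model under the assumption of no cooperativity between sites; the function name and the parameter values in the usage note are hypothetical. Within this model, a receptor that cannot couple to Gs, like the deletion mutants described above, would correspond to setting the high-affinity capacity to zero.

```python
def two_site_bound(ligand: float, bmax_high: float, kd_high: float,
                   bmax_low: float, kd_low: float) -> float:
    """Specific binding for a receptor population in two affinity states.

    Each state follows a hyperbolic (law of mass action) isotherm,
        B = Bmax * [L] / (Kd + [L]),
    and the two contributions are simply summed (no cooperativity assumed).
    Concentrations and Kd values must share one unit; Bmax values share another.
    """
    high = bmax_high * ligand / (kd_high + ligand)
    low = bmax_low * ligand / (kd_low + ligand)
    return high + low
```

For example, `two_site_bound(1.0, 50.0, 1.0, 50.0, 100.0)` combines half-saturation of the high-affinity sites (25 units) with about 0.5 units from the low-affinity sites, illustrating why low agonist concentrations report mainly on the coupled, high-affinity state.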
Substitution of Asp130 of the human β2-adrenergic receptor with asparagine results in a receptor with normal antagonist binding but a significantly higher affinity for agonists than the wild-type receptor (94). Although this mutant receptor displays guanine-nucleotide-sensitive agonist binding, it is unable to mediate increases in cAMP (94), suggesting that the functional coupling of the Asn130 β2-receptor to Gs is altered. These findings demonstrate that guanine nucleotide effects on agonist affinity can be dissociated from those on activation of Gs and adenylyl cyclase. From deletion


CLAIRE M. FRASER ET AL.

mutagenesis studies of the β2-adrenergic receptor, Hausdorff et al. (92) also concluded that the molecular determinants of the β2-adrenergic receptor involved in the formation of the ternary complex are not identical to those that transmit the agonist-induced stimulatory signal to Gs. Substitution of the conserved cysteine residue (Cys341) within the N-terminal segment of the cytoplasmic tail produces a significant reduction in the ability of the β2-receptor to stimulate adenylyl cyclase (95). This cysteine residue is thioesterified with palmitic acid (96), similar to the palmitoylation of the cysteine residues (Cys322 and Cys323) of rhodopsin (97). It has been proposed that the palmitoyl moieties of rhodopsin are embedded within the membrane, forming a fourth intracellular loop (97). The palmitoylated Cys341 of the β2-adrenergic receptor may likewise form an additional intracellular loop that could promote the proper configuration of the C-terminus of the third cytoplasmic loop and the N-terminus of the cytoplasmic tail, and thus facilitate receptor/G-protein coupling (96). Of interest are reports describing results with m1 (98) and m2 muscarinic receptors (99) and α2-adrenergic receptors (100), which also contain a cysteine in their C-terminal regions. Mutation of this cysteine residue in these biogenic amine receptors has no effect on agonist-mediated activation of their respective second-messenger pathways, indicating that this shared structural motif may play differing roles in different receptor/G-protein interactions. A series of chimeric α2/β2-adrenergic receptors has been used to further delineate the receptor domains involved in G-protein activation. The α2-adrenergic receptor and the β2-adrenergic receptor are both stimulated by epinephrine, but these receptors couple to different G proteins.
The β2-adrenergic receptor is coupled to Gs and activates adenylyl cyclase, whereas the α2-adrenergic receptor inhibits the cyclase via the inhibitory G protein (Gi). Substitution of the region extending from amino acid 174 at the N-terminus of helix V to amino acid 295 at the C-terminus of helix VI of the α2-adrenergic receptor with the corresponding region from the β2-adrenergic receptor yields a chimeric receptor capable of stimulating the cyclase with the pharmacological specificity of an α2-adrenergic receptor and an efficacy approximately one-third that of the wild-type β2-adrenergic receptor (38). A chimeric receptor that contains a β2-receptor sequence from amino acid 215 in the third cytoplasmic loop to 295 in helix VI stimulates cyclase activity, but with a greatly reduced efficacy. These data suggest that helices V and VI may be required for determining the specificity of β2-receptor coupling to Gs (38). To identify the receptor domain(s) involved in G-protein specificity (Gs or Gi), a series of chimeric receptors was constructed by substituting single or multiple segments of the N-terminus (S1) and C-terminus (S2) of the third cytoplasmic loop and the N-terminus of the cytoplasmic tail (S3) of the

G-PROTEIN-COUPLED RECEPTORS


β2-adrenergic receptor with the corresponding regions of the α2-adrenergic receptor (101). Multiple substitutions (S2,3 and S1,2,3) result in significant impairment of receptor/Gs coupling (101). Following pertussis toxin treatment, which uncouples receptors from Gi, the mutant receptor containing all three substitutions (S1,2,3) exhibits a substantial increase in agonist-mediated adenylyl cyclase activity (101). Furthermore, this mutant receptor (S1,2,3) displays high-affinity agonist binding both in the absence and in the presence of pertussis toxin. These findings suggest that the S1,2,3 mutant receptor is capable of coupling to Gi as well as Gs. The impaired coupling of this mutant receptor to Gs in the absence of pertussis toxin may reflect concurrent coupling to both Gs and Gi, whereas the reduction in Gs coupling of the S2 and S3 mutants is most likely due to an inability to couple to either Gs or Gi (101). The results of this study support the proposition that receptor/G-protein coupling and G-protein specificity may require the participation of multiple domains.

2. MUSCARINIC ACETYLCHOLINE RECEPTORS

Studies with chimeric m1/m2 or m2/m3 muscarinic receptors indicate that the third intracellular loop is sufficient to determine the selective coupling of muscarinic receptor subtypes to their respective effector enzymes (102, 103). Much of this specificity resides in the N-terminal end of this sequence (103). Up to 123 of the 156 amino acids of the central portion of the third intracellular loop of the mouse m1 muscarinic receptor can be deleted without decreasing the coupling of the receptor to the activation of phospholipase C (104), supporting the hypothesis that the membrane-proximal sequences of this loop determine G-protein interactions. However, several lines of evidence suggest that multiple intracellular domains of muscarinic receptors are involved in G-protein coupling.
The m1 muscarinic cholinergic receptor stimulates the release of inositol phosphates via the pertussis-toxin-insensitive G protein Gq. As observed with the adrenergic receptors, a highly conserved aspartate residue in the Asp-Arg-Tyr motif located at the beginning of the second intracellular loop of muscarinic receptors is important for normal receptor/G-protein coupling (48). In addition, replacement of either the entire third cytoplasmic loop (residues 211-364) or the N-terminal region (residues 215-226) of the third cytoplasmic loop of the m1 receptor with the corresponding domain of the β-adrenergic receptor produces a mutant receptor capable of stimulating adenylyl cyclase while retaining the ability to stimulate inositol phosphate release (105). Substitution of the second cytoplasmic loop of the m1 muscarinic receptor with the comparable region of the β-adrenergic receptor decreases coupling to Gq but does not promote coupling to Gs. However, substitution of both the second and third cytoplasmic loops of the



m1 muscarinic receptor potentiates the activation of adenylyl cyclase, yet significantly attenuates the stimulation of inositol phosphate release. These observations suggest that the second and third intracellular loops must interact to determine G-protein specificity (105).

B. Defects in Receptor/G-Protein Coupling in Disease

The importance of GPCRs in modulating normal cellular physiology is supported by recent reports that mutations in GPCRs that alter normal receptor/G-protein coupling are responsible for the abnormal phenotype in two genetic diseases in humans and one genetic disease in mice. Congenital nephrogenic diabetes insipidus (CNDI) is a disease whose symptoms manifest in newborns; it is associated with an inability to concentrate urine, resulting in severe dehydration that leads to mental retardation, slowed growth, and, in some cases, death. Sixteen mutations in the coding region of the human vasopressin type-2 receptor have been described in individuals affected with CNDI (106-109). The functional consequences of each of these mutations have yet to be determined; however, characterization of one mutant receptor, in which the arginine at position 137 in the conserved Asp-Arg-Tyr motif is replaced by histidine, indicates that it binds arginine-vasopressin with normal affinity but fails to stimulate adenylyl cyclase (110). Retinitis pigmentosa is a group of inherited diseases that lead to blindness. The autosomal dominant form of retinitis pigmentosa (ADRP) can be caused by mutations in the gene encoding the visual pigment rhodopsin (111). Approximately 30 rhodopsin mutations have been reported in patients with ADRP; these are located throughout all domains of the protein. The molecular pathophysiology of ADRP remains to be determined, as the phenotypes of the mutant rhodopsins are heterogeneous. Recently, Min et al. (112) described three mutations on or near the cytoplasmic surface of rhodopsin that are associated with ADRP. All three mutant proteins are spectrally normal but are defective in activating transducin (112).
It is not obvious how a defect in the signal-transducing properties of rhodopsin could be responsible for the clinical manifestations of ADRP, although it has been speculated that altered protein processing may be involved. The receptor for growth-hormone-releasing hormone (GHRH; somatoliberin) is a member of the GPCR family that is expressed on pituitary somatotropes and mediates the action of GHRH in stimulating the synthesis and release of growth hormone. In the mouse, the GHRH receptor has been mapped to a region on chromosome 6 associated with the little mutation, which is characterized by reduced growth-hormone secretion and a dwarf phenotype (113). In the little mouse, a single-base mutation in the GHRH receptor has been identified that substitutes glycine for aspartate at position 60 in the



N-terminus of the receptor protein (113).The mutant GHRH receptor does not elicit an increase in cAMP following exposure to GHRH, in contrast to the wild-type receptor. The inability of the mutant GHRH receptor to activate the cAMP signaling pathway is most likely responsible for its inability to regulate growth-hormone synthesis and secretion in the pituitary. The little mouse exhibits many phenotypic characteristics in common with patients with growth-hormone deficiency type-I, suggesting that the GHRH receptor may be a reasonable candidate for mutation in patients with this disorder (113).

C. Constitutive Activation of G-Protein-Coupled Receptors: Implications for Control of Cellular Growth

The mechanism whereby information is transferred from the ligand-binding domains of GPCRs to the regions of the intracellular loops responsible for receptor/G-protein interactions is not well understood. One consequence of agonist binding to a GPCR is to trigger a conformational change in the receptor that allows for activation of G proteins. Thus, it has been postulated that in the absence of agonist, the structure of GPCRs imposes a tonic constraint that prevents direct receptor/G-protein contact (114). In support of this idea are the findings that short synthetic peptides derived from the sequences of the intracellular loops of GPCRs are capable of stimulating G proteins in vitro in the absence of any agonist (115, 116). Mutations in the C-terminal region of the third intracellular domain of the β2-adrenergic (117), the α1-adrenergic (118), and the thyroid-stimulating hormone receptor (89) have been described that are associated with constitutive receptor activity in the absence of agonists. Studies (117, 118) show that reciprocal exchange of a small region of amino acids in the C-terminal end of the third intracellular loop of the α1-adrenergic receptor, which is coupled to phospholipase C, with the corresponding segment of the β2-adrenergic receptor, which is coupled to adenylyl cyclase, results in constitutive receptor activity in both cases. The levels of basal signaling in the absence of agonist are comparable in magnitude to those seen in the presence of agonists in the respective wild-type receptors. In each instance, the active mutant receptors have a markedly higher affinity for agonists, and in the case of the β2-receptor, the increase in agonist affinity is related to the efficacy of the agonist (117).
Thus, such mutations produce a high-affinity, G-protein-independent conformation of the receptors. These findings are difficult to explain with the original ternary complex model of hormone-receptor/G-protein interactions. Lefkowitz et al. (114) proposed an extension of the model that introduces an isomerization



step that governs the transition of the receptor (R) to an active state (R*), the form of the receptor capable of binding to the G protein. The constitutively active receptors are presumed to adopt the R* conformation more readily in the absence of agonist. Because definitive knowledge of the three-dimensional structure of GPCRs is not available, it is not clear how these mutations produce the presumed change in receptor conformation that increases the rate of formation of the R* state. One important implication of the finding that mutations in various GPCRs can lead to constitutive activity concerns the potential of such naturally occurring mutations to serve as oncogenic signals in vivo. Several GPCRs linked to phospholipase C (e.g., muscarinic acetylcholine, serotonin, and α1-adrenergic receptors) have already been shown to promote agonist-dependent transformation of transfected cells (119-121). Somatic mutations in the thyrotropin receptor gene that result in constitutive receptor activation cause hyperfunctioning thyroid adenomas (122). Mutations in the luteinizing hormone receptor that result in constitutive activity are associated with familial male precocious puberty (123).
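The isomerization step proposed by Lefkowitz et al. (114) can be sketched with a simplified two-state model in which the receptor interconverts between R and R* and the agonist binds R* more tightly than R. This is an assumption-laden simplification, not the published extended ternary complex model: the G protein is omitted, and the isomerization constant `J`, the agonist preference factor `alpha`, and all numerical values are hypothetical. Modeling a constitutively active mutant as an increase in `J` reproduces the observations described above: higher basal activity and higher apparent agonist affinity, with a larger affinity gain for more efficacious agonists (larger `alpha`).

```python
"""Simplified two-state (R <-> R*) receptor model, illustrative sketch.

Not the full extended ternary complex model; G protein is omitted.
Hypothetical parameters:
  J      -- isomerization constant [R*]/[R] without agonist
  alpha  -- fold preference of the agonist for R* over R (efficacy proxy)
  a      -- agonist concentration divided by its Kd for the R state
"""


def active_fraction(a: float, J: float, alpha: float) -> float:
    """Fraction of receptors in R* (free or agonist-bound)."""
    r, r_star = 1.0, J              # unoccupied states, relative to R
    ar, ar_star = a, alpha * J * a  # agonist-occupied states
    return (r_star + ar_star) / (r + r_star + ar + ar_star)


def apparent_kd(J: float, alpha: float) -> float:
    """Observed agonist Kd, in units of the agonist's Kd for the R state."""
    return (1.0 + J) / (1.0 + alpha * J)


for label, J in (("wild type", 0.01), ("constitutive mutant", 1.0)):
    basal = active_fraction(0.0, J, alpha=100.0)
    kd = apparent_kd(J, alpha=100.0)
    print(f"{label:20s} basal R* = {basal:.3f}  apparent Kd = {kd:.3f}")
```

Because the apparent Kd is (1 + J)/(1 + alpha J), raising `J` lowers it more steeply when `alpha` is large, which is one way to read the reported link between the affinity increase and agonist efficacy.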

V. Identification of Functional Domains Involved in Receptor Desensitization and Down-regulation

Prolonged exposure to agonist results in attenuated receptor responsiveness, known as desensitization. This phenomenon has been well studied (124-126) and described as a biphasic process consisting of a short-term and a long-term component. Short-term desensitization (seconds to minutes) is characterized by a rapid reduction in receptor signaling and a rapid recovery that does not require protein synthesis; long-term desensitization (several hours) is characterized by a loss of total receptor number and a recovery that requires de novo protein synthesis. Several molecular mechanisms have been proposed for receptor desensitization. One mechanism involves a loss of receptors at the cell surface (Fig. 7). Upon exposure to agonist, receptors are rapidly sequestered (within minutes) into subcellular membrane vesicles. The process is reversible, and receptors are returned to the plasma membrane following removal of the agonist. However, prolonged exposure to agonist (hours) results in down-regulation, a decrease in total receptor number. Restoration of receptors at the cell surface then requires new protein synthesis.



[Figure 7 panel labels: Sequestration; Down-regulation]

FIG. 7. β-Adrenergic receptor desensitization. Following prolonged exposure to agonist, there is an attenuation of receptor responsiveness known as desensitization. Numerous phenomena have been identified as elements of desensitization, as illustrated. Phosphorylation of the β-adrenergic receptor by protein kinase A (PKA) or β-adrenergic receptor kinase (βARK) leads to an uncoupling of the receptor from Gs. Sequestration refers to the process of rapid translocation of the β-adrenergic receptor away from the plasma membrane and into vesicular membrane compartments that are inaccessible to agonists and devoid of Gs. Agonist-induced down-regulation results in decreases in receptor number and receptor degradation, possibly by a lysosomal pathway. (Reproduced from 213, with permission.)

A. β-Adrenergic Receptors

1. DESENSITIZATION

Following short-term desensitization, stimulation of adenylyl cyclase by β-adrenergic agonists is markedly reduced, yet stimulation of Gs by sodium fluoride or stimulation of the cyclase by forskolin remains unchanged, suggesting that the receptor may serve as the regulatory element in desensitization (127, 128). Furthermore, studies using fusion membranes demonstrate that β2-adrenergic receptors from desensitized cells display an attenuated ability to stimulate the cyclase (129, 130). Thus, efforts to delineate the mechanisms of β2-adrenergic receptor desensitization have focused on the role of the receptor itself. Numerous studies have demonstrated that β-receptors undergo phosphorylation as a result of prolonged exposure to agonists (131-133). The cAMP-dependent protein kinase A (PKA) is capable of agonist-induced phosphorylation of the β-adrenergic receptor (134, 135). The β2-adrenergic receptor contains two consensus sequences, Lys/Arg-Arg-X-X-Ser, at positions 259-262 on the C-terminus of the third cytoplasmic loop and at positions 343-348 on the N-terminus of the cytoplasmic tail, which may serve as sites for PKA phosphorylation (135, 136). PKA-mediated phosphorylation of the β2-adrenergic receptor alters receptor/G-protein coupling (137-139). Since the PKA sites of the β2-adrenergic receptor are located within the domains implicated in Gs coupling, it is possible that phosphorylation of these sites directly interferes with receptor/G-protein coupling (134, 136). A variant of S49 lymphoma cells (kin-) that lacks a functional PKA exhibits agonist-induced desensitization and receptor phosphorylation (140). This finding led to the discovery of a novel receptor-specific kinase, β-adrenergic receptor kinase (βARK), that catalyzes the phosphorylation of multiple serine and threonine residues located at the C-terminus of the cytoplasmic tail of the β2-adrenergic receptor (141, 142). βARK phosphorylates only the agonist-occupied form of the receptor, suggesting that this enzyme may be involved in the process of desensitization (141). Mutagenesis studies have been undertaken to define the role of receptor phosphorylation in the process of desensitization.
Pre-exposure of CHW cells expressing β2-adrenergic receptors to low (nanomolar) concentrations of isoproterenol causes a loss in sensitivity of the adenylyl cyclase response to agonist stimulation without affecting maximal responsiveness (143). A mutant β2-adrenergic receptor in which the serine residues of the consensus sites for PKA phosphorylation were replaced by alanines displays an attenuated loss of sensitivity following exposure to low concentrations of agonist (143). However, cells expressing a mutant receptor in which alanine or glycine residues were substituted for the serine and threonine sites of the cytoplasmic tail (the phosphorylation sites for βARK) exhibit a loss of sensitivity similar to that of the wild-type receptor following agonist treatment (143). Thus, receptor phosphorylation at the putative PKA sites is responsible for the altered receptor sensitivity (i.e., receptor uncoupling) induced by exposure to low levels of agonist.
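The distinction drawn here between a loss of sensitivity (a rightward shift in the agonist concentration-response curve) and a loss of maximal responsiveness can be made concrete with a Hill equation. In the sketch below, the EC50 and maximal-response values are hypothetical, chosen only to contrast the two patterns; they are not data from ref. 143.

```python
def response(conc: float, emax: float, ec50: float, hill: float = 1.0) -> float:
    """Hill equation: response at agonist concentration `conc`."""
    return emax * conc**hill / (conc**hill + ec50**hill)


# Hypothetical curves (concentrations in nM, response in % of control maximum).
CURVES = {
    "control": dict(emax=100.0, ec50=10.0),
    "sensitivity lost": dict(emax=100.0, ec50=50.0),          # rightward shift only
    "sensitivity and max lost": dict(emax=60.0, ec50=50.0),   # shift plus lower ceiling
}

for name, params in CURVES.items():
    low = response(10.0, **params)
    high = response(1e5, **params)  # near-saturating agonist (100 uM)
    print(f"{name:26s} at 10 nM: {low:5.1f}%   at 100 uM: {high:5.1f}%")
```

A pure loss of sensitivity reduces the response at low agonist concentrations while the curve still reaches the control maximum at saturating agonist; a loss of maximal responsiveness lowers that ceiling as well.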



In contrast, pretreatment of cells expressing the wild-type β2-adrenergic receptor with high (micromolar) concentrations of isoproterenol results in decreases in both receptor sensitivity and maximal agonist-mediated stimulation of adenylyl cyclase (143). A loss of maximal responsiveness is not observed in the mutant receptors lacking the phosphorylation sites for either PKA or βARK (143). These findings suggest that receptor phosphorylation at both the PKA and βARK sites is necessary to effect the decrease in efficacy following exposure of the wild-type receptor to high concentrations of agonists. Treatment of A431 epidermoid carcinoma cells with heparin, a potent inhibitor of βARK, significantly attenuates agonist-induced phosphorylation and desensitization of β2-adrenergic receptors (144). βARK-mediated desensitization occurs with a half-life of less than 15 seconds, whereas PKA-mediated desensitization proceeds with a half-life of 3.5 minutes (145). These data indicate that βARK mediates early-onset, agonist-induced (homologous) desensitization but is not involved in the later stages of receptor desensitization. Phosphorylation of β2-adrenergic receptors by βARK is markedly reduced in a reconstituted system containing purified βARK, implying that receptor phosphorylation by βARK requires additional components (138). This observation parallels the rhodopsin system, in which full inhibition of rhodopsin activation of transducin requires both phosphorylation of rhodopsin by rhodopsin kinase and the binding of another retinal protein, arrestin, to the phosphorylated rhodopsin (146). An arrestin-like protein, β-arrestin, has been isolated and is capable of inhibiting the activity of phosphorylated β-adrenergic receptors (147). It has been proposed that homologous desensitization is mediated by βARK phosphorylation of the agonist-occupied receptor, which promotes β-arrestin binding and, in turn, inhibits Gs activation (147).
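The half-lives quoted above (less than 15 seconds for βARK-mediated and 3.5 minutes for PKA-mediated desensitization) imply very different early time courses. The sketch below assumes simple first-order (single-exponential) kinetics for each process, a simplification not stated in the source, and uses 15 seconds as an upper-bound half-life for βARK.

```python
BARK_HALF_LIFE_S = 15.0       # text gives "< 15 s"; used here as an upper bound
PKA_HALF_LIFE_S = 3.5 * 60.0  # 3.5 min, expressed in seconds


def fraction_desensitized(t_s: float, half_life_s: float) -> float:
    """Fraction of the process complete at time t, assuming first-order kinetics."""
    return 1.0 - 0.5 ** (t_s / half_life_s)


for t in (15, 60, 300, 900):
    bark = fraction_desensitized(t, BARK_HALF_LIFE_S)
    pka = fraction_desensitized(t, PKA_HALF_LIFE_S)
    print(f"t = {t:4d} s   betaARK: {bark:.2f}   PKA: {pka:.2f}")
```

Under this assumption, the βARK component is essentially complete within the first minute, when the PKA component is less than one-fifth complete, consistent with βARK mediating the early-onset phase.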
Additional data supporting the role of phosphorylation in agonist-promoted desensitization of β-adrenergic receptors derive from studies with the β3-adrenergic receptor subtype. This receptor subtype is preferentially expressed in adipose tissue in humans and rodents and is involved in the metabolic control of adipocytes (148, 149). The β3-adrenergic receptor binds classical β-adrenergic antagonists and agonists with affinities 10- to 100-fold lower than those of the β1- or β2-adrenergic receptor subtypes. Most of the putative phosphorylation sites present on the intracellular domains of the β2-adrenergic receptor are absent from the primary sequence of the β3-receptor (149). A recent study reports that a 30-minute agonist exposure of L cells transfected with the human β3-adrenergic receptor has only a marginal effect on β3-receptor responsiveness (150). Substitution of the third cytoplasmic



loop and C-terminal tail of the β3-receptor with the corresponding regions of the β2-receptor partially restores agonist-mediated desensitization (150). These results indicate that the β3-adrenergic receptor is not subject to agonist-mediated desensitization and are consistent with a role for phosphorylation in β-receptor desensitization. However, these findings also suggest that molecular determinants outside the third cytoplasmic loop and carboxyl tail are required for maximal desensitization.

2. SEQUESTRATION

Upon exposure to agonists, β-adrenergic receptors are rapidly translocated away from the plasma membrane to vesicular membrane compartments that are inaccessible to agonists (151) and devoid of Gs activity (131, 152). This translocation process, known as sequestration, requires agonist occupancy of receptors and has been proposed as a possible mechanism for receptor desensitization (153). Sequestration of β-adrenergic receptors occurs at a much slower rate than receptor uncoupling and phosphorylation (152), and agonist-induced sequestration does not require receptor phosphorylation: mutant β2-adrenergic receptors lacking the phosphorylation sites for both PKA and βARK exhibit normal agonist-induced sequestration, even though agonist-stimulated receptor phosphorylation is significantly reduced (143, 154). It has been postulated that regions of the β2-adrenergic receptor associated with Gs activation may be required for receptor sequestration. Mutant β2-adrenergic receptors that are incapable of adenylyl cyclase activation (deletion of residues 239-272 or 222-229) do not undergo sequestration (154, 155). However, receptor/G-protein coupling may not be necessary for agonist-induced sequestration of β2-adrenergic receptors, because mutant β2-adrenergic receptors that exhibit abnormal Gs coupling display a normal pattern of agonist-mediated sequestration (125, 156, 157). Thus, the biochemical mechanism of agonist-stimulated receptor sequestration remains unknown. The significance of sequestration as a mechanism of receptor desensitization is also unclear. Receptor sequestration can be completely inhibited without affecting desensitization (151). Following exposure to high concentrations of agonist, only 30% of the total cell-surface β2-adrenergic receptors are sequestered (145). Given the large reserve of spare receptors, sequestration of 30% of the receptors would not significantly alter the receptor response.
Furthermore, receptor sequestration occurs much more slowly than receptor phosphorylation; hence, by the time receptors are sequestered, they would already be phosphorylated and functionally uncoupled from Gs. It has been proposed that sequestration may instead promote receptor dephosphorylation, leading to the regeneration of functional receptors that are then returned to the plasma



membrane (137). Supporting this hypothesis is the recent finding (158) that blockade of β2-receptor sequestration, either by pretreating cells with a hypertonic sucrose solution to inhibit receptor endocytosis or by creating a sequestration-defective mutant receptor, results in agonist-mediated receptor desensitization with little or no recovery from desensitization following removal of agonist.
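The spare-receptor argument made in the discussion of sequestration (that sequestering about 30% of surface β2-adrenergic receptors would not significantly alter the response) can be illustrated with the operational model of agonism of Black and Leff. That model is introduced here only for illustration and is not used in the source; the transducer ratio `tau` (which scales with receptor density) and the values below are hypothetical. Removing 30% of receptors is modeled as multiplying `tau` by 0.7.

```python
def operational_response(conc: float, ka: float, tau: float, em: float = 1.0) -> float:
    """Black-Leff operational model with a slope factor of 1."""
    return em * tau * conc / (ka + (1.0 + tau) * conc)


def max_response(tau: float, em: float = 1.0) -> float:
    """Plateau of the operational model as agonist concentration -> infinity."""
    return em * tau / (1.0 + tau)


for tau in (1.0, 30.0):  # little reserve vs. large receptor reserve (hypothetical)
    before = max_response(tau)
    after = max_response(0.7 * tau)  # 30% of receptors sequestered
    loss = 100.0 * (before - after) / before
    print(f"tau = {tau:4.1f}: maximal response falls by {loss:.1f}%")
```

In this sketch, a large reserve (tau = 30) limits the drop in maximal response to under 2%, whereas with little reserve (tau = 1) the same receptor loss lowers the maximum by roughly 18%.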

B. α2-Adrenergic Receptors

Compared with β-adrenergic receptors, very little is known about the mechanisms of desensitization of α2-adrenergic receptors. Three subtypes of α2-adrenergic receptors have been cloned and designated α2C2, α2C4, and α2C10, based on their location on human chromosomes 2, 4, and 10, respectively. The α2C10 receptor subtype undergoes short-term, agonist-promoted desensitization via receptor phosphorylation on serines and threonines in the third intracellular loop of the protein (159). Because the primary sequence of the third intracellular loop diverges significantly among the three α2-adrenergic receptor subtypes, these related receptors may differ in the mechanisms for agonist-mediated control of their responsiveness. Expression of each α2-adrenergic receptor subtype in Chinese hamster ovary cells, followed by short-term and long-term challenge of each cell line with saturating concentrations of epinephrine, reveals that after 30 minutes of agonist exposure, the α2C10 and α2C2 receptors display desensitization characterized by rightward shifts in the curves for agonist-mediated inhibition of adenylyl cyclase (160). In contrast, the α2C4 receptor displays no functional desensitization after the same agonist challenge. All three receptor subtypes undergo desensitization after long-term (24-hour) agonist exposure, primarily because of a decrease in Gi expression in the transfected cells (160). The primary sequence differences among α2-adrenergic receptor subtypes may thus reflect differences in how they are regulated by agonists.

C. Muscarinic Acetylcholine Receptors

Short-term (less than 1 hour) activation of muscarinic receptors by agonists has been shown to lead to their rapid internalization, or sequestration, away from the cell surface (161). This process is not accompanied by a reduction in total cellular receptor sites. Upon removal of agonist, internalized receptors return rapidly to the cell surface. However, continued exposure to muscarinic agonist leads to down-regulation (a decrease in the total number of receptor sites), presumably due to an increased rate of receptor degradation.



1. DESENSITIZATION

For muscarinic receptor subtypes coupled to the inhibition of adenylyl cyclase, numerous studies provide evidence for agonist-mediated phosphorylation of receptors (162-164), and this process appears to correlate with desensitization (163, 165). The protein kinases responsible for phosphorylation of m2 muscarinic receptors in endogenous cell systems have not been unequivocally identified. Data from several laboratories indicate that both second-messenger-activated protein kinases (166-168) and receptor-specific protein kinases such as βARK (164, 169-171) phosphorylate the m2 muscarinic receptor in vitro and thus may play a role in agonist-mediated regulation of receptor responsiveness. Studies using phorbol esters have also established that protein-kinase-C-mediated phosphorylation plays a role in regulating the function of muscarinic receptors coupled to phospholipase C. However, it has not been clearly established whether the effects of phosphorylation occur at the level of the receptors or downstream in the signal transduction pathway. Using immunoprecipitation with a specific antiserum against the human m3 muscarinic receptor, Tobin and Nahorski (172) have described a rapid phosphorylation of m3 muscarinic receptors in response to agonist or phorbol 12β-myristate 13α-acetate. Interestingly, Ro 31-8220, a specific protein kinase C inhibitor, had no effect on carbachol-induced increases in phosphate incorporation into the m3 receptor, indicating that protein kinase C is not involved in the agonist-mediated phosphorylation. The time course of m3 receptor phosphorylation closely parallels that of agonist-mediated desensitization in the same cell system (173), suggesting that the two processes may be linked, as has been shown with other GPCRs.

2. DOWN-REGULATION

A reduction in the number of muscarinic receptors at the cell surface is an additional mechanism for regulation of receptor activity.
As with other GPCRs, prolonged stimulation by agonists (at least several hours) can lead to the loss of a portion of internalized muscarinic receptors from the cell. Although the molecular events that initiate internalization have not yet been defined, recent analyses of human and rat m1 muscarinic receptors have identified small regions of the third intracellular loop of this receptor subtype whose alteration is sufficient to severely impair agonist-mediated down-regulation without affecting ligand binding or activation of phospholipase C (174, 175). The role of these regions in agonist-mediated down-regulation appears not to involve receptor phosphorylation (175). Rather, the data suggest that the secondary structure of a small region in the third intracellular loop of the m1 muscarinic receptor is pivotal for m1 receptor internalization, perhaps as a binding site for a cytosolic factor that promotes internalization. The domains in the third cytoplasmic loop of the rat m1 muscarinic receptor involved in homologous down-regulation are also required for heterologous regulation of the receptor via β-adrenergic receptor activation of adenylyl cyclase (175). Of interest is the additional finding that the domains involved in agonist-promoted down-regulation of rat m1 muscarinic receptors do not influence agonist-mediated receptor uncoupling, suggesting that a separate motif(s) may be responsible for this phenomenon (175).

VI. Genetic Elements Controlling G-Protein-Coupled Receptor Expression

Molecular biological approaches have provided insights into the genetic mechanisms controlling GPCR expression. Several aspects of receptor regulation have been localized to the transcriptional and post-transcriptional levels. Transcriptional control is exemplified by the effects of steroids on β-adrenergic receptor number and mRNA, while post-transcriptional mechanisms controlling receptor expression involve mRNA destabilization. Relevant reviews in this area include those by Malbon et al. (176) and Collins et al. (177).

Adaptation, or tachyphylaxis, a universal phenomenon in biology, is defined as a decline in sensitivity to a stimulus following chronic exposure. In DDT1MF-2 cells, the rapid rise in β2-adrenergic receptor mRNA levels induced by short-term (minutes) epinephrine exposure gives way to a down-regulation of steady-state β2-adrenergic receptor mRNA when agonist exposure continues over a period of hours (178, 179). Agonist-induced down-regulation of β2-adrenergic receptor mRNA appears to depend on the PKA pathway: decreases in β2-adrenergic receptor mRNA mediated by β-agonists can be mimicked by long-term treatment with dibutyryl cAMP (a membrane-permeable cAMP analog and activator of PKA) (178). The role of cAMP and PKA in decreasing β2-adrenergic receptor mRNA has also been demonstrated in CHW cells stably transfected with the β2-adrenergic receptor cDNA (180). Exposure of these cells to dibutyryl cAMP or forskolin (a diterpene that directly activates adenylyl cyclase) mimics the effects of isoproterenol by decreasing β2-adrenergic receptor mRNA levels. Importantly, both studies found that substantial decreases in β2-adrenergic receptor mRNA mediated by β-agonist treatment precede the loss of β2-adrenergic receptor binding sites. Work by Hadcock et al. (179) has provided an explanation for the agonist-induced changes in the steady-state levels of β2-adrenergic receptor mRNA. Incubation of DDT1MF-2 cells with isoproterenol for

144

CLAIRE M. FHASER ET AL.

12 hours results in a reduction in the half-life of P,-adrenergic receptor mRNA from 12 to 5 hours, representing a 2.4-fold change. The exact mechanism responsible for the destabilization of p-adrenergic receptor mRNA awaits elucidation. In general, changes in mRNA stability are believed to be dictated by specific sequence elements residing on the 3’ untranslated region of the mRNA molecule (181).These detenninants interact with RNAbinding proteins referred to as truns-acting factors to destabilize the mRNA molecule, presumably by facilitating accessibility of the transcript to attack by a ribosomal bound nuclease (Fig. 8). Recently, instability sequence elements have been identified in the 3‘ untranslated region of two G-protein-coupled receptor mRNAs, those of the thyrotropin-releasing hormone receptor and the m l muscarinic acetylcholine receptor (182,183). Deletion of these elements renders the resulting mutant inRNA molecules resistant to agonist-promoted destabilization (182, 183).For the P,-adrenergic receptor mRNA, AU-rich and AUUUA-rich instability elements exist on the 3’ untranslated region, which specifically bind a 35-kDa protein termed P,-adrenergic receptor mRNA-binding protein (184).Whether such an interaction is responsible for P,-adrenergic receptor inRNA destabilization remains to be seen. It is apparent from these studies that mRNA destabilization represents an important autoregulatory mechanism to control G-protein-coupled receptor expression. The actions of steroid hormones on the regulation of the P-adrenergic receptor/adenylyl cyclase pathway are well documented (176,185).For example, glucocorticoids induce a 2- to 3-fold increase in P-adrenergic receptor levels in cultured cells that is both dose- and time-dependent (186,187). 
Corresponding to the increased expression of β-adrenergic receptors is an enhanced responsiveness of adenylyl cyclase to β-agonists (185). Although steroid-induced up-regulation of β-adrenergic receptors has been described in detail at the pharmacological level, its molecular basis has been elucidated only within the past few years. Solution hybridization or Northern blot analyses have shown that, in DDT1MF-2 smooth muscle cells, glucocorticoids rapidly elevate β2-adrenergic receptor mRNA levels approximately twofold before β-adrenergic receptor number increases (179, 188). These increases in steady-state β2-adrenergic receptor mRNA were in turn shown to result from an elevated rate of transcription, not from decreased transcript turnover (179, 188). In DDT1MF-2 cells, β2-adrenergic receptor numbers remain elevated for at least 48 hours following glucocorticoid treatment, whereas β2-adrenergic receptor mRNA levels have been reported either to return to control levels by 24 hours (188) or to remain elevated for up to 72 hours (179). It is presumed that the increases in β2-adrenergic receptor mRNA levels contribute to the overall up-regulation of receptor number. The cloning of the β2-adrenergic receptor gene has made possible the identification of putative glucocorticoid-responsive elements within the DNA sequence (40, 189) (Fig. 9).

FIG. 8. Mechanism of β2-adrenergic receptor (β2AR) mRNA down-regulation in S49 mouse lymphoma cells. Down-regulation of β2-adrenergic receptor mRNA occurs through PKA-dependent and PKA-independent pathways. In wild-type S49 cells, where the β2-adrenergic receptor/adenylyl cyclase (AC)/protein kinase A (PKA) pathway is intact, stimulation of β2-adrenergic receptors with β-agonists leads to accumulation of cAMP and down-regulation of β2-adrenergic receptor mRNA. (1) β-Agonists fail to increase cAMP levels or to down-regulate β2-adrenergic receptor mRNA in S49 mutant cell lines containing a coupling defect between the β2-adrenergic receptor and Gs (unc and cyc- cells). (2) In S49 variants containing a coupling defect between Gs and AC (H21a cells), β-agonists failed to increase cAMP levels but down-regulated β2-adrenergic receptor mRNA. This finding supports the role of a PKA-independent pathway in β2-adrenergic receptor mRNA modulation. Destabilization of β2-adrenergic receptor mRNA appears to be the basis of β2-agonist-mediated down-regulation of mRNA levels, possibly via binding of a β2-adrenergic receptor mRNA-binding protein (βARB) to AUUUA motifs in the 3' untranslated region. (3) PKA dependence is evident in the S49 mutant cell line with defective PKA activity (kin-), in which β-agonists stimulate cAMP accumulation but fail to decrease β2-adrenergic receptor mRNA levels. (Reproduced from 213, with permission.)


FIG. 9. Transcriptional up-regulation of β2-adrenergic receptor (β2AR) mRNA. Short-term stimulation of β2-adrenergic receptors by β-agonists increases adenylyl cyclase activity and accumulation of cAMP, leading to activation of protein kinase A (PKA). cAMP-responsive element-binding protein (CREB) is a 43-kDa transcription factor that dimerizes upon PKA phosphorylation. The phosphorylated dimer binds to an 8-bp palindromic sequence, the cAMP-responsive element (CRE), in the 5'-flanking regions of cAMP-responsive genes such as the β2-adrenergic receptor gene, thereby enhancing gene transcription. β2-Adrenergic receptor gene transcription can also be up-regulated by steroid hormones. The unliganded steroid hormone receptor (SHR) exists as a complex containing the SHR, a dimer of hsp90 and other macromolecular binding factors (BF). In the steroid-liganded state, this complex dissociates and the SHR dimerizes. Transcriptional activation of the β2-adrenergic receptor gene occurs via binding of the SHR dimer to steroid hormone-responsive elements (SREs). (Reproduced from 213, with permission.)
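The half-life figures reported by Hadcock et al. (179) earlier in this section (12 hours falling to 5 hours after isoproterenol) can be restated as decay rate constants. The worked equations below are a sketch that assumes simple first-order (exponential) mRNA decay, which the text does not state explicitly; the last line additionally assumes, purely for illustration, that the transcription rate k_syn is unchanged.

$$
k_{\text{deg}} = \frac{\ln 2}{t_{1/2}}, \qquad
k_{\text{control}} = \frac{0.693}{12\ \text{h}} \approx 0.058\ \text{h}^{-1}, \qquad
k_{\text{iso}} = \frac{0.693}{5\ \text{h}} \approx 0.139\ \text{h}^{-1}
$$

$$
\frac{k_{\text{iso}}}{k_{\text{control}}} = \frac{12\ \text{h}}{5\ \text{h}} = 2.4
$$

$$
[\text{mRNA}]_{ss} = \frac{k_{\text{syn}}}{k_{\text{deg}}}
\quad\Longrightarrow\quad
\frac{[\text{mRNA}]_{ss,\text{iso}}}{[\text{mRNA}]_{ss,\text{control}}} = \frac{5}{12} \approx 0.42
\quad (\text{if } k_{\text{syn}} \text{ is unchanged})
$$

Under these assumptions, the 2.4-fold figure is simply the ratio of the two decay rate constants, and faster decay alone would lower steady-state mRNA to roughly 42% of control.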


VII. Identification of Novel G-Protein-Coupled Receptors by Partial cDNA Sequencing

The human genome is estimated to contain 50,000-100,000 genes, of which 500 to over 1000 have been estimated to encode GPCRs. In 1991, Adams et al. reported a novel method in which automated partial DNA sequencing of more than 600 randomly selected human brain cDNA clones was used to generate expressed sequence tags (ESTs) (190). ESTs have applications in the discovery of new human genes, mapping of the human genome, and identification of coding regions in genomic sequences. Partial sequencing of cDNA clones selected at random directly from a cDNA library has been shown to be a rapid, efficient method of identifying new genes and describing the transcriptional activity of a tissue or cell line (190-193).

The generation of ESTs from a large number of human cDNA libraries has recently proven to be an effective method for identifying novel GPCRs. To date, we have discovered more than 20 new human GPCRs using this approach. In many cases, the nucleotide similarity between a novel GPCR identified with ESTs and the published sequences of other GPCRs is on the order of 50% or less. Indeed, new ESTs can often be putatively identified as GPCRs only by searching the protein translation of the novel sequence against protein databases. The protein alignments also often show less than 50% similarity to known sequences. These observations suggest that many of these novel GPCRs would be difficult to detect with low-stringency DNA hybridization protocols. To characterize novel GPCRs further, it is essential to complete the sequence analysis of the protein-coding region and to express the DNA in an appropriate heterologous cell system. In most cases, this has first required additional DNA cloning, since many inserts in cDNA libraries are not full-length.
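The text quotes nucleotide and protein similarity values of about 50% or less between new and known GPCRs. As a minimal illustration of how such a pairwise figure can be computed, the sketch below calculates percent identity, the simplest such measure, for two sequences that have already been aligned. It is not necessarily how the authors scored similarity: the gap-handling convention, the 50% cutoff helper (`at_or_below_cutoff`, a hypothetical name) and the toy sequences are all assumptions for illustration.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Return percent identity between two pre-aligned sequences.

    Both strings must have the same length, with gaps written as `gap`.
    Columns in which either sequence carries a gap are skipped, which is
    only one of several conventions for choosing the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def at_or_below_cutoff(seq_a: str, seq_b: str, cutoff: float = 50.0) -> bool:
    """Hypothetical helper: flag pairs at or below `cutoff` percent identity."""
    return percent_identity(seq_a, seq_b) <= cutoff


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not real receptor sequences).
    print(percent_identity("ACDEFGHIKL", "ACDWYGHMNL"))  # 60.0
    print(percent_identity("AC-DEF", "ACQDEY"))          # 80.0
```

Skipping gapped columns keeps the figure from being diluted by insertions, but other conventions (dividing by alignment length, or by the shorter sequence) give different numbers for the same pair, so quoted percentages are only comparable when the convention is stated.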
The identity of the endogenous ligands for a significant number of novel GPCRs is not immediately obvious from the amino-acid sequence. Nonetheless, considerable information about the putative identity of each new GPCR can be inferred from amino-acid alignments (as illustrated in Fig. 2) and from mutagenesis experiments that have identified conserved amino acids required for receptor-ligand interactions in various GPCR subfamilies. For example, a protein sequence alignment of the coding region of a new GPCR isolated from human brain with other GPCRs suggests that it is related to the biogenic amine neurotransmitter receptors, because it displays the greatest sequence similarity to α-adrenergic and histamine receptors from a number of species. However, the coding region of this new GPCR lacks the conserved aspartate residue in helix III and the conserved serine residues in helix V that participate in receptor-ligand interactions in the subfamily of biogenic amine neurotransmitter receptors (see Section III). This finding suggests that the endogenous ligand for the new GPCR may not be an amine neurotransmitter. Support for this hypothesis also comes from the phylogenetic relationship of the new GPCR to other members of the GPCR family. Parsimony analysis places the unknown receptor between the subfamily of amine neurotransmitter receptors and that of glycoprotein hormone receptors, rather than clearly within either group (Fig. 10). This analysis, together with the low degree (<50%) of protein sequence similarity with members of either subfamily, suggests that the unknown receptor may represent a new class of GPCR.

FIG. 10. Phylogenetic analysis of G-protein-coupled receptors. Sequences were aligned using CLUSTAL (199), and refinements to the alignment were made manually. The unrooted consensus tree was created using the ProtPars parsimony analysis of Phylip version 3.5 (200) with the input order of the sequences randomized. The graphic was generated with Tree Tool (M. Maciukenas, University of Illinois, unpublished). Only aligned transmembrane regions were used in the parsimony calculations. The lengths of the lines are proportional to the percentage difference between any two given sequences. All programs were run using the Genetic Data Environment (S. Smith, Harvard University, unpublished). The receptors considered are as follows: hD1DR, human D1 dopamine receptor (44); hB2AR, human β2-adrenergic receptor (40); hA1aAR, human α1a-adrenergic receptor (201); bA1cAR, bovine α1c-adrenergic receptor (214); hm1mAChR, human m1 muscarinic receptor (49); hH2R, human histamine H2 receptor (215); hTSHR, human thyrotropin receptor (208); hFSHR, human follicle-stimulating hormone receptor (206); hLH/CGR, human lutropin/choriogonadotropin receptor (207); hSKR, human substance-K receptor (204); hSPR, human substance-P receptor (205); mDOR, mouse δ-opiate receptor (202); hFMLPR, human N-formyl-peptide receptor (203); hOPS, human rhodopsin (14); hOLFR, human olfactory receptor (209); rSCR, rat secretin receptor (211). (Reproduced from 198, with permission.)
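The parsimony analysis used here (PROTPARS in the PHYLIP package, per the Fig. 10 caption) searches over tree topologies and counts protein changes with its own genetic-code-aware cost model. The core idea, scoring a candidate tree by the minimum number of changes it requires, can be sketched with Fitch's small-parsimony algorithm, a simpler unweighted scheme in which any residue change counts as one step. This is a swapped-in illustration, not the PROTPARS method; the nested-tuple tree encoding and the four-taxon toy alignment are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_column(tree: Tree, residues: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small-parsimony pass for one alignment column.

    Returns the candidate state set at the root of `tree` and the minimum
    number of changes needed on that fixed, rooted binary tree.
    """
    if isinstance(tree, str):
        return {residues[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_column(left, residues)
    right_set, right_cost = fitch_column(right, residues)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: keep the union and charge one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum Fitch costs over all columns of an ungapped alignment."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    (n_cols,) = lengths
    total = 0
    for i in range(n_cols):
        column = {name: seq[i] for name, seq in alignment.items()}
        _, cost = fitch_column(tree, column)
        total += cost
    return total


if __name__ == "__main__":
    # Toy four-taxon alignment with two columns (hypothetical residues).
    aln = {"A": "DS", "B": "DS", "C": "NA", "D": "NA"}
    print(tree_length((("A", "B"), ("C", "D")), aln))  # 2
    print(tree_length((("A", "C"), ("B", "D")), aln))  # 4
```

The most parsimonious topology is the one with the smallest total length, here ((A,B),(C,D)). Because the unweighted Fitch length does not depend on where the root is placed, an unrooted topology like the one in Fig. 10 can be scored by rooting it on any branch.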

VIII. Conclusions

In the past decade, the cloning, sequencing, expression, and mutagenesis of GPCRs have provided a wealth of new information on receptor structure, function, and regulation. The discovery of new receptor subtypes not predicted from pharmacological and biochemical work has revealed complexities in cellular signaling that were not previously appreciated. The identification of the molecular determinants of ligand-receptor and receptor/G-protein interactions suggests that this information may soon be used to design more selective therapeutic agents. Elucidation of the genetic elements that control receptor gene expression will lead to a far better understanding of how each cell expresses the appropriate receptor molecules required for normal cellular signaling. At the whole-organism level, this detailed molecular information about receptor systems will ultimately shape our understanding of how cell-to-cell communication is established and of the physiological consequences of receptor malfunction.

REFERENCES

1. A. Levitski and H. R. Bourne, Annu. Rev. Cell Biol. 2, 391 (1986).
2. A. G. Gilman, ARB 56, 615 (1987).
3. L. Birnbaumer, Annu. Rev. Pharmacol. Toxicol. 30, 675 (1990).
4. L. Birnbaumer, Cell 71, 1069 (1992).
5. D. E. Clapham and E. J. Neer, Nature 365, 403 (1993).
6. C. A. Landis, S. B. Masters, A. Spada, A. M. Pace, H. R. Bourne and L. Villar, Nature 340, 692 (1989).
7. T. M. Savarese and C. M. Fraser, BJ 283, 1 (1992).
8. H. F. Cantiello, A. G. Prat, J. V. Bonventre, C. C. Cunningham, J. H. Hartwig and D. A. Ausiello, JBC 268, 4569 (1993).
9. K. E. Carlson, M. J. Woolkalis, M. F. Newhouse and D. R. Manning, Mol. Pharmacol. 30, 463 (1986).
10. E. Sarndahl, M. Lindroth, T. Bengtsson, M. Fallman, J. Gustavsson, O. Stendahl and T. Andersson, J. Cell Biol. 109, 2791 (1989).


11. A. J. Jesaitis, G. M. Bokoch, J. O. Tolley and R. A. Allen, J. Cell Biol. 107, 921 (1988).
12. M. Rodbell, Curr. Top. Cell. Regul. 32, 1 (1992).
13. Y. A. Ovchinnikov, N. G. Abdulaev, M. Y. Feigina, I. D. Artamonov and A. S. Bogachuk, Bioorg. Khim. 9, 1331 (1983).
14. J. Nathans and D. S. Hogness, PNAS 81, 4851 (1984).
15. J. C. Venter, B. Eddy, L. M. Hall and C. M. Fraser, PNAS 81, 272 (1984).
16. S. M. Shreeve, C. M. Fraser and J. C. Venter, PNAS 82, 4842 (1985).
17. R. A. F. Dixon, B. K. Kobilka, D. J. Strader, J. L. Benovic, H. G. Dohlman, T. Frielle, M. A. Bolanowski, C. D. Bennett, E. Rands, R. E. Diehl, R. A. Mumford, E. E. Slater, I. S. Sigal, M. G. Caron, R. J. Lefkowitz and C. D. Strader, Nature 321, 75 (1986).
18. T. Kubo, A. Maeda, K. Sugimoto, I. Akiba, A. Mikami, H. Takahashi, T. Haga, K. Haga, A. Ichiyama, K. Kangawa, H. Matsuo, T. Hirose and S. Numa, FEBS Lett. 209, 367 (1986).
19. R. Henderson, J. M. Baldwin, K. H. Downing, J. Lepault and F. Zemlin, Ultramicroscopy 19, 147 (1986).
20. M. L. Applebury and P. A. Hargrave, Vision Res. 26, 1881 (1986).
21. H.-Y. Wang, L. Lipfert, C. C. Malbon and S. Bahouth, JBC 264, 14424 (1989).
22. T. K. Atwood, E. E. Eliopoulos and J. B. C. Findlay, Gene 98, 153 (1991).
23. J. B. Baldwin, EMBO J. 12, 1693 (1993).
24. H. Y. Lin, T. L. Harris, M. S. Flannery, A. Aruffo, E. H. Kaji, A. Corn, L. F. Kowalski, Jr., H. F. Lodish and S. R. Goldring, Science 254, 1022 (1991).
25. H. Juppner, A.-B. Abou-Samra, M. Freeman, X. F. Kong, E. Schipani, J. Richards, L. F. Kowalski, Jr., J. Hocks, J. T. Potts, Jr., H. M. Kronenberg and G. V. Segre, Science 254, 1024 (1991).
26. T. Ishihara, S. Nakamura, Y. Kaziro, T. Takahashi, K. Takahashi and S. Nagata, EMBO J. 10, 1635 (1991).
27. D. R. Sibley and F. J. Monsma, Jr., Trends Pharmacol. Sci. 13, 61 (1992).
28. P. P. A. Humphrey, P. Hartig and D. Hoyer, Trends Pharmacol. Sci. 14, 233 (1993).
29. M. S. Beer, D. N. Middlemiss and G. McAllister, Trends Pharmacol. Sci. 14, 228 (1993).
30. M. F. Hibert, S. Trumpp-Kallmeyer, A. Bruinvels and J. Hoflack, Mol. Pharmacol. 40, 8 (1991).
31. J. Findley and E. Eliopoulos, Trends Pharmacol. Sci. 11, 492 (1990).
32. S. Trumpp-Kallmeyer, J. Hoflack, A. Bruinvels and M. Hibert, J. Med. Chem. 35, 3448 (1992).
33. R. A. F. Dixon, I. S. Sigal, E. Rands, R. B. Register, M. R. Candelore, A. D. Blake and C. D. Strader, Nature 326, 73 (1987).
34. B. K. Kobilka, C. MacGregor, K. Daniel, T. S. Kobilka, M. G. Caron and R. J. Lefkowitz, JBC 262, 15796 (1987).
35. R. A. F. Dixon, I. S. Sigal, M. R. Candelore, R. B. Register, W. Scattergood, E. Rands and C. D. Strader, EMBO J. 6, 3269 (1987).
36. M. G. Mukherjee, M. G. Caron, D. Mullikin and R. J. Lefkowitz, Mol. Pharmacol. 12, 16 (1976).
37. C. D. Strader, I. S. Sigal, M. R. Candelore, E. Rands, W. S. Hill and R. A. F. Dixon, JBC 263, 10267 (1988).
38. B. K. Kobilka, T. S. Kobilka, K. Daniel, J. W. Regan, M. G. Caron and R. J. Lefkowitz, Science 240, 1310 (1988).
39. T. Frielle, S. Collins, K. W. Daniel, M. G. Caron, R. J. Lefkowitz and B. K. Kobilka, PNAS 84, 7920 (1987).
40. F.-Z. Chung, K.-U. Lentes, J. Gocayne, M. Fitzgerald, D. Robinson, A. R. Kerlavage, C. M. Fraser and J. C. Venter, FEBS Lett. 211, 200 (1987).


41. P. Muzzin, J.-P. Revelli, F. Kuhne, J. D. Gocayne, W. R. McCombie, J. C. Venter, J.-P. Giacobino and C. M. Fraser, JBC 266, 24053 (1991).
42. C. M. Fraser, S. Arakawa, W. R. McCombie and J. C. Venter, JBC 264, 11754 (1989).
43. J. W. Lomasney, W. Lorenz, L. F. Allen, K. King, J. W. Regan, T.-L. Yang-Feng, M. G. Caron and R. J. Lefkowitz, PNAS 87, 5094 (1990).
44. Q.-Y. Zhou, D. K. Grandy, L. Thambi, J. A. Kushner, H. H. M. Van Tol, R. Cone, D. Pribnow, J. Salon, J. R. Bunzow and O. Civelli, Nature 347, 76 (1990).
45. D. K. Grandy, M. A. Marchionni, H. Makam, R. E. Stofko, M. Alfano, L. Frothingham, J. B. Fischer, K. J. Burke-Hawie, J. R. Bunzow, A. C. Server and O. Civelli, PNAS 86, 9762 (1989).
46. T. Kubo, K. Fukuda, A. Mikami, A. Maeda, H. Takahashi, M. Mishina, T. Haga, K. Haga, A. Ichiyama, K. Kangawa, H. Matsuo, T. Hirose and S. Numa, Nature 323, 411 (1986).
47. T. I. Bonner, N. J. Buckley, A. C. Young and M. R. Brann, Science 237, 527 (1987).
48. C. M. Fraser, C.-D. Wang, D. A. Robinson, J. D. Gocayne and J. C. Venter, Mol. Pharmacol. 36, 840 (1989).
49. E. G. Peralta, A. Ashkenazi, J. W. Winslow, D. H. Smith, J. Ramachandran and D. J. Capon, EMBO J. 6, 3923 (1987).
50. C. D. Strader, I. S. Sigal, R. B. Register, M. R. Candelore, E. Rands and R. A. F. Dixon, PNAS 84, 4384 (1987).
51. C.-D. Wang, M. A. Buck and C. M. Fraser, Mol. Pharmacol. 40, 168 (1991).
52. I. Gantz, J. DelValle, L. D. Wang, T. Tashiro, G. Munzert, Y.-J. Guo, Y. Konda and T. Yamada, JBC 267, 20840 (1992).
53. C. D. Strader, T. Gaffney, E. E. Sugg, M. R. Candelore, R. Keys, A. A. Patchett and R. A. F. Dixon, JBC 266, 5 (1991).
54. C. D. Strader, M. R. Candelore, W. S. Hill, I. S. Sigal and R. A. F. Dixon, JBC 264, 13572 (1989).
55. F.-Z. Chung, C.-D. Wang, P. C. Potter, J. C. Venter and C. M. Fraser, JBC 263, 4052 (1988).
56. K. A. Neve, B. A. Cox, R. A. Henningsen, A. Spanoyannis and R. L. Neve, Mol. Pharmacol. 39, 733 (1991).
57. I. Ji and T. H. Ji, JBC 266, 14953 (1991).
58. C. A. M. Curtis, M. Wheatley, S. Bansal, N. J. M. Birdsall, P. Eveleigh, K. Pedder, D. Poyner and E. C. Hulme, JBC 264, 489 (1989).
59. E. Kurtenbach, C. A. M. Curtis, E. K. Pedder, A. Aitken, A. C. M. Harris and E. C. Hulme, JBC 265, 13702 (1990).
60. M. F. Hibert, S. Trumpp-Kallmeyer, J. Hoflack and A. Bruinvels, Trends Pharmacol. Sci. 14, 7 (1993).
61. J. Wess, S. Nanvati, Z. Vogel and R. Maggio, EMBO J. 12, 331 (1993).
62. J. Wess, D. Gdula and M. R. Brann, EMBO J. 10, 3729 (1991).
63. J. Wess, Trends Pharmacol. Sci. 14, 308 (1993).
64. P. K. Chanda, M. C. W. Minchin, A. R. Davis, L. Greenberg, Y. Reilly, W. H. McGregor, R. Bhat, M. D. Lubeck, S. Mizutani and P. P. Hung, Mol. Pharmacol. 43, 516 (1993).
65. C.-D. Wang, T. K. Gallagher and J. C. Shih, Mol. Pharmacol. 43, 931 (1993).
66. D. Oksenberg, S. A. Marsters, B. F. O'Dowd, H. Jin, S. Havlik, S. J. Peroutka and A. Ashkenazi, Nature 360, 161 (1992).
67. M. A. Metcalf, R. W. McGuffin and M. W. Hamblin, Biochem. Pharmacol. 44, 1917 (1992).
68. E. M. Parker, D. A. Grisel, L. C. Iben and R. A. Shapiro, J. Neurochem. 60, 380 (1993).
69. S. H. Buck, R. M. Pruss, J. L. Krstenansky, P. J. Robinson and K. A. Stauderman, Trends Pharmacol. Sci. 9, 3 (1988).


70. R. Schwyzer, EMBO J. 6, 2255 (1987).
71. Y. Yokota, C. Akazawa, H. Ohkubo and S. Nakanishi, EMBO J. 11, 3585 (1992).
72. T. M. Fong, R.-R. C. Huang and C. D. Strader, JBC 267, 25664 (1992).
73. U. Gether, T. E. Johansen, R. M. Snider, J. A. Lowe III, S. Nakanishi and T. W. Schwartz, Nature 363, 345 (1993).
74. K. J. Watling and J. E. Krause, Trends Pharmacol. Sci. 14, 81 (1993).
75. T. M. Fong, H. Yu and C. D. Strader, JBC 267, 25668 (1992).
76. B. S. Sachais, R. M. Snider, J. A. Lowe and J. E. Krause, JBC 268, 2319 (1993).
77. T. M. Fong, M. A. Cascieri, H. Yu, A. Band, C. Swain and C. D. Strader, Nature 362, 350 (1993).
78. J. H. Perlman, D. R. Nussenzveig, R. Osman and M. C. Gershengorn, JBC 267, 24413 (1992).
79. H. D. Perez, R. Holmes, L. R. Vialnder, R. R. Adams, W. Manma, D. Jolley and W. H. Andrews, JBC 268, 2292 (1993).
80. G. J. LaRosa, K. M. Thomas, M. E. Kaufman, R. Mark, M. White, L. Taylor, G. Gray, D. Witt and J. Navarro, JBC 267, 25402 (1992).
81. C. A. Hebert, A. Chuntharapai, M. Smith, T. Colby, J. Kim and R. Horuk, JBC 268, 18549 (1993).
82. W. E. Holmes, J. Lee, W.-J. Kuang, G. C. Rice and W. I. Wood, Science 253, 1278 (1991).
83. R. Iyengar and L. Birnbaumer, "G Proteins." Academic Press, New York, 1990.
84. A. Ashkenazi, J. W. Winslow, E. G. Peralta, G. L. Peterson, M. I. Schimerlik, D. J. Capon and J. Ramachandran, Science 238, 672 (1987).
85. A. Suprenant, D. A. Horstman, H. Akbarali and L. E. Limbird, Science 257, 977 (1992).
86. M. A. Buck and C. M. Fraser, BBRC 173, 666 (1990).
87. E. B. Thompson, Mol. Endocrinol. 6, W1 (1992).
88. S. F. Law, K. Yasuda, G. I. Bell and T. Reisine, JBC 268, 12721 (1993).
89. S. Kosugi, F. Okajima, T. Ban, A. Hi&., A. Shenker and L. D. Kohn, JBC 267, 24153 (1992).
90. R. C. Rubenstein, S. K.-F. Wong and E. M. Ross, JBC 262, 16655 (1987).
91. C. D. Strader, R. A. F. Dixon, A. H. Cheung, M. R. Candelore, A. D. Blake and I. S. Sigal, JBC 262, 16439 (1987).
92. W. P. Hausdorff, M. Hnatowich, B. F. O'Dowd, M. G. Caron and R. J. Lefkowitz, JBC 265, 1388 (1990).
93. A. DeLean, J. M. Stadel and R. J. Lefkowitz, JBC 255, 7108 (1980).
94. C. M. Fraser, F.-Z. Chung, C.-D. Wang and J. C. Venter, PNAS 85, 5478 (1988).
95. B. F. O'Dowd, M. Hnatowich, J. W. Regan, W. M. Leader, M. G. Caron and R. J. Lefkowitz, JBC 263, 15985 (1988).
96. B. F. O'Dowd, M. Hnatowich, M. G. Caron, R. J. Lefkowitz and M. Bouvier, JBC 264, 7564 (1989).
97. Y. A. Ovchinnikov, N. G. Abdulaev and A. S. Bogachuk, FEBS Lett. 230, 1 (1988).
98. T. M. Savarese, C.-D. Wang and C. M. Fraser, JBC 267, 11439 (1992).
99. C. J. van Koppen and N. M. Nathanson, J. Neurochem. 57, 1873 (1991).
100. M. E. Kennedy and L. E. Limbird, JBC 268, 8003 (1993).
101. S. B. Liggett, M. G. Caron, R. J. Lefkowitz and M. Hnatowich, JBC 266, 4816 (1991).
102. T. Kubo, H. Bujo, I. Akiba, J. Nakai, M. Mishina and S. Numa, FEBS Lett. 241, 119 (1988).
103. J. Wess, T. I. Bonner and M. R. Brann, Mol. Pharmacol. 38, 872 (1990).
104. R. A. Shapiro and N. M. Nathanson, Bchem 28, 8946 (1989).
105. S. K.-F. Wong, E. M. Parker and E. M. Ross, JBC 266, 6219 (1990).


106. D. G. Bichet, M.-F. Arthus, M. Lonergan, G. N. Hendy, A. J. Paradis, T. M. Fujiwara, K. Morgan, M. C. Gregory, W. Rosenthal, A. Antaramian and M. Birnbaumer, J. Clin. Invest., in press.
107. Y. Pan, A. Metzenberg, S. Das and J. Gitschier, Nature Genet. 2, 103 (1992).
108. W. Rosenthal, A. Antaramian, M.-F. Arthus, M. Lonergan, G. N. Hendy, M. Birnbaumer and D. G. Bichet, Nature 358, 233 (1992).
109. A. M. W. van den Ouweland, J. C. F. M. Dreesen, M. Verdijk, N. V. A. M. Knoers, L. A. H. Monnens, M. Rocchi and B. A. van Oost, Nature Genet. 2, 99 (1992).
110. W. Rosenthal, A. Antaramian, S. Gilbert and M. Birnbaumer, JBC 268, 13030 (1993).
111. T. P. Dryja, T. L. McGee, E. Reichel, L. B. Hahn, G. S. Cowley, D. W. Yandell, M. A. Sandberg and E. L. Berson, Nature 343, 364 (1990).
112. K. C. Min, T. A. Zvyaga, A. M. Cypess and T. P. Sakmar, JBC 268, 9400 (1993).
113. P. Godfrey, J. O. Ruhd, W. G. Beamer, N. G. Copeland, N. A. Jenkins and K. E. Mayo, Nature Genet. 4, 227 (1993).
114. R. J. Lefkowitz, S. Cottechia, P. Samama and T. Costa, Trends Pharmacol. Sci. 14, 303 (1993).
115. T. Okamoto, Y. Murayama, Y. Hayashi, M. Inagaki, E. Ogata and I. Nishimoto, Cell 67, 723 (1991).
116. A. H. Cheung, R. R. C. Huang, M. P. Graziano and C. D. Strader, FEBS Lett. 278, 277 (1991).
117. P. Samama, S. Cottechia, T. Costa and R. J. Lefkowitz, JBC 268, 462 (1993).
118. S. Cottechia, S. Exum, M. G. Caron and R. J. Lefkowitz, PNAS 87, 2896 (1990).
119. D. Julius, T. J. Livelli, T. M. Jessell and R. Axel, Science 244, 1057 (1989).
120. J. S. Gutkind, E. A. Novotny, M. R. Brann and K. C. Robbins, PNAS 88, 4703 (1991).
121. L. F. Allen, R. J. Lefkowitz, M. G. Caron and S. Cottechia, PNAS 88, 11354 (1991).
122. J. Parma, L. Duprez, J. Van Sande, P. Cochaux, C. Gervy, J. Mockel, J. Dumont and G. Vassart, Nature 365, 649 (1993).
123. A. Shenker, L. Laue, S. Kosugi, J. J. Merendino, Jr., T. Minegishi and G. B. Cutler, Jr., Nature 365, 652 (1993).
124. R. B. Clark, Adv. Cyclic Nucleotide Protein Phosphorylation Res. 20, 151 (1986).
125. W. P. Hausdorff, M. G. Caron and R. J. Lefkowitz, FASEB J. 4, 2881 (1990).
126. H. G. Dohlman, J. Thorner, M. G. Caron and R. J. Lefkowitz, ARB 60, 653 (1991).
127. D. A. Green and R. B. Clark, JBC 256, 2105 (1981).
128. D. A. Green, J. Friedman and R. B. Clark, J. Cyclic Nucleotide Protein Phosphorylation Res. 7, 161 (1981).
129. S. Kassis and P. H. Fishman, PNAS 81, 6686 (1984).
130. S. Kassis, M. Olasmaa, M. Sullivan and P. H. Fishman, JBC 261, 12233 (1986).
131. J. M. Stadel, B. Strulovici, P. Nambi, T. N. Lavin, M. M. Briggs, M. G. Caron and R. J. Lefkowitz, JBC 258, 3032 (1983).
132. D. R. Sibley, J. R. Peters, P. Nambi, M. G. Caron and R. J. Lefkowitz, JBC 258, 9724 (1984).
133. D. R. Sibley, R. H. Strasser, M. G. Caron and R. J. Lefkowitz, JBC 260, 3883 (1985).
134. J. L. Benovic, L. J. Pike, R. A. Cerione, C. Staniszewski, T. Yoshimasa, J. Codina, L. Birnbaumer, M. G. Caron and R. J. Lefkowitz, JBC 260, 7094 (1985).
135. R. B. Clark, J. Friedman, R. A. F. Dixon and C. D. Strader, Mol. Pharmacol. 36, 343 (1989).
136. M. Bouvier, W. Hausdorff, A. DeBlasi, B. F. O'Dowd, B. K. Kobilka, M. G. Caron and R. J. Lefkowitz, Nature 333, 370 (1988).
137. D. R. Sibley, R. H. Strasser, J. L. Benovic, K. Daniel and R. J. Lefkowitz, PNAS 83, 9408 (1986).


138. J. L. Benovic, H. Kuhn, I. Weyland, J. Codina, M. G. Caron and R. J. Lefkowitz, PNAS 84, 8879 (1987).
139. J. L. Benovic, F. Mayor, C. Staniszewski, R. J. Lefkowitz and M. G. Caron, JBC 262, 17251 (1987).
140. R. H. Strasser, D. R. Sibley and R. J. Lefkowitz, Bchem 25, 1371 (1986).
141. J. L. Benovic, R. H. Strasser, M. G. Caron and R. J. Lefkowitz, PNAS 83, 2797 (1986).
142. H. G. Dohlman, M. Bouvier, J. L. Benovic, M. G. Caron and R. J. Lefkowitz, JBC 262, 14282 (1987).
143. W. P. Hausdorff, M. Bouvier, B. F. O'Dowd, G. P. Irons, M. G. Caron and R. J. Lefkowitz, JBC 264, 12657 (1989).
144. M. J. Lohse, R. J. Lefkowitz, M. G. Caron and J. L. Benovic, PNAS 86, 3011 (1989).
145. N. S. Roth, P. T. Campbell, M. G. Caron, R. J. Lefkowitz and M. J. Lohse, PNAS 88, 6201 (1991).
146. U. Wilden, S. W. Hall and H. Kuhn, PNAS 83, 1174 (1986).
147. M. J. Lohse, J. L. Benovic, J. Codina, M. G. Caron and R. J. Lefkowitz, Science 248, 1547 (1990).
148. P. Muzzin, J.-P. Revelli, F. Kuhne, J. D. Gocayne, W. R. McCombie, J. C. Venter and C. M. Fraser, JBC 266, 24053 (1991).
149. L. J. Emorine, S. Marullo, M.-M. Briend-Sutren, G. Patey, K. Tate, C. Delavier-Klutchko and A. D. Strosberg, Science 245, 1118 (1989).
150. F. Nantel, H. Bonin, L. J. Emorine, V. Zilberfarb, A. D. Strosberg, M. Bouvier and S. Marullo, Mol. Pharmacol. 43, 548 (1993).
151. C. Hertel, M. Staehelin and J. P. Perkins, J. Cyclic Nucleotide Protein Phosphorylation Res. 9, 119 (1983).
152. G. L. Waldo, J. K. Northup, J. P. Perkins and T. K. Harden, JBC 258, 13900 (1983).
153. M. J. Lohse, J. L. Benovic, M. G. Caron and R. J. Lefkowitz, JBC 265, 3202 (1990).
154. C. D. Strader, I. S. Sigal, A. D. Blake, A. H. Cheung, R. B. Register, E. Rands, B. Zemcik, M. R. Candelore and R. A. F. Dixon, Cell 49, 855 (1987).
155. A. H. Cheung, I. S. Sigal, R. A. F. Dixon and C. D. Strader, Mol. Pharmacol. 34, 132 (1989).
156. A. H. Cheung, R. A. F. Dixon, W. S. Hill, I. S. Sigal and C. D. Strader, Mol. Pharmacol. 37, 775 (1990).
157. P. T. Campbell, M. Hnatowich, B. F. O'Dowd, M. G. Caron, R. J. Lefkowitz and W. P. Hausdorff, Mol. Pharmacol. 39, 192 (1991).
158. S. S. Yu, R. J. Lefkowitz and W. P. Hausdorff, JBC 268, 337 (1993).
159. S. B. Liggett, J. Ostrowski, L. C. Chestnut, H. Kurose, J. R. Raymond, M. G. Caron and R. J. Lefkowitz, JBC 267, 4740 (1992).
160. M. G. Eason and S. B. Liggett, JBC 267, 25473 (1992).
161. N. M. Nathanson, in "The Muscarinic Receptors" (J. H. Brown, ed.), pp. 419-443, Humana Press, New Jersey, 1989.
162. M. M. Kwatra and M. M. Hosey, JBC 261, 12429 (1986).
163. M. M. Kwatra, E. Leung, A. C. Mann, K. K. McMahon, J. Ptasienski, R. D. Green and M. M. Hosey, JBC 262, 16314 (1987).
164. M. M. Kwatra, J. Ptasienski and M. M. Hosey, Mol. Pharmacol. 35, 553 (1989).
165. R. M. Richardson and M. M. Hosey, JBC 267, 22249 (1992).
166. R. M. Richardson and M. M. Hosey, Bchem 29, 8555 (1990).
167. K. Haga, T. Haga and A. Ichimaya, J. Neurochem. 54, 1639 (1990).
168. R. M. Richardson, J. Ptasienski and M. M. Hosey, JBC 267, 10127 (1992).
169. K. Haga and T. Haga, Biomed. Res. 10, 293 (1989).
170. K. Haga and T. Haga, FEBS Lett. 268, 43 (1991).

171. K. Haga and T. Haga, JBC 267, 2222 (1992).
172. A. B. Tobin and S. R. Nahorski, JBC 268, 9817 (1993).
173. A. B. Tobin, D. G. Lambert and S. R. Nahorski, Mol. Pharmacol. 42, 1042 (1992).
174. J. Lameh, M. Philip, Y. K. Sharma, O. Moro, J. Ramachandran and W. Sadee, JBC 267, 13406 (1992).
175. N. H. Lee and C. M. Fraser, JBC 268, 7949 (1993).
176. C. C. Malbon, P. J. Rapiejko and D. C. Watkins, Trends Pharmacol. Sci. 9, 33 (1988).
177. S. Collins, M. G. Caron and R. J. Lefkowitz, Annu. Rev. Physiol. 53, 497 (1991).
178. S. Collins, M. Bouvier, M. A. Bolanowski, M. G. Caron and R. J. Lefkowitz, PNAS 86, 4853 (1989).
179. J. R. Hadcock, M. Ros and C. C. Malbon, JBC 264, 13956 (1989).
180. M. Bouvier, S. Collins, B. F. O'Dowd, P. T. Campbell, A. DeBlasi, B. K. Kobilka, C. MacGregor, G. P. Irons, M. G. Caron and R. J. Lefkowitz, JBC 264, 16786 (1989).
181. R. J. Jackson, Cell 74, 9 (1993).
182. C. S. Narayanan, J. Fujimoto, E. Geras-Raaka and M. C. Gershengorn, JBC 267, 17296 (1992).
183. N. H. Lee and C. M. Fraser, JBC 269, 4291 (1994).
184. J. D. Port, L.-Y. Huang and C. C. Malbon, JBC 267, 24103 (1992).
185. A. O. Davies and R. J. Lefkowitz, Annu. Rev. Physiol. 46, 119 (1984).
186. C. M. Fraser and J. C. Venter, BBRC 94, 390 (1980).
187. E. Lai, O. M. Rosen and C. S. Rubin, JBC 256, 12866 (1981).
188. S. Collins, V. Quarmby, F. French, R. J. Lefkowitz and M. G. Caron, FEBS Lett. 233, 273 (1988).
189. J. L. Emorine, S. Marullo, C. Delavier-Klutchko, S. V. Kaveri, O. Durieu-Trautmann and A. D. Strosberg, PNAS 84, 6995 (1987).

190. M. D. Adams, J. M. Kelley, J. D. Gocayne, M. Dubnick, M. H. Polymeropoulous, H. Xiao, C. R. Merril, A. Wu, B. Olde, R. F. Moreno, A. R. Kerlavage, W. R. McCombie and J. C. Venter, Science 252, 1651 (1991).
191. M. D. Adams, M. Dubnick, A. R. Kerlavage, R. Moreno, J. M. Kelley, T. R. Utterback, J. W. Nagle, C. Fields and J. C. Venter, Nature 355, 632 (1992).
192. M. D. Adams, A. R. Kerlavage, C. Fields and J. C. Venter, Nature Genet. 4, 256 (1993).
193. M. D. Adams, M. B. Soares, A. R. Kerlavage, X.-N. Chen, J. R. Korenberg, C. Fields and J. C. Venter, Nature Genet. 4, 373 (1993).
194. S. Cottechia, D. A. Schwinn, R. R. Randall, R. J. Lefkowitz, M. G. Caron and B. K. Kobilka, PNAS 85, 7159 (1988).
195. B. K. Kobilka, T. Frielle, S. Collins, T.-L. Yang-Feng, T. S. Kobilka, U. Francke, R. J. Lefkowitz and M. G. Caron, Nature 329, 75 (1987).
196. R. E. Straub, G. C. Frech, R. H. Joho and M. C. Gershengorn, PNAS 87, 9514 (1990).
197. Y. Tanabe, M. Masu, T. Ishii, R. Shigemoto and S. Nakanishi, Neuron 8, 169 (1992).
198. N. H. Lee and A. R. Kerlavage, Drug News Perspect. 6, 488 (1993).
199. D. G. Higgins and P. M. Sharp, Gene 73, 237 (1988).
200. J. Felsenstein, Cladistics 5, 164 (1989).
201. J. F. Bruno, J. Whittaker, J. Song and M. Berelowitz, BBRC 179, 1485 (1991).
202. B. L. Kieffer, K. Befort, C. Gaveriaux-Ruff and C. G. Hirth, PNAS 89, 12048 (1992).
203. F. Boulay, M. Tardif, L. Brouchon and P. Vignais, Bchem 29, 11123 (1990).
204. N. P. Gerard, R. L. Eddy, T. B. Shows and C. Gerard, JBC 265, 20455 (1991).
205. Y. Takeda, J. Takeda, B. S. Sachais and J. E. Krause, BBRC 179, 1232 (1991).
206. T. Minegishi, K. Nakamura, Y. Takakura, Y. Ibuki and M. Igarashi, BBRC 175, 1125 (1991).
207. T. Minegishi, K. Nakamura, Y. Takakura, K. Miyamoto, Y. Hasegawa, Y. Ibuki and M. Igarashi, BBRC 172, 1049 (1990).


CLAIRE M. FRASER ET AL.

208. Y. Nagayama, K. D. Kaufman, P. Seto and B. Rapoport, BBRC 165, 1184 (1989).
209. L. Buck and R. Axel, Cell 65, 175 (1991).
210. H. Y. Lin, T. L. Harris, M. S. Flannery, A. Aruffo, E. H. Kaji, A. Gorn, L. F. Kolakowski, Jr., H. F. Lodish and S. R. Goldring, Science 254, 1022 (1991).
211. T. Ishihara, S. Nakamura, Y. Kaziro, T. Takahashi, K. Takahashi and S. Nagata, EMBO J. 10, 1635 (1991).
212. C. M. Fraser and N. H. Lee, in “Neuropeptides in Respiratory Medicine” (M. A. Kaliner, P. J. Barnes, G. Kunkel and J. Baraniuk, eds.), pp. 225-250. Dekker, New York, 1994.
213. S. M. Pellegrino, N. H. Lee and C. M. Fraser, in “Biomembranes. Rhodopsins and G Protein-Linked Receptors” (A. C. Lee, ed.). JAI Press, Greenwich, Connecticut. In press.
214. D. A. Schwinn, J. W. Lomasney, W. Lorenz, P. J. Szklut, R. T. Fremeau, Jr., T. L. Yang-Feng, M. G. Caron, R. J. Lefkowitz and S. Cotecchia, JBC 265, 8183 (1990).
215. I. Gantz, G. Munzert, T. Tashiro, M. Schäffer, L. Wang, J. DelValle and T. Yamada, BBRC 178, 1386 (1991).

The Human Immunodeficiency Virus Type-1 Long Terminal Repeat and Its Role in Gene Expression

JOSEPH A. GARCIA AND RICHARD B. GAYNOR¹

Department of Medicine
Division of Molecular Virology
University of Texas Southwestern Medical Center
Dallas, Texas 75235

I. Gene Expression Studies
II. Activation Signals
III. Transcriptional Control Elements
IV. Processing of HIV-1 mRNA
V. Translational Control
VI. tat Studies
VII. Interventional Strategies
VIII. Glossary
References

It is nearly a decade since the isolation of the first molecular clones of the human immunodeficiency virus, type 1 (HIV-1) (1-3). Since then, investigators worldwide have made an intense effort to understand in detail how HIV-1 gene expression is regulated. The first step in the life cycle of HIV-1 involves an initial burst of viral gene expression upon cellular infection. Until recently, this was thought to be followed by a latency period of variable length that almost inevitably ended with the resumption of active infection, resulting in deterioration of the patient's immune and often neurological status. However, recent studies indicate that HIV-1 gene expression may be active throughout infection in a subset of cells within the lymph nodes of infected individuals, while most infected cells may retain HIV-1 in a dormant state (4, 5). The long-terminal-repeat (LTR) elements of the integrated proviral HIV-1 genome serve as the transcriptional initiation site at the 5' end of the

¹ To whom correspondence may be addressed.


Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.

JOSEPH A. GARCIA AND RICHARD B. GAYNOR


genome. The LTR contains an element known as the trans-activation-responsive (TAR) region, the target for the HIV-1-encoded viral trans-activator protein tat. The LTR also serves as the site of 3'-end processing of HIV-1 mRNA transcripts at the 3' end of the genome. In addition, the reading frame of the HIV-1-encoded protein nef overlaps the 5' portion of the LTR, which likely places constraints on sequence variation and composition in this portion of the LTR. Studies of this region must address gene expression at three primary levels: transcriptional initiation and elongation; 3' mRNA cleavage and polyadenylation; and translational control. In this review, we discuss approaches for studying HIV-1 gene expression; the activation signals for HIV-1 gene expression; and the cis-acting control elements in the HIV-1 LTR involved in transcriptional initiation and elongation, RNA processing, and translational control. Finally, we explore the structural and mechanistic features of tat-activation and interventional strategies for inhibiting HIV-1 gene expression.

I. Gene Expression Studies

A. In vitro Transcription

For a number of years, the biochemical dissection of tat-activation lacked an adequate experimental system, despite an early report describing the use of extracts from HIV-1-infected cells to demonstrate tat-induction (6). This was attributed in part to the strong basal activity of the HIV-1 promoter observed in vitro but not in vivo. A number of investigators described modifications to the classical in vitro transcription approach in attempts to obtain a more meaningful assessment of basal and tat-activated HIV-1 gene expression in vitro. Among these changes, adding a preincubation period without label significantly decreased basal activity, which increased the activation of HIV-1 gene expression observed when exogenous tat was added (7). Additional alterations have been described that mimic the in vivo situation with regard to the abundance of short transcripts observed in the absence of tat (8, 9), as well as the temporal aspects observed in phytohemagglutinin- or phorbol-stimulated Jurkat cells (10). More recently, in vitro experiments have demonstrated the importance of the TAR loop and bulge regions in a qualitative and quantitative manner, as found in vivo (11-13). In the presence of tat, an increase in transcriptional initiation, as well as a change in the elongation rate, is observed, which again mimics the in vivo situation. These advances may allow an accurate biochemical dissection of tat-activation.

HIV-1 GENE EXPRESSION


B. Cell-line Studies

A second means of assaying promoter activity involves the transfection of transiently expressed or replicating plasmids. These experiments have provided a wealth of information on the regulation of the HIV-1 promoter in a wide variety of cell types under a diverse set of externally defined stimuli. The promoter for all HIV-1 transcripts resides in the 5' LTR. The LTR is 633 bp in length and can be conceptually subdivided into several regions, as defined by deletion and site-directed mutagenesis studies in a variety of cell lines under a variety of experimental conditions. These studies show that the HIV-1 promoter comprises elements responsible for basal as well as tat-induced gene expression, with additional specialized elements conferring responses to a multitude of extracellular stimuli. Furthermore, a number of potential cis-elements have been identified in the HIV-1 promoter on the basis of homology to control elements in other class-II promoters. With the notable exception of the specialized response elements, the relative role of these elements in the regulation of HIV-1 gene expression remains remarkably consistent across several cell lines derived from different tissues. Several of these elements can indeed bind known cellular transcription factors, although what role, if any, these factors play in the regulation of HIV-1 gene expression remains unknown. A subset of these putative control elements probably mediates responses to the diverse cytokines to which an HIV-1-infected cell is exposed and in turn influences transcription in vivo. It is important to acknowledge that a failure to demonstrate a role for these potential control elements may reflect a limitation of current experimental systems, as it is unlikely that HIV-1 would retain these elements if they did not serve some essential biological function.
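The 633-bp length given above matches the numbering used in Fig. 1, which runs from -454 to +179, because promoter coordinates conventionally have no position 0 (-1 lies immediately upstream of the transcription start site at +1). A minimal Python sketch of that coordinate arithmetic follows; the constant and function names are hypothetical helpers for illustration, and the sketch assumes the LTR sequence is held as a 0-based string whose first character is position -454.

```python
# Minimal sketch of promoter coordinate arithmetic (hypothetical helpers).
# Assumed convention: numbering is relative to the transcription start site (+1)
# and skips 0, so -1 is immediately upstream of +1.

LTR_START = -454  # first LTR nucleotide in the Fig. 1 numbering
LTR_END = 179     # last LTR nucleotide in the Fig. 1 numbering


def to_index(pos: int, start: int = LTR_START) -> int:
    """Convert a promoter coordinate to a 0-based index into the LTR string."""
    if pos == 0:
        raise ValueError("promoter numbering has no position 0")
    offset = pos - start
    # Downstream positions have stepped over the missing 0, so subtract one.
    if pos > 0 and start < 0:
        offset -= 1
    return offset


def span_length(first: int, last: int) -> int:
    """Number of nucleotides from first to last inclusive, skipping the missing 0."""
    if first == 0 or last == 0:
        raise ValueError("promoter numbering has no position 0")
    length = last - first + 1
    if first < 0 < last:
        length -= 1
    return length


print(span_length(LTR_START, LTR_END))  # 633, the LTR length given in the text
print(span_length(-105, -81))           # 25, the enhancer region
print(to_index(-1), to_index(1))        # 453 454: -1 and +1 are adjacent
```

The same helper gives the length of any region quoted in this section, for example the 25-bp enhancer region from -105 to -81 or the G-C-rich region from -76 to -45.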
Soon after the initial dissection of the HIV-1 cis-acting elements, attempts were made to identify DNA elements in the HIV-1 promoter that bind cellular proteins (14, 15). This served as a starting point for purifying cellular DNA-binding proteins involved in regulating HIV-1 LTR gene expression. In addition, fractions containing highly purified transcription factors were used to demonstrate the function of HIV-1 promoter elements that are also found in other eukaryotic class-II promoters. Finally, gel-shift analysis provided an additional means of identifying potential cellular regulatory factors that influence HIV-1 gene expression. By combining the results of genetic analyses of the HIV-1 promoter with protein-DNA analyses, investigators have described many cellular DNA-binding proteins with important regulatory roles in HIV-1 gene expression. Several of the cDNAs encoding these proteins have been cloned using reverse genetics. Moreover, additional cellular factors have been identified by expression cloning from cDNA libraries. For each cis-element, a diverse group of factors may be capable of modulating promoter activity under different conditions. Uncovering the relative importance of these cellular factors in HIV-1 gene expression will be a formidable task.

C. Viral Studies

The results of the isolated promoter studies will provide the groundwork for potential therapeutic modalities. However, the importance of confirming these highly focused, but by definition biologically restricted, studies cannot be overstated. A major advantage of HIV-1 for the experimentalist is its relative amenability to viral reconstruction. Studies with viral constructs and the resulting progeny may serve as a more relevant indicator of HIV-1 gene expression. This is especially important given that several confirmed or putative HIV-1-encoded regulatory proteins other than tat may also have critical modulating activities on HIV-1 gene expression at the transcriptional level. Replication analyses of natural variants and of laboratory-propagated variants have been performed in several cases in attempts to determine whether the HIV-1 promoter plays a role in cytopathogenesis (16-19). Additional viral mutants have been generated in vitro using proviral HIV-1 constructs (20-24). The results of these studies have substantiated important roles in some cases for several presumptive regulatory regions of the HIV-1 LTR, as defined by earlier transfection analyses with reporter constructs. However, the exact role these regions play is less clear, in that their effect depends on the cell line used and on whether the elements have been altered by deletions or by site-specific mutations. Viral studies of particular interest include one demonstrating the emergence of TATA-region mutations that restore activity to constructs lacking SP1 sites (21) and another demonstrating a TAR-independent mechanism for tat-activation (23). The former study suggests that the nature of the TATA region is critical for HIV-1 gene expression, although whether this reflects a requirement for responsiveness to other cis-acting factors, or to tat-activation in particular, remains unknown.
The latter study suggests that tat-activation may be independent of TAR under certain physiological states or in specific cellular environments.

II. Activation Signals

A. Cytokines

In addition to the depletion of CD4+ cells, the clinical course of HIV-1 disease includes abnormalities of the overall immune system in the late stages. These include lymph-node hyperplasia, elevated or depressed cytokine levels, and suppressed or inappropriate cellular and humoral responses (25). Furthermore, these end-stage manifestations of the HIV-1 disease process may, in some cases, reflect host-derived inducers of reactivation that had been held in check by other natural modulators. Cytokines are potent inducers of various signal transduction pathways (26, 27). Concurrent infection by other viruses, or reactivation of latent heterologous viruses, in theory provides further means of activating latent HIV-1 proviruses via heterologous trans-activators. Even if dual infections do not accurately reflect the in vivo scenario, infection of HIV-1-negative cells with several of these viruses can induce cytokine production and hence alter HIV-1 expression in nearby infected cells. Thus, HIV-1 reactivation may be part of the natural progression of the disease process, but it may also be initiated by specific complications in the HIV-1-infected patient. The cytokines that augment HIV-1 gene expression (see Section VIII) include IFN-γ, TNF-α, TNF-β, GM-CSF, M-CSF, TGF-β, IL-6, IL-4, IL-3, IL-2, and IL-1 (28). Those shown to inhibit HIV-1 gene expression include IFN-α, IFN-β, IFN-γ, TGF-β, IL-4, and possibly GM-CSF (29, 30). Many of these cytokines appear to act by inducing cellular transcription-factor activity directed at the NF-κB enhancer and possibly at IL-2 enhancer-like elements present in the HIV-1 LTR. However, the data to date remain correlative and will require more definitive characterization to verify the purported roles of these factors. This should be possible in the near future, now that many of the cDNAs encoding these factors have been cloned, allowing the generation of suitable reagents to define precisely the roles these factors play in cytokine-induced alterations of HIV-1 gene expression.

B. Heterologous Viruses

trans-Activation of the HIV-1 LTR was initially demonstrated for a number of heterologous viruses (31). In most cases, this is mediated by trans-activating factors encoded by the respective viruses, although cytokine induction resulting from the infectious process may also modulate HIV-1 gene expression or be its sole effector. The viruses capable of this induction include members of the herpes family [HSV-1 and -2 (31-37), CMV (31, 34, 38-47), EBV (41, 48, 49), and HHV-6 (50-53)], VZV (31, 34), pseudorabies virus (34, 54), the papovavirus family [JCV (31, 55), BKV (31), LPV (31), BPV (31), and SV40 (56)], the poxvirus family [VV (57)], the parvovirus family [AAV (58, 59)], adenovirus (34, 60-63), the hepadnavirus family [HBV (64-67)], and the retrovirus family [HTLV-I (56, 68-72) and HFV (73-76)].


Investigations into the mechanism of trans-activation of the HIV-1 promoter by these heterologous trans-activators indicate that, as a whole, they act through multiple cis-elements and therefore probably through multiple cellular factors [HBV (77, 78), CMV (79-82), HSV (83-85), EBV (41, 48, 49), and HHV-6 (86)], which may explain the synergistic activation observed with tat and some heterologous trans-activators. In addition, the observation that cell-type specificity and differentiation may play a significant role in the trans-activation process [HBV (87), CMV (88), and HSV (89, 90)] suggests the use of alternative signaling pathways and may in part explain the lack of consensus regarding the mechanism of action of many of these heterologous viral trans-activators.

C. Mitogens, Heat-shock, and UV-Irradiation

The signaling pathways capable of affecting HIV-1 gene expression, as defined by antigen or mitogen stimulation (91, 92), receptor cross-linking (93), and inhibitor studies, include PKC-dependent and PKC-independent, as well as PKA-dependent, pathways (94, 95). In addition, a subset of intracellular calcium-mediated signals likely plays a role, albeit an indirect one, since treatment of cells with a calcium ionophore does not induce HIV-1 gene expression (96). Differentiation agents can also stimulate HIV-1 gene expression and may be important for the activation of latent viral reservoirs (97-101). Heat-shock at physiological levels does not by itself appear to increase HIV-1 gene expression, but it does augment the effects of cytokines on HIV-1 gene expression (102). Exposure to UV-irradiation also induces expression of latent wild-type (103) as well as tat-defective (104) HIV-1 proviruses through pathways involving post-translational modification of preexisting transcription factors (105), activation of integrated provirus via chromatin reorganization (106), and/or extracellular activation via a secreted factor (107). Hence, stimulation of HIV-1-infected cells may occur through many pathways, reflecting the divergent avenues available for signal transduction (108, 109).

III. Transcriptional Control Elements

The sequences in the HIV-1 LTR that bind cellular factors are shown in Fig. 1. The HIV-1 LTR can be grouped into three functional elements designated as modulatory, core, and TAR (Fig. 2). The modulatory region contains elements that bind factors, some of which are involved in tissue-specific gene expression. The core element contains three SP1-binding sites and the TATA box and is involved in HIV-1 gene expression in a variety of tissues. The TAR region is the binding site for both DNA- and RNA-binding proteins and is required for tat activation.

[Fig. 1 sequence panel: HIV-1 LTR nucleotide sequence from -454 to +179, with labeled binding sites for COUP, AP1, NF-κB, USF, TCF-1α, SP1 (1), SP1 (2), SP1 (3), the TATA box, UBP-1/LBP-1, and the TAR region.]

FIG. 1. HIV-1 long-terminal-repeat (LTR) nucleotide sequences. The HIV-1 LTR extending from -454 to +179 is illustrated. Boxed regions of the LTR have been shown to bind cellular transcription factors. The factors contained in the modulatory region include chicken ovalbumin upstream promoter (COUP), activator protein 1 (AP1), nuclear factor (NF)-κB, upstream stimulatory factor (USF), and T-cell factor-1α (TCF-1α). Those in the core element include SP1, TATA, and potential initiator (INT) binding proteins, while the trans-activation-responsive (TAR) element contains both DNA-binding proteins, including untranslated binding protein (UBP)-1/leader-binding protein (LBP)-1, UBP-2 and CAAT transcription factor/nuclear factor-1 (CTF/NF1), and RNA-binding proteins, including TAR RNA-binding protein of 185 kDa (TRP-185) and 1168.

[Fig. 2 schematic: HIV-1 LTR divided into modulatory, core, and TAR regions, with TAR RNA indicated.]

FIG. 2. HIV-1 long-terminal-repeat (LTR) DNA- and RNA-binding proteins. A schematic of the HIV-1 LTR, indicating potential sites of interaction for DNA- and RNA-binding proteins, is shown. These include binding sites for activator protein (AP)-1, chicken ovalbumin upstream promoter (COUP), nuclear factor of activated T cells (NF-AT), upstream stimulatory factor (USF), T-cell factor-1α, nuclear factor (NF)-κB, SP1, TATA, initiator (INT), untranslated binding protein (UBP)-1 or leader-binding protein (LBP)-1, UBP-2, and CTF/NF1.

A. Upstream Region

The upstream region of the HIV-1 LTR extends from nucleotide -453 to -105. At least two regions of the HIV-1 LTR confer negative regulatory effects. The first of these to be identified was a distal upstream region defined by deletion (25, 26, 70, 110) and site-directed mutational analyses (111). Although the evidence for a negative regulatory role for this region is not without question (112, 113), the biological pressure for retention of these sequences may reflect an unrecognized regulatory function. Several factors that bind to elements within this region have been characterized. The functional significance of these factors is suggested by in vitro transcription studies (114) and transfection assays (115, 116) with HIV-1 promoter constructs, or by transfections using heterologous promoter constructs with oligomerized control elements (117). One of these factors appears to be modulated by nef, whose action on HIV-1 gene expression remains controversial (118). A positive influence on HIV-1 promoter activity was noted in transfection studies using the proto-oncogene c-myb (116), suggesting that this region may include both positive- and negative-acting cis-regulatory elements. In addition, sequences related to activator protein 1 (AP1) consensus recognition sites have been identified, although their function is unclear (119). However, the role of AP1 in steroid hormone regulation (120) may provide insight into a similar function for AP1 in HIV-1 gene expression, given the proximity of the putative hormone-responsive elements of the HIV-1 promoter to the AP1-binding sites. The distal upstream sequences of the HIV-1 promoter may modulate gene expression in response to diverse extracellular signals, including differentiation. Hormone-responsive elements have been identified on the basis of both homology and functional studies. In transfection analyses, glucocorticoid administration appears to increase HIV-1 gene expression, and binding of the glucocorticoid receptor to the HIV-1 promoter can be demonstrated in vitro (121). However, there is some disagreement as to the number of elements conferring the induction (122, 123). Furthermore, evidence has been presented for a suppressive effect of glucocorticoids on HIV-1 gene expression, in conflict with the studies mentioned above (124). The proximity of the glucocorticoid response element (GRE) sequences to other putative regulatory elements may be significant for the regulation of signal transduction pathways. An additional DNA element with homology to the steroid/thyroid hormone response elements can be bound by human homologues of chicken ovalbumin upstream promoter-transcription factor (COUP-TF) (125), as well as by a T-cell-specific factor that also binds estrogen- and thyroid hormone-responsive elements with lower affinity (126). A mutation of the latter site led to increased gene expression in a T-cell line, suggesting a negative role for this element (127). Studies examining the effects of estrogen and progesterone indicate that these hormones inhibit HIV-1 gene expression in monocytes (128), which may partly explain the relatively high percentage of uninfected children born to HIV-1-infected mothers. Finally, the retinoic acid receptor (RAR) can bind to the steroid/thyroid response element in the HIV-1 LTR (129).
The retinoic acid signaling pathway is composed of multiple members (130) that can interact with each other as well as with other members of the steroid hormone receptor superfamily, including the COUP-TFs (131). The functional significance of the steroid hormone receptor superfamily members for HIV-1 gene expression is unclear. The recent use of trans-dominant negative RAR mutants to alter basal and hormone-induced transcription (132) may prove to be a useful approach to further investigate the role of the RAR in HIV-1 gene regulation. Using constructs containing deletions in the HIV-1 LTR, evidence has been presented that the region encompassing the IL-2 activator-like and the nuclear factor of activated T-cells (NFAT-1) recognition elements in HIV-1 may play a negative regulatory role in gene expression (26). Although transfection analysis of reporter constructs containing point mutations of the IL-2/IL-2R activator and NFAT-1 recognition elements does not corroborate these findings (111, 133), more meaningful biological experiments with reconstructed viruses containing point mutations in these elements remain to be done.


Two cellular factors that can interact with the purine-rich IL-2 activator homology region have been identified by expression cloning (134-136). Both belong to the “forkhead” family of transcription factors. Whether the forkhead members or other factors capable of binding to the purine-rich elements, such as members of the ets family (137), mediate the biological response observed in the viral deletion mutants described above awaits further experimentation. The adjacent downstream region of the HIV-1 LTR, which has homology to the IL-2/IL-2R activator element, can bind the cellular complex NFAT-1 (138). The appearance of the NFAT-1 complex temporally precedes the activation of the IL-2 and IL-2R genes, and the sequence to which NFAT-1 binds is a critical regulatory element for the promoters of these genes. Inhibition of T-cell activation by the immunosuppressants FK506 and cyclosporine correlates with reduced levels of NFAT-1, presumably secondary to decreased transport of a cytoplasmic component of NFAT-1 (139). Indeed, cyclosporine and FK506 treatment of T cells chronically infected with HIV-1 inhibits viral production (140). A role in HIV-1 gene expression has not been verified for NFAT-1 or its recognition elements in transfection studies with reporter constructs containing either a deletion (26) or site-directed mutations of the NFAT-1 recognition element (133). However, a recent study demonstrating reduced levels of cellular complexes that bind to the HIV-1 enhancer, as well as to the NFAT-1 sites, in cyclosporine A-treated cells suggests that the HIV-1 promoter may use alternative NFAT-1-binding sites in addition to the sites first described (141). Furthermore, the observation that treatment of cells with immunosuppressants may potentiate binding of the glucocorticoid receptor, as recently demonstrated for the mouse mammary tumor virus LTR (142), to cis-elements similar to those that flank the NFAT-1 sites in the NRE (negative regulatory
element) suggests that this region may be involved in a presently unknown cellular regulatory circuit for HIV-1 gene expression. The proximal upstream regulatory region, first identified by deletion analysis in HeLa cell transfection experiments (14) and also demonstrated with linker-scanning mutants (111), contains an E-box DNA element that can bind the helix-loop-helix (HLH) transcription factor USF. Evidence has been presented for a repressing (143) as well as a stimulating role (144) for USF or a related factor in HIV-1 gene expression. An additional cellular factor has been identified that binds a similar element with a negative regulatory role in the IL-2Rα gene (145). Other factors that can bind to this region, and whose role in HIV-1 gene expression is unclear, have been identified (143, 146). A more dramatic phenotype has been observed for viral mutant constructs carrying deletions of the proximal negative regulatory region (NRE1) (25). However, other reports demonstrated a less dramatic effect on replication for a site-directed mutation of the NRE1 region in a provirus (24). As with the other cis-elements, further in vivo and in vitro characterization is required to define the contributions of this potential modulatory element to HIV-1 gene expression. The region upstream of the HIV-1 enhancer elements can bind a T-cell-specific factor that acts as a T-cell-specific activator of the TCRα enhancer (147). In transfection studies, this region did not appear critical for HIV-1 promoter activity in T cells. However, site-directed alterations of this region result in decreased viral replication (24). Moreover, other elements in this region are somewhat responsive to sodium butyrate stimulation (148) and to the human foamy virus trans-activator bel-1 (75, 76). Hence, this region may be involved in the regulation of HIV-1 gene expression in a wide variety of physiological states.

B. Enhancer Region

The multiple activation pathways used by T cells reflect the divergent stimuli to which these critical immunological effectors respond (149). In the region of the HIV-1 LTR from -105 to -81, there are two copies of an enhancer element that functions both in vitro and in vivo. Although initially thought to be responsible for B-cell and T-cell enhancer activity (150), this enhancer element, referred to as the NF-κB site, contributes to HIV-1 promoter activity in lymphoid cell lines (70, 151-153) as well as in a variety of other cell lines (14, 153, 154). A requirement for HIV-1 replication was demonstrated with proviral constructs containing deletions in the enhancer region (21, 22). However, these mutations produced a less dramatic phenotype in viral growth studies than in transfection studies (22, 23). The rel family of cellular factors, which regulate activity through this control element, contains multiple members, including inhibitory subunits, and responds to divergent cellular stimuli (155). A number of rel members influence HIV-1 promoter activity in transient expression assays (156-159) or in in vitro transcription studies (159-162). The role each rel member plays in HIV-1 expression may depend on the cell type or physiological environment in which the provirus is found. Hence, the exact composition of a rel complex may have differing effects on HIV-1 gene expression. Other cellular factors, although less well characterized, may also be important in regulating HIV-1 gene expression under defined cell conditions or in different cell types (141, 157, 163-169). Several of these factors exhibit similar increases in binding activity under T-cell activation conditions (141, 167, 168, 170) and, as such, warrant further investigation.

C. G-C-Rich Region

The region from -76 to -45 contains three G-C motifs to which the cellular transcription factor SP1 can bind, as demonstrated using highly purified preparations of SP1 (171). The crucial role of these sites has been demonstrated in in vitro transcription as well as in vivo studies (171, 172). Interestingly, this region does not appear to be protected in DNase-I protection assays in vitro using partially purified nuclear extracts (14), although in vivo “footprinting” assays do detect SP1 binding to this region (15). However, the predominant bound complex(es) observed in vitro may not be responsible for the majority of the transcriptional activity observed. Furthermore, the suggestion that multiple types of HIV-1 transcriptional initiation complexes may exist (173) underscores the fact that a DNase-I protection assay using partially purified extracts reveals only the predominant promoter occupancy state. Moreover, SP1 may interact transiently with its DNA recognition sites before forming a stable initiation complex. Under certain conditions, SP1 may bypass binding to its DNA site altogether by interacting directly with other cellular transcription factors bound to the HIV-1 promoter, as seen in other promoters (174). Studies demonstrating the importance of the SP1 sites for tat-activation, using Gal4-tat or Gal4-SP1 fusions, support the notion that a specific type of transcriptional initiation complex is sensitive to the action of tat (175, 176). Although other transcription factors may replace SP1 in this process, the SP1 recognition sites are particularly effective in tat-activation (177, 178). The G-C-rich sites in the HIV-1 promoter may be regulated by methylation (179, 180) and may also play a crucial role in alternative activation pathways in actively replicating cells (177).
A variable role was seen for the SP1 sequences in a study using a virus containing deletions of part or all of the SP1 region (20). However, deletions of both the enhancer and SP1 regions (21, 22) or of the SP1 region alone (20), as well as site-directed alterations of the SP1 region, markedly alter viral gene expression. The defective growth characteristics observed for the SP1 deletion mutants were seen in a cell background deficient in enhancer-region binding factors. This suggests that the SP1 control element can confer enhancer-independent growth under certain conditions. Interestingly, a variant of HIV-1 has been described that contains an additional SP1 site (19) and exhibits an increased growth rate. Thus, the G-C-rich promoter elements are critical for HIV-1 gene expression, and either SP1 or other cellular transcription factors (181-185) may regulate the assembly of the HIV-1 tat-responsive initiation and elongation complex, whose activity may be independent of enhancer influence.

HIV-1 GENE EXPRESSION

JOSEPH A. GARCIA AND RICHARD B. GAYNOR

D. TATA Region

The TATA element in the HIV-1 promoter is a canonical (conserved) tetranucleotide sequence found in most RNA polymerase class-II (Pol-II) promoters. The necessity of this element for HIV-1 gene expression has been demonstrated in transfection studies (14, 186). In viral studies, a severely defective phenotype has been observed for deletions of the TATA region (25) as well as for site-directed alterations of the TATA element (23). It appears not only that the TATA element is critical for basal levels of expression (62), but also that the composition of this element may regulate tat-responsiveness (178, 187) as well as the response to other cell stimuli via alternative signaling pathways (148, 188). Although heterologous fusions of upstream regions to TAR DNA can, in some cases, confer significant tat-responsiveness (27, 189), it is unclear from such studies whether different initiation complexes are being assembled: despite the apparent lack of homology among the upstream sequences, many transcription factors have degenerate DNA recognition elements. This is further complicated by the observation that two types of initiation complexes may assemble on the HIV-1 promoter, only one of which is thought to be tat-responsive (173). As with all class-II promoters, the aim of future investigations will be to relate the specific regulatory factors that interact with the HIV-1 promoter to the factors comprising the basic transcriptional machinery (190). The TATA region is protected when DNase-I footprinting is performed using partially purified cellular extracts, and the specific DNA-binding factors providing this protection have been partly identified (14, 164). This region can be bound by highly purified UBP-1 preparations, which bind to low-affinity sequences flanking the TATA element in the presence or absence of purified TATA-binding protein (TBP) (191). However, prior binding of TBP prevents binding of UBP-1 and also prevents the associated repression. In addition to TBP, a cellular factor that also binds specifically to the HIV-1 TATA element has been identified by expression cloning (192).

This factor, TATA element modulatory factor 1 (TMF-1), appears to provide a low level of basal expression but by itself cannot confer tat-responsiveness. Furthermore, the induction of a large protein complex in phorbol-treated T cells may reflect an increase in a specific TATA-binding factor other than TBP, although induction of a TBP-associated factor cannot be ruled out (193). Other general TATA-binding factors besides TBP have been described since the identification of TMF-1, and there is genetic evidence in yeast and higher eukaryotes for the existence of multiple TATA-binding factors (194-196). Thus, factors binding to the HIV-1 TATA element and flanking sequences may provide a basis for the tat-sensitive and tat-insensitive transcription complexes described, and may be a critical determinant for successful reconstitution of tat-activation in vitro. The sequences flanking the TATA motif on both the 5' and 3' sides contain a sequence known as an E-box, first described in the adenovirus major late promoter and the immunoglobulin heavy-chain enhancer. Mutation of these sequences has a minimal effect in HeLa cells (186) but a more pronounced negative effect in Jurkat-tat cells (111). Interestingly, Jurkat-tat cells allow the propagation of slow-growing HIV-1 isolates, which may reflect the presence of a particularly tat-responsive or active transcriptional initiation complex in these cells (197). Sequences that flank TATA elements have been associated with highly specialized regulatory responses in certain Pol-II promoters (198-201). The HIV-1 TATA-flanking promoter elements can bind members of the helix-loop-helix (HLH) family of DNA-binding factors, although the precise role of these factors in the tat-response awaits further biochemical characterization (202). The possibility that multiple transcription complexes regulate HIV-1 gene expression through this element should be considered, given the opportunities for heterodimer formation within the HLH family (203). This may be even more important for HIV-1 gene expression, given recent reports of HLH members dimerizing through ankyrin-like repeat sequences also found in rel family members (204).
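The E-box itself is a short degenerate motif, and a small sketch shows why such sites are easy to find on either DNA strand. It assumes the canonical E-box consensus CANNTG (N = any base), which is not spelled out in this review, and scans a made-up toy sequence rather than the HIV-1 promoter. Because the reverse complement of any CANNTG is again of the form CANNTG, scanning one strand already reports every site on both strands.

```python
import re

# Assumed canonical E-box consensus: CANNTG, N = any base.
# The lookahead lets overlapping matches be reported.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_eboxes(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, motif) for every CANNTG match on the given strand."""
    return [(m.start() + 1, m.group(1)) for m in E_BOX.finditer(seq.upper())]


if __name__ == "__main__":
    toy = "GGCACGTGTTTATAAGCAGCTGCC"  # hypothetical sequence, not HIV-1
    print("top strand:   ", find_eboxes(toy))
    print("bottom strand:", find_eboxes(revcomp(toy)))
```

Both strands of the toy sequence report two sites, one per E-box, which is the practical consequence of the motif being its own reverse-complement class.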

E. Initiator Region

A recently described control feature of certain genes transcribed by Pol-II is the initiator element. Examination of the HIV-1 promoter has identified a potential initiator element. However, this element failed to function in a heterologous promoter assay previously described for bona fide initiator elements. Substitution of the terminal deoxynucleotidyl transferase (TdT) initiator for the HIV-1 TATA and initiation region did not confer tat-inducibility despite adequate basal levels of expression (178). Further analysis of this region of the HIV-1 promoter revealed a more complex initiator element, composed of bipartite domains, whose activity is also highly dependent on the HIV-1 TATA region (205). The cellular factors responsible for this activity remain undetermined, although several members of the regulatory as well as the basic transcription factor families that bind to this region have been identified; these may modulate activity positively or negatively (164, 186, 206-209). The relationship of HIV-1 promoter-initiator activity to tat-activation has not been determined conclusively.

An additional element, known as the inducer of short transcripts (IST), which partially resides in TAR, has been characterized (210). This element activates HIV-1 gene expression, but only that of the short-transcript family. Furthermore, this element can act when placed under the control of heterologous promoters, including Pol-III types. The control elements for IST reside mainly between positions -5 and +26, with additional contributions from the sequence between positions +40 and +59 (211). This region contains recognition elements for the cellular DNA-binding proteins UBP-1, CTF/NF-1, and UNF-1 (164, 206). Binding studies show that mutated promoter constructs that retain IST activity have decreased affinity for, or lack binding sites for, UBP-1 and CTF/NF-1 (211). However, these constructs probably retain recognition sequences for UNF-1 (I. Ou and R. Gaynor, unpublished), making this cellular factor a candidate for IST control. There is evidence that these short transcripts may sequester tat (212), but the functional significance of this is unknown. Again, this region at the mRNA level is a target for cellular RNA-binding factors, and the consequences of sequestering tat versus a TAR RNA-binding cellular factor(s) will be difficult to distinguish. Understanding the biological relevance of these transcripts awaits the construction of HIV-1 proviruses containing mutant promoters that are deficient in IST activity but responsive to tat.

F. trans-Activation-Responsive (TAR) Region

The viral protein tat increases expression of HIV-1-encoded constructs in a highly specific manner (213). The target sequences for tat reside in a transcribed but untranslated region of the HIV-1 LTR known as the TAR region (214). Viral mRNA produced from this region can form a stable stem-loop structure (112), and tat-activation through it is both structure- and sequence-specific, as indicated by the effects of deletions (14, 110, 112, 215) and site-specific mutations (186, 206, 216-222). The most critical region appears to be the loop sequence, with the bulge region also contributing substantially to tat-responsiveness (186, 217, 220). More dramatic phenotypes have been observed for deletions of the TAR region (22) as well as for site-directed mutations of TAR that alter the loop sequence or disrupt the stem structure (23). A less dramatic effect on viral replication occurs with a TAR mutant that retains the stem-loop structure but alters the primary sequence (23), similar to what is seen in transfection studies (186). The mechanism by which tat acts involves an increase in both the transcriptional initiation and elongation rates of the HIV-1 promoter, resulting in increased steady-state levels of HIV-1 mRNA (223, 224). The genetic analyses make it evident that several features of the TAR region are critical for tat-induced expression. First, the DNA of this region contains recognition sequences for several cellular DNA-binding proteins that regulate basal expression of the HIV-1 promoter but that may also contribute in some manner to tat-responsiveness (164, 208). In another promoter, downstream binding sites for DNA-binding proteins affect the recruitment of TBP-containing complexes to the upstream TATA element (225).

Second, tat can bind in vitro to the bulge region of TAR-containing RNA, and mutations of this primary recognition site for tat decrease HIV-1 gene expression (220, 226-228). Third, several cellular RNA-binding factors can bind to critical TAR mRNA cis-elements. Hence, dissecting HIV-1 promoter activity in the absence or presence of tat (7, 230, 231) requires assessing the contributions of the different components of TAR to HIV-1 gene expression. Experimental approaches similar to those used for the cellular DNA-binding factors have been taken to identify cellular factors that interact with TAR mRNA in a sequence-specific manner (7, 229-235). A subset of these binds to the loop sequence of TAR, which is the most critical control element for tat-activation (7, 230, 231). The binding of several of these cellular factors to TAR RNA can be modulated by tat, suggesting a significant role for these factors in tat-activation (230, 231). In addition, several cellular factors have been identified by their interaction with tat, in purification or screening schemes, or on the basis of their induced expression in the presence of tat (236). Although the list of these factors is relatively short at this time, the task remains to identify which of them, if any, are critical for tat-induction. Subsequent investigations have provided insight into TAR-independent pathways of tat-activation. These may be of particular biological significance, given that such a pathway appears to be used in activated T cells and in microglial cells (23, 237, 238). The activation appears to be mediated through upstream DNA elements, more specifically the enhancer elements.

Interestingly, a tissue-specific JCV enhancer fused to TAR downstream elements was found to be active in normally nonpermissive cells in the presence of tat, providing indirect evidence for tat interaction with upstream factors (239). Thus, the observation that chimeric fusions of tat to DNA-binding proteins can activate through upstream DNA elements (113) probably mimics a process described in vivo with HIV-1 proviral constructs in certain HIV-1-infected cell types or states, and with heterologous HIV-1 promoter constructs. Studies using rodent-human hybrid cell lines have defined an interesting species difference: tat-responsiveness is conferred upon normally unresponsive rodent backgrounds in the presence of human chromosome 12 and, to a lesser extent, human chromosome 6 (240, 241). Subsequent experiments using tat chimeras in these cell lines provide additional support for this hypothesis (242). These and other experiments suggest the existence of specific human factors involved in tat-activation of not just HIV-1 but also HIV-2 by their respective tat proteins (243) in some, but not all, rodent cell lines. trans-Activation of HIV-2 by HIV-1 tat involves a pathway that is not restricted to a human chromosome background, suggesting an alternative cellular pathway for HIV-1 tat-activation. Evidence of multiple pathways for tat-activation of HIV-1 has been reported, as stated above. Follow-up studies of these intriguing preliminary reports will be necessary to identify both the factor(s) involved and their sites of action.

G. Downstream Region

The sequences downstream of TAR contain elements involved in 3' processing, including the polyadenylation signal and the polypyrimidine stretch found in many eukaryotic transcriptional processing regions (Section IV,A). In addition, these downstream sequences can form RNA with significant secondary structure, with possible roles in transcript stability and translational efficiency. Furthermore, the terminal sequences closest to the 3' end contribute to a structural component of considerable secondary structure when combined with unique viral sequences just downstream of the 5' LTR (244). This structure encompasses the primer-binding site for tRNALys, which plays a key role in HIV-1 replication and thus provides a rationale for conservation of this domain among viral isolates.

IV. Processing of HIV-1 mRNA

A. cis-Elements

The HIV-1 LTR serves as a multifunctional regulatory region for the initial as well as the terminal stages of the transcriptional process. The cis-elements that modulate the 3'-terminal processing events have been described using hybrid polyadenylation-region constructs. These elements, which reside within the LTR, appear to overlap partially with those involved in the transcriptional initiation/elongation process, which is also dictated by cis-acting elements within the LTR. Thus, the HIV-1 LTR contains control features for transcriptional initiation as well as for polyadenylation. The commitment to the polyadenylylation pathway appears to be determined by the proximity of the polyadenylylation control region to the cap site, and hence to an active promoter. Promoter activity other than that of the HIV-1 promoter can cause occlusion of the adjacent HIV-1 polyadenylylation site (245). This distance-dependent phenomenon could explain the inactivation of the polyadenylylation signal in the 5' LTR, but would not explain why the 3' LTR remains dedicated to 3' processing rather than to transcriptional initiation. Insight into this latter question is provided by studies demonstrating that the U3 and R regulatory regions in the HIV-1 LTR further increase the efficiency of the HIV-1 polyadenylylation signal in vivo and in vitro (246). This signal appears to be structural rather than sequence-specific, and does not overlap transcriptional regulatory regions such as the enhancer, SP1, 3' TATA-flanking, and IST sequences (247-251). The stem-loop structure in the R region, which encompasses TAR at the 5' end of the HIV-1 transcript, may also help optimize the position of this U3 signal relative to the poly(A) signal and may also contribute to the U3 efficiency signal (249). Finally, a sequence in the HIV-1 LTR U5 regulatory region downstream of the poly(A) signal, which is similar to the previously recognized G-T cluster found in a variety of polyadenylylation control regions, modulates the efficiency of the HIV-1 polyadenylylation process and also appears to participate in occlusion (245, 252). Further investigations are required to address more thoroughly how these LTR sequences mediate both transcriptional initiation and termination.

B. Cellular Factors

Whether cellular cofactors exist that regulate both ends of the transcriptional process is unknown. However, a cellular factor, UNF-1, that binds to the HIV-1 LTR has been identified by expression cloning. UNF-1 contains the RNP consensus sequence, which was first described as an RNA-binding motif in the hnRNP (heterogeneous nuclear ribonucleoprotein) family and was soon found in a factor involved in alternative RNA splicing (253, 254). Subsequent to these initial findings, it was demonstrated that a truncated form of the hnRNP protein can bind single-stranded DNA with high affinity and moderate specificity. Several reports have described cellular factors containing the RNP consensus motif that bind double-stranded DNA in a sequence-specific manner, including several factors that may be involved in transcriptional initiation (255-257) or post-transcriptional processing (258). It is apparent that the RNP consensus element serves as a nucleic acid-binding domain with a full spectrum of nucleic acid targets and biological activities. It will be interesting to determine which cellular factors are involved in the various facets of 3' processing of HIV-1 transcripts and whether they also modulate, in an active or passive role, the transcriptional initiation process at the 5' HIV-1 LTR.

V. Translational Control

A. Translation Studies

The untranslated leader region of HIV-1 mRNA can fold into a significant secondary structure consisting of multiple stem-loops. The first stem-loop contains TAR, whereas the secondary structures downstream of TAR appear dispensable for the tat response. The presence of similar secondary structures in other mRNAs has been associated with decreased translational efficiency. Therefore, several investigators examined the translational efficiency of HIV-1 leader-containing messages. The initial studies indicated that the HIV-1 leader decreases translational efficiency in cis, using cytoplasmic microinjection of Xenopus oocytes or in vitro translation with reticulocyte or HeLa cell extracts as model systems (259). Mutations at the base of the stem improved translational efficiency in both systems, presumably by improving cap accessibility. Microinjection of tat into Xenopus oocytes demonstrated a TAR-dependent positive effect on presynthesized TAR-containing RNA injected into the nucleus, requiring the loop and bulge regions for full effect (260, 261). The demonstration that HIV-1 upstream sequences are required for translational inhibition by TAR in the absence of tat suggests that the nature of the HIV-1 transcriptional complex can affect subsequent downstream gene expression events in Xenopus oocytes (261). This result was further characterized as a post-transcriptional effect, and the possibility was raised that the effect was secondary to the improved translatability of these "activated" RNA species, resulting from inhibition of adenosine-to-inosine modification of TAR-containing RNA, with one notable exception (262). However, subsequent experiments noted only a requirement for a purine residue at the tat-specified modified position, suggesting that adenosine-to-inosine conversion is not the primary reason for the translational effects on HIV-1 gene expression observed in Xenopus oocytes (263). This requirement for a purine at this position probably reflects a structural basis for tat recognition of TAR RNA rather than a substrate determinant for a Xenopus enzymatic activity. These results have been complicated by experiments in higher eukaryotes. HIV-1-infected cells exhibit differential effects on host-cell versus viral protein synthesis.
However, this effect appears to be mediated by a cellular mRNA degradation pathway rather than by a translational difference (264). Moreover, the conclusions regarding the larger translational effects of tat drawn from the Xenopus oocyte microinjection studies are not supported by similar microinjection experiments in primate cells. In this setting, tat appears to act primarily by activating transcription, without any significant translational effect (265). The possibility remains that an evolutionarily conserved factor is important for HIV-1 gene expression in both Xenopus and higher eukaryotes, but that the specific function of this cellular factor in gene expression has evolved differently. This may be especially relevant given the recent report of cellular factors from Xenopus oocytes that bind to the loop as well as to the bulge region of TAR RNA. This binding activity can be modulated in the presence of tat (266), in a manner similar to the binding characteristics of proteins purified from human cells (230, 231).


B. Cellular Factors

The activation of two cellular factors identified in other virus-infected states, double-stranded RNA-dependent kinase (dsI/DAI/p68 kinase) and (2'-5')A synthetase, is part of a pathway that induces interferon synthesis. One of the interferons, IFN-γ, can antagonize the effects of tat and may contribute to the maintenance of HIV-1 latency (267). An early report demonstrated the efficacy of mismatched dsRNA as an antiviral agent against HIV in vitro (268). Similarly, transfection of low levels of poly(I)·poly(C) can decrease HIV-1 gene expression, as determined with reporter constructs (269). The short RNA transcripts associated with the HIV-1 promoter in vivo and in vitro provide an abundant source of mismatched dsRNA, raising the possibility that HIV-1 gene expression may be modulated at the translational level in HIV-1-infected cells by this cellular system (270, 271). dsRNAs stimulate dsI, which in turn phosphorylates the translation initiation factor eIF2, the end result of this process being the downregulation of translation (272-274). As determined by in vitro translation experiments, TAR-containing mRNA can inhibit in trans the translation of other mRNA species through activation of dsI and subsequent phosphorylation of eIF2 (90). This activation of dsI requires the structural integrity of the lower stem of the TAR-containing HIV-1 leader region (275). Furthermore, high concentrations of poly(I)·poly(C) prevent activation of dsI and also prevent the trans-inhibition of in vitro translation by HIV-1 leader-containing mRNA (90). The presence of dsRNA also leads to activation of the (2'-5')oligoadenylate [(2'-5')A] synthetase system, which includes (2'-5')A synthetase, (2'-5')-phosphodiesterase, and (2'-5')A-dependent RNase. TAR-containing RNA species can bind and activate (2'-5')A synthetase (276). Since (2'-5')A synthetase can activate (2'-5')A-dependent RNase, mRNA levels may be modulated in the absence of tat.
Purified (2'-5')A synthetase from human trophoblast cells is activated by TAR-containing RNA (271). Activation of (2'-5')A synthetase is also observed in HIV-1-infected T cells (277). However, this is a transient process, and the levels return to normal within 5 days post-infection. The transient nature of this process may be due to the prevention of TAR-containing transcripts from binding to (2'-5')A synthetase as the levels of tat increase (278). The important question is whether translational effects are observed in vivo during the course of HIV-1 infection. As stated above, HIV-1-infected cells appear to exhibit different levels of protein expression for host versus HIV-1-encoded mRNA, but this effect appears to be due to differences in mRNA degradation rates and thus correlates with steady-state mRNA levels (264), a situation also observed at early time-points in transfection studies (279). Moreover, the TAR-dependent activation of dsI has been questioned: with highly purified in vitro-synthesized HIV-1 leader-containing RNA, inhibition, rather than activation, of dsI was observed (280). Stimulation of in vitro translation by TAR-containing RNA transcripts has been reported, but this effect was not seen with TAR RNAs corresponding in length to the short-transcript family (281). In addition, the interferon family includes many members that may have differing effects on HIV-1 gene expression (282). Furthermore, another study presents evidence for a heat-labile factor, distinct from dsI, that is involved in translational control of HIV-1 leader-containing mRNA (283). Whether translational effects play a prominent role in the conversion from the latent to the active phase of HIV-1 replication remains controversial and requires further substantiation. However, the observations that tat can decrease the level of dsI in the presence of interferon (284) and can prevent binding of TAR-containing RNA by (2'-5')A synthetase (278) raise the possibility that HIV-1 may modulate both negative and positive effectors of HIV-1 and/or host-cell translation.

VI. tat Studies

A. Structure/Function Analysis

The initial mutagenesis studies of tat defined several domains by functional as well as structural features (285, 286). These domains include an acidic amino-terminal region, a cysteine-rich region, a region conserved among various lentiviral trans-activators just downstream of the cysteine-rich stretch, a basic region composed of eight lysine and arginine residues, and an accessory region. Other important residues with no clear function have been identified by various site-specific mutagenesis studies; these results await detailed structural analysis that may better define their contributions to tat function. Cellular localization studies indicate that tat resides in the nuclear compartment, with particularly high concentrations in the perinucleolar region of the nucleolus (287). Targeting of tat to the perinucleolar region may be necessary for optimal trans-activation, although it is not sufficient (288-290). Finally, tat appears to be associated with nuclear matrix components in a zinc-dependent manner, although the identity and significance of these components are unknown (291). The amino terminus of tat appears to encode an activation domain, as determined by deletion and site-directed mutagenesis studies as well as by heterologous construct analysis (218, 285, 286, 292), but this region is not essential for tat activity (293). The original proposal that this region may form an amphipathic α-helix has not been verified by vacuum-UV circular dichroism analysis. The cysteine-rich region binds certain divalent cations and participates in dimerization in vitro, although the in vivo significance of both observations has been questioned (294). However, viral studies support a critical role for the cysteine residues, consistent with transfection studies (285, 295). Non-cysteine residues within this region are also critical for tat-activation of HIV-1 gene expression (296). The conserved region can tolerate numerous single substitutions, with one notable exception (286). The basic domain of tat comprises both nuclear and nucleolar targeting signals as well as an RNA-binding motif (220, 226-228), and numerous mutations in this region yield defective tat trans-activation (285, 297). Substitutions with other basic amino acids can be made if the properties of nucleolar targeting and TAR RNA binding are maintained (298). The arginine residues play a key role in the function of the basic region, which may be related to their proposed RNA-binding role (288, 299, 300). A role for the basic region as a cell-attachment domain has recently been proposed (301), in addition to the amino-acid motif RGD, located in the second coding exon, which also serves this purpose (302). A peptide encompassing the conserved region and the basic region adopts an amphipathic α-helix by circular dichroism analysis (303), as does a short stretch in the accessory domain just downstream of the basic region, which may contribute to the binding of tat to the TAR loop region (304).
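The domain inventory above can be made concrete with a short sketch. It assumes residues 1-57 of an HXB2-like HIV-1 tat first exon and approximate, commonly cited domain boundaries; neither the sequence nor the boundaries appear in this review, so both are labeled as assumptions and should be checked against a sequence database before reuse. Under these assumptions the basic segment contains the eight lysine and arginine residues mentioned in the text, and the cysteine-rich segment contains seven cysteines.

```python
# ASSUMPTION: residues 1-57 of an HXB2-like HIV-1 tat sequence (first exon,
# truncated after the basic region). Not taken from this review.
TAT_1_57 = "MEPVDPRLEPWKHPGSQPKTACTNCYCKKCCFHCQVCFITKALGISYGRKKRRQRRR"

# ASSUMPTION: approximate 1-based, inclusive domain boundaries.
DOMAINS = {
    "acidic N-terminal": (1, 21),
    "cysteine-rich": (22, 37),
    "conserved core": (38, 48),
    "basic": (49, 57),
}


def segment(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive protein numbering."""
    return seq[start - 1:end]


def composition(seq: str) -> dict[str, int]:
    """Count the residue classes discussed in the text for one segment."""
    return {
        "basic (K+R)": sum(seq.count(aa) for aa in "KR"),
        "acidic (D+E)": sum(seq.count(aa) for aa in "DE"),
        "cysteine": seq.count("C"),
    }


if __name__ == "__main__":
    for name, (start, end) in DOMAINS.items():
        seg = segment(TAT_1_57, start, end)
        print(f"{name:18s} {start:>2}-{end:<2} {seg:16s} {composition(seg)}")
```

Keeping the boundaries in a single table makes it easy to substitute another isolate or a different domain definition without touching the counting logic.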

B. TAR RNA Binding Studies

Circular dichroism and NMR studies indicate that a conformational change occurs in tat as well as in TAR RNA when peptides comprising the basic domain of tat, or an arginine analog, are used as the TAR RNA-binding agents (299, 305, 306). The primary recognition site for tat on TAR is the bulge region, with the uridine at position +23 being a critical contact point (220, 227, 307). The neighboring stem and loop sequences are also important, perhaps partly because of their contribution to local structural determinants, but also because these sequences appear to be contacted by various domains of tat (231, 299, 308). A schematic of the TAR RNA structure is shown in Fig. 3. Minimal tat peptides, containing the basic domain and the arginine residues in particular, appear to mediate specific binding to TAR in vitro (309). However, one must be cautious in drawing conclusions about the relevance of these studies to the kinetics and binding specificity of intact tat, because of inherent experimental biases (310, 311). An interesting preference of tat for potential structural features of TAR has been demonstrated using TAR RNA·DNA hybrids in gel-shift analysis. This analysis suggests that A-helical forms of TAR in the RNA stem sequences 5' to the bulge and 3' to the loop may be critical for tat binding (312). An extension of this approach has been used to point out the importance of the stem residues 3' to the bulge as hydrogen-bonding contact points for tat (311). Furthermore, the involvement in tat binding of phosphate residues opposite the bulge has been demonstrated, in addition to that of the phosphate residues 5' to the bulge; this was not evident in earlier studies (313).

[Figure 3: secondary-structure diagram of TAR RNA from the 5' cap (GpppG, +1) to +62, with the bulge (+23/+25) and loop (+30/+35) labeled.]

FIG. 3. Structure of trans-activating region (TAR) RNA. The secondary structure of TAR RNA extending from +1 to +62 is shown, and the positions of the bulge between nucleotides +23 and +25 and of the loop region between nucleotides +30 and +35 in the HIV-1 LTR are indicated.
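A stem-loop with a bulge, as drawn in Fig. 3, can be written compactly in dot-bracket notation (a convention common in RNA structure software, not used in this review), where matched parentheses are base pairs and dots are unpaired nucleotides. The sketch below uses a hypothetical toy hairpin, not the HIV-1 TAR sequence, with a lower stem, a 3-nucleotide bulge on the 5' strand, an upper stem, and a 6-nucleotide loop. It pairs the brackets with a stack, classifies each pair as Watson-Crick, G·U wobble, or mismatch, and reports the unpaired runs that correspond to the bulge and loop.

```python
# Hypothetical toy hairpin (NOT the HIV-1 TAR sequence).
SEQ = "GGUAUCUGAGCCUGGGAGCUCUGCC"
DB = "((((...((((......))))))))"

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def parse_pairs(db: str) -> dict[int, int]:
    """Map each paired position (0-based) to its partner using a stack."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError(f"unmatched '(' at {stack[-1]}")
    return pairs


def classify(seq: str, db: str) -> list[tuple[int, int, str]]:
    """Return (i, j, kind) for each pair with i < j; kind is 'WC', 'wobble' or 'mismatch'."""
    out = []
    for i, j in sorted(parse_pairs(db).items()):
        if i < j:
            pair = (seq[i], seq[j])
            kind = "WC" if pair in WATSON_CRICK else "wobble" if pair in WOBBLE else "mismatch"
            out.append((i, j, kind))
    return out


def unpaired_runs(db: str) -> list[tuple[int, int]]:
    """Return 0-based inclusive (start, end) runs of '.', i.e. bulges and loops."""
    runs, start = [], None
    for i, ch in enumerate(db + "#"):  # sentinel closes a trailing run
        if ch == "." and start is None:
            start = i
        elif ch != "." and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs


if __name__ == "__main__":
    for i, j, kind in classify(SEQ, DB):
        print(f"{SEQ[i]}{i + 1}-{SEQ[j]}{j + 1}: {kind}")
    print("unpaired runs (1-based):", [(s + 1, e + 1) for s, e in unpaired_runs(DB)])
```

In the toy hairpin the single G·U pair in the lower stem is reported as a wobble, and the two unpaired runs are the bulge and the loop; real TAR coordinates would have to be supplied from a sequence source.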

C. Mechanistic Studies

HIV-1 encodes several regulatory proteins essential for viral gene expression. One of these, tat, acts positively to increase HIV-1 gene expression in a highly specific manner (213). The necessity of this factor for viral growth has been demonstrated with proviral constructs (314). tat can increase expression from the HIV-1 promoter in a wide variety of human and other primate cell lines of various lineages, in primary as well as transformed cells. Recent studies show that tat can induce elevated expression from certain cellular (315-317) and viral (43, 44, 318-321) class-II RNA polymerase promoters, and can also alter the activity of the class-III RNA polymerase basal transcription factor TFIIIC (322). This latter result is particularly intriguing in that the HIV-1 promoter contains two sequences near the start of transcription, between +11 and +21 and between +47 and +57, with limited homology to the consensus B-box found in Pol-III promoters (323, 324). These two regions are contained within the sequences shown to contribute to IST activity. Whether TFIIIC, which binds to the B-box in Pol-III promoters, plays a role in IST activity has not been investigated, although IST activity is retained when transcription is directed by a Pol-III promoter. If such a role for TFIIIC were demonstrated, it would be important to determine whether these transcripts are directed by RNA Pol-II from the HIV-1 promoter. The recent report that TFIIIC relieves chromatin-mediated repression of U6 snRNA transcription (325) suggests a potential biological role for IST activity, and for TFIIIC in particular, in the maintenance of a short-transcript population, which may regulate HIV-1 latency. The primary effect of tat appears to be an increase in steady-state HIV-1 mRNA levels, although translational effects have been noted (214, 326).
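The claim that the two B-box-like sequences lie within the IST control regions is a small piece of coordinate arithmetic that is easy to get wrong. The sketch below assumes the usual promoter numbering in which +1 is the transcription start site and there is no position 0, so -1 lies immediately upstream of +1; under that convention the region -5 to +26 spans 31 nucleotides, not the 32 that a naive end - start + 1 would give. The intervals are the ones stated in the text; the helper names are hypothetical.

```python
# ASSUMED convention: +1 is the transcription start site and there is no
# position 0, so -1 is immediately 5' of +1.


def to_index(pos: int) -> int:
    """Map a promoter coordinate (..., -2, -1, +1, +2, ...) onto a gap-free integer axis."""
    if pos == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return pos if pos < 0 else pos - 1


def length(start: int, end: int) -> int:
    """Number of nucleotides in the inclusive interval start..end."""
    return to_index(end) - to_index(start) + 1


def contains(outer: tuple[int, int], inner: tuple[int, int]) -> bool:
    """True if the inclusive interval `inner` lies entirely inside `outer`."""
    return to_index(outer[0]) <= to_index(inner[0]) and to_index(inner[1]) <= to_index(outer[1])


IST_MAIN = (-5, +26)                     # main IST control elements (from the text)
IST_EXTRA = (+40, +59)                   # additional IST contribution (from the text)
B_BOX_LIKE = [(+11, +21), (+47, +57)]    # B-box-like sequences (from the text)

if __name__ == "__main__":
    print("IST main region:", length(*IST_MAIN), "nt")
    print("IST extra region:", length(*IST_EXTRA), "nt")
    for box in B_BOX_LIKE:
        host = "main" if contains(IST_MAIN, box) else "extra" if contains(IST_EXTRA, box) else "no"
        print(f"B-box-like {box}: {length(*box)} nt, inside the {host} IST region")
```

Mapping every coordinate onto a gap-free axis first keeps the length and containment checks correct for intervals that straddle the start site.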
The transcriptional effect is due to two major mechanisms: an increase in transcriptional initiation, and an increase in transcriptional elongation that does not require de nmo cellular protein synthesis (327).The relative importance of each of these mechanisms, however, may depend on the experimental system (12, 328). One particularly elegant mutagenesis experiment revealed the necessity of nascent TAR mRNA formation for tat activity in the isolated HIV-1 promoter (329). This is also supported by microinjection studies of mammalian cell lines that showed a failure of tat-activation for full-length messages containing TAR (265). The requirement for an RNA Pol-I1 initiation and elongation complex mediating this response is seen in experiments using a nuclear-targeted version of the bacteriophage "7 RNA polymerase.

HIV-1 GENE EXPRESSION

This prokaryotic RNA polymerase does not generate tat-responsive transcription complexes from HIV-1 constructs in which the T7 promoter replaces the RNA Pol-II initiation signals. Earlier proposals for the mechanism of tat-activation invoked an antitermination pathway. These proposals were based on studies of short- versus long-transcript populations of HIV-1 in the absence or presence of tat (330). A recent kinetic analysis of tat-activation used TAR-ribozyme constructs (331). It suggests that polymerase pausing at sites distal to the region extending from TAR to +80 plays no role in the activation process. Rather, the rate-limiting step appears to be the interaction of tat with TAR RNA and the promoter elements. Several other studies suggest that the major action of tat may be to change the elongation capability of the HIV-1 transcriptional complex, rather than to mediate antitermination (11, 332-334). Two types of elongation complexes have been described (333). Recent experiments have also demonstrated two types of transcriptional complexes formed on the HIV-1 promoter, with differing susceptibility to tat (173). Together, these findings suggest that tat may act on only a subset of the initiation complexes formed on this promoter. The identification of the IST element may also alter the interpretation of the earlier antitermination study. This promoter tends to produce substantial tat-insensitive transcription within the first 60 or so nucleotides. Heterologous fusions of tat to other RNA-binding proteins (218, 335, 336) or to DNA-binding proteins (113, 337) can trans-activate a modified HIV-1 promoter. Such a promoter carries the recognition sequence for the heterologous binding protein at the RNA or DNA level, respectively. These studies indicate that activation of the HIV-1 promoter can occur from downstream as well as upstream binding sites.
The conclusions that these studies allow about the mechanism of tat action should not be overstated, however. They are perhaps best limited to the existence of activation domains within tat. The effector function of these domains can operate when the chimeric tat molecules bind in close proximity to the initiation/elongation complex.

D. Associated Cellular Factors

In light of the results of studies on several other viral trans-activators, it is important to identify which cellular factors, if any, associate with tat. In one approach, tat was used to screen an expression library for cellular factors that interact specifically with tat (338). Subsequently, a second, closely related factor was identified by hybridization screening (339). Additional factors, from yeast as well as humans, have been identified on the basis of homology to these earlier factors (340-342). Furthermore, a tat-binding cellular protein has been identified using tat-peptide-affinity chromatography (343). The precise role of these factors in HIV-1 gene expression remains open to further experimentation. However, transfection and microinjection studies suggest that some of these factors affect HIV-1 gene expression. One study noted down-regulation of HIV-1 promoter activity shortly after tat induction that was sensitive to protein synthesis inhibitors (279). Two explanations remain possible: de novo synthesis of a labile repressor induced by tat, or direct modification or complexing of a labile factor by tat. Which of these accounts for the down-regulation remains to be determined. Findings on the various aspects of HIV-1 transcriptional control are now converging. This calls for a careful assessment of how each of these cellular factors contributes to basal and to tat-induced HIV-1 gene expression.

JOSEPH A. GARCIA AND RICHARD B. GAYNOR

VII. Interventional Strategies

A. Pharmacological

An interesting application of the knowledge gleaned from the cytokine studies has been the development of experimental systems for studying potential pharmacological modulators of this cytokine network (30). These modulators include steroid hormones, antioxidants (N-acetyl-L-cysteine, ascorbate) (344), pentoxifylline (345), and retinoids (100). In vitro studies of these compounds, however, have given mixed results. The pathways linking cytokines and these agents must therefore be better understood before therapy with them can be realized. Perhaps more importantly, several in vitro studies suggest that certain pharmacological agents widely used in treatment may actually induce HIV-1 gene expression in latently infected cells (346, 347). Screening pharmacological databases for inhibitors of HIV-1 transcription has identified an antagonist of tat function (348). However, this compound did not block induction of HIV-1 gene expression by phorbol esters or TNF, which may limit its effectiveness. Other pharmacological agents have since been identified that may block transcription from the HIV-1 promoter more effectively, including an antagonist of platelet-activating factor (345, 349, 350). A soluble form of the TNF-α receptor can limit TNF-mediated activation of the HIV-1 promoter in tissue culture. This may provide a biologically based alternative to traditional pharmacological therapy for interfering with cytokine activation of HIV-1 (351).

B. Antisense Oligonucleotides

Antisense oligonucleotides can inhibit HIV-1 gene expression. Several of these oligonucleotides were targeted to various sequence elements in the HIV-1 LTR. These targets included both upstream (352) and downstream transcriptional control motifs (353-356), as well as the tat gene (353, 354, 357, 358). A recent analysis of various HIV-1 isolates identified a highly conserved 21-base sequence that encompasses the polyadenylylation signal region. Structural analysis of this sequence may provide a rationale for designing an antisense molecule that interacts specifically with this region, as an effective adjunct to current therapies (359). Finally, chemical modifications of the oligonucleotide backbone are not the only option. Substances have also been synthesized that can be coupled directly to these oligonucleotides (360, 361). Coupling scission-inducing moieties to antisense oligonucleotides may, in the future, provide sequence-specific therapeutic reagents against both unintegrated and integrated HIV-1 molecular targets (362).

C. Antisense RNA

An alternative to oligonucleotide-mediated antisense inhibition is the use of antisense RNA molecules. In this approach, heterologous promoters can be used to ensure constant production of the antisense molecules, which may be important for gene therapy. This approach has been taken with an adeno-associated virus antisense vector designed to target all HIV-1-encoded transcripts (363). Again, targets for antisense RNA therapy at the level of transcriptional control include TAR as well as tat (364-367).

D. Ribozymes

Several investigators have applied ribozyme technology to HIV-1 gene expression. The ribozymes tested to date include both the “hammerhead” (89, 90, 368-370) and the “hairpin” types (371), which may allow inhibition under a variety of physiological conditions. Transcriptional control-related targets for these ribozymes have included LTR sequences (369, 371) as well as HIV-1 mRNAs encoding tat (370). Combined with antisense technology, this approach may provide a significant basis for gene therapy aimed at HIV-1-encoded RNA species.

E. TAR RNA Decoys

The critical role of the HIV-1-encoded TAR mRNA in tat-activation provides a rationale for an alternative approach to inhibiting HIV-1 gene expression. Expression constructs containing multimerized TAR regions inhibit HIV-1 gene expression when the TAR multimers are expressed as RNA, but not as DNA (373, 374). They are also inhibitory when expressed as a leader sequence for a heterologous transcript (372). This inhibition was hypothesized to result from competition of these multimerized decoy molecules for tat binding. However, cellular proteins that interact with the TAR mRNA region have also been identified. Further experiments with multimers of mutant TAR mRNA are therefore needed to determine whether the decoys sequester tat or these cellular factors. Subsequent investigations found that a wild-type loop sequence is required for TAR decoy function (375). One of the loop mutations does not substantially affect the binding of one loop-binding cellular factor to TAR RNA (7). However, there are several other candidate loop-binding factors whose binding specificity has been more clearly defined (230, 231). Their affinity for the loop-mutant TAR RNA may be significantly altered.

F. tat Inhibitors

Structural analysis of tat may provide a more refined rationale for developing alternative pharmacological reagents that inhibit HIV-1 gene expression. Two drugs that inhibit HIV-1 expression are D-penicillamine and 2,3-dimercaptopropanol (DMP) (376, 377). Both are heavy-metal chelators. They may therefore decrease HIV-1 gene expression by complexing divalent metal ions that tat may require for proper function.

G. trans-Dominant Inhibitors

One major goal for HIV-1 investigators is to identify trans-dominant negative molecules that could form the basis of a gene-therapy approach for HIV-1-infected patients. These may include viral as well as cellular analogs of existing factors. Several cell-line studies have identified mutant forms of tat that confer a substantial degree of trans-inhibition. These mutants were delivered either as exogenous peptides or as plasmid-encoded proteins. The mutations conferring trans-dominant inhibition lie in the conserved region for the peptides (378) and in the basic region for full-length tat (379, 380). The mechanism of this trans-inhibition is unknown. Possibilities include sequestration by mutant tat of active tat molecules or of rate-limiting cellular factors. Both possibilities are partially supported by nuclear localization studies. They are also supported by the demonstration of trans-inhibition using chimeric tat-bacteriophage R17 coat protein constructs (379, 381). A multitude of heterologous viral trans-activators can affect HIV-1 gene expression. Their effects appear to be mediated through several different cis-elements in the HIV-1 promoter. Thus, applying trans-dominant inhibiting forms of several different heterologous viral trans-activators may interfere synergistically with HIV-1 gene expression. trans-Dominant negative viral mutants have been identified for the adenovirus E1A protein (382). Efforts to expand the list of heterologous viral trans-dominant inhibitors may prove extremely useful for gene therapy in HIV-1-infected patients. Finally, trans-dominant negative analogs of cellular DNA-binding proteins may prove useful. This would require that specificity for the HIV-1 promoter be defined in a larger context than the individual recognition elements to which these factors bind. A truncated version of a prokaryotic repressor with binding specificity for the enhancer region has been shown to inhibit HIV-1 gene expression in vitro (383). In addition, naturally occurring repressors exist in the rel family. Improving on these initial findings in both sensitivity and effectiveness will be a formidable but tremendously exciting challenge for the future.

VIII. Glossary

LTR: long terminal repeat
TAR: trans-activator-responsive element
IFN: interferon
TNF: tumor necrosis factor
GM-CSF: granulocyte-macrophage colony-stimulating factor
TGF: transforming growth factor
IL: interleukin
tat: trans-activator of transcription
GRE: glucocorticoid response element
COUP-TF: chicken ovalbumin upstream promoter-transcription factor
IST: initiator of short transcripts
AP1: activator protein 1
TdT: terminal deoxynucleotidyl transferase
RGD: arginine-glycine-aspartate
TCRα: T-cell receptor α
NRE: negative regulatory element
hnRNA: heterogeneous nuclear RNA

ACKNOWLEDGMENTS

We thank Brian E. Finley for the editing, figures, and preparation of this manuscript.

REFERENCES

1. M. Alizon, P. Sonigo, F. Barré-Sinoussi, J. C. Chermann, P. Tiollais, L. Montagnier and S. Wain-Hobson, Nature 312, 757 (1984).
2. B. H. Hahn, G. M. Shaw, S. K. Arya, M. Popovic, R. C. Gallo and F. Wong-Staal, Nature 312, 166 (1984).

3. P. A. Luciw, S. J. Potter, K. Steimer, D. Dina and J. A. Levy, Nature 312, 760 (1984).
4. G. Pantaleo, C. Graziosi, J. F. Demarest, L. Butini, M. Montroni, C. H. Fox, J. M. Orenstein, D. P. Kotler and A. S. Fauci, Nature 362, 355 (1993).
5. J. Embretson, M. Zupancic, J. L. Ribas, A. Burke, P. Racz, K. Tenner-Racz and A. T. Haase, Nature 362, 359 (1993).
6. T. Okamoto and F. Wong-Staal, Cell 47, 29 (1986).
7. R. A. Marciniak, M. A. Garcia-Blanco and P. A. Sharp, PNAS 87, 3624 (1990).
8. M. G. Toohey and K. A. Jones, Genes Dev. 3, 265 (1989).
9. E. Bengal and Y. Aloni, J. Virol. 65, 4910 (1991).
10. Y. Li, J. Ross, J. A. Scheppler and B. R. Franza, Jr., MCBiol 11, 1883 (1991).
11. H. Kato, H. Sumimoto, P. Pognonec, C.-H. Chen, C. A. Rosen and R. G. Roeder, Genes Dev. 6, 655 (1992).
12. C. A. Bohan, F. Kashanchi, B. Ensoli, L. Buonaguro, L. K. Boris and J. N. Brady, Gene Expression 2, 391 (1992).
13. M. A. Graeble, M. J. Churcher, A. D. Lowe, M. J. Gait and J. Karn, personal communication with J. Karn (1993).
14. J. A. Garcia, F. K. Wu, R. Mitsuyasu and R. B. Gaynor, EMBO J. 6, 3761 (1987).
15. F. Demarchi, P. D’Agaro, A. Falaschi and M. Giacca, J. Virol. 66, 2514 (1992).
16. G. Englund, M. D. Hogan, T. S. Theodore and M. A. Martin, Virology 181, 150 (1991).
17. R. Gartenhaus, F. Michaels, L. Hall, R. C. Gallo and M. S. Reitz, Jr., AIDS Res. Hum. Retroviruses 7, 681 (1991).
18. I. Hirsch, B. Spire, Y. Tsunetsugu-Yokota, C. Neuveut, J. Sire and J.-C. Chermann, Virology 177, 759 (1990).
19. S. E. C. Koken, J. L. B. van Wamel, J. Goudsmit, B. Berkhout and J. L. M. C. Geelen, Virology 191, 968 (1992).
20. C. Parrott, T. Seidner, E. Duh, J. Leonard, T. S. Theodore, A. Buckler-White, M. A. Martin and A. B. Rabson, J. Virol. 65, 1414 (1991).
21. E. K. Ross, A. J. Buckler-White, A. B. Rabson, G. Englund and M. A. Martin, J. Virol. 65, 4350 (1991).
22. J. Leonard, C. Parrott, A. J. Buckler-White, W. Turner, E. K. Ross, M. A. Martin and A. B. Rabson, J. Virol. 63, 4919 (1989).
23. D. Harrich, J. Garcia, R. Mitsuyasu and R. Gaynor, EMBO J. 9, 4417 (1990).
24. J. Y. H. Kim, F. Gonzalez-Scarano, S. L. Zeichner and J. C. Alwine, J. Virol. 67, 1658 (1993).
25. H. W. Sheppard and M. S. Ascher, Annu. Rev. Microbiol. 46, 533 (1992).
26. A. Miyajima, T. Hara and T. Kitamura, TIBS 17, 378 (1992).
27. A. Miyajima, T. Kitamura, N. Harada, T. Yokota and K.-I. Arai, Annu. Rev. Immunol. 10, 295 (1992).
28. Y. Koyanagi, W. A. O’Brien, J. Q. Zhao, D. W. Golde, J. C. Gasson and I. S. Chen, Science 241, 1673 (1988).
29. T. Matsuyama, N. Kobayashi and N. Yamamoto, AIDS 5, 1405 (1991).
30. G. Poli and A. Fauci, AIDS Res. Hum. Retroviruses 8, 191 (1992).
31. H. E. Gendelman, W. Phelps, L. Feigenbaum, J. M. Ostrove, A. Adachi, P. M. Howley, G. Khoury, H. S. Ginsberg and M. A. Martin, PNAS 83, 9759 (1986).
32. J. M. Ostrove, J. Leonard, K. E. Weck, A. B. Rabson and H. E. Gendelman, J. Virol. 61, 3726 (1987).
33. J. D. Mosca, D. P. Bednarik, N. B. K. Raj, C. A. Rosen, J. G. Sodroski, W. A. Haseltine and P. M. Pitha, Nature 325, 67 (1987).
34. R. F. Rando, P. E. Pellett, P. A. Luciw, C. A. Bohan and A. Srinivasan, Oncogene 1, 13 (1987).

35. J. D. Mosca, D. P. Bednarik, N. B. K. Raj, C. A. Rosen, J. G. Sodroski, W. A. Haseltine, G. S. Hayward and P. M. Pitha, PNAS 84, 7408 (1987).
36. C. J. Chapman, J. D. Harris, M. K. L. Collins and D. S. Latchman, AIDS 5, 945 (1991).
37. D. M. Margolis, A. B. Rabson, S. E. Straus and J. M. Ostrove, Virology 186, 788 (1992).
38. M. G. Davis, S. C. Kenney, J. Kamine, J. S. Pagano and E.-S. Huang, PNAS 84, 8642 (1987).
39. E. Elfassi, S. Michelson, F. Bachelerie, F. Arenzana-Seisdedos and J. L. Virelizier, Ann. Inst. Pasteur/Virol. 138, 461 (1987).
40. H. Duclos, E. Elfassi, S. Michelson, F. Arenzana-Seisdedos, A. Munier and J.-L. Virelizier, AIDS Res. Hum. Retroviruses 5, 217 (1989).
41. D. M. Markovitz, S. Kenney, J. Kamine, M. S. Smith, M. Davis, E.-S. Huang, C. Rosen and J. S. Pagano, Virology 173, 750 (1989).
42. P. A. Barry, E. Pratt-Lowe, B. M. Peterlin and P. A. Luciw, J. Virol. 64, 2932 (1990).
43. W.-Z. Ho, J. M. Harouse, R. F. Rando, E. Gonczol, A. Srinivasan and S. A. Plotkin, J. Gen. Virol. 71, 97 (1990).
44. W.-Z. Ho, L. Song and S. D. Douglas, J. AIDS 4, 1098 (1991).
45. B. J. Biegalke and A. P. Geballe, Virology 183, 381 (1991).
46. P. Ghazal, J. Young, E. Giulietti, C. DeMattei, J. Garcia, R. Gaynor, R. M. Stenberg and J. A. Nelson, J. Virol. 65, 6735 (1991).
47. S. Walker, C. Hagemeier, J. G. P. Sissons and J. H. Sinclair, J. Virol. 66, 1543 (1992).
48. S. Kenney, J. Kamine, D. Markovitz, R. Fenrick and J. Pagano, PNAS 85, 1652 (1988).
49. E. B. Quinlivan, E. Holley-Guthrie, E.-C. Mar, M. S. Smith and S. Kenney, J. Virol. 64, 1817 (1990).
50. R. T. Horvat, C. Wood and N. Balachandran, J. Virol. 63, 970 (1989).
51. R. T. Horvat, C. Wood, S. F. Josephs and N. Balachandran, J. Virol. 65, 2895 (1991).
52. D. Di Luca, P. Secchiero, P. Bovenzi, A. Rotola, A. Caputo, P. Monini and E. Cassai, AIDS 5, 1095 (1991).
53. Y. Geng, B. Chandran, S. F. Josephs and C. Wood, J. Virol. 66, 1564 (1992).
54. R. Yuan, C. Bohan, F. C. H. Shim, R. Robinson, H. J. Kaplan and A. Srinivasan, Virology 172, 92 (1989).
55. H. Tada, J. Rappaport, M. Lashgari, S. Amini, F. Wong-Staal and K. Khalili, PNAS 87, 3479 (1990).
56. S. K. Arya, AIDS Res. Hum. Retroviruses 4, 175 (1988).
57. K. A. Stellrecht, K. Sperber and B. G.-T. Pogo, J. Virol. 66, 2051 (1992).
58. B. A. Antoni, A. B. Rabson, I. L. Miller, J. P. Trempe, N. Chejanovsky and B. J. Carter, J. Virol. 65, 396 (1991).
59. E. Mendelson, Z. Grossman, F. Mileguir, G. Rechavi and B. J. Carter, Virology 187, 453 (1992).
60. A. P. Rice and M. B. Mathews, PNAS 85, 4200 (1988).
61. G. J. Nabel, S. A. Rice, D. M. Knipe and D. Baltimore, Science 239, 1299 (1988).
62. A. Bielinska, S. Krasnow and G. J. Nabel, J. Virol. 63, 4097 (1989).
63. S. Kliewer, J. Garcia, L. Pearson, E. Soultanakis, A. Dasgupta and R. Gaynor, J. Virol. 63, 4616 (1989).
64. J.-S. Twu, C. A. Rosen, W. A. Haseltine and W. S. Robinson, J. Virol. 63, 2857 (1989).
65. J.-S. Twu, J. Y. Wu and W. S. Robinson, Virology 177, 406 (1990).
66. A. Siddiqui, R. Gaynor, A. Srinivasan, J. Mapoles and R. W. Farr, Virology 169, 479 (1989).
67. E. Seto, T. S. B. Yen, B. M. Peterlin and J.-H. Ou, PNAS 85, 8286 (1988).
68. E. Bohnlein, J. Lowenthal, M. Siekevitz, D. W. Ballard, B. R. Franza and W. C. Greene, Cell 53, 827 (1988).

69. O. J. Semmes and K.-T. Jeang, J. Virol. 66, 7183 (1992).
70. M. Siekevitz, S. F. Josephs, M. Dukovich, N. Peffer, F. Wong-Staal and W. C. Greene, Science 238, 1575 (1987).
71. M. R. Smith and W. C. Greene, Genes Dev. 4, 1875 (1990).
72. K. Zimmerman, M. Dobrovnik, C. Ballaun, D. Bevec, J. Hauber and E. Bohnlein, Virology 182, 874 (1991).
73. A. Keller, K. M. Partin, M. Lochelt, H. Bannert, R. M. Flugel and B. R. Cullen, J. Virol. 65, 2589 (1991).
74. L. K. Venkatesh, P. A. Theodorakis and G. Chinnadurai, NARes 19, 3661 (1991).
75. A. H. Lee, K. J. Lee, S. Kim and Y. C. Sung, J. Virol. 66, 3236 (1992).
76. A. Keller, E. D. Garrett and B. R. Cullen, J. Virol. 66, 3946 (1992).
77. A. S. Kekule, U. Lauer, L. Weiss, B. Luber and P. H. Hofschneider, Nature 361, 742 (1993).
78. M. Meyer, W. H. Caselmann, V. Schluter, R. Schreck, P. H. Hofschneider and P. A. Baeuerle, EMBO J. 11, 2991 (1992).
79. R. F. Rando, A. Srinivasan, J. Feingold, E. Gonczol and S. Plotkin, Virology 176, 87 (1990).
80. C. V. Paya, J.-L. Virelizier and S. Michelson, J. Virol. 65, 5477 (1991).
81. C. Hagemeier, S. Walker, R. Caswell, T. Kouzarides and J. Sinclair, J. Virol. 66, 4452 (1992).
82. T. F. Kowalik, B. Wing, J. S. Haskill, J. C. Azizkhan, A. S. Baldwin and E.-S. Huang, PNAS 90, 1107 (1993).
83. J. M. Gimble, E. Duh, J. M. Ostrove, H. E. Gendelman, E. E. Max and A. B. Rabson, J. Virol. 62, 4104 (1988).
84. J. Vlach and P. M. Pitha, Virology 187, 63 (1992).
85. D. M. Margolis, J. M. Ostrove and S. E. Straus, Virology 192, 370 (1993).
86. B. Ensoli, P. Lusso, F. Schachter, S. F. Josephs, J. Rappaport, F. Negro, R. C. Gallo and F. Wong-Staal, EMBO J. 8, 3019 (1989).
87. E. Seto, D.-X. Zhou, B. M. Peterlin and T. S. B. Yen, Virology 173, 764 (1989).
88. P. A. Barry, E. Pratt-Lowe, R. E. Unger and P. A. Luciw, J. Virol. 65, 1392 (1991).
89. C.-P. Feng, M. Kulka and L. Aurelian, Virology 192, 491 (1993).
90. I. Edery, R. Petryshyn and N. Sonenberg, Cell 56, 303 (1989).
91. K. A. Clouse, P. B. Robbins, B. Fernie, J. M. Ostrove and A. S. Fauci, J. Immunol. 142, 470 (1989).
92. R. T. Horvat and C. Wood, J. Immunol. 132, 2745 (1989).
93. P. Bressler, G. Pantaleo, A. Demaria and A. S. Fauci, J. Immunol. 147, 2290 (1991).
94. S. E. Tong-Starksen, P. A. Luciw and B. M. Peterlin, J. Immunol. 142, 702 (1989).
95. M. A. Nokta and R. B. Pollard, AIDS Res. Hum. Retroviruses 8, 1255 (1992).
96. C. T. Baldari, G. Macchia, A. Massone and J. L. Telford, FEBS Lett. 304, 261 (1992).
97. M. I. H. Chowdhury, Y. Koyanagi, S. Kobayashi, Y. Hamamoto, H. Yoshiyama, T. Yoshida and N. Yamamoto, Virology 176, 1226 (1990).
98. T. M. Folks, J. Justement, A. Kinter, S. Schnittman, J. Orenstein, G. Poli and A. S. Fauci, J. Immunol. 140, 1117 (1988).
99. S. Harada, Y. Koyanagi, H. Nakashima, N. Kobayashi and N. Yamamoto, Virology 154, 249 (1986).
100. J. A. Turpin, M. Vargo and M. S. Meltzer, J. Immunol. 148, 2539 (1992).
101. S. L. Zeichner, G. Hirka, P. W. Andrews and J. C. Alwine, J. Virol. 66, 2268 (1992).
102. S. K. Stanley, P. B. Bressler, G. Poli and A. S. Fauci, J. Immunol. 145, 1120 (1990).
103. S. K. Stanley, T. M. Folks and A. S. Fauci, AIDS Res. Hum. Retroviruses 5, 375 (1989).

104. M. R. Sadaie, E. Tschachler, K. Valerie, M. Rosenberg, B. K. Felber, G. N. Pavlakis, M. E. Klotman and F. Wong-Staal, New Biologist 2, 479 (1990).
105. B. Stein, H. J. Rahmsdorf, A. Steffen, M. Litfin and P. Herrlich, MCBiol 9, 5169 (1989).
106. K. Valerie and M. Rosenberg, New Biologist 2, 712 (1990).
107. B. Stein, M. Kramer, H. J. Rahmsdorf, H. Ponta and P. Herrlich, J. Virol. 63, 4540 (1989).
108. H. Hug and T. F. Sarre, BJ 291, 329 (1993).
109. P. Cohen, TIBS 17, 408 (1992).
110. C. A. Rosen, J. G. Sodroski and W. A. Haseltine, Cell 41, 813 (1985).
111. S. L. Zeichner, J. Y. H. Kim and J. C. Alwine, J. Virol. 65, 2436 (1991).
112. M. A. Muesing, D. H. Smith and D. J. Capon, Cell 48, 691 (1987).
113. B. Berkhout, A. Gatignol, A. B. Rabson and K.-T. Jeang, Cell 62, 757 (1990).
114. M. West, J. Mikovits, G. Princler, Y.-L. Liu, F. W. Ruscetti, H.-F. Kung and Raziuddin, JBC 267, 24948 (1992).
115. R. Patarca, J. Schwartz, R. P. Singh, Q.-T. Kong, E. Murphy, Y. Anderson, F.-Y. Wei Sheng, P. Singh, K. A. Johnson, S. M. Guarnagia, T. Durfee, F. Blattner and H. Cantor, PNAS 85, 2733 (1988).
116. P. Dasgupta, P. Saikumar, C. D. Reddy and E. P. Reddy, PNAS 87, 8090 (1990).
117. K. Yamamoto, S. Mori, K. Ohmoto and Y. Kyogoku, NARes 19, 6107 (1991).
118. B. Guy, R. B. Acres, M. P. Kieny and J.-P. Lecocq, J. AIDS 3, 797 (1990).
119. B. R. Franza, Jr., F. J. Rauscher III, S. F. Josephs and T. Curran, Science 239, 1150 (1988).
120. J. N. Miner and K. R. Yamamoto, TIBS 16, 423 (1991).
121. D. Ghosh, J. Virol. 66, 586 (1992).
122. P. A. Furth, H. Westphal and L. Hennighausen, AIDS Res. Hum. Retroviruses 6, 553 (1990).
123. V. Kolesnitchenko and R. S. Snart, AIDS Res. Hum. Retroviruses 8, 1977 (1992).
124. J. Laurence, M. B. Sellers and S. K. Sikder, Blood 74, 291 (1989).
125. A. J. Cooney, S. Y. Tsai, B. W. O’Malley and M.-J. Tsai, J. Virol. 65, 2853 (1991).
126. K. Orchard, G. Lang, M. Collins and D. Latchman, NARes 20, 5429 (1992).
127. K. Orchard, N. Perkins, C. Chapman, J. Harris, V. Emery, G. Goodwin, D. Latchman and M. Collins, J. Virol. 64, 3234 (1990).
128. A. S. Bourinbaiar, R. Nagorny and X. Tan, FEBS Lett. 302, 206 (1992).
129. K. Orchard, G. Lang, J. Harris, M. Collins and D. Latchman, J. AIDS 6, 440 (1993).
130. M. Leid, P. Kastner and P. Chambon, TIBS 17, 427 (1992).
131. S. A. Kliewer, K. Umesono, R. A. Heyman, D. J. Mangelsdorf, J. A. Dyck and R. M. Evans, PNAS 89, 1448 (1992).
132. K. Damm, R. A. Heyman, K. Umesono and R. M. Evans, PNAS 90, 2989 (1993).
133. D. M. Markovitz, M. C. Hannibal, M. J. Smith, R. Cossman and G. J. Nabel, J. Virol. 66, 3961 (1992).
134. C. Li, C. Lai, D. S. Sigman and R. B. Gaynor, PNAS 88, 7739 (1991).
135. C. Li, A. J. Lusis, R. Sparkes, A. Nirula and R. Gaynor, Genomics 13, 665 (1992).
136. C. Li, A. J. Lusis, R. Sparkes, S.-M. Tran and R. Gaynor, Genomics 13, 658 (1992).
137. K. Macleod, D. Leprince and D. Stehelin, TIBS 17, 251 (1992).
138. J.-P. Shaw, P. J. Utz, D. B. Durand, J. J. Toole, E. A. Emmel and G. R. Crabtree, Science 241, 202 (1988).
139. W. M. Flanagan, B. Corthesy, R. J. Bram and G. R. Crabtree, Nature 352, 803 (1991).
140. A. Karpas, M. Lowdell, S. K. Jacobson and F. Hill, PNAS 89, 8351 (1992).
141. P. G. McCaffrey, J. Jain, C. Jamieson, R. Sen and A. Rao, JBC 267, 1864 (1992).
142. Y.-M. Ning and E. R. Sanchez, JBC 268, 6073 (1993).

143. M. Giacca, M. I. Gutierrez, S. Menzo, F. D’Adda di Fagagna and A. Falaschi, Virology 186, 133 (1992).
144. T. Maekawa, T. Sudo, M. Kurimoto and S. Ishii, NARes 19, 4689 (1991).
145. M. R. Smith and W. C. Greene, PNAS 86, 8526 (1989).
146. I. Calvert, Z. Q. Peng, H. F. Kung and Raziuddin, Gene 101, 171 (1991).
147. M. L. Waterman and K. A. Jones, New Biologist 2, 621 (1990).
148. C. A. Bohan, R. A. Robinson, P. A. Luciw and A. Srinivasan, Virology 172, 573 (1989).
149. A. Rao, Crit. Rev. Immunol. 10, 495 (1991).
150. G. Nabel and D. Baltimore, Nature 326, 711 (1987).
151. J. D. Kaufman, G. Valandra, G. Roderiquez, G. Bushar, C. Giri and M. A. Norcross, MCBiol 7, 3759 (1987).
152. S. E. Tong-Starksen, P. A. Luciw and B. M. Peterlin, PNAS 84, 6845 (1987).
153. H. Lubon, P. Ghazal, J. A. Nelson and L. Hennighausen, AIDS Res. Hum. Retroviruses 4, 381 (1988).
154. H. Dinter, R. Chiu, M. Imagawa, M. Karin and K. A. Jones, EMBO J. 6, 4067 (1987).
155. P. A. Baeuerle, BBA 1072, 63 (1991).
156. V. Bours, G. Franzoso, V. Azarenko, S. Park, T. Kanno, K. Brown and U. Siebenlist, Cell 72, 729 (1993).
157. C. Muchardt, J.-S. Seeler, A. Nirula, D.-L. Shurland and R. B. Gaynor, J. Virol. 66, 244 (1992).
158. S. Doerre, P. Sista, S.-C. Sun, D. W. Ballard and W. C. Greene, PNAS 90, 1023 (1993).
159. C. S. Duckett, N. D. Perkins, T. F. Kowalik, R. M. Schmid, E.-S. Huang, A. S. Baldwin, Jr., and G. J. Nabel, MCBiol 13, 1315 (1993).
160. K. Kawakami, C. Scheidereit and R. G. Roeder, PNAS 85, 4700 (1988).
161. T. Maekawa, F. Itoh, T. Okamoto, M. Kurimoto, F. Imamoto and S. Ishii, JBC 264, 2826 (1989).
162. M. Kretzschmar, M. Meisterernst, C. Scheidereit, G. Li and R. G. Roeder, Genes Dev. 6, 761 (1992).
163. T. Maekawa, H. Sakura, T. Sudo and S. Ishii, JBC 264, 14591 (1989).
164. F. K. Wu, J. A. Garcia, D. Harrich and R. B. Gaynor, EMBO J. 7, 2117 (1988).
165. C.-M. Fan and T. Maniatis, Genes Dev. 4, 29 (1990).
166. A. Radler-Pohl, I. Pfeuffer, M. Karin and E. Serfling, New Biologist 2, 566 (1990).
167. N. Nomura, M.-J. Zhao, T. Nagase, T. Maekawa, R. Ishizaki, S. Tabata and S. Ishii, JBC 266, 8590 (1991).
168. W. Phares, B. R. Franza, Jr., and W. Herr, J. Virol. 66, 7490 (1992).
169. S.-I. Kurata, T. Wakabayashi, Y. Ito, N. Mi-, R. Ueno, T. Marunouchi and N. Kurata, FEBS Lett. 321, 201 (1993).
170. A. S. Baldwin, K. P. LeClair, H. Singh and P. A. Sharp, MCBiol 10, 1406 (1990).
171. K. A. Jones, J. T. Kadonaga, P. A. Luciw and R. Tjian, Science 232, 755 (1986).
172. D. Harrich, J. Garcia, F. Wu, R. Mitsuyasu, J. Gonzalez and R. Gaynor, J. Virol. 63, 2585 (1989).
173. X. Lu, T. M. Welsh and B. M. Peterlin, J. Virol. 67, 1752 (1993).
174. G. A. Elder, Z. Liang, C. Li and R. A. Lazzarini, NARes 20, 6281 (1992).
175. J. Kamine, T. Subramanian and G. Chinnadurai, PNAS 88, 8510 (1991).
176. J. Kamine and G. Chinnadurai, J. Virol. 66, 3932 (1992).
177. N. J. Proudfoot, B. A. Lee and J. Monks, New Biologist 4, 369 (1992).
178. B. Berkhout and K.-T. Jeang, J. Virol. 66, 139 (1992).
179. M. K. Singh and C. D. Pauza, Virology 188, 451 (1992).
180. M. Bonfanti, M. Broggini, C. Prontera and M. D’Incalci, NARes 19, 5739 (1991).

181. H. Imataka, K. Sogawa, K. Yasumoto, Y. Kikuchi, K. Sasano, A. Kobayashi, M. Hayami and Y. Fujii-Kuriyama, EMBO J. 11, 3663 (1992).
182. J. J. Pyrc, K. H. Moberg and D. J. Hall, Biochemistry 31, 4102 (1992).
183. G. Hagen, S. Muller, M. Beato and G. Suske, NARes 20, 5519 (1992).
184. C. Kingsley and A. Winoto, MCBiol 12, 4251 (1992).
185. K. Sogawa, H. Imataka, Y. Yamasaki, H. Kusume, H. Abe and Y. Fujii-Kuriyama, NARes 21, 1527 (1993).
186. J. A. Garcia, D. Harrich, E. Soultanakis, F. Wu, R. Mitsuyasu and R. B. Gaynor, EMBO J. 8, 765 (1989).
187. H. S. Olsen and C. A. Rosen, J. Virol. 66, 5594 (1992).
188. E. I. Golub, G. Li and D. J. Volsky, AIDS 5, 663 (1991).
189. P. Han, R. Brown and J. Barsoum, NARes 19, 7225 (1991).
190. L. Zawel and D. Reinberg, This Series 44, 67 (1993).
191. H. Kato, M. Horikoshi and R. G. Roeder, Science 251, 1476 (1991).
192. J. A. Garcia, S.-H. I. Ou, F. Wu, A. J. Lusis, R. S. Sparkes and R. B. Gaynor, PNAS 89, 9372 (1992).
193. M. Sakaguchi, B. Zenzie-Gregory, J. E. Groopman, S. T. Smale and S. Kim, J. Virol. 65, 5448 (1991).
194. W. Chen and K. Struhl, PNAS 85, 2691 (1988).
195. I. C. A. Taylor and R. E. Kingston, MCBiol 10, 165 (1990).
196. F. C. Wefald, B. H. Devlin and R. S. Williams, Nature 344, 260 (1990).
197. M. Korneyeva, P. Stalhandske and B. Asjo, J. AIDS 6, 231 (1993).
198. D. E. Crone, H.-S. Kim and S. R. Spindler, JBC 265, 10851 (1990).
199. D. Desmarais, M. Filion, L. Lapointe and A. Royal, EMBO J. 11, 2971 (1992).
200. T. C. Fong and B. M. Emerson, Genes Dev. 6, 521 (1992).
201. A. McCormick, H. Brady, J. Fukushima and M. Karin, Genes Dev. 5, 1490 (1991).
202. Y. Zhang, K. Doyle and M. Bina, J. Virol. 66, 5631 (1992).
203. C. Murre, P. S. McCaw, H. Vaessin, M. Caudy, L. Y. Jan, Y. N. Jan, C. V. Cabrera, J. N. Buskin, S. D. Hauschka, A. B. Lassar, H. Weintraub and D. Baltimore, Cell 58, 537 (1989).
204. E. S. Klein, D. M. Simmons, L. W. Swanson and M. G. Rosenfeld, Genes Dev. 7, 55 (1993).
205. B. Zenzie-Gregory and S. T. Smale, personal communication with S. T. Smale (1993).
206. K. A. Jones, P. A. Luciw and N. Duchange, Genes Dev. 2, 1101 (1988).
207. A. L. Roy, M. Meisterernst, P. Pognonec and R. G. Roeder, Nature 354, 245 (1991).
208. H. Du, A. L. Roy and R. G. Roeder, EMBO J. 12, 501 (1993).
209. M. Schorpp, P. L. Sheridan and K. A. Jones, personal communication with K. A. Jones (1993).
210. R. Ratnasabapathy, M. Sheldon, L. Johal and N. Hernandez, Genes Dev. 4, 2061 (1990).
211. M. Sheldon, R. Ratnasabapathy and N. Hernandez, MCBiol 13, 1251 (1993).
212. K. Pfeifer, M. Bachmann, H. C. Schroder, B. E. Weiler, D. Ugarkovic, T. Okamoto and W. E. G. Muller, JBC 266, 14620 (1991).
213. S. K. Arya, C. Guo, S. F. Josephs and F. Wong-Staal, Science 229, 69 (1985).
214. C. M. Wright, B. K. Felber, H. Paskalis and G. N. Pavlakis, Science 234, 988 (1986).
215. J. Hauber and B. R. Cullen, J. Virol. 63, 673 (1988).
216. A. Jakobovits, D. H. Smith, E. B. Jakobovits and D. J. Capon, MCBiol 8, 2555 (1988).
217. S. Feng and E. C. Holland, Nature 334, 165 (1988).
218. M. J. Selby and B. M. Peterlin, Cell 62, 769 (1990).
219. S. Roy, N. T. Parkin, C. Rosen, J. Itovitch and N. Sonenberg, J. Virol. 64, 1402 (1990).

192

JOSEPH A. GARCIA AND RICHARD B. GAYNOR


HIV-1 GENE EXPRESSION



Processing of Eukaryotic Ribosomal RNA

DUANE C. EICHLER*,¹ AND NESSLY CRAIG†

* Department of Biochemistry and Molecular Biology, University of South Florida College of Medicine, Tampa, Florida 33612
† Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, Maryland 21228

I. Processing Sites and Processing Pathways
   A. Processing Sites
   B. The Mechanism of Ribosomal-RNA Processing
      1. Small Nucleolar RNAs
      2. Proteins Required for rRNA Processing
      3. Signals Required for rRNA Processing
      4. Processing Complexes
II. The Relationship between Ribosomal-RNA Processing and Posttranscriptional Modifications
   A. Ribose Methylation
   B. Pseudouridylation and Base Methylation
III. Summary
References

Transcription units encoding eukaryotic ribosomal RNA (rRNA) are located at chromosomal sites termed nucleolar organizers that give rise to suborganelles called nucleoli (1-7). Nucleoli are not only the sites for the organization of rRNA genes, but are also sites for the synthesis, modification, processing, and assembly of rRNA into ribosomes (2, 3, 5, 6). To accommodate the demand for the large number of ribosomes involved in cellular translation, rRNA transcription units, together with their flanking sequences, are tandemly repeated 50-1000 times in the chromosomes of most eukaryotes (4). Each repeated transcription unit encodes a primary transcript that contains the sequences for the mature rRNAs (18S, 5.8S, and 28S rRNAs) plus additional transcribed spacer sequences (1-7). The small subunit (40S) of the eukaryotic ribosome contains the 18S rRNA, while the large subunit (60S) contains the 28S and 5.8S rRNAs. The 5S rRNA is also a component of the large subunit, but is synthesized independently of the other rRNAs. The sequences within the repeated rRNA transcription unit have the general arrangement 5’-external transcribed spacer (5’-ETS), 18S rRNA, internal transcribed spacer 1 (ITS1), 5.8S rRNA, internal transcribed spacer 2 (ITS2), 28S rRNA, 3’-external transcribed spacer (3’-ETS) (1-7; see Fig. 1).² Transcription units are separated by nontranscribed spacer (NTS) sequences, which may play a role in regulating transcriptional initiation and termination by RNA polymerase I (8). Transcription of rRNA in eukaryotes yields a primary transcript of 35-47S (1-7). The nascent precursor rRNA is modified and cleaved into pre-rRNA intermediates during the early stages of processing (9, 10). Newly synthesized ribosomal proteins are transported into the nucleolus and associate with the precursor rRNA intermediates and other accessory proteins to form ribonucleoprotein particles (1-7). These ribonucleoprotein particles mature into ribosomal subunits, which are nearly complete before being transported out of the nucleus (2, 3, 5-10). Because the ribosome is the operational structure for translation, a normally dividing cell may need as many as 1 million copies to meet its translational demands. To meet these requirements, production of new ribosomes is directed by the availability of newly synthesized rRNA (8) as well as by the post-transcriptional processing of the newly synthesized transcripts. Thus, the rate of ribosome formation is precisely regulated and coordinated with the rate of cell growth and with the state of cellular differentiation and development.

¹ To whom correspondence may be addressed.

Progress in Nucleic Acid Research and Molecular Biology, Vol. 49
Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.
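To make the linear arrangement of the transcription unit concrete, the short Python sketch below lays out the mouse unit as consecutive regions. It is illustrative only and not part of the original chapter. The region sizes are the mouse values given in Table I, and the sketch assumes that the regions abut with no gaps and that +1 is the transcription initiation site. The function names are hypothetical.

```python
"""Illustrative sketch: region coordinates in the mouse rRNA transcription unit.

Region sizes (nucleotides) are the mouse values from Table I, listed in the
5'-ETS -> 3'-ETS order described in the text. Assumptions: the regions abut
with no gaps, and position +1 is the transcription initiation site.
"""

MOUSE_REGIONS = [
    ("5'-ETS", 4006),
    ("18S", 1871),
    ("ITS1", 999),
    ("5.8S", 157),
    ("ITS2", 1089),
    ("28S", 4712),
    ("3'-ETS", 565),
]


def region_coordinates(regions):
    """Map each region name to its (start, end) position, 1-based and inclusive."""
    coords = {}
    start = 1
    for name, length in regions:
        end = start + length - 1
        coords[name] = (start, end)
        start = end + 1
    return coords


def region_at(coords, position):
    """Return the name of the region that contains a given transcript position."""
    for name, (start, end) in coords.items():
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} lies outside the transcription unit")


coords = region_coordinates(MOUSE_REGIONS)
total = sum(length for _, length in MOUSE_REGIONS)

if __name__ == "__main__":
    for name, (start, end) in coords.items():
        print(f"{name:7s} {start:6d}-{end:6d}")
    print("total length:", total)
    print("site #0 (+650) lies in:", region_at(coords, 650))
```

With these sizes, the 3’ end of the 28S sequence falls at position 12,834 and the unit ends at 13,399, which is the mouse total in Table I. The final 565 nucleotides form the 3’-ETS that RNA polymerase I transcribes before it terminates. Position +650 (site #0) lies well inside the 4006-nucleotide 5’-ETS.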
The importance of post-transcriptional events in the regulation of ribosome production is supported by studies showing that increased synthesis of ribosomes is often due to more efficient use of precursor rRNA rather than to increased synthesis of the rRNA precursor. Cells with shorter generation times, such as HeLa cells and regenerating rat liver, process their rRNA more rapidly than cells with longer generation times (11-14). Slower-growing cells may even contain a fraction of precursor rRNA that is degraded rather than processed to the mature rRNAs (11, 15). This process of “wastage” is also exemplified when synthesis of ribosomal proteins is inhibited by heat shock in cultured Drosophila cells. Under these conditions, the synthesis of precursor rRNA continues, but the newly synthesized transcripts are degraded (16).

² Abbreviations used: ETS, external transcribed spacer; ITS, internal transcribed spacer; NTS, nontranscribed spacer; snoRNA, small nucleolar RNA; snoRNP, small nucleolar ribonucleoprotein particle; TMG, trimethylguanosine.


RNA PROCESSING

Various features of nucleolar function, rRNA synthesis, and rRNA maturation have been covered in other review articles (1-6, 17-22). This essay focuses specifically on aspects of eukaryotic rRNA processing, with emphasis on the molecular details of processing.

I. Processing Sites and Processing Pathways

The transcription unit that encodes the rRNA precursor appears to be conserved in all eukaryotes in terms of the arrangement of the mature rRNA sequences and the transcribed spacer elements. In general, the sequences of the mature rRNAs are highly conserved, but their lengths can vary somewhat among different eukaryotes as a result of the evolutionary insertion of “expansion regions” (20, 23). Thus, mouse and human 18S and 28S rRNAs are larger than their yeast homologues. In contrast, the transcribed spacer regions are not well conserved in either sequence or size (see Table I), and these variations have raised the question of whether processing itself is evolutionarily conserved. Nevertheless, research on eukaryotic rRNA processing has, for the most part, been based on the assumption that pre-rRNA is processed in all eukaryotes in an analogous and conserved manner. The sequence of processing events en route to the formation of functional rRNA species has been studied extensively for many years (1-7, 9, 10, 24-29). In general, the 5’- and 3’-termini of the mature 18S, 5.8S, and 28S rRNA species are produced in a somewhat polar fashion, from the 5’ to the 3’ end of the nascent transcript. However, the order of cleavages and the intermediates generated during rRNA processing may vary among eukaryotic species, as well as with growth and development in the same species. Processing pathways may even vary within the same cell. For example, multiple pre-rRNA processing pathways have been reported to function in rat liver (13), Xenopus oocytes (30, 31), and human HeLa cells (29, 32). The relationship of these varied pathways to ribosome biogenesis, however, is unclear. Nevertheless, in eukaryotes there are at least six sites that correspond to the mature ends of the 18S, 5.8S, and 28S rRNAs, and each site must be cleaved to release the mature rRNA species from the precursor.
In addition, all of the transcribed spacer regions must be degraded. Figures 1 and 2 illustrate rRNA transcription units, the main cleavage sites, and the processing pathways that have been reported for the mouse (a “typical” vertebrate) and for yeast. In the following discussion, “cleavage” refers to a particular site, but it is important to emphasize that in many cases it has not been definitively established whether the “cleavage” is due directly to an endonucleolytic cleavage or whether it actually results from the trimming action of an exonuclease that initiated its attack from some other endonucleolytic site. This ambiguity arises in part because the steady-state level of any rRNA intermediate is determined by the relative rates of the cleavage and trimming reactions at each processing site. For example, the relative amount of the 5’-ETS 24S intermediate is usually quite low because it is quickly degraded, whereas the 32S intermediate (5.8S-ITS2-28S) is very abundant, presumably because the ITS2 cleavages are kinetically slower (24, 32, 33).

TABLE I
SIZE OF PRECURSOR rRNA REGIONS (NUCLEOTIDES)ᵃ

Species       5’-ETS  18S rRNA        ITS1   5.8S rRNA    ITS2   28S rRNA            3’-ETS  Total
Yeast         696     1801 (22, 14)ᵇ  362    158 (2, 2)ᵇ  234    3393 (43, 32)ᵇ      210     6854
Tetrahymena   649     1752            130    154          177    3760 (precursor)ᶜ   14      6636
                                                                 3343 (mature)
Neurospora    860     1795            184    157          146    --                  --      --
Xenopus       712     1826 (38, 44)   557    162 (2, 2)   262    4115 (69, 52)       235     7869
Mouse         4006    1871 (45, 36)   999    157 (2, 2)   1089   4712 (70, 57)       565     13,399
Human         3658    1869 (45, 36)   1095   157 (2, 2)   1155   5035 (45, 70)       352     13,321

ᵃ References: yeast, Saccharomyces cerevisiae (39); Tetrahymena thermophila (171); Neurospora crassa (172-174); Xenopus laevis (175); mouse (176); and human (177).
ᵇ Number of methylated nucleotides, pseudouridines; …% of the methylations are on the ribose group.
ᶜ Includes the unique intervening sequence, which self-splices out during rRNA maturation (42), and the four-nucleotide “hidden break deletion” in the 28S rRNA (45).

[Figure 1 (mouse). Panel labels: ITS1, ITS2; “Cut at #0, #6”; 47S precursor; 32S (major); 28S.]

FIG. 1. (A) A representation of the transcription unit encoding mouse precursor rRNA. Labels above the figure are positioned to refer to the sequences within the rRNA transcription unit. Numbers below the figure indicate positions of processing sites relative to the sequences within the rRNA transcription unit. “Init.” refers to the transcription initiation site, and “Term.” refers to the termination site for transcription. (B) The processing pathway for mouse (vertebrate) precursor rRNA. Cleavage sites are indicated by numbers and refer to positions indicated in (A). Numbers ending with “S” indicate the relative sedimentation coefficient (size) of the various intermediates and products of the processing pathway.
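The statement that an intermediate's steady-state level reflects the relative rates of the reactions that form and consume it can be illustrated with a minimal Python sketch of first-order kinetics. This is a generic textbook model and does not describe any measured pathway. The species names and rate constants are hypothetical, each species is assumed to be consumed by a single first-order step, and transcription is assumed to supply the first species at a constant rate. At steady state, every step of such a linear chain carries the same flux, so each level equals the synthesis rate divided by that species' consumption rate constant. A forward-Euler simulation is included to check the analytical result.

```python
"""Illustrative model: steady-state levels in a linear processing chain.

Hypothetical assumptions (not taken from the text): transcription supplies
the first species at a constant rate, and each species is removed by one
first-order step (cleavage, trimming or degradation) whose product feeds
the next species. At steady state every step carries the same flux, so
level_i = synthesis_rate / k_i.
"""

SYNTHESIS_RATE = 1.0  # arbitrary units of precursor made per unit time

# Hypothetical first-order rate constants (per unit time), in chain order.
RATE_CONSTANTS = {
    "precursor": 2.0,
    "intermediate_fast": 5.0,  # consumed quickly
    "intermediate_slow": 0.2,  # consumed slowly (a kinetically slow step)
}


def steady_state_levels(synthesis_rate, rate_constants):
    """Analytical steady state of the chain: level_i = synthesis_rate / k_i."""
    return {name: synthesis_rate / k for name, k in rate_constants.items()}


def simulate(synthesis_rate, rate_constants, dt=0.001, t_end=50.0):
    """Integrate the chain with forward Euler, starting with no RNA present."""
    levels = {name: 0.0 for name in rate_constants}
    for _ in range(int(t_end / dt)):
        inflow = synthesis_rate
        for name, k in rate_constants.items():
            outflow = k * levels[name]  # uses the level before this step
            levels[name] += (inflow - outflow) * dt
            inflow = outflow  # this step's product feeds the next species
    return levels


if __name__ == "__main__":
    expected = steady_state_levels(SYNTHESIS_RATE, RATE_CONSTANTS)
    simulated = simulate(SYNTHESIS_RATE, RATE_CONSTANTS)
    for name in RATE_CONSTANTS:
        print(f"{name:18s} analytic={expected[name]:.3f} simulated={simulated[name]:.3f}")
```

With these illustrative numbers, the slowly consumed intermediate settles at 25 times the level of the rapidly consumed one, even though the same flux passes through both in the model. This is the qualitative pattern described above for an abundant, slowly cleaved intermediate and a scarce, quickly removed one.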

[Figure 2 (yeast). Panel labels: ITS1, ITS2; “Cut at A0, A1”; 35S precursor; 32S.]

FIG. 2. (A) A representation of the transcription unit encoding yeast precursor rRNA. Labels above the figure are positioned to refer to the sequences within the rRNA transcription unit. Numbers below the figure indicate positions of processing sites relative to the sequences within the rRNA transcription unit. “Init.” refers to the transcription initiation site, and “Term.” refers to the termination site for transcription. (B) The processing pathway for yeast precursor rRNA. Cleavage sites are indicated by numbers and refer to positions indicated in (A). Numbers ending with “S” indicate the relative sedimentation coefficient (size) of the various intermediates and products of the processing pathway.

In mammals, the initial pre-rRNA (47S) is rapidly cleaved at site #0 (nucleotide +650 in the mouse and +414 in humans; see Fig. 1) and then (or coordinately) cleaved at site #6 at the 3’ end of the 28S rRNA sequence (34). It is believed that these two early cleavages release the easily detectable 45S pre-rRNA, which was initially thought to be the primary rRNA transcript (35). Since RNA polymerase I does not terminate until some 565 nucleotides downstream of the 3’ end of the 28S rRNA sequence, there appears to be a very rapid (concomitant) processing of the region up to +565 beyond the 3’ end of 28S, followed by cleavage at the 3’ end of the 28S rRNA sequence (36, 37). The next cleavage may occur at site #1 (the 5’ end of 18S) or at site #2b (ITS1), giving rise to two different processing pathways. These two alternative pathways have been described in various cell types, and in the same cell under different growth conditions (7, 29, 30, 32). In some Xenopus oocytes, cleavage at site #2b is observed, resulting in one processing pathway, while in other oocytes cleavage at both sites (#1 and #2b) has been observed, giving rise to two processing pathways (30, 31, 38). In addition, after microinjection of precursor rRNA into oocyte nuclei (which might produce a stress response), some of the oocytes that usually process by one pathway switched to the other (31). The splitting of the 18S rRNA sequence from the 5.8S-ITS2-28S rRNA sequences generally occurs next, as a result of cleavage in ITS1. This cleavage is then followed by a cleavage in ITS2, which separates the 5.8S and 28S rRNA sequences. Subsequent trimming of these intermediates generates the mature 5.8S and 28S rRNAs.

In yeast, the processing scheme is basically similar (see Fig. 2B). However, because pre-rRNA processing is more rapid than in mammalian cells, sites A1 (5’-18S) and A2 (ITS1) appear to be cleaved simultaneously; thus, “different” pathways are not detectable under normal wild-type growth conditions (21, 39-41).

In a few organisms, additional processing cleavages can occur. For example, in the single-celled protozoan Tetrahymena, a group I (“self-splicing”) intron present in the 27S rRNA sequence is spliced out during processing (42). This occurs through a self-splicing mechanism that has been extensively reviewed (42) and is not considered further in this essay, because it is found in the cytoplasmic rRNAs of only a few species of Tetrahymena and is essentially absent from the cytoplasmic rRNA of all other eukaryotes.
Another rRNA-processing event that is found in a limited number of organisms (Tetrahymena, some insects such as Drosophila and Sciara, and some annelids and mollusks) is the production of “hidden breaks” in the 28S rRNA molecule, which result from the removal of a small segment of the rRNA (e.g., 19 nucleotides in Sciara) (43). This “fragmentation” of rRNA is quite extreme in Euglena (another single-celled protozoan) and in some trypanosomes (44). It has been suggested that this fragmentation reflects an ancestral rRNA gene that was composed of many different rRNA genes, and that during evolution these genes combined into one transcription unit, with the removed sequences becoming “internal transcribed spacers” (ITSs) (23, 44). On the other hand, it has also been suggested that these extra sequences represent later additions to the rRNA genome. This argument has not yet been resolved (45), but interestingly, all of these extra segments are located within the “expansion” or “variable” regions of mature rRNA sequences and are dispensable, or even mutable, in some cases (46-49). In Tetrahymena, Drosophila, and Sciara, the small “hidden break” segments are each at the tip of a loop of a helix-loop structure in the same expansion segment, D7A (43). As discussed below (Section I,B,3), a loop and its associated secondary structure may represent a “processing signal.”

The fact that different processing pathways have been documented suggests that the mechanisms or requirements underlying the various cleavages may not be exactly the same, and that these differences may account for the alternative pathways. For example, inhibition of protein synthesis with cycloheximide abolishes the processing that releases the 5’ end of 18S rRNA, as well as the formation of the 41S pre-rRNA (29). However, processing within ITS1 and production of the 32S pre-rRNA remain unaffected. This observation is consistent with earlier findings that cycloheximide rapidly blocks the formation of 18S rRNA, while the production of 28S and 5.8S rRNAs continues essentially unchanged (50-52). Whether the components of a processing complex vary with different processing sites is still unknown (see Section I,B,1).

A. Processing Sites

Cleavage sites and processing pathway(s) have been studied extensively for over 25 years, and the resulting information has often been reviewed (1-7, 9). Typically, these experiments have been limited to the analysis of nuclear material from only a few eukaryotic species, such as yeast, Tetrahymena, frog, rat, mouse, and human. The certainty and accuracy of this processing information have often been restricted by the techniques used to resolve and distinguish processing intermediates. As stated earlier, only those intermediates whose steady-state levels are high enough to be detected can be distinguished by the techniques typically used. As a result, it is quite possible that some true intermediates, whose steady-state levels in the nucleus are below the detection limit of these techniques, remain unrecognized. This concern may be especially relevant for data taken from the older literature, in which the limited sensitivity and resolution of the analytical procedures introduce even greater uncertainty about the actual sizes of the detected intermediates. While some RNA-processing events, such as those involved in mRNA splicing, must be precise to preserve a translational reading frame, it is not clear whether rRNA processing requires the same level of precision, especially since limited sequence heterogeneity (a few nucleotides) has been reported for the ends of some rRNA species (e.g., 5.8S and 28S) in functioning ribosomes (33). The last point is especially relevant to the question of whether any of the processing cleavages are unique and occur at the precise ends of the mature rRNA molecules. A summary of the defined “processing sites” common to all or most eukaryotic species follows.
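For orientation, the vertebrate site numbers used in the headings of this section and the yeast site names given alongside them can be collected into a small lookup table. The Python sketch below is only a bookkeeping aid assembled from those headings (sites #0 through #6); the dictionary and function names are hypothetical and do not come from any cited study.

```python
# Hypothetical bookkeeping aid: vertebrate rRNA-processing site labels mapped to
# the yeast site names given in the section headings (sites #0 through #6).
VERTEBRATE_TO_YEAST = {
    "#0": "A0",   # early cleavage in the 5'-ETS
    "#1": "A1",   # 5' end of 18S rRNA
    "#2": "D",    # 3' end of 18S rRNA
    "#2b": "A2",  # within ITS1
    "#3": "B1",   # 5' end of 5.8S rRNA
    "#4b": "C2",  # within ITS2
    "#5": "C1",   # 5' end of 28S rRNA
    "#6": "B2",   # 3' end of 28S rRNA (T1 in Xenopus)
}

# Reverse mapping, built once from the forward table.
YEAST_TO_VERTEBRATE = {yeast: vert for vert, yeast in VERTEBRATE_TO_YEAST.items()}


def to_yeast(site: str) -> str:
    """Return the yeast name for a vertebrate site label such as '#2b'."""
    return VERTEBRATE_TO_YEAST[site]


def to_vertebrate(site: str) -> str:
    """Return the vertebrate label for a yeast site name such as 'C1'."""
    return YEAST_TO_VERTEBRATE[site]


if __name__ == "__main__":
    for vert, yeast in VERTEBRATE_TO_YEAST.items():
        print(f"vertebrate {vert:>4}  <->  yeast {yeast}")
```

A plain dictionary is used because the correspondence is a fixed one-to-one table; the reverse dictionary simply avoids scanning the forward table on each lookup.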

1. SITE #0 IN THE 5’-ETS (SITE A0 IN YEAST)

In both mouse and human cells, early processing in the 5’-ETS (at site #0) results from a group of cleavages about seven nucleotides apart (the main cleavages correspond to +650 and +657 in the mouse) (53). Since the upstream fragment resulting from this cleavage has not been detected, either in vivo or in cell-free processing experiments, these results were taken to suggest that an endonuclease catalyzes this cleavage (35, 53, 54). In cell-free experiments, the +650 and +657 cleavages appear with the same kinetics, consistent with the view that each cleavage is distinct and does not result from trimming from the +650 site to the +657 site (53). Cleavage at the early 5’-ETS site (site #0) has been found in many different eukaryotes, suggesting that processing at this site is not unique to mammals (53, 55). Nevertheless, it has been argued that cleavage at this early 5’-ETS processing site (#0 or A0) may not occur in all eukaryotic species, and that the rRNA intermediates attributed to this putative cleavage are an interpretive artifact of the S1-nuclease protection and/or reverse-transcriptase analyses used to detect processing: the reverse-transcriptase and S1-nuclease protection bands could represent structural stops and/or the ends of degradation products rather than processing sites (56, 57). For example, processing at the early 5’-ETS site is easily observed in a cell-free transcription/processing system from Tetrahymena, but it is not as apparent when pre-rRNA isolated from nuclei in vivo is analyzed (58). In addition, processing at the early 5’-ETS site can be relatively low in some situations. For example, in Xenopus oocytes, one group of investigators detected low levels of pre-rRNA processed at this site (59), whereas another group was unable to detect any processing at this site (60).

2. SITE #1, THE 5’ END OF 18S rRNA (SITE A1 IN YEAST)

Early experiments using S1-nuclease protection analysis of mouse nuclear rRNA demonstrated that the 5’ end of pre-rRNA intermediates containing 18S sequences corresponded exactly to the 5’ end of the mature cytoplasmic 18S rRNA (33). In contrast, the 24S 5’-ETS (ETS1) fragment immediately upstream of the 18S 5’ end was found to have a heterogeneous 3’ end (seven different fragment ends). The largest of these 24S 5’-ETS fragments, however, had a 3’ end that abutted the mature 5’ end of the 18S rRNA (33). The simplest explanation for these findings is that an endonucleolytic cleavage released the mature 5’ end of the 18S rRNA sequence, and that this cleavage was followed by 3’-end trimming of the freed upstream 24S ETS fragment, which was ultimately degraded. Cell-free experiments using human nucleolar extracts and pre-rRNA transcribed from truncated human minigenes found an initial cleavage three and eight nucleotides upstream of the mature 5’ end of the 18S rRNA sequence, and that subsequent trimming to the mature 5’ end occurred when a cytoplasmic extract was added (61). Differences in cleavage patterns between mouse and human may result from sequence variations in this region of the precursor rRNA: human pre-rRNA has three repeats immediately upstream of the mature 5’ end of the 18S rRNA sequence that are not conserved in mouse pre-rRNA.

3. SITE #2, THE 3’ END OF 18S rRNA (SITE D IN YEAST)

S1-nuclease protection experiments demonstrated that the 3’ ends of processed intermediates containing mouse 18S rRNA sequences are identical to the 3’ ends of cytoplasmic 18S rRNAs, and that these 3’ ends are “relatively homogeneous” (33). However, no intermediates containing the 5’ end of the ITS1 region, which would be released by an endonucleolytic cut at the 3’ end of the 18S rRNA, were found. Therefore, the nature of this cleavage has not been precisely determined. It remains unclear whether the 3’ end of the 18S rRNA sequence results from a precise endonucleolytic cut, or whether an endonucleolytic cut in ITS1 (site #2b in the mouse, site A2 in yeast) is followed by trimming (3’→5’ exonuclease) to the mature 3’ end of 18S rRNA. In this regard, an investigation of the possible involvement of a highly purified nucleolar endoribonuclease demonstrated that this enzyme cleaves naked precursor rRNA in vitro at two sites, which mimic in vivo cleavages (62). The predominant cleavage by the nucleolar endoribonuclease corresponded directly to the mature 3’ end of the 18S rRNA sequence, and the other was 55 nucleotides downstream of the 3’ end of 18S rRNA, in the ITS1 sequence. A yeast mutation (XRN1, also termed RAR5, KEM1, DST2, and SEP1) that inactivated a specific 5’→3’ exonuclease resulted in the accumulation of an ITS1 fragment of a size and location consistent with a precise endonucleolytic cleavage at the 3’ end of the 18S rRNA sequence, followed by rapid degradation of the downstream ITS1 fragment by the 5’→3’ exonuclease (63). In another yeast mutant (rrp2) of the gene encoding the RNA component of the MRP endonuclease, the production of 18S rRNA was significantly slowed, and intermediates uncleaved at this site (35S and 24S) accumulated, suggesting that this cleavage was directly or indirectly affected by the action of this endonuclease (64-66).

4. SITE #2b IN ITS1 (SITE A2 IN YEAST)

The major cleavage separating the 18S-rRNA-containing intermediate from the 5.8S- and 28S-rRNA-containing intermediate appears to occur within ITS1 and is endonucleolytic. This was first shown in yeast @ I ), in Drosophila (67), and in mouse L cells (32). However, some of the earlier experiments that attempted to localize this cleavage are somewhat confusing, since only the more stable intermediates (e.g., those containing nuclear 5.8S rRNA sequences) were analyzed. In mouse cells, these intermediates have ends (a pair separated by five to seven nucleotides) typical of mature cytoplasmic 5.8S rRNA (33). The putative upstream fragments (34S and 20S) in mouse cells have heterogeneous 3’ ends, with some fragments containing sequences that abut the mature 5’ end of 5.8S rRNA (33). Additional confusion arises from the possibility that the separating cleavage can occur at site #2, site #2b, or site #3, depending on the particular cell type and/or growth conditions. In yeast, the mutant allele of the RRP2 gene associated with temperature sensitivity accumulates a 24S (5’-ETS-18S-3’-end ITS1) intermediate, in addition to the 35S rRNA precursor, suggesting that cleavage at site A2 is inhibited (64, 65). RRP2 encodes the RNA component of the MRP endoribonuclease (66). Interestingly, though, depleting the wild-type endonuclease RNA component by means of a conditional GAL promoter did not affect processing at site A2 (processing was “normal”); rather, cleavage at site B1 (the 5’ end of 5.8S) was inhibited (68). The reason(s) for this apparent discrepancy are not clear.

5. SITE #3, THE 5’ END OF 5.8S rRNA (SITE B1 IN YEAST)

Processing at site #3 appears to be more complicated than was initially thought. This degree of complexity may, in fact, be typical of the processing needed to generate each of the mature rRNA ends, and may also reflect a certain flexibility and redundancy in the overall rRNA-processing mechanism. Mature 5.8S rRNA apparently carries two types of 5’ ends, which result from different cleavages. One results from cleavage at a site that yields the major mature 5’ end, while the other cleavage occurs five to seven nucleotides upstream, yielding a 5’ end found in only 10% of mature 5.8S rRNAs. Interestingly, the mutation in the yeast RRP2 gene, which encodes the RNA component of the MRP endonuclease, affects this particular processing step most noticeably. In the mutant, the mature 5.8S rRNA, the 5.8S rRNA with the extra five to seven nucleotides (“5.8S-B” or “5.8S-L” form), and an intermediate with an extra 149 nucleotides (5.8S-B or 5.8S-L) extending up to site A2 are found in the ratio 45:45:10, compared to the wild-type ratio of 90:10:0 (64, 65, 68). Experiments analyzing the cis-acting sequences required for 5.8S rRNA synthesis have identified a previously unrecognized site in ITS1 (tentatively termed A3) that is 76 nucleotides upstream of the 5.8S rRNA sequence. Cleavage at this site is apparently followed by exonucleolytic trimming to the two 5’ ends of 5.8S rRNA, since double mutants for two different 5’→3’ exonucleases [XRN1, also termed RAR5, KEM1, DST2, and SEP1 (63); and RAT1, also called TAP1 and HKE1 (69)] inhibit this trimming (70). In addition, cleavage at site A2 does not require the sequence at the A3 site and is not affected in the xrn1 rat1 double-mutant strains (70). Thus, these results strongly suggest that the 5’ end of 5.8S rRNA is produced by an initial endonucleolytic cut in ITS1, which is then trimmed to one of two ends.

6. SITE #4b WITHIN ITS2 (SITE C2 IN YEAST)

Experiments in yeast, Xenopus, mouse, and human cells clearly indicate that the 3’ ends of processing intermediates containing 5.8S rRNA sequences (e.g., 7S in yeast, 12S in vertebrates) extend into the ITS2 region (33). Presumably, cleavage in ITS2 (site #4b in vertebrates, site C2 in yeast) is endonucleolytic and is followed by rapid exonucleolytic trimming to the 5’ end of the 28S rRNA sequence (site #5 in vertebrates, site C1 in yeast). Alternatively, the mature 5’ end of 28S rRNA could be generated by a second endonucleolytic cleavage. However, no intermediates containing the ITS2 sequence that such a second endonucleolytic cleavage would release have been identified, which favors exonucleolytic trimming (33).

7. SITE #5, THE 5’ END OF 28S rRNA (SITE C1 IN YEAST)

A small fraction of nuclear 28S rRNAs isolated from mouse L cells contains four to six extra nucleotides at the 5’ end (33). This suggests that the 5’ end of 28S rRNA is produced by an endonucleolytic cleavage in ITS2 (site #4b) followed by exonucleolytic trimming that is occasionally imprecise, leaving extra nucleotides. These extra nucleotides might mark kinetic pause sites for a 5’→3’ trimming exonuclease.
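The proposal above (an endonucleolytic cut at site #4b followed by 5’→3’ exonucleolytic trimming that occasionally stalls) can be illustrated with a toy Monte Carlo model. This Python sketch is not taken from the studies cited: the distance from the cut to the mature 5’ end, the positions of the pause sites, and the stall probabilities are all hypothetical, and the exonuclease is simply assumed to stop at the mature 5’ end.

```python
import random
from collections import Counter
from typing import Dict


def trim_5prime(distance_to_mature: int, stall_prob: Dict[int, float],
                rng: random.Random) -> int:
    """Simulate 5'->3' trimming of one molecule after an endonucleolytic cut.

    distance_to_mature: nucleotides between the cut (site #4b) and the mature
        5' end (site #5); a hypothetical value.
    stall_prob: maps "extra nucleotides still present" to the probability that
        the exonuclease stalls and dissociates at that position (hypothetical).
    Returns the number of extra 5' nucleotides left (0 = mature 5' end).
    """
    remaining = distance_to_mature
    while remaining > 0:
        if rng.random() < stall_prob.get(remaining, 0.0):
            return remaining          # enzyme dissociated at a pause site
        remaining -= 1                # remove one 5'-terminal nucleotide
    return 0                          # assumed hard stop at the mature 5' end


def simulate(n: int, distance_to_mature: int, stall_prob: Dict[int, float],
             seed: int = 0) -> Counter:
    """Tally the 5'-end extensions of n independently trimmed molecules."""
    rng = random.Random(seed)
    return Counter(trim_5prime(distance_to_mature, stall_prob, rng)
                   for _ in range(n))


if __name__ == "__main__":
    # Hypothetical pause sites 4-6 nt upstream of the mature end, 2% stall chance each.
    counts = simulate(10_000, distance_to_mature=40,
                      stall_prob={6: 0.02, 5: 0.02, 4: 0.02})
    for extra, n in sorted(counts.items()):
        print(f"{extra} extra nt: {n}")
```

With these made-up parameters, the expected fraction of molecules retaining four to six extra nucleotides is 1 − 0.98³, or about 5.9%, so most simulated products end exactly at the mature 5’ end. The numbers have no quantitative meaning; the sketch only shows how rare stalling at pause sites would produce a minor 5’-extended population.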

8. SITE #6, THE 3’ END OF 28S rRNA (SITE B2 IN YEAST, SITE T1 IN XENOPUS)

Processing that releases the mature 3’ end of the 28S rRNA sequence occurs very early in the temporal processing pathway, either simultaneously with or just after the 5’-ETS cleavage (site #0), to generate a 45S rRNA intermediate (in the mouse) (30, 34, 71, 72). In yeast, a cell-free system to study processing at this site has been developed using in vitro-synthesized rRNA transcripts from cloned minigenes (73). In addition, a mutation (rna8.1) that affects an endonuclease in yeast [required for 3’-end processing of 5S rRNA (74)] leads to accumulation of 28S rRNA intermediates with 10, 15, and 45-50 extra nucleotides at their 3’ ends (71). The results from these two yeast systems have been taken to support the involvement of this endonuclease in releasing the mature 3’ end of 28S rRNA. In vertebrates, however, almost nothing is known about the mechanism of processing at the 3’ end of 28S rRNA, beyond the fact that it occurs.

9. 3’-ETS SITE IN THE MOUSE (SITE “1 IN YEAST, SITE T2 IN XENOPUS)

Recent work has demonstrated that termination by RNA polymerase I occurs several hundred nucleotides downstream of the 3’ end of the 28S rRNA sequence (565 nucleotides in the mouse, 210 nucleotides in yeast) (36, 37, 72, 73, 75). Termination is quickly followed by processing at an upstream site nearer the 3’ end of the 28S rRNA sequence. In the mouse, only 10 nucleotides are initially removed from the terminated transcript in this processing step, leaving ~555 nucleotides 3’ to the mature end of the 28S rRNA sequence (37). In Xenopus, this processing site is 235 nucleotides downstream of the 3’ end of the 28S rRNA sequence (72); in yeast, processing at this site results in intermediates containing 10, 15, and 45-50 nucleotides 3’ to the mature end of the 28S rRNA sequence (71, 75). It has been proposed that 3’-ETS processing may be integral to transcriptional termination, since the relevant processing activity appears to copurify with RNA polymerase I (76).
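The 3’-ETS coordinates quoted above can be checked with simple arithmetic, taking every position as a distance in nucleotides downstream of the mature 28S 3’ end. The Python sketch below (function names hypothetical) recovers the mouse figure of ~555 nucleotides and, assuming the yeast intermediates arise from a transcript terminated 210 nucleotides downstream, gives the range of 3’-ETS sequence removed in yeast.

```python
from typing import Iterable, Tuple


def removed_by_3ets_cleavage(termination_offset: int, remaining_extension: int) -> int:
    """Nucleotides removed from the 3' end of a terminated transcript.

    Both arguments are distances (nt) downstream of the mature 28S 3' end:
    termination_offset is where RNA polymerase I terminates, and
    remaining_extension is the 3' tail left on the processed intermediate.
    """
    if not 0 <= remaining_extension <= termination_offset:
        raise ValueError("extension must lie between the 28S end and the terminator")
    return termination_offset - remaining_extension


def removed_range(termination_offset: int, extensions: Iterable[int]) -> Tuple[int, int]:
    """Smallest and largest segment removed, over a set of observed extensions."""
    removed = [removed_by_3ets_cleavage(termination_offset, e) for e in extensions]
    return min(removed), max(removed)


if __name__ == "__main__":
    # Mouse: termination at +565; removing 10 nt leaves a ~555-nt 3' extension.
    print("mouse extension left:", 565 - 10)
    print("mouse removed:", removed_by_3ets_cleavage(565, 555))
    # Yeast: termination at +210; intermediates keep 10, 15, or 45-50 nt.
    print("yeast removed (min, max):", removed_range(210, [10, 15, 45, 50]))
```

Under that assumption, the yeast 3’-ETS cleavage would remove between 160 and 200 nucleotides, compared with only 10 nucleotides in the mouse.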

B. The Mechanism of Ribosomal-RNA Processing

Our understanding of the mechanism of rRNA processing has come from a variety of experimental approaches, but most early studies relied on the assumption that the nucleolar localization of various RNAs and proteins implied their direct or indirect involvement in ribosome synthesis. It is difficult, however, to establish the actual role of these components from their nucleolar localization alone. For this reason, more recent experimental strategies have been developed to better characterize the function of nucleolar components in rRNA processing. These strategies include (1) the analysis of yeast mutants defective in some aspect of ribosome biogenesis; (2) the development of cell-free rRNA-processing systems using in vitro-synthesized rRNA transcripts and cellular extracts; and (3) the injection of antisense oligonucleotides into Xenopus oocytes to inactivate specific snoRNAs (see below) in order to analyze their effects on rRNA processing. Even with these approaches, however, our understanding of the detailed mechanisms of eukaryotic rRNA processing remains limited.


1. SMALL NUCLEOLAR RNAs

In both yeast and vertebrates, the nucleolus contains a variety of small nucleolar RNAs (snoRNAs) ranging in size from 90 to 600 nucleotides (for recent reviews, see 17, 18). They have been characterized in terms of their 5’-end structure [capped or not capped with trimethylguanosine (TMG)], their association with nucleolar proteins, and, in some cases, their ability to be crosslinked to specific pre-rRNA sequences (Table II). In vertebrates, at least 12 snoRNAs have been characterized, whereas in yeast the number is even greater (77). Some of the yeast snoRNAs are homologous to vertebrate snoRNAs, while others are apparently unique (17). Depletion of U3, U8, or U14 from a mouse in vitro-processing system significantly affects rRNA processing (31, 55; C. A. Enright and B. Sollner-Webb, personal communication). For example, when an antisense oligonucleotide directed against U3 snoRNA is added to an in vitro-processing extract, the U3 snoRNA in that extract is depleted (degraded) by endogenous RNase H. Depleting a mouse processing extract of U3 snoRNA blocked processing at the early 5’-ETS site (site #0) (55). Similarly, depletion of U14 snoRNA from the same mouse processing extract also prevented processing at this early 5’-ETS site (C. A. Enright and B. Sollner-Webb, personal communication). Together, these results suggest that both U3 and U14 snoRNAs are required for normal processing at the 5’-ETS site (site #0) in the mouse. Crosslinking experiments have also been used to establish the involvement of U3 snoRNA in processing.
In both human and mouse cells, psoralen crosslinking experiments show that U3 snoRNA is crosslinked to a sequence in the 5’-ETS several hundred nucleotides downstream of the early 5’-ETS processing site (site #0) (78-80). In addition, when an in vitro-synthesized precursor rRNA sequence containing the early 5’-ETS processing site (+650, site #0) was extended beyond nucleotide +1014, U3 snoRNA was found to associate with a stable processing complex whose assembly depended on this processing site (C. A. Enright and B. Sollner-Webb, personal communication). However, since efficient processing was observed with shorter transcripts containing the early 5’-ETS processing site, these results suggest that U3 snoRNA may interact with the pre-rRNA by more than just base-pairing, that is, as part of a U3-containing ribonucleoprotein complex (53, 81). Similarly, U14 snoRNA hybridizes to 18S rRNA sequences (82). Thus, for U14 snoRNA to play a role in processing at the early 5’-ETS site (site #0), which in the mouse lies over 4000 nucleotides upstream of its contacts with the 18S rRNA sequence, U14 snoRNA must also engage in other types of interactions that affect processing. These interactions may again be mediated through the protein components of the snoRNP and/or through a processing complex, a “processosome” (17).

TABLE II VERTEBRATE SMALL NUCLEOLAR RNAs (snoRNAs)a

Size (nucleotides)

Precipitated by antibodies against

U3

206-228

Fibrillarin

U8

136-140

Fibrillarin

Effect of snoRNA loss; where tested No site #0 (mouse, Xenopus), less #3 (Xenopus) No 28S production (Xenopus)

Crosslinked or hybridized to 5'-ETS

Nature of 5' end; gene location if it is a processed RNA TMG cap TMG cap

U13 U14

105

U15a (X)

148

Fibrillarin Fibrillarin Fibrillarin

U16a

106

Fibrillarin

MRP/7-2

125 260-280

Fibrillarin Th/To autoimmune antigens

E1 (=U17a) U17b E2 E3 U20

207

No fibrillarin or Sm

5'-ETS, 18S

205 154 135 80

No fibrillarin or Sm No fibrillarin or Sm Fibrillarin

28S

87-96

No site #0 (in vitro mouse)

18S

b

b Y

Site B1 yeast; temperature-sensitive allele also affects A0, A1, and A2

18S

18S (complete)

(=CIS-11) U21

28S (complete)

TMG cap No TMG cap; hsc70 intron 5 No TMG cap; r-protein S3 intron 3 No TMG cap; r-protein L1a intron 3 TMG cap Unblocked pppN

No TMG cap; RCC1 introns 1 and 2 No TMG cap No TMG cap No TMG cap; nucleolin intron 11 r-Protein L5 intron 5

Most or all of these snoRNAs are found in small nucleolar ribonucleoprotein particles (snoRNPs). References: See 17, except for U8 (31); U14 (C. A. Enright and B. Sollner-Webb, personal communication); U15 (866); U16 (876); E1/U17-E3 (88, 89, 178-1610); U20/CIS-11 (J.-P. Bachellerie, personal communication; 89); U21 (J.-P. Bachellerie, personal communication).


In yeast, genetic snoRNA “knockout” experiments have been carried out on five snoRNAs (U3, U14, snR10, snR30, and MRP/7-2) in order to understand their involvement in rRNA processing (56, 64, 65, 68, 83). With the exception of the gene encoding snR10, the genes encoding these snoRNAs are essential for growth (17, 18, 77); even the snR10 knockout affected growth (77). The depletion mutants were created by placing synthesis of the snoRNA under the control of a conditional GAL promoter; thus, when cells are deprived of galactose, transcription of the snoRNA is suppressed and its level gradually decreases as a result of dilution and turnover. The conditional depletion phenotype for each snoRNA knockout was similar (56, 83, 84): the 35S rRNA precursor accumulated, production of mature 18S rRNA and its 20S rRNA precursor was inhibited, and 28S and 5.8S rRNA synthesis was unaffected. This phenotype was consistent with inhibition of processing at sites A0, A1, and A2 in yeast. Since a 23S rRNA intermediate (5’-ETS-18S-5’-ITS1) with a 3’ end located between sites A2 and B1 also accumulated, these results suggest that an aberrant cleavage occurred at a “cryptic A2” site, possibly identical to site A3 (see Section I,A,5) (83). In contrast, conditional depletion of the MRP/7-2 gene product in yeast significantly affected processing at site B1 (the 5’ end of 5.8S rRNA); some mutant alleles of this same gene carrying a single base change also affected processing at sites A1 and A2 (64, 65), but others did not (68). These mutant alleles are not themselves temperature-sensitive but induce a temperature-sensitive (ts) growth phenotype, presumably because incorporation of 5.8S rRNA molecules with longer 5’ ends makes the ribosomes partially defective at high temperature.
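The gradual loss of a snoRNA once its GAL-driven transcription is shut off can be approximated, under the simplifying assumptions of exponential cell growth and first-order RNA turnover, by N(t) = N0·exp[−(ln 2/Td + ln 2/t½)·t], where Td is the cell doubling time (dilution) and t½ is the snoRNA half-life (turnover). This model and the parameter values below are illustrative assumptions, not measurements from the cited experiments; the Python sketch computes the fraction remaining and the time needed to reach a chosen level.

```python
import math
from typing import Optional


def decay_rate(doubling_time_h: float, half_life_h: Optional[float]) -> float:
    """Combined first-order loss rate (per hour) from dilution plus turnover.

    half_life_h=None models a completely stable RNA (loss by dilution only).
    """
    dilution = math.log(2) / doubling_time_h
    turnover = 0.0 if half_life_h is None else math.log(2) / half_life_h
    return dilution + turnover


def fraction_remaining(hours: float, doubling_time_h: float,
                       half_life_h: Optional[float]) -> float:
    """Fraction of the initial per-cell snoRNA level left after `hours` without transcription."""
    return math.exp(-decay_rate(doubling_time_h, half_life_h) * hours)


def hours_to_fraction(target: float, doubling_time_h: float,
                      half_life_h: Optional[float]) -> float:
    """Time (h) for the snoRNA level to fall to `target` (0 < target < 1) of its initial value."""
    return -math.log(target) / decay_rate(doubling_time_h, half_life_h)


if __name__ == "__main__":
    # Hypothetical parameters: 2-h doubling time; stable RNA vs. RNA with a 4-h half-life.
    for t_half in (None, 4.0):
        t10 = hours_to_fraction(0.10, doubling_time_h=2.0, half_life_h=t_half)
        print(f"half-life {t_half}: 10% remaining after {t10:.1f} h")
```

With these made-up numbers, a stable snoRNA takes about 6.6 h to fall to 10% of its starting level, versus about 4.4 h if it also turns over with a 4-h half-life, which illustrates why a depletion phenotype of this kind appears only gradually after the shift.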
Overall, these results suggest that there may be a common processing complex containing each of the three snoRNAs (U3, U14, and snR30) involved in the production of mature 18S rRNA in yeast, and presumably in other eukaryotic cells as well. Further support for this proposal comes from the observation that these same snoRNAs can hybridize to the 35S rRNA precursor, although at different locations. For example, yeast U3 snoRNA crosslinks to two different sites in the 5’-ETS (57), whereas U14 snoRNA hybridizes to an internal sequence of 18S rRNA (82). Both U14 and snR30 have also been crosslinked to undetermined sites in the 35S rRNA precursor (17, 83). Interestingly, the site of snoRNA-pre-rRNA interaction for each of these snoRNAs does not correspond to a processing site. Thus, it is unclear how each of these snoRNAs affects processing. Work carried out with Xenopus oocytes shows that the 28S rRNA branch of the processing pathway also requires an intact snoRNA. Injecting antisense oligonucleotides directed against specific regions of U8 snoRNA into oocytes blocked processing at sites #4, #5, and T1, and significantly inhibited processing at site #3 (see Fig. 2B) (31). U8 snoRNA is similar to U3 snoRNA in that it has a 5’-TMG cap and can be immunoprecipitated by anti-fibrillarin antibodies (as can many of the vertebrate snoRNAs studied to date; see Table II). Because depletion of U3, U14, snR10, and snR30 (and of the nucleolar proteins NOP1 and GAR1; see Section I,B,2,b) blocks the production of 18S rRNA, but not of 25S rRNA, in yeast, and because U8 snoRNA depletion in Xenopus affects only 28S rRNA synthesis, and not 18S rRNA synthesis, these results suggest one of two possibilities for the mechanism of rRNA processing: (1) there are at least two different processing complexes that can be distinguished by these knockout experiments; or (2) each branch of the pathway is processed by the same core complex, and the differences observed in these knockout experiments reflect the presence or absence of additional components necessary for each branch. Another recent and interesting finding is that some snoRNAs are processed from introns of pre-mRNAs encoding proteins involved in rRNA processing and ribosome assembly (see Table II). For example, U14 snoRNA is derived from part of intron 5 of the hsc70 gene (85), U15 from intron 1 of the gene encoding ribosomal protein S3 (86), U16a from intron 3 of the gene encoding ribosomal protein L1a (87), E1/U17a from an intron of the RCC1 gene (88), and U20/CIS-11 from intron 11 of the gene encoding nucleolin (89; J. P. Bachellerie, personal communication). Thus, the cellular levels of these snoRNAs may be regulated by the transcription and processing of these pre-mRNAs, which in turn could conceivably influence the rate and/or efficiency of pre-rRNA processing.

2. PROTEINS REQUIRED FOR rRNA PROCESSING

It has been clear for many years that the processing of eukaryotic rRNA is coordinated with the production of ribosomal particles in the nucleolus (5, 6, 10). A priori, proteins may affect rRNA processing in a number of ways. First, protein components may act directly as enzymes catalyzing the cleavages at specific processing sites. Second, some proteins may bind directly to the pre-rRNA, affecting rRNA folding and stabilizing a conformation critical for processing or for particle assembly (serving as “scaffold” or “chaperone” proteins). Third, proteins may affect processing as components of snoRNPs that are required for processing. Fourth, ribosomal proteins must assemble into the nascent ribosomal subunits, and this assembly process itself may directly affect processing. Finally, proteins may affect processing through their interaction with other protein components of processing, influencing their import into the nucleus and/or their assembly into ribosomal subunits. Examples of each of these possible roles in rRNA processing have been described. (See Table III for a

TABLE III PROTEINS ASSOCIATED WITH rRNA PROCESSINGa

Protein Nucleolin C23 Fibrillarin NOP1 GAR1 SSB1 NSR1 NOP3 SOF1 SPB4 IXR1 CA9

RCC1, MTR1, etc. RRP1 RNA1

'1

Except for

Essential (yeast)?

Mass; homology family (motifs)

Pathway branch affectedb

110 kDa

34-38 kDa, GAR family

18S (depletion, mutation)

Yes

22 kDa, GAR family 33 kDa, GAR family 67 kDa, GAR family 45 kDa, GAR family 57 kDa, GAR family 69 kDa

18S (depletion)

No Yes Yes Yes Yes Yes

87 kDa 55 kDa

18S (disruption) 18S and 25S (depletion) 18S (depletion) 25S (mutation) 25S (mutation) 18S (depletion)

Yes

58 kDa

18S and 25S (mutations)

Yes Yes

50 kDa (?)

25S (mutation) 18S and 25S, also tRNA, mRNA (mutation)

nucleolin,

46 kDa

Activity; location if not nucleolar

No direct experiment

Yes

No

Antibody precipitates

all are most fully characterized in yeast.

Nature of

121-127

113-117

All snoRNAs but 7-2 snR30, snR10 snR10, snR11 Also nucleoplasm U3

experimental evidence.

Reference

Helicase(?) Helicase(?) Helicase(?)

Indirect; also nucleoplasm Indirect; unknown Indirect, catabolic repression; cytoplasm

111 118 120 119 130 110

109 O'Day and Abelson (personal communication) 131 181, 182 183-185


list and description of the proteins most closely linked at the present time to rRNA processing.)

a. Catalytic Proteins. Unlike pre-mRNA splicing, cleavage of rRNA is a hydrolytic event in which the phosphodiester bond is attacked by water. In other words, it is not a transfer reaction in which one phosphodiester bond is simply exchanged for another, as is the case for the splicing of eukaryotic pre-mRNA (90). To identify and characterize activities that could catalyze the cleavages in rRNA processing, early studies used various nuclear or nucleolar extracts as the source of catalytic activity and the putative “45S rRNA precursor” isolated from nucleoli as the starting substrate. These activities were characterized by following the formation of recognizable “sized” intermediates released from the 45S rRNA precursor, using either sucrose gradient profiles or polyacrylamide gel patterns. Based on their nucleolar localization and their apparent ability to generate intermediates of sizes similar to those generated in vivo, these activities were suggested to be important for rRNA processing. For example, an early nucleolar processing activity was described that could degrade pre-rRNA to appropriately sized intermediates in the presence of Mg2+, but would not, under the same conditions, degrade mature rRNA (91). Similarly, an RNA-associated RNase activity from the cytoplasm of chick embryos was described that processed 45S pre-rRNA in the presence of Mg2+, producing appropriately sized intermediates and mature products (92). In contrast, a flow-through fraction from a DEAE-cellulose column yielded a nucleolar activity that apparently processed 45S pre-rRNA but did not require a divalent cation (93).
In each case, however, the imprecision of the assay, coupled with the difficulty of obtaining enough pre-rRNA substrate of sufficient quality, limited the usefulness of this approach. Subsequently, the availability of defined in vitro-derived pre-rRNA transcripts, together with the ability to map specific cleavage sites directly, made it possible to identify and then characterize nucleolar cleavage activities that may be involved in the processing of eukaryotic rRNA. For example, an activity from the S-100 fraction of a whole-cell extract (Ehrlich ascites tumor cells) introduces specific cleavages near the 5’ end of 18S rRNA, in IVS1, and in IVS2, at a consensus sequence, GGCUUGU. These results suggested that some processing steps may involve a sequence-specific cleavage activity (94). In another example, two distinct activities were characterized that together carry out the processing step that produces the 5’ end of human 18S rRNA (61). The first activity, an endonuclease from a nucleolar extract, cleaved one or two nucleotides upstream of the mature 5’ end. The second activity, a 5’→3’ exonuclease from a cytoplasmic extract, then trimmed the remaining nucleotide to produce the mature 5’ end (95). From a nucleolar extract of Ehrlich ascites tumor cells, an endoribonuclease was purified (96, 97) that catalyzed cleavage at the 5’-ETS site (site #0) (98), mimicking the cleavage demonstrated in both in vivo and in vitro studies (53,S). The fact that a highly purified enzyme uniquely recognized this early 5’-ETS processing site suggested that the in vitro-derived transcript retained enough rRNA structure to permit cleavage-site recognition. This nucleolar endoribonuclease, now termed RNase P a l, has a native molecular mass of ~51,000 Da, requires Mg2+ for activity, and cleaves in single-stranded regions of RNA (96, 97). Two nucleolar exonucleases that may participate in the trimming or turnover of rRNA fragments discarded during processing have also been isolated from this nucleolar extract and characterized (99, 100). One hydrolyzes either linear or duplex RNA in a 3’→5’ direction in a nonprocessive manner (100). The other is a highly processive 5’→3’ exoribonuclease specific for single-stranded RNA (99). Both nucleolar exoribonucleases release 5’-mononucleotides. A nuclear-encoded ribonucleoprotein endoribonuclease, known as RNase MRP, appears to function in both the nucleolus and the mitochondria (101). In mitochondria, the most purified fraction of RNase MRP contained an RNA component and catalyzed the endonucleolytic cleavage of the RNA primer during mitochondrial DNA replication (102-104). However, only a small proportion of total cellular RNase MRP activity was associated with mitochondria.
Most RNase MRP activity was in the nucleus (102), and the nuclear RNase MRP, also known as 7-2 RNP, had a 275-nucleotide RNA component and was localized in the nucleolus (105). The RNA component of yeast RNase MRP (340 nucleotides) has been cloned, and its essential gene named NME1 (106). It is the same gene as RRP2, described in Sections I,A,5 and I,B,1 (66). In HeLa cells, fractionation studies demonstrated that MRP/7-2 RNase is associated with pre-rRNA-processing complexes, and this, along with its nucleolar localization, was taken to support the involvement of MRP/7-2 RNase in rRNA processing and ribosome biogenesis (107). Interestingly, the MRP/7-2 ribonucleoprotein particle also shares some structural similarity with RNase P, a ribonucleoprotein particle involved in pre-tRNA processing in prokaryotes (108). Antibodies directed against the MRP enzyme complex (the Th determinant) did not inhibit processing at the +650 site, but it was not clear whether all of the MRP activity had been inhibited (55). As described earlier (Sections I,A,3-5), mutational analysis in yeast has directly implicated at least three different RNases as important in rRNA processing. These RNases are a 5’→3’ exonuclease (XRN1) whose absence results in the accumulation of ITS1 intermediates (63); the two 5’→3’ exonucleases XRN1 and RAT1, which affect the formation of the 5’ end of 5.8S rRNA (70); and the yeast counterpart of the MRP endonuclease (NME1/RRP2), whose absence affects cleavage at site B1 and whose mutant allele (rrp2) affects processing at sites A1, A2, and B1 (64-66, 68). The pleiotropic effects of the mutant allele of the NME1/RRP2 gene are consistent with the proposal that this endonuclease functions as part of a multicomponent “processosome” involved in the 18S rRNA branch of the processing pathway. Another group of proteins with possible catalytic roles in rRNA processing belongs to the “DEAD-box” gene family (108b), which is related to ATP-dependent RNA helicases such as the translation factor eIF-4A. Two of these proteins appear to affect the ultimate production of 28S rRNA; they were isolated either as cold-sensitive mutations of ribosome production (gene DRS1) (109) or as a suppressor of translational inhibition in the absence of the poly(A)-binding protein (gene SPB4) (110). The phenotypes of both mutations included accumulation of a 27S rRNA intermediate. However, the exact 5’ and 3’ termini of the 27S rRNA intermediate were not determined, so it is unclear whether this was a normal or an aberrant 27S rRNA intermediate. These results also implied that the early processing cleavages and the production of 18S rRNA were not affected by these mutations. Both DRS1 and SPB4 genes are essential for growth, suggesting that more than one helicase may be required for the branch of the rRNA-processing pathway necessary for normal production of 28S rRNA and/or the large 60S ribosomal subunit (109, 110).
A putative yeast RNA helicase named CA9, isolated in a screen for proteins related to the DEAD-box family, has the same mutant phenotype as that found on depletion of U3, U14, snR10, snR30, and other components that affect processing at sites A0, A1, and A2 (accumulation of the 35S rRNA precursor and blocked production of 18S rRNA) (C. O’Day and J. Abelson, personal communication). This suggests that this putative helicase is required for the 18S branch of the rRNA-processing pathway, in contrast to the DRS1 and SPB4 helicases, which affect 25S rRNA production. It has been postulated that RNA helicases may function in eukaryotic rRNA processing in a number of ways, including (1) unwinding pre-rRNA secondary structure to allow snoRNA-pre-rRNA interactions; (2) altering pre-rRNA secondary structure to produce the processing substrate; (3) altering pre-rRNA secondary structure to allow ribosomal proteins to bind; or (4) promoting the dissociation of rRNA-snoRNA and/or snoRNA-snoRNA interactions established during the processing reactions (109).


DUANE C. EICHLER AND NESSLY CRAIG

Overall, though, our understanding of the catalytic machinery associated with the processing of eukaryotic pre-rRNA is quite limited. Part of the difficulty in defining these activities may stem from the difficulty of establishing a complete in vitro processing system suited to the accurate identification and characterization of catalytic activities.

b. Noncatalytic Nucleolar Proteins. A prominent family of nucleolar proteins that appear necessary for ribosome production is the “GAR” family. Members of this family share a conserved repetitive domain rich in glycine and arginine, often at the carboxy terminus, and typically have one or more RNA recognition domains (111). Proteins of this family whose role in ribosome synthesis has been studied include fibrillarin/NOP1, GAR1, SSB1, NSR1, and NOP3 (Table III). Analysis of the primary sequences of the proteins in this GAR family suggests that it is the secondary and/or tertiary structure of the GAR domain, rather than its primary sequence, that has been conserved. The location of the repeats making up the GAR domain, and the spacing between them, can vary widely. Typically, the GAR domain is separated from the rest of the protein by flanking proline residues, and its function thus appears to be independent of its position in the protein. For example, the GAR domains are in the middle of SSB1, at the amino terminus of NOP1, at the carboxy terminus of nucleolin, and at both the amino- and carboxy-terminal ends of GAR1 (111). The characteristics and phenotypes of each of these proteins are slightly different, suggesting that each one may play a unique role in rRNA processing. Fibrillarin was first identified as a unique nucleolar protein in the slime mold Physarum polycephalum (112), but has subsequently been characterized from various other species, including humans (113) and yeast (114). In vertebrates, it has been localized in the “dense fibrillar component” region of the nucleolus, where U3 snoRNA is located and where the bulk of rRNA processing is presumed to occur (22). Anti-fibrillarin antibodies immunoprecipitate many of the known snoRNAs, suggesting that fibrillarin is a common protein component of many snoRNPs. In yeast, the gene encoding fibrillarin (NOP1) is essential and encodes a 34- to 38-kDa protein that is highly conserved in eukaryotes.
Yeast and human fibrillarins share over 70% sequence identity, and human fibrillarin can functionally (although not perfectly) replace the protein encoded by the yeast NOP1 gene (115). Depletion of fibrillarin by means of a GAL conditional promoter significantly decreased 18S rRNA synthesis, with some effect on 25S rRNA production as well as on pre-rRNA methylation (116). These pleiotropic effects on rRNA processing were analyzed further by the use of
mutants carrying a temperature-sensitive lethal point mutation in the NOP1 gene (117). Interestingly, not all of the temperature-sensitive alleles had the same effect on processing. For example, nop1.2 and nop1.5 blocked the synthesis of all rRNAs and their intermediates except the 35S rRNA precursor, showing that processing at sites A0, A1, A2, and even B1 was blocked. In contrast, the nop1.3 temperature-sensitive allele did not prevent synthesis of mature rRNA molecules but strongly inhibited nuclear rRNA methylation. The nop1.4 and nop1.7 alleles impaired the production of normally sedimenting 60S ribosomal subunits, suggesting an effect on assembly of the large ribosomal subunit. Given these diverse effects, it has been suggested that fibrillarin (NOP1) influences cleavage, methylation, and subunit assembly by promoting the correct conformation of the pre-rRNA for the various reactions at different points in the processing pathway (117). Whether fibrillarin functions only as part of a snoRNP or whether it interacts directly with the pre-rRNA and processing intermediates is not clear. Two other members of the nucleolar GAR family are GAR1 and SSB1, which interact with subsets of snoRNAs: GAR1 is associated with snR30 and snR10 (111), and SSB1 with snR10 and snR11 (118). GAR1 is an essential gene (111), whereas cells whose SSB1 gene is deleted are still viable, although they grow more slowly (118), a phenotype like that observed for snR10 disruption. The proteins encoded by these genes are localized exclusively to the nucleolus. Depleting these proteins results in accumulation of the 35S rRNA precursor and inhibition of 18S rRNA production. Processing at sites A0, A1, and A2 therefore appears to be most affected by their loss, although there is an apparent cleavage at or near site B1 that releases an aberrant 23S rRNA intermediate, which is subsequently degraded.
This phenotype is essentially the same as that observed on depletion of U3, U14, snR10, and snR30 snoRNAs, again supporting the concept of a processing complex, or “processosome,” for the 18S rRNA branch of the rRNA-processing pathway. Another member of the yeast GAR family is NOP3, which is essential for rRNA processing (119). However, its characteristics differ somewhat from those of NOP1, GAR1, and SSB1. First, it is not exclusively nucleolar but is also found in the nucleoplasm. Second, this protein appears to be required for the late processing steps in which the 27SB rRNA (see Fig. 2) is cleaved to 25S rRNA and the 20S rRNA is cleaved to 18S rRNA. Initial processing of the 35S rRNA precursor is also slowed. It is not clear whether NOP3 acts directly or indirectly in rRNA processing. Since NOP3 has an RNA recognition motif as well as the GAR domain, which is presumably involved in protein-protein interactions, it has been suggested
that this protein may affect the import into the nucleolus of proteins such as the ribosomal proteins that are assembled into nascent ribosomal subunits, or even of proteins involved in processing (119). The NSR1 gene was first recognized in yeast as encoding a protein with a nuclear signal recognition sequence required for nuclear import. Subsequently, the protein was found in the yeast nucleolus and shown to be involved in rRNA processing (120). Although the gene is not essential for viability, mutations in it slow cell growth. Structurally, the protein is clearly a member of the GAR family and resembles nucleolin in domain structure, though not in primary sequence. The similarities to nucleolin include a GAR domain at the carboxy terminus, an amino terminus with serines flanked by acidic amino acids, and several consensus RNA recognition motifs (four in nucleolin and two in NSR1). When NSR1 is mutated, rRNA processing is less efficient, with reduced processing at sites A1 and A2. The amounts of 18S rRNA (and of the 40S small ribosomal subunit) are reduced relative to the amount of 25S rRNA (120). Again, this phenotype is essentially the same as that resulting from depletion of NOP1/fibrillarin, GAR1, U3, U14, and other components required for the 18S rRNA branch of the processing pathway. NSR1 might therefore be a component of the snoRNP complex involved in 18S rRNA processing. However, anti-NSR1 antibody does not immunoprecipitate any snoRNAs of this putative complex, and several alternative roles for NSR1 in this processing pathway have therefore been suggested.
These suggestions include (1) a role in the processing complex, but with a loose or transient association; (2) direct interaction with the pre-rRNA or processing intermediates; or (3) an indirect role, affecting the import of processing components and/or ribosomal proteins from the cytoplasm to the nucleolus (120). Nucleolin and No38 are two other nucleolus-associated proteins whose structure has been studied extensively in vertebrates, but relatively little direct evidence is available to define their precise roles in rRNA processing. These two proteins have attracted considerable interest because they apparently move, or “shuttle,” between the cytoplasm and the nucleolus (121). This led to the proposal that they transport required nucleolar components from the cytoplasm. Nucleolin (C23) is the best characterized and is predominantly localized to the fibrillar component of the nucleolus (122). Nucleolin is also associated with chromatin in the nucleolus and therefore may have multiple functions in the packaging of rRNA, rDNA activation, and ribosome assembly (122). Nucleolin contains specific domains for interaction with nucleic acids, both DNA and RNA, as well as with ribosomal proteins (123-125). In fact, one of the most significant features of nucleolin primary structure is the “RNP
consensus sequence” (125) implicated in the binding of single-stranded nucleic acids such as pre-rRNA. In this regard, recent studies suggest that nucleolin can interact with rapidly labeled pre-rRNA (126), and it has been proposed that during the early stages of ribosome biogenesis, nucleolin promotes the formation of secondary structure in pre-rRNA (127). Since both the assembly of ribosomal proteins and the processing of rRNA may depend on the appropriate pre-rRNA conformation, the capacity of nucleolin to promote rRNA secondary structure may be an essential function that facilitates ribosome biogenesis. In contrast, B23 is a nucleolar protein that appears to be associated with ribosome assembly at later stages of maturation (128). Biochemical studies support this proposal and even suggest a role for B23 as a carrier of ribosomal proteins between the cytoplasm and the nucleus (121). Protein B23 also binds rRNA and alters its conformation, and this property may be important in promoting the association of ribosomal proteins with rRNA (129). SOF1 was identified as a protein that suppresses the pre-rRNA-processing defect in a yeast nop1 mutant complemented by human fibrillarin (these constructs grow more slowly and are temperature-sensitive for growth and rRNA processing) (130). The SOF1 gene is essential, and the SOF1 protein has, in its central region, a domain containing a repeated sequence found in the β-subunit of G-proteins and in the mRNA splicing factor PRP4. The protein is nucleolar and is associated with NOP1 (fibrillarin), although not all NOP1 is bound to SOF1. In vivo depletion of SOF1 (not 100% complete) produced a phenotype in which the 35S rRNA precursor accumulates and the normal 32S, 27SA (Fig. 2), and 20S pre-rRNA intermediates are greatly reduced, as are the levels of mature 18S rRNA (130). Again, this phenotype suggests involvement in the “18S rRNA” branch of the processing pathway.
Immunoprecipitation of the SOF1 protein showed that it is also associated with U3 snoRNA. However, it did not immunoprecipitate any other snoRNA (at least to significant levels), suggesting that it is not a common component of all snoRNPs (130). The proteins containing the “GP-domain” are quite diverse in function; they include signal transduction proteins with properties characteristic of G-proteins, and proteins with putative roles in transcriptional regulation, cell cycle control, and spliceosome formation (PRP4) (references in 130). All of these roles suggest that protein-protein interactions are a critical feature of the GP-domain. Based on this observation, it has been speculated that SOF1 may act in rRNA processing by facilitating the assembly of the U3 snoRNP and other nucleolar components into a processing complex for the 18S rRNA branch of the processing pathway (130). Another protein with extensive pleiotropic effects that appears to affect
rRNA processing (131) is RCC1 (vertebrate), PIM1 (S. pombe), or PRP20, SRM1, and MTR1 (Saccharomyces cerevisiae). The different names result from the repeated and independent cloning of this gene in different species by different laboratories. This gene encodes a guanine nucleotide release protein that promotes guanine nucleotide exchange on a small GTPase of the ras gene superfamily (RAN). Among its many phenotypes is one that affects mRNA metabolism by inhibiting mRNA export from the nucleus. Its rRNA-processing phenotype is associated with accumulation of the 35S rRNA precursor and slowed processing of 27S to 25S rRNA and of 20S to 18S rRNA (131), a phenotype similar to that of NOP3 depletion. To explain the various phenotypes associated with the RCC1 gene, it has been suggested that the RCC1 protein may be part of, or may affect, some structure in the nucleus critical for DNA replication, RNA synthesis, and RNA processing. It is also conceivable that the RCC1 protein affects rRNA processing indirectly, by affecting the export to the cytoplasm of mRNAs encoding the necessary nucleolar, ribosomal, or rRNA-processing proteins (131). Proteins required for normal rRNA processing also include the ribosomal proteins (5, 6, 10). Many experiments show that inhibiting the synthesis of one or more of these ribosomal proteins prevents correct assembly of the corresponding subunit, large or small. One model proposed to account for their effect on processing suggests that nascent subunits lacking the full complement of ribosomal proteins are not stable enough to resist degradation, so that mature rRNA species are not found because the aberrant subunits are rapidly degraded. That the function of these and other nucleolar proteins in pre-rRNA processing is not well understood may reflect, in part, the complexity of the eukaryotic rRNA-processing system.
This complexity is best exemplified by the interdependence of post-transcriptional modification, cleavage of the precursor, and attachment of ribosomal proteins, which must occur in some coordinated fashion to permit correct assembly of ribosomal subunits. In other words, the sequence of cleavages that releases intermediate and mature rRNA species may be affected not only by previous cleavages, but also by post-transcriptional modifications and the orderly attachment of ribosomal proteins. Such coordination among modification, processing, and ribosome assembly may be difficult to reproduce in a test tube. The problems of using in vitro systems to study the mechanisms of eukaryotic rRNA processing may be further compounded if processing is not fully compatible with solubilization. Although there are some questions as to the organizational arrangement and attachment sites of the rDNA repeats, there is considerable evidence for the association of actively
transcribed ribosomal genes with an insoluble nucleolar structure, operationally defined as the “nucleolar matrix” (132-134). Distinct lines of evidence suggest that the nucleolar matrix acts as a superstructure to which nucleolar components are directly or indirectly attached (113, 135-139). Core nucleoli that remain after extensive DNase I treatment of isolated nucleoli contain not only fragments of the NTSs of the rDNA repeats, but also other nucleolar components such as nucleolin and fibrillarin (113, 135, 136, 139). The presence of both nucleolin and fibrillarin in core nucleoli has lent support to the idea that rRNA processing is intimately associated with the nucleolar matrix. Yet, as with any cell disruption and isolation procedure, there are concerns that the core nucleolar fraction may reflect the fractionation procedure more than the true in vivo structure. Nevertheless, it is conceivable that the higher-order structures of the nucleolus play a critical role in ribosome biogenesis, including the efficient and properly timed processing of pre-rRNA (140, 141). Interestingly, the successes to date with in vitro processing systems have been associated with early processing sites, such as sites #0 and #1 in vertebrates (Fig. 1B) (22, 24-26, 32, 35, 53, 54, 61, 142). It is possible that the requirements for these very early cleavages are less dependent on the superstructure of the nucleolus. In fact, it has been postulated that the role of the very early cleavage sites may be to trigger or assist the appropriate assembly of the “processosome” (7, 17, 59, 143).

3. SIGNALS REQUIRED FOR rRNA PROCESSING

There is relatively little information concerning the nature of the signals that affect processing-site recognition. The theoretical possibilities for signaling include (1) rRNA sequence; (2) conserved features of pre-rRNA secondary structure; (3) additional features of rRNA tertiary structure; and (4) features of a ribonucleoprotein complex formed by the interaction of ribosomal or other proteins with the pre-rRNA to create recognizable processing sites. These possibilities are not mutually exclusive, and each may contribute in part to the overall recognition process. For example, the secondary structure of rRNA is determined by the primary sequence, and the binding site for a protein in an RNP can be determined by sequence and/or by features of secondary/tertiary rRNA structure. Direct inspection of the primary sequence of the precursor rRNA, particularly around known processing sites, provides little insight into recognition signals for processing. One observation that seems to distinguish eukaryotic from prokaryotic rRNA processing is that the spacer sequences on each side of the 18S or 28S rRNA sequences in the precursor do not base-pair to create the long-range stem-loop structures (33) that characterize the recognition signals for rRNA processing by RNase III in Escherichia coli (144). In eukaryotes, for example, correct processing in vivo or in vitro at the 5’ end of the 18S rRNA sequence in precursor rRNA does not involve base-pairing with sequences in the 3’ half of the 18S rRNA sequence or in the ITS1 sequence (61, 145). In fact, the best modeling studies suggest that the precursor is single-stranded at the mature 5’ end of the 18S rRNA sequence (33, 146). Nevertheless, given the uncertainty in our ability to predict RNA secondary structure, and given that the secondary structure of the spacer regions has been probed experimentally for only one pre-rRNA species (yeast) (147-150), it is not surprising that we do not know whether related features of secondary or tertiary structure contribute to processing-site recognition signal(s). However, three experimental systems are beginning to provide some tentative clues as to the sequence and/or secondary structural features that contribute to processing-site recognition, either directly or indirectly. The first is a cell-free processing system derived from a mouse whole-cell extract that carries out processing at the early 5’-ETS site (site #0) (35, 53, 54). The second is an in vivo yeast system in which tagged rRNA genes can be mutagenized, transfected, and then expressed (46). The third system is similar to the yeast tagging system, but uses Tetrahymena, in which the normally expressed rRNA genes can be replaced with mutagenized rRNA genes (151).

a. Signals in the 5’-ETS. A large number of transcripts derived from cloned and mutagenized rDNA templates have been tested using the in vitro mouse processing system.
Processing in this system at the early 5’-ETS site (site #0) was shown to depend on a relatively small number of nucleotides immediately 3’ (downstream) of the processing site (a region extending from +655 to +666), with other downstream regions influencing the efficiency of processing (54, 81). The sequence most critical to specific and efficient cleavage was immediately adjacent to site #0 at +650 and is partially conserved over a wide range of eukaryotic species (mouse, human, Xenopus, Tetrahymena, silk moth, Physarum, Neurospora, and yeast) (55). Compared with the rest of the 5’-ETS, whose sequences are variable and poorly conserved among eukaryotes, this conserved sequence adjacent to the cleavage site was quite conspicuous (53, 54, 146). A comparison of the computer-predicted secondary structures of each of the mouse pre-rRNA substrates capable of being correctly processed by this in vitro system suggested that the +650 processing site was in a non-base-paired region, located in the loop of a kinked stem-loop structure (81). This hypothesis was tested directly using cis- and trans-antisense oligonucleotides complementary to the single-stranded region of the loop,
and therefore change the secondary structure of the precursor rRNA. The results demonstrated that sequestering this region in a base-paired structure blocked both processing and the formation of a processing complex (81). Experiments that directly probe the structure at the mouse early 5’-ETS processing site provided further evidence that the computer-predicted structure is correct in outline, and that the processing site is single-stranded with a highly structured downstream region (N. Craig, unpublished). Whether this structural motif is conserved at other processing sites, either in the mouse or in other eukaryotic species, remains to be determined. It is also possible that protein components in the extract required for processing directly mediate recognition of this processing site, and that the folded structure of this region simply defines binding sites for these proteins. A powerful technique developed and used in yeast for the study of cis-acting sequence signals involves “tagging” the mature rRNA sequence with an oligonucleotide inserted into a region (usually one of the “expansion” regions) that can be modified without hindering the assembly and function of ribosomes carrying the tagged rRNA (46-48, 152). Using this approach, a specifically tagged rRNA can be followed during processing in the presence of an excess of normal wild-type rRNA produced from the highly repeated rRNA genes. More recent work has refined this approach so that the bulk of the newly synthesized rRNA is made from cloned, transformed rRNA genes (J. Venema and H. A. Raué, personal communication). This work uses a yeast strain defective in RNA polymerase I (so that the normal repeated rRNA genes cannot be transcribed), which nevertheless remains viable because additional rRNA genes are under the control of a GAL-inducible RNA polymerase II promoter (153).
Such a strain can then be transformed with a multicopy plasmid carrying a mutant (“tagged”) rRNA transcriptional unit under the control of a strong constitutive promoter. In the absence of galactose, all of the rRNA is synthesized from the transfected genes, providing a more sensitive way to reveal the effects of various rRNA mutations. For example, in the original system, in which the tagged rRNAs and ribosomes are of low abundance, one tag in the 17S rRNA abolishes formation of mature 17S rRNA carrying this tag, whereas a tag at a different location in the 17S rRNA has no discernible effect on the maturation and assembly of the tagged rRNA (46, 48). Cells carrying either tagged 17S rRNA are of normal viability. However, in the current system, in which the majority of ribosomes are tagged, this second 17S rRNA tag can be seen to partially inhibit growth (J. Venema and H. A. Raué, personal communication). Thus, more subtle consequences may go undetected when the tagged ribosomes are in the minority.

With the yeast tagging system, it is possible to construct a variety of deletions, mutations, and cross-species replacements of specific regions of the pre-rRNA molecule and then analyze whether the altered precursor rRNAs are correctly processed. When four extensive 5’-ETS deletions ranging from 142 to 632 nucleotides (of 699 nucleotides) were constructed, there was no subsequent production of 17S rRNA, even though normal levels of tagged 25S rRNA were produced (47, 48). All four deletions lacked sequences between +411 and +457, suggesting that this region and/or adjacent regions might influence processing. The importance of the 5’-ETS region has been further demonstrated by studies in yeast using a smaller, 23-nucleotide deletion between positions +469 and +491. In this construct, the level of 17S rRNA was greatly reduced, while the level of 25S rRNA remained unaffected (57). Reverse transcriptase analysis demonstrated that processing at sites A0, A1, and A2 was inhibited, a result exactly analogous to that found in yeast cells depleted of U3 snoRNA. Interestingly, this region includes the nucleotides that crosslink to U3 snoRNA and shares significant homology with the comparable 5’-ETS processing site (site #0) in many eukaryotic species (57). These results are also important because they again reinforce the idea that the 18S rRNA branch of the rRNA-processing pathway may involve a large complex (“processosome”) that assembles at the early 5’-ETS site or region. In addition, the fact that production of 25S rRNA was unaffected again implies that the two branches of the rRNA-processing pathway are partially or relatively independent.
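The inference that sequences between +411 and +457 matter, because all four inactive constructs lack them, amounts to intersecting the deleted intervals. A minimal Python sketch of that step follows; `common_deleted_region` is a hypothetical helper, and the coordinates in the example are invented, since the endpoints of the individual constructs are not given here.

```python
from typing import Optional

# Hypothetical illustration: the deletion endpoints used below are invented,
# not the coordinates of the published 5'-ETS constructs.


def common_deleted_region(
    deletions: list[tuple[int, int]],
) -> Optional[tuple[int, int]]:
    """Return the inclusive (start, end) range removed by every deletion.

    Each deletion is an inclusive (start, end) pair of 5'-ETS positions.
    Returns None if the list is empty or the deletions share no position.
    """
    if not deletions:
        return None
    start = max(s for s, _ in deletions)  # latest left boundary
    end = min(e for _, e in deletions)    # earliest right boundary
    return (start, end) if start <= end else None


# Four invented deletions of different lengths (inclusive coordinates).
constructs = [(300, 460), (380, 690), (400, 470), (100, 480)]
print(common_deleted_region(constructs))  # (400, 460)
```

The shared interval only nominates a candidate region; as the text notes, adjacent sequences could equally be responsible, which is why the smaller +469 to +491 deletion was informative.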
Work in Xenopus has further established that when the conserved 11-nucleotide region associated with the 5’-ETS early-processing site is deleted in cloned rRNA genes, pre-rRNA made from these modified genes is not cleaved in vitro by extracts made from Xenopus germinal vesicles, Xenopus kidney cells, or even heterologous mouse cells (59, 143). In contrast to wild-type Xenopus pre-rRNA, Xenopus pre-rRNA with specific deletions of this region is unable to form the typical processing complex (see Section I,B,4) (143). Given that the yeast cleavage site #0 lies after nucleotide +610, and that Xenopus oocytes may have either no (60) or low levels of cleavage (59, 143) even though both yeast and Xenopus have the 11-nucleotide conserved region, the more critical role of this region or “signal” in all eukaryotes may be to direct the assembly of a processing complex rather than the cleavage itself. There is also a report describing a gene replacement system for Tetrahymena in which the macronuclear rRNA genes were replaced by injected rRNA genes carrying a 119-nucleotide insert at one of two sites in the 5’-ETS region (151). The results showed that inserts of this size placed in
the 5’-ETS away from site #0 were not necessarily deleterious, since production of mature rRNA and growth of the transformed cells were normal. In contrast, a large insert of 2300 nucleotides in the 5’-ETS region of Tetrahymena produced no transformants. This large insert clearly caused problems, but the exact reason for the lack of transformants is unknown.
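The mouse +650 analysis in the 5’-ETS discussion turned on whether a cleavage site falls in an unpaired loop of a computer-predicted fold. As a rough illustration of how that question can be posed computationally, the Python sketch below uses Nussinov base-pair maximization, a much simpler scheme than the free-energy minimization generally used for RNA secondary-structure prediction, and reports which positions are left single-stranded. The allowed pairs (Watson-Crick plus G·U), the minimum hairpin-loop size of three, the helper names, and the toy sequence are all assumptions; no real pre-rRNA sequence is used.

```python
# Nussinov base-pair maximization: a simplified stand-in for the
# free-energy-based folding programs normally used to predict rRNA
# structure. Sequences and helper names here are hypothetical.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases closing a hairpin


def nussinov(seq: str) -> list[list[int]]:
    """best[i][j] = maximum number of nested base pairs within seq[i..j]."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: position i stays unpaired
            for k in range(i + MIN_LOOP + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    right = best[k + 1][j] if k < j else 0
                    score = max(score, best[i + 1][k - 1] + right + 1)
            best[i][j] = score
    return best


def base_pairs(seq: str) -> list[tuple[int, int]]:
    """Trace back one optimal structure as a list of (i, k) pairs."""
    best = nussinov(seq)
    pairs: list[tuple[int, int]] = []
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue  # too short to contain a pair
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))  # i left unpaired in this optimum
            continue
        for k in range(i + MIN_LOOP + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = best[k + 1][j] if k < j else 0
                if best[i][j] == best[i + 1][k - 1] + right + 1:
                    pairs.append((i, k))
                    stack.append((i + 1, k - 1))  # inside the new pair
                    stack.append((k + 1, j))      # 3' of the new pair
                    break
    return pairs


def unpaired_positions(seq: str) -> set[int]:
    """Positions (0-based) left single-stranded in the traced structure."""
    paired = {p for pair in base_pairs(seq) for p in pair}
    return set(range(len(seq))) - paired


# Toy hairpin: a 4-bp G-C stem closing an AAAA loop.
print(sorted(unpaired_positions("GGGGAAAACCCC")))  # [4, 5, 6, 7]
```

Because this scoring ignores stacking energies and often has many tied optima, a position reported as unpaired is only suggestive, which is why the mouse prediction was followed by antisense-oligonucleotide tests and direct structure probing.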

b. Signals in ITS1. When a 160-bp deletion encompassing the A2 processing site was introduced into the central region of ITS1 (362 bp) using the yeast tagging system, no 17S rRNA was made from the mutated pre-rRNA, although normal amounts of 25S rRNA were made. In addition, an abnormal 32S rRNA fragment accumulated, which was taken to result from cleavage of the 35S rRNA precursor at the A1 site (the 5’ end of the 17S rRNA sequence) (47, 48). Replacement of the ITS1 region in S. cerevisiae with the smaller ITS1 region of Kluyveromyces lactis (226 nucleotides) or Hansenula wingei (153 nucleotides) supported production of mature 17S and 25S rRNAs, although the rates were somewhat slower in the latter case (153b, 153c). Presumably, it is not the size of the ITS1 region that is critical for appropriate processing, but rather signals that have been conserved among all three yeast species. Deletion of most of the region upstream of site A2, including an evolutionarily highly conserved structural element, also had little or no effect on processing (153c). However, deletion of six nucleotides just downstream of the A2 site reduced processing 20-fold (D. Tollervey, personal communication). Examination of the sequence 3’ to the A2 processing site showed that it could potentially form seven consecutive base-pairs with the sequence at the 5’ end of snR30 (the yeast snoRNA whose depletion inhibits cleavage at A2). These results suggest that base-pairing between snR30 and this putative “signal” sequence 3’ to the A2 site may be important for processing. It is also important to note that these deletion mutants were viable, as were mutants with deletions of up to 55 nucleotides encompassing the A2 site (L. Lindahl, personal communication). One explanation for these findings might be redundancy in the processing system, so that when the usual A2 site is deleted or mutated, another cryptic site (site A3?) can be used.

c. Signals in ITS2. Extensive deletion analysis of the ITS2 region in yeast using the tagging system has yielded a complex picture of possible signals required for processing of the mature 25S and 5.8S rRNAs. In these experiments, a 27SB-type intermediate increases in level, suggesting that the normal C2 cleavage, and the subsequent C1 and E cleavages, are affected. Deletions involving the entire ITS2, or the 5’ half of ITS2 (total size, 234
nucleotides), inhibit 25S rRNA production, whereas smaller deletions in the 3’ part (e.g., +153 to +193 and the 3’-terminal 52 nucleotides) allow very low production of 25S rRNA. Further analysis of the ITS2 region has used cross-species substitutions, chimeric ITS2 sequences combining parts from different species, and directed mutagenesis of selected small regions. The net result of these analyses has been the identification of two important regions, or “cis-acting” signals (153c). The first involves a highly conserved region at the apical end of helix V in the middle of ITS2. Changing the base sequence of this region of ITS2, even while maintaining the putative secondary structure, strongly inhibits processing. The second signal is associated with the base of helix IV, a helix not conserved in size or sequence among yeast species. If the helix is completely removed, 25S rRNA production is inhibited, whereas if only part of helix IV is deleted, 25S rRNA continues to be produced (153c). Further analysis will clearly provide more information about this complicated story. In Tetrahymena, insertions at the HindIII site in ITS2 were used to analyze effects on processing (151). Transformants were obtained with a 119-bp insert, but not with the 2300-bp insert. The transformants with the 119-bp insert in ITS2 were not quite normal: their growth rate was reduced by half; the 35S rRNA precursor accumulated 50-fold; a new 300-nucleotide intermediate containing the 5.8S rRNA accumulated (its termini were not mapped); and the level of an intermediate larger than 28S increased. Clearly, processing was affected by the presence of the 119-bp insert in ITS2.
The general finding from these experiments is that deletions and/or mutations in ITS2 do not affect production of 17S rRNA; rather, they inhibit or drastically reduce the rate of the processing that separates the 5.8S sequences from the 25S sequences (152). These results again strongly suggest that the pre-rRNA-processing pathway has two branches, because the production of 17S and 25S rRNAs appears to be affected independently by various deletions and/or mutations in their corresponding spacer regions.

d. Signals in 18S, 5.8S, and 28S rRNA sequences. Do any signals within the mature rRNA sequences of precursor rRNAs affect processing? In vitro processing at the 5′ end of human 18S rRNA suggests that there may be. The final exonucleolytic trimming of two or three nucleotides from the 5′ end of human 18S rRNA requires the nucleotide sequence from +6 to +25 in 18S rRNA (95). In the yeast tagged-gene system, it was found that not every site in the mature rRNA can be successfully tagged as a “neutral mutation”; for example, a 19-bp insertion in the V3 variable region of 17S rRNA prevented production of tagged 17S rRNA (48). Although the tagged 35S pre-rRNA was detected, it was presumably defective and was degraded. The 25S rRNA portion of the 35S pre-rRNA was not tagged in these experiments; thus, it was not possible to determine whether only the 17S rRNA portion of the tagged 35S pre-rRNA was degraded or whether the entire precursor was degraded. Similar observations were made in the Tetrahymena system, in which only one of two 119-bp insertions in the 26S rRNA region yielded transformants. The successful insertion was in a loop of a variable “expansion” region, whereas the unsuccessful insertion was in a helix of an “expansion” segment (49). Of 10 insertions of either 11 or 61 bp into seven sites in the 17S rRNA region, three gave no transformants, three gave transformants that produced no “tagged” rRNA even though the altered rRNA genes were present, three produced no mature tagged rRNA although a little tagged precursor rRNA was detected in two strains, and three produced tagged rRNA. The initial interpretation of these results focused primarily on the relationship between the location of the insertion (variable versus conserved region, etc.) and the subsequent ability to obtain transformants. For example, the absence of transformants or of tagged rRNAs could be due to nonfunctional rDNA, or to instability of the tagged rRNA and its failure to be processed and assembled into functional ribosomes (49). Thus, although the results in Tetrahymena are rather vague with regard to the possible role of mature rRNA sequences in processing, they may still be taken to support the proposal that these sequences play some role in the processing of precursor rRNA.

4. PROCESSING COMPLEXES

As described in preceding sections, various experiments in yeast have led to the suggestion that rRNA processing involves a large ribonucleoprotein complex analogous to the spliceosome in pre-mRNA splicing. These include experiments showing that different snoRNAs are required for processing and that many of the snoRNAs are found together in complexes that are precipitated by antibodies directed against NOP1 (fibrillarin). The term “processosome” has been used to label this putative complex (17). Moreover, differences in snoRNAs, proteins, and mutational effects between the 18S and 25S rRNA branches of yeast rRNA processing further suggest that at least two complexes may be involved, or that the properties of a single complex can change with the addition or removal of specific components. However, this evidence is indirect and therefore only suggestive at this time. Direct evidence for the existence of, and even the requirement for, a large ribonucleoprotein complex has come essentially from work following processing at site #0 in the mouse (154) and in Xenopus (59, 143), using a cell-free rRNA-processing system. When processing-competent pre-rRNA substrates were added to a cellular S-100 extract, stable complexes incorporating the pre-rRNA molecules formed in a time-dependent manner. These complexes were first demonstrated by gel-retardation analysis or by their sedimentation properties in glycerol gradients. Importantly, the substrate and extract requirements for complex formation were identical to those required for successful rRNA processing. For example, if the pre-rRNA secondary structure was altered by a cis-antisense sequence, no complex was observed (as judged by gel-retardation analysis), and no processing occurred (as judged by cleavage analysis) (81). UV-crosslinking experiments also show that at least six polypeptides are specifically associated with this processing complex (1%). The identities of these polypeptides have not yet been established; however, the apparent sizes of two of the protein bands suggested that one might be a nucleolar single-strand-specific endonuclease (98) (see Section I,B,2,a) and the other the abundant nucleolar protein nucleolin. In addition, when pre-rRNA transcripts of sufficient (extended) length are used, this complex can be shown to contain stably associated U3 snoRNA (C. A. Enright and B. Sollner-Webb, personal communication). Similar complexes have been detected and analyzed in the cell-free system from Xenopus, which carries out processing at the same site (site #0) (59, 143). Complex formation requires a processing-competent pre-rRNA substrate, intact U3 snoRNA, and extract proteins. No complex formed and no processing occurred if there were mutations or deletions in the 11-nucleotide region of this processing site that is conserved between Xenopus and mammals.
The Xenopus system, however, has one advantage over the mouse system: these processing complexes can actually be visualized in the electron microscope when oocyte chromatin is isolated and spread using the “Miller treatment” (143). Under these conditions, transcribing rRNA genes and their associated nascent pre-rRNA molecules have a characteristic “Christmas tree” appearance, with branches ending in knobs or “terminal balls.” Because the examined chromatin can include plasmids carrying normal or mutated rRNA genes injected into the oocytes, the effect of particular mutations on the formation of terminal knobs on transcribing rRNA genes can be observed directly. The most striking finding was the absence of terminal knobs on rRNA genes whose pre-rRNA could not be processed; the clearest case was seen when the conserved 11-nucleotide region at site #0 was deleted (143).


II. The Relationship between Ribosomal RNA Processing and Posttranscriptional Modifications

In eukaryotic cells, pre-rRNA is modified during and/or shortly after synthesis and is then cleaved to intermediates that are further processed to become the 18S, 5.8S, and 28S rRNA components of ribosomes (19, 155, 156). Although more than 40% of the pre-rRNA sequence may be degraded, the degraded sequences contain no modifications; hence, the posttranscriptionally added modifications are conserved during the processing of eukaryotic precursor rRNA (19, 156, 157) (see Table I). Both base methylation and ribose methylation (2′-O-methylation) are found in eukaryotic rRNA, but ribose methylation is far more extensive, accounting for 80-90% of the total methylation observed (19, 156). The number of methylated bases in eukaryotic rRNAs is therefore relatively small, and, in contrast to ribose methylation, base methylation occurs after transport to the cytoplasm (19). Another type of modification, pseudouridylation, takes place in the nucleolus, coordinately with ribose methylation. Considerable work has shown that rRNA modifications are distributed in a nonrandom pattern that is highly conserved in eukaryotes (19, 156-159). Possible roles for these posttranscriptional modifications include serving as recognition signals for processing sites, as effectors of rRNA conformation, and as binding sites for specific proteins that affect the assembly of ribosomal subunits.

A. Ribose Methylation

The high degree of evolutionary conservation of rRNA methylation patterns, and the near completion of ribose methylation of eukaryotic rRNA at the stage of the primary precursor rRNA, have been taken to support the proposal that ribose methylation is integral to correct pre-rRNA processing. In early studies on pre-rRNA processing, it was noted that inhibition of methylation prevented normal processing under conditions that did not appear to affect rRNA transcription. For example, methionine starvation completely abolishes ribosome production while the synthesis of precursor rRNA continues (160). However, starvation for this amino acid also inhibits translation, and this may have a synergistic effect on the overall inhibition of ribosome synthesis. Interestingly, exposure of Novikoff ascites hepatoma cells to poly(I)·poly(C) also blocks methylation of the precursor rRNA, and this in turn preferentially impairs production of the small ribosomal subunits during maturation (161). Other studies using ethionine, a methionine analog, to inhibit methylation showed that inhibition of ribose methylation disrupts the normal cleavage patterns of pre-rRNA during processing (162). The extent of this effect, however, is somewhat unclear; in other reports, inhibition of pre-rRNA methylation by ethionine only partially affected processing, and in particular it was the efficiency of processing that was impeded (163-165). In this case, the ribosomal subunits were judged abnormal and nonfunctional. In yeast, ethionine inhibits methylation, but processing of pre-rRNA is only moderately inhibited (117). Consistent with these observations, a temperature-sensitive lethal point mutation in the yeast gene encoding the yeast equivalent of fibrillarin (NOP1) strongly inhibits methylation of precursor rRNA at nonpermissive temperatures but has little effect on the processing of pre-rRNA (117). Therefore, whether the effects on ribosome production observed when methylation is inhibited by ethionine or methionine starvation are directly related to the methylation of pre-rRNA, or instead to the assembly of ribosomal proteins, remains an open question (155). Even though the locations of the ribose methyl groups in 18S, 5.8S, and 28S rRNAs are invariant between species as distant as Xenopus and humans (156), no apparent consensus recognition motif governing these methylation patterns has been identified (19, 156). Aspects of rRNA structure may be important elements of recognition, but analysis of rRNA secondary structure provides very few clues to the features that may determine methylation patterns (19). The study of activities involved in the methylation of pre-rRNA has generally been quite limited.
An S-adenosyl-L-methionine-dependent methyltransferase from rat liver was reported to catalyze the methylation of both base and ribose moieties in hypomethylated rRNA from regenerating rat liver after treatment with ethionine and adenine (166). This enzyme activity had an apparent molecular mass of 30 kDa and would methylate only hypomethylated rRNA, not rRNA from normal untreated rat liver. Methyl groups were incorporated into pre-rRNA larger than B S, and with the hypomethylated rRNA substrate the extent of ribose methylation by this enzyme fraction was greater than that of base methylation. A 130-kDa methyltransferase from Ehrlich ascites tumor cells was partially purified and was likewise identified by its capacity to methylate hypomethylated rRNA species, but subsequent experiments showed that this activity actually methylated the 5 position of cytosine residues, not the riboses, in the hypomethylated rRNA (167). The characterization of a 2′-O-methyltransferase from nucleoli of Ehrlich ascites tumor cells was the first description of an enzyme that specifically catalyzed methylation at the 2′-hydroxyl position of rRNA (168). Although this nucleolar methyltransferase was specific for the 2′-O-methylation of rRNA, questions remain as to whether this enzyme or a combination of methyltransferases is involved in the in vivo methylation of precursor rRNA (169). In this regard, the enzyme can, in vitro, methylate each of the four nucleotides of an RNA substrate, although to different extents depending on the RNA substrate. To further investigate the involvement of the nucleolar 2′-O-methyltransferase in the methylation of precursor ribosomal RNA, an in vitro-synthesized 28S rRNA transcript containing a unique tandem triple 2′-O-methylated ribose site was used as a substrate (170). The results showed that the methylated sequence corresponded precisely to the highly conserved methylated tract, AmGmCm, reported from in vivo studies (156). This site occurs in a single-stranded region of 28S rRNA that links two highly conserved domains of the secondary structure (19), and it contains several invariant nucleotides (156). These findings were taken to support the involvement of this nucleolar enzyme in the 2′-O-methylation of pre-rRNA (170).

B. Pseudouridylation and Base Methylation

Like ribose methylation, pseudouridylation and base methylation of precursor rRNA occur only in sequences destined to become mature rRNA species, and their distribution is similar to that of the 2′-O-methyl groups (19): they are found only in the conserved mature sequences, not in the expansion/variable regions. The number of pseudouridines is nearly equal to the number of methylated groups in eukaryotic rRNA (Table I). The small number of methylated bases in eukaryotic rRNAs occurs mainly in the 18S species, and almost all are introduced at late stages of rRNA maturation. Again, there is no obvious consensus recognition motif for these types of modification, little is known about the enzymology involved, and their role in ribosome biogenesis is not clearly understood (19).

III. Summary

In summary, it can be argued that understanding eukaryotic rRNA processing is no less important than understanding mRNA maturation, since the capacity of a cell to carry out protein synthesis is controlled, in part, by the abundance of ribosomes. Processing of pre-rRNA is highly regulated and involves many cellular components acting either alone or as part of a complex. Some of these components are directly involved in the modification and cleavage of the precursor rRNA, while others direct the packaging of the rRNA into ribosomal subunits. Just as snRNPs participate in pre-mRNA processing, snoRNPs are clearly involved in eukaryotic rRNA processing, and they have been proposed to assemble with other proteins into at least one complex, called a “processosome” (17), which carries out the ordered processing of the pre-rRNA and its assembly into ribosomes. The formation of such a processing complex would make possible the regulation required to coordinate the abundance of ribosomes with the physiological and developmental changes of a cell. Eukaryotic rRNA processing may be even more complex than pre-mRNA maturation, since pre-rRNA undergoes extensive nucleotide modification and is assembled into a complex structure, the ribosome. Features of the eukaryotic rRNA-processing pathway have undoubtedly been conserved during evolution, and the genetic approaches available in yeast (6) should provide considerable knowledge useful to investigators working with higher eukaryotic systems. It was originally hoped that the extensive work on bacterial ribosome formation would provide a useful paradigm for the process in eukaryotes. However, although general features of ribosome structure and function are highly conserved between bacterial and eukaryotic systems, the basic strategy of ribosome biogenesis seems, for the most part, to be distinctly different. Thus, the detailed molecular mechanisms of rRNA processing in each kingdom will have to be deciphered independently to elucidate the features and regulation of this process, which is important for cell survival.

REFERENCES

1. G. Attardi and F. Amaldi, ARB 39, 183 (1970).
2. H. Busch and K. Smetana, “The Nucleolus.” Academic Press, New York, 1970.
3. R. P. Perry, ARB 45, 605 (1976).
4. E. O. Long and I. B. Dawid, ARB 49, 727 (1980).
5. J. L. Woolford and J. R. Warner, in “The Molecular and Cellular Biology of the Yeast Saccharomyces” (J. R. Broach, J. R. Pringle and E. W. Jones, eds.), Vol. I, p. 587. CSH Lab Press, Cold Spring Harbor, New York, 1991.
6. J. L. Woolford, Adv. Genet. 29, 63 (1991).
7. B. Sollner-Webb, K. Tyc and J. A. Steitz, in “Ribosomal RNA: Structure, Evolution, Processing, and Function in Protein Synthesis” (R. A. Zimmermann and A. Dahlberg, eds.), Telford, Caldwell, New Jersey, 1993.
8. B. Sollner-Webb and J. Tower, ARB 55, 801 (1986).
9. A. A. Hadjiolov, “The Nucleolus and Ribosome Biogenesis.” Springer-Verlag, New York, 1985.
10. J. R. Warner, Curr. Opin. Cell Biol. 2, 521 (1990).
11. S. Chaudhuri and I. Lieberman, JMB 33, 323 (1968).
12. K. P. Dudov, M. D. Dabeva, A. A. Hadjiolov and B. N. Todorov, BJ 171, 375 (1978).
13. K. P. Dudov and M. D. Dabeva, BJ 210, 183 (1983).
14. R. L. Taber and W. S. Vincent, BBA 186, 317 (1969).
15. H. L. Cooper and E. Gibson, JBC 246, 5059 (1971).
16. J. Bell, L. Neilson and M. Pellegrini, MCBiol 8, 91 (1988).
17. M. J. Fournier and E. S. Maxwell, TIBS 18, 131 (1993).
18. I. W. Mattaj, D. Tollervey and B. Séraphin, FASEB J. 7, 47 (1993).
19. B. E. H. Maden, This Series 39, 241 (1990).
20. H. A. Raué, J. Klootwijk and W. Musters, Prog. Biophys. Mol. Biol. 51, 77 (1988).
21. H. A. Raué and R. J. Planta, This Series 41, 89 (1991).
22. U. Scheer, M. Thiry and G. Goessens, Trends Cell Biol. 3, 236 (1993).
23. C. G. Clark, J. Mol. Evol. 25, 343 (1987).
24. R. A. Weinberg and S. Penman, JMB 47, 169 (1970).
25. P. K. Wellauer, I. B. Dawid, D. E. Kelly and R. P. Perry, JMB 89, 397 (1974).
26. M. D. Dabeva, K. P. Dudov, A. A. Hadjiolov, I. Emanuilov and B. N. Todorov, BJ 160, 495 (1976).
27. K. V. Hadjiolova, O. I. Georgiev, V. V. Nosikov and A. A. Hadjiolov, BJ 220, 105 (1984).
28. K. V. Hadjiolova, O. I. Georgiev, V. V. Nosikov and A. A. Hadjiolov, BBA 782, 195 (1984).
29. K. V. Hadjiolova, M. Nicoloso, S. Mazan, A. A. Hadjiolov and J.-P. Bachellerie, EJB 212, 211 (1993).
30. S. A. Gerbi, R. Savino, B. Stebbins-Boaz, C. Jeppesen and R. Rivera-Leon, in “The Ribosome: Structure, Function & Evolution” (W. E. Hill, A. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 452. American Society for Microbiology, Washington, D.C., 1990.
31. B. A. Peculis and J. A. Steitz, Cell 73, 1 (1993).
32. L. H. Bowman, B. Rabin and D. Schlessinger, NARes 9, 4951 (1981).
33. L. H. Bowman, W. E. Goldman, G. I. Goldberg, M. B. Hebert and D. Schlessinger, MCBiol 3, 1501 (1983).
34. T. Gurney, NARes 13, 4905 (1985).
35. K. G. Miller and B. Sollner-Webb, Cell 27, 165 (1981).
36. I. Grummt, U. Maier, A. Öhrlein, N. Hassouna and J.-P. Bachellerie, Cell 43, 801 (1985).
37. A. Kuhn and I. Grummt, Genes Dev. 3, 224 (1989).
38. R. Savino and S. A. Gerbi, EMBO J. 9, 2299 (1990).
39. J. Klootwijk and R. J. Planta, in “Methods in Enzymology” (J. E. Dahlberg and J. N. Abelson, eds.), Vol. 180, p. 96. Academic Press, San Diego, 1989.
40. S. A. Udem and J. R. Warner, JBC 248, 1412 (1973).
41. G. M. Veldman, R. C. Brand, J. Klootwijk and R. J. Planta, NARes 9, 4847 (1981).
42. T. R. Cech, Science 236, 1532 (1987).
43. V. C. Ware, R. Renkawitz and S. A. Gerbi, NARes 13, 3581 (1985).
44. D. F. Spencer, J. C. Collins, M. N. Schnare and M. W. Gray, EMBO J. 6, 1063 (1987).
45. J. Engberg, H. Nielsen, G. Lenaers, O. Murayama, H. Fujitani and T. Higashinakagawa, J. Mol. Evol. 30, 514 (1990).
46. W. Musters, J. Venema, G. van der Linder, H. van Heerikhuizen, J. Klootwijk and R. J. Planta, MCBiol 9, 551 (1989).
47. W. Musters, K. Boon, C. A. F. M. van der Sande, H. van Heerikhuizen and R. J. Planta, EMBO J. 9, 3989 (1990).
48. W. Musters, R. J. Planta, H. van Heerikhuizen and H. A. Raué, in “The Ribosome: Structure, Function, & Evolution” (W. E. Hill, A. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 435. American Society for Microbiology, Washington, D.C., 1990.
49. R. Sweeney, L. Chen and M.-C. Yao, MCBiol 13, 4814 (1993).
50. M. Willems, P. Penman and S. Penman, J. Cell Biol. 41, 177 (1969).
51. B. B. Stoyanova and A. A. Hadjiolov, EJB 96, 349 (1979).
52. E. H. Nikolov, B. B. Nankova and M. D. Dabeva, Mol. Biol. Rep. 15, 45 (1991).
53. S. Kass, N. Craig and B. Sollner-Webb, MCBiol 7, 2891 (1987).
54. N. Craig, S. Kass and B. Sollner-Webb, PNAS 84, 629 (1987).
55. S. Kass, K. Tyc, J. A. Steitz and B. Sollner-Webb, Cell 60, 897 (1990).
56. J. M. X. Hughes and M. Ares, EMBO J. 10, 4231 (1991).
57. M. Beltrame and D. Tollervey, EMBO J. 11, 1531 (1992).
58. J. Sutiphong, C. Matzura and E. G. Niles, Bchem 23, 6319 (1984).
59. E. B. Mougey, L. K. Pape and B. Sollner-Webb, MCBiol 13, 5990 (1993).
60. R. Savino and S. A. Gerbi, Biochimie 73, 805 (1991).
61. G. J. Hannon, P. A. Maroney, A. Branch, B. J. Benenfield, H. D. Robertson and T. W. Nilsen, MCBiol 9, 4422 (1989).
62. C. M. Shumard, C. Torres and D. C. Eichler, MCBiol 10, 3868 (1990).
63. A. Stevens, C. L. Hsu, K. R. Isham and F. W. Larimer, J. Bact. 173, 7024 (1991).
64. K. Shuai and J. R. Warner, NARes 19, 5059 (1991).
65. L. Lindahl, R. H. Archer and J. M. Zengel, NARes 20, 295 (1992).
66. S. Chu, R. H. Archer, J. M. Zengel and L. Lindahl, PNAS 91, 659 (1994).
67. E. O. Long and I. B. Dawid, JMB 138, 873 (1980).
68. M. E. Schmitt and D. A. Clayton, MCBiol 13, in press (1993).
69. M. Kenna, A. Stevens, M. McCammon and M. G. Douglas, MCBiol 13, 341 (1993).
70. Y. Henry, H. Wood, J. Morrissey, E. Petfalski, S. Kearsey and D. Tollervey, EMBO J. 13, 2452 (1994).
71. A. E. Kempers-Veenstra, J. Oliemans, H. Offenberg, A. F. Dekker, P. W. Piper, R. J. Planta and J. Klootwijk, EMBO J. 5, 2703 (1986).
72. P. Labhart and R. H. Reeder, Genes Dev. 4, 269 (1990).
73. M. T. Yip and M. J. Holland, JBC 264, 4045 (1989).
74. P. W. Piper, J. A. Bellatin and A. Lockheart, EMBO J. 2, 353 (1983).
75. C. A. F. M. van der Sande, T. Kulkens, A. B. Kramer, I. J. de Wijs, H. van Heerikhuizen, J. Klootwijk and R. J. Planta, NARes 17, 9127 (1989).
76. A. Kuhn, I. Bartsch and I. Grummt, Nature 334, 559 (1990).
77. D. Tollervey, JMB 196, 355 (1987).
78. R. L. Maser and J. P. Calvet, PNAS 86, 6523 (1989).
79. I. L. Stroke and A. M. Weiner, JMB 210, 497 (1989).
80. K. Tyc and J. A. Steitz, NARes 20, 5375 (1992).
81. N. Craig, S. Kass and B. Sollner-Webb, MCBiol 11, 458 (1991).
82. Q. Trinh-Rohlik and E. S. Maxwell, NARes 16, 6041 (1988).
83. J. P. Morrissey and D. Tollervey, MCBiol 13, 2469 (1993).
84. H. V. Li, J. Zagorski and M. J. Fournier, MCBiol 10, 1145 (1990).
85. R. D. Leverette, M. T. Andrews and E. S. Maxwell, Cell 71, 1215 (1992).
86. K. T. Tycowski, M.-D. Shu and J. A. Steitz, Genes Dev. 7, 1176 (1993).
87. P. Fragapane, S. Prislei, A. Michienzi, E. Caffarelli and I. Bozzoni, EMBO J. 12, 2921 (1993).
88. T. Kiss and W. Filipowicz, EMBO J. 12, 2913 (1993).
89. B. Séraphin, TIBS 18, 330 (1993).
90. R. A. Padgett, P. J. Grabowski, M. M. Konarska, S. Seiler and P. A. Sharp, ARB 55, 1119 (1986).
91. M. C. Liau, N. C. Craig and R. P. Perry, BBA 169, 196 (1968).
92. C. Denoya, P. Costa-Giomi, E. A. Schodeller, C. Vasquez and J. L. LaTorre, EJB 115, 375 (1981).
93. I. Winicov and R. P. Perry, Bchem 13, 2908 (1974).
94. M. Nashimoto, K. Ogata and Y. Mishima, J. Biochem. 103, 992 (1988).
95. Y.-T. Yu and T. W. Nilsen, JBC 267, 9264 (1992).
96. D. C. Eichler and S. J. Eales, JBC 257, 14384 (1982).
97. D. C. Eichler and S. J. Eales, JBC 258, 10049 (1983).
98. C. M. Shumard and D. C. Eichler, JBC 263, 19346 (1988).
99. L. S. h a t e r and D. C. Eichler, Bchem 23, 4367 (1984).
100. D. C. Eichler and S. J. Eales, Bchem 24, 686 (1985).
101. D. A. Clayton, TIBS 16, 107 (1991).
102. D. D. Chang and D. A. Clayton, Science 235, 1178 (1987).
103. D. D. Chang and D. A. Clayton, Cell 56, 131 (1989).
104. J. N. Topper and D. A. Clayton, NARes 18, 488 (1990).
105. Y. Yuan, R. Singh and R. Reddy, JBC 264, 14835 (1989).
106. M. E. Schmitt and D. Clayton, Genes Dev. 6, 1975 (1992).
107. T. Kiss and W. Filipowicz, Cell 70, 11 (1992).
108. Y. Yuan, E. Tan and R. Reddy, MCBiol 11, 5266 (1991).
108b. P. Linder, P. F. Lasko, M. Ashburner, P. Leroy, P. J. Nielson, K. Nishi, J. Schnier and P. P. Slonimski, Nature 337, 121 (1989).
109. T. L. Ripmaster, G. P. Vaughn and J. L. Woolford, PNAS 89, 11131 (1992).
110. A. B. Sachs and R. W. Davis, Science 247, 1077 (1990).
111. J.-P. Girard, H. Lehtonen, M. Caizergues-Ferrer, F. Amalric, D. Tollervey and B. Lapeyre, EMBO J. 11, 673 (1992).
112. M. E. Christensen, A. L. Beyer, B. Walker and W. M. LeStourgeon, BBRC 74, 621 (1977).
113. J. P. Aris and G. Blobel, J. Cell Biol. 107, 17 (1988).
114. T. Schimmang, D. Tollervey, H. Kern, R. Frank and E. C. Hurt, EMBO J. 8, 4015 (1989).
115. R. P. Jansen, E. C. Hurt, H. Kern, H. Lehtonen, M. Carmo-Fonseca, B. Lapeyre and D. Tollervey, J. Cell Biol. 113, 715 (1991).
116. D. Tollervey, H. Lehtonen, M. Carmo-Fonseca and E. C. Hurt, EMBO J. 10, 573 (1991).
117. D. Tollervey, H. Lehtonen, R. Jansen, H. Kern and E. C. Hurt, Cell 72, 443 (1993).
118. M. W. Clark, M. L. R. Yip, J. Campbell and J. Abelson, J. Cell Biol. 111, 1741 (1990).
119. I. D. Russell and D. Tollervey, J. Cell Biol. 119, 737 (1992).
120. W.-C. Lee, D. Zabetakis and T. Mélèse, MCBiol 12, 3865 (1992).
121. R. A. Borer, C. F. Lehner, H. M. Eppenberger and E. A. Nigg, Cell 56, 379 (1989).
122. M. O. J. Olson, in “The Eukaryotic Nucleus: Molecular Biochemistry and Macromolecular Assemblies” (P. R. Strauss and S. H. Wilson, eds.), Telford, Caldwell, New Jersey, 1990.
123. M. Sapp, A. Richter, K. Weisshart, M. Caizergues-Ferrer, F. Amalric, M. O. Wallace, M. N. Kirstein and M. O. J. Olson, EJB 179, 541 (1989).
124. K. Sipos and M. O. J. Olson, BBRC 177, 673 (1991).
125. B. Lapeyre, H. Bourbon and F. Amalric, PNAS 84, 1472 (1987).
126. A. H. Herrera and M. O. J. Olson, Bchem 25, 6258 (1986).
127. M. Sapp, R. Knippers and A. Richter, NARes 14, 6803 (1986).
128. J.-H. Chang and M. O. J. Olson, JBC 265, 18227 (1990).
129. T. S. Dumbar, G. A. Gentry and M. O. J. Olson, Bchem 28, 9495 (1989).
130. R. Jansen, D. Tollervey and E. C. Hurt, EMBO J. 12, 2549 (1993).
131. T. Kadowaki, D. Goldfarb, L. M. Spitz, A. M. Tartakoff and M. Ohno, EMBO J. 12, 2929 (1993).
132. D. M. Pardoll and B. Vogelstein, Exp. Cell Res. 128, 466 (1980).
133. D. A. Jackson, S. J. McCready and P. R. Cook, Nature 292, 552 (1981).
134. A. H. Davis, T. L. Reudelhuber and W. T. Garrard, JMB 167, 133 (1983).
135. Y. Shiomi, J. Powers, R. I. Bolla, T. V. Nguyen and D. Schlessinger, Bchem 25, 5745 (1986).
136. E. Stephanova, R. Stancheva and Z. Avramova, Chromosoma 102, 287 (1993).
137. M. O. J. Olson, M. O. Wallace, A. H. Herrera, L. Marshall-Carlson and R. C. Hunt, Bchem 25, 484 (1986).
138. H. C. Smith and L. I. Rothblum, Biochem. Genet. 25, 863 (1987).
139. R. L. Ochs and K. Smetana, Exp. Cell Res. 197, 183 (1991).
140. F. Puvion-Dutilleul, J.-P. Bachellerie and E. Puvion, Chromosoma 100, 395 (1991).
141. M. Thiry and G. Goessens, J. Cell Sci. 99, 759 (1991).
142. Y. Mishima, T. Mitsuma and K. Ogata, EMBO J. 4, 3879 (1985).
143. E. B. Mougey, M. O'Reilly, Y. Osheim, O. L. Miller, A. Beyer and B. Sollner-Webb, Genes Dev. 7, 1609 (1993).
144. T. C. King, R. Sirdeshmukh and D. Schlessinger, Microbiol. Rev. 50, 428 (1986).
145. V. B. Vance, E. A. Thompson and L. H. Bowman, NARes 13, 7499 (1985).
146. B. Michot and J.-P. Bachellerie, EJB 195, 601 (1991).
147. L.-C. C. Yeh and J. C. Lee, JMB 211, 699 (1990).
148. L.-C. C. Yeh, R. Thweatt and J. C. Lee, Bchem 29, 5911 (1990).
149. L.-C. C. Yeh and J. C. Lee, JMB 217, 649 (1991).
150. L.-C. C. Yeh and J. C. Lee, JMB 226, 827 (1992).
151. R. Sweeney and M.-C. Yao, EMBO J. 8, 933 (1989).
152. C. A. F. M. van der Sande, M. Kwa, R. W. van Nues, H. van Heerikhuizen, H. A. Raué and R. J. Planta, JMB 223, 899 (1992).
153. Y. Nogi, R. Yano and M. Nomura, PNAS 88, 3962 (1991).
153b. R. W. van Nues, J. Venema, R. J. Planta and H. A. Raué, in “The Transcriptional Apparatus” (K. H. Nierhaus, A. R. Subramanian, V. A. Erdmann, F. Franceschi and B. Wittmann-Liebold, eds.), p. 151. Plenum, New York, 1993.
153c. R. W. van Nues, R. J. Planta and H. A. Raué, personal communication.
154. S. Kass and B. Sollner-Webb, MCBiol 10, 4920 (1990).
155. J.-H. Alix, Microbiol. Rev. 46, 281 (1982).
156. B. E. H. Maden, JMB 201, 289 (1988).
157. B. E. H. Maden and M. Salim, JMB 88, 133 (1974).
158. B. E. H. Maden, Nature 288, 293 (1980).
159. M. S. N. Khan, M. Salim and B. E. H. Maden, BJ 169, 531 (1978).
160. M. H. Vaughn, R. Soeiro, J. R. Warner and J. E. Darnell, PNAS 58, 1527 (1967).
161. M. C. Liau, D. W. Smith and R. B. Hurlbert, Cancer Res. 35, 2340 (1975).
162. L.-T. Wen and K. Tsukada, BBA 741, 153 (1983).
163. P. F. Swann, A. C. Peacock and S. Bunting, BJ 150, 335 (1975).
164. S. F. Wolf and D. Schlessinger, Bchem 16, 2783 (1977).
165. M. Caboche and J.-P. Bachellerie, EJB 74, 19 (1977).
166. T. W. Long, H. Termka and K. Tsukada, BBA 741, 29 (1983).
167. M. Ollara, H. Hirano and K. Higashi, Bchem 21, 1374 (1982).
168. D. C. Eichler, N. K. Raljer, C. M. Shumard and S. J. Eales, Bchem 26, 1639 (1987).
169. D. C. Eichler and S. J. Eales, BBRC 155, 530 (1988).
170. D. M. Segal and D. C. Eichler, JBC 266, 24385 (1991).
171. J. Engberg and H. Nielsen, NARes 18, 6915 (1990).
172. B. M. Tyler and N. H. Giles, NARes 13, 4311 (1985).
173. M. L. Sogin, K. Miolto and L. Miller, NARes 14, 9540 (1986).
174. C. Chambers, S. K. Dutta and R. J. Crouch, Gene 44, 159 (1986).
175. P. M. Ajuh, P. A. Heeney and B. E. H. Maden, Proc. R. Soc. London, Ser. B 245, 65 (1991).
176. H. Bourbon, B. Michot, N. Hassouna, J. Feliu and J.-P. Bachellerie, DNA 7, 181 (1988).
177. I. L. Gonzalez, C. Chambers, J. L. Gorski, D. Stimbolian, R. D. Schmickel and J. E. Sylvester, JMB 212, 27 (1990).
178. E. A. Ruff, O. J. Rimoldi, B. Raghu and G. L. Eliceiri, PNAS 90, 635 (1993).
179. O. J. Rimoldi, B. Raghu, M. K. Nag and G. L. Eliceiri, MCBiol 13, 4382 (1993).
180. N. K. Nag, T. T. Thai, E. A. Ruff, N. Selvamurugan, M. Kunnimalaiyaan and G. L. Eliceiri, PNAS 90, 9001 (1993).
181. G. R. Fabian and A. K. Hopper, J. Bact. 169, 1571 (1987).
182. G. R. Fabian, S. M. Hess and A. K. Hopper, Genetics 124, 497 (1990).
183. A. K. Hopper, F. Banks and V. Evangelidis, Cell 14, 211 (1978).
184. H. M. Traglia, N. S. Atkinson and A. K. Hopper, MCBiol 9, 2989 (1989).
185. K.-S. Tung, L. L. Norbeck, S. L. Nolan, N. S. Atkinson and A. K. Hopper, MCBiol 12, 2673 (1992).


Adenylyl Cyclases: A Heterogeneous Class of ATP-Utilizing Enzymes

OCTAVIAN BÂRZU¹ AND ANTOINE DANCHIN¹

Institut Pasteur, 75724 Paris, France

I. Adenylyl Cyclases of Gram-Negative Facultative Anaerobes . . . 242
   A. Early Biochemical Studies . . . 242
   B. Cloning of the cya Genes and Common Features in the Sequences of the Proteins . . . 243
   C. Regulation of Adenylyl-Cyclase Activity of Gram-Negative Facultative Anaerobes . . . 246
II. The Calmodulin-Activated Bacterial Toxic Adenylyl Cyclases . . . 251
   A. Bordetella pertussis Adenylyl Cyclase . . . 251
      1. General Features of the Protein . . . 251
      2. Cloning of the cya Gene and the Derived Primary Structure of B. pertussis Adenylyl Cyclase . . . 252
      3. The Catalytic Domain of B. pertussis Adenylyl Cyclase . . . 253
      4. The Hemolysin Domain of B. pertussis Adenylyl Cyclase . . . 257
   B. Bacillus anthracis Adenylyl Cyclase . . . 259
III. Class III Adenylyl Cyclases . . . 261
IV. Similarity of Adenylyl and Guanylyl Cyclases . . . 267
V. Evolution of Adenylyl Cyclases . . . 271
   A. Gram-Negative Facultative Anaerobes . . . 271
   B. Bacterial Pathogens . . . 272
   C. The Class III Catalytic Center . . . 273
VI. Are Adenylyl Cyclases Pulse-Generating Enzymes? . . . 275
VII. Glossary . . . 276
References . . . 277

Cyclic AMP (adenosine 3′,5′-cyclic monophosphate, cAMP), discovered some 37 years ago in eukaryotes as a key element of hormone-dependent metabolic control, is a universally used signal molecule. This ubiquity accounts for the interest in, and the ever-growing number of studies devoted to, the enzymes and regulatory factors that contribute to the synthesis or degradation of cAMP. The catalysts responsible for the synthesis of cAMP from ATP, the adenylyl cyclases, represent a very low proportion of cellular proteins. This explains the difficulties encountered in their isolation and

¹ Correspondence may be addressed to either author.

Progress in Nucleic Acid Research and Molecular Biology, Vol. 49

OCTAVIAN BÂRZU AND ANTOINE DANCHIN

characterization, as well as the heterogeneity of the protein species isolated from various sources or by different laboratories. The association of adenylyl cyclases with membranes or other cellular components often requires the use of detergents or dissociating agents that affect protein stability and, ultimately, activity. Since the pioneering work of Sutherland, it has been observed that adenylyl cyclases from eukaryotes are activated or inhibited by a variety of ligands, an observation since extended to most of the enzymes identified and characterized. The molecular mechanism of adenylyl-cyclase action and regulation is far from completely understood, partly because the three-dimensional structure of the protein is unknown. However, the general principles governing the interaction of adenylyl cyclases with different ligands, and the organization of the active enzyme in a membrane-bound multi-subunit system, seem fairly well established. An essential step toward understanding structure-function-evolution relationships in the family of adenylyl cyclases was the cloning of the corresponding genes, from bacteria to higher vertebrates, including humans. Revealing features of the proteins, inferred from the predicted polypeptide sequences, showed that all cyclases, apart from their rather large size, are constructed in a modular fashion (see 1-5 for recent reviews). In some instances, modules carrying specific functions have been isolated as independent entities and shown to have properties similar to those found in the intact enzymes. This modular construction of cyclases allows fine regulation of their activity according to environmental or cellular conditions.
Adenylyl cyclases have been divided into three classes according to the features they share in their polypeptide sequences (1): (i) cyclases present in Gram-negative facultative anaerobes, best represented by the Escherichia coli adenylyl cyclase; (ii) toxic adenylyl cyclases isolated from bacterial pathogens such as Bacillus anthracis and Bordetella pertussis; and (iii) a large class that includes cyclases from eukaryotes as well as from prokaryotes, whose origin might predate the phyletic separation of these groups.

I. Adenylyl Cyclases of Gram-Negative Facultative Anaerobes

A. Early Biochemical Studies

The occurrence of cAMP in E. coli and Brevibacterium liquefaciens was first reported in 1963 (6, 7). Two years later, Makman and Sutherland showed that the concentration of cAMP in E. coli varies over three orders of magnitude as a function of growth conditions (8). In the presence of glucose, it decreases to 0.1 µM. Upon transfer of bacterial cells into glucose-free

MOLECULAR HETEROGENEITY OF ADENYLYL CYCLASES

medium, the concentration of cAMP rises to 0.1 mM. The same year, a tentative purification of adenylyl cyclase from the soluble fraction of B. liquefaciens was reported (9). The main observation, however, was that pyruvate is required for full activity. This property was later found to be characteristic of other bacteria, such as Micrococcus species and Arthrobacter (10). Subsequently, a pyruvate-independent adenylyl cyclase from E. coli was reported (11-13). Unlike the enzyme from B. liquefaciens, the E. coli preparations were partially particulate and easily solubilized by buffer extraction. NaF, inorganic pyrophosphate, several nucleoside triphosphates, and pyridoxal phosphate acted as inhibitors of the E. coli enzyme. Although the specific activity of the E. coli adenylyl cyclase reported by Tao and Lipmann (13) was less than 1 nmol/min/mg of protein (purification factor of about 100), it allowed estimation of the mass of the enzyme by gel permeation chromatography (about 110 kDa). The first preparation of bacterial adenylyl cyclase to be judged homogeneous by polyacrylamide gel electrophoresis (PAGE) was that from B. liquefaciens [a specific activity of 30 µmol/min/mg of protein for a mass of 46 kDa/monomer (14)]. Only 9 years later was adenylyl cyclase from E. coli purified to near homogeneity (~70% pure) (15). From the purification factor (×17,000), it was estimated that only about 15 molecules of the enzyme occur in a wild-type E. coli cell. The low yield (2%) was due to the instability of the enzyme during purification. Escherichia coli adenylyl cyclase has a low specific activity (700 nmol/min/mg of protein, corresponding to a turnover of about 100 min⁻¹) (15) and a mass between 92 kDa (SDS-PAGE) and 95 kDa (gel permeation chromatography). These values agree fairly well with the mass calculated later from the sequence of the corresponding gene.
Digestion of overexpressed E. coli adenylyl cyclase allowed isolation of a 30-kDa fragment that exhibited a specific activity of 435 nmol/min/mg of protein (turnover of about 13 min⁻¹) (16). Unlike the native adenylyl cyclase, the trypsin-released catalytic domain of the bacterial enzyme could be stored indefinitely as a frozen solution with no detectable loss of catalytic activity. Attempts to obtain even minute amounts of pure adenylyl cyclase from other bacterial sources, except from the Gram-positive organisms B. liquefaciens and Streptococcus salivarius, failed (14, 17). With the advent of recombinant DNA technology, it was expected that cloning of the cya genes and expression of the proteins in different hosts would facilitate protein purification for further biochemical studies.
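The specific activities and turnover numbers quoted above are related by simple unit arithmetic: the activity per milligram of enzyme, divided by the number of nanomoles of enzyme in one milligram, gives molecules of cAMP formed per enzyme molecule per minute. The Python sketch below is a minimal illustration under stated assumptions: every enzyme molecule in the preparation is taken to be active, the stated mass is taken to be that of one catalytic unit, and the optional `purity` correction is our own guess at how a figure for a partially pure preparation might be adjusted, not necessarily the calculation used in (15) or (16). The helper name `turnover_per_min` is hypothetical.

```python
def turnover_per_min(specific_activity, mass_kda, purity=1.0):
    """Estimate a turnover number (mol cAMP per mol enzyme per minute).

    specific_activity: nmol of cAMP formed per minute per mg of protein.
    mass_kda: mass of one enzyme molecule, in kDa.
    purity: assumed fraction of the protein that is active enzyme
            (1.0 means a homogeneous, fully active preparation).
    """
    if mass_kda <= 0 or not (0 < purity <= 1):
        raise ValueError("mass must be positive and purity in (0, 1]")
    # 1 mg of a protein of mass_kda kDa (= mass_kda * 1000 g/mol) contains
    # 1e-3 g / (mass_kda * 1e3 g/mol) = 1e-6 / mass_kda mol = 1000 / mass_kda nmol.
    nmol_enzyme_per_mg = 1000.0 / mass_kda
    # Activity per mg of actual enzyme, divided by nmol of enzyme per mg.
    return (specific_activity / purity) / nmol_enzyme_per_mg


if __name__ == "__main__":
    # Native E. coli enzyme: 700 nmol/min/mg, about 95 kDa, ~70% pure.
    print(turnover_per_min(700, 95))              # no purity correction
    print(turnover_per_min(700, 95, purity=0.7))  # corrected for ~70% purity
    # Tryptic 30-kDa catalytic fragment: 435 nmol/min/mg.
    print(turnover_per_min(435, 30))
```

Without a purity correction the native enzyme comes out at about 66.5 min⁻¹; dividing by the ~70% purity gives about 95 min⁻¹, close to the "about 100 min⁻¹" quoted, and the 30-kDa fragment gives about 13 min⁻¹, matching the text.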

B. Cloning of the cya Genes and Common Features in the Sequences of the Proteins

The first genes recognized as encoding adenylyl cyclases were cloned in 1981. Wang et al. (18), using phage λgt4 as a cloning vector, isolated an
EcoRI fragment of Salmonella typhimurium DNA that complemented an E. coli strain lacking adenylyl cyclase. The DNA fragment encoded a protein of 81 kDa whose cellular distribution matched that of adenylyl-cyclase activity. An intriguing observation, subsequently explained by the multiple controls of enzyme activity, was that adenylyl-cyclase overproduction was not followed by a similar increase in the intracellular concentration of cAMP. In fact, the cloned DNA fragment was a partial gene sequence encoding a truncated protein that was susceptible to proteolysis and lacked normal activity regulation. Roy and Danchin (19) identified the complete E. coli K12 adenylyl-cyclase gene in the Clarke and Carbon plasmid library and subcloned it into plasmid pBR322. It expressed a 95-kDa protein that could complement a Δcya mutation (see Section VII for Glossary) and restore normal glucose-mediated regulation. Interestingly, a protein truncated at the carboxy-terminal end could still complement a Δcya mutation, explaining why the truncated gene product from S. typhimurium was active. In addition to establishing the physical map of the region, these authors identified neighboring genes, the promoter region, and the direction of transcription (20). When the E. coli cya gene was fused with the lacZ gene, a hybrid bifunctional protein exhibiting both adenylyl-cyclase and β-galactosidase activities was expressed (21). Antibodies raised against the hybrid protein recognized wild-type adenylyl cyclase in bacterial extracts. The complete nucleotide sequence of the cya gene from E. coli has been determined (22). The gene encodes a protein of 848 amino-acid residues. The deduced, slightly acidic character of the protein is consistent with the pI of 6.1 determined experimentally on the wild-type protein purified from bacteria.
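Statements about the charge of a deduced protein, such as its "slightly acidic character" or the net charge of one of its domains, rest at their simplest on counting charged residues in the predicted sequence. The Python sketch below assumes that near pH 7 only Lys and Arg carry +1 and only Asp and Glu carry −1, ignoring His, the chain termini and any effect of folding; it is a rough count, not the method behind the experimental pI of 6.1. The function names are hypothetical, the domain boundary is passed as a parameter (395 for the catalytic region of E. coli adenylyl cyclase, as described in the analysis that follows), and since the sequence is not reproduced here the example uses a made-up peptide.

```python
# Simplified charge model (an assumption): at about pH 7, Lys/Arg count as +1
# and Asp/Glu as -1; His, Cys, Tyr and the chain termini are ignored.
POSITIVE = set("KR")
NEGATIVE = set("DE")


def approximate_net_charge(sequence):
    """Rough net charge of a one-letter protein sequence near pH 7."""
    seq = sequence.upper()
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def split_at(sequence, boundary):
    """Split into residues 1..boundary and boundary+1..end (1-based numbering)."""
    return sequence[:boundary], sequence[boundary:]


if __name__ == "__main__":
    peptide = "MKDEARLEGK"  # made-up peptide, not an adenylyl-cyclase sequence
    n_part, c_part = split_at(peptide, 4)
    print(approximate_net_charge(peptide))
    print(approximate_net_charge(n_part), approximate_net_charge(c_part))
```

For the full 848-residue sequence one would call `split_at(sequence, 395)` to separate the catalytic region from residues 396-848 before counting.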
The protein is rich in cysteine (a generally uncommon feature for proteins located in the cytoplasm or at the cytoplasmic face of the membrane) and in histidine residues. The sensitivity of cysteine to oxidation (15) might explain the low purification yield of E. coli adenylyl cyclase and the low level of activity found in preparations obtained from overexpression vectors (23). The relatively high number of histidine residues might suggest, on the other hand, that metal ions take part in the folding or activity of the polypeptide chain. Predictive methods for secondary-structure analysis indicated that a domain corresponding to the first 395 amino-acid residues, which carries the catalytic site, is rich in α-helix (42.5%). The rest of the molecule (residues 396-848) has only 19% α-helix content and is essentially electrically neutral (net charge of −1). The hydropathy profile of adenylyl cyclase did not reveal features typical of a membrane-bound protein. Comparison of the domain carrying the catalytic site of E. coli adenylyl cyclase with known sequences in protein data libraries did not reveal significant identities with other families of proteins. A weak similarity with the
subunit of membrane ATPases becomes apparent when the threshold for comparison is lowered. The first 112 residues of the adenylyl cyclase bear some similarity to pyrophosphate-binding proteins (for example, tRNA synthetases), perhaps indicating that the very amino terminus of the protein is involved in pyrophosphate binding. However, these similarities are too weak to demonstrate conclusively that this class of adenylyl cyclases derives from an ancestral ATP-binding protein. Cloning of the cya gene of E. coli permitted several observations. (a) Plasmids carrying part of the 5′ region of the cya gene fused to the lacZ gene are expressed in E. coli, demonstrating that the N-terminus of adenylyl cyclase carries the catalytic domain, whereas the carboxyl terminus corresponds to a regulatory domain (24, 25). (b) Measurement of β-galactosidase activity in vivo, from the cya-lacZ fusion, permitted evaluation of the actual cell content of adenylyl cyclase. A figure of about 400 molecules per cell was found, higher than that given by Yang and Epstein (15). (c) Other bacterial types were similarly analyzed for the structure, activity, and regulation of expression of the cya genes. To facilitate the cloning of cya genes from bacteria other than E. coli, restriction-minus, cya-deficient strains of E. coli were constructed and subsequently transformed and complemented by DNA libraries from foreign organisms (26). After identification and sequencing of the cya genes from E. coli and S. typhimurium, several other cya genes from Gram-negative organisms were analyzed (Erwinia chrysanthemi, Yersinia intermedia, Yersinia pestis, Proteus mirabilis, Aeromonas hydrophila, Pasteurella multocida, and Haemophilus influenzae). In enterobacteria, the organization of the genes at the cya locus and of the proteins is similar to that observed in E. coli (1).
In the other cases, the organization of the cya gene itself was preserved while its environment was altered (27). In enterobacteria, all major features of the E. coli gene and protein organization are conserved, including the number of promoters and the upstream and downstream genes (1). In particular, the unusual TTG initiation codon present in E. coli, together with a stretch of 18 nucleotides overlapping the ribosome-binding site, is almost invariant. Isolation of the gene from P. multocida showed that the gene itself is very similar to the enterobacterial type, but that the local gene organization is different, as is the 18-nucleotide segment overlapping the initiation codon (27). Similarly, the gene from H. influenzae displays significant similarities to its E. coli counterpart (28). More recently, a cyclase gene isolated from A. hydrophila was found to belong to the same class of enzymes (C. Vivarès, A. Sismeiro and A. Danchin, unpublished). Taken together, these data indicate that the regulation of protein activity (including glucose-mediated regulation) is probably preserved in this
class, whereas the expression of the gene may differ between Pasteurella, Haemophilus, or Aeromonas species and the Enterobacteriaceae. Several general features were noticed when comparing the amino-acid sequences of the proteins. In particular, no long stretch of hydrophobic amino-acid residues was found, although such a stretch would be expected from the reported membrane-bound localization of the proteins (13, 29). Considering the residues conserved between different members of the family, one finds that the three aromatic amino acids and leucine (taking into account the relative proportions of these residues in the sequence) are generally preserved, whereas no other strong conservation rule can be found for the other residues, including cysteine and histidine (Fig. 1). This suggests that a hydrophobic core plays a major role in the structural organization of the protein. Finally, the protein is composed of two functionally well-defined domains, discussed below.
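The conservation analysis described here, and the "=" (identical) / "+" (conservative replacement) annotation used in Fig. 1, can be illustrated with a short Python sketch. Which substitutions count as conservative is not specified in the chapter, so `CONSERVATIVE_GROUPS` below is a hypothetical grouping chosen only for illustration. The `enrichment` helper (also hypothetical) expresses how often a residue type occupies fully conserved columns relative to its overall frequency, one simple way of "taking into account the relative proportions" of residues. The aligned sequences in the example are made up.

```python
# Hypothetical grouping of "conservative" replacements (an assumption).
CONSERVATIVE_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def consensus_marks(aligned):
    """Mark each alignment column: '=' identical, '+' conservative, ' ' other.

    Columns containing a gap ('-') are left blank.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        if "-" in residues:
            marks.append(" ")
        elif len(residues) == 1:
            marks.append("=")
        elif any(residues <= set(group) for group in CONSERVATIVE_GROUPS):
            marks.append("+")
        else:
            marks.append(" ")
    return "".join(marks)


def enrichment(aligned, residue):
    """Share of fully conserved columns holding `residue`, divided by the
    residue's overall frequency among non-gap positions (values above 1
    suggest preferential conservation). Returns 0.0 if nothing to compare."""
    conserved = [col[0] for col in zip(*aligned)
                 if len(set(col)) == 1 and col[0] != "-"]
    residues = [aa for s in aligned for aa in s if aa != "-"]
    frequency = residues.count(residue) / len(residues) if residues else 0.0
    if not conserved or frequency == 0.0:
        return 0.0
    return (conserved.count(residue) / len(conserved)) / frequency


if __name__ == "__main__":
    toy = ["MYLYIE-K", "MYFYLEAR"]  # made-up sequences
    for row in toy:
        print(row)
    print(consensus_marks(toy))
    print(enrichment(toy, "Y"))
```

With real class I sequences one would pass the rows of the alignment instead; every "+" depends on the assumed grouping, whereas "=" marks do not.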

C. Regulation of Adenylyl-Cyclase Activity of Gram-Negative Facultative Anaerobes

Cyclic AMP is one of the mediators of catabolite repression in E. coli (30). Modulation of its concentration results from a complex interaction involving the glucose-specific transport system (31). As just discussed, the class of enterobacterial adenylyl cyclases consists of proteins comprising two functional domains. The amino-terminal moiety is responsible for catalytic activity, whereas the carboxy-terminal end is responsible for glucose-mediated inhibition of activity. The boundary between the catalytic and regulatory domains has yet to be identified exactly, but recent experiments using the E. coli gene indicate that a correctly folded catalytic domain can be formed by the polypeptide chain located upstream of aspartate residue 414 (31a). Adenylyl-cyclase activity can vary in vivo over a wide range. Truncated enzyme, fused to β-galactosidase, displays more activity than the native enzyme. It was further shown (31a), by analyzing revertants of a mutant carrying an asparagine at position 414 in place of the aspartate of the wild-type E. coli enzyme, that the revertants had truncated proteins with a fully expressed, unregulated level of activity. This is consistent with the hypothesis that the carboxy-terminal end acts, in conjunction with other factors, as a tonic inhibitor of activity, permitting, under appropriate conditions, the release of a powerful synthesizing capacity. As we shall see, this inhibition of catalytic activity by a domain of the same peptide chain seems to be a general feature, and is not a property only of the family of Gram-negative facultative anaerobe adenylyl cyclases (class I enzymes). What are the effectors of cyclase regulation? It has long been established that the complex cascade of the phosphoenolpyruvate-dependent
FIG. 1. Alignment of adenylyl-cyclase sequences (50 residues per line) from the enterobacterial class (class I). Identical residues are indicated by a = sign, and conservative replacements are shown by a + sign. ECOCYC, Escherichia coli (22); ECHCYC, Erwinia chrysanthemi (174); HINCYC, Haemophilus influenzae (28); PMUCYC, Pasteurella multocida (27); YINCYC, Yersinia intermedia (1); and YPECYC, Yersinia pestis (1).

phosphotransferase system (PTS) is involved in glucose-mediated regulation of adenylyl-cyclase activity (32). In this cascade, a phosphate group is transferred from phosphoenolpyruvate, the initial donor, to histidine residues of several proteins: enzyme I, HPr, and enzyme IIIGlc. Only the phosphorylated form of the last of these can act as a stimulating factor, the unphosphorylated form being inactive (31). The actual process of activation is unknown. However, in complementation experiments using adenylyl cyclase from heterologous organisms such as P. multocida, the E. coli enzyme IIIGlc is still effective for enzyme activation (27). A likely explanation is that
phosphorylated enzyme IIIGlc phosphorylates a residue of the regulatory domain of adenylyl cyclase, thereby relieving inhibition. The PTS phosphorylation cascade involves phosphohistidine residues. It is therefore tempting to assume that a histidine residue of cyclase is phosphorylated. However, cysteine residues can also be phosphorylated, as in enzyme IIMtl (33), as can aspartate residues in the phosphorylation cascade involving the so-called “two-component” regulators (34). Comparison of the different adenylyl cyclases of this class allowed identification of two conserved histidine residues, and it was proposed that one of them is in an environment that bears some resemblance to residues phosphorylated by enzyme IIIGlc (1). In addition, early experiments aimed at identifying phosphorylated proteins from E. coli failed to reveal a 95-kDa phosphoprotein such as would be expected for phosphorylated adenylyl cyclase (A. Danchin, unpublished). It should be stressed, however, that adenylyl cyclase is present in very small amounts in the cell and that its presumed phosphorylated form might be unstable, particularly if phosphorylation occurs on aspartate residues. Since, in the case of mutation D414N, only true revertants or complete deletions of the regulatory domain could be obtained, one may wonder whether this residue is involved in the putative phosphorylation process (or is even the phosphorylated residue). Comparison of sequence libraries with a consensus containing this residue revealed a significant similarity with a region of the TyrR repressor that bears some resemblance to a class of two-component effectors. This may indicate that this residue participates in an acidic pocket involved in phosphorylation. However, further work is necessary to substantiate this hypothesis. Additional observations indicate the complexity of the regulation of adenylyl-cyclase activity.
Several factors have been proposed to interact with adenylyl cyclase and to regulate its activity, in general by stimulating it. Cells deficient in the cAMP receptor, CAP, are known to overproduce the nucleotide 30- to 100-fold, and this cannot be accounted for by a stimulation of adenylyl-cyclase synthesis (see 30 for a review and 35 for an early tentative interpretation). A protein somewhat similar to RAS (which activates Saccharomyces cerevisiae cyclase) exists in E. coli, but its interaction with adenylyl cyclase has not been established (36). The elongation factor Tu, a GTP-binding protein, activates E. coli adenylyl cyclase (37). Although only a partially purified (10%) preparation of adenylyl cyclase was used, these experiments suggest that, as in most eukaryotes, the enterobacterial cyclase may be regulated by G-proteins (37). The same group has published experiments indicating that all three cytoplasmic proteins of the phosphoenolpyruvate-dependent PTS are also required for full activation, and that they participate in a phosphate-mediated control of activity (38).
This is consistent with the genetic observation (39), obtained by generating progressive deletions of the cytoplasmic phosphoenolpyruvate-dependent PTS genes, that enzyme IIIGlc alone cannot account for all of the PTS-mediated regulation of adenylyl-cyclase activity. Finally, although the stimulatory effect revealed by the absence of the cAMP receptor protein CAP appears to be mediated by enzyme IIIGlc (40), no information is available to demonstrate that the interaction is direct.

II. The Calmodulin-Activated Bacterial Toxic Adenylyl Cyclases

Two taxonomically unrelated pathogens, Bordetella pertussis and Bacillus anthracis, share a property not seen in any other bacterial adenylyl cyclase: activation by calmodulin (CaM), a protein either absent from bacteria or present only as a very distant relative of eukaryotic CaM (41, 42). The extracellularly released adenylyl cyclases are internalized by host eukaryotic cells. Upon entry into the host-cell cytoplasm, the B. pertussis and B. anthracis enzymes interact with target-cell CaM, causing unregulated synthesis of cAMP and impairment of cellular functions. Immunological cross-reactivity of these two bacterial adenylyl cyclases at first suggested closely related structures. However, cloning of the cya genes from B. pertussis and B. anthracis revealed important differences in their sizes and very little overall sequence similarity. Despite these differences, the locations of the ATP- and CaM-binding sites, as well as the mechanisms of activation of these two bacterial enzymes, seem to be similar.

A. Bordetella pertussis Adenylyl Cyclase

1. GENERAL FEATURES OF THE PROTEIN

Bordetella pertussis, a Gram-negative organism, is the causative agent of whooping cough. It secretes several toxic proteins into the culture medium. As early as 1973, an adenylyl-cyclase activity was recognized among these proteins (43). The first attempt to purify the enzyme from culture supernatants was reported 3 years later (44). Only in 1980 (42) was a most unexpected property of this enzyme, namely, activation by CaM, observed. The molecular heterogeneity of B. pertussis adenylyl-cyclase preparations obtained in different laboratories was somewhat puzzling. This heterogeneity concerned both the size of the protein and its capacity to enter animal cells (45-47). Gel permeation chromatography of culture supernatants as well as “urea” extracts of B. pertussis indicated the existence of “low” and
“high”-molecular-mass forms of adenylyl cyclase. The smaller species corresponded to masses between 45 and 60 kDa, whereas the higher-mass species had more variable sizes, ranging from 120 to 700 kDa. Shattuck et al. (48) obtained a low-mass form of B. pertussis adenylyl cyclase whose specific activity was much higher than that originally suggested (44). Ladant et al. (49) purified “low”-mass extracellular adenylyl cyclase to near homogeneity by taking advantage of the fact that the CaM-complexed enzyme, unlike free adenylyl cyclase, binds to DEAE-Sepharose at neutral pH. An important breakthrough in the purification of B. pertussis adenylyl cyclase was the use of affinity chromatography on agarose-immobilized CaM followed by elution with 8 M urea. To loosen the interaction between adenylyl cyclase and CaM, an engineered CaM with lower association constants for target proteins (VU-8 CaM) was used for enzyme purification (50). VU-8 CaM, in which three glutamic acid residues present in the wild-type counterpart (residues 82-84) were replaced by three lysine residues, had 10⁻³ of the affinity of vertebrate CaM for B. pertussis adenylyl cyclase. As a result, the bacterial cyclase, once adsorbed on a VU-8 CaM-agarose matrix, could be eluted with 2 mM EGTA in Tris-HCl buffer. The specific activity of a homogeneous preparation (43 kDa) of adenylyl cyclase was 2400 µmol/min/mg of protein. The amino-acid composition of this preparation agreed with that deduced later from the first 400 amino acids of B. pertussis adenylyl cyclase (51, 52). The origin of the heterogeneity of B. pertussis adenylyl-cyclase preparations remained unexplained until the corresponding gene was cloned and sequenced (53).

2. CLONING OF THE cya GENE AND THE DEDUCED PRIMARY STRUCTURE OF B. pertussis ADENYLYL CYCLASE

In spite of numerous attempts, the cya gene from B. pertussis was not cloned until 1988. This can be explained a posteriori: the enzyme is inactive in bacteria that lack CaM-like activity, such as those used to screen DNA libraries by direct complementation of a cya defect. Introducing a plasmid directing the synthesis of calmodulin into an E. coli strain deficient in adenylyl-cyclase activity allowed the cloning of the B. pertussis adenylyl-cyclase gene. Sequencing of the gene showed that B. pertussis adenylyl cyclase is synthesized as a polypeptide of 1706 residues (52). The fluctuation in biochemical data was easily accounted for once it was demonstrated that the N-terminal segment of the protein displays calmodulin-activated ATP-cyclizing activity, while the rest of the molecule is responsible for the hemolytic activity of the pathogen (52-56). The larger form of the enzyme, purified from extracts of B. pertussis as a 200-kDa protein (46, 53, 56-61), was found most often in culture media as smaller

MOLECULAR HETEROGENEITY OF ADENYLYL CYCLASES


forms of 50, 45, or 43 kDa (44, 45, 49, 60-62). Nevertheless, the proteolytic release of catalytically active fragments does not seem to be a physiologically relevant feature, as was initially proposed (63). The toxic form of adenylyl cyclase corresponds, in fact, to the 200-kDa protein (46). Sequence analysis indicates that the cyclase catalytic domain is “fused” to a polypeptide similar to E. coli hemolysin toxin (64). For this reason, the name “cyclolysin” was coined for the naturally occurring hybrid protein (57). Further analysis of the adjacent regions for functionally related genes led the same authors to discover that the protein is secreted through a mechanism specific to Gram-negative bacteria and similar to that involved in the extracellular release of E. coli hemolysin or cognate proteins in related organisms. Three genes located downstream of the cyaA (or cyclolysin) gene, cyaB, cyaD, and cyaE, were identified as being required for secretion. This contrasts with E. coli hemolysin, whose secretion was thought to require only two genes, hlyB and hlyD (65), a third gene, hlyC, being required only for activation of the protein (66). Weiss and co-workers later discovered, upstream of the cyaA gene, the counterpart of the E. coli hlyC gene, which, by analogy, they named cyaC (67). The gene order in the region is therefore cyaC, cyaA, cyaB, cyaD, cyaE. Molecular cloning of the B. pertussis cyaA gene has had numerous and far-reaching implications for the structure-function analysis of the protein. Because it is inactive in the absence of CaM, and therefore does not burden its E. coli host with cAMP, the B. pertussis adenylyl cyclase expressed in E. coli was the first ATP-cyclizing enzyme obtained in large amounts and amenable to biochemical studies. Deletions in the cya gene as well as site-directed mutagenesis yielded more than 100 variants of adenylyl cyclase differing in catalytic activity, affinity for ATP and CaM, invasiveness, and hemolytic activity. 
Biochemical, spectroscopic, and genetic analyses of B. pertussis adenylyl cyclase have yielded a substantial body of information on the molecular properties of the protein.

3. THE CATALYTIC DOMAIN OF B. pertussis ADENYLYL CYCLASE

In B. pertussis, the catalytic domain of adenylyl cyclase corresponds roughly to the first 400 amino acids of the protein, that is, to the low-mass forms isolated from culture supernatants or bacterial “urea” extracts. In fact, trypsin digestion of the entire protein yields the 43-kDa form of adenylyl cyclase with little or no loss of activity (63). The smallest form of the bacterial enzyme still expressing full catalytic activity corresponds to the first 385 amino acids. Removal of 12 or 42 additional residues from the carboxyl end of the catalytic domain yields proteins exhibiting 20% and 0.1%, respectively, of maximal catalytic activity (68).
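The specific activity quoted earlier for the homogeneous 43-kDa preparation (2400 µmol/min/mg) can be turned into an approximate turnover number, because a protein of M kDa weighs M mg per µmol. The Python sketch below assumes a fully active preparation with one active site per 43-kDa polypeptide; neither assumption is stated in the text, and the function name is illustrative only.

```python
def turnover_number(specific_activity_umol_min_mg: float, mass_kda: float) -> float:
    """Approximate kcat (per second) from a specific activity in µmol/min/mg.

    Assumes every polypeptide is active and carries a single active site.
    """
    # M kDa corresponds to M mg per µmol, so 1 mg of protein holds 1/M µmol of enzyme;
    # dividing the specific activity by that amount is the same as multiplying by M.
    per_minute = specific_activity_umol_min_mg * mass_kda
    return per_minute / 60.0


kcat = turnover_number(2400.0, 43.0)
print(f"kcat ~ {kcat:.0f} per second")  # kcat ~ 1720 per second
```

Under these assumptions each enzyme molecule converts on the order of 1.7 × 10³ ATP molecules per second; if part of the preparation were inactive, the true turnover number would be correspondingly higher.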


OCTAVIAN BÂRZU AND ANTOINE DANCHIN

A striking feature of B. pertussis adenylyl cyclase is its activation by CaM both in the presence and in the absence of calcium ions (69). This contrasts with most target enzymes complexed with CaM, whose complexes are readily dissociated by the calcium chelator EGTA. Although EGTA decreases the affinity of B. pertussis adenylyl cyclase for CaM by about two orders of magnitude (50), the maximum activity of the enzyme is increased by a factor of 2.5 compared with that observed in the presence of Ca2+. The first evidence of the complexity of the interaction between CaM and the catalytic domain of B. pertussis adenylyl cyclase came from limited-proteolysis experiments on adenylyl cyclase cross-linked with a photoactivatable derivative of CaM (49, 70). These experiments suggested that the CaM-binding domain of the bacterial adenylyl cyclase overlaps two complementary tryptic peptides of 25 kDa (residues 1-224) and 18 kDa (residues 225-399), the main CaM-binding sequence being located on the smaller tryptic fragment (71). Based on the deduced primary structure of adenylyl cyclase, it was proposed that the sequence DLLWKIARAGARSAVGTEK (residues 239-257) might correspond to the CaM-binding locus of the bacterial enzyme (52). In fact, a synthetic peptide of 20 amino acids (P235-254) corresponding to residues 235-254 of B. pertussis adenylate cyclase was found to bind CaM in a Ca2+-dependent manner (51). This peptide, although not homologous to any known CaM-binding sequence, was predicted to adopt a helical structure in solution, with the hydrophobic amino-acid residues segregated to one side of the helix and the basic hydrophilic residues to the other (55). The low affinity of peptide P235-254 for CaM (Kd = 580 nM) compared with that of the adenylyl-cyclase catalytic domain (0.2 nM), and the fact that high affinity for CaM is restored only when the two tryptic peptides are mixed together (71), raised the question whether the binding locus for CaM in B. 
pertussis adenylyl cyclase represents a single continuous sequence or whether multiple contacts involving residues located far apart in the primary structure contribute to the overall high affinity of the cyclase-CaM complex. Several approaches were used to answer this question: (a) partial deletion of the adenylyl-cyclase gene; (b) isolation of peptides of various sizes interacting with CaM with high affinity; (c) solid-phase synthesis of CaM-binding peptides; and (d) site-directed mutagenesis of residues, or clusters of residues, potentially involved in the interaction with CaM (72). From these approaches, it was concluded that the amino-terminal half of the catalytic domain of adenylyl cyclase contributes only 10% of the CaM-binding energy, whereas the remaining 90% corresponds to a stretch of 72 amino acids, from residues 196 to 267, overlapping the two tryptic fragments (Fig. 2) (68). The conformational properties of two peptides of 43 and 72 residues (P225-267 and P196-267, respectively) were recently studied by NMR spectroscopy.

FIG. 2. CaM-binding activity of truncated B. pertussis adenylyl cyclase. The percentage of the energy involved in CaM binding is indicated below a line on which is shown the position of the residue after which truncation has been made. In the right-hand column, the Kd for CaM binding is shown. [Schematic of truncated fragments; share of CaM-binding energy: residues 1-196, 11%; 196-225, 9%; 225-267, 80%; 267-399, 0%.]

The proton resonances were assigned using several 2-D techniques (COSY, TOCSY, and NOESY) and a standard methodology (72a,b). From the distribution of Cα-proton chemical shifts along the sequence, the elements of regular secondary structure were identified. Thus, in the 43-amino-acid peptide, a 15-amino-acid fragment (between L… and A…) populates the α-helical conformational state to a significant extent. The same fragment is also organized as an α-helix in the larger peptide. P196-267 also shows a second helical fragment, starting at V…. The two helices in P196-267 are significantly reinforced in a mixed solvent containing 30% (v/v) trifluoroethanol, suggesting that the corresponding fragment in the intact protein assumes a similar spatial conformation. Differences in the fluorescence properties of isolated peptides of the catalytic domain of adenylyl cyclase, each possessing a single tryptophan residue, favor the view that multiple contacts between enzyme and CaM play a role in forming the tightly bound complex (73). As the CaM-binding peptides were shortened, their affinity for CaM dropped, but the shift in the tryptophan fluorescence maximum and the increase in quantum yield upon addition of CaM became more pronounced. In this respect, only the γ-subunit of skeletal-muscle phosphorylase kinase behaves like B. pertussis adenylyl cyclase. Two noncontiguous domains in the phosphorylase kinase can interact with CaM


(74), although only one domain exhibits the structural features of CaM-binding peptides. The contribution of different segments of the catalytic domain of B. pertussis adenylyl cyclase to CaM binding was dissected further by site-directed mutagenesis. As expected, the hydrophobic amino-acid residues situated on one side of the amphipathic helical structure around W242 were much more critical to CaM binding than the basic residues situated on the opposite side of the α-helix. It also became clear that the “atypical” distribution of hydrophilic residues, in which basic amino acids alternate with acidic ones, did not significantly affect the subnanomolar Kd for CaM. The CaM-binding domain of chicken smooth-muscle myosin-light-chain kinase contains a pseudosubstrate sequence (RRKWQKTGHAVRAIG) (75) similar to a sequence located in the N-terminal half of the catalytic domain of B. pertussis adenylyl cyclase (RRKGGDDFEAVKVIG). Replacing the first three basic residues (RRK) with glutamic acid residues decreases the affinity of adenylyl cyclase for CaM by a factor of 7 (+Ca2+) and 42 (-Ca2+). It may be that, in addition to hydrophobic interactions, the CaM-adenylyl-cyclase complex is stabilized by contacts between negatively charged residues of CaM and basic amino acids clustered in short sequences such as RRK (68). The N-terminal fragment of 224 amino-acid residues released after limited proteolysis of the catalytic domain of B. pertussis adenylyl cyclase exhibits 0.1% of the activity of the intact domain (51, 71). It seems, therefore, that this fragment possesses the structural requirements for forming the catalytic site but, in the absence of the “activatory” subdomain (the second proteolytic fragment), cannot express full activity. In fact, the two fragments, upon reassociation in the presence of CaM, can reconstitute a fully active species of adenylyl cyclase (51, 71, 76). Several nucleotide analogs with high affinity for B. 
pertussis adenylyl cyclase have been tested as potential substrates or inhibitors of the bacterial enzyme. One of them, 3’-anthraniloyl-2’-deoxyATP (Ant-dATP), a fluorescent derivative, binds adenylyl cyclase with an affinity 60 times that for ATP (77). Moreover, the quantum yield of Ant-dATP in complex with CaM and adenylyl cyclase is four times that of the nucleotide alone, or of the nucleotide associated with adenylyl cyclase in the absence of CaM. Confirmed by equilibrium-dialysis experiments, these data were the first evidence that CaM is necessary not only for triggering catalysis but also for ensuring tight binding of the nucleotide to the catalytic site of B. pertussis adenylyl cyclase. Another fluorescent ATP analog, 8-azido-Ant-dATP, can covalently label the nucleotide-binding site of adenylyl cyclase. The great advantage of this derivative (which has one-seventh the affinity of Ant-dATP for adenylyl cyclase) is that the photolabeled enzyme can be detected on SDS-PAGE


(78). Fluorescent ATP analogs with high affinity for adenylyl cyclase also proved to be useful tools for analyzing inactive variants of adenylyl cyclase generated by site-directed mutagenesis. In this way, several residues in the catalytic domain of B. pertussis adenylyl cyclase were shown to be involved preferentially in nucleotide binding or in catalysis. Thus, K58 and K65, which belong to a sequence resembling the consensus sequence G----GK(T/S) present in many nucleotide-binding proteins, are assumed to interact with the α-phosphate group of Mg2+-ATP (55, 79). To decipher the specific role of residues D188 and D190, essential for both catalysis and nucleotide binding, the I183PLTADID190 region of B. pertussis adenylyl cyclase was used to scan protein-sequence libraries for ATP-dependent enzymes possessing two adjacent aspartic residues in their catalytic center. One conclusion from this analysis was that residues D188 and D190 of the bacterial adenylyl cyclase interact with Mg2+-ATP, probably as a β,γ-bidentate complex, and that their participation in the catalytic step would consist of stabilizing the transition state. None of these four residues can be productively replaced by other amino acids, which points to their unique role in both ATP binding and catalysis (80). It is more difficult to assign a particular role to residues H298 and E301 in B. pertussis adenylyl cyclase. They are probably involved in the mechanism of activation by CaM, through hydrogen bonds with residues belonging to the active site. Their role would be to optimize binding of ATP to the catalytic site and promote its cyclization. Consistent with this view, the H298N and H298Q mutants are more active than the H298R, H298P, or H298L mutants of the bacterial enzyme. 
On the other hand, glutamine successfully replaced glutamate at position 301, whereas the E301R mutant had only 0.02% of the activity of the wild-type protein.

4. THE HEMOLYSIN DOMAIN OF B. pertussis ADENYLYL CYCLASE

The catalytic domain of the bacterial adenylyl cyclase is followed by a 1300-amino-acid segment that exhibits significant identity with E. coli α-hemolysin (25%) and other RTX (“repeat in toxin”) toxins such as Pasteurella haemolytica leukotoxin (22%) (57). This correlates with the observation that crude or purified preparations of the 200-kDa form of B. pertussis adenylyl cyclase are endowed with hemolytic activity on sheep red blood cells (59, 81). Although this Ca2+-dependent hemolytic effect of B. pertussis adenylyl cyclase is only about 1/500th that of E. coli hemolysin, it is an intrinsic property of the molecule: Bordetella pertussis strains deleted in the cya operon are no longer hemolytic. On the other hand, deletion of the catalytic domain of the bacterial enzyme, or site-directed mutagenesis yielding catalytically inactive variants, generated protein species still able to display full hemolytic activity (82-85). Taken together, these observations suggest that the hemolytic activity represents a side effect of B. pertussis adenylyl cyclase, and that the hemolysin moiety contributes to the toxic activity of the protein by channeling the N-terminal catalytic domain across the bacterial envelope as well as into the eukaryotic target cells. Strains with point mutations inactivating the adenylyl-cyclase activity are still hemolytic, but are avirulent in animal models (86). The first evidence of the invasiveness of the bacterial adenylyl cyclase came from experiments in which crude enzyme incubated with human neutrophils or alveolar macrophages was internalized (87). Accumulation of cAMP in these cells was accompanied by a decrease in superoxide anion generation and by inhibition of chemotaxis and of the bactericidal potency of neutrophils. Since these observations (87), an impressive amount of data has accumulated on the kinetics of adenylyl-cyclase penetration into various host cells, the fate of the internalized protein, and the sequence-related properties of the hemolysin domain (reviewed in 3 and 46). The hemolysin domain of B. pertussis adenylyl cyclase can be divided into three subdomains. The first, located between residues 500 and 700, is rich in hydrophobic residues, forming four potential membrane-spanning regions. This subdomain displays little similarity to the first domain of α-hemolysin. The second subdomain, between amino acids 700 and 1000, is highly similar to a corresponding internal domain of both E. coli α-hemolysin and Pasteurella haemolytica leukotoxin (57). A third subdomain, between amino acids 1000 and 1600, contains glycine- and aspartate-rich repeats characteristic of RTX proteins (88), of which E. coli α-hemolysin is the most typical and extensively studied member. These toxins do not possess cleavable amino-terminal signal peptides. 
They are secreted by a special mechanism involving a translocator system composed of HlyB, HlyD, and TolC (in the case of E. coli α-hemolysin) or of the CyaB, CyaD, and CyaE proteins (in the case of adenylyl cyclase). The glycine- and aspartate-rich repeats in B. pertussis adenylyl cyclase are thought to participate in Ca2+ binding and in association with target-cell membranes. Deletion of similar repeats by site-directed mutagenesis in E. coli α-hemolysin was shown to abolish the calcium binding and the hemolytic activity of this molecule (89, 90). As with other RTX proteins, the toxic activity depends on a covalent posttranslational modification of the protein mediated by the CyaC protein (67, 91-93). Binding of calcium to B. pertussis adenylyl cyclase, which also appears to be an essential step for maximal toxic activity, is accompanied by conformational changes, as evidenced by several methods such as a shift in intrinsic tryptophan fluorescence, protection against proteolysis by trypsin, and electron microscopy. Removal of Ca2+ by EGTA reversed all of the observed effects and also eliminated the toxic activity. Deletion analysis of the B. pertussis adenylyl-cyclase toxin showed that the last 217 amino-acid residues contain most of the information required for secretion (94). Moreover, adenylyl-cyclase derivatives lacking the hydrophobic region and part of the repeats but retaining the last 217 amino-acid residues were secreted at much higher levels than the full-length protein. Another segment of the molecule apparently involved in the secretion of the adenylyl-cyclase toxin is situated between residues Y… and I….
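The domain organization described in this section can be summarized as a small lookup table. In the Python sketch below, the boundaries are the approximate ones quoted in the text (catalytic domain about 1-400, hydrophobic subdomain 500-700, α-hemolysin-like subdomain 700-1000, Gly/Asp-rich repeats 1000-1600, and the last 217 of 1706 residues for most of the secretion information); the segment names and the helper function are hypothetical, and residues 401-499 are left unassigned because the text does not name them.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    name: str
    start: int  # first residue, 1-based, inclusive
    end: int    # last residue, inclusive


CYAA_LENGTH = 1706    # residues in the full-length B. pertussis protein
SECRETION_TAIL = 217  # C-terminal residues carrying most of the secretion information

# Approximate boundaries taken from the text; the secretion tail overlaps the repeats.
CYAA_SEGMENTS: List[Segment] = [
    Segment("catalytic (CaM-activated)", 1, 400),
    Segment("hydrophobic, membrane-spanning", 500, 699),
    Segment("alpha-hemolysin-like", 700, 999),
    Segment("Gly/Asp-rich RTX repeats", 1000, 1600),
    Segment("secretion information", CYAA_LENGTH - SECRETION_TAIL + 1, CYAA_LENGTH),
]


def segments_at(residue: int) -> List[str]:
    """Names of all annotated segments that include the given residue."""
    if not 1 <= residue <= CYAA_LENGTH:
        raise ValueError(f"residue must lie between 1 and {CYAA_LENGTH}")
    return [seg.name for seg in CYAA_SEGMENTS if seg.start <= residue <= seg.end]


print(segments_at(250))   # ['catalytic (CaM-activated)']
print(segments_at(1500))  # ['Gly/Asp-rich RTX repeats', 'secretion information']
print(segments_at(450))   # []
```

Returning every matching segment, rather than the first hit, keeps the overlap between the repeat region and the C-terminal secretion information visible.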

B. Bacillus anthracis Adenylyl Cyclase

Bacillus anthracis, a Gram-positive organism, is the causative agent of anthrax. It secretes several proteins into the culture medium, three of which represent more than 50% of the total protein. Found in the relative proportions 100/25/5, they can be separated easily and purified by hydroxylapatite and DEAE-Sepharose chromatography (95, 96). Named protective antigen (PA), lethal factor (LF), and edema factor (EF), these proteins are nontoxic when injected into animals as isolated pure proteins. However, their association can cause death (PA + LF) or skin edema (PA + EF) in experimental animals. EF is a highly active, calmodulin-stimulated adenylyl cyclase (97). The protein (89 kDa) is rather acidic (pI 5.9). The three components of the B. anthracis toxin can be obtained in variable proportions by producing bacterial strains defective in one or several components (98). The B. anthracis cya gene was cloned using the same strategy as for the B. pertussis cyaA gene, that is, selection of clones that can complement the cyclase deficiency of an E. coli cya-deficient strain expressing calmodulin (99). A different cloning strategy used hybridization with an oligonucleotide designed from the amino-terminal sequence of the purified adenylyl cyclase (100). The protein encoded by the corresponding gene has 800 residues and can be expressed at high levels in E. coli from the B. anthracis cya promoter. A truncated form of B. anthracis adenylyl cyclase lacking the first 261 residues was expressed at relatively high levels in E. coli (2% of total cellular proteins). This form expressed full catalytic activity and was easily purified by Cibacron Blue-Sepharose chromatography and chromatofocusing (101). In fact, deletion of the acidic N-terminal domain of the protein, which is responsible for interaction with PA, yielded a fragment whose pI is over 8. This truncated form of B. 
anthracis adenylyl cyclase and the catalytic domain of B. pertussis adenylyl cyclase represent the only variants of ATP-cyclizing enzymes available in quantities sufficient for detailed biochemical or structural investigations. Fourier-transform infrared spectra in D2O of the truncated form of B.


anthracis adenylyl cyclase (known as CYA62, in reference to its mass of 62 kDa) show a single broad band with a maximum at 1646 cm⁻¹, resulting from the overlap of component bands representing different secondary-structure elements. Curve-fitting of the infrared spectrum of CYA62, using the procedure of Byler and Susi (102), gave the following estimate of secondary structure: 34% β-structures, 10% α-helices, and 14% turns, the remaining 42% corresponding to disordered protein segments (103). The temperature dependence of the infrared spectra showed that the apparent midpoint denaturation temperature of CYA62 is 45°C, a value close to the unfolding temperature of the protein as determined by differential scanning calorimetry. Bacillus anthracis adenylyl cyclase has an absolute requirement for CaM and for the divalent ions Ca2+ and Mg2+. Ca2+ is required to form the Ca2+-CaM complex, which binds with high affinity to the enzyme, whereas Mg2+ forms a Mg2+-ATP complex, the true nucleotide substrate. However, Ca2+ can replace Mg2+ in forming the metal-nucleotide complex, with a 5-fold higher affinity for the active site but with reaction rates reduced to 1/250th. 3’-dATP, a noncyclizable analog of ATP, has also been used in equilibrium-dialysis experiments to show that the affinity of B. anthracis adenylyl cyclase for this analog is severely decreased in the absence of CaM (101). The same situation was encountered with B. pertussis adenylyl cyclase. Comparison of the sequences of the catalytic domains of the two bacterial toxins revealed little similarity except in three short segments of 14-22 residues displaying 66-80% identity. Since replacing several amino acids belonging to these three similar sequences considerably affects the catalytic activity of B. pertussis adenylyl cyclase (i.e., K58, K65, D188, D190, H298, and E301), it is expected that substituting the equivalent amino acids in the B. 
anthracis enzyme (i.e., K…, K…, D…, D…, H…, and E…) would have the same effect. Although only one case of site-directed substitution has been reported [K… replaced by E (103) or by R (104)], both modified proteins lost their activity completely, in full agreement with expectations. Moreover, the amide I band contour in the infrared spectrum of the K…R variant of truncated B. anthracis adenylyl cyclase is virtually identical to that of the wild-type protein CYA62 (103). Thus, within the limits of infrared spectroscopy, there are no detectable differences between the secondary structure of the active protein and that of the inactive CYA62 K…Q variant. Identification of the CaM-binding domain in B. anthracis adenylyl cyclase proved to be a more complex and difficult task than in B. pertussis. The six tryptophan residues present in the catalytic domain made fluorescence analysis of ligand binding impractical. On the other hand, analysis of protein fragments obtained after limited proteolysis, as used successfully in


the case of the B. pertussis enzyme, remains inconclusive for the B. anthracis cyclase catalytic domain because of the high resistance of the latter protein to proteases such as trypsin or collagenase. Photoaffinity labeling experiments with a cleavable radioactive cross-linker bound to CaM showed incorporation of radioactivity within the last 150 amino-acid residues of B. anthracis adenylyl cyclase. On the other hand, removal of 127 codons from the 3’ end of the cya gene generated a chimeric protein of 502 residues corresponding to residues 262-673 of B. anthracis adenylyl cyclase fused to the α-domain of β-galactosidase. This protein exhibited detectable catalytic activity (0.1% of the specific activity of CYA62), which was still dependent on CaM (101). These experiments suggested that the C-terminal domain of B. anthracis adenylyl cyclase is in close contact with CaM once the complex between the two proteins has formed. However, other segments might also play a role in this interaction. Secondary-structure prediction and helical-wheel projection suggested that the segment between residues 548 and 564 of B. anthracis adenylyl cyclase might be implicated in CaM binding (105). Although not homologous to any known CaM-binding sequence, this segment exhibits molecular features characteristic of CaM-binding peptides, that is, a high proportion of basic and hydrophobic residues segregated onto the two faces of the α-helical structure (106, 107). A synthetic peptide corresponding to residues 532-565 of the B. anthracis enzyme (P532-565) showed an affinity for CaM representing 80% of the binding energy of the adenylyl-cyclase-CaM complex (105). Circular dichroism and proton NMR spectroscopy showed that P532-565 exists in solution as a mixture of random-coil and α-helical structures, the latter most visible at the carboxy-terminal end (between residues 551 and 563) of the synthetic peptide. 
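Helical-wheel arguments like the one above are commonly quantified with the Eisenberg hydrophobic moment, in which each residue's hydrophobicity is treated as a vector pointing 100° further around an ideal α-helix than the previous residue. The text does not say which method the cited study used, and the B. anthracis segment itself is not quoted here, so the Python sketch below applies the calculation to the B. pertussis sequence DLLWKIARAGARSAVGTEK given earlier. The Eisenberg consensus scale values are transcribed as an assumption and should be checked against the original table before any real use.

```python
import math
from typing import Dict, List, Tuple

# Eisenberg consensus hydrophobicity scale (assumed values; verify before real use).
EISENBERG: Dict[str, float] = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Mean hydrophobic moment of seq for an ideal helix (angle_deg per residue)."""
    if not seq:
        raise ValueError("empty sequence")
    x = y = 0.0
    for i, aa in enumerate(seq.upper()):
        theta = math.radians(angle_deg * i)
        h = EISENBERG[aa]
        x += h * math.cos(theta)  # sum the hydrophobicity vectors around the wheel
        y += h * math.sin(theta)
    return math.hypot(x, y) / len(seq)


def wheel_positions(seq: str, angle_deg: float = 100.0) -> List[Tuple[float, str]]:
    """(angle on the helical wheel in degrees, residue) for each position."""
    return [((angle_deg * i) % 360.0, aa) for i, aa in enumerate(seq.upper())]


peptide = "DLLWKIARAGARSAVGTEK"
print(f"mean hydrophobic moment: {hydrophobic_moment(peptide):.2f}")
for angle, aa in sorted(wheel_positions(peptide)):
    print(f"{angle:5.0f}  {aa}  {EISENBERG[aa]:+.2f}")
```

A larger mean moment indicates stronger segregation of hydrophobic residues onto one face; the sorted wheel listing shows which residues end up next to each other around the helix.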
It should be mentioned that the segment represented by P532-565 occupies exactly the same position in B. anthracis adenylyl cyclase as the CaM-binding peptide P225-267 in B. pertussis adenylyl cyclase. These data show that, despite the complexity of the interaction with CaM and the divergent primary structures, the two bacterial enzymes possess a similar structural organization of their binding sites for the activator protein and ATP.
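Statements such as “80% of the binding energy” compare binding free energies rather than dissociation constants. The Python sketch below is a minimal illustration that assumes ΔG° = RT ln Kd with a 1 M standard state at 25°C. The text does not specify the convention or temperature used in the cited work. The sketch applies this relation to the B. pertussis values quoted earlier: Kd = 580 nM for peptide P235-254 versus 0.2 nM for the catalytic domain, and the 7- and 42-fold losses of affinity when RRK is replaced by glutamates. It also includes the roughly 100-fold loss caused by EGTA. The function names are illustrative only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)
T = 298.15            # assumed temperature, K (25 degrees C)


def binding_free_energy(kd_molar: float) -> float:
    """Standard binding free energy in kcal/mol (negative = favorable), 1 M standard state."""
    return R_KCAL * T * math.log(kd_molar)


def fraction_of_binding_energy(kd_fragment: float, kd_reference: float) -> float:
    """Share of the reference complex's binding free energy recovered by a fragment."""
    return binding_free_energy(kd_fragment) / binding_free_energy(kd_reference)


def ddg_from_fold_change(fold: float) -> float:
    """Free-energy penalty (kcal/mol) corresponding to a given fold loss of affinity."""
    return R_KCAL * T * math.log(fold)


frac = fraction_of_binding_energy(580e-9, 0.2e-9)
print(f"P235-254 recovers about {frac:.0%} of the domain's binding free energy")
for fold in (7, 42, 100):
    print(f"{fold:>3}-fold loss of affinity -> {ddg_from_fold_change(fold):.2f} kcal/mol")
```

Because the ratio depends on the chosen standard state, percentages obtained this way are convention-dependent and need not coincide with those reported for P532-565 or in Fig. 2.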

III. Class III Adenylyl Cyclases

Adenylyl cyclases from eukaryotes remained elusive for a long time because of their low abundance and instability. In particular, the complex organization of the hormone-regulated adenylyl-cyclase system in higher eukaryotes, involving a multiplicity of receptors and GTP-binding proteins, was recognized early, but did not yield the large quantities of active enzyme preparations necessary for thorough biochemical studies. In parallel, genetic and biochemical evidence pointed to a multiplicity of eukaryotic adenylyl-cyclase forms. However, until recently it was difficult to distinguish the true features of these enzymes from artifacts generated by covalent modification of the multicomponent protein during purification, which produced anomalous electrophoretic migration and apparent molecular heterogeneity. Because an immense literature has been devoted to the regulatory subunits of eukaryotic adenylyl cyclases (who has not seen an article on G-proteins?)², we shall not discuss their mechanism of activation in detail, but rather concentrate mainly on the catalytic center. An important breakthrough in the purification of eukaryotic adenylyl cyclases was the discovery that forskolin, a plant diterpene, activates adenylyl cyclase from various tissues by interacting directly with the catalytic subunit (108, 109). Although forskolin has numerous other targets, interacting with many membrane proteins such as the glucose transporter, the nicotinic acetylcholine receptor, and voltage-dependent K+ channels (110), its use as a ligand in affinity chromatography of adenylyl cyclase allows purification factors as high as 1000, with protein yields higher than 50% (111-115). Further progress in purification came from the observation that several forms of adenylyl cyclase (especially from brain or olfactory tissue) interact with CaM (116, 117) or wheat-germ agglutinin (111, 118, 119). This yielded homogeneous enzyme preparations that were subsequently used to generate mono- or polyclonal antibodies, or to obtain partial sequences for cDNA cloning. Thus, Mollner and Pfeuffer (120) prepared monoclonal antibodies against brain adenylyl cyclase, using as antigen the enzyme eluted from forskolin-Sepharose columns. 
In the same way, affinity chromatography on matrix-immobilized forskolin was successfully applied by several groups to separate and purify CaM-sensitive and CaM-insensitive enzymes from brain, heart, erythrocytes, and several other tissues or cells. The first eukaryotic adenylyl-cyclase gene to be sequenced was not from a higher eukaryote but from Saccharomyces cerevisiae. It corresponded to a 6-kb DNA fragment that could complement a CYR1 defect (CYR1 codes for adenylyl cyclase in yeast) (121). The 2026-amino-acid protein was remarkable in that it comprised several domains. The catalytic domain of yeast adenylyl cyclase, about 400 residues long, is carboxy-terminal (121, 122). It is encoded by a 3-kb cDNA segment corresponding to a major short transcript of the CYR1 region present when the cells are grown in minimal synthetic medium (122). The catalytic domain is located 30 kDa downstream of a leucine-rich domain of about 60 kDa. The amino-terminal domain of the yeast enzyme binds to a regulatory protein, as shown by its reproducible copurification with a 70-kDa component during immunoaffinity chromatography (123, 124). The S. cerevisiae adenylyl-cyclase activity is not regulated by G-proteins but by RAS-like proteins, and this regulation is mediated by a region located near the C-terminal end as well as by the leucine-repeat region (125). No similarity with the other classes of bacterial cyclases was noticed, suggesting that eukaryotic adenylyl cyclases form a new class of such enzymes (class III). The adenylyl-cyclase gene from the fission yeast Schizosaccharomyces pombe has also been cloned, sequenced, and expressed in S. cerevisiae (126). Its sequence resembles that of the S. cerevisiae protein, the carboxy-terminal catalytic domain exhibiting the greatest similarity, although similarities are scattered in other domains as well. In particular, both enzymes contain a large leucine-rich repeat region. In this respect, it is worth noting that the adenylyl-cyclase gene from the parasite T. equiperdum displays a catalytic domain of similar structure, but in this organism the leucine-rich region is expressed as an independent polypeptide (127), encoded by a gene located next to the adenylyl-cyclase gene. At present, the adenylyl-cyclase genes of several lower eukaryotes have been cloned and sequenced. All are members of class III and contain a single catalytic domain located at the carboxy-terminal end of the protein (121, 122, 126-133).

² See the article by Fraser et al. on G-proteins in this volume. [Eds.]
Partial amino-acid sequences of tryptic peptides from bovine brain adenylyl cyclase were used to design oligonucleotide probes for isolating the cDNA coding for the enzyme (134). The deduced amino-acid sequence, a protein of 1134 residues, revealed several significant features: (i) bovine brain adenylyl cyclase is completely different from the enterobacterial class (class I) and the toxic adenylyl cyclases (class II), not only because of the total absence of similarity in the catalytic domain, but also because the organization of the gene is different; (ii) the bovine brain enzyme is formed from two domains that are similar to one another (28% identity and 54% similarity); (iii) a hydropathy plot of the protein sequence shows two strongly hydrophobic regions, each containing six transmembrane spans, a topological feature common to various channels and transporters; and (iv) each domain, presumably catalytic, is similar to the catalytic domain of adenylyl cyclases from lower eukaryotes (class III). Two large cytoplasmic domains of about 350 and 300 residues, respectively, are situated between the first and second transmembrane regions and at the carboxy-terminal end of the protein. They were predicted to comprise the catalytic domain of the adenylyl cyclase. If this were the case, one or both of these cytoplasmic domains would be expected to contain sequences able to interact with CaM, because bovine brain adenylyl cyclase is activated


by CaM. Two segments of bovine brain adenylyl cyclase were proposed as the putative CaM-binding domain (135). The first, situated between residues 495 and 522, has the typical hydrophobic plus basic composition of CaM-binding domains (107); the second, situated between residues 1027 and 1050, is also rich in basic residues (five arginines and three lysines). It has the required hydrophobic character and, in addition, an aromatic amino acid close to its N-terminus. Fluorescence analysis of model peptides interacting with dansyl-CaM showed that the most plausible CaM-binding domain is the sequence defined by P495-522. Interestingly, the affinity of this peptide for CaM (Kd = 2 nM) is seven-fold higher than that of purified brain adenylyl cyclase (136). Cloning of the bovine brain adenylyl cyclase [named type I by Gilman and co-workers (la)] allowed expression of the protein in Sf9 cells using a recombinant baculovirus expression system (136). Recombinant adenylyl cyclase was purified 500-fold from detergent-solubilized membranes. It displayed an activity of 4 µmol/min/mg of protein, very close to that of the enzyme obtained from bovine brain tissue. The kinetic properties of the recombinant enzyme resemble, in many respects, those of the CaM-activated bovine brain enzyme: the enzyme can be activated by CaM, forskolin, and Gsα (the α-subunit of the G-protein that stimulates adenylyl cyclase). Whereas calmodulin acts synergistically with the other activators, there is no synergistic interaction between Gsα and forskolin. Like the bovine brain adenylyl cyclase, the recombinant enzyme is inhibited by the βγ-subunits of G-proteins and by compounds known as P-site inhibitors (noncompetitive with respect to ATP), such as adenosine and its phosphorylated (2’-deoxy-3’-AMP, 3’-AMP) or nonphosphorylated (2’-deoxyadenosine) analogs. Expression of the wild-type recombinant protein is still low, although it already exceeds the level of endogenous enzyme by a factor of 100. 
Thus, the scaling-up of enzyme production for structural investigation will require large quantities of Sf9 cell cultures. This justifies experiments in which sitedirected inutagenesis or partial gene deletion by genetic engineering offers alternative clues to the catalytic properties of adenylyl cyclase. Unfortunately, as noted (136),a negative correlation exits between the amount of recombinant adenylyl cyclase expressed and the intrinsic activity of the protein. For example, elimination of the first 52 residues yielded a species with specific activity estimated at 0.5% of the wild-type protein. However, the modified protein was expressed at a high level, representing more than 1%of the total membrane proteins. The simplest explanation of these observations is that overexpression of the wild-type enzyme increases CAMPproduction, inducing toxic effects on the host cells, a case also encountered when active adenylyl cyclase from bacterial sources is overexpressed in E. coli.
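The seven-fold affinity difference between the 495-522 peptide and the purified enzyme can be put in concrete terms with the standard single-site binding relation, fraction bound = [CaM]/(Kd + [CaM]). The sketch below is illustrative only: it assumes simple one-site binding with the free CaM concentration known, and the enzyme Kd of 14 nM is merely inferred by multiplying the peptide's 2 nM by seven, not a value reported in the text.

```python
def fraction_bound(free_ligand_nM: float, kd_nM: float) -> float:
    """Fraction of sites occupied in a single-site equilibrium,
    assuming the free ligand concentration is known (ligand in excess)."""
    if kd_nM <= 0 or free_ligand_nM < 0:
        raise ValueError("Kd must be positive and ligand concentration non-negative")
    return free_ligand_nM / (kd_nM + free_ligand_nM)


KD_PEPTIDE_NM = 2.0               # Kd of the 495-522 peptide for CaM (from the text)
KD_ENZYME_NM = 7 * KD_PEPTIDE_NM  # assumption: inferred from "seven-fold higher affinity"

if __name__ == "__main__":
    for cam_nM in (1.0, 2.0, 14.0, 100.0):
        print(f"[CaM] = {cam_nM:6.1f} nM  "
              f"peptide {fraction_bound(cam_nM, KD_PEPTIDE_NM):.3f}  "
              f"enzyme {fraction_bound(cam_nM, KD_ENZYME_NM):.3f}")
```

Under these assumptions, 2 nM free CaM half-saturates the peptide but occupies only 12.5% of the enzyme sites; each species is half-saturated when [CaM] equals its own Kd.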

MOLECULAR HETEROGENEITY OF ADENYLYL CYCLASES


When the two halves of brain adenylyl cyclase are expressed separately (each containing a transmembrane and a cytoplasmic domain), each displays very low catalytic activity (<1% of the wild-type protein activity), although the level of expression is substantially higher than that of the wild-type protein. Coexpression of the two halves of the molecule results in significant adenylyl-cyclase activity (136). The analogy with B. pertussis adenylyl cyclase, whose activity is restored when inactive segments are held together in the presence of CaM (76), may be relevant here: we speculate that only one half of the brain adenylyl cyclase possesses a catalytic site, whereas the other half contains the information required for expression of full activity. The first successful cloning of a mammalian adenylyl cyclase was followed by the identification of many similar enzymes from various other sources, classified into six types (137) and now comprising two more types (137a,137b) whose genes lie on different chromosomes (137b). Type I is the CaM-activated mammalian brain type (134); type II is expressed in olfactory neurons and lung tissue, and is insensitive to CaM (137); type III is an olfactory-specific, CaM-activated enzyme (138); type IV is a widely distributed form that is insensitive to CaM (139); type V is present in heart and neural tissue; and type VI is similar to type V in structure, distribution, and activity (the two are therefore described as a subfamily of cyclase types) (140). For example, Bakalyar and Reed (138) isolated cDNA clones encoding an adenylyl cyclase that plays a role in olfaction (the type III enzyme). A cDNA library from olfactory tissue was probed with an oligonucleotide based on the sequence of a tryptic fragment of purified bovine brain adenylyl cyclase, and a positive cDNA clone was sequenced.
The deduced primary structure showed that the type III enzyme is 10 residues longer than type I adenylyl cyclase. Alignment of the two mammalian adenylyl cyclases indicated that the proteins share the same topology, with the greatest similarity within the putative cytoplasmic domains (138). Like the type I enzyme, the odorant-sensitive enzyme has a potential glycosylation site between transmembrane regions 9 and 10. In fact, treatment of olfactory cilia with peptide N-glycosidase shifted the apparent molecular mass of the protein on SDS-PAGE from 200 kDa to 129 kDa, in full agreement with predictions from the sequence. Type III adenylyl cyclase is also activated by Gsα, forskolin, and CaM. Using bovine brain adenylyl cyclase as a probe, the six other mammalian types have been cloned from different mammalian sources and characterized during the last 2 years. Comparison of the primary structures of these variants, which differ in tissue distribution or in sensitivity to various activators or inhibitors, indicates an overall similarity of about 50%. All of these proteins share the same topological structure, that is, two transmembrane domains of six spans each, and several putative phosphorylation sites probably involved in regulation (see Fig. 3). Types II and IV are similar to each other, as are types V and VI. It is interesting to note that the sequence corresponding to the CaM-binding domain of type I adenylyl cyclase is entirely different in the type II enzyme (which does not bind CaM) and even in the type III (olfactory) adenylyl cyclase, which reacts with CaM but with a much lower affinity. Katsushika et al. (140) cloned, from a cardiac cDNA library, a truncated form of adenylyl cyclase they termed type V-a, which represents a half-molecule of the type V enzyme diverging at the end of the first cytoplasmic loop. When type V-a was expressed in cardiac muscle cells (141), it was inactive (142). However, when it was coexpressed with an artificially generated half-molecule forming the distal half of the intact type V enzyme, the catalytic activity of the heterodimer was restored to a significant level. These data confirm and extend earlier interpretations of experimental results (136,143), indicating that heterodimerization can generate functional diversity in the signal transduction pathway. In other words, multiple isoforms might not only be expressed in the same cell type but could also be coupled through their “complementary” halves, creating heterodimers with novel properties.

[Figure 3: schematic of the higher-eukaryote adenylyl-cyclase types. Legend: common conserved regions; types II and IV conserved regions; types V and VI conserved regions; phosphorylation sites.]

FIG. 3. Comparison of the higher-eukaryote adenylyl-cyclase types. Thick lines indicate regions of strong conservation. The two cytoplasmic domains are indicated by “Cat.” Regions of phosphorylation are indicated at bottom. (Redrawn after 178.)

IV. Similarity of Adenylyl and Guanylyl Cyclases

Chemically similar to cAMP, cGMP was discovered shortly after it. Its local concentration was found to vary in different tissues as a function of environmental conditions, in a way reminiscent of cAMP variations. For some time, cGMP was thought to act as a second messenger antagonizing cAMP, its concentration rising when that of cAMP fell, and vice versa. However, this was not always true. Indeed, cGMP is involved in many specific processes that differ significantly from those involving cAMP (144-146). In phototransduction, for example, cGMP is the central control element, with no obvious involvement of cAMP. Although the process is formally similar to those involving cAMP (147), it is controlled in a completely different manner. It was therefore assumed that guanylyl cyclases might differ from adenylyl cyclases (for a review on cGMP, see 146). The study of guanylyl cyclases was developed by a few groups in parallel with work on adenylyl cyclase. In 1989, it was discovered that the catalytic domains of guanylyl cyclases are similar to that of S. cerevisiae adenylyl cyclase (148). This extends class III to comprise not only eukaryotic adenylyl cyclases but also the guanylyl cyclases, demonstrating a common phylogenic origin. The earliest work on cAMP synthesis in bacteria demonstrated that the Gram-positive organism Brevibacterium (now Corynebacterium) liquefaciens contains an enzyme that synthesizes cAMP at a high level. It was further shown that the enzyme is active only in the presence of α-keto acids such as pyruvate. The corresponding gene was cloned, sequenced, and expressed in E. coli (149), and shown to direct synthesis of a 403-amino-acid protein. The catalytic domain of the protein is typical of class III, but displays a remarkable alteration in the third region of similarity compared with all other class III enzymes (see Fig. 4).
This could be related to the stringent requirement of the enzyme for pyruvate. The amino-terminal end, perhaps involved in regulation of activity, displays some similarity to the fep oncogene (149). Several other bacterial adenylyl cyclases are also members of class III.

[Figure 4: multiple sequence alignment of class III catalytic domains from BRELI, STRCOE, RHIME, STIAU1, STIAU2, SACCE, SCHPO, SACKL, NEUCR, TRYBR, DICDI, DROMEb, BOSTA1b, RATNO2b, RATNO3b, RATNO4b, CANFA5b, and MUSMU6b.]

FIG. 4. Alignment of general types of class III adenylyl-cyclase catalytic domains. The four regions of conservation are boxed. Note that class III comprises both prokaryotic- and eukaryotic-related enzymes. For higher eukaryotes, adenylyl cyclases of types I-VI are compared, selecting the second (carboxy-terminal) catalytic domain of the protein. BRELI, Brevibacterium liquefaciens (149); STRCOE, Streptomyces coelicolor (153); RHIME, Rhizobium meliloti (151); STIAU1, Stigmatella aurantiaca form 1 (1); STIAU2, Stigmatella aurantiaca form …

O’Gara and co-workers (151) cloned a region from the plant-symbiotic Gram-negative organism Rhizobium meliloti that weakly complements a cya defect in E. coli. They found that the complementing region coded for several polypeptides, as if the cloned DNA segment could encode, in the same frame, a long and a short polypeptide (150). The gene was sequenced and shown to correspond to a class III adenylyl cyclase (151). After purification of a hybrid cyclase/β-galactosidase chimera, the actual start of the gene was proposed to correspond to a short protein of 193 residues. However, the in-frame DNA upstream of the putative start codon contained no termination codons, suggesting that the same gene could also encode a much longer protein. This agreed with the physiological experiments indicating that the cya locus could direct the synthesis of both a long and a short protein. The situation is therefore reminiscent of that in S. cerevisiae, in which a single gene seems to direct synthesis of a short and a long form of adenylyl cyclase according to environmental conditions. In R. meliloti, however, no significant similarity with known proteins has yet been found for the amino-terminal domain of the long protein. Confirmation that the short protein is an adenylyl-cyclase catalytic domain came from the isolation of mutant proteins that could synthesize cGMP in addition to cAMP, with mutations localized in consensus regions 3 and 4 of class III adenylyl cyclases (152). This also substantiates the hypothesis of a common ancestor for the adenylyl and guanylyl cyclases of class III, predating the separation between eukaryotes and prokaryotes. Indeed, class III enzymes from several other bacterial types have recently been characterized (1), including the Gram-positive filamentous bacterium Streptomyces coelicolor (153).

The study of eukaryotic guanylyl cyclases revealed interesting properties and demonstrated that, like adenylyl cyclases, guanylyl cyclases are constructed in a modular fashion. Two well-defined forms of the enzyme, a cytoplasmic one and a particulate one, were found. They display extremely different activity and regulation (154,155) and were therefore described as significantly different proteins (154,156-162). The cytoplasmic enzyme comprises two subunits, of 70 and 82 kDa, respectively, each having a carboxy-terminal domain endowed with guanylyl-cyclase activity but unable to function as a monomer (155,163,164). A noteworthy feature of these enzymes is that they contain heme as a prosthetic group. It was later found that the heme cofactor modulates enzyme activity after binding nitric oxide or other nitrogen-containing gaseous molecules derived from arginine metabolism (see, e.g., 155,165,166). In contrast, the plasma-membrane form of the enzyme is a multi-domain polypeptide. In all cases, the catalytic center is similar to that of the eukaryotic adenylyl cyclases of class III, but the domains linked to these catalytic centers differ widely. In the membrane-bound form, the single chain of guanylyl cyclase carries several activities that, in most adenylyl cyclases, are linked through quaternary interactions. An amino-terminal receptor domain is situated on the outer face of the cytoplasmic membrane and recognizes different signals according to the cellular source of the enzyme (148,155,167,168). It is followed in the chain by a short transmembrane segment and then by a region displaying similarity to protein kinases. The carboxy-terminal domain is highly similar to the catalytic center of the soluble forms of guanylyl cyclase. Together these form a 120-kDa polypeptide (155,169). In sharp contrast, the form responsible for phototransduction is coupled to a structure strongly reminiscent of the G-protein-dependent form of adenylyl cyclase. Indeed, the transducing subunit, transducin, shares a common ancestor with the G-proteins. However, in this process, transducin is apparently not involved in the modulation of guanylyl-cyclase activity, but in the light-induced modulation of cGMP-specific phosphodiesterase activity (170). Unfortunately, the actual quaternary structure of the enzyme responsible for cGMP synthesis in retinal cells is still unknown (171). Finally, as with adenylyl cyclases, guanylyl cyclases are affected by calcium. Calcium-dependent modulation of some of these enzymes involves a 26-kDa binding protein, behaving in a way similar to that of CaM on brain CaM-dependent adenylyl cyclases (172). Thus, guanylyl cyclases behave as a specific subclass of class III cyclases, differing in their modular organization.
This illustrates the very efficient way in which evolution has proceeded, combining modules with diverse structural or catalytic functions to produce an integrated pattern for synthesis of the cyclic nucleotides.
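The R. meliloti argument earlier in this section rests on a simple sequence check: starting from the proposed start codon, read codons upstream in the same frame until a termination codon is met; if none appears for a long stretch, the same frame could encode a longer protein. A minimal sketch of that check follows. The function name and the toy sequence are hypothetical, only the standard stop codons TAA, TAG, and TGA are assumed, and only ATG is recorded as a possible upstream start (bacterial genes can also start at GTG or TTG).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def scan_upstream_in_frame(dna: str, start: int):
    """Walk upstream from the codon beginning at `start`, staying in frame.

    Returns (open_codons, stop_found, upstream_atg_positions):
      open_codons            in-frame codons read before a stop codon
                             or the 5' end of the sequence
      stop_found             True if an in-frame stop codon ended the scan
      upstream_atg_positions 0-based positions of in-frame ATG codons in the
                             open stretch (candidate starts of a longer protein)
    """
    dna = dna.upper()
    if not 0 <= start <= len(dna) - 3:
        raise ValueError("start must index a complete codon")
    open_codons = 0
    atg_positions = []
    pos = start - 3
    while pos >= 0:
        codon = dna[pos:pos + 3]
        if codon in STOP_CODONS:
            return open_codons, True, atg_positions
        open_codons += 1
        if codon == "ATG":
            atg_positions.append(pos)
        pos -= 3
    return open_codons, False, atg_positions


if __name__ == "__main__":
    # Toy sequence: TAA (0), ATG (3), GCC (6), CTG (9), proposed start ATG (12)
    toy = "TAAATGGCCCTGATGAAA"
    print(scan_upstream_in_frame(toy, 12))  # (3, True, [3])
```

In the toy sequence the open stretch extends three codons upstream of the proposed start and contains an in-frame ATG at position 3, the kind of arrangement that lets one reading frame encode both a short and a long protein.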

V. Evolution of Adenylyl Cyclases

As we have seen, adenylyl cyclases are multi-domain enzymes that can be grouped into three well-defined classes. Because, in each class, the domains associated with the catalytic domain vary widely, we concentrate in this section only on the evolution of the catalytic domain (for a recent review, see 1).

A. Gram-Negative Facultative Anaerobes

The adenylyl-cyclase class that corresponds to Gram-negative facultative anaerobes is very homogeneous: it consists of proteins with a similar organization into two domains, an amino-proximal catalytic center followed by a carboxy-terminal regulatory domain. Apart from their obvious similarity to one another (long stretches of conserved residues and, overall, 55% conservative replacements in the worst cases), these proteins do not resemble other known proteins. They form a class in themselves. Their phylogenic relationship, as derived, for instance, with the program CLUSTAL, is consistent with the relationship among species given by taxonomists (173), except perhaps in the case of Erwinia chrysanthemi (174). In this case, the selection pressure associated with the lower-temperature biotope of the organism could explain special features of the proteins, for instance, the putative formation of internal salt bridges, which would be less important than hydrophobic interactions at higher temperature (1).
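Percentages such as those quoted above (identical residues, conservative replacements) come from comparing aligned sequences position by position. Programs such as CLUSTAL build on pairwise scores of this kind, using substitution matrices and a guide tree; the sketch below is much simpler and is not CLUSTAL's algorithm. It counts identical and "conservative" positions over an existing gapped alignment and skips gapped columns. The residue groups that define "conservative" are an illustrative assumption, and the example pair is the motif-2 segment of a mammalian enzyme and of the S. cerevisiae (SACCE) enzyme as transcribed from the Fig. 4 alignment.

```python
# Illustrative groups of residues treated as conservative replacements (assumption).
CONSERVATIVE_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def same_group(a: str, b: str) -> bool:
    """True if both residues fall in the same illustrative group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def pairwise_scores(seq1: str, seq2: str):
    """Percent identity and percent similarity over an existing alignment.

    Both strings must already be aligned to equal length; '-' marks a gap
    and any column containing a gap is excluded from the comparison.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    compared = identical = similar = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif same_group(a, b):
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    ident, sim = pairwise_scores("IKTIGSTYMAASG", "VKTEGDAFMVAFP")
    print(f"identity {ident:.1f}%, similarity {sim:.1f}%")
```

For this 13-residue pair, 5 positions are identical and 2 more (I/V, Y/F) are conservative under the assumed groups, giving about 38.5% identity and 53.8% similarity. A matrix of such scores over all pairs is the kind of input from which a tree of relationships can then be built.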

B. Bacterial Pathogens

Bacterial cyclases activated by CaM pose a challenging evolutionary puzzle, because the simplest hypothesis would be that they were derived from a eukaryotic parent. In spite of repeated attempts to isolate other members of this class, we know of only two examples of such proteins, isolated from extremely distant bacteria, one Gram-negative and one Gram-positive. Comparison of the catalytic domains of the B. pertussis and B. anthracis enzymes allowed the identification of four conserved regions involved in catalysis, CaM binding, and activation. The first region has a sequence similar to the P-loop found in many ATP- or GTP-binding proteins (175). A second region, PLTADID, displaying some similarity to a region present in 6-phosphofructokinase, was also shown to be involved in catalysis. Based on this similarity and on the crystal structure of 6-phosphofructokinase, we proposed that two aspartate residues in B. pertussis adenylyl cyclase (D188 and D190) are involved in binding the β- and γ-phosphates of ATP via Mg2+ (80). Two other regions that do not resemble known sequences are present further downstream in the polypeptide chain. A most original feature of the B. pertussis protein is that it can be split into two separate domains that recover most of the initial activity when put together. This observation, together with analysis of mutants in the region conserved between the B. anthracis and B. pertussis enzymes, indicates that the proteins may form a catalytic center through the cooperation of two halves, the function of CaM (or of lipophilic solvents, in the case of the B. pertussis enzyme) being to trigger the conformational change necessary for formation of the catalytic center.
Such a description of the molecule is reminiscent of the situation recently discovered for the cAMP-dependent protein kinases (176,177), and it is worth asking whether this class of ATP-binding proteins and the toxic cyclases share a common ancestor.


C. The Class III Catalytic Center

Class III proteins form an extremely diverse group of enzymes, because they include both eukaryotic and prokaryotic proteins. A general classification would place the bacterial enzymes together, then the lower eukaryotic enzymes; both groups contain a single catalytic domain at the carboxy-terminal end. Metazoans seem to have enzymes that contain two domains displaying similarity to the class III catalytic domain. Based on a similarity score among the various mammalian adenylyl-cyclase family members, a phylogenic tree for the mammalian class III types was proposed (140): types I and III diverged first from the others, and subsequently types II/IV and V/VI diverged, forming subgroups within the family (Fig. 4) (178). Four motifs are specific for class III proteins: F(A/T)(D/S)(I/L/V)--(F/S), (V/I)KT-G(D/S)--M, (R/K)-G(I/L/V)(H/N)G, and G(N/D/P)TVN-(A/S)(A/S)R (Fig. 4). These motifs are probably involved in the cyclization reaction rather than in recognition of the heterocyclic base, since they are common to both adenylyl and guanylyl cyclases. Analysis of motif 2 suggested, however, that it could be involved in base discrimination, because it differs between adenylyl cyclases [(V/I)KT] and guanylyl cyclases (VET) (179). Yet the adenylyl-cyclase examples span a wider phylogenetic range of organisms than the guanylyl-cyclase examples, which were isolated only from metazoans (in this respect, it would be of the utmost importance to characterize guanylyl cyclases from bacteria such as Stigmatella aurantiaca, which can synthesize cGMP). The difference could therefore merely reflect a coincidence. Indeed, the corresponding region in R. meliloti adenylyl cyclase is different (-Q-). In order to better understand the evolution and function of class III cyclases, Beuve and Danchin (180) set up a genetic screen allowing them to direct evolution in the laboratory from adenylyl cyclase to guanylyl cyclase and vice versa.
The system was devised as follows. An adenylyl-cyclase-deficient strain of E. coli was mutated in its cAMP receptor so that the receptor could recognize cGMP. Mutated adenylyl-cyclase genes were then introduced into the strain under conditions in which only bacteria synthesizing cGMP could grow. Enzymes displaying significant guanylyl-cyclase activity, evolved from an adenylyl-cyclase ancestor, were thus isolated. A similar experiment could also be performed, driving evolution from guanylyl cyclase to adenylyl cyclase (180). It seems noteworthy that, as in the case of cAMP-dependent protein kinases modified to recognize cGMP (179,181), a single amino-acid change in the region located between motifs 3 and 4 could change the specificity of the enzyme. It is likely that a pocket situated in this region of the protein accommodates the heterocyclic base and that amino-acid side chains discriminate between A and G by appropriately occupying a small portion of the pocket. This is substantiated by the observation that a further mutation (from GDTVN to GDTIN) shifts the enzyme activity toward a general purine nucleoside-triphosphate-cyclizing activity (152). In both types of enzymes, gene organization revealed that at least one catalytic domain is consistently located in the carboxy-terminal part of the protein, linked to amino-terminal parts of widely varying length and sequence. For example, in the bovine brain enzyme, the two copies of the catalytic domain are separated by a hydrophobic stretch of amino-acid residues. The common origin of adenylyl cyclases and guanylyl cyclases suggests the existence of an ancestral purine nucleotide triphosphate cyclase as a precursor. As a consequence of this hypothesis, one may even wonder whether evolution has created some interlock between the synthesis of both nucleotides, a given cyclase being able, under appropriate regulatory conditions, to synthesize either cAMP or cGMP. This could have been used in cyclic-nucleotide-mediated controls in eukaryotes and could account for the old observation that, in some cases at least, cAMP and cGMP concentrations vary in opposite directions. The discovery that a class of cyclases can derive from a common ancestor predating the separation of eubacteria and eukaryotes suggests that the original function of cAMP differed from that of a second messenger. The cyclizing reaction is easily reversed because of the value of the equilibrium constant for ATP synthesis (12), and thus, under appropriate conditions, ATP (or GTP) can be synthesized from cAMP (or cGMP) and pyrophosphate. The hypothesis that cAMP was originally involved in energy scavenging or production, rather than regulation, therefore seems worth considering. It is of interest to note that there is a weak, but definite, similarity between class III cyclases and ATP synthases (152).
Indeed, among the many considerations bearing on the origin of life, phosphates (and polyphosphates) certainly play a very significant role (182-185). Generation of a chemiosmotic function allowing synthesis of ATP (from ADP) requires numerous steps that are difficult to see fulfilled simultaneously. It may therefore be considered that cAMP could originally have been a building block for ATP. This requires a ubiquitous source for cAMP synthesis: the many catalytic activities of self-splicing introns involve transesterification processes that could produce cAMP or cGMP as by-products, providing an early source of these molecules. A first class of enzymes predating class III adenylyl cyclases would therefore be involved in ATP synthesis; a second, very important enzyme, adenylyl kinase, would then scavenge AMP, generating ADP. A general source of ADP thus made available might have triggered evolution toward the generation of ATP synthase. Some similarity can be found with such enzymes (186), as well as with enzymes using ATP for positive-ion translocation (187). Finally, it should be noted that the class of toxic adenylyl cyclases can be phylogenetically related to adenylyl kinase, at least when considering the first block of conserved amino-acid residues in the B. pertussis and B. anthracis enzymes. This may provide a link between the two evolutionary routes (from AMP to ATP and from cAMP to ATP). Another observation may add substance to this hypothetical relationship. Antibodies raised against B. pertussis adenylyl cyclase, or even against a peptide motif common to the B. pertussis and B. anthracis enzymes, recognize a human brain adenylyl cyclase (115,188,189). This indicates that higher eukaryotic enzymes might display similarities with the calmodulin-activated toxic proteins; if confirmed, this would demonstrate kinship between class II and class III proteins. Yet identification of an adenylyl-cyclase gene from bovine brain revealed that the enzyme is clearly similar to the yeast enzyme (class III) but not to the adenylyl-cyclase toxins (class II) (134). Moreover, immune recognition does not demonstrate true phylogenic relationships. For example, the sweet proteins thaumatin and monellin, although recognized by the same receptors and the same antibodies, share no common feature in primary or secondary structure (190). Furthermore, the possibility remains that there are several classes of brain adenylyl cyclases, one of which might be a member of class II. Therefore, in the absence of knowledge about the catalytic mechanism, it is difficult to know whether there is a true similarity between class II and class III enzymes. If such were the case, it would provide an integrated view in which all cyclases derive from an ancestral nucleotide-binding protein, perhaps made of small independent modules, as in present-day protein kinases (176,177).
Alternatively, we could be facing a specific case of convergent evolution (191). Crystallographic data for all three classes are therefore much needed. In parallel, determination of the sequences of adenylyl-cyclase genes from the many organisms that are sensitive to cAMP should permit more convincing phylogenic comparisons (10,192-196).
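The four class III signature motifs listed earlier in this section use a compact notation, "-" for any residue and "(X/Y)" for alternative residues at one position, which maps directly onto regular expressions. The sketch below transcribes the motifs into Python's re module (taking the sixth position of motif 2 as D or S) and scans a protein sequence for them. The helper names are hypothetical; the test fragments are segments read from the Fig. 4 alignment, plus one constructed segment carrying the guanylyl-cyclase-type VET in place of (V/I)KT.

```python
import re

# Class III motifs in the notation of the text: "-" = any residue,
# "(X/Y/...)" = one position accepting any of the listed residues.
CLASS_III_MOTIFS = {
    1: "F(A/T)(D/S)(I/L/V)--(F/S)",
    2: "(V/I)KT-G(D/S)--M",
    3: "(R/K)-G(I/L/V)(H/N)G",
    4: "G(N/D/P)TVN-(A/S)(A/S)R",
}


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate the slash/dash motif notation into a compiled regex."""
    # "(A/T)" -> "[AT]"; done first, so no "-" ever ends up inside a class
    pattern = re.sub(r"\(([A-Z/]+)\)",
                     lambda m: "[" + m.group(1).replace("/", "") + "]",
                     motif)
    # every remaining "-" is a single-residue wildcard
    return re.compile(pattern.replace("-", "."))


def find_class_iii_motifs(protein: str) -> dict:
    """Map motif number -> list of (0-based start, matched segment)."""
    seq = protein.upper()
    return {
        number: [(m.start(), m.group())
                 for m in motif_to_regex(motif).finditer(seq)]
        for number, motif in CLASS_III_MOTIFS.items()
    }


if __name__ == "__main__":
    # Motif-2 and motif-4 segments as they appear in the Fig. 4 alignment
    for fragment in ("IKTIGSTYMAASG", "WGNTVNVASRMDS"):
        print(fragment, find_class_iii_motifs(fragment))
```

A constructed segment such as VETIGDAYMVVSG, with VET in place of (V/I)KT, does not match motif 2, which is the adenylyl/guanylyl difference discussed above. A scan of this kind only flags candidate segments; it says nothing about whether the protein is catalytically active.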

VI. Are Adenylyl Cyclases Pulse-Generating Enzymes?

Cyclic AMP is involved in the control of metabolic processes. In most instances, it acts as a molecule undergoing transient, sometimes oscillatory, changes in concentration. Adenylyl cyclases are very complex enzymes; not only are they constructed in modular fashion (several domains of the catalytic polypeptide are involved in regulation of activity), but they also interact with a large number of other sensory or regulatory polypeptides. It therefore seems worth investigating whether these enzymes have evolved to produce pulses of cAMP rather than stable concentrations of the metabolite (1). This may explain why such a large number of classes have evolved (three classes have already been identified, but there may be even more enzyme families; see, e.g., 1), each being specific to one type of time-dependent control function. For this reason, it is all the more urgent to obtain information about the catalytic processes permitting cyclization of ATP to cAMP. Stereochemical studies using phosphorothioate analogs of ATP and GTP (197,198) showed that the reaction catalyzed by bacterial or mammalian adenylyl and guanylyl cyclases proceeds with inversion of configuration at the α-phosphorus (199-201). Gerlt et al. (199) proposed that cyclization occurs by a single nucleophilic displacement and that a general-base catalyst is required to assist in the ionization of the 3'-OH group of the pentose moiety, which is thought to attack the α-phosphorus atom of the nucleoside triphosphate. Assuming that a general acid/base catalyst is involved in the interconversion of ATP and cAMP, histidine is the best candidate for such a function, owing to its ability to abstract (cAMP formation) or release (ATP formation from cAMP) protons at pH values close to the optimum pH of the reaction catalyzed by adenylyl cyclase. Substitution of histidine 63 in B. pertussis adenylyl cyclase with arginine, glutamate, glutamine, or valine decreased the catalytic efficiency by two to three orders of magnitude and altered the kinetic properties of the enzyme (202).
Although this residue is involved in the reaction mechanism, it probably does not interact directly with the substrate, being part of a charge-relay catalytic system, as frequently seen in reactions catalyzed by hydrolases or transferases. Tasks for future research will be to identify and clarify (i) the role of other amino-acid residues in such a proton-shuttle system and (ii) the diversity (if any) of the mechanisms of cyclization of ATP to cAMP.
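The case for a histidine as the proton shuttle rests on its side chain being partly protonated and partly unprotonated near neutral pH, so that it can both accept and donate protons. The Henderson-Hasselbalch relation makes this explicit. In the sketch below, the pKa values (about 6.0 for the histidine imidazole, about 4 for an aspartate carboxyl, about 10.5 for a lysine amine) are approximate textbook values for free side chains, used as assumptions; the actual pKa of histidine 63 in the protein is not given here and can be shifted by its environment.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of an ionizable group that still
    carries its dissociable proton at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Approximate free side-chain pKa values (assumptions, not measurements)
APPROX_PKA = {"His": 6.0, "Asp": 4.0, "Lys": 10.5}

if __name__ == "__main__":
    for ph in (6.0, 7.0, 8.0):
        row = ", ".join(f"{aa} {fraction_protonated(ph, pka):.3f}"
                        for aa, pka in APPROX_PKA.items())
        print(f"pH {ph}: {row}")
```

Under these assumptions, the histidine group is about 9% protonated at pH 7, so both forms are present, whereas an aspartate is almost entirely deprotonated and a lysine almost entirely protonated; those groups would be far less suited to switching between proton donor and acceptor near neutrality.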

VII. Glossary

cya: gene encoding adenylyl cyclase
cyaA: gene encoding B. pertussis adenylyl-cyclase toxin
cyaC: gene encoding the posttranscriptional activator of B. pertussis adenylyl-cyclase toxin
cyaB, cyaD, and cyaE: genes encoding components of the translocation system of B. pertussis adenylyl-cyclase toxin
hly: gene encoding the α-hemolysin of E. coli
hlyC: gene encoding the posttranscriptional activator of E. coli α-hemolysin
hlyB and hlyD: genes encoding components of the translocation system of E. coli α-hemolysin
RTX: repeat in toxin; a family of toxins including B. pertussis adenylyl cyclase and E. coli α-hemolysin
PTS: phosphoenolpyruvate-dependent phosphotransferase system
HPr: protein H of the phosphoenolpyruvate-dependent phosphotransferase system
Mtl: mannitol
CAP: catabolite activator protein
CaM: calmodulin
VU-8 CaM: an engineered calmodulin in which three glutamic acid residues (82-84) have been replaced by three lysine residues
PA: protective antigen
LF: lethal factor
EF: edema factor
CYR1: yeast adenylyl cyclase

ACKNOWLEDGMENTS

We thank Peter Sebo for useful comments, Susan Michelson and Iain Pemberton for careful reading of the manuscript, and Annie Beaudeux for excellent secretarial help. This work has been financed by grants from the Centre National de la Recherche Scientifique (URA 1129) and from the Direction de la Recherche et des Etudes Doctorales of the Ministère de l'Education Nationale (grant to AD, 3e section de l'Ecole Pratique des Hautes Etudes).

REFERENCES

1. A. Danchin, Adv. Second Messenger Phosphoprotein Res. 27, 109 (1993).
2. R. Iyengar, Adv. Second Messenger Phosphoprotein Res. 28, 27 (1993).
3. M. Mock and A. Ullmann, Trends Microbiol. 1, 187 (1993).
4. A. Peterkofsky and N. Gollop, Protein Sci. 2, 498 (1993).


OCTAVIAN BÂRZU AND ANTOINE DANCHIN

5. W.-J. Tang and A. G. Gilman, Cell 70, 869 (1992).
6. T. Okabayashi, M. Ide and A. Yoshimoto, ABB 100, 158 (1963).
7. R. S. Makman and E. W. Sutherland, FP 22, 470 (1963).
8. R. S. Makman and E. W. Sutherland, JBC 240, 1309 (1965).
9. M. Hirata and O. Hayaishi, BBRC 21, 361 (1965).
10. M. Ide, ABB 144, 262 (1971).
11. M. Ide, BBRC 36, 42 (1969).
12. F. Lipmann, M. Tao and A. Huberman, Fogarty Int. Center Proc. 4, 29 (1969).
13. M. Tao and F. Lipmann, PNAS 63, 86 (1969).
14. K. Takai, Y. Kurashima, C. Suzuki-Hori, H. Okamoto and O. Hayaishi, JBC 249, 1965 (1974).
15. J. K. Yang and W. Epstein, JBC 258, 3750 (1983).
16. M. M. Holland, T. K. Leib and J. A. Gerlt, JBC 263, 14661 (1988).
17. R. L. Khandelwal and I. R. Hamilton, JBC 246, 3297 (1971).
18. J. Y. J. Wang, D. O. Clegg and D. E. Koshland, Jr., PNAS 78, 4684 (1981).
19. A. Roy and A. Danchin, Biochimie 63, 719 (1981).
20. A. Roy and A. Danchin, MGG 188, 465 (1982).
21. A. Danchin, N. Guiso, A. Roy and A. Ullmann, JMB 175, 403 (1984).
22. H. Aiba, K. Mori, M. Tanaka, T. Ooi, A. Roy and A. Danchin, NARes 12, 9427 (1984).
23. P. Reddy, A. Peterkofsky and K. McKenney, NARes 17, 10473 (1989).
24. A. Roy, A. Danchin, E. Joseph and A. Ullmann, JMB 165, 197 (1983).
25. I. Crenon, D. Ladant, N. Guiso, A.-M. Gilles and O. Bârzu, EJB 159, 605 (1986).
26. L. Hedegaard and A. Danchin, MGG 201 (1985).
27. M. Mock, M. Crasnier, E. Duflot, V. Dumay and A. Danchin, J. Bact. 173, 6265 (1991).
28. I. R. Dorocicz, P. M. Williams and R. J. Redfield, J. Bact. 175, 7142.
29. J. Janecek, J. Naprstek, Z. Dobova, M. Jiresova and J. Spizek, FEMS Microbiol. Lett. 6, 305 (1979).
30. A. Ullmann and A. Danchin, Adv. Cyclic Nucleotide Res. 15, 1 (1983).
31. P. W. Postma, J. W. Lengeler and G. R. Jacobson, Microbiol. Rev. 57, 543 (1993).
31a. M. Crasnier, V. Dumay and A. Danchin, MGG 243, 409 (1994).
32. B. U. Feucht and M. H. Saier, Jr., J. Bact. 141, 603 (1980).
33. G. T. Robillard, H. H. Pas, R. H. ten Hoeve-Duurkens and M. G. L. Elferink, FEMS Microbiol. Rev. 63, 135 (1989).
34. J. W. Lengeler and A. P. Vogler, FEMS Microbiol. Rev. 63, 81 (1989).
35. W. J. Dobrogosz, G. W. Hall, D. K. Sherba, D. O. Silva, J. G. Harman and T. Melton, MGG 192, 477 (1983).
36. J. Ahnn, P. E. March, H. E. Takiff and M. Inouye, PNAS 83, 8849 (1986).
37. P. Reddy, D. Miller and A. Peterkofsky, JBC 261, 11448 (1986).
38. A. Peterkofsky, ABB 265, 277 (1988).
39. S. Lévy, G. Zeng and A. Danchin, Gene 86, 27 (1990).
40. M. Crasnier and A. Danchin, J. Gen. Microbiol. 136 (1990).
41. A. Goldhammer and A. Wolff, Anal. Biochem. 124, 45 (1982).
42. J. Wolff, G. H. Cook, A. R. Goldhammer and S. A. Berkowitz, PNAS 77, 3840 (1980).
43. J. Wolff and G. H. Cook, JBC 248, 350 (1973).
44. E. L. Hewlett and J. Wolff, J. Bact. 127, 890 (1976).
45. A. Rogel, Z. Farfel, S. Goldschmidt, J. Shiloach and E. Hanski, JBC 263, 13310 (1988).
46. E. Hanski, TIBS 14, 459 (1989).
47. H. R. Masure, D. J. Oldenburg, M. G. Donovan, R. L. Shattuck and D. R. Storm, JBC 263, 6933 (1988).
48. R. L. Shattuck, D. J. Oldenburg and D. R. Storm, Bchem 24, 6356 (1985).
49. D. Ladant, C. Brézin, J.-M. Alonso, I. Crenon and N. Guiso, JBC 261, 16264 (1986).

MOLECULAR HETEROGENEITY OF ADENYLYL CYCLASES


50. J. Haiech, R. Predeleanu, D. M. Watterson, D. Ladant, J. Bellalou, A. Ullmann and O. Bârzu, JBC 263, 4259 (1988).
51. D. Ladant, S. Michelson, R. Sarfati, A.-M. Gilles, R. Predeleanu and O. Bârzu, JBC 264, 4015 (1989).
52. P. Glaser, D. Ladant, O. Sezer, F. Pichot, A. Ullmann and A. Danchin, Mol. Microbiol. 2, 19 (1988).
53. P. Glaser, A. Danchin, D. Ladant, O. Bârzu and A. Ullmann, Tokai J. Exp. Clin. Med. 13 (suppl.), 239 (1988).
54. I. E. Ehrmann, M. C. Gray, V. M. Gordon, L. S. Gray and E. L. Hewlett, FEBS Lett. 278, 79 (1991).
55. P. Glaser, A. Elmaoglou-Lazaridou, E. Krin, D. Ladant, O. Bârzu and A. Danchin, EMBO J. 8, 967 (1989).
56. A. Rogel, R. Meller and E. Hanski, JBC 266, 3154 (1991).
57. P. Glaser, H. Sakamoto, J. Bellalou, A. Ullmann and A. Danchin, EMBO J. 7, 3997 (1988).
58. P. Glaser, A. Danchin, O. Bârzu and A. Ullmann, Bact. Protein Toxins 17 (suppl.), 375 (1990).
59. E. L. Hewlett, V. M. Gordon, J. D. McCaffery, W. M. Sutherland and M. C. Gray, JBC 264, 19379 (1989).
60. M. S. Leusch, S. Paulaitis and R. L. Friedman, Infect. Immun. 58, 3621 (1990).
61. R. H. Masure and R. R. Storm, Bchem 28, 438 (1989).
62. E. L. Hewlett, M. A. Urban, C. A. Manclark and J. Wolff, PNAS 73, 1926 (1976).
63. J. Bellalou, H. Sakamoto, D. Ladant, C. Geoffroy and A. Ullmann, Infect. Immun. 58, 3242 (1990).
64. W. Goebel and J. Hedgpeth, J. Bact. 151, 1290 (1982).
65. R. A. Welch and S. Pellett, J. Bact. 170, 1622 (1988).
66. J. M. Nicaud, N. Mackman, L. Gray and I. B. Holland, FEBS Lett. 187, 339 (1985).
67. E. M. Barry, A. A. Weiss, I. E. Ehrmann, M. C. Gray, E. L. Hewlett and M. St. Mary Goodwin, J. Bact. 173, 720 (1991).
68. A. Bouhss, E. Krin, H. Munier, A.-M. Gilles, A. Danchin, P. Glaser and O. Bârzu, JBC 268, 1690 (1993).
69. D. V. Greenlee, T. J. Andreasen and D. R. Storm, Bchem 21, 2759 (1982).
70. D. Ladant, JBC 263, 2612 (1988).
71. H. Munier, A.-M. Gilles, R. Sarfati, P. Glaser, E. Krin, A. Danchin and O. Bârzu, EJB 196, 469 (1991).
72. H. Munier, E. Krin, A.-M. Gilles, P. Glaser, A. Bouhss, A. Danchin and O. Bârzu, in "Adenine Nucleotides in Cellular Energy Transfer and Signal Transduction" (S. Papa, A. Azzi and J. M. Tager, eds.), p. 335. Birkhäuser Verlag, Basel, 1992.
72a. C. T. Craescu, A. Bouhss, J. Mispelter, E. Diesis, A. Popescu, M. Chiriac and O. Bârzu, Biochemistry (submitted).
72b. H. Munier, A. Bouhss, A.-M. Gilles, N. Palibroda, O. Bârzu, J. Mispelter and C. T. Craescu, Eur. J. Biochem. (submitted).
73. A.-M. Gilles, H. Munier, T. Rose, P. Glaser, E. Krin, A. Danchin, C. Pellecuer and O. Bârzu, Bchem 29, 8126 (1990).
74. M. Dasgupta, T. Honeycutt and D. K. Blumenthal, JBC 264, 17156 (1989).
75. B. E. Kemp, R. B. Pearson, V. Guerriero, I. C. Bagchi and A. R. Means, JBC 262, 2542 (1987).
76. H. Munier, A. Bouhss, A. Gilles, E. Krin, P. Glaser, A. Danchin and O. Bârzu, EJB 217, 581 (1993).
77. R. S. Sarfati, V. K. Kansal, H. Munier, P. Glaser, A.-M. Gilles, E. Labruyère, M. Mock, A. Danchin and O. Bârzu, JBC 265, 18902 (1990).
78. R. S. Sarfati, A. Namane, H. Munier and O. Bârzu, Tetrahedron Lett. 32, 4699 (1991).



79. D. S. Au, H. R. Masure and D. R. Storm, Bchem 28, 2772 (1989).
80. P. Glaser, H. Munier, A.-M. Gilles, E. Krin, T. Porumb, O. Bârzu, R. Sarfati, C. Pellecuer and A. Danchin, EMBO J. 10, 1683 (1991).
81. A. Rogel, J. E. Schultz, R. M. Brownlie, J. G. Coote, R. Parton and E. Hanski, EMBO J. 8, 2755 (1989).
82. H. Sakamoto, J. Bellalou, P. Sebo and D. Ladant, JBC 267, 1 (1992).
83. I. E. Ehrmann, A. A. Weiss, M. S. Goodwin, M. C. Gray, E. Barry and E. L. Hewlett, FEBS Lett. 304, 51 (1992).
84. M. K. Gross, D. C. Au, A. L. Smith and D. R. Storm, PNAS 89, 4898 (1992).
85. J. Bellalou, D. Ladant and H. Sakamoto, Infect. Immun. 58, 1195 (1990).
86. N. Khelef, H. Sakamoto and N. Guiso, Microb. Pathog. 12, 227 (1992).
87. D. L. Confer and J. W. Eaton, Science 217, 948 (1982).
88. R. A. Welch, Mol. Microbiol. 5, 521 (1991).
89. D. F. Boehm, R. A. Welch and I. S. Snyder, Infect. Immun. 58, 1959 (1990).
90. E. L. Hewlett, L. Gray, M. Allietta, I. Ehrmann, V. M. Gordon and M. C. Gray, JBC 266, 17503 (1991).
91. P. Sebo, P. Glaser, H. Sakamoto and A. Ullmann, Gene 104, 19 (1991).
92. F. Betsou, P. Sebo and N. Guiso, Infect. Immun. 61, 3583 (1993).
93. E. L. Hewlett, M. C. Gray, I. E. Ehrmann, N. J. Maloney, A. S. Otero, L. Gray, M. Allietta, G. Szabo, A. A. Weiss and E. M. Barry, JBC 268, 7842 (1993).
94. P. Sebo and D. Ladant, Mol. Microbiol. 9, 999 (1993).
95. S. H. Leppla, Adv. Cyclic Nucleotide Protein Phosphorylation Res. 17, 189 (1984).
96. S. H. Leppla, Methods Enzymol. 195, 153 (1991).
97. S. H. Leppla, PNAS 79, 3162 (1982).
98. A. Cataldi, E. Labruyère and M. Mock, Mol. Microbiol. 4, 1111 (1990).
99. M. Mock, E. Labruyère, P. Glaser, A. Danchin and A. Ullmann, Gene 64, 277 (1988).
100. M. T. Tippets and D. L. Robertson, J. Bact. 170, 2263 (1988).
101. E. Labruyère, M. Mock, D. Ladant, S. Michelson, A.-M. Gilles, B. Laoide and O. Bârzu, Bchem 29, 4922 (1990).
102. M. Byler and H. Susi, Biopolymers 25, 469 (1986).
103. E. Labruyère, M. Mock, W. K. Surewicz, H. H. Mantsch, T. Rose, H. Munier, R. S. Sarfati and O. Bârzu, Bchem 30, 2612 (1991).
104. Z. Xia and D. R. Storm, JBC 265, 6517 (1990).
105. H. Munier, F. J. Blanco, B. Précheur, E. Diesis, J. L. Nieto, C. T. Craescu and O. Bârzu, JBC 268, 1695 (1993).
106. K. T. O'Neil, J. H. R. Wolfe, S. Erickson-Viitanen and W. F. DeGrado, Science 236, 1454 (1987).
107. K. T. O'Neil and W. F. DeGrado, TIBS 15, 59 (1990).
108. K. B. Seamon, W. Padgett and J. W. Daly, PNAS 78, 3363 (1981).
109. K. B. Seamon and J. W. Daly, Adv. Cyclic Nucleotide Res. 20, 1 (1986).
110. A. Laurenza and K. B. Seamon, Methods Enzymol. 195, 52 (1991).
111. E. Pfeuffer, S. Mollner and T. Pfeuffer, EMBO J. 4, 3675 (1985).
112. E. Pfeuffer, S. Mollner, D. Lancet and T. Pfeuffer, JBC 264, 18803 (1989).
113. E. Pfeuffer, S. Mollner and T. Pfeuffer, Methods Enzymol. 195, 83 (1991).
114. A. Monneron, J. d'Alayer and F. Coussen, Biochimie 69, 263 (1987).
115. C. Orlando, J. d'Alayer, G. Baillat, F. Castets, O. Jeannequin, J.-C. Mazié and A. Monneron, Bchem 31, 3215 (1992).
116. C. O. Brostrom, M. A. Brostrom and D. J. Wolff, JBC 252, 5677 (1977).
117. K. R. Westcott, D. C. LaPorte and D. R. Storm, PNAS 76, 204 (1979).
118. M. D. Smigel, JBC 261, 1976 (1986).



119. A. e. V. M. Minocherhomjee, S. Selfe, N. J. Flowers and D. R. Storm, Bchem 26, 4444 (1987).
120. S. Mollner and T. Pfeuffer, EJB 171, 265 (1988).
121. T. Kataoka, D. Broek and M. Wigler, Cell 43, 493 (1985).
122. P. Masson, G. Lenzen, J. M. Jacquemin and A. Danchin, Curr. Genet. 10, 343 (1986).
123. M. Fedor-Chaiken, R. J. Deschenes and J. R. Broach, Cell 61, 329 (1990).
124. J. Field, A. Vojtek, R. Ballester, G. Bolger, J. Colicelli, K. Ferguson, J. Gerst, T. Kataoka, T. Michaeli, S. Powers, M. Riggs, L. Rodgers, I. Wieland, B. Wheland and M. Wigler, Cell 61, 319 (1990).
125. G. Feger, E. De Vendittis, A. Vitelli, P. Mastuno, R. Zahn, A. C. Verrotti, C. Kavounis, G. P. Pal and O. Fasano, EMBO J. 10, 349 (1991).
126. D. Young, M. Riggs, J. Field, A. Vojtek, D. Broek and M. Wigler, PNAS 86, 7989 (1989).
127. D. T. Ross, A. Raibaud, I. C. Florent, S. Sather, M. K. Gross, D. R. Storm and H. Eisen, EMBO J. 10, 2047 (1991).
128. S. Alexandre, P. Paindavoine, P. Tebabi, A. Pays, S. Halleux and M. Steinert, Mol. Biochem. Parasitol. 43, 279 (1990).
129. S. Kore-Eda, T. Murayama and I. Uno, Jpn. J. Genet. 66, 317 (1991).
130. G. S. Pitt, N. Milona, J. Borleis, K. C. Lin, R. R. Reed and P. N. Devreotes, Cell 69, 305 (1992).
131. L. K. Read and R. B. Mikkelsen, Mol. Biochem. Parasitol. 45, 109 (1991).
132. Y. Yamawaki-Kataoka, T. Tamaoki, H. Choe, H. Tanaka and T. Kataoka, PNAS 86, 5693 (1989).
133. D. Young, K. O'Neill, D. Broek and M. Wigler, Gene 102, 129 (1991).
134. J. Krupinski, F. Coussen, H. A. Bakalyar, W.-J. Tang, P. G. Feinstein, K. Orth, C. Slaughter, R. R. Reed and A. G. Gilman, Science 244, 1558 (1989).
135. T. Vorherr, L. Knöpfel, F. Hofmann, S. Mollner, T. Pfeuffer and E. Carafoli, Bchem 32, 6081 (1993).
136. W.-J. Tang, J. Krupinski and A. G. Gilman, JBC 266, 8595 (1991).
137. P. G. Feinstein, K. A. Schrader, H. A. Bakalyar, W.-J. Tang, J. Krupinski, A. G. Gilman and R. R. Reed, PNAS 88, 10173 (1991).
137a. R. Iyengar, FASEB J. 7, 768 (1993).
137b. N. Haber, D. Stengel, N. Defer, N. Roeckel, M. G. Mattei and J. Hanoune, Genomics, in press (1994).
138. H. A. Bakalyar and R. R. Reed, Science 250, 1403 (1990).
139. B. Gao and A. G. Gilman, PNAS 88, 10178 (1991).
140. S. Katsushika, L. Chen, J. I. Kawabe, R. Nilakantan, N. J. Halnon, C. J. Homcy and Y. Ishikawa, PNAS 89, 8774 (1992).
141. R. D. Gerard and Y. Gluzman, Mol. Cell. Biol. 5, 3231 (1985).
142. S. Katsushika, J. I. Kawabe, C. J. Homcy and Y. Ishikawa, JBC 268, 2273 (1993).
143. W.-J. Tang and A. G. Gilman, Science 254, 1500 (1991).
144. D. L. Garbers, Trends Endocrinol. Metab. 1, 64 (1989).
145. S. Menz, J. Bumann, E. Jaworski and D. Malchow, J. Cell Sci. 99, 187 (1991).
146. J. Tremblay, R. Gerzer and P. Hamet, Adv. Second Messenger Phosphoprotein Res. 319 (1988).
147. S. Steinlen, S. Klumpp and J. E. Schultz, BBA 1054, 69 (1990).
148. M. Chinkers, D. L. Garbers, M. S. Chang, D. G. Lowe, H. Chin, D. V. Goeddel and S. Schulz, Nature 338, 78 (1989).
149. E. P. Peters, A. F. Wilderspin, S. P. Wood, M. J. J. M. Zvelebil, O. Sezer and A. Danchin, Mol. Microbiol. 5, 1175 (1991).
150. R. Lathigra, M. O'Regan, B. Kiely, B. Boesten and F. O'Gara, Gene 44, 89 (1986).




151. A. Beuve, B. Boesten, M. Crasnier, A. Danchin and F. O'Gara, J. Bact. 172, 2614 (1990).
152. A. Beuve, E. Krin and A. Danchin, C.R. Acad. Sci. Paris 316, 553 (1993).
153. A. Danchin, J. Pidoux, E. Krin, C. J. Thompson and A. Ullmann, FEMS Microbiol. Lett. 114, 145 (1993).
154. D. L. Garbers, New Biologist 2, 499 (1990).
155. S. Schulz, P. S. T. Yuen and D. L. Garbers, Trends Pharmacol. Sci. 12, 116 (1991).
156. M. S. Chang, D. G. Lowe, M. Lewis, R. Hellmiss, E. Chen and D. V. Goeddel, Nature 341, 68 (1989).
157. D. L. Garbers, JBC 264, 9103 (1989).
158. D. Koesling, C. Harteneck, P. Humbert, A. Bosserhoff, R. Frank, G. Schultz and E. Böhme, FEBS Lett. 266, 128 (1990).
159. S. Schulz, M. Chinkers and D. L. Garbers, FASEB J. 3, 2026 (1989).
160. S. Schulz, S. Singh, R. A. Bellet, G. Singh, D. J. Tubb, H. Chin and D. L. Garbers, Cell 58, 1155 (1989).
161. S. Schulz, C. K. Green, P. S. T. Yuen and D. L. Garbers, Cell 63, 941 (1990).
162. D. S. Thorpe and D. L. Garbers, JBC 264, 6545 (1989).
163. M. Nakane, S. Saheki, T. Kuno, K. Ishii and F. Murad, BBRC 157, 1139 (1988).
164. M. Nakane, S. Saheki, T. Kuno and F. Murad, FASEB J. 4, A872 (1990).
165. L. J. Ignarro, Pharmacol. Toxicol. 67, 1 (1990).
166. P. A. Marsden and B. J. Ballermann, J. Exp. Med. 172, 1843 (1990).
167. M. Chinkers and D. L. Garbers, Adv. Cyclic Nucleotide Res. 60, 553 (1991).
168. D. Koesling, C. Schultz and E. Böhme, FEBS Lett. 280, 301 (1991).
169. S. Singh, D. G. Lowe, D. S. Thorpe, H. Rodriguez, W.-J. Kuang, L. J. Dangott, M. Chinkers, D. V. Goeddel and D. L. Garbers, Nature 334, 708 (1988).
170. R. N. Lolley and R. H. Lee, FASEB J. 4, 1001 (1990).
171. A. M. Dizhoor, S. Ray, S. Kumar, G. Niemi, M. Spencer, D. Brolley, K. A. Walsh, P. P. Philipov, J. B. Hurley and L. Stryer, Science 251, 915 (1991).
172. H.-G. Lambrecht and K.-W. Koch, EMBO J. 10, 793 (1991).
173. D. J. Brenner, in "Bergey's Manual of Systematic Bacteriology" (N. H. Krieg and J. G. Holt, eds.), p. 408. Williams & Wilkins, Baltimore, 1984.
174. A. Danchin and G. Lenzen, Second Messengers Phosphoproteins 12, 7 (1988).
175. S. K. Hanks, A. M. Quinn and T. Hunter, Science 241, 42 (1988).
176. D. R. Knighton, J. Zheng, L. F. Ten Eyck, V. A. Ashford, N.-H. Xuong, S. S. Taylor and J. M. Sowadski, Science 253, 407 (1991).
177. D. R. Knighton, J. Zheng, L. F. Ten Eyck, N.-H. Xuong, S. S. Taylor and J. M. Sowadski, Science 253, 414 (1991).
178. R. T. Premont, J. Chen, H. W. Ma, M. Ponnapalli and R. Iyengar, PNAS 89, 9809 (1992).
179. I. T. Weber, J. B. Shabb and J. D. Corbin, Bchem 28, 6122 (1989).
180. A. Beuve and A. Danchin, JMB 225, 933 (1992).
181. J. B. Shabb, L. Ng and J. D. Corbin, JBC 265, 16031 (1990).
182. J. D. Bernal, "The Physical Basis of Life." Routledge & Kegan Paul, London, 1951.
183. A. G. Cairns-Smith, "Genetic Takeover and the Mineral Origins of Life." Cambridge University Press, Cambridge, England, 1982.
184. A. Danchin, Prog. Biophys. Mol. Biol. 54, 81 (1989).
185. G. Wächtershäuser, Microbiol. Rev. 52, 452 (1988).
186. M. Takeyama, K. Ihara, Y. Moriyama, T. Noumi, K. Ida, N. Tomioka, A. Itai, M. Maeda and M. Futai, JBC 265, 21279 (1990).
187. R. Serrano, BBA 947, 1 (1988).
188. S. Goyard, C. Orlando, J.-M. Sabatier, E. Labruyère, J. d'Alayer, G. Fontan, J. van Rietschoten, M. Mock, A. Danchin, A. Ullmann and A. Monneron, Bchem 28, 1964 (1989).

189. A. Monneron, D. Ladant, J. d'Alayer, J. Bellalou, O. Bârzu and A. Ullmann, Bchem 27, 536 (1988).
190. S.-H. Kim, A. de Vos and C. Ogata, TIBS 13, 13 (1988).
191. J. Kuriyan, T. S. R. Krishna, L. Wong, B. Guenther, A. Pähler, C. H. Williams, Jr. and P. Model, Nature 352, 172 (1991).
192. J. Ishiyama, Appl. Microbiol. Biotechnol. 34, 359 (1990).
193. R. Kaul and W. M. Wenman, J. Bact. 168, 722 (1986).
194. R. L. Khandelwal and I. R. Hamilton, BBRC 151, 75 (1972).
195. C. Paveto, G. Egidy, M. A. Galvagno and S. Passeron, BBRC 167, 1177 (1990).
196. F. Verni and G. Rosati, J. Exp. Zool. 244, 289 (1987).
197. F. Eckstein, TIBS 5, 157 (1980).
198. P. A. Frey, Adv. Enzymol. 60, 119 (198?).
199. J. A. Gerlt, J. A. Coderre and M. S. Wolin, JBC 255, 331 (1980).
200. F. Eckstein, P. Romaniuk, W. Heideman and D. R. Storm, JBC 256, 9118 (1981).
201. K. W. Koch, F. Eckstein and L. Stryer, JBC 265, 9659 (1990).
202. H. Munier, A. Bouhss, E. Krin, A. Danchin, A.-M. Gilles, P. Glaser and O. Bârzu, JBC 267, 9816 (1992).
203. D. Engelberg, E. Poradosu, G. Simchen and A. Levitzki, FEBS Lett. 261, 413 (1990).
204. L. R. Levin, P. L. Han, P. M. Hwang, P. G. Feinstein, R. L. Davis and R. R. Reed, Cell 68, 479 (1992).
205. Y. Ishikawa, S. Katsushika, L. Chen, N. J. Halnon, J. I. Kawabe and C. J. Homcy, JBC 267, 13553 (1992).
206. M. Yoshimura and D. M. F. Cooper, PNAS 89, 6716 (1992).


Mutational Spectrometry: Means and Ends

K. KHRAPKO, P. ANDRÉ, R. CHA, G. HU AND W. G. THILLY¹

Center for Environmental Health Sciences
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139

I. Goals and Problems .................................................. 285
II. Allele-specific PCR (ASP) .......................................... 289
III. High-efficiency Restriction Assay (HEM) ............................. 295
IV. Methods Using Differential DNA Melting to Separate Mutants ............ 302
References .............................................................. 311

I. Goals and Problems

A. Definition

A mutational spectrum is the distribution of mutations within a defined DNA sequence with regard to position and kind. Seymour Benzer and the late Ernst Freese showed that different mutagens give different spectra of point-mutations in the rII region of bacteriophage T4 (1). Most researchers who have used mutational spectra have been interested in the molecular mechanisms of spontaneous and chemically induced mutation; a few have been interested in finding the primary causes of mutation in the various organs of humans. Getting the point-mutational spectrum from a population of phage, bacteria, or human cells in culture is straightforward: one isolates independent mutant colonies and sequences each until one has enough information for the intended use. This "clone-by-clone" method is the one Benzer and Freese used, and it has been good enough for many mechanistic studies; much useful information has been obtained in this way. Analysis of the literature from 1958 to 1979 suggested to us, however, that clone-by-clone spectra simply would not be good enough for the next

¹ To whom correspondence may be addressed.

Progress in Nucleic Acid Research and Molecular Biology, Vol. 49


Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.

K. KHRAPKO ET AL.


level of mutational mechanism studies or for finding the causes of human germinal and somatic mutation. Our work and this essay thus focus on our attempts to measure mutational spectra in whole human cell populations and in human tissue samples.

B. Appropriate Data Sets

It is worthwhile to consider how the intended use of mutational spectra influences the criteria applied to define an appropriate data set. If we want to know whether a mutagen causes G-to-A transitions (as many alkylating agents do), we could just isolate and sequence a few dozen colonies and reach a supportable conclusion. If, however, we want to know whether two alkylating agents cause significantly different spectra, considerably more work must be done. For example, Coulondre and Miller compared the mutations found in the lacI gene of E. coli after N-methyl-N′-nitro-N-nitrosoguanidine (MNNG) treatment with those seen after ethyl methanesulfonate (EMS) treatment (2). More than 900 mutants were scored for the former mutagen and more than 600 for the latter. Pairwise comparison showed that both mutagens caused G-to-A transitions at the same set of base-pairs. Statistical analysis (χ²) showed the spectra of the two chemicals to be different at the 99% confidence limit. However, if the data sets were simply divided in half, the two spectra were not significantly different, even at the 95% confidence limit. That is a lot of work (>750 clones) only to conclude that two spectra are not significantly different, and the result is especially unsatisfying in this case, in which the spectra are, in fact, different. Yet one could not be sure of this difference with fewer than the 1500 colonies studied.
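The halving effect described above is a general property of the χ² statistic. The sketch below computes a Pearson χ² homogeneity statistic for two spectra laid out as a 2 × k table of mutant counts per site. The counts are invented for illustration (they are not Coulondre and Miller's data), the critical values are standard χ² table entries for 3 degrees of freedom, and the function and variable names are hypothetical.

```python
# Critical values of chi-square with 3 degrees of freedom (standard tables).
CHI2_CRITICAL_DF3 = {0.05: 7.815, 0.01: 11.345}


def chi_square_homogeneity(spectrum_a, spectrum_b):
    """Pearson chi-square for a 2 x k table of mutant counts per site.

    Returns (statistic, degrees of freedom). Sites with no mutants in
    either spectrum are skipped.
    """
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("both spectra must list the same sites")
    n_a, n_b = sum(spectrum_a), sum(spectrum_b)
    grand_total = n_a + n_b
    statistic, sites_used = 0.0, 0
    for a, b in zip(spectrum_a, spectrum_b):
        site_total = a + b
        if site_total == 0:
            continue
        sites_used += 1
        for observed, row_total in ((a, n_a), (b, n_b)):
            expected = row_total * site_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, sites_used - 1


# Hypothetical counts at four G:C sites for two agents (illustration only).
agent_1 = [114, 80, 60, 46]
agent_2 = [76, 90, 60, 74]

full, df = chi_square_homogeneity(agent_1, agent_2)
half, _ = chi_square_homogeneity([c // 2 for c in agent_1], [c // 2 for c in agent_2])
print(f"all mutants:   chi2 = {full:.2f} (df = {df}), 1% critical = {CHI2_CRITICAL_DF3[0.01]}")
print(f"half the data: chi2 = {half:.2f} (df = {df}), 5% critical = {CHI2_CRITICAL_DF3[0.05]}")
```

With the proportions unchanged, halving every count halves χ²: here 14.72 exceeds the 1% critical value, while 7.36 falls short of the 5% one, the same pattern as the half-data comparison in the text.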

C. Bulk Approach to Mutant Analysis: Selectable Genes

We were not thrilled by the idea of sequencing hundreds of human cell mutant colonies to obtain results, so we set out to find a less arduous way. We focused our attention on a selectable gene and combined mass selection of mutants with separation of mutant sequences by denaturing gradient gel electrophoresis (DGGE; see Section IV). We enumerated the mutants by their band intensities on the gel, isolated and sequenced the mutant bands, and published the spectra (3). In such experiments, the reproducibility among spectra obtained from independent human cell cultures is excellent because the number of mutants induced by mutagen treatment is large. For instance, in a typical experiment using exon 3 of the hprt gene as the DNA sequence of interest, we make sure that more than 10,000 hprt⁻ mutants survive treatment in each replicate culture. In this way, any particular mutant that represents 1% or more of the hprt⁻ mutants will occur at least 100 times among the surviving


MUTATIONAL SPECTROMETRY


cells. Among independent experiments, the 95% confidence limits on the expectation of 100 will be 80 and 120. Counting and isolating mutants by DGGE has greatly simplified the job, and one intermediate goal has been reached: we can now efficiently obtain the kind of spectra previously obtained by clone-by-clone analysis in simple cell systems. For example, the method was applicable in human cell culture even at the level of sensitivity required for studying spontaneous mutation (4). The method is general for any selectable gene in viruses, bacteria, yeast, or mammalian cells. However, we have not yet reached our ultimate technical goal of measuring spectra in human tissues.
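The 80-to-120 range follows from Poisson counting statistics. A minimal sketch, assuming the normal approximation to the Poisson distribution (variance equal to the mean) is adequate at an expected count of 100; the helper name is hypothetical.

```python
import math


def poisson_normal_limits(expected_count, z=1.96):
    """Approximate 95% limits on a Poisson count via the normal approximation."""
    sd = math.sqrt(expected_count)  # for a Poisson count, variance equals the mean
    return expected_count - z * sd, expected_count + z * sd


surviving_mutants = 10_000  # lower bound per replicate culture, from the text
hot_spot_share = 0.01       # a mutant making up 1% of the hprt- mutants
expected = surviving_mutants * hot_spot_share  # 100 copies
low, high = poisson_normal_limits(expected)
print(f"expected {expected:.0f} copies, 95% limits about {low:.0f} to {high:.0f}")
```

The approximation gives 100 ± 19.6, which rounds to the limits quoted in the text.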

D. Requirements for Measuring Mutational Spectra in Human Tissue

Point-mutant fractions for genes such as hprt in T cells of middle-aged humans were reported to be about 10⁻⁵ (5, 6). Since mutational spectra that include "hot spots" consisting of 1% or more of all mutants in a gene are very useful for mechanistic or causation studies, it is reasonable to assume that a typical study of mutational spectra in humans would require a means to measure mutations at a frequency of 10⁻⁷ (1% of 10⁻⁵) or higher for single-copy nuclear genes. Such a frequency requires that 10⁹ cells be used to produce a spectrum, which ensures that each "1% hot spot" is represented by a statistically adequate 100 copies. It is worth noting here that 10⁹ mammalian cells contain about 3000 µg of DNA. This is an enormous amount, which is very difficult to process. For example, the DNA concentration in a polymerase chain reaction (PCR) should not exceed 50 µg/ml (7; R. Cha, unpublished), or only 5 µg per standard 100-µl reaction. This is a common challenge for human mutational spectrometry, regardless of which technique is used. One approach would be to restriction-digest the DNA and isolate the size fraction containing the desired sequence, which would reduce the amount of DNA to about 1% of the original. The use of multi-copy genes may simplify the task. For ribosomal RNA genes at 400 copies per cell, 2.5 × 10⁶ cells would suffice to produce a reproducible spectrum. Mitochondrial genes exist at 4000 copies per cell (8). Moreover, mitochondrial mutation rates appear to be some 20 times higher than those for nuclear genes (9). Together, these two facts mean that a means to measure mitochondrial mutations at a frequency of 2 × 10⁻⁶ in a sample of about 10⁴ cells would be suitable for mutational spectrometry of human tissue mitochondria.
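The sample sizes in this paragraph all follow from one rule: a hot spot present at fraction f must be sampled about 100/f times, and the number of cells needed is that figure divided by the number of target copies per cell. The sketch below reproduces the arithmetic using only numbers given in the text; as the text's 10⁹-cell estimate implies, it assumes one target copy per cell for the nuclear case. The function name is hypothetical.

```python
def cells_needed(hot_spot_fraction, copies_per_cell, copies_wanted=100):
    """Cells needed for a hot spot at `hot_spot_fraction` to appear `copies_wanted` times."""
    target_copies = copies_wanted / hot_spot_fraction
    return target_copies / copies_per_cell


nuclear = cells_needed(1e-7, copies_per_cell=1)           # single-copy nuclear gene
ribosomal = cells_needed(1e-7, copies_per_cell=400)       # rRNA genes
mitochondrial = cells_needed(2e-6, copies_per_cell=4000)  # hot spot 20x more frequent

# DNA bookkeeping for the nuclear case, using the text's figures:
# about 3000 ug of DNA per 1e9 cells and at most 5 ug per standard PCR.
UG_DNA_PER_CELL = 3000 / 1e9
dna_ug = nuclear * UG_DNA_PER_CELL
pcr_reactions = dna_ug / 5

for label, n in [("nuclear", nuclear), ("rRNA", ribosomal), ("mitochondrial", mitochondrial)]:
    print(f"{label}: {n:.3g} cells")
print(f"nuclear case: {dna_ug:.0f} ug DNA, about {pcr_reactions:.0f} standard PCRs")
```

The last line makes the processing problem concrete: at 5 µg per reaction, the DNA from 10⁹ cells would fill roughly 600 standard PCRs.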

E. Unselected Approaches

To obtain useful mutational spectra for human tissues, one thus must deal with rather low mutant fractions. Unfortunately, phenotypic selection cannot


be used to enrich for mutants in most tissues, as opposed to bacteria or cell culture. Another drawback of phenotypic selection is that the approach is limited to selectable genes. This limitation matters not only because mutations at some DNA loci of particular interest cannot be selected; even more important, selection generally rules out the use of multi-copy genes. Phenotypic selection must therefore be replaced by other processes that get rid of wild-type sequences. Our early attempts in the field were based on DGGE separations with radioactive label detection. A simple combination of PCR using a high-fidelity DNA polymerase with DGGE was shown to detect a mutant at a fraction of about 10⁻³ (4). A more advanced approach included a DGGE separation of mutants from a mixture of restriction fragments, followed by PCR of the eluted DNA and a second DGGE of the resulting mixture of PCR fragments. This approach enabled us to detect mutants at fractions below 10⁻⁶. However, it suffered from a reproducible but still unexplained non-linearity of response at fractions below 10⁻⁴ (10, 11). In the sections below, we discuss three separate approaches we have taken; they appear promising either alone or in combination. Allele-specific PCR (ASP), especially our favorite variant, the mismatch amplification mutation assay (MAMA) described in Section II, allows us to measure a specific point-mutation by constructing a PCR primer and using conditions that support the amplification of the mutant but not of the wild-type sequence (12). With our high-efficiency restriction-enzyme digestion assay (HEM), we select for mutants in "six-cutter" restriction sites (G. Hu, unpublished). The restriction enzyme chosen cuts the wild-type recognition sequence but none of its possible mutants. The uncut mutant sequences are amplified by PCR and studied further (Section III).
With constant denaturing capillary electrophoresis (CDCE), which was derived from Fischer and Lerman's DGGE, we make use of the differences in melting temperature of DNA molecules caused by a single base change (13). These melting differences translate into a lower electrophoretic mobility of mutant/wild-type heteroduplexes as compared with wild-type homoduplexes, which enables efficient separation by electrophoresis. Table I summarizes the scope and the efficiency of the approaches described above. The methods are treated as single steps of mutant purification, which lets us compare them in terms of their ability to enrich mutants in a mutant/wild-type mixture. Note that for each base-pair there exist several formal possibilities for a mutation: three substitutions, a deletion, or an insertion of any number of base-pairs 5′ to the base-pair in question. For the purpose of discussion, we thus consider five formal measurable mutations as possible for each base-pair.
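The "number of possible mutations screened" column of Table I follows directly from this five-per-base-pair convention. The short sketch below enumerates the formal mutation classes for a target of a given length. GAATTC (the EcoRI site) is only an arbitrary example of a six-cutter site, not necessarily one used in the authors' assay, and the function name is hypothetical.

```python
BASES = "ACGT"


def formal_mutation_classes(target):
    """List five formal mutation classes per base-pair of `target`."""
    classes = []
    for pos, base in enumerate(target):
        for alt in BASES:
            if alt != base:
                classes.append(("substitution", pos, alt))  # three per base-pair
        classes.append(("deletion", pos, None))
        # An insertion of any length 5' to this base-pair is counted once.
        classes.append(("insertion_5prime", pos, None))
    return classes


print(len(formal_mutation_classes("GAATTC")))      # 6-bp restriction site -> 30
print(len(formal_mutation_classes("A" * 100)))     # 100-bp CDCE domain -> 500
print(len(formal_mutation_classes("A" * 1000)))    # 1000-bp gene -> 5000
```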


TABLE I
APPROACHES TO OBTAIN USEFUL MUTATIONAL SPECTRA FOR HUMAN TISSUES(a)

Approach                            | Number of possible mutations screened | Enrichment of mutants | Limitation
Phenotypic selection (1000-bp gene) | 5000                                  | 10⁵                   | Selectable mutants
CDCE (100-bp domain)                | 500                                   | 10¹                   | 100-150 bp
HEM (6-bp site)                     | 30                                    | 10⁵                   | Restriction sites
MAMA (one formal mutant)            | 1                                     | 10⁵                   | A single known mutant

(a) CDCE, constant denaturing capillary electrophoresis; HEM, high-efficiency restriction assay; MAMA, mismatch amplification mutation assay.

We may reasonably anticipate combining several purification steps into a complete mutant detection and/or isolation procedure. For example, a combination of phenotypic selection and DGGE yielded an enrichment of 10⁷, which, as mentioned earlier, enabled us to investigate spontaneous mutation in cell culture (4). The total enrichment of 10⁷ is the product of the enrichment of mutants by phenotypic selection (10⁵) and by DGGE (10²). Although many combinations are possible, some of them require PCR as a link between consecutive enrichment steps. Since PCR itself generates mutations (14), it should be used only when the fraction of PCR-associated mutants is less than the fraction of the original mutants. This means that the original mutants must be enriched above a certain threshold prior to PCR. Since this threshold decreases as the fidelity of the polymerase increases, the need for high-fidelity PCR is obvious. Here, it is worth pointing out the role of DGGE-like methods, including CDCE. Although not very efficient at enriching mutants, they are able to pick up almost any mutant and to display a mutational spectrum as a series of bands or peaks, which allows easy isolation and subsequent sequencing of individual mutants. Mutational spectrometry is not the only application for the approaches discussed here: detection of mutants at low fractions is also of special interest in population screening and in the early detection of cancer cells.
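Both rules in this paragraph can be written down directly: enrichments of consecutive steps multiply, and a PCR link is acceptable only while PCR-generated mutants stay below the mutants already present. This is a sketch under stated assumptions. The PCR error model is the common approximation that the mutant fraction at a base-pair is roughly the polymerase error rate per base-pair per doubling times the number of doublings, and the error rate and doubling count in the example are hypothetical values, not measurements from this work. All function names are hypothetical.

```python
from math import prod


def combined_enrichment(step_enrichments):
    """Total mutant enrichment of consecutive purification steps (their product)."""
    return prod(step_enrichments)


def pcr_mutant_fraction(error_rate, doublings):
    """Approximate PCR-induced mutant fraction at one base-pair.

    Uses the approximation: fraction ~ error rate per bp per doubling x doublings.
    """
    return error_rate * doublings


def pcr_link_is_safe(original_mutant_fraction, error_rate, doublings):
    """True when PCR-generated mutants stay below the original mutants."""
    return pcr_mutant_fraction(error_rate, doublings) < original_mutant_fraction


print(combined_enrichment([10**5, 10**2]))  # phenotypic selection x DGGE -> 10000000

# Hypothetical polymerase: 1e-6 errors per bp per doubling, 20 doublings.
print(pcr_link_is_safe(1e-3, error_rate=1e-6, doublings=20))  # enriched first -> True
print(pcr_link_is_safe(1e-6, error_rate=1e-6, doublings=20))  # not enriched -> False
```

Lowering `error_rate` (a higher-fidelity polymerase) lowers the mutant fraction that must be reached before a PCR link becomes safe, which is the threshold argument made in the text.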

II. Allele-specific PCR (ASP) ASP is a modification of the PCR (15, 16)that permits specific amplification of sequences differing by as little as a single base-pair (for a review, see 17). The method is based on the observation that a 3' mismatch(es) of a


K. KHRAPKO

ET AL.

[Figure 1 schematic: mutant (-GAA-/-CTT-) and wild-type (-GGA-/-CCT-) templates; "PCR utilizing one mismatch primer"; "Gel electrophoresis following 30-40 cycles of ASP".]

FIG. 1. Allele-specific PCR (ASP). ASP is a modification of PCR that permits specific amplification of sequences differing by as little as a single base-pair. Specificity of amplification is obtained by using a primer that, unless it is annealed to the desired allele, forms 3' mismatch(es) with the template. Shown is the double-mismatch primer used for the detection of a transforming rat H-ras allele [GGA-to-GAA transition at the 12th codon (12)]. The mutant allele, which forms one penultimate mismatch with the primer, is efficiently amplified by the polymerase; in contrast, extension from the wild-type allele is greatly hindered by the additional (ultimate) mismatch introduced by the double-mismatch primer.

primer/template complex interferes with efficient extension by DNA polymerases. Allelic specificity is obtained by designing a primer that, unless it is annealed to the desired allele, will form a 3' mismatch(es) with the template (Fig. 1). Terms synonymous with ASP in the literature include PCR amplification of specific allele (PASA; 18), amplification refractory mutation system (ARMS; 19), and MAMA (12). The procedure has been most widely used in human population studies, for instance, to identify carriers of various human genetic disorders, including α1-antitrypsin deficiency (19, 20), sickle-cell anemia (21), familial amyloidotic polyneuropathy (22, 23), and phenylketonuria (24). In each of these cases, the desired allele constitutes either 50% or 100% of the sample, and the specific allele is readily detected by nonisotopic ASP methods (17). ASP also has a number of potential applications, including short-term in vivo and in vitro mutagenicity tests, human mutational spectrometry, and

MUTATIONAL SPECTROMETRY


elucidation of genetic events involved in the early stages of tumorigenesis. To carry out such analyses, we believe that mutation assays with a sensitivity of 10⁻⁵ or better are required. The sensitivity of a mutational assay is defined here as the lowest mutation fraction measurable by the assay. Except for MAMA, however, the sensitivity limit of currently available ASP methods is around 1% (18), and this has limited the use of ASP in human population studies. MAMA is an ASP that has been optimized for sensitivity (12). By using double-mismatch primers, altering the duration and the temperature of the primer-annealing and extension step in the PCR, and modifying the solvent composition of the reaction mixture, we can reproducibly measure a specific mutation (the GGA-to-GAA mutation at codon 12 of the rat H-ras gene) at a fraction somewhat below 10⁻⁵. MAMA is limited in that it is designed to detect one specific mutation at a time. Its power, however, stems from its simplicity and speed, which make MAMA the technique of choice in cases where rapid screening of a large number of samples is desired. For example, MAMA for the G-to-A transition in the 12th codon of the rat H-ras gene allowed us to screen hundreds of organ sectors efficiently. In mutational spectrometry, once other procedures have provided the mutational spectrum, multiple MAMAs could be used as a rapid screening tool for mutational hot spots. In such cases, a simple MAMA screening may be sufficient to assess whether certain individuals have been exposed to a particular mutagen.
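Because a sensitivity of 10⁻⁵ means detecting a handful of mutant molecules among many wild-type ones, the number of template copies in a reaction bounds what can be measured. The sketch below assumes that mutant copies in an aliquot follow a Poisson distribution (our assumption, not a statement from the text); the 1.5 × 10⁶-copy example echoes the reconstruction experiment described under "Achieving Higher Sensitivity", and the function names are illustrative.

```python
import math


def expected_mutant_copies(mutant_fraction, total_copies):
    """Mean number of mutant templates in a reaction containing total_copies."""
    return mutant_fraction * total_copies


def prob_at_least(k, mean):
    """Poisson probability that an aliquot contains at least k mutant copies."""
    below_k = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


def copies_required(mutant_fraction, desired_mean_mutants):
    """Total template copies needed so that, on average, desired_mean_mutants are present."""
    return desired_mean_mutants / mutant_fraction


mean = expected_mutant_copies(1e-5, 1.5e6)
print(f"expected mutants: {mean:.1f}")
print(f"P(at least 5 mutants): {prob_at_least(5, mean):.4f}")
print(f"copies for 100 mutants at 1e-6: {copies_required(1e-6, 100):.2e}")
```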

A. Development of a Mismatch Amplification Mutation Assay (MAMA)

The overall objective here is to define PCR conditions that allow efficient amplification of the desired mutant allele but minimize amplification of the wild-type allele. Development of a MAMA involves several variables: (1) the mismatch primer sequence, (2) the temperature of the primer-extension step, (3) the time permitted for extension, and (4) the composition of the reaction mixture, particularly the concentrations of dNTP, MgCl₂, primer, and glycerol.

1. NUMBER, POSITION, AND NATURE OF MISMATCHES IN THE PRIMER

Despite a large number of reports on the efficiencies of primer extension from matched versus mismatch primers, it is still extremely difficult to predict which mismatches will be extended and which will not. This is largely because the efficiency of primer extension is strongly influenced by many PCR parameters, such as the type of DNA polymerase, the local DNA sequence context, the reaction conditions (including


the concentrations of primers, dNTPs, and MgCl₂, and the pH), and the time allowed for extension. This point is illustrated in Table I. Whereas Newton et al. (19) and Kwok et al. (25) reported reduced amplification for specific single mismatches, all of the single-mismatch primers tested by Cha et al. (12) were amplified as efficiently as the perfect-match primer. In the latter study, reduced amplification was observed only when double mismatches were introduced at the 3' end of the primer. Even then, one example was found in which a primer that created AG/CT double mismatches gave efficient amplification. Whereas Newton et al. (19) observed reduced amplification from T/T mismatches (primer/template), both Kwok et al. (25) and Cha et al. (12) reported efficiencies comparable to that of the perfect match. These differences can be attributed to several factors. As summarized in Table I, each study was carried out using a different gene, a different mutation, and different reaction conditions (including the length of the mismatch primers, the concentrations of various components of the reaction mixture, and the steps involved in the PCR cycle). One general "rule" in designing mismatch primers for MAMA is that, in order to see allele-specific amplification, the mismatches in a primer must be positioned at the 3'-ultimate or penultimate position. Single or double mismatches placed at least two positions away from the 3'-ultimate position are not as effective as mismatches at the 3' end in reducing the amplification efficiency of undesired alleles (12, 17, 19, 25). Also, for the purpose of detecting rare mutations (e.g., mutant fractions of 10⁻⁵ or less), no single mismatch has yet been found to provide sufficient specificity. For an A-to-T transversion in codon 61 of the mouse H-ras gene, Nelson et al. (26) found a limit of 10⁻⁴, whereas Sarkar et al. (18) reported a limit of 2.5 × 10⁻³ for a TA-to-AT polymorphism of the phenylalanine hydroxylase gene. With regard to the nature of the double mismatches chosen to optimize specificity, there are no general rules, except that mismatches involving T residues appear to be more permissive to extension than others (12, 25) and should be avoided. Obviously, double-mismatch primers that permit looping-out of one of the mismatches are also undesirable.
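The design rules above can be encoded as a simple screening check. This is a hypothetical helper, not a published tool: it assumes both allele sequences are written 5'→3' in the primer's own sense (the sequence a perfectly matched primer would have), numbers mismatch positions from the 3' end (1 = ultimate, 2 = penultimate), and flags only the three rules stated in the text. The example sequences are invented for illustration.

```python
def mismatch_positions_from_3prime(primer, allele_site):
    """1-based mismatch positions counted from the primer's 3' end (1 = ultimate).

    allele_site is the sequence a perfectly matched primer would have at that
    allele, written 5'->3' like the primer (an assumed convention).
    """
    primer, allele_site = primer.upper(), allele_site.upper()
    if len(primer) != len(allele_site):
        raise ValueError("primer and allele site must be aligned and equal in length")
    n = len(primer)
    return [n - i for i, (p, a) in enumerate(zip(primer, allele_site)) if p != a]


def involves_t(primer, allele_site, pos_from_3prime):
    """True if the mismatch has T on the primer or on the template strand.

    A template T sits opposite an A in the primer-sense allele sequence.
    """
    i = len(primer) - pos_from_3prime
    return primer[i].upper() == "T" or allele_site[i].upper() == "A"


def check_mama_primer(primer, mutant_site, wildtype_site):
    """Return warnings for a candidate MAMA primer; an empty list means no rule was violated."""
    warnings = []
    mut = mismatch_positions_from_3prime(primer, mutant_site)
    wt = mismatch_positions_from_3prime(primer, wildtype_site)
    if any(pos > 2 for pos in mut + wt):
        warnings.append("mismatch lies beyond the 3'-penultimate position")
    if len(wt) < 2:
        warnings.append("fewer than two mismatches against the wild-type allele")
    t_hits = [p for p in wt if involves_t(primer, wildtype_site, p)]
    t_hits += [p for p in mut if involves_t(primer, mutant_site, p)]
    if t_hits:
        warnings.append(f"mismatch involves a T residue at 3' position(s) {sorted(set(t_hits))}")
    return warnings


# Hypothetical 11-mer region: the wild type ends ...GG, the mutant ends ...GA.
wild = "GTTGGAGCTGG"
mutant = "GTTGGAGCTGA"
primer = "GTTGGAGCTCA"  # penultimate C: one mismatch with the mutant, two with wild type
print(check_mama_primer(primer, mutant, wild))
```

Passing these checks does not guarantee a working MAMA; as the text stresses, extension efficiencies still have to be measured under the actual reaction conditions.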

2. REACTION CONDITIONS

Reaction conditions play a critical role in determining whether or not a particular mismatch will be extended. For example, Kwok et al. (25) noted that a G/G mismatch was extended as well as a perfect-match primer/template when the dNTP concentration in the reaction mixture was 800 μM, but not at 50 μM. Each component of the reaction mixture must be optimized so that it allows efficient amplification of the desired allele while minimizing amplification of the wild type.


This overall objective of MAMA is similar to that of high-fidelity PCR in that both require a high degree of specificity. In general, high-fidelity PCR conditions (e.g., reduced extension of mismatch primers) are achieved by lowering the pH and the concentrations of dNTP and primers (27, 28). Several researchers found that lowering the MgCl₂ concentration also reduces the extension of mismatch primers in ASP (18, 27). On the other hand, for Taq and Vent polymerases, Ling et al. (28) reported that when both the pH and the dNTP concentration are lowered, increasing the MgCl₂ concentration improves the fidelity of PCR. It is also important to note that, in certain cases, increasing the fidelity of PCR reduces its efficiency (28). In fact, for the MAMA of the G-to-A transition at the 12th codon of the rat H-ras gene, no detectable amplification product was generated from either the wild-type or the mutant allele when the pH was below 7.0 (8.4 in the original buffer) or the MgCl₂ concentration was below 0.5 mM (2.25 mM in the original buffer).

3. TEMPERATURE AND DURATION OF THE ANNEALING AND EXTENSION STEPS

In general, shorter extension periods favor high-fidelity PCR. For this reason, in MAMA we eliminated the separate extension step, which greatly decreased extension of the double-mismatch primer. To find the optimal annealing temperature, we tested temperatures ranging from 50°C to 66°C. No amplification product was observed from either the mutant allele (the desired allele) or the wild-type allele when the temperature was above 63°C. Below this temperature, efficient amplification (65-70% per cycle) of the mutant allele was observed. Amplification from the wild-type allele was lowest when the annealing step was carried out at 50°C, but at that temperature a few aberrant bands also appeared. Thus, it appears that as the temperature is lowered, the double-mismatch primers can hybridize to other regions of the DNA. The optimal temperature and extension period must be determined individually for each MAMA developed. The use of capillary PCR permits tighter control over time/temperature parameters and could increase the sensitivity of MAMA.

We do not know whether all single base-pair alterations in genomic DNA are amenable to MAMA (i.e., with a sensitivity of 10⁻⁵). Thus far, three different loci have been subjected to MAMA optimization: the GC-to-AT transition at the 12th codon of the rat H-ras gene, the TA-to-AT transversion in codon 664 of the rat c-neu gene, and the AT-to-TA transversion at codon 61 of the rat H-ras gene. MAMA for the first two achieved a sensitivity of 10⁻⁵; the current sensitivity for the third mutation is about 5 × 10⁻⁵. Even in these few cases, the optimal MAMA conditions for each sequence were


significantly different. For example, for the A-to-T transversion of the c-neu gene, 15 μM of each dNTP (versus 37 μM for the G-to-A mutation of the H-ras gene) and 5% glycerol (versus 10%) were used. In general, obtaining a sensitivity of 10⁻⁵ by MAMA required the following three features: (i) introduction of double mismatches; (ii) reduction of the extension time; and (iii) addition of glycerol. Our experience indicates that simply implementing these three conditions can achieve a sensitivity of 10⁻² to 10⁻³. However, to increase the sensitivity to 10⁻⁵, optimization of the various MAMA parameters using a matrix approach is required.
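A matrix (full-factorial) optimization of the kind mentioned above could be organized as follows. This is only a sketch: the factor levels are hypothetical, loosely echoing values quoted in this section, and ranking by selectivity (mutant-template signal divided by wild-type-template signal, measured by the user for each condition) is an assumed criterion rather than the authors' stated procedure.

```python
from itertools import product

# Hypothetical factor levels; several echo values mentioned in the text.
LEVELS = {
    "dntp_uM": [15, 37],
    "glycerol_pct": [0, 5, 10],
    "mgcl2_mM": [1.0, 2.25],
    "anneal_C": [55, 58, 61],
}


def condition_matrix(levels):
    """Every combination of factor levels, as a list of dicts (full factorial design)."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]


def rank_conditions(results, min_mutant_signal):
    """Rank measured conditions by selectivity (mutant signal / wild-type signal).

    results: iterable of (conditions, mutant_signal, wildtype_signal) tuples.
    Conditions whose mutant signal is below min_mutant_signal are discarded, since
    high selectivity is useless if the desired allele is barely amplified.
    """
    usable = [r for r in results if r[1] >= min_mutant_signal]
    return sorted(usable, key=lambda r: r[1] / max(r[2], 1e-12), reverse=True)


matrix = condition_matrix(LEVELS)
print(len(matrix), "conditions to test")
print(matrix[0])
```

The minimum-signal filter reflects the observation in the text that conditions strict enough to suppress the wild type can also abolish amplification of the mutant allele.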

B. Achieving Higher Sensitivity

The current limit of sensitivity of MAMA is 10⁻⁵. This is based on the observation that 15 copies of a mutant allele mixed with 1.5 × 10⁶ copies of a wild-type allele gave rise to a signal that was reproducibly discernible from that of the 1.5 × 10⁶ copies of wild-type DNA alone. The limit of sensitivity stems from the fact that, despite the double mismatches, a small fraction of the wild-type allele is still extended by the polymerase. It is not currently known precisely how often such double-mismatch extension takes place. Our experience with the GGA-to-GAA transition at the 12th codon of the rat H-ras gene indicates that the number of copies generated from 1.5 × 10⁶ copies of the wild-type allele is slightly lower than the number generated from 15 copies of the mutant allele (i.e., 10⁻⁵). Several approaches could reduce the background signal from the wild-type DNA. One can eliminate wild-type DNA by first running a preparative DGGE. Since at least about 99% of the wild-type DNA can be eliminated by DGGE, this would in turn reduce the background signal by a factor of 100. An alternative way of removing the wild type, in some cases, is to use a specific restriction enzyme that cleaves the wild-type but not the mutant DNA. In this way (see Section III), it may be possible to degrade over 99.99% of the wild-type DNA. In addition to these methods, in which the source of background noise (i.e., the wild-type allele) is physically removed from the sample, there are other means of reducing the background, for example, by making extension of the double-mismatch primers more difficult. Taq DNA polymerase is not an enzyme of choice for high-fidelity PCR due to its relatively high error rate (14). Taq was used in our initial studies because at the time it was the only thermostable enzyme that was also exonuclease negative (exo-).
It was reasoned that exonuclease-positive (exo+) DNA polymerases would correct the terminal mismatch and extend the corrected primer, thereby eliminating the specificity conferred by the 3' mismatch(es). More recently, additional exo- thermostable enzymes have been identified, including those derived from Pfu and Vent DNA polymerases. Although the fidelity of these additional exo- thermostable enzymes remains to be determined, they could easily be tested by MAMA to see whether they reduce the background noise from the wild-type allele.

Finally, one could combine the principle of differential oligonucleotide hybridization (DOH) with MAMA to reduce the background signal (a suggestion by H. Zarbl). DOH is a technique that has been used extensively to identify oncogenic mutations in tumors. In a typical assay, a short synthetic oligonucleotide (10-20 bases long) encompassing the region of the mutation is used to probe for a specific point-mutation. With optimized hybridization conditions, the technique can be used successfully to characterize single point-mutations (29). The principle behind Fig. 3 is to design a synthetic oligonucleotide ("blocker") that will hybridize to the wild-type but not the mutant allele. By occupying the wild-type DNA, the blocker will presumably prevent the wild-type DNA from annealing to the MAMA mismatch primer. To ensure that the blocker is not extended by Taq polymerase, its 3' end will be synthesized with a dideoxynucleotide or another synthetic nucleotide that prevents chain elongation.

In summary, an ASP in the form of MAMA has been shown to permit measurement of single base-pair mutants at a fraction of 10⁻⁵. A similar sensitivity has been found for a single base-pair deletion in the human hprt gene (R. Okinaka, personal communication). It seems probable that MAMA sensitivity can be improved to measure mutant fractions down to 10⁻⁸.
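The background arithmetic of this section can be made explicit. The sketch assumes, as a simplification, that the wild-type background signal scales linearly with the number of wild-type copies left in the sample and that mutants are not lost during the removal steps; the removal efficiencies (99% for preparative DGGE, 99.99% for restriction digestion) are those quoted in the text, and the function names are illustrative.

```python
def remaining_copies(copies, *removal_efficiencies):
    """Wild-type copies left after successive removal steps (e.g., 0.99 = 99% removed)."""
    for eff in removal_efficiencies:
        copies *= 1.0 - eff
    return copies


def improved_limit(current_limit, *removal_efficiencies):
    """Detectable mutant fraction after removal steps, if background scales with wild-type copies."""
    return current_limit * remaining_copies(1.0, *removal_efficiencies)


wild_type, mutants = 1.5e6, 15
print(f"reconstruction fraction: {mutants / wild_type:.1e}")
print(f"wild type left after DGGE: {remaining_copies(wild_type, 0.99):.3g}")
print(f"limit after DGGE: {improved_limit(1e-5, 0.99):.1e}")
print(f"limit after restriction digestion: {improved_limit(1e-5, 0.9999):.1e}")
```

In practice the gain would be smaller than this ideal product if some mutants are lost along with the wild type or if background does not scale linearly.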

III. High-efficiency Restriction Assay (HERA)

A. Introduction

HERA detects DNA point-mutations located in restriction recognition sites by eliminating wild-type DNA copies through high-efficiency restriction digestion. Restriction endonucleases are used to digest cellular DNA and eliminate wild-type copies of the sequence studied, while mutants in the restriction recognition sites remain undigested. With high-fidelity DNA amplification, the mutants can then be amplified and subsequently separated, enumerated, and isolated by DGGE or another suitable separation technique. Several other groups have also tried, with varied success, to use restriction endonucleases to eliminate wild-type DNA. Processes based on restriction digestion have been used to detect point-mutations in oncogenes (30-35).


However, a general design problem in these efforts is that PCR is used to amplify target sequences that are mixed with too many residual undigested wild-type DNA copies (30-34, 36). In these experiments, DNA amplification before high-efficiency restriction digestion would be expected to create PCR-induced mutants at a level that would obscure the expected in vivo mutations at fractions of 10⁻⁶ to 10⁻⁷ (14). Another problem with many of the reported experiments is an insufficient initial mutant copy number to yield useful data (30, 34, 35). To achieve a 95% confidence limit of 20%, 100 or more mutants must be present in any sample assayed. Reconstruction experiments, such as mixing "one copy" of a mutant with 10⁵ copies of wild-type DNA, are not a suitable means of demonstrating a mutation detection sensitivity of 10⁻⁵; for such a demonstration, 100 mutants should be mixed with 10⁷ copies of wild-type sequence. The HERA method has the following advantages and characteristics: (i) The sensitivity of mutation detection by this method is about 10⁻⁷; thus, it should be possible to measure human somatic mutations using HERA. (ii) HERA can screen 4- to 8-bp DNA sequences at a time for any point-mutation within them; it can therefore be used, in a limited way, to establish mutational spectra. (iii) HERA measures mutations within palindromic sequences, which show a higher proportion of mutational hot spots than random sequences (37).
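The requirement of 100 or more mutants can be checked with a Poisson counting argument. Reading the text's "95% confidence limit of 20%" as a relative half-width of about 20% is our interpretation; the sketch uses the normal approximation to the Poisson distribution (relative half-width ≈ 1.96/√n), and the helper names are illustrative.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile


def relative_halfwidth_95(n_mutants):
    """Approximate relative half-width of a 95% CI on a Poisson count of n mutants."""
    return Z_95 / math.sqrt(n_mutants)


def mutants_needed(relative_halfwidth):
    """Smallest count whose approximate 95% relative half-width is within the target."""
    return math.ceil((Z_95 / relative_halfwidth) ** 2)


def wild_type_copies_for_reconstruction(n_mutants, mutant_fraction):
    """Wild-type copies to mix with n_mutants to simulate a given mutant fraction."""
    return n_mutants / mutant_fraction


print(f"100 mutants -> +/-{relative_halfwidth_95(100):.1%}")
print(f"mutants for +/-20%: {mutants_needed(0.20)}")
print(f"wild-type copies for 100 mutants at 1e-5: {wild_type_copies_for_reconstruction(100, 1e-5):.0e}")
```

The approximation gives roughly 97 mutants for ±20%, in line with the round figure of 100 used in the text, and reproduces the 10⁷ wild-type copies recommended for a 10⁻⁵ reconstruction.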

B. Methodology

The major steps of the HERA procedure are shown in Fig. 2.

1. CELL/DNA ISOLATION

DNA should be isolated from tissue or cells without exposure to agents that may react with DNA or cause DNA adducts. Many uncontrolled factors, especially heating and UV light from ordinary fluorescent lamps, generate DNA adducts that can be clearly separated and distinguished from wild-type DNA on DGGE (8).

2. ELIMINATION OF HETEROGENEOUS DNA WITH REGARD TO ENDONUCLEOLYTIC DIGESTION

A DNA fragment several hundred bases in length carrying the target sequence is cut from cellular DNA at two restriction recognition sites flanking the region of interest, and purified on a polyacrylamide gel. To eliminate wild-type DNA by restriction digestion, the digestion efficiency must be high enough that very few copies of wild-type DNA remain undigested. However, the efficiency of restriction digestion is limited by heterogeneity in the preparation of the


[Figure 2 schematic; final analysis step labeled DGGE/CDCE.]

FIG. 2. Illustration of HERA. A DNA population of 10⁹ copies carrying hot-spot mutations at a mutational fraction (MF) of 10⁻⁷ is cut from a cellular DNA preparation and digested by a restriction endonuclease. A 10⁻⁵ fraction of the wild-type DNA remains undigested, so the MF of these mutants is increased to 10⁻². The fraction of mutants generated in PCR (MF_PCR) is equal to the length of the target (b), times the error rate of the DNA polymerase used in PCR (f; 2 × 10⁻⁴/bp/duplication for Taq DNA polymerase), times the number of duplications made in PCR (d), divided by 2; d is about 26 to produce a maximum of 10¹² copies of DNA from a minimum of 10⁴ wild-type DNA copies. Thus, MF_PCR is calculated to be 1.6 × 10⁻², which is comparable to the MF. PCR errors located outside the restriction recognition site, as well as a large portion (90%) of the residual wild-type sequences, are eliminated by another round of digestion. Reamplification with an internal primer carrying a GC clamp eliminates the non-specific amplification signals generated in the first round of PCR and enables the target sequence to be analyzed by DGGE or CDCE.

DNA, with regard to endonucleolytic digestion (Fig. 3). As shown in Fig. 3, the digestion efficiencies at the EagI site (bp 2567) and the KpnI site (bp 2574) of mitochondrial DNA (mtDNA) were both 90% when cellular DNA was digested. However, double digestion with both KpnI and EagI also left an undigested residue of 10%, instead of the 1% that would be expected for independent action of the endonucleases on homogeneous DNA (Fig. 3a). A portion of the DNA was thus determined to be indigestible, probably because it is incompletely dissolved as microprecipitates. To improve the digestion efficiency, target sequences were first cut from cellular DNA and purified on a polyacrylamide gel. Indigestible heterogeneous cellular DNA was removed by this gel-purification process. DNA thus purified and eluted from the gel can be digested to near completion. Only 10⁻⁵ or less of the wild-type DNA remains undigested, as determined by quantitative PCR (Fig. 3b).

[Figure 3 lane labels. (a) pBR322/MspI, 250 ng; pBR322/MspI, 500 ng; 10⁹ copies mtDNA, 6 × 10²-fold; 10⁹ copies mtDNA/KpnI, 10⁴-fold; 10⁹ copies mtDNA/EagI, 10⁴-fold; 10⁹ copies mtDNA/KpnI + EagI, 10⁴-fold. (b) pBR322/MspI, 250 ng; pBR322/MspI, 500 ng; 10⁹ copies mtDNA/EagI, 10⁸-fold (two lanes).]

FIG. 3. Improvement of restriction digestion efficiency. (a) Heterogeneity of cellular DNA. Cellular DNA containing 10⁹ copies of mtDNA, isolated by phenol extraction, was digested with EagI, KpnI, or EagI plus KpnI, respectively, and subsequently amplified using Taq DNA polymerase and primers 1 and 2, which are complementary to the 2457- to 2476-bp and 2613- to 2594-bp regions of mtDNA (45b); this region carries an EagI recognition site at 2567 bp and a KpnI recognition site at 2584 bp. The amplification-fold of each sample is indicated. The residual mtDNA copy number in the restriction-digested cellular DNA samples before PCR was calculated to be about 10⁸ copies; a restriction digestion efficiency of 90% was thus concluded. (b) High-efficiency EagI digestion. Cellular DNA was first digested with SphI and PvuII, whose recognition sites are located at 2436 bp and 2653 bp of mtDNA, respectively. Undigested DNA was then removed by purifying the DNA fragments on a 6% polyacrylamide gel. The portion containing DNA fragments of length 217 ± 40 bp was recovered by electroelution. EagI digestion was carried out on 10⁹ copies of these purified DNA fragments. After about 10⁸-fold amplification, ~10¹² PCR products were observed, as compared to the pBR322/MspI standard. This method indicated an undigested residue of 10⁻⁵ or less in replicate experiments with EagI digestion.
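The argument that the 10% double-digestion residue reveals an indigestible subpopulation can be made quantitative with a simple two-population model. The model is our assumption, not the authors': a fraction h of the DNA resists both enzymes completely, while the rest is cut independently by each enzyme. Solving the single-digest residues (r1, r2) and the double-digest residue (R) for h gives h = (R − r1·r2) / (1 + R − r1 − r2).

```python
def expected_double_residue(r1, r2):
    """Residue expected if two enzymes act independently on homogeneous DNA."""
    return r1 * r2


def indigestible_fraction(double_residue, r1, r2):
    """Fraction h of DNA resistant to both enzymes under a two-population model.

    Assumes r_i = h + (1 - h) * q_i and R = h + (1 - h) * q1 * q2, where q_i is the
    residue of enzyme i on the digestible subpopulation.
    """
    return (double_residue - r1 * r2) / (1.0 + double_residue - r1 - r2)


# Values from the text: each enzyme alone leaves 10%; the double digest also leaves 10%.
print(f"expected if homogeneous: {expected_double_residue(0.10, 0.10):.0%}")
print(f"inferred indigestible fraction: {indigestible_fraction(0.10, 0.10, 0.10):.0%}")
```

With the observed values, the model attributes essentially the entire 10% residue to an indigestible fraction, which is consistent with the gel-purification step removing it.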

3. HIGH-EFFICIENCY RESTRICTION DIGESTION OF DNA

This is the key step contributing to high sensitivity. Wild-type DNA is digested at the unique recognition site (i.e., the target sequence) to near completion, so that only about 10⁻⁵ of the wild-type DNA copies remain


undigested. A typical nuclear DNA hot spot that occurs at a fraction of 10⁻⁷ (Section I,D) will thus be enriched to about 10⁻² by a high-efficiency digestion step.

4. HIGH-FIDELITY DNA AMPLIFICATION

Undigested DNA, including mutants and residual undigested wild-type DNA copies, is amplified to generate 10¹² total copies. Two points should be considered at this amplification step. First, some DNA polymerases may add an extra nucleotide to the 3' end of the PCR products (38) and thereby affect their behavior in the subsequent DGGE steps (39); DNA polymerases that create blunt-ended PCR products, such as T4 DNA polymerase, are preferred in this step. Second, DNA polymerases make mistakes during amplification, and these may be mistaken for sample mutants. The PCR should therefore be optimized with respect to fidelity (28). The mutant fraction (MF) generated during PCR can be predicted by the following equation:

$$\mathrm{MF} = \frac{bfd}{2}$$

where b is the length of the target sequence (in bp), f is the error rate of the DNA polymerase (per bp per duplication), and d is the number of duplications of the sequence. If a 6-bp restriction recognition site is screened and Taq DNA polymerase is used to amplify DNA 10⁸-fold, the expected mutant fraction generated in PCR is about 1.6 × 10⁻². This is because there are 6 bp in the target restriction recognition site, f for Taq DNA polymerase is about 2 × 10⁻⁴, and amplification from 10⁴ to 10¹² copies requires about 26 duplications: 6 × 2 × 10⁻⁴ × 26/2 = 1.6 × 10⁻². Since the sample mutant fraction of 10⁻² is comparable to this PCR noise, sample mutants should be visible on a denaturing gradient gel and distinguishable from PCR noise by comparison with a simultaneous control containing PCR errors.
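The numbers used in steps 3 and 4 can be reproduced with a few lines of code. This sketch simply evaluates the enrichment by digestion and the MF = bfd/2 estimate given above; it assumes that every mutant survives digestion, that a fixed fraction of wild-type copies escapes cutting, and that the duplication count is log₂ of the fold amplification. The function names are illustrative.

```python
import math


def fraction_after_digestion(mutant_fraction, wildtype_residue):
    """Mutant fraction after digestion, assuming all mutants survive and a fixed
    fraction (wildtype_residue) of wild-type copies escapes cutting."""
    m = mutant_fraction
    return m / (m + (1.0 - m) * wildtype_residue)


def duplications(start_copies, final_copies):
    """Number of duplications needed to go from start_copies to final_copies."""
    return math.log2(final_copies / start_copies)


def pcr_mutant_fraction(b, f, d):
    """MF = b*f*d/2: target length b (bp), error rate f (per bp per duplication), d duplications."""
    return b * f * d / 2


print(f"after digestion: {fraction_after_digestion(1e-7, 1e-5):.1e}")
print(f"duplications for 1e4 -> 1e12 copies: {duplications(1e4, 1e12):.1f}")
print(f"PCR-generated MF (Taq, 6-bp site): {pcr_mutant_fraction(6, 2e-4, 26):.1e}")
```

The exact duplication count, about 26.6, is slightly above the 26 used in the text; either value gives an MF_PCR of about 1.6 × 10⁻².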

5. ELIMINATION OF MOST PCR-GENERATED MUTANT SEQUENCES AND NON-SPECIFIC AMPLIFICATION SIGNALS

The PCR product is redigested with the same restriction endonuclease as used in step 3 to eliminate the PCR errors generated outside the 6-bp target sequence in the amplified DNA fragment. In vitro DNA amplification generates PCR errors both within and outside the target sequence, all of which are detected as signals during the later separation of mutants on DGGE. Since the total length of the amplified sequence is usually about 100 bp, to facilitate separation of the mutants on DGGE, only 6% of the PCR errors will be located in the target region. Redigestion of the PCR product


eliminates most of the PCR errors that are not located in the restriction recognition site. The digested PCR product is then reamplified 100-fold with an internal primer to eliminate non-specific amplification signals. In step 4, some non-specific amplification occurs because the selected primers anneal to other regions of the genomic DNA. These sequences may represent noise in the system. PCR with an internal primer removes almost all of these non-specific amplification signals.

6. ANALYSIS OF THE MUTANTS

Since the sample mutational fraction has been raised to at least 10⁻², there are several ways to separate and enumerate these mutants. One of the most reliable methods is DGGE (40). The PCR product generated in step 5 can be attached to an artificial high-melting domain, and the purified PCR product can be run on DGGE. Since the detection sensitivity of DGGE is around 10⁻² to 10⁻³ (8), it is fully applicable in this case. CDCE (13) and single-strand conformation polymorphism (41) may be alternative choices.

C. Application of High-efficiency Restriction Assay (HERA) to Mitochondrial Mutational Assay

HERA has recently been used to measure mutations in the human mitochondrial genome (41b). mtDNA has several advantages for mutational research. There are 10³ to 10⁴ copies per cell, so smaller tissue samples yield the necessary number of mutants. mtDNA has an evolutionary rate 20 times that of nuclear single-copy genes (9) and appears to be more sensitive to chemical mutagens than nuclear chromosomal DNA. Mitochondrial mutants may also play important roles in carcinogenesis, degenerative diseases, and aging (42-44). mtDNA is a convenient target for detecting hot-spot mutations and establishing a mutational spectrum from a normal, healthy human. According to our calculation, 3 × 10⁵ T cells from peripheral blood samples should provide enough mutant copies to detect hot-spot mutations at a fraction of approximately 10⁻⁷ in a 6-bp restriction recognition site. Nuclear multi-copy sequences such as ribosomal DNA genes may also be suitable for mutational spectra studies. However, there is a difficulty that must be overcome in order to measure mtDNA mutations: interference from nuclear pseudogenes of mtDNA. mtDNA has frequently been inserted into the nuclear genome during evolution, and these insertion events now appear as a series of pseudogenes (45). When a total genomic DNA preparation is used, these pseudogenes represent "noise" in mtDNA mutational assays. Pseudogenes present at one to many copies per cell represent mutant fractions of 10⁻² to 10⁻⁴ relative to


[Figure 4: DGGE gel image, lanes 1-12.]

FIG. 4. DGGE display of mtDNA mutants in tissue samples. Cellular DNA from one lung sample, two normal colon samples, and two colon tumor samples was examined for mtDNA mutations in the EagI and KpnI sites. A chromium-treated human lymphoblast line, TK6, which carried no detectable mutations at the examined EagI and KpnI sites (data not shown), was used as a concurrent control. Normal colon sample 2 has a clear signal not found in the other tissue or cell samples (arrow b). All normal tissue and tumor samples show a band not seen in the cell culture sample (arrow a).

wild-type mtDNA, that is, one to 100 copies of a particular nuclear mitochondrial pseudogene per cell. We first chose the EagI site (2567 bp) and the KpnI site (2574 bp) in the 16S ribosomal RNA coding sequence of human mtDNA as HERA target sequences. A series of mtDNA pseudogenes homologous to the 2457- to 2594-bp region of mtDNA, at fractions of 10⁻² to 10⁻³ relative to wild-type mtDNA, were found and sequenced (45b). Based on the known nuclear pseudogene sequences, a protocol was designed to eliminate all of the pseudogenes homologous to the target mtDNA sequences and to screen for rare mtDNA mutations (41b). When HERA was used to search for unselected mtDNA mutations in the


EagI site (2567 bp) and the KpnI site (2574 bp) in human tissue samples, one colon sample was found to have a hot-spot mutation at a frequency of approximately 10⁻⁶ (Fig. 4) (41b). Although more investigation is needed to standardize the HERA technique, and more restriction recognition sites are needed as target sequences for mutational spectrometry, the strategy has shown its potential to achieve direct measurement of DNA mutations in tissue.
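The copy-number reasoning in this subsection can be sketched as follows. The inputs (3 × 10⁵ cells, 10³ to 10⁴ mtDNA copies per cell, 1 to 100 pseudogene copies per cell) come from the text; the hot-spot fraction of 10⁻⁷ passed in below is only an example value, borrowed from the nuclear hot-spot fraction used in step 3, and the helper names are illustrative.

```python
def mtdna_copies(cells, copies_per_cell):
    """Total mtDNA copies in a sample of cells."""
    return cells * copies_per_cell


def expected_mutants(cells, copies_per_cell, hotspot_fraction):
    """Mean number of mutant mtDNA copies available for detection."""
    return mtdna_copies(cells, copies_per_cell) * hotspot_fraction


def pseudogene_fraction(pseudogene_copies_per_cell, mtdna_copies_per_cell):
    """Apparent 'mutant' fraction contributed by a nuclear pseudogene."""
    return pseudogene_copies_per_cell / mtdna_copies_per_cell


cells = 3e5
for per_cell in (1e3, 1e4):
    print(f"{per_cell:.0e} copies/cell: {mtdna_copies(cells, per_cell):.0e} mtDNA copies, "
          f"{expected_mutants(cells, per_cell, 1e-7):.0f} mutants at a fraction of 1e-7")
for pseudo in (1, 100):
    print(f"{pseudo} pseudogene copies vs 1e4 mtDNA copies: {pseudogene_fraction(pseudo, 1e4):.0e}")
```

The pseudogene numbers show why nuclear copies matter: even a single copy per cell can mimic a mutant at 10⁻⁴, far above the hot-spot fractions HERA aims to measure.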

IV. Methods Using Differential DNA Melting to Separate Mutants

A. Principles of Separation

In the past few years, mutational spectrometric research on DNA has been accelerated by the invention of methods based on the cooperative melting equilibria of DNA. These approaches include DGGE (46), constant denaturant gel electrophoresis (CDGE) (47), and a capillary-based variant of the latter, CDCE (13). All involve electrophoresis of DNA under partially denaturing conditions (elevated temperature and/or media containing urea and/or formamide). Under these conditions, mutants differing by only a single nucleotide can be separated as individual bands or peaks. The separation is based on the following facts. It has been shown that melting of DNA fragments is a discontinuous process (40). In fact, most naturally occurring DNA consists of well-bounded melting domains, each of which melts as a single unit at a specific temperature in a rather sharp transition. This conclusion is based on calculations following Poland's algorithm for DNA melting (48), as later modified (49). This algorithm yields the probability for any base-pair of a DNA fragment to be either in a helical or in a disordered state as a function of temperature. The parameters used in these calculations, which characterize the cooperativity of melting, the probability of loop formation, and the intrinsic stability of a base-pair as a function of its nearest neighbors, were obtained in independent experiments (50, 51, and 52, respectively). The behavior of melting domains stems primarily from two factors: high cooperativity of melting (i.e., a high probability for a base-pair to be in the same state, melted or helical, as its neighbor) and a low probability of forming melted loops (53). The results of such calculations are usually presented as "melting maps," which are plots of melting temperature against position in the DNA sequence. Melting domains show up on a melting map as horizontal portions of the plot.
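Computing a true melting map requires Poland's algorithm with nearest-neighbor stability, cooperativity, and loop parameters, which is beyond a short example. As a much cruder stand-in, and clearly not the method described above, a sliding-window GC-content profile hints at where more and less stable stretches lie, since G·C pairs are more stable than A·T pairs. The function names, window size, tolerance, and example fragment are illustrative assumptions.

```python
def gc_profile(seq, window):
    """GC fraction of each window of the given length, sliding one base at a time.

    This is only a rough proxy for local stability; it ignores nearest-neighbor
    stacking, cooperativity, and loop entropy, all of which a real melting map includes.
    """
    seq = seq.upper()
    if not 0 < window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    gc = [1 if base in "GC" else 0 for base in seq]
    running = sum(gc[:window])
    profile = [running / window]
    for i in range(window, len(seq)):
        running += gc[i] - gc[i - window]
        profile.append(running / window)
    return profile


def plateaus(profile, tolerance):
    """Split the profile into runs whose values stay within tolerance of the run's first value."""
    runs, start = [], 0
    for i in range(1, len(profile)):
        if abs(profile[i] - profile[start]) > tolerance:
            runs.append((start, i - 1))
            start = i
    runs.append((start, len(profile) - 1))
    return runs


# Hypothetical fragment: an AT-rich stretch followed by a GC-rich, clamp-like stretch.
fragment = "ATTATAATTAAATTTATA" + "GCGGCCGCGCGGCCGCGG"
profile = gc_profile(fragment, 6)
print([round(x, 2) for x in profile])
print(plateaus(profile, 0.2))
```

The flat runs at the two ends loosely resemble the horizontal portions of a melting map, but real domain boundaries and melting temperatures can only come from the full melting calculation.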


If a DNA fragment consists of two domains, one melting at a lower and the other at a higher temperature, the melting course of such a fragment would include, within a certain range of temperatures and/or denaturant concentrations, a stable intermediate, comprising the fully melted lowmelting and completely helical high-melting domain. The electrophoretic mobility of such a partially melted intermediate is inversely proportional to the exponent of the length of the melted portion and is usually only a fraction of the mobility of a completely double-stranded species (53).Apparently, the partially melted intermediate is in rapid equilibrium with the non-melted form of the DNA fragment. Hence, the apparent mobility of the fragment may be considered as a weighted average of the mobilities of its non-melted and partially melted forms at a particular temperature (13). Important for the separation of mutants is the f x t that the melting temperature of a domain is strongly affected by most base-pair changes (transitions, transversions, deletions, insertions, and mismatches) within that domain. If the change is located in the low-melting domain, the equilibrium between the partially melted intermediate and the non-melted form is shifted. Thus, within the appropriate range of temperature, the apparent mobility of the corresponding mutated fragment is changed as compared to the wild type, and the two are efficiently separated. For example, as much as 95% of base-pair substitutions may be separated from the wild type in a sample fragment of a p-globin promoter (54). Thus, an efficient separation of mutants depends on a number of requirements. The stretch of DNA to be screened for mutants should be located within the low-melting domain of a low-melting/high-meltingdomain combination with sharp domain boundaries and a sharp melting transition. 
The melting temperature of the high-melting domain should be high enough that strand dissociation is negligible (otherwise the bands decay and a higher-mobility smear of single strands forms). If such a domain combination does not occur naturally, an artificial high-melting domain, or “clamp,” can be attached to an arbitrary sequence via PCR (54). Even so, in many cases the mutants either cannot be separated from the wild type at all or are not separated well enough for the needs of mutational spectrometry. The ability of the method to detect mutations is significantly improved (both in detecting all mutations and in increasing their separation from the wild type) by converting them into heteroduplexes with the wild-type sequence (55). The improvement results from the fact that a base-pair-to-mismatch change, as a rule, destabilizes DNA much more than any base-pair-to-base-pair change. The heteroduplexes are generated simply by boiling and reannealing a sample containing a predominance of wild-type sequences over mutants. By mass action, nearly all mutant homoduplexes are converted to heteroduplexes containing one wild-type strand. This procedure is particularly well suited to mutational spectrometry, because samples usually contain a large excess of wild-type DNA.

K. KHRAPKO ET AL.
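The mass-action argument can be made concrete with a small calculation. The sketch below assumes idealized random reannealing (each strand pairs with a complementary strand drawn at random from the pool) of a sample containing a single mutant at fraction f; the function name is hypothetical.

```python
def reannealed_fractions(f):
    """Duplex composition after boiling and random reannealing.

    f is the fraction of mutant sequences (0 <= f <= 1). Under random pairing,
    wild-type homoduplexes form with probability (1 - f)**2, the two reciprocal
    heteroduplexes (e.g., GT and AC) together with 2 f (1 - f), and mutant
    homoduplexes with f**2.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("mutant fraction must be between 0 and 1")
    wt = 1.0 - f
    return {
        "wild-type homoduplex": wt * wt,
        "heteroduplexes": 2.0 * f * wt,
        "mutant homoduplex": f * f,
        # A given mutant strand finds a wild-type partner with probability 1 - f
        "mutant strands in heteroduplexes": wt,
    }


if __name__ == "__main__":
    for name, value in reannealed_fractions(1e-3).items():
        print(f"{name}: {value:.3g}")
```

With f = 10⁻³, heteroduplexes make up about 0.2% of all duplexes, mutant homoduplexes fall to about one in a million, and 99.9% of mutant strands end up paired with a wild-type strand; the smaller f is, the more complete the conversion.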

B. Comparison of DGGE, CDGE, and CDCE Approaches

1. EXPERIMENTAL SET-UP

Although the physical principles underlying the separations are similar, the experimental set-ups differ considerably between the slab-gel procedures (DGGE and CDGE) and the capillary polymer-network format (CDCE). In DGGE, a DNA fragment is run in a polyacrylamide slab gel containing an ascending gradient of denaturant (urea and formamide). The gel is submerged in electrophoresis buffer at a controlled temperature (usually around 60°C). The running time is typically 8-16 hours at 8 V/cm (56). CDGE is performed in much the same way as DGGE, except that there is no denaturant gradient. The running time (3-8 hours) depends on the resolution to be achieved and the denaturant concentration used (47). For the detection of DNA, both radioactive labeling and ethidium bromide staining have been used. CDCE (13) is a newly developed technique that combines the constant-denaturant approach with the polymer-network capillary electrophoresis format introduced for the high-resolution separation of single-stranded DNA (57). In CDCE, fluorescently labeled DNA fragments are run through a capillary 75 µm in diameter filled with a viscous, non-cross-linked linear polyacrylamide solution rather than a polyacrylamide gel. The capillary can be used many hundreds of times, while the polyacrylamide filling must be replaced after each run (a procedure of only a few minutes). The portion of the capillary where the separation takes place is enclosed in a temperature-controlled water jacket. DNA is detected at a single point where a laser beam is focused on the capillary. The laser-induced fluorescence of the labeled DNA is detected by an optical system with a photomultiplier, and the data are transmitted to a computerized data acquisition system. CDCE has several advantages over slab-gel formats. Microcapillaries enable us to increase the speed of separation about 30-fold compared with CDGE and DGGE (the usual field strength in CDCE is 250 V/cm). The speed of separation in both DGGE and CDGE is limited by heat production, which is not significant in capillaries. Typically, a capillary separation of mutants takes less than 30 minutes. Moreover, laser-induced fluorescence detection provides very high sensitivity and dynamic range, both of special importance in mutational spectrometry. In our system, it is possible to measure DNA peaks containing as few as 3 × 10⁴ and as many as 10¹¹ molecules. The miniature format itself is an advantage, since when working with the low numbers of DNA molecules required by mutational spectrometry, it is better to keep volumes as small as possible. Finally, with CDCE, fractions are collected simply by directing the material electroeluted from the anode end of the capillary into separate tubes, whereas with a slab gel one must cut out gel slices and elute the DNA from each slice.

2. SEPARATION EFFICIENCY

Examples of separations of a mixture of four sequence variants and a single-stranded DNA (ssDNA) by the three methods are shown in Fig. 5. The sequence used is an example of a well-behaved DNA fragment containing both high- and low-melting domains. The melting temperature of the wild-type low-melting domain was predicted by Lerman's algorithm to be 63°C. The sequence variants differ only at one base-pair in the low-melting domain. This base-pair is a GC in the wild type (labeled “GC”); in the variants, it was changed to AT or to the mismatches GT and AC. The comparison in Fig. 5 demonstrates that CDCE gives the best resolution of the three methods. Most likely, this advantage should be attributed to the much higher speed of separation in CDCE, which makes diffusion insignificant. In fact, the speed of separation in CDCE is so high that the resolution appears to be limited by the relatively slow kinetics of the melting-reannealing process; increasing the speed of separation even further actually sacrifices resolution (13). Considerable differences in the relative peak positions result from the different modes of separation of the three methods. In DGGE, a DNA fragment migrates until it reaches a denaturant concentration at which the low-melting domain is almost completely melted; its mobility then becomes so low that the band essentially stops. It appears, therefore, that the final positions of the bands corresponding to different sequence variants are spatially linked to specific denaturant concentrations. Note that the double-stranded DNA (dsDNA) bands in Fig. 5A are sharper than the ssDNA band, owing to so-called “focusing,” the compression of a band as its mobility decreases. The mobility of the ssDNA band does not decrease, and it passes the dsDNA bands by the time separation is complete.
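The point that a fast run leaves little time for diffusion can be quantified in its simplest form. Assuming purely diffusive band broadening (standard deviation √(2Dt)) and ignoring both DGGE focusing and the melting-kinetics limit mentioned above, the ratio of band widths depends only on the ratio of run times. The diffusion coefficient below is a hypothetical placeholder; it cancels out of the ratio.

```python
import math


def diffusive_sd(D, t):
    """Standard deviation (cm) of a band broadened only by 1-D diffusion: sqrt(2 D t).

    D is in cm^2/s and t in seconds.
    """
    return math.sqrt(2.0 * D * t)


D_HYPOTHETICAL = 1e-8  # placeholder diffusion coefficient, cm^2/s
T_SLAB = 12 * 3600     # 12 h, mid-range of the 8-16 h DGGE run time
T_CAPILLARY = 20 * 60  # 20 min, within the "less than 30 minutes" quoted for CDCE

ratio = diffusive_sd(D_HYPOTHETICAL, T_SLAB) / diffusive_sd(D_HYPOTHETICAL, T_CAPILLARY)

if __name__ == "__main__":
    # sqrt(43200 / 1200) = sqrt(36): diffusion alone makes the slab-gel band ~6 times wider
    print(f"diffusive broadening ratio, slab gel vs capillary: {ratio:.1f}")
```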
[Figure 5: panels for DGGE and CDGE (x-axis: distance from the top of the gel, cm) and CDCE (x-axis: minutes).]

FIG. 5. Comparison of separation by DGGE, CDGE, and CDCE. A 206-bp amplified, labeled human mtDNA sequence with a 112-bp low-melting and a 94-bp high-melting domain was used in our model experiments. Of two variants of this sequence, one (designated GC) was identical to the wild type, while in the other (designated AT) the wild-type GC pair 30 bp deep into the low-melting domain was artificially replaced by an AT base-pair. To prepare a sample for separation, a mixture of GC and AT homoduplexes was boiled and reannealed, which created, by cross-hybridization, a pair of heteroduplexes, designated here GT and AC according to the mismatches they bear. A single-stranded (ss) fragment was also included in the sample. DGGE and CDGE: the ³²P-labeled sample was run in slab gels according to 40 and 47, respectively, under optimal conditions for the separation of the components. CDCE: the 5'-fluorescein-labeled sample was run on a 75-µm capillary filled with 5.5% polyacrylamide in TBE buffer (89 mM Tris-borate, pH 8.4, 1 mM EDTA) at 63.5°C and 125 V/cm. One V·sec of peak area corresponds to about 10 DNA molecules. For more details, see 13.

[Figure 6: CDCE electrophoretograms recorded at different temperatures; at 31°C the GC, AT, GT, and AC variants comigrate as a single peak.]

FIG. 6. CDCE separation as a function of temperature. The same sample as in Fig. 5, except for the absence of single-stranded DNA, was run on a capillary filled with 6% polyacrylamide, 3.3 M urea, and 20% formamide in TBE buffer at 250 V/cm at the temperatures listed on each electrophoretogram. Peaks are labeled as in Fig. 5.

In contrast to DGGE, in CDGE and CDCE the conditions are constant throughout the region where separation takes place. The mobilities of the dsDNA fragments are thus constant and depend on the state of the melting equilibrium of each fragment under those conditions. This principle is illustrated in Fig. 6, which shows CDCE runs at different temperatures of the same sample as in Fig. 5, minus the single-stranded fragment. At 31°C, a single peak is observed, containing all four sequence variants in the unmelted form. A temperature of 35°C appears sufficient to partially melt the two sequence variants with the least stable low-melting domains, i.e., those containing mismatches. Fragment GT shows higher mobility than AC, because its melting equilibrium is shifted toward the unmelted form relative to the less stable AC. At 38°C, both AC and GT fragments appear to be almost entirely in the melted state, so their mobilities do not differ significantly and the corresponding peaks almost comigrate. By changing the temperature, one may selectively improve the resolution within a narrower range of stabilities of particular interest. For example, the mutant homoduplexes, which may be either more or less stable than the wild type, are better resolved at 38°C, when the wild type is in the middle of the separation range. The heteroduplexes, on the other hand, all of which are much less stable than the wild type, are resolved at 36°C, when the wild type is still almost unmelted. It appears, therefore, that given an unknown mixture of sequence variants to be separated, the only parameter one needs to know in advance is the melting temperature of the wild type, which can be roughly predicted by Lerman's algorithm (58) and further refined in test runs.
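The temperature dependence described above follows from the weighted-average picture of Section IV.A. The sketch below combines a two-state melting equilibrium for the low-melting domain with an exponential retardation of the partially melted form. All effective melting temperatures, the per-base-pair enthalpy, and the retardation length are hypothetical values chosen only to mimic the qualitative behavior described for Fig. 6; they are not fitted parameters.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

# Hypothetical effective Tm values (°C) of the low-melting domain under run conditions
VARIANT_TM = {"GC": 38.0, "AT": 37.4, "GT": 35.8, "AC": 35.2}
DOMAIN_BP = 112             # low-melting domain length of the model fragment
DH = DOMAIN_BP * 33_000.0   # assumed domain melting enthalpy, J/mol
RETARDATION_BP = 50.0       # hypothetical length scale of the exponential retardation


def fraction_melted(t_c, tm_c, dh=DH):
    """Two-state fraction of the low-melting domain in the melted state (°C inputs)."""
    x = (dh / R) * (1.0 / (tm_c + 273.15) - 1.0 / (t_c + 273.15))
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def apparent_mobility(t_c, tm_c, mu_ds=1.0):
    """Weighted average of unmelted and partially melted mobilities (relative units)."""
    theta = fraction_melted(t_c, tm_c)
    # Partially melted form: mobility falls exponentially with the melted length
    mu_partial = mu_ds * math.exp(-DOMAIN_BP / RETARDATION_BP)
    return (1.0 - theta) * mu_ds + theta * mu_partial


def mobilities_at(t_c):
    """Apparent mobility of every variant at one run temperature."""
    return {name: apparent_mobility(t_c, tm) for name, tm in VARIANT_TM.items()}


if __name__ == "__main__":
    for t_c in (31.0, 35.0, 36.0, 38.0):
        row = ", ".join(f"{k}={v:.3f}" for k, v in mobilities_at(t_c).items())
        print(f"{t_c:.0f} °C: {row}")
```

In this toy model all four variants comigrate at 31°C; at 36°C the mismatched GT and AC are well separated while GC is still essentially unmelted; and at 38°C GT and AC nearly comigrate while GC and AT separate, qualitatively matching the behavior described in the text.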

C. Detection of Low-Frequency Mutations by CDCE

The advantages of CDCE over DGGE and CDGE discussed above convinced us to choose it as the mutational spectrometry tool. The feasibility of CDCE for detecting low-frequency mutations is illustrated by a model experiment aimed at detecting mutant PCR fragments admixed to wild-type fragments at fractions as low as 10⁻⁶. The experiment was based on the idea that, although the current efficiency of CDCE in enriching mutant sequences is about 10³, two sequential CDCE purifications might provide the necessary sensitivity. This principle was illustrated earlier using consecutive DGGE separations (10). In the course of the experiment, four mixtures (mutant fractions of 10⁻⁴, 10⁻⁵, and 10⁻⁶, and a negative control) were prepared from purified GC, GT, and AC fragments by sequential dilution and subjected to two CDCE purifications, each followed by PCR amplification. Figure 7 presents the CDCE separations that characterize the mixtures at each step. For simplicity, only two of the four separations are shown for each purification step: one in which the mutants can already be detected, and one with the next lower mutant fraction, in which the mutants cannot yet be seen. Figure 7A and B show CDCE separations of the initial wild-type/mutant mixtures (mutant fractions 10⁻⁴ and 10⁻⁵, respectively). Note that the full scale of these two panels is only 1/10,000 of the wild-type peak height. This demonstrates the impressive dynamic range of CDCE, which in this case is 10⁴ within one run. Indeed, the wild-type peak in Fig. 7A contains 10⁹ copies, while the mutant peaks (each of 10⁵ copies) are still well above the background noise. Critical to such high sensitivity is the quality of the purified DNA, which should not contain any admixtures that may appear as noise peaks in the heteroduplex region of the separation.

[Figure 7: six CDCE electrophoretograms, panels A to F (x-axis: minutes).]

FIG. 7. Detection of low-frequency mutations by CDCE. Samples taken at different stages of a reconstruction experiment were run through a capillary at 200 V/cm and 63.5°C in 5.5% polyacrylamide in TBE buffer. The amount of wild-type homoduplex (GC) was kept at about 10⁹ copies per sample, which corresponds to a peak 5 V high, far off-scale. Because of slight differences between runs, the charts had to be aligned along the time axis to make the heteroduplex peaks coincide. Peaks are labeled as in Fig. 5. (A and B) Initial mixtures of purified heteroduplexes (GT and AC) and wild-type homoduplex (heteroduplex fractions of 10⁻⁴ and 10⁻⁵, respectively). (C and D) Mixtures after one CDCE/PCR cycle (heteroduplex fractions 10⁻⁵ and 10⁻⁶, respectively). X, an unidentified peak of PCR-associated noise (see text). (E and F) Mixtures after two CDCE/PCR cycles (fraction 10⁻⁶ and pure wild type, respectively).

In the first CDCE/PCR cycle of mutant purification, fractions belonging to the heteroduplex region between minutes 14 and 20 were collected, pooled, and amplified with Pfu DNA polymerase. The PCR products were subjected to a second CDCE separation, shown in Fig. 7C and D (mutant fractions 10⁻⁵ and 10⁻⁶, respectively). The full scale of these panels is now 1/10 of the wild-type peak height. Two important conclusions may be drawn from Fig. 7C and D. First, as measurements show, the mutant fraction of the 10⁻⁵ mixture was increased to more than 10⁻², i.e., more than a 10³-fold enrichment of mutant sequences. Since mutant peaks in the 10⁻⁶ mixture have not yet appeared above the background, 10⁻⁵ may be considered the detection limit of a single CDCE/PCR procedure. The enrichment at this step is limited by carryover of wild-type sequences into those regions of the separation where only mutants are supposed to be. Our preliminary observations (8) indicate that the carryover consists of two kinds of DNA molecules. Some fall behind the main peak for nonspecific reasons, such as adsorption. Others may bear a chemical modification that destabilizes their low-melting domains. Second, the background (relative to the wild-type peak) in PCR-amplified samples is at least 100-fold higher than in the initial mixtures of purified DNA fragments. Hence, PCR generates some fraction of “modified” DNA molecules that show up in the heteroduplex region of the separation. Some of these molecules may be the well-known true PCR-associated mutants (14) that result from polymerase errors. However, some are definitely of different origin, for example, peak X in Fig. 7C and D, which disappears in the next cycle (cf. Fig. 7E and F). The second cycle of purification was identical to the first. CDCE separations of the resulting PCR products are shown in Fig. 7E and F, the full scale being 1/20 of the wild-type peak height. The enrichment of mutants in this cycle is only about 25-fold, which is nevertheless enough to detect mutants originally present at a fraction of 10⁻⁶. The reason for such low enrichment is apparently the aforementioned PCR-associated noise, which coelutes with the mutant peaks and is amplified along with the original mutants in subsequent PCR cycles.
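The bookkeeping behind these numbers can be checked with a few lines of arithmetic. The sketch below treats each CDCE/PCR cycle as multiplying the mutant-to-wild-type ratio by a fold-enrichment factor (an idealization; as noted above, PCR noise limits the second cycle) and uses the approximate factors reported in the text, about 10³ and then about 25. The function names are hypothetical.

```python
def mutant_copies(wild_type_copies, mutant_fraction):
    """Mutant copies accompanying a given wild-type load at a given mutant fraction."""
    return wild_type_copies * mutant_fraction


def fraction_after_cycles(initial_fraction, fold_enrichments):
    """Mutant fraction after successive cycles, each multiplying the mutant:wild-type odds."""
    odds = initial_fraction / (1.0 - initial_fraction)
    for fold in fold_enrichments:
        odds *= fold
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Fig. 7A: ~10^9 wild-type copies at a mutant fraction of 10^-4 per mutant peak
    print(f"copies per mutant peak: {mutant_copies(1e9, 1e-4):.0e}")
    # A 10^-6 mixture after one (~10^3-fold) and two (~25-fold more) cycles
    print(f"after one cycle: {fraction_after_cycles(1e-6, [1e3]):.1e}")
    print(f"after two cycles: {fraction_after_cycles(1e-6, [1e3, 25]):.1e}")
```

A mixture starting at 10⁻⁶ reaches roughly 10⁻³ after one cycle and a few percent after two, which is why mutants originally at 10⁻⁶ become visible in Fig. 7E even though the second cycle enriched only about 25-fold.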

D. Conclusion

The principles of mutant separation by electrophoresis of cooperatively melting DNA molecules under partially denaturing conditions have been used to develop a new separation approach, CDCE. CDCE has been shown to have several important advantages that make it the technique of choice for mutational spectrometry: it is very rapid and offers high resolution and a high dynamic range. Combining two consecutive CDCE separations with intermediate PCR has provided a sensitivity of 10⁻⁶, which may be enough to detect mitochondrial mutations in human tissues. However, this result was achieved in a model system, and it remains to be confirmed that such sensitivity can be reproduced with cellular DNA.


ACKNOWLEDGMENTS

We gratefully acknowledge John S. Hanekamp for communicating results and ideas prior to publication, Hilary Coller for critical reading of the manuscript, and Cindy Flannery for help in manuscript preparation.

REFERENCES

1. S. Benzer and E. Freese, PNAS 44, 112 (1958).
2. C. Coulondre and J. H. Miller, JMB 117, 577 (1977).
3. P. Keohavong and W. G. Thilly, PNAS 89, 4623 (1992).
4. A. R. Oller and W. G. Thilly, JMB 228, 813 (1992).
5. A. A. Morley, K. J. Trainor, R. Seshadri and R. C. Ryall, Nature 303, 155 (1983).
6. R. J. Albertini, Mutat. Res. 150, 411 (1985).
7. I. Kälin, S. Shephard and U. Candrian, Mutat. Res. 283, 119 (1992).
8. J. S. Hanekamp, Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, 1993.
9. W. M. Brown, M. George, Jr., and A. C. Wilson, PNAS 76, 1967 (1979).
10. W. G. Thilly and P. Keohavong, U.S. Patent 5,045,450 (1991).
11. A. Kat, Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, 1993.
12. R. Cha, H. Zarbl, P. Keohavong and W. G. Thilly, PCR Methods Appl. 2, 14 (1992).
13. K. Khrapko, J. S. Hanekamp, W. G. Thilly, A. Belenkii, F. Foret and B. L. Karger, NARes 22, 364 (1994).
14. P. Keohavong and W. G. Thilly, PNAS 86, 9253 (1989).
15. K. Kleppe, E. Ohtsuka, R. Kleppe, I. Molineux and H. G. Khorana, JMB 56, 341 (1971).
16. R. K. Saiki, S. Scharf, F. Faloona, K. B. Mullis, G. T. Horn, H. A. Erlich and N. Arnheim, Science 230, 1350 (1985).
17. C. D. K. Bottema and S. S. Sommer, Mutat. Res. 288, 93 (1993).
18. G. Sarkar, J. Cassady, C. Bottema and S. Sommer, Anal. Biochem. 186, 64 (1990).
19. C. R. Newton, A. Graham, L. E. Heptinstall, S. J. Powell, C. Summers, N. Kalsheker, J. C. Smith and A. F. Markham, NARes 17, 2503 (1989).
20. H. Okayama, D. T. Curiel, M. L. Brantly, M. D. Holmes and R. G. Crystal, J. Lab. Clin. Med. 114, 105 (1989).
21. D. Y. Wu, L. Ugozzoli, B. K. Pal and R. B. Wallace, PNAS 86, 2757 (1989).
22. W. C. Nichols, J. J. Liepnieks, V. A. McKusick and M. D. Benson, Genomics 5, 535 (1989).
23. S. Li, J. L. Sobell and S. S. Sommer, Am. J. Hum. Genet. 50, 29 (1992).
24. S. S. Sommer, J. D. Cassady, J. L. Sobell and C. D. K. Bottema, Mayo Clin. Proc. 64, 1361 (1989).
25. S. Kwok, D. E. Kellogg, N. McKinney, D. Spasic, L. Goda and J. J. Sninsky, NARes 18, 999 (1990).
26. M. A. Nelson, B. W. Futscher, T. Kinsella, J. Wymer and G. T. Bowden, PNAS 89, 6398 (1992).
27. K. A. Eckert and T. A. Kunkel, NARes 18, 3739 (1990).
28. L. L. Ling, P. Keohavong, C. Dias and W. G. Thilly, PCR Methods Appl. 1, 63 (1991).
29. B. J. Conner, A. A. Reyes, C. Morin, K. Itakura, R. L. Teplitz and R. B. Wallace, PNAS 80, 278 (1983).
30. E. Felley-Bosco, C. Pourzand, J. Zijlstra, P. Amstad and P. Cerutti, NARes 19, 2913 (1991).
31. R. Kumar and M. Barbacid, Oncogene 3, 647 (1988).
32. R. Kumar, S. Sukumar and M. Barbacid, Science 248, 1101 (1990).
33. S. M. Kahn, W. Jiang, T. A. Culbertson, I. B. Weinstein, G. M. Williams, N. Tomita and Z. Ronai, Oncogene 6, 1079 (1991).
34. M. S. Sandy, S. M. Chiocca and P. A. Cerutti, PNAS 89, 890 (1992).
35. S.-J. Lu and M. C. Archer, PNAS 89, 1001 (1992).
36. A. Haliassos, J. C. Chomel, L. Tesson, M. Bwdis, J. Kruh, J. C. Kaplan and A. Kitzis, NARes 17, 3606 (1988).
37. G. G. Hillebrand and K. L. Beattie, JBC 260, 3116 (1985).
38. G. Hu, DNA Cell Biol. 12, 763 (1993).
39. P. Pfeiffer and G. Hu, in “Denaturant Gradient Gel Electrophoresis: A Laboratory Manual” (L. Lerman, ed.), in press, 1994.
40. S. G. Fischer and L. S. Lerman, PNAS 80, 1579 (1983).
41. K. Hayashi, PCR Methods Appl. 1, 34 (1991).
41b. G. Hu, H. Coller, X. Li and W. G. Thilly, in preparation.
42. B. Bandy and A. J. Davison, Free Radical Biol. Med. 8, 523 (1990).
43. D. C. Wallace, Science 256, 628 (1992).
44. J. W. Shay and H. Werbin, Mutat. Res. 186, 149 (1987).
45. T. Tsuzuki, H. Nomiyama, C. Setoyama, S. Maeda and K. Shimada, Gene 25, 223 (1983).
45b. G. Hu and W. G. Thilly, Gene, in press (1994).
46. S. G. Fischer and L. S. Lerman, Cell 16, 191 (1979).
47. E. Hovig, B. Smith-Sørensen, A. Brøgger and A.-L. Børresen, Mutat. Res. 262, 63 (1991).
48. D. Poland, Biopolymers 13, 1859 (1974).
49. M. Fixman and J. J. Freire, Biopolymers 16, 2693 (1977).
50. B. R. Amirikyan, A. V. Vologodskii and Yu. L. Lyubchenko, NARes 9, 5469 (1981).
51. R. D. Blake and J. R. Fresco, Biopolymers 12, 775 (1973).
52. O. Gotoh and Y. Tagashira, Biopolymers 20, 1033 (1981).
53. L. S. Lerman, S. G. Fischer, I. Hurley, K. Silverstein and N. Lumelsky, Annu. Rev. Biophys. Bioeng. 13, 399 (1984).
54. R. M. Myers, S. G. Fischer, L. S. Lerman and T. Maniatis, NARes 13, 3131 (1985).
55. W. G. Thilly, Carcinogenesis 10, 511 (1985).
56. R. M. Myers, T. Maniatis and L. S. Lerman, Methods Enzymol. 155, 501 (1987).
57. A. S. Cohen, D. R. Najarian, A. Paulus, A. Guttman, J. A. Smith and B. L. Karger, PNAS 85, 9660 (1988).
58. L. S. Lerman and K. Silverstein, Methods Enzymol. 155, 482 (1987).

Polynucleotide Recognition and Degradation by Bleomycin

STEFANIE A. KANE* AND SIDNEY M. HECHT*,†

Departments of Chemistry* and Biology†
University of Virginia
Charlottesville, Virginia 22901

I. Bleomycin: Structure and Domains
II. Metal Complexes of Bleomycin
III. Chemistry of Fe(II)·Bleomycin
IV. Chemistry of DNA Degradation
V. Other Metallobleomycins
VI. Interaction of Bleomycin with DNA
VII. Cleavage of RNA Mediated by Fe(II)·Bleomycin
VIII. Strand-Scission of Altered DNA Structures Mediated by Fe(II)·Bleomycin
IX. Concluding Remarks
References

During the past two decades, substantial effort has been devoted to isolating from natural sources, and to designing, molecules that can recognize and cleave DNA. The finding that a natural product, bleomycin, can recognize specific DNA sequences and mediate the oxidative destruction of the deoxyribose moiety of DNA in a manner that ultimately leads to strand scission (1-3) has made it the prototype for the design of simple molecular systems that function as “artificial nucleases.” The potential uses of such chemical nucleases range from tools for molecular biology research to new chemotherapeutic agents. Like bleomycin, most of these systems rely on the redox chemistry of transition metals to promote nucleic acid oxidation. The first synthetic transition-metal complex shown to oxidize DNA efficiently was Cu(II)·1,10-phenanthroline (4). Later, complexes containing Fe(II)·EDTA (5) as well as chiral 4,7-diphenyl-1,10-phenanthroline metal complexes (6) showed that the strategy used by bleomycin for DNA recognition and cleavage could be applied to the design of DNA-cleaving molecules. The recent finding that bleomycin can mediate the oxidative destruction of RNA in a highly selective fashion (7-9) will probably make bleomycin a paradigm for the isolation and design of molecules that selectively recognize and degrade RNA. Such RNA-interactive agents should be useful tools for probing the complex three-dimensional structures of RNAs. Ongoing studies of the mechanism of action of bleomycin focus on the chemistry of nucleic acid degradation, DNA sequence and structure recognition, and the mode of interaction of DNA and RNA with bleomycin.

I. Bleomycin: Structure and Domains

The bleomycins are a family of glycopeptide-derived antibiotics originally isolated from a fermentation broth of Streptomyces verticillus (10). Bleomycin is used clinically, both as a single agent and in combination chemotherapy, for the treatment of several neoplasms, including squamous cell carcinomas, testicular tumors, and malignant lymphomas (11). The clinical preparation, Blenoxane, is a mixture of structurally related molecules consisting primarily of bleomycin A2 (~60%) and bleomycin B2 (~30%) (Fig. 1), together with small amounts of several other congeners. Bleomycin is believed to exert its chemotherapeutic effects, at least in part, by degrading cellular DNA. Studies using cell-free systems have shown that bleomycin-mediated DNA degradation requires a metal-ion cofactor and molecular oxygen. Bleomycin binds to DNA and mediates strand scission predominantly at G-C and G-T sequences (12, 13). The structure of bleomycin is quite complex and comprises three key functional domains (Fig. 1). The metal-binding domain, which consists of the β-aminoalaninamide, pyrimidine, and β-hydroxyhistidine moieties, is responsible for metal-ion coordination and oxygen activation (1-3). Recent evidence suggests that this domain also participates in DNA binding. The bithiazole ring system and the positively charged carboxy-terminal substituent compose the DNA-binding domain. The carbohydrate region may aid in membrane permeability and selective tumor-cell recognition. In addition, participation of the carbamoyl group of mannose in metal-ion coordination has been demonstrated (14). The structure of the amino acid that acts as a “linker” between the metal-binding and DNA-binding domains also affects the efficiency of DNA degradation and the degree of antitumor activity (15).

II. Metal Complexes of Bleomycin

Bleomycin can be activated for DNA degradation following complexation with a number of metal ions, including iron (16-18), cobalt (19, 20), copper (21, 22), manganese (23-25), vanadium (26), and nickel (27). Although the coordination chemistry of bleomycin has been the subject of extensive investigation, the nature of the ligands and their arrangement about the metal ion remain controversial. The X-ray crystal structure of the copper complex of P-3A (Fig. 2), a microbial product structurally related to the metal-binding domain of bleomycin, was reported in 1978 (28). This structure showed that the primary and secondary amines of the β-aminoalaninamide moiety, N-1 of the pyrimidine, N-3 of the imidazole, and the deprotonated amide of the histidine moiety are coordinated to the metal center in a square-pyramidal arrangement. It is the basis of a structure proposed for Cu(II)·bleomycin in which the carbamoyl group of mannose occupies a sixth coordination site (29). Additional investigations demonstrating participation of the carbamoyl group in metal chelation included NMR studies of Zn(II)·bleomycin (30) and of the carbon monoxide adduct of Fe(II)·bleomycin (14). In contrast, in the nitrosyl adduct of Fe(II)·bleomycin, the five ligands involved in the Cu(II)·P-3A complex and the NO group appear to occupy the six coordination sites of a distorted octahedral complex (31). Structural studies of metal-ion coordination by bleomycin have contributed to the consensus view that the pyrimidine, imidazole, and secondary amine are ligated to the metal ion. The remaining ligands and their overall arrangement remain controversial, and the relevance of these model complexes to the coordination geometries of the actual activated metal species remains unclear.

[Figure 1: structures of bleomycin A2, bleomycin B2, and bleomycin demethyl A2, with the metal-binding domain and the carbohydrate moiety labeled.]

FIG. 1. Structures of representative bleomycin-group antibiotics.

FIG. 2. X-ray crystallographically determined structure of the Cu(II) complex of P-3A.

III. Chemistry of Fe(II)·Bleomycin

The finding that Fe(II) and oxygen are essential cofactors for bleomycin-mediated DNA degradation has prompted numerous investigations aimed at defining the species ultimately responsible for DNA damage. EPR, Mössbauer, and optical spectroscopy have been used to characterize the sequence of events that produces the reactive, oxygenated Fe·bleomycin species (32, 33). The initiating event was found to be the combination of Fe(II)·bleomycin with dioxygen to generate an EPR-silent species, consistent with a ferric-superoxide structure. This ternary complex is transformed into “activated bleomycin,” a paramagnetic species with g-values of 2.26, 2.17, and 1.94. Formation of this activated species requires an additional electron, which can be provided by another Fe(II)·bleomycin or by an added reducing agent. Three possible structures for this EPR-active species were proposed (32):

Fe(III)–O, Fe(III)–OOH, and a cyclic, side-on Fe(III)–peroxide in which both oxygen atoms bind the iron

In the first structure, the oxygen is at the oxidation level of atomic oxygen, similar to the species proposed for activated cytochrome P450(34,35).In the latter two structures, the bound oxygen is at the level of peroxide. Further support for the first species comes from Mossbauer studies of Fe.bleomycin (33), which indicate that activated bleomycin contains a low-spin ferric species, with two oxidizing equivalents residing on oxygen. The EPR and Mossbauer spectra of activated bleomycin exhibit significant similarities to those of activated cytochrome P,. Activated bleomycin can also be produced by the reaction of Fe(III).bleomycin with hydrogen peroxide or ethyl hydroperoxide, as has also been noted for cytochrome P4%)and related model systems (34,35).The presence of the paramagnetic activated species generated either by Fe(II).bleomycin + 0, or Fe(III).bleomycin + HzO, coincides with DNAcleaving ability. The formation of DNA degradation products coincides kinetically with the decay of the activated species, suggesting that “activated bleoinycin is the species responsible for mediating DNA damage. Titration of activated bleomycin with 1 e- (potassium iodide) or 2 e(thio-NADH) reductants showed that activated bleomycin contains two inore oxidizing equivalents than Fe(III).bleomycin (36). These results are consistent with the proposal that activated bleomycin is best represented as a perferryl (FeV=O) species. Supporting this view is the ability of bleomycin to mediate the oxygenation and oxidation of small-molecule substrates, similar to transformations mediated by activated cytochrome P,, and related model systems (34, 35). Activation of Fe(III).bleomycinwith iodosobenzene or periodate produces species that effect oxygen transfer to cis-stilbene and styrene to produce the respective epoxides and other oxygenated products (37-40) (Fig. 3). Olefin oxidation can also be achieved with Fe(II).bleoinycin + 0, in the presence of a suitable reducing agent. 
Oxidation of cis-stilbene affords predominantly the cis-epoxide, while trans-stilbene is a poor substrate for Fe.bleomycin. This stereoselectivity of olefin oxidation is similar to that observed for activated cytochrome P4%)model systems (41, 42). In addition to olefin epoxidation, Fe.bleomycin also mediates the N-dealkylation of N,N-dimethylaniline (38) as well as the hydroxylation of naphthalene to both 1-naphthol and %naphthol (43)and of p-deuterioanisole to p-inethoxyphenol with a concomitant NIH shift (38) (Fig. 3). The formation of a high-dent metal-oxo species from Fe(II).bleomycin

318

STEFANIE A. KANE AND SIDNEY M. HECHT

P

C6HS

+

CHO C6H5

D OH FIG. 3. Exainples of oxidation and oxygenation of sniall inolecules by Fe(II).bleomycin.

+

+

+ 0, e- or from Fe(III).bleomycin H202 would require scission of the Mbond. Homolytic cleavage would produce a ferryl species, while heterolytic scission would yield a perferryl species (Fig. 4). In order to characterize the less oxidized ferryl bleomycin species, bleoinycin.Fe’”=O, the

319

POLYNUCLEOTIDES AND BLEOMYCIN

bleomycin*Fe"+ O2

1

8-, H

+

bteomycin*Fe"l-OOH homolytc bleomycin.Fe"=O + *OH

bleomycin*Fev10 + -OH

FIG 4. Two possible modes of M Imnd scission of the Fe(III),bleomycin-~und peroxide.

reaction product of Fe(III)·bleomycin with the alkyl hydroperoxide 10-hydroperoxy-8,12-octadecadienoic acid was used (44). The decomposition of this fatty acid peroxide can give two sets of products, depending on the mode of O–O bond scission (Fig. 5) (45). Treatment with Fe(III)·bleomycin yielded 10-oxo-8-decenoic acid as the major product, which must have been formed by homolytic cleavage of the peroxide O–O bond. This mechanism is supported by the observation that 2-octenyl radicals were formed in parallel with the production of 10-oxo-8-decenoic acid, as shown by nitroxide spin trapping (46). The species obtained from the reaction of 10-hydroperoxy-8,12-octadecadienoic acid with Fe(III)·bleomycin could effect the oxidative transformation of some small substrates, although less efficiently than the species produced by the mixture of Fe(III)·bleomycin and H2O2. However, unlike the activated species formed from Fe(III)·bleomycin plus H2O2, the species formed by activation of Fe(III)·bleomycin with the dienoic acid neither degraded the dodecanucleotide d(CGCT3A3GCG) nor mediated the hydroxylation of naphthalene. Further, this species could not oxidize iodide ion, suggesting the absence of a high-valent Fe-oxo species. These results suggest that activation of Fe(III)·bleomycin with H2O2 and with 10-hydroperoxy-8,12-octadecadienoic acid produces different chemical species. Presumably, Fe(III)·bleomycin + the dienoic acid produces bleomycin·Fe(IV)=O, a species incapable of mediating DNA degradation. These results provide additional evidence that the activated bleomycin generated from Fe(II)·bleomycin + O2 + e− or from Fe(III)·bleomycin + H2O2 contains an Fe(V)=O species, formed by heterolytic scission of the O–O bond in the Fe·bleomycin-bound peroxide intermediate. The rates of product formation resulting from degradation of d(CGCT3A3GCG) by Fe(II)·bleomycin + O2 and by Fe(III)·bleomycin + H2O2 have been measured to study the effect of the mode of activation on this rate (47). Aerobic activation of Fe(II)·bleomycin is fast, but is inhibited by DNA. With a reducing agent, activation is rapid, even in the presence of DNA. This observation presumably reflects the mechanistic requirement for an additional reducing equivalent to activate Fe·bleomycin. In the absence of a reducing agent, a bimolecular collision of two Fe(II)·bleomycins could effect activation in the presence of O2. This process would probably be inhibited if the available Fe(II)·bleomycins were bound to DNA.

FIG. 5. Homolytic versus heterolytic O–O bond scission in 10-hydroperoxy-8,12-octadecadienoic acid.
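As a bookkeeping aid for the homolytic/heterolytic distinction drawn above (Fig. 4), the short Python sketch below assigns formal iron oxidation states to the products of O–O scission in bleomycin·Fe(III)–OOH and counts the oxidizing equivalents stored above the Fe(III) resting state. The function names are hypothetical and the model is purely formal; it says nothing about the actual electronic structure of activated bleomycin. Only the heterolytic branch reproduces the titration result that activated bleomycin holds two more oxidizing equivalents than Fe(III)·bleomycin.

```python
from __future__ import annotations

# Formal electron bookkeeping for O-O bond scission in bleomycin.Fe(III)-OOH.
# Illustrative sketch only: formal oxidation states, not electronic structure.

ROMAN = {3: "III", 4: "IV", 5: "V"}
FE_RESTING = 3  # Fe(III).bleomycin, the state left after substrate oxidation


def scission_products(mode: str) -> tuple[int, str]:
    """Return (formal Fe oxidation state, leaving group) for O-O scission."""
    if mode == "homolytic":
        # One electron stays on each oxygen: Fe-O* (formally Fe(IV)=O) + *OH.
        return 4, "*OH"
    if mode == "heterolytic":
        # Both bonding electrons leave with the distal oxygen: Fe(V)=O + OH-.
        return 5, "OH-"
    raise ValueError(f"unknown scission mode: {mode!r}")


def oxidizing_equivalents(mode: str) -> int:
    """Oxidizing equivalents held by the Fe-O unit above the Fe(III) state."""
    fe_state, _ = scission_products(mode)
    return fe_state - FE_RESTING


if __name__ == "__main__":
    for mode in ("homolytic", "heterolytic"):
        fe_state, leaving = scission_products(mode)
        print(f"{mode}: bleomycin.Fe({ROMAN[fe_state]})=O + {leaving}, "
              f"{oxidizing_equivalents(mode)} oxidizing equivalent(s) above Fe(III)")
```

Running it lists one oxidizing equivalent for the ferryl (homolytic) product and two for the perferryl (heterolytic) product.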
The reaction of Fe(III)·bleomycin with H2O2 is slow at neutral pH, but is accelerated in acid or base to rates comparable to the rate observed for Fe(II)·bleomycin + O2. Activation of Fe(III)·bleomycin with H2O2 is not inhibited by DNA, presumably reflecting the lack of a requirement for an additional reducing equivalent on the pathway to Fe·bleomycin activation. The decay of activated bleomycin formed by mixing Fe(II)·bleomycin and O2 is quite rapid, with a t1/2 of ~2 minutes at 0 °C. This value is in excellent agreement with the half-life of activated bleomycin determined previously by spectrophotometric methods (32). For both modes of activation, product release was slower than the decay of activated bleomycin. These results indicate that activated bleomycin reacts rapidly with DNA, while the release of free bases and strand-scission products occurs more slowly.

FIG. 6. Possible catalytic cycle for Fe·bleomycin. BLM, bleomycin.

The accumulated evidence for the activation and decay of Fe·bleomycin is summarized in Fig. 6. In this scheme, Fe(II)·bleomycin combines with oxygen to generate an EPR-silent species. One-electron reduction followed by protonation produces an Fe(III)-bound peroxide; heterolysis of the O–O bond then affords activated bleomycin, which could mediate DNA degradation or effect the oxidation of other substrates, ultimately producing Fe(III)·bleomycin as a consequence. One-electron reduction then regenerates Fe(II)·bleomycin, thus completing the cycle. In the absence of any added reducing agent, the reducing equivalents would presumably be provided by two additional Fe(II)·bleomycin molecules. This catalytic cycle would result in the overall 4 e− reduction of O2 to H2O; the stoichiometry of this process has been confirmed by ¹⁷O NMR spectroscopy (48). In the absence of DNA, a mixture of 20 mM Fe(II)·bleomycin and ¹⁷O2 resulted in the formation of 9.4 mM H2¹⁷O, that is, close to the theoretical yield of 10 mM. This observation is consistent with the overall reaction:

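$$4\,\mathrm{Fe(II)\cdot bleomycin} + \mathrm{O_2} + 4\,\mathrm{H^+} \longrightarrow 4\,\mathrm{Fe(III)\cdot bleomycin} + 2\,\mathrm{H_2O} \qquad [\mathrm{H_2O}/\mathrm{Fe(II)\cdot bleomycin} = 0.5]$$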

and provides evidence both for the catalytic cycle outlined in Fig. 6 and for the ability of Fe(II)·bleomycin itself to provide all of the reducing equivalents needed in support of the mechanism outlined. The catalytic cycle shown in Fig. 6 is closely analogous to the scheme proposed for cytochrome P450 (34, 35). In this context, it may be noted that the activation of Fe(III)·bleomycin by peroxide is analogous to the peroxide shunt mechanism (34, 35), by which cytochrome P450 and related model systems undergo activation. Also analogous to cytochrome P450 is the behavior of activated Fe·bleomycin in the absence of substrates. There is a rapid loss of oxidizing equivalents following activation by chemical (49) and electrochemical methods (50), with concomitant loss of the ability of the ligand to support the catalytic cycle shown in Fig. 6. In common with observations for cytochrome P450 (35, 51), the available evidence suggests that the observed loss of oxidizing equivalents results from oxidation of the metal ligand, that is, of bleomycin itself (49, 52). While the foregoing observations seem fully consistent with the catalytic cycle for Fe·bleomycin shown in Fig. 6, it is not possible at present to exclude the possibility that activated Fe·bleomycin actually contains an Fe(III)-bound peroxide. Indeed, Sam et al. (52a) have obtained evidence for a ferric peroxide complex by electrospray mass spectrometry.
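The stoichiometry quoted above can be checked with a few lines of arithmetic. The Python sketch below (constant and function names are illustrative, not taken from any published analysis) encodes the overall reaction, confirms that four Fe(II) → Fe(III) oxidations supply the four electrons needed to reduce one O2 to two H2O, and compares the reported 9.4 mM H2¹⁷O with the 10 mM expected from 20 mM Fe(II)·bleomycin.

```python
# Stoichiometry of the overall reaction given in the text:
#   4 Fe(II).bleomycin + O2 + 4 H+ -> 4 Fe(III).bleomycin + 2 H2O

FE_PER_O2 = 4         # Fe(II) centers oxidized per O2 reduced
H2O_PER_O2 = 2        # water molecules formed per O2 reduced
ELECTRONS_PER_FE = 1  # Fe(II) -> Fe(III) donates one electron
ELECTRONS_PER_O2 = 4  # O2 + 4 e- + 4 H+ -> 2 H2O


def electrons_balance() -> bool:
    """True if the Fe(II) oxidations supply exactly the electrons O2 needs."""
    return FE_PER_O2 * ELECTRONS_PER_FE == ELECTRONS_PER_O2


def theoretical_water_mM(fe_ii_mM: float) -> float:
    """Maximum H2O (mM) if every Fe(II).bleomycin is oxidized once (ratio 0.5)."""
    return fe_ii_mM * H2O_PER_O2 / FE_PER_O2


def percent_of_theory(observed_mM: float, fe_ii_mM: float) -> float:
    """Observed H2O as a percentage of the theoretical yield."""
    return 100.0 * observed_mM / theoretical_water_mM(fe_ii_mM)


if __name__ == "__main__":
    assert electrons_balance()
    print(theoretical_water_mM(20.0))              # 10.0
    print(round(percent_of_theory(9.4, 20.0), 1))  # 94.0
```

The observed yield corresponds to about 94% of theory, which is the sense in which the result is described as close to the theoretical value.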

IV. Chemistry of DNA Degradation

Early experiments indicated that DNA degradation mediated by Fe·bleomycin results in both single- and double-strand breaks as well as alkali-labile lesions; these are accompanied by the release of free nucleic acid bases and base propenals (1, 2, 53–55). Fragmentation of the deoxyribose ring occurs with concomitant production of base propenals, and produces a stoichiometric amount of DNA fragments having 3′-phosphoroglycolate termini (56). Both products require an equivalent of oxygen in addition to that required for bleomycin activation (57). Free base release does not require additional oxygen; it is accompanied by production of an oxidatively damaged sugar ring in the intact DNA strand, which undergoes strand-scission only in the presence of alkali (58). To account for these observations, a mechanistic pathway was proposed in which both free base and base propenal are derived from a common reactive intermediate (59, 60). The observation that equal amounts of base propenal and phosphoroglycolate are formed led to the proposal of the mechanism for Fe(II)·bleomycin-mediated DNA strand-scission illustrated in Fig. 7 (56). According to this scheme, abstraction of the C-4′ H from deoxyribose results in the formation of a transient C-4′ radical, which then combines with oxygen. Following reduction of the resulting peroxy radical to the hydroperoxide, scission of the C-3′–C-4′ bond of deoxyribose results from a Criegee-type rearrangement. This process yields three types of products: base propenals, oligonucleotide fragments having 5′-phosphate termini, and oligonucleotide fragments having 3′-phosphoroglycolate termini. Identification of the third type of product was accomplished by acid (56) or enzymatic hydrolysis (61) of Fe·bleomycin-treated DNA. Acid treatment released free glycolic acid. All four nucleoside 3′-(phosphoro-2″-O-glycolates) were formed from Fe(II)·bleomycin-mediated degradation of Escherichia coli DNA concomitant with DNA strand-scission (61). Moreover, the four possible base propenals were also detected, consistent with earlier observations (56).

FIG. 7. Proposed mechanism of oxidative DNA strand-scission by Fe(II)·bleomycin.

The products resulting from DNA strand-scission mediated by bleomycin have been characterized directly using a hexanucleotide, d(CGCGCG) (62), and a dodecanucleotide, d(CGCT3A3GCG) (63), as substrates for Fe(II)·bleomycin; they were consistent with the proposed mechanism. For example, treatment of d(CGCT3A3GCG) with Fe(II)·bleomycin resulted in cleavage predominantly (76–86%) at cytidine3 and cytidine11 over a wide range of conditions. The production of CGCH2COOH and 5′-dGMP was diagnostic of cleavage at cytidine3 and cytidine11, respectively. In the absence of any added reducing agent, the yield of products (i.e., DNA lesions) was never greater than half the concentration of Fe(II)·bleomycin present, consistent with previous suggestions that two Fe(II)·bleomycin molecules are required to produce a single molecule of activated bleomycin (63, 64). Comparison with authentic cis- and trans-base propenals showed conclusively that Fe(II)·bleomycin produces base propenals having exclusively the trans configuration. This indicates a C-1′–O-1′ bond scission involving an anti elimination, consistent with earlier findings that the C-2′ H is lost stereospecifically (59). Further support for this scheme comes from a study of the degradation of poly(dA-dU) specifically tritiated at various positions on the deoxyribose ring. In this study, labilization of the C-4′ H accompanied stereospecific loss of the C-2′ H, resulting in the release of base propenal (59). Moreover, when poly(dA-[4′-³H]dU) was degraded by Fe(II)·bleomycin, the ratio of uracil to uracil propenal could be varied from 0.03 to 7, depending on the oxygen concentration (60). Recently, DNA containing [4′-²H]thymidine residues was used for DNA cleavage studies, and the products were analyzed by gel electrophoresis (65). Primary kinetic isotope effects were observed for DNA strand-scission events as well as for the formation of alkali-labile lesions. Of particular interest was the fact that there were reproducible differences in the magnitude of the isotope effect at different sites, suggesting that local DNA structure influences the facility of H removal by activated Fe(II)·bleomycin. These results support a mechanism of Fe(II)·bleomycin-mediated DNA degradation involving rate-limiting abstraction of the C-4′ H, followed by an oxygen-dependent partitioning of the resulting C-4′ radical to yield two sets of products.

The scheme proposed in Fig. 7 has been the accepted mechanism for DNA strand-scission mediated by Fe(II)·bleomycin. However, a recent study using DNA containing [1′,2′,methyl-³H]dT showed that DNA strand-scission coincided with the release of ³H2O, and that both preceded the release of base propenals (66). In an investigation using the substrates poly[dA(2′-pro-R-³H)dU] and poly[dA(2′-pro-S-³H)dU], the 2′-pro-R H was lost specifically at a rate comparable to that of DNA strand-scission, both of which occurred more rapidly than base-propenal release (67). These findings are not consistent with the previously proposed mechanism (Fig. 7); therefore, an alternative mechanism for bleomycin-mediated DNA strand-scission has been proposed. As shown in Fig. 8, the new scheme posits that following C-4′ hydroperoxide formation and subsequent Criegee-type rearrangement, loss of the 2′-pro-R H affords intermediate i, which can decompose by either of two pathways involving DNA strand-scission accompanied by the loss of DNA fragments terminating in 5′-phosphate (pathway A) or 3′-phosphoroglycolate groups (pathway B). Both pathways would result in the formation of iminium ions, the hydrolysis of which releases base propenals. This alternative pathway to base-propenal formation is consistent with other findings (66) and suggests that the "long-lived" precursor to base propenal may be the enamine i.

FIG. 8. Alternative proposed mechanism for base-propenal formation by Fe(II)·bleomycin.

Fe·bleomycin-mediated release of free bases from DNA is accompanied by the formation of an alkali-labile lesion (58–60). Chemical characterization of the alkali-labile lesion as a C-4′ hydroxyapurinic acid was accomplished by the use of a self-complementary dodecanucleotide (68, 69) (Fig. 9). Formation of the alkali-labile lesion can be envisioned as resulting from hydroxylation at the C-4′ position of deoxyribose. The release of free cytosine from cytidine3 in d(CGCT3A3GCG) was accompanied by formation of an alkali-labile lesion at this site; further treatment yielded products of the type CpGp, structural characterization of which permitted the nature of the alkali-labile lesion to be deduced. For example, alkali treatment effected DNA strand-scission, producing the two diastereomeric hydroxycyclopentenones 2, the structures of which were determined by comparison with authentic synthetic standards. Alternatively, treatment of the putative intermediate 1 with hydrazine effected its conversion to pyridazine 3 in quantitative yield, a transformation that maintained the same connections between carbon atoms as in the alkali-labile lesion. Treatment of the alkali-labile lesion with aqueous n-butylamine yielded CpGp itself. Further support for the structure of the alkali-labile lesion and confirmation of the connectivity of carbon atoms was obtained by reduction of the 1,4-dihydroxy species 1 with sodium borohydride, followed by enzymatic digestion and characterization of the released deoxypentitol (70).
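The oxygen-dependent partitioning of the C-4′ radical described above, in which the uracil/uracil propenal ratio could be varied from 0.03 to 7 with oxygen concentration, can be rationalized with a minimal competition-kinetics model. The Python sketch below assumes two parallel (pseudo-)first-order fates for the radical: O2 trapping leading to strand scission and base propenal, and an O2-independent route leading to free base and the alkali-labile lesion. The rate constants `k_o2` and `k_ox` are hypothetical placeholders, not fitted values, and the model is an illustration rather than the kinetic analysis used in the cited studies.

```python
from __future__ import annotations

# Minimal competition-kinetics sketch (hypothetical rate constants) for the
# fate of the C-4' deoxyribose radical:
#   O2 trapping         -> strand scission + base propenal
#   O2-independent path -> free base + alkali-labile (C-4'-hydroxylated) lesion


def partition(o2_uM: float, k_o2: float = 1.0, k_ox: float = 50.0) -> tuple[float, float]:
    """Return (fraction to base propenal, fraction to free base).

    k_o2: assumed second-order rate constant for O2 trapping (per uM per unit time)
    k_ox: assumed first-order rate constant for the O2-independent route
    """
    if o2_uM < 0 or k_o2 < 0 or k_ox < 0:
        raise ValueError("concentrations and rate constants must be non-negative")
    r_propenal = k_o2 * o2_uM  # pseudo-first-order rate of O2 trapping
    r_free_base = k_ox
    total = r_propenal + r_free_base
    if total == 0:
        raise ValueError("no pathway is active")
    return r_propenal / total, r_free_base / total


def free_base_to_propenal(o2_uM: float, k_o2: float = 1.0, k_ox: float = 50.0) -> float:
    """Free base / base propenal ratio; analytically k_ox / (k_o2 * [O2])."""
    f_propenal, f_base = partition(o2_uM, k_o2, k_ox)
    if f_propenal == 0.0:
        return float("inf")  # without O2 every radical takes the O2-independent route
    return f_base / f_propenal


if __name__ == "__main__":
    for o2 in (5, 50, 500, 1000):
        print(f"[O2] = {o2:4d} uM -> free base / propenal = {free_base_to_propenal(o2):.2f}")
```

With these placeholder constants the ratio equals k_ox/(k_o2·[O2]), so it falls tenfold for every tenfold rise in [O2]; matching real data would require measured rate constants.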

FIG. 9. Chemical characterization of the alkali-labile lesion.

The source of oxygen incorporated at the C-4′ position of the alkali-labile lesion has been investigated by treatment of d(CGCGCG) with Fe(II)·bleomycin under conditions of limiting oxygen, which led to the formation of the alkali-labile lesion (71). Following reduction with sodium borohydride and enzymatic digestion, free deoxypentitol was released and then characterized by mass spectrometry. With ¹⁸O-labeled O2 or H2O, it was shown that the C-4′ oxygen came from solvent water, rather than from O2. To account for this observation, a mechanism was proposed that involved a second 1 e− oxidation of the C-4′ deoxyribose radical initially formed by Fe·bleomycin. This would result in the formation of a C-4′ carbocation, which could combine with solvent water to generate the alkali-labile lesion. Presumably, both oxidations could be mediated by a single molecule of activated bleomycin, although the nature of the Fe·bleomycin–oxygen species remaining after H abstraction has not been established. The validity of the mechanistic conclusions reached in this study obviously depends on the assumption that none of the oxygenated Fe·bleomycin intermediates involved in formation of the alkali-labile lesion underwent exchange of oxygen with solvent prior to completion of the oxidative transformation of DNA. In fact, studies of the oxidation of cis-stilbene by Fe(II)·bleomycin using ¹⁸O2 for bleomycin activation demonstrated that 90% of the epoxide oxygens came from the ¹⁸O2 used for Fe(II)·bleomycin activation, not from solvent water (40), suggesting that solvent exchange is slow or negligible relative to the rate of reaction. Recently, EPR spectroscopy has shown that the oxygen in activated Fe·bleomycin does not exchange with solvent water under the conditions used for DNA degradation (72).
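The ¹⁸O experiments rest on a two-source isotope mass balance: if an incorporated oxygen can come only from labeled O2 or from solvent water, and does not exchange afterward, its ¹⁸O atom fraction is a weighted average of the two sources. The Python sketch below inverts that average. The enrichment values in the example are hypothetical, chosen only to illustrate how a figure such as 90% derived from O2 is obtained; the approximate natural abundance of ¹⁸O (about 0.2%) is assumed for unlabeled water.

```python
# Two-source isotope mass balance for one incorporated oxygen atom.
# Assumes only two oxygen sources (labeled O2 and solvent water) and no
# exchange after incorporation. Enrichment values used below are hypothetical.

NATURAL_18O = 0.002  # approximate natural-abundance 18O atom fraction (~0.2 %)


def fraction_from_label(f_product: float, f_label: float,
                        f_solvent: float = NATURAL_18O) -> float:
    """Fraction of the product oxygen derived from the labeled source.

    f_product: measured 18O atom fraction in the product (e.g., an epoxide)
    f_label:   18O atom fraction of the labeled O2 used for activation
    f_solvent: 18O atom fraction of the (unlabeled) water
    """
    if f_label == f_solvent:
        raise ValueError("labeled source and solvent must differ in 18O content")
    return (f_product - f_solvent) / (f_label - f_solvent)


if __name__ == "__main__":
    # Hypothetical numbers: 95 % 18O2 and an epoxide at 85.52 % 18O.
    print(round(fraction_from_label(0.8552, 0.95), 3))  # 0.9
```

Any oxygen exchange with solvent before product formation would pull the measured fraction toward the solvent value, which is why the no-exchange assumption discussed above matters.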

V. Other Metallobleomycins

The kinetically inert Co(III)·bleomycins bind to DNA, although they mediate neither the activation of oxygen nor the oxygen-dependent cleavage of DNA. However, photoactivated Co(III)·bleomycins mediate DNA strand-scission (19, 20). Several Co(III)·bleomycin complexes are formed by the aerobic oxidation of aqueous solutions of Co(II) and bleomycin. The stable green and brown complexes contain peroxide and water as ligands, respectively (73); both complexes were initially thought to be capable of mediating DNA cleavage, but the green complex was subsequently shown to be the actual reactive species (74). Light-activated Co(III)·bleomycins exhibit the same selectivity for G-Y sequences as Fe(II)·bleomycin. However, in contrast to Fe(II)·bleomycin, DNA strand-scission mediated by Co(III)·bleomycin was insensitive to oxygen concentration and did not result in the release of base propenals. Consistent with these observations, degradation of d(CGCGT2A2CGCG) with green Co(III)·bleomycin yielded only free cytosine and alkali-labile lesions (74). These results suggested that DNA degradation mediated by Co(III)·bleomycin proceeds exclusively via C-4′ hydroxylation, which would account for both the insensitivity to oxygen concentration and the lack of base-propenal formation.

Mn·bleomycin can mediate DNA strand-scission and the oxidative transformation of small substrates (23–25). Initial observations suggested that Mn(II)·bleomycin can mediate DNA relaxation following activation in the presence of O2 (23) or H2O2 (24), but that the efficiency of cleavage is only 13% of that obtained with Fe(III)·bleomycin + H2O2 (24). Subsequent investigations established that Mn(II)·bleomycin is activated for DNA cleavage in the presence of H2O2, O2 + β-mercaptoethanol, and O2 + ascorbate (E. C. Long, S. A. Kane and S. M. Hecht, unpublished), as well as by the use of light (25). Mn(II)·bleomycin activated with H2O2 was much less efficient at mediating the relaxation of supercoiled DNA than was Mn(II)·bleomycin in combination with oxygen and β-mercaptoethanol. Analysis of the sequence-selectivity of DNA strand-scission showed that Mn(II)·bleomycin activated with ascorbate exhibits the same selectivity of cleavage as Fe(II)·bleomycin, although Mn(II)·bleomycin is much less potent as a DNA-damaging agent. In this study, DNA degradation by Mn(II)·bleomycin was shown not to be due to contaminating Fe. In addition, the mobility of the cleavage fragments produced by Mn(II)·bleomycin was the same as that of the fragments produced by Fe(II)·bleomycin, suggesting the presence of 3′-phosphoroglycolate termini (S. A. Kane and S. M. Hecht, unpublished). Mn(III)·bleomycin, in combination with the oxygen surrogate iodosobenzene, effected the conversion of cis-stilbene to cis-stilbene oxide, trans-stilbene oxide, and deoxybenzoin (23). In the same study, a mixture of (tetraphenylporphinato)Mn(III) and iodosobenzene also produced these oxidation products. Activated Mn(III)·bleomycin also mediated the oxidative transformation of styrene, cyclohexene, and norbornene.

Both Cu(I) and Cu(II) form stable 1:1 complexes with bleomycin. Moreover, the affinity of both Cu(I) and Cu(II) for bleomycin was greater than that of Fe(II) (75). Cu(I)·bleomycin binds to DNA with the same affinity as Fe(II)·bleomycin.
Cu(II)·bleomycin can be activated for DNA strand-scission by suitable reducing agents, provided that sufficient time is permitted for reduction to the Cu(I) complex (21, 22). The selectivity of DNA strand-scission mediated by Cu(I)·bleomycin differs significantly from that of Fe(II)·bleomycin (22). In addition, there is a significant variation in the extent of cleavage at the sites shared in common between the two metallobleomycins. Interestingly, the structure of the Cu(I)·bleomycin complex differs significantly from that of Fe(II)·bleomycin (24), suggesting that differences in the metal coordination geometries could contribute to the differences in DNA sequence-selectivity. A mixture of Cu(II)·bleomycin and iodosobenzene affords a species that mediates the oxidation of cis-stilbene to cis-stilbene oxide, trans-stilbene oxide, and benzaldehyde (22, 38). In contrast to activated Fe(III)·bleomycin, Cu(II)·bleomycin activated with iodosobenzene formed no deoxybenzoin.


VI. Interaction of Bleomycin with DNA

The finding that single- and double-strand DNA cleavage at specific sequences is mediated by Fe(II)·bleomycin, as well as its affinity for double-strand DNA, has prompted extensive investigation in an attempt to define the mode(s) of interaction of bleomycin with DNA. Early studies indicated that both the bithiazole and a carboxy-terminal substituent are necessary for the binding of bleomycin to DNA, although the exact nature of the interaction was unclear. Multiple modes of association of this portion of the bleomycin molecule with DNA have since been reported (vide infra). Moreover, recent studies indicate that the metal-binding domain may contribute to the affinity of bleomycin for DNA and may actually be the primary determinant of the sequence-selectivity of cleavage (vide infra). Equilibrium constants for the binding of metal-free bleomycin to DNA have been determined by fluorescence spectroscopy (76, 77) and equilibrium dialysis measurements (78, 79) to be on the order of 1–4 × 10⁵ M⁻¹. Metallobleomycins have DNA-binding constants similar to those of metal-free bleomycins. Measurement of the fluorescence quenching of bleomycin by DNA indicates that bleomycin binds to DNA by more than one mode of association (80). Only one type of binding was sensitive to the ionic strength of the medium, consistent with the interpretation that the interaction of bleomycin with DNA involves both hydrophobic and electrostatic components. Modification of the β-aminoalanine moiety at the amino-terminus of bleomycin, or removal of the positively charged substituent at the carboxy-terminus, eliminated the ionic type of fluorescence quenching. The electrostatic component of the interaction of bleomycin with DNA has also been observed during the measurement of the DNA-binding constants of a series of synthetic bithiazole derivatives (81). Two types of binding were observed, one of which was destabilized by an increase in ionic strength.
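To put the binding constants in perspective, the Python sketch below evaluates a simple 1:1 binding isotherm, θ = K[S]/(1 + K[S]), for K = 1 × 10⁵ and 4 × 10⁵ M⁻¹. It assumes a single class of independent sites present in large excess over bleomycin (so free and total site concentrations are taken as equal), which deliberately ignores the multiple binding modes described above; what counts as a binding site is also an assumption of the sketch.

```python
# Fraction of bleomycin bound for a simple 1:1 binding isotherm, assuming one
# class of independent DNA sites in large excess over drug (free ~ total sites).


def fraction_bound(k_assoc_per_M: float, sites_M: float) -> float:
    """theta = K[S] / (1 + K[S])."""
    if k_assoc_per_M < 0 or sites_M < 0:
        raise ValueError("K and site concentration must be non-negative")
    x = k_assoc_per_M * sites_M
    return x / (1.0 + x)


if __name__ == "__main__":
    for k in (1e5, 4e5):  # range of binding constants quoted in the text (M^-1)
        for sites_uM in (1, 10, 100):
            theta = fraction_bound(k, sites_uM * 1e-6)
            print(f"K = {k:.0e} M^-1, [sites] = {sites_uM:3d} uM -> {100 * theta:5.1f}% bound")
```

At 10 µM sites, for example, the model gives 50% bound for K = 1 × 10⁵ M⁻¹ and 80% bound for K = 4 × 10⁵ M⁻¹.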
Additionally, in an NMR study of the interaction of bleomycin A2 with poly(dA-dT) (82), the chemical shift of the methyl groups of the dimethylsulfonium moiety was sensitive to the pH and ionic strength of the medium, suggesting an ionic interaction between this portion of the molecule and the negatively charged DNA phosphate backbone. The hydrophobic component of the interaction of bleomycin with DNA would presumably derive from the association of the bithiazole moiety with DNA. Several studies have been carried out to determine whether the bithiazole functions as a classical intercalator. Criteria for determining intercalation include unwinding of supercoiled DNA and helix elongation (83). The ability of bleomycin (84) and several synthetic bithiazoles (85) to unwind DNA has been demonstrated using two-dimensional agarose-gel electrophoresis. Interestingly, DNA unwinding by bleomycin is enhanced by a factor of 100 when the metal-binding region is coordinated to Cu(II), demonstrating direct participation of the positively charged amino-terminus of bleomycin in drug–DNA interaction (84). Further, Cu(II)·bleomycin demethyl A2 (see Fig. 1) was found to be much less effective in DNA unwinding, demonstrating the importance of a positively charged group at the carboxy-terminus of the bleomycin molecule. In the same study, however, DNA unwinding was also observed for a steroidal diamine whose structure should preclude intercalation, indicating that DNA unwinding need not be due to intercalation. Helix elongation by bleomycin was studied by both linear dichroism (78) and viscometric methods (76). The results of the linear dichroism experiments demonstrated that each bound bleomycin molecule lengthened DNA by 3.1 Å, within the observed range of known intercalators. However, bleomycin failed to increase the solution viscosity of DNA. ¹H NMR spectroscopy has been used to investigate the interaction of bleomycin A2 with poly(dA-dT) (76). There were only minimal changes in the chemical shifts of the bithiazole proton resonances, arguing against a classical intercalative interaction. It may be noted, however, that poly(dA-dT) is not a good substrate for cleavage by Fe·bleomycin. In a similar study, the binding of synthetic bithiazoles to poly(dA-dT) was investigated (81). Interestingly, 2′-aliphatic substituents similar to that found in bleomycin inhibited the ability of the bithiazole derivatives to intercalate into DNA; the perturbation of the bithiazole proton resonances was maximal for bithiazole derivatives containing 2′-aromatic substituents. A number of synthetic bithiazole derivatives have been used to investigate the role of the bithiazole moiety of bleomycin in the interaction of bleomycin with DNA.
Several bithiazoles, as well as the tripeptide S fragment of bleomycin (Fig. 10), were capable of inhibiting DNA binding and subsequent cleavage by bleomycin (86). Bleomycin-mediated DNA degradation was evaluated by monitoring the release of [³H]thymine from PM-2 DNA and by assessment of the extent of cleavage of a ³²P end-labeled DNA duplex. Inhibition of bleomycin-mediated DNA cleavage by the bithiazoles was sensitive to the number and distribution of positively charged groups on individual bithiazole derivatives. In a related study (85), inhibition of DNA degradation by bleomycin was more pronounced for bithiazole derivatives containing 2′-aromatic substituents. In both studies, the bithiazole derivatives diminished bleomycin-induced DNA cleavage proportionately at all sites and did not alter the sequence-selectivity of strand-scission.

FIG. 10. Bithiazole derivatives that inhibit Fe(II)·bleomycin-mediated DNA degradation.

Insight into the role of the mannose carbamoyl group on the disaccharide moiety of bleomycin in determining the sites cleaved by bleomycin was gained from a study of the ability of several bleomycin congeners to degrade d(CGCT3A3GCG) (87). It was of particular interest to determine the ratio of cleavage at cytidine3 versus cytidine11, that is, at the potential double-strand recognition site. For both bleomycins A2 and B2, ~85% of the cleavage occurred at cytidine11, while only ~15% was observed at cytidine3 (Table I). When the experiment was carried out with the bleomycin congeners deglycobleomycin A2 and decarbamoylbleomycin A2, the specificity of cleavage was reversed. These results suggested that bleomycin may bind to G-C dinucleotide sequences in two different, presumably antiparallel, orientations, and that the presence of the disaccharide moiety and carbamoyl group of bleomycin may influence that orientation.

TABLE I
SEQUENCE-SELECTIVITY OF CLEAVAGE OF d(CGCT3A3GCG) BY BLEOMYCIN CONGENERS

                                    Specificity   Cleavage position (%)
Bleomycin                           (%)a          Cytidine3   Cytidine11
Fe(II)·bleomycin A2                 78            15          85
Fe(II)·bleomycin B2                 75            17          83
Fe(II)·deglycobleomycin A2          98            79          21
Fe(II)·decarbamoylbleomycin A2      90            72          28

a Percentage of cleavage at cytidine3 or cytidine11 relative to the total dodecanucleotide cleavage.

In this context, it should be noted that the metal coordination geometries of Fe(II)·bleomycin A2 and Fe(II)·deglycobleomycin A2 (88) are believed to differ substantially, as determined from a ¹H NMR investigation of the corresponding carbon monoxide adducts. Since bleomycin A2 and decarbamoylbleomycin A2 differ only in the carbamoyl moiety, the results of this study provide strong support for the participation of the carbamoyl group in metal-ion coordination. Moreover, the observed differences in selectivity at a given G-C site appear to derive from structural differences within the metal-binding domain, suggesting participation of the amino-terminus of bleomycin in DNA-sequence recognition.

Although there is evidence that bleomycin may bind to DNA through a (partial) intercalative mechanism, there is considerable other evidence to indicate that bleomycin interacts with DNA by binding in the minor groove. This is supported by the observation that known minor-groove binders, such as distamycin, alter the sequence-selectivity of DNA cleavage by Fe·bleomycin, while intercalators such as ethidium bromide diminish the extent of cleavage but do not alter sequence-selectivity (63, 89). Moreover, the presence of the bulky glucose residues in the major groove of phage T4 DNA has little effect on the efficiency of DNA strand-scission induced by bleomycin (90), suggesting that bleomycin does not bind to DNA in the major groove. Further evidence in support of minor-groove binding has been adduced from studies using DNA substrates containing modified bases.
For example, alkylation of guanosine residues at the N-7 position by dimethyl sulfate or aflatoxin does not alter the selectivity of cleavage by bleomycin (91). In contrast, modification of the 2-amino group of guanosine by anthramycin significantly inhibits cleavage at G-Y sequences by Fe·bleomycin (92).


These results strongly suggest that bleomycin binds to DNA through minor-groove interactions. The most compelling evidence that bleomycin binds to DNA in the minor groove derives from the finding that DNA degradation involves abstraction of the C-4′ H from deoxyribose (1, 2, 59), indicating that the metal-binding domain responsible for H abstraction must be oriented in the minor groove. The ability of bithiazoles to bind to DNA and the finding that the bithiazole moiety of bleomycin is essential for DNA binding by the antibiotic itself (93) prompted investigations to determine whether the bithiazole ring system is also responsible for sequence-selective recognition of DNA by bleomycin. In one study, the sequence-selectivities of DNA cleavage by the bleomycin congeners phleomycin and tallysomycin (Fig. 11) were evaluated (94). Phleomycin contains a thiazolinylthiazole moiety, rather than the planar bithiazole present in bleomycin; the sp³ carbon atom in the thiazoline should preclude intercalative binding of phleomycin to DNA, consistent with available experimental data (95). In spite of this structural element, phleomycin exhibited the same selectivity for DNA cleavage at G-C and G-T sequences as bleomycin. The extent of cleavage, however, was less than that obtained with bleomycin. Tallysomycin contains the amino sugar 4-amino-4,6-dideoxy-L-talose attached to the aminoethylbithiazole moiety. Experimentally, tallysomycin exhibits DNA cleavage patterns similar to those observed for bleomycin. The lesser potency of tallysomycin as a DNA-damaging agent is consistent with the suggestion that interference from the additional sugar residue diminishes the ability of tallysomycin to bind to DNA. To assess the role of the bithiazole moiety in DNA binding and sequence recognition by bleomycin, analogs of deglycobleomycin (96, 97) containing only a single thiazole ring were prepared by total synthesis (Fig. 12) (43).
Both analogs carried out oxidative transformations of small organic substrates, providing further evidence that the bithiazole moiety is not required for oxygen activation (cf. 93). However, unlike bleomycin and deglycobleomycin (87, 98, 99), both analogs cleaved DNA only at high concentrations and in a sequence-neutral fashion. These results indicate that disruption of the bithiazole ring system eliminates the ability of the bleomycin analog to bind to DNA in the same fashion as the natural product, but do not provide evidence for any possible role of the bithiazole moiety in mediating sequence-selective recognition of DNA.

FIG. 11. Structures of phleomycin D1 and tallysomycin S10b.

FIG. 12. Structures of monothiazole bleomycin analogs.

To assess the role of the individual structural domains of bleomycin in DNA-sequence recognition, a series of synthetic bleomycin analogs was prepared in which the threonine moiety of bleomycin was replaced with glycine or oligoglycine spacers of varying length (Fig. 13, "Glyₙ-bleomycins") (100, 101). To facilitate the synthesis of analogs, all of these derivatives were analogs of deglycobleomycin rather than of bleomycin itself. Deglycobleomycin is a less potent DNA-damaging agent than bleomycin, but mediates DNA strand-scission with the same sequence-specificity (87, 98, 99). Within the series, the potential distance between the metal-binding region and the bithiazole was varied, to a maximum of 14 Å for the analog with the longest spacer. It was anticipated that if the bithiazole carboxy-terminal substituent were dominant in determining the sites of DNA binding, the metal-binding domain would be displaced farther along the helix as the length of the spacer was increased, and cleavage would occur at increasingly distant sites. Alternatively, if the metal-binding domain determines the G-Y selectivity, all of the analogs would bind to and cleave DNA at a common site. The latter outcome was actually observed.

FIG. 13. Structures of synthetic analogs of deglycobleomycin in which the metal-binding domain and bithiazole moiety are separated by oligoglycine spacers of variable length.

These observations suggest that the metal-binding region of bleomycin controls the selectivity of DNA strand-scission. Nevertheless, replacement of the bithiazole moiety of bleomycin with distamycin altered the sequence-selectivity of bleomycin: cleavage occurred predominantly in (A+T)-rich regions, indicating that the distamycin moiety, not the metal-binding domain, controlled the position of binding (102). An important caveat in interpreting the results of the Glyₙ-bleomycin study is that all of the compounds used are analogs of bleomycin demethyl A2. The positively charged dimethylsulfonium moiety of bleomycin A2 enhances DNA binding and the efficiency of DNA strand-scission by bleomycin (103, 104), as well as its ability to unwind supercoiled DNA (84). It was therefore anticipated that a series of Glyₙ-bleomycin analogs that more closely resemble bleomycin A2, that is, that retain the dimethylsulfonium group at their carboxy-termini, might exhibit a DNA sequence-binding selectivity reflecting a more substantial role for the carboxy-terminal region of the analogs. Accordingly, a series of Glyₙ-bleomycin A2 analogs (as well as the corresponding demethyl A2 derivatives) was prepared by total synthesis and used for DNA cleavage studies (S. A. Kane, A. Natrajan and S. M. Hecht, unpublished). The sequence-selectivity of the Glyₙ-bleomycin A2 analogs was assessed using three different radiolabeled DNA restriction fragments from pBR322. For all DNA substrates employed, all Glyₙ-bleomycin A2 analogs mediated DNA strand-scission with the same sequence-specificity as bleomycin A2 itself. These observations, in combination with those presented above for the monothiazole bleomycins, strongly suggest that the bithiazole + carboxy-terminal substituent of bleomycin provides DNA affinity, while the metal-binding region governs DNA-sequence recognition.
These results suggest greater complexity in the interaction of bleomycin with DNA than had been thought previously. The metal-binding domain, once believed to be involved solely in metal binding and oxygen activation, thus appears to be dominant in determining the sequence-selectivity of DNA binding as well. Although it had been well established that the bithiazole + carboxy-terminal substituent of bleomycin is necessary for DNA binding, whether this structural domain has any inherent sequence-selectivity for DNA binding remains unclear. To address this question, several structurally modified bithiazole derivatives equipped with appropriate DNA-cleaving moieties were used: Fe(II)·EDTA-bithiazole conjugates and a Co(II)·diamine-tethered bithiazole. The sequence-selectivity of DNA cleavage by these modified bithiazoles was then assessed.

Three bithiazole derivatives containing appended EDTA moieties were prepared (Fig. 14). Activation of Fe(II)·EDTA with a reducing agent in the presence of oxygen generates diffusible oxygen radicals, which can mediate DNA strand-scission (105, 106). It was anticipated that if the bithiazole bound to the same DNA sequences as bleomycin, enhanced cleavage would occur in the regions near the binding site, because the diffusible oxygen radicals produced by Fe(II)·EDTA would attack DNA in the vicinity of the bound conjugate. Although this approach has been used successfully to define the DNA-binding preferences of other agents (107-113), all of the EDTA-bithiazole derivatives mediated DNA strand-scission in a sequence-neutral fashion, presumably reflecting non-specific binding of these molecules to DNA. In addition, a significant component of the DNA affinity of these bithiazole derivatives was electrostatic, as is also true of the interaction of bleomycin with DNA (80-82, 86). These results suggested that the bithiazole binds to DNA without any intrinsic sequence-selectivity (114). The second bithiazole derivative investigated was an aminomethylated bithiazole structurally related to bleomycin A2 (Fig. 14). In the presence of Co(II) and oxygen, this bithiazole induced the formation of alkali-labile sites on duplex DNA; subsequent base treatment resulted in guanosine-specific DNA strand-scission. The products of cleavage after base treatment, oligonucleotides terminating in 3'- and 5'-phosphates, were consistent with a cleavage mechanism involving oxidative modification of guanine (115).

FIG. 14. Structures of synthetic bithiazole derivatives (EDTA-bithiazoles and aminomethylbithiazole) used for investigation of the possible sequence-binding-selectivity of the bithiazole moiety.


DNA cleavage mediated by the Co(II)·aminomethylbithiazole complex was mechanistically distinct from that mediated by other cobalt-containing DNA-cleaving agents (see, e.g., 20, 116) in that it did not require light activation. Although dependent on oxygen, the cleavage reaction was insensitive to scavengers of activated forms of oxygen. Further, no oxygen radicals were detected by EPR spin-trapping methods, indicating that diffusible oxygen-centered radicals were not responsible for the observed DNA damage. These observations suggest that the Co(II)·aminomethylbithiazole complex mediated oxidative modification of guanine, producing some species susceptible to strand-scission by alkali treatment. Because guanine is the most easily oxidized of the nucleic acid bases (117), the most logical interpretation of these experiments is that the guanine specificity of the Co(II)·bithiazole complex results from preferential reactivity at guanine sites, as opposed to a guanine-binding selectivity of the bithiazole (118). In the aggregate, these studies with modified bithiazole derivatives showed that the bithiazole moiety does not have the same DNA-binding selectivity as bleomycin. The EDTA-bithiazole study suggests that these species bind to DNA in a non-specific fashion, although it is conceivable that the diffusible nature of the hydroxyl radicals produced by the Fe(II)·EDTA moiety obscured a weak binding selectivity of the bithiazole. The Co(II)·aminomethylbithiazole complex also bound to DNA and mediated cleavage at sites that reflect base-specific reactivity of the complex, as opposed to binding selectivity. These results are entirely consistent with a model of bleomycin-DNA interaction in which the bithiazole + carboxy-terminal substituent contributes to DNA affinity, but does not provide the structural basis for selective recognition of DNA sequences.

VII. Cleavage of RNA Mediated by Fe(II)·Bleomycin

Although most investigations of the mechanism of action of bleomycin have focused on the degradation of DNA, recent findings demonstrate that RNA can be a target for bleomycin in vitro (7-9). The potential of RNA as a therapeutic target for bleomycin is substantial for several reasons. Much cellular RNA is located in the cytoplasm and is therefore more readily accessible than DNA, which is located in the cell nucleus. Further, although cellular repair mechanisms can repair certain bleomycin-induced DNA lesions, there is presently no evidence for analogous mechanisms for the repair of damaged RNA. Finally, chromosomal DNA is much more extensively packaged than cytoplasmic RNA, rendering the latter more accessible as a target for oxidative damage.

The ability of Fe(II)·bleomycin to degrade RNA efficiently has been demonstrated with various substrates chosen from the different classes of RNA (7-9). Examples include yeast 5-S ribosomal RNA, Bacillus subtilis tRNAHis precursor, mature Escherichia coli tRNAHis, and a Schizosaccharomyces pombe amber suppressor tRNASer construct. While efficient cleavage was observed for these substrates, certain RNAs, including E. coli tRNATyr precursor, were refractory to bleomycin cleavage. Although RNA cleavage predominated at G-Y sequences, as also observed for DNA cleavage, RNA cleavage was much more selective than the cleavage of DNA. Moreover, G-A and U-U sequences were strong cleavage sites unique to certain RNA substrates. Interestingly, a substantial number of cleavage sites were located at the putative junctions between single- and double-stranded regions of the RNA molecules studied (Fig. 15), suggesting selective recognition of nucleic acid structure, rather than sequence, by Fe(II)·bleomycin. Further evidence supporting this concept was the observation that NaCl, MgCl2, and spermidine, known to affect RNA conformation (119), altered the efficiency of RNA cleavage by Fe(II)·bleomycin. These agents affected cleavage to different extents at individual sites on a given RNA substrate, suggesting that the change in the overall conformation of the RNA molecule that they produce influences the facility of cleavage by bleomycin at these individual sites. Recently, Hüttenhofer et al. (120) reported a few additional examples of Fe·bleomycin-mediated RNA cleavage, but suggested that cleavage may occur only at Mg2+ concentrations much lower than those that occur physiologically.
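The idea of a "junction between single- and double-stranded regions" can be made concrete with a short sketch. It assumes the secondary structure is written in dot-bracket notation (an assumption; the chapter shows structures only as drawings in Fig. 15), and the helper names and the toy hairpin are hypothetical, not one of the substrates studied. It only locates ss/ds boundaries; it does not predict where Fe(II)·bleomycin will actually cleave.

```python
def pairing_partners(dot_bracket: str) -> dict[int, int]:
    """Map each paired position (1-based) to its partner.

    '(' and ')' mark base-paired nucleotides; '.' marks unpaired ones.
    """
    stack: list[int] = []
    partners: dict[int, int] = {}
    for pos, symbol in enumerate(dot_bracket, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            mate = stack.pop()
            partners[mate] = pos
            partners[pos] = mate
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partners


def ss_ds_junctions(dot_bracket: str) -> list[tuple[int, int]]:
    """Return adjacent (i, i + 1) positions where a single-stranded
    (unpaired) nucleotide meets a double-stranded (paired) one."""
    partners = pairing_partners(dot_bracket)
    paired = [pos in partners for pos in range(1, len(dot_bracket) + 1)]
    return [
        (i + 1, i + 2)
        for i in range(len(paired) - 1)
        if paired[i] != paired[i + 1]
    ]


if __name__ == "__main__":
    # Hypothetical hairpin with a two-nucleotide 5' tail (illustration only).
    sequence = "AAGGGCAUAUAGCCC"
    structure = "..((((.....))))"
    for left, right in ss_ds_junctions(structure):
        print(f"junction between {sequence[left - 1]}{left} and {sequence[right - 1]}{right}")
```

For the toy hairpin this reports three boundaries (positions 2/3, 6/7 and 11/12): one where the 5' tail enters the stem and two where the stem meets the loop. Real tRNAs have several stems and tertiary contacts, so this is only a bookkeeping aid for reading structures like those in Fig. 15.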
Subsequently, it was demonstrated (9) that there is wide variation in the ability of individual RNAs to be cleaved by Fe·bleomycin when Mg2+ is present; this feature may actually further enhance the selectivity of RNA cleavage anticipated in situ. The observation that RNA cleavage often occurs at the junction between single- and double-stranded regions indicates a significant difference in the recognition of RNA and DNA by bleomycin. The obvious chemical differences between the two are the presence of the 2'-hydroxyl group in RNA and the occurrence of uracil instead of thymine in RNA. However, the ability of RNAs to adopt complex tertiary structures is an additional distinction that could also be used by bleomycin for the differential recognition of the two kinds of macromolecules. To explore further the differences in selectivity of cleavage of RNA compared to DNA, a DNA having the same primary sequence as B. subtilis tRNAHis precursor was prepared and subjected to cleavage by Fe(II)·bleomycin (121). Remarkably, at low concentrations of added Fe(II)·bleomycin, cleavage of both the tRNA and the "tDNA" substrates occurred predominantly at the same site, the putative junction between single- and double-stranded regions of the molecules. Although no information is available concerning the tertiary structure of either molecule, the fact that they share the same cleavage site strongly suggests that they share common structural features; this has been documented convincingly for other pairs of tRNA and "tDNA" molecules (122-124). This result provides strong evidence that the three-dimensional structures of RNA and DNA, not the difference in the constituent nucleotides, are responsible for the difference in selectivity of bleomycin cleavage of RNA as compared to DNA.

FIG. 15. (A-C) Sites of cleavage induced by Fe(II)·bleomycin in three tRNA (precursor) substrates: (A) B. subtilis tRNAHis precursor; (B) E. coli tRNAHis; (C) S. pombe tRNASer amber suppressor construct.

As with DNA cleavage, the mechanism of RNA cleavage mediated by Fe(II)·bleomycin has been shown to be oxidative. The mobility of radiolabeled RNA cleavage fragments was indicative of 3'-phosphoroglycolate termini (8, 9), consistent with a mechanism involving initial abstraction of the C-4' H of the ribose ring, which resides in the minor groove. RNA molecules typically adopt A-form conformations, which have much shallower and wider minor grooves than B-form DNA, the "usual" bleomycin substrate. Comparison of the minor grooves of representative A- and B-DNAs whose structures have been determined crystallographically (125, 126) reveals that C-4' H in A-form DNA is less sterically accessible within the minor groove than in B-form DNA (R. J. Duff and S. M. Hecht, unpublished). Moreover, C-1' H is located centrally in the minor grooves of both A- and B-DNAs. Although abstraction of C-1' H seems as plausible mechanistically as that of C-4' H, the former pathway has not been documented for any DNA substrate. Detailed analysis of the mechanism of RNA cleavage was carried out on chimeric oligodeoxynucleotides of the type d(CGCTAGCG) containing a single ribo- or ara-nucleotide at cytidine3 (127). This oligodeoxynucleotide is a good substrate for Fe(II)·bleomycin (SO, 128); cleavage at cytidine3 yielded CGpCH2COOH and cytosine propenal (cf. Figs. 7 and 8). The same products were obtained by treatment of the C3-ribo and C3-ara octanucleotides with Fe(II)·bleomycin, supporting a mechanism involving initial abstraction of C-4' H, analogous to the mechanism of DNA cleavage (8). As noted above, it was anticipated that an additional pathway leading to RNA degradation might involve abstraction of C-1' H, as illustrated in Fig. 16.
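The sequence bookkeeping for the C-4' pathway on the octanucleotide can be sketched in a few lines. The helpers below are hypothetical, written only for illustration: one checks that d(CGCTAGCG) is self-complementary (so that it can form the duplex drawn in Fig. 16), and the other splits the strand at a chosen nucleotide to list the expected pieces, namely a 5' fragment ending in a 3'-phosphoroglycolate (written here as CGpCH2COOH), the released base (as its propenal), and a 5'-phosphorylated 3' fragment. The string labels are our own notation, and the chemistry itself (including the C-1' pathway) is not modeled.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Watson-Crick partner strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_self_complementary(seq: str) -> bool:
    """True if the strand can pair with an identical copy of itself."""
    return seq == reverse_complement(seq)


def c4_cleavage_products(seq: str, position: int) -> dict[str, str]:
    """Fragments expected when the nucleotide at `position` (1-based) is
    destroyed via C-4' H abstraction: a 5' piece ending in a
    3'-phosphoroglycolate, the released base (as a base propenal), and a
    5'-phosphorylated 3' piece. Terminal positions are rejected in this
    simplified sketch."""
    if not 1 < position < len(seq):
        raise ValueError("position must be an internal nucleotide")
    return {
        "five_prime": seq[: position - 1] + "pCH2COOH",
        "base_propenal": seq[position - 1],
        "three_prime": "p" + seq[position:],
    }


if __name__ == "__main__":
    octamer = "CGCTAGCG"
    print(is_self_complementary(octamer))
    print(c4_cleavage_products(octamer, 3))
```

For attack at cytidine3 this gives CGpCH2COOH, cytosine (released as cytosine propenal) and pTAGCG; the same 5'-phosphorylated pTAGCG fragment also appears among the products of the C-1' pathway shown in Fig. 16.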

-

1. Fe(ll).bleomycin

-

-

S CGCTAGCG 3‘ GCGATCGC

4

2. [HI

Criegee-type

m

a

rearrangement

5, I o=p-0I

OTAGCG

HzND v 0

H2N

I

L

o=y-0-

4

+c q

NY

+ pTAGCG FIG.16. Products resulting from Fe(II).bleomycin-mediilted oxidation at C-I’ of C,-uru and C,-ribo octanucleotides following treatment with 1,2-dieminoben~ene.

POLYNUCLEOTIDES AND BLEOMYCIN

343

As indicated, Criegee rearrangement of the initially formed C-1' hydroperoxide would not lead directly to strand-scission. It was envisioned that treatment of the putative intermediate i with 1,2-diaminobenzene could effect conversion to a dinucleotide quinoxaline, with concomitant strand-scission. The expected products were found: treatment of either the C3-ara or the C3-ribo octanucleotide with Fe(II)·bleomycin, followed by 1,2-diaminobenzene, yielded the dinucleotide quinoxaline derivative 4 (127, 128). These results strongly suggest that degradation of the C3-ara and C3-ribo octanucleotides by Fe(II)·bleomycin proceeds by two pathways, involving initial H abstraction from C-1' and from C-4'. At present, it is unknown whether an analogous mechanism can operate for the degradation of authentic RNA substrates by Fe(II)·bleomycin. Additional evidence supporting an oxidative mechanism of RNA cleavage was obtained by the use of a tRNAHis precursor substrate 32P-labeled at the 5' terminus and also containing [3H]uridine; this substrate had a high-efficiency bleomycin cleavage site at U35 (129). Treatment of this tRNAHis precursor with Fe(II)·bleomycin released free [3H]uracil. Moreover, the amount of free uracil was stoichiometrically equivalent to the amount of strand-scission at U35. These results are consistent with an oxidative mechanism for bleomycin-mediated RNA cleavage. The ability of RNA to serve as a therapeutically relevant target for cleavage by Fe(II)·bleomycin has been clearly demonstrated, using representative substrates from the different classes of RNA. However, one type of RNA molecule that, until recently, had not been explored as a target for Fe(II)·bleomycin is the RNA strand of an RNA·DNA heteroduplex.
Early experiments using the hybrid poly(rA)·poly(dT) suggested that only the DNA strand of such a heteroduplex is a suitable substrate for Fe(II)·bleomycin (130). Further, recent studies of Fe(II)·bleomycin-mediated cleavage of the homopolymers poly(rA)·poly(dT) and poly(dA)·poly(rU) seemed to confirm the initial finding that bleomycin does not degrade the RNA strand of a heteroduplex (131). However, the lack of sequence diversity within these substrates suggested that they might not adequately represent the nucleic acid sequences found in living organisms. RNA·DNA hybrids are present in cells during both forward and reverse transcription (119); in forward transcription, targeting of the heteroduplex could lead to destruction of the template-bound mRNA, which in turn would deprive the cell of proteins essential for survival. The ability of Fe(II)·bleomycin to mediate cleavage of a heteroduplex was investigated using the hybrid obtained from reverse transcription of E. coli 5-S rRNA (M. A. Morgan and S. M. Hecht, unpublished). A strategy was developed for the independent radiolabeling of each strand at the 5' terminus, so that the cleavage of each strand could be followed separately. Fe(II)·bleomycin-mediated cleavage of the DNA strand occurred primarily at G-Y sequences, consistent with that observed for cleavage of B-DNA. Remarkably, the RNA strand of the heteroduplex was cleaved with the same facility as the DNA strand. This was the first example of an RNA in which complete degradation of the substrate occurred; other RNAs were typically degraded only at significantly higher concentrations of added Fe(II)·bleomycin, and the degradation did not proceed to completion. It is of particular interest that complete consumption of the RNA strand occurred at concentrations of Fe(II)·bleomycin comparable to those needed to degrade the DNA strand completely. The RNA strand of this heteroduplex is the most efficient RNA substrate for Fe(II)·bleomycin characterized to date. It may also be noted that added Mg2+ had comparable effects in diminishing Fe(II)·bleomycin-mediated cleavage of the DNA and RNA strands of the heteroduplex. Although yeast 5-S rRNA is a substrate for Fe(II)·bleomycin (9) and is similar in sequence to the RNA strand of the foregoing heteroduplex, the sites of RNA cleavage observed for the heteroduplex differ from those observed for 5-S rRNA itself. Moreover, the RNA strand of the heteroduplex is a significantly better substrate for Fe(II)·bleomycin than 5-S rRNA itself. These results provide strong evidence that the tertiary structure of the RNA strand of the heteroduplex is significantly different from that of the 5-S rRNA. Again, this finding constitutes compelling evidence that bleomycin recognizes substrate conformation rather than primary structure. Further, the corresponding all-DNA duplex has some cleavage sites in common with the DNA strand of the heteroduplex.

VIII. Strand-Scission of Altered DNA Structures Mediated by Fe(II)·Bleomycin

In addition to the structure of bleomycin, the structure of the nucleic acid target also influences the sites recognized by bleomycin. An early experiment indicating the ability of bleomycin to recognize DNA conformation involved the treatment of plasmid DNAs with limited amounts of bleomycin; this yielded a few unique cleavage sites, all within a discrete region of the DNA substrate (132). A related study investigated the effects of the topological state of plasmid DNA on the selectivity of strand-scission by bleomycin (133). Comparison of the cleavage of supercoiled and linearized plasmid DNAs showed several cleavage sites induced in the supercoiled DNA that were not observed in the linearized DNA substrate, suggesting that the selectivity of bleomycin-mediated DNA degradation is influenced by the conformation of the DNA substrate.

cis-Diamminedichloroplatinum(II) (cisplatin) is a clinically important antitumor agent often used in combination chemotherapy with bleomycin for the treatment of certain forms of cancer (134). The chemotherapeutic effects of cisplatin are attributed to its ability to form covalent crosslinks with DNA, resulting in adducts that block replication (135). Covalent binding of platinum complexes to DNA induces significant structural distortions in the DNA helix (136-138), and it seemed logical to think that this might affect recognition by bleomycin (139, 140). Indeed, the binding of cisplatin to DNA alters the sequence-selectivity of DNA cleavage by bleomycin (140): the bleomycin-mediated cleavage that ordinarily occurs adjacent to oligo(dG) regions is masked in cisplatin-treated DNA, while new cleavage sites at other sequences are observed. An oligonucleotide duplex containing a high-efficiency bleomycin cleavage site, one strand of which, d(CGCT,A,GG), carried a single, defined cisplatin-d(G-G) crosslink, was used in a detailed analysis of the effect of DNA platination on bleomycin cleavage (139). Treatment of the platinated substrate with Fe(II)·bleomycin afforded the same overall yield of cleavage products as was observed for the non-platinated DNA substrate, although much of the cleavage typically observed at the preferred G-C site was redirected to the thymidine nucleotides. This indicates that the conformational alteration effected by cisplatin indeed resulted in novel bleomycin-mediated DNA cleavage patterns.

DNA methylation affects the regulation of gene expression in eukaryotic systems. Aberrant gene expression, observed in cancer cells, appears to correlate with decreases or alterations in DNA methylation patterns (141, 142). Accordingly, the effect of methylation of cytidine and adenosine residues in DNA on the selectivity of bleomycin-mediated DNA cleavage has been investigated. By the use of radiolabeled DNA restriction fragments from pBR322 that had been methylated with restriction methylases, the ability of methylated and unmethylated DNAs to serve as substrates for Fe(II)·bleomycin was compared (90, 143). Bleomycin-mediated DNA cleavage was diminished substantially at sites proximal to N6-methyladenosine and 5-methylcytidine residues, particularly at sequences containing multiple sites of methylation. Cytidine methylation promotes a major conformational change, the B-to-Z transition (144). Z-DNA is typically favored in aqueous solutions containing high concentrations of NaCl or MgCl2, and bleomycin stabilizes B-DNA, increasing the salt concentration necessary to induce the B-to-Z transition. Therefore, the observed diminution of cleavage of methylated DNA by bleomycin appears to be attributable to the conformational change in the DNA substrate resulting from methylation, suggesting that recognition of altered methylation patterns in cancer cells could contribute to selective chemotherapeutic action by bleomycin.

The effect of a single methylated cytidine residue on bleomycin-mediated DNA cleavage was studied using the oligonucleotide d(CG-mCT3A,CGC) as a substrate (145). Fe(II)·bleomycin degraded this substrate as efficiently as the corresponding unmethylated oligonucleotide; however, degradation of the methylated substrate led to a greater proportion of alkali-labile lesions, as opposed to strand-scission products. Since both sets of products are believed to derive from a common C-4' deoxyribose radical, these findings clearly show that the alteration of DNA structure produced by cytidine methylation results in a change in the chemistry of bleomycin-mediated DNA degradation.

DNA "bulges" can be selectively recognized by bleomycin (146). (A bulge is an "extra" unpaired nucleotide on one strand of the double helix.) The targeting of DNA bulges by DNA-damaging agents is of interest because bulges appear to be intermediates in the process of frameshift mutagenesis (147). The ability of bleomycin to recognize and selectively cleave a DNA bulge has been investigated, using a series of radiolabeled double-stranded oligodeoxynucleotides containing bulges at different sites in the sequence. For each substrate, one or two nucleotides on the strand opposite the bulge were cleaved with the greatest efficiency by Fe(II)·bleomycin. This bears some analogy to the bleomycin cleavage patterns observed for yeast ribosomal 5-S RNA (9), although the RNA cleavage occurred on the strand containing the bulged nucleotide. Very recently, the ability of a DNA triple-helix to serve as a substrate for Fe(II)·bleomycin has been investigated using the triple-helix illustrated in Fig. 17.
5'-TCCTGATAAAGGAGGAGATGAAGAAAAAATGA-3'

5'-TTTCCTCCTCTI-3'

FIG. 17. Sequences of the DNA triple-helix used as a substrate for Fe(II)·bleomycin, showing sites of triplex-specific cleavage mediated by Fe(II)·bleomycin. The lengths of the arrows are in rough proportion to the extent of strand-scission observed at each site.

The entire sequence is devoid of G-Y sequences, that is, those normally cleaved preferentially in B-DNA by Fe(II)·bleomycin. Fe(II)·bleomycin mediated highly specific cleavage of the triple-helix at the duplex-triplex junctions (Fig. 17). Judged by their migration on polyacrylamide gels, the products of triplex cleavage included oligonucleotide 3'-phosphoroglycolates and products derived from alkali-labile lesions, implying that the mechanism of cleavage of the triple-helix is analogous to that of the cleavage of B-DNA, i.e., that it involves initial H abstraction from C-4' of deoxyribose (118). Molecular-modeling calculations of this triple-helix structure have been made in an effort to understand the conformational changes in the minor groove induced by triple-helix formation (J.-S. Sun, personal communication). The minor groove of the triplex was calculated to be somewhat shallower and wider than the minor groove of the corresponding duplex, but did not approach the dimensions of an A-form duplex [11 Å and 2.7 Å in width and depth, respectively (119)]. The calculations suggest that there may be major changes in the dimensions of the minor groove at the duplex-triplex junctions (Fig. 17), particularly at the 5'-junction, where the minor-groove width was calculated to increase from 3.6 Å to 6.4 Å within two nucleotides. Analysis of the sites of triplex-dependent cleavage produced by Fe(II)·bleomycin (see Fig. 17) in the context of the molecular-modeling calculations showed that the bleomycin cleavage sites corresponded to the regions of the DNA where the width and depth of the minor groove are predicted to vary dramatically, particularly at the 5' junction. These results strongly suggest that a minor-groove conformation preferred by Fe(II)·bleomycin is located within these regions and results in specific cleavage at these sites. The fact that the strongest cleavage site induced by Fe(II)·bleomycin occurred at the 5' duplex-triplex junction is of particular interest, because the 5' junction is the preferred triple-helix binding site of intercalating agents (148, 149). Although there is some evidence consistent with the binding of bleomycin to DNA by a (partial) intercalative mechanism (76, 78, 84), there is overwhelming evidence that bleomycin associates with DNA through minor-groove interactions (89-92). To determine whether the preference of bleomycin for the 5' junction reflects an intercalative mode of binding at this site, the selectivity of bleomycin was compared with that of phleomycin (Fig. 11). Phleomycin contains a thiazolinylthiazole moiety, as opposed to a planar bithiazole ring system, and does not intercalate into DNA (95). In spite of this structural difference, Fe(II)·phleomycin produces essentially the same pattern of cleavage of B-DNA as Fe(II)·bleomycin (94). Significantly, Fe(II)·phleomycin also mediates specific cleavage of the triplex at the same sites as Fe(II)·bleomycin.
These results provide strong evidence that the cleavage specificity of Fe(II)·bleomycin for the 5' junction is not related to a selective intercalative mode of binding, but rather to recognition of a minor-groove structure inherent in the duplex-triplex junction.
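The statement that the triplex substrate contains no G-Y steps is easy to check mechanically. The sketch below uses a hypothetical helper (not taken from the cited work) that lists every 5'-G-pyrimidine-3' step, that is, G-C or G-T in DNA, on a strand and on its Watson-Crick complement. It assumes the duplex strand printed in Fig. 17 is complete as transcribed; the complementary strand is derived by base pairing, and the third strand is not considered. The octanucleotide d(CGCTAGCG) is included as a positive control.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PYRIMIDINES = {"C", "T"}


def reverse_complement(seq: str) -> str:
    """Watson-Crick partner strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def g_y_sites(seq: str) -> list[int]:
    """1-based positions of G in every 5'-G-Y-3' step (Y = C or T)."""
    return [
        i + 1
        for i in range(len(seq) - 1)
        if seq[i] == "G" and seq[i + 1] in PYRIMIDINES
    ]


if __name__ == "__main__":
    # Duplex strand of the triplex substrate as printed in Fig. 17.
    triplex_strand = "TCCTGATAAAGGAGGAGATGAAGAAAAAATGA"
    for name, strand in [
        ("printed strand", triplex_strand),
        ("complementary strand", reverse_complement(triplex_strand)),
        ("d(CGCTAGCG) control", "CGCTAGCG"),
    ]:
        print(f"{name}: G-Y steps at {g_y_sites(strand)}")
```

Both triplex strands return empty lists, while the control flags the G2-C3 and G6-C7 steps; C3 is the position at which Fe(II)·bleomycin cleaves that octanucleotide. The check is purely sequence-based and says nothing about the minor-groove geometry that the text proposes as the real determinant of cleavage at the duplex-triplex junctions.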

IX. Concluding Remarks

Studies in our laboratory have focused on issues such as the source(s) of nucleic acid recognition by Fe(II)·bleomycin, as well as the identification of novel nucleic acid structures that could constitute therapeutic targets for bleomycin. These studies demonstrate that alteration of DNA structure by platination or methylation, or hybridization of a single-stranded DNA to an RNA strand, results in novel bleomycin-mediated cleavage patterns, and that these structural alterations can actually affect the chemistry of bleomycin-mediated polynucleotide degradation, presumably by affecting the binding of bleomycin to the polynucleotide substrate. The finding that RNA can serve as an efficient substrate for bleomycin has led to the discovery of new facets of bleomycin-mediated nucleic acid degradation, including the ability of bleomycin to recognize selectively nucleic acid conformation, rather than sequence, and the existence of a new chemical pathway for substrate degradation. The preference of bleomycin for cleavage at the duplex-triplex junctions within a DNA triple-helix also demonstrates selective recognition of DNA shape by Fe(II)·bleomycin.

One intriguing aspect of the interaction of bleomycin with double-stranded DNA is its selectivity for G-Y sequences. The minor groove of DNA in proximity to G-C sequences is believed to be shallower than the minor groove of A-T sequences, owing to the 2-amino group of guanosine; moreover, some evidence suggests that the minor groove of G-C sequences is wider than that of A-T sequences (150). These results suggest that sequences containing wider and shallower minor grooves, such as G-C sequences in B-DNA, as well as analogous sites found in RNA and in DNA triple-helix structures, may actually be the source of nucleic acid recognition for Fe(II)·bleomycin. Further study of the mechanisms of minor-groove recognition by Fe(II)·bleomycin should provide a better understanding of the way in which this antitumor agent interacts with its target(s).

ACKNOWLEDGMENTS

We thank Anand Natrajan for helpful discussions during the writing of this essay, and members of the Hecht laboratory for their contributions to the studies described here. Studies at the University of Virginia were supported by research grants CA-27603, CA-38544, and CA-53913 from the National Cancer Institute, Department of Health and Human Services.


REFERENCES 1. S. M. Hecht, Acc. Chetti. Res. 19, 383 (1986). 2. J. Stublx! and J. W. Kozarich, Chetn. Reu 87, 1107 (1987). 3. A. Natrajan and S. M. Hecht, in “Molecular Aspects of Anticancer Drug-DNA Interactions” (S. Neidle and M. J. Waring. eds.), p 197.Macmillan, London, 1993. 4. D. S. Sigman, Acc. Chem. Res. IS, 180 (lesS).

5. P. B. Dervan, Science 232, 464 (1986).
6. J. K. Barton, Science 233, 727 (1986).
7. R. S. Magliozzo, J. Peisach and M. R. Ciriolo, Mol. Pharmacol. 35, 428 (1989).
8. B. J. Carter, E. de Vroom, E. C. Long, G. A. van der Marel, J. H. van Boom and S. M. Hecht, PNAS 87, 9373 (1990).
9. C. E. Holmes, B. J. Carter and S. M. Hecht, Bchem 38, 4283 (1993).
10. H. Umezawa, Y. Suhara, T. Takita and K. Maeda, J. Antibiot. 19A, 210 (1966).
11. B. I. Sikic, M. Rozencweig and S. K. Carter, eds., "Bleomycin Chemotherapy," Academic Press, Orlando, Florida, 1985.
12. A. D. D'Andrea and W. A. Haseltine, PNAS 75, 3608 (1978).
13. M. Takeshita, A. Grollman, E. Ohtsubo and H. Ohtsubo, PNAS 75, 5983 (1978).
14. N. J. Oppenheimer, L. O. Rodriguez and S. M. Hecht, PNAS 76, 5616 (1979).
15. H. Umezawa, T. Takita, S. Saito, Y. Muraoka, K. Takahashi, H. Ekimoto, S. Minamide, K. Nishikawa, T. Fukuoka, T. Nakatani, A. Fujii and A. Matsuda, in "Bleomycin Chemotherapy" (B. I. Sikic, M. Rozencweig and S. K. Carter, eds.), p. 289, Academic Press, Orlando, Florida, 1985.
16. E. A. Sausville, J. Peisach and S. B. Horwitz, BBRC 73, 814 (1976).
17. E. A. Sausville, J. Peisach and S. B. Horwitz, Bchem 17, 2740 (1978).
18. E. A. Sausville, R. W. Stein, J. Peisach and S. B. Horwitz, Bchem 17, 2746 (1978).
19. C.-H. Chang and C. F. Meares, Bchem 21, 6332 (1982).
20. C.-H. Chang and C. F. Meares, Bchem 23, 2268 (1984).
21. G. M. Ehrenfeld, L. O. Rodriguez, S. M. Hecht, C. Chang, V. J. Basus and N. J. Oppenheimer, Bchem 24, 81 (1985).
22. G. M. Ehrenfeld, J. B. Shipley, D. C. Heimbrook, H. Sugiyama, E. C. Long, J. H. van Boom, G. A. van der Marel, N. J. Oppenheimer and S. M. Hecht, Bchem 26, 931 (1987).
23. G. M. Ehrenfeld, N. Murugesan and S. M. Hecht, Inorg. Chem. 23, 1496 (1984).
24. R. M. Burger, J. H. Freedman, S. B. Horwitz and J. Peisach, Inorg. Chem. 23, 2215 (1984).
25. T. Suzuki, J. Kuwahara, M. Goto and Y. Sugiura, BBA 824, 330 (1985).
26. J. Kuwahara, T. Suzuki and Y. Sugiura, BBRC 129, 368 (1985).
27. L. L. Guan, J. Kuwahara and Y. Sugiura, Bchem 32, 6141 (1993).
28. Y. Iitaka, H. Nakamura, T. Nakatani, Y. Muraoka, A. Fujii, T. Takita and H. Umezawa, J. Antibiot. 31, 1070 (1978).
29. T. Takita, Y. Muraoka, T. Nakatani, A. Fujii, Y. Iitaka and H. Umezawa, J. Antibiot. 31, 1073 (1978).
30. N. J. Oppenheimer, L. O. Rodriguez and S. M. Hecht, Bchem 18, 3439 (1979).
31. Y. Sugiura and K. Ishizu, J. Inorg. Biochem. 11, 171 (1979).
32. R. M. Burger, J. Peisach and S. B. Horwitz, JBC 256, 11636 (1981).
33. R. M. Burger, T. A. Kent, S. B. Horwitz, E. Munck and J. Peisach, JBC 258, 1559 (1983).
34. R. E. White and M. J. Coon, ARB 49, 315 (1980).
35. F. P. Guengerich and T. L. Macdonald, Acc. Chem. Res. 17, 9 (1984).
36. R. M. Burger, J. S. Blanchard, S. B. Horwitz and J. Peisach, JBC 260, 15406 (1985).
37. N. Murugesan, G. M. Ehrenfeld and S. M. Hecht, JBC 257, 8600 (1982).


38. N. Murugesan and S. M. Hecht, JACS 107, 493 (1985).
39. D. C. Heimbrook, R. L. Mulholland and S. M. Hecht, JACS 108, 7839 (1986).
40. D. C. Heimbrook, S. A. Carr, M. A. Mentzer, E. C. Long and S. M. Hecht, Inorg. Chem. 26, 3835 (1987).
41. J. T. Groves, T. E. Nemo and R. S. Myers, JACS 101, 1032 (1979).
42. J. T. Groves and T. E. Nemo, JACS 105, 5786 (1983).
43. N. Hamamichi, A. Natrajan and S. M. Hecht, JACS 114, 6278 (1992).
44. A. Natrajan, S. M. Hecht, G. A. van der Marel and J. H. van Boom, JACS 112, 4532 (1990).
45. G. Padbury, S. G. Sligar, R. Labeque and L. J. Marnett, Bchem 27, 7846 (1988).
46. A. Natrajan and S. M. Hecht, J. Org. Chem. 56, 5239 (1991).
47. A. Natrajan, S. M. Hecht, G. A. van der Marel and J. H. van Boom, JACS 112, 3997 (1990).
48. J. R. Barr, R. B. Van Atta, A. Natrajan and S. M. Hecht, JACS 112, 4058 (1990).
49. M. Nakamura and J. Peisach, J. Antibiot. 41, 638 (1988).
50. R. B. Van Atta, E. C. Long, S. M. Hecht, G. A. van der Marel and J. H. van Boom, JACS 111, 2722 (1989).
51. P. R. Ortiz de Montellano, in "Cytochrome P450: Structure, Mechanism and Biochemistry" (P. R. Ortiz de Montellano, ed.), p. 217, Plenum, New York, 1986.
52. T. Owa, T. Sugiyama, M. Otsuka, M. Ohno and K. Maeda, Tetrahedron Lett. 31, 6063 (1990).
52a. J. W. Sam, X.-J. Tang and J. Peisach, JACS 116, 5250 (1994).
53. C. W. Haidle, Mol. Pharmacol. 7, 645 (1971).
54. L. F. Povirk, W. Wübker, W. Köhnlein and F. Hutchinson, NARes 4, 3573 (1977).
55. R. M. Burger, A. R. Berkowitz, J. Peisach and S. B. Horwitz, JBC 255, 11832 (1980).
56. L. Giloni, M. Takeshita, F. Johnson, C. Iden and A. Grollman, JBC 256, 8608 (1981).
57. R. M. Burger, J. Peisach and S. B. Horwitz, JBC 257, 3372 (1982).
58. R. M. Burger, J. Peisach and S. B. Horwitz, JBC 257, 8612 (1982).
59. J. C. Wu, J. W. Kozarich and J. Stubbe, JBC 258, 4694 (1983).
60. J. C. Wu, J. W. Kozarich and J. Stubbe, Bchem 24, 7562 (1985).
61. N. Murugesan, C. Xu, G. M. Ehrenfeld, H. Sugiyama, R. E. Kilkuskie, L. O. Rodriguez, L.-H. Chang and S. M. Hecht, Bchem 24, 5735 (1985).
62. S. Uesugi, T. Shida, M. Ikehara, Y. Kobayashi and Y. Kyogoku, NARes 12, 1581 (1984).
63. H. Sugiyama, R. E. Kilkuskie, S. M. Hecht, G. A. van der Marel and J. H. van Boom, JACS 107, 7765 (1985).
64. H. Kuramochi, K. Takahashi, T. Takita and H. Umezawa, J. Antibiot. 34, 576 (1981).
65. J. W. Kozarich, L. Worth, Jr., B. L. Frank, D. F. Christner, D. E. Vanderwall and J. Stubbe, Science 245, 1396 (1989).
66. R. M. Burger, S. J. Projan, S. B. Horwitz and J. Peisach, JBC 261, 15955 (1986).
67. G. H. McGall, L. E. Rabow, G. W. Ashley, S. H. Wu, J. W. Kozarich and J. Stubbe, JACS 114, 4958 (1992).
68. H. Sugiyama, C. Xu, N. Murugesan and S. M. Hecht, JACS 107, 4104 (1985).
69. H. Sugiyama, C. Xu, N. Murugesan, S. M. Hecht, G. A. van der Marel and J. H. van Boom, Bchem 27, 58 (1988).
70. L. E. Rabow, G. H. McGall, J. Stubbe and J. W. Kozarich, JACS 112, 3196 (1990).
71. L. E. Rabow, G. H. McGall, J. Stubbe and J. W. Kozarich, JACS 112, 3203 (1990).
72. J. W. Sam and J. Peisach, Bchem 32, 1488 (1993).
73. C.-H. Chang, J. L. Dallas and C. F. Meares, BBRC 110, 959 (1983).
74. I. Saito, T. Morii, H. Sugiyama, T. Matsuura, C. F. Meares and S. M. Hecht, JACS 111, 2307 (1989).
75. N. J. Oppenheimer, C. Chang, L. O. Rodriguez and S. M. Hecht, JBC 256, 1514 (1981).
76. M. A. Chien, A. P. Grollman and S. B. Horwitz, Bchem 16, 3641 (1977).


77. H. Kasai, H. Naganawa, T. Takita and H. Umezawa, J. Antibiot. 31, 1316 (1978).
78. L. F. Povirk, M. Hogan and N. Dattagupta, Bchem 18, 96 (1979).
79. S. N. Roy, G. A. Orr, F. Brewer and S. B. Horwitz, Cancer Res. 41, 4471 (1981).
80. C.-H. Huang, L. Galvan and S. T. Crooke, Bchem 19, 1761 (1980).
81. T. T. Sakai, J. M. Riordan and J. D. Glickson, Bchem 21, 805 (1982).
82. T. E. Booth, T. T. Sakai and J. D. Glickson, Bchem 22, 4211 (1983).
83. E. C. Long and J. K. Barton, Acc. Chem. Res. 23, 272 (1990).
84. M. J. Levy and S. M. Hecht, Bchem 27, 2647 (1988).
85. L. M. Fisher, R. Kuroda and T. T. Sakai, Bchem 24, 3199 (1985).
86. J. Kross, D. W. Henner, W. A. Haseltine, L. Rodriguez, M. D. Levin and S. M. Hecht, Bchem 21, 3711 (1982).
87. H. Sugiyama, R. E. Kilkuskie, L.-H. Chang, L.-T. Ma, S. M. Hecht, G. A. van der Marel and J. H. van Boom, JACS 108, 3852 (1986).
88. N. J. Oppenheimer, C. Chang, L.-H. Chang, G. Ehrenfeld, L. O. Rodriguez and S. M. Hecht, JBC 257, 1606 (1982).
89. Y. Sugiura and T. Suzuki, JBC 257, 10544 (1982).
90. R. P. Hertzberg, M. J. Caranfa and S. M. Hecht, Bchem 27, 3164 (1988).
91. T. Suzuki, J. Kuwahara and Y. Sugiura, BBRC 117, 916 (1983).
92. J. Kuwahara and Y. Sugiura, PNAS 85, 2459 (1988).
93. R. E. Kilkuskie, H. Suguna, B. Yellin, N. Murugesan and S. M. Hecht, JACS 107, 260 (1985).
94. J. Kross, W. D. Henner, S. M. Hecht and W. A. Haseltine, Bchem 21, 4310 (1982).
95. L. F. Povirk, M. Hogan, M. Buechner and N. Dattagupta, Bchem 20, 665 (1981).
96. N. J. Oppenheimer, C. Chang, L.-H. Chang, G. Ehrenfeld, L. O. Rodriguez and S. M. Hecht, JBC 257, 1606 (1982).
97. Y. Aoyagi, H. Suguna, N. Murugesan, G. M. Ehrenfeld, L.-H. Chang, T. Ohgi, M. S. Shekhani, M. P. Kirkup and S. M. Hecht, JACS 104, 5237 (1982).
98. H. Sugiyama, G. M. Ehrenfeld, J. B. Shipley, R. E. Kilkuskie, L.-H. Chang and S. M. Hecht, J. Nat. Prod. 48, 869 (1985).
99. J. B. Shipley and S. M. Hecht, Chem. Res. Toxicol. 1, 25 (1988).
100. B. J. Carter, V. S. Murty, K. S. Reddy, S.-N. Wang and S. M. Hecht, JBC 265, 4193 (1990).
101. B. J. Carter, K. S. Reddy and S. M. Hecht, Tetrahedron 47, 2463 (1991).
102. M. Otsuka, T. Masuda, A. Haupt, M. Ohno, T. Shiraki, Y. Sugiura and K. Maeda, JACS 112, 838 (1990).
103. T. T. Sakai, J. M. Riordan and J. D. Glickson, BBA 758, 176 (1983).
104. D. E. Berry, L.-H. Chang and S. M. Hecht, Bchem 24, 3207 (1985).
105. P. B. Dervan, Science 232, 464 (1986).
106. T. D. Tullius and B. A. Dombroski, Science 230, 679 (1985).
107. P. G. Schultz and P. B. Dervan, JACS 105, 7748 (1983).
108. P. G. Schultz and P. B. Dervan, PNAS 80, 6834 (1983).
109. J. S. Taylor, P. G. Schultz and P. B. Dervan, Tetrahedron 40, 457 (1984).
110. R. S. Youngquist and P. B. Dervan, PNAS 82, 2565 (1985).
111. P. B. Dervan, Science 232, 464 (1986).
112. J. H. Griffin and P. B. Dervan, JACS 109, 6840 (1987).
113. J. P. Sluka, S. J. Horvath, M. F. Bruist, M. I. Simon and P. B. Dervan, Science 238, 1129 (1987).
114. S. A. Kane, A. Natrajan and S. M. Hecht, JBC 269, 10899 (1994).
115. A. M. Maxam and W. Gilbert, Methods Enzymol. 65, 499 (1980).
116. J. K. Barton and A. L. Raphael, JACS 106, 2466 (1984).

117. S. Steenken, Chem. Rev. 89, 503 (1989).
118. S. A. Kane, Ph.D. thesis, University of Virginia, Charlottesville, 1993.
119. W. Saenger, "Principles of Nucleic Acid Structure," Springer-Verlag, New York, 1
120. A. Hüttenhofer, S. Hudson, H. F. Noller and P. K. Mascharak, JBC 267, 24471 (1992).
121. C. E. Holmes and S. M. Hecht, JBC 268, 25909 (1993).
122. A. S. a n and B. A. Roe, Science 241, 74 (1988).
123. J. P. Perreault, R. T. Pon, M. Jiang, N. Usman, J. Pika, K. K. Ogilvie and R. Cedergren, EJB 186, 87 (1989).
124. J. Paquette, K. Nicoghosian, G. Qi, N. Beauchemin and R. Cedergren, EJB 189, 259 (1990).
125. H. R. Drew, S. Samson and R. E. Dickerson, PNAS 79, 4040 (1982).
126. M. McCall, T. Brown and O. Kennard, JMB 183, 385 (1985).
127. R. J. Duff, E. de Vroom, A. Geluk, S. M. Hecht, G. A. van der Marel and J. H. van Boom, JACS 115, 3350 (1993).
128. R. J. Duff, Ph.D. thesis, University of Virginia, Charlottesville, 1993.
129. C. E. Holmes, Ph.D. thesis, University of Virginia, Charlottesville, 1993.
130. C. W. Haidle and J. Bearden, Jr., BBRC 65, 815 (1975).
131. C. R. Krishnamoorthy, D. E. Vanderwall and J. W. Kozarich, JACS 110, 2008 (1988).
132. C. W. Haidle, R. S. Lloyd and D. L. Robberson, in "Bleomycin: Chemical, Biochemical and Biological Aspects" (S. M. Hecht, ed.), Springer-Verlag, New York, 1979.
133. C. K. Mirabelli, C.-H. Huang and S. T. Crooke, Bchem 22, 300 (1983).
134. A. W. Prestayko, S. T. Crooke and S. K. Carter, eds., "Cisplatin, Current Status and New Developments," Academic Press, New York, 1980.
135. S. L. Bruhn, J. H. Toney and S. J. Lippard, Prog. Inorg. Chem. 38, 477 (1990).
136. S. F. Bellon, J. H. Coleman and S. J. Lippard, Bchem 30, 8026 (1991).
137. J. A. Rice, D. M. Crothers, A. L. Pinto and S. J. Lippard, PNAS 85, 4158 (1988).
138. S. F. Bellon and S. J. Lippard, Biophys. Chem. 35, 179 (1990).
139. B. Gold, V. Dange, M. A. Moore, A. Eastman, G. A. van der Marel, J. H. van Boom and S. M. Hecht, JACS 110, 2347 (1988).
140. P. K. Mascharak, Y. Sugiura, J. Kuwahara, T. Suzuki and S. J. Lippard, PNAS 80, 6795 (1983).
141. V. L. Wilson and P. A. Jones, Cell 32, 239 (1983).
142. M. Ehrlich and R. Y.-H. Wang, Science 212, 1350 (1981).
143. R. P. Hertzberg, M. J. Caranfa and S. M. Hecht, Bchem 24, 5285 (1985).
144. M. Behe and G. Felsenfeld, PNAS 78, 1619 (1981).
145. E. C. Long, S. M. Hecht, G. A. van der Marel and J. H. van Boom, Bchem 112, 5272 (1990).
146. L. D. Williams and I. H. Goldberg, Bchem 27, 3004 (1988).
147. G. Streisinger, Y. Okada, J. Emrich, J. Newton, A. Tsugita, E. Terzaghi and M. Inouye, CSHSQB 31, 77 (1966).
148. L. Perrouault, U. Asseline, C. Rivalle, N. T. Thuong, E. Bisagni, C. Giovannangeli, T. Le Doan and C. Hélène, Nature 344, 358 (1990).
149. D. A. Collier, J.-L. Mergny, N. T. Thuong and C. Hélène, NARes 19, 4219 (1991).
150. C. Yoon, G. G. Privé, D. S. Goodsell and R. E. Dickerson, PNAS 85, 6332 (1988).

Interaction of Epidermal Growth Factor with Its Receptor

STEPHEN R. CAMPION¹ AND SALIL K. NIYOGI²

The Protein Engineering and Molecular Mutagenesis Program and the University of Tennessee-Oak Ridge Graduate School of Biomedical Sciences, Biology Division, Oak Ridge National Laboratory³, Oak Ridge, Tennessee 37831

¹ Present address: Medicinal and Natural Products Chemistry Division, University of Iowa, Iowa City, Iowa.
² To whom correspondence may be addressed.
³ Operated by Martin Marietta Energy Systems, Inc., under contract DE-AC05-84OR21400 with the U.S. Department of Energy.

I. Sequence and Structure of EGF and EGF Receptor
II. Generation and Characterization of Mutant Human EGF Analogues
III. Effects of Single-site Mutations on Receptor-Ligand Association
IV. Cumulative Effect of Multiple Mutations on Receptor Binding
V. Conclusions
References

Epidermal growth factor (EGF) is a prototypical peptide growth factor whose mitogenic role in signal transduction is attributed to its action as an allosteric regulator of the intrinsic protein-tyrosine kinase activity of the cell-surface EGF receptor (for general reviews, see 1-4). The 6-kDa EGF peptide binds with both high affinity and high specificity to the EGF receptor, a 180-kDa transmembrane glycoprotein composed of an extracellular ligand-binding domain, a single membrane-spanning region, and a functional intracellular tyrosine-kinase domain (5). The catalytic activity of the intracellular tyrosine-kinase domain is essential for the receptor's role in mediating EGF-dependent effects on cell proliferation (6-8). High-affinity association of EGF with the extracellular domain of the EGF receptor results in formation of an activated receptor-ligand complex.

The significance of the ligand-dependent regulatory influence that the receptor's extracellular domain exerts over the tyrosine-kinase domain was established by the characterization of the v-erbB gene product, a constitutively activated version of the EGF receptor that lacks an extracellular domain and is a member of the src family of oncogenic proteins (9). The critical importance of growth-factor control over receptor-kinase activity is shown by the unregulated cell growth that follows the loss of that control. A breakdown in receptor-kinase control can occur by several different mechanisms: mutation and/or overexpression of various components of the growth factor-receptor signaling pathway, including growth factors, the EGF-receptor kinase, and the related c-erbB-2/neu receptor, has been correlated with unregulated cell growth (10-14).

The mechanism by which the physical association of EGF with the receptor's extracellular domain regulates the receptor's intracellular kinase activity is not completely understood. Formation of the EGF receptor-ligand complex is thought to stimulate receptor-kinase activity through a ligand-induced conformational change, initially in the extracellular domain (15) and subsequently transmitted to the kinase domain of the transmembrane receptor protein. The precise structural differences between the latent inactive receptor and the activated receptor-ligand complex have not been established, despite extensive attempts to elucidate them using a variety of physical and chemical techniques. Because the EGF receptor tends to form dimeric (or higher-order) complexes, several studies have attempted to correlate receptor activation with a receptor dimerization event (16-25).
Others have proposed kinase activation by the dissociation of latent receptor dimers to active receptor monomers (26) or via an intramolecular mechanism without receptor dimerization (27, 28). Several studies show that EGF increases the susceptibility of EGF receptors to covalent cross-linking (29-34), suggesting increased receptor-receptor interaction in the presence of the EGF ligand. The nature and specificity of these receptor-receptor interactions have not been established. Formation of the catalytically active receptor-ligand complex requires the proper interaction of the growth-factor ligand, the extracellular domain of the growth-factor receptor, and their solvent environment. While progress toward understanding the molecular mechanism of receptor-kinase activation has been slow, significant progress has already been made toward understanding the nature of the protein-protein interactions taking place between EGF and its receptor. These studies have exploited the power of site-directed mutagenesis and protein chemistry to evaluate the participation of individual residues of the growth-factor peptides in EGF receptor-ligand association. A comprehensive review of these studies is presented here.

I. Sequence and Structure of EGF and EGF Receptor

Structure-function studies of EGF using site-directed mutagenesis have been aided significantly by the elucidation of both the tertiary structure of the EGF protein, using advanced NMR techniques, and the amino-acid sequences of a wide range of EGF and EGF-related proteins from various organisms.

A. Conservation of EGF Primary Structure

The amino-acid sequences of several members of the EGF family of growth factors have been determined, including those for human, mouse, rat, and guinea pig EGF; human and rat transforming growth factor α (TGFα); vaccinia, Shope fibroma, and myxoma virus growth factors (35); human amphiregulin (36); and human heparin-binding EGF (37). The alignment of these sequences, based on the highly conserved cysteine residues involved in forming the three internal disulfide bonds, is shown in Fig. 1. In addition, the sequences of several peptide ligands belonging to the HRG (heregulin)/NDF (neu differentiation factor) family of growth factors have been determined (38-42); each contains a functional EGF-like domain but interacts with the HER2/c-erbB-2/neu, HER3, and HER4 receptors (see 37a for review), close relatives of the EGF receptor. Remarkably, a wide variety of other proteins, functionally unrelated to EGF, have also been found to contain regions with EGF-like sequences.

The conservation of sequence among the different species of EGF, compared with the sequences of both related growth factors and unrelated EGF-like sequences, provides some indication of the degree to which specific residues in the protein are required for structure and/or function. The importance of some residues is easily recognized, such as the highly conserved cysteines and glycines, which enable the protein to assume its stable native tertiary structure. The importance of other residues is often more difficult to resolve from sequence conservation alone. A high degree of conservation is observed for human EGF (hEGF) residues Tyr13, Gly18, Gly36, Tyr37, Gly39, Arg41, and Leu47, suggesting that these sites are logical targets for mutagenesis. However, the importance of other residues was identified only by systematic replacement of amino-acid residues throughout the EGF molecule. Inferring the importance of specific amino-acid residues by scrutinizing sequence conservation alone can lead to inaccurate conclusions. Cautious interpretation of sequence-conservation data, combined with judicious use of the predicted EGF structure, has facilitated a directed approach to the analysis of EGF structure and function by site-directed mutagenesis and chemical modification.

[Fig. 1: sequence alignment, not reproduced. Panel (a): human, mouse, rat, and guinea pig EGF; human and rat TGFα; human AR; HB-EGF; SDGF; VGF; MGF; SFGF. Panel (b): rat NDF; HRG-α; HRG-β1; HRG-β2; HRG-β3; pro-ARIA.]

FIG. 1. Alignment of amino-acid sequences of EGF, TGFα, and EGF-like domains (a) in proteins known to bind to the EGF receptor and (b) in proteins known to bind to the c-erbB-2 receptor. Cysteine residues were used for alignment, with appropriate gapping (-) added to maximize homology and to account for the different loop sizes of the molecules. Conserved residues are indicated in bold type. Disulfide bridges, which define the various loops, are indicated by solid lines and designated by letters. AR, amphiregulin; HB-EGF, heparin-binding EGF; SDGF, schwannoma-derived growth factor; VGF, vaccinia growth factor; MGF, myxoma growth factor; SFGF, Shope fibroma growth factor; NDF, neu differentiation factor; HRG, heregulin; and ARIA, acetylcholine receptor activator protein.
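To make per-position conservation concrete, here is a minimal Python sketch under stated assumptions: the sequences are already aligned (for example, on the conserved cysteines, as in Fig. 1), conservation is scored simply as the fraction of sequences sharing the most common residue in a column, and the toy sequences and function names are hypothetical, not taken from Fig. 1. It is not the method used by the authors; it only shows why invariant columns stand out, and, as the text cautions, a high score alone does not establish that a residue matters for receptor binding.

```python
from collections import Counter


def column_conservation(aligned: list[str], gap: str = "-") -> list[float]:
    """Fraction of sequences sharing the most common residue at each column.

    Gaps are never counted as a conserved residue, so a mostly gapped
    column (e.g. a loop present in only one family member) scores low.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be pre-aligned to the same length")
    scores = []
    for i in range(length):
        residues = Counter(seq[i] for seq in aligned if seq[i] != gap)
        top = residues.most_common(1)[0][1] if residues else 0
        scores.append(top / len(aligned))
    return scores


def invariant_positions(aligned: list[str], gap: str = "-") -> list[int]:
    """1-based alignment columns where every sequence has the same residue."""
    scores = column_conservation(aligned, gap)
    return [i + 1 for i, s in enumerate(scores) if s == 1.0]


# Hypothetical toy alignment (not real EGF-family sequences).
toy = ["CAGYC-G", "CSGYCLG", "CAGFC-G"]
print(invariant_positions(toy))  # [1, 3, 5, 7]
```

Ignoring gaps when choosing the top residue is a deliberate design choice: it prevents an insertion unique to one sequence from being reported as "conserved gap".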

B. EGF Solution Structure

The native EGF molecule, which contains three intramolecular disulfide bonds, was recognized rather early as forming a very stable protein structure in aqueous solution (43). The solution structures of EGF and TGFα were investigated intensively and independently by several groups using two-dimensional ¹H-NMR techniques. A discussion of the advanced methodology used, and an interpretation of each structural detail collected in these studies, are beyond the scope of this essay. Instead, we hope to relate how the predicted EGF structure was used to help evaluate the effects of substituting amino acids with various functional groups at sites throughout the EGF molecule.


FIG. 2. Ribbon diagram of EGF, adapted from the three-dimensional structure of EGF generated by NMR analysis and computer modeling (49). Arrows indicate regions of the EGF backbone that constitute the β-sheet secondary structure. Solid bars indicate disulfide bonds. Dotted lines (residues 49-53) indicate residues for which structural assignments have not been made.

Models derived from two-dimensional NMR analysis predicted a native EGF solution structure (see Fig. 2) having two slightly overlapping motifs: a distinct N-terminal domain (residues 1-35) and a C-terminal domain (residues 30-53) (44-51). These models portray the growth-factor peptide with its two separate domains in a fixed position relative to each other. However, the independent reports describe EGF models that differ somewhat in the relative orientations of the two domains. The flexibility and dynamic motion of the individual domains and of the entire EGF molecule have been examined, and the results indicate a significant degree of motion involving each of the various subdomains of the molecule (52).

As with sequence-conservation data, knowledge of the native growth-factor structure has been of limited value in designing EGF analogues to test directly the involvement of specific residues in receptor binding. Only through a concerted effort of systematic mutation of amino-acid residues throughout the EGF molecule has it been possible to identify the side chains involved in important receptor-ligand interactions. Nonetheless, plotting the locations of the residues found to be important by site-directed mutagenesis within the framework of an established molecular structure makes it easier to visualize potential modes of receptor-ligand interaction.

C. Comparison of EGF-Receptor Sequences

Although the three-dimensional structure of the EGF receptor has not yet been determined, the homology between the amino-acid sequences of the EGF receptor and those of the other members of the EGF-receptor family suggests a common tertiary structure for these proteins. The complete amino-acid sequences of the EGF receptors from both human and avian sources are available (5, 53), as are the sequences of several other transmembrane receptor proteins, including human HER2/c-erbB-2 (54), rat neu (55), HER3/c-erbB-3 (56), and HER4/c-erbB-4 (56a), as well as a homologous protein from Drosophila (57, 58). The sequence conservation among these receptors, which associate with distinctly different ligands, is particularly high within the active tyrosine-kinase region present in the intracellular domain of each of these proteins. The human insulin receptor, which is not a member of the EGF-receptor family, also has regions of moderate to high homology with the EGF receptor (59).

A comparison of the sequences of the extracellular domains of these related receptor proteins has provided several important clues to potential structural motifs of the ligand-binding domain (60). In each member of the EGF-receptor family, the N-terminal region of the extracellular domain (HER residues 1-309) shows significant homology to the C-terminal region (HER residues 310-621), suggesting a somewhat symmetrical structure for the extracellular ligand-binding domain of each of these homologous proteins. The extracellular domain of each member of the EGF-receptor family contains two large cysteine-rich regions thought to form compact structures not involved in direct receptor-ligand interaction. The cysteine-poor regions of the extracellular domain show alternating regions of amino-acid sequence homology and variability. The amino acids conserved between the extracellular domain of the EGF receptor and those of the other family members are probably required mainly for formation of the similar tertiary structure shared by these homologous proteins. The ligand-binding specificity of each receptor is probably conferred by the variable regions within its extracellular domain, which suggests that these receptors and their corresponding ligand peptides coevolved to optimize the specificity of interaction needed for control of cell growth. The amino-acid residues in the EGF-receptor extracellular domain that come together to form the complementary ligand-binding “pocket” are of considerable interest, and studies are under way to identify them.
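Internal homology of the kind described for the receptor's N- and C-terminal extracellular regions is usually summarized as percent identity between aligned segments. The sketch below is a minimal illustration under stated assumptions: the two segments are already aligned (producing that alignment requires a dynamic-programming aligner, not shown), identity is counted only over columns where neither segment has a gap, and the function name and toy segments are hypothetical, not receptor sequences.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences over gap-free columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Only columns where both sequences carry a residue are compared.
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


# Hypothetical aligned toy segments (not real HER sequences).
first_half = "CKTCP-GV"
second_half = "CEKCPSGI"
print(round(percent_identity(first_half, second_half), 1))  # 57.1
```

Different programs treat gaps differently (some count them as mismatches, some divide by the full alignment length), so identity values are only comparable when computed the same way.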

D. Identification of the Receptor’s Ligand-Binding “Pocket”

Preliminary studies attempted to locate the ligand-binding region of the extracellular domain of the EGF receptor by replacement and deletion mutagenesis as well as by receptor-ligand cross-linking. The results provide some evidence for the participation of a limited region of the receptor extracellular domain in high-affinity ligand binding. The deletion or exchange of large segments of the human and avian receptors generates receptors with altered affinity for growth factor and suggests that the cysteine-poor regions of the receptor’s extracellular domain participate in receptor-ligand interactions (53, 61, 62). A 40-kDa fragment, isolated from the major cysteine-poor region of the EGF-receptor extracellular domain by limited proteolysis, binds the EGF-related ligand TGFα about 1/100th as strongly as does the intact EGF receptor (63). Covalent coupling of receptor-bound EGF by chemical cross-linking agents has identified two residues in the receptor extracellular domain that lie close to the α-amino group of the EGF molecule. Cross-linking of murine EGF, which bears a single reactive N-terminal α-amino group, with the amine-reactive bifunctional reagent disuccinimidyl suberate identified receptor residue Lys336 as being close enough to the bound EGF ligand to permit receptor-ligand cross-linking (64, 65). Stepwise reaction with the heterobifunctional reagent sulfo-N-succinimidyl 4-(fluorosulfonyl)benzoate was used to link the N-terminal α-amino group of murine EGF to residue Tyr101 in the receptor extracellular domain (66, 67). Because native mouse and human EGF proteins carry relatively few reactive groups, they are of limited use in most cross-linking studies designed to map residues throughout the receptor’s ligand-binding domain.

The ability to introduce, via site-directed mutagenesis, a reactive group at each strategic location throughout the hEGF molecule should make it possible to generate a series of ligand-based affinity-labeling reagents, each bearing a single, unique site for potential cross-linking to the receptor extracellular domain. This approach may permit the identification of sites throughout the ligand-binding “pocket” of the receptor’s extracellular domain.
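Differences in binding strength, such as that between a receptor fragment and the intact receptor, can be expressed energetically with the standard relation between a dissociation constant and the binding free energy. The worked example below is illustrative only: it assumes that "binding strength" is compared as a ratio of dissociation constants measured under identical conditions, takes 25 °C (298 K) as the temperature, and expresses K_d in mol/L relative to a 1 M standard state; none of these conditions is specified in the text.

```latex
\begin{aligned}
\Delta G^{\circ}_{\mathrm{bind}} &= RT \ln K_d \\
\Delta\Delta G^{\circ} &= RT \ln\!\left(\frac{K_{d,\mathrm{weaker}}}{K_{d,\mathrm{tighter}}}\right) \\
\text{100-fold ratio at } 298\ \mathrm{K}:\quad \Delta\Delta G^{\circ} &\approx (0.592\ \mathrm{kcal\,mol^{-1}})(\ln 100) \approx (0.592)(4.61) \approx 2.7\ \mathrm{kcal\,mol^{-1}}
\end{aligned}
```

On this scale, a 100-fold loss of affinity corresponds to roughly 2.7 kcal/mol of lost binding free energy, and every further 10-fold loss adds about 1.4 kcal/mol.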

II. Generation and Characterization of Mutant Human EGF Analogues

Identifying the amino-acid residues in EGF that are critical for high-affinity association with the receptor, and thus for receptor-kinase activation, is the necessary first step toward understanding the nature of EGF-stimulated signal transduction. To accomplish this goal, a synthetic gene encoding the hEGF sequence was generated and cloned into an appropriate Escherichia coli expression vector (68). Although EGF is synthesized in vivo as part of a large precursor protein (69), the 53-amino-acid product of a recombinant hEGF gene, cloned and expressed in E. coli, contains in its primary structure all the information needed for complete folding of the peptide into its native three-dimensional structure. The genetic alteration of recombinant EGF and TGFα genes, the expression and purification of the altered gene products, and analysis of the structure and activity of these mutant proteins have been the focus of several studies described below.

A. EGF Mutagenesis The replacement of individual amino acids is accomplished using one of several protocols designed for site-specific substitution of amino acids in uitro. Oligonucleotide-directed mutagenesis is an efficient general method for producing single-base point mutations, as well as deletions and insertions, at specific locations of a gene, thereby leading to specific amino-acid changes at desired sites of the protein under study (70-72). We readily accomplished the mutation of hEGF at numerous sites in the molecule by replication of our hEGF-gene-containing plasmid DNA, using oligonucleotide primers encoding the appropriate sequences for the specific change(s). Synthesis of oligonucleotides of defined sequence, for use as primers, is readily accomplished by standard procedures for automated synthesis with any of a variety of commercially available DNA synthesizers. For most purposes, extensive purification of these synthetic oligonucleotides is not necessary. When required, it is easily performed by gel electrophoresis or high-performance liquid chromatography (HPLC). In designing the oligonucleotides (18-25 nucleotides), silent mutations, which can either introduce a new restriction site or lead to the loss of an existing one, are sometimes utilized to provide convenient means of screening for the genetic alteration(s).Confirmation of EGF gene mutations and the absence of inadvertent changes is routinely achieved by direct sequence analysis (73) of the mutated DNA. The development of the polymerase chain reaction (PCR) has led to new approaches for site-directed mutagenesis. In general, the methods have utilized primers containing the desired changes, which, after PCR, are incorporated into the PCR product. The altered sequence is then excised with restriction enzymes and inserted in place of the wild-type sequence. 
This procedure is relatively time-consuming and depends on the presence of conveniently located restriction sites flanking the wild-type sequence being mutated. These difficulties are bypassed in a new method (74) using an

EGF-RECEPTOR INTERACTION

adaptation of inverse PCR. The initial rounds of amplification are directed by two primers located “back-to-back” on opposing DNA strands, one of which contains the mismatch(es) that generate the desired site-directed mutation. The first PCR cycle generates linear plasmid molecules from one or both circular template strands, and subsequent rounds amplify this linear plasmid sequence. The amplified linear product is then isolated, ligated, and introduced into E. coli cells by standard transformation techniques. This method introduces mutations at any site without requiring convenient restriction sites in the surrounding sequence. With suitable modifications (75), we have found the method to be rapid and simple to use, requiring only minute amounts of plasmid DNA and producing a high yield of bacterial transformants carrying plasmids that express the desired mutant EGF genes.
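The back-to-back primer arrangement can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' protocol: the toy plasmid, the 18-nucleotide annealing arm, the 0-based codon index, the helper names, and the use of an EcoRI site (GAATTC) to show how a silent change can create a screening site are all hypothetical. The forward primer begins with the mismatched codon and the reverse primer anneals to the opposite strand immediately upstream, so their 5' ends abut; for simplicity the sketch does not handle a target lying across the origin of the linear sequence string.

```python
"""Hypothetical sketch: plan a codon change for back-to-back (inverse PCR) mutagenesis."""

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; codons enumerated with bases in T, C, A, G order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(orf: str) -> str:
    """Translate an in-frame coding sequence ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))


def site_count(circular: str, site: str) -> int:
    """Count a palindromic recognition site on a circular molecule (top strand suffices)."""
    wrapped = circular + circular[: len(site) - 1]
    return sum(wrapped.startswith(site, i) for i in range(len(circular)))


def plan_mutation(plasmid: str, orf_start: int, codon_index: int,
                  new_codon: str, arm: int = 18, site: str = "GAATTC") -> dict:
    """Return the mutant plasmid, two back-to-back primers, and the change in site count."""
    pos = orf_start + 3 * codon_index          # first base of the target codon
    if pos < arm or pos + 3 + arm > len(plasmid):
        raise ValueError("target too close to the ends of the sequence string")
    mutant = plasmid[:pos] + new_codon + plasmid[pos + 3:]
    forward = mutant[pos:pos + 3 + arm]        # 5' end carries the mismatched codon
    reverse = revcomp(mutant[pos - arm:pos])   # anneals just upstream, opposite strand
    return {
        "mutant": mutant,
        "forward": forward,
        "reverse": reverse,
        "site_change": site_count(mutant, site) - site_count(plasmid, site),
    }


# Toy example: GAG -> GAA keeps Glu (silent) but creates a GAATTC site for screening.
FLANK5 = "GCGCTACGTCAGGCTAGCCC"
ORF = "ATGGCTGAGTTCTGGTAA"
FLANK3 = "GCATGCCGTACCGTAGCTAC"
PLASMID = FLANK5 + ORF + FLANK3
PLAN = plan_mutation(PLASMID, orf_start=len(FLANK5), codon_index=2, new_codon="GAA")
```

In this toy case both the original and the mutated reading frames translate to "MAEFW*", confirming the change is silent, while `PLAN["site_change"]` is 1, giving one new site for screening transformants. A non-synonymous codon passed as `new_codon` would produce an amino-acid substitution in the same way; placing the mismatch at the 5' end of one primer, as here, is only one possible arrangement.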

B. Expression of Recombinant Growth-Factor Proteins in E. coli

Optimizing the expression of a mammalian protein in E. coli can be an involved process, with numerous variables whose effects on the final yield of the recombinant protein need careful evaluation. One of these variables, of great importance for the expression of EGF and TGFα proteins, is the cellular location to which the protein product is targeted following its synthesis by the bacterial protein-synthesis machinery. The two alternative procedures for EGF production target the recombinant protein either for secretion into the bacterial periplasmic space, using an appropriate secretory “signal” or “leader” sequence (76-79), or for intracellular retention as a fusion protein with the β-galactosidase or TrpE gene product, which accumulates in the cell as an insoluble inclusion body (80, 81). The protocols for expression of TGFα in E. coli use a TGFα/β-galactosidase fusion gene to produce TGFα as an insoluble intracellular inclusion body (82). Both the EGF and TGFα fusion products isolated from inclusion bodies require extensive treatment to solubilize the protein and to remove, either chemically or enzymatically, the non-EGF portions of the fusion products. In addition, proteins isolated in this manner are not obtained in their native conformations and must subsequently be refolded to attain the proper disulfide arrangement of the native protein structure (80-82). We have used a relatively straightforward procedure for mutating and expressing a broad spectrum of mutant EGF species. The methods described here are representative of the procedures used routinely by many who perform site-directed mutagenesis. The recombinant hEGF gene used for expression and site-directed mutagenesis in our investigation of EGF structure and function was constructed as a chimeric gene, coding for wild-type hEGF fused to the signal

STEPHEN R. CAMPION AND SALIL K. NIYOGI

peptide of E. coli alkaline phosphatase, and placed under the transcriptional control of the bacterial trp-lac (tac) promoter. The fusion product is correctly processed by the E. coli secretory mechanism, and authentic hEGF is secreted into the bacterial periplasmic space as directed by the alkaline phosphatase signal sequence (68). Expression of wild-type or mutant hEGF protein is induced by the addition of 1-mM isopropyl-β-D-thiogalactoside (IPTG) to late log-phase cultures growing at 37°C in LB medium containing 25 μg/ml ampicillin. Incubation is continued for 10-12 hours, or until hEGF production is maximal, at which time the cells are harvested by centrifugation. The addition of a low concentration (~5 μg/ml) of chloramphenicol during the induction phase greatly increased hEGF yields (unpublished). This was particularly useful for some hEGF mutants that were expressed poorly. The mechanism by which chloramphenicol increases protein yields is not clear at this time. Other similar bacterial expression systems for EGF use an ompA or β-lactamase signal sequence, and the products, expressed under the control of an inducible promoter, are similarly targeted for secretion to the bacterial periplasmic space (78, 79). Targeting the protein to the bacterial periplasm is advantageous because it not only results in correct processing and folding, but also separates the “foreign” EGF product from most of the bacterial proteins, thereby greatly simplifying isolation and purification of the desired protein. Non-bacterial systems have also been used for the expression of EGF and TGFα proteins. The production of TGFα in yeast has been reported (83, 84); however, the limited purification and characterization reported for the proteins generated by the yeast expression system restrict the reliability of those results.

C. Purification of Wild-type and Mutant hEGF Proteins

The wild-type and mutant hEGF proteins sequestered in the bacterial periplasm can be isolated by several methods for extracting protein from the cells and separating the recombinant EGF protein from other periplasmic proteins. Our methods have been described (68, 85) and are briefly recounted here. The cell pellet is resuspended and extracted with 1-M Tris-Cl (pH 9.0) containing 2-mM EDTA for 20-30 minutes. The alkaline pH and the metal chelator together effectively inhibit most bacterial proteolytic enzymes, which generally function at or near neutral pH and require metal ions for activity. Following removal of the cells by centrifugation, the extracted protein is precipitated by addition of (NH4)2SO4 to 80% saturation. The protein pellet, collected by centrifugation, is resuspended and dialyzed against 25-mM sodium phosphate, pH 7.2,
prior to purification. Each hEGF species is first separated by gel-filtration chromatography on a Sephadex G-75 column (1 × 90 cm) equilibrated and eluted with 25-mM sodium phosphate, pH 7.2. Fractions containing hEGF protein are loaded directly onto a Vydac 218 TPS reversed-phase column (4.6 × 250 mm) and eluted with a 15-34% (v/v) linear gradient of acetonitrile in 10-mM sodium phosphate, pH 7.2, on a Waters Model 600E HPLC system. This process yields hEGF protein that is homogeneous as determined by amino-acid composition and sequence analysis. Others have had success with different purification methods, such as ion-exchange chromatography, but the ease of the two-step purification described above has enabled the purification of over 100 EGF mutants in our laboratory alone. About 300-500 μg of pure wild-type hEGF are obtained per liter of culture. However, the yield of EGF mutant species can vary considerably, depending on the specific amino-acid substitution.
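The precipitation step (solid (NH4)2SO4 added to 80% saturation) is often planned with an empirical formula. The sketch below assumes the commonly quoted approximation for about 20°C, grams per liter = 533(S2 - S1)/(100 - 0.3 S2), where S1 and S2 are the starting and target percent saturation. The text does not state the temperature, extract volume, or formula the authors used, so the constants, the function name, and the 50-ml example volume are assumptions to be checked against a published saturation table.

```python
def ammonium_sulfate_grams(volume_l: float, start_pct: float, target_pct: float) -> float:
    """Approximate grams of solid (NH4)2SO4 needed to go from start_pct to target_pct saturation.

    Assumes the empirical ~20 C formula g/L = 533 * (S2 - S1) / (100 - 0.3 * S2).
    """
    if not 0 <= start_pct < target_pct <= 100:
        raise ValueError("need 0 <= start_pct < target_pct <= 100")
    grams_per_liter = 533.0 * (target_pct - start_pct) / (100.0 - 0.3 * target_pct)
    return volume_l * grams_per_liter


# Hypothetical 50-ml periplasmic extract taken from 0% to 80% saturation.
print(round(ammonium_sulfate_grams(0.050, 0, 80), 1))  # prints 28.1
```

The formula accounts for the volume increase as salt dissolves, which is why the denominator shrinks as the target saturation rises; at other temperatures the constants differ.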

D. Determining Receptor Affinity, Kinase Stimulation, Gross Structure, and Mitogenicity of hEGF Mutant Proteins

Following purification and characterization of each mutant hEGF analogue, the effect(s) of the specific amino-acid substitution on EGF's ability to bind to the EGF receptor and to stimulate the receptor's kinase activity have been evaluated. In addition, whenever feasible, the effects of amino-acid substitution on overall EGF protein structure have been examined. The receptor-binding affinity of each mutant hEGF protein described was assayed by a radioreceptor competition assay (86) for the binding of EGF to membrane-bound EGF receptors from the human carcinoma cell line A431, which overexpresses the EGF receptor. The binding of radioiodinated wild-type hEGF (87) to membrane-bound EGF receptors in an enriched membrane preparation (88) is measured in the presence of increasing amounts of the various unlabeled, competing hEGF species. The concentration of protein required to displace 50% of the 125I-hEGF is determined for the wild type and for each mutant hEGF analogue. Direct comparison of these values provides a simple means of assessing the affinity of each hEGF mutant relative to the wild-type hEGF protein, and is a valuable measure of the importance of individual amino-acid residues in receptor binding. The relative binding affinities of TGFα mutants have been determined using essentially the same methodology; however, rigorous characterization has been reported only for the EGF mutants. In addition to assaying the relative receptor-binding affinity of EGF species by radioreceptor competition, we measured directly the ability of EGF analogues to stimulate receptor tyrosine-kinase activity as a measure of
their relative agonist activities. A comparison of the concentrations of mutant and wild-type EGF required to activate the receptor kinase provides an additional, reliable means of assessing the relative affinity of growth-factor variants. (It should be noted that the relative affinity values determined by these two methods generally agree for most mutant EGF analogues.) The stimulation of the EGF receptor's tyrosine-kinase activity is evaluated by measuring the phosphorylation of a synthetic (Glu,Tyr) substrate (68, 85), using solubilized and lectin-purified EGF receptors from A431 cells (68, 88). The ability of exogenously added EGF to stimulate receptor-kinase activity is rather sensitive to differences in assay conditions, particularly detergent concentration, ionic strength, and metal-ion cofactors such as Mg2+ and Mn2+ in the incubation buffer (27). It has generally been concluded that optimizing the composition of the assay buffer is critical for achieving and maintaining functional membrane proteins. Through painstaking effort, the assay conditions for receptor-kinase activation have been optimized, allowing us to achieve up to a 10-fold, EGF-dependent stimulation of receptor-kinase activity. In addition to evaluating EGF analogues for their ability to bind to the receptor and to stimulate the receptor's tyrosine-kinase activity, we have attempted to identify structural differences that might account for the observed changes in growth-factor activity. At present, the effect of amino-acid substitution on EGF receptor-ligand interactions cannot yet be related directly to characterized alterations in protein structure; however, the structure of mutant EGF analogues has been examined on several levels. On a gross scale, the ability to isolate functional EGF protein from E. coli extracts implies that the molecule has been processed and folded into the native EGF structure during expression of the recombinant EGF gene product. We have observed that the HPLC elution profile during purification of hEGF proteins is a sensitive indicator of altered conformation: deviation from the normal EGF folding motif yields protein molecules with significantly altered behavior on reversed-phase HPLC, which readily permits identification of non-native EGF proteins. More direct comparisons of the structures of wild-type and selected EGF mutant proteins have been made, in attempts to identify differences in protein structure at the molecular level, using spectroscopic methods including CD (75) and 1H-NMR (85, 89-9&). These studies indicate that, for the most part, and despite possible subtle structural changes throughout the EGF molecule, the decrease in receptor-binding affinity of the EGF mutants is due to local changes in the interactions of the EGF molecule with the solvent and/or the receptor. Further structural analysis of EGF mutants, currently under way, is expected to reveal greater detail about the
interactions responsible for receptor-ligand association and kinase activation. The mitogenic potential of various hEGF mutants was assessed, as described earlier (91, 99, 103), by their stimulation of DNA synthesis in EGF-responsive cells. The incorporation of [3H]thymidine into acid-insoluble material was used as a measure of DNA synthesis. The target cells were either mouse BALB/c 3T3 clone A31 fibroblasts or BALB MK, an EGF-dependent mouse epithelial cell line. The stimulation of thymidine incorporation, relative to wild-type hEGF, as a function of the concentration of mutant hEGF analogue was used as a measure of mitogenicity. (It should be noted that although mitogenesis is a late event in signal transduction, the mitogenic potential of each EGF mutant tested reflects its relative receptor affinity.)
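The comparison of 50%-displacement concentrations can be expressed as a simple ratio. The sketch below is illustrative only: it assumes relative affinity is reported as 100 × IC50(wild type)/IC50(mutant), and it estimates each IC50 by linear interpolation on a log-concentration axis between the two data points that bracket 50% specific binding, a simplification rather than the full curve fit a laboratory might use. The function names and all data values are hypothetical.

```python
import math


def ic50_loglinear(concs, pct_bound):
    """Competitor concentration leaving 50% of specific tracer binding.

    pct_bound is specific binding as a percentage of the no-competitor control.
    Interpolates linearly in log10(concentration) between the bracketing points.
    """
    points = sorted(zip(concs, pct_bound))
    for (c1, b1), (c2, b2) in zip(points, points[1:]):
        if b1 >= 50.0 >= b2 and b1 != b2:
            frac = (b1 - 50.0) / (b1 - b2)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10.0 ** log_c
    raise ValueError("data do not bracket 50% specific binding")


def relative_affinity(ic50_wild_type, ic50_mutant):
    """Percent affinity relative to wild type; values above 100 mean tighter binding."""
    return 100.0 * ic50_wild_type / ic50_mutant


# Hypothetical displacement data (competitor concentrations in nM).
conc = [0.1, 1.0, 10.0, 100.0]
wt = ic50_loglinear(conc, [95.0, 60.0, 20.0, 5.0])
mut = ic50_loglinear(conc, [99.0, 90.0, 55.0, 15.0])
print(f"relative affinity: {relative_affinity(wt, mut):.0f}%")
```

The same ratio could be applied to the concentrations giving half-maximal kinase stimulation or thymidine incorporation, which is one way to place the kinase and mitogenesis assays on the same relative scale as the binding assay.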

III. Effects of Single-site Mutations on Receptor-Ligand Association

The substitution of amino acids in recombinant EGF and TGFα proteins by site-directed mutagenesis is a valuable tool for examining the contributions of specific residues to the structure and function of proteins in the EGF family of growth factors. The EGF and TGFα peptides, depending on the species, contain most of the acidic, basic, polar, non-polar, and aromatic amino-acid residues, and each of these functional groups has a potential role in receptor-ligand association. Using site-directed mutagenesis, EGF and TGFα analogues have been produced in which amino-acid side-chains from each class of functional group, in both the N-terminal and C-terminal domains of the growth-factor peptides, have been replaced. The effect of amino-acid substitution on EGF receptor-ligand association has been evaluated using assays that include the radioreceptor-competition binding and receptor-kinase stimulation assays described above. In most cases, site-directed mutagenesis of EGF and the related protein TGFα gives similar results for equivalent mutations, although some mutations at equivalent sites show measurably different effects. Most information has been acquired from mutagenesis of hEGF, and so these studies are emphasized. In addition, modification of wild-type and mutant EGF proteins with specific chemical reagents has extended several mutagenesis studies by providing an even greater range of protein alterations for the analysis of EGF structure and function. Not all EGF residues are amenable to side-chain substitution, nor is every side-chain suitable for replacing those sites that are mutable. The expression of some mutant recombinant EGF genes does not lead to productive

FIG. 3. A display of some of the single-site mutants of human EGF generated and characterized in this laboratory.

yields of the mutant EGF protein, or it results in synthesis of EGF proteins with altered or unstable tertiary structure. Therefore, in our laboratory, more mutant EGF genes have been generated than the number of mutant EGF proteins characterized and reported would indicate. The effects of mutagenesis and/or chemical modification on the activity of EGF and, when available, of TGFα are summarized here and discussed with regard to sequence conservation and the relative importance of each class of amino-acid side-chain functional group. The locations of the targeted hEGF residues discussed below can be found within the primary structure of hEGF shown in Fig. 1. The hEGF single-site variants generated in our laboratory by site-directed mutagenesis are shown in Fig. 3. The relative receptor affinity values, as measured by radioreceptor-competition assay, for each mutant EGF and TGFα protein are given in Table I.

A. Effect of Substitution of Acidic Residues

The number of acidic residues present in each member of the EGF family of proteins varies, with hEGF containing nine acidic amino acids

TABLE I RELATIVE BINDING AFFINITIES Growth factor species Acidic residues hEGF

hTGFα

Basic residues hEGF

Glu24→Gly Asp27→Gly Glu40→Asp Gln Ala Asp46→Ala Tyr Arg Asp47→Ala Asn Glu Ser Asp47→Ala Lys28→Leu A% Arg4b L y s GI11 Ile Tyr Gly Ala Asp Glu Arg41→Gln His

Leu hTGFα

Arg45-tLys Lys4hArg Ar@+Lys Arg42jLys Arg42-Ala

Chemical modifications of basic residues hEGF Mutant Lys41→homoarginine Mutant Lys41→lysine-amidine Native Lys28→homocitrulline Native Lys48→homocitrulline Mutant Lys45→homocitrulline Hydrophobic residues hEGF

Relative affinity

86 48 30 25 23 23 14 4 125

83

Reference

68 68 96 96 96 96 96 96 83

20

83 83

49

83

3

97

79 188

0.40 0.20

85 85 91, 99

99

0.15 0.15 0.15 0.05 0.01 0.01

99

0.4 0.2 0.1

94 94 94

99 99 91, 99

99 99

100

96

120

85

<0.2

98 98

0

84

100

99

5.5

3.5

100 100 100

LeulhTrp

26

Val

18

99

96 96 96

101u 1010

(continued)

TABLE I (Continued) Growth factor species Alit Arg VallhGly Met21+Leu Thr Ile23-rAla Thr IleX.+Lell

Val Phe Trp Ala Asp Leu26-bAla GlY vd134, -Ah Ile38+Leu Leu47+Arg

Ile

mEGF hTGFα

Aromatic residues hEGF

Asp His GlY Ah Pro Leu47+Val Ala Glu ASP Leu47+Val Ser Leu4hIle Met Ala Tyr 13+ Phe Leu Ile Val His Ala GlY TyrlhLeu Tyr22+Lys

Relative affinity

2.7 1.6 58 100 36 19 3 61 45 22 11 6 0.1 48 5 23 77 20 17 14 7 1 0.7 0.4 14 2.5 2.2 1.9 33 14 <1

Reference l0llJ 1010 85

80 85 104 8S 9.5 9s 95

9s 9s 9s 104 85. 93 ,I

, 92

92 92 68, 92 92 92, 102 92

90 .90 90 90

<1

78 78, 89 83 83 83

97 78 22 20 16 6 3 0.3 202 120

7s 7s 75 7s 705 75 7.5 7s 94 10lb

<1

(continued)

TABLE I (Continued) Growth factor species

Relative affinity

TrP Phe Leu Pro Ala ASP TyrB-Phe Leu Ala Lys ClY Pro Tyr37-Phe His Ser Ala

106 75 70 69

Asp hTGFα

Polar residues hEGF

A% GlY Phel+Ala TyrWPhe Ala TyrWPhe

8

75 70 70 55

17 16 126 74 62 39 26 10 7 1 200 3

60

10111 10111

10111 10111 1011) 85 10111 10111

10111 lOlb 85 1011~ 91, 103

91,103 103 91,103 103 103

103 97 97 97

TrP

50

His Ala Thr Ser

1 <1 <1 <1

84 84 84 84 84 84

Asn32+Lys TrP GlY

110 100

93

ASP

hTGFα

61)

Reference

Pro His W Phe Asp Gln4hLys HisbLys Hisl2jAla Hisl2+Lys HisWLys His-Lys

35 25 <0.02 78 46 29 7 100 25 21 72 78 8

93 93 93

93 95b 95b 95b 95b 96 98 97 98

98 98 (continued)

TABLE I (Continued) Growth factor species His12→Ala Mutation of structural residues hEGF Pro7→Thr Gly3hLeu hTGFα Cys8, 21, 16, 32, 34, 43→Ala

Relative affinity

Reference

21

98

55

77

4

0

93

Unpublished results

distributed throughout the molecule. Of these, only Asp11, Glu24, Asp27, Glu40, and Asp46 are conserved, particularly among the various species of EGF. Using site-directed mutagenesis, four acidic residues in hEGF (Glu24, Asp27, Glu40, and Asp46) have been substituted, in most cases with amino acids having uncharged side-chains. Removal of the side-chain at either Glu24 or Asp27 by replacement with glycine led to little or no decrease in receptor affinity, indicating that an acidic residue at these positions is not required for receptor binding (68). The replacement of Glu40 by aspartate retained the charge while shortening the side-chain by one methylene group; this mutation decreased receptor affinity to 30% of that of the wild type (96). The Glu40→Gln and Glu40→Ala substitutions, which remove the charged side-chain at position 40, caused no further decrease in receptor affinity. The most highly conserved of the acidic residues is Asp46 (Asp47 in TGFα), the substitution of which led to significant decreases in receptor affinity. Replacement of Asp46 with alanine and tyrosine decreased receptor affinity to 23% and 14%, respectively (96). Replacement of Asp46 with arginine introduced a side-chain of opposite charge, creating adjacent positive charges (Arg45-Arg46) and decreasing affinity to 4% relative to the wild type (96). Substitution of the human TGFα (hTGFα) equivalent, Asp47, by alanine decreases receptor affinity to 3% relative to the TGFα control (97).
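The charge bookkeeping in this subsection can be checked against the hEGF sequence. The 53-residue string below is the standard human EGF sequence in one-letter code (it is not reproduced in this section; compare Fig. 1), and the shorthand "D46R" (wild-type residue, position, new residue) follows the convention used later in this chapter for mutants such as Y22K. The helper names are hypothetical, and histidine is deliberately not counted as basic, matching the chapter's count of positively charged residues.

```python
HEGF = "NSDSECPLSHDGYCLHDGVCMYIEALDKYACNCVVGYIGERCQYRDLKWWELR"


def mutate(seq: str, code: str) -> str:
    """Apply a substitution written as e.g. 'D46R' (1-based numbering)."""
    wt, pos, new = code[0], int(code[1:-1]), code[-1]
    if seq[pos - 1] != wt:
        raise ValueError(f"{code}: residue {pos} is {seq[pos - 1]}, not {wt}")
    return seq[:pos - 1] + new + seq[pos:]


def charged_positions(seq: str):
    """1-based positions of acidic (D, E) and basic (K, R) residues."""
    acidic = [i for i, aa in enumerate(seq, start=1) if aa in "DE"]
    basic = [i for i, aa in enumerate(seq, start=1) if aa in "KR"]
    return acidic, basic


acidic, basic = charged_positions(HEGF)
print(len(acidic), acidic)   # nine acidic residues
print(basic)                 # Lys28, Arg41, Arg45, Lys48, Arg53
d46r = mutate(HEGF, "D46R")
print(d46r[44:46])           # positions 45-46 now read "RR"
```

The `mutate` check on the wild-type residue guards against numbering errors, a common source of confusion when comparing hEGF positions with their TGFα equivalents (e.g., Asp46 versus Asp47).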

B. Effect of Replacement and/or Alteration of Basic Residues

The number of basic residues present in various members of the EGF family ranges from only one arginine up to as many as eight lysine and arginine residues. hEGF contains five positively charged amino acids (two
lysines and three arginines), concentrated predominantly in the C-terminal domain of the protein. The participation of the positively charged amino-acid residues of hEGF in receptor binding was examined by site-directed mutagenesis and/or chemical modification. The positively charged Lys28, located within a strongly hydrophobic region of the N-terminal domain, was first replaced by the uncharged hydrophobic residue leucine, with little effect on receptor binding (85). The charged ε-amino groups of Lys28 and Lys48, along with the N-terminal α-amino group, were neutralized by reaction with potassium cyanate, converting lysine to the uncharged, polar homocitrulline (96). Elimination of these charged amines had no effect on the ability of the EGF ligand to associate with the receptor. The positively charged residue Arg45 in the C-terminal domain was modified using a combination of site-directed mutagenesis and chemical modification: the charge-conservative substitution of Arg45 by lysine was followed by conversion to the neutral homocitrulline derivative by reaction with potassium cyanate (96). Neither the mutation nor the neutralization of the charge at Arg45 decreased receptor affinity. Sequence-conservation data suggested the importance of Arg41, which is retained in all EGF and EGF-like proteins known to bind to the EGF receptor. In contrast to the apparent lack of a requirement for electrostatic interactions involving the N-terminus, Lys28, Arg45, and Lys48, a combination of site-directed mutagenesis and chemical modification again revealed a strong requirement specifically for the guanidinium group of the Arg41 side-chain. Unlike mutation of Arg45, mutation of the highly conserved Arg41 (Arg42 in TGFα) dramatically decreased the relative receptor affinity and mitogenicity of both hEGF and hTGFα (91, 98, 99).
Replacement with lysine, which retains the positive charge, reduced the receptor affinity to 0.4% of that of wild-type hEGF (91). The receptor affinity and mitogenicity of the Lys41 mutant EGF protein were fully restored after chemical modification with O-methylisourea, which converted the lysine amine side-chain to the guanidinium-bearing arginine homologue homoarginine (99). The stimulation of receptor tyrosine-kinase activity, which could not be reliably measured for the Lys41 mutant, was also fully restored to wild-type levels. These results demonstrate that electrostatic charge alone is not sufficient for binding, and establish a clear requirement for the guanidinium functional group of the side-chain at position 41 in receptor-ligand interaction. Covalent chemical rescue of deficient mutants of ribulose-bisphosphate carboxylase and other proteins has been reviewed (99a). Finally, the receptor affinity of a naturally occurring truncated form of recombinant hEGF, lacking the last two residues from the C-terminal end of the molecule (-Leu52-Arg53), is identical to that of the full-length EGF protein (100; unpublished).

C. Effect of Substitution of Hydrophobic Residues

In initial mutagenesis studies, the highly conserved Leu47 in the C-terminal domain of hEGF was found to be very important for receptor-ligand association (68). The specific requirement for leucine at position 47 appears to be more than a simple necessity for a non-polar side-chain at this site in the molecule. Strong conservation of this leucine residue, specifically among the high-affinity receptor-binding growth-factor sequences, suggested that some property unique to the Leu47 side-chain was required. Replacement of Leu47 in hEGF with isoleucine, which has a similar chemical character, reduced receptor affinity to 21% of that of the wild type, confirming the stringent requirement for the Leu47 side-chain for optimal activity (92). Substitution of Leu47 in either hEGF (90, 92) or mouse EGF (78, 89) with a wide spectrum of side-chain functional groups decreased relative receptor affinity to between 1/5th and 1/2000th of that of the wild type (92). The relative mitogenic activity, in general, paralleled the loss in receptor affinity. Substitution with ionic residues led to the most drastic reduction in biological activity (92). Structural analyses of Leu47 mutants by NMR indicated minimal alterations in protein conformation; rather, the decreased affinities are probably due to disruption of a direct interaction of Leu47 with the aqueous solvent and/or the receptor (89, 90, 92). The peptide growth factor amphiregulin, which contains an EGF-like domain but lacks the hydrophobic Leu47 found in the C-terminal domains of EGF and TGFα, binds to the EGF receptor with an affinity 1/10th that of wild-type EGF (36). However, it is not clear whether the EGF receptor is the physiological receptor for amphiregulin.
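To put the 1/5th to 1/2000th range in energetic terms, a relative affinity can be converted into an apparent binding free-energy penalty with the standard relation ΔΔG = RT ln(Kd,mutant/Kd,wild type) = RT ln(100/relative affinity in %). This is an illustrative sketch, not an analysis from the chapter: it assumes that the ratio of 50%-displacement concentrations approximates the ratio of dissociation constants, and a temperature of 25°C; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def ddg_kcal(relative_affinity_pct: float, temp_k: float = 298.15) -> float:
    """Apparent binding free-energy penalty (kcal/mol) for a mutant.

    Assumes relative affinity (% of wild type) equals 100 * Kd(wt) / Kd(mutant).
    """
    if relative_affinity_pct <= 0:
        raise ValueError("relative affinity must be positive")
    return R_KCAL * temp_k * math.log(100.0 / relative_affinity_pct)


for label, pct in [("1/5th of wild type", 20.0), ("1/2000th of wild type", 0.05)]:
    print(f"{label}: {ddg_kcal(pct):.1f} kcal/mol")
```

By this yardstick the reported Leu47 substitutions correspond to roughly 1 to 4.5 kcal/mol, and the Arg41→Lys mutant described above (0.4% of wild type) to about 3.3 kcal/mol.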
As for the N-terminal domain, a cursory examination of the EGF molecule by NMR initially suggested a predominantly structural role for the large β-sheet in the N-terminal domain, acting as a backbone for the EGF protein (100~). This concept was further reinforced by the sequence-homology data, which suggested that the more highly conserved C-terminal domain is likely the primary molecular determinant involved in the formation of the active EGF-receptor complex. A closer examination of hEGF residues 19-32 in the N-terminal domain, which form the antiparallel β-sheet, indicates that amino-acid side-chains are locked into positions either above or below the plane of the β-sheet as a result of intramolecular hydrogen bonding of the corresponding peptide backbone. Amino-acid side-chains on one face of the β-sheet, including two tyrosines, appear to be engaged in intramolecular interactions with other residues in the protein (residues 6-13). The clustering of the aromatic side-chains in aqueous solution (101), coupled with the physical constraints imposed by the three internal disulfide bonds,
leads to the formation of a very stable protein structure (43) that permits a group of hydrophobic residues to remain in a conformation relatively exposed to the aqueous solvent. It is also interesting to note that hydrophobic side-chains are conserved at EGF positions 19, 21, 23, and 26, within the large β-sheet of the N-terminal domain, suggesting possible functional roles for these sites. Similar to results obtained with the C-terminal Leu47, mutation of hEGF residues Ile23 and Leu26, located in the antiparallel β-sheet of the N-terminal domain, caused dramatic decreases in receptor-binding affinity without significant changes in EGF conformation, as revealed by NMR measurements (85). A recent study evaluating various substitutions at the Ile23 site has further demonstrated the importance of this non-polar side-chain for receptor-ligand association (95a). Here again, structural analysis of the mutant proteins by NMR indicated only minor perturbations, insufficient to account for the dramatic loss of biological activity (95a). Mutation of other hydrophobic residues in the β-sheet, namely Val19, Met21, and Ala25, also led to moderate decreases in receptor affinity (85). Outside the β-sheet, replacements of Leu15 of hEGF led to dramatic decreases in receptor binding and activation (101a). NMR analyses indicate minimal structural changes (unpublished), suggesting a functional role for Leu15 in receptor interaction. [An additional requirement for hydrophobic interaction(s) involving the aromatic side-chain of Tyr13 is discussed in the next section.] It is clear at this point that, aside from the essential charged residue Arg41, the major forces involved in receptor-ligand binding are hydrophobic interactions of non-polar, aliphatic amino-acid side-chains at critical sites throughout the EGF molecule.
In several instances, the decreased receptor binding observed with hEGF mutants involving the critical hydrophobic residues Ile23, Leu26, and Leu47 is accompanied by a decreased ability to fully activate the receptor tyrosine kinase, and these hEGF analogues are also partial antagonists of EGF-dependent receptor-kinase activity (102). Identifying the sites within the receptor's extracellular ligand-binding domain that interact with these important sites on the ligand may provide some insight into the initial steps of the mechanism of receptor activation.

D. Effect of Substitution or Removal of Aromatic Residues

The C-terminal pentapeptide, -Trp-Trp-Glu-Leu-Arg-, can easily be released by mild trypsin digestion. Removal of this peptide fragment, which contains both tryptophan residues, has no effect on the binding of mouse EGF (100), whereas a decrease in relative receptor affinity to 20% of that of the wild type has been observed for hEGF (unpublished). As observed
among homologous proteins, the aromatic character of specific sites in the EGF family of proteins is often retained through alternate residues having similar aromatic character (i.e., tyrosine, phenylalanine, and histidine). The various EGF species contain no phenylalanine, whereas TGFα contains one or more phenylalanines. All members of the EGF family contain a variable number of tyrosine, histidine, and tryptophan residues. Functional conservation of aromatic character at positions His10, Tyr13, Tyr22, Tyr29, Tyr37, and Tyr44 of hEGF appears to be substantial, with complete conservation of the phenolic tyrosine side-chain at position 37. The potential importance of aromatic residues in EGF was also implicated by NMR and NOE studies (101), which predicted a clustering of the aromatic side-chains on the surface of the protein and suggested that the aromatic residues might participate in ligand-receptor interactions by providing a hydrophobic surface on the EGF protein. The highly conserved nature of these side-chains has made them an important target for site-directed mutagenesis studies, and the roles of these residues in EGF structure and function have now been examined extensively. In our early studies (68, 85), the individual replacement of Tyr22 with aspartate and of Tyr29 with glycine decreased receptor affinity to 8% and 17%, respectively, commensurate with altered protein folding, as indicated by the appearance of multiple forms of these otherwise full-length mutant proteins during their final purification by reversed-phase HPLC (85). In recent studies, replacement of either of these tyrosine residues with apparently less disruptive side-chains [as evident from “normal” elution profiles on reversed-phase HPLC (D. K. Tadaki, S. R. Campion and S. K. Niyogi, unpublished)], including phenylalanine, leucine, and alanine, resulted in only minor decreases in receptor affinity (101b).
The Y22W mutant displayed no loss of receptor affinity, and the Y22K analogue had an affinity slightly higher than that of the wild type (~120%) (101b). Interestingly, the Y22P mutant retained ~70% receptor affinity, while the Y29P analogue had only 16% of wild-type hEGF activity (101b). These results, together with computer-modeling analysis of hEGF based on NMR coordinates, suggest that Tyr29 probably plays a role in maintaining the native structure of EGF; certain mutations at this site can cause structural alterations sufficient to reduce receptor affinity. Computer modeling also indicates that Tyr22 is located within a pocket of acidic residues, that is, Asp3, Glu5, Glu24, and Asp27. This suggests that the considerably lower receptor affinity of the Y22D mutant is due to local structural perturbations caused by charge repulsion between existing negatively charged group(s) and the negatively charged aspartate side-chain at position 22. The higher receptor affinity of the Y22K analogue is probably due to stabilization induced by ion-pairing between the lysine at position 22 and a negatively charged group in the pocket. Precise characterization by two-dimensional NMR of (perhaps) subtle structural alterations in mutants at either of the largely conserved positions 22 and 29 is needed before meaningful conclusions can be drawn regarding the possible role of Tyr22 and Tyr29 in the biological activity of EGF. The results suggest that these residues may play a role other than receptor binding in vivo (e.g., in EGF protein stability, EGF metabolism, receptor internalization, or divalent metal-ion binding). The importance of the highly conserved tyrosine at position 13 in receptor-ligand association was suggested by NMR studies (48, 49) that predicted its close proximity to Arg41, which plays a critical role in the binding of EGF and TGFα to the EGF receptor, and by those that indicated a possible role in providing a hydrophobic surface on the EGF molecule (101). The role of Tyr13 in receptor binding has now been investigated by site-directed mutagenesis. The results show that aromaticity at position 13 is not critical for overall binding to the receptor, since the aromatic tyrosine residue can adequately be replaced by an aliphatic leucine residue (94, 75, 75a). The hydrophobic nature of this site appears to be the functional characteristic required to form a stable ligand-receptor complex, since substitution with smaller, less hydrophobic, or charged residues resulted in significant losses in receptor affinity (75). CD spectral analysis of several hEGF mutants (75) and an NMR study of the hEGF Leu13 mutant (94) showed no major structural alterations.
The results indicate that the Tyr13 side-chain plays a critical functional role in receptor binding by contributing to hydrophobic ligand-receptor interactions; however, it is clear that the aromaticity of this site is not necessary for binding, despite the highly conserved aromatic side-chain at this position. It should be noted that, as with Tyr13 in hEGF, mutation of the hTGFα equivalent, Phe15, to alanine substantially reduced receptor affinity (97). Substitution of the hEGF residue Tyr37, which is completely conserved among the EGF family, by a variety of amino acids indicated that neither an aromatic group nor an aliphatic group is essential at this site for EGF’s biological activity (91, 103). The corresponding residue, Tyr38, in TGFα appears to be considerably more important for the biological activity of this growth factor (97). For example, substitution of Tyr38 in hTGFα with alanine decreased the relative receptor affinity to 1/30th (97), whereas the equivalent substitution of Tyr37 in hEGF decreased it only to 2/5ths (91, 103). The reason(s) for the drastic difference in the effects of mutating the highly conserved residues Tyr13 (Phe15 in TGFα) and Tyr37 (Tyr38 in TGFα) are not clear at this time. It is possible that Tyr37, as well as Tyr22 and Tyr29, participates in some function common to EGF-like proteins but unrelated to receptor recognition and high-affinity binding. The high degree of conservation of these aromatic residues remains an intriguing subject. The substitution of lysine for an individual histidine residue at position 4, 18, or 45 in hTGFα decreased the relative receptor affinity to 1/4th for the Lys4 mutant, to 1/5th for the Lys18 mutant, and to 1/12th for the Lys45 analogue (98).

STEPHEN R. CAMPION AND SALIL K. NIYOGI

E. Mutation of Polar hEGF Residues Asn32 and Gln43

Although amino-acid residues having a non-electrostatic, polar character are not generally conserved among the EGF family of proteins, the neutral polar residues Asn32 and Gln43 were targeted for mutagenesis because of their proximity to, and/or potential interaction with, residues important in the hEGF molecule. The highly conserved Asn32, located between Cys31 and Cys33, resides in the “hinge” region of EGF and separates the N- and C-terminal motifs of the EGF molecule. Aside from its potential role in receptor-ligand interaction, its unique location suggests a possible role in maintaining the native EGF conformation. Several hEGF analogues were generated by replacement of Asn32 with aspartate, glycine, lysine, proline, and tryptophan (93). Substitution of the relatively small, neutral, polar Asn32 with the larger, electrostatically charged lysine or the bulky aromatic tryptophan side-chain had no effect on receptor-binding affinity, suggesting a fairly high degree of tolerance for replacements at this site. Removal of the Asn32 side-chain by substitution with glycine decreased the relative receptor affinity to 35%, and replacement with aspartate decreased it to 25%. However, no binding of the Pro32 mutant could be detected by radioreceptor competition, and NMR analysis indicated gross structural perturbation for the Pro32 analogue. In contrast, the Lys32 and Asp32 mutants exhibited spectra similar to that of native wild-type EGF. These results suggest that the hydrogen-bond donor function of the residue at position 32 is important in forming a fully competent receptor-binding epitope. A similar conclusion was reached independently by Koide et al. (95a) in studies combining mutagenesis and NMR analysis: the Val32, Phe32, and Asp32 analogues had relative receptor affinities of 46%, 29%, and 7%, respectively, while exhibiting NMR spectra similar to that of the wild type.
Based on sequence conservation and limited structure-function information, the polar residue Gln43 was previously postulated to be a functionally important residue of the receptor-recognition site (100a). In addition, the close proximity of Gln43 to the essential residue Arg41 and the potential for interaction of these side-chains made Gln43 an attractive target for site-directed mutagenesis. However, replacement of the neutral polar Gln43 side-chain with the positively charged lysine amine had no effect on receptor affinity (96).

F. Alteration of Structural Residues

Proline, glycine, and cysteine residues are unique in their respective abilities to induce, accommodate, and maintain bond angles and protein conformations not allowable with any other amino acid. The conservation of a proline residue adjacent to the first cysteine residue in EGF and TGFα suggested a potential role for this residue in establishing some critical feature of the native EGF structure. However, substitution of hEGF residue Pro7 with threonine resulted in only a slight decrease in receptor affinity (68). The structural requirement for repeated tight turns in the C-terminal domain of EGF may be met by residues Gly36 and Gly39. The requirement for these residues was evaluated by substituting them with valine and leucine, respectively; introduction of these side-chains resulted in an apparent inability of EGF to fold into its native structure (unpublished). Substitution of any or all of the absolutely conserved cysteine residues of EGF or TGFα results in complete loss of function (97).

IV. Cumulative Effect of Multiple Mutations on Receptor Binding

As described above, individual mutation of several important EGF residues, including Tyr13, Tyr22, Ile23, and Leu26 in the N-terminal domain, or of the highly conserved Arg41 and Leu47 in the C-terminal domain of hEGF, decreases receptor affinity. Computer modeling (D. K. Tadaki, S. R. Campion and S. K. Niyogi, unpublished) based on NMR data indicates that these residues, which probably serve as “contact points” in the interaction of EGF with its receptor, are all located on one face of the EGF molecule (Fig. 4).

FIG. 4. Functional residues of EGF as depicted in a computer model generated from NMR coordinates provided by G. T. Montelione (Rutgers University). The residues indicated in pink are the important hydrophobic residues Tyr-13, Leu-15, Ile-23, Leu-26, and Leu-47. The residue shown in purple is the critical Arg-41.

Having identified most of the important sites in studies replacing single amino acids individually, we asked two related questions about the potential interaction of these important sites located in different regions of the molecule. First, does any single-site mutation disrupt the interactions of the receptor with one or more of the other important sites on the ligand, or are the structural effects of any individual mutation limited, reducing affinity by directly affecting only the region of the mutated site(s)? The second, related question is: does the association of any single region of the EGF molecule with the receptor influence, by cooperative binding, the subsequent association of any other region of the molecule? We approached these questions by evaluating the effect(s) of simultaneous mutation of two hEGF residues in various combinations at locations throughout the molecule, including most of the amino acids of known importance. Accordingly, the single-site EGF mutations Tyr13→His, Tyr22→Asp, Ile23→Thr, Ile23→Ala, Leu26→Gly, Leu26→Ala, Asn32→Asp, Arg41→Lys, and Leu47→Ala were combined in a variety of ways to produce double-mutant gene products having alterations either within the same domain or in separate domains of the EGF molecule (104). The relative receptor-affinity values determined for the double-site hEGF analogues are given in Table II, which relates each double-mutant protein to its corresponding single-site parent mutations. In nearly all cases, the effect of simultaneous mutation on receptor affinity indicated that mutation at any one site does not substantially alter the effect of mutation at the second site, the cumulative effect of the double mutation being the product of the effects of the two individual parent mutations. This finding confirms the importance of these individual residues in receptor binding and suggests that each of these separate sites functions essentially independently in the interaction of the EGF molecule with its receptor. Consequently, the overall high affinity of EGF-receptor binding is the result of the cumulative interaction of these individual sites of receptor-ligand interaction.

TABLE II
CUMULATIVE EFFECTS OF DOUBLE-SITE MUTATIONS OF hEGF ON RECEPTOR BINDING^a

Single-site parent hEGF species (wild type = 100% relative binding affinity): Tyr13→His, Tyr22→Asp, Ile23→Thr, Ile23→Ala, Leu26→Gly, Leu26→Ala, Lys28→Arg, Asn32→Asp, Leu47→Ala, Lys48→Arg

Double-site hEGF species     Relative binding affinity, % of wild type (predicted)
Tyr13→His/Ile23→Thr          0.18 (0.32)
Tyr13→His/Leu47→Ala          0.20 (0.16)
Tyr22→Asp/Leu47→Ala          0.04 (0.08)
Ile23→Thr/Leu47→Ala          0.05 (0.08)
Ile23→Ala/Leu26→Ala          11 (9)
Leu26→Gly/Leu47→Ala          0.05 (0.10)
Leu26→Gly/Asn32→Asp          2 (1.25)
Asn32→Asp/Leu47→Ala          0.5 (0.5)
Lys28→Arg/Lys48→Arg          200 (216)

^a Values in parentheses represent relative affinities calculated on the prediction that the effects of single-site mutations are mutually exclusive and independent.
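The independence argument above, and the parenthesized predictions in Table II, rest on simple arithmetic: if two sites act independently, the relative affinity of the double mutant (as a fraction of wild type) should be the product of the two single-site fractions. The Python sketch below illustrates that multiplicative model and compares observed double-mutant values with the tabulated predictions. It is an illustration, not the authors' analysis: the function names and the observed/predicted ratio are hypothetical choices, the single-site inputs in the usage line are invented for demonstration, and only the (observed, predicted) pairs are transcribed from Table II.

```python
# Minimal sketch of the multiplicative (independent-site) model behind the
# parenthesized predictions in Table II. Function names are hypothetical.


def predicted_double(single_a_pct: float, single_b_pct: float) -> float:
    """Predicted double-mutant affinity (% of wild type), assuming the two
    single-site effects are independent, so their fractional affinities multiply:
    (a/100) * (b/100) * 100 == a * b / 100."""
    return single_a_pct * single_b_pct / 100.0


def observed_over_predicted(observed_pct: float, predicted_pct: float) -> float:
    """Ratio of measured to predicted affinity; a value near 1 is what the
    independence model expects."""
    return observed_pct / predicted_pct


# (observed, predicted) double-site affinities, % of wild type, from Table II.
TABLE_II = {
    "Tyr13->His/Ile23->Thr": (0.18, 0.32),
    "Tyr13->His/Leu47->Ala": (0.20, 0.16),
    "Tyr22->Asp/Leu47->Ala": (0.04, 0.08),
    "Ile23->Thr/Leu47->Ala": (0.05, 0.08),
    "Ile23->Ala/Leu26->Ala": (11.0, 9.0),
    "Leu26->Gly/Leu47->Ala": (0.05, 0.10),
    "Leu26->Gly/Asn32->Asp": (2.0, 1.25),
    "Asn32->Asp/Leu47->Ala": (0.5, 0.5),
    "Lys28->Arg/Lys48->Arg": (200.0, 216.0),
}

if __name__ == "__main__":
    # Hypothetical single-site values (50% and 20%), not data from the chapter.
    print(predicted_double(50.0, 20.0))  # 10.0
    for mutant, (observed, predicted) in TABLE_II.items():
        ratio = observed_over_predicted(observed, predicted)
        print(f"{mutant:<24} observed/predicted = {ratio:.2f}")
```

A ratio close to 1 corresponds to the independent behavior described in the text; for example, Asn32→Asp/Leu47→Ala gives exactly 1.0 (0.5% observed versus 0.5% predicted).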

V. Conclusions

Investigation of the interactions of protein molecules with other proteins, with nucleic acids, or with other important biological components has been tremendously enhanced by the development of simple and efficient in vitro mutagenesis procedures. Studies of the physical and chemical nature of EGF receptor-ligand interactions have taken full advantage of the ability to manipulate the primary structure of the EGF protein as a means of identifying individual amino-acid side-chains that participate in the protein-protein interactions resulting in the high-affinity association of EGF with its receptor. Analysis of the in vitro mutagenesis data in light of structural and sequence-conservation information has made it possible to assign the relative importance of nearly every critical residue in the EGF molecule in terms of its role in maintaining the required tertiary structure and in receptor binding and activation. These studies have now clearly established that the major physical and chemical interactions responsible for the high-affinity association of EGF with its receptor involve several conserved hydrophobic amino acids, Tyr13, Leu15, Ile23, Leu26, and Leu47, and a single highly conserved electrostatic residue, Arg41. They have also generated novel EGF analogues with potential applications, for example, in deciphering the ligand-binding residues of the EGF receptor, in sorting out the chain of events in signal transduction, and as EGF antagonists in potential cancer therapy.

ACKNOWLEDGMENTS

The authors are indebted to Douglas K. Tadaki for the generation of computer models of EGF based on NMR data. We thank him and Krishnadas Nandagopal for critical reading of the manuscript. The NMR coordinates were graciously furnished by Gaetano T. Montelione of Rutgers University.


REFERENCES

1. J. Schlessinger, Biochem. Soc. Symp. M, 13 (1989).
2. A. Ullrich and J. Schlessinger, Cell 61, 203 (1990).
3. G. Carpenter and S. Cohen, JBC 265, 7709 (1990).
4. G. Carpenter and M. I. Wahl, in “Peptide Growth Factors and Their Receptors I” (M. Sporn and A. Roberts, eds.), Handb. Exp. Pharmacol., p. 69. Springer-Verlag, New York, 1990.
5. A. Ullrich, L. Coussens, J. S. Hayflick, T. J. Dull, A. Gray, A. W. Tam, J. Lee, Y. Yarden, T. A. Libermann, J. Schlessinger, J. Downward, E. L. V. Mayes, N. Whittle, M. D. Waterfield and P. H. Seeburg, Nature 309, 418 (1984).
6. W. S. Chen, C. S. Lazar, M. Poenie, R. Y. Tsien, G. N. Gill and M. G. Rosenfeld, Nature 328, 820 (1987).
7. A. M. Honegger, T. J. Dull, S. Felder, E. Van Obberghen, F. Bellot, D. Szapary, A. Schmidt, A. Ullrich and J. Schlessinger, Cell 51, 199 (1987).
8. W. H. Moolenaar, A. J. Bierman, B. C. Tilly, I. Verlaan, L. H. Defize, A. M. Honegger, A. Ullrich and J. Schlessinger, EMBO J. 7, 707 (1988).
9. J. Downward, Y. Yarden, E. Mayes, G. Scrace, N. Totty, P. Stockwell, A. Ullrich, J. Schlessinger and M. D. Waterfield, Nature 307, 521 (1984).
10. J. E. DeLarco and G. J. Todaro, PNAS 75, 4001 (1978).
11. T. A. Libermann, H. R. Nusbaum, N. Razon, R. Kris, I. Lax, H. Soreq, N. Whittle, M. D. Waterfield, A. Ullrich and J. Schlessinger, Nature 313, 144 (1985).
12. R. G. Goodwin, F. M. Rottman, T. Callaghan, H.-J. Kung, P. A. Maroney and T. W. Nilsen, MCBiol 6, 3128 (1986).
13. P. P. Di Fiore, J. H. Pierce, M. H. Kraus, O. Segatto, C. R. King and S. Aaronson, Science 237, 178 (1987).
14. D. J. Slamon, W. Godolphin, L. A. Jones, J. A. Holt, S. G. Wong, D. E. Keith, W. J. Levin, S. G. Stuart, J. Udove, A. Ullrich and M. F. Press, Science 244, 707 (1989).
15. C. Greenfield, I. Hiles, M. D. Waterfield, M. Federwisch, A. Wollmer, T. L. Blundell and N. McDonald, EMBO J. 8, 4115 (1989).
16. L. H. K. Defize, W. H. Moolenaar, P. T. van der Saag and S. W. de Laat, EMBO J. 5, 1187 (1986).
17. J. Schlessinger, J. Cell Biol. 103, 2067 (1986).
18. J. Schlessinger, Bchem 27, 3119 (1988).
19. J. Schlessinger, TIBS 13, 445 (1988).
20. Y. Yarden and J. Schlessinger, Bchem 26, 1434 (1987).
21. Y. Yarden and J. Schlessinger, Bchem 26, 1443 (1987).
22. Y. Yarden and A. Ullrich, Bchem 27, 3113 (1988).
23. O. Kashles, Y. Yarden, R. Fischer, A. Ullrich and J. Schlessinger, MCBiol 11, 1454 (1991).
24. M. Spaargaren, L. H. K. Defize, J. Boonstra and S. W. de Laat, JBC 266, 1733 (1991).
25. T. Spivak-Kroizman, D. Rotin, D. Pinchasi, A. Ullrich, J. Schlessinger and I. Lax, JBC 267, 8056 (1992).
26. R. Biswas, M. Basu, A. Sen-Majumdar and M. Das, Bchem 24, 3795 (1985).
27. J. G. Koland and R. E. Cerione, JBC 263, 2230 (1988).
28. I. Northwood and R. J. Davies, JBC 263, 7450 (1988).
29. B. O. Fanger, J. E. Stephens and J. V. Staros, FASEB J. 3, 71 (1989).
30. B. O. Fanger, K. S. Austin, H. S. Earp and J. A. Cidlowski, Bchem 25, 6414 (1986).
31. C. Cochet, O. Kashles, E. M. Chambaz, I. Borrello, C. R. King and J. Schlessinger, JBC 263, 3290 (1988).


32. T. Wada, X. Qian and M. I. Greene, Cell 61, 1339 (1990).
33. I. Lax, A. K. Mitra, C. Ravera, D. R. Hurwitz, M. Rubinstein, A. Ullrich, R. M. Stroud and J. Schlessinger, JBC 266, 13828 (1991).
34. F. Canals, Bchem 31, 4483 (1992).
35. R. J. Simpson, J. A. Smith, R. L. Moritz, M. J. O’Hare, P. S. Rudland, J. R. Morrison, C. J. Lloyd, B. Grego, A. W. Burgess and E. C. Nice, EJB 153, 629 (1985).
36. M. Shoyab, G. D. Plowman, V. L. McDonald, J. G. Bradley and G. J. Todaro, Science 243, 1074 (1989).
37. S. Higashiyama, J. A. Abraham, J. Miller, J. C. Fiddes and M. Klagsbrun, Science 251, 936 (1991).
37a. K. L. Carraway III and L. C. Cantley, Cell 78, 5 (1994).
38. W. E. Holmes, M. X. Sliwkowski, R. W. Akita, W. J. Henzel, J. Lee, J. W. Park, D. Yansura, N. Abadi, H. Raab, G. D. Lewis, H. M. Shepard, W.-J. Kuang, W. I. Wood, D. V. Goeddel and R. L. Vandlen, Science 256, 1205 (1992).
39. D. Wen, E. Peles, R. Cupples, S. V. Suggs, S. S. Bacus, Y. Luo, G. Trail, S. Hu, S. M. Silbiger, R. Ben-Levy, R. A. Koski, H. S. Lu and Y. Yarden, Cell 69, 559 (1992).
40. E. Peles, S. S. Bacus, R. A. Koski, H. S. Lu, D. Wen, S. G. Ogden, R. B. Levy and Y. Yarden, Cell 69, 205 (1992).
41. E. Peles, R. Ben-Levy, E. Tzahar, N. Liu, D. Wen and Y. Yarden, EMBO J. 12, 961 (1993).
42. D. Wen, S. V. Suggs, D. Karunagaran, N. Liu, R. L. Cupples, Y. Luo, A. M. Janssen, N. Ben-Baruch, D. B. Trollinger, V. L. Jacobsen, S. Y. Meng, H. S. Lu, D. Chang, W. Yang, D. Yanigahara, R. A. Koski and Y. Yarden, MCBiol 14, 1909 (1994).
43. L. A. Holladay, C. R. Savage, S. Cohen and D. Puett, Bchem 15, 2624 (1976).
44. R. M. Cooke, M. J. Tappin, I. D. Campbell, D. Kohda, T. Miyake, T. Fuwa, T. Miyazawa and F. Inagaki, EJB 193, 807 (1990).
45. R. M. Cooke, A. J. Wilkinson, M. Baron, A. Pastore, M. J. Tappin, I. D. Campbell, H. Gregory and B. Sheard, Nature 327, 339 (1987).
46. G. T. Montelione, K. Wüthrich, E. C. Nice, A. W. Burgess and H. Scheraga, PNAS 83, 8594 (1986).
47. G. T. Montelione, K. Wüthrich, E. C. Nice, A. W. Burgess and H. Scheraga, PNAS 84, 5226 (1987).
48. G. T. Montelione, K. Wüthrich, A. W. Burgess, E. C. Nice, G. Wagner, K. D. Gibson and H. Scheraga, Bchem 31, 236 (1992).
49. D. Kohda, N. Go, K. Hayashi and F. Inagaki, J. Biochem. 103, 741 (1988).
50. D. Kohda, T. Sawada and F. Inagaki, Bchem 30, 4896 (1991).
51. D. Kohda and F. Inagaki, Bchem 31, 11928 (1992).
52. T. Ikura and N. Go, Proteins 16, 423 (1993).
53. I. Lax, A. Johnson, R. Howk, J. Sap, F. Bellot, M. Winkler, A. Ullrich, B. Vennstrom, J. Schlessinger and D. Givol, MCBiol 8, 1970 (1988).
54. T. Yamamoto, S. Ikawa, T. Akiyama, K. Semba, N. Nomura, N. Miyajima, T. Saito and K. Toyoshima, Nature 319, 230 (1986).
55. C. I. Bargmann, M.-C. Hung and R. A. Weinberg, Nature 319, 226 (1986).
56. G. D. Plowman, G. S. Whitney, M. G. Neubauer, J. M. Green, V. L. McDonald, G. J. Todaro and M. Shoyab, PNAS 87, 4905 (1990).
56a. G. D. Plowman, J.-M. Culouscou, G. S. Whitney, J. M. Green, G. W. Carlton, L. Foy, M. G. Neubauer and M. Shoyab, PNAS 90, 1746 (1993).
57. S. C. Wadsworth, W. S. Vincent III and D. Bilodeau-Wentworth, Nature 314, 178 (1985).
58. E. D. Schejter, D. Segal, L. Glazer and B.-Z. Shilo, Cell 46, 1091 (1986).
59. A. Ullrich, J. R. Bell, E. Y. Chen, R. Herrera, L. M. Petruzzelli, T. J. Dull, A. Gray, L. Coussens, Y.-C. Liao, M. Tsubokawa, A. M. Mason, P. H. Seeburg, C. Grunfeld, O. M. Rosen and J. Ramachandran, Nature 313, 756 (1985).
60. M. Bajaj, M. D. Waterfield, J. Schlessinger, W. R. Taylor and T. Blundell, BBA 916, 220 (1987).
61. I. Lax, F. Bellot, R. Howk, A. Ullrich, D. Givol and J. Schlessinger, EMBO J. 8, 421 (1989).
62. I. Lax, F. Bellot, A. M. Honegger, A. Schmidt, A. Ullrich, D. Givol and J. Schlessinger, Cell Regul. 1, 173 (1990).
63. D. Kohda, M. Odaka, I. Lax, H. Kawasaki, K. Suzuki, A. Ullrich, J. Schlessinger and F. Inagaki, JBC 268, 1976 (1993).
64. I. Lax, W. H. Burgess, F. Bellot, A. Ullrich, J. Schlessinger and D. Givol, MCBiol 8, 1831 (1988).
65. D. Wu, L. Wang, Y. Chi, G. H. Sato and J. D. Sato, PNAS 87, 3151 (1990).
66. R. L. Woltjer, T. J. Lukas and J. V. Staros, PNAS 89, 7801 (1992).
67. R. L. Woltjer, L. Weclas-Henderson, I. A. Papayannopoulos and J. V. Staros, Bchem 31, 7341 (1992).
68. D. A. Engler, R. K. Matsunami, S. R. Campion, C. D. Stringer, A. Stevens and S. K. Niyogi, JBC 263, 12384 (1988).
69. B. Mroczkowski, M. Reich, M. Chen, G. I. Bell and S. Cohen, MCBiol 9, 2771 (1991).
70. D. Botstein and D. Shortle, Science 229, 1193 (1985).
71. C. S. Craik, BioTechniques 3, 12 (1985).
72. M. J. Zoller and M. Smith, Methods Enzymol. 100, 468 (1983).
73. F. Sanger, S. Nicklen and A. R. Coulson, PNAS 74, 5463 (1977).
74. A. Hemsley, N. Arnheim, M. D. Toney, G. Cortopassi and D. J. Galas, NARes 17, 6545 (1989).
75. D. K. Tadaki and S. K. Niyogi, JBC 268, 10114 (1993).
75a. D. K. Tadaki and S. K. Niyogi, FASEB J. 5, A1183 (1991).
76. T. Oka, Sakamoto, K.-I. Miyoshi, T. Fuwa, K. Yoda, M. Yamasaki, G. Tamura and T. Miyake, PNAS 82, 7212 (1985).
77. A. Ito, T. Katoh, H. Gomi, F. Kishimoto, H. Agui, S. Ogino, K. Yoda, M. Yamasaki and G. Tamura, Agric. Biol. Chem. 50, 1381 (1986).
78. P. Ray, F. J. Moy, G. T. Montelione, J.-F. Liu, S. A. Narang, H. A. Scheraga and R. Wu, Bchem 27, 7289 (1988).
79. H. Oh@, T. Kumakura, S. Komoto, Y. Matsuo, K. Ohshiden, T. Koide, C. Yanaihara and N. Yanaihara, J. Biotechnol. 10, 151 (1989).
80. S.-I. Sumi, A. Hasegawa, S. Yagi, K.-I. Miyoshi, A. Kanemwa, S. Nakagawa and M. Suzuki, J. Biotechnol. 2, 59 (1985).
81. G. Allen, M. D. Winther, C. A. Henwood, J. Beesley, L. F. Sharry, J. O’Keefe, J. W. Bennett, R. E. Chapman, D. E. Hollis, B. A. Panaretto, P. Van Dooren, R. W. Edols, A. S. Inglis, P. C. Wynn and G. P. Moore, J. Biotechnol. 5, 93 (1987).
82. M. E. Winkler, T. Bringman and B. J. Marks, JBC 261, 13838 (1986).
83. E. Lazar, S. Watanabe, S. Dalton and M. B. Sporn, MCBiol 8, 1247 (1988).
84. E. Lazar, E. Vicenzi, E. Van Obberghen-Schilling, B. Wolfe, S. Dalton, S. Watanabe and M. B. Sporn, MCBiol 9, 860 (1989).
85. S. R. Campion, R. K. Matsunami, D. A. Engler and S. K. Niyogi, Bchem 29, 9988 (1990).
86. G. Carpenter, Methods Enzymol. 109, 107 (1985).
87. W. M. Hunter and F. C. Greenwood, Nature 194, 495 (1962).
88. T. Akiyama, T. Kadooka and H. Ogawara, BBRC 131, 442 (1985).
89. F. J. Moy, H. A. Scheraga, J. F. Liu, R. Wu and G. T. Montelione, PNAS 86, 9836 (1989).


90. T. J. Dudgeon, R. M. Cooke, M. Baron, I. D. Campbell, R. M. Edwards and A. Fallon, FEBS Lett. 261, 392 (1990).
91. D. A. Engler, G. T. Montelione and S. K. Niyogi, FEBS Lett. 271, 47 (1990).
92. R. K. Matsunami, M. L. Yette, A. Stevens and S. K. Niyogi, J. Cell. Biochem. 46, 242 (1991).
93. S. R. Campion, C. Biamonti, G. T. Montelione and S. K. Niyogi, Protein Eng. 6, 651 (1993).
94. U. Hommel, T. J. Dudgeon, A. Fallon, R. M. Edwards and I. D. Campbell, Bchem 30, 8891 (1991).
95. H. Koide, Y. Muto, H. Kasai, K. Kohri, K. Hoshi, S. Takahashi, K.-I. Tsukumo, T. Sasaki, T. Oka, T. Miyake, T. Fuwa, D. Kohda, F. Inagaki, T. Miyazawa and S. Yokoyama, BBA 1120, 257 (1992).
95a. H. Koide, Y. Muto, H. Kasai, K. Hoshi, H. Takusari, K. Kohri, S. Takahashi, T. Sasaki, K. Tsukumo, T. Miyake, T. Fuwa, T. Miyazawa and S. Yokoyama, FEBS Lett. 302, 39 (1992).
96. S. R. Campion, D. K. Tadaki and S. K. Niyogi, J. Cell. Biochem. 50, 35 (1992).
97. D. Defeo-Jones, J. Y. Tai, R. J. Wegrzyn, G. A. Vuocolo, A. E. Baker, L. S. Payne, V. M. Garsky, A. Oliff and M. W. Riemen, MCBiol 8, 2999 (1988).
98. D. Defeo-Jones, J. Y. Tai, G. A. Vuocolo, R. J. Wegrzyn, T. L. Schofield, M. W. Riemen and A. Oliff, MCBiol 9, 4083 (1989).
99. D. A. Engler, S. R. Campion, M. R. Hauser, J. S. Cook and S. K. Niyogi, JBC 267, 2274 (1992).
99a. F. C. Hartman and M. R. Harpel, Adv. Enzymol. 67, 1 (1993).
100. A. W. Burgess, C. J. Lloyd, S. Smith, E. Stanley, F. Walker, L. Fabri, R. J. Simpson and E. C. Nice, Bchem 27, 4977 (1988).
100a. I. D. Campbell, R. M. Cooke, M. Baron, T. S. Harvey and M. J. Tappin, Prog. Growth Factor Res. 1, 13 (1989).
101. K. H. Mayo, P. Schaudies, C. R. Savage, A. DeMarco and R. Kaptein, Biochem. J. 239, 13 (1986).
101a. K. Nandagopal, D. K. Tadaki and S. K. Niyogi, FASEB J. 8, A1460 (1994).
101b. D. K. Tadaki, S. R. Campion and S. K. Niyogi, FASEB J. 8, A1459 (1994).
102. R. K. Matsunami, S. R. Campion, S. K. Niyogi and A. Stevens, FEBS Lett. 264, 105 (1990).
103. D. A. Engler, M. R. Hauser, J. S. Cook and S. K. Niyogi, MCBiol 11, 2425 (1991).
104. S. R. Campion, M. K. Geok and S. K. Niyogi, JBC 268, 1742 (1993).


Index

A

modulation of activity calmoddin, 263-265 forsknlin, 262 C. proteins. 262-263 phylogenic relationship, 273-275 purification, 262, 264 recu)ml)inant enzyme expression, 26p

Activator protein 1, &wt on HIV gene expression, 164 Adenosine 3',5'-cyclicmonophosphate, see Cyclic AMP S-Adenosylmethioniiie methyl donor DNA (cytosine-5)rnethyltraiisfer~se, 66-

265 sequence homology, 262-263, 265-266,

68, 72

rRNA methylation, 232 nonenzymutic methylation of DNA, 104 Adenylyl cychses a~)undanc~ in cells, 241-2443. 261-262 Bacillirs anthracis enzyme active site residues, 260 c;ilmodulin binding domain, 260-261 enzyme iwtivation, 251,260 evolution, 272, 275 gene cloning, 259 metal requirement. 260 purification, 259 secu)n+ structure, 260-261 Bordetella pertussis enzyme calmtdulin binding site, 254-256 enzyme activation, 251,254,257 domains catalytic, 253-257 hemolysin, 257-259 evolution, 272, 275 gene cloning. 252-253 heterogeneity of prelrarations. 251-252 inhil,itors, 256-257 piirification. 252 site-directed mutagenesis, 276 size, 252-253 r-alcium modulation, 271 class 111 enzymes domains, 263 gene cloning, 262-266 hetendimeri7atio11, 267

268-269 types, 265-266,273 classes, 242 Gram-negative facultative anmrolm ellzyllle domains. 246 gene cloning. 243-246 mtdiilation of activity CAMP, 246 gllIcY)se, 249 G proteins, 250-25251 phosphorylatioii, 250 oxidation, 244 phylogenic relationships, 271-272 purification, 242-243 secondary structure. 243 sequencv homology, 244-249 storage, 243 pulsation of CAMPpduction, 275-276 similarity to guanylyl ccluses, 267, 270-

271, 273-274 AdoMet, see S-Adenosylm~thionine b-Adrenergic kinase in1iil)ition. 139 role in receptor desensitization, 138-139 substrate phosphorylation sites, 138, 142 a,-Adrenergic rewptors, desensiti7ation, 141 &-Adrenergic rewptors agonist I,inding site, 121-124 structure, 121-123 desensiti7ation. 137-140 down-regulation, 143

385

INDEX

effects of domain deletions. 121 G protein 1)inding site, 131-133 messenger HNA stability, 143-144 phospliorylation, 138-139, 143 regulation by steroid horniones, 144 sequestration. 140-141 Antigen presentation. 3, 30-32 Antisense oligooucleotides, HIV therapy, 182-1x3 AP1. see Activator protein, 1 5-Ajracytidine. effect 011 cliro~nosome structure, 98-99 5-kiriideoxycytidiiie DNA (cytosiiie-5)metliyltraiisfer~se iiihil,ition, 66-67 effect on DNA inethylation pattern. 98-99

B B23 protein. role in RNA proc-essing, 221 Bisulfite, deamination of DNA, 103 Blenoxane, see Bleomycin Bleom ycin bithiazole derivatives, 330-331, 336-338 Bleiioxane composition. 314-315 cancer cheinotherapy. 314 I)NA-l)leoinycin intefirtions Iiiiiding constants, 329 I)itIiiajrole. 329-330. 333-338 carboxy terminus. 329, 334 hydrophol)ic interactions, 329-330 ionic interactions. 329 inaiiiiose carbamoyl group, 330-332 metal-l)indiiigdomain, 336 minor grcmve, 332-333, 348 DNA cleavage 1)itliiazolederivative specificity, 333337 catalytic cycle, 320-322 cisplatin 1)iiiding effect on cleavage site, 345 deg1yru)l)leoiiiyciiispecificity. 333-334 deuterium isotope effects, 324 effect of DNA inethylation, 345-346 ~nedianismof catalysis by iron-bound complex, 320-327 oxygen requirement, 326-327 products of reaction. 332-323 remgnition of conforination, 3 4 - 3 8

RNA hyl,rid strains, 343-344 site specificity, 314, 330-332, 348 solvent oxygen exchange, 327 sterecwheinistry. 323-324 triple helix, 346-348 DNA iiiiwinding activity, 329-330 domains, role in DNA cleavage site specificity, 314, 333, 336 metal ion coinplexes Co(III), 327 ruw)rdination geometries. 315-316, 332 Cu(I), 328 Cu(II), 328 Fe(11) activation. 3 16-3 17 MII(II). 327-328 olefin oxidation, 317-320 oxidation srilistrates, 317-318 RNA cleavage DNA hybrid strands, 343-344 niechanisin, 341-343 reaction products, 342-343 recognition of ru)aforination, 339, 341, 344 specificity for HNA type, 339, 343-344 s h t r a t e wcessiMity, 338-339 structure. 315 5-Bromodeoxyuridiiie, elfect oii DNA inethylation pattern, % Butyric wid. 1)iiidiiigto HIV LTH, 167

C Capillary electrophoresis. see Constant deiiaturant capillary electrophoresis Cispletin cancer chemotherapy, 345 DNA \)inding etfect on I)leoinyciii cleavage specificity, 515 Congenital nephrogenic dkhetes insipidus mutations in v;isopressin receptor, 134 sympton1s. 134 Constant deliatitrant gel electrophoresis experimental set-up, 304 separation etTiciency, 305-308 separations in mutational spectrometry, 302, 304-308 Constant denaturant capillary electrophoresis experimental set-up. 304

387

INDEX

separation efficiency, 305-308 sepi~rationsin mutational spectrometry, 288. 304-310 Cyclic AMP ancestral fiinctions, 274 discwvery, 241 grnwth condition effect on bacterial level, 242-243 pulses, 276 recwptor, 2-50 second messenger activity, 241, 267 synthesis, 241 Cyclic GMP ancestral hinctions, 274 role in phototransduction, 267, 271 second messenger activity, 267 synthesis, 267 Cytosine methylation. sre DNA (cytosine-5)iiietIiyltraiisfe~~~e Cytoskeleton. see u/so Intermediate filaments filaments, 36-37 networks, 36

D Denaturant gradient gel electrophoresis experimental set-up, 304 separation efficiency, 305-308 separations in mutational spectrometry, 286-289, 304-308 2,3- Dimercaptopropaiiol, H IV therapy, 184 DNA bulge, 346 cleavage by metal cwiiplexes, 313 electrophoresis of mutants, 303-304 cumstant denaturant gel electrophoresis, 302,304-308 constant denaturant capillary electrophoresis. 288, 304-310 denaturant gradient gel electrophoresis, 286-289, 304-308 experimental set-up, 304-W5, 308 separation efficiency, 305 melting cooperativity, 302 domains, 302-303 factors &ecting melting temperature, 303

methylation, see DNA (cytosine-5)methyltransferase mutational spectrum, .we Mutational spectrometry triple-helix conformation, 346-347 DNA (cytosine-5)methyltransferase S-adenosylmethionine h d i n g site, 72 methyl donor, 66-67 asymmetric 1,inding site enzyme-I)NA interactions cytosine methyl wcwptor, 81-83 cytosine methyl director, 83 guanine, 84 experimental evidence, 79-80 stnicture, 78-79 catalytic cysteine residue, 67, 80 covalent honding with. 5ftiiorodeoxycytidine oligodeoxyniic~eotides.67 demethylation of DNA, 9H DNA binding protein effect on activity, 93 DNA c~nfnrniationeffect on reaction rate, 70-71. 75-78, 93-96, 105-106 gene lcrus, 80 hydrolytic deamination of DNA S-adenosylmetliioiiiiieeffects, 102 mechanism. 71, 101-103 nonenxymatic. 103-104 role in genetic drift may from G-C pairs, 100-101, 104 role of DNA conformation, 104 thymine production, 72, 101 uracil prcduction, 71-72, 101 in1iil)itors 5-az~deoxycytidine.66-67, 99 2-pyrimidinone 1-p-u-2'deoxyribofiiraiioside,66-67 methylation modes de now) methylation, 73-74 methyl-directed methylation, 74-76 modulation factors, 91-96. 106 passive cumtinunus c/e nomi methylation, 96-98 structurally induced methylation, 76-78 patterns of DNA methylation biological role, 84, 86-87 cvncerted mtdification, 87-90 effect of gene cwpy number, 90-91 effect of mutagens, 99-100

INDEX effect on I)lenmycin cleavage specificity, 345-346 interspersed repeated sequences, 87-90 p53. 103-104 restriction fragment-length pnlyniorphisms. 89-91 tissue specificity. 87 transient expression of mtdulators. 9196 i~hosphorylation,92-93 proton exchange at C-5, 66-67, 69-70 reaction inechanism 5.6 dihydrtrytosine carbanion intermediate, 66 niicleophilic attack at C-6, 66-67 spGsp1 energetics. 68-70 steretrhemistrv, 68-70 sequelice lioninlogy between species, 72 sihstrate specificity, 73-79, 85. 97 1)oul)le-stranded RNA-dependent kinase effect on HIV tr~nisIation.176

E
E-box, role in HIV gene expression, 169-170
EDTA bleomycin analog, 337-33 DNA cleavage by iron complex, 313, 337
Epidermal growth factor assays HPLC determination of gross structure changes, 364 mitogenic potential, 365 receptor affinity, 363 tyrosine kinase stimulation, 363-364 effect on tyrosine kinase domain on receptor, 354 gene cloning, 360 expression in E. coli, 361-362 site-directed mutagenesis, 360-361, 365-379 mutation effects on receptor affinity, 367-370 acidic residues, 366, 371 aromatic residues, 373-376 basic residues, 370-371 double-site mutations, 377-379

hydrophobic residues, 372-373, 379 polar residues, 376-377, 379 structural residues, 377 purification of recombinant proteins, 362-363 sequence conservation between species, 355-356 size, 353 three-dimensional structure, 356-358
Epidermal growth factor receptor assays HPLC determination of gross structure changes, 364 ligand affinity, 363 tyrosine kinase stimulation, 363-364 dimerization, 354 homology with insulin receptor, 358 ligand effects on receptor affinity, 367-370 acidic residues, 366, 371 aromatic residues, 373-376 basic residues, 370-371 double-site mutations, 377-379 hydrophobic residues, 372-373, 379 polar residues, 376-377, 379 structural residues, 377 ligand-binding pocket, 358-359 sequence conservation between species, 358 size, 353 tyrosine kinase domain, 353-354
Expressed sequence tags, identification of G protein receptors, 117

F
Fibrillarin, role in RNA processing, 218-219, 223
5-Fluorodeoxycytidine, effect on DNA methylation pattern, 98-99
N-Formyl-peptide receptor, ligand binding site, 129
Forskolin, activation of adenylyl cyclases, 263

G
GHRH receptor, see Growth-hormone-releasing hormone receptor


Glucocorticoid receptor, binding to HIV promoter, 173
G protein binding sites in receptors, 130-136 GTP binding, 114-115 organization in plasma membrane, 115 receptors, see G protein receptors second messenger amplification, 115 signal transduction, 114-115 subunits, 130
G protein receptors α2-adrenergic receptors, 141 β-adrenergic receptors, 121-124, 131-133, 137-141 binding sites agonists, 121-129 G protein, 130-136 constitutive activation, 135-136 defects in disease, 134-135 desensitization, 136-143 down-regulation, 140-143 gene cloning, 147 number in humans, 147 G protein induction of conformational change, 135-136 specificity, 130-131 identification using expressed sequence tags, 147-149 interleukin-8 receptor, 129 muscarinic acetylcholine receptors, 124-125, 133-134, 141-143 N-formyl-peptide receptor, 129 organization in plasma membrane, 115, 118, 121 phylogenic analysis, 148-149 sequence homology, 116-117, 119, 130 serotonin receptors, 125-126 structure, 116-118, 120-121 subfamilies, 118, 120 subunits, 114-115 tachykinin receptors, 127, 129 thyrotropin-releasing hormone receptor, 129 types, 115-117
Gro-EL, similarity to prosomes, 2-3, 18, 52
Growth-hormone-releasing hormone receptor defects, 134-135 gene locus in mice, 134
Guanine alkylation in DNA, 99-100 interactions at active site of DNA (cytosine-5)methyltransferase, 84 regulatory binding proteins, see G proteins
Guanylyl cyclases calcium modulation, 271 domains, 270-271 forms cytoplasmic, 270 plasma membrane, 270-271 similarity to adenylyl cyclases, 267, 270-271, 273-274

H
High-efficiency restriction-enzyme digestion assay, 288, 295 DNA amplification, 299 digestion, 296-298 high-efficiency restriction digestion, 298-299 isolation, 296 elimination of PCR-generated mutants, 299-300 mitochondrial mutational assay, 300-302 mutant analysis, 300 sensitivity, 296
HIV, see Human immunodeficiency virus
5-HT receptors, see Serotonin receptors
Human immunodeficiency virus activation signals cytokines, 160-161, 166, 182 heat shock, 162 heterologous viruses, 161-162 mitogens, 162 ultraviolet radiation, 162 gene expression during infection, 157 expression studies cell transfection, 159-160 viral replication analysis, 160 in vitro transcription, 1% promoter strength, 158


long-terminal-repeat elements core transcription control element, 162-163 modulators downstream region, 173 enhancer region, 167 G-C-rich region, 167-168 TATA region, 168-170 upstream region, 164-167 modulatory transcription control element, 162-163 sequence, 163 steroid response element, 165 trans-activation-responsive region, 158, 162-164 RNA structure, 178-179 tat protein binding, 171-173, 178-181 translational control, 175-177 transcriptional initiation site, 157 messenger RNA cellular factors in processing, 174 polyadenylation, 173-174 translational control, 174-176 therapy antisense oligonucleotides, 182-183 antisense RNA, 183 ribozymes, 183 TAR RNA decoys, 183-184 tat inhibitors, 184 trans-dominant inhibitors, 184-185

I
Inducer of short transcripts, role in HIV gene expression, 170-171, 180
Interferon-γ, effect on prosome subunit structure, 32
Interleukin-8 receptor, ligand binding site, 129
Intermediate filaments effects of acrylamide monomer, 39 movement in cell cycle, 40 mRNA association, 10-11, 40-41, 44 networks, 36-37 prosome association, 39-43
Isocoumarins, activation of multicatalytic proteinase, 29-30
IST, see Inducer of short transcripts

L
LMP, see Low-molecular-weight protein complex
Long-terminal-repeat, see Human immunodeficiency virus
Low-molecular-weight protein complex, see also Prosomes discovery, 6 role in antigen presentation, 3, 31
LTR, see Long-terminal-repeat

M
Major histocompatibility complex antigen presentation, 3, 30-32 gene locus, 30
MCP, see Multicatalytic proteinase
Messenger ribonucleoprotein, see also Prosomes classification, 53-54 components, 7 intermediate filament association, 40 iron response element-binding protein, 10 poly(A)-binding protein association, 9, 37 RNA consensus sequence of binding site, 16 incorporation, 3, 5, 7 recognition, 56 sucrose gradient profile, 7-9 transfer from nucleus to cytoplasm, 9 translation, 7
Messenger RNA cytodistribution, 38-39 intermediate filament association, 10-11, 37, 40-41, 44 poly(A)-binding protein binding, 9, 11, 37 prosome binding, 11 function in translation, 22 recognition, 56 role in cell architecture, 12 transport, 9-12
5-Methylcytosine, biological functions, 84, 86
Methyltransferase, see DNA (cytosine-5)methyltransferase
MHC, see Major histocompatibility complex


Mismatch amplification mutation assay annealing reaction duration, 293 temperature, 293 primer mismatches, 291-292 reaction conditions, 292-293 sensitivity maximization, 294-295
Multicatalytic proteinase, see also Prosomes; 26S Proteasome activators, 29-30 cleavage site specificity, 27-29 discovery, 6 effects of SDS, 30 inactivation by metal ions, 15, 28 inhibitors, 29-30 modulation, 28-29 nomenclature, 6 protease class, 16, 20, 27-28 size, 2 structure, 2, 5, 52 substrates, 24-25, 51-52 subunits, %, 44
Muscarinic acetylcholine receptors antagonist binding site, 124-125 coupling to effector enzymes, 133 desensitization, 142 down-regulation, 142-143 G protein binding site, 133-134 messenger RNA stability, 144 phosphorylation, 142 sequestration, 141
Mutational spectrometry appropriate data sets, 286 bulk approach to mutant analysis, 286-287 clone-by-clone spectra, 285 human tissue requirements for measuring, 28i screening approaches allele-specific polymerase chain reaction, 289-291 constant denaturant gel electrophoresis, 302, 304-308 constant denaturant capillary electrophoresis, 288, 304-310 denaturing gradient gel electrophoresis, 286-289, 304-308 high-efficiency restriction-enzyme digestion assay, 288, 295-302 mismatch amplification mutation assay, 288, 291-295 phenotypic selection, 287-288 mitochondrial DNA, 300-302 polymerase chain reaction, 288-289, 308-310 sensitivity, 296, 310

N
Nef protein, effect on HIV gene expression, 164
Neurokinins, see Tachykinin receptors
NFAT-1, see Nuclear factor of activated T cells
N-Nitroso compounds content in tobacco, 99 DNA damage, 99
Nuclear factor of activated T cells binding to HIV LTR, 165-166 immunosuppressant effects on levels, 166
Nucleolin role in RNA processing, 220-221, 223 solubility, 222-223

P
PBCM, see Propylbenzilylcholine mustard
PCR, see Polymerase chain reaction
D-Penicillamine, HIV therapy, 184
1,10-Phenanthroline, DNA cleavage by copper complex, 313
Phleomycin DNA cleavage site specificity, 333, 347 structure, 334
Poly(A)-binding protein cytoskeleton association, 37 mRNA binding, 9, 11, 37
Polymerase chain reaction DNA concentration limits, 287 mutational spectrometry, 288-289, 308-310 allele-specific amplification, 289-291 mismatch amplification mutation assay, 288, 291-295 mutation generation, 299-300, 310 site-directed mutagenesis applications, 360-361


Processosome crosslinking experiments, 230 role in rRNA processing, 229-230, 233-234
Propylbenzilylcholine mustard, muscarinic acetylcholine receptor antagonist, 124
Prosomes, see also Low-molecular-weight protein complex; Multicatalytic proteinase antigens, 46 autoimmune antibodies, 48 content in oocyte, 46 cytodistribution, 32-36 cell surface, 43 changes during cell cycle, 35-36 extracellular space, 43 intermediate filament association, 39-42, 55 discovery, 3, 5-7 genes classification, 16-17 cloning, 16 effect of deletion, 20, 51 number per species, 19-20 half-life, 34 heat shock protein similarity, 2-3, 18 immunophenotyping, 15 indirect immunofluorescence, 33, 37-39 junctions, 42-43 nuclear localization signal, 16, 18, 34 pathological response evaluation, 48-50 tumor cells, 50 phosphorylation sites, 18 phylogenic relationships, 18-20, 51 polymerization, 14 protein homeostasis role, M-56 RNA binding, 3, 5, 11-12, 16, 20, 32 content, 21-22 function in complex, 22 prosome affinity for RNA types, 20-21, 53 protection study, 20 RNase resistance, 21 sequence homology between species, 16-17, 51 size, 2, 12 species distribution, 3 stability, 15

structure, 2, 5, 12-15, 18 subunits effect on composition interferon-γ, 32 cell differentiation, 44-47 embryogenesis, 46-47 disease, 48 types, 14-16 translational repression, 22-23
26S Proteasome ATPase, 26, 52 ATP dependence, 25, 30 inactivation by metals, 27 physiological role, 55 stability, 25 structure, 27, 52 substrates, 25-26 subunits, 24-26
Protein kinase A, phosphorylation of G protein receptors, 138-139
Protein kinase C, substrates DNA (cytosine-5)methyltransferase, 92-93 G protein receptors, 142
2-Pyrimidinone 1-β-D-2'-deoxyribofuranoside, see DNA (cytosine-5)methyltransferase inhibition, 66-67

R
ras oncogene, DNA structure, 97
Rel protein, binding to HIV LTR, 167
Restriction fragment-length polymorphism, methylation patterns, 89-91
Retinitis pigmentosa, rhodopsin mutations, 134
Retinoic acid receptor, binding to HIV LTR, 165
Reverse transcriptase interspersed nuclear elements, 21 tRNA primer, 21, 52
RFLP, see Restriction fragment-length polymorphism
Ribonuclease exonucleases, 217 magnesium dependence, 216 RNase MRP, 216 RNase PRI, 216
Ribosomal RNA


cleavage sites 5'-end of 5.8S rRNA, 207-208 5'-end of 18S rRNA, 203, 205-206, 228-229 3'-end of 28S rRNA, 202, 206, 208-209 3'-ETS, 209 5'-ETS, 202, 205, 224-227 ITS1, 203, 207, 227 ITS2, 203, 208, 227-228 recognition signals, 223-228 processing methylation base, 233 catalysis, 232-233 inhibitors, 232 patterns, 231 ribose, 231-233 pathways, 199-204 pseudouridylation, 233 role in processing B23 protein, 221 fibrillarin, 218-219 helicases, 217 magnesium, 215-216 noncatalytic nucleolar proteins, 213, 215, 218-223 NSR1 protein, 220 nucleolin, 220-221 processosome, 229-230, 233-234 ribonucleases, 213, 215-217 RNA secondary structure, 223-225 snoRNA, 210-213 SOF1 protein, 221 sequence of events, 199 tagging, 225-226 transcription units, 197, 199
Ribosome conservation between species, 234 production, 198 RNA content, 198
Ribozymes, HIV therapy, 183
RNA, see Messenger RNA; Ribosomal RNA; Small nucleolar RNA; Transfer RNA
RNase, see Ribonuclease

S
SDS, see Sodium dodecylsulfate
Serotonin receptors agonist binding site, 1% types, 125-126
Small nucleolar RNA characteristics, %%el0 role in rRNA processing, 210, 212-213 size, 210-210
Sodium dodecylsulfate, effects on multicatalytic proteinase, 30
SOF1 protein, role in RNA processing, 221
SP1 transcription factor, binding to HIV LTR, 167-168
Substance P, see Tachykinin receptors

T
Tachykinin receptors antagonists, 127, 129 binding site, 127, 129 types, 127
Tallysomycin DNA cleavage site specificity, 333 structure, 334
TATA box modulatory factors, 169 role in HIV gene expression, 168-170
Tat protein association cellular factors, 181-182 oligonucleotide binding proteins, 181 binding of long-terminal-repeat element, 158, 168, 171-172, 178-180 cellular localization, 177 conformational changes on binding, 178 effect of interferon-, 176 inhibitors in HIV therapy, 184 mechanism of action, 171-172, 180 site-directed mutagenesis, 177-178 structure, 177-178
TGFα, see Transforming growth factor α
Thyrotropin-releasing hormone receptor binding site, 129 messenger RNA stability, 144
Transfer RNA cleavage by bleomycin, 339-341 tRNA'a', binding to HIV LTR, 173


prosome binding, 20-21, 52 retroviral primer, 21, 52
Transforming growth factor α expression in E. coli, 361 site-directed mutagenesis, 365
Transporter-associated protein, gene interspersion with major histocompatibility complex, 31
Tumor necrosis factor receptor, pharmacological intervention in HIV infection, 182

U
Unified matrix hypothesis, 11
Uracil glycosylase, repair of DNA mismatch, 105