Molecular Biology of the Cell, Fifth Edition
Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, Peter Walter
With problems by John Wilson and Tim Hunt
Garland Science, Taylor & Francis Group
Garland Science
Vice President: Denise Schanck
Assistant Editor: Sigrid Masson
Production Editor and Layout: Emma Jeffcock
Senior Publisher: Jackie Harbor
Illustrator: Nigel Orme
Designer: Matthew McClements, Blink Studio, Ltd.
Editors: Marjorie Anderson and Sherry Granum
Copy Editor: Bruce Goatly
Indexer: Merrall-Ross International, Ltd.
Permissions Coordinator: Mary Dispenza

Cell Biology Interactive
Artistic and Scientific Direction: Peter Walter
Narrated by: Julie Theriot
Production Design and Development: Michael Morales
© 2008, 2002 by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter.
© 1983, 1989, 1994 by Bruce Alberts, Dennis Bray, Julian Lewis, Martin Raff, Keith Roberts, and James D. Watson.
Bruce Alberts received his Ph.D. from Harvard University and is Professor of Biochemistry and Biophysics at the University of California, San Francisco. For 12 years, he served as President of the U.S. National Academy of Sciences (1993-2005).
Alexander Johnson received his Ph.D. from Harvard University and is Professor of Microbiology and Immunology and Director of the Biochemistry, Cell Biology, Genetics, and Developmental Biology Graduate Program at the University of California, San Francisco.
Julian Lewis received his D.Phil. from the University of Oxford and is a Principal Scientist at the London Research Institute of Cancer Research UK.
Martin Raff received his M.D. from McGill University and is at the Medical Research Council Laboratory for Molecular Cell Biology and the Biology Department at University College London.
Keith Roberts received his Ph.D. from the University of Cambridge and is Emeritus Fellow at the John Innes Centre, Norwich.
Peter Walter received his Ph.D. from The Rockefeller University in New York and is Professor and Chairman of the Department of Biochemistry and Biophysics at the University of California, San Francisco, and an Investigator of the Howard Hughes Medical Institute.
All rights reserved. No part of this book covered by the copyright hereon may be reproduced or used in any format in any form or by any means (graphic, electronic, or mechanical, including photocopying, recording, taping, or information storage and retrieval systems) without permission of the publisher.

Library of Congress Cataloging-in-Publication Data
Molecular biology of the cell / Bruce Alberts ... [et al.].-- 5th ed.
p. cm
ISBN 978-0-8153-4105-5 (hardcover) -- ISBN 978-0-8153-4106-2 (paperback)
1. Cytology. 2. Molecular biology. I. Alberts, Bruce.
QH581.2.M64 2008
571.6--dc22
2007005475 CIP

Published by Garland Science, Taylor & Francis Group, LLC, an informa business, 270 Madison Avenue, New York NY 10016, USA, and 2 Park Square, Milton Park, Abingdon, OX14 4RN, UK.

Printed in the United States of America
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
Preface

In many respects, we understand the structure of the universe better than the workings of living cells. Scientists can calculate the age of the Sun and predict when it will cease to shine, but we cannot explain how it is that a human being may live for eighty years but a mouse for only two. We know the complete genome sequences of these and many other species, but we still cannot predict how a cell will behave if we mutate a previously unstudied gene. Stars may be 10^43 times bigger, but cells are more complex, more intricately structured, and more astonishing products of the laws of physics and chemistry. Through heredity and natural selection, operating from the beginnings of life on Earth to the present day (that is, for about 20% of the age of the universe), living cells have been progressively refining and extending their molecular machinery and recording the results of their experiments in the genetic instructions they pass on to their progeny.

With each edition of this book, we marvel at the new information that cell biologists have gathered in just a few years. But we are even more amazed and daunted at the sophistication of the mechanisms that we encounter. The deeper we probe into the cell, the more we realize how much remains to be understood.
In the days of our innocence, working on the first edition, we hailed the identification of a single protein (a signal receptor, say) as a great step forward. Now we appreciate that each protein is generally part of a complex with many others, working together as a system, regulating one another's activities in subtle ways, and held in specific positions by binding to scaffold proteins that give the chemical factory a definite spatial structure. Genome sequencing has given us virtually complete molecular parts-lists for many different organisms; genetics and biochemistry have told us a great deal about what those parts are capable of individually and which ones interact with which others; but we have only the most primitive grasp of the dynamics of these biochemical systems, with all their interlocking control loops. Therefore, although there are great achievements to report, cell biologists face even greater challenges for the future.

In this edition, we have included new material on many topics, ranging from epigenetics, histone modifications, small RNAs, and comparative genomics, to genetic noise, cytoskeletal dynamics, cell-cycle control, apoptosis, stem cells, and novel cancer therapies. As in previous editions, we have tried above all to give readers a conceptual framework for the mass of information that we now have about cells. This means going beyond the recitation of facts. The goal is to learn how to put the facts to use: to reason, to predict, and to control the behavior of living systems.

To help readers on the way to an active understanding, we have for the first time incorporated end-of-chapter problems, written by John Wilson and Tim Hunt. These emphasize a quantitative approach and the art of reasoning from experiments. A companion volume, Molecular Biology of the Cell, Fifth Edition: The Problems Book (ISBN 978-0-8153-4110-9), by the same authors, gives complete answers to these problems and also contains more than 1700 additional problems and solutions.
A further major adjunct to the main book is the attached Media DVD-ROM disc. This provides hundreds of movies and animations, including many that are new in this edition, showing cells and cellular processes in action and bringing the text to life; the disc also now includes all the figures and tables from the main book, pre-loaded into PowerPoint® presentations. Other ancillaries available for the book include a bank of test questions and lecture outlines, available to qualified instructors, and a set of 200 full-color overhead transparencies.

Perhaps the biggest change is in the physical structure of the book. In an effort to make the standard Student Edition somewhat more portable, we are providing Chapters 21-25, covering multicellular systems, in electronic (PDF) form on the accompanying disc, while retaining in the printed volume Chapters 1-20, covering the core of the usual cell biology curriculum. But we should emphasize that the final chapters have been revised and updated as thoroughly as the rest of the book and we sincerely hope that they will be read! A Reference Edition (ISBN 978-0-8153-4111-6), containing the full set of chapters as printed pages, is also available for those who prefer it.

Full details of the conventions adopted in the book are given in the Note to the Reader that follows this Preface. As explained there, we have taken a drastic approach in confronting the different rules for the writing of gene names in different species: throughout this book, we use the same style, regardless of species, and often in defiance of the usual species-specific conventions.

As always, we are indebted to many people. Full acknowledgments for scientific help are given separately, but we must here single out some exceptionally important contributions: Julie Theriot is almost entirely responsible for Chapters 16 (Cytoskeleton) and 24 (Pathogens, Infection, and Innate Immunity), and David Morgan likewise for Chapter 17 (Cell Cycle). Wallace Marshall and Laura Attardi provided substantial help with Chapters 8 and 20, respectively, as did Maynard Olson for the genomics section of Chapter 4, Xiaodong Wang for Chapter 18, and Nicholas Harberd for the plant section of Chapter 15. We also owe a huge debt to the staff of Garland Science and others who helped convert writers' efforts into a polished final product.
Denise Schanck directed the whole enterprise and shepherded the wayward authors along the road with wisdom, skill, and kindness. Nigel Orme put the artwork into its final form and supervised the visual aspects of the book, including the back cover, with his usual flair. Matthew McClements designed the book and its front cover. Emma Jeffcock laid out its pages with extraordinary speed and unflappable efficiency, dealing impeccably with innumerable corrections. Michael Morales managed the transformation of a mass of animations, video clips, and other materials into a user-friendly DVD-ROM. Eleanor Lawrence and Sherry Granum updated and enlarged the glossary. Jackie Harbor and Sigrid Masson kept us organized. Adam Sendroff kept us aware of our readers and their needs and reactions. Marjorie Anderson, Bruce Goatly, and Sherry Granum combed the text for obscurities, infelicities, and errors. We thank them all, not only for their professional skill and dedication and for efficiency far surpassing our own, but also for their unfailing helpfulness and friendship: they have made it a pleasure to work on the book.

Lastly, and with no less gratitude, we thank our spouses, families, friends and colleagues. Without their patient, enduring support, we could not have produced any of the editions of this book.
Contents

Special Features
Detailed Contents
Acknowledgments
A Note to the Reader

PART I INTRODUCTION TO THE CELL
1. Cells and Genomes
2. Cell Chemistry and Biosynthesis
3. Proteins

PART II BASIC GENETIC MECHANISMS
4. DNA, Chromosomes, and Genomes
5. DNA Replication, Repair, and Recombination
6. How Cells Read the Genome: From DNA to Protein
7. Control of Gene Expression

PART III METHODS
8. Manipulating Proteins, DNA, and RNA
9. Visualizing Cells

PART IV INTERNAL ORGANIZATION OF THE CELL
10. Membrane Structure
11. Membrane Transport of Small Molecules and the Electrical Properties of Membranes
12. Intracellular Compartments and Protein Sorting
13. Intracellular Vesicular Traffic
14. Energy Conversion: Mitochondria and Chloroplasts
15. Mechanisms of Cell Communication
16. The Cytoskeleton
17. The Cell Cycle
18. Apoptosis

PART V CELLS IN THEIR SOCIAL CONTEXT
19. Cell Junctions, Cell Adhesion, and the Extracellular Matrix
20. Cancer
Chapters 21-25 available on Media DVD-ROM
21. Sexual Reproduction: Meiosis, Germ Cells, and Fertilization
22. Development of Multicellular Organisms
23. Specialized Tissues, Stem Cells, and Tissue Renewal
24. Pathogens, Infection, and Innate Immunity
25. The Adaptive Immune System

Glossary
Index
Tables: The Genetic Code, Amino Acids
Special Features

Table 1-1 Some Genomes That Have Been Completely Sequenced
Table 1-2 The Numbers of Gene Families, Classified by Function, That Are Common to All Three Domains of the Living World
Table 2-1 Covalent and Noncovalent Chemical Bonds
Table 2-2 The Types of Molecules That Form a Bacterial Cell
Table 2-3 Approximate Chemical Compositions of a Typical Bacterium and a Typical Mammalian Cell
Table 2-4 Relationship Between the Standard Free-Energy Change, ΔG°, and the Equilibrium Constant
Panel 2-1 Chemical Bonds and Groups Commonly Encountered in Biological Molecules
Panel 2-2 Water and Its Influence on the Behavior of Biological Molecules
Panel 2-3 The Principal Types of Weak Noncovalent Bonds that Hold Macromolecules Together
Panel 2-4 An Outline of Some of the Types of Sugars Commonly Found in Cells
Panel 2-5 Fatty Acids and Other Lipids
Panel 2-6 A Survey of the Nucleotides
Panel 2-7 Free Energy and Biological Reactions
Panel 2-8 Details of the 10 Steps of Glycolysis
Panel 2-9 The Complete Citric Acid Cycle
Panel 3-1 The 20 Amino Acids Found in Proteins
Panel 3-2 Four Different Ways of Depicting a Small Protein, the SH2 Domain
Table 3-1 Some Common Types of Enzymes
Panel 3-3 Some of the Methods Used to Study Enzymes
Table 4-1 Some Vital Statistics for the Human Genome
Table 5-3 Three Major Classes of Transposable Elements
Table 6-1 Principal Types of RNAs Produced in Cells
Panel 8-1 Review of Classical Genetics
Table 10-1 Approximate Lipid Compositions of Different Cell Membranes
Table 11-1 A Comparison of Ion Concentrations Inside and Outside a Typical Mammalian Cell
Panel 11-2 The Derivation of the Nernst Equation
Panel 11-3 Some Classical Experiments on the Squid Giant Axon
Table 12-1 Relative Volumes Occupied by the Major Intracellular Compartments in a Liver Cell (Hepatocyte)
Table 12-2 Relative Amounts of Membrane Types in Two Kinds of Eucaryotic Cells
Table 14-1 Product Yields from the Oxidation of Sugars and Fats
Panel 14-1 Redox Potentials
Table 15-5 The Ras Superfamily of Monomeric GTPases
Panel 16-2 The Polymerization of Actin and Tubulin
Panel 16-3 Accessory Proteins that Control the Assembly and Position of Cytoskeletal Filaments
Table 17-2 Summary of the Major Cell-Cycle Regulatory Proteins
Panel 17-1 The Principal Stages of M Phase (Mitosis and Cytokinesis) in an Animal Cell
Detailed Contents

Chapter 1 Cells and Genomes

THE UNIVERSAL FEATURES OF CELLS ON EARTH
All Cells Store Their Hereditary Information in the Same Linear Chemical Code (DNA)
All Cells Replicate Their Hereditary Information by Templated Polymerization
All Cells Transcribe Portions of Their Hereditary Information into the Same Intermediary Form (RNA)
All Cells Use Proteins as Catalysts
All Cells Translate RNA into Protein in the Same Way
The Fragment of Genetic Information Corresponding to One Protein Is One Gene
Life Requires Free Energy
All Cells Function as Biochemical Factories Dealing with the Same Basic Molecular Building Blocks
All Cells Are Enclosed in a Plasma Membrane Across Which Nutrients and Waste Materials Must Pass
A Living Cell Can Exist with Fewer Than 500 Genes
Summary

THE DIVERSITY OF GENOMES AND THE TREE OF LIFE
Cells Can Be Powered by a Variety of Free Energy Sources
Some Cells Fix Nitrogen and Carbon Dioxide for Others
The Greatest Biochemical Diversity Exists Among Procaryotic Cells
The Tree of Life Has Three Primary Branches: Bacteria, Archaea, and Eucaryotes
Some Genes Evolve Rapidly; Others Are Highly Conserved
Most Bacteria and Archaea Have 1000-6000 Genes
New Genes Are Generated from Preexisting Genes
Gene Duplications Give Rise to Families of Related Genes Within a Single Cell
Genes Can Be Transferred Between Organisms, Both in the Laboratory and in Nature
Sex Results in Horizontal Exchanges of Genetic Information Within a Species
The Function of a Gene Can Often Be Deduced from Its Sequence
More Than 200 Gene Families Are Common to All Three Primary Branches of the Tree of Life
Mutations Reveal the Functions of Genes
Molecular Biologists Have Focused a Spotlight on E. coli
Summary

GENETIC INFORMATION IN EUCARYOTES
Eucaryotic Cells May Have Originated as Predators
Modern Eucaryotic Cells Evolved from a Symbiosis
Eucaryotes Have Hybrid Genomes
Eucaryotic Genomes Are Big
Eucaryotic Genomes Are Rich in Regulatory DNA
The Genome Defines the Program of Multicellular Development
Many Eucaryotes Live as Solitary Cells: the Protists
A Yeast Serves as a Minimal Model Eucaryote
The Expression Levels of All The Genes of An Organism Can Be Monitored Simultaneously
To Make Sense of Cells, We Need Mathematics, Computers, and Quantitative Information
Arabidopsis Has Been Chosen Out of 300,000 Species As a Model Plant
The World of Animal Cells Is Represented By a Worm, a Fly, a Mouse, and a Human
Studies in Drosophila Provide a Key to Vertebrate Development
The Vertebrate Genome Is a Product of Repeated Duplication
Genetic Redundancy Is a Problem for Geneticists, But It Creates Opportunities for Evolving Organisms
The Mouse Serves as a Model for Mammals
Humans Report on Their Own Peculiarities
We Are All Different in Detail
Summary
Problems
References

Chapter 2 Cell Chemistry and Biosynthesis

THE CHEMICAL COMPONENTS OF A CELL
Cells Are Made From a Few Types of Atoms
The Outermost Electrons Determine How Atoms Interact
Covalent Bonds Form by the Sharing of Electrons
There Are Different Types of Covalent Bonds
An Atom Often Behaves as if It Has a Fixed Radius
Water Is the Most Abundant Substance in Cells
Some Polar Molecules Are Acids and Bases
Four Types of Noncovalent Attractions Help Bring Molecules Together in Cells
A Cell Is Formed from Carbon Compounds
Cells Contain Four Major Families of Small Organic Molecules
Sugars Provide an Energy Source for Cells and Are the Subunits of Polysaccharides
Fatty Acids Are Components of Cell Membranes, as Well as a Source of Energy
Amino Acids Are the Subunits of Proteins
Nucleotides Are the Subunits of DNA and RNA
The Chemistry of Cells Is Dominated by Macromolecules with Remarkable Properties
Noncovalent Bonds Specify Both the Precise Shape of a Macromolecule and its Binding to Other Molecules
Summary

CATALYSIS AND THE USE OF ENERGY BY CELLS
Cell Metabolism Is Organized by Enzymes
Biological Order Is Made Possible by the Release of Heat Energy from Cells
Photosynthetic Organisms Use Sunlight to Synthesize Organic Molecules
Cells Obtain Energy by the Oxidation of Organic Molecules
Oxidation and Reduction Involve Electron Transfers
Enzymes Lower the Barriers That Block Chemical Reactions
How Enzymes Find Their Substrates: The Enormous Rapidity of Molecular Motions
The Free-Energy Change for a Reaction Determines Whether It Can Occur
The Concentration of Reactants Influences the Free-Energy Change and a Reaction's Direction
For Sequential Reactions, ΔG° Values Are Additive
Activated Carrier Molecules Are Essential for Biosynthesis
The Formation of an Activated Carrier Is Coupled to an Energetically Favorable Reaction
ATP Is the Most Widely Used Activated Carrier Molecule
Energy Stored in ATP Is Often Harnessed to Join Two Molecules Together
NADH and NADPH Are Important Electron Carriers
There Are Many Other Activated Carrier Molecules in Cells
The Synthesis of Biological Polymers Is Driven by ATP Hydrolysis
Summary

HOW CELLS OBTAIN ENERGY FROM FOOD
Glycolysis Is a Central ATP-Producing Pathway
Fermentations Produce ATP in the Absence of Oxygen
Glycolysis Illustrates How Enzymes Couple Oxidation to Energy Storage
Organisms Store Food Molecules in Special Reservoirs
Most Animal Cells Derive Their Energy from Fatty Acids Between Meals
Sugars and Fats Are Both Degraded to Acetyl CoA in Mitochondria
The Citric Acid Cycle Generates NADH by Oxidizing Acetyl Groups to CO2
Electron Transport Drives the Synthesis of the Majority of the ATP in Most Cells
Amino Acids and Nucleotides Are Part of the Nitrogen Cycle
Metabolism Is Organized and Regulated
Summary
Problems
References

Chapter 3 Proteins

THE SHAPE AND STRUCTURE OF PROTEINS
The Shape of a Protein Is Specified by Its Amino Acid Sequence
Proteins Fold into a Conformation of Lowest Energy
The α Helix and the β Sheet Are Common Folding Patterns
Protein Domains Are Modular Units from which Larger Proteins Are Built
Few of the Many Possible Polypeptide Chains Will Be Useful to Cells
Proteins Can Be Classified into Many Families
Sequence Searches Can Identify Close Relatives
Some Protein Domains Form Parts of Many Different Proteins
Certain Pairs of Domains Are Found Together in Many Proteins
The Human Genome Encodes a Complex Set of Proteins, Revealing Much That Remains Unknown
Larger Protein Molecules Often Contain More Than One Polypeptide Chain
Some Proteins Form Long Helical Filaments
Many Protein Molecules Have Elongated, Fibrous Shapes
Many Proteins Contain a Surprisingly Large Amount of Unstructured Polypeptide Chain
Covalent Cross-Linkages Often Stabilize Extracellular Proteins
Protein Molecules Often Serve as Subunits for the Assembly of Large Structures
Many Structures in Cells Are Capable of Self-Assembly
Assembly Factors Often Aid the Formation of Complex Biological Structures
Summary

PROTEIN FUNCTION
All Proteins Bind to Other Molecules
The Surface Conformation of a Protein Determines Its Chemistry
Sequence Comparisons Between Protein Family Members Highlight Crucial Ligand-Binding Sites
Proteins Bind to Other Proteins Through Several Types of Interfaces
Antibody Binding Sites Are Especially Versatile
The Equilibrium Constant Measures Binding Strength
Enzymes Are Powerful and Highly Specific Catalysts
Substrate Binding Is the First Step in Enzyme Catalysis
Enzymes Speed Reactions by Selectively Stabilizing Transition States
Enzymes Can Use Simultaneous Acid and Base Catalysis
Lysozyme Illustrates How an Enzyme Works
Tightly Bound Small Molecules Add Extra Functions to Proteins
Molecular Tunnels Channel Substrates in Enzymes with Multiple Catalytic Sites
Multienzyme Complexes Help to Increase the Rate of Cell Metabolism
The Cell Regulates the Catalytic Activities of its Enzymes
Allosteric Enzymes Have Two or More Binding Sites That Interact
Two Ligands Whose Binding Sites Are Coupled Must Reciprocally Affect Each Other's Binding
Symmetric Protein Assemblies Produce Cooperative Allosteric Transitions
The Allosteric Transition in Aspartate Transcarbamoylase Is Understood in Atomic Detail
Many Changes in Proteins Are Driven by Protein Phosphorylation
A Eucaryotic Cell Contains a Large Collection of Protein Kinases and Protein Phosphatases
The Regulation of Cdk and Src Protein Kinases Shows How a Protein Can Function as a Microchip
Proteins That Bind and Hydrolyze GTP Are Ubiquitous Cellular Regulators
Regulatory Proteins Control the Activity of GTP-Binding Proteins by Determining Whether GTP or GDP Is Bound
Large Protein Movements Can Be Generated From Small Ones
Motor Proteins Produce Large Movements in Cells
Membrane-Bound Transporters Harness Energy to Pump Molecules Through Membranes
Proteins Often Form Large Complexes That Function as Protein Machines
Protein Machines with Interchangeable Parts Make Efficient Use of Genetic Information
The Activation of Protein Machines Often Involves Positioning Them at Specific Sites
Many Proteins Are Controlled by Multisite Covalent Modification
A Complex Network of Protein Interactions Underlies Cell Function
Summary
Problems
References

Chapter 4 DNA, Chromosomes, and Genomes

THE STRUCTURE AND FUNCTION OF DNA
A DNA Molecule Consists of Two Complementary Chains of Nucleotides
The Structure of DNA Provides a Mechanism for Heredity
In Eucaryotes, DNA Is Enclosed in a Cell Nucleus
Summary

CHROMOSOMAL DNA AND ITS PACKAGING IN THE CHROMATIN FIBER
Eucaryotic DNA Is Packaged into a Set of Chromosomes
Chromosomes Contain Long Strings of Genes
The Nucleotide Sequence of the Human Genome Shows How Our Genes Are Arranged
Genome Comparisons Reveal Evolutionarily Conserved DNA Sequences
Chromosomes Exist in Different States Throughout the Life of a Cell
Each DNA Molecule That Forms a Linear Chromosome Must Contain a Centromere, Two Telomeres, and Replication Origins
DNA Molecules Are Highly Condensed in Chromosomes
Nucleosomes Are a Basic Unit of Eucaryotic Chromosome Structure
The Structure of the Nucleosome Core Particle Reveals How DNA Is Packaged
Nucleosomes Have a Dynamic Structure, and Are Frequently Subjected to Changes Catalyzed by ATP-Dependent Chromatin Remodeling Complexes
Nucleosomes Are Usually Packed Together into a Compact Chromatin Fiber
Summary

THE REGULATION OF CHROMATIN STRUCTURE
Some Early Mysteries Concerning Chromatin Structure
Heterochromatin Is Highly Organized and Unusually Resistant to Gene Expression
The Core Histones Are Covalently Modified at Many Different Sites
Chromatin Acquires Additional Variety through the Site-Specific Insertion of a Small Set of Histone Variants
The Covalent Modifications and the Histone Variants Act in Concert to Produce a "Histone Code" That Helps to Determine Biological Function
A Complex of Code-Reader and Code-Writer Proteins Can Spread Specific Chromatin Modifications for Long Distances Along a Chromosome
Barrier DNA Sequences Block the Spread of Reader-Writer Complexes and Thereby Separate Neighboring Chromatin Domains
The Chromatin in Centromeres Reveals How Histone Variants Can Create Special Structures
Chromatin Structures Can Be Directly Inherited
Chromatin Structures Add Unique Features to Eucaryotic Chromosome Function
Summary

THE GLOBAL STRUCTURE OF CHROMOSOMES
Chromosomes Are Folded into Large Loops of Chromatin
Polytene Chromosomes Are Uniquely Useful for Visualizing Chromatin Structures
There Are Multiple Forms of Heterochromatin
Chromatin Loops Decondense When the Genes Within Them Are Expressed
Chromatin Can Move to Specific Sites Within the Nucleus to Alter Their Gene Expression
Networks of Macromolecules Form a Set of Distinct Biochemical Environments inside the Nucleus
Mitotic Chromosomes Are Formed from Chromatin in Its Most Condensed State
Summary

HOW GENOMES EVOLVE
Genome Alterations Are Caused by Failures of the Normal Mechanisms for Copying and Maintaining DNA
The Genome Sequences of Two Species Differ in Proportion to the Length of Time That They Have Separately Evolved
Phylogenetic Trees Constructed from a Comparison of DNA Sequences Trace the Relationships of All Organisms
A Comparison of Human and Mouse Chromosomes Shows How the Structures of Genomes Diverge
The Size of a Vertebrate Genome Reflects the Relative Rates of DNA Addition and DNA Loss in a Lineage
We Can Reconstruct the Sequence of Some Ancient Genomes
Multispecies Sequence Comparisons Identify Important DNA Sequences of Unknown Function
Accelerated Changes in Previously Conserved Sequences Can Help Decipher Critical Steps in Human Evolution
Gene Duplication Provides an Important Source of Genetic Novelty During Evolution
Duplicated Genes Diverge
The Evolution of the Globin Gene Family Shows How DNA Duplications Contribute to the Evolution of Organisms
Genes Encoding New Proteins Can Be Created by the Recombination of Exons
Neutral Mutations Often Spread to Become Fixed in a Population, with a Probability that Depends on Population Size
A Great Deal Can Be Learned from Analyses of the Variation Among Humans
Summary
Problems
References

Chapter 5 DNA Replication, Repair, and Recombination

THE MAINTENANCE OF DNA SEQUENCES
Mutation Rates Are Extremely Low
Low Mutation Rates Are Necessary for Life as We Know It
Summary

DNA REPLICATION MECHANISMS
Base-Pairing Underlies DNA Replication and DNA Repair
The DNA Replication Fork Is Asymmetrical
The High Fidelity of DNA Replication Requires Several Proofreading Mechanisms
Only DNA Replication in the 5'-to-3' Direction Allows Efficient Error Correction
A Special Nucleotide-Polymerizing Enzyme Synthesizes Short RNA Primer Molecules on the Lagging Strand
Special Proteins Help to Open Up the DNA Double Helix in Front of the Replication Fork
A Sliding Ring Holds a Moving DNA Polymerase onto the DNA
The Proteins at a Replication Fork Cooperate to Form a Replication Machine
A Strand-Directed Mismatch Repair System Removes Replication Errors That Escape from the Replication Machine
DNA Topoisomerases Prevent DNA Tangling During Replication
DNA Replication Is Fundamentally Similar in Eucaryotes and Bacteria
Summary

THE INITIATION AND COMPLETION OF DNA REPLICATION IN CHROMOSOMES
DNA Synthesis Begins at Replication Origins
Bacterial Chromosomes Typically Have a Single Origin of DNA Replication
Eucaryotic Chromosomes Contain Multiple Origins of Replication
In Eucaryotes DNA Replication Takes Place During Only One Part of the Cell Cycle
Different Regions on the Same Chromosome Replicate at Distinct Times in S Phase
Highly Condensed Chromatin Replicates Late, While Genes in Less Condensed Chromatin Tend to Replicate Early
Well-Defined DNA Sequences Serve as Replication Origins in a Simple Eucaryote, the Budding Yeast
A Large Multisubunit Complex Binds to Eucaryotic Origins of Replication
The Mammalian DNA Sequences That Specify the Initiation of Replication Have Been Difficult to Identify
New Nucleosomes Are Assembled Behind the Replication Fork
The Mechanisms of Eucaryotic Chromosome Duplication Ensure That Patterns of Histone Modification Can Be Inherited
Telomerase Replicates the Ends of Chromosomes
Telomere Length Is Regulated by Cells and Organisms
Summary

DNA REPAIR
Without DNA Repair, Spontaneous DNA Damage Would Rapidly Change DNA Sequences
The DNA Double Helix Is Readily Repaired
DNA Damage Can Be Removed by More Than One Pathway
Coupling DNA Repair to Transcription Ensures That the Cell's Most Important DNA Is Efficiently Repaired
The Chemistry of the DNA Bases Facilitates Damage Detection
Special DNA Polymerases Are Used in Emergencies to Repair DNA
Double-Strand Breaks Are Efficiently Repaired
DNA Damage Delays Progression of the Cell Cycle
Summary

HOMOLOGOUS RECOMBINATION
Homologous Recombination Has Many Uses in the Cell
Homologous Recombination Has Common Features in All Cells
DNA Base-Pairing Guides Homologous Recombination
The RecA Protein and its Homologs Enable a DNA Single Strand to Pair with a Homologous Region of DNA Double Helix
Branch Migration Can Either Enlarge Heteroduplex Regions or Release Newly Synthesized DNA as a Single Strand
Homologous Recombination Can Flawlessly Repair Double-Stranded Breaks in DNA
Cells Carefully Regulate the Use of Homologous Recombination in DNA Repair
Holliday Junctions Are Often Formed During Homologous Recombination Events
Meiotic Recombination Begins with a Programmed Double-Strand Break
Homologous Recombination Often Results in Gene Conversion
Mismatch Proofreading Prevents Promiscuous Recombination Between Two Poorly Matched DNA Sequences
Summary

TRANSPOSITION AND CONSERVATIVE SITE-SPECIFIC RECOMBINATION
Through Transposition, Mobile Genetic Elements Can Insert Into Any DNA Sequence
DNA-Only Transposons Move by Both Cut-and-Paste and Replicative Mechanisms
Some Viruses Use a Transposition Mechanism to Move Themselves into Host Cell Chromosomes
Retroviral-like Retrotransposons Resemble Retroviruses, but Lack a Protein Coat
A Large Fraction of the Human Genome Is Composed of Nonretroviral Retrotransposons
Different Transposable Elements Predominate in Different Organisms
Genome Sequences Reveal the Approximate Times that Transposable Elements Have Moved
Conservative Site-Specific Recombination Can Reversibly Rearrange DNA
Conservative Site-Specific Recombination Was Discovered in Bacteriophage λ
Conservative Site-Specific Recombination Can Be Used to Turn Genes On or Off
Summary
Problems
References
Chapter6 How CellsReadthe Genome:From DNAto Protein
329
FROM DNATO RNA
331
Portionsof DNASequence AreTranscribed into RNA 552 Transcription Produces RNAComplementary to OneStrandof DNA 5 5 5 CellsProduceSeveral Typesof RNA 335 SignalsEncodedin DNATellRNApolymerase Whereto Startand Stop 336 Transcription Startand StopSignalsAreHeterogeneous in NucleotideSequence 338 Transcription Initiationin Eucaryotes Requires Manyproteins 339 RNAPolymerase ll Requires GeneralTranscription Factors 340 Polymerase ll AlsoRequires Activator, Mediator, andChromatinModifyingProteins 342 Transcription Elongation Produces Superhelical Tension in DNA 343 Transcription Elongationin Eucaryotes lsTightlyCoupledto RNA Processing 345 pre-mRNAs 346 RNACappinglsthe FirstModification of Eucaryotic RNASplicingRemoves IntronSequences from NewlyTranscribed Pre-mRNAs 347 Nucleotide Sequences SignalWhereSplicing Occurs 349 RNASplicingls Performedby the Spliceosome 349 TheSpliceosome UsesATPHydrolysis to producea ComplexSeries of RNA-RNA Rearrangements 351 OtherProperties of Pre-mRNA and lts Synthesis Helpto Explain the Choiceof ProperSpliceSites 352 A Second5et of snRNPs Splicea SmallFractionof IntronSequences in Animals andPlants 353 plasticity RNASplicingShowsRemarkable J55 Spliceosome-Catalyzed RNASplicingprobablyEvolvedfrom Self-Splicing Mechanisms 355 RNA-Processing Enzymes Generate the 3, Endof Eucaryotic mRNAs5 t / MatureEucaryotic mRNAsAreSelectively Exportedfrom tne Nucleus 358 ManyNoncodingRNAsAreAlsoSynthesized and processed in the Nucleus 360 TheNucleolus ls a Ribosome-producing Factory 502 TheNucleusContainsa Varietyof Subnuclear Structures 50J Summarv 366

FROM RNA TO PROTEIN 366
An mRNA Sequence Is Decoded in Sets of Three Nucleotides 367
tRNA Molecules Match Amino Acids to Codons in mRNA 368
tRNAs Are Covalently Modified Before They Exit from the Nucleus 369
Specific Enzymes Couple Each Amino Acid to Its Appropriate tRNA Molecule 370
Editing by RNA Synthetases Ensures Accuracy 371
Amino Acids Are Added to the C-terminal End of a Growing Polypeptide Chain 373
The RNA Message Is Decoded in Ribosomes 373
Elongation Factors Drive Translation Forward and Improve Its Accuracy 377
The Ribosome Is a Ribozyme 379
Nucleotide Sequences in mRNA Signal Where to Start Protein Synthesis 379
Stop Codons Mark the End of Translation 381
Proteins Are Made on Polyribosomes 381
There Are Minor Variations in the Standard Genetic Code 382
Inhibitors of Procaryotic Protein Synthesis Are Useful as Antibiotics 383
Accuracy in Translation Requires the Expenditure of Free Energy 385
Quality Control Mechanisms Act to Prevent Translation of Damaged mRNAs 385
Some Proteins Begin to Fold While Still Being Synthesized 387
Molecular Chaperones Help Guide the Folding of Most Proteins 388
Exposed Hydrophobic Regions Provide Critical Signals for Protein Quality Control 390
The Proteasome Is a Compartmentalized Protease with Sequestered Active Sites 391
An Elaborate Ubiquitin-Conjugating System Marks Proteins for Destruction 393
Many Proteins Are Controlled by Regulated Destruction 395
Abnormally Folded Proteins Can Aggregate to Cause Destructive Human Diseases 396
There Are Many Steps From DNA to Protein
Summary

THE RNA WORLD AND THE ORIGINS OF LIFE 400
Life Requires Stored Information 401
Polynucleotides Can Both Store Information and Catalyze Chemical Reactions 401
A Pre-RNA World May Predate the RNA World 402
Single-Stranded RNA Molecules Can Fold into Highly Elaborate Structures 403
Self-Replicating Molecules Undergo Natural Selection 404
How Did Protein Synthesis Evolve? 407
All Present-Day Cells Use DNA as Their Hereditary Material 408
Summary 408
Problems 409
References 410

Chapter 7 Control of Gene Expression 411

AN OVERVIEW OF GENE CONTROL 411
The Different Cell Types of a Multicellular Organism Contain the Same DNA 411
Different Cell Types Synthesize Different Sets of Proteins 412
External Signals Can Cause a Cell to Change the Expression of Its Genes 413
Gene Expression Can Be Regulated at Many of the Steps in the Pathway from DNA to RNA to Protein 415
Summary 415

DNA-BINDING MOTIFS IN GENE REGULATORY PROTEINS 416
Gene Regulatory Proteins Were Discovered Using Bacterial Genetics 416
The Outside of the DNA Helix Can Be Read by Proteins 416
Short DNA Sequences Are Fundamental Components of Genetic Switches 418
Gene Regulatory Proteins Contain Structural Motifs That Can Read DNA Sequences 418
The Helix-Turn-Helix Motif Is One of the Simplest and Most Common DNA-Binding Motifs 419
Homeodomain Proteins Constitute a Special Class of Helix-Turn-Helix Proteins 420
There Are Several Types of DNA-Binding Zinc Finger Motifs 421
β Sheets Can Also Recognize DNA 422
Some Proteins Use Loops That Enter the Major and Minor Groove to Recognize DNA 423
The Leucine Zipper Motif Mediates Both DNA Binding and Protein Dimerization 423
Heterodimerization Expands the Repertoire of DNA Sequences That Gene Regulatory Proteins Can Recognize 424
The Helix-Loop-Helix Motif Also Mediates Dimerization and DNA Binding 425
It Is Not Yet Possible to Predict the DNA Sequences Recognized by All Gene Regulatory Proteins 426
A Gel-Mobility Shift Assay Readily Detects Sequence-Specific DNA-Binding Proteins 427
DNA Affinity Chromatography Facilitates the Purification of Sequence-Specific DNA-Binding Proteins 428
The DNA Sequence Recognized by a Gene Regulatory Protein Can Be Determined Experimentally 429
Phylogenetic Footprinting Identifies DNA Regulatory Sequences Through Comparative Genomics 431
Chromatin Immunoprecipitation Identifies Many of the Sites That Gene Regulatory Proteins Occupy in Living Cells 431
Summary 432

HOW GENETIC SWITCHES WORK 432
The Tryptophan Repressor Is a Simple Switch That Turns Genes On and Off in Bacteria 433
Transcriptional Activators Turn Genes On 435
A Transcriptional Activator and a Transcriptional Repressor Control the Lac Operon 435
DNA Looping Occurs During Bacterial Gene Regulation 437
Bacteria Use Interchangeable RNA Polymerase Subunits to Help Regulate Gene Transcription 438
Complex Switches Have Evolved to Control Gene Transcription in Eucaryotes 439
A Eucaryotic Gene Control Region Consists of a Promoter Plus Regulatory DNA Sequences 440
Eucaryotic Gene Activator Proteins Promote the Assembly of RNA Polymerase and the General Transcription Factors at the Startpoint of Transcription 441
Eucaryotic Gene Activator Proteins Also Modify Local Chromatin Structure 442
Gene Activator Proteins Work Synergistically 444
Eucaryotic Gene Repressor Proteins Can Inhibit Transcription in Various Ways 445
Eucaryotic Gene Regulatory Proteins Often Bind DNA Cooperatively 445
Complex Genetic Switches That Regulate Drosophila Development Are Built Up from Smaller Modules 447
The Drosophila Eve Gene Is Regulated by Combinatorial Controls 448
Complex Mammalian Gene Control Regions Are Also Constructed from Simple Regulatory Modules 450
Insulators Are DNA Sequences That Prevent Eucaryotic Gene Regulatory Proteins from Influencing Distant Genes 452
Gene Switches Rapidly Evolve 453
Summary 453

THE MOLECULAR GENETIC MECHANISMS THAT CREATE SPECIALIZED CELL TYPES 454
DNA Rearrangements Mediate Phase Variation in Bacteria 454
A Set of Gene Regulatory Proteins Determines Cell Type in a Budding Yeast 455
Two Proteins That Repress Each Other's Synthesis Determine the Heritable State of Bacteriophage Lambda
Simple Gene Regulatory Circuits Can Be Used to Make Memory Devices 458
Transcriptional Circuits Allow the Cell to Carry Out Logic Operations 459
Synthetic Biology Creates New Devices from Existing Biological Parts 460
Circadian Clocks Are Based on Feedback Loops in Gene Regulation 460
A Single Gene Regulatory Protein Can Coordinate the Expression of a Set of Genes
Expression of a Critical Gene Regulatory Protein Can Trigger the Expression of a Whole Battery of Downstream Genes
Combinatorial Gene Control Creates Many Different Cell Types in Eucaryotes 464
A Single Gene Regulatory Protein Can Trigger the Formation of an Entire Organ 465
The Pattern of DNA Methylation Can Be Inherited When Vertebrate Cells Divide 467
Genomic Imprinting Is Based on DNA Methylation 468
CG-Rich Islands Are Associated with Many Genes in Mammals 470
Epigenetic Mechanisms Ensure That Stable Patterns of Gene Expression Can Be Transmitted to Daughter Cells 471
Chromosome-Wide Alterations in Chromatin Structure Can Be Inherited 473
The Control of Gene Expression is Intrinsically Noisy 476
Summary 477

POST-TRANSCRIPTIONAL CONTROLS 477
Transcription Attenuation Causes the Premature Termination of Some RNA Molecules 477
Riboswitches Might Represent Ancient Forms of Gene Control 478
Alternative RNA Splicing Can Produce Different Forms of a Protein from the Same Gene 479
The Definition of a Gene Has Had to Be Modified Since the Discovery of Alternative RNA Splicing 480
Sex Determination in Drosophila Depends on a Regulated Series of RNA Splicing Events 481
A Change in the Site of RNA Transcript Cleavage and Poly-A Addition Can Change the C-terminus of a Protein 482
RNA Editing Can Change the Meaning of the RNA Message 483
RNA Transport from the Nucleus Can Be Regulated 485
Some mRNAs Are Localized to Specific Regions of the Cytoplasm 486
The 5′ and 3′ Untranslated Regions of mRNAs Control Their Translation 488
The Phosphorylation of an Initiation Factor Regulates Protein Synthesis Globally 488
Initiation at AUG Codons Upstream of the Translation Start Can Regulate Eucaryotic Translation Initiation 489
Internal Ribosome Entry Sites Provide Opportunities for Translation Control 491
Changes in mRNA Stability Can Regulate Gene Expression 492
Cytoplasmic Poly-A Addition Can Regulate Translation 493
Small Noncoding RNA Transcripts Regulate Many Animal and Plant Genes 493
RNA Interference Is a Cell Defense Mechanism 495
RNA Interference Can Direct Heterochromatin Formation 496
RNA Interference Has Become a Powerful Experimental Tool 497
Summary 497
Problems 497
References 499

Chapter 8 Manipulating Proteins, DNA, and RNA 501

ISOLATING CELLS AND GROWING THEM IN CULTURE 501
Cells Can Be Isolated from Intact Tissues 502
Cells Can Be Grown in Culture 502
Eucaryotic Cell Lines Are a Widely Used Source of Homogeneous Cells 505
Embryonic Stem Cells Could Revolutionize Medicine 505
Somatic Cell Nuclear Transplantation May Provide a Way to Generate Personalized Stem Cells 507
Hybridoma Cell Lines Are Factories That Produce Monoclonal Antibodies 508
Summary 510

PURIFYING PROTEINS 510
Cells Can Be Separated into Their Component Fractions 510
Cell Extracts Provide Accessible Systems to Study Cell Functions 511
Proteins Can Be Separated by Chromatography 512
Affinity Chromatography Exploits Specific Binding Sites on Proteins 513
Genetically-Engineered Tags Provide an Easy Way to Purify Proteins 514
Purified Cell-Free Systems Are Required for the Precise Dissection of Molecular Functions
Summary

ANALYZING PROTEINS 517
Proteins Can Be Separated by SDS Polyacrylamide-Gel Electrophoresis 517
Specific Proteins Can Be Detected by Blotting with Antibodies 518
Mass Spectrometry Provides a Highly Sensitive Method for Identifying Unknown Proteins 519
Two-Dimensional Separation Methods are Especially Powerful 521
Hydrodynamic Measurements Reveal the Size and Shape of a Protein Complex 522
Sets of Interacting Proteins Can Be Identified by Biochemical Methods 523
Protein-Protein Interactions Can Also Be Identified by a Two-Hybrid Technique in Yeast 523
Combining Data Derived from Different Techniques Produces Reliable Protein-Interaction Maps 524
Optical Methods Can Monitor Protein Interactions in Real Time 524
Some Techniques Can Monitor Single Molecules 526
Protein Function Can Be Selectively Disrupted with Small Molecules 527
Protein Structure Can Be Determined Using X-Ray Diffraction
NMR Can Be Used to Determine Protein Structure in Solution 529
Protein Sequence and Structure Provide Clues About Protein Function 530
Summary 531

ANALYZING AND MANIPULATING DNA 532
Restriction Nucleases Cut Large DNA Molecules into Fragments 532
Gel Electrophoresis Separates DNA Molecules of Different Sizes 534
Purified DNA Molecules Can Be Specifically Labeled with Radioisotopes or Chemical Markers in vitro 535
Nucleic Acid Hybridization Reactions Provide a Sensitive Way of Detecting Specific Nucleotide Sequences 535
Northern and Southern Blotting Facilitate Hybridization with Electrophoretically Separated Nucleic Acid Molecules 538
Genes Can Be Cloned Using DNA Libraries 540
Two Types of DNA Libraries Serve Different Purposes 541
cDNA Clones Contain Uninterrupted Coding Sequences 544
Genes Can Be Selectively Amplified by PCR 544
Cells Can Be Used As Factories to Produce Specific Proteins 546
Proteins and Nucleic Acids Can Be Synthesized Directly by Chemical Reactions 548
DNA Can Be Rapidly Sequenced 548
Nucleotide Sequences Are Used to Predict the Amino Acid Sequences of Proteins 550
The Genomes of Many Organisms Have Been Fully Sequenced 551
Summary 552

STUDYING GENE EXPRESSION AND FUNCTION 553
Classical Genetics Begins by Disrupting a Cell Process by Random Mutagenesis 553
Genetic Screens Identify Mutants with Specific Abnormalities 556
Mutations Can Cause Loss or Gain of Protein Function
Complementation Tests Reveal Whether Two Mutations Are in the Same Gene or Different Genes 558
Genes Can Be Ordered in Pathways by Epistasis Analysis 558
Genes Identified by Mutations Can Be Cloned 559
Human Genetics Presents Special Problems and Special Opportunities 560
Human Genes Are Inherited in Haplotype Blocks, Which Can Aid in the Search for Mutations That Cause Disease 561
Complex Traits Are Influenced by Multiple Genes 563
Reverse Genetics Begins with a Known Gene and Determines Which Cell Processes Require Its Function 563
Genes Can Be Re-Engineered in Several Ways 564
Engineered Genes Can Be Inserted into the Germ Line of Many Organisms 565
Animals Can Be Genetically Altered
Transgenic Plants Are Important for Both Cell Biology and Agriculture
Large Collections of Tagged Knockouts Provide a Tool for Examining the Function of Every Gene in an Organism 569
RNA Interference Is a Simple and Rapid Way to Test Gene Function 571
Reporter Genes and In Situ Hybridization Reveal When and Where a Gene Is Expressed 572
Expression of Individual Genes Can Be Measured Using Quantitative RT-PCR 573
Microarrays Monitor the Expression of Thousands of Genes at Once 574
Single-Cell Gene Expression Analysis Reveals Biological "Noise" 575
Summary 576
Problems 576
References 579

Chapter 9 Visualizing Cells 579

LOOKING AT CELLS IN THE LIGHT MICROSCOPE 579
The Light Microscope Can Resolve Details 0.2 µm Apart 580
Living Cells Are Seen Clearly in a Phase-Contrast or a Differential-Interference-Contrast Microscope 583
Images Can Be Enhanced and Analyzed by Digital Techniques 583
Intact Tissues Are Usually Fixed and Sectioned before Microscopy 585
Specific Molecules Can Be Located in Cells by Fluorescence Microscopy 586
Antibodies Can Be Used to Detect Specific Molecules 588
Imaging of Complex Three-Dimensional Objects Is Possible with the Optical Microscope 589
The Confocal Microscope Produces Optical Sections by Excluding Out-of-Focus Light 590
Fluorescent Proteins Can Be Used to Tag Individual Proteins in Living Cells and Organisms 592
Protein Dynamics Can Be Followed in Living Cells 593
Light-Emitting Indicators Can Measure Rapidly Changing Intracellular Ion Concentrations 596
Several Strategies Are Available by Which Membrane-Impermeant Substances Can Be Introduced into Cells 597
Light Can Be Used to Manipulate Microscopic Objects As Well As to Image Them 598
Single Molecules Can Be Visualized by Using Total Internal Reflection Fluorescence Microscopy
Individual Molecules Can Be Touched and Moved Using Atomic Force Microscopy 600
Molecules Can Be Labeled with Radioisotopes 600
Radioisotopes Are Used to Trace Molecules in Cells and Organisms 602
Summary 603

LOOKING AT CELLS AND MOLECULES IN THE ELECTRON MICROSCOPE 604
The Electron Microscope Resolves the Fine Structure of the Cell 604
Biological Specimens Require Special Preparation for the Electron Microscope 605
Specific Macromolecules Can Be Localized by Immunogold Electron Microscopy 606
Images of Surfaces Can Be Obtained by Scanning Electron Microscopy 607
Metal Shadowing Allows Surface Features to Be Examined at High Resolution by Transmission Electron Microscopy
Negative Staining and Cryoelectron Microscopy Both Allow Macromolecules to Be Viewed at High Resolution 610
Multiple Images Can Be Combined to Increase Resolution 610
Different Views of a Single Object Can Be Combined to Give a Three-Dimensional Reconstruction 612
Summary 612
Problems 614
References 615

Chapter 10 Membrane Structure 617

THE LIPID BILAYER 617
Phosphoglycerides, Sphingolipids, and Sterols Are the Major Lipids in Cell Membranes 618
Phospholipids Spontaneously Form Bilayers
The Lipid Bilayer Is a Two-Dimensional Fluid
The Fluidity of a Lipid Bilayer Depends on Its Composition 622
Despite Their Fluidity, Lipid Bilayers Can Form Domains of Different Compositions 624
Lipid Droplets Are Surrounded by a Phospholipid Monolayer 625
The Asymmetry of the Lipid Bilayer Is Functionally Important 626
Glycolipids Are Found on the Surface of All Plasma Membranes 628
Summary 629

MEMBRANE PROTEINS 629
Membrane Proteins Can Be Associated with the Lipid Bilayer in Various Ways 629
Lipid Anchors Control the Membrane Localization of Some Signaling Proteins 630
In Most Transmembrane Proteins the Polypeptide Chain Crosses the Lipid Bilayer in an α-Helical Conformation 631
Transmembrane α Helices Often Interact with One Another 632
Some β Barrels Form Large Transmembrane Channels 634
Many Membrane Proteins Are Glycosylated 635
Membrane Proteins Can Be Solubilized and Purified in Detergents 636
Bacteriorhodopsin Is a Light-Driven Proton Pump That Traverses the Lipid Bilayer as Seven α Helices 640
Membrane Proteins Often Function as Large Complexes 642
Many Membrane Proteins Diffuse in the Plane of the Membrane 642
Cells Can Confine Proteins and Lipids to Specific Domains Within a Membrane 645
The Cortical Cytoskeleton Gives Membranes Mechanical Strength and Restrict Membrane Protein Diffusion 646
Summary 648
Problems 648
References 650

Chapter 11 Membrane Transport of Small Molecules and the Electrical Properties of Membranes 651

PRINCIPLES OF MEMBRANE TRANSPORT 651
Protein-Free Lipid Bilayers Are Highly Impermeable to Ions 652
There Are Two Main Classes of Membrane Transport Proteins: Transporters and Channels 652
Active Transport Is Mediated by Transporters Coupled to an Energy Source 653
Summary 654

TRANSPORTERS AND ACTIVE MEMBRANE TRANSPORT 654
Active Transport Can Be Driven by Ion Gradients 656
Transporters in the Plasma Membrane Regulate Cytosolic pH 657
An Asymmetric Distribution of Transporters in Epithelial Cells Underlies the Transcellular Transport of Solutes 658
There Are Three Classes of ATP-Driven Pumps
The Ca2+ Pump Is the Best-Understood P-type ATPase 660
The Plasma Membrane P-type Na+-K+ Pump Establishes the Na+ Gradient Across the Plasma Membrane 661
ABC Transporters Constitute the Largest Family of Membrane Transport Proteins 663
Summary 667

ION CHANNELS AND THE ELECTRICAL PROPERTIES OF MEMBRANES 667
Ion Channels Are Ion-Selective and Fluctuate Between Open and Closed States 667
The Membrane Potential in Animal Cells Depends Mainly on K+ Leak Channels and the K+ Gradient Across the Plasma Membrane 669
The Resting Potential Decays Only Slowly When the Na+-K+ Pump Is Stopped 669
The Three-Dimensional Structure of a Bacterial K+ Channel Shows How an Ion Channel Can Work 671
Aquaporins Are Permeable to Water But Impermeable to Ions 673
The Function of a Neuron Depends on Its Elongated Structure 675
Voltage-Gated Cation Channels Generate Action Potentials in Electrically Excitable Cells 676
Myelination Increases the Speed and Efficiency of Action Potential Propagation in Nerve Cells 678
Patch-Clamp Recording Indicates That Individual Gated Channels Open in an All-or-Nothing Fashion 680
Voltage-Gated Cation Channels Are Evolutionarily and Structurally Related 682
Transmitter-Gated Ion Channels Convert Chemical Signals into Electrical Ones at Chemical Synapses 682
Chemical Synapses Can Be Excitatory or Inhibitory 684
The Acetylcholine Receptors at the Neuromuscular Junction Are Transmitter-Gated Cation Channels 684
Transmitter-Gated Ion Channels Are Major Targets for Psychoactive Drugs 686
Neuromuscular Transmission Involves the Sequential Activation of Five Different Sets of Ion Channels 687
Single Neurons Are Complex Computation Devices 688
Neuronal Computation Requires a Combination of at Least Three Kinds of K+ Channels 689
Long-Term Potentiation (LTP) in the Mammalian Hippocampus Depends on Ca2+ Entry Through NMDA-Receptor Channels 691
Summary 692
Problems 693
References 694

Chapter 12 Intracellular Compartments and Protein Sorting 695

THE COMPARTMENTALIZATION OF CELLS 695
All Eucaryotic Cells Have the Same Basic Set of Membrane-Enclosed Organelles 695
Evolutionary Origins Explain the Topological Relationships of Organelles 697
Proteins Can Move Between Compartments in Different Ways 699
Signal Sequences Direct Proteins to the Correct Cell Address 701
Most Organelles Cannot Be Constructed De Novo: They Require Information in the Organelle Itself 702
Summary 704

THE TRANSPORT OF MOLECULES BETWEEN THE NUCLEUS AND THE CYTOSOL 704
Nuclear Pore Complexes Perforate the Nuclear Envelope 705
Nuclear Localization Signals Direct Nuclear Proteins to the Nucleus 705
Nuclear Import Receptors Bind to Both Nuclear Localization Signals and NPC Proteins 707
Nuclear Export Works Like Nuclear Import, But in Reverse 708
The Ran GTPase Imposes Directionality on Transport Through NPCs 708
Transport Through NPCs Can Be Regulated by Controlling Access to the Transport Machinery 709
During Mitosis the Nuclear Envelope Disassembles 710
Summary 712

THE TRANSPORT OF PROTEINS INTO MITOCHONDRIA AND CHLOROPLASTS 713
Translocation into Mitochondria Depends on Signal Sequences and Protein Translocators 713
Mitochondrial Precursor Proteins Are Imported as Unfolded Polypeptide Chains 715
ATP Hydrolysis and a Membrane Potential Drive Protein Import Into the Matrix Space 716
Bacteria and Mitochondria Use Similar Mechanisms to Insert Porins into their Outer Membrane 717
Transport Into the Inner Mitochondrial Membrane and Intermembrane Space Occurs Via Several Routes 717
Two Signal Sequences Direct Proteins to the Thylakoid Membrane in Chloroplasts 719
Summary 720

PEROXISOMES 721
Peroxisomes Use Molecular Oxygen and Hydrogen Peroxide to Perform Oxidative Reactions 721
A Short Signal Sequence Directs the Import of Proteins into Peroxisomes 722
Summary 723

THE ENDOPLASMIC RETICULUM 723
The ER Is Structurally and Functionally Diverse 724
Signal Sequences Were First Discovered in Proteins Imported into the Rough ER 726
A Signal-Recognition Particle (SRP) Directs ER Signal Sequences to a Specific Receptor in the Rough ER Membrane 727
The Polypeptide Chain Passes Through an Aqueous Pore in the Translocator 730
Translocation Across the ER Membrane Does Not Always Require Ongoing Polypeptide Chain Elongation 731
In Single-Pass Transmembrane Proteins, a Single Internal ER Signal Sequence Remains in the Lipid Bilayer as a Membrane-Spanning α Helix 732
Combinations of Start-Transfer and Stop-Transfer Signals Determine the Topology of Multipass Transmembrane Proteins 734
Translocated Polypeptide Chains Fold and Assemble in the Lumen of the Rough ER 736
Most Proteins Synthesized in the Rough ER Are Glycosylated by the Addition of a Common N-Linked Oligosaccharide 736
Oligosaccharides Are Used as Tags to Mark the State of Protein Folding 738
Improperly Folded Proteins Are Exported from the ER and Degraded in the Cytosol 739
Misfolded Proteins in the ER Activate an Unfolded Protein Response 740
Some Membrane Proteins Acquire a Covalently Attached Glycosylphosphatidylinositol (GPI) Anchor 742
The ER Assembles Most Lipid Bilayers 743
Summary 745
Problems 746
References 748

Chapter 13 Intracellular Vesicular Traffic 749

THE MOLECULAR MECHANISMS OF MEMBRANE TRANSPORT AND THE MAINTENANCE OF COMPARTMENTAL DIVERSITY 750
There Are Various Types of Coated Vesicles 751
The Assembly of a Clathrin Coat Drives Vesicle Formation 754
Not All Coats Form Basket-like Structures
Phosphoinositides Mark Organelles and Membrane Domains 757
Cytoplasmic Proteins Regulate the Pinching-Off and Uncoating of Coated Vesicles 757
Monomeric GTPases Control Coat Assembly 758
Not All Transport Vesicles Are Spherical 760
Rab Proteins Guide Vesicle Targeting 760
SNAREs Mediate Membrane Fusion 762
Interacting SNAREs Need to Be Pried Apart Before They Can Function Again 764
Viral Fusion Proteins and SNAREs May Use Similar Fusion Mechanisms 764
Summary 766

TRANSPORT FROM THE ER THROUGH THE GOLGI APPARATUS 766
Proteins Leave the ER in COPII-Coated Transport Vesicles 767
Only Proteins That Are Properly Folded and Assembled Can Leave the ER 767
Vesicular Tubular Clusters Mediate Transport from the ER to the Golgi Apparatus 768
The Retrieval Pathway to the ER Uses Sorting Signals 769
Many Proteins Are Selectively Retained in the Compartments in Which They Function 771
The Golgi Apparatus Consists of an Ordered Series of Compartments 771
Oligosaccharide Chains Are Processed in the Golgi Apparatus 773
Proteoglycans Are Assembled in the Golgi Apparatus 775
What Is the Purpose of Glycosylation? 776
Transport Through the Golgi Apparatus May Occur by Vesicular Transport or Cisternal Maturation 777
Golgi Matrix Proteins Help Organize the Stack
Summary

TRANSPORT FROM THE TRANS GOLGI NETWORK TO LYSOSOMES 779
Lysosomes Are the Principal Sites of Intracellular Digestion 779
Lysosomes Are Heterogeneous 780
Plant and Fungal Vacuoles Are Remarkably Versatile Lysosomes 781
Multiple Pathways Deliver Materials to Lysosomes 782
A Mannose 6-Phosphate Receptor Recognizes Lysosomal Proteins in the Trans Golgi Network 783
The M6P Receptor Shuttles Between Specific Membranes 784
A Signal Patch in the Hydrolase Polypeptide Chain Provides the Cue for M6P Addition 785
Defects in the GlcNAc Phosphotransferase Cause a Lysosomal Storage Disease in Humans 785
Some Lysosomes Undergo Exocytosis 786
Summary 786

TRANSPORT INTO THE CELL FROM THE PLASMA MEMBRANE: ENDOCYTOSIS 787
Specialized Phagocytic Cells Can Ingest Large Particles 787
Pinocytic Vesicles Form from Coated Pits in the Plasma Membrane 789
Not All Pinocytic Vesicles Are Clathrin-Coated 790
Cells Use Receptor-Mediated Endocytosis to Import Selected Extracellular Macromolecules 791
Endocytosed Materials That Are Not Retrieved from Endosomes End Up in Lysosomes 792
Specific Proteins Are Retrieved from Early Endosomes and Returned to the Plasma Membrane 793
Multivesicular Bodies Form on the Pathway to Late Endosomes 795
Transcytosis Transfers Macromolecules Across Epithelial Cell Sheets 797
Epithelial Cells Have Two Distinct Early Endosomal Compartments but a Common Late Endosomal Compartment 798
Summary 799

TRANSPORT FROM THE TRANS GOLGI NETWORK TO THE CELL EXTERIOR: EXOCYTOSIS 799
Many Proteins and Lipids Seem to Be Carried Automatically from the Golgi Apparatus to the Cell Surface 800
Secretory Vesicles Bud from the Trans Golgi Network 801
Proteins Are Often Proteolytically Processed During the Formation of Secretory Vesicles 803
Secretory Vesicles Wait Near the Plasma Membrane Until Signaled to Release Their Contents 803
Regulated Exocytosis Can Be a Localized Response of the Plasma Membrane and Its Underlying Cytoplasm 804
Secretory Vesicle Membrane Components Are Quickly Removed from the Plasma Membrane 805
Some Regulated Exocytosis Events Serve to Enlarge the Plasma Membrane 805
Polarized Cells Direct Proteins from the Trans Golgi Network to the Appropriate Domain of the Plasma Membrane 805
Different Strategies Guide Membrane Proteins and Lipids Selectively to the Correct Plasma Membrane Domains 806
Synaptic Vesicles Can Form Directly from Endocytic Vesicles 807
Summary 809
Problems 810
References 812

Chapter 14 Energy Conversion: Mitochondria and Chloroplasts 813

THE MITOCHONDRION 815
The Mitochondrion Contains an Outer Membrane, an Inner Membrane, and Two Internal Compartments 816
The Citric Acid Cycle Generates High-Energy Electrons 817
A Chemiosmotic Process Converts Oxidation Energy into ATP 817
NADH Transfers its Electrons to Oxygen Through Three Large Respiratory Enzyme Complexes 819
As Electrons Move Along the Respiratory Chain, Energy Is Stored as an Electrochemical Proton Gradient Across the Inner Membrane 820
The Proton Gradient Drives ATP Synthesis 821
The Proton Gradient Drives Coupled Transport Across the Inner Membrane 822
Proton Gradients Produce Most of the Cell's ATP 822
Mitochondria Maintain a High ATP:ADP Ratio in Cells 823
A Large Negative Value of ΔG for ATP Hydrolysis Makes ATP Useful to the Cell 824
ATP Synthase Can Function in Reverse to Hydrolyze ATP and Pump H+ 826
Summary 827

ELECTRON-TRANSPORT CHAINS AND THEIR PROTON PUMPS 827
Protons Are Unusually Easy to Move 827
The Redox Potential Is a Measure of Electron Affinities 828
Electron Transfers Release Large Amounts of Energy 829
Spectroscopic Methods Identified Many Electron Carriers in the Respiratory Chain 829
The Respiratory Chain Includes Three Large Enzyme Complexes Embedded in the Inner Membrane 831
An Iron-Copper Center in Cytochrome Oxidase Catalyzes Efficient O2 Reduction 832
Electron Transfers in the Inner Mitochondrial Membrane Are Mediated by Electron Tunneling during Random Collisions 834
A Large Drop in Redox Potential Across Each of the Three Respiratory Enzyme Complexes Provides the Energy for H+ Pumping 835
The H+ Pumping Occurs by Distinct Mechanisms in the Three Major Enzyme Complexes 835
H+ Ionophores Uncouple Electron Transport from ATP Synthesis 836
Respiratory Control Normally Restrains Electron Flow Through the Chain 837
Natural Uncouplers Convert the Mitochondria in Brown Fat into Heat-Generating Machines 838
The Mitochondrion Plays Many Critical Roles in Cell Metabolism 838
Bacteria Also Exploit Chemiosmotic Mechanisms to Harness Energy 839
Summary 840

CHLOROPLASTS AND PHOTOSYNTHESIS 840
The Chloroplast Is One Member of the Plastid Family of Organelles 841
Chloroplasts Resemble Mitochondria But Have an Extra Compartment 842
Chloroplasts Capture Energy from Sunlight and Use It to Fix Carbon 843
Carbon Fixation Is Catalyzed by Ribulose Bisphosphate Carboxylase 844
Each CO2 Molecule That Is Fixed Consumes Three Molecules of ATP and Two Molecules of NADPH 845
Carbon Fixation in Some Plants Is Compartmentalized to Facilitate Growth at Low CO2 Concentrations 846
Photosynthesis Depends on the Photochemistry of Chlorophyll Molecules 847
A Photochemical Reaction Center Plus an Antenna Complex Form a Photosystem 848
In a Reaction Center, Light Energy Captured by Chlorophyll Creates a Strong Electron Donor from a Weak One 849
Noncyclic Photophosphorylation Produces Both NADPH and ATP 850
Chloroplasts Can Make ATP by Cyclic Photophosphorylation Without Making NADPH 853
Photosystems I and II Have Related Structures, and Also Resemble Bacterial Photosystems 853
The Proton-Motive Force Is the Same in Mitochondria and Chloroplasts 853
Carrier Proteins in the Chloroplast Inner Membrane Control Metabolite Exchange with the Cytosol 854
Chloroplasts Also Perform Other Crucial Biosyntheses 855
Summary 855

THE GENETIC SYSTEMS OF MITOCHONDRIA AND PLASTIDS 855
Mitochondria and Chloroplasts Contain Complete Genetic Systems 856
Organelle Growth and Division Determine the Number of Mitochondria and Plastids in a Cell 857
Mitochondria and Chloroplasts Have Diverse Genomes 859
Mitochondria and Chloroplasts Probably Both Evolved from Endosymbiotic Bacteria 859
Mitochondria Have a Relaxed Codon Usage and Can Have a Variant Genetic Code 861
Animal Mitochondria Contain the Simplest Genetic Systems Known 862
Some Organelle Genes Contain Introns 863
The Chloroplast Genome of Higher Plants Contains About 120 Genes 863
Mitochondrial Genes Are Inherited by a Non-Mendelian Mechanism 864
Organelle Genes Are Maternally Inherited in Many Organisms 866
Petite Mutants in Yeasts Demonstrate the Overwhelming Importance of the Cell Nucleus for Mitochondrial Biogenesis 866
Mitochondria and Plastids Contain Tissue-Specific Proteins that Are Encoded in the Cell Nucleus 867
Mitochondria Import Most of Their Lipids; Chloroplasts Make Most of Theirs 867
Mitochondria May Contribute to the Aging of Cells and Organisms 868
Why Do Mitochondria and Chloroplasts Have Their Own Genetic Systems? 868
Summary 870

THE EVOLUTION OF ELECTRON-TRANSPORT CHAINS 870
The Earliest Cells Probably Used Fermentation to Produce ATP 870
Electron-Transport Chains Enabled Anaerobic Bacteria to Use Nonfermentable Molecules as Their Major Source of Energy 871
By Providing an Inexhaustible Source of Reducing Power, Photosynthetic Bacteria Overcame a Major Evolutionary Obstacle 872
The Photosynthetic Electron-Transport Chains of Cyanobacteria Produced Atmospheric Oxygen and Permitted New Life-Forms 873
Summary 875
Problems 877
References 878

Chapter 15 Mechanisms of Cell Communication 879
879 OF CELLCOMMUNICATION PRINCIPLES GENERAL 880 Receptors Bindto Specific SignalMolecules Extracellular CanAct OverEitherShortor Long SignalMolecules Extracellular 881 Distances Cellsto ShareSignaling AllowNeighboring GapJunctions 884 lnformation of Combinations to Specific to Respond EachCellls Programmed 884 SignalMolecule5 Extracellular to the Same DifferentTypesof CellsUsuallyRespondDifferently 885 SignalMolecule Extracellular CellsDependson TheirPositionin TheFateof SomeDeveloping 886 MorphogenGradients Molecule of an lntracellular A CellCanAlterthe Concentration 886 QuicklyOnlylf the Lifetimeof the Moleculels Short the Activityof NitricOxideGasSignalsby DirectlyRegulating 887 SpecificProteinsInsidetheTargetCell GeneRegulatory AreLigand-Modulated NuclearReceptors 889 Proteins ProteinsArelonReceptor of Cell-Surface TheThreeLargestClasses and Enzyme-Coupled G-Protein-Coupled, Channel-Coupled, 891 Receptors ViaSmall RelaySignals Receptors MostActivatedCell-Surface 893 SignalingProteins and a Networkof Intracellular Molecules Switches asMolecular Function Proteins Signaling ManyIntracellular 895 or GTPBinding ThatAreActivatedby Phosphorylation the Speed,Efficiency, Enhance Complexes Signaling Intracellular 897 ofthe Response and Specificity Between ModularInteractionDomainsMediatelnteractions 897 SignalingProteins Intracellular Abruptlyto to Respond CellsCanUseMultipleMechanisms Signal 899 ofan Extracellular Concentration Increasing a Gradually MakeUseof Usually Networks Signaling Intracellular 901 Loops Feedback 902 to a Signal Sensitivity CellsCanAdjustTheir 903 Summary
SIGNALING THROUGH G-PROTEIN-COUPLED CELL-SURFACE RECEPTORS (GPCRs) AND SMALL INTRACELLULAR MEDIATORS 904
Trimeric G Proteins Relay Signals from GPCRs 905
Some G Proteins Regulate the Production of Cyclic AMP 905
Cyclic-AMP-Dependent Protein Kinase (PKA) Mediates Most of the Effects of Cyclic AMP 908
Some G Proteins Activate An Inositol Phospholipid Signaling Pathway by Activating Phospholipase C-β 909
Ca2+ Functions as a Ubiquitous Intracellular Mediator 912
The Frequency of Ca2+ Oscillations Influences a Cell's Response 912
Ca2+/Calmodulin-Dependent Protein Kinases (CaM-Kinases) Mediate Many of the Responses to Ca2+ Signals in Animal Cells 914
Some G Proteins Directly Regulate Ion Channels 916
Smell and Vision Depend on GPCRs That Regulate Cyclic-Nucleotide-Gated Ion Channels 917
Intracellular Mediators and Enzymatic Cascades Amplify Extracellular Signals 919
GPCR Desensitization Depends on Receptor Phosphorylation 920
Summary 921
SIGNALING THROUGH ENZYME-COUPLED CELL-SURFACE RECEPTORS 921
Activated Receptor Tyrosine Kinases (RTKs) Phosphorylate Themselves 922
Phosphorylated Tyrosines on RTKs Serve as Docking Sites for Intracellular Signaling Proteins 923
Proteins with SH2 Domains Bind to Phosphorylated Tyrosines 924
Ras Belongs to a Large Superfamily of Monomeric GTPases 926
RTKs Activate Ras Via Adaptors and GEFs: Evidence from the Developing Drosophila Eye 927
Ras Activates a MAP Kinase Signaling Module 928
Scaffold Proteins Help Prevent Cross-Talk Between Parallel MAP Kinase Modules 930
Rho Family GTPases Functionally Couple Cell-Surface Receptors to the Cytoskeleton 931
PI 3-Kinase Produces Lipid Docking Sites in the Plasma Membrane 932
The PI-3-Kinase-Akt Signaling Pathway Stimulates Animal Cells to Survive and Grow 934
The Downstream Signaling Pathways Activated By RTKs and GPCRs Overlap 935
Tyrosine-Kinase-Associated Receptors Depend on Cytoplasmic Tyrosine Kinases 935
Cytokine Receptors Activate the JAK-STAT Signaling Pathway, Providing a Fast Track to the Nucleus 937
Protein Tyrosine Phosphatases Reverse Tyrosine Phosphorylations 938
Signal Proteins of the TGFβ Superfamily Act Through Receptor Serine/Threonine Kinases and Smads 939
Serine/Threonine and Tyrosine Protein Kinases Are Structurally Related 941
Bacterial Chemotaxis Depends on a Two-Component Signaling Pathway Activated by Histidine-Kinase-Associated Receptors 941
Receptor Methylation Is Responsible for Adaptation in Bacterial Chemotaxis 943
Summary 944

SIGNALING PATHWAYS DEPENDENT ON REGULATED PROTEOLYSIS OF LATENT GENE REGULATORY PROTEINS 946
The Receptor Protein Notch Is a Latent Gene Regulatory Protein 946
Wnt Proteins Bind to Frizzled Receptors and Inhibit the Degradation of β-Catenin 948
Hedgehog Proteins Bind to Patched, Relieving Its Inhibition of Smoothened 950
Many Stressful and Inflammatory Stimuli Act Through an NFκB-Dependent Signaling Pathway 952
Summary 954
SIGNALING IN PLANTS 955
Multicellularity and Cell Communication Evolved Independently in Plants and Animals 955
Receptor Serine/Threonine Kinases Are the Largest Class of Cell-Surface Receptors in Plants 956
Ethylene Blocks the Degradation of Specific Gene Regulatory Proteins in the Nucleus 957
Regulated Positioning of Auxin Transporters Patterns Plant Growth 959
Phytochromes Detect Red Light, and Cryptochromes Detect Blue Light 960
Summary 961
Problems
References 964

Chapter 16 The Cytoskeleton 965
THE SELF-ASSEMBLY AND DYNAMIC STRUCTURE OF CYTOSKELETAL FILAMENTS 965
Cytoskeletal Filaments Are Dynamic and Adaptable 966
The Cytoskeleton Can Also Form Stable Structures 969
Each Type of Cytoskeletal Filament Is Constructed from Smaller Protein Subunits 970
Filaments Formed from Multiple Protofilaments Have Advantageous Properties 971
Nucleation Is the Rate-Limiting Step in the Formation of a Cytoskeletal Polymer 973
The Tubulin and Actin Subunits Assemble Head-to-Tail to Create Polar Filaments 973
Microtubules and Actin Filaments Have Two Distinct Ends That Grow at Different Rates 975
Filament Treadmilling and Dynamic Instability Are Consequences of Nucleotide Hydrolysis by Tubulin and Actin 976
Treadmilling and Dynamic Instability Aid Rapid Cytoskeletal Rearrangement 980
Tubulin and Actin Have Been Highly Conserved During Eucaryotic Evolution 982
Intermediate Filament Structure Depends on The Lateral Bundling and Twisting of Coiled Coils 983
Intermediate Filaments Impart Mechanical Stability to Animal Cells 985
Drugs Can Alter Filament Polymerization 987
Bacterial Cell Organization and Cell Division Depend on Homologs of the Eucaryotic Cytoskeleton 989
Summary 991

HOW CELLS REGULATE THEIR CYTOSKELETAL FILAMENTS 992
A Protein Complex Containing γ-Tubulin Nucleates Microtubules 992
Microtubules Emanate from the Centrosome in Animal Cells 992
Actin Filaments Are Often Nucleated at the Plasma Membrane 996
The Mechanism of Nucleation Influences Large-Scale Filament Organization 999
Proteins That Bind to the Free Subunits Modify Filament Elongation 999
Severing Proteins Regulate the Length and Kinetic Behavior of Actin Filaments and Microtubules 1000
Proteins That Bind Along the Sides of Filaments Can Either Stabilize or Destabilize Them 1001
Proteins That Interact with Filament Ends Can Dramatically Change Filament Dynamics 1002
Different Kinds of Proteins Alter the Properties of Rapidly Growing Microtubule Ends 1003
Filaments Are Organized into Higher-Order Structures in Cells 1005
Intermediate Filaments Are Cross-Linked and Bundled Into Strong Arrays 1005
Cross-Linking Proteins with Distinct Properties Organize Different Assemblies of Actin Filaments 1006
Filamin and Spectrin Form Actin Filament Webs
Cytoskeletal Elements Make Many Attachments to Membrane 1009
Summary 1010

MOLECULAR MOTORS 1010
Actin-Based Motor Proteins Are Members of the Myosin Superfamily 1011
There Are Two Types of Microtubule Motor Proteins: Kinesins and Dyneins 1014
The Structural Similarity of Myosin and Kinesin Indicates a Common Evolutionary Origin 1015
Motor Proteins Generate Force by Coupling ATP Hydrolysis to Conformational Changes 1016
Motor Protein Kinetics Are Adapted to Cell Functions 1020
Motor Proteins Mediate the Intracellular Transport of Membrane-Enclosed Organelles 1021
The Cytoskeleton Localizes Specific RNA Molecules 1022
Cells Regulate Motor Protein Function 1023
Summary 1025
THE CYTOSKELETON AND CELL BEHAVIOR 1025
Sliding of Myosin II and Actin Filaments Causes Muscles to Contract 1026
A Sudden Rise in Cytosolic Ca2+ Concentration Initiates Muscle Contraction 1028
Heart Muscle Is a Precisely Engineered Machine 1031
Cilia and Flagella Are Motile Structures Built from Microtubules and Dyneins 1031
Construction of the Mitotic Spindle Requires Microtubule Dynamics and the Interactions of Many Motor Proteins 1034
Many Cells Can Crawl Across A Solid Substratum 1036
Actin Polymerization Drives Plasma Membrane Protrusion 1037
Cell Adhesion and Traction Allow Cells to Pull Themselves Forward 1040
Members of the Rho Protein Family Cause Major Rearrangements of the Actin Cytoskeleton 1041
Extracellular Signals Can Activate the Three Rho Protein Family Members 1043
External Signals Can Dictate the Direction of Cell Migration 1045
Communication Between the Microtubule and Actin Cytoskeletons Coordinates Whole-Cell Polarization and Locomotion 1046
The Complex Morphological Specialization of Neurons Depends on the Cytoskeleton 1047
Summary 1050
Problems 1050
References 1052
Chapter 17 The Cell Cycle 1053

OVERVIEW OF THE CELL CYCLE 1054
The Eucaryotic Cell Cycle Is Divided into Four Phases 1054
Cell-Cycle Control Is Similar in All Eucaryotes 1056
Cell-Cycle Control Can Be Dissected Genetically by Analysis of Yeast Mutants 1056
Cell-Cycle Control Can Be Analyzed Biochemically in Animal Embryos 1057
Cell-Cycle Control Can Be Studied in Cultured Mammalian Cells 1059
Cell-Cycle Progression Can Be Studied in Various Ways 1059
Summary 1060

THE CELL-CYCLE CONTROL SYSTEM 1060
The Cell-Cycle Control System Triggers the Major Events of the Cell Cycle 1060
The Cell-Cycle Control System Depends on Cyclically Activated Cyclin-Dependent Protein Kinases (Cdks) 1062
Inhibitory Phosphorylation and Cdk Inhibitory Proteins (CKIs) Can Suppress Cdk Activity 1063
The Cell-Cycle Control System Depends on Cyclical Proteolysis 1064
Cell-Cycle Control Also Depends on Transcriptional Regulation 1065
The Cell-Cycle Control System Functions as a Network of Biochemical Switches 1065
Summary 1067

S PHASE 1067
S-Cdk Initiates DNA Replication Once Per Cycle 1067
Chromosome Duplication Requires Duplication of Chromatin Structure 1069
Cohesins Help Hold Sister Chromatids Together 1070
Summary 1071
MITOSIS 1071
M-Cdk Drives Entry Into Mitosis 1071
Dephosphorylation Activates M-Cdk at the Onset of Mitosis 1074
Condensin Helps Configure Duplicated Chromosomes for Separation 1075
The Mitotic Spindle Is a Microtubule-Based Machine 1075
Microtubule-Dependent Motor Proteins Govern Spindle Assembly and Function 1077
Two Mechanisms Collaborate in the Assembly of a Bipolar Mitotic Spindle 1077
Centrosome Duplication Occurs Early in the Cell Cycle 1078
M-Cdk Initiates Spindle Assembly in Prophase 1078
The Completion of Spindle Assembly in Animal Cells Requires Nuclear Envelope Breakdown 1079
Microtubule Instability Increases Greatly in Mitosis 1080
Mitotic Chromosomes Promote Bipolar Spindle Assembly 1081
Kinetochores Attach Sister Chromatids to the Spindle 1082
Bi-Orientation Is Achieved by Trial and Error 1083
Multiple Forces Move Chromosomes on the Spindle 1085
The APC/C Triggers Sister-Chromatid Separation and the Completion of Mitosis 1087
Unattached Chromosomes Block Sister-Chromatid Separation: The Spindle Assembly Checkpoint 1088
Chromosomes Segregate in Anaphase A and B 1089
Segregated Chromosomes Are Packaged in Daughter Nuclei at Telophase 1090
Meiosis Is a Special Form of Nuclear Division Involved in Sexual Reproduction 1090
Summary 1092
CYTOKINESIS 1092
Actin and Myosin II in the Contractile Ring Generate the Force for Cytokinesis 1093
Local Activation of RhoA Triggers Assembly and Contraction of the Contractile Ring 1094
The Microtubules of the Mitotic Spindle Determine the Plane of Animal Cell Division 1095
The Phragmoplast Guides Cytokinesis in Higher Plants 1097
Membrane-Enclosed Organelles Must Be Distributed to Daughter Cells During Cytokinesis 1098
Some Cells Reposition Their Spindle to Divide Asymmetrically 1099
Mitosis Can Occur Without Cytokinesis 1099
The G1 Phase Is a Stable State of Cdk Inactivity 1100
Summary 1101

CONTROL OF CELL DIVISION AND CELL GROWTH 1101
Mitogens Stimulate Cell Division 1102
Cells Can Delay Division by Entering a Specialized Nondividing State 1103
Mitogens Stimulate G1-Cdk and G1/S-Cdk Activities 1103
DNA Damage Blocks Cell Division: The DNA Damage Response 1105
Many Human Cells Have a Built-In Limitation on the Number of Times They Can Divide 1107
Abnormal Proliferation Signals Cause Cell-Cycle Arrest or Apoptosis, Except in Cancer Cells 1107
Organism and Organ Growth Depend on Cell Growth 1108
Proliferating Cells Usually Coordinate Their Growth and Division 1108
Neighboring Cells Compete for Extracellular Signal Proteins 1110
Animals Control Total Cell Mass by Unknown Mechanisms 1111
Summary 1112
Problems 1112
References 1113

Chapter 18 Apoptosis 1115
Programmed Cell Death Eliminates Unwanted Cells 1115
Apoptotic Cells Are Biochemically Recognizable 1117
Apoptosis Depends on an Intracellular Proteolytic Cascade That Is Mediated by Caspases 1118
Cell-Surface Death Receptors Activate the Extrinsic Pathway of Apoptosis 1120
The Intrinsic Pathway of Apoptosis Depends on Mitochondria 1121
Bcl2 Proteins Regulate the Intrinsic Pathway of Apoptosis 1121
IAPs Inhibit Caspases 1124
Extracellular Survival Factors Inhibit Apoptosis in Various Ways 1126
Either Excessive or Insufficient Apoptosis Can Contribute to Disease 1127
Summary 1128
Problems 1128
References 1129
Chapter 19 Cell Junctions, Cell Adhesion, and the Extracellular Matrix 1131

CADHERINS AND CELL-CELL ADHESION 1133
Cadherins Mediate Ca2+-Dependent Cell-Cell Adhesion in All Animals 1135
The Cadherin Superfamily in Vertebrates Includes Hundreds of Different Proteins, Including Many with Signaling Functions
Cadherins Mediate Homophilic Adhesion 1137
Selective Cell-Cell Adhesion Enables Dissociated Vertebrate Cells to Reassemble into Organized Tissues 1139
Cadherins Control the Selective Assortment of Cells 1140
Twist Regulates Epithelial-Mesenchymal Transitions 1141
Catenins Link Classical Cadherins to the Actin Cytoskeleton 1142
Adherens Junctions Coordinate the Actin-Based Motility of Adjacent Cells 1142
Desmosome Junctions Give Epithelia Mechanical Strength 1143
Cell-Cell Junctions Send Signals into the Cell Interior 1145
Selectins Mediate Transient Cell-Cell Adhesions in the Bloodstream 1145
Members of the Immunoglobulin Superfamily of Proteins Mediate Ca2+-Independent Cell-Cell Adhesion 1146
Many Types of Cell Adhesion Molecules Act in Parallel to Create a Synapse 1147
Scaffold Proteins Organize Junctional Complexes 1148
Summary 1149

TIGHT JUNCTIONS AND THE ORGANIZATION OF EPITHELIA 1150
Tight Junctions Form a Seal Between Cells and a Fence Between Membrane Domains 1150
Scaffold Proteins in Junctional Complexes Play a Key Part in the Control of Cell Proliferation 1153
Cell-Cell Junctions and the Basal Lamina Govern Apico-Basal Polarity in Epithelia 1155
A Separate Signaling System Controls Planar Cell Polarity 1157
Summary 1158

PASSAGEWAYS FROM CELL TO CELL: GAP JUNCTIONS AND PLASMODESMATA 1158
Gap Junctions Couple Cells Both Electrically and Metabolically 1158
A Gap-Junction Connexon Is Made Up of Six Transmembrane Connexin Subunits 1159
Gap Junctions Have Diverse Functions 1161
Cells Can Regulate the Permeability of Their Gap Junctions 1161
In Plants, Plasmodesmata Perform Many of the Same Functions as Gap Junctions 1162
Summary 1163

THE BASAL LAMINA 1164
Basal Laminae Underlie All Epithelia and Surround Some Nonepithelial Cell Types 1164
Laminin Is a Primary Component of the Basal Lamina 1165
Type IV Collagen Gives the Basal Lamina Tensile Strength 1166
Basal Laminae Have Diverse Functions 1167
Summary 1169

INTEGRINS AND CELL-MATRIX ADHESION 1169
Integrins Are Transmembrane Heterodimers That Link to the Cytoskeleton 1170
Integrins Can Switch Between an Active and an Inactive Conformation 1170
Integrin Defects Are Responsible for Many Different Genetic Diseases 1172
Integrins Cluster to Form Strong Adhesions 1174
Extracellular Matrix Attachments Act Through Integrins to Control Cell Proliferation and Survival 1175
Integrins Recruit Intracellular Signaling Proteins at Sites of Cell-Substratum Adhesion 1176
Integrins Can Produce Localized Intracellular Effects 1177
Summary 1178

THE EXTRACELLULAR MATRIX OF ANIMAL CONNECTIVE TISSUES 1178
The Extracellular Matrix Is Made and Oriented by the Cells Within It 1179
Glycosaminoglycan (GAG) Chains Occupy Large Amounts of Space and Form Hydrated Gels 1179
Hyaluronan Acts as a Space Filler and a Facilitator of Cell Migration During Tissue Morphogenesis and Repair 1180
Proteoglycans Are Composed of GAG Chains Covalently Linked to a Core Protein 1181
Proteoglycans Can Regulate the Activities of Secreted Proteins 1182
Cell-Surface Proteoglycans Act as Co-Receptors 1183
Collagens Are the Major Proteins of the Extracellular Matrix 1184
Collagen Chains Undergo a Series of Post-Translational Modifications 1186
Propeptides Are Clipped Off Procollagen After Its Secretion to Allow Assembly of Fibrils 1187
Secreted Fibril-Associated Collagens Help Organize the Fibrils 1187
Cells Help Organize the Collagen Fibrils They Secrete by Exerting Tension on the Matrix 1189
Elastin Gives Tissues Their Elasticity 1189
Fibronectin Is an Extracellular Protein That Helps Cells Attach to the Matrix 1191
Tension Exerted by Cells Regulates Assembly of Fibronectin Fibrils 1191
Fibronectin Binds to Integrins Through an RGD Motif 1193
Cells Have to Be Able to Degrade Matrix, as Well as Make it 1193
Matrix Degradation Is Localized to the Vicinity of Cells 1194
Summary 1195

THE PLANT CELL WALL 1195
The Composition of the Cell Wall Depends on the Cell Type 1195
The Tensile Strength of the Cell Wall Allows Plant Cells to Develop Turgor Pressure 1197
The Primary Cell Wall Is Built from Cellulose Microfibrils Interwoven with a Network of Pectic Polysaccharides 1197
Oriented Cell-Wall Deposition Controls Plant Cell Growth 1199
Microtubules Orient Cell-Wall Deposition 1200
Summary 1202
Problems 1202
References 1204
Chapter 20 Cancer 1205

CANCER AS A MICROEVOLUTIONARY PROCESS 1205
Cancer Cells Reproduce Without Restraint and Colonize Other Tissues 1206
Most Cancers Derive from a Single Abnormal Cell 1207
Cancer Cells Contain Somatic Mutations 1208
A Single Mutation Is Not Enough to Cause Cancer 1209
Cancers Develop Gradually from Increasingly Aberrant Cells 1210
Cervical Cancers Are Prevented by Early Detection 1211
Tumor Progression Involves Successive Rounds of Random Inherited Change Followed by Natural Selection 1212
The Epigenetic Changes That Accumulate in Cancer Cells Involve Inherited Chromatin Structures and DNA Methylation 1213
Human Cancer Cells Are Genetically Unstable 1214
Cancerous Growth Often Depends on Defective Control of Cell Death, Cell Differentiation, or Both 1215
Cancer Cells Are Usually Altered in Their Responses to DNA Damage and Other Forms of Stress 1216
Human Cancer Cells Escape a Built-In Limit to Cell Proliferation 1217
A Small Population of Cancer Stem Cells Maintains Many Tumors 1217
How Do Cancer Stem Cells Arise? 1218
To Metastasize, Malignant Cancer Cells Must Survive and Proliferate in a Foreign Environment 1220
Tumors Induce Angiogenesis 1220
The Tumor Microenvironment Influences Cancer Development 1222
Many Properties Typically Contribute to Cancerous Growth 1223
Summary 1223
THE PREVENTABLE CAUSES OF CANCER 1224
Many, But Not All, Cancer-Causing Agents Damage DNA 1225
Tumor Initiators Damage DNA; Tumor Promoters Do Not 1226
Viruses and Other Infections Contribute to a Significant Proportion of Human Cancers 1227
Identification of Carcinogens Reveals Ways to Avoid Cancer 1229
Summary 1230
FINDING THE CANCER-CRITICAL GENES 1230
The Identification of Gain-of-Function and Loss-of-Function Mutations Requires Different Methods 1231
Retroviruses Can Act as Vectors for Oncogenes That Alter Cell Behavior 1232
Different Searches for Oncogenes Have Converged on the Same Gene-Ras 1233
Studies of Rare Hereditary Cancer Syndromes First Identified Tumor Suppressor Genes 1234
Tumor Suppressor Genes Can Also Be Identified from Studies of Tumors 1235
Both Genetic and Epigenetic Mechanisms Can Inactivate Tumor Suppressor Genes 1235
Genes Mutated in Cancer Can Be Made Overactive in Many Ways 1237
The Hunt for Cancer-Critical Genes Continues 1239
Summary 1240

THE MOLECULAR BASIS OF CANCER-CELL BEHAVIOR 1240
Studies of Both Developing Embryos and Genetically Engineered Mice Have Helped to Uncover the Function of Cancer-Critical Genes 1241
Many Cancer-Critical Genes Regulate Cell Proliferation 1242
Distinct Pathways May Mediate the Disregulation of Cell-Cycle Progression and the Disregulation of Cell Growth in Cancer Cells 1244
Mutations in Genes That Regulate Apoptosis Allow Cancer Cells to Survive When They Should Not 1245
Mutations in the p53 Gene Allow Many Cancer Cells to Survive and Proliferate Despite DNA Damage 1246
DNA Tumor Viruses Block the Action of Key Tumor Suppressor Proteins 1247
The Changes in Tumor Cells That Lead to Metastasis Are Still Largely a Mystery 1249
Colorectal Cancers Evolve Slowly Via a Succession of Visible Changes 1250
A Few Key Genetic Lesions Are Common to a Large Fraction of Colorectal Cancers 1251
Some Colorectal Cancers Have Defects in DNA Mismatch Repair 1254
The Steps of Tumor Progression Can Often Be Correlated with Specific Mutations 1254
Each Case of Cancer Is Characterized by Its Own Array of Genetic Lesions 1256
Summary 1256

CANCER TREATMENT: PRESENT AND FUTURE 1256
The Search for Cancer Cures Is Difficult but Not Hopeless 1257
Traditional Therapies Exploit the Genetic Instability and Loss of Cell-Cycle Checkpoint Responses in Cancer Cells 1257
New Drugs Can Exploit the Specific Cause of a Tumor's Genetic Instability 1257
Genetic Instability Helps Cancers Become Progressively More Resistant to Therapies 1259
New Therapies Are Emerging from Our Knowledge of Cancer Biology 1260
Small Molecules Can Be Designed to Inhibit Specific Oncogenic Proteins 1260
Tumor Blood Vessels Are Logical Targets for Cancer Therapy 1262
Many Cancers May Be Treatable by Enhancing the Immune Response Against a Specific Tumor 1262
Treating Patients with Several Drugs Simultaneously Has Potential Advantages for Cancer Therapy
Gene Expression Profiling Can Help Classify Cancers into Clinically Meaningful Subgroups 1264
There Is Still Much More to Do 1264
Summary 1265
Problems 1265
References 1267
Chapters 21-25 available on Media DVD-ROM
Chapter 21 Sexual Reproduction: Meiosis, Germ Cells, and Fertilization 1269

OVERVIEW OF SEXUAL REPRODUCTION 1269
The Haploid Phase in Higher Eucaryotes Is Brief 1269
Meiosis Creates Genetic Diversity 1271
Sexual Reproduction Gives Organisms a Competitive Advantage 1271
Summary 1272
MEIOSIS 1272
Gametes Are Produced by Two Meiotic Cell Divisions 1272
Duplicated Homologs (and Sex Chromosomes) Pair During Early Prophase I 1274
Homolog Pairing Culminates in the Formation of a Synaptonemal Complex 1275
Homolog Segregation Depends on Meiosis-Specific, Kinetochore-Associated Proteins 1276
Meiosis Frequently Goes Wrong 1278
Crossing-Over Enhances Genetic Reassortment 1279
Crossing-Over Is Highly Regulated 1280
Meiosis Is Regulated Differently in Male and Female Mammals 1280
Summary 1281

PRIMORDIAL GERM CELLS AND SEX DETERMINATION IN MAMMALS 1282
Signals from Neighbors Specify PGCs in Mammalian Embryos 1282
PGCs Migrate into the Developing Gonads 1283
The Sry Gene Directs the Developing Mammalian Gonad to Become a Testis 1283
Many Aspects of Sexual Reproduction Vary Greatly between Animal Species 1285
Summary 1286
EGGS 1287
An Egg Is Highly Specialized for Independent Development 1287
Eggs Develop in Stages 1288
Oocytes Use Special Mechanisms to Grow to Their Large Size 1290
Most Human Oocytes Die Without Maturing 1291
Summary 1292

SPERM 1292
Sperm Are Highly Adapted for Delivering Their DNA to an Egg 1292
Sperm Are Produced Continuously in the Mammalian Testis 1293
Sperm Develop as a Syncytium 1294
Summary 1296
FERTILIZATION 1297
Ejaculated Sperm Become Capacitated in the Female Genital Tract 1297
Capacitated Sperm Bind to the Zona Pellucida and Undergo an Acrosome Reaction 1298
The Mechanism of Sperm-Egg Fusion Is Still Unknown 1298
Sperm Fusion Activates the Egg by Increasing Ca2+ in the Cytosol 1299
The Cortical Reaction Helps Ensure That Only One Sperm Fertilizes the Egg 1300
The Sperm Provides Centrioles as Well as Its Genome to the Zygote 1301
IVF and ICSI Have Revolutionized the Treatment of Human Infertility 1301
Summary 1303
References 1304
Chapter 22 Development of Multicellular Organisms 1305

UNIVERSAL MECHANISMS OF ANIMAL DEVELOPMENT 1305
Animals Share Some Basic Anatomical Features 1307
Multicellular Animals Are Enriched in Proteins Mediating Cell Interactions and Gene Regulation 1308
Regulatory DNA Defines the Program of Development 1309
Manipulation of the Embryo Reveals the Interactions Between Its Cells 1310
Studies of Mutant Animals Identify the Genes That Control Developmental Processes 131
A Cell Makes Developmental Decisions Long Before It Shows a Visible Change 131
Cells Have Remembered Positional Values That Reflect Their Location in the Body 1312
Inductive Signals Can Create Orderly Differences Between Initially Identical Cells 1313
Sister Cells Can Be Born Different by an Asymmetric Cell Division 1313
Positive Feedback Can Create Asymmetry Where There Was None Before 1314
Positive Feedback Generates Patterns, Creates All-or-None Outcomes, and Provides Memory 1315
A Small Set of Signaling Pathways, Used Repeatedly, Controls Developmental Patterning 1316
Morphogens Are Long-Range Inducers That Exert Graded Effects 1316
Extracellular Inhibitors of Signal Molecules Shape the Response to the Inducer 1317
Developmental Signals Can Spread Through Tissue in Several Different Ways 1318
Programs That Are Intrinsic to a Cell Often Define the Time-Course of its Development 1319
Initial Patterns Are Established in Small Fields of Cells and Refined by Sequential Induction as the Embryo Grows 1319
Summary 1320

CAENORHABDITIS ELEGANS: DEVELOPMENT FROM THE PERSPECTIVE OF THE INDIVIDUAL CELL 1321
Caenorhabditis elegans Is Anatomically Simple 1321
Cell Fates in the Developing Nematode Are Almost Perfectly Predictable 1322
Products of Maternal-Effect Genes Organize the Asymmetric Division of the Egg 1323
Progressively More Complex Patterns Are Created by Cell-Cell Interactions 1324
Microsurgery and Genetics Reveal the Logic of Developmental Control; Gene Cloning and Sequencing Reveal Its Molecular Mechanisms 1325
Cells Change Over Time in Their Responsiveness to Developmental Signals 1325
Heterochronic Genes Control the Timing of Development 1326
Cells Do Not Count Cell Divisions in Timing Their Internal Programs 1327
Selected Cells Die by Apoptosis as Part of the Program of Development 1327
Summary 1328
DROSOPHILA AND THE MOLECULAR GENETICS OF PATTERN FORMATION: GENESIS OF THE BODY PLAN 1328
The Insect Body Is Constructed as a Series of Segmental Units 1329
Drosophila Begins Its Development as a Syncytium 1330
Genetic Screens Define Groups of Genes Required for Specific Aspects of Early Patterning 1332
Interactions of the Oocyte With Its Surroundings Define the Axes of the Embryo: the Role of the Egg-Polarity Genes 1333
The Dorsoventral Signaling Genes Create a Gradient of a Nuclear Gene Regulatory Protein 1334
Dpp and Sog Set Up a Secondary Morphogen Gradient to Refine the Pattern of the Dorsal Part of the Embryo 1336
The Insect Dorsoventral Axis Corresponds to the Vertebrate Ventrodorsal Axis 1336
Three Classes of Segmentation Genes Refine the Anterior-Posterior Maternal Pattern and Subdivide the Embryo 1336
The Localized Expression of Segmentation Genes Is Regulated by a Hierarchy of Positional Signals 1337
The Modular Nature of Regulatory DNA Allows Genes to Have Multiple Independently Controlled Functions 1339
Egg-Polarity, Gap, and Pair-Rule Genes Create a Transient Pattern That Is Remembered by Other Genes 1340
Summary 1341

HOMEOTIC SELECTOR GENES AND THE PATTERNING OF THE ANTEROPOSTERIOR AXIS 1341
The Hox Code Specifies Anterior-Posterior Differences 1342
Homeotic Selector Genes Code for DNA-Binding Proteins That Interact with Other Gene Regulatory Proteins 1342
The Homeotic Selector Genes Are Expressed Sequentially According to Their Order in the Hox Complex 1343
The Hox Complex Carries a Permanent Record of Positional Information 1344
The Anteroposterior Axis Is Controlled by Hox Selector Genes in Vertebrates Also 1344
Summary 1347
ORGANOGENESIS AND THE PATTERNING OF APPENDAGES 1347
Conditional and Induced Somatic Mutations Make it Possible to Analyze Gene Functions Late in Development 1348
Body Parts of the Adult Fly Develop From Imaginal Discs 1349
Homeotic Selector Genes Are Essential for the Memory of Positional Information in Imaginal Disc Cells 1351
Specific Regulatory Genes Define the Cells That Will Form an Appendage 1351
The Insect Wing Disc Is Divided into Compartments 1352
Four Familiar Signaling Pathways Combine to Pattern the Wing Disc: Wingless, Hedgehog, Dpp, and Notch 1353
The Size of Each Compartment Is Regulated by Interactions Among Its Cells 1353
Similar Mechanisms Pattern the Limbs of Vertebrates 1355
Localized Expression of Specific Classes of Gene Regulatory Proteins Foreshadows Cell Differentiation 1356
Lateral Inhibition Singles Out Sensory Mother Cells Within Proneural Clusters 1357
Lateral Inhibition Drives the Progeny of the Sensory Mother Cell Toward Different Final Fates 1357
Planar Polarity of Asymmetric Divisions is Controlled by Signaling via the Receptor Frizzled 1359
Asymmetric Stem-Cell Divisions Generate Additional Neurons in the Central Nervous System 1359
Asymmetric Neuroblast Divisions Segregate an Inhibitor of Cell Division into Just One of the Daughter Cells 1361
Notch Signaling Regulates the Fine-Grained Pattern of Differentiated Cell Types in Many Different Tissues 1362
Some Key Regulatory Genes Define a Cell Type; Others Can Activate the Program for Creation of an Entire Organ 1362
Summary 1363

CELL MOVEMENTS AND THE SHAPING OF THE VERTEBRATE BODY 1363
The Polarity of the Amphibian Embryo Depends on the Polarity of the Egg 1364
Cleavage Produces Many Cells from One 1365
Gastrulation Transforms a Hollow Ball of Cells into a Three-Layered Structure with a Primitive Gut 1365
The Movements of Gastrulation Are Precisely Predictable 1366
Chemical Signals Trigger the Mechanical Processes 1367
Active Changes of Cell Packing Provide a Driving Force for Gastrulation 1368
Changing Patterns of Cell Adhesion Molecules Force Cells Into New Arrangements 1369
The Notochord Elongates, While the Neural Plate Rolls Up to Form the Neural Tube 1370
A Gene-Expression Oscillator Controls Segmentation of the Mesoderm Into Somites 1371
Delayed Negative Feedback May Generate the Oscillations of the Segmentation Clock 1373
Embryonic Tissues Are Invaded in a Strictly Controlled Fashion by Migratory Cells 1373
The Distribution of Migrant Cells Depends on Survival Factors as Well as Guidance Cues 1375
Left-Right Asymmetry of the Vertebrate Body Derives From Molecular Asymmetry in the Early Embryo 1376
Summary

THE MOUSE 1378
Mammalian Development Begins With a Specialized Preamble 1378
The Early Mammalian Embryo Is Highly Regulative 1380
Totipotent Embryonic Stem Cells Can Be Obtained From a Mammalian Embryo 1380
Interactions Between Epithelium and Mesenchyme Generate Branching Tubular Structures 1381
Summary 1382

NEURAL DEVELOPMENT 1383
Neurons Are Assigned Different Characters According to the Time and Place Where They Are Born 1383
The Character Assigned to a Neuron at Its Birth Governs the Connections It Will Form 1385
Each Axon or Dendrite Extends by Means of a Growth Cone at Its Tip 1386
The Growth Cone Pilots the Developing Neurite Along a Precisely Defined Path In Vivo 1387
Growth Cones Can Change Their Sensibilities as They Travel 1389
Target Tissues Release Neurotrophic Factors That Control Nerve Cell Growth and Survival 1389
Neuronal Specificity Guides the Formation of Orderly Neural Maps 1391
Axons From Different Regions of the Retina Respond Differently to a Gradient of Repulsive Molecules in the Tectum 1392
Diffuse Patterns of Synaptic Connections Are Sharpened by Activity-Dependent Remodeling 1393
Experience Molds the Pattern of Synaptic Connections in the Brain 1395
Adult Memory and Developmental Synapse Remodeling May Depend on Similar Mechanisms 1396
Summary 1397

PLANT DEVELOPMENT 1398
Arabidopsis Serves as a Model Organism for Plant Molecular Genetics 1398
The Arabidopsis Genome Is Rich in Developmental Control Genes 1399
Embryonic Development Starts by Establishing a Root-Shoot Axis and Then Halts Inside the Seed 1400
The Parts of a Plant Are Generated Sequentially by Meristems 1403
Development of the Seedling Depends on Environmental Signals 1403
Long-Range Hormonal Signals Coordinate Developmental Events in Separate Parts of the Plant 1403
The Shaping of Each New Structure Depends on Oriented Cell Division and Expansion 1406
Each Plant Module Grows From a Microscopic Set of Primordia in a Meristem 1407
Polarized Auxin Transport Controls the Pattern of Primordia in the Meristem 1408
Cell Signaling Maintains the Meristem 1409
Regulatory Mutations Can Transform Plant Topology by Altering Cell Behavior in the Meristem 1410
The Switch to Flowering Depends on Past and Present Environmental Cues 1412
Homeotic Selector Genes Specify the Parts of a Flower 1413
Summary 1415
References 1415
Chapter 23 Specialized Tissues, Stem Cells, and Tissue Renewal 1417

EPIDERMIS AND ITS RENEWAL BY STEM CELLS 1417
Epidermal Cells Form a Multilayered Waterproof Barrier 1419
Differentiating Epidermal Cells Express a Sequence of Different Genes as They Mature 1420
Stem Cells in the Basal Layer Provide for Renewal of the Epidermis 1420
The Two Daughters of a Stem Cell Do Not Always Have to Become Different 1421
The Basal Layer Contains Both Stem Cells and Transit Amplifying Cells 1422
Transit Amplifying Divisions Are Part of the Strategy of Growth Control 1423
Stem Cells of Some Tissues Selectively Retain Original DNA Strands 1424
The Rate of Stem-Cell Division Can Increase Dramatically When New Cells Are Needed Urgently 1425
Many Interacting Signals Govern Epidermal Renewal 1426
The Mammary Gland Undergoes Cycles of Development and Regression 1426
Summary 1428

SENSORY EPITHELIA 1429
Olfactory Sensory Neurons Are Continually Replaced 1429
Auditory Hair Cells Have to Last a Lifetime 1430
Most Permanent Cells Renew Their Parts: the Photoreceptor Cells of the Retina 1432
Summary 1433

THE AIRWAYS AND THE GUT 1434
Adjacent Cell Types Collaborate in the Alveoli of the Lungs 1434
Goblet Cells, Ciliated Cells, and Macrophages Collaborate to Keep the Airways Clean 1434
The Lining of the Small Intestine Renews Itself Faster Than Any Other Tissue 1436
Wnt Signaling Maintains the Gut Stem-Cell Compartment 1438
Notch Signaling Controls Gut Cell Diversification 1439
Ephrin-Eph Signaling Controls the Migrations of Gut Epithelial Cells 1440
Wnt, Hedgehog, PDGF, and BMP Signaling Pathways Combine to Delimit the Stem-Cell Niche 1441
The Liver Functions as an Interface Between the Digestive Tract and the Blood 1442
Liver Cell Loss Stimulates Liver Cell Proliferation 1443
Tissue Renewal Does Not Have to Depend on Stem Cells: Insulin-Secreting Cells in the Pancreas 1444
Summary 1445

BLOOD VESSELS, LYMPHATICS, AND ENDOTHELIAL CELLS 1445
Endothelial Cells Line All Blood Vessels and Lymphatics 1445
Endothelial Tip Cells Pioneer Angiogenesis 1446
Different Types of Endothelial Cells Form Different Types of Vessel 1447
Tissues Requiring a Blood Supply Release VEGF; Notch Signaling Between Endothelial Cells Regulates the Response 1448
Signals from Endothelial Cells Control Recruitment of Pericytes and Smooth Muscle Cells to Form the Vessel Wall 1450
Summary 1450

RENEWAL BY MULTIPOTENT STEM CELLS: BLOOD CELL FORMATION 1450
The Three Main Categories of White Blood Cells Are Granulocytes, Monocytes, and Lymphocytes 1451
The Production of Each Type of Blood Cell in the Bone Marrow Is Individually Controlled 1453
Bone Marrow Contains Hemopoietic Stem Cells 1454
A Multipotent Stem Cell Gives Rise to All Classes of Blood Cells 1456
Commitment Is a Stepwise Process 1456
Divisions of Committed Progenitor Cells Amplify the Number of Specialized Blood Cells 1457
Stem Cells Depend on Contact Signals From Stromal Cells 1458
Factors That Regulate Hemopoiesis Can Be Analyzed in Culture 1459
Erythropoiesis Depends on the Hormone Erythropoietin 1459
Multiple CSFs Influence Neutrophil and Macrophage Production 1460
The Behavior of a Hemopoietic Cell Depends Partly on Chance 1461
Regulation of Cell Survival Is as Important as Regulation of Cell Proliferation 1462
Summary 1462

GENESIS, MODULATION, AND REGENERATION OF SKELETAL MUSCLE 1463
Myoblasts Fuse to Form New Skeletal Muscle Fibers 1464
Muscle Cells Can Vary Their Properties by Changing the Protein Isoforms They Contain 1465
Skeletal Muscle Fibers Secrete Myostatin to Limit Their Own Growth 1465
Some Myoblasts Persist as Quiescent Stem Cells in the Adult 1466
Summary 1467
FIBROBLASTS AND THEIR TRANSFORMATIONS: THE CONNECTIVE-TISSUE CELL FAMILY 1467
Fibroblasts Change Their Character in Response to Chemical Signals 1467
The Extracellular Matrix May Influence Connective-Tissue Cell Differentiation by Affecting Cell Shape and Attachment 1468
Osteoblasts Make Bone Matrix 1469
Most Bones Are Built Around Cartilage Models 1470
Bone Is Continually Remodeled by the Cells Within It 1472
Osteoclasts Are Controlled by Signals From Osteoblasts 1473
Fat Cells Can Develop From Fibroblasts 1474
Leptin Secreted by Fat Cells Provides Feedback to Regulate Eating 1475
Summary 1476
STEM-CELL ENGINEERING 1476
Hemopoietic Stem Cells Can Be Used to Replace Diseased Blood Cells with Healthy Ones 1477
Epidermal Stem Cell Populations Can Be Expanded in Culture for Tissue Repair 1477
Neural Stem Cells Can Be Manipulated in Culture 1478
Neural Stem Cells Can Repopulate the Central Nervous System 1478
Stem Cells in the Adult Body Are Tissue-Specific 1479
ES Cells Can Make Any Part of the Body 1480
Patient-Specific ES Cells Could Solve the Problem of Immune Rejection 1481
ES Cells Are Useful for Drug Discovery and Analysis of Disease 1482
Summary 1482
References 1483
Chapter 24 Pathogens, Infection, and Innate Immunity 1485
INTRODUCTION TO PATHOGENS 1486
Pathogens Have Evolved Specific Mechanisms for Interacting with Their Hosts 1486
The Signs and Symptoms of Infection May Be Caused by the Pathogen or by the Host's Responses 1487
Pathogens Are Phylogenetically Diverse 1488
Bacterial Pathogens Carry Specialized Virulence Genes 1489
Fungal and Protozoan Parasites Have Complex Life Cycles with Multiple Forms 1494
All Aspects of Viral Propagation Depend on Host Cell Machinery 1496
Prions Are Infectious Proteins 1498
Infectious Disease Agents Are Linked To Cancer, Heart Disease, and Other Chronic Illnesses 1499
Summary 1501
CELL BIOLOGY OF INFECTION 1501
Pathogens Cross Protective Barriers to Colonize the Host 1501
Pathogens That Colonize Epithelia Must Avoid Clearance by the Host 1502
Intracellular Pathogens Have Mechanisms for Both Entering and Leaving Host Cells 1504
Virus Particles Bind to Molecules Displayed on the Host Cell Surface 1505
Virions Enter Host Cells by Membrane Fusion, Pore Formation, or Membrane Disruption 1506
Bacteria Enter Host Cells by Phagocytosis 1507
Intracellular Eucaryotic Parasites Actively Invade Host Cells 1508
Many Pathogens Alter Membrane Traffic in the Host Cell 1511
Viruses and Bacteria Use the Host Cell Cytoskeleton for Intracellular Movement 1514
Viral Infections Take Over the Metabolism of the Host Cell 1517
Pathogens Can Alter the Behavior of the Host Organism to Facilitate the Spread of the Pathogen 1518
Pathogens Evolve Rapidly 1518
Antigenic Variation in Pathogens Occurs by Multiple Mechanisms 1519
Error-Prone Replication Dominates Viral Evolution 1520
Drug-Resistant Pathogens Are a Growing Problem 1521
Summary 1524
BARRIERS TO INFECTION AND THE INNATE IMMUNE SYSTEM 1524
Epithelial Surfaces and Defensins Help Prevent Infection 1525
Human Cells Recognize Conserved Features of Pathogens 1526
Complement Activation Targets Pathogens for Phagocytosis or Lysis 1528
Toll-like Proteins and NOD Proteins Are an Ancient Family of Pattern Recognition Receptors 1530
Phagocytic Cells Seek, Engulf, and Destroy Pathogens 1531
Activated Macrophages Contribute to the Inflammatory Response at Sites of Infection 1533
Virus-Infected Cells Take Drastic Measures to Prevent Viral Replication 1534
Natural Killer Cells Induce Virus-Infected Cells to Kill Themselves 1535
Dendritic Cells Provide the Link Between the Innate and Adaptive Immune Systems 1536
Summary 1537
References 1537
Chapter 25 The Adaptive Immune System 1539
LYMPHOCYTES AND THE CELLULAR BASIS OF ADAPTIVE IMMUNITY 1540
Lymphocytes Are Required for Adaptive Immunity 1540
The Innate and Adaptive Immune Systems Work Together 1541
B Lymphocytes Develop in the Bone Marrow; T Lymphocytes Develop in the Thymus 1543
The Adaptive Immune System Works by Clonal Selection 1544
Most Antigens Activate Many Different Lymphocyte Clones 1545
Immunological Memory Involves Both Clonal Expansion and Lymphocyte Differentiation 1545
Immunological Tolerance Ensures That Self Antigens Are Not Normally Attacked 1547
Lymphocytes Continuously Circulate Through Peripheral Lymphoid Organs 1549
Summary 1551
B CELLS AND ANTIBODIES 1551
B Cells Make Antibodies as Both Cell-Surface Antigen Receptors and Secreted Proteins 1552
A Typical Antibody Has Two Identical Antigen-Binding Sites 1552
An Antibody Molecule Is Composed of Heavy and Light Chains 1552
There Are Five Classes of Antibody Heavy Chains, Each with Different Biological Properties 1553
The Strength of an Antibody-Antigen Interaction Depends on Both the Number and the Affinity of the Antigen-Binding Sites 1557
Antibody Light and Heavy Chains Consist of Constant and Variable Regions 1558
The Light and Heavy Chains Are Composed of Repeating Ig Domains 1559
An Antigen-Binding Site Is Constructed from Hypervariable Loops 1560
Summary 1561
THE GENERATION OF ANTIBODY DIVERSITY 1562
Antibody Genes Are Assembled From Separate Gene Segments During B Cell Development 1562
Imprecise Joining of Gene Segments Greatly Increases the Diversity of V Regions 1564
The Control of V(D)J Recombination Ensures That B Cells Are Monospecific 1565
Antigen-Driven Somatic Hypermutation Fine-Tunes Antibody Responses 1566
B Cells Can Switch the Class of Antibody They Make 1567
Summary 1569
T CELLS AND MHC PROTEINS 1569
T Cell Receptors (TCRs) Are Antibodylike Heterodimers 1570
Antigen Presentation by Dendritic Cells Can Either Activate or Tolerize T Cells 1571
Effector Cytotoxic T Cells Induce Infected Target Cells to Kill Themselves 1572
Effector Helper T Cells Help Activate Other Cells of the Innate and Adaptive Immune Systems 1573
Regulatory T Cells Suppress the Activity of Other T Cells 1574
T Cells Recognize Foreign Peptides Bound to MHC Proteins 1575
MHC Proteins Were Identified in Transplantation Reactions Before Their Functions Were Known 1575
Class I and Class II MHC Proteins Are Structurally Similar Heterodimers 1576
An MHC Protein Binds a Peptide and Interacts with a T Cell Receptor 1577
MHC Proteins Help Direct T Cells to Their Appropriate Targets 1579
CD4 and CD8 Co-Receptors Bind to Invariant Parts of MHC Proteins 1580
Cytotoxic T Cells Recognize Fragments of Foreign Cytosolic Proteins in Association with Class I MHC Proteins 1581
Helper T Cells Respond to Fragments of Endocytosed Foreign Protein Associated with Class II MHC Proteins 1583
Potentially Useful T Cells Are Positively Selected in the Thymus 1585
Most Developing Cytotoxic and Helper T Cells That Could Be Activated by Self-Peptide-MHC Complexes Are Eliminated in the Thymus 1586
Some Organ-specific Proteins Are Ectopically Expressed in the Thymus Medulla 1587
The Function of MHC Proteins Helps Explain Their Polymorphism 1588
Summary 1588
HELPER T CELLS AND LYMPHOCYTE ACTIVATION 1589
Activated Dendritic Cells Use Multiple Mechanisms to Activate T Cells 1590
The Activation of T Cells Is Controlled by Negative Feedback 1591
The Subclass of Effector Helper T Cell Determines the Nature of the Adaptive Immune Response 1592
TH1 Cells Activate Infected Macrophages and Stimulate An Inflammatory Response 1594
Antigen Binding to B Cell Receptors (BCRs) Is Only One Step in B Cell Activation 1595
Antigen-Specific Helper T Cells Are Essential for Activating Most B Cells 1597
A Special Class of B Cells Recognize T-Cell-Independent Antigens 1598
Immune Recognition Molecules Belong to the Ancient Ig Superfamily 1599
Summary 1600
References 1600
Acknowledgments
In writing this book we have benefited greatly from the advice of many biologists and biochemists. We would like to thank the following for their suggestions in preparing this edition, as well as those who helped in preparing the first, second, third and fourth editions. (Those who helped on this edition are listed first, those who helped with the first, second, third and fourth editions follow.)
Chapter 1: W. Ford Doolittle (Dalhousie University, Canada), Jennifer Frazier (Exploratorium®, San Francisco), Douglas Kellogg (University of California, Santa Cruz), Eugene Koonin (National Institutes of Health), Mitchell Sogin (Woods Hole Institute)
Chapter 2: Michael Cox (University of Wisconsin, Madison), Christopher Mathews (Oregon State University), Donald Voet (University of Pennsylvania), John Wilson (Baylor College of Medicine)
Chapter 3: David Eisenberg (University of California, Los Angeles), Louise Johnson (University of Oxford), Steve Harrison (Harvard University), Greg Petsko (Brandeis University), Robert Stroud (University of California, San Francisco), Janet Thornton (European Bioinformatics Institute, UK)
Chapter 4: David Allis (The Rockefeller University), Adrian Bird (Wellcome Trust Centre, UK), Gary Felsenfeld (National Institutes of Health), Susan Gasser (University of Geneva, Switzerland), Eric
Washington)
Chapter 5: Elizabeth Blackburn (University of California, San Francisco), James Haber (Brandeis University), Nancy Kleckner (Harvard University), Joachim Li (University of California, San Francisco), Thomas Lindahl (Cancer Research, UK), Rodney Rothstein (Columbia University), Aziz Sancar (University of North Carolina, Chapel Hill), Bruce Stillman (Cold Spring Harbor Laboratory), Steven West (Cancer Research, UK), Rick Wood (University of Pittsburgh)
Chapter 6: Raul Andino (University of California, San Francisco), David Bartel (Massachusetts Institute of Technology), Richard Ebright (Rutgers University), Daniel Finley (Harvard University), Joseph Gall (Carnegie Institution of Washington), Michael Green (University of Massachusetts Medical School), Carol Gross (University of California, San Francisco), Christine Guthrie (University of California, San Francisco), Art Horwich (Yale University School of Medicine), Roger Kornberg (Stanford University), Reinhard Lührmann (Max Planck Institute of Biophysical Chemistry, Göttingen), Quinn Mitrovich (University of California, San Francisco), Harry Noller (University of California, Santa Cruz), Roy Parker (University of Arizona), Robert Sauer (Massachusetts Institute of Technology), Joan Steitz (Yale University), Jack Szostak (Harvard Medical School, Howard Hughes Medical Institute), David Tollervey (University of Edinburgh, UK), Alexander Varshavsky (California Institute of Technology), Jonathan Weissman (University of California, San Francisco)
Chapter 7: Raul Andino (University of California, San Francisco), David Bartel (Massachusetts Institute of Technology), Michael Bulger (University of Rochester Medical Center), Michael Green (University of Massachusetts Medical School), Carol Gross (University of California, San Francisco), Frank Holstege (University Medical Center, The Netherlands), Roger Kornberg (Stanford University), Hiten Madhani (University of California, San Francisco), Barbara Panning (University of California, San Francisco), Mark Ptashne (Memorial Sloan-Kettering Center), Ueli Schibler (University of Geneva, Switzerland), Azim Surani (University of Cambridge)
Chapter 8: Wallace Marshall [major contribution] (University of California, San Francisco)
Chapter 9: Wolfgang Baumeister (Max Planck Institute of Biochemistry, Martinsried), Ken Sawin (The Wellcome Trust Centre for Cell Biology, UK), Peter Shaw (John Innes Centre, UK), Werner Kühlbrandt (Max Planck Institute of Biophysics, Frankfurt am Main), Ronald Vale (University of California, San Francisco), Jennifer Lippincott-Schwartz (National Institutes of Health)
Chapter 10: Ari Helenius (Swiss Federal Institute of Technology Zürich, Switzerland), Werner Kühlbrandt (Max Planck Institute of Biophysics, Frankfurt am Main), Dieter Osterhelt (Max Planck Institute of Biochemistry, Martinsried), Kai Simons (Max Planck Institute of Molecular Cell Biology and Genetics, Dresden)
Chapter 11: Wolfhard Almers (Oregon Health and Science University), Robert Edwards (University of California, San Francisco), Bertil Hille (University of Washington), Lily Jan (University of California, San Francisco), Roger Nicoll (University of California, San Francisco), Robert Stroud (University of California, San Francisco), Patrick Williamson (University of Massachusetts, Amherst)
Chapter 12: Larry Gerace (The Scripps Research Institute), Ramanujan Hegde (National Institutes of Health), Nikolaus Pfanner (University of Freiburg, Germany), Daniel Schnell (University of Massachusetts, Amherst), Karsten Weis (University of California, Berkeley), Susan Wente (Vanderbilt University
rlnarts lallelal)oU) Nl)lU)lq)laleMsolesew'$1 lalseqrueyl;o {tlstan;u1) Inpdrue!lllM'(rQtslantul JosalnlrlsulleuorteN) 6uaq5ueblo6l a}nll}sulspasnqlesseW) seUeq)'1{6olouqre1;o 6rervrzuessnp leerl)rW'(a6p;rqr.ue3 ;o fi;slen1ul)ta6laqnay 'ralua)q)lPesau ells6aayuay '({6o1ouq)alJoa}nlllsul puele)!pawqslMafleuolleN) '(Alsrenlunalel5 ue61qr;y1) leeq)lw'(re^uac sau{gpieq)lU'$n 1a}saq)ueW }o ^}lsla^lun) sDesnq)PsseW) ues'elulo1lle) ;o {r1sten1u6) l)eueW eddr;rq6'(o)st)uerl q6nouepoog '(a6plrqruel;ofirsran1ul) sa;tqdungulueW'(loot{)S le)lpewple^JeH) uotea3se;6no6 rarue'lsrma-l '$n 'lelsaq)uew;o{lrsranru6) porre9pl^eq'({llslantuq '(looq)S lalueC leulLlaog uo^ pleleH:97raldeql le)rpawpJe^reH) '(o)sr)uerl 'eluloJllef ues q6lqa-l) ;o ftlstantu1) Ilel selqlrew (luoula4 lo {ttslanlu6) ueuq)no) uqof'(ll!H eurlore3'()n'a6allof {>lsureq le1radu1) 'eu1;ore)qlroN firsranlu6)e6pprngqlla) '(^ueulag pre6 {re9'({lrsrenru6 eqf) ueuulelsqdleu'()n rallaJal)ou Jo ;eder43 'elulorlle) '({e1a1tag 'q)reasau /aulllpew ouelae3 esnos a stau lale/l Ja)uel) Jo reln)olowloj .ralua]l)nlqlao-xew)lalaulq)J!8 ueS'eturoJ!lef {oulrod;a1ueq'(orsouelj {1rsran1u61) ,o Illstanluq) '(lalual {ilstanlunploJuels)pollaxv,(ar;;a;:61 raldeql le)tpaw ueS'etuloJtle) {a;s1ro1preq)lu'(o)st)uell }o ^}lsla^lun) orolY) (uedel,Qlsrenru61 '(a6prrquel;oIlrsran;u1) uotealse;6no6 raruplsrma-l '(lerrdsop se;6no6 apnf'lS)uaa.r9 s,ualplrq) '({lrsranrul ele6eNnze1a61q5 uo}a)ulld)lslnbul pro}ue}S) MoIlel ue}S'({l;slantu6 '(o)sl)uell '(etleJlsnv'r{)leasau le)lpawJoalnlllsullleHezlll puelalleMaLl! 
uu{1'(1ooqr5 le)lpawple^leH)lotuqeo8uo^ pleleH jo t$lstantul sexef suepy,fura;'(looq)5 le)lpawulalsaMqlnos ueS'eruroJrleJ ;o ft rstanrul)doqslglaeql;y1'(,Qtstan;uq reldeql 6ueM6uopoelX:91 eqf) [uo!]nqU]uol lPlluelsqnsl ploluels) ro[eu]loUaLlIallnf:tZ re$eq1 [uorlnquluo) (gergdsog tuarplrq)apn|ls) laqs soUeq)'(a6p;tque1 ()n 'alnlllsulq)rpasau ra)ue))DeMeuorl'()n 'elnlrlsuluoprng)sautdueqleuof'(pJolxgJo^llsle^lun) 'qOlnqulpl 'ralsaq)ueW JoIlrsrantuleqf) llnallssapeq)'()n oleloqe1(6o1o;g I eln)elow qv(usep u ry'((uetutag',fu (ltstantul) 1oftrsranruq)qllulsullsnv'(allleas'uot6urqseMJo p;eep1 rluasle):1r3'({e;a1leg/eluloJlle)JoI11stan1u61) qeuseuoql'(looq)Sle)lpawple^leH)ulllo uenls '(loor'l)S ueadorn3) '(al}}eas'le}ua) laz}ol9 e))aqeU'(o6er1q3;o{1rsren1u61) laeq)lW uas;gurofg'(a)uprl'eun)lnlllsul)pJe^nol le)tpawpre^reH) ue5 parj)re6p3a>nrg',(o6a16 le)ue) uosu!Ll)lnH q)leasou tl)leasau sOu;y) seqbnguoluls'(alnlrlsul larueC'()n'aba;1o1 /erurojtle) ueS'e!uloJ!lel jo {ltstanlun)leseopeLlsrv'(o)sl)ueU 'lS'rQ!stentun uot6ulqsenn) uopuol)lprPqregre6;og'(srno1 ;o Il;sten;u6)[uollnquluo)rofeul ue6roylpl^eq:Zl raldeql aql'alnillsullq)elqnH)sle^el) uoplog(er;;a;'(spue;raqleN alenpleuou jleu :97raldeql q)leasauuopuol)sLuepv sueH'(alnlllsul (o)sDuellues'elulojllef;o A;sten;u1) '()fl 'rueqbu;tut1g '(looq)S le)!pawpre^leH)uos!q)llw{qloulf uopuol)uodelselor1l'()n'alnlllsuluopjnD 's)lleua9 (alnlllsulrl)reeseu elnel'(uapsalC ;o {ltslantu6aq1)I1saqre61 loj allua] leuolleN )!J!lual)s aqf) rll!rusutr'(e)uPll 'tl)leasau pue{6o;o19lla) leln)elo4}o oln}llsull)ueldxew) pleMoH ueulnHlo, ellua) q)uau) q1n6s1amg>s slofuell',()n',s)lreuag reln)olopro; Itoletoqel 35y11) aof '()n [6o1o;9lla) pue16o1o19 '(oba16ueS'eluloJ!le) lsnrl auro)llaMeLlI)uosllaqouLllaqezlll /eluroJlle);o {tte1 u1a1sp;o9 {1;sren;u61) llpHuelv'(o6alqueS sluu!9)Wue;11;nn'({6olouqlal I Jooln}l}su ;o 11;sranrul; '(o)st)uerlues'elulo;llP);o,tlsranlun)aulno€ftuag '({tlstan1u6 zlrmola{a6; eluroJrlef) }}o!lll'$n'aln}t}suluoplnDaql) 
uosutlof a!lnf:91 raldeql ploluels) ]ollaqf [uoltnqll]uo)JofeLul 'tS sra6lnu)aul^llqlauuo)'({11slantu6 latuec'(ftrsren;u61 '(uopuo'la6a1;o1 'eluroJlle1 'atnlllsul ual suaqdar5 aq1) uleqeJqeg ()n a1n6)ue6ogpr6ug'({ala1tag Jo,(rtstanlul) 'q)unZ l(ltstantuq) '(pueUezllMS ra;seg prer.l)lu puel.reH laq)lldellnf'(oluolol';e11dsop1 leu!StunoW)uosmed Jo ftrsran;u1) '1{6olouqra1;oe}n}l}sul (uof '(ArstenlunploJuels)assnN peruo)'()n 'arnllsuluoprngeq1)ra6urrqyollnf:zz reldeql laou Dollll'(o6el6 ueS'eluloJ! |e) ;o {llstantu1) etulojrIe)) zlrlnote{ayy '(uapaMs (au!]|pawjo filstan;u1)aul^llulqou uue) laeqrlW'(a6prrqr.ue1lo '()n 'alnlllsuluopln9eq1) 'q)leosa5 Ined looq)Sreurslunow)ueullessPM Je)ue)loJelnlllsul6ttnpnl)u!plaHlUuaH-lle) ualleq)Splelag ruernS urzv'(lalua) tuaudolana6q6tnqsp14) '(relue) uelV'(aul)!pew 6ulteDe)-ueolS letue) lleH leuoLuaW) '(o)sr)uerlueS'etulojtle) {lrsten;u61) '(lalue) o[ragaeuaS'({6o;ouq>a1 ;o le)lpaw ]o looq)s,firstantulele ) qsoqgrelues '{6o1o;9 ,fute1'(s;ne6'eluloJllel ra^eeM-lrO jo alnlllsulsuasnq)esseW) sexalJo{ltstantulaql) ueulllgpalilv'()n ulalsamqlnos u61eduel '(,tlstenlunptoluer5) sal(y1euer6'(a6puque3'etnltlsul ;o I1;srenru6) Meqlle6 ueuiaell leln)aloW Jofuo1eroqe1) u1 q)reaseuletue] /a[uo)llaM)uelel]W auuy'(rttstant 'Un 'qlleaseu le)lpawloJelnlllsul leuolleN) lleilal sauef 'eluloJllef pJe^reH) roul)el) {tuel '1srne6 }o fi;slentu6) {etg stuua6'(olsl)uell ao)sugsarue6'(abplrqule) Jo{11slan1u1) '()n 'al}ua)sauul ralunHllaN'(^rl) sPsue)'qlreosaulellpaw loj alntllsulsJaMolS) ueS'eluro,tle);o{llsra4un)aulnog^luaH 'eluoj!lef fttstan1ul)6tnqula6 Ia;rvreg}}or5 '(,{a;a1iag }o plaqleHseloqllN:91 raldeql uqof)luot]nqll]uollel]uelsqnsl '(o)sl)uell {qqy '(uopuola6e11o1 fttstantul;lloJre)uqof (obe16ue5 ueS'elu.roJllef ;o {tlstenlun)o)tele)erlllled:17 reldeql '(eruenl(suue6;o 'e!uroJrlpf {ilstanru6) lllstantul a#e leeLl)lW ]o suasnq)esseW) ({6o1ouq>a1 'alnltlsultl)Jeasau 1oa}nlt}sul uosurelqy)uosduoql la)ue) r(;rure3 'q)leasau uosulltuof 'sr1s,(qdo1g 
le)ue)) 6taqutann llaqou 'Un ule 61er3'(u;e61 unpluell Jo elnlllsull)ueld 'lolsll8 uosduoql 6;er1'(abp;tquel ue1'(eruenl{suua6 clellsaleH jo ;o r$tstantul) ^l!sle^!un) xew) rpuerqlqu laula A'()n lepuodarnrg '({toleroqelroqte;16ut.td5 :91raldeql 1oftrsran1u6) {er9 ;aeq>161 elsnoqleC) MalpuV'(I11slan1u61 plo)) eMol llo)s '(aul)lpewJo looq)s(trsrantunploluers)l)lsd!l (utewuleun}lueu 's)llaue9pue{6o1o19 1|e1 sel6noq ueqeueH qdesol'(o)st)ue4 ues'elulo1lle) 1oI11s.ranru61 '({11slan1u1 a1e1) '(abpqrquel lltslanlu6)spieMpl Ined'(epeue)'oluolol leln)alow,oalnlltsulI)ueld xew)lellazoulrPW 1o '()n 'q)leasau uopuol)o^elq)sotlaldruelD q)Jeesau uleqelg'(alnlllsul uaileM re)ue) Joelnlllsul ;o lltsrantul;IttC uqof '(a6ppque)'ll)uno) q)reasau lerlpawaq1)rueqla6q6ng suqor)zun€pell '(o)sl)ueJl or.lf)ouoS-acuueqor'(sulldoH '(6o;ouqral;o ele ) ueullawPll'(puelrazllMs'q)UnZ doqslglaeq>1y1'(spueUaqloN'(Ilrs.ranlun ueS'elurojtle) ;o Iltstantu6l) snlualappy'(o6er1q3r 1r1;9 ;o fitsten1u6) sula€uoluv'(ftlslantu6 alnlrlsulleJepalss!MS) otll'aln]!lsulle)ueJspueueqlaN) uag'(o6a;6ues'etu.loJllel ;o {lrsrenrul)ltul llo)S :g 1 reldeql projuels)[uot]nqUluo)lel]uetsqnsl !pJeDVelnel :67 reldeql (lsJaquV 'sllasnq)esseW{}rsranrul)uosulellllM led'(le}ua)le)!pal/! ;o (qlleeH aqo) epeue qlauua)'(uedel'alnlllsul leuolreN) ]o selnlrlsul
lunH utI'()n lalseq)ueW serrqdungur1re61 '(se;e6uy sol'etulojtle);o {lrsranrul)6teques13 Jofirsrenru6) pt^e6,(o)st)uell '(uopuolaba11o1 s,6ury) saq6nguotur5,({lrsrantun ralla;el)ou ueS'etuloJ!le) jo ^ltsla^tun) spremplyeqog,(e6prrquel aq1)qradspngseuef ,(allleas,uol6ulr.lse6Jo firsrarrrul)preMoH spleMplIned,()n ,q)tMloN,alnltlsulsauul Jo^ltsle^tun) ueqlpuof'(16olouqta1 ueulsnoH uqof)llaMunCurtf'(uopuo'l,1ru1-1 ;o alnltlsulsDasnq)esseW) s>rs{qdotg lla) fUW)uunC prne6'({6o;ouqlel}o ,()n ,qrreesag atnltlsulsgasnq)esseW) z}t^loH rueqerg'(uopuo'l a6a;;o1q6ury; UeqoU qllo) {a1pnq '(ft rsranr ,(elDeas,i(6o1or u uola)u plagdo ud) g qof u g 1 sr.uels{5 pJeMuMoO .ro1 Jalup3) uer;n;,(o6er6 ues,etuloJtle1 1oftrsrenruq) atnlllsul)poo;1{ore1'110019 {uot5 ,uo^ MeN;o {lrsran;u1a1e15) al]l!looc;1essn5'(a6prrqLUp) Jo{t;stenrul;uosqoc.raqdolsrrq) ,qlleaH salnltlsul '(o)st)uel3 quornsbur;;op {ruey ,(epsaqlag ueS'etulo;tlp) tsrenruq) Jo o)uellaC IuoqluV leuotteN) ft ;o ,uo16urqse14;o '()n 'lsnrl q)snqauutH ,(alpeos '(uopuol uelV lgrslenlu6) alltH auo)llaM eqf) lauac a6e11o1 ltyeg laeq)lW '(pasealep) zltMolsloH ell ,(qeln;ofirsranruq) r(lrslenluq) (pue1-11n3 aleCatlsal'(uopuola6a11o1 ,Ols.renrun) l)ulaH uualg '(abprrquel'{6o;org ,({r;sren;u6 reln)aloW uoslapueH Uenls'(uostpeW'utsuo)stMro {lrsrengul) Jo,toletoqel )UW) Mor) saulef p r e q ) t u ' ( ^ l r s l a ^ teul e n l )s n t u a l eU .rS,oul)lpow HV , ( ) n , u e q 6 u r u r r 6 su;1dogsuqop)6rerl{ruep ,(srno1 Jo loor.l)S ,1ru1,{6o;o19 pup,{6o1o19 {lls.renruq qleaHuqof ,(uopuo-.| 1o{lrslearu6) uol6ulqsen4) .redooluqof ,(o)st)uellues,eturo;tle) llo) ,({ueureg,6raq;aprag lla) leln)aloryrol {roreioqelfUW)poomlpHupupv,(elneas JoIltsre^tun)alool te6og lgWJ)ueqo) 'uot6urqsetr4 {lrsranr un) |la/v\U eHpuelel,(Itrsra^t un pl e^reH) ;o uaqdals'(olst)upllueS'elulo;tlp);ollrslarrul) ueqo)lleqou ,o6etg;o,tlsranlun)slrreH '(puellols'eepunC;o uosuleHuaqdal5,(pueleaz ana51 firslanrul)uaqol drIq6,Un ,Ll)tMroN ,eturortle) 'elnltlsul uqof 'Un 'q)reasau 
'(6rnoqsel15;o ra)ue)) suleH ueupV,(i(a;e>Fag sauul uqof) ueo) oluuf r{llstenru6) ,aapunC;o pleqltu,(puplro)S 1oIlrstanrul)pueUeH uoqueq) arretd,(uopuola6a;;o1q6ul)) qllurs_ratle^e) ftrs.ranru6) ulol atpleHuleqel9,(yn,q)tMtoN,arlua)sauuluqof)pleqleH '(obalCueS'eruro;!le);o laluadrelaptelapv,(se;a6uy ftrsranruq) seloq)tN'(e6puquel '(qe}n (are1 aIueHpr^e6,(o)sr)uelj sol /erulojtle);o{1;vanru61) ;o {lrstan;uq) {1rs.renlu6) ;eeq)y1 Jo ,(y1,uoldueqlno5 ues'erulo1rleJ l(1rsienlul) q)ez rqt>adel 1o oueyy'1uo6erg;or(1rsrenru61) lleH ;p1ede1 l)uepou '(I1rsianru61 stapuelg) {ar;;a1,(uopuol ;o firsranruq) uqof ,(rQrsranrul lleH etqulnlo))lotueJsalleqJ,(looqtsle]|pawple^leH) 1;eg l(6o1o19 reln)elow.ro;{roleioqel)UW)llpH llof pup{6o1o;9 Iellue3 sranal'({a;e1rag'etulo}tlef}o {lrslenru6)apue) sneq)ez '(ploJXO 'tueutrr;u; ue;y'({grsranrun ple^reH)6regprne6,(q)UnZ uajpH leltsle^tun) e#!l)peu) sulte)uqof,1llrgledeq3,eurlole) ,(sle)!lne)eujleqd lsulf '(o)st)uellueS,eturoJtlel ^lrs.renrun) auqfng eullsuq) quoN ^l!sle^lun) ebprllng qlle) 1o Jo XgS) '(elloqup)'{lrslanlul uetletlsnv) 6uruungueug,(la}ue) {e;tngueqdals'(laseg lpuotteN rabln6xeyl'(eur)tpew Jo^lrs.renru6) Jo re)ueJ6uua11ay-ueolS leuoueW)rautqulng{rreg,(se;a6uy Ilrstanruplro MaN)uepJnge^als,(ploJXO ;o {1slenru1)uanolg ,(spueuoqlaN '(uopuo-l sol 'eruroJtle) ^ltsra^tun) ,(uopuol ulolsunrD ar{l Jo ebe;1o1 laeq)tW {rreg laeq)rW a6aJ;o1 E6ury)unnorg 'lrailsla^run ,(o)st)uell ,etuJoJtle) '(I6olouq)af snLlsell)pla^solg ueS qoorg luelJ jo rJoqou olntltsul etulojtle)) Ebury) leseJl '1e6prrquel,(6o;orglelnleloW ssorglole] /(spuelJoqleN -reuuorgeuuelreW ;o ^lrsranruq) aqf ,ureple]sruv Jo Jo{roteroqel ^}rsranru g) lla^u9etlsal'(lslequV,s]lasnq)essew ;o ^}rsranruq) o}nllsullelapel )UW)loq)slel€lleW,(Ll)unz,{6o;ouqrelyo ueelg ple^leH) ,(a6prrqLuel pleMoH,(uopuol uaalg laeLl)tW,(&rslanlun sst^ tlpuelg olpuv,(pesealep) uapuerg S) ;re) abal;o1s,6ul))ieaeig lalleM'(plojxo;o i(1rs.ranru6) ,(uopuo-l ua1er9 {lrslenru6) puer€ utue4 e6eg1o3 
Ilrsranlu6) 1o ,loorl)s uelv'(uopuo-l xasalpptw) plnoglalad ap{oguely'(o)sl)uplJ le)tpaw;el;dso;1 ups,eturoJtle) }o Atsto^tun)aurnog(tueg '(raplnog /opelolof r.llupoogultr,(looq)sle)tpaw '(uepralsuv Jo{lrsranrul,aln}tlsululeplauuleMsupf)lsroglatd Jor(lrsranru6) ,(uopuol,looLl)S pre^reH) '(aur^rl'etuloJrle) qbnouepoog ,q)leesau latueC epo€sueH,(uopuo-.| lellpeW ;elldsog Jo{Usranrun) a6a11o1 llstenlul) suadr.uog uotlspg,(o6e16 ueS,etuloJtlpJ sstlgLlltf'(olst)uellues,erujojtle) le)paw toJalnltlsulleuotleN) ro ,(eulsny,euuetn {lrsranru6) utolsplog ,(lootl)S rt.re1 l(}rsranrul) lez]olg l(1rsranru6) doqsrg }o 1o ;eeqrr61 le)tpawuosuqofpoo4 uole)uud)6ren1r9 leeq)lW'(llrsrenruq selleq),(pesee)ep) elnltg rraqou-fCNWn)lltg pt^eo,(a6prrqueS,otnlllsul r,!eqplqeg arurog'(lsleqrlv,sllesnq)esse61 {lrsranrun) ejoultDptau aqf a6prrreg ple^rpH) ;o plouureguouaw leeq)tW'(loor.tls le)tpaw '(auprpe4 '(o6alqueS'etuto;tleJ;o {lrslanrupale ) qsoqgleIueS'(^6olouqlaf ,(eut)tpew Jo loor.l)S 6legurnnre6 firsranruq) ,({6o;ouqrel lalUagluell ,(pausutUpW ,o alnlllsulsllesnLl)psseW) ulalsulluaqlv)Dauueglaeq)rW ;o a6e1;o1 ',tls[uaq:ol€ ,(Atsla^tun Joalntrlsull)ueld xew)q)sua9laqlung ,({e;a1reg ploruels) Joalntrtsulsuesnq)essew) ;auegpt^ec 'eturoJtle) ^}rslenru6) Ueq.lagur]of,(a]n]tlsul r.llleoseu saJleguag'(o)st)uell ,etuJoIle);o Jo ueS {1rs.ranru6) uueu6reg ,lo^or.leu ,a)uet)Sjo alnulsul sddrrr5eq1)o)elag &Je1,(1aers1 Prlaulo)'(o)st)uerlues,etulojtle) epupglaeq)lw Jo^llsra^tun) ,unJluazotg) '({lrsranru6 uueuztel ) le6legr(uuag,(leseg prolue}S)utMplegpt^ec ,(paseo)ap) JoIlrslenlu6l releg lalod /q)leeseu '(o)sr)uerl 'etuloJtle);o 6ulrqeglolleM,(uopuo-l ,(uopuol lo)ue) elnltlsul) ueS {pqenny Jo ftrslenrul) eu{e1 puepPD ielad'(uopuolabel;o1{lrslenrup) ut/v\paW-rauplpDa6a;1o1 {l;sranru61) elouqsVueqteuof,(e6p;rqruel ;o ft rstenruq) ^uoqluv'(plolxoJo{trsranlun) lauplegplpqltu,({lgsranguq a;e1) taurnqqsvlapq)tW,(looq)Sle)tpowple^leH)seuolesf ,(o6errq3 ,(olsr)uelJ ,(o)st)uell ,etulo;tle) -slue^euV sq)nJ 
;1e9qdaso; Alsranlun) autell ueS 1o sol^ds ueS JoIlrslenru6) 'elulojlle) ^rlsran!un) '(epeue),ueMaq)lplses,o '(eruenl{suua6 pualrl Jo pneurv eLlUeW laluec Illsle^lun) 6uor}stu.rv Iel) Jo ,(loorlls '(o)sr)uerlueS'etuloJtle) Ilrs.ranru6) Ilrsranguq) a1ano3,(ire1 lp)lpowpre^lpH)ueu)loj qepnf outpuv;neg,(a6puque1 1o '(r(grsranru6 '(6o1o19 sUnI) ueLuoll {enreg ,(o)souellues/etulo;tleJ leln)alowJo{role.roqel)UW)soulv epur-l,(uo}sog 'q)leesou ,(a6prrquel;o ;o {ltstentul) lreqog'(Ilrste^run etqunlo))r.lleqq)srJ l)ua}lall loJ alnlltsul pat3 le)tpaulolg llv Ug)) ,etulo}tle) p;era9'({e;e1tog ,(o)st)uellueS,eturo;tle);o {re9 Jo{lrslenlu6)auo}satrl firsranrul)uelv laeLl)tW A;slen;up) pte6y prne6suolllpe qyno, pue,prrql ,puo)es,lsrrl (epeup)'oluorofJofirsranru6) seutqeqS eac ,(epeue) 'otuolof ,(eulsnv,euuatn jo {t;stanru6; ltzpnul)lN Jorerstenru6l) ,(firslenru6alnC) snleqsrypl^e6 lptol) lanueuu3 srapeag '(uopuol'tl)reasau ,(fttsla^tun la)ue) Joolnttlsul)ranu3buel unuerg r(Iaq5 ,eruermelloueell {resso;g a)tn6)Mopul ufueq5,(o6arqupS,etuJo;tle);o l(1rsranlul) ,(atnlrlsul lurl llo)s '(eru16rr410 r(tls.ran!un) uoslaulf '(^l!sla^tun selleq) lles eql) uosloul {;laneg,()n ,uoffns,q)leasaule)ue) ,()n ,q)leaseu o}ntllsul) rolleja))ou aql) qdlpU ueululors Jo lo)ue)) esnos ueurlllqlnu '(slnol.lS,,(llslanru6 uol6u!rlse4) ur6;3qere5 o sreuouelae)'(looq)S ,fismefeg snely,(rllleoH le)!powpre^leH)
(Cancer Hurst(University of Bath,UK), Research, UK),Laurence TonyHyman(Max CollegeLondon), JeremyHyams(University Dresden), CellBiology& Genetics, PlanckInstituteof Molecular Philip Instituteof Technology), Hynes(Massachusetts Richard UK),Normanlscove(Ontario Ingham(University of Sheffield, (Cancer Research, Toronto), Davidlsh-Horowicz CancerInstitute, (University Charles 5an Francisco), Lily Jan of California, UK), (Columbia (deceased), Arthur University), TomJessell Janeway AndyJohnston(JohnInnes A & M University), Johnson(Texas College, Norwich,UK),E.G. Jordan(QueenElizabeth Institute, (University Ray LosAngeles), of California, London), RonKaback DouglasKellogg of California, Berkeley), Keller(University (University of of California, SantaCruz),RegisKelly(University (MRCLaboratory JohnKendrick-Jones SanFrancisco), California, of Biology, Cambridge), CynthiaKenyon(University of Molecular (University of RogerKeynes 5anFrancisco), California, Madison), of Wisconsin, JudithKimble(University Cambridge), (Massachusetts Marc Hospital), Kingston General Robert (National (Harvard Klausner University), Richard Kirschner (Harvard Mike University), NancyKleckner of Health), Institutes (University Boulder), KellyKomachi of Colorado, Klymkowsky (University EugeneKoonin(National of California, SanFrancisco), (University 5an of California, JuanKorenbrot lnstitutes of Health), (University 5anFrancisco), of California, TomKornberg Francisco), (Washington Daniel 5t.Louis), University, StuartKornfeld (University MarilynKozak of California, Berkeley), Koshland (Stanford (University University), MarkKrasnow of Pittsburgh), (MaxPlancklnstitutefor Biophysics, Frankfurt WernerKrlhlbrandt (University Robert Berkeley), of California, am Main),JohnKuriyan Peter London), CellBiology, for Molecular Kypta(MRCLaboratory (MRCCenter, Cambridge), UlrichLaemmli(University Lachmann of Cambridge), TrevorLamb(University Switzerland), of Geneva, of Research, UK),DavidLane(University 
HartmutLand(Cancer (University JayLash of Oxford), JaneLangdale Dundee, Scotland), of (University PeterLawrence(MRCLaboratory of Pennsylvania), (MountSinaiSchool PaulLazarow Biology, Cambridge), Molecular (DukeUniversity), Michael RobertJ.Lefkowitz of Medicine), WarrenLevinson of California, Berkeley), Levine(University (Hebrew (University AlexLevitzki SanFrancisco), of California, (University of York, UK),Joachim Leyser lsrael), Ottoline University, TomasLindahl(Cancer SanFrancisco), of California, Li (University (University San of California, Research, UK),VishuLingappa (National of Institutes JenniferLippincott-Schwartz Francisco), Schoolof DanLittman(NewYorkUniversity Health,Bethesda), UK),Richard Norwich, CliveLloyd(JohnInnesInstitute, Medicine), (National Institute RobinLovell-Badge University), Losick(Harvard of London), ShirleyLowe(University for MedicalResearch, (University of LauraMachesky 5anFrancisco), California, Medical of Colorado UK),iamesMaller(University Birmingham, (Harvard ColinManoil(Harvard University), TomManiatis School), (National JewishMedicaland Marrack Philippa MedicalSchool), of Cancer MarkMarsh(lnstitute Denver), Research Center, San of California, GailMartin(University London), Research, Joan CollegeLondon), PaulMartin(University Francisco), (Memorial Center), Brian Cancer Sloan-Kettering Massagu6 (University McCarty lrvine),Richard of California, McCarthy (University (Cornell of California, WilliamMcGinnis University), (Wellcome/Cancer Campaign Research Anne McLaren Davis), of California, FrankMcNally(University Cambridge), Institute, Institut,Basel), Miescher Meins(Freiderich Freiderick Davis), lraMellman 5anDiego), Mel(University of California, Stephanie (YaleUniversity). of California, Meyer(University Barbara Instituteof Technology), ElliotMeyerowitz(California Berkeley), of RobertMishell(University University), ChrisMiller(Brandeis (University CollegeLondon), UK),AvrionMitchison Birminoham,
(University TimMitchison CollegeLondon), N.A.Mitchison (TheRockefeller (Harvard PeterMombaerts MedicalSchool), DavidMorgan MarkMooseker(YaleUniversity), University), MichelleMoritz (University SanFrancisco), of California, Moses(Duke Montrose (University 5anFrancisco), of California, (University SanFrancisco), of California, Mostov Keith University), HansM0ller-Eberhard CollegeLondon), AnneMudge(University of AlanMunro(University Institute), (Scripps Clinicand Research (Harvard Richard University), Mitchison J.Murdoch Cambridge), of California, DianaMyles(University University), Myers(Stanford MarkE.Nelson University), AndrewMurray(Harvard Davis), MichaelNeuberger (University Urbana-Champaign), of lllinois, Walter Cambridge), Biology, (MRCLaboratory of Molecular DavidNicholls of Munich,Germany), Neupert(University of Noble(University Suzanne (University Scotland), of Dundee, (University of California, HarryNoller 5anFrancisco), California, Paul Davis), of California, JodiNunnari(University SantaCruz), Patrick UK),DuncanO'Dell(deceased), Research, Nurse(Cancer Olson Maynard (University 5anFrancisco), of California, O'Farrell (Children's Orkin Stuart (University Seattle), Washington, of (Massachusetts Instituteof TerriOrr-Weaver Hospital,Boston), WilliamOtto ErinO'Shea(HarvardUniversity), Technology), of Birmingham, (Cancer UK),JohnOwen(University Research, Palade (University George Michigan), of Oxender UK),Dale (University San of California, (deceased), Panning Barbara WilliamW. 
(University Tucson), of Arizona, RoyParker Francisco), TerencePartridge Seattle), of Washington, Parson(University WilliamE.Paul(National (MRCClinical London), Centre, Sciences (MountSinaiHospital, Toronto), TonyPawson of Health), Institutes Cambridge), Biology, of Molecular HughPelham(MRCLaboratory Greg Philadelphia), Research, of Cancer RobertPerry(lnstitute (Cancer Research, GordonPeters University), Petsko(Brandeis JeremyPickettUniversity), UK),DavidPhillips(TheRockefeller JuliePitcher Australia), of Melbourne, Heaps(TheUniversity JeffreyPollard(AlbertEinstein (University CollegeLondon), BrucePonder TomPollard(YaleUniversity), of Medicine), College of California, DanPortnoy(University (University of Cambridge), (University Seattle), Washington, of JamesPriess Berkeley), (Duke (Tulane DalePurves University), DarwinProckop JordanRaff EfraimRacker(CornellUniversity), University), (University KlausRajewsky (Wellcome/CRC Institute,Cambridge), (University Elio Oxford)' of Ratcliffe George Germany), of Cologne, (University MartinRechsteiner (Harvard MedicalSchool), Raviola Institutefor Medical of Utah,SaltLakeCity),DavidRees(National (University San of California, Reichardt Louis London), Research, (YaleUniversity), ConlyRieder FredRichards Francisco), (Massachusetts Robbins Phillips (Wadsworth Albany), Center, of Reading, ElaineRobson(University Instituteof Technology), Rosenbaum Joel (The University), Rockefeller UK),RobertRoeder Toronto), (MountSinaiHospital, (YaleUniversity), JanetRossant JimRothman(Memorial of Health), Institutes JesseRoth(National (LaJollaCancer ErkkiRuoslahti Center), Cancer Sloan-Kettering General GaryRuvkun(Massachusetts Foundation), Research (NewYorkUniversity), AlanSachs DavidSabatini Hospital), of AlanSachs(University Berkeley), (University of California, (University North of Salmon Edward Berkeley), California, Peter University), ChapelHill),JoshuaSanes(Harvard Carolina, LisaSatterwhite(DukeUniversity Sarnow(StanfordUniversity), 
Medical School), Howard Schachman (University of California, Berkeley), Gottfried Schatz (Biozentrum, University of Basel), Randy Schekman (University of California, Berkeley), Richard Scheller (Stanford University), Giampietro Schiavo (Cancer Research, UK), Joseph Schlessinger (New York University Medical Center), Michael Schramm (Hebrew University), Robert Schreiber (Scripps Clinic and Research Institute), James Schwartz (Columbia University), Ronald Schwartz (National Institutes of Health), François Schweisguth (ENS, Paris), John Scott (University of Manchester, UK), John Sedat (University of California, San Francisco), Peter Selby (Cancer Research, UK), Zvi Sellinger (Hebrew University, Israel), Gregg Semenza (Johns Hopkins University), Philippe Sengel (University of Grenoble, France), Peter Shaw (John Innes Institute, Norwich, UK), Michael Sheetz (Columbia University), David Shima (Cancer Research, UK), Samuel Silverstein (Columbia University), Kai Simons (Max Planck Institute of Molecular Cell Biology and Genetics, Dresden), Melvin I. Simon (California Institute of Technology), Jonathan Slack (Cancer Research, UK), Alison Smith (John Innes Institute, Norfolk, UK), John Maynard Smith (University of Sussex, UK), Frank Solomon (Massachusetts Institute of Technology), Michael Solursh (University of Iowa), Bruce Spiegelman (Harvard Medical School), Timothy Springer (Harvard Medical School), Mathias Sprinzl (University of Bayreuth, Germany), Scott Stachel (University of California, Berkeley), Andrew Staehelin (University of Colorado, Boulder), David Standring (University of California, San Francisco), Margaret Stanley (University of Cambridge), Martha Stark (University of California, San Francisco), Wilfred Stein (Hebrew University, Israel), Malcolm Steinberg (Princeton University), Paul Sternberg (California Institute of Technology), Chuck Stevens (The Salk Institute), Murray Stewart (MRC Laboratory of Molecular Biology, Cambridge), Monroe Strickberger (University of Missouri, St. Louis), Robert Stroud (University of California, San Francisco), Michael Stryker (University of California, San Francisco), William Sullivan (University of California, Santa Cruz), Daniel Szollosi (Institut National de la Recherche Agronomique, France), Jack Szostak (Massachusetts General Hospital), Masatoshi Takeichi (Kyoto University), Clifford Tabin (Harvard Medical School), Diethard Tautz (University of Cologne, Germany), Julie Theriot (Stanford University), Roger Thomas (University of
Bristol, UK), Vernon Thornton (King's College London), Cheryll Tickle (University of Dundee, Scotland), Jim Till (Ontario Cancer Institute, Toronto), Lewis Tilney (University of Pennsylvania), Nick Tonks (Cold Spring Harbor Laboratory), Alain Townsend (Institute of Molecular Medicine, John Radcliffe Hospital, Oxford), Paul Travers (Anthony Nolan Research Institute, London), Robert Trelstad (UMDNJ, Robert Wood Johnson Medical School), Anthony Trewavas (Edinburgh University, Scotland), Nigel Unwin (MRC Laboratory of Molecular Biology, Cambridge), Victor Vacquier (University of California, San Diego), Harry van der Westen (Wageningen, The Netherlands), Tom Vanaman (University of Kentucky), Harold Varmus (Sloan-Kettering Institute), Alexander Varshavsky (California Institute of Technology), Madhu Wahi (University of California, San Francisco), Virginia Walbot (Stanford University), Frank Walsh (Glaxo-Smithkline-Beecham, UK), Trevor Wang (John Innes Institute, Norwich, UK), Yu-Lie Wang (Worcester Foundation for Biomedical Research), Anne Warner (University College London), Graham Warren (Yale University School of Medicine), Paul Wassarman (Mount Sinai School of Medicine), Clare Waterman-Storer (The Scripps Research Institute), Fiona Watt (Cancer Research, UK), John Watts (John Innes Institute, Norwich, UK), Klaus Weber (Max Planck Institute for Biophysical Chemistry, Göttingen), Martin Weigert (Institute of Cancer Research, Philadelphia), Harold Weintraub (deceased), Karsten Weis (University of California, Berkeley), Irving Weissman (Stanford University), Jonathan Weissman (University of California, San Francisco), Norman Wessells (Stanford University), Judy White (University of Virginia), Steven West (Cancer Research, UK), William Wickner (Dartmouth College), Michael Wilcox (deceased), Lewis T. Williams (Chiron Corporation), Keith Willison (Chester Beatty Laboratories, London), John Wilson (Baylor University), Alan Wolffe (deceased), Richard Wolfenden (University of North Carolina, Chapel Hill), Sandra Wolin (Yale University School of Medicine), Lewis Wolpert (University College London), Rick Wood (Cancer Research, UK), Abraham Worcel (University of Rochester), Nick Wright (Cancer Research, UK), John Wyke (Beatson Institute for Cancer Research, Glasgow), Keith Yamamoto (University of California, San Francisco), Charles
Yocum (University of Michigan, Ann Arbor), Peter Yurchenco (UMDNJ, Robert Wood Johnson Medical School), Rosalind Zalin (University College London), Patricia Zambryski (University of California, Berkeley).
A Note to the Reader

Structure of the Book
Although the chapters of this book can be read independently of one another, they are arranged in a logical sequence of five parts. The first three chapters of Part I cover elementary principles and basic biochemistry. They can serve either as an introduction for those who have not studied biochemistry or as a refresher course for those who have. Part II deals with the storage, expression, and transmission of genetic information. Part III deals with the principles of the main experimental methods for investigating cells. It is not necessary to read these two chapters in order to understand the later chapters, but a reader will find them a useful reference. Part IV discusses the internal organization of the cell. Part V follows the behavior of cells in multicellular systems, starting with cell-cell junctions and extracellular matrix and concluding with two chapters on the immune system. Chapters 21–25 can be found on the Media DVD-ROM, which is packaged with each book, providing increased portability for students.

End-of-Chapter Problems
A selection of problems, written by John Wilson and Tim Hunt, now appears in the text at the end of each chapter. The complete solutions to these problems can be found in Molecular Biology of the Cell, Fifth Edition: The Problems Book.

References
A concise list of selected references is included at the end of each chapter. These are arranged in alphabetical order under the main chapter section headings. These references frequently include the original papers in which important discoveries were first reported. Chapter 8 includes several tables giving the dates of crucial developments along with the names of the scientists involved. Elsewhere in the book the policy has been to avoid naming individual scientists.

Media Codes
Media codes are integrated throughout the text to indicate when relevant videos and animations are available on the DVD-ROM.
The four-letter codes are enclosed in brackets and highlighted in color. The interface for the Cell Biology Interactive media player on the DVD-ROM contains a window where you enter the four-letter code. When the code is typed into the interface, the corresponding media item will load into the media player.

Glossary Terms
Throughout the book, boldface type has been used to highlight key terms at the point in a chapter where the main discussion of them occurs. Italic is used to set off important terms with a lesser degree of emphasis. At the end of the book is the expanded glossary, covering technical terms that are part of the common currency of cell biology; it is intended as a first resort for a reader who encounters an unfamiliar term used without explanation.

Nomenclature for Genes and Proteins
Each species has its own conventions for naming genes; the only common feature is that they are always set in italics. In some species (such as humans), gene names are spelled out all in capital letters; in other species (such as zebrafish), all in lower case; in yet others (most mouse genes, for the most part), with the first letter in upper case and the rest in lower case; or (as in Drosophila) with different combinations of upper and lower case, according to whether the first mutant allele to be discovered gave a dominant or recessive phenotype. Conventions for naming protein products are equally varied.

This typographical chaos drives everyone crazy. It is not just tiresome and absurd; it is also unsustainable. We cannot independently define a fresh convention for each of the next few million species whose genes we may wish to study. Moreover, there are many occasions, especially in a book such as this, where we need to refer to a gene generically, without specifying the mouse version, the human version, the chick version, or the hippopotamus version, because they are all equivalent for the purposes of the discussion. What convention then should we use?

We have decided in this book to cast aside the conventions for individual species and follow a uniform rule: we write all gene names, like the names of people and places, with the first letter in upper case and the rest in lower case, but all in italics, thus: Apc, Bazooka, Cdc2, Dishevelled, Egl1. The corresponding protein, where it is named after the gene, will be written in the same way, but in roman rather than italic letters: Apc, Bazooka, Cdc2, Dishevelled, Egl1. When it is necessary to specify the organism, this can be done with a prefix to the gene name.

For completeness, we list a few further details of naming rules that we shall follow. In some instances an added letter in the gene name is traditionally used to distinguish between genes that are related by function or evolution; for those genes we put that letter in upper case if it is usual to do so (LacZ, RecA, HoxA4). We use no hyphen to separate added letters or numbers from the rest of the name. Proteins are more of a problem. Many of them have names in their own right, assigned to them before the gene was named.
Such protein names take many forms, although most of them traditionally begin with a lower-case letter (actin, hemoglobin, catalase), like the names of ordinary substances (cheese, nylon), unless they are acronyms (such as GFP, for Green Fluorescent Protein, or BMP4, for Bone Morphogenetic Protein #4). To force all such protein names into a uniform style would do too much violence to established usages, and we shall simply write them in the traditional way (actin, GFP, etc.). For the corresponding gene names in all these cases, we shall nevertheless follow our standard rule: Actin, Hemoglobin, Catalase, Bmp4, Gfp. Occasionally in our book we need to highlight a protein name by setting it in italics for emphasis; the intention will generally be clear from the context.

For those who wish to know them, the table below shows some of the official conventions for individual species, conventions that we shall mostly violate in this book, in the manner shown.
                                            Species-specific convention           Unified convention used in this book
Organism                                    Gene                  Protein         Gene                  Protein        Note
Mouse                                       Hoxa4                 Hoxa4           HoxA4                 HoxA4
Mouse                                       Bmp4                  BMP4            Bmp4                  BMP4
Mouse                                       integrin α-1, Itgα1   integrin α1     Integrin α1, Itgα1    integrin α1
Human                                       HOXA4                 HOXA4           HoxA4                 HoxA4
Zebrafish                                   cyclops, cyc          Cyclops, Cyc    Cyclops, Cyc          Cyclops, Cyc
Caenorhabditis                              unc-6                 UNC-6           Unc6                  Unc6
Drosophila                                  sevenless, sev        Sevenless, SEV  Sevenless, Sev        Sevenless, Sev named after recessive mutant phenotype
Drosophila                                  Deformed, Dfd         Deformed, DFD   Deformed, Dfd         Deformed, Dfd  named after dominant mutant phenotype
Saccharomyces cerevisiae (budding yeast)    CDC28                 Cdc28, Cdc28p   Cdc28                 Cdc28
Schizosaccharomyces pombe (fission yeast)   Cdc2                  Cdc2, Cdc2p     Cdc2                  Cdc2
Arabidopsis                                 GAI                   GAI             Gai                   GAI
E. coli                                     uvrA                  UvrA            UvrA                  UvrA
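The uniform gene-naming rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `unified_gene_name` and the `keep_upper` parameter are hypothetical, and the positions of traditionally capitalized added letters (as in HoxA4 or UvrA) must be supplied by hand, because the text gives no mechanical way to detect them.

```python
def unified_gene_name(name: str, keep_upper: frozenset = frozenset()) -> str:
    """Apply the book's uniform rule to a species-specific gene name.

    Rule from the text: first letter upper case, the rest lower case,
    and no hyphen before added letters or numbers.
    keep_upper (hypothetical parameter) lists 0-based positions, counted
    after hyphens are removed, of added letters that stay upper case.
    """
    cleaned = name.replace("-", "")
    if not cleaned:
        raise ValueError("gene name must not be empty")
    chars = [cleaned[0].upper()]
    for idx, ch in enumerate(cleaned[1:], start=1):
        chars.append(ch.upper() if idx in keep_upper else ch.lower())
    return "".join(chars)


# Species-specific names taken from the table above.
print(unified_gene_name("unc-6"))                    # Unc6
print(unified_gene_name("CDC28"))                    # Cdc28
print(unified_gene_name("HOXA4", frozenset({3})))    # HoxA4
```

Italic versus roman type, which distinguishes gene from protein in the book, is outside the scope of this sketch.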
Ancillaries

Molecular Biology of the Cell, Fifth Edition: The Problems Book
by John Wilson and Tim Hunt (ISBN: 978-0-8153-4110-9)
The Problems Book is designed to help students appreciate the ways in which experiments and simple calculations can lead to an understanding of how cells work. It provides problems to accompany Chapters 1–20 of Molecular Biology of the Cell. Each chapter of problems is divided into sections that correspond to those of the main textbook and review key terms, test for understanding basic concepts, and pose research-based problems. Molecular Biology of the Cell, Fifth Edition: The Problems Book should be useful for homework assignments and as a basis for class discussion. It could even provide ideas for exam questions. Solutions for all of the problems are provided on the CD-ROM which accompanies the book. Solutions for the end-of-chapter problems in the main textbook are also found in The Problems Book.

MBoC5 Media DVD-ROM
The DVD included with every copy of the book contains the figures, tables, and micrographs from the book, pre-loaded into PowerPoint® presentations, one for each chapter. A separate folder contains individual versions of each figure, table, and micrograph in JPEG format. The panels are available in PDF format. There are also over 125 videos, animations, molecular structure tutorials, and high-resolution micrographs on the DVD. The authors have chosen to include material that not only reinforces basic concepts but also expands the content and scope of the book. The multimedia can be accessed either as individual files or through the Cell Biology Interactive media player. As discussed above, the media player has been programmed to work with the Media Codes integrated throughout the book.
A complete table of contents and overview of all electronic resources is contained in the MBoC5 Media Viewing Guide, a PDF file located on the root level of the DVD-ROM and in the Appendix of the media player. The DVD-ROM also contains Chapters 21–25, which cover multicellular systems. The chapters are in PDF format and can be easily printed or searched using Adobe® Acrobat® Reader or other PDF software.

Teaching Supplements
Upon request, teaching supplements for Molecular Biology of the Cell are available to qualified instructors.

MBoC5 Transparency Set
Provides 200 full-color overhead acetate transparencies of the most important figures from the book.

MBoC5 Test Questions
A selection of test questions will be available. Written by Kirsten Benjamin (Amyris Biotechnologies, Emeryville, California) and Linda Huang (University of Massachusetts, Boston), these thought questions will test students' understanding of the chapter material.

MBoC5 Lecture Outlines
Lecture outlines created from the concept heads for the text are provided.

Garland Science Classwire™
All of the teaching supplements on the DVD-ROM (these include figures in PowerPoint and JPEG format; Chapters 21–25 in PDF format; 125 videos, animations, and movies) and the test questions and lecture outlines are available to qualified instructors online at the Garland Science Classwire™ Web site. Garland Science Classwire™ offers access to other instructional resources from all of the Garland Science textbooks, and provides free online course management tools. For additional information, please visit http://www.classwire.com/garlandscience or e-mail [email protected]. (Classwire is a trademark of Chalkfree, Inc.)

Adobe and Acrobat are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries. PowerPoint is either a registered trademark or trademark of Microsoft Corporation in the United States and/or other countries.
INTRODUCTION TO THE CELL
Chapter 1
Cells and Genomes

The surface of our planet is populated by living things: curious, intricately organized chemical factories that take in matter from their surroundings and use these raw materials to generate copies of themselves. The living organisms appear extraordinarily diverse. What could be more different than a tiger and a piece of seaweed, or a bacterium and a tree? Yet our ancestors, knowing nothing of cells or DNA, saw that all these things had something in common. They called that something “life,” marveled at it, struggled to define it, and despaired of explaining what it was or how it worked in terms that relate to nonliving matter.

The discoveries of the past century have not diminished the marvel; quite the contrary. But they have lifted away the mystery as to the nature of life. We can now see that all living things are made of cells, and that these units of living matter all share the same machinery for their most basic functions. Living things, though infinitely varied when viewed from the outside, are fundamentally similar inside. The whole of biology is a counterpoint between the two themes: astonishing variety in individual particulars; astonishing constancy in fundamental mechanisms.

In this first chapter we begin by outlining the universal features common to all life on our planet. We then survey, briefly, the diversity of cells. And we see how, thanks to the common code in which the specifications for all living organisms are written, it is possible to read, measure, and decipher these specifications to achieve a coherent understanding of all the forms of life, from the smallest to the greatest.
In This Chapter
THE UNIVERSAL FEATURES OF CELLS ON EARTH
THE DIVERSITY OF GENOMES AND THE TREE OF LIFE
GENETIC INFORMATION IN EUCARYOTES
THE UNIVERSAL FEATURES OF CELLS ON EARTH

It is estimated that there are more than 10 million, perhaps 100 million, living species on Earth today. Each species is different, and each reproduces itself faithfully, yielding progeny that belong to the same species: the parent organism hands down information specifying, in extraordinary detail, the characteristics that the offspring shall have. This phenomenon of heredity is central to the definition of life: it distinguishes life from other processes, such as the growth of a crystal, or the burning of a candle, or the formation of waves on water, in which orderly structures are generated but without the same type of link between the peculiarities of parents and the peculiarities of offspring. Like the candle flame, the living organism consumes free energy to create and maintain its organization; but the free energy drives a hugely complex system of chemical processes that is specified by the hereditary information.

Most living organisms are single cells; others, such as ourselves, are vast multicellular cities in which groups of cells perform specialized functions and are linked by intricate systems of communication. But in all cases, whether we discuss the solitary bacterium or the aggregate of more than 10¹³ cells that form a human body, the whole organism has been generated by cell divisions from a single cell. The single cell, therefore, is the vehicle for the hereditary information that defines the species (Figure 1–1). And specified by this information, the cell includes the machinery to gather raw materials from the environment, and to construct out of them a new cell in its own image, complete with a new copy of the hereditary information. Nothing less than a cell has this capability.
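To get a feel for what "generated by cell divisions from a single cell" implies, the short Python sketch below estimates the minimum number of rounds of division needed to reach the cell count quoted above. It rests on a deliberately idealized assumption that is not in the text: every cell divides in every round and none die. The function name `min_doubling_rounds` is illustrative only.

```python
def min_doubling_rounds(target_cells: int) -> int:
    """Smallest n with 2**n >= target_cells, starting from a single cell.

    Idealized assumption: all cells divide synchronously and none are lost.
    Integer arithmetic avoids floating-point rounding near powers of two.
    """
    if target_cells < 1:
        raise ValueError("target_cells must be at least 1")
    return (target_cells - 1).bit_length()


# The text quotes more than 10**13 cells for a human body.
print(min_doubling_rounds(10**13))  # 44
```

Under these assumptions a few dozen rounds of doubling suffice; real development involves unequal division rates and extensive cell death, so this is a lower bound rather than a description of what happens.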
Chapter 1: Cells and Genomes
Figure 1–1 The hereditary information in the fertilized egg cell determines the nature of the whole multicellular organism. (A and B) A sea urchin egg gives rise to a sea urchin. (C and D) A mouse egg gives rise to a mouse. (E and F) An egg of the seaweed Fucus gives rise to a Fucus seaweed. (A, courtesy of David McClay; B, courtesy of M. Gibbs, Oxford Scientific Films; C, courtesy of Patricia Calarco, from G. Martin, Science 209:768–776, 1980. With permission from AAAS; D, courtesy of O. Newman, Oxford Scientific Films; E and F, courtesy of Colin Brownlee.)
All Cells Store Their Hereditary Information in the Same Linear Chemical Code (DNA)

Computers have made us familiar with the concept of information as a measurable quantity: a million bytes (to record a few hundred pages of text or an image from a digital camera), 600 million for the music on a CD, and so on. They have also made us well aware that the same information can be recorded in many different physical forms. As the computer world has evolved, the discs and tapes that we used 10 years ago for our electronic archives have become unreadable on present-day machines. Living cells, like computers, deal in information, and it is estimated that they have been evolving and diversifying for over 3.5 billion years. It is scarcely to be expected that they should all store their information in the same form, or that the archives of one type of cell should be readable by the information-handling machinery of another. And yet it is so. All living cells on Earth, without any known exception, store their hereditary information in the form of double-stranded molecules of DNA: long unbranched paired polymer chains, formed always of the same four types of monomers. These monomers have nicknames drawn from a four-letter alphabet (A, T, C, G), and they are strung together in a long linear sequence that encodes the genetic information, just as the sequence of 1s and 0s encodes the information in a computer file. We can take a piece of DNA from a human cell and insert it into a bacterium, or a piece of bacterial DNA and insert it into a human cell, and the information will be successfully read, interpreted, and copied. Using chemical methods, scientists can read out the complete sequence of monomers in any DNA molecule, extending for millions of nucleotides, and thereby decipher the hereditary information that each organism contains.
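The comparison between a DNA sequence and a computer file can be made concrete. With four possible monomers, each position carries at most two bits, so four nucleotides fit in one byte. The Python sketch below shows one way to pack and unpack a sequence; the particular bit assignment (A=00, C=01, G=10, T=11) and the function names are arbitrary choices made for illustration, not a standard taken from the text.

```python
# Arbitrary illustrative mapping: two bits per base.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack_dna(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte.

    The last byte is padded with 'A' if the length is not a multiple of 4.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4].ljust(4, "A")
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)


def unpack_dna(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])


seq = "GTAACGGT"
packed = pack_dna(seq)
print(len(seq), "bases stored in", len(packed), "bytes")
print(unpack_dna(packed, len(seq)) == seq)
```

On this accounting, a sequence of four million nucleotides would occupy about a million bytes, the same order as the few hundred pages of text mentioned above.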
All Cells Replicate Their Hereditary Information by Templated Polymerization

The mechanisms that make life possible depend on the structure of the double-stranded DNA molecule. Each monomer in a single DNA strand (that is, each nucleotide) consists of two parts: a sugar (deoxyribose) with a phosphate group attached to it, and a base, which may be either adenine (A), guanine (G), cytosine (C), or thymine (T) (Figure 1–2). Each sugar is linked to the next via the phosphate group, creating a polymer chain composed of a repetitive sugar-phosphate backbone with a series of bases protruding from it. The DNA polymer is extended by adding monomers at one end. For a single isolated strand, these can, in principle, be added in any order, because each one links to the next in the same way, through the part of the molecule that is the same for all of them. In the living cell, however, DNA is not synthesized as a free strand in isolation, but on a template formed by a preexisting DNA strand. The bases protruding from the existing strand bind to bases of the strand being synthesized, according to a strict rule defined by the complementary structures of the bases: A binds to T, and C binds to G. This base-pairing holds fresh monomers in place and thereby controls the selection of which one of the four monomers shall be added to the growing strand next. In this way, a double-stranded structure is created, consisting of two exactly complementary sequences of As, Cs, Ts, and Gs. The two strands twist around each other, forming a double helix (Figure 1–2E).
[Figure 1–2 panel labels: (A) building block of DNA (sugar phosphate + base = nucleotide); (B) DNA strand, with sugar-phosphate backbone; (C) templated polymerization of new strand from nucleotide monomers; (D) double-stranded DNA, with hydrogen-bonded base pairs; (E) DNA double helix.]
Figure 1–2 DNA and its building blocks. (A) DNA is made from simple subunits, called nucleotides, each consisting of a sugar-phosphate molecule with a nitrogen-containing side group, or base, attached to it. The bases are of four types (adenine, guanine, cytosine, and thymine), corresponding to four distinct nucleotides, labeled A, G, C, and T. (B) A single strand of DNA consists of nucleotides joined together by sugar-phosphate linkages. Note that the individual sugar-phosphate units are asymmetric, giving the backbone of the strand a definite directionality, or polarity. This directionality guides the molecular processes by which the information in DNA is interpreted and copied in cells: the information is always “read” in a consistent order, just as written English text is read from left to right. (C) Through templated polymerization, the sequence of nucleotides in an existing DNA strand controls the sequence in which nucleotides are joined together in a new DNA strand; T in one strand pairs with A in the other, and G in one strand with C in the other. The new strand has a nucleotide sequence complementary to that of the old strand, and a backbone with opposite directionality: corresponding to the GTAA... of the original strand, it has ...TTAC. (D) A normal DNA molecule consists of two such complementary strands. The nucleotides within each strand are linked by strong (covalent) chemical bonds; the complementary nucleotides on opposite strands are held together more weakly, by hydrogen bonds. (E) The two strands twist around each other to form a double helix, a robust structure that can accommodate any sequence of nucleotides without altering its basic structure.
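The pairing rule and the opposite directionality of the new strand can be expressed as a minimal Python sketch. It assumes error-free pairing and treats a strand simply as a string read in its own direction; the function names are illustrative.

```python
# Base-pairing rule from the text: A pairs with T, and G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Partner base at each position, aligned with the template."""
    return "".join(PAIR[base] for base in strand)


def new_strand(template: str) -> str:
    """Complementary strand written in its own reading direction.

    Because the two backbones run in opposite directions, the new
    strand read in its own direction is the reversed complement.
    """
    return complement(template)[::-1]


print(complement("GTAA"))   # CATT (aligned position by position)
print(new_strand("GTAA"))   # TTAC (as in Figure 1-2C)
```

Applying `new_strand` twice returns the original sequence, which is one way of seeing why either strand of a double helix is sufficient to specify the other.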
Figure 1–3 The copying of genetic information by DNA replication. In this process, the two strands of a DNA double helix are pulled apart, and each serves as a template for synthesis of a new complementary strand.
The bonds between the base pairs are weak compared with the sugar-phosphate links, and this allows the two DNA strands to be pulled apart without breakage of their backbones. Each strand then can serve as a template, in the way just described, for the synthesis of a fresh DNA strand complementary to itself—a fresh copy, that is, of the hereditary information (Figure 1–3). In different types of cells, this process of DNA replication occurs at different rates, with different controls to start it or stop it, and different auxiliary molecules to help it along. But the basics are universal: DNA is the information store, and templated polymerization is the way in which this information is copied throughout the living world.
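The copying scheme just described, in which the two strands separate and each serves as a template, can be sketched as follows. This is a toy model under stated assumptions: copying is error-free, a helix is represented as a pair of position-aligned strings, and strand direction is ignored for simplicity.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def template_copy(strand: str) -> str:
    """Build the partner strand, position by position, on a template."""
    return "".join(PAIR[base] for base in strand)


def replicate(helix: tuple) -> tuple:
    """Separate the two strands and use each as a template.

    Returns two daughter helices; each keeps one old strand and
    gains one newly synthesized strand.
    """
    old_a, old_b = helix
    daughter_1 = (old_a, template_copy(old_a))   # old_a plus a new partner
    daughter_2 = (template_copy(old_b), old_b)   # old_b plus a new partner
    return daughter_1, daughter_2


parent = ("GTAACGGT", template_copy("GTAACGGT"))
d1, d2 = replicate(parent)
print(d1 == parent and d2 == parent)  # True: both daughters match the parent
```

The sketch leaves out everything that differs between cell types (rates, controls, auxiliary molecules) and keeps only the universal part: the information is copied by templated polymerization.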
All Cells Transcribe Portions of Their Hereditary Information into the Same Intermediary Form (RNA)

To carry out its information-bearing function, DNA must do more than copy itself. It must also express its information, by letting it guide the synthesis of other molecules in the cell. This also occurs by a mechanism that is the same in all living organisms, leading first and foremost to the production of two other key classes of polymers: RNAs and proteins. The process (discussed in detail in Chapters 6 and 7) begins with a templated polymerization called transcription, in which segments of the DNA sequence are used as templates for the synthesis of shorter molecules of the closely related polymer ribonucleic acid, or RNA. Later, in the more complex process of translation, many of these RNA molecules direct the synthesis of polymers of a radically different chemical class, the proteins (Figure 1–4).

In RNA, the backbone is formed of a slightly different sugar from that of DNA (ribose instead of deoxyribose), and one of the four bases is slightly different, with uracil (U) in place of thymine (T); but the other three bases (A, C, and G) are the same, and all four bases pair with their complementary counterparts in DNA: the A, U, C, and G of RNA with the T, A, G, and C of DNA. During transcription, RNA monomers are lined up and selected for polymerization on a template strand of DNA, just as DNA monomers are selected during replication. The outcome is a polymer molecule whose sequence of nucleotides faithfully represents a part of the cell’s genetic information, even though written in a slightly different alphabet, consisting of RNA monomers instead of DNA monomers.

The same segment of DNA can be used repeatedly to guide the synthesis of many identical RNA transcripts. Thus, whereas the cell’s archive of genetic information in the form of DNA is fixed and sacrosanct, the RNA transcripts are mass-produced and disposable (Figure 1–5).
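The pairing between a DNA template and the RNA made on it can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the transcript is written position by position against the template, so strand direction is ignored, and the choice of which DNA segment to transcribe is left out entirely. The function name `transcribe` is illustrative.

```python
# Pairing stated in the text: the A, U, C, and G of RNA pair with
# the T, A, G, and C of DNA, respectively.
DNA_TO_RNA = {"T": "A", "A": "U", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """RNA monomers selected by pairing with a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template)


template = "TACGGT"
transcripts = [transcribe(template) for _ in range(3)]
print(transcripts[0])              # AUGCCA
print(len(set(transcripts)) == 1)  # True: repeated use gives identical copies
```

The template itself is never modified by the function, which mirrors the contrast drawn above between the fixed DNA archive and its disposable RNA copies.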
Figure 1–4 From DNA to protein. Genetic information is read out and put to use through a two-step process. First, in transcription, segments of the DNA sequence are used to guide the synthesis of molecules of RNA. Then, in translation, the RNA molecules are used to guide the synthesis of molecules of protein.

As we shall see, these transcripts function as intermediates in the transfer of genetic information: they mainly serve as messenger RNA (mRNA) to guide the synthesis of proteins according to the genetic instructions stored in the DNA. RNA molecules have distinctive structures that can also give them other specialized chemical capabilities. Being single-stranded, their backbone is flexible, so that the polymer chain can bend back on itself to allow one part of the molecule to form weak bonds with another part of the same molecule. This occurs when segments of the sequence are locally complementary: a ...GGGG... segment, for example, will tend to associate with a ...CCCC... segment. These types of internal associations can cause an RNA chain to fold up into a specific shape that is dictated by its sequence (Figure 1–6). The shape of the RNA molecule, in turn, may enable it to recognize other molecules by binding to them selectively, and even, in certain cases, to catalyze chemical changes in the molecules that are bound. As we see in Chapter 6, a few chemical reactions catalyzed by RNA molecules are crucial for several of the most ancient and fundamental processes in living cells, and it has been suggested that more extensive catalysis by RNA played a central part in the early evolution of life.
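The idea that locally complementary segments of one RNA chain can pair with each other lends itself to a small search routine. The Python sketch below scans a sequence for pairs of segments that could form a short stem. It is a toy under stated assumptions: only A-U and G-C pairs are considered, the stem length and minimum loop size are arbitrary parameters, and no attempt is made to judge which pairings would actually be stable. The function names are illustrative.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(segment: str) -> str:
    """Sequence that would pair with `segment` when the chain folds back."""
    return "".join(RNA_PAIR[base] for base in reversed(segment))


def find_stems(rna: str, stem_len: int = 4, min_loop: int = 3) -> list:
    """Return (i, j) start positions of segments that could pair.

    The segment starting at i can pair with the segment starting at j
    further along the chain, with at least `min_loop` unpaired bases
    between them.
    """
    stems = []
    for i in range(len(rna) - stem_len + 1):
        partner = reverse_complement(rna[i:i + stem_len])
        j = rna.find(partner, i + stem_len + min_loop)
        while j != -1:
            stems.append((i, j))
            j = rna.find(partner, j + 1)
    return stems


# A GGGG segment and a CCCC segment separated by a short loop.
print(find_stems("AAGGGGUUUUCCCCAA"))  # [(2, 10)]
```

Real RNA folding depends on the combined stability of many such pairings and on three-dimensional constraints, so this search only points to candidate regions.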
Figure 1–5 How genetic information is broadcast for use inside the cell. Each cell contains a fixed set of DNA molecules—its archive of genetic information. A given segment of this DNA guides the synthesis of many identical RNA transcripts, which serve as working copies of the information stored in the archive. Many different sets of RNA molecules can be made by transcribing selected parts of a long DNA sequence, allowing each cell to use its information store differently.
All Cells Use Proteins as Catalysts
Protein molecules, like DNA and RNA molecules, are long unbranched polymer chains, formed by stringing together monomeric building blocks drawn from a standard repertoire that is the same for all living cells. Like DNA and RNA, they carry information in the form of a linear sequence of symbols, in the same way as a human message written in an alphabetic script. There are many different protein molecules in each cell, and—leaving out the water—they form most of the cell’s mass.
The monomers of protein, the amino acids, are quite different from those of DNA and RNA, and there are 20 types, instead of 4. Each amino acid is built around the same core structure through which it can be linked in a standard way to any other amino acid in the set; attached to this core is a side group that gives each amino acid a distinctive chemical character. Each of the protein molecules, or polypeptides, created by joining amino acids in a particular sequence folds into a precise three-dimensional form with reactive sites on its surface (Figure
Figure 1–6 The conformation of an RNA molecule. (A) Nucleotide pairing between different regions of the same RNA polymer chain causes the molecule to adopt a distinctive shape. (B) The three-dimensional structure of an actual RNA molecule, from hepatitis delta virus, that catalyzes RNA strand cleavage. The blue ribbon represents the sugar-phosphate backbone; the bars represent base pairs. (B, based on A.R. Ferré D’Amaré, K. Zhou and J.A. Doudna, Nature 395:567–574, 1998. With permission from Macmillan Publishers Ltd.)
Chapter 1: Cells and Genomes
[Figure 1–7 labels: (A) lysozyme; (B) polysaccharide chain, catalytic site, lysozyme molecule]
Figure 1–7 How a protein molecule acts as catalyst for a chemical reaction. (A) In a protein molecule the polymer chain folds up into a specific shape defined by its amino acid sequence. A groove in the surface of this particular folded molecule, the enzyme lysozyme, forms a catalytic site. (B) A polysaccharide molecule (red)—a polymer chain of sugar monomers—binds to the catalytic site of lysozyme and is broken apart, as a result of a covalent bond-breaking reaction catalyzed by the amino acids lining the groove.
1–7A). These amino acid polymers thereby bind with high specificity to other molecules and act as enzymes to catalyze reactions that make or break covalent bonds. In this way they direct the vast majority of chemical processes in the cell (Figure 1–7B). Proteins have many other functions as well—maintaining structures, generating movements, sensing signals, and so on—each protein molecule performing a specific function according to its own genetically specified sequence of amino acids. Proteins, above all, are the molecules that put the cell’s genetic information into action. Thus, polynucleotides specify the amino acid sequences of proteins. Proteins, in turn, catalyze many chemical reactions, including those by which new DNA molecules are synthesized, and the genetic information in DNA is used to make both RNA and proteins. This feedback loop is the basis of the autocatalytic, self-reproducing behavior of living organisms (Figure 1–8).
All Cells Translate RNA into Protein in the Same Way
The translation of genetic information from the 4-letter alphabet of polynucleotides into the 20-letter alphabet of proteins is a complex process. The rules of this translation seem in some respects neat and rational, in other respects strangely arbitrary, given that they are (with minor exceptions) identical in all living things. These arbitrary features, it is thought, reflect frozen accidents in the early history of life—chance properties of the earliest organisms that were passed on by heredity and have become so deeply embedded in the constitution of all living cells that they cannot be changed without disastrous effects.
The information in the sequence of a messenger RNA molecule is read out in groups of three nucleotides at a time: each triplet of nucleotides, or codon, specifies (codes for) a single amino acid in a corresponding protein. Since there are 64 (= 4 × 4 × 4) possible codons, all of which occur in nature, but only 20 amino acids, there are necessarily many cases in which several codons correspond to the same amino acid.
The code is read out by a special class of small RNA molecules, the transfer RNAs (tRNAs). Each type of tRNA becomes attached at one end to a specific amino acid, and displays at its other end a specific sequence of three nucleotides—an anticodon—that enables it to recognize, through base-pairing, a particular codon or subset of codons in mRNA (Figure 1–9). For synthesis of protein, a succession of tRNA molecules charged with their appropriate amino acids have to be brought together with an mRNA molecule and matched up by base-pairing through their anticodons with each of its successive codons. The amino acids then have to be linked together to extend the growing protein chain, and the tRNAs, relieved of their burdens, have to be released.
This whole complex of processes is carried out by a giant multimolecular machine, the ribosome, formed of two main chains of RNA, called ribosomal RNAs
Figure 1–8 Life as an autocatalytic process. Polynucleotides (nucleotide polymers) and proteins (amino acid polymers) provide the sequence information and the catalytic functions that serve—through a complex set of chemical reactions—to bring about the synthesis of more polynucleotides and proteins of the same types.
[Figure 1–8 labels: amino acids; nucleotides; proteins (catalytic function); polynucleotides (sequence information)]
(rRNAs), and more than 50 different proteins. This evolutionarily ancient molecular juggernaut latches onto the end of an mRNA molecule and then trundles along it, capturing loaded tRNA molecules and stitching together the amino acids they carry to form a new protein chain (Figure 1–10).
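The codon-by-codon readout described in this section can be illustrated with a short sketch. It builds the standard genetic code as a lookup table and reads an mRNA three nucleotides at a time. The names `CODON_TABLE` and `translate` are hypothetical, amino acids are given in one-letter notation, and the three stop codons of the standard code, which the passage above does not discuss, are marked `*`. The sketch simply starts reading at the first nucleotide; it models neither the ribosome nor tRNAs.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the standard genetic code, listed in
# UCAG order for the first, second, and third codon positions.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read `mrna` in successive triplets and return the encoded protein."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the chain ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODON_TABLE))                  # 64 possible codons
print(len(set(AMINO_ACIDS) - {"*"}))     # 20 amino acids
print(translate("AUGUGGUAA"))            # MW (methionine, tryptophan)
```

Because 64 codons map onto 20 amino acids, several table entries share the same value, which is the redundancy noted in the text.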
The Fragment of Genetic Information Corresponding to One Protein Is One Gene
DNA molecules as a rule are very large, containing the specifications for thousands of proteins. Individual segments of the entire DNA sequence are transcribed into separate mRNA molecules, with each segment coding for a different protein. Each such DNA segment represents one gene. A complication is that RNA molecules transcribed from the same DNA segment can often be processed in more than one way, so as to give rise to a set of alternative versions of a protein, especially in more complex cells such as those of plants and animals. A gene therefore is defined, more generally, as the segment of DNA sequence corresponding to a single protein or set of alternative protein variants (or to a single catalytic or structural RNA molecule for those genes that produce RNA but not protein).
In all cells, the expression of individual genes is regulated: instead of manufacturing its full repertoire of possible proteins at full tilt all the time, the cell adjusts the rate of transcription and translation of different genes independently, according to need. Stretches of regulatory DNA are interspersed among the segments
Figure 1–9 Transfer RNA. (A) A tRNA molecule specific for the amino acid tryptophan. One end of the tRNA molecule has tryptophan attached to it, while the other end displays the triplet nucleotide sequence CCA (its anticodon), which recognizes the tryptophan codon in messenger RNA molecules. (B) The three-dimensional structure of the tryptophan tRNA molecule. Note that the codon and the anticodon in (A) are in antiparallel orientations, like the two strands in a DNA double helix (see Figure 1–2), so that the sequence of the anticodon in the tRNA is read from right to left, while that of the codon in the mRNA is read from left to right.
[Figure 1–9 labels: (A) amino acid (tryptophan); specific tRNA molecule; anticodon, drawn as A C C; tRNA binds to its codon in mRNA; base-pairing of the anticodon A C C with the codon U G G in mRNA; net result: amino acid is selected by its codon. (B) three-dimensional structure]
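The antiparallel matching noted in the caption of Figure 1–9 can be checked with a small sketch. Both sequences are written in the same reading direction, so the first base of the anticodon pairs with the last base of the codon. The function name `anticodon_matches` is hypothetical, and the sketch assumes strict A–U and G–C pairing, so it does not capture a tRNA that recognizes a subset of several codons.

```python
# Standard base-pairing partners in RNA.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_matches(anticodon: str, codon: str) -> bool:
    """Return True if the anticodon pairs with the codon in antiparallel
    orientation (both written in the same reading direction)."""
    if len(anticodon) != 3 or len(codon) != 3:
        return False
    return all(PAIR[a] == c for a, c in zip(anticodon, reversed(codon)))


# The tryptophan tRNA anticodon CCA recognizes the tryptophan codon UGG.
print(anticodon_matches("CCA", "UGG"))  # True
print(anticodon_matches("CCA", "UGA"))  # False
```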
Figure 1–10 A ribosome at work. (A) The diagram shows how a ribosome moves along an mRNA molecule, capturing tRNA molecules that match the codons in the mRNA and using them to join amino acids into a protein chain. The mRNA specifies the sequence of amino acids. (B) The three-dimensional structure of a bacterial ribosome (pale green and blue), moving along an mRNA molecule (orange beads), with three tRNA molecules (yellow, green, and pink) at different stages in their process of capture and release. The ribosome is a giant assembly of more than 50 individual protein and RNA molecules. (B, courtesy of Joachim Frank, Yanhong Li and Rajendra Agarwal.)
[Figure 1–10 labels: growing polypeptide chain; incoming tRNA loaded with amino acid; Steps 1 and 2; sites P and A]
that code for protein, and these noncoding regions bind to special protein molecules that control the local rate of transcription (Figure 1–11). Other noncoding DNA is also present, some of it serving, for example, as punctuation, defining where the information for an individual protein begins and ends. The quantity and organization of the regulatory and other noncoding DNA vary widely from one class of organisms to another, but the basic strategy is universal. In this way, the genome of the cell—that is, the total of its genetic information as embodied in its complete DNA sequence—dictates not only the nature of the cell’s proteins, but also when and where they are to be made.
[Figure 1–10 labels, continued: two subunits of ribosome; mRNA; sites P and A; Step 3]
Life Requires Free Energy
A living cell is a dynamic chemical system, operating far from chemical equilibrium. For a cell to grow or to make a new cell in its own image, it must take in free energy from the environment, as well as raw materials, to drive the necessary synthetic reactions. This consumption of free energy is fundamental to life. When it stops, a cell decays towards chemical equilibrium and soon dies. Genetic information is also fundamental to life. Is there any connection? The answer is yes: free energy is required for the propagation of information. For example, to specify one bit of information—that is, one yes/no choice between two equally probable alternatives—costs a defined amount of free energy that can be calculated. The quantitative relationship involves some deep reasoning and depends on a precise definition of the term “free energy,” discussed in Chapter 2. The basic idea, however, is not difficult to understand intuitively. Picture the molecules in a cell as a swarm of objects endowed with thermal energy, moving around violently at random, buffeted by collisions with one another. To specify genetic information—in the form of a DNA sequence, for example—molecules from this wild crowd must be captured, arranged in a specific order defined by some preexisting template, and linked together in a fixed relationship. The bonds that hold the molecules in their proper places on the template and join them together must be strong enough to resist the disordering effect of thermal motion. The process is driven forward by consumption of free energy, which is needed to ensure that the correct bonds are made, and made robustly. In the simplest case, the molecules can be compared with spring-loaded traps, ready to snap into a more stable, lower-energy attached state when they meet their proper partners; as they snap together into the bonded arrangement, their available stored energy—their free energy—like the energy of the spring in the trap, is released and dissipated as heat. 
In a cell, the chemical processes underlying information transfer are more complex, but the same basic principle applies: free energy has to be spent on the creation of order. To replicate its genetic information faithfully, and indeed to make all its complex molecules according to the correct specifications, the cell therefore requires free energy, which has to be imported somehow from the surroundings.
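The passage above says that the free-energy cost of specifying one bit can be calculated but does not give the formula. As a hedged illustration, the sketch below uses the standard thermodynamic lower bound of kT ln 2 per bit; treating this as the quantity the text has in mind is an assumption, as are the function name `min_free_energy_per_bit` and the choice of 310 K (roughly human body temperature) for the example.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23   # Boltzmann constant, J/K
AVOGADRO_PER_MOL = 6.02214076e23   # Avogadro constant, 1/mol


def min_free_energy_per_bit(temperature_k: float) -> float:
    """Lower bound, in joules, on the free energy needed to fix one bit
    (one choice between two equally probable alternatives): kT ln 2."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


per_bit = min_free_energy_per_bit(310.0)
per_mole_of_bits = per_bit * AVOGADRO_PER_MOL

print(f"{per_bit:.3e} J per bit")                    # about 2.967e-21 J
print(f"{per_mole_of_bits / 1000:.2f} kJ per mole")  # about 1.79 kJ/mol
```

This is only a minimum; the text's point is that real bonds must also be made robustly enough to resist thermal motion, which costs more.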
[Figure 1–10 labels, continued: Step 4; new tRNA bringing next amino acid; return to Step 1; panel (A)]
All Cells Function as Biochemical Factories Dealing with the Same Basic Molecular Building Blocks
Because all cells make DNA, RNA, and protein, and these macromolecules are composed of the same set of subunits in every case, all cells have to contain and
manipulate a similar collection of small molecules, including simple sugars, nucleotides, and amino acids, as well as other substances that are universally required for their synthesis. All cells, for example, require the phosphorylated nucleotide ATP (adenosine triphosphate) as a building block for the synthesis of DNA and RNA; and all cells also make and consume this molecule as a carrier of free energy and phosphate groups to drive many other chemical reactions. Although all cells function as biochemical factories of a broadly similar type, many of the details of their small-molecule transactions differ, and it is not as easy as it is for the informational macromolecules to point out the features that are strictly universal. Some organisms, such as plants, require only the simplest of nutrients and harness the energy of sunlight to make from these almost all their own small organic molecules; other organisms, such as animals, feed on living things and obtain many of their organic molecules ready-made. We return to this point below.
All Cells Are Enclosed in a Plasma Membrane Across Which Nutrients and Waste Materials Must Pass
There is, however, at least one other feature of cells that is universal: each one is enclosed by a membrane—the plasma membrane. This container acts as a selective barrier that enables the cell to concentrate nutrients gathered from its environment and retain the products it synthesizes for its own use, while excreting its waste products. Without a plasma membrane, the cell could not maintain its integrity as a coordinated chemical system.
The molecules forming this membrane have the simple physico-chemical property of being amphiphilic—that is, consisting of one part that is hydrophobic (water-insoluble) and another part that is hydrophilic (water-soluble). Such molecules placed in water aggregate spontaneously, arranging their hydrophobic portions to be as much in contact with one another as possible to hide them from the water, while keeping their hydrophilic portions exposed. Amphiphilic molecules of appropriate shape, such as the phospholipid molecules that comprise most of the plasma membrane, spontaneously aggregate in water to form a bilayer that creates small closed vesicles (Figure 1–12). The phenomenon can be demonstrated in a test tube by simply mixing phospholipids and water together; under appropriate conditions, small vesicles form whose aqueous contents are isolated from the external medium. Although the chemical details vary, the hydrophobic tails of the predominant membrane molecules in all cells are hydrocarbon polymers (–CH2–CH2–CH2–), and their spontaneous assembly into a bilayered vesicle is but one of many examples of an important general principle: cells produce molecules whose chemical properties cause them to self-assemble into the structures that a cell needs.
The cell boundary cannot be totally impermeable. If a cell is to grow and reproduce, it must be able to import raw materials and export waste across its plasma membrane.
All cells therefore have specialized proteins embedded in their membrane that transport specific molecules from one side to the other (Figure 1–13). Some of these membrane transport proteins, like some of the proteins that catalyze the fundamental small-molecule reactions inside the cell,
Figure 1–11 Gene regulation by protein binding to regulatory DNA. (A) A diagram of a small portion of the genome of the bacterium Escherichia coli, containing genes (called LacI, LacZ, LacY, and LacA) coding for four different proteins. The protein-coding DNA segments (red) have regulatory and other noncoding DNA segments (yellow) between them. (B) An electron micrograph of DNA from this region, with a protein molecule (encoded by the LacI gene) bound to the regulatory segment; this protein controls the rate of transcription of the LacZ, LacY, and LacA genes. (C) A drawing of the structures shown in (B). (B, courtesy of Jack Griffith.)
[Figure 1–11 labels: (A) genes LacI, LacZ, LacY, and LacA; noncoding DNA segments; site of protein binding shown in micrograph (B); scale bar, 2000 nucleotide pairs. (C) protein bound to regulatory segment of DNA; segment of DNA coding for protein]
have been so well preserved over the course of evolution that we can recognize the family resemblances between them in comparisons of even the most distantly related groups of living organisms. The transport proteins in the membrane largely determine which molecules enter the cell, and the catalytic proteins inside the cell determine the reactions that those molecules undergo. Thus, by specifying the proteins that the cell is to manufacture, the genetic information recorded in the DNA sequence dictates the entire chemistry of the cell; and not only its chemistry, but also its form and its behavior, for these too are chiefly constructed and controlled by the cell’s proteins.
[Figure 1–12 labels: phospholipid monolayer; oil; water; phospholipid bilayer]
A Living Cell Can Exist with Fewer Than 500 Genes
The basic principles of biological information transfer are simple enough, but how complex are real living cells? In particular, what are the minimum requirements? We can get a rough indication by considering a species that has one of the smallest known genomes—the bacterium Mycoplasma genitalium (Figure 1–14). This organism lives as a parasite in mammals, and its environment provides it with many of its small molecules ready-made. Nevertheless, it still has to make all the large molecules—DNA, RNAs, and proteins—required for the basic processes of heredity. It has only about 480 genes in its genome of 580,070 nucleotide pairs, representing 145,018 bytes of information—about as much as it takes to record the text of one chapter of this book. Cell biology may be complicated, but it is not impossibly so. The minimum number of genes for a viable cell in today’s environments is probably not less than 200–300, although there are only about 60 genes in the core set shared by all living species without any known exception.
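The byte count quoted above follows from simple arithmetic, shown here as a short sketch. It assumes that each nucleotide pair is one of four equally likely symbols and therefore carries 2 bits, and that a byte is 8 bits; the function name `genome_bytes` is hypothetical.

```python
BITS_PER_NUCLEOTIDE_PAIR = 2   # 4 possible symbols = 2 bits each
BITS_PER_BYTE = 8


def genome_bytes(nucleotide_pairs: int) -> int:
    """Whole bytes needed to store a sequence of the given length,
    rounding any partial byte up."""
    bits = nucleotide_pairs * BITS_PER_NUCLEOTIDE_PAIR
    return (bits + BITS_PER_BYTE - 1) // BITS_PER_BYTE


# Mycoplasma genitalium: 580,070 nucleotide pairs.
print(genome_bytes(580_070))  # 145018
```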
[Figure 1–13 labels: (A) plasma membrane, outside and inside of the cell, H+. (B) sugars (13); amino acids, peptides, amines (14); ions (16); other (3)]
Figure 1–13 Membrane transport proteins. (A) Structure of a molecule of bacteriorhodopsin, from the archaeon (archaebacterium) Halobacterium halobium. This transport protein uses the energy of absorbed light to pump protons (H+ ions) out of the cell. The polypeptide chain threads to and fro across the membrane; in several regions it is twisted into a helical conformation, and the helical segments are arranged to form the walls of a channel through which ions are transported. (B) Diagram of the set of transport proteins found in the membrane of the bacterium Thermotoga maritima. The numbers in parentheses refer to the number of different membrane transport proteins of each type. Most of the proteins within each class are evolutionarily related to one another and to their counterparts in other species.
Figure 1–12 Formation of a membrane by amphiphilic phospholipid molecules. These have a hydrophilic (water-loving, phosphate) head group and a hydrophobic (water-avoiding, hydrocarbon) tail. At an interface between oil and water, they arrange themselves as a single sheet with their head groups facing the water and their tail groups facing the oil. When immersed in water, they aggregate to form bilayers enclosing aqueous compartments.
Figure 1–14 Mycoplasma genitalium. (A) Scanning electron micrograph showing the irregular shape of this small bacterium, reflecting the lack of any rigid wall. (B) Cross section (transmission electron micrograph) of a Mycoplasma cell. Of the 477 genes of Mycoplasma genitalium, 37 code for transfer, ribosomal, and other nonmessenger RNAs. Functions are known, or can be guessed, for 297 of the genes coding for protein: of these, 153 are involved in replication, transcription, translation, and related processes involving DNA, RNA, and protein; 29 in the membrane and surface structures of the cell; 33 in the transport of nutrients and other molecules across the membrane; 71 in energy conversion and the synthesis and degradation of small molecules; and 11 in the regulation of cell division and other processes. (A, from S. Razin et al., Infect. Immun. 30:538–546, 1980. With permission from the American Society for Microbiology; B, courtesy of Roger Cole, in Medical Microbiology, 4th ed. [S. Baron ed.]. Galveston: University of Texas Medical Branch, 1996.)
[Figure 1–14A scale bar: 5 µm]
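The gene counts in the caption of Figure 1–14 can be tallied as a quick consistency check. The category keys below are shorthand invented for this sketch, and the number of protein-coding genes without an assigned function is derived here on the assumption that every gene not coding for a nonmessenger RNA codes for protein; the caption itself does not state that figure.

```python
total_genes = 477
rna_genes = 37  # transfer, ribosomal, and other nonmessenger RNAs

# Protein-coding genes whose function is known or can be guessed.
assigned = {
    "DNA, RNA, and protein processes": 153,
    "membrane and surface structures": 29,
    "transport across the membrane": 33,
    "energy conversion and small molecules": 71,
    "regulation of cell division and other processes": 11,
}

protein_genes = total_genes - rna_genes        # 440
assigned_total = sum(assigned.values())        # 297, as stated in the caption
unassigned = protein_genes - assigned_total    # 143 (derived, see assumption)

print(protein_genes, assigned_total, unassigned)
```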
Summary
Living organisms reproduce themselves by transmitting genetic information to their progeny. The individual cell is the minimal self-reproducing unit, and is the vehicle for transmission of the genetic information in all living species. Every cell on our planet stores its genetic information in the same chemical form—as double-stranded DNA. The cell replicates its information by separating the paired DNA strands and using each as a template for polymerization to make a new DNA strand with a complementary sequence of nucleotides. The same strategy of templated polymerization is used to transcribe portions of the information from DNA into molecules of the closely related polymer, RNA. These in turn guide the synthesis of protein molecules by the more complex machinery of translation, involving a large multimolecular machine, the ribosome, which is itself composed of RNA and protein. Proteins are the principal catalysts for almost all the chemical reactions in the cell; their other functions include the selective import and export of small molecules across the plasma membrane that forms the cell’s boundary. The specific function of each protein depends on its amino acid sequence, which is specified by the nucleotide sequence of a corresponding segment of the DNA—the gene that codes for that protein. In this way, the genome of the cell determines its chemistry; and the chemistry of every living cell is fundamentally similar, because it must provide for the synthesis of DNA, RNA, and protein. The simplest known cells have just under 500 genes.
THE DIVERSITY OF GENOMES AND THE TREE OF LIFE
The success of living organisms based on DNA, RNA, and protein, out of the infinitude of other chemical forms that we might conceive of, has been spectacular. They have populated the oceans, covered the land, infiltrated the Earth’s crust, and molded the surface of our planet. Our oxygen-rich atmosphere, the deposits of coal and oil, the layers of iron ores, the cliffs of chalk and limestone and marble—all these are products, directly or indirectly, of past biological activity on Earth.
Living things are not confined to the familiar temperate realm of land, water, and sunlight inhabited by plants and plant-eating animals. They can be found in the darkest depths of the ocean, in hot volcanic mud, in pools beneath the frozen surface of the Antarctic, and buried kilometers deep in the Earth’s crust. The creatures that live in these extreme environments are generally unfamiliar, not only because they are inaccessible, but also because they are mostly microscopic. In more homely habitats, too, most organisms are too small for us to see without special equipment: they tend to go unnoticed, unless they cause a disease or rot the timbers of our houses. Yet microorganisms make up most of the
[Figure 1–14B scale bar: 0.2 µm]
total mass of living matter on our planet. Only recently, through new methods of molecular analysis and specifically through the analysis of DNA sequences, have we begun to get a picture of life on Earth that is not grossly distorted by our biased perspective as large animals living on dry land. In this section we consider the diversity of organisms and the relationships among them. Because the genetic information for every organism is written in the universal language of DNA sequences, and the DNA sequence of any given organism can be obtained by standard biochemical techniques, it is now possible to characterize, catalogue, and compare any set of living organisms with reference to these sequences. From such comparisons we can estimate the place of each organism in the family tree of living species—the ‘tree of life’. But before describing what this approach reveals, we need first to consider the routes by which cells in different environments obtain the matter and energy they require to survive and proliferate, and the ways in which some classes of organisms depend on others for their basic chemical needs.
Cells Can Be Powered by a Variety of Free Energy Sources
Living organisms obtain their free energy in different ways. Some, such as animals, fungi, and the bacteria that live in the human gut, get it by feeding on other living things or the organic chemicals they produce; such organisms are called organotrophic (from the Greek word trophe, meaning “food”). Others derive their energy directly from the nonliving world. These fall into two classes: those that harvest the energy of sunlight, and those that capture their energy from energy-rich systems of inorganic chemicals in the environment (chemical systems that are far from chemical equilibrium). Organisms of the former class are called phototrophic (feeding on sunlight); those of the latter are called lithotrophic (feeding on rock). Organotrophic organisms could not exist without these primary energy converters, which are the most plentiful form of life.
Phototrophic organisms include many types of bacteria, as well as algae and plants, on which we—and virtually all the living things that we ordinarily see around us—depend. Phototrophic organisms have changed the whole chemistry of our environment: the oxygen in the Earth’s atmosphere is a by-product of their biosynthetic activities.
Lithotrophic organisms are not such an obvious feature of our world, because they are microscopic and mostly live in habitats that humans do not frequent—deep in the ocean, buried in the Earth’s crust, or in various other inhospitable environments. But they are a major part of the living world, and are especially important in any consideration of the history of life on Earth. Some lithotrophs get energy from aerobic reactions, which use molecular oxygen from the environment; since atmospheric O2 is ultimately the product of living organisms, these aerobic lithotrophs are, in a sense, feeding on the products of past life.
There are, however, other lithotrophs that live anaerobically, in places where little or no molecular oxygen is present, in circumstances similar to those that must have existed in the early days of life on Earth, before oxygen had accumulated. The most dramatic of these sites are the hot hydrothermal vents found deep down on the floor of the Pacific and Atlantic Oceans, in regions where the ocean floor is spreading as new portions of the Earth’s crust form by a gradual upwelling of material from the Earth’s interior (Figure 1–15). Downward-percolating seawater is heated and driven back upward as a submarine geyser, carrying with it a current of chemicals from the hot rocks below. A typical cocktail might include H2S, H2, CO, Mn2+, Fe2+, Ni2+, CH4, NH4+, and phosphorus-containing compounds. A dense population of microbes lives in the neighborhood of the vent, thriving on this austere diet and harvesting free energy from reactions between the available chemicals. Other organisms—clams, mussels, and giant marine worms—in turn live off the microbes at the vent, forming an entire ecosystem analogous to the system of plants and animals that we belong to, but powered by geochemical energy instead of light (Figure 1–16).
Figure 1–15 The geology of a hot hydrothermal vent in the ocean floor. Water percolates down toward the hot molten rock upwelling from the Earth’s interior and is heated and driven back upward, carrying minerals leached from the hot rock. A temperature gradient is set up, from more than 350°C near the core of the vent, down to 2–3°C in the surrounding ocean. Minerals precipitate from the water as it cools, forming a chimney. Different classes of organisms, thriving at different temperatures, live in different neighborhoods of the chimney. A typical chimney might be a few meters tall, with a flow rate of 1–2 m/sec.
[Figure 1–15 labels: sea; dark cloud of hot mineral-rich water; hydrothermal vent; anaerobic lithotrophic bacteria; invertebrate animal community; chimney made from precipitated metal sulfides; 2–3°C; sea floor; 350°C contour; percolation of seawater; hot mineral solution; hot basalt]
Some Cells Fix Nitrogen and Carbon Dioxide for Others
To make a living cell requires matter, as well as free energy. DNA, RNA, and protein are composed of just six elements: hydrogen, carbon, nitrogen, oxygen, sulfur, and phosphorus. These are all plentiful in the nonliving environment, in the Earth’s rocks, water, and atmosphere, but not in chemical forms that allow easy incorporation into biological molecules. Atmospheric N2 and CO2, in particular, are extremely unreactive, and a large amount of free energy is required to drive the reactions that use these inorganic molecules to make the organic compounds needed for further biosynthesis—that is, to fix nitrogen and carbon dioxide, so as to make N and C available to living organisms. Many types of living cells lack the biochemical machinery to achieve this fixation, and rely on other classes of cells to do the job for them. We animals depend on plants for our supplies of
[Figure 1–16 labels: geochemical energy and inorganic raw materials; bacteria; multicellular animals, e.g., tubeworms; scale bar, 1 m]
Figure 1–16 Living organisms at a hot hydrothermal vent. Close to the vent, at temperatures up to about 120°C, various lithotrophic species of bacteria and archaea (archaebacteria) live, directly fuelled by geochemical energy. A little further away, where the temperature is lower, various invertebrate animals live by feeding on these microorganisms. Most remarkable are the giant (2-meter) tube worms, which, rather than feed on the lithotrophic cells, live in symbiosis with them: specialized organs in the worms harbor huge numbers of symbiotic sulfur-oxidizing bacteria. These bacteria harness geochemical energy and supply nourishment to their hosts, which have no mouth, gut, or anus. The dependence of the tube worms on the bacteria for the harnessing of geothermal energy is analogous to the dependence of plants on chloroplasts for the harnessing of solar energy, discussed later in this chapter. The tube worms, however, are thought to have evolved from more conventional animals, and to have become secondarily adapted to life at hydrothermal vents. (Courtesy of Dudley Foster, Woods Hole Oceanographic Institution.)
Figure 1–17 Shapes and sizes of some bacteria. Although most are small, as shown, measuring a few micrometers in linear dimension, there are also some giant species. An extreme example (not shown) is the cigar-shaped bacterium Epulopiscium fishelsoni, which lives in the gut of a surgeonfish and can be up to 600 µm long.
[Figure 1–17 labels: scale bar, 2 µm; spherical cells, e.g., Streptococcus; rod-shaped cells, e.g., Escherichia coli, Vibrio cholerae; the smallest cells, e.g., Mycoplasma, Spiroplasma; spiral cells, e.g., Treponema pallidum]
organic carbon and nitrogen compounds. Plants in turn, although they can fix carbon dioxide from the atmosphere, lack the ability to fix atmospheric nitrogen, and they depend in part on nitrogen-fixing bacteria to supply their need for nitrogen compounds. Plants of the pea family, for example, harbor symbiotic nitrogen-fixing bacteria in nodules in their roots. Living cells therefore differ widely in some of the most basic aspects of their biochemistry. Not surprisingly, cells with complementary needs and capabilities have developed close associations. Some of these associations, as we see below, have evolved to the point where the partners have lost their separate identities altogether: they have joined forces to form a single composite cell.
The Greatest Biochemical Diversity Exists Among Procaryotic Cells

From simple microscopy, it has long been clear that living organisms can be classified on the basis of cell structure into two groups: the eucaryotes and the procaryotes. Eucaryotes keep their DNA in a distinct membrane-enclosed intracellular compartment called the nucleus. (The name is from the Greek, meaning “truly nucleated,” from the words eu, “well” or “truly,” and karyon, “kernel” or “nucleus”.) Procaryotes have no distinct nuclear compartment to house their DNA. Plants, fungi, and animals are eucaryotes; bacteria are procaryotes, as are archaea—a separate class of procaryotic cells, discussed below.

Most procaryotic cells are small and simple in outward appearance (Figure 1–17), and they live mostly as independent individuals or in loosely organized communities, rather than as multicellular organisms. They are typically spherical or rod-shaped and measure a few micrometers in linear dimension. They often have a tough protective coat, called a cell wall, beneath which a plasma membrane encloses a single cytoplasmic compartment containing DNA, RNA, proteins, and the many small molecules needed for life. In the electron microscope, this cell interior appears as a matrix of varying texture without any discernible organized internal structure (Figure 1–18).

Figure 1–18 The structure of a bacterium. (A) The bacterium Vibrio cholerae, showing its simple internal organization. Like many other species, Vibrio has a helical appendage at one end—a flagellum—that rotates as a propeller to drive the cell forward. (B) An electron micrograph of a longitudinal section through the widely studied bacterium Escherichia coli (E. coli). This is related to Vibrio but has many flagella (not visible in this section) distributed over its surface. The cell’s DNA is concentrated in the lightly stained region. (B, courtesy of E. Kellenberger.) Labels in (A): plasma membrane, DNA, cell wall, flagellum, ribosomes. Scale bars, 1 µm.
THE DIVERSITY OF GENOMES AND THE TREE OF LIFE
Figure 1–19 The phototrophic bacterium Anabaena cylindrica viewed in the light microscope. The cells of this species form long, multicellular filaments. Most of the cells (labeled V) perform photosynthesis, while others become specialized for nitrogen fixation (labeled H), or develop into resistant spores (labeled S). (Courtesy of Dave G. Adams.) Scale bar, 10 µm.
Procaryotic cells live in an enormous variety of ecological niches, and they are astonishingly varied in their biochemical capabilities—far more so than eucaryotic cells. Organotrophic species can utilize virtually any type of organic molecule as food, from sugars and amino acids to hydrocarbons and methane gas. Phototrophic species (Figure 1–19) harvest light energy in a variety of ways, some of them generating oxygen as a byproduct, others not. Lithotrophic species can feed on a plain diet of inorganic nutrients, getting their carbon from CO2, and relying on H2S to fuel their energy needs (Figure 1–20)—or on H2, or Fe2+, or elemental sulfur, or any of a host of other chemicals that occur in the environment. Many parts of this world of microscopic organisms are virtually unexplored. Traditional methods of bacteriology have given us an acquaintance with those species that can be isolated and cultured in the laboratory. But DNA sequence analysis of the populations of bacteria in samples from natural habitats—such as soil or ocean water, or even the human mouth—has opened our eyes to the fact that most species cannot be cultured by standard laboratory techniques. According to one estimate, at least 99% of procaryotic species remain to be characterized.
The Tree of Life Has Three Primary Branches: Bacteria, Archaea, and Eucaryotes

The classification of living things has traditionally depended on comparisons of their outward appearances: we can see that a fish has eyes, jaws, backbone, brain, and so on, just as we do, and that a worm does not; that a rosebush is cousin to an apple tree, but less similar to a grass. As Darwin showed, we can readily interpret such close family resemblances in terms of evolution from common ancestors, and we can find the remains of many of these ancestors preserved in the fossil record. In this way, it has been possible to begin to draw a family tree of living organisms, showing the various lines of descent, as well as branch points in the history, where the ancestors of one group of species became different from those of another.

When the disparities between organisms become very great, however, these methods begin to fail. How do we decide whether a fungus is closer kin to a plant or to an animal? When it comes to procaryotes, the task becomes harder still: one microscopic rod or sphere looks much like another. Microbiologists have therefore sought to classify procaryotes in terms of their biochemistry and nutritional requirements. But this approach also has its pitfalls. Amid the bewildering variety of biochemical behaviors, it is difficult to know which differences truly reflect differences of evolutionary history.

Genome analysis has given us a simpler, more direct, and more powerful way to determine evolutionary relationships. The complete DNA sequence of an organism defines its nature with almost perfect precision and in exhaustive detail. Moreover, this specification is in a digital form—a string of letters—that can be entered straightforwardly into a computer and compared with the corresponding information for any other living thing.
Because DNA is subject to random changes that accumulate over long periods of time (as we shall see shortly), the number of differences between the DNA sequences of two organisms can provide a direct, objective, quantitative indication of the evolutionary distance between them. This approach has shown that the organisms that were traditionally classed together as “bacteria” can be as widely divergent in their evolutionary origins as
Figure 1–20 A lithotrophic bacterium. Beggiatoa, which lives in sulfurous environments, gets its energy by oxidizing H2S and can fix carbon even in the dark. Note the yellow deposits of sulfur inside the cells. (Courtesy of Ralph W. Wolfe.) Scale bar, 6 µm.
Figure 1–21 The three major divisions (domains) of the living world. Note that traditionally the word bacteria has been used to refer to procaryotes in general, but more recently has been redefined to refer to eubacteria specifically. The tree shown here is based on comparisons of the nucleotide sequence of a ribosomal RNA subunit in the different species, and the distances in the diagram represent estimates of the numbers of evolutionary changes that have occurred in this molecule in each lineage (see Figure 1–22). The parts of the tree shrouded in gray cloud represent uncertainties about details of the true pattern of species divergence in the course of evolution: comparisons of nucleotide or amino acid sequences of molecules other than rRNA, as well as other arguments, lead to somewhat different trees. There is general agreement, however, as to the early divergence of the three most basic domains—the bacteria, the archaea, and the eucaryotes.
Tree labels: BACTERIA (cyanobacteria, Bacillus, E. coli, Thermotoga, Aquifex); ARCHAEA (Sulfolobus, Haloferax, Aeropyrum, Methanothermobacter, Methanococcus); EUCARYOTES (human, maize, yeast, Paramecium, Dictyostelium, Euglena, Trypanosoma, Giardia, Trichomonas). Root: common ancestor cell. Scale bar: 1 change/10 nucleotides.
is any procaryote from any eucaryote. It now appears that the procaryotes comprise two distinct groups that diverged early in the history of life on Earth, either before the ancestors of the eucaryotes diverged as a separate group or at about the same time. The two groups of procaryotes are called the bacteria (or eubacteria) and the archaea (or archaebacteria). The living world therefore has three major divisions or domains: bacteria, archaea, and eucaryotes (Figure 1–21). Archaea are often found inhabiting environments that we humans avoid, such as bogs, sewage treatment plants, ocean depths, salt brines, and hot acid springs, although they are also widespread in less extreme and more homely environments, from soils and lakes to the stomachs of cattle. In outward appearance they are not easily distinguished from bacteria. At a molecular level, archaea seem to resemble eucaryotes more closely in their machinery for handling genetic information (replication, transcription, and translation), but bacteria more closely in their apparatus for metabolism and energy conversion. We discuss below how this might be explained.
Some Genes Evolve Rapidly; Others Are Highly Conserved

Both in the storage and in the copying of genetic information, random accidents and errors occur, altering the nucleotide sequence—that is, creating mutations. Therefore, when a cell divides, its two daughters are often not quite identical to one another or to their parent. On rare occasions, the error may represent a change for the better; more probably, it will cause no significant difference in the cell’s prospects; and in many cases, the error will cause serious damage—for example, by disrupting the coding sequence for a key protein. Changes due to mistakes of the first type will tend to be perpetuated, because the altered cell has an increased likelihood of reproducing itself. Changes due to mistakes of the second type—selectively neutral changes—may be perpetuated or not: in the competition for limited resources, it is a matter of chance whether the altered cell or its cousins will succeed. But changes that cause serious damage lead nowhere: the cell that suffers them dies, leaving no progeny. Through endless repetition of this cycle of error and trial—of mutation and natural selection—organisms evolve: their genetic specifications change, giving them new ways to exploit the environment more effectively, to survive in competition with others, and to reproduce successfully.

Clearly, some parts of the genome change more easily than others in the course of evolution. A segment of DNA that does not code for protein and has no significant regulatory role is free to change at a rate limited only by the frequency of random errors. In contrast, a gene that codes for a highly optimized essential protein or RNA molecule cannot alter so easily: when mistakes occur, the faulty cells are almost always eliminated. Genes of this latter sort are therefore highly conserved. Through 3.5 billion years or more of evolutionary history, many features of the genome have changed beyond all recognition; but the most highly conserved genes remain perfectly recognizable in all living species. These latter genes are the ones we must examine if we wish to trace family relationships between the most distantly related organisms in the tree of life.

The studies that led to the classification of the living world into the three domains of bacteria, archaea, and eucaryotes were based chiefly on analysis of one of the two main RNA components of the ribosome—the so-called small-subunit ribosomal RNA. Because translation is fundamental to all living cells, this component of the ribosome has been well conserved since early in the history of life on Earth (Figure 1–22).
Most Bacteria and Archaea Have 1000–6000 Genes

Natural selection has generally favored those procaryotic cells that can reproduce the fastest by taking up raw materials from their environment and replicating themselves most efficiently, at the maximal rate permitted by the available food supplies. Small size implies a large ratio of surface area to volume, thereby helping to maximize the uptake of nutrients across the plasma membrane and boosting a cell’s reproductive rate. Presumably for these reasons, most procaryotic cells carry very little superfluous baggage; their genomes are small, with genes packed closely together and minimal quantities of regulatory DNA between them.

The small genome size makes it relatively easy to determine the complete DNA sequence. We now have this information for many species of bacteria and archaea, and a few species of eucaryotes. As shown in Table 1–1, most bacterial and archaeal genomes contain between 10^6 and 10^7 nucleotide pairs, encoding 1000–6000 genes.

A complete DNA sequence reveals both the genes an organism possesses and the genes it lacks. When we compare the three domains of the living world, we can begin to see which genes are common to all of them and must therefore have been present in the cell that was ancestral to all present-day living things, and which genes are peculiar to a single branch in the tree of life. To explain the findings, however, we need to consider a little more closely how new genes arise and genomes evolve.
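The surface-area-to-volume argument made in this section can be checked with a short calculation. The sketch below assumes an idealized spherical cell, which is a simplification introduced here for illustration (real cells vary in shape); the function name and the example radii are hypothetical.

```python
import math

def surface_to_volume(radius_um: float) -> float:
    """Surface-area-to-volume ratio (per micrometer) of an idealized sphere."""
    area = 4.0 * math.pi * radius_um ** 2
    volume = (4.0 / 3.0) * math.pi * radius_um ** 3
    return area / volume  # algebraically equal to 3 / radius

# Illustrative radii only: smaller spheres have proportionally more membrane.
for r in (0.5, 1.0, 10.0):
    print(f"radius {r:>4} um -> area/volume = {surface_to_volume(r):.2f} per um")
```

Under this assumption the ratio falls as 3/r, so a tenfold increase in radius leaves one tenth as much membrane area per unit of cytoplasm.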
Figure 1–22 Genetic information conserved since the days of the last common ancestor of all living things. A part of the gene for the smaller of the two main RNA components of the ribosome is shown. (The complete molecule is about 1500–1900 nucleotides long, depending on species.) Corresponding segments of nucleotide sequence from an archaean (Methanococcus jannaschii), a bacterium (Escherichia coli) and a eucaryote (Homo sapiens) are aligned. Sites where the nucleotides are identical between species are indicated by a vertical line; the human sequence is repeated at the bottom of the alignment so that all three two-way comparisons can be seen. A dot halfway along the E. coli sequence denotes a site where a nucleotide has been either deleted from the bacterial lineage in the course of evolution, or inserted in the other two lineages. Note that the sequences from these three organisms, representative of the three domains of the living world, all differ from one another to a roughly similar degree, while still retaining unmistakable similarities. Alignment rows, top to bottom: human, Methanococcus, E. coli, human.
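The kind of comparison summarized in Figure 1–22 can be sketched in a few lines. The example below counts identical sites in each two-way comparison of pre-aligned sequences. The three short sequences are invented for illustration and are not the rRNA sequences of the figure; the function name and the use of "-" for a deleted or inserted site are likewise assumptions.

```python
from itertools import combinations

def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percentage of aligned columns where both sequences have the same nucleotide."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    return 100.0 * matches / len(a)

# Hypothetical aligned fragments (toy data, not real rRNA).
aligned = {
    "eucaryote": "ACGGTACGTA",
    "bacterium": "ACGATAC-TA",  # "-" marks a site missing from this lineage
    "archaean":  "TCGGTACGTT",
}

# All three two-way comparisons, as in the figure.
for (name1, seq1), (name2, seq2) in combinations(aligned.items(), 2):
    print(f"{name1} vs {name2}: {percent_identity(seq1, seq2):.0f}% identical")
```

Real analyses first have to compute the alignment and usually correct the raw difference count for multiple changes at the same site; this sketch skips both steps.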
Table 1–1 Some Genomes That Have Been Completely Sequenced

SPECIES | SPECIAL FEATURES | HABITAT | GENOME SIZE (1000s OF NUCLEOTIDE PAIRS PER HAPLOID GENOME) | ESTIMATED NUMBER OF GENES CODING FOR PROTEINS

BACTERIA
Mycoplasma genitalium | has one of the smallest of all known cell genomes | human genital tract | 580 | 468
Synechocystis sp. | photosynthetic, oxygen-generating (cyanobacterium) | lakes and streams | 3573 | 3168
Escherichia coli | laboratory favorite | human gut | 4639 | 4289
Helicobacter pylori | causes stomach ulcers and predisposes to stomach cancer | human stomach | 1667 | 1590
Bacillus anthracis | causes anthrax | soil | 5227 | 5634
Aquifex aeolicus | lithotrophic; lives at high temperatures | hydrothermal vents | 1551 | 1544
Streptomyces coelicolor | source of antibiotics; giant genome | soil | 8667 | 7825
Treponema pallidum | spirochete; causes syphilis | human tissues | 1138 | 1041
Rickettsia prowazekii | bacterium most closely related to mitochondria; causes typhus | lice and humans (intracellular parasite) | 1111 | 834
Thermotoga maritima | organotrophic; lives at very high temperatures | hydrothermal vents | 1860 | 1877

ARCHAEA
Methanococcus jannaschii | lithotrophic, anaerobic, methane-producing | hydrothermal vents | 1664 | 1750
Archaeoglobus fulgidus | lithotrophic or organotrophic, anaerobic, sulfate-reducing | hydrothermal vents | 2178 | 2493
Nanoarchaeum equitans | smallest known archaean; anaerobic; parasitic on another, larger archaean | hydrothermal and volcanic hot vents | 491 | 552

EUCARYOTES
Saccharomyces cerevisiae (budding yeast) | minimal model eucaryote | grape skins, beer | 12,069 | ~6300
Arabidopsis thaliana (Thale cress) | model organism for flowering plants | soil and air | ~142,000 | ~26,000
Caenorhabditis elegans (nematode worm) | simple animal with perfectly predictable development | soil | ~97,000 | ~20,000
Drosophila melanogaster (fruit fly) | key to the genetics of animal development | rotting fruit | ~137,000 | ~14,000
Homo sapiens (human) | most intensively studied mammal | houses | ~3,200,000 | ~24,000

Genome size and gene number vary between strains of a single species, especially for bacteria and archaea. The table shows data for particular strains that have been sequenced. For eucaryotes, many genes can give rise to several alternative variant proteins, so that the total number of proteins specified by the genome is substantially greater than the number of genes.
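The claim that procaryotic genes are packed closely together can be illustrated with simple arithmetic on Table 1–1. The sketch below divides genome size by gene number for four entries copied from the table; the function name is hypothetical, and the eucaryotic values are the approximate ones given in the table.

```python
# Genome size (thousands of nucleotide pairs) and protein-coding gene counts
# copied from Table 1-1; values marked "~" in the table are approximate.
GENOMES = {
    "Mycoplasma genitalium": (580, 468),
    "Escherichia coli": (4639, 4289),
    "Saccharomyces cerevisiae": (12_069, 6300),
    "Homo sapiens": (3_200_000, 24_000),
}

def kb_per_gene(genome_kb: float, genes: int) -> float:
    """Average thousands of nucleotide pairs of genome per protein-coding gene."""
    return genome_kb / genes

for species, (size_kb, genes) in GENOMES.items():
    print(f"{species:<26} {kb_per_gene(size_kb, genes):8.1f} kb per gene")
```

On these figures E. coli averages roughly one gene per thousand nucleotide pairs, whereas the human genome averages more than a hundred thousand nucleotide pairs per gene.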
New Genes Are Generated from Preexisting Genes

The raw material of evolution is the DNA sequence that already exists: there is no natural mechanism for making long stretches of new random sequence. In this sense, no gene is ever entirely new. Innovation can, however, occur in several ways (Figure 1–23):

1. Intragenic mutation: an existing gene can be modified by changes in its DNA sequence, through various types of error that occur mainly in the process of DNA replication.
2. Gene duplication: an existing gene can be duplicated so as to create a pair of initially identical genes within a single cell; these two genes may then diverge in the course of evolution.
3. Segment shuffling: two or more existing genes can be broken and rejoined to make a hybrid gene consisting of DNA segments that originally belonged to separate genes.
4. Horizontal (intercellular) transfer: a piece of DNA can be transferred from the genome of one cell to that of another—even to that of another species. This process is in contrast with the usual vertical transfer of genetic information from parent to progeny.

Each of these types of change leaves a characteristic trace in the DNA sequence of the organism, providing clear evidence that all four processes have occurred. In later chapters we discuss the underlying mechanisms, but for the present we focus on the consequences.

Figure 1–23 panel labels: ORIGINAL GENOME; GENETIC INNOVATION by (1) INTRAGENIC MUTATION (gene with mutation), (2) GENE DUPLICATION, (3) DNA SEGMENT SHUFFLING (gene A + gene B), (4) HORIZONTAL TRANSFER (organism A + organism B gives organism B with new gene).
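The four modes of innovation listed above can be mimicked with toy string operations, which makes it easier to see what trace each leaves in a sequence. This is only a sketch under strong simplifying assumptions: a gene is modeled as a plain string, a genome as a list of genes, and every function name and argument is hypothetical.

```python
from typing import List

def intragenic_mutation(gene: str, pos: int, base: str) -> str:
    """Mode 1: change a single nucleotide inside an existing gene."""
    return gene[:pos] + base + gene[pos + 1:]

def gene_duplication(genome: List[str], index: int) -> List[str]:
    """Mode 2: copy one gene, leaving two initially identical copies."""
    return genome[:index + 1] + [genome[index]] + genome[index + 1:]

def segment_shuffling(gene_a: str, gene_b: str, cut_a: int, cut_b: int) -> str:
    """Mode 3: join the start of one gene to the end of another (a hybrid gene)."""
    return gene_a[:cut_a] + gene_b[cut_b:]

def horizontal_transfer(donor: List[str], recipient: List[str], index: int) -> List[str]:
    """Mode 4: add a gene from a donor genome to a recipient genome."""
    return recipient + [donor[index]]

genome_a = ["ATGAAACCC", "ATGGGGTTT"]   # toy genome of organism A
genome_b = ["ATGCCCGGG"]                # toy genome of organism B

print(intragenic_mutation(genome_a[0], 4, "T"))
print(gene_duplication(genome_a, 0))
print(segment_shuffling(genome_a[0], genome_a[1], 5, 5))
print(horizontal_transfer(genome_a, genome_b, 1))
```

Each function returns a new value and leaves its inputs untouched, so the original and the altered genome can be compared side by side.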
Gene Duplications Give Rise to Families of Related Genes Within a Single Cell

A cell duplicates its entire genome each time it divides into two daughter cells. However, accidents occasionally result in the inappropriate duplication of just part of the genome, with retention of original and duplicate segments in a single cell. Once a gene has been duplicated in this way, one of the two gene copies is free to mutate and become specialized to perform a different function within the same cell. Repeated rounds of this process of duplication and divergence, over many millions of years, have enabled one gene to give rise to a family of genes that may all be found within a single genome. Analysis of the DNA sequence of procaryotic genomes reveals many examples of such gene families: in Bacillus subtilis, for example, 47% of the genes have one or more obvious relatives (Figure 1–24).

When genes duplicate and diverge in this way, the individuals of one species become endowed with multiple variants of a primordial gene. This evolutionary
Figure 1–23 Four modes of genetic innovation and their effects on the DNA sequence of an organism. A special form of horizontal transfer occurs when two different types of cells enter into a permanent symbiotic association. Genes from one of the cells then may be transferred to the genome of the other, as we shall see below when we discuss mitochondria and chloroplasts.
Figure 1–24 Families of evolutionarily related genes in the genome of Bacillus subtilis. The biggest family consists of 77 genes coding for varieties of ABC transporters—a class of membrane transport proteins found in all three domains of the living world. (Adapted from F. Kunst et al., Nature 390:249–256, 1997. With permission from Macmillan Publishers Ltd.) Chart segments: 2126 genes with no family relationship; 568 genes in families with 2 gene members; 273 genes in families with 3 gene members; 764 genes in families with 4–19 gene members; 283 genes in families with 38–77 gene members.
process has to be distinguished from the genetic divergence that occurs when one species of organism splits into two separate lines of descent at a branch point in the family tree—when the human line of descent became separate from that of chimpanzees, for example. There, the genes gradually become different in the course of evolution, but they are likely to continue to have corresponding functions in the two sister species. Genes that are related by descent in this way—that is, genes in two separate species that derive from the same ancestral gene in the last common ancestor of those two species—are called orthologs. Related genes that have resulted from a gene duplication event within a single genome—and are likely to have diverged in their function—are called paralogs. Genes that are related by descent in either way are called homologs, a general term used to cover both types of relationship (Figure 1–25).

The family relationships between genes can become quite complex (Figure 1–26). For example, an organism that possesses a family of paralogous genes (for example, the seven hemoglobin genes α, β, γ, δ, ε, ζ, and θ) may evolve into two separate species (such as humans and chimpanzees) each possessing the entire set of paralogs. All 14 genes are homologs, with the human hemoglobin α orthologous to the chimpanzee hemoglobin α, but paralogous to the human or chimpanzee hemoglobin β, and so on. Moreover, the vertebrate hemoglobins (the oxygen-binding proteins of blood) are homologous to the vertebrate myoglobins (the oxygen-binding proteins of muscle), as well as to more distant
Figure 1–25 Paralogous genes and orthologous genes: two types of gene homology based on different evolutionary pathways. (A) and (B) The most basic possibilities. (C) A more complex pattern of events that can occur.
(A) Ancestral organism with gene G; speciation gives two separate species, species A with gene GA and species B with gene GB. Genes GA and GB are orthologs.
(B) Ancestral organism with gene G; gene duplication and divergence give a later ancestral organism with genes G1 and G2. Genes G1 and G2 are paralogs.
(C) Early ancestral organism with gene G; gene duplication and divergence give a later ancestral organism with genes G1 and G2; speciation then gives species A with genes G1A and G2A and species B with genes G1B and G2B. All G genes are homologs. G1A is a paralog of G2A and G2B but an ortholog of G1B.
Figure 1–26 tree labels, from the root (ancestral globin): Drosophila globin; shark myoglobin, human myoglobin, chick myoglobin; shark Hb β, chick Hb β, chick Hb ε, chick Hb ρ, human Hb β, human Hb δ, human Hb ε, human Hb Aγ, human Hb Gγ; shark Hb α, human Hb θ-1, chick Hb α-A, human Hb α1, human Hb α2, chick Hb α-D, chick Hb π, human Hb ζ.
genes that code for oxygen-binding proteins in invertebrates, plants, fungi, and bacteria. From the DNA sequences, it is usually easy to recognize that two genes in different species are homologous; it is much more difficult to decide, without other information, whether they stand in the precise evolutionary relationship of orthologs.
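The naming rules for homologs can be written down as a small decision procedure for the scenario of Figure 1–25C. The sketch below assumes that the duplication happened before the speciation, so that each gene can be tagged with the duplicate lineage it descends from and the species that carries it. The `Gene` type and the `relationship` function are hypothetical names introduced here.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    lineage: str  # duplicate lineage the gene descends from, e.g. "G1"
    species: str  # species in which the gene is found, e.g. "A"

def relationship(x: Gene, y: Gene) -> str:
    """Classify two homologous genes, assuming duplication preceded speciation."""
    if x == y:
        return "same gene"
    if x.lineage != y.lineage:
        return "paralogs"   # separated by the gene duplication event
    return "orthologs"      # same lineage, separated only by speciation

g1a, g1b = Gene("G1", "A"), Gene("G1", "B")
g2a, g2b = Gene("G2", "A"), Gene("G2", "B")

print(relationship(g1a, g1b))  # G1A and G1B
print(relationship(g1a, g2a))  # G1A and G2A
print(relationship(g1a, g2b))  # G1A and G2B
```

The lineage tags are exactly the information that sequence similarity alone usually fails to supply, which is why deciding orthology is harder than recognizing homology.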
Genes Can Be Transferred Between Organisms, Both in the Laboratory and in Nature

Procaryotes also provide examples of the horizontal transfer of genes from one species of cell to another. The most obvious tell-tale signs are sequences recognizable as being derived from bacterial viruses, also called bacteriophages (Figure 1–27). Viruses are not themselves living cells but can act as vectors for gene transfer: they are small packets of genetic material that have evolved as parasites on the reproductive and biosynthetic machinery of host cells. They replicate in one cell, emerge from it with a protective wrapping, and then enter and infect another cell, which may be of the same or a different species. Often, the infected cell will be killed by the massive proliferation of virus particles inside it; but sometimes, the viral DNA, instead of directly generating these particles, may persist in its host for many cell generations as a relatively innocuous passenger, either as a separate intracellular fragment of DNA, known as a plasmid, or as a sequence inserted into the cell’s regular genome. In their travels, viruses can accidentally pick up fragments of DNA from the genome of one host cell and ferry them into another cell. Such transfers of genetic material frequently occur in procaryotes, and they can also occur between eucaryotic cells of the same species.

Horizontal transfers of genes between eucaryotic cells of different species are very rare, and they do not seem to have played a significant part in eucaryote evolution (although massive transfers from bacterial to eucaryotic genomes have occurred in the evolution of mitochondria and chloroplasts, as we discuss below). In contrast, horizontal gene transfers occur much more frequently between different species of procaryotes. Many procaryotes have a remarkable capacity to take up even nonviral DNA molecules from their surroundings and thereby capture the genetic information these molecules carry.
By this route, or by virus-mediated transfer, bacteria and archaea in the wild can acquire genes from neighboring cells relatively easily. Genes that confer resistance to an
Figure 1–26 A complex family of homologous genes. This diagram shows the pedigree of the hemoglobin (Hb), myoglobin, and globin genes of human, chick, shark, and Drosophila. The lengths of the horizontal lines represent the amount of divergence in amino acid sequence.
antibiotic or an ability to produce a toxin, for example, can be transferred from species to species and provide the recipient bacterium with a selective advantage. In this way, new and sometimes dangerous strains of bacteria have been observed to evolve in the bacterial ecosystems that inhabit hospitals or the various niches in the human body. For example, horizontal gene transfer is responsible for the spread, over the past 40 years, of penicillin-resistant strains of Neisseria gonorrhoeae, the bacterium that causes gonorrhea. On a longer time scale, the results can be even more profound; it has been estimated that at least 18% of all of the genes in the present-day genome of E. coli have been acquired by horizontal transfer from another species within the past 100 million years.
Sex Results in Horizontal Exchanges of Genetic Information Within a Species

Horizontal exchanges of genetic information are important in bacterial and archaeal evolution in today’s world, and they may have occurred even more frequently and promiscuously in the early days of life on Earth. Such early horizontal exchanges could explain the otherwise puzzling observation that the eucaryotes seem more similar to archaea in their genes for the basic information-handling processes of DNA replication, transcription, and translation, but more similar to bacteria in their genes for metabolic processes. In any case, whether horizontal gene transfer occurred most freely in the early days of life on Earth, or has continued at a steady low rate throughout evolutionary history, it has the effect of complicating the whole concept of cell ancestry, by making each cell’s genome a composite of parts derived from separate sources.

Horizontal gene transfer among procaryotes may seem a surprising process, but it has a parallel in a phenomenon familiar to us all: sex. In addition to the usual vertical transfer of genetic material from parent to offspring, sexual reproduction causes a large-scale horizontal transfer of genetic information between two initially separate cell lineages—those of the father and the mother. A key feature of sex, of course, is that the genetic exchange normally occurs only between individuals of the same species. But no matter whether they occur within a species or between species, horizontal gene transfers leave a characteristic imprint: they result in individuals who are related more closely to one set of relatives with respect to some genes, and more closely to another set of relatives with respect to others. By comparing the DNA sequences of individual human genomes, an intelligent visitor from outer space could deduce that humans reproduce sexually, even if it knew nothing about human behavior.
Sexual reproduction is widespread (although not universal), especially among eucaryotes. Even bacteria indulge from time to time in controlled sexual exchanges of DNA with other members of their own species. Natural selection has clearly favored organisms that can reproduce sexually, although evolutionary theorists dispute precisely what the selective advantage of sex is.
The Function of a Gene Can Often Be Deduced from Its Sequence

Family relationships among genes are important not just for their historical interest, but because they simplify the task of deciphering gene functions. Once the sequence of a newly discovered gene has been determined, a scientist can tap a few keys on a computer to search the entire database of known gene sequences for genes related to it. In many cases, the function of one or more of these homologs will have been already determined experimentally, and thus, since gene sequence determines gene function, one can frequently make a good guess at the function of the new gene: it is likely to be similar to that of the already-known homologs. In this way, it is possible to decipher a great deal of the biology of an organism simply by analyzing the DNA sequence of its genome and using the information we already have about the functions of genes in other organisms that have been more intensively studied.
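A database search of the kind described above can be illustrated with a deliberately crude stand-in. Real searches rely on alignment-based tools; the sketch below swaps in a much simpler score, the fraction of shared three-letter subsequences (k-mers), and ranks a tiny invented database against a query. All sequences, entry names, and function names are hypothetical.

```python
def kmers(seq: str, k: int = 3) -> set:
    """All overlapping substrings of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(a: str, b: str, k: int = 3) -> float:
    """Shared k-mers divided by total distinct k-mers (Jaccard index)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)

def rank_matches(query: str, database: dict, k: int = 3) -> list:
    """Database entries sorted from most to least similar to the query."""
    scores = [(similarity(query, seq, k), name) for name, seq in database.items()]
    return sorted(scores, reverse=True)

# Toy database: names stand for genes whose function is already known.
database = {
    "transporter_like": "ATGGCTAAAGGTCTG",
    "kinase_like":      "ATGTTTCCCGGGAAA",
    "unrelated":        "TTTTTTTTTTTTTTT",
}
query = "ATGGCTAAAGGTCTC"  # newly sequenced gene of unknown function

for score, name in rank_matches(query, database):
    print(f"{name:<18} {score:.2f}")
```

The top-ranked entry suggests a candidate function for the query, which is a guess to be tested and not a demonstration.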
Figure 1–27 The viral transfer of DNA from one cell to another. (A) An electron micrograph of particles of a bacterial virus, the T4 bacteriophage. The head of this virus contains the viral DNA; the tail contains the apparatus for injecting the DNA into a host bacterium. (B) A cross section of a bacterium with a T4 bacteriophage latched onto its surface. The large dark objects inside the bacterium are the heads of new T4 particles in course of assembly. When they are mature, the bacterium will burst open to release them. (A, courtesy of James Paulson; B, courtesy of Jonathan King and Erika Hartwig from G. Karp, Cell and Molecular Biology, 2nd ed. New York: John Wiley & Sons, 1999. With permission from John Wiley & Sons.) Scale bars: (A) 100 nm; (B) 100 nm.
More Than 200 Gene Families Are Common to All Three Primary Branches of the Tree of Life

Given the complete genome sequences of representative organisms from all three domains—archaea, bacteria, and eucaryotes—we can search systematically for homologies that span this enormous evolutionary divide. In this way we can begin to take stock of the common inheritance of all living things.

There are considerable difficulties in this enterprise. For example, individual species have often lost some of the ancestral genes; other genes have almost certainly been acquired by horizontal transfer from another species and therefore are not truly ancestral, even though shared. In fact, genome comparisons strongly suggest that both lineage-specific gene loss and horizontal gene transfer, in some cases between evolutionarily distant species, have been major factors of evolution, at least among procaryotes. Finally, in the course of 2 or 3 billion years, some genes that were initially shared will have changed beyond recognition by current methods.

Because of all these vagaries of the evolutionary process, it seems that only a small proportion of ancestral gene families have been universally retained in a recognizable form. Thus, out of 4873 protein-coding gene families defined by comparing the genomes of 50 species of bacteria, 13 archaea, and 3 unicellular eucaryotes, only 63 are truly ubiquitous (that is, represented in all the genomes analyzed). The great majority of these universal families include components of the translation and transcription systems. This is not likely to be a realistic approximation of an ancestral gene set. A better—though still crude—idea of the latter can be obtained by tallying the gene families that have representatives in multiple, but not necessarily all, species from all three major domains. Such an analysis reveals 264 ancient conserved families.
Each family can be assigned a function (at least in terms of general biochemical activity, but usually with more precision), with the largest number of shared gene families being involved in translation and in amino acid metabolism and transport (Table 1–2). This set of highly conserved gene families represents only a very rough sketch of the common inheritance of all modern life; a more precise reconstruction of the gene complement of the last universal common ancestor might be feasible with further genome sequencing and more careful comparative analysis.
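The distinction drawn above between truly ubiquitous families and families that are merely well represented in all three domains can be made concrete with a small computation. The Python sketch below is only an illustration: the genome names, the family names, the presence table, and the per-domain thresholds are all hypothetical, and the real analyses behind Table 1–2 rest on far more careful detection of homology.

```python
from collections import Counter

# Hypothetical genomes, labelled by domain (toy data, not real species).
DOMAIN_OF = {
    "bact1": "bacteria", "bact2": "bacteria", "bact3": "bacteria",
    "arch1": "archaea", "arch2": "archaea",
    "euk1": "eucaryotes", "euk2": "eucaryotes",
}

# Hypothetical presence table: gene family -> genomes in which it was found.
FAMILIES = {
    "famA": set(DOMAIN_OF),                                # found in every genome
    "famB": {"bact1", "bact3", "arch1", "arch2", "euk2"},  # lost in some lineages
    "famC": {"bact1", "bact2", "bact3"},                   # bacteria only
    "famD": {"bact2", "arch1", "euk1"},                    # too sparsely represented
}

# Assumed thresholds: minimum number of genomes required in each domain.
MIN_PER_DOMAIN = {"bacteria": 2, "archaea": 2, "eucaryotes": 1}


def is_ubiquitous(genomes):
    """True if the family is present in every genome analyzed."""
    return genomes == set(DOMAIN_OF)


def is_broadly_conserved(genomes, minimum=MIN_PER_DOMAIN):
    """True if the family meets the per-domain minimum in all three domains."""
    counts = Counter(DOMAIN_OF[g] for g in genomes)
    return all(counts[domain] >= n for domain, n in minimum.items())


ubiquitous = sorted(f for f, g in FAMILIES.items() if is_ubiquitous(g))
broad = sorted(f for f, g in FAMILIES.items() if is_broadly_conserved(g))
print("ubiquitous:", ubiquitous)
print("broadly conserved:", broad)
```

With this toy input, only famA is ubiquitous, whereas famA and famB both pass the looser test. This mirrors, in miniature, the contrast between the 63 ubiquitous families and the 264 ancient conserved families.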
Mutations Reveal the Functions of Genes
Without additional information, no amount of gazing at genome sequences will reveal the functions of genes. We may recognize that gene B is like gene A, but how do we discover the function of gene A in the first place? And even if we know the function of gene A, how do we test whether the function of gene B is truly the same as the sequence similarity suggests? How do we connect the world of abstract genetic information with the world of real living organisms? The analysis of gene functions depends on two complementary approaches: genetics and biochemistry. Genetics starts with the study of mutants: we either find or make an organism in which a gene is altered, and examine the effects on the organism’s structure and performance (Figure 1–28). Biochemistry examines the functions of molecules: we extract molecules from an organism and then study their chemical activities. By combining genetics and biochemistry and examining the chemical abnormalities in a mutant organism, it is possible to find those molecules whose production depends on a given gene. At the same time, studies of the performance of the mutant organism show us what role those molecules have in the operation of the organism as a whole. Thus, genetics and biochemistry together provide a way to relate genes, molecules, and the structure and function of the organism. In recent years, DNA sequence information and the powerful tools of molecular biology have allowed rapid progress. From sequence comparisons, we can often identify particular subregions within a gene that have been preserved nearly unchanged over the course of evolution. These conserved subregions are likely to be the most important parts of the gene in terms of function. We can test their individual contributions to the activity of the gene product by creating in
Chapter 1: Cells and Genomes
Table 1–2 The Numbers of Gene Families, Classified by Function, That Are Common to All Three Domains of the Living World

GENE FAMILY FUNCTION                                                        NUMBER OF “UNIVERSAL” FAMILIES
Information processing
  Translation                                                               63
  Transcription                                                             7
  Replication, recombination, and repair                                    13
Cellular processes and signaling
  Cell cycle control, mitosis, and meiosis                                  2
  Defense mechanisms                                                        3
  Signal transduction mechanisms                                            1
  Cell wall/membrane biogenesis                                             2
  Intracellular trafficking and secretion                                   4
  Post-translational modification, protein turnover, chaperones             8
Metabolism
  Energy production and conversion                                          19
  Carbohydrate transport and metabolism                                     16
  Amino acid transport and metabolism                                       43
  Nucleotide transport and metabolism                                       15
  Coenzyme transport and metabolism                                         22
  Lipid transport and metabolism                                            9
  Inorganic ion transport and metabolism                                    8
  Secondary metabolite biosynthesis, transport, and catabolism              5
Poorly characterized
  General biochemical function predicted; specific biological role unknown  24
For the purpose of this analysis, gene families are defined as “universal” if they are represented in the genomes of at least two diverse archaea (Archaeoglobus fulgidus and Aeropyrum pernix), two evolutionarily distant bacteria (Escherichia coli and Bacillus subtilis) and one eucaryote (yeast, Saccharomyces cerevisiae). (Data from R.L. Tatusov, E.V. Koonin and D.J. Lipman, Science 278:631–637, 1997, with permission from AAAS; R.L. Tatusov et al., BMC Bioinformatics 4:41, 2003, with permission from BioMed Central; and the COGs database at the US National Library of Medicine.)
the laboratory mutations of specific sites within the gene, or by constructing artificial hybrid genes that combine part of one gene with part of another. Organisms can be engineered to make either the RNA or the protein specified by the gene in large quantities to facilitate biochemical analysis. Specialists in molecular structure can determine the three-dimensional conformation of the gene product, revealing the exact position of every atom in it. Biochemists can determine how each of the parts of the genetically specified molecule contributes to its chemical behavior. Cell biologists can analyze the behavior of cells that are engineered to express a mutant version of the gene. There is, however, no one simple recipe for discovering a gene’s function, and no simple standard universal format for describing it. We may discover, for example, that the product of a given gene catalyzes a certain chemical reaction, and yet have no idea how or why that reaction is important to the organism. The functional characterization of each new family of gene products, unlike the description of the gene sequences, presents a fresh challenge to the biologist’s ingenuity. Moreover, we never fully understand the function of a gene until we learn its role in the life of the organism as a whole. To make ultimate sense of gene functions, therefore, we have to study whole organisms, not just molecules or cells.
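The idea of picking out conserved subregions by sequence comparison can be sketched in a few lines of Python. The example assumes that the sequences have already been aligned to equal length, and it applies the crudest possible criterion (every sequence identical at a position). The sequences, the function name conserved_runs, and the min_len threshold are hypothetical choices made for illustration, not a description of any particular alignment tool.

```python
def conserved_runs(aligned, min_len=3):
    """Return half-open (start, end) column ranges where all sequences agree.

    `aligned` is a list of equal-length strings; '-' marks an alignment gap.
    Only runs of at least `min_len` identical columns are reported.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")

    runs, start = [], None
    # Iterate one step past the end so that a run reaching the last column is closed.
    for i in range(length + 1):
        same = (
            i < length
            and len({seq[i] for seq in aligned}) == 1
            and aligned[0][i] != "-"
        )
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Hypothetical, pre-aligned toy sequences from three imaginary species.
ALIGNED = [
    "ATGGCCAAGTTTGA",
    "ATGGCTAAGTTCGA",
    "ATGGCAAAGTTAGA",
]
print(conserved_runs(ALIGNED))  # [(0, 5), (6, 11)]
```

Columns 5 and 11 vary among the three toy sequences, so they break the alignment into two conserved runs; the final two identical columns are too short to be reported at the chosen threshold. Candidate regions found in this way would still need to be tested experimentally, for example by the site-specific mutations described above.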
Molecular Biologists Have Focused a Spotlight on E. coli
Because living organisms are so complex, the more we learn about any particular species, the more attractive it becomes as an object for further study. Each
5 µm
Figure 1–28 A mutant phenotype reflecting the function of a gene. A normal yeast (of the species Schizosaccharomyces pombe) is compared with a mutant in which a change in a single gene has converted the cell from a cigar shape (left) to a T shape (right). The mutant gene therefore has a function in the control of cell shape. But how, in molecular terms, does the gene product perform that function? That is a harder question, and needs biochemical analysis to answer it. (Courtesy of Kenneth Sawin and Paul Nurse.)
Figure 1–29 The genome of E. coli. (A) A cluster of E. coli cells. (B) A diagram of the genome of E. coli strain K-12. The diagram is circular because the DNA of E. coli, like that of other procaryotes, forms a single, closed loop. Protein-coding genes are shown as yellow or orange bars, depending on the DNA strand from which they are transcribed; genes encoding only RNA molecules are indicated by green arrows. Some genes are transcribed from one strand of the DNA double helix (in a clockwise direction in this diagram), others from the other strand (counterclockwise). (A, courtesy of Dr. Tony Brain and David Parker/Photo Researchers; B, adapted from F.R. Blattner et al., Science 277:1453–1462, 1997. With permission from AAAS.)
[Figure 1–29 labels: panels (A) and (B); origin of replication; terminus of replication; Escherichia coli K-12, 4,639,221 nucleotide pairs]
discovery raises new questions and provides new tools with which to tackle general questions in the context of the chosen organism. For this reason, large communities of biologists have become dedicated to studying different aspects of the same model organism. In the enormously varied world of bacteria, the spotlight of molecular biology has for a long time focused intensely on just one species: Escherichia coli, or E. coli (see Figures 1–17 and 1–18). This small, rod-shaped bacterial cell normally lives in the gut of humans and other vertebrates, but it can be grown easily in a simple nutrient broth in a culture bottle. It adapts to variable chemical conditions and reproduces rapidly, and it can evolve by mutation and selection at a remarkable speed. As with other bacteria, different strains of E. coli, though classified as members of a single species, differ genetically to a much greater degree than do different varieties of a sexually reproducing organism such as a plant or animal. One E. coli strain may possess many hundreds of genes that are absent from another, and the two strains could have as little as 50% of their genes in common. The standard laboratory strain E. coli K-12 has a genome of approximately 4.6 million nucleotide pairs, contained in a single circular molecule of DNA, coding for about 4300 different kinds of proteins (Figure 1–29). In molecular terms, we know more about E. coli than about any other living organism. Most of our understanding of the fundamental mechanisms of life— for example, how cells replicate their DNA, or how they decode the instructions represented in the DNA to direct the synthesis of specific proteins—has come from studies of E. coli. The basic genetic mechanisms have turned out to be highly conserved throughout evolution: these mechanisms are therefore essentially the same in our own cells as in E. coli.
Summary
Procaryotes (cells without a distinct nucleus) are biochemically the most diverse organisms and include species that can obtain all their energy and nutrients from inorganic chemical sources, such as the reactive mixtures of minerals released at hydrothermal vents on the ocean floor—the sort of diet that may have nourished the first living cells 3.5 billion years ago. DNA sequence comparisons reveal the family relationships of living organisms and show that the procaryotes fall into two groups that diverged early in the course of evolution: the bacteria (or eubacteria) and the archaea. Together with the eucaryotes (cells with a membrane-enclosed nucleus), these constitute the three primary branches of the tree of life. Most bacteria and archaea are small unicellular organisms with compact genomes comprising 1000–6000 genes. Many of the genes within a single organism show strong family resemblances in their DNA sequences, implying that they originated from the same ancestral gene through gene duplication and divergence. Family resemblances (homologies) are also clear when gene sequences are compared between different species, and more than 200 gene families have been so highly conserved that they can be recognized as common to most species from all three domains of the living world. Thus, given the DNA sequence of a newly discovered gene, it is often possible to deduce the gene’s function from the known function of a homologous gene in an intensively studied model organism, such as the bacterium E. coli.
GENETIC INFORMATION IN EUCARYOTES
Eucaryotic cells, in general, are bigger and more elaborate than procaryotic cells, and their genomes are bigger and more elaborate, too. The greater size is accompanied by radical differences in cell structure and function. Moreover, many classes of eucaryotic cells form multicellular organisms that attain levels of complexity unmatched by any procaryote. Because they are so complex, eucaryotes confront molecular biologists with a special set of challenges, which will concern us in the rest of this book. Increasingly, biologists meet these challenges through the analysis and manipulation of the genetic information within cells and organisms. It is therefore important at the outset to know something of the special features of the eucaryotic genome. We begin by briefly discussing how eucaryotic cells are organized, how this reflects their way of life, and how their genomes differ from those of procaryotes. This leads us to an outline of the strategy by which molecular biologists, by exploiting genetic information, are attempting to discover how eucaryotic organisms work.
Eucaryotic Cells May Have Originated as Predators
By definition, eucaryotic cells keep their DNA in an internal compartment called the nucleus. The nuclear envelope, a double layer of membrane, surrounds the nucleus and separates the DNA from the cytoplasm. Eucaryotes also have other features that set them apart from procaryotes (Figure 1–30). Their cells are, typically, 10 times bigger in linear dimension, and 1000 times larger in volume. They have a cytoskeleton—a system of protein filaments crisscrossing the cytoplasm and forming, together with the many proteins that attach to them, a system of girders, ropes, and motors that gives the cell mechanical strength, controls its shape, and drives and guides its movements. The nuclear envelope is only one part of a set of internal membranes, each structurally similar to the plasma membrane and enclosing different types of spaces inside the cell, many of them involved in digestion and secretion. Lacking the tough cell wall of most bacteria, animal cells and the free-living eucaryotic cells called protozoa can change their shape rapidly and engulf other cells and small objects by phagocytosis (Figure 1–31). It is still a mystery how all these properties evolved, and in what sequence. One plausible view, however, is that they are all reflections of the way of life of a
[Figure 1–30 labels: microtubule; centrosome with pair of centrioles; extracellular matrix; chromatin (DNA); nuclear pore; nuclear envelope; vesicles; lysosome; actin filaments; nucleolus; peroxisome; ribosomes in cytosol; Golgi apparatus; intermediate filaments; plasma membrane; nucleus. Scale bar: 5 µm]
primordial eucaryotic cell that was a predator, living by capturing other cells and eating them (Figure 1–32). Such a way of life requires a large cell with a flexible plasma membrane, as well as an elaborate cytoskeleton to support and move this membrane. It may also require that the cell’s long, fragile DNA molecules be sequestered in a separate nuclear compartment, to protect the genome from damage by the movements of the cytoskeleton.
Modern Eucaryotic Cells Evolved from a Symbiosis
[Figure 1–30 labels, continued: endoplasmic reticulum; mitochondrion]
Figure 1–30 The major features of eucaryotic cells. The drawing depicts a typical animal cell, but almost all the same components are found in plants and fungi and in single-celled eucaryotes such as yeasts and protozoa. Plant cells contain chloroplasts in addition to the components shown here, and their plasma membrane is surrounded by a tough external wall formed of cellulose.
A predatory way of life helps to explain another feature of eucaryotic cells. Almost all such cells contain mitochondria (Figure 1–33). These small bodies in the cytoplasm, enclosed by a double layer of membrane, take up oxygen and harness energy from the oxidation of food molecules—such as sugars—to produce most of the ATP that powers the cell’s activities. Mitochondria are similar in size to small bacteria, and, like bacteria, they have their own genome in the form of a circular DNA molecule, their own ribosomes that differ from those elsewhere in the eucaryotic cell, and their own transfer RNAs. It is now generally accepted that mitochondria originated from free-living oxygen-metabolizing (aerobic) bacteria that were engulfed by an ancestral eucaryotic cell that could otherwise make no such use of oxygen (that is, was anaerobic). Escaping digestion, these bacteria evolved in symbiosis with the engulfing cell and its progeny,
10 µm
Figure 1–31 Phagocytosis. This series of stills from a movie shows a human white blood cell (a neutrophil) engulfing a red blood cell (artificially colored red) that has been treated with antibody. (Courtesy of Stephen E. Malawista and Anne de Boisfleury Chevance.)
Figure 1–32 A single-celled eucaryote that eats other cells. (A) Didinium is a carnivorous protozoan, belonging to the group known as ciliates. It has a globular body, about 150 µm in diameter, encircled by two fringes of cilia—sinuous, whiplike appendages that beat continually; its front end is flattened except for a single protrusion, rather like a snout. (B) Didinium normally swims around in the water at high speed by means of the synchronous beating of its cilia. When it encounters a suitable prey, usually another type of protozoan, it releases numerous small paralyzing darts from its snout region. Then, the Didinium attaches to and devours the other cell by phagocytosis, inverting like a hollow ball to engulf its victim, which is almost as large as itself. (Courtesy of D. Barlow.)
[Figure 1–32 panels (A) and (B); scale bar: 100 µm]
receiving shelter and nourishment in return for the power generation they performed for their hosts (Figure 1–34). This partnership between a primitive anaerobic eucaryotic predator cell and an aerobic bacterial cell is thought to have been established about 1.5 billion years ago, when the Earth’s atmosphere first became rich in oxygen.
[Figure 1–33 panels (A), (B), and (C); scale bar in (A): 100 nm]
Figure 1–33 A mitochondrion. (A) A cross section, as seen in the electron microscope. (B) A drawing of a mitochondrion with part of it cut away to show the three-dimensional structure. (C) A schematic eucaryotic cell, with the interior space of a mitochondrion, containing the mitochondrial DNA and ribosomes, colored. Note the smooth outer membrane and the convoluted inner membrane, which houses the proteins that generate ATP from the oxidation of food molecules. (A, courtesy of Daniel S. Friend.)
[Figure 1–34 labels: ancestral eucaryotic cell; internal membranes; early eucaryotic cell; nucleus]
Figure 1–34 The origin of mitochondria. An ancestral eucaryotic cell is thought to have engulfed the bacterial ancestor of mitochondria, initiating a symbiotic relationship.
[Figure 1–34 labels, continued: mitochondria with double membrane; bacterium]
Many eucaryotic cells—specifically, those of plants and algae—also contain another class of small membrane-enclosed organelles somewhat similar to mitochondria—the chloroplasts (Figure 1–35). Chloroplasts perform photosynthesis, using the energy of sunlight to synthesize carbohydrates from atmospheric carbon dioxide and water, and deliver the products to the host cell as food. Like mitochondria, chloroplasts have their own genome and almost certainly originated as symbiotic photosynthetic bacteria, acquired by cells that already possessed mitochondria (Figure 1–36). A eucaryotic cell equipped with chloroplasts has no need to chase after other cells as prey; it is nourished by the captive chloroplasts it has inherited from its ancestors. Correspondingly, plant cells, although they possess the cytoskeletal equipment for movement, have lost the ability to change shape rapidly and to engulf other cells by phagocytosis. Instead, they create around themselves a tough, protective cell wall. If the ancestral eucaryote was indeed a predator on other organisms, we can view plant cells as eucaryotes that have made the transition from hunting to farming. Fungi represent yet another eucaryotic way of life. Fungal cells, like animal cells, possess mitochondria but not chloroplasts; but in contrast with animal cells and protozoa, they have a tough outer wall that limits their ability to move
[Figure 1–35 labels: chloroplasts; chlorophyll-containing membranes; inner membrane; outer membrane. Panels (A) and (B); scale bar: 10 µm]
Figure 1–35 Chloroplasts. These organelles capture the energy of sunlight in plant cells and some single-celled eucaryotes. (A) A single cell isolated from a leaf of a flowering plant, seen in the light microscope, showing the green chloroplasts. (B) A drawing of one of the chloroplasts, showing the highly folded system of internal membranes containing the chlorophyll molecules by which light is absorbed. (A, courtesy of Preeti Dahiya.)
[Figure 1–36 labels: early eucaryotic cell; photosynthetic bacterium; early eucaryotic cell capable of photosynthesis; chloroplasts with double membrane]
rapidly or to swallow up other cells. Fungi, it seems, have turned from hunters into scavengers: other cells secrete nutrient molecules or release them upon death, and fungi feed on these leavings—performing whatever digestion is necessary extracellularly, by secreting digestive enzymes to the exterior.
Eucaryotes Have Hybrid Genomes
The genetic information of eucaryotic cells has a hybrid origin—from the ancestral anaerobic eucaryote, and from the bacteria that it adopted as symbionts. Most of this information is stored in the nucleus, but a small amount remains inside the mitochondria and, for plant and algal cells, in the chloroplasts. The mitochondrial DNA and the chloroplast DNA can be separated from the nuclear DNA and individually analyzed and sequenced. The mitochondrial and chloroplast genomes are found to be degenerate, cut-down versions of the corresponding bacterial genomes, lacking genes for many essential functions. In a human cell, for example, the mitochondrial genome consists of only 16,569 nucleotide pairs, and codes for only 13 proteins, two ribosomal RNA components, and 22 transfer RNAs. The genes that are missing from the mitochondria and chloroplasts have not all been lost; instead, many of them have been somehow moved from the symbiont genome into the DNA of the host cell nucleus. The nuclear DNA of humans contains many genes coding for proteins that serve essential functions inside the mitochondria; in plants, the nuclear DNA also contains many genes specifying proteins required in chloroplasts.
Eucaryotic Genomes Are Big
Natural selection has evidently favored mitochondria with small genomes, just as it has favored bacteria with small genomes. By contrast, the nuclear genomes of most eucaryotes seem to have been free to enlarge. Perhaps the eucaryotic way of life has made large size an advantage: predators typically need to be bigger than their prey, and cell size generally increases in proportion to genome size. Perhaps enlargement of the genome has been driven by the accumulation of parasitic transposable elements (discussed in Chapter 5)—“selfish” segments of DNA that can insert copies of themselves at multiple sites in the genome. Whatever the explanation, the genomes of most eucaryotes are orders of magnitude larger than those of bacteria and archaea (Figure 1–37). And the freedom to be extravagant with DNA has had profound implications. Eucaryotes not only have more genes than procaryotes; they also have vastly more DNA that does not code for protein or for any other functional product molecule. The human genome contains 1000 times as many nucleotide pairs as the genome of a typical bacterium, 20 times as many genes, and about 10,000
Figure 1–36 The origin of chloroplasts. An early eucaryotic cell, already possessing mitochondria, engulfed a photosynthetic bacterium (a cyanobacterium) and retained it in symbiosis. All present-day chloroplasts are thought to trace their ancestry back to a single species of cyanobacterium that was adopted as an internal symbiont (an endosymbiont) over a billion years ago.
Figure 1–37 Genome sizes compared. Genome size is measured in nucleotide pairs of DNA per haploid genome, that is, per single copy of the genome. (The cells of sexually reproducing organisms such as ourselves are generally diploid: they contain two copies of the genome, one inherited from the mother, the other from the father.) Closely related organisms can vary widely in the quantity of DNA in their genomes, even though they contain similar numbers of functionally distinct genes. (Data from W.H. Li, Molecular Evolution, pp. 380–383. Sunderland, MA: Sinauer, 1997.)
[Figure 1–37 plot. Horizontal axis: number of nucleotide pairs per haploid genome, on a logarithmic scale from 10^5 to 10^12. Groups and labeled examples: BACTERIA AND ARCHAEA (Mycoplasma, E. coli); FUNGI (yeast); PROTISTS (Amoeba); PLANTS (Arabidopsis, bean, lily, fern); INSECTS (Drosophila); MOLLUSKS; CARTILAGINOUS FISH (shark); BONY FISH (Fugu, zebrafish); AMPHIBIANS (newt); REPTILES; BIRDS; MAMMALS (human)]
times as much noncoding DNA (~98.5% of the genome for a human is noncoding, as opposed to 11% of the genome for the bacterium E. coli).
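The three ratios quoted above hang together arithmetically, as a short calculation shows. The Python sketch below uses only the figures given in the text, with one labeled assumption: E. coli, with 11% noncoding DNA, is taken to stand in for the "typical bacterium" of the comparison.

```python
# Figures quoted in the text.
SIZE_RATIO = 1000        # human vs. typical bacterial genome, in nucleotide pairs
GENE_RATIO = 20          # human vs. bacterial gene number
HUMAN_NONCODING = 0.985  # fraction of the human genome that is noncoding
ECOLI_NONCODING = 0.11   # assumption: E. coli represents the typical bacterium

# Ratio of the absolute amounts of noncoding DNA in the two genomes.
noncoding_ratio = SIZE_RATIO * HUMAN_NONCODING / ECOLI_NONCODING

# Ratio of the absolute amounts of coding DNA, for comparison with GENE_RATIO.
coding_ratio = SIZE_RATIO * (1 - HUMAN_NONCODING) / (1 - ECOLI_NONCODING)

print(f"noncoding DNA ratio: about {noncoding_ratio:.0f}")
print(f"coding DNA ratio:    about {coding_ratio:.1f}")
```

Under this assumption the noncoding ratio comes out at roughly 9000, in line with the rounded figure of about 10,000, while the amount of coding DNA differs by a factor of only about 17, the same order as the 20-fold difference in gene number.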
Eucaryotic Genomes Are Rich in Regulatory DNA
Much of our noncoding DNA is almost certainly dispensable junk, retained like a mass of old papers because, when there is little pressure to keep an archive small, it is easier to retain everything than to sort out the valuable information and discard the rest. Certain exceptional eucaryotic species, such as the puffer fish (Figure 1–38), bear witness to the profligacy of their relatives; they have somehow managed to rid themselves of large quantities of noncoding DNA. Yet they appear similar in structure, behavior, and fitness to related species that have vastly more such DNA. Even in compact eucaryotic genomes such as that of puffer fish, there is more noncoding DNA than coding DNA, and at least some of the noncoding DNA certainly has important functions. In particular, it regulates the expression of adjacent genes. With this regulatory DNA, eucaryotes have evolved distinctive ways of controlling when and where a gene is brought into play. This sophisticated gene regulation is crucial for the formation of complex multicellular organisms.
The Genome Defines the Program of Multicellular Development
The cells in an individual animal or plant are extraordinarily varied. Fat cells, skin cells, bone cells, nerve cells—they seem as dissimilar as any cells could be. Yet all these cell types are the descendants of a single fertilized egg cell, and all (with minor exceptions) contain identical copies of the genome of the species. The differences result from the way in which the cells make selective use of their genetic instructions according to the cues they get from their surroundings in the developing embryo. The DNA is not just a shopping list specifying the molecules that every cell must have, and the cell is not an assembly of all the items on the list. Rather, the cell behaves as a multipurpose machine, with sensors to receive environmental signals and with highly developed abilities to call different sets of genes into action according to the sequences of signals to which the cell has been exposed. The genome in each cell is big enough to accommodate the information that specifies an entire multicellular organism, but in any individual cell only part of that information is used. A large fraction of the genes in the eucaryotic genome code for proteins that regulate the activities of other genes. Most of these gene regulatory proteins act by
Figure 1–38 The puffer fish (Fugu rubripes). This organism has a genome size of 400 million nucleotide pairs— about one-quarter as much as a zebrafish, for example, even though the two species of fish have similar numbers of genes. (From a woodcut by Hiroshige, courtesy of Arts and Designs of Japan.)
[Figure 1–39 labels: receptor protein in cell membrane detects environmental signal; gene-regulatory protein is activated...; ...and binds to regulatory DNA...; ...provoking activation of a gene to produce another protein...]
Figure 1–39 Controlling gene readout by environmental signals. Regulatory DNA allows gene expression to be controlled by regulatory proteins, which are in turn the products of other genes. This diagram shows how a cell’s gene expression is adjusted according to a signal from the cell’s environment. The initial effect of the signal is to activate a regulatory protein already present in the cell; the signal may, for example, trigger the attachment of a phosphate group to the regulatory protein, altering its chemical properties.
[Figure 1–39 labels, continued: ...that binds to other regulatory regions...; protein-coding region; regulatory region; ...to produce yet more proteins, including some additional gene-regulatory proteins]
binding, directly or indirectly, to the regulatory DNA adjacent to the genes that are to be controlled (Figure 1–39), or by interfering with the abilities of other proteins to do so. The expanded genome of eucaryotes therefore not only specifies the hardware of the cell, but also stores the software that controls how that hardware is used (Figure 1–40). Cells do not just passively receive signals; rather, they actively exchange signals with their neighbors. Thus, in a developing multicellular organism, the same control system governs each cell, but with different consequences depending on the messages exchanged. The outcome, astonishingly, is a precisely patterned array of cells in different states, each displaying a character appropriate to its position in the multicellular structure.
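The way in which one control system can yield different outcomes in different cells can be caricatured as a walk through a network of regulatory connections. In the Python sketch below, every cell carries the same wiring (the same genome), and the set of genes switched on depends only on which signal arrives. All gene, protein, and signal names are hypothetical, and the model deliberately ignores repression, protein concentrations, and timing; it treats every regulatory link as a simple activation.

```python
from collections import deque

# Hypothetical wiring, identical in every cell: each regulatory protein is
# mapped to the genes whose regulatory DNA it binds and activates.
REGULATES = {
    "regA": ["geneB", "geneC"],
    "geneB": ["geneD"],  # the product of geneB is itself a regulatory protein
    "geneC": [],
    "geneD": [],
    "regX": ["geneY"],
    "geneY": [],
}

# Hypothetical: the pre-existing regulatory protein that each signal activates.
SIGNAL_ACTIVATES = {"signal1": "regA", "signal2": "regX"}


def genes_switched_on(signal):
    """Return the set of genes activated, directly or indirectly, by a signal."""
    start = SIGNAL_ACTIVATES[signal]
    expressed, seen = set(), {start}
    queue = deque([start])
    while queue:
        protein = queue.popleft()
        for gene in REGULATES.get(protein, []):
            if gene not in seen:
                seen.add(gene)
                expressed.add(gene)
                queue.append(gene)  # its product may regulate further genes
    return expressed


print(sorted(genes_switched_on("signal1")))  # ['geneB', 'geneC', 'geneD']
print(sorted(genes_switched_on("signal2")))  # ['geneY']
```

Two cells with identical wiring thus end up expressing different sets of genes simply because they received different signals, which is the essential point of the hardware-and-software analogy.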
Many Eucaryotes Live as Solitary Cells: the Protists
Many species of eucaryotic cells lead a solitary life—some as hunters (the protozoa), some as photosynthesizers (the unicellular algae), some as scavengers (the unicellular fungi, or yeasts). Figure 1–41 conveys something of the variety of forms of these single-celled eucaryotes, or protists. The anatomy of protozoa,
Figure 1–40 Genetic control of the program of multicellular development. The role of a regulatory gene is demonstrated in the snapdragon Antirrhinum. In this example, a mutation in a single gene coding for a regulatory protein causes leafy shoots to develop in place of flowers: because a regulatory protein has been changed, the cells adopt characters that would be appropriate to a different location in the normal plant. The mutant is on the left, the normal plant on the right. (Courtesy of Enrico Coen and Rosemary Carpenter.)
especially, is often elaborate and includes such structures as sensory bristles, photoreceptors, sinuously beating cilia, leglike appendages, mouth parts, stinging darts, and musclelike contractile bundles. Although they are single cells, protozoa can be as intricate, as versatile, and as complex in their behavior as many multicellular organisms (see Figure 1–32). In terms of their ancestry and DNA sequences, protists are far more diverse than the multicellular animals, plants, and fungi, which arose as three comparatively late branches of the eucaryotic pedigree (see Figure 1–21). As with procaryotes, humans have tended to neglect the protists because they are microscopic. Only now, with the help of genome analysis, are we beginning to understand their positions in the tree of life, and to put into context the glimpses these strange creatures offer us of our distant evolutionary past.
A Yeast Serves as a Minimal Model Eucaryote
The molecular and genetic complexity of eucaryotes is daunting. Even more than for procaryotes, biologists need to concentrate their limited resources on a few selected model organisms to fathom this complexity. To analyze the internal workings of the eucaryotic cell, without the additional problems of multicellular development, it makes sense to use a species that is unicellular and as simple as possible. The popular choice for this role of minimal model eucaryote has been the yeast Saccharomyces cerevisiae (Figure 1–42)—the same species that is used by brewers of beer and bakers of bread. S. cerevisiae is a small, single-celled member of the kingdom of fungi and thus, according to modern views, at least as closely related to animals as it is to plants. It is robust and easy to grow in a simple nutrient medium. Like other fungi, it has a tough cell wall, is relatively immobile, and possesses mitochondria but not chloroplasts. When nutrients are plentiful, it grows and divides almost as
Figure 1–41 An assortment of protists: a small sample of an extremely diverse class of organisms. The drawings are done to different scales, but in each case the scale bar represents 10 µm. The organisms in (A), (B), (E), (F), and (I) are ciliates; (C) is a euglenoid; (D) is an amoeba; (G) is a dinoflagellate; (H) is a heliozoan. (From M.A. Sleigh, Biology of Protozoa. Cambridge, UK: Cambridge University Press, 1973.)
[Figure 1–42 labels: nucleus; cell wall]
Figure 1–42 The yeast Saccharomyces cerevisiae. (A) A scanning electron micrograph of a cluster of the cells. This species is also known as budding yeast; it proliferates by forming a protrusion or bud that enlarges and then separates from the rest of the original cell. Many cells with buds are visible in this micrograph. (B) A transmission electron micrograph of a cross section of a yeast cell, showing its nucleus, mitochondrion, and thick cell wall. (A, courtesy of Ira Herskowitz and Eric Schabatach.)
[Figure 1–42 labels, continued: mitochondrion. Panels (A) and (B); scale bars: 10 µm in (A); 2 µm]
rapidly as a bacterium. It can reproduce either vegetatively (that is, by simple cell division), or sexually: two yeast cells that are haploid (possessing a single copy of the genome) can fuse to create a cell that is diploid (containing a double genome); and the diploid cell can undergo meiosis (a reduction division) to produce cells that are once again haploid (Figure 1–43). In contrast with higher plants and animals, the yeast can divide indefinitely in either the haploid or the diploid state, and the process leading from the one state to the other can be induced at will by changing the growth conditions. In addition to these features, the yeast has a further property that makes it a convenient organism for genetic studies: its genome, by eucaryotic standards, is exceptionally small. Nevertheless, it suffices for all the basic tasks that every eucaryotic cell must perform. As we shall see later in this book, studies on yeasts (using both S. cerevisiae and other species) have provided a key to many crucial processes, including the eucaryotic cell-division cycle—the critical chain of events by which the nucleus and all the other components of a cell are duplicated and parceled out to create two daughter cells from one. The control system that governs this process has been so well conserved over the course of evolution that many of its components can function interchangeably in yeast and human cells: if a mutant yeast lacking an essential yeast cell-division-cycle gene is supplied with a copy of the homologous cell-division-cycle gene from a human, the yeast is cured of its defect and becomes able to divide normally.
The Expression Levels of All The Genes of An Organism Can Be Monitored Simultaneously
The complete genome sequence of S. cerevisiae, determined in 1997, consists of approximately 13,117,000 nucleotide pairs, including the small contribution (78,520 nucleotide pairs) of the mitochondrial DNA. This total is only about 2.5 times as much DNA as there is in E. coli, and it codes for only 1.5 times as many distinct proteins (about 6300 in all). The way of life of S. cerevisiae is similar in many ways to that of a bacterium, and it seems that this yeast has likewise been subject to selection pressures that have kept its genome compact.
Knowledge of the complete genome sequence of any organism—be it a yeast or a human—opens up new perspectives on the workings of the cell: things that once seemed impossibly complex now seem within our grasp. Using techniques to be described in Chapter 8, it is now possible, for example, to monitor, simultaneously, the amount of mRNA transcript that is produced from every gene in the yeast genome under any chosen conditions, and to see how this whole pattern of gene activity changes when conditions change. The analysis can be repeated with mRNA prepared from mutants lacking a chosen gene—any gene that we care to test. In principle, this approach provides a way to reveal the entire system of control relationships that govern gene expression—not only in yeast cells, but in any organism whose genome sequence is known.
Figure 1–43 The reproductive cycles of the yeast S. cerevisiae. Depending on environmental conditions and on details of the genotype, cells of this species can exist in either a diploid (2n) state, with a double chromosome set, or a haploid (n) state, with a single chromosome set. The diploid form can either proliferate by ordinary cell-division cycles or undergo meiosis to produce haploid cells. The haploid form can either proliferate by ordinary cell-division cycles or undergo sexual fusion with another haploid cell to become diploid. Meiosis is triggered by starvation and gives rise to spores—haploid cells in a dormant state, resistant to harsh environmental conditions. Diagram labels (BUDDING YEAST LIFE CYCLE): proliferation of diploid cells (2n); meiosis and sporulation (triggered by starvation); spores hatch (n); mating (usually immediately after spores hatch); proliferation of haploid cells (n).
Figure 1–44 color key: DNA/RNA/protein biosynthesis; cell cycle; environmental response; developmental processes; metabolism.
Figure 1–44 The network of interactions between gene regulatory proteins and the genes that code for them in a yeast cell. Results are shown for 106 out of the total of 141 gene regulatory proteins in Saccharomyces cerevisiae. Each protein in the set was tested for its ability to bind to the regulatory DNA of each of the genes coding for this set of proteins. In the diagram, the genes are arranged in a circle, and an arrow pointing from gene A to gene B means that the protein encoded by A binds to the regulatory DNA of B, and therefore presumably regulates the expression of B. Small circles with arrowheads indicate genes whose products directly regulate their own expression. Genes governing different aspects of cell behavior are shown in different colors. For a multicellular plant or animal, the number of gene regulatory proteins is about 10 times greater, and the amount of regulatory DNA perhaps 100 times greater, so that the corresponding diagram would be vastly more complex. (From T.I. Lee et al., Science 298:799–804, 2002. With permission from AAAS.)
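The binding results summarized in Figure 1–44 amount to a directed graph: an edge from A to B records that the protein encoded by A binds the regulatory DNA of B. The Python sketch below shows one possible way to store such a network and to pick out self-regulating genes. The gene names (geneA to geneD) and the edges are invented for illustration and are not taken from the data of Lee et al.

```python
# Hypothetical binding results: regulator -> set of genes whose regulatory
# DNA its protein binds. Names and edges are invented for illustration.
binding = {
    "geneA": {"geneA", "geneB"},
    "geneB": {"geneC"},
    "geneC": {"geneA", "geneC", "geneD"},
    "geneD": set(),
}


def self_regulators(network):
    """Genes whose protein binds their own regulatory DNA (self-loops)."""
    return sorted(g for g, targets in network.items() if g in targets)


def incoming_counts(network):
    """Number of distinct regulators that bind each gene's regulatory DNA."""
    counts = {g: 0 for g in network}
    for targets in network.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts
```

In this toy network, `self_regulators(binding)` picks out geneA and geneC, the counterparts of the small circles with arrowheads in the figure, and `incoming_counts(binding)` reports how many regulators converge on each gene.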
To Make Sense of Cells, We Need Mathematics, Computers, and Quantitative Information
Through methods such as these, exploiting our knowledge of complete genome sequences, we can list the genes and proteins in a cell and begin to depict the web of interactions between them (Figure 1–44). But how are we to turn all this information into an understanding of how cells work? Even for a single cell type belonging to a single species of organism, the current deluge of data seems overwhelming. The sort of informal reasoning on which biologists usually rely seems totally inadequate in the face of such complexity.
In fact, the difficulty is more than just a matter of information overload. Biological systems are, for example, full of feedback loops, and the behavior of even the simplest of systems with feedback is remarkably difficult to predict by intuition alone (Figure 1–45); small changes in parameters can cause radical changes in outcome. To go from a circuit diagram to a prediction of the behavior of the system, we need detailed quantitative information, and to draw deductions from that information we need mathematics and computers.
These tools for quantitative reasoning are essential, but they are not all-powerful. You might think that, knowing how each protein influences each other protein, and how the expression of each gene is regulated by the products of others, we should soon be able to calculate how the cell as a whole will behave, just as an astronomer can calculate the orbits of the planets, or a chemical engineer can calculate the flows through a chemical plant. But any attempt to perform this feat for an entire living cell rapidly reveals the limits of our present state of knowledge. The information we have, plentiful as it is, is full of gaps and uncertainties. Moreover, it is largely qualitative rather than quantitative. Most often, cell biologists studying the cell’s control systems sum up their knowledge in simple schematic diagrams—this book is full of them—rather than in numbers, graphs, and differential equations.
To progress from qualitative descriptions and intuitive reasoning to quantitative descriptions and mathematical deduction is one of the biggest challenges for contemporary cell biology. So far, the challenge has been met only for a few very simple fragments of the machinery of living cells—subsystems involving a handful of different proteins, or two or three cross-regulatory genes, where theory and experiment can go closely hand in hand. We shall discuss some of these examples later in the book.
Figure 1–45 A very simple gene regulatory circuit—a single gene regulating its own expression by the binding of its protein product to its own regulatory DNA. Simple schematic diagrams such as this are often used to summarize what we know (as in Figure 1–44), but they leave many questions unanswered. When the protein binds, does it inhibit or stimulate transcription? How steeply does the transcription rate depend on the protein concentration? How long, on average, does a molecule of the protein remain bound to the DNA? How long does it take to make each molecule of mRNA or protein, and how quickly does each type of molecule get degraded? Mathematical modeling shows that we need quantitative answers to all these and other questions before we can predict the behavior of even this single-gene system. For different parameter values, the system may settle to a unique steady state; or it may behave as a switch, capable of existing in one or other of a set of alternative states; or it may oscillate; or it may show large random fluctuations. Diagram labels: regulatory DNA; gene coding region; mRNA; gene regulatory protein.
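The sensitivity to parameters described for the single-gene circuit of Figure 1–45 can be illustrated with a deliberately crude model. The sketch below assumes that the protein stimulates its own production, that production follows a Hill function with steepness `n`, and that the protein is degraded at a constant rate; the mRNA step and all random fluctuations are ignored. The function name and every parameter value are assumptions chosen for illustration, not measurements.

```python
def simulate(p0, n, basal=0.02, beta=1.0, K=0.5, gamma=1.0, dt=0.01, t_end=50.0):
    """Euler integration of dp/dt = basal + beta*p**n/(K**n + p**n) - gamma*p.

    p0    : starting protein concentration (arbitrary units)
    n     : Hill coefficient, i.e. how steeply production depends on p
    basal : production rate with no protein bound
    beta  : maximal extra production when the protein is bound
    K     : concentration giving half-maximal stimulation
    gamma : degradation rate constant
    Returns the concentration reached at time t_end.
    """
    p = p0
    for _ in range(round(t_end / dt)):
        production = basal + beta * p**n / (K**n + p**n)
        p += dt * (production - gamma * p)
    return p


# Steep dependence (n = 4): the final state depends on where the system starts.
low_start = simulate(p0=0.0, n=4)
high_start = simulate(p0=1.0, n=4)

# Shallow dependence (n = 1): both starting points reach the same steady state.
shallow_from_low = simulate(p0=0.0, n=1)
shallow_from_high = simulate(p0=1.0, n=1)
```

With these illustrative values, changing only `n` converts a circuit with a unique steady state into a switch with two alternative stable states. Producing the oscillations or random fluctuations mentioned in the caption would need a richer model, for example one with negative feedback and delays or with stochastic reaction events.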
Arabidopsis Has Been Chosen Out of 300,000 Species As a Model Plant
The large multicellular organisms that we see around us—the flowers and trees and animals—seem fantastically varied, but they are much closer to one another in their evolutionary origins, and more similar in their basic cell biology, than the great host of microscopic single-celled organisms. Thus, while bacteria and eucaryotes are separated by more than 3000 million years of divergent evolution, vertebrates and insects are separated by about 700 million years, fish and mammals by about 450 million years, and the different species of flowering plants by only about 150 million years.
Because of the close evolutionary relationship between all flowering plants, we can, once again, get insight into the cell and molecular biology of this whole class of organisms by focusing on just one or a few species for detailed analysis. Out of the several hundred thousand species of flowering plants on Earth today, molecular biologists have chosen to concentrate their efforts on a small weed, the common Thale cress Arabidopsis thaliana (Figure 1–46), which can be grown indoors in large numbers, and produces thousands of offspring per plant after 8–10 weeks. Arabidopsis has a genome of approximately 140 million nucleotide pairs, about 11 times as much as yeast, and its complete sequence is known.
The World of Animal Cells Is Represented By a Worm, a Fly, a Mouse, and a Human
Multicellular animals account for the majority of all named species of living organisms, and for the largest part of the biological research effort. Four species have emerged as the foremost model organisms for molecular genetic studies. In order of increasing size, they are the nematode worm Caenorhabditis elegans, the fly Drosophila melanogaster, the mouse Mus musculus, and the human, Homo sapiens. Each of these has had its genome sequenced.
Caenorhabditis elegans (Figure 1–47) is a small, harmless relative of the eelworm that attacks crops. With a life cycle of only a few days, an ability to survive in a freezer indefinitely in a state of suspended animation, a simple body plan, and an unusual life cycle that is well suited for genetic studies (described in Chapter 23), it is an ideal model organism. C. elegans develops with clockwork precision from a fertilized egg cell into an adult worm with exactly 959 body cells (plus a variable number of egg and sperm cells)—an unusual degree of regularity for an animal. We now have a minutely detailed description of the sequence of events by which this occurs, as the cells divide, move, and change their characters according to strict and predictable rules. The genome of 97 million nucleotide pairs codes for about 19,000 proteins, and many mutants and other tools are available for the testing of gene functions. Although the worm has a body plan very different from our own, the conservation of biological mechanisms has been sufficient for the worm to be a model for many of the developmental and cell-biological processes that occur in the human body. Studies of the worm help us to understand, for example, the programs of cell division and cell death that determine the numbers of cells in the body—a topic of great importance in developmental biology and cancer research.
Figure 1–46 Arabidopsis thaliana, the plant chosen as the primary model for studying plant molecular genetics. (Courtesy of Toni Hayden and the John Innes Foundation.)
Figure 1–47 Caenorhabditis elegans, the first multicellular organism to have its complete genome sequence determined. This small nematode, about 1 mm long, lives in the soil. Most individuals are hermaphrodites, producing both eggs and sperm. The animal is viewed here using interference contrast optics, showing up the boundaries of the tissues in bright colors; the animal itself is not colored when viewed with ordinary lighting. Scale bar: 0.2 mm. (Courtesy of Ian Hope.)
Studies in Drosophila Provide a Key to Vertebrate Development
The fruitfly Drosophila melanogaster (Figure 1–48) has been used as a model genetic organism for longer than any other; in fact, the foundations of classical genetics were built to a large extent on studies of this insect. Over 80 years ago, it provided, for example, definitive proof that genes—the abstract units of hereditary information—are carried on chromosomes, concrete physical objects whose behavior had been closely followed in the eucaryotic cell with the light microscope, but whose function was at first unknown. The proof depended on one of the many features that make Drosophila peculiarly convenient for genetics—the giant chromosomes, with characteristic banded appearance, that are visible in some of its cells (Figure 1–49). Specific changes in the hereditary information, manifest in families of mutant flies, were found to correlate exactly with the loss or alteration of specific giant-chromosome bands.
In more recent times, Drosophila, more than any other organism, has shown us how to trace the chain of cause and effect from the genetic instructions encoded in the chromosomal DNA to the structure of the adult multicellular body. Drosophila mutants with body parts strangely misplaced or mispatterned provided the key to the identification and characterization of the genes required to make a properly structured body, with gut, limbs, eyes, and all the other parts in their correct places. Once these Drosophila genes were sequenced, the genomes of vertebrates could be scanned for homologs. These were found, and their functions in vertebrates were then tested by analyzing mice in which the genes had been mutated. The results, as we see later in the book, reveal an astonishing degree of similarity in the molecular mechanisms of insect and vertebrate development.
The majority of all named species of living organisms are insects. Even if Drosophila had nothing in common with vertebrates, but only with insects, it would still be an important model organism. But if understanding the molecular genetics of vertebrates is the goal, why not simply tackle the problem head-on? Why sidle up to it obliquely, through studies in Drosophila? Drosophila requires only 9 days to progress from a fertilized egg to an adult; it is vastly easier and cheaper to breed than any vertebrate, and its genome is much smaller—about 170 million nucleotide pairs, compared with 3200 million for a human. This genome codes for about 14,000 proteins, and mutants can now be obtained for essentially any gene. But there is also another, deeper reason why genetic mechanisms that are hard to discover in a vertebrate are often readily revealed in the fly. This relates, as we now explain, to the frequency of gene duplication, which is substantially greater in vertebrate genomes than in the fly genome and has probably been crucial in making vertebrates the complex and subtle creatures that they are.
Figure 1–48 Drosophila melanogaster. Molecular genetic studies on this fly have provided the main key to understanding how all animals develop from a fertilized egg into an adult. (From E.B. Lewis, Science 221:cover, 1983. With permission from AAAS.)
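The genome figures quoted so far in this chapter can be put side by side with a few lines of Python. The sketch below uses only numbers stated in the text (sizes in millions of nucleotide pairs, protein counts where the text gives them); the dictionary layout and function names are illustrative choices, and dividing genome size by protein count is only a crude indicator of how densely the coding information is packed.

```python
# Genome sizes (millions of nucleotide pairs) and approximate protein counts
# as quoted in this chapter; None where the text gives no protein count.
genomes = {
    "S. cerevisiae": (13.117, 6300),
    "C. elegans": (97, 19000),
    "A. thaliana": (140, None),
    "D. melanogaster": (170, 14000),
    "H. sapiens": (3200, None),
}


def relative_size(name, reference="S. cerevisiae"):
    """Genome size of `name` as a multiple of the reference genome."""
    return genomes[name][0] / genomes[reference][0]


def pairs_per_protein(name):
    """Nucleotide pairs of genome per encoded protein, or None if unknown."""
    size_mbp, proteins = genomes[name]
    if proteins is None:
        return None
    return size_mbp * 1e6 / proteins
```

On these figures the Arabidopsis genome is about 11 times the size of the yeast genome and the human genome about 19 times that of Drosophila, while the amount of DNA per encoded protein rises from roughly 2,100 nucleotide pairs in yeast to roughly 12,000 in the fly.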
The Vertebrate Genome Is a Product of Repeated Duplication
Almost every gene in the vertebrate genome has paralogs—other genes in the same genome that are unmistakably related and must have arisen by gene duplication. In many cases, a whole cluster of genes is closely related to similar clusters present elsewhere in the genome, suggesting that genes have been duplicated in linked groups rather than as isolated individuals. According to one hypothesis, at an early stage in the evolution of the vertebrates, the entire genome underwent duplication twice in succession, giving rise to four copies of every gene. In some groups of vertebrates, such as fish of the salmon and carp families (including the zebrafish, a popular research animal), it has been suggested that there was yet another duplication, creating an eightfold multiplicity of genes.
The precise course of vertebrate genome evolution remains uncertain, because many further evolutionary changes have occurred since these ancient events. Genes that were once identical have diverged; many of the gene copies have been lost through disruptive mutations; some have undergone further rounds of local duplication; and the genome, in each branch of the vertebrate family tree, has suffered repeated rearrangements, breaking up most of the original gene orderings. Comparison of the gene order in two related organisms, such as the human and the mouse, reveals that—on the time scale of vertebrate evolution—chromosomes frequently fuse and fragment to move large blocks of DNA sequence around. Indeed, it is possible, as we shall discuss in Chapter 7, that the present state of affairs is the result of many separate duplications of fragments of the genome, rather than duplications of the genome as a whole.
There is, however, no doubt that such whole-genome duplications do occur from time to time in evolution, for we can see recent instances in which duplicated chromosome sets are still clearly identifiable as such. The frog genus Xenopus, for example, comprises a set of closely similar species related to one another by repeated duplications or triplications of the whole genome. Among these frogs are X. tropicalis, with an ordinary diploid genome; the common laboratory species X. laevis, with a duplicated genome and twice as much DNA per cell; and X. ruwenzoriensis, with a sixfold reduplication of the original genome and six times as much DNA per cell (108 chromosomes, compared with 36 in X. laevis, for example). These species are estimated to have diverged from one another within the past 120 million years (Figure 1–50).
Figure 1–49 Giant chromosomes from salivary gland cells of Drosophila. Because many rounds of DNA replication have occurred without an intervening cell division, each of the chromosomes in these unusual cells contains over 1000 identical DNA molecules, all aligned in register. This makes them easy to see in the light microscope, where they display a characteristic and reproducible banding pattern. Specific bands can be identified as the locations of specific genes: a mutant fly with a region of the banding pattern missing shows a phenotype reflecting loss of the genes in that region. Genes that are being transcribed at a high rate correspond to bands with a “puffed” appearance. The bands stained dark brown in the micrograph are sites where a particular regulatory protein is bound to the DNA. Scale bar: 20 µm. (Courtesy of B. Zink and R. Paro, from R. Paro, Trends Genet. 6:416–421, 1990. With permission from Elsevier.)
Figure 1–50 Two species of the frog genus Xenopus. X. tropicalis, above, has an ordinary diploid genome; X. laevis, below, has twice as much DNA per cell. From the banding patterns of their chromosomes and the arrangement of genes along them, as well as from comparisons of gene sequences, it is clear that the large-genome species have evolved through duplications of the whole genome. These duplications are thought to have occurred in the aftermath of matings between frogs of slightly divergent Xenopus species. (Courtesy of E. Amaya, M. Offield and R. Grainger, Trends Genet. 14:253–255, 1998. With permission from Elsevier.)
Genetic Redundancy Is a Problem for Geneticists, But It Creates Opportunities for Evolving Organisms
Whatever the details of the evolutionary history, it is clear that most genes in the vertebrate genome exist in several versions that were once identical. The related genes often remain functionally interchangeable for many purposes. This phenomenon is called genetic redundancy. For the scientist struggling to discover all the genes involved in some particular process, it complicates the task. If gene A is mutated and no effect is seen, it cannot be concluded that gene A is functionally irrelevant—it may simply be that this gene normally works in parallel with its relatives, and these suffice for near-normal function even when gene A is defective. In the less repetitive genome of Drosophila, where gene duplication is less common, the analysis is more straightforward: single gene functions are revealed directly by the consequences of single-gene mutations (the single-engined plane stops flying when the engine fails).
Genome duplication has clearly allowed the development of more complex life forms; it provides an organism with a cornucopia of spare gene copies, which are free to mutate to serve divergent purposes. While one copy becomes optimized for use in the liver, say, another can become optimized for use in the brain or adapted for a novel purpose. In this way, the additional genes allow for increased complexity and sophistication. As the genes take on divergent functions, they cease to be redundant. Often, however, while the genes acquire individually specialized roles, they also continue to perform some aspects of their original core function in parallel, redundantly. Mutation of a single gene then causes a relatively minor abnormality that reveals only a part of the gene’s function (Figure 1–51). Families of genes with divergent but partly overlapping functions are a pervasive feature of vertebrate molecular biology, and they are encountered repeatedly in this book.
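The logic of partial redundancy can be made concrete with a toy model in which each gene copy is described simply by the set of body sites where it performs the shared function. The sketch below is hypothetical throughout: the gene names G1 and G2 follow Figure 1–51, but the site names and the rule that a site is affected only when no remaining copy covers it are simplifying assumptions.

```python
# Hypothetical body sites where each duplicated gene performs the shared function.
G1_SITES = {"limb", "gut", "eye"}
G2_SITES = {"gut", "eye", "brain"}


def affected_sites(knocked_out):
    """Sites left without any functional copy when the given genes are deleted."""
    active = {"G1": G1_SITES, "G2": G2_SITES}
    all_sites = set().union(*active.values())
    still_covered = set()
    for gene, sites in active.items():
        if gene not in knocked_out:
            still_covered |= sites
    return all_sites - still_covered
```

Here `affected_sites({"G1"})` reports only the limb and `affected_sites({"G2"})` only the brain, whereas the double mutant `affected_sites({"G1", "G2"})` reveals all four sites. Each single mutant therefore exposes only part of what the gene does, which is the difficulty described above.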
The Mouse Serves as a Model for Mammals
Mammals have typically three or four times as many genes as Drosophila, a genome that is 20 times larger, and millions or billions of times as many cells in their adult bodies. In terms of genome size and function, cell biology, and molecular mechanisms, mammals are nevertheless a highly uniform group of organisms. Even anatomically, the differences among mammals are chiefly a matter of size and proportions; it is hard to think of a human body part that does not have a counterpart in elephants and mice, and vice versa. Evolution plays freely with quantitative features, but it does not readily change the logic of the structure.
Figure 1–51 panel labels: (A) EVOLUTION BY GENE DUPLICATION: ancestral organism (gene G) and modern organism (genes G1 and G2). (B) MUTANT PHENOTYPES OF MODERN ORGANISM: loss of gene G1; loss of gene G2; loss of genes G1 and G2.
For a more exact measure of how closely mammalian species resemble one another genetically, we can compare the nucleotide sequences of corresponding (orthologous) genes, or the amino acid sequences of the proteins that these genes encode. The results for individual genes and proteins vary widely. But typically, if we line up the amino acid sequence of a human protein with that of the orthologous protein from, say, an elephant, about 85% of the amino acids are identical. A similar comparison between human and bird shows an amino acid identity of about 70%—twice as many differences, because the bird and the mammalian lineages have had twice as long to diverge as those of the elephant and the human (Figure 1–52). The mouse, being small, hardy, and a rapid breeder, has become the foremost model organism for experimental studies of vertebrate molecular genetics. Many naturally occurring mutations are known, often mimicking the effects of corresponding mutations in humans (Figure 1–53). Methods have been developed, moreover, to test the function of any chosen mouse gene, or of any noncoding portion of the mouse genome, by artificially creating mutations in it, as we explain later in the book. One made-to-order mutant mouse can provide a wealth of information for the cell biologist. It reveals the effects of the chosen mutation in a host of different contexts, simultaneously testing the action of the gene in all the different kinds of cells in the body that could in principle be affected.
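The percentage identity quoted above is a simple count over aligned positions. The sketch below shows the calculation in Python; the function name and the two 20-residue sequences are made up for illustration, and the sequences are assumed to be already aligned, with gaps written as "-" and skipped.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions with the same residue (gap positions are skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no aligned positions to compare")
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Two invented 20-residue sequences that differ at 3 positions.
example_a = "MKTAYIAKQRQISFVKSHFS"
example_b = "MKSAYIAKQKQISFVKSNFS"
identity = percent_identity(example_a, example_b)
```

For the invented pair, 17 of 20 positions match, giving 85%. The comparison in the text follows from the same arithmetic: 85% identity leaves 15% of positions different and 70% identity leaves 30%, which is twice as many differences.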
Humans Report on Their Own Peculiarities
As humans, we have a special interest in the human genome. We want to know the full set of parts from which we are made, and to discover how they work. But even if you were a mouse, preoccupied with the molecular biology of mice, humans would be attractive as model genetic organisms, because of one special property: through medical examinations and self-reporting, we catalog our own genetic (and other) disorders. The human population is enormous, consisting today of some 6 billion individuals, and this self-documenting property means that a huge database of information exists on human mutations. The complete human genome sequence of more than 3 billion nucleotide pairs has now been determined, making it easier than ever before to identify at a molecular level the precise gene responsible for each human mutant characteristic.
By drawing together the insights from humans, mice, flies, worms, yeasts, plants, and bacteria—using gene sequence similarities to map out the correspondences between one model organism and another—we enrich our understanding of them all.
Figure 1–51 The consequences of gene duplication for mutational analyses of gene function. In this hypothetical example, an ancestral multicellular organism has a genome containing a single copy of gene G, which performs its function at several sites in the body, indicated in green. (A) Through gene duplication, a modern descendant of the ancestral organism has two copies of gene G, called G1 and G2. These have diverged somewhat in their patterns of expression and in their activities at the sites where they are expressed, but they still retain important similarities. At some sites, they are expressed together, and each independently performs the same old function as the ancestral gene G (alternating green and yellow stripes); at other sites, they are expressed alone and may serve new purposes. (B) Because of a functional overlap, the loss of one of the two genes by mutation (red cross) reveals only a part of its role; only the loss of both genes in the double mutant reveals the full range of processes for which these genes are responsible. Analogous principles apply to duplicated genes that operate in the same place (for example, in a single-celled organism) but are called into action together or individually in response to varying circumstances. Thus, gene duplications complicate genetic analyses in all organisms.
Figure 1–52 data: % amino acids identical in hemoglobin α chain, by species pair
human/chimp        100
human/orangutan     98
mouse/rat           84
cat/dog             86
pig/whale           77
pig/sheep           87
human/rabbit        82
human/elephant      83
human/mouse         89
human/sloth         81
human/kangaroo      81
bird/crocodile      76
human/lizard        57
human/chicken       70
human/frog          56
human/tuna fish     55
human/shark         51
human/lamprey       35
Vertical axis: time in millions of years (0 to 550), with geological periods Tertiary, Cretaceous, Jurassic, Triassic, Permian, Carboniferous, Devonian, Silurian, Ordovician, Cambrian, and Proterozoic.
We Are All Different in Detail
What precisely do we mean when we speak of the human genome? Whose genome? On average, any two people taken at random differ in about one or two in every 1000 nucleotide pairs in their DNA sequence. The Human Genome Project has arbitrarily selected DNA from a small number of anonymous individuals for sequencing. The human genome—the genome of the human species—is, properly speaking, a more complex thing, embracing the entire pool of variant genes that are found in the human population and continually exchanged and reassorted in the course of sexual reproduction. Ultimately, we can hope to document this variation too. Knowledge of it will help us understand, for example, why some people are prone to one disease, others to another; why some respond well to a drug, others badly. It will also provide new clues to our history—the population movements and minglings of our ancestors, the infections they suffered, the diets they ate. All these things leave traces in the variant forms of genes that have survived in human communities.
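As a rough worked figure, and assuming the genome size of about 3200 million nucleotide pairs quoted earlier in the chapter, a difference rate of one or two per 1000 nucleotide pairs corresponds to several million differences between two randomly chosen people, counted over a single copy of the genome:

```latex
\[
3.2\times10^{9}\ \text{nucleotide pairs}\times\frac{1\ \text{to}\ 2}{1000}
\approx 3.2\times10^{6}\ \text{to}\ 6.4\times10^{6}\ \text{differing nucleotide pairs}
\]
```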
Figure 1–52 Times of divergence of different vertebrates. The scale on the left shows the estimated date and geological era of the last common ancestor of each specified pair of animals. Each time estimate is based on comparisons of the amino acid sequences of orthologous proteins; the longer a pair of animals have had to evolve independently, the smaller the percentage of amino acids that remain identical. Data from many different classes of proteins have been averaged to arrive at the final estimates, and the time scale has been calibrated to match the fossil evidence that the last common ancestor of mammals and birds lived 310 million years ago. The figures on the right give data on sequence divergence for one particular protein (chosen arbitrarily)—the α chain of hemoglobin. Note that although there is a clear general trend of increasing divergence with increasing time for this protein, there are also some irregularities. These reflect the randomness within the evolutionary process and, probably, the action of natural selection driving especially rapid changes of hemoglobin sequence in some organisms that experienced special physiological demands. On average, within any particular evolutionary lineage, hemoglobins accumulate changes at a rate of about 6 altered amino acids per 100 amino acids every 100 million years. Some proteins, subject to stricter functional constraints, evolve much more slowly than this, others as much as 5 times faster. All this gives rise to substantial uncertainties in estimates of divergence times, and some experts believe that the major groups of mammals diverged from one another as much as 60 million years more recently than shown here. (Adapted from S. Kumar and S.B. Hedges, Nature 392:917–920, 1998. With permission from Macmillan Publishers Ltd.)
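The rate quoted in the caption can be turned into a back-of-the-envelope divergence estimate. The sketch below assumes a strictly linear clock: each of the two lineages accumulates about 6 changes per 100 amino acids per 100 million years, so differences between them build up at twice that rate, and repeated changes at the same position are ignored. The function name is hypothetical, and this simplified calculation is not the averaging procedure used to construct the figure.

```python
RATE_PER_LINEAGE = 6.0  # altered amino acids per 100 residues per 100 million years (from the caption)


def naive_divergence_time_my(percent_identical, rate=RATE_PER_LINEAGE):
    """Millions of years since two lineages split, under a linear clock."""
    if not 0 <= percent_identical <= 100:
        raise ValueError("percent_identical must be between 0 and 100")
    percent_different = 100.0 - percent_identical
    # Both lineages change independently, so differences accumulate at 2 * rate.
    return percent_different / (2.0 * rate) * 100.0


# Hemoglobin alpha-chain identity for human/chicken as given in Figure 1-52.
human_chicken_my = naive_divergence_time_my(70)
```

For the 70% identity shown for human/chicken, the naive clock gives 250 million years, compared with the 310 million years used to calibrate the figure. The gap illustrates the caption's point that estimates from a single protein carry substantial uncertainty.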
Figure 1–53 Human and mouse: similar genes and similar development. The human baby and the mouse shown here have similar white patches on their foreheads because both have mutations in the same gene (called Kit), required for the development and maintenance of pigment cells. (Courtesy of R.A. Fleischman.)
Knowledge and understanding bring the power to intervene—with humans, to avoid or prevent disease; with plants, to create better crops; with bacteria, to turn them to our own uses. All these biological enterprises are linked, because the genetic information of all living organisms is written in the same language. The new-found ability of molecular biologists to read and decipher this language has already begun to transform our relationship to the living world. The account of cell biology in the subsequent chapters will, we hope, prepare you to understand, and possibly to contribute to, the great scientific adventure of the twenty-first century.
Summary Eucaryotic cells, by definition, keep their DNA in a separate membrane-enclosed compartment, the nucleus. They have, in addition, a cytoskeleton for support and movement, elaborate intracellular compartments for digestion and secretion, the capacity (in many species) to engulf other cells, and a metabolism that depends on the oxidation of organic molecules by mitochondria. These properties suggest that eucaryotes may have originated as predators on other cells. Mitochondria—and, in plants, chloroplasts—contain their own genetic material, and evidently evolved from bacteria that were taken up into the cytoplasm of the eucaryotic cell and survived as symbionts. Eucaryotic cells have typically 3–30 times as many genes as procaryotes, and often thousands of times more noncoding DNA. The noncoding DNA allows for complex regulation of gene expression, as required for the construction of complex multicellular organisms. Many eucaryotes are, however, unicellular—among them the yeast Saccharomyces cerevisiae, which serves as a simple model organism for eucaryotic cell biology, revealing the molecular basis of conserved fundamental processes such as the eucaryotic cell division cycle. A small number of other organisms have been chosen as primary models for multicellular plants and animals, and the sequencing of their entire genomes has opened the way to systematic and comprehensive analysis of gene functions, gene regulation, and genetic diversity. As a result of gene duplications during vertebrate evolution, vertebrate genomes contain multiple closely related homologs of most genes. This genetic redundancy has allowed diversification and specialization of genes for new purposes, but it also makes gene functions harder to decipher. There is less genetic redundancy in the nematode Caenorhabditis elegans and the fly Drosophila melanogaster, which have thus played a key part in revealing universal genetic mechanisms of animal development.
PROBLEMS
Which statements are true? Explain why or why not.
1–1 The human hemoglobin genes, which are arranged in two clusters on two chromosomes, provide a good example of an orthologous set of genes.
1–2 Horizontal gene transfer is more prevalent in single-celled organisms than in multicellular organisms.
1–3 Most of the DNA sequences in a bacterial genome code for proteins, whereas most of the sequences in the human genome do not.
Discuss the following problems.
1–4 Since it was deciphered four decades ago, some have claimed that the genetic code must be a frozen accident, while others have argued that it was shaped by natural selection. A striking feature of the genetic code is its inherent resistance to the effects of mutation. For example, a change in the third position of a codon often specifies the same amino acid or one with similar chemical properties. The natural code resists mutation more effectively (is less susceptible to error) than most other possible versions, as illustrated in Figure Q1–1. Only one in a million computer-generated “random” codes is more error-resistant than the natural genetic code. Does the extraordinary mutation resistance of the genetic code argue in favor of its origin as a frozen accident or as a result of natural selection? Explain your reasoning.
Figure Q1–1 Susceptibility of the natural code relative to millions of computer-generated codes (Problem 1–4). Susceptibility measures the average change in amino acid properties caused by random mutations. A small value indicates that mutations tend to cause minor changes. The histogram plots number of codes (thousands) against susceptibility to mutation, with the natural code marked. (Data courtesy of Steve Freeland.)
1–5 You have begun to characterize a sample obtained from the depths of the oceans on Europa, one of Jupiter’s moons. Much to your surprise, the sample contains a life-form that grows well in a rich broth. Your preliminary analysis shows that it is cellular and contains DNA, RNA, and protein. When you show your results to a colleague, she suggests that your sample was contaminated with an organism from Earth. What approaches might you try to distinguish between contamination and a novel cellular life-form based on DNA, RNA, and protein?
1–6 It is not so difficult to imagine what it means to feed on the organic molecules that living things produce. That is, after all, what we do. But what does it mean to “feed” on sunlight, as phototrophs do? Or, even stranger, to “feed” on rocks, as lithotrophs do? Where is the “food,” for example, in the mixture of chemicals (H2S, H2, CO, Mn+, Fe2+, Ni2+, CH4, and NH4+) spewed forth from a hydrothermal vent?
1–7 How many possible different trees (branching patterns) can be drawn for eubacteria, archaea, and eucaryotes, assuming that they all arose from a common ancestor?
1–8 The genes for ribosomal RNA are highly conserved (relatively few sequence changes) in all organisms on Earth; thus, they have evolved very slowly over time. Were ribosomal RNA genes “born” perfect?
1–9 Genes participating in informational processes such as replication, transcription, and translation are transferred between species much less often than are genes involved in metabolism. The basis for this inequality is unclear at present, but one suggestion is that it relates to the underlying complexity. Informational processes tend to involve large aggregates of different gene products, whereas metabolic reactions are usually catalyzed by enzymes composed of a single protein. Why would the complexity of the underlying process (informational or metabolic) have any effect on the rate of horizontal gene transfer?
1–10 The process of gene transfer from the mitochondrial to the nuclear genome can be analyzed in plants. The respiratory gene Cox2, which encodes subunit 2 of cytochrome oxidase, was functionally transferred to the nucleus during flowering plant evolution. Extensive analyses of plant genera have pinpointed the time of appearance of the nuclear form of the gene and identified several likely intermediates in the ultimate loss from the mitochondrial genome. A summary of Cox2 gene distributions between mitochondria and nuclei, along with data on their transcription, is shown in a phylogenetic context in Figure Q1–2.
A. Assuming that transfer of the mitochondrial gene to the nucleus occurred only once (an assumption supported by the structures of the nuclear genes), indicate the point in the phylogenetic tree where the transfer occurred.
B. Are there any examples of genera in which the transferred gene and the mitochondrial gene both appear functional? Indicate them.
C. What is the minimal number of times that the mitochondrial gene has been inactivated or lost? Indicate those events on the phylogenetic tree.
D. What is the minimal number of times that the nuclear gene has been inactivated or lost? Indicate those events on the phylogenetic tree.
E. Based on this information, propose a general scheme for transfer of mitochondrial genes to the nuclear genome.
Figure Q1–2 Summary of Cox2 gene distribution and transcript data in a phylogenetic context (Problem 1–10). The presence of the intact gene or a functional transcript is indicated by (+); the absence of the intact gene or a functional transcript is indicated by (–). mt, mitochondria; nuc, nuclei. Columns: GENE (mt, nuc) and RNA (mt, nuc). Genera shown: Pisum, Clitoria, Tephrosia, Galactia, Canavalia, Lespedeza, Dumasia, Eriosema, Atylosia, Erythrina, Ramirezella, Vigna, Phaseolus, Calopogonium, Pachyrhizus, Cologania, Pueraria, Pseudeminia, Pseudovigna, Ortholobium, Psoralea, Cullen, Glycine, Neonotonia, Teramnus, Amphicarpa.
1–11 When plant hemoglobin genes were first discovered in legumes, it was so surprising to find a gene typical of animal blood that it was hypothesized that the plant gene arose by horizontal transfer from an animal. Many more hemoglobin genes have now been sequenced, and a phylogenetic tree based on some of these sequences is shown in Figure Q1–3.
A. Does this tree support or refute the hypothesis that the plant hemoglobins arose by horizontal gene transfer?
B. Supposing that the plant hemoglobin genes were originally derived from a parasitic nematode, for example, what would you expect the phylogenetic tree to look like?
Figure Q1–3 Phylogenetic tree for hemoglobin genes from a variety of species (Problem 1–11). The legumes are highlighted in red. Groups shown: VERTEBRATES (Whale, Rabbit, Cat, Cobra, Chicken, Human, Salamander, Cow, Frog, Goldfish); PLANTS (Barley, Lotus, Alfalfa, Bean); INVERTEBRATES (Earthworm, Insect, Clam, Nematode); PROTOZOA (Chlamydomonas, Paramecium).
1–12 Rates of evolution appear to vary in different lineages. For example, the rate of evolution in the rat lineage is significantly higher than in the human lineage. These rate differences are apparent whether one looks at changes in protein sequences that are subject to selective pressure or at changes in noncoding nucleotide sequences, which are not under obvious selection pressure. Can you offer one or more possible explanations for the slower rate of evolutionary change in the human lineage versus the rat lineage?
Chapter 2: Cell Chemistry and Biosynthesis
In This Chapter
THE CHEMICAL COMPONENTS OF A CELL
CATALYSIS AND THE USE OF ENERGY BY CELLS
HOW CELLS OBTAIN ENERGY FROM FOOD
It is at first sight difficult to accept the idea that each of the living creatures described in Chapter 1 is merely a chemical system. The incredible diversity of living forms, their seemingly purposeful behavior, and their ability to grow and reproduce appear to set them apart from the world of solids, liquids, and gases that chemistry normally describes. Indeed, until the nineteenth century animals were believed to contain a Vital Force (an “animus”) that was responsible for their distinctive properties.
We now know there is nothing in living organisms that disobeys chemical and physical laws. However, the chemistry of life is special. First, it is based overwhelmingly on carbon compounds, whose study is therefore known as organic chemistry. Second, cells are 70 percent water, and life depends largely on chemical reactions that take place in aqueous solution. Third, and most important, cell chemistry is enormously complex: even the simplest cell is vastly more complicated in its chemistry than any other chemical system known. Although cells contain a variety of small carbon-containing molecules, most of the carbon atoms in cells are incorporated into enormous polymeric molecules: chains of chemical subunits linked end-to-end. It is the unique properties of these macromolecules that enable cells and organisms to grow and reproduce, as well as to do all the other things that are characteristic of life.
THE CHEMICAL COMPONENTS OF A CELL
Matter is made of combinations of elements: substances such as hydrogen or carbon that cannot be broken down or converted into other substances by chemical means. The smallest particle of an element that still retains its distinctive chemical properties is an atom (Figure 2–1). However, the characteristics of substances other than pure elements, including the materials from which living cells are made, depend on the way their atoms are linked together in groups to form molecules. In order to understand how living organisms are built from inanimate matter, therefore, it is crucial to know how all of the chemical bonds that hold atoms together in molecules are formed.
Cells Are Made From a Few Types of Atoms
The atomic weight of an atom, or the molecular weight of a molecule, is its mass relative to that of a hydrogen atom. This is essentially equal to the number of protons plus neutrons that the atom or molecule contains, since the electrons are much lighter and contribute almost nothing to the total. Thus the major isotope of carbon has an atomic weight of 12 and is symbolized as ¹²C, whereas an unstable isotope of carbon has an atomic weight of 14 and is written as ¹⁴C. The mass of an atom or a molecule is often specified in daltons, one dalton being an atomic mass unit approximately equal to the mass of a hydrogen atom.
Atoms are so small that it is hard to imagine their size. An individual carbon atom is roughly 0.2 nm in diameter, so that it would take about 5 million of them, laid out in a straight line, to span a millimeter. One proton or neutron weighs approximately 1/(6 × 10²³) gram, so one gram of hydrogen contains 6 × 10²³ atoms. This huge number (6 × 10²³, called Avogadro's number) is the key scale factor describing the relationship between everyday quantities and quantities measured in terms of individual atoms or molecules. If a substance has a molecular weight of X, 6 × 10²³ molecules of it will have a mass of X grams. This quantity is called one mole of the substance (Figure 2–2).
There are 89 naturally occurring elements, each differing from the others in the number of protons and electrons in its atoms. Living organisms, however, are made of only a small selection of these elements, four of which, carbon (C), hydrogen (H), nitrogen (N), and oxygen (O), make up 96.5% of an organism's weight. This composition differs markedly from that of the nonliving inorganic environment (Figure 2–3) and is evidence of a distinctive type of chemistry.
Figure 2–1 Highly schematic representations of an atom of carbon and an atom of hydrogen. The nucleus of every atom except hydrogen consists of both positively charged protons and electrically neutral neutrons. The number of electrons in an atom is equal to its number of protons (the atomic number), so that the atom has no net charge. Because it is the electrons that determine the chemical behavior of an atom, all of the atoms of a given element have the same atomic number. Neutrons are uncharged subatomic particles of essentially the same mass as protons. They contribute to the structural stability of the nucleus (if there are too many or too few, the nucleus may disintegrate by radioactive decay), but they do not alter the chemical properties of the atom. Because of neutrons, an element can exist in several physically distinguishable but chemically identical forms, called isotopes, each isotope having a different number of neutrons but the same number of protons. Multiple isotopes of almost all the elements occur naturally, including some that are unstable. For example, while most carbon on Earth exists as the stable isotope carbon 12, with six protons and six neutrons, there are also small amounts of an unstable isotope, the radioactive carbon 14, whose atoms have six protons and eight neutrons. Carbon 14 undergoes radioactive decay at a slow but steady rate. This forms the basis for a technique known as carbon 14 dating, which is used in archaeology to determine the time of origin of organic materials. The neutrons, protons, and electrons are in reality minute in relation to the atom as a whole; their size is greatly exaggerated here. In addition, the diameter of the nucleus is only about 10⁻⁴ that of the electron cloud. Finally, although the electrons are shown here as individual particles, in reality their behavior is governed by the laws of quantum mechanics, and there is no way of predicting exactly where an electron is at any given instant of time. (Labels: carbon atom, atomic number = 6, atomic weight = 12; hydrogen atom, atomic number = 1, atomic weight = 1.)
The Outermost Electrons Determine How Atoms Interact
To understand how atoms bond together to form the molecules that make up living organisms, we focus on their electrons. Protons and neutrons are welded tightly to one another in the nucleus and change partners only under extreme conditions: during radioactive decay, for example, or in the interior of the sun or of a nuclear reactor. In living tissues, it is only the electrons of an atom that undergo rearrangements. They form the exterior of an atom and specify the rules of chemistry by which atoms combine to form molecules.
Electrons are in continuous motion around the nucleus, but motions on this submicroscopic scale obey very different laws from those familiar in everyday life. These laws dictate that electrons in an atom can exist only in certain discrete states, called orbitals, and that there is a strict limit to the number of electrons that can be accommodated in an orbital of a given type, a so-called electron shell. The electrons closest on average to the positive nucleus are attracted most strongly to it and occupy the innermost, most tightly bound shell. This shell holds a maximum of two electrons. The second shell is farther away from the nucleus, and its electrons are less tightly bound. This second shell holds up to eight electrons. The third shell contains electrons that are even less tightly bound; it also holds up to eight electrons. The fourth and fifth shells can hold 18 electrons each. Atoms with more than four shells are very rare in biological molecules.
The electron arrangement of an atom is most stable when all the electrons are in the most tightly bound states that are possible, that is, when they occupy the innermost shells. Therefore, with certain exceptions in the larger atoms, the electrons of an atom fill the orbitals in order: the first shell before the second, the second before the third, and so on. An atom whose outermost shell is entirely filled with electrons is especially stable and therefore chemically unreactive. Examples are helium with 2 electrons, neon with 2 + 8, and argon with 2 + 8 + 8; these are all inert gases. Hydrogen, by contrast, with only one electron and only a half-filled shell, is highly reactive. Likewise, the other atoms found in living tissues have incomplete outer electron shells and can donate, accept, or share electrons with each other to form both molecules and ions (Figure 2–4).
Because an unfilled electron shell is less stable than a filled one, atoms with incomplete outer shells tend to interact with other atoms in a way that causes them to either gain or lose enough electrons to achieve a completed outermost shell. This electron exchange occurs either by transferring electrons from one atom to another or by sharing electrons between two atoms. These two strategies generate two types of chemical bonds between atoms: an ionic bond is formed when electrons are donated by one atom to another, whereas a covalent bond is formed when two atoms share a pair of electrons (Figure 2–5). Often, the pair of electrons is shared unequally, with a partial transfer between two atoms that attract electrons differently, one being more electronegative than the other; this intermediate strategy results in a polar covalent bond, as we shall discuss later.
An H atom, which needs only one electron to fill its shell, generally acquires it by electron sharing, forming one covalent bond with another atom; often this bond is polar, meaning that the electrons are shared unequally. The other common elements in living cells (C, N, and O, with an incomplete second shell, and P and S, with an incomplete third shell; see Figure 2–4) generally share electrons and achieve a filled outer shell of eight electrons by forming several covalent bonds. The number of electrons that an atom must acquire or lose (either by sharing or by transfer) to fill its outer shell is known as its valence.
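The shell-filling rule and the definition of valence given above can be sketched in a few lines of Python. This is a minimal illustration, not a chemical model: the function names are hypothetical, the shell capacities are the ones quoted in the text (2, 8, 8, 18, 18), and the sketch is restricted to atomic numbers 1 to 20, where filling the shells strictly in order gives the usual textbook arrangement.

```python
# Shell capacities as quoted in the text: first shell 2, second 8, third 8,
# fourth and fifth 18 each.
SHELL_CAPACITY = [2, 8, 8, 18, 18]


def electron_shells(atomic_number):
    """Return electrons per shell, filling the innermost shells first.

    Assumption: strict in-order filling, valid here only for Z = 1..20.
    """
    if not 1 <= atomic_number <= 20:
        raise ValueError("this simple sketch covers atomic numbers 1 to 20 only")
    shells = []
    remaining = atomic_number
    for capacity in SHELL_CAPACITY:
        if remaining == 0:
            break
        filled = min(capacity, remaining)
        shells.append(filled)
        remaining -= filled
    return shells


def valence(atomic_number):
    """Electrons the atom must gain or lose to be left with a full outer shell.

    Simplification: the smaller of "electrons to add" and "electrons to shed".
    """
    shells = electron_shells(atomic_number)
    outer = shells[-1]
    capacity = SHELL_CAPACITY[len(shells) - 1]
    return min(outer, capacity - outer)


if __name__ == "__main__":
    for symbol, z in [("H", 1), ("He", 2), ("C", 6), ("N", 7), ("O", 8),
                      ("Ne", 10), ("Na", 11), ("Ar", 18)]:
        print(symbol, electron_shells(z), "valence:", valence(z))
```

Under these assumptions the inert gases come out with a valence of zero, while H, O, N, and C come out with 1, 2, 3, and 4, matching the number of covalent bonds each typically forms. Real atoms are less tidy (phosphorus, for instance, often forms more bonds than this rule suggests), so the sketch should be read only as a restatement of the simple rule in the text.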
The crucial role of the outer electron shell in determining the chemical properties of an element means that, when the elements are listed in order of their atomic number, there is a periodic recurrence of elements with similar properties: an element with, say, an incomplete second shell containing one electron will behave in much the same way as an element that has filled its second shell and has an incomplete third shell containing one electron. The metals, for example, have incomplete outer shells with just one or a few electrons, whereas, as we have just seen, the inert gases have full outer shells. This pattern gives rise to the famous periodic table of the elements, presented in Figure 2–6 with the elements found in living organisms highlighted.
Figure 2–2 Moles and molar solutions. A mole is X grams of a substance, where X is its relative molecular mass (molecular weight). A mole will contain 6 × 10²³ molecules of the substance.
1 mole of carbon weighs 12 g
1 mole of glucose weighs 180 g
1 mole of sodium chloride weighs 58 g
Molar solutions have a concentration of 1 mole of the substance in 1 liter of solution. A molar solution (denoted as 1 M) of glucose, for example, has 180 g/l, while a millimolar solution (1 mM) has 180 mg/l. The standard abbreviation for gram is g; the abbreviation for liter is l.
Figure 2–3 The abundances of some chemical elements in the nonliving world (the Earth's crust) compared with their abundances in the tissues of an animal. The abundance of each element is expressed as a percentage of the total number of atoms present including water. Thus, because of the abundance of water, more than 60% of the atoms in a living organism are hydrogen atoms. The relative abundance of elements is similar in all living things.
Figure 2–4 Filled and unfilled electron shells in some common elements. All the elements commonly found in living organisms have unfilled outermost shells (red) and can thus participate in chemical reactions with other atoms. For comparison, some elements that have only filled shells (yellow) are shown; these are chemically unreactive.
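The mole arithmetic summarized in Figure 2–2 can be sketched as follows. The function names are illustrative assumptions, and Avogadro's number is rounded to 6 × 10²³ as in the text.

```python
AVOGADRO = 6e23  # molecules per mole, rounded as in the text


def moles_from_grams(mass_g, molecular_weight):
    """Moles in mass_g grams of a substance with the given molecular weight."""
    if molecular_weight <= 0:
        raise ValueError("molecular weight must be positive")
    return mass_g / molecular_weight


def molecules_from_grams(mass_g, molecular_weight):
    """Approximate number of molecules in mass_g grams of the substance."""
    return moles_from_grams(mass_g, molecular_weight) * AVOGADRO


def grams_per_liter(molarity, molecular_weight):
    """Grams of solute per liter for a solution of the given molarity (mol/l)."""
    return molarity * molecular_weight


if __name__ == "__main__":
    glucose_mw = 180  # molecular weight of glucose, from Figure 2-2
    print("1 M glucose:", grams_per_liter(1.0, glucose_mw), "g/l")
    print("1 mM glucose:", grams_per_liter(0.001, glucose_mw) * 1000, "mg/l")
    print("molecules in 12 g of carbon:", molecules_from_grams(12, 12))
```

The same three functions cover the other entries in the figure, for example sodium chloride with a molecular weight of 58.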
Covalent Bonds Form by the Sharing of Electrons
All the characteristics of a cell depend on the molecules it contains. A molecule is defined as a cluster of atoms held together by covalent bonds; here electrons are shared between atoms to complete the outer shells, rather than being transferred between them. In the simplest possible molecule, a molecule of hydrogen (H2), two H atoms, each with a single electron, share two electrons, which is the number required to fill the first shell. These shared electrons form a cloud of negative charge that is densest between the two positively charged nuclei and helps to hold them together, in opposition to the mutual repulsion between like charges that would otherwise force them apart. The attractive and repulsive forces are in balance when the nuclei are separated by a characteristic distance, called the bond length.
Another property of any bond, covalent or noncovalent, is its bond strength, which is measured by the amount of energy that must be supplied to break that bond. This is often expressed in units of kilocalories per mole (kcal/mole), where a kilocalorie is the amount of energy needed to raise the temperature of one liter of water by one degree Celsius (centigrade). Thus if 1 kilocalorie must be supplied to break 6 × 10²³ bonds of a specific type (that is, 1 mole of these bonds), then the strength of that bond is 1 kcal/mole. An equivalent, widely used measure of energy is the kilojoule, which is equal to 0.239 kilocalories.
Figure 2–5 Comparison of covalent and ionic bonds. Atoms can attain a more stable arrangement of electrons in their outermost shell by interacting with one another. An ionic bond is formed when electrons are transferred from one atom to the other. A covalent bond is formed when electrons are shared between atoms. The two cases shown represent extremes; often, covalent bonds form with a partial transfer (unequal sharing of electrons), resulting in a polar covalent bond (see Figure 2–43).
To understand bond strengths, it is helpful to compare them with the average energies of the impacts that molecules are constantly experiencing from collisions with other molecules in their environment (their thermal, or heat, energy), as well as with other sources of biological energy such as light and glucose oxidation (Figure 2–7). Typical covalent bonds are stronger than the thermal energies by a factor of 100, so they resist being pulled apart by thermal motions and are normally broken only during specific chemical reactions with other atoms and molecules. The making and breaking of covalent bonds are violent events, and in living cells they are carefully controlled by highly specific catalysts, called enzymes. Noncovalent bonds as a rule are much weaker; we shall see later that they are important in the cell in the many situations where molecules have to associate and dissociate readily to carry out their functions.
Whereas an H atom can form only a single covalent bond, the other common atoms that form covalent bonds in cells (O, N, S, and P, as well as the all-important C atom) can form more than one. The outermost shell of these atoms, as we have seen, can accommodate up to eight electrons, and they form covalent bonds with as many other atoms as necessary to reach this number. Oxygen, with six electrons in its outer shell, is most stable when it acquires an extra two electrons by sharing with other atoms and therefore forms up to two covalent bonds. Nitrogen, with five outer electrons, forms a maximum of three covalent bonds, while carbon, with four outer electrons, forms up to four covalent bonds, thus sharing four pairs of electrons (see Figure 2–4).
When one atom forms covalent bonds with several others, these multiple bonds have definite arrangements in space relative to one another, reflecting the orientations of the orbits of the shared electrons. The covalent bonds of such an atom are therefore characterized by specific bond angles as well as by bond lengths and bond energies (Figure 2–8). The four covalent bonds that can form around a carbon atom, for example, are arranged as if pointing to the four corners of a regular tetrahedron. The precise orientation of covalent bonds forms the basis for the three-dimensional geometry of organic molecules.
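The bond angle implied by the tetrahedral arrangement described above can be worked out directly from geometry. The sketch below is illustrative and uses names of its own invention. It places a carbon atom at the origin and four bonds pointing at alternate corners of a cube (one standard way to write the corners of a regular tetrahedron), then measures the angle between every pair of bonds.

```python
import math
from itertools import combinations

# Four alternate corners of a cube centered on the origin form a regular
# tetrahedron; each tuple is the direction of one bond from the central atom.
TETRAHEDRON_CORNERS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def angle_between_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))


def tetrahedral_bond_angles():
    """Angles between all six pairs of bonds around the central atom."""
    return [angle_between_deg(u, v)
            for u, v in combinations(TETRAHEDRON_CORNERS, 2)]


if __name__ == "__main__":
    for angle in tetrahedral_bond_angles():
        print(f"{angle:.2f} degrees")
```

All six angles are equal to arccos(−1/3), roughly 109.5 degrees, which is the idealized angle for four equivalent bonds around one atom; angles in real molecules deviate somewhat from this value.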
Figure 2-6 Elements ordered by their atomic number form the periodic table. Elements fall into groups that show similar properties based on the number of electrons each element possesses in its outer shell. For example, Mg and Ca tend to give away the two electrons in their outer shells; C, N, and O complete their second shells by sharing electrons. The four elements highlighted in red constitute 99% of the total number of atoms present in the human body. An additional seven elements, highlighted in blue, together represent about 0.9% of the total. Other elements, shown in green, are required in trace amounts by humans. It remains unclear whether those elements shown in yellow are essential in humans or not. The chemistry of life, it seems, is therefore predominantly the chemistry of lighter elements. Atomic weights, given by the sum of the protons and neutrons in the atomic nucleus, will vary with the particular isotope of the element. The atomic weights shown here are those of the most common isotope of each element.
Figure 2-7 Some energies important for cells. Note that these energies are compared on a logarithmic scale.
[Figure 2-7 labels: axis, energy content (kcal/mole), marked 0.1, 1, 10, 100, 1000; items shown: average thermal motions, noncovalent bond breakage in water, ATP hydrolysis in cell, C-C bond breakage, green light, complete glucose oxidation.]
Chapter 2: Cell Chemistry and Biosynthesis
Figure 2-8 The geometry of covalent bonds. (A) The spatial arrangement of the covalent bonds that can be formed by oxygen, nitrogen, and carbon. (B) Molecules formed from these atoms have a precise three-dimensional structure, as shown here by ball-and-stick models for water (H2O) and propane (CH3-CH2-CH3). A structure can be specified by the bond angles and bond lengths for each covalent linkage. The atoms are colored according to the following, generally used convention: H, white; C, black; O, red; N, blue.
There Are Different Types of Covalent Bonds
Most covalent bonds involve the sharing of two electrons, one donated by each participating atom; these are called single bonds. Some covalent bonds, however, involve the sharing of more than one pair of electrons. Four electrons can be shared, for example, two coming from each participating atom; such a bond is called a double bond. Double bonds are shorter and stronger than single bonds and have a characteristic effect on the three-dimensional geometry of molecules containing them. A single covalent bond between two atoms generally allows the rotation of one part of a molecule relative to the other around the bond axis. A double bond prevents such rotation, producing a more rigid and less flexible arrangement of atoms (Figure 2-9 and Panel 2-1, pp. 106-107).
In some molecules, electrons are shared among three or more atoms, producing bonds that have a hybrid character intermediate between single and double bonds. The highly stable benzene molecule, for example, consists of a ring of six carbon atoms in which the bonding electrons are evenly distributed (although usually depicted as an alternating sequence of single and double bonds, as shown in Panel 2-1).
When the atoms joined by a single covalent bond belong to different elements, the two atoms usually attract the shared electrons to different degrees. Compared with a C atom, for example, O and N atoms attract electrons relatively strongly, whereas an H atom attracts electrons more weakly. By definition, a polar structure (in the electrical sense) is one with positive charge concentrated toward one end (the positive pole) and negative charge concentrated toward the other (the negative pole). Covalent bonds in which the electrons are shared unequally in this way are therefore known as polar covalent bonds (Figure 2-10). For example, the covalent bond between oxygen and hydrogen, -O-H, or between nitrogen and hydrogen, -N-H, is polar, whereas that between carbon and hydrogen, -C-H, has the electrons attracted much more equally by both atoms and is relatively nonpolar.
Polar covalent bonds are extremely important in biology because they create permanent dipoles that allow molecules to interact through electrical forces. Any large molecule with many polar groups will have a pattern of partial positive and negative charges on its surface. When such a molecule encounters a second molecule with a complementary set of charges, the two molecules will be attracted to each other by electrostatic interactions that resemble (but are weaker than) the ionic bonds discussed previously.
Figure 2-9 Carbon-carbon double bonds and single bonds compared. (A) The ethane molecule, with a single covalent bond between the two carbon atoms, illustrates the tetrahedral arrangement of single covalent bonds formed by carbon. One of the CH3 groups joined by the covalent bond can rotate relative to the other around the bond axis. (B) The double bond between the two carbon atoms in a molecule of ethene (ethylene) alters the bond geometry of the carbon atoms and brings all the atoms into the same plane (blue); the double bond prevents the rotation of one CH2 group relative to the other.
THE CHEMICAL COMPONENTS OF A CELL
An Atom Often Behaves as if It Has a Fixed Radius
When a covalent bond forms between two atoms, the sharing of electrons brings the nuclei of these atoms unusually close together. But most of the atoms that are rapidly jostling each other in cells are located in separate molecules. What happens when two such atoms touch?
For simplicity and clarity, atoms and molecules are usually represented schematically, either as a line drawing of the structural formula or as a ball-and-stick model. Space-filling models, however, give us a more accurate representation of molecular structure. In these models, a solid envelope represents the radius of the electron cloud at which strong repulsive forces prevent a closer approach of any second, non-bonded atom: the so-called van der Waals radius for an atom. This is possible because the amount of repulsion increases very steeply as two such atoms approach each other closely. At slightly greater distances, any two atoms will experience a weak attractive force, known as a van der Waals attraction. As a result, there is a distance at which repulsive and attractive forces precisely balance to produce an energy minimum in each atom's interaction with an atom of a second, non-bonded element (Figure 2-11).
Depending on the intended purpose, we shall represent small molecules as line drawings, ball-and-stick models, or space-filling models. For comparison, the water molecule is represented in all three ways in Figure 2-12. When representing very large molecules, such as proteins, we shall often need to further simplify the model used (see, for example, Panel 3-2, pp. 132-133).
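The balance between steep repulsion and weak attraction can be made concrete with a model energy curve. The sketch below uses the 12-6 Lennard-Jones form, a commonly used approximation that the text itself does not name, so it should be read as one possible illustration rather than the model behind Figure 2-11. The contact distance (0.35 nm) and well depth (0.1 kcal/mole) are borrowed from the van der Waals entry of Table 2-1; the function names are hypothetical.

```python
def van_der_waals_energy(distance, contact_distance=0.35, well_depth=0.1):
    """12-6 Lennard-Jones energy (kcal/mole) at a separation given in nm.

    The energy tends to zero at infinite separation and reaches its minimum,
    -well_depth, when distance equals contact_distance.
    """
    ratio = contact_distance / distance
    return well_depth * (ratio ** 12 - 2.0 * ratio ** 6)


def find_energy_minimum(start=0.2, stop=1.0, steps=2000):
    """Scan separations on a grid; return (distance, energy) at the minimum."""
    best = None
    for i in range(steps + 1):
        distance = start + (stop - start) * i / steps
        energy = van_der_waals_energy(distance)
        if best is None or energy < best[1]:
            best = (distance, energy)
    return best


if __name__ == "__main__":
    for d in (0.28, 0.35, 0.50, 0.70):
        print(f"{d:.2f} nm: {van_der_waals_energy(d):+.4f} kcal/mole")
    print("grid minimum:", find_energy_minimum())
```

In this model the energy is strongly positive when the atoms are pushed closer than the contact distance, slightly negative at larger separations, and lowest at the contact distance itself, which corresponds to the sum of the two van der Waals radii.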
Figure 2-10 Polar and nonpolar covalent bonds. The electron distributions in the polar water molecule (H2O) and the nonpolar oxygen molecule (O2) are compared (δ+, partial positive charge; δ-, partial negative charge).
Water Is the Most Abundant Substance in Cells
Water accounts for about 70% of a cell's weight, and most intracellular reactions occur in an aqueous environment. Life on Earth began in the ocean, and the conditions in that primeval environment put a permanent stamp on the chemistry of living things. Life therefore hinges on the properties of water.
In each water molecule (H2O) the two H atoms are linked to the O atom by covalent bonds (see Figure 2-12). The two bonds are highly polar because the O is strongly attractive for electrons, whereas the H is only weakly attractive. Consequently, there is an unequal distribution of electrons in a water molecule, with a preponderance of positive charge on the two H atoms and of negative charge on the O (see Figure 2-10). When a positively charged region of one water molecule (that is, one of its H atoms) approaches a negatively charged region (that is, the O) of a second water molecule, the electrical attraction between them can result in a weak bond called a hydrogen bond (see Figure 2-15). These bonds are much weaker than covalent bonds and are easily broken by the random thermal motions due to the heat energy of the molecules, so each bond lasts only a short time. But the combined effect of many weak bonds can be profound. Each water molecule can form hydrogen bonds through its two H atoms to two other water molecules, producing a network in which hydrogen bonds are being continually broken and formed (Panel 2-2, pp. 108-109). It is only because of the
Figure 2-11 The balance of van der Waals forces between two atoms. As the nuclei of two atoms approach each other, they initially show a weak bonding interaction due to their fluctuating electric charges. However, the same atoms will strongly repel each other if they are brought too close together. The balance of these van der Waals attractive and repulsive forces occurs at the indicated energy minimum. This minimum determines the contact distance between any two noncovalently bonded atoms; this distance is the sum of their van der Waals radii. By definition, zero energy (indicated by the dotted red line) is the energy when the two nuclei are at infinite separation.
[Figure 2-11 labels: energy axis from (+) to (-); van der Waals force equilibrium at the energy minimum.]
[Figure 2-12 labels: van der Waals radius of O = 1.4 Å; van der Waals radius of H = 1.2 Å; panels (A), (B), and (C).]
hydrogen bonds that link water molecules together that water is a liquid at room temperature, with a high boiling point and high surface tension, rather than a gas.
Molecules, such as alcohols, that contain polar bonds and that can form hydrogen bonds with water dissolve readily in water. Molecules carrying plus or minus charges (ions) likewise interact favorably with water. Such molecules are termed hydrophilic, meaning that they are water-loving. A large proportion of the molecules in the aqueous environment of a cell necessarily fall into this category, including sugars, DNA, RNA, and most proteins. Hydrophobic (water-hating) molecules, by contrast, are uncharged and form few or no hydrogen bonds, and so do not dissolve in water. Hydrocarbons are an important example (see Panel 2-1, pp. 106-107). In these molecules the H atoms are covalently linked to C atoms by a largely nonpolar bond. Because the H atoms have almost no net positive charge, they cannot form effective hydrogen bonds to other molecules. This makes the hydrocarbon as a whole hydrophobic, a property that is exploited in cells, whose membranes are constructed from molecules that have long hydrocarbon tails, as we shall see in Chapter 10.
Some Polar Molecules Are Acids and Bases
One of the simplest kinds of chemical reaction, and one that has profound significance in cells, takes place when a molecule containing a highly polar covalent bond between a hydrogen and a second atom dissolves in water. The hydrogen atom in such a molecule has largely given up its electron to the companion atom and so resembles an almost naked positively charged hydrogen nucleus, in other words, a proton (H+). When water molecules surround the polar molecule, the proton is attracted to the partial negative charge on the O atom of an adjacent water molecule and can dissociate from its original partner to associate instead with the oxygen atom of the water molecule to generate a hydronium ion (H3O+) (Figure 2-13A). The reverse reaction also takes place very readily, so one has to imagine an equilibrium state in which billions of protons are constantly flitting to and fro from one molecule in the solution to another.
The same type of reaction takes place in a solution of pure water itself. As illustrated in Figure 2-13B, water molecules are constantly exchanging protons with each other. As a result, pure water contains an equal, very low concentration of H3O+ and OH- ions, both being present at 10⁻⁷ M. (The concentration of H2O in pure water is 55.5 M.)
Substances that release protons to form H3O+ when they dissolve in water are termed acids. The higher the concentration of H3O+, the more acidic the solution. As H3O+ rises, the concentration of OH- falls, according to the equilibrium equation for water: [H3O+][OH-] = 1.0 × 10⁻¹⁴, where square brackets denote molar concentrations to be multiplied. By tradition, the H3O+ concentration is usually referred to as the H+ concentration, even though nearly all H+ in an aqueous solution is present as H3O+. To avoid the use of unwieldy numbers, the concentration of H+ is expressed using a logarithmic scale called the pH scale, as illustrated in Panel 2-2 (pp. 108-109). Pure water has a pH of 7.0 and is neutral, that is, neither acidic (pH < 7.0) nor basic (pH > 7.0).
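The relationship between H+ concentration, OH- concentration, and pH can be worked through numerically. The following is a minimal sketch that assumes the standard definition pH = -log10[H+] and uses the equilibrium constant quoted above; the function names are illustrative only.

```python
import math

# Equilibrium equation for water, as given in the text:
# [H3O+][OH-] = 1.0e-14 (molar concentrations).
WATER_ION_PRODUCT = 1.0e-14


def ph_from_h(h_molar):
    """pH for a given H+ (H3O+) concentration in moles per liter."""
    return -math.log10(h_molar)


def oh_from_h(h_molar):
    """OH- concentration implied by the equilibrium equation for water."""
    return WATER_ION_PRODUCT / h_molar


def classify(ph, tolerance=1e-9):
    """Label a solution as acidic, neutral, or basic relative to pH 7.0."""
    if ph < 7.0 - tolerance:
        return "acidic"
    if ph > 7.0 + tolerance:
        return "basic"
    return "neutral"


if __name__ == "__main__":
    for h in (1e-3, 1e-7, 1e-10):
        ph = ph_from_h(h)
        print(f"[H+] = {h:.0e} M, [OH-] = {oh_from_h(h):.0e} M, "
              f"pH = {ph:.1f} ({classify(ph)})")
```

Because the scale is logarithmic, each step of one pH unit corresponds to a tenfold change in H+ concentration, and a matching tenfold change in the opposite direction for OH-.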
Figure 2-12 Three representations of a water molecule. (A) The usual line drawing of the structural formula, in which each atom is indicated by its standard symbol, and each line represents a covalent bond joining two atoms. (B) A ball-and-stick model, in which atoms are represented by spheres of arbitrary diameter, connected by sticks representing covalent bonds. Unlike (A), bond angles are accurately represented in this type of model (see also Figure 2-8). (C) A space-filling model, in which both bond geometry and van der Waals radii are accurately represented.
[Figure 2-13 labels: (A) acetic acid, H2O, hydronium ion, acetate ion; (B) H2O, H2O, proton moves from one molecule to the other, H3O+ (hydronium ion), OH- (hydroxyl ion).]
Because the proton of a hydronium ion can be passed readily to many types of molecules in cells, altering their character, the concentration of H3O+ inside a cell (the acidity) must be closely regulated. The interior of a cell is kept close to neutrality, and it is buffered by the presence of many chemical groups that can take up and release protons near pH 7.
The opposite of an acid is a base. Just as the defining property of an acid is that it donates protons to a water molecule so as to raise the concentration of H3O+ ions, the defining property of a base is that it accepts protons so as to lower the concentration of H3O+ ions, and thereby raise the concentration of hydroxyl ions (OH-). A base can either combine with protons directly or form hydroxyl ions that immediately combine with protons to produce H2O. Thus sodium hydroxide (NaOH) is basic (or alkaline) because it dissociates in aqueous solution to form Na+ ions and OH- ions. Other bases, especially important in living cells, contain NH2 groups. These groups directly take up a proton from water: -NH2 + H2O → -NH3+ + OH-.
All molecules that accept protons from water will do so most readily when the concentration of H3O+ is high (acidic solutions). Likewise, molecules that can give up protons do so more readily if the concentration of H3O+ in solution is low (basic solutions), and they will tend to receive them back if this concentration is high.
Four Types of Noncovalent Attractions Help Bring Molecules Together in Cells
In aqueous solutions, covalent bonds are 10-100 times stronger than the other attractive forces between atoms, allowing their connections to define the boundaries of one molecule from another. But much of biology depends on the specific binding of different molecules to each other. This binding is mediated by a group of noncovalent attractions that are individually quite weak, but whose energies can sum to create an effective force between two separate molecules. We have previously introduced three of these attractive forces: electrostatic attractions (ionic bonds), hydrogen bonds, and van der Waals attractions. Table 2-1 compares the strengths of these three types of noncovalent bonds with that of a typical covalent bond, both in the presence and in the
Table 2-1 Covalent and Noncovalent Chemical Bonds
Bond type                                Length (nm)   Strength in vacuum (kcal/mole)   Strength in water (kcal/mole)
Covalent                                 0.15          90                               90
Noncovalent: ionic*                      0.25          80                               3
  hydrogen                               0.30          4                                1
  van der Waals attraction (per atom)    0.35          0.1                              0.1
*An ionic bond is an electrostatic attraction between two fully charged atoms.
Figure 2-13 Acids in water. (A) The reaction that takes place when a molecule of acetic acid dissolves in water. (B) Water molecules are continuously exchanging protons with each other to form hydronium and hydroxyl ions. These ions in turn rapidly recombine to form water molecules.
absence of water. Because of their fundamental importance in all biological systems, we summarize their properties here:
Electrostatic attractions. These result from the attractive forces between oppositely charged atoms. Electrostatic attractions are quite strong in the absence of water. They readily form between permanent dipoles, but are greatest when the two atoms involved are fully charged (ionic bonds). However, the polar water molecules cluster around both fully charged ions and polar molecules that contain permanent dipoles (Figure 2-14). This greatly reduces the attractiveness of these charged species for each other in most biological settings.
Hydrogen bonds. The structure of a typical hydrogen bond is illustrated in Figure 2-15. This bond represents a special form of polar interaction in which an electropositive hydrogen atom is partially shared by two electronegative atoms. Its hydrogen can be viewed as a proton that has partially dissociated from a donor atom, allowing it to be shared by a second acceptor atom. Unlike a typical electrostatic interaction, this bond is highly directional, being strongest when a straight line can be drawn between all three of the involved atoms. As already discussed, water weakens these bonds by forming competing hydrogen-bond interactions with the involved molecules.
van der Waals attractions. The electron cloud around any nonpolar atom will fluctuate, producing a flickering dipole. Such dipoles will transiently induce an oppositely polarized flickering dipole in a nearby atom. This interaction generates a very weak attraction between atoms. But since many atoms can be simultaneously in contact when two surfaces fit closely, the net result is often significant. Water does not weaken these so-called van der Waals attractions.
The fourth effect that often brings molecules together in water is not, strictly speaking, a bond at all. However, a very important hydrophobic force is caused by a pushing of nonpolar surfaces out of the hydrogen-bonded water network, where they would otherwise physically interfere with the highly favorable interactions between water molecules. Bringing any two nonpolar surfaces together reduces their contact with water; in this sense, the force is nonspecific. Nevertheless, we shall see in Chapter 3 that hydrophobic forces are central to the proper folding of protein molecules.
Panel 2-3 provides an overview of the four types of attractions just described. And Figure 2-16 illustrates schematically how many such interactions can sum to hold together the matching surfaces of two macromolecules, even though each interaction by itself would be much too weak to be effective in the face of thermal motions.
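The idea that many individually weak interactions can sum to an effective attraction can be illustrated with the bond strengths in water from Table 2-1. The sketch below makes two simplifying assumptions that are not in the text: the energies are treated as simply additive, and thermal energy is approximated as R × T at 310 K. The interaction counts in the example are hypothetical numbers chosen only for illustration.

```python
# Bond strengths in water, kcal/mole, taken from Table 2-1.
STRENGTH_IN_WATER = {"ionic": 3.0, "hydrogen": 1.0, "van_der_waals": 0.1}

# Assumption (not from the text): thermal energy per mole is taken as R*T.
GAS_CONSTANT_KCAL = 1.987e-3  # kcal/(mole*K)


def total_binding_energy(counts):
    """Sum the strengths of all interactions, assuming they simply add."""
    return sum(STRENGTH_IN_WATER[kind] * n for kind, n in counts.items())


def ratio_to_thermal(energy_kcal, temperature_kelvin=310.0):
    """Express an energy as a multiple of the approximate thermal energy."""
    return energy_kcal / (GAS_CONSTANT_KCAL * temperature_kelvin)


if __name__ == "__main__":
    # Hypothetical interface between two complementary surfaces.
    interface = {"ionic": 2, "hydrogen": 6, "van_der_waals": 40}
    total = total_binding_energy(interface)
    print(f"one hydrogen bond: {ratio_to_thermal(1.0):.1f} x thermal energy")
    print(f"whole interface:   {ratio_to_thermal(total):.1f} x thermal energy")
```

Under these assumptions a single hydrogen bond in water is comparable in energy to thermal motion, whereas the summed interface is many times larger, which is the qualitative point made by Figure 2-16.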
Figure 2-14 How the dipoles on water molecules orient to reduce the affinity of oppositely charged ions or polar groups for each other.
A Cell Is Formed from Carbon Compounds
Figure 2-15 Hydrogen bonds. (A) Ball-and-stick model of a typical hydrogen bond. The distance between the hydrogen and the oxygen atom here is less than the sum of their van der Waals radii, indicating a partial sharing of electrons. (B) The most common hydrogen bonds in cells.
Having looked at the ways atoms combine into small molecules and how these molecules behave in an aqueous environment, we now examine the main classes of small molecules found in cells and their biological roles. We shall see that a few basic categories of molecules, formed from a handful of different elements, give rise to all the extraordinary richness of form and behavior shown by living things.
If we disregard water and inorganic ions such as potassium, nearly all the molecules in a cell are based on carbon. Carbon is outstanding among all the elements in its ability to form large molecules; silicon is a poor second. Because it is small and has four electrons and four vacancies in its outermost shell, a carbon atom can form four covalent bonds with other atoms. Most important, one carbon atom can join to other carbon atoms through highly stable covalent C-C
[Figure 2-15 labels: hydrogen bond, ~0.3 nm long; covalent bond, ~0.1 nm long; donor atom; acceptor atom.]
bonds to form chains and rings and hence generate large and complex molecules with no obvious upper limit to their size (see Panel 2-1, pp. 106-107). The small and large carbon compounds made by cells are called organic molecules.
Certain combinations of atoms, such as the methyl (-CH3), hydroxyl (-OH), carboxyl (-COOH), carbonyl (-C=O), phosphate (-PO3²⁻), sulfhydryl (-SH), and amino (-NH2) groups, occur repeatedly in organic molecules. Each such chemical group has distinct chemical and physical properties that influence the behavior of the molecule in which the group occurs. The most common chemical groups and some of their properties are summarized in Panel 2-1, pp. 106-107.
Cells Contain Four Major Families of Small Organic Molecules
The small organic molecules of the cell are carbon-based compounds that have molecular weights in the range 100-1000 and contain up to 30 or so carbon atoms. They are usually found free in solution and have many different fates. Some are used as monomer subunits to construct the giant polymeric macromolecules (the proteins, nucleic acids, and large polysaccharides) of the cell. Others act as energy sources and are broken down and transformed into other small molecules in a maze of intracellular metabolic pathways. Many small molecules have more than one role in the cell, for example, acting both as a potential subunit for a macromolecule and as an energy source. Small organic molecules are much less abundant than the organic macromolecules, accounting for only about one-tenth of the total mass of organic matter in a cell (Table 2-2). As a rough guess, there may be a thousand different kinds of these small molecules in a typical cell.
All organic molecules are synthesized from and are broken down into the same set of simple compounds. Both their synthesis and their breakdown occur through sequences of limited chemical changes that follow definite rules. As a consequence, the compounds in a cell are chemically related and most can be classified into a few distinct families. Broadly speaking, cells contain four major families of small organic molecules: the sugars, the fatty acids, the amino acids, and the nucleotides (Figure 2-17). Although many compounds present in cells do not fit into these categories, these four families of small organic molecules, together with the macromolecules made by linking them into long chains, account for a large fraction of cell mass (see Table 2-2).
Sugars Provide an Energy Source for Cells and Are the Subunits of Polysaccharides
The simplest sugars, the monosaccharides, are compounds with the general formula (CH2O)n, where n is usually 3, 4, 5, 6, 7, or 8. Sugars, and the molecules made from them, are also called carbohydrates because of this simple formula. Glucose, for example, has the formula C6H12O6 (Figure 2-18). The formula, however, does not fully define the molecule: the same set of carbons, hydrogens, and
Table 2-2 The Types of Molecules That Form a Bacterial Cell
                                                                 Percent of total cell weight   Number of types of each molecule
Water                                                            70                             1
Inorganic ions                                                   1                              20
Sugars and precursors                                            1                              250
Amino acids and precursors                                       0.4                            100
Nucleotides and precursors                                       0.4                            100
Fatty acids and precursors                                       1                              50
Other small molecules                                            0.2                            ~300
Macromolecules (proteins, nucleic acids, and polysaccharides)    26                             ~3000
Figure 2-16 Schematic indicating how two macromolecules with complementary surfaces can bind tightly to one another through noncovalent interactions.
[Figure 2-17 labels: building blocks of the cell → larger units of the cell; legible entries include NUCLEOTIDES, PROTEINS, and NUCLEIC ACIDS.]
oxygens can be joined together by covalent bonds in a variety of ways, creating structures with different shapes. As shown in Panel 2-4 (pp. 112-113), for example, glucose can be converted into a different sugar (mannose or galactose) simply by switching the orientations of specific OH groups relative to the rest of the molecule. Each of these sugars, moreover, can exist in either of two forms, called the D-form and the L-form, which are mirror images of each other. Sets of molecules with the same chemical formula but different structures are called isomers, and the subset of such molecules that are mirror-image pairs are called optical isomers. Isomers are widespread among organic molecules in general, and they play a major part in generating the enormous variety of sugars.
Panel 2-4 presents an outline of sugar structure and chemistry. Sugars can exist as rings or as open chains. In their open-chain form, sugars contain a number of hydroxyl groups and either one aldehyde (-CH=O) or one ketone (>C=O) group. The aldehyde or ketone group plays a special role. First, it can react with a hydroxyl group in the same molecule to convert the molecule into a ring; in the ring form the carbon of the original aldehyde or ketone group can be recognized as the only one that is bonded to two oxygens. Second, once the ring is formed, this same carbon can become further linked, via oxygen, to one of the carbons bearing a hydroxyl group on another sugar molecule. This creates a disaccharide such as sucrose, which is composed of a glucose and a fructose unit. Larger sugar polymers range from the oligosaccharides (trisaccharides, tetrasaccharides, and so on) up to giant polysaccharides, which can contain thousands of monosaccharide units.
The way that sugars are linked together to form polymers illustrates some common features of biochemical bond formation. A bond is formed between an -OH group on one sugar and an -OH group on another by a condensation reaction, in which a molecule of water is expelled as the bond is formed (Figure 2-19). Subunits in other biological polymers, such as nucleic acids and proteins, are also linked by condensation reactions in which water is expelled. The bonds created by all of these condensation reactions can be broken by the reverse process of hydrolysis, in which a molecule of water is consumed (see Figure 2-19).
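The bookkeeping behind the general formula (CH2O)n and the loss of water during condensation can be checked with a short calculation. The sketch below uses rounded atomic weights (C = 12, H = 1, O = 16) as an assumption for simplicity; the function names are illustrative and not part of the text.

```python
from collections import Counter

# Rounded atomic weights, assumed here for simplicity.
ATOMIC_WEIGHT = {"C": 12, "H": 1, "O": 16}
WATER = Counter({"H": 2, "O": 1})


def monosaccharide(n):
    """Atom counts for a sugar with the general formula (CH2O)n."""
    return Counter({"C": n, "H": 2 * n, "O": n})


def condense(sugar_a, sugar_b):
    """Join two sugars in a condensation reaction, expelling one water."""
    product = sugar_a + sugar_b
    product.subtract(WATER)
    return product


def formula(molecule):
    """Write atom counts as a formula string such as C6H12O6."""
    return "".join(f"{el}{molecule[el]}" for el in ("C", "H", "O") if molecule[el])


def weight(molecule):
    """Molecular weight from the rounded atomic weights."""
    return sum(ATOMIC_WEIGHT[el] * count for el, count in molecule.items())


if __name__ == "__main__":
    glucose = monosaccharide(6)
    disaccharide = condense(glucose, glucose)
    print(formula(glucose), weight(glucose))
    print(formula(disaccharide), weight(disaccharide))
```

The disaccharide is lighter than two free glucose molecules by exactly one water molecule; hydrolysis restores that water and regenerates the two monosaccharides.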
Figure 2-17 The four main families of small organic molecules in cells. These small molecules form the monomeric building blocks, or subunits, for most of the macromolecules and other assemblies of the cell. Some, such as the sugars and the fatty acids, are also energy sources.
Figure 2-18 The structure of glucose, a simple sugar. As illustrated previously for water (see Figure 2-12), any molecule can be represented in several ways. In the structural formulas shown in (A), (B) and (C), the atoms are shown as chemical symbols linked together by lines representing the covalent bonds. The thickened lines here are used to indicate the plane of the sugar ring, in an attempt to emphasize that the -H and -OH groups are not in the same plane as the ring. (A) The open-chain form of this sugar, which is in equilibrium with the more stable cyclic or ring form in (B). (C) The chair form is an alternative way to draw the cyclic molecule that reflects the geometry more accurately than the structural formula in (B). (D) A space-filling model, which, as well as depicting the three-dimensional arrangement of the atoms, also uses the van der Waals radii to represent the surface contours of the molecule. (E) A ball-and-stick model in which the three-dimensional arrangement of the atoms in space is shown. (H, white; C, black; O, red; N, blue.)
Figure 2-19 The reaction of two monosaccharides to form a disaccharide. This reaction belongs to a general category of reactions termed condensation reactions, in which two molecules join together as a result of the loss of a water molecule. The reverse reaction (in which water is added) is termed hydrolysis. Note that the reactive carbon at which the new bond is formed (on the monosaccharide on the left here) is the carbon joined to two oxygens as a result of sugar ring formation (see Figure 2-18). As indicated, this common type of covalent bond between two sugar molecules is known as a glycosidic bond (see also Figure 2-20).
[Figure 2-19 labels: monosaccharide + monosaccharide; CONDENSATION (water expelled, H2O); HYDROLYSIS (water consumed, H2O).]
Because each monosaccharide has several free hydroxyl groups that can form a link to another monosaccharide (or to some other compound), sugar polymers can be branched, and the number of possible polysaccharide structures is extremely large. Even a simple disaccharide consisting of two glucose units can exist in eleven different varieties (Figure 2-20), while three different hexoses (C6H12O6) can join together to make several thousand trisaccharides. For this reason it is a much more complex task to determine the arrangement of sugars in a polysaccharide than to determine the nucleotide sequence of a DNA molecule, where each unit is joined to the next in exactly the same way.
The monosaccharide glucose is a key energy source for cells. In a series of reactions, it is broken down to smaller molecules, releasing energy that the cell can harness to do useful work, as we shall explain later. Cells use simple polysaccharides composed only of glucose units (principally glycogen in animals and starch in plants) as energy stores.
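One way to arrive at the count of eleven glucose-glucose disaccharides is to enumerate the possible linkages. The sketch below rests on assumptions that go beyond the text: both glucose units are taken to be six-membered rings, the linking (ring-forming) carbon is C1 in either the α or the β orientation, and the partner hydroxyl sits on C2, C3, C4, or C6, or on the partner's own C1. The naming scheme in the output is illustrative only.

```python
from itertools import combinations_with_replacement, product

# Orientations of the bond at the ring-forming carbon (C1).
ANOMERS = ("alpha", "beta")

# Assumed free hydroxyl positions on the second glucose ring, other than C1.
NON_ANOMERIC_POSITIONS = (2, 3, 4, 6)


def glucose_disaccharide_linkages():
    """List the distinct ways to link two glucose units through C1."""
    linkages = []
    # C1 of one unit (alpha or beta) joined to C2, C3, C4, or C6 of the other.
    for anomer, position in product(ANOMERS, NON_ANOMERIC_POSITIONS):
        linkages.append(f"{anomer}1->{position}")
    # C1 joined to C1: both units are identical, so alpha-beta and
    # beta-alpha describe the same molecule and are counted once.
    for first, second in combinations_with_replacement(ANOMERS, 2):
        linkages.append(f"{first}1->{second}1")
    return linkages


if __name__ == "__main__":
    linkages = glucose_disaccharide_linkages()
    for name in linkages:
        print(name)
    print("total:", len(linkages))
```

Under these assumptions the enumeration gives eight linkages to a non-ring-forming carbon plus three C1-to-C1 linkages, matching the eleven varieties shown in Figure 2-20.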
Figure 2-20 Eleven disaccharides consisting of two D-glucose units. Although these differ only in the type of linkage between the two glucose units, they are chemically distinct. Since the oligosaccharides associated with proteins and lipids may have six or more different kinds of sugar joined in both linear and branched arrangements through glycosidic bonds such as those illustrated here, the number of distinct types of oligosaccharides that can be used in cells is extremely large. For an explanation of α and β linkages, see Panel 2-4 (pp. 112-113). Short black lines ending "blind" indicate OH positions. (Red lines merely indicate disaccharide bond orientations, and "corners" do not imply extra atoms.)
Sugars do not function only in the production and storage of energy. They can also be used, for example, to make mechanical supports. Thus, the most abundant organic chemical on Earth, the cellulose of plant cell walls, is a polysaccharide of glucose. Because the glucose-glucose linkages in cellulose differ from those in starch and glycogen, however, humans cannot digest cellulose and use its glucose. Another extraordinarily abundant organic substance, the chitin of insect exoskeletons and fungal cell walls, is also an indigestible polysaccharide, in this case a linear polymer of a sugar derivative called N-acetylglucosamine (see Panel 2-4). Other polysaccharides are the main components of slime, mucus, and gristle.
Smaller oligosaccharides can be covalently linked to proteins to form glycoproteins and to lipids to form glycolipids, both of which are found in cell membranes. As described in Chapter 10, most cell surfaces are clothed and decorated with glycoproteins and glycolipids in the cell membrane. The sugar side chains on these molecules are often recognized selectively by other cells. And differences between people in the details of their cell-surface sugars are the molecular basis for the different major human blood groups, termed A, B, AB, and O.
FattyAcidsAreComponents of CellMembranes, asWellasa Sourceof Energy A fatty acid molecule, such as palmitic acid,has two chemically distinct regions (Figure 2-21). One is a long hydrocarbon chain, which is hydrophobic and not very reactive chemically. The other is a carboxyl (-COOH) group, which behaves as an acid (carboxylic acid): it is ionized in solution (-COO-), extremely hydrophilic, and chemically reactive.Almost all the fatty acid molecules in a cell are covalently linked to other molecules by their carboxylic acid group. The hydrocarbon tail of palmitic acid is saturated: it has no double bonds between carbon atoms and contains the maximum possible number of hydrogens. Stearic acid, another one of the common fatty acids in animal fat, is also saturated. Some other fatty acids, such as oleic acid, have unsaturatedtails,with one or more double bonds along their length. The double bonds create kinks in the molecules, interfering with their ability to pack together in a solid mass. It is this that accounts for the difference between hard margarine (saturated) and liquid vegetable oils (polyunsaturated). The many different fatty acids found in cells differ only in the length of their hydrocarbon chains and the number and position ofthe carbon-carbon double bonds (seePanel2-5, pp.1l4-ll5). Fatty acids are stored in the cytoplasm of many cells in the form of droplets of triacylglycerol molecules, which consist of three fatty acid chains joined to a glycerol molecule (seePanel 2-5); these molecules are the animal fats found in meat, butter, and cream, and the plant oils such as corn oil and olive oil. \Mhen required to provide energy, the fatty acid chains are released from triacylglycerols and broken dor,rminto two-carbon units. 
These two-carbon units are identical to those derived from the breakdown of glucose, and they enter the same energy-yielding reaction pathways, as will be described later in this chapter. Triacylglycerols (also called triglycerides) serve as a concentrated food reserve in cells, because they can be broken down to produce about six times as much usable energy, weight for weight, as glucose.
Fatty acids and their derivatives such as triacylglycerols are examples of lipids. Lipids comprise a loosely defined collection of biological molecules that are insoluble in water, while being soluble in fat and organic solvents such as benzene. They typically contain either long hydrocarbon chains, as in the fatty acids and isoprenes, or multiple linked rings, as in the steroids.
The most important function of fatty acids in cells is in the construction of cell membranes. These thin sheets enclose all cells and surround their internal organelles. They are composed largely of phospholipids, which are small molecules that, like triacylglycerols, are constructed mainly from fatty acids and glycerol. In phospholipids the glycerol is joined to two fatty acid chains, however, rather than to three as in triacylglycerols. The "third" site on the glycerol is linked to a hydrophilic phosphate group, which is in turn attached to a small hydrophilic compound such as choline (see Panel 2-5). Each phospholipid
Figure 2-21 A fatty acid. A fatty acid is composed of a hydrophobic hydrocarbon chain to which is attached a hydrophilic carboxylic acid group. Palmitic acid is shown here. Different fatty acids have different hydrocarbon tails. (A) Structural formula. The carboxylic acid group is shown in its ionized form. (B) Ball-and-stick model. (C) Space-filling model. (Figure labels: hydrophilic carboxylic acid head; hydrophobic hydrocarbon tail.)
THE CHEMICAL COMPONENTS OF A CELL
Figure 2-22 Phospholipid structure and the orientation of phospholipids in membranes. In an aqueous environment, the hydrophobic tails of phospholipids pack together to exclude water. Here they have formed a bilayer with the hydrophilic head of each phospholipid facing the water. Lipid bilayers are the basis for cell membranes, as discussed in detail in Chapter 10. (Figure labels: two hydrophobic fatty acid tails; phospholipid molecule.)
molecule, therefore, has a hydrophobic tail composed of the two fatty acid chains and a hydrophilic head, where the phosphate is located. This gives phospholipids different physical and chemical properties from triacylglycerols, which are predominantly hydrophobic. Molecules such as phospholipids, with both hydrophobic and hydrophilic regions, are termed amphiphilic.
The membrane-forming property of phospholipids results from their amphiphilic nature. Phospholipids will spread over the surface of water to form a monolayer of phospholipid molecules, with the hydrophobic tails facing the air and the hydrophilic heads in contact with the water. Two such molecular layers can readily combine tail-to-tail in water to make a phospholipid sandwich, or lipid bilayer. This bilayer is the structural basis of all cell membranes (Figure 2-22).
Amino Acids Are the Subunits of Proteins
Amino acids are a varied class of molecules with one defining property: they all possess a carboxylic acid group and an amino group, both linked to a single carbon atom called the α-carbon (Figure 2-23). Their chemical variety comes from the side chain that is also attached to the α-carbon. The importance of amino acids to the cell comes from their role in making proteins, which are polymers of amino acids joined head-to-tail in a long chain that is then folded into a three-dimensional structure unique to each type of protein. The covalent linkage between two adjacent amino acids in a protein chain forms an amide (see Panel 2-1), and it is called a peptide bond; the chain of amino acids is also known as a polypeptide (Figure 2-24). Regardless of the specific amino acids from which it is made, the polypeptide has an amino (NH2) group at one end (its N-terminus) and a carboxyl (COOH) group at its other end (its C-terminus). This gives it a definite directionality: a structural (as opposed to an electrical) polarity.
Each of the 20 amino acids found commonly in proteins has a different side chain attached to the α-carbon atom (see Panel 3-1, pp. 128-129). All organisms,
Figure 2-23 The amino acid alanine. (A) In the cell, where the pH is close to 7, the free amino acid exists in its ionized form; but when it is incorporated into a polypeptide chain, the charges on the amino and carboxyl groups disappear. (B) A ball-and-stick model and (C) a space-filling model of alanine (H, white; C, black; O, red; N, blue). (Figure labels: amino group; carboxyl group; side chain (R); nonionized form; ionized form.)
Chapter 2: Cell Chemistry and Biosynthesis
Figure 2-24 A small part of a protein molecule. The four amino acids shown are linked together by three peptide bonds, one of which is highlighted in yellow. One of the amino acids is shaded in gray. The amino acid side chains are shown in red. The two ends of a polypeptide chain are chemically distinct. One end, the N-terminus, terminates in an amino group, and the other, the C-terminus, in a carboxyl group. The sequence is always read from the N-terminal end; hence this sequence is Phe-Ser-Glu-Lys.
whether bacteria, archaea, plants, or animals, have proteins made of the same 20 amino acids. How this precise set of 20 came to be chosen is one of the mysteries of the evolution of life; there is no obvious chemical reason why other amino acids could not have served just as well. But once the choice was established, it could not be changed; too much depended on it.
Like sugars, all amino acids, except glycine, exist as optical isomers in D- and L-forms (see Panel 3-1). But only L-forms are ever found in proteins (although D-amino acids occur as part of bacterial cell walls and in some antibiotics). The origin of this exclusive use of L-amino acids to make proteins is another evolutionary mystery.
The chemical versatility of the 20 amino acids is essential to the function of proteins. Five of the 20 amino acids have side chains that can form ions in neutral aqueous solution and thereby can carry a charge (Figure 2-25). The others are uncharged; some are polar and hydrophilic, and some are nonpolar and hydrophobic. As we discuss in Chapter 3, the properties of the amino acid side chains underlie the diverse and sophisticated functions of proteins.
Figure 2-25 The charge on amino acid side chains depends on the pH. The five different side chains that can carry a charge are shown. Carboxylic acids can readily lose H+ in aqueous solution to form a negatively charged ion, which is denoted by the suffix "-ate," as in aspartate or glutamate. A comparable situation exists for amines, which in aqueous solution can take up H+ to form a positively charged ion (which does not have a special name). These reactions are rapidly reversible, and the amounts of the two forms, charged and uncharged, depend on the pH of the solution. At a high pH, carboxylic acids tend to be charged and amines uncharged. At a low pH, the opposite is true: the carboxylic acids are uncharged and amines are charged. The pH at which exactly half of the carboxylic acid or amine residues are charged is known as the pK of that amino acid side chain (indicated by yellow stripe). In the cell the pH is close to 7, and almost all carboxylic acids and amines are in their fully charged form. (Values shown in the figure: aspartic acid, pK ~4.7; glutamic acid, pK ~4.7; histidine, pK ~6.5; lysine, pK ~10.2; arginine, pK ~12.)
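The caption's definition of pK (the pH at which exactly half of the residues are charged) can be turned into a small calculation. The sketch below is illustrative only: it assumes the standard idealized acid-base equilibrium relation, which the text does not state explicitly, and it uses the approximate pK values read from the figure. The function and variable names are hypothetical.

```python
# Sketch: fraction of side chains that carry a charge at a given pH.
# Assumption: an ideal acid-base equilibrium; pK values are the approximate
# ones shown in Figure 2-25.

def fraction_charged(pK, pH, kind):
    """Return the charged fraction of a side chain population.

    kind="acid": the group is charged after losing H+ (carboxylic acids).
    kind="base": the group is charged after taking up H+ (amines).
    """
    if kind == "acid":
        return 1.0 / (1.0 + 10.0 ** (pK - pH))
    if kind == "base":
        return 1.0 / (1.0 + 10.0 ** (pH - pK))
    raise ValueError("kind must be 'acid' or 'base'")

SIDE_CHAINS = {
    "aspartic acid": (4.7, "acid"),
    "glutamic acid": (4.7, "acid"),
    "histidine": (6.5, "base"),
    "lysine": (10.2, "base"),
    "arginine": (12.0, "base"),
}

# Charged fraction of each side chain at the near-neutral pH of the cell.
for name, (pK, kind) in SIDE_CHAINS.items():
    print(f"{name:14s} pK {pK:5.1f}  charged fraction at pH 7: "
          f"{fraction_charged(pK, 7.0, kind):.3f}")
```

Under these assumptions the side chains with a pK far from 7 come out almost fully charged, whereas a side chain whose pK lies close to 7, such as histidine, is only partly charged, so its charge state is the most sensitive to small changes in pH.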
Figure 2-26 Chemical structure of adenosine triphosphate (ATP). (A) Structural formula. (B) Space-filling model. In (B) the colors of the atoms are C, black; N, blue; H, white; O, red; and P, yellow.
Nucleotides Are the Subunits of DNA and RNA
A nucleotide is a molecule made up of a nitrogen-containing ring compound linked to a five-carbon sugar, which in turn carries one or more phosphate groups (Panel 2-6, pp. 116-117). The five-carbon sugar can be either ribose or deoxyribose. Nucleotides containing ribose are known as ribonucleotides, and those containing deoxyribose as deoxyribonucleotides. The nitrogen-containing rings are generally referred to as bases for historical reasons: under acidic conditions they can each bind an H+ (proton) and thereby increase the concentration of OH- ions in aqueous solution. There is a strong family resemblance between the different bases. Cytosine (C), thymine (T), and uracil (U) are called pyrimidines because they all derive from a six-membered pyrimidine ring; guanine (G) and adenine (A) are purine compounds, and they have a second, five-membered ring fused to the six-membered ring. Each nucleotide is named for the base it contains (see Panel 2-6).
Nucleotides can act as short-term carriers of chemical energy. Above all others, the ribonucleotide adenosine triphosphate, or ATP (Figure 2-26), transfers energy in hundreds of different cell reactions. ATP is formed through reactions that are driven by the energy released by the oxidative breakdown of foodstuffs. Its three phosphates are linked in series by two phosphoanhydride bonds, whose rupture releases large amounts of useful energy. The terminal phosphate group in particular is frequently split off by hydrolysis, often transferring a phosphate to other molecules and releasing energy that drives energy-requiring biosynthetic reactions (Figure 2-27). Other nucleotide derivatives are carriers for the transfer of other chemical groups, as will be described later.
The most fundamental role of nucleotides in the cell, however, is in the storage and retrieval of biological information.
Nucleotides serve as building blocks for the construction of nucleic acids: long polymers in which nucleotide subunits are covalently linked by the formation of a phosphodiester bond between the
Figure 2-27 The ATP molecule serves as an energy carrier in cells. The energy-requiring formation of ATP from ADP and inorganic phosphate is coupled to the energy-yielding oxidation of foodstuffs (in animal cells, fungi, and some bacteria) or to the capture of light energy (in plant cells and some bacteria). The hydrolysis of this ATP back to ADP and inorganic phosphate in turn provides the energy to drive many cell reactions. (Figure labels: phosphoanhydride bonds; energy from sunlight or from food; H2O; inorganic phosphate (Pi); energy available for cellular work and for chemical synthesis.)
Figure 2-28 A small part of one chain of a deoxyribonucleic acid (DNA) molecule. Four nucleotides are shown. One of the phosphodiester bonds that links adjacent nucleotide residues is highlighted in yellow, and one of the nucleotides is shaded in gray. Nucleotides are linked together by a phosphodiester linkage between specific carbon atoms of the ribose, known as the 5' and 3' atoms. For this reason, one end of a polynucleotide chain, the 5' end, will have a free phosphate group and the other, the 3' end, a free hydroxyl group. The linear sequence of nucleotides in a polynucleotide chain is commonly abbreviated by a one-letter code, and the sequence is always read from the 5' end. In the example illustrated the sequence is G-A-T-C.
phosphate group attached to the sugar of one nucleotide and a hydroxyl group on the sugar of the next nucleotide (Figure 2-28). Nucleic acid chains are synthesized from energy-rich nucleoside triphosphates by a condensation reaction that releases inorganic pyrophosphate during phosphodiester bond formation.
There are two main types of nucleic acids, differing in the type of sugar in their sugar-phosphate backbone. Those based on the sugar ribose are known as ribonucleic acids, or RNA, and normally contain the bases A, G, C, and U. Those based on deoxyribose (in which the hydroxyl at the 2' position of the ribose carbon ring is replaced by a hydrogen) are known as deoxyribonucleic acids, or DNA, and contain the bases A, G, C, and T (T is chemically similar to the U in RNA, merely adding the methyl group on the pyrimidine ring; see Panel 2-6). RNA usually occurs in cells as a single polynucleotide chain, but DNA is virtually always a double-stranded molecule: a DNA double helix composed of two polynucleotide chains running antiparallel to each other and held together by hydrogen-bonding between the bases of the two chains.
The linear sequence of nucleotides in a DNA or an RNA encodes the genetic information of the cell. The ability of the bases in different nucleic acid molecules to recognize and pair with each other by hydrogen-bonding (called base-pairing), G with C and A with either T or U, underlies all of heredity and evolution, as explained in Chapter 4.
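The pairing rules and the antiparallel arrangement of the two DNA chains can be illustrated with a short sketch. It is not a description of any cellular mechanism: it simply applies the rules stated above (G pairs with C, A pairs with T, sequences are written from the 5' end) to the one-letter code. The function names are hypothetical.

```python
# Sketch: deriving the partner strand of a DNA sequence written 5' to 3'.
# Assumption: sequences contain only the letters A, G, C, and T.

DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the base-paired partner strand, also written 5' to 3'.

    Because the two chains run antiparallel, the partner is read in the
    opposite direction, so the sequence is reversed as well as complemented.
    """
    return "".join(DNA_PAIRS[base] for base in reversed(seq))

def to_rna_alphabet(seq):
    """Write the same sequence with the RNA base U in place of T."""
    return seq.replace("T", "U")

example = "GATC"  # the sequence illustrated in Figure 2-28
print(example, "pairs with", reverse_complement(example))
print(example, "written as RNA:", to_rna_alphabet(example))
```

The example sequence G-A-T-C happens to read the same on both strands when each is written from its own 5' end; most sequences do not have this property.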
The Chemistry of Cells Is Dominated by Macromolecules with Remarkable Properties
By weight, macromolecules are the most abundant carbon-containing molecules in a living cell (Figure 2-29 and Table 2-3). They are the principal building blocks from which a cell is constructed and also the components that confer the most distinctive properties of living things. The macromolecules in cells are polymers that are constructed by covalently linking small organic molecules (called monomers) into long chains (Figure 2-30). Yet they have remarkable properties that could not have been predicted from their simple constituents.
Proteins are especially abundant and versatile. They perform thousands of distinct functions in cells. Many proteins serve as enzymes, the catalysts that
Figure 2-29 Macromolecules are abundant in cells. The approximate composition of a bacterial cell is shown by weight. The composition of an animal cell is similar (see Table 2-3). (Values shown in the figure: H2O, 70%; proteins, 15%; RNA, 6%; ions and small molecules, 4%; phospholipids, 2%; polysaccharides, 2%; DNA, 1%.)
Table 2-3 Approximate Chemical Compositions of a Typical Bacterium and a Typical Mammalian Cell

Component                                          Bacterium         Mammalian cell
                                                   (percent of total cell weight)
H2O                                                70                70
Inorganic ions (Na+, K+, Mg2+, Ca2+, Cl-, etc.)    1                 1
Miscellaneous small metabolites                    3                 3
Proteins                                           15                18
RNA                                                6                 1.1
DNA                                                1                 0.25
Phospholipids                                      2                 3
Other lipids                                       -                 2
Polysaccharides                                    2                 2
Total cell volume                                  2 x 10^-12 cm3    4 x 10^-9 cm3
Relative cell volume                               1                 2000

Proteins, polysaccharides, DNA, and RNA are macromolecules. Lipids are not generally classed as ... both mammalian and bacterial cells.
bles to make the cell's long microtubules, or histones, proteins that compact the DNA in chromosomes. Yet other proteins act as molecular motors to produce force and movement, as in the case of myosin in muscle. Proteins perform many other functions, and we shall examine the molecular basis for many of them later in this book. Here we identify some general principles of macromolecular chemistry that make such functions possible.
Although the chemical reactions for adding subunits to each polymer are different in detail for proteins, nucleic acids, and polysaccharides, they share important features. Each polymer grows by the addition of a monomer onto the end of a growing polymer chain in a condensation reaction, in which a
made from a set of monomers that are slightly different from one another (for example, the 20 different amino acids from which proteins are made). It is critical to life that the polymer chain is not assembled at random from these subunits; instead the subunits are added in a particular order, or sequence. The elaborate mechanisms that allow this to be accomplished by enzymes are described in detail in Chapters 5 and 6.
Noncovalent Bonds Specify Both the Precise Shape of a Macromolecule and Its Binding to Other Molecules
Most of the covalent bonds in a macromolecule allow rotation of the atoms they join, giving the polymer chain great flexibility. In principle, this allows a macromolecule to adopt an almost unlimited number of shapes, or conformations, as
Figure 2-30 Three families of macromolecules. Each is a polymer formed from small molecules (called monomers) linked together by covalent bonds. (Figure labels, subunit and macromolecule: sugar and polysaccharide; amino acid and protein; nucleotide and nucleic acid.)
Figure 2-31 Most proteins and many RNA molecules fold into only one stable conformation. If the noncovalent bonds maintaining this stable conformation are disrupted, the molecule becomes a flexible chain that usually has no biological value. (Figure labels: many unstable conformations; one stable folded conformation.)
random thermal energy causes the polymer chain to writhe and rotate. However, the shapes of most biological macromolecules are highly constrained because of the many weak noncovalent bonds that form between different parts of the same molecule. If these noncovalent bonds are formed in sufficient numbers, the polymer chain can strongly prefer one particular conformation, determined by the linear sequence of monomers in its chain. Most protein molecules and many of the small RNA molecules found in cells fold tightly into one highly preferred conformation in this way (Figure 2-31).
The four types of noncovalent interactions important in biological molecules were described earlier, and they are reviewed in Panel 2-3 (pp. 110-111). Although individually very weak, these interactions cooperate to fold biological macromolecules into unique shapes. In addition, they can also add up to create a strong attraction between two different molecules when these molecules fit together very closely, like a hand in a glove. This form of molecular interaction provides for great specificity, inasmuch as the multipoint contacts required for strong binding make it possible for a macromolecule to select out, through binding, just one of the many thousands of other types of molecules present inside a cell. Moreover, because the strength of the binding depends on the number of noncovalent bonds that are formed, interactions of almost any affinity are possible, allowing rapid dissociation when necessary.
Binding of this type underlies all biological catalysis, making it possible for proteins to function as enzymes. Noncovalent interactions also allow macromolecules to be used as building blocks for the formation of larger structures. In cells, macromolecules often bind together into large complexes, thereby forming intricate machines with multiple moving parts that perform such complex tasks as DNA replication and protein synthesis (Figure 2-32).
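The statement that binding strength depends on the number of noncovalent bonds can be made semi-quantitative with a rough sketch. Everything numerical here is an assumption made for illustration: each weak bond is taken to contribute the same favorable free energy (a hypothetical 1 kcal/mole), the contributions are treated as simply additive, and the standard thermodynamic relation between free energy and equilibrium constant is used. Real interactions are not this tidy.

```python
import math

# Sketch: how many weak bonds add up to a strong, specific interaction.
# Assumptions (not from the text): equal, additive contributions per bond
# and the relation K = exp(-deltaG / RT).

R_KCAL = 1.987e-3  # gas constant in kcal per mole per kelvin

def equilibrium_constant(n_bonds, energy_per_bond_kcal=1.0, temp_kelvin=310.0):
    """Relative equilibrium constant for binding through n_bonds weak bonds."""
    favorable_energy = n_bonds * energy_per_bond_kcal
    return math.exp(favorable_energy / (R_KCAL * temp_kelvin))

# Each additional bond multiplies the constant by the same factor, so a
# modest number of bonds spans a very wide range of affinities.
for n in (1, 5, 10, 20):
    print(f"{n:2d} bonds -> relative equilibrium constant {equilibrium_constant(n):.3g}")
```

Because the dependence is exponential, adding or removing a few contacts changes the affinity by orders of magnitude, which is one way to picture how "interactions of almost any affinity" can arise from the same kinds of weak bonds.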
Figure 2-32 Small molecules, proteins, and a ribosome drawn approximately to scale. Ribosomes are a central part of the machinery that the cell uses to make proteins: each ribosome is formed as a complex of about 90 macromolecules (protein and RNA molecules). (Figure labels: subunits, e.g., sugars, amino acids, and nucleotides; covalent bonds; macromolecules, e.g., globular proteins and RNA; noncovalent bonds; macromolecular assemblies, e.g., ribosome; scale bar, 30 nm.)
Summary
Living organisms are autonomous, self-propagating chemical systems. They are made from a distinctive and restricted set of small carbon-based molecules that are essentially the same for every living species. Each of these molecules is composed of a small set of atoms linked to each other in a precise configuration through covalent bonds. The main categories are sugars, fatty acids, amino acids, and nucleotides. Sugars are a primary source of chemical energy for cells and can be incorporated into polysaccharides for energy storage. Fatty acids are also important for energy storage, but their most critical function is in the formation of cell membranes. Polymers consisting of amino acids constitute the remarkably diverse and versatile macromolecules known as proteins. Nucleotides play a central part in energy transfer. They are also the subunits for the informational macromolecules, RNA and DNA.
Most of the dry mass of a cell consists of macromolecules that have been produced as linear polymers of amino acids (proteins) or nucleotides (DNA and RNA), covalently linked to each other in an exact order. Most of the protein molecules and many of the RNAs fold into a unique conformation that depends on their sequence of subunits. This folding process creates unique surfaces, and it depends on a large set of weak attractions produced by noncovalent forces between atoms. These forces are of four types: electrostatic attractions, hydrogen bonds, van der Waals attractions, and an interaction between nonpolar groups caused by their hydrophobic expulsion from water. The same set of weak forces governs the specific binding of other molecules to macromolecules, making possible the myriad associations between biological molecules that produce the structure and the chemistry of a cell.
CATALYSIS AND THE USE OF ENERGY BY CELLS
One property of living things above all makes them seem almost miraculously different from nonliving matter: they create and maintain order, in a universe that is tending always to greater disorder (Figure 2-33). To create this order, the cells in a living organism must perform a never-ending stream of chemical reactions. In some of these reactions, small organic molecules (amino acids, sugars, nucleotides, and lipids) are being taken apart or modified to supply the many other small molecules that the cell requires. In other reactions, these small molecules are being used to construct an enormously diverse range of proteins, nucleic acids, and other macromolecules that endow living systems with all of their most distinctive properties. Each cell can be viewed as a tiny chemical factory performing many millions of reactions every second.
Figure 2-33 Order in biological structures. Well-defined, ornate, and beautiful spatial patterns can be found at every level of organization in living organisms. In order of increasing size: (A) protein molecules in the coat of a virus; (B) the regular array of microtubules seen in a cross section of a sperm tail; (C) surface contours of a pollen grain (a single cell); (D) close-up of the wing of a butterfly showing the pattern created by scales, each scale being the product of a single cell; (E) spiral array of seeds, made of millions of cells, in the head of a sunflower. (A, courtesy of R.A. Grant and J.M. Hogle; B, courtesy of Lewis Tilney; C, courtesy of Colin MacFarlane and Chris Jeffree; D and E, courtesy of Kjell B. Sandved.)
Figure 2-34 How a set of enzyme-catalyzed reactions generates a metabolic pathway. Each enzyme catalyzes a particular chemical reaction, leaving the enzyme unchanged. In this example, a set of enzymes acting in series converts molecule A to molecule F, forming a metabolic pathway. (Figure labels: molecules A to F; catalysis by enzymes 1 to 5; abbreviated form shown as a row of circles.)
Cell Metabolism Is Organized by Enzymes
The chemical reactions that a cell carries out would normally occur only at much higher temperatures than those existing inside cells. For this reason, each reaction requires a specific boost in chemical reactivity. This requirement is crucial, because it allows the cell to control each reaction. The control is exerted through the specialized proteins called enzymes, each of which accelerates, or catalyzes, just one of the many possible kinds of reactions that a particular molecule might undergo. Enzyme-catalyzed reactions are usually connected in series, so that the product of one reaction becomes the starting material, or substrate, for the next (Figure 2-34). These long linear reaction pathways are in turn linked to one another, forming a maze of interconnected reactions that enable the cell to survive, grow, and reproduce (Figure 2-35).
Two opposing streams of chemical reactions occur in cells: (1) the catabolic pathways break down foodstuffs into smaller molecules, thereby generating both a useful form of energy for the cell and some of the small molecules that the cell needs as building blocks, and (2) the anabolic, or biosynthetic, pathways use the energy harnessed by catabolism to drive the synthesis of the many other molecules that form the cell. Together these two sets of reactions constitute the metabolism of the cell (Figure 2-36).
Many of the details of cell metabolism form the traditional subject of biochemistry and need not concern us here. But the general principles by which cells obtain energy from their environment and use it to create order are central to cell biology. We begin with a discussion of why a constant input of energy is needed to sustain living organisms.
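The idea that the product of one enzyme-catalyzed reaction becomes the substrate for the next can be pictured as a simple data structure. The sketch below models only the schematic pathway of Figure 2-34 (molecule A converted to molecule F by enzymes 1 to 5); the representation and the function name are hypothetical, and no real enzymes or kinetics are implied.

```python
# Sketch: a linear metabolic pathway as an ordered list of steps.
# Each step is (enzyme, substrate, product), following the schematic
# A -> F example of Figure 2-34.

PATHWAY = [
    ("enzyme 1", "A", "B"),
    ("enzyme 2", "B", "C"),
    ("enzyme 3", "C", "D"),
    ("enzyme 4", "D", "E"),
    ("enzyme 5", "E", "F"),
]

def run_pathway(start, steps):
    """Follow a molecule through the steps and return every intermediate.

    Each enzyme acts only on its own substrate, mirroring the specificity
    described in the text; a mismatch raises an error.
    """
    current = start
    trail = [current]
    for enzyme, substrate, product in steps:
        if substrate != current:
            raise ValueError(f"{enzyme} cannot act on molecule {current}")
        current = product
        trail.append(current)
    return trail

print(" -> ".join(run_pathway("A", PATHWAY)))
```

Linking several such lists through shared molecules would give the kind of interconnected network sketched in Figure 2-35.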
Biological Order Is Made Possible by the Release of Heat Energy from Cells
The universal tendency of things to become disordered is a fundamental law of physics, the second law of thermodynamics, which states that in the universe, or in any isolated system (a collection of matter that is completely isolated from the rest of the universe), the degree of disorder only increases. This law has such profound implications for all living things that we restate it in several ways.
For example, we can present the second law in terms of probability and state that systems will change spontaneously toward those arrangements that have the greatest probability. If we consider, for example, a box of 100 coins all lying heads up, a series of accidents that disturbs the box will tend to move the arrangement toward a mixture of 50 heads and 50 tails. The reason is simple: there is a huge number of possible arrangements of the individual coins in the mixture that can achieve the 50-50 result, but only one possible arrangement that keeps all of the coins oriented heads up. Because the 50-50 mixture is therefore the most probable, we say that it is more "disordered." For the same reason,
Figure 2-35 Some of the metabolic pathways and their interconnections in a typical cell. About 500 common metabolic reactions are shown diagrammatically, with each molecule in a metabolic pathway represented by a filled circle, as in the yellow box in Figure 2-34. The pathway that is highlighted in this diagram with larger circles and connecting lines is the central pathway of sugar metabolism, which will be discussed shortly.
Figure 2-36 Schematic representation of the relationship between catabolic and anabolic pathways in metabolism. As suggested here, since a major portion of the energy stored in the chemical bonds of food molecules is dissipated as heat, the mass of food required by any organism that derives all of its energy from catabolism is much greater than the mass of the molecules that can be produced by anabolism. (Figure labels: food molecules; catabolic pathways; useful forms of energy + lost heat; the many building blocks for biosynthesis; the many molecules that form the cell.)
it is a common experience that one's living space will become increasingly disordered without intentional effort: the movement toward disorder is a spontaneous process, requiring a periodic effort to reverse it (Figure 2-37).
The amount of disorder in a system can be quantified and expressed as the entropy of the system: the greater the disorder, the greater the entropy. Thus, another way to express the second law of thermodynamics is to say that systems will change spontaneously toward arrangements with greater entropy.
Living cells, by surviving, growing, and forming complex organisms, are generating order and thus might appear to defy the second law of thermodynamics. How is this possible? The answer is that a cell is not an isolated system: it takes in energy from its environment in the form of food, or as photons from the sun (or even, as in some chemosynthetic bacteria, from inorganic molecules alone), and it then uses this energy to generate order within itself. In the course of the chemical reactions that generate order, the cell converts part of the energy it uses into heat. The heat is discharged into the cell's environment and disorders it, so that the total entropy (that of the cell plus its surroundings) increases, as demanded by the laws of thermodynamics.
To understand the principles governing these energy conversions, think of a cell surrounded by a sea of matter representing the rest of the universe. As the cell lives and grows, it creates internal order. But it constantly releases heat energy as it synthesizes molecules and assembles them into cell structures. Heat is energy in its most disordered form: the random jostling of molecules. When the cell releases heat to the sea, it increases the intensity of molecular motions there (thermal motion), thereby increasing the randomness, or disorder, of the sea. The second law of thermodynamics is satisfied because the increase in the amount of order inside the cell is more than compensated for by an even greater decrease in order (increase in entropy) in the surrounding sea of matter (Figure 2-38).
Where does the heat that the cell releases come from?
Here we encounter another important law of thermodynamics. The first law of thermodynamics states that energy can be converted from one form to another, but that it cannot
Figure 2-37 An everyday illustration of the spontaneous drive toward disorder. Reversing this tendency toward disorder requires an intentional effort and an input of energy: it is not spontaneous. In fact, from the second law of thermodynamics, we can be certain that the human intervention required will release enough heat to the environment to more than compensate for the reordering of the items in this room. (Figure labels: "spontaneous" reaction as time elapses; organized effort requiring energy input.)
Figure 2-38 A simple thermodynamic analysis of a living cell. In the diagram on the left the molecules of both the cell and the rest of the universe (the sea of matter) are depicted in a relatively disordered state. In the diagram on the right the cell has taken in energy from food molecules and released heat by a reaction that orders the molecules the cell contains. Because the heat increases the disorder in the environment around the cell (depicted by the jagged arrows and distorted molecules, indicating the increased molecular motions caused by heat), the second law of thermodynamics, which states that the amount of disorder in the universe must always increase, is satisfied as the cell grows and divides. For a detailed discussion, see Panel 2-7 (pp. 118-119). (Figure labels: sea of matter; increased disorder; increased order.)
be created or destroyed. Figure 2-39 illustrates some interconversions between different forms of energy. The amount of energy in different forms will change as a result of the chemical reactions inside the cell, but the first law tells us that the total amount of energy must always be the same. For example, an animal cell takes in foodstuffs and converts some of the energy present in the chemical bonds between the atoms of these food molecules (chemical bond energy) into the random thermal motion of molecules (heat energy). As described above, this conversion of chemical energy into heat energy is essential if the reactions that create order inside the cell are to cause the universe as a whole to become more disordered.
The cell cannot derive any benefit from the heat energy it releases unless the heat-generating reactions inside the cell are directly linked to the processes that generate molecular order. It is the tight coupling of heat production to an increase in order that distinguishes the metabolism of a cell from the wasteful burning of fuel in a fire. Later, we shall illustrate how this coupling occurs. For now it is sufficient to recognize that a direct linkage of the "burning" of food molecules to the generation of biological order is required for cells to create and maintain an island of order in a universe tending toward chaos.
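The box-of-coins illustration of the second law given earlier in this section can be checked by direct counting. The sketch below assumes that, once the box has been thoroughly disturbed, every coin is equally likely to lie heads or tails independently of the others; that assumption and the function names are ours, not the text's.

```python
import math

# Sketch: counting arrangements of 100 coins.
# Assumption: each coin independently lands heads or tails with equal chance.

def arrangements(n_coins, n_heads):
    """Number of distinct ways to choose which coins show heads."""
    return math.comb(n_coins, n_heads)

def probability(n_coins, n_heads):
    """Chance of seeing exactly n_heads heads under the assumption above."""
    return arrangements(n_coins, n_heads) / 2 ** n_coins

N = 100
all_heads = arrangements(N, N)        # the single fully ordered arrangement
half_heads = arrangements(N, N // 2)  # the 50-50 mixture

print("arrangements with 100 heads:", all_heads)
print("arrangements with 50 heads: ", half_heads)
print("log10 of the ratio:", math.log10(half_heads / all_heads))
```

The logarithm of the number of arrangements is a convenient measure of disorder in this toy system: it is zero for the all-heads box and largest for the 50-50 mixture, which matches the qualitative statement that systems drift toward the arrangements with the greatest probability.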
Photosynthetic Organisms Use Sunlight to Synthesize Organic Molecules
All animals live on energy stored in the chemical bonds of organic molecules made by other organisms, which they take in as food. The molecules in food also provide the atoms that animals need to construct new living matter. Some animals obtain their food by eating other animals. But at the bottom of the animal food chain are animals that eat plants. The plants, in turn, trap energy directly from sunlight. As a result, the sun is the ultimate source of the energy used by animal cells.
Solar energy enters the living world through photosynthesis in plants and photosynthetic bacteria. Photosynthesis converts the electromagnetic energy in sunlight into chemical bond energy in the cell. Plants obtain all the atoms they need from inorganic sources: carbon from atmospheric carbon dioxide, hydrogen and oxygen from water, nitrogen from ammonia and nitrates in the
, :,
CATALYSIS ANDTHEUSEOF ENERGY BYCELLS f a l l i n g b r i c kh a s kinetic energy
r a i s e db r i c k h a sp o t e n t i a l e n e r g yo u e to pull of gravity
/'/'
\
heat isreleased w h e n b r i c kh i t s the floor
,,iri:iitirililrliii
1
potential energy due to position--+
.T
kinetic energy
. /c, 3G \\&",\)
\\
ll
a&
two hydrogen oxygen gas g a sm o l e c u l e s m o l e c u l e
\r*
r,
c h e m i c abl o n d e n e r g y
4
heat dispersedto s ur r o u n di n g s
heat energy
liSli.ilr1';'#]tr
kinetic energy
electricalenergy
+
-
sunlight
u,.' j; G&'
r a p i dv i b r a t i o n sa n d rotations of two newly f o r m e dw a t e r m o l e c u l e s
chemicat bondenersy inH2and02 +
3
heat energy
chlorophyll molecule
electromagnetic(light) energ! --+
c h l o r o p h y lm l olecule in excitedstate
high energy electrons --+
photosynthesis
chemicalbond energy
soil, and other elements needed in smaller amounts from inorganic salts in the soil. They use the energy they derive from sunlight to build these atoms into sugars, amino acids, nucleotides, and fatty acids. These small molecules in turn are converted into the proteins, nucleic acids, polysaccharides, and lipids that form the plant. All of these substances serve as food molecules for animals, if the plants are later eaten. The reactions of photosynthesis take place in two stages (Figure 2-4O).ln the first stage, energy from sunlight is captured and transiently stored as chemical bond energy in specializedsmall molecules that act as carriers of energy and reactive chemical groups. (We discussthese "activated carrier" molecules later.) Molecular oxygen (Oz gas) derived from the splitting of water by light is released as a waste product of this first stage. In the second stage,the molecules that serve as energy carriers are used to help drive a carbonfixallon process in which sugars are manufactured from carbon dioxide Bas (COz) and water (HzO), thereby providing a useful source of stored chemical bond energy and materials-both for the plant itself and for any animals that eat it. We describe the elegant mechanisms that underlie these two stagesofphotosynthesis in Chapter 14.
69
Figure2-39 Someinterconversions between different forms of energy. All energyformsare,in principle, interconvertible. ln all theseorocesses the total amountof energyis conserved. Thus,for example,from the heightand weightof the brickin (1),we can predict exactlyhow much heatwill be released when it hitsthe floor.In (2),notethat the largeamountof chemicalbond energy released when wateris formedis initially convertedto very rapidthermal motions in the two new watermolecules; but with other molecules almost collisions instantaneously spreadthis kinetic energyevenlythroughoutthe (heattransfer), makingthe surroundings from all new moleculesindistinguishable the rest.
[Figure 2-40 diagram labels: capture of light energy; H₂O + CO₂; energy carriers; sugar; heat (twice); Stage 1; Stage 2.]
Figure 2-40 Photosynthesis. The two stages of photosynthesis. The energy carriers created in the first stage are two molecules that we discuss shortly: ATP and NADPH.
The net result of the entire process of photosynthesis, so far as the green plant is concerned, can be summarized simply in the equation
light energy + CO₂ + H₂O → sugars + O₂ + heat energy
The sugars produced are then used both as a source of chemical bond energy and as a source of materials to make the many other small and large organic molecules that are essential to the plant cell.
Cells Obtain Energy by the Oxidation of Organic Molecules
All animal and plant cells are powered by energy stored in the chemical bonds of organic molecules, whether they are sugars that a plant has photosynthesized as food for itself or the mixture of large and small molecules that an animal has eaten. Organisms must extract this energy in usable form to live, grow, and reproduce. In both plants and animals, energy is extracted from food molecules by a process of gradual oxidation, or controlled burning.
The Earth's atmosphere contains a great deal of oxygen, and in the presence of oxygen the most energetically stable form of carbon is CO₂ and that of hydrogen is H₂O. A cell is therefore able to obtain energy from sugars or other organic molecules by allowing their carbon and hydrogen atoms to combine with oxygen to produce CO₂ and H₂O, respectively, a process called respiration.
Photosynthesis and respiration are complementary processes (Figure 2-41). This means that the transactions between plants and animals are not all one way. Plants, animals, and microorganisms have existed together on this planet for so long that many of them have become an essential part of the others' environments. The oxygen released by photosynthesis is consumed in the combustion of organic molecules by nearly all organisms. And some of the CO₂ molecules that are fixed today into organic molecules by photosynthesis in a green leaf were yesterday released into the atmosphere by the respiration of an animal, or by that of a fungus or bacterium decomposing dead organic matter. We therefore see that carbon utilization forms a huge cycle that involves the biosphere (all of the living organisms on Earth) as a whole, crossing boundaries between individual organisms (Figure 2-42). Similarly, atoms of nitrogen, phosphorus, and sulfur move between the living and nonliving worlds in cycles that involve plants, animals, fungi, and bacteria.
[Figure 2-41 diagram labels: PHOTOSYNTHESIS: CO₂ + H₂O → O₂ + SUGARS; RESPIRATION: SUGARS + O₂ → H₂O + CO₂.]
Figure 2-41 Photosynthesis and respiration as complementary processes in the living world. Photosynthesis uses the energy of sunlight to produce sugars and other organic molecules. These molecules in turn serve as food for other organisms. Many of these organisms carry out respiration, a process that uses O₂ to form CO₂ from the same carbon atoms that had been taken up as CO₂ and converted into sugars by photosynthesis. In the process, the organisms that respire obtain the chemical bond energy that they need to survive. The first cells on the Earth are thought to have been capable of neither photosynthesis nor respiration (discussed in Chapter 14). However, photosynthesis must have preceded respiration on the Earth, since there is strong evidence that billions of years of photosynthesis were required before O₂ had been released in sufficient quantity to create an atmosphere rich in this gas. (The Earth's atmosphere currently contains 20% O₂.)
[Figure 2-42 diagram labels: CO₂ IN ATMOSPHERE AND WATER; RESPIRATION; PHOTOSYNTHESIS; PLANTS, ALGAE, BACTERIA; HUMUS AND DISSOLVED ORGANIC MATTER.]
Figure 2-42 The carbon cycle. Individual carbon atoms are incorporated into organic molecules of the living world by the photosynthetic activity of bacteria and plants (including algae). They pass to animals, microorganisms, and organic material in soil and oceans in cyclic paths. CO₂ is restored to the atmosphere when organic molecules are oxidized by cells or burned by humans as fuels.
Oxidation and Reduction Involve Electron Transfers
The cell does not oxidize organic molecules in one step, as occurs when organic material is burned in a fire. Through the use of enzyme catalysts, metabolism takes the molecules through a large number of reactions that only rarely involve the direct addition of oxygen. Before we consider some of these reactions and their purpose, we discuss what is meant by the process of oxidation.
Oxidation does not mean only the addition of oxygen atoms; rather, it applies more generally to any reaction in which electrons are transferred from one atom to another. Oxidation in this sense refers to the removal of electrons, and reduction, the converse of oxidation, means the addition of electrons. Thus, Fe²⁺ is oxidized if it loses an electron to become Fe³⁺, and a chlorine atom is reduced if it gains an electron to become Cl⁻. Since the number of electrons is conserved (no loss or gain) in a chemical reaction, oxidation and reduction always occur simultaneously: that is, if one molecule gains an electron in a reaction (reduction), a second molecule loses the electron (oxidation). When a sugar molecule is oxidized to CO₂ and H₂O, for example, the O₂ molecules involved in forming H₂O gain electrons and thus are said to have been reduced.
The terms "oxidation" and "reduction" apply even when there is only a partial shift of electrons between atoms linked by a covalent bond (Figure 2-43).
[Figure 2-43 diagram labels: (A) atom 1; atom 2; molecule; e⁻; formation of a polar covalent bond. (B) methane; methanol; formaldehyde; formic acid; carbon dioxide.]
Figure 2-43 Oxidation and reduction. (A) When two atoms form a polar covalent bond (see p. 50), the atom ending up with a greater share of electrons is said to be reduced, while the other atom acquires a lesser share of electrons and is said to be oxidized. The reduced atom has acquired a partial negative charge (δ⁻) as the positive charge on the atomic nucleus is now more than equaled by the total charge of the electrons surrounding it, and conversely, the oxidized atom has acquired a partial positive charge (δ⁺). (B) The single carbon atom of methane can be converted to that of carbon dioxide by the successive replacement of its covalently bonded hydrogen atoms with oxygen atoms. With each step, electrons are shifted away from the carbon (as indicated by the blue shading), and the carbon atom becomes progressively more oxidized. Each of these steps is energetically favorable under the conditions present inside a cell.
When a carbon atom becomes covalently bonded to an atom with a strong affinity for electrons, such as oxygen, chlorine, or sulfur, it gives up more than its equal share of electrons and forms a polar covalent bond: the positive charge of the carbon nucleus is now somewhat greater than the negative charge of its electrons, and the atom therefore acquires a partial positive charge and is said to be oxidized. Conversely, a carbon atom in a C-H linkage has slightly more than its share of electrons, and so it is said to be reduced (see Figure 2-43).
When a molecule in a cell picks up an electron (e⁻), it often picks up a proton (H⁺) at the same time (protons being freely available in water). The net effect in this case is to add a hydrogen atom to the molecule:
A + e⁻ + H⁺ → AH
Even though a proton plus an electron is involved (instead of just an electron), such hydrogenation reactions are reductions, and the reverse, dehydrogenation reactions, are oxidations. It is especially easy to tell whether an organic molecule is being oxidized or reduced: reduction is occurring if its number of C-H bonds increases, whereas oxidation is occurring if its number of C-H bonds decreases (see Figure 2-43B).
Cells use enzymes to catalyze the oxidation of organic molecules in small steps, through a sequence of reactions that allows useful energy to be harvested. We now need to explain how enzymes work and some of the constraints under which they operate.
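The C-H bond rule described above can be turned into a small check. The Python sketch below is illustrative only: the function name is hypothetical, and the C-H bond counts for the one-carbon series of Figure 2-43B are filled in here from the standard structures of those molecules rather than taken from the text.

```python
# C-H bond counts for the one-carbon series (standard structures, assumed here)
CH_BONDS = {
    "methane": 4,
    "methanol": 3,
    "formaldehyde": 2,
    "formic acid": 1,
    "carbon dioxide": 0,
}


def classify_step(before, after):
    """Classify a conversion by the change in the number of C-H bonds."""
    change = CH_BONDS[after] - CH_BONDS[before]
    if change < 0:
        return "oxidation"  # fewer C-H bonds: carbon has lost electron share
    if change > 0:
        return "reduction"  # more C-H bonds: carbon has gained electron share
    return "no change in C-H bonds"


if __name__ == "__main__":
    series = list(CH_BONDS)
    for before, after in zip(series, series[1:]):
        print(f"{before} -> {after}: {classify_step(before, after)}")
```

Every step from methane toward carbon dioxide loses one C-H bond, so each is classified as an oxidation; reading the series in the opposite direction classifies each step as a reduction.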
Enzymes Lower the Barriers That Block Chemical Reactions
Consider the reaction
paper + O₂ → smoke + ashes + heat + CO₂ + H₂O
The paper burns readily, releasing to the atmosphere both energy as heat and water and carbon dioxide as gases, but the smoke and ashes never spontaneously retrieve these entities from the heated atmosphere and reconstitute themselves into paper. When the paper burns, its chemical energy is dissipated as heat, not lost from the universe, since energy can never be created or destroyed, but irretrievably dispersed in the chaotic random thermal motions of molecules. At the same time, the atoms and molecules of the paper become dispersed and disordered. In the language of thermodynamics, there has been a loss of free energy, that is, of energy that can be harnessed to do work or drive chemical reactions. This loss reflects a loss of orderliness in the way the energy and molecules were stored in the paper. We shall discuss free energy in more detail shortly, but the general principle is clear enough intuitively: chemical reactions proceed spontaneously only in the direction that leads to a loss of free energy; in other words, the spontaneous direction for any reaction is the direction that goes "downhill." A "downhill" reaction in this sense is often said to be energetically favorable.
Although the most energetically favorable form of carbon under ordinary conditions is CO₂, and that of hydrogen is H₂O, a living organism does not disappear in a puff of smoke, and the book in your hands does not burst into flames. This is because the molecules both in the living organism and in the book are in a relatively stable state, and they cannot be changed to a state of lower energy without an input of energy: in other words, a molecule requires activation energy (a kick over an energy barrier) before it can undergo a chemical reaction that leaves it in a more stable state (Figure 2-44). In the case of a burning book, the activation energy is provided by the heat of a lighted match. For the molecules in the watery solution inside a cell, the kick is delivered by an unusually energetic random collision with surrounding molecules, collisions that become more violent as the temperature is raised.
[Figure 2-44 diagram labels: (A); (B); enzyme lowers activation energy for catalyzed reaction Y → X.]
Figure 2-44 The important principle of activation energy. (A) Compound Y (a reactant) is in a relatively stable state, and energy is required to convert it to compound X (a product), even though X is at a lower overall energy level than Y. This conversion will not take place, therefore, unless compound Y can acquire enough activation energy (energy a minus energy b) from its surroundings to undergo the reaction that converts it into compound X. This energy may be provided by means of an unusually energetic collision with other molecules. For the reverse reaction, X → Y, the activation energy will be much larger (energy a minus energy c); this reaction will therefore occur much more rarely. Activation energies are always positive; note, however, that the total energy change for the energetically favorable reaction Y → X is energy c minus energy b, a negative number. (B) Energy barriers for specific reactions can be lowered by catalysts, as indicated by the line marked d. Enzymes are particularly effective catalysts because they greatly reduce the activation energy for the reactions they perform.
In a living cell, the kick over the energy barrier is greatly aided by a specialized class of proteins, the enzymes. Each enzyme binds tightly to one or more molecules, called substrates, and holds them in a way that greatly reduces the activation energy of a particular chemical reaction that the bound substrates can undergo. A substance that can lower the activation energy of a reaction is termed a catalyst; catalysts increase the rate of chemical reactions because they allow a much larger proportion of the random collisions with surrounding molecules to kick the substrates over the energy barrier, as illustrated in Figure 2-45. Enzymes are among the most effective catalysts known, capable of speeding up reactions by factors of 10¹⁴ or more. They thereby allow reactions that would not otherwise occur to proceed rapidly at normal temperatures.
Enzymes are also highly selective. Each enzyme usually catalyzes only one particular reaction: in other words, it selectively lowers the activation energy of only one of the several possible chemical reactions that its bound substrate molecules could undergo. In this way, enzymes direct each of the many different molecules in a cell along specific reaction pathways (Figure 2-46).
The success of living organisms is attributable to a cell's ability to make enzymes of many types, each with precisely specified properties. Each enzyme has a unique shape containing an active site, a pocket or groove in the enzyme into which only particular substrates will fit (Figure 2-47). Like all other catalysts, enzyme molecules themselves remain unchanged after participating in a reaction and therefore can function over and over again. In Chapter 3, we discuss further how enzymes work.
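To get a feel for what a very large rate enhancement implies, one can estimate how far the activation energy would have to fall. The Python sketch below is not from the text: it assumes that the fraction of collisions energetic enough to cross the barrier follows a simple Boltzmann factor, exp(-Ea/RT), with RT taken as 0.616 kcal/mole (37°C), and the function names are hypothetical. Real enzymes influence reaction rates in more ways than this, so the result is an order-of-magnitude illustration only.

```python
import math

RT_KCAL_PER_MOLE = 0.616  # assumed value of R*T at 37 degrees C


def rate_enhancement(barrier_drop):
    """Factor by which the rate rises if the barrier falls by barrier_drop (kcal/mole).

    Assumes the rate is proportional to exp(-Ea / RT), a simple Boltzmann factor.
    """
    return math.exp(barrier_drop / RT_KCAL_PER_MOLE)


def barrier_drop_for(enhancement):
    """Barrier reduction (kcal/mole) needed to achieve a given rate enhancement."""
    return RT_KCAL_PER_MOLE * math.log(enhancement)


if __name__ == "__main__":
    drop = barrier_drop_for(1e14)
    print(f"A 1e14-fold speed-up needs a barrier about {drop:.1f} kcal/mole lower")
```

Under these assumptions, a speed-up by a factor of 10¹⁴ corresponds to lowering the barrier by close to 20 kcal/mole, and each further reduction of about 1.4 kcal/mole multiplies the rate by another factor of 10.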
[Figure 2-45 graph labels: horizontal axis, energy per molecule; molecules with average energy; activation energy for catalyzed reaction; activation energy for uncatalyzed reaction; many molecules have enough energy to undergo the enzyme-catalyzed chemical reaction; almost no molecules have the very high energy needed to undergo an uncatalyzed chemical reaction.]
Figure 2-45 Lowering the activation energy greatly increases the probability of reaction. At any given instant, a population of identical substrate molecules will have a range of energies, distributed as shown on the graph. The varying energies come from collisions with surrounding molecules, which make the substrate molecules jiggle, vibrate, and spin. For a molecule to undergo a chemical reaction, the energy of the molecule must exceed the activation energy barrier for that reaction; for most biological reactions, this almost never happens without enzyme catalysis. Even with enzyme catalysis, the substrate molecules must experience a particularly energetic collision to react (red shaded area). Raising the temperature can also increase the number of molecules with sufficient energy to overcome the activation energy needed for a reaction; but in contrast to enzyme catalysis, this effect is nonselective, speeding up all reactions.
[Figure 2-46 panel labels: (A) dry river bed; lake with waves; flowing stream; uncatalyzed reaction: waves not large enough to surmount barrier; catalyzed reaction: waves often surmount barrier. (B) uncatalyzed; enzyme catalysis of reaction 1.]
Figure 2-46 Floating ball analogies for enzyme catalysis. (A) A barrier dam is lowered to represent enzyme catalysis. The green ball represents a potential reactant (compound Y) that is bouncing up and down in energy level due to constant encounters with waves (an analogy for the thermal bombardment of the reactant molecule by the surrounding water molecules). When the barrier (activation energy) is lowered significantly, it allows the energetically favorable movement of the ball (the reactant) downhill. (B) The four walls of the box represent the activation energy barriers for four different chemical reactions that are all energetically favorable, in the sense that the products are at lower energy levels than the reactants. In the left-hand box, none of these reactions occurs because even the largest waves are not large enough to surmount any of the energy barriers. In the right-hand box, enzyme catalysis lowers the activation energy for reaction number 1 only; now the jostling of the waves allows passage of the reactant molecule over this energy barrier, inducing reaction 1. (C) A branching river with a set of barrier dams (yellow boxes) serves to illustrate how a series of enzyme-catalyzed reactions determines the exact reaction pathway followed by each molecule inside the cell.
[Figure 2-47 diagram labels: active site; molecule A (substrate); enzyme-substrate complex; enzyme-product complex; molecule B (product).]
Figure 2-47 How enzymes work. Each enzyme has an active site to which one or more substrate molecules bind, forming an enzyme-substrate complex. A reaction occurs at the active site, producing an enzyme-product complex. The product is then released, allowing the enzyme to bind further substrate molecules.
How Enzymes Find Their Substrates: The Enormous Rapidity of Molecular Motions
An enzyme will often catalyze the reaction of thousands of substrate molecules every second. This means that it must be able to bind a new substrate molecule in a fraction of a millisecond. But both enzymes and their substrates are present in relatively small numbers in a cell. How do they find each other so fast? Rapid binding is possible because the motions caused by heat energy are enormously fast at the molecular level. These molecular motions can be classified broadly into three kinds: (1) the movement of a molecule from one place to another (translational motion), (2) the rapid back-and-forth movement of covalently linked atoms with respect to one another (vibrations), and (3) rotations. All of these motions help to bring the surfaces of interacting molecules together.
The rates of molecular motions can be measured by a variety of spectroscopic techniques. A large globular protein is constantly tumbling, rotating about its axis about a million times per second. Molecules are also in constant translational motion, which causes them to explore the space inside the cell very efficiently by wandering through it, a process called diffusion. In this way, every molecule in a cell collides with a huge number of other molecules each second. As the molecules in a liquid collide and bounce off one another, an individual molecule moves first one way and then another, its path constituting a random walk (Figure 2-48). In such a walk, the average net distance that each molecule travels (as the crow flies) from its starting point is proportional to the square root of the time involved: that is, if it takes a molecule 1 second on average to travel 1 µm, it takes 4 seconds to travel 2 µm, 100 seconds to travel 10 µm, and so on.
The inside of a cell is very crowded (Figure 2-49). Nevertheless, experiments in which fluorescent dyes and other labeled molecules are injected into cells show that small organic molecules diffuse through the watery gel of the cytosol nearly as rapidly as they do through water. A small organic molecule, for example, takes only about one-fifth of a second on average to diffuse a distance of 10 µm. Diffusion is therefore an efficient way for small molecules to move the limited distances in the cell (a typical animal cell is 15 µm in diameter).
Since enzymes move more slowly than substrates in cells, we can think of them as sitting still. The rate of encounter of each enzyme molecule with its substrate will depend on the concentration of the substrate molecule. For example, some abundant substrates are present at a concentration of 0.5 mM. Since pure water is 55.5 M, there is only about one such substrate molecule in the cell for every 10⁵ water molecules. Nevertheless, the active site on an enzyme molecule that binds this substrate will be bombarded by about 500,000 random collisions with the substrate molecule per second. (For a substrate concentration tenfold lower, the number of collisions drops to 50,000 per second, and so on.) A random encounter between the surface of an enzyme and the matching surface of its substrate molecule often leads immediately to the formation of an enzyme-substrate complex that is ready to react. A reaction in which a covalent bond is broken or formed can now occur extremely rapidly. When one appreciates how quickly molecules move and react, the observed rates of enzymatic catalysis do not seem so amazing.
Once an enzyme and substrate have collided and snuggled together properly at the active site, they form multiple weak bonds with each other that persist until random thermal motion causes the molecules to dissociate again. In general, the stronger the binding of the enzyme and substrate, the slower their rate of dissociation. However, when two colliding molecules have poorly matching surfaces, they form few noncovalent bonds and their total energy is negligible compared with that of thermal motion. In this case the two molecules dissociate as rapidly as they come together, preventing incorrect and unwanted associations between mismatched molecules, such as between an enzyme and the wrong substrate.
[Figure 2-48 diagram label: distance traveled.]
Figure 2-48 A random walk. Molecules in solution move in a random fashion as a result of the continual buffeting they receive in collisions with other molecules. This movement allows small molecules to diffuse rapidly from one part of the cell to another, as described in the text.
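The numbers quoted in this section follow from two simple scaling rules, which can be sketched in Python. The function names below are hypothetical, and the code assumes only what the text states: diffusion time grows with the square of the distance (because net distance grows with the square root of time), and the collision rate at an active site is proportional to substrate concentration.

```python
def diffusion_time(distance_um, ref_distance_um=1.0, ref_time_s=1.0):
    """Average time (s) to diffuse distance_um, scaled from a reference point.

    Net distance grows as the square root of time, so time grows as distance squared.
    """
    return ref_time_s * (distance_um / ref_distance_um) ** 2


def collisions_per_second(substrate_mM, ref_mM=0.5, ref_rate=500_000):
    """Collisions per second at an active site, proportional to concentration."""
    return ref_rate * substrate_mM / ref_mM


def water_per_substrate(substrate_mM, water_M=55.5):
    """Number of water molecules per substrate molecule."""
    return water_M / (substrate_mM / 1000.0)


if __name__ == "__main__":
    # Example molecule from the text: 1 s on average to travel 1 um
    for d in (1, 2, 10):
        print(f"{d:>2} um takes {diffusion_time(d):g} s")
    # Small organic molecule: 0.2 s for 10 um, rescaled here to a 15 um cell
    print(f"15 um takes {diffusion_time(15, 10, 0.2):.2f} s")
    print(f"{collisions_per_second(0.05):g} collisions/s at 0.05 mM")
    print(f"about {water_per_substrate(0.5):.0f} water molecules per substrate")
```

The 15 µm case is an extrapolation added for illustration: with the quoted reference of one-fifth of a second for 10 µm, the same square-law scaling gives a little under half a second to cross a typical animal cell.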
[Figure 2-49 scale bar: 100 nm.]
Figure 2-49 The structure of the cytoplasm. The drawing is approximately to scale and emphasizes the crowding in the cytoplasm. Only the macromolecules are shown: RNAs are shown in blue, ribosomes in green, and proteins in red. Enzymes and other macromolecules diffuse relatively slowly in the cytoplasm, in part because they interact with many other macromolecules; small molecules, by contrast, diffuse nearly as rapidly as they do in water. (Adapted from D.S. Goodsell, Trends Biochem. Sci. 16:203-206, 1991. With permission from Elsevier.)
[Figure 2-50 panel text (fragment): The free energy of Y ...]
The Free-Energy Change for a Reaction Determines Whether It Can Occur
We must now digress briefly to introduce some fundamental chemistry. Cells are chemical systems that must obey all chemical and physical laws. Although enzymes speed up reactions, they cannot by themselves force energetically unfavorable reactions to occur. In terms of a water analogy, enzymes by themselves cannot make water run uphill. Cells, however, must do just that in order to grow and divide: they must build highly ordered and energy-rich molecules from small and simple ones. We shall see that this is done through enzymes that directly couple energetically favorable reactions, which release energy and produce heat, to energetically unfavorable reactions, which produce biological order.
Before examining how such coupling is achieved, we must consider more carefully the term "energetically favorable." According to the second law of thermodynamics, a chemical reaction can proceed spontaneously only if it results in a net increase in the disorder of the universe (see Figure 2-38). The criterion for an increase in disorder of the universe can be expressed most conveniently in terms of a quantity called the free energy, G, of a system. The value of G is of interest only when a system undergoes a change, and the change in G, denoted ΔG (delta G), is critical. Suppose that the system being considered is a collection of molecules. As explained in Panel 2-7 (pp. 118-119), free energy has been defined such that ΔG directly measures the amount of disorder created in the universe when a reaction takes place that involves these molecules.
Energetically favorable reactions, by definition, are those that decrease free energy; in other words, they have a negative ΔG and disorder the universe (Figure 2-50). An example of an energetically favorable reaction on a macroscopic scale is the "reaction" by which a compressed spring relaxes to an expanded state, releasing its stored elastic energy as heat to its surroundings; an example on a microscopic scale is salt dissolving in water. Conversely, energetically unfavorable reactions, with a positive ΔG (such as the joining of two amino acids to form a peptide bond), by themselves create order in the universe. Therefore, these reactions can take place only if they are coupled to a second reaction with a negative ΔG so large that the ΔG of the entire process is negative (Figure 2-51).
The Concentration of Reactants Influences the Free-Energy Change and a Reaction's Direction
As we have just described, a reaction Y ⇌ X will go in the direction Y → X when the associated free-energy change, ΔG, is negative, just as a tensed spring left to itself will relax and lose its stored energy to its surroundings as heat. For a chemical reaction, however, ΔG depends not only on the energy stored in each individual molecule, but also on the concentrations of the molecules in the reaction mixture. Remember that ΔG reflects the degree to which a reaction creates a more disordered (in other words, a more probable) state of the universe. Recalling our coin analogy, it is very likely that a coin will flip from a head to a tail orientation if a jiggling box contains 90 heads and 10 tails, but this is a less probable event if the box has 10 heads and 90 tails.
The same is true for a chemical reaction. For a reversible reaction Y ⇌ X, a large excess of Y over X will tend to drive the reaction in the direction Y → X; that is, there will be a tendency for there to be more molecules making the transition Y → X than there are molecules making the transition X → Y. If the ratio of Y to X increases, the ΔG becomes more negative for the transition Y → X (and more positive for the transition X → Y).
How much of a concentration difference is needed to compensate for a given decrease in chemical bond energy (and accompanying heat release)? The answer is not intuitively obvious, but it can be determined from a thermodynamic analysis that makes it possible to separate the concentration-dependent and the concentration-independent parts of the free-energy change. The ΔG for a given reaction can thereby be written as the sum of two parts: the first, called the standard free-energy change, ΔG°, depends on the intrinsic characters of the reacting molecules; the second depends on their concentrations. For the simple reaction Y → X at 37°C,
$$\Delta G = \Delta G^\circ + 0.616 \ln \frac{[\mathrm{X}]}{[\mathrm{Y}]} = \Delta G^\circ + 1.42 \log \frac{[\mathrm{X}]}{[\mathrm{Y}]}$$
where ΔG is in kilocalories per mole, [Y] and [X] denote the concentrations of Y and X, ln is the natural logarithm, and the constant 0.616 is equal to RT: the product of the gas constant, R, and the absolute temperature, T.
Note that ΔG equals the value of ΔG° when the molar concentrations of Y and X are equal (log 1 = 0). As expected, ΔG becomes more negative as the ratio of X to Y decreases (the log of a number < 1 is negative).
As the favorable reaction Y → X proceeds, however, the concentration of the product X increases and the concentration of the substrate Y decreases. This change in relative concentrations will cause [X]/[Y] to become increasingly large, making the initially favorable ΔG less and less negative. Eventually, when ΔG = 0, a chemical equilibrium will be attained; here the concentration effect just balances the push given to the reaction by ΔG°, and the ratio of substrate to product reaches a constant value (Figure 2-52).
How far will a reaction proceed before it stops at equilibrium? To address this question, we need to introduce the equilibrium constant, K. The value of K is different for different reactions, and it reflects the ratio of product to substrate at equilibrium. For the reaction Y → X:
$$K = \frac{[\mathrm{X}]}{[\mathrm{Y}]}$$
The equation that connects ΔG and the ratio [X]/[Y] allows us to connect ΔG° directly to K. Since ΔG = 0 at equilibrium, the concentrations of Y and X at this point are such that:
$$\Delta G^\circ = -1.42 \log \frac{[\mathrm{X}]}{[\mathrm{Y}]}$$
or,
$$\Delta G^\circ = -1.42 \log K$$
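The step from the concentration-dependent equation to the relation between the standard free-energy change and K can be written out explicitly. The short derivation below is a sketch that uses only quantities defined in this section (RT = 0.616 kcal/mole at 37°C, and K equal to the ratio of X to Y at equilibrium); the conversion factor ln 10 ≈ 2.303 and the sample value of -1.42 kcal/mole are supplied here for illustration.

```latex
\begin{align*}
0 &= \Delta G^\circ + RT \ln K
  && \text{(set } \Delta G = 0 \text{ and } [\mathrm{X}]/[\mathrm{Y}] = K\text{)} \\
\Delta G^\circ &= -RT \ln K = -0.616 \ln K \\
               &= -0.616 \times 2.303 \log K \approx -1.42 \log K \\
\text{Example: } \Delta G^\circ &= -1.42\ \text{kcal/mole}
  \;\Rightarrow\; \log K = 1,\quad K = 10
\end{align*}
```

Read in the other direction, the same relation says that a reaction whose equilibrium mixture contains ten times more X than Y has a standard free-energy change of about -1.4 kcal/mole.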
[Figure 2-50 panel text: this reaction can occur spontaneously. ENERGETICALLY UNFAVORABLE REACTION: If the reaction X → Y occurred, ΔG would be > 0, and the universe would become more ordered; this reaction can occur only if it is coupled to a second, energetically favorable reaction.]
Figure 2-50 The distinction between energetically favorable and energetically unfavorable reactions.
[Figure 2-51 panel text: the energetically unfavorable reaction X → Y is driven by the energetically favorable reaction C → D, because the net free-energy change for the pair of coupled reactions is less than zero.]
Figure 2-51 How reaction coupling is used to drive energetically unfavorable reactions.
[Figure 2-52 panel text: THE REACTION: The formation of X is energetically favored in this example. In other words, the ΔG of Y → X is negative and the ΔG of X → Y is positive. But because of thermal bombardments, there will always be some X converting to Y and vice versa. SUPPOSE WE START WITH AN EQUAL NUMBER OF Y AND X MOLECULES: ... therefore the ratio of X to Y molecules will increase. EVENTUALLY there will be a large enough excess of X over Y to just compensate for the slow rate of X → Y. Equilibrium will then be attained. AT EQUILIBRIUM the number of Y molecules being converted to X molecules each second is exactly equal to the number of X molecules being converted to Y molecules each second, so that there is no net change in the ratio of Y to X.]
Figure 2-52 Chemical equilibrium. When a reaction reaches equilibrium, the forward and backward fluxes of reacting molecules are equal and opposite.
Table 2-4 Relationship Between the Standard Free-Energy Change, ΔG°, and the Equilibrium Constant
Using the last equation, we can see how the equilibrium ratio of X to Y (expressed as an equilibrium constant, K) depends on the intrinsic character of the molecules, as expressed in the value of ΔG° (Table 2-4). Note that for every 1.4 kcal/mole (5.9 kJ/mole) difference in free energy at 37°C, the equilibrium constant changes by a factor of 10.

When an enzyme (or any catalyst) lowers the activation energy for the reaction Y → X, it also lowers the activation energy for the reaction X → Y by exactly the same amount (see Figure 2-44). The forward and backward reactions will therefore be accelerated by the same factor by an enzyme, and the equilibrium point for the reaction (and ΔG°) is unchanged (Figure 2-53).
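The factor-of-10 rule quoted above can be checked numerically. The following is a minimal sketch, assuming R = 1.987 × 10^-3 kcal/(mole·K) and T = 310.15 K (37°C); the function names are illustrative and do not come from the text.

```python
import math

# Assumed constants (not stated explicitly in the text)
R_KCAL = 1.987e-3   # gas constant, kcal/(mole*K)
T_KELVIN = 310.15   # 37 degrees C

def standard_free_energy(k_eq: float) -> float:
    """Standard free-energy change in kcal/mole for K = [X]/[Y].

    -RT ln K is written as RT ln(1/K) so that K = 1 gives 0.0 rather than -0.0.
    """
    return R_KCAL * T_KELVIN * math.log(1.0 / k_eq)

def equilibrium_constant(dg0_kcal: float) -> float:
    """Inverse relation: K = exp(-dG0 / RT)."""
    return math.exp(-dg0_kcal / (R_KCAL * T_KELVIN))

# One row per power of ten, from K = 10^5 down to K = 10^-5
for exponent in range(5, -6, -1):
    dg0 = standard_free_energy(10.0 ** exponent)
    print(f"K = 10^{exponent:<3d} dG0 = {dg0:5.1f} kcal/mole")
```

Each tenfold change in K shifts the computed value by about 1.4 kcal/mole. Multiplying by 4.184 converts the values to kJ/mole; small differences from a published table in the last digit can arise depending on whether rounding is done before or after the conversion.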
For Sequential Reactions, ΔG° Values Are Additive

We can predict quantitatively the course of most reactions. A large body of thermodynamic data has been collected that makes it possible to calculate the standard change in free energy, ΔG°, for most of the important metabolic reactions of the cell. The overall free-energy change for a metabolic pathway is then simply the sum of the free-energy changes in each of its component steps. Consider, for example, two sequential reactions

X → Y and Y → Z

whose ΔG° values are +5 and -13 kcal/mole, respectively. (Recall that a mole is 6 × 10^23 molecules of a substance.) If these two reactions occur sequentially, the ΔG° for the coupled reaction will be -8 kcal/mole. Thus, the unfavorable reaction X → Y, which will not occur spontaneously, can be driven by the favorable reaction Y → Z, provided that this second reaction follows the first.

Cells can therefore cause the energetically unfavorable transition, X → Y, to occur if an enzyme catalyzing the X → Y reaction is supplemented by a second enzyme that catalyzes the energetically favorable reaction, Y → Z. In effect, the reaction Y → Z will then act as a "siphon" to drive the conversion of all of molecule X to molecule Y and thence to molecule Z (Figure 2-54). For example,
K = [X]/[Y]      ΔG° in kcal/mole (kJ/mole)
10^5             -7.1 (-29.7)
10^4             -5.7 (-23.8)
10^3             -4.3 (-18.0)
10^2             -2.8 (-11.7)
10^1             -1.4 (-5.9)
1                 0 (0)
10^-1             1.4 (5.9)
10^-2             2.8 (11.7)
10^-3             4.3 (18.0)
10^-4             5.7 (23.8)
10^-5             7.1 (29.7)

Values of the equilibrium constant were calculated for the simple chemical reaction Y ⇌ X using the equation given in the text. The ΔG° given here is in kilocalories per mole at 37°C, with kilojoules per mole in parentheses (1 kilocalorie is equal to 4.184 kilojoules). As explained in the text, ΔG° represents the free-energy difference under standard conditions (where all components are present at a concentration of 1.0 mole/liter). From this table, we see that if there is a favorable standard free-energy change (ΔG°) of -4.3 kcal/mole (-18.0 kJ/mole) for the transition Y → X, there will be 1000 times more molecules in state X than in state Y at equilibrium (K = 1000).
[Figure 2-53 panels: uncatalyzed reaction; enzyme-catalyzed reaction]
several of the reactions in the long pathway that converts sugars into CO2 and H2O would be energetically unfavorable if considered on their own. But the pathway nevertheless proceeds because the total ΔG° for the series of sequential reactions has a large negative value.

But forming a sequential pathway is not adequate for many purposes. Often the desired pathway is simply X → Y, without further conversion of Y to some other product. Fortunately, there are other more general ways of using enzymes to couple reactions together. How these work is the topic we discuss next.
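The additivity of ΔG° values for sequential reactions can be illustrated with the +5 and -13 kcal/mole example from this section. The sketch below assumes RT = 0.616 kcal/mole (the 37°C value used earlier) and uses illustrative names; it shows that adding free-energy changes is equivalent to multiplying the equilibrium constants of the individual steps.

```python
import math

RT = 0.616  # kcal/mole at 37 degrees C, the constant used in the text

def equilibrium_constant(dg0_kcal: float) -> float:
    """K = exp(-dG0 / RT) for a reaction with standard free-energy change dG0."""
    return math.exp(-dg0_kcal / RT)

# Standard free-energy changes for the two sequential steps (kcal/mole)
steps = {"X -> Y": +5.0, "Y -> Z": -13.0}

# Free-energy changes of sequential steps simply add
overall_dg0 = sum(steps.values())

# The overall equilibrium constant equals the product of the step constants
k_overall = equilibrium_constant(overall_dg0)
k_product = math.prod(equilibrium_constant(dg0) for dg0 in steps.values())

print(f"overall dG0 = {overall_dg0:+.1f} kcal/mole")
print(f"K from summed dG0       = {k_overall:.3e}")
print(f"product of the step K's = {k_product:.3e}")
```

Although the first step alone has K far below 1, the overall constant for X → Z is very large, which is the quantitative meaning of the "siphon" described above.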
Figure 2-53 Enzymes cannot change the equilibrium point for reactions. Enzymes, like all catalysts, speed up the forward and backward rates of a reaction by the same factor. Therefore, for both the catalyzed and the uncatalyzed reactions shown here, the number of molecules undergoing the transition X → Y is equal to the number of molecules undergoing the transition Y → X when the ratio of Y molecules to X molecules is 3.5 to 1. In other words, the two reactions reach equilibrium at exactly the same point.
Activated Carrier Molecules Are Essential for Biosynthesis

The energy released by the oxidation of food molecules must be stored temporarily before it can be channeled into the construction of the many other molecules needed by the cell. In most cases, the energy is stored as chemical bond energy in a small set of activated "carrier molecules," which contain one or more energy-rich covalent bonds. These molecules diffuse rapidly throughout the cell and thereby carry their bond energy from sites of energy generation to the sites where energy is used for biosynthesis and other cell activities (Figure 2-55).

The activated carriers store energy in an easily exchangeable form, either as a readily transferable chemical group or as high-energy electrons, and they can serve a dual role as a source of both energy and chemical groups in biosynthetic reactions. For historical reasons, these molecules are also sometimes referred to as coenzymes. The most important of the activated carrier molecules are ATP and two molecules that are closely related to each other, NADH and NADPH, as we discuss in detail shortly. We shall see that cells use activated carrier molecules like money to pay for reactions that otherwise could not take place.
[Figure 2-54 panel labels: (A) equilibrium point for X → Y reaction alone; (B) equilibrium point for Y → Z reaction alone; (C) equilibrium point for sequential reactions X → Y → Z]

Figure 2-54 How an energetically unfavorable reaction can be driven by a second, following reaction. (A) At equilibrium, there are twice as many X molecules as Y molecules, because X is of lower energy than Y. (B) At equilibrium, there are 25 times more Z molecules than Y molecules, because Z is of much lower energy than Y. (C) If the reactions in (A) and (B) are coupled, nearly all of the X molecules will be converted to Z molecules, as shown.
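The ratios quoted in Figure 2-54 can be turned into equilibrium fractions. This is a minimal sketch with a hypothetical helper function; it assumes only that both ratios (X:Y = 2:1 and Z:Y = 25:1) must hold at the same time when the two reactions are coupled.

```python
def equilibrium_fractions(x_per_y: float, z_per_y: float) -> dict:
    """Fractions of X, Y and Z when both equilibrium ratios hold simultaneously.

    x_per_y is [X]/[Y] and z_per_y is [Z]/[Y]; Y is used as the reference amount.
    """
    y = 1.0
    x = x_per_y * y
    z = z_per_y * y
    total = x + y + z
    return {"X": x / total, "Y": y / total, "Z": z / total}

# Panel (A) alone: only X and Y are present, in a 2:1 ratio
x_alone = 2.0 / (2.0 + 1.0)

# Panel (C): the two reactions are coupled through the shared intermediate Y
coupled = equilibrium_fractions(x_per_y=2.0, z_per_y=25.0)

print(f"X -> Y alone: {x_alone:.1%} of molecules remain as X")
for species, fraction in coupled.items():
    print(f"coupled: {species} = {fraction:.1%}")
```

With these ratios, Z accounts for 25 of every 28 molecules at equilibrium, so the pool of X is drained from about two-thirds of the total to a small fraction.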
[Figure 2-55 labels: energetically unfavorable reaction; oxidized food molecule; molecule available in cell; CATABOLISM; ANABOLISM]

Figure 2-55 Energy transfer and the role of activated carriers in metabolism. By serving as energy shuttles, activated carrier molecules perform their function as go-betweens that link the breakdown of food molecules and the release of energy (catabolism) to the energy-requiring biosynthesis of small and large organic molecules (anabolism).
The Formation of an Activated Carrier Is Coupled to an Energetically Favorable Reaction

When a fuel molecule such as glucose is oxidized in a cell, enzyme-catalyzed reactions ensure that a large part of the free energy that is released by oxidation is captured in a chemically useful form, rather than being released as heat. This is achieved by means of a coupled reaction, in which an energetically favorable reaction drives an energetically unfavorable one that produces an activated carrier molecule or some other useful energy store. Coupling mechanisms require enzymes and are fundamental to all the energy transactions of the cell.

The nature of a coupled reaction is illustrated by a mechanical analogy in Figure 2-56, in which an energetically favorable chemical reaction is represented by rocks falling from a cliff. The energy of falling rocks would normally be entirely wasted in the form of heat generated by friction when the rocks hit the ground (see the falling brick diagram in Figure 2-39). By careful design, however, part of this energy could be used instead to drive a paddle wheel that lifts a bucket of water (Figure 2-56B). Because the rocks can now reach the ground only after moving the paddle wheel, we say that the energetically favorable reaction of rock falling has been directly coupled to the energetically unfavorable reaction of lifting the bucket of water. Note that because part of the energy is used to do work in (B), the rocks hit the ground with less velocity than in (A), and correspondingly less energy is dissipated as heat.

Similar processes occur in cells, where enzymes play the role of the paddle wheel in our analogy. By mechanisms that will be discussed later in this chapter, they couple an energetically favorable reaction, such as the oxidation of foodstuffs, to an energetically unfavorable reaction, such as the generation of
Figure 2-56 A mechanical model illustrating the principle of coupled chemical reactions. The spontaneous reaction shown in (A) could serve as an analogy for the direct oxidation of glucose to CO2 and H2O, which produces heat only. In (B) the same reaction is coupled to a second reaction; this second reaction is analogous to the synthesis of activated carrier molecules. The energy produced in (B) is in a more useful form than in (A) and can be used to drive a variety of otherwise energetically unfavorable reactions (C).

(A) Kinetic energy of falling rocks is transformed into heat energy only.
(B) Part of the kinetic energy is used to lift a bucket of water, and a correspondingly smaller amount is transformed into heat.
(C) The potential kinetic energy stored in the raised bucket of water can be used to drive hydraulic machines that carry out a variety of useful tasks.
[Figure 2-57 labels: phosphoanhydride bonds; H+; ADP; inorganic phosphate (Pi)]
an activated carrier molecule. As a result, the amount of heat released by the oxidation reaction is reduced by exactly the amount of energy that is stored in the energy-rich covalent bonds of the activated carrier molecule. The activated carrier molecule in turn picks up a packet of energy of a size sufficient to power a chemical reaction elsewhere in the cell.
Figure 2-57 The hydrolysis of ATP to ADP and inorganic phosphate. The two outermost phosphates in ATP are held to the rest of the molecule by high-energy phosphoanhydride bonds and are readily transferred. As indicated, water can be added to ATP to form ADP and inorganic phosphate (Pi). This hydrolysis of the terminal phosphate of ATP yields between 11 and 13 kcal/mole of usable energy, depending on the intracellular conditions. The large negative ΔG of this reaction arises from several factors. Release of the terminal phosphate group removes an unfavorable repulsion between adjacent negative charges; in addition, the inorganic phosphate ion (Pi) released is stabilized by resonance and by favorable hydrogen-bond formation with water.
ATP Is the Most Widely Used Activated Carrier Molecule

The most important and versatile of the activated carriers in cells is ATP (adenosine triphosphate). Just as the energy stored in the raised bucket of water in Figure 2-56B can drive a wide variety of hydraulic machines, ATP is a convenient and versatile store, or currency, of energy used to drive a variety of chemical reactions in cells. ATP is synthesized in an energetically unfavorable phosphorylation reaction in which a phosphate group is added to ADP (adenosine diphosphate). When required, ATP gives up its energy packet through its energetically favorable hydrolysis to ADP and inorganic phosphate (Figure 2-57). The regenerated ADP is then available to be used for another round of the phosphorylation reaction that forms ATP.

The energetically favorable reaction of ATP hydrolysis is coupled to many otherwise unfavorable reactions through which other molecules are synthesized. We shall encounter several of these reactions later in this chapter. Many of them involve the transfer of the terminal phosphate in ATP to another molecule, as illustrated by the phosphorylation reaction in Figure 2-58.

[Figure 2-58 labels: hydroxyl group on another molecule; PHOSPHATE TRANSFER; phosphoester bond]

Figure 2-58 An example of a phosphate transfer reaction. Because an energy-rich phosphoanhydride bond in ATP is converted to a phosphoester bond, this reaction is energetically favorable, having a large negative ΔG. Reactions of this type are involved in the synthesis of phospholipids and in the initial steps of reactions that catabolize sugars.
[Figure 2-59 labels: (A) ATP hydrolysis, A-B; (B) glutamic acid]

Figure 2-59 An example of an energetically unfavorable biosynthetic reaction driven by ATP hydrolysis. (A) Schematic illustration of the formation of A-B in the condensation reaction described in the text. (B) The biosynthesis of the common amino acid glutamine. Glutamic acid is first converted to a high-energy phosphorylated intermediate (corresponding to the compound B-O-PO3 described in the text), which then reacts with ammonia (corresponding to A-H) to form glutamine. In this example both steps occur on the surface of the same enzyme, glutamine synthetase. The high-energy bonds are shaded red.
ATP is the most abundant activated carrier in cells. As one example, it supplies energy for many of the pumps that transport substances into and out of the cell (discussed in Chapter 11). It also powers the molecular motors that enable muscle cells to contract and nerve cells to transport materials from one end of their long axons to another (discussed in Chapter 16).
Energy Stored in ATP Is Often Harnessed to Join Two Molecules Together

We have previously discussed one way in which an energetically favorable reaction can be coupled to an energetically unfavorable reaction, X → Y, so as to enable it to occur. In that scheme a second enzyme catalyzes the energetically favorable reaction Y → Z, pulling all of the X to Y in the process (see Figure 2-54). But when the required product is Y and not Z, this mechanism is not useful.

A typical biosynthetic reaction is one in which two molecules, A and B, are joined together to produce A-B in the energetically unfavorable condensation reaction

A-H + B-OH → A-B + H2O

There is an indirect pathway that allows A-H and B-OH to form A-B, in which a coupling to ATP hydrolysis makes the reaction go. Here energy from ATP hydrolysis is first used to convert B-OH to a higher-energy intermediate compound, which then reacts directly with A-H to give A-B. The simplest possible mechanism involves the transfer of a phosphate from ATP to B-OH to make B-O-PO3, in which case the reaction pathway contains only two steps:

1. B-OH + ATP → B-O-PO3 + ADP
2. A-H + B-O-PO3 → A-B + Pi

Net result: B-OH + ATP + A-H → A-B + ADP + Pi
The condensation reaction, which by itself is energetically unfavorable, is forced to occur by being directly coupled to ATP hydrolysis in an enzyme-catalyzed reaction pathway (Figure 2-59A). A biosynthetic reaction of exactly this type synthesizes the amino acid glutamine (Figure 2-59B). We will see shortly that similar (but more complex) mechanisms are also used to produce nearly all of the large molecules of the cell.
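The way the phosphorylated intermediate cancels out of the two-step pathway can be checked by simple bookkeeping. The sketch below is illustrative only: the helper function is hypothetical, species are treated as plain labels, and every species is assumed to appear with a coefficient of 1.

```python
from collections import Counter

def net_reaction(steps):
    """Sum a list of (reactants, products) steps and cancel shared intermediates.

    Returns two sorted lists: species consumed overall and species produced overall.
    """
    balance = Counter()
    for reactants, products in steps:
        balance.subtract(reactants)  # each reactant counts as -1
        balance.update(products)     # each product counts as +1
    consumed = sorted(s for s, n in balance.items() if n < 0)
    produced = sorted(s for s, n in balance.items() if n > 0)
    return consumed, produced

# The two steps of the ATP-coupled condensation pathway
pathway = [
    (["B-OH", "ATP"], ["B-O-PO3", "ADP"]),  # step 1: activation of B-OH
    (["A-H", "B-O-PO3"], ["A-B", "Pi"]),    # step 2: condensation with A-H
]

consumed, produced = net_reaction(pathway)
print(" + ".join(consumed), "->", " + ".join(produced))
```

The intermediate B-O-PO3 is produced in step 1 and consumed in step 2, so it does not appear in the net result; only the starting materials, A-B, ADP, and Pi remain.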
[Figure 2-59 labels, continued: Pi; products of ATP hydrolysis; glutamine]
NADH and NADPH Are Important Electron Carriers

Other important activated carrier molecules participate in oxidation-reduction reactions and are commonly part of coupled reactions in cells. These activated carriers are specialized to carry high-energy electrons and hydrogen atoms. The most important of these electron carriers are NAD+ (nicotinamide adenine dinucleotide) and the closely related molecule NADP+ (nicotinamide adenine dinucleotide phosphate). Later, we examine some of the reactions in which they participate. NAD+ and NADP+ each pick up a "packet of energy" corresponding to two high-energy electrons plus a proton (H+), being converted to NADH (reduced nicotinamide adenine dinucleotide) and NADPH (reduced nicotinamide adenine dinucleotide phosphate), respectively. These molecules can therefore also be regarded as carriers of hydride ions (the H+ plus two electrons, or H-).

Like ATP, NADPH is an activated carrier that participates in many important biosynthetic reactions that would otherwise be energetically unfavorable. The NADPH is produced according to the general scheme shown in Figure 2-60A. During a special set of energy-yielding catabolic reactions, two hydrogen atoms are removed from the substrate molecule. Both electrons but only one proton (that is, a hydride ion, H-) are added to the nicotinamide ring of NADP+ to form NADPH, with the other proton (H+) being released into solution. This is a typical oxidation-reduction reaction; the substrate is oxidized and NADP+ is reduced. The structures of NADP+ and NADPH are shown in Figure 2-60B.

NADPH readily gives up the hydride ion it carries in a subsequent oxidation-reduction reaction, because the nicotinamide ring can achieve a more stable arrangement of electrons without it. In this subsequent reaction, which regenerates NADP+, it is the NADPH that is oxidized and the substrate that is reduced. The NADPH is an effective donor of its hydride ion to other molecules
[Figure 2-60 labels: (A) oxidation of molecule 1; reduction of molecule 2; (B) oxidized form; nicotinamide ring; reduced form]

Figure 2-60 NADPH, an important carrier of electrons. (A) NADPH is produced in reactions of the general type shown on the left, in which two hydrogen atoms are removed from a substrate. The oxidized form of the carrier molecule, NADP+, receives one hydrogen atom plus an electron (a hydride ion); the proton (H+) from the other H atom is released into solution. Because NADPH holds its hydride ion in a high-energy linkage, the added hydride ion can easily be transferred to other molecules, as shown on the right. (B) The structures of NADP+ and NADPH. The part of the NADP+ molecule known as the nicotinamide ring accepts two electrons together with a proton (the equivalent of a hydride ion, H-), forming NADPH. The molecules NAD+ and NADH are identical in structure to NADP+ and NADPH, respectively, except that the indicated phosphate group is absent from both.
for the same reason that ATP readily transfers a phosphate: in both cases the transfer is accompanied by a large negative free-energy change. One example of the use of NADPH in biosynthesis is shown in Figure 2-61.

The extra phosphate group on NADPH has no effect on the electron-transfer properties of NADPH compared with NADH, being far away from the region involved in electron transfer (see Figure 2-60B). It does, however, give a molecule of NADPH a slightly different shape from that of NADH, making it possible for NADPH and NADH to bind as substrates to completely different sets of enzymes. Thus the two types of carriers are used to transfer electrons (or hydride ions) between two different sets of molecules.

Why should there be this division of labor? The answer lies in the need to regulate two sets of electron-transfer reactions independently. NADPH operates chiefly with enzymes that catalyze anabolic reactions, supplying the high-energy electrons needed to synthesize energy-rich biological molecules. NADH, by contrast, has a special role as an intermediate in the catabolic system of reactions that generate ATP through the oxidation of food molecules, as we will discuss shortly. The genesis of NADH from NAD+ and that of NADPH from NADP+ occur by different pathways and are independently regulated, so that the cell can adjust the supply of electrons for these two contrasting purposes. Inside the cell the ratio of NAD+ to NADH is kept high, whereas the ratio of NADP+ to NADPH is kept low. This provides plenty of NAD+ to act as an oxidizing agent and plenty of NADPH to act as a reducing agent, as required for their special roles in catabolism and anabolism, respectively.
There Are Many Other Activated Carrier Molecules in Cells

Other activated carriers also pick up and carry a chemical group in an easily transferred, high-energy linkage. For example, coenzyme A carries an acetyl group in a readily transferable linkage, and in this activated form is known as acetyl CoA (acetyl coenzyme A). Acetyl CoA (Figure 2-62) is used to add two carbon units in the biosynthesis of larger molecules.

In acetyl CoA, as in other carrier molecules, the transferable group makes up only a small part of the molecule. The rest consists of a large organic portion that
[Figure 2-61 labels: 7-DEHYDROCHOLESTEROL; CHOLESTEROL]

Figure 2-61 The final stage in one of the biosynthetic routes leading to cholesterol. As in many other biosynthetic reactions, the reduction of the C=C bond is achieved by the transfer of a hydride ion from the carrier molecule NADPH, plus a proton (H+) from the solution.
[Figure 2-62 labels: nucleotide; high-energy bond; acetyl group; Coenzyme A (CoA)]

Figure 2-62 The structure of the important activated carrier molecule acetyl CoA. A space-filling model is shown above the structure. The sulfur atom (yellow) forms a thioester bond to acetate. Because this is a high-energy linkage, releasing a large amount of free energy when it is hydrolyzed, the acetate molecule can be readily transferred to other molecules.
Table 2-5 Some Activated Carrier Molecules Widely Used in Metabolism

Activated carrier               Group carried in high-energy linkage
ATP                             phosphate
NADH, NADPH, FADH2              electrons and hydrogens
Acetyl CoA                      acetyl group
Carboxylated biotin             carboxyl group
S-Adenosylmethionine            methyl group
Uridine diphosphate glucose     glucose
serves as a convenient "handle," facilitating the recognition of the carrier molecule by specific enzymes. As with acetyl CoA, this handle portion very often contains a nucleotide (usually adenosine), a curious fact that may be a relic from an early stage of evolution. It is currently thought that the main catalysts for early life-forms, before DNA or proteins, were RNA molecules (or their close relatives), as described in Chapter 6. It is tempting to speculate that many of the carrier molecules that we find today originated in this earlier RNA world, where their nucleotide portions could have been useful for binding them to RNA enzymes.

Figures 2-58 and 2-61 have presented examples of the type of transfer reactions powered by the activated carrier molecules ATP (transfer of phosphate) and NADPH (transfer of electrons and hydrogen). The reactions of other activated carrier molecules involve the transfer of a methyl, carboxyl, or glucose group for the purpose of biosynthesis (Table 2-5). These activated carriers are generated in reactions that are coupled to ATP hydrolysis, as in the example in Figure 2-63. Therefore, the energy that enables their groups to be used for biosynthesis ultimately comes from the catabolic reactions that generate ATP. Similar processes occur in the synthesis of the very large molecules of the cell (the nucleic acids, proteins, and polysaccharides) that we discuss next.
The Synthesis of Biological Polymers Is Driven by ATP Hydrolysis

As discussed previously, the macromolecules of the cell constitute most of its dry mass, that is, of the mass not due to water (see Figure 2-29). These
Figure 2-63 A carboxyl group transfer reaction using an activated carrier molecule. Carboxylated biotin is used by the enzyme pyruvate carboxylase to transfer a carboxyl group in the production of oxaloacetate, a molecule needed for the citric acid cycle. The acceptor molecule for this group transfer reaction is pyruvate. Other enzymes use biotin to transfer carboxyl groups to other acceptor molecules. Note that synthesis of carboxylated biotin requires energy that is derived from ATP, a general feature of many activated carriers.

[Figure 2-63 labels: CARBOXYL GROUP ACTIVATION; CARBOXYL GROUP TRANSFER; biotin; pyruvate; enzyme (pyruvate carboxylase); oxaloacetate]
[Figure 2-64 labels: A-H + HO-B → A-B + H2O (CONDENSATION, energetically unfavorable); A-B + H2O → A-H + HO-B (HYDROLYSIS, energetically favorable)]

Figure 2-64 Condensation and hydrolysis as opposite reactions. The macromolecules of the cell are polymers that are formed from subunits (or monomers) by a condensation reaction and are broken down by hydrolysis. The condensation reactions are all energetically unfavorable.
molecules are made from subunits (or monomers) that are linked together in a condensation reaction, in which the constituents of a water molecule (OH plus H) are removed from the two reactants. Consequently, the reverse reaction, the breakdown of all three types of polymers, occurs by the enzyme-catalyzed addition of water (hydrolysis). This hydrolysis reaction is energetically favorable, whereas the biosynthetic reactions require an energy input (Figure 2-64).

The nucleic acids (DNA and RNA), proteins, and polysaccharides are all polymers that are produced by the repeated addition of a monomer onto one end of a growing chain. The synthesis reactions for these three types of macromolecules are outlined in Figure 2-65. As indicated, the condensation step in each case depends on energy from nucleoside triphosphate hydrolysis. And yet, except for the nucleic acids, there are no phosphate groups left in the final product molecules. How are the reactions that release the energy of ATP hydrolysis coupled to polymer synthesis?
[Figure 2-65 panels: POLYSACCHARIDES (glucose added to glycogen); NUCLEIC ACIDS (nucleotide added to RNA); PROTEINS (amino acid added to protein). Each condensation step releases H2O and uses energy originally derived from nucleoside triphosphate hydrolysis.]

Figure 2-65 The synthesis of polysaccharides, proteins, and nucleic acids. Synthesis of each kind of biological polymer involves the loss of water in a condensation reaction. Not shown is the consumption of high-energy nucleoside triphosphates that is required to activate each monomer before its addition. In contrast, the reverse reaction, the breakdown of all three types of polymers, occurs by the simple addition of water (hydrolysis).
For each type of macromolecule, an enzyme-catalyzed pathway exists which resembles that discussed previously for the synthesis of the amino acid glutamine (see Figure 2-59). The principle is exactly the same, in that the OH group that will be removed in the condensation reaction is first activated by becoming involved in a high-energy linkage to a second molecule. However, the actual mechanisms used to link ATP hydrolysis to the synthesis of proteins and polysaccharides are more complex than that used for glutamine synthesis, since a series of high-energy intermediates is required to generate the final high-energy bond that is broken during the condensation step (discussed in Chapter 6 for protein synthesis).

Each activated carrier has limits in its ability to drive a biosynthetic reaction. The ΔG for the hydrolysis of ATP to ADP and inorganic phosphate (Pi) depends on the concentrations of all of the reactants, but under the usual conditions in a cell it is between -11 and -13 kcal/mole (between -46 and -54 kJ/mole). In principle, this hydrolysis reaction could drive an unfavorable reaction with a ΔG of, perhaps, +10 kcal/mole, provided that a suitable reaction path is available. For some biosynthetic reactions, however, even -13 kcal/mole may not be enough. In these cases the path of ATP hydrolysis can be altered so that it initially produces AMP and pyrophosphate (PPi), which is itself then hydrolyzed in a subsequent step (Figure 2-66). The whole process makes available a total free-energy change of about -26 kcal/mole. An important type of biosynthetic reaction that is driven in this way is the synthesis of nucleic acids (polynucleotides) from nucleoside triphosphates, as illustrated on the right side of Figure 2-67.

Note that the repetitive condensation reactions that produce macromolecules can be oriented in one of two ways, giving rise to either the head polymerization or the tail polymerization of monomers.
In so-called head polymerization the reactive bond required for the condensation reaction is carried on the end of the growing polymer, and it must therefore be regenerated each time that a monomer is added. In this case, each monomer brings with it the reactive bond that will be used in adding the next monomer in the series. In tail polymerization the reactive bond carried by each monomer is instead used immediately for its own addition (Figure 2-68).

We shall see in later chapters that both these types of polymerization are used. The synthesis of polynucleotides and some simple polysaccharides occurs by tail polymerization, for example, whereas the synthesis of proteins occurs by a head polymerization process.
[Figure 2-66 labels: (A) adenosine triphosphate (ATP) + H2O → adenosine monophosphate (AMP) + pyrophosphate (PPi); pyrophosphate + H2O → phosphate + phosphate. (B) ATP + H2O → AMP + PPi; PPi + H2O → Pi + Pi]

Figure 2-66 An alternative pathway of ATP hydrolysis, in which pyrophosphate is first formed and then hydrolyzed. This route releases about twice as much free energy as the reaction shown earlier in Figure 2-57, and it forms AMP instead of ADP. (A) In the two successive hydrolysis reactions, oxygen atoms from the participating water molecules are retained in the products, as indicated, whereas the hydrogen atoms dissociate to form free hydrogen ions (H+, not shown). (B) Diagram of overall reaction in summary form.
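The energy budget of the two ATP hydrolysis routes can be sketched numerically using the figures quoted in the text: between -11 and -13 kcal/mole for ATP → ADP + Pi, and about -26 kcal/mole in total for the route through AMP and pyrophosphate. The function names and the +15 kcal/mole test reaction are hypothetical and chosen only for illustration.

```python
KJ_PER_KCAL = 4.184  # 1 kilocalorie = 4.184 kilojoules

# Free-energy changes quoted in the text (kcal/mole, usual cellular conditions)
ATP_TO_ADP_RANGE = (-13.0, -11.0)  # ATP -> ADP + Pi
ATP_TO_AMP_TOTAL = -26.0           # ATP -> AMP + PPi, then PPi -> 2 Pi (approximate)

def kcal_to_kj(kcal_per_mole: float) -> float:
    """Convert kcal/mole to kJ/mole."""
    return kcal_per_mole * KJ_PER_KCAL

def can_drive(dg_unfavorable: float, dg_driver: float) -> bool:
    """True if the summed free-energy change of the coupled pair is negative."""
    return dg_unfavorable + dg_driver < 0

# A +10 kcal/mole reaction can be driven even by the weakest ATP -> ADP value
print(can_drive(+10.0, max(ATP_TO_ADP_RANGE)))  # True

# A hypothetical +15 kcal/mole reaction needs the pyrophosphate route
print(can_drive(+15.0, min(ATP_TO_ADP_RANGE)))  # False
print(can_drive(+15.0, ATP_TO_AMP_TOTAL))       # True

for dg in ATP_TO_ADP_RANGE:
    print(f"{dg:.0f} kcal/mole = {kcal_to_kj(dg):.0f} kJ/mole")
```

This treats coupling as simple addition of free-energy changes, as in the sequential-reaction example earlier in the chapter; it says nothing about whether a suitable enzymatic reaction path actually exists.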
[Figure 2-67 labels: nucleoside monophosphate; high-energy intermediate; PPi; H2O]

Figure 2-67 Synthesis of a polynucleotide, RNA or DNA, is a multistep process driven by ATP hydrolysis. In the first step, a nucleoside monophosphate is activated by the sequential transfer of the terminal phosphate groups from two ATP molecules. The high-energy intermediate formed, a nucleoside triphosphate, exists free in solution until it reacts with the growing end of an RNA or a DNA chain with release of pyrophosphate. Hydrolysis of the latter to inorganic phosphate is highly favorable and helps to drive the overall reaction in the direction of polynucleotide synthesis. For details, see Chapter 5.
Summary

Living cells are highly ordered and need to create order within themselves to survive and grow. This is thermodynamically possible only because of a continual input of energy, part of which must be released from the cells to their environment as heat. The energy comes ultimately from the electromagnetic radiation of the sun, which drives the formation of organic molecules in photosynthetic organisms such as green plants. Animals obtain their energy by eating these organic molecules and oxidizing them in a series of enzyme-catalyzed reactions that are coupled to the formation of ATP, a common currency of energy in all cells.

To make possible the continual generation of order in cells, the energetically favorable hydrolysis of ATP is coupled to energetically unfavorable reactions. In the biosynthesis of macromolecules, this is accomplished by the transfer of phosphate groups to form reactive phosphorylated intermediates. Because the energetically unfavorable reaction now becomes energetically favorable, ATP hydrolysis is said to drive the reaction. Polymeric molecules such as proteins, nucleic acids, and polysaccharides are assembled from small activated precursor molecules by repetitive condensation reactions that are driven in this way. Other reactive molecules, called either active carriers or coenzymes, transfer other chemical groups in the course of biosynthesis: NADPH transfers hydrogen as a proton plus two electrons (a hydride ion), for example, whereas acetyl CoA transfers an acetyl group.
FATTYACIDS) HEADPOLYMERIZATION (e.9.,PROTEINS,
TAILPOLYMERIZATION ( e . g . ,D N A ,R N A ,P O L Y S A C C H A R I D E S )
Figure2-68 The orientation of the active intermediatesin the repetitivecondensationreactionsthat form bi-ologicalpolymers.The headgrowth of polymersis comparedwith its alternative,tail growth.As indicated,these areusedto producedifferenttypesof biologicalmacromolecules. two mechanisms
Chapter 2: Cell Chemistry and Biosynthesis
HOW CELLS OBTAIN ENERGY FROM FOOD
The constant supply of energy that cells need to generate and maintain the biological order that keeps them alive comes from the chemical bond energy in food molecules, which thereby serve as fuel for cells.
The proteins, lipids, and polysaccharides that make up most of the food we eat must be broken down into smaller molecules before our cells can use them, either as a source of energy or as building blocks for other molecules. Enzymatic digestion breaks down the large polymeric molecules in food into their monomer subunits: proteins into amino acids, polysaccharides into sugars, and fats into fatty acids and glycerol. After digestion, the small organic molecules derived from food enter the cytosol of cells, where their gradual oxidation begins.
Sugars are particularly important fuel molecules, and they are oxidized in small controlled steps to carbon dioxide (CO2) and water (Figure 2-69). In this section we trace the major steps in the breakdown, or catabolism, of sugars and show how they produce ATP, NADH, and other activated carrier molecules in animal cells. A very similar pathway also operates in plants, fungi, and many bacteria. As we shall see, the oxidation of fatty acids is equally important for cells. Other molecules, such as proteins, can also serve as energy sources when they are funneled through appropriate enzymatic pathways.
Glycolysis Is a Central ATP-Producing Pathway
The major process for oxidizing sugars is the sequence of reactions known as glycolysis. During glycolysis, a glucose molecule with six carbon atoms is converted into two molecules of pyruvate, each of which contains three carbon atoms. For each glucose molecule, two molecules of ATP are hydrolyzed to provide energy to drive the early steps, but four molecules of ATP are produced in the later steps. At the end of glycolysis, there is consequently a net gain of two molecules of ATP for each glucose molecule broken down.
The glycolytic pathway is outlined in Figure 2-70 and shown in more detail in Panel 2-8 (pp. 120-121). Glycolysis involves a sequence of 10 separate reactions, each producing a different sugar intermediate and each catalyzed by a different enzyme. Like most enzymes, these have names ending in -ase (such as isomerase and dehydrogenase) to indicate the type of reaction they catalyze.
Figure 2-69 Schematic representation of the controlled stepwise oxidation of sugar in a cell, compared with ordinary burning. (A) In the cell, enzymes catalyze oxidation via a series of small steps in which free energy is transferred in conveniently sized packets to carrier molecules, most often ATP and NADH. At each step, an enzyme controls the reaction by reducing the activation energy barrier that has to be surmounted before the specific reaction can occur. The total free energy released is exactly the same in (A) and (B). But if the sugar were instead oxidized to CO2 and H2O in a single step, as in (B), it would release an amount of energy much larger than could be captured for useful purposes. [Panel labels: (A) stepwise oxidation of sugar in cells: small activation energies overcome at body temperature owing to the presence of enzymes; activated carrier molecules store energy. (B) direct burning of sugar: all free energy is released as heat; none is stored. Both panels run from SUGAR + O2 to CO2 + H2O.]
[Figure 2-70 diagram labels: one molecule of glucose; energy investment to be recouped later; fructose 1,6-bisphosphate; cleavage of six-carbon sugar to two three-carbon sugars; two molecules of glyceraldehyde 3-phosphate; energy generation; two molecules of pyruvate.]
Although no molecular oxygen is used in glycolysis, oxidation occurs, in that electrons are removed by NAD+ (producing NADH) from some of the carbons derived from the glucose molecule. The stepwise nature of the process releases the energy of oxidation in small packets, so that much of it can be stored in activated carrier molecules rather than all of it being released as heat (see Figure 2-69). Thus, some of the energy released by oxidation drives the direct synthesis of ATP molecules from ADP and Pi, and some remains with the electrons in the high-energy electron carrier NADH.
Two molecules of NADH are formed per molecule of glucose in the course of glycolysis. In aerobic organisms (those that require molecular oxygen to live), these NADH molecules donate their electrons to the electron-transport chain described in Chapter 14, and the NAD+ formed from the NADH is used again for glycolysis (see step 6 in Panel 2-8, pp. 120-121).
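The bookkeeping of glycolysis described above (two ATP invested, four ATP produced, two NADH and two pyruvate formed per glucose) can be written out as a small ledger. The following is only an illustrative sketch; the class and function names are hypothetical and chosen for this example, and the numbers are the per-glucose figures quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlycolysisLedger:
    """Per-glucose stoichiometry of glycolysis, as quoted in the text."""
    atp_invested: int = 2       # ATP hydrolyzed to drive the early steps
    atp_produced: int = 4       # ATP made in the later steps
    nadh_produced: int = 2      # NADH formed per glucose
    pyruvate_produced: int = 2  # three-carbon products per six-carbon glucose

    @property
    def net_atp(self) -> int:
        # Net gain = ATP produced late minus ATP spent early.
        return self.atp_produced - self.atp_invested


def glycolysis_yield(glucose_molecules: int,
                     ledger: GlycolysisLedger = GlycolysisLedger()) -> dict:
    """Scale the per-glucose ledger to a given number of glucose molecules."""
    if glucose_molecules < 0:
        raise ValueError("number of glucose molecules cannot be negative")
    return {
        "ATP": glucose_molecules * ledger.net_atp,
        "NADH": glucose_molecules * ledger.nadh_produced,
        "pyruvate": glucose_molecules * ledger.pyruvate_produced,
    }


if __name__ == "__main__":
    print(glycolysis_yield(1))
```

Keeping the investment and the payoff as separate fields, rather than storing only the net value, mirrors the way the text separates the early energy-requiring steps from the later energy-generating ones.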
Figure 2-70 An outline of glycolysis. Each of the 10 steps shown is catalyzed by a different enzyme. Note that step 4 cleaves a six-carbon sugar into two three-carbon sugars, so that the number of molecules at every stage after this doubles. As indicated, step 6 begins the energy generation phase of glycolysis. Because two molecules of ATP are hydrolyzed in the early, energy investment phase, glycolysis results in the net synthesis of 2 ATP and 2 NADH molecules per molecule of glucose (see also Panel 2-8).
Fermentations Produce ATP in the Absence of Oxygen
For most animal and plant cells, glycolysis is only a prelude to the final stage of the breakdown of food molecules. In these cells, the pyruvate formed by glycolysis is rapidly transported into the mitochondria, where it is converted into CO2 plus acetyl CoA, which is then completely oxidized to CO2 and H2O.
In contrast, for many anaerobic organisms (which do not utilize molecular oxygen and can grow and divide without it) glycolysis is the principal source of the cell's ATP. This is also true for certain animal tissues, such as skeletal muscle, that can continue to function when molecular oxygen is limiting. In these anaerobic conditions, the pyruvate and the NADH electrons stay in the cytosol. The pyruvate is converted into products excreted from the cell: for example, into ethanol and CO2 in the yeasts used in brewing and breadmaking, or into lactate in muscle. In this process, the NADH gives up its electrons and is converted back into NAD+. This regeneration of NAD+ is required to maintain the reactions of glycolysis (Figure 2-71). Anaerobic energy-yielding pathways like these are called fermentations.
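Why the regeneration of NAD+ matters can be seen in a toy model of a cell with a small, fixed pool of NAD+. This is a deliberately simplified sketch, not a kinetic model: the function name and the pool size are hypothetical, and the only facts taken from the text are that glycolysis converts two NAD+ to two NADH per glucose and that fermentation converts the NADH back into NAD+.

```python
def glucose_processed(nad_pool: int, glucose_supply: int,
                      fermentation: bool) -> int:
    """Count how many glucose molecules can pass through glycolysis.

    nad_pool       -- total NAD+ initially available in the cytosol (hypothetical)
    glucose_supply -- glucose molecules offered to the cell
    fermentation   -- whether pyruvate is reduced (to lactate, or to ethanol
                      plus CO2), which returns NADH to NAD+
    """
    nad_plus = nad_pool
    nadh = 0
    processed = 0
    for _ in range(glucose_supply):
        if nad_plus < 2:
            break  # glycolysis stalls: no NAD+ left to accept electrons
        # Glycolysis: two NAD+ are reduced to two NADH per glucose.
        nad_plus -= 2
        nadh += 2
        processed += 1
        if fermentation:
            # Fermentation: NADH gives up its electrons and becomes NAD+ again.
            nad_plus += nadh
            nadh = 0
    return processed


if __name__ == "__main__":
    print("without fermentation:", glucose_processed(4, 10, fermentation=False))
    print("with fermentation:   ", glucose_processed(4, 10, fermentation=True))
```

With a pool of four NAD+ and no way to reoxidize NADH, the model stalls after two glucose molecules; with fermentation switched on, all ten are processed.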
process. The piecing together of the complete glycolytic pathway in the 1930s was a major triumph of biochemistry, and it was quickly followed by the recognition of the central role of ATP in cell processes. Thus, most of the fundamental concepts discussed in this chapter have been understood for many years.
Figure 2-71 Two pathways for the anaerobic breakdown of pyruvate. (A) When there is inadequate oxygen, for example, in a muscle cell undergoing vigorous contraction, the pyruvate produced by glycolysis is converted to lactate as shown. This reaction regenerates the NAD+ consumed in step 6 of glycolysis, but the whole pathway yields much less energy overall than complete oxidation. (B) In some organisms that can grow anaerobically, such as yeasts, pyruvate is converted via acetaldehyde into carbon dioxide and ethanol. Again, this pathway regenerates NAD+ from NADH, as required to enable glycolysis to continue. Both (A) and (B) are examples of fermentations. [Panel labels: (A) FERMENTATION LEADING TO EXCRETION OF LACTATE: glucose, 2 x pyruvate, 2 x lactate. (B) FERMENTATION LEADING TO EXCRETION OF ALCOHOL AND CO2: glucose, 2 x pyruvate, 2 x acetaldehyde, 2 x CO2, 2 x ethanol.]
Glycolysis Illustrates How Enzymes Couple Oxidation to Energy Storage
Returning to the paddle-wheel analogy that we used to introduce coupled reactions (see Figure 2-56), we can now equate enzymes with the paddle wheel. Enzymes act to harvest useful energy from the oxidation of organic molecules by coupling an energetically unfavorable reaction with a favorable one. To demonstrate this coupling, we examine a step in glycolysis to see exactly how such coupled reactions occur.
Two central reactions in glycolysis (steps 6 and 7) convert the three-carbon sugar intermediate glyceraldehyde 3-phosphate (an aldehyde) into 3-phosphoglycerate (a carboxylic acid; see Panel 2-8, pp. 120-121). This entails the oxidation of an aldehyde group to a carboxylic acid group in a reaction that occurs in two steps. The overall reaction releases enough free energy to convert a molecule of ADP to ATP and to transfer two electrons from the aldehyde to NAD+ to form NADH, while still releasing enough heat to the environment to make the overall reaction energetically favorable (ΔG for the overall reaction is -3.0 kcal/mole).
Figure 2-72 outlines the means by which this remarkable feat of energy harvesting is accomplished. The indicated chemical reactions are precisely guided by two enzymes to which the sugar intermediates are tightly bound. In fact, as detailed in Figure 2-72, the first enzyme (glyceraldehyde 3-phosphate dehydrogenase) forms a short-lived covalent bond to the aldehyde through a reactive -SH group on the enzyme, and catalyzes its oxidation by NAD+ in this attached state. The reactive enzyme-substrate bond is then displaced by an inorganic phosphate ion to produce a high-energy phosphate intermediate, which is released from the enzyme. This intermediate binds to the second enzyme (phosphoglycerate kinase), which catalyzes the energetically favorable transfer of the high-energy phosphate just created to ADP, forming ATP and completing the process of oxidizing an aldehyde to a carboxylic acid.
We have shown this particular oxidation process in some detail because it provides a clear example of enzyme-mediated energy storage through coupled reactions (Figure 2-73). Steps 6 and 7 are the only reactions in glycolysis that create a high-energy phosphate linkage directly from inorganic phosphate. As such, they account for the net yield of two ATP molecules and two NADH molecules per molecule of glucose (see Panel 2-8, pp. 120-121).
As we have just seen, ATP can be formed readily from ADP when a reaction intermediate is formed with a phosphate bond of higher energy than the phosphate bond in ATP. Phosphate bonds can be ordered in energy by comparing the standard free-energy change (ΔG°) for the breakage of each bond by hydrolysis. Figure 2-74 compares the high-energy phosphoanhydride bonds in ATP with the energy of some other phosphate bonds, several of which are generated during glycolysis.
Organisms Store Food Molecules in Special Reservoirs
All organisms need to maintain a high ATP/ADP ratio to maintain biological order in their cells. Yet animals have only periodic access to food, and plants need to survive overnight without sunlight, when they are unable to produce sugar from photosynthesis. For this reason, both plants and animals convert sugars and fats to special forms for storage (Figure 2-75).
To compensate for long periods of fasting, animals store fatty acids as fat droplets composed of water-insoluble triacylglycerols, largely in the cytoplasm of specialized fat cells, called adipocytes. For shorter-term storage, sugar is stored as glucose subunits in the large branched polysaccharide glycogen, which is present as small granules in the cytoplasm of many cells, including liver and muscle. The synthesis and degradation of glycogen are rapidly regulated according to need. When cells need more ATP than they can generate from the food molecules taken in from the bloodstream, they break down glycogen in a reaction that produces glucose 1-phosphate, which is rapidly converted to glucose 6-phosphate for glycolysis.
Figure 2-72 Energy storage in steps 6 and 7 of glycolysis. In these steps the oxidation of an aldehyde to a carboxylic acid is coupled to the formation of ATP and NADH. (A) Step 6 begins with the formation of a covalent bond between the substrate (glyceraldehyde 3-phosphate) and an -SH group exposed on the surface of the enzyme (glyceraldehyde 3-phosphate dehydrogenase). The enzyme then catalyzes transfer of hydrogen (as a hydride ion: a proton plus two electrons) from the bound glyceraldehyde 3-phosphate to a molecule of NAD+. Part of the energy released in this oxidation is used to form a molecule of NADH and part is used to convert the original linkage between the enzyme and its substrate to a high-energy thioester bond (shown in red). A molecule of inorganic phosphate then displaces this high-energy bond on the enzyme, creating a high-energy sugar-phosphate bond instead (red). At this point the enzyme has not only stored energy in NADH, but also coupled the energetically favorable oxidation of an aldehyde to the energetically unfavorable formation of a high-energy phosphate bond. The second reaction has been driven by the first, thereby acting like the "paddle-wheel" coupler in Figure 2-56. In reaction step 7, the high-energy sugar-phosphate intermediate just made, 1,3-bisphosphoglycerate, binds to a second enzyme, phosphoglycerate kinase. The reactive phosphate is transferred to ADP, forming a molecule of ATP and leaving a free carboxylic acid group on the oxidized sugar. (B) Summary of the overall chemical change produced by reactions 6 and 7.
Panel annotations for Figure 2-72:
(A) Starting from glyceraldehyde 3-phosphate: A covalent bond is formed between glyceraldehyde 3-phosphate (the substrate) and the -SH group of a cysteine side chain of the enzyme glyceraldehyde 3-phosphate dehydrogenase, which also binds noncovalently to NAD+.
Oxidation of glyceraldehyde 3-phosphate occurs, as two electrons plus a proton (a hydride ion, see Figure 2-60) are transferred from glyceraldehyde 3-phosphate to the bound NAD+, forming NADH. Part of the energy released by the oxidation of the aldehyde is thus stored in NADH, and part goes into converting the bond between the enzyme and its substrate glyceraldehyde 3-phosphate into a high-energy thioester bond.
A molecule of inorganic phosphate displaces the high-energy bond to the enzyme to create 1,3-bisphosphoglycerate, which contains a high-energy acyl-anhydride bond.
The high-energy bond to phosphate is transferred to ADP to form ATP, leaving 3-phosphoglycerate.
(B) SUMMARY OF STEPS 6 AND 7: Much of the energy of oxidation has been stored in the activated carriers ATP and NADH.
Figure 2-73 Schematic view of the coupled reactions that form NADH and ATP in steps 6 and 7 of glycolysis. The C-H bond oxidation energy drives the formation of both NADH and a high-energy phosphate bond. The breakage of the high-energy bond then drives ATP formation. [Diagram labels: C-H bond oxidation energy; STEP 6; STEP 7; total energy change for step 6 followed by step 7 is a favorable -3 kcal/mole.]
Figure 2-74 Phosphate bonds have different energies. Examples of different types of phosphate bonds with their sites of hydrolysis are shown in the molecules depicted on the left. Those starting with a gray carbon atom show only part of a molecule. Examples of molecules containing such bonds are given on the right, with the free-energy change for hydrolysis in kilocalories (kilojoules in parentheses). The transfer of a phosphate group from one molecule to another is energetically favorable if the standard free-energy change (ΔG°) for hydrolysis of the phosphate bond of the first molecule is more negative than that for hydrolysis of the phosphate bond in the second. Thus, a phosphate group is readily transferred from 1,3-bisphosphoglycerate to ADP to form ATP. The hydrolysis reaction can be viewed as the transfer of the phosphate group to water.
Figure 2-74 entries (type of phosphate bond: specific example; standard free-energy change ΔG° for hydrolysis of the phosphate bond, in kcal/mole with kJ/mole in parentheses):
enol phosphate bond: phosphoenolpyruvate (see Panel 2-8, pp. 120-121); -14.8 (-61.9)
anhydride bond to carbon: for example, 1,3-bisphosphoglycerate (see Panel 2-8)
phosphate bond in creatine phosphate: creatine phosphate (activated carrier that stores energy in muscle); -10.3 (-43.0)
anhydride bond to phosphate (phosphoanhydride bond): for example, ATP when hydrolyzed to ADP; -7.3 (-30.6)
phosphoester bond: for example, glucose 6-phosphate (see Panel 2-8)
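The rule stated in the caption to Figure 2-74 (a phosphate transfer is favorable when hydrolysis of the donor's phosphate bond has a more negative ΔG° than hydrolysis of the bond being formed) amounts to a subtraction. The sketch below applies it to the three values that are legible in the figure; the dictionary and function names are hypothetical, and the result is a standard free-energy change, so it says nothing about actual cellular concentrations.

```python
# Standard free-energy changes for hydrolysis (kcal/mole), as read from Figure 2-74.
HYDROLYSIS_DG_KCAL = {
    "phosphoenolpyruvate": -14.8,
    "creatine phosphate": -10.3,
    "ATP": -7.3,  # hydrolysis of ATP to ADP
}


def transfer_dg(donor: str, phosphorylated_product: str) -> float:
    """Standard free-energy change for moving a phosphate group from the
    donor onto an acceptor, yielding `phosphorylated_product`.

    The transfer is treated as hydrolysis of the donor bond followed by the
    reverse of hydrolysis of the product bond, so the two values subtract.
    """
    dg = HYDROLYSIS_DG_KCAL[donor] - HYDROLYSIS_DG_KCAL[phosphorylated_product]
    return round(dg, 1)


def is_favorable(donor: str, phosphorylated_product: str) -> bool:
    """A transfer is favorable (under standard conditions) if its ΔG° is negative."""
    return transfer_dg(donor, phosphorylated_product) < 0


if __name__ == "__main__":
    for donor in ("phosphoenolpyruvate", "creatine phosphate"):
        print(f"{donor} -> ATP: {transfer_dg(donor, 'ATP')} kcal/mole")
    print("ATP -> creatine phosphate favorable?",
          is_favorable("ATP", "creatine phosphate"))
```

Read this way, the figure is a ranking: any compound higher in the list can, in principle, pass its phosphate group to ADP, while ATP can pass its phosphate to compounds lower in the list.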
Figure 2-75 The storage of sugars and fats in animal and plant cells. (A) The structures of starch and glycogen, the storage form of sugars in plants and animals, respectively. Both are storage polymers of the sugar glucose and differ only in the frequency of branch points (the region in yellow is shown enlarged below). There are many more branches in glycogen than in starch. (B) An electron micrograph shows glycogen granules in the cytoplasm of a liver cell. (C) A thin section of a single chloroplast from a plant cell, showing the starch granules and lipid (fat droplets) that have accumulated as a result of the biosyntheses occurring there. (D) Fat droplets (stained red) beginning to accumulate in developing fat cells of an animal. (B, courtesy of Robert Fletterick and Daniel S. Friend; C, courtesy of K. Plaskitt; D, courtesy of Ronald M. Evans and Peter Totonoz.) [Diagram labels: glycogen granules in the cytoplasm of a liver cell; branch point; glucose subunits; α-1,4-glycosidic bond in backbone; α-1,6-glycosidic bond at branch point; scale bar 1 µm.]
Quantitatively, fat is far more important than glycogen as an energy store for animals, presumably because it provides for more efficient storage. The oxidation of a gram of fat releases about twice as much energy as the oxidation of a gram of glycogen. Moreover, glycogen differs from fat in binding a great deal of water, producing a sixfold difference in the actual mass of glycogen required to store the same amount of energy as fat. An average adult human stores enough glycogen for only about a day of normal activities but enough fat to last for nearly a month. If our main fuel reservoir had to be carried as glycogen instead of fat, body weight would increase by an average of about 60 pounds.
Although plants produce NADPH and ATP by photosynthesis, this important process occurs in a specialized organelle, called a chloroplast, which is isolated from the rest of the plant cell by a membrane that is impermeable to both types of activated carrier molecules. Moreover, the plant contains many other cells (such as those in the roots) that lack chloroplasts and therefore cannot produce their own sugars. Therefore, for most of its ATP production, the plant relies on an export of sugars from its chloroplasts to the mitochondria that are located in all cells of the plant. Most of the ATP needed by the plant is synthesized in these mitochondria and exported from them to the rest of the plant cell, using exactly the same pathways for the oxidative breakdown of sugars as in nonphotosynthetic organisms (Figure 2-76).
Figure 2-76 How the ATP needed for most plant cell metabolism is made. In plants, the chloroplasts and mitochondria collaborate to supply cells with metabolites and ATP. (For details, see Chapter 14.) [Diagram labels: light; chloroplast; metabolites.]
During periods of excess photosynthetic capacity during the day, chloroplasts convert some of the sugars that they make into fats and into starch, a polymer of glucose analogous to the glycogen of animals. The fats in plants are triacylglycerols, just like the fats in animals, and differ only in the types of fatty acids that predominate. Fat and starch are both stored in the chloroplast as reservoirs to be mobilized as an energy source during periods of darkness (see Figure 2-75C).
The embryos inside plant seeds must live on stored sources of energy for a prolonged period, until they germinate to produce leaves that can harvest the energy in sunlight. For this reason plant seeds often contain especially large amounts of fats and starch, which makes them a major food source for animals, including ourselves (Figure 2-77).
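The comparison of fat and glycogen as energy stores made earlier in this section rests on two ratios: fat releases about twice the energy per gram, and about six times the mass of glycogen is needed to store the same energy once bound water is counted. The arithmetic below simply combines the figures quoted in the text. The function names are hypothetical, and the derived quantities (the threefold mass contribution of bound water and the fat mass implied by the 60-pound figure) are consequences of those quoted numbers under the stated assumptions, not values given in the text.

```python
ENERGY_PER_GRAM_RATIO = 2.0   # fat vs. glycogen, per gram oxidized (from the text)
STORED_MASS_RATIO = 6.0       # glycogen vs. fat, same energy, bound water included


def hydration_factor() -> float:
    """Mass of hydrated glycogen per unit mass of glycogen itself,
    implied by the two ratios above (6 / 2 = 3)."""
    return STORED_MASS_RATIO / ENERGY_PER_GRAM_RATIO


def glycogen_mass_for_same_energy(fat_mass: float) -> float:
    """Mass of (hydrated) glycogen storing the same energy as `fat_mass` of fat."""
    return fat_mass * STORED_MASS_RATIO


def fat_mass_implied_by_gain(weight_gain: float) -> float:
    """Fat reserve that would produce `weight_gain` if swapped for glycogen.

    Swapping fat mass m for glycogen adds 6m - m = 5m, so m = gain / 5.
    """
    return weight_gain / (STORED_MASS_RATIO - 1.0)


if __name__ == "__main__":
    print("hydration factor:", hydration_factor())
    print("fat reserve implied by a 60-pound gain:", fat_mass_implied_by_gain(60.0))
```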
Most Animal Cells Derive Their Energy from Fatty Acids Between Meals
After a meal, most of the energy that an animal needs comes from sugars derived from food. Excess sugars, if any, are used to replenish depleted glycogen stores, or to synthesize fats as a food store. But soon the fat stored in adipose tissue is called into play, and by the morning after an overnight fast, fatty acid oxidation generates most of the ATP we need.
Low glucose levels in the blood trigger the breakdown of fats for energy production. As illustrated in Figure 2-78, the triacylglycerols stored in fat droplets in adipocytes are hydrolyzed to produce fatty acids and glycerol, and the fatty acids released are transferred to cells in the body through the bloodstream. While animals readily convert sugars to fats, they cannot convert fatty acids to sugars. Instead, the fatty acids are oxidized directly.
Figure 2-77 Some plant seeds that serve as important foods for humans. Corn, nuts, and peas all contain rich stores of starch and fat that provide the young plant embryo in the seed with energy and building blocks for biosynthesis. (Courtesy of the John Innes Foundation.)
Figure 2-78 How stored fats are mobilized for energy production in animals. Low glucose levels in the blood trigger the hydrolysis of the triacylglycerol molecules in fat droplets to free fatty acids and glycerol, as illustrated. These fatty acids enter the bloodstream, where they bind to the abundant blood protein, serum albumin. Special fatty acid transporters in the plasma membrane of cells that oxidize fatty acids, such as muscle cells, then pass these fatty acids into the cytosol, from which they are moved into mitochondria for energy production (see Figure 2-80). [Diagram labels: stored fat; bloodstream; glycerol; fatty acids; MUSCLE CELL; oxidation in mitochondria.]
Sugars and Fats Are Both Degraded to Acetyl CoA in Mitochondria
Figure 2-79 The oxidation of pyruvate to acetyl CoA and CO2. (A) The structure of the pyruvate dehydrogenase complex, which contains 60 polypeptide chains. This is an example of a large multienzyme complex in which reaction intermediates are passed directly from one enzyme to another. In eucaryotic cells it is located in the mitochondrion. (B) The reactions carried out by the pyruvate dehydrogenase complex. The complex converts pyruvate to acetyl CoA in the mitochondrial matrix; NADH is also produced in this reaction. A, B, and C are the three enzymes pyruvate decarboxylase, lipoamide reductase-transacetylase, and dihydrolipoyl dehydrogenase, respectively. These enzymes are illustrated in (A); their activities are linked as shown. [Diagram labels: 8 trimers of lipoamide reductase-transacetylase + 6 dimers of dihydrolipoyl dehydrogenase + 12 dimers of pyruvate decarboxylase; acetyl CoA.]
Figure 2-80 Pathways for the production of acetyl CoA from sugars and fats. The mitochondrion in eucaryotic cells is the place where acetyl CoA is produced from both types of major food molecules. It is therefore the place where most of the cell's oxidation reactions occur and where most of its ATP is made. The structure and function of mitochondria are discussed in detail in Chapter 14. [Diagram labels: sugars and polysaccharides; fats and fatty acids; CYTOSOL.]
The fatty acids imported from the bloodstream are moved into mitochondria, where all of their oxidation takes place. Each molecule of fatty acid (as the activated molecule fatty acyl CoA) is broken down completely by a cycle of reactions that trims two carbons at a time from its carboxyl end, generating one molecule of acetyl CoA for each turn of the cycle. A molecule of NADH and a molecule of FADH2 are also produced in this process.
Sugars and fats are the major energy sources for most nonphotosynthetic organisms, including humans. However, most of the useful energy that can be extracted from the oxidation of both types of foodstuffs remains stored in the acetyl CoA molecules that are produced by the two types of reactions just described. The citric acid cycle of reactions, in which the acetyl group in acetyl CoA is oxidized to CO2 and H2O, is therefore central to the energy metabolism of aerobic organisms. In eucaryotes these reactions all take place in mitochondria. We should therefore not be surprised to discover that the mitochondrion is the place where most of the ATP is produced in animal cells. In contrast, aerobic bacteria carry out all of their reactions in a single compartment, the cytosol, and it is here that the citric acid cycle takes place in these cells.
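The fatty acid oxidation cycle described above lends itself to a simple count. The sketch below assumes a saturated fatty acid with an even number of carbons; the function name is hypothetical. One detail is an interpretation rather than a statement from the text: because the last turn splits a four-carbon chain into two two-carbon units, the total number of acetyl CoA molecules is one more than the number of turns.

```python
def fatty_acid_oxidation(carbons: int) -> dict:
    """Tally the products of completely oxidizing a fatty acyl CoA to acetyl CoA.

    Assumes a saturated, even-numbered chain. Each turn of the cycle removes
    two carbons from the carboxyl end as acetyl CoA and yields one NADH and
    one FADH2; the final turn leaves a second acetyl CoA behind.
    """
    if carbons < 4 or carbons % 2 != 0:
        raise ValueError("expected an even number of carbons, at least 4")
    turns = carbons // 2 - 1
    return {
        "turns": turns,
        "acetyl_CoA": carbons // 2,
        "NADH": turns,
        "FADH2": turns,
    }


if __name__ == "__main__":
    # Palmitic acid, a common 16-carbon fatty acid, used here as an example.
    print(fatty_acid_oxidation(16))
```

For a 16-carbon chain this gives seven turns, eight acetyl CoA, seven NADH, and seven FADH2, with every carbon of the chain accounted for in acetyl CoA (8 x 2 = 16).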
Figure 2-81 The oxidation of fatty acids to acetyl CoA. (A) Electron micrograph of a lipid droplet in the cytoplasm (top), and the structure of fats (bottom). Fats are triacylglycerols. The glycerol portion, to which three fatty acids are linked through ester bonds, is shown here in blue. Fats are insoluble in water and form large lipid droplets in the specialized fat cells (called adipocytes) in which they are stored. (B) The fatty acid oxidation cycle. The cycle is catalyzed by a series of four enzymes in the mitochondrion. Each turn of the cycle shortens the fatty acid chain by two carbons (shown in red) and generates one molecule of acetyl CoA and one molecule each of NADH and FADH2. The structure of FADH2 is presented in Figure 2-83B. (A, courtesy of Daniel S. Friend.) [Diagram labels: fatty acyl CoA; hydrocarbon tail; ester bond; fatty acyl CoA shortened by two carbons; acetyl CoA; scale bar 1 µm.]
The Citric Acid Cycle Generates NADH by Oxidizing Acetyl Groups to CO2
In the nineteenth century, biologists noticed that in the absence of air (anaerobic conditions) cells produce lactic acid (for example, in muscle) or ethanol (for example, in yeast), while in its presence (aerobic conditions) they consume O2 and produce CO2 and H2O. Efforts to define the pathways of aerobic metabolism eventually focused on the oxidation of pyruvate and led in 1937 to the discovery of the citric acid cycle, also known as the tricarboxylic acid cycle or the Krebs cycle. The citric acid cycle accounts for about two-thirds of the total oxidation of carbon compounds in most cells, and its major end products are CO2 and high-energy electrons in the form of NADH. The CO2 is released as a waste product, while the high-energy electrons from NADH are passed to a membrane-bound electron-transport chain (discussed in Chapter 14), eventually combining with O2 to produce H2O. Although the citric acid cycle itself does not use O2, it requires O2 in order to proceed because there is no other efficient way for the NADH to get rid of its electrons and thus regenerate the NAD+ that is needed to keep the cycle going.
The citric acid cycle takes place inside mitochondria in eucaryotic cells. It results in the complete oxidation of the carbon atoms of the acetyl groups in acetyl CoA, converting them into CO2. But the acetyl group is not oxidized directly. Instead, this group is transferred from acetyl CoA to a larger, four-carbon molecule, oxaloacetate, to form the six-carbon tricarboxylic acid, citric acid, for which the subsequent cycle of reactions is named. The citric acid molecule is then gradually oxidized, allowing the energy of this oxidation to be harnessed to produce energy-rich activated carrier molecules. The chain of eight reactions forms a cycle because at the end the oxaloacetate is regenerated and enters a new turn of the cycle, as shown in outline in Figure 2-82.
We have thus far discussed only one of the three types of activated carrier molecules that are produced by the citric acid cycle, the NAD+-NADH pair (see Figure 2-60).
In addition to three molecules of NADH, each turn of the cycle also produces one molecule of FADH2 (reduced flavin adenine dinucleotide) from FAD and one molecule of the ribonucleotide GTP (guanosine triphosphate) from GDP. The structures of these two activated carrier molecules are illustrated in Figure 2-83. GTP is a close relative of ATP, and the transfer of its terminal phosphate group to ADP produces one ATP molecule in each cycle. Like NADH, FADH2 is a carrier of high-energy electrons and hydrogen. As we discuss shortly, the energy that is stored in the readily transferred high-energy electrons of NADH and FADH2 will be utilized subsequently for ATP production through the process of oxidative phosphorylation, the only step in the oxidative catabolism of foodstuffs that directly requires gaseous oxygen (O2) from the atmosphere.
Panel 2-9 (pp. 122-123) presents the complete citric acid cycle. Water, rather than molecular oxygen, supplies the extra oxygen atoms required to make CO2 from the acetyl groups entering the citric acid cycle. As illustrated in the panel, three molecules of water are split in each cycle, and the oxygen atoms of some of them are ultimately used to make CO2.
In addition to pyruvate and fatty acids, some amino acids pass from the cytosol into mitochondria, where they are also converted into acetyl CoA or one of the other intermediates of the citric acid cycle. Thus, in the eucaryotic cell, the mitochondrion is the center toward which all energy-yielding processes lead, whether they begin with sugars, fats, or proteins.
Both the citric acid cycle and glycolysis also function as starting points for important biosynthetic reactions by producing vital carbon-containing intermediates, such as oxaloacetate and α-ketoglutarate. Some of these substances produced by catabolism are transferred back from the mitochondrion to the cytosol, where they serve in anabolic reactions as precursors for the synthesis of many essential molecules, such as amino acids (Figure 2-84).
Figure 2-82 Simple overview of the citric acid cycle. The reaction of acetyl CoA with oxaloacetate starts the cycle by producing citrate (citric acid). In each turn of the cycle, two molecules of CO2 are produced as waste products, plus three molecules of NADH, one molecule of GTP, and one molecule of FADH2. The number of carbon atoms in each intermediate is shown in a yellow box. For details, see Panel 2-9 (pp. 122-123). [Diagram label: NET RESULT: ONE TURN OF THE CYCLE PRODUCES THREE NADH, ONE GTP, AND ONE FADH2, AND RELEASES TWO MOLECULES OF CO2.]
Figure 2-83 The structures of GTP and FADH2. (A) GTP and GDP are close relatives of ATP and ADP, respectively. (B) FADH2 is a carrier of hydrogens and high-energy electrons, like NADH and NADPH. It is shown here in its oxidized form (FAD) with the hydrogen-carrying atoms highlighted in yellow.
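The per-step yields given in this section can be added up for one molecule of glucose: glycolysis produces two pyruvate, each pyruvate is converted to one acetyl CoA, and each acetyl CoA drives one turn of the citric acid cycle. The following tally is a sketch using hypothetical names; it counts only the carriers and CO2 named in the text, and it assumes that pyruvate oxidation yields one NADH and one CO2 per pyruvate.

```python
from collections import Counter

# Per-glucose yield of glycolysis (net), which also makes two pyruvate.
GLYCOLYSIS = Counter(ATP=2, NADH=2)
# Per-pyruvate yield of its conversion to acetyl CoA.
PYRUVATE_OXIDATION = Counter(NADH=1, CO2=1)
# Per-turn yield of the citric acid cycle.
CITRIC_ACID_CYCLE_TURN = Counter(NADH=3, FADH2=1, GTP=1, CO2=2)


def scaled(yields: Counter, times: int) -> Counter:
    """Return the yields multiplied by the number of times a stage runs."""
    return Counter({name: count * times for name, count in yields.items()})


def carriers_per_glucose() -> Counter:
    """Sum the stage yields for the complete oxidation of one glucose."""
    total = Counter()
    total += GLYCOLYSIS                         # one glucose -> two pyruvate
    total += scaled(PYRUVATE_OXIDATION, 2)      # two pyruvate -> two acetyl CoA
    total += scaled(CITRIC_ACID_CYCLE_TURN, 2)  # two turns of the cycle
    return total


if __name__ == "__main__":
    for name, count in sorted(carriers_per_glucose().items()):
        print(f"{name}: {count}")
```

A useful consistency check is the carbon count: the tally releases six CO2, matching the six carbon atoms of glucose.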
Figure 2-84 Glycolysis and the citric acid cycle provide the precursors needed to synthesize many important biological molecules. The amino acids, nucleotides, lipids, sugars, and other molecules (shown here as products) in turn serve as the precursors for the many macromolecules of the cell. Each black arrow in this diagram denotes a single enzyme-catalyzed reaction; the red arrows generally represent pathways with many steps that are required to produce the indicated products. [Legible diagram labels: nucleotides; glucose 6-phosphate; amino sugars; fructose 6-phosphate; glycolipids; glycoproteins; serine; dihydroxyacetone phosphate; 3-phosphoglycerate; lipids; amino acids; pyrimidines; phosphoenolpyruvate; alanine.]
Electron Transport Drives the Synthesis of the Majority of the ATP in Most Cells
Most chemical energy is released in the last step in the degradation of a food molecule. In this final process the electron carriers NADH and FADH2 transfer the electrons that they have gained when oxidizing other molecules to the electron-transport chain, which is embedded in the inner membrane of the mitochondrion (see Figure 14-10). As the electrons pass along this long chain of specialized electron acceptor and donor molecules, they fall to successively lower energy states. The energy that the electrons release in this process pumps H+ ions (protons) across the membrane, from the inner mitochondrial compartment to the outside, generating a gradient of H+ ions (Figure 2-85). This gradient serves as a source of energy, being tapped like a battery to drive a variety of energy-requiring reactions. The most prominent of these reactions is the generation of ATP by the phosphorylation of ADP.
At the end of this series of electron transfers, the electrons are passed to molecules of oxygen gas (O2) that have diffused into the mitochondrion, which simultaneously combine with protons (H+) from the surrounding solution to produce water molecules. The electrons have now reached their lowest energy level, and therefore all the available energy has been extracted from the oxidized food molecule. This process, termed oxidative phosphorylation (Figure 2-86), also occurs in the plasma membrane of bacteria. As one of the most remarkable achievements of cell evolution, it is a central topic of Chapter 14.
In total, the complete oxidation of a molecule of glucose to H2O and CO2 is used by the cell to produce about 30 molecules of ATP. In contrast, only 2 molecules of ATP are produced per molecule of glucose by glycolysis alone.
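The figure of about 30 ATP per glucose can be roughly reproduced from the carrier counts in this section, provided one assumes how much ATP each carrier ultimately yields. Those conversion factors are not given here; the values used below (2.5 ATP per mitochondrial NADH, 1.5 per FADH2, and 1.5 per NADH made in the cytosol by glycolysis) are commonly quoted approximations and should be read as assumptions of this sketch. The function name is hypothetical.

```python
def estimate_atp_per_glucose(atp_per_mito_nadh: float = 2.5,
                             atp_per_fadh2: float = 1.5,
                             atp_per_cytosolic_nadh: float = 1.5) -> float:
    """Rough ATP yield for the complete oxidation of one glucose.

    Carrier counts follow the per-step yields in this section; the
    ATP-per-carrier factors are assumed approximations, not text values.
    """
    substrate_level = 2 + 2   # 2 net ATP from glycolysis + 2 GTP (as ATP) from two cycle turns
    mito_nadh = 2 + 6         # 2 from pyruvate oxidation + 6 from two cycle turns
    cytosolic_nadh = 2        # made by glycolysis in the cytosol
    fadh2 = 2                 # one per cycle turn
    return (substrate_level
            + mito_nadh * atp_per_mito_nadh
            + cytosolic_nadh * atp_per_cytosolic_nadh
            + fadh2 * atp_per_fadh2)


if __name__ == "__main__":
    total = estimate_atp_per_glucose()
    print("estimated ATP per glucose:", total)
    print("relative to glycolysis alone:", total / 2)
```

Under these assumptions the estimate lands on the text's figure of about 30; changing the assumed factors shifts the total by a few ATP, which is why the yield is usually quoted as approximate.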
Amino Acids and Nucleotides Are Part of the Nitrogen Cycle
So far we have concentrated mainly on carbohydrate metabolism and have not yet considered the metabolism of nitrogen or sulfur. These two elements are important constituents of biological macromolecules. Nitrogen and sulfur atoms pass from compound to compound and between organisms and their environment in a series of reversible cycles.
Although molecular nitrogen is abundant in the Earth's atmosphere, nitrogen is chemically unreactive as a gas. Only a few living species are able to incorporate it into organic molecules, a process called nitrogen fixation. Nitrogen fixation occurs in certain microorganisms and by some geophysical processes, such as lightning discharge. It is essential to the biosphere as a whole, for without it life could not exist on this planet. Only a small fraction of the nitrogenous compounds in today's organisms, however, is due to fresh products of nitrogen fixation from the atmosphere. Most organic nitrogen has been in circulation for
[Figure 2-86 diagram labels: pyruvate from glycolysis; NADH from glycolysis; CO2; O2; pyruvate; acetyl CoA; CoA; CITRIC ACID CYCLE; 2 e-; OXIDATIVE PHOSPHORYLATION; H2O; MITOCHONDRION.]
Figure 2-85 The generation of an H+ gradient across a membrane by electron-transport reactions. A high-energy electron (derived, for example, from the oxidation of a metabolite) is passed sequentially by carriers A, B, and C to a lower energy state. In this diagram carrier B is arranged in the membrane in such a way that it takes up H+ from one side and releases it to the other as the electron passes. The result is an H+ gradient. As discussed in Chapter 14, this gradient is an important form of energy that is harnessed by other membrane proteins to drive the formation of ATP.
Figure 2-86 The final stages of oxidation of food molecules. Molecules of NADH and FADH2 (FADH2 is not shown) are produced by the citric acid cycle. These activated carriers donate high-energy electrons that are eventually used to reduce oxygen gas to water. A major portion of the energy released during the transfer of these electrons along an electron-transfer chain in the mitochondrial inner membrane (or in the plasma membrane of bacteria) is harnessed to drive the synthesis of ATP, hence the name oxidative phosphorylation (discussed in Chapter 14).
HOW CELLS OBTAIN ENERGY FROM FOOD
some time, passing from one living organism to another. Thus present-day nitrogen-fixing reactions can be said to perform a "topping-up" function for the total nitrogen supply.
Vertebrates receive virtually all of their nitrogen from their dietary intake of proteins and nucleic acids. In the body these macromolecules are broken down to amino acids and the components of nucleotides, and the nitrogen they contain is used to produce new proteins and nucleic acids, or utilized to make other molecules. About half of the 20 amino acids found in proteins are essential amino acids for vertebrates (Figure 2-87), which means that they cannot be synthesized from other ingredients of the diet. The others can be so synthesized, using a variety of raw materials, including intermediates of the citric acid cycle as described previously. The essential amino acids are made by plants and other organisms, usually by long and energetically expensive pathways that have been lost in the course of vertebrate evolution.
The nucleotides needed to make RNA and DNA can be synthesized using specialized biosynthetic pathways. All of the nitrogens in the purine and pyrimidine bases (as well as some of the carbons) are derived from the plentiful amino acids glutamine, aspartic acid, and glycine, whereas the ribose and deoxyribose sugars are derived from glucose. There are no "essential nucleotides" that must be provided in the diet.
Amino acids not used in biosynthesis can be oxidized to generate metabolic energy. Most of their carbon and hydrogen atoms eventually form CO2 or H2O, whereas their nitrogen atoms are shuttled through various forms and eventually appear as urea, which is excreted. Each amino acid is processed differently, and a whole constellation of enzymatic reactions exists for their catabolism.
Sulfur is abundant on Earth in its most oxidized form, sulfate (SO4²⁻). To convert it to forms useful for life, sulfate must be reduced to sulfide (S²⁻), the oxidation state of sulfur required for the synthesis of essential biological molecules. These molecules include the amino acids methionine and cysteine, coenzyme A (see Figure 2-62), and the iron-sulfur centers essential for electron transport (see Figure 14-23). The process begins in bacteria, fungi, and plants, where a special group of enzymes use ATP and reducing power to create a sulfate assimilation pathway. Humans and other animals cannot reduce sulfate and must therefore acquire the sulfur they need for their metabolism in the food that they eat.
Metabolism Is Organized and Regulated
One gets a sense of the intricacy of a cell as a chemical machine from the relation of glycolysis and the citric acid cycle to the other metabolic pathways sketched out in Figure 2-88. This type of chart, which was used earlier in this chapter to introduce metabolism, represents only some of the enzymatic pathways in a cell. It is obvious that our discussion of cell metabolism has dealt with only a tiny fraction of cellular chemistry.
All these reactions occur in a cell that is less than 0.1 mm in diameter, and each requires a different enzyme. As is clear from Figure 2-88, the same molecule can often be part of many different pathways. Pyruvate, for example, is a substrate for half a dozen or more different enzymes, each of which modifies it chemically in a different way. One enzyme converts pyruvate to acetyl CoA, another to oxaloacetate; a third enzyme changes pyruvate to the amino acid alanine, a fourth to lactate, and so on. All of these different pathways compete for the same pyruvate molecule, and similar competitions for thousands of other small molecules go on at the same time.
The situation is further complicated in a multicellular organism. Different cell types will in general require somewhat different sets of enzymes. And different tissues make distinct contributions to the chemistry of the organism as a whole. In addition to differences in specialized products such as hormones or antibodies, there are significant differences in the "common" metabolic pathways among various types of cells in the same organism.
Although virtually all cells contain the enzymes of glycolysis, the citric acid cycle, lipid synthesis and breakdown, and amino acid metabolism, the levels of
THE ESSENTIAL AMINO ACIDS
Figure 2-87 The nine essential amino acids. These cannot be synthesized by human cells and so must be supplied in the diet.
Figure 2-88 Glycolysis and the citric acid cycle are at the center of metabolism. Some 500 metabolic reactions of a typical cell are shown schematically with the reactions of glycolysis and the citric acid cycle in red. Other reactions either lead into these two central pathways, delivering small molecules to be catabolized with production of energy, or they lead outward and thereby supply carbon compounds for the purpose of biosynthesis.
these processes required in different tissues are not the same. For example, nerve cells, which are probably the most fastidious cells in the body, maintain almost no reserves of glycogen or fatty acids and rely almost entirely on a constant supply of glucose from the bloodstream. In contrast, liver cells supply glucose to actively contracting muscle cells and recycle the lactic acid produced by muscle cells back into glucose. All types of cells have their distinctive metabolic traits, and they cooperate extensively in the normal state, as well as in response to stress and starvation.
One might think that the whole system would need to be so finely balanced that any minor upset, such as a temporary change in dietary intake, would be disastrous. In fact, the metabolic balance of a cell is amazingly stable. Whenever the balance is perturbed, the cell reacts so as to restore the initial state. The cell can adapt and continue to function during starvation or disease. Mutations of many kinds can damage or even eliminate particular reaction pathways, and yet, provided that certain minimum requirements are met, the cell survives. It does so because an elaborate network of control mechanisms regulates and coordinates the rates of all of its reactions. These controls rest, ultimately, on the remarkable abilities of proteins to change their shape and their chemistry in response to changes in their immediate environment. The principles that underlie how large molecules such as proteins are built and the chemistry behind their regulation will be our next concern.
END-OF-CHAPTER PROBLEMS
Summary
Glucose and other food molecules are broken down by controlled stepwise oxidation to provide chemical energy in the form of ATP and NADH. There are three main sets of reactions that act in series, the products of each being the starting material for the next: glycolysis (which occurs in the cytosol), the citric acid cycle (in the mitochondrial matrix), and oxidative phosphorylation (on the inner mitochondrial membrane). The intermediate products of glycolysis and the citric acid cycle are used both as sources of metabolic energy and to produce many of the small molecules used as the raw materials for biosynthesis. Cells store sugar molecules as glycogen in animals and starch in plants; both plants and animals also use fats extensively as a food store. These storage materials in turn serve as a major source of food for humans, along with the proteins that comprise the majority of the dry mass of most of the cells in the foods we eat.
PROBLEMS
Table Q2-1 Radioactive isotopes and some of their properties (Problem 2-12).

RADIOACTIVE ISOTOPE   EMISSION     HALF-LIFE     MAXIMUM SPECIFIC ACTIVITY (Ci/mmol)
14C                   β particle   5730 years    0.062
3H                    β particle   12.3 years    29
35S                   β particle   87.4 days     1490
32P                   β particle   14.3 days     9120

Which statements are true? Explain why or why not.
2-1 Of the original radioactivity in a sample, only about 1/1000 will remain after 10 half-lives.
2-2 A 10⁻⁸ M solution of HCl has a pH of 8.
2-3 Most of the interactions between macromolecules could be mediated just as well by covalent bonds as by noncovalent bonds.
2-4 Animals and plants use oxidation to extract energy from food molecules.
2-5 If an oxidation occurs in a reaction, it must be accompanied by a reduction.
2-6 Linking the energetically unfavorable reaction A → B to a second, favorable reaction B → C will shift the equilibrium constant for the first reaction.
2-7 The criterion for whether a reaction proceeds spontaneously is ΔG, not ΔG°, because ΔG takes into account the concentrations of the substrates and products.
2-8 Because glycolysis is only a prelude to the oxidation of glucose in mitochondria, which yields 15-fold more ATP, glycolysis is not really important for human cells.
2-9 The oxygen consumed during the oxidation of glucose in animal cells is returned as CO2 to the atmosphere.
Discuss the following problems.
2-10 The organic chemistry of living cells is said to be special for two reasons: it occurs in an aqueous environment and it accomplishes some very complex reactions. But do you suppose it is really all that much different from the organic chemistry carried out in the top laboratories in the world? Why or why not?
2-11 The molecular weight of ethanol (CH3CH2OH) is 46 and its density is 0.789 g/cm³.
A. What is the molarity of ethanol in beer that is 5% ethanol by volume? [Alcohol content of beer varies from about 4% (lite beer) to 8% (stout beer).]
B. The legal limit for a driver's blood alcohol content varies, but 80 mg of ethanol per 100 mL of blood (usually referred to as a blood alcohol level of 0.08) is typical. What is the molarity of ethanol in a person at this legal limit?
C. How many 12-oz (355-mL) bottles of 5% beer could a 70-kg person drink and remain under the legal limit? A 70-kg person contains about 40 liters of water. Ignore the metabolism of ethanol, and assume that the water content of the person remains constant.
D. Ethanol is metabolized at a constant rate of about 120 mg per hour per kg body weight, regardless of its concentration. If a 70-kg person were at twice the legal limit (160 mg/100 mL), how long would it take for their blood alcohol level to fall below the legal limit?
2-12 Specific activity refers to the amount of radioactivity per unit amount of substance, usually in biology expressed on a molar basis, for example, as Ci/mmol. [One curie (Ci) corresponds to 2.22 × 10¹² disintegrations per minute (dpm).] As apparent in Table Q2-1, which lists properties of four isotopes commonly used in biology, there is an inverse relationship between maximum specific activity and half-life. Do you suppose this is just a coincidence or is there an underlying reason? Explain your answer.
2-13 By a convenient coincidence the ion product of water, Kw = [H+][OH-], is a nice round number: 1.0 × 10⁻¹⁴ M².
A. Why is a solution at pH 7.0 said to be neutral?
B. What is the H+ concentration and pH of a 1 mM solution of NaOH?
C. If the pH of a solution is 5.0, what is the concentration of OH- ions?
2-14 Suggest a rank order for the pK values (from lowest to highest) for the carboxyl group on the aspartate side chain
in the following environments in a protein. Explain your ranking.
1. An aspartate side chain on the surface of a protein with no other ionizable groups nearby.
2. An aspartate side chain buried in a hydrophobic pocket on the surface of a protein.
3. An aspartate side chain in a hydrophobic pocket adjacent to a glutamate side chain.
4. An aspartate side chain in a hydrophobic pocket adjacent to a lysine side chain.
2-15 A histidine side chain is known to play an important role in the catalytic mechanism of an enzyme; however, it is not clear whether histidine is required in its protonated (charged) or unprotonated (uncharged) state. To answer this question you measure enzyme activity over a range of pH, with the results shown in Figure Q2-1. Which form of histidine is required for enzyme activity?
Figure Q2-1 Enzyme activity as a function of pH (Problem 2-15).
Figure Q2-2 Three molecules that illustrate the seven most common functional groups in biology (Problem 2-17). 1,3-Bisphosphoglycerate and pyruvate are intermediates in glycolysis and cysteine is an amino acid. [Structures shown: 1,3-bisphosphoglycerate, pyruvate, and cysteine.]
2-16 During an all-out sprint, muscles metabolize glucose anaerobically, producing a high concentration of lactic acid, which lowers the pH of the blood and of the cytosol and contributes to the fatigue sprinters experience well before their fuel reserves are exhausted. The main blood buffer against pH changes is the bicarbonate/CO2 system.

$$\mathrm{CO_2\,(gas)} \rightleftharpoons \mathrm{CO_2\,(dissolved)} \overset{\mathrm{p}K_1 = 2.3}{\rightleftharpoons} \mathrm{H_2CO_3} \overset{\mathrm{p}K_2 = 3.8}{\rightleftharpoons} \mathrm{H^+} + \mathrm{HCO_3^-} \overset{\mathrm{p}K_3 = 10.3}{\rightleftharpoons} \mathrm{H^+} + \mathrm{CO_3^{2-}}$$

To improve their performance, would you advise sprinters to hold their breath or to breathe rapidly for a minute immediately before the race? Explain your answer.
2-17 The three molecules in Figure Q2-2 contain the seven most common reactive groups in biology. Most molecules in the cell are built from these functional groups. Indicate and name the functional groups in these molecules.
2-18 "Diffusion" sounds slow, and over everyday distances it is, but on the scale of a cell it is very fast. The average instantaneous velocity of a particle in solution, that is, the velocity between collisions, is

$$v = (kT/m)^{1/2}$$

where k = 1.38 × 10⁻¹⁶ g cm²/K sec², T = temperature in K (37°C is 310 K), and m = mass in g/molecule. Calculate the instantaneous velocity of a water molecule (molecular mass = 18 daltons), a glucose molecule (molecular mass = 180 daltons), and a myoglobin molecule (molecular mass = 15,000 daltons) at 37°C. Just for fun, convert these numbers into kilometers/hour. Before you do any calculations, try to guess whether the molecules are moving at a slow crawl (under 1 km/hr), an easy walk (5 km/hr), or a record-setting sprint (40 km/hr).
2-19 Polymerization of tubulin subunits into microtubules occurs with an increase in the orderliness of the subunits (Figure Q2-3). Yet tubulin polymerization occurs with an increase in entropy (decrease in order). How can that be?
Figure Q2-3 Polymerization of tubulin subunits into a microtubule (Problem 2-19). The fates of one subunit (shaded) and its associated water molecules (small spheres) are shown.
2-20 A 70-kg adult human (154 lb) could meet his or her entire energy needs for one day by eating 3 moles of glucose (540 g). (We don't recommend this.) Each molecule of glucose generates 30 ATP when it is oxidized to CO2. The concentration of ATP is maintained in cells at about 2 mM, and a 70-kg adult has about 25 liters of intracellular fluid. Given that the ATP concentration remains constant in cells, calculate how many times per day, on average, each ATP molecule in the body is hydrolyzed and resynthesized.
2-21 Assuming that there are 5 × 10¹³ cells in the human body and that ATP is turning over at a rate of 10⁹ ATP per minute in each cell, how many watts is the human body consuming? (A watt is a joule per second, and there are 4.18 joules/calorie.) Assume that hydrolysis of ATP yields 12 kcal/mole.
2-22 Does a Snickers™ candy bar (65 g, 325 kcal) provide enough energy to climb from Zermatt (elevation 1660 m) to the top of the Matterhorn (4478 m, Figure Q2-4), or might
you need to stop at Hörnli Hut (3260 m) to eat another one? Imagine that you and your gear have a mass of 75 kg, and that all of your work is done against gravity (that is, you are just climbing straight up). Remember from your introductory physics course that

$$\text{work (J)} = \text{mass (kg)} \times g\ (\text{m/sec}^2) \times \text{height gained (m)}$$

where g is acceleration due to gravity (9.8 m/sec²). One joule is 1 kg m²/sec² and there are 4.18 kJ per kcal. What assumptions made here will greatly underestimate how much candy you need?
Figure Q2-4 The Matterhorn (Problem 2-22). (Courtesy of Zermatt Tourism.)
2-23 At first glance, fermentation of pyruvate to lactate appears to be an optional add-on reaction to glycolysis. After all, could cells growing in the absence of oxygen not simply discard pyruvate as a waste product? In the absence of fermentation, which products derived from glycolysis would accumulate in cells under anaerobic conditions? Could the metabolism of glucose via the glycolytic pathway continue in the absence of oxygen in cells that cannot carry out fermentation? Why or why not?
2-24 In the absence of oxygen, cells consume glucose at a high, steady rate. When oxygen is added, glucose consumption drops precipitously and is then maintained at the lower rate. Why is glucose consumed at a high rate in the absence of oxygen and at a low rate in its presence?
2-25 The liver provides glucose to the rest of the body between meals. It does so by breaking down glycogen, forming glucose 6-phosphate in the penultimate step. Glucose 6-phosphate is converted to glucose by splitting off the phosphate (ΔG° = -3.3 kcal/mole). Why do you suppose the liver removes the phosphate by hydrolysis, rather than reversing the reaction by which glucose 6-phosphate (G6P) is formed from glucose (glucose + ATP → G6P + ADP, ΔG° = -4.0 kcal/mole)? By reversing this reaction the liver could generate both glucose and ATP.
Figure Q2-5 The original labeling experiment to analyze fatty acid oxidation (Problem 2-26). (A) Fed and excreted derivatives of an even-number fatty acid chain. (B) Fed and excreted derivatives of an odd-number fatty acid chain. [Panel labels: fed compound and excreted compound; (A) eight-carbon chain, excreted as phenylacetate; (B) seven-carbon chain, excreted as benzoate.]
2-26 In 1904 Franz Knoop performed what was probably the first successful labeling experiment to study metabolic pathways. He fed many different fatty acids labeled with a terminal benzene ring to dogs and analyzed their urine for excreted benzene derivatives. Whenever the fatty acid had an even number of carbon atoms, phenylacetate was excreted (Figure Q2-5A). Whenever the fatty acid had an odd number of carbon atoms, benzoate was excreted (Figure Q2-5B). From these experiments Knoop deduced that oxidation of fatty acids to CO2 and H2O involved the removal of two-carbon fragments from the carboxylic acid end of the chain. Can you explain the reasoning that led him to conclude that two-carbon fragments, as opposed to any other number, were removed, and that degradation was from the carboxylic acid end, as opposed to the other end?
2-27 Pathways for synthesis of amino acids in microorganisms were worked out in part by cross-feeding experiments among mutant organisms that were defective for individual steps in the pathway. Results of cross-feeding experiments for three mutants defective in the tryptophan pathway (TrpB-, TrpD-, and TrpE-) are shown in Figure Q2-6. The mutants were streaked on a Petri dish and allowed to grow briefly in the presence of a very small amount of tryptophan, producing three pale streaks. As shown, heavier growth was observed at points where some streaks were close to other streaks. These spots of heavier growth indicate that one mutant can cross-feed (supply an intermediate) to the other one. From the pattern of cross-feeding shown in Figure Q2-6, deduce the order of the steps controlled by the products of the TrpB, TrpD, and TrpE genes. Explain your reasoning.
Figure Q2-6 Defining the pathway for tryptophan synthesis using cross-feeding experiments (Problem 2-27). Results of a cross-feeding experiment among mutants defective for steps in the tryptophan biosynthetic pathway. Dark areas on the Petri dish show regions of cell growth. [Panel label: CROSS-FEEDING RESULT; streaks labeled TrpB-, TrpD-, and TrpE-.]
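As a numerical aid for exploring Problem 2-12, the following Python sketch estimates the specific activity of an isotopically pure sample from its half-life alone. It assumes simple first-order decay (decay constant = ln 2 / half-life) and uses Avogadro's number, which is a standard constant and not given in the text; the conversion of 2.22 × 10¹² dpm per curie is the one quoted in the problem. The function and variable names are illustrative only.

```python
import math

AVOGADRO = 6.022e23          # atoms per mole (standard constant, not from the text)
DPM_PER_CURIE = 2.22e12      # disintegrations per minute per Ci (Problem 2-12)
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365.25 * MINUTES_PER_DAY


def max_specific_activity(half_life_minutes: float) -> float:
    """Return the specific activity (Ci/mmol) of an isotopically pure sample."""
    decay_constant = math.log(2) / half_life_minutes   # fraction decaying per minute
    atoms_per_mmol = AVOGADRO / 1000
    dpm_per_mmol = decay_constant * atoms_per_mmol
    return dpm_per_mmol / DPM_PER_CURIE


# Half-lives from Table Q2-1, converted to minutes.
half_lives = {
    "14C": 5730 * MINUTES_PER_YEAR,
    "3H": 12.3 * MINUTES_PER_YEAR,
    "35S": 87.4 * MINUTES_PER_DAY,
    "32P": 14.3 * MINUTES_PER_DAY,
}

for isotope, t_half in half_lives.items():
    print(f"{isotope}: {max_specific_activity(t_half):.3g} Ci/mmol")
```

Under these assumptions the computed values agree closely with the maximum specific activities listed in Table Q2-1, and the product of half-life and specific activity is the same for every isotope.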
CARBON SKELETONS
Carbon has a unique role in the cell because of its ability to form strong covalent bonds with other carbon atoms. Thus carbon atoms can join to form chains or branched trees.
[Structure diagrams: carbon skeletons, including chains and branched trees, each with an "also written as" shorthand form.]
COVALENT BONDS
A covalent bond forms when two atoms come very close together and share one or more of their electrons. In a single bond one electron from each of the two atoms is shared; in a double bond a total of four electrons are shared.
Each atom forms a fixed number of covalent bonds in a defined spatial arrangement. For example, carbon forms four single bonds arranged tetrahedrally, whereas nitrogen forms three single bonds and oxygen forms two single bonds. [Diagram: bond arrangements of carbon, nitrogen, and oxygen.]
Atoms joined by two or more covalent bonds cannot rotate freely around the bond axis. This restriction is a major influence on the three-dimensional shape of many macromolecules.

HYDROCARBONS
Carbon and hydrogen combine together to make stable compounds (or chemical groups) called hydrocarbons. These are nonpolar, do not form hydrogen bonds, and are generally insoluble in water.
[Diagram: part of the hydrocarbon "tail" of a fatty acid molecule.]

ALTERNATING DOUBLE BONDS
The carbon chain can include double bonds. If these are on alternate carbon atoms, the bonding electrons move within the molecule, stabilizing the structure by a phenomenon called resonance. [Diagram: two alternative structures; the truth is somewhere between these two structures.]
Alternating double bonds in a ring can generate a very stable structure. [Diagram: ring structure with its "often written as" shorthand form.]
C-O CHEMICAL GROUPS
Many biological compounds contain a carbon bonded to an oxygen. For example, [structure diagrams not recoverable]
The -COOH is called a carboxyl group. In water this loses an H+ ion to become -COO-.
Esters are formed by a condensation reaction between an acid and an alcohol. [Diagram: acid + alcohol gives ester + H2O.]

C-N CHEMICAL GROUPS
Amines and amides are two important examples of compounds containing a carbon linked to a nitrogen.
Amines in water combine with an H+ ion to become positively charged.
Amides are formed by combining an acid and an amine. Unlike amines, amides are uncharged in water. An example is the peptide bond that joins amino acids in a protein. [Diagram: acid + amine gives amide + H2O.]
Nitrogen also occurs in several ring compounds, including important constituents of nucleic acids: purines and pyrimidines. [Structure: cytosine (a pyrimidine).]
SULFHYDRYL GROUP
The -C-SH is called a sulfhydryl group. In the amino acid cysteine the sulfhydryl group may exist in the reduced form, -C-SH, or more rarely in an oxidized, cross-bridging form, -C-S-S-C-.
PHOSPHATES
Inorganic phosphate is a stable ion formed from phosphoric acid, H3PO4. It is often written as Pi.
Phosphate esters can form between a phosphate and a free hydroxyl group. Phosphate groups are often attached to proteins in this way.
The combination of a phosphate and a carboxyl group, or two or more phosphate groups, gives an acid anhydride.
[Diagram: a phosphate and a carboxyl group form a high-energy acyl phosphate bond (carboxylic-phosphoric acid anhydride), found in some metabolites, with release of H2O.]
[Diagram: two phosphate groups form a phosphoanhydride, a high-energy bond found in molecules such as ATP, with release of H2O.]
WATER
Two atoms, connected by a covalent bond, may exert different attractions for the electrons of the bond. In such cases the bond is polar, with one end slightly negatively charged (δ-) and the other slightly positively charged (δ+).
Although a water molecule has an overall neutral charge (having the same number of electrons and protons), the electrons are asymmetrically distributed, which makes the molecule polar. The oxygen nucleus draws electrons away from the hydrogen nuclei, leaving these nuclei with a small net positive charge. The excess of electron density on the oxygen atom creates weakly negative regions at the other two corners of an imaginary tetrahedron. [Diagram label: electronegative region.]

WATER STRUCTURE
Molecules of water join together transiently in a hydrogen-bonded lattice. Even at 37°C, 15% of the water molecules are joined to four others in a short-lived assembly known as a "flickering cluster."
The cohesive nature of water is responsible for many of its unusual properties, such as high surface tension, specific heat, and heat of vaporization.

HYDROGEN BONDS
Because they are polarized, two adjacent H2O molecules can form a linkage known as a hydrogen bond. Hydrogen bonds have only about 1/20 the strength of a covalent bond.
Hydrogen bonds are strongest when the three atoms lie in a straight line. [Diagram: bond lengths; hydrogen bond 0.27 nm, compared with the covalent bond.]
HYDROPHILIC MOLECULES
Substances that dissolve readily in water are termed hydrophilic. They are composed of ions or polar molecules that attract water molecules through electrical charge effects. Water molecules surround each ion or polar molecule on the surface of a solid substance and carry it into solution.
Ionic substances such as sodium chloride dissolve because water molecules are attracted to the positive (Na+) or negative (Cl-) charge of each ion.
Polar substances such as urea dissolve because their molecules form hydrogen bonds with the surrounding water molecules.

HYDROPHOBIC MOLECULES
Molecules that contain a preponderance of nonpolar bonds are usually insoluble in water and are termed hydrophobic. This is true, especially, of hydrocarbons, which contain many C-H bonds. Water molecules are not attracted to such molecules and so have little tendency to surround them and carry them into solution.
WATER AS A SOLVENT
Many substances, such as household sugar, dissolve in water. That is, their molecules separate from each other, each becoming surrounded by water molecules.
When a substance dissolves in a liquid, the mixture is termed a solution. The dissolved substance (in this case sugar) is the solute, and the liquid that does the dissolving (in this case water) is the solvent. Water is an excellent solvent for many substances because of its polar bonds.
[Diagram labels: sugar crystal; sugar molecule.]
ACIDS
Substances that release hydrogen ions into solution are called acids.

$$\mathrm{HCl} \rightarrow \mathrm{H^+} + \mathrm{Cl^-}$$

(hydrochloric acid, a strong acid, gives a hydrogen ion and a chloride ion)

Many of the acids important in the cell are only partially dissociated, and they are therefore weak acids; for example, the carboxyl group (-COOH), which dissociates to give a hydrogen ion in solution:

$$\mathrm{-COOH} \rightleftharpoons \mathrm{H^+} + \mathrm{-COO^-}$$

Note that this is a reversible reaction.

HYDROGEN ION EXCHANGE
Positively charged hydrogen ions (H+) can spontaneously move from one water molecule to another, thereby creating two ionic species:

$$\mathrm{H_2O} + \mathrm{H_2O} \rightleftharpoons \mathrm{H_3O^+} + \mathrm{OH^-}$$

(hydronium ion, water acting as a weak base; hydroxyl ion, water acting as a weak acid)

often written as:

$$\mathrm{H_2O} \rightleftharpoons \mathrm{H^+} + \mathrm{OH^-}$$

(hydrogen ion and hydroxyl ion)

Since the process is rapidly reversible, hydrogen ions are continually shuttling between water molecules. Pure water contains a steady-state concentration of hydrogen ions and hydroxyl ions (both 10⁻⁷ M).
BASES
Substances that reduce the number of hydrogen ions in solution are called bases. Some bases, such as ammonia, combine directly with hydrogen ions.

$$\mathrm{NH_3} + \mathrm{H^+} \rightarrow \mathrm{NH_4^+}$$

(ammonia and a hydrogen ion give an ammonium ion)

Other bases, such as sodium hydroxide, reduce the number of H+ ions indirectly, by making OH- ions that then combine directly with H+ ions to make H2O.

$$\mathrm{NaOH} \rightarrow \mathrm{Na^+} + \mathrm{OH^-}$$

(sodium hydroxide, a strong base, gives a sodium ion and a hydroxyl ion)

Many bases found in cells are partially dissociated and are termed weak bases. This is true of compounds that contain an amino group (-NH2), which has a weak tendency to reversibly accept an H+ ion from water, increasing the quantity of free OH- ions.

$$\mathrm{-NH_2} + \mathrm{H^+} \rightleftharpoons \mathrm{-NH_3^+}$$

pH
The acidity of a solution is defined by the concentration of H+ ions it possesses. For convenience we use the pH scale, where

$$\mathrm{pH} = -\log_{10}[\mathrm{H^+}]$$

For pure water, [H+] = 10⁻⁷ moles/liter, so the pH is 7.
[pH scale: H+ concentrations from 10⁻¹ to 10⁻¹⁴ moles/liter correspond to pH values from 1 to 14.]
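The pH definition in this panel can be turned into a few small helper functions. The Python sketch below is illustrative only: the function names are made up for this example, and the ion product of water, 1.0 × 10⁻¹⁴ M², is the value quoted in Problem 2-13 of this chapter.

```python
import math

KW = 1.0e-14  # ion product of water, [H+][OH-], in M^2 (value quoted in Problem 2-13)


def ph_from_h(h_conc: float) -> float:
    """Return the pH for an H+ concentration given in moles/liter."""
    if h_conc <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(h_conc)


def h_from_ph(ph: float) -> float:
    """Return the H+ concentration (moles/liter) for a given pH."""
    return 10.0 ** (-ph)


def oh_from_h(h_conc: float) -> float:
    """Return the OH- concentration (moles/liter) implied by the ion product of water."""
    return KW / h_conc


pure_water_h = 1e-7  # moles/liter, as stated for pure water
print(f"pH of pure water: {ph_from_h(pure_water_h):.1f}")
print(f"[OH-] in pure water: {oh_from_h(pure_water_h):.1e} M")
```

Because the scale is logarithmic, each unit of pH corresponds to a tenfold change in H+ concentration, which is why the helpers convert through powers of ten.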
WEAK CHEMICAL BONDS
Organic molecules can interact with other molecules through three types of short-range attractive forces known as noncovalent bonds: van der Waals attractions, electrostatic attractions, and hydrogen bonds. The repulsion of hydrophobic groups from water is also important for ordering biological macromolecules.
Weak chemical bonds have less than 1/20 the strength of a strong covalent bond. They are strong enough to provide tight binding only when many of them are formed simultaneously.

VAN DER WAALS ATTRACTIONS
If two atoms are too close together they repel each other very strongly. For this reason, an atom can often be treated as a sphere with a fixed radius. The characteristic "size" for each atom is specified by a unique van der Waals radius. The contact distance between any two noncovalently bonded atoms is the sum of their van der Waals radii. [Diagram: atoms drawn as spheres, with example radii of 0.15 nm and 0.14 nm.]
At very short distances any two atoms show a weak bonding interaction due to their fluctuating electrical charges. The two atoms will be attracted to each other in this way until the distance between their nuclei is approximately equal to the sum of their van der Waals radii. Although they are individually very weak, van der Waals attractions can become important when two macromolecular surfaces fit very close together, because many atoms are involved.
Note that when two atoms form a covalent bond, the centers of the two atoms (the two atomic nuclei) are much closer together than the sum of the two van der Waals radii. Thus: [Diagram label: single-bonded carbons, 0.15 nm.]

HYDROGEN BONDS
As already described for water (see Panel 2-2), hydrogen bonds form when a hydrogen atom is "sandwiched" between two electron-attracting atoms (usually oxygen or nitrogen).
Hydrogen bonds are strongest when the three atoms are in a straight line.
Examples in macromolecules:
Amino acids in polypeptide chains hydrogen-bonded together.
Two bases, G and C, hydrogen-bonded in DNA or RNA.

HYDROGEN BONDS IN WATER
Any molecules that can form hydrogen bonds to each other can alternatively form hydrogen bonds to water molecules. Because of this competition with water molecules, the hydrogen bonds formed between two molecules dissolved in water are relatively weak.
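The statement in this panel that weak bonds give tight binding only when many form simultaneously can be made concrete with a rough calculation. The Python sketch below rests on several assumptions that are not in the panel: it uses the standard relation K = exp(-ΔG°/RT) between free-energy change and equilibrium constant, it treats the contributions of individual weak bonds as simply additive, and it assigns each bond a hypothetical round value of -1 kcal/mole. The names are illustrative only.

```python
import math

R_KCAL = 1.987e-3      # gas constant in kcal/(mole K)
T_BODY = 310.0         # 37 degrees C expressed in kelvin
DG_PER_BOND = -1.0     # kcal/mole per weak bond: a hypothetical round number


def equilibrium_constant(n_bonds: int,
                         dg_per_bond: float = DG_PER_BOND,
                         temperature: float = T_BODY) -> float:
    """Return K = exp(-dG/RT), assuming n additive weak bonds of equal strength."""
    total_dg = n_bonds * dg_per_bond
    return math.exp(-total_dg / (R_KCAL * temperature))


for n in (1, 2, 5, 10):
    print(f"{n:2d} bonds: K ~ {equilibrium_constant(n):.3g}")
```

Under these assumptions the equilibrium constant grows exponentially with the number of bonds, so a handful of weak bonds favors association only slightly while ten of them favor it by many orders of magnitude. Real binding also involves entropy changes and competition with water, which this sketch ignores.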
HYDROPHOBIC FORCES
Water forces hydrophobic groups together, because doing so minimizes their disruptive effects on the hydrogen-bonded water network. Hydrophobic groups held together in this way are sometimes said to be held together by "hydrophobic bonds," even though the apparent attraction is actually caused by a repulsion from the water.
ELECTROSTATIC ATTRACTIONS
Attractive forces occur both between fully charged groups (ionic bond) and between the partially charged groups on polar molecules.
The force of attraction between the two charges, δ+ and δ−, falls off rapidly as the distance between the charges increases.
In the absence of water, electrostatic forces are very strong. They are responsible for the strength of such minerals as marble and agate, and for crystal formation in common table salt, NaCl.
[Figure: a crystal of salt, NaCl.]
ELECTROSTATIC ATTRACTIONS IN AQUEOUS SOLUTIONS
Charged groups are shielded by their interactions with water molecules. Electrostatic attractions are therefore quite weak in water.
Similarly, ions in solution can cluster around charged groups and further weaken these attractions.
Despite being weakened by water and salt, electrostatic attractions are very important in biological systems. For example, an enzyme that binds a positively charged substrate will often have a negatively charged amino acid side chain at the appropriate place.
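To make the effect of water on electrostatic attractions concrete, here is a rough Python sketch based on Coulomb's law. The panel does not state Coulomb's law, so its use here is an assumption, as are the bulk relative permittivity of about 80 for water, the 0.3 nm separation, and the function name. A bulk permittivity is only a crude stand-in for what happens between two groups a few tenths of a nanometre apart.

```python
COULOMB_CONSTANT = 8.9875517923e9    # N m^2 C^-2
ELEMENTARY_CHARGE = 1.602176634e-19  # C
AVOGADRO = 6.02214076e23             # mol^-1
JOULES_PER_KCAL = 4184.0


def interaction_energy_kcal_per_mol(z1, z2, distance_nm, relative_permittivity=1.0):
    """Coulomb energy of two point charges (in units of e), per mole of pairs.

    ASSUMPTION: point charges in a uniform medium; negative means attraction.
    """
    distance_m = distance_nm * 1e-9
    energy_joules = (
        COULOMB_CONSTANT
        * (z1 * ELEMENTARY_CHARGE)
        * (z2 * ELEMENTARY_CHARGE)
        / (relative_permittivity * distance_m)
    )
    return energy_joules * AVOGADRO / JOULES_PER_KCAL


if __name__ == "__main__":
    in_vacuum = interaction_energy_kcal_per_mol(+1, -1, 0.3)
    in_water = interaction_energy_kcal_per_mol(+1, -1, 0.3, relative_permittivity=80.0)
    print(f"vacuum: {in_vacuum:.1f} kcal/mol, water: {in_water:.2f} kcal/mol")
```

Under these assumptions the attraction in water is weaker than in vacuum by the permittivity factor, which is the qualitative point the panel makes about shielding.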
MONOSACCHARIDES
Monosaccharides usually have the general formula (CH2O)n, where n can be 3, 4, 5, 6, 7, or 8, and have two or more hydroxyl groups. They either contain an aldehyde group (-CHO) and are called aldoses or a ketone group (C=O) and are called ketoses.
3-carbon (TRIOSES): glyceraldehyde (aldose), dihydroxyacetone (ketose)
5-carbon (PENTOSES): ribose (aldose), ribulose (ketose)
6-carbon (HEXOSES): glucose (aldose), fructose (ketose)
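The general formula (CH2O)n can be expanded mechanically. The sketch below is illustrative: the function names are hypothetical and the atomic masses are standard rounded values that do not come from the panel.

```python
# Standard atomic masses (g/mol), rounded. ASSUMPTION: not given in the text.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def monosaccharide_formula(n: int) -> dict:
    """Atom counts for (CH2O)n, with n restricted to the range given in the text."""
    if not 3 <= n <= 8:
        raise ValueError("n must be between 3 and 8")
    return {"C": n, "H": 2 * n, "O": n}


def format_formula(counts: dict) -> str:
    """Render atom counts as a plain formula string, e.g. C6H12O6."""
    return "".join(f"{element}{count}" for element, count in counts.items())


def molar_mass(counts: dict) -> float:
    """Sum of atomic masses weighted by atom counts, in g/mol."""
    return sum(ATOMIC_MASS[element] * count for element, count in counts.items())


if __name__ == "__main__":
    for n, name in [(3, "triose"), (5, "pentose"), (6, "hexose")]:
        counts = monosaccharide_formula(n)
        print(name, format_formula(counts), f"{molar_mass(counts):.2f} g/mol")
```

Because an aldose and a ketose with the same n share one formula, the function cannot tell glucose from fructose; that distinction lies in the arrangement of atoms, not the counts.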
RING FORMATION
In aqueous solution, the aldehyde or ketone group of a sugar molecule tends to react with a hydroxyl group of the same molecule, thereby closing the molecule into a ring.
[Figure: open-chain and ring forms of glucose.] Note that each carbon atom has a number.
ISOMERS
Many monosaccharides differ only in the spatial arrangement of atoms; that is, they are isomers. For example, glucose, galactose, and mannose have the same formula (C6H12O6) but differ in the arrangement of groups around one or two carbon atoms.
[Figure: ring structures of the isomers, including glucose and mannose.]
These small differences make only minor changes in the chemical properties of the sugars. But they are recognized by enzymes and other proteins and therefore can have important biological effects.
α AND β LINKS
The hydroxyl group on the carbon that carries the aldehyde or ketone can rapidly change from one position to the other. These two positions are called α and β.
[Figure: α hydroxyl and β hydroxyl.]
As soon as one sugar is linked to another, the α or β form is frozen.
SUGAR DERIVATIVES
The hydroxyl groups of a simple monosaccharide can be replaced by other groups. For example: glucosamine.
DISACCHARIDES
The carbon that carries the aldehyde or the ketone can react with any hydroxyl group on a second sugar molecule to form a disaccharide. The linkage is called a glycosidic bond. Three common disaccharides are
maltose (glucose + glucose)
lactose (galactose + glucose)
sucrose (glucose + fructose)
The reaction forming sucrose is shown here.
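The atom bookkeeping behind a glycosidic bond can be sketched as follows. The sketch assumes, from standard chemistry rather than from the text of this panel, that forming the bond releases one molecule of water; the function name `join_sugars` is hypothetical.

```python
from collections import Counter

# All three hexoses named in the text share the formula C6H12O6.
HEXOSE = Counter({"C": 6, "H": 12, "O": 6})
WATER = Counter({"H": 2, "O": 1})


def join_sugars(sugar_a: Counter, sugar_b: Counter) -> Counter:
    """Atom counts of a disaccharide.

    ASSUMPTION: forming the glycosidic bond is a condensation that
    releases exactly one water molecule.
    """
    product = sugar_a + sugar_b
    product.subtract(WATER)
    return product


def format_formula(counts: Counter) -> str:
    """Render atom counts in C, H, O order, e.g. C12H22O11."""
    return "".join(f"{element}{counts[element]}" for element in ("C", "H", "O"))


if __name__ == "__main__":
    print("disaccharide of two hexoses:", format_formula(join_sugars(HEXOSE, HEXOSE)))
```

Maltose, lactose, and sucrose are each built from two hexoses, so under this assumption all three share one formula and differ only in which sugars are joined and how.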
OLIGOSACCHARIDES AND POLYSACCHARIDES
Large linear and branched molecules can be made from simple repeating sugar subunits. Short chains are called oligosaccharides, while long chains are called polysaccharides. Glycogen, for example, is a polysaccharide made entirely of glucose units joined together.
[Figure: glycogen.]
COMPLEX OLIGOSACCHARIDES
In many cases a sugar sequence is nonrepetitive. Many different molecules are possible. Such complex oligosaccharides are usually linked to proteins or to lipids, as is this oligosaccharide, which is part of a cell-surface molecule that defines a particular blood group.
COMMON FATTY ACIDS
These are carboxylic acids with long hydrocarbon tails.
Hundreds of different kinds of fatty acids exist. Some have one or more double bonds in their hydrocarbon tail and are said to be unsaturated. Fatty acids with no double bonds are saturated.
A double bond is rigid and creates a kink in the chain. The rest of the chain is free to rotate about the other C-C bonds.
Fatty acids are stored as an energy reserve (fats and oils) through an ester linkage to glycerol to form triacylglycerols, also known as triglycerides.
[Figure: palmitic acid (C16) and stearic acid (C18), both SATURATED, and oleic acid (C18), UNSATURATED, with a space-filling model and a carbon skeleton.]
CARBOXYL GROUP
If free, the carboxyl group of a fatty acid will be ionized. But more usually it is linked to other groups to form either esters or amides.
PHOSPHOLIPIDS
Phospholipids are the major constituents of cell membranes.
In phospholipids two of the -OH groups in glycerol are linked to fatty acids, while the third -OH group is linked to phosphoric acid. The phosphate is further linked to one of a variety of small polar groups (alcohols).
[Figure: general structure of a phospholipid; space-filling model of the phospholipid phosphatidylcholine.]
LIPID AGGREGATES
In water, fatty acids can form a surface film or form small micelles.
Their derivatives can form larger aggregates held together by hydrophobic forces:
Triglycerides can form large spherical fat droplets in the cell cytoplasm.
Phospholipids and glycolipids form self-sealing lipid bilayers that are the basis for all cell membranes.
OTHER LIPIDS
Lipids are defined as the water-insoluble molecules in cells that are soluble in organic solvents. Two other common types of lipids are steroids and polyisoprenoids. Both are made from isoprene units.
[Figure: isoprene.]
STEROIDS
Steroids have a common multiple-ring structure.
cholesterol: found in many membranes
POLYISOPRENOIDS
long-chain polymers of isoprene
dolichol phosphate: used to carry activated sugars in the membrane-associated synthesis of glycoproteins and some polysaccharides
GLYCOLIPIDS
Like phospholipids, these compounds are composed of a hydrophobic
NUCLEOTIDES
A nucleotide consists of a nitrogen-containing base, a five-carbon sugar, and one or more phosphate groups.
PHOSPHATES
The phosphates are normally joined to the C5 hydroxyl of the ribose or deoxyribose sugar (designated 5'). Mono-, di-, and triphosphates are common.
The phosphate makes a nucleotide negatively charged.
BASE-SUGAR LINKAGE
The base is linked to the same carbon (C1) used in sugar-sugar bonds.
BASES
The bases are nitrogen-containing ring compounds, either pyrimidines or purines.
[Figure: structures of the bases, including cytosine, uracil, and thymine.]
SUGARS
PENTOSE: a five-carbon sugar
Each numbered carbon on the sugar of a nucleotide is followed by a prime mark; therefore, one speaks of the "5-prime carbon," etc.
Two kinds are used:
β-D-ribose, used in ribonucleic acid
β-D-2-deoxyribose, used in deoxyribonucleic acid
NOMENCLATURE
A nucleoside or nucleotide is named according to its nitrogenous base. Single-letter abbreviations are used variously as shorthand for (1) the base alone, (2) the nucleoside, or (3) the whole nucleotide; the context will usually make clear which of the three entities is meant. When the context is not sufficient, we will add the terms "base," "nucleoside," "nucleotide," or, as in the examples below, use the full 3-letter nucleotide code.
BASE + SUGAR = NUCLEOSIDE
BASE + SUGAR + PHOSPHATE = NUCLEOTIDE
AMP = adenosine monophosphate
dAMP = deoxyadenosine monophosphate
UDP = uridine diphosphate
ATP = adenosine triphosphate
NUCLEIC ACIDS
Nucleotides are joined together by a phosphodiester linkage between 5' and 3' carbon atoms to form nucleic acids. The linear sequence of nucleotides in a nucleic acid chain is commonly abbreviated by a one-letter code, A-G-C-T-T-A-C-A, with the 5' end of the chain at the left.
NUCLEOTIDES HAVE MANY OTHER FUNCTIONS
They carry chemical energy in their easily hydrolyzed phosphoanhydride bonds.
example: ATP
They combine with other groups to form coenzymes.
example: coenzyme A (CoA)
They are used as specific signaling molecules in the cell.
example: cyclic AMP (cAMP)
THE IMPORTANCE OF FREE ENERGY FOR CELLS
Life is possible because of the complex network of interacting chemical reactions occurring in every cell. In viewing the metabolic pathways that comprise this network, one might suspect that the cell has had the ability to evolve an enzyme to carry out any reaction that it needs. But this is not so. Although enzymes are powerful catalysts, they can speed up only those reactions that are thermodynamically possible; other reactions proceed in cells only because they are coupled to very favorable reactions that drive them. The question of whether a reaction can occur spontaneously, or instead needs to be coupled to another reaction, is central to cell biology. The answer is obtained by reference to a quantity called the free energy: the total change in free energy during a set of reactions determines whether or not the entire reaction sequence can occur. In this panel we shall explain some of the fundamental ideas, derived from a special branch of chemistry and physics called thermodynamics, that are required for understanding what free energy is and why it is so important to cells.
ENERGY RELEASED BY CHANGES IN CHEMICAL BONDING IS CONVERTED INTO HEAT
[Figure: a "cell in a box," surrounded by the SEA, within the UNIVERSE.]
An enclosed system is defined as a collection of molecules that does not exchange matter with the rest of the universe (for example, the "cell in a box" shown above). Any such system will contain molecules with a total energy $E$. This energy will be distributed in a variety of ways: some as the translational energy of the molecules, some as their vibrational and rotational energies, but most as the bonding energies between the individual atoms that make up the molecules. Suppose that a reaction occurs in the system. The first law of thermodynamics places a constraint on what types of reactions are possible: it states that "in any process, the total energy of the universe remains constant." For example, suppose that reaction A → B occurs somewhere in the box and releases a great deal of chemical bond energy. This energy will initially increase the intensity of molecular motions (translational, vibrational, and rotational) in the system, which is equivalent to raising its temperature. However, these increased motions will soon be transferred out of the system by a series of molecular collisions that heat up first the walls of the box and then the outside world (represented by the sea in our example). In the end, the system returns to its initial temperature, by which time all the chemical bond energy released in the box has been converted into heat energy and transferred out of the box to the surroundings. According to the first law, the change in the energy in the box ($\Delta E_{\text{box}}$, which we shall denote as $\Delta E$) must be equal and opposite to the amount of heat energy transferred, which we shall designate as $h$: that is, $\Delta E = -h$. Thus, the energy in the box ($E$) decreases when heat leaves the system.
$E$ also can change during a reaction as a result of work being done on the outside world. For example, suppose that there is a small increase in the volume ($\Delta V$) of the box during a reaction. Since the walls of the box must push against the constant pressure ($P$) in the surroundings in order to expand, this does work on the outside world and requires energy. The energy used is $P(\Delta V)$, which according to the first law must decrease the energy in the box ($E$) by the same amount. In most reactions chemical bond energy is converted into both work and heat. Enthalpy ($H$) is a composite function that includes both of these ($H = E + PV$). To be rigorous, it is the change in enthalpy ($\Delta H$) in an enclosed system, and not the change in energy, that is equal to the heat transferred to the outside world during a reaction. Reactions in which $H$ decreases release heat to the surroundings and are said to be "exothermic," while reactions in which $H$ increases absorb heat from the surroundings and are said to be "endothermic." Thus, $-h = \Delta H$. However, the volume change is negligible in most biological reactions, so to a good approximation
$$-h = \Delta H \cong \Delta E$$
THE SECOND LAW OF THERMODYNAMICS
Consider a container in which 1000 coins are all lying heads up. If the container is shaken vigorously, subjecting the coins to the types of random motions that all molecules experience due to their frequent collisions with other molecules, one will end up with about half the coins oriented heads down. The reason for this reorientation is that there is only a single way in which the original orderly state of the coins can be reinstated (every coin must lie heads up), whereas there are many different ways (about $10^{298}$) to achieve a disorderly state in which there is an equal mixture of heads and tails; in fact, there are more ways to achieve a 50-50 state than to achieve any other state. Each state has a probability of occurrence that is proportional to the number of ways it can be realized. The second law of thermodynamics states that "systems will change spontaneously from states of lower probability to states of higher probability." Since states of lower probability are more "ordered" than states of high probability, the second law can be restated: "the universe constantly changes so as to become more disordered."
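The counting argument for the coins can be checked directly with exact integer arithmetic. The sketch below is illustrative; `ways` is a hypothetical helper built on Python's `math.comb` (available from Python 3.8).

```python
import math


def ways(n_coins: int, n_heads: int) -> int:
    """Number of distinct arrangements with exactly n_heads coins heads up."""
    return math.comb(n_coins, n_heads)


def most_likely_head_count(n_coins: int) -> int:
    """Head count that can be realized in the largest number of ways."""
    return max(range(n_coins + 1), key=lambda k: ways(n_coins, k))


if __name__ == "__main__":
    n = 1000
    print("ways to get all heads:", ways(n, n))
    print("log10(ways to get 500 heads):", round(math.log10(ways(n, n // 2)), 2))
    print("most likely head count:", most_likely_head_count(n))
```

The all-heads state can be realized in exactly one way, while the exact count for the 50-50 state is a number of roughly 300 digits, of the same enormous order as the rounded figure quoted in the text. No other head count can be realized in more ways.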
THE ENTROPY, S
The second law (but not the first law) allows one to predict the direction of a particular reaction. But to make it useful for this purpose, one needs a convenient measure of the probability or, equivalently, the degree of disorder of a state. The entropy ($S$) is such a measure. It is a logarithmic function of the probability such that the change in entropy ($\Delta S$) that occurs when the reaction A → B converts one mole of A into one mole of B is
$$\Delta S = R \ln \frac{p_B}{p_A}$$
where $p_A$ and $p_B$ are the probabilities of the two states A and B, $R$ is the gas constant (2 cal deg$^{-1}$ mole$^{-1}$), and $\Delta S$ is measured in entropy units (eu). In our initial example of 1000 coins, the relative probability of all heads (state A) versus half heads and half tails (state B) is equal to the ratio of the number of different ways that the two results can be obtained. One can calculate that $p_A = 1$ and $p_B = 1000!/(500! \times 500!) \approx 10^{299}$. Therefore, the entropy change for the reorientation of the coins when their container is vigorously shaken and an equal mixture of heads and tails is obtained is $R \ln (10^{298})$, or about 1370 eu per mole of such containers ($6 \times 10^{23}$ containers). We see that, because $\Delta S$ defined above is positive for the transition from state A to state B ($p_B/p_A > 1$), reactions with a large increase in $S$ (that is, for which $\Delta S > 0$) are favored and will occur spontaneously.
As discussed in Chapter 2, heat energy causes the random commotion of molecules. Because the transfer of heat from an enclosed system to its surroundings increases the number of different arrangements that the molecules in the outside world can have, it increases their entropy. It can be shown that the release of a fixed quantity of heat energy has a greater disordering effect at low temperature than at high temperature, and that the value of $\Delta S$ for the surroundings, as defined above ($\Delta S_{\text{sea}}$), is precisely equal to $h$, the amount of heat transferred to the surroundings from the system, divided by the absolute temperature ($T$):
$$\Delta S_{\text{sea}} = \frac{h}{T}$$
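The entropy figure quoted for the coins can be reproduced with a short calculation. This is a sketch under the text's own rounding: it uses the gas constant rounded to 2 cal per degree per mole, and it takes the base-10 logarithm of the probability ratio as input to avoid handling very large numbers. The function name is hypothetical.

```python
import math

# Rounded value used in the text (cal deg^-1 mole^-1); the more precise
# value is about 1.987.
GAS_CONSTANT = 2.0


def entropy_change_eu(log10_probability_ratio: float) -> float:
    """Delta S = R ln(pB/pA), with the ratio supplied as log10(pB/pA)."""
    return GAS_CONSTANT * log10_probability_ratio * math.log(10)


if __name__ == "__main__":
    delta_s = entropy_change_eu(298)
    print(f"Delta S for a probability ratio of 10^298: {delta_s:.0f} eu per mole")
```

Rounded to three significant figures, the result agrees with the "about 1370 eu" stated in the text. A ratio below 1 (a negative logarithm) gives a negative entropy change, which corresponds to an unfavorable transition.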
THE GIBBS FREE ENERGY, G
When dealing with an enclosed biological system, one would like to have a simple way of predicting whether a given reaction will or will not occur spontaneously in the system. We have seen that the crucial question is whether the entropy change for the universe is positive or negative when that reaction occurs. In our idealized system, the cell in a box, there are two separate components to the entropy change of the universe, the entropy change for the system enclosed in the box and the entropy change for the surrounding "sea," and both must be added together before any prediction can be made. For example, it is possible for a reaction to absorb heat and thereby decrease the entropy of the sea ($\Delta S_{\text{sea}} < 0$) and at the same time to cause such a large degree of disordering inside the box ($\Delta S_{\text{box}} > 0$) that the total $\Delta S_{\text{universe}} = \Delta S_{\text{sea}} + \Delta S_{\text{box}}$ is greater than 0. In this case the reaction will occur spontaneously, even though the sea gives up heat to the box during the reaction. An example of such a reaction is the dissolving of sodium chloride in a beaker containing water (the "box"), which is a spontaneous process even though the temperature of the water drops as the salt goes into solution.
Chemists have found it useful to define a number of new "composite functions" that describe combinations of physical properties of a system. The properties that can be combined include the temperature ($T$), pressure ($P$), volume ($V$), energy ($E$), and entropy ($S$). The enthalpy ($H$) is one such composite function. But by far the most useful composite function for biologists is the Gibbs free energy, $G$. It serves as an accounting device that allows one to deduce the entropy change of the universe resulting from a chemical reaction in the box, while avoiding any separate consideration of the entropy change in the sea. The definition of $G$ is
$$G = H - TS$$
where, for a box of volume $V$, $H$ is the enthalpy described above ($E + PV$), $T$ is the absolute temperature, and $S$ is the entropy. Each of these quantities applies to the inside of the box only. The change in free energy during a reaction in the box (the $G$ of the products minus the $G$ of the starting materials) is denoted as $\Delta G$ and, as we shall now demonstrate, it is a direct measure of the amount of disorder that is created in the universe when the reaction occurs.
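At constant temperature the change in free energy ($\Delta G$) during a reaction equals $\Delta H - T\Delta S$. Remembering that $\Delta H = -h$, the heat absorbed from the sea, we have
$$-\Delta G = -\Delta H + T\Delta S = h + T\Delta S, \quad \text{so} \quad -\frac{\Delta G}{T} = \frac{h}{T} + \Delta S$$
But $h/T$ is equal to the entropy change of the sea ($\Delta S_{\text{sea}}$), and the $\Delta S$ in the above equation is $\Delta S_{\text{box}}$. Therefore
$$-\frac{\Delta G}{T} = \Delta S_{\text{sea}} + \Delta S_{\text{box}} = \Delta S_{\text{universe}}$$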
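The bookkeeping in this derivation can be checked numerically. The sketch below uses made-up values for an endothermic reaction (chosen only to resemble the salt-dissolving example, not measured data), and the function names are hypothetical.

```python
def gibbs_change(delta_h: float, temperature: float, delta_s_box: float) -> float:
    """Delta G = Delta H - T * Delta S, all quantities for the inside of the box."""
    return delta_h - temperature * delta_s_box


def universe_entropy_change(delta_h: float, temperature: float, delta_s_box: float) -> float:
    """Entropy change of sea plus box, using Delta S_sea = h / T with h = -Delta H."""
    heat_released_to_sea = -delta_h
    delta_s_sea = heat_released_to_sea / temperature
    return delta_s_sea + delta_s_box


if __name__ == "__main__":
    # ASSUMPTION: illustrative numbers only (cal/mole, kelvin, eu).
    delta_h, temperature, delta_s_box = 1000.0, 298.0, 10.0
    delta_g = gibbs_change(delta_h, temperature, delta_s_box)
    delta_s_universe = universe_entropy_change(delta_h, temperature, delta_s_box)
    print(f"Delta G = {delta_g:.0f} cal/mole")
    print(f"Delta S_universe = {delta_s_universe:.3f} eu")
    print(f"-Delta G / T     = {-delta_g / temperature:.3f} eu")
```

With these numbers the reaction absorbs heat, so the entropy of the sea falls, yet the free-energy change is negative and the total entropy of the universe rises. The last two printed values coincide, as the derivation requires.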
We conclude that the free-energy change is a direct measure of the entropy change of the universe. A reaction will proceed in the direction that causes the change in the free energy ($\Delta G$) to be less than zero, because in this case there will be a positive entropy change in the universe when the reaction occurs.
For a complex set of coupled reactions involving many different molecules, the total free-energy change can be computed simply by adding up the free energies of all the different molecular species after the reaction and comparing this value with the sum of free energies before the reaction; for common substances the required free-energy values can be found from published tables. In this way one can predict the direction of a reaction and thereby readily check the feasibility of any proposed mechanism. Thus, for example, from the observed values for the magnitude of the electrochemical proton gradient across the inner mitochondrial membrane and the $\Delta G$ for ATP hydrolysis inside the mitochondrion, one can be certain that ATP synthase requires the passage of more than one proton for each molecule of ATP that it synthesizes.
The value of $\Delta G$ for a reaction is a direct measure of how far the reaction is from equilibrium. The large negative value for ATP hydrolysis in a cell merely reflects the fact that cells keep the ATP hydrolysis reaction as much as 10 orders of magnitude away from equilibrium. If a reaction reaches equilibrium, $\Delta G = 0$, the reaction then proceeds at precisely equal rates in the forward and backward direction. For ATP hydrolysis, equilibrium is reached when the vast majority of the ATP has been hydrolyzed, as occurs in a dead cell.
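The link between the size of the free-energy change and the distance from equilibrium can be put in numbers. The sketch below relies on the standard relation ΔG = RT ln(Q/K), which this panel does not state, so it should be read as an assumption; Q is the actual ratio of products to reactants and K is that ratio at equilibrium. The temperature of 310 K and the function name are also assumptions.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3  # kcal mole^-1 K^-1


def delta_g_from_displacement(log10_q_over_k: float, temperature_k: float = 310.0) -> float:
    """Delta G = RT ln(Q/K), with the displacement given as log10(Q/K).

    ASSUMPTION: standard thermodynamic relation, not quoted from the panel.
    A negative log10(Q/K) means the forward reaction is favorable.
    """
    return GAS_CONSTANT_KCAL * temperature_k * log10_q_over_k * math.log(10)


if __name__ == "__main__":
    for orders in (0, -1, -10):
        print(f"Q/K = 10^{orders}: Delta G = {delta_g_from_displacement(orders):.2f} kcal/mole")
```

At equilibrium the displacement is zero and so is the free-energy change. Under these assumptions each order of magnitude away from equilibrium contributes the same fixed increment, so ten orders of magnitude give a large negative value for the favorable direction.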
For each step, the part of the molecule that undergoes a change is shadowed in blue, and the name of the enzyme that catalyzes the reaction is in a yellow box.
STEP 1 Glucose is phosphorylated by ATP to form a sugar phosphate. The negative charge of the phosphate prevents passage of the sugar phosphate through the plasma membrane, trapping glucose inside the cell.
STEP 2 An isomerization moves the carbonyl oxygen from carbon 1 to carbon 2, forming a ketose from an aldose sugar. (See Panel 2-4.)
STEP 3 The new hydroxyl group on carbon 1 is phosphorylated by ATP, in preparation for the formation of two three-carbon sugar phosphates. The entry of sugars into glycolysis is controlled at this step, through regulation of the enzyme phosphofructokinase. The product is fructose 1,6-bisphosphate.
STEP 4 The six-carbon sugar is cleaved to produce two three-carbon molecules. Only the glyceraldehyde 3-phosphate can proceed immediately through glycolysis.
STEP 5 The other product of step 4, dihydroxyacetone phosphate, is isomerized to form glyceraldehyde 3-phosphate.
[Figure: in step 6, glyceraldehyde 3-phosphate is converted to 1,3-bisphosphoglycerate (see Figure 2-73).]
STEP 7 The transfer to ADP of the high-energy phosphate group that was generated in step 6 forms ATP.
STEP 8 The remaining phosphate ester linkage in 3-phosphoglycerate, which has a relatively low free energy of hydrolysis, is moved from carbon 3 to carbon 2 to form 2-phosphoglycerate.
STEP 9 The removal of water from 2-phosphoglycerate creates a high-energy enol phosphate linkage. The product is phosphoenolpyruvate.
STEP 10 The transfer to ADP of the high-energy phosphate group that was generated in step 9 forms ATP, completing glycolysis.
NET RESULT OF GLYCOLYSIS
In addition to the pyruvate, the net products are two molecules of ATP and two molecules of NADH.
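The net yield can be tallied from the individual steps. The sketch below takes the ATP-consuming steps (1 and 3) and ATP-producing steps (7 and 10) from the step descriptions, and counts steps after the cleavage in step 4 twice because each glucose gives two three-carbon molecules. Placing NADH production at step 6 is an assumption from standard biochemistry, since the step 6 description is not legible in this panel; the table layout and function name are hypothetical.

```python
# (step number, change per molecule passing through, molecules per glucose)
ATP_STEPS = [
    (1, -1, 1),   # glucose phosphorylated by ATP
    (3, -1, 1),   # second phosphorylation by ATP
    (7, +1, 2),   # phosphate transferred to ADP, once per three-carbon molecule
    (10, +1, 2),  # phosphate transferred to ADP, once per three-carbon molecule
]

# ASSUMPTION: NADH is produced in step 6, once per three-carbon molecule.
NADH_STEPS = [
    (6, +1, 2),
]


def net_yield(steps) -> int:
    """Sum the per-glucose change over all listed steps."""
    return sum(change * copies for _step, change, copies in steps)


if __name__ == "__main__":
    print("net ATP per glucose: ", net_yield(ATP_STEPS))
    print("net NADH per glucose:", net_yield(NADH_STEPS))
```

The two ATP molecules invested early are repaid twice over in the later steps, which is why the net gain matches the figure stated in the text.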
The complete citric acid cycle. The two carbons from acetyl CoA that enter this turn of the cycle (shadowed) will be converted to CO2 in subsequent turns of the cycle: it is the two carbons shadowed in blue that are converted to CO2 in this cycle.
[Figure: the cycle, showing acetyl CoA (2C), citrate (6C), isocitrate (6C), α-ketoglutarate (5C), succinyl CoA (4C), succinate (4C), and fumarate (4C), with H2O entering and CO2 and HS-CoA released.]
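The carbon counts shown for the cycle intermediates can be checked for consistency. In the sketch below, the counts for citrate through fumarate are those shown in the figure; the counts for malate and oxaloacetate (four carbons each) are assumptions from standard biochemistry, and the function names are hypothetical.

```python
ACETYL_GROUP_CARBONS = 2

# Intermediates in cycle order, with their carbon counts.
# ASSUMPTION: malate and oxaloacetate counts are not shown in the figure.
CYCLE_INTERMEDIATES = [
    ("citrate", 6),
    ("isocitrate", 6),
    ("alpha-ketoglutarate", 5),
    ("succinyl CoA", 4),
    ("succinate", 4),
    ("fumarate", 4),
    ("malate", 4),
    ("oxaloacetate", 4),
]


def co2_released_per_turn(intermediates) -> int:
    """Carbons lost between consecutive intermediates, each leaving as one CO2."""
    released = 0
    for (_, before), (_, after) in zip(intermediates, intermediates[1:]):
        released += max(before - after, 0)
    return released


def cycle_closes(intermediates, acetyl_carbons: int = ACETYL_GROUP_CARBONS) -> bool:
    """True if the last intermediate plus an acetyl group rebuilds the first."""
    return intermediates[-1][1] + acetyl_carbons == intermediates[0][1]


if __name__ == "__main__":
    print("CO2 released per turn:", co2_released_per_turn(CYCLE_INTERMEDIATES))
    print("cycle closes:", cycle_closes(CYCLE_INTERMEDIATES))
```

Two carbons enter with each acetyl group and two leave as CO2 in each turn, so the four-carbon acceptor is regenerated and the cycle can repeat.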
Details of the eight steps are shown below. For each step, the part of the molecule that undergoes a change is shadowed in blue and the name of the enzyme that catalyzes the reaction is in a yellow box.
STEP 1 After the enzyme removes a proton from the CH3 group on acetyl CoA, the negatively charged CH2- forms a bond to a carbonyl carbon of oxaloacetate. The subsequent loss by hydrolysis of the coenzyme A (CoA) drives the reaction strongly forward. Citrate is formed, with release of HS-CoA and H+.
STEP 2 An isomerization reaction, in which water is first removed and then added back, moves the hydroxyl group from one carbon atom to its neighbor. Citrate is converted to isocitrate through a cis-aconitate intermediate.
STEP 3 In the first of four oxidation steps in the cycle, the carbon carrying the hydroxyl group is converted to a carbonyl group. The immediate product is unstable, losing CO2 while still bound to the enzyme. Isocitrate is converted to α-ketoglutarate.
STEP 4 The α-ketoglutarate dehydrogenase complex closely resembles the large enzyme complex that converts pyruvate to acetyl CoA (pyruvate dehydrogenase). It likewise catalyzes an oxidation that produces NADH, CO2, and a high-energy thioester bond to coenzyme A (CoA). α-Ketoglutarate is converted to succinyl-CoA.
STEP 5 A phosphate molecule from solution displaces the CoA, forming a high-energy phosphate linkage to succinate. This phosphate is then passed to GDP to form GTP. (In bacteria and plants, ATP is formed instead.) Succinyl-CoA is converted to succinate, with release of HS-CoA.
STEP 6 In the third oxidation step in the cycle, FAD removes two hydrogen atoms from succinate. The enzyme is succinate dehydrogenase, and the product is fumarate.
STEP 7 The addition of water to fumarate places a hydroxyl group next to a carbonyl carbon. The product is malate.
STEP 8 In the last of four oxidation steps in the cycle, the carbon carrying the hydroxyl group is converted to a carbonyl group, regenerating the oxaloacetate needed for step 1.
Chapter 2: Cell Chemistry and Biosynthesis
Proteins

When we look at a cell through a microscope or analyze its electrical or biochemical activity, we are, in essence, observing proteins. Proteins constitute most of a cell's dry mass. They are not only the cell's building blocks; they also execute nearly all the cell's functions. Thus, enzymes provide the intricate molecular surfaces in a cell that promote its many chemical reactions. Proteins embedded in the plasma membrane form channels and pumps that control the passage of small molecules into and out of the cell. Other proteins carry messages from one cell to another, or act as signal integrators that relay sets of signals inward from the plasma membrane to the cell nucleus. Yet others serve as tiny molecular machines with moving parts: kinesin, for example, propels organelles through the cytoplasm; topoisomerase can untangle knotted DNA molecules. Other specialized proteins act as antibodies, toxins, hormones, antifreeze molecules, elastic fibers, ropes, or sources of luminescence. Before we can hope to understand how genes work, how muscles contract, how nerves conduct electricity, how embryos develop, or how our bodies function, we must attain a deep understanding of proteins.

In This Chapter
THE SHAPE AND STRUCTURE OF PROTEINS
PROTEIN FUNCTION

THE SHAPE AND STRUCTURE OF PROTEINS

From a chemical point of view, proteins are by far the most structurally complex and functionally sophisticated molecules known. This is perhaps not surprising, once we realize that the structure and chemistry of each protein has been developed and fine-tuned over billions of years of evolutionary history. Yet, even to experts, the remarkable versatility of proteins can seem truly amazing. In this section, we consider how the location of each amino acid in the long string of amino acids that forms a protein determines its three-dimensional shape. Later in the chapter, we use this understanding of protein structure at the atomic level to describe how the precise shape of each protein molecule determines its function in a cell.

The Shape of a Protein Is Specified by Its Amino Acid Sequence

There are 20 types of amino acids in proteins, each with different chemical properties. A protein molecule is made from a long chain of these amino acids, each linked to its neighbor through a covalent peptide bond. Proteins are therefore also known as polypeptides. Each type of protein has a unique sequence of amino acids, and there are many thousands of different proteins, each with its own particular amino acid sequence. The repeating sequence of atoms along the core of the polypeptide chain is referred to as the polypeptide backbone. Attached to this repetitive chain are those portions of the amino acids that are not involved in making a peptide bond and that give each amino acid its unique properties: the 20 different amino acid side chains (Figure 3-1). Some of these side chains are nonpolar and hydrophobic ("water-fearing"), others are negatively or positively charged, some readily form covalent bonds, and so on. Panel 3-1 (pp. 128-129) shows their atomic structures and Figure 3-2 lists their abbreviations.
Chapter 3: Proteins
[Figure 3-1 artwork: a short polypeptide containing methionine (Met), aspartic acid (Asp), leucine (Leu), and tyrosine (Tyr), drawn as a polypeptide backbone with attached side chains running from the amino terminus (N-terminus) to the carboxyl terminus (C-terminus), and shown again as a schematic and as a sequence.]
As discussed in Chapter 2, atoms behave almost as if they were hard spheres with a definite radius (their van der Waals radius). The requirement that no two atoms overlap greatly limits the possible bond angles in a polypeptide chain (Figure 3-3). This constraint and other steric interactions severely restrict the possible three-dimensional arrangements of atoms (or conformations). Nevertheless, a long flexible chain, such as a protein, can still fold in an enormous number of ways.

The folding of a protein chain is, however, further constrained by many different sets of weak noncovalent bonds that form between one part of the chain and another. These involve atoms in the polypeptide backbone, as well as atoms in the amino acid side chains. There are three types of weak bonds: hydrogen bonds, electrostatic attractions, and van der Waals attractions, as explained in Chapter 2 (see p. 54). Individual noncovalent bonds are 30-300 times weaker than the typical covalent bonds that create biological molecules. But many weak bonds acting in parallel can hold two regions of a polypeptide chain tightly together. In this way, the combined strength of large numbers of such noncovalent bonds determines the stability of each folded shape (Figure 3-4).
Figure 3-1 The components of a protein. A protein consists of a polypeptide backbone with attached side chains. Each type of protein differs in its sequence and number of amino acids; therefore, it is the sequence of the chemically different side chains that makes each protein distinct. The two ends of a polypeptide chain are chemically different: the end carrying the free amino group (NH3+, also written NH2) is the amino terminus, or N-terminus, and that carrying the free carboxyl group (COO-, also written COOH) is the carboxyl terminus, or C-terminus. The amino acid sequence of a protein is always presented in the N-to-C direction, reading from left to right.
AMINO ACID                          SIDE CHAIN
Aspartic acid    Asp    D           negative
Glutamic acid    Glu    E           negative
Arginine         Arg    R           positive
Lysine           Lys    K           positive
Histidine        His    H           positive
Asparagine       Asn    N           uncharged polar
Glutamine        Gln    Q           uncharged polar
Serine           Ser    S           uncharged polar
Threonine        Thr    T           uncharged polar
Tyrosine         Tyr    Y           uncharged polar
Alanine          Ala    A           nonpolar
Glycine          Gly    G           nonpolar
Valine           Val    V           nonpolar
Leucine          Leu    L           nonpolar
Isoleucine       Ile    I           nonpolar
Proline          Pro    P           nonpolar
Phenylalanine    Phe    F           nonpolar
Methionine       Met    M           nonpolar
Tryptophan       Trp    W           nonpolar
Cysteine         Cys    C           nonpolar
Figure 3-2 The 20 amino acids found in proteins. Each amino acid has a three-letter and a one-letter abbreviation. There are equal numbers of polar and nonpolar side chains; however, some side chains listed here as polar are large enough to have some non-polar properties (for example, Tyr, Thr, Arg, Lys). For atomic structures, see Panel 3-1 (pp. 128-129).
A fourth weak force also has a central role in determining the shape of a protein. As described in Chapter 2, hydrophobic molecules, including the nonpolar side chains of particular amino acids, tend to be forced together in an aqueous environment in order to minimize their disruptive effect on the hydrogen-bonded network of water molecules (see p. 54 and Panel 2-2, pp. 108-109). Therefore, an important factor governing the folding of any protein is the distribution of its polar and nonpolar amino acids. The nonpolar (hydrophobic) side chains in a protein, belonging to such amino acids as phenylalanine, leucine, valine, and tryptophan, tend to cluster in the interior of the molecule (just as hydrophobic oil droplets coalesce in water to form one large droplet). This enables them to avoid contact with the water that surrounds them inside a cell. In contrast, polar groups, such as those belonging to arginine, glutamine, and histidine, tend to arrange themselves near the outside of the molecule, where they can form hydrogen bonds with water and with other polar molecules (Figure 3-5). Polar amino acids buried within the protein are usually hydrogen-bonded to other polar amino acids or to the polypeptide backbone.
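The polar and nonpolar grouping described above can be applied to a sequence mechanically. The following is a minimal sketch in Python that transcribes the side-chain classes from the Figure 3-2 table and reports the class of each residue. The function names and the four-residue test peptide are illustrative assumptions, and the nonpolar fraction it computes is only a rough summary, not a predictor of how a real protein folds.

```python
# Side-chain classes transcribed from the Figure 3-2 table (one-letter codes).
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("DE", "negative"),
    **dict.fromkeys("RKH", "positive"),
    **dict.fromkeys("NQSTY", "uncharged polar"),
    **dict.fromkeys("AGVLIPFMWC", "nonpolar"),
}


def classify(sequence):
    """Return the side-chain class of each residue, read in the N-to-C direction."""
    return [SIDE_CHAIN_CLASS[residue] for residue in sequence.upper()]


def nonpolar_fraction(sequence):
    """Fraction of residues whose side chains are listed as nonpolar."""
    classes = classify(sequence)
    return classes.count("nonpolar") / len(classes)


if __name__ == "__main__":
    peptide = "MDLY"  # hypothetical example: Met-Asp-Leu-Tyr
    for residue, side_chain in zip(peptide, classify(peptide)):
        print(residue, side_chain)
    print("nonpolar fraction:", nonpolar_fraction(peptide))
```

A lookup table is used here because the classification in the text is a fixed property of each amino acid; anything that depends on neighboring residues or on the folded structure is outside the scope of this sketch.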
Figure 3-3 Steric limitations on the bond angles in a polypeptide chain. (A) Each amino acid contributes three bonds (red) to the backbone of the chain. The peptide bond is planar (gray shading) and does not permit rotation. By contrast, rotation can occur about the Cα-C bond, whose angle of rotation is called psi (ψ), and about the N-Cα bond, whose angle of rotation is called phi (φ). By convention, an R group is often used to denote an amino acid side chain (green circles). (B) The conformation of the main-chain atoms in a protein is determined by one pair of φ and ψ angles for each amino acid; because of steric collisions between atoms within each amino acid, most pairs of φ and ψ angles do not occur. In this so-called Ramachandran plot, each dot represents an observed pair of angles in a protein; both axes run from -180 to +180 degrees. The cluster of dots in the bottom left quadrant represents all of the amino acids that are located in α-helix structures (see Figure 3-7A). (B, from J. Richardson, Adv. Prot. Chem. 34:174-175, 1981. With permission from Academic Press.)
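To make the reading of a Ramachandran plot concrete, here is a small Python sketch that places a (φ, ψ) pair in one of the four quadrants of the plot. It assumes the common convention of φ on the horizontal axis and ψ on the vertical axis, which the caption does not state, and the example angles of about -57 and -47 degrees are commonly quoted idealized α-helix values rather than numbers taken from this text.

```python
def check_angle(angle):
    """Reject values outside the -180..+180 degree range used on both axes."""
    if not -180.0 <= angle <= 180.0:
        raise ValueError(f"angle out of range: {angle}")


def quadrant(phi, psi):
    """Name the plot quadrant for one (phi, psi) pair.

    Assumption: phi is plotted horizontally and psi vertically.
    """
    check_angle(phi)
    check_angle(psi)
    horizontal = "left" if phi < 0 else "right"
    vertical = "bottom" if psi < 0 else "top"
    return f"{vertical} {horizontal}"


if __name__ == "__main__":
    # Idealized alpha-helix angles (an assumption, not taken from the figure).
    print(quadrant(-57.0, -47.0))
```

Under these assumptions the idealized helix angles fall in the bottom left quadrant, which is where the caption locates the α-helix cluster.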
Panel 3-1 (pp. 128-129)

THE AMINO ACID
The general formula of an amino acid is H2N-CH(R)-COOH: an amino group, a carboxyl group, and a side-chain group (R) attached to the α-carbon atom. R is commonly one of 20 different side chains. At pH 7 both the amino and carboxyl groups are ionized.

OPTICAL ISOMERS
The α-carbon atom is asymmetric, which allows for two mirror image (or stereo-) isomers, L and D. Proteins consist exclusively of L-amino acids.

FAMILIES OF AMINO ACIDS
The common amino acids are grouped according to whether their side chains are acidic, basic, uncharged polar, or nonpolar. These 20 amino acids are given both three-letter and one-letter abbreviations. Thus: alanine = Ala = A.

PEPTIDE BONDS
Amino acids are commonly joined together by an amide linkage, called a peptide bond. The four atoms in each gray box form a rigid planar unit. There is no rotation around the C-N bond. The two single bonds on either side of each α-carbon allow rotation, so that long chains of amino acids are very flexible. Proteins are long polymers of amino acids linked by peptide bonds, and they are always written with the N-terminus toward the left. The sequence of the tripeptide shown is histidine-cysteine-valine.

BASIC SIDE CHAINS
lysine (Lys, or K), arginine (Arg, or R), histidine (His, or H). The nitrogens of the histidine side chain have a relatively weak affinity for an H+ and are only partly positive at neutral pH.

ACIDIC SIDE CHAINS
aspartic acid (Asp, or D), glutamic acid (Glu, or E).

UNCHARGED POLAR SIDE CHAINS
asparagine (Asn, or N), glutamine (Gln, or Q), serine (Ser, or S), threonine (Thr, or T), tyrosine (Tyr, or Y). Although the amide N is not charged at neutral pH, it is polar. The -OH group is polar.

NONPOLAR SIDE CHAINS
alanine (Ala, or A), valine (Val, or V), leucine (Leu, or L), isoleucine (Ile, or I), proline (Pro, or P; actually an imino acid), phenylalanine (Phe, or F), methionine (Met, or M), tryptophan (Trp, or W), glycine (Gly, or G), cysteine (Cys, or C). Disulfide bonds can form between two cysteine side chains in proteins.
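The two abbreviation schemes and the N-to-C writing convention can be captured in a short lookup. The sketch below, in Python, converts a hyphen-separated three-letter peptide into its one-letter form; the function name and the hyphen-separated input format are assumptions made for illustration, while the abbreviations themselves are the ones listed in Figure 3-2.

```python
# Three-letter to one-letter abbreviations, as listed in Figure 3-2.
THREE_TO_ONE = {
    "Asp": "D", "Glu": "E", "Arg": "R", "Lys": "K", "His": "H",
    "Asn": "N", "Gln": "Q", "Ser": "S", "Thr": "T", "Tyr": "Y",
    "Ala": "A", "Gly": "G", "Val": "V", "Leu": "L", "Ile": "I",
    "Pro": "P", "Phe": "F", "Met": "M", "Trp": "W", "Cys": "C",
}


def to_one_letter(peptide):
    """Convert e.g. 'His-Cys-Val' (N-terminus on the left) to 'HCV'."""
    residues = peptide.split("-")
    return "".join(THREE_TO_ONE[name.strip().capitalize()] for name in residues)


if __name__ == "__main__":
    # The tripeptide named in the panel, written N-terminus first.
    print(to_one_letter("His-Cys-Val"))
```

Because the order of the residues is preserved, the one-letter string keeps the same N-to-C reading direction as the input.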
Figure 3-4 Three types of noncovalent bonds help proteins fold. Although a single one of these bonds is quite weak, many of them often form together to create a strong bonding arrangement, as in the example shown (which labels electrostatic attractions, a hydrogen bond, and van der Waals attractions). As in the previous figure, R is used as a general designation for an amino acid side chain.
Proteins Fold into a Conformation of Lowest Energy

As a result of all of these interactions, most proteins have a particular three-dimensional structure, which is determined by the order of the amino acids in the chain. The final folded structure, or conformation, of any polypeptide chain is generally the one that minimizes its free energy. Biologists have studied protein folding in a test tube by using highly purified proteins. Treatment with certain [...] sequence contains all the information needed for specifying the three-dimensional shape of a protein, which is a critical point for understanding cell function.

Each protein normally folds up into a single stable conformation. However, the conformation changes slightly when the protein interacts with other molecules in the cell. This change in shape is often crucial to the function of the protein, as we see later.

Although a protein chain can fold into its correct conformation without outside help, in a living cell special proteins called molecular chaperones often assist in protein folding. Molecular chaperones bind to partly folded polypeptide chains and help them progress along the most energetically favorable folding pathway. In the crowded conditions of the cytoplasm, chaperones prevent the temporarily exposed hydrophobic regions in newly synthesized protein chains from associating with each other to form protein aggregates (see p. 388). However, the final three-dimensional shape of the protein is still specified by its amino acid sequence: chaperones simply make the folding process more reliable.

Figure 3-5 How a protein folds into a compact conformation. The polar amino acid side chains tend to gather on the outside of the protein, where they can interact with water; the nonpolar amino acid side chains are buried on the inside to form a tightly packed hydrophobic core of atoms that are hidden from water. In this schematic drawing, the protein contains only about 30 amino acids. (The drawing shows an unfolded polypeptide and its folded conformation in an aqueous environment: polar side chains on the outside of the molecule can form hydrogen bonds to water, and the hydrophobic core region contains nonpolar side chains.)

Proteins come in a wide variety of shapes, and they are generally between 50 and 2000 amino acids long. Large proteins usually consist of several distinct protein domains, structural units that fold more or less independently of each other, as we discuss below. Since the detailed structure of any protein is complicated, several different representations are used to depict the protein's structure, each emphasizing different features.

Panel 3-2 (pp. 132-133) presents four different representations of a protein domain called SH2, which has important functions in eucaryotic cells. Constructed from a string of 100 amino acids, the structure is displayed as (A) a polypeptide backbone model, (B) a ribbon model, (C) a wire model that includes the amino acid side chains, and (D) a space-filling model. Each of the three horizontal rows shows the protein in a different orientation, and the image is colored in a way that allows the polypeptide chain to be followed from its N-terminus (purple) to its C-terminus (red).

Panel 3-2 shows that a protein's conformation is amazingly complex, even for a structure as small as the SH2 domain. But the description of protein structures can be simplified because they are built up from combinations of several common structural motifs, as we discuss next.
The α Helix and the β Sheet Are Common Folding Patterns

When we compare the three-dimensional structures of many different protein molecules, it becomes clear that, although the overall conformation of each protein is unique, two regular folding patterns are often found in parts of them. Both patterns were discovered more than 50 years ago from studies of hair and silk. The first folding pattern to be discovered, called the α helix, was found in the protein α-keratin, which is abundant in skin and its derivatives, such as hair, nails, and horns. Within a year of the discovery of the α helix, a second folded structure, called a β sheet, was found in the protein fibroin, the major constituent of silk. These two patterns are particularly common because they result from hydrogen-bonding between the N-H and C=O groups in the polypeptide backbone, without involving the side chains of the amino acids. Thus, many different amino acid sequences can form them. In each case, the protein chain adopts a regular, repeating conformation. Figure 3-7 shows these two conformations, as well as the abbreviations that are used to denote them in ribbon models of proteins.

The core of many proteins contains extensive regions of β sheet. As shown in Figure 3-8, these β sheets can form either from neighboring polypeptide chains that run in the same orientation (parallel chains) or from a polypeptide chain that folds back and forth upon itself, with each section of the chain running in the direction opposite to that of its immediate neighbors (antiparallel chains). Both types of β sheet produce a very rigid structure, held together by hydrogen bonds that connect the peptide bonds in neighboring chains (see Figure 3-7D).
Figure 3-6 The refolding of a denatured protein. (A) This type of experiment, first performed more than 40 years ago, demonstrates that a protein's conformation is determined solely by its amino acid sequence. (B) The structure of urea. Urea is very soluble in water and unfolds proteins at high concentrations, where there is about one urea molecule for every six water molecules.
Panel 3-2 (pp. 132-133)
(A) Backbone: Shows the overall organization of the polypeptide chain; a clean way to compare structures of related proteins.
(B) Ribbon: Easy way to visualize secondary structures, such as α helices and β sheets.
(C) Wire: Highlights side chains and their relative proximities; useful for predicting which amino acids might be involved in a protein's activity, particularly if the protein is an enzyme.
(D) Space-filling: Provides contour map of the protein; gives a feel for the shape of the protein and shows which amino acid side chains are exposed on its surface. Shows how the protein might look to a small molecule, such as water, or to another protein.
Figure 3-7 The regular conformation of the polypeptide backbone in the α helix and the β sheet. (A, B, and C) The α helix. The N-H of every peptide bond is hydrogen-bonded to the C=O of a neighboring peptide bond located four peptide bonds away in the same chain. Note that all of the N-H groups point up in this diagram and that all of the C=O groups point down (toward the C-terminus); this gives a polarity to the helix, with the C-terminus having a partial negative and the N-terminus a partial positive charge. (D, E, and F) The β sheet. In this example, adjacent peptide chains run in opposite (antiparallel) directions. Hydrogen-bonding between peptide bonds in different strands holds the individual polypeptide chains (strands) together in a β sheet, and the amino acid side chains in each strand alternately project above and below the plane of the sheet. (A) and (D) show all the atoms in the polypeptide backbone, but the amino acid side chains are truncated and denoted by R. In contrast, (B) and (E) show the backbone atoms only, while (C) and (F) display the shorthand symbols that are used to represent the α helix and the β sheet in ribbon drawings of proteins (see Panel 3-2B).
An α helix is generated when a single polypeptide chain twists around on itself to form a rigid cylinder. A hydrogen bond forms between every fourth peptide bond, linking the C=O of one peptide bond to the N-H of another (see Figure 3-7A). This gives rise to a regular helix with a complete turn every 3.6 amino acids. Note that the protein domain illustrated in Panel 3-2 contains two α helices, as well as a three-stranded antiparallel β sheet.

Regions of α helix are especially abundant in proteins located in cell membranes, such as transport proteins and receptors. As we discuss in Chapter 10, those portions of a transmembrane protein that cross the lipid bilayer usually cross as an α helix composed largely of amino acids with nonpolar side chains. The polypeptide backbone, which is hydrophilic, is hydrogen-bonded to itself in the α helix and shielded from the hydrophobic lipid environment of the membrane by its protruding nonpolar side chains (see also Figure 3-78).

In other proteins, α helices wrap around each other to form a particularly stable structure, known as a coiled-coil. This structure can form when the two (or in some cases three) α helices have most of their nonpolar (hydrophobic) side chains on one side, so that they can twist around each other with these side chains facing inward (Figure 3-9). Long rodlike coiled-coils provide the structural framework for many elongated proteins. Examples are α-keratin, which forms the intracellular fibers that reinforce the outer layer of the skin and its appendages, and the myosin molecules responsible for muscle contraction.
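The figure of 3.6 amino acids per turn is enough to see why a repeating seven-residue pattern can place nonpolar side chains on one face of an α helix. The following Python sketch is an idealized calculation, not a structural model: it assumes every residue advances the helix by exactly 360/3.6 = 100 degrees, labels seven successive positions "a" through "g" (the labeling used for coiled-coils in Figure 3-9), and reports where each position points around the helix axis. The function names are illustrative.

```python
RESIDUES_PER_TURN = 3.6                          # from the text
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # about 100 degrees


def heptad_angles():
    """Angle around the helix axis (degrees) for heptad positions a..g."""
    return {
        label: round(index * DEGREES_PER_RESIDUE) % 360
        for index, label in enumerate("abcdefg")
    }


def angular_separation(angle_1, angle_2):
    """Smallest angle between two directions on the helix surface."""
    difference = abs(angle_1 - angle_2) % 360
    return min(difference, 360 - difference)


def drift_per_heptad():
    """How far seven residues fall short of two full turns (720 degrees)."""
    return round(7 * DEGREES_PER_RESIDUE) - 720


if __name__ == "__main__":
    angles = heptad_angles()
    print(angles)
    print("a-d separation:", angular_separation(angles["a"], angles["d"]))
    print("drift per heptad:", drift_per_heptad())
```

In this idealized geometry positions "a" and "d" sit only 60 degrees apart, closer than two sequence neighbors, and each heptad falls 20 degrees short of two full turns, so the nonpolar positions trace a stripe that winds slowly around the helix.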
Protein Domains Are Modular Units from which Larger Proteins Are Built

Figure 3-8 Two types of β sheet structures. (A) An antiparallel β sheet (see Figure 3-7D). (B) A parallel β sheet. Both of these structures are common in proteins.
Even a small protein molecule is built from thousands of atoms linked together by precisely oriented covalent and noncovalent bonds, and it is extremely difficult to visualize such a complicated structure without a three-dimensional display. For
Figure 3-9 A coiled-coil. (A) A single α helix, with successive amino acid side chains labeled in a sevenfold sequence, "abcdefg" (from bottom to top). Amino acids "a" and "d" in such a sequence lie close together on the cylinder surface, forming a "stripe" (red) of hydrophobic amino acids that winds slowly around the α helix. Proteins that form coiled-coils typically have nonpolar amino acids at positions "a" and "d." Consequently, as shown in (B), the two α helices can wrap around each other with the nonpolar side chains of one α helix interacting with the nonpolar side chains of the other, while the more hydrophilic amino acid side chains are left exposed to the aqueous environment. (C) The atomic structure of a coiled-coil determined by x-ray crystallography. The red side chains are nonpolar.
this reason, biologists use various graphic and computer-based aids. A DVD that accompanies this book contains computer-generated images of selected proteins, displayed and rotated on the screen in a variety of formats.

Biologists distinguish four levels of organization in the structure of a protein. The amino acid sequence is known as the primary structure. Stretches of polypeptide chain that form α helices and β sheets constitute the protein's secondary structure. The full three-dimensional organization of a polypeptide chain is sometimes referred to as the tertiary structure, and if a particular protein molecule is formed as a complex of more than one polypeptide chain, the complete structure is designated as the quaternary structure.

Studies of the conformation, function, and evolution of proteins have also revealed the central importance of a unit of organization distinct from these four. This is the protein domain, a substructure produced by any part of a polypeptide chain that can fold independently into a compact, stable structure. A domain usually contains between 40 and 350 amino acids, and it is the modular unit from which many larger proteins are constructed.

The different domains of a protein are often associated with different functions. Figure 3-10 shows an example, the Src protein kinase, which functions in signaling pathways inside vertebrate cells (Src is pronounced "sarc"). This protein is considered to have three domains: the SH2 and SH3 domains have regulatory roles, while the C-terminal domain is responsible for the kinase catalytic activity. Later in the chapter, we shall return to this protein, in order to explain how proteins can form molecular switches that transmit information throughout cells.

Figure 3-11 presents ribbon models of three differently organized protein domains. As these examples illustrate, the polypeptide chain tends to cross the entire domain before making a sharp turn at the surface. The central core of a domain can be constructed from α helices, from β sheets, or from various combinations of these two fundamental folding elements.

The smallest protein molecules contain only a single domain, whereas larger proteins can contain as many as several dozen domains, often connected to each other by short, relatively unstructured lengths of polypeptide chain.
Few of the Many Possible Polypeptide Chains Will Be Useful to Cells

Since each of the 20 amino acids is chemically distinct and each can, in principle, occur at any position in a protein chain, there are 20 x 20 x 20 x 20 = 160,000 different possible polypeptide chains four amino acids long, or 20^n different possible polypeptide chains n amino acids long. For a typical protein length of
Figure 3-10 A protein formed from multiple domains. In the Src protein shown, a C-terminal domain with two lobes (yellow and orange) forms a protein kinase enzyme, while the SH2 and SH3 domains perform regulatory functions. (A) A ribbon model, with ATP substrate in red. (B) A space-filling model, with ATP substrate in red. Note that the site that binds ATP is positioned at the interface of the two lobes that form the kinase. The detailed structure of the SH2 domain is illustrated in Panel 3-2 (pp. 132-133).
about 300 amino acids, a cell could theoretically make more than 10^390 (20^300) different polypeptide chains. This is such an enormous number that to produce just one molecule of each kind would require many more atoms than exist in the universe.

Only a very small fraction of this vast set of conceivable polypeptide chains would adopt a single, stable three-dimensional conformation: by some estimates, less than one in a billion. And yet the vast majority of proteins present in cells adopt unique and stable conformations. How is this possible? The answer lies in natural selection. A protein with an unpredictably variable structure and biochemical activity is unlikely to help the survival of a cell that contains it. Such proteins would therefore have been eliminated by natural selection through the enormously long trial-and-error process that underlies biological evolution.

Because evolution has selected for protein function in living organisms, the amino acid sequence of most present-day proteins is such that a single conformation is extremely stable. In addition, this conformation has its chemical properties finely tuned to enable the protein to perform a particular catalytic or structural function in the cell. Proteins are so precisely built that the change of even a few atoms in one amino acid can sometimes disrupt the structure of the whole molecule so severely that all function is lost.
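The counting argument above is easy to check directly, because Python integers have arbitrary precision. The sketch below is only an arithmetic illustration; the function names are hypothetical, and the default alphabet size of 20 comes from the text.

```python
def possible_chains(length, alphabet_size=20):
    """Number of distinct chains of the given length: alphabet_size ** length."""
    return alphabet_size ** length


def order_of_magnitude(count):
    """Largest power of ten that does not exceed a positive integer count."""
    return len(str(count)) - 1


if __name__ == "__main__":
    print(possible_chains(4))                        # chains four amino acids long
    print(order_of_magnitude(possible_chains(300)))  # exponent for 300 amino acids
```

Counting decimal digits avoids floating-point overflow, which would occur if 20^300 were converted to a float before taking a logarithm.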
Proteins Can Be Classified into Many Families

[...] proteins can be grouped into protein families, each family member having an amino acid sequence and a three-dimensional conformation that resemble those of the other family members. Consider, for example, the serine proteases, a large family of protein-cleaving [...] in their polypeptide chains, which are several hundred amino acids long, are virtually identical (Figure 3-12). The many different serine proteases nevertheless
Figure 3-11 Ribbon models of three different protein domains. (A) Cytochrome b562, a single-domain protein involved in electron transport in mitochondria. This protein is composed almost entirely of α helices. (B) The NAD-binding domain of the enzyme lactic dehydrogenase, which is composed of a mixture of α helices and parallel β sheets. (C) The variable domain of an immunoglobulin (antibody) light chain, composed of a sandwich of two antiparallel β sheets. In these examples, the α helices are shown in green, while strands organized as β sheets are denoted by red arrows. Note how the polypeptide chain generally traverses back and forth across the entire domain, making sharp turns only at the protein surface. It is the protruding loop regions (yellow) that often form the binding sites for other molecules. (Adapted from drawings courtesy of Jane Richardson.)

Figure 3-12 A comparison of the conformations of two serine proteases. The backbone conformations of elastase and chymotrypsin. Although only those amino acids in the polypeptide chain shaded in green are the same in the two proteins, the two conformations are very similar nearly everywhere. The active site of each enzyme is circled in red; this is where the peptide bonds of the proteins that serve as substrates are bound and cleaved by hydrolysis. The serine proteases derive their name from the amino acid serine, whose side chain is part of the active site of each enzyme and directly participates in the cleavage reaction.
have distinct enzymatic activities, each cleaving different proteins or the peptide bonds between different types of amino acids. Each therefore performs a distinct function in an organism.

The story we have told for the serine proteases could be repeated for hundreds of other protein families. In general, the structure of the different members of a protein family has been more highly conserved than has the amino acid [...] random process, there must also have been many deleterious changes that altered the three-dimensional structure of these proteins sufficiently to harm them.
Figure 3-13 A comparison of a class of DNA-binding domains, called homeodomains, in a pair of proteins from two organisms separated by more than a billion years of evolution. (A) A ribbon model of the structure common to both proteins. (B) A trace of the α-carbon positions. The three-dimensional structures shown were determined by x-ray crystallography for the yeast α2 protein and the Drosophila engrailed protein (red). (C) A comparison of amino acid sequences for the region of the proteins shown in (A) and (B). Black dots mark sites with identical amino acids. Orange dots indicate the position of a three amino acid insert in the α2 protein. (Adapted from C. Wolberger et al., Cell 67:517-528, 1991. With permission from Elsevier.)
Such faulty proteins would have been lost whenever the individual organisms making them were at enough of a disadvantage to be eliminated by natural selection.

Protein families are readily recognized when the genome of any organism is sequenced; for example, the determination of the DNA sequence for the entire human genome has revealed that we contain about 24,000 protein-coding genes. Through sequence comparisons, we can assign the products of about 40 percent of these genes to known protein structures, belonging to more than 500 different protein families. Most of the proteins in each family have evolved to perform somewhat different functions, as for the enzymes elastase and chymotrypsin illustrated previously in Figure 3-12. These are sometimes called paralogs to distinguish them from the corresponding proteins in different organisms (orthologs, such as mouse and human elastase).

As described in Chapter 8, because of the powerful techniques of x-ray crystallography and nuclear magnetic resonance (NMR), we now know the three-dimensional shapes, or conformations, of more than 20,000 proteins. By carefully comparing the conformations of these proteins, structural biologists (that is, experts on the structure of biological molecules) have concluded that there are a limited number of ways in which protein domains fold up in nature, maybe as few as 2000. The structures for about 800 of these protein folds have thus far been determined. These known folds tend to be those most represented in the universe of protein structures: for example, 50 folds account for nearly three-fourths of the domain families with predicted structures. A complete catalog of the most significant protein folds that exist in living organisms would therefore seem to be within our reach.
Sequence Searches Can Identify Close Relatives
The present database of known protein sequences contains more than ten million entries, and it is growing very rapidly as more and more genomes are sequenced, revealing huge numbers of new genes that encode proteins. Powerful computer search programs are available that allow us to compare each newly discovered protein with this entire database, looking for possible relatives. Many proteins whose genes have evolved from a common ancestral gene can be identified by the discovery of statistically significant similarities in amino acid sequences. With such a large number of proteins in the database, the search programs find many nonsignificant matches, resulting in a background noise level that makes it very difficult to pick out all but the closest relatives. Generally speaking, one requires a 30% identity in sequence to consider that two proteins match. However, we know the function of many short signature sequences ("fingerprints"), and these are widely used to find more distant relationships (Figure 3-14).
Protein comparisons are important because related structures often imply related functions. Many years of experimentation can be saved by discovering that a new protein has an amino acid sequence similarity with a protein of known function. Such sequence relationships, for example, first indicated that certain genes that cause mammalian cells to become cancerous are protein kinases. In the same way, many of the proteins that control pattern formation during the embryonic development of the fruit fly Drosophila were quickly recognized to be gene regulatory proteins.
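The 30% rule of thumb can be made concrete with a short calculation. The Python sketch below is illustrative only. The function names and the two toy sequences are hypothetical, the sequences are assumed to be already aligned (with "-" marking a gap), and identity is computed over gap-free columns, which is only one of several conventions in use. Real database searches also assess statistical significance, which this sketch does not attempt.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of gap-free alignment columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # assumption: columns containing a gap are not counted
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def passes_identity_threshold(aligned_a: str, aligned_b: str,
                              threshold: float = 30.0) -> bool:
    """Apply the rule of thumb from the text: about 30% identity or more."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical toy alignment, not taken from any real protein.
toy_a = "MKTAYIAKQR"
toy_b = "MKSAYLA-QR"
print(f"{percent_identity(toy_a, toy_b):.1f}% identical")
print("candidate relatives:", passes_identity_threshold(toy_a, toy_b))
```

Here 7 of the 9 gap-free columns match, so the toy pair clears the threshold comfortably. Distant relatives that fall below it are the cases where signature sequences become useful.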
[Figure 3-14: sequence alignment with rows for the signature sequences, the human sequence, the sequence matches, and the Drosophila sequence]
Figure 3-14 The use of short signature sequences to find related protein domains. The two short sequences of 15 and 9 amino acids shown (green) can be used to search large databases for a protein domain that is found in many proteins, the SH2 domain. Here, the first 50 amino acids of the SH2 domain of 100 amino acids is compared for the human and Drosophila Src protein (see Figure 3-10). In the computer-generated sequence comparison (yellow row), exact matches between the human and Drosophila proteins are noted by the one-letter abbreviation for the amino acid; the positions with a similar but nonidentical amino acid are denoted by +, and nonmatches are blank. In this diagram, wherever one or both proteins contain an exact match to a position in the green sequences, both aligned sequences are colored red.
Some Protein Domains Form Parts of Many Different Proteins
As previously stated, most proteins are composed of a series of protein domains, in which different regions of the polypeptide chain have folded independently to form compact structures. Such multidomain proteins are believed to have originated from the accidental joining of the DNA sequences that encode each domain, creating a new gene. Novel binding surfaces have often been created at the juxtaposition of domains, and many of the functional sites where proteins bind to small molecules are found to be located there. In an evolutionary process called domain shuffling, many large proteins have evolved through the joining of preexisting domains in new combinations (Figure 3-15).
A subset of protein domains have been especially mobile during evolution; these seem to have particularly versatile structures and are sometimes referred to as protein modules. The structure of one such module, the SH2 domain, was illustrated in Panel 3-2 (pp. 132-133). Some other abundant protein domains are illustrated in Figure 3-16.
Each of the domains shown has a stable core structure formed from strands of β sheet, from which less-ordered loops of polypeptide chain protrude (green). The loops are ideally situated to form binding sites for other molecules, as most clearly demonstrated for the immunoglobulin fold, which forms the basis for antibody molecules (see Figure 3-41). Most likely, such β-sheet-based domains have achieved their evolutionary success because they provide a convenient framework for the generation of new binding sites for ligands through small changes to their protruding loops.
[Figure 3-15 diagram: domain arrangements, each drawn from H2N to COOH, of EGF, chymotrypsin, urokinase, factor IX, and plasminogen]
Figure 3-15 Domain shuffling. An extensive shuffling of blocks of protein sequence (protein domains) has occurred during protein evolution. Those portions of a protein denoted by the same shape and color in this diagram are evolutionarily related. Serine proteases like chymotrypsin are formed from two domains (brown). In the three other proteases shown, which are highly regulated and more specialized, these two protease domains are connected to one or more domains that are similar to domains found in epidermal growth factor (EGF; green), to a calcium-binding protein (yellow), or to a "kringle" domain (blue) that contains three internal disulfide bridges. Chymotrypsin is illustrated in Figure 3-12.
[Figure 3-16 panels: complement control module, immunoglobulin module, fibronectin type 1 module, fibronectin type 3 module, growth factor module]
Figure 3-16 The three-dimensional structures of some protein modules. In these ribbon diagrams, β-sheet strands are shown as arrows, and the N- and C-termini are indicated by red spheres. (Adapted from M. Baron, D.G. Norman, and I.D. Campbell, Trends Biochem. Sci. 16:13-17, 1991, with permission from Elsevier, and D.J. Leahy et al., Science 258:987-991, 1992, with permission from AAAS.)
can be readily linked in series to form extended structures, either with themselves or with other in-line domains (Figure 3-17). Stiff extended structures composed of a series of domains are especially common in extracellular matrix molecules and in the extracellular portions of cell-surface receptor proteins. Other modules, including the SH2 domain and the kringle domain illustrated in Figure 3-16, are of a "plug-in" type, with their N- and C-termini close together. After genomic rearrangements, such modules are usually accommodated as an insertion into a loop region of a second protein.
A comparison of the relative frequency of domain utilization in different eucaryotes reveals that, for many common domains, such as protein kinases, this frequency is similar in organisms as diverse as yeast, plants, worms, flies, and humans (Figure 3-18). But there are some notable exceptions, such as the Major Histocompatibility Complex (MHC) antigen-recognition domain (see Figure 25-52) that is present in 57 copies in humans, but absent in the other four organisms just mentioned. Such domains presumably have specialized functions that are not shared with the other eucaryotes, being strongly selected for during evolution so as to give rise to the multiple copies observed. Similarly, a domain like SH2 that shows an unusual increase in its numbers in higher eucaryotes might be assumed to be especially useful for multicellularity (compare the multicellular organisms with yeast in Figure 3-18).
Certain Pairs of Domains Are Found Together in Many Proteins
We can construct a large table displaying domain usage for each organism whose genome sequence is known. For example, the human genome is estimated to contain about 1000 immunoglobulin domains, 500 protein kinase domains, 250 DNA-binding homeodomains, 300 SH3 domains, and 120 SH2 domains. Important additional information can be derived by comparing the frequencies and arrangements of domains in the more than 100 eucaryotic, bacterial, and archaeal genomes that have been completely sequenced. For example, we find that more than two-thirds of proteins consist of two or more domains, and that the same pairs of domains occur repeatedly in the same relative arrangement in a protein. Although half of all domain families are common to archaea, bacteria, and eucaryotes, only about 5 percent of the two-domain combinations are similarly shared. This pattern suggests that most proteins containing especially useful two-domain combinations arose relatively late in evolution.
The 200 most abundant two-domain combinations occur in about one-fourth of all of the proteins with recognizable domains in the complete data set. It would therefore be very useful to determine the precise three-dimensional structure for at least one protein from each common two-domain combination, so as to reveal how the domains interact in that type of protein structure.
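The kind of table described here, and the relative frequencies plotted in Figure 3-18, come down to simple division. The following Python sketch tabulates the human domain counts quoted above against the figure of about 24,000 protein-coding genes given earlier in the chapter. The variable and function names are hypothetical, and the counts are the text's estimates rather than measured data.

```python
# Approximate number of human protein-coding genes quoted in the text.
HUMAN_PROTEIN_COUNT = 24_000

# Estimated domain copy numbers in the human genome, as quoted in the text.
human_domain_counts = {
    "immunoglobulin": 1000,
    "protein kinase": 500,
    "SH3": 300,
    "DNA-binding homeodomain": 250,
    "SH2": 120,
}


def relative_frequency(copies: int, total_proteins: int) -> float:
    """Copies of a domain divided by the number of distinct proteins."""
    if total_proteins <= 0:
        raise ValueError("total_proteins must be positive")
    return copies / total_proteins


frequencies = {
    name: relative_frequency(copies, HUMAN_PROTEIN_COUNT)
    for name, copies in human_domain_counts.items()
}

# Print the table from most to least frequent domain.
for name, freq in sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<26}{freq:.4f}  ({freq:.2%})")
```

For SH2 domains this gives 120/24,000 = 0.005, or 0.5 percent. Repeating the calculation with counts from other genomes would reproduce the cross-species comparison, but those counts are not given in the text and are deliberately left out here.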
Figure 3-17 An extended structure formed from a series of in-line protein modules. Four fibronectin type 3 modules (see Figure 3-16) from the extracellular matrix molecule fibronectin are illustrated in (A) ribbon and (B) space-filling models. (Adapted from D.J. Leahy, I. Aukhil, and H.P. Erickson, Cell 84:155-164, 1996. With permission from Elsevier.)
[Figure 3-18 plot: relative frequencies of the eucaryotic protein kinase, DNA-binding homeodomain, and SH2 domain]
Figure 3-18 The relative frequencies of three protein domains in five eucaryotic organisms. The approximate percentages given have been determined by dividing the number of copies of each domain by the total number of distinct proteins thought to be encoded by each organism, as determined from the sequence of its genome. Thus, for SH2 domains in humans, 120/24,000 = 0.005.
The Human Genome Encodes a Complex Set of Proteins, Revealing Much That Remains Unknown
The result of sequencing the human genome has been surprising, because it reveals that our chromosomes contain only about 25,000 genes. Based on gene number alone, we would appear to be no more complex than the tiny mustard weed, Arabidopsis, and only about 1.3-fold more complex than a nematode worm. The genome sequences also reveal that vertebrates have inherited nearly all of their protein domains from invertebrates, with only 7 percent of identified human domains being vertebrate-specific.
Each of our proteins is on average more complicated, however (Figure 3-19). Domain shuffling during vertebrate evolution has given rise to many novel combinations of protein domains, with the result that there are nearly twice as many combinations of domains found in human proteins as in a worm or a fly. Thus, for example, the trypsinlike serine protease domain is linked to at least 18 other types of protein domains in human proteins, whereas it is found covalently joined to only 5 different domains in the worm. This extra variety in our proteins greatly increases the range of protein-protein interactions possible (see Figure 3-82), but how it contributes to making us human is not known.
The complexity of living organisms is staggering, and it is quite sobering to note that we currently lack even the tiniest hint of what the function might be for more than 10,000 of the proteins that have thus far been identified in the human genome. There are certainly enormous challenges ahead for the next generation of cell biologists, with no shortage of fascinating mysteries to solve.
[Figure 3-19 diagram: domain maps of the compared proteins. Legible labels: yeast; Ep1, PHD, PHD, Ep2; Br; BMB]
Figure 3-19 Domain structure of a group of evolutionarily related proteins that are thought to have a similar function. In general, there is a tendency for the proteins in more complex organisms, such as humans, to contain additional domains, as is the case for the DNA-binding protein compared here.
Larger Protein Molecules Often Contain More Than One Polypeptide Chain
The same weak noncovalent bonds that enable a protein chain to fold into a specific conformation also allow proteins to bind to each other to produce larger structures in the cell. Any region of a protein's surface that can interact with another molecule through sets of noncovalent bonds is called a binding site. A protein can contain binding sites for various large and small molecules. If a binding site recognizes the surface of a second protein, the tight binding of two folded polypeptide chains at this site creates a larger protein molecule with a precisely defined geometry. Each polypeptide chain in such a protein is called a protein subunit.
In the simplest case, two identical folded polypeptide chains bind to each other in a "head-to-head" arrangement, forming a symmetric complex of two protein subunits (a dimer) held together by interactions between two identical binding sites. The Cro repressor protein, a viral gene regulatory protein that binds to DNA to turn viral genes off in an infected bacterial cell, provides an example (Figure 3-20). Cells contain many other types of symmetric protein complexes, formed from multiple copies of a single polypeptide chain. The enzyme neuraminidase, for example, consists of four identical protein subunits, each bound to the next in a "head-to-tail" arrangement that forms a closed ring (Figure 3-21).
Many of the proteins in cells contain two or more types of polypeptide chains. Hemoglobin, the protein that carries oxygen in red blood cells, contains
Figure 3-20 Two identical protein subunits binding together to form a symmetric protein dimer. The Cro repressor protein from bacteriophage lambda binds to DNA to turn off viral genes. Its two identical subunits bind head-to-head, held together by a combination of hydrophobic forces (blue) and a set of hydrogen bonds (yellow region). (Adapted from D.H. Ohlendorf, D.E. Tronrud, and B.W. Matthews, J. Mol. Biol. 280:129-136, 1998. With permission from Academic Press.)
Figure 3-21 A protein molecule containing multiple copies of a single protein subunit. The enzyme neuraminidase exists as a ring of four identical polypeptide chains. Each of these chains is formed from six repeats of a four-stranded β sheet, as indicated by the colored arrows. The small diagram shows how the repeated use of the same binding interaction forms the structure (a tetramer of neuraminidase protein).
two identical α-globin subunits and two identical β-globin subunits, symmetrically arranged (Figure 3-22). Such multisubunit proteins are very common in cells, and they can be very large. Figure 3-23 shows a sample of proteins whose exact structures are known, and it compares the sizes and shapes of a few larger proteins with some of the relatively small proteins that we have thus far presented as models.
Some Proteins Form Long Helical Filaments
Some protein molecules can assemble to form filaments that may span the entire length of a cell. Most simply, a long chain of identical protein molecules can be constructed if each molecule has a binding site complementary to another region of the surface of the same molecule (Figure 3-24). An actin filament, for example, is a long helical structure produced from many molecules of the protein actin (Figure 3-25). Actin is very abundant in eucaryotic cells, where it constitutes one of the major filament systems of the cytoskeleton (discussed in Chapter 16).
Why is a helix such a common structure in biology? As we have seen, biological structures are often formed by linking similar subunits, such as amino acids or protein molecules, into long, repetitive chains. If all the subunits are identical, the neighboring subunits in the chain can often fit together in only one way, adjusting their relative positions to minimize the free energy of the contact between them. As a result, each subunit is positioned in exactly the same way in relation to the next, so that subunit 3 fits onto subunit 2 in the same way that subunit 2 fits onto subunit 1, and so on. Because it is very rare for subunits to join up in a straight line, this arrangement generally results in a helix, a regular structure that resembles a spiral staircase, as illustrated in Figure 3-26. Depending on the twist of the staircase, a helix is said to be either right-handed or left-handed (see Figure 3-26E). Handedness is not affected by turning the helix upside down, but it is reversed if the helix is reflected in the mirror.
Figure 3-22 A protein formed as a symmetric assembly of two different subunits. Hemoglobin is an abundant protein in red blood cells that contains two copies of α-globin and two copies of β-globin. Each of these four polypeptide chains contains a heme molecule (red), which is the site that binds oxygen (O2). Thus, each molecule of hemoglobin in the blood carries four molecules of oxygen.
[Figure 3-23 labels: SH2 domain, lysozyme, catalase, myoglobin, deoxyribonuclease, cytochrome c, porin, chymotrypsin, calmodulin, insulin]
Figure 3-23 A collection of protein molecules, shown at the same scale. For comparison, a DNA molecule bound to a protein is also illustrated. These space-filling models represent a range of sizes and shapes. Hemoglobin, catalase, porin, alcohol dehydrogenase, and aspartate transcarbamoylase are formed from multiple copies of subunits. The SH2 domain (top left) is presented in detail in Panel 3-2 (pp. 132-133). (Adapted from David S. Goodsell, Our Molecular Nature. New York: Springer-Verlag, 1996. With permission from Springer Science and Business Media.)
[Figure 3-24 diagram labels: free subunits; assembled structures]
Figure 3-24 Protein assemblies. (A) A protein with just one binding site can form a dimer with another identical protein. (B) Identical proteins with two different binding sites often form a long helical filament. (C) If the two binding sites are disposed appropriately in relation to each other, the protein subunits may form a closed ring instead of a helix. (For an example of A, see Figure 3-20; for an example of C, see Figure 3-21.)
Helices occur commonly in biological structures, whether the subunits are small molecules linked together by covalent bonds (for example, the amino acids in an α helix) or large protein molecules that are linked by noncovalent forces (for example, the actin molecules in actin filaments). This is not surprising. A helix is an unexceptional structure, and it is generated simply by placing many similar subunits next to each other, each in the same strictly repeated relationship to the one before: that is, with a fixed rotation followed by a fixed translation along the helix axis, as in a spiral staircase.
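The rule of a fixed rotation followed by a fixed translation is easy to express numerically. The Python sketch below is a minimal illustration, not a structural model: the function names, the unit radius, and the rise of 0.5 are arbitrary assumptions, and the helix axis is taken to be the z axis. It places each subunit by applying the same step to the one before, and it classifies handedness from the signs of the rotation and the rise.

```python
import math


def build_helix(n_subunits, radius, rotation_deg, rise):
    """Place subunits by repeating one rotation-plus-translation step.

    Each subunit is obtained from the previous one by rotating it about the
    helix axis (the z axis) by rotation_deg, then shifting it along that
    axis by rise.
    """
    if n_subunits < 1:
        return []
    theta = math.radians(rotation_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    x, y, z = radius, 0.0, 0.0
    positions = [(x, y, z)]
    for _ in range(n_subunits - 1):
        # The same step relates every subunit to its predecessor.
        x, y = x * cos_t - y * sin_t, x * sin_t + y * cos_t
        z += rise
        positions.append((x, y, z))
    return positions


def handedness(rotation_deg, rise):
    """Counterclockwise rotation combined with upward rise is right-handed."""
    product = rotation_deg * rise
    if product > 0:
        return "right-handed"
    if product < 0:
        return "left-handed"
    # Zero rise keeps all subunits on a flat circle; zero rotation gives a line.
    return "not a helix"


# Six subunits per turn, as in one of the helices of Figure 3-26.
helix = build_helix(n_subunits=7, radius=1.0, rotation_deg=60.0, rise=0.5)
print(helix[6])                 # the seventh subunit sits directly above the first
print(handedness(60.0, 0.5))    # the helix as built
print(handedness(-60.0, -0.5))  # turned upside down: both signs flip
print(handedness(60.0, -0.5))   # mirror image: only the rise flips
```

Turning the helix upside down reverses both the sense of rotation and the direction of the rise, so their product, and hence the handedness, is unchanged. A mirror reflection reverses only one of the two, which is why it alone switches a right-handed helix to a left-handed one.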
[Figure 3-25 labels: actin molecule; minus end]
Many Protein Molecules Have Elongated, Fibrous Shapes
Most of the proteins that we have discussed so far are globular proteins, in which the polypeptide chain folds up into a compact shape like a ball with an irregular surface. Enzymes tend to be globular proteins: even though many are large and complicated, with multiple subunits, most have an overall rounded shape (see Figure 3-23). In contrast, other proteins have roles in the cell that require each individual protein molecule to span a large distance. These proteins generally have a relatively simple, elongated three-dimensional structure and are commonly referred to as fibrous proteins.
One large family of intracellular fibrous proteins consists of α-keratin, introduced when we presented the α helix, and its relatives. Keratin filaments are extremely stable and are the main component in long-lived structures such as hair, horn, and nails. An α-keratin molecule is a dimer of two identical subunits, with the long α helices of each subunit forming a coiled-coil (see Figure 3-9). The coiled-coil regions are capped at each end by globular domains containing binding sites. This enables this class of protein to assemble into ropelike intermediate filaments, an important component of the cytoskeleton that creates the cell's internal structural framework (see Figure 16-19).
Fibrous proteins are especially abundant outside the cell, where they are a main component of the gel-like extracellular matrix that helps to bind collections of cells together to form tissues. Cells secrete extracellular matrix proteins into their surroundings, where they often assemble into sheets or long fibrils.
[Figure 3-25 labels: plus end; scale bar, 50 nm]
Figure 3-25 Actin filaments. (A) Transmission electron micrographs of negatively stained actin filaments. (B) The helical arrangement of actin molecules in an actin filament. (A, courtesy of Roger Craig.)
[Figure 3-26 panels: (A), (B), (C), (D); (E) left-handed and right-handed helices]
Figure 3-26 Some properties of a helix. (A-D) A helix forms when a series of subunits bind to each other in a regular way. At the bottom, the interaction between two subunits is shown; behind them are the helices that result. These helices have two (A), three (B), and six (C and D) subunits per helical turn. The photographs at the top show the arrangement of subunits viewed from directly above the helix. Note that the helix in (D) has a wider path than that in (C), but the same number of subunits per turn. (E) A helix can be either right-handed or left-handed. As a reference, it is useful to remember that standard metal screws, which insert when turned clockwise, are right-handed. Note that a helix retains the same handedness when it is turned upside down.
Collagen is the most abundant of these proteins in animal tissues. A collagen molecule consists of three long polypeptide chains, each containing the nonpolar amino acid glycine at every third position. This regular structure allows the chains to wind around one another to generate a long regular triple helix (Figure 3-27A). Many collagen molecules then bind to one another side-by-side and end-to-end to create long overlapping arrays, thereby generating the extremely tough collagen fibrils that give connective tissues their tensile strength, as described in Chapter 19.
Many Proteins Contain a Surprisingly Large Amount of Unstructured Polypeptide Chain
It has been well known for a long time that, in complete contrast to collagen, another abundant protein in the extracellular matrix, elastin, is formed as a highly disordered polypeptide. This disorder is essential for elastin's function. Its
[Figure 3-27 labels: (A) collagen triple helix; collagen molecule, 300 nm x 1.5 nm; short section of collagen fibril. (B) elastic fiber; single elastin molecule; STRETCH]
Figure 3-27 Collagen and elastin. (A) Collagen is a triple helix formed by three extended protein chains that wrap around one another (bottom). Many rodlike collagen molecules are cross-linked together in the extracellular space to form unextendable collagen fibrils (top) that have the tensile strength of steel. The striping on the collagen fibril is caused by the regular repeating arrangement of the collagen molecules within the fibril. (B) Elastin polypeptide chains are cross-linked together to form rubberlike, elastic fibers. Each elastin molecule uncoils into a more extended conformation when the fiber is stretched and recoils spontaneously as soon as the stretching force is relaxed.
relatively loose and unstructured polypeptide chains are covalently cross-linked to produce a rubberlike elastic meshwork that can be reversibly pulled from one conformation to another, as illustrated in Figure 3-27B. The elastic fibers that result enable skin and other tissues, such as arteries and lungs, to stretch and recoil without tearing.
Intrinsically unstructured regions of proteins are quite frequent in nature, having important functions in the interior of cells. As we have already seen, proteins use the short loops of polypeptide chain that generally protrude from the core region of protein domains to bind other molecules. Similarly, many proteins have much longer regions of unstructured amino acid sequences that interact with another molecule (often DNA or a protein), undergoing a structural transition to a specific folded conformation when the other molecule is bound. Other proteins appear to resemble elastin, in so far as their function requires that they remain largely unstructured. For example, the abundant nucleoporins that coat the inner surface of the nuclear pore complex form a random coil meshwork that is intimately involved in nuclear transport (see Figure 12-10). Finally, as will be discussed later in this chapter (see Figure 3-80C), unstructured regions of polypeptide chain are often used to connect the binding sites for proteins that function together to catalyze a biological reaction. Thus, for example, in facilitating cell signaling, large scaffold proteins use such flexible regions as "tethers" that concentrate sets of interacting proteins, often confining them to particular sites in the cell (discussed in Chapter 15).
We can recognize the unstructured regions in many proteins by their biased amino acid composition: they contain very few of the bulky hydrophobic amino acids that normally form the core of a folded protein, being composed instead of a high proportion of the amino acids Gln, Ser, Pro, Glu, and Lys. Such "natively unfolded" regions also frequently contain repeated sequences of amino acids.
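The compositional bias just described suggests a simple way to scan a sequence for candidate unstructured regions. The Python sketch below is a rough illustration under stated assumptions, not an established prediction method. Only the five enriched residues (Gln, Ser, Pro, Glu, Lys) come from the text. The set of "bulky hydrophobic" residues, the window size, both thresholds, and the toy sequence are hypothetical choices made for the example.

```python
# One-letter codes for Gln, Ser, Pro, Glu, Lys: the residues named in the text.
ENRICHED_IN_UNSTRUCTURED = set("QSPEK")

# Assumption: one plausible reading of "bulky hydrophobic" (Trp, Phe, Tyr,
# Ile, Leu, Met, Val). The text does not list these residues.
BULKY_HYDROPHOBIC = set("WFYILMV")


def window_fractions(window: str):
    """Return (fraction enriched residues, fraction bulky hydrophobic residues)."""
    n = len(window)
    enriched = sum(aa in ENRICHED_IN_UNSTRUCTURED for aa in window)
    hydrophobic = sum(aa in BULKY_HYDROPHOBIC for aa in window)
    return enriched / n, hydrophobic / n


def flag_unstructured_windows(sequence: str, size: int = 20,
                              min_enriched: float = 0.6,
                              max_hydrophobic: float = 0.1):
    """Start positions of windows whose composition resembles unstructured chain.

    The thresholds are arbitrary illustrative values, not calibrated ones.
    """
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - size + 1):
        enriched, hydrophobic = window_fractions(sequence[start:start + size])
        if enriched >= min_enriched and hydrophobic <= max_hydrophobic:
            hits.append(start)
    return hits


# Hypothetical toy sequence: a hydrophobic stretch followed by a biased stretch.
toy = "MVLIFWAVLLIVGAFLMVIA" + "SPEKQSPEKQSPEKQSPEKQ"
print(flag_unstructured_windows(toy))
```

Only windows lying mostly or entirely within the second half of the toy sequence are flagged. A practical tool would also look for the repeated sequences mentioned above, which this sketch ignores.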
Covalent Cross-Linkages Often Stabilize Extracellular Proteins
Many protein molecules are either attached to the outside of a cell's plasma membrane or secreted as part of the extracellular matrix. All such proteins are directly exposed to extracellular conditions. To help maintain their structures, the polypeptide chains in such proteins are often stabilized by covalent cross-linkages. These linkages can either tie two amino acids in the same protein together, or connect different polypeptide chains in a multisubunit protein. The most common cross-linkages in proteins are covalent sulfur-sulfur bonds. These disulfide bonds (also called S-S bonds) form as cells prepare newly synthesized proteins for export. As described in Chapter 12, their formation is catalyzed in the endoplasmic reticulum by an enzyme that links together two pairs of -SH groups of cysteine side chains that are adjacent in the folded protein (Figure 3-28). Disulfide bonds do not change the conformation of a protein but instead act as atomic staples to reinforce its most favored conformation. For
[Figure 3-28 labels: cysteine -SH groups; oxidants; reductants; interchain disulfide bond]
Figure 3-28 Disulfide bonds. This diagram illustrates how covalent disulfide bonds form between adjacent cysteine side chains. As indicated, these cross-linkages can join either two parts of the same polypeptide chain or two different polypeptide chains. Since the energy required to break one covalent bond is much larger than the energy required to break even a whole set of noncovalent bonds (see Table 2-1, p. 53), a disulfide bond can have a major stabilizing effect on a protein.
[Figure 3-29 labels: hexagonally packed sheet; helical tube]
example, lysozyme, an enzyme in tears that dissolves bacterial cell walls, retains its antibacterial activity for a long time because it is stabilized by such cross-linkages.
Disulfide bonds generally fail to form in the cell cytosol, where a high concentration of reducing agents converts S-S bonds back to cysteine -SH groups. Apparently, proteins do not require this type of reinforcement in the relatively mild environment inside the cell.
Protein Molecules Often Serve as Subunits for the Assembly of Large Structures
The same principles that enable a protein molecule to associate with itself to form rings or filaments also operate to generate much larger structures in the cell: supramolecular structures such as enzyme complexes, ribosomes, protein filaments, viruses, and membranes. These large objects are not made as single, giant, covalently linked molecules. Instead they are formed by the noncovalent assembly of many separately manufactured molecules, which serve as the subunits of the final structure.
The use of smaller subunits to build larger structures has several advantages:
1. A large structure built from one or a few repeating smaller subunits requires only a small amount of genetic information.
2. Both assembly and disassembly can be readily controlled, reversible processes, because the subunits associate through multiple bonds of relatively low energy.
3. Errors in the synthesis of the structure can be more easily avoided, since correction mechanisms can operate during the course of assembly to exclude malformed subunits.
Some protein subunits assemble into flat sheets in which the subunits are arranged in hexagonal patterns. Specialized membrane proteins are sometimes arranged this way in lipid bilayers. With a slight change in the geometry of the individual subunits, a hexagonal sheet can be converted into a tube (Figure 3-29) or, with more changes, into a hollow sphere. Protein tubes and spheres that bind specific RNA and DNA molecules in their interior form the coats of viruses.
The formation of closed structures, such as rings, tubes, or spheres, provides additional stability because it increases the number of bonds between the protein subunits. Moreover, because such a structure is created by mutually dependent, cooperative interactions between subunits, a relatively small change that affects each subunit individually can cause the structure to assemble or disassemble.
These principles are dramatically illustrated in the protein coat or capsid of many simple viruses, which takes the form of a hollow sphere based on an icosahedron (Figure 3-30). Capsids are often made of hundreds of identical protein subunits that enclose and protect the viral nucleic acid (Figure 3-31). The protein in such a capsid must have a particularly adaptable structure: not only must it make several different kinds of contacts to create the sphere, it must also change this arrangement to let the nucleic acid out to initiate viral replication once the virus has entered a cell.
Figure 3-29 An example of the assembly of a single protein subunit requiring multiple protein-protein contacts. Hexagonally packed globular protein subunits can form either a flat sheet or a tube.
[Figure 3-30 scale bar: 20 nm]
Figure 3-30 The capsids of some viruses, all shown at the same scale. (A) Tomato bushy stunt virus; (B) poliovirus; (C) simian virus 40 (SV40); (D) satellite tobacco necrosis virus. The structures of all of these capsids have been determined by x-ray crystallography and are known in atomic detail. (Courtesy of Robert Grant, Stephan Crainic, and James M. Hogle.)
Many Structures in Cells Are Capable of Self-Assembly
The information for forming many of the complex assemblies of macromolecules in cells must be contained in the subunits themselves, because purified subunits can spontaneously assemble into the final structure under the appropriate conditions. The first large macromolecular aggregate shown to be capable of self-assembly from its component parts was tobacco mosaic virus (TMV). This virus is a long rod in which a cylinder of protein is arranged around a helical RNA core (Figure 3-32). If the dissociated RNA and protein subunits are mixed together in solution, they recombine to form fully active viral particles. The assembly process is unexpectedly complex and includes the formation of double rings of protein, which serve as intermediates that add to the growing viral coat.
Another complex macromolecular aggregate that can reassemble from its component parts is the bacterial ribosome. This structure is composed of about 55 different protein molecules and 3 different rRNA molecules. Incubating the individual components under appropriate conditions in a test tube causes them to spontaneously re-form the original structure. Most importantly, such reconstituted ribosomes are able to catalyze protein synthesis. As might be expected, the reassembly of ribosomes follows a specific pathway: after certain proteins have bound to the RNA, this complex is then recognized by other proteins, and so on, until the structure is complete.
It is still not clear how some of the more elaborate self-assembly processes are regulated. Many structures in the cell, for example, seem to have a precisely
[Figure 3-31 labels: capsid protein monomer shown as ribbon model, with projecting domain, shell domain, connecting arm, and RNA-binding domain; dimer; scale bar, 20 nm]
Figure 3-31 The structure of a spherical virus. In many viruses, identical protein subunits pack together to create a spherical shell (a capsid) that encloses the viral genome, composed of either RNA or DNA (see also Figure 3-30). For geometric reasons, no more than 60 identical subunits can pack together in a precisely symmetric way. If slight irregularities are allowed, however, more subunits can be used to produce a larger capsid that retains icosahedral symmetry. The tomato bushy stunt virus (TBSV) shown here, for example, is a spherical virus about 33 nm in diameter formed from 180 identical copies of a 386 amino acid capsid protein plus an RNA genome of 4500 nucleotides. To construct such a large capsid, the protein must be able to fit into three somewhat different environments, each of which is differently colored in the virus particle shown here. The postulated pathway of assembly is shown; the precise three-dimensional structure has been determined by x-ray diffraction. (Courtesy of Steve Harrison.)
defined length that is many times greater than that of their component macromolecules. How such length determination is achieved is in many cases a mystery. Three possible mechanisms are illustrated in Figure 3-33. In the simplest
Figure 3-32 The structure of tobacco mosaic virus (TMV). (A) An electron micrograph of the viral particle, which consists of a single long RNA molecule enclosed in a cylindrical protein coat composed of identical protein subunits. (B) A model showing part of the structure of TMV. A single-stranded RNA molecule of 6395 nucleotides is packaged in a helical coat constructed from 2130 copies of a coat protein 158 amino acids long. Fully infective viral particles can self-assemble in a test tube from purified RNA and protein molecules. (A, courtesy of Robley Williams; B, courtesy of Richard J. Feldmann.)
Figure 3-33 Three mechanisms of length determination for large protein assemblies. (A) Coassembly along an elongated core protein or other macromolecule that acts as a measuring device. (B) Termination of assembly because of strain that accumulates in the polymeric structure as additional subunits are added, so that beyond a certain length the energy required to fit another subunit onto the chain becomes excessively large. (C) A vernier type of assembly, in which two sets of rodlike molecules differing in length form a staggered complex that grows until their ends exactly match. The name derives from a measuring device based on the same principle, used in mechanical instruments.
[Figure 3-33 panel labels: (A) assembly on core; (B) accumulated strain; (C) vernier mechanism]
case, a long core protein or other macromolecule provides a scaffold that determines the extent of the final assembly. This is the mechanism that determines the length of the TMV particle, where the RNA chain provides the core. Similarly, a core protein is thought to determine the length of the thin filaments in muscle, as well as the length of the long tails of some bacterial viruses (Figure 3-34).
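As a rough numerical check on the idea that the RNA chain acts as a ruler for the TMV particle, the following Python sketch combines the numbers in the Figure 3-32 legend (6395 nucleotides, 2130 coat protein subunits) with two helix parameters that are not given in this chapter and are assumed here: about 49 subunits per 3 helical turns, and a rise of about 2.3 nm per turn. The function names are illustrative only.

```python
# Sketch: how an RNA "ruler" could fix the length of the TMV rod.
# Values taken from the Figure 3-32 legend:
RNA_NUCLEOTIDES = 6395
COAT_SUBUNITS = 2130

# Assumed helix geometry (not stated in this chapter):
SUBUNITS_PER_TURN = 49 / 3   # about 16.3 coat subunits per helical turn
PITCH_NM = 2.3               # rise of the helix per turn, in nanometers


def nucleotides_per_subunit(nucleotides, subunits):
    """Average number of RNA nucleotides covered by one coat subunit."""
    return nucleotides / subunits


def rod_length_nm(nucleotides, nt_per_subunit=3):
    """Rod length if every coat subunit binds nt_per_subunit nucleotides."""
    subunits = nucleotides / nt_per_subunit
    turns = subunits / SUBUNITS_PER_TURN
    return turns * PITCH_NM


print(round(nucleotides_per_subunit(RNA_NUCLEOTIDES, COAT_SUBUNITS), 2))
print(round(rod_length_nm(RNA_NUCLEOTIDES)))
```

Under these assumptions each subunit covers very nearly three nucleotides, and the rod comes out at roughly 300 nm. The point of the sketch is only that, once the number of nucleotides per subunit is fixed, the length of the RNA determines how many subunits can add and therefore how long the particle grows.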
Assembly Factors Often Aid the Formation of Complex Biological Structures
Not all cellular structures held together by noncovalent bonds self-assemble. A mitochondrion, a cilium, or a myofibril of a muscle cell, for example, cannot form spontaneously from a solution of its component macromolecules. In these cases, part of the assembly information is provided by special enzymes and other proteins that perform the function of templates, guiding construction but taking no part in the final assembled structure.
Even relatively simple structures may lack some of the ingredients necessary for their own assembly. In the formation of certain bacterial viruses, for example, the head, which is composed of many copies of a single protein subunit, is assembled on a temporary scaffold composed of a second protein. Because the second protein is absent from the final viral particle, the head structure cannot spontaneously reassemble once it has been taken apart. Other examples are known in which proteolytic cleavage is an essential and irreversible step in the normal assembly process. This is even the case for some small protein assemblies, including the structural protein collagen and the hormone insulin (Figure 3-35). From these relatively simple examples, it seems certain that the assembly of a structure as complex as a mitochondrion or a cilium will involve temporal and spatial ordering imparted by numerous other cell components.
Figure 3-34 An electron micrograph of bacteriophage lambda. The tip of the virus tail attaches to a specific protein on the surface of a bacterial cell, after which the tightly packaged DNA in the head is injected through the tail into the cell. The tail has a precise length, determined by the mechanism shown in Figure 3-33A.
Figure 3-35 Proteolytic cleavage in insulin assembly. The polypeptide hormone insulin cannot spontaneously re-form efficiently if its disulfide bonds are disrupted. It is synthesized as a larger protein (proinsulin) that is cleaved by a proteolytic enzyme after the protein chain has folded into a specific shape. Excision of part of the proinsulin polypeptide chain removes some of the information needed for the protein to fold spontaneously into its normal conformation. Once insulin has been denatured and its two polypeptide chains have separated, its ability to reassemble is lost.
[Figure 3-35 labels: proinsulin; reduction irreversibly separates the two chains; SH, SH]
Summary
A protein molecule's amino acid sequence determines its three-dimensional conformation. Noncovalent interactions between different parts of the polypeptide chain stabilize its folded structure. The amino acids with hydrophobic side chains tend to cluster in the interior of the molecule, and local hydrogen-bond interactions between neighboring peptide bonds give rise to α helices and β sheets.
Globular regions, known as domains, are the modular units from which many proteins are constructed; such domains generally contain 40-350 amino acids. Small proteins typically consist of only a single domain, while large proteins are formed from several domains linked together by various lengths of polypeptide chain, some of which can be relatively disordered. As proteins have evolved, domains have been modified and combined with other domains to construct new proteins. Thus far, about 800 different ways of folding up a domain have been observed, among more than 20,000 known protein structures.
Proteins are brought together into larger structures by the same noncovalent forces that determine protein folding. Proteins with binding sites for their own surface can assemble into dimers, closed rings, spherical shells, or helical polymers. Although mixtures of proteins and nucleic acids can assemble spontaneously into complex structures in a test tube, many biological assembly processes involve irreversible steps. Consequently, not all structures in the cell are capable of spontaneous reassembly after they have been dissociated into their component parts.
PROTEIN FUNCTION
We have seen that each type of protein consists of a precise sequence of amino acids that allows it to fold up into a particular three-dimensional shape, or conformation. But proteins are not rigid lumps of material. They often have precisely engineered moving parts whose mechanical actions are coupled to chemical events. It is this coupling of chemistry and movement that gives proteins the extraordinary capabilities that underlie the dynamic processes in living cells.
In this section, we explain how proteins bind to other selected molecules and how their activity depends on such binding. We show that the ability to bind to other molecules enables proteins to act as catalysts, signal receptors, switches, motors, or tiny pumps. The examples we discuss in this chapter by no means exhaust the vast functional repertoire of proteins. You will encounter the specialized functions of many other proteins elsewhere in this book, based on similar principles.
All Proteins Bind to Other Molecules
A protein molecule's physical interaction with other molecules determines its biological properties. Thus, antibodies attach to viruses or bacteria to mark them for destruction, the enzyme hexokinase binds glucose and ATP so as to catalyze a reaction between them, actin molecules bind to each other to assemble into actin filaments, and so on. Indeed, all proteins stick, or bind, to other molecules. In some cases, this binding is very tight; in others it is weak and short-lived. But the binding always shows great specificity, in the sense that each protein molecule can usually bind just one or a few molecules out of the many thousands of different types it encounters. The substance that is bound by the protein (whether it is an ion, a small molecule, or a macromolecule such as another protein) is referred to as a ligand for that protein (from the Latin word ligare, meaning "to bind").
The ability of a protein to bind selectively and with high affinity to a ligand depends on the formation of a set of weak, noncovalent bonds (hydrogen bonds, electrostatic attractions, and van der Waals attractions) plus favorable hydrophobic interactions (see Panel 2-3, pp. 110-111). Because each individual bond is weak, effective binding occurs only when many of these bonds form simultaneously. Such binding is possible only if the surface contours of the ligand molecule fit very closely to the protein, matching it like a hand in a glove (Figure 3-36).
The region of a protein that associates with a ligand, known as the ligand's binding site, usually consists of a cavity in the protein surface formed by a particular arrangement of amino acids. These amino acids can belong to different portions of the polypeptide chain that are brought together when the protein folds (Figure 3-37).
Separate regions of the protein surface generally provide binding sites for different ligands, allowing the protein's activity to be regulated, as we shall see later. And other parts of the protein act as a handle to position the protein in the cell; an example is the SH2 domain discussed previously, which often moves a protein containing it to particular intracellular sites in response to particular signals.
Although the atoms buried in the interior of the protein have no direct contact with the ligand, they form the framework that gives the surface its contours and its chemical and mechanical properties. Even small changes to the amino acids in the interior of a protein molecule can change its three-dimensional shape enough to destroy a binding site on the surface.
Figure 3-36 The selective binding of a protein to another molecule. Many weak bonds are needed to enable a protein to bind tightly to a second molecule, which is called a ligand for the protein. A ligand must therefore fit precisely into a protein's binding site, like a hand into a glove, so that a large number of noncovalent bonds form between the protein and the ligand.
[Figure 3-36 labels: noncovalent bonds; ligand]
Figure 3-37 The binding site of a protein. (A) The folding of the polypeptide chain typically creates a crevice or cavity on the protein surface. This crevice contains a set of amino acid side chains disposed in such a way that they can form noncovalent bonds only with certain ligands. (B) A close-up of an actual binding site showing the hydrogen bonds and electrostatic interactions formed between a protein and its ligand. In this example, cyclic AMP is the bound ligand.
[Figure 3-37 labels, panel (A): amino acid side chains; unfolded protein; FOLDING; folded protein]
The Surface Conformation of a Protein Determines Its Chemistry
Proteins have impressive chemical capabilities because the neighboring chemical groups on their surface often interact in ways that enhance the chemical reactivity of amino acid side chains. These interactions fall into two main categories.
First, the interaction of neighboring parts of the polypeptide chain may restrict the access of water molecules to that protein's ligand-binding sites. This is important because water molecules readily form hydrogen bonds that can compete with ligands for sites on the protein surface. Proteins and their ligands form tighter hydrogen bonds (and electrostatic interactions) if the protein can exclude water molecules from its binding sites. It might be hard to imagine a mechanism that would exclude a molecule as small as water from a protein surface without affecting the access of the ligand itself. However, because of the strong tendency of water molecules to form water-water hydrogen bonds, water molecules exist in a large hydrogen-bonded network (see Panel 2-2, pp. 108-109). In effect, a protein can keep a ligand-binding site dry because it is energetically unfavorable for individual water molecules to break away from this network, as they must do to reach into a crevice on a protein's surface.
Second, the clustering of neighboring polar amino acid side chains can alter their reactivity. If protein folding forces together a number of negatively charged side chains against their mutual repulsion, for example, the affinity of the site for a positively charged ion is greatly increased. In addition, when amino acid side chains interact with one another through hydrogen bonds, normally unreactive side groups (such as the -CH2OH on the serine shown in Figure 3-38) can become reactive, enabling them to be used to make or break selected covalent bonds.
The surface of each protein molecule therefore has a unique chemical reactivity that depends not only on which amino acid side chains are exposed, but also on their exact orientation relative to one another. For this reason, two even slightly different conformations of the same protein molecule may differ greatly in their chemistry.
Figure 3-38 An unusually reactive amino acid at the active site of an enzyme. This example is the "catalytic triad" found in chymotrypsin, elastase, and other serine proteases (see Figure 3-12). The aspartic acid side chain (Asp 102) induces the histidine (His 57) to remove the proton from serine 195. This activates the serine to form a covalent bond with the enzyme substrate, hydrolyzing a peptide bond. The many convolutions of the polypeptide chain are omitted here.
Sequence Comparisons Between Protein Family Members Highlight Crucial Ligand-Binding Sites
As we have described previously, genome sequences allow us to group many of the domains in proteins into families that show clear evidence of their evolution from a common ancestor. The three-dimensional structures of the members of the same domain family are remarkably similar. For example, even when the amino acid sequence identity falls to 25%, the backbone atoms in a domain can follow a common protein fold within 0.2 nanometers (2 Å).
We can therefore use a method called evolutionary tracing to identify those sites in a protein domain that are the most crucial to the domain's function. For this purpose, those amino acids that are unchanged, or nearly unchanged, in all of the known protein family members are mapped onto a model of the three-dimensional structure of one family member. When this is done, the most invariant positions often form one or more clusters on the protein surface, as illustrated in Figure 3-39A for the SH2 domain described previously (see Panel 3-2, pp. 132-133). These clusters generally correspond to ligand-binding sites.
The SH2 domain is a module that functions in protein-protein interactions. It binds the protein containing it to a second protein that contains a phosphorylated tyrosine side chain in a specific amino acid sequence context, as shown in Figure 3-39B. The amino acids located at the binding site for the phosphorylated polypeptide have been the slowest to change during the long evolutionary process that produced the large SH2 family of peptide recognition domains. Because mutation is a random process, this result is attributed to the preferential elimination during evolution of all organisms whose SH2 domains became altered in a way that inactivated the SH2-binding site, thereby destroying the function of the SH2 domain.
Figure 3-39 The evolutionary trace method applied to the SH2 domain. (A) Front and back views of a space-filling model of the SH2 domain, with evolutionarily conserved amino acids on the protein surface colored yellow, and those more toward the protein interior colored red. (B) The structure of the SH2 domain with its bound polypeptide. Here, those amino acids located within 0.4 nm of the bound ligand are colored blue. The two key amino acids of the ligand are yellow, and the others are purple. Note the high degree of correspondence between (A) and (B). (Adapted from O. Lichtarge, H.R. Bourne, and F.E. Cohen, J. Mol. Biol. 257:342-358, 1996. With permission from Elsevier.)
[Figure 3-39 labels: (A) FRONT, BACK; phosphotyrosine; polypeptide ligand]
In this era of extensive genome sequencing, many new protein families have been discovered whose functions are unknown. Once a three-dimensional structure has been determined for one family member, evolutionary tracing allows biologists to determine binding sites for the members of that family, thereby helping to decipher protein function.
Proteins Bind to Other Proteins Through Several Types of Interfaces
Proteins can bind to other proteins in at least three ways. In many cases, a portion of the surface of one protein contacts an extended loop of polypeptide chain (a "string") on a second protein (Figure 3-40A). Such a surface-string interaction, for example, allows the SH2 domain to recognize a phosphorylated polypeptide loop on a second protein, as just described, and it also enables a protein kinase to recognize the proteins that it will phosphorylate (see below).
A second type of protein-protein interface forms when two α helices, one from each protein, pair together to form a coiled-coil (Figure 3-40B). This type of protein interface is found in several families of gene regulatory proteins, as discussed in Chapter 7.
The most common way for proteins to interact, however, is by the precise matching of one rigid surface with that of another (Figure 3-40C). Such interactions can be very tight, since a large number of weak bonds can form between two surfaces that match well. For the same reason, such surface-surface interactions can be extremely specific, enabling a protein to select just one partner from the many thousands of different proteins found in a cell.
Figure 3-40 Three ways in which two proteins can bind to each other. Only the interacting parts of the two proteins are shown. (A) A rigid surface on one protein can bind to an extended loop of polypeptide chain (a "string") on a second protein. (B) Two α helices can bind together to form a coiled-coil. (C) Two complementary rigid surfaces often link two proteins together.
[Figure 3-40 labels: (A) SURFACE-STRING; (B) HELIX-HELIX; (C) SURFACE-SURFACE; surface 1; surface 2; helix 1]
Antibody Binding Sites Are Especially Versatile
All proteins must bind to particular ligands to carry out their various functions. The antibody family is notable for its capacity for tight selective binding (discussed in detail in Chapter 25).
Antibodies, or immunoglobulins, are proteins produced by the immune system in response to foreign molecules, such as those on the surface of an invading microorganism. Each antibody binds tightly to a particular target molecule, thereby either inactivating the target molecule directly or marking it for destruction. An antibody recognizes its target (called an antigen) with remarkable specificity. Because there are potentially billions of different antigens that humans might encounter, we have to be able to produce billions of different antibodies.
Antibodies are Y-shaped molecules with two identical binding sites that are complementary to a small portion of the surface of the antigen molecule. A detailed examination of the antigen-binding sites of antibodies reveals that they are formed from several loops of polypeptide chain that protrude from the ends of a pair of closely juxtaposed protein domains (Figure 3-41). Different antibodies generate an enormous diversity of antigen-binding sites by changing only the length and amino acid sequence of these loops, without altering the basic protein structure.
Loops of this kind are ideal for grasping other molecules. They allow a large number of chemical groups to surround a ligand so that the protein can link to it with many weak bonds. For this reason, loops often form the ligand-binding sites in proteins.
[Figure 3-41 labels: heavy chain; loops that bind antigen; VH domain; VL domain; variable domain of light chain (VL); NH2; COOH; scale bar, 5 nm; panels (A) and (B)]
The Equilibrium Constant Measures Binding Strength
Molecules in the cell encounter each other very frequently because of their continual random thermal movements. Colliding molecules with poorly matching surfaces form few noncovalent bonds with one another, and the two molecules dissociate as rapidly as they come together. At the other extreme, when many noncovalent bonds form between two colliding molecules, the association can persist for a very long time (Figure 3-42). Strong interactions occur in cells whenever a biological function requires that molecules remain associated for a long time, for example, when a group of RNA and protein molecules come together to make a subcellular structure such as a ribosome.
We can measure the strength with which any two molecules bind to each other. As an example, consider a population of identical antibody molecules that suddenly encounters a population of ligands diffusing in the fluid surrounding them. At frequent intervals, one of the ligand molecules will bump into the binding site of an antibody and form an antibody-ligand complex. The population of antibody-ligand complexes will therefore increase, but not without limit: over time, a second process, in which individual complexes break apart because of thermally induced motion, will become increasingly important. Eventually, any population of antibody molecules and ligands will reach a steady state, or equilibrium, in which the number of binding (association) events per second is precisely equal to the number of "unbinding" (dissociation) events (see Figure 2-52).
From the concentrations of the ligand, antibody, and antibody-ligand complex at equilibrium, we can calculate a convenient measure of the strength of binding, the equilibrium constant (K) (Figure 3-43A). The equilibrium constant for a reaction in which two molecules (A and B) bind to each other to form a complex (AB) has units of liters/mole, and half of the binding sites will be occupied by ligand when that ligand's concentration (in moles/liter) reaches a value that is equal to 1/K. This equilibrium constant is larger the greater the binding strength, and it is a direct measure of the free-energy difference
Figure 3-41 An antibody molecule. (A) A typical antibody molecule is Y-shaped and has two identical binding sites for its antigen, one on each arm of the Y. The protein is composed of four polypeptide chains (two identical heavy chains and two identical and smaller light chains) held together by disulfide bonds. Each chain is made up of several different immunoglobulin domains, here shaded either blue or gray. The antigen-binding site is formed where a heavy-chain variable domain (VH) and a light-chain variable domain (VL) come close together. These are the domains that differ most in their sequence and structure in different antibodies. Each domain at the end of the two arms of the antibody molecule forms loops that bind to the antigen. In (B) we can see these fingerlike loops (red) contributed by the VL domain.
[Figure 3-42 labels: molecule A randomly encounters other molecules (B, C, and D); the surfaces of molecules A and B, and A and C, are a poor match and are capable of forming only a few weak bonds, and thermal motion rapidly breaks them apart; the surfaces of molecules A and D match well and therefore can form enough weak bonds to withstand thermal jolting, so they stay bound to each other]
between the bound and free states (Figure 3-43B and C). Even a change of a few noncovalent bonds can have a striking effect on a binding interaction, as shown by the example in Figure 3-44. (Note that the equilibrium constant, as defined here, is also known as the association or affinity constant, Ka.)
We have used the case of an antibody binding to its ligand to illustrate the effect of binding strength on the equilibrium state, but the same principles apply to any molecule and its ligand. Many proteins are enzymes, which, as we now discuss, first bind to their ligands and then catalyze the breakage or formation of covalent bonds in these molecules.
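The two quantitative statements made about the equilibrium constant (that it measures the free-energy difference between the bound and free states, and that half of the binding sites are occupied when the free ligand concentration equals 1/K) can be checked with a short Python sketch. The numerical constant 0.00458 is the one quoted in Figure 3-43C. The occupancy formula K[L]/(1 + K[L]) is the usual result for a single independent binding site and is an assumption added here rather than something derived in the text, and the function names are illustrative.

```python
import math

# Constant quoted in Figure 3-43C: delta G = -0.00458 * T * log10(K), kcal/mole.
KCAL_PER_MOLE_PER_KELVIN = 0.00458
KJ_PER_KCAL = 4.184


def free_energy_kcal(k_eq, temp_kelvin=310.0):
    """Free energy of AB minus free energy of A + B, in kcal/mole."""
    return -KCAL_PER_MOLE_PER_KELVIN * temp_kelvin * math.log10(k_eq)


def fraction_of_sites_occupied(k_eq, free_ligand_molar):
    """Fraction of binding sites holding ligand (one independent site assumed)."""
    x = k_eq * free_ligand_molar
    return x / (1.0 + x)


# Each tenfold increase in K corresponds to about 1.4 kcal/mole at 37 degrees C.
for exponent in range(1, 12):
    dg = free_energy_kcal(10.0 ** exponent)
    print(f"K = 10^{exponent}: {dg:.1f} kcal/mole, {dg * KJ_PER_KCAL:.1f} kJ/mole")

# Half of the sites are occupied when the free ligand concentration is 1/K.
print(fraction_of_sites_occupied(1e8, 1e-8))
```

Because the free energy depends on the logarithm of K, a modest change in binding energy produces a large change in the equilibrium constant, which is why the loss of a few weak bonds matters so much.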
Figure 3-42 How noncovalent bonds mediate interactions between macromolecules.
Enzymes Are Powerful and Highly Specific Catalysts
Many proteins can perform their function simply by binding to another molecule. An actin molecule, for example, need only associate with other actin
[Figure 3-43, panel (A)]
1. Dissociation: AB → A + B
   dissociation rate = dissociation rate constant × concentration of AB = k_off [AB]
2. Association: A + B → AB
   association rate = association rate constant × concentration of A × concentration of B = k_on [A][B]
3. At equilibrium: association rate = dissociation rate
   k_on [A][B] = k_off [AB]
   [AB] / ([A][B]) = k_on / k_off = K = equilibrium constant
[Figure 3-43, panel (B)] The relationship between free-energy differences and equilibrium constants (37°C)
equilibrium constant K = [AB]/([A][B]) (liters/mole)    free energy of AB minus free energy of A + B (kcal/mole)    (kJ/mole)
1        0        0
10       -1.4     -5.9
10^2     -2.8     -11.9
10^3     -4.3     -17.8
10^4     -5.7     -23.7
10^5     -7.1     -29.7
10^6     -8.5     -35.5
10^7     -9.9     -41.5
10^8     -11.3    -47.4
10^9     -12.8    -53.4
10^10    -14.2    -59.4
10^11    -15.6    -65.3
[Figure 3-43, panel (C)] Although joules and kilojoules (1000 joules) are standard units of energy, cell biologists usually refer to free energy values in terms of calories and kilocalories. One kilocalorie (kcal) is equal to 4.184 kilojoules (kJ). The relationship between the free-energy change, ΔG, and the equilibrium constant is ΔG = -0.00458 T log K, where ΔG is in kilocalories and T is the absolute temperature in Kelvins (310 K = 37°C).
Figure 3-43 Relating binding energies to the equilibrium constant for an association reaction. (A) The equilibrium between molecules A and B and the complex AB is maintained by a balance between the two opposing reactions shown in panels 1 and 2. Molecules A and B must collide if they are to react, and the association rate is therefore proportional to the product of their individual concentrations [A] × [B]. (Square brackets indicate concentration.) As shown in panel 3, the ratio of the rate constants for the association and the dissociation reactions is equal to the equilibrium constant (K) for the reaction. (B) The equilibrium constant in panel 3 is that for the reaction A + B ⇌ AB, and the larger its value, the stronger the binding between A and B. Note that for every 1.41 kcal/mole (5.91 kJ/mole) decrease in free energy the equilibrium constant increases by a factor of 10 at 37°C. The equilibrium constant here has units of liters/mole: for simple binding interactions it is also called the affinity constant or association constant, denoted Ka. The reciprocal of Ka is called the dissociation constant, Kd (in units of moles/liter).
molecules to form a filament. There are other proteins, however, for which ligand binding is only a necessary first step in their function. This is the case for the large and very important class of proteins called enzymes. As described in Chapter 2, enzymes are remarkable molecules that determine all the chemical transformations that make and break covalent bonds in cells. They bind to one or more ligands, called substrates, and convert them into one or more chemically modified products, doing this over and over again with amazing rapidity. Enzymes speed up reactions, often by a factor of a million or more, without themselves being changed; that is, they act as catalysts that permit cells to make or break covalent bonds in a controlled way. It is the catalysis of organized sets of chemical reactions by enzymes that creates and maintains the cell, making life possible.
We can group enzymes into functional classes that perform similar chemical reactions (Table 3-1). Each type of enzyme within such a class is highly specific, catalyzing only a single type of reaction. Thus, hexokinase adds a phosphate group to D-glucose but ignores its optical isomer L-glucose; the blood-clotting enzyme thrombin cuts one type of blood protein between a particular arginine and its adjacent glycine and nowhere else, and so on. As discussed in detail in Chapter 2, enzymes work in teams, with the product of one enzyme becoming the substrate for the next. The result is an elaborate network of metabolic pathways that provides the cell with energy and generates the many large and small molecules that the cell needs (see Figure 2-35).
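To give a feel for what a speed-up "by a factor of a million or more" means energetically, the Python sketch below converts a rate enhancement into the corresponding reduction in activation energy. It assumes the standard Arrhenius-type relationship, in which the rate is proportional to exp(-Ea/RT); that relationship and the gas constant value are brought in here as assumptions and are not stated in the passage above. The function names are illustrative.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3   # gas constant in kcal/(mole K)


def barrier_drop_kcal(rate_enhancement, temp_kelvin=310.0):
    """Reduction in activation energy that would give this rate enhancement."""
    return GAS_CONSTANT_KCAL * temp_kelvin * math.log(rate_enhancement)


def speedup_factor(barrier_drop, temp_kelvin=310.0):
    """Rate enhancement produced by lowering the barrier by barrier_drop kcal/mole."""
    return math.exp(barrier_drop / (GAS_CONSTANT_KCAL * temp_kelvin))


# A millionfold acceleration at 37 degrees C (310 K):
print(round(barrier_drop_kcal(1e6), 1))
```

On this assumption a millionfold acceleration corresponds to lowering the barrier by roughly 8.5 kcal/mole, which is the energy of only a handful of weak noncovalent bonds.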
Substrate Binding Is the First Step in Enzyme Catalysis
[Figure 3-44 text] Consider 1000 molecules of A and 1000 molecules of B in a eucaryotic cell. The concentration of both will be about 10^-9 M. If the equilibrium constant (K) for A + B ⇌ AB is 10^10, then one can calculate that at equilibrium there will be 270 A molecules, 270 B molecules, and 730 AB molecules. If the equilibrium constant is a little weaker at 10^8, which represents a loss of 2.8 kcal/mole of binding energy from the example above, or 2-3 fewer hydrogen bonds, then there will be 915 A molecules, 915 B molecules, and 85 AB molecules.
Figure 3-44 Small changes in the number of weak bonds can have drastic effects on a binding interaction. This example illustrates the dramatic effect of the presence or absence of a few weak noncovalent bonds in a biological context.
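The numbers in Figure 3-44 can be reproduced, at least approximately, by solving the equilibrium condition K = [AB]/([A][B]) for equal starting amounts of A and B. The Python sketch below does this with the quadratic formula. It takes the figure's statement that 1000 molecules correspond to about 10^-9 M at face value, and the function name is illustrative.

```python
import math


def complexes_at_equilibrium(n_total, total_conc_molar, k_eq):
    """Number of AB complexes when A and B each start at n_total molecules.

    With u the fraction of A (and of B) found in the complex, the condition
    K = [AB] / ([A][B]) becomes kc * (1 - u)**2 = u, where kc = K * c_total.
    """
    kc = k_eq * total_conc_molar
    b = 2.0 * kc + 1.0
    # Smaller root of kc*u**2 - b*u + kc = 0 (the root that lies between 0 and 1).
    u = (b - math.sqrt(b * b - 4.0 * kc * kc)) / (2.0 * kc)
    return n_total * u


for k_eq in (1e10, 1e8):
    bound = complexes_at_equilibrium(1000, 1e-9, k_eq)
    print(f"K = {k_eq:.0e}: about {bound:.0f} AB, {1000 - bound:.0f} free A and free B")
```

With K = 10^10 the sketch gives about 730 complexes and 270 free molecules of each kind, as in the figure. With K = 10^8 it gives about 84 complexes, close to the 85 quoted; the small difference presumably reflects rounding of the concentration.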
For a protein that catalyzes a chemical reaction (an enzyme), the binding of each substrate molecule to the protein is an essential prelude. In the simplest case, if we denote the enzyme by E, the substrate by S, and the product by P, the basic reaction path is E + S → ES → EP → E + P. From this reaction path, we see that there is a limit to the amount of substrate that a single enzyme molecule can process in a given time. An increase in the concentration of substrate also increases the rate at which product is formed, up to a maximum value (Figure 3-45). At that point the enzyme molecule is saturated with substrate, and the rate of reaction (Vmax) depends only on how rapidly the enzyme can process the substrate molecule. This maximum rate divided by the enzyme concentration is
Table 3-1 Some Common Types of Enzymes
ENZYME: REACTION CATALYZED
Hydrolases: general term for enzymes that catalyze a hydrolytic cleavage reaction; nucleases and proteases are more specific names for subclasses of these enzymes.
Nucleases: break down nucleic acids by hydrolyzing bonds between nucleotides.
Proteases: break down proteins by hydrolyzing bonds between amino acids.
Synthases: synthesize molecules in anabolic reactions by condensing two smaller molecules together.
Isomerases: catalyze the rearrangement of bonds within a single molecule.
Polymerases: catalyze polymerization reactions such as the synthesis of DNA and RNA.
Kinases: catalyze the addition of phosphate groups to molecules. Protein kinases are an important group of kinases that attach phosphate groups to proteins.
Phosphatases: catalyze the hydrolytic removal of a phosphate group from a molecule.
Oxido-Reductases: general name for enzymes that catalyze reactions in which one molecule is oxidized while the other is reduced. Enzymes of this type are often more specifically named either oxidases, reductases, or dehydrogenases.
ATPases: hydrolyze ATP. Many proteins with a wide range of roles have an energy-harnessing ATPase activity as part of their function, for example, motor proteins such as myosin and membrane transport proteins such as the sodium-potassium pump.
Enzyme names typically end in "-ase," with the exception of some enzymes, such as pepsin, trypsin, thrombin and lysozyme, that were discovered and named before the convention became generally accepted at the end of the nineteenth century. The common name of an enzyme usually indicates the substrate and the nature of the reaction catalyzed. For example, citrate synthase catalyzes the synthesis of citrate by a reaction between acetyl CoA and oxaloacetate.
Figure 3-45 Enzyme kinetics. The rate of an enzyme reaction (V) increases as the substrate concentration increases until a maximum value (Vmax) is reached. At this point all substrate-binding sites on the enzyme molecules are fully occupied, and the rate of reaction is limited by the rate of the catalytic process on the enzyme surface. For most enzymes, the concentration of substrate (Km) at which the reaction rate is half-maximal (black dot) is a measure of how tightly the substrate is bound, with a large value of Km corresponding to weak binding.
[Figure 3-45 axes: rate of reaction (vertical), marked at Vmax and 0.5 Vmax; substrate concentration (horizontal), marked at Km]
called the turnover number. The turnover number is often about 1000 substrate molecules processed per second per enzyme molecule, although turnover numbers between 1 and 10,000 are known. The other kinetic parameter frequently used to characterize an enzyme is its Km, the concentration of substrate that allows the reaction to proceed at one-half its maximum rate (0.5 Vmax) (see Figure 3-45). A low Km value means that the enzyme reaches its maximum catalytic rate at a low concentration of substrate and generally indicates that the enzyme binds to its substrate very tightly, whereas a high Km value corresponds to weak binding. The methods used to characterize enzymes in this way are explained in Panel 3-3 (pp. 162-163).
Enzymes Speed Reactions by Selectively Stabilizing Transition States
Enzymes achieve extremely high rates of chemical reaction, rates that are far higher than for any synthetic catalysts. There are several reasons for this efficiency. First, the enzyme increases the local concentration of substrate molecules at the catalytic site and holds all the appropriate atoms in the correct orientation for the reaction that is to follow. More importantly, however, some of the binding energy contributes directly to the catalysis. Substrate molecules must pass through a series of intermediate states of altered geometry and electron distribution before they form the ultimate products of the reaction. The free energy required to attain the most unstable transition state is called the activation energy for the reaction, and it is the major determinant of the reaction rate. Enzymes have a much higher affinity for the transition state of the substrate than they have for the stable form. Because this tight binding greatly lowers the energy of the transition state, the enzyme greatly accelerates a particular reaction by lowering the activation energy that is required (Figure 3-46).
By intentionally producing antibodies that act like enzymes, we can demonstrate that stabilizing a transition state can greatly increase a reaction rate. Consider, for example, the hydrolysis of an amide bond, which is similar to the peptide bond that joins two adjacent amino acids in a protein. In an aqueous solution, an amide bond hydrolyzes very slowly by the mechanism shown in Figure 3-47A. In the central intermediate, or transition state, the carbonyl carbon is bonded to four atoms arranged at the corners of a tetrahedron.
By generating monoclonal antibodies that bind tightly to a stable analog of this very unstable tetrahedral intermediate, one can obtain an antibody that functions like an enzyme (Figure 3-47B). Because this catalytic antibody binds to and stabilizes the tetrahedral intermediate, it increases the spontaneous rate of amide-bond hydrolysis more than 10,000-fold.
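The link between a lower activation energy and a faster reaction can be made concrete with a minimal sketch. It assumes, as a simplification not stated in the text, that the rate constant is proportional to exp(-Ea/RT) with an unchanged pre-exponential factor (the Arrhenius form); the function names and the choice of 25°C are illustrative only.

```python
import math

# Gas constant in kcal/(mole*K); units chosen for illustration.
R_KCAL = 1.987e-3


def fold_acceleration(barrier_drop_kcal, temperature_k=298.15):
    """Fold increase in rate when the activation energy drops by
    barrier_drop_kcal, assuming rate is proportional to exp(-Ea/RT)."""
    return math.exp(barrier_drop_kcal / (R_KCAL * temperature_k))


def barrier_reduction_kcal(fold, temperature_k=298.15):
    """Drop in activation energy (kcal/mole) that would give the
    stated fold acceleration under the same assumption."""
    return R_KCAL * temperature_k * math.log(fold)


if __name__ == "__main__":
    # The 10,000-fold figure quoted for the catalytic antibody.
    print(barrier_reduction_kcal(1e4))
```

Under these assumptions, the 10,000-fold acceleration quoted for the catalytic antibody corresponds to lowering the activation energy by roughly 5.5 kcal/mole at 25°C.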
Enzymes Can Use Simultaneous Acid and Base Catalysis
Figure 3-48 compares the spontaneous reaction rates and the corresponding enzyme-catalyzed rates for five enzymes. Rate accelerations range from 10⁹ to 10²³. Clearly, enzymes are much better catalysts than catalytic antibodies.
[Figure 3-46 diagram: energy plotted against progress of reaction, showing the activation energy for the uncatalyzed reaction (A) and the activation energy for the catalyzed reaction (B).]
Figure 3-46 Enzymatic acceleration of chemical reactions by decreasing the activation energy. Often both the uncatalyzed reaction (A) and the enzyme-catalyzed reaction (B) can go through several transition states. It is the transition state with the highest energy (ST and EST) that determines the activation energy and limits the rate of the reaction. (S = substrate; P = product of the reaction; ES = enzyme-substrate complex; EP = enzyme-product complex.)
PROTEIN FUNCTION
[Figure 3-47 panels: (A) Hydrolysis of an amide bond, in which water attacks to form a tetrahedral intermediate; (B) Transition-state analog for amide hydrolysis.]
Figure 3-47 Catalytic antibodies. The stabilization of a transition state by an antibody creates an enzyme. (A) The reaction path for the hydrolysis of an amide bond goes through a tetrahedral intermediate, the high-energy transition state for the reaction. (B) The molecule on the left was covalently linked to a protein and used as an antigen to generate an antibody that binds tightly to the region of the molecule shown in yellow. Because this antibody also bound tightly to the transition state in (A), it was found to function as an enzyme that efficiently catalyzed the hydrolysis of the amide bond in the molecule on the right.
Enzymes not only bind tightly to a transition state, they also contain precisely positioned atoms that alter the electron distributions in those atoms that participate directly in the making and breaking of covalent bonds. Peptide bonds, for example, can be hydrolyzed in the absence of an enzyme by exposing a polypeptide to either a strong acid or a strong base, as illustrated in Figure 3-49. Enzymes are unique, however, in being able to use acid and base catalysis simultaneously, since the rigid framework of the protein binds the acidic and basic residues and prevents them from combining with each other (as they would do in solution) (Figure 3-49D). The fit between an enzyme and its substrate needs to be precise. A small change introduced by genetic engineering in the active site of an enzyme can have a profound effect. Replacing a glutamic acid with an aspartic acid in one enzyme, for example, shifts the position of the catalytic carboxylate ion by only 1 Å (about the radius of a hydrogen atom); yet this is enough to decrease the activity of the enzyme a thousandfold.
Lysozyme Illustrates How an Enzyme Works
To demonstrate how enzymes catalyze chemical reactions, we examine an enzyme that acts as a natural antibiotic in egg white, saliva, tears, and other secretions. Lysozyme catalyzes the cutting of polysaccharide chains in the cell walls of bacteria. Because the bacterial cell is under pressure from osmotic forces, cutting even a small number of polysaccharide chains causes the cell wall to rupture and the cell to burst. Lysozyme is a relatively small and stable protein
[Figure 3-48 chart: half-time for reaction, on a scale running from 10⁶ years through 1 year to 1 sec, for the uncatalyzed and catalyzed reactions.]
Figure 3-48 The rate accelerations caused by five different enzymes. (Adapted from A. Radzicka and R. Wolfenden, Science 267:90-93, 1995. With permission from AAAS.)
WHY ANALYZE THE KINETICS OF ENZYMES?
Enzymes are the most selective and powerful catalysts known. An understanding of their detailed mechanisms provides a critical tool for the discovery of new drugs, for the large-scale industrial synthesis of useful chemicals, and for appreciating the chemistry of cells and organisms. A detailed study of the rates of the chemical reactions that are catalyzed by a purified enzyme (more specifically, how these rates change with changes in conditions such as the concentrations of substrates, products, inhibitors, and regulatory ligands) allows biochemists to figure out exactly how each enzyme works. For example, this is the way that the ATP-producing reactions of glycolysis, shown previously in Figure 2-72, were deciphered, allowing us to appreciate the rationale for this critical enzymatic pathway. In this Panel, we introduce the important field of enzyme kinetics, which has been indispensable for deriving much of the detailed knowledge that we now have about cell chemistry.
STEADY-STATE ENZYME KINETICS
Many enzymes have only one substrate, which they bind and then process to produce products according to the scheme outlined in Figure 3-50A. In this case, the reaction is written as

$$\mathrm{E} + \mathrm{S} \;\underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}}\; \mathrm{ES} \;\xrightarrow{\;k_{cat}\;}\; \mathrm{E} + \mathrm{P}$$

Here we have assumed that the reverse reaction, in which E + P recombine to form EP and then ES, occurs so rarely that we can ignore it. In this case, EP need not be represented, and we can express the rate of the reaction, known as its velocity, V, as

$$V = k_{cat}[\mathrm{ES}]$$

where [ES] is the concentration of the enzyme-substrate complex, and kcat is the turnover number, a rate constant that has a value equal to the number of substrate molecules processed per enzyme molecule each second.
But how does the value of [ES] relate to the concentrations that we know directly, which are the total concentration of the enzyme, [E0], and the concentration of the substrate, [S]? When enzyme and substrate are first mixed, the concentration [ES] will rise rapidly from zero to a so-called steady-state level, as illustrated below.

[Graph: concentrations plotted against time, showing the pre-steady state (ES forming) followed by the steady state (ES almost constant).]

At this steady state, [ES] is nearly constant, so that

rate of ES breakdown = rate of ES formation

$$k_{-1}[\mathrm{ES}] + k_{cat}[\mathrm{ES}] = k_{1}[\mathrm{E}][\mathrm{S}]$$

or, since the concentration of the free enzyme, [E], is equal to [E0] - [ES],

$$[\mathrm{ES}] = \left(\frac{k_{1}}{k_{-1} + k_{cat}}\right)[\mathrm{E}][\mathrm{S}] = \left(\frac{k_{1}}{k_{-1} + k_{cat}}\right)\left([\mathrm{E_0}] - [\mathrm{ES}]\right)[\mathrm{S}]$$

Rearranging, and defining the constant Km as

$$K_{m} = \frac{k_{-1} + k_{cat}}{k_{1}}$$

we get

$$[\mathrm{ES}] = \frac{[\mathrm{E_0}][\mathrm{S}]}{K_{m} + [\mathrm{S}]}$$

or, remembering that V = kcat[ES], we obtain the famous Michaelis-Menten equation

$$V = \frac{k_{cat}[\mathrm{E_0}][\mathrm{S}]}{K_{m} + [\mathrm{S}]}$$

As [S] is increased to higher and higher levels, essentially all of the enzyme will be bound to substrate at steady state; at this point, a maximum rate of reaction, Vmax, will be reached where V = Vmax = kcat[E0]. Thus, it is convenient to rewrite the Michaelis-Menten equation as

$$V = \frac{V_{max}[\mathrm{S}]}{K_{m} + [\mathrm{S}]}$$
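The steady-state relationships in this Panel can be checked numerically with a short sketch. The function names are ours, and the rate constants and enzyme concentration are hypothetical values chosen only so that the numbers come out round; they are not measurements for any real enzyme.

```python
import math


def michaelis_constant(k1, k_minus1, kcat):
    """Km = (k-1 + kcat) / k1, as defined in the Panel."""
    return (k_minus1 + kcat) / k1


def es_steady_state(e0, s, km):
    """Steady-state [ES] = [E0][S] / (Km + [S])."""
    return e0 * s / (km + s)


def velocity(e0, s, kcat, km):
    """Michaelis-Menten velocity V = kcat [E0][S] / (Km + [S])."""
    return kcat * es_steady_state(e0, s, km)


# Hypothetical rate constants and concentrations (assumptions).
K1 = 1.0e7        # per molar per second
K_MINUS1 = 9.0e3  # per second
KCAT = 1.0e3      # per second (the turnover number)
E0 = 1.0e-9       # molar

KM = michaelis_constant(K1, K_MINUS1, KCAT)
VMAX = KCAT * E0

if __name__ == "__main__":
    for s in (0.1 * KM, KM, 10 * KM, 1000 * KM):
        print(s, velocity(E0, s, KCAT, KM) / VMAX)
```

With these values, setting [S] equal to Km gives half the maximum velocity, and dividing Vmax by [E0] recovers the turnover number kcat, as stated in the main text.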
[Figure 3-49 panels: (A) no catalysis (slow); (B) acid catalysis (fast); (C) base catalysis (fast); (D) both acid and base catalyses (very fast).]
that can be easily isolated in large quantities. For these reasons, it has been intensively studied, and it was the first enzyme to have its structure worked out in atomic detail by x-ray crystallography. The reaction that lysozyme catalyzes is a hydrolysis: it adds a molecule of water to a single bond between two adjacent sugar groups in the polysaccharide chain, thereby causing the bond to break (see Figure 2-19). The reaction is energetically favorable because the free energy of the severed polysaccharide chain is lower than the free energy of the intact chain. However, the pure polysaccharide can remain for years in water without being hydrolyzed to any detectable degree. This is because there is an energy barrier to the reaction, as discussed in Chapter 2 (see Figure 2-46). A colliding water molecule can break a bond linking two sugars only if the polysaccharide molecule is distorted into a particular shape (the transition state) in which the atoms around the bond have an altered geometry and electron distribution. Because of this distortion, random collisions must supply a very large activation energy for the reaction to take place. In an aqueous solution at room temperature, the energy of collisions almost never exceeds the activation energy. Consequently, hydrolysis occurs extremely slowly, if at all. This situation changes drastically when the polysaccharide binds to lysozyme. The active site of lysozyme, because its substrate is a polymer, is a long groove that holds six linked sugars at the same time. As soon as the polysaccharide binds to form an enzyme-substrate complex, the enzyme cuts the polysaccharide by adding a water molecule across one of its sugar-sugar bonds. The product chains are then quickly released, freeing the enzyme for further cycles of reaction (Figure 3-50). The chemistry of the binding of lysozyme to its substrate is the same as that for antibody binding to its antigen: the formation of multiple noncovalent
Figure 3-50 The reaction catalyzed by lysozyme. (A) The enzyme lysozyme (E) catalyzes the cutting of a polysaccharide chain, which is its substrate (S). The enzyme first binds to the chain to form an enzyme-substrate complex (ES) and then catalyzes the cleavage of a specific covalent bond in the backbone of the polysaccharide, forming an enzyme-product complex (EP) that rapidly dissociates. Release of the severed chain (the products P) leaves the enzyme free to act on another substrate molecule. (B) A space-filling model of the lysozyme molecule bound to a short length of polysaccharide chain before cleavage. (B, courtesy of Richard J. Feldmann.)
[Figure 3-50A scheme: E + S → ES → EP → E + P]
Figure 3-49 Acid catalysis and base catalysis. (A) The start of the uncatalyzed reaction shown in Figure 3-47A, with blue indicating electron distribution in the water and carbonyl bonds. (B) An acid likes to donate a proton (H+) to other atoms. By pairing with the carbonyl oxygen, an acid causes electrons to move away from the carbonyl carbon, making this atom much more attractive to the electronegative oxygen of an attacking water molecule. (C) A base likes to take up H+. By pairing with a hydrogen of the attacking water molecule, a base causes electrons to move toward the water oxygen, making it a better attacking group for the carbonyl carbon. (D) By having appropriately positioned atoms on its surface, an enzyme can perform both acid catalysis and base catalysis at the same time.
bonds. However, lysozyme holds its polysaccharide substrate in a particular way, so that it distorts one of the two sugars in the bond to be broken from its normal, most stable conformation. The bond to be broken is also held close to two amino acids with acidic side chains (a glutamic acid and an aspartic acid) within the active site. Conditions are thereby created in the microenvironment of the lysozyme active site that greatly reduce the activation energy necessary for the hydrolysis to take place. Figure 3-51 shows three central steps in this enzymatically catalyzed reaction.
1. The enzyme stresses its bound substrate, so that the shape of one sugar more closely resembles the shape of high-energy transition states formed during the reaction.
2. The negatively charged aspartic acid reacts with the C1 carbon atom on the distorted sugar, and the glutamic acid donates its proton to the oxygen that links this sugar to its neighbor. This breaks the sugar-sugar bond and leaves the aspartic acid side chain covalently linked to the site of bond cleavage.
3. Aided by the negatively charged glutamic acid, a water molecule reacts with the C1 carbon atom, displacing the aspartic acid side chain and completing the process of hydrolysis.
Figure 3-51 Events at the active site of lysozyme. The top left and top right drawings show the free substrate and the free products, respectively, whereas the other three drawings show the sequential events at the enzyme active site. Note the change in the conformation of sugar D in the enzyme-substrate complex; this shape change stabilizes the oxocarbenium ion-like transition states required for formation and hydrolysis of the covalent intermediate shown in the middle panel. It is also possible that a carbonium ion intermediate forms in step 2, as the covalent intermediate shown in the middle panel has been detected only with a synthetic substrate. (See D.J. Vocadlo et al., Nature 412:835-838, 2001.)
The overall chemical reaction, from the initial binding of the polysaccharide on the surface of the enzyme through the final release of the severed chains, occurs many millions of times faster than it would in the absence of enzyme. Other enzymes use similar mechanisms to lower activation energies and speed up the reactions they catalyze. In reactions involving two or more reactants, the active site also acts like a template, or mold, that brings the substrates together in the proper orientation for a reaction to occur between them (Figure
[Figure 3-51 panel text]
SUBSTRATE: This substrate is an oligosaccharide of six sugars, labeled A-F. Only sugars D and E are shown in detail.
PRODUCTS: The final products are an oligosaccharide of four sugars (left) and a disaccharide (right), produced by hydrolysis.
In the enzyme-substrate complex (ES), the enzyme forces sugar D into a strained conformation, with Glu35 positioned to serve as an acid that attacks the adjacent sugar-sugar bond by donating a proton (H+) to sugar E, and Asp52 poised to attack the C1 carbon atom.
The Asp52 has formed a covalent bond between the enzyme and the C1 carbon atom of sugar D. The Glu35 then polarizes a water molecule (red), so that its oxygen can readily attack the C1 carbon atom and displace Asp52.
The reaction of the water molecule (red) completes the hydrolysis and returns the enzyme to its initial state, forming the final enzyme-product complex (EP).
Figure 3-52 Some general strategies of enzyme catalysis. (A) Holding substrates together in a precise alignment. (B) Charge stabilization of reaction intermediates. (C) Applying forces that distort bonds in the substrate to increase the rate of a particular reaction.
[Figure 3-52 panel text]
(A) Enzyme binds to two substrate molecules and orients them precisely to encourage a reaction to occur between them.
(B) Binding of substrate to enzyme rearranges electrons in the substrate, creating partial negative and positive charges that favor a reaction.
(C) Enzyme strains the bound substrate molecule, forcing it toward a transition state to favor a reaction.
3-52A). As we saw for lysozyme, the active site of an enzyme contains precisely positioned atoms that speed up a reaction by using charged groups to alter the distribution of electrons in the substrates (Figure 3-52B). In addition, when a substrate binds to an enzyme, bonds in the substrate often bend, changing the substrate shape. These changes, along with mechanical forces, drive a substrate toward a particular transition state (Figure 3-52C). Finally, like lysozyme, many enzymes participate intimately in the reaction by briefly forming a covalent bond between the substrate and a side chain of the enzyme. Subsequent steps in the reaction restore the side chain to its original state, so that the enzyme remains unchanged after the reaction (see also Figure 2-22).
Tightly Bound Small Molecules Add Extra Functions to Proteins
Although we have emphasized the versatility of proteins as chains of amino acids that perform different functions, there are many instances in which the amino acids by themselves are not enough. Just as humans employ tools to enhance and extend the capabilities of their hands, proteins often use small nonprotein molecules to perform functions that would be difficult or impossible to do with amino acids alone. Thus, the signal receptor protein rhodopsin, which is made by the photoreceptor cells in the retina, detects light by means of a small molecule, retinal, embedded in the protein (Figure 3-53A). Retinal changes its shape when it absorbs a photon of light, and this change causes the protein to trigger a cascade of enzymatic reactions that eventually lead to an electrical signal being carried to the brain. Another example of a protein that contains a nonprotein portion is hemoglobin (see Figure 3-22). A molecule of hemoglobin carries four heme groups, ring-shaped molecules each with a single central iron atom (Figure 3-53B). Heme gives hemoglobin (and blood) its red color. By binding reversibly to oxygen gas through its iron atom, heme enables hemoglobin to pick up oxygen in the lungs and release it in the tissues. Sometimes these small molecules are attached covalently and permanently to their protein, thereby becoming an integral part of the protein molecule itself. We shall see in Chapter 10 that proteins are often anchored to cell membranes through covalently attached lipid molecules. And membrane proteins exposed
Figure 3-53 Retinal and heme. (A) The structure of retinal, the light-sensitive molecule attached to rhodopsin in the eye. (B) The structure of a heme group. The carbon-containing heme ring is red and the iron atom at its center is orange. A heme group is tightly bound to each of the four polypeptide chains in hemoglobin, the oxygen-carrying protein whose structure is shown in Figure 3-22.
Table 3-2 Many Vitamins Provide Critical Coenzymes for Human Cells
VITAMIN | COENZYME | ENZYME-CATALYZED REACTIONS REQUIRING THE COENZYME
Thiamine (vitamin B1) | thiamine pyrophosphate | activation and transfer of aldehydes
Riboflavin (vitamin B2) | FADH | oxidation-reduction
Niacin | NADH, NADPH | oxidation-reduction
Pantothenic acid | coenzyme A | acyl group activation and transfer
Pyridoxine | pyridoxal phosphate | amino acid activation; also glycogen phosphorylase
Biotin | biotin | CO2 activation and transfer
Lipoic acid | lipoamide | acyl group activation; oxidation-reduction
Folic acid | tetrahydrofolate | activation and transfer of single carbon groups
Vitamin B12 | cobalamin coenzymes | isomerization and methyl group transfers
on the surface of the cell, as well as proteins secreted outside the cell, are often modified by the covalent addition of sugars and oligosaccharides. Enzymes frequently have a small molecule or metal atom tightly associated with their active site that assists with their catalytic function. Carboxypeptidase, for example, an enzyme that cuts polypeptide chains, carries a tightly bound zinc ion in its active site. During the cleavage of a peptide bond by carboxypeptidase, the zinc ion forms a transient bond with one of the substrate atoms, thereby assisting the hydrolysis reaction. In other enzymes, a small organic molecule serves a similar purpose. Such organic molecules are often referred to as coenzymes. An example is biotin, which is found in enzymes that transfer a carboxylate group (-COO-) from one molecule to another (see Figure 2-63). Biotin participates in these reactions by forming a transient covalent bond to the -COO- group to be transferred, being better suited to this function than any of the amino acids used to make proteins. Because it cannot be synthesized by humans, and must therefore be supplied in small quantities in our diet, biotin is a vitamin. Many other coenzymes are produced from vitamins (Table 3-2). Vitamins are also needed to make other types of small molecules that are essential components of our proteins; vitamin A, for example, is needed in the diet to make retinal, the light-sensitive part of rhodopsin.
Molecular Tunnels Channel Substrates in Enzymes with Multiple Catalytic Sites
Some of the chemical reactions catalyzed by enzymes in cells produce intermediates that are either very unstable or that could readily diffuse out of the cell through the plasma membrane if released into the cytosol. To preserve these intermediates, enzymes have evolved molecular tunnels that connect two or more active sites, allowing the intermediate to be rapidly processed to a final product without ever leaving the enzyme. Consider, for example, the enzyme carbamoyl phosphate synthetase, which uses ammonia derived from glutamine plus two molecules of ATP to convert bicarbonate (HCO3-) to carbamoyl phosphate, an important intermediate in several metabolic pathways (Figure 3-54). This enzyme contains three widely separated active sites that are connected to each other by a tunnel. The reaction starts at active site 2, located in the middle of the tunnel, where ATP is used to phosphorylate (add a phosphate group to) bicarbonate, forming carboxyphosphate. This event triggers the hydrolysis of glutamine to glutamic acid at active site 1, releasing ammonia into the tunnel. The ammonia immediately diffuses through the first half of the tunnel to active site 2, where it reacts with the carboxyphosphate to form carbamate. This unstable intermediate then diffuses through the second half of the tunnel to active site 3, where it is phosphorylated by ATP to the final product, carbamoyl phosphate.
Figure 3-54 The tunneling of reaction intermediates in the enzyme carbamoyl phosphate synthetase. (A) Diagram of the structure of the enzyme, in which a red ribbon has been used to outline the tunnel on the inside of the protein connecting its three active sites. The small and large subunits of this dimeric enzyme are color coded yellow and blue, respectively. (B) The path of the reaction. As indicated, active site 1 produces ammonia, which diffuses through the tunnel to active site 2, where it combines with carboxyphosphate to form carbamate. This highly unstable intermediate then diffuses through the tunnel to active site 3, where it is phosphorylated by ATP to produce the final product, carbamoyl phosphate. (A, modified from F.M. Raushel, J.B. Thoden, and H.M. Holden, Acc. Chem. Res. 36:539-548, 2003. With permission from American Chemical Society.)
[Figure 3-54B labels: bicarbonate, carboxyphosphate, glutamine, glutamic acid, NH3 diffusion.]
Several other well-characterized enzymes contain similar molecular tunnels. Ammonia, a readily diffusible intermediate that might otherwise be lost from the cell, is the substrate most frequently channeled in the examples thus far known.
Multienzyme Complexes Help to Increase the Rate of Cell Metabolism
The efficiency of enzymes in accelerating chemical reactions is crucial to the maintenance of life. Cells, in effect, must race against the unavoidable processes of decay, which, if left unattended, cause macromolecules to run downhill toward greater and greater disorder. If the rates of desirable reactions were not greater than the rates of competing side reactions, a cell would soon die. We can get some idea of the rate at which cell metabolism proceeds by measuring the rate of ATP utilization. A typical mammalian cell "turns over" (i.e., hydrolyzes and restores by phosphorylation) its entire ATP pool once every 1 or 2 minutes. For each cell, this turnover represents the utilization of roughly 10⁷ molecules of ATP per second (or, for the human body, about 1 gram of ATP every minute).
The rates of reactions in cells are rapid because enzyme catalysis is so effective. Many important enzymes have become so efficient that there is no possibility of further useful improvement. The factor that limits the reaction rate is no longer the enzyme's intrinsic speed of action; rather, it is the frequency with which the enzyme collides with its substrate. Such a reaction is said to be diffusion-limited (see Panel 3-3, pp. 162-163). If an enzyme-catalyzed reaction is diffusion-limited, its rate depends on the concentration of both the enzyme and its substrate. If a sequence of reactions is to occur extremely rapidly, each metabolic intermediate and enzyme involved must be present in high concentration. However, given the enormous number of different reactions performed by a cell, there are limits to the concentrations that can be achieved. In fact, most metabolites are present in micromolar (10⁻⁶ M) concentrations, and most enzyme concentrations are much lower. How is it possible, therefore, to maintain very fast metabolic rates? The answer lies in the spatial organization of cell components. The cell can increase reaction rates without raising substrate concentrations by bringing the various enzymes involved in a reaction sequence together to form a large protein assembly known as a multienzyme complex (Figure 3-55). Because this allows the product of enzyme A to be passed directly to enzyme B, and so on, diffusion rates need not be limiting, even when the concentrations of the substrates in the cell as a whole are very low.
It is perhaps not surprising, therefore, that such enzyme complexes are very common, and they are involved in nearly all aspects of metabolism, including the central genetic processes of DNA, RNA, and protein synthesis. In fact, few enzymes in eucaryotic cells diffuse freely in solution; instead, most seem to have evolved binding sites that concentrate them with other proteins of related function in particular regions of the cell, thereby increasing the rate and efficiency of the reactions that they catalyze. Eucaryotic cells have yet another way of increasing the rate of metabolic reactions: using their intracellular membrane systems. These membranes can segregate particular substrates and the enzymes that act on them into the same membrane-enclosed compartment, such as the endoplasmic reticulum or the cell nucleus. If, for example, a compartment occupies a total of 10% of the volume of the cell, the concentration of reactants in that compartment may be increased by 10 times compared with a cell with the same number of enzymes and substrate molecules, but no compartmentalization. Reactions limited by the speed of diffusion can thereby be speeded up by a factor of 10.
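The factor of 10 in the compartment example can be reproduced with a small sketch. It assumes simple second-order kinetics, in which the number of productive enzyme-substrate encounters per second in a well-mixed volume is proportional to [E] × [S] × volume; the function name, the rate constant, and the molecule counts are hypothetical.

```python
import math


def total_encounter_rate(k, n_enzyme, n_substrate, volume):
    """Encounters per second in a well-mixed volume, taken as
    k * [E] * [S] * volume with [X] = n_X / volume (an assumed
    second-order rate law with arbitrary units)."""
    enzyme_conc = n_enzyme / volume
    substrate_conc = n_substrate / volume
    return k * enzyme_conc * substrate_conc * volume


# Same numbers of molecules, spread over the whole cell or
# confined to a compartment with 10% of the cell volume.
whole_cell = total_encounter_rate(1.0, 1000, 50000, volume=1.0)
compartment = total_encounter_rate(1.0, 1000, 50000, volume=0.1)

if __name__ == "__main__":
    print(compartment / whole_cell)
```

Both concentrations rise 10-fold in the compartment, so the rate per unit volume rises 100-fold, but the reaction occupies only one-tenth of the volume; under this assumption the net gain is the 10-fold speedup described above.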
The Cell Regulates the Catalytic Activities of its Enzymes
A living cell contains thousands of enzymes, many of which operate at the same time and in the same small volume of the cytosol. By their catalytic action, these enzymes generate a complex web of metabolic pathways, each composed of chains of chemical reactions in which the product of one enzyme becomes the substrate of the next. In this maze of pathways, there are many branch points (nodes) where different enzymes compete for the same substrate. The system is so complex (see Figure 2-88) that elaborate controls are required to regulate when and how rapidly each reaction occurs.
[Figure 3-55 labels: 8 trimers of lipoamide reductase-transacetylase + 12 molecules of dihydrolipoyl dehydrogenase + 24 molecules of pyruvate decarboxylase.]
Figure 3-55 The structure of pyruvate dehydrogenase. This enzyme complex catalyzes the conversion of pyruvate to acetyl CoA, as part of the pathway that oxidizes sugars to CO2 and H2O (see Figure 2-79). It is an example of a large multienzyme complex in which reaction intermediates are passed directly from one enzyme to another.
Regulation occurs at many levels. At one level, the cell controls how many molecules of each enzyme it makes by regulating the expression of the gene that encodes that enzyme (discussed in Chapter 7). The cell also controls enzymatic activities by confining sets of enzymes to particular subcellular compartments, enclosed by distinct membranes (discussed in Chapters 12 and 14). As will be discussed later in this chapter, enzymes are frequently covalently modified to control their activity. The rate of protein destruction by targeted proteolysis represents yet another important regulatory mechanism (see p. 395). But the most general process that adjusts reaction rates operates through a direct, reversible change in the activity of an enzyme in response to the specific small molecules that it encounters. The most common type of control occurs when a molecule other than one of the substrates binds to an enzyme at a special regulatory site outside the active site, thereby altering the rate at which the enzyme converts its substrates to products. For example, in feedback inhibition a product produced late in a reaction pathway inhibits an enzyme that acts earlier in the pathway. Thus, whenever large quantities of the final product begin to accumulate, this product binds to the enzyme and slows down its catalytic action, thereby limiting the further entry of substrates into that reaction pathway (Figure 3-56). Where pathways branch or intersect, there are usually multiple points of control by different final products, each of which works to regulate its own synthesis (Figure 3-57). Feedback inhibition can work almost instantaneously, and it is rapidly reversed when the level of the product falls.
Figure 3-56 Feedback inhibition of a single biosynthetic pathway. The end-product Z inhibits the first enzyme that is unique to its synthesis and thereby controls its own level in the cell. This is an example of negative regulation.
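How feedback inhibition lets an end-product control its own level can be sketched with a toy model. Everything here is an illustrative assumption rather than a description of any real pathway: the end-product Z is made at a rate vmax / (1 + Z/Ki), it is consumed at a rate proportional to its concentration, and the parameter values are arbitrary.

```python
import math


def simulate_product(vmax, k_use, ki=None, dt=0.01, steps=20000):
    """Euler integration of dZ/dt = synthesis - k_use * Z.
    With ki=None there is no feedback (synthesis = vmax); otherwise
    synthesis = vmax / (1 + Z/ki), an assumed inhibition law."""
    z = 0.0
    for _ in range(steps):
        synthesis = vmax if ki is None else vmax / (1.0 + z / ki)
        z += dt * (synthesis - k_use * z)
    return z


def steady_state_with_feedback(vmax, k_use, ki):
    """Positive root of k_use*Z*(1 + Z/ki) = vmax for the toy model."""
    return 0.5 * ki * (-1.0 + math.sqrt(1.0 + 4.0 * vmax / (k_use * ki)))


if __name__ == "__main__":
    print("no feedback:", simulate_product(10.0, 1.0))
    print("with feedback:", simulate_product(10.0, 1.0, ki=1.0))
```

In this sketch the product settles at a much lower level when it inhibits its own synthesis, and the level responds immediately to changes in Z, in line with the statement that feedback inhibition acts almost instantaneously and reverses when the product level falls.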
aspartate
I I
I I methionine
Figure3-57 Multiplefeedback inhibition.In this example,which shows the biosyntheticpathwaysfor four differentaminoacidsin bacteria, the red arrowsindicatepositionsat which productsfeed backto inhibitenzymes. Eachaminoacidcontrolsthe firstenzyme specificto its own synthesis, thereby controllingits own levelsand avoidinga wasteful,or evendangerous, buildupof intermediates. The productscanalso separately inhibitthe initialset of reactions commonto all the syntheses; in this case,three differentenzymes catalyze the initialreaction,each inhibitedbv a differentoroduct.
PROTEIN FUNCTION
Feedback inhibition is negative regulation: it prevents an enzyme from acting. Enzymes can also be subject to positive regulation, in which a regulatory molecule stimulates the enzyme's activity rather than shutting the enzyme down. Positive regulation occurs when a product in one branch of the metabolic network stimulates the activity of an enzyme in another pathway. As one example, the accumulation of ADP activates several enzymes involved in the oxidation of sugar molecules, thereby stimulating the cell to convert more ADP to ATP.
Allosteric Enzymes Have Two or More Binding Sites That Interact
A striking feature of both positive and negative feedback regulation is that the regulatory molecule often has a shape totally different from the shape of the substrate of the enzyme. This is why the effect on a protein is termed allostery (from the Greek words allos, meaning "other," and stereos, meaning "solid" or "three-dimensional"). As biologists learned more about feedback regulation, they recognized that the enzymes involved must have at least two different binding sites on their surface: an active site that recognizes the substrates, and a regulatory site that recognizes a regulatory molecule. These two sites must somehow communicate so that the catalytic events at the active site can be influenced by the binding of the regulatory molecule at its separate site on the protein's surface.
The interaction between separated sites on a protein molecule is now known to depend on a conformational change in the protein: binding at one of the sites causes a shift from one folded shape to a slightly different folded shape. During feedback inhibition, for example, the binding of an inhibitor at one site on the protein causes the protein to shift to a conformation that incapacitates its active site, located elsewhere in the protein.
It is thought that most protein molecules are allosteric. They can adopt two or more slightly different conformations, and a shift from one to another caused by the binding of a ligand can alter their activity. This is true not only for enzymes but also for many other proteins, including receptors, structural proteins, and motor proteins. In all instances of allosteric regulation, each conformation of the protein has somewhat different surface contours, and the protein's binding sites for ligands are altered when the protein changes shape. Moreover, as we discuss next, each ligand will stabilize the conformation that it binds to most strongly, and thus, at high enough concentrations, will tend to "switch" the protein toward the conformation that the ligand prefers.
Two Ligands Whose Binding Sites Are Coupled Must Reciprocally Affect Each Other's Binding
The effects of ligand binding on a protein follow from a fundamental chemical principle known as linkage. Suppose, for example, that a protein that binds glucose also binds another molecule, X, at a distant site on the protein's surface. If the binding site for X changes shape as part of the conformational change induced by glucose binding, the binding sites for X and for glucose are said to be coupled. Whenever two ligands prefer to bind to the same conformation of an allosteric protein, it follows from basic thermodynamic principles that each ligand must increase the affinity of the protein for the other. Thus, if the shift of the protein in Figure 3-58 to the closed conformation that binds glucose best also causes the binding site for X to fit X better, then the protein will bind glucose more tightly when X is present than when X is absent.
Conversely, linkage operates in a negative way if two ligands prefer to bind to different conformations of the same protein. In this case, the binding of the first ligand discourages the binding of the second ligand. Thus, if a shape change caused by glucose binding decreases the affinity of a protein for molecule X, the binding of X must also decrease the protein's affinity for glucose (Figure 3-59). The linkage relationship is quantitatively reciprocal, so that, for example, if glucose has a very large effect on the binding of X, X has a very large effect on the binding of glucose.
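The reciprocity of linkage can be checked numerically with a small sketch. It assumes a protein with just two conformations (open and closed) in equilibrium, with each ligand binding each conformation with its own association constant. This two-state model and all names in the code (`l_closed`, `g_open`, `x_closed`, and so on) are illustrative assumptions, not definitions from the text.

```python
import math

def coupling_factors(l_closed, g_open, g_closed, x_open, x_closed):
    """Return (effect of X on glucose affinity, effect of glucose on X affinity).

    Assumed two-state model:
      l_closed          = [closed]/[open] for the ligand-free protein
      g_open, g_closed  = association constants of glucose for each conformation
      x_open, x_closed  = association constants of X for each conformation
    Apparent affinities are averages over the two conformations, weighted by
    how populated each conformation is in the relevant state of the protein.
    """
    # Apparent affinity of the free protein for each ligand.
    k_g = (g_open + l_closed * g_closed) / (1.0 + l_closed)
    k_x = (x_open + l_closed * x_closed) / (1.0 + l_closed)
    # Apparent affinity for one ligand when the other is already bound.
    both = g_open * x_open + l_closed * g_closed * x_closed
    k_g_given_x = both / (x_open + l_closed * x_closed)
    k_x_given_g = both / (g_open + l_closed * g_closed)
    return k_g_given_x / k_g, k_x_given_g / k_x

# Both ligands prefer the closed form: each should raise the other's affinity.
same = coupling_factors(0.1, 1.0, 100.0, 1.0, 50.0)
# Glucose prefers closed, X prefers open: each should lower the other's affinity.
opposite = coupling_factors(0.1, 1.0, 100.0, 50.0, 1.0)
print("same preference:    ", same)
print("opposite preference:", opposite)
```

In both cases the two returned factors are equal, which is the quantitative reciprocity described above: whatever X does to glucose binding, glucose does to X binding by the same factor.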
Figure 3-58 Positive regulation caused by conformational coupling between two distant binding sites. In this example, both glucose and molecule X bind best to the closed conformation of a protein with two domains. Because both glucose and molecule X drive the protein toward its closed conformation, each ligand helps the other to bind. Glucose and molecule X are therefore said to bind cooperatively to the protein.
[Figure labels: INACTIVE; ACTIVE; molecule X; positive regulation; 10% active; 100% active]
The relationships shown in Figures 3-58 and 3-59 apply to all proteins, and they underlie all of cell biology. They seem so obvious in retrospect that we now take them for granted. But the discovery of linkage in studies of a few enzymes in the 1950s, followed by an extensive analysis of allosteric mechanisms in proteins in the early 1960s, had a revolutionary effect on our understanding of biology.
Since molecule X in these examples binds at a site on the enzyme that is distinct from the site where catalysis occurs, it need have no chemical relationship to glucose or to any other ligand that binds at the active site. Moreover, as we have just seen, for enzymes that are regulated in this way, molecule X can either turn the enzyme on (positive regulation) or turn it off (negative regulation). By such a mechanism, allosteric proteins serve as general switches that, in principle, allow one molecule in a cell to affect the fate of any other.
Symmetric Protein Assemblies Produce Cooperative Allosteric Transitions
A single-subunit enzyme that is regulated by negative feedback can at most decrease from 90% to about 10% activity in response to a 100-fold increase in the concentration of an inhibitory ligand that it binds (Figure 3-60, red line). Responses of this type are apparently not sharp enough for optimal cell regulation, and most enzymes that are turned on or off by ligand binding consist of symmetric assemblies of identical subunits. With this arrangement, the binding of a molecule of ligand to a single site on one subunit can promote an allosteric change in the entire assembly that helps the neighboring subunits bind the same ligand. As a result, a cooperative allosteric transition occurs (Figure 3-60, blue line), allowing a relatively small change in ligand concentration in the cell to switch the whole assembly from an almost fully active to an almost fully inactive conformation (or vice versa).
Figure 3-59 Negative regulation caused by conformational coupling between two distant binding sites. The scheme here resembles that in the previous figure, but here molecule X prefers the open conformation, while glucose prefers the closed conformation. Because glucose and molecule X drive the protein toward opposite conformations (closed and open, respectively), the presence of either ligand interferes with the binding of the other.
[Figure labels: molecule X; negative regulation; 100% active; 10% active]
The principles involved in a cooperative "all-or-none" transition are the same for all proteins, whether or not they are enzymes. But they are perhaps easiest to visualize for an enzyme that forms a symmetric dimer. In the example shown in Figure 3-61, the first molecule of an inhibitory ligand binds with great difficulty since its binding disrupts an energetically favorable interaction between the two identical monomers in the dimer. A second molecule of inhibitory ligand now binds more easily, however, because its binding restores the energetically favorable monomer-monomer contacts of a symmetric dimer (this also completely inactivates the enzyme).
As an alternative to this induced fit model for a cooperative allosteric transition, we can view such a symmetrical enzyme as having only two possible conformations, corresponding to the "enzyme on" and "enzyme off" structures in Figure 3-61. In this view, ligand binding perturbs an all-or-none equilibrium between these two states, thereby changing the proportion of active molecules. Both models represent true and useful concepts; it is the second model that we shall describe next.
Figure 3-60 Enzyme activity versus the concentration of inhibitory ligand for single-subunit and multisubunit allosteric enzymes. For an enzyme with a single subunit (red line), a drop from 90% enzyme activity to 10% activity (indicated by the two dots on the curve) requires a 100-fold increase in the concentration of inhibitor. The enzyme activity is calculated from the simple equilibrium relationship K = [IP]/[I][P], where P is active protein, I is inhibitor, and IP is the inactive protein bound to inhibitor. An identical curve applies to any simple binding interaction between two molecules, A and B. In contrast, a multisubunit allosteric enzyme can respond in a switchlike manner to a change in ligand concentration: the steep response is caused by a cooperative binding of the ligand molecules, as explained in Figure 3-61. Here, the green line represents the idealized result expected for the cooperative binding of two inhibitory ligand molecules to an allosteric enzyme with two subunits, and the blue line shows the idealized response of an enzyme with four subunits. As indicated by the two dots on each of these curves, the more complex enzymes drop from 90% to 10% activity over a much narrower range of inhibitor concentration than does the enzyme composed of a single subunit.
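The difference in steepness between these curves can be reproduced with a short calculation. For a single site, the relationship K = [IP]/[I][P] gives a fraction of active enzyme of 1/(1 + K[I]); solving for 90% and 10% activity gives an exact ratio of 81 between the two inhibitor concentrations, which the text rounds to about 100-fold. For the multisubunit cases the sketch below assumes an idealized all-or-none (Hill-type) form in which n ligand molecules bind together; that functional form and the names `k_half` and `n_sites` are assumptions made for illustration.

```python
import math

def activity(inhibitor, k_half, n_sites=1):
    """Fraction of enzyme that is active at a given inhibitor concentration.

    k_half is the inhibitor concentration giving 50% activity (1/K for a
    single site). n_sites > 1 uses an assumed idealized model in which
    n ligand molecules bind in an all-or-none fashion.
    """
    return 1.0 / (1.0 + (inhibitor / k_half) ** n_sites)

def inhibitor_for_activity(target, k_half, n_sites=1):
    """Inhibitor concentration at which the active fraction equals target."""
    return k_half * ((1.0 - target) / target) ** (1.0 / n_sites)

def fold_increase_90_to_10(n_sites):
    """Fold increase in inhibitor needed to go from 90% to 10% activity."""
    at_10 = inhibitor_for_activity(0.10, 1.0, n_sites)
    at_90 = inhibitor_for_activity(0.90, 1.0, n_sites)
    return at_10 / at_90

for n in (1, 2, 4):
    print(n, "subunit(s):", fold_increase_90_to_10(n), "fold")
```

Under these assumptions the required fold increase falls as 81 raised to the power 1/n, so an enzyme with four cooperating subunits switches over a far narrower concentration range than a single-subunit enzyme.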
The Allosteric Transition in Aspartate Transcarbamoylase Is Understood in Atomic Detail
One enzyme used in the early studies of allosteric regulation was aspartate transcarbamoylase from E. coli. It catalyzes the important reaction that begins the synthesis of the pyrimidine ring of C, U, and T nucleotides: carbamoyl phosphate + aspartate → N-carbamoylaspartate. One of the final products of this pathway, cytosine triphosphate (CTP), binds to the enzyme to turn it off whenever CTP is plentiful.
Aspartate transcarbamoylase is a large complex of six regulatory and six catalytic subunits. The catalytic subunits form two trimers, each arranged in the shape of an equilateral triangle; the two trimers face each other and are held
Figure 3-61 A cooperative allosteric transition in an enzyme composed of two identical subunits. This diagram illustrates how the conformation of one subunit can influence that of its neighbor. The binding of a single molecule of an inhibitory ligand (yellow) to one subunit of the enzyme occurs with difficulty because it changes the conformation of this subunit and thereby disrupts the symmetry of the enzyme. Once this conformational change has occurred, however, the energy gained by restoring the symmetric pairing interaction between the two subunits makes it especially easy for the second subunit to bind the inhibitory ligand and undergo the same conformational change. Because the binding of the first molecule of ligand increases the affinity with which the other subunit binds the same ligand, the response of the enzyme to changes in the concentration of the ligand is much steeper than the response of an enzyme with only one subunit (see Figure 3-60).
[Figure labels: ENZYME ON; inhibitor; EASY TRANSITION]
together by three regulatory dimers that form a bridge between them. The entire molecule is poised to undergo a concerted, all-or-none, allosteric transition between two conformations, designated as T (tense) and R (relaxed) states (Figure 3-62).
The binding of substrates (carbamoyl phosphate and aspartate) to the catalytic trimers drives aspartate transcarbamoylase into its catalytically active R state, from which the regulatory CTP molecules dissociate. By contrast, the binding of CTP to the regulatory dimers converts the enzyme to the inactive T state, from which the substrates dissociate. This tug-of-war between CTP and substrates is identical in principle to that described previously in Figure 3-59 for a simpler allosteric protein. But because the tug-of-war occurs in a symmetric molecule with multiple binding sites, the enzyme undergoes a cooperative allosteric transition that will turn it on suddenly as substrates accumulate (forming the R state) or shut it off rapidly when CTP accumulates (forming the T state).
A combination of biochemistry and x-ray crystallography has revealed many fascinating details of this allosteric transition. Each regulatory subunit has two domains, and the binding of CTP causes the two domains to move relative to each other, so that they function like a lever that rotates the two catalytic trimers and pulls them closer together into the T state (see Figure 3-62). When this occurs, hydrogen bonds form between opposing catalytic subunits. This helps widen the cleft that forms the active site within each catalytic subunit, thereby disrupting the binding sites for the substrates (Figure 3-63). Adding large amounts of substrate has the opposite effect, favoring the R state by binding in the cleft of each catalytic subunit and opposing the above conformational change. Conformations that are intermediate between R and T are unstable, so that the enzyme mostly clicks back and forth between its R and T forms, producing a mixture of these two species in proportions that depend on the relative concentrations of CTP and substrates.
Figure 3-62 The transition between R and T states in the enzyme aspartate transcarbamoylase. The enzyme consists of a complex of six catalytic subunits and six regulatory subunits, and the structures of its inactive (T state) and active (R state) forms have been determined by x-ray crystallography. The enzyme is turned off by feedback inhibition when CTP concentrations rise. Each regulatory subunit can bind one molecule of CTP, which is one of the final products in the pathway. By means of this negative feedback regulation, the pathway is prevented from producing more CTP than the cell needs. (Based on K.L. Krause, K.W. Volz, and W.N. Lipscomb, Proc. Natl Acad. Sci. U.S.A. 82:1643-1647, 1985. With permission from National Academy of Sciences.)
Many Changes in Proteins Are Driven by Protein Phosphorylation
Proteins are regulated by more than the reversible binding of other molecules. A second method that eucaryotic cells use to regulate a protein's function is the covalent addition of a smaller molecule to one or more of its amino acid side chains. The most common such regulatory modification in higher eucaryotes is the addition of a phosphate group. We shall therefore use protein phosphorylation to illustrate some of the general principles involved in the control of protein function through the modification of amino acid side chains.
A phosphorylation event can affect the protein that is modified in two important ways. First, because each phosphate group carries two negative charges, the enzyme-catalyzed addition of a phosphate group to a protein can cause a major conformational change in the protein by, for example, attracting a cluster of positively charged amino acid side chains. This can, in turn, affect the binding of ligands elsewhere on the protein surface, dramatically changing the protein's activity. When a second enzyme removes the phosphate group, the protein returns to its original conformation and restores its initial activity.
Second, an attached phosphate group can form part of a structure that the binding sites of other proteins recognize. As previously discussed, certain protein domains, sometimes referred to as modules, appear very frequently as parts of larger proteins. One such module is the SH2 domain, described earlier, which binds to a short peptide sequence containing a phosphorylated tyrosine side chain (see Figure 3-39B). More than ten other common domains provide binding sites for attaching their protein to phosphorylated peptides in other protein molecules, each recognizing a phosphorylated amino acid side chain in a different protein context. As a result, protein phosphorylation and dephosphorylation very often drive the regulated assembly and disassembly of protein complexes (see Figure 15-22).
Reversible protein phosphorylation controls the activity, structure, and cellular localization of both enzymes and many other types of proteins in
Figure 3-63 Part of the on-off switch in the catalytic subunits of aspartate transcarbamoylase. Changes in the indicated hydrogen-bonding interactions are partly responsible for switching this enzyme's active site between active (yellow) and inactive conformations. Hydrogen bonds are indicated by thin red lines. The amino acids involved in the subunit-subunit interaction in the T state are shown in red, while those that form the active site of the enzyme in the R state are shown in blue. The large drawings show the catalytic site in the interior of the enzyme; the boxed sketches show the same subunits viewed from the enzyme's external surface. (Adapted from E.R. Kantrowitz and W.N. Lipscomb, Trends Biochem. Sci. 15:53-59, 1990. With permission from Elsevier.)
eucaryotic cells. In fact, this regulation is so extensive that more than one-third of the 10,000 or so proteins in a typical mammalian cell are thought to be phosphorylated at any given time, many with more than one phosphate. As might be expected, the addition and removal of phosphate groups from specific proteins often occur in response to signals that specify some change in a cell's state. For example, the complicated series of events that takes place as a eucaryotic cell divides is largely timed in this way (discussed in Chapter 17), and many of the signals mediating cell-cell interactions are relayed from the plasma membrane to the nucleus by a cascade of protein phosphorylation events (discussed in Chapter 15).
A Eucaryotic Cell Contains a Large Collection of Protein Kinases and Protein Phosphatases
Protein phosphorylation involves the enzyme-catalyzed transfer of the terminal phosphate group of an ATP molecule to the hydroxyl group on a serine, threonine, or tyrosine side chain of the protein (Figure 3-64). A protein kinase catalyzes this reaction, and the reaction is essentially unidirectional because of the large amount of free energy released when the phosphate-phosphate bond in ATP is broken to produce ADP (discussed in Chapter 2). A protein phosphatase catalyzes the reverse reaction of phosphate removal, or dephosphorylation. Cells contain hundreds of different protein kinases, each responsible for phosphorylating a different protein or set of proteins. There are also many different protein phosphatases; some are highly specific and remove phosphate groups from only one or a few proteins, whereas others act on a broad range of proteins and are targeted to specific substrates by regulatory subunits. The state of phosphorylation of a protein at any moment, and thus its activity, depends on the relative activities of the protein kinases and phosphatases that modify it.
The protein kinases that phosphorylate proteins in eucaryotic cells belong to a very large family of enzymes, which share a catalytic (kinase) sequence of about 290 amino acids. The various family members contain different amino acid sequences on either end of the kinase sequence (for example, see Figure 3-10), and often have short amino acid sequences inserted into loops within it (red arrowheads in Figure 3-65). Some of these additional amino acid sequences enable each kinase to recognize the specific set of proteins it phosphorylates, or to bind to structures that localize it in specific regions of the cell. Other parts of the protein regulate the activity of each kinase, so it can be turned on and off in response to different specific signals, as described below.
By comparing the number of amino acid sequence differences between the various members of a protein family, we can construct an "evolutionary tree" that is thought to reflect the pattern of gene duplication and divergence that gave rise to the family. Figure 3-66 shows an evolutionary tree of protein kinases. Kinases with related functions are often located on nearby branches of the tree: the protein kinases involved in cell signaling that phosphorylate tyrosine side chains, for example, are all clustered in the top left corner of the tree. The other kinases shown phosphorylate either a serine or a threonine side chain, and many are organized into clusters that seem to reflect their function: in transmembrane signal transduction, intracellular signal amplification, cell-cycle control, and so on.
Figure 3-65 The three-dimensional structure of a protein kinase. Superimposed on this structure are red arrowheads to indicate sites where insertions of 5-100 amino acids are found in some members of the protein kinase family. These insertions are located in loops on the surface of the enzyme where other ligands interact with the protein. Thus, they distinguish different kinases and confer on them distinctive interactions with other proteins. The ATP (which donates a phosphate group) and the peptide to be phosphorylated are held in the active site, which extends between the phosphate-binding loop (yellow) and the catalytic loop (orange). See also Figure 3-10. (Adapted from D.R. Knighton et al., Science 253:407-414, 1991. With permission from AAAS.)
Figure 3-64 Protein phosphorylation. Many thousands of proteins in a typical eucaryotic cell are modified by the covalent addition of a phosphate group. (A) The general reaction, shown here, transfers a phosphate group from ATP to an amino acid side chain of the target protein by a protein kinase. Removal of the phosphate group is catalyzed by a second enzyme, a protein phosphatase. In this example, the phosphate is added to a serine side chain; in other cases, the phosphate is instead linked to the -OH group of a threonine or a tyrosine in the protein. (B) The phosphorylation of a protein by a protein kinase can either increase or decrease the protein's activity, depending on the site of phosphorylation and the structure of the protein.
Figure 3-66 An evolutionary tree of selected protein kinases. Although a higher eucaryotic cell contains hundreds of such enzymes, and the human genome codes for more than 500, only some of those discussed in this book are shown.
[Figure labels: Cdc7; PDGF receptor; EGF receptor; tyrosine kinase subfamily; cyclic-AMP-dependent kinase; cyclic-GMP-dependent kinase; protein kinase C; TGFβ receptor; Ca2+/calmodulin-dependent kinase; myosin light chain kinases; receptor serine kinase subfamily]
As a result of the combined activities of protein kinases and protein phosphatases, the phosphate groups on proteins are continually turning over, being added and then rapidly removed. Such phosphorylation cycles may seem wasteful, but they are important in allowing the phosphorylated proteins to switch rapidly from one state to another: the more rapid the cycle, the faster a population of protein molecules can change its state of phosphorylation in response to a sudden change in the phosphorylation rate (see Figure 15-11). The energy required to drive this phosphorylation cycle is derived from the free energy of ATP hydrolysis, one molecule of which is consumed for each phosphorylation event.
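Why a faster cycle responds faster can be seen in a minimal kinetic sketch. It assumes simple first-order kinetics, in which the phosphorylated fraction f obeys df/dt = k_kinase(1 - f) - k_phosphatase f. This rate law and the parameter names are illustrative assumptions, not a model given in the text.

```python
import math

def steady_state_fraction(k_kinase, k_phosphatase):
    """Phosphorylated fraction once kinase and phosphatase rates balance."""
    return k_kinase / (k_kinase + k_phosphatase)

def fraction_phosphorylated(t, f0, k_kinase, k_phosphatase):
    """Phosphorylated fraction at time t, starting from fraction f0.

    Closed-form solution of the assumed rate law
        df/dt = k_kinase * (1 - f) - k_phosphatase * f
    """
    f_final = steady_state_fraction(k_kinase, k_phosphatase)
    decay = math.exp(-(k_kinase + k_phosphatase) * t)
    return f_final + (f0 - f_final) * decay

def half_response_time(k_kinase, k_phosphatase):
    """Time needed to cover half the distance to the new steady state."""
    return math.log(2.0) / (k_kinase + k_phosphatase)

# Two cycles with the same steady state but a 10-fold difference in turnover.
slow = (0.1, 0.9)
fast = (1.0, 9.0)
print("steady states:", steady_state_fraction(*slow), steady_state_fraction(*fast))
print("half-response times:", half_response_time(*slow), half_response_time(*fast))
```

In this sketch the response time depends on the sum of the two rate constants, so the rapidly cycling system reaches a new phosphorylation state ten times sooner even though both settle at the same level. The price is a proportionally higher consumption of ATP.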
The Regulation of Cdk and Src Protein Kinases Shows How a Protein Can Function as a Microchip
The hundreds of different protein kinases in a eucaryotic cell are organized into complex networks of signaling pathways that help to coordinate the cell's activities, drive the cell cycle, and relay signals into the cell from the cell's environment. Many of the extracellular signals involved need to be both integrated and amplified by the cell. Individual protein kinases (and other signaling proteins) serve as input-output devices, or "microchips," in the integration process. An important part of the input to these signal processing proteins comes from the control that is exerted by phosphates added and removed from them by protein kinases and protein phosphatases, respectively.
In general, specific sets of phosphate groups serve to activate the protein, while other sets can inactivate it. A cyclin-dependent protein kinase (Cdk) provides a good example. Kinases in this class phosphorylate serines and threonines, and they are central components of the cell-cycle control system in eucaryotic cells, as discussed in detail in Chapter 17. In a vertebrate cell, individual Cdk proteins turn on and off in succession, as a cell proceeds through the different phases of its division cycle. When a particular kinase is on, it influences various aspects of cell behavior through effects on the proteins it phosphorylates.
A Cdk protein becomes active as a serine/threonine protein kinase only when it is bound to a second protein called a cyclin. But, as Figure 3-67 shows, the binding of cyclin is only one of three distinct "inputs" required to activate the Cdk. In addition to cyclin binding, a phosphate must be added to a specific threonine side chain, and a phosphate elsewhere in the protein (covalently bound to a specific tyrosine side chain) must be removed. Cdk thus monitors a specific set
Figure 3-67 How a Cdk protein acts as an integrating device. The function of these central regulators of the cell cycle is discussed in Chapter 17.
of cell components (a cyclin, a protein kinase, and a protein phosphatase), and it acts as an input-output device that turns on if, and only if, each of these components has attained its appropriate activity state. Some cyclins rise and fall in concentration in step with the cell cycle, increasing gradually in amount until they are suddenly destroyed at a particular point in the cycle. The sudden destruction of a cyclin (by targeted proteolysis) immediately shuts off its partner Cdk enzyme, and this triggers a specific step in the cell cycle.
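The "if, and only if" behavior described for Cdk corresponds to a three-input logical AND, with one input inverted. The sketch below expresses the three inputs named in the text as booleans; the function and argument names are hypothetical labels chosen for illustration, and real activation is of course graded rather than strictly binary.

```python
from itertools import product

def cdk_is_active(cyclin_bound, activating_thr_phosphorylated,
                  inhibitory_tyr_phosphorylated):
    """Return True only when all three inputs are in the activating state.

    Inputs, as described in the text:
      - a cyclin must be bound
      - a specific threonine must carry a phosphate
      - the phosphate on a specific tyrosine must have been removed
    """
    return (cyclin_bound
            and activating_thr_phosphorylated
            and not inhibitory_tyr_phosphorylated)

def truth_table():
    """List every combination of the three inputs with the resulting output."""
    return [(inputs, cdk_is_active(*inputs))
            for inputs in product([False, True], repeat=3)]

for inputs, active in truth_table():
    print(inputs, "->", "ON" if active else "off")
```

Only one of the eight input combinations switches the kinase on. Destroying the cyclin flips the first input and turns the output off regardless of the other two.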
Figure 3-68 The domain structure of the Src family of protein kinases, mapped along the amino acid sequence. For the three-dimensional structure of Src, see Figure 3-10.
As indicated by the evolutionary tree in Figure 3-66, sequence comparisons suggest that tyrosine kinases as a group were a relatively late innovation that branched off from the serine/threonine kinases, with the Src subfamily being only one subgroup of the tyrosine kinases created in this way. The Src protein and its relatives contain a short N-terminal region that becomes covalently linked to a strongly hydrophobic fatty acid, which holds the kinase at the cytoplasmic face of the plasma membrane. Next come two peptide-binding modules, a Src homology 3 (SH3) domain and an SH2 domain, followed by the kinase catalytic domain (Figure 3-68). These kinases normally exist in an inactive conformation, in which a phosphorylated tyrosine near the C-terminus is bound to the SH2 domain, and the SH3 domain is bound to an internal peptide in a way that distorts the active site of the enzyme and helps to render it inactive. Turning the kinase on involves at least two specific inputs: removal of the C-
processing events that enable the cell to compute logical responses to a complex set of conditions.
Proteins That Bind and Hydrolyze GTP Are Ubiquitous Cellular Regulators
We have described how the addition or removal of phosphate groups on a protein can be used by a cell to control the protein's activity. In the examples discussed so
Figure 3-69 The activation of a Src-type protein kinase by two sequential events. (Adapted from S.C. Harrison et al., Cell 112:737-740, 2003. With permission from Elsevier.)
[Figure labels: activating ligand; kinase domain; PHOSPHATE REMOVAL LOOSENS STRUCTURE; KINASE CAN NOW PHOSPHORYLATE TYROSINE TO SELF-ACTIVATE]
far, the phosphate is transferred from an ATP molecule to an amino acid side chain of the protein in a reaction catalyzed by a specific protein kinase. Eucaryotic cells also have another way to control protein activity by phosphate addition and removal. In this case, the phosphate is not attached directly to the protein; instead, it is a part of the guanine nucleotide GTP, which binds very tightly to the protein. In general, proteins regulated in this way are in their active conformations with GTP bound. The loss of a phosphate group occurs when the bound GTP is hydrolyzed to GDP in a reaction catalyzed by the protein itself, and in its GDP-bound state the protein is inactive. In this way, GTP-binding proteins act as on-off switches whose activity is determined by the presence or absence of an additional phosphate on a bound GDP molecule (Figure 3-71).
GTP-binding proteins (also called GTPases because of the GTP hydrolysis they catalyze) comprise a large family of proteins that all contain variations on the same GTP-binding globular domain. When the tightly bound GTP is hydrolyzed to GDP, this domain undergoes a conformational change that inactivates it. The three-dimensional structure of a prototypical member of this family, the monomeric GTPase called Ras, is shown in Figure 3-72.
The Ras protein has an important role in cell signaling (discussed in Chapter 15). In its GTP-bound form, it is active and stimulates a cascade of protein phosphorylations in the cell. Most of the time, however, the protein is in its inactive, GDP-bound form. It becomes active when it exchanges its GDP for a GTP molecule in response to extracellular signals, such as growth factors, that bind to receptors in the plasma membrane (see Figure 15-58).
Figure 3-70 How a Src-type protein kinase acts as an integrating device. The disruption of the SH3 domain interaction (green) involves replacing its binding to the indicated red linker region with a tighter interaction with an activating ligand, as illustrated in Figure 3-69.
[Figure labels: INPUTS; OUTPUT: Src-type protein kinase activity turns on fully only if the answers to all of the above questions are yes]
Regulatory Proteins Control the Activity of GTP-Binding Proteins by Determining Whether GTP or GDP Is Bound
GTP-binding proteins are controlled by regulatory proteins that determine whether GTP or GDP is bound, just as phosphorylated proteins are turned on and off by protein kinases and protein phosphatases. Thus, Ras is inactivated by a GTPase-activating protein (GAP), which binds to the Ras protein and induces it to hydrolyze its bound GTP molecule to GDP, which remains tightly bound, and inorganic phosphate (Pi), which is rapidly released. The Ras protein stays in its inactive, GDP-bound conformation until it encounters a guanine nucleotide exchange factor (GEF), which binds to GDP-Ras and causes it to release its GDP. Because the empty nucleotide-binding site is immediately filled by a GTP molecule (GTP is present in large excess over GDP in cells), the GEF activates Ras by indirectly adding back the phosphate removed by GTP hydrolysis. Thus, in a sense, the roles of GAP and GEF are analogous to those of a protein phosphatase and a protein kinase, respectively (Figure 3-73).
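The GAP and GEF cycle can be pictured as a two-state switch. The toy class below is only a sketch of that logic: the class and method names are hypothetical, each encounter is treated as an instantaneous all-or-none event, and the immediate refilling of the empty site by GTP is assumed, as the text describes.

```python
class GTPaseSwitch:
    """Toy two-state model of a Ras-like GTP-binding protein."""

    def __init__(self, nucleotide="GDP"):
        if nucleotide not in ("GDP", "GTP"):
            raise ValueError("nucleotide must be 'GDP' or 'GTP'")
        self.nucleotide = nucleotide

    @property
    def active(self):
        # The protein is active only while GTP is bound.
        return self.nucleotide == "GTP"

    def encounter_gap(self):
        """A GAP induces hydrolysis of bound GTP; GDP stays bound."""
        if self.nucleotide == "GTP":
            self.nucleotide = "GDP"

    def encounter_gef(self):
        """A GEF releases bound GDP; the empty site is refilled by GTP."""
        if self.nucleotide == "GDP":
            self.nucleotide = "GTP"

ras = GTPaseSwitch()      # starts in the inactive, GDP-bound form
ras.encounter_gef()       # switched on
print(ras.nucleotide, ras.active)
ras.encounter_gap()       # switched off again
print(ras.nucleotide, ras.active)
```

In this picture the GEF plays the part that a protein kinase plays in a phosphorylation switch, and the GAP plays the part of the phosphatase.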
Large Protein Movements Can Be Generated From Small Ones
The Ras protein belongs to a large superfamily of monomeric GTPases, each of which consists of a single GTP-binding domain of about 200 amino acids. Over the course of evolution, this domain has also become joined to larger proteins with additional domains, creating a large family of GTP-binding proteins. Family members include the receptor-associated trimeric G proteins involved in cell signaling (discussed in Chapter 15), proteins regulating the traffic of vesicles between intracellular compartments (discussed in Chapter 13), and proteins that bind to transfer RNA and are required as assembly factors for protein
Figure 3-71 GTP-binding proteins as molecular switches. The activity of a GTP-binding protein (also called a GTPase) generally requires the presence of a tightly bound GTP molecule (switch "on"). Hydrolysis of this GTP molecule produces GDP and inorganic phosphate (Pi), and it causes the protein to convert to a different, usually inactive, conformation (switch "off"). As shown here, resetting the switch requires the tightly bound GDP to dissociate, a slow step that is greatly accelerated by specific signals; once the GDP has dissociated, a molecule of GTP is quickly rebound.
Chapter 3: Proteins
Figure 3-72 The structure of the Ras protein in its GTP-bound form. This monomeric GTPase illustrates the structure of a GTP-binding domain, which is present in a large family of GTP-binding proteins. The red regions change their conformation when the GTP molecule is hydrolyzed to GDP and inorganic phosphate by the protein; the GDP remains bound to the protein, while the inorganic phosphate is released. The special role of the "switch helix" in proteins related to Ras is explained next (see Figure 3-75).
synthesis on the ribosome (discussed in Chapter 6). In each case, an important biological activity is controlled by a change in the protein's conformation that is caused by GTP hydrolysis in a Ras-like domain.
The EF-Tu protein provides a good example of how this family of proteins works. EF-Tu is an abundant molecule that serves as an elongation factor (hence the EF) in protein synthesis, loading each aminoacyl tRNA molecule onto the ribosome. The tRNA molecule forms a tight complex with the GTP-bound form of EF-Tu (Figure 3-74). In this complex, the amino acid attached to the tRNA is improperly positioned for protein synthesis. The tRNA can transfer its amino acid only after the GTP bound to EF-Tu is hydrolyzed on the ribosome, allowing the EF-Tu to dissociate. Since the GTP hydrolysis is triggered by a proper fit of the tRNA to the mRNA molecule on the ribosome, the EF-Tu serves as a factor that discriminates between correct and incorrect mRNA-tRNA pairings (see Figure 6-67 for a further discussion of this function of EF-Tu).
By comparing the three-dimensional structure of EF-Tu in its GTP-bound and GDP-bound forms, we can see how the repositioning of the tRNA occurs. The dissociation of the inorganic phosphate group (Pi), which follows the reaction GTP → GDP + Pi, causes a shift of a few tenths of a nanometer at the GTP-binding site, just as it does in the Ras protein. This tiny movement, equivalent to
Figure 3-73 A comparison of the two major intracellular signaling mechanisms in eucaryotic cells (panels: SIGNALING BY PHOSPHORYLATED PROTEIN; SIGNALING BY GTP-BINDING PROTEIN). In both cases, a signaling protein is activated by the addition of a phosphate group and inactivated by the removal of this phosphate. To emphasize the similarities in the two pathways, ATP and GTP are drawn as APPP and GPPP, and ADP and GDP as APP and GPP, respectively. As shown in Figure 3-64, the addition of a phosphate to a protein can also be inhibitory.
PROTEIN FUNCTION
Figure 3-74 An aminoacyl tRNA molecule bound to EF-Tu. The three domains of the EF-Tu protein are colored differently, to match Figure 3-75. This is a bacterial protein; however, a very similar protein exists in eucaryotes, where it is called EF-1. (Coordinates determined by P. Nissen et al., Science 270:1464-1472, 1995. With permission from AAAS.)
a few times the diameter of a hydrogen atom, causes a conformational change to propagate along a crucial piece of α helix, called the switch helix, in the Ras-like domain of the protein. The switch helix seems to serve as a latch that adheres to a specific site in another domain of the molecule, holding the protein in a "shut" conformation. The conformational change triggered by GTP hydrolysis causes the switch helix to detach, allowing separate domains of the protein to swing apart, through a distance of about 4 nm. This releases the bound tRNA molecule, allowing its attached amino acid to be used (Figure 3-75). Notice in this example how cells have exploited a simple chemical change that occurs on the surface of a small protein domain to create a movement 50 times larger. Dramatic shape changes of this type also cause the very large movements that occur in motor proteins, as we discuss next.
Motor Proteins Produce Large Movements in Cells
We have seen that conformational changes in proteins have a central role in enzyme regulation and cell signaling. We now discuss proteins whose major function is to move other molecules. These motor proteins generate the forces responsible for muscle contraction and the crawling and swimming of cells. Motor proteins also power smaller-scale intracellular movements: they help to move chromosomes to opposite ends of the cell during mitosis (discussed in Chapter 17), to move organelles along molecular tracks within the cell (discussed
Figure 3-75 The large conformational change in EF-Tu caused by GTP hydrolysis. (A) The three-dimensional structure of EF-Tu with GTP bound. The domain at the top has a structure similar to the Ras protein, and its red α helix is the switch helix, which moves after GTP hydrolysis. (B) The change in the conformation of the switch helix in domain 1 causes domains 2 and 3 to rotate as a single unit by about 90° toward the viewer, which releases the tRNA that was shown bound to this structure in Figure 3-74. (A, adapted from H. Berchtold et al., Nature 365:126-132, 1993. With permission from Macmillan Publishers Ltd. B, courtesy of Mathias Sprinzl and Rolf Hilgenfeld.)
in Chapter 16), and to move enzymes along a DNA strand during the synthesis of a new DNA molecule (discussed in Chapter 5). All these fundamental processes depend on proteins with moving parts that operate as force-generating machines.
How do these machines work? In other words, how do cells use shape changes in proteins to generate directed movements? If, for example, a protein is required to walk along a narrow thread such as a DNA molecule, it can do this by undergoing a series of conformational changes, such as those shown in Figure 3-76. But with nothing to drive these changes in an orderly sequence, they are perfectly reversible, and the protein can only wander randomly back and forth along the thread. We can look at this situation in another way. Since the directional movement of a protein does work, the laws of thermodynamics (discussed in Chapter 2) demand that such movement use free energy from some other source (otherwise the protein could be used to make a perpetual motion machine). Therefore, without an input of energy, the protein molecule can only wander aimlessly.
How can the cell make such a series of conformational changes unidirectional? To force the entire cycle to proceed in one direction, it is enough to make any one of the changes in shape irreversible. Most proteins that are able to walk in one direction for long distances achieve this motion by coupling one of the conformational changes to the hydrolysis of an ATP molecule bound to the protein. The mechanism is similar to the one just discussed that drives allosteric
Figure 3-76 An allosteric "walking" protein. Although its three different conformations allow it to wander randomly back and forth while bound to a thread or a filament, the protein cannot move uniformly in a single direction.
In the model shown in Figure 3-77, ATP binding shifts a motor protein from conformation 1 to conformation 2. The bound ATP is then hydrolyzed to produce ADP and inorganic phosphate (Pi), causing a change from conformation 2
Many motor proteins generate directional movement in this general way, including the muscle motor protein myosin, which walks along actin filaments to generate muscle contraction, and the kinesin proteins that walk along microtubules (both discussed in Chapter 16). These movements can be rapid: some of the motor proteins involved in DNA replication (the DNA helicases) propel themselves along a DNA strand at rates as high as 1000 nucleotides per second.
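The argument that a single irreversible step is enough to turn random wandering into directed movement can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of any real motor: the protein cycles through three conformations, every allowed transition is taken with equal probability, one net step along the thread is counted each time the cycle wraps around, and the function name `walk` and its parameters are hypothetical.

```python
import random

def walk(n_steps, hydrolysis_irreversible, seed=0):
    """Return the net displacement along the thread after n_steps transitions.

    Conformations 1, 2, 3 are stored as states 0, 1, 2. Moving forward
    through the cycle (0 -> 1 -> 2 -> 0) advances the protein by one
    position at the 2 -> 0 transition; moving backward undoes it.
    """
    rng = random.Random(seed)
    state = 0
    position = 0
    for _ in range(n_steps):
        forward = rng.random() < 0.5
        if hydrolysis_irreversible and state == 2:
            # Going back from state 2 to state 1 would mean re-forming
            # ATP from ADP and Pi, so this reverse step is forbidden.
            forward = True
        if forward:
            if state == 2:
                position += 1
            state = (state + 1) % 3
        else:
            if state == 0:
                position -= 1
            state = (state - 1) % 3
    return position

reversible = walk(9000, hydrolysis_irreversible=False)
driven = walk(9000, hydrolysis_irreversible=True)
print(reversible, driven)
```

With all steps reversible the displacement stays close to zero on average, whereas forbidding the one reverse step produces steady forward progress, even though every other transition is still random.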
Membrane-Bound Transporters Harness Energy to Pump Molecules Through Membranes
We have thus far seen how allosteric proteins can act as microchips (Cdk and Src kinases), as assembly factors (EF-Tu), and as generators of mechanical force and motion (motor proteins). Allosteric proteins can also harness energy derived from ATP hydrolysis, ion gradients, or electron transport processes to pump specific ions or small molecules across a membrane. We consider one example here; others will be discussed in Chapter 11.
The ABC transporters constitute an important class of membrane-bound pump proteins. In humans at least 48 different genes encode them. These transporters mostly function to export hydrophobic molecules from the cytoplasm,
Figure 3-77 An allosteric motor protein. The transition between three different conformations includes a step driven by the hydrolysis of a bound ATP molecule, and this makes the entire cycle essentially irreversible. By repeated cycles, the protein therefore moves continuously to the right along the thread.
Figure 3-78 The ABC (ATP-binding cassette) transporter, a protein machine that pumps large hydrophobic molecules through a membrane. (A) The bacterial BtuCD protein, which imports vitamin B12 into E. coli using the energy of ATP hydrolysis. The binding of two molecules of ATP clamps together the two ATP-binding subunits. The structure is shown in its ADP-bound state, where the channel to the extracellular space can be seen to be open but the gate to the cytosol remains closed. (B) Schematic illustration of substrate pumping by ABC transporters. In bacteria, the binding of a substrate molecule to the extracellular face of the protein complex triggers ATP hydrolysis followed by ADP release, which opens the cytoplasmic gate; the pump is then reset for another cycle. In eucaryotes, an opposite process occurs, causing substrate molecules to be pumped out of the cell. (A, adapted from K.P. Locher, Curr. Opin. Struct. Biol. 14:426-441, 2004. With permission from Elsevier.)
serving to remove toxic molecules at the mucosal surface of the intestinal tract, for example, or at the blood-brain barrier. The study of ABC transporters is of intense interest in clinical medicine, because the overproduction of proteins in this class contributes to the resistance of tumor cells to chemotherapeutic drugs. And in bacteria, the same type of proteins primarily function to import essential nutrients into the cell.
The ABC transporter is a tetramer, with a pair of membrane-spanning subunits linked to a pair of ATP-binding subunits located just below the plasma membrane (Figure 3-78A). As in other examples we have discussed, the hydrolysis of the bound ATP molecules drives conformational changes in the protein, transmitting forces that cause the membrane-spanning subunits to move their bound molecules across the lipid bilayer (Figure 3-78B).
Humans have invented many different types of mechanical pumps, and it should not be surprising that cells also contain membrane-bound pumps that function in other ways. Among the most notable are the rotary pumps that couple the hydrolysis of ATP to the transport of H+ ions (protons). These pumps resemble miniature turbines, and they are used to acidify the interior of lysosomes and other eucaryotic organelles. Like other ion pumps that create ion gradients, they can function in reverse to catalyze the reaction ADP + Pi → ATP, if the gradient across their membrane of the ion that they transport is steep enough. One such pump, the ATP synthase, harnesses a gradient of proton concentration produced by electron transport processes to produce most of the ATP used in the living world. This ubiquitous pump has a central role in energy conversion, and we shall discuss its three-dimensional structure and mechanism in Chapter 14.
Proteins Often Form Large Complexes That Function as Protein Machines
Large proteins formed from many domains are able to perform more elaborate functions than small, single-domain proteins. But large protein assemblies formed from many protein molecules perform the most impressive tasks. Now that it is possible to reconstruct most biological processes in cell-free systems in the laboratory, it is clear that each of the central processes in a cell (such as DNA replication, protein synthesis, vesicle budding, or transmembrane signaling) is catalyzed by a highly coordinated, linked set of 10 or more proteins. In most such protein machines, an energetically favorable reaction such as the hydrolysis of bound nucleoside triphosphates (ATP or GTP) drives an ordered series of conformational changes in one or more of the individual protein subunits, enabling the ensemble of proteins to move coordinately. In this way, each enzyme can be moved directly into position, as the machine catalyzes successive reactions in a series. This is what occurs, for example, in protein synthesis on a ribosome (discussed in Chapter 6), or in DNA replication, where a large multiprotein complex moves rapidly along the DNA (discussed in Chapter 5).
Cells have evolved protein machines for the same reason that humans have invented mechanical and electronic machines. For accomplishing almost any task, manipulations that are spatially and temporally coordinated through linked processes are much more efficient than the use of individual tools.
Protein Machines with Interchangeable Parts Make Efficient Use of Genetic Information
To probe more deeply into the nature of protein machines, we shall consider a relatively simple one: the SCF ubiquitin ligase. This protein complex binds different "target proteins" at different times in the cell cycle, and it covalently adds multiubiquitin polypeptide chains to these proteins. Its C-shaped structure is formed from five protein subunits, the largest of which is a molecule that serves as a scaffold protein on which the rest of the structure is built. The structure underlies a remarkable mechanism (Figure 3-79). At one end of the C is an E2 ubiquitin-conjugating enzyme. At the other end is a substrate-binding arm, a subunit known as an F-box protein. These two subunits are separated by a gap of about 5 nm. When this protein complex is activated, the F-box protein binds to a specific site on a target protein, positioning the protein in the gap so that some of its lysine side chains contact the ubiquitin-conjugating enzyme. This enzyme can then catalyze the repeated addition of a ubiquitin polypeptide to these lysines (see Figure 3-79C), producing a polyubiquitin chain that marks the target protein for rapid destruction in a proteasome (see p. 393). In this manner, specific proteins are targeted for rapid destruction in
in cells, inasmuch as new functions can evolve for the entire complex simply by producing an alternative version of one of its subunits.
The Activation of Protein Machines Often Involves Positioning Them at Specific Sites
As scientists have learned more of the details of cell biology, they have recognized increasing degrees of sophistication in cell chemistry. Thus, not only do we now know that protein machines play a predominant role, but it has recently become clear that most of these machines form at specific sites in the cell, being activated only where and when they are needed. Using fluorescent, GFP-tagged fusion proteins in living cells (see p. 593), cell biologists are able to follow the repositioning of individual proteins that occurs in response to specific signals. Thus, when certain extracellular signaling molecules bind to receptor proteins in the plasma membrane, they often recruit a set of other proteins to the inside surface of the plasma membrane to form protein machines that pass the signal on. As an example, Figure 3-80A illustrates the rapid movement of a protein kinase C (PKC) enzyme to a complex in the plasma membrane, where it associates with specific substrate proteins that it phosphorylates.
There are more than 10 distinct PKC enzymes in human cells, which differ both in their mode of regulation and in their functions. When activated, these enzymes move from the cytoplasm to different intracellular locations, forming specific complexes with other proteins that allow them to phosphorylate different protein substrates (Figure 3-80B). The SCF ubiquitin ligases can also move to specific sites of function at appropriate times. As will be explained when we discuss cell signaling in Chapter 15, the mechanisms frequently involve protein phosphorylation, as well as scaffold proteins that link together a set of activating, inhibiting, adaptor, and substrate proteins at a specific location in a cell.
This general phenomenon is known as induced proximity, and it explains the otherwise puzzling observation that slightly different forms of enzymes with the same catalytic site will often have very different biological functions. Cells change the locations of their proteins by covalently modifying them in a variety of different ways, as part of a "regulatory code" to be described next.
Figure 3-79 The structure and mode of action of a SCF ubiquitin ligase. (A) The structure of the five-protein complex that includes an E2 ubiquitin ligase. The protein denoted here as adaptor protein 1 is the Rbx1/Hrt1 protein, adaptor protein 2 is the Skp1 protein, and the cullin is the Cul1 protein. (B) Comparison of the same complex with two different substrate-binding arms, the F-box proteins Skp2 (top) and β-trCP1 (bottom), respectively. (C) The binding and ubiquitylation of a target protein by the SCF ubiquitin ligase. If, as indicated, a chain of ubiquitin molecules is added to the same lysine of the target protein, that protein is marked for rapid destruction by the proteasome. (A and B, adapted from G. Wu et al., Mol. Cell 11:1445-1456, 2003. With permission from Elsevier.)
These modifications create sites on proteins that bind them to particular scaffold proteins, thereby clustering the proteins required for particular reactions in specific regions of the cell. Most biological reactions are catalyzed by sets of 5 or more proteins, and such a clustering of proteins is often required for the reaction to occur. Scaffolds thereby allow cells to compartmentalize reactions even in the absence of membranes. Although only recently recognized as a widespread phenomenon, this type of clustering is particularly obvious in the cell nucleus (see Figure 4-69).
Many scaffolds appear to be quite different from the cullin illustrated previously in Figure 3-79: rather than holding their bound proteins in precise positions relative to each other, the interacting proteins are linked by unstructured regions of polypeptide chain. This tethers the proteins together, causing them to collide frequently with each other in random orientations, some of which will lead to a productive reaction (Figure 3-80C). In essence, this mechanism greatly speeds reactions by creating a very high local concentration of the reacting species. For this reason, the use of scaffold proteins represents an especially versatile way of controlling cell chemistry (see also Figure 15-61).
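To get a feel for what "a very high local concentration" can mean, one can estimate the concentration of a single molecule confined by its tether to a small sphere. This is a back-of-the-envelope sketch: the spherical geometry, the 10 nm radius, and the function name `local_concentration_molar` are assumptions made for illustration and are not taken from the text.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def local_concentration_molar(tether_radius_nm, n_molecules=1):
    """Molar concentration of n molecules confined to a sphere of the given radius."""
    radius_dm = tether_radius_nm * 1e-8  # 1 nm = 1e-8 dm, and 1 dm^3 = 1 liter
    volume_liters = (4.0 / 3.0) * math.pi * radius_dm ** 3
    return n_molecules / (AVOGADRO * volume_liters)

# One tethered partner within an assumed reach of 10 nm.
c = local_concentration_molar(10.0)
print(f"{c * 1e3:.2f} mM")
```

Because the volume scales with the cube of the radius, halving the assumed tether length raises the effective concentration eightfold, which is why short flexible linkers can have such a large effect on collision frequency.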
Figure 3-80 The assembly of protein machines at specific sites in a cell. (A) In response to a signal (here a phorbol ester), the gamma subspecies of protein kinase C moves rapidly from the cytosol to the plasma membrane. The protein kinase is fluorescent in these living cells because an engineered gene inside the cell encodes a fusion protein that links the kinase to green fluorescent protein (GFP). (B) The specific association of a different subspecies of protein kinase C (aPKC) with the apical tip of a differentiating neuroblast in an early Drosophila embryo. The kinase is stained red, and the cell nucleus green. (C) Diagram illustrating how a simple proximity created by scaffold proteins can greatly speed reactions in a cell. In this example, long unstructured regions of polypeptide chain in a large scaffold protein connect a series of structured domains that bind a set of reacting proteins. The unstructured regions serve as flexible "tethers" that greatly speed reaction rates by causing a rapid, random collision of all of the proteins that are bound to the scaffold. (For a simple example of tethering, see Figure 16-38.) (A, from N. Sakai et al., J. Cell Biol. 139:1465-1476, 1997. With permission from The Rockefeller University Press. B, courtesy of Andreas Wodarz, Institute of Genetics, University of Düsseldorf, Germany.)
Many Proteins Are Controlled by Multisite Covalent Modification
We have thus far described only one type of posttranslational modification of proteins: that in which a phosphate is covalently attached to an amino acid side chain (see Figure 3-64). But a large number of other such modifications also occur, more than 200 distinct types being known. To give a sense of the variety, Table 3-3 presents a subset of modifying groups with known regulatory roles. As

Table 3-3 Some Molecules Covalently Attached to Proteins Regulate Protein Function

MODIFYING GROUP                      SOME PROMINENT FUNCTIONS
Phosphate on Ser, Thr, or Tyr        Drives the assembly of a protein into larger complexes (see Figure 15-19).
Methyl on Lys                        Helps to create the histone code in chromatin through forming either mono-, di-, or tri-methyl lysine (see Figure 4-38).
Acetyl on Lys                        Helps to create the histone code in chromatin (see Figure 4-38).
Palmityl group on Cys                This fatty acid addition drives protein association with membranes (see Figure 10-20).
N-acetylglucosamine on Ser or Thr    Controls enzyme activity and gene expression in glucose homeostasis.
Ubiquitin on Lys                     Monoubiquitin addition regulates the transport of membrane proteins in vesicles (see Figure 13-58). A polyubiquitin chain targets a protein for degradation (see Figure 3-79).

Ubiquitin is a 76 amino acid polypeptide; there are at least 10 other ubiquitin-related proteins, such as SUMO, that modify proteins in s…
Figure 3-81 panel labels: (A) protein p53, with its transactivation domain, DNA-binding domain, tetramerization domain, and C-terminal regulatory domain; the marks shown include phosphate (P), acetyl (Ac), methyl, ubiquitin, and SUMO. (B) Histone H3. (C) Signaling inputs; the code is read: bind to proteins Y and Z, move to proteasome for degradation, move to plasma membrane.
in phosphate addition, these groups are added and then removed from proteins according to the needs of the cell.
A large number of proteins are now known to be modified on more than one amino acid side chain, with different regulatory events producing a different pattern of such modifications. A striking example is the protein p53, which plays a central part in controlling a cell's response to adverse circumstances (see p. 1105). Through one of four different types of molecular additions, this protein can be modified at 20 different sites (Figure 3-81A). Because an enormous number of different combinations of these 20 modifications are possible, the protein's behavior can in principle be altered in a huge number of ways. Moreover, the pattern of modifications on a protein can determine its susceptibility to further modification, as illustrated by histone H3 in Figure 3-81B.
Cell biologists have only recently come to recognize that each protein's set of covalent modifications constitutes an important combinatorial regulatory code. As specific modifying groups are added to or removed from a protein, this code causes a different set of protein behaviors: changing the activity or stability of the protein, its binding partners, and its specific location within the cell (Figure 3-81C). This helps the cell respond rapidly and with great versatility to changes in its condition or environment.
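The size of the combinatorial space mentioned above is easy to make concrete. The sketch below assumes the simplest possible reading, in which each of the 20 sites is independently either modified or unmodified; real constraints between neighboring sites, such as those shown for histone H3, would reduce the number of patterns actually reachable. The function names are hypothetical.

```python
from itertools import combinations

N_SITES = 20  # number of modifiable sites on p53 given in the text

def n_patterns(n_sites):
    """Number of on/off modification patterns if every site is independent."""
    return 2 ** n_sites

def patterns_with_k_marks(n_sites, k):
    """All patterns, as tuples of site indices, with exactly k sites modified."""
    return list(combinations(range(n_sites), k))

print(n_patterns(N_SITES))                     # every possible pattern
print(len(patterns_with_k_marks(N_SITES, 2)))  # patterns with exactly two marks
```

Even under this binary assumption there are over a million distinct patterns for a single protein, which is what gives the regulatory code its capacity.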
A Complex Network of Protein Interactions Underlies Cell Function
There are many challenges facing cell biologists in this "post-genome" era when complete genome sequences are known. One is the need to dissect and reconstruct each one of the thousands of protein machines that exist in an organism such as ourselves. To understand these remarkable protein complexes, each must be reconstituted from its purified protein parts, so that we can study its detailed mode of operation under controlled conditions in a test tube, free from
Figure 3-81 Multisite protein modification and its effects. A protein that carries a post-translational addition to more than one of its amino acid side chains can be considered to carry a combinatorial regulatory code. (A) The pattern of known covalent modifications to the protein p53; ubiquitin and SUMO are related polypeptides (see Table 3-3). (B) The possible modifications on the first 20 amino acids at the N-terminus of histone H3, showing not only their locations but also their activating (blue) and inhibiting (red) effects on the addition of neighboring covalent modifications. In addition to the effects shown, the acetylation and methylation of a lysine are mutually exclusive reactions (see Figure 4-38). (C) Diagram showing the general manner in which multisite modifications are added to (and removed from) a protein through signaling networks, and how the resulting combinatorial regulatory code on the protein is read to alter its behavior in the cell.
all other cell components. This alone is a massive task. But we now know that each of these subcomponents of a cell also interacts with other sets of macromolecules, creating a large network of protein-protein and protein-nucleic acid interactions throughout the cell. To understand the cell, therefore, we need to analyze most of these other interactions as well.
We can gain some idea of the complexity of intracellular protein networks from a particularly well-studied example described in Chapter 16: the many dozens of proteins that interact with the actin cytoskeleton in the yeast Saccharomyces cerevisiae (see Figure 16-18).
The extent of such protein-protein interactions can also be estimated more generally. An enormous amount of valuable information is now freely available in protein databases on the Internet: tens of thousands of three-dimensional protein structures plus tens of millions of protein sequences derived from the nucleotide sequences of genes. Scientists have been developing new methods for mining this great resource to increase our understanding of cells. In particular, computer-based bioinformatics tools are being combined with robotics and microarray technologies (see p. 574) to allow thousands of proteins to be investigated in a single set of experiments. Proteomics is a term that is often used to describe such research focused on the large-scale analysis of proteins, analogous to the term genomics describing the large-scale analysis of DNA sequences and genes.
Biologists use two different large-scale methods to map the direct binding interactions between the many different proteins in a cell. The initial method of choice was based on genetics: through an ingenious technique known as the yeast two-hybrid screen (see Figure 8-24), tens of thousands of interactions between thousands of proteins have been mapped in yeast, a nematode, and the fruit fly Drosophila.
More recently, a biochemical method based on affinity tagging and mass spectroscopy has gained favor (discussed in Chapter 8), because it appears to produce fewer spurious results.
The results of these and other analyses that predict protein binding interactions have been tabulated and organized in Internet databases. This allows a cell biologist studying a small set of proteins to readily discover which other proteins in the same cell are thought to bind to, and thus interact with, that set of proteins. When displayed graphically as a protein interaction map, each protein is represented by a box or dot in a two-dimensional network, with a straight line connecting those proteins that have been found to bind to each other. When hundreds or thousands of proteins are displayed on the same map, the network diagram becomes bewilderingly complicated, serving to illustrate how much more we have to learn before we can claim to really understand the cell.
Much more useful are small subsections of these maps, centered on a few proteins of interest. Thus, Figure 3-82 shows a network of protein-protein interactions for the five proteins that form the SCF ubiquitin ligase in a yeast cell (see Figure 3-79). Four of the subunits of this ligase are located at the bottom right of Figure 3-82. The remaining subunit, the F-box protein that serves as its substrate-binding arm, appears as a set of 15 different gene products that bind to adaptor protein 2 (the Skp1 protein). Along the top and left of the figure are sets of additional protein interactions marked with yellow and green shading: as indicated, these protein sets function at the origin of DNA replication, in cell cycle regulation, in methionine synthesis, in the kinetochore, and in vacuolar H+-ATPase assembly. We shall use this figure to explain how such protein interaction maps are used, and what they do and do not mean.
1. Protein interaction maps are useful for identifying the likely function of previously uncharacterized proteins. Examples are the products of the genes that have thus far only been inferred to exist from the yeast genome sequence, which are the six proteins in the figure that lack a simple three-letter abbreviation (white letters beginning with Y). One, the product of so-called open reading frame YDR196C, is located in the origin of replication group, and it is therefore likely to have a role in starting new replication forks. The remaining five in this diagram are F-box proteins that bind to Skp1; these are therefore likely to function as part of the ubiquitin ligase, serving as substrate-binding arms that recognize different target proteins. However, as we discuss next, neither assignment can be considered certain without additional data.
2. Protein interaction networks need to be interpreted with caution because, as a result of evolution making efficient use of each organism's genetic information, the same protein can be used as part of two different protein complexes that have different types of functions. Thus, although protein A binds to protein B and protein B binds to protein C, proteins A and C need not function in the same process. For example, we know from detailed biochemical studies that the functions of Skp1 in the kinetochore and in vacuolar H+-ATPase assembly (yellow shading) are separate from its function in the SCF ubiquitin ligase. In fact, only the remaining three functions of Skp1 illustrated in the diagram (methionine synthesis, cell cycle regulation, and origin of replication; green shading) involve ubiquitylation.
3. In cross-species comparisons, those proteins displaying similar patterns of interactions in the two protein interaction maps are likely to have the same function in the cell. Thus, as scientists generate more and more highly detailed maps for multiple organisms, the results will become increasingly useful for inferring protein function. These map comparisons are a particularly powerful tool for deciphering the functions of human proteins. There is a vast amount of direct information about protein function that can be obtained from genetic engineering, mutational, and
Figure3-82 A map of some protein- protein interactionsof the SCFubiquitin ligaseand other proteins in the yeast S.lerevisiae,Thesymbolsand/or colorsusedfor the 5 proteinsof the ligasearethose in Figure3-79. Note that 15 different with u/hitelettering(beginningwith Y) areonly knownfrom the genome F-boxproteinsareshown(purpte);those of PeterBowersand DavidEisenberg, sequenceasopen readingframes.Foradditionaldetails,seetext.(Courtesy UCLA.) UCLA-DOE Institutefor Genomicsand Proteomics,
Figure 3-83 A network of protein-binding interactions in a yeast cell. Each line connecting a pair of dots (proteins) indicates a protein-protein interaction. (From A. Guimerá and M. Sales-Pardo, Mol. Syst. Biol. 2:42, 2006. With permission from Macmillan Publishers Ltd.)
genetic analyses in model organisms (such as yeast, worms, and flies) that is not available in humans.
The available data suggest that a typical protein in a human cell may interact with between 5 and 15 different partners. Often, each of the different domains in a multidomain protein binds to a different set of partners; in fact, we can speculate that the unusually extensive multidomain structures observed for human proteins may have evolved to facilitate these interactions. Given the enormous complexity of the interacting networks of macromolecules in cells (Figure 3-83), deciphering their full functional meaning may well keep scientists busy for centuries.
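The caution about reading function from an interaction map can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the proteins A to D, their interactions, and their functional labels are hypothetical placeholders (not data from Figure 3-82), and the helper names `build_map` and `candidate_functions` are invented for this example. It tallies the known functions of a protein's direct binding partners, which is roughly the "guilt by association" reasoning described above.

```python
from collections import defaultdict

# Hypothetical toy interaction map; the protein names and functional labels
# are illustrative placeholders, not data taken from Figure 3-82.
INTERACTIONS = [
    ("A", "B"),
    ("B", "C"),
    ("B", "D"),
    ("C", "D"),
]
KNOWN_FUNCTION = {"A": "ubiquitin ligase", "C": "kinetochore", "D": "kinetochore"}


def build_map(pairs):
    """Return an undirected adjacency map: protein -> set of binding partners."""
    partners = defaultdict(set)
    for p, q in pairs:
        partners[p].add(q)
        partners[q].add(p)
    return dict(partners)


def candidate_functions(protein, partners, known):
    """Count the annotated functions found among a protein's direct partners."""
    counts = defaultdict(int)
    for partner in partners.get(protein, ()):
        if partner in known:
            counts[known[partner]] += 1
    return dict(counts)


partners = build_map(INTERACTIONS)
# B binds partners from two functionally distinct groups, so the fact that
# A binds B and B binds C does not place A and C in the same process.
print(candidate_functions("B", partners, KNOWN_FUNCTION))
```

In this toy map, protein B plays the role that Skp1 plays in the text: it belongs to more than one complex, so its partners cannot be assumed to share a function, and any assignment made this way remains a hypothesis until additional data confirm it.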
Summary
Proteins can form enormously sophisticated chemical devices, whose functions largely depend on the detailed chemical properties of their surfaces. Binding sites for ligands are formed as surface cavities in which precisely positioned amino acid side chains are brought together by protein folding. In this way, normally unreactive amino acid side chains can be activated to make and break covalent bonds. Enzymes are catalytic proteins that greatly speed up reaction rates by binding the high-energy transition states for a specific reaction path; they also perform acid catalysis and base catalysis simultaneously. The rates of enzyme reactions are often so fast that they are limited only by diffusion; rates can be further increased if enzymes that act sequentially on a substrate are joined into a single multienzyme complex, or if the enzymes and their substrates are confined to the same compartment of the cell.
Proteins reversibly change their shape when ligands bind to their surface. The allosteric changes in protein conformation produced by one ligand affect the binding of a second ligand, and this linkage between two ligand-binding sites provides a crucial mechanism for regulating cell processes. Metabolic pathways, for example, are controlled by feedback regulation: some small molecules inhibit and other small molecules activate enzymes early in a pathway. Enzymes controlled in this way generally form symmetric assemblies, allowing cooperative conformational changes to create a steep response to changes in the concentrations of the ligands that regulate them.
The expenditure of chemical energy can drive unidirectional changes in protein shape. By coupling allosteric shape changes to ATP hydrolysis, for example, proteins can do useful work, such as generating a mechanical force or moving for long distances in a single direction. The three-dimensional structures of proteins, determined by x-ray crystallography, have revealed how a small local change caused by nucleoside triphosphate hydrolysis is amplified to create major changes elsewhere in the protein. By such means, these proteins can serve as input-output devices that transmit information, as assembly factors, as motors, or as membrane-bound pumps. Highly efficient protein machines are formed by incorporating many different protein molecules into larger assemblies that coordinate the allosteric movements of the individual components. Such machines are now known to perform many of the most important reactions in cells.
Proteins are subjected to many reversible post-translational modifications, such as the covalent addition of a phosphate or an acetyl group to a specific amino acid side chain. The addition of these modifying groups is used to regulate the activity of a protein, changing its conformation, its binding to other proteins, and its location inside the cell. A typical protein in a cell will interact with more than five different partners. Using the new technologies of proteomics, biologists can analyze thousands of proteins in one set of experiments. One important result is the production of detailed protein interaction maps, which aim at describing all of the binding interactions between the thousands of distinct proteins in a cell.
END-OF-CHAPTER PROBLEMS
Figure Q3-1 The kelch repeat domain of galactose oxidase from D. dendroides (Problem 3-9). The seven individual β propellers are indicated. The N- and C-termini are indicated by N and C.
Which statements are true? Explain why or why not.

3-1 Each strand in a β sheet is a helix with two amino acids per turn.

3-2 Loops of polypeptide that protrude from the surface of a protein often form the binding sites for other molecules.

3-3 An enzyme reaches a maximum rate at high substrate concentration because it has a fixed number of active sites where substrate binds.

3-4 Higher concentrations of enzyme give rise to a higher turnover number.

3-5 Enzymes such as aspartate transcarbamoylase that undergo cooperative allosteric transitions invariably contain multiple identical subunits.

3-6 Continual addition and removal of phosphates by protein kinases and protein phosphatases is wasteful of energy, since their combined action consumes ATP, but it is a necessary consequence of effective regulation by phosphorylation.

Discuss the following problems.

3-7 Consider the following statement. "To produce one molecule of each possible kind of polypeptide chain, 300 amino acids in length, would require more atoms than exist in the universe." Given the size of the universe, do you suppose this statement could possibly be correct? Since counting atoms is a tricky business, consider the problem from the standpoint of mass. The mass of the observable universe is estimated to be about $10^{80}$ grams, give or take an order of magnitude or so. Assuming that the average mass of an amino acid is 110 daltons, what would be the mass of one molecule of each possible kind of polypeptide chain 300 amino acids in length? Is this greater than the mass of the universe?

3-8 A common strategy for identifying distantly related proteins is to search the database using a short signature sequence indicative of the particular protein function. Why is it better to search with a short sequence than with a long sequence? Do you not have more chances for a 'hit' in the database with a long sequence?

3-9 The so-called kelch motif consists of a four-stranded β sheet, which forms what is known as a β propeller. It is usually found to be repeated four to seven times, forming a kelch repeat domain in a multidomain protein. One such kelch repeat domain is shown in Figure Q3-1. Would you classify this domain as an 'in-line' or 'plug-in' type domain?

3-10 Titin, which has a molecular weight of $3 \times 10^{6}$ daltons, is the largest polypeptide yet described. Titin molecules extend from muscle thick filaments to the Z disc; they are thought to act as springs to keep the thick filaments centered in the sarcomere. Titin is composed of a large number of repeated immunoglobulin (Ig) sequences of 89 amino acids, each of which is folded into a domain about 4 nm in length (Figure Q3-2A). You suspect that the springlike behavior of titin is caused by the sequential unfolding (and refolding) of individual Ig domains. You test this hypothesis using the atomic force microscope, which allows you to pick up one end of a protein molecule and pull with an accurately measured force. For a fragment of titin containing seven repeats of the Ig domain, this experiment gives the sawtooth force-versus-extension curve shown in Figure Q3-2B. When the experiment is repeated in a solution of 8 M urea (a protein denaturant), the peaks disappear and the measured extension becomes much longer for a given force. If the experiment is repeated after the protein has been cross-linked by treatment with glutaraldehyde, once again the peaks disappear but the extension becomes much smaller for a given force.
A. Are the data consistent with your hypothesis that titin's springlike behavior is due to the sequential unfolding of individual Ig domains? Explain your reasoning.
B. Is the extension for each putative domain-unfolding event the magnitude you would expect? (In an extended polypeptide chain, amino acids are spaced at intervals of 0.34 nm.)
C. Why is each successive peak in Figure Q3-2B a little higher than the one before?
D. Why does the force collapse so abruptly after each peak?

Figure Q3-2 Springlike behavior of titin (Problem 3-10). (A) The structure of an individual Ig domain. (B) Force in piconewtons versus extension in nanometers obtained by atomic force microscopy.

3-11 It is often said that protein complexes are made from subunits (that is, individually synthesized proteins) rather than as one long protein because the former is more likely to give a correct final structure.
A. Assuming that the protein synthesis machinery incorporates one incorrect amino acid for each 10,000 it inserts, calculate the fraction of bacterial ribosomes that would be assembled correctly if the proteins were synthesized as one large protein versus built from individual proteins. For the sake of calculation assume that the ribosome is composed of 50 proteins, each 200 amino acids in length, and that the subunits, correct and incorrect, are assembled with equal likelihood into the complete ribosome. [The probability that a polypeptide will be made correctly, $P_C$, equals the fraction correct for each operation, $f_C$, raised to a power equal to the number of operations, $n$: $P_C = (f_C)^n$. For an error rate of 1/10,000, $f_C = 0.9999$.]
B. Is the assumption that correct and incorrect subunits assemble equally well likely to be true? Why or why not? How would a change in that assumption affect the calculation in part A?

3-12 Rous sarcoma virus (RSV) carries an oncogene called Src, which encodes a continuously active protein tyrosine kinase that leads to unchecked cell proliferation. Normally, Src carries an attached fatty acid (myristoylate) group that allows it to bind to the cytoplasmic side of the plasma membrane. A mutant version of Src that does not allow attachment of myristoylate does not bind to the membrane. Infection of cells with RSV encoding either the normal or the mutant form of Src leads to the same high level of protein tyrosine kinase activity, but the mutant Src does not cause cell proliferation.
A. Assuming that the normal Src is all bound to the plasma membrane and that the mutant Src is distributed throughout the cytoplasm, calculate their relative concentrations in the neighborhood of the plasma membrane. For the purposes of this calculation, assume that the cell is a sphere with a radius of 10 µm and that the mutant Src is distributed throughout, whereas the normal Src is confined to a 4-nm-thick layer immediately beneath the membrane. [For this problem, assume that the membrane has no thickness. The volume of a sphere is $(4/3)\pi r^3$.]
B. The target (X) for phosphorylation by Src resides in the membrane. Explain why the mutant Src does not cause cell proliferation.

3-13 An antibody binds to another protein with an equilibrium constant, $K$, of $5 \times 10^{9}\ \mathrm{M}^{-1}$. When it binds to a second, related protein, it forms three fewer hydrogen bonds, reducing its binding affinity by 2.8 kcal/mole. What is the $K$ for its binding to the second protein? (Free-energy change is related to the equilibrium constant by the equation $\Delta G^\circ = -2.3\,RT \log K$, where $R$ is $1.98 \times 10^{-3}$ kcal/(mole K) and $T$ is 310 K.)

3-14 The protein SmpB binds to a special species of tRNA, tmRNA, to eliminate the incomplete proteins made from truncated mRNAs in bacteria. If the binding of SmpB to tmRNA is plotted as fraction tmRNA bound versus SmpB concentration, one obtains a symmetrical S-shaped curve as shown in Figure Q3-3. This curve is a visual display of a very useful relationship between $K_d$ and concentration, which has broad applicability. The general expression for fraction of ligand bound is derived from the equation for $K_d$ ($K_d = [\mathrm{Pr}][\mathrm{L}]/[\mathrm{Pr\text{-}L}]$) by substituting $([\mathrm{L}]_{\mathrm{TOT}} - [\mathrm{L}])$ for $[\mathrm{Pr\text{-}L}]$ and rearranging. Because the total concentration of ligand ($[\mathrm{L}]_{\mathrm{TOT}}$) is equal to the free ligand ($[\mathrm{L}]$) plus bound ligand ($[\mathrm{Pr\text{-}L}]$),

$\text{fraction bound} = [\mathrm{Pr\text{-}L}]/[\mathrm{L}]_{\mathrm{TOT}} = [\mathrm{Pr}]/([\mathrm{Pr}] + K_d)$

For SmpB and tmRNA, the fraction bound $= [\mathrm{SmpB\text{-}tmRNA}]/[\mathrm{tmRNA}]_{\mathrm{TOT}} = [\mathrm{SmpB}]/([\mathrm{SmpB}] + K_d)$. Using this relationship, calculate the fraction of tmRNA bound for SmpB concentrations equal to $10^{4}K_d$, $10^{3}K_d$, $10^{2}K_d$, $10^{1}K_d$, $K_d$, $10^{-1}K_d$, $10^{-2}K_d$, $10^{-3}K_d$, and $10^{-4}K_d$.

Figure Q3-3 Fraction of tmRNA bound versus SmpB concentration (Problem 3-14). [Axes: fraction bound versus concentration of SmpB (M).]

3-15 Many enzymes obey simple Michaelis-Menten kinetics, which are summarized by the equation

$\text{rate} = V_{\max}[\mathrm{S}]/([\mathrm{S}] + K_m)$

where $V_{\max}$ = maximum velocity, $[\mathrm{S}]$ = concentration of substrate, and $K_m$ = the Michaelis constant. It is instructive to plug a few values of $[\mathrm{S}]$ into the equation to see how rate is affected. What are the rates for $[\mathrm{S}]$ equal to zero, equal to $K_m$, and equal to infinite concentration?

3-16 The enzyme hexokinase adds a phosphate to D-glucose but ignores its mirror image, L-glucose. Suppose that you were able to synthesize hexokinase entirely from D-amino acids, which are the mirror image of the normal L-amino acids.
A. Assuming that the 'D' enzyme would fold to a stable conformation, what relationship would you expect it to bear to the normal 'L' enzyme?
B. Do you suppose the 'D' enzyme would add a phosphate to L-glucose, and ignore D-glucose?

3-17 How do you suppose that a molecule of hemoglobin is able to bind oxygen efficiently in the lungs, and yet release it efficiently in the tissues?

3-18 Synthesis of the purine nucleotides AMP and GMP proceeds by a branched pathway starting with ribose 5-phosphate (R5P), as shown schematically in Figure Q3-4. Using the principles of feedback inhibition, propose a regulatory strategy for this pathway that ensures an adequate supply of both AMP and GMP and minimizes the buildup of the intermediates (A-I) when supplies of AMP and GMP are adequate.

Figure Q3-4 Schematic diagram of the metabolic pathway for synthesis of AMP and GMP from R5P (Problem 3-18). [Pathway: R5P → A → B → C → D → E, which branches to F → G → AMP and to H → I → GMP.]
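The three quantitative relationships quoted in Problems 3-11, 3-14, and 3-15 are simple enough to explore numerically. The Python below is a minimal sketch rather than part of the original problem set: the function names are hypothetical, and the example values are arbitrary illustrations chosen so as not to coincide with the values the problems ask about.

```python
def probability_all_correct(fraction_correct, operations):
    """P_C = (f_C)^n: chance that all n independent operations are correct."""
    return fraction_correct ** operations


def fraction_bound(protein_conc, kd):
    """Fraction of ligand bound = [Pr] / ([Pr] + Kd), in matching units."""
    return protein_conc / (protein_conc + kd)


def michaelis_menten_rate(substrate_conc, vmax, km):
    """rate = Vmax [S] / ([S] + Km)."""
    return vmax * substrate_conc / (substrate_conc + km)


# Arbitrary illustrative inputs (assumptions, not values from the problems).
print(probability_all_correct(0.5, 3))                   # three operations, each 50% reliable
print(fraction_bound(3.0, 1.0))                          # protein at three times Kd
print(michaelis_menten_rate(3.0, vmax=10.0, km=1.0))     # substrate at three times Km
```

Note that the binding and Michaelis-Menten expressions have the same hyperbolic form, which is why plotting either against the logarithm of concentration gives the symmetrical S-shaped curve described in Problem 3-14.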
DNA, Chromosomes, and Genomes
Life depends on the ability of cells to store, retrieve, and translate the genetic instructions required to make and maintain a living organism. This hereditary information is passed on from a cell to its daughter cells at cell division, and from one generation of an organism to the next through the organism's reproductive cells. These instructions are stored within every living cell as its genes, the information-containing elements that determine the characteristics of a species as a whole and of the individuals within it.
As soon as genetics emerged as a science at the beginning of the twentieth century, scientists became intrigued by the chemical structure of genes. The information in genes is copied and transmitted from cell to daughter cell millions of times during the life of a multicellular organism, and it survives the process essentially unchanged. What form of molecule could be capable of such accurate and almost unlimited replication and also be able to direct the development of an organism and the daily life of a cell? What kind of instructions does the genetic information contain? How can the enormous amount of information required for the development and maintenance of an organism fit within the tiny space of a cell?
The answers to several of these questions began to emerge in the 1940s. At this time, researchers discovered, from studies in simple fungi, that genetic information consists primarily of instructions for making proteins. Proteins are the macromolecules that perform most cell functions: they serve as building blocks for cell structures and form the enzymes that catalyze the cell's chemical reactions (Chapter 3), they regulate gene expression (Chapter 7), and they enable cells to communicate with each other (Chapter 15) and to move (Chapter 16). The properties and functions of a cell are determined largely by the proteins that it is able to make.
With hindsight, it is hard to imagine what other type of instructions the genetic information could have contained. Painstaking observations of cells and embryos in the late 19th century had led to the recognition that the hereditary information is carried on chromosomes, threadlike structures in the nucleus of a eucaryotic cell that become visible by light microscopy as the cell begins to divide (Figure 4-1). Later, as biochemical analysis became possible, chromosomes were found to consist of both deoxyribonucleic acid (DNA) and protein. For many decades, the DNA was thought to be merely a structural element. However, the other crucial advance made in the 1940s was the identification of DNA as the likely carrier of genetic information. This breakthrough in our understanding of cells came from studies
Figure 4-1 Chromosomes in cells. (A) Two adjacent plant cells photographed through a light microscope. The DNA has been stained with a fluorescent dye (DAPI) that binds to it. The DNA is present in chromosomes, which become visible as distinct structures in the light microscope only when they become compact, sausage-shaped structures in preparation for cell division, as shown on the left (dividing cell). The cell on the right (nondividing cell), which is not dividing, contains identical chromosomes, but they cannot be clearly distinguished in the light microscope at this phase in the cell's life cycle, because they are in a more extended conformation. (B) Schematic diagram of the outlines of the two cells along with their chromosomes. (A, courtesy of Peter Shaw.)

In This Chapter
THE STRUCTURE AND FUNCTION OF DNA
CHROMOSOMAL DNA AND ITS PACKAGING IN THE CHROMATIN FIBER
THE REGULATION OF CHROMATIN STRUCTURE
THE GLOBAL STRUCTURE OF CHROMOSOMES
HOW GENOMES EVOLVE
of inheritance in bacteria (Figure 4-2). But as the 1950s began, both how proteins could be specified by instructions in the DNA and how this information might be copied for transmission from cell to cell seemed completely mysterious. The mystery was suddenly solved in 1953, when the structure of DNA was correctly predicted by James Watson and Francis Crick. As outlined in Chapter 1, the double-helical structure of DNA immediately solved the problem of how the information in this molecule might be copied, or replicated. It also provided the first clues as to how a molecule of DNA might use the sequence of its subunits to encode the instructions for making proteins. Today, the fact that DNA is the genetic material is so fundamental to biological thought that it is difficult to appreciate the enormous intellectual gap that was filled.
In this chapter we begin by describing the structure of DNA. We see how, despite its chemical simplicity, the structure and chemical properties of DNA make it ideally suited as the raw material of genes. We then consider how the many proteins in chromosomes arrange and package this DNA. The packing has to be done in an orderly fashion so that the chromosomes can be replicated and apportioned correctly between the two daughter cells at each cell division. It must also allow access to chromosomal DNA for the enzymes that repair it when it is damaged and for the specialized proteins that direct the expression of its many genes. We shall also see how the packaging of DNA differs along the length of each chromosome in eucaryotes, and how it can store a valuable record of the cell's developmental history.
In the past two decades, there has been a revolution in our ability to determine the exact sequence of subunits in DNA molecules. As a result, we now know the order of the 3 billion DNA subunits that provide the information for producing a human adult from a fertilized egg, as well as the DNA sequences of thousands of other organisms.
Detailed analyses of these sequences have provided exciting insights into the process of evolution, and it is with this subject that the chapter ends.
This is the first of four chapters that deal with basic genetic mechanisms: the ways in which the cell maintains, replicates, expresses, and occasionally improves the genetic information carried in its DNA. This chapter presents a broad overview of DNA and how it is packaged into chromosomes. In the following chapter (Chapter 5) we discuss the mechanisms by which the cell accurately replicates and repairs DNA; we also describe how DNA sequences can be
Figure 4-2 The first experimental demonstration that DNA is the genetic material. These experiments, carried out in the 1940s, showed that adding purified DNA to a bacterium changed its properties and that this change was faithfully passed on to subsequent generations. Two closely related strains of the bacterium Streptococcus pneumoniae differ from each other in both their appearance under the microscope and their pathogenicity. One strain appears smooth (S) and causes death when injected into mice, and the other appears rough (R) and is nonlethal. (A) An initial experiment shows that a substance present in the S strain can change (or transform) the R strain into the S strain and that this change is inherited by subsequent generations of bacteria. Live R strain cells were grown in the presence of either heat-killed S strain cells or a cell-free extract of S strain cells; some R strain cells were transformed to S strain cells, whose daughters are pathogenic and cause pneumonia. Conclusion: molecules that can carry heritable information are present in S strain cells. (B) This experiment, in which the R strain has been incubated with various classes of biological molecules purified from the S strain, identifies the substance as DNA. Of the classes tested (protein, RNA, DNA, lipid, and carbohydrate), only DNA transformed R strain cells into S strain cells. Conclusion: the molecule that carries the heritable information is DNA.
rearranged through the process of genetic recombination. Gene expression, the process through which the information encoded in DNA is interpreted by the cell to guide the synthesis of proteins, is the main topic of Chapter 6. In Chapter 7, we describe how this gene expression is controlled by the cell to ensure that each of the many thousands of proteins and RNA molecules encrypted in its DNA is manufactured only at the proper time and place in the life of the cell.
THE STRUCTURE AND FUNCTION OF DNA
Biologists in the 1940s had difficulty in conceiving how DNA could be the genetic material because of the apparent simplicity of its chemistry. DNA was known to be a long polymer composed of only four types of subunits, which resemble one another chemically. Early in the 1950s, DNA was examined by x-ray diffraction analysis, a technique for determining the three-dimensional atomic structure of a molecule (discussed in Chapter 8). The early x-ray diffraction results indicated that DNA was composed of two strands of the polymer wound into a helix. The observation that DNA was double-stranded was of crucial significance and provided one of the major clues that led to the Watson-Crick model for DNA structure. But only when this model was proposed in 1953 did DNA's potential for replication and information encoding become apparent. In this section we examine the structure of the DNA molecule and explain in general terms how it is able to store hereditary information.
A DNA Molecule Consists of Two Complementary Chains of Nucleotides
A deoxyribonucleic acid (DNA) molecule consists of two long polynucleotide chains composed of four types of nucleotide subunits. Each of these chains is known as a DNA chain, or a DNA strand. Hydrogen bonds between the base portions of the nucleotides hold the two chains together (Figure 4-3). As we saw in Chapter 2 (Panel 2-6, pp. 116-117), nucleotides are composed of a five-carbon sugar to which are attached one or more phosphate groups and a nitrogen-containing base. In the case of the nucleotides in DNA, the sugar is deoxyribose attached to a single phosphate group (hence the name deoxyribonucleic acid), and the base may be either adenine (A), cytosine (C), guanine (G), or thymine (T). The nucleotides are covalently linked together in a chain through the sugars and phosphates, which thus form a "backbone" of alternating sugar-phosphate-sugar-phosphate. Because only the base differs in each of the four types of subunits, each polynucleotide chain in DNA is analogous to a necklace (the backbone) strung with four types of beads (the four bases A, C, G, and T). These same symbols (A, C, G, and T) are also commonly used to denote the four different nucleotides, that is, the bases with their attached sugar and phosphate groups.
The way in which the nucleotide subunits are linked together gives a DNA strand a chemical polarity. If we think of each sugar as a block with a protruding knob (the 5' phosphate) on one side and a hole (the 3' hydroxyl) on the other (see Figure 4-3), each completed chain, formed by interlocking knobs with holes, will have all of its subunits lined up in the same orientation. Moreover, the two ends of the chain will be easily distinguishable, as one has a hole (the 3' hydroxyl) and the other a knob (the 5' phosphate) at its terminus. This polarity in a DNA chain is indicated by referring to one end as the 3' end and the other as the 5' end.
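The necklace analogy and the idea of strand polarity map naturally onto a string of four letters with a fixed reading direction. The Python below is a minimal sketch under that assumption; the `Strand` class, its method names, and the example sequence are hypothetical illustrations, and the choice to store the sequence from the 5' end to the 3' end is simply a common writing convention adopted here.

```python
from dataclasses import dataclass

# The four DNA bases named in the text.
BASES = frozenset("ACGT")


@dataclass(frozen=True)
class Strand:
    """One DNA strand; `sequence` lists its bases from the 5' end to the 3' end."""

    sequence: str

    def __post_init__(self):
        # Reject empty strands and any letter that is not one of the four bases.
        if not self.sequence:
            raise ValueError("a strand needs at least one nucleotide")
        unknown = set(self.sequence) - BASES
        if unknown:
            raise ValueError(f"not DNA bases: {sorted(unknown)}")

    @property
    def five_prime_base(self):
        """Base at the end carrying the free 5' phosphate (the 'knob')."""
        return self.sequence[0]

    @property
    def three_prime_base(self):
        """Base at the end carrying the free 3' hydroxyl (the 'hole')."""
        return self.sequence[-1]

    def read_3_to_5(self):
        """The same strand's bases listed in the opposite direction."""
        return self.sequence[::-1]


strand = Strand("GATTC")  # arbitrary example sequence
print(strand.five_prime_base, strand.three_prime_base)
print(strand.read_3_to_5())
```

Because the two ends are chemically different, a strand written GATTC from 5' to 3' is not the same molecule as one written CTTAG from 5' to 3'; in this sketch the two compare as unequal `Strand` objects.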
The three-dimensional structure of DNA, the double helix, arises from the chemical and structural features of its two polynucleotide chains. Because these two chains are held together by hydrogen bonding between the bases on the different strands, all the bases are on the inside of the double helix, and the sugar-phosphate backbones are on the outside (see Figure 4-3). In each case, a bulkier two-ring base (a purine; see Panel 2-6, pp. 116-117) is paired with a single-ring base (a pyrimidine); A always pairs with T, and G with C (Figure
Chapter 4: DNA, Chromosomes, and Genomes
Figure 4-3 DNA and its building blocks. DNA is made of four types of nucleotides, which are linked covalently into a polynucleotide chain (a DNA strand) with a sugar-phosphate backbone from which the bases (A, C, G, and T) extend. A DNA molecule is composed of two DNA strands held together by hydrogen bonds between the paired bases. The arrowheads at the ends of the DNA strands indicate the polarities of the two strands, which run antiparallel to each other in the DNA molecule. In the diagram at the bottom left of the figure, the DNA molecule is shown straightened out; in reality, it is twisted into a double helix, as shown on the right. For details, see Figure 4-5.
4-4). This complementary base-pairing enables the base pairs to be packed in the energetically most favorable arrangement in the interior of the double helix. In this arrangement, each base pair is of similar width, thus holding the sugar-phosphate backbones an equal distance apart along the DNA molecule. To maximize the efficiency of base-pair packing, the two sugar-phosphate backbones
Figure 4-4 Complementary base pairs in the DNA double helix. The shapes and chemical structure of the bases allow hydrogen bonds to form efficiently only between A and T and between G and C, where atoms that are able to form hydrogen bonds (see Panel 2-3, pp. 110-111) can be brought close together without distorting the double helix. As indicated, two hydrogen bonds form between A and T, while three form between G and C. The bases can pair in this way only if the two polynucleotide chains that contain them are antiparallel to each other.
Figure 4-5 The DNA double helix. (A) A space-filling model of 1.5 turns of the DNA double helix. Each turn of DNA is made up of 10.4 nucleotide pairs, and the center-to-center distance between adjacent nucleotide pairs is 0.34 nm. The coiling of the two strands around each other creates two grooves in the double helix: the wider groove is called the major groove, and the smaller the minor groove. (B) A short section of the double helix viewed from its side, showing four base pairs. The nucleotides are linked together covalently by phosphodiester bonds that join the 3'-hydroxyl (-OH) group of one sugar to the 5'-hydroxyl group of the next sugar. Thus, each polynucleotide strand has a chemical polarity; that is, its two ends are chemically different. The 5' end of the DNA polymer is by convention often illustrated carrying a phosphate group, while the 3' end is shown with a hydroxyl.
wind around each other to form a double helix, with one complete turn about every ten base pairs (Figure 4-5). The members of each base pair can fit together within the double helix only if the two strands of the helix are antiparallel, that is, only if the polarity of one strand is oriented opposite to that of the other strand (see Figures 4-3 and 4-4). A consequence of these base-pairing requirements is that each strand of a DNA molecule contains a sequence of nucleotides that is exactly complementary to the nucleotide sequence of its partner strand.
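The pairing rules and the antiparallel arrangement described above can be made concrete with a short sketch. It is only an illustration: the sequence is invented, the names `PAIR`, `H_BONDS`, `partner_strand`, and `hydrogen_bond_count` are hypothetical, and the hydrogen-bond counts are the ones given in the Figure 4-4 caption (two for an A-T pair, three for a G-C pair).

```python
# Base-pairing rules from the text: A pairs with T, and G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds per base pair, as stated in the Figure 4-4 caption.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def partner_strand(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    Because the two strands are antiparallel, the partner is obtained by
    reading the input backward and replacing each base with its complement.
    """
    return "".join(PAIR[base] for base in reversed(strand_5_to_3))


def hydrogen_bond_count(strand_5_to_3: str) -> int:
    """Total number of hydrogen bonds between a strand and its partner."""
    return sum(H_BONDS[base] for base in strand_5_to_3)


strand = "ATGCCG"                   # hypothetical sequence, written 5' to 3'
partner = partner_strand(strand)
print(partner)                      # CGGCAT
print(partner_strand(partner))      # ATGCCG (the original strand again)
print(hydrogen_bond_count(strand))  # 16
```

Applying `partner_strand` twice returns the starting sequence, which is one way of stating that each strand fully specifies the other.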
The Structure of DNA Provides a Mechanism for Heredity

Genes carry biological information that must be copied accurately for transmission to the next generation each time a cell divides to form two daughter cells. Two central biological questions arise from these requirements: how can the information for specifying an organism be carried in chemical form, and how is it accurately copied? The discovery of the structure of the DNA double helix was a landmark in twentieth-century biology because it immediately suggested answers to both questions, thereby providing a molecular explanation for the problem of heredity. We discuss these answers briefly in this section, and we shall examine them in much more detail in subsequent chapters.

DNA encodes information through the order, or sequence, of the nucleotides along each strand. Each base (A, C, T, or G) can be considered as a letter in a four-letter alphabet that spells out biological messages in the chemical structure of the DNA. As we saw in Chapter 1, organisms differ from one another because their respective DNA molecules have different nucleotide sequences and, consequently, carry different biological messages. But how is the nucleotide alphabet used to make messages, and what do they spell out?

As discussed above, it was known well before the structure of DNA was determined that genes contain the instructions for producing proteins. The DNA messages must therefore somehow encode proteins (Figure 4-6). This relationship immediately makes the problem easier to understand. As discussed in Chapter 3, the properties of a protein, which are responsible for its biological function, are determined by its three-dimensional structure. This structure is determined in turn by the linear sequence of the amino acids of which it is composed. The linear sequence of nucleotides in a gene must therefore somehow spell out the linear sequence of amino acids in a protein.
The exact correspondence between the four-letter nucleotide alphabet of DNA and the twenty-letter amino acid alphabet of proteins (the genetic code) is not obvious from the DNA structure, and it took over a decade after the discovery of the double helix
Figure 4-6 The relationship between genetic information carried in DNA and proteins (discussed in Chapter 1).
before it was worked out. In Chapter 6 we will describe this code in detail in the course of elaborating the process, known as gene expression, through which a cell converts the nucleotide sequence of a gene first into the nucleotide sequence of an RNA molecule, and then into the amino acid sequence of a protein.

The complete set of information in an organism's DNA is called its genome, and it carries the information for all the proteins and RNA molecules that the organism will ever synthesize. (The term genome is also used to describe the DNA that carries this information.) The amount of information contained in genomes is staggering: for example, a typical human diploid cell contains 2 meters of DNA double helix. Written out in the four-letter nucleotide alphabet, the nucleotide sequence of a very small human gene occupies a quarter of a page of text (Figure 4-7), while the complete sequence of nucleotides in the human genome would fill more than a thousand books the size of this one. In addition to other critical information, it carries the instructions for roughly 24,000 distinct proteins.

At each cell division, the cell must copy its genome to pass it to both daughter cells. The discovery of the structure of DNA also revealed the principle that makes this copying possible: because each strand of DNA contains a sequence of nucleotides that is exactly complementary to the nucleotide sequence of its partner strand, each strand can act as a template, or mold, for the synthesis of a new complementary strand. In other words, if we designate the two DNA strands as S and S', strand S can serve as a template for making a new strand S', while strand S' can serve as a template for making a new strand S (Figure 4-8).
Thus, the genetic information in DNA can be accurately copied by the beautifully simple process in which strand S separates from strand S', and each separated strand then serves as a template for the production of a new complementary partner strand that is identical to its former partner. The ability of each strand of a DNA molecule to act as a template for producing a complementary strand enables a cell to copy, or replicate, its genome before passing it on to its descendants. In the next chapter we shall describe the elegant machinery the cell uses to perform this enormous task.
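The template principle can be sketched in a few lines. This is a toy model of the logic only, not of the cellular machinery described in the next chapter; the sequence is invented, and the function names `synthesize_partner` and `replicate` are hypothetical.

```python
# Base-pairing rules: A pairs with T, and G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_partner(template: str) -> str:
    """Build the complementary strand for a template written 5' to 3'.

    The result is also written 5' to 3', so the template is read backward
    (the strands are antiparallel) while each base is complemented.
    """
    return "".join(PAIR[base] for base in reversed(template))


def replicate(helix: tuple[str, str]) -> tuple[tuple[str, str], tuple[str, str]]:
    """Separate strands S and S' and use each as a template.

    Each daughter helix keeps one parental strand and gains one new strand.
    """
    s, s_prime = helix
    daughter_1 = (s, synthesize_partner(s))              # old S with a new S'
    daughter_2 = (synthesize_partner(s_prime), s_prime)  # new S with old S'
    return daughter_1, daughter_2


s = "GATTACA"                        # hypothetical strand S, written 5' to 3'
parent = (s, synthesize_partner(s))  # the parental helix (S, S')
daughter_1, daughter_2 = replicate(parent)

print(parent)                                        # ('GATTACA', 'TGTAATC')
print(daughter_1 == parent and daughter_2 == parent)  # True
```

Both daughter helices carry the same pair of sequences as the parent, which is the sense in which the copy is exact.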
Figure 4-7 The nucleotide sequence of the human β-globin gene. By convention, a nucleotide sequence is written from its 5' end to its 3' end, and it should be read from left to right in successive lines down the page as though it were normal English text. This gene carries the information for the amino acid sequence of one of the two types of subunits of the hemoglobin molecule, the protein that carries oxygen in the blood. A different gene, the α-globin gene, carries the information for the other type of hemoglobin subunit (a hemoglobin molecule has four subunits, two of each type). Only one of the two strands of the DNA double helix containing the β-globin gene is shown; the other strand has the exact complementary sequence. The DNA sequences highlighted in yellow show the three regions of the gene that specify the amino acid sequence for the β-globin protein. We shall see in Chapter 6 how the cell splices these three sequences together at the level of messenger RNA in order to synthesize a full-length β-globin protein.

In Eucaryotes, DNA Is Enclosed in a Cell Nucleus

As described in Chapter 1, nearly all the DNA in a eucaryotic cell is sequestered in a nucleus, which in many cells occupies about 10% of the total cell volume. This compartment is delimited by a nuclear envelope formed by two concentric lipid bilayer membranes (Figure 4-9). These membranes are punctured at intervals by large nuclear pores, which transport molecules between the nucleus and the cytosol. The nuclear envelope is directly connected to the extensive membranes of the endoplasmic reticulum, which extend out from it into the cytoplasm. And it is mechanically supported by a network of intermediate filaments called the nuclear lamina, which forms a thin sheetlike meshwork just beneath the inner nuclear membrane (see Figure 4-9B). The nuclear envelope allows the many proteins that act on DNA to be concentrated where they are needed in the cell, and, as we see in subsequent
Figure 4-8 DNA as a template for its own duplication. As the nucleotide A successfully pairs only with T, and G with C, each strand of DNA can act as a template to specify the sequence of nucleotides in its complementary strand. In this way, double-helical DNA can be copied precisely, with each parental DNA helix producing two identical daughter DNA helices.
chapters, it also keeps nuclear and cytosolic enzymes separate, a feature that is crucial for the proper functioning of eucaryotic cells. Compartmentalization, of which the nucleus is an example, is an important principle of biology; it serves to establish an environment in which biochemical reactions are facilitated by the high concentration of both substrates and the enzymes that act on them. Compartmentalization also prevents enzymes needed in one part of the cell from interfering with the orderly biochemical pathways in another.
Summary

Genetic information is carried in the linear sequence of nucleotides in DNA. Each molecule of DNA is a double helix formed from two complementary strands of nucleotides held together by hydrogen bonds between G-C and A-T base pairs. Duplication of the genetic information occurs by the use of one DNA strand as a template for the formation of a complementary strand. The genetic information stored in an organism's DNA contains the instructions for all the proteins the organism will ever synthesize and is said to comprise its genome. In eucaryotes, DNA is contained in the cell nucleus, a large membrane-bound compartment.
Figure 4-9 A cross-sectional view of a typical cell nucleus. (A) Electron micrograph of a thin section through the nucleus of a human fibroblast. (B) Schematic drawing, showing that the nuclear envelope consists of two membranes, the outer one being continuous with the endoplasmic reticulum membrane (see also Figure 12-8). The space inside the endoplasmic reticulum (the ER lumen) is colored yellow; it is continuous with the space between the two nuclear membranes. The lipid bilayers of the inner and outer nuclear membranes are connected at each nuclear pore. A sheet-like network of intermediate filaments (brown) inside the nucleus provides mechanical support for the nuclear envelope, forming a special supporting structure called the nuclear lamina (for details, see Chapter 12). The heterochromatin near the lamina contains specially condensed regions of DNA that will be discussed later.
CHROMOSOMAL DNA AND ITS PACKAGING IN THE CHROMATIN FIBER

The most important function of DNA is to carry genes, the information that specifies all the proteins and RNA molecules that make up an organism, including information about when, in what types of cells, and in what quantity each protein is to be made. The genomes of eucaryotes are divided up into chromosomes, and in this section we see how genes are typically arranged on each chromosome. In addition, we describe the specialized DNA sequences that are required for a chromosome to be accurately duplicated and passed on from one generation to the next.

We also confront the serious challenge of DNA packaging. If the double helices comprising all 46 chromosomes in a human cell could be laid end-to-end, they would reach approximately 2 meters; yet the nucleus, which contains the DNA, is only about 6 µm in diameter. This is geometrically equivalent to packing 40 km (24 miles) of extremely fine thread into a tennis ball! The complex task of packaging DNA is accomplished by specialized proteins that bind to and fold the DNA, generating a series of coils and loops that provide increasingly higher levels of organization, preventing the DNA from becoming an unmanageable tangle. Amazingly, although the DNA is very tightly folded, it is compacted in a way that keeps it available to the many enzymes in the cell that replicate it, repair it, and use its genes to produce RNA molecules and proteins.
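The figure of roughly 2 meters can be checked with simple arithmetic. The sketch below is a back-of-the-envelope estimate and rests on two inputs: the genome size of about 3.2 × 10^9 nucleotide pairs quoted later in this chapter, and a spacing of 0.34 nm between adjacent nucleotide pairs, the accepted value for the double helix. The variable names are illustrative.

```python
RISE_PER_NUCLEOTIDE_PAIR_NM = 0.34  # spacing between adjacent pairs along the helix
HAPLOID_GENOME_PAIRS = 3.2e9        # human genome size quoted in this chapter
NUCLEUS_DIAMETER_UM = 6.0           # approximate nuclear diameter given in the text

# A diploid cell carries two copies of the genome (46 chromosomes in all).
diploid_pairs = 2 * HAPLOID_GENOME_PAIRS

# Convert nanometers to meters and micrometers to meters before comparing.
total_length_m = diploid_pairs * RISE_PER_NUCLEOTIDE_PAIR_NM * 1e-9
nucleus_diameter_m = NUCLEUS_DIAMETER_UM * 1e-6

length_to_diameter = total_length_m / nucleus_diameter_m

print(f"DNA per diploid cell: {total_length_m:.2f} m")
print(f"Length relative to nuclear diameter: about {length_to_diameter:,.0f} to 1")
```

The estimate comes out slightly above 2 meters, and the DNA is several hundred thousand times longer than the nucleus is wide, which is why folding by specialized proteins is unavoidable.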
Eucaryotic DNA Is Packaged into a Set of Chromosomes

In eucaryotes, the DNA in the nucleus is divided between a set of different chromosomes. For example, the human genome (approximately 3.2 × 10^9 nucleotides) is distributed over 24 different chromosomes. Each chromosome consists of a single, enormously long linear DNA molecule associated with proteins that fold and pack the fine DNA thread into a more compact structure. The complex of DNA and protein is called chromatin (from the Greek chroma, "color," because of its staining properties). In addition to the proteins involved in packaging the DNA, chromosomes are also associated with many proteins and RNA molecules required for the processes of gene expression, DNA replication, and DNA repair.

Bacteria carry their genes on a single DNA molecule, which is often circular (see Figure 1-29). This DNA is associated with proteins that package and condense the DNA, but they are different from the proteins that perform these functions in eucaryotes. Although often called the bacterial "chromosome," it does not have the same structure as eucaryotic chromosomes, and less is known about how the bacterial DNA is packaged. Therefore, our discussion of chromosome structure will focus almost entirely on eucaryotic chromosomes.

With the exception of the germ cells (discussed in Chapter 21) and a few highly specialized cell types that cannot multiply and lack DNA altogether (for example, red blood cells), each human cell contains two copies of each chromosome, one inherited from the mother and one from the father. The maternal and paternal chromosomes of a pair are called homologous chromosomes (homologs). The only nonhomologous chromosome pairs are the sex chromosomes in males, where a Y chromosome is inherited from the father and an X chromosome from the mother. Thus, each human cell contains a total of 46 chromosomes: 22 pairs common to both males and females, plus two so-called sex chromosomes (X and Y in males, two Xs in females).
DNA hybridization is a technique in which a labeled nucleic acid strand serves as a "probe" that localizes a complementary strand, as will be described in detail in Chapter 8. This technique can be used to distinguish these human chromosomes by "painting" each one a different color (Figure 4-10). Chromosome painting is typically done at the stage in the cell cycle called mitosis, when chromosomes are especially compacted and easy to visualize (see below). Another more traditional way to distinguish one chromosome from another
is to stain them with dyes that reveal a pattern of bands along each mitotic chromosome (Figure 4-11). The structural bases for these banding patterns are not well understood. Nevertheless, the pattern of bands on each type of chromosome is unique, and it is these patterns that initially allowed each human chromosome to be identified and numbered.

The display of the 46 human chromosomes at mitosis is called the human karyotype. If parts of chromosomes are lost or are switched between chromosomes, these changes can be detected by changes in the banding patterns or by changes in the pattern of chromosome painting (Figure 4-12). Cytogeneticists use these alterations to detect chromosome abnormalities that are associated with inherited defects, as well as to characterize cancers that are associated with specific chromosome rearrangements in somatic cells (discussed in Chapter 20).
Figure 4-10 The complete set of human chromosomes. These chromosomes, from a male, were isolated from a cell undergoing nuclear division (mitosis) and are therefore highly compacted. Each chromosome has been "painted" a different color to permit its unambiguous identification under the light microscope. Chromosome painting is performed by exposing the chromosomes to a collection of human DNA molecules that have been coupled to a combination of fluorescent dyes. For example, DNA molecules derived from chromosome 1 are labeled with one specific dye combination, those from chromosome 2 with another, and so on. Because the labeled DNA can form base pairs, or hybridize, only to the chromosome from which it was derived (discussed in Chapter 8), each chromosome is differently labeled. For such experiments, the chromosomes are subjected to treatments that separate the double-helical DNA into individual strands, designed to permit base-pairing with the single-stranded labeled DNA while keeping the chromosome structure relatively intact. (A) The chromosomes visualized as they originally spilled from the lysed cell. (B) The same chromosomes artificially lined up in their numerical order. This arrangement of the full chromosome set is called a karyotype. (From E. Schrock et al., Science 273:494-497, 1996. With permission from AAAS.)
Figure 4-11 The banding patterns of human chromosomes. Chromosomes 1-22 are numbered in approximate order of size. A typical human somatic (non-germ-line) cell contains two of each of these chromosomes, plus two sex chromosomes: two X chromosomes in a female, one X and one Y chromosome in a male. The chromosomes used to make these maps were stained at an early stage in mitosis, when the chromosomes are incompletely compacted. The horizontal red line represents the position of the centromere (see Figure 4-21), which appears as a constriction on mitotic chromosomes. The red knobs on chromosomes 13, 14, 15, 21, and 22 indicate the positions of genes that code for the large ribosomal RNAs (discussed in Chapter 6). These patterns are obtained by staining chromosomes with Giemsa stain, and they can be observed under the light microscope. (For micrographs, see Figure 21-18; adapted from U. Franke, Cytogenet. Cell Genet. 31:24-32, 1981. With
Figure 4-12 (A) [...] ataxia, a disease characterized by progressive deterioration of motor skills. The patient has a normal pair of chromosome 4s (left-hand pair), but one [...] aberrant chromosome 12 (red bracket) was deduced, from its pattern of bands, as a copy of part of chromosome 4 that had become attached to [...] (B) [...] pairs, "painted" red for chromosome 4 DNA and blue for chromosome 12 DNA. The two techniques give rise to the same conclusion regarding the nature of the aberrant chromosome 12, but chromosome painting provides better resolution, allowing the clear identification of even short pieces of chromosomes that have become translocated. However, Giemsa staining is easier to perform. (Adapted from E. Schrock et al., Science 273:494-497, 1996. With permission from AAAS.)
Chromosomes Contain Long Strings of Genes

Chromosomes carry genes, the functional units of heredity. A gene is usually defined as a segment of DNA that contains the instructions for making a particular protein (or a set of closely related proteins). Although this definition holds for the majority of genes, several percent of genes produce an RNA molecule, instead of a protein, as their final product. Like proteins, these RNA molecules perform a diverse set of structural and catalytic functions in the cell, and we discuss them in detail in subsequent chapters.

As might be expected, some correlation exists between the complexity of an organism and the number of genes in its genome (see Table 1-1, p. 18). For example, some simple bacteria have only 500 genes, compared to about 25,000 for humans. Bacteria and some single-celled eucaryotes, such as yeast, have especially concise genomes; the complete nucleotide sequence of their genomes reveals that the DNA molecules that make up their chromosomes are little more than strings of closely packed genes (Figure 4-13). However, chromosomes from many eucaryotes (including humans) contain, in addition to genes, a large excess of interspersed DNA that does not seem to carry critical information. Sometimes called "junk DNA" to signify that its usefulness to the cell has not been demonstrated, the particular nucleotide sequence of most of this DNA may not be important. However, some of this DNA is crucial for the proper expression of certain genes, as we discuss elsewhere.

Because of differences in the amount of DNA interspersed between genes, genome sizes can vary widely (see Figure 1-37). For example, the human genome is 200 times larger than that of the yeast S. cerevisiae, but 30 times smaller than that of some plants and amphibians and 200 times smaller than that of a species of amoeba.
Moreover, because of differences in the amount of excess DNA, the genomes of similar organisms (bony fish, for example) can vary several hundredfold in their DNA content, even though they contain roughly the same number of genes. Whatever the excess DNA may do, it seems clear that it is not a great handicap for a eucaryotic cell to carry a large amount of it.

How the genome is divided into chromosomes also differs from one eucaryotic species to the next. For example, compared with 46 for humans, somatic cells from a species of small deer contain only 6 chromosomes, while those from a species of carp contain over 100. Even closely related species with similar genome sizes can have very different numbers and sizes of chromosomes (Figure 4-14). Thus, there is no simple relationship between chromosome number,
Figure 4-13 The arrangement of genes in the genome of S. cerevisiae. S. cerevisiae is a budding yeast widely used for brewing and baking. The genome of this yeast cell is distributed over 16 chromosomes. A small region of one chromosome has been arbitrarily selected to show the high density of genes characteristic of this species. As indicated by the light red shading, some genes are transcribed from the lower strand, while others are transcribed from the upper strand. There are about 6300 genes in the complete genome, which contains somewhat more than 12 million nucleotide pairs. (For the closely packed genes of a bacterium whose genome is 4.6 million nucleotides long, see Figure 1-29.)
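The contrast in gene density between yeast and humans follows directly from the numbers quoted in this section. The sketch below uses rounded inputs only (about 6300 genes in about 12 million nucleotide pairs for S. cerevisiae; about 25,000 genes in about 3.2 × 10^9 nucleotide pairs for humans), so the outputs are order-of-magnitude estimates. The dictionary layout and names are illustrative.

```python
# Rounded figures taken from the text and the Figure 4-13 caption.
genomes = {
    "S. cerevisiae": {"genes": 6_300, "nucleotide_pairs": 12e6},
    "human": {"genes": 25_000, "nucleotide_pairs": 3.2e9},
}

# Average stretch of DNA available per gene in each genome.
spacing = {
    name: data["nucleotide_pairs"] / data["genes"]
    for name, data in genomes.items()
}

for name, pairs_per_gene in spacing.items():
    print(f"{name}: one gene per ~{pairs_per_gene:,.0f} nucleotide pairs")

# How much larger the human genome is, and how many more genes it holds.
size_ratio = genomes["human"]["nucleotide_pairs"] / genomes["S. cerevisiae"]["nucleotide_pairs"]
gene_ratio = genomes["human"]["genes"] / genomes["S. cerevisiae"]["genes"]

print(f"Genome size ratio (human/yeast): ~{size_ratio:.0f}")
print(f"Gene number ratio (human/yeast): ~{gene_ratio:.1f}")
```

With these inputs the human genome is a few hundred times larger than the yeast genome but holds only about four times as many genes, which is the arithmetic behind the statement that most of the difference lies in DNA interspersed between genes.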
species complexity, and total genome size. Rather, the genomes and chromosomes of modern-day species have each been shaped by a unique history of seemingly random genetic events, acted on by selection pressures over long evolutionary times.

Figure 4-14 Two closely related species of deer with very different chromosome numbers. In the evolution of the Indian muntjac, initially separate chromosomes fused, without having a major effect on the animal. These two species contain a similar number of genes. (Adapted from M.W. Strickberger, Evolution, 3rd ed. Sudbury, MA: Jones & Bartlett Publishers, 2000.)

The Nucleotide Sequence of the Human Genome Shows How Our Genes Are Arranged

In Chapter 1 we discussed, in general terms, how the information in DNA is read out and used, through RNA intermediates, to make proteins (see Figure 1-4). In 1999, it became possible for the first time to see exactly how genes are arranged along an entire vertebrate chromosome (Figure 4-15). Today, with the publication of the "first draft" of the entire human genome in 2001 and the "finished" human genome in 2004, the genetic information in all human chromosomes is available. The sheer quantity of information provided by the Human Genome Project is staggering (Figure 4-16 and Table 4-1). At its peak, the Project generated raw nucleotide sequences at a rate of 1000 nucleotides per second around the clock. It will be many decades before this information is fully analyzed, but it has already stimulated new experiments that have had major effects on the content of every chapter in this book.

Figure 4-15 The organization of genes on a human chromosome. (A) Chromosome 22, one of the smallest human chromosomes, contains 48 × 10^6 nucleotide pairs and makes up approximately 1.5% of the entire human genome. Most of the left arm of chromosome 22 consists of short repeated sequences of DNA that are packaged in a particularly compact form of chromatin (heterochromatin), which is discussed later in this chapter. (B) A tenfold expansion of a portion of chromosome 22, with about 40 genes indicated. Those in dark brown are known genes and those in red are predicted genes. (C) An expanded portion of (B) shows the entire length of several genes. (D) The intron-exon arrangement of a typical gene is shown after a further tenfold expansion. Each exon (red) codes for a portion of the protein, while the DNA sequence of the introns is relatively unimportant, as discussed in detail in Chapter 6. The human genome (3.2 × 10^9 nucleotide pairs) is the totality of genetic information belonging to our species. Almost all of this genome is distributed over the 22 autosomes and 2 sex chromosomes (see Figures 4-10 and 4-11) found within the nucleus. A minute fraction of the human genome (16,569 nucleotide pairs, in multiple copies per cell) is found in the mitochondria (introduced in Chapter 1, and discussed in detail in Chapter 14). The term human genome sequence refers to the complete nucleotide sequence of DNA in the 24 nuclear chromosomes and the mitochondria. Being diploid, a human somatic cell nucleus contains roughly twice the haploid amount of DNA, or 6.4 × 10^9 nucleotide pairs, when not duplicating its chromosomes in preparation for division. (Adapted from International Human Genome Sequencing Consortium, Nature 409:860-921, 2001. With permission from Macmillan Publishers Ltd.)

The first striking feature of the human genome is how little of it (only a few percent) codes for proteins (Figure 4-17). Much of the remaining chromosomal
Table 4-1 Some Vital Statistics for the Human Genome

DNA length                                                           3.2 × 10^9 nucleotide pairs*
Number of genes                                                      approximately 25,000
Largest gene                                                         2.4 × 10^6 nucleotide pairs
Mean gene size                                                       27,000 nucleotide pairs
Smallest number of exons per gene                                    1
Largest number of exons per gene                                     178
Mean number of exons per gene                                        10.4
Largest exon size                                                    17,106 nucleotide pairs
Mean exon size                                                       145 nucleotide pairs
Number of pseudogenes**                                              more than 20,000
Percentage of DNA sequence in exons (protein coding sequences)       1.5%
Percentage of DNA in other highly conserved sequences***             3.5%
Percentage of DNA in high-copy repetitive elements                   approximately 50%

* The sequence of 2.85 billion nucleotides is known precisely (error rate of only about one in 100,000 nucleotides). The remaining DNA primarily consists of short highly repeated sequences that are tandemly repeated, with repeat numbers differing from one individual to the next.
** A pseudogene is a nucleotide sequence of DNA closely resembling that of a functional gene, but containing numerous mutations that prevent its proper expression. Most pseudogenes arise from the duplication of a functional gene followed by the accumulation of damaging mutations in one copy.
*** Preserved functional regions; these include DNA encoding 5' and 3' UTRs (untranslated regions), structural and functional RNAs, and conserved protein-binding sites on the DNA.
'dlssalarecrltuaredde 'r{.yaleururr.rcslpur sualr elqenp^ ,,'lnoq8norql paralleJs dliualed,r,ra; aql pue :pepreosrpra^e Sulqlou,!1en1rp 1(,1unf,sp pateqlulun aqt ^q ol peJJeJer)JallnlJ palepruncre qonru luoqpzrue8to Jo eJuaplle ell]{ l}duelun 'ralaure lano trrtsrlenpplpul dUBq :eJrT/roteJe8r4ar/uoorpaqTa8ure8rno/t alquasar lsnf ^luo ol dn ppe lnq plnoMaua6stLltut sa)uanbes6utpo)aql .4.eru11sitem auros u1,, 'auroua8 Jno paqrJcsap Joleluaruruoc auo sy ferresrp ;o ]nq 'LU0€ roJpuatxe plno^^aua6 a6ela^e uV 'tu 0€ t r(ranaaua6 6urpor-ura1o.rd alels SurruJpleup uI aq ol suaas ueunq e af,npoJd ol pepeeu uollBruJoJur lPcrlrJc aql leql paleelar seq aruoua8 uerunq aq] Jo aouenbas apnoalcnu eql dlpurC e'e6elaneuo 'aq plnoMelaql'ale)s 'Irorlr saJuanbasy11q Lrolep8ar,uoq raldeq3 ur ssnrsrp slql lv '(Bu aq par)sur6uoueunq rno 1o 2 alrsaql 'e)uJVjo ratua) aql ssol)e q)lells aM'sauoua8 esrJuoJqlyv\ susruESro ur passarduoc aroru eJBsaJuanbesdroleln o1q6nouareJ'(sapur 6697Ila1er-urxordde) -3er asaq] 'palradxa eq plnom sy 'srrcd epqoalf,nu Jo spuesnoql Jo sue] JaAolno Lul 002€pualxaplnoMeuoua6 ueunq peards aru eue8 pcldril e JoJ seJuenbas ,trolep8eJ er{l 'suerunq q 'lleJ Jo eddr aql'(v) ur se'eprloal)nuq)pa uaaMlaq JadoJdeq] ut I1uo puu 'la^el alerrdordde aql lE pessaJdxa'arur1JedoJdaql te JJo areds uut I e qllM umplpl .euoua6 Jo uo paurnl sr eue8 aq] teq] Surrnsua JoJelqrsuodseJ eJe qJIqM 'sacuanbasy6161 uetunqaqtro e1er591-tr arnbrl holaynBat qlyv\ pelprJossp sr eue8 qope 'suoxe pue suoJlur o] uonrppe uI 'saruoso(uoJqcJraql ul vNo SurpocSouorlJe4 raqSrqqcnu aql JoJ se IIeM se '(seua8 upunq Jo ]eq] qlapue^ l-euo lnoqP) sauaSrraql Jo azrs rallerus qJnru aql JoJslunocce srql'suoJlur {rEI saruoua8 asrcuoc qlylr susrue8Jo ruor; saua8 .&rroferu aq] 'lseJ]uoc uI 'suoJ]urJo 3ur]srsuoc eue8 eq] Jo lsotu q]lm ;o 'suoJlul pue suoxe Surleuralp go Surns Suoye Jo lsrsuoc snqt seue8 ueurnq;o dlr -rofeu eqJ'(I-t elqeJ pue g1-7 arn8rgaas)suonrq palpc ere saua8ur seruenbes (Surpocuou) 
Suruarrratureq] lsuoxe pelleJ ale saJuenbasSurpoc eq] 'g JeldeqJ ur 'uralord JoJepol ]Bqt VNC Jo stuetu8es uoqs /i.la^ue Irelep ur pessnf,srpaq III ^ sy -lal eqt Surpocuou ldnJJalur ]Eql VNCI ;o seqcleJls3uo1yo stsrsuoc eua8 E ul VNe Sutureruar aq] Jo lsolN '(sueunq ut sprJp ourrup gtt lnoqe) azrs aBpJaAE Jo urel -ord e epoJua ol parrnbal eJEsJredeppoelonu 00tI lnoqp dpg 'uralord e sprce Jo ourue er{l Jo aruanbas ruauq eql JoJ uopeuJoJur aqt sepnoelJnu go acuanbas 'sJred appoelJnu 'eloqe passnJsrp Jeeurl slr ur serJJec eue8 e sV 000'LZ Jo Imrd,{r azs aua8 aBpJeAEa8rel aql sr eruoue8 ueunq aql eJnl€eJelqelou puooas V Jo 'sraldeqc relel ur lrc]ap ur vuawap a1qasod (v) -su0Jl esarfl ssncsrp aM'etup dteuorlnloura JaAoaruosoruoJqJ ar{] ur sa^lesueql pauasur dlenperS a^Eq lpql VNC Jo sacard alqou 'iloqs Jo dn apeu sl VNCI
'vNo :t raldeql sauouaDpuP'sauosouroJqf
902
.: :..:.,,,1:;,1$illJ
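The scale analogy of Figure 4-16 can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the drawing scale of 1 mm per nucleotide pair and uses the rounded values from Table 4-1 and the text; the variable and function names are hypothetical.

```python
# Rough check of the Figure 4-16 scale analogy (1 mm drawn per nucleotide pair).
# All names here are illustrative; the numbers are the rounded values quoted
# in Table 4-1 and in the surrounding text.
MM_PER_PAIR = 1.0          # drawing scale assumed from the figure caption
GENOME_PAIRS = 3.2e9       # DNA length (Table 4-1)
GENE_COUNT = 25_000        # approximate number of genes (Table 4-1)
MEAN_GENE_PAIRS = 27_000   # mean gene size (Table 4-1)
CODING_PAIRS = 1_300       # pairs needed to encode an average-size protein


def drawn_length_m(pairs: float) -> float:
    """Length in meters of `pairs` nucleotide pairs at the drawing scale."""
    return pairs * MM_PER_PAIR / 1000.0


genome_km = drawn_length_m(GENOME_PAIRS) / 1000.0           # whole genome
gene_spacing_m = drawn_length_m(GENOME_PAIRS / GENE_COUNT)  # one gene every ...
mean_gene_m = drawn_length_m(MEAN_GENE_PAIRS)               # average gene
coding_m = drawn_length_m(CODING_PAIRS)                     # its coding part

print(f"genome length : {genome_km:.0f} km")
print(f"gene spacing  : {gene_spacing_m:.0f} m")
print(f"average gene  : {mean_gene_m:.0f} m")
print(f"coding portion: {coding_m:.1f} m")
```

With these inputs the script gives about 3200 km, 128 m, 27 m, and 1.3 m, which agree with the rounded figures in the caption (3200 km, a gene every 130 m, a 30 m gene, and just over a meter of coding sequence).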
[Figure 4-17 diagram: a bar spanning 0 to 100 percent of the genome. REPEATED SEQUENCES comprise TRANSPOSONS (LINEs, SINEs, retroviral-like elements, DNA-only transposon "fossils"), simple sequence repeats, and segmental duplications. UNIQUE SEQUENCES comprise GENES (including protein-coding regions) and non-repetitive DNA that is neither in introns nor codons.]
Genome Comparisons Reveal Evolutionarily Conserved DNA Sequences
A major obstacle in interpreting the nucleotide sequences of human chromosomes is the fact that much of the sequence is probably unimportant. Moreover, the coding regions of the genome (the exons) are typically found in short segments (average size about 145 nucleotide pairs) floating in a sea of DNA whose exact nucleotide sequence is of little consequence. This arrangement makes it very difficult to identify all the exons in a stretch of DNA sequence. Even harder is the determination of where a gene begins and ends and exactly how many exons it spans. Accurate gene identification requires approaches that extract information from the inherently low signal-to-noise ratio of the human genome. We shall describe some of them in Chapter 8. Here we discuss only one general approach, which is based on the observation that sequences that have a function are relatively conserved during evolution, whereas those without a function are free to mutate randomly. The strategy is therefore to compare the human sequence with that of the corresponding regions of a related genome, such as that of the mouse. Humans and mice are thought to have diverged from a common mammalian ancestor about 80 × 10^6 years ago, which is long enough for the majority of nucleotides in their genomes to have been changed by random mutational events. Consequently, the only regions that will have remained closely similar in the two genomes are those in which mutations would have impaired function and put the animals carrying them at a disadvantage, resulting in their elimination from the population by natural selection. Such closely similar regions are known as conserved regions. The conserved regions include both functionally important exons and regulatory DNA sequences. In contrast, nonconserved regions represent DNA whose sequence is unlikely to be critical for function.
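The comparison strategy described above can be sketched as a sliding-window scan over two sequences that have already been aligned. This is a toy illustration under stated assumptions, not the method used in genome projects: the function names, the window size, the 90% threshold, and the short example sequences are all invented for the example, and real pipelines must first align the genomes and handle gaps and rearrangements.

```python
from typing import Iterator, List, Tuple


def window_identity(seq_a: str, seq_b: str, window: int, step: int) -> Iterator[Tuple[int, float]]:
    """Yield (start, fraction of identical positions) for each window.

    Both sequences are assumed to be pre-aligned and of equal length;
    '-' marks an alignment gap and never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    for start in range(0, len(seq_a) - window + 1, step):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        yield start, matches / window


def conserved_windows(seq_a: str, seq_b: str, window: int = 10,
                      step: int = 10, threshold: float = 0.9) -> List[int]:
    """Return start positions of windows whose identity meets the threshold."""
    return [start for start, frac in window_identity(seq_a, seq_b, window, step)
            if frac >= threshold]


# Hypothetical aligned fragments: the outer blocks are identical,
# the middle block has diverged completely.
human = "ATGGCCAAGT" + "TTACGCGTAA" + "ACCCGGGTTT"
mouse = "ATGGCCAAGT" + "GCTTAACGGC" + "ACCCGGGTTT"

print(conserved_windows(human, mouse))  # [0, 20]
```

The two flanking windows are reported as conserved, while the diverged middle window is not, mirroring the idea that only sequence under selection stays closely similar between species.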
The power of this method can be increased by comparing our genome with the genomes of additional animals whose genomes have been completely sequenced, including the rat, chicken, chimpanzee, and dog. By revealing in this way the results of a very long natural "experiment," lasting for hundreds of millions of years, such comparative DNA sequencing studies have highlighted the most interesting regions in these genomes. The comparisons reveal that roughly 5% of the human genome consists of "multi-species conserved sequences," as discussed in detail near the end of this chapter. Unexpectedly, only about one-third of these sequences code for proteins. Some of the conserved noncoding sequences correspond to clusters of protein-binding sites that are involved in gene regulation, while others produce RNA molecules that are not translated into protein. But the function of the majority of these sequences remains unknown. This unexpected discovery has led scientists to conclude that we understand much less about the cell biology of vertebrates than we had previously imagined. Certainly, there are enormous opportunities for new discoveries, and we should expect many surprises ahead.

Comparative studies have revealed not only that humans and other mammals share most of the same genes, but also that large blocks of our genomes contain these genes in the same order, a feature called conserved synteny. As a result, large blocks of our chromosomes can be recognized in other species. This allows the chromosome painting technique to be used to reconstruct the recent evolutionary history of human chromosomes (Figure 4-18).
Figure 4-17 Representation of the nucleotide sequence content of the completely sequenced human genome. The LINEs, SINEs, retroviral-like elements, and DNA-only transposons are mobile genetic elements that have multiplied in our genome by replicating themselves and inserting the new copies in different positions. These mobile genetic elements are discussed in Chapter 5 (see Table 5-3, p. 318). Simple sequence repeats are short nucleotide sequences (less than 14 nucleotide pairs) that are repeated again and again for long stretches. Segmental duplications are large blocks of the genome (1000-200,000 nucleotide pairs) that are present at two or more locations in the genome. The most highly repeated blocks of DNA in heterochromatin have not yet been completely sequenced; therefore about 10% of human DNA sequences are not represented in this diagram. (Data courtesy of E. Margulies.)
[Figure 4-18 diagram labels: (A) ANCESTOR CHROMOSOME, containing ancestor DNA of human chromosome 3 and ancestor DNA of human chromosome 21; 2 inversions; fusion. (B) lemur, orangutan, human; segments a, b, c, d.]
Figure 4-18 A proposed evolutionary history of human chromosome 3 and its relatives in other mammals. (A) The order of chromosome 3 segments hypothesized to be present on a chromosome of a mammalian ancestor is shown (yellow box). The minimum changes in this ancestral chromosome necessary to account for the appearance of each of the three modern chromosomes are indicated. (The present-day chromosomes of humans and African apes are identical at this resolution.) The small circles depicted in the modern chromosomes represent the positions of centromeres. A fission and inversion that leads to a change in chromosome organization is thought to occur once every 5-10 × 10^6 years in mammals. (B) Some of the chromosome painting experiments that led to the diagram in (A). Each image shows the chromosome most closely related to human chromosome 3, painted green by hybridization with different segments of DNA, lettered a, b, c, and d along the bottom of the figure. These letters correspond to the colored segments of the diagrams in (A), as indicated on the ancestral chromosome. (From S. Müller et al., Proc. Natl Acad. Sci. U.S.A. 97:206-211, 2000. With permission from National Academy of Sciences.)

Chromosomes Exist in Different States Throughout the Life of a Cell
We have seen how genes are arranged in chromosomes, but to form a functional chromosome, a DNA molecule must be able to do more than simply carry genes: it must be able to replicate, and the replicated copies must be separated and reliably partitioned into daughter cells at each cell division. This process occurs through an ordered series of stages, collectively known as the cell cycle, which provides for a temporal separation between the duplication of chromosomes and their segregation into two daughter cells. The cell cycle is briefly summarized in Figure 4-19, and it is discussed in detail in Chapter 17. Only certain parts of the cycle concern us in this chapter. During interphase chromosomes are replicated, and during mitosis they become highly condensed and then are separated and distributed to the two daughter nuclei.

The highly condensed chromosomes in a dividing cell are known as mitotic chromosomes (Figure 4-20A). This is the form in which chromosomes are most easily visualized; in fact, the images of chromosomes shown so far in the chapter are of chromosomes in mitosis. During cell division, this condensed state is important for the accurate separation of the duplicated chromosomes by the mitotic spindle, as discussed in Chapter 17.

During the portions of the cell cycle when the cell is not dividing, the chromosomes are extended and much of their chromatin exists as long, thin tangled threads in the nucleus so that individual chromosomes cannot be easily distinguished (Figure 4-20B). We shall refer to chromosomes in this extended state as interphase chromosomes. Since cells spend most of their time in interphase, and this is where their genetic information is being read out, chromosomes are of greatest interest to cell biologists when they are least visible.

Figure 4-19 A simplified view of the eucaryotic cell cycle. During interphase, the cell is actively expressing its genes and is therefore synthesizing proteins. Also, during interphase and before cell division, the DNA is replicated and each chromosome is duplicated to produce two closely paired daughter chromosomes (a cell with only two chromosomes is illustrated here). Once DNA replication is complete, the cell can enter M phase, when mitosis occurs and the nucleus is divided into two daughter nuclei. During this stage, the chromosomes condense, the nuclear envelope breaks down, and the mitotic spindle forms from microtubules and other proteins. The condensed mitotic chromosomes are captured by the mitotic spindle, and one complete set of chromosomes is then pulled to each end of the cell by separating each daughter chromosome pair. A nuclear envelope re-forms around each chromosome set, and in the final step of M phase, the cell divides to produce two daughter cells. Most of the time in the cell cycle is spent in interphase; M phase is brief in comparison, occupying only about an hour in many mammalian cells.
[Diagram labels: nuclear envelope; interphase chromosome; mitotic chromosome; MITOSIS; INTERPHASE, M PHASE, INTERPHASE.]

Figure 4-20 A comparison of extended interphase chromatin with the chromatin in a mitotic chromosome. (A) A scanning electron micrograph of a mitotic chromosome: a condensed duplicated chromosome in which the two new chromosomes are still linked together (see Figure 4-21). The constricted region indicates the position of the centromere, described in Figure 4-21. (B) An electron micrograph showing an enormous tangle of chromatin spilling out of a lysed interphase nucleus. Note the difference in scales. (A, courtesy of Terry D. Allen; B, courtesy of Victoria Foe.)
[Scale bars: (A) 1 µm; (B) 10 µm.]
Each DNA Molecule That Forms a Linear Chromosome Must Contain a Centromere, Two Telomeres, and Replication Origins
A chromosome operates as a distinct structural unit: for a copy to be passed on to each daughter cell at division, each chromosome must be able to replicate, and the newly replicated copies must subsequently be separated and partitioned correctly into the two daughter cells. These basic functions are controlled by three types of specialized nucleotide sequences in the DNA, each of which binds specific proteins that guide the machinery that replicates and segregates chromosomes (Figure 4-21). Experiments in yeasts, whose chromosomes are relatively small and easy to manipulate, have identified the minimal DNA sequence elements responsible for each of these functions.

One type of nucleotide sequence acts as a DNA replication origin, the location at which duplication of the DNA begins. Eucaryotic chromosomes contain many origins of replication to ensure that the entire chromosome can be replicated rapidly, as discussed in detail in Chapter 5. After replication, the two daughter chromosomes remain attached to one another and, as the cell cycle proceeds, are condensed further to produce mitotic chromosomes. The presence of a second specialized DNA sequence, called a centromere, allows one copy of each duplicated and condensed chromosome to be pulled into each daughter cell when a cell divides. A protein complex called a kinetochore forms at the centromere and attaches the duplicated chromosomes to the mitotic spindle, allowing them to be pulled apart (discussed in Chapter 17).

The third specialized DNA sequence forms telomeres, the ends of a chromosome. Telomeres contain repeated nucleotide sequences that enable the ends of chromosomes to be efficiently replicated. Telomeres also perform another function: the repeated telomere DNA sequences, together with the regions adjoining them, form structures that protect the end of the chromosome from being mistaken by the cell for a broken DNA molecule in need of repair. We discuss both this type of repair and the structure and function of telomeres in Chapter 5.

In yeast cells, the three types of sequences required to propagate a chromosome are relatively short (typically less than 1000 base pairs each) and therefore use only a tiny fraction of the information-carrying capacity of a chromosome. Although telomere sequences are fairly simple and short in all eucaryotes, the DNA sequences that form centromeres and replication origins in more complex organisms are much longer than their yeast counterparts. For example, experiments suggest that human centromeres contain up to 100,000 nucleotide pairs and may not require a stretch of DNA with a defined nucleotide sequence. Instead, as we shall discuss later in this chapter, they seem to consist of a large, regularly repeating protein-nucleic acid structure that can be inherited when a chromosome replicates.

[Figure 4-21 diagram labels: INTERPHASE, MITOSIS, INTERPHASE; telomere; replication origin; centromere; portion of mitotic spindle; duplicated chromosomes in separate cells.]
DNA Molecules Are Highly Condensed in Chromosomes
All eucaryotic organisms have special ways of packaging DNA into chromosomes. For example, if the 48 million nucleotide pairs of DNA in human chromosome 22 could be laid out as one long perfect double helix, the molecule would extend for about 1.5 cm if stretched out end to end. But chromosome 22 measures only about 2 µm in length in mitosis (see Figures 4-10 and 4-11), representing an end-to-end compaction ratio of nearly 10,000-fold. This remarkable feat of compression is performed by proteins that successively coil and fold the DNA into higher and higher levels of organization. Although much less condensed than mitotic chromosomes, the DNA of human interphase chromosomes is still tightly packed, with an overall compaction ratio of approximately 500-fold (the length of a chromosome's DNA helix divided by the end-to-end length of that chromosome).

In reading these sections it is important to keep in mind that chromosome structure is dynamic. We have seen that each chromosome condenses to an unusual degree in the M phase of the cell cycle. Much less visible, but of enormous interest and importance, specific regions of interphase chromosomes decondense as the cells gain access to specific DNA sequences for gene expression, DNA repair, and replication, and then recondense when these processes are completed. The packaging of chromosomes is therefore accomplished in a way that allows rapid localized, on-demand access to the DNA. In the next sections we discuss the specialized proteins that make this type of packaging possible.

Figure 4-21 The three DNA sequences required to produce a eucaryotic chromosome that can be replicated and then segregated at mitosis. Each chromosome has multiple origins of replication, one centromere, and two telomeres. Shown here is the sequence of events that a typical chromosome follows during the cell cycle. The DNA replicates in interphase, beginning at the origins of replication and proceeding bidirectionally from the origins across the chromosome. In M phase, the centromere attaches the duplicated chromosomes to the mitotic spindle so that one copy is distributed to each daughter cell during mitosis. The centromere also helps to hold the duplicated chromosomes together until they are ready to be moved apart. The telomeres form special caps at each chromosome end.
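The compaction ratios quoted for chromosome 22 follow from simple arithmetic. The sketch below is illustrative only; it assumes the standard rise of about 0.34 nm per nucleotide pair for a B-form double helix (a value not stated in this passage), and the variable names are hypothetical.

```python
# Back-of-the-envelope compaction ratios for human chromosome 22.
# Assumption (not from this passage): B-form DNA rises ~0.34 nm per nucleotide pair.
RISE_NM_PER_PAIR = 0.34
CHR22_PAIRS = 48e6             # nucleotide pairs in chromosome 22 (from the text)
MITOTIC_LENGTH_UM = 2.0        # approximate mitotic length (from the text)
INTERPHASE_COMPACTION = 500.0  # approximate interphase ratio (from the text)

helix_nm = CHR22_PAIRS * RISE_NM_PER_PAIR  # fully extended double helix
helix_um = helix_nm / 1000.0
helix_cm = helix_nm * 1e-7

# Compaction ratio = extended helix length / end-to-end chromosome length.
mitotic_compaction = helix_um / MITOTIC_LENGTH_UM

# Implied end-to-end length of the interphase chromosome.
interphase_length_um = helix_um / INTERPHASE_COMPACTION

print(f"extended helix    : {helix_cm:.2f} cm")
print(f"mitotic compaction: about {mitotic_compaction:.0f}-fold")
print(f"interphase length : about {interphase_length_um:.0f} µm")
```

Under this assumption the extended helix comes out at roughly 1.6 cm and the mitotic compaction at roughly 8000-fold, the same order of magnitude as the rounded values of 1.5 cm and nearly 10,000-fold given in the text.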
Nucleosomes Are a Basic Unit of Eucaryotic Chromosome Structure
The proteins that bind to the DNA to form eucaryotic chromosomes are traditionally divided into two general classes: the histones and the nonhistone chromosomal proteins. The complex of both classes of protein with the nuclear DNA of eucaryotic cells is known as chromatin. Histones are present in such enormous quantities in the cell (about 60 million molecules of each type per human cell) that their total mass in chromatin is about equal to that of the DNA.

Histones are responsible for the first and most basic level of chromosome packing, the nucleosome, a protein-DNA complex discovered in 1974. When interphase nuclei are broken open very gently and their contents examined under the electron microscope, most of the chromatin is in the form of a fiber with a diameter of about 30 nm (Figure 4-22A). If this chromatin is subjected to treatments that cause it to unfold partially, it can be seen under the electron microscope as a series of "beads on a string" (Figure 4-22B). The string is DNA, and each bead is a "nucleosome core particle" that consists of DNA wound around a protein core formed from histones.

The structural organization of nucleosomes was determined after first isolating them from unfolded chromatin by digestion with particular enzymes (called nucleases) that break down DNA by cutting between the nucleosomes. After digestion for a short period, the exposed DNA between the nucleosome core particles, the linker DNA, is degraded. Each individual nucleosome core particle consists of a complex of eight histone proteins (two molecules each of histones H2A, H2B, H3, and H4) and double-stranded DNA that is 147 nucleotide pairs long. The histone octamer forms a protein core around which the double-stranded DNA is wound (Figure 4-23). Each nucleosome core particle is separated from the next by a region of linker DNA, which can vary in length from a few nucleotide pairs up to about 80.
(The term nucleosome technically refers to a nucleosome core particle plus one of its adjacent DNA linkers, but it is often used synonymously with nucleosome core particle.) On average, therefore, nucleosomes repeat at intervals of about 200 nucleotide pairs. For example, a diploid human cell with 6.4 × 10^9 nucleotide pairs contains approximately 30 million nucleosomes. The formation of nucleosomes converts a DNA molecule into a chromatin thread about one-third of its initial length.
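The nucleosome count quoted above is a one-line division, sketched here for illustration. The variable names are hypothetical, and the mean linker length is simply the difference between the average repeat and the 147-pair core, not an independently measured value.

```python
# Estimate of nucleosome number from the average repeat length.
DIPLOID_PAIRS = 6.4e9   # nucleotide pairs in a diploid human cell (from the text)
REPEAT_PAIRS = 200      # average nucleosome repeat interval (from the text)
CORE_PAIRS = 147        # DNA wrapped on one core particle (from the text)

nucleosomes = DIPLOID_PAIRS / REPEAT_PAIRS
mean_linker_pairs = REPEAT_PAIRS - CORE_PAIRS  # implied average linker length

print(f"nucleosomes per cell: about {nucleosomes / 1e6:.0f} million")
print(f"implied mean linker : about {mean_linker_pairs} nucleotide pairs")
```

The result, about 32 million, rounds to the "approximately 30 million" given in the text, and the implied mean linker of roughly 53 nucleotide pairs sits within the stated range of a few pairs up to about 80.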
Figure 4-22 Nucleosomes as seen in the electron microscope. (A) Chromatin isolated directly from an interphase nucleus appears in the electron microscope as a thread 30 nm thick. (B) This electron micrograph shows a length of chromatin that has been experimentally unpacked, or decondensed, after isolation to show the nucleosomes. (A, courtesy of Barbara Hamkalo; B, courtesy of Victoria Foe.)

Figure 4-23 Structural organization of the nucleosome. A nucleosome contains a protein core made of eight histone molecules. In biochemical experiments, the nucleosome core particle can be released from isolated chromatin by digestion of the linker DNA with a nuclease, an enzyme that breaks down DNA. (The nuclease can degrade the exposed linker DNA but cannot attack the DNA wound tightly around the nucleosome core.) After dissociation of the isolated nucleosome into its protein core and DNA, the length of the DNA that was wound around the core can be determined. This length of 147 nucleotide pairs is sufficient to wrap 1.7 times around the histone core.
[Diagram labels: core histones of nucleosome; "beads-on-a-string" form of chromatin; nucleosome includes ~200 nucleotide pairs of DNA; NUCLEASE DIGESTS LINKER DNA.]
The Structure of the Nucleosome Core Particle Reveals How DNA Is Packaged
The high-resolution structure of a nucleosome core particle, solved in 1997, revealed a disc-shaped histone core around which the DNA was tightly wrapped 1.7 turns in a left-handed coil (Figure 4-24). All four of the histones that make up the core of the nucleosome are relatively small proteins (102-135 amino acids), and they share a structural motif, known as the histone fold, formed from three α helices connected by two loops (Figure 4-25). In assembling a nucleosome, the histone folds first bind to each other to form H3-H4 and H2A-H2B dimers, and the H3-H4 dimers combine to form tetramers. An H3-H4 tetramer then further combines with two H2A-H2B dimers to form the compact octamer core, around which the DNA is wound (Figure 4-26).

The interface between DNA and histone is extensive: 142 hydrogen bonds are formed between DNA and the histone core in each nucleosome. Nearly half of these bonds form between the amino acid backbone of the histones and the phosphodiester backbone of the DNA. Numerous hydrophobic interactions and salt linkages also hold DNA and protein together in the nucleosome. For example, more than one-fifth of the amino acids in each of the core histones are either lysine or arginine (two amino acids with basic side chains), and their positive charges can effectively neutralize the negatively charged DNA backbone. These numerous interactions explain in part why DNA of virtually any sequence can be bound on a histone octamer core.

The path of the DNA around the histone core is not smooth; rather, several kinks are seen in the DNA, as expected from the nonuniform surface of the core. The bending requires a substantial compression of the minor groove of the DNA helix. Certain dinucleotides in the minor groove are especially easy to compress, and some nucleotide sequences bind the nucleosome more tightly than others (Figure 4-27). This probably explains some striking, but unusual, cases of very precise positioning of nucleosomes along a stretch of DNA. For most of the DNA sequences found in chromosomes, however, the sequence preference of nucleosomes must be small enough to allow other factors to dominate, inasmuch as nucleosomes can occupy any one of a number of positions relative to the DNA sequence in most chromosomal regions.

In addition to its histone fold, each of the core histones has an N-terminal amino acid "tail", which extends out from the DNA-histone core (see Figure 4-26). These histone tails are subject to several different types of covalent modifications that in turn control critical aspects of chromatin structure and function, as we shall discuss shortly.

As a reflection of their fundamental role in DNA function through controlling chromatin structure, the histones are among the most highly conserved eucaryotic proteins. For example, the amino acid sequence of histone H4 from a pea and from a cow differ at only 2 of the 102 positions. This strong evolutionary conservation suggests that the functions of histones involve nearly all of their amino acids, so that a change in any position is deleterious to the cell. This suggestion has been tested directly in yeast cells, in which it is possible to mutate a given histone gene in vitro and introduce it into the yeast genome in place of the normal gene.

[Figure 4-23 diagram labels, continued: octameric histone core; 147-nucleotide-pair DNA double helix; H2A, H2B, H3, H4.]

Figure 4-24 The structure of a nucleosome core particle, as determined by x-ray diffraction analyses of crystals. Each histone is colored according to the scheme in Figure 4-23, with the DNA double helix in light gray. (From K. Luger et al., Nature 389:251-260, 1997. With permission from Macmillan Publishers Ltd.)
[Panel labels: side view; bottom view; histone H2A, histone H2B, histone H3, histone H4.]

Figure 4-25 The overall structural organization of the core histones. (A) Each of the core histones contains an N-terminal tail, which is subject to several forms of covalent modification, and a histone fold region, as indicated. (B) The structure of the histone fold, which is formed by all four of the core histones. (C) Histones H2A and H2B form a dimer through an interaction known as the "handshake." Histones H3 and H4 form a dimer through the same type of interaction.
[Panel labels: (A) H2A, H2B, H3, H4; N-terminal tail; histone fold.]
As might be expected, most changes in histone sequences are lethal; the few that are not lethal cause changes in the normal pattern of gene expression, as well as other abnormalities.

Despite the high conservation of the core histones, eucaryotic organisms also produce smaller amounts of specialized variant core histones that differ in amino acid sequence from the main ones. As we shall see, these variants, combined with a surprisingly large variety of covalent modifications that can be added to the histones in nucleosomes, make possible the many different chromatin structures that are required for DNA function in higher eucaryotes.
Figure 4-26 The assembly of a histone octamer on DNA. The histone H3-H4 dimer and the H2A-H2B dimer are formed from the handshake interaction. An H3-H4 tetramer forms and binds to the DNA. Two H2A-H2B dimers are then added, to complete the nucleosome. The histones are colored as in Figures 4-24 and 4-25. Note that all eight N-terminal tails of the histones protrude from the disc-shaped core structure. Their conformations are highly flexible. Inside the cell, the nucleosome assembly reactions shown here are mediated by histone chaperone proteins, some specific for H3-H4 and others specific for H2A-H2B. (Adapted from figures by J. Waterborg.)
[Diagram labels: H3-H4 dimer; H3-H4 tetramer; two dimers bind to H3-H4 tetramer.]

Figure 4-27 The bending of DNA in a nucleosome. The DNA helix makes 1.7 tight turns around the histone octamer. This diagram illustrates how the minor groove is compressed on the inside of the turn. Owing to certain structural features of the DNA molecule, the indicated dinucleotides are preferentially accommodated in such a narrow minor groove, which helps to explain why certain DNA sequences will bind more tightly than others to the nucleosome core.
[Diagram labels: G-C preferred here (minor groove outside); AA, TT, and TA dinucleotides preferred here (minor groove inside); histone core of nucleosome (histone octamer); DNA of nucleosome.]

Figure 4-28 Dynamic nucleosomes. Kinetic measurements show that the DNA in an isolated nucleosome is surprisingly dynamic, rapidly uncoiling and then rewrapping around its nucleosome core. As indicated, this makes most of its bound DNA sequence accessible to other DNA-binding proteins. (Data from G. Li and J. Widom, Nat. Struct. Mol. Biol. 11:763-769, 2004. With permission from Macmillan Publishers Ltd.)
[Diagram labels: wrapped nucleosome, exists for 250 milliseconds; unwrapped nucleosome, exists for 10-50 milliseconds; rewrapped nucleosome.]
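The lifetimes given for Figure 4-28 (wrapped for about 250 milliseconds, unwrapped for 10 to 50 milliseconds) imply both a cycling rate and a fraction of time during which the DNA is exposed. The sketch below assumes a simple two-state alternation with those mean lifetimes, which is an idealization made for this example; the function names are hypothetical.

```python
# Two-state view of nucleosome "breathing", using the Figure 4-28 lifetimes.
# Assumption: the nucleosome simply alternates between a wrapped state and a
# partially unwrapped state, each with the mean lifetime quoted in the figure.
WRAPPED_MS = 250.0
UNWRAPPED_MS_RANGE = (10.0, 50.0)


def exposed_fraction(unwrapped_ms: float, wrapped_ms: float = WRAPPED_MS) -> float:
    """Fraction of time the DNA end spends unwrapped."""
    return unwrapped_ms / (unwrapped_ms + wrapped_ms)


def cycles_per_second(unwrapped_ms: float, wrapped_ms: float = WRAPPED_MS) -> float:
    """Number of wrap/unwrap cycles completed per second."""
    return 1000.0 / (unwrapped_ms + wrapped_ms)


for t in UNWRAPPED_MS_RANGE:
    print(f"unwrapped {t:.0f} ms: exposed {exposed_fraction(t):.1%} of the time, "
          f"{cycles_per_second(t):.1f} cycles per second")
```

Under this idealization the DNA is exposed for roughly 4% to 17% of the time and cycles between 3 and 4 times per second, consistent with the rate of about 4 unwrapping events per second reported from the kinetic experiments.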
Nucleosomes Havea DynamicStructure, and Are Frequently Subjected to Changes ChromatinCatalyzed by ATP-Dependent Remodeling Complexes For many years biologists thought that, once formed in a particular position on DNA, a nucleosome remains fixed in place because of the very tight association between its core histones and DNA. If true, this would pose problems for genetic readout mechanisms, which in principle require rapid accessto many specific DNA sequences,as well as for the rapid passageof the DNA transcription and replication machinery through chromatin. But kinetic experiments show that the DNA in an isolated nucleosome unwraps from each end at rate of about 4 times per second, remaining exposed for 10 to 50 milliseconds before the partially unr,trapped structure recloses.Thus, most of the DNA in an isolated nucleosome is in principle availablefor binding other proteins (Figure 4-28). For the chromatin in a cell, a further loosening of DNA-histone contacts is clearly required, because eucaryotic cells contain a large variety of ATP-dependent chromatin remodeling complexes. The subunit in these complexes that hydrolyzes ATP is evolutionarily related to the DNA helicases (discussed in Chapter 5), and it binds both to the protein core of the nucleosome and to the double-stranded DNA that winds around it. By using the energy of AIP hydrolysis to move this DNA relative to the core, this subunit changes the structure of a nucleosome temporarily, making the DNA less tightly bound to the histone core. Through repeated cycles of ATP hydrolysis, the remodeling complexes can catalyze nucleosomesliding, and by pulling the nucleosome core along the DNA double helix in this way, they make the nucleosomal DNA availableto other proteins in the cell (Figure 4-25). In addition, by cooperating with negatively ATP-dependent c h r o m a t i nr e m o d e l i n g
complex
/
4J
.s.Fl CATALYSIS OF NUCLEOSOM SL EI D I N G
Figure4-29 The nucleosomesliding catalyzedby ATP-dependentchromatin remodelingcomplexes.Usingthe the remodeling energyof ATPhydrolysis, complexis thoughtto pushon the DNA and loosenits of its bound nucleosome core.Each attachmentto the nucleosome and cycleof ATPbinding,ATPhydrolysis, release of the ADPand PiProducts therebymovesthe DNAwith respectto the histoneoctamerin the directionof the arrowin this diagram.lt requires manysuchcyclesto producethe slidingshown.(Seealso nucleosome Figure4-468.)
2'16
Chapter4: DNA,Chromosomes, and Genomes
Figure 4-30 Nucleosome removal and histone exchange catalyzed by ATP-dependent chromatin remodeling complexes. By cooperating with specific histone chaperones, some chromatin remodeling complexes can remove the H2A-H2B dimers from a nucleosome (top series of reactions) and replace them with dimers that contain a variant histone, such as the H2AZ-H2B dimer (see Figure 4-41). Other remodeling complexes are attracted to specific sites on chromatin to remove the histone octamer completely and/or to replace it with a different nucleosome core (bottom series of reactions).
charged proteins that serve as histone chaperones, some remodeling complexes are able to remove either all or part of the nucleosome core from a nucleosome, catalyzing either an exchange of its H2A-H2B histones or the complete removal of the octameric core from the DNA (Figure 4-30).
Cells contain dozens of different ATP-dependent chromatin remodeling complexes that are specialized for different roles. Most are large protein complexes that can contain 10 or more subunits. The activity of these complexes is carefully controlled by the cell. As genes are turned on and off, chromatin remodeling complexes are brought to specific regions of DNA where they act locally to influence chromatin structure (discussed in Chapter 7; see also Figure 4-46, below).
As pointed out previously, for most of the DNA sequences found in chromosomes, experiments show that a nucleosome can occupy any one of a number of positions relative to the DNA sequence. The most important influence on nucleosome positioning appears to be the presence of other tightly bound proteins on the DNA. Some bound proteins favor the formation of a nucleosome adjacent to them. Others create obstacles that force the nucleosomes to move to positions between them. The exact positions of nucleosomes along a stretch of DNA therefore depend mainly on the presence and nature of other proteins bound to the DNA. Due to the presence of ATP-dependent remodeling complexes, the arrangement of nucleosomes on DNA can be highly dynamic, changing rapidly according to the needs of the cell.
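As a rough check on the unwrapping kinetics quoted at the start of this section, the fraction of time that an end of the nucleosomal DNA spends exposed can be estimated with a simple two-state model. The sketch below is illustrative only. It assumes that "about 4 times per second" is the rate at which a wrapped end opens and that each opening lasts 10 to 50 milliseconds on average; the function name is hypothetical.

```python
def fraction_exposed(unwrap_rate_per_s, exposed_time_s):
    """Two-state estimate of the fraction of time a DNA end is unwrapped.

    Assumes the wrapped state opens at `unwrap_rate_per_s` (so it lasts
    1 / rate on average) and the open state lasts `exposed_time_s`.
    """
    mean_wrapped_time_s = 1.0 / unwrap_rate_per_s
    return exposed_time_s / (mean_wrapped_time_s + exposed_time_s)


if __name__ == "__main__":
    # Values taken from the text: ~4 openings per second, 10 to 50 ms each.
    for exposed_ms in (10, 50):
        frac = fraction_exposed(4.0, exposed_ms / 1000.0)
        print(f"{exposed_ms} ms exposure: {frac:.1%} of the time")
```

Under these assumptions an end of the DNA is exposed for roughly 4% to 17% of the time, which is consistent with the statement that most of the DNA in an isolated nucleosome is, in principle, available to other proteins.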
Nucleosomes Are Usually Packed Together into a Compact Chromatin Fiber
Although enormously long strings of nucleosomes form on the chromosomal DNA, chromatin in a living cell probably rarely adopts the extended "beads on a string" form. Instead, the nucleosomes are packed on top of one another, generating regular arrays in which the DNA is even more highly condensed. Thus, when nuclei are very gently lysed onto an electron microscope grid, most of the chromatin is seen to be in the form of a fiber with a diameter of about 30 nm, which is considerably wider than chromatin in the "beads on a string" form (see Figure 4-22).
CHROMOSOMAL DNA AND ITS PACKAGING IN THE CHROMATIN FIBER
Figure 4-31 A zigzag model for the 30-nm chromatin fiber. (A) The conformation of two of the four nucleosomes in a tetranucleosome, from a structure determined by x-ray crystallography. (B) Schematic of the entire tetranucleosome; the fourth nucleosome is not visible, being stacked on the bottom nucleosome and behind it in this diagram. (C) Diagrammatic illustration of a possible zigzag structure that could account for the 30-nm chromatin fiber. (Adapted from C.L. Woodcock, Nat. Struct. Mol. Biol. 12:639-640, 2005. With permission from Macmillan Publishers Ltd.)
How are nucleosomes packed in the 30-nm chromatin fiber? This question has not yet been answered definitively, but important information concerning the structure has been obtained. In particular, high-resolution structural analyses have been performed on homogeneous short strings of nucleosomes, prepared from purified histones and purified DNA molecules. The structure of a tetranucleosome, obtained by x-ray crystallography, has been used to support a zigzag model for the stacking of nucleosomes in the 30-nm fiber (Figure 4-31). But cryoelectron microscopy of longer strings of nucleosomes supports a very different solenoidal structure with intercalated nucleosomes (Figure 4-32). What causes the nucleosomes to stack so tightly on each other in a 30-nm fiber? The nucleosome-to-nucleosome linkages formed by histone tails, most notably the H4 tail (Figure 4-33), constitute one important factor. Another
Figure 4-32 An interdigitated solenoid model for the 30-nm chromatin fiber. (A) Drawings in which strings of color-coded nucleosomes are used to illustrate how the solenoid is generated. (B) Schematic diagram of final structure in (A). (C) Structural model. The model is derived from high-resolution cryoelectron microscopy images of nucleosome arrays reconstituted from purified histones and DNA molecules of specific length and sequence. Both nucleosome octamers and a linker histone (discussed below) were used to produce regularly repeating arrays containing up to 72 nucleosomes. (Adapted from P. Robinson, L. Fairall, V. Huynh and D. Rhodes, Proc. Natl Acad. Sci. U.S.A. 103:6506-6511, 2006. With permission from National Academy of Sciences.)
important factor is an additional histone that is often present in a 1-to-1 ratio with nucleosome cores, known as histone H1. This so-called linker histone is larger than the individual core histones and it has been considerably less well conserved during evolution. A single histone H1 molecule binds to each nucleosome, contacting both DNA and protein, and changing the path of the DNA as it exits from the nucleosome. Although it is not understood in detail how H1 pulls nucleosomes together into the 30-nm fiber, a change in the exit path in DNA seems crucial for compacting nucleosomal DNA so that it interlocks to form the 30-nm fiber (Figure 4-34). Most eucaryotic organisms make several histone H1 proteins of related but quite distinct amino acid sequences.
It is possible that the 30-nm structure found in chromosomes is a fluid mosaic of several different variations. For example, a linker histone in the H1 family was present in the nucleosomal arrays studied in Figure 4-32 but was missing from the tetranucleosome in Figure 4-31. Moreover, we saw earlier that the linker DNA that connects adjacent nucleosomes can vary in length; these differences in linker length probably introduce local perturbations into the structure. And the presence of many other DNA-binding proteins, as well as proteins that bind directly to histones, will certainly add important additional features to any array of nucleosomes.
Figure 4-33 A speculative model for the role played by histone tails in the formation of the 30-nm fiber. (A) This schematic diagram shows the approximate exit points of the eight histone tails, one from each histone protein, that extend from each nucleosome. The actual structure is shown to its right. In the high-resolution structure of the nucleosome, the tails are largely unstructured, suggesting that they are highly flexible. (B) A speculative model showing how the histone tails may help to pack nucleosomes together into the 30-nm fiber. This model is based on (1) experimental evidence that histone tails aid in the formation of the 30-nm fiber, and (2) the x-ray crystal structure of the nucleosome, in which the tails of one nucleosome contact the histone core of an adjacent nucleosome in the crystal lattice.
Summary
A gene is a nucleotide sequence in a DNA molecule that acts as a functional unit for the production of a protein, a structural RNA, or a catalytic or regulatory RNA molecule. In eucaryotes, protein-coding genes are usually composed of a string of alternating introns and exons associated with regulatory regions of DNA. A chromosome is formed from a single, enormously long DNA molecule that contains a linear array of many genes. The human genome contains 3.2 × 10⁹ DNA nucleotide pairs, divided between 22 different autosomes and 2 sex chromosomes. Only a small percentage of this DNA codes for proteins or functional RNA molecules. A chromosomal DNA molecule also contains three other types of functionally important nucleotide sequences: replication origins and telomeres allow the DNA molecule to be efficiently replicated, while a centromere attaches the daughter DNA molecules to the mitotic spindle, ensuring their accurate segregation to daughter cells during the M phase of the cell cycle.
Figure 4-34 How the linker histone binds to the nucleosome. The position and structure of the globular region of histone H1 are shown. As indicated, this region constrains an additional 20 nucleotide pairs of DNA where it exits from the nucleosome core. This type of binding by H1 is thought to be important for forming the 30-nm chromatin fiber. The long C-terminal tail of histone H1 is also required for the high-affinity binding of H1 to chromatin, but neither its position nor that of the N-terminal tail is known. (A) Schematic, (B) structure. (B, from D. Brown, T. Izard and T. Misteli, Nat. Struct. Mol. Biol. 13:250-255, 2006. With permission from Macmillan Publishers Ltd.)
The DNA in eucaryotes is tightly bound to an equal mass of histones, which form repeated arrays of DNA-protein particles called nucleosomes. The nucleosome is composed of an octameric core of histone proteins around which the DNA double helix is wrapped. Nucleosomes are spaced at intervals of about 200 nucleotide pairs, and they are usually packed together (with the aid of histone H1 molecules) into quasi-regular arrays to form a 30-nm chromatin fiber. Despite the high degree of compaction in chromatin, its structure must be highly dynamic to allow access to the DNA. There is some spontaneous DNA unwrapping and rewrapping in the nucleosome itself; however, the general strategy for reversibly changing local chromatin structure features ATP-driven chromatin remodeling complexes. Cells contain a large set of such complexes, which are targeted to specific regions of chromatin at appropriate times. The remodeling complexes collaborate with histone chaperones to allow nucleosome cores to be repositioned, reconstituted with different histones, or completely removed to expose the underlying DNA.
THE REGULATION OF CHROMATIN STRUCTURE
Having described how DNA is packaged into nucleosomes to create a chromatin fiber, we now turn to the mechanisms that create different chromatin structures in different regions of a cell's genome. We now know that mechanisms of this type are used to control many genes in eucaryotes. Most importantly, certain types of chromatin structure can be inherited; that is, the structure can be directly passed down from a cell to its descendants. Because the cell memory that results is based on an inherited protein structure rather than on a change in DNA sequence, this is a form of epigenetic inheritance. The prefix epi is Greek for "on"; this is appropriate, because epigenetics represents a form of inheritance that is superimposed on the genetic inheritance based on DNA (Figure 4-35).
In Chapter 7, we shall introduce the many different ways in which the expression of genes is regulated. There we discuss epigenetic inheritance in detail and present several distinct mechanisms that can produce it. Here, we are concerned with only one, that based on chromatin structure. We begin this section with an introduction to inherited chromatin structures and then describe the basis for them: the covalent modification of histones in nucleosomes. We shall see that these modifications serve as recognition sites for protein modules that bring specific protein complexes to the appropriate regions of chromatin, thereby producing specific effects on gene expression or inducing other biological functions. Through such mechanisms, chromatin structure plays a central role in the development, growth, and maintenance of eucaryotic organisms, including ourselves.
Figure 4-35 A comparison of genetic inheritance with an epigenetic inheritance based on chromatin structures. Genetic inheritance is based on the direct inheritance of DNA nucleotide sequences during DNA replication. DNA sequence changes are not only transmitted faithfully from a somatic cell to all of its descendants, but also through germ cells from one generation to the next. The field of genetics, reviewed in Chapter 8, is based on the inheritance of these changes between generations. The type of epigenetic inheritance shown here is based on other molecules bound to the DNA, and it is therefore less permanent than a change in DNA sequence; in particular, epigenetic information is usually (but not always) erased during the formation of eggs and sperm. Only one epigenetic mechanism, that based on an inheritance of chromatin structures, is discussed in this chapter. Other epigenetic mechanisms are presented in Chapter 7, which focuses on the control of gene expression (see Figure 7-86).
Some Early Mysteries Concerning Chromatin Structure
Thirty years ago, histones were viewed as relatively uninteresting proteins. Nucleosomes were known to cover all of the DNA in chromosomes, and they were thought to exist to allow the enormous amounts of DNA in many eucaryotic cells to be packaged into compact chromosomes. Extrapolating from what was known in bacteria, many scientists believed that gene regulation in eucaryotes would simply bypass nucleosomes, treating them as uninvolved bystanders.
But there were reasons to challenge this view. Thus, for example, biochemists had determined that mammalian chromatin consists of an approximately equal mass of histone and non-histone proteins. This would mean that, on average, every 200 nucleotide pairs of DNA in our cells is associated with more than 1000 amino acids of non-histone proteins (that is, a mass of protein equivalent to the total mass of the histone octamer plus histone H1). We now know that many of these proteins bind to nucleosomes, and their abundance might suggest that histones are more than just packaging proteins.
A second reason to challenge the view that histones were inconsequential to gene regulation was based on the amazingly slow rate of evolutionary change in the sequences of the four core histones. The previously mentioned fact that there are only two amino acid differences in the sequence of mammalian and pea histone H4 implies that a change in almost any one of the 102 amino acids in H4 must be deleterious to these organisms. What type of process could make the life of an organism so sensitive to the exact structure of the nucleosome core that only two amino acids had changed in more than 500 million years of random variation followed by natural selection?
Last but not least, a combination of genetics and cytology had revealed that a particular form of chromatin silences the genes that it packages without regard to nucleotide sequence, and does so in a manner that is directly inherited by both daughter cells when a cell divides. It is to this subject that we turn next.
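The two numerical arguments in this section can be checked with a few lines of arithmetic. The sketch below is only a rough illustration: the text gives the length of histone H4 (102 amino acids), whereas the lengths used for H2A, H2B, H3, and H1 are approximate values assumed here for the purpose of the estimate, and the function names are hypothetical.

```python
# Approximate histone lengths in amino acids. Only the H4 value (102)
# comes from the text; the other lengths are assumed approximations.
CORE_HISTONE_LENGTHS = {"H2A": 129, "H2B": 125, "H3": 135, "H4": 102}
H1_LENGTH = 220  # assumed; H1 subtypes differ somewhat in length


def residues_per_nucleosome():
    """Amino acids in one histone octamer (two of each core histone) plus one H1."""
    octamer = 2 * sum(CORE_HISTONE_LENGTHS.values())
    return octamer + H1_LENGTH


def fraction_changed(differences, length):
    """Fraction of positions that differ between two aligned sequences."""
    return differences / length


if __name__ == "__main__":
    # Histone protein per ~200 nucleotide pairs; an equal mass of
    # non-histone protein implies a similar number of amino acids.
    print(residues_per_nucleosome())
    # Mammalian versus pea H4: 2 differences in 102 positions.
    print(f"{fraction_changed(2, 102):.1%}")
```

With these assumed lengths the octamer plus H1 comes to about 1200 amino acids, in line with the "more than 1000 amino acids" of non-histone protein per 200 nucleotide pairs, and the two H4 differences amount to about 2% of the sequence.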
Heterochromatin Is Highly Organized and Unusually Resistant to Gene Expression
Light-microscope studies in the 1930s distinguished two types of chromatin in the interphase nuclei of many higher eucaryotic cells: a highly condensed form, called heterochromatin, and all the rest, which is less condensed, called euchromatin. Heterochromatin represents an especially compact form of chromatin (see Figure 4-9), and we are finally beginning to understand important aspects of its molecular properties. Although present in many locations along chromosomes, it is also highly concentrated in specific regions, most notably at the centromeres and telomeres introduced previously (see Figure 4-21). In a typical mammalian cell, more than ten percent of the genome is packaged in this way.
The DNA in heterochromatin contains very few genes, and those euchromatic genes that become packaged into heterochromatin are turned off by this type of packaging. However, we know now that the term heterochromatin encompasses several distinct types of chromatin structures whose common feature is an especially high degree of compaction. Thus, heterochromatin should not be thought of as encapsulating "dead" DNA, but rather as creating different types of compact chromatin with distinct features that make it highly resistant to gene expression for the vast majority of genes.
When a gene that is normally expressed in euchromatin is experimentally relocated into a region of heterochromatin, it ceases to be expressed, and the gene is said to be silenced. These differences in gene expression are examples of position effects, in which the activity of a gene depends on its position relative to a nearby region of heterochromatin on a chromosome. First recognized in Drosophila, position effects have now been observed in many eucaryotes, including yeasts, plants, and humans.
Figure 4-36 The cause of position effect variegation in Drosophila. (A) Heterochromatin (green) is normally prevented from spreading into adjacent regions of euchromatin (red) by special barrier DNA sequences, which we shall discuss shortly. In flies that inherit certain chromosomal rearrangements, however, this barrier is no longer present. (B) During the early development of such flies, heterochromatin can spread into neighboring chromosomal DNA, proceeding for different distances in different cells. This spreading soon stops, but the established pattern of heterochromatin is inherited, so that large clones of progeny cells are produced that have the same neighboring genes condensed into heterochromatin and thereby inactivated (hence the "variegated" appearance of some of these flies; see Figure 4-37). Although "spreading" is used to describe the formation of new heterochromatin close to previously existing heterochromatin, the term may not be wholly accurate. There is evidence that during expansion, heterochromatin can "skip over" some regions of chromatin, sparing the genes that lie within them from repressive effects.
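The two-step logic of position effect variegation (a variable spread in each early cell, followed by faithful inheritance within each clone) can be sketched as a toy simulation. This is a minimal illustration, not a model of the real process: the five-gene layout follows the labels of Figure 4-36, the uniform choice of spreading distance is an assumption, and the "skipping" of regions mentioned in the caption is ignored. All function names are hypothetical.

```python
import random

GENES = [1, 2, 3, 4, 5]  # gene 1 lies nearest the heterochromatin


def silenced_genes(spread):
    """Genes inactivated when heterochromatin covers the first `spread` genes."""
    return [g for g in GENES if g <= spread]


def choose_spreads(n_founders, rng):
    """Each early embryonic cell independently draws a spreading distance."""
    return [rng.randint(0, len(GENES)) for _ in range(n_founders)]


def grow_clone(spread, n_divisions):
    """Every descendant inherits the founder's heterochromatin pattern."""
    return [spread] * (2 ** n_divisions)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for spread in choose_spreads(3, rng):
        clone = grow_clone(spread, n_divisions=4)
        print(len(clone), "cells with genes", silenced_genes(spread), "inactive")
```

Because the spreading distance is drawn once per founder cell and then copied to all progeny, the output consists of uniform clones that differ from one another, which is the pattern seen as patches in a variegated tissue.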
The position effects associated with heterochromatin exhibit a feature called position effect variegation, which in retrospect provided critical clues concerning chromatin function. In Drosophila, chromosome breakage events that directly connect a region of heterochromatin to a region of euchromatin tend to inactivate the nearby euchromatic genes. The zone of inactivation spreads a different distance in different early cells in the fly embryo, but once the heterochromatic condition is established on a gene, it tends to be stably inherited by all of the cell's progeny (Figure 4-36). This remarkable phenomenon was first recognized through a detailed genetic analysis of the mottled loss of red pigment in the fly eye (Figure 4-37), but it shares many features with the extensive spread of heterochromatin that inactivates one of the two X chromosomes in female mammals (see p. 473).
Extensive genetic screens have been carried out in Drosophila, as well as in fungi, in a search for gene products that either enhance or suppress the spread of heterochromatin and its stable inheritance; that is, for genes that when mutated serve as either enhancers or suppressors of position effect variegation. In this way, more than 50 genes have been identified that play a critical role in these processes. In recent years, the detailed characterization of the proteins produced by these genes has revealed that many are nonhistone chromosomal proteins that underlie a remarkable mechanism for eucaryotic gene control, one
Figure 4-37 The discovery of position effects on gene expression. The White gene in the fruit fly Drosophila controls eye pigment production and is named after the mutation that first identified it. Wild-type flies with a normal White gene (White+) have normal pigment production, which gives them red eyes, but if the White gene is mutated and inactivated, the mutant flies (White-) make no pigment and have white eyes. In flies in which a normal White+ gene has been moved near a region of heterochromatin, the eyes are mottled, with both red and white patches. The white patches represent cell lineages in which the White+ gene has been silenced by the effects of the heterochromatin. In contrast, the red patches represent cell lineages in which the White+ gene is expressed. Early in development, when the heterochromatin is first formed, it spreads into neighboring euchromatin to different extents in different embryonic cells (see Figure 4-36). The presence of large patches of red and white cells reveals that the state of transcriptional activity, as determined by the packaging of this gene into chromatin in those ancestor cells, is inherited by all daughter cells.
Figure 4-38 Some prominent types of covalent amino acid side-chain modifications found on nucleosomal histones. (A) Three different levels of lysine methylation are shown; each can be recognized by a different binding protein and thus each can have a different significance for the cell. Note that acetylation removes the plus charge on lysine, and that, most importantly, an acetylated lysine cannot be methylated, and vice versa. (B) Serine phosphorylation adds a negative charge to a histone. Modifications not shown here are the mono- or di-methylation of an arginine, the phosphorylation of a threonine, the addition of ADP-ribose to a glutamic acid, and the addition of a ubiquityl, sumoyl, or biotin group to a lysine.
Panel titles: (A) Lysine acetylation and methylation are competing reactions (structures shown: lysine, monomethyl lysine, dimethyl lysine, trimethyl lysine, acetyl lysine). (B) Serine phosphorylation (structures shown: serine, phosphoserine).
that requires the precise amino acid sequences of the core histones. This mechanism of gene control therefore helps to explain the remarkably slow change in the histones over time.
The Core Histones Are Covalently Modified at Many Different Sites
The amino acid side chains of the four histones in the nucleosome core are subjected to a remarkable variety of covalent modifications, including the acetylation of lysines, the mono-, di-, and tri-methylation of lysines, and the phosphorylation of serines (Figure 4-38). A large number of these side-chain modifications occur on the eight relatively unstructured N-terminal "histone tails" that protrude from the nucleosome (Figure 4-39). However, there are also specific side-chain modifications on the nucleosome's globular core (Figure 4-40).
All of the above types of modifications are reversible. The modification of a particular amino acid side chain in a nucleosome is created by a specific enzyme, with most of these enzymes acting only on one or a few sites. A different enzyme is responsible for removing each side-chain modification. Thus, for example, acetyl groups are added to specific lysines by a set of different histone acetyl transferases (HATs) and removed by a set of histone deacetylase complexes (HDACs). Likewise, methyl groups are added to lysine side chains by a set of different histone methyl transferases and removed by a set of histone demethylases. Each enzyme is recruited to specific sites on the chromatin at defined times in each cell's life history. For the most part, the initial recruitment of these enzymes depends on gene regulatory proteins that bind to specific DNA sequences along chromosomes, and these are produced at different times in the life of an organism, as described in Chapter 7. But in at least some cases, the covalent modifications on nucleosomes can persist long after the gene regulatory proteins that first induced them have disappeared, thereby carrying a memory in the cell of its developmental history.
Very different patterns of covalent modifications are therefore found on different groups of nucleosomes, according to their exact position on a chromosome and the status of the cell.
Figure 4-39 The covalent modification of core histone tails. (A) The structure of the nucleosome highlighting the location of the first 30 amino acids in each of its eight N-terminal histone tails (green). (B) Well-documented modifications of the four histone core proteins are indicated. Although only a single symbol is used for methylation here (M), each lysine (K) or arginine (R) can be methylated in several different ways. Note also that some positions (e.g., lysine 9 of H3) can be modified either by methylation or by acetylation, but not both. Most of the modifications shown add a relatively small molecule onto the histone tails; the exception is ubiquitin, a 76 amino acid protein also used for other cell processes (see Figure 6-92). (Adapted from H. Santos-Rosa and C. Caldas, Eur. J. Cancer 41:2381-2402, 2005. With permission from Elsevier.)
Key to symbols: M, methylation; P, phosphorylation; A, acetylation; U, ubiquitylation.
The modifications of the histones are carefully controlled, and they have important consequences. The acetylation of lysines on the N-terminal tails tends to loosen chromatin structure, in part because adding an acetyl group to lysine removes its positive charge, thereby reducing the affinity of the tails for
Figure 4-40 A map of histone modifications on the surface of the nucleosome core particle. As noted, the histone tails have been omitted here (compare with Figure 4-39). The functions of most of these core modifications are not yet known. (Adapted from M.S. Cosgrove, J.D. Boeke and C. Wolberger, Nat. Struct. Mol. Biol. 11:1037-1043, 2004. With permission from Macmillan Publishers Ltd.)
adjacent nucleosomes (see Figure 4-33). However, the most profound effect of the histone modifications is their ability to attract specific proteins to a stretch of chromatin that has been appropriately modified. These new proteins determine how and when genes will be expressed, as well as other biological functions. In this way, the precise structure of a domain of chromatin determines the expression of the genes packaged in it, and thereby the structure and function of the eucaryotic cell.
Chromatin Acquires Additional Variety Through the Site-Specific Insertion of a Small Set of Histone Variants
Despite the tight conservation of the amino acid sequences of the four core histones over hundreds of millions of years, eucaryotes also contain a few variant histones that assemble into nucleosomes. These histones are present in much smaller amounts than the major histones, and they have been less well conserved over long evolutionary times. Except for histone H4, variants exist for each of the core histones; some examples are shown in Figure 4-41.
The major histones are synthesized primarily during the S phase of the cell cycle (see Figure 17-4) and assembled into nucleosomes on the daughter DNA helices just behind the replication fork (see Figure 5-38). In contrast, most histone variants are synthesized throughout interphase. They are often inserted into already-formed chromatin, which requires a histone-exchange process catalyzed by the ATP-dependent chromatin remodeling complexes discussed previously. These remodeling complexes contain subunits that cause them to bind both to specific sites on chromatin and to histone chaperones that carry a particular variant. As a result, each histone variant is inserted into chromatin in a highly selective manner (see Figure 4-30).
The Covalent Modifications and the Histone Variants Act in Concert to Produce a "Histone Code" That Helps to Determine Biological Function
The number of possible distinct markings on an individual nucleosome is enormous. Even with the recognition that some of the covalent modifications are mutually exclusive (for example, it is not possible for a lysine to be both acetylated and methylated at the same time), and that other modifications are created together as a set, it is clear that thousands of combinations can exist. In addition, there is the further diversity created by nucleosomes that contain histone variants.
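To see how quickly the number of possible markings grows, the counting argument above can be worked through for a single, hypothetical histone tail. The sketch below assumes that each lysine occupies exactly one of five mutually exclusive states (unmodified, acetylated, or mono-, di-, or tri-methylated) and that each serine is either unmodified or phosphorylated; the site counts and function names are illustrative assumptions, and modifications that are "created together as a set" are not modeled.

```python
from itertools import product

# Acetylation and methylation of one lysine are mutually exclusive, so they
# are alternative states of a site rather than independent on/off flags.
LYSINE_STATES = ("unmodified", "acetyl", "me1", "me2", "me3")
SERINE_STATES = ("unmodified", "phospho")


def count_marking_patterns(n_lysines, n_serines):
    """Closed-form count of distinct patterns on one hypothetical tail."""
    return len(LYSINE_STATES) ** n_lysines * len(SERINE_STATES) ** n_serines


def enumerate_marking_patterns(n_lysines, n_serines):
    """Explicit enumeration; practical only for a handful of sites."""
    site_states = [LYSINE_STATES] * n_lysines + [SERINE_STATES] * n_serines
    return list(product(*site_states))


if __name__ == "__main__":
    # A hypothetical tail with 4 modifiable lysines and 1 modifiable serine.
    print(count_marking_patterns(4, 1))           # 5**4 * 2 = 1250
    print(len(enumerate_marking_patterns(4, 1)))  # same count by enumeration
```

Even this small example, with only five modifiable sites on one tail, gives more than a thousand patterns; a nucleosome carries eight tails, so the total is far larger.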
Figure 4-41 The structure of some histone variants compared with the major histone that they replace. These histones are inserted into nucleosomes at specific sites on chromosomes by ATP-dependent chromatin remodeling enzymes that act in concert with histone chaperones (see Figure 4-30). The CENP-A variant of histone H3 is discussed later in this chapter (see Figures 4-48 to 4-51); other variants are discussed in Chapter 7. The sequences that are colored differently in each variant are different from the corresponding sequence of the major histone. (Adapted from K. Sarma and D. Reinberg, Nat. Rev. Mol. Cell Biol. 6:139-149, 2005. With permission from Macmillan Publishers Ltd.)
Variants shown and their special functions:
H3 variants: H3.3 (transcriptional activation); CENP-A (centromere function and kinetochore assembly)
H2A variants: H2AX (DNA repair and recombination); H2AZ (gene expression, chromosome segregation); macroH2A (transcriptional repression, X-chromosome inactivation)
Many of the combinations appear to have a specific meaning for the cell because they determine how and when the DNA packaged in the nucleosomes is accessed, leading to the histone code hypothesis. For example, one type of marking signals that a stretch of chromatin has been newly replicated, another signals that the DNA in that chromatin has been damaged and needs repair, while many others signal when and how gene expression should take place. Small protein modules bind to specific marks, recognizing for example a trimethylated lysine 4 on histone H3 (Figure 4-42). These modules are thought to act in concert with other modules as part of a code-reader complex, so as to allow particular combinations of markings on chromatin to attract additional protein complexes that execute an appropriate biological function at the right time (Figure 4-43).
Figure 4-42 How each mark on a nucleosome is read. The structure of a protein module that specifically recognizes histone H3 trimethylated on lysine 4 is shown. (A) Space-filling model of an ING PHD domain bound to a histone tail (green, with the trimethyl group highlighted in yellow). (B) A ribbon model showing how the N-terminal six amino acids in the H3 tail are recognized. The dashed lines represent hydrogen bonds. This is one of many PHD domains that recognize methylated lysines on histones; different domains bind tightly to lysines located at different positions, and they can discriminate between a mono-, di-, and tri-methylated lysine. In a similar way, other small protein modules recognize specific histone side chains that have been marked with acetyl groups, phosphate groups, and so on. (Adapted from P.V. Pena et al., Nature 442:100-103, 2006. With permission from Macmillan Publishers Ltd.)
c o v al e n t
modification o n h i s t o n et a i l (mark) C O D ER E A D E R BINDS AND ATTRACTS OTHER COMPONENTS
p r o t e i nc o m p l e xw i t h catalyticactivitiesand a d d i t i o n a lb i n d i n gs i t e s
Figure 4-43 Schematic diagram showing how the histone code could be read by a code-reader complex. A large protein complex that contains a series of protein modules, each of which recognizes a specific histone mark, is schematically illustrated (green). This "code-reader complex" will bind tightly only to a region of chromatin that contains several of the different histone marks that it recognizes. Therefore, only a specific combination of marks will cause the complex to bind to chromatin and attract additional protein complexes (purple) that catalyze a biological function.
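The combinatorial logic of Figure 4-43 can be made concrete with a minimal Python sketch. It is an illustration only, not a model of any real complex: the class name `CodeReaderComplex`, the mark labels, and the threshold `min_matches` are hypothetical, and real binding is graded by affinity rather than all-or-none.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeReaderComplex:
    """Toy code-reader complex (hypothetical; not a real protein complex)."""
    modules: frozenset  # marks that the individual reader modules recognize
    min_matches: int    # assumed number of marks needed for tight binding

    def binds_tightly(self, marks_on_chromatin):
        # Count how many of the marks present are recognized by some module.
        recognized = self.modules & frozenset(marks_on_chromatin)
        return len(recognized) >= self.min_matches


# Illustrative mark labels; any hashable labels would do.
reader = CodeReaderComplex(
    modules=frozenset({"H3K4me3", "H3K9ac", "H3S10ph"}),
    min_matches=2,
)

single_mark = reader.binds_tightly({"H3K4me3"})
combination = reader.binds_tightly({"H3K4me3", "H3K9ac"})
print(single_mark, combination)
```

In this sketch a single recognized mark is not enough for tight binding, whereas a combination of two recognized marks is, which is the essential point of the code-reader idea.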
Chapter 4: DNA, Chromosomes, and Genomes
[Figure 4-44 diagram. (A) Modifications on the histone H3 N-terminal tail (M, methyl; A, acetyl; P, phosphate). (B) Modification state and "meaning": methylated K9, heterochromatin formation and gene silencing; methylated K4 plus acetylated K9, gene expression; phosphorylated S10 plus acetylated K14, gene expression; methylated K27, silencing of Hox genes and X chromosome inactivation.]
The marks on nucleosomes due to covalent additions to histones are dynamic, being constantly removed and added at rates that depend on their chromosomal locations. Because the histone tails extend outward from the nucleosome core and are likely to be accessible even when chromatin is condensed, they would seem to provide an especially suitable format for creating marks in a form that can be readily altered as a cell's needs change. Although much remains to be learned about the meaning of the many different histone code combinations, a few well-studied examples of the information that can be encoded in the histone H3 tail are listed in Figure 4-44.
A Complex of Code-reader and Code-writer Proteins Can Spread Specific Chromatin Modifications for Long Distances Along a Chromosome
The phenomenon of position effect variegation described previously requires that at least some modified forms of chromatin have the ability to spread for substantial distances along a chromosomal DNA molecule (see Figure 4-36). How is this possible? The enzymes that modify (or remove modifications from) the histones in nucleosomes are part of multisubunit complexes. They can initially be brought to a particular region of chromatin by one of the sequence-specific DNA-binding proteins (gene regulatory proteins) discussed in Chapters 6 and 7 (for a specific example, see Figure 7-87). But after a modifying enzyme "writes" its mark on one or a few neighboring nucleosomes, events that resemble a chain reaction can ensue. In this case, the "code-writer" enzyme works in concert with a code-reader protein located in the same protein complex. This second protein contains a code-reader module that recognizes the mark and binds tightly to the newly modified nucleosome (see Figure 4-42), positioning its attached writer enzyme near an adjacent nucleosome. Through many such read-write cycles, the reader protein can carry the writer enzyme along the DNA, spreading the mark in a hand-over-hand manner along the chromosome (Figure 4-45). In reality, the process is more complicated than the scheme just described. Both readers and writers are part of a protein complex that is likely to contain
Figure 4-44 Some specific meanings of the histone code. (A) The modifications on the histone H3 N-terminal tail are shown, repeated from Figure 4-39. (B) The H3 tail can be marked by different combinations of modifications that convey a specific meaning to the stretch of chromatin where this combination occurs. Only a few of the meanings are known, including the four examples shown. To focus on just one example, the trimethylation of lysine 9 attracts the heterochromatin-specific protein HP1, which induces a spreading wave of further lysine 9 trimethylation followed by further HP1 binding, according to the general scheme that will be illustrated shortly (see Figure 4-46). Not shown is the fact that, as just implied (see Figure 4-43), reading the histone code generally involves the joint recognition of marks at other sites on the nucleosome along with the indicated H3 tail recognition. In addition, specific levels of methylation (mono-, di-, or tri-methyl groups) are required, as in Figure 4-42.
THE REGULATION OF CHROMATIN STRUCTURE
[Figure 4-45 labels: gene regulatory protein; code-reader protein; histone modification (mark)]
multiple readers and writers, and to require multiple marks on the nucleosome to spread. Moreover, many of these reader-writer complexes also contain an ATP-dependent chromatin remodeling protein, and the reader, writer, and remodeling proteins work in concert to either decondense or condense long stretches of chromatin as the reader moves progressively along the nucleosome-packaged DNA (Figure 4-46). Some idea of the complexity of the processes just described can be derived from the results of genetic screens for mutant genes that either enhance or suppress the spreading and stability of heterochromatin in tests for position effect variegation in Drosophila (see Figure 4-37). As pointed out previously, more than 50 such genes are known, and most of them are likely to function as subunits in one or more reader-writer-remodeling protein complexes.
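The read-write cycle described above can be sketched as a simple iteration over a row of nucleosomes. The following Python fragment is a deliberately minimal illustration: the function name `spread_mark`, the one-nucleosome step per cycle, and the assumption that spreading proceeds in both directions are ours, and the real process involves multiple marks, multiple readers and writers, and remodeling proteins.

```python
def spread_mark(n_nucleosomes, seed, cycles):
    """Return a list of booleans, True where the histone mark is present.

    Assumptions (illustrative only): the writer is first recruited to the
    nucleosome at index `seed`, and in each read-write cycle a reader bound
    to a marked nucleosome positions the writer on both adjacent nucleosomes.
    """
    marked = [False] * n_nucleosomes
    marked[seed] = True  # initial mark written after recruitment
    for _ in range(cycles):
        newly_marked = []
        for i, has_mark in enumerate(marked):
            if has_mark:
                # Reader binds nucleosome i; writer modifies its neighbors.
                for j in (i - 1, i + 1):
                    if 0 <= j < n_nucleosomes and not marked[j]:
                        newly_marked.append(j)
        for j in newly_marked:
            marked[j] = True
    return marked


result = spread_mark(n_nucleosomes=9, seed=4, cycles=2)
print("".join("M" if m else "-" for m in result))
```

Each cycle extends the marked domain by one nucleosome on either side, so the size of the domain depends on how many cycles occur before spreading is halted.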
Barrier DNA Sequences Block the Spread of Reader-Writer Complexes and thereby Separate Neighboring Chromatin Domains
The above mechanism for spreading chromatin structures raises a potential problem. Inasmuch as each chromosome consists of one continuous, very long DNA molecule, what prevents a cacophony of confusing cross-talk between adjacent chromatin domains of different structure and function? Early studies of position effect variegation had suggested an answer: the existence of specific DNA sequences that separate one chromatin domain from another (see Figure 4-37). Several such barrier sequences have now been identified and characterized through the use of genetic engineering techniques that allow specific regions of DNA sequence to be deleted or added to chromosomes. For example, a sequence called HS4 normally separates the active chromatin domain that contains the β-globin locus from an adjacent region of silenced, condensed chromatin in erythrocytes (see Figure 7-61). If this sequence is deleted, the β-globin locus is invaded by condensed chromatin. This chromatin silences the genes it covers, and it spreads to a different extent in different cells, causing a pattern of position effect variegation similar to that
Figure 4-45 How the recruitment of a code-reader-writer complex can spread chromatin changes along a chromosome. The code-writer is an enzyme that creates a specific modification on one or more of the four nucleosomal histones. After its recruitment to a specific site on a chromosome by a gene regulatory protein, the writer collaborates with a code-reader protein to spread its mark from nucleosome to nucleosome by means of the indicated reader-writer complex. For this mechanism to work, the reader must recognize the same histone modification mark that the writer produces (see also Figure 4-43).
[Figure 4-46 labels: (A) "reader-writer" complex; SPREADING WAVE OF CHROMATIN CONDENSATION. Scale bar: 10 nm]
observed in Drosophila. As described in Chapter 7, this invasion has dire consequences: the globin genes are poorly expressed, and individuals who carry such a deletion have a severe form of anemia.
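The effect of deleting a barrier sequence can be illustrated with a small Python sketch of spreading that stops at barrier positions. Everything here is a simplifying assumption made for illustration: the function name `spread_with_barriers`, the treatment of a barrier as a nucleosome position that cannot be marked, and the particular positions chosen for the silenced region, the barrier, and the gene.

```python
def spread_with_barriers(n, seed, cycles, barriers=frozenset()):
    """Return the set of nucleosome indices that carry the silencing mark.

    Assumptions (illustrative only): condensed chromatin spreads one
    nucleosome per cycle in both directions from `seed`, and a position
    listed in `barriers` can never be marked, so spreading cannot pass it.
    """
    marked = {seed}
    for _ in range(cycles):
        frontier = set()
        for i in marked:
            for j in (i - 1, i + 1):
                if 0 <= j < n and j not in barriers and j not in marked:
                    frontier.add(j)
        if not frontier:
            break  # nothing left to mark
        marked |= frontier
    return marked


GENE = 7  # hypothetical position of an active gene
with_barrier = spread_with_barriers(10, seed=2, cycles=10, barriers={5})
barrier_deleted = spread_with_barriers(10, seed=2, cycles=10)
print(GENE in with_barrier, GENE in barrier_deleted)
```

Running the function with a smaller value of `cycles` for some cells than for others would mimic, very crudely, the observation that the condensed chromatin spreads to a different extent in different cells once the barrier is gone.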
... chromatin modifications are known that can also protect genes from silencing.
The Chromatin in Centromeres Reveals How Histone Variants Can Create Special Structures
The presence of nucleosomes carrying histone variants is thought to produce marks in chromatin that are unusually long lasting. Consider, for example, the formation and inheritance of the chromatin that forms on centromeres, the DNA region of each chromosome required for the orderly segregation of the chromosomes into daughter cells each time a cell divides (see Figure 4-21). In many complex organisms, including humans, each centromere is embedded in a stretch of special centric heterochromatin that persists throughout interphase, even though the centromere-mediated movement of DNA occurs only during
Figure 4-46 How a complex containing reader-writer and ATP-dependent chromatin remodeling proteins can spread chromatin changes along a chromosome. (A) A spreading wave of chromatin condensation. This mechanism is identical to that in Figure 4-45, except that the reader-writer complex collaborates with an ATP-dependent chromatin remodeling protein (see Figure 4-29) to reposition nucleosomes and pack them into highly condensed arrays. This is a highly simplified view of the mechanism known to be able to spread a major form of heterochromatin for long distances along chromosomes (see Figure 4-36). The heterochromatin-specific protein HP1 plays a major role in that process. HP1 binds to trimethyl lysine 9 on histone H3, and it remains associated with the condensed chromatin as one of the readers in a reader-writer-remodeling complex that, while incompletely understood, is considerably more intricate than that shown here. (B) The actual structure of a chromatin reader-remodeling complex, showing how it is thought to interact with a nucleosome. Modeled in gray is the yeast RSC complex, which contains 15 subunits, including an ATP-dependent chromatin remodeling protein and at least 4 subunits with code-reader domains. (B, from A.E. Leschziner et al., Proc. Natl Acad. Sci. U.S.A. 104:4913-4918, 2007. With permission from National Academy of Sciences.)
Figure 4-47 Some mechanisms of barrier action. These models are derived from different analyses of barrier action, and a combination of several of them may function at any one site. (A) The tethering of a region of chromatin to a large fixed site, such as the nuclear pore complex illustrated here, can form a barrier that stops the spread of heterochromatin. (B) The tight binding of barrier proteins to a group of nucleosomes can compete with heterochromatin spreading. (C) By recruiting a group of highly active histone-modifying enzymes, barriers can erase the histone marks that are required for heterochromatin to spread. For example, a potent acetylation of lysine 9 on histone H3 will compete with lysine 9 methylation, thereby preventing the HP1 protein binding needed to form some forms of heterochromatin (see Figure 4-46). (Based on A.G. West and P. Fraser, Hum. Mol. Genet. 14:R101-R111, 2005. With permission from Oxford University Press.)
[Figure 4-47 labels: nuclear pore; barrier protein]
mitosis. This chromatin contains a centromere-specific variant H3 histone, known as CENP-A (see Figure 4-41), plus additional proteins that pack the nucleosomes into particularly dense arrangements and form the kinetochore, the special structure required for attachment of the mitotic spindle. A specific DNA sequence of approximately 125 nucleotide pairs is sufficient to serve as a centromere in the yeast S. cerevisiae. Despite its small size, more than a dozen different proteins assemble on this DNA sequence; the proteins include the CENP-A histone H3 variant, which, along with the three other core histones, forms a centromere-specific nucleosome. The additional proteins at the yeast centromere attach this nucleosome to a single microtubule from the yeast mitotic spindle (Figure 4-48). The centromeres in more complex organisms are considerably larger than those in budding yeasts. For example, fly and human centromeres extend over hundreds of thousands of nucleotide pairs and do not seem to contain a centromere-specific DNA sequence. These centromeres largely consist of short, repeated DNA sequences, known as alpha satellite DNA in humans. But the same repeat sequences are also found at other (non-centromeric) positions on
[Figure 4-48 labels: (A) normal nucleosome; nucleosome with centromere-specific histone H3. (B) sequence-specific DNA-binding protein; yeast centromeric DNA; microtubule; yeast kinetochore]
Figure 4-48 A model for the structure of a simple centromere. In the yeast Saccharomyces cerevisiae, a special centromeric DNA sequence assembles a single nucleosome in which two copies of an H3 variant histone (called CENP-A in most organisms) replace the normal H3. Peptide sequences unique to this variant histone (see Figure 4-41) then help to assemble additional proteins, some of which form a kinetochore. This kinetochore is unusual in capturing only a single microtubule; humans have much larger centromeres and form kinetochores that can capture 20 or more microtubules (see Figure 4-50). The kinetochore is discussed in detail in Chapter 17. (Adapted from A. Joglekar et al., Nat. Cell Biol. 8:381-383, 2006. With permission from Macmillan Publishers Ltd.)
[Figure 4-49 labels: (A) higher-order repeat; alpha satellite DNA monomer (171 nucleotide pairs); active centromere; flanking heterochromatin; inactive centromere with nonfunctional alpha satellite DNA. (B) neocentromere formed without alpha satellite DNA]
Figure 4-49 Evidence for the plasticity of human centromere formation. (A) A series of A-T-rich alpha satellite DNA sequences are repeated many thousands of times at each human centromere (red), surrounded by pericentric heterochromatin (brown). However, due to an ancient chromosome breakage and rejoining event, some human chromosomes contain two blocks of alpha satellite DNA, each of which presumably functioned as a centromere in its original chromosome. Usually, these dicentric chromosomes are not stably propagated because they attach improperly to the spindle and are broken apart during mitosis. In chromosomes that do survive, however, one of the centromeres has somehow inactivated, even though it contains all the necessary DNA sequences. This allows the chromosome to be stably propagated. (B) In a small fraction (1/2000) of human births, extra chromosomes are observed in cells of the offspring. Some of these extra chromosomes, which have formed from a breakage event, lack alpha satellite DNA altogether, yet new centromeres (neocentromeres) have arisen from what was originally euchromatic DNA.
chromosomes, indicating that they are not sufficient to direct centromere formation. Most strikingly, in some unusual cases, new human centromeres (called neocentromeres) have been observed to form spontaneously on fragmented chromosomes. Some of these new positions were originally euchromatic and lack alpha satellite DNA altogether (Figure 4-49). It therefore seems that centromeres in complex organisms are defined by an assembly of proteins, instead of by a specific DNA sequence. When antibodies that stain specific modified nucleosomes are used to examine the stretched chromosome fibers from centromeres, one observes striking alternation of two modified forms of chromatin (Figure 4-50). It appears that this arrangement allows the centric heterochromatin to fold so as to position the CENP-A-containing nucleosomes on the outside of the mitotic chromosome, where they bind the set of proteins that form the kinetochore plates. These plates in turn capture a group of microtubules from the mitotic spindle in order to partition the chromosomes accurately as described in Chapter 17.
Chromatin Structures Can Be Directly Inherited
To explain the above observations, it has been proposed that de novo centromere formation requires an initial seeding event, involving the formation of a specialized DNA-protein structure that contains nucleosomes formed with the CENP-A variant of histone H3. In humans, this seeding event happens more readily on arrays of alpha satellite DNA than on other DNA sequences. The H3-H4 tetramers from each nucleosome on the parental DNA helix are directly inherited by the daughter DNA helices at a replication fork (see Figure 5-38). Therefore, once a set of CENP-A-containing nucleosomes has been assembled on a stretch of DNA, it is easy to understand how a new centromere could be generated in the same place on both daughter chromosomes following each round of cell division (Figure 4-51). The plasticity of centromeres may provide an important evolutionary advantage. We have seen that chromosomes evolve in part by breakage and rejoining events (see Figure 4-18). Many of these events produce chromosomes with two centromeres, or chromosome fragments with no centromeres at all. Although rare, both the inactivation of centromeres and their ability to be activated de novo
[Figure 4-50 labels: inner and outer kinetochore plates; chromatin containing centromere-specific histone H3 variant; chromatin containing normal histone H3 that is dimethylated at lysine 4; microtubules; pericentric heterochromatin; cohesin molecules linking sister chromatids; kinetochore]
may occasionally allow newly formed chromosomes to be maintained stably, thereby facilitating the process of chromosome evolution. There are some striking similarities between the formation and maintenance of centromeres and the formation and maintenance of other regions of heterochromatin. In particular, the entire centromere forms as an all-or-none entity, suggesting a highly cooperative addition of proteins after a seeding event. Moreover, once formed, the structure seems to be directly inherited on the DNA as part of each round of chromosome replication.
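The idea that inherited nucleosomes seed the identity of their new neighbors can be illustrated with a toy Python model. It is a sketch under strong assumptions, all of them ours: parental nucleosomes are handed to the two daughter helices in strict alternation (the real distribution is not this regular), each gap is filled by copying the nearest inherited nucleosome, and the function names `fill_gaps` and `replicate` are hypothetical.

```python
def fill_gaps(strand, default="H3"):
    """Fill empty positions (None) by copying the nearest inherited nucleosome."""
    n = len(strand)
    filled = list(strand)
    for i in range(n):
        if strand[i] is not None:
            continue  # inherited parental nucleosome, keep as is
        for dist in range(1, n):
            left, right = i - dist, i + dist
            if left >= 0 and strand[left] is not None:
                filled[i] = strand[left]
                break
            if right < n and strand[right] is not None:
                filled[i] = strand[right]
                break
        else:
            filled[i] = default  # no inherited nucleosome anywhere
    return filled


def replicate(parent):
    """Give even-indexed parental nucleosomes to one daughter, odd to the other."""
    d0 = [kind if i % 2 == 0 else None for i, kind in enumerate(parent)]
    d1 = [kind if i % 2 == 1 else None for i, kind in enumerate(parent)]
    return fill_gaps(d0), fill_gaps(d1)


parent = ["H3"] * 4 + ["CENP-A"] * 6 + ["H3"] * 4
daughter0, daughter1 = replicate(parent)
print(daughter0.count("CENP-A"), daughter1.count("CENP-A"))
```

In this toy model both daughters retain a contiguous block of CENP-A nucleosomes at essentially the parental position, although the boundary of the block can shift by a nucleosome.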
Chromatin Structures Add Unique Features to Eucaryotic Chromosome Function
Although a great deal remains to be learned about the functions of different chromatin structures, the packaging of DNA into nucleosomes was probably crucial for the evolution of eucaryotes like ourselves. Complex multicellular organisms would appear to be possible only if the cells in different lineages can specialize by changing the accessibility and responsiveness of many hundreds of genes to genetic readout. As described in Chapter 22, each cell has a stored memory of its past developmental history in the regulatory circuits that control its many genes. Although bacteria also require cell memory mechanisms, the complexity of the memory circuits required by higher eucaryotes is unprecedented. The packaging of selected regions of eucaryotic genomes into different forms of chromatin makes possible a type of cell memory mechanism that is not available to bacteria. The crucial feature of this uniquely eucaryotic form of gene regulation is the storage of the memory of the state of a gene on a gene-by-gene basis, in the form of local chromatin structures that can persist for various lengths of time. At one extreme are structures like centric heterochromatin that, once established, are stably inherited from one cell generation to the next (see Figure
Figure 4-50 The organization and function of the chromatin that forms human centromeres. (A) By staining stretched chromatin fibers with fluorescently labeled antibodies, the alpha satellite DNA that forms centric heterochromatin in humans is seen to be packaged into alternating blocks of chromatin. One block is formed from a long string of nucleosomes containing the CENP-A H3 variant histone (green); the other block contains nucleosomes that are specially marked with a dimethyl lysine 4 (red). Each block is more than a thousand nucleosomes long. (B) A model for the organization of the two types of centric heterochromatin. As in yeast, the nucleosomes that contain the H3 variant histone form the kinetochore. (C) The arrangement of the centric and pericentric heterochromatin on a human metaphase chromosome, as determined by fluorescence microscopy using the same antibodies as in (A). (Adapted from B.A. Sullivan and G.H. Karpen, Nat. Struct. Mol. Biol. 11:1076-1083, 2004. With permission from Macmillan Publishers Ltd.)
[Figure 4-51 labels: (A) H2A-H2B dimer; replication fork; histones added to new nucleosomes; H3-H4 tetramer; parental chromatin; newly assembled chromatin with normal H3 histone. (B) H2A-H2B dimer; CENP-A H3; H4; centromere-specific nucleosomes direct the addition of new histones]
[Figure 4-51 label: newly assembled chromatin with centromere-specific H3 histone]
Figure 4-51 A model for the direct inheritance of centromeric heterochromatin. (A) The normal assembly of chromatin on the two daughter DNA helices produced at a replication fork requires the deposition of H2A-H2B dimers onto directly inherited H3-H4 tetramers, as well as the assembly of new histone octamers (see Figure 5-38 for details). (B) At a centromere, the inheritance of H3 variant-H4 tetramers seeds the formation of new histone octamers that likewise contain the variant H3 histone. A similar seeding process could cause the adjacent blocks of centric heterochromatin (containing H3 modified at dimethyl lysine 4; see Figure 4-50) to be inherited. Although the details are not known, the seeding process is likely to involve other centromeric proteins that are inherited along with the nucleosomes (see Figure 4-52).
4-51). Closely related mechanisms that are likewise based on the direct inheritance of parental forms of chromatin by the daughter DNA helices behind the replication fork are thought to be responsible for other types of condensed chromatin (Figure 4-52). For example, the permanently silenced, classical type
[Figure 4-52 labels: heterochromatin proteins; nucleosomes; heterochromatin; euchromatin; CHROMOSOME DUPLICATION; NEW HETEROCHROMATIN PROTEINS ADDED TO PROPERLY MODIFIED HISTONES]
Figure 4-52 How the packaging of DNA in chromatin can be inherited during chromosome replication. In this model, some of the specialized chromatin components are distributed to each daughter chromosome after DNA duplication, along with the specially marked nucleosomes that they bind. After DNA replication, the inherited nucleosomes that are specially modified, acting in concert with the inherited chromatin components, change the pattern of histone modification on the newly formed daughter nucleosomes nearby. This creates new binding sites for the same chromatin components, which then assemble to complete the structure. The latter process is likely to involve code reader-writer-remodeling complexes operating in a manner similar to that previously illustrated in Figure 4-46.
of heterochromatin contains the HP1 protein, whereas the condensed chromatin that coats important developmental regulatory genes is maintained by the polycomb group of proteins. The latter type of heterochromatin silences a large number of genes that encode gene regulatory proteins early in embryonic development, covering a total of about 2 percent of the human genome, and it is removed only when each individual gene is needed by the developing organism (discussed in Chapter 22). Although other types of inherited chromatin structures exist, it is not yet clear how many different types there are: the number could certainly exceed 10 (see p. 238). The fundamental importance of this mechanism for distinguishing different genes is schematically represented in Figure 4-53. Other forms of chromatin can have a shorter lifetime, much less than the division time of the cell; however, many have a built-in persistence that helps to
[Figure 4-53 labels: specific chromatin structures on genes; feedback loops maintaining chromatin structures]
mediate biological function.
Summary
Despite the uniform assembly of chromosomal DNA into nucleosomes, a large variety of different chromatin structures are possible in eucaryotic organisms. This variety is based on a large set of reversible covalent modifications of the four histones in the nucleosome core. These modifications include the mono-, di-, and tri-methylation of many different lysine side chains, an important reaction that is incompatible with the acetylation of the same lysines. Specific combinations of the modifications mark each nucleosome with a histone code. The histone code is read when protein modules that are part of a larger protein complex bind to the modified nucleosomes in a region of chromatin. These code-reader proteins then attract additional proteins that catalyze biologically relevant functions. Some code-reader protein complexes contain a histone-modifying enzyme, such as a histone methylase, that "writes" the same mark that the code-reader recognizes. A reader-writer-remodeling complex of this type can spread a specific form of chromatin for long distances along a chromosome. In particular, large regions of condensed heterochromatin are thought to be formed in this way. Heterochromatin is commonly found around centromeres and near telomeres, but it is also present at many other positions in chromosomes. The tight packaging of DNA into heterochromatin usually silences the genes within it. The phenomenon of position effect variegation provides good evidence for the direct inheritance of condensed forms of chromatin by the daughter DNA helices formed at a replication fork, and a similar mechanism appears to be responsible for maintaining the specialized chromatin at centromeres. More generally, the ability to transmit specific chromatin structures from one cell generation to the next provides the basis for an epigenetic cell memory process that is likely to be critical for maintaining the complex set of different cell states required by complex multicellular organisms.
THE GLOBAL STRUCTURE OF CHROMOSOMES
Having discussed the DNA and protein molecules from which the 30-nm chromatin fiber is made, we now turn to the organization of the chromosome on a more global scale. As a 30-nm fiber, the typical human chromosome would still be 0.1 cm in length and able to span the nucleus more than 100 times. Clearly, there must be a still higher level of folding, even in interphase chromosomes. Although its molecular basis is still largely a mystery, this higher-order packaging almost certainly involves the folding of the 30-nm fiber into a series of loops and coils. This chromatin packing is fluid, frequently changing in response to the needs of the cell. We shall begin by describing some unusual interphase chromosomes that can be easily visualized, inasmuch as certain features of these exceptional cases are thought to be representative of all interphase chromosomes. Moreover, they
Figure 4-53 Schematic illustration of cell memory stored as chromatin-based epigenetic information in the genes of eucaryotes. Genes in eucaryotic cells can be packaged into a large variety of different chromatin structures, indicated here by different colors. At least some of these chromatin structures have a special effect on gene expression that can be directly inherited as epigenetic information when a cell divides. This allows some of the gene regulatory proteins that create different gene states to act only once, inasmuch as the state can be remembered after the regulatory protein is gone. Epigenetic information can also be stored in networks of signaling molecules that control gene expression (see Figure 7-86).
Figure 4-54 Lampbrush chromosomes. (A) A light micrograph of lampbrush chromosomes in an amphibian oocyte. Early in oocyte differentiation, each chromosome replicates to begin meiosis, and the homologous replicated chromosomes pair to form this highly extended structure containing a total of four replicated DNA molecules, or chromatids. The lampbrush chromosome stage persists for months or years, while the oocyte builds up a supply of materials required for its ultimate development into a new individual. (B) An enlarged region of a similar chromosome, stained with a fluorescent reagent that makes the loops active in RNA synthesis clearly visible. (Courtesy of Joseph G. Gall.)
[Figure 4-54 scale bars: 100 µm; 20 µm]
provide a unique means for investigating some fundamental aspects of chromatin structure raised in the previous section. Next we describe how a typical interphase chromosome is arranged in the cell nucleus, focusing on human cells. Finally, we conclude by discussing the additional tenfold compaction that interphase chromosomes undergo during the process of mitosis.
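The claim that a chromosome packaged as a 30-nm fiber could still span the nucleus more than 100 times is easy to check with a few lines of Python. The fiber length of 0.1 cm comes from the text; the nuclear diameter of 6 µm is an assumed, typical value that is not stated in this section, and the function name is ours.

```python
def spans_across_nucleus(fiber_length_um, nucleus_diameter_um):
    """How many times a fiber of the given length could cross the nucleus."""
    return fiber_length_um / nucleus_diameter_um


# 0.1 cm (from the text) expressed in micrometers: 0.1 cm = 1 mm = 1000 um.
FIBER_LENGTH_UM = 1000.0
# Assumption: a nuclear diameter of about 6 um.
NUCLEUS_DIAMETER_UM = 6.0

spans = spans_across_nucleus(FIBER_LENGTH_UM, NUCLEUS_DIAMETER_UM)
print(round(spans))
```

With these numbers the fiber is roughly 170 nuclear diameters long; a somewhat larger assumed nucleus would lower the figure without changing the conclusion that a further level of folding is required.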
Chromosomes Are Folded into Large Loops of Chromatin
Insight into the structure of the chromosomes in interphase cells has been obtained from studies of the stiff and extended meiotically paired chromosomes in growing amphibian oocytes (immature eggs). These very unusual lampbrush chromosomes (the largest chromosomes known) are clearly visible even in the light microscope, where they are seen to be organized into a series of large chromatin loops emanating from a linear chromosomal axis (Figure 4-54). The organization of a lampbrush chromosome is illustrated in Figure 4-55. A given loop always contains the same DNA sequence, and it remains extended in the same manner as the oocyte grows. These chromosomes are producing large amounts of RNA for the oocyte, and most of the genes present in the DNA loops are being actively expressed. The majority of the DNA, however, is not in loops but remains highly condensed in the chromomeres on the axis, where genes are generally not expressed. It is thought that the interphase chromosomes of all eucaryotes are similarly arranged in loops. Although these loops are normally too small and fragile to be easily observed in a light microscope, other methods can be used to infer their presence. For example, it has become possible to assess the frequency with
Figure 4-55 A model for the structure of a lampbrush chromosome. The set of lampbrush chromosomes in many amphibians contains a total of about 10,000 chromatin loops, although most of the DNA in each chromosome remains highly condensed in the chromomeres. Each loop corresponds to a particular DNA sequence. Four copies of each loop are present in each cell, since each of the two major units shown at the top consists of two closely apposed, newly replicated chromosomes. This four-stranded structure is characteristic of this stage of development of the oocyte, the diplotene stage of meiosis; see Figure 21-9.
[Figure 4-55 labels: REGION OF SINGLE CHROMOSOME; extended chromatin in loop; SMALL REGION OF CHROMOSOME SHOWING SISTER CHROMATIDS; chromatin joining adjacent chromomeres; chromomere formed from highly condensed chromatin]
which any two loci along an interphase chromosome are paired with each other, thus revealing likely candidates for the sites on chromatin that form the closely apposed bases of loop structures (Figure 4-56). These experiments and others suggest that the DNA in human chromosomes is organized into loops of
[Figure 4-56 labels: DNA-binding proteins; TREAT WITH FORMALDEHYDE; REMOVE CROSS-LINKS BY HEAT TREATMENT AND PROTEOLYSIS; TEST FOR JOINED SEGMENTS BY PCR; DNA product is obtained only if proteins hold the two DNA sequences close together in the cell]
Figure 4-56 A method for determining the position of loops in interphase chromosomes. In this technique, known as the chromosome conformation capture (3C) method, cells are treated with formaldehyde to create the indicated covalent DNA-protein and DNA-DNA cross-links. The DNA is then treated with a restriction nuclease that chops the DNA into many pieces, cutting at strictly defined nucleotide sequences and forming sets of identical "cohesive ends" (see Figure 8-34). The cohesive ends can be made to join through their complementary base-pairing. Importantly, prior to the ligation step shown, the DNA is diluted so that the fragments that have been kept in close proximity to each other (through cross-linking) are the ones most likely to join. Finally, the cross-links are reversed and the newly ligated fragments of DNA are identified and quantified by PCR (the polymerase chain reaction, described in Chapter 8). By combining the frequency-of-association information generated by the 3C technique with DNA sequence information, structural models can be produced for the interphase conformation of chromosomes.
[Figure 4-57 labels: looped domain; folded 30-nm fiber; histone modifying enzymes; chromatin remodeling complexes; RNA polymerase; proteins forming chromosome scaffold]
Figure4-57 A model for the organizationof an interphasechromosome.A sectionof an interphase chromosomeis shownfoldedinto a seriesof loopeddomains,eachcontainingperhaps50,000-2O0,OOO nucleotidepairsof double-helical DNAcondensedinto a 30-nmfiber.Thechromatinin eachindividualloop isfurthercondensed throughpoorlyunderstood foldingprocesses that are reversed when the cellrequiresdirectaccess to the DNApackagedin the lJop.Neitherthe compositionof the postulatedchromosomal axisnor how the folded30-nmfiber is anchoredto it is clear.However, rn mitoticchromosomes the basesof the chromosomal loopsareenrichedboth in condensins and in DNAtopoisomerase ll enzymes, two proteinsthat mayform muchof the axisat metaphase (seeFigure4-74).
different lengths. A typical loop might contain between 50,000 and 200,000 nucleotide pairs of DNA, although loops of a million nucleotide pairs have also been suggested (Figure 4-57).
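To get a feel for these loop sizes, one can estimate how many loops a single chromosome would contain. The Python sketch below is a rough illustration only: the chromosome size of 100 million nucleotide pairs is a hypothetical round number, and the calculation assumes, unrealistically, that all loops are the same size.

```python
# Rough count of looped domains in one chromosome.
# The chromosome size is a hypothetical round number, not a value from the text.

def loops_per_chromosome(chromosome_bp, loop_bp):
    """Number of loops if the whole chromosome were divided into equal loops."""
    if loop_bp <= 0:
        raise ValueError("loop size must be positive")
    return chromosome_bp // loop_bp


CHROMOSOME_BP = 100_000_000  # hypothetical: 100 million nucleotide pairs

# Loop sizes quoted in the text: 50,000, 200,000, and one million nucleotide pairs.
loop_counts = {
    loop_bp: loops_per_chromosome(CHROMOSOME_BP, loop_bp)
    for loop_bp in (50_000, 200_000, 1_000_000)
}

for loop_bp, count in loop_counts.items():
    print(f"{loop_bp:>9,} nucleotide pairs per loop -> about {count:,} loops")
```

Under these assumptions the chromosome would hold somewhere between a hundred and a few thousand loops, depending on the loop size chosen.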
Polytene Chromosomes Are Uniquely Useful for Visualizing Chromatin Structures
Certain giant insect cells have grown to their enormous size through multiple cycles of DNA synthesis without cell division. Such cells with more than the nor-
chromosome.
pairs, while a thick band may contain 200,000 nucleotide pairs in each of its chromatin strands. The chromatin in each band appears dark because the DNA is more condensed than the DNA in interbands; it may also contain a higher proportion of proteins (Figure 4-59). There are approximately 3700 bands and 3700 interbands in the complete set of Drosophila polytene chromosomes. The bands can be recognized by their different thicknesses and spacings, and each one has been given a number to generate a chromosome "map" that has been indexed to the finished genome sequence of this fly. The Drosophila polytene chromosomes provide a good starting point for examining how chromatin is organized on a large scale. In the previous section, we saw that there are many forms of chromatin, each of which contains nucleosomes with a different combination of modified histones. By reading this histone code, specific sets of non-histone proteins assemble on the nucleosomes to
THE GLOBAL STRUCTURE OF CHROMOSOMES
Figure 4-58 The entire set of polytene chromosomes in one Drosophila salivary cell. In this drawing of a light micrograph, the giant chromosomes have been spread out for viewing by squashing them against a microscope slide. Drosophila has four chromosomes, and there are four different chromosome pairs present. But each chromosome is tightly paired with its homolog (so that each pair appears as a single structure), which is not true in most nuclei (except in meiosis). Each chromosome has undergone multiple rounds of replication, and the homologues and all their duplicates have remained in exact register with each other, resulting in huge chromatin cables many DNA strands thick. The four polytene chromosomes are normally linked together by heterochromatic regions near their centromeres that aggregate to create a single large chromocenter (pink region). In this preparation, however, the chromocenter has been split into two halves by the squashing procedure used. (Adapted from T.S. Painter, J. Hered. 25:465-476, 1934. With permission from Oxford University Press.)
[Figure 4-58 labels: right arm of chromosome 3; left arm of chromosome 3; X chromosome; normal mitotic chromosomes at same scale.]
[Figure 4-59 labels: bands; interbands; scale bars: (A) 2 µm; 1 µm.]
Figure 4-59 Micrographs of polytene chromosomes from Drosophila salivary glands. (A) Light micrograph of a portion of a chromosome. The DNA has been stained with a fluorescent dye, but a reverse image is presented here that renders the DNA black rather than white; the bands are clearly seen to be regions of increased DNA concentration. This chromosome has been processed by a high pressure treatment so as to show its distinct pattern of bands and interbands more clearly. (B) An electron micrograph of a small section of a Drosophila polytene chromosome seen in thin section. Bands of very different thickness can be readily distinguished, separated by interbands, which contain less condensed chromatin. (A, adapted from D.V. Novikov, I. Kireev and A.S. Belmont, Nat. Methods 4:483-485, 2007. With permission from Macmillan Publishers Ltd. B, courtesy of Veikko Sorsa.)
affect biological function in different ways. Some of these non-histone proteins can spread for long distances along the DNA, imparting a similar chromatin structure to contiguous regions of the genome (see Figure 4-46). Thus, in some regions, all of the chromatin has a similar structure and is separated from neighboring domains by barrier proteins (see Figure 4-47). At low resolution, the interphase chromosome can therefore be considered as a mosaic of chromatin structures, each containing particular nucleosome modifications associated with a particular set of non-histone proteins. (At a higher level of resolution one would also emphasize the many sequence-specific DNA-binding proteins that will be described in Chapter 7.) This view of an interphase chromosome helps us to interpret the results obtained from studies of polytene chromosomes. By staining with highly specific antibodies, one can show that differently modified histones (Figure 4-60), as well as distinct sets of non-histone proteins, are located on different polytene chromosome bands. This suggests a powerful general strategy. By employing combinations of antibodies that bind tightly to each of the many different histone modifications that create the histone code (see Figure 4-39), it may be possible to determine which combinations of modifications specify particular types of chromatin domains. And by carrying out similar experiments with antibodies that recognize each of the hundreds of different non-histone proteins in chromatin, one can attempt to decipher the many different meanings encoded in histone modifications.
There Are Multiple Forms of Heterochromatin
Molecular studies have led to a reevaluation of our view of heterochromatin. For many decades, heterochromatin was thought to be a single entity defined by its highly condensed structure and its ability to silence genes permanently. But if we define heterochromatin as a form of compact chromatin that can silence genes, be epigenetically inherited, and spread along chromosomes to cause position effect variegation (see Figure 4-36), it is clear that different types of heterochromatin exist. In fact, we have already considered three of these types in discussing the human centromere (see Figure 4-50). Each domain of heterochromatin is thought to be formed by the cooperative assembly of a set of non-histone proteins. For example, classical pericentromeric heterochromatin contains more than six such proteins, including heterochromatin protein 1 (HP1), whereas the so-called Polycomb form of heterochromatin contains a similar number of proteins in a non-overlapping set (PcG proteins). There are hundreds of small blocks of heterochromatin spread across the arms of Drosophila polytene chromosomes, as identified by their late replication (discussed in Chapter 5). Antibody staining of these regions of heterochromatin suggests that the known forms of heterochromatin can account for no more than half of the heterochromatic polytene bands (Figure 4-61). Thus, other types of heterochromatin must exist whose protein composition is not known. It is likely
Figure 4-60 The pattern of histone modifications on Drosophila polytene chromosomes. Antibodies that specifically recognize different histone modifications can reveal where each modification is found with reference to the many bands and interbands on these chromosomes. In the two preparations shown here, the positions of two different markings on histone H3 tails are compared. In both cases, the antibody labeling the modified histone is green, and the DNA is stained red. Only a small region surrounding each chromocenter is shown. (A) Dimethyl Lys9 (green) is a histone modification associated with the pericentric heterochromatin. It is seen to be associated with the chromocenter. (B) Acetylated Lys9 (green) is a modification that is concentrated in histones associated with active genes. It is seen to be present in numerous bands on the chromosome arms, but not in the heterochromatic chromocenter. Similar experiments can be carried out to position many other modified histones, as well as the non-histone proteins (see, for example, Figure 22-45 for chromosomes stained for Polycomb). (Adapted from A. Ebert, S. Lein, G. Schotta and G. Reuter, Chromosome Res. 14:377-392, 2006. With permission from Springer.)
[Figure 4-61 bar chart. Axis: percent of heterochromatin sites (ticks at 50, 75, 100). Legible categories: only PcG protein; both HP1 and PcG protein; neither HP1 nor PcG protein.]
Figure 4-61 Evidence for multiple forms of heterochromatin. In this study, 240 late-replicating sites on the Drosophila polytene chromosome arms were examined for the presence of two non-histone proteins. These proteins are known to help compact two different forms of heterochromatin (see text). As indicated, antibody staining suggests that roughly half of the sites are packaged into forms of heterochromatin that are different from either of these two. Experiments such as these demonstrate that we have a great deal more to learn about the packaging of DNA in eucaryotes. (Data from I.F. Zhimulev and E.S. Belyaeva, BioEssays 25:1040-1051, 2003. With permission from John Wiley & Sons.)
that each of these types of heterochromatin is differently regulated and has different roles in the cell. The chromatin structure in each domain ultimately depends on the proteins that bind to specific DNA sequences, and these are known to vary depending on the cell type and its stage of development in a multicellular organism. Thus, both the pattern of chromatin domains and their individual compositions (nucleosome modifications plus non-histone proteins) can vary between tissues. These differences make different genes accessible for genetic readout, helping to explain the cell diversification that accompanies embryonic development (described in Chapter 22). Comparisons of the polytene chromosomes in two different tissues of a fly lend support to this general idea: although the patterns of bands and interbands are largely the same, there are reproducible differences.
Chromatin Loops Decondense When the Genes Within Them Are Expressed
When an insect progresses from one developmental stage to another, distinctive chromosome puffs arise and old puffs recede in its polytene chromosomes as new genes become expressed and old ones are turned off (Figure 4-62). From inspection of each puff when it is relatively small and the banding pattern is still discernible, it seems that most puffs arise from the decondensation of a single chromosome band. The individual chromatin fibers that make up a puff can be visualized with an electron microscope. In favorable cases, loops are seen, much like those observed in the amphibian lampbrush chromosomes discussed above. When not expressed, the loop of DNA assumes a thickened structure, possibly a folded 30-nm fiber, but when gene expression is occurring, the loop becomes more extended. In electron micrographs, the chromatin located on either side of the decondensed loop appears considerably more compact, suggesting that a loop constitutes a distinct functional domain of chromatin structure. Observations made in human cells also suggest that highly folded loops of chromatin expand to occupy an increased volume when a gene within them is expressed. For example, quiescent chromosome regions from 0.4 to 2 million nucleotide pairs in length appear as compact dots in an interphase nucleus when visualized by fluorescence microscopy using FISH or other technologies. However, the same DNA is seen to occupy a larger territory when its genes are expressed, with elongated, punctate structures replacing the original dot.
Figure 4-62 RNA synthesis in polytene chromosome puffs. An autoradiograph of a single puff in a polytene chromosome from the salivary glands of the freshwater midge C. tentans. As outlined in Chapter 1 and described in detail in Chapter 6, the first step in gene expression is the synthesis of an RNA molecule using the DNA as a template. The decondensed portion of the chromosome is undergoing RNA synthesis and has become labeled with 3H-uridine (see p. 603), an RNA precursor molecule that is incorporated into growing RNA chains. (Courtesy of José Bonner.)
Chromatin Can Move to Specific Sites Within the Nucleus to Alter Gene Expression
New ways of visualizing individual chromosomes have shown that each of the 46 interphase chromosomes in a human cell tends to occupy its own discrete territory within the nucleus (Figure 4-63). However, pictures such as these present only an average view of the DNA in each chromosome. Experiments that specifically localize the heterochromatic regions of a chromosome reveal that they are often closely associated with the nuclear lamina, regardless of the chromosome examined. And DNA probes that preferentially stain gene-rich regions of human
Figure 4-63 Simultaneous visualization of the chromosome territories for all of the human chromosomes in a single interphase nucleus. A FISH analysis using a different mixture of fluorochromes for marking the DNA of each chromosome, detected with seven color channels in a fluorescence microscope, allows each chromosome to be distinguished in three-dimensional reconstructions. Below the micrograph, each chromosome is identified in a schematic of the actual image. Note that the two homologous chromosomes (e.g., the two copies of chromosome 9) are not in general co-located. (From M.R. Speicher and N.P. Carter, Nat. Rev. Genet. 6:782-792, 2005. With permission from Macmillan Publishers Ltd.)
Figure 4-64 The distribution of gene-rich regions of the human genome in an interphase nucleus. Gene-rich regions have been visualized with a fluorescent probe that hybridizes to the Alu interspersed repeat, which is present in more than a million copies in the human genome (see Figure 5-75). For unknown reasons, these sequences cluster in chromosomal regions rich in genes. In this representation, regions enriched for the Alu sequence are green, regions depleted for these sequences are red, while the average regions are yellow. The gene-rich regions are seen to be depleted in the DNA near the nuclear envelope. (From A. Bolzer et al., PLoS Biol. 3:826-842, 2005. With permission from Public Library of Science.)
chromosomes produce a striking picture of the interphase nucleus that presumably reflects different average positions for active and inactive genes (Figure 4-64). A variety of different types of experiments have led to the conclusion that the position of a gene in the interior of the nucleus changes when it becomes highly expressed. Thus, a region that becomes very actively transcribed is often found to extend out of its chromosome territory, as if in an extended loop (Figure 4-65). We will see in Chapter 6 that the initiation of transcription, the first step in gene expression, requires the assembly of over 100 proteins, and it makes sense that this would occur most rapidly in regions of the nucleus particularly rich in these proteins. More generally, it is clear that the nucleus is very heterogeneous, with functionally different regions to which portions of chromosomes can move as they are subjected to different biochemical processes, such as when their gene expression changes (Figure 4-66). There is evidence that some of these nuclear regions are marked with different inositol phospholipids, reminiscent of the way that the same lipids are used to distinguish different membranes in the cytoplasm (see Figure 13-11). But what these lipids are attached to in the interior of the nucleus is a mystery, as the only known lipid-rich environments are the lipid bilayers of the nuclear envelope.
[Figure 4-65 labels: nuclear envelope; homologous chromosomes detected by hybridization techniques; (B) GENE OFF, GENE ON; scale bar 5 µm.]
Figure 4-65 An effect of high levels of gene expression on the intranuclear location of chromatin. (A) Fluorescence micrographs of human nuclei showing how the position of a gene changes when it becomes highly transcribed. The region of the chromosome adjacent to the gene (red) is seen to leave its chromosomal territory (green) only when it is highly active. (B) Schematic representation of a large loop of chromatin that expands when the gene is on, and contracts when the gene is off. Other genes that are less actively expressed can be shown by the same methods to remain inside their chromosomal territory when transcribed. (From J.R. Chubb and W.A. Bickmore, Cell 112:403-406, 2003. With permission from Elsevier.)
[Figure 4-66 labels: nuclear neighborhood for gene silencing; nuclear neighborhood for gene expression; nuclear envelope; CELL CHANGES IN RESPONSE TO SIGNALS.]
Figure 4-66 The movement of genes to different regions of the nucleus when their expression changes. The interior of the nucleus is very heterogeneous, and different nuclear neighborhoods are known to have distinct effects on gene expression. Movements such as those indicated here presumably reflect changes in the binding affinities that the chromatin and RNA molecules surrounding a gene have for different nuclear neighborhoods. It is thought that the movement is driven by diffusion and does not require a directed movement process, inasmuch as each region of a chromosome can be seen to undergo constant random motion when marked in a way that allows its position to be followed in a living cell.
Networks of Macromolecules Form a Set of Distinct Biochemical Environments inside the Nucleus
In Chapter 6, we describe the function of a variety of subcompartments that are present within the nucleus. The largest and most obvious of these is the nucleolus, a structure well known to microscopists even in the 19th century (see Figure 4-9). Nucleolar regions consist of networks of RNAs and proteins surrounding transcribing ribosomal RNA genes, often existing as multiple nucleoli. The nucleolus is the cell's site of ribosome assembly and maturation, as well as the place where many other specialized reactions occur. A variety of less obvious organelles are also present inside the nucleus. For example, spherical structures called Cajal bodies and interchromatin granule clusters are present in most plant and animal cells (Figure 4-67). Like the nucleolus, these organelles are composed of selected protein and RNA molecules that bind together to create networks that are highly permeable to other protein and RNA molecules in the surrounding nucleoplasm (Figure 4-68). Structures such as these can create distinct biochemical environments by immobilizing select groups of macromolecules, as can other networks of proteins and RNA molecules associated with nuclear pores and with the nuclear envelope. In principle, this allows the molecules that enter these spaces to be processed with great efficiency through complex reaction pathways. Highly permeable, fibrous networks of this sort can thereby impart many of the kinetic advantages of compartmentalization (see p. 186) to reactions that take place in the nucleus (Figure 4-69A). However, unlike the membrane-bound compartments in the cytoplasm (discussed in Chapter 12), these nuclear subcompartments, lacking a lipid bilayer membrane, can neither concentrate nor exclude specific small molecules. The cell has a remarkable ability to construct distinct biochemical environments inside the nucleus.
Those thus far mentioned facilitate various aspects of gene expression to be discussed in Chapter 6 (see Figure 6-49). Like the nucleolus, these subcompartments appear to form only as needed, and they create a high local concentration of the many different enzymes and RNA molecules needed for a particular process. In an analogous way, when DNA is damaged by irradiation, the set of enzymes needed to carry out DNA repair are observed to congregate in discrete foci inside the nucleus, creating "repair factories" (see Figure 5-60). And nuclei often contain hundreds of discrete foci representing factories for DNA or RNA synthesis. It seems likely that all of these entities make use of the type of tethering illustrated in Figure 4-69B, where long flexible lengths of polypeptide chain (or some other polymer) are interspersed with binding sites that concentrate the multiple proteins and/or RNA molecules that are needed to catalyze a particular process. Not surprisingly, tethers are similarly used to help to speed biological processes
Figure 4-67 Electron micrograph showing two very common fibrous nuclear subcompartments. The large sphere here is a Cajal body. The smaller darker sphere is an interchromatin granule cluster, also known as a speckle (see also Figure 6-49). These "subnuclear organelles" are from the nucleus of a Xenopus oocyte. (From K.E. Handwerger and J.G. Gall, Trends Cell Biol. 16:19-26, 2006. With permission from Elsevier.)
[Figure 4-68 labels: molecular weight of fluorescent dextran in nucleus: 70,000; 500,000. Scale bar 10 µm.]
in the cytoplasm, increasing specific reaction rates (for example, see Figure 16-38). Is there also an intranuclear framework, analogous to the cytoskeleton, on which chromosomes and other components of the nucleus are organized? The nuclear matrix, or scaffold, has been defined as the insoluble material left in the nucleus after a series of biochemical extraction steps. Many of the proteins and RNA molecules that form this insoluble material are likely to be derived from the fibrous subcompartments of the nucleus just discussed, while others seem to be proteins that help to form the base of chromosomal loops or to attach chromosomes to other structures in the nucleus. Whether or not the nucleus also contains
Figure 4-69 Effective compartmentalization without a bilayer membrane. (A) Schematic illustration of the organization of a spherical subnuclear organelle (left) and of a postulated similarly organized subcompartment just beneath the nuclear envelope (right). In both cases, RNAs and/or proteins (gray) associate to form highly porous, gel-like structures that contain binding sites for other specific proteins and RNA molecules (colored objects). (B) How the tethering of a selected set of proteins and RNA molecules to long flexible polymer chains, as in A, could create "staging areas" that greatly speed the rates of reactions in subcompartments of the nucleus. The reactions catalyzed will depend on the particular macromolecules that are localized by the tethering. The same type of rate accelerations are of course expected for similar subcompartments established elsewhere in the cell (see also Figure 3-80C).
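The benefit of tethering can be made concrete by estimating the local concentration of a molecule confined to a small volume. The Python sketch below is a back-of-envelope illustration only: the 50-nm confinement radius and the 5-µm nuclear radius are hypothetical values chosen for the example, and a real tether does not confine a molecule to a perfect sphere.

```python
import math

AVOGADRO = 6.022e23  # molecules per mole


def local_concentration_molar(n_molecules, radius_nm):
    """Molar concentration of n molecules confined to a sphere of the given radius."""
    radius_dm = radius_nm * 1e-8  # 1 nm = 1e-8 dm, and 1 dm^3 = 1 liter
    volume_liters = (4.0 / 3.0) * math.pi * radius_dm ** 3
    return n_molecules / (AVOGADRO * volume_liters)


# Hypothetical sizes: a tether that keeps one molecule within 50 nm of its
# binding site, compared with the same molecule free in a nucleus of radius 5 µm.
tethered = local_concentration_molar(1, radius_nm=50)
free = local_concentration_molar(1, radius_nm=5000)

print(f"tethered: {tethered * 1e6:.1f} micromolar")
print(f"free in nucleus: {free * 1e12:.1f} picomolar")
print(f"enrichment: about {tethered / free:,.0f}-fold")
```

Because volume scales with the cube of the radius, shrinking the accessible radius 100-fold raises the effective concentration about a million-fold in this idealized picture, which is the sense in which tethering can create "staging areas" for reactions.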
Figure 4-68 An experiment showing that the subnuclear organelles are highly permeable to macromolecules. In these micrographs of a living oocyte nucleus, the top row compares the fluorescence of the interiors of nucleoli, Cajal bodies, and speckles to the fluorescence of the surrounding nucleoplasm, 12 hours after fluorescent dextrans of the indicated molecular weight had been injected into the nucleoplasm. The brightness of each organelle reflects its permeability, with the most permeable organelle being the brightest. For comparison, the bottom row presents normal light micrographs of the same microscope fields, with the nucleolus in each field of view marked brown to distinguish it. Cajal bodies can be seen to be more permeable than nucleoli. However, quantitation shows that a great deal of dextran enters each organelle, even for the largest dextran tested. (From K.E. Handwerger, J.A. Cordero and J.G. Gall, Mol. Biol. Cell 16:202-211, 2005. With permission from American Society of Cell Biology.)
Figure 4-70 A typical mitotic chromosome at metaphase. Each sister chromatid contains one of two identical daughter DNA molecules generated earlier in the cell cycle by DNA replication (see also Figure 17-26).
[Figure 4-70 label: chromosome.]
long filaments that form organized tracks on which nuclear components can move, analogous to some of the filaments in the cytoplasm, is still disputed.
Mitotic Chromosomes Are Formed from Chromatin in Its Most Condensed State
Having discussed the dynamic structure of interphase chromosomes, we now turn to mitotic chromosomes. The chromosomes from nearly all eucaryotic cells become readily visible by light microscopy during mitosis, when they coil up to form highly condensed structures. This condensation reduces the length of a typical interphase chromosome only about tenfold, but it produces a dramatic change in chromosome appearance. Figure 4-70 depicts a typical mitotic chromosome at the metaphase stage of mitosis (for the stages of mitosis, see Figure 17-3). The two daughter DNA molecules produced by DNA replication during interphase of the cell-division cycle are separately folded to produce two sister chromosomes, or sister chromatids, held together at their centromeres (see also Figure 4-50). These chromosomes are normally covered with a variety of molecules, including large amounts of RNA-protein complexes. Once this covering has been stripped away, each chromatid can be seen in electron micrographs to be organized into loops of chromatin emanating from a central scaffolding (Figure 4-71). Experiments using DNA hybridization to detect specific DNA sequences demonstrate that the order of visible features along a mitotic chromosome at least roughly reflects the order of genes along the DNA molecule. Mitotic chromosome condensation can thus be thought of as the final level in the hierarchy of chromosome packaging (Figure 4-72). The compaction of chromosomes during mitosis is a highly organized and dynamic process that serves at least two important purposes. First, when condensation is complete (in metaphase), sister chromatids have been disentangled from each other and lie side by side. Thus, the sister chromatids can easily separate when the mitotic apparatus begins pulling them apart.
Second, the compaction of chromosomes protects the relatively fragile DNA molecules from being broken as they are pulled to separate daughter cells. The condensation of interphase chromosomes into mitotic chromosomes begins in early M phase, and it is intimately connected with the progression of the cell cycle, as discussed in detail in Chapter 17. During M phase, gene expression shuts down, and specific modifications are made to histones that help to reorganize the chromatin as it compacts. The compaction is aided by a class of proteins called condensins, which use the energy of ATP hydrolysis to help coil the two DNA molecules in an interphase chromosome to produce the two chromatids of a mitotic chromosome. Condensins are large protein complexes built from SMC protein dimers: these dimers form when two stiff, elongated protein monomers join at their tails to form a hinge, leaving two globular head domains at the other end that bind DNA and hydrolyze ATP (Figure 4-73). When added to purified DNA, condensins can make large right-handed loops in DNA molecules in a reaction that requires ATP. Although it is not yet known how they act on chromatin, the coiling model shown in Figure 4-73C is based on the fact that condensins are a major structural component that end up at the core of metaphase chromosomes, with about one molecule of condensin for every
Figure 4-71 A scanning electron micrograph of a region near one end of a typical mitotic chromosome. Each knoblike projection is believed to represent the tip of a separate looped domain. Note that the two identical paired chromatids (drawn in Figure 4-70) can be clearly distinguished. (From M.P. Marsden and U.K. Laemmli, Cell 17:849-858, 1979. With permission from Elsevier.)
[Figure 4-71 labels: chromatid; scale bar 0.1 µm.]
[Figure 4-72 labels: "beads-on-a-string" form of chromatin (11 nm); 30-nm chromatin fiber of packed nucleosomes (30 nm); section of chromosome in extended form (300 nm); condensed section of chromosome (700 nm); entire mitotic chromosome (1400 nm). NET RESULT: EACH DNA MOLECULE HAS BEEN PACKAGED INTO A MITOTIC CHROMOSOME THAT IS 10,000-FOLD SHORTER THAN ITS EXTENDED LENGTH.]
Figure 4-73 The SMC proteins in condensins. (A) Electron micrographs of a purified SMC dimer. (B) The structure of a SMC dimer. The long central region of this protein is an antiparallel coiled-coil (see Figure 3-9) with a flexible hinge in its middle. (C) A model for the way in which the SMC proteins in condensins might compact chromatin. In reality, SMC proteins are components of a much larger condensin complex. It has been proposed that, in the cell, condensins coil long strings of looped chromatin domains (see Figure 4-57). In this way, the condensins could form a structural framework that maintains the DNA in a highly organized state during metaphase of the cell cycle. (A, courtesy of H.P. Erickson; B and C, adapted from T. Hirano, Nat. Rev. Mol. Cell Biol. 7:311-322, 2006. With permission from Macmillan Publishers Ltd.)
Figure 4-72 Chromatin packing. This model shows some of the many levels of chromatin packing postulated to give rise to the highly condensed mitotic chromosome.
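The 10,000-fold figure can be checked with simple arithmetic. The Python sketch below is an illustration under stated assumptions: the value of 0.34 nm per nucleotide pair is the standard rise of B-form DNA rather than a number given in this passage, and the chromosome of 150 million nucleotide pairs is a hypothetical example.

```python
NM_PER_BP = 0.34  # assumed rise per nucleotide pair of B-form DNA


def extended_length_nm(n_bp):
    """Length of the DNA double helix if stretched out end to end."""
    return n_bp * NM_PER_BP


def packaged_length_nm(n_bp, fold_compaction):
    """Length remaining after compaction by the given factor."""
    return extended_length_nm(n_bp) / fold_compaction


chromosome_bp = 150_000_000  # hypothetical chromosome size

extended_cm = extended_length_nm(chromosome_bp) * 1e-7  # 1 nm = 1e-7 cm
mitotic_um = packaged_length_nm(chromosome_bp, 10_000) / 1000.0  # nm to µm

print(f"extended DNA: {extended_cm:.1f} cm")
print(f"mitotic chromosome: {mitotic_um:.1f} µm")
```

Under these assumptions, about 5 cm of DNA ends up in a mitotic chromosome roughly 5 µm long, which is the scale of structure visible in a light microscope.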
Figure 4-74 The location of condensin in condensed mitotic chromosomes. (A) Fluorescence micrograph of a human chromosome at mitosis, stained with an antibody that localizes condensin. In chromosomes that are this highly condensed, the condensin is seen to be concentrated in punctate structures along the chromosome axis. Similar experiments show a similar location for DNA topoisomerase II, an enzyme that makes reversible double-strand breaks in DNA that allow one DNA double helix to [...] is seen in cross section, with the chromosome axis perpendicular to the plane of the paper. (A, from K. Maeshima and U.K. Laemmli, Dev. Cell 4:467-480, 2003. With permission from Elsevier. B, courtesy of U.K. Laemmli, from K. Maeshima, M. Eltsov and U.K. Laemmli, Chromosoma 114:365-375, 2005. With permission from Springer.)
10,000 nucleotides of DNA (Figure 4-74). When condensins are experimentally depleted from a cell, chromosome condensation still occurs, but the process is abnormal.
Summary
Chromosomes are generally decondensed during interphase, so that the details of their structure are difficult to visualize. Notable exceptions are the specialized lampbrush chromosomes of vertebrate oocytes and the polytene chromosomes in the giant secretory cells of insects. Studies of these two types of interphase chromosomes suggest that each long DNA molecule in a chromosome is divided into a large number of discrete domains organized as loops of chromatin, each loop probably consisting of a 30-nm chromatin fiber that is compacted by further folding. When genes contained in a loop are expressed, the loop unfolds and allows the cell's machinery access to the DNA. Interphase chromosomes occupy discrete territories in the cell nucleus; that is, they are not extensively intertwined. Euchromatin makes up most of interphase chromosomes and, when not being transcribed, it probably exists as tightly folded 30-nm fibers. However, euchromatin is interrupted by stretches of heterochromatin, in which the 30-nm fibers are subjected to additional packing that usually renders it resistant to gene expression. Heterochromatin exists in several forms, some of which are found in large blocks in and around centromeres and near telomeres. But heterochromatin is also present at many other positions on chromosomes, where it can serve to regulate developmentally important genes. The interior of the nucleus is highly dynamic, with heterochromatin often positioned near the nuclear envelope and loops of chromatin moving away from their chromosome territory when genes are very highly expressed. This reflects the existence of nuclear subcompartments, where different sets of biochemical reactions are facilitated by an increased concentration of selected proteins and RNAs. The components involved in forming a subcompartment can self-assemble into discrete organelles such as nucleoli or Cajal bodies; they can also be tethered to fixed structures such as the nuclear envelope.
During mitosis, gene expression shuts down and all chromosomes adopt a highly condensed conformation in a process that begins early in M phase to package the two DNA molecules of each replicated chromosome as two separately folded chromatids. This process is accompanied by histone modifications that facilitate chromatin packing. However, satisfactory completion of this orderly process, which reduces the end-to-end distance of each DNA molecule from its interphase length by an additional factor of ten, requires condensin proteins.
HOW GENOMES EVOLVE
In this chapter, we have discussed the structure of genes and the ways that they are packaged and arranged in chromosomes. In this final section, we provide an overview of some of the ways that genes and genomes have evolved over time to produce the vast diversity of modern-day life forms on our planet. Genome
sequencing has revolutionized our view of the process of molecular evolution, uncovering an astonishing wealth of information about specific family relationships among organisms, as well as illuminating evolutionary mechanisms more generally.
It is perhaps not surprising that genes with similar functions can be found in a diverse range of living things. But the great revelation of the past 25 years has been the discovery that the actual nucleotide sequences of many genes are sufficiently well conserved that homologous genes (that is, genes that are similar in both their nucleotide sequence and function because of a common ancestry) can often be recognized across vast phylogenetic distances. For example, unmistakable homologs of many human genes are easy to detect in such organisms as nematode worms, fruit flies, yeasts, and even bacteria. In many cases, the resemblance is so close that the protein-coding portion of a yeast gene can be substituted with its human homolog, even though we and yeast are separated by more than a billion years of evolutionary history.
As emphasized in Chapter 3, the recognition of sequence similarity has become a major tool for inferring gene and protein function. Although finding a sequence match does not guarantee similarity in function, it has proven to be an excellent clue. Thus, it is often possible to predict the function of genes in humans for which no biochemical or genetic information is available simply by comparing their nucleotide sequences with the sequences of genes in other organisms.
In general, gene sequences are more tightly conserved than is overall genome structure. As we saw earlier, other features of genome organization such as genome size, number of chromosomes, order of genes along chromosomes, abundance and size of introns, and amount of repetitive DNA are found to differ greatly among organisms, as does the number of genes that an organism contains.
The number of genes is only very roughly correlated with the phenotypic complexity of an organism (see Table 1-1). Much of the increase in gene number observed with increasing biological complexity involves the expansion of families of closely related genes, an observation that establishes gene duplication and divergence as major evolutionary processes. Indeed, it is likely that all present-day genes are descendants, via the processes of duplication, divergence, and reassortment of gene segments, of a few ancestral genes that existed in early life forms.
Genome Alterations Are Caused by Failures of the Normal Mechanisms for Copying and Maintaining DNA
Cells in the germline do not have specialized mechanisms for creating changes in the structures of their genomes: evolution depends instead on accidents and mistakes followed by nonrandom survival. Most of the genetic changes that occur result simply from failures in the normal mechanisms by which genomes are copied or repaired when damaged, although the movement of transposable DNA elements also plays an important role. As we will discuss in Chapter 5, the mechanisms that maintain DNA sequences are remarkably precise, but they are not perfect. For example, because of the elaborate DNA-replication and DNA-repair mechanisms that enable DNA sequences to be inherited with extraordinary fidelity, along a given line of descent only about one nucleotide pair in a thousand is randomly changed in the germline every million years. Even so, in a population of 10,000 diploid individuals, every possible nucleotide substitution will have been "tried out" on about 20 occasions in the course of a million years, a short span of time in relation to the evolution of species.
Errors in DNA replication, DNA recombination, or DNA repair can lead either to simple changes in DNA sequence (such as the substitution of one base pair for another) or to large-scale genome rearrangements such as deletions, duplications, inversions, and translocations of DNA from one chromosome to another. In addition to these failures of the genetic machinery, the various mobile DNA elements that will be described in Chapter 5 are an important source of genomic change (see Table 5-3, p. 318). These transposable DNA elements (transposons)
are parasitic DNA sequences that colonize genomes and can spread within them. In the process, they often disrupt the function or alter the regulation of existing genes. On occasion, they can even create altogether novel genes through fusions between transposon sequences and segments of existing genes. Over long periods of evolutionary time, transposons have profoundly affected the structure of genomes. In fact, nearly half of the DNA in the human genome has recognizable sequence similarity with known transposon sequences, thereby indicating that these sequences are remnants of past transposition events (see Figure 4-17). Even more of our genome is no doubt derived from transposition events that occurred so long ago (>10^8 years) that the sequences can no longer be traced to transposons.
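The estimate given earlier in this section, that every possible substitution is "tried out" about 20 times per million years in a population of 10,000 diploid individuals, follows from simple arithmetic. The sketch below reproduces it in Python. The function and variable names are illustrative only, and the calculation assumes that each of the two genome copies carried by every individual accumulates changes independently at the quoted rate of about one change per thousand nucleotide pairs per million years.

```python
from fractions import Fraction

def expected_changes_per_site(diploid_individuals, rate_per_site_per_myr, myr=1):
    """Expected number of times a given nucleotide position is changed
    somewhere in the population over `myr` million years."""
    genome_copies = 2 * diploid_individuals  # two genome copies per diploid individual
    return genome_copies * rate_per_site_per_myr * myr

# Rate quoted in the text: ~1 change per 1000 nucleotide pairs per million years.
rate = Fraction(1, 1000)
trials = expected_changes_per_site(10_000, rate)
print(trials)  # 20
```

Exact fractions are used instead of floating-point numbers so that the result is not blurred by rounding.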
The Genome Sequences of Two Species Differ in Proportion to the Length of Time That They Have Separately Evolved
The differences between the genomes of species alive today have accumulated over more than 3 billion years. Lacking a direct record of changes over time, we can nevertheless reconstruct the process of genome evolution from detailed comparisons of the genomes of contemporary organisms.
The basic tool of comparative genomics is the phylogenetic tree. A simple example is the tree describing the divergence of humans from the great apes (Figure 4-75). The primary support for this tree comes from comparisons of gene or protein sequences. For example, comparisons between the sequences of human genes or proteins and those of the great apes typically reveal the fewest differences between human and chimpanzee and the most between human and orangutan.
For closely related organisms such as humans and chimpanzees, it is relatively easy to reconstruct the gene sequences of the extinct, last common ancestor of the two species (Figure 4-76). The close similarity between human and chimpanzee genes is mainly due to the short time that has been available for the accumulation of mutations in the two diverging lineages, rather than to functional constraints that have kept the sequences the same. Evidence for this view comes from the observation that even DNA sequences whose nucleotide order is functionally unconstrained, such as the sequences that code for the fibrinopeptides (see p. 264) or the third position of "synonymous" codons (codons specifying the same amino acid; see Figure 4-76), are nearly identical in humans and chimpanzees.
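The reconstruction of an ancestral sequence described for Figure 4-76 can be sketched in a few lines of Python. The two sequences below are the first 60 nucleotides of the human and chimpanzee leptin coding sequences as given in that figure. The function names are illustrative, the tie-breaker is applied codon by codon as a simplification, and the gorilla codon used here is a hypothetical placeholder rather than a value read from the figure.

```python
# First 60 nucleotides of the leptin coding sequences quoted in Figure 4-76.
HUMAN = "GTGCCCATCCAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGG"
CHIMP = "GTGCCCATCCAAAAAGTCCAGGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGG"

def codons(seq):
    """Split a coding sequence into consecutive 3-nucleotide codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def infer_ancestor(seq_a, seq_b, outgroup_codons):
    """Minimal-mutation reconstruction: keep every codon on which the two
    species agree; where they differ, use the outgroup codon as tie-breaker.
    `outgroup_codons` maps a codon index to the outgroup codon at that index."""
    ancestor = []
    for i, (a, b) in enumerate(zip(codons(seq_a), codons(seq_b))):
        ancestor.append(a if a == b else outgroup_codons[i])
    return "".join(ancestor)

# Indices (0-based) of the codons at which human and chimpanzee differ.
variant = [i for i, (a, b) in enumerate(zip(codons(HUMAN), codons(CHIMP))) if a != b]
print(variant)  # [6]

# Hypothetical outgroup codon, for illustration only.
gorilla = {6: "CAA"}
print(infer_ancestor(HUMAN, CHIMP, gorilla) == HUMAN)  # True
```

In this stretch the two species differ in a single codon (CAA versus CAG), and both versions encode glutamine, so the difference is synonymous.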
For much less closely related organisms, such as humans and chickens (which have evolved separately for about 300 million years), the sequence conservation found in genes is largely due to purifying selection (that is, selection that eliminates individuals carrying mutations that interfere with important genetic functions), rather than to an inadequate time for mutations to occur. As a result, protein-coding, RNA-coding, and regulatory sequences in the DNA are often remarkably conserved. In contrast, most DNA sequences in the human and chicken genomes have diverged so far due to multiple mutations that it is often impossible to align them with one another.
[Figure 4-75 tree labels: last common ancestor; human, chimpanzee, gorilla, orangutan]
Figure 4-75 A phylogenetic tree showing the relationship between the human and the great apes based on nucleotide sequence data. As indicated, the sequences of the genomes of all four species are estimated to differ from the sequence of the genome of their last common ancestor by a little over 1.5%. Because changes occur independently on both diverging lineages, pairwise comparisons reveal twice the sequence divergence from the last common ancestor. For example, human-orangutan comparisons typically show sequence divergences of a little over 3%, while human-chimpanzee comparisons show divergences of approximately 1.2%. (Modified from F.C. Chen and W.H. Li, Am. J. Hum. Genet. 68:444-456, 2001. With permission from University of Chicago Press.)
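The relation between branch divergence and pairwise divergence in Figure 4-75 can be written out as a short calculation. The sketch below is illustrative only: the function names are invented here, and the clock rate is an assumption, taken as the rough figure quoted earlier in this section (about one change per thousand nucleotide pairs per million years along a line of descent) and treated as constant across lineages.

```python
from fractions import Fraction

# Assumed clock: 1 change per 1000 sites per million years = 0.1% per Myr per lineage.
RATE_PERCENT_PER_MYR = Fraction(1, 10)

def pairwise_from_branch(branch_percent):
    """Two lineages change independently, so their pairwise difference is
    twice the divergence of each one from the last common ancestor."""
    return 2 * branch_percent

def split_time_myr(pairwise_percent, rate=RATE_PERCENT_PER_MYR):
    """Time since the last common ancestor under a constant clock."""
    return (pairwise_percent / 2) / rate

print(pairwise_from_branch(Fraction(3, 2)))  # 3  (human-orangutan, percent)
print(split_time_myr(Fraction(12, 10)))      # 6  (human-chimpanzee, million years)
print(split_time_myr(Fraction(3)))           # 15 (human-orangutan, million years)
```

Under these assumptions the human-chimpanzee value comes out at 6 million years, in line with the divergence time quoted later in this section for these two species.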
[Figure 4-76: alignment of the human and chimpanzee leptin coding sequences, with the encoded protein sequence and the gorilla codons at the five variant positions]
Figure 4-76 Tracing the ancestral sequence from a sequence comparison of the coding regions of human and chimpanzee leptin genes. Leptin is a hormone that regulates food intake and energy utilization in response to the adequacy of fat reserves. As indicated by the codons boxed in green, only 5 nucleotides (of 441 total) differ between these two sequences. Moreover, when the amino acids encoded by both the human and chimpanzee sequences are examined, in only one of the 5 positions does the encoded amino acid differ. For each of the 5 variant nucleotide positions, the corresponding sequence in the gorilla is also indicated. In two cases, the gorilla sequence agrees with the human sequence, while in three cases it agrees with the chimpanzee sequence. What was the sequence of the leptin gene in the last common ancestor? An evolutionary model that seeks to minimize the number of mutations postulated to have occurred during the evolution of the human and chimpanzee genes would assume that the leptin sequence of the last common ancestor was the same as the human and chimpanzee sequences when they agree; when they disagree, it would use the gorilla sequence as a tie-breaker. For convenience, only the first 300 nucleotides of the leptin coding sequences are given. The remaining 141 are identical between humans and chimpanzees.
Phylogenetic Trees Constructed from a Comparison of DNA Sequences Trace the Relationships of All Organisms
Integration of phylogenetic trees based on molecular sequence comparisons with the fossil record has led to the best available view of the evolution of modern life forms. The fossil record remains important as a source of absolute dates based on the decay of radioisotopes in the rock formations in which fossils are found. However, precise divergence times between species are difficult to establish from the fossil record, even for species that leave good fossils with distinctive morphology.
Such integrated phylogenetic trees suggest that changes in the sequences of particular genes or proteins tend to occur at a nearly constant rate, although rates that differ from the norm by as much as twofold are observed in particular lineages. As discussed above and in Chapter 5, this "molecular clock" runs most rapidly and regularly in sequences that are not subject to purifying selection, such as intergenic regions, portions of introns that lack splicing or regulatory signals, and genes that have been irreversibly inactivated by mutation (the so-called pseudogenes). The clock runs most slowly for sequences that are subject to strong functional constraints: for example, the amino acid sequences of proteins such as actin that engage in specific interactions with large numbers of other proteins and whose structure is therefore highly constrained (see, for example, Figure 16-18).
Occasionally, rapid change is seen in a previously highly conserved sequence. As discussed later in this chapter, such episodes are especially interesting because they are thought to reflect periods of strong positive selection for mutations that conferred a selective advantage in the particular lineage where the rapid change occurred.
Molecular clocks run at rates that are determined both by mutation rates and by the degree of purifying selection on particular sequences. Therefore, a completely different calibration is required for those genes replicated and repaired by different systems within cells.
Most notably, in animals, although not in plants, clocks based on functionally unconstrained mitochondrial DNA sequences run much faster than clocks based on functionally unconstrained nuclear sequences, due to an unusually high mutation rate in animal mitochondria.
Molecular clocks have a finer time resolution than the fossil record and are a more reliable guide to the detailed structure of phylogenetic trees than are classical methods of tree construction, which are based on comparisons of the morphology and development of different species. For example, the precise relationship among the great-ape and human lineages was not settled until sufficient molecular-sequence data accumulated in the 1980s to produce the tree that was shown in Figure 4-75. And with huge amounts of DNA sequence now determined from a variety of mammals, much better estimates of our relationship to them are being obtained (Figure 4-77).
[Figure 4-77 tree labels: ancestor; opossum, wallaby, armadillo, hedgehog, bat, cow, sheep, Indian muntjak, pig, rabbit, mouse, galago, marmoset, squirrel monkey, vervet, baboon, macaque, orangutan, gorilla, chimp, human]
Figure 4-77 A phylogenetic tree highlighting some of the mammals whose genomes are being extensively studied. The length of each line is proportional to the number of "neutral substitutions", representing the nucleotide changes observed in the absence of purifying selection. (Adapted from G.M. Cooper et al., Genome Res. 15:901-913, 2005. With permission from Cold Spring Harbor Laboratory Press.)
A Comparison of Human and Mouse Chromosomes Shows How the Structures of Genomes Diverge
As would be expected, the human and chimpanzee genomes are much more alike than are the human and mouse genomes. Although the human and mouse genomes are roughly the same size and contain nearly identical sets of genes, there has been a much longer time period over which changes have had a chance to accumulate: approximately 80 million years versus 6 million years. In addition, as indicated in Figure 4-77, rodent lineages (represented by the rat and the mouse) have unusually fast molecular clocks. Hence, these lineages have diverged from the human lineage more rapidly than otherwise expected.
As indicated by the DNA sequence comparison in Figure 4-78, mutation has led to extensive sequence divergence between humans and mice at all sites that are not under selection, such as most nucleotide sequences in introns. In contrast, in human-chimpanzee comparisons, nearly all sequence positions are the same simply because not enough time has elapsed since the last common ancestor for large numbers of changes to have occurred.
[Figure 4-78: aligned mouse and human leptin gene sequences spanning an exon-intron boundary]
Figure 4-78 Comparison of a portion of the mouse and human leptin genes. Positions where the sequences differ by a single nucleotide substitution are boxed in green, and positions that differ by the addition or deletion of nucleotides are boxed in yellow. Note that the coding sequence of the exon is much more conserved than is the adjacent intron sequence.
In contrast to the situation for humans and chimpanzees, local gene order and overall chromosome organization have diverged greatly between humans and mice. According to rough estimates, a total of about 180 break-and-rejoin events have occurred in the human and mouse lineages since these two species last shared a common ancestor. In the process, although the number of chromosomes is similar in the two species (23 per haploid genome in the human versus 20 in the mouse), their overall structures differ greatly. Nonetheless, even after the extensive genomic shuffling, there are many large blocks of DNA in which the gene order is the same in the human and the mouse. These stretches of conserved gene order in chromosomes are referred to as regions of synteny.
An unexpected conclusion from a detailed comparison of the complete mouse and human genome sequences, confirmed from subsequent comparisons between the genomes of other vertebrates, is that small blocks of sequences are being deleted from and added to genomes at a surprisingly rapid rate. Thus, if we assume that our common ancestor had a genome of human size (about 3 billion nucleotide pairs), mice would have lost a total of about 45 percent of that genome from accumulated deletions during the past 80 million years, while humans would have lost about 25 percent. However, substantial sequence gains from many small chromosome duplications and from the multiplication of transposons have compensated for these deletions. As a result, our genome size is unchanged from that of the last common ancestor for humans and mice, while the mouse genome is smaller by only 0.3 billion nucleotides.
Good evidence for the loss of DNA sequences in small blocks during evolution can be obtained from a detailed comparison of most regions of synteny in the human and mouse genomes. The comparative shrinkage of the mouse genome can be clearly seen from such comparisons, with the net loss of sequences scattered throughout the long stretches of DNA that are otherwise homologous (Figure 4-79).
[Figure 4-79 labels: human chromosome 14; mouse chromosome 12; scale bar, 200,000 bases]
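The figures for DNA loss quoted above imply how much DNA each lineage must have gained to arrive at its present genome size. The following sketch works through that bookkeeping in Python, in units of millions of nucleotide pairs. The function name is illustrative, and the 3-billion-pair ancestral genome is the same assumption the text makes.

```python
ANCESTOR_MB = 3000  # assumed ancestral genome size, in millions of nucleotide pairs

def required_gain(ancestor_mb, percent_lost, present_mb):
    """DNA that must have been added to reach the present genome size,
    given the percentage of the ancestral genome lost to deletions."""
    remaining = ancestor_mb - ancestor_mb * percent_lost // 100
    return present_mb - remaining

human_gain = required_gain(ANCESTOR_MB, 25, 3000)  # human genome size unchanged
mouse_gain = required_gain(ANCESTOR_MB, 45, 2700)  # mouse genome 0.3 billion pairs smaller
print(human_gain, mouse_gain)  # 750 1050
```

On these numbers the human lineage gained roughly 0.75 billion nucleotide pairs and the mouse lineage roughly 1.05 billion, so the smaller mouse genome reflects faster loss rather than slower gain.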
DNA is added to genomes both by the spontaneous duplication of chromosomal segments that contain tens of thousands of nucleotide pairs (as will be discussed shortly), and by active transposition (most transposition events are duplicative, because the original copy of the transposon stays where it was when a copy inserts at the new site; for example, see Figure 5-74). Comparison of the DNA sequences derived from transposons in the human and the mouse therefore readily reveals some of the sequence additions (Figure 4-80).
For unknown reasons, all mammals have genome sizes of about 3 billion nucleotide pairs that contain nearly identical sets of genes, even though only on the order of 150 million nucleotide pairs appear to be under sequence-specific functional constraints.
Figure 4-79 Comparison of a syntenic portion of mouse and human genomes. About 90 percent of the two genomes can be aligned in this way. Note that while there is an identical order of the matched index sequences (red marks), there has been a net loss of DNA in the mouse lineage that is interspersed throughout the entire region. This type of net loss is typical for all such regions, and it accounts for the fact that the mouse genome contains 14 percent less DNA than does the human genome. (Adapted from Mouse Sequencing Consortium, Nature 420:520-573, 2002. With permission from Macmillan Publishers Ltd.)
[Figure 4-80 labels: human β-globin gene cluster; mouse β-globin gene cluster (including βmajor and βminor); scale bar, 10,000 nucleotide pairs]
Figure 4-80 A comparison of the β-globin gene cluster in the human and mouse genomes, showing the location of transposable elements. This stretch of human genome contains five functional β-globin-like genes (orange); the comparable region from the mouse genome has only four. The positions of the human Alu sequences are indicated by green circles, and the human L1 sequences by red circles. The mouse genome contains different but related transposable elements: the positions of B1 elements (which are related to the human Alu sequences) are indicated by blue triangles, and the positions of the mouse L1 elements (which are related to the human L1 sequences) are indicated by orange triangles. The absence of transposable elements from the globin structural genes can be attributed to purifying selection, which would have eliminated any insertion that compromised gene function. (Courtesy of Ross Hardison and Webb Miller.)
The Size of a Vertebrate Genome Reflects the Relative Rates of DNA Addition and DNA Loss in a Lineage
Now that we know the complete sequence of a number of vertebrate genomes, we see that genome size can vary considerably, apparently without a drastic effect on the organism or its number of genes. Thus, while the mouse and dog genomes are both in the typical mammalian size range, the chicken has a genome that is only about one-third human size (one billion nucleotide pairs). A particularly notable example of an organism with a genome of anomalous size is the puffer fish, Fugu rubripes (Figure 4-81), which has a tiny genome for a vertebrate (0.4 billion nucleotide pairs compared to 1 billion or more for many other fish). The small size of the Fugu genome is largely due to the small size of its introns. Specifically, Fugu introns, as well as other noncoding segments of the Fugu genome, lack the repetitive DNA that makes up a large portion of the genomes of most well-studied vertebrates. Nevertheless, the positions of Fugu introns are nearly perfectly conserved relative to their positions in mammalian genomes (Figure 4-82).
While initially a mystery, we now have a simple explanation for such large differences in genome size between similar organisms: because all vertebrates experience a continuous process of DNA loss and DNA addition, the size of a genome merely depends on the balance between these opposing processes acting over millions of years. Suppose, for example, that in the lineage leading to Fugu, the rate of DNA addition happened to slow greatly. Over long periods of time, this would result in a major "cleansing" from this fish genome of those DNA sequences whose loss could be tolerated. In retrospect, the process of purifying selection in the Fugu lineage has partitioned those vertebrate DNA sequences most likely to be functional into only 400 million nucleotide pairs of DNA, providing a major resource for scientists.
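The idea that genome size reflects a balance between DNA addition and DNA loss can be illustrated with a deliberately simple toy model. In the Python sketch below, a genome gains a fixed amount of DNA every million years and loses a fixed fraction of its current size. The model form, the function name, and every parameter value are hypothetical and chosen only to show the qualitative behavior; none of them is an estimate from the text.

```python
def simulate_genome_size(start_mb, gain_mb_per_myr, loss_fraction_per_myr, myr):
    """Toy model: every million years the genome gains a fixed amount of DNA
    and loses a fixed fraction of its current size (sizes in millions of pairs)."""
    size = float(start_mb)
    for _ in range(myr):
        size += gain_mb_per_myr - loss_fraction_per_myr * size
    return size

# Hypothetical parameters, for illustration only.
balanced = simulate_genome_size(3000, gain_mb_per_myr=15.0,
                                loss_fraction_per_myr=0.005, myr=400)
slowed = simulate_genome_size(3000, gain_mb_per_myr=2.0,
                              loss_fraction_per_myr=0.005, myr=400)
print(round(balanced), round(slowed))
```

In this model the genome drifts toward the size at which gain and loss cancel (the gain divided by the loss fraction). When the rate of addition is reduced while the loss rate stays the same, the genome shrinks steadily toward a much smaller size, which is the kind of outcome proposed above for the Fugu lineage.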
Figure 4-81 The puffer fish, Fugu rubripes. (Courtesy of Byrappa Venkatesh.)
We Can Reconstruct the Sequence of Some Ancient Genomes
The genomes of ancestral organisms can be inferred, but never directly observed: there are no ancient organisms alive today. Although a modern organism such as the horseshoe crab looks remarkably similar to fossil ancestors that lived 200 million years ago, there is every reason to believe that the horseshoe crab genome has been changing during all that time at a rate similar to that occurring in other evolutionary lineages. Selection constraints must have maintained key functional properties of the horseshoe crab genome to account for the morphological stability of the lineage. However, genome sequences reveal that the fraction of the genome subject to purifying selection is small; hence the genome of the modern horseshoe crab must differ greatly from that of its extinct ancestors, known to us only through the fossil record.
Is there any way around this problem? Can we ever hope to decipher large sections of the genome sequence of the extinct ancestors of organisms that are alive today? For organisms that are as closely related as human and chimp, we saw that this may not be difficult. In that case, reference to the gorilla sequence can be used to sort out which of the few differences between human and chimp DNA sequences were inherited from our common ancestor some 6 million years ago (see Figure 4-76). For an ancestor that has produced a large number of different organisms alive today, the DNA sequences of many species can be compared simultaneously to unscramble the ancestral sequence, allowing scientists to trace DNA sequences much farther back in time. For example, from the complete genome sequences of 20 modern mammals that will soon be obtained, it should be possible to decipher most of the genome sequence of the 100-million-year-old Boreoeutherian mammal that gave rise to species as diverse as dog, mouse, rabbit, armadillo, and human (see Figure 4-77).
[Figure 4-82 labels: human gene; scale bar, 100.0 thousands of nucleotide pairs]
Figure 4-82 Comparison of the genomic sequences of the human and Fugu genes encoding the protein huntingtin. Both genes (indicated in red) contain 67 short exons that align in 1:1 correspondence to one another; these exons are connected by curved lines. The human gene is 7.5 times larger than the Fugu gene (180,000 versus 27,000 nucleotide pairs). The size difference is entirely due to larger introns in the human gene. The larger size of the human introns is due in part to the presence of retrotransposons, whose positions are represented by green vertical lines; the Fugu introns lack retrotransposons. In humans, mutation of the huntingtin gene causes Huntington's disease, an inherited neurodegenerative disorder. (Adapted from S. Baxendale et al., Nat. Genet. 10:67-76, 1995. With permission from Macmillan Publishers Ltd.)
Multispecies Sequence Comparisons Identify Important DNA Sequences of Unknown Function
The massive quantity of DNA sequence now in databases (more than a hundred billion nucleotide pairs) provides a rich resource that scientists can mine for many purposes. We have already discussed how this information can be used to unscramble the evolutionary pathways that have led to modern organisms. But sequence comparisons also provide many insights into how cells and organisms function. Perhaps the most remarkable discovery in this realm has been the observation that, although only about 1.5% of the human genome codes for proteins, about three times this amount (in total, 5% of the genome; see Table 4-1, p. 206) has been strongly conserved during mammalian evolution. This mass of conserved sequence is most clearly revealed when we align and compare DNA synteny blocks from many different species. In this way, so-called multispecies conserved sequences can be readily identified (Figure 4-83). Most of the noncoding conserved sequences discovered in this way turn out to be relatively short, containing between 50 and 200 nucleotide pairs. The strict conservation implies that they have important functions that have been maintained by purifying selection. The puzzle is to unravel what those functions are. Some of the conserved sequence that does not code for protein codes for untranslated RNA molecules that are known to have important functions, as we shall see in later chapters. Another fraction of the noncoding conserved DNA is certainly involved in regulating the transcription of adjacent genes, as discussed in Chapter 7. But we do not yet know how much of the conserved DNA can be accounted for in these ways, and the bulk of it is still a deep mystery. The solution to this mystery is bound to have profound consequences for medicine, and it reveals how much more we need to learn about the biology of vertebrate organisms.
How can cell biologists tackle this problem? The first step is to distinguish between the conserved regions that code for protein and those that do not, and then, among the latter, to focus on those that do not already have some other identified function, in coding for structural RNA molecules, for example. The next task is to discover what proteins or RNA molecules bind to these mysterious DNA sequences, how they are packaged into chromatin, and whether they ever serve as templates for RNA synthesis. Most of this task still lies before us, but a start has been made, and some remarkable insights have been obtained. One of the most intriguing concerns the evolutionary changes that have made us humans different from other animals: changes, that is, in sequences that have been conserved among our close relatives but have undergone sudden rapid change in the human sublineage.
[Figure 4-83 labels: human CFTR gene (cystic fibrosis transmembrane conductance regulator), 190,000 nucleotide pairs; intron; exon; percent identity (50% to 100%) plotted for chimpanzee, orangutan, baboon, marmoset, lemur, rabbit, horse, cat, dog, mouse, opossum, chicken, and Fugu; scale bars, 100 nucleotide pairs and 10,000 nucleotide pairs]
Figure 4-83 The detection of multispecies conserved sequences. In this example, genome sequences for each of the organisms shown have been compared with the indicated region of the human CFTR gene, scanning in 25 nucleotide blocks. For each organism, the percent identity across its syntenic sequences is plotted in green. In addition, a computational algorithm has been used to detect the sequences within this region that are most highly conserved when the sequences from all of the organisms are taken into account. Besides the exon, three other blocks of multispecies conserved sequences are shown. The function of most such sequences in the human genome is not known. (Courtesy of Eric D. Green.)
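The block-by-block scan described for Figure 4-83 can be sketched as follows. This Python example computes percent identity in consecutive windows between a reference sequence and each aligned species, then reports the windows that are well conserved in every species. It is a minimal illustration, not the algorithm used for the figure: the function names, the simple threshold rule, and the toy sequences (which are made up and are not CFTR sequence) are all assumptions, and a window of 5 is used instead of 25 to keep the example short.

```python
def window_identity(ref, other, window=25):
    """Percent identity between two aligned sequences of equal length,
    computed in consecutive non-overlapping windows. A '-' marks an
    alignment gap and is counted as a mismatch."""
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for start in range(0, len(ref), window):
        a, b = ref[start:start + window], other[start:start + window]
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        scores.append(100.0 * matches / len(a))
    return scores

def conserved_blocks(score_rows, threshold=70.0):
    """Indices of windows whose identity meets the threshold in every species."""
    n_windows = len(score_rows[0])
    return [i for i in range(n_windows)
            if all(row[i] >= threshold for row in score_rows)]

# Made-up toy alignment, for illustration only.
human = "ACGTAGGCTATTGCA"
mouse = "ACGTACCATATTGCA"
chick = "ACGTATTTGCTTGCA"

rows = [window_identity(human, s, window=5) for s in (mouse, chick)]
print(rows)                    # [[100.0, 40.0, 100.0], [100.0, 0.0, 100.0]]
print(conserved_blocks(rows))  # [0, 2]
```

Requiring conservation in all species at once is what distinguishes a multispecies conserved sequence from a region that merely happens to be similar in one closely related pair.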
Accelerated Changes in Previously Conserved Sequences Can Help Decipher Critical Steps in Human Evolution
As soon as both the human and the chimpanzee genome sequences became available, scientists began searching for DNA sequence changes that might account for the striking differences between us and them. With 3 billion nucleotide pairs to compare in the two species, this might seem an impossible task. But the job was made much easier by confining the search to 35,000 clearly defined multispecies conserved sequences (a total of about 5 million nucleotide pairs), representing parts of the genome that are most likely to be functionally important. Though these sequences are conserved strongly, they are not conserved perfectly, and when the version in one species is compared with that in another they are generally found to have drifted apart by a small amount corresponding simply to the time elapsed since the last common ancestor. In a small proportion of cases, however, one sees signs of a sudden evolutionary spurt. For example, some DNA sequences that have been highly conserved in other mammalian species are found to have changed exceptionally fast during the six million years of human evolution since we diverged from the chimpanzees. Such human accelerated regions (HARs) are thought to reflect functions that have been especially important in making us different in some useful way.
About 50 such sites were identified in one study, one-fourth of which were located near genes associated with neural development. The sequence exhibiting the most rapid change (18 changes between human and chimp, compared to only two changes between chimp and chicken) was examined further and found to encode a 118-nucleotide noncoding RNA molecule that is produced in the human cerebral cortex at a critical time during brain development (Figure 4-84).
Although the function of this FIARIF RNA is not yet known, this exciting finding is stimulating further studies that will hopefully shed light on crucial features of the human brain.
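The logic of the search for accelerated regions can be made concrete with a small calculation. The Python sketch below is a toy illustration only, not the statistical method used in the study: it counts mismatches between sequences that are assumed to be already aligned and of equal length, and it flags an element when the human-chimp change count is large relative to the chimp-outgroup count. The function names, the pseudocount, and the fold threshold are hypothetical choices made for illustration; the numerical constants are the figures quoted in the text.

```python
"""Toy sketch of flagging a candidate human accelerated region (HAR)."""


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatched positions in two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def is_accelerated(human_chimp: int, chimp_outgroup: int,
                   fold_threshold: float = 2.0) -> bool:
    """Hypothetical rule of thumb, not the published test.

    The outgroup comparison spans far more evolutionary time, so a
    conserved element normally shows at least as many changes there as
    between human and chimp. A human-chimp count well above the outgroup
    count therefore suggests a burst of change in the human lineage.
    One pseudocount is added to avoid division by zero.
    """
    return human_chimp / (chimp_outgroup + 1) >= fold_threshold


# Figures quoted in the text.
CONSERVED_ELEMENTS = 35_000
CONSERVED_NUCLEOTIDES = 5_000_000
GENOME_NUCLEOTIDES = 3_000_000_000

# Average size of a conserved element and share of the genome searched.
mean_element_length = CONSERVED_NUCLEOTIDES / CONSERVED_ELEMENTS
searched_fraction = CONSERVED_NUCLEOTIDES / GENOME_NUCLEOTIDES

if __name__ == "__main__":
    print(f"mean conserved element length: {mean_element_length:.0f} nucleotide pairs")
    print(f"fraction of genome searched: {searched_fraction:.2%}")
    # The most rapidly changing sequence: 18 changes versus 2.
    print("18 vs 2 flagged:", is_accelerated(18, 2))
    # A typical conserved element that has merely drifted.
    print("1 vs 2 flagged:", is_accelerated(1, 2))
```

With the numbers from the text, the conserved sequences average about 143 nucleotide pairs each and cover roughly 0.17% of the genome, which is why restricting the comparison to them made the search tractable. A real analysis would replace the simple ratio with a model of substitution rates along the branches of a phylogenetic tree.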
[Figure 4-84 (image not reproduced). Panels (A) and (B); labels: cresyl violet stain, in situ hybridization, outer surface of cortex, inner surface of cortex; scale bars 4 mm and 1 mm.]
Gene Duplication Provides an Important Source of Genetic Novelty During Evolution
Evolution depends on the creation of new genes, as well as on the modification of those that already exist. How does this occur? When we compare organisms that seem very different (a primate with a rodent, for example, or a mouse with a fish), we rarely encounter genes in the one species that have no homolog in the other. Genes without homologous counterparts are relatively scarce even when we compare such divergent organisms as a mammal and a worm. On the other hand, we frequently find gene families that have different numbers of members in different species. To create such families, genes have been repeatedly duplicated, and the copies have then diverged to take on new functions that often vary from one species to another.
The genes encoding nuclear hormone receptors in humans, a nematode worm, and a fruit fly illustrate this point (Figure 4-85). Many of the subtypes of these nuclear receptors (also called intracellular receptors) have close homologs in all three organisms that are more similar to each other than they are to other family subtypes present in the same species. Therefore, much of the functional divergence of this large gene family must have preceded the divergence of these three evolutionary lineages. Subsequently, one major branch of the gene family underwent an enormous expansion in the worm lineage only. Similar, but smaller, lineage-specific expansions of particular subtypes are evident throughout the gene family tree.
Gene duplication occurs at high rates in all evolutionary lineages, contributing to the vigorous process of DNA addition discuss