Biological Magnetic Resonance Volume 17
Structure Computation and Dynamics in Protein NMR
A Continuation Order Plan is available for this series. A continuation order will bring delivery of each new volume immediately upon publication. Volumes are billed only upon actual shipment. For further information please contact the publisher.
Edited by

N. Rama Krishna
University of Alabama at Birmingham
Birmingham, Alabama

and

Lawrence J. Berliner
Ohio State University
Columbus, Ohio

KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 0-306-47084-5
Print ISBN: 0-306-45953-1

©2002 Kluwer Academic Publishers
New York, Boston, Dordrecht, London, Moscow

Print ©1999 Kluwer Academic / Plenum Publishers
New York

All rights reserved

http://kluweronline.com
http://ebooks.kluweronline.com
To Praveen Srinivas Nepalli Krishna
Contributors
Hashim M. Al-Hashimi • Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia 30602

Michael Andrec • Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia 30602

Todd M. Billeci • Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94143-0446

Alexandre M. J. J. Bonvin • Laboratory of Physical Chemistry, Swiss Federal Institute of Technology-Zurich, ETH Zentrum, CH-8092 Zurich, Switzerland

Werner Braun • Sealy Center for Structural Biology, University of Texas Medical Branch, Galveston, Texas 77555-1157

Xavier Daura • Laboratory of Physical Chemistry, Swiss Federal Institute of Technology-Zurich, ETH Zentrum, CH-8092 Zurich, Switzerland

Vladimir P. Denisov • Physical Chemistry 2, Lund University, S-22100 Lund, Sweden

Jan Engelke • Institut für Biophysikalische Chemie, Johann-Wolfgang Goethe Universität, D-60439 Frankfurt am Main, Germany

Shauna Farr-Jones • Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94143-0446

David G. Gorenstein • Sealy Center for Structural Biology and Department of Human Biological Chemistry and Genetics, The University of Texas Medical Branch at Galveston, Texas 77555-1157

Elliott K. Gozansky • Sealy Center for Structural Biology and Department of Human Biological Chemistry and Genetics, The University of Texas Medical Branch at Galveston, Texas 77555-1157

Bertil Halle • Physical Chemistry 2, Lund University, S-22100 Lund, Sweden

Nishantha Illangasekare • Sealy Center for Structural Biology and Department of Human Biological Chemistry and Genetics, The University of Texas Medical Branch at Galveston, Texas 77555-1157

Thomas L. James • Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94143-0446

N. Rama Krishna • Department of Biochemistry and Molecular Genetics, Comprehensive Cancer Center, The University of Alabama at Birmingham, Birmingham, Alabama 35294-2041

Bruce A. Luxon • Sealy Center for Structural Biology and Department of Human Biological Chemistry and Genetics, The University of Texas Medical Branch at Galveston, Texas 77555-1157

Gaetano T. Montelione • Center for Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, New Jersey 08854

Hunter N. B. Moseley • Department of Biochemistry and Molecular Genetics, Comprehensive Cancer Center, The University of Alabama at Birmingham, Birmingham, Alabama 35294-2041

Anwer Mujeeb • Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94143-0446

Michael Nilges • European Molecular Biology Laboratory, D-69012 Heidelberg, Germany

Seán I. O’Donoghue • European Molecular Biology Laboratory, D-69012 Heidelberg, Germany

Gottfried Otting • Department of Medical Biochemistry and Biophysics, Karolinska Institute, S-171 77 Stockholm, Sweden

James H. Prestegard • Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia 30602

Carlos B. Rios • Center for Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, New Jersey 08854

Heinz Rüterjans • Institut für Biophysikalische Chemie, Johann-Wolfgang Goethe Universität, D-60439 Frankfurt am Main, Germany

Catherine H. Schein • Sealy Center for Structural Biology, University of Texas Medical Branch, Galveston, Texas 77555-1157

Lorna J. Smith • Oxford Centre for Molecular Sciences and New Chemistry Laboratory, University of Oxford, Oxford OX1 3QR, England

G. V. T. Swapna • Center for Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, New Jersey 08854

Varatharasa Thiviyanathan • Sealy Center for Structural Biology and Department of Human Biological Chemistry and Genetics, The University of Texas Medical Branch at Galveston, Texas 77555-1157

Joel R. Tolman • Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia 30602

Nikolai B. Ulyanov • Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94143-0446

Wilfred F. van Gunsteren • Oxford Centre for Molecular Sciences and New Chemistry Laboratory, University of Oxford, Oxford OX1 3QR, England and Laboratory of Physical Chemistry, Swiss Federal Institute of Technology-Zurich, ETH Zentrum, CH-8092 Zurich, Switzerland

Kandadai Venu • School of Physics, University of Hyderabad, 500046 Hyderabad, India

Yuan Xu • Sealy Center for Structural Biology, University of Texas Medical Branch, Galveston, Texas 77555-1157

Diane E. Zimmerman • Center for Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, New Jersey 08854
Preface
Volume 17 is the second in a special topic series devoted to modern techniques in protein NMR, under the Biological Magnetic Resonance series. Volume 16, with the subtitle Modern Techniques in Protein NMR, is the first in this series. These two volumes present some of the recent, significant advances in the biomolecular NMR field, with emphasis on developments during the last five years. We are honored to have brought together in these volumes some of the world’s foremost experts, who have provided broad leadership in advancing this field. Volume 16 contains advances in two broad categories: I. Large Proteins, Complexes, and Membrane Proteins and II. Pulse Methods. Volume 17 contains major advances in: I. Computational Methods and II. Structure and Dynamics.

The opening chapter of Volume 17 starts with a consideration of some important aspects of modeling from spectroscopic and diffraction data by Wilfred van Gunsteren and his colleagues. The next two chapters deal with combined automated assignments and protein structure determination, an area of intense research in many laboratories, since the traditional manual methods are often inadequate or laborious in handling large volumes of NMR data on large proteins. First, Werner Braun and his associates describe their experience with the NOAH/DIAMOD protocol developed in their laboratory. Next, Guy Montelione and his collaborators describe the AUTOASSIGN program that, together with a suite of heteronuclear correlation experiments, can accelerate the assignment and structure determination of proteins. Then a chapter from the laboratory of Michael Nilges follows, dealing with problems unique to symmetric oligomers, and computational approaches for assignment of ambiguous NOEs and structure determination.

The last three chapters in this section deal with computational methods for structure refinement. David Gorenstein and his colleagues discuss the use of the hybrid–hybrid matrix method for structure refinement of macromolecules from 3D NOESY–NOESY data. Next, a chapter from the laboratory of Thomas James describes the use of conformational ensemble calculations to study dynamic structures of proteins and nucleic acids. In the final chapter of this section, one of the editors, Rama Krishna, and his associate review the theory and application of the complete relaxation and conformational exchange matrix (CORCEMA) formalism for quantitatively analyzing the NOESY spectra of reversibly forming ligand–receptor complexes, and discuss its potential application in structure-based drug design.

In the second section, dealing with structure and dynamics, Jim Prestegard and his colleagues describe exciting new developments associated with increasing field strengths of superconducting magnets, viz., exploitation of field-induced residual dipolar couplings in weakly oriented proteins to deduce structural and dynamical information. They also discuss the use of bicelles to induce a weak alignment of proteins with the magnetic field. Next, a contribution from the laboratory of Heinz Rüterjans discusses recent advances in the study of protein dynamics from ¹⁵N and ¹³C relaxation time measurements. The final two chapters focus on the study of protein-bound water molecules by NMR. First, Bertil Halle and co-workers give an account of the relaxation dispersion method for characterizing protein hydration. This is complemented by the last chapter, from Gottfried Otting, on the use of intermolecular water–solute NOEs to study bound water molecules.

We are extremely proud of this compilation of excellent contributions describing significant advances in the biomolecular NMR field. Whether the field has already reached a state of maturity with only a few new advances (as occasionally suggested) or is still rapidly evolving with exciting new developments around every corner is a question we leave to the reader. As always, we welcome suggestions, comments, and criticisms for future volumes.

N. Rama Krishna
Lawrence J. Berliner
Contents
Section I. Computational Methods
Chapter 1
Aspects of Modeling Biomolecular Structure on the Basis of Spectroscopic or Diffraction Data
Wilfred F. van Gunsteren, Alexandre M. J. J. Bonvin, Xavier Daura, and Lorna J. Smith

1. Introduction
2. The Molecular Modeling Approach
3. Generating Ensembles Consistent with Experimental Data
4. Six Aspects of Structure Determination Based on Experimental Data
    4.1. Choice of Degrees of Freedom r for Generating an Ensemble and for Calculating q(r)
    4.2. Choice of Physical Force Field (r)
    4.3. Choice of (Empirical) Function q(r) for Calculating the Quantity q Using r
    4.4. Choice of Penalty Function to Restrain
    4.5. Quality of the Experimental Data to Guide the Restraining
    4.6. Choice of Method and Extent of Boltzmann Sampling of the Configurational Space
5. Assessing the Quality of the Obtained Ensemble of Structures
6. Pitfalls That Can Be Avoided
References
Chapter 2
Combined Automated Assignment of NMR Spectra and Calculation of Three-Dimensional Protein Structures
Yuan Xu, Catherine H. Schein, and Werner Braun

1. Introduction
2. Computational Methods for Sequence-Specific Resonance Assignments
    2.1. Graph Theory
    2.2. Genetic Algorithms and Mutual Information Method
    2.3. Neural Networks
    2.4. Matching Rungs on a Ladder: Automated Sequential Assignment Using Isotopically Labeled Proteins
    2.5. Combinatorial Optimization and Monte Carlo Simulated Annealing of Score Functions
    2.6. Real-Space Assignment
3. Automated Stereospecific Assignments
    3.1. Concepts
    3.2. Tests, Applications, and Assessment
4. Combined Automated NOESY Spectra Assignment and 3D Structure Calculation
    4.1. Molecular Dynamics Calculations with Ambiguous Restraints
    4.2. Self-Correcting Distance Geometry Method
5. Future Improvements and Outlook
References
Chapter 3
NMR Pulse Sequences and Computational Approaches for Automated Analysis of Sequence-Specific Backbone Resonance Assignments of Proteins
Gaetano T. Montelione, Carlos B. Rios, G. V. T. Swapna, and Diane E. Zimmerman

1. Introduction
2. Systems for Automated Analysis of Resonance Assignments from Triple-Resonance NMR Spectra
3. AUTOASSIGN
    3.1. The Algorithm
    3.2. The Philosophy of AUTOASSIGN
    3.3. Generic Spin System Objects
    3.4. Constraint Propagation
    3.5. Representative Results
4. Practical Considerations in Data Collection and Processing
    4.1. General Considerations
    4.2. Peak Picking of NMR Spectra
    4.3. Validation of Input Files
5. Experiments for Automated Analysis of Backbone Resonance Assignments
    5.1. HSQC
    5.2. HNCO
    5.3. HN(CA)CO
    5.4. HNCA
    5.5. HACA(CO)NH
    5.6. HACANH
    5.7. C–C and C–H Phase Information in HACA(CO)NH and HACANH Experiments
    5.8. CBCA(CO)NH
    5.9. CBCANH
    5.10. C–C and C–H Phase Information in CBCA(CO)NH and CBCANH Experiments
6. Future Developments
References
Chapter 4
Calculation of Symmetric Oligomer Structures from NMR Data
Seán I. O’Donoghue and Michael Nilges

1. Summary
2. Introduction
    2.1. Symmetry in Macromolecular Aggregates
    2.2. The Problem: Symmetry Degeneracy in NMR Spectra
    2.3. Reducing Symmetry Degeneracy with Asymmetric Labeling
3. The Symmetry-ADR Calculation Method
    3.1. Symmetry Restraint Terms
    3.2. Ambiguous Distance Restraints (ADRs)
    3.3. Annealing Protocols
    3.4. Iterative Structure Calculation and Explicit Assignment of ADRs
    3.5. Other Restraint Terms
4. Experiences with the Symmetry-ADR Method
    4.1. Initial Test Calculations
    4.2. ssDBP Dimer
    4.3. Leucine Zipper Homodimers
    4.4. p53 Tetramerization Domain
5. Symmetric Oligomers Solved by NMR
6. Discussion
    6.1. Problems of the Symmetry-ADR Method
    6.2. Should Symmetry Restraint Terms Be Used?
    6.3. Alternatives to the Symmetry-ADR Method
Symbols
References
Chapter 5
Hybrid–Hybrid Matrix Method for 3D NOESY–NOESY Data Refinements
Elliott K. Gozansky, Varatharasa Thiviyanathan, Nishantha Illangasekare, Bruce A. Luxon, and David G. Gorenstein

1. Introduction
2. Simulation Studies Describing 3D NOESY–NOESY Cross Peaks, Approximate versus Exact Methods
3. Hybrid–Hybrid Relaxation Matrix Method for 3D NOESY–NOESY Data Analysis
    3.1. Theory and Methods: Deconvolution of 2D NOESY Volumes from 3D NOESY–NOESY Volumes
    3.2. Three-Dimensional Simulation Test and Effect of Added Noise
    3.3. Hybrid–Hybrid Relaxation Matrix Structural Refinement of Duplex DNA from Simulated 3D NOESY–NOESY Data
4. Hybrid–Hybrid Matrix: Experimental Refinement Test on a DNA Three-Way Junction
5. Conclusions
References
Chapter 6
Conformational Ensemble Calculations: Analysis of Protein and Nucleic Acid NMR Data
Anwer Mujeeb, Nikolai B. Ulyanov, Todd M. Billeci, Shauna Farr-Jones, and Thomas L. James

1. Introduction
2. Determination of Structural Restraints
    2.1. Interproton Distance Restraints
    2.2. Coupling Constants and Torsion-Angle Restraints
    2.3. Other Types of Restraints
    2.4. Indices of Agreement
3. Assessment of Conformational Flexibility
4. Ensemble Calculations
    4.1. Overview
    4.2. Relaxation-Rate-Based Probability Calculations
5. Experimental Examples
    5.1. Conotoxin MVIIC
    5.2. Nucleic Acid Example
6. Conclusions
References
Chapter 7
Complete Relaxation and Conformational Exchange Matrix (CORCEMA) Analysis of NOESY Spectra of Reversibly Forming Ligand–Receptor Complexes: Application to Transferred NOESY
N. Rama Krishna and Hunter N. B. Moseley

1. Introduction
    1.1. Molecular Complexes and Conformational Exchange
    1.2. Reversible Binding and Transferred NOESY
2. CORCEMA Theory
    2.1. Basic Formulation
    2.2. Two-State Model of Ligand–Receptor Interactions
    2.3. Treatment for More than Two States
    2.4. Intermolecular Transferred NOESY
    2.5. Treatment of Nonspecific Binding
3. Methods
    3.1. The CORCEMA Program
    3.2. Calculation of Concentrations
    3.3. Methods for Suppressing or Identifying Protein-Mediated Effects
    3.4. Methods for Observing Intermolecular Transferred NOESY Contacts
    3.5. Structure Refinement Calculations
4. Characterization of Some Critical Factors Using Simulated Transferred NOESY Data
    4.1. Finite Receptor Off-Rates
    4.2. Effect of Ligand–Receptor Ratio on the Ligand Transferred NOESY
    4.3. Role of Ligand–Protein Intermolecular Dipolar Relaxation
    4.4. Ligand–Protein Intermolecular NOESY Intensity as a Function of Off-Rate
    4.5. Effect of Motions in the Protein–Ligand Complex on the Transferred NOESY
5. Experimental Examples
    5.1. Thrombin-Bound Structures of Human Fibrinopeptide Analogs
    5.2. Studies on Blood Group A Trisaccharide Bound to Dolichos biflorus Lectin
    5.3. Transferred NOESY Studies on the Forssman Pentasaccharide Complexed to Dolichos biflorus
    5.4. Interaction of Sialyl Lewis X Tetrasaccharide with E-selectin
    5.5. Reversible Binding of Corepressor Tryptophan with Repressor–Operator Complex
6. Final Comments
References
Section II. Structure and Dynamics
Chapter 8
Protein Structure and Dynamics from Field-Induced Residual Dipolar Couplings
James H. Prestegard, Joel R. Tolman, Hashim M. Al-Hashimi, and Michael Andrec

1. Introduction
2. Theory
    2.1. Anisotropic Spin Interactions in Solution-State NMR
    2.2. The Dipolar Hamiltonian
    2.3. Residual Dipolar Couplings under Magnetic Field Alignment
3. Early History of Observation
4. Application to Protein Systems
5. Measurement of Residual Dipolar Couplings
    5.1. Frequency-Domain Experiments
    5.2. Intensity-Based Experiments
6. Other Contributions to Multiplet Splittings
    6.1. Effects of Transverse Relaxation
    6.2. Dynamic Frequency Shifts
7. Structure Determination Protocols
8. The Effects of Molecular Motion and Their Separation
    8.1. The Cone-and-Arc Model
    8.2. Order Matrix Analysis: A Test for Rigid Model Validity
References
Chapter 9
Recent Developments in Studying the Dynamics of Protein Structures from ¹⁵N and ¹³C Relaxation Time Measurements
Jan Engelke and Heinz Rüterjans

1. Introduction
    1.1. General Features of Dynamics
    1.2. Microdynamic Motional Parameters
2. Methods
    2.1. Theory of Relaxation in Proteins
    2.2. Experiments for the Determination of Relaxation Rates
3. Backbone Dynamics Derived from Relaxation Rates
    3.1. Experimental Details
    3.2. Processing of Spectra and Determination of Relaxation Rates
    3.3. Calculation of Microdynamical Parameters
    3.4. Interpretation of Microdynamical Parameters
4. Backbone Dynamics Derived from Relaxation Rates
    4.1. Analysis of the Multispin Relaxation of
    4.2. Experiments to Determine the Relaxation Rates
    4.3. Microdynamical Parameters Derived from Relaxation Rates
5. Side-Chain Dynamics Derived from Relaxation Rates
    5.1. Dynamical Parameters Derived from Relaxation Times and Steady-State NOE
    5.2. SIIS Cross Relaxation
6. Determination of Protein Dynamics in the Microsecond Time Window
7. Determination of Protein Dynamics in the Millisecond Time Window
8. Concluding Remarks
References
Chapter 10
Multinuclear Relaxation Dispersion Studies of Protein Hydration
Bertil Halle, Vladimir P. Denisov, and Kandadai Venu

1. Introduction
2. Methodology of Water NMRD
    2.1. Conventional Field Variation
    2.2. Fast Field Cycling
    2.3. NMR Properties of the Water Nuclei
3. Relaxation Mechanisms
    3.1. Quadrupolar Relaxation
    3.2. Dipolar Relaxation
    3.3. Relaxation due to Isotropic Couplings
4. Molecular Motions
    4.1. Spatial Resolution
    4.2. Temporal Resolution
    4.3. Water Relaxation in Semisolid Proteins
5. Quantitative Analysis of NMRD Data
    5.1. Parametrization of the NMRD Profile
    5.2. Correlation Time
    5.3. Dispersion Amplitude
    5.4. High-Frequency Plateau
    5.5. NMRD Time Scales
    5.6. Stretched Dispersions
    5.7. Labile Hydrogens
6. Outlook
References
Chapter 11
Hydration Studies of Biological Macromolecules by Intermolecular Water–Solute NOEs
Gottfried Otting

1. Introduction
2. Theoretical Background for Intermolecular NOEs
    2.1. NOE between Two Rigidly Bound Protons
    2.2. NOE between Solute Proton and Bound but Locally Reorientating Water
    2.3. NOE with Rapidly Diffusing Water Molecules
3. Assignments of Water–Solute Cross Peaks
4. NMR Experiments for the Detection of Intermolecular NOEs with Water
    4.1. Water Suppression
    4.2. Selective Water Excitation
    4.3. Nonselective Experiments
    4.4. Dipolar Field Effects
5. Applications
    5.1. Studies of Protein Hydration
    5.2. Studies of DNA and RNA Hydration
6. Summary of the Results
    6.1. Residence Times
    6.2. Structural Relevance
    6.3. Future Perspectives
7. Conclusion
References
Contents of Previous Volumes
Index
I
Computational Methods
1
Aspects of Modeling Biomolecular Structure on the Basis of Spectroscopic or Diffraction Data
Wilfred F. van Gunsteren, Alexandre M. J. J. Bonvin, Xavier Daura, and Lorna J. Smith

Wilfred F. van Gunsteren • Oxford Centre for Molecular Sciences and New Chemistry Laboratory, University of Oxford, Oxford OX1 3QR, England and Laboratory of Physical Chemistry, Swiss Federal Institute of Technology-Zurich, ETH Zentrum, CH-8092 Zurich, Switzerland. Alexandre M. J. J. Bonvin and Xavier Daura • Laboratory of Physical Chemistry, Swiss Federal Institute of Technology-Zurich, ETH Zentrum, CH-8092 Zurich, Switzerland. Lorna J. Smith • Oxford Centre for Molecular Sciences and New Chemistry Laboratory, University of Oxford, Oxford OX1 3QR, England.

Biological Magnetic Resonance, Volume 17: Structure Computation and Dynamics in Protein NMR, edited by Krishna and Berliner. Kluwer Academic / Plenum Publishers, New York, 1999.

1. INTRODUCTION

Structural information for biomolecules such as proteins, sugars, and nucleic acids can be derived from spectroscopic measurements on the molecule in solution or in the solid state, or from diffraction measurements on the molecules in crystalline form. The structure, which can be represented in Cartesian coordinates by
$$\mathbf{r} \equiv (\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_{N_{at}}) \qquad (1)$$

where $N_{at}$ is the number of atoms, or in internal or generalized coordinates by

$$\boldsymbol{\theta} \equiv (\theta_1, \theta_2, \ldots, \theta_{N_{df}}) \qquad (2)$$

where $N_{df}$ is the number of degrees of freedom, is not directly measured but is derived from a set of directly measured quantities

$$\mathbf{q}^{exp} \equiv (q_1^{exp}, q_2^{exp}, \ldots, q_{N_q}^{exp}) \qquad (3)$$

which depend on the molecular structure or conformation: q(r) or q(θ). We shall use the notation q(r) to express generally the dependence of an observable quantity q upon molecular conformation r. Actually, the measurement does not yield a value q(r) depending on a single conformation r, but an average over the many molecules in solution or in the crystal and over the duration of the measurement,

$$q^{exp} = \langle q(\mathbf{r}) \rangle \qquad (4)$$

where the angle brackets $\langle \cdots \rangle$ denote averaging. The quantities q(r) that are most widely used in structure determination of biomolecules are the following.

1. Intensities of diffracted X-rays or neutrons, which depend on the reciprocal lattice indices hkl.
2. Nuclear Overhauser enhancement (NOE) intensities due to nuclei i and j.
3. $^3J$ through-bond coupling constants, which depend on the nuclei i, j, k, and l defining a dihedral angle.
4. Chemical shifts of nuclei i.
These observable quantities all depend in one way or another on the molecular conformation r, although the exact form of the function q(r) is often not known. The central problem of structure determination based on experimental data can now be cast in the following general form: Is it possible to derive the conformation r from the measured values $q^{\mathrm{obs}}$ using relation (4)? A straightforward solution, viz., inversion of relation (4), is not at hand for the following reasons.

1. Except for high-resolution X-ray diffraction measurements, there are insufficient observed quantities $q^{\mathrm{obs}}$ available to determine the conformation r uniquely from the latter; that is, the number of observed quantities is much smaller than the number of degrees of freedom:

$$N_{\mathrm{obs}} \ll N_{\mathrm{df}} \qquad (5)$$

2. The Boltzmann averaging $\langle \ldots \rangle$ over the configuration space of the molecule cannot be inverted.
3. Except in the case of diffraction, where the scattered intensity is proportional to the square of the structure factor F(hkl), which in turn is the spatial Fourier transform of the electron density, the exact dependence of q on r, viz. the function q(r), is not known. The functions currently used to express NOE intensities, J-coupling constants, or chemical shifts in terms of molecular conformation are all empirical and approximate, and depend on many aspects of molecular structure.
4. Even if the function q(r) is known, it is generally not invertible; viz., the inverse function r(q) is not defined. The electron density depends not only on diffraction intensities but also on the phases. The Karplus relation gives J-coupling constants as a function of dihedral angle values; its inverse is multiple-valued, so the relation is not invertible.

Therefore, the structure determination problem is generally solved in an indirect manner by using a molecular modeling approach.

2. THE MOLECULAR MODELING APPROACH

When formulating an electronic, atomic, or molecular model of a molecular system of interest in order to calculate physical quantities q which depend on the molecular configuration and electronic state, the following four choices have to be made.
1. Which degrees of freedom, indicated generally by r, of the real system will be explicitly treated in the molecular model? The degrees of freedom upon which the quantity q critically depends should at least be explicitly modeled in order to be able to evaluate q(r). (We note that r is used to indicate both a molecular conformation and the degrees of freedom of a molecule.)
2. Which interaction function or force field will be used to calculate the potential energy of the molecular system and the forces along the explicitly treated degrees of freedom? The better the force field, the better the calculated values for q(r) will be.
3. Which equations of motion will be integrated to simulate the behavior of the molecular system, or which configurational sampling method will be used to generate a Boltzmann-weighted ensemble of configurations of the molecular system? According to statistical mechanics the weight of a configuration r in the average should be proportional to its Boltzmann factor

$$P(r) \propto \exp[-V(r)/k_B T] \qquad (6)$$

where $k_B$ is Boltzmann’s constant and T is the absolute temperature.
4. Which function q(r), expressing the quantity q in terms of the chosen degrees of freedom r, will be used to calculate $\langle q(r) \rangle$ from the ensemble of configurations r?
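Choices 3 and 4 can be illustrated with a minimal sketch, assuming a small discrete set of configurations whose potential energies are already known. The function name `boltzmann_average`, the two-conformer example, and the energy units (kJ/mol) are illustrative assumptions and do not come from the chapter.

```python
import math

K_B = 0.0083144626  # Boltzmann's constant per mole, in kJ/(mol K)


def boltzmann_average(q_values, energies, temperature):
    """Boltzmann-weighted average of q over a discrete set of configurations.

    q_values[n] is q evaluated for configuration n, energies[n] is its
    potential energy V in kJ/mol, and temperature is T in kelvin.
    """
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # shift energies so the exponentials stay well scaled
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    partition_sum = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / partition_sum


# Hypothetical example: two conformers with an interatomic distance of
# 0.3 nm and 0.6 nm, the second lying 5 kJ/mol higher in energy.
avg_300 = boltzmann_average([0.3, 0.6], [0.0, 5.0], 300.0)
```

In an actual MD or Monte Carlo simulation the Boltzmann weighting is produced by the sampling itself, so a plain arithmetic mean over the trajectory plays the role of this weighted sum.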
Using the chosen molecular model, an ensemble of configurations r is generated from which the averages $\langle q(r) \rangle$ are calculated, which then can be compared with the observed values $q^{\mathrm{obs}}$ in order to assess the quality of the molecular model and computational procedure. If (i) the degrees of freedom relevant to the quantity q have been included in the model, (ii) the interaction function is exact, (iii) the function q(r) is exact, and (iv) the sampling of configurational space is infinite, then the problem is solved; i.e., $\langle q(r) \rangle$ will be equal to $q^{\mathrm{obs}}$: the experimental data are exactly reproduced by the molecular modeling approach. However, in many cases this situation does not occur; i.e., $\langle q(r) \rangle \neq q^{\mathrm{obs}}$. Then a remedy would be to modify the molecular modeling approach in such a manner that the resulting $\langle q(r) \rangle$ is restrained to the observed values $q^{\mathrm{obs}}$:

$$\langle q(r) \rangle = q^{\mathrm{obs}} \qquad (7)$$
3. GENERATING ENSEMBLES CONSISTENT WITH EXPERIMENTAL DATA

In its most general form the problem of generating an ensemble of configurations r for which (7) is satisfied can be formulated as follows. Given a Hamiltonian or Lagrangian or a potential energy function V(r), and given equations of motion or a sampling method such as Monte Carlo sampling for generating a Boltzmann ensemble, can one modify their form or parameter values so as to drive the average $\langle q(r) \rangle$ to agree with the observed values $q^{\mathrm{obs}}$? More formally, can the interaction function V(r), its parameters, the equations of motion, or the sampling method be changed so that (7) is satisfied? There are at least five quite distinct approaches to restrain the motion or the conformational distribution along the degrees of freedom r such that (7) is satisfied (van Gunsteren et al., 1996b):

1. Constraint methods
2. Penalty function methods
3. Extended system methods
4. Weak-coupling methods
5. Stochastic methods
Since the penalty function method is most widely used in structure determination based on NMR spectroscopic or X-ray diffraction data, only this technique will be described here. The average $\langle q(r) \rangle$ can be restrained to the observed value $q^{\mathrm{obs}}$ by adding a restraining or penalty function term

$$V^{\mathrm{restr}}(\langle q(r) \rangle; q^{\mathrm{obs}}) \qquad (8)$$

to the physical interaction function $V^{\mathrm{phys}}(r)$:

$$V(r) = V^{\mathrm{phys}}(r) + V^{\mathrm{restr}}(\langle q(r) \rangle; q^{\mathrm{obs}}) \qquad (9)$$

The resulting function V(r) is then used in the equations of motion or sampling procedure. The restraining term should be chosen such that its value increases the more $\langle q(r) \rangle$ deviates from $q^{\mathrm{obs}}$ (Kaptein et al., 1985), e.g.,

$$V^{\mathrm{restr}} = \begin{cases} 0 & \text{if } \langle q \rangle \le q^{\mathrm{obs}} \\ \tfrac{1}{2} K^{\mathrm{restr}} \left( \langle q \rangle - q^{\mathrm{obs}} \right)^2 & \text{if } q^{\mathrm{obs}} < \langle q \rangle < q^{\mathrm{obs}} + \Delta q \\ K^{\mathrm{restr}} \left( \langle q \rangle - q^{\mathrm{obs}} - \tfrac{1}{2} \Delta q \right) \Delta q & \text{if } \langle q \rangle \ge q^{\mathrm{obs}} + \Delta q \end{cases} \qquad (10)$$

where $K^{\mathrm{restr}}$ is a force constant and $\Delta q$ is a parameter determining the range of $\langle q \rangle$ values for which $V^{\mathrm{restr}}$ is linear. Formula (10) represents an upper-bound restraint to $\langle q(r) \rangle$. A lower-bound restraint can be obtained by replacing $\Delta q$ by $-\Delta q$ and inverting the inequality symbols in (10). This is useful when treating the absence of an NOE as indicative of some minimum distance (de Vlieg et al., 1986).

4. SIX ASPECTS OF STRUCTURE DETERMINATION BASED ON EXPERIMENTAL DATA

A refined formulation of the problem of structure determination of biomolecules based on spectroscopic or diffraction data is the following. Using an interaction function or energy function or scoring function (9), a Boltzmann-weighted ensemble of molecular configurations r is to be generated. This problem has six aspects, each involving choices that affect, to varying degrees, the quality of the obtained ensemble of configurations.

1. Choice of degrees of freedom r for the molecular model used to generate the ensemble and for calculating q(r)
2. Choice of physical force field $V^{\mathrm{phys}}$
3. Choice of (empirical) function q(r) for calculating the quantity q using r
4. Choice of penalty function $V^{\mathrm{restr}}$ to restrain $\langle q(r) \rangle$ to $q^{\mathrm{obs}}$
5. Quality of the experimental data $q^{\mathrm{obs}}$ to guide the restraining
6. Choice of method and extent of Boltzmann sampling of the configurational space
These basic aspects will be discussed in the following six subsections. The aim is not to cover them thoroughly or with complete references but to accentuate particular points or problems. The subject of structure determination of biomolecules based on experimental data, particularly NMR data, has regularly been reviewed (van Gunsteren et al., 1985, 1991, 1994; Braun, 1987; Clore and Gronenborn, 1987; Oppenheimer and James, 1989; Hoch et al., 1991; Brünger and Karplus, 1991; de Vlieg and van Gunsteren, 1991; James and Basus, 1991; Torda and van Gunsteren, 1992; Wagner et al., 1992; Brünger and Nilges, 1993). This literature illustrates the ongoing evolution of the computational techniques used.
4.1. Choice of Degrees of Freedom r for Generating an Ensemble and for Calculating q(r)

Different particles, electrons, atoms, or groups of atoms can be chosen to define the degrees of freedom of the molecular model:

• electrons
• atoms
• united atoms
• amino acid residues

Solvent degrees of freedom can be explicitly treated or neglected. The choice of degrees of freedom of the model is determined by two aspects:

1. Which degrees of freedom are essential to generate a sufficiently accurate Boltzmann ensemble of molecular configurations?
2. Which degrees of freedom are required to calculate the quantity q from the ensemble of configurations r?

Ad 1. The most widely used models have (united) atomic degrees of freedom. Quantum-chemical calculations are still too expensive to incorporate electronic degrees of freedom in molecular models for structure refinement. Simple molecular models based on residue degrees of freedom are generally not of sufficient quality to be very helpful in structure refinement. The inclusion of solvent degrees of freedom is necessary when aiming at high-resolution structures.

Ad 2. The use of united-atom models matches well the calculation of structure factors to obtain X-ray diffraction intensities I(hkl), since the contribution of hydrogens is small. Except in methyl groups, the hydrogen atom positions can easily be reconstructed from their covalently bound non-hydrogen neighbors. This means that an all-atom model is not required to calculate NOE intensities, J-coupling constants, or chemical shifts. For the latter quantity the inclusion of electronic degrees of freedom may be necessary to obtain sufficient accuracy (de Dios et al., 1993). The inclusion of the electric field due to the solvent degrees of freedom might also be essential to obtain accurate chemical-shift values. Residue-based models lack the necessary detail to calculate structure factors, NOE intensities, J-coupling constants, or chemical shifts.

4.2. Choice of Physical Force Field

The type of force field that can be chosen follows partly from the selected degrees of freedom for the model. Atomic force fields are routinely used in structure determination, but they span a wide range of sophistication and accuracy (Gelin, 1993; Hünenberger and van Gunsteren, 1997).
1. The simplest energy functions only contain terms to maintain bond lengths, bond angles, chirality, and the volume of atoms (Hendrickson and Konnert, 1981; Braun and Go, 1983; Havel and Wüthrich, 1985). This type of interaction function is often employed in the first stages of structure determination.
2. The next level of sophistication is to include van der Waals attraction between atoms and to account for the observed general preferences of dihedral or torsional angles in molecules (Gerber, 1993).
3. The next level is then to include a Coulomb term which accounts for polar, hydrogen bonding, and ionic interactions.
4. Force-field terms including polarizability have not yet been used in structure refinement.

The better the force field, the better will be the ensemble of molecular conformations used to calculate $\langle q(r) \rangle$.

The question of the general quality of a force field cannot be easily answered. It would require a systematic comparison of calculated (without restraints) and experimentally measured values for a range of structural, dynamic, and thermodynamic properties for a variety of biomolecular systems. An example of such a comparison for only one molecule, a hepta-β-peptide in methanol solution, and only two types of quantities or properties, a set of 42 NOE distance bounds and a set of 21 J-coupling constants, is given in Figs. 1 and 2. In an (unrestrained) 2-ns molecular dynamics (MD) simulation of the β-peptide in a periodic box with 962 methanol molecules using the GROMOS 43A1 force field (van Gunsteren et al., 1996a), the NOE distance bounds are satisfied, the average deviation being only 0.05 Å, and the J-coupling constant values are essentially reproduced, the mean deviation being only 0.53 Hz (Daura et al., 1997). We note that the averages over the ensemble generated in the MD simulation are closer to the experimental values than the NOE distances and J-coupling constants of the starting structure, which was a model built using the NMR data.

4.3. Choice of (Empirical) Function q(r) for Calculating the Quantity q Using r

The expression relating the structure factor F(hkl) to the electron density or molecular conformation r is a simple Fourier transform with the atomic X-ray scattering factors and the atomic B-factors representing the spatial extension of the atoms as parameters (Hendrickson and Konnert, 1981). The NOE intensities should in principle be obtained through a relaxation matrix calculation (James, 1991; Bonvin et al., 1996), but are often converted to NOE distance bounds, thereby neglecting multispin dipolar relaxation and angular time correlation of the spatial vector connecting the nuclei i and j (Tropp, 1980). The J-coupling constants are generally calculated using the Karplus relation (Karplus, 1959, 1963)
$$J(\theta) = a \cos^2 \theta + b \cos \theta + c \qquad (11)$$

where $\theta$ is the dihedral angle between vicinal nuclei i and l. Relation (11) is approximate, and the empirical parameters a, b, and c are constants for a particular type of J-coupling constant, calibrated by fitting measured J-coupling constants of molecules whose dihedral angles are known from crystal structures. This leads to different sets of a, b, and c values (Bystrov, 1976; de Marco et al., 1978; Pardi et al., 1984; Wang and Bax, 1996) being used in structure refinement. A simple-minded practice is to convert J-coupling constants into dihedral angle values using relation (11). Since the inverse of the Karplus relation is multiple valued for 90% of the range of J, and the average over $\theta$ is very nonlinear, this practice is hazardous except for very large J.

The expressions used to calculate chemical shifts from molecular conformation are even more empirical. Although much progress has been made by using quantum-chemical calculations to relate chemical shifts to the local molecular conformation (de Dios et al., 1993), the nonlocal contributions to the chemical shift are based on empirical formulae which account for magnetic anisotropy (ApSimon et al., 1967), aromatic ring current effects (Johnson and Bovey, 1958), and electric field effects (Buckingham, 1960). As for the Karplus relation, the parameters of the various expressions are obtained by fitting measured values from molecules of known conformation (Williamson et al., 1992).

Due to the above-mentioned features of the Karplus relation and the approximate character of the expressions for the chemical shifts, the inclusion of J-coupling constants (Torda et al., 1993; Garrett et al., 1994) or of chemical shifts (Harvey and van Gunsteren, 1993; Ösapay et al., 1994; Kuszewski et al., 1995a, 1995b) in structure refinement of biomolecules has not yet led to a significant improvement of the resulting ensemble of molecular conformations.
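The hazard of converting a J-coupling constant directly into a dihedral angle can be made concrete with a short sketch. It assumes a Karplus relation of the form a cos²θ + b cos θ + c; the coefficient values 6.4, -1.4, and 1.9 Hz are illustrative assumptions of the magnitude commonly quoted for backbone couplings and should not be read as the chapter's parameters. The function names are hypothetical.

```python
import math


def karplus(theta_deg, a=6.4, b=-1.4, c=1.9):
    """J-coupling constant (Hz) for a dihedral angle theta in degrees."""
    x = math.cos(math.radians(theta_deg))
    return a * x * x + b * x + c


def dihedrals_for_coupling(j_obs, a=6.4, b=-1.4, c=1.9):
    """All theta in [-180, 180] degrees for which karplus(theta) equals j_obs.

    The relation is a quadratic in cos(theta), so up to two cosine values
    qualify, and each cosine corresponds to the angles +theta and -theta.
    """
    disc = b * b - 4.0 * a * (c - j_obs)
    if disc < 0.0:
        return []  # j_obs lies below the minimum of the curve
    solutions = set()
    for sign in (1.0, -1.0):
        x = (-b + sign * math.sqrt(disc)) / (2.0 * a)
        if -1.0 <= x <= 1.0:
            theta = math.degrees(math.acos(x))
            solutions.add(round(theta, 6))
            solutions.add(round(-theta, 6))
    return sorted(solutions)


candidates = dihedrals_for_coupling(5.0)  # an intermediate coupling value
```

With these assumed coefficients an intermediate value such as 5 Hz is compatible with four distinct angles, whereas a value near the top of the curve leaves only a symmetric pair.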
4.4. Choice of Penalty Function $V^{\mathrm{restr}}$ to Restrain $\langle q(r) \rangle$ to $q^{\mathrm{obs}}$

The penalty function $V^{\mathrm{restr}}$ serves the purpose of keeping or bringing $\langle q(r) \rangle$ close to $q^{\mathrm{obs}}$, or if $q^{\mathrm{obs}}$ only serves as upper or lower bound, to achieve that

$$\langle q(r) \rangle \le q^{\mathrm{obs}} \quad \text{or} \quad \langle q(r) \rangle \ge q^{\mathrm{obs}} \qquad (12)$$

respectively. The penalty function should satisfy the following conditions.

1. The value of $V^{\mathrm{restr}}$ should increase the more $\langle q(r) \rangle$ violates the bound $q^{\mathrm{obs}}$; e.g., the function could be quadratic or of higher power in the violation [see Eq. (10)]. On the other hand, in order to maintain molecular flexibility when searching conformational space, it should not restrict violations too rigidly.
2. For larger violations the gradient of $V^{\mathrm{restr}}$ with respect to its argument should be bounded in order to avoid overly dominant restraining forces (which are proportional to this gradient) when structures with large violations are considered.
3. The function should be continuous and possess a continuous first derivative with respect to its argument in order to allow its use in MD simulations.
4. The function should be as simple as possible in order to keep the calculation efficient. Since $V^{\mathrm{restr}}$ has no physical meaning, no restrictions due to physical laws inhibit the choice of a simple form.

A variety of functional forms for $V^{\mathrm{restr}}$ has been used in the literature (Scarsdale et al., 1988; Fry et al., 1989). Expression (10) is used by van Gunsteren et al. (1996a). As has been argued at length in the literature (Gros et al., 1990; Torda et al., 1990, 1993; Gros and van Gunsteren, 1993; Bonvin et al., 1994; Nanzer et al., 1994; Schiffer et al., 1995; Fennen et al., 1995), the use of an average over time and/or molecules, $\langle q(r) \rangle$, of the observed quantity q in the restraining potential energy term is essential for a correct representation of the experimental data in the molecular simulation. Generally, the observed quantity q(r), such as the NOE intensity, the J-coupling constant, or the chemical shift, depends in a nonlinear way on the atomic coordinates, which implies that
$$\langle q(r) \rangle \neq q(\langle r \rangle) \qquad (13)$$

that is, the average of q over an ensemble of conformations will be different from the value of q calculated using the mean structure. As a consequence, there may be no single structure that will fit all the experimental data simultaneously (Nanzer et al., 1994). In addition, if the averaging is omitted from $V^{\mathrm{restr}}$, the mobility of the molecular system or the variation of conformations in the ensemble is strongly reduced (Nanzer et al., 1995).
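A small numerical sketch may help to show how strongly the two sides of this inequality can differ. It assumes the common approximation that an NOE intensity scales as the inverse sixth power of the interproton distance; the function name and the two-conformer distances are hypothetical.

```python
def noe_effective_distance(distances, power=6):
    """Distance that reproduces the ensemble-averaged r**(-power) signal.

    distances: interproton distances (nm), one per equally weighted conformer.
    """
    mean_inverse = sum(r ** -power for r in distances) / len(distances)
    return mean_inverse ** (-1.0 / power)


# Hypothetical ensemble: the proton pair is close in one conformer, far in the other.
distances = [0.25, 0.55]
mean_distance = sum(distances) / len(distances)
effective_distance = noe_effective_distance(distances)
```

Here the mean distance is 0.40 nm, yet the distance inferred from the averaged signal is close to 0.28 nm, because the short-distance conformer dominates the average. A single structure forced to that distance would represent neither conformer.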
The ensemble average can be taken as a time (trajectory) average (Torda et al., 1989) or as an average over different molecules (Scheek et al., 1991; Bonvin et al., 1994; Fennen et al., 1995). In MD simulations the use of the time average

$$\langle q \rangle_t = \frac{1}{t} \int_0^t q(r(t')) \, dt' \qquad (14)$$

is the natural choice. Formula (14) is the true average of q and is used in the analysis of simulation trajectories, but it is not suitable for deriving a restraining force from $V^{\mathrm{restr}}$ during a simulation: the rate of change of $\langle q \rangle_t$ depends on, and decreases with, the length of the averaging period t. This problem is avoided by building a decay into the summation over time with a characteristic decay time or memory relaxation time $\tau$, so that

$$\langle q \rangle_t = \frac{1}{\tau \left[ 1 - \exp(-t/\tau) \right]} \int_0^t \exp\left[ -(t - t')/\tau \right] q(r(t')) \, dt' \qquad (15)$$

is used in $V^{\mathrm{restr}}$. As a consequence of using (14) or (15) in the restraining potential energy term, the interaction function V(r) becomes a nonconservative function, so the total energy will not be conserved in an MD simulation with time-averaged restraining. The ensemble average can alternatively be taken as an average over different molecules (Scheek et al., 1991):

$$\langle q \rangle = \sum_n P_n \, q(r_n) \qquad (16)$$

where $P_n$ is the probability of conformation or molecule n in the ensemble of conformations or molecules. In the first application of (16) the weights of the $N_{\mathrm{conf}}$ conformations when averaging were taken as identical (Scheek et al., 1991):

$$P_n = 1 / N_{\mathrm{conf}} \qquad (17)$$
According to statistical mechanics this is incorrect. The probabilities $P_n$ should satisfy a Boltzmann distribution (Fennen et al., 1995):

$$P_n = \frac{\exp[-V(r_n)/k_B T]}{\sum_m \exp[-V(r_m)/k_B T]} \qquad (18)$$

The advantage of time averaging (14) over molecule averaging (16) is that the relative Boltzmann probability of the configurations of a trajectory is guaranteed when proper equations of motion are integrated. A limitation of time averaging over an MD trajectory is that during the finite simulation time not all conformations which contribute to the measured $q^{\mathrm{obs}}$ will be visited, e.g., due to the presence of high-energy barriers separating low-energy conformations, as in the case of a cyclic molecule (Fennen et al., 1995). In such a case, one can combine space or molecule averaging [Eqs. (16) and (18)] with time averaging [Eqs. (14) and (15)] to obtain a set of Boltzmann-weighted conformers that reproduces the observed quantities $q^{\mathrm{obs}}$.

In conventional structure refinement of biomolecules using the penalty function approach, the averaging of the quantity q(r) over conformations r is neglected. This is not a problem for relatively rigid, nonmobile (parts of) systems. However, as soon as conformational variability plays a role, averaging of q(r) over r is required in order to avoid misinterpretation of the molecular modeling results.

The method of time-averaging structure refinement, using X-ray diffraction data, has been investigated using different molecular systems (Gros et al., 1990; Gros and van Gunsteren, 1993; Burling and Brünger, 1994; Clarage and Phillips, 1994; Schiffer et al., 1995). Given sufficient diffracted X-ray beam intensities, the method gives a better representation of the conformational variability of a biomolecule in crystalline form than can be obtained by use of conventional refinement methods with either isotropic or anisotropic temperature factors. This is illustrated in Fig. 3, which shows the exact and two differently refined electron density maps for a part of a cyclodextrin crystal. Time-averaging refinement leads to a cleaner map and a lower R-value (5.23%) than anisotropic B-factor refinement (R-value of 8.74%).
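The memory-damped time average referred to as Eq. (15) is usually evaluated on the fly rather than by storing the whole trajectory. The sketch below uses a common one-step recursion for an exponentially weighted average; this discretization, the function name, and the step-change test signal are assumptions made for illustration and are not claimed to be the authors' implementation.

```python
import math


def damped_running_average(samples, dt, tau):
    """Exponentially damped running average of a sampled quantity q(t).

    samples: values of q at successive time steps of length dt.
    tau: memory relaxation time, in the same units as dt.
    Returns the running average after each step.
    """
    decay = math.exp(-dt / tau)
    average = samples[0]  # initialise the memory with the first sample
    history = [average]
    for q in samples[1:]:
        # Old memory fades by exp(-dt/tau); the new sample fills the remainder.
        average = decay * average + (1.0 - decay) * q
        history.append(average)
    return history


# Hypothetical signal: q jumps from 1 to 0 after the first step.
history = damped_running_average([1.0] + [0.0] * 10, dt=1.0, tau=5.0)
```

After ten steps with tau equal to five steps, the memory of the initial value has decayed to exp(-2), so the average keeps responding to the instantaneous value at a rate set by tau and not by the total simulation length.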
The application of time-averaging structure refinement using NOE intensity data (Bonvin et al., 1994) or NOE distance restraint data (Torda et al., 1990; Nanzer et al., 1994) has led to identification of conformational misinterpretations, which were based on the neglect of averaging over the molecular motion. When applying time-averaging of NOE distances the Tyr 15 side chain of tendamistat was shown to be very mobile while satisfying the NOE restraints, in accordance with the flat (featureless) crystallographic electron density map for this side chain, whereas the use of conventional structure refinement (neglecting the averaging) led to a unique and artificially very well defined side-chain conformation for Tyr 15; see Fig. 4 (Torda et al., 1990). For chymotrypsin inhibitor 2 (CI-2) it was shown that time-averaging structure refinement could resolve the discrepancy found between the NMR solution structure, on the one hand, and the X-ray crystal structure, on the other, each of which had been determined conventionally, i.e., neglecting averaging (Nanzer et al., 1994).

The method of time-averaging restraining has also been applied using J-coupling constant data (Torda et al., 1993; Nanzer et al., 1997). This is, however, more problematic than in the case of X-ray diffraction structure factor or NOE intensity or NOE distance restraining. Application of time-averaging restraining will improve the agreement between calculated average and observed J-coupling constant values, but often at the cost of generating relatively large structural fluctuations, as illustrated in Fig. 5 by the fluctuations of the radius of gyration of the dodecapeptide (Nanzer et al., 1997). The enhanced fluctuations are due to the combined use of attractive and repulsive restraints of type (10), which only depend on the average and not on the instantaneous value. This causes the restraining forces to be nonzero as long as the average $\langle q \rangle$ violates the bound $q^{\mathrm{obs}}$, even when the actual value q(r(t)) already satisfies the bound. This should be remedied by choosing a penalty function which depends on both the average $\langle q \rangle$ and the instantaneous value q(r(t)) [Eq. (19)].
When applying time averaging to NOE distance restraint data, no artificial structural fluctuations are induced (Nanzer et al., 1995, 1997); see Fig. 5. This is due to the following particular features of NOE distance restraining:

1. Generally, only attractive time-averaged restraints (upper bounds) are used.
2. The van der Waals repulsion between atoms in contact counteracts the attractive restraint.
3. The distance averaging favors short distances, which makes the average follow rapidly the actual value for small r.
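An attractive upper-bound restraint of the kind described for Eq. (10), harmonic for small violations and linear beyond a switching width, can be sketched as follows. The exact functional form, the function names, and the numerical values (bound 0.3 nm, force constant 1000 kJ mol⁻¹ nm⁻², switching width 0.1 nm) are assumptions chosen to satisfy the conditions on penalty functions listed in Section 4.4, not a transcription of the chapter's formula.

```python
def restraint_energy(q_avg, q_obs, k_restr, delta_q):
    """Upper-bound penalty: zero below the bound, harmonic, then linear."""
    violation = q_avg - q_obs
    if violation <= 0.0:
        return 0.0  # bound satisfied, no penalty
    if violation < delta_q:
        return 0.5 * k_restr * violation ** 2
    # Linear continuation chosen so that value and slope match at delta_q.
    return k_restr * (violation - 0.5 * delta_q) * delta_q


def restraint_gradient(q_avg, q_obs, k_restr, delta_q):
    """Derivative of the penalty with respect to q_avg, capped at k_restr * delta_q."""
    violation = q_avg - q_obs
    if violation <= 0.0:
        return 0.0
    return k_restr * min(violation, delta_q)


# Hypothetical NOE upper bound of 0.3 nm on a (time-averaged) distance.
energy_small = restraint_energy(0.35, 0.3, 1000.0, 0.1)
gradient_large = restraint_gradient(1.0, 0.3, 1000.0, 0.1)
```

Because the gradient is capped, a badly violated restraint pulls with the same finite force as a moderately violated one, which keeps restraining forces from overwhelming the physical force field early in a refinement.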
4.5. Quality of the Experimental Data $q^{\mathrm{obs}}$ to Guide the Restraining

If the experimental data $q^{\mathrm{obs}}$ are incorrect, or if the averaging involved in measuring $q^{\mathrm{obs}}$ is very different from the averaging involved in measuring other restraints or from the averaging over the generated ensemble, a conflict may occur between the forces or interactions due to the different terms in (9). The total interaction function V(r) contains

1. A physical force-field term $V^{\mathrm{phys}}$, representing the general knowledge about molecular structure which is based on experimental data.
2. Restraining terms $V^{\mathrm{restr}}$, representing the specific experimental data with respect to the particular molecule of interest, such as
   a. X-ray or neutron diffraction intensities or structure factor amplitudes
   b. NOE intensities or distance bounds
   c. J-coupling constants
   d. chemical shifts.

An example of a conflict between the physical force field and the crystallographic structure factor restraining term, on the one hand, and the NOE distance restraining term, on the other hand, can be found in Schiffer et al. (1994). The structure of the protein basic pancreatic trypsin inhibitor (BPTI) was refined using different restraining terms in addition to the GROMOS87 force field (van Gunsteren and Berendsen, 1987) for $V^{\mathrm{phys}}$:

1. No restraining terms
2. Structure factor restraining term involving 15,867 X-ray reflection intensities
3. NOE distance restraining term involving 642 distance upper bounds
4. Both the structure factor and the NOE distance restraining terms

It was found that the side chains of Leu 6 and Phe 22 show unfavorable torsional-angle values when the NMR data set was applied, indicating a conflict between the NMR data and the other molecule-specific and general structural data.
4.6. Choice of Method and Extent of Boltzmann Sampling of the Configurational Space

4.6.1. Sampling Methods

Since biomolecular conformational space is too large to be exhaustively sampled, biomolecular modeling generally has to rely on heuristic methods for sampling and searching for low-energy conformers. An overview of types of methods to search and sample conformational space can be found in van Gunsteren et al. (1995a). Only a subset of the great variety of methods has been tried in structure refinement based on spectroscopic or diffraction data. The most widely used sampling methods are the following.

1. Non-Boltzmann sampling, such as
   a. repeated distance geometry calculations
   b. structural database sampling
   c. random or gradient-driven variation of torsional angles sampling
2. Boltzmann sampling, such as
   a. conventional or configuration-bias Monte Carlo simulation (Frenkel, 1993)
   b. molecular dynamics (MD) simulation (Allen and Tildesley, 1987)
   c. stochastic dynamics (SD) simulation (van Gunsteren, 1993)

The efficacy of the sampling is generally restricted by the nature of the energy surface V(r). The occurrence of high-energy barriers between local minima may inhibit proper sampling. Therefore, techniques have been developed to enhance the sampling power of the methods.
1. Deformation of the potential energy surface in order to reduce barriers.
   a. Smoothing of the potential energy surface V(r) allows for better sampling. Examples are the variable-target-function method (Braun and Go, 1983), in which gradually longer-ranged atom pair interactions are included while refining a structure against NMR data, and the use of low-resolution data, which flattens the restraining term $V^{\mathrm{restr}}$, in the initial stages of crystallographic refinement (Gros et al., 1989; Gros and van Gunsteren, 1993). The use of simplified nonbonded interaction functions in the physical force field $V^{\mathrm{phys}}$ in the early stages of structure refinement based on NMR data also falls in this class (Nilges et al., 1988).
   b. Incorporation of information on the energy hypersurface obtained during refinement into the potential energy function is another possibility to enhance sampling. Examples are the local-elevation search method (Huber et al., 1994) and the use of time-averaging restraints in structure refinement based on NMR or X-ray data, which leads to much-enhanced sampling (van Gunsteren et al., 1994).
   c. Extension of the dimensionality of the configurational space allows for circumventing of energy barriers. An example is the use of MD simulation in four-dimensional Cartesian space in structure refinement based on NMR data (van Schaik et al., 1993).
2. Scaling of system parameters in order to improve the statistics of sampling.
   a. Simulated temperature annealing is widely used in structure refinement based on NMR or X-ray data.
   b. Scaling of atomic masses does not affect statistical mechanical averages of quantities q that only depend on spatial coordinates r (in the absence of constraints), so it can be used to enhance sampling (Mao and Friedman, 1990).
   c. Mean-field approximations to the energy function can be used to enhance sampling at the cost of losing details of the interaction (Elber and Karplus, 1990; Zheng et al., 1993; Huber et al., 1996).
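Why raising the temperature helps a sampler cross barriers, the idea behind simulated temperature annealing in item 2a, can be sketched with the standard Metropolis acceptance rule. The geometric cooling schedule, the function names, and the numerical values (a 10 kJ/mol uphill move, thermal energies of 2.5 and 25 kJ/mol) are assumptions for illustration; actual refinement protocols use their own schedules.

```python
import math


def metropolis_acceptance(delta_v, kT):
    """Probability of accepting a trial move that changes the energy by delta_v."""
    if delta_v <= 0.0:
        return 1.0  # downhill moves are always accepted
    return math.exp(-delta_v / kT)


def annealing_schedule(kT_start, kT_end, n_steps):
    """Geometric cooling from kT_start to kT_end in n_steps (n_steps >= 2) stages."""
    ratio = (kT_end / kT_start) ** (1.0 / (n_steps - 1))
    return [kT_start * ratio ** i for i in range(n_steps)]


# Hypothetical barrier-crossing move costing 10 kJ/mol.
p_hot = metropolis_acceptance(10.0, 25.0)
p_cold = metropolis_acceptance(10.0, 2.5)
schedule = annealing_schedule(25.0, 2.5, 5)
```

At the high starting temperature the uphill move is accepted about two times in three, while at the final temperature it is accepted less than two times in a hundred, so barrier crossings occur mainly early in the schedule and the system settles into a low-energy region as it cools.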
4.6.2. Choice of Coordinate System

When considering branched polymers, the choice of internal coordinates (bond lengths, bond angles, and torsional angles) seems to be natural. However, the equations of classical dynamics expressed in the internal, generalized coordinates [Eq. (20)] are considerably more complex than when expressed in Cartesian coordinates,

$$m_i \frac{d^2 r_i}{dt^2} = -\frac{\partial V(r)}{\partial r_i} \qquad (21)$$

where $m_i$ is the mass of atom i. Equations (20) have been presented in different forms (Wittenburg, 1977; Katz et al., 1979; Bae and Haug, 1987, 1988; Mazur et al., 1991; Jain et al., 1993; Turner et al., 1993; Rice and Brünger, 1994; Mathiowetz et al., 1994), and the coefficients appearing in them depend on the atomic masses and the molecular topology of the polymer considered. Since the Cartesian (Newtonian) equations of motion do not involve the explicit coupling of the equations through the summation over the index j in (20) and lack the two terms depending on the generalized velocities in (20), Cartesian equations of motion are generally used in MD and stochastic dynamics (SD) simulations of condensed phase systems for efficiency reasons (Allen and Tildesley, 1987; van Gunsteren and Berendsen, 1990).
4.6.3. Use of (Soft) Constraints or Multiple-Time-Step Algorithms

In a molecular simulation or energy minimization the bulk of the computational effort is spent in evaluating the interaction function V(r) and its partial derivatives with respect to the spatial coordinates, $\partial V / \partial r_i$ (which are minus the forces). When using Cartesian coordinates, integration of the Newtonian equations of motion (21) is a simple task requiring, in general, less than 1% of the total computational effort. The various interaction terms contained in V(r) require quite different computational efforts. The evaluation of nonbonded interactions (van der Waals, Coulomb), up to a physically reasonable cutoff distance around each atom, is by far the dominant task, requiring easily 80%–99% of the computing time, the exact figure depending on the cutoff radius. The other interaction terms (torsional-angle, bond-angle bending, and bond-stretching interactions) generally take 1%–10% of the effort. The different restraining terms require very different computing efforts. A structure factor calculation using an atomic-resolution spatial grid will easily dominate the whole calculation of V(r). An NOE relaxation matrix calculation can also be relatively costly, depending on the approximations involved (Bonvin et al., 1994). NOE distance restraining and J-coupling constant restraining are relatively cheap operations, generally requiring less than a few percent of the computational effort. A chemical-shift restraining term will in general be somewhat more expensive due to the larger number of neighboring atoms involved in the calculation of the chemical shift of an atom i.

Algorithms to integrate equations of motion forward in time are based on the approximation that the forces and their derivatives are constant during the integration time step $\Delta t$, which is more exact the smaller $\Delta t$ is. This approximation is, of course, also exact if the force is constant, or in other words if the derivative of the force or the second derivative of the energy function V(r), viz., the curvature of V(r), equals zero. So, the length of the integration time step $\Delta t$ is limited by the size of the local curvature of V(r) or, in physical language, by the highest-frequency motions occurring in the molecular system [Eqs. (22) and (23)].
24
Wilfred F. van Gunsteren et al.
For example, for the one-dimensional classical harmonic oscillator with potential energy function

$$V(x) = \tfrac{1}{2} K x^{2} \qquad (24)$$

and mass m, one has

$$\nu = \frac{1}{2\pi} \left( \frac{K}{m} \right)^{1/2} \qquad (25)$$

and the curvature $d^{2}V/dx^{2}$ equals the force constant K. The larger the curvature K of V(x), the higher the frequency $\nu$ of the motion and the smaller the time step $\Delta t$ that must be used. Expressions (22)–(25) show that a smoothing of the interaction function V(r) will allow for larger integration time steps to be used.
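The link between curvature, frequency, and admissible time step can be checked numerically. The following is a minimal sketch, not taken from the chapter: it integrates the harmonic oscillator of Eq. (24) in reduced units with the velocity Verlet scheme (an assumed choice of integrator) and reports the worst relative energy error. Function names and parameter values are illustrative assumptions.

```python
import math

def frequency(K, m):
    """Oscillation frequency nu = (1/(2*pi)) * sqrt(K/m), as in Eq. (25)."""
    return math.sqrt(K / m) / (2.0 * math.pi)

def worst_energy_error(K, m, dt, n_steps=200):
    """Integrate m*x'' = -K*x with velocity Verlet; return max |E/E0 - 1|."""
    x, v = 1.0, 0.0                       # unit displacement, at rest
    force = -K * x
    e0 = 0.5 * m * v * v + 0.5 * K * x * x
    worst = 0.0
    for _ in range(n_steps):
        v += 0.5 * dt * force / m         # first half kick
        x += dt * v                       # drift
        force = -K * x                    # force at the new position
        v += 0.5 * dt * force / m         # second half kick
        energy = 0.5 * m * v * v + 0.5 * K * x * x
        worst = max(worst, abs(energy / e0 - 1.0))
    return worst

if __name__ == "__main__":
    m = 1.0
    for K in (1.0, 100.0):                # soft and stiff oscillator (assumed values)
        omega = math.sqrt(K / m)          # angular frequency, 2*pi*nu
        for scaled_dt in (0.1, 2.1):      # time step in units of 1/omega
            err = worst_energy_error(K, m, scaled_dt / omega)
            print(f"K={K:6.1f}  nu={frequency(K, m):.3f}  "
                  f"omega*dt={scaled_dt:.1f}  max energy error={err:.3g}")
```

For this particular integrator the trajectory stays bounded only for $\omega \Delta t < 2$ with $\omega = 2\pi\nu$, so a hundredfold larger K calls for a tenfold smaller $\Delta t$ at the same accuracy.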
In biomolecular systems a hierarchy of motional frequencies originating in different types of interatomic interactions can be distinguished. In order of decreasing frequency, or increasing smoothness of the corresponding (physically modeled) interaction term, we have

I. Bond-stretching vibrations, with an approximate oscillation or relaxation time of about 10 fs
II. Bond-angle bending vibrations, torsional-angle vibrations around double bonds, and water (or solvent) librational vibrations
III. Motions dominated by van der Waals contacts, single-bond torsional interactions, and short-range (e.g., hydrogen bonding) Coulomb interactions
IV. Motions determined by long-range Coulomb (ionic, dielectric) interactions
This hierarchy can be exploited to enhance the efficiency of a simulation through the use of longer time steps $\Delta t$. Four techniques to lengthen $\Delta t$ are available (van Gunsteren, 1991).

1. Application of holonomic distance constraints. This can be implemented by formulating Lagrange equations of motion in the generalized (e.g., torsional) coordinates (Katz et al., 1979; Mazur et al., 1991) [see (20)] or by using Newton’s equations of motion, Cartesian coordinates, and Lagrangian multipliers to satisfy the constraints (Ryckaert et al., 1977; Ciccotti et al., 1982; Hess et al., 1997).
2. Application of soft or adiabatic (variable length) distance constraints. The difference with respect to the hard or holonomic constraints is that the length of a constrained distance is not constant throughout the simulation, but varies per integration time step such that the gradient of the total potential energy (that is, minus the force) along the constrained degree of freedom is zero after each time step (Reich, 1995, 1996). In other words, the length of a constrained distance is adiabatically adjusted at each time point such that no strain is built up along the constrained degree of freedom. Equivalently, after each time step the total potential energy is minimized, allowing only changes in the constrained degrees of freedom.
3. Application of multiple-time-step (MTS) algorithms, using different time steps, each satisfying condition (22) with respect to the relaxation time of the corresponding interaction, when integrating the contributions of the different forces. MTS algorithms have been used for integrating bond-stretching and bond-angle bending forces separately from the remaining forces (Tuckerman et al., 1992; Watanabe and Karplus, 1993) and for integrating long-range Coulomb forces separately from the remaining forces in the so-called twin-range method (van Gunsteren and Berendsen, 1990).
4. Softening or smoothing of the high-frequency (most strongly curved) interaction terms in V(r) will also allow the use of longer integration time steps $\Delta t$. This technique is often used in the earlier stages of structure refinement in order to enlarge the radius of convergence of the restraining minimization and in order to avoid a reduction of $\Delta t$ when simulating at high temperature in simulated temperature annealing refinement.

What are the approximations used in the four different methods, and what are their relative advantages and disadvantages? Methods 1, 2, and 3 all treat some high-frequency degrees of freedom differently (as hard or soft constraints or with a smaller $\Delta t$) from the other lower-frequency degrees of freedom. This is physically allowed only if

1. The frequency components of the motion along the specially treated (high-frequency) degrees of freedom are well separated from the other frequencies occurring in the molecular system.
2. The coupling between both types of motion is weak.

When applying constraints (methods 1 and 2), so-called metric tensor correction terms may have to be added to the interaction function V(r), depending on the type of constraints used (van Gunsteren, 1980; Ryckaert, 1991). Moreover, a physical force field calibrated for use in unconstrained simulations may have to be recalibrated for use in conjunction with hard constraints, which will rigidify the molecules, or with soft constraints, which will mollify (or soften up) the molecules (van Gunsteren and Karplus, 1982). Another relevant aspect is the quantum-mechanical
nature of particular intramolecular vibrations. Bond-stretching vibrations have frequencies $\nu$ well above the frequency to which the thermal energy $k_{B}T$ corresponds at room temperature. Thus we have (h is Planck’s constant)

$$h\nu \gg k_{B}T \qquad (26)$$

which implies that the bond vibrations are essentially of quantum-mechanical nature and that only the ground state will be populated. Treating the bonds as hard constraints (method 1) is most likely to be a more correct approximation of the quantum dynamics than adiabatic dynamics (method 2) or classical (harmonic
oscillator) dynamics (methods 3 and 4), with its different energy distribution, would be. Another unpleasant aspect of fully flexible molecular models is the presence of weakly coupled modes of different frequencies, which makes the energy redistribution between these modes in a molecular simulation a slow process. This is a technical problem, which can be avoided by coupling the different degrees of freedom (high-frequency versus the rest) separately to a heat bath (van Gunsteren et al., 1996a). This problem is likely to occur when MTS algorithms (method 3) are used. In method 1 the high-frequency modes have been eliminated or decoupled, and in methods 2 and 4 they have been strongly coupled to the other modes, allowing for easy energy redistribution. Finally, an obvious effect of the application of method 4 (softening of V(r)) is an enlargement of the structural fluctuations of the molecules along the degrees of freedom for which the interaction has been smoothed.

We now consider the hierarchy of motional frequencies and the corresponding interaction terms given before, motions of types I–IV, and discuss the relative merits of applying one of the four time-saving methods to lengthen the integration time step to the particular type of motion.

I. Bond-stretching motion

The most appropriate treatment of the bond-stretching degrees of freedom is the use of hard constraints (method 1). It is a good approximation of their quantum-mechanical nature, it avoids the energy redistribution problem (method 3), and it can be very simply carried out using Cartesian coordinates (Newton’s equations of motion) with the SHAKE method (Ryckaert et al., 1977; Hess et al., 1997). Metric tensor effects are negligible (van Gunsteren, 1980) and force-field corrections are not necessary: the dynamics of the molecular system is not affected by the bond-length constraints (van Gunsteren and Karplus, 1982). Use of one of the time-saving methods 1–4 can reduce the computational effort.

II. Bond-angle bending motion, torsional motion around double bonds, water librational motion

In simulations including explicit water molecules, no gain in computational efficiency can be obtained by use of one of the time-saving methods due to the presence of the high-frequency water molecule librations governed by the nonbonded (van der Waals, Coulomb) interactions.
These limit the time step that can be used. When simulating a macromolecule in vacuo, the bond-angle and stiff torsional-angle degrees of freedom could be treated using one of the methods 1–4, leading to a reduction of computational effort. Even so, the time step remains limited when physically realistic force fields are used. Using method 4, softening the interaction V(r), the time step can be lengthened at will, but at the cost of a loss of physical content of the molecular model. The use of bond-angle constraints (method 1), or so-called torsion-angle dynamics (Rice and Brünger, 1994; Stein et al., 1997), introduces two problems: (i) metric tensor effects are not small, so they should not be neglected (van Gunsteren, 1980); and (ii) the dynamics of the macromolecule are severely altered due to the rigidification of the molecular model: structural fluctuations and torsional-angle transitions are quenched (van Gunsteren and Karplus, 1982). To restore the proper physical behavior of the molecular system a metric tensor term should be added to V(r), and the physical force field should be recalibrated for use with bond-angle constraints. This also applies, to a lesser extent, to the use of soft bond-angle constraints (method 2).

III. Motions dominated by van der Waals contacts, single-bond torsional interactions, and short-range Coulomb interactions

These interactions dominate the essential degrees of freedom of molecular systems and should therefore not be approximated using time-saving techniques. They set the limit on the time step when physically realistic force fields are used, both for macromolecules and when water is used as solvent. As before, softening the torsional and nonbonded interaction terms (method 4) allows for larger $\Delta t$, but at a loss of physical content of the molecular model.

IV. Motions determined by long-range Coulomb interactions

Long-range electrostatic interactions can be computationally very demanding due to their dependence on a large number of atoms. Since these interactions change only slowly during a simulation, the use of a multiple-time-step algorithm (method 3) when integrating these forces can reduce the computational effort (van Gunsteren and Berendsen, 1990; van Gunsteren et al., 1996a). On the other hand, application of so-called particle–particle–particle–mesh methods to evaluate electrostatic interactions (Hockney and Eastwood, 1981) may reduce the computational effort by a factor of 100 over conventional Ewald techniques (Luty et al., 1994), combining better treatment of long-range electrostatic interactions with computational efforts in the order of the current MTS algorithms.
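The multiple-time-step idea of method 3 can be sketched in one dimension. The example below is an illustrative assumption and not the twin-range implementation referred to above: it follows the reversible splitting of the kind described by Tuckerman et al. (1992), with a stiff "fast" force integrated using a short inner step and a weak "slow" force applied only at the start and end of each long outer step. Force constants, step sizes, and function names are hypothetical.

```python
def mts_step(x, v, m, dt_long, n_inner, fast_force, slow_force):
    """One reversible multiple-time-step update: slow kick, inner loop, slow kick."""
    v += 0.5 * dt_long * slow_force(x) / m      # half kick with the slow force
    dt = dt_long / n_inner                      # short inner time step
    f = fast_force(x)
    for _ in range(n_inner):                    # velocity Verlet on the fast force
        v += 0.5 * dt * f / m
        x += dt * v
        f = fast_force(x)
        v += 0.5 * dt * f / m
    v += 0.5 * dt_long * slow_force(x) / m      # closing half kick, slow force
    return x, v

def run(n_outer=2000, dt_long=0.05, n_inner=10, k_fast=100.0, k_slow=1.0, m=1.0):
    """Return the largest relative energy error along the trajectory."""
    fast = lambda x: -k_fast * x                # stiff, cheap force (assumed harmonic)
    slow = lambda x: -k_slow * x                # smooth, "expensive" force (assumed harmonic)
    energy = lambda x, v: 0.5 * m * v * v + 0.5 * (k_fast + k_slow) * x * x
    x, v = 1.0, 0.0
    e0 = energy(x, v)
    worst = 0.0
    for _ in range(n_outer):
        x, v = mts_step(x, v, m, dt_long, n_inner, fast, slow)
        worst = max(worst, abs(energy(x, v) / e0 - 1.0))
    return worst

if __name__ == "__main__":
    print(f"max relative energy error: {run():.2e}")
```

In this sketch the slow force is evaluated twice per outer step while the fast force is evaluated at every inner step; the saving arises when the slowly varying force is the expensive one, as with long-range Coulomb interactions.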
The considerations of this subsection lead to the following conclusions with respect to the use of different methods of saving computational effort by lengthening the integration time step $\Delta t$. For condensed phase systems which include solvent, the use of bond-length constraints (method 1) and a multiple-time-step algorithm for long-range nonbonded interactions (method 3) leads, at present, to the most efficient, yet physically reliable simulation. At the cost of reducing the physical content of the molecular model, the time step can be lengthened for macromolecules in the gas phase by additionally constraining bond angles and torsional angles around double bonds (method 1). Comparable time steps can be used when softening the bond-angle and stiff torsional interaction terms (method 4). The longest time steps can be used for macromolecules in the gas phase by also softening the nonbonded interactions (method 4), but again with even more loss of physical content of the model.

4.6.4. Extent of Sampling, Convergence of $\langle q(r) \rangle$

Even when using a powerful sampling method, the actual region of space sampled in a simulation should be monitored and analyzed with respect to the variety of conformations and their energies. First, although a method may generate very different conformers, these may all be of high energy and therefore of little use when Boltzmann-averaging to compute $\langle q(r) \rangle$. Second, even when using a very
long simulation or a large ensemble to average q(r), the average may not have converged. This is illustrated in Fig. 6 for a 1100-ps MD simulation of lysozyme in water (Smith et al., 1995). The upper panel shows the value of the backbone $\phi$ angle (C–N–CA–C) of residue Ile 78, which switches occasionally between two relatively stable states. The order parameter $S^{2}$ of Eq. (27), which can be related to NMR relaxation parameters, is sensitive to the value of the $\phi$ angle, since it is computed from the three Cartesian components of the N–H vector of the peptide nitrogen. The lower panel of Fig. 6 shows the trajectory-averaged parameter $S^{2}$ of Ile 78 (solid line) as a function of the time used for averaging in (27). If the averaging period is of the same order of magnitude as the time between the occasional flips, every flip is reflected in a change in $S^{2}$. So, for a proper evaluation of the accuracy of trajectory or ensemble averages, the analysis of averages as a function of time or ensemble size is a necessary condition. For other examples of incomplete convergence of ensemble averages we refer to a previous paper (van Gunsteren et al., 1995b).
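The dependence of an average on the averaging window can be monitored with a few lines of code. The sketch below is an assumption-laden illustration, not the analysis of Smith et al. (1995): it uses the commonly quoted definition $S^{2} = \tfrac{1}{2}\left[3 \sum_{\alpha,\beta} \langle \mu_{\alpha}\mu_{\beta} \rangle^{2} - 1\right]$ for a normalized bond vector $\mu$, and a synthetic trajectory with a single flip between two orientations. All names and data are hypothetical.

```python
def order_parameter(unit_vectors):
    """S^2 = 0.5 * (3 * sum_ab <mu_a * mu_b>^2 - 1) over a list of unit vectors."""
    n = len(unit_vectors)
    total = 0.0
    for a in range(3):
        for b in range(3):
            mean_ab = sum(mu[a] * mu[b] for mu in unit_vectors) / n
            total += mean_ab * mean_ab
    return 0.5 * (3.0 * total - 1.0)

def cumulative_order_parameter(unit_vectors, stride):
    """S^2 over the first k frames, for k = stride, 2*stride, ..."""
    return [order_parameter(unit_vectors[:k])
            for k in range(stride, len(unit_vectors) + 1, stride)]

# Synthetic trajectory (assumption): 600 frames in one orientation, then a flip.
trajectory = [(1.0, 0.0, 0.0)] * 600 + [(0.0, 1.0, 0.0)] * 400
curve = cumulative_order_parameter(trajectory, stride=100)
for k, s2 in zip(range(100, 1001, 100), curve):
    print(f"frames averaged: {k:4d}   S^2 = {s2:.3f}")
```

Before the flip the running value looks perfectly converged; it changes only once the second state has been visited, which is why the average should be inspected as a function of averaging time rather than at a single end point.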
5. ASSESSING THE QUALITY OF THE OBTAINED ENSEMBLE OF STRUCTURES

The quality of the ensemble of molecular structures generated in a given simulation depends on the choices made with respect to the aspects discussed in the previous section. In practice, one needs tangible criteria which can be applied to the generated ensemble of structures. These are the following (van Gunsteren et al., 1994).

I. Criteria concerning the discrepancy between the generated three-dimensional structures and the molecule-specific experimental spectroscopic or diffraction data.

1. The sum of the violations of the bounds (12) should be low. In the case of diffraction data the crystallographic R-factor should be low considering the resolution of the data. In the case of NMR data the sum of violations of NOE intensity or distance bounds, or J-coupling bounds or chemical-shift bounds, could also be cast in terms of an R-factor (Baleja et al., 1990; Gonzales et al., 1991; Schmitz et al., 1992).
2. Individual bound violations should be small (NMR case) or, for X-ray data, the electron density map should display clear atomicity.
3. Cross-correlation tests should be carried out, such as the use of free R-factors (Brünger, 1992).

II. Criteria concerning general knowledge about a particular class of biomolecular systems, e.g., proteins.

1. The value of the physical potential energy should be low, indicating correct stereochemistry and absence of strain in the molecule. How low should the molecular energy be? When considering the energy of a series of proteins for which a high-resolution X-ray structure is available, it appears that the molecular energy for these structures is roughly a linear function of the number of residues (van Gunsteren, 1990). Thus, when a refined protein structure displays a relatively large molecular energy compared to the energies of well-determined X-ray structures of proteins of comparable size, this must be taken as a warning that a partially wrongly folded or packed structure has been obtained. Examples of such cases can be found in van Gunsteren (1990).
2. A greater part of the hydrogen-bond donors and acceptors in the inner core of a protein should form hydrogen bonds. A protein structure without any internal hydrogen bonds has yet to be found. Leaving hydrogen-bond donors and acceptors unbound is energetically unfavorable.
3. Backbone $(\phi, \psi)$ torsional-angle values in proteins should generally fall in the stereochemically likely regions of the $(\phi, \psi)$ map.
4. The spatial distribution of bare charges (Arg, Lys, Asp, Glu residues in proteins) should not contain regions with a very high density of like charges. Furthermore, charged residues are expected to lie on the molecular surface, allowing the (high dielectric) aqueous solvent to lower their energy, or to form salt bridges.
5. Hydrophobic residues in proteins should tend to cluster in the inner core of the molecule. The hydrophobic solvent accessible surface area should not be large.
6. For larger proteins, the surface-to-volume ratio should correspond to the values found for this type of molecule. Too large a volume indicates unlikely deficiencies in packing.
7. If homologous protein sequences are known, the structures obtained should be consistent with the known structures. Amino acid residue insertions or deletions in the inner core of a protein are seldom observed.
8. The chemical environment of heavy-atom binding sites should correspond to chemical knowledge.
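Criteria I.1 and I.2 can be made concrete with a short calculation. The sketch below rests on assumptions: the exact form of the bounds (12) is not reproduced here, so an upper-bound violation is taken as max(0, d − d_upper), and the R-factor is taken in its conventional crystallographic form, the sum of | |F_obs| − |F_calc| | divided by the sum of |F_obs|. The numbers are invented for illustration only.

```python
def upper_bound_violations(distances, upper_bounds):
    """Amount by which each model distance exceeds its upper bound (0 if satisfied)."""
    return [max(0.0, d - u) for d, u in zip(distances, upper_bounds)]

def r_factor(observed, calculated):
    """Conventional R-factor: sum| |obs| - |calc| | / sum|obs|."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(observed, calculated))
    denominator = sum(abs(o) for o in observed)
    return numerator / denominator

# Hypothetical model distances and NOE upper bounds (same length units).
model_distances = [3.0, 4.5, 5.2]
noe_upper_bounds = [3.5, 4.0, 5.0]
violations = upper_bound_violations(model_distances, noe_upper_bounds)
print("sum of violations:    ", round(sum(violations), 3))
print("largest violation:    ", round(max(violations), 3))

# Hypothetical observed and calculated amplitudes.
print("R-factor:             ", round(r_factor([10.0, 20.0, 30.0], [12.0, 18.0, 30.0]), 4))
```

Reporting both the sum and the largest individual violation mirrors the distinction between criteria 1 and 2: a small total can still hide one badly violated bound.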
We note that good agreement between calculated and experimental data does not necessarily imply that the ensemble of generated molecular structures is correct. It is a necessary condition for correctness, not a sufficient one. The good agreement may be due to compensation of errors. For examples of this phenomenon, see our previous papers (van Gunsteren, 1990; van Gunsteren et al., 1995b). Finally, we note that the discussion of Sec. 4.4 should have made clear that a low value of the root-mean-square positional deviation (RMSD) for a set of molecular atoms calculated over the ensemble of structures is not a good measure of the quality of an ensemble generated using conventional (nonaveraging) restraints. First, the RMSD value can be reduced at will by increasing the force constant of the restraining term. Second, conventional restraining may lead to very low RMSD values (Fig. 4) due to conflicting restraining forces, while in reality a large spread of conformers exists in solution (Torda et al., 1990).

6. PITFALLS THAT CAN BE AVOIDED
Here, we list rather arbitrarily a few pitfalls of structure determination based on spectroscopic or diffraction data, which can be avoided. Their explanation follows from the discussions given in the previous sections. We consider primarily the (final) phase of the structure determination process in which a physical molecular model is used to obtain an ensemble of molecular conformations.

1. Conversion of an observed J-coupling constant, other than for extreme values, to a torsional-angle value with subsequent torsional-angle restraining. (The Karplus curve is highly nonlinear and multiple-valued, which makes the effects of the experimental averaging unpredictable.)
2. Neglect of averaging of the observable when applying the restraining interaction (may artificially reduce conformational variability).
3. Application of time-averaging restraining using a penalty function that depends only on the time-averaged quantity (artificially enhances structural fluctuations).
4. Use of non-Boltzmann weighting of conformers when calculating the average (violates statistical mechanics).
5. Use of equations of motion in non-Cartesian coordinates (leads to complex integration algorithms).
6. Freezing bond-angle degrees of freedom or using torsional dynamics without adjustment of the physical force field to these conditions and without inclusion of metric tensor interaction terms (reduces the atomic motions, distorts the Boltzmann weighting).
7. Inadequate sampling when calculating the average. [The average value will depend on the size of the ensemble, or the ensemble may contain mainly unlikely (high-energy) structures.]
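Pitfall 1 can be illustrated with a Karplus relation of the form J(θ) = A cos²θ + B cos θ + C. The sketch below is only an illustration under stated assumptions: the coefficients A = 6.4, B = −1.4, C = 1.9 Hz are the values commonly quoted for the HN–Hα coupling after Pardi et al. (1984) and are used here purely as an example, and the two-conformer ensemble is invented. Function names are hypothetical.

```python
import math

def karplus(theta_deg, a=6.4, b=-1.4, c=1.9):
    """Karplus relation J = a*cos^2(theta) + b*cos(theta) + c (assumed coefficients, Hz)."""
    cos_t = math.cos(math.radians(theta_deg))
    return a * cos_t * cos_t + b * cos_t + c

def count_solutions(j_value, step=0.5):
    """Number of angles in [-180, 180] at which the Karplus curve crosses j_value."""
    n = int(360 / step)
    diffs = [karplus(-180.0 + i * step) - j_value for i in range(n + 1)]
    return sum(1 for lo, hi in zip(diffs, diffs[1:]) if lo * hi < 0)

# Invented ensemble: two equally populated conformers.
conformers = [0.0, 180.0]
mean_j = sum(karplus(t) for t in conformers) / len(conformers)
mean_angle = sum(conformers) / len(conformers)

print("J averaged over conformers :", round(mean_j, 2))
print("J at the averaged angle    :", round(karplus(mean_angle), 2))
print("angles compatible with 5 Hz:", count_solutions(5.0))
```

The averaged coupling and the coupling at the averaged angle differ widely, and a single intermediate J value maps onto several angles, so restraining to one back-calculated torsional angle can impose a conformation that is not populated at all.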
ACKNOWLEDGMENTS. Financial support from the Swiss National Science Foundation (project 21-41875.94) and from the Underwood Fund is gratefully acknowledged. L. J. Smith is a Royal Society University Research Fellow. We thank C. M. Dobson for critical comments on the manuscript.
REFERENCES

Allen, M. P., and Tildesley, D. J., 1987, Computer Simulation of Liquids, Clarendon, Oxford. ApSimon, J. W., Craig, W. G., DeMarco, P. V., Mathieson, D. W., and Saunders, L., 1967, Tetrahedron 23:2357. Bae, D.-S., and Haug, E. J., 1987, Mech. Struct. Mach. 15:359. Bae, D.-S., and Haug, E. J., 1988, Mech. Struct. Mach. 15:481. Baleja, J. D., Moult, J., and Sykes, B. D., 1990, J. Magn. Reson. 87:375. Bonvin, A. M. J. J., Boelens, R., and Kaptein, R., 1994, J. Biomol. NMR 4:143. Bonvin, A. M. J. J., Boelens, R., and Kaptein, R., 1996, in Encyclopedia of Nuclear Magnetic Resonance, Vol. 6 (D. M. Grant and R. K. Norris, eds.), Wiley, New York, pp. 3801–3811. Braun, W., 1987, Quart. Rev. Biophys. 19:115.
Braun, W., and Go, N., 1983, J. Mol. Biol. 186:611. Brünger, A. T., 1992, Nature 355:472. Brünger, A. T., and Karplus, M., 1991, Acc. Chem. Res. 24:54. Brünger, A. T., and Nilges, M., 1993, Quart. Rev. Biophys. 26:49. Buckingham, A. D., 1960, Can. J. Chem. 38:300. Burling, F. T., and Brünger, A. T., 1994, Isr. J. Chem. 34:165. Bystrov, V. F., 1976, Progr. NMR Spectr. 10:41.
Ciccotti, G., Ferrario, M., and Ryckaert, J.-P, 1982, Mol. Phys. 47:1253. Clarage, J. B., and Phillips, G. N., 1994, Acta Crystallogr. Sect. D 50:24. Clore, G. M., and Gronenborn, A. M., 1987, Protein Eng. 1:275. Daura, X., van Gunsteren, W. F., Rigo, D., Jaun, B., and Seebach, D., 1997, Chem. Eur. J. 3:1410. De Dios, A. C., Pearson, J. G., and Oldfield, E., 1993, Science 260:1491. De Marco, A., Llinás, M., and Wüthrich, K., 1978, Biopolymers 17:617. De Vlieg, J., Boelens, R., Scheek, R. M., Kaptein, R., and van Gunsteren, W. F., 1986, Isr. J. Chem. 27:181. De Vlieg, J., and van Gunsteren, W. F., 1991, Meth. Enzym. 202:268.
Elber, R., and Karplus, M., 1990, J. Am. Chem. Soc. 112:9161. Fennen, J., Torda, A. E., and van Gunsteren, W. F., 1995, J. Biomol. NMR 6:163. Frenkel, D., 1993, in Computer Simulation of Biomolecular Systems: Theoretical and Experimental Applications, Vol. 2 (W. F. van Gunsteren, P. K. Weiner, and A. J. Wilkinson, eds.), Escom, Leiden, pp. 37–66.
Fry, D. C., Madison, V. S., Bolin, D. R., Greeley, D. N., Toone, V, and Wegrzynski, B. B., 1989, Biochemistry 28:2399. Garrett, D. S., Kuszewski, J., Hancock, T. J., Lodi, P. J., Vuister, G. W., Gronenborn, A. M., and Clore, G. M., 1994, J. Magn. Reson. B104:99. Gelin, B. R., 1993, in Computer Simulation of Biomolecular Systems: Theoretical and Experimental Applications, Vol. 2 (W. F. van Gunsteren, P. K. Weiner, and A. J. Wilkinson, eds.), Escom, Leiden, pp. 127–146. Gerber, P. R., 1993, in Computer Simulation of Biomolecular Systems: Theoretical and Experimental Applications, Vol. 2 (W. F. van Gunsteren, P. K. Weiner, and A. J. Wilkinson, eds.), Escom, Leiden, pp. 213–228.
Gonzales, C., Rullmann, J. A. C., Bonvin, A. M. J. J., Boelens, R., and Kaptein, R., 1991, J. Magn. Reson. 91:659.
Gros, P., Fujinaga, M., Dijkstra, B. W., Kalk, K. H., and Hol, W. G. J., 1989, Acta Crystallogr. Sect. B 45:488. Gros, P., and van Gunsteren, W. F., 1993, Mol. Simulation 10:377. Gros, P., van Gunsteren, W. F., and Hol, W. G. J., 1990, Science 249:1149. Harvey, T. S., and van Gunsteren, W. F., 1993, Techniques in Protein Chemistry IV, Academic, New York, pp. 615–622. Havel, T. F., and Wüthrich, K., 1985, Bull. Math. Biol. 182:673. Hendrickson, W. A., and Konnert, J. H., 1981, in Biomolecular Structure, Conformation, Function and Evolution, Volume 1: Diffraction and Related Studies (R. Srinivasan, ed.), Pergamon, Oxford, pp. 43–57. Hess, B., Bekker, H., Berendsen, H. J. C., and Fraaije, J. G. E. M., 1997, J. Comput. Chem. 18:1463. Hoch, J. C., Poulsen, F. M., and Redfield, C., eds., 1991, NATO ASI-Ser. A225, Plenum, New York, pp. 1-464. Hockney, R. W., and Eastwood, J. W., 1981, Computer Simulation Using Particles, McGraw-Hill, New York. Huber, T., Torda, A. E., and van Gunsteren, W. F., 1994, J. Comput.-Aided Mol. Design 8:695. Huber, T., Torda, A. E., and van Gunsteren, W. F., 1996, Biopolymers 39:103. Hünenberger, P.H., and van Gunsteren, W.F., 1997, in Computer Simulation of Biomolecular Systems: Theoretical and Experimental Applications, Vol. 3 (W. F. van Gunsteren, P. K. Weiner, and A. J. Wilkinson, eds.), Escom, Leiden, pp. 3–82.
Jain, A., Vaidehi, N., and Rodriguez, G., 1993, J. Comput. Phys. 106:258. James, T. L., 1991, Curr. Opin. Struct. Biol. 1:1042. James, T. L., and Basus, V. J., 1991, Annu. Rev. Phys. Chem. 42:501. Johnson, C. E., and Bovey, F. A., 1958, J. Chem. Phys. 29:1012. Kaptein, R., Zuiderweg, E. R. P., Scheek, R. M., Boelens, R., and van Gunsteren, W. F., 1985, J. Mol. Biol. 182:179. Karplus, M., 1959, J. Chem. Phys. 30:11. Karplus, M., 1963, J. Am. Chem. Soc. 85:2870. Katz, H., Walter, R., and Somorjay, R. L., 1979, Comp. Chem. 3:25. Kuszewski, J., Gronenborn, A. M., and Clore, G. M., 1995a, J. Magn. Reson. B107:293. Kuszewski, J., Qin, J., Gronenborn, A. M., and Clore, G. M., 1995b, J. Magn. Reson. B106:92.
Luty, B. A., Davis, M. E., Tironi, I. G., and van Gunsteren, W. F., 1994, Mol. Simulation 14:11. Mao, B., and Friedman, A. R., 1990, Biophys. J. 58:803. Mathiowetz, A. M., Jain, A., Karasawa, N., and Goddard III, W. A., 1994, Proteins 20:227. Mazur, A. K., Dorofeev, V. E., and Abagyan, R. A., 1991, J. Comput. Phys. 92:261. Nanzer, A. P., Huber, T., Torda, A. E., and van Gunsteren, W. F., 1996, J. Biomol. NMR 8:285. Nanzer, A. P., Poulsen, F. M., van Gunsteren, W. F., and Torda, A. E., 1994, Biochemistry 33:14503. Nanzer, A. P., Torda, A. E., Bisang, C., Weber, C., Robinson, J. A., and van Gunsteren, W. F., 1997, J.
Mol. Biol. 267:1012. Nanzer, A. P., van Gunsteren, W. F., and Torda, A. E., 1995, J. Biomol. NMR 6:313. Nilges, M., Clore, G. M., and Gronenborn, A. M., 1988, FEBS Lett. 239:129. Oppenheimer, N. J., and James, T. L., eds., 1989, Meth. Enzym. 177. Ösapay, K., Theriault, Y, Wright, P. E., and Case, D. A., 1994, J. Mol. Biol. 244:183. Pardi, A., Billeter, M., and Wüthrich, K., 1984, J. Mol. Biol. 180:741.
Reich, S., 1995, Physica D89:28. Reich, S., 1996, Phys. Rev. E53:4176. Rice, L. M., and Brünger, A. T., 1994, Proteins 19:277.
Ryckaert, J.-P., 1991, in Computer Simulation in Material Science (M. Meyer and V. Pontikis, eds.), NATO ASI-Ser. E205, Kluwer, Dordrecht, pp. 43–66. Ryckaert, J.-P., Ciccotti, G., and Berendsen, H. J. C., 1977, J. Comput. Phys. 23:327. Scarsdale, J. N., Ram, P., Prestegard, J. H., and Yu, R. K., 1988, J. Comput. Chem. 9:133. Scheek, R. M., Torda, A. E., Kemmink, J., and van Gunsteren, W. F., 1991, in Computational Aspects of the Study of Biological Macromolecules by NMR Spectroscopy (J. C. Hoch, F. M. Poulsen, and
C. Redfield, eds.), NATO ASI-Ser. A225, Plenum, New York, pp. 209–217. Schiffer, C. A., Gros, P., and van Gunsteren, W. F., 1995, Acta Crystallogr. Sect. D 51:85.
Schiffer, C. A., Huber, R., Wüthrich, K., and van Gunsteren, W. F., 1994, J. Mol. Biol. 241:588.
Schmitz, K., Kumar, A., and James, T. L., 1992, J. Am. Chem. Soc. 114:10654. Smith, L. J., Mark, A. E., Dobson, C. M., and van Gunsteren, W. F., 1995, Biochemistry 34:10918. Stein, E. G., Rice, L. M., and Brünger, A. T., 1997, J. Magn. Reson. 124:154. Torda, A. E., Brunne, R. M., Huber, T., Kessler, H., and van Gunsteren, W. F., 1993, J. Biomol. NMR 3:55. Torda, A. E., Scheek, R. M., and van Gunsteren, W. F., 1989, Chem. Phys. Lett. 157:289. Torda, A. E., Scheek, R. M., and van Gunsteren, W. F., 1990, J. Mol. Biol. 214:223.
Torda, A. E., and van Gunsteren, W. F., 1992, in Reviews in Computational Chemistry, Vol. III, (K. B. Lipkowitz and D. B. Boyd, eds.), VCH Publishers, New York, pp. 143–172. Tropp, J., 1980, J. Chem. Phys. 72:6035. Tuckerman, M. E., Berne, B. J., and Martyna, G. J., 1992, J. Chem. Phys. 97:1990. Turner, J. D., Weiner, P. K., Chun, H. M., Lupi, V., Gallion, S., and Singh, U. C., 1993, in Computer Simulation of Biomolecular Systems: Theoretical and Experimental Applications, Vol. 2 (W. F. van Gunsteren, P. K. Weiner, and A. J. Wilkinson, eds.), Escom, Leiden, pp. 535–555. Van Gunsteren, W. F., 1980, Mol. Phys. 40:1015. Van Gunsteren, W. F., 1990, in Studies in Physical and Theoretical Chemistry, Volume 71, Modeling of Molecular Structures and Properties (J.-L. Rivail, ed.), Elsevier, Amsterdam, pp. 463–478. Van Gunsteren, W. F., 1991, Am. Inst. Phys. Conf. Proc. 239:131. Van Gunsteren, W. F., 1993, in Computer Simulation of Biomolecular Systems: Theoretical and Experimental Applications, Vol. 2 (W. F. van Gunsteren, P. K. Weiner, and A. J. Wilkinson, eds.), Escom, Leiden, pp. 3–36. Van Gunsteren, W. F., and Berendsen, H. J. C., 1987, Groningen Molecular Simulation (GROMOS) Library Manual, Biomos, Groningen.
Van Gunsteren, W. F., and Berendsen, H. J. C., 1990, Angew. Chem. Int. Ed. Engl. 29:992. Van Gunsteren, W. F., Billeter, S. R., Eising, A. A., Hünenberger, P. H., Krüger, P., Mark, A. E., Scott, W. R. P., and Tironi, I. G., 1996a, Biomolecular Simulation: The GROMOS96 Manual and User Guide, Hochschulverlag der ETH, Zürich.
Van Gunsteren, W. F., Boelens, R., Kaptein, R., Scheek, R. M., and Zuiderweg, E. R. P., 1985, in Molecular Dynamics and Protein Structure (J. Hermans, ed.), Polycrystal Book Service, Western Springs, pp. 92–99.
Van Gunsteren, W. F., Brunne, R. M., Gros, P., van Schaik, R. C., Schiffer, C. A., and Torda, A. E., 1994, Meth. Enzym. 239:619. Van Gunsteren, W. F., Gros, P., Torda, A. E., Berendsen, H. J. C., and van Schaik, R. C., 1991, Ciba Foundation Symp. 161:150.
Van Gunsteren, W. F., Huber, T., and Torda, A. E., 1995a, Am. Inst. Phys. Conf. Proc. 330:253. Van Gunsteren, W. F., Hünenberger, P. H., Mark, A. E., Smith, P. E., and Tironi, I. G., 1995b, Comp.
Phys. Commun. 91:305. Van Gunsteren, W. F., and Karplus, M., 1982, Macromolecules 15:1528. Van Gunsteren, W. F., Nanzer, A. P., and Torda, A. E., 1996b, in Monte Carlo and Molecular Dynamics
of Condensed Matter Systems (K. Binder and G. Ciccotti, eds.), Proc. Euroconf., Vol. 49, S1F, Bologna, pp. 777–788.
Van Schaik, R. C., Berendsen, H. J. C., Torda, A. E., and van Gunsteren, W. F., 1993, J. Mol. Biol. 234:751. Wagner, G., Hyberts, S. G., and Havel, T. F., 1992, Annu. Rev. Biophys. Biomol. Struct. 21:167. Wang, A. C., and Bax, A., 1996, J. Am. Chem. Soc. 118:2483. Watanabe, M., and Karplus, M., 1993, J. Chem. Phys. 99:8063. Williamson, M. P., Asakura, T., Nakamura, E., and Demura, M., 1992, J. Biomol. NMR 2:83.
Wittenburg, J., 1977, Dynamics of Systems of Rigid Bodies, Teubner, Stuttgart. Zheng, Q., Rosenfeld, R., Vajda, S., and DeLisi, C., 1993, Protein Sci. 2:1242.
2

Combined Automated Assignment of NMR Spectra and Calculation of Three-Dimensional Protein Structures

Yuan Xu, Catherine H. Schein, and Werner Braun

Yuan Xu, Catherine H. Schein, and Werner Braun • Sealy Center for Structural Biology, Department of Human Biological Chemistry and Genetics, University of Texas Medical Branch, Galveston, TX 77555-1157.
Biological Magnetic Resonance, Volume 17: Structure Computation and Dynamics in Protein NMR, edited by Krishna and Berliner. Kluwer Academic / Plenum Publishers, New York, 1999.

1. INTRODUCTION

The interpretation of NMR data to determine the three-dimensional structure of proteins has made significant progress in the last decade. In just one year, 1995, 100 high-resolution structures were reported (Hendrickson and Wüthrich, 1996), many more than were possible in the initial stages of NMR solution structure determination of proteins, beginning with the proteinase inhibitor IIA (Williamson et al., 1985), the headpiece of the lac repressor (Kaptein et al., 1985), metallothionein (Braun et al., 1986), and the first high-resolution structure, tendamistat (Kline et al., 1986, 1988). In addition, while in the early days the spectrum of a protein of more than 10 kD was too complicated to assign (Markley et al., 1984), isotopic labeling has opened up the method for much larger proteins and complexes (Gronenborn and Clore, 1994; Wagner, 1993). More refined pulse techniques, in combination with 3D and 4D heteronuclear NMR spectroscopy (Clore et al., 1990; Kay et al., 1990), allow analysis of proteins at least as large as 30 kD (Garrett et al., 1997; Wüthrich, 1995). However, this impressive progress would not have been possible without improvements in computer-supported analysis of NMR spectra (Morelle et al., 1995; Hare and Prestegard, 1994; Olson and Markley, 1994; Zimmerman et al., 1994) and 3D structure calculation (Nilges, 1996; Havel, 1991; Braun, 1987). Several computational tools for data collection and data reduction, such as
Fourier transform of FID to frequency space, baseline correction, interpolation of data (linear prediction), and noise reduction, have been combined in single software
packages (Neidig et al., 1995). Algorithms for automatic peak picking, volume integration, and line-shape characterization have been successfully implemented in software packages such as FELIX (Molecular Simulations Inc., San Diego, CA), XEASY (Eccles et al., 1991) or AURELIA (Neidig et al., 1995). However,
complete automation of the process from the collection of NMR spectra to the generation of the three-dimensional structure of macromolecules is far from reality. Collection of data is currently much faster than its interpretation by manual methods. Automatic analysis, including sequential resonance and NOESY crosspeak assignment, is only now being introduced and the methods must be optimized.
Reliable and accurate methods of assigning cross peaks in experimental NOESY spectra will be needed to handle the increased number of 3D structure determinations and to allow analysis of larger proteins. Computer analysis of spectra, despite its lack of the (human) “innate massively parallel visual” processor
(Nilges, 1996), becomes even more desirable for multidimensional spectra with labeled proteins (Zimmerman and Montelione, 1995; Oschkinat and Croft, 1994). The emphasis in this chapter will be on concepts and computational methods rather than on specific programs. As progress in this field is moving at a fast pace, it would be premature to predict which of the methods now in development will ultimately prove the best for assigning NMR spectra of biological macromolecules.
The “classical procedure” for assignment and 3D structure calculation from 2D proton–proton spectra (Wüthrich et al., 1982) has been modified and extended to include information from heteronuclear 3D and 4D spectroscopy. Schematically, the discrete steps from data analysis to a refined three-dimensional protein structure can be described as follows (Zimmerman and Montelione, 1995):
1. Assign atom types and classify J-coupled spin systems. Important cross peaks characteristic for each amino acid residue comprise backbone amide and protons in proton spectra, and in spectra of proteins, and and in proteins.
2. Assign the spin system to possible amino acid residue types.
3. Identify sequential relationships using J-coupling data, possibly including that from specially designed experiments for isotopically labeled samples (3D HNCACB and CBCA(CO)HN, 4D HNCAHA and HN(CO)CAHA, CO- and CA-TOCSY,...), and/or interresidue NOESY data for and resonances.
4. Map the sequentially connected spin systems to the amino acid sequence (sequence-specific assignment).
5. Extend the assignments to the side-chain nuclei of each spin system and include stereospecific assignments of prochiral groups.
6. Use the assigned resonance frequencies to interpret NOESY, scalar coupling data and hydrogen–deuterium exchange to generate distance and dihedral angle constraints.
7. Generate a three-dimensional structure with these constraints by distance geometry and/or molecular dynamics calculations. Once a structure has been generated, interactive looping is used to identify incorrect constraints. The structure is then improved by adding additional constraints as peak assignments become clarified.
8. Refine the three-dimensional structure by restrained energy minimization or relaxation matrix calculations.
Having said this, it should be noted that few structures are solved, regardless of the dependence on computers, by applying these steps in a strictly consecutive
fashion. In the most extreme automated case, real-space assignment proposes to start at step 7 (Kraulis, 1994); i.e., unassigned NOESY data are used to calculate a
3D structure of arbitrarily labeled protons, which is then fitted to the primary sequence. The “main chain directed” (MCD) strategy suggested by Englander and Wand (1987) initially skips step 2 and combines steps 3 and 6 before the sequence-specific assignment. It takes advantage of the fact that sequence-specific assignment
of backbone protons is easier than assigning the side-chain protons at initial stages. The MCD method focuses on the secondary structure characteristics derived from patterns in the main-chain region of the protein spectra. Peaks are assigned in the 2D NOESY directly by recognizing patterns for and from expected NOE connectivity patterns. Obviously, correct side-chain assignments are then added, and the secondary structure regions are superimposed on the primary sequence. This information is then used to assign other peaks and to gradually assign the spectrum. The MCD approach is suited for computer-assisted analysis, and several attempts to automate the assignment procedure are based on it, as, for example, the SERENDIPITY method (van Geerestein-Ujah et al., 1995). Peak ambiguity is one of the most difficult problems to handle automatically. Manually, a bootstrap procedure is used to assign spectral cross peaks that could arise from the overlap of several different pairs of nuclei due to the degeneracy of chemical shifts (Hodsdon et al., 1996; Meadows et al., 1994; Braun, 1991). An initial structure is calculated using unambiguous peaks. If these cross peaks provide sufficient constraints to allow convergence to a bundle of structures with a unique
3D fold, this is then used to reinterpret the spectra, allowing the identification of further peaks. Manual second-round peak assignment can seldom be done in a systematic way, and in practice most ambiguous peaks are not included in the structure determination.
Completely automated methods have to allow multiple peak assignments at the initial stages of 3D structure calculation. In the molecular dynamics approach, all
possible assignments within a certain tolerance range are included as a weighted sum in the restraint term to be minimized by simulated annealing (Nilges, 1995). Alternatively, in the “self-correcting distance geometry” (SECODG) method each possible assignment is considered as an independent constraint, and the data set is allowed to evolve during structure generation in an iterative way (Hänggi and Braun, 1994). The routine NOAH acts as a structure-based filter, suggesting possible assignments of ambiguous cross peaks. It detects ambiguous constraints
that are not consistent with the rest of the data set and automatically eliminates them (Mumenthaler and Braun, 1995). Experimental 2D and 3D NOESY cross-peak lists of six proteins which have been previously solved by conventional manual assignment have been automatically assigned by this procedure within a few days for each of the proteins. The extent of the automated assignment was only 10% less on the
average than in the manual approach, and the individual assignments differed only in 0.8% to 2.4% of the cross peaks. For some of these cross peaks NOAH suggested viable alternative assignments (Mumenthaler et al., 1997). Applications of the NOAH–DIAMOD program suite to NMR data sets which have not been manually analyzed are emerging (Xu et al., 1997). The SECODG-based structure calculation program can give high-resolution structures for “real life” data sets with a minimum of human interference. We will present some examples of such structure calculations toward the end of this chapter.
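To make the idea of carrying several possible assignments of one cross peak through a structure calculation more concrete, the following minimal sketch combines the candidate proton–proton distances of an ambiguous peak into a single effective distance by an r^-6 sum. The r^-6 form and the function names are assumptions chosen for illustration; the sketch is not the restraint term of any of the programs cited above.

```python
def effective_distance(distances):
    """Combine candidate distances (in Angstrom) into one r^-6-summed distance.

    Assumption: every candidate proton pair contributes to the peak with an
    r^-6 weight, so the shortest candidate distance dominates the sum.
    """
    if not distances:
        raise ValueError("at least one candidate distance is required")
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def restraint_violation(distances, upper_bound):
    """Amount by which the effective distance exceeds the upper bound (0 if satisfied)."""
    return max(0.0, effective_distance(distances) - upper_bound)


# Hypothetical ambiguous cross peak with three candidate proton pairs.
candidates = [3.0, 7.5, 12.0]
print(effective_distance(candidates))
print(restraint_violation(candidates, upper_bound=5.0))
```

In such a scheme the restraint is satisfied as soon as one candidate pair is close enough, so a wrong candidate does not distort the structure as long as a correct one is present in the list.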
2. COMPUTATIONAL METHODS FOR SEQUENCE-SPECIFIC RESONANCE ASSIGNMENTS
Most assignment problems could be automatically solved by a computer program keeping track of all observed cross peaks (bookkeeping) and implementing simple rules for combining these data, similar to the procedure used by a spectroscopist in the manual assignment procedure. To handle real data, a completely automated procedure must be able to deal with
• Additional peaks that may be present due to noise or spectral artifacts
• Individual cross peaks that may be undetectable due to overlap of resonance lines
• Variation in the peak position of the same nucleus in different spectra due to limited digital resolution, limitations in peak-picking procedure, sample heating during complicated pulse sequences, and local variations.
These effects multiply with increasing protein size, making the logical problem and the search for the optimal assignment a complex task. It is therefore not astonishing that completely different computational approaches have been used to solve the same mathematical problem, emphasizing different aspects of real-life problems. New computational methods for computer-supported or automatic methods rely on graph theory, fuzzy logic, neural networks, constraint logic programming, genetic algorithms, and real-space assignments. The concepts of these methods are described here. We have classified the approaches on the basis of their prime computational methods, and we will concentrate on describing their main features. We are aware that most of the software packages actually combine several approaches.
2.1. Graph Theory
2.1.1. Definition of a Graph
Most assignment problems can be formulated in the language of graph theory. Coupling patterns in multidimensional NMR spectra can be described by a graph G(X,U), where the nodes or vertices X correspond to the nuclei and the edges U represent the correlation observed between the nuclei in the spectrum (Lau, 1989; Pfändler and Bodenhausen, 1988; McGregor, 1982; Bron and Kerbosch, 1973; Harary, 1972). Methods differ in the actual definition of the graphs and in the criteria for building global networks and searching subgraphs. The assignment problem consists of building a global network of graphs from the spectrum having certain fragments or subgraphs in common and finding those common subgraphs. Powerful algorithms for subgraph searching have been adopted to solve computational problems (Christofides et al., 1979). For analysis of COSY spectra, a node of the graph may be defined as a spin with a corresponding chemical shift. The edge describes the scalar coupling constant connecting the two nodes. In multidimensional NOESY-type spectra, protons are represented by nodes, and short interproton distances define the edges. Short proton–proton distances for NH and other protons that can be experimentally detected by NOESY cross peaks have been extensively studied (Wüthrich et al., 1984, 1983, 1982). Certain distances are short (within the detectable range of 5 Å) for all allowed conformations, while others have connectivity patterns characteristic for the regular secondary structures of the polypeptide chains. One thus searches observed patterns for characteristic subgraphs representing these connectivities. Figures 1 to 3 show graphic representations of three types of regular secondary structures: α-helix, antiparallel β-sheet, and parallel β-sheet. The maximum common subgraph (MCS) isomorphism algorithm (Grindley et al., 1993; Lau, 1989; Bron and Kerbosch, 1973) can be used to search the NOESY spectrum for connectivity patterns that match the NOE template patterns and thus to identify the secondary structure connectivity patterns in the spectrum.
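A common way to compute a maximum common subgraph is to reduce it to a clique search: every pairing of a template node with an observed node becomes a node of a compatibility graph, two pairings are connected if they agree on the presence or absence of an edge, and the largest clique is the largest consistent mapping. The sketch below follows this route with a basic Bron–Kerbosch clique enumeration. The function names, the use of induced subgraphs, and the absence of edge weights or distance tolerances are simplifying assumptions; it is not the implementation used in any of the cited programs.

```python
from itertools import combinations, product


def bron_kerbosch(r, p, x, adj, out):
    """Append every maximal clique of the graph `adj` (node -> set of neighbours) to `out`."""
    if not p and not x:
        out.append(r)
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def max_common_subgraph(edges_a, edges_b):
    """Return the largest common induced subgraph as a set of (node_a, node_b) pairs."""
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    nodes_a = sorted({n for e in ea for n in e})
    nodes_b = sorted({n for e in eb for n in e})

    # Compatibility graph: one node per possible pairing of a template node
    # with an observed node.
    pairs = list(product(nodes_a, nodes_b))
    adj = {pair: set() for pair in pairs}
    for (a1, b1), (a2, b2) in combinations(pairs, 2):
        if a1 == a2 or b1 == b2:
            continue  # a node may be used only once in a mapping
        # Two pairings are compatible if both graphs agree on the edge.
        if (frozenset((a1, a2)) in ea) == (frozenset((b1, b2)) in eb):
            adj[(a1, b1)].add((a2, b2))
            adj[(a2, b2)].add((a1, b1))

    cliques = []
    bron_kerbosch(frozenset(), set(pairs), set(), adj, cliques)
    return max(cliques, key=len)


# Hypothetical template (a triangle of short distances) and observed NOE graph.
template = [(1, 2), (2, 3), (1, 3)]
observed = [("p", "q"), ("q", "r"), ("p", "r"), ("r", "s")]
print(sorted(max_common_subgraph(template, observed)))
```

The exhaustive enumeration is only practical for small graphs; real NOE graphs would need pruning by edge weights and tolerances of the kind described in the text.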
2.1.2. Computer-Supported Assignment Methods
The first computerized assignment methods based on graph theory were manually supported, as the information content of the experimental data was in most cases not sufficient for a completely automatic assignment procedure. An early example, MARCOPOLO (Pfändler and Bodenhausen, 1990, 1988; Pfändler et al.,
1985), uses the fine structure of multiplets in a 2D COSY spectrum and connectivity diagrams to make spin system assignments. The connectivity diagrams describing the coherence transfer phenomena of the spin systems can be derived from energy-level diagrams of the spin coupling networks. Fragments of the spin coupling
network are detected and assembled in a global spin network, representing a spin system, by a fragment bridging algorithm. A similar program used 2D TOCSY and NOESY spectra to make sequential assignment based on known patterns of the amino acids (Cieslar et al., 1988). “Prepared spin systems,” generated manually, were used by the program to assign partial sequence connectivities, and eventually the whole sequence, from a set of rules. CLAIRE, a suite of interactive programs (Kleywegt et al., 1991, 1990, 1989), searches spin system patterns and cross peaks between spin patterns using 2D NOESY data to generate consistent assignments. PROSPECT (van De Ven, 1990) performs an exhaustive search of all possible assignments for a given cross-peak pattern. The assignment is divided into three steps: (a) searching for J-connectivity patterns in COSY and TOCSY spectra, (b) assigning each pattern to amino acid type, and (c) obtaining sequence-specific assignment by mapping sequential NOE connectivities onto the amino acid sequence of the protein. At every step, the user can inspect, edit, and overrule assignments. The program suggests multiple assignments for spin patterns that cannot be assigned uniquely.
2.1.3. Assignment of Homonuclear 3D Spectra
2.1.3.1. Concepts. The assignment of multidimensional spectra is particularly suitable for a computerized assignment approach, as the overlap of cross peaks is reduced and human interpretation of the spectra is in any event limited to the study of 2D subspectra. The basic unit of the graphs for 3D TOCSY–TOCSY and TOCSY–NOESY spectra (D’Ursi et al., 1995; Oschkinat and Croft, 1994; Oschkinat et al., 1991) is a triangle that represents the interaction of three J-coupled spins A, B, and C. Each 3D TOCSY–TOCSY cross peak contains the information of the two coherence steps denoted in the graph with two directed edges. All six symmetric 3D cross peaks of the three coupled spins are represented by the whole triangle via all permutations of A, B, C. While the amino acids Ala and Gly can be represented completely by a triangle, Ser and Cys, with four spins, require a tetrahedron. A five-spin system can be represented by two tetrahedra that share a plane, and larger spin systems by clusters of tetrahedra. The program for spin system assignment from 3D TOCSY–TOCSY spectra (Oschkinat et al., 1991) first locates symmetric peaks, represented as triangles. Then the search extends to higher logical units that contain four different chemical shifts and on to clusters of tetrahedra. Due to the redundancy of defining a tetrahedron uniquely through three rather than four connected planes, the conditions for finding a tetrahedron are relaxed, as some cross peaks may be missing. Typically, four chemical shifts are connected by this procedure through a redundant number of 3D cross peaks which are then assigned to atom types by their chemical shifts and intensities.
Characteristic cross peaks for helix, sheets and turns in 3D TOCSY– NOESY spectra are identified on the basis of J-coupling and short proton–proton distances. Then the spin systems assigned from 3D TOCSY–TOCSY spectra are connected in pairs using interresidue cross peaks. Possible connectivities are examined using and chemical shifts, the type of amino acid residues, and regular secondary patterns. The pairs are combined to larger fragments which are then compared to the primary sequence to obtain sequence-specific assignments. Several potential solutions are kept as intermediate results in this assembly process. Finally the possible solutions are evaluated as the “best” assignment on the basis of fragment length and number of interresidue connectivities.
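The first stage of the procedure described above, locating triangles of three coupled spins from their six symmetric 3D cross peaks, can be sketched as follows. The peak list format (tuples of three chemical shifts in ppm), the tolerance value, and the function name are assumptions for illustration, and the sketch presupposes an idealized peak list in which each shift appears with a single value.

```python
from itertools import combinations, permutations


def find_triangles(peaks, tol=0.02):
    """Return shift triples (A, B, C) for which all six permuted 3D peaks are present."""

    def has_peak(target):
        # A peak matches if every coordinate agrees within the tolerance.
        return any(
            all(abs(p - t) <= tol for p, t in zip(peak, target))
            for peak in peaks
        )

    # Candidate shifts are all distinct coordinates occurring in the peak list.
    shifts = sorted({round(s, 2) for peak in peaks for s in peak})
    return [
        triple
        for triple in combinations(shifts, 3)
        if all(has_peak(perm) for perm in permutations(triple))
    ]


# Hypothetical three-spin system plus one spurious peak.
peaks = list(permutations((8.1, 4.3, 1.4))) + [(7.9, 4.3, 1.4)]
print(find_triangles(peaks))
```

Requiring all six permutations is the strict version of the test. The relaxed conditions mentioned in the text for tetrahedra would correspond to accepting a unit when only some of its expected peaks are found.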
2.1.3.2. Tests and Applications. Overlapping cross peaks also occur in 3D spectra, although the chemical-shift degeneracy is greatly reduced compared to 2D spectra. The procedure outlined in Sec. 2.1.3.1 could automatically assign 60% of
the spin systems in a 3D TOCSY–TOCSY spectrum of monomeric bovine seminal ribonuclease (mBS-RNase). The automated sequence-specific assignment was most successful in the β-sheet and some loop regions of the protein. All helical segments and some loop regions, however, had to be manually assigned.
2.1.4. SERENDIPITY Method
2.1.4.1. Concepts. The SERENDIPITY method (van Geerestein-Ujah et al., 1996, 1995) first identifies graph patterns for regular secondary structures, before complete spin systems are assigned to possible amino acid types or are sorted in sequential order. First, partial spin systems of and protons are assigned from COSY and TOCSY spectra on the basis of expected values of the chemical shifts of these protons in proteins (Wishart et al., 1995, 1992, 1991; Wishart and Sykes, 1994a, 1994b; Gross and Kalbitzer, 1988; Richarz and Wüthrich, 1978) and if possible the amino acid type is also specified. Second, a list of short NOESY contacts between different partial spin systems is collected according to the
cross-peak volumes. The volumes of the cross peaks are used to calibrate the distances of the interspin cross peaks. The mean value of these volumes is calculated and the corresponding distance set to 2.7 Å. The list of interspin NOE distances is then processed into a series of graphs before the sequence order is determined. A library of NOE graph templates for regular structures (Figs. 1–3), including five for , three for , and two for different types of turns, has been compiled. This library is then used to search the experimentally derived NOE
graphs for patterns characteristic of these secondary structures using the maximal common subgraph (MCS) algorithm (Grindley et al., 1993; Lau, 1989; Bron and Kerbosch, 1973). The edges of the template graphs must be weighted differently for α-helices and β-sheets. Some edges are weighted higher than others even for the same secondary structure template. For instance, certain characteristic connectivities indicate an α-helix, while strong sequential and weak connectivities (distances) are evidence for a β-sheet. These edges have higher weights in selecting common subgraphs than other edges. As precise mapping is not possible due to missing cross peaks and the limited accuracy of the NOE distances, a user-specified distance error tolerance of about 20% is applied. Finally the segments of secondary structures are joined (manually) according to residue type, sequence order, and absolute position of the fragments in the protein sequence to obtain sequence-specific assignments. Efforts are being made to automate the last step (van Geerestein-Ujah et al., 1996).
2.1.4.2. Tests and Applications. The SERENDIPITY program was tested with simulated NOE data and applied to the 3D NOESY–HMQC spectrum of the
isotopically labeled lac repressor headpiece protein (van Geerestein-Ujah et al., 1995) and the 3D NOESY–HSQC spectrum of the HU protein from Bacillus stearothermophilus (van Geerestein-Ujah et al., 1996). Short main-chain distances were extracted from the previously determined NMR structure of the lac repressor headpiece (De Vlieg et al., 1988; Kaptein et al., 1985) to simulate NOE data. The short contacts were translated into 55 “observed” NOE graphs. The identity of the residues and the sequence order were removed from the NOE graphs,
and the NOE graphs were numbered arbitrarily. This simulated data set was used to assess the capability of different templates to recognize helical regions in proteins. Using optimal templates, the sequence positions of the three helical regions were correctly identified, while suboptimal templates had an error rate of 25% (van Geerestein-Ujah et al., 1995). For the 3D NOESY–HMQC spectrum of the labeled lac repressor headpiece, a manually edited list of 291 NOE cross peaks involving interactions between protons in partial spin systems was used as input for SERENDIPITY. The program suggested a few contiguous fragments related to the sequence as possible solutions. Manual inspection of the remaining fragments identified correct sequence-specific
assignments. For the larger HU protein (19.5 kDa), which has three α-helices and a three-stranded antiparallel β-sheet, 3D NOESY–HSQC data were used as input. The program identified secondary structure elements that were consistent with the X-ray structure. In addition, it suggested an irregular β-arm region that was not detected in the X-ray structure. Visual inspection of the individual interstrand long-range contacts confirmed the presence of this feature. Joining all the fragment solutions allowed sequential assignment of 92% of the residues.
2.1.5. Fuzzy Graphs
2.1.5.1. Concepts. Graph theory can be systematically combined with the observed distribution of chemical shifts of the nuclei in proteins in fuzzy graphs. J-coupled nuclei and/or nuclei with chemical shifts near mean values of spin
systems in individual amino acids are connected with a certain probability in a graph. A fuzzy graph FG contains several graphs as members, and each member has a certain probability of being a member of this graph (J. Xu et al., 1995, 1993a, 1993b): where V represents the set of chemical shifts, is the set of expected deviations of the chemical shifts in a coupled network, and a continuous membership index, , is calculated for each of the chemical shifts in V from the statistical distribution
of chemical shifts (Gross and Kalbitzer, 1988), assuming a Gaussian type of distribution. Values for range from 0 to 1, with those near 1 indicating that a chemical shift is likely to be a member of the graph. The minimum of the set of values is used to estimate the reliability of the whole set to be member of the fuzzy graph. For practical applications, a database of spin coupling patterns of the 20 amino acids is compiled where a particular residue can contain several subgraphs for disconnected spin systems. Fragmented graphs also account for situations where chemical shifts are degenerate or J-coupling cannot be detected in COSY spectra for proteins. For every residue, the database contains the name of the amino acid, spin coupling connectivities, mean value, and standard deviations of the chemical shifts as observed in proteins. The observed spin network patterns of experimental COSY and TOCSY
spectra are coded in fuzzy graphs (query graphs) which are then matched with those in the database. Typically a query graph is mapped onto the database graph for several different types of residues where it is contained as a subgraph. There might be different mappings even within the same residue. The correct assignment is found among the graphs with high membership function values, but the best assignment, i.e., the graph with the highest value, is not always the correct assignment. The result of this step is a list of possible residue types for each spin system. This list is then converted to a list of possible spin systems for each amino acid in the protein sequence (Fig. 4). A tree search algorithm then analyzes the observed NOESY connectivities to determine which of the assignments is most consistent with the primary sequence. Gln 2 in Fig. 4 has, for instance, four choices to connect to Asn 3. The number of possible search paths, the product over all the spin systems for each residue in the sequence, is too large to be exhaustively searched. The number of paths is reduced by restricting the search only to spin systems with NOESY connectivities, but this number increases exponentially with the number of amino acid residues in the sequence. In practice, NOE connectivities might be missing or unresolvable in heavily overlapping regions, such that a strict requirement for the existence of NOE connectivities might not lead to the correct assignment. Also the existence of β-sheets with short interstrand connectivities prohibits a simple straightforward selection rule for the correct assignment. A heuristic search method, the Constrained Tree Search Algorithm (CTSA) (J. Xu et al., 1995), preferentially searches those NOE connectivities for which short distances indicate a sequential connection with very high probability (Wüthrich et al., 1984).
2.1.5.2. Tests and Applications. A series of programs implementing elements
from fuzzy graphs has been tested on experimental spectra of a 21-residue peptide fragment from Nac-t21a and of the 11-residue cyclic peptide, cyclosporin A (J. Xu et al., 1995). A correct sequence-specific assignment was made in both cases. However, human intervention is needed to apply the method to proteins, and the search methods are prohibitively CPU intensive.
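The membership index introduced in Sec. 2.1.5.1 can be illustrated with a short sketch: each observed chemical shift receives a Gaussian-shaped membership value between 0 and 1 from the mean and standard deviation of that shift in proteins, and the minimum over all shifts rates the spin system as a whole. The unnormalized Gaussian, the function names, and the numerical statistics below are assumptions for illustration only; they are not the values or the formula of the cited work.

```python
import math


def membership(shift, mean, sd):
    """Gaussian-shaped membership index in (0, 1]; equals 1 at the mean shift."""
    return math.exp(-0.5 * ((shift - mean) / sd) ** 2)


def graph_membership(observed, stats):
    """Reliability of a whole spin system: the minimum membership over its shifts."""
    return min(membership(observed[atom], *stats[atom]) for atom in observed)


# Hypothetical (mean, standard deviation) values in ppm for an alanine-like
# spin system; illustrative numbers, not taken from a shift database.
ala_stats = {"HN": (8.2, 0.6), "HA": (4.3, 0.4), "HB": (1.4, 0.3)}
query = {"HN": 8.0, "HA": 4.5, "HB": 1.5}
print(graph_membership(query, ala_stats))
```

Taking the minimum means that a single badly fitting shift lowers the rating of the whole spin system, which matches the use of the minimum described in the text.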
2.1.6. Assessment of the Graph Theory Methods
The framework of graph theory is a very suitable formalism to describe the computational problems of assigning NMR spectra and determining secondary structures. In some of the approaches, graph theory is used as a convenient language to describe the assignment problem, while others use powerful algorithmic procedures to solve the assignment problem. Some of the programs assist the manual assignment process or provide suggestions for possible assignment and leave the final decision to the user. The fuzzy graph approach rephrases the problem, but low computational efficiency hinders its application
to proteins despite the new formulation. The SERENDIPITY program is a
promising practical tool, judging from its successful application to a large protein (19.5 kDa) HU from B. stearothermophilus.
2.2. Genetic Algorithms and Mutual Information Method
2.2.1. Concepts
Genetic algorithms (Michalewicz, 1995) are stochastic optimization methods that attempt to mimic biological evolution. A population of bit strings representing possible sequential assignments of spin systems are subjected to “natural selection”
by inducing point mutations and recombination events. A number of spin system assignments are (randomly) produced as the initial generation. The bit strings are ranked according to a score function derived from the consistency of the assignment
with the spectral data. Members of a new generation are created by combining the best-ranking members of the current generation and local rearrangements of the assignments. The population size is usually kept constant. The fitnesses of the offspring are again evaluated with the score function, and the iterative cycles (generations) are repeated until the optimization criteria are fulfilled. Global
optimization of the score function through the genetic algorithm can be supported by special routines which try to find the best assignment of sequential spin systems
(local optimization). The genetic algorithm was first used by Wehrens et al. (1993) to support the
manual sequential assignment procedure. The method uses as input a list of the neighboring residues in the protein sequence and, for each residue pair, a list of
possible pairs of sequentially connected spin systems. The program GARANT (general algorithm for resonance assignment) (Bartels et al., 1997, 1996) was initially introduced as a method to assign COSY, TOCSY,
and NOESY spectra of a protein if the assignment and the 3D structure of a homologous protein are available. Basically GARANT generates a series of assignment possibilities based on matching expected and observed cross peaks from these spectra. It uses as input (a) the primary sequence of the protein, (b) a list of the
experimentally observed cross peaks, (c) a library of the expected cross peaks of residues derived from the different types of NMR spectra (COSY, TOCSY, NOESY etc.), from chemical-shift statistics and the coupling patterns of protons in the different amino acid residues, and (d) (optionally) the chemical shifts and 3D structure information of the homologous proteins. GARANT suggests cross peaks on the basis of the library data and previous assignments for the spectra. The expected NOESY cross peaks are restricted to sequential residues if no data from a homologous protein is available. These expected cross peaks are then matched on observed cross peaks within a certain tolerance in the chemical-shift position. The expected cross peaks are predicted from atoms and residue types. A successful match therefore defines a possible resonance assignment. A score function (mutual
information) is derived from the accuracy of the suggested match based on observed standard deviations of chemical-shift variations of the different atom types and the number of expected cross peaks that could be accounted for in the spectra. A local optimization algorithm generates new resonance assignments based on the quality of previous mappings. Generally, there are several experimental peak candidates for a given expected peak if the position of the expected peak is not
precisely defined. In such cases the mapping of previous generations is used. If no such peak exists, a new mapping is tried with the same spin system assignment as
in the parent generation. The chemical-shift ranges of the expected peaks will also be adjusted (enlarged or shrunk) according to the mapping.
2.2.2. Tests and Applications
The potential of the genetic algorithm of Wehrens et al. (1993) as an assignment tool was tested on the proteins E-L30, BPTI (58 residues each), and Tendamistat (74 residues). Using real data sets, the program was able to assign about 80% of the resonances correctly for BPTI. However, only about 50% of the assignment was correct for Tendamistat and 40% for E-L30. In all cases the success of sequential assignment was heavily dependent on the quality of the spectra. The GARANT program was tested on manually derived data from 2D COSY, TOCSY, and NOESY spectra for Tendamistat R19L (74 residues), where the known structure of the wild-type protein was used to clarify the spectral interpretation (Bartels et al., 1996). It assigned 321 of 393 protons correctly and 18 incorrectly, with the backbone almost completely correctly assigned. The program was able to assign 332 resonance frequencies if the chemical shifts for the wild-type protein were used as supplemental data. When data from 2D and NOESY spectra for a mutant of the homeodomain protein Antp (C39S,W56S) were combined with supplementary NOE predictions from the Antp C39S protein, experimental problems prevented the assignment of anything but the well-ordered central part of the molecule. The results were better for reassigning the structure of cyclophilin A, for which many 3D structures from X-ray and NMR studies are known for variants alone and in complex (Braun et al., 1995). Here, the authors used 3D NOESY, NOESY, and CBCA(CO)NHN data in combination with NOE data from an NMR spectrum of a complex of the cyclophilin A with cyclosporin S. The backbone and assignments were nearly complete, and a high percentage assignment of protons was also obtained (Bartels et al., 1996).
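As an illustration of the genetic-algorithm cycle outlined in Sec. 2.2.1 (ranking by a score function, recombination of the best-ranking members, local rearrangements, constant population size), the following toy sketch places spin systems onto sequence positions. Everything in it is an assumption made for illustration: assignments are encoded as permutations rather than bit strings, the score simply counts sequentially linked neighbours, and the function and parameter names are hypothetical. It is not the algorithm of Wehrens et al. or of GARANT.

```python
import random


def score(assignment, links):
    """Count neighbouring sequence positions whose spin systems are linked."""
    return sum((a, b) in links for a, b in zip(assignment, assignment[1:]))


def mutate(assignment, rng):
    """Local rearrangement: swap the spin systems at two positions."""
    child = list(assignment)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def crossover(mother, father, rng):
    """Keep a prefix of one parent and fill in the rest in the other's order."""
    cut = rng.randrange(1, len(mother))
    head = mother[:cut]
    return head + [s for s in father if s not in head]


def evolve(n_systems, links, pop_size=30, generations=200, seed=0):
    """Return the best assignment found; `links` holds ordered spin-system pairs."""
    rng = random.Random(seed)
    population = [rng.sample(range(n_systems), n_systems) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda a: score(a, links), reverse=True)
        parents = population[: pop_size // 2]  # best-ranking members survive
        children = []
        while len(parents) + len(children) < pop_size:  # constant population size
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = parents + children
    return max(population, key=lambda a: score(a, links))


# Hypothetical data: spin system i is sequentially linked to spin system i + 1.
links = {(i, i + 1) for i in range(7)}
best = evolve(8, links)
print(best, score(best, links))
```

Because the best half of each generation is carried over unchanged, the best score never decreases from one generation to the next, but nothing guarantees that the optimum is reached.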
GARANT was limited in its assignment capabilities, however, when NOE data from a homologous structure were not included and especially if the automatic peak-picking routine from the program package XEASY was used to generate the input data. For Tendamistat, 321 of 393 proton chemical shifts (81%) were correctly assigned if an “ideal” (manually picked peaks with an error tolerance of 0.02 ppm
for the peak positions) peak list was used. For cyclophilin A, when the peak list was obtained through manual interactive analysis of the spectra, GARANT was able to assign 1353 chemical shifts from a total of 1613 and atoms. With an
automatically generated peak list, however, the program was only able to assign a small portion of the resonances.
2.2.3. Assessment
Genetic algorithms combined with local optimization may be useful for sequence-specific assignment of NMR spectral data of mutants of proteins where a 3D structure is known. GARANT may speed up assignment and structure calculation for a set of mutant proteins. The basic genetic algorithm is slow to find good solutions, and has to be supported by additional search methods. Criteria based on
the score function values that would ensure an error-free assignment, or at most a few incorrect assignments, are missing.

2.3. Neural Networks

2.3.1. Concepts

A neural network (NN) consists of hierarchical layers of units: a layer of input units, one or several layers of hidden units, and a layer of output units. Each unit of one layer is connected to all units of the adjacent layer with a defined direction (Fig. 5). The NN represents a flexible mapping of the values of the input units to the values of the output units. Values of the units range between 0 and 1. The mapping of the input values to the output values is coded in a set of weights $w_{ij}$ representing the connection between the unit i of one layer and the unit j of the next layer, and a bias term $b_j$ (Fig. 6). The mapping of the values $y_i$ of the input units to the value $y_j$ of the unit j of the next layer is given by

$$y_j = \frac{1}{1 + e^{-x_j}} \qquad (1)$$

where the activation $x_j$ is a weighted linear combination of all input values and a bias term:

$$x_j = \sum_i w_{ij}\, y_i + b_j \qquad (2)$$

Almost any mapping between input and output vectors can be approximated, making the NN an attractive versatile tool for modeling complex input–output relations. The weights and the bias terms are found by presenting to the NN many sets c of input and output values (training cases). For each case c the weights and the bias terms are fitted such that the output values $y_{j,c}$ calculated from the given input values and the desired output values $d_{j,c}$
match approximately. In practice the weights and bias terms are initially randomly assigned and are gradually optimized by minimizing the total error function E at each training cycle (Rumelhart et al., 1986):

$$E = \frac{1}{2} \sum_c \sum_j \left( y_{j,c} - d_{j,c} \right)^2 \qquad (3)$$

where $y_{j,c}$ and $d_{j,c}$ are the calculated and desired values of output unit j for training case c. By presenting a large number of training sets c to the NN, one hopes that the trained NN will recognize in new input values the basic features of the input–output relation and will produce the desired output.

2.3.2. NN in Spin System and Sequential Assignment
Neural networks have been applied to the automatic identification of NMR spectra of oligosaccharides (Meyer et al., 1991; Thomsen and Meyer, 1989). We will concentrate on the identification of amino acid types from 2D or isotope-edited 3D TOCSY experiments, and the application of NN in the sequential assignment process (Hare and Prestegard, 1994). For the assignment of spin systems in NMR TOCSY spectra by NN (Hare and Prestegard, 1994), many sets of TOCSY cross peaks of the same spin system are presented as input to the net in the training phase, and the output is defined as this spin system. This procedure is repeated for all spin systems by picking cross peaks
for all amino acid residues in assigned multidimensional TOCSY spectra and integrating their intensities. Peaks for and , which do not contain enough residue-specific information, are not included (Wishart et al., 1991). Each spin system is characterized by a set of chemical shifts and their cross-peak volumes. The TOCSY spectrum is divided along one dimension into 0.1-ppm intervals ranging from –1.0 ppm to 6.0 ppm, yielding 71 input units. The spectrum for every spin system can be coded in these 71 units according to the chemical shifts of all participating spins and their associated cross-peak volumes. Each input unit contains the sum of the volumes of all cross peaks of the spin system within the 0.1-ppm band. Twenty output units code for the amino acid residues. The output unit of the presented spin system (residue type) is assigned a value of 1, and each of the other 19 output units a value of 0.

After residue-type assignment, sequential assignment requires extra interresidue NOESY data (2D and 3D) to find proper connectivities. A modified NN, combined with an NOE constraint satisfaction algorithm, is used for fitting the assigned residue types from the first NN to the sequence. Rules used to construct the weight matrix are (a) assignment of more than one spin system to the same position of the sequence is discouraged; (b) assignment of the same spin system to more than one position in the sequence is discouraged; and (c) all and NOEs are sequential (valid for ).
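The input and output coding described above can be sketched in a few lines. This is a minimal illustration, not the code of Hare and Prestegard: the function names, the scaling of the binned volumes by the largest bin (used here only to keep unit values between 0 and 1), and the logistic form of the units are assumptions of this sketch.

```python
import math

N_INPUT = 71      # 0.1-ppm bins from -1.0 to 6.0 ppm, as stated in the text
N_OUTPUT = 20     # one output unit per amino acid type
PPM_MIN = -1.0
BIN_WIDTH = 0.1


def encode_spin_system(peaks):
    """Code a spin system as 71 input values.

    peaks is a list of (chemical_shift_ppm, cross_peak_volume) pairs.
    Volumes falling in the same 0.1-ppm band are summed. Scaling by the
    largest bin is an assumption of this sketch, not taken from the text.
    """
    units = [0.0] * N_INPUT
    for shift, volume in peaks:
        index = int(round((shift - PPM_MIN) / BIN_WIDTH))
        if 0 <= index < N_INPUT:
            units[index] += volume
    largest = max(units)
    if largest > 0.0:
        units = [u / largest for u in units]
    return units


def target_vector(residue_index):
    """Desired output: 1 for the presented residue type, 0 for the other 19."""
    return [1.0 if k == residue_index else 0.0 for k in range(N_OUTPUT)]


def logistic(x):
    """Logistic unit (assumed form); maps any activation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward_layer(values, weights, biases):
    """weights[j][i] links unit i of one layer to unit j of the next."""
    return [logistic(sum(w * v for w, v in zip(row, values)) + b)
            for row, b in zip(weights, biases)]


def squared_error(outputs, targets):
    """Half the summed squared deviation for one training case."""
    return 0.5 * sum((y - d) ** 2 for y, d in zip(outputs, targets))
```

With a logistic unit, a large negative bias such as –60 drives the output to effectively zero, which is one way a residue type can be excluded from consideration.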
Every residue in the primary sequence has three most probable assignments of the spin systems obtained from the first NN in addition to those not ruled out by the first NN. Every unit of the input layer is connected to all other units of the output layer; the (symmetric) weight matrix linking the two layers is given by the user according to the constraints described above. Because the weight matrix is predefined, this NN is not trained. Potential spin systems are probed for and distances indicative of a sequential connection; when such a connection is found, the weight between the two units is increased by adding a value of 1.0 (Hare and Prestegard, 1994). To run this NN, the output from the last cycle of the first NN is taken as the bias term of the weight matrix, and the output of a unit has the form

$$y_j = \frac{1}{1 + e^{-x_j/T}} \qquad (4)$$

where $x_j$ is the activation of unit j and T is a temperature factor used for simulated annealing; a constraint satisfaction algorithm was used for minimization. The minimization continues by randomly selecting the units and calculating the output using Eq. (4). The “temperature” T is adjusted according to a predefined protocol to avoid local minima.
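The relaxation of such a predefined network can be illustrated on a toy problem. The following is a sketch under stated assumptions, not the published implementation: the logistic output with temperature, the continuous unit values, the cooling schedule, the inhibitory weight of –2.0 for rules (a) and (b), and the bias values are all hypothetical; only the reinforcement of 1.0 for a detected sequential connection is taken from the text.

```python
import math
import random


def relax(weights, bias, schedule, updates_per_temperature=400, seed=0):
    """Relax a fixed-weight network while lowering the temperature T.

    weights is a symmetric matrix with zero diagonal, bias holds the
    per-unit bias terms, and schedule lists the temperatures in order.
    A unit is picked at random and its value is recomputed from the
    other units with a logistic function of (activation / T).
    """
    rng = random.Random(seed)
    n = len(bias)
    state = [0.5] * n
    for temperature in schedule:
        for _ in range(updates_per_temperature):
            j = rng.randrange(n)
            activation = bias[j] + sum(weights[j][i] * state[i]
                                       for i in range(n) if i != j)
            state[j] = 1.0 / (1.0 + math.exp(-activation / temperature))
    return state


# Toy problem: two sequence positions (0, 1) and two spin systems (A, B).
# Unit order: (0,A), (0,B), (1,A), (1,B).
INHIBIT = -2.0   # hypothetical penalty implementing rules (a) and (b)
LINK = 1.0       # reinforcement for a sequential A->B connection (from the text)
toy_weights = [
    [0.0, INHIBIT, INHIBIT, LINK],
    [INHIBIT, 0.0, 0.0, INHIBIT],
    [INHIBIT, 0.0, 0.0, INHIBIT],
    [LINK, INHIBIT, INHIBIT, 0.0],
]
toy_bias = [0.6, 0.4, 0.4, 0.6]   # stand-in for the first network's output
final_state = relax(toy_weights, toy_bias, [2.0, 1.0, 0.5, 0.25, 0.1])
```

In this toy case the units for "A at position 0" and "B at position 1" support each other through the sequential link and suppress the two competing units as the temperature is lowered.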
2.3.3. Tests and Applications

To train the NNs (Hare and Prestegard, 1994), a 2D TOCSY spectrum and a 3D TOCSY–HMQC spectrum of E. coli ACP (acyl carrier protein) were used to extract the spin systems. All the TOCSY spectra were assigned manually to 148 spin systems and 17 types of residues, which were used to train the NN. The trained NN was then tested on 53 spin systems from spinach ACP, which has 37% sequence homology with E. coli ACP. The primary sequence of spinach ACP is known, but the TOCSY spectrum was not assigned. To lower the chance of misassignments, a weight of –60 was added to the units representing the residues Tyr, His, and Arg, which are not in the sequence of spinach ACP, such that the values of Eq. (1) are zero for these residues at the output layer. The assignment varied somewhat when different numbers of units were used for the hidden layer and when values of the bias factor of NNs were changed (Hare and Prestegard, 1994). The best residue-type assignment was found at four hidden units with a bias weight factor of 1, where 77% of 53 tested spin systems had a correct assignment to a residue type within the top three choices and 83% within the top eight.

The sequential assignment of a section of spinach ACP (a 20-residue fragment from residue 5 to 24) was tested using this NN. The NN has two identical layers arranged according to the primary sequence 5–24 of spinach ACP. Using this
approach, 15 out of 20 residues were assigned sequentially, about a 75% success rate.
2.3.4. Assessment of the Neural Network Method

Automated assignment of TOCSY spectra with NN is a promising method but will require considerable work to be used as an on-line module in an automated suite. Sequential assignment was less successful, but can also be achieved by combining TOCSY data with NOE connectivities and distances. The second NN was only useful for the section of the spinach ACP. Sequential assignment was done manually rather than by using a second trained network. Possible sequential connections have been used to assign the numerical values for the weight matrix based on sequential and distances.
2.4. Matching Rungs on a Ladder: Automated Sequential Assignment Using Isotopically Labeled Proteins

Heteronuclear triple-resonance NMR experiments with isotopically labeled proteins have dramatically simplified the assignment problem, especially for proteins between 10 and 30 kDa (Gronenborn and Clore, 1994; Wagner, 1993; Clore et al., 1990; Kay et al., 1990). The increased resolution in three-dimensional and four-dimensional spectra reduces the number of overlapping cross peaks. Even more important, sequential connectivities between nuclei of neighboring residues can be directly observed, as these cannot be confused with short proton–proton distances resulting from long-range contacts as might happen with homonuclear proton spectra. In addition, the chemical shifts of and greatly restrict the possibilities for amino acid residue types for a given spin system (Grzesiek and Bax, 1993).

Two automatic sequential assignment methods, first practical with the introduction of triple-resonance (using - and proteins) NMR, take advantage of this technological breakthrough. Both AUTOASSIGN (Zimmerman et al., 1997; Zimmerman and Montelione, 1995) and the RID method (Friedrichs et al., 1994) begin by generating a generic list of spin systems which contain linkage information from the heteronuclear cross peaks. Pairs of sequentially connected spin systems are then linked with a high reliability and finally matched to the amino acid sequence. The AUTOASSIGN method uses data on intraresidue and sequential connectivities between aliphatic side-chain protons to the backbone nuclei and while the RID method mainly uses the connectivities of neighboring pairs of

2.4.1. Automated Sequential Assignment Using Logical Constraint Propagation

AUTOASSIGN (Zimmerman et al., 1994, 1993) is an object-oriented expert system that uses constraint reasoning to assign spin systems to a known amino acid sequence. Given a list of spin systems, the amino acid sequence of the protein, and a list of CO-TOCSY cross peaks as input data, AUTOASSIGN determines a one-to-one mapping of spin systems to the amino acid sequence that is most
consistent with the spin system connectivities from the CO-TOCSY spectra. Certain logical rules dictate the order of spin systems. A brief discussion of AUTOASSIGN follows, based on Zimmerman et al. (1994, 1993). The reader is referred to Chapter 3 in this volume for further details as well as comments on earlier related methods.

AUTOASSIGN uses a constraint satisfaction method (based on artificial intelligence programs) to assign spin systems to a given amino acid. Each spin system has three linking variables: N-, C-, and amino acid assignment. Two types of TOCSY spectra yield pairs of connected spin systems. CA-ladders, derived from CA-TOCSY spectra, represent connectivities between aliphatic side-chain protons and and nuclei of the same residue. CO-ladders, derived from CO-TOCSY cross peaks, describe the side-chain interactions with the and nuclei of the next residue. The program matches the CA-ladders to the CO-ladders, based on the chemical shifts of the side-chain protons. To be assigned to the sequence, a spin system’s best link to a CO-ladder must be that ladder’s best match to any spin system, and the link score between the two spin systems must exceed the other link scores by a set threshold value. A constraint propagation network ensures that after each assignment, its consequences are passed on to all the other variables and that obviously incorrect matches are “pruned.”

2.4.2. Tests and Applications

AUTOASSIGN was first tested for reliability on actual and simulated data for
a 72-residue domain of staphylococcal protein A. For simulated data, the method assigned 83% in the initial phases, before the iteration began, and 100% with one cycle. When data were deleted from and noise added to this set, a few erroneous
assignments were made. With real data (3D CO-TOCSY), which contained only 65% of the expected cross peaks and 71 of 72 spin systems, 30% of the definite assignments were made at start-up, and with iterations the final assignment reached >90% (Zimmerman et al., 1993). Only two residues, for which there were no definitive data, could not be assigned. Various simulated data sets for hTGF, a 50-residue protein, were used to test AUTOASSIGN’s ability to deal with noise and incomplete data. Again, with complete and exact data, AUTOASSIGN assigned all the spin systems with no errors. When the data set included noise and several pieces of data were deleted, the system was able to assign 91%–100% of the residues
correctly (Zimmerman et al., 1994). The method shows great promise as a routine automated assignment tool in practice (Zimmerman et al., 1997).

2.4.3. Sequential Assignment Using a Residue Information Data (RID) Structure
The RID method (Friedrichs et al., 1994) combines the input from six different 3D and 4D triple-resonance experimental spectra: 3D HNCACB, 3D CBCA(CO)HN, 4D HNCAHA, 4D HN(CO)CAHA, 3D HBHA(CO)NH, and 3D
HNHA-Gly (for assigning glycines). The first four experiments are specifically
designed to take advantage of the residue specificity of the shift. Each RID contains the chemical shifts of different atom types for each residue and the previous residue in the sequence, pointers from the atoms to the cross peaks, and links between residues. The method is implemented as a set of macros integrated into a modified Felix program. The macros are executed in three stages. In stage 1, the cross peaks of the first four experiments are linked to the four atom types based on the correlation of these peaks to amide and chemical shifts of residue i. The RIDs are linked sequentially in a second stage, and the sequential ladder of RIDs is aligned with the protein sequence in the final stage. A separate stage 2 macro incorporates the data from the 3D HBHA(CO)NH and HNHA-Gly experiments to link glycine residues to the rest of the sequence. Linkages between residues are described as weak or strong and are given a score based on differences in matched chemical shifts.

The method deals with chemical-shift degeneracy in the initial stage by using minimal chemical-shift tolerances while searching for peaks within a spectrum or between two spectra, combined with rules related to peak volume and alignment in two dimensions. If this does not work, either the peak is temporarily allowed to have multiple assignments or a given atom is allowed multiple peak assignments. These ambiguities are then dealt with at the last program stage. Due to the quantity and quality of data, only 10%–30% of the RIDs are ambiguous. As the information linking an RID to its sequential neighbor is overdetermined, the ambiguities should be resolved when the known protein sequence is superimposed on the sequential RID ladder.

2.4.3.1. Tests and Applications. The program was applied to assign simulated and real experimental data sets for the human hnRNP C RNA-binding domain (93 residues), calmodulin (148 residues), and apokedarcidin (114 residues).
Using simulated spectral data for calmodulin as input, nearly all residues were assigned correctly. A data set for human hnRNP C RNA-binding domain that had been
manually assigned was assigned again by the program. With a small amount of manual assistance at stage 2, all of the backbone and atoms were assigned correctly except for two N-terminal residues, due to rapidly exchanging protons. Finally, previously unassigned spectral data for the apokedarcidin protein were used as input. Using a small amount of manual editing of the RID list at stage 2, all 108 assignable residues were assigned by the program. Remarkably, the method was able to deal with spectral heterogeneity arising from holoprotein contamination of the sample.
2.4.4. Assessment of “Ladder” Methods

Both AUTOASSIGN and RID require extensive heteronuclear NMR data collection, including specially designed TOCSY (for AUTOASSIGN) and heteronuclear triple-resonance experiments. The 3D experiments require labeled protein. As the input data for AUTOASSIGN are derived manually and should be manually checked, the method is not completely automated. Peak-picking programs used in series can speed up assignment, and techniques from the rapidly expanding field of artificial intelligence may expedite the transition from spectra to input data. The RID method would appear to be completely automated in the initial stages of spectral interpretation, but manual editing is required at stage 2 to detect RIDs that result from noise. Presumably the RID method will also become progressively more automated as experience is gathered from using the experiments it is based on. Either of these programs could be a module in automated analysis if used in suite with distance geometry programs, such as those including SECODG, discussed later, which can detect and eliminate erroneous constraints.

2.5. Combinatorial Optimization and Monte Carlo Simulated Annealing of Score Functions
2.5.1. Concepts
In this type of method, a potential sequential assignment is evaluated through a score function derived from sequential connectivities, probabilities for matching chemical shifts in different spectra, and the degree of matching of an observed spin system to a particular amino acid residue type. The score function is optimized by systematic rearrangement of sequential assignments in one or two short segments (combinatorial optimization) or by Monte Carlo simulated annealing. A Monte Carlo move consists of swapping two randomly chosen segments of the same length. After typically to moves and gradually decreasing temperature, the score function reaches an optimal value, indicating a possible sequential assignment for the given constraints.

The program ALFA (Bernstein et al., 1993) implements this method for assigning nD heteronuclear spectra (NOESY and TOCSY). The input data are (1) the protein sequence, (2) a list of the observed spin systems classified according to residue, (3) a list of all potential sequential neighbors and number of interresidue contacts, and (4) (optional) data from the three-dimensional structure. Two short segments of the sequence are randomly chosen. The assignments for each residue in these segments are systematically explored and evaluated by the score function. The best assignment according to the score function is kept, and two other segments are chosen for the next cycle. About cycles are needed for the score to reach an optimal value.

A systematic search and a simulated annealing search method are implemented in the program ALPS (Morelle et al., 1995). Automated assignment of backbone NMR spectra of -labeled proteins is based on a set of “reduced dimensionality” 2D triple-resonance experiments such as –HSQC, 2D HNCO, 2D
HN(CO)CA, 2D H(N)COCA, 2D HN(COCA)H, 2D HNCA, 2D HN(CA)CO, and 2D HN(CA)H. The assignment is made in three independent steps: (1) Identification of frequency triplets in the reduced 2D spectra. These triplet peaks are filtered
to remove redundancies using best-alignment and intensity criteria. Any ambiguities of triplets must be removed manually at this stage. (2) Construction of pseudoresidues by combining HNCO, HN(CO)CA, and H(N)COCA experiments. These pseudoresidues are then extended with the frequency by using the HNCA experiment. Pseudoresidues can then be sequentially arranged through overlapping and chemical shifts. Optionally CO(i), or chemical shifts can be included in extended pseudoresidues to improve the sequential connectivity. The score function evaluates the fit of a spin system to a particular residue type, based on and chemical shifts and the chemical-shift deviations of overlapping pseudoresidues. This score function is optimized by either a systematic search or a Metropolis simulated annealing procedure. The same approach has recently been refined further by analyzing chemical-shift deviations among different spectra of the same nuclei and the chemical-shift variations of and in the same amino acid type. These statistical data were then used to improve the score function (Lukin et al., 1997).
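The Monte Carlo variant of this optimization can be sketched as follows. This is an illustration only and reproduces neither ALFA nor ALPS: the score terms (a residue-type fit table plus one point per satisfied sequential connectivity), the geometric cooling schedule, and all names and default parameters are assumptions. The move itself, swapping two randomly chosen segments of equal length, follows the description in the text.

```python
import math
import random


def score(assignment, type_fit, connected):
    """Higher is better: residue-type fit plus sequential connectivities.

    assignment[p] is the spin system placed at sequence position p,
    type_fit[p][s] rates spin system s at position p, and connected is
    a set of (s, t) pairs with evidence that t follows s in the sequence.
    """
    total = sum(type_fit[p][s] for p, s in enumerate(assignment))
    total += sum(1.0 for s, t in zip(assignment, assignment[1:])
                 if (s, t) in connected)
    return total


def swap_segments(assignment, rng, max_length=3):
    """Monte Carlo move: swap two non-overlapping segments of equal length."""
    n = len(assignment)
    length = rng.randint(1, min(max_length, n // 2))
    i = rng.randrange(0, n - 2 * length + 1)
    j = rng.randrange(i + length, n - length + 1)
    trial = list(assignment)
    trial[i:i + length] = assignment[j:j + length]
    trial[j:j + length] = assignment[i:i + length]
    return trial


def anneal(start, type_fit, connected, n_moves=5000,
           t_start=2.0, t_end=0.05, seed=1):
    """Metropolis simulated annealing with a geometric cooling schedule."""
    rng = random.Random(seed)
    current = list(start)
    current_score = score(current, type_fit, connected)
    best, best_score = list(current), current_score
    for step in range(n_moves):
        fraction = step / max(1, n_moves - 1)
        temperature = t_start * (t_end / t_start) ** fraction
        trial = swap_segments(current, rng)
        trial_score = score(trial, type_fit, connected)
        delta = trial_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = trial, trial_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score
```

The function returns the best assignment encountered rather than the last one, a common safeguard that costs nothing here; the sequence must contain at least two positions for a swap to be defined.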
2.5.2. Tests and Applications

ALFA (Bernstein et al., 1993) was tested on a 107-residue protein, mucous trypsin inhibitor (MPI). Without information on the 3D structure, 83% of the spin systems could be correctly assigned. The ALPS program was successful in assigning real spectra of Rhodobacter capsulatus ferrocytochrome, a 116-residue protein, yielding the same sequential assignment as previously obtained by manual analysis. The procedure was also shown to be robust by deliberately reducing the input information (Morelle et al., 1995). The method of Lukin et al. (1997) was tested on three previously assigned proteins. For calmodulin (148 residues), the computer assignments agreed perfectly with previous assignments. For the structured region of CheA (residues 124–257) the computer assignment agreed with 99% of the manual assignment. For glutamine-binding protein of Escherichia coli (226 residues) (Yu et al., 1997), assignments from the computer program agreed with 95% of those derived manually.
2.5.3. Assessment
The combination of local methods by combinatorial optimization and global methods such as Monte Carlo simulated annealing yields impressive results, which almost reach the quality and reliability of a manual assignment by an experienced NMR spectroscopist. The success is certainly partially based on the high quality of the three-dimensional or “reduced dimensional” 2D triple-resonance experiments, and not solely on the efficiency of the methods per se. Avoiding the burden of fancy
approaches, which in practice often hide, rather than solve, the intrinsic computational problem, and using standard optimization methods in an efficient way largely account for the success of the methods.
2.6. Real-Space Assignment

2.6.1. Concepts

This method uses the metric matrix distance geometry method (Havel et al., 1983), which gives an elegant and direct relation between internuclear distances and the 3D positions of these nuclei. The conventional order, to assign chemical shifts to individual nuclei in the protein in the first stage and then to calculate 3D structures from distance information, is reversed in the real-space assignment. The nuclei
(e.g., all protons) are arbitrarily labeled, and the three-dimensional structure of these nuclei is calculated from distance information. Residue-type assignments are then done by an exhaustive search among different combinations of unassigned spins in the calculated 3D structures. This approach resembles that of an X-ray crystallographer fitting a polypeptide model to the electron density map. Combining 3D structure calculation and NOE assignment using the metric matrix methods is very appealing. The idea was first tested for the sequential assignment of two isoleucines, two lysines, and two arginines in the micelle bound conformation of melittin (Brown et al., 1982). However, the experimental constraints from proton–proton distances were at that time not restrictive enough to pursue this procedure further. A more general implementation of the concept was described a decade later. Oshiro and Kuntz (1993) applied a metric matrix distance geometry to the assignment of unambiguous spectra. In their approach, similar to the previously mentioned MCD, nuclei are first arbitrarily labeled, and the spatial positions of the nuclei are calculated on the basis of approximate distances. First, main-chain protons are assigned to their respective type of amino acid and to specific spin systems rather than individual amino acids. Distance geometry trial structures are generated from the interresidue NMR constraints. Graph theory is then used to divide the structures into domains, where the secondary structure elements are identified on the basis of known patterns for or standard geometries. These structures are then superimposed on the primary structure, and additional details from these are used to further assign protons. Almost simultaneously Kraulis (1994) introduced another method, based on a Monte Carlo simulated annealing algorithm, to generate 3D structures directly from - and -separated NOE data with no reliance on J-coupling data. 
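The direct relation between distances and positions that underlies the metric matrix method can be made concrete with a small embedding sketch. It is not the implementation used by any of the programs cited: it assumes a complete and exact distance matrix, and it uses plain power iteration with deflation as a stand-in for a proper eigenvalue solver. All function names are illustrative.

```python
import math
import random


def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def metric_matrix(dist):
    """Metric (Gram) matrix relative to the centroid, from distances only."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [[0.5 * (row_mean[i] + row_mean[j] - grand_mean - sq[i][j])
             for j in range(n)] for i in range(n)]


def dominant_eigenpair(matrix, rng, iterations=2000):
    """Largest eigenvalue and its eigenvector by power iteration."""
    n = len(matrix)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        new = [sum(matrix[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            return 0.0, [0.0] * n
        vec = [x / norm for x in new]
    value = sum(vec[i] * matrix[i][k] * vec[k]
                for i in range(n) for k in range(n))
    return value, vec


def embed(dist, dimensions=3, seed=0):
    """Coordinates whose pairwise distances reproduce dist (if embeddable)."""
    rng = random.Random(seed)
    gram = metric_matrix(dist)
    n = len(gram)
    coords = [[0.0] * dimensions for _ in range(n)]
    for axis in range(dimensions):
        value, vec = dominant_eigenpair(gram, rng)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i][axis] = scale * vec[i]
        # Deflation: remove the component just found before the next axis.
        gram = [[gram[i][k] - value * vec[i] * vec[k] for k in range(n)]
                for i in range(n)]
    return coords
```

The recovered coordinates are defined only up to rotation, translation, and mirror inversion, since distances alone do not fix the handedness. With sparse or imprecise NOE-derived distances the metric matrix is no longer exactly of rank three, which is one reason real data are much harder than this idealized case.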
The ANSRS method (assignment of NOESY spectra in real space) proceeds in three phases: (1) generation of a 3D structure based on proton coupling between unassigned protons, (2) search for residue-type assignments for the remaining unassigned protons after structure generation, using a “chemical shift probability surface” for each amino
acid type, and (3) sequence-specific assignment by optimally fitting the results of steps 1 and 2 to the known primary sequence.

2.6.2. Tests and Applications
Neither real-space automated assignment method is able to handle ambiguous (i.e., real) data. The metric matrix distance geometry method was tested on real and simulated data sets for BPTI. The authors concluded that their DG structures based only on the NOE-derived interresidue distances did not closely resemble the true structure and that their method could only be applied to excellent quality, stereoresolved data (Oshiro and Kuntz, 1993). The ANSRS method has been applied to simulated data sets for Gal 4 and BPTI. The method determined sequence-specific assignments for >95% of the spins. Although it could generate a 3D structure that would account for the proton–proton interactions, the method could not deal with ambiguous real data (when two H–X pairs have the same chemical shift) and failed to generate a structure as soon as the distance constraints accounted for less than 70%–80% of the theoretically obtainable data.
2.6.3. Assessment

The decisive advantage of all real-space methods is that NOESY spectra are assigned directly, without the intermediate step of assigning J-correlated spectra. Thus the extensive data collection from several 3D and 4D J-correlated spectra, needed by other automatic methods such as AUTOASSIGN or RID, might not be necessary in future automatic assignment methods based partially on real-space methods. The quality of the distance constraints and ambiguity of cross peaks in the frequency space are limiting factors for all real-space assignment methods. The ANSRS method gave excellent results if the chemical shifts were unique in the simulated data sets. Improvements will be needed so that it can use real experimental data sets.
3. AUTOMATED STEREOSPECIFIC ASSIGNMENTS

3.1. Concepts
In the early calculations of 3D NMR structures from NOESY spectral constraints, two stereorelated spins, such as the two hydrogens in methylene groups or the two methyl groups in isopropyl groups, could not be individually resolved. A simple “pseudoatom concept” was introduced for these prochiral centers to treat NOE intensities in distance geometry calculations (Braun et al., 1983). A pseudoatom representing the mean position of the two spins was used as a reference point. The distance constraints derived from the larger NOESY cross-peak intensity of
two nonstereospecifically assigned spins were relaxed with a pseudoatom correction (Güntert et al., 1991a; Wüthrich et al., 1983) and used as constraints to the reference point. This pseudoatom concept was rather successful in the first NMR structure determinations of proteins and is still in widespread use if the chemical shifts of the two stereorelated spins are degenerate. Improved pseudoatom corrections for the diastereotopic substituents in the amino acid side chains have been suggested (Fletcher et al., 1996).

The accuracy of 3D protein structures can be substantially improved if the chemical shifts of the nuclear spins in prochiral centers can be resolved and assigned to the individual group (stereospecific assignment) (Havel, 1991; Driscoll et al., 1989; Güntert et al., 1989). Several computational methods have been suggested to make stereospecific assignments, including grid search methods (Polshakov et al., 1995; Nilges et al., 1990; Güntert et al., 1989), floating chirality (Holak et al., 1989; Weber et al., 1988), atom swapping (Williamson and Madison, 1990), averaging (Brünger et al., 1986) or summation (Nilges, 1993), and automated analysis of distance geometry structures (Güntert et al., 1991a).

Empirical methods for stereospecific assignment of prochiral groups are completely based on local information. Patterns of intraresidue and sequential NOEs are qualitatively combined with restrictions of the local conformation from coupling constants (Hyberts et al., 1987). Analysis of the COSY cross peaks may also be used for determining the angle in cases where cross peaks in the region heavily overlap (Bartik and Redfield, 1993). Variations in the
chemical shift, which are dependent on the exact geometry of the surrounding groups (Williamson and Asakura, 1992), can also be used to further restrict the torsion angle . Specific pulse sequences have been developed to obtain restrictions on (Hu and Bax, 1997) and stereospecific assignments of the resonances of the side-chain amide groups of Asn and Gln (McIntosh et al., 1997) in isotopically labeled proteins.

Grid search methods automate these empirical procedures. The program HABAS (Güntert et al., 1989) scans the list of experimental distance constraints for consistency with all local conformations for a particular residue i. It selects only intraresidue and sequential distance constraints for these checks. It is preferably applied before one starts a distance geometry calculation. HABAS checks the two assignment possibilities for all methylene protons and for the two methyl groups of valine. It locates all conformations that fulfill the experimental constraints in two independent grid searches of the 3D space defined by the angles with a 10° step in each angle. Only a subset of 13,050 conformations among the theoretically possible conformations needs to be checked against the experimental data due to steric hindrance. HABAS will not make erroneous assignments or miss a possible stereospecific assignment, but errors in the experimental data and internal mobility influence the result of the calculation. These considerations have to be included in the interpretation of the HABAS results.
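The logic of testing both assignment possibilities on a torsion-angle grid can be sketched in a deliberately reduced form. This is not HABAS: it scans a single side-chain torsion angle, here called chi1, instead of a three-angle space, and it uses only two vicinal coupling constants. The Karplus parameters, the dihedral offsets for the two beta protons, the tolerance, and the `extra_check` hook (a stand-in for NOE distance checks) are assumptions made for illustration.

```python
import math

# Karplus parameters for the H-alpha/H-beta coupling; commonly quoted
# values, assumed here for illustration only.
KARPLUS_A, KARPLUS_B, KARPLUS_C = 9.5, -1.6, 1.8


def coupling(theta_deg):
    """Three-bond coupling constant (Hz) for an H-C-C-H dihedral angle."""
    c = math.cos(math.radians(theta_deg))
    return KARPLUS_A * c * c + KARPLUS_B * c + KARPLUS_C


def predicted_couplings(chi1_deg):
    """Couplings of H-alpha to (HB2, HB3); dihedrals chi1 - 120 and chi1 assumed."""
    return coupling(chi1_deg - 120.0), coupling(chi1_deg)


def allowed_chi1(j_first, j_second, swapped, tolerance=1.0, step=10,
                 extra_check=None):
    """Grid search over chi1 for one of the two assignment possibilities.

    swapped=False labels the first observed proton HB2 and the second HB3;
    swapped=True tries the opposite labelling. extra_check, if given, is a
    callable (chi1, swapped) -> bool standing in for further constraints.
    """
    solutions = []
    for chi1 in range(-180, 180, step):
        j_b2, j_b3 = predicted_couplings(chi1)
        if swapped:
            j_b2, j_b3 = j_b3, j_b2
        if abs(j_b2 - j_first) > tolerance or abs(j_b3 - j_second) > tolerance:
            continue
        if extra_check is not None and not extra_check(chi1, swapped):
            continue
        solutions.append(chi1)
    return solutions


def stereo_assignment(j_first, j_second, **options):
    """Return 'as labelled', 'swapped', 'ambiguous' or 'inconsistent'."""
    direct = allowed_chi1(j_first, j_second, False, **options)
    reverse = allowed_chi1(j_first, j_second, True, **options)
    if direct and not reverse:
        return "as labelled"
    if reverse and not direct:
        return "swapped"
    return "ambiguous" if direct else "inconsistent"
```

With coupling constants alone, one large and one small coupling fit either labelling at two different chi1 values, so the call returns "ambiguous"; a further restriction, supplied here through `extra_check`, is needed to decide. This mirrors why coupling constants are combined with NOE patterns in practice. For scale, three angles sampled every 10° span 36³ = 46,656 grid points.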
The program STEREOSEARCH (Nilges et al., 1990) is similar to HABAS but also includes a search in a database of conformations derived from high-resolution
X-ray crystal structures. The program eliminates conformations that do not fit with the majority of constraints, and the user can specify criteria for constraint satisfaction that allow a conformation to be assigned unambiguously. Like HABAS, the program generates a range of allowable torsion angles for residues also in cases where an unambiguous assignment is not achieved.

The program AngleSearch (Polshakov et al., 1995), a more recent implementation of the grid search method, calculates coupling constants and interproton distances for a single torsion angle or pairs of torsion angles and does an exhaustive grid search in these spaces similar to HABAS or STEREOSEARCH. In addition, it explicitly calculates dynamic averages of coupling constants and effective distances in cases where the data cannot be accounted for by a single conformation. In addition to protein data, AngleSearch can also interpret data for DNA, RNA, and other biopolymers.

To facilitate the calculation of many conformations, HABAS and STEREOSEARCH ignore medium- and long-range constraints. The program GLOMSA (Güntert et al., 1991a) was developed to use these longer-range distance constraints in determining the accuracy of stereospecific assignments. GLOMSA has been designed with input–output interfaces to the program DIANA. The 3D protein structures are calculated in a first round with the pseudoatom concept of DIANA. In the 3D DIANA structures the labels of stereorelated pairs are consistently defined with the stereochemically correct nomenclature; the absolute configuration is always preserved as the program operates in torsion angles. For those prochiral centers for which HABAS has not been able to uniquely assign stereorelated pairs, pseudoatoms are used as reference points for the relaxed distance constraints. GLOMSA then analyzes an ensemble of “good structures” (Widmer et al., 1993).
It compares the distances of stereorelated pairs to neighboring protons with the imposed distance constraints derived from the observed NOE intensities. It correlates the calculated distances with these NOE intensities, and stereospecifically assigns stereorelated pairs. For example, if short distances to one prochiral center consistently agree with the larger observed NOE intensity, the assignment is taken from the 3D structure, or if large distances are consistently observed with large observed NOE intensities, the assignment is reversed. As GLOMSA uses the ensemble of calculated distance geometry structures, the sampling is not exhaustive as in the grid search methods, but it can detect additional stereospecific assignments imposed by medium- and long-range NOEs and improves the quality of the final structures. An alternative method, “floating chirality” (Weber et al., 1988), uses both observed NOE intensities of prochiral centers in 3D structure calculations without relaxing the distance constraints. No chiral constraints (signed volumes) for carbon atoms with nonstereospecifically assigned substituents are applied in metric matrix distance geometry and simulated annealing calculations. The tetrahedral geometry
is thereby conserved, but the absolute configuration for these centers still has two possibilities. The stereorelated groups are allowed to “float” between pro-R and pro-S configurations, which in practice means that they are randomly assigned to one configuration or the other. To enforce good sampling, protons are usually given a preliminary assignment for the starting structures in the “forward” runs and the opposite assignment in “reverse” runs [atom swapping; (Williamson and Madison, 1990)]. The final structures are then statistically analyzed for their preference of pro-R or pro-S chiralities.

Another method, “r⁻⁶ averaging” (Brünger et al., 1986) or “r⁻⁶ summation” (Nilges, 1993), also includes constraints from nonstereospecifically assigned groups directly in 3D structure calculations. The observed NOE intensities to both spins are added, and a corresponding average distance is calculated. This observed average distance is compared to the actual average of the distances to the stereorelated spins in the 3D structures during the calculation and included as a least-squares target term in the simulated annealing optimization. The difference between r⁻⁶ averaging (Brünger et al., 1986) and r⁻⁶ summation (Nilges, 1993) is the treatment of the multiplicity of a cross peak, i.e., the number of spins contributing to the same cross peak (Fletcher et al., 1996). For cross peaks involving single protons and resolved protons in methylene groups, the distinction is only a question of semantics. For cross peaks involving methyl groups the difference is in practice quite small, but the correct treatment for these cases should be based on r⁻³ averaging (Tropp, 1980).

3.2. Tests, Applications, and Assessment

The precision of current NMR protein structures is comparable to that of high-resolution X-ray structures (MacArthur and Thornton, 1993). The use of automated programs for determining stereospecific assignments has contributed to this improvement, as shown by the following examples. Many structure determinations of proteins still combine manual stereospecific assignments with the pseudoatom approach. Grid search methods, if properly used, are in practice quite robust and can be considered a safe method for automated stereospecific assignment.
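The grid search idea can be illustrated with a minimal sketch. It is not the HABAS, STEREOSEARCH, or AngleSearch implementation: it scans only the side-chain angle χ1, uses only the two Hα–Hβ coupling constants, and assumes a Karplus relation with illustrative coefficients (9.5, −1.6, 1.8) together with the convention that the Hα–Hβ3 dihedral equals χ1 and the Hα–Hβ2 dihedral equals χ1 − 120°. The function names and the optional `chi1_window` argument, which stands in for information that would normally come from NOE distance constraints, are hypothetical.

```python
import math


def karplus(theta_deg, a=9.5, b=-1.6, c=1.8):
    """Karplus relation J = a*cos^2(theta) + b*cos(theta) + c (coefficients assumed)."""
    cos_t = math.cos(math.radians(theta_deg))
    return a * cos_t ** 2 + b * cos_t + c


def grid_search_chi1(j_first, j_second, tol=1.5, step=10, chi1_window=None):
    """Scan chi1 on a grid and test both possible labelings of two beta protons.

    j_first, j_second: observed couplings (Hz) of the two unassigned resonances.
    chi1_window: optional (low, high) range in degrees, a stand-in for NOE data.
    Returns a dict mapping each labeling to the chi1 grid values that fit it.
    """
    labelings = {
        "first=HB2,second=HB3": (j_first, j_second),
        "first=HB3,second=HB2": (j_second, j_first),
    }
    allowed = {name: [] for name in labelings}
    for chi1 in range(-180, 180, step):
        if chi1_window is not None and not (chi1_window[0] <= chi1 <= chi1_window[1]):
            continue
        j_hb2 = karplus(chi1 - 120)  # assumed dihedral Halpha-Hbeta2
        j_hb3 = karplus(chi1)        # assumed dihedral Halpha-Hbeta3
        for name, (obs_hb2, obs_hb3) in labelings.items():
            if abs(j_hb2 - obs_hb2) <= tol and abs(j_hb3 - obs_hb3) <= tol:
                allowed[name].append(chi1)
    return allowed


def unique_labeling(allowed):
    """Return the labeling only if exactly one labeling has consistent angles."""
    consistent = [name for name, angles in allowed.items() if angles]
    return consistent[0] if len(consistent) == 1 else None
```

With couplings of 12.0 and 3.5 Hz, both labelings survive the scan (one near χ1 = −60°, the other near 180°), so the coupling constants alone do not decide the assignment. Restricting χ1 to the window (−120°, 0°) leaves a single labeling, which is the kind of discrimination that distance constraints provide in the real programs.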
STEREOSEARCH was tested with model data for 1414 methylene groups from 20 crystal structures and was able to make about 80% of the stereospecific assignments (67% if only the crystal structure database was used). For experimental NMR data for the C-terminal domain of cellobiohydrolase I (36 residues), the program was able to assign 15 of the 18 nonproline groups. The three exceptions included two residues with multiple χ1 conformations and one where there was no discriminating intraresidue NOE information (Nilges and Brünger, 1991). The HABAS–DIANA–GLOMSA suite was extensively tested with simulated data sets for BPTI (Güntert et al., 1991a) and then on experimental data for the
Antennapedia homeodomain (Güntert et al., 1991b). Some recent examples of stereospecific assignments using the program suite include
• Toxin OSK1 from Orthochirus scrobiculosus scorpion venom (Jaravine et al., 1997).
• Toxin III from the scorpion Leiurus quinquestriatus, where dihedral-angle restraints were derived from NOE connectivities and coupling constants using the HABAS program (Landon et al., 1996).
• SH3 domain of human p56 Lck tyrosine kinase (Hiroaki et al., 1996).
• Ribonuclease T1 (RNase T1, 104 amino acids). Primary multidimensional NMR spectroscopy data were interpreted to yield 1856 assigned NOE intensities, 493 ³J coupling constants, and 62 values of amide proton exchange rates. From these data, 2580 distance bounds, 168 allowed ranges for torsional angles, and stereospecific assignments for 75% of β-methylene protons as well as for 80% of diastereotopic methyl groups were derived (Pfeiffer et al., 1997).
• PEC-60, a protein of the Kazal-type inhibitor family (Liepinsh et al., 1994). HABAS alone was able to assign 8 of 44 methylene groups, and GLOMSA assigned another 4 methylene groups, 9 - and methylene groups, and 3 isopropyl groups.
HABAS can also be used in combination with other distance geometry programs such as DISGEO, as demonstrated by the calculation of the three-dimensional structure of apo calbindin D9k. Input data consisted of 994 NOE distance constraints and 122 dihedral constraints, aided by the stereospecific assignment of the resonances from 21 methylene groups and 7 isopropyl groups of leucine and valine residues. HABAS was used to generate 12 stereospecific methylene assignments (Skelton et al., 1995).

AngleSearch was tested with various synthetic data sets and with ROESY data for 13C-labeled dihydrofolate reductase from L. casei complexed with antifolate drugs (methotrexate, trimethoprim, trimetrexate). Even with only rough estimates of the proton coupling constants and full ROESY data, AngleSearch could stereospecifically assign about 50% of the residues. For more complete assignments, a more complete data set is needed (Polshakov et al., 1995).

The floating chirality method is not yet in widespread use because uncertainty about the correctness of an assignment limits its practical value. Even if a high percentage of calculated structures shows a dominant chirality for a prochiral center, it is not clear whether this is a valid assignment. The most detailed study of this problem showed that prochiral centers do influence each other, and the statistical validation of a stereospecific assignment cannot be judged by a model of independent “floating” prochiral centers (Beckman et al., 1993). Methods for estimating the probability of a correct assignment in these cases have been suggested. The authors,
using a fragment of oxidized cytochrome c (57–79) containing nonideal helices and loops, concluded that assignments are reproducible and reliable only if appropriate statistical precautions are taken and if the system is well enough constrained by the experimental data to avoid compensatory interactions between the floating
prochiral centers during refinement. However, a direct comparison of the stereospecific assignments achieved by J-coupling experiments and by the floating chirality method (Folmer et al., 1997) for the 18-kDa single-stranded DNA binding protein of Pseudomonas bacteriophage Pf3 gave more positive results. A combination of floating chirality and atom swapping could reliably reproduce most of the stereospecific assignments found by the independent experimental J-coupling data. The accuracies of the calculated structures were of almost the same quality as those derived with explicit stereospecific assignments. A combination of floating and swapping is needed to achieve a reliable result.

4. COMBINED AUTOMATED NOESY SPECTRA ASSIGNMENT AND 3D STRUCTURE CALCULATION
While spin system recognition and sequential resonance assignment are important intermediate steps toward a 3D structure, a reliable and accurate 3D NMR structure depends on the number and quality of distance constraints extracted from NOESY spectra. This last step is time consuming and difficult to automate. Several computer-supported tools were developed and combined in a semiautomated way for the 3D structure determination of the FK506 binding protein FKBP complexed to the immunosuppressant ascomycin (Meadows et al., 1994). It was shown that this computer-based strategy can substantially reduce the time for 3D structure determination. Several new computational tools for automated assignments, described in this pioneering study, have been incorporated in later approaches. It was recognized in this study that the major barrier to automatic analysis of NOESY spectra is chemical-shift variations in different spectra. The problem of cross-peak ambiguity in NOESY spectra has been specifically addressed by two completely different procedures: the molecular dynamics calculations with ambiguous restraints (Nilges, 1995), and the self-correcting distance geometry method (Mumenthaler and Braun, 1995).

4.1. Molecular Dynamics Calculations with Ambiguous Restraints

4.1.1. Concepts
Nilges (1995) treated ambiguous NOESY spectral peaks by including all possible assignments within a certain tolerance range as an r⁻⁶-weighted sum. A
minimization procedure based on simulated annealing is used for ab initio structure calculations. The restraint list is defined directly in terms of the proton chemical-shift assignment and the NOE peak table, without assigning NOE cross peaks to proton pairs. More recently, the method has been refined by including elements from the self-correcting distance geometry (SECODG) method, such as selectively discarding obviously wrong constraints in iterative cycles of assignment and 3D structure calculation (Nilges et al., 1997; O'Donoghue et al., 1996). The reader is referred to Chapter 4 in this volume for more details of the method and its applications.
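As a rough numerical illustration of how one ambiguous peak can be turned into a single restraint, the sketch below assumes the commonly used r⁻⁶-summed effective distance over all candidate proton pairs. The function names are hypothetical, and the code is not taken from any structure calculation program.

```python
def effective_distance(candidate_distances):
    """Combine the distances of all candidate proton pairs of one peak.

    Assumes the r^-6-summed form D = (sum_k d_k^-6)^(-1/6), so the shortest
    candidate distance dominates the result.
    """
    if not candidate_distances:
        raise ValueError("at least one candidate distance is required")
    return sum(d ** -6 for d in candidate_distances) ** (-1.0 / 6.0)


def restraint_violation(candidate_distances, upper_bound):
    """Amount (in the same units) by which the effective distance exceeds the bound."""
    return max(0.0, effective_distance(candidate_distances) - upper_bound)
```

Because the effective distance is never longer than the shortest candidate distance, such a restraint is satisfied as soon as any one of the candidate pairs is close in the current structure, which is why no explicit choice among the candidates has to be made beforehand.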
4.2. Self-Correcting Distance Geometry Method

4.2.1. Concepts
A new automatic method that has proved its ability to deal with the ambiguous peak problem is based on self-correcting distance geometry (SECODG) (Mumenthaler and Braun, 1995; Hänggi and Braun, 1994). The method, like the previous ones, starts with a list of constraints derived from obvious peak assignments and from peaks for which only a few assignments are possible. Automated peak picking methods can also be used, as shown in Sec. 4.2.5. The method uses a structure-based filter to detect inconsistent constraints and eliminate them from the list. Three-dimensional structures are calculated with a modified distance geometry approach in torsion angles by using an error-tolerant target function. Unlike previous distance geometry methods, which were designed for consistent data sets, the method can generate accurate structures from sets that contain erroneous constraints.
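Why an error-tolerant target function helps can be seen from a toy comparison. The sketch below is not the published SECODG target function. It assumes a relative violation defined as the excess over the upper bound divided by the bound, and it contrasts an ordinary quadratic penalty with a saturating penalty θ²/(1 + θ²) chosen purely for illustration. All names are hypothetical.

```python
def relative_violation(distance, upper_bound):
    """Excess over the upper bound, scaled by the bound (zero if satisfied)."""
    return max(0.0, (distance - upper_bound) / upper_bound)


def quadratic_penalty(theta):
    """Conventional penalty: grows without limit for large violations."""
    return theta ** 2


def saturating_penalty(theta):
    """Illustrative error-tolerant penalty: levels off below 1 for large violations."""
    return theta ** 2 / (1.0 + theta ** 2)


def target(distances, upper_bounds, penalty):
    """Sum of penalties over all upper-bound constraints."""
    return sum(
        penalty(relative_violation(d, u)) for d, u in zip(distances, upper_bounds)
    )


# One grossly violated (possibly wrong) constraint versus ten slightly violated ones.
one_large = ([15.0], [5.0])
ten_small = ([6.0] * 10, [5.0] * 10)

ratio_quadratic = target(*one_large, quadratic_penalty) / target(*ten_small, quadratic_penalty)
ratio_saturating = target(*one_large, saturating_penalty) / target(*ten_small, saturating_penalty)
```

In this toy case the single large violation outweighs the ten small ones by a factor of about 10 under the quadratic penalty but only by about 2 under the saturating penalty, so one erroneous constraint distorts the optimization far less.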
4.2.2. The SECODG-Based Program Sequence in Practice

The SECODG method is based on the successful distance geometry method for interpreting NMR structures first implemented in the programs DISMAN and DIANA (Güntert et al., 1991a). Three-dimensional macromolecular structures are calculated with the variable-target-function method in torsion angles (Braun and Go, 1985) on the basis of the stereochemical properties of the amino acids (using standard bond lengths and bond angles) and distance and dihedral angle constraints. SECODG is an iterative algorithm that does not rely on a suitable model structure to start the assignment procedure. It selects, by iterative cycles of trial assignments and structure calculations, the assignments for ambiguous NOESY cross peaks which best contribute to a consistent data set. The basic form of the target function is (Hänggi and Braun, 1994)
where the relative violations
are determined by
The actual distance is compared to the upper bound. The violation is calculated if the actual distance exceeds the upper bound; otherwise it is zero. Similar expressions are used for lower bounds. The target function TF in Eq. (6) was chosen to decrease the weight of large violations relative to a group of small violations. Geometric constraints are progressively included in a series of target functions, containing an increasing number of terms during the optimization procedure. This “variable-target-function method” is an efficient strategy to overcome the local minimum problem (Y. Xu et al., 1995; Güntert et al., 1991a).

The program DIAMOD implements the variable target function in torsion-angle space to calculate structures from the distance constraints. There are several separate routines in the package. The NOAH program extracts distance constraints from a structure or structure ensemble and compares the resulting peak list with that used to derive the structures. This “structure-based filter”
has input–output interfaces as outlined in Fig. 7 with DIAMOD. All combinations
of peak assignments can be tested, and the resulting peak list is unbiased. The assignment of ambiguous peaks can be easily traced, and the method can also detect noise peaks. Using simulated and experimental data sets, SECODG can reliably assign about 80% of NOESY cross peaks with an error rate of about 5% (Mumenthaler and Braun, 1995). Figure 8 illustrates the assignment steps: (1) NOAH reads as input the protein sequence, the coupling constants, spin chemical shifts, and the list of NOESY cross peaks. It creates a first set of distance constraints which are based on unambiguous and ambiguous peak assignments. (2) This peak list is used by DIAMOD to calculate a 3D structure bundle. Distance constraints are assigned to these peaks for the first round of calculation with a uniform specified upper limit (5 Å in the examples to be mentioned); this means that the two protons should be no more than 5 Å plus a tolerance distance apart in the final structure. (3) NOAH analyzes
the conformers with the lowest target function values to determine assignments which are consistently violated during the previous calculation cycle. These assignments are eliminated. Those that are consistent with the majority of structures are
considered unambiguous. These new peak lists are used to start the next structure calculation cycle in DIAMOD. Between 25 and 40 calculation cycles are usually needed to reach a saturation level for the number of assignments and a well-defined bundle of structures.
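The filtering in step (3) can be sketched for a single peak as follows. This is a simplified illustration, not the NOAH code: the data layout, the function names, and the default thresholds (other than the 5 Å upper limit quoted above) are assumptions.

```python
def violated_fraction(distances, upper_limit, tolerance):
    """Fraction of conformers in which the distance exceeds upper_limit + tolerance."""
    limit = upper_limit + tolerance
    return sum(d > limit for d in distances) / len(distances)


def filter_peak(candidates, upper_limit=5.0, tolerance=1.0, max_violated=0.5):
    """Classify one peak from the distances of its candidate assignments.

    candidates: dict mapping a candidate assignment label to the list of
    proton-proton distances (in angstroms) found in the best conformers.
    A candidate is dropped when it is violated in more than max_violated
    of the conformers.
    """
    surviving = [
        label
        for label, distances in candidates.items()
        if violated_fraction(distances, upper_limit, tolerance) <= max_violated
    ]
    if len(surviving) == 1:
        status = "unambiguous"
    elif surviving:
        status = "ambiguous"
    else:
        status = "unassigned"
    return status, surviving
```

In an iterative scheme of this kind, `tolerance` and `max_violated` would be tightened from one cycle to the next, so that candidates tolerated early on are eliminated once the structure bundle becomes better defined.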
4.2.3. NOAH, a Structure-Based Spectral Filter

The program NOAH (Fig. 8) uses several filters to assign spectral peaks. Possible cross-peak assignments are derived from the chemical-shift list with a certain tolerance in the chemical shifts. To assign a peak at position (ω1, …, ωn) in an n-dimensional spectrum, the method considers that it could have been generated by any nucleus with a chemical shift in the interval [ωi − Δωi, ωi + Δωi] in each dimension i. The tolerance values Δωi may vary with the type of nucleus and the dimension. Only peaks with fewer than a user-specified maximum number of possible assignments (usually between 2 and 4) are considered in the first round of structure calculations.
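The enumeration of possible assignments within a chemical-shift tolerance can be sketched as below. The atom labels, shift values, and function names are invented for illustration, and treating the maximum number of possibilities as an inclusive limit is an assumption.

```python
from itertools import product


def candidate_assignments(peak, shifts, tolerances):
    """List all atom combinations whose shifts match the peak in every dimension.

    peak: tuple of peak positions (ppm), one per spectral dimension.
    shifts: dict mapping atom labels to assigned chemical shifts (ppm).
    tolerances: tuple of allowed deviations (ppm), one per dimension.
    """
    per_dimension = [
        [atom for atom, shift in shifts.items() if abs(shift - position) <= tol]
        for position, tol in zip(peak, tolerances)
    ]
    # Skip combinations that use the same atom twice (diagonal peaks).
    return [combo for combo in product(*per_dimension) if len(set(combo)) == len(combo)]


def usable_in_first_round(candidates, max_assignments=4):
    """A peak enters the first round only if it has few enough possibilities."""
    return 0 < len(candidates) <= max_assignments


# Hypothetical shift list and one 2D NOESY cross peak.
shifts = {"A.HN": 8.25, "B.HN": 8.26, "C.HA": 4.50, "D.HA": 4.90}
candidates = candidate_assignments((8.255, 4.505), shifts, (0.01, 0.01))
```

Here the two amide protons are degenerate within the tolerance, so the peak has two possible assignments and remains ambiguous until the structure-based filter removes one of them.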
A peak with many possibilities may be used later, if the converging structures limit
the possible assignments to the user-specified maximum. The method has been tested with 2D and 3D data sets as described below. After the initial round of assignment, the 10 conformers generated by DIAMOD with the lowest target function values are used to check the peak list. An additional variable, NV(L), counts the Number of assignments of a specific peak that are Violated by less than L% of the structures. Peaks included in the unambiguous assignment list (AL) from the previous round whose distance constraints are consistent with more than a user-defined percentage of the conformers remain on that list. Those with DCs violated in many of the conformers are reclassified as
unassigned. An initially ambiguous peak can be moved to the unambiguous AL only if its assignment in the previous round is not violated in any conformer and all other possible assignments would be violated in more than a specified percentage of the conformers. In each cycle, the tolerance distance (the difference allowed between the maximal distance for a proton pair interaction and the actual distance in the conformer bundle) is reduced, and the criterion for accepting a peak assignment is made more stringent (i.e., the percentage of conformers in which a violation can occur is lower for rejection of an assignment). Finally, NOAH calculates a “reliability distance” (RD) for each assignment, which is best understood as the average distance in Å that a nucleus would need to be moved in the ensemble of structures in order to fulfill the distance constraint arising from an alternative assignment. A high RD means that an assignment is reliable, while an RD of 0 means that several possible assignments can be made. The maximum RD for incorrect constraints has
been shown to be less than or equal to the RMSD of the calculated structures, while
an RD greater than 3 Å indicates a correct constraint (Mumenthaler and Braun, 1995).

4.2.4. Tests and Applications
Several tests have demonstrated that the SECODG method can deal with real data and is significantly faster than manual interactive spectral interpretation
methods. The first test was to compare the results of automatic structure calculation, using a NOAH–DIANA suite, of NOESY spectra that had been analyzed earlier
by manual methods for six proteins ranging in size from 40 to 135 amino acids (Mumenthaler et al., 1997). The automated method assigned 70%–80% of the NOESY cross peaks, and the three-dimensional structures were similar in quality
to the manually determined structures. After calculation of structures, the final peak lists were compared to those assigned manually (Table 1). NOAH capably interpreted 2D spectra for protein preparations in water or D2O and 3D spectra with 13C- and 15N-labeled nuclei. The automated method assigned 70%–80% of the NOESY cross peaks, on average only 10% fewer than the number of manual assignments. A very small percentage (between 0.8% and 2.4%)
of the peaks assigned by the automated method differed from manual assignment. The resulting structures had very similar RMSD values to those based on manual assignments (last column of Table 1). The method is, however, calculation intensive,
especially for the larger proteins. For P14A, although the calculation schedule was reduced by a factor of 2, 60 h CPU time using six processors on a Cray J90 were required. Subsequent automated energy minimization with OPAL removed nearly all remaining constraint violations.

The best indication of the ability of the method to calculate structures automatically is the killer toxin from Williopsis mrakii (WmKT). Here, the number of peaks detected by the SECODG-based automatic method was higher and the resulting structure bundle’s mean RMSD was lower than those resulting from the manual–interactive approach. The SECODG method also detected incorrect constraints in manually derived peak lists.

4.2.5. Second, Real-World Test of NOAH–DIAMOD: NMR Structure of Crambin S22, I25

The first experimental test case for the NOAH–DIAMOD program suite was
the determination by NMR of the structure of crambin (S22, I25) (Xu et al., 1997).
Crambin, isolated from the seeds of Crambe abyssinica, is a 46-residue protein with high solubility in ethanol and organic solvents. This unusual solubility, as well as crambin’s homology to membrane-active plant toxins (purothionins and others) (Stec et al., 1995), has excited much interest in its structure–function relationship. The recent expression of crambin as a fusion protein in E. coli (Lobb et al., 1996) means that many mutants of crambin should soon be available for structural analysis, suggesting an immediate use for a good automated method for data assignment and interpretation. Seed crambin is a mixture of two nearly identical proteins, for one of which (Pro22, Leu25) the structure has been determined by both NMR and X-ray crystallography (Bonvin et al., 1994, 1993; Teeter et al., 1993; Teeter and Hendrickson, 1979). The global fold of the second form, which substitutes Ser and Ile at the two positions, should be similar to the first. The structure of this second form of crambin was determined automatically by using previously unassigned NMR data as input for NOAH–DIAMOD, and the resultant structure bundle was compared directly to
the X-ray structure of crambin P22, L25. The methodology and results were as follows.

4.2.5.1. Data Analysis and Initial Peak Assignments. A NOESY spectrum at 200 ms was obtained using the Felix E-Z 2D transform protocol. Felix automatic peak picking was used to obtain all peaks, and Felix volume integration and optimization with a Lorentzian line-shape algorithm were used to obtain cross-peak intensities. Intensities for 540 cross peaks, combined with 92 very weak peaks (intensities up to smaller than the strongest peaks), were then used by an
in-house FORTRAN program to calculate the average volumes and generate an output peak file with the chemical-shift data in a format suitable for the NOAH program.

4.2.5.2. Deriving the Initial Distance Constraints. Despite crambin’s small size, the chemical shifts of 15 protons could not be assigned manually. This hindered completely automatic NOESY assignment with NOAH–DIAMOD, as the initial calculation cycles demonstrated. To initiate the calculations, 157 cross peaks from a 2D NOESY spectrum were manually assigned and the resulting distance constraints were regarded as invariant. The selected peaks allowed sequential, intraresidue, and assignments. Several long-range but short through-space backbone cross peaks were also assigned manually to obtain 79 interresidue and 80 intraresidue fixed assignments. The input thus consisted of 612 NOE cross-peak intensities, of which 157 were assigned manually, and a proton list with data for 204 proton chemical shifts. The upper distance constraints were combined with three pairs of disulfide bridge constraints. The angular constraints from J-coupling constants included 33 φ-angle constraints and 7 χ1-angle constraints. NOAH can directly read the coupling constants and then translate them into angular constraints. For each DIAMOD cycle, 40 structures were calculated with upper distance constraints, and the 10 with the smallest target function values were analyzed by NOAH. For convergence, 60 NOAH–DIAMOD cycles were needed.

4.2.5.3. Comparison of the Calculated Structures with the X-ray Structure of Crambin P22, L25. Figure 9 shows a superposition of the X-ray structure for
crambin P22, L25 (thick line) on the structure bundle of the 10 best NOAH–DIAMOD structures. The SECODG method, with some manual intervention required due to the incompleteness of the data set, is clearly working, as most of
the differences in the new structures probably reflect the two amino acid differences in the primary structures. The time required for determining the structure was reduced from months, required for a completely manual assignment, to a few weeks.
5. FUTURE IMPROVEMENTS AND OUTLOOK

There has been significant progress in automating the determination of three-dimensional protein structures from raw NMR data in recent years. If high-quality input data are available, most of the time-consuming and tedious NMR spectral analysis can be done by computer programs. Assuming that heteronuclear triple-resonance data are available, automated complete sequential resonance assignment can be done with high reliability and minimal human intervention using methods such as RID or AUTOASSIGN. A combination of self-correcting distance geometry (SECODG) and structure-based spectral filters can assign most of the NOESY cross peaks and generate a high-quality three-dimensional protein structure bundle from a nearly complete chemical-shift list.

None of the methods we have described can generate a structure in a completely
automatic fashion starting from spectral data. A good automated assignment program should be applicable to protein spectra of different types and of varying quality, able to analyze real data with a minimum of human intervention, and, most
important, be user friendly and computationally efficient. Such a program must be able to perform the analysis and 3D structure calculation in a much shorter time
than an interactive computer-supported approach. This does not mean that a sophisticated software package will ever replace the need for high-quality data. The best automated procedure cannot transform poor spectra into high-quality structures. A robust method should be able to recognize noise peaks and spectral artifacts, but it cannot create missing spectral information. To further develop completely automated data analysis methods, a library of high-quality structures and the data used to derive them should be established for
use in comparing the different methods in an objective way. The Protein Data Bank and BioMagResBank are important clearing sites for the establishment of generally accepted benchmarks. In the future, in addition to distance constraints or 3D atomic
coordinates, peak lists and other spectral information used for a 3D structure determination should be deposited in these central data banks. These could then be used as standard validation tests for methods as they are developed.
An integrated program suite for automatic analysis of NMR spectra will have building blocks based on the methods described in this chapter. However, with their continuous evolution and increased computational capabilities, methods which are currently not sufficiently optimized to be feasible in practice might become useful. The development of pulse sequences specifically designed for automatic interpretation, as done for the RID method, might increase the usefulness of real-space
assignment, neural networks, or fuzzy graph methods. These considerations suggest that spectroscopists and theoreticians should continue to work in close collaboration to improve all aspects of the NMR method for determining the structures of biological macromolecules.
ACKNOWLEDGMENTS. This work was supported by grants from NSF (DBI-9632326 and DBI-9714937), DOE (DE-FG03-96ER62267) and the John Sealy & Smith Foundation.

REFERENCES

Bartels, C., Billeter, M., Güntert, P., and Wüthrich, K., 1996, J. Biomol. NMR 7:207.
Bartels, C., Güntert, P., Billeter, M., and Wüthrich, K., 1997, J. Comput. Chem. 18:139.
Bartik, K., and Redfield, C., 1993, J. Biomol. NMR 3:415.
Beckman, R. A., Litwin, S., and Wand, A. J., 1993, J. Biomol. NMR 3:675.
Bernstein, R., Cieslar, C., Ross, A., Oschkinat, H., Freund, J., and Holak, T. A., 1993, J. Biomol. NMR 3:245.
Bonvin, A. M., Boelens, R., and Kaptein, R., 1994, Biopolymers 34:39.
Bonvin, A. M., Rullmann, J. A., Lamerichs, R. M., Boelens, R., and Kaptein, R., 1993, Proteins 15:385.
Braun, W., 1987, Q. Rev. Biophys. 19:115.
Braun, W., 1991, in Computational Aspects of the Study of Biological Macromolecules by NMR (J. C. Hoch, F. M. Poulsen, and C. Redfield, eds.), Plenum Press, New York, p. 199.
Braun, W., and Go, N., 1985, J. Mol. Biol. 186:611.
Braun, W., Kallen, W., Mikol, V., Walkinshaw, M. D., and Wüthrich, K., 1995, FASEB J. 9:63.
Braun, W., Wagner, G., Worgotter, E., Vasak, M., Kagi, J. H. R., and Wüthrich, K., 1986, J. Mol. Biol. 187:125.
Braun, W., Wider, G., Lee, K. H., and Wüthrich, K., 1983, J. Mol. Biol. 169:921.
Bron, C., and Kerbosch, J., 1973, Comm. ACM 16:575.
Brown, L. R., Braun, W., Anil, K., and Wüthrich, K., 1982, Biophys. J. 37:319.
Brünger, A. T., Clore, G. M., Gronenborn, A. M., and Karplus, M., 1986, Proc. Natl. Acad. Sci. U.S.A. 83:3801.
Christofides, N., Mingozzi, A., Toth, P., and Sandi, S., 1979, Combinatorial Optimization, Wiley-Interscience, London.
Cieslar, C., Clore, G. M., and Gronenborn, A. M., 1988, J. Magn. Reson. 80:119.
Clore, G. M., Appella, E., Yamada, M., Matsushima, K., and Gronenborn, A. M., 1990, Biochemistry 29:1989.
D’Ursi, A., Oschkinat, H., Cieslar, C., Picone, D., D’Alessio, G., Amodeo, P., and Temussi, P., 1995, Eur. J. Biochem. 229:494.
De Vlieg, J., Boelens, R., Scheek, R. M., van Gunsteren, W. F., Berendsen, H. J. C., Kaptein, R., and Thomason, J., 1988, Proteins 3:209.
Driscoll, P. C., Gronenborn, A. M., and Clore, G. M., 1989, FEBS Lett. 243:223.
Eccles, C., Güntert, P., Billeter, M., and Wüthrich, K., 1991, J. Biomol. NMR 1:111.
Englander, S. W., and Wand, A. J., 1987, Biochemistry 26:5953.
Fletcher, C. M., Jones, D. N. M., Diamond, R., and Neuhaus, D., 1996, J. Biomol. NMR 8:292.
Folmer, R. H. A., Hilbers, C. W., Konings, R. N. H., and Nilges, M., 1997, J. Biomol. NMR 9:245.
Friedrichs, M., Mueller, L., and Wittekind, M., 1994, J. Biomol. NMR 4:703.
Garrett, D. S., Seok, Y. J., Liao, D. I., Peterkofsky, A., Gronenborn, A. M., and Clore, G. M., 1997, Biochemistry 36:2517.
Grindley, H. M., Artymiuk, P. J., Rice, D. W., and Willett, P., 1993, J. Mol. Biol. 229:707.
Gronenborn, A. M., and Clore, G. M., 1994, Proteins: Struct. Funct. Genet. 19:273.
Gross, K. H., and Kalbitzer, H. R., 1988, J. Magn. Reson. 76:87.
Grzesiek, S., and Bax, A., 1993, J. Biomol. NMR 3:185.
Güntert, P., Braun, W., Billeter, M., and Wüthrich, K., 1989, J. Am. Chem. Soc. 111:3997.
Güntert, P., Braun, W., and Wüthrich, K., 1991a, J. Mol. Biol. 217:517.
Güntert, P., Qian, Y. Q., Otting, G., Muller, M., Gehring, W., and Wüthrich, K., 1991b, J. Mol. Biol. 217:531.
Hänggi, G., and Braun, W., 1994, FEBS Lett. 344:147.
Harary, F., 1972, Graph Theory, Addison-Wesley, Reading, MA.
Hare, B. J., and Prestegard, J. H., 1994, J. Biomol. NMR 4:35.
Havel, T. F., 1991, Prog. Biophys. Mol. Biol. 56:43.
Havel, T. F., Kuntz, I. D., and Crippen, G. M., 1983, Bull. Math. Biol. 45:665.
Hendrickson, W., and Wüthrich, K., 1996, Macromolecular Structures, Current Biology Publications, London.
Hiroaki, H., Klaus, W., and Senn, H., 1996, J. Biomol. NMR 8:105.
Hodsdon, M. E., Ponder, J. W., and Cistola, D. P., 1996, J. Mol. Biol. 264:585.
Holak, T. A., Nilges, M., and Oschkinat, H., 1989, FEBS Lett. 242:649.
Hu, J. S., and Bax, A., 1997, J. Biomol. NMR 9:323.
Hyberts, S. G., Marki, W., and Wagner, G., 1987, Eur. J. Biochem. 164:625.
Jaravine, V. A., Nolde, D. E., Reibarkh, M. J., Korolkova, Y. V., Kozlov, S. A., Pluzhnikov, K. A., Grishin, E. V., and Arseniev, A. S., 1997, Biochemistry 36:1223.
Kaptein, R., Zuiderweg, E. R. P., Scheek, R. M., Boelens, R. M., and van Gunsteren, W. F., 1985, J. Mol. Biol. 182:179.
Kay, L. E., Clore, G. M., Bax, A., and Gronenborn, A. M., 1990, Science 249:411.
Kleywegt, G. J., Boelens, R., Cox, M., Llinas, M., and Kaptein, R., 1991, J. Biomol. NMR 1:23.
Kleywegt, G. J., Boelens, R., and Kaptein, R., 1990, J. Magn. Reson. 88:601.
Kleywegt, G. J., Lamerichs, R., Boelens, R., and Kaptein, R., 1989, J. Magn. Reson. 85:196.
Kline, A. D., Braun, W., and Wüthrich, K., 1986, J. Mol. Biol. 189:377.
Kline, A. D., Braun, W., and Wüthrich, K., 1988, J. Mol. Biol. 204:675.
Koradi, R., Billeter, M., and Wüthrich, K., 1996, J. Mol. Graphics 14:51.
Kraulis, P. J., 1994, J. Mol. Biol. 243:696.
Landon, C., Cornet, B., Bonmatin, J. M., Kopeyan, C., Rochat, H., Vovelle, F., and Ptak, M., 1996, Eur. J. Biochem. 236:395.
Lau, H., 1989, Algorithms on Graphs, TAB Books, Blue Ridge Summit, PA.
Liepinsh, E., Berndt, K. D., Sillard, R., Mutt, V., and Otting, G., 1994, J. Mol. Biol. 239:137.
Lobb, L., Stec, B., Kantrowitz, E. K., Yamano, A., Stojanoff, V., Markman, O., and Teeter, M. M., 1996, Protein Eng. 9:1233.
Lukin, J. A., Gove, A. P., Talukdar, S. N., and Ho, C., 1997, J. Biomol. NMR 9:151.
MacArthur, M. W., and Thornton, J. M., 1993, Proteins 17:232.
Markley, J. L., Westler, W. M., Chan, T. M., Kojiro, C. L., and Ulrich, E. L., 1984, Two-dimensional NMR approaches to the study of protein structure and function, Proc. 74th Ann. Meeting Am. Soc. Biol. Chem., Vol. 43, p. 2648.
McGregor, J. J., 1982, Software-Pract. Exp. 12:23.
McIntosh, L. P., Brun, E., and Kay, L. E., 1997, J. Biomol. NMR 9:306.
Meadows, R. P., Olejniczak, E. T., and Fesik, S. W., 1994, J. Biomol. NMR 4:79.
Meyer, B., Hansen, T., Nute, D., Albersheim, P., Darvill, A., York, W., and Sellers, J., 1991, Science 251:542.
Michalewicz, Z., 1995, Genetic Algorithms + Data Structures = Evolution Programs, Springer, Berlin. Morelle, N., Brutscher, B., Simorre, J. P., and Marion, D., 1995, J. Biomol. NMR 5:154. Mumenthaler, C., and Braun, W., 1995, J. Mol. Biol. 254:465. Mumenthaler, C., Güntert, P., Braun, W., and Wüthrich, K., 1997, J. Biomol. NMR 10:351. Neidig, K. P., Geyer, M., Görler, A., Antz, C., Saffrich, R., Beneicke, W., and Kalbitzer, H. R., 1995, J. Biomol. NMR 6:255. Nilges, M., 1993, Proteins 17:297. Nilges, M., 1995, J. Mol. Biol. 245:645. Nilges, M., 1996, Curr. Opin. Struct. Biol. 6:617. Nilges, M., and Brünger, A. T., 1991, J. Cell. Biochem. 15G:422. Nilges, M., Clore, G. M., and Gronenborn, A. M., 1990, Biopolymers 29:813.
Nilges, M., Macias, M. J., O'Donoghue, S. I., and Oschkinat, H., 1997, J. Mol. Biol. 269:408. O'Donoghue, S. I., King, G. F., and Nilges, M., 1996, J. Biomol. NMR 8:193. Olson, J. B., and Markley, J. L., 1994, J. Biomol. NMR 4:385. Oschkinat, H., and Croft, D., 1994, Meth. Enzymol. 239:308. Oschkinat, H., Holak, T. A., and Cieslar, C., 1991, Biopolymers 31:699.
Oshiro, C. M., and Kuntz, I. D., 1993, Biopolymers 33:107.
Pfändler, P., and Bodenhausen, G., 1988, J. Magn. Reson. 79:99. Pfändler, P., and Bodenhausen, G., 1990, J. Magn. Reson. 87:26.
Pfändler, P., Bodenhausen, G., Meier, B. U., and Ernst, R. R., 1985, Anal. Chem. 57:2510. Pfeiffer, S., Karimi-Nejad, Y, and Rüterjans, H., 1997, J. Mol. Biol. 266:400. Polshakov, V. I., Frenkiel, T. A., Birdsall, B., Soteriou, A., and Feeney, J., 1995, J. Magn. Reson. Ser. B 108:31. Richarz, R., and Wüthrich, K., 1978, Biopolymers 17:2133. Rumelhart, D. E., Hinton, G. E., and Williams, R. J., 1986, Nature 323:533. Skelton, N. J., Kordel, J., and Chazin, W. J., 1995, J. Mol. Biol. 249:441. Stec, B., Rao, U., and Teeter, M. M., 1995, Crystallogr. Sec. D, Biol. Crystallogr. 51:914. Teeter, M., and Hendrickson, W., 1979, J. Mol. Biol. 127:219. Teeter, M. M., Roe, S. M., and Heo, N. H., 1993, J. Mol. Biol. 230:292. Thomsen, J. U., and Meyer, B., 1989, J. Magn. Reson. 84:212.
Tropp, J., 1980, J. Phys. Chem. 72:6035. van De Ven, F. J. M., 1990, J. Magn. Reson. 86:633. van Geerestein-Ujah, E., Mariani, M., Vis, H., Boelens, R., and Kaptein, R., 1996, Biopolymers 39:691. van Geerestein-Ujah, E., Slijper, M., Boelens, R., and Kaptein, R., 1995, J. Biomol. NMR 6:67. Wagner, G., 1993, J. Biomol. NMR 3:375. Weber, P. L., Morrison, R., and Hare, D., 1988, J. Mol. Biol. 204:483. Wehrens, R., Lucasius, C., Buydens, L., and Kateman, G., 1993, J. Chem. Inf. Comput. Sci. 33:245. Widmer, H., Widmer, A., and Braun, W., 1993, J. Biomol. NMR 3:307. Williamson, M. P., and Asakura, T., 1992, FEBS Lett. 302:185. Williamson. M. P., Havel, T. F., and Wüthrich, K., 1985, J. Mol. Biol. 182:295.
Williamson, M. P., and Madison, V. S., 1990, Biochemistry 29:2895. Wishart, D. S., Bigam, C. G., Holm, A., Hodges, R. S., and Sykes, B. D., 1995, J. Biomol. NMR 5:67. Wishart, D. S., and Sykes, B. D., 1994a, J. Biomol. NMR 4:171.
Wishart, D. S., and Sykes, B. D., 1994b, Meth. Enzymol. 239:363. Wishart, D. S., Sykes, B. D., and Richards, F. M., 1992, Biochemistry 31:1647. Wishart, D. S., Sykes, B. D., and Richards, F. M., 1991, J. Mol. Biol. 222:311. Wüthrich, K., 1995, NMR in Structural Biology. A Collection of Papers by K. Wüthrich, World Scientific
Series, Vol. 5. Wüthrich, K., Billeter, M., and Braun, W., 1983, J. Mol. Biol. 169:949. Wüthrich, K., Billeter, M., and Braun, W., 1984, J. Mol. Biol. 180:715.
Wüthrich, K., Wider, G., Wagner, G., and Braun, W., 1982, J. Mol. Biol. 155:311. Xu, J., Gray, B. N., and Sanctuary, B. C., 1993a, J. Chem. Inf. Comput. Sci. 33:475. Xu, J., Straus, S. K., and Sanctuary, B. C., 1993b, J. Chem. Inf. Comput. Sci. 33:668. Xu, J., Weber, P. L., and Borer, P. N., 1995, J. Biomol. NMR 5:183. Xu, Y., Krishna, N. R., and Sugar, I. P., 1995, J. Magn. Reson. 107:201. Xu, Y., Wu, J., Gorenstein, D., and Braun, W., 1997, Abstracts 38th Experimental Nuclear Magnetic Resonance Conference, Orlando, FL. Yu, J., Simplaceanu, V., Tjandra, N. L., Cottam, P. E, Lukin, J. A., and Ho, C., 1997, J. Biomol. NMR 9:167. Zimmerman, D., Kulikowski, C. A., Huang, Y., Feng, W., Tashiro, M., Shimotakahara, S., Chien, C., Powers, R., and Montelione, G. T., 1997, J. Mol. Biol. 269:592. Zimmerman, D. E., Kulikowski, C. A., and Montelione, G. T., 1993, ISMB 1:447. Zimmerman, D., Kulikowski, C., Wang, L., Lyons, B., and Montelione, G. T., 1994, J. Biomol. NMR 4:241. Zimmerman, D. E., and Montelione, G. T., 1995, Curr. Opin. Struct. Biol. 5:664.
3
NMR Pulse Sequences and Computational Approaches for Automated Analysis of Sequence-Specific Backbone Resonance Assignments of Proteins
Gaetano T. Montelione, Carlos B. Rios, G. V. T. Swapna, and Diane E. Zimmerman

Gaetano T. Montelione, Carlos B. Rios, G. V. T. Swapna, and Diane E. Zimmerman • Center for Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, NJ 08854. Biological Magnetic Resonance, Volume 17: Structure Computation and Dynamics in Protein NMR, edited by Krishna and Berliner. Kluwer Academic / Plenum Publishers, New York, 1999.

1. INTRODUCTION

Resonance assignments form the basis for analysis of protein structure and dynamics by NMR (Wüthrich, 1986), and their determination represents a primary bottleneck in protein solution structure analysis. In many cases, the sequence-specific assignment of backbone resonances is sufficient to allow immediate interpretation of chemical-shift, NOESY, and scalar coupling data in terms of the protein’s secondary structure and chain fold. Complete assignments of backbone and side-chain resonances provide the basis for analysis of conformational constraints and the generation of high-resolution protein structures. The introduction of multidimensional triple-resonance NMR (Montelione and Wagner, 1989, 1990; Ikura et al., 1990; Kay et al., 1990) has dramatically improved the speed and reliability of the protein assignment process. Interpretation of these triple-resonance data is greatly facilitated by computer-assisted analysis (Zimmerman et al., 1993, 1994, 1997; Friedrichs et al., 1994; Hare and Prestegard, 1994; Meadows et al., 1994; Olson and Markley, 1994; Morelle et al., 1995; Zimmerman and Montelione, 1995; Bartels et al., 1996). Several research groups have attempted to automate the process of determining protein resonance assignments using 2D or 3D ¹⁵N-edited NOESY, COSY, and TOCSY data [for a recent review see Zimmerman and Montelione (1995)]. In this chapter we focus on progress in our own laboratory in triple-resonance data
collection and processing, and our progress in automating the process of analyzing resonance assignments from these data. By providing an extensive set of relatively conformation-independent intraresidue and sequential connectivity information (Montelione and Wagner, 1989, 1990; Ikura et al., 1990; Kay et al., 1990; Grzesiek and Bax, 1993; Clowes et al., 1995) triple-resonance NMR data sets are particularly amenable to automated analysis. Triple-resonance experiments can be viewed as providing selective information about intraresidue and/or sequential interactions already present in the multidimensional NOESY spectrum. Accordingly, all of the systems which have been implemented to date for the interpretation of triple-resonance spectra have essentially followed the same classical strategy originally developed by Wüthrich and co-workers (Billeter et al., 1982; Wagner and Wüthrich, 1982; Wüthrich, 1986) for analysis of resonance assignments from homonuclear NOESY and COSY, modified to include the unique sequential connectivity information available from triple-resonance data.
2. SYSTEMS FOR AUTOMATED ANALYSIS OF RESONANCE ASSIGNMENTS FROM TRIPLE-RESONANCE NMR SPECTRA

Several systems have been described in the literature that can provide automated analysis of triple-resonance spectra. All of the programs follow a common general process, though details regarding the kinds of data used as input and specific issues of implementation differ from one program to another. As a usual starting point, a high-resolution root or source spectrum (e.g., 2D HSQC or 3D HNCO) is used to identify the backbone amide N–H resonances of most residues. Each cross peak in that spectrum is initially interpreted as the root of an individual spin system, and the remaining spectra are then examined to identify additional intraresidue and sequential cross peaks whose amide resonances fall within some specified tolerance of that root. In general, the HCNH-type “intra” experiments detect both intraresidue and sequential correlations because the one-bond and two-bond N–Cα scalar coupling constants have similar values (Montelione and Wagner, 1989, 1990). HC(CO)NH-type experiments (Ikura et al., 1990; Kay et al., 1990; Boucher et al., 1992), which select for magnetization pathways through the carbonyl carbon, generally provide exclusively sequential connectivity information. For this reason, it is common practice to first identify spin systems using HC(CO)NH-type data, and then use this information to distinguish intraresidue from sequential correlations in HCNH-type spectra. In all implementations to date, the nuclei of each “spin system” i minimally include intraresidue backbone resonances together with at least one resonance of the preceding amino acid residue. Additional intraresidue and sequential chemical-shift information is also usually used. Once these spin systems have been identified and collated, most (though not all) implementations attempt to give each spin system a classification or measure of relative merit regarding possible amino acid types to which it can be assigned. With this information in hand, the search for logically consistent and “optimal” sequential assignments proceeds by establishing matches between the intraresidue resonances of spin system i and the sequential resonances of spin system j.

A suite of programs has been developed at Abbott Laboratories (Meadows et al., 1994) to automate the entire process of determining resonance assignments, NOESY analysis, structure generation, and reiterative structure refinement. The input data used for spin system classification and resonance assignments are the amino acid sequence and peak-picked 4D HNCH, 4D HN(CO)CH, 4D HCC(CO)NH–TOCSY, and 2D (H,N)–HSQC spectra. The resulting spin systems include the backbone amide and aliphatic resonances of residue i together with the backbone and side-chain aliphatic resonances of residue i−1. The type of the (preceding) residue is classified by matching the aliphatic proton and carbon-13 chemical shifts in 4D HCC(CO)NH–TOCSY spectra to residue-specific masks designating the common shift ranges of each amino acid type, resulting for each spin system in a list of possible amino acid types with associated scores. Next, all possible pairs of sequence-adjacent spin systems are identified by matching the shared aliphatic proton and carbon dimensions of the 4D HN(CO)CH and HNCH data. Each pair of spin systems that is uniquely identified in this way as adjacent in the sequence defines a dipeptide. Once all such dipeptides have been identified, the system extends these into longer fragments by matching the intraresidue shifts of the C-terminal spin system of one dipeptide to the sequential shifts of the N-terminal spin system of another. This process continues until all possible tri- to hexapeptide fragments are generated. These fragments are then aligned to the amino acid sequence, longest fragments first. This process is repeated until only gaps remain, which are resolved by fitting progressively smaller fragments to those parts of the sequence.
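The first step shared by the programs in this section, attaching cross peaks to amide roots within a match tolerance, can be illustrated with a minimal Python sketch. The `Root` class, the `collate` function, the peak layout `(H, N, third-dimension shift)`, and the tolerance values are all hypothetical choices made for illustration; none of them is taken from the programs discussed here.

```python
from dataclasses import dataclass, field


@dataclass
class Root:
    """Amide root from the reference spectrum (hypothetical container)."""
    h: float  # amide proton shift, ppm
    n: float  # amide nitrogen shift, ppm
    peaks: list = field(default_factory=list)


def collate(roots, peaks, tol_h=0.02, tol_n=0.2):
    """Attach each (h, n, other) peak to every root within tolerance.

    Returns the peaks that matched more than one root, so that degenerate
    amide shifts can be examined separately. Tolerances are illustrative.
    """
    ambiguous = []
    for peak in peaks:
        h, n, _ = peak
        hits = [r for r in roots
                if abs(r.h - h) <= tol_h and abs(r.n - n) <= tol_n]
        for r in hits:
            r.peaks.append(peak)
        if len(hits) > 1:
            ambiguous.append(peak)
    return ambiguous


# Toy data: two roots and three peaks, one of which matches no root.
roots = [Root(8.25, 120.1), Root(7.90, 115.4)]
peaks = [(8.26, 120.0, 56.3), (7.90, 115.5, 61.0), (9.50, 130.0, 45.0)]
ambiguous = collate(roots, peaks)
```

Tighter tolerances reduce the number of ambiguous peaks but risk missing genuine matches, which is why the quality of frequency matching between spectra matters so much in practice.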
A set of macros (for use within the commercial software package FELIX) developed by the Bristol–Myers Squibb NMR group (Friedrichs et al., 1994) constitutes another robust and highly reliable assignment engine. Though largely automatic, the system requires some human assistance in collating spin systems and classifying atom types (Friedrichs et al., 1994). The input consists of six heteronuclear three- and four-dimensional spectra: 3D HNCACB, 3D CBCA(CO)NH, 4D HNCAHA, 4D HN(CO)CAHA, 3D HBHA(CO)NH, and 3D HNHA-Gly (an experiment designed to selectively edit for glycine spin systems). In stage 1, peaks sharing common resonances in the four spectra are grouped into residue information data structures (RIDs), and each of the observed resonances is classified according to atom type. Like the CA- and CO-ladders described for AUTOASSIGN (below), each RID includes the backbone amide resonance frequencies of residue i along with aliphatic resonance frequencies for both residues i and i−1. Next, a stage 2 macro establishes tentative links between pairs of RIDs, with the associated scores reflecting the number and quality of chemical-shift matches. Spin systems identified as glycines are handled specially. The RID strings are then compared to the amino acid sequence for “goodness of fit” by stage 3 macros, which establish sequence-specific assignments and links. The goodness-of-fit metric is a measure of how well the observed aliphatic ¹³C resonances of a string of RIDs match the expected upper- and lower-bound chemical-shift values for each amino acid in a polypeptide segment. A best-first strategy for establishing assignments and links is then used to map stretches of linked spin systems to the amino acid sequence. A second stage 3 macro then completes the assignments by filling in the remaining gaps in the protein sequence. Finally, aliphatic proton resonances are assigned automatically using HBHA(CO)NH data.
Connectivity tracing assignment tools (CONTRAST) (Olson and Markley, 1994) is a set of C macros designed to assist in automated analysis of resonance assignments. CONTRAST is designed to establish the sequential order of amino acid spin systems, but does not attempt to classify spin system types or automatically match these to the amino acid sequence. Thus the final mapping of linked spin systems to specific sites in the amino acid sequence is left to the user in an interactive mode. The system input includes peak-picked files from 3D HNCO, HNCA, HN(CO)CA, TOCSY–HMQC, HCACO, and HCA(CO)NH spectra. Spin system roots are defined by HNCO and HNCA spectra and then used to identify sequential peaks in 3D HN(CO)CA spectra and intraresidue peaks in 3D HNCA and TOCSY–HMQC spectra. The next step extends these partial spin systems (called “fragments”) by matching the designated intraresidue resonances to cross peaks in the HCACO and HCA(CO)NH spectra, thus also identifying the resonance frequencies of the intraresidue carbonyl carbon and the amide nitrogen of the following residue (i+1). As a result of these extensions, each fragment has chemical-shift information spanning three sequential residues (i−1, i, and i+1), and each fragment has two sets of as many as three backbone resonances that can be matched between fragments to establish sequential links.

The ALPS system (Morelle et al., 1995) is unique in that “reduced-dimensionality” 2D triple-resonance experiments (Szyperski et al., 1993; Simorre et al., 1994; Mittard et al., 1995) are used to achieve higher resolution and reduce the amount of time required to collect data. In general, higher resolution allows tighter match tolerances, improving the efficiency of sequential matching. The input data include a peak-picked 2D (H,N)–HSQC spectrum and the following reduced-dimensionality 2D triple-resonance experiments: HNCO, HN(CO)CA, H(N)COCA, HN(COCA)H, HNCA, HN(CA)CO, and HN(CA)H. The HSQC data are first used to deconvolute the multiple-quantum doublets of the reduced-dimensionality spectra. Once this has been accomplished, the resulting data can be treated the same as the corresponding 3D spectra. Using the HNCO spectrum to define spin system roots, “pseudoresidue” spin systems are compiled from the resonance frequencies correlated with each root. This allows matching of the C-terminal resonance frequencies of one pseudoresidue with the N-terminal resonance frequencies of another pseudoresidue. Chemical-shift information is also used to restrict possible amino acid types. Two methods of assignment have been described, a systematic search procedure and simulated annealing using a global penalty function. Both the systematic search and simulated annealing methods obtained complete and correct assignments on test data sets, although performance with simulated annealing was significantly more robust.
3. AUTOASSIGN

3.1. The Algorithm

AUTOASSIGN (Zimmerman et al., 1993, 1994) is a constraint-based expert system designed to derive backbone and side-chain resonance assignments from a specific set of triple-resonance spectra. Input to AUTOASSIGN includes a peak-picked 2D (H–N)–HSQC spectrum and the following peak-picked 3D spectra: HNCO, HNCACO, CANH, CA(CO)NH, CBCANH, CBCA(CO)NH, H(CA)NH, and H(CA)(CO)NH. The system first uses the HSQC and HNCO data to define backbone amide spin system roots. Cross peaks corresponding to sequential and intraresidue resonances are identified in the HN(CO)CH- and HNCH-type spectra and used to define CO-ladders and CA-ladders, respectively, associated with each N–H root (Fig. 1). The HNCO and CA(CO)NH spectra are also used to flag degeneracies, as two spin systems sharing similar amide N–H chemical-shift values will generally have different sequential carbonyl and/or Cα shifts. The CA- and CO-ladders of each spin system are then classified into possible spin system types using Bayesian posterior probabilities based on chemical-shift information. If it is available, phase information from properly tuned CANH and CA(CO)NH experiments is also used to uniquely identify glycine (Gly) residues or N–H groups that follow glycine residues (Gly-X) in the amino acid sequence (Feng et al., 1996).

AUTOASSIGN then uses a best-first search which iteratively derives the most reliable sequential links between the CA-ladder of spin system i and the CO-ladder of spin system j. The program uses a constraint propagation network (CPN) (Zimmerman et al., 1994) which can infer all other logically entailed assignments and links at each step. Using a match-score function based on Euclidean distance, AUTOASSIGN first identifies all unique four-valued sequential matches that exceed a given threshold. This threshold is iteratively decremented and new matches are established until all reliable four-valued matches have been exhausted, at which point the system iteratively considers three-valued matches in the same fashion. The dynamic CPN module propagates constraints and makes any assignments which are logically entailed by the current state. Once these methods have been exhausted, the system makes any additional assignments through a process of elimination. AUTOASSIGN also has specific processes for dealing with overlap of roots, and for addressing issues that arise from chemical or conformational heterogeneity (Zimmerman et al., 1997).
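The best-first linking with a stepwise relaxed threshold can be sketched as follows. This is only an illustration of the idea, not the AUTOASSIGN implementation: the score form `1 / (1 + d)`, the per-nucleus `scale` factors, the threshold schedule, and the two-valued toy ladders are all assumptions, and the constraint propagation network is omitted entirely.

```python
import math


def match_score(ca_ladder, co_ladder, scale):
    """Score in (0, 1]; 1.0 means identical shifts (assumed functional form)."""
    d = math.sqrt(sum(((a - b) / s) ** 2
                      for a, b, s in zip(ca_ladder, co_ladder, scale)))
    return 1.0 / (1.0 + d)


def best_first_links(ca, co, scale, thresholds=(0.9, 0.8, 0.7, 0.6, 0.5)):
    """Return {i: j}, meaning spin system j follows spin system i.

    ca[i] holds the intraresidue ladder of system i; co[j] holds the ladder
    of the residue preceding system j. At each threshold only mutually
    unique matches are accepted, then the threshold is lowered.
    """
    links = {}
    for threshold in thresholds:
        changed = True
        while changed:
            changed = False
            free_i = [i for i in ca if i not in links]
            free_j = [j for j in co if j not in links.values()]
            cand = {i: [j for j in free_j if j != i
                        and match_score(ca[i], co[j], scale) >= threshold]
                    for i in free_i}
            for i, js in cand.items():
                if len(js) != 1:
                    continue  # no match, or an ambiguous one
                j = js[0]
                rivals = [k for k in free_i if j in cand[k]]
                if rivals == [i]:
                    links[i] = j
                    changed = True
                    break  # recompute candidates after each accepted link
    return links


# Toy ladders (two shifts each, ppm) for three systems in the order A-B-C.
ca = {"A": (56.1, 30.2), "B": (61.0, 69.5), "C": (52.3, 19.0)}
co = {"A": (58.0, 63.8), "B": (56.1, 30.3), "C": (61.1, 69.5)}
links = best_first_links(ca, co, scale=(0.3, 0.3))
```

Accepting only mutually unique matches at a strict threshold first means that early, reliable links remove candidates from later, looser rounds.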
3.2. The Philosophy of AUTOASSIGN

The aim of this software development was to create an integrated system that combines a specific set of NMR experiments with an analysis tool tailored to this set of experiments. These NMR data are represented schematically in Fig. 2. As such, the current version of AUTOASSIGN is not a general tool for automated or interactive analysis of assignments from an arbitrary set of triple-resonance experiments. Rather, it is a combined set of NMR experiments and analysis tool which together provide a robust and reliable approach for obtaining backbone and select side-chain resonance assignments in a fully automated fashion. Of course, it would be preferable to develop a more general tool capable of executing arbitrary strategies for resonance assignments from triple-resonance data. Efforts along these lines are under way in our laboratory.

3.3. Generic Spin System Objects

One of the key features of the AUTOASSIGN process involves representation of the information that is derived from the triple-resonance experiments outlined in Fig. 2. This data representation, the generic spin system (GS) object, is shown in Fig. 1. The features of the GS include (i) the root, corresponding to the resonance frequencies of the backbone amide nuclei of amino acid i, (ii) the CA-ladder, corresponding to the resonance frequencies of amino acid i derived from specific “intraresidue” triple-resonance experiments of Fig. 2, and (iii) the CO-ladder, corresponding to the resonance frequencies of amino acid i−1 that are detected on the amide root of residue i in specific “sequential” triple-resonance experiments of Fig. 2. The resonance frequencies of the CA- and CO-ladders provide information about possible amino acid types for residues i and i−1, respectively, which limits the domain of possible N–H sites in the sequence to which the GS root can be assigned.
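A GS object can be pictured as a small record holding a root, two ladders, and a domain of candidate sites. The sketch below is one possible Python rendering; the class name, field names, and the nucleus labels used as dictionary keys are hypothetical and are not taken from the AUTOASSIGN source.

```python
from dataclasses import dataclass, field


@dataclass
class GenericSpinSystem:
    """Illustrative GS record; all field names are assumptions."""
    root_h: float  # amide proton shift of residue i, ppm
    root_n: float  # amide nitrogen shift of residue i, ppm
    ca_ladder: dict = field(default_factory=dict)  # shifts of residue i
    co_ladder: dict = field(default_factory=dict)  # shifts of residue i-1
    domain: set = field(default_factory=set)       # candidate sequence sites

    def shared_nuclei(self, following):
        """Nuclei present in this CA-ladder and in the CO-ladder of the
        candidate following spin system, i.e. the values that can be compared."""
        return sorted(set(self.ca_ladder) & set(following.co_ladder))


# Toy example: three intraresidue shifts on GS(i), two sequential on GS(j).
gs_i = GenericSpinSystem(8.25, 120.1,
                         ca_ladder={"CA": 56.1, "CB": 30.2, "HA": 4.35})
gs_j = GenericSpinSystem(7.90, 108.4,
                         co_ladder={"CA": 56.1, "CB": 30.3})
comparable = gs_i.shared_nuclei(gs_j)
```

Storing each ladder as a mapping lets missing peaks simply be absent, so the number of comparable values between two GS objects falls out of a set intersection.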
3.4. Constraint Propagation

A second key feature of the AUTOASSIGN process is the use of constraint propagation ideas from artificial intelligence (Mackworth, 1977; Kumar, 1992) to reduce the domain of possible sites in the sequence to which a GS can be assigned on the basis of constraints imposed by establishing sequential relationships between GS objects. An example of such constraint propagation is shown in Fig. 3. In this example, the chemical-shift values of the CA- and CO-ladders of GS(i) (Fig. 3A) indicate possible amino acid types that are consistent with eight dipeptide positions in the stretch of polypeptide sequence shown. Similarly, the chemical-shift values of the CA- and CO-ladders of GS(j) (Fig. 3B) indicate an X-Gly sequence that is mapped to two “possible sites” in the sequence. However, the unique four-dimensional match of the CA-ladder of GS(i) to the CO-ladder of GS(j) (Fig. 3C) indicates that the domain of possible assignments of GS(i) can be reduced to only those N–H sites that precede an N–H site in the domain of possible assignments of GS(j). In this way, the sets of sites that can be assigned to GS(i) and GS(j) are reduced to the unique sequence Pro-His-Gly (Fig. 3D). The program keeps track of both the domain of possible sequence-specific sites (SSs) to which each GS can be assigned, and the set of possible GSs that can be assigned to each SS. If Pro-His-Gly is the only set of SSs consistent with this string of linked GSs, and the site itself has no other possible assignments to other GS objects, the corresponding atom-specific chemical shifts of GS(i) and GS(j) can be assigned to sequential residues Pro-His-Gly. In general, GS objects are linked by matching CA- and CO-ladders, and the resulting constraints on the domains of possible assignments of each GS to sites in the sequence are propagated down the chain of linked GSs until the linked segment of GSs is mapped to a single unique stretch of amino acid residues in the sequence.
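The domain reduction described in this section can be sketched in a few lines of Python. The `propagate` function, the toy sequence, and the initial domains below are invented for illustration and do not reproduce the data of Fig. 3 or the CPN code; they only show how a sequential link shrinks two domains to a unique Pro-His-Gly placement.

```python
def propagate(domains, links):
    """Shrink domains until every link i -> j (j follows i) is satisfiable.

    domains: {name: set of 0-based sequence indices}
    links:   list of (i, j) pairs, where j is sequentially after i
    Returns True if no domain became empty.
    """
    changed = True
    while changed:
        changed = False
        for i, j in links:
            # GS(i) may only sit at sites followed by a site allowed for GS(j).
            keep_i = {s for s in domains[i] if s + 1 in domains[j]}
            # GS(j) may only sit at sites preceded by a surviving site of GS(i).
            keep_j = {s for s in domains[j] if s - 1 in keep_i}
            if keep_i != domains[i] or keep_j != domains[j]:
                domains[i], domains[j] = keep_i, keep_j
                changed = True
    return all(domains.values())


# Hypothetical 10-residue sequence containing two His-Gly dipeptides.
sequence = "MKHGSPHGEL"
domains = {
    "GS_i": {1, 4, 6, 9},  # sites allowed by the ladders of GS(i) (made up)
    "GS_j": {3, 7},        # the two X-Gly sites
}
ok = propagate(domains, [("GS_i", "GS_j")])
site = min(domains["GS_i"])
tripeptide = sequence[site - 1:site + 2]
```

Because each pass can only remove sites, the loop always terminates, and longer chains of links are handled by the same fixed-point iteration.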
3.5. Representative Results

AUTOASSIGN was developed and tested (Zimmerman et al., 1997) using triple-resonance data sets obtained for five distinctly different proteins: the Z domain of staphylococcal protein A (7.5 kD) (Lyons et al., 1993; Tashiro et al., 1997), the single-stranded-RNA-binding cold-shock protein A from Escherichia coli (CspA, 7.3 kD) (Newkirk et al., 1994; Feng et al., 1998), a homodimeric double-stranded RNA-binding domain from the influenza A virus nonstructural protein 1 [NS1(1–73), 16.6 kD] (Chien et al., 1997), human basic fibroblast growth factor (FGF-2, 17.2 kD) (Moy et al., 1995), and bovine pancreatic ribonuclease A (RNase A, 13.5 kD) (Shimotakahara et al., 1997). Further testing was carried out using additional triple-resonance data obtained for two different disulfide-deleted mutants of RNase A: [C65S, C72S]-RNase A (Shimotakahara et al., 1997) and [C40A, C95A]-RNase A (Laity et al., 1997). The three RNase A data sets provide a useful case study; while their amino acid sequences are 98% identical, about one-third of the N–H resonance frequencies are significantly different in the spectra of the wild-type (wt) and mutant proteins. In addition, the spectra differ in terms of the extent of spurious peaks, N–H degeneracy, and extra GSs. All of the data sets for RNase A were analyzed independently; the results of one analysis were not used to guide the other analyses.

Representative results of AUTOASSIGN analysis are illustrated in our recent
study of the structure and dynamics of wt RNase A and an analog of a folding– unfolding intermediate, [C65S, C72S]-RNase A (Shimotakahara et al., 1997). Using the nine triple-resonance experiments summarized in Fig. 2, collected over a 10-day period, AUTOASSIGN obtained nearly complete backbone and side-chain assignments for both wt and mutant RNase A. Similar results were obtained for the [C65S, C72S]-RNase A mutant. Table 1 summarizes some of the more salient characteristics of the automated analysis, and a survey of the intraresidue and sequential triple-resonance connectivities determined by AUTOASSIGN for both wt and mutant RNase A is presented in Fig. 4. Both proteins consist of 124 amino acid residues, but because the N-terminal lysine
and four proline residues have no backbone amide protons, only 119 of these residues have detectable roots. The analyses of the wt and [C65S, C72S]-RNase A data were carried out completely independently; the assignments
determined for wt RNase A were not used in the analysis of the spectral data of the mutant. The automated analyses summarized in Table 1 and Fig. 4 were subsequently verified by manual inspection of strip plots from all of the 3D triple-resonance spectra. The subsequent manual analysis also provided several additional resonance assignments (Table 1). In addition to resonances assigned to the predominant form of RNase A present in solution, a large number of generally weaker resonances present in these spectra could be clustered into spin systems. Many of these spin systems have distinct amide roots, but their CO- and CA-ladders closely resemble those of assigned spin systems whose relative peak intensities are significantly stronger.
Typically, the resonances of these extra spin systems have ~10% the intensities of resonances assigned to the predominant form of the protein. Mass spectroscopy,
N-terminal analysis, capillary electrophoresis, and cation exchange chromatography analyses show that there is no detectable chemical heterogeneity in these protein samples. Accordingly, one plausible explanation is that these extra spin systems arise from minor conformations that are in slow dynamic equilibrium with the major native conformers of the folded states of wt RNase A. Similar conformational heterogeneity was observed in samples of the two RNase A mutants, and also in the sample of basic FGF-2. In FGF-2, this heterogeneity has been shown to be due to X-Proline cis–trans conformational isomerization (Moy et al., 1995).
Similar analyses were also carried out using uniformly enriched samples of [C65S, C72S]-RNase A and [C40A, C95A]-RNase A. The differences in assigned backbone and side-chain resonance frequencies between wt and these mutant forms of RNase A were carefully examined. As a representative example of the results of these studies, residues containing resonances that exhibit significant chemical-shift differences between wt and [C65S, C72S]-RNase A proteins are shown on the three-dimensional X-ray crystal structure of wt RNase A (Wlodawer et al., 1988) in Fig. 5. Comparison of HSQC spectra shows that about 30% of the [C65S, C72S]-RNase A cross peaks are shifted compared to those of wt RNase A. As can be seen in Fig. 5, these differences are concentrated around three regions: polypeptide segments Gln 60–Ser 80, His 105–Glu 111, and Ile 116–Ser 123. All of the residues exhibiting significant chemical-shift differences are spatially adjacent to one another in the three-dimensional structure of the mutant and near the site of the Cys 65–Cys 72 disulfide deletion. Similar results were obtained in comparing chemical-shift differences between wt and [C40A, C95A]-RNase A (Laity et al., 1997).

Chemical shifts are highly sensitive to local chemical and electronic environments. Since the polypeptide segment (Gln 60–Ser 80) of [C65S, C72S]-RNase A includes the Cys 65–Cys 72 disulfide deletion site, it is not surprising that the most striking chemical-shift deviations occur in this region. The fact that the chemical shifts of residues distant from the site of the disulfide deletion are all essentially identical (Fig. 5) provides additional assurance that resonance assignments determined by AUTOASSIGN independently for wt and [C65S, C72S]-RNase A are largely correct. Similar results for [C40A, C95A]-RNase A indicate that AUTOASSIGN is also accurate for this mutant. Subsequent analysis of NOE and scalar coupling data (Shimotakahara et al., 1997; Laity et al., 1997) confirms that (i) the assignments determined by AUTOASSIGN result in an extensive set of conformational constraints that are consistent with the X-ray crystal structure of wt RNase A (Wlodawer et al., 1988); (ii) essentially all of the structural differences between wt and [C65S, C72S]-RNase A are localized near the site of the disulfide deletion, and the resulting NOE assignments are consistent with an overall wt-like
structure for the mutant; and (iii) essentially all of the structural differences between wt and [C40A, C95A]-RNase A are localized near the site of the disulfide deletion,
and the resulting NOE assignments are consistent with an overall wt-like structure for this mutant. These studies provide validation of the accuracy of resonance assignments determined by AUTOASSIGN. The AUTOASSIGN assignments for wt RNase A are also in generally good agreement with published proton assignments (Rico et al., 1989; Robertson et al., 1989; Santoro et al., 1993) based on homonuclear NMR data, although some sites exhibit significant differences attributable to differences in the pH and temperature at which these assignments were made.

These sequence-specific assignments for wt and [C65S, C72S]-RNase A were derived independently of one another; knowledge of the assignments for wt RNase A was not used to direct the assignment process for the mutants. However, the NMR experiments carried out for these studies could also provide a database for development of “homology-based” automated assignment software. In such a strategy, assignments obtained from a complete set of triple-resonance spectra for wt RNase A could be used to guide the interpretation of a more limited set of spectra for an RNase A mutant, or for a complex between RNase A and a bound ligand. More generally, if a second protein has high sequence homology to a protein whose assignments and structure have been determined previously, these known assignments can be used to assign the homologous regions first, thus restricting the possible assignments of the remaining nonhomologous regions and dramatically reducing the data collection and software execution time. Similar strategies can be developed to provide assignments of protein–ligand complexes from the assignments of the unliganded protein, in cases where ligand binding does not greatly perturb the structure of the protein. In the automated analysis reported here, homology-based reasoning was not used, and the spectra of both proteins were analyzed independently (and without knowledge of the published proton assignments). Work is now in progress to determine how homologous assignment information can be used to reduce the number of triple-resonance data sets required to provide reliable and complete resonance assignments.

Results of AUTOASSIGN analysis of triple-resonance spectra collected for seven different proteins are presented in Table 2. This version of the AUTOASSIGN program (Zimmerman et al., 1997), implemented in LISP, is sufficiently robust to provide nearly complete backbone resonance assignments for all seven of these proteins, ranging in size from 7.5 to 17 kD. With the exception of FGF-2, experiments depicted in Figs. 2A through 2H were collected for all of the proteins tested. For the RNase A and NS1 data sets, HNCACO data (Fig. 2I) were also collected. For FGF-2, experiments equivalent to those of Fig. 2A and Figs. 2D–H were carried out. In addition, for the FGF-2 data, manual analysis was used to extract the relevant cross peaks from a CBCA(CO)NH-type spectrum and from an HBHA(CO)NH spectrum, and a separate preprocessing module was used to infer intraresidue carbonyl resonances from an HN(CA)CO spectrum. Peak lists corresponding to data that would be provided by experiments depicted in Figs. 2B, 2C, and 2I were thus defined for FGF-2 and included in the input for AUTOASSIGN.
As discussed above, several of these proteins exhibit many extra spin systems due to chemical or conformational heterogeneity associated with proline cis–trans isomerization (Moy et al., 1995; Shimotakahara et al., 1997; Laity et al., 1997). Using a compiled LISP version of the program, execution times on a Sparc 20 workstation ranged
Automated Analysis of Resonance Assignments
from 16 to 360 s. Assignments for CspA, NS1, and Z-domain have subsequently been validated by using these as the basis for assigning NOESY spectra and generating high-resolution 3D structures that satisfy all of the derived NOE constraints (Tashiro et al., 1997; Chien et al., 1997; Feng et al., 1998).
4. PRACTICAL CONSIDERATIONS IN DATA COLLECTION AND PROCESSING
4.1. General Considerations
The most critical consideration in data collection involves ensuring minimal matching tolerances when comparing peaks that have the same nominal resonance frequency between two spectra. This is particularly important for matching CA- and CO-LDDR frequencies. Accordingly, the resonance frequencies in spectra used to obtain intraresidue chemical-shift values (e.g., CBCANH data) must exactly match the corresponding resonance frequencies in spectra used to obtain sequential chemical-shift values (e.g., CBCA(CO)NH data). In addition, best results are
Gaetano T. Montelione et al.
obtained when these matching dimensions are collected and processed with identical digital resolutions, and Fourier-transformed using identical window functions.
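The practical consequence of the match tolerance can be illustrated with a small sketch. The function name, the peak labels, the shift values, and the two tolerances below are hypothetical and are not taken from AUTOASSIGN; the example only shows how a looser tolerance can turn a unique intraresidue/sequential match into an ambiguous one.

```python
from typing import Dict, List


def candidate_matches(
    intra_shifts: Dict[str, float],
    seq_shifts: Dict[str, float],
    tolerance_ppm: float,
) -> Dict[str, List[str]]:
    """For each sequential peak, list the intraresidue peaks within tolerance."""
    matches = {}
    for seq_id, seq_ppm in seq_shifts.items():
        matches[seq_id] = [
            intra_id
            for intra_id, intra_ppm in intra_shifts.items()
            if abs(intra_ppm - seq_ppm) <= tolerance_ppm
        ]
    return matches


# Hypothetical intraresidue carbon shifts (ppm), e.g. from a CBCANH peak list.
intra = {"gs1": 56.10, "gs2": 56.32, "gs3": 58.90}
# Hypothetical sequential carbon shifts (ppm), e.g. from a CBCA(CO)NH peak list.
seq = {"gs4": 56.12, "gs5": 58.95}

tight = candidate_matches(intra, seq, tolerance_ppm=0.10)
loose = candidate_matches(intra, seq, tolerance_ppm=0.30)
print("tight:", tight)
print("loose:", loose)
```

With the tighter tolerance each sequential peak has a single candidate; with the looser one, `gs4` matches both `gs1` and `gs2`, which is the kind of ambiguity that careful, consistent data collection is meant to avoid.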
From a practical point of view, the matching of chemical shifts between spectra is best ensured by collecting the spectra in a back-to-back fashion on the same NMR sample. Although this care is not necessarily required for preparing input to an automated analysis program, the performance of such a program will be enhanced by using the smallest match tolerances consistent with the data sets. In our laboratory, we generally prepare two identical samples, and execute the full set of NMR experiments back to back on one of these. Macros have been developed which allow automated setup of these experiments on a Varian Inova spectrometer, and
ensure that matching dimensions are collected with matching digital resolutions. If the sample decomposes or is accidentally spoiled in the course of data collection, the remaining spectra can be collected on the second sample. Efforts are also made to ensure that all of the spectra are collected with identical sample temperatures. This is not always simple to do, as the different decoupling duty cycles used in the various triple-resonance experiments can result in varying degrees of sample heating; this heating can be measured and the temperature of the probe offset appropriately to account for differential sample heating.

Other important considerations are the digital resolution and the overall measuring time needed to record the eight or nine spectra required for input to AUTOASSIGN. In our laboratory, recommended digital resolutions for data collection (or following linear prediction to twice the number of complex data points collected and Fourier transformation) are 0.024 (0.012) ppm/pt in the direct dimension, 0.32 (0.16) ppm/pt in the indirect N dimension, 0.52 (0.26) ppm/pt in the indirect aliphatic or carbonyl C dimension, and 0.110 (0.055) ppm/pt in the indirect aliphatic H dimension. On a 500-MHz spectrometer, these correspond to sweepwidths of 6250 Hz, 2000 Hz, 8300 Hz, 8300 Hz, and 7000 Hz in the direct H, N, aliphatic C, carbonyl C, and aliphatic H dimensions, respectively, collecting 512 complex points in the direct dimension and 128 complex points in each of the indirect dimensions, followed by two-fold linear prediction or zero filling in each dimension. Successful analyses have been done with lower-resolution data sets, but these resolutions work well in our hands. The AUTOASSIGN program cannot yet correct for folding in the carbon or proton dimensions.

The overall measuring time, of course, is determined by the signal-to-noise ratio available on the specific instrument for the specific sample. For example, using a 500-MHz spectrometer and sample concentrations of 1–3 mM, typical total data collection times in our laboratory range from 5 to 14 days. It should be possible to reduce these collection times in future developments by using triply enriched protein samples and decoupled triple-resonance experiments within the AUTOASSIGN strategy.
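The relation between sweepwidth, number of points, and digital resolution quoted above can be checked with a short calculation. This is a minimal sketch: the function name is hypothetical, the pairing of each sweepwidth with a dimension is an assumption inferred from the listed resolutions, and the Larmor frequency ratios relative to 1H are standard rounded values.

```python
# Larmor frequency ratios relative to 1H (standard rounded values).
GAMMA_RATIO = {"1H": 1.0, "13C": 0.251450, "15N": 0.101329}


def digital_resolution_ppm(sweep_hz, nucleus, n_complex_points, h1_mhz=500.0):
    """Return ppm per point for one dimension, before any linear prediction."""
    larmor_mhz = h1_mhz * GAMMA_RATIO[nucleus]  # numerically equal to Hz per ppm
    sweep_ppm = sweep_hz / larmor_mhz
    return sweep_ppm / n_complex_points


# (label, sweepwidth in Hz, nucleus, complex points); pairing is an assumption.
dimensions = [
    ("direct 1H", 6250.0, "1H", 512),
    ("indirect 15N", 2000.0, "15N", 128),
    ("indirect 13C", 8300.0, "13C", 128),
    ("indirect aliphatic 1H", 7000.0, "1H", 128),
]
resolution = {
    label: digital_resolution_ppm(sw, nuc, n) for label, sw, nuc, n in dimensions
}
for label, value in resolution.items():
    # Doubling the number of points by linear prediction halves ppm/pt.
    print(f"{label}: {value:.3f} ppm/pt collected, {value / 2:.3f} ppm/pt extended")
```

Under these assumptions the computed values fall close to the recommended 0.024, 0.32, 0.52, and 0.110 ppm/pt (the 15N value comes out slightly lower, near 0.31 ppm/pt).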
4.2. Peak Picking of NMR Spectra
All of the peak lists analyzed by AUTOASSIGN are in the form of ASCII text files listing the 2D or 3D peak coordinates (in ppm) and intensities. Processing of the NMR spectra for AUTOASSIGN input has been done using VNMR (Varian Associates), NMRPipe (Delaglio et al., 1995), and Felix (Molecular Simulations, Inc.) programs. The resulting frequency-domain spectra are peak-picked using automated tools provided in NMRCompass (Molecular Simulations, Inc.), Felix (Molecular Simulations, Inc.), or PIPP (Garrett et al., 1991). For most spectra, an initial list of automatically picked cross-peak resonance frequencies and intensities is generated for each 2D and 3D spectrum using intensity and linewidth filters. This list is then edited manually to identify and eliminate extraneous peaks, using interactive graphics and various general features such as the approximate expected number of peaks, the visual quality of alignment across spectra, and peak shape criteria. However, as no general specifications for peak picking have been developed, the user-defined criteria for manual editing of peak-picked spectra vary considerably. The interactive manual editing requires h per 3D NMR data set (i.e., for the complete set of spectra), and can be carried out while data collection is in progress, adding little to the total time required for the complete process of determining backbone resonance assignments. This peak picking and subsequent editing process uses no assumptions about the assignments or structure of the protein.

AUTOASSIGN can tolerate a good deal of incompleteness, spurious peaks, peak frequency perturbations, chemical-shift degeneracy, and conformational and/or chemical heterogeneity that results in extraneous spin systems in the spectra. However, as the quality of the peak-picked data deteriorates, there is a point at which performance is compromised (see discussion of the [C40A, C95A]-RNase data set in Zimmerman et al., 1997). In particular, the quality and completeness of the backbone assignments critically depend on the quality of the HNCO and HSQC spectra and the reliability with which these spectra are peak-picked. In general, better-quality spectra and accurate peak picking result in improved performance. Despite this sensitivity to peak-picking reliability, our results demonstrate that AUTOASSIGN can obtain almost complete and highly reliable assignments of backbone N, and side-chain resonances from reasonably good quality triple-resonance NMR data. The results obtained to date indicate that while the current implementation of AUTOASSIGN is reasonably robust with respect to peak-picking artifacts, even better performance can be anticipated once software is developed to provide more consistent and reliable peak-list input files.
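As an illustration of the kind of ASCII peak list and intensity filter described in this section, the sketch below parses a whitespace-delimited list and discards weak peaks. The column layout, the comment convention, the threshold, and all names are hypothetical; the actual file formats of AUTOASSIGN and of the peak-picking programs are not specified here. The filter uses the absolute intensity so that negative peaks, such as those produced by phase-labeled experiments, are retained.

```python
from typing import List, NamedTuple, Tuple


class Peak(NamedTuple):
    shifts_ppm: Tuple[float, ...]  # one coordinate per spectral dimension
    intensity: float


def parse_peak_list(text: str) -> List[Peak]:
    """Parse lines of 'ppm ppm [ppm] intensity'; '#' lines are comments."""
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        *coords, intensity = (float(field) for field in line.split())
        peaks.append(Peak(tuple(coords), intensity))
    return peaks


def filter_by_intensity(peaks: List[Peak], threshold: float) -> List[Peak]:
    """Keep peaks whose absolute intensity reaches the threshold."""
    return [p for p in peaks if abs(p.intensity) >= threshold]


# Hypothetical 3D peak list (three coordinates in ppm, then intensity).
example = """
# w1     w2     w3     intensity
8.21   120.4  176.3  1.5e6
7.95   118.2  174.9  -9.0e5
8.60   125.1  177.0  2.0e4
"""
peaks = parse_peak_list(example)
strong = filter_by_intensity(peaks, threshold=1.0e5)
print(len(peaks), "peaks read;", len(strong), "kept")
```

A filter of this kind only produces the initial list; the manual editing step described above still relies on criteria that a simple threshold does not capture.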
4.3. Validation of Input Files
In our experience, it is often useful or necessary to validate certain features of the input files before executing AUTOASSIGN analysis. The most common causes of poor performance in AUTOASSIGN analysis are inaccurate and/or inconsistent global referencing of the spectra used to generate the peak-picked lists used as input. Although the program includes internal global referencing corrections, it is critical to ensure that the input data are properly referenced both with respect to one another and with respect to standard referencing values (Wishart et al., 1995), as the values of and resonance frequencies in these input files are used to identify possible amino acid residue types that can be assigned to the corresponding GS object. Specific validation procedures which should be done on each set of input files include the following:
• The 2D HSQC spectrum and all 3D spectra should exhibit superimposable 2D projections, within tolerances of ppm and ppm in the HN and 15N dimensions, respectively.
• Cα frequencies in 3D HNCA and (HA)CA(CO)NH spectra should fall in the range of 40–70 ppm, with an average value of
• Cα and Cβ frequencies in 3D CBCANH and CBCA(CO)NH spectra should fall in the range of 10–75 ppm, with an average value of
• The regions of 2D and projections of the 3D HNCA and CBCANH experiments should exhibit many superimposable peaks, within tolerances of and in the and dimensions, respectively.
• The regions of 2D and projections of the 3D (HA)CA(CO)NH and CBCA(CO)NH spectra should exhibit many superimposable peaks, within tolerances of and ppm in the and dimensions, respectively.
• The distribution of C' resonance frequencies (plot as histograms) in the 3D HNCO and HN(CA)CO spectra should be similar.
• The distribution of Hα resonance frequencies (plot as histograms) in the 3D HA(CA)NH and HA(CA)(CO)NH spectra should be similar.
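Checks of the kind listed above are straightforward to script before running an assignment program. The following is a minimal sketch, not part of AUTOASSIGN: the function names, the example peak coordinates, and the tolerance values are hypothetical, and only the 40–70 ppm range is taken from the list above.

```python
def fraction_in_range(shifts_ppm, low, high):
    """Fraction of chemical shifts that fall inside [low, high] ppm."""
    inside = [s for s in shifts_ppm if low <= s <= high]
    return len(inside) / len(shifts_ppm)


def fraction_superimposable(peaks_a, peaks_b, tol_h, tol_n):
    """Fraction of (H, N) peaks in peaks_a with a partner in peaks_b."""
    matched = 0
    for h_a, n_a in peaks_a:
        if any(
            abs(h_a - h_b) <= tol_h and abs(n_a - n_b) <= tol_n
            for h_b, n_b in peaks_b
        ):
            matched += 1
    return matched / len(peaks_a)


# Hypothetical carbon shifts (ppm) picked from an HNCA-type spectrum.
ca_shifts = [45.2, 56.1, 62.3, 71.5]
ca_ok = fraction_in_range(ca_shifts, low=40.0, high=70.0)

# Hypothetical HSQC peaks and the H-N projection of one 3D spectrum.
hsqc = [(8.21, 120.4), (7.95, 118.2)]
projection = [(8.22, 120.3), (7.60, 118.2)]
overlap = fraction_superimposable(hsqc, projection, tol_h=0.03, tol_n=0.3)

print(f"{ca_ok:.2f} of shifts in range, {overlap:.2f} of HSQC peaks matched")
```

Low fractions from either check would suggest a referencing or peak-picking problem worth resolving before the automated analysis is run.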
5. EXPERIMENTS FOR AUTOMATED ANALYSIS OF BACKBONE RESONANCE ASSIGNMENTS
The general kinds of NMR data required for AUTOASSIGN execution are shown schematically in Fig. 2. The required data can be obtained using various implementations of triple-resonance experiments that are available on the worldwide web from several academic laboratories, or from commercial NMR spectrometer vendors. In particular, the AUTOASSIGN program does not consider whether a particular pulse sequence is implemented in the “out-and-back” fashion or in the “straight-through” fashion, nor whether or not pulsed-field gradients have been used for water suppression. Similarly, it is not relevant if the magnetization is detected on amide protons or on aliphatic protons, so long as the nuclei that are frequency labeled include those indicated in each schematic experiment of Fig. 2. In our experience the specific implementations of triple-resonance experiments described below that are in use in our laboratory are especially suitable for generating input to the AUTOASSIGN program.

The following sections describe our current “standard” versions of these triple-resonance experiments, together with practical issues associated with their execution. All of the experiments are carried out using heteronuclear coherence selection with pulsed-field gradients (PFGs) for solvent suppression, and sensitivity enhancement by collection of both the x- and y-components that develop during the frequency-evolution period (Kay et al., 1992). In these sensitivity-enhanced experiments, the delay is set to a value between 100 and
depending on the gradient recovery properties of the specific NMR probe used in the data collection. In this regard, it is crucial to select probe and gradient amplifier hardware with excellent PFG phase recovery properties. Selective C´ or decoupling indicated during specific periods of these pulse sequences is achieved using a sinc waveform with an MLEV supercycle (S. D. Emerson and G. T. Montelione, unpublished results). These pulse sequences have been implemented on Varian Inova 500 and 600 NMR spectrometers. The corresponding C source codes, parameter files, and waveforms, together with macros for automatically optimizing values of key coherence transfer delays for different protein NMR samples, are available over the worldwide web at http://www-nmr.cabm.rutgers.edu/.
5.1. HSQC
Our implementation of -correlated PFG-HSQC (Kay et al., 1992; Li and Montelione, 1993) is shown schematically in Fig. 6. The single adjustable coherence transfer delay, is tuned to values slightly smaller than by arraying its value and evaluating the intensity of the first spectrum of the 2D array. In providing input to AUTOASSIGN, two 2D HSQC spectra are collected with different sweepwidths in the 15N dimension: the larger sweepwidth is sufficient to include all side-chain Arg and Lys frequencies; the smaller sweepwidth is adjusted to provide folding of these same Arg and Lys resonances. The remaining 3D spectra are collected with exactly the same 1H and 15N sweepwidths and corresponding digital resolutions used in this second HSQC spectrum. The program uses these two HSQC spectra to identify folded Arg and Lys cross
peaks in the spectrum recorded with smaller sweepwidth and in all the remaining 3D triple-resonance spectra.
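The idea of comparing a wide and a narrow spectrum to recognize folded peaks can be sketched as follows. This is an illustration only and not the AUTOASSIGN algorithm: it assumes simple wrap-around aliasing in the 15N dimension (as obtained with complex sampling; other acquisition modes fold by reflection instead), and the function names, window limits, and shift values are hypothetical.

```python
def aliased_shift(true_ppm, center_ppm, width_ppm):
    """Apparent shift of a peak after wrap-around into a narrower window."""
    low_edge = center_ppm - width_ppm / 2.0
    return (true_ppm - low_edge) % width_ppm + low_edge


def find_folded(wide_n_shifts, center_ppm, width_ppm, tol_ppm=1e-6):
    """Return (true, apparent) pairs for peaks that lie outside the window."""
    folded = []
    for true_ppm in wide_n_shifts:
        apparent = aliased_shift(true_ppm, center_ppm, width_ppm)
        if abs(apparent - true_ppm) > tol_ppm:
            folded.append((true_ppm, apparent))
    return folded


# Hypothetical 15N shifts (ppm) picked from the wide-sweepwidth HSQC.
wide_hsqc_n = [120.4, 84.5, 33.0, 109.8]
# Hypothetical narrow window covering 103-133 ppm.
folded_peaks = find_folded(wide_hsqc_n, center_ppm=118.0, width_ppm=30.0)
for true_ppm, apparent in folded_peaks:
    print(f"peak at {true_ppm:.1f} ppm appears at {apparent:.1f} ppm when folded")
```

Peaks whose predicted apparent position coincides with a peak in the narrow spectrum can then be flagged as folded side-chain resonances rather than treated as backbone amides.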
5.2. HNCO
Our implementation of PFG–HNCO (based on Muhandiram and Kay, 1994) is shown schematically in Fig. 7. The delay is set as described above, and the adjustable coherence transfer delay is tuned using the value optimized in the HSQC experiment (above). The coherence transfer delay is adjusted to exactly corresponding to a null for coherence transfer within groups, as shown in Fig. 8. This choice of provides significant (though not necessarily perfect) suppression of cross peaks from side-chain amide groups of Asn and Gln, which otherwise are quite strong in the spectrum. The key coherence transfer function specific to this PFG–HNCO, is shown for a typical effective coherence relaxation rate ms in Fig. 9. For proteins in the 7–20 kD range, typical optimal values of range from 8.0 to 15.0 ms.
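The trade-off between coherence transfer and relaxation that determines such optimal delays can be illustrated with a generic INEPT-type function, sin(πJt)·exp(−Rt). This is a simplified sketch and not necessarily the function plotted in Fig. 9: the functional form, the coupling constant of 15 Hz (a typical one-bond N–C´ value), and the relaxation rates are assumptions, and t denotes the total transfer period, whereas pulse-sequence delays are often quoted as half of that period.

```python
import math


def transfer_efficiency(t_s, j_hz, rate_per_s):
    """Generic INEPT-type transfer: sin(pi*J*t) damped by relaxation."""
    return math.sin(math.pi * j_hz * t_s) * math.exp(-rate_per_s * t_s)


def optimal_transfer_time(j_hz, rate_per_s):
    """First maximum of transfer_efficiency: tan(pi*J*t) = pi*J/R."""
    omega = math.pi * j_hz
    return math.atan(omega / rate_per_s) / omega


j_nco_hz = 15.0  # assumed typical one-bond N-C' coupling constant
for rate in (20.0, 40.0):  # hypothetical effective relaxation rates (1/s)
    t_opt = optimal_transfer_time(j_nco_hz, rate)
    eff = transfer_efficiency(t_opt, j_nco_hz, rate)
    print(f"R = {rate:.0f} 1/s: t_opt = {t_opt * 1e3:.1f} ms total, "
          f"{t_opt * 5e2:.1f} ms per half period, efficiency {eff:.2f}")
```

Under these assumptions, faster relaxation shifts the optimum to shorter delays and lowers the attainable transfer, which is why the delays are best tuned for each protein sample.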
5.3. HN(CA)CO
The current implementation of AUTOASSIGN can use HN(CA)CO spectral data to identify intraresidue C´ resonance frequencies. For simpler systems, these data are not necessary, but for more complex systems it has been found to be very valuable to have some intraresidue C´ frequency information (Zimmerman et al., 1997). When these data are available, they can be matched to sequential C´ resonance frequency data derived from HNCO experiments to resolve ambiguities in establishing links between GS objects. It should be noted that unless the protein sample is deuterated in the position, the HN(CA)CO experiment is relatively insensitive and generally provides intraresidue C´ frequency information for only a subset of amino acid spin systems. Nonetheless, when these data are available they greatly enhance the progress of the AUTOASSIGN process.

Our current implementation of PFG–HN(CA)CO (based on Clubb et al., 1992) is shown schematically in Fig. 10. The delay is set as described above, the adjustable coherence transfer delay is set using the value optimized in the HSQC experiment (above), and the adjustable coherence transfer delay is set to exactly as in the HNCO experiment, so as to suppress pathways involving side-chain groups. The key coherence transfer functions specific to PFG–HN(CA)CO, and are shown for a typical effective coherence relaxation rate in Figs. 11 and 12, respectively. During the coherence transfer delay magnetization transfer occurs via both active and coupling constants, giving rise to intraresidue and sequential cross peaks, respectively. AUTOASSIGN uses these sequential cross peaks to align the HN(CA)CO spectrum with HNCO data, and uses the intraresidue cross peaks to fill out the
CA-ladders of GSs. Optimum values of delay are 8.0–15.0 ms. Typically the intraresidue cross peaks are more intense than sequential cross peaks (Fig. 11), though differential relaxation effects sometimes result in sequential correlations that are more intense than the corresponding intraresidue correlations. The coherence transfer function for the delay is shown in Fig. 12. During this period, magnetization transfer via the coupling constant is modulated by the passive couplings, particularly the scalar coupling interaction. Thus, the coherence transfer functions are significantly different for Gly and non-Gly residues (Fig. 12). For non-Gly residues, optimum values of are typically 1.5–3.0 ms.
5.4. HNCA
Figure 13 shows the implementation of PFG–HNCA (based on Muhandiram and Kay, 1994) in current use for AUTOASSIGN analysis. Delays and are set as described above for the PFG–HNCO experiment. The key coherence transfer function specific to PFG–HNCA, is shown for a typical effective coherence relaxation rate in Fig. 14. During this period, magnetization transfer occurs via both active and coupling constants, giving rise to intraresidue and sequential cross peaks, respectively. As in the case of the
HN(CA)CO experiment, AUTOASSIGN uses these sequential cross peaks to align the HNCA spectrum with CA(CO)NH data, and uses the intraresidue cross peaks to fill out the CA-ladders of GSs. Optimum values of are typically 6.0–14.0 ms. As in the case of HN(CA)CO spectra, the intraresidue cross peaks in HNCA spectra are typically (though not always) more intense than sequential cross peaks (Fig. 14).
5.5. HACA(CO)NH
Figures 15 and 16 show the implementations of 3D PFG–HACA(CO)NH experiments [based on 4D experiments originally described by Boucher et al., 1992] in current use for AUTOASSIGN analysis. Figure 15 shows a version of the experiment with frequency labeling in the dimension, while Fig. 16 shows the related experiment with frequency labeling in (Feng et al., 1996). Conveniently, the optimum values of delays are identical for these two experiments, so that once these delays are optimized for one of the pair, the user can run both experiments with essentially the same parameter sets (except for changes required to switch from indirect detection of in the dimension to ). Delays and are set as described above for the PFG–HNCO experiment, and the delay is optimized for intraresidue magnetization transfer via the active
coupling constant (Fig. 11). The delay is used for the refocusing of antiphase magnetization. Its coherence transfer function is similar to that of the curve shown for cross peaks in Fig. 12; i.e., the optimum value is a bit longer than the corresponding delay in the HN(CA)CO experiment. This is because, unlike the non-Gly pathway of the HN(CA)CO experiment, the relevant magnetization of the HACA(CO)NH experiment is not modulated by the passive coupling constant during the delay.

Of special consideration in the HACA(CO)NH experiments (and in all of the experiments below in which coherence transfer pathways begin on aliphatic protons) are the coherence transfer functions associated with delays (Fig. 17) and (Fig. 18). The optimal value of is determined by the crossover point for the coherence transfer curves for CH (non-Gly) and (Gly) groups, i.e., ms (Fig. 17). The optimal values of representing a combination of the effects of active coupling and passive coupling, range from 2.0 to 4.0 ms, depending on the effective relaxation rates. As described below, these delays can also be tuned to provide for C–H and C–C spin-topology editing (Feng et al., 1996; Rios et al., 1996).
5.6. HACANH
Figure 19 shows the implementation of the 3D PFG–HA(CA)NH experiment [based on experiments originally described by Montelione and Wagner (1989)] in current use for AUTOASSIGN analysis, with frequency labeling of during the period. The related 3D (HA)CANH experiment (Montelione and Wagner, 1989), providing intraresidue information, can be run in place of the HNCA experiment, though the latter generally exhibits better sensitivity. With the exception of the delay all of the delays in HACANH are optimized exactly the same as for HACA(CO)NH; i.e., delays and are set as described for the PFG–HNCO experiment, delays as described for the HACA(CO)NH experiments, and the delay is optimized for intraresidue magnetization transfer via the active coupling constant (Fig. 11). Coherence transfer curves for corresponding to intraresidue and sequential cross peaks are shown in Fig. 20. These curves are different for Gly (or Gly-X) cross peaks than for non-Gly (or non-Gly-X) cross peaks because of the modulating effects of passive
coupling on magnetization in non-Gly pathways during the period. AUTOASSIGN uses the intraresidue peaks in the HACANH spectrum to identify resonances of CA-LDDRS, and uses the sequential peaks to help align the HACANH spectra with the corresponding HACA(CO)NH data. Accordingly, data collection should be carried out so as to optimize both the intraresidue and sequential cross peaks, and typical optimum values of the delay are 3.0–5.0 ms (Fig. 20). As described below, and can also be tuned to provide for C–C and C–H spin-topology editing, respectively.
5.7. C–C and C–H Phase Information in HACA(CO)NH and HACANH Experiments
Much of the recent work on triple-resonance pulse sequence development has focused on obtaining information useful for classifying amino acid spin system types (Montelione et al., 1992; Bax and Grzesiek, 1993; Lyons and Montelione, 1993; Wittekind and Mueller, 1993; Yamazaki et al., 1993, 1995; Olejniczak and Fesik, 1994; Gehring and Guittet, 1995; Grzesiek and Bax, 1995; Tashiro et al., 1995; Dötsch and Wagner, 1996; Dötsch et al., 1996a, 1996b; Farmer and Venters,
1996; Feng et al., 1996; Rios et al., 1996). This information is extremely valuable for determining resonance assignments, especially when combined with characteristic chemical-shift data in automated assignment programs like AUTOASSIGN (Friedrichs et al., 1994; Meadows et al., 1994; Zimmerman and Montelione, 1995; Zimmerman et al., 1994, 1997). Specific information about spin system topologies can be obtained by appropriate tuning of scalar coupling effects. For example, constant-time frequency-evolution periods commonly used in triple-resonance experiments for homonuclear decoupling are generally designed to combine frequency evolution and coherence defocusing–refocusing periods. During these coherence defocusing–refocusing periods, magnetization oscillates differently according to the spin system topology and the set of active and passive scalar couplings. In uniformly
enriched molecules, proper tuning of these delay times can provide resonance phase information (i.e., positive or negative peak intensities) which depends on the number of coupled nuclei (Santoro and King, 1992; Grzesiek and Bax, 1993; Wittekind and Mueller, 1993; Tashiro et al., 1995; Dötsch and Wagner, 1996; Dötsch et al., 1996b; Feng et al., 1996; Rios et al., 1996). We refer to these as “C–C type phase experiments.” Alternatively, proper tuning of the time period used for refocusing (or defocusing) antiphase carbon magnetization into (or from) in-phase carbon magnetization can provide resonance phase information which depends on the number of coupled nuclei (Morris, 1980; Gehring and Guittet, 1995; Dötsch et al., 1996a; Feng et al., 1996; Rios et al., 1996). We refer to these as “C–H type phase experiments.” Such phase experiments can be used to identify spin system topologies characteristic of different amino acid residue types (see, for example, Grzesiek and Bax, 1993; Tashiro et al., 1995; Feng et al., 1996;
Rios et al., 1996). While there are several varieties of side-chain spin system types, there are only two kinds of carbons in a polypeptide chain composed of the 20 common naturally occurring amino acid residues, i.e., methylene Gly with no directly coupled atoms and methine non-Gly with a single directly coupled atom. Magnetization pathways involving Gly resonances can therefore be distinguished with either C–C type or C–H type phase information. In backbone 2D HN(CO)CA (Gehring and Guittet, 1995) and 3D HN(CA)HA–Gly (Wittekind et al., 1993) pulse sequences, selection of Gly resonances and suppression of
non-Gly resonances is obtained on the basis of their different C–H coupling topologies. Unfortunately, these experiments provide only intraresidue Gly or sequential Gly-X peaks (all the other correlations are “nulled”), and are generally carried out in addition to experiments tuned for identifying intraresidue and sequential connectivity information for the remaining non-Gly spin systems. In our efforts to develop automated methods for determining resonance assignments in proteins, we have found that it is convenient to incorporate Gly “phase labeling” directly into the standard HACA(CO)NH (Figs. 15 and 16) and HACANH (Fig. 19) experiments used for establishing intraresidue and/or sequential connectivities. In these pulse sequences the delay periods (Fig. 17) and (Figs. 18 and 20) can be adjusted to provide C–H or C–C phase information, respectively. To illustrate this point, we describe transfer functions for and spin systems of the relevant pulse sequence fragments. For all of the experiments outlined in Figs. 15, 16, and 19, the transfer function for in-phase carbon magnetization after the refocusing delay is multiplied by the following term describing the refocusing of antiphase magnetization by scalar coupling:
where is the coupling constant and m is the number of protons directly bonded to the carbon. This transfer function is identical for the HACANH and HACA(CO)NH experiments (Fig. 17). Appropriate tuning of can thus be used to discriminate magnetization beginning on Gly (m = 2) and non-Gly (m = 1) nuclei based on C–H phase information.

Similar considerations can be used to describe the C–C phase effects of coupling during the constant-time evolution periods (Figs. 18 and 20). For HACANH and HACA(CO)NH experiments, the transfer function is multiplied by the following terms describing the effects of and scalar coupling during the period
where and are the and coupling constants, respectively, and n is the number of carbon atoms with active one-bond coupling to the atom. Equations (2) and (3) describe the intraresidue and sequential transfer pathways in HACANH, respectively, while Eq. (4) describes the sequential transfer pathway in HACA(CO)NH. These transfer functions are plotted in Fig. 18 [Eq. (4)] and Fig. 20 [Eqs. (2) and (3)], respectively, for Gly and non-Gly cross peaks. Tuning of can thus be used to discriminate magnetization beginning on Gly (n = 0) and non-Gly (n = 1) nuclei based on C–C phase information.
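The sign discrimination described above can be illustrated numerically with generic textbook factors. In this sketch, refocusing of antiphase carbon magnetization is modeled as sin(x)·cos^(m−1)(x) with x = πJt for m attached protons, and the constant-time passive-coupling term as cos^n(πJT) for n coupled carbons. These forms, the total-period delay convention, the omission of relaxation and of the active-coupling terms, and the coupling constants (about 140 Hz for the one-bond C–H coupling and 35 Hz for the one-bond C–C coupling) are assumptions for illustration and may differ from Eqs. (1)–(4) in detail.

```python
import math


def ch_refocusing_factor(delay_s, j_ch_hz, m):
    """Generic refocusing factor for a carbon with m attached protons."""
    x = math.pi * j_ch_hz * delay_s
    return math.sin(x) * math.cos(x) ** (m - 1)


def cc_constant_time_factor(delay_s, j_cc_hz, n):
    """Generic passive-coupling factor for n one-bond coupled carbons."""
    return math.cos(math.pi * j_cc_hz * delay_s) ** n


J_CH = 140.0  # Hz, assumed typical one-bond C-H coupling
J_CC = 35.0   # Hz, assumed typical one-bond aliphatic C-C coupling

# C-H phase tuning: a delay of 0.75/J gives opposite signs for m = 1 and m = 2.
t_ch = 0.75 / J_CH
non_gly_ch = ch_refocusing_factor(t_ch, J_CH, m=1)
gly_ch = ch_refocusing_factor(t_ch, J_CH, m=2)

# C-C phase tuning: a constant-time period of 1/J inverts peaks with n = 1.
t_cc = 1.0 / J_CC
non_gly_cc = cc_constant_time_factor(t_cc, J_CC, n=1)
gly_cc = cc_constant_time_factor(t_cc, J_CC, n=0)

print(f"C-H tuning: non-Gly {non_gly_ch:+.2f}, Gly {gly_ch:+.2f}")
print(f"C-C tuning: non-Gly {non_gly_cc:+.2f}, Gly {gly_cc:+.2f}")
```

In both cases the Gly and non-Gly factors have opposite signs at the chosen delay, which is the basis of the phase labeling; the practical optima discussed in the text also account for relaxation and for the active couplings.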
In the case of C–H phase tuning (Fig. 17), the signal modulation is dominated by the coupling constant during Optimal values of for maximizing both methine and methylene magnetizations are (i.e., for nonphase spectra and for phase spectra (see discussion in Feng et al., 1996), assuming Computer simulations of this transfer function carried out with effective uniform relaxation times (assumed to be identical for Gly and non-Gly residues) from 2 to 50 ms indicate that the positions of these optimal values are independent of relaxation (Feng et al., 1996), although of course the amplitude of the transfer function at these optimal values becomes smaller as the value of becomes shorter.

In the case of C–C phase tuning (Figs. 18 and 20), the highest-frequency modulation during the constant-time period for non-Gly spin systems is due to the coupling constant. With uniform relaxation time optimal values of in the HACANH experiment are (nonphase) and ms (phase), respectively, while for the HACA(CO)NH experiments, the corresponding optimal values are (nonphase) and (phase), respectively. For both experiments, the positions of these optima (both phase and nonphase) are relatively independent of relaxation for effective uniform relaxation times as small as 15 ms, but shift to smaller values with shorter relaxation times (Feng et al., 1996).

The coherence transfer function plots in Figs. 17, 18, and 20 were simulated assuming a uniform relaxation time of 10 ms during the entire or periods. This value is based on our experience with uniformly enriched proteins in the 7–14 kD range. Under these conditions, good signal-to-noise spectra can be obtained using nonphase, C–C phase, or C–H phase delay tunings in proton
and carbon versions of the 3D PFG–HACANH and PFG–HACA(CO)NH experiments (Feng et al., 1996).

The signal is modulated during the period by the product of the appropriate and coherence transfer functions. This product is generally smaller for the HACANH experiment than for the HACA(CO)NH experiment; i.e., the HACA(CO)NH experiments are generally more sensitive. Comparisons of these transfer functions provide a prediction of the best method (C–C or C–H) for obtaining backbone phase information in these two experiments. These predictions of the simulations have also been verified by experimental measurements (Feng et al., 1996). With a uniform relaxation rate the HACANH experiments yield better signal-to-noise (S/N) ratios using C–C phase methods (i.e., long value), while the HACA(CO)NH experiments yield better S/N ratios using C–H phase methods (i.e., long value).

Specifically, relative to nonphase spectra, the HACANH experiments generally provide better signal-to-noise ratios using C–C Gly phase labeling than when using C–H phase labeling. Indeed, as is evident from the curves in Fig. 20, in HACANH the signal-to-noise ratios can be better for Gly phase versions than for the nonphase version (Feng et al., 1996). This is because longer values used for C–C Gly phase labeling (e.g., also allow more complete transfer by the relatively small coupling constant resulting in better S/N ratios even in the presence of significant relaxation rates (Fig. 20). In larger proteins, the effective relaxation rate during the period is faster, and C–H phase labeling may be preferable.

On the other hand, relative to nonphase spectra, the HACA(CO)NH experiments generally exhibit better signal-to-noise ratios using C–H Gly phase labeling than with C–C phase labeling. Since the coupling constant is fairly large the transfer is quite efficient and the long periods required for obtaining C–C phase information cost signal-to-noise due to relaxation effects without much enhancement of the coherence transfer.
While not essential for the function of the AUTOASSIGN program, such “Gly phase” information can be interpreted by the program and, when available, can greatly enhance its performance.
5.8. CBCA(CO)NH
Figure 21 shows the version of the 3D PFG–CBCA(CO)NH experiment [based on experiments originally described by Grzesiek and Bax (1992a)] in current use for AUTOASSIGN analysis, with frequency labeling of and during the period. This experiment correlates the and resonances of residue i with the resonance of residue by transferring magnetization through the intervening carbonyl group. Our philosophy in setting up this experiment, and in setting up the related 3D PFG–CBCANH experiment (described below), is to select coherence transfer delays so as to optimize the intensities of cross peaks involving resonances, rather than, for example, compromising the choice of delay values to optimize intensities of both and cross peaks. The resulting subset of weaker cross peaks in these spectra is used to align these CBCA(CO)NH data with (HA)CA(CO)NH spectra, and then the resonance frequency information is used to complete the “rungs” of CO-LDDRS. Delays and are set as described above for the PFG–HNCO experiment, and the delay is optimized for intraresidue magnetization transfer via the active coupling constant (Fig. 11). For nonphase versions of the experiment, delays and are set as described above for the HACA(CO)NH and HACANH experiments.

Coherence transfer functions for critical delays and in the CBCA(CO)NH experiment are shown in Figs. 22 and 23, respectively. These delays must be optimized to maximize the transfer. Their optimum values cannot be determined by arraying them and comparing the signal intensity in the first 1D spectrum of the 3D data set, as the pathways also contribute significantly to this signal. Instead, the optimal values must be determined by comparing spectra recorded with different values of these delays. For nonphase versions of the experiment, typical values are and (Table 3).
5.9. CBCANH
Figure 24 shows our implementation of 3D PFG–CBCANH [based on experiments originally described by Grzesiek and Bax (1992b)] in current use for AUTOASSIGN analysis, with frequency labeling of and during the period. This experiment correlates the and resonances of residue i with the resonance of residue and the resonance of residue [via . As with CBCA(CO)NH, our philosophy in setting up this CBCANH experiment is to select coherence transfer delays so as to optimize the intensity of cross peaks involving resonances, rather than compromising the delay values to optimize intensities of and cross peaks. The subset of cross peaks in these spectra is used to align these data with HNCA spectra, and then the resonance frequency information is used to complete the “rungs” of CA-LDDRS. Delays
Gaetano T. Montelione et al.
and are set as in the PFG–HNCO experiment, the delay is optimized for intraresidue magnetization transfer via the active coupling constant (Fig. 11), and for nonphase versions of the experiment delays and are set as described above for the HACA(CO)NH and HACANH experiments. The coherence transfer function for the delay is essentially identical to that of the CBCA(CO)NH experiment (Fig. 22). The coherence transfer function for the delay in the CBCANH experiment is shown in Fig. 25. As for CBCA(CO)NH, optimal and values for transfer must be evaluated by comparing spectra recorded with different values of these delays. For nonphase versions of the experiment, typical values are ms and (Table 3).
5.10. C–C and C–H Phase Information in CBCA(CO)NH and CBCANH Experiments
Figures 17 and 22 clearly show that C–H and C–C phase versions of CBCA(CO)NH and CBCANH can be obtained using longer values of or (Rios et al., 1996). Such information is very valuable for distinguishing spin system types (Rios et al., 1996) and can greatly enhance the automated analysis process. However, although this represents an important future extension, AUTOASSIGN has not yet been designed to take advantage of this type of phase information.
It is also possible to distinguish from (and Gly cross peaks in CBCA(CO)NH by using longer values of (Fig. 23) or by using values in CBCANH (Fig. 25). With these settings, passive coupling effects during the refocusing of antiphase coherence will result in intensities for non-Gly cross peaks with opposite signs relative to Gly or all cross peaks, providing a means for distinguishing non-Gly cross peaks (Grzesiek and Bax, 1992a, 1992b). However, such phase labeling of cross peaks costs significant sensitivity (see the coherence transfer curves in Figs. 23 and 25). Our current philosophy has been to tune all delays in these CBCA(CO)NH and CBCANH experiments so as to maximize the intensities of the cross peaks without phase information, and to distinguish from cross peaks by comparing the CBCA(CO)NH and CBCANH data with (HA)CA(CO)NH and HNCA spectra, respectively. Although this strategy is somewhat redundant, it has the advantage of
tuning delays so as to provide the most sensitivity in these generally less sensitive CBCA(CO)NH and CBCANH spectra. This approach is particularly important for collecting CBCANH data, as this critical experiment is one of the least sensitive of the entire set that is normally collected.
6. FUTURE DEVELOPMENTS
The specific implementations of triple-resonance experiments described here together with the AUTOASSIGN software provide a robust and efficient process for determining backbone C, N, and H resonance assignments in proteins with molecular weights Future developments focus on extending the automated analysis process to include complete assignments of side-chain aliphatic and aromatic resonances and integration of these assignments in automated processes for analysis of NOESY spectra and 3D structure generation calculations. In regard to side-chain assignments, it is anticipated that phase-type spectra that provide specific information about local and topologies will be especially amenable to automated analysis (Tashiro et al., 1995; Feng et al., 1996; Rios et al., 1996).
NMR analysis has tremendous potential for analyzing the many gene products that are being identified in the various genomic sequencing projects. In considering “high throughput” analysis of protein structures from NMR data, another key issue is the total time required for data collection. The current process requires some 5 to 14 days of data collection to provide the necessary input for automated analysis of backbone resonance assignments. At least this much additional time will be required to collect the data needed to complete the side-chain assignments and NOESY data sets needed for structure generation calculations. Recent work in our laboratory has demonstrated that the signal-to-noise ratios of some of the triple-resonance spectra described here can be significantly enhanced, in some cases by more than a factor of 2, by replacing single quantum coherence states that evolve during specific points in the pulse sequences with multiple-quantum heteronuclear coherence states that exhibit better relaxation properties (Swapna et al., 1997; Shang et al., 1997). Efforts are in progress to evaluate the value of these multiple-quantum versions of the triple-resonance experiments described here in the overall efficiency of the assignment process. An even more efficient approach would be to carry out complete assignments of the and skeleton using decoupled triple-resonance experiments (Grzesiek et al., 1993; Yamazaki et al., 1994; Farmer and Venters, 1995, 1996) on fully (or partially) perdeuterated enriched protein samples, and then adding aliphatic H atom assignments to these using various kinds of correlation experiments (with appropriate isotope shift corrections). While originally designed for addressing assignment problems in larger proteins, the improved relaxation properties of
perdeuterated proteins make them ideal for rapid collection of triple-resonance spectra of smaller proteins as well. Efforts along these lines are currently in progress in our laboratory.
ACKNOWLEDGMENTS. We thank Rebecca Klein for her expert assistance in scientific editing. This work was supported by grants from the National Institutes of Health (GM-47014 and GM-50733), the National Science Foundation (MCB9407569), a National Science Foundation Young Investigator Award (MCB9357526), and by a New Jersey Commission on Science and Technology Research Excellence Award.
REFERENCES
Bartels, C., Billeter, M., Güntert, P., and Wüthrich, K., 1996, J. Biomol. NMR 7: 207–213.
Bax, A., and Grzesiek, S., 1993, Accts. Chem. Res. 26: 131–138.
Billeter, M., Braun, W., and Wüthrich, K., 1982, J. Mol. Biol. 155: 321–346.
Boucher, W., Laue, E. D., Campbell-Burk, S. L., and Domaille, P. J., 1992, J. Biomol. NMR 2: 631–637.
Chien, C.-Y., Tejero, R., Huang, Y., Zimmerman, D. E., Rios, C. B., Krug, R. M., and Montelione, G. T., 1997, Nature Struct. Biol. 4: 891–895.
Clowes, R. T., Crawford, A., Raine, A. R. C., Smith, B. O., and Laue, E. D., 1995, Curr. Opin. Biotech. 6: 81–88.
Clubb, R. T., Thanabal, V., and Wagner, G., 1992, J. Magn. Reson. 97: 213–217.
Delaglio, F., Grzesiek, S., Vuister, G. W., Zhu, G., Pfeifer, J., and Bax, A., 1995, J. Biomol. NMR 6: 277–293.
Dötsch, V., Oswald, R. E., and Wagner, G., 1996a, J. Magn. Reson. Ser. B 110: 304–308.
Dötsch, V., Oswald, R. E., and Wagner, G., 1996b, J. Magn. Reson. Ser. B 110: 107–111.
Dötsch, V., and Wagner, G., 1996, J. Magn. Reson. Ser. B 111: 310–313.
Farmer, B. T. III, and Venters, R. A., 1995, J. Am. Chem. Soc. 117: 4187–4188.
Farmer, B. T. III, and Venters, R. A., 1996, J. Biomol. NMR 7: 59–71.
Feng, W., Rios, C. B., and Montelione, G. T., 1996, J. Biomol. NMR 8: 98–104.
Feng, W., Tejero, R., Zimmerman, D. E., Inouye, M., and Montelione, G. T., 1998, Biochemistry 37: 10881–10896.
Friedrichs, M. S., Mueller, L., and Wittekind, M., 1994, J. Biomol. NMR 4: 703–726.
Garrett, D. S., Powers, R., Gronenborn, A. M., and Clore, G. M., 1991, J. Magn. Reson. 95: 214–220.
Gehring, K., and Guittet, E., 1995, J. Magn. Reson. Ser. B 109: 206–208.
Grzesiek, S., Anglister, J., Ren, H., and Bax, A., 1993, J. Am. Chem. Soc. 115: 4369–4370.
Grzesiek, S., and Bax, A., 1992a, J. Am. Chem. Soc. 114: 6291–6293.
Grzesiek, S., and Bax, A., 1992b, J. Magn. Reson. 99: 201–207.
Grzesiek, S., and Bax, A., 1993, J. Biomol. NMR 3: 185–204.
Grzesiek, S., and Bax, A., 1995, J. Biomol. NMR 6: 335–339.
Hare, B. J., and Prestegard, J. H., 1994, J. Biomol. NMR 4: 35–46.
Ikura, M., Kay, L. E., and Bax, A., 1990, Biochemistry 29: 4659–4667.
Kay, L. E., Ikura, M., Tschudin, R., and Bax, A., 1990, J. Magn. Reson. 89: 496–514.
Kay, L. E., Keifer, P., and Saarinen, T., 1992, J. Am. Chem. Soc. 114: 10663–10665.
Kumar, V., 1992, Artif. Intell. Mag. Spring: 32–44.
Laity, J. H., Lester, C., Shimotakahara, S., Zimmerman, D. E., Scheraga, H. A., and Montelione, G. T., 1997, Biochemistry 36: 12683–12699.
Li, Y. C., and Montelione, G. T., 1993, J. Magn. Reson. Ser. B 101: 315–319.
Lyons, B. A., and Montelione, G. T., 1993, J. Magn. Reson. Ser. B 101: 206–209.
Lyons, B. A., Tashiro, M., Cedergren, L., Nilsson, B., and Montelione, G. T., 1993, Biochemistry 32: 7839–7845.
Mackworth, A. K., 1977, Artif. Intell. 8: 99–118.
Marion, D., Ikura, M., Tschudin, R., and Bax, A., 1989, J. Magn. Reson. 85: 393–399.
Meadows, R. P., Olejniczak, E. T., and Fesik, S. W., 1994, J. Biomol. NMR 4: 79–96.
Mittard, V., Morelle, N., Brutscher, B., Simorre, J.-P., Marion, D., Stein, M., Jacquot, J.-P., Lirsac, P.-N., and Lancelin, J.-M., 1995, Eur. J. Biochem. 229: 473–485.
Montelione, G. T., Lyons, B. A., Emerson, S. D., and Tashiro, M., 1992, J. Am. Chem. Soc. 114: 10,974–10,975.
Montelione, G. T., and Wagner, G., 1989, J. Am. Chem. Soc. 111: 5474–5475.
Montelione, G. T., and Wagner, G., 1990, J. Magn. Reson. 87: 183–188.
Morelle, N., Brutscher, B., Simorre, J. P., and Marion, D., 1995, J. Biomol. NMR 5: 154–160.
Morris, G., 1980, J. Am. Chem. Soc. 102: 428–429.
Moy, F. J., Seddon, A. P., Campbell, E. B., Böhlen, P., and Powers, R., 1995, J. Biomol. NMR 6: 245–254.
Muhandiram, D. R., and Kay, L. E., 1994, J. Magn. Reson. Ser. B 103: 203–216.
Nagayama, K., 1986, J. Magn. Reson. 69: 508–510.
Newkirk, K., Feng, W., Jiang, W., Tejero, R., Emerson, S. D., Inouye, M., and Montelione, G. T., 1994, Proc. Natl. Acad. Sci. U. S. A. 91: 5114–5118.
Olejniczak, E. T., and Fesik, S. W., 1994, J. Am. Chem. Soc. 116: 2215–2216.
Olson, J. B., Jr., and Markley, J. L., 1994, J. Biomol. NMR 4: 385–410.
Rico, M., Bruix, M., Santoro, J., Gonzalez, C., Neira, J. L., Nieto, J. L., and Herranz, J., 1989, Eur. J. Biochem. 183: 623–638.
Rios, C. B., Feng, W., Tashiro, M., Shang, Z., and Montelione, G. T., 1996, J. Biomol. NMR 8: 345–350.
Robertson, A. D., Purisima, E. O., Eastman, M. A., and Scheraga, H. A., 1989, Biochemistry 28: 5930–5938.
Santoro, J., Gonzalez, C., Bruix, M., Neira, J. L., Nieto, J. L., Herranz, J., and Rico, M., 1993, J. Mol. Biol. 229: 722–734.
Santoro, J., and King, G. C., 1992, J. Magn. Reson. 97: 202–207.
Shang, Z., Swapna, G. V. T., Rios, C. B., and Montelione, G. T., 1997, J. Am. Chem. Soc. 119: 9274–9278.
Shimotakahara, S., Rios, C. B., Laity, J. H., Zimmerman, D. E., Scheraga, H. A., and Montelione, G. T., 1997, Biochemistry 36: 6915–6929.
Simorre, J.-P., Brutscher, B., Caffrey, M. S., and Marion, D., 1994, J. Biomol. NMR 4: 325–333.
Swapna, G. V. T., Rios, C. B., Shang, Z., and Montelione, G. T., 1997, J. Biomol. NMR 9: 105–111.
Szyperski, T., Wider, G., Bushweller, J. H., and Wüthrich, K., 1993, J. Am. Chem. Soc. 115: 9307–9308.
Tashiro, M., Rios, C. B., and Montelione, G. T., 1995, J. Biomol. NMR 6: 211–216.
Tashiro, M., Tejero, R., Zimmerman, D. E., Celda, B., Nilsson, B., and Montelione, G. T., 1997, J. Mol. Biol. 272: 573–590.
Wagner, G., and Wüthrich, K., 1982, J. Mol. Biol. 155: 347–366.
Wishart, D. S., Bigam, C. G., Yao, J., Abildgaard, F., Dyson, H. J., Oldfield, E., Markley, J. L., and Sykes, B. D., 1995, J. Biomol. NMR 6: 135–140.
Wittekind, M., Metzler, W. J., and Mueller, L., 1993, J. Magn. Reson. Ser. B 101: 214–217.
Wittekind, M., and Mueller, L., 1993, J. Magn. Reson. Ser. B 101: 201–205.
Wlodawer, A., Svensson, L. A., Sjölin, L., and Gilliland, G. L., 1988, Biochemistry 27: 2705–2717.
Wüthrich, K., 1986, NMR of Proteins and Nucleic Acids, Wiley, New York.
Yamazaki, T., Forman-Kay, J. D., and Kay, L. E., 1993, J. Am. Chem. Soc. 115: 11,054–11,055.
Yamazaki, T., Lee, W., Revington, M., Mattiello, D. L., Dahlquist, F. W., Arrowsmith, C. H., and Kay, L. E., 1994, J. Am. Chem. Soc. 116: 6464–6465.
Yamazaki, T., Pascal, S. M., Singer, A. U., Forman-Kay, J. D., and Kay, L. E., 1995, J. Am. Chem. Soc. 117: 3556–3564.
Zimmerman, D. E., Kulikowski, C. A., and Montelione, G. T., 1993, Proc. 1st Int’l Conf. Intell. Syst. Mol. Biol. 1: 447–455.
Zimmerman, D. E., Kulikowski, C. A., Wang, L. L., Lyons, B. A., and Montelione, G. T., 1994, J. Biomol. NMR 4: 241–256.
Zimmerman, D. E., Kulikowski, C. A., Feng, W., Tashiro, M., Powers, R., and Montelione, G. T., 1997, J. Mol. Biol. 269: 592–610.
Zimmerman, D. E., and Montelione, G. T., 1995, Curr. Opin. Struct. Biol. 5: 664–673.
4
Calculation of Symmetric Oligomer Structures from NMR Data
Seán I. O’Donoghue and Michael Nilges
Seán I. O’Donoghue and Michael Nilges • European Molecular Biology Laboratory, D-69012 Heidelberg, Germany. Biological Magnetic Resonance, Volume 17: Structure Computation and Dynamics in Protein NMR, edited by Krishna and Berliner. Kluwer Academic / Plenum Publishers, New York, 1999.
1. SUMMARY
The size range of proteins amenable to NMR spectroscopy has extended to the point where protein oligomer structures are now being routinely determined. Many are symmetric; we found 36 symmetric oligomers solved by NMR in the present protein structure database: 32 dimers, 2 tetramers, 1 pentamer, and 1 hexamer. Hence, we anticipate that an increasing number of symmetric oligomer structures will be studied in the future. Since symmetry-related nuclei have degenerate chemical shift, the resonance assignment problem for symmetric oligomers is simplified compared with asymmetric molecules of similar size. However, the NOESY assignment and structure calculation are much more difficult, mainly due to difficulty in distinguishing among intra-, inter-, and comonomer (mixed) NOE signals. For dimers, this difficulty can be overcome using asymmetric labeling, but ambiguity remains for higher-order oligomers. In this chapter we focus on a calculation method, called the symmetry-ADR method, that we have developed for overcoming these difficulties. The main features of the method are the use of special restraints to specify the oligomeric symmetry, the use of ambiguous distance restraints (ADRs) to represent the ambiguous NOEs, and the use of novel annealing protocols for the structure calculation. We discuss in detail several structure calculations we have made with this method. We also briefly review the structure calculation methods used in all of the symmetric oligomers solved by NMR to date; the majority have been solved by using aspects of the symmetry-ADR method. We conclude that the symmetry-ADR method has proven to be useful and capable of producing accurate structures. However, our experience cautions us that the calculation of symmetric oligomers by NMR remains challenging, particularly for higher-order oligomers.
2. INTRODUCTION
We discuss the different classes of symmetry that can occur in protein oligomers, how symmetry complicates NMR structure determination, and why experimental methods for breaking symmetry cannot fully address the problem.
2.1. Symmetry in Macromolecular Aggregates
When identical macromolecules aggregate symmetrically, the most favorable
intermolecular contact surface is completely buried from the solvent. Hence, symmetric aggregates are usually energetically more favorable than asymmetric aggregates, in which some of the favorable contact surface must be exposed to the solvent. For in vivo protein complexes, this implies that evolution will tend toward symmetric arrangements. Thus, most protein aggregates of identical subunits we
observe are symmetric. Three different classes of symmetry can occur in macromolecular aggregates: space group, linear group, or point group symmetries. In each case, the aggregate is comprised of identical macromolecular subunits all related by geometrical transformations, which satisfy the requirements of a group, as defined in mathematical group theory. Space group symmetry occurs in crystals and is defined by rotation and translation operators, and a specification of the unit cell. In vivo, proteins rarely aggregate into crystals. More common is linear group symmetry, which occurs in protein fibers, viruses, and in filamentous phages; this symmetry is defined by rotation and translation operators. Due to the high molecular weight of protein crystals and fibers, it is generally not possible to determine
their structures at atomic resolution by NMR, although some structural properties
can be determined by solid-state NMR techniques (e.g., Facelli and Grant, 1993; Phillips et al., 1991). Most symmetric protein aggregates in vivo have point group symmetry, which
forms symmetric oligomers. Many of these oligomers are within the size range amenable to NMR structure determination. A point group is defined by specifying
one or more symmetry axes and the rotation operators that relate the monomers arranged around each axis. For example, point group 2 indicates two identical monomers related by a twofold (180°) rotation around one symmetry axis; point group 3 indicates three identical monomers related by a threefold (120°) rotation
around one symmetry axis; point group 22 (equivalently, 222) indicates a dimer of dimers, i.e., two identical dimers (of point group 2) related by a twofold rotation around another twofold symmetry axis; point group 32 indicates a trimer of dimers, i.e., three identical symmetric dimers related by a threefold symmetry axis (see Fig. 1). For symmetric macromolecular aggregates, only the following point groups are possible: n, n2, 23, 432, 532, where n is any positive integer (Weyl, 1952). In the present PDB* (protein structure database; Bernstein et al., 1977), all these are represented (Schirmer, 1978).
Some protein oligomers are quasi-symmetric; i.e., the monomers are chemically identical, but each has a slightly different conformation. Quasi-symmetries can be reliably detected by X-ray crystallography. However, in NMR spectroscopy, quasi-symmetries can only be detected when the distinct conformations are long-lived compared with the NMR time scale; when each monomer exchanges rapidly
between the different conformations, only one average conformation will be seen in the NMR spectra, and the molecule will incorrectly appear to be completely symmetric. An example of a quasi-symmetry that cannot be detected by NMR occurs in the central asparagine residue of some leucine zipper homodimers: in one crystal structure of the GCN4 leucine zipper, the molecule is entirely symmetric except for the side chain of N16 (PDB code 1tza; O’Shea et al., 1991). In another crystal
structure, the entire structure has symmetric electron density, but the density for N16 cannot be fitted by a single conformation, indicating that the side chain is disordered (Konig and Richmond, 1993). This residue occurs at the dimer interface and interacts with its symmetry mate on the other monomer. A symmetric conformation of the two side chains would lead to steric overlap; hence the side chains exchange between two asymmetric conformations. In contrast, NMR studies of GCN4 and other closely related leucine zippers show only one set of resonances for this residue (Junius et al., 1996; Atkinson et al., 1991; Saudek et al., 1991; Oas
et al., 1990), indicating that the exchange must occur rapidly on the NMR time scale. The fast exchange could be confirmed by hydrogen-exchange and relaxation measurements (MacKay et al., 1996; King, 1996; Junius et al., 1995).
Other oligomers are pseudosymmetric; i.e., the monomers are chemically distinct but are arranged nearly symmetrically. Pseudosymmetric oligomers are fairly common; the structures of several have already been determined by NMR. When the proteins have distinct sequences, pseudosymmetry is generally no problem for NMR. However, as the sequence similarity increases, there will be more chemical-shift degeneracy, and the situation approaches true symmetry; hence, determining the structure by NMR becomes more complicated.
*For the meanings of acronyms and symbols used, see the symbols list preceding the references.
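The cyclic point groups n described in this section (point group 2 with a 180° rotation, point group 3 with a 120° rotation, and so on) can be made concrete with a short sketch. It assumes the symmetry axis lies along z and uses hypothetical function names; it only illustrates the geometry and is not part of any structure calculation program.

```python
import math


def rotate_about_z(point, angle_deg):
    """Rotate one (x, y, z) point about the z axis by angle_deg degrees."""
    angle = math.radians(angle_deg)
    x, y, z = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle),
            z)


def build_cyclic_oligomer(monomer, n):
    """Return n copies of a monomer related by an n-fold axis along z.

    monomer is a list of (x, y, z) atom positions; copy k is rotated
    by k * 360 / n degrees, so copy 0 is the input monomer itself.
    """
    return [[rotate_about_z(atom, 360.0 * k / n) for atom in monomer]
            for k in range(n)]


if __name__ == "__main__":
    monomer = [(1.0, 2.0, 3.0), (4.0, 0.5, -1.0)]  # made-up coordinates
    for copy in build_cyclic_oligomer(monomer, 3):
        print([tuple(round(c, 3) for c in atom) for atom in copy])
```

Because every copy is generated by the same rotation, corresponding atoms in all monomers have identical environments, which is the origin of the symmetry degeneracy discussed in the next section.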
2.2. The Problem: Symmetry Degeneracy in NMR Spectra
In NMR spectra of symmetric oligomers, all symmetry-related nuclei have equivalent magnetic environments and therefore are degenerate in chemical shift. Thus, only one monomer is “seen” in the spectra. We refer to this degeneracy as “symmetry degeneracy,” to distinguish it from the more familiar “dispersion degeneracy” that also occurs in asymmetric systems. To determine the number of monomers in a symmetric oligomer generally requires an independent technique, such as sedimentation equilibrium studies or chemical cross-linking.
Symmetry degeneracy greatly simplifies the resonance assignment problem since we only have to assign one monomer. Consequently, in the homonuclear case,
it is feasible to assign symmetric oligomers that are much larger than the present limit for asymmetric structures. As an extreme example, Flynn et al. (1977) recently reported the partial assignment of a symmetric oligomer of 11 monomers, total molecular weight of 91 kDa. Unfortunately, NOESY assignment and structure calculation of symmetric structures are considerably more complicated than for asymmetric structures. The central problem is that it is impossible to distinguish if an NOE cross peak is intramonomer, intermonomer, or comonomer. For trimers and higher-order oligomers there are several different classes of intermonomer NOEs that occur; again, it is impossible to distinguish these classes in symmetry-degenerate NMR spectra (Fig. 2). Hence, with traditional calculation methods, which require explicit assignment of all NOEs, no structure can be determined a priori. A related problem is the reduced number of NOE cross peaks compared to an equivalent-sized asymmetric system; in theory, this can be compensated by decreasing the degrees of freedom searched during the structure calculation, specifying the coordinates for only one monomer together with the symmetry axes. In practice, implementing this approach complicates the structure calculation. There is an additional complication when the point group symmetry is not clear. For example, a tetramer may have either point group 222 or 4, and from the NMR spectra it would not be possible to distinguish between these possibilities; hence we would not know which symmetry to apply during the structure calculation. The point group can sometimes be inferred from stability studies; e.g., for the p53 tetramer, the dimer was observed to be more stable, which suggested point group 222, rather than 4 (Lee et al., 1994).
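The ambiguity described above can be illustrated on a model whose coordinates are known. The sketch below assumes a hypothetical data layout (a list of monomers, each a dictionary from atom name to coordinates) and a simple distance cutoff standing in for NOE observability. It reports which monomers could contribute to a cross peak between two protons, which is exactly the information the symmetry-degenerate spectrum does not provide.

```python
import math


def classify_noe(oligomer, atom_i, atom_j, cutoff=5.0):
    """Classify a cross peak between atom_i and atom_j in a symmetric oligomer.

    oligomer: list of monomers, each a dict mapping atom name -> (x, y, z).
    Only distances from atom_i on the first monomer are examined; for
    symmetry-related monomers all other pairs repeat the same distances.
    Returns a label and the indices of the monomers whose atom_j is
    within the cutoff (index 0 is the intramonomer contribution).
    """
    start = oligomer[0][atom_i]
    contributing = [m for m, monomer in enumerate(oligomer)
                    if math.dist(start, monomer[atom_j]) <= cutoff]
    if not contributing:
        label = "none"
    elif contributing == [0]:
        label = "intramonomer"
    elif 0 not in contributing:
        label = "intermonomer"
    else:
        label = "comonomer"
    return label, contributing


if __name__ == "__main__":
    # Made-up twofold-symmetric dimer (second monomer rotated 180 deg about z).
    dimer = [
        {"HA": (0.5, 0.0, 0.0), "HB": (1.5, 0.0, 0.0), "HD": (-3.0, 0.0, 0.0)},
        {"HA": (-0.5, 0.0, 0.0), "HB": (-1.5, 0.0, 0.0), "HD": (3.0, 0.0, 0.0)},
    ]
    print(classify_noe(dimer, "HA", "HB", cutoff=1.5))
    print(classify_noe(dimer, "HA", "HD", cutoff=3.0))
    print(classify_noe(dimer, "HA", "HB", cutoff=5.0))
```

For a trimer or higher-order oligomer the list of contributing monomer indices can contain several different intermonomer classes, which corresponds to the additional ambiguity noted in the text.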
2.3. Reducing Symmetry Degeneracy with Asymmetric Labeling
Experimentally, symmetry degeneracy is a distinct and more fundamental problem than dispersion degeneracy. While dispersion degeneracy can be reduced using higher field strengths, better acquisition, or better sample conditions, symmetry degeneracy cannot. Several experimental approaches have been proposed to break the symmetry by mixing labeled with unlabeled monomers (“asymmetric labeling”). These experiments can specifically identify which NOEs are intermonomer by analyzing difference spectra in the case of labeling (Arrowsmith et al., 1991) or using X-filtered spectroscopy for and labeling (Folmer et al., 1995a; Folkers et al., 1993).
However, these approaches have several limitations. First, it is sometimes difficult to achieve full mixing of labeled and unlabeled monomers in the oligomer. Second, the difference spectra and X-filtered experiments have reduced signal-to-noise and may have strong artifacts; this may be improved through careful design of the experiment (Folmer et al., 1995a), but interpreting these spectra still requires a great deal of caution as the artifacts can lead to serious errors in the final structure
(Clore et al., 1995). The lower signal-to-noise may result in comonomer NOEs being incorrectly assigned as purely intramonomer; when there are very few purely
intermonomer NOEs, the structure determination is considerably more complicated. A final limitation is that the method cannot distinguish between the different classes of intermonomer NOEs that occur in trimers and higher-order oligomers; it can only distinguish between intermonomer and intramonomer NOEs. This is usually sufficient to enable the monomer structure to be calculated, and for dimers it is generally sufficient to enable complete structure determination. However, for higher-order oligomers the data remain highly ambiguous; determining the structure requires a special calculation method.
3. THE SYMMETRY-ADR CALCULATION METHOD
In this section we describe the symmetry-ADR method we have developed for calculating symmetric oligomer structures. The method has three main features: the use of symmetry restraint terms to enforce correct symmetry; the use of ambiguous distance restraints to describe the ambiguity in the NOEs arising from symmetry
degeneracy; and finally, the use of specific annealing protocols to actually run the structure calculation.
3.1. Symmetry Restraint Terms
Throughout the structure calculation, each monomer is represented by a separate set of coordinates; the symmetry is enforced using two restraint terms. The first term forces the monomers to be (very nearly) identical and uses the NCS (“non-crystallographic symmetry”) restraint option in X-PLOR (Brünger, 1993). This restrains each atom to the average position over all monomers, using the following potential:
$$E_{\mathrm{NCS}} = k_{\mathrm{NCS}} \sum_{m=1}^{M} \sum_{a=1}^{A} \left| \mathbf{x}_{am} - \bar{\mathbf{x}}_{a} \right|^{2} \qquad (1)$$
where $k_{\mathrm{NCS}}$ is the force constant, $\mathbf{x}_{am}$ are the Cartesian coordinates of the ath atom on the mth monomer after superimposing onto the first monomer, $\bar{\mathbf{x}}_{a}$ are the averages of the superimposed coordinates, A is the total number of atoms in one monomer, and M is the number of monomers.
The second restraint term ensures a symmetric arrangement of the identical
monomers using distance symmetry (DSYM) restraints (O’Donoghue et al., 1996; Nilges, 1993). In this potential, we specify some number, S, of atom pairs, $i_s$ and $j_s$, chosen from one monomer; then considering all symmetry-related atoms, we restrain all equivalent intermonomer distances to be equal. Which distances need to be included depends on the point group. For a dimer, we need only restrain two intermonomer distances for each pair of atoms; i.e., the following difference should be zero:
$$\Delta_s = d\left(i_s^{(1)}, j_s^{(2)}\right) - d\left(i_s^{(2)}, j_s^{(1)}\right) \qquad (2)$$
where $d(i_s^{(1)}, j_s^{(2)})$ indicates the distance between the atom $i_s$ on the first monomer, and the atom $j_s$ on the second monomer [similarly for $d(i_s^{(2)}, j_s^{(1)})$]. For higher order oligomers, several different intermonomer distances need to be restrained for each pair of atoms. Table 1 shows the equivalent distance pairs for all oligomers up to a hexamer. When the point group is not known (e.g., for a tetramer the point group could be either 4 or 222), it would be necessary to do a separate series of structure calculations for each possible point group; the correct point group should give the lowest-energy structures. The equivalent distance pairs are restrained using the following “soft-square” potential that switches from an initially square-well potential to asymptotic behavior for large deviations:
$$E_{\mathrm{DSYM}} = k_{\mathrm{DSYM}} \sum_{s=1}^{S} \begin{cases} \Delta_s^{2}, & |\Delta_s| \le r_{sw} \\ a + b\,|\Delta_s|^{-1} + c\,|\Delta_s|, & |\Delta_s| > r_{sw} \end{cases} \qquad (3)$$
where a and b are determined by the requirement that the function is continuous and differentiable at the switching distance $r_{sw}$, and c is the slope of the asymptote. Since the NCS restraint term keeps the monomers identical, only a small number of atom pairs need to be restrained. We use one pair of atoms per residue in the monomer, with the atoms $i_s$ systematically set to each and the atoms $j_s$ chosen at random. A more efficient method would be to have only one set of monomer coordinates; however, this would require explicitly defining the symmetry axes at the beginning of the calculation. Currently, this approach is not implemented in X-PLOR. The advantage of our approach (separate coordinates for each monomer
and enforcing symmetry using the two terms above) is that the symmetry axes do not need to be defined; they evolve implicitly during the structure calculation, driven by the NOE data.
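The two symmetry restraint terms of this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the X-PLOR implementation: the monomers passed to `ncs_energy` are assumed to be already superimposed onto the first monomer, the force constant and the function names are hypothetical, and only the dimer case of the distance symmetry difference is shown.

```python
import math


def ncs_energy(superimposed, k_ncs=1.0):
    """Sum of squared deviations of each atom from its average position.

    superimposed: list of M monomers, each a list of A (x, y, z) tuples,
    assumed to be already superimposed onto the first monomer.
    """
    n_monomers = len(superimposed)
    n_atoms = len(superimposed[0])
    energy = 0.0
    for a in range(n_atoms):
        mean = [sum(mono[a][c] for mono in superimposed) / n_monomers
                for c in range(3)]
        for mono in superimposed:
            energy += sum((mono[a][c] - mean[c]) ** 2 for c in range(3))
    return k_ncs * energy


def dsym_difference_dimer(mono1, mono2, i, j):
    """Difference between the two equivalent intermonomer distances.

    d(i on monomer 1, j on monomer 2) - d(i on monomer 2, j on monomer 1);
    this is zero for a perfectly twofold-symmetric dimer.
    """
    return math.dist(mono1[i], mono2[j]) - math.dist(mono2[i], mono1[j])


if __name__ == "__main__":
    mono1 = [(1.0, 0.5, 0.0), (2.0, -1.0, 1.5)]          # made-up coordinates
    mono2 = [(-x, -y, z) for (x, y, z) in mono1]         # 180 deg about z
    print(ncs_energy([mono1, mono1]))                    # identical copies
    print(dsym_difference_dimer(mono1, mono2, 0, 1))     # symmetric dimer
```

Note that the distance symmetry term never refers to the position or direction of the symmetry axis, which is why the axis can evolve implicitly during the calculation.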
3.2. Ambiguous Distance Restraints (ADRs)
The second major problem of symmetric oligomers is how to treat ambiguous NOEs that arise from symmetry degeneracy. Our approach is to use the same formalism as for ambiguous NOEs arising from dispersion degeneracy in asymmetric systems (Nilges and O’Donoghue, 1998; Nilges, 1997, 1996, 1995), simply extending the summation to include intermonomer contributions (O’Donoghue et al., 1993; Nilges, 1993). This approach is described in detail below. For the nth NOE cross peak of volume $V_n$, upper and lower distance bounds, $U_n$ and $L_n$, are calculated as follows:
$$U_n = d_{\mathrm{ref}} \left( V_n / V_{\mathrm{ref}} \right)^{-1/6} + \Delta^{+} \qquad (4)$$
$$L_n = d_{\mathrm{ref}} \left( V_n / V_{\mathrm{ref}} \right)^{-1/6} - \Delta^{-} \qquad (5)$$
where $d_{\mathrm{ref}}$ and $V_{\mathrm{ref}}$ are the distance and volume of a known reference; $\Delta^{+}$ and $\Delta^{-}$ are error estimates on the upper and lower limit bounds, respectively. The cross peak between a pair of methylene protons is often used as the reference. However, this is not the best choice as this distance is very short and fixed, whereas most of the structurally important NOEs are longer and hence are differently affected by spin diffusion and internal dynamics. A better reference is to define $d_{\mathrm{ref}}$ as the average of the characteristic backbone–backbone distances within assigned secondary structure elements, and $V_{\mathrm{ref}}$ as the arithmetic average of the corresponding NOE volumes. Once an initial set of structures has been calculated, for subsequent iterations $d_{\mathrm{ref}}$ and $V_{\mathrm{ref}}$ can be defined using all proton pairs less than, say, 6 Å apart (Nilges et al., 1997). Several definitions for $\Delta^{+}$ and $\Delta^{-}$ have been tried; empirically, we have found to be a good starting point for asymmetric structures. In practice, it is very important to ensure that the error estimate is set correctly when using our protocols. When the error bounds are too generous, the restraints do not discriminate between the correct structure and many other possibilities; the restraints may even be satisfied in a monomer. In these cases, the calculation gives an ensemble of structures with high RMSD. With too tight error bounds, the calculation may converge to an incorrect conformation with high energy; unfortunately, it is not easy to define what constitutes “high” energy, as it depends on the data quality and on the force field used. For each NOE peak, we apply one ambiguous distance restraint (ADR) for each
monomer in the oligomer. For the mth restraint from the nth NOE, we restrain the following “d–6-summed distance”:
$$D_{mn} = \left( \sum_{i} \sum_{m'=1}^{M} \sum_{j} d(i,j)^{-6} \right)^{-1/6} \qquad (6)$$

where the sum over i is over all dispersion-degenerate protons from the mth monomer, on the F1 axis; the sum over j is over all dispersion-degenerate protons from the $m'$th monomer, on the F2 axis; and M is the number of monomers in the oligomer. Hence, the resulting distance restraint set (DRS) has $M \times N$ restraints, where N is the number of NOE cross peaks. In some of our previous work (Folkers et al., 1994; Nilges, 1993), we used one restraint per NOE; this requires another summation over m, and division by M (in X-PLOR this can be done automatically by setting the monomer parameter to M). However, we now suggest using a separate restraint for each monomer (i.e., M restraints per NOE with the monomer parameter set to 1), since it is then easier to include data from asymmetric labeling experiments: if we know that a peak is not intramonomer, the intramonomer contribution to Eq. (6) is not calculated (i.e., the sum over $m'$ excludes $m' = m$).
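As an illustration only, the two steps described above (calibrating distance bounds from a peak volume against a reference, and combining all assignment possibilities of one peak into a single $d^{-6}$-summed distance) can be sketched in Python. The function and parameter names are hypothetical and are not part of X-PLOR or ARIA; the sketch assumes the usual inverse-sixth-power relation between NOE volume and distance.

```python
def calibrated_bounds(volume, ref_distance, ref_volume, err_upper, err_lower):
    """Return (lower, upper) distance bounds for one NOE peak volume.

    Assumes volume is proportional to distance**-6, calibrated against a
    reference distance and reference volume (hypothetical parameter names).
    """
    d = ref_distance * (volume / ref_volume) ** (-1.0 / 6.0)
    return d - err_lower, d + err_upper


def summed_distance(distances):
    """d^-6-summed distance over all assignment possibilities of one ADR.

    `distances` holds every proton-pair distance that may contribute to the
    peak, intramonomer and intermonomer alike.
    """
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


# A peak 64 times weaker than the reference corresponds to twice the distance.
lower, upper = calibrated_bounds(1.0, 2.5, 64.0, 0.5, 0.5)
```

Because of the inverse-sixth-power weighting, the summed distance is always shorter than the shortest contributing distance and is dominated by it, which is why a single close proton pair is enough to satisfy an ADR.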
During refinement, the model structure is constrained to satisfy the NOE data in the DRS by restraining the distances to be within the corresponding upper and lower bounds using the “soft” potential function (Nilges et al., 1988b) which switches between flat, square, and asymptotic behavior:
where
the upper and lower bounds are determined as described for Eq. (3), is usually set to 1 and the slope of the asymptote is usually set to 2. NOEs that are unambiguous are defined similarly (i.e., M restraints for each peak) but are treated as a separate restraint term (in X-PLOR, this is done by defining two NOE classes); the unambiguous term is usually weighted more strongly than the ambiguous term, and the more stringent square-well potential may be used [effectively setting to infinity in Eq. (7)].
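The qualitative behavior of such a “soft” restraint potential (flat within the bounds, quadratic close to them, and linear far above the upper bound) can be sketched as follows. This is a minimal sketch under stated assumptions and is not a reproduction of Eq. (7): the form of the asymptotic branch, the parameter names, and the default values are all chosen here for illustration, with the two constants `a` and `b` fixed only by requiring that the energy and its slope be continuous at the switching point.

```python
def soft_restraint_energy(d, lower, upper, k=1.0, switch=1.0, slope=2.0):
    """Illustrative flat / square / asymptotic restraint energy.

    d      : restrained (e.g. d^-6-summed) distance
    lower  : lower bound; upper : upper bound
    switch : distance above `upper` where the quadratic branch hands over
             to the asymptotic branch (assumed name and default)
    slope  : limiting slope of the asymptotic branch (assumed name)
    """
    if d < lower:
        return k * (lower - d) ** 2          # square branch below the lower bound
    if d <= upper:
        return 0.0                           # flat region: restraint satisfied
    excess = d - upper
    if excess <= switch:
        return k * excess ** 2               # square branch just above the upper bound
    # Asymptotic branch a + b/excess + slope*excess, with a and b chosen so
    # that value and first derivative match the square branch at `switch`.
    b = (slope - 2.0 * switch) * switch ** 2
    a = 3.0 * switch ** 2 - 2.0 * slope * switch
    return k * (a + b / excess + slope * excess)
```

In this sketch, letting `switch` grow without bound removes the asymptotic branch altogether, so the function reduces to a plain square-well potential; a grossly violated restraint then contributes a force that keeps growing with the violation rather than levelling off.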
3.3. Annealing Protocols
Having defined the symmetry and ADR information, we need a method to search conformational space to find conformations that satisfy these experimental restraints. The annealing protocols we use are derived from standard “molecular dynamical simulated annealing” (MDSA) protocols developed for asymmetric structures with unambiguous distance restraints (Nilges et al., 1988a, 1988c), and
use essentially the same simplified force field (Nilges et al., 1988b). However, we have developed several modified protocols specifically for calculating symmetric oligomers (Table 2). In this section, we discuss the three main protocols.
3.3.1. Ab initio Protocols
The first protocol developed for symmetric oligomers, called MDSA-SO-RPP, was designed to begin with no assumed knowledge of the monomer structure (Nilges, 1993). The protocol begins by generating a monomer structure with a random chain (i.e., random φ, ψ angles); the other monomers are then created with exactly the same coordinates. Thus, the initial conformation trivially satisfies both of the symmetry terms. A later protocol, called MDSA-SO-RXYZ, begins with random Cartesian coordinates, and hence uses a very different weighting scheme to vary the force-field parameters. In both protocols, the calculation is done in three phases. The first phase is a high-temperature conformational search in which nonbonded interactions are greatly reduced by calculating only interactions between Cα atoms using the repel potential (Nilges et al., 1988b) with a slightly increased radius. The weights on the NCS, DSYM, and covalent geometry terms are also reduced. This gives the structure the freedom it needs to move toward a low-energy conformation, and the monomers quickly separate from the initial coincident position. In the second phase, the temperature of the system is slowly
cooled, and the weights on the nonbonded, symmetry, and NOE terms are simultaneously increased. Nonbonded interactions are calculated between all atoms, switching to a smaller radius. In the final phase, the energy of the structure is minimized using weights of 1.0 for all energy terms.

In later versions of these protocols, we have tried starting structures in which the monomers are placed in the correct symmetry by rotations, keeping the center of mass of each monomer at the origin. Particularly for the random Cartesian coordinate structures, this starting orientation is completely unbiased in the initial implicit intra- and intermonomer assignments. This may be of particular advantage for solving oligomers in which the monomers are intricately interwoven.
3.3.2. Beginning with a Known Monomer

We have also developed a variation of the above protocol (called MDSA-SO-WDMR) for the case where a reasonably accurate monomer structure can be calculated before the complete oligomer structure is known (O’Donoghue et al., 1996). This will often be the case, as asymmetric labeling techniques allow intramonomer NOEs to be unambiguously assigned. The protocol begins from a well-defined monomer structure calculated from the intramonomer NOEs with the standard MDSA-AM-RXYZ protocol (Table 2); “well-defined” means that the structure has good covalent geometry and the overall topology is approximately correct. The monomer structure is maintained throughout the oligomer protocol using higher initial weights on the covalent geometry terms, the NCS term, the intramonomer NOEs, and the term restraining the experimentally determined dihedral angles. In this way, many assignments are implicitly made at the first stage of the protocol; many of the ADRs that correspond to intramonomer NOEs will already be satisfied by the monomer structure, and the intermonomer assignment possibilities in Eq. (6) will not contribute to the force driving the structure calculation. In contrast, most of the ADRs that correspond to intermonomer NOEs will not be satisfied, and hence a relatively large force will be applied to which both the intermonomer and intramonomer terms contribute. The initial relative placement of the monomers is important, as it defines the initial weighting of these contributions. For this reason, the monomers are initially placed with the correct symmetry but with a randomized relative orientation. This is done by centering the monomer at the origin and randomly rotating it; the symmetry-related monomers are then generated by applying the appropriate symmetry rotations. During the structure calculation, the weight on the NCS term is kept high, forcing the monomers to move cooperatively.
Except as described above, the calculation proceeds as before. Clearly, this protocol is not as unbiased as the ab initio protocols; however, in some cases, it appears to give better initial convergence.
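The starting placement described above (center the monomer at the origin, rotate it randomly, then generate the symmetry mates by symmetry rotations) can be sketched as below. This is an illustrative sketch only: it assumes a simple cyclic point group with the symmetry axis along z, the function names are hypothetical, and the random rotation (random axis, uniformly drawn angle) is a convenient choice that is not strictly uniform over all orientations.

```python
import math
import random


def centered(coords):
    """Translate coordinates so that their geometric center is at the origin."""
    n = len(coords)
    center = [sum(p[i] for p in coords) / n for i in range(3)]
    return [tuple(p[i] - center[i] for i in range(3)) for p in coords]


def rotate(coords, axis, angle):
    """Rotate points about a unit axis through the origin (Rodrigues formula)."""
    ux, uy, uz = axis
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    rotated = []
    for x, y, z in coords:
        dot = ux * x + uy * y + uz * z
        cx, cy, cz = uy * z - uz * y, uz * x - ux * z, ux * y - uy * x
        rotated.append((x * cos_a + cx * sin_a + ux * dot * (1.0 - cos_a),
                        y * cos_a + cy * sin_a + uy * dot * (1.0 - cos_a),
                        z * cos_a + cz * sin_a + uz * dot * (1.0 - cos_a)))
    return rotated


def random_unit_vector(rng):
    """Draw a random direction from an isotropic Gaussian."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return tuple(c / norm for c in v)


def symmetric_start(monomer, n_fold, rng):
    """Return n_fold copies of a randomly oriented, origin-centered monomer,
    related by rotations of 2*pi/n_fold about the z axis (assumed cyclic symmetry)."""
    base = rotate(centered(monomer), random_unit_vector(rng),
                  rng.uniform(0.0, 2.0 * math.pi))
    return [rotate(base, (0.0, 0.0, 1.0), 2.0 * math.pi * m / n_fold)
            for m in range(n_fold)]


copies = symmetric_start([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 10.0)],
                         2, random.Random(0))
```

Because every copy is centered at the origin, the monomers interpenetrate in the starting structure; this is intended, since it avoids favoring any particular intermonomer assignment, and the monomers separate during the high-temperature phase.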
3.4. Iterative Structure Calculation and Explicit Assignment of ADRs

In our experience, only a small fraction of the structures in the initial ensembles have the correct overall oligomer topology; however, the correct structures usually have the lowest energies. For the annealing method to produce high-quality structures, we require a high rate of convergence to the correct topology. The same problem occurs when using ADRs to calculate asymmetric structures from spectra with high dispersion degeneracy. Unfortunately, the many contributing terms in the ADRs introduce many additional local minima, making it much more difficult to find the correct conformation.

The solution is to use the low-energy structures in the initial ensemble to partially assign the ambiguous NOEs, then calculate a new ensemble of structures using the partial assignments. In this way the convergence toward the correct topology can be iteratively improved until it is high enough that the lowest-energy structures define a high-quality solution structure.

This iterative assignment can be done with ARIA (Nilges and O’Donoghue, 1998; Nilges et al., 1997), which was originally designed for calculating asymmetric structures. The standard criterion for assignment in ARIA is based on an estimate of the relative contributions of the different assignment possibilities to the peak volume. For each assignment possibility, k, which contributes to a given NOE [i.e., each term in the summation on the right-hand side of Eq. (6)], the relative contribution, $C_k$, to the total NOE volume is estimated from the corresponding interproton distances in the ensemble of calculated structures using

$$C_k = \frac{\bar{d}_k^{-6}}{\sum_{a} \bar{d}_a^{-6}}$$

where the sum over a is over all pairs of protons which contribute to the NOE, and $\bar{d}_a$ is the average distance between the given proton pair in the structure ensemble. The assignment possibilities are then reordered according to the $C_k$ values, such that $C_1$ corresponds to the assignment with the largest contribution. We then find the $N_p$ largest contributions such that

$$\sum_{k=1}^{N_p} C_k > p$$

where the cutoff parameter p is gradually reduced over successive iterations, usually starting from 0.999 for the first iteration and reaching a final value of 0.8 in the eighth iteration. The corresponding assignment possibilities are then written out as a new ADR, and a new round of structures is calculated with the new DRS. By applying this “assignment filter,” the ambiguity can be iteratively reduced, giving progressive improvement in convergence and efficiency.
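A minimal sketch of this assignment filter is given below. It is not the ARIA implementation; the function name and the dictionary-based data layout are assumptions made for illustration. Each assignment possibility is weighted by the inverse sixth power of its ensemble-averaged distance, the weights are normalized to relative contributions, and the largest contributions are kept until their cumulative sum exceeds the cutoff p.

```python
def assignment_filter(avg_distances, p):
    """Keep the largest-contributing assignment possibilities of one NOE.

    avg_distances : dict mapping an assignment label to its ensemble-averaged
                    proton-proton distance (hypothetical layout)
    p             : cutoff on the cumulative relative contribution
    Returns the retained labels, ordered by decreasing contribution.
    """
    weights = {label: d ** -6 for label, d in avg_distances.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)

    kept, cumulative = [], 0.0
    for label, weight in ranked:
        kept.append(label)
        cumulative += weight / total
        if cumulative > p:      # enough of the peak volume is accounted for
            break
    return kept


possibilities = {"intra": 3.0, "inter": 6.0, "far": 12.0}
loose = assignment_filter(possibilities, 0.999)   # early iteration, generous cutoff
strict = assignment_filter(possibilities, 0.8)    # final iteration, strict cutoff
```

With the example distances, the 3 Å possibility alone accounts for about 98% of the estimated volume, so the strict cutoff leaves a single, effectively unambiguous assignment, whereas the generous cutoff still retains the 6 Å possibility.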
3.5. Other Restraint Terms

In some cases, particularly when more NOEs are comonomer than intermonomer, the above method can have very low convergence; here we describe some additional restraint terms that can be used to improve convergence in difficult cases.

3.5.1. Packing Restraints
During the calculation, it may happen that the monomers drift too far apart so that the intermonomer terms become negligible. To avoid this, it may be necessary to add an overall “packing” or “collapse” term. Simply restraining all atoms to the
origin with a low weight is sometimes sufficient (see Sec. 4.4 and Nilges, 1995). In the case of leucine zippers, we used a “coiled-coil” packing term, which we found to be important in solving the structure (Sec. 4.3). Such packing terms should not affect the energy landscape close to the correct fold, but merely increase convergence to the correct fold by preventing dissociation of the oligomer.

3.5.2. Comonomer Restraints
NOEs involving protons close to a symmetry axis may be comonomer, i.e., arising from a mixture of several classes of interaction (intramonomer interactions
or the different classes of intermonomer interactions). When the entire interface between two monomers is close to a symmetry axis (e.g., leucine zippers, Sec. 4.3), there will be more comonomer NOEs than pure intermonomer NOEs. In such cases, the intramonomer contributions alone are almost sufficient to satisfy the ADRs, and hence only a weak force is applied between the monomers. Hence, convergence to the correct topology can be particularly low; moreover, even after many iterations using the assignment filter (Sec. 3.4), these NOEs will at best be left ambiguous, or possibly incorrectly assigned as intramonomer.
A solution to this problem is to try to specifically assign comonomer NOEs, and include comonomer restraints in the structure calculation. Our assignment criterion for comonomer NOEs was to consider both the intramonomer and intermonomer assignment possibilities involved in each NOE; if both distances are less than 5 Å in all low-energy structures in the ensemble, we consider that the NOE is comonomer. In this case, we add two additional restraints to the DRS, separately restraining the intramonomer and the intermonomer distances to be less than 5 Å. These restraints usually improve convergence.
Since asymmetric labeling experiments may lead to comonomer NOEs being incorrectly assigned as intramonomer, all NOEs assigned as intramonomer can also be checked in the above manner.
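The comonomer criterion lends itself to a short sketch. The code below is illustrative only: the function names and the tuple layout of the generated restraints are hypothetical, and the per-structure distances are assumed to have been measured beforehand in the low-energy structures of the ensemble.

```python
def is_comonomer(intra_distances, inter_distances, cutoff=5.0):
    """Return True if both the intramonomer and the intermonomer distance
    are below `cutoff` (in angstroms) in every low-energy structure.

    Each argument lists one distance per structure in the ensemble.
    """
    if not intra_distances or not inter_distances:
        return False                      # nothing measured, nothing to assign
    return (all(d < cutoff for d in intra_distances)
            and all(d < cutoff for d in inter_distances))


def comonomer_restraints(noe_id, cutoff=5.0):
    """Two extra upper-bound restraints for a NOE assigned as comonomer:
    one on the intramonomer distance and one on the intermonomer distance."""
    return [(noe_id, "intra", cutoff), (noe_id, "inter", cutoff)]


extra = []
if is_comonomer([3.1, 4.0, 3.6], [4.5, 4.9, 4.2]):
    extra = comonomer_restraints("noe17")
```

Requiring the criterion to hold in all low-energy structures, rather than on average, keeps the assignment conservative: a single structure in which one of the two distances is long is enough to leave the NOE ambiguous.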
3.5.3. Interface Filter
Another approach in cases where the initial convergence is very low is to attempt to identify the residues involved in intermonomer contacts. If all interface residues can be identified, we can design an “interface filter” that can be used to screen out structures that do not have the correct interface. The filter uses the following principle: each interface residue must be close to at least one interface residue on a separate monomer. We measure the summed distance from each interface residue to all other interface residues on other monomers. Structures in which this distance is greater than, say, 9 Å can then be excluded from the assignment analysis, hence improving convergence toward the correct topology. When asymmetric labeling experiments have been done, it may be possible to apply the interface filter from the beginning of the structure calculation, since in general only the interface residues will have ambiguous NOEs. In this case, we can improve convergence greatly by choosing starting conformations that satisfy the interface filter. In the absence of asymmetric labeling data, we may be able to map the interface residues after several rounds of structure calculation; if the data show some tendency to converge toward the correct topology and if the assignment filter (Sec. 3.4) is used carefully enough, only the interface residues will be left as ambiguous after several iterations. The method may work even for particularly difficult DRSs (O’Donoghue et al., 1996).
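One possible reading of the interface filter is sketched below. Several points are assumptions made for illustration and are not specified in the text: each interface residue is represented by a single point (for example, one representative atom), the “summed distance” is taken to be a $d^{-6}$-summed distance so that the closest partner dominates, and the function names are hypothetical.

```python
import math


def summed_distance(distances):
    """d^-6-summed distance; guarded against coincident points."""
    return sum(max(d, 1e-6) ** -6 for d in distances) ** (-1.0 / 6.0)


def passes_interface_filter(interface, cutoff=9.0):
    """Return True if every interface residue is close to at least one
    interface residue on a different monomer.

    interface : list of (monomer_index, (x, y, z)) entries, one per
                interface residue (assumed single-point representation)
    cutoff    : maximum allowed summed distance, in angstroms
    """
    for monomer_a, position_a in interface:
        partners = [math.dist(position_a, position_b)
                    for monomer_b, position_b in interface
                    if monomer_b != monomer_a]
        if not partners or summed_distance(partners) > cutoff:
            return False
    return True


dimer = [(0, (0.0, 0.0, 0.0)), (1, (5.0, 0.0, 0.0))]
keep = passes_interface_filter(dimer)
```

Structures that fail the test would simply be left out of the assignment analysis, or, when the interface residues are known in advance, not used as starting conformations.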
4. EXPERIENCES WITH THE SYMMETRY-ADR METHOD
In this section, we discuss the experience we have had in applying the symmetry-ADR method to calculating symmetric oligomer structures.

4.1. Initial Test Calculations
The method was first tested using three DRSs (Nilges, 1993): a model DRS derived from the crystal structure of the met repressor (1cmc; Rafferty et al., 1989), and two experimental DRSs: one measured for TNCIII, a peptide comprising one EF hand of troponin C (1cta; Kay et al., 1991), and another measured for interleukin 8 (2il8; Clore et al., 1990). These structures are shown in Fig. 3. In these calculations, all NOEs were treated as ambiguous, and all calculations used the MDSA-SO-RPP protocol (Sec. 3.3.1); however, different starting structures were used for each DRS.

The simplest case was that of interleukin 8, where we used the crystal structure (3il8; Baldwin et al., 1991) as the starting structure. All calculations converged to the previously published NMR structure. The crystal structure is about 2 Å RMSD from the NMR structure of the same molecule; the structural rearrangements were
therefore minor and involved mostly a widening of the gap between the two helices by about 2 Å.

We were initially motivated to use the met repressor for testing our method because of the problems encountered by Breg et al. (1990) in solving the solution structure of the Arc repressor, a homologous protein; they were only able to solve the structure by exploiting this homology to partially assign the NOESY spectrum. In both structures, the two monomers are intricately interwoven (Fig. 3); the monomer structure can only be formed by interaction with another monomer. This fold proved to be very challenging for the MDSA-SO-RPP protocol. Calculations starting from identical, superimposed random-chain monomers completely failed to converge. We then tested whether convergence could be achieved starting from structures close to the crystal structure. Two kinds of distortions were applied to the crystal structure: rotating each secondary structure element up to 180° around its own axis, and shifting the sequence up to three residues from its correct position. In both cases, the calculation converged back to the crystal structure.

The calculations with TNCIII gave the first evidence that fully automatic ab initio calculation is feasible. Calculations were started from random chains, with both monomers ideally superimposed. Eight out of 50 structures converged to low energy and correct symmetry. Most nonconverged structures showed completely dissociated monomers; due to the scarcity of intermonomer NOEs and a relatively flexible monomer, almost all NOEs could be satisfied in one monomer alone. A packing restraint might have improved the convergence. The correct dimer structure gave the lowest energy.

4.2. ssDBP Dimer
The first real application of the method was in determining the structure of the single-stranded DNA binding protein (ssDBP) encoded by gene V of the phage M13 (2gvb; Folkers et al., 1994); the structure is shown in Fig. 3. As starting structures, we used precalculated monomer structures placed in approximately the correct orientations, and the incorrect crystal structure (2gn5; Brayer and McPherson, 1983), which is shifted by one to four residues from the correct structure but has correct topology and symmetry. All NOEs were treated as ambiguous, and data from an asymmetric labeling experiment were also used by imposing 6-Å upper limits on intermonomer distances, in addition to using the ADRs derived from the homonuclear experiment. The calculation converged convincingly; ranked in order of total energy, the best 50% had a similarly low energy and the correct topology.

4.3. Leucine Zipper Homodimers

The method was also used in determining the structure of the leucine zipper domain of the Jun homodimer (1jun; Junius et al., 1996; O’Donoghue et al., 1996).
Despite the geometric simplicity of the coiled-coil fold (Fig. 3), the leucine zippers
are a particularly difficult case for NMR structure determination. Due to repetition in sequence and structure, there is high dispersion degeneracy in addition to the symmetry degeneracy. In addition, the entire intermonomer interface is close to the symmetry axis; hence, there are more comonomer NOEs than pure intermonomer NOEs. An additional problem with the Jun DRS was that no asymmetric labeling experiments were done to distinguish intra- and intermonomer NOEs. These
experiments would have been particularly useful to identify NOEs between symmetry-related nuclei, since many of the intermonomer NOEs are of this type.

Initial calculations using the ab initio protocols (Sec. 3.3.1) showed extremely low convergence to the correct fold. Hence, we developed a protocol to exploit prior knowledge we have about the monomer structure (Sec. 3.3.2) and about the dimer symmetry. In developing the new protocol, we did extensive test calculations using model DRSs derived from the crystal structure of the GCN4 LZ homodimer (2zta; O’Shea et al., 1991) and from a model structure of the Jun LZ homodimer (O’Donoghue et al., 1993). These DRSs were designed to have complete symmetry ambiguity and the same number of NOEs per residue as in the Jun DRS, hence mimicking the experimental DRS.

Many backbone–backbone NOEs in a symmetric coiled-coil can be unambiguously assigned as intramonomer (O’Donoghue et al., 1993). Using these NOEs with the MDSA-AM-RXYZ-1.0 protocol, we generated 50 monomer structures for each distance set. These structures were completely helical, with a somewhat variable overall twist. The dimer structures were calculated starting from these monomers, as described in Sec. 3.3.2. The NOEs were classified into three distance categories with upper limits of 3.3, 4.2, and 6.0 Å. The lower limits were set to zero. A packing term was used, restraining the geometric centers of each symmetry-related heptad to be within 10.4 Å using a square-well quadratic potential (Nilges and Brünger, 1991).

From trial calculations with the model DRSs, we were able to optimize the protocol specifically for the coiled-coil geometry; the final protocol, called MDSA-SCC-WDMR, had significantly improved convergence in the initial structure calculation round. The protocol starts with two identical monomers (α helices) arranged in parallel.
Using the final protocol, we generated 50 structures for each of the model DRSs; all structures in the top 50% (ranked in order of total energy) had the correct coiled-coil interface (a and d residues in the interface). The final selected ensembles had good covalent geometry, no NOE violations greater than 0.5 Å, and superimposed closely onto the structures from which the DRSs were derived; the mainchain RMSD were Å for GCN4 and Å for Jun. These numbers give some idea of the expected accuracy of the structure calculation method. Using the experimental DRS for Jun, 50 dimer structures were calculated; again, the top 50% all had the correct coiled-coil interface. The ARIA assignment
filter [ ] was used to produce a new, less ambiguous DRS. Also, several comonomer NOEs were assigned (Sec. 3.5.2). A second round of structures was calculated with the new DRS to give the final structures. The final ensemble had no NOE violations greater than 0.5 Å and good covalent geometry. The ensemble superimposes onto the homologous region of the Fos–Jun crystal structure (Glover and Harrison, 1995) with an average RMSD of 0.9 Å, giving an independent estimate of the accuracy of the ensemble.

4.4. p53 Tetramerization Domain
The tetramerization domain of the tumor suppressor protein p53 (1pes) was solved by Lee et al. (1994) using a modification of the MDSA-SO-RPP protocol to first calculate the dimer structure, and then the tetramer, starting from symmetric random-chain structures and using manual iterative assignment. The structure was also solved by Clore et al. (1sae; 1995, 1994), using a different approach relying much more on manual assignment (described in detail by Gronenborn and Clore, 1995). Encouraged by the results of Lee et al., we have since been using p53 as the standard test for the symmetry-ADR method. Our goal has been to automate the
calculation as far as possible. From the NMR data of Clore et al. (1995), a model DRS was derived by
removing all distance restraints that could only have been obtained by asymmetric labeling, i.e., data between equivalent protons on different monomers. The corresponding NOEs would lie on the diagonal in a standard homo- or heteronuclear NOESY spectrum. Similarly, the hydrogen-bond restraints for the sheets were
removed, since they require assignment of intermonomer NOEs. In contrast, the hydrogen bonds in the helices could be used. In the first part of the calculation, intramonomer lower-bound restraints in the secondary structure elements were used to improve the definition of the helices, and to avoid an incorrect fold of the β-strands into hairpins [cf. the incorrectly folded structures of the met repressor in Nilges (1993)]. In addition, a packing term restraining all atoms weakly to the origin (Nilges, 1995) was employed.

We used the protocol developed for asymmetric ambiguities (Nilges, 1995) without modification. The calculation consists of a sequence of four simulated annealing protocols. First, an approximate structure was calculated, starting from random Cartesian coordinates. This starting structure seemed appropriate for an intricately interwoven oligomer since it contains no systematic bias toward intra- or intermonomer assignment. Because of our negative experiences starting with random chains for the met repressor model study, we did not try the originally published protocol (Nilges, 1993). No chiral information is present in the first part of the protocol. Subsequently, the correct enantiomer was selected (Kuszewski et al., 1992) and the structure was regularized. The structures were then refined twice.

Out of 50 calculated structures, the six lowest-energy structures converged to the correct symmetry. The RMS difference between these structures ranges between 1 and 5 Å, and the interhelical angles range roughly between the values found in the first published structure (Clore et al., 1994) and those found in the subsequently published NMR and X-ray crystal structures (Clore et al., 1995; Cho et al., 1994; Lee et al., 1994); see Fig. 4. This result demonstrates the lack of experimental data in the dimer–dimer interface (many of the NOEs in the dimer–dimer interface are between equivalent protons and were not used in the calculation).

Hence, the data obtained without asymmetric labeling seem insufficient to uniquely determine the interface. Encouragingly, while most of the structures did not converge to the correct symmetry, many of the higher-energy structures contained essentially correct dimers (Fig. 4). Incorrect dimer topology was only found with much higher energies.
5. SYMMETRIC OLIGOMERS SOLVED BY NMR

In this section we briefly discuss the calculation methods used in all of the symmetric oligomer structures solved by NMR to date; the majority have been solved using the symmetry-ADR method or parts of this method. Table 3 lists all symmetric oligomer structures in the PDB (November 1997) which were solved using NMR spectroscopy; most have been solved in the last few years. Almost all structures are dimers. There are only two tetramers, p53 (Clore et al., 1995; Lee et al., 1994) and platelet factor 4/IL-8 chimer (1pfn; Mayo et al., 1995), both point group 22. Only in the last year have higher-order oligomers been
solved: the VTB pentamer (4ull, point group 5; Richardson et al., 1997) and the insulin hexamer (1aiy, point group 32; Chang et al., 1997); these structures are shown in Fig. 1b. Currently, there are no oligomers with point groups 3, 4, or 6, and no heptamer or higher-order oligomer.

Of the 40 reported structure determinations, 26 have used one or both of the symmetry restraint terms, while 16 have used symmetry ADRs. In many cases, the NOEs assigned as intramonomer from asymmetric labeling experiments have been used to build monomer structures. Often, these structures were used to test the remaining ambiguous assignments; those that could not be satisfied as intramonomer were then assigned as intermonomer. This is essentially doing manually what is done automatically at the early stages of the symmetry-ADR method.

In some cases, including the VTB pentamer and the insulin hexamer, the symmetry ambiguity of the NOEs was resolved by reference to previously determined crystal structures; this method is analogous to molecular replacement in X-ray crystallography. While this “molecular reference” method can be effective, it is clearly preferable if the ambiguities in the DRS can be resolved using NMR data alone. In the case of the insulin hexamer, we have recently recalculated the structure using the symmetry-ADR method, without reference to the crystal structure. The resulting structure has the same fold as the crystal structure (O’Donoghue, S. I., Chang, X., Abseher, R., Nilges, M., and Led, J. J., in preparation). These results suggest it may be worthwhile recalculating the other structures solved by reference to crystal structures.

In several cases, the ab initio approach (Sec. 3.3.1) was tried and found to have extremely low convergence, in agreement with our own experiences with interleukin 8 and leucine zippers, and with the model data for the met repressor. In such cases it is best to begin with monomer structures, where possible, and to apply iterative assignment and additional restraints (Secs. 3.4 and 3.5).

6. DISCUSSION

6.1. Problems of the Symmetry-ADR Method

The method has been tested on completely ambiguous NOE DRSs in several model calculations; it has also been applied to five completely ambiguous experimental DRSs to produce novel structures; in each case where crystal structures are available, there is good agreement. In addition, the method has been used with 11 partially ambiguous experimental DRSs, with intermonomer assignments derived either manually or from asymmetric labeling experiments. The method has the appeal that it can be extended to any point group symmetry, and that all information in the spectra can be used to direct the structure calculation, including results of asymmetric labeling. Thus, we conclude that the symmetry-ADR method is a useful
general solution to the symmetric oligomer problem. However, the method has problems with certain types of symmetry. When all or most of the interfacial residues are close to a symmetry axis, as in leucine zippers, there are few purely intermonomer NOEs. In such cases, it is most important to correctly assign the comonomer NOEs—unfortunately, these usually cannot be assigned experimentally. The situation is much improved if X-filtered experiments can identify nuclei that interact with their own symmetry mates (and hence occur close to a symmetry axis). Unfortunately, however, these experiments are prone to artifacts and must be interpreted with care. Thus, during the initial rounds of structure calculation, the force driving toward the correct structure can be very weak, hence the convergence will be low, and the calculation can be quite difficult. In such cases, convergence can be improved by using packing restraints, iterative assignment, comonomer restraints, and the interface filter.
In some cases, such as the DRS without the asymmetric labeling data, the data are simply not sufficient to define a unique structure; the initial round of
structure calculation may then suggest a variety of different solutions, as also observed for the cellulose-binding domain by Xu et al. (1995), using a modified form of the symmetry-ADR method. In such cases, there is a danger that applying the iterative assignment procedure may lead to overfitting the data and may converge to the wrong fold. The question is: how do we know if the DRS is sufficient to define a structure? Clearly, we should apply measures to detect these situations, either internal measures of the information content in the data (e.g., free R-factors;
Brünger et al., 1993), agreement with other spectral information such as chemical shift (Sorimachi et al., 1996), or external criteria which judge the final structures [e.g., the PROSA program, which can recognize incorrect structures (Sippl, 1993)].
6.2. Should Symmetry Restraint Terms Be Used?

In 14 of the symmetric oligomers solved to date, no symmetry restraint terms were used (Table 3). Leaving out the DSYM term usually does not influence the final structure greatly, once many intermonomer NOEs have been unambiguously assigned, since symmetrically applied intermonomer restraints have a similar effect. The DSYM term acts more as a catalyst, increasing convergence to the right fold. A justification for leaving out NCS symmetry may be seen in the fact that, due to thermal motion, the monomers in an oligomer will rarely be exactly symmetric. In contrast, using the DSYM and NCS terms ensures that the final structures have near perfect symmetry. When complete symmetry is observed in the spectra, we argue that it is valid to fit the structure to this observation. If there are regions affected by time-averaged quasi-symmetries, these may show up as having larger NCS energies. The symmetry restraint terms also aid in improving the convergence of the method, and overcome the potential problem of overfitting the data due to the reduced degrees of freedom (Sec. 2.2). When ADRs are used to describe the symmetry-ambiguous NOEs, it is particularly important to use symmetry restraints.
Leaving out the symmetry restraint terms would require a careful investigation to determine if the data content is high enough via a free R-factor calculation. We believe that it is best to apply symmetry restraint terms during the initial structure calculation to solve the basic problems associated with symmetric oligomer structure determination. Having calculated the correct structure, and hence having assigned many intermonomer NOEs, an additional round of structure calculations can be performed without the symmetry restraint terms. Our experience with symmetric oligomers has shown that there may be a value in doing these calculations, as the ensemble of structures produced without symmetry restraints may better represent the internal dynamics within the oligomer (Abseher et al., 1998).

6.3. Alternatives to the Symmetry-ADR Method
Overall, the method shows poor convergence compared to asymmetric cases. This is likely due to strong correlations between the ambiguities of neighboring residues. A minimization method, such as simulated annealing, that moves single atoms (or rigid parts of amino acids) may not be optimal to move larger parts of the structure coherently if a whole set of NOEs needs to be implicitly reassigned. The performance of the symmetry-ADR method may be improved by more powerful minimization techniques, such as torsion-angle dynamics, or by using only one set of coordinates for every monomer, generating the others by strict symmetry. An alternative approach may be possible with self-correcting distance geometry (see Chapter 2 of this volume); to date, no structure determination of a symmetric oligomer has been reported using this approach. This approach is likely to have similar performance to iterative MDSA.
In conclusion, while the determination of symmetric oligomer structures still poses a challenge for NMR spectroscopy, the symmetry-ADR method is often successful, particularly in combination with data from asymmetric labeling experiments. Particularly for higher order oligomers, the use of symmetry ADRs appears to be currently the best approach, indeed often the only approach.
ACKNOWLEDGMENTS. We thank Drs. Robert Hooft and Gert Vriend for help with Sec. 2.1.
SYMBOLS
Abbreviations
ADR ambiguous distance restraint
DRS distance restraint set
DSYM distance symmetry restraints
MDSA molecular dynamical simulated annealing
NCS noncrystallographic symmetry
NOE nuclear Overhauser effect
PDB Brookhaven protein data bank
RMSD root-mean-squared deviation
Symbols
A total number of atoms per monomer
relative contribution of the kth assignment possibility
d(a,b) distance between atoms a and b
ensemble-averaged distance for the ath assignment possibility
summed distance
E energy
k energy constant
lower limit restraint distance corresponding to the nth NOE peak
M total number of monomers
N total number of NOESY cross peaks
p cutoff parameter used in the ARIA assignment filter
upper limit restraint distance corresponding to the nth NOE peak
volume of the nth NOESY cross peak
Cartesian coordinates for the ith atom
REFERENCES
Abseher, R., Horstink, L., Hilbers, C. W., and Nilges, M., 1998, Proteins 31:370. Arrowsmith, C. H., Pachter, R., Altman, R. B., Iyer, S. B., and Jardetzky, O., 1991, Biochemistry 29:6332. Atkinson, R. A., Saudek, V., Huggins, J. P., and Pelton, J. T., 1991, Biochemistry 30:9387. Baker, P. J., Turnbull, A. P., Sedelnikova, S. E., Stillman, T. J., and Rice, D. W., 1995, Structure 3:693. Baldwin, E. T., Weber, I. T., St. Charles, R., Xuan, J.-C., Appella, E., Yamada, M., Matsushima, K.,
Edwards, B. F. P., Clore, G. M., Gronenborn, A. M., and Wlodawer, A., 1991, Proc. Natl. Acad. Sci. USA 88:502. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T., and Tasumi, M., 1977, J. Mol. Biol. 112:535. Bonvin, A. M. J. J., Vis, H., Breg, J. N., Burgering, M. J., Boelens, R., and Kaptein, R., 1994, J. Mol. Biol. 236:328. Brayer, G. D., and McPherson, A., 1983, J. Mol. Biol. 169:565.
Breg, J. N., van Opheusden, J. H., Burgering, M. J., Boelens, R., and Kaptein, R., 1990, Nature 346:586. Brünger, A. T., 1993, X-PLOR Version 3.1, User Manual, Yale University, New Haven, CT.
Brünger, A. T., Clore, M. G., Gronenborn, A. M., Saffrich, R., and Nilges, M., 1993, Science 261:328. Burgering, M. J. M., Boelens, R., Gilbert, D. E., Breg, J. N., Knight, K. L., and Kaptein, R., 1994, Biochemistry 33:15036. Chang, X., Jørgensen, A. M. M., Bardrum, P., and Led, J. J., 1997, Biochemistry 36:9409.
Cho, Y., Gorina, S., Jeffrey, P., and Pavletich, N. P., 1994, Science 265:346. Chung, C. W., Cooke, R. M, Proudfoot, A. E., and Wells, T. N., 1995, Biochemistry 34:9307. Clore, G. M., Appella, E., Yamada, M., Matsushima, K., and Gronenborn, A. M., 1990, Biochemistry
29:1689. Clore, G. M., Omichinski, J. G., Sakaguchi, K., Zambrano, N., Appella, E., and Gronenborn, A., 1995, Science 267:1515. Clore, G. M., Omichinski, J. G., Sakaguchi, K., Zambrano, N., Sakamoto, H., Appella, E., and Gronenborn, A. M., 1994, Science 265:386. Drohat, A. C., Amburgey, J. C., Abildgaard, F, Stanch, M. R., Baldisseri, D., and Weber, D. J., 1996, Biochemistry 35:11577. Eberle, W., Pastore, A., Sander, C., and Rösch, P., 1991, J. Biomol. NMR 1:71. Facelli, J. C., and Grant, D. M., 1993, Nature 365:325. Fairbrother, W. J., Reilly, D., Colby, T. J., Hesselgesser, J., and Horuk, R., 1994, J. Mol. Biol. 242:252. Flynn, P. F, Gollnick, P., and Wand, A. J., 1997, Keystone Symposia, Silverthorne, in Frontiers of NMR in Molecular Biology. V, (G. Wagner, S. W. Fesik, and S. J. Opella, eds.), p. 41. Folkers, P. J. M., Folmer, R. H. A., Konings, R. N. H., and Hilbers, C. W., 1993, J. Amer. Chem. Soc. 115:3798. Folkers, P. J. M., Nilges, M., Folmer, R. H. A., Konings, R. N. H., and Hilbers, C. W., 1994, J. Mol. Biol. 236:229.
Folmer, R. H. A., Hilbers, C. W., Konings, R. N. H., and Hallenga, K., 1995a, J Biomol. NMR 5:427. Folmer, R. H. A., Nilges, M., Konings, R. N. H., and Hilbers, C. W., 1995b, J. Mol. Biol. 236:229.
Fry, E., Acharya, R., and Stuart, D., 1993, Acta Crystallogr. A 49:45.
Glover, J. N. M., and Harrison, S. C., 1995, Nature 373:257. Granier, T., Gallois, B., Dautant, A., Langlois D’Estaintot, B., and Precigoux, G., 1996, Acta Crystallogr. D 52:594. Gronenborn, A. M., and Clore, G. M., 1995, Crit. Rev. Biochem. Mol. Biol. 30:351. Handel, T. M., and Domaille, P. J., 1996, Biochemistry 35:6569. Hard, T., Barnes, H. J., Larsson, C., Gustafsson, J. A., and Lund, J., 1995, Nature Struct. Biol. 2:983. Hinck, A. P., Archer, S. J., Qian, S. W., Roberts, A. B., Sporn, M. B., Weatherbee, J. A., Tsang, M. L., Lucas, R., Zhang, B. L., Wenker, J., and Torchia, D. A., 1996, Biochemistry 35:8517. Jia, X., Grove, A., Ivancic, M., Hsu, V. L., Geiduschek, E. P., and Kearns, D. R., 1996, J. Mol. Biol. 263:259. Junius, F. K., Mackay, J. P., Bubb, W. A., Jensen, S. A., Weiss, A. S., and King, G. F., 1995, Biochemistry 34:6164. Junius, F. K., O’Donoghue, S. I., Nilges, M., Weiss, A. S., and King, G. F., 1996, J. Biol. Chem. 271:13663.
Kay, L. E., Forman-Kay, J. D., McCubbin, W. D., and Kay, C. M., 1991, Biochemistry 30:4323. Kilby, P. M., van Eldik, L. J., and Roberts, G. C, 1996, Structure 4:1041. Kim, K. S., Clark-Lewis, I., and Sykes, B. D., 1994, J. Biol. Chem. 269:32909. King, G. F., 1996, Biophys. J. 71:1. Konig, P., and Richmond, T. J., 1993, J. Mol. Biol. 233:139. Kuszewski, J., Nilges, M., and Brünger, A. T., 1992, J. Biomol. NMR 2:33. Lawrence, M. C., Suzuki, E., Varghese, J. N., Davis, P. C., Van Donkelaar, A.,Tillock, P. A., and Colman, P.M., 1990, EMBO J. 9:9. Lee, W., Harvey, T.-S., Yin, Y, Yau, P., Litchfield, D., and Arrowsmith, C.-H., 1994, Nature Struct. Biol. 1:877. Liang, H., Petros, A. M., Meadows, R. P., Yoon, H. S., Egan, D. A., Walter, K., Holzman, T. F., Robins, T., and Fesik, S. W., 1996, Biochemistry 35:2095. Lodi, P. J., Ernst, J. A., Kuszewski, J., Hickman, A. B., Engelman, A., Craigie, R., Clore, G. M., and Gronenborn, A. M., 1995, Biochemistry 34:9826.
Lodi, P. J., Garrett, D. S., Kuszewski, J., Tsang, M. L., Weatherbee, J. A., Leonard, W. J., Gronenborn, A. M., and Clore, G. M., 1994, Science 263:1762.
MacKay, J. P., Shaw, G. L., and King, G. F., 1996, Biochemistry 35:4867.
MacKenzie, K. R., Prestegard, J. H., and Engelman, D. M., 1997, Science 276:131. Manival, X., Yang, Y., Strub, M. P., Kochoyan, M., Steinmetz, M., and Aymerich, S., 1997, EMBO J. 16:5019. Matsuo, H., Shirakawa, M., and Kyogoku, Y., 1995, J. Mol. Biol. 254:668. Mayo, K. H., Roongta, V, Ilyina, E., Milius, R., Barker, S., Quinlan, C., La Rosa, G., and Daly, T. J., 1995, Biochemistry 34:11399. Meunier, S., Bernassau, J.-M., Guillemot, J.-C., Ferrara, P., and Darbon, H., 1997, Biochemistry 36:4412. Nilges, M., 1993, Proteins 17:297. Nilges, M., 1995, J. Mol. Biol. 245:645. Nilges, M., 1996, Curr. Opin. Struct. Biol. 6:617.
Nilges, M., 1997, Fold. Des. 2:S53. Nilges, M., and Brünger, A. T., 1991, Protein Eng. 4:649. Nilges, M., and O’Donoghue, S. I., 1998, Prog. NMR Spect 32:107. Nilges, M., Clore, G. M., and Gronenborn, A. M., 1988a, FEBS Lett. 239:129. Nilges, M., Clore, G. M., and Gronenborn, A. M., 1988b, FEBS Lett. 229:317. Nilges, M., Gronenborn, A. M., Brünger, A. T., and Clore, G. M., 1988c, Protein Eng. 2:27. Nilges, M., Macias, M., O’Donoghue, S. I., and Oschkinat, H., 1997, J. Mol. Biol. 269:408. Oas, T. G., McIntosh, L. P., O’Shea, E. K., Dahlquist, F. W., and Kim, P. S., 1990, Biochemistry 29:2891. O’Donoghue, S. I., Junius, F. K., and King, G. F, 1993, Protein Eng. 6:557. O’Donoghue, S. I., King, G. F, and Nilges, M., 1996, J. Biomol. NMR 8:196. O’Shea, E. K., Klemm, J. D., Kim, P. S., and Alber, T., 1991, Science 254:539. Pabo, C. O., and Lewis, M., 1982, Nature 298:443. Phillips, L., Separovic, F., Cornell, B. A., Barden, J. A., and dos Remedios, C. G., 1991, Eur. J. Biophys.
19:147. Potts, B. C., Smith, J., Akke, M., Macke, T. J., Okazaki, K., Hidaka, H., Case, D. A., and Chazin, W. J., 1995, Nature Struct. Biol. 2:790. Rafferty, J. B., Somers, W. S., Saint-Girons, I., and Phillips, E. V., 1989, Nature 341:705. Richardson, J. M., Evans, P. D., Homans, S. W., and Donohue-Rolfe, A., 1997, Nature Struct. Biol. 4:190. Rico, M., Jimenez, M. A., Gonzalez, C., De Filippis, V., and Fontana, A., 1994, Biochemistry 33:14834.
Saudek, V., Pastore, A., Castiglione Morelli, M. A., Frank, R., and Gibson, T., 1991, Protein Eng. 4:519. Schirmer, R. H., 1978, in Principles of Protein Structure (C. R. Cantor, ed.), Springer-Verlag, Berlin. Shaw, G. S., Hodges, R. S., and Sykes, B. D., 1992, Biochemistry 31:9572. Sippl, M., 1993, Proteins 17:355.
Skelton, N. J., Aspiras, F., Ogez, J., and Schall, T. J., 1995, Biochemistry 34:5329. Sorimachi, K., Jacks, A. J., Le Gal-Coeffet, M. F., Williamson, G., Archer, D. B., and Williamson, M. P., 1996, J. Mol. Biol. 259:970. Srinivasan, N., White, H. E., Emsley, J., Wood, S. P., Pepys, M. B., and Blundell, T. L., 1994, Structure
2:1017. Starich, M. R., Sandman, K., Reeve, J. N., and Summers, M. F., 1996, J. Mol. Biol. 255:187.
Sticht, H., Auer, M., Schmitt, B., Besemer, J., Horcher, M., Kirsch, T., Lindley, I. J., and Rosch, P., 1996, Eur. J. Biochem 235:26. Sutcliffe, M. J., Dobson, C. M., and Oswald, R. E., 1992, Biochemistry 31:2962. Vis, H., Mariani, M., Vorgias, C. E., Wilson, K. S., Kaptein, R., and Boelens, R., 1995, J. Mol. Biol. 254:692. Walters, K. J., Dayie, K. T., Reece, R. J., Ptashne, M., and Wagner, G., 1997, Nature Struct. Biol. 4:744.
Weyl, H., 1952, Symmetry, Princeton University Press, Princeton, NJ.
Wu, Z. R., Ebrahimian, S., Zawrotny, M. E., Thornburg, L. D., Perez-Alvarado, G. C., Brothers, P., Pollack, R. M., and Summers, M. F, 1997, Science 276:415. Xu, G. Y., Ong, E., Gilkes, N. R., Kilburn, D. G., Muhandiram, D. R., Harris-Brandts, M., Carver, J. P., Kay, L. E., and Harvey, T. S., 1995, Biochemistry 34:6993. Yamazaki, T., Hinck, A. P., Wang, Y. X., Nicholson, L. K., Torchia, D. A., Wingfield, P., Stahl, S. J., Kaufman, J. D., Chang, C. H., Domaille, P. J., and Lam, P. Y., 1996, Protein Sci. 5:495. Zhao, D., Arrowsmith, C. H., Jia, X., and Jardetzky, O., 1993, J. Mol. Biol. 229:735.
5
Hybrid–Hybrid Matrix Method for 3D NOESY–NOESY Data Refinements
Elliott K. Gozansky, Varatharasa Thiviyanathan, Nishantha Illangasekare, Bruce A. Luxon, and David G. Gorenstein
Elliott K. Gozansky, Varatharasa Thiviyanathan, Nishantha Illangasekare, Bruce A. Luxon, and David G. Gorenstein • Sealy Center for Structural Biology and Department of Human Biological Chemistry and Genetics, The University of Texas Medical Branch at Galveston, Texas 77555-1157.
Biological Magnetic Resonance, Volume 17: Structure Computation and Dynamics in Protein NMR, edited by Krishna and Berliner. Kluwer Academic / Plenum Publishers, New York, 1999.
1. INTRODUCTION
In an effort to increase the molecular size boundary imposed on structure determination by NMR spectroscopy, an experiment named 3D NOESY–NOESY was developed by Boelens et al. (1989a). The experiment, similar to its 2D counterpart, utilizes the through-space dipole–dipole coupling to correlate three protons within a pairwise 5-Å radius. In practical terms, it can be thought of as two consecutive 2D NOESY experiments resulting in a correlation between three protons (instead of two protons as found in 2D NOESY experiments). Since the experiment is homonuclear (normally proton), it has the advantage of working on unlabeled samples. Curiously, the experiment was developed long before an efficient means for quantitative data analysis was established. Several data processing methods were proposed, but they either suffered from systematic error or were computationally intensive. A method called the 3D hybrid–hybrid matrix method (Donne et al., 1995a; Zhang et al., 1995) was proposed based on the long-standing 2D hybrid matrix methodology used in 2D NOESY data analysis. Fortunately, the 3D method retained the precision and accuracy of the 2D method with nearly identical computational efficiency.
Making the last pulse of one 2D NOESY experiment the first pulse in a second 2D NOESY creates the 3D NOESY–NOESY experiment. Figure 1 is a pictorial representation of the resulting pulse sequence. There are two incremented evolution periods and two mixing periods, but still only one acquisition period. The 3D cross peak is actually a volume measured in four dimensions: three dimensions of chemical shift plus one dimension of amplitude. Provided all peaks in a spectrum have identical line shape, the maximum amplitude of a 3D peak will be proportional to the 3D volume. In general, this is a poor assumption. The term $V_{ijk}$ denotes a 3D NOESY–NOESY volume correlating spins i, j, and k, in the order of i to j and then j to k. Interaction between k to j and then j to i (kji) as well as interactions of the type i to j and back to i (iji) (back-transfer peak), or i to i to j (iij), could also be detected. These latter two types of interactions are 2D-like in resolution characteristics.
There are a number of ways to mathematically deal with the 3D NOESY–NOESY data. One approach uses the two-spin approximation (Wüthrich, 1986).
In this approximation,
$$V_{ijk} \propto r_{ij}^{-6}\, r_{jk}^{-6} \qquad (1)$$
where $V_{ijk}$ is the 3D NOESY–NOESY volume and $r_{ab}$ is the distance between spins a and b. This method only considers the pairwise interactions, independent of any surrounding atoms. The two-spin approximation, albeit an intuitive description for dipole interactions, ignores the effects of multiple relaxation pathways (spin diffusion). Spin diffusion becomes quite significant with larger molecules. Although a little more difficult to solve, it is critical in any precise and accurate model of NOEs that the entire system be considered. Better techniques than the two-spin approximation utilize an eigenvalue–eigenvector solution to the rate equation (Krishna et al., 1978; Bothner-By and Noggle, 1979; Keepers and James, 1984; Macura and Ernst, 1980; Meadows et al., 1991; Post et al., 1990). First, consider the description of cross peaks found in a 2D NOESY experiment. The 2D NOESY cross-peak volume matrix can be calculated as follows:
$$V(\tau_m) = \exp(-R\tau_m)\,A(0) \qquad (2)$$
where $V(\tau_m)$ is the 2D volume matrix and A(0) is the initial magnetization vector available for NOE transfer. The rate matrix, R, describes the rate of NOE buildup for the entire proton spin system. The intrinsic cross-relaxation rate, $\sigma_{ij}$, between protons i and j is related to the inverse sixth power of the distance between the two protons. This relationship is the source of the two-spin approximation, given in Eq. (1). The rate matrix is a phenomenological description of the dipole interactions between every proton in the system and can be used to calculate the NOE for any single mixing time (2D NOESY) by Eq. (2). Each volume matrix element, denoted as $V_{ij}$, represents the cross peak between protons i and j for a mixing time $\tau_m$, and R contains the cross-relaxation rates for all spin pairs:
$$R = \begin{pmatrix} \rho_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \rho_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \rho_{nn} \end{pmatrix}$$
Since R is a symmetric matrix about the diagonal, cross peaks above and below the spectral diagonal, for the same pair of spins, should be equal (assuming correct experimental parameters).
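The self-relaxation rates (diagonal elements) are given by
$$\rho_{ii} = q \sum_{j \neq i} r_{ij}^{-6}\,\left[J_0(\omega) + 3J_1(\omega) + 6J_2(\omega)\right]$$
and the cross-relaxation rates (off-diagonal elements) are given by
$$\sigma_{ij} = q\, r_{ij}^{-6}\,\left[6J_2(\omega) - J_0(\omega)\right]$$
where $q = 0.1\gamma^4\hbar^2$, $\omega$ is the resonance frequency, and $\gamma$ is the gyromagnetic ratio. The spectral density function
$$J_n(\omega) = \frac{\tau_c}{1 + n^2\omega^2\tau_c^2}$$
which describes the transition probability at frequency $n\omega$, is assumed to depend on a single, overall rotational correlation time $\tau_c$. Provided an invertible matrix, P, exists that can diagonalize R, Eq. (3) can be rewritten as
$$V(\tau_m) = P \exp(-\lambda\tau_m)\,P^{-1} A(0)$$
where $\lambda = P^{-1} R P$ is a diagonal matrix. It is important to recall that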
$$P^{-1}P = I$$
where I is the identity matrix.
The rate equation can be easily extended to three dimensions. Consider the effects of a second mixing time and a third nucleus. Instead of an NOE between protons i and j there will be an NOE between i and j and then j and k (ijk). The initial magnetization can be represented as a column vector, which is often normalized to unity if thermal equilibrium has been reached. After the first NOE mixing period, a 2D matrix is required to describe the magnetization available for the second mixing period. Thus, a 3D NOESY–NOESY cross peak can be described as
$$V_{ijk}(\tau_{m1},\tau_{m2}) = \left[\exp(-R\tau_{m2})\right]_{kj}\,\left[\exp(-R\tau_{m1})\right]_{ji}\,A_i(0) \qquad (8)$$
or, in diagonalized matrix form,
$$V_{ijk}(\tau_{m1},\tau_{m2}) = \left[P\exp(-\lambda\tau_{m2})P^{-1}\right]_{kj}\,\left[P\exp(-\lambda\tau_{m1})P^{-1}\right]_{ji}\,A_i(0)$$
The elements $V_{ijk}$ form the three-dimensional volume matrix produced by the 3D NOESY–NOESY experiment. This type of equation, where the NOE is considered across the whole system, is considered to be exact in describing the cross-peak volume.
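As an illustration of the exact calculation described above, the following is a minimal sketch, not the MORASS implementation, that builds a rate matrix from proton coordinates and evaluates the 2D volume matrix $\exp(-R\tau_m)$ with A(0) set to the identity. The three collinear protons, the 500 MHz frequency, the helper names, and the SI constants used to express q in Å^6 s^-2 are all assumptions made for the example; the standard homonuclear dipolar expressions for the self- and cross-relaxation rates are assumed, and the matrix exponential is evaluated by scaling and squaring rather than by an explicit eigenvalue solution.

```python
import math

# Assumed physical constants (SI). q = 0.1 * gamma^4 * hbar^2, with the
# (mu0 / 4 pi)^2 factor needed in SI, converted from m^6 to Angstrom^6.
GAMMA_H = 2.6752e8       # proton gyromagnetic ratio, rad s^-1 T^-1
HBAR = 1.0546e-34        # reduced Planck constant, J s
MU0_OVER_4PI = 1.0e-7    # T m A^-1
Q = 0.1 * GAMMA_H**4 * HBAR**2 * MU0_OVER_4PI**2 * 1.0e60  # Angstrom^6 s^-2


def spectral_density(n, omega, tau_c):
    """J_n(omega) = tau_c / (1 + (n * omega * tau_c)^2), isotropic tumbling."""
    return tau_c / (1.0 + (n * omega * tau_c) ** 2)


def rate_matrix(coords, omega, tau_c):
    """Relaxation matrix: rho_ii on the diagonal, sigma_ij off the diagonal."""
    n = len(coords)
    j0, j1, j2 = (spectral_density(k, omega, tau_c) for k in (0, 1, 2))
    rates = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                r6 = math.dist(coords[i], coords[j]) ** 6
                rates[i][j] = Q * (6.0 * j2 - j0) / r6              # sigma_ij
                rates[i][i] += Q * (j0 + 3.0 * j1 + 6.0 * j2) / r6  # rho_ii
    return rates


def mat_mul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def mat_exp(m, terms=20):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    squarings = 0
    while norm / 2 ** squarings > 0.5:
        squarings += 1
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def noesy_volumes(rates, tau_m):
    """2D volume matrix exp(-R * tau_m), taking A(0) as the identity."""
    return mat_exp([[-x * tau_m for x in row] for row in rates])


# Hypothetical test system: three protons on a line, 2.5 Angstrom apart.
coords = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
omega = 2.0 * math.pi * 500.0e6           # 500 MHz, in rad/s
R = rate_matrix(coords, omega, tau_c=3.2e-9)
V = noesy_volumes(R, tau_m=0.1)
for row in V:
    print(" ".join(f"{v:9.5f}" for v in row))
```

In this toy system the cross peak between the two outer protons receives most of its intensity through the middle proton, which is the spin-diffusion contribution that the two-spin approximation leaves out.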
In the case of equal mixing times there will be some symmetry in the data resulting from symmetry in the rate matrix. However, it is important to notice that the 3D NOESY–NOESY volumes are not equal through all permutations of the spins; in general, only
$$V_{ijk} = V_{kji} \qquad (10)$$
This can be seen if one considers that A(0) can be arbitrarily set to I. Since R is symmetric,
$$\left[\exp(-R\tau_m)\right]_{ij} = \left[\exp(-R\tau_m)\right]_{ji}$$
However, in general, $\left[\exp(-R\tau_m)\right]_{jk} \neq \left[\exp(-R\tau_m)\right]_{ik}$. Thus, $V_{ijk} \neq V_{jik}$. Based on similar arguments the only equality that necessarily exists in the data is that described by Eq. (10).
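The permutation argument above can be checked numerically. The sketch below assumes A(0) = I and equal mixing times, so that each 3D volume is the product of two elements of $\exp(-R\tau_m)$; the 3 x 3 rate matrix is a hypothetical example (in s^-1), not data from the chapter, and the helper names are illustrative.

```python
def mat_mul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def mat_exp(m, terms=20):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    squarings = 0
    while norm / 2 ** squarings > 0.5:
        squarings += 1
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def noesy_noesy_volumes(rates, tau_m):
    """V[i][j][k] = [exp(-R tau)]_ij * [exp(-R tau)]_jk, with A(0) = I."""
    a = mat_exp([[-x * tau_m for x in row] for row in rates])
    n = len(rates)
    return [[[a[i][j] * a[j][k] for k in range(n)] for j in range(n)]
            for i in range(n)]


# Hypothetical symmetric rate matrix (s^-1): rho on the diagonal, sigma elsewhere.
RATES = [[1.20, -0.70, -0.05],
         [-0.70, 1.60, -0.30],
         [-0.05, -0.30, 0.90]]

V3 = noesy_noesy_volumes(RATES, tau_m=0.1)
print("V_012 =", V3[0][1][2])
print("V_210 =", V3[2][1][0], "(equal to V_012)")
print("V_102 =", V3[1][0][2], "(different from V_012 in general)")
```

Only the reversal of the transfer pathway (ijk to kji) leaves the volume unchanged; exchanging the first two spins changes which cross-relaxation pathway carries the second transfer.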
Since matrix diagonalization can be difficult [and is required for solving Eq. (8)], an approximation can be made using a Taylor series expansion of the rate equation (Boelens et al., 1989a, 1989b; Bonvin et al., 1991a; Habazettl et al., 1992a, 1992b; Holak et al., 1991):
$$\exp(-R\tau_m) = \sum_{n=0}^{\infty} \frac{(-R\tau_m)^n}{n!} = 1 - R\tau_m + \tfrac{1}{2}R^2\tau_m^2 - \cdots$$
Normally, only the first few terms are retained since it becomes intractable to carry out the summation much further. Data resulting from these methods can then be used in distance geometry (Braun and Go, 1983; Havel et al., 1983; Wüthrich, 1986) or restrained molecular dynamics structure refinement (Gorenstein et al., 1990; Nilges et al., 1988; Zuiderweg et al., 1985). Unfortunately, the series converges relatively slowly; thus, the approximation yields systematic error (examined in detail below) (Keepers and James, 1984; Post et al., 1990).
Yip and Case (1989) put forth a gradient method for quantitative analysis of 3D NOESY–NOESY data. However, it scales with the sixth power of the number of spins. Thus, even for a moderately sized system of spins (say 600), the solution becomes computationally prohibitive. Kaptein’s group created an approximation method, based on the gradient method, and successfully refined an eight-residue peptide and the lac repressor headpiece (residues 1–56) (Bonvin et al., 1991b; Slijper et al., 1995). This approximation incorporates a Taylor series expansion of the rate equation in the gradient analysis and still scales with the cube of the number of spins.
2. SIMULATION STUDIES DESCRIBING 3D NOESY–NOESY CROSS PEAKS, APPROXIMATE VERSUS EXACT METHODS
Three-dimensional NOESY–NOESY cross-peak volumes have been handled using the approximation methods discussed above, without critical examination of their quality. In an attempt to quantify the limitations of various approximation methods, the crystal coordinates of an oligonucleotide duplex, Dickerson’s dodecamer, were used to compare the two-spin approximation and the Taylor series expansion approximation (up to four terms) to the exact calculation based on the eigenvalue–eigenvector solution to the rate matrix equation (Donne et al., 1995a). Atomic positions for all atoms in the oligomer were built with standard B-DNA geometry using the program AMBER 3.0 (Weiner and Kollman, 1981), followed by energy minimization as described previously (Nikonowicz and Gorenstein, 1992; Post et al., 1990). Isotropic tumbling was assumed for all studies using two overall correlation times of 1.6 and 3.2 ns. RMS errors were used as a criterion for the statistical analysis of the deviation between the simulated NOESY–NOESY volumes obtained using the “exact” method and the approximate methods. The RMS was defined as
Comparison between the methods was examined using mixing times from 20 to 200 ms. In Fig. 2 scatterplots of approximate volumes versus “exact” volumes are shown. Since geminal cross peaks tend to be large and grouped away from nongeminal ones, only nongeminal protons are displayed in the figure. In Fig. 3A, B, the RMS errors for one-term and two-term approximations for the whole data set have been plotted against mixing times. The first-order approximation, equivalent to the two-spin approximation, reaches an RMS error of 50% at a mixing time around 60 ms (where the error increases with increasing mixing time). With the addition of more terms in the expansion, there is an improvement in the error; however, at useful mixing times the approximations yield significant systematic error. As noted in 2D NOESY simulation studies (Post et al., 1990), it is not the mixing time alone but rather the combined effect of correlation time and mixing time that determines when the approximation fails. This is more obvious in Fig. 3C, D, where the RMS errors are plotted against the product of correlation time and mixing time, $\tau_c\tau_m$. Specifically, the first-order approximation failed (RMS error greater than 50%) above a threshold value of $\tau_c\tau_m$ for both the dodecamer and the decamer.
In the literature, efforts have been made to account for the spin-diffusion effect using the Taylor series approximation approach (Boelens et al., 1989a, 1989b; Kessler et al., 1991). In those approaches, it was assumed that the linear term in the expansion was the direct magnetization transfer term and the second-order term was regarded as spin diffusion through a third spin. However, the assumption cannot explain why, in the Taylor expansion, the terms have alternate signs. If dramatic spin diffusion occurs, then all of the “spin-diffusion” terms (i.e., higher-ordered terms) in the series should contribute to the NOE volume in the same way; the second-order and all larger-order terms should have the same sign in the series. Second, note that cross relaxation will affect the NOESY and NOESY–NOESY
volumes according to an exponential relationship. Therefore, when the exponent is expanded, the direct and indirect magnetization transfers are embedded in every term of the series. Specific terms in the Taylor series (e.g., linear, second-order, and third-order) cannot represent direct magnetization transfer, spin diffusion through a third spin, and spin diffusion through two other spins, respectively. Mathematically, for an expansion approximation to be successful, the series must converge uniformly and quickly. Although the Taylor series converges uniformly, it does not converge as fast as required for NOE volume approximation (Borgias et al., 1990). In the context of NOE simulation, the series usually did not converge after two terms (this would require the values of the third- and higher-order terms to be negligibly small). Therefore, it was inadequate to take one or two terms in the Taylor series to simulate NOESY–NOESY volumes or interpret spin diffusion. Figure 4 shows the oscillatory behavior of approximations by the Taylor series expansion. In situations like this, the exact eigenvector–eigenvalue method should be the method of choice, particularly for 3D NOESY–NOESY experiments.
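The comparison discussed in this section can be reproduced in miniature. The sketch below truncates the Taylor series of each $\exp(-R\tau_m)$ factor after a given order (order 1 is $1 - R\tau_m$, corresponding to the two-spin approximation for cross peaks), forms the 3D volumes as products of two elements with A(0) = I, and compares them with the converged result. The rate matrix is a hypothetical three-spin example, and the percentage RMS used here (root mean square of the relative deviation over all cross peaks with i ≠ j and j ≠ k) is an assumed definition that may differ from the one used for the figures.

```python
import math


def mat_mul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def taylor_exp(m, order):
    """Truncated series sum_{n=0}^{order} m^n / n!."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, order + 1):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


def exact_exp(m):
    """Converged exp(m): scaling and squaring around a 20-term series."""
    norm = max(sum(abs(x) for x in row) for row in m)
    squarings = 0
    while norm / 2 ** squarings > 0.5:
        squarings += 1
    result = taylor_exp([[x / 2 ** squarings for x in row] for row in m], 20)
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def volumes_3d(a):
    """3D cross peaks V_ijk = a_ij * a_jk for i != j and j != k."""
    n = len(a)
    return {(i, j, k): a[i][j] * a[j][k]
            for i in range(n) for j in range(n) for k in range(n)
            if i != j and j != k}


def percent_rms(approx, exact):
    """Assumed definition: RMS of the relative deviation, in percent."""
    rel = [((approx[key] - exact[key]) / exact[key]) ** 2 for key in exact]
    return 100.0 * math.sqrt(sum(rel) / len(rel))


def taylor_error(rates, tau_m, order):
    m = [[-x * tau_m for x in row] for row in rates]
    return percent_rms(volumes_3d(taylor_exp(m, order)), volumes_3d(exact_exp(m)))


# Hypothetical symmetric rate matrix (s^-1).
RATES = [[1.20, -0.70, -0.05],
         [-0.70, 1.60, -0.30],
         [-0.05, -0.30, 0.90]]

for tau_m in (0.02, 0.06, 0.10, 0.20):
    errors = [taylor_error(RATES, tau_m, order) for order in (1, 2, 3, 4)]
    print(f"tau_m = {tau_m:4.2f} s  " +
          "  ".join(f"order {o}: {e:7.3f}%" for o, e in zip((1, 2, 3, 4), errors)))
```

Even for this small system the first-order error grows quickly with mixing time; for a real biomolecule the rates, and therefore the errors at a given mixing time, would depend on the correlation time as described in the text.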
3. HYBRID–HYBRID RELAXATION MATRIX METHOD FOR 3D NOESY–NOESY DATA ANALYSIS
Three-dimensional NOESY–NOESY has shown great promise for the structural refinement of large biomolecules. The NOESY–NOESY data analysis methods, however, have proven to be quite challenging. Several different approaches have been developed to refine structures from 3D NOESY–NOESY spectra (Berstein et al., 1993; Bonvin et al., 1991a; Habazettl et al., 1992a, 1992b; Kessler et al., 1991). Unfortunately, as shown in the previous section, the approximation methods fail at short mixing times, and the direct volume refinement (NOE gradient refinement) methods are computationally demanding for large systems (Bonvin et al., 1991a; Dollwo and Wand, 1993; Yip, 1993; Yip and Case, 1989).
The relaxation matrix approach (Krishna et al., 1978; Bothner-By and Noggle, 1979; Keepers and James, 1984; Macura and Ernst, 1980; Meadows et al., 1991; Post et al., 1990) avoids the two-spin approximation by employing a matrix eigenvalue–eigenvector solution to the Bloch equations. Importantly, cross-relaxation rates evaluated by a matrix method include effects from multiple relaxation pathways (spin diffusion). Distances and structures derived from a matrix method are more precise and accurate (Boelens et al., 1989a, 1989b; Borgias and James, 1990; Nikonowicz et al., 1990). For two-dimensional NMR, accurate distances can be directly obtained by the complete relaxation method by diagonalizing the 2D volume matrix, which represents a 2D NOESY spectrum (Borgias et al., 1990; Post et al., 1990). Unfortunately, the eigenvalue–eigenvector solutions are very sensitive to the accuracy and completeness of the NOESY volume matrix (Post et al., 1990).
A hybrid matrix solution to this problem was originally proposed by Kaptein and co-workers (Boelens et al., 1988, 1989a) and implemented in several programs, for example IRMA (Boelens et al., 1988, 1989a), MARDIGRAS (Borgias et al., 1990), and MORASS (Gorenstein et al., 1990; Meadows et al., 1989). This hybrid matrix approach combines the information from the experimental NOESY volumes and calculated volumes derived from an initial structure. The hybrid volume matrix, $V^{\mathrm{hyb}}$, contains the well-resolved and measurable cross peaks from the experimental NOESY spectrum, while overlapped or weak cross peaks and diagonals are calculated from the cross-relaxation rates. A complete hybrid matrix is necessary for successful matrix diagonalization. The distances derived from the complete rate matrix can then be utilized in a distance geometry or restrained molecular dynamics refinement of the structure. This process of hybridizing the volume matrix and structural refinement is repeated until a satisfactory agreement between the calculated and observed cross-peak volumes is obtained. In various structural refinements, three to six iterations are typically required to achieve convergence within a family of structures consistent with the NOE data.
We have recently demonstrated that this method can be extended to 3D NOESY–NOESY data, and the new method is called the hybrid–hybrid relaxation matrix method (Donne et al., 1995b; Zhang et al., 1995). It represents a more computationally efficient method than the current gradient methods (Bonvin et al., 1991b; Slijper et al., 1995), avoids the assumptions of the various approximation methods, and provides a method for the refinement of larger biomolecules.
3.1. Theory and Methods: Deconvolution of 2D NOESY Volumes from 3D NOESY–NOESY Volumes
To mathematically represent 3D NOESY–NOESY data, various expressions have been suggested (Boelens et al., 1989b; Bonvin et al., 1991a; Borgias et al., 1990; Donne et al., 1995a; Kessler et al., 1991). The most straightforward expression represents the 3D NOESY–NOESY peak as the product of two 2D NOESY peaks:
$$V_{ijk}(\tau_{m1},\tau_{m2}) = V_{ij}(\tau_{m1})\,V_{jk}(\tau_{m2}) \qquad (13)$$
where $V_{ijk}$ is a single 3D volume and $V_{ab}(\tau_m)$ are the 2D volumes, for spins a and b, during the two mixing times $\tau_{m1}$ and $\tau_{m2}$ (Boelens et al., 1989b). This equation is similar in form to the two-spin approximation, except here the effects of spin diffusion are explicitly considered.
The main advantage of a 3D NOE–NOE spectrum over a 2D NOE spectrum is the enhanced spectral resolution provided by the third frequency dimension. Molecules greater than 10 kDa generally have significant spectral overlap. This prevents the measurement of sufficient 2D NOEs to converge to a meaningful structure during refinement. Of course, this has been a major impetus in the development of new 3D and 4D experiments.
Using Eq. (13), a 3D volume matrix can be deconvoluted into two 2D matrices. Theoretically, by Eq. (8), a 2D NOE matrix can be obtained from any single plane in the 3D matrix; however, each plane in the 3D matrix is invariably incomplete. Clearly a 3D volume, $V_{ijk}$, will not be experimentally observable if the distance between spins i and j or spins j and k is greater than 5 Å. However, since each plane will be incomplete in a region different from any other plane, the full data set is recoverable. This is done by deconvoluting each of the incomplete planes in the 3D matrix into separate incomplete 2D planes representing a part of the full 2D NOE volume matrix. When more than one 2D plane contains a value for any one term, these values are averaged to give a single element $V_{ij}$. As an added advantage of this treatment, errors are minimized by averaging over many computed values derived from many 3D volumes $V_{ijk}$.
The hybrid–hybrid algorithm requires the calculation of the 2D volumes $V_{ij}(\tau_{m1})$ from the 3D volumes $V_{ijk}(\tau_{m1},\tau_{m2})$ and the corresponding set of 2D volumes $V_{jk}(\tau_{m2})$ (for equal mixing times, we remove the distinction):
$$V_{ij} = \frac{V_{ijk}}{V_{jk}} \qquad (14)$$
If there is enough spectral resolution in the 3D spectrum, $V_{jk}$ values can be obtained from the cross-diagonal volumes $V_{jjk}$ or $V_{jkk}$, the back-transfer volumes $V_{jkj}$, or 2D volumes measured independently from a well-resolved 2D NOESY spectrum. Often, it will not be possible to experimentally determine sufficient numbers of
these cross peaks for larger biomolecules. In such cases simulated data must be used for the divisor in Eq. (14).
Once again, a hybrid matrix solution to this problem has been fashioned and implemented in a 3D version of MORASS [Multiple Overhauser Relaxation AnalySis and Simulation (Meadows et al., 1989; Post et al., 1990)]. A flowchart of the hybrid–hybrid relaxation matrix method (3D MORASS) is shown in Fig. 5. First, an initial model structure is used to calculate the rate matrix. Then the 2D NOESY and 3D NOESY–NOESY spectra are simulated. The experimental and simulated 3D NOESY–NOESY data are then scaled and merged to create a hybrid 3D data set. Due to the tremendous number of 3D elements in larger biomolecules, the disk file containing only the nonzero elements (ca. 1% to 2% of the total elements) still requires approximately 140 to 280 Mbytes for 600 spins. Therefore, only data required for the method are stored. The 3D hybrid data are then deconvoluted into a 2D volume matrix with elements
$$V_{ij} = \frac{1}{N_{ij}} \sum_{k} \frac{V_{ijk}}{V_{jk}}$$
where $V_{ijk}$ are the experimental 3D cross peaks, nonzero $V_{jk}$ values are obtained from experimental or simulated data, and $N_{ij}$ is the number of terms averaged. Additional experimental or simulated 2D NOESY volumes can then be scaled and merged into the deconvoluted 2D volume matrix to give a complete 2D hybrid–hybrid volume matrix. The rate matrix can then be calculated from the hybrid–hybrid volume matrix using the 2D MORASS relaxation matrix approach, or any other 2D refinement protocol can be used. Note that numerical integration methods can also be used but are generally not as computationally efficient as the relaxation matrix method (Zhao and Jardetzky, 1994). The resulting distances are taken from the cross-relaxation rates, and the distances are then utilized in a distance geometry or restrained molecular dynamics refinement of the structure. The entire process is repeated in an iterative fashion until a satisfactory agreement between the calculated and observed 3D cross-peak volumes is obtained.
Because two independent 3D data sets are merged (i.e., simulated and experimental data), the hybrid–hybrid matrix approach relies heavily on careful scaling. One way to scale the experimental and theoretical volume matrices is to match the volumes of several “markers” (Boelens et al., 1988, 1989a; Nikonowicz et al., 1990). Some of the back-transfer volumes in the 3D spectrum will hopefully be well resolved and correspond to spin pairs of fixed distance, similar to the proton pairs one would use as reference volumes in the 2D hybrid matrix or two-spin methods. We have found it best to use all experimental volumes for scaling. The ratio of the sum of these volumes gives the appropriate scale factor S:
Hybrid-Hybrid Matrix Method for 3D NOESY–NOESY Data Refinements
Elliott K. Gozansky et al.
where the summation is taken over all experimentally integrated volumes. (Various weighting factors may also be introduced.) The refined structure from each cycle becomes the model structure for the next iteration, so S must be reevaluated at every iteration.
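The scaling and merging step can be illustrated with a minimal sketch. It assumes that S is the ratio of summed theoretical to summed experimental volumes (the direction of the ratio is an assumption here, since only the verbal description is available), that peak tables are stored as sparse dictionaries keyed by spin indices, and that scaled experimental volumes simply replace simulated ones. The function names and toy numbers are hypothetical and are not taken from MORASS.

```python
def scale_factor(experimental, theoretical):
    """Ratio of summed theoretical to summed experimental volumes.

    Only peaks that were integrated experimentally enter the sums.
    The direction of the ratio (theory / experiment) is an assumption.
    """
    shared = [key for key in experimental if key in theoretical]
    exp_sum = sum(experimental[key] for key in shared)
    theo_sum = sum(theoretical[key] for key in shared)
    if not shared or exp_sum == 0.0:
        raise ValueError("no usable experimental volumes for scaling")
    return theo_sum / exp_sum


def merge_hybrid(experimental, theoretical):
    """Scale experimental volumes and overlay them on the simulated set."""
    s = scale_factor(experimental, theoretical)
    hybrid = dict(theoretical)        # start from the simulated volumes
    for key, volume in experimental.items():
        hybrid[key] = s * volume      # experimental data take precedence
    return s, hybrid


# Toy sparse 3D peak table keyed by (i, j, k) spin indices (hypothetical).
theoretical = {(0, 1, 2): 0.040, (0, 2, 1): 0.020, (1, 2, 3): 0.010}
experimental = {(0, 1, 2): 8.0, (0, 2, 1): 4.0}  # arbitrary intensity units

s, hybrid = merge_hybrid(experimental, theoretical)
print(f"S = {s:.4f}")
print(hybrid)
```

Because the model changes at every cycle, the theoretical sum changes too, which is why the factor has to be recomputed inside the iteration loop rather than once at the start.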
3.2. Three-Dimensional Simulation Test and Effect of Added Noise
Before testing the refinement capability of the hybrid–hybrid method, it was necessary to examine the 3D data simulation routine and the effects of added noise. The correctness of the deconvolution routine was examined by comparing deconvoluted experimental 3D NOESY–NOESY data with experimental 2D NOESY data (Zhang et al., 1995). Once the deconvolution routine had been shown to be reasonably accurate (data not shown), the effects of noise added to the data were studied. For these tests, a 3D NOESY–NOESY data set for a 12-mer GG mismatched duplex was simulated, based on two identical mixing times of 100 ms and a spectrometer frequency of 500 MHz. The structure of the 12-mer duplex was previously solved in our laboratory by a MORASS–restrained-MD calculation on the experimental 2D NOESY spectrum (Roongta, 1989). The refined coordinates were also used to generate a 2D NOE volume matrix at a mixing time of 200 ms. Elements of the relaxation rate matrix were calculated from the set of proton
Cartesian coordinates and a rotational correlation time of 3.6 ns. A partial set of 107 spins from the 12-mer duplex was used in the simulation, and only those volumes that could potentially be integrated, in an experiment, were included
(maximum of approximately 50,000). All other 3D matrix elements were set to zero, and only a linear table of the nonzero elements was stored on disk. Based upon the relaxation matrix, 3D volumes were generated for a given using Eq. (13). This represented the target spectrum with noise-free data. The most common types of experimental error found in multidimensional NMR, namely low signal-to-noise and incorrect volume integration, were added to the target spectrum to produce the “experimental” data. Random noise from a Gaussian distribution was added to all peak volumes in in order to simulate a constant low-level thermal noise and a peak integration error. Noise levels from to proportional to the individual volume, were added to each element. We used back-transfer volumes, to calculate an initial estimate of The “experimental” 3D spectrum was deconvoluted and the calculated average 2D volumes were compared to the exact simulated values. Figure 6 shows a plot of the %RMS deviation of the data after deconvolution as a function of the added random noise. As expected, the resulting RMS error for the “experimental” 3D volumes was about one-half the error introduced into the 3D volumes. However, when the planes were linked and the values derived from the simulated “experimental” 3D volumes were compared to the target the error was dramatically reduced, two- to threefold, due to the effects of averaging.
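The effect of averaging on proportional Gaussian noise can be checked numerically. The following is a minimal sketch, not the simulation used in this study: it assumes independent noise on each estimate, a 20% noise level, and four estimates averaged per deconvoluted volume, all of which are illustrative choices. Under those assumptions the %RMS error should drop by about the square root of the number of values averaged.

```python
import math
import random


def percent_rms(values, target):
    """Percent RMS deviation of a list of values from a known target."""
    mean_sq = sum(((v - target) / target) ** 2 for v in values) / len(values)
    return 100.0 * math.sqrt(mean_sq)


def noisy(target, fraction, rng):
    """Target volume with Gaussian noise proportional to the volume."""
    return target * (1.0 + rng.gauss(0.0, fraction))


rng = random.Random(0)   # fixed seed so the run is repeatable
target = 1.0             # noise-free volume, arbitrary units
fraction = 0.20          # 20% proportional noise (illustrative)
n_per_average = 4        # noisy estimates averaged per deconvoluted volume
trials = 20000

single = [noisy(target, fraction, rng) for _ in range(trials)]
averaged = [
    sum(noisy(target, fraction, rng) for _ in range(n_per_average)) / n_per_average
    for _ in range(trials)
]

rms_single = percent_rms(single, target)
rms_averaged = percent_rms(averaged, target)
ratio = rms_single / rms_averaged  # should be close to sqrt(n_per_average)

print(f"single: {rms_single:.1f}%  averaged: {rms_averaged:.1f}%  ratio: {ratio:.2f}")
```

Correlated errors, such as a systematic integration bias shared by overlapping peaks, would not average out in this way, so the sketch gives an upper bound on the benefit.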
3.3. Hybrid–Hybrid Relaxation Matrix Structural Refinement of Duplex DNA from Simulated 3D NOESY–NOESY Data
In the previous section, a simulation study of the hybrid–hybrid matrix method was used to test the convergence of the deconvolution algorithm and the effects of
noise added to the data. Presented here are the results from a simulated refinement of a dodecamer DNA duplex by the 3D hybrid–hybrid matrix method. This test served to examine the performance of the entire methodology—in particular the
convergence capabilities compared to the 2D hybrid matrix method. Similar to the previous study, “experimental” 3D data sets were generated by adding noise to the known target structure. Theoretical 3D NOESY–NOESY spectra were then calculated from several model-built structures. As before, scaling was used to merge the
“experimental” 3D data with the theoretical 3D data to create a hybrid 3D NOESY–NOESY data set. This was deconvoluted into a 2D NOE volume data set and subsequently merged with the simulated 2D NOESY data. The result was a hybrid–hybrid 2D NOE volume matrix. Using this complete volume matrix to calculate a rate matrix, the distances were derived from the cross-relaxation rates and used in a restrained molecular dynamics refinement. It is worth pointing out that in the first hybridization the only volumes simulated were those needed to
completely deconvolute the 3D data and scale the experimental to the simulated data. Only the simulated data needed for deconvolution were stored in memory. For the second hybridization, a complete 2D NOE volume matrix must be established. All interactions not present in the experimental data must be calculated from the model structure; however, experimental 2D NOE data was not required for such a calculation. The complete volume matrix is used to calculate the rate matrix, which is then transformed into the interproton distances. Finally, these distances are used
in a restrained molecular dynamics calculation to produce a more refined structure. This structure is then used as the model structure for the next iteration. The process is repeated until a satisfactory agreement between the theoretical and experimental 3D volumes is obtained, as shown in Fig. 7.
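The step from a complete volume matrix to a rate matrix and then to distances can be illustrated for the smallest possible case. This is a minimal sketch under stated assumptions, not the MORASS implementation: it assumes a symmetric two-spin system, volumes normalized so that V(0) is the identity, the relation V(τ) = exp(−Rτ), and cross-relaxation rates that scale as r⁻⁶ for a rigid, isotropically tumbling molecule. The rates, the reference distance, and all function names are hypothetical.

```python
import math


def apply_to_symmetric_2x2(m, func):
    """Return f(M) for a symmetric 2x2 matrix via its eigen-decomposition."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean = 0.5 * (a + d)
    half_gap = math.hypot(0.5 * (a - d), b)
    lam1, lam2 = mean + half_gap, mean - half_gap
    # Rotation angle of the eigenvector that belongs to lam1.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    f1, f2 = func(lam1), func(lam2)
    off = (f1 - f2) * c * s
    return [[f1 * c * c + f2 * s * s, off], [off, f1 * s * s + f2 * c * c]]


def volumes_from_rates(rate_matrix, tau):
    """Forward calculation: V(tau) = exp(-R tau)."""
    return apply_to_symmetric_2x2(rate_matrix, lambda lam: math.exp(-lam * tau))


def rates_from_volumes(volume_matrix, tau):
    """Back calculation: R = -ln(V) / tau."""
    return apply_to_symmetric_2x2(volume_matrix, lambda lam: -math.log(lam) / tau)


def distance_from_rate(sigma, sigma_ref, r_ref):
    """Distance from a cross-relaxation rate, assuming sigma ~ r**-6."""
    return r_ref * (sigma_ref / sigma) ** (1.0 / 6.0)


tau = 0.1                                 # mixing time in seconds
rates = [[2.0, -0.5], [-0.5, 2.0]]        # hypothetical rho and sigma, 1/s
volumes = volumes_from_rates(rates, tau)
recovered = rates_from_volumes(volumes, tau)

# Hypothetical reference pair: sigma_ref = -4.0 1/s at a fixed 1.77 angstroms.
distance = distance_from_rate(recovered[0][1], -4.0, 1.77)
print(recovered, f"{distance:.3f}")
```

For a real spin system the same matrix functions are evaluated by numerical diagonalization of the full n × n matrix; the closed-form 2 × 2 case is used here only to keep the sketch free of external libraries.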
3.3.1. Target Model and Data Simulation
In this simulation study, the target structure was again taken from the X-ray coordinates of Dickerson’s dodecamer duplex DNA (Dickerson and Drew, 1981). Three different NOE data sets were simulated using this experimental target model. Two 3D NOESY–NOESY data sets were simulated using Eq. (16) with a Gaussian distribution of 20% (3D20) and 50% (3D50) integration error added, respectively. A 2D NOE data set was also simulated with 20% integration error (2D20) for comparison purposes. A 0.2% random thermal noise was added to all three data sets, and nonzero 3D cross peaks above a 0.2% cutoff were saved in a linear table representing the “experimental” spectrum. (Note that the two types of experimental noise are added explicitly in these data sets.) A three-dimensional matrix was not needed since over 99% of the 3D NOESY–NOESY volumes rounded to zero. A total of 1667 “true” 3D cross peaks were deconvoluted to generate 481 2D NOE volumes. Table 1 summarizes the statistics for the simulated data sets. A 2D cutoff of 2.5% was used to limit the number of volumes to a realistic level. Note that the target structure is nonsymmetric, so the number of constraints differs slightly for each strand. Again, both mixing times were 100 ms in the 3D NOESY–NOESY simulations with a single overall isotropic correlation time of 3.6 ns.
3.3.2. Hybrid–Hybrid Matrix Iterative Refinement Calculation
Three model structures were used as starting models for refinement. AMBER 4.0 was used to generate canonical A- and B-DNA model structures. Another model was generated from the canonical A-DNA by running an unconstrained molecular dynamics simulation for 2 ps at 1000 K and retaining the last structure. The Cartesian RMS deviations between the A-, B-, and with respect to
the target dodecamer were 3.72, 2.95, and 4.21 Å, respectively. Since the had a distorted structure, it was used to test the effectiveness of the refinement protocol. The hybrid–hybrid matrix refinement calculations were performed using the scheme in Fig. 5. In each iteration new theoretical 3D NOESY–NOESY and 2D NOESY data sets were simulated using the model refined from the previous iteration. The standard 2D MORASS–AMBER refinement procedure was followed to generate a new set of distances for the restrained molecular dynamics calculation with a flat-well penalty function (Gorenstein et al., 1990). Such iterations were continued until the theoretical 3D spectrum converged to the “experimental” spectrum. The quality of refinement was examined by comparing the Cartesian RMS deviations between the refined structure and the target structure. The final force constants for the harmonic half-wells of the flat-well potential were for all refinements. The final flat-well distances, where no energy penalty applies, were 12% to 13% for 3D data sets and 9% for the 2D data set. This choice was based on the consideration that the signal-to-noise ratio in 2D NOESY was greater than that in 3D NOESY–NOESY (which should yield better precision for the 2D data integration). A single structure after the final iteration was considered the refined structure. Often, an extended restrained MD is used to generate an average final structure. However, we found it sufficient here to use the single final structure for the test of the method since, except in dynamically averaged torsional angles, there was no major difference between such single structures and those derived from the MD averaging (Nikonowicz et al., 1990). The quality of the final structure was judged by several figures of merit. The most important one in this case was the Cartesian RMS difference with the target model. Other criteria of refinement were as follows:
where * is ij for 2D data or ijk for 3D data, and a or b can be the theoretical or “experimental” volumes. Additional indicators of refinement examined were the R-factor, defined by
and the
-factor, defined as
where
is the NOESY mixing time.
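Two of these figures of merit can be illustrated numerically. The forms used below are assumptions: a %RMS in which each difference is normalized by the corresponding volume of set a, and an R-factor written as the sum of absolute differences divided by the sum of experimental volumes, which are forms commonly quoted in the relaxation matrix literature. They should not be read as the exact expressions of this chapter, and the mixing-time-weighted sixth-root factor is omitted. The three volumes are hypothetical.

```python
import math


def percent_rms(v_a, v_b):
    """%RMS of set b against set a, each term normalized by the a volume."""
    terms = [((a - b) / a) ** 2 for a, b in zip(v_a, v_b)]
    return 100.0 * math.sqrt(sum(terms) / len(terms))


def r_factor(v_theo, v_exp):
    """Sum of absolute differences over the sum of experimental volumes."""
    return sum(abs(t - e) for t, e in zip(v_theo, v_exp)) / sum(v_exp)


# One strong, one medium, and one weak cross peak (hypothetical volumes).
v_exp = [10.0, 1.0, 0.5]
v_theo = [9.0, 1.2, 0.25]

rms_exp = percent_rms(v_exp, v_theo)   # normalized by experimental volumes
r_value = r_factor(v_theo, v_exp)

print(f"%RMS(exp) = {rms_exp:.1f}%  R = {r_value:.3f}")
```

In this toy set the weak peak contributes most of the %RMS, while the strong peak contributes most of the R-factor numerator, which is the behavior that motivates reporting both.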
The number of distance constraints derived from the hybrid–hybrid matrix
analysis of the 3D data sets was relatively conservative. The 0.20% volume threshold seems to be realistic from experimental considerations. For the simulated 2D data set, 359 constraints per duplex was clearly a reasonable upper limit for a molecule of this size, considering that Nerdal et al. (1989) used 310 constraints, experimentally derived from 2D NOESY, for the same 12-mer duplex. Table 1 shows that the %RMS(volume) values for the deconvoluted 2D data are only about half as large as the 3D data before deconvolution. This was not surprising because each deconvoluted 2D data point was obtained from averaging about three to four 3D NOESY–NOESY volumes (i.e., ca. 1700 initial cross peaks deconvoluted to ca. 500 cross peaks). As seen in this case, averaging should reduce the error by a factor of √n, where n is the number of data points used for each deconvoluted cross peak. This illustrates one of the significant advantages of the hybrid–hybrid matrix deconvolution method.
3.3.2a. Refinement Convergence. Table 2 summarizes the results for all nine refinement calculations. The results from the simulated refinement demonstrated that the proposed method is quite powerful and efficient. The quality of the refinement can be seen directly from the Cartesian RMS difference between each final structure and the target structure. All are below 1.60 Å except for the refinement with the 2D data set constraints (1.87 Å). A- and B-DNA models all achieved very good convergence. Starting from the A- and B-DNA models, the Cartesian RMS error from the refined 2D NOE data set was generally better than those refined from 3D NOESY–NOESY data sets. This was especially true for model-built B-DNA, which was most similar to the target duplex (basically a B-DNA structure with unfrayed ends). In this case, the final structure from the 2D NOE data set reached the lowest RMS value of all (1.10 Å). While the 3D data set with 50% added integration error (3D50) produced slightly better results than 3D20 for both models, the differences are probably not significant. For the model which was quite frayed and had the largest initial RMS difference from the target structure, the 3D data sets produced higher-quality structures than the 2D data sets. Several different iterative refinements always produced similar results (data not shown). Satisfactory convergence of this difficult model structure demonstrates the robustness of the hybrid–hybrid matrix method. This supports the general observation that having a larger number of distance constraints can be at least as important as increasing the accuracy of the constraints
(Clore et al., 1993; Kaluarachchi et al., 1991; Meadows et al., 1991). It also indicates that the iterative deconvolution approach is quite efficient for achieving high-quality convergence. The target duplex along with comparisons of the target duplex, starting models, and the final refined structures are shown in Fig. 8. In contrast to the structures
refined using 3D data, the structures (starting model–refinement method and %error in data set) had a slight overall bend in comparison to the structure.
Although the individual structures converged reasonably well to the target structure (a measure of accuracy of the method), it did not guarantee that they all converged into the same family of final structures. Table 2 lists the average RMS differences among refined structures for each data set refinement. The average RMSD was calculated by simple arithmetic averaging of all possible RMSD of the refined structure as compared to the target. Another RMSD calculation involved calculating the difference between A-2D20 and B-2D20, A-2D20 and and B-2D20 and The three RMS values were averaged to give the final average RMS value. This method was repeated for the other two data groups, and the results agree well with the spread among the individual structures. The best convergence was from the 3D20 data set (RMSD spread was 1.31 Å), while the 2D20 data set gave the largest spread. This indicates that the 3D data is capable of producing better precision in convergence from different starting structures. This likely reflects again on the observation that having a larger number of less accurate constraints can lead to structures with greater precision but not necessarily greater accuracy.
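The pairwise averaging described above can be sketched as follows. This is an illustrative sketch only: it assumes the structures have already been superimposed (no least-squares fitting is performed), that atoms are listed in the same order in every structure, and it uses hypothetical labels and two-atom toy coordinates in place of the refined duplexes.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """Cartesian RMSD between two equally ordered coordinate lists.

    Assumes the structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def average_pairwise_rmsd(structures):
    """Arithmetic mean of the RMSD over all distinct pairs of structures."""
    pairs = list(combinations(structures, 2))
    return sum(rmsd(structures[a], structures[b]) for a, b in pairs) / len(pairs)


# Three hypothetical refined structures from one data set (angstroms).
structures = {
    "first": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "second": [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    "third": [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)],
}

spread = average_pairwise_rmsd(structures)
print(f"average pairwise RMSD = {spread:.2f}")
```

The pairwise average measures precision (how tightly the refined structures cluster), whereas the RMSD of each structure to the target measures accuracy; the two can move independently.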
The better precision and accuracy of the 3D data set refinement over the 2D
data set was further confirmed by helical parameter analysis (Zhu et al., 1996). In all cases, the overall quality of the reproduction of the sequence-specific variation in the helical parameters from the 3D50 refinement was better than those produced from the 2D20. This was particularly true for the major groove, minor groove, helix twist, roll angles, and nearly all backbone dihedral angles (data not shown). These results were consistent with the study of the accuracy of the hybrid relaxation matrix refinement of duplex structures (Kaluarachchi et al., 1991). This also provides strong evidence that such an iterative hybrid–hybrid matrix method of 3D NOESY–
NOESY data is capable of achieving good convergence with high accuracy and
precision for both global and local structural features. The methodology has also been tested on simulated noise-free data sets with the same 12-mer DNA target model. The results indicate that, using between 350 and 400 constraints and 1.0% distance error in the flat-well potential function and 5 harmonic force constant, the %RMS(volumes) values for the final
structures were consistently below 10% (data not shown). The iterative refinement calculations were well behaved. As examples, Tables 3 through 6 list the complete refinement parameters for and respectively. In Table 4 the %RMS(volume) values calculated both from 3D NOESY–NOESY data and 2D NOESY data are listed. The 3D error measures were calculated between the 3D50 data set and that simulated from the model at each iteration. Likewise, the 2D values were calculated between the 2D-like data from
the deconvolution and the simulated 2D data from the model. In principle, the 3D result should reflect the progress more accurately than the 2D result since the deconvoluted 2D data depended on the intermediate models derived at each
iteration cycle. However, except for the sudden large change in the quality of the refinement parameters at iteration 6 for the 2D portion, both error measures progressed quite consistently and agreed well with the decrease in the Cartesian
RMS difference between models and target structures. In all other calculations, no unusual values were observed. The progression of the R-factor and during the 3D and 2D versions was also well behaved. The flat-well force constants used were kept quite low throughout the refinement. When higher values were tested, they did not provide any noticeable improvement. The final total energies were all very similar to the unrestrained structure (ca. This level was representative for all the refinements.
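The role of the flat-well width and force constant can be made concrete with a simplified penalty function. This is a minimal sketch of a flat-well restraint with harmonic half-wells, assuming bounds placed symmetrically at a fractional tolerance around the target distance. It is not the exact functional form used by AMBER, and the target distance, tolerance, and force constant below are illustrative values in arbitrary energy units.

```python
def flat_well_penalty(distance, target, tolerance, force_constant):
    """Zero inside target*(1 +/- tolerance); harmonic half-wells outside."""
    lower = target * (1.0 - tolerance)
    upper = target * (1.0 + tolerance)
    if distance < lower:
        return force_constant * (lower - distance) ** 2
    if distance > upper:
        return force_constant * (distance - upper) ** 2
    return 0.0


target = 3.0          # restraint distance in angstroms (hypothetical)
tolerance = 0.12      # 12% flat-well half-width, as a fraction
force_constant = 5.0  # arbitrary energy units per square angstrom

for d in (2.14, 3.20, 3.86):
    energy = flat_well_penalty(d, target, tolerance, force_constant)
    print(f"d = {d:.2f}  penalty = {energy:.3f}")
```

Narrowing the tolerance or raising the force constant both increase the restraint energy for the same structure, which is why the two are usually tightened gradually as confidence in the intermediate models grows.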
Table 7 summarizes the final refinement results for all calculations. For the 3D hybrid–hybrid matrix refinement, the calculated errors in the deconvoluted 2D NOESY volumes are also included (in parentheses). Compared with the distribution of RMSD values in Table 2, these parameters seem to behave well in most cases.
The 2D %RMS(volume) values, R-factors, and are very consistent with the results in Table 2. The 3D %RMS(volume) values for seem to be quite high compared to the other error parameters. All of the refinements based on 3D NOESY–NOESY data sets required approximately the same number of iterations to achieve convergence, one to three iterations more than the 2D hybrid matrix method. In the hybrid–hybrid matrix deconvolution approach, the dependence of the deconvolution upon the iterative-model structure is likely responsible for the slightly higher number of iterations required to reach convergence. As might be expected, the data sets with the larger integration errors generally required more iteration cycles.
3.3.2b. Goodness of Refinement Measures. A number of parameters have been proposed to monitor the progress of 2D complete relaxation matrix refinement methods and the “goodness” of the obtained structures. For 2D matrix methods, the
%RMS(volume) is a very useful parameter for this purpose since it weighs the percentage differences between theoretical and experimental volumes equally for large and small cross peaks. This is especially important since larger cross peaks often represent cross relaxation between intraresidue protons whose positions are constrained to be close by the geometry of the residue. The most important cross peaks are usually those that define interresidue distances, and they are often the weakest. The R-factor (analogous to the X-ray R-factor) generally is
regarded as a poor measure of quality of the refined structure because the magnitude
is dominated by the largest cross peaks, which are often the least important. The better reflects the quality of the structure since it more heavily weighs the weak cross peaks in comparison to the R-factor. Benefits of %RMS(volume) include increased sensitivity to the change of the model structure and a direct measurement between the calculated and experimental NOE data. Usually, a final refined value of %RMS(volume) comparable to the experimental error in the integrated volumes is acceptable (20% to 50%). For 2D NOE data, a %RMS(volume) value of 50% roughly corresponds to a distance error of approximately 8% to 9% (by propagation of error treatment, a factor of 1/6). As described earlier, a 3D NOESY–NOESY cross peak is the product of two 2D NOE volumes; therefore, a simple direct relationship between %RMS(volume) and RMSD is not obvious. As such, it is not surprising to see that the refinement quality parameters in Table 7 do not appear to have a well-defined direct correspondence to the distribution of the RMSD values in Table 2. Note that the 2D matrix method does not have a precise one-to-one correspondence either. Rather, the quality of refinement parameters corresponds to a reasonable range of structural change. This should be kept in mind when comparing results in Tables 7 and 2. For example, in the 2D20 data set, the has the highest Cartesian RMSD in Table 2, but none of the parameters appear to be sensitive enough to reflect this. However, the difference between and B-2D20 usually would be observed
in practice. This likely reflects a problem inherent in the structural refinement of duplex nucleic acids where tertiary NOEs are not observed and where subtle bending or distortion of the structure can have a profound impact on the best RMS fit of the structures. For proteins, this should be much less a problem. All of the parameters in Table 7 are generally useful for monitoring the refinement progress. All the 2D parameters derived from the 3D data sets are quite useful for monitoring convergence and quality of the structures, although the %RMS(volume) values actually compare the simulated 2D NOE and the deconvoluted 2D-like data. The 2D version appears slightly more consistent than the 3D version of these parameters. The 3D version parameters are quite self-consistent even though they do not seem to agree with results in Table 2 as well. The has the highest %RMS(volume) value (248%), which corresponds to the highest Cartesian RMSD (1.60 Å) in Table 2. However, the lowest %RMS(volume) value (54%) actually corresponds to the second highest RMS value (1.54 Å) for B-3D20 in Table 2. For a given refinement, though, all these parameters reflect the trend of structural changes for the model very well. Figure 9 compares the results for the refinement progress for and This result is quite comparable to a 2D refinement calculation. The threshold value for 3D %RMS(volume) can only be determined empirically for the reasons mentioned above. It seems that a %RMS(volume) value two to five times as large as the 2D %RMS(volume) is
acceptable. It can be concluded that the 2D and/or 3D parameters can be used effectively to monitor the hybrid–hybrid matrix calculation.
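The correspondence quoted in this section between a 50% volume error and a distance error of roughly 8% to 9% for 2D data follows from first-order propagation of error. The short derivation below assumes the two-spin, initial-rate limit in which the cross-peak volume V scales as the inverse sixth power of the interproton distance r; it is a worked illustration rather than the full relaxation matrix treatment.

```latex
V \propto r^{-6}
\quad\Longrightarrow\quad
r \propto V^{-1/6}
\quad\Longrightarrow\quad
\frac{\delta r}{r} \approx \frac{1}{6}\,\frac{\delta V}{V},
\qquad
\frac{1}{6}\times 50\% \approx 8.3\% .
```

Because a 3D NOESY–NOESY volume is a product of two such 2D volumes, the same one-sixth rule does not carry over directly, which is consistent with the empirical threshold suggested for the 3D %RMS(volume).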
4. HYBRID–HYBRID MATRIX: EXPERIMENTAL REFINEMENT TEST ON A DNA THREE-WAY JUNCTION
As an experimental test of the hybrid–hybrid relaxation matrix refinement method, a DNA three-way junction (TWJ) was refined. The TWJ, consisting of three DNA strands forming three conjoined helical duplexes, contained two unpaired bases on one strand in the junction region. Gel electrophoresis and UV melting experiments have shown that two or more unpaired bases stabilize the DNA TWJ (Leontis et al., 1991). This molecule
was chosen as a test molecule because the resonance assignments were obtained previously (Leontis et al., 1993, 1995) and a high-quality 2D NOESY data set was available. The G–C rich TWJ sequence was designed to include one A–T base pair in each helical arm to serve as a spectroscopic marker. The sequence d(GGACGTCGCAGC), which is also shown in Fig. 10, contained a unique A–T base pair in each helical arm, allowing for unequivocal assignments. For base identification, the strand was first categorized and then the bases were numbered separately for each strand. For example, S1-G1 stands for the first guanine residue on strand 1. The NMR sample was prepared by dissolving stoichiometric amounts of the three oligonucleotide strands in buffer containing 10 mM sodium phosphate, pH 6.8 (uncorrected for deuterium), 100 mM NaCl, 10 mM and 0.5 mM EDTA to give a final concentration of approximately 2 mM DNA. The 3D NOESY–NOESY data were collected at 28°C at 750 MHz on a Varian UnityPlus instrument. The relaxation delay was set at 0.9 s, and a sweep width of 7100 Hz was used in all
three dimensions. Acquisition time was 72 ms. The mixing times were both 200 ms. Residual water suppression was achieved by low-power saturation at the water frequency immediately before the first 90° pulse and during the two mixing periods. No attempt was made to suppress zero-quantum interference during the mixing times. In the direct detection dimension, 512 complex points were acquired, and 128 complex points were acquired for each of the two indirect dimensions. Eight scans were acquired for each pair. Quadrature detection in F1 and F2 was achieved with the hypercomplex method (States et al., 1982). The total experiment time was 228 h. The 3D data set (128 × 128 × 512) was processed using Felix software (Biosym, CA) to give a data matrix of 256 × 256 × 1024 real data points after zero-filling in all three dimensions. Only the real part of the final spectrum was stored. A 90° shifted sine bell apodization function was used in all three dimensions. The Flat routine available in the Felix program was used for baseline correction. Assignment of proton chemical shifts was based on values previously reported by Leontis et al. (1995). Cross peaks were picked by using the automatic peak-picking option available in the Felix program. The bounding box for each peak of interest was manually adjusted to reflect the actual linewidth of the peak. A total of 6253 peaks were picked by the automatic peak-picking routine. After manually sorting through the list to remove the body diagonals, artifacts, and noise peaks, 1635 peaks were retained. A total of 912 3D peaks (ijk- and iji-type peaks) were used for the deconvolution process; the remainder were iij-type peaks. The 3D NOESY–NOESY volumes were deconvoluted into 2D NOE volumes using the procedures described above (Zhang et al., 1995) and used in the standard MORASS–restrained-MD iterative refinement cycles. From the 2D NOESY experimental data set, 78 additional volumes, not obtained from the deconvoluted 3D spectrum, were incorporated with the deconvoluted 2D volumes to form the hybrid–hybrid 2D volume matrix. Starting model coordinates were obtained from a previous study using X-PLOR (Brünger, 1993a; Ouporov and Leontis, 1995). This model structure was then placed in a box of water containing 1720 TIP3P water molecules along with 29 sodium counterions (one sodium counterion for each phosphate group) and was subsequently equilibrated for 10 ps using the molecular dynamics program AMBER 4.1 (Weiner and Kollman, 1981). This structure was used as the starting model for all subsequent refinements. To ensure Watson–Crick base pairing, a hydrogen-bond constraint was added at each base pair in the helical arms, except for the base pairs at the stem ends. Only one hydrogen-bond constraint was applied per base pair in order to allow propeller twist during the refinement. For the MD part of each iteration, the starting structure was first energy-minimized with the NOE constraints for 3000 steps. Eight picoseconds of constrained molecular dynamics with temperature annealing was then performed on the energy-minimized model. Then the average structure from the last 3 ps of the MD
was energy-minimized again, and the resulting structure was used as the starting model for the next iteration. Several key indicators were used to monitor the progress of the iterative refinement process. The RMS errors in the volumes were used as the first criterion for monitoring the refinements. As can be seen from Table 8, the %RMS errors in volumes start at relatively high values and gradually settle down to lower values with increased percentage of volume merging between the experimental and theoretical volumes. Energy factors, such as the total potential energy and the constraint energy, were also monitored throughout the iterative process. Both the total energy and the constraint energy increased in value as the error bars and force constants on the constraints were tightened in the molecular dynamics refinement. The effect of the force constant was controlled by changing the error bars from liberal values of 25% to much lower values as the confidence in the intermediate structures was established. As expected, the R-factor decreased as the refinement progressed.
Figure 11 shows a plot of deconvoluted 2D NOE volumes derived from experimental 3D volumes versus the experimentally determined 2D NOESY volumes; the slope is 0.82. The random dispersion of these data points indicates a lack of systematic error introduced by the deconvolution process. The plot of theoretically calculated 2D volumes for the final structure versus the experimental (deconvoluted 2D plus the experimentally determined 2D) volumes gives a slope of 0.99 (Fig. 12), again with no systematic error. Figure 13 shows the number of NOE volumes measured per residue from the 2D and 3D data sets. Except for residues in the junction region, where NOE interactions are weak, the 3D NOESY–NOESY gave higher numbers of measurable NOE peaks than the 2D NOESY. A
unique tertiary contact was also observed between the methyl group of S3-T6 and the of S3-G11. This crucial NOE peak was well resolved in both 2D and 3D spectra (as 2D iij- and ijj-type peaks), and was very useful in determining the conformation of the S3-T6 base. The final refined structure of the TWJ is shown in Fig. 14. As in the preliminary model, the three helical arms form two domains. Two of the helices, helix 1 and helix 2, are stacked on each other, forming one continuous helical domain. The other helical domain, formed by helix 3, extends almost perpendicularly from the axis of the first helical domain. The unpaired pyrimidine bases are extrahelical, exposed to solvent, and lie along the minor groove of helix 1. These two unpaired bases are stacked on each other. Helical parameters for all three helical arms exhibit only minor deviations from typical values for right-handed B-form DNA. Unusual values are, however, observed for the glycosidic angles of S3-T6 and S3-G8. The glycosidic bond of S3-T6 exists in an unusual syn conformation, allowing its methyl group to contact the hydrophobic surface of the minor groove of helix 1, at S3-G11.
5. CONCLUSIONS
In this chapter, a simple, efficient, and robust structure refinement method using 3D NOESY–NOESY data has been presented and successfully tested by simulation
studies on a 12-mer DNA model as well as an experimental refinement on a 32-nucleotide TWJ. This method uses a straightforward deconvolution scheme to obtain a hybrid–hybrid 2D NOE volume matrix from 3D NOESY–NOESY volume data. It then uses the hybrid matrix 2D MORASS (or comparable Bloch equation solution) method for structure refinement calculation. Simulations have shown that the highest accuracy and precision in structural refinement is achieved by increased numbers and accuracy of the constraints used (Meadows et al., 1991; Thomas et al., 1991). The use of approximate methods has led investigators to apply liberal error bars when using NMR-derived distances during constrained structural refinements. While it is true that many good-quality structures have been obtained from NMR (Wüthrich, 1989), especially with the development of 3D and 4D NMR (Clore et al., 1991; Grzesiek et al., 1992; Kay et al., 1990; Zuiderweg et al., 1991), it is obvious that using larger numbers and stronger constraints will achieve greater precision (and hopefully accuracy) in the structures obtained by NMR. This appears to be generally true for proteins (Brünger et al., 1993b; Thomas et al., 1991) and nucleic acids (Gronenborn and Clore, 1989; Meadows et al., 1991; Metzler et al., 1990; Nilsson et al., 1986; Pardi et al., 1988). It has been reported that a relaxation matrix analysis for proteins may actually lead to poorer accuracy (Clore et al., 1993), although this has been disputed by further simulation studies by Zhao and Jardetzky (1994). Thus, 3D NOESY–NOESY experiments hold the promise of providing more accurate structures given the vastly increased number of resolvable 3D NOESY–NOESY volumes (Kessler et al., 1991). As has been shown, approximation methods
may not yield accurate distances at the longer mixing times required to achieve adequate magnetization transfer and signal-to-noise in large molecules (Donne et al., 1995a). Calculations based upon the complete relaxation matrix, however, are well within current computational resources, even for macromolecules containing more than 600 spins. Competing with the hybrid–hybrid matrix method are various direct gradient–NOE refinement methods, which use the volume data directly in an attempt to match the theoretical volumes to the experimental data (Bonvin et al., 1991a; Habazettl et al., 1992a, 1992b). While this latter approach does take into account spin diffusion, it scales with the sixth power of the number of spins in the system, which is computationally prohibitive for large systems. Kaptein and co-workers reported an approximation to the NOE gradient calculation method that scales with the third power of the number of spins (Bonvin et al., 1991a); however, the hybrid–hybrid matrix method only scales with the square of the number of spins
via diagonalization of the n × n volume matrix. It must be emphasized that NOE gradient refinements should utilize relaxation matrix methods for calculating 3D volumes in order to achieve the highest accuracy for the refined structure (Donne et al., 1995a). Although our tests were conducted on nucleic acids, structural refinement of proteins should prove equally feasible. Kaptein’s gradient refinement method, carried out on the lac repressor headpiece, demonstrates the important potential of relaxation matrix methods for 3D NOESY–NOESY refinement. The hybrid–hybrid matrix method has several advantages:
(1) It does not rely on any 2D experimental data. To use this method, only 3D NOESY–NOESY experimental data and a reasonable starting model are needed. An initial model can be constructed from the two-spin method utilizing NOEs derived from 3D or 4D heteronuclear edited or filtered NOESY. This makes it particularly suitable for studying larger molecules since significant numbers of good-quality 2D NOESY cross peaks cannot be resolved for molecules much larger than 10 kDa.
(2) It is quite robust, precise, and accurate. The results from the simulation have shown that it converges well and possibly better than similar 2D methods. This seems to be especially true when using less than favorable starting models. It also involves the use of the more accurate 2D hybrid full-matrix method which takes into account the extensive spin diffusion which occurs in larger macromolecules at useful mixing times. Furthermore, in larger macromolecules, sensitivity in the 3D NOESY–NOESY (as well as 3D–4D heteronuclear filtered and edited NOESY) experiments is often poor so that longer NOESY mixing times must be used to increase the NOE volumes. Both larger molecular size and longer mixing times conspire to make spin diffusion a greater problem requiring a complete relaxation matrix analysis. The important combination of generating larger numbers of more
accurate distance constraints and the 2D method makes it possible to achieve quite accurate structures.
(3) The method provides a simple means to incorporate into the refinement distance constraints derived from 3D–4D heteronuclear filtered and edited NOESY experiments since these can be added to the hybrid–hybrid volume matrix along with the deconvoluted volumes, any 2D NOESY volumes, and the simulated volumes. Of course one would want to carefully evaluate the relative accuracy of these constraints, which are analyzed at a two-spin approximation level. These less accurate constraints can be added to the distance constraint list with appropriate increases in the error bars and decreases in the force constants. In addition, the 3D hybrid–hybrid method avoids exclusive use of the less accurate nD heteronuclear filtered and edited NOESY experiments commonly used for structural refinement. In various multidimensional heteronuclear coherence transfer NOESY methods, the NOESY volumes will be determined significantly by the degree of coherence transfer between the heteroatom and protons. This will depend on the magnitude of the coupling constant, which can vary significantly with atom and residue type. Distance constraints in these spectra can only be divided into three or four limits (e.g., short, medium, and long), which in turn limits the accuracy and precision of the structure.
(4) Software written for 2D NOESY data analysis does not need to be significantly modified since the 2D deconvoluted matrix replaces the experimental 2D NOESY volume matrix. The analysis techniques described here are also general enough that they can be expanded to include even higher-dimensional experiments (such as a 4D NOESY–NOESY–NOESY spectrum).
(5) The method is computationally very efficient as it does not involve the use of any 3D matrix or gradients which, at best, scale with the cube of the number of spins (Bonvin et al., 1991a; Zhang et al., 1995). Such scaling potentially limits the practical size of the systems that can be refined. Here, only the 3D peaks that are required for deconvolution and scaling are simulated and stored in memory. Once merged with the 3D experimental data, deconvoluted, and merged with the experimental 2D data, the data are stored as a simple 2D array. As a result, there is no need to create or store a three-dimensional array for the 3D data, which saves processing time and disk space. The design partially removes the extra limitations on the number of spins the computer can handle, thus making the methodology applicable to larger systems. CPU time scales as the square of the number of spins.
It is apparent that the hybrid–hybrid method proposed here can be very useful for solving the structure of quite large molecules because it does not rely on 2D
NMR data and further effort is currently underway to fully automate the protocol. In practice, one problem with its successful implementation is the low S:N and relatively poor dispersion of 3D NOESY–NOESY spectra compared to heteronuclear spectra (requiring 1–2 mM concentrations). With higher fields and newer
probes, we find this to be less of an encumbrance than it was previously. For instance, we have a Varian 5-mm 1H-only probe with a S:N of 1500:1 and a newly constructed Nalorac 8-mm triple-resonance probe with a S:N of 2400:1, both of which operate on a 750-MHz Varian spectrometer. The successful application of the hybrid–hybrid methodology in solving the structure of the 32-nucleotide TWJ gives us confidence that the method can be applied to proteins or nucleic acids of considerably higher molecular weight, especially with the newer probes.
REFERENCES
Berstein, R., Ross, A., Cieslar, C., and Holak, T. A., 1993, J. Magn. Reson. B101:185.
Boelens, R., Koning, T. M. G., and Kaptein, R., 1988, J. Mol. Struct. 173:299.
Boelens, R., Koning, T. M. G., van der Marel, G. A., van Boom, J. H., and Kaptein, R., 1989a, J. Magn. Reson. 82:290.
Boelens, R., Vuister, G., Koning, T. M. G., and Kaptein, R., 1989b, J. Am. Chem. Soc. 111:8525.
Bonvin, A. M. J. J., Boelens, R., and Kaptein, R., 1991a, J. Magn. Reson. 95:626.
Bonvin, A. M. J. J., Boelens, R., and Kaptein, R., 1991b, J. Biomol. NMR 1:305.
Borgias, B. A., Gochin, M., Kerwood, D. J., and James, T. L., 1990, Prog. NMR Spectrosc. 22:83.
Borgias, B. A., and James, T. L., 1990, J. Magn. Reson. 87:475.
Bothner-By, A. A., and Noggle, J. H., 1979, J. Am. Chem. Soc. 101:5152.
Braun, W., and Go, N., 1983, J. Mol. Biol. 186:613.
Brünger, A. T., 1993a, X-PLOR, Version 3.1. A System for X-Ray Crystallography and NMR, Yale University Press, London.
Brünger, A. T., Clore, G. M., Gronenborn, A. M., Saffrich, R., and Nilges, M., 1993b, Science 261:328.
Clore, G. M., Wingfield, P. T., and Gronenborn, A. M., 1991, Biochemistry 30:2315.
Clore, G. M., Robien, M. A., and Gronenborn, A. M., 1993, J. Mol. Biol. 231:82.
Dickerson, R. E., and Drew, H. R., 1981, J. Mol. Biol. 149:761.
Dollwo, M. J., and Wand, J., 1993, J. Biomol. NMR 3:205.
Donne, D. G., Gozansky, E. K., and Gorenstein, D. G., 1995a, J. Magn. Reson. B106:156.
Donne, D. G., Gozansky, E. K., Zhu, F. Q., Zhang, Q., Luxon, B. L., and Gorenstein, D. G., 1995b, Bull. Magn. Reson. 17:61.
Gorenstein, D. G., Meadows, R. P., Metz, J. T., Nikonowicz, E. P., and Post, C. B., 1990, Advances in Biophysical Chemistry (Bush, C. A., ed.), JAI Press, Greenwich, p. 47.
Gronenborn, A. M., and Clore, G. M., 1989, Biochemistry 28:5978.
Grzesiek, S., Dobeli, H., Gentz, R., Garotta, G., Labhardt, A. M., and Bax, A., 1992, Biochemistry 31:8180.
Habazettl, J., Ross, A., Oschkinat, H., and Holak, T. A., 1992a, J. Magn. Reson. 97:511.
Habazettl, J., Schleicher, M., Otlewski, J., and Holak, T. A., 1992b, J. Mol. Biol. 228:156.
Havel, T. A., Kuntz, I. D., and Crippen, G. M., 1983, Bull. Math. Biol. 45:665.
Holak, T. A., Habazettl, J., Oschkinat, J., and Otlewski, J., 1991, J. Am. Chem. Soc. 113:3196.
Kaluarachchi, K., Meadows, R. P., and Gorenstein, D. G., 1991, Biochemistry 30:8785.
Kay, L. E., Clore, G. M., Bax, A., and Gronenborn, A. M., 1990, Science 249:411.
Keepers, J. W., and James, T., 1984, J. Magn. Reson. 57:404.
Kessler, H., Seip, S., and Saulitis, J., 1991, J. Biomol. NMR 1:83.
Krishna, N. R., Agresti, D. G., Glickson, J. D., and Walter, R., 1978, Biophysical J. 24:791.
Leontis, N. B., Kwok, W., and Newman, J. S., 1991, Nucleic Acids Res. 19:759.
Leontis, N. B., Hills, M. T., Piotto, I. V., Malhotra, A., Nussbaum, J., and Gorenstein, D. G., 1993, J. Biomol. Struct. Dyn. 11:215.
Leontis, N. B., Hills, M. T., Piotto, M., Ouporov, I. V., Malhotra, A., and Gorenstein, D. G., 1995, Biophys. J. 68:251.
Macura, S., and Ernst, R. R., 1980, Mol. Phys. 41:95.
Meadows, R. P., Post, C. B., and Gorenstein, D. G., 1989, MORASS, Purdue University, West Lafayette.
Meadows, R. P., Post, C. B., Kaluarachchi, K., and Gorenstein, D. G., 1991, Bull. Magn. Reson. 13:22.
Metzler, W. J., Wang, C., Kitchen, D. B., Levy, R. M., and Pardi, A., 1990, J. Mol. Biol. 214:711.
Nerdal, W., Hare, D. R., and Reid, B. R., 1989, Biochemistry 28:10008.
Nikonowicz, E. P., and Gorenstein, D. G., 1992, J. Am. Chem. Soc. 114:7494.
Nikonowicz, E. P., Meadows, R. P., and Gorenstein, D. G., 1990, Biochemistry 29:4193.
Nilges, M., Gronenborn, A. M., Brünger, A. T., and Clore, G. M., 1988, Protein Eng. 2:27.
Nilsson, L., Clore, G. M., Gronenborn, A. M., Brünger, A. T., and Karplus, M., 1986, J. Mol. Biol. 188:455.
Ouporov, I. V., and Leontis, N. B., 1995, Biophys. J. 68:266.
Pardi, A., Hare, D. R., and Wang, C., 1988, Proc. Natl. Acad. Sci. U.S.A. 85:8785.
Post, C. B., Meadows, R. P., and Gorenstein, D. G., 1990, J. Am. Chem. Soc. 112:6796.
Roongta, V. A., 1989, Ph.D. Thesis, Purdue University, West Lafayette.
Slijper, M., Bonvin, A. M. J. J., Boelens, R., and Kaptein, R., 1995, J. Magn. Reson. B107:298.
States, D. J., Haberkorn, R. A., and Ruben, D. J., 1982, J. Magn. Reson. 48:286.
Thomas, P. D., Basus, V. J., and James, T. L., 1991, Proc. Natl. Acad. Sci. U.S.A. 88:1237.
Weiner, P. K., and Kollman, P. A., 1981, J. Comput. Chem. 2:287.
Wüthrich, K., 1986, NMR of Proteins and Nucleic Acids, Wiley, New York.
Wüthrich, K., 1989, Science 243:45.
Yip, P. F., 1993, J. Biomol. NMR 3:361.
Yip, P. F., and Case, D. A., 1989, J. Magn. Reson. 83:643.
Zhang, Q., Chen, J., Gozansky, E. K., Zhu, F., Jackson, P. L., and Gorenstein, D. G., 1995, J. Magn. Reson. B106:164.
Zhao, D., and Jardetzky, O., 1994, J. Mol. Biol. 239:601.
Zhu, F. Q., Donne, D. G., Gozansky, E. K., Luxon, B. L., and Gorenstein, D. G., 1996, Magn. Reson. Chem. 34:S125.
Zuiderweg, E. R. P., Petros, A. M., Fesik, S. W., and Olejniczak, E. T., 1991, J. Am. Chem. Soc. 113:370.
Zuiderweg, E. R. P., Scheek, R., Boelens, R., van Gunsteren, R., and Kaptein, R., 1985, Biochemie 67:707.
6
Conformational Ensemble Calculations: Analysis of Protein and Nucleic Acid NMR Data
Anwer Mujeeb, Nikolai B. Ulyanov, Todd M. Billeci, Shauna Farr-Jones, and Thomas L. James
1. INTRODUCTION
Biological processes are generally governed by conformation-specific interactions of biomolecules. Molecular recognition is often modulated by an intrinsic flexibility of biomolecules, in particular, proteins and nucleic acids. Since conformational dynamics in some cases can be crucial, considerable effort has been devoted to developing methods to ascertain the dynamic structure of biomolecules. NMR has long been recognized as a rich means for probing the dynamics of biomolecules in solution (Jardetzky and Lefevre, 1994; Palmer, 1997). NMR parameters by which
Anwer Mujeeb, Nikolai B. Ulyanov, Todd M. Billeci, Shauna Farr-Jones, and Thomas L. James • Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94143-0446.
Biological Magnetic Resonance, Volume 17: Structure Computation and Dynamics in Protein NMR, edited by Krishna and Berliner. Kluwer Academic / Plenum Publishers, New York, 1999.
dynamic aspects of protein structures are characterized include spin–lattice and spin–spin relaxation rates, rotating-frame spin–lattice relaxation rates for $^{13}$C and $^{15}$N resonances, and heteronuclear $^{1}$H–$^{13}$C or $^{1}$H–$^{15}$N nuclear Overhauser effects (NOEs) to probe fast motions (Palmer, 1997; Li and Montelione, 1995; Lane, 1993; Gorenstein, 1994). Amide proton exchange rates reflect much slower motions (Koning et al., 1991). Deuterium relaxation measurements have also been used for probing side-chain dynamics in proteins. Furthermore, since biomolecules are not rigid bodies, internal motion and local flexibility must be taken into account during structural refinement (Schmitz et al., 1996; Nilges, 1996;
van Gunsteren et al., 1994). Standard methods for structure refinement, e.g., restrained molecular dynamics (rMD) and distance geometry (DG), yield a single rigid structure when used with the usual NMR structural restraints, i.e., interproton distances derived from nuclear
Overhauser effect spectroscopy (NOESY) cross-peak intensities and torsion angles derived from scalar coupling data. However, the nonlinear averaging of distances and angles that occurs with conformational fluctuations may compromise structural restraints. Some amelioration of the distortions in restraint values can be achieved by accounting for internal motions (Koning et al., 1991; Kumar et al., 1992; Liu et al., 1992), but a single rigid structure still results when using these methods. This
structure must be consequently understood as ensemble- and time-averaged. As we barely have sufficient data to define a single structure with high resolution, we certainly cannot define to high resolution each member of an ensemble of interconverting conformers. At best, we hope to improve our picture of the dynamic nature of proteins and nucleic acids. Challenges to construction of a dynamic high-resolution structure of biomolecules include
a. Calculating structural restraints with the best possible accuracy
b. Identifying internal inconsistencies in experimental NMR restraints, which may possibly arise from dynamics
c. Generating a large enough pool of molecular conformations to encompass all possible interconverting conformers
d. Assessing the pool of conformations to ascertain which are needed to account for all of the NMR data with minimal conflict
e. Evaluating how well the resulting ensemble represents the experimental NMR data
In this chapter, we outline the approaches being developed in our laboratory to address some of these challenges in generating accurate, precise, and possibly
dynamic images of biomolecules via NMR.
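The nonlinear averaging of distances mentioned above can be made concrete with a short numerical sketch. It assumes the simplest case, in which the NOE reports a population-weighted average of $r^{-6}$ over interconverting conformers; the function name and the two-conformer numbers are hypothetical and chosen only for illustration.

```python
def noe_effective_distance(distances, populations):
    """Apparent distance (angstroms) under simple <r^-6> averaging.

    distances   -- interproton distance in each conformer
    populations -- relative population of each conformer
    """
    total = sum(populations)
    mean_inv_r6 = sum(p * r ** -6 for r, p in zip(distances, populations)) / total
    return mean_inv_r6 ** (-1.0 / 6.0)


# Hypothetical example: two equally populated conformers.
r_apparent = noe_effective_distance([2.5, 5.0], [0.5, 0.5])
r_arithmetic = (2.5 + 5.0) / 2.0
print(f"apparent {r_apparent:.2f} A, arithmetic mean {r_arithmetic:.2f} A")
```

In this example the apparent distance is about 2.8 Å, far from the arithmetic mean of 3.75 Å, so a single structure restrained to the apparent value need not resemble either conformer.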
2. DETERMINATION OF STRUCTURAL RESTRAINTS
2.1. Interproton Distance Restraints
The preliminary requirement for structure elucidation by NMR is an accurate estimation of structural restraints. Two main sources of structurally sensitive information from NMR data are NOEs and coupling constants, NOEs being of principal importance. In homonuclear NMR, the NOE between two protons arises due to the interaction between their magnetic dipole moments; the NOE can be observed if the protons are close in space. In simplest terms, the intensity $a_{ij}$ of a cross peak between protons i and j in a two-dimensional NOE (2D NOE) spectrum relates to the interproton distance $r_{ij}$ as
$$a_{ij} \propto r_{ij}^{-6} \qquad (1)$$
Interproton distances can be calculated from experimental values of NOE cross-peak intensities. These interproton distances then constitute the basis for structural refinement. However, the relationship of Eq. (1) becomes imprecise as a result of
multispin effects, so-called spin diffusion. The presence of numerous protons in a molecule constitutes a network of magnetization relaxation pathways. Each proton experiences multiple dipole–dipole interactions with neighboring protons, which is the basis of spin diffusion. Although one can use an isolated spin-pair approximation [Eq. (1)] to estimate the interproton distances, spin diffusion makes this method prone to systematic errors. In order to calculate accurate interproton distances, one must consider all relaxation pathways.
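As a minimal sketch of the isolated spin-pair approximation of Eq. (1), an unknown distance can be estimated by calibrating against a cross peak for a proton pair of known, fixed separation. The function name, the reference distance (a cytosine H5–H6 pair taken as roughly 2.46 Å), and the intensities below are assumptions made for illustration; they are not values from this chapter.

```python
def isolated_pair_distance(intensity, ref_intensity, ref_distance):
    """Distance from a cross-peak intensity under the two-spin approximation.

    Uses a_ij proportional to r_ij^-6, calibrated on a reference pair.
    Spin diffusion is ignored, so the result carries a systematic error.
    """
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Hypothetical intensities in arbitrary units.
r = isolated_pair_distance(intensity=0.025, ref_intensity=0.10, ref_distance=2.46)
print(f"estimated distance: {r:.2f} A")
```

Because of the sixth-root dependence, a fourfold drop in intensity changes the estimate by only about 26%, which is also why modest intensity errors translate into small distance errors only when spin diffusion is negligible.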
A matrix of NOE intensities, $\mathbf{a}$, at mixing time $\tau_m$ is related to a matrix of dipole–dipole relaxation rates, $\mathbf{R}$, by the matrix exponential equation (Keepers and James, 1984)
$$\mathbf{a}(\tau_m) = \exp(-\mathbf{R}\,\tau_m) \qquad (2)$$
where the off-diagonal relaxation rates $R_{ij}$ are proportional to the inverse sixth power of the interproton distance:
$$R_{ij} \propto r_{ij}^{-6} \qquad (3)$$
and the proportionality coefficients depend on the motional model (Ernst et al., 1987). In contrast to the relationship of Eq. (1), Eqs. (2) and (3) are exact: they hold in the presence or absence of spin diffusion. Equations (2) and (3) constitute the basis for the program CORMA (COmplete Relaxation Matrix Analysis) developed
in this laboratory (Keepers and James, 1984). CORMA calculates NOE intensities using either a full set of interproton distances or the atomic coordinates of a structure.
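The forward calculation from coordinates to NOE intensities can be sketched in a few lines. This is not the CORMA code; it is a simplified illustration that assumes a rigid molecule tumbling isotropically, individual (non-equivalent) protons, and the standard homonuclear dipolar expressions for the cross-relaxation and auto-relaxation rates. All function names and the three-proton geometry are hypothetical.

```python
import numpy as np

GAMMA_H = 2.6752e8      # proton gyromagnetic ratio, rad s^-1 T^-1
HBAR = 1.0546e-34       # reduced Planck constant, J s
MU0_OVER_4PI = 1.0e-7   # T m A^-1


def relaxation_matrix(coords, tau_c, larmor_hz):
    """Relaxation rate matrix R (s^-1); coords in angstroms, tau_c in seconds."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    omega = 2.0 * np.pi * larmor_hz

    def J(w):  # spectral density for isotropic tumbling
        return tau_c / (1.0 + (w * tau_c) ** 2)

    # Dipolar constant, converted from m^6 s^-2 to angstrom^6 s^-2.
    q = 0.1 * (MU0_OVER_4PI * GAMMA_H ** 2 * HBAR) ** 2 * 1e60
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # no self-interaction
    inv_r6 = d ** -6.0
    # Off-diagonal elements: cross-relaxation rates, proportional to r^-6.
    R = q * (6.0 * J(2.0 * omega) - J(0.0)) * inv_r6
    # Diagonal elements: sum of direct dipolar relaxation to all other protons.
    rho = q * (J(0.0) + 3.0 * J(omega) + 6.0 * J(2.0 * omega)) * inv_r6.sum(axis=1)
    R[np.diag_indices(n)] = rho
    return R


def noe_intensities(R, tau_m):
    """a = exp(-R * tau_m), evaluated through the eigenvalues of symmetric R."""
    w, V = np.linalg.eigh(R)
    return (V * np.exp(-w * tau_m)) @ V.T


# Hypothetical linear arrangement of three protons, 2.5 angstroms apart.
R = relaxation_matrix([(0, 0, 0), (2.5, 0, 0), (5.0, 0, 0)], tau_c=3e-9, larmor_hz=500e6)
a = noe_intensities(R, tau_m=0.1)
print(np.round(a, 4))
```

In this geometry the cross peak between the two outer protons contains a contribution relayed through the central proton, which is the spin-diffusion effect that the two-spin approximation misses.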
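Equation (2) can be easily inverted, thus expressing the matrix of relaxation rates $\mathbf{R}$ via the logarithm of the matrix of NOE intensities $\mathbf{a}$ measured at mixing time $\tau_m$:
$$\mathbf{R} = -\frac{\ln \mathbf{a}(\tau_m)}{\tau_m} \qquad (4)$$
Together with inverted Eq. (3),
$$r_{ij} \propto R_{ij}^{-1/6} \qquad (5)$$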
one could obtain a means of calculating interproton distances from experimentally measured NOE intensities. Distances calculated in such a way would be accurate in the sense that they would not depend on the presence of spin diffusion in a molecule. Unfortunately, this is not a practical method to determine distances from the experimental NOE data. For any realistic system, the set of experimental NOE intensities is incomplete due to experimental limitations, such as peak overlap, limited spectral resolution, finite signal-to-noise ratio, and incompletely assigned
proton resonances. The algorithm MARDIGRAS (Borgias and James, 1990) developed in our laboratory solves this problem by supplementing the experimentally measured NOE intensities with theoretical intensities calculated from an initial (model) structure, thus forming a complete hybrid matrix of NOE intensities and a corresponding matrix of relaxation rates. Hybrid matrices are then modified iteratively to achieve consistency between all observed and nonobserved intensities and relaxation rates. The output consists of accurate interproton distances corresponding to experimentally observed NOE intensities. These distances are represented in the form of upper and lower bounds as well as an average value for each calculated
distance. For both CORMA and MARDIGRAS calculations, an explicit motional model must be assumed in order to derive the proportionality coefficients in Eqs. (3) and (5). In most practical cases, it is assumed that a molecule undergoes overall isotropic tumbling with correlation time $\tau_c$, an assumption valid for roughly spherical molecules. However, in the case of elongated modular proteins (large long axis–to–short axis ratio) and long nucleic acid duplexes, anisotropic tumbling should be assumed. An estimate of the overall correlation time is required to determine the distance information from 2D NOE data. The correlation time can be measured from heteronuclear relaxation data for $^{13}$C- or $^{15}$N-labeled samples, or it can be roughly estimated from experimentally measured values of spin–lattice and spin–spin relaxation rates of various protons. For small nucleic acids and proteins, the correlation time is on the order of a few nanoseconds. It is always advisable to estimate a range of correlation times and perform MARDIGRAS calculations using several values within that range. Also, one has to remember that $\tau_c$ depends on solvent viscosity and, most importantly, on the aggregation state of the molecule. Besides an overall tumbling motion, fast internal molecular motions that affect relaxation rates should also be taken into account during
relaxation matrix calculations. Rotational motions of methyl groups, flipping of aromatic rings, exchange of labile protons with bulk solvent, and effective internal motions expressed as order parameters (Lipari and Szabo, 1982) are some examples of processes that can be included in relaxation matrix calculations (Liu et al., 1992, 1993; Kumar et al., 1992). A severe limiting factor in NMR-based structure determination can be the quality of the measured 2D NOE intensities. In many cases the observed intensities are compromised by decreased signal-to-noise ratio, integration errors, and incomplete recovery of longitudinal magnetization of protons. Incomplete recovery of magnetization occurs when the delay between consecutive free induction decay (FID) acquisitions in an NMR experiment (repetition delay) is not sufficient for the longitudinal magnetization to reach its equilibrium value. Interproton distances can be biased if determined from an incompletely relaxed 2D NOE spectrum. A repetition delay about five times the spin–lattice relaxation time $T_1$ is required to achieve nearly full recovery of proton magnetization. In certain cases, e.g., the H2 proton on adenine, $T_1$ values are often quite long, making it impractical to use long repetition delays. A program called SYMM was developed in our laboratory for correcting 2D NOE intensities for partial relaxation (Liu et al., 1996). SYMM uses the ratios of the below- and above-diagonal cross-peak intensities of a partially relaxed 2D NOE spectrum to calculate scaling factors that are used to correct the intensities. Alternatively, SYMM can adjust the NOE intensities by taking into account the experimentally measured $T_1$ values for individual protons and the experimentally used repetition delay value. It is desirable that NOE intensities be corrected prior to complete relaxation matrix calculations for spectra acquired with a very short repetition delay.
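The rule of thumb of a repetition delay near five times the spin–lattice relaxation time can be checked with a short calculation. The sketch below assumes simple mono-exponential recovery from full saturation; it illustrates the size of the effect only and is not the correction scheme implemented in SYMM. The function name and the example relaxation times are hypothetical.

```python
import math


def recovered_fraction(repetition_delay, t1):
    """Fraction of equilibrium longitudinal magnetization recovered.

    Assumes mono-exponential recovery from complete saturation:
    M(t) / M0 = 1 - exp(-t / T1).
    """
    return 1.0 - math.exp(-repetition_delay / t1)


# Hypothetical protons: a typical one (T1 = 1.5 s) and a slowly relaxing one (T1 = 5 s).
for t1 in (1.5, 5.0):
    for delay in (2.0, 5.0 * t1):
        frac = recovered_fraction(delay, t1)
        print(f"T1 = {t1:.1f} s, delay = {delay:4.1f} s -> {100 * frac:.1f}% recovered")
```

With a delay of five times the relaxation time the recovery exceeds 99%, whereas a 2-s delay leaves a slowly relaxing proton with only about a third of its equilibrium magnetization, so its cross peaks would be systematically weak.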
Quantitative errors may become increasingly significant for 2D NOE intensities measured at longer mixing times due to increased spin diffusion (Liu et al., 1995b). This is especially true for weak 2D NOE cross peaks representing large distances. In the case of proteins, large distances may involve long-range restraints that define the tertiary structure or global fold of the protein. Furthermore, because of error propagation, relatively small integration errors for strong cross peaks can also dramatically affect the distances derived from other weak NOE intensities. To overcome this problem, MARDIGRAS has recently been supplemented with an error analysis option (RANDMARDI) which simulates the effects of random spectral noise and errors in peak integration (Liu et al., 1995b). The program introduces a random error in each experimentally measured NOE intensity, and MARDIGRAS is then run for the perturbed set of intensities. The procedure is iterated (typically, 30–50 rounds) with the random errors varied within the user-defined limits. The resulting distance bounds from random-error MARDIGRAS calculations have enhanced accuracy, although the precision (i.e., the tightness of bounds) may be often decreased. This procedure has been applied to a DNA-psoralen complex (Liu et al., 1995b), a heteroduplex of an antisense DNA probe and its RNA target
sequence (Mujeeb et al., 1997), and a 17-mer RNA hairpin with a dynamic loop structure (Yao et al., 1997). In all cases RANDMARDI yielded more accurate distance restraints. Use of RANDMARDI in the case of the DNA-psoralen complex
study showed a clear advantage of this approach, affording a set of unbiased interproton distances. For the antisense DNA•RNA hybrid, analysis of interproton distances calculated with RANDMARDI permitted a successful analysis of conformational preferences of individual residues (see below). If possible, more than one starting model structure should be used in MARDIGRAS. Although MARDIGRAS shows little dependence on the starting model,
better starting structures yield somewhat more accurate distances. After initial refinements, resulting structures can be used as new starting models for the next round of distance calculations with MARDIGRAS. Further, it is often desirable
that calculations be performed on several sets of 2D NOE data collected at various mixing times. Typically, two to four 2D NOE spectra at different mixing times are
recorded, affording an increased number of distances as well as improved accuracy of the distance bounds. In the case of small molecules with fast overall correlation times, NOE intensities may become small or negative. For such molecules, the rotating-frame NOE experiment, ROESY, is a more sensitive method. The CARNIVAL algorithm permits determination of interproton distances from ROESY intensities (Liu et al., 1995a) and is now part of the MARDIGRAS program. This method
corrects for the Hartmann–Hahn transfer of magnetization (HOHAHA), which otherwise leads to errors in distances calculated from ROESY data. The HOHAHA effect is accounted for by using scalar coupling constant values. In turn, the coupling constants can be experimentally determined or estimated by a Karplus relationship from a model structure.
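The random-error analysis (RANDMARDI) described earlier in this section can be sketched as a simple Monte Carlo loop. In the sketch below each perturbed intensity is converted to a distance with the two-spin approximation as a stand-in for a full MARDIGRAS run, which is a deliberate simplification; the noise model (uniform relative error plus uniform absolute noise), the function names, and the intensities are assumptions made for illustration.

```python
import random


def two_spin_distance(intensity, ref_intensity, ref_distance):
    """Stand-in distance calculation (two-spin approximation, not MARDIGRAS)."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def random_error_bounds(intensities, ref_intensity, ref_distance,
                        rel_error=0.10, abs_noise=0.002, rounds=50, seed=0):
    """Return {peak: (lower, mean, upper)} distances over perturbed intensities."""
    rng = random.Random(seed)
    samples = {peak: [] for peak in intensities}
    for _ in range(rounds):
        for peak, a in intensities.items():
            # Integration error scales with the peak; spectral noise does not.
            perturbed = a * (1.0 + rng.uniform(-rel_error, rel_error))
            perturbed += rng.uniform(-abs_noise, abs_noise)
            perturbed = max(perturbed, 1e-6)   # keep the intensity positive
            samples[peak].append(two_spin_distance(perturbed, ref_intensity, ref_distance))
    return {peak: (min(d), sum(d) / len(d), max(d)) for peak, d in samples.items()}


# Hypothetical strong and weak cross peaks, calibrated on a 2.46-angstrom reference pair.
bounds = random_error_bounds({"strong": 0.08, "weak": 0.005}, 0.10, 2.46)
for peak, (lower, mean, upper) in bounds.items():
    print(f"{peak}: {lower:.2f} - {upper:.2f} A (mean {mean:.2f} A)")
```

The absolute noise term dominates for the weak peak, so its bounds come out much wider than those of the strong peak, mirroring the observation that weak cross peaks representing long distances are the most error-prone.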
Interproton distances calculated with MARDIGRAS should be carefully analyzed for internal disagreements. Such disagreements may arise due to experimental errors in NOE intensities such as resonance misassignments and integration errors, which may be readily identified and corrected. Alternatively, internal inconsistencies can be caused by conformational flexibility leading to dynamically averaged interproton distances (Ulyanov et al., 1995; Schmitz et al., 1996; Yao et al., 1997). Use of the complete relaxation matrix method has now been widely accepted for the calculation of interproton distances from 2D NOE intensities (Mujeeb et al., 1993; Sorensen et al., 1997; Glemarec et al., 1996; Farr-Jones et al., 1995). Besides the MARDIGRAS and CORMA programs, similar approaches are used in methods such as the hybrid matrix approach and refinement of the structure against measured intensities via back calculations (Huang et al., 1993; Zhang et al., 1995; Zhu and Reid, 1995; Görler and Kalbitzer, 1997; Fedoroff et al., 1997).
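The hybrid-matrix idea behind MARDIGRAS, described earlier in this section, can be illustrated with a toy iteration: theoretical intensities from a model fill the gaps, observed intensities override them, and the matrix logarithm returns relaxation rates and hence distances. This is a simplified sketch, not the MARDIGRAS algorithm itself; it assumes rigid isotropic tumbling, updates the model only through the distances of observed pairs, and uses hypothetical function names and a hypothetical three-proton system.

```python
import numpy as np

Q = 5.6965e10  # 1H-1H dipolar constant, angstrom^6 s^-2


def spectral_terms(tau_c, larmor_hz):
    """Coefficients c such that cross-rate = c_sigma / r^6 and auto-rate = sum(c_rho / r^6)."""
    w = 2.0 * np.pi * larmor_hz
    J = lambda x: tau_c / (1.0 + (x * tau_c) ** 2)
    return Q * (6 * J(2 * w) - J(0.0)), Q * (J(0.0) + 3 * J(w) + 6 * J(2 * w))


def rates_from_distances(d, sigma_coef, rho_coef):
    off = ~np.eye(len(d), dtype=bool)
    inv_r6 = np.zeros_like(d)
    inv_r6[off] = d[off] ** -6.0
    R = sigma_coef * inv_r6
    R[np.diag_indices(len(d))] = rho_coef * inv_r6.sum(axis=1)
    return R


def sym_func(M, f):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.T


def hybrid_matrix_distances(model_d, observed, tau_m, tau_c, larmor_hz, n_iter=5):
    """observed: {(i, j): experimental intensity}. Returns the updated distance matrix."""
    sigma_coef, rho_coef = spectral_terms(tau_c, larmor_hz)
    d = np.array(model_d, dtype=float)
    for _ in range(n_iter):
        R = rates_from_distances(d, sigma_coef, rho_coef)
        a = sym_func(R, lambda w: np.exp(-w * tau_m))        # theoretical intensities
        for (i, j), value in observed.items():               # experiment overrides theory
            a[i, j] = a[j, i] = value
        # Matrix logarithm of the hybrid matrix; eigenvalues are clipped to stay positive.
        R_new = sym_func(a, lambda w: -np.log(np.clip(w, 1e-12, None)) / tau_m)
        for (i, j) in observed:                               # rates back to distances
            d[i, j] = d[j, i] = abs(sigma_coef / R_new[i, j]) ** (1.0 / 6.0)
    return d


# Hypothetical test system: the model has the 0-1 distance wrong (3.0 instead of 2.5 angstroms).
true_d = np.array([[0.0, 2.5, 3.0], [2.5, 0.0, 4.0], [3.0, 4.0, 0.0]])
sc, rc = spectral_terms(2e-9, 500e6)
a_true = sym_func(rates_from_distances(true_d, sc, rc), lambda w: np.exp(-w * 0.1))
model_d = true_d.copy()
model_d[0, 1] = model_d[1, 0] = 3.0
refined = hybrid_matrix_distances(model_d, {(0, 1): a_true[0, 1]}, 0.1, 2e-9, 500e6)
print(f"model 3.00 A -> refined {refined[0, 1]:.2f} A")
```

Only the pair with an observed intensity is updated; unobserved pairs keep their model distances, which is why better starting structures can be expected to give somewhat more accurate results.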
2.2. Coupling Constants and Torsion-Angle Restraints
Coupling-constant information as such or transformed into torsion-angle restraints constitutes another vital piece of NMR-derived structural data. Coupling-constant information, being vicinal in nature, provides local structural restraints. If such restraints are redundant, they may help characterize local motions. Interproton distances extracted from NOE data can be supplemented with coupling constants during the structure refinement process. Transforming vicinal coupling constants into bond torsion angles requires a well-parameterized Karplus function [see, e.g., Schmitz and James (1995)]. In the case of nucleic acids, several three-bond coupling constant values involving sugar protons are often estimated, providing insights into sugar pucker conformation and dynamics (Schmitz et al., 1990; Mujeeb et al., 1992; Conte et al., 1996). Homonuclear NMR methods such as phase-sensitive COSY, ECOSY, PECOSY, and double-quantum-filtered COSY (2QF-COSY) can be used to make direct or indirect measurements of coupling constants for small proteins, peptides, and oligonucleotide duplexes (Marion and Wüthrich, 1983; Griesinger et al., 1985; Bax and Lerner, 1988). However, line broadening due to increasing molecular size may prevent the direct measurement of coupling constants. Indirect determination of coupling constants involves fitting
experimental cross peaks to simulated ones using the SPHINX and LINSHA programs (Widmer and Wüthrich, 1987). This procedure has been successfully applied to fit 2QF-COSY peaks of DNA duplexes and a hybrid duplex (Celda et al., 1989; Schmitz et al., 1990; Mujeeb et al., 1992; Weisz et al., 1992; González et al., 1994). The program SPHINX generates a stick spectrum based on energy transitions for various COSY-type pulse sequences. Then LINSHA calculates effective 2D line shapes while accounting for experimental parameters such as
digital resolution, window functions, truncation effects, and resonance linewidths for the protons involved. Cross peaks are simulated for a set of coupling constant values and resonance linewidths and then fitted to experimental cross peaks. The
best fit provides coupling constants that are consistent with the experimental data. In the case of nucleic acids, this procedure provides an elegant method for determining sugar puckers via analysis of vicinal coupling constants using a modified Karplus function. [For a detailed description of the strategies involved in using SPHINX and LINSHA methods, see Schmitz and James (1995).] Sugars in nucleic acids are nonflat five-membered rings whose conformers (puckers) are classified according to the atom that deviates most from the plane and according to the direction of the deviation. Conformers are called endo if the most deviating atom points toward the exocyclic position, and conformers with the opposite direction of deviation are called exo (Saenger, 1984). For example, the standard B-form model of DNA has C2′-endo puckers of deoxyriboses, and standard A-form models of both DNA and RNA have C3′-endo sugar puckers. If we assume constant bond
lengths in sugars, the sugar ring conformation will depend on four degrees of freedom. However, for many practical cases, sugar conformation can be described by just two parameters: pseudorotation phase angle P and maximum pucker amplitude $\nu_{\max}$ (Altona and Sundaralingam, 1972; van Wijk et al., 1992). The typical range of $\nu_{\max}$ for undistorted rings is 30°–40°. Angle P is defined from 0° to 360° (or from –180° to 180°); C3′-endo sugar puckers have P between 0° and 36°, and C2′-endo puckers have P from 144° to 180°. Also, sugar conformations with P from –90° to 90° are often called northern, or N-conformers, and those with P from 90° to 270° are called southern, or S-conformers. Values of scalar coupling constants are directly related to the pseudorotation phase angle P and maximum pucker amplitude $\nu_{\max}$. In the case of rapidly interconverting sugar ring conformations, the effective coupling constant would be a population-weighted arithmetic average of the coupling constants of all conformers involved. A rigid sugar conformation can be defined by only two endocyclic torsion angles. By overdefining the pucker via coupling-constant analysis, one has the opportunity to characterize the repuckering dynamics. We have used a simple two-state model where a fast jump between N- and S-conformers is modeled to fit the derived torsion angles (Schmitz et al., 1990; Mujeeb et al., 1992; Weisz et al., 1992). For DNA duplexes, the major population of S-conformer (70%–95%) has been determined using this method.
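The two-state model lends itself to a short worked example. The sketch below assumes fast exchange, so that the observed coupling is the population-weighted arithmetic average of the limiting values for the N- and S-conformers, and it classifies a pseudorotation phase angle using the ranges quoted above. The function names and the limiting coupling constants (1 Hz and 10 Hz) are illustrative assumptions, not parameters from this chapter.

```python
def fraction_south(j_obs, j_north, j_south):
    """Population of the S-conformer from one averaged coupling constant.

    Solves J_obs = p_S * J_S + (1 - p_S) * J_N and clamps p_S to [0, 1].
    """
    p_south = (j_obs - j_north) / (j_south - j_north)
    return min(1.0, max(0.0, p_south))


def pucker_family(p_deg):
    """'N' for P within -90..90 degrees, 'S' for P within 90..270 degrees."""
    p = p_deg % 360.0
    return "N" if (p < 90.0 or p >= 270.0) else "S"


# Hypothetical limiting couplings for a sugar proton pair (Hz).
p_s = fraction_south(j_obs=8.2, j_north=1.0, j_south=10.0)
print(f"S-conformer population: {100 * p_s:.0f}%")
print(pucker_family(18.0), pucker_family(162.0))
```

With several coupling constants per sugar, the same relation becomes an overdetermined fit, which is what allows the populations and the pucker parameters of the two states to be estimated together.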
In contrast to the above strategy, which involves translating the J-coupling constants into torsion angles, the direct use of experimental coupling constants during restrained molecular dynamics refinement has also been explored (González et al., 1995). Experimental J-coupling constants were applied as a flat-well energy
term in addition to the AMBER 4.1 force field. The flat-well width expresses the accuracy of experimental coupling constant values with explicit upper and lower
bounds. Similarly to the distance restraints, this term adds a penalty to the total energy when the calculated coupling constant is beyond the experimental bounds. Theoretical J-coupling constants during AMBER simulations are calculated from
the trajectory coordinates using a generalized Karplus function. Methods involving the direct measurement of coupling constants, as well as fitting procedures, often fail in the case of larger proteins, because the analysis is hampered by increased natural proton linewidths and signal overlap in molecules of increasing molecular weight. For proteins, emphasis has mostly been on three-bond couplings related to the backbone dihedral angle φ and the side-chain torsion angle χ1. However, recent improvements in isotopic enrichment of proteins have made it possible to measure many heteronuclear J-couplings using multidimensional heteronuclear NMR experiments. Pulse sequences for the measurement of three-bond couplings have been developed and improved (Bax et al., 1994; Case et al., 1994). Information regarding J-coupling constants in proteins helps in stereospecific assignments of prochiral groups such as methylene protons and the methyl groups of leucine and valine (Basus, 1989). It is also possible to use the J-coupling data for prochiral protons during
refinement or conformational search without prior stereospecific assignments (Constantine et al., 1995). In the absence of stereospecific assignments, the use of “floating” chiralities during structure refinement also enhances the quality of the calculated structures. Floating chiralities may be assigned to all nondegenerate NMR signals from methylene and isopropyl groups and their fitting can be monitored statistically (Beckman et al., 1993) or via an NOE energy term (Folmer et al., 1997). Finally, coupling-constant data not only constitute a vital part of structural information, but also provide a tool for assessing the quality of structures calculated using NOE restraints. One can make use of J-coupling information for cross-validation.
For this purpose, coupling-constant information should be excluded from the refinement process. Subsequently, theoretical coupling constants can be calculated for the resulting structure(s) and compared with experimental data (Ulyanov et al., 1995). However, a prerequisite for this approach is a sufficient number of NOE restraints which allows efficient structure refinement without torsion-angle restraints.
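The flat-well coupling-constant restraint described earlier in this section can be illustrated with a short sketch. It assumes the simple three-term Karplus form J = A cos²θ + B cos θ + C (a simplification of the generalized function mentioned in the text) and a harmonic penalty outside the experimental bounds; the coefficients, bounds, and function names are hypothetical and do not reproduce the AMBER implementation.

```python
import math


def karplus(theta_deg, a, b, c):
    """Three-term Karplus relation J = A cos^2(theta) + B cos(theta) + C."""
    cos_t = math.cos(math.radians(theta_deg))
    return a * cos_t * cos_t + b * cos_t + c


def flat_well_penalty(j_calc, j_lower, j_upper, force_constant=1.0):
    """Zero inside [j_lower, j_upper]; harmonic outside (assumed form)."""
    if j_calc < j_lower:
        return force_constant * (j_lower - j_calc) ** 2
    if j_calc > j_upper:
        return force_constant * (j_calc - j_upper) ** 2
    return 0.0


# Hypothetical Karplus coefficients in Hz, for illustration only
A, B, C = 10.0, -1.0, 0.5

j_trans = karplus(180.0, A, B, C)   # torsion angle of 180 degrees
j_gauche = karplus(60.0, A, B, C)   # torsion angle of 60 degrees

# Experimental bounds of 9-11 Hz: the trans value lies above the upper bound
penalty_trans = flat_well_penalty(j_trans, 9.0, 11.0)
# Experimental bounds of 2-3 Hz: the gauche value lies inside the well
penalty_gauche = flat_well_penalty(j_gauche, 2.0, 3.0)
print(j_trans, j_gauche, penalty_trans, penalty_gauche)
```

The width of the well (upper minus lower bound) plays the role described in the text: it encodes the accuracy of the experimental coupling constant, so that no energy penalty is applied while the calculated value stays within the bounds.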
2.3. Other Types of Restraints
In certain systems, NOE intensities and coupling constants can be supplemented with other types of structural restraints. In the case of molecules containing anisotropic paramagnetic centers, such as metalloproteins and spin-labeled nucleic acids, the paramagnetic effects on chemical shifts may be used as additional restraints (Gochin and Roder, 1995; Salgueiro et al., 1997). Semiquantitative estimations of distance restraints from paramagnetic relaxation parameters can also be an important adjunct to standard NMR information (Gillespie and Shortle, 1997). Unlike conventional NMR information, the effects of paramagnetic shifts are long range, and inclusion of such information holds an obvious promise of enhanced quality of calculated structures. A systematic dependence of ¹H and ¹³C chemical shifts on secondary and tertiary protein structure has also been established (Wishart and Sykes, 1994a, 1994b). Efforts have been made to include this type of information directly in structure refinements (Ösapay et al., 1994), although with limited success in nonmetalloproteins. The limitations arise from the complexity of the physical interactions which determine the value of the chemical shift (Case et al., 1994). So, although for the moment direct use of chemical-shift data may not be feasible in high-resolution structure calculations, these data do contain useful structural information that can possibly be used indirectly during refinement.
2.4. Indices of Agreement
After the structure (or structural ensemble) has been determined via NMR, all predicted NMR parameters must be compared with the original experimental data. A
number of NOE-based agreement indices can be calculated with the program CORMA. Notable among them is a residual index R, which is analogous to a crystallographic R-factor:

$$R = \frac{\sum_i \left| A_o(i) - A_c(i) \right|}{\sum_i A_o(i)}$$

where $A_o(i)$ and $A_c(i)$ are observed and calculated NOE intensities, respectively, and the summation is performed over all observed cross peaks. However, due to the equal weighting of all deviations, such an R-factor is dominated by strong intensities corresponding to short distances. The term $|A_o(i) - A_c(i)|$ may be negligibly small if both calculated and observed intensities are weak and correspond to long, but nevertheless very different, distances. At the same time, such long distances can be critical for determining the correct tertiary fold of the molecule. A more sensitive index of agreement for NOE cross-peak intensities is a sixth-root-weighted factor $R^x$, which accounts for the approximate sixth-root dependence of interproton distances on NOE intensities (Thomas et al., 1991):

$$R^x = \frac{\sum_i \left| A_o(i)^{1/6} - A_c(i)^{1/6} \right|}{\sum_i A_o(i)^{1/6}}$$
In addition to the indices of agreement (e.g., R or $R^x$) based on all observed NOE intensities (“total” R-factors), it is often useful to split the observed intensities into several groups and calculate the indices for each group separately. For example, R-factors calculated for the subset of intensities which correspond to fixed interproton distances (such as distances between geminal protons or protons on aromatic rings) are not very sensitive to the details of a molecular structure; however, such R-factors are more sensitive to the motional model used and, in particular, to the overall rotational correlation time (see above). NOE-based R-factors are routinely split in the program CORMA into the intra- and interresidue components, the latter being more sensitive to the tertiary structure. Also, R-factors can be calculated for each residue separately. In addition to NOE-based R-factors, figures of merit can be calculated for interproton distances, scalar coupling constants (Ulyanov et al., 1993), and relaxation rates (Sec. 4.2). Analysis of such a plenitude of agreement indices may seem tedious, but it can indicate potential problems with the structure refinement, and, in some cases, it can reveal possible conformational averaging.
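The difference in sensitivity between the two NOE-based indices can be seen in a small numerical sketch. It assumes the definitions R = Σ|A_o − A_c| / ΣA_o and a sixth-root analogue in which each intensity is replaced by its sixth root; the function names and the intensity values are hypothetical and are not output of CORMA.

```python
def r_factor(observed, calculated):
    """Residual index: sum |Ao - Ac| / sum Ao over observed cross peaks."""
    num = sum(abs(o - c) for o, c in zip(observed, calculated))
    return num / sum(observed)


def sixth_root_r_factor(observed, calculated):
    """Sixth-root-weighted index: intensities are replaced by A**(1/6)."""
    num = sum(abs(o ** (1 / 6) - c ** (1 / 6))
              for o, c in zip(observed, calculated))
    den = sum(o ** (1 / 6) for o in observed)
    return num / den


# Illustrative intensities (arbitrary units): one strong peak that is fitted
# perfectly and one weak peak whose calculated intensity is four times too
# large, i.e. a long distance that is clearly wrong in the model.
observed = [0.200, 0.001]
calculated = [0.200, 0.004]

r = r_factor(observed, calculated)
rx = sixth_root_r_factor(observed, calculated)
print(f"R = {r:.4f}, sixth-root R = {rx:.4f}")
```

In this example the plain index stays small because the strong peak dominates the denominator, whereas the sixth-root index is several times larger and flags the misfit of the weak, long-distance cross peak. The same functions can be applied to subsets of peaks (intra- versus interresidue, or per residue) to obtain the split indices discussed above.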
3. ASSESSMENT OF CONFORMATIONAL FLEXIBILITY
The presence of multiple conformations in solution can be assessed by a variety of NMR methods. Especially well established are the methods to probe local
dynamics of the backbone and side chains in proteins using heteronuclear NMR of ¹⁵N and ¹³C nuclei [for a review, see, e.g., Palmer (1997)]. In the case of homonuclear NMR on protons, one of the indications of conformational averaging is so-called exchange broadening of proton NMR lines, which can be quantified by relaxation measurements (Schmitz et al., 1992b; Lane et al., 1993; Blackledge et al., 1993; McAteer et al., 1995) and then interpreted in terms of particular motional models. Here we will be concerned with the situation in which several conformations are in fast exchange with each other. Such conformations give rise to a single set of NMR signals, and structural restraints derived from NOE intensities or scalar coupling constants are nonlinear averages. Attempts to satisfy all these restraints in a single rigid structure may be unsuccessful or even yield a nonphysical conformation. One of the first well-documented examples of such a situation is antamanide, a cyclic decapeptide. In this case, rMD calculations were first used to prove that not all experimental restraints could be satisfied in any single structure (Kessler et al., 1988), which served later as a basis for identification of possible conformers (Blackledge et al., 1993). In another example, a model 17-residue peptide, evidence of multiple conformations was obtained after careful analysis of NOE cross-peak patterns (Merutka et al., 1993). On the one hand, a pattern of short- and medium-range NOE cross peaks showed that the peptide is highly helical; on the other hand, certain long-range cross peaks indicated that nonhelical but structured conformers exist at the same time. In yet another example, it was observed for a 17-nucleotide hairpin-loop RNA that an adenine H2 proton from the loop region had cross peaks simultaneously with protons of five(!) different residues (Yao et al., 1997). Clearly, such a situation could arise only for a highly flexible loop, but at the same time individual conformers must exist long enough to give rise to the observed NMR signals. The last example involves sugar pucker dynamics in nucleic acids. As discussed in Sec. 2.2, scalar coupling data indicate that deoxyriboses are in dynamic equilibrium for DNA duplexes and hybrids. Indeed, experimental J-couplings could not be explained by any single sugar conformation, but they were consistent with an equilibrium of C2′-endo and C3′-endo puckers. This conclusion can be corroborated with NOE data as well: some intraresidue NOE-derived interproton distances between sugar protons and base protons (H6 of pyrimidines or H8 of purines) were typical of C2′-endo sugars (typical of B-DNA), while the distances between other sugar protons and H6/H8 were suggestive of C3′-endo puckers (such as in A-DNA) (Ulyanov et al., 1995; Schmitz and James, 1995). From the examples discussed here, it is clear that redundant experimental restraints are required in order to reveal potential conformational averaging. This explains the scarcity of well-documented cases of conformational flexibility: solution structures are more typically underdetermined by the NMR data than overdetermined. In any case, the first logical step in assessment of NMR-derived restraints is a conventional refinement assuming a single rigid structure. For
example, the average structure of an antisense DNA•RNA hybrid determined by rMD with NOE-derived distance restraints displayed the O4′-endo sugar pucker for the DNA strand (intermediate between C2′-endo and C3′-endo). This was in contradiction to the experimentally derived J-coupling information, which suggested a dynamic equilibrium for deoxyriboses (González et al., 1995).
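Why nonlinearly averaged restraints can fail to fit any single structure is easy to see numerically. The sketch below assumes the commonly used approximation that, for conformers in fast exchange, the NOE-derived apparent distance is the population-weighted ⟨r⁻⁶⟩^(−1/6) average; the function name and the distances are hypothetical and chosen only for illustration.

```python
def apparent_distance(distances, populations):
    """Apparent distance <r^-6>^(-1/6) for conformers in fast exchange."""
    if abs(sum(populations) - 1.0) > 1e-9:
        raise ValueError("populations must sum to 1")
    avg = sum(p * r ** -6 for r, p in zip(distances, populations))
    return avg ** (-1.0 / 6.0)


# Two equally populated conformers with a proton pair at 2.5 and 5.0 Angstrom
r_app = apparent_distance([2.5, 5.0], [0.5, 0.5])
arithmetic_mean = 0.5 * (2.5 + 5.0)
print(f"apparent: {r_app:.2f} A, arithmetic mean: {arithmetic_mean:.2f} A")
```

The apparent distance (about 2.8 Å) is heavily biased toward the shorter contact and matches neither conformer nor their arithmetic mean (3.75 Å). A set of such restraints, each biased toward a different conformer, is the kind of internally conflicting data discussed in this section.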
4. ENSEMBLE CALCULATIONS
4.1. Overview
In recent years, several methods have been put forward for determining
structural ensembles from NMR data [reviewed by Ulyanov et al. (1998)]. One of
the most popular is so-called time-averaged molecular dynamics (MDtar), developed by Torda et al. (1990). Unlike standard rMD, in which all restraints are imposed at each time step of the simulation, MDtar requires that experimental
restraints be satisfied only for the whole trajectory on a time-average basis. MDtar has better sampling properties than rMD, but most importantly it generates a trajectory which serves to explain the experimental data with an ensemble of structures. When this method was applied to the example of a hybrid mentioned in the previous section, the resulting structural ensemble satisfied all the restraints together and better than any single structure (González et al., 1995; Schmitz and James, 1995). Similarly, the MDtar trajectory calculated for the
17-nucleotide RNA explained better the multiple NOE cross peaks observed for the loop adenine (Yao et al., 1997). One of the shortcomings of this method is that it produces very large ensembles of structures. Not only are such ensembles not unique (clearly, one never has enough experimental data to define unambiguously
each member of the ensemble), but it is also very difficult to analyze and draw structural and functional conclusions from such trajectories. To alleviate this problem, various clustering techniques can be used for selection of representative structures. Of note are the identification of time clusters in the MDtar trajectories (Yao et al., 1997) and hierarchical cluster analysis based on atomic root-mean-square deviations (RMSD) with the NMRCLUST program (Kelley et al., 1996). However, we must mention that the agreement with experimental data can be
compromised when going from large MDtar trajectories to a limited set of representative structures (Yao et al., 1997). In the next section we will discuss a method, PARSE, which is geared to determine smaller ensembles of structures that are still sufficient to explain the experimental data in the case of conformational averaging. An important methodological development in this field is the method of “multiple copy refinement.” This approach involves simultaneous rMD refinement of several conformations, assuming their equal populations (Bonvin and Brünger, 1995; Kemmink and Scheek, 1995). Unlike standard (“single copy”) rMD refinement,
distance restraints are imposed here not on individual interproton distances, but on distances which are ensemble-averaged for all copies of a molecule. The method appears to be very promising in situations where a few conformations exist in almost equal populations. A variant of multiple-copy refinement was proposed by Fennen et al. (1995), in which the contribution of each copy was weighted by a Boltzmann
factor calculated using conformational energy. In another group of methods, structural ensemble determination is separated into two independent parts: generation of a pool of potential conformers, and assessment of their probabilities (Brüschweiler et al., 1991; Ulyanov et al., 1995; Pearlman, 1996). One of these approaches is discussed in detail in the next section.
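The idea of restraining an ensemble-averaged distance, with copies weighted either equally or by a Boltzmann factor, can be sketched as follows. This is a minimal illustration and not the protocol of any of the cited methods: it assumes ⟨r⁻⁶⟩ averaging over copies, a simple upper-bound violation, and energies in kcal/mol; all names and numbers are hypothetical.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def boltzmann_weights(energies, temperature=300.0):
    """Normalized Boltzmann weights for conformational energies (kcal/mol)."""
    kt = K_B * temperature
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def averaged_restraint_violation(distances, weights, upper_bound):
    """Violation of an upper bound by the weighted <r^-6>^(-1/6) distance."""
    avg = sum(w * r ** -6 for r, w in zip(distances, weights))
    r_eff = avg ** (-1.0 / 6.0)
    return max(0.0, r_eff - upper_bound)


equal = boltzmann_weights([0.0, 0.0])    # two copies with equal populations
skewed = boltzmann_weights([0.0, 1.0])   # second copy 1 kcal/mol higher

# One copy has the proton pair at 2.5 A, the other at 5.0 A; bound is 3.0 A
violation = averaged_restraint_violation([2.5, 5.0], equal, upper_bound=3.0)
print(equal, skewed, violation)
```

Here the second copy alone would violate the 3.0 Å bound by 2 Å, but the ensemble-averaged distance satisfies it, which is the essential difference from single-copy refinement. With Boltzmann weighting, the higher-energy copy contributes less to the average.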
4.2. Relaxation-Rate-Based Probability Calculations
The algorithm PARSE (Probability Assessment via Relaxation rates of a
Structural Ensemble), developed in our laboratory (Ulyanov et al., 1995), splits the problem of structural ensemble determination from NMR data into two independent steps. First, a pool of potential conformers is generated, and then the conformers are assessed to find which conformers and their probabilities give the best agreement with experimental data. For the first step, PARSE uses the idea originally implemented in the algorithm MEDUSA by Ernst and co-workers (Brüschweiler et al., 1991): the total set of experimental restraints is subdivided into several groups, and the structure is refined repeatedly against each subset. The rationale for this procedure should be clear from the previous section: in the case of conformational averaging, the experimental restraints may be internally inconsistent. Therefore, removing conflicting restraints from the total set, several at a time, and refining against the
reduced set is expected to produce each time a distinct conformation. While it may not be true that each conformation, produced in such a way, represents a true solution conformer, nevertheless the conformational space covered with the calculated pool of structures is expected to reflect roughly the true conformational
flexibility. For small molecules with a small total number of restraints, all possible restraint subsets can be considered (Blackledge et al., 1993). For bigger molecules, a preliminary assessment of flexibility (see above) is necessary, the goal of which
is identification of conflicting restraints. For the second step, the identity and probabilities of the conformers in the
optimal ensemble are calculated with the program PDQPRO (Ulyanov et al., 1995). PDQPRO uses a quadratic programming algorithm (Fletcher, 1981) for the global constrained minimization of a relaxation-rate-based index of agreement (constraints are put on probabilities which must be nonnegative and sum to unity). This approach makes use of the fact that for rapidly exchanging conformers, the effective dipole–dipole relaxation rate is a linear population-weighted average of rates of
individual conformers. For a given theoretical conformational ensemble (i.e., a set of structures with assigned probabilities $p_k$), the effective relaxation rates are

$$R_{ij}^{\mathrm{eff}} = \sum_k p_k R_{ij}^{k}$$

where $R_{ij}^{k}$ is the relaxation rate for proton pair (i,j) in conformer k, and the summation is performed over all conformers. Then the problem of ensemble determination can be formulated as fitting the theoretical rates $R_{ij}^{\mathrm{eff}}$ to the observed rates $R_{ij}^{\mathrm{obs}}$ by varying the probabilities $p_k$. The PDQPRO program performs this fitting by finding the global minimum of the quadratic objective function

$$Q = \sum_{i,j} w_{ij} \left( R_{ij}^{\mathrm{obs}} - \sum_k p_k R_{ij}^{k} \right)^2 \qquad (9)$$

under the constraints $p_k \ge 0$ and $\sum_k p_k = 1$. The summation is carried out over all observed cross peaks (i,j) in Eq. (9); the weights $w_{ij}$ can be used to regulate the relative contribution of different cross peaks. Theoretical relaxation rates for each potential conformer can be calculated with the program CORMA using Eq. (3), and experimental relaxation rates can be derived from NOE intensities with MARDIGRAS using Eq. (4).
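The constrained fit of probabilities can be illustrated with a small self-contained sketch. It is not the PDQPRO implementation: instead of the quadratic programming algorithm of Fletcher (1981), it uses projected gradient descent onto the probability simplex, which reaches the same global minimum for this convex problem. The objective is assumed to be a weighted sum of squared differences between observed rates and population-weighted theoretical rates; the function names and the synthetic rates are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_k >= 0, sum_k p_k = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0.0:
            theta = t  # keep the threshold of the last index that passes
    return [max(x - theta, 0.0) for x in v]


def fit_probabilities(rates, observed, weights=None, iterations=2000):
    """Minimize sum_m w_m (obs_m - sum_k p_k rates[k][m])^2 on the simplex."""
    n_conf, n_peaks = len(rates), len(observed)
    w = [1.0] * n_peaks if weights is None else weights
    # Upper bound on the gradient's Lipschitz constant (twice the trace)
    lipschitz = 2.0 * sum(w[m] * rates[k][m] ** 2
                          for k in range(n_conf) for m in range(n_peaks))
    step = 1.0 / lipschitz
    p = [1.0 / n_conf] * n_conf
    for _ in range(iterations):
        resid = [sum(p[k] * rates[k][m] for k in range(n_conf)) - observed[m]
                 for m in range(n_peaks)]
        grad = [2.0 * sum(w[m] * resid[m] * rates[k][m]
                          for m in range(n_peaks)) for k in range(n_conf)]
        p = project_to_simplex([p[k] - step * grad[k] for k in range(n_conf)])
    return p


# Synthetic rates for three conformers and four cross peaks (arbitrary units)
rates = [[1.0, 0.2, 0.1, 0.5],
         [0.2, 1.0, 0.3, 0.1],
         [0.3, 0.1, 1.0, 0.4]]
true_p = [0.7, 0.3, 0.0]
observed = [sum(true_p[k] * rates[k][m] for k in range(3)) for m in range(4)]

fitted = fit_probabilities(rates, observed)
print([round(x, 3) for x in fitted])
```

In this synthetic test the third conformer is assigned zero probability, mirroring the way the fit discards pool members that are not needed to explain the data. For realistic pools with hundreds of cross peaks and conformers, a dedicated quadratic programming routine would be a more efficient choice than this simple iteration.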
5. EXPERIMENTAL EXAMPLES
In this section we will illustrate the two steps of the approach outlined above using preliminary data for a small peptide and a nucleic acid as examples. It is clear from the previous section that conformers’ probabilities can be assessed with PDQPRO independently of the sampling method. In the first example, we will apply PDQPRO to a very small pool of conformations calculated during conventional NMR refinement. In the second example, we will show how conformational sampling can be carried out by partitioning the distance restraints into a series of self-consistent subsets.
5.1. MVIIC
A 26-residue peptide, MVIIC, from the sea snail Conus magus is a calcium channel blocker which binds with high affinity to voltage-sensitive calcium channels in neurons. The peptide has three disulfide bonds: between Cys 1 and Cys 16, between Cys 8 and Cys 20, and between Cys 15 and Cys 26. The solution structure of this peptide has been determined using NMR (Farr-Jones et al., 1995). Interproton distance restraints were calculated from homonuclear 2D NOE cross-peak intensities using MARDIGRAS. The distances were input to DG, which produced 15 structures;
these structures were subsequently refined with rMD simulated annealing using AMBER as described (Farr-Jones et al., 1995). The final ensemble of 15 refined structures is shown in Fig. 1 (left). This kind of structural ensemble is a typical result of NMR refinement: the global fold is well defined by the NMR restraints (the backbone atomic RMSD is 0.84 Å), but some variation in the backbone geometry and side-chain conformations (e.g., Tyr 13) is clearly seen. We must emphasize that the set of structures shown here (Fig. 1, left) is different from the structural ensembles discussed in the previous section: all 15 structures are equally valid (or invalid), each was calculated using the same protocol, and no probability is associated with any of them. The conformational variations in the 15 structures may reflect the true conformational flexibility in solution or just the degree to which the structure is underdetermined by the available NMR restraints. The theoretical relaxation rates for the 15 structures were calculated using CORMA, and the experimental rates were derived from the original NOE intensities using MARDIGRAS. After that, the index [Eq. (9)] was optimized with PDQPRO, assuming equal weights. The structural ensemble with the minimum index consists of just seven structures with nonzero probabilities: 32.0, 30.8, 17.8, 9.4, 4.6, 3.9, and 1.5%. The resulting ensemble is shown in Fig. 1 (right) with the diameter and darkness of the backbone proportional to the probability of the structure. It is seen that the conformational variation is significantly reduced for the
PDQPRO ensemble, although it has not disappeared completely. For example, the side chain of Tyr 13 still occupies two distinct positions in the conformational space, but the probability of one of them is a mere 1.5%. It is to be expected that the relaxation-rate-based index improved for the PDQPRO ensemble (3.61) compared to individual structures (4.38–7.95, Table 1), because this index was optimized during the PDQPRO calculation. It is noteworthy, however, that the NOE-based R and $R^x$ factors also improved, especially their interresidue components, which are more sensitive to the tertiary fold (Table 1). Still, it is not yet clear to what extent the PDQPRO ensemble represents the solution conformers, because the improvement in the indices of agreement was not that dramatic. A conservative interpretation of the result is that the eight structures eliminated are not necessary to explain the existing NMR data.
5.2. Nucleic Acid Example
We have recently carried out a high-resolution proton NMR study on a hybrid duplex formed by a methylphosphonated DNA strand and a complementary RNA strand (Mujeeb et al., 1997). Here, MP stands for the methylphosphonate linkages in the pure
stereoconfiguration; MP alternates with the usual phosphodiester linkage in the DNA strand. Our investigations suggested that this hybrid is dynamic, and its average structure is different from the standard A- and B-forms. Indeed, the absence of certain sugar-proton cross peaks for the RNA strand indicated that the corresponding scalar coupling constants were small, which is typical for the A-like ribose conformations. In contrast, deoxyriboses in the MP-DNA strand exhibited strong, readily detectable cross peaks with closely similar patterns in the 2QF-COSY spectrum. Similar observations have been made for another hybrid, which were explained by highly flexible sugar puckers in the DNA strand (González et al., 1994, 1995; Schmitz et al., 1996). We will illustrate how the idea of PARSE can be used to calculate an ensemble of representative structures which explain the experimental data. Interproton distance restraints were calculated using MARDIGRAS for the 2D
NOE data acquired at three mixing times (50, 150, and 350 ms). The RANDMARDI approach was applied to account for spectral noise and integration errors as described in Sec. 2.1. Three initial model structures were used for MARDIGRAS calculations, which corresponded to the A-form, the B-form, and a mixed conformation with C3′-endo sugar puckers for the RNA strand and O4′-endo puckers for the MP-DNA strand (O4′-endo sugar puckers have pseudorotation phase angle P from 72° to 108°, intermediate between those for C3′-endo and C2′-endo). Comparison of the calculated distances with the three initial structures showed that calculated distances in the RNA strand were consistent with the values expected for the A-form. However, the situation was mixed for the MP-DNA strand. Some of the distances (e.g., intraresidue distances between a sugar proton and base proton H6 or H8) were typical of A-DNA, other distances (e.g., some interresidue sequential distances) were consistent with B-form DNA, and yet other distances had intermediate values between the A- and B-forms. Such apparently conflicting interproton distances suggested a significant degree of conformational averaging in the MP-DNA residues. A comparison of experimental NOE cross-peak intensities with NOE intensities calculated via CORMA for the three model structures (using the index of agreement computed for each residue separately; see Sec. 2.4) yielded qualitatively similar results. We attempted to produce a set of the hybrid conformations which may represent the flexibility of this molecule in solution and explain the experimental interproton distances. For that purpose, we constructed a series of distance-restraint subsets by excluding, one at a time, the conflicting restraints from the total set. Altogether, 144 subsets were generated, and the structure was refined against each of them. The refinement included a 10-ps in vacuo rMD run using standard protocols described elsewhere (Mujeeb et al., 1993). Normally, such a procedure is known as “cross validation” (Brünger et al., 1993; Weisz et al., 1994). When a molecule has one major conformation in solution,
all experimental restraints must be self-consistent. In such a case, exclusion of a small percentage of restraints from the restraint set is not expected to affect the refined structure. However, if experimental restraints were subject to conformational averaging, refinement with just a few critical restraints removed may lead to a dramatically
different structure (Ulyanov et al., 1995, 1998). This turned out to be the case for the MP-DNA•RNA hybrid. Refinements against the 144 distance-restraint subsets led to many distinct conformations with a variety of sugar puckers in the MP-DNA
strand. Subsequently, the PDQPRO calculations were carried out for the pool of 144 potential conformers. Sets of theoretical dipolar relaxation rates were calculated for each of 144 conformations using CORMA. Experimental relaxation rates were calculated with MARDIGRAS for experimental 2D NOE sets at three mixing times. A fourth set of experimental relaxation rates was created by averaging the rates for
the three mixing-time data sets. The PDQPRO calculations selected about five
conformers with nonzero probabilities for each of the four sets of experimental rates. Altogether, 10 structures were selected, with some conformations picked in more than one set of calculations, although with different probabilities. Figure 2 shows the distribution of sugar puckers in the selected structures. In agreement with the qualitative results of the 2QF-COSY cross-peak analysis, the riboses in the RNA strand have a relatively narrow distribution around C3′-endo puckers (top), while deoxyriboses in the MP-DNA strand populate all sugar conformations from C3′-endo to C2′-endo (bottom). It is interesting that the flexibility of sugars in the MP-DNA strand of the hybrid surpasses that of the sugars in pure DNA•DNA duplexes; this is in agreement with the studies on another DNA•RNA hybrid (González et al., 1994, 1995; Schmitz et al., 1996).
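The construction of reduced restraint sets used for conformational sampling in this example can be sketched as below. This is a schematic illustration only: the restraint labels and the function name `restraint_subsets` are hypothetical, and each resulting subset would still have to be passed to a separate rMD refinement, which is not shown.

```python
from itertools import combinations


def restraint_subsets(all_restraints, conflicting, n_excluded=1):
    """Return restraint lists, each omitting n_excluded conflicting restraints."""
    unknown = set(conflicting) - set(all_restraints)
    if unknown:
        raise ValueError(f"conflicting restraints not in total set: {unknown}")
    subsets = []
    for excluded in combinations(conflicting, n_excluded):
        dropped = set(excluded)
        subsets.append([r for r in all_restraints if r not in dropped])
    return subsets


# Hypothetical restraint labels; two of the five are flagged as conflicting
restraints = ["r1", "r2", "r3", "r4", "r5"]
conflicting = ["r2", "r4"]

subsets = restraint_subsets(restraints, conflicting)
for subset in subsets:
    print(subset)
```

Excluding one conflicting restraint at a time yields as many subsets as there are conflicting restraints, so 144 flagged restraints give 144 refinements. Setting `n_excluded` to a larger value removes several restraints at a time, at the cost of a rapidly growing number of subsets.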
6. CONCLUSIONS
A number of computational methods are currently being developed to model fast dynamics of nucleic acids and proteins at the atomic level of resolution based
on NMR data. Such dynamics can be described in the form of structural ensembles. A prerequisite for this modeling is the presence of conflicting experimental restraints: distance restraints derived from NOE data and/or torsion-angle restraints derived from scalar coupling constants. In turn, this requires a certain redundancy of restraints, which must overdefine the structure. The PDQPRO algorithm is capable of assessing the probabilities of structures in a given conformational pool based on the fitting to the experimental data. A convenient feature of this algorithm is that it can be combined with any method of conformational sampling which produces a pool of potential conformers. In all applications of the PDQPRO method that we have attempted so far, it significantly reduces the size of a conformational pool
(without sacrificing the agreement with experimental data), which simplifies greatly the subsequent structural analysis of the calculated ensemble. However, the success of this approach depends critically on comprehensive prior conformational sampling (Ulyanov et al., 1998). Methods, such as MEDUSA and PARSE, attempt
a “rational” construction of the conformational pool by identifying the conflicting restraints and then repeatedly refining the structure against reduced sets of restraints. In the case of bigger molecules, such an operation is far from trivial, and it often requires a prior hypothesis about possible solution conformers. Promising alternative approaches, which require further exploration, involve combination of PDQPRO with such sampling methods as MDtar or multiple-copy refinement. Lastly, all these methods should not be expected to produce a unique set of solution conformers, because typically there are barely enough experimental restraints to define even a single high-resolution structure. However, as we demonstrated for
certain experimental systems, it is possible to characterize areas of conformational
space that a flexible molecule occupies in solution, and to generate an ensemble of representative structures satisfying the existing experimental data.
ACKNOWLEDGMENTS. We thank Dr. Uli Schmitz for many useful discussions, and Eric Pettersen for providing the script to display ensembles of structures. This work was supported by NIH grants GM39247 and RR01081. TMB was partially supported by the NIH Training Grant NS07219.
REFERENCES
Altona, C., and Sundaralingam, M., 1972, J. Am. Chem. Soc. 94:2333–2344.
Basus, V. J., 1989, Meth. Enzym. 177:132.
Bax, A., Vuister, G. W., Grzesiek, S., Delaglio, F., Wang, A. C., Tschudin, R., and Zhu, G., 1994, Meth. Enzym. 239:79.
Bax, A., and Lerner, L., 1988, J. Magn. Reson. 79:429.
Beckman, R. A., Litwin, S., and Wand, A. J., 1993, J. Biomol. NMR 3:675.
Blackledge, M. J., Brüschweiler, R., Griesinger, C., Schmidt, J. M., Xu, P., and Ernst, R. R., 1993, Biochemistry 32:10960.
Bonvin, A. M., and Brünger, A. T., 1995, J. Mol. Biol. 250:80.
Borgias, B. A., and James, T. L., 1990, J. Magn. Reson. 87:475.
Brünger, A. T., Clore, G. M., Gronenborn, A. M., Saffrich, R., and Nilges, M., 1993, Science 261:328.
Brüschweiler, R., Blackledge, M., and Ernst, R. R., 1991, J. Biomol. NMR 1:3.
Case, D. A., Dyson, H. J., and Wright, P. E., 1994, Meth. Enzym. 239:393.
Celda, B., Widmer, H., Leupin, W., Chazin, W. J., Denny, W. A., and Wüthrich, K., 1989, Biochemistry 28:1462–1470.
Constantine, K. L., Friedrichs, M. S., Mueller, L., and Bruccoleri, R. E., 1995, J. Magn. Reson. Ser. B 108:176.
Conte, M. R., Bauer, C. J., and Lane, A. N., 1996, J. Biomol. NMR 7:190.
Ernst, R. R., Bodenhausen, G., and Wokaun, A., 1987, Principles of Nuclear Magnetic Resonance in One and Two Dimensions, Oxford University Press, New York.
Farr-Jones, S., Miljanich, G. P., Nadasdi, L., Ramachandran, J., and Basus, V. J., 1995, J. Mol. Biol. 248:106.
Fedoroff, O. Y., Ge, Y., and Reid, B. R., 1997, J. Mol. Biol. 269:225.
Fennen, J., Torda, A. E., and van Gunsteren, W. F., 1995, J. Biomol. NMR 6:163.
Ferrin, T. E., Huang, C. C., Jarvis, L. E., and Langridge, R., 1988, J. Mol. Graphics 6:13.
Fletcher, R., 1981, Practical Methods of Optimization, Vol. 2, Wiley, New York.
Folmer, R. H., Hilbers, C. W., Konings, R. N., and Nilges, M., 1997, J. Biomol. NMR 9:245.
Gillespie, J. R., and Shortle, D., 1997, J. Mol. Biol. 268:170.
Glemarec, C., Kufel, J., Földesi, A., Maltseva, T., Sandström, A., Kirsebom, L. A., and Chattopadhyaya, J., 1996, Nucl. Acids Res. 24:2022.
Gochin, M., and Roder, H., 1995, Prot. Sci. 4:296.
González, C., Stec, W., Kobylanska, A., Hogrefe, R. I., Reynolds, M. A., and James, T. L., 1994, Biochemistry 33:11062.
González, C., Stec, W., Reynolds, M. A., and James, T. L., 1995, Biochemistry 34:4969.
Gorenstein, D. G., 1994, Chem. Rev. 94:1315.
Görler, A., and Kalbitzer, H. R., 1997, J. Magn. Reson. 124:177.
Griesinger, C., Sorensen, O. W., and Ernst, R. R., 1985, J. Am. Chem. Soc. 107:6394.
Huang, P., Patel, D. J., and Eisenberg, M., 1993, Biochemistry 32:3852. Jardetsky, O., and Lefevre, J. F., 1994, FEBS Lett. 338:246. Keepers, J. W., and James, T. L., 1984, J. Magn. Reson. 57:404–426. Kelley, L. A., Gardner, S. P., and Sutcliffe, M. J., 1996, Prot. Eng. 9:1063.
Kemmink, J., and Scheek, R. M., 1995, J. Biomol. NMR 6:33. Kessler, H., Griesinger, C., Lautz, J., Müller, A., van Gunsteren, W. F., and Berendsen, H. J. C., 1988, J. Am. Chem. Soc. 110:3393. Koning, T. M. G., Boelens, R., van der Marel, G. A., van Boom, J. H., and Kaptein, R., 1991, Biochemistry 30:3787. Kumar, A., James, T. L., and Levy, G. C., 1992, Isr. J. Chem. 32:257. Lane, A. N., 1993, Prog. NMR Spectrosc. 25:481. Lane, A. N., Bauer, C. J., and Frenkiel, T. A., 1993, Eur. Biophys. J. 21:425. Li, Y. C., and Montelione, G. T., 1995, Biochemistry 34:2408. Lipari, G., and Szabo, A., 1982, J. Am. Chem. Soc. 104:4546.
Liu, H., Thomas, P. D., and James, T. L., 1992, J. Magn. Reson. 98:163. Liu, H., Kumar, A., Weisz, K., Schmitz, U., Bishop, K. D., and James, T. L., 1993, J. Am. Chem. Soc. 115:1590. Liu, H., Banville, D. L., Basus, V. J., and James, T. L., 1995a, J. Magn. Reson. Ser. B 107:51. Liu, H., Spielmann, H. P., Ulyanov, N. B., Wemmer, D. E., and James, T. L., 1995b, J. Biomol. NMR 6:390. Liu, H., Tonelli, M., and James, T. L., 1996, J. Magn. Reson. Ser. B 111:85.
Marion, D., and Wüthrich, K., 1983, Biochem. Biophys. Res. Commun. 113:967. McAteer, K., Ellis, P. D., and Kennedy, M. A., 1995, Nucl. Acids Res. 23:3962. Merutka, G., Morikis, D., Brüschweiler, R., and Wright, P. E., 1993, Biochemistry 32:13089. Mujeeb, A., Bishop, K., Peterlin, B. M., Turck, C., Parslow, T. G., and James, T. L., 1994, Proc. Natl. Acad. Sci. U.S.A. 91:8248. Mujeeb, A., Kerwin, S. M., Egan, W., Kenyon, G. L., and James, T. L., 1992, Biochemistry 31:9325. Mujeeb, A., Kerwin, S. M., Egan, W., Kenyon, G. L., and James, T. L., 1993, Biochemistry 32:13419.
Mujeeb, A., Reynolds, M. A., and James, T. L., 1997, Biochemistry 36:2371. Nilges, M., 1996, Curr. Opin. Struct. Biol. 6:617. Ösapay, K., Theriault, Y., Wright, P. E., and Case, D. A., 1994, J. Mol. Biol. 244:183. Palmer, A. G. III, 1997, Curr. Opin. Struct. Biol. 7:732. Pearlman, D. A., 1994, J. Biomol. NMR 4:279. Pearlman, D. A., 1996, J. Biomol. NMR 8:49. Pellegrini, M., Gobo, M., Rocchi, R., Peggion, E., Mammi, S., and Mierke, D. F., 1996, Biopolymers 40:561. Saenger, W., 1984, Principles of Nucleic Acid Structure, Springer-Verlag, New York.
Salgueiro, C. A., Turner, D. L., and Xavier, A. V., 1997, Eur. J. Biochem. 15:244. Schmitz, U., González, C., Ulyanov, N. B., Blocker, F. H., Liu, H., and James, T. L., 1996, in Biological Structure and Dynamics, Vol. 2 (R. H. Sarma and M. H. Sarma, eds.), Adenine Press, New York, p. 165. Schmitz, U., and James, T. L., 1995, Meth. Enzym. 261:1. Schmitz, U., Kumar, A., and James, T. L., 1992a, J. Am. Chem. Soc. 114:10654. Schmitz, U., Sethson, I., Egan, W. M., and James, T. L., 1992b, J. Mol. Biol. 227:510. Schmitz, U., Ulyanov, N. B., Kumar, A., and James, T. L., 1993, J. Mol. Biol. 234:373. Schmitz, U., Zon, G., and James, T. L., 1990, Biochemistry 29:2357.
Sorensen, M. D., Bjorn, S., Norris, K., Olsen, O., Petersen, L., James, T. L., and Led, J. J., 1997, Biochemistry 36:10439.
Thomas, P. D., Basus, V. J., and James, T. L., 1991, Proc. Natl. Acad. Sci. U.S.A. 88:1237. Torda, A. E., Scheek, R. M., and van Gunsteren, W. F., 1990, J. Mol. Biol. 214:223.
Ulyanov, N. B., Mujeeb, A., Donati, A., Furrer, P., Liu, H., Farr-Jones, S., Konerding, D. E., Schmitz, U., and James, T. L., 1998, in Molecular Modeling of Nucleic Acids (N. B. Leontis and J. Santalucia, Jr., eds.), American Chemical Society, Washington DC, p. 181. Ulyanov, N. B., Schmitz, U., and James, T. L., 1993, J. Biomol. NMR 3:547. Ulyanov, N. B., Schmitz, U., Kumar, A., and James, T. L., 1995, Biophys. J. 68:13. van Gunsteren, W. F., Brunne, R. M., Gros, P., van Schaik, R. C., Schiffer, C. A., and Torda, A. E., 1994, Meth. Enzym. 239:619. van Wijk, J., Huckriede, B. D., Ippel, J. H., and Altona, C., 1992, Meth. Enzym. 211:286. Weisz, K., Shafer, R. H., Egan, W. M., and James, T. L., 1992, Biochemistry 31:7477. Weisz, K., Shafer, R. H., Egan, W. M., and James, T. L., 1994, Biochemistry 33:354. Widmer, H., and Wüthrich, K., 1987, J. Magn. Reson. 74:316. Wishart, D. S., and Sykes, B. D., 1994a, Meth. Enzym. 239:363. Wishart, D. S., and Sykes, B. D., 1994b, J. Biomol. NMR 4:171. Yao, L. J., James, T. L., Kealey, J. T., Santi, D. V., and Schmitz, U., 1997, J. Biomol. NMR 9:229. Zhang, Q., Chen, J., Gozansky, E. K., Zhu, F., Jackson, P. L., and Gorenstein, D. G., 1995, J. Magn. Reson. Ser. B 106:164. Zhu, L., and Reid, B. R., 1995, J. Magn. Reson. Ser. B 106:227.
7
Complete Relaxation and Conformational Exchange Matrix (CORCEMA) Analysis of NOESY Spectra of Reversibly Forming Ligand–Receptor Complexes
Application to Transferred NOESY

N. Rama Krishna and Hunter N. B. Moseley

1. INTRODUCTION
The code for the three-dimensional structure of a macromolecule such as a protein is contained in its primary structure (Anfinsen, 1973). A study of the high-resolution three-dimensional structures of proteins is important in understanding protein folding pathways and in deciphering the rules for protein structure prediction. The principal methods for studying high-resolution structures of proteins are X-ray crystallography and multidimensional NMR spectroscopy. Even though the code for the biological activity of a macromolecule is also inherent in its three-dimensional structure, it is only through the formation of complexes with other molecules (e.g., proteins, enzymes, antibodies, DNA/RNA, natural products, carbohydrates, and lipids) that macromolecules exert their important activities. Thus, a study of the three-dimensional (3D) structures of molecular complexes is vital to an understanding of the molecular basis for recognition, biological function, and mode of action. Molecular complexes can also be studied at atomic detail by crystallography and NMR spectroscopy.

The analysis of nuclear Overhauser effects (NOEs) in multispin systems by the isolated spin-pair approximation (ISPA) is often inadequate because it neglects multispin effects, i.e., three-spin effects for small molecules with short correlation times and spin-diffusion effects for large molecules with longer rotational correlation times (Krishna et al., 1978). To account properly for these effects in the quantitative interpretation of 2D NOESY data on nucleic acids and proteins, complete relaxation rate matrix algorithms such as CORMA have been developed (Borgias and James, 1989, 1988; Keepers and James, 1984). Similarly, total relaxation rate matrix methods have been used in the quantitative analyses of 1D NOEs to account for these multispin effects, including spin diffusion in biomolecules (Krishna et al., 1978). Many other structure refinement procedures that employ total relaxation rate matrix analyses of NOESY intensities have since been proposed (e.g., Xu and Krishna, 1995; Xu et al., 1995a, 1995b; Bonvin et al., 1994; Sugar and Xu, 1992; Guntert et al., 1991; Mertz et al., 1991; Gorenstein et al., 1990; Borgias and James, 1988; Boelens et al., 1988). All these treatments usually assume a single conformation for the macromolecule under consideration.

N. Rama Krishna and Hunter N. B. Moseley • Department of Biochemistry and Molecular Genetics, Comprehensive Cancer Center, The University of Alabama at Birmingham, Birmingham, Alabama 35294-2041. Biological Magnetic Resonance, Volume 17: Structure Computation and Dynamics in Protein NMR, edited by Krishna and Berliner. Kluwer Academic / Plenum Publishers, New York, 1999.
1.1. Molecular Complexes and Conformational Exchange

When the biomolecular system also exhibits conformational exchange, such exchange effects must be incorporated properly in rigorous structure refinements based on complete relaxation matrix treatments. Typical examples of biomolecular conformational exchange include a ligand exchanging between free and receptor-bound forms (e.g., Moseley et al., 1997); proteins existing in equilibrium between native, partially folded, and denatured forms (Alexandrescu et al., 1990; Dobson and Evans, 1984) or between two distinct native forms (e.g., Boyd et al., 1984; Gupta et al., 1972); proteins exhibiting conformational heterogeneity in part of the sequence (Meadows et al., 1991; Driscoll et al., 1990), slow internal rotations for some of the side chains (Fejzo et al., 1991; Campbell et al., 1976; Wuthrich and Wagner, 1975), disulfide bond isomerization (Otting et al., 1993), or cis–trans proline isomerization (Hinck et al., 1997); and a DNA duplex existing in an equilibrium between two or more distinct conformations (Choe et al., 1991; Feigon et al., 1984). Theoretical treatments have been developed that describe complete relaxation and conformational exchange matrix analyses of NOESY of dynamical systems (Curto et al., 1996; Moseley et al., 1995; Ni and Zhu, 1994; Krishna and Lee, 1992; Lee and Krishna, 1992; London et al., 1992; Lippens et al., 1992; Ni, 1992).

The methods of analysis of NOESY data to develop molecular models of tight complexes with moderate molecular weights are essentially identical to those used for uncomplexed proteins, because the NOESY intensities are not influenced by the exchange rates of the interacting species between their free and bound forms. Because the exchange off-rates are much too slow under these conditions, one can prepare samples containing only the ligand–receptor complex without any free ligand. High-molecular-weight complexes of proteins with tightly bound ligands still present a challenge for structure determination by NMR. The upper limit in molecular weight has been extended very recently with the introduction of TROSY (Pervushin et al., 1997).
1.2. Reversible Binding and Transferred NOESY

When the binding is reversible (during the course of an experiment), as is usually the case for larger dissociation constants, the NOESY spectrum of a sample containing a ligand and its receptor reflects the transfer of magnetization by chemical exchange between free and complexed species as well as by dipolar exchange in each state. This reversible binding has been exploited in the design of the “transferred NOE” or “transferred NOESY” technique as a means of studying indirectly the receptor-bound conformation of a ligand from the NOESY spectrum of a sample containing excess ligand (Albrand et al., 1979). Under fast exchange conditions, the enhanced cross relaxation due to the longer rotational correlation time can often more than compensate for the minor fractional population of the complexed ligand.

It is not the purpose of this chapter to provide an exhaustive survey of the literature on transferred NOESY; the reader is referred to Ni (1994) for a comprehensive review of this field up to 1993. Other related topics dealing with protein–ligand interactions are covered by James and Oppenheimer (1994). The current chapter is intended to provide the authors’ own perspective on, and approach to, the quantitative interpretation of NOESY spectra of interacting molecules in general and the transferred NOESY in particular. Our approach, using the complete relaxation and conformational exchange matrix (CORCEMA) methodology, stresses the need to focus on the entire system involving the ligand, the receptor, their motional dynamics, and the kinetics of the binding process. It is hoped that this chapter will exert some modest influence on the way we think about transferred NOESY experiments.

In the following discussion, we use the words receptor and protein (or enzyme) interchangeably, though it must be understood that the receptor can be any general macromolecule. Indeed, examples are known where complexes form under fast reversible binding. These include drug–DNA (Pavlopoulos et al., 1995; Crenshaw et al., 1995; Wadkine and Graves, 1991), protein–DNA (Baleja et al., 1994; Dekker et al., 1993), protein–protein (Yi et al., 1994; Chen et al., 1993), nucleotide/nucleoside–enzyme (Murali et al., 1997; Jarori et al., 1994; Perlman et al., 1994; London et al., 1992), corepressor–repressor–operator (Lee et al., 1995), carbohydrate–protein (Casset et al., 1997; Bevilacqua et al., 1992), peptide–protein (Blommers et al., 1997; Ni et al., 1995; Campbell and Sykes, 1991), peptide–antibody (Anglister et al., 1995; Scherf et al., 1992), and peptide–membrane (Bersch et al., 1993; Gounarides et al., 1993) complexes, just to mention a few. In addition, the reported observation of NOEs between protein protons and bound water molecules (Otting and Wuthrich, 1989; Clore et al., 1990a) is another example of this. In all these cases, the NOESY intensities reflect both chemical and dipolar exchange of magnetization between protons. Their respective contributions need to be properly quantitated to obtain meaningful quantitative structural information.

One area of special interest in studying complexes is structure-based drug design by NMR (Fesik, 1993). For complexes of proteins with tight-binding lead compounds, the methods of analysis of NOESY spectra are well established. Sometimes, however, promising lead compounds may show only marginal affinities, and the chemist is faced with the prospect of designing higher-affinity analogs. Under these circumstances, a structure of the weakly binding lead compound deduced from a quantitative analysis of transferred NOESY spectra that incorporates the interactions between the ligand and the residues in the active site is of immense value. The CORCEMA method treats such ligand–receptor interactions explicitly and should be of value in structure-based drug design efforts.

Since the original introduction of the transferred NOE technique in the late 1970s (Albrand et al., 1979), the biomolecular NMR field has experienced several significant advances. Most notable among these are the introduction of multidimensional NMR and TROSY methods for studying proteins that were too large to study in the 1970s, development of isotope-filtered methods for recording subspectra (intra- and intermolecular) of interacting molecules, development of molecular biological procedures and overexpression systems for the production of proteins and other macromolecules with uniform isotopic labeling for use in these measurements, random fractional deuteration to reduce problems due to severe dipolar line broadening and spin diffusion in moderately large proteins, development of a wide variety of structure refinement algorithms that can quantitatively analyze the NOESY and other NMR data, and improvements in sensitivity through the construction of NMR systems with very high field magnets (now approaching the gigahertz range). It is clear that the transferred NOE field can be further advanced by taking advantage of these technological and methodological developments.
The theory for steady-state 1D transferred NOEs was developed by Clore and Gronenborn (1982). It was later extended to selective saturation-based time-dependent transferred NOEs with an emphasis on an analysis of initial slopes (Clore and Gronenborn, 1983). Under these conditions, useful structural information on the bound state is obtained if the conformational exchange is fast on the relaxation rate scale. The 2D transferred NOESY (tr-NOESY) presents obvious advantages over time-dependent 1D NOE techniques in terms of the efficient generation of large data sets, and is generally the preferred approach. Studies since 1991 have identified and characterized several factors that play a critical role in the quantitative analysis of 2D transferred NOESY. These are summarized below.

1.2.1. Finite Off-Rates
The importance of finite off-rates in tr-NOESY was anticipated (Choe et al., 1991), and theoretical frameworks to account for them were described by Lee and Krishna (1992) and, independently, by others (London et al., 1992; Ni, 1992; Lippens et al., 1992). The main advantage of these formulations is that the utility of the tr-NOESY technique is now extended to a much wider range of off- and on-rates, rather than being limited to the restrictive regime of exchange rates faster than the cross-relaxation rates. Typical examples of reversibly forming complexes with off-rates that are slow on the chemical-shift scale are the lysozyme–GlcNAc (Lumb et al., 1994) and Trp-repressor–operator (Lee et al., 1995) complexes. Additionally, as the molecular weight of the enzyme increases, the rotational correlation time and, hence, the cross-relaxation rates also increase, and the fast exchange approximation often employed in traditional tr-NOESY analyses may not be satisfied. All these theoretical formulations have been cast for treating multispin systems.
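The dependence of the exchange regime on molecular size can be made concrete with a rough numerical sketch. The Python fragment below is illustrative only and is not part of CORCEMA. It assumes an isolated proton pair, the standard dipolar expressions for isotropic tumbling written in SI units (hence the factor mu0/4pi), and a hypothetical off-rate of 100 s^-1. The function names, the 2.0 Å distance, the 500 MHz field, and the factor-of-ten margin used to label the regime are all arbitrary choices made for illustration.

```python
import math

MU0_OVER_4PI = 1.0e-7      # T m A^-1 (SI prefactor; absent in Gaussian-unit formulas)
HBAR = 1.054571817e-34     # J s
GAMMA_H = 2.6752218744e8   # rad s^-1 T^-1, proton magnetogyric ratio


def cross_relaxation_rate(r_angstrom, tau_c, field_mhz):
    """Return W2 - W0 (s^-1) for an isolated proton pair tumbling isotropically."""
    r = r_angstrom * 1.0e-10
    omega = 2.0 * math.pi * field_mhz * 1.0e6
    q = (MU0_OVER_4PI * HBAR * GAMMA_H ** 2) ** 2 / r ** 6
    w0 = 0.1 * q * tau_c                      # homonuclear: omega_i - omega_j ~ 0
    w2 = 0.6 * q * tau_c / (1.0 + (2.0 * omega * tau_c) ** 2)
    return w2 - w0


def exchange_regime(k_off, sigma, margin=10.0):
    """Label the regime by comparing the off-rate with |sigma| (margin is arbitrary)."""
    if k_off > margin * abs(sigma):
        return "fast"
    if k_off * margin < abs(sigma):
        return "slow"
    return "intermediate"


if __name__ == "__main__":
    k_off = 100.0  # s^-1, hypothetical off-rate
    for tau_c in (1e-9, 1e-8, 1e-7):  # s; larger complexes tumble more slowly
        sigma = cross_relaxation_rate(2.0, tau_c, 500.0)
        print(f"tau_c = {tau_c:.0e} s  sigma = {sigma:8.2f} s^-1  "
              f"{exchange_regime(k_off, sigma)}")
```

With the same off-rate, the magnitude of the cross-relaxation rate grows roughly in proportion to the correlation time once the spin-diffusion limit is reached, so a complex that is comfortably in fast exchange at a short correlation time drifts toward the intermediate regime as the receptor becomes larger.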
1.2.2. Intermolecular Ligand–Receptor Dipolar Relaxation

Many traditional analyses of tr-NOE experiments have routinely neglected ligand–protein intermolecular cross relaxation, partly because (i) until recently there was no adequate theoretical framework available to account for it properly, (ii) neglect of cross relaxation with protein protons simplified the analyses considerably, and (iii) in some instances the structure of the receptor active site and/or the identity of the residues within the site was presumably unknown, and it was therefore difficult to take this cross relaxation into account. Even now, publications continue to appear in which bound ligand conformations are deduced without regard to the possible role of protein protons and motions at the active site on the ligand NOESY spectra. Only time will tell whether any of these published structures need revision. It is obvious, however, that any serious effort at a structure-based design of a protein-binding ligand will benefit substantially from one’s ability to incorporate explicitly the intermolecular contacts with the protein, rather than suppressing or ignoring them. This is because the conformation of the active site itself can change substantially, depending upon the different chemical modifications on the ligand. Purine nucleoside phosphorylase inhibitors are an example of this (Ealick et al., 1991).

Our results (Moseley et al., 1994, 1995; Jackson et al., 1995) and those of others (Arepalli et al., 1995; Ni and Zhu, 1994; Zheng and Post, 1993; London et al., 1992) show conclusively that the neglect of ligand–protein interactions, and in particular the neglect of protein-mediated effects, can result in misleading conclusions about the bound conformation of the ligand. The effects of ligand–protein cross relaxation on the ligand tr-NOESY spectrum can be rather complex and fall into two distinct types, depending upon the relative disposition of receptor protons in relation to ligand protons. These are (i) protein-mediated spin-diffusion effects, which can sometimes dramatically affect the initial growth portions of the ligand tr-NOESY (Jackson et al., 1995), and (ii) protein-leakage effects, which tend to affect more the decay portions of the ligand tr-NOESY. The protein-mediated spin diffusion (or protein-indirect effects) deserves special consideration because, under the high ligand–receptor ratios customarily employed in tr-NOE experiments, protein-mediated spin-diffusion effects can be much more efficient than the corresponding bound-ligand-mediated spin-diffusion effects and can produce dramatic effects even for very short mixing times (Jackson et al., 1995). These protein-indirect effects on the ligand tr-NOEs, if not properly accounted for, can result in misleadingly compact structures for the bound ligand, as predicted theoretically (Moseley et al., 1995; Jackson et al., 1995; Ni and Zhu, 1994) and confirmed experimentally (Dratz et al., 1996; Arepalli et al., 1995). In contrast, the protein-leakage effects might lead to slightly expanded structures, if not properly treated.

1.2.3. Motions in the Bound State
1.2.3a. Protein Motions at the Active Site. An implicit assumption usually made is that the bound conformation deduced by the tr-NOESY technique corresponds directly to that of the ligand bound in the active site. Indeed, such an assumption is not unreasonable (i) if the active site with and without the ligand remains essentially identical [e.g., neuraminidase (Janakiraman et al., 1994)], as in the rigid lock-and-key model (Fischer, 1894), or (ii) when the process of ligand binding to and release from the active site (into the solvent) is practically instantaneous. Complications arise, however, when the ligand-binding process does not follow the rigid lock-and-key model and the active site on the receptor exhibits distinct conformational movements following the initial binding of a ligand, and/or the occupation of the active site by a ligand is not instantaneous but takes a finite time. These motions can be fast (motion of side chains) or slow (e.g., large-scale motions of domains). Motions on a time scale have been detected in the tips of the flaps that cover the active site of HIV-1 protease (Nicholson et al., 1995). In the case of lysozyme (adsorbed on mica), atomic force microscopy measurements have detected conformational changes of the order of 5 Å lasting for ~50 ms in the presence of a substrate (Radmacher et al., 1994). Maltodextrin binding protein exhibits a “Venus flytrap” type of rigid-body hinge-bending motion of two globular domains through an angle of 35° upon ligand binding (Sharff et al., 1992). Other examples of proteins exhibiting hinge-bending motions include thermolysin (Holland et al., 1992) and related neutral proteases, periplasmic proteins (Quiocho, 1991), yeast hexokinase (Bennett and Steitz, 1978), and adenylate kinase (Schulz et al., 1990), to name a few. The active site in human purine nucleoside phosphorylase exhibits considerable fluidity and undergoes different conformational rearrangements, depending upon the inhibitors (Ealick et al., 1991). All these proteins can be thought of as exhibiting an “induced fit” binding mode (Koshland, 1958). It is easily appreciated that motions at the active site occurring over a finite time (milliseconds to nanoseconds, longer than the rotational correlation time of the receptor) during the course of induced-fit binding of a ligand have the potential to modulate the intermolecular ligand–protein dipolar contacts as well as the intramolecular contacts, because of the accompanying conformational changes. Ignoring such effects might result in an erroneous interpretation of tr-NOESY data on a complex ligand–receptor system.
1.2.3b. Ligand Motions in the Bound State. The ligand itself may exhibit conformational transitions while bound to the protein (Perlman et al., 1994). Hence, any attempt to determine the so-called bound conformation of a ligand must properly address this conformational malleability of the ligand as well. The CORCEMA algorithm permits the incorporation of motions in both the protein and the ligand in the bound state (and, of course, in the unbound states as well).

1.2.4. Intermolecular Transferred NOESY

When the ligand–receptor ratio is not too high, it is sometimes possible to observe intermolecular ligand–receptor NOESY contacts for moderate-size proteins. These contacts are extremely valuable for properly docking the ligand within the binding pocket and, together with intra-tr-NOEs, for structurally refining the ligand and the active site residues.
1.2.5. Nonspecific Binding of the Ligand
In addition to binding in the active site of a receptor with high affinity and specificity, a ligand can often also associate with a receptor at nonspecific or weak binding sites. Such nonspecific binding has been demonstrated in several systems (e.g., Murali et al., 1997; Jarori et al., 1994; Behling et al., 1988). The tr-NOEs resulting from nonspecific binding may mask tr-NOEs from specific binding, and thus can complicate structural analysis and the estimation of the specifically bound ligand concentration. Hydrophobic association with surface hydrophobic residues and electrostatic interaction with charged surface residues are presumably two of the factors that contribute to such nonspecific binding.

2. CORCEMA THEORY

The theory for CORCEMA analysis presented here is based on the matrix algebra formulation originally developed by our laboratory to treat multistate (n-state) conformational exchange (Krishna et al., 1980), and is an extension of our early work on NOESY in exchanging systems (Lee and Krishna, 1992; Choe et al., 1991). In the following, we formulate the theory for a general multistate situation and illustrate its application with two specific examples: a two-state model and a three-state model of ligand–enzyme interactions.
2.1. Basic Formulation

The dynamic matrix D that governs the time evolution of the peak intensities in a 2D NOESY experiment is given by (Ernst et al., 1987)

$$\mathbf{D} = \mathbf{R} + \mathbf{K} \tag{1}$$

where R is the relaxation rate matrix and K is the kinetic matrix (Ernst et al., 1987; Krishna et al., 1980). The kinetic matrix has elements (Krishna et al., 1980)

$$K_{ii} = \sum_{j \neq i} k_{ij}, \qquad K_{ij} = -k_{ji} \quad (i \neq j) \tag{2}$$

where $k_{ij}$ is the rate of exchange from conformation i to conformation j. The elements of the R matrix for NOESY are (Krishna et al., 1978)

$$R_{ii} = \sum_{j \neq i} \left( W_0^{ij} + 2W_1^{ij} + W_2^{ij} \right) + \rho_i^{*}, \qquad R_{ij} = W_2^{ij} - W_0^{ij} \tag{3}$$

In the above equations, $W_0^{ij}$, $W_1^{ij}$, and $W_2^{ij}$ are the transition probabilities due to dipolar relaxation between two spin-1/2 nuclei i and j undergoing isotropic rotational diffusion, and are given by (Krishna et al., 1978)

$$W_0^{ij} = \frac{q_{ij}}{10}\,\frac{\tau_r}{1 + (\omega_i - \omega_j)^2 \tau_r^2}, \qquad W_1^{ij} = \frac{3 q_{ij}}{20}\,\frac{\tau_r}{1 + \omega_i^2 \tau_r^2}, \qquad W_2^{ij} = \frac{3 q_{ij}}{5}\,\frac{\tau_r}{1 + (\omega_i + \omega_j)^2 \tau_r^2} \tag{4}$$

where $q_{ij} = \gamma_i^2 \gamma_j^2 \hbar^2 / r_{ij}^6$; $\omega_i$ and $\gamma_i$ are the Larmor frequency and magnetogyric ratio, respectively, for spin i; $r_{ij}$ is the internuclear distance; $\tau_r$ is the isotropic rotational correlation time for the molecule; and $\rho_i^{*}$ is the leakage relaxation. We will refer to the off-diagonal elements $R_{ij}$ as cross-relaxation rates. Similar expressions for ROESY (Bax and Davis, 1985; Bothner-By et al., 1984) experiments on homonuclear systems are

$$R_{ii} = \sum_{j \neq i} \left( \tfrac{5}{2} W_0^{ij} + 3 W_1^{ij} + \tfrac{1}{2} W_2^{ij} \right) + \rho_i^{*}$$

and

$$R_{ij} = 2 W_0^{ij} + 2 W_1^{ij} \tag{5}$$

The transition probabilities $W_0^{ij}$, $W_1^{ij}$, and $W_2^{ij}$ are given by Eq. (4). For large macromolecules that satisfy the limit $\omega \tau_r \gg 1$, the off-diagonal elements of the relaxation rate matrix have opposite signs for the NOESY and ROESY experiments, a feature that has been exploited in the design of pulse sequences for minimizing spin-diffusion contributions (Macura et al., 1994; Fejzo et al., 1992).

For situations where internal motions need consideration, the model-free expressions of Lipari and Szabo (1982a, 1982b) can be used for the transition probabilities, modified in an empirical manner (Baleja et al., 1990) for NOE contact between two protons i and j [Eqs. (6) and (7)]. In these expressions, the effective correlation time for internal motion satisfies the extreme narrowing condition (Lipari and Szabo, 1982a, 1982b). Eq. (6) contains the order parameters for nuclei i and j, respectively, together with a term that is averaged over the internal motion. A generalization of Eq. (6) to cases with internal correlation times on different time scales can also be made when the need arises (Clore et al., 1990b).

If $p_k$ is the fractional population of molecules in conformation k, then the elements of the kinetic matrix further satisfy the relationships (Krishna et al., 1980)

$$\sum_{i} K_{ij} = 0 \tag{8}$$

$$\sum_{j} K_{ij}\, p_j = 0 \tag{9}$$
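A small numerical sketch can make the structure of the kinetic matrix concrete. The Python fragment below is illustrative only. It assumes the convention in which each diagonal element of K is the total rate of leaving a conformation and each off-diagonal element is minus the rate of arriving from another conformation, and it uses a hypothetical three-state system whose rates were chosen to be consistent with the stated populations. It then checks that the columns of K sum to zero and that K annihilates the population vector.

```python
def kinetic_matrix(k):
    """Build K from first-order rates, where k[i][j] is the rate from state i to state j.

    Assumed convention: K[i][i] = sum of k[i][j] over j != i, and K[i][j] = -k[j][i].
    """
    n = len(k)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                K[i][i] += k[i][j]   # total rate of leaving state i
                K[i][j] = -k[j][i]   # gain in state i from state j
    return K


def column_sums(K):
    """Sum each column of K (the product of a row vector of 1's with K)."""
    n = len(K)
    return [sum(K[i][j] for i in range(n)) for j in range(n)]


def times_vector(K, v):
    """Product of K with a column vector v."""
    return [sum(K_ij * v_j for K_ij, v_j in zip(row, v)) for row in K]


# Hypothetical three-state system (rates in s^-1), chosen so that
# p[i] * k[i][j] == p[j] * k[j][i] for the populations p.
p = [0.5, 0.3, 0.2]
k = [[0.0, 30.0, 20.0],
     [50.0, 0.0, 40.0],
     [50.0, 60.0, 0.0]]

K = kinetic_matrix(k)
print(column_sums(K))      # each entry should vanish
print(times_vector(K, p))  # each entry should vanish, up to rounding
```

The rates here are invented for the illustration; any set of rates that is stationary with respect to the populations would serve equally well.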
From Eqs. (8) and (9) it follows that a row vector composed of 1’s and a column vector composed of the fractional populations constitute an eigenvector pair corresponding to the zero eigenvalue of the kinetic matrix K (Krishna et al., 1980). Using this property, we have previously shown for noninteracting systems that when the conformational exchange rates are much faster than the relaxation rates in all the conformations, the effective relaxation rate (or rate matrix for coupled multispin systems) is simply a weighted average of the relaxation rates (or rate matrices) in the N individual conformations (Krishna et al., 1980):

$$\langle \mathbf{R} \rangle = \sum_{k=1}^{N} p_k\, \mathbf{R}_k \tag{10}$$

where $\langle \mathbf{R} \rangle$ is the effective relaxation rate matrix, and $\mathbf{R}_k$ is the relaxation rate matrix for the kth conformation with a fractional population $p_k$.

In the present formalism, which deals with interacting systems, the R and K matrices are generalized relaxation rate and kinetic matrices, respectively, and are composed of submatrices that describe each molecular species in each state. The manner in which they are defined becomes obvious from the two-state and three-state examples given below. The NOESY intensities at a mixing time $\tau_m$ are calculated from the expression

$$\mathbf{I}(\tau_m) = \exp(-\mathbf{D}\tau_m)\, \mathbf{I}(0) \tag{11}$$

where $\mathbf{I}(\tau_m)$ is a square matrix of intensities for all the molecules, and I(0) is a diagonal matrix with elements proportional to the equilibrium concentrations in each state. It corresponds to all protons that are labeled according to their chemical shifts during the evolution period. If U is the transformation matrix that diagonalizes D, then

$$\mathbf{U}^{-1}\, \mathbf{D}\, \mathbf{U} = \boldsymbol{\lambda} \tag{12}$$

where $\boldsymbol{\lambda}$ is a diagonal matrix. Then Eq. (11) becomes

$$\mathbf{I}(\tau_m) = \mathbf{U} \exp(-\boldsymbol{\lambda}\tau_m)\, \mathbf{U}^{-1}\, \mathbf{I}(0) \tag{13}$$
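The practical consequence of evaluating the full matrix exponential, rather than treating each spin pair in isolation, can be seen in a minimal sketch. The Python fragment below is illustrative and is not the CORCEMA program. It assumes a single conformation (so that D reduces to R), three collinear protons A, B, and C with hypothetical distances, a unit I(0), and the standard dipolar transition probabilities in SI units. To stay free of external libraries it evaluates exp(-D tau_m) with a truncated Taylor series plus scaling and squaring, which is swapped in here for the diagonalization route of Eq. (13); the two are equivalent for this purpose.

```python
import math

MU0_OVER_4PI = 1.0e-7      # T m A^-1
HBAR = 1.054571817e-34     # J s
GAMMA_H = 2.6752218744e8   # rad s^-1 T^-1, proton


def transition_probabilities(r_angstrom, tau_c, omega):
    """Homonuclear W0, W1, W2 (s^-1) for one proton pair, isotropic tumbling."""
    q = (MU0_OVER_4PI * HBAR * GAMMA_H ** 2) ** 2 / (r_angstrom * 1.0e-10) ** 6
    w0 = 0.1 * q * tau_c
    w1 = 0.15 * q * tau_c / (1.0 + (omega * tau_c) ** 2)
    w2 = 0.6 * q * tau_c / (1.0 + (2.0 * omega * tau_c) ** 2)
    return w0, w1, w2


def relaxation_matrix(distances, tau_c, omega, leakage):
    """R[i][i] = sum over j of (W0 + 2 W1 + W2) + leakage; R[i][j] = W2 - W0."""
    n = len(distances)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        R[i][i] = leakage
        for j in range(n):
            if i != j:
                w0, w1, w2 = transition_probabilities(distances[i][j], tau_c, omega)
                R[i][i] += w0 + 2.0 * w1 + w2
                R[i][j] = w2 - w0
    return R


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, squarings=10, terms=12):
    """exp(A) by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    scale = 2.0 ** squarings
    X = [[a / scale for a in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms + 1):
        term = [[v / m for v in row] for row in mat_mul(term, X)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def noesy_intensities(D, tau_m):
    """I(tau_m) = exp(-D tau_m) I(0), with I(0) taken as the unit matrix."""
    return mat_exp([[-d * tau_m for d in row] for row in D])


# Hypothetical collinear protons A-B-C: r(AB) = r(BC) = 2.5 A, r(AC) = 5.0 A.
distances = [[0.0, 2.5, 5.0], [2.5, 0.0, 2.5], [5.0, 2.5, 0.0]]
R = relaxation_matrix(distances, tau_c=1.0e-8, omega=2.0 * math.pi * 500.0e6, leakage=0.3)
tau_m = 0.1  # s
I = noesy_intensities(R, tau_m)

direct_only = -R[0][2] * tau_m                       # first-order, two-spin estimate
apparent_r = 2.5 * (I[0][1] / I[0][2]) ** (1.0 / 6.0)  # ISPA distance, A-B as reference
print("A-C cross peak (full matrix):", I[0][2])
print("A-C cross peak (direct term only):", direct_only)
print("ISPA apparent A-C distance (A):", apparent_r)
```

In this toy case the A–C cross peak is dominated by magnetization relayed through B, so an isolated spin-pair reading of the intensities gives an apparent A–C distance well short of the 5.0 Å used to build the matrix.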
2.1.1. CORCEMA Calculations for Finite Delays

In Eq. (13), the assumption was made that the magnetizations of all protons had recovered to their thermal equilibrium values prior to the preparation pulse, and hence the diagonal elements of the I(0) matrix correspond directly to the equilibrium concentrations; i.e.,

$$\mathbf{I}(0) = \mathbf{C} \tag{14}$$

where C is a diagonal matrix with elements representing the concentrations. In practice, however, this condition is often not met, and the relaxation delay T between pulses (i.e., acquisition time plus the waiting period before the next preparation pulse) is comparable to the longitudinal relaxation times. Under these conditions, Eq. (13) is modified as follows:

$$\mathbf{I}(\tau_m) = \mathbf{U} \exp(-\boldsymbol{\lambda}\tau_m)\, \mathbf{U}^{-1}\, \mathbf{I}(0, T) \tag{15}$$

where I(0, T) is a diagonal matrix whose elements are given by Eq. (16). An important result from Eq. (16) is that the NOESY spectrum can be asymmetric due to differences in the longitudinal recoveries of coupled spins, a factor that can be exploited to generate large data sets for relaxation rate matrix analyses in general by recording the spectra at several mixing times as a function of the recycling time T. The effect of finite delays between pulses on structure refinements of nucleic acids and proteins has been addressed (Zhu and Reid, 1995; Dellwo et al., 1994).

2.2. Two-State Model of Ligand–Receptor Interactions

In the following, we present a formulation suitable for reversible binding of a ligand and a protein to form binary complexes. This model is characterized by the free state, consisting of the interacting species L and E in their uncomplexed form, and the bound state, in which they form a complex L′E′, as shown in Fig. 1. The generalized R matrix is composed of generalized submatrices $\mathbf{R}_F$ and $\mathbf{R}_B$ for the free and bound states, respectively (matrices describing more than one molecular species will be referred to as generalized matrices). They are defined as follows:

$$\mathbf{R} = \begin{pmatrix} \mathbf{R}_F & \mathbf{0} \\ \mathbf{0} & \mathbf{R}_B \end{pmatrix}, \qquad \mathbf{R}_F = \begin{pmatrix} \mathbf{R}_L & \mathbf{0} \\ \mathbf{0} & \mathbf{R}_E \end{pmatrix}, \qquad \mathbf{R}_B = \begin{pmatrix} \mathbf{R}_{L'} & \mathbf{R}_{L'E'} \\ \mathbf{R}_{E'L'} & \mathbf{R}_{E'} \end{pmatrix}$$
Here $\mathbf{R}_L$ and $\mathbf{R}_E$ are the relaxation rate matrices for the ligand and the enzyme, respectively, in their uncomplexed state. The diagonal and off-diagonal terms of these matrices take into account the complete dipolar connectivities as described elsewhere (Krishna et al., 1978). In addition, any leakage terms (such as dipolar relaxation of amide protons with the nitrogen nucleus, solvent exchange rates, and relaxation due to dissolved paramagnetic oxygen) can be added to the diagonal elements (Krishna et al., 1978). The absence of cross relaxation between the ligand and enzyme in their free states is denoted by the zero off-diagonal elements of the $\mathbf{R}_F$ matrix. In a similar fashion, $\mathbf{R}_{L'}$ and $\mathbf{R}_{E'}$ are the relaxation rate matrices for the complexed forms of the ligand and the enzyme, respectively. The intermolecular dipolar cross relaxation in the complex [containing cross-relaxation terms of the off-diagonal type in Eq. (3)] is denoted by $\mathbf{R}_{L'E'}$ (and its transpose, $\mathbf{R}_{E'L'}$). This matrix is, in general, rectangular because of the different numbers of protons in the ligand and the protein. The $\mathbf{R}_{L'}$ and $\mathbf{R}_{E'}$ matrices include the complete relaxation matrix elements for the intramolecular relaxation within the bound forms of the ligand and the enzyme, respectively. In addition, their diagonal elements include the contributions, of the diagonal type in Eq. (3), associated with the intermolecular dipolar cross relaxation. In the CORCEMA algorithm, we opted to enter all equivalent protons (e.g., methyl protons) explicitly, so the R matrix is always symmetric. Such a practice is compatible with the manner in which the input files are entered in the algorithm (as PDB files for the coordinates of individual atoms).

The generalized kinetic matrix K is composed of generalized kinetic submatrices defined as follows:

$$\mathbf{K} = \begin{pmatrix} \mathbf{K}_F & -\mathbf{K}_B \\ -\mathbf{K}_F & \mathbf{K}_B \end{pmatrix}$$

with

$$\mathbf{K}_F = \begin{pmatrix} k_{on}[E]\,\mathbf{1}_L & \mathbf{0} \\ \mathbf{0} & k_{on}[L]\,\mathbf{1}_E \end{pmatrix} \qquad \text{and} \qquad \mathbf{K}_B = \begin{pmatrix} k_{off}\,\mathbf{1}_L & \mathbf{0} \\ \mathbf{0} & k_{off}\,\mathbf{1}_E \end{pmatrix}$$

where $\mathbf{1}_N$ is a unit matrix of dimension appropriate for molecule N, and $k_{on}$ and $k_{off}$ are, respectively, the on- and off-rates describing the reversible complex. The generalized intensity matrix has the following general form:

$$\mathbf{I}(\tau_m) = \begin{pmatrix} \mathbf{I}_{LL} & \mathbf{I}_{LE} & \mathbf{I}_{LL'} & \mathbf{I}_{LE'} \\ \mathbf{I}_{EL} & \mathbf{I}_{EE} & \mathbf{I}_{EL'} & \mathbf{I}_{EE'} \\ \mathbf{I}_{L'L} & \mathbf{I}_{L'E} & \mathbf{I}_{L'L'} & \mathbf{I}_{L'E'} \\ \mathbf{I}_{E'L} & \mathbf{I}_{E'E} & \mathbf{I}_{E'L'} & \mathbf{I}_{E'E'} \end{pmatrix}$$
In this equation, the diagonal elements represent the traditional “intramolecular NOESY spectra” for the four molecular species, except that these intensities are now influenced by the exchange process also. The off-diagonal elements connecting the free and bound forms of the same molecule, and their symmetric counterparts, refer to the traditional exchange peaks between the free and bound forms of the ligand and the protein. These include traditional exchange cross peaks, as well as exchange-mediated NOESY (Choe et al., 1991). The elements connecting the bound ligand and the bound protein are the “direct intermolecular” NOESY contacts in the “bound state.” As will be shown later, except for very tight binding situations, these intensities will also be influenced by the exchange process. The remaining six peaks are exchange-mediated intermolecular ligand–protein NOESY spectra; four involve a free species and a bound species, and two involve the free ligand and the free protein. The last two arise from a double-exchange process (Curto et al., 1996).
The I(0) matrix is given by
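Since the equilibrium magnetization of each species is proportional to its concentration, a form consistent with the surrounding text can be sketched in LaTeX as follows. The symbols ($[L]$, $[E]$, $[L'E']$ for the free ligand, free enzyme, and complex) are assumptions made here for illustration.

```latex
\[
I(0) = \mathrm{diag}\bigl([L]\,\mathbf{1}_L,\; [E]\,\mathbf{1}_E,\; [L'E']\,\mathbf{1}_L,\; [L'E']\,\mathbf{1}_E\bigr)
\]
```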
Because of the conformational exchange matrix K, the D matrix is not generally symmetric. Even though algorithms exist that can diagonalize asymmetric matrices such as D, it is desirable to reduce this matrix to a symmetric form if one wishes to use the more standard orthogonal transformation routines meant for symmetric matrices. The D matrix can be brought into a symmetric form using a symmetrization matrix S defined as (Moseley et al., 1995)
Here the diagonal blocks of S are defined in terms of the concentrations of the free and bound forms of the ligand and enzyme. This form of the symmetrization matrix is related to, but slightly different from, the one used by Ni (1992) in terms of separate ratios of equilibrium concentrations for ligand and enzyme in their free and bound forms, and the number of equivalent spins for each resolved resonance. As shown in the second example, our definition lends itself to an automatic extension to any arbitrary number of states.
Thus the symmetrized dynamic matrix is obtained from D by a similarity transformation with S, where it is easily verified that the relaxation matrix R is left unchanged by this transformation. The symmetrized kinetic and dynamic matrices are now symmetric (Moseley et al., 1995). The symmetrized form of D can be put in a diagonal form by a transformation T.
The expression for NOESY intensities, Eq. (13), now becomes
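A minimal numerical sketch of the symmetrization step for a toy two-state system (two ligand protons, one enzyme proton). It assumes that Eq. (13) has the usual form I = exp(-D t) I(0) with D = R + K, and that S is the diagonal matrix of square roots of the species concentrations; all rate values are invented for illustration. The matrix exponential is evaluated with a Taylor series and scaling-and-squaring, which is a stand-in for the diagonalization described in the text, not the CORCEMA implementation.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm(a, terms=30, squarings=10):
    """Matrix exponential by Taylor series with scaling and squaring."""
    n = len(a)
    x = [[v / 2 ** squarings for v in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

# Spin order: L1, L2, E1 (free), then L1', L2', E1' (bound). Illustrative values only.
conc_L, conc_E, conc_LE = 1.0, 0.1, 0.2
k_off = 100.0
k_on = k_off * conc_LE / (conc_L * conc_E)  # detailed balance at equilibrium

R = [[0.0] * 6 for _ in range(6)]
R[0][0] = R[1][1] = 0.5          # free-ligand autorelaxation
R[0][1] = R[1][0] = 0.1          # free-ligand cross relaxation
R[2][2] = 5.0                    # free-enzyme autorelaxation
bound = [[6.0, -4.0, -2.0], [-4.0, 6.0, -1.0], [-2.0, -1.0, 7.0]]
for i in range(3):
    for j in range(3):
        R[3 + i][3 + j] = bound[i][j]  # includes intermolecular terms

rate_fb = [k_on * conc_E, k_on * conc_E, k_on * conc_L]  # free -> bound, per spin
K = [[0.0] * 6 for _ in range(6)]
for i in range(3):
    K[i][i] = rate_fb[i]
    K[i][3 + i] = -k_off
    K[3 + i][i] = -rate_fb[i]
    K[3 + i][3 + i] = k_off

D = [[R[i][j] + K[i][j] for j in range(6)] for i in range(6)]
c = [conc_L, conc_L, conc_E, conc_LE, conc_LE, conc_LE]
s = [math.sqrt(v) for v in c]

# D_s = S^-1 D S, with S diagonal
D_s = [[D[i][j] * s[j] / s[i] for j in range(6)] for i in range(6)]

t_mix = 0.1
prop = expm([[-v * t_mix for v in row] for row in D])
I_direct = [[prop[i][j] * c[j] for j in range(6)] for i in range(6)]
prop_s = expm([[-v * t_mix for v in row] for row in D_s])
# S exp(-D_s t) S^-1 I(0) = S exp(-D_s t) S, because I(0) = S^2 here
I_sym = [[s[i] * prop_s[i][j] * s[j] for j in range(6)] for i in range(6)]
```

In this sketch D itself is asymmetric (the free-to-bound and bound-to-free rates differ), while D_s is symmetric and both routes give the same intensity matrix.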
2.2.1. Fast Conformational Exchange
When the conformational exchange rates are much faster than the relaxation rates in the free and bound states of the ligand and the enzyme, a simplifying result obtains, since in this case the relaxation matrix R can be treated as a minor perturbation on the kinetic matrix K, and a perturbation theory treatment can be applied (Moseley et al., 1995; Krishna et al., 1980). For this case, it is easily shown that the kinetic matrix K can be diagonalized, with the result expressed in terms of the fractional populations of the ligand and the enzyme in their free and bound forms. Since the R matrix is now a minor perturbation, its contribution in first order to the dynamic matrix will be significant only to the “zero” block-diagonal element of the diagonalized form of the K matrix, and this leads to an approximate solution for the intensities.
The important result is that the pertinent generalized relaxation rate matrix that
governs the intensities in the NOESY spectrum is simply a weighted average of the generalized relaxation rate matrices for the free and the bound states of the
molecules forming a binary complex. This is a generalization to interacting systems of our earlier result (Krishna et al., 1980). Note that because the ligand and the enzyme generally have different fractional populations in the bound state, the averaged matrix is asymmetric, but it can be readily put in a symmetric form using a similarity transformation (Moseley et al., 1995). Taking advantage of the fact that, for normal mixing times employed in NOESY experiments, the terms that decay at the exchange rate vanish because of fast conformational exchange (on both relaxation and chemical-shift scales), Eq. (13) now reduces to Eq. (30). From Eq. (30) it is clear that the NOESY spectrum is determined by a generalized relaxation rate matrix which is a weighted average of the rate matrices for the free and bound states (i.e., including the protons on the ligand and the enzyme, and the intermolecular cross relaxation in the bound form).
The effect of intermolecular ligand–receptor cross relaxation on the ligand tr-NOESY spectrum for the general case can be calculated numerically from Eq. (13). However, for the case of fast exchange on the relaxation rate scale in the two-state model, the first few terms describing the initial growth portion are obtained from a Taylor series expansion of Eq. (30) in the mixing time, with an overall constant of proportionality and a remainder that represents higher-order terms. Most noteworthy is the dependence of the ligand intensities on ligand–protein cross relaxation in the quadratic and higher-order terms in the mixing time. This has important consequences on the accuracy of bound-ligand structures (vide infra).
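The appearance of ligand–protein cross relaxation in the quadratic term can be illustrated with a short sketch. It assumes a population-weighted average of free and bound rate matrices acting on the total magnetization of each molecule (each column weighted by the bound fraction of the molecule that the column belongs to), and a toy system of two ligand protons and one enzyme proton. The function names and all numerical values are hypothetical.

```python
def averaged_rate_matrix(R_free, R_bound, frac_bound):
    """Population-weighted average; frac_bound[j] is the bound fraction
    of the molecule that spin j belongs to (assumed weighting convention)."""
    n = len(R_free)
    return [[R_free[i][j] * (1.0 - frac_bound[j]) + R_bound[i][j] * frac_bound[j]
             for j in range(n)] for i in range(n)]

def cross_peak_terms(R, i, j, t_mix):
    """First- and second-order Taylor terms of [exp(-R t)]_ij for i != j."""
    first = -R[i][j] * t_mix
    second = 0.5 * sum(R[i][k] * R[k][j] for k in range(len(R))) * t_mix ** 2
    return first, second

# Spins 0 and 1 are ligand protons, spin 2 is an enzyme proton (illustrative values).
R_free = [[0.5, 0.1, 0.0], [0.1, 0.5, 0.0], [0.0, 0.0, 5.0]]
R_bound = [[6.0, -4.0, -2.0], [-4.0, 6.0, -1.0], [-2.0, -1.0, 7.0]]
total_L, total_E, conc_LE = 1.2, 0.3, 0.2
frac_bound = [conc_LE / total_L, conc_LE / total_L, conc_LE / total_E]

avg = averaged_rate_matrix(R_free, R_bound, frac_bound)
first, second = cross_peak_terms(avg, 0, 1, t_mix=0.1)

# Coefficient of t^2 that arises only through the enzyme proton (spin 2)
protein_mediated = 0.5 * avg[0][2] * avg[2][1]
```

The linear term depends only on the averaged ligand–ligand cross-relaxation rate, whereas the quadratic coefficient contains the product of two ligand–enzyme terms, which is the protein-mediated pathway.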
2.2.2. Absence of Ligand–Enzyme Cross Relaxation
In the absence of intermolecular cross relaxation (i.e., when the intermolecular terms of the bound-state relaxation matrix vanish), the ligand and the receptor are uncoupled in spin relaxation, and one obtains from Eq. (29) the much simpler result for the effective relaxation rate matrix governing the ligand tr-NOESY:
A similar result obtains for the enzyme.
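With the ligand and the receptor uncoupled, the weighted average reduces to a separate average for each molecule. A sketch in LaTeX, assuming the symbols $p_L$ and $p_{L'}$ for the fractional populations of free and bound ligand (and $p_E$, $p_{E'}$ for the enzyme), which are chosen here for illustration:

```latex
\[
\langle R_L \rangle = p_L R_L + p_{L'} R_{L'}, \qquad
\langle R_E \rangle = p_E R_E + p_{E'} R_{E'}, \qquad
p_L + p_{L'} = p_E + p_{E'} = 1
\]
```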
2.2.3. Analytical Expressions for the Transferred NOESY on a Two-Spin System (A–X) in the Absence of Ligand–Protein Cross Relaxation
In the following, we present analytical expressions (Lee and Krishna, 1992; Choe et al., 1991) for the case of a ligand composed of two spin-1/2 nuclei, exchanging between two conformations corresponding to free (A–X) and bound (A′–X′) states according to the scheme in Fig. 2. We have assumed that ligand–protein cross relaxation is negligible. This example is useful in understanding the behavior of NOESY intensities as a function of correlation times, cross-relaxation rates, and forward and reverse exchange rates (which in turn determine the fractional populations). The dynamic matrix for the above case is
This matrix can be diagonalized (Lee and Krishna, 1992) using the transformation matrix U and its inverse in Eq. (12). The resulting matrix is diagonal, with the four eigenvalues of the dynamic matrix as its elements. The intensities for NOESY peaks associated with spin A have been given in closed form (Lee and Krishna, 1992).
Of these, I(AA) is the diagonal peak, I(AX) corresponds to the direct NOESY peak, I(AA′) is the exchange peak, and I(AX′) is the exchange-mediated NOESY peak. They are schematically defined in Fig. 3. Similar expressions for intensities associated with the remaining three spins can be obtained by proper interchange of indices. These expressions are useful in understanding the effect of finite off-rates on each of the four individual cross-peak components due to transfer of magnetization between spins A and X in their two conformations. These individual components can only be observed if the exchange rate is slow on the chemical-shift scale. For fast exchange on the chemical-shift scale, the net intensity is a sum of these four components. The effect of varying equilibrium constants on these four intensities has been described by Lee and Krishna (1992). For the special case where the concentrations and the relaxation rates are identical in both states, Eqs. (36)–(39) reduce to the simpler expressions given earlier (Choe et al., 1991).
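The four components can also be obtained numerically. The sketch below builds a 4×4 dynamic matrix for the spins A, X, A′, X′ and evaluates exp(-D t) I(0); it is not the closed-form solution of Lee and Krishna (1992). The function name, the parameter names, and all values are assumptions for illustration, and the matrix exponential uses a Taylor series with scaling and squaring.

```python
def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm(a, terms=30, squarings=10):
    """Matrix exponential by Taylor series with scaling and squaring."""
    n = len(a)
    x = [[v / 2 ** squarings for v in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def two_spin_tr_noesy(rho_f, sigma_f, rho_b, sigma_b, k_fb, k_bf, t_mix):
    """Intensity matrix for spins ordered A, X, A', X' (free, free, bound, bound).

    k_fb is the free -> bound rate and k_bf the bound -> free rate (both in 1/s).
    """
    D = [
        [rho_f + k_fb, sigma_f, -k_bf, 0.0],
        [sigma_f, rho_f + k_fb, 0.0, -k_bf],
        [-k_fb, 0.0, rho_b + k_bf, sigma_b],
        [0.0, -k_fb, sigma_b, rho_b + k_bf],
    ]
    p_free = k_bf / (k_fb + k_bf)          # equilibrium fractional populations
    pops = [p_free, p_free, 1.0 - p_free, 1.0 - p_free]
    prop = expm([[-v * t_mix for v in row] for row in D])
    return [[prop[i][j] * pops[j] for j in range(4)] for i in range(4)]

params = dict(rho_f=0.5, sigma_f=0.1, rho_b=6.0, sigma_b=-4.0, k_fb=20.0, k_bf=100.0)
I_start = two_spin_tr_noesy(t_mix=0.0, **params)
I_mix = two_spin_tr_noesy(t_mix=0.1, **params)

I_AA, I_AX, I_AAp, I_AXp = I_mix[0]        # diagonal, direct, exchange, exchange-mediated
net_cross_peak = I_mix[0][1] + I_mix[0][3] + I_mix[2][1] + I_mix[2][3]
```

The last line sums the four A-to-X components, which is the quantity observed when exchange is fast on the chemical-shift scale.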
2.3. Treatment for More than Two States
As an example of treatment for more than two states we consider proteins and enzymes that exhibit hinge-bending motions (or forming encounter complexes, initially). We consider a ligand binding to an enzyme in its “open” state, followed by a hinge-bending motion on the enzyme that “closes in” the ligand in the active site (Fig. 4). The ligand could bind to the open state of the enzyme at a nonspecific or a weak binding site. We have assumed that the free ligand cannot bind directly to the active site in the closed state, due to some steric hindrance (e.g., yeast hexokinase). This assumption was made only for simplicity and to make the model more interesting, but the theoretical results given below can easily be extended to the general case where direct binding of the free ligand can take place in the closed state also. For our simulations, we chose an example provided by the hypothetical binding of Leu inhibitor to thermolysin (Fig. 5). Several other proteins showing this behavior have been mentioned earlier. In general, all examples where binding in the active site is facilitated by induced fit or hinge bending fall into this category, and may involve more than three states. For
the simplest of these, we adopt the following three-state scheme shown in Fig. 4.
In this case, the R matrix is
where the three diagonal blocks refer to the generalized relaxation rate matrices for the free state, the open state, and the closed state, respectively, of the ligand–enzyme system. The free-state matrix is identical to that for the two-state case. Similarly, the matrices describing the open and closed states of the complex have a form similar to the bound-state matrix for the two-state case (i.e., they contain intermolecular cross-relaxation terms). The generalized kinetic matrix has the form [after a minor notational change from Moseley et al. (1995)]
In this example, we are interested in the special case where the rates for direct exchange between the free state and the closed state are zero, which describes hinge-bending motions. The kinetic submatrices connecting the free and open states are identical to the two-state case, and consist of the on- and off-rates, respectively. The I(0) matrix is again built from the equilibrium concentrations: its free-state block is similar to that for the two-state case, and its open- and closed-state blocks contain the concentrations of the bound species in the open and closed states, with definitions similar to the bound-state block for the two-state case. The dynamic matrix in this case can be made symmetric by a matrix S, which is an extension of Eq. (22) to the three-state case (Moseley et al., 1995).
2.3.1. Fast Conformational Exchange on the Relaxation Rate Scale for All Three States
Under this limit, the R matrix can be treated as a minor perturbation on K. Using the procedure outlined in Moseley et al. (1995), the intensity of the ligand–receptor system under fast-exchange conditions (fast on both relaxation and chemical-shift scale) is shown to be governed by a population-weighted average of the relaxation rate matrices of the three states. The generalized population matrix for the free state is analogous to that for the two-state case, and the population matrices for the open and closed states of the complex have definitions analogous to the bound-state population matrix for the two-state case. They satisfy the requirement that the population matrices of all states sum to the unit matrix. The above result for the three-state case can be generalized for any n-state fast exchange equilibrium of a bimolecular complex on the relaxation rate scale:
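A sketch of the n-state average in LaTeX, with assumed symbols $R_i$ and $P_i$ for the generalized relaxation rate matrix and the generalized population matrix of state $i$. The ordering of $R_i$ and $P_i$ in the product is a convention assumed here (populations weighting the columns) and may differ from that of Moseley et al. (1995).

```latex
\[
\langle R \rangle = \sum_{i=1}^{n} R_i P_i, \qquad \sum_{i=1}^{n} P_i = \mathbf{1}
\]
```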
2.3.2. Slow Hinge-Bending Motions
In this limit, there is fast exchange between states 1 and 2, and slow exchange between states 2 and 3. As shown in Moseley et al. (1995), one obtains an averaging of the relaxation rate matrix representing states 1 and 2 together, which are in fast exchange on the relaxation rate scale. The population matrices for states 1 and 2 that enter this average are normalized such that they sum to the unit matrix.
2.3.3. Fast Hinge-Bending Motions
For this case, we assume that there is slow exchange between states 1 and 2, and fast exchange between states 2 and 3. We obtain the corresponding result (Moseley et al., 1995), in which the relaxation rate matrices of states 2 and 3 are averaged using normalized population matrices for states 2 and 3 that sum to the unit matrix.
2.4. Intermolecular Transferred NOESY
The CORCEMA algorithm can calculate the complete intensity matrix in Eq. (19), including the various intermolecular NOESY contacts between the ligand and the protein in their bound state, and the exchange-mediated peaks between states and within the free state. When the exchange is fast on the chemical-shift scale as well as the cross-relaxation rate scales, the intensity matrix collapses to Eq. (30) given previously. The off-diagonal ligand–receptor blocks of this averaged intensity matrix represent the intermolecular NOESY contacts between the ligand and the receptor. The first few terms in the expansion of the exponential term give
where the remainder represents higher-order terms, and the prefactors contain the concentration of the bound form of the receptor and a constant of proportionality.
2.5. Treatment of Nonspecific Binding
An important artifact in the analysis of experimental transferred NOESY data is the binding of a ligand to a protein in a nonspecific manner at multiple locations other than the binding pocket. The origin of this binding could be the presence of some very weak binding sites, electrostatic interactions between charged groups on the ligand and the receptor, or hydrophobic association with surface-accessible hydrophobic residues. The nonspecifically bound ligand acquires the rotational correlation time of the larger macromolecule and, due to the fast-exchange condition, will significantly contribute to the transferred NOESY experiment on the free ligand. Furthermore its conformation in the nonspecific location is likely to be different from that of the bound ligand within the active site. Conceivably, since several nonspecific binding sites are simultaneously occupied by the ligand molecules, the intermolecular cross relaxation will also be nonspecific in nature, and may be reflected as an increased leakage factor for the nonspecifically bound ligand. In the presence of a significant amount of nonspecific binding, there could be serious errors in the estimation of populations of specifically bound-ligand molecules and, hence, in the complete relaxation and exchange matrix calculations. Nonspecific binding is inherently somewhat difficult to quantitate and properly correct for. In the following we will describe three approaches. An elegant approach for correcting contributions from nonspecific binding involves recording transferred NOESY spectra on a ligand of interest by performing experiments with and without a tightly binding inhibitor and subtracting one from the other (Behling et al., 1988). Here the assumption is that tightly binding inhibitor preferentially occupies the binding site, and hence any transferred NOESY of a ligand in the presence of the inhibitor reflects only nonspecific contributions. These
NOESY intensities could be subtracted from the tr-NOESY with ligand alone, to arrive at the tr-NOESY spectrum of the specifically bound ligand. This scheme should work well for most proteins and enzymes with well-defined binding pockets. It is less clear if such an approach would work satisfactorily for enzymes with large-scale domain motions with the active site forming only when two domains are closed. This is because the amount of nonspecific binding can be different in the open and closed states due to differences in accessible surface areas for the ligand, and thus the method will only correct for nonspecific binding in the closed state.
Other investigators have focused on identifying sample conditions where such nonspecific binding is minimal (Murali et al., 1997; Jarori et al., 1994). Typically, for each system under investigation, the tr-NOESY for a peak was measured as a function of absolute ligand concentration, while holding the ligand–enzyme ratio constant. For ligand concentrations in the 1- to 2-mM range, the NOE typically remained constant but dramatically increased for higher concentrations (Murali et
al., 1997). This increase at higher concentrations was interpreted to be the result of nonspecific or weak binding. Some of these effects could also be due to increased solvent viscosity associated with increasing enzyme concentration. Performing tr-NOESY at lower ligand absolute concentrations (typically 1 to 2 mM), where the NOE remains constant, reduces the nonspecific binding contributions significantly, as shown by these investigators (Murali et al., 1997; Jarori et al., 1994). Even though this approach significantly minimizes the contributions from nonspecific binding (compared to the high-ligand-concentration case), it is reasonable to
assume they will not be eliminated entirely. To that extent, corrections for residual nonspecific binding may be needed to further improve data analyses. Our solution to the nonspecific-binding problem involves treating it as an optimizable parameter to get the best fit between experimental and calculated tr-NOESY spectra. To generate the appropriate average relaxation rate matrix, we
consider the following kinetic scheme for the molecular species: state 1 corresponds to a free ligand and a free enzyme. State 2 corresponds to the enzyme with nonspecifically bound ligand only. In state 3, the active site of the enzyme from state 2 (with nonspecifically bound ligand) is occupied by a specifically bound ligand derived from the free-ligand pool in state 1 (one can include an additional
pathway where the active site can also be occupied by a molecule from the nonspecifically bound ligand pool in state 2 without altering the results). Under the fast-exchange condition for all states, it can be shown that the above scheme reduces to Eq. (30), with an averaged relaxation rate matrix that is weighted by the fractional populations defined below.
In the averaged matrix, the four ligand populations are the fractional populations of the ligand, respectively, in its free form (state 1), nonspecifically bound to the enzyme with unoccupied active site (state 2), nonspecifically bound to the enzyme when its active site is occupied (state 3), and specifically bound (in state 3); these four populations sum to unity. Similarly, the three enzyme populations are, respectively, the fractional populations of the enzyme in its free (state 1), nonspecifically bound with unoccupied active site (state 2), and specifically bound (state 3) forms, and they also sum to unity. Note that the enzyme in state 3 has both specifically and nonspecifically bound ligands attached to it. One can make the reasonable assumption that the conformation of the nonspecifically bound ligand does not alter from its free state. With this assumption, the relaxation rate matrix of the nonspecifically bound ligand differs from the free-ligand rate matrix in only two respects: (i) its rotational correlation time, which is now identical to that of the enzyme complex, and (ii) it may experience leakage factors different from those in the free state. From a prior knowledge of the conformation of the free ligand, and the rotational correlation times for the free and bound forms, the optimizable parameters to correct for nonspecific contributions are reduced to the sum of the fractional populations of the nonspecifically bound ligand and the leakage factor, which we will assume is identical for all ligand protons in the nonspecifically bound form. If the ligand molecule is sufficiently small, then the relaxation rate matrix for the nonspecifically bound enzyme can be set identical to that of the free enzyme. The current version of CORCEMA, however, does not use the simplification implied in Eqs. (50) and (51), but requires specification of the full three-state model.
3. METHODS
3.1. The CORCEMA Program
The current version of CORCEMA is designed primarily to compute NOESY and ROESY spectra for different proposed models of a dynamical system while optimizing some chosen parameters (e.g., bound and free correlation times, off-rates, order parameter, and internal correlation time) to get the best fit with the experimental data. Figure 6 shows the CORCEMA protocol for version 1.74. Iterative optimization of the conformations of the ligand and the active site to get the best fit with NMR data is planned for the future. Currently we use simulated annealing and Powell minimization to optimize nonstructural parameters like exchange rates, equilibrium constants, correlation times, etc. The program can calculate spectra for an N-state model of conformational exchange. The entire
program was written in C and has a modular architecture so that it can be modified easily. The program is compiled on a Silicon Graphics workstation with the UNIX operating system. No machine-specific calls are made in the program, so it will be compatible with other computers and operating systems. The required inputs are the number of states involved in the equilibrium (e.g., three states for the hinge-bending motion), the coordinates (in PDB format) of the various molecular species in their free and bound forms, overall rotational correlation times, and magnitudes of the various conformational exchange rates (i.e., off- and on-rates, as well as the hinge-bending motion rates). The enzyme on- and off-rates could be obtained by independent methods. Next, flags are set to include internal motions for methyl groups and aromatic rings. It is assumed that the internal rotation correlation time for the methyl groups is much shorter than the overall rotational correlation time for the ligand. The effect of internal motions on intramethyl and external methyl interactions was incorporated using the model-free approach (Lipari and Szabo, 1982a, 1982b), empirically modified by assigning individual order parameters to protons i and j in the internuclear vector (Baleja et al., 1990). For aromatic rings, it is assumed that the ring-flip correlation times are much longer than the rotational correlation times, but much shorter than the cross-relaxation times; an appropriate distance-averaging method is used to account for modulation of internuclear distances. The third stage involves creation of generalized rate matrices for relaxation (R) and kinetics (K), based on the model under consideration. Next, the dynamic matrix D is created, symmetrized, and diagonalized using the QR factorization method (Press et al., 1992). In the prefinal stage, a file consisting of desired peak intensities (cross peaks including exchange-mediated peaks, diagonal peaks, or sums of appropriate sets of peaks in the case of fast exchange on the chemical-shift scale) is read to print the intensities and compare them to experimental values. This comparison involves a calculation of the NOE R-factor (Xu et al., 1995a, 1995b; Krishna et al., 1978) obtained by optimization of nonstructural parameters to get the best fit. The implementing program for the optimization protocol in Fig. 6 computes the rates for the ligand and the enzyme in the dynamical model under consideration, as well as the concentrations of the different species under equilibrium, given a subset of exchange rates, equilibrium constants, and total ligand concentrations. These species concentrations are used to define the concentration matrix C and the symmetrization matrix S (in the current version we have normalized the concentrations with respect to one of the species and reexpressed them in terms of ratios of appropriate rates).
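The comparison between calculated and experimental intensities can be sketched with a simple R-factor. The exact definition used in the cited work is not reproduced in this section, so the form below (sum of absolute deviations divided by the sum of absolute experimental intensities) is an assumption, and the function name is hypothetical.

```python
def noe_r_factor(experimental, calculated):
    """Assumed R-factor: sum |I_exp - I_calc| / sum |I_exp| over matched peaks."""
    if len(experimental) != len(calculated):
        raise ValueError("intensity lists must have the same length")
    numerator = sum(abs(e - c) for e, c in zip(experimental, calculated))
    denominator = sum(abs(e) for e in experimental)
    if denominator == 0.0:
        raise ValueError("experimental intensities are all zero")
    return numerator / denominator

# Illustrative peak intensities (arbitrary units)
r_value = noe_r_factor([1.0, 2.0, 3.0], [1.1, 1.8, 3.3])
```

A lower value indicates closer agreement, so an optimizer over nonstructural parameters would minimize this quantity.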
3.2. Calculation of Concentrations
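For a two-state model, with the equilibrium constant given by the ratio of the on- and off-rates (Moseley et al., 1995), the concentrations of the free ligand, the free enzyme, and the complex are obtained from the mass-action and conservation equations in terms of the total ligand and total enzyme concentrations.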
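For the two-state case, the equilibrium concentrations can be sketched by solving the standard quadratic for a 1:1 complex. The function and argument names are hypothetical, and the closed form is the textbook mass-action solution rather than a transcription of the expressions of Moseley et al. (1995).

```python
import math

def two_state_concentrations(k_on, k_off, total_ligand, total_enzyme):
    """Equilibrium concentrations for L + E <-> L'E' (1:1 binding).

    Solves [LE]^2 - (L_T + E_T + K_d)[LE] + L_T * E_T = 0 for the
    physically meaningful (smaller) root, with K_d = k_off / k_on.
    """
    k_d = k_off / k_on
    b = total_ligand + total_enzyme + k_d
    bound = 0.5 * (b - math.sqrt(b * b - 4.0 * total_ligand * total_enzyme))
    return {
        "L": total_ligand - bound,    # free ligand
        "E": total_enzyme - bound,    # free enzyme
        "LE": bound,                  # complex
    }

# Illustrative values in consistent (arbitrary) concentration units
conc = two_state_concentrations(k_on=200.0, k_off=100.0,
                                total_ligand=1.2, total_enzyme=0.3)
```

The returned values satisfy the detailed-balance condition k_on [L][E] = k_off [L'E'] that the symmetrization matrix relies on.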
Similarly, for a three-state model (Bernasconi, 1986), the concentrations are obtained from the corresponding equilibrium constants and the total ligand and enzyme concentrations.
3.3. Methods for Suppressing or Identifying Protein-Mediated Effects
While we wish to highlight in this chapter the advantages of ligand–protein cross-relaxation effects (i.e., protein-mediated spin diffusion and protein-induced leakage effects) in structure-based design using transferred NOESY, we also stress the importance of undertaking proper control experiments in which the effects of protein–ligand cross relaxation on the NOESY spectra are deliberately suppressed. A comparison of this control spectrum with that in which the protein-induced
effects are felt by the ligand then provides a basis for deducing or inferring the conformation of the ligand–receptor complex. Once the ligand conformation has
been quantitatively deduced without interference from protein protons, this conformation can, in principle, be properly “docked” within the binding pocket of the protein. This docking process can be guided from direct intermolecular transferred NOESY contacts whenever they are observable (Ramesh et al., 1996; Anglister et al., 1995; Scherf and Anglister, 1993) or indirectly through protein-mediated effects on intraligand NOE intensities (Curto et al., 1996). Since both intermolecular and intraligand TrNOEs can be dependent upon the relaxation rate matrix elements involving the active site residues [Eqs. (31) and (49)], these intensities can serve as experimental constraints in CORCEMA calculations that incorporate the binding pocket residues (Curto et al., 1996). More experience is needed in these types of calculations; however, the transferred NOE studies on fibrinopeptide analogs docked into the thrombin binding pocket (Ni et al., 1995) as well as our work on
the Trp-repressor–operator system (Moseley et al., 1997) and our joint work with Professor Thomas Peters on the sLex/E-selectin system (vide infra) are encouraging and point to the feasibility of such calculations.
3.3.1. Perdeuterated Receptors
The most obvious approach to eliminate ligand–protein cross relaxation is to employ perdeuterated receptors. By eliminating ligand–protein intermolecular cross relaxation altogether, the ligand transferred NOESY spectra are completely free of both protein-mediated spin diffusion and protein-leakage effects. This
approach has the added benefit of minimizing background signals from the receptor protons so that traditional 2D NMR methods will be adequate to probe the bound-ligand conformation. Typical tr-NOESY examples in the literature include conformational studies of substrates interacting with perdeuterated yeast phosphoglycerate kinase (Shibata et al., 1995), honey bee venom melittin complexed to perdeuterated phosphatidylcholine vesicles (Okada et al., 1994), bound to perdeuterated lipid (Gounarides et al., 1993), and senktide, a neurokinin analog, bound to perdeuterated vesicles (Bersch et al., 1993). Selective deuteration of residues within the binding pocket of the receptor is useful in suppressing ligand–protein cross relaxation and identifying intermolecular contacts (Scherf and Anglister, 1993).
3.3.2. NMR Pulse Methods
Since perdeuteration of receptors is not always feasible or economical, one could exploit a large number of pulse sequences that in effect suppress protein-mediated spin-diffusion effects, which make dominant contributions during the early and mid-range portions of the NOESY time-course curves. In some of these methods, however, since the protein protons still contribute autorelaxation terms to the diagonal elements of the ligand relaxation rate matrix [the intermolecular terms in Eq. (3)], protein-leakage effects will persist and affect the decay portions of the ligand NOESY cross peaks. These methods are different from subtraction methods (Andersen et al., 1987) that minimize baseline artifacts in the ligand NOESY spectrum due to broad protein resonances or due to intermolecular contacts with protein protons that resonate at a given ligand proton signal. A relaxation filter inserted into the pulse sequence also is effective in eliminating broad protein signals in the ligand tr-NOESY spectrum (Scherf and Anglister, 1993).
3.3.2a. Transferred NOESY with Short Mixing Times. One-dimensional transient NOE experiments (Wagner and Wüthrich, 1979; Gordon and Wüthrich, 1978; Krishna et al., 1978) obtained by the use of selective inversion or selective progressive saturation of chosen resonances have been proposed as a way of measuring direct NOEs between two protons before spin-diffusion effects become dominant. Similarly, 2D NOESY with very short mixing times will accomplish the same objective, at least in principle. In the presence of strong protein-mediated spin diffusion, this suggestion can be very difficult to realize in practice for two reasons. First, for very short mixing times the NOESY spectrum or its equivalent 1D transient NOE will suffer from poor signal-to-noise ratio. Second, transferred NOESY simulations (Jackson et al., 1995; Moseley et al., 1995) on model systems under the fast-exchange condition showed that protein-mediated spin diffusion can become significant within the first 50 to 60 ms (Fig. 18). Thus, recording transferred NOESY spectra with extremely short mixing
times does not appear to be a universal solution, though it might still work for some specific examples, especially when protein-indirect effects are weak.
3.3.2b. Radio-Frequency Pulse Saturation of Receptor Resonances. Continuous rf irradiation within the protein envelope (Clore and Gronenborn, 1983) or the 2D MINSY experiment (Massefski and Redfield, 1988), where the protein signals are selectively saturated during the mixing time, are some of the simplest implementations to suppress protein-mediated spin diffusion. In both these methods it is important to ensure that the receptor protons within the binding pocket remain saturated. This could be ensured if the spectrum of the protein without the ligand could be recorded first, and some exploratory irradiation experiments carried out with radio frequency centered at different regions of the broad protein resonance envelope. For complexes where the ligand protons do not overlap with signals from the receptor binding site, the MINSY experiment might be attractive (e.g., if the receptor binding pocket is predominantly composed of aromatic residues and the ligand does not have any aromatic protons).
3.3.2c. Two-Dimensional ROESY. The 2D ROESY experiment (or more correctly the transferred ROESY) offers a very attractive method for separately identifying “direct” and “indirect” NOE contacts in the spectrum of a ligand reversibly binding to a protein. In the NOESY spectrum of a large molecule, both direct and indirect NOE contacts have the same sign as the diagonal, thus misleading the unwary experimentalist. In contrast, in the ROESY experiment, an expansion of the exponential of the relaxation matrix as a power series in the spin-lock time shows that all terms odd in the spin-lock time have negative sign while even terms have positive sign with respect to a positive diagonal. Thus for short spin-lock times, the direct NOE contact has a sign opposite to that of the diagonal, while the first indirect NOE contact peak has the same sign. (Some authors mistakenly ascribe the indirect effects in ROESY to spin diffusion, in analogy to NOESY. This is a misnomer since the magnetization equation in the rotating frame cannot be converted to a diffusion equation because of the positive sign of the rotating-frame cross-relaxation rate. Multispin effect is a more appropriate name for this magnetization transfer, in analogy to small molecules.) Thus, a comparison of the NOESY and ROESY spectra should readily identify all protein-mediated indirect pathways. It is not uncommon for several NOESY peaks to be reduced in intensity or disappear altogether in a ROESY spectrum. This is due to a cancellation effect from direct and indirect contributions. A dramatic demonstration of the application of trROESY to identify protein-indirect effects has been given by Arepalli et al. (1995) in their study of the bound conformation of a disaccharide bound to a monoclonal antibody. Figure 7 shows the 2D transferred NOESY spectrum of the disaccharide in the presence of the Fab. Most noteworthy in this figure are the two intense cross peaks involving H4 (the H4 resonance shows up as a doublet due to scalar coupling). Based on the observation of this cross peak in a previous study (Glaudemans et al., 1990), these authors proposed a conformational change for the disaccharide. This conclusion, however, has been revised in the more recent study by Arepalli et al. (1995),
where they performed a ROESY experiment and concluded that the originally observed cross peak involving H4 was due to protein-indirect effects. Their ROESY spectra on the free disaccharide and in the presence of the Fab are shown in Fig. 8.
3.3.2d. Network Editing Sequences. A number of network editing pulse sequences have been proposed to suppress spin diffusion during the mixing time, and hence hold promise for suppressing protein-mediated spin diffusion in tr-NOESY experiments (Hoogstraten et al., 1995; Macura et al., 1994; Fejzo et al., 1991). These sequences exploit the differences in the signs of the off-diagonal elements of the relaxation rate matrix R for NOESY and ROESY
for large correlation times. In the direct NOESY (D.NOESY) method (Macura et al., 1994), spin diffusion is allowed to take place during the experiment, but its effects on the spectrum are eliminated by the addition of properly normalized NOESY and ROESY spectra. A number of editing sequences have also been proposed that restrict relaxation to direct effects only (Macura et al., 1992). In the selective NOESY (S.NOESY) method (Fejzo et al., 1992), selective 180° pulses that invert a selected band of spins are inserted on either side of a 90°–spin-lock–90° element, which is used in place of the normal NOESY mixing period, to retain only direct effects between the inverted and noninverted spins and to suppress all other indirect effects due to spin diffusion. There is also a loss of sensitivity in this method due to the application of a long
series of pulses to the resonances. More practical experience with these network editing techniques is needed to evaluate them for tr-NOESY applications.
3.3.2e. QUIET-NOESY and Related Methods. Pulse sequences such as QUIET-NOESY, QUIET-BIRD-NOESY, QUIET-EXSY, and their variants (Vincent et al., 1996a, 1996b; Zwahlen et al., 1994) involve selective excitation of resonances from a chosen pair of protons to monitor direct cross relaxation between them while suppressing spin-diffusion contributions from the intervening protons. An important advantage of some of these methods is that neither the chemical shifts of the intervening protons nor the existence of such protons need be known. These methods should also find applications in tr-NOESY to suppress the effects of protein-mediated spin diffusion on the ligand resonances. Since the smaller ligand molecules, when present in excess (over the receptor) and in fast exchange, generally yield sharp, well-resolved signals, they are easily amenable to these
selective excitation schemes. By using a labeled ligand and an unlabeled receptor, the QUIET-BIRD-NOESY can, in principle, give one-step magnetization transfer connectivities between all ligand protons attached to the labeled heteronuclei.
3.4. Methods for Observing Intermolecular Transferred NOESY Contacts
Because the intermolecular tr-NOE contacts between a reversibly binding ligand and its receptor are very useful in properly docking the ligand within the active site, and in quantitative CORCEMA calculations, it is worthwhile to explore
optimal experimental methods for observing them. The many limitations associated with observing intramolecular NOEs in high-molecular-weight systems, such as line broadening and spin diffusion, also apply, albeit in a somewhat less severe manner, to the observation of intermolecular tr-NOEs. For example, because the line shapes for intermolecular ligand–receptor tr-NOEs in the fast-exchange limit reflect the ligand in one dimension and the receptor in the second dimension, these
peaks are inherently somewhat sharper than the intrareceptor NOEs, and hence are comparatively easier to observe. The reported intermolecular tr-NOESY cross peaks in a 37-kDa ligand–protein–DNA complex (Lee et al., 1995) and in ~50-kDa peptide–antibody complexes (Arepalli et al., 1995; Scherf et al., 1992) are reasonably sharp and suggest the feasibility of observing these highly informative cross peaks in other similar systems. In higher-molecular-weight systems, it may be worthwhile to explore methods such as random fractional deuteration (LeMaster, 1989) and reverse protonation of an otherwise deuterated receptor as a way of reducing dipolar broadening and severe spin-diffusion problems. With random fractional deuteration, however, determining the proper concentrations of the different protons within the protein for CORCEMA calculations might become a problem.
Experimental observations (Anglister et al., 1993; Anglister and Zilber, 1990; Glasel, 1989; James, 1976) and theoretical calculations (Curto et al., 1996; Moseley
et al., 1995) suggest that the intermolecular tr-NOESY contacts are easier to observe at lower ligand–receptor ratios. Indeed, it is not uncommon to observe intermolecular tr-NOE contacts during routine tr-NOESY measurements, especially when the ligand–receptor ratio is kept relatively small. At high ligand–receptor ratios they are usually too weak to observe, especially for very high molecular weight receptors. Often, however, the requirement of using lower ligand–receptor ratios also means a tr-NOESY spectrum that is dominated by broad, featureless resonances from the high-molecular-weight receptor (e.g., Fig. 9), thus making it difficult to resolve the
somewhat sharper and more interesting intermolecular tr-NOESY peaks. Fortunately, a number of techniques can be used to suppress these broad receptor signals to reveal the tr-NOESY signals of interest. We will briefly summarize some of them. Since intermolecular tr-NOEs build up and decay comparatively faster than the intraligand tr-NOEs, a second requirement for optimal observation of intermolecular tr-NOEs is that the mixing time be kept relatively short (Curto et al., 1996; Arepalli et al., 1995). Crystallographic data can sometimes aid the assignment of these inter-tr-NOESY peaks.
3.4.1. Two-Dimensional Transferred NOESY Difference Spectroscopy
Anglister and co-workers have successfully used a 2D tr-NOE difference
spectroscopy method to suppress the broad resonances from the antibody receptor and to identify specific intermolecular NOE contacts between the peptide ligand and the antibody protons (Anglister et al., 1993; Scherf and Anglister, 1993; Scherf et al., 1992; Anglister and Zilber, 1990). Typically, NOESY spectra under identical
conditions are collected on two samples of the peptide–antibody mixtures, one in which the peptide is in excess (typically four- to fivefold excess over the antibody), and a second sample in which the ratio is 1:1. A 2D tr-NOE difference spectrum is
obtained by subtracting the second spectrum from the first one. Figure 9 shows the remarkable effectiveness of this method in suppressing the broad resonances of the high-molecular-weight receptor while retaining the important intraligand and intermolecular ligand–receptor NOE contacts for the cholera toxin peptide (CTP3) bound to the 50-kDa Fab fragment. A number of well-resolved peaks arising from intermolecular peptide–antibody NOE contacts are readily visible in this spectrum, in addition to intraligand tr-NOEs. The amino acid types in the antibody-combining site contributing intermolecular tr-NOEs with
the peptide ligand were identified by perdeuteration and partial deuteration of suspected amino acids (e.g., Phe, Tyr, Trp) and the concomitant disappearance of the cross peaks. In addition, these investigators have also successfully identified
interactions associated with a specific chain in the antibody by specifically labeling the heavy chain or the light chain. Typical results are shown in Fig. 10. Employing an antibody-combining site model based on crystallographic data from other
antibodies, these authors docked the peptide ligand (Fig. 11 for CTP3 bound to the combining site of TE33) into the binding site using distance restraints from intra- and intermolecular tr-NOE data together with energy minimization and molecular dynamics.
3.4.2. Spin-Lock and Spin-Echo Relaxation Filters
Spin-lock and spin-echo relaxation filters take advantage of the different spin relaxation properties of the ligand and the receptor arising from their vastly different sizes. Typically, large proteins have much shorter transverse
relaxation times, and hence their signals decay faster in the transverse plane than the signals from the low-molecular-weight ligands. Hence, a spin-lock or spin-echo relaxation filter can effectively suppress the broad signals from the receptor while retaining only the signals from the ligand. Depending on whether these filters are located at the beginning of the t1 or t2 periods, the broad receptor signals will be filtered from that particular dimension. Figure 12 shows a spectrum obtained by Arepalli et al. (1995), who employed a 20-ms spin-echo following the observe pulse to identify the intermolecular tr-NOESY contacts between a disaccharide [methyl O-β-D-galactopyranosyl-(1→6)-4-deoxy-2-deuterio-4-fluoro-β-D-galactopyranoside] and the Fab derived from the antibody
X24. This part of the spectrum clearly identifies contacts between specific protons on the disaccharide and some aromatic residues within the binding pocket of the
Fab. Anglister and co-workers employed a relaxation filter with a 20-ms spin-lock pulse at the beginning of the period to observe the intermolecular tr-NOEs between a cholera toxin peptide and Fab (Scherf and Anglister, 1993). Similarly,
Casset et al. (1997) demonstrated the separate use of spin-lock and spin-echo relaxation filters to observe intermolecular tr-NOEs between Dolichos biflorus lectin and the nonreducing disaccharide moiety of the reversibly binding Forssman pentasaccharide. Typical results are shown in Fig. 13. The spin-lock filters can result in a slight loss in sensitivity (Scherf and Anglister, 1993) as well as in some artifacts, and methods to overcome these have been suggested (Ni, 1994).
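The principle behind these filters can be illustrated with a rough numerical sketch. It assumes simple mono-exponential transverse decay, exp(-t/T2), during the filter and compares the signal surviving a 20-ms filter for a receptor and for a ligand. The two T2 values below are illustrative assumptions and are not taken from the studies cited above.

```python
import math

def surviving_fraction(filter_ms: float, t2_ms: float) -> float:
    """Fraction of transverse magnetization left after a relaxation filter,
    assuming mono-exponential decay exp(-t/T2)."""
    return math.exp(-filter_ms / t2_ms)

FILTER_MS = 20.0        # filter length quoted in the text
T2_RECEPTOR_MS = 5.0    # assumed value for the broad lines of a large protein
T2_LIGAND_MS = 200.0    # assumed value for the sharp lines of an exchanging ligand

receptor_left = surviving_fraction(FILTER_MS, T2_RECEPTOR_MS)
ligand_left = surviving_fraction(FILTER_MS, T2_LIGAND_MS)
print(f"receptor signal left: {receptor_left:.3f}")
print(f"ligand signal left:   {ligand_left:.3f}")
```

With these assumed values the receptor signal is reduced to a few percent while most of the ligand signal survives. In a real sample the observed ligand T2 is an exchange-averaged quantity, so the contrast depends on the ligand–receptor ratio.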
3.4.3. Isotope-Selected/Filtered Methods
By selectively labeling one of the partners in the interacting pair with a suitable isotope, one can use double-half-filters to observe the different subspectra associated with the ligand–receptor complex (Wider et al., 1990). For example, the intermolecular tr-NOESY spectrum can be observed without interference from the intraligand and intrareceptor NOESY spectra using double-half-filter pulse sequences that select the protons attached to the heteronucleus in one dimension and filter them out in the other, or the reverse (Wider et al., 1990). Several other isotope-filtered pulse sequences useful in identifying intermolecular NOESY contacts have also been described (Folkers et al., 1993; Ikura and Bax, 1992; Gemmecker et al., 1992; Ikura et al., 1992). Table 1 of Lian et al. (1994) gives a summary of these methods. Even though most of these methods have been widely used for tight-binding complexes, they should be applicable to identifying transferred intermolecular
NOEs in reversibly binding complexes as well. To overcome the limitations associated with tedious phase cycling and sensitivity loss due to transverse magnetization decay in these earlier sequences, a pulsed-field-gradient-based isotope-filtered 3D HMQC–NOESY sequence (Lee et al., 1994) and a 2D isotope-edited NOESY sequence (Lee et al., 1995) have been developed for identifying intermolecular ligand–receptor NOE contacts. Figure 14 is an example of the application of such a 2D NOESY pulse sequence to identify intermolecular transferred NOEs between tryptophan and its 37-kDa repressor–operator complex. In addition to contacts between bound forms of the ligand and the receptor, this figure also shows exchange-mediated intermolecular tr-NOEs between the free ligand and the bound form of the receptor (Curto et al., 1996).
3.5. Structure Refinement Calculations
A large number of structures from tr-NOESY studies have been published in which the structure refinements were limited to the ligand only. Implicit in such
calculations is that the ligand–protein intermolecular cross relaxation has a negligible influence on intraligand tr-NOESY, and that one is dealing with a two-state model in the fast-exchange situation. In the following we restrict our discussion to those cases where one or both of these assumptions is not valid. For discussion purposes, the ligand–receptor complexes will be divided into two groups—small to medium-size systems where complete NMR structural determination of the entire system is possible, and larger complexes where the receptor may be too large for solution NMR structure determination with high resolution. In all these structure refinement calculations, it is highly desirable to obtain independent estimates of as many parameters (e.g., correlation times, off-rates, binding constants, hinge-bending rates, etc.) as possible to reduce the dimension of
the search surface for the remaining variables to be optimized and to facilitate the search for the global minimum (Moseley et al., 1995).
3.5.1. Small to Medium-Size Complexes
We will assume that the receptor macromolecule is amenable to recombinant expression, and by virtue of its smaller size one could employ the complete battery
of modern 3D and 4D NMR techniques (Clore and Gronenborn, 1991) to determine the assignments and conformation of residues within and surrounding the active
site, with and without the bound ligand. As previously mentioned, it is useful to undertake tr-NOESY measurements at several ligand–receptor ratios. Ligand–receptor intermolecular tr-NOE contacts as well as intraligand tr-NOEs can be measured without interference from the receptor signals by isotope-filtered and -edited NMR on complexes where only one of the molecules is labeled (Ramesh et al., 1996; Lee
et al., 1995). The tr-NOE data [i.e., intraligand, intrareceptor (active site), and intermolecular contacts] can be supplemented by torsion-angle constraints from vicinal coupling-constant data on the bound ligand. This latter information can be deduced under fast-exchange conditions by measuring the ligand vicinal coupling constants as a function of the increasing fraction of bound ligand, and then extrapolating them to the limit of 100% bound ligand (Campbell et al., 1992; Campbell and Sykes, 1991). These data could be used in a variety of refinement methods to deduce the ligand–receptor (active site) conformation. We briefly summarize some of them.
3.5.1a. Testing between Several Models. The simplest method of analysis involves testing between several alternative models for the complex by predicting tr-NOESY intensities using CORCEMA (Moseley et al., 1997, 1998). In the current version of CORCEMA, the algorithm computes the predicted intra-tr-NOESY as well as inter-tr-NOESY spectra for each proposed model (including the crystallographic structure, if available), and compares them with experimental data using NOE R-factor analysis (Xu et al., 1995a, 1995b; Krishna et al., 1978). Some
parameters, such as correlation times, leakage factors, and off-rates, could be optimized to get the best fit in each case. If the agreement is not satisfactory, alternative models for the complex could be proposed, and the model that gives the best fit can be identified (Krishna et al., 1978). If the intrareceptor NOESY spectra are observed experimentally, these data can be included in this procedure to directly determine the ligand-induced structural perturbations. This method is useful if one is testing among several proposed models (including crystallographic structures) to see which one is most compatible with the NMR data. It may also be used for a manual refinement of the structure of the complex, although this can be a rather
tedious task if there are too many variables or if the active site is flexible.
3.5.1b. Distance-Constrained Methods. In this approach, which does not use CORCEMA analysis, one first classifies the intraligand and intermolecular ligand–receptor tr-NOESY intensities (as well as intrareceptor tr-NOEs) into distance constraints with upper and lower bounds (Kuntz et al., 1989; Scheek et al., 1989; Nilges et al., 1988; Braun, 1987; Clore et al., 1986) using a qualitative
isolated spin-pair approximation (ISPA). Using intraligand distance constraints [and torsion-angle constraints from coupling-constant measurements (Campbell et al., 1992)], suitable models for the bound-ligand conformation are generated using standard distance geometry, restrained molecular dynamics, and restrained energy minimization procedures. Next, using a known or an approximate conformation for the active site (e.g., from NMR or crystallography), one could first approximately dock the ligand within the active site (from the known intermolecular contacts and any additional information such as hydrogen bonds). This starting structure for the ligand–active site complex can be refined using intermolecular distance constraints, while holding the active site residues fixed using distance constraints. This procedure
exploits the sensitivity of the intra-tr-NOESY and inter-tr-NOESY to proton–proton
distances (Curto et al., 1996; Ramesh et al., 1996). An advantage of distance-constrained methods such as distance geometry is that they are relatively less CPU intensive, and hence fast. A disadvantage is the loss of information (e.g., spin diffusion, protein leakage, finite off-rates, and internal motions) associated with a classification of intensities into distances using strong, medium, and weak criteria. For example, in the presence of strong protein-mediated spin diffusion between a pair of ligand protons, the corresponding intraligand tr-NOEs at short mixing times can be very intense even when the distances are large (Jackson et al., 1995). Thus, standard distance geometry calculations using distance constraints based on intensities will result in overly compact structures, since multispin effects are not properly accounted for in these methods. Similarly, if the fast-exchange condition is not satisfied, the magnitude of the transferred NOEs can be smaller and can potentially result in slightly expanded distance geometry structures.
3.5.1c. Intensity-Restrained Refinement. The above limitation of distance-constrained methods, namely the loss of information about multispin effects, is lifted by the total relaxation matrix treatments (Keepers and James, 1984; Krishna et al., 1978). CORCEMA can be used in intensity-based refinement procedures (Xu et al., 1995; Mertz et al., 1991; Borgias and James, 1988) to iteratively optimize a target function consisting of experimentally measurable intensities only (e.g., intraligand tr-NOESY and inter-tr-NOESY in our case). The target function can be constructed to be simple (Yip and Case, 1989; Borgias and James, 1988) or variable (Xu et al., 1995a, 1995b; Guntert et al., 1991; Mertz et al., 1991; Braun, 1987). The optimization can be efficiently performed with a least-squares refinement (Borgias and James, 1988), numerical or analytical gradient-based intensity-restrained refinement (Xu et al., 1995a, 1995b; Mertz et al., 1991; Yip and Case, 1989), simulated-annealing-based methods (Xu and Krishna, 1995; Bonvin et al., 1994), or a combination of these.
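The compaction caused by ignoring multispin effects can be illustrated with a small relaxation-matrix calculation. The sketch below is not CORCEMA: it treats a single rigid, isotropically tumbling three-proton system with no exchange and no external leakage, using the standard dipolar expressions for the rate matrix. The geometry (three collinear protons 2.5 Å apart), the 10-ns correlation time, the 500-MHz field, and the 100-ms mixing time are all assumptions chosen for illustration.

```python
import numpy as np

# (mu0/4pi)^2 * gamma_H^4 * hbar^2 / 10, expressed in Å^6 s^-2
K = 5.696e10

def spectral_density(omega, tau_c):
    """Lorentzian spectral density for isotropic tumbling."""
    return tau_c / (1.0 + (omega * tau_c) ** 2)

def relaxation_matrix(coords, tau_c, omega):
    """Dipolar relaxation rate matrix R for a rigid set of protons."""
    n = len(coords)
    j0, j1, j2 = (spectral_density(k * omega, tau_c) for k in (0, 1, 2))
    R = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r6 = np.linalg.norm(coords[i] - coords[j]) ** 6
            R[i, j] = K * (6 * j2 - j0) / r6            # cross relaxation
            R[i, i] += K * (j0 + 3 * j1 + 6 * j2) / r6  # direct relaxation
    return R

def noesy_intensities(R, tau_m):
    """exp(-R * tau_m), obtained by diagonalizing the symmetric matrix R.
    Each diagonal peak is 1 at zero mixing time."""
    eigvals, eigvecs = np.linalg.eigh(R)
    return eigvecs @ np.diag(np.exp(-eigvals * tau_m)) @ eigvecs.T

# Assumed geometry: protons A, B, C on a line, so A-C is 5.0 Å
coords = np.array([[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [5.0, 0.0, 0.0]])
tau_c, omega, tau_m = 10e-9, 2 * np.pi * 500e6, 0.1

full = noesy_intensities(relaxation_matrix(coords, tau_c, omega), tau_m)
pair = noesy_intensities(relaxation_matrix(coords[[0, 2]], tau_c, omega), tau_m)

full_AC = full[0, 2]   # A-C cross peak with spin diffusion through B
ispa_AC = pair[0, 1]   # A-C cross peak if A and C were an isolated pair
# Distance an ISPA analysis would assign to A-C, calibrated on A-B (2.5 Å)
apparent_r_AC = 2.5 * (full[0, 1] / full_AC) ** (1.0 / 6.0)

print(f"A-C cross peak, full matrix: {full_AC:.4f}")
print(f"A-C cross peak, isolated pair: {ispa_AC:.4f}")
print(f"apparent A-C distance from ISPA: {apparent_r_AC:.2f} Å (true 5.00 Å)")
```

Under these assumptions the relayed A–C cross peak is several times larger than the isolated-pair value, so an ISPA calibration assigns a distance well below the true 5 Å. A full treatment of a ligand–receptor system would in addition require the exchange terms and the receptor protons.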
Intensity-restrained refinements can prove to be computationally intensive due to repetitive diagonalizations of the dynamic matrix during optimization, but may serve as attractive methods if the total number of protons on the ligand and the active site residues under consideration is relatively small. These procedures typically use data at several mixing times, including long mixing times where spin diffusion from unobservable protons (e.g., active site residues of a high-molecular-weight enzyme) also can influence the observable intensities, and hence are amenable to refinement to some extent. Recent experimental work from our laboratory has confirmed this prediction (Xu et al., 1995b). In the presence of significant intermolecular cross relaxation, a rigorous characterization of the ligand–receptor complex requires a knowledge of the relaxation rate matrix for the macromolecule (active site residues), which can in principle be deduced for moderate-size receptors from intrareceptor NOESY. Even if the intra-NOESY spectrum for the active site residues is not amenable to direct observation (e.g., for high-molecular-weight complexes), because the intraligand tr-NOESY [Eq. (31)] and the inter-tr-NOESY [Eq. (49)] both depend upon the
relaxation rate matrix for the bound form of the receptor, they have the potential to serve as experimental constraints in the refinement of the active site conformation. In these refinements, it is helpful to take into account external leakage factors as accurately as possible to calculate the intensity profiles. Some typical leakage factors arise due to weak interactions with paramagnetic oxygen in solution, exchange of amide protons with bulk solvent, and dipolar interaction of amide hydrogens with the quadrupolar nitrogen or the labeled heteronucleus (Dellwo et al., 1994; Liu et al., 1993; Krishna et al., 1978).
3.5.1d. Iterative Refinement Employing Distance Restraints and CORCEMA Back-Calculation of Transferred NOESY Spectra. The above limitations about loss of information in purely distance-constrained methods such as distance geometry or restrained molecular dynamics can be lifted by complementing them with a CORCEMA back-calculation of tr-NOESY spectra to properly account for multispin effects and internal motions. This type of procedure is an integral part of some standard hybrid-matrix-based algorithms used in refinement to a single rigid structure (Gorenstein et al., 1990; Borgias and James, 1990, 1989; Boelens et al., 1988). In principle, the relaxation rate matrix and, hence, the distances can be back-calculated directly from the experimental NOESY spectrum (Olejniczak et
al., 1986). In practice, however, this is not always possible since many experimental intensities are not accessible due to overlap or other reasons (e.g., in our case, the intrareceptor NOEs for high-molecular-weight receptors or some intermolecular
tr-NOEs may not always be observable). In the traditional hybrid-matrix method, the missing elements in the experimental intensity matrix are supplemented with intensities back-calculated for an initial trial structure using a complete relaxation rate matrix algorithm. For transferred NOESY analysis, this last step can be accomplished using CORCEMA and a reasonable trial structure and parameters for the ligand–receptor complex (e.g., from crystallography). The back transformation is to an average relaxation rate matrix for fast exchange or to the dynamic matrix for the general case. After reconciling the various elements of the dynamic matrix in a manner analogous to some existing algorithms (Borgias and James, 1990), one can use the distance information in a distance geometry or restrained molecular dynamics procedure to generate a new trial structure. The corresponding full tr-NOESY spectrum (including intrareceptor and intermolecular tr-NOESY) for this new trial structure can be computed using CORCEMA, and the next cycle of optimization can be started. Since, in the presence of strong ligand–protein cross relaxation, both intraligand tr-NOEs [Eq. (31)] and intermolecular tr-NOEs [Eq. (49)] depend upon the elements of the relaxation rate matrix for the bound form of the receptor, they can potentially serve as experimental constraints on the orientation of active site residues. If the intrareceptor NOEs are directly observable, they can be directly included. After a few cycles of refinement, a self-consistent structure for the ligand–protein (active site) complex may be generated, as determined from a comparison of experimental and calculated intensities at several mixing times. Whether the procedures described in
Secs. 3.5.1c and 3.5.1d can be realized in practice can only be answered after further extensive work. Ni et al. (1992, 1995) used a slightly different iterative procedure involving distance geometry and spectral back-calculation. Section 5.1 contains a description of this procedure.
3.5.2. High-Molecular-Weight Complexes
Here we consider strategies for receptors that are too large for standard NMR structural determination. For such systems, since a quantitative
interpretation of tr-NOESY requires a knowledge of the active site residues and their coordinates, one has to rely on the crystallographic structure of this protein or
of a homologous protein to serve as a starting structure. Even though intrareceptor NOEs will not be amenable to direct observation due to line broadening and spin-diffusion problems (and low concentrations of the receptor), intermolecular tr-NOEs may be observable, perhaps up to about 50 kDa. Random fractional deuteration (LeMaster, 1989) of the receptor or the TROSY implementation (Pervushin et al., 1997) may alleviate some of these problems and yield better-quality inter-tr-NOESY data which can be used for structure refinement purposes (however, estimation of precise concentrations of different receptor protons for use in CORCEMA calculations could be a potential problem due to nonuniformity in labeling). In those few instances where inter-tr-NOEs are observable, some tentative assignments for these could be made based on crystallographic
data with some reasonable assumptions. These assignments could be tested for self-consistency (see chapter 2 by Xu et al., in this series). Once high-quality intraligand tr-NOEs and, with some luck, some inter-tr-NOEs have been recorded at several mixing times and several ratios, some of the procedures listed above for low-molecular-weight complexes may still be applicable, with the important difference that experimental intrareceptor NOEs are usually not available. In the presence of significant ligand–receptor cross relaxation, the intraligand tr-NOEs (and inter-tr-NOEs when observable) are dependent upon the relaxation rate matrix for the bound form of the receptor, and hence may serve as experimental constraints on the active site conformation, at least in principle. Whether such optimizations are feasible in practice can only be judged by trying them out, and more studies and experience are needed in this area. However, the results from the work on thrombin-bound structures of human fibrinopeptide analogs using an iterative refinement procedure that consisted of a combination of distance geometry and tr-NOESY spectrum back-calculations (Ni et al., 1995), as well as our own work with sialyl tetrasaccharide bound to E-selectin (Moseley et al., 1998) that used a manual refinement, suggest that the above refinement methods are not unreasonable.
3.5.3. Normalization of Calculated and Experimental Intensities
In comparing the calculated intensities with the experimental intensities, it is common practice to use a scaling factor S to normalize them (Lian et al., 1994; Brünger, 1992; Borgias and James, 1988). This scaling factor is usually defined in terms of sums of the calculated and experimental cross-peak intensities, where the summation runs over all the cross peaks observed experimentally, though in some optimizations the summation has been restricted to only some well-defined cross peaks (Lian et al., 1994; Moseley et al., 1997). Though this kind of normalization works reasonably well and we have used it (Moseley et al., 1997, 1998), it has the drawback that the fit (or lack of fit) between a calculated and an experimental NOESY curve is less intuitive to interpret, since any factor (e.g., protein-mediated spin diffusion) that seriously affects one or a few of the values used in the normalization also affects all the remaining normalized calculated NOEs, including those that should not be affected by the factor. A better normalization procedure, we believe, is one where the cross peaks are referenced with respect to the corresponding diagonal peak intensities at zero mixing time (which, for long recycle delays, makes them independent of the
model), and these diagonal peaks in turn are normalized between experiment and calculation (Xu et al., 1995b).
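The drawback of a global scaling factor can be seen in a short numerical sketch. The definitions used here are assumptions made only for illustration: S is taken as the ratio of the summed experimental to the summed calculated intensities, and the R-factor as the summed absolute deviation divided by the summed experimental intensity. The forms used in the works cited above may differ, and the intensities are made-up numbers.

```python
import numpy as np

def scale_factor(i_calc, i_exp):
    """Assumed global scaling: sum of experimental over sum of calculated."""
    return np.sum(i_exp) / np.sum(i_calc)

def r_factor(i_calc, i_exp):
    """Assumed R-factor: sum |I_exp - S*I_calc| / sum |I_exp|."""
    s = scale_factor(i_calc, i_exp)
    return np.sum(np.abs(i_exp - s * i_calc)) / np.sum(np.abs(i_exp))

# Made-up cross-peak intensities
i_exp = np.array([0.10, 0.05, 0.02, 0.01])
i_calc = 2.0 * i_exp            # model is perfect apart from an overall scale
perturbed = i_calc.copy()
perturbed[0] *= 1.5             # only the first peak is mis-predicted

r_clean = r_factor(i_calc, i_exp)
s_clean = scale_factor(i_calc, i_exp)
s_pert = scale_factor(perturbed, i_exp)

# Change in the scaled values of the three peaks that were NOT perturbed
shift_in_untouched = s_pert * perturbed[1:] - s_clean * i_calc[1:]

print(f"R-factor, unperturbed model: {r_clean:.3g}")
print(f"scale factor before/after: {s_clean:.3f} / {s_pert:.3f}")
print("shift in untouched scaled peaks:", shift_in_untouched)
```

Because the single mis-predicted peak enters the sum that defines S, every other scaled peak moves as well, which is the interpretive difficulty described above. Referencing to diagonal peaks at zero mixing time avoids this coupling.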
4. CHARACTERIZATION OF SOME CRITICAL FACTORS USING SIMULATED TRANSFERRED NOESY DATA
We have examined the role of several factors (vide supra) critical in tr-NOESY analysis using simulated data on a hypothetical ligand–enzyme system based on
the published X-ray structure of thermolysin with an irreversible inhibitor bound in the active site (Holland et al., 1992). Since our primary interest is to simulate the tr-NOESY results for different forward and reverse rates, we replaced the covalent bond between the inhibitor and the enzyme with a hydrogen to allow for reversible binding of this hypothetical inhibitor in our models. We have also drastically changed the orientation of the putative enzyme flap in order to better test the effects of protein-mediated spin diffusion, as discussed below. Only the active site residues (a total of nine residues, consisting of N-112, A-113, F-114, W-115, N-116, E-143,
H-146, R-203, and H-231) of thermolysin were included in our models to expedite calculations. To simulate the effect of hinge-bending motions, it was assumed that Ala 113 of thermolysin is farther from the inhibitor in the open state, and is closer in the closed state, as shown in Fig. 5. A correlation time somewhat shorter than normal was deliberately assumed for the free ligand. Separate values were chosen for the correlation time of the bound form and for the methyl group internal rotation correlation time, the latter being the same for the free and bound states.
4.1. Finite Receptor Off-Rates
Many traditional tr-NOE analyses have used the so-called fast-exchange approximation; i.e., the exchange rates are assumed to be much faster than the cross-relaxation rates in the complex. As the enzyme off-rates become comparable to cross-relaxation rates in the bound state, one can intuitively anticipate that the magnitude of the tr-NOE will diminish. If this effect is not properly taken into account, the diminished intensities might be misinterpreted in terms of wrong structures for the bound ligand. To demonstrate this effect, we have calculated the behavior of the NOESY cross-peak intensity connecting the two geminal protons (separated by a fixed distance of 1.8 Å) in the inhibitor in Fig. 5. They will be referred to as A–X in the following. For notational purposes, we will assume that A and X refer to the free-ligand protons (state 1 in Fig. 1), while A′ and X′ correspond to the ligand protons in the enzyme-bound form. To approximate the thermolysin–inhibitor interaction to a two-state situation, it was assumed that the open state of the enzyme is nonexistent and that the inhibitor goes to the closed state instantaneously upon complexation. In Fig. 15, we show the direct NOESY cross-peak intensities I(AX) and I(A′X′) as well as the two exchange-mediated cross-peak intensities I(AX′) and I(A′X). They were computed as a function of the off-rate and the mixing time. In this simulation, the equilibrium constant was held fixed and the off-rate (and the on-rate) was varied. These figures show the dramatic effect of the off-rates on cross-peak intensities. Noteworthy is the dramatic change of sign of the I(AX) peak and the drop in intensity for the I(A′X′) peak as the off-rate increases. The two exchange-mediated peaks I(AX′) and I(A′X) have identical profiles. Addition of the intensities in the four figures gives the total intensity for the situation when conformational exchange is either fast on the chemical-shift scale or there is a chemical-shift degeneracy between the free and bound protons. In Fig. 
16, the sum of the four cross peaks is shown over the entire range of off-rates. The composite NOESY peak [which derives a major contribution from the free-ligand peak I(AX)] shows a plateau for the fast-exchange condition, begins to drop in intensity as the off-rate approaches the cross-relaxation rate, and changes sign for very small off-rates. This figure underscores the importance of unequivocally establishing the exchange off-rate and the dissociation constant by independent methods, to establish the fast-exchange condition, if such an approximation is being used. As a hypothetical example, consider a large enzyme with a rotational correlation time of 50 ns (typical of ~200-kDa systems) that binds a ligand with a diffusion-controlled on-rate. The estimated off-rate (the product of the dissociation constant and the on-rate) is only about 12 times larger than the cross-relaxation rate for a geminal
proton pair (1.8 Å), and about 86 times larger than the cross-relaxation rate of ~12 for protons separated by 2.5 Å, on the enzyme complex. Thus, the assumption of a fast-exchange approximation is not uniformly applicable for this case, and it is safer to undertake a CORCEMA analysis to obtain meaningful structural infor-
CORCEMA Analysis of NOESY Spectra of Ligand–Receptor Complexes
269
270
N. Rama Krishna and Hunter N. B. Moseley
mation for the bound ligand. This breakdown of the fast-exchange approximation is further exacerbated if the on-rate is less than the diffusion-controlled rate.
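The comparison between the off-rate and the bound-state cross-relaxation rates in the hypothetical example can be checked with a short calculation. The sketch below is illustrative only: it uses the standard isolated-pair expression for the homonuclear cross-relaxation rate with a Lorentzian spectral density, and the dissociation constant, on-rate, and spectrometer frequency are assumed values picked here for illustration (they happen to give ratios close to the "about 12" and "about 86" quoted above). Only the 50-ns correlation time and the two distances are taken from the text.

```python
import math

# Physical constants (SI units).
MU0_OVER_4PI = 1.0e-7   # T m A^-1
GAMMA_H = 2.6752e8      # proton gyromagnetic ratio, rad s^-1 T^-1
HBAR = 1.0546e-34       # J s


def spectral_density(omega: float, tau_c: float) -> float:
    """Lorentzian spectral density for isotropic rigid tumbling."""
    return tau_c / (1.0 + (omega * tau_c) ** 2)


def cross_relaxation_rate(r_angstrom: float, tau_c: float, larmor_hz: float) -> float:
    """Cross-relaxation rate sigma (s^-1) of an isolated 1H-1H pair.

    sigma = (d^2 / 10) * r^-6 * [6 J(2w) - J(0)], with d = (mu0/4pi) * hbar * gamma^2.
    The result is negative for slowly tumbling molecules.
    """
    r = r_angstrom * 1.0e-10
    omega = 2.0 * math.pi * larmor_hz
    dipolar_sq = (MU0_OVER_4PI * GAMMA_H ** 2 * HBAR) ** 2
    return dipolar_sq / (10.0 * r ** 6) * (
        6.0 * spectral_density(2.0 * omega, tau_c) - spectral_density(0.0, tau_c)
    )


# Inputs. TAU_C and the distances come from the text; the rest are assumptions.
TAU_C = 50e-9        # s, rotational correlation time of the complex
LARMOR_HZ = 500e6    # Hz, ASSUMED spectrometer frequency
K_D = 1.0e-5         # M, ASSUMED dissociation constant
K_ON = 1.0e8         # M^-1 s^-1, ASSUMED diffusion-controlled on-rate

k_off = K_D * K_ON   # s^-1, from K_D = k_off / k_on
sigma_geminal = cross_relaxation_rate(1.8, TAU_C, LARMOR_HZ)
sigma_2p5 = cross_relaxation_rate(2.5, TAU_C, LARMOR_HZ)
ratio_geminal = k_off / abs(sigma_geminal)
ratio_2p5 = k_off / abs(sigma_2p5)

if __name__ == "__main__":
    print(f"k_off = {k_off:.0f} s^-1")
    print(f"sigma(1.8 A) = {sigma_geminal:.1f} s^-1, k_off/|sigma| = {ratio_geminal:.1f}")
    print(f"sigma(2.5 A) = {sigma_2p5:.1f} s^-1, k_off/|sigma| = {ratio_2p5:.1f}")
```

With these assumed inputs the off-rate exceeds the geminal cross-relaxation rate by only about one order of magnitude, which is the situation in which the fast-exchange approximation becomes questionable for short proton–proton distances.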
4.2. Effect of Ligand–Receptor Ratio on the Ligand Transferred NOESY
In setting up tr-NOESY experiments, it is often useful to get an idea of the range of ligand–receptor ratios that can be employed for optimal sensitivity. Figure 17 shows the effect of varying the ratio on the geminal proton total tr-NOESY (i.e., sum of two direct plus two exchange-mediated peak intensities). It is clear that significant tr-NOE effects are obtained when is 1 to 75 for the thermolysin–ligand system chosen in the current study. For the tr-NOESY technique as a method is not as dramatic since one has to contend with small changes in the magnitudes of the negative intensities of the ligand alone as opposed to a reversal in signs of the NOESY intensities for smaller ratios. For larger enzymes, the range of useful tr-NOESY regime increases, as recognized earlier
(Clore and Gronenborn, 1983). Many early experimental investigations have tended to employ high ratios as a way of taking advantage of the sensitivity of the technique to reduce the amount of purified enzyme required for the measurements. By employing high ratios, one might inadvertently (i) significantly enhance
the relative efficiency of protein-mediated spin diffusion (vide infra) and (ii) increase contributions from nonspecific binding, both of which in turn can result in erroneous structures for the bound ligand if not properly accounted for. In practice, it is prudent to perform tr-NOESY measurements over a wide range of ratios (e.g., 50:1–2:1) to develop a model for the conformation and dynamics that is self-consistent over this range. In those circumstances where one is forced to use high ratios because of severe line-broadening problems at lower values, it is essential to do proper control experiments (see Sec. 3.3) to examine if
any significant ligand–protein intermolecular cross relaxation exists, and undertake additional measurements where protein-indirect effects are suppressed. Curto et al. (1996) simulated the dependence of ligand–receptor intermolecular tr-NOESY intensity as a function of As expected, the intensity decreases for
large values of the ratio. A comparison of Fig. 17 for intraligand tr-NOESY and Fig. 6 in Curto et al. (1996) for intermolecular tr-NOESY shows that their intensity surfaces as a function of and mixing time are dramatically different. Thus,
contrary to assertions by some investigators in the field, these two effects do not exhibit similar behavior.
4.3. Role of Ligand–Protein Intermolecular Dipolar Relaxation
Intermolecular ligand–receptor dipolar interactions modulated by exchange have rather complex effects on the ligand tr-NOESY (Jackson et al., 1995; Moseley et al., 1995; Ni and Zhu, 1994; London et al., 1992; Nirmala et al., 1992). Broadly speaking, these intermolecular interactions can result in two distinct classes of
effects on the ligand–tr-NOESY spectrum, depending upon the geometrical arrangement of the ligand and receptor protons. These are (1) protein-mediated spin-diffusion effects, which can enhance the tr-NOEs even at short mixing times, and (2) protein-leakage effects, which lead to a decrease in tr-NOE intensity at
longer mixing times. The protein-mediated spin diffusion can sometimes result in dramatic effects (Jackson et al., 1995). In practice, a combination of these two effects might be manifested in the tr-NOESY spectra due to ligand–protein cross relaxation.
4.3.1. Protein-Mediated Spin-Diffusion Effects
The protein-mediated spin-diffusion effects have been demonstrated in simulations by Jackson et al. (1995) using a hypothetical two-state model for thermolysin–leucine inhibitor complex conditions (Fig. 5) under a fast-exchange condition. In this calculation, the tr-NOESY intensity between two ligand methyl proton groups (average distance 6.8 Å) has been computed as a function of mixing time. Because of the large distance, the NOESY is extremely susceptible to indirect effects from the protein and the ligand protons.
Jackson et al. (1995) demonstrated the relative effects of each from two sets of CORCEMA calculations: the first one in which the protein protons were gradually removed (equivalent to selective deuteration) while retaining the ligand protons (Fig. 18A), and the second one in which the ligand protons were gradually removed while retaining the protein protons (Fig. 18B). From an examination of Figs. 18A and B, it is easily seen that the tr-NOESY experiences the protein-indirect effects (or protein-mediated spin-diffusion effects) during the growth portions and mid-regions of the mixing times, while the ligand-indirect effects are more pronounced at longer mixing times, including during the decay of the intensity. Notice that, in this example, the initial lag period that is often used to identify protein-mediated spin diffusion is somewhat poorly defined for the case where all the alanine protons were included. Further, the true initial slope period is confined to the first 25 ms of the mixing time only. For small mixing times, the NOESY spectrum often suffers from poor signal/noise ratio, making it difficult to get good estimates of the initial slopes. Under these conditions, one might be tempted to fit the data in the 0–200 ms mixing times range by an initial slope approximation or a relaxation matrix analysis (limited to ligand protons only) to get estimates of intraligand distances. Such calculations will result in misleadingly compact structures for the bound ligand. Our calculations underscore the importance of properly incorporating the ligand–protein intermolecular cross relaxation.
London et al. (1992) illustrated these protein-indirect effects for a hypothetical ligand–enzyme system shown in Fig. 19. The ligand spin system, arranged at the corners of an equilateral triangle, forms a complex with the enzyme represented by
five protons equally spaced in a straight line. It was also assumed that there are no conformational changes in the ligand or the enzyme. In this model the enzyme proton 4 is closer to the ligand protons 1 and 2 than they are to each other. Figure 20A shows the variations in the tr-NOESY peak as a function of the exchange
rate. The protein-indirect effects are dramatic, especially at higher exchange rates.
The cross-peak intensity is dramatically larger even at very short mixing times. Further, the intensities are substantially stronger than the reference intensity. This is a result of protein-mediated spin diffusion. In this instance, one might be misled into thinking that the distance between protons 1 and 2 is shorter in the bound state.
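The way an intervening enzyme proton can inflate an intraligand cross peak can be illustrated with a minimal full-relaxation-matrix calculation. The sketch below is not CORCEMA and is not the London et al. (1992) model: it treats the bound state only (no exchange, no free-ligand state), uses two ligand protons 3.5 Å apart plus one hypothetical enzyme proton placed 2.5 Å from each, and assumes a correlation time, spectrometer frequency, leakage rate, and mixing time chosen purely for illustration. Normalized intensities are obtained as the matrix exponential exp(-R·τm), evaluated here with a plain Taylor series (adequate only because R·τm is small in this example).

```python
import math

# (mu0/4pi)^2 * hbar^2 * gamma_H^4 / 10, expressed in Å^6 s^-2.
DIPOLAR_OVER_10 = 5.696e10


def spectral_density(omega, tau_c):
    """Lorentzian spectral density for isotropic rigid tumbling."""
    return tau_c / (1.0 + (omega * tau_c) ** 2)


def relaxation_matrix(coords, tau_c, larmor_hz, leakage):
    """Relaxation rate matrix (s^-1) for protons at the given coordinates (Å)."""
    omega = 2.0 * math.pi * larmor_hz
    j0 = spectral_density(0.0, tau_c)
    j1 = spectral_density(omega, tau_c)
    j2 = spectral_density(2.0 * omega, tau_c)
    n = len(coords)
    rates = [[0.0] * n for _ in range(n)]
    for i in range(n):
        rates[i][i] = leakage  # uniform external leakage on the diagonal
        for j in range(n):
            if i == j:
                continue
            scale = DIPOLAR_OVER_10 / math.dist(coords[i], coords[j]) ** 6
            rates[i][j] = scale * (6.0 * j2 - j0)              # cross relaxation
            rates[i][i] += scale * (j0 + 3.0 * j1 + 6.0 * j2)  # direct relaxation
    return rates


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def noesy_intensities(rates, tau_m, terms=40):
    """Normalized intensity matrix exp(-R * tau_m) via a Taylor series."""
    n = len(rates)
    x = [[-rates[i][j] * tau_m for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


# All values below are ASSUMED for illustration.
TAU_C = 10e-9       # s, correlation time of the complex
LARMOR_HZ = 500e6   # Hz
LEAKAGE = 0.2       # s^-1
TAU_M = 0.1         # s, mixing time

ligand_only = [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0)]
# Hypothetical enzyme proton 2.5 Å from both ligand protons.
y = math.sqrt(2.5 ** 2 - 1.75 ** 2)
with_enzyme = ligand_only + [(1.75, y, 0.0)]

i12_ref = noesy_intensities(relaxation_matrix(ligand_only, TAU_C, LARMOR_HZ, LEAKAGE), TAU_M)[0][1]
i12_obs = noesy_intensities(relaxation_matrix(with_enzyme, TAU_C, LARMOR_HZ, LEAKAGE), TAU_M)[0][1]

# Distance one would infer by scaling against the isolated-pair reference (r^-6 law).
r_apparent = 3.5 * (i12_ref / i12_obs) ** (1.0 / 6.0)

if __name__ == "__main__":
    print(f"I(1,2) ligand only : {i12_ref:.4f}")
    print(f"I(1,2) with enzyme : {i12_obs:.4f}")
    print(f"apparent distance  : {r_apparent:.2f} A (true 3.5 A)")
```

Under these assumptions the 1–2 cross peak is larger when the enzyme proton is included, so an analysis restricted to ligand protons would return an apparent distance shorter than the true 3.5 Å.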
It is noteworthy that, although the actual 1–3 and 1–2 distances are identical (= 3.5 Å), they show distinctly different behaviors in their tr-NOESY intensities due to interactions with the enzyme protons—the first pair shows the protein-leakage effects, while the second pair shows the protein-mediated spin-diffusion effects. This is a consequence of the relative arrangement of ligand and enzyme protons within the active site. The theoretical basis for the dramatic effects due to protein-mediated spin diffusion has been described in the literature (Jackson et al., 1995). For fast exchange, neglecting the effect of free-ligand relaxation, the first few terms of the ligand tr-NOESY can be written as [from Eq. (31)]
In the equation, is the relaxation rate matrix for the bound ligand (note that the diagonal terms also include terms of the type due to ligand–protein intermolecular dipolar relaxation). The matrix (and its transpose represents the ligand–protein cross-relaxation terms of the type The term that is linear in corresponds to the traditional initial slope. The second- and higher-order terms in contribute to direct and indirect effects. When the ligand is in high excess of the enzyme, as is the practice in traditional experiments, the It is seen from Eq. (54) that for the first indirect-pathway term (i.e., two-step transfer), the ligand-mediated spin-diffusion term is proportional to while a similar term due to enzyme-mediated spin diffusion is only proportional to Since is always for a high ratio, it is easily seen that the enzyme-mediated spin diffusion can be much more pronounced than the ligand-mediated spin diffusion for identical geometrical arrangement of the intervening protons from the ligand and the enzyme.
The physical basis for the differences in the efficiencies of ligand and protein-mediated spin-diffusion pathways has been discussed (Jackson et al., 1995). Given nearly identical geometrical arrangements, the enzyme-mediated spin-diffusion pathways are dominant over the ligand-mediated spin-diffusion pathways for small mixing times under high conditions. Because of the dependence on in the presence of strong protein-mediated effects the sensitivity of the tr-NOE experiment to ligand-mediated spin-diffusion effects can be recovered at least in part by employing somewhat lower ratios of ligand–enzyme rather than the high ratios traditionally employed [see Fig. 3 in Jackson et al. (1995)]. The protein-mediated spin-diffusion effects have been experimentally demonstrated by Arepalli et al. (1995) using ROESY (Fig. 8). These studies allowed them to revise their earlier conclusion in which, without consideration of protein-mediated effects, a significant conformational change was proposed for a disaccharide on binding to an antibody (Glaudemans et al., 1990). An examination of some published 1D
tr-NOE spectra in the literature clearly demonstrates large intensity changes in the background enzyme resonances, presumably due to an intermolecular NOE in the complexed state, when a “free” ligand resonance is saturated. These intensity changes and the associated influence of ligand–enzyme cross relaxation have been routinely neglected in many studies. It will be interesting to reexamine these earlier tr-NOE measurements.
4.3.2. Protein-Leakage Effects
Using the hypothetical ligand–enzyme system shown in Fig. 19, London et al. (1992) demonstrated the reduction in tr-NOE intensity due to protein-leakage effects. Figure 20B shows the effect of these interactions on the tr-NOESY of
protons 1 and 3 on the ligand as a function of exchange rate. At longer mixing times the intensity is lower than that predicted by the reference curve, due to protein-leakage effects. Notice that the initial growth portions in the first 25–50 ms are identical. However, because of poor signal/noise this short-mixing-time region will usually be characterized by low sensitivity, and one is forced to focus attention on data at mixing times longer than 50 ms. If the protein-induced leakage effects are not identified as such, an analysis using correct correlation times for the complex might mislead one to the erroneous conclusion that the distance between protons 1 and 3 has become larger in the bound state. Thus, protein-leakage effects
might result in somewhat expanded structures for the bound ligand if not properly treated. Alternatively, one might be misled into deducing shorter correlation times for the complex. Depending upon the specific geometrical arrangement of ligand protons in relation to receptor protons at the active site, the tr-NOESY time-course curves for different proton pairs on the ligand might experience either protein-mediated spin-diffusion effects (which affect the growth portions), or protein-leakage effects (which affect primarily the decay), or a combination, or none of these.
4.4. Ligand–Protein Intermolecular NOESY Intensity as a Function of Off-Rate
In Fig. 21 we have computed the intermolecular ligand–enzyme NOESY cross-peak intensity between the m1 methyl group of the ligand and the Ala 113 methyl group of the enzyme as a function of the off-rate. This intensity is a sum of four separate intensities (free ligand–free enzyme; bound ligand–bound enzyme; free ligand–bound enzyme; and bound ligand–free enzyme). The four contributions that would be observable if the exchange is slow on the chemical-shift scale are
shown in Fig. 22 as a function of and mixing time. Note that for fast-exchange rates, the free-ligand–free-enzyme cross peak develops intensity due to indirect pathways involving conformational exchange and cross relaxation in the bound
state (Curto et al., 1996; Moseley et al., 1995). The substantial nature of intermolecular ligand–enzyme NOESY contact is obvious from these figures. These results suggest that it may be feasible to model intermolecular contacts between two interacting species even when they are not tightly bound M).
4.5. Effect of Motions in the Protein–Ligand Complex on the Transferred NOESY
As a typical example of motions that can occur at the active site, we consider a fairly common situation associated with enzymes that undergo hinge-bending motions; i.e., upon ligand binding, the enzyme undergoes a conformational transition from an “open state” to a “closed state” in which the ligand securely occupies the active site. We have also made the simplifying assumption in this model that the only way the ligand could be released from the closed state is by passing through the open state. The relevant scheme is shown in Fig. 4. The conformations of the ligand and the enzyme could be different in all three states.
4.5.1. A Simplified Example
First, to demonstrate the relative influence of both (off-rate) and (hinge-bending rate) on the tr-NOESY, we chose the example of a ligand with two
protons (AX) separated by a distance of 5.5 Å and a rotational correlation time of
In the open state of the complex, it was assumed that there is no additional dipolar interaction, but now the ligand tumbles slowly with the correlation time of
the enzyme In the closed state, a proton (M) from the enzyme approaches the A and X protons on the ligand with equal distances of 2.75 Å. Thus, in this model, the ligand–enzyme cross relaxation takes place only in the closed state. A uniform leakage factor of was included in all three states. Figure 23 shows the results for a mixing time of 4 s. The intensity surface in this figure is a reflection of the model used here, and displays three well-defined
plateaus associated with conformational averaging. For large off-rates one plateau is located in the very-slow-hinge-bending-rate region where the effective relaxation rate is given by [Eq. (47)], while the second one is located in the fast-hinge-bending region where the effective generalized relaxation rate is given by [Eq. (45)]. A third plateau, described by [Eq. (48)], is seen for very slow off-rates and large rates. A fourth plateau, for very small corresponds to the trivial case of no conformational exchange. The enormous variations observed in the intensity surface underscore the importance of identifying the proper range of off-rates and hinge-bending rates. Moseley et al. (1995) have simulated the effect of hinge bending on an intraligand tr-NOE for a hypothetical thermolysin–leucine inhibitor model to demonstrate the importance of incorporating such motions in a rigorous tr-NOESY analysis.
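The linear three-state scheme used in this section (free ligand, open complex, closed complex, with release possible only through the open state) can be written as a small kinetic matrix whose null vector gives the equilibrium ligand populations. The sketch below is a generic illustration of that bookkeeping, not the calculation behind Fig. 23: all four rate constants are hypothetical, and the binding step is treated as pseudo-first-order (on-rate constant multiplied by an assumed free-enzyme concentration). In a full analysis a kinetic matrix of this kind would be combined with the relaxation matrices of the individual states; that step is not shown.

```python
def three_state_populations(k_on_eff, k_off, k_close, k_open):
    """Equilibrium ligand populations (free, open, closed) from detailed balance.

    k_on_eff : pseudo-first-order binding rate, k_on * [free enzyme] (s^-1)
    k_off    : dissociation rate from the open complex (s^-1)
    k_close  : open -> closed hinge-bending rate (s^-1)
    k_open   : closed -> open hinge-bending rate (s^-1)
    """
    w_free = 1.0
    w_open = k_on_eff / k_off             # p_open / p_free
    w_closed = w_open * k_close / k_open  # p_closed / p_free
    total = w_free + w_open + w_closed
    return (w_free / total, w_open / total, w_closed / total)


def kinetic_matrix(k_on_eff, k_off, k_close, k_open):
    """Rate matrix K for dp/dt = K p, with p = (free, open, closed).

    Each column sums to zero, so total population is conserved. There is no
    direct free <-> closed element: the closed state empties only via the open state.
    """
    return [
        [-k_on_eff, k_off, 0.0],
        [k_on_eff, -(k_off + k_close), k_open],
        [0.0, k_close, -k_open],
    ]


# Hypothetical rate constants (s^-1), chosen only for illustration.
K_ON_EFF, K_OFF, K_CLOSE, K_OPEN = 10.0, 1000.0, 500.0, 50.0

populations = three_state_populations(K_ON_EFF, K_OFF, K_CLOSE, K_OPEN)
k_matrix = kinetic_matrix(K_ON_EFF, K_OFF, K_CLOSE, K_OPEN)

# At equilibrium K p = 0; the residual checks the populations against the matrix.
residual = [sum(k_matrix[i][j] * populations[j] for j in range(3)) for i in range(3)]

if __name__ == "__main__":
    for name, p in zip(("free", "open", "closed"), populations):
        print(f"{name:>6}: {p:.4f}")
    print("max |K p| =", max(abs(v) for v in residual))
```

Scanning the off-rate and the hinge-bending rates over several orders of magnitude in such a model is one way to see which regime (fast or slow in each step) a given system falls into.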
5. EXPERIMENTAL EXAMPLES
Even though several studies have been published that employed tr-NOESY to determine the bound-ligand conformations, the number of examples actually employing a CORCEMA type of analysis in which the receptor protons and/or finite off-rates are explicitly considered is surprisingly small (Moseley et al., 1998, 1997; Casset et al., 1996, 1997; Rinnbauer et al., 1998; Ni et al., 1995, 1992; Ning et al., 1994). Some investigators have focused on the analysis of intraligand tr-NOE buildup curves under the fast-exchange assumption in terms of a relaxation rate matrix limited to the ligand protons only (Murali et al., 1997; Jarori et al., 1994; Bevilacqua et al., 1992). Distance-constrained refinement protocols such as distance geometry, restrained molecular dynamics, or restrained energy minimization methods have been used by several investigators to deduce the bound-ligand conformations (Hrabal et al., 1996; Schneider and Post, 1995; Fischer et al., 1995; Okada et al., 1994; Campbell and Sykes, 1991; Ni et al., 1990). Typically, the intraligand tr-NOESY intensities are classified as strong, medium, or weak, and typical distance constraints are set for use in a protocol that employs distance geometry or restrained molecular dynamics. The advantages and disadvantages of these methods over full relaxation rate matrix methods have already been considered in Sec. 3.5. The bound ligands were sometimes docked into the receptor binding pocket from a prior knowledge of intermolecular hydrogen-bond and/or salt bridge constraints (e.g., Schneider and Post, 1995; Ni et al., 1995) or intermolecular tr-NOESY contacts (Ramesh et al., 1996; Asensio et al., 1995a; Scherf et al., 1992). Use of direct methods such as MARDIGRAS has also been described in tr-NOESY analyses (Adams et al., 1997). In some instances, through an iterative distance geometry–full relaxation matrix back-calculation of tr-NOESY spectra (Ni et al., 1995) or a manual refinement with CORCEMA back-calculation (Moseley et al., 1998), the docked ligand structures were further refined, thereby correcting for protein-indirect effects. For carbohydrate ligands, a combination of energy maps for glycosidic linkages, molecular mechanics, and simulated annealing was typically used (Weimar et al., 1995; Asensio et al., 1995a, 1995b). In the following section, we will limit ourselves to some recent examples where ligand structure refinements explicitly incorporated protons from the receptor active site residues in a full relaxation rate matrix treatment.
5.1. Thrombin-Bound Structures of Human Fibrinopeptide Analogs
Ni and co-workers employed tr-NOESY to study the conformations of human fibrinopeptide A (FpA) and its analogs when bound to thrombin (Ni et al., 1995). FpA is a 16-residue peptide that is released from the chains of fibrinogen after proteolytic cleavage of the R16–G17 peptide bond by thrombin (Scheraga, 1986, 1983). The sequence specificity of thrombin–fibrinogen interactions has been studied in great detail, including the effect of naturally occurring mutations in fibrinopeptide A on this interaction. The mutation of G12 to V12 decreases the efficiency of thrombin-catalyzed cleavage of R16–G17 peptide bonds in peptides derived from the chains of fibrinogen Rouen (Lord et al., 1990; Ni et al., 1989).
An analog (P15–FpA) was synthesized in which Val 15 was replaced by Pro 15 to restrict conformational freedom and potentially enhance binding affinity to thrombin. A second analog (P15–FpA Rouen), in which Gly 12 was replaced by Val 12 to mimic the mutation associated with fibrinogen Rouen, was also synthesized. In solutions of these peptides with bovine thrombin (ligand:protein ratio 25:1), transferred NOE measurements at 25°C and pH 5.5 exhibited chemical exchange cross peaks between free and bound forms for the ligands, with the P15–FpA peptide showing a significantly slower off-rate than P15–FpA Rouen. The bound structures for the peptides were calculated using distance geometry methods with approximate distances derived from transferred NOESY spectra at various mixing times, in combination with an iterative distance and structure refinement by comparing spectra calculated using full relaxation matrix treatment with the corresponding experimental spectra at longer mixing times (100, 150, 200, and 400 ms). These structures were further refined by a distance-restrained and electrostatically driven Monte Carlo method. The refined structures were docked into the thrombin active site using the distance geometry program DGEOM with the aid of hydrogen bonds and ion-pair interactions observed between inhibitors and trypsin-like serine proteases (including thrombin) in several protease–inhibitor
complexes. During docking, the thrombin structure was fixed as in the crystal structure, and the conformations of the peptide residues were allowed to vary. For calculation of transferred NOE spectra using complete relaxation matrix analysis, a minimum set of nine residues (H43, D189, G193, D194, S195, S214, W215, G216, and G219) at the catalytic site of bovine thrombin was explicitly included
for the intermolecular docking constraints used in the procedure. Experimentally
measured selective longitudinal relaxation rates for resolved resonances of the peptide ligands were used to replace the calculated diagonal elements of the relaxation rate matrix (Ni, 1994; Perlman et al., 1994). The dissociation constants
(0.04 mM for P15–FpA and 0.4 mM for P15–FpA Rouen) were estimated from values. Leakage rates of for the free peptide and for thrombin and the complex were used in the simulations. During the course of the docking refinement, some of the intraligand proton distances, which otherwise would have been very short to account for observed NOEs without incorporating the enzyme protons, had to be increased to allow for intervening enzyme protons (and the
concomitant protein-mediated effects on the tr-NOEs). The comparison with experimental spectra was made by converting the calculated NOE intensities to two-dimensional FIDs and utilizing the assigned chemical shifts and estimated linewidths for all resolved peptide protons. Figure 24 shows typical experimental and calculated spectra, while the final docked structures of P15–FpA and P15–FpA Rouen are shown in Fig. 25. The final transferred NOE-based structures for the thrombin-bound FpA were found to be closely similar to the crystal structure of FpA in the noncovalent
complexes with bovine thrombin. These authors suggested that the binding of FpA
Rouen to thrombin may require a conformational rearrangement of thrombin residues Ile 174 and Glu 217 to accommodate the bulky side chain of Val 12. Such
a structural requirement could possibly be the reason for the reduced conformational stability for the thrombin complex with FpA Rouen. Other studies on thrombin-bound structures have been described earlier (Hrabal et al., 1996; Ning et al., 1994, 1992).
5.2. Studies on Blood Group A Trisaccharide Bound to Dolichos biflorus Lectin
Transferred NOESY has become a popular method to study the conformations of oligosaccharide antigens bound to antibodies (Arepalli et al., 1995; Bundle et al., 1994; Glaudemans et al., 1990) and lectins (Scheffler et al., 1997, 1995; Poppe et al., 1997; Casset et al., 1997, 1996; Asensio et al., 1995b; Cooke et al., 1994; Bevilacqua et al., 1992). The blood group antigens are important in blood transfusion and can be used as indicators of tissue differentiation and malignancy. The lectin Dolichos biflorus recognizes blood group A oligosaccharides through its unique specificity for GalNAc residues. Casset et al. (1996) have performed tr-NOESY and tr-ROESY measurements on the blood group A trisaccharide to determine the conformation of the minimal antigenic determinant when complexed to lectin Dolichos biflorus. Figure 26 shows the tr-NOESY and tr-ROESY spectra of the blood group A trisaccharide complexed with D. biflorus. The tr-NOESY spectrum identifies a number of new interglycosidic NOEs that are absent in the uncomplexed oligosaccharide. However, most of these new peaks in the tr-NOESY are due to either ligand- or protein-mediated indirect effects, and have been readily identified as such by the tr-ROESY experiment. To quantitatively test for the bound conformations of the blood group A oligosaccharide, Casset et al. generated two conformations that corresponded to the energy minima of two families of conformations, FamI and FamII, that described the conformations of the uncomplexed trisaccharide in solution. Specifically, these conformations were further energy minimized within the binding site as described by Imberty et al. (1994). The resulting energy-minimized structures, labeled CxI and CxII, were used in generating theoretical
tr-NOESY curves using the program PDB2NOE (Ni and Zhu, 1994). Thirteen
residues of the D. biflorus lectin-binding site, based on modeling studies (Imberty et al., 1994), were explicitly included to account for interactions with protein protons in the tr-NOESY simulations: Asp 85, Gly 102, Gly 103, Tyr 104, Leu 127, Ser 128, Asn 129, Ser 130, Trp 132, Gly 213, Leu 214, Ser 215, and Tyr 218. The rotational correlation time for the complex was given a value of 55 ns, as expected
for a protein of 110 kDa. The optimal values of and were deduced by fitting some theoretical intraresidue tr-NOEs to the corresponding experimental data. The methyl groups were assigned an internal correlation time of 25 ps. The experimental tr-NOESY data for interglycosidic contacts and the theoretical curves predicted for the CxI and CxII conformations are shown in Fig.
27. Casset et al. concluded that the CxI conformation gave a good agreement with the experimental data, though CxII could not be discarded entirely. Considering the many approximations involved in calculating the predicted tr-NOEs, in particular the orientation of the ligand within the binding pocket, the agreement between experiment and theory is encouraging, especially for interglycosidic NOEs such as H1GN–H3G and H1F–H3GN. A model proposed by Casset et al. (1996) for the blood group A trisaccharide bound to the D. biflorus lectin is shown in Fig. 28.
5.3. Transferred NOESY Studies on the Forssman Pentasaccharide Complexed to Dolichos biflorus
The Forssman antigen is a commonly occurring heterophile antigen and, together with some related glycolipids, represents the antigenic determinant of the P blood group system (Casset et al., 1997; Marcus et al., 1976). It is present in several forms of human cancer, which include gastric, colon, and lung cancers (Ono et al., 1994; Uemura et al., 1989). To understand the nature of carbohydrate–receptor interactions, Thomas Peters and co-workers have undertaken extensive transferred NOESY characterization of the interaction of the Forssman pentasaccharide with the seed lectin from Dolichos biflorus, a 110-kDa tetramer with hemagglutinating properties, with two carbohydrate-binding sites per tetramer. The structure of the Forssman pentasaccharide (FPS) is shown in Fig. 29. A series of tr-NOESY and tr-ROESY measurements were performed for D. biflorus:FPS ratios of 1:5, 1:10, and 1:15. The evolution of the magnitude of the tr-NOEs and tr-ROEs as a function of the lectin/FPS ratio was found to be different for the nonreducing disaccharide moiety compared to the reducing end trisaccharide of the FPS, thus reflecting distinct relaxation and exchange properties for the disaccharide moiety. This was further confirmed by and filtered tr-NOESY, which identified several intermolecular contacts between the disaccharide and side-chain protons of the aromatic and aliphatic residues within the binding pocket of the lectin. Figure 13 shows typical and filtered tr-NOESY spectra identifying several intermolecular tr-NOESY peaks in the D. biflorus–FPS complex. Previously, a model for the disaccharide complexed within the binding pocket of D. biflorus was proposed by Imberty et al. (1994). The experimentally observed intermolecular tr-NOEs were found to be qualitatively in good agreement with the predictions based on this model. Based on this, Casset et al.
proposed a model for the bound conformation of the FPS in which the nonreducing end disaccharide is buried in the lectin-binding pocket with the
remaining trisaccharide moiety pointing away. Figure 30 shows their proposed model for the D. biflorus–FPS complex.
Rinnbauer et al. (1998) undertook a more quantitative analysis of some of the intraligand tr-NOEs measured at 500 MHz and 315 K, using the CORCEMA
program (Moseley et al., 1995). These experimental conditions corresponded to null NOE for the free ligand, yielding a correlation time of 0.21 ns. For the complex, a correlation time of 50 ns was chosen. The calculations employed the coordinates for 32 residues that constitute part of the binding pocket of D. biflorus and the coordinates for the pentasaccharide in the proposed model. The off-rate and the dissociation constant were determined first by iteratively optimizing calculated intraglycosidic tr-NOEs with respect to the corresponding experimental values until a minimum R-factor was obtained. These values were used for the remaining calculations. The results of the CORCEMA calculations are shown in Fig. 31 for some intraglycosidic tr-NOEs, and in Fig. 32 for some of the interglycosidic NOEs. In general, the agreement seems to be quite satisfactory. On the other hand, since the nonreducing end disaccharide is also in close contact with some aromatic and aliphatic side chains in the binding pocket of the protein, some transferred NOEs can be expected to be very sensitive to the relative location of these protein protons in relation to the ligand, and the concomitant protein-mediated spin-diffusion and protein leakage effects (Jackson et al., 1995). Not surprisingly, some intraglycosidic and interglycosidic tr-NOEs involving predominantly
the terminal disaccharide also show poor fits (Fig. 33), reflecting the need for a further optimization of the orientation and/or conformation of the pentasaccharide
in the binding pocket. Because of the lack of specific resonance assignments for the protein residues that contributed to the observed intermolecular tr-NOEs, such an optimization is not trivial, although possible at least in principle (Curto et al., 1996).
5.4. Interaction of Sialyl Tetrasaccharide with E-selectin
The conformation of sialyl tetrasaccharide when bound to E-selectin has been the subject of at least five transferred NOE analyses, attesting to the
importance of this system (Scheffler et al., 1997, 1995; Poppe et al., 1997; Cooke et al., 1994; Hansley et al., 1994). E-selectin is a membrane glycoprotein that belongs to the selectin family (E-, P-, and L-selectins). It is expressed on endothelial cells and plays an important role in inflammation (Lasky, 1992; Bevilacqua et al., 1989). E-selectin specifically binds the sialyl antigen which is present
on neutrophilic granulocytes (Scheffler et al., 1995; Lasky, 1992; Bevilacqua et al., 1989). The structure of the tetrasaccharide is shown in Fig. 34. In solution, the free tetrasaccharide exists in equilibrium among several conformers (Rutherford et al., 1994). An understanding of the bound conformation of the antigen is of considerable interest in unraveling the molecular basis of recognition between the antigen and its receptor, E-selectin. Whereas the analyses in the earlier papers were qualitative in nature, Poppe et al. (1997) undertook a more quantitative analysis based on a full relaxation matrix treatment of the ligand. Based on published hydrodynamic data, the E-selectin protein was represented as a prolate ellipsoid, and the relaxation matrix elements were computed using Woessner’s expressions. [Incidentally, Eq. (5), given by Poppe et al. in their paper for computing the relaxation rate matrix elements of the bound ligand in a prolate ellipsoid, refers to the extreme narrowing limit and, hence, is not applicable for the bound state. The correct expressions (with frequency dependence) for relaxation rate matrix calculations applicable for large symmetric-top molecules were given earlier by our laboratory (Krishna et al., 1978)]. Poppe et al. also did not explicitly include protein protons in their calculations since a saturation of the protein envelope in the aromatic and aliphatic regions by a DANTE sequence during the mixing time period
did not appreciably affect the transferred NOE intensities. The bound conformation of the branched trisaccharide portion was found to be close to that of the free ligand. In a more recent unpublished study, Peters and his co-workers at the University of Lübeck undertook careful and detailed tr-NOESY measurements at 600 MHz (310 K) on the tetrasaccharide with the E-selectin IgG-chimera (220 kDa), using a ratio of 15:1 for the ligand:binding sites. In a collaboration, our laboratories analyzed the data quantitatively using the CORCEMA program, explicitly incorporating the protein protons within the active site (and excluding all exchangeable hydrogens). These calculations were aided by the availability of crystallographic data on the unliganded E-selectin (Graves et al., 1994) as well as crystallographic data on the ligand complexed to the mannose binding protein (MBP) mutant (Ng and Weis, 1997). Using the crystal structure of MBP in its complex with the ligand, we manually aligned the E-selectin backbone based on homology in the binding pocket. Based on a comparison of the loop conformation for residues 84–88 in E-selectin, which are in register with residues 189–193 in MBP, we suggest that this loop, which is extended in the unliganded E-selectin, moves toward the ligand in the complex. For CORCEMA calculations, three receptor models were used: the first (R1) was identical to the unliganded E-selectin; the second (R2) had the 84–88 loop bent to approximate that in the MBP, but with the Arg 84 side chain manually positioned to be close to the fucose; and the third (R3) had no protein protons. These receptor structures were used without any energy minimization, and only residues within a 7-Å radius from the ligand protons were included in the CORCEMA calculations. For the bound ligand, four structures were used. The first one (L1) was the crystallographic structure of Ng and Weis (1997). In this structure, the electron density for the G to N linkage was very weak and indicative of considerable conformational flexibility for this interglycosidic linkage in the complex. A second structure (L2) was obtained by a further manual optimization of the torsion angles of L1, and two more structures (L3, L4) for the tetrasaccharide were generated based
on a preliminary analysis of tr-NOESY data (without a total relaxation matrix treatment) and on modeling calculations. These structures were generated by docking the tetrasaccharide into the unliganded E-selectin structure using GRID and SYBYL, with the TRIPOS force field for the sugars, and Pullman charge calculations (Thomas Peters, private communication). For CORCEMA refinements, the correlation times (bound and free), the leakage factor for the free ligand, and the order parameter were optimized to get the best fit (lowest NOE R-factor) between experimental and calculated values for the six ligand–receptor models: L1:R1, L1:R3, L2:R2, L2:R3, L3:R1, and L4:R1. This optimization also used a leakage-shell model (Moseley et al., 1997) to account for leakage due to dipolar relaxation of active site protons by the rest of the protein protons. Each NOE (both calculated and experimental) was normalized with respect to a sum of some reference NOEs (e.g., intrasugar NOEs such as H4F–H6F). This kind of normalization results in the unfortunate artifact that any effect (e.g., protein-mediated spin diffusion) that significantly affects a few of these reference NOEs also indirectly affects the fits for the rest of the tr-NOEs, including those involving remote protons (see Sec. 3.5.3). This fact has to be kept in mind while drawing conclusions based on a comparison of calculated and experimental tr-NOESY curves. The off-rate for this system is sufficiently slow that it is safer to carry out the full CORCEMA treatment, including finite exchange off-rates, instead of assuming the fast-exchange approximation. In fact, the computed off-rate for this example is nearly four orders of magnitude smaller than the value used by some investigators while trying to justify the fast-exchange assumption. Despite these moderately slow off-rate conditions, there will still be a significant amount of transferred NOE to the signal associated with the free ligand. In our computations, we combined intensities from all contributions (Choe et al., 1991; Moseley et al., 1995). For CORCEMA optimizations, a data set consisting of 16 tr-NOEs was used. Figure 35 shows some of the interglycosidic tr-NOESY fits using the L2:R2 complex shown in Fig. 36. For the optimizations, two parameters were held fixed, based on independent estimates (Poppe et al., 1997; Thomas Peters, unpublished results, 1996). The L2:R2 structure resulted in the best NOE R-factor. From the 5F-2G tr-NOE fit, it is apparent that additional minor refinements may be necessary. Interestingly, deletion of the protein protons (i.e., the L2:R3 complex) resulted in only a slight increase of the R-factor, suggesting that the protein protons play a marginally important role in the ligand tr-NOESY spectra, with only some NOEs showing the effect. The remaining complexes (L1:R1, L1:R3, L3:R1, and L4:R1) gave higher R-factors, suggesting that they (and in particular the ligand conformations) are less compatible with the experimental data. More recent calculations, in which the off-rate, bound correlation time, order parameters, and leakage factors were optimized, will be reported elsewhere (Moseley et al., 1999).
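The normalization artifact described above can be illustrated with a small numerical sketch. The function name, the labels, and the intensity values below are hypothetical and chosen only for illustration; they are not taken from the E-selectin data set. Each NOE is divided by the sum of a set of reference NOEs, so a perturbation that reduces one reference NOE rescales every normalized value, including that of a remote pair whose own raw intensity did not change.

```python
def normalize_noes(intensities, reference_keys):
    """Divide every NOE intensity by the sum of the chosen reference NOEs."""
    ref_sum = sum(intensities[key] for key in reference_keys)
    if ref_sum == 0:
        raise ValueError("reference NOEs sum to zero")
    return {key: value / ref_sum for key, value in intensities.items()}


# Hypothetical raw intensities in arbitrary units (illustrative labels only).
raw = {"ref_a": 4.0, "ref_b": 6.0, "remote": 1.0}

# Same data, except that one reference NOE has been reduced, as might happen
# through protein-mediated spin diffusion. The "remote" NOE itself is unchanged.
perturbed = dict(raw, ref_a=2.0)

references = ["ref_a", "ref_b"]
norm_raw = normalize_noes(raw, references)
norm_perturbed = normalize_noes(perturbed, references)

# The normalized "remote" NOE shifts even though its raw intensity did not.
print(norm_raw["remote"], norm_perturbed["remote"])
```

In this toy case the normalized remote NOE changes by 25% purely because a reference NOE changed, which is the indirect effect the text warns about.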
5.5. Reversible Binding of Corepressor Tryptophan with Repressor–Operator Complex
As a last example, we present the CORCEMA analysis of intermolecular transferred NOESY in a ligand–protein/DNA complex. It is also the first quantitative analysis of intermolecular transferred NOESY. The E. coli Trp-repressor is a DNA-binding protein important in gene regulation. The apo-repressor is a homodimer of two 107-residue monomers. The holo-repressor has two L-tryptophan corepressor molecules, which bind in a noncooperative manner to two specific binding pockets in the dimer interface. Two Trp-repressors can bind in a tandem fashion to operator sequences (2:1 stoichiometry) of 33 base pairs or longer (Kumamoto et al., 1987). The minimal operator is an 18-base-pair consensus sequence which binds the Trp-repressor with a 1:1 stoichiometry (Haran et al., 1992; Bennett and Yanofsky, 1978). Crystal structures of the apo- and holo-repressors and holo-repressor–operator complexes are available, as are the NMR structures of the apo-repressor–operator complex (Zhao et al., 1993; Lawson and Carey, 1993; Arrowsmith et al., 1991; Otwinowski et al., 1988). Lee et al. (1995) have assigned the intermolecular tr-NOESY contacts between the Trp-repressor–operator (Trp-op/rep) complex and the corepressor tryptophan.
Figure 14 shows the typical NOESY spectrum with the peaks identifying the intermolecular contacts between the free and bound forms of tryptophan and the corepressor-bound repressor–operator complex. The concentration of unbound operator–repressor complex is negligible under the conditions of the experiment. The chemical shifts of the residual unbound op/rep complex also coincide with those of the repressor-bound form. The peaks between the free corepressor and the bound complex arise from exchange-mediated NOESY (Lee and Krishna, 1992; Choe et al., 1991).
5.5.1. The Leakage-Shell Model
Moseley et al. (1997) analyzed the inter-tr-NOESY data by CORCEMA, using a two-state model consisting of free and bound conformations for the interacting species (i.e., the corepressor and the op/rep complex). The crystallographic structure for the complex (Otwinowski et al., 1988) was used to generate the bound state with the corepressor and the residues in and around the binding pocket. The free state consisted of the corepressor and the binding pocket in their uncomplexed state, but with the same conformations as in the complexed state. Three intense intermolecular cross peaks were analyzed: (A) between a free corepressor proton and methyl protons (free and bound), and (B) and (C) between bound corepressor protons and methyl protons (free and bound). To account for direct and indirect effects, the binding pocket explicitly included all hydrogens within a radius r from each participating hydrogen (or the attached carbon for methyl protons), as shown by four overlapping spheres in Fig. 37. All hydrogens in the sphere were given a uniform leakage factor. Hydrogens in the outer 1-Å shell of the overlapping spheres were given their own uniform leakage factor to simulate leakage pathways for the shell hydrogens due to dipolar interactions with the rest of the protons outside the shell. This way, the dimension of the CORCEMA matrix was kept relatively small and manageable while accurately accounting for the specific effects of each individual proton within the binding pocket (i.e., protein-mediated spin-diffusion and leakage effects), as well as the general, somewhat nonspecific leakage effects due to the remaining protons outside the shell. One further parameter was held at a fixed value. To account for internal motions due to methyl group rotation, the Lipari–Szabo model-free approach was used, with an order parameter defined for each internuclear vector i–j. The internal correlation time was fixed at 0.005 ns. The order parameter for nonmethyl protons was fixed at 0.85.
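Two ingredients of the model described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CORCEMA implementation. The first function assigns hydrogens to the inner region or to the outer 1-Å shell of a union of spheres; the shell is taken here to mean "inside some sphere of radius r but outside every sphere of radius r minus the shell width", which is an assumed reading of the geometry. The second function is the standard Lipari–Szabo spectral density for isotropic overall tumbling (an assumption; the text does not state the tumbling model used for this complex), with the quoted 0.85 treated as a squared order parameter. All names and coordinates are hypothetical.

```python
import math


def classify_hydrogens(hydrogens, centers, radius, shell_width=1.0):
    """Label each hydrogen 'core', 'shell', or None (outside every sphere).

    hydrogens: dict mapping a name to (x, y, z) coordinates in angstroms.
    centers:   sphere centres, i.e. the participating hydrogens (or the
               attached carbon for methyl groups).
    """
    labels = {}
    for name, position in hydrogens.items():
        d_min = min(math.dist(position, centre) for centre in centers)
        if d_min > radius:
            labels[name] = None          # not part of the explicit pocket
        elif d_min > radius - shell_width:
            labels[name] = "shell"       # outer layer, gets the shell leakage
        else:
            labels[name] = "core"        # inner region, gets the sphere leakage
    return labels


def lipari_szabo_j(omega, tau_c, s2, tau_e):
    """Model-free spectral density J(omega) for isotropic overall tumbling.

    tau_c: overall correlation time, tau_e: internal correlation time (s),
    s2: squared generalized order parameter, omega: angular frequency (rad/s).
    """
    tau = 1.0 / (1.0 / tau_c + 1.0 / tau_e)

    def lorentzian(t):
        return t / (1.0 + (omega * t) ** 2)

    return 0.4 * (s2 * lorentzian(tau_c) + (1.0 - s2) * lorentzian(tau))


# Hypothetical coordinates: two overlapping 5-angstrom spheres.
centers = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
hydrogens = {"Ha": (1.0, 0.0, 0.0), "Hb": (8.5, 0.0, 0.0), "Hc": (12.0, 0.0, 0.0)}
labels = classify_hydrogens(hydrogens, centers, radius=5.0)

# Internal correlation time of 0.005 ns and order parameter 0.85 as quoted
# in the text; 13.5 ns is the bound correlation time reported later on.
j_zero = lipari_szabo_j(0.0, 13.5e-9, 0.85, 0.005e-9)
print(labels, j_zero)
```

Keeping the classification separate from the relaxation calculation makes it easy to repeat a fit for several sphere radii, with or without the shell.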
The correlation time for the free ligand, the correlation time for the bound ligand and the complex, the external methyl S, and the leakage factor for the free ligand were optimized by Powell minimization to get the best fit between CORCEMA-calculated NOESY intensities and experimental intensities. An NOE R-factor (Krishna et al., 1978; Xu et al., 1995a, 1995b) was used as the target function to be minimized by the Powell minimization.
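The fitting loop can be sketched as follows. Several things here are assumptions rather than the authors' procedure: the R-factor is written in one commonly used root-mean-square form, since its defining equation is not reproduced in this passage; a simple coordinate pattern search stands in for Powell's method (both are derivative-free, but they are different algorithms); and the two-parameter buildup function is a toy model standing in for a full relaxation-matrix calculation. All function names are hypothetical.

```python
import math


def noe_r_factor(experimental, calculated):
    """Assumed form: sqrt(sum((exp - calc)^2) / sum(exp^2))."""
    numerator = sum((e - c) ** 2 for e, c in zip(experimental, calculated))
    denominator = sum(e ** 2 for e in experimental)
    return math.sqrt(numerator / denominator)


def pattern_search(objective, start, step=0.5, tol=1e-6, max_sweeps=100000):
    """Derivative-free coordinate search (a stand-in for Powell minimization)."""
    x = list(start)
    fx = objective(x)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                f_trial = objective(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    improved = True
                    break
        if not improved:
            step *= 0.5            # refine the mesh when no move helps
            if step < tol:
                break
    return x, fx


def toy_buildup(params, times):
    """Toy two-parameter buildup curve, not a relaxation-matrix model."""
    amplitude, rate = params
    return [amplitude * (1.0 - math.exp(-rate * t)) for t in times]


times = [0.1, 0.2, 0.4, 0.8]                       # hypothetical mixing times
experimental = toy_buildup((1.0, 2.0), times)      # synthetic "data"


def objective(params):
    return noe_r_factor(experimental, toy_buildup(params, times))


start = (0.5, 1.0)
best, r_best = pattern_search(objective, start)
print(best, r_best)
```

In practice the objective would wrap the full intensity calculation, and each evaluation would be far more expensive than in this toy, which is one reason a derivative-free minimizer is a natural choice.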
The calculations were performed with active site sphere radii set to 5, 6, 7, and 8 Å, with and without a leakage shell. Figure 4 and Table 1 in Moseley et al. (1997) show the results of the optimization that gives the best fit. Also shown are the effects of varying the exchange off-rate and the sensitivity of the inter-tr-NOESY to its variations. An off-rate and a bound correlation time of 13.5 ns were determined from the best-fit optimizations. The off-rate is in excellent agreement with the value determined by direct measurements (Lee et al., 1995). The bound correlation time of 13.5 ns at 45°C, determined by Moseley et al. from CORCEMA analysis, is in excellent agreement with the value of 14.5 ns at 37°C reported by Shan et al. (1996). This demonstrates the power of the CORCEMA method. Figure 38 demonstrates the sensitivity of the inter-tr-NOESY to changes in the orientation of the corepressor within the binding pocket. A change of +20° in the corepressor orientation angle (i.e., from 92°
to 112°) gives dramatically bad fits between experiment and theory, and can be rejected outright. In contrast, a change by –20° (i.e., from 92° to 72°) results in an acceptable fit as well as an acceptable R-factor (changing from 0.142 for the crystallographic orientation to 0.154). Nevertheless, this orientation resulted in somewhat unacceptable values for optimized parameters such as the bound correlation time (9.79 ns) and the external methyl order parameter. These CORCEMA calculations confirm that the intermolecular tr-NOESY data are compatible with the crystal structure orientation for the corepressor within the binding pocket. The intermolecular NOESY data were also analyzed by Ramesh et al. (1996), using another method. They calculated approximate intermolecular distance constraints from a NOESY spectrum at 50 ms and used these constraints together with distance geometry–simulated annealing refinements to model the orientation of the corepressor within the binding pocket. These calculations also confirmed that the solution structure of the corepressor is generally similar to that in the crystal structure. Because of the difficulty in making stereospecific assignments, a slightly altered orientation for the aromatic ring was also found to be compatible by distance geometry methods. Because an alteration in the corepressor orientation also alters the spin-diffusion pathways, which may be sensed by the full-mixing-time curves for the different NOEs, the CORCEMA method has the potential to provide a more sensitive probe of the ligand orientation within the binding pocket. Moseley et al. also pointed out that it is relatively easy to get good fits between experimental tr-NOEs and calculated tr-NOEs by optimizing parameters using some models. Whether such good fits are meaningful or not can only be judged by comparing the values of the optimized parameters (e.g., bound correlation time, off-rate, etc.) with their estimates from independent measurements.
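The closing point, that a good fit is meaningful only if the optimized parameters are physically plausible, can be expressed as a simple two-stage filter. The sketch below uses the R-factors quoted above (0.142 and 0.154) and the independent estimate of 14.5 ns from Shan et al. (1996). Reading the 9.79 ns value as the bound correlation time is an assumption, as are the R-factor cutoff and the 25% tolerance, which are hypothetical thresholds chosen only to illustrate the idea. Note also that the independent estimate was measured at 37°C and the CORCEMA value at 45°C, so exact agreement should not be expected.

```python
def is_plausible(fitted, independent, rel_tol):
    """True when the fitted value lies within rel_tol of the independent one."""
    return abs(fitted - independent) <= rel_tol * independent


# R-factors and correlation times as quoted in the text; labels are descriptive.
models = [
    {"label": "92 deg (crystallographic)", "r_factor": 0.142, "tau_bound_ns": 13.5},
    {"label": "72 deg", "r_factor": 0.154, "tau_bound_ns": 9.79},
]

INDEPENDENT_TAU_NS = 14.5   # Shan et al. (1996), measured at 37 degrees C
R_FACTOR_CUTOFF = 0.2       # hypothetical acceptance threshold
REL_TOL = 0.25              # hypothetical tolerance on the correlation time

good_fit = [m["label"] for m in models if m["r_factor"] <= R_FACTOR_CUTOFF]
accepted = [
    m["label"]
    for m in models
    if m["r_factor"] <= R_FACTOR_CUTOFF
    and is_plausible(m["tau_bound_ns"], INDEPENDENT_TAU_NS, REL_TOL)
]
print(good_fit, accepted)
```

Both orientations pass on R-factor alone, but only the crystallographic one survives the parameter check under these assumed thresholds, which mirrors the argument in the text.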
6. FINAL COMMENTS
In this chapter, we have summarized the CORCEMA methodology along with some experimental examples where this and other similar methods have been successfully employed. The CORCEMA algorithm should prove to be useful in the NOESY analysis of interacting molecules under a wide range of conditions, from very weak to very tight binding. Quantitative analysis of transferred NOESY is one major application. In addition to a discussion on the quantitative determination of bound-ligand structures, we have also considered the possibility of exploiting tr-NOESY in structure-based design by an explicit incorporation of active site residues and the associated effects (i.e., protein-mediated spin diffusion, leakage effects, and intermolecular tr-NOEs) in CORCEMA analysis. Recent reports such as tr-NOESY-based screening of compound libraries for biological activity (Meyer et al., 1997) and the SAR-by-NMR method for designing higher-affinity ligands (Shuker et al., 1996), underscore the increasingly important role of high-field NMR
spectroscopy in serious structure-based drug design efforts. In this context, CORCEMA and other similar algorithms for analyzing tr-NOESY data can play a major role in the arsenal of tools available to investigators involved in such efforts. Together with computational advances in structure refinement protocols and with experimental advances, CORCEMA and similar algorithms make the transferred NOESY technique a powerful tool for structure-based drug design directly in the solution phase.
ACKNOWLEDGMENTS. This work was supported in part by NSF Grant MCB9630775, NCI Grant CA-13148, and the Arthritis Foundation. The authors wish to thank Drs. Jacob Anglister, Cheryl Arrowsmith, Ad Bax, Anne Imberty, Robert London, Thomas Peters, and Tali Scherf for supplying the originals of some figures used in this chapter, and Drs. Robert London, Feng Ni, and Thomas Peters for sending preprints of manuscripts prior to publication. Figures 35 and 36, showing CORCEMA calculations on the tetrasaccharide–E-selectin system, are from a collaboration with the laboratory of Dr. Thomas Peters at the University of Lübeck, Germany. The authors also thank Dr. Peters for his comments on this article. Stimulating discussions with Drs. Ernie Curto and Patricia Jackson during the early stages of this work are also acknowledged.
REFERENCES
Adams, E. R., Dratz, E. A., Gizachew, D., Deleo, F. R., Yu, L., Volpp, B. D., Vlases, M., Jesaitis, A. J., and Quinn, M. T., 1997, Biochem. J. 325:249.
Albrand, J. P., Birdsall, B., Feeney, J., Roberts, G. C. K., and Burgen, A. S. V., 1979, Int. J. Biol. Macromol. 1:37.
Alexandrescu, A. T., Hinck, A. P., and Markley, J. L., 1990, Biochemistry 29:4516.
Andersen, N. H., Eaton, H. L., and Nguyen, K. T., 1987, Magn. Reson. Chem. 25:1025.
Anfinsen, C., 1973, Science 181:223.
Anglister, J., Scherf, T., Zilber, B., and Levy, R., 1995, Biopolymers 37:383.
Anglister, J., Scherf, T., Zilber, B., Levy, R., Zvi, A., Hiller, R., and Feigelson, D., 1993, FASEB J. 7:1154.
Anglister, J., and Zilber, B., 1990, Biochemistry 29:921.
Arepalli, S. R., Glaudemans, C. P. J., Daves, Jr., G. D., Kovac, P., and Bax, A., 1995, J. Magn. Reson. B106:195.
Arrowsmith, C. H., Pachter, R., Altman, R., and Jardetzky, O., 1991, Eur. J. Biochem. 202:53–66.
Asensio, J. L., Cañada, F. J., Bruix, M., Rodriguez-Romero, A., and Jimenez-Barbero, J., 1995a, Eur. J. Biochem. 230:621.
Asensio, J. L., Cañada, F. J., and Jimenez-Barbero, J., 1995b, Eur. J. Biochem. 233:618.
Baleja, J. D., Mau, T., and Wagner, G., 1994, Biochemistry 33:3071.
Baleja, J. D., Pon, R. T., and Sykes, B. D., 1990, Biochemistry 29:4828–4839.
Bax, A., and Davis, D. G., 1985, J. Magn. Reson. 63:207.
Behling, R. W., Yamane, T., Navon, G., and Jelinski, L. W., 1988, Proc. Natl. Acad. Sci. U.S.A. 85:6721.
Bennett, W. S., Jr., and Steitz, T. A., 1978, Proc. Natl. Acad. Sci. U.S.A. 75:4848.
Bennett, G. N., and Yanofsky, C., 1978, J. Mol. Biol. 121:179–192.
Bernasconi, C. F., Ed., 1986, Investigation of Rates and Mechanisms of Reactions, Techniques of Chemistry, Vol. VI, Part I, Wiley-Interscience, Chaps. IV and VI.
Bersch, B., Koehl, P., Nakatani, Y., Ourisson, G., and Milon, A., 1993, J. Biomol. NMR 3:443.
Bevilacqua, V. L., Kim, Y., and Prestegard, J. H., 1992, Biochemistry 31:9339.
Bevilacqua, M. P., Stengelin, S., Gimbrone, M. A., Jr., and Seed, B., 1989, Science 243:1160.
Blommers, M. J. J., Fendrich, G., García-Echeverría, C., and Chêne, P., 1997, J. Am. Chem. Soc. 119:3425.
Boelens, R., Konig, T. M. G., and Kaptein, R., 1988, J. Mol. Struct. 173:299.
Bonvin, A. M. J., Boelens, R., and Kaptein, R., 1994, Biopolymers 34:39.
Borgias, B. A., and James, T. L., 1988, J. Magn. Reson. 79:493.
Borgias, B. A., and James, T. L., 1989, Meth. Enzymol. 176:169.
Borgias, B. A., and James, T. L., 1990, J. Magn. Reson. 87:475.
Bothner-By, A. A., Stephens, R. L., Lee, J., Warren, C. D., and Jeanloz, R. W., 1984, J. Am. Chem. Soc. 106:811.
Boyd, J., Moore, G. R., and Williams, G., 1984, J. Magn. Reson. 58:511.
Braun, W., 1987, Q. Rev. Biophys. 19:115–157.
Brunger, A. T., 1992, X-PLOR, Version 3.1, Yale University Press, New Haven.
Bundle, D. R., Baumann, H., Brisson, J. R., Gagné, S. M., Zdanov, A., and Cygler, M., 1994, Biochemistry 33:5183.
Campbell, I. D., Dobson, C. M., Moore, G. R., Perkins, S. J., and Williams, R. J. P., 1976, FEBS Lett. 70:96–100.
Campbell, A. P., and Sykes, B. D., 1991, J. Mol. Biol. 222:405.
Campbell, A. P., Van Eyk, J. E., Hodges, R. S., and Sykes, B. D., 1992, Biochim. Biophys. Acta 1160:35.
Casset, F., Imberty, A., Perez, S., Etzler, M. E., Paulsen, H., and Peters, T., 1997, Eur. J. Biochem. 244:242.
Casset, F., Peters, T., Etzler, M., Korchangina, E., Nifant’ev, N., Pérez, S., and Imberty, A., 1996, Eur. J. Biochem. 239:710.
Chen, Y., Reizer, J., Saier, M., Jr., Fairbrother, W. J., and Wright, P. E., 1993, Biochemistry 32:32.
Choe, B. Y., Cook, G. W., and Krishna, N. R., 1991, J. Magn. Reson. 94:387.
Clore, G. M., Bax, A., Wingfield, P., and Gronenborn, A. M., 1990a, Biochemistry 29:5671.
Clore, G. M., and Gronenborn, A. M., 1982, J. Magn. Reson. 48:402.
Clore, G. M., and Gronenborn, A. M., 1983, J. Magn. Reson. 53:423.
Clore, G. M., and Gronenborn, A. M., 1991, Prog. NMR Spectrosc. 23:43.
Clore, G. M., Nilges, M., Sukumaran, D. K., Brunger, A. T., Karplus, M., and Gronenborn, A. M., 1986, EMBO J. 5:2729.
Clore, G. M., Szabo, A., Bax, A., Kay, L. E., Driscoll, P. C., and Gronenborn, A. M., 1990b, J. Am. Chem. Soc. 112:4989.
Cooke, R. M., Hale, R. S., Lister, S. G., Shah, G., and Malcolm, P., 1994, Biochemistry 33:10591.
Crenshaw, J. M., Graves, D. E., and Denny, W. A., 1995, Biochemistry 34:13682–13687.
Curto, E. V., Moseley, H. N. B., and Krishna, N. R., 1996, J. Comput.-Aided Mol. Design 10:361–371.
Czaplicki, J., Arrowsmith, C., and Jardetzky, O., 1991, J. Biomol. NMR 1:349–361.
Dekker, N., Cox, M., Boelens, R., Verrijzer, C. P., van der Vliet, P. C., and Kaptein, R., 1993, Nature 362:852.
Dellwo, M. J., Schneider, D. M., and Wand, A. J., 1994, J. Magn. Reson. B103:1.
Dobson, C. M., and Evans, P. A., 1984, Biochemistry 23:4267.
Dratz, E. A., Gizachew, D., Busse, S. C., Rens-Domiano, S., and Hamm, H. E., 1996, Biophys. J. 70:A16.
Driscoll, P. C., Gronenborn, A. M., Wingfield, P. T., and Clore, G. M., 1990, Biochemistry 29:4668.
Ealick, S. E., Babu, Y., Bugg, C. E., Erion, M., Guida, W., Montgomery, J. A., and Secrist, J. A. III, 1991, Proc. Natl. Acad. Sci. U.S.A. 88:11540.
Ernst, R. R., Bodenhausen, G., and Wokaun, A., 1987, Principles of Nuclear Magnetic Resonance in One and Two Dimensions, Clarendon Press, Oxford.
Feigon, J., Wang, A. H. J., van der Marel, G. A., van Boom, J. H., and Rich, A., 1984, Nucleic Acids Res. 12:1243.
Fejzo, J., Westler, W., Macura, S., and Markley, J. L., 1991, J. Magn. Reson. 92:195.
Fejzo, J., Westler, W., Markley, J. L., and Macura, S., 1992, J. Am. Chem. Soc. 114:1523.
Fesik, S. W., 1993, J. Biomol. NMR 3:261–269.
Fischer, E., 1894, Ber. Deutsch Chem. Ges. 27:2985.
Fischer, A., Laub, P. B., and Cooperman, B. S., 1995, Nature Struct. Biol. 2:951.
Folkers, P. J. M., Folmer, R. H. A., Konings, R. N. H., and Hilbers, C. W., 1993, J. Am. Chem. Soc. 115:3798.
Gemmecker, G., Olejniczak, E. T., and Fesik, S. W., 1992, J. Magn. Reson. 96:199.
Glasel, J. A., 1989, J. Mol. Biol. 209:747.
Glaudemans, C. P. J., Lerner, L. E., Daves, D. G., Jr., Kovac, P., Venable, R., and Bax, A., 1990, Biochemistry 29:906.
Gordon, S. L., and Wüthrich, K., 1978, J. Am. Chem. Soc. 100:7094.
Gorenstein, D. G., Meadows, R. P., Metz, J. T., Nikonowicz, E. P., and Post, C. B., 1990, in Advances in Biophysical Chemistry (C. A. Bush, ed.), JAI Press, London, pp. 47–124.
Gounarides, J. S., Broido, M. S., Becker, J. M., and Naider, F. R., 1993, Biochemistry 32:908.
Graves, B. J., Crowther, R. L., Chandran, Ch., Rumberger, J. M., Li, S., Huang, K. S., Presky, D. H., Familletti, P. C., Wolitzky, B. A., and Burns, D. K., 1994, Nature 367:532.
Guntert, P., Braun, W., and Wüthrich, K., 1991, J. Mol. Biol. 217:517.
Gupta, R. K., Koenig, S. H., and Redfield, A. G., 1972, J. Magn. Reson. 7:66.
Hansley, P., McDevitt, P. J., Brooks, I., Trill, J. J., Feild, J. A., McNulty, D. E., Connor, J. R., Griswold, D. E., Kumar, N. V., Kopple, K. D., Carr, S. A., Dalton, B. J., and Johanson, K., 1994, J. Biol. Chem. 269:23949.
Haran, T. E., Joachimiak, A., and Sigler, P. B., 1992, EMBO J. 11:3021–3030.
Hinck, A. P., Walkenhorst, W. F., Truckses, D. M., and Markley, J. L., 1997, in Biological NMR Spectroscopy (J. L. Markley and S. J. Opella, eds.), Oxford University Press, New York, pp. 113–138.
Holland, D. R., Tronrud, D. E., Pley, H. W., Flaherty, K. M., Stark, W., Jansonius, J. N., McKay, D. B., and Matthews, B. W., 1992, Biochemistry 31:11310.
Hoogstraten, C. G., Westler, W. M., Macura, S., and Markley, J. L., 1995, J. Am. Chem. Soc. 117:5610.
Hrabal, R., Komives, E. A., and Ni, F., 1996, Protein Sci. 5:195.
Ikura, M., and Bax, A., 1992, J. Am. Chem. Soc. 114:2433.
Ikura, M., Clore, G. M., Gronenborn, A. M., Zhu, G., Klee, C. B., and Bax, A., 1992, Science 256:632.
Imberty, A., Casset, F., Gegg, C. V., Etzler, M. E., and Pérez, S., 1994, Glycoconj. J. 11:400.
Jackson, P. L., Moseley, H. N. B., and Krishna, N. R., 1995, J. Magn. Reson. 107B:289.
James, T. L., 1976, Biochemistry 15:4724.
James, T. L., and Oppenheimer, N. J., Eds., 1994, Nuclear Magnetic Resonance, Methods in Enzymology, Vol. 239, Sec. IV.
Janakiraman, M. N., White, C. L., Laver, W. G., Air, G. M., and Luo, M., 1994, Biochemistry 33:8172.
Jarori, G. K., Murali, N., and Rao, B. D. N., 1994, Biochemistry 33:6784.
Keepers, J. W., and James, T. L., 1984, J. Magn. Reson. 57:404.
Koshland, D. E., Jr., 1958, Proc. Natl. Acad. Sci. U.S.A. 44:98.
Krishna, N. R., Agresti, D. G., Glickson, J. D., and Walter, R., 1978, Biophys. J. 24:791.
Krishna, N. R., Goldstein, G., and Glickson, J. D., 1980, Biopolymers 19:2003.
Krishna, N. R., and Lee, W., 1992, Biophys. J. 61:A33.
Kumamoto, A. A., Miller, W. G., and Gunsalus, R. P., 1987, Genes Dev. 1:556–564.
Kuntz, I. D., Thomason, J. F., and Oshiro, C. M., 1989, Meth. Enzymol. 177:159.
Lasky, L. A., 1992, Science 258:964.
Lawson, C. L., and Carey, J., 1993, Nature 366:178–182.
Lee, W., and Krishna, N. R., 1992, J. Magn. Reson. 98:36.
Lee, W., Revington, M. J., Arrowsmith, C. H., and Kay, L. E., 1994, FEBS Lett. 350:87–90.
Lee, W., Revington, M., Farrow, N. A., Nakamura, A., Utsunomiya-Tate, N., Miyake, Y., Kainosho, M., and Arrowsmith, C. H., 1995, J. Biomol. NMR 5:367–475.
LeMaster, D. M., 1989, Meth. Enzymol. 177:23.
Lian, L. Y., Barsukov, I. L., Sutcliffe, M. J., Sze, K. H., and Roberts, G. C. K., 1994, Meth. Enzymol. 239:657–700.
Lipari, G., and Szabo, A., 1982a, J. Am. Chem. Soc. 104:4559–4570.
Lipari, G., and Szabo, A., 1982b, J. Am. Chem. Soc. 104:4546–4559.
Lippens, G. M., Cerf, C., and Hallenga, K., 1992, J. Magn. Reson. 99:268.
Liu, H., Kumar, A., Weisz, K., Schmitz, U., Bishop, K. D., and James, T. L., 1993, J. Am. Chem. Soc. 115:1590.
London, R. E., Perlman, M. E., and Davis, D. G., 1992, J. Magn. Reson. 97:79.
Lord, S. T., Byrd, P. A., Hede, K. L., Wei, C., and Colby, T. J., 1990, J. Biol. Chem. 265:838.
Lumb, K. J., Cheetham, J. C., and Dobson, C. M., 1994, J. Mol. Biol. 235:1072.
Macura, S., Fejzo, J., Hoogstraten, C. G., Westler, W. M., and Markley, J. L., 1992, Isr. J. Chem. 32:245.
Macura, S., Westler, W., and Markley, J. L., 1994, Meth. Enzymol. 239:106.
Marcus, D. M., Naiki, M. A., and Kundu, S. K., 1976, Proc. Natl. Acad. Sci. U.S.A. 73:3263.
Massefski, W., Jr., and Redfield, A. G., 1988, J. Magn. Reson. 78:150.
Meadows, R. P., Nikonowicz, E. P., Jones, C. R., Bastian, J. W., and Gorenstein, D. G., 1991, Biochemistry 30:1241.
Mertz, J. E., Guntert, P., Wüthrich, K., and Braun, W., 1991, J. Biomol. NMR 1:257.
Meyer, B., Weimar, T., and Peters, T., 1997, Eur. J. Biochem. 246:705.
Moseley, H. N. B., Curto, E. V., and Krishna, N. R., 1994, 35th Experimental NMR Conference, WP115, Asilomar, CA.
Moseley, H. N. B., Curto, E. V., and Krishna, N. R., 1995, J. Magn. Reson. 108B:243–261.
Moseley, H. N. B., Lee, W., Arrowsmith, C. H., and Krishna, N. R., 1997, Biochemistry 36:5293.
Moseley, H. N. B., Scheffler, K., Perez, S., Imberty, A., Krishna, N. R., and Peters, T., 1999, to be submitted.
Murali, N., Lin, Y., Mechulam, Y., Plateau, P., and Rao, B. D., 1997, Biophys. J. 70:2275.
Ng, K. K.-S., and Weis, W. I., 1997, Biochemistry 36:979.
Ni, F., 1992, J. Magn. Reson. 96:651.
Ni, F., 1994, Prog. NMR Spectrosc. 26:517.
Ni, F., Konishi, Y., Bullock, L. D., Rivetna, M. N., and Scheraga, H. A., 1989, Biochemistry 28:3106.
Ni, F., Konishi, Y., and Scheraga, H. A., 1990, Biochemistry 29:4479.
Ni, F., Ripoll, D. R., Martin, P. D., and Edwards, B. F. P., 1992, Biochemistry 31:11551.
Ni, F., and Zhu, Y., 1994, J. Magn. Reson. 103B:180–184.
Ni, F., Zhu, Y., and Scheraga, H. A., 1995, J. Mol. Biol. 252:656.
Nicholson, L. K., Yamazaki, T., Torchia, D. A., Grzesiek, S., Bax, A., Stahl, S. J., Kaufman, J. D., Wingfield, P. T., Lam, P. Y. S., Jadhav, P. K., Hodge, C. N., Domaille, P. J., and Chang, 1995, Nature Struct. Biol. 2:274.
Nilges, M., Clore, G. M., and Gronenborn, A. M., 1988, FEBS Lett. 239:129.
Ning, Q., Ripoll, R., Szewczuk, Z., Konishi, Y., and Ni, F., 1994, Biopolymers 34:1125.
Nirmala, N. R., Lippens, G. M., and Hallenga, K., 1992, J. Magn. Reson. 100:25.
Okada, A., Wakamatsu, K., Miyazawa, T., and Higashijima, T., 1994, Biochemistry 33:9438.
Olejniczak, E. T., Gampe, T., Jr., and Fesik, S. W., 1986, J. Magn. Reson. 67:28.
Ono, K., Hattori, H., Uemura, K., Nakayama, J., Ota, H., and Katsuyama, T., 1994, J. Histochem. Cytochem. 42:659.
Otting, G., Liepinsh, E., and Wüthrich, K., 1993, Biochemistry 32:584–595.
Otting, G., and Wüthrich, K., 1989, J. Am. Chem. Soc. 111:1871.
Otwinowski, Z., Schevitz, R. W., Zhang, R. G., Lawson, C. L., Joachimiak, A., Marmorstein, R. Q., Luisi, B. F., and Sigler, P. B., 1988, Nature 335:321–329.
Pavlopoulos, S., Rose, M., Wickham, G., and Craik, D. J., 1995, Anticancer Drug Design 10:623.
Perlman, M., Davis, D. G., Koszalka, G. W., Tuttle, J. V., and London, R. E., 1994, Biochemistry 33:7547.
Pervushin, K., Riek, R., Wider, G., and Wüthrich, K., 1997, Proc. Natl. Acad. Sci. U.S.A. 94:12366–12371.
Poppe, L., Brown, G. S., Philo, J. S., Nikrad, P. V., and Shah, B. H., 1997, J. Am. Chem. Soc. 119:1727.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P., 1992, Numerical Recipes in C, 2nd ed., Cambridge University Press.
Quiocho, F. A., 1991, Curr. Opinion Struct. Biol. 1:922.
Radmacher, M., Fritz, M., Hansma, H. G., and Hansma, P. K., 1994, Science 265:1577.
Ramesh, V., Syed, S. E. H., Frederick, R. O., Sutcliffe, M. J., Barnes, M., and Roberts, G. C. K., 1996, Eur. J. Biochem. 235:804–813.
Rinnbauer, M., Mikros, E., and Peters, T., 1998, J. Carbohydrate Chem. 17:217–230.
Rutherford, T. J., Spackmann, D. G., Simpson, P. J., and Homans, S. W., 1994, Glycobiology 4:59.
Scheek, R. M., van Gunsteren, W. F., and Kaptein, R., 1989, Meth. Enzymol. 177:204.
Scheffler, K., Ernst, B., Katopodis, A., Magnani, J. L., Wong, W. T., Weisemann, R., and Peters, T., 1995, Angew. Chem. Int. Ed. Engl. 34:1841.
Scheffler, K., Brisson, J.-R., Weisemann, R., Magnani, J. L., Wong, W. T., Ernst, B., and Peters, T., 1997, J. Biomol. NMR 9:423.
Scheraga, H. A., 1983, Ann. NY Acad. Sci. 408:330.
Scheraga, H. A., 1986, Ann. NY Acad. Sci. 485:124.
Scherf, T., and Anglister, J., 1993, Biophys. J. 64:754.
Scherf, T., Hiller, R., Naider, F., Levitt, M., and Anglister, J., 1992, Biochemistry 31:6884.
Schneider, M. L., and Post, C. B., 1995, Biochemistry 34:16574.
Schulz, G. E., Muller, C. W., and Diederichs, K., 1990, J. Mol. Biol. 213:627.
Shan, X., Gardner, K. H., Muhandiram, D. R., Rao, N. S., Arrowsmith, C. H., and Kay, L. E., 1996, J. Am. Chem. Soc. 118:6570–6579.
Sharff, A. J., Rodseth, L. E., Spurlino, J. C., and Quiocho, F. A., 1992, Biochemistry 31:10657.
Shibata, C. G., Gregory, J. D., Gerhardt, B. S., and Serpersu, E. H., 1995, Archiv. Biochem. Biophys. 319:204.
Shuker, S. B., Hajduk, P. J., Meadows, R. P., and Fesik, S. W., 1996, Science 274:1531.
Sugar, I. P., and Xu, Y., 1992, Prog. Biophys. Mol. Biol. 58:61.
Uemura, K., Hattori, H., Ono, K., Ogata, H., and Taketomi, T., 1989, Jpn. J. Exp. Med. 59:239.
Vincent, S. J. F., Zwahlen, C., and Bodenhausen, G., 1996a, in NMR as a Structural Tool for Macromolecules: Current Status and Future Directions (B. D. N. Rao and M. D. Kemple, eds.), Plenum Press, New York, pp. 145–166.
Vincent, S. J. F., Zwahlen, C., Bolton, P. H., Logan, T. M., and Bodenhausen, G., 1996b, J. Am. Chem. Soc. 118:3531.
Wadkins, R. M., and Graves, D. E., 1991, Biochemistry 30:4278–4283.
Wagner, G., and Wüthrich, K., 1979, J. Magn. Reson. 33:675.
Weimar, T., Harris, S. L., Pitner, J. B., Bock, K., and Pinto, M., 1995, Biochemistry 34:13672.
Wider, G., Weber, C., Traber, R., Widmer, H., and Wüthrich, K., 1990, J. Am. Chem. Soc. 112:9015.
Wüthrich, K., and Wagner, G., 1975, FEBS Lett. 50:265–268.
Xu, Y., and Krishna, N. R., 1995, J. Magn. Reson. B108:192.
Xu, Y., Krishna, N. R., and Sugar, I. P., 1995a, J. Magn. Reson. B107:201.
CORCEMA Analysis of NOESY Spectra of Ligand–Receptor Complexes
307
Xu, Y., Sugar, I. P., and Krishna, N. R., 1995b, J. Biomol. NMR 5:37. Yi, Q., Erman, J. E., and Satterlee, J. D., 1994, J. Am. Chem. Soc. 116:1981.
Yip, P. F., and Case, D. A., 1989, J. Magn. Reson. 83:643. Zhao, D., Arrowsmith, C. H., Jia, X., and Jardetzky, O., 1993, J. Mol. Biol. 229:735–746. Zheng, J., and Post, C. B., 1993, J. Magn. Reson. 101B:262–270.
Zhu, L., and Reid, B. R., 1995, J. Magn. Reson. B106:227. Zwahlen, C., Vincent, S. J. F., Di Bari, L., Levitt, M. H., and Bodenhausen, G., 1994, J. Am. Chem. Soc. 116:362.
II
Structure and Dynamics
8
Protein Structure and Dynamics from Field-Induced Residual Dipolar Couplings
James H. Prestegard, Joel R. Tolman, Hashim M. Al-Hashimi, and Michael Andrec
1. INTRODUCTION
The quest for information about the structure and dynamics of proteins in solution has traditionally been approached using spin-relaxation-based phenomena: NOEs
for distance-constraint-based structure, and heteronuclear relaxation for backbone and side-chain dynamics. While new techniques that capitalize on these phenomena continue to be introduced, both phenomena are limited in fundamental ways. NOEs are limited by their inherent short-range distance sensitivity. This becomes a problem when successive short-range constraints must dictate spatial relationships of remote parts of macromolecules, as occurs in extended DNA and RNA helices. It also puts long-range constraints, such as those between side-chain protons of different parts of a protein, at a premium and presses the limits of methods which
James H. Prestegard, Joel R. Tolman, Hashim M. Al-Hashimi, and Michael Andrec • Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia 30602. Biological Magnetic Resonance, Volume 17: Structure Computation and Dynamics in Protein NMR, edited by Krishna and Berliner. Kluwer Academic / Plenum Publishers, New York, 1999.
can extend backbone-directed assignment strategies to the very tip of residue side chains. Most heteronuclear spin relaxation phenomena are sensitive only to motions on time scales on the order of, or faster than, the overall tumbling time of a molecule (several nanoseconds) (Wagner, 1993). This becomes a problem when motions of fundamental importance to enzyme mechanisms occur on the microsecond to millisecond time scale.
Here, we discuss some new methods that can complement traditional methods in cases where they are limited in these respects. These new methods are based on the phenomenon of field-induced residual dipolar couplings. The existence of these couplings was realized and demonstrated many years ago, but their potential as an abundant source of information on macromolecular structure and dynamics has been fully realized only with the advent of very high field magnets and heteronuclear methods that allow very precise measurement of dipolar couplings in isotopically labeled biomolecules.
While dipolar interactions between pairs of spin-1/2 nuclei certainly exist, and are in fact the basis of the relaxation-based phenomena referred to above, they are seldom directly observed in solution. This may seem surprising when one realizes that the spin operator description of the interaction shares features with that for through-bond spin–spin coupling. The interaction should contribute to multiplet structure or change the splitting in multiplet structure just as scalar (J) coupling does. As discussed more fully in Sec. 2, the reason for the failure to observe these contributions in solution is that the geometric factor which scales the dipolar interaction, (3cos²θ − 1)/r³, averages to zero when the angle between the magnetic field and the interaction vector, θ, is sampled isotropically during molecular tumbling.
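The vanishing of the geometric factor under isotropic sampling can be checked numerically. The following minimal sketch assumes the angular factor has the usual form (3cos²θ − 1); the function and variable names are illustrative only. It evaluates the factor at a few angles and then averages it over a uniform distribution of cos θ, which is what isotropic tumbling samples.

```python
import numpy as np

def geometric_factor(theta):
    """Angular part of the dipolar interaction, 3cos^2(theta) - 1 (assumed form)."""
    return 3.0 * np.cos(theta) ** 2 - 1.0

# Values at characteristic angles between the field and the internuclear vector.
factor_parallel = geometric_factor(0.0)
factor_perpendicular = geometric_factor(np.pi / 2.0)
magic_angle = np.arccos(1.0 / np.sqrt(3.0))  # angle at which the factor vanishes
factor_magic = geometric_factor(magic_angle)

# Isotropic sampling is uniform in x = cos(theta) on [-1, 1].
# Gauss-Legendre quadrature integrates the polynomial 3x^2 - 1 exactly.
x, w = np.polynomial.legendre.leggauss(16)
isotropic_average = float(np.sum(w * (3.0 * x ** 2 - 1.0)) / np.sum(w))
```

The individual values are far from zero (the factor ranges from 2 for a parallel vector to −1 for a perpendicular one), so it is only the uniform sampling of orientations that removes the interaction from the spectrum.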
The internuclear distance, r, can be recovered through spin relaxation measurements, but, for truly isotropic sampling, direct measurement of neither the distance nor the angular part of the dipolar interaction is possible.
Recovery of the angular information in the dipolar interaction could be very important. If we could learn how the average orientations of two vectors, say two bond vectors in different parts of a biomolecule, differed, this could be a powerful constraint on structure, even in cases where the two vectors are remote. If we could learn how the averaging process differed for the two vectors, this could provide new information on internal dynamics, even when the dynamics involved time scales of motions only slightly shorter than the reciprocal of the residual dipolar interaction itself (100 ms). The value of angular information has long been recognized in the fields of solid-state NMR and liquid crystal NMR. But, with a few exceptions (Bastiaan and MacLean, 1990; Bastiaan et al., 1987), the value of measuring residual dipolar interactions in solution has not been appreciated. The reason is simple: the residual interactions in solution have been small and difficult to measure.
Why is our ability to measure dipolar contributions to multiplet splittings changing? One reason is that field strengths available to high-resolution spectroscopists are increasing. For weakly aligned systems the residual dipolar contributions to multiplet splittings increase as the square of the field. This means that, in going from spectrometers operating a few years ago at 600 MHz for protons to ones operating today at 800 MHz for protons, the magnitude of the dipolar contribution
has increased by a factor of 1.78. At 1 GHz the factor will be 2.78. With today’s fields, the interactions for isolated molecules in solution are still moderately small. So, in simple solution systems amenable to measurement today, molecules must have an inherently large anisotropic magnetic susceptibility. In practical terms, this means the main subjects of this chapter will be certain paramagnetic proteins and certain highly anisotropic diamagnetic systems, such as proteins bound to a DNA helix. Very recently, an ability to amplify the orientational tendencies of individual molecules by dissolving them in a dilute liquid crystal, which orients more strongly, has been demonstrated (Tjandra and Bax, 1997b). This is a very promising technology that may allow extension of the measurements discussed here to an even broader range of systems.
In any emerging methodology, particularly one that pushes the precision of measurements, development of new experiments becomes of primary importance. We are fortunate that residual dipolar couplings appear as contributions to multiplet splittings, because many of the new methods being introduced to measure scalar couplings for the purpose of obtaining torsion-angle constraints can be used as models for the experiments needed here. The entire range of experiments for scalar coupling measurement is reviewed elsewhere in this volume. We will focus here on methods designed for measurement of one-bond couplings, because the bond distance in the one-bond dipolar interaction can be assumed known, and we can focus on the angular information in the residual dipolar contribution. We will also discuss the limits of precision in measurement and some of the systematic errors that can be introduced. This is important because, at current field strengths, in noncooperative systems, we will push the limits of measurement to their extremes.
Interpretation is also an issue. The data are new and protocols for incorporating the data in structure determinations are just emerging. Preliminary applications to structure refinement and global structure determination will be discussed.
We have mentioned the issue of motion. Motional and structural effects on NMR parameters are frequently intertwined, and this is no less the case for residual dipolar couplings. An approach that can at least screen for the presence of large-scale slow motions in proteins of known structure will be described. The predictions from the limited data available are as yet controversial. However, the promise that measurement of residual dipolar couplings will eventually yield unique contributions to the definition of both structure and motion in macromolecules is great. We hope this chapter will lay a useful basis for the fulfillment of this promise.
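The field-squared scaling quoted earlier in this section amounts to simple arithmetic. As a sketch, taking the proton frequency as proportional to field strength and using a hypothetical helper name:

```python
def dipolar_gain(new_mhz, reference_mhz):
    """Relative size of a field-induced residual dipolar contribution.

    Assumes the contribution scales as the square of the field, with the
    proton resonance frequency used as a stand-in for field strength.
    """
    return (new_mhz / reference_mhz) ** 2

gain_800 = dipolar_gain(800.0, 600.0)    # 600 MHz -> 800 MHz
gain_1000 = dipolar_gain(1000.0, 600.0)  # 600 MHz -> 1 GHz
```

Rounded to two decimals, these ratios are the factors of 1.78 and 2.78 given in the text.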
2. THEORY
2.1. Anisotropic Spin Interactions in Solution-State NMR
The various interactions that contribute to the Hamiltonian describing nuclear spin energies in the presence of a magnetic field differ in their dependence on molecular orientation with respect to the field. The Zeeman interaction does not depend on the orientation of the molecule, since the axis of quantization of spin angular momentum is determined solely by the direction of the effective magnetic field in the laboratory frame. On the other hand, all other interactions, including nuclear shielding, indirect spin–spin coupling, direct dipole–dipole interactions, and electric quadrupole coupling, depend on the orientation. These orientation dependencies can be described in terms of second-rank
tensors written in the laboratory frame. In liquids and gases, thermal motion formally renders these lab frame tensors time dependent. However, for observation of the interactions of interest, characterized by frequencies from and greater, an average Hamiltonian will suffice, and time-dependent tensors can be replaced by effective averages. If the motion to be included in the averaging is perfectly isotropic in its sampling of orientational space, the averaging leaves only the trace of the interaction Hamiltonian. The dipolar and quadrupolar anisotropic interactions have traceless tensors and will therefore vanish from direct detection and only play a role in spin relaxation. Chemical shielding and indirect coupling will persist
as an apparent scalar interaction. This is what we expect in normal high-resolution NMR. While this averaging is convenient when high resolution and spectral simplicity are the goal, removal of all anisotropic interactions sacrifices a great deal of useful data. Small departures from isotropic tumbling may reintroduce these anisotropic interactions while maintaining spectral simplicity. A convenient way to accomplish this is by recognizing that molecules with large anisotropic magnetic susceptibilities will favor certain orientations with respect to an applied magnetic field and will therefore sample orientational space anisotropically. Any anisotropic terms in the spin Hamiltonian will now have effective averages which differ from their isotropic values, allowing direct observation of anisotropic spin interactions. Here, we are concerned primarily with the effect of partial averaging of the dipolar interaction between a pair of spins, which results in a contribution to the observed line splittings. In the subsequent discussion, we will derive an expression for the residual dipolar coupling and demonstrate the structural information that can be extracted from its measurement.
The theory underlying magnetic-field-induced order and its consequences for the averaging of anisotropic interactions, such as the dipolar interaction, has been described in several places (Bastiaan and MacLean, 1990; Bastiaan et al., 1987; Lohman and MacLean, 1978). We repeat some of that description here in
terms of irreducible spherical tensors. This approach has a certain simplicity and
makes many of the underlying assumptions more apparent. Irreducible spherical tensors have been used previously to describe orientation effects in the case of the EPR spectra of radicals (Falle and Luckhurst, 1970) and for the NMR spectrum of monofluorobenzene in a nematic liquid crystal solvent (Snyder, 1965). The reader is referred to those works for additional clarification.
2.2. The Dipolar Hamiltonian
The lab frame Hamiltonian (in Hz) for the dipolar interaction between two nuclei can be expressed as a scalar contraction of spin and spatial tensor components:
where γI and γS are the gyromagnetic ratios for the nuclei, h is Planck’s constant, and r is the vector joining the two nuclei; T and D are second-rank, zero-order tensor operators describing the spin and spatial parts of the interaction, respectively. Note that the nonsecular terms have been dropped since their time dependence will render
them ineffective in determining the energies of the system. The components of the dipolar interaction tensor written in their own principal axis frame are actually quite simple. In this frame, with the z axis along the internuclear vector, there is only one nonzero component, D20(PAS). D20(lab) is, however, a complex function of molecular orientation. While the principal axis system (PAS) of a given dipolar interaction is often assumed fixed in a molecular frame, a wide range of molecular orientations is sampled by motion. We will therefore have to average over these orientations to obtain closed-form expressions for the dipolar interaction.
Averaging is most conveniently done by transforming to an intermediate molecular frame. For the case of interest here, it is most convenient to choose the molecular frame which corresponds to the principal axis system of the magnetic susceptibility tensor for the entire molecule (referred to as the magnetic frame). For isolated molecules in solution, sampling of molecular orientations will be influenced by the interaction of the magnetic field with the anisotropic magnetic susceptibility of the molecule. Thus, assuming a rigid molecule, all necessary averaging will involve the relationship of the magnetic frame to the lab frame and will be the same for any dipolar interaction in the molecule. On the other hand, the relationship of a given principal dipolar frame to the magnetic frame will remain a fixed function of molecular geometry. The transformations between frames are most easily accomplished using Wigner rotation matrices. The first transformation, carrying the PAS of the dipolar interaction into the magnetic coordinate system, is described by a set of fixed Euler angles. If it is assumed that the molecule is rigid, then this transformation is independent of time:
The second transformation relates the PAS of the χ tensor (magnetic axes) to the lab frame by a second set of Euler angles. The time dependence of these angles arises because of molecular reorientation:
Thus, the dipolar tensor, D, in the lab frame can be written
Substitution into Eq. (1) leads to
where the angle brackets indicate that a time average has been taken in order to account for molecular reorientation. This time average will cause the dipolar Hamiltonian to vanish if all orientations in space are sampled equally, and for this reason dipolar couplings are not normally observed in liquid-state high-resolution NMR. However, this assumption of an isotropic distribution of orientations will begin to fail at high magnetic fields and for molecules which have a considerably anisotropic susceptibility. This occurs because of the interaction between the induced molecular magnetic dipole moment and the magnetic field, originally discussed by Van Vleck (1932). The next section will detail how consideration of this small interaction can break the isotropy of orientational space and allow the direct observation of anisotropic parameters, such as dipolar couplings, in solution-state NMR spectroscopy.
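The two-step frame transformation described in this section can be illustrated with ordinary Cartesian rotation matrices in place of Wigner rotation matrices. In the sketch below, the Euler angle values, the z-y-z convention, and the unit-strength traceless tensor are all assumptions made for illustration. It shows that the lab-frame zz element of the rotated dipolar tensor depends only on the angle Θ between the internuclear vector and the field, through (3cos²Θ − 1)/2.

```python
import numpy as np

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def euler_zyz(alpha, beta, gamma):
    """Rotation matrix for z-y-z Euler angles (one common convention)."""
    return rot_z(alpha) @ rot_y(beta) @ rot_z(gamma)

# Dipolar tensor in its own PAS: traceless, axial, z along the internuclear
# vector, scaled to unit strength for illustration.
d_pas = np.diag([-0.5, -0.5, 1.0])

# Step 1: PAS -> magnetic frame, fixed by molecular geometry (hypothetical angles).
r_pas_to_mag = euler_zyz(0.3, 1.1, 0.0)
# Step 2: magnetic frame -> lab frame, one snapshot of a tumbling molecule.
r_mag_to_lab = euler_zyz(2.0, 0.7, 0.4)

r_total = r_mag_to_lab @ r_pas_to_mag
d_lab = r_total @ d_pas @ r_total.T

# Cosine of the angle between the internuclear vector (PAS z) and the field (lab z).
cos_theta = (r_total @ np.array([0.0, 0.0, 1.0]))[2]
p2 = 0.5 * (3.0 * cos_theta ** 2 - 1.0)
```

Because only the second rotation changes as the molecule tumbles, averaging over orientations acts on the magnetic-to-lab step alone, which is the point made in the text for a rigid molecule.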
2.3. Residual Dipolar Couplings under Magnetic Field Alignment
Placement of a molecule in a magnetic field gives rise to an induced magnetic dipole moment which is proportional to the susceptibility, χ. This moment will oppose the field for diamagnetic molecules and be along the field for paramagnetic molecules (Van Vleck, 1932). For molecules which are not of spherical symmetry, χ must be described by a tensor, and thus the size of the induced magnetic dipole moment will be dependent upon the orientation of the molecule with respect to the magnetic field. The induced dipole in turn interacts with the magnetic field itself, leading to an energy of interaction, W, which can be written as (Bastiaan et al., 1987)
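For a sense of scale, the orientation-dependent part of this energy can be compared with kT. The sketch below assumes the standard form W = −(1/2) B·χ·B, written in SI units as −(1/2μ0) B·χ·B, an axially symmetric tensor, and a benzene-like molar anisotropy of −60 × 10⁻⁶ cm³/mol. The anisotropy value and all names are assumptions for illustration, not values taken from this chapter.

```python
import math

MU0 = 1.25663706212e-6  # vacuum permeability, T m / A
K_B = 1.380649e-23      # Boltzmann constant, J / K
N_A = 6.02214076e23     # Avogadro constant, 1 / mol

def molar_cgs_to_si_per_molecule(chi_cgs_molar):
    """Convert a molar susceptibility in cm^3/mol (cgs) to m^3 per molecule (SI)."""
    return 4.0 * math.pi * 1e-6 * chi_cgs_molar / N_A

def anisotropic_energy(theta, delta_chi, b0):
    """Orientation-dependent part of W = -(1/(2 mu0)) B.chi.B for an axial tensor.

    theta is the angle between the field and the unique axis of the tensor;
    the isotropic, orientation-independent part is omitted.
    """
    return -delta_chi * b0 ** 2 * math.cos(theta) ** 2 / (2.0 * MU0)

delta_chi = molar_cgs_to_si_per_molecule(-60e-6)  # assumed benzene-like value
b0, temperature = 17.6, 298.0                     # tesla, kelvin

# Energy difference between unique axis parallel and perpendicular to the field.
energy_gap = (anisotropic_energy(0.0, delta_chi, b0)
              - anisotropic_energy(math.pi / 2.0, delta_chi, b0))
gap_over_kt = energy_gap / (K_B * temperature)
```

With these assumed inputs the gap is a few parts in 10⁵ of kT, and it is positive for a negative anisotropy, so the unique axis is slightly disfavored along the field.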
As most molecules are not of spherical symmetry, they will have some orientations with respect to the magnetic field which are energetically more
favorable than others. If W is large enough compared to the thermal energy, a sufficient, although small, degree of order will be induced such that the effects of the resulting incomplete averaging of anisotropic interactions become observable (Bastiaan et al., 1987; Bastiaan and MacLean, 1990; Lohman and MacLean, 1978). It is important to recognize that the orientation of the principal axes of the susceptibility tensor (PAS) within the molecular frame will determine how the
molecule will tend to order in the field. Shown in Fig. 1 is the most probable orientation for a benzene molecule with its symmetry-determined principal axis system. We can express the interaction energy in Eq. (6) as a scalar contraction of irreducible spherical tensors (in the PAS of the susceptibility tensor):
Note that the rank 0 (isotropic) component of χ has not been included in the above expression. This part will not contribute to an orientational dependence of the interaction energy W and, hence, can be dropped for simplicity. In the PAS of the χ tensor, the rank 2 irreducible spherical components of χ can easily be written in terms of familiar susceptibility anisotropies:
where the +1 and –1 components vanish because of the assumed symmetric nature of the susceptibility tensor. The operator T is simple in the lab frame, where only the element T20 exists. The magnetic frame equivalent is generated by use of Wigner rotation matrices, and substitution into Eq. (7) leads to the desired expression for W in the PAS of the χ tensor:
Our objective in obtaining Eq. (9) is the calculation of the average which appears in Eq. (5). Having an expression for the energy, we can assume a Boltzmann distribution and carry out this averaging:
In the high-temperature approximation, the above integral can be solved by inspection utilizing the orthogonality theorem of the Wigner rotation matrix elements:
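The high-temperature limit can be checked numerically for the simplest case of an axially symmetric susceptibility, where only the average of P2(cos θ) is needed. In this sketch the dimensionless parameter a = ΔχB²/(2μ0kT) (SI form) is an assumed shorthand, and the function names are hypothetical. To first order in a the Boltzmann average is 2a/15, which is the origin of the factor of 1/15 familiar from expressions of this kind.

```python
import numpy as np

def order_parameter_boltzmann(a, n_points=64):
    """<P2(cos theta)> for orientations weighted by exp(a * cos^2(theta)).

    a plays the role of -W/kT at theta = 0 for an axial susceptibility tensor.
    """
    x, w = np.polynomial.legendre.leggauss(n_points)  # x = cos(theta)
    boltzmann = np.exp(a * x ** 2)
    p2 = 0.5 * (3.0 * x ** 2 - 1.0)
    return float(np.sum(w * boltzmann * p2) / np.sum(w * boltzmann))

def order_parameter_high_temperature(a):
    """First-order (high-temperature) result for the same average."""
    return 2.0 * a / 15.0

a_weak = -3.75e-5  # magnitude typical of a small aromatic molecule (assumed)
exact_weak = order_parameter_boltzmann(a_weak)
approx_weak = order_parameter_high_temperature(a_weak)

a_strong = 1.0     # far outside the high-temperature regime
exact_strong = order_parameter_boltzmann(a_strong)
approx_strong = order_parameter_high_temperature(a_strong)
```

For weak alignment the two results agree to well within any measurable precision; the approximation only breaks down when the interaction energy becomes comparable to kT.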
This allows Eq. (5), in the case of a heteronuclear spin pair, to be simplified to
where the Euler angles have been replaced explicitly by the spherical angles θ and φ, which relate the internuclear vector between spins I and S to the molecule-fixed magnetic coordinate system. Note the appearance of the IzSz operator. This is exactly the same operator that appears in the expression for the Hamiltonian for through-bond scalar coupling. For a simple pair of unlike spin-1/2 nuclei, two doublets will appear, each split by the sum of scalar and residual dipolar contributions. It is convenient to write the expression for the residual dipolar coupling contribution in the case of two spin-1/2 nuclei (units of Hz):
Inspection of Eq. (13) leads to the conclusion that, for a rigidly tumbling molecule, the magnitude of the residual dipolar coupling will depend on the square of the
magnetic field strength, the susceptibility anisotropy, the internuclear distance r, and the orientation of the internuclear vector within the magnetic coordinate system (Fig. 2). The field dependence can be used to separate the residual dipolar contribution from scalar coupling contributions, which are not field dependent. For a directly bonded pair of spins, it will be assumed that the internuclear distance is known, and measurement of these contributions, given knowledge of the susceptibility anisotropy, can provide information related to the orientation of bond vectors within a molecule-fixed magnetic coordinate system.
While these effects represent a potentially abundant source of structural information, some significant challenges remain. Even at the highest fields available today, the predicted splittings are small. For a benzene ring at 17.6 T and 298 K, the predicted residual dipolar contribution to a one-bond coupling with θ = 90° is 0.12 Hz. Even for systems with large susceptibility anisotropies, these effects are only on the order of a few hertz, and thus are experimentally demanding to measure.
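The benzene estimate can be reproduced, at least approximately, with a commonly quoted form of the residual dipolar coupling, D = −(B0²/15μ0kT)(μ0γIγSħ/8π²r³)[Δχa(3cos²θ − 1) + (3/2)Δχr sin²θ cos 2φ], written here in SI units. The sketch below assumes a ¹³C–¹H pair at 1.09 Å, an axial molar anisotropy of −60 × 10⁻⁶ cm³/mol, and no rhombic term. These inputs, the function name, and the sign convention are assumptions, since conventions differ between papers.

```python
import math

MU0 = 1.25663706212e-6   # vacuum permeability, T m / A
K_B = 1.380649e-23       # Boltzmann constant, J / K
HBAR = 1.054571817e-34   # reduced Planck constant, J s
N_A = 6.02214076e23      # Avogadro constant, 1 / mol
GAMMA_1H = 2.6752218744e8  # rad / (s T)
GAMMA_13C = 6.728284e7     # rad / (s T)

def residual_dipolar_coupling(b0, temperature, gamma_i, gamma_s, r,
                              dchi_axial, dchi_rhombic, theta, phi):
    """Field-induced residual dipolar coupling in Hz (assumed SI form).

    dchi_axial and dchi_rhombic are per-molecule anisotropies in m^3;
    theta and phi locate the internuclear vector in the susceptibility PAS.
    """
    alignment = b0 ** 2 / (15.0 * MU0 * K_B * temperature)
    dipolar_constant = MU0 * gamma_i * gamma_s * HBAR / (8.0 * math.pi ** 2 * r ** 3)
    angular = (dchi_axial * (3.0 * math.cos(theta) ** 2 - 1.0)
               + 1.5 * dchi_rhombic * math.sin(theta) ** 2 * math.cos(2.0 * phi))
    return -alignment * dipolar_constant * angular

# Assumed benzene-like anisotropy, converted from cm^3/mol (cgs) to m^3 per molecule.
dchi_axial = 4.0 * math.pi * 1e-6 * (-60e-6) / N_A

# C-H bonds lie in the ring plane, i.e. at 90 degrees to the ring normal.
d_benzene = residual_dipolar_coupling(
    b0=17.6, temperature=298.0, gamma_i=GAMMA_13C, gamma_s=GAMMA_1H,
    r=1.09e-10, dchi_axial=dchi_axial, dchi_rhombic=0.0,
    theta=math.pi / 2.0, phi=0.0)
```

With these assumptions the magnitude comes out close to 0.12 Hz, in line with the value quoted in the text; the sign depends on the convention adopted.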
3. EARLY HISTORY OF OBSERVATION
Despite anticipated difficulties in observation of residual dipolar couplings, several accounts of observation can be found in the early NMR literature. Although in this chapter we will be mainly concerned with applications of dipolar couplings and magnetic field alignment to the structure and dynamics of biomolecules in simple solution, many of the key concepts underlying the methodology stem from a more diverse set of studies. Many of the earlier studies are on organic molecules (Bastiaan and MacLean, 1990; Bastiaan et al., 1987; van Zijl et al., 1984), many use alignment methods other than magnetic-field-induced alignment (Buckingham and McLauchlan, 1967; Plantenga et al., 1980), many include the measurement of quadrupole as well as residual dipolar splittings (Bothner-By et al., 1981), and many rely on cooperative systems such as liquid crystals to achieve partial orientation (Sanders et al., 1994). However, all of these studies share underlying principles and are therefore worthy of some discussion here. In the interests of brevity this discussion cannot be complete, but we hope to make connections to other particularly useful segments of the literature, such as the liquid crystal NMR literature (Emsley and Lindon, 1975).
Observable effects of partial molecular alignment were first predicted for polar molecules aligned under the influence of an electric field (Buckingham and Lovering, 1962). This prediction gained experimental support soon after, when signatures of residual dipolar coupling were observed in p-nitrotoluene using
electric field NMR. The predicted effects, combined with the observed perturbations of multiplet splittings, allowed some of the first determinations of the absolute signs of scalar coupling constants in aromatic rings (Buckingham and McLauchlan, 1963). Applications of electric field alignment in NMR have not, however, become widespread because of the need for specially designed cells and limitation to relatively nonconducting solutions. Magnetic field alignment does not suffer the particular limitations of electric field alignment. However, it does require very high fields and large anisotropic magnetic susceptibilities. Aromatic and paramagnetic systems have the requisite
large anisotropic magnetic susceptibilities, and therefore became candidates for the early development of magnetic field alignment methodology as soon as magnets of
sufficient field strength were available. The first experimental demonstration of magnetic field alignment dates back to 1978 (Lohman and MacLean, 1978). The primary observation in this case was quadrupolar splitting of resonances from nuclei with spins greater than 1/2, rather than dipolar splitting from coupled pairs of spins. Quadrupolar splittings were observed in the NMR spectra of
triphenylene-d6 and phenanthrene-d10, suggesting incomplete averaging of the quadrupolar interaction due to partial magnetic field alignment (Lohman and MacLean, 1978). Quadrupole interactions, present for nuclei with spins greater than 1/2, show a dependence on molecular alignment and molecular geometry similar to that of dipolar interactions, with the magnitude of the observed splittings depending on the quadrupole coupling constant, the square of the magnetic field strength, and the magnitude of the molecule’s magnetic susceptibility anisotropy. Since values for quadrupole coupling constants were known (180 kHz for C–D bonds in aromatic molecules), the authors were able to compute the axial magnetic susceptibility anisotropy. The diamagnetic anisotropy in these molecules is characteristic of aromatic behavior, with cyclic delocalization of π electrons effecting a substantial increase in the principal negative susceptibility component perpendicular to the aromatic ring plane. Aromatic molecules thus tend to align with the normal of the ring plane perpendicular to the magnetic field, as shown in Fig. 1.
Studies on aromatic molecules have continued to be reported (Laatikainen et al., 1995, 1993). As discussed later, aromatic groups of tyrosine, tryptophan, phenylalanine, and histidine are also important in diamagnetic protein applications, where their anisotropies play a dominant role in the resultant magnetic susceptibility (Tjandra et al., 1996).
Quadrupole splittings were chosen for initial observations on small molecules because of the large quadrupole coupling constants and resultant large residual splittings. However, chemical-shift resolution for quadrupole nuclei is generally poor, making retrieval of information from multiple sites in large molecules difficult. When quadrupole interactions dominate effective transverse relaxation
rates of solution samples, linewidths increase as the square of the interaction strength, decreasing resolution as interaction strength increases. Hence, particularly
for larger molecules, measurement of the weaker dipolar interactions offers an advantage. Observation of residual dipolar splittings, nevertheless, required a combination of higher fields and molecules with larger anisotropic susceptibilities.
The enhanced anisotropy in paramagnetic systems eventually allowed the smaller effects of residual dipolar coupling between pairs of protons to be observed. The first observations using paramagnetic systems were actually made on quadrupolar couplings (Eyring et al., 1980). The observation of dipolar couplings was later made on the para methyl protons in paramagnetic bis[tolyltris(pyrazolyl)borato]cobalt(II) (Bothner-By et al., 1981). This also marks one of the first discussions of the utility of these measurements in structural analysis. The small number and small magnitude of the residual dipolar splittings made a completely independent structural analysis impossible. However, quadrupolar splittings in the corresponding deuterated molecule allowed determination of a degree of order, and the observed dipolar couplings could then be shown to be consistent with accepted methyl group geometry. Thus, the potential structural utility of the methodology was demonstrated.
Observation of residual dipolar couplings in diamagnetic molecules presented a greater challenge owing to the smaller magnitude of the anisotropy. Residual dipolar couplings could not at first be observed in small organic molecules but proved feasible in larger aromatic systems in which the anisotropies of individual aromatic rings add up to a larger effective anisotropy. This was first demonstrated for methylpyropheophorbide and coronene, where residual dipolar couplings became visible as an apparent magnetic field dependence (as discussed below) of the scalar coupling interaction. Extraction of the couplings allowed determination of the diamagnetic anisotropy for the molecules (Gayathri and Bothner-By, 1982). Diamagnetic anisotropies have since been determined for a number of molecules, including simple aromatics, porphyrins, and halomethanes (Bothner-By et al., 1987; van Zijl and Bothner-By, 1988). Acceptance of this approach to determining susceptibility anisotropies reflects the simplicity of the methodology in comparison to other techniques, such as use of the Cotton–Mouton effect.
Magnetic field alignment and residual dipolar couplings have also been used to successfully determine the three-dimensional structure of a complex cage molecule containing a porphyrin and a quinone (Lisicki et al., 1988). In other studies, on a DNA decamer, dipolar effects were observed between protons (C5H and C6H) in two cytosines located at the center and terminal end of the double helix. From the data, an angle of 15° between the cytosine base planes was inferred, suggesting loosening of the double helix at the ends of the strands (Skoglund, 1987). More recent work on DNA using heteronuclear coupling has been reported by Kung et al. (1995). Thus the basic utility of residual dipolar couplings in both chemical and structural analysis of small to moderate-sized molecules has been established.
Applications of residual dipolar couplings to the structural determination of proteins in solution would seem, on the basis of the above discussion, to have great potential. One can obtain data that are quite complementary to NOE-based data because residual dipolar splittings for directly bonded pairs can yield the orientation of different interaction vectors relative to an order-determining susceptibility tensor without the requirement for close approach. One can also capitalize on backbone-localized assignment strategies for large molecules, since constraints on the orientation of directly bonded backbone pairs are useful even when long-range NOEs involving these pairs are seldom observed.
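The apparent field dependence of a splitting suggests a simple way to separate the two contributions: if the scalar part is field independent and the dipolar part grows as the square of the field, splittings measured at two fields determine both. A minimal sketch follows, with a hypothetical function name and synthetic splittings that are not data from any of the studies cited.

```python
def separate_scalar_and_dipolar(split_low, split_high, freq_low, freq_high):
    """Split two measured splittings (Hz) into scalar and dipolar parts.

    Assumes splitting = J + k * f**2, where f is the proton frequency
    (proportional to the field) and J does not depend on the field.
    Returns (J, dipolar part at freq_low, dipolar part at freq_high).
    """
    k = (split_high - split_low) / (freq_high ** 2 - freq_low ** 2)
    scalar = split_low - k * freq_low ** 2
    return scalar, k * freq_low ** 2, k * freq_high ** 2

# Synthetic example: J = 93.0 Hz with a dipolar part of 1.8 Hz at 750 MHz,
# which corresponds to 0.8 Hz at 500 MHz under field-squared scaling.
scalar, dipolar_500, dipolar_750 = separate_scalar_and_dipolar(
    split_low=93.8, split_high=94.8, freq_low=500.0, freq_high=750.0)
```

Because the dipolar part is obtained from a small difference between two splittings, its uncertainty is set by the precision of each individual measurement, which is why the experimental precision discussed in this chapter matters so much.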
4. APPLICATION TO PROTEIN SYSTEMS
The first applications of residual dipolar constraints to problems in protein structure and dynamics have now appeared. Among those that rely on the inherent tendencies of the molecular system of interest to orient in a magnetic field, one involves work on a paramagnetic protein, myoglobin (Tolman et al., 1995, 1997).
A second involves work on a protein with a rather substantial diamagnetic anisotropy, ubiquitin (Tjandra et al., 1996). A third involves work on the DNA-binding domain of a protein, GATA-1, bound to DNA, which, although diamagnetic, is highly anisotropic (Tjandra et al., 1997). We review these first systems in what follows in the belief that they illustrate current problems and future potential. We will then focus on problems inherent in the measurement of dipolar splittings. The small magnitude of the residual dipolar interaction at currently available field strengths (<18 T) remains the primary problem, but experiments have now been designed that provide accurate measurements of coupling constants with a precision of a fraction of a hertz. Finally, we will discuss a problem that stems from the interplay of structural and dynamical information and that must be resolved before the full potential of these studies is realized.
The first experimental support for use of residual dipolar couplings as a general probe of protein structure was reported by Tolman et al. (1995) in a study which utilized a ¹⁵N-labeled sample of myoglobin. This sample was chosen because it could be prepared in a paramagnetic state [low-spin Fe(III)] by the addition of cyanide to an oxidized aqueous sample. The anisotropy of the susceptibility is large (the equivalent of about 30 aligned benzene rings), and the extremely short electron spin lifetimes leave linewidths in NMR spectra relatively unperturbed. There was also a wealth of structural information on the molecule, making it a paradigm for testing new structural methodology. The focus was on directly bonded ¹⁵N–¹H pairs of amide groups along the protein backbone. Resonances for these pairs could
be easily resolved in standard two-dimensional heteronuclear single-quantum coherence (HSQC) spectra, and these resonances could be assigned with the aid of a few conventional 3D correlation spectra and the previous assignments of the diamagnetic, carbonmonoxy form of the molecule (Ösapay et al., 1994). Measurement of residual splittings was straightforward in the sense that the normal HSQC spectrum was modified to allow ¹⁵N–¹H coupling to persist during the indirect evolution period. The rationale for choosing this dimension was simply that linewidths are smaller and long-range couplings less problematic than they would be in the proton dimension.
A section of a similar HSQC spectrum in which splittings are amplified by a factor of 2 is shown in Fig. 3. All of the normal HSQC cross peaks appear as pairs
along columns at the amide proton chemical shifts. The splittings are dominated by the
coupling, but there are variations that arise from residual dipolar
coupling. Since there are also variations in scalar coupling constants, it is not appropriate to assume all variations come from dipolar contributions. Their presence can, however, be verified by looking for the expected field dependence. Columns corresponding to two amide protons from spectra taken at 500 MHz and 750 MHz are superimposed in Fig. 4. It is clear that effects are small, but the values do increase and decrease at the higher field as expected when values move from 90° to 0° relative to the principal axis of the susceptibility tensor (nearly normal to
the heme plane). Analysis of field-dependent data for approximately 30 sites allowed comparison of measurements to predictions based on the published X-ray structure (Kuriyan et al., 1986) and susceptibility tensor (Rajarathnam et al., 1992) corrected for diamagnetic contributions. The measured dipolar contributions ranged from 1.7 to –2.1 Hz. While some systematic deviations from predictions appeared to exist, the strong correlation supported the proposed use as a structural tool. Moreover, the spectra shown above were acquired on a 5-mM sample in a simple overnight run, suggesting that structural information might be acquired with great efficiency.

The first example of measurement of residual dipolar contributions on a diamagnetic protein was reported shortly thereafter (Tjandra et al., 1996). Again the focus of attention was directly bonded ¹⁵N–¹H pairs in backbone amide residues. The target protein was ubiquitin, a protein about half the size of myoglobin. It has a susceptibility anisotropy opposite in sign and about one-tenth the
magnitude of that in paramagnetic myoglobin. Hence, the range of residual dipolar splittings is only of the order of tenths of a Hertz. Measurement of splittings with sufficient precision to extract meaningful numbers in this range is a challenge, one met in part by the introduction of intensity-based measuring schemes that will be discussed more thoroughly in Sec. 5. Also, dipolar contributions become similar in size to other field-dependent contributions to splittings, such as dipole–CSA dynamic frequency shifts to be discussed in Sec. 6. Data analysis must therefore be pursued with caution. Despite challenges associated with precise measurement and separation from potential sources of systematic error, meaningful details about the structure of the molecule could still be obtained. First, variations in splittings for various amide ¹⁵N–¹H bonds agreed well with calculations based on the geometry of the X-ray structure and a susceptibility tensor fit to the data. Second, the fit susceptibility tensor agreed
well with one calculated by summing susceptibilities of amide bonds and aromatic rings placed in orientations given by the X-ray structure. And, third, the data could be used to test theoretical predictions of nonplanarity of peptide bonds. Distortions in the direction predicted by theory did improve the fit of the X-ray structure to observations, but only if distortions were held to about one fifth the magnitude predicted. Most significantly, the studies demonstrated the feasibility of conducting measurements on diamagnetic systems, greatly expanding the area of potential future application.

A second application to a diamagnetic system appeared recently (Tjandra et al., 1997). It contains the first real use of residual dipolar data to refine the solution structure of a protein. The system involved the transcription factor GATA-1 complexed to a 16-base-pair DNA double helix. In this case the dominant factor leading to a significant susceptibility anisotropy is the near coplanarity of the bases in the DNA helix. The susceptibility anisotropy is opposite in sign but of about the same magnitude as that observed in myoglobin. Hence, the range of dipolar couplings observed for ¹⁵N–¹H pairs is comparable (about +2 to –1 Hz). The study also included data from directly bonded ¹³Cα–¹Hα pairs. A gyromagnetic ratio which is 2.5 times larger for carbon relative to nitrogen, offset somewhat by the longer internuclear distance, leads to an expectation of a ¹³Cα–¹Hα coupling that is about a factor of 2 larger. However, the enhanced relaxation of the carbon results in a net degradation of precision in measurement. Nevertheless, combining data from ¹⁵N–¹H and ¹³Cα–¹Hα couplings is useful, because these pairs of dipolar interactions in a single residue are seldom collinear and, thus, are highly complementary.

The structure of the GATA-1–DNA complex had been solved previously by conventional NMR methods (Omichinski et al., 1993), and no independent structure determination from dipolar data was attempted. However, simple observation of small, negative, dipolar values for the N–H vectors of the groove-binding segment did allow estimation of an angle of about 60° relative to the DNA helix axis. For a more quantitative treatment, the new dipolar constraints were combined with the original NOE data using an orientational target function (see Sec. 7) to redetermine a structure. Most parts were only marginally affected by inclusion of the dipolar data. However, several backbone torsion angles did move into more favorable regions of a Ramachandran map. Also, there was a significant movement of residues in one loop (about 4 Å for residue 22). This proves to be in a region where there were few NOEs, despite the indicated absence of rapid motions. The result highlights the way dipolar data may complement NOE data in providing indications of relative orientations of remote segments that may not be well connected by short-range NOEs.
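The factor-of-2 expectation for carbon–proton relative to nitrogen–proton couplings can be checked with a short calculation. The sketch below assumes that, for the same orientation and degree of order, the coupling scales as the product of the gyromagnetic ratios divided by the cube of the internuclear distance. The gyromagnetic ratios are standard tabulated values; the bond lengths (1.02 Å and 1.09 Å) are typical values assumed here for illustration and are not taken from the text.

```python
# Relative size of one-bond dipolar couplings, D ~ gamma_I * gamma_S / r**3.
# Gyromagnetic ratios are standard values (rad s^-1 T^-1); the bond lengths
# are typical values assumed for illustration, not taken from the text.
GAMMA_1H = 267.522e6
GAMMA_13C = 67.283e6
GAMMA_15N = -27.126e6

R_NH = 1.02e-10  # assumed amide N-H bond length (m)
R_CH = 1.09e-10  # assumed C-alpha to H-alpha bond length (m)


def relative_dipolar_strength(gamma_i, gamma_s, r):
    """Return |gamma_i * gamma_s| / r**3, which sets the scale of the coupling."""
    return abs(gamma_i * gamma_s) / r**3


def ch_to_nh_ratio():
    """Ratio of C-H to N-H coupling magnitude for identical orientation and order."""
    ch = relative_dipolar_strength(GAMMA_13C, GAMMA_1H, R_CH)
    nh = relative_dipolar_strength(GAMMA_15N, GAMMA_1H, R_NH)
    return ch / nh


if __name__ == "__main__":
    print(f"gamma ratio |13C/15N| = {abs(GAMMA_13C / GAMMA_15N):.2f}")
    print(f"C-H / N-H coupling ratio = {ch_to_nh_ratio():.2f}")
```

The longer C–H bond reduces the gain from the larger carbon gyromagnetic ratio, which is why the net factor is closer to 2 than to 2.5.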
Again, this provides a clear example of a fruitful area for future exploration. In all of the above applications the degree of orientation depended on the inherent anisotropy in magnetic susceptibility of the molecular complex under
study. The resulting small magnitudes of residual dipolar couplings certainly impeded more widespread application. Late in 1997 Tjandra and Bax (1997b; Bax and Tjandra, 1997) provided a very important example of using a liquid crystal medium to amplify levels of orientation for proteins. The use of liquid crystals to
accomplish similar effects with small molecules in solution has, of course, a long history (Emsley and Lindon, 1975). However, the existence of a suitable aqueous liquid crystal medium that could impart moderate degrees of order to proteins without perturbing their structure was not apparent. Dilution of a liquid crystal dispersion of lipid bilayer fragments called bicelles, which had been developed for the study of membrane-associated molecules (Sanders et al., 1994), proved to be the key. The medium is easily prepared from mixtures (about 3:1) of dimyristoyl phosphatidylcholine and dihexanoyl phosphatidylcholine dispersed at 3–8% in aqueous buffer. The residual dipolar couplings seen for ubiquitin rose from tenths of a Hertz in simple solution to tens of Hertz in bicelle media. The initial report suggested that the mechanism of induced orientation was based on collision of the nonspherically shaped protein with the planar bilayer-like surfaces of the oriented bicelles. This mechanism is graphically depicted in Fig. 5. Whether or not this mechanism holds true, a powerful tool for amplification of residual dipolar couplings has been discovered. These and other, yet to be discovered, liquid crystal media will remain powerful tools as long as structures of the proteins are not perturbed by interactions with the media. Since the discovery of the utility of dilute bicelle media, there has
been a flurry of activity resulting in application to other protein and non-protein systems (Prestegard, 1998; Clore et al., 1998a). It is clear that the measurement of anisotropic NMR parameters in solution is now feasible even in macromolecular systems and that a wealth of diverse physical information awaits. For weak alignment, where the magnitude of the observed effects depends on the square of the magnetic field, the increase in magnetic field
strengths has certainly played a critical role in the developments described thus far. Even in the case where orientation is aided by the use of liquid crystal media, increased fields will play a role as collection of useful structural data on larger and larger molecules becomes possible.
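The quadratic field dependence noted above is what allows a dipolar contribution to be separated from a scalar coupling using splittings measured at two fields. A minimal sketch follows, assuming a model in which the splitting is the sum of a field-independent term and a term proportional to the square of the field; the function name and the numbers in the usage note are hypothetical, and any other field-dependent contribution would be folded into the dipolar term.

```python
def separate_scalar_and_dipolar(split_lo, split_hi, field_lo, field_hi):
    """Split two measured splittings (Hz) into a field-independent part and a
    dipolar part that scales as the square of the field.

    Model (an assumption of this sketch): split(B) = J + d * B**2.
    Returns (J, D_lo, D_hi), with D_x the dipolar contribution at each field.
    Fields may be in tesla or in proton MHz; only their ratio matters.
    """
    if field_lo == field_hi:
        raise ValueError("two different fields are required")
    d = (split_hi - split_lo) / (field_hi**2 - field_lo**2)
    dip_lo = d * field_lo**2
    dip_hi = d * field_hi**2
    scalar = split_lo - dip_lo
    return scalar, dip_lo, dip_hi
```

For example, synthetic splittings of 93.8 Hz at 500 MHz and 94.8 Hz at 750 MHz would be decomposed by `separate_scalar_and_dipolar(93.8, 94.8, 500.0, 750.0)` into a field-independent part and dipolar parts at the two fields.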
5. MEASUREMENT OF RESIDUAL DIPOLAR COUPLINGS

The measurement of effects arising due to field-induced order in the solution state obviously presents a significant experimental challenge. At currently available magnetic field strengths, and even for systems chosen specifically for their large susceptibility anisotropy, residual dipolar couplings are expected to be on the order of only a couple of Hertz. To make effective use of these data as structural or dynamical probes, precision of measurement must be on the order of a couple of tenths of a Hertz. Several experiments directed at achieving this precision, or even better precision, have now appeared. Some are specifically designed for the accurate measurement of one-bond couplings in proteins, either amide ¹⁵N–¹H couplings or alpha ¹³Cα–¹Hα couplings. Some of these build on methods for measuring multiple-bond scalar couplings (Biamonti et al., 1994), but are distinct in their emphasis on the very precise measurement of relatively large couplings.

Methods of measurement can be divided into two general categories: frequency-resolved methods, such as J-resolved and ECOSY spectroscopy, where separation of peak centers is measured in a frequency domain (Montelione and Wagner, 1989), and quantitative J-modulation experiments, in which the coupling is extracted from the resonance intensity rather than an experimental splitting (Billeter et al., 1993; Vuister et al., 1993; Vuister and Bax, 1993; Bax et al., 1994). The frequency-resolved and intensity-based methods are somewhat complementary, as they are often subject to different sources of systematic error, a subject which we shall discuss in more detail in what follows. They also differ somewhat in applicability to different types of spin coupling networks: two-spin, three-spin, and higher. We
will limit our discussion to the experiments used in investigation of the three proteins mentioned in the previous section.
5.1. Frequency-Domain Experiments

Frequency-domain experiments are not only conceptually simple but are often less prone to systematic error than resonance-intensity-based experiments. On the
other hand, experiments of this type uniformly suffer from increased complexity as the number of lines in a multiplet increases, and degradation of precision as linewidths increase. Also, they can be time consuming when couplings are measured in the indirect dimension of a two-dimensional experiment. To achieve optimum precision, data must be collected for a period of time on the order of the transverse spin relaxation time, and when peaks are spread over a large frequency range acquisition of many points is required. J-resolved spectroscopy can be employed to improve precision, as field inhomogeneity effects are refocused and the spectral width which must be characterized in the indirect dimension is reduced. A sacrifice is, however, made in the loss of chemical-shift resolution in this dimension.

One experiment used in the myoglobin studies was a modification of 2D J-spectroscopy in which the indirect evolution period is actually a composite of two evolution periods which are incremented accordion style (Bodenhausen and Ernst, 1981). A useful compromise between spin–spin coupling and chemical-shift resolution can be achieved. Since the attainable precision will be limited by the linewidths, it is advantageous to allow the coupling to evolve while transverse magnetization is present on the member of a spin pair with the most favorable relaxation properties. In the systems of interest here, this is frequently ¹⁵N, and the coupling of interest is the one-bond amide ¹⁵N–¹H coupling. The experiment used resembles a standard HSQC experiment and is called a selective-coupling-enhanced HSQC (SCE-HSQC) experiment (Tolman and Prestegard, 1996a). The pulse sequence is shown in Fig. 6. There are, however, a few deviations from a standard HSQC. First, the evolution period is divided into two distinct parts. During the first part, chemical shift and coupling are allowed to evolve exactly as in an HSQC experiment, except that the 180° pulse to protons normally used to remove coupling is omitted. The dwell time employed should be short enough to span the frequencies present and hence is dictated by the
chemical-shift dispersion. During the second part, a period of adjustable duration, only pure coupling evolution occurs. Removal of chemical shift and preservation of coupling is accomplished by the inversion of both the ¹⁵N and ¹H spins near the midpoint of this second period. Gradients of equal magnitude and duration are applied on both sides of the 180° pulse to ensure that only transverse magnetization which experiences the inversion is preserved (Bax and Pochapsky, 1992). The amount of additional coupling evolution can be varied by modifying the value of the scaling parameter that sets this duration. The resulting spectrum is much like that obtained with a simple coupled HSQC except that the apparent coupling will be larger.

Second, multiple-bond couplings involving the nitrogen are removed. These couplings, which are on the order of a few Hertz, will generally cause an increase in the apparent linewidth for a protein sample and will therefore degrade the accuracy with which the line positions can be determined. Removal of these couplings is accomplished by insertion, at the juncture of the two evolution domains, of a frequency-selective pulse (Emsley and Bodenhausen, 1990) which inverts resonances in the region of the spectrum containing alpha and beta protons of amino acid residues. This leads to a complete refocusing of the effects of their coupling to the amide ¹⁵N spin in the case where the scaling parameter is set to 0.5.

There are some unexpected additional advantages in the introduction of a pure coupling evolution period. In particular, the effects of cross correlation between chemical-shift anisotropy (CSA) and dipole–dipole (DD) interactions are suppressed during this period (see Sec. 6). The effectively simultaneous 180° pulses to ¹⁵N and ¹H retain the sign of the dipole interaction for a given component of a doublet but invert the sign of the CSA interaction. Therefore, cross-correlation effects before and after the 180° pulses cancel.
Extraction of maximum information from frequency-domain spectra requires a more sophisticated treatment than simple measurement of the separation of peak maxima in a column of an HSQC spectrum. In the initial application to myoglobin,
methodology introduced by Garrett et al. (1991) was used. Here, peak centers are found by fitting two-dimensional data to a constrained set of ellipses at several contour levels. In effect, this makes use of data spread over a range of columns and places some restrictions on line-shape variations. Using this approach a precision
of approximately 0.1 Hz was obtained for spectra acquired in about 15 h on a 5-mM sample using points spread over 256 ms. The issue of how much precision can be obtained from frequency-domain spectra and how the data can be most efficiently acquired is actually a complex one. Common notions about digital resolution and improving resolution by lengthening acquisition times do not apply when analysis goes beyond simple measurement of
peak maxima positions in data treated with a fast Fourier transform. The simple approach is limited by failure to use all data acquired, by limitation of possible frequencies to a set of predetermined discrete values, and by neglect of other knowledge about line shapes, amplitude, and phases.
Other well-established procedures based on least-squares fits to more detailed models can be used. If we know that our signal is not of fixed amplitude but exponentially decaying with rate α, we might consider using an exponentially damped sinusoidal model. In this case, we can replace cos(ωt) and sin(ωt) in a typical Fourier transform representation with exp(–αt)cos(ωt) and exp(–αt)sin(ωt). The least-squares estimate for ω is then the value of ω which maximizes the real part of the discrete Fourier transform of the “matched filter”-apodized FID. This result has been derived within the context of Bayesian parameter estimation by Bretthorst (1990a, 1990b). If α is unknown, then both ω and α can be estimated by minimizing the residual sum of squares or, equivalently, by maximizing the projection of the model function onto the data. Methods for calculating this projection include the method of Tang and Morris (1986), the “fast maximum likelihood estimation” method of Umesh and Tufts (1996), or a method based on generalization of the Clenshaw recurrence relation for finite sums as described by Andrec and Prestegard (1997). Alternatively, one could use more direct forms of parameter estimation such as time-domain linear prediction (de Beer et al., 1988), frequency-domain least squares (Martin, 1994), or time-domain maximum likelihood (Chylla and Markley, 1995).
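The matched-filter estimate described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the procedure used in the cited studies: the FID is a single complex damped sinusoid of unit amplitude and zero phase, the decay rate is taken as known, and the maximization is done by a simple grid search rather than by any of the faster methods cited. All function names and parameter values are hypothetical.

```python
import cmath
import math
import random


def simulate_fid(freq_hz, decay, n_points, dwell, noise_sd, seed=0):
    """Complex FID: one damped sinusoid of unit amplitude and zero phase plus noise."""
    rng = random.Random(seed)
    fid = []
    for k in range(n_points):
        t = k * dwell
        signal = cmath.exp((2j * math.pi * freq_hz - decay) * t)
        noise = complex(rng.gauss(0.0, noise_sd), rng.gauss(0.0, noise_sd))
        fid.append(signal + noise)
    return fid


def matched_filter_estimate(fid, decay, dwell, f_min, f_max, step):
    """Grid search for the frequency maximizing Re{DFT of exp(-decay*t) * FID}."""
    best_freq, best_value = f_min, -math.inf
    n_steps = int(round((f_max - f_min) / step))
    for i in range(n_steps + 1):
        f = f_min + i * step
        total = 0.0
        for k, point in enumerate(fid):
            t = k * dwell
            weight = math.exp(-decay * t)  # matched-filter apodization
            total += (point * weight * cmath.exp(-2j * math.pi * f * t)).real
        if total > best_value:
            best_freq, best_value = f, total
    return best_freq


if __name__ == "__main__":
    fid = simulate_fid(50.0, 30.0, 256, 0.0004, 0.005, seed=1)
    print(matched_filter_estimate(fid, 30.0, 0.0004, 45.0, 55.0, 0.01))
```

Because the search grid can be made arbitrarily fine, the estimate is not restricted to the discrete frequencies of a fast Fourier transform, which is the point made in the text about digital resolution.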
The precision of a parameter estimate is in general a nontrivial function of the form of the model, the data sampling pattern, the signal-to-noise ratio of the data, and the parameter estimation method. It is possible, however, to derive a lower bound on the variance of a parameter estimate, known as the Cramér–Rao lower bound (CRLB); no unbiased parameter estimator can achieve a variance below this bound (Kiefer, 1987; van den Bos, 1982). It can be shown that if the noise consists of independent, identically distributed normal random variables, then the variance of the least-squares parameter estimate will approach the CRLB as the number of data points increases (van den Bos, 1982). Considering a linearly sampled exponentially damped sinusoidal model containing one signal of amplitude B, center frequency ω, and decay rate α, with additive Gaussian noise of mean zero and variance σ², it can be shown that the CRLB for the variance of the estimate of the center frequency is a function of the signal-to-noise ratio B/σ, the decay rate α, the number of data points N, and the dwell time Δt.
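A commonly quoted form of this bound for a single damped complex sinusoid can be evaluated numerically. The sketch below is not necessarily the expression used by the authors: it assumes complex (quadrature) data, noise of equal variance in the real and imaginary channels, a signal-to-noise ratio defined as the amplitude divided by the per-channel noise standard deviation, and amplitude, phase, frequency, and decay rate all unknown. Under those assumptions the Fisher information separates into a frequency/phase block, and the frequency variance follows from sums over the squared signal envelope. The function name is hypothetical.

```python
import math


def crlb_frequency_sd_hz(snr, decay, n_points, dwell):
    """Cramer-Rao lower bound on the standard deviation (Hz) of the frequency
    of one damped complex sinusoid, with amplitude, phase, frequency, and
    decay rate all treated as unknown.

    Assumptions of this sketch: complex (quadrature) data, noise of equal
    variance in the real and imaginary channels, and snr = amplitude divided
    by the per-channel noise standard deviation. decay is in s^-1.
    """
    s0 = s1 = s2 = 0.0
    for k in range(n_points):
        w = math.exp(-2.0 * decay * k * dwell)  # squared signal envelope
        s0 += w
        s1 += k * w
        s2 += k * k * w
    var_omega = s0 / (snr**2 * dwell**2 * (s0 * s2 - s1**2))  # (rad/s)^2
    return math.sqrt(var_omega) / (2.0 * math.pi)


if __name__ == "__main__":
    # Illustrative values: 256 points, S/N 200:1, decay 30 s^-1, 0.4-ms dwell.
    print(crlb_frequency_sd_hz(200.0, 30.0, 256, 0.0004))
```

With the illustrative values in the usage lines (the dwell time and the units of the decay rate are assumptions), the bound comes out below 0.01 Hz, and it scales inversely with the signal-to-noise ratio.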
The dependence of the square root of the CRLB on digital resolution for various decay rates is shown in Fig. 7 for a 256-point FID with a signal-to-noise ratio of 200:1.
It is clear that for a decay rate of 30 Hz, the standard deviation of the center-frequency estimate (i.e., the square root of the CRLB) can be below 0.01 Hz. As might be expected, the standard deviation deteriorates as the line gets broader and as the digital resolution gets poorer (i.e., at larger Hz/point values). But simply acquiring more time-domain points to improve digital resolution is not a solution. Since our signal is decaying, the later points of the FID contain less signal than the earlier ones and so contribute less to our ability to estimate the center frequency. There is, therefore, a steep increase in the CRLB at very small Hz/point values. Optimal experiments have an acquisition time on the order of the transverse relaxation time. Although 0.01-Hz values for a standard deviation may seem extremely small to many NMR spectroscopists, computational error estimation using linear predictive methods routinely finds variances in the parameter estimates approaching the CRLB for exponentially decaying models (see, for example, Totz et al., 1997; van Huffel, 1993). Signal-to-noise ratios for data typically analyzed seldom exceed 100:1; but
it is still not unreasonable to conclude that estimation of center frequencies of well-resolved Lorentzians to precisions of 0.1 Hz is within the capabilities of sophisticated data analysis methodologies.
5.2. Intensity-Based Experiments

Quantitative J-correlation experiments overcome some of the complexities of frequency-domain analysis by encoding coupling in an easily quantitated resonance intensity. Resolution only needs to be adequate for chemical-shift resolution, and intensities can be measured from a single decoupled resonance rather than dealing with signal spread over multiplet components. Provided that systematic errors can be adequately controlled, couplings can be measured with very high precision using quantitative J-correlation spectroscopy. The principle underlying quantitative J-type experiments is to pass the observed signal through a period in which the intensity is modulated by a known function of the spin–spin coupling (Billeter et al., 1993; Bax et al., 1994; Vuister et al., 1993; Vuister and Bax, 1993). The experiment used by Tjandra et al. (1996) in their studies on ubiquitin typifies this class of experiment.
Again the experiment is a modification of the regular HSQC pulse scheme where an additional dephasing period is included (Fig. 8). Since only coupling evolution occurs during this period, the intensity of the signal at the end of the period will be modulated by the ¹⁵N–¹H coupling. To a first approximation, the intensity of the ¹H–¹⁵N correlation using this pulse scheme is given by Eq. (17). A set of 2D spectra is then collected centered around durations of this period which maximize the dependence of the resonance intensity on the coupling. Maximum dependency of the intensity on the coupling can be achieved when the duration is simultaneously close to a zero crossing of the coupling modulation, indexed by N, N being an integer, and comparable to the transverse relaxation time. Values of the one-bond ¹⁵N–¹H scalar coupling in proteins are fairly uniform, so variations in choice of duration from system to system will depend largely on relaxation rates. Usually it is advisable to use an odd value and an even value of N to compensate for incomplete inversion. In the application to ubiquitin, one odd and one even value of N were used, and a set of durations centered around the corresponding values was sampled. The HSQC intensities can be fit to Eq. (17) modified to allow for some unmodulated magnetization. The values of the coupling obtained using this procedure are highly reproducible, and in the study of ubiquitin the pairwise RMSD between successive measurements of the coupling was 0.031 and 0.015 Hz at 360 and 600 MHz, respectively.

The methodology used in later work on myoglobin is also intensity based (Tolman and Prestegard, 1996b). Like the above experiment, it depends on the efficiency of transfer of magnetization from one member of a coupled set to another in a typical HSQC experiment. Maximum efficiency is normally accomplished by matching a fixed coupling evolution time, T, to 1/[2(J + D)]. In the normal experiment, part of the signal is lost when the time is not well matched. As illustrated above, simple intensity decreases can be used to evaluate the degree of mismatch and, hence, the value of J + D. However, the signal normally lost can also be recovered and used to calculate a phase error due to over- or underprecession during T. This phase error is a function of J + D and can be used to evaluate the coupling. This experiment is similar in some respects to experiments such as the HNHA, in which the referencing is accomplished by observation of a pair of resonances which are cosine and sine modulated, respectively, by the coupling (Vuister and Bax, 1993). Measurements of the intensities of the cosine- and sine-modulated components can be related to the angular deviation of coupling evolution. The experiment which was developed for the measurement of one-bond ¹⁵N–¹H couplings is shown in Fig. 9. It is essentially a modification of a
simple constant-time (CT) HSQC experiment (Vuister and Bax, 1992) where the normal reverse INEPT has been replaced by a sequence of pulses which appear much like those in the sensitivity-enhanced experiments originally developed by Palmer et al. (1991) and then extended to gradient-selected experiments by Kay et al. (1992). The reason for the development of these latter experiments was primarily a matter of sensitivity enhancement. In contrast to more conventional schemes which retain a projection of signal on a single axis when coherence is sampled at the end of an evolution period (States et al., 1982), the new schemes retain signal on both axes. This gains a factor of √2 in sensitivity. It also preserves both the cosine- and sine-modulated signal components resulting from chemical-shift evolution during t₁. The experiment used in the myoglobin work is similar, except that the phases of the pulses in the reverse INEPT period have been modified such that signal components which are cosine and sine modulated by coupling evolution during T, rather than by chemical-shift evolution, are retained. Water-selective pulses (short crosshatched) have also been added to place the water magnetization along the z axis at the start of the constant-time-evolution period (Grzesiek and Bax, 1993). The phase of one of the 90° pulses is responsible for ultimately separating the cosine- and sine-modulated terms. Normally, only one of these terms is retained. Changing this phase will invert one signal component while leaving the other unperturbed. Hence, the two parts can be encoded differently in two separately stored types of data. Quadrature in t₁ is obtained using the method of States, which leads to observation of a signal which is amplitude modulated, due to chemical-shift evolution, and phase modulated by the coupling evolution which occurred during T. To separate the phase and amplitude information, a total of four FIDs must be acquired for each t₁ increment, two for each quadrature component acquired by the method of States. The sum and difference of this pair of FIDs are taken and stored in two separate data matrices. Standard 2D Fourier transformation of both data sets results in a pair of HSQC spectra in which the signal intensities are modulated by the cosine and by the sine, respectively, of the angular deviation of the coupling evolution from that of the tuned coupling. From the measured volumes of corresponding peaks in the two spectra, this deviation is then computed.
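One way such a computation could be carried out is sketched here. It is an illustration under an assumed model, not the published expression: the two peak volumes are taken to be proportional to the cosine and sine of a phase angle, with a common scale factor, and the phase angle is taken to be π times the difference between the actual and tuned couplings times the transfer time T. The function name and arguments are hypothetical.

```python
import math


def coupling_from_volumes(v_cos, v_sin, tuned_coupling, transfer_time):
    """Recover J + D (Hz) from cosine- and sine-modulated peak volumes.

    Assumed model: v_cos ~ cos(phi) and v_sin ~ sin(phi) with a common
    scale factor, and phi = pi * (J + D - tuned_coupling) * transfer_time.
    """
    phi = math.atan2(v_sin, v_cos)
    return tuned_coupling + phi / (math.pi * transfer_time)


if __name__ == "__main__":
    tuned = 93.0                   # Hz, hypothetical tuned coupling
    t_transfer = 1.0 / (2.0 * tuned)
    phi = math.pi * (94.3 - tuned) * t_transfer
    print(coupling_from_volumes(math.cos(phi), math.sin(phi), tuned, t_transfer))
```

Using the ratio of the two volumes through `atan2` removes the common scale factor, so the result does not depend on absolute peak intensity, only on how the signal divides between the two spectra.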
For intensity-based experiments, systematic errors must be carefully considered. Possible contributing errors include spectrometer timing errors and corrections for contributions of finite pulse widths to delays, especially since data from different spectrometers need to be compared in using field dependence to separate dipolar and scalar contributions to splittings. Imperfections in excitation profiles
from selective pulses such as the pulse used to remove the effects of remote couplings also need to be considered, as do effects of differential relaxation of the cosine- and sine-modulated components. These issues have been discussed in some detail in the original publications. Assuming that one can minimize systematic contributions to errors, the accuracy achieved in a fixed period of time can be superior to that of frequency-domain experiments such as the SCE-HSQC.
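The choice of durations in a J-modulation measurement of the kind described in this section can be illustrated with a simple model. The sketch assumes an intensity of the form cos(πJτ)·exp(–τ/T2) for a total modulation time τ; this functional form, the function names, and the values of 93 Hz and 50 ms in the usage lines are assumptions for illustration only. Under this model the derivative of the intensity with respect to J is largest near the zero crossings τ = (2N + 1)/(2J) and, among those, near τ ≈ T2.

```python
import math


def modulated_intensity(tau, coupling, t2, i0=1.0):
    """Assumed model: I(tau) = i0 * cos(pi * J * tau) * exp(-tau / T2)."""
    return i0 * math.cos(math.pi * coupling * tau) * math.exp(-tau / t2)


def sensitivity_to_coupling(tau, coupling, t2, i0=1.0):
    """|dI/dJ| for the model above."""
    return abs(i0 * math.pi * tau * math.sin(math.pi * coupling * tau)
               * math.exp(-tau / t2))


def zero_crossing_durations(coupling, t2, count=2):
    """Return (N, tau) pairs with tau = (2N + 1) / (2J) closest to T2."""
    n_max = int(coupling * t2 * 4) + 2
    candidates = [(n, (2 * n + 1) / (2.0 * coupling)) for n in range(n_max)]
    candidates.sort(key=lambda pair: abs(pair[1] - t2))
    return sorted(candidates[:count])


if __name__ == "__main__":
    for n, tau in zero_crossing_durations(93.0, 0.050):
        print(n, round(tau * 1e3, 2), "ms")
```

Selecting the two zero crossings nearest T2 naturally yields one even and one odd value of N, in line with the advice in the text to use one of each.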
6. OTHER CONTRIBUTIONS TO MULTIPLET SPLITTINGS

Even given a complete success in optimizing the precision of measurement, there are hazards in the interpretation of small deviations of multiplet splittings from expectations based on scalar couplings. When one begins to talk about contributions on the order of 0.1 Hz, we must question whether scalar couplings and residual dipolar couplings are the only contributions present. When we rely on the predicted field dependence of residual dipolar splittings to separate them from other contributions, we rely on the assumption that other contributions, including scalar couplings, are not field dependent. We will consider here some of the contributions that have been identified to date and whether they are likely to be field dependent.
6.1. Effects of Transverse Relaxation

One of the better-known sources of multiplet splitting modification arises from the fact that proton spin relaxation will interchange magnetization associated with the two lines of a doublet (such as those arising from our amide ¹⁵N nuclei) (Abragam, 1961). These effects are severe only when linewidths approach the magnitude of multiplet splittings. Tjandra et al. have estimated the contribution for a doublet in a protein the size of ubiquitin to be only 0.001 Hz, a value that should not be of concern for measurements discussed here (Tjandra et al., 1996).

There are, however, more subtle effects of transverse relaxation that are associated with distortions of line shapes by cross-correlation effects. Cross correlation arises when multiple interactions contribute to spin relaxation. When the modulation of the interactions is correlated, it is not correct to say that relaxation is proportional to the sum of the squares of the interactions; there is a signed correction term that can add to or subtract from different lines of a multiplet (Werbelow and London, 1995). An example of this effect is readily apparent in peak shapes of doublet components in the coupled HSQC spectrum of Fig. 3, and the columns through such doublets as shown in Fig. 4, where one line of the doublet is wider than the other. The two interactions in this case are the nitrogen chemical-shift anisotropy (CSA) interaction and the proton–nitrogen dipole–dipole (DD) interaction. The
CSA interaction has the same sign for both doublet components, while the DD term has the opposite sign for lines associated with α spins and β spins. In the case of this simple doublet the distortion will not lead directly to errors in measurement of splitting. However, as multiplets become more complex, distortions similar to this can cause errors. Consider the case where we might measure a splitting from the proton resonance from one spin of a methylene group (Fig. 9). The two protons of the group are coupled, but the coupling is small and may be poorly resolved. Here, there are both ¹H–¹H and ¹H–¹³C dipolar relaxation interactions, and they are cross correlated. When the nonobserved proton and nonobserved carbon have the same spin state, the cross-correlation term will have one sign; when they have opposite spin states, it will have the opposite sign. One possible broadening pattern is shown in Fig. 11. It is clear that if splittings are measured from peak separations, the differential broadening will contribute to the systematic error. The distortions are not inherently field dependent, but if any other field-dependent contributions to the underlying linewidths exist, the errors will become field dependent and will contribute to difficulties in measuring dipolar contributions. Similar effects have been noted in the literature (Tjandra and Bax, 1997).

6.2. Dynamic Frequency Shifts
Line splittings can be directly affected by cross correlation through a phenomenon often referred to as the second-order dynamic frequency shift. When molecular reorientation is slow compared to the inverse Larmor frequency, the fluctuating fields responsible for causing relaxation can also contribute a very small static
component to the much larger static magnetic field, leading to a small shift in the frequency of precession (Redfield, 1965). This is perhaps most easily understood by considering the two limiting cases of very fast and very slow rates of molecular reorientation. The fluctuating interaction causing relaxation can be thought of as producing a small local field, which is experienced by the relaxing spin. In the event that molecular reorientation is very fast, the effective field is given by
But, by definition,
, and hence there is no resulting frequency shift.
However, in the case of slow molecular reorientation, the spin may precess many times at each instantaneous orientation, so the resultant effective magnetic field
experienced by the spin is The requirement that have a zero time average means that it must fluctuate randomly, and thus the resulting magnitude of the static field can be expressed as
Protein Structure and Dynamics from Dipolar Couplings
For a local field much smaller than the static field, $B_{loc} \ll B_0$, the result is a small offset, $\langle B_{loc}^2 \rangle / 2B_0$. More formally, the dynamic frequency shift arises from the imaginary part of the Fourier transform of the angular correlation function (Werbelow and London, 1995; Grzesiek and Bax, 1994). These effects are normally very small, usually less than a tenth of a Hertz offset for a 5-kHz interaction at 500 MHz. It is in practice very hard to measure these effects as an absolute offset of a resonance. However, when cross-correlation effects exist and one measures splittings of a doublet, they are accentuated by the two lines moving in opposite directions. In essence, they produce a contribution to the scalar coupling constant. This contribution can be expressed as
where and are the dipole–dipole and CSA interaction constants, respectively, and is the imaginary part of the corresponding spectral density function. Using Eq. (21) we can calculate the expected contributions to the measured couplings at fields of 11.7 and 17.6 T, and the contribution that would be extracted
and interpreted as a dipolar coupling by using the expected field dependence. Note that the dynamic frequency shift is inherently field dependent through both the CSA interaction constant and the imaginary part of the spectral density. For large molecules, the imaginary part of the spectral density is inversely proportional to the field and the CSA interaction constant is directly proportional to the field; the effects cancel. Hence, the effects on the measured coupling for myoglobin are significant, 0.5 Hz at 11.7 T; but the field-dependent contribution extracted by comparing splittings at 11.7 T and 17.6 T is just 0.05 Hz. For ubiquitin, a smaller molecule, the total contribution is similar, but the part extracted from the field dependence is larger, 0.14 Hz. Fortunately these effects are quite uniform for all amide bonds, and Tjandra et al. have been able to correct measured values for ubiquitin by applying a constant calculated correction term (Tjandra et al., 1996).
There is also the possibility of dynamic frequency shifts which arise from cross correlation between other pairs of interactions. For the myoglobin work, it is important to consider cross correlation between nuclear dipole–nuclear dipole interactions and Curie spin–nuclear dipole interactions due to the unpaired electrons. The interactions again prove to be insignificant for myoglobin since the electronic spin state is S = 1/2 and the magnitude of the Curie interaction depends on S(S + 1) and $r^{-3}$, where r is the electron–nucleus distance. However, these interactions could become important in other paramagnetic systems with higher electron spin states.
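As a rough numerical check on the size of the offset quoted in this section, the following minimal sketch evaluates the quadrature picture, in which a small fluctuating field perpendicular to the static field adds as a root sum of squares. Working in frequency units and the function names are assumptions made for illustration; this is an order-of-magnitude estimate, not the formal spectral-density treatment.

```python
import math


def quadrature_offset_hz(interaction_hz: float, larmor_hz: float) -> float:
    """Exact offset sqrt(nu0^2 + nu_loc^2) - nu0, in Hz (illustrative model)."""
    return math.hypot(larmor_hz, interaction_hz) - larmor_hz


def quadrature_offset_approx_hz(interaction_hz: float, larmor_hz: float) -> float:
    """Leading-order offset nu_loc^2 / (2 * nu0), valid when nu_loc << nu0."""
    return interaction_hz ** 2 / (2.0 * larmor_hz)


# A 5-kHz interaction at a 500-MHz Larmor frequency, as quoted in the text.
offset = quadrature_offset_approx_hz(5.0e3, 500.0e6)
print(f"{offset:.3f} Hz")  # prints "0.025 Hz"
```

The result, a few hundredths of a Hertz, is consistent with the statement that these offsets are usually less than a tenth of a Hertz.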
7. STRUCTURE DETERMINATION PROTOCOLS
Assuming that a high precision measurement can be achieved and sources of systematic errors eliminated, residual dipolar couplings clearly offer potential as
structural constraints. Suppose that one could measure residual dipolar contributions for directly bonded pairs with an accuracy of just when the residual dipolar contributions ranged from 2.2 to –1.8 Hz (this is approximately what was done for the protein cyanometmyoglobin). Then, since the full range of dipolar splittings can be spanned in reorienting the N–H vector by 90°, this
translates to an angular precision of approximately for orientation of an N–H bond vector. This corresponds to a precision in proton placement that is unsurpassed by most other structural methods. As discussed in the introduction, this information also has special appeal in that it restricts relative orientations of parts of molecules without regard to spatial proximity. This makes the information arising from residual dipolar couplings highly complementary to NOE-based distance constraints.
Perhaps the simplest way of incorporating information in a structure calculation is by means of an error function (Nilges et al., 1988). One simply adds for each measured dipolar contribution a penalty function which is the weighted (W) square of the difference between the experimental splitting and the splitting calculated using Eq. (13) for a trial molecular structure:
$$E_{dip} = W \left( \delta_{exp} - \delta_{calc} \right)^2 \qquad (22)$$
This energy is added to the normal molecular and NOE distance-constraint energies used in simulated annealing or minimization protocols. This strategy has been employed by Tjandra et al. in their work on the GATA-1 protein (Tjandra et al., 1997). It has also been used in work on myoglobin, as described below.
Application of this seemingly simple protocol is, however, not without challenges. Calculation of the expected splitting using Eq. (13) depends on the axial and rhombic anisotropies of the susceptibility tensor and on the orientation of this tensor in a molecular frame. These parameters depend on a structure which is not known at the onset of the search. One could imagine including these parameters as a part of a general structure search, but this complicates an admirably simple protocol. Tjandra and co-workers have suggested a reasonable compromise: assume that, to a first approximation, the tensor is axially symmetric and that there are sufficient numbers of randomly oriented vectors to have at least one coincide with the symmetry axis of the tensor (Tjandra et al., 1997). The N–H pair corresponding to this vector will then be the one with the largest residual dipolar splitting, and it can be used to define a molecular orientation axis and calculate a value for the axial anisotropy.
In other cases it may be clear that one particular group within the molecule dominates the susceptibility anisotropy, and this group can be used to define an orientation axis. Myoglobin is a case in point. Here, the paramagnetic part of the susceptibility tensor associated with the unpaired electron spin on the heme iron
clearly dominates, and the susceptibility can be calculated from a knowledge of electronic structure of suitable model compounds, or it can be calculated from complementary data such as the g anisotropy of EPR spectra (Hori, 1971). In cases where the structure around the paramagnetic center can be assumed known, as it might be for a highly conserved domain where the peripheral structural perturbations are of interest, it may be possible to calculate the paramagnetic portion of the tensor from perturbations of NMR chemical shifts. These perturbations are called dipolar or pseudocontact shifts. Measurement of the paramagnetic susceptibility anisotropy in this way has an additional advantage: residual dipolar interactions and pseudocontact shifts have an identical angular dependence. This means that the effects of averaging arising from time dependence of the angular terms will be the same, and an appropriate effective tensor will be calculated. To proceed with the calculation, both paramagnetic and diamagnetic species must be prepared and the chemical shifts of assigned peaks measured. La Mar and co-workers have done this, using a set of protons assumed to be in positions consistent with the X-ray structure of myoglobin (Rajarathnam et al., 1992). Tolman and Prestegard have extended this measurement to include most of the assigned 15N–1H pairs in cyanometmyoglobin. Essentially the same paramagnetic susceptibility tensor is found. These results are summarized in Table 1.
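Returning to the penalty-function approach introduced earlier in this section, the following is a minimal sketch of how a weighted squared-difference term over a set of measured splittings might be evaluated. The function names, the summation over all measured pairs, and the geometric weight ramp are illustrative assumptions and are not taken from XPLOR or any other package; the calculated splittings are assumed to be supplied by a separate routine.

```python
from typing import Sequence


def dipolar_penalty(observed_hz: Sequence[float],
                    calculated_hz: Sequence[float],
                    weight: float) -> float:
    """Sum of weight * (observed - calculated)^2 over all measured splittings."""
    if len(observed_hz) != len(calculated_hz):
        raise ValueError("observed and calculated lists must have equal length")
    return weight * sum((o - c) ** 2 for o, c in zip(observed_hz, calculated_hz))


def ramped_weight(step: int, n_steps: int, w_start: float, w_end: float) -> float:
    """Hypothetical geometric ramp of the weight from w_start to w_end."""
    if n_steps < 2:
        return w_end
    fraction = step / (n_steps - 1)
    return w_start * (w_end / w_start) ** fraction


# Two splittings (Hz) compared with values predicted for a trial structure.
energy = dipolar_penalty([2.2, -1.8], [2.0, -1.5], weight=1.0)
```

In a simulated annealing run this term would simply be added to the other energy terms, with the weight typically raised as the calculation proceeds.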
Unfortunately, diamagnetic contributions are not insignificant, and these must be added if a high level of accuracy is to be achieved. The complete diamagnetic correction can be computed by assuming a structure and summing over all groups and bonds within the protein which have a significant susceptibility anisotropy. The group anisotropies typically employed are given in Table 2 (Tjandra et al., 1996; Rajarathnam et al., 1992), and the net correction to the total susceptibility, obtained for myoglobin assuming the X-ray structure, is given in Table 1. Note that the diamagnetic contribution to the susceptibility anisotropy is dominated by the rigid and well-structured heme ring for myoglobin, so that a reasonable approximation of the susceptibility tensor might be possible based on
the heme ring and shifts of protons very near this ring without a complete knowledge of the protein structure. There are also cases among diamagnetic proteins where the susceptibility anisotropy is dominated by a single structured entity. This is illustrated in early applications to a porphyrin complex (Lisicki et al., 1988) and in recent applications to molecules bound to DNA (Tjandra et al., 1997). The susceptibility anisotropy of DNA is in fact quite large (Bastiaan and MacLean, 1990), and for a dodecamer it is comparable to that for cyanometmyoglobin. Values used in estimating DNA anisotropy have also been included in Table 2.
With reasonable estimates of the susceptibility tensor and accurate measurements of residual dipolar coupling, it is possible to proceed with a search for a structure using Eq. (22). As mentioned, refinement of structures for which other structural information exists (such as a complete set of NOE-based distance constraints) is straightforward. The results on the GATA-1–DNA complex are encouraging. However, we focus here on a slightly more ambitious question: can a protein structure be found using only the short-range NOE data that define the secondary structure and the orientational constraints from backbone residual dipolar couplings? The question is of some interest because this is the type of information that one might anticipate getting from a large protein using only 15N labeling, or from a protein that is labeled with 13C and 15N for backbone assignment purposes but proves to be too large to carry out side-chain assignments. For example, cyanometmyoglobin consists of eight α-helices folded about a central heme prosthetic group. These are easily identified in NOE spectra by the pattern of strong sequential i to i + 1 HN–HN NOEs and the intermediate i to i + 3 Hα–HN NOEs.
As a test case, a portion of the myoglobin molecule was selected, helices F–H. The helical residues as listed in the X-ray PDB file were held perfectly rigid, with the intervening loops allowed to move as dictated by the OPLS all-atom force field. A starting structure having a nearly linear arrangement of helices F–H was chosen, and simulated annealing was undertaken
using a torsion-angle dynamics version of XPLOR (Stein et al., 1996). As the simulation proceeded, more and more synthetic N–H dipolar constraints, calculated from the X-ray structure, were added and the weighting factor was likewise increased. A penalty which rose with distance from the center of mass was also used so that only compact structures would be produced. The results are shown in Fig. 12. It is clear that an arrangement of helices that closely mimics the reference crystal structure can be reproduced. There are some imperfections, however, particularly in the loops connecting the helices. On close examination, one finds that the orientations of some of the N–H vectors in the loops are parallel to the corresponding vectors in the reference structure; it is just that the directions are mirror-image orientations with respect to the magnetic coordinate system. This points to one fundamental limitation of orientational constraints: the mapping from splittings to orientations is not unique. In fact, there is usually a whole range of angles that can reproduce a measured splitting. In the case of a rigid structure with known susceptibility, one needs three noncollinear vectors belonging to a single rigid entity to find a solution that is degenerate only with respect to
inversion. In the above case we reach this number of vectors for helices, but only because N–H vectors are not quite parallel to the average helix axis. For loops it is unwise to assume rigid structures, and the geometry will be underdetermined using just N–H vectors. Even in the case of the helices, reversal of direction is always possible. For three helices this leads to $2^3 = 8$ possible folding patterns. So, even in this deliberately idealized test case, structures of the form depicted in Fig. 12 are found only about 10% of the time. Hence, it is likely that a small amount of longer-range helix–helix contact data will be necessary to make a final selection of the correct structure. These data could be provided by NH–NH NOEs in deuterated proteins.
Procedures for refining protein structures using a mixture of dipolar and NOE-derived constraints are evolving very rapidly. Some very recent advances have incorporated dipolar constraints in a simulated annealing protocol along with a representation of the principal frame of the susceptibility tensor. Both molecular structure and orientation of the frame are optimized in the procedure (Clore et al., 1998b). The procedure also makes use of a statistical analysis of observed residual couplings to estimate both asymmetry and rhombicity of the susceptibility (alignment) tensor (Clore et al., 1998a).
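The counting argument for helix reversal can be made concrete with a short sketch. It enumerates every forward/reversed combination for three helices and checks that the second-rank angular factor is unchanged when a vector is inverted, which is why the splittings alone cannot distinguish the patterns. The helix labels and function names are illustrative assumptions.

```python
from itertools import product

HELICES = ("F", "G", "H")  # the three helices of the test case


def direction_patterns(helices=HELICES):
    """All assignments of forward (+1) or reversed (-1) direction to each helix."""
    return [dict(zip(helices, signs))
            for signs in product((+1, -1), repeat=len(helices))]


def p2(cos_theta: float) -> float:
    """Second-rank angular factor (3 cos^2(theta) - 1) / 2."""
    return 0.5 * (3.0 * cos_theta ** 2 - 1.0)


patterns = direction_patterns()
print(len(patterns))  # prints 8

# Inverting a vector flips the sign of cos(theta) but leaves the factor unchanged.
assert p2(0.3) == p2(-0.3)
```

If all eight patterns were equally likely, the correct one would be found one time in eight (12.5%), which is of the same order as the roughly 10% success rate quoted above.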
8. THE EFFECTS OF MOLECULAR MOTION AND THEIR SEPARATION
8.1. The Cone-and-Arc Model
The above search for molecular structure is predicated on the assumption that a single rigid structure can accurately reproduce the observed data. For NOE-based structures this seems to be largely the case, but this may not be so for structures based on orientational constraints from dipolar data. For NOE-based structures it turns out that models having high-frequency uncorrelated local motions that are centrosymmetric about an average atomic position are degenerate with models that have atoms rigidly held at their average positions (LeMaster et al., 1988). This degeneracy is due to compensation of distance-averaging effects by angular-averaging effects. The particular conditions for compensation are not likely to hold for orientational data from directly bonded pairs.
In principle, one can exclude data that may be compromised by motional averaging. This is a procedure that Tjandra et al. (1997) have used. Spin relaxation determination of order parameters for backbone sites was used to exclude data on sites with order parameters less than 0.77. Conventional spin-relaxation-based procedures for assessing the amplitude of fast motions in conventional-solution NMR studies may unfortunately be inadequate because these procedures are not sensitive to motions on time scales much longer than the correlation time for molecular tumbling. In principle, an entire range of motions, including motions out
to the millisecond time scale, can lead to reduction in the magnitude of residual
dipolar couplings. Myoglobin should prove a good benchmark for evaluating the possible influence of internal motions. It is widely accepted that rigid models are good representations of the structure of this protein. However, there is also adequate evidence for structural rearrangements, particularly those necessary to allow binding and release of oxygen from the central heme. Tolman et al. (1997) reported an investigation which compared predicted residual dipolar couplings based on existing crystal structures and the best values of susceptibility tensor data available. The correlation of the predicted couplings with experiment is shown in Fig. 13. Data from loop regions is shown as open circles, and data from helices is shown as solid circles. Even if the loop regions are excluded from consideration, it is clear that the correlation is less than ideal. There is a tendency to overestimate the magnitudes of splittings, increasing the slope of
the best linear correlation line. Such a change in slope is exactly what is expected for reduction in splittings due to motional disorder. However, the increase in slope
is greater than the factor of 0.9 expected on the basis of typical helical H–N order parameters determined from spin relaxation measurements, and there is considerable scatter remaining about this best-fit line. Significant structural or motional perturbations beyond local N–H vector motions would be required to bring theory into agreement with experiment. Structural deviations seem unlikely given the agreement among several crystal structures and an NMR solution structure for various forms of myoglobin (Cheng and Schoenborn, 1991; Kuriyan et al., 1986; Ösapay et al., 1994). In view of this consensus and the apparent correlation of deviation for N–H vectors belonging to the same helix, Tolman et al. (1997) proposed models which included additional large-amplitude motions for rigid helical segments. These would largely maintain the average structures observed in crystals, but would reduce magnitudes of splittings for N–H vectors of different helices by different amounts.
The specific model explored that produced the best results is described as an arc model. This model treats each helix as a rigid unit, as depicted in the crystal structure, which can undergo rotations about an axis perpendicular to the helix axis (Fig. 14). In addition, local vibrations of individual N–H vectors were modeled by using a fixed order parameter of 0.9 as predicted by spin relaxation measurements. Small asymmetric departures from the crystalline structure were allowed by permitting a helix to adopt a new average position, in addition to dynamic averaging over a range of rotation angles. The orientation of the rotation axis is specified by an additional angular parameter. The model is still quite restrictive in that it does not allow rotations about the long helix axis, and it does not allow even small deviations of helix geometry from what is seen in the crystal structure. It does, however, give insight
into amplitudes of motion that would be required if arc motions and local N–H vibrations were the only motions allowed. The predicted amplitudes of motion are quite large. For example, the motion along the helix arc for helix H using the corrected set of tensor values is and that for helix A is One might ask if motions predicted by these models are realistic. Figure 15 depicts the directions of motion for helix H, assuming all
other structural elements are static. Interestingly, the motion of helix H has few steric clashes with other structural elements. The helix appears to slide along the back side of helix F, and the average displacement is away from the core of the
protein. Thus, based on the limited analysis presented here, the motions predicted using the arc model appear to be physically possible.
The picture which emerges of myoglobin as a very dynamic protein is clearly at odds with the predominantly accepted view that proteins in solution are largely rigid molecules which, in the absence of exchange between discrete conformations, experience only small-amplitude motions on the sub-ns time scale. Bax and co-workers have offered some alternative explanations for the discrepancy with a rigid model (Bax and Tjandra, 1997a). The accuracy of the parameters employed is perhaps the issue of greatest concern. Because both the magnitude of the anisotropies and the presence of motion will have a scaling effect on the data, the amplitudes of motion obtained are highly dependent on the accuracy of the susceptibility tensor. To address this issue, the tensor has been redetermined using more extensive pseudocontact shift measurements to get a more accurate measurement of the paramagnetic contribution. This does reduce the RMSD to 0.48 Hz, but qualitatively does not change the model. There could also be deviations in effective susceptibility tensors that arise from molecular associations that would not show up in pseudocontact shifts from a single paramagnetic center. Such phenomena have been seen before in simple diamagnetic systems where the association of aromatic rings in solution enhances the effective molecular anisotropy (van Zijl and MacLean, 1985). To fit the data, a decrease in the anisotropy would be required. This would have to be the result of a very specific geometry of association in which heme axes for two molecules were orthogonal. Sample dilution experiments can test this possibility.
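To illustrate how an arc-type motion reduces a splitting, the following minimal sketch averages the second-rank angular factor for a bond vector that swings uniformly over a range of rotation angles about a fixed axis. It assumes an axially symmetric tensor with its symmetry axis along z, a uniform distribution over the arc, and an optional multiplicative local order parameter; these simplifications and all names are illustrative and are not the published fitting procedure.

```python
import numpy as np


def p2(cos_theta):
    """Second-rank angular factor (3 cos^2(theta) - 1) / 2."""
    return 0.5 * (3.0 * cos_theta ** 2 - 1.0)


def rotate_about_axis(vector, axis, angle_rad):
    """Rodrigues rotation of `vector` about `axis` by `angle_rad`."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    v = np.asarray(vector, dtype=float)
    return (v * np.cos(angle_rad)
            + np.cross(k, v) * np.sin(angle_rad)
            + k * np.dot(k, v) * (1.0 - np.cos(angle_rad)))


def arc_averaged_p2(vector, axis, half_range_rad, local_order=0.9, n_samples=2000):
    """Average P2 relative to z over a uniform arc of +/- half_range_rad."""
    v = np.asarray(vector, dtype=float)
    v = v / np.linalg.norm(v)
    # Midpoint samples covering the arc uniformly.
    fractions = (np.arange(n_samples) + 0.5) / n_samples
    angles = (2.0 * fractions - 1.0) * half_range_rad
    cos_theta = np.array([rotate_about_axis(v, axis, a)[2] for a in angles])
    return float(local_order * np.mean(p2(cos_theta)))


# A vector along z swinging +/- 30 degrees about x, with no extra local scaling.
reduced = arc_averaged_p2([0, 0, 1], [1, 0, 0], np.radians(30.0), local_order=1.0)
```

Because the averaged factor depends on where the vector sits relative to the rotation axis, vectors in different helices are scaled by different amounts, which is the qualitative behavior the arc model was introduced to capture.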
8.2. Order Matrix Analysis: A Test for Rigid Model Validity
There are methods for analysis of residual dipolar couplings that are less biased than approaches that assume a specific model, yet still allow for the effects of both
structure and motion. These methods, referred to as order matrix analyses, are widely applied in the liquid crystal field where they are used to analyze a variety of anisotropic data, including dipolar couplings, quadrupolar couplings, and chemical-shift anisotropy offsets (Sanders et al., 1994). Here we introduce them as a possible means of testing the validity of a rigid model, independent of assumptions about the origin of order or the possible nature of internal motions.
The order matrix S with elements $S_{ij}$ was first introduced by Saupe (1968) (Diehl and Khetrapal, 1969; Emsley and Lindon, 1975). The elements of S are usually defined in terms of direction cosines that relate an ordering director to the axes of a molecule-fixed Cartesian coordinate system by the angles $\theta_i$. In this case the order matrix elements are
$$S_{ij} = \tfrac{1}{2} \langle 3 \cos\theta_i \cos\theta_j - \delta_{ij} \rangle \qquad (23)$$
Here the brackets denote averaging over a time scale which is short compared to the reciprocal of the interaction being measured, and $\delta_{ij}$ is the Kronecker delta. The matrix is traceless and symmetric, so there are only five independent elements. The dipolar interaction between a pair of spin-1/2 nuclei, i and j, may be written in terms of these elements as follows:
Here the $\cos\phi_k$ are the direction cosines that define the direction of the interaction vector in the molecular coordinate frame. In general, we have several interaction vectors within a rigid fragment. In this case, a coordinate system can be defined within a particular fragment, and the direction cosines are fixed by the molecular geometry of the fragment. A set of expressions for measured couplings with a common set of order matrix elements may then be solved for the order matrix elements provided that couplings for at least five noncollinear interaction vectors have been measured. Diagonalizing the order matrix formed from these elements yields the principal components of the order matrix and a transformation matrix that can be used to orient the frame of principal order relative to the frame of the rigid fragment. In cases where a highly anisotropic element dominates the anisotropy of the susceptibility tensor, the reoriented principal order frame should coincide with the symmetry axes of this element, thus corroborating the validity of the rigid model.
In cases where the dominant factor in determining order is not known, one may still be able to test a rigid model. One can choose a single molecular frame to define the direction cosines but divide data into sets belonging to individual fragments likely to be locally rigid, such as individual helices of an α-helical protein. Solving for a principal order frame and its orientation from the point of view of each local fragment should produce identical results if the overall structural model is correct and the fragments experience no independent motions (other than N–H bond oscillations that are identical for each fragment).
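The solution step described above can be sketched as a small linear least-squares problem. The sketch works with a reduced coupling, d = Σ S_kl cos φ_k cos φ_l, that is, the measured coupling divided by its distance-dependent prefactor (the prefactor convention is an assumption and is left out). The choice of the five independent elements and all function names are illustrative, not the authors' implementation.

```python
import numpy as np


def design_matrix(unit_vectors):
    """Rows of coefficients for (Syy, Szz, Sxy, Sxz, Syz), using Sxx = -Syy - Szz."""
    c = np.asarray(unit_vectors, dtype=float)
    x, y, z = c[:, 0], c[:, 1], c[:, 2]
    return np.column_stack([y**2 - x**2, z**2 - x**2,
                            2 * x * y, 2 * x * z, 2 * y * z])


def solve_order_matrix(unit_vectors, reduced_couplings):
    """Least-squares order matrix from >= 5 noncollinear interaction vectors."""
    a = design_matrix(unit_vectors)
    if a.shape[0] < 5:
        raise ValueError("at least five interaction vectors are required")
    d = np.asarray(reduced_couplings, dtype=float)
    params, *_ = np.linalg.lstsq(a, d, rcond=None)
    syy, szz, sxy, sxz, syz = params
    sxx = -syy - szz  # enforce tracelessness
    return np.array([[sxx, sxy, sxz],
                     [sxy, syy, syz],
                     [sxz, syz, szz]])


def principal_frame(order_matrix):
    """Principal values (largest magnitude first) and axes as matrix columns."""
    values, vectors = np.linalg.eigh(order_matrix)
    order = np.argsort(-np.abs(values))
    return values[order], vectors[:, order]
```

Applying `solve_order_matrix` separately to the vectors of each helix and comparing the resulting principal frames is one way to carry out the rigid-model test described in the text; nearly collinear vectors make the design matrix ill conditioned, which shows up as a broad range of acceptable solutions.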
If results do not coincide, then either the structural model is wrong or there is substantial internal motion. In principle, it may be possible to characterize the nature of internal motions if the global contributions to molecular order are known from other sources. To understand how this might be done, it is useful to look at a slightly modified form of Eq. (24). By the trace and symmetry properties of S, it is possible to recast Eq. (24) into the following form:
where The still contain a description of the orientation of an interaction vector with respect to a molecule fixed frame, but direction cosines have been collected in forms appropriate for order parameters.
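One way to see how the trace and symmetry properties allow such a recasting is the following worked expansion. The notation $c_k = \cos\phi_k$ for the direction cosines of an interaction vector is an assumption made here for compactness, and the grouping and normalization of terms in Eq. (25) may differ from this sketch.

```latex
% Uses S_{xx} + S_{yy} + S_{zz} = 0, S_{kl} = S_{lk}, and c_x^2 + c_y^2 + c_z^2 = 1.
\begin{align}
\sum_{k,l} S_{kl}\, c_k c_l
  &= S_{xx} c_x^2 + S_{yy} c_y^2 + S_{zz} c_z^2
     + 2\left(S_{xy} c_x c_y + S_{xz} c_x c_z + S_{yz} c_y c_z\right) \\
  &= S_{zz}\,\tfrac{1}{2}\left(3 c_z^2 - 1\right)
     + \tfrac{1}{2}\left(S_{xx} - S_{yy}\right)\left(c_x^2 - c_y^2\right)
     + 2\left(S_{xy} c_x c_y + S_{xz} c_x c_z + S_{yz} c_y c_z\right)
\end{align}
```

The second line follows by writing $S_{xx} c_x^2 + S_{yy} c_y^2 = \tfrac{1}{2}(S_{xx} + S_{yy})(c_x^2 + c_y^2) + \tfrac{1}{2}(S_{xx} - S_{yy})(c_x^2 - c_y^2)$ and substituting $S_{xx} + S_{yy} = -S_{zz}$ and $c_x^2 + c_y^2 = 1 - c_z^2$. Five independent order parameters remain, each multiplied by a purely geometric function of the vector orientation.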
The order parameters, under conditions where the orientations of the interaction vectors are time dependent, become averages and characterize internal motions. In some simple cases one can readily solve for them. One is the case in which the internal motions are highly symmetrical, such as the wobbling of a vector in a cone. These motions scale all order parameters by a constant factor. We therefore know that for each term of Eq. (25) we can factor the order parameter into parts that represent global and local order:
$$S = S^{global}\, S^{local} \qquad (26)$$
One can calculate the global part from independent information on the orienting forces. Then the local part can be extracted from Eq. (26) since the measured value of the order parameter is known.
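The factoring step and the cone picture can be sketched in a few lines. The local order is taken as the ratio of measured to global order, and the standard diffusion-in-a-cone expression, S = cos α (1 + cos α) / 2, is used to convert a local order parameter into a cone semi-angle. Treating the quoted values as S rather than S², and the function names, are assumptions made for illustration.

```python
import math


def local_order(measured_order: float, global_order: float) -> float:
    """Local order parameter obtained by dividing out the global order."""
    if global_order == 0.0:
        raise ValueError("global order must be nonzero")
    return measured_order / global_order


def cone_order_parameter(semi_angle_rad: float) -> float:
    """Order parameter for free wobbling in a cone of the given semi-angle."""
    c = math.cos(semi_angle_rad)
    return 0.5 * c * (1.0 + c)


def cone_semi_angle(order_parameter: float) -> float:
    """Invert S = c (1 + c) / 2 for the cone semi-angle in radians."""
    if not 0.0 <= order_parameter <= 1.0:
        raise ValueError("order parameter must lie between 0 and 1")
    c = 0.5 * (-1.0 + math.sqrt(1.0 + 8.0 * order_parameter))
    return math.acos(min(1.0, c))


# Semi-angle corresponding to a local order parameter of 0.9.
angle_deg = math.degrees(cone_semi_angle(0.9))
```

A symmetric cone motion of this kind lowers the principal order parameter without changing the direction of the principal axis, so it cannot account for a shift in axis direction.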
Ordering due to interactions with an anisotropic susceptibility tensor is one of the simplest cases for which to calculate the global order. We have previously discussed magnetic-field-induced ordering in terms of an anisotropic susceptibility tensor. Our final expression was in terms of the axial and rhombic anisotropies. But these terms were introduced by evaluation of the expectation values of the 00 and 02 Wigner rotation matrix elements:
and are equivalent to and respectively (Emsley and Lindon, 1975). Hence, we can use the expressions in Eq. (27) to calculate in Eq. (26). We simply must be sure that and are in the same frame. We have applied the order matrix test for internal motion to the results for myoglobin. We assume that the X-ray crystal structure is correct, at least for the
helical portions, and add hydrogens assuming normal bond geometries with respect to heavy atoms. As a matter of caution we truncate the helices by one residue at each end since we know the loops connecting helices show extra motion. The set of N–H vectors within each of the truncated helices of myoglobin yields five or more noncollinear vectors except for helices C, D, F, and G. Helices C and D, which are separated by a very short loop, can be linked in a rigid unit to yield more than five noncollinear vectors (C + D). If the simple rigid model is correct, the orientation and principal values of the order tensor determined from data on each of the five sets should be identical.
Figure 16 shows the results of the calculation on a Sanson–Flamsteed projection (Losonczi and Prestegard, 1997) of the surface of a globe with the x-axis passing through the poles and coming out of the page. Any molecular axis system could be used, but we have chosen one with an axis along the principal direction of the heme-dominated susceptibility tensor. If the susceptibility tensor and a simple rigid model ordered by susceptibility interactions are correct, the direction of principal
order as viewed from each of the helices should coincide with the principal axis of the susceptibility tensor. The points representing principal order directions consistent with data within the estimated error of 0.2 Hz are found by random search through the space determined by the five independent order parameters. Only the direction for the principal ordering axis is shown in Fig. 16, although each point also has an associated order parameter, asymmetry parameter, and a third Euler rotation which defines the x- and y-axes of the asymmetry parameter. The points cluster for each helix. Note that for most helices the points are closely clustered about the axis, and for helices B, E, and C + D some points are actually at the axis. Not only is the principal order direction coincident with the principal axis of the susceptibility tensor, but the principal order parameters determined are within 30% of those expected based on the anisotropy of the susceptibility. This is one indication of the approximate validity of the structural model and the susceptibility tensor. The broad spread of points for helices such as B does not indicate failure of the model but a range of possible solutions which include the rigid model. Helix B is nearly ideal, leading to the near collinearity of many N–H vectors. This means that even though there is
a large number of measured couplings, the data are redundant and the orientation about the helix axis is not well determined. Linking two noncollinear helices such
as C and D reduces this redundancy. In the future, this redundancy can be reduced by use of the 13C–1H data, as Bax and co-workers have done in their work on the GATA-1 protein (Tjandra et al., 1997).
There are, however, deviations from the rigid model. Helix A, in particular, does not have points which overlap the axis, and points for helix H are further from the axis than those of most helices. This indicates that a purely rigid model for these helices does not fit the data, and the fit cannot be improved by invoking random motions in a cone (this would reduce the order parameter but not change the axis direction), in agreement with the results reported by Tolman et al. (1997). In the most severe cases, like helix A, the assumed local helix geometry is incorrect, and in other cases large anisotropic motions may exist. Helices B and E show order frames coincident with the susceptibility frame, but the principal order parameters are well below prediction. Using the procedure of Eqs. (26) and (27), local order parameter values of 0.82 and 0.78 are calculated. These are near those reported by Tolman et al. (1997). Deviations from the originally reported data can be attributed to use of an improved susceptibility tensor and deletion of data at the helix termini. Even the new values for helices B and E suggest motion of slightly greater amplitude than the 0.9 that would be predicted for rapid N–H motion coming from spin relaxation measurements.
Even though there is reason to suspect substantial local motion effects on the residual dipolar parameters measured in myoglobin, it is unlikely that these will prevent their utilization as a structural tool. There seems to be approximate agreement between the orientation of the various structural elements as determined from dipolar data with orientations found by more traditional structural methods. The most substantial deviations with helices A and H occur at the N- and C-terminal elements in the structure or depart from local structures found in crystal structures. They may have more freedom to move. Deletion of data for loops and the ends of helices has already helped in convergence of the analyses. It is possible that better independent ways of screening for segments affected by motion may allow selective deletion and improve convergence further. Recent spin-relaxation-based methods that probe slower motions in proteins may be particularly useful in this respect (Carr et al., 1997; Mandel et al., 1996).
The order matrix approach, in addition to offering a test for the validity of the rigid structures, offers a route to structure determination that could permit a range of local motions, and would not require prior knowledge of a susceptibility tensor. Order tensors viewed from any number of semirigid structural elements can be found, and the positions of their principal axes depicted on a map such as the one in Fig. 16. As long as local motions of the structure elements could be characterized as symmetric cone-type motions, rotations needed to make all principal axes
coincide provide a recipe for moving all the elements to a common coordinate frame. This is equivalent to determining a structure. One limitation is that a precise structure does require at least five independent measurements for a semirigid element in order to eliminate the broad spread of solutions such as that shown for helix B in Fig. 16. This can be accomplished for elements as small as single peptide bonds if one uses the full range of anisotropic data (Losonczi and Prestegard, 1997). The 13C–1H couplings will certainly be useful in this respect (Tjandra et al., 1997).
For the future, studies need not be restricted to macromolecule structure and dynamics. Other potential targets include ligand-binding studies, particularly where interactions may be mediated by structural elements showing few intermolecular NOEs. Protein–carbohydrate interactions where hydrogen-bonding networks mediate the interaction are a case in point. Protons in these networks exchange rapidly with water, making observation of discrete NOEs difficult. Yet measurement of residual dipolar couplings can constrain carbohydrate orientation. In all, there is a great deal of promise for the characterization of both structure and motion using dipolar data. We look forward to the applications that will appear over the next few years.
ACKNOWLEDGMENT. We thank Mr. Ranajeet Ghose for useful discussions and preparation of Figs. 10 and 11 and Judit Losonczi for her work on the order matrix analysis methods and preparation of Fig. 5. The authors also acknowledge financial support from NIH grants GM 33225 and GM 54160, and NSF grant MCB-9726344.
REFERENCES
Abragam, A., 1961, Principles of Nuclear Magnetism, Clarendon Press, Oxford. Andrec, M., and Prestegard, J. H., 1998, J. Magn. Reson. 130:217. Bastiaan, E. W., and MacLean, C., 1990, in NMR Basic Principles and Progress, Vol. 25, Springer-Verlag, Berlin, pp. 19–43. Bastiaan, E. W., MacLean, C., Van Zijl, P. C. M., and Bothner-By, A. A., 1987, in Annual Reports on NMR Spectroscopy, Vol. 19, Academic Press, London, pp. 35–77. Bax, A., and Pochapsky, S. S., 1992, J. Magn. Reson. 99:638. Bax, A., and Tjandra, N., 1997a, Nat. Struct. Biol. 4:254. Bax, A., and Tjandra, N., 1997b, J. Biomol. NMR 10:289.
Bax, A., Vuister, G. W., Grzesiek, S., Delaglio, F., Wang, A. C., Tschudin, R., and Zhu, G., 1994, Meth.
Enzymol. 239:79. Biamonti, C., Rios, C. B., Lyons, B. A., and Montelione, G. T., 1994, Adv. Biophys. Chem. 4:51. Billeter, M., Neri, D., Otting, G., Qian, Y. Q., and Wüthrich, K., 1993, J. Biomol. NMR 2:257.
Bodenhausen, G., and Ernst, R. R., 1981, J. Magn. Reson. 45:367. Bothner-By, A. A., Dadok, J., Mishra, P. K., and van Zijl, P. C. M., 1987, J. Am. Chem. Soc. 109:4180. Bothner-By, A. A., Domaille, P. J., and Gayathri, C., 1981, J. Am. Chem. Soc. 103:5602. Bretthorst, G. L., 1990a, J. Magn. Reson. 88:533.
Bretthorst, G. L., 1990b, J. Magn. Reson. 88:571. Brünger, A. T., 1992, X-PLOR: A System For X-ray Crystallography and NMR, Yale University Press, New Haven.
Buckingham, A. D., and Lovering, E. G., 1962, Trans. Faraday Soc. 58:2077. Buckingham, A. D., and McLauchlan, K. A., 1963, Proc. Chem. Soc. May:144. Buckingham, A. D., and McLauchlan, K. A., 1967, Prog. NMR Spectrosc. 2:63. Carr, P. A., Erickson, H. P., and Palmer, A. G. III, 1997, Structure 5:949. Cheng, X., and Schoenborn, B. P., 1991, J. Mol. Biol. 220:381. Chylla, R. A., and Markley, J. L., 1995, J. Biomol. NMR 5:245. Clore, G. M., Gronenborn, A. M., and Bax, A., 1998, J. Magn. Reson. 133:216. Clore, G. M., Gronenborn, A. M., and Tjandra, N., 1998, J. Magn. Reson. 131:159. de Beer, R., van Ormondt, D., Pijnappel, W. W. F., and van der Veen, J. W. C., 1988, Isr. J. Chem. 28:249. Diehl, P., and Khetrapal, C. L., 1969, NMR: Basic Principles and Progress, Vol. 1, Springer-Verlag, New York. Emsley, L., and Bodenhausen, G., 1990, Chem. Phys. Lett. 165:469. Emsley, J. W., and Lindon, J. C., 1975, NMR Spectroscopy Using Liquid Crystal Solvents, Pergamon
Press, Oxford, UK. Eyring, G., Curry, B., Mathies, R., Broek, A., and Lugtenburg, J., 1980, J. Am. Chem. Soc. 102:5392. Falle, H. R., and Luckhurst, G. R., 1970, J. Magn. Reson. 3:161. Garrett, D. S., Powers, R., Gronenborn, A. M., and Clore, G. M., 1991, J. Magn. Reson. 95:214. Gayathri, C., and Bothner-By, A. A., 1982, Chem. Phys. Lett. 87:192. Grzesiek, S., and Bax, A., 1993, J. Am. Chem. Soc. 115:12593. Grzesiek, S., and Bax, A., 1994, J. Am. Chem. Soc. 116:10196. Hori, H., 1971, Biochim. Biophys. Acta 251:227. Kay, L. E., Keifer, P., and Saarinen, T., 1992, J. Am. Chem. Soc. 114:10663. Kiefer, J. C., 1987, Introduction to Statistical Inference, Springer-Verlag, New York.
Kung, H. C., Wang, K. Y., Goljer, I., and Bolton, P. H., 1995, J. Magn. Reson. 109:323. Kuriyan, J., Wilz, S., Karplus, M., and Petsko, G. A., 1986, J. Mol. Biol. 192:133. Laatikainen, R., Ratilainen, J., Sebastian, R., and Santa, H., 1995, J. Am. Chem. Soc. 117:11006.
Laatikainen, R., Santa, H., Hiltunen, Y., and Lounila, J., 1993, J. Magn. Reson. 104:238. LeMaster, D. M., Kay, L. E., Brünger, A. T., and Prestegard, J. H., 1988, FEBS Lett. 236:71. Lisicki, M. A., Mishra, P. K., Bothner-By, A. A., and Lindsey, J. S., 1988, J. Phys. Chem. 92:3400. Lohman, J. A. B., and MacLean, C., 1978, Chem. Phys. 35:269. Losonczi, J. A., and Prestegard, J. H., 1998, Biochemistry 37:706. Mandel, A. M., Akke, M., and Palmer, A. G. III, 1996, Biochemistry 35:16009.
Martin, Y-L., 1994, J. Magn. Reson. A 111:1. Montelione, G. T., and Wagner, G., 1989, J. Am. Chem. Soc. 111:5474.
Nilges, M., Gronenborn, A. M., Brünger, A. T., and Clore, G. M., 1988, Prot. Eng. 22:27. Omichinski, J. G., Clore, G. M., Schaad, O., Felsenfeld, G., Trainor, C., Appella, E., Stahl, S. J., and Gronenborn, A. M., 1993, Science 261:438. Ösapay, K., Theriault, Y., Wright, P. E., and Case, D. A., 1994, J. Mol. Biol. 244:183. Palmer, A. G. III, Cavanagh, J., Wright, P. E., and Rance, M., 1991, J. Magn. Reson. 93:151.
Plantenga, T. M., Ruessink, B. H., and MacLean, C., 1980, Chem. Phys. 48:359. Prestegard, J. H., 1998, Nat. Struct. Biol. 5:517.
Rajarathnam, K., LaMar, G. N., Chiu, M. L., and Sligar, S. G., 1992, J. Am. Chem. Soc. 114:9048. Redfield, A. G., 1965, Adv. Magn. Reson. 1:1. Sanders, C. R., Hare, B. J., Howard, K. P., and Prestegard, J. H., 1994, Prog. NMR Spectrosc. 26:421. Saupe, A., 1968, Angew. Chem., Int. Ed. Engl. 7:97. Skoglund, C. M., 1987, Molecular Orientation Studies Using High Field NMR, Thesis, Carnegie-Mellon University, Pittsburgh.
Snyder, L. C., 1965, J. Chem. Phys. 43:4041. States, D. J., Haberkorn, R. A., and Ruben, D. J., 1982, J. Magn. Reson. 48:286. Stein, E. G., Rice, L. M., and Brünger, A. T, 1996, J. Magn. Reson. 124:154. Tang, J., and Norris, J. R., 1986, J. Magn. Reson. 69:180. Tjandra, N., and Bax, A., 1997, J. Magn. Reson. 124:512. Tjandra, N., and Bax, A., 1998, Science 278:1111. Tjandra, N., Grzesiek, S., and Bax, A., 1996, J. Am. Chem. Soc. 118:6264. Tjandra, N., Omichinski, J. G., Gronenborn, A. M., Clore, G. M., and Bax, A., 1997, Nat. Struct. Biol. 4:732. Tolman, J. R., Flanagan, J. M., Kennedy, M. A., and Prestegard, J. H., 1995, Proc. Natl. Acad. Sci. U.S.A. 92:9279.
Tolman, J. R., Flanagan, J. M., Kennedy, M. A., and Prestegard, J. H., 1997, Nat. Struct. Biol. 4:292. Tolman, J. R., and Prestegard, J. H., 1996a, J. Magn. Reson. 112:269. Tolman, J. R., and Prestegard, J. H., 1996b, J. Magn. Reson. 112:245. Totz, J., van den Boogaart, A., van Huffel, S., Graveron-Demilly, D., Dologlou, I., Heidler, R., and Michel, D., 1997, J. Magn. Reson. 124:400. Umesh, S., and Tufts, D. W., 1996, IEEE Trans. Signal Process. 44:2245. van den Bos, A., 1982, in Handbook of Measurement Science, Vol. 1 (P. H. Sydenham, ed.), Wiley, London, pp. 331–377. van Huffel, S., 1993, Signal Process. 33:333. Van Vleck, J. H., 1932, The Theory of Electric and Magnetic Susceptibilities, Oxford University Press, Oxford. van Zijl, P. C. M., and Bothner-By, A. A., 1988, J. Magn. Reson. 79:439. van Zijl, P. C. M., and MacLean, C., 1985, J. Chem. Phys. 83:4410. van Zijl, P. C. M., Ruessink, B. H., Bulthuis, J., and MacLean, C., 1984, Acc. Chem. Res. 17:172. Vuister, G. W., and Bax, A., 1992, J. Magn. Reson. 98:428. Vuister, G. W., and Bax, A., 1993, J. Am. Chem. Soc. 115:7772. Vuister, G. W., Delaglio, F., and Bax, A., 1993, J. Biomol. NMR 3:67. Wagner, G., 1993, Curr. Opinion Struct. Biol. 3:748. Werbelow, L. G., and London, R. E., 1995, J. Chem. Phys. 102:5181.
9
Recent Developments in Studying the Dynamics of Protein Structures from ¹⁵N and ¹³C Relaxation Time Measurements
Jan Engelke and Heinz Rüterjans
1. INTRODUCTION
1.1. General Features of Dynamics
High-resolution NMR spectroscopy has become one of the most important methods for the determination of three-dimensional structures of biological macromolecules (Kessler et al., 1988; Ernst et al., 1987; Wüthrich, 1986). Only recently has NMR also been used to describe the dynamic properties of molecules of biological interest. Protein action is usually connected with the binding of another molecule. The high rate of these protein–substrate interactions is only possible when the protein is sufficiently rigid to recognize the substrate in a specific manner.
Jan Engelke and Heinz Rüterjans • Institut für Biophysikalische Chemie, Johann-Wolfgang Goethe Universität, D-60439 Frankfurt am Main, Germany. Biological Magnetic Resonance, Volume 17: Structure Computation and Dynamics in Protein NMR, edited by Krishna and Berliner. Kluwer Academic / Plenum Publishers, New York, 1999.
On the other hand, it must be flexible enough to allow binding by adaptation. The biological function of a protein in solution is therefore not only dependent on the rigid molecular structure but also connected with the dynamical properties of its structure in solution. Various investigations have shown that the recognition process between molecules cannot be described by a rigid lock-and-key principle but rather by an induced-fit mechanism of two flexible molecules (Lane, 1989; Searle et al., 1988; Jardetzky and Roberts, 1981). A complete characterization of all dynamical processes in a molecule is probably not feasible: the number of independent parameters necessary for a complete description of the dynamics is presumably much larger than the number of experimental motional parameters obtainable by NMR spectroscopy. The description of internal motions in proteins therefore necessarily relies on strongly simplified model assumptions. In general, various time windows for characteristic dynamical processes in proteins are accessible by NMR spectroscopy.
• Very slow processes, like protein folding or the exchange of labile protons with solvent protons, can be detected by a variation of the NMR spectrum with time following a specific disturbance (Davis, 1995; Rohl and Baldwin, 1994; Grzesiek and Bax, 1993a; Spera et al., 1991).
• Slow processes, like an interconversion of various discrete conformations or slow chemical exchange, can be detected either with homonuclear (Fejzo et al., 1990) or heteronuclear exchange spectroscopy (Lee et al., 1995; Farrow et al., 1995, 1994; Wider et al., 1991; Montelione and Wagner, 1989) or with off-resonance ROESY experiments (Desvaux et al., 1994; Kuwata and Schleich, 1994). A prerequisite of these experiments is the appearance of two signals of the corresponding resonances in the NMR spectrum (compare Sec. 7).
• Processes of medium time range, like rapid conformational or rapid chemical exchange, can be detected with measurements at variable field strength (Desvaux et al., 1995b, 1995c; Szyperski et al., 1993) or with measurements at variable spin-echo time, provided that these processes are connected with a modulation of the chemical shift or the J-couplings (Orekhov et al., 1995) (compare Sec. 6).
• Rapid motions, like the reorientation of a protein or motions of the protein main chain or of side chains, may directly influence the relaxation rates via a modification of the spectral power density. These processes may be characterized by direct determination of homo- or heteronuclear auto- or cross-relaxation rates (compare Secs. 2–5).
• Local vibrations of single atoms or atomic groups, which may be described by harmonic or anharmonic motions, are reflected in the order parameters derived from relaxation rates; these parameters lack a time resolution of the motional process (compare Secs. 2–5).
1.2. Microdynamic Motional Parameters
A general scheme for the determination of microdynamical parameters from NMR experiments is given in Fig. 1. Suitable experimental data for the description of rapid internal dynamics of proteins are obtained from relaxation rates of distinct states of magnetization. Autorelaxation rates in general are the longitudinal relaxation rate, the time constant of which is called the spin–lattice relaxation time T₁, and the transversal relaxation rate, the time constant of which is called the spin–spin relaxation time T₂. When magnetization is transferred from one nucleus to another by a relaxation mechanism, the process is called cross relaxation. This effect is the basis of NOE and ROE spectroscopy. In order to describe the internal dynamics of a protein, models for the motions are a prerequisite. For example, the motion of a vector of the protein backbone is different from the motion of a methyl group at the end of an amino acid side chain. The mathematical description of a motion is usually given by a correlation function which is specific for that motion. From relaxation theory, a relation between the experimental relaxation rates and the spectral density function which is characteristic for a distinct motion can be derived.
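For orientation, the link between a correlation function and its spectral density can be written out for the simplest motional model. The sketch below assumes isotropic rotational diffusion of a rigid molecule with a single correlation time; the symbols $C(t)$, $J(\omega)$ and $\tau_c$ and the 1/5 normalization are assumptions made here (normalization conventions differ between authors), not notation recovered from this chapter.

```latex
C(t) = \frac{1}{5}\, e^{-t/\tau_c},
\qquad
J(\omega) = 2\int_{0}^{\infty} C(t)\,\cos(\omega t)\,\mathrm{d}t
          = \frac{2}{5}\,\frac{\tau_c}{1+\omega^{2}\tau_c^{2}}
```

In this form slow tumbling (large $\tau_c$) concentrates the spectral density near $\omega = 0$, whereas fast motion spreads it over a wide range of frequencies.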
The spectral density function is obtained by Fourier transformation of the internal correlation function. Motions of proteins or within proteins at the atomic level can be investigated using relaxation time measurements with the nuclei ¹H, ¹³C, and ¹⁵N. Proton relaxation rates can be determined with high sensitivity because of the high natural abundance of the ¹H isotope and because of its large gyromagnetic ratio. The evaluation of data from one-dimensional NMR spectra requires highly resolved
resonances, which is certainly not the case for proteins. With the exception of some isolated ¹H resonances it is in general not feasible to determine ¹H relaxation rates from the spectra. An additional difficulty arises in the interpretation of the relaxation rates, since a protein proton is usually surrounded by various other protons at similar distances, and hence the relaxation of the observed proton depends on the motion of all protons in the neighborhood. First determinations of heteronuclear relaxation rates of macromolecules were
carried out by Allerhand and co-workers (1971). Longitudinal ¹³C relaxation times were obtained for carbonyl resonances and aliphatic resonances of ribonuclease A at natural abundance. ¹⁵N relaxation times were determined at natural abundance for various biomolecules by Gust and co-workers (1975). Hawkes and co-workers (1975) determined ¹⁵N relaxation times of gramicidin S. Since the early 1970s, numerous relaxation time measurements at natural abundance of the corresponding nuclei have been carried out in order to investigate the dynamics of macromolecular systems (Schiksnis et al., 1989; McCain and Markley, 1986; Henry et al., 1986; Hanssum and Rüterjans, 1983; Richarz et al., 1980; Norton et al., 1977). Usually relaxation times were determined using one-dimensional NMR techniques in which the heteronuclear resonances were directly detected. Because of the low sensitivity of these experiments and because of signal overlap, relaxation times were usually determined only for single separate resonances. In addition, high sample concentrations were necessary in these experiments. With the application of one- and two-dimensional experiments in which the relaxation times of the heteronuclei were indirectly detected from a decrease of signal intensity of the ¹H resonances, a remarkable improvement in resolution and sensitivity was reached (Nirmala and Wagner, 1989, 1988; Sklenar et al., 1987; Kay et al., 1987). With the introduction of gene technology, which allowed heterologous expression of proteins in bacterial systems, it became possible to enrich the ¹³C and ¹⁵N stable isotopes in proteins by using correspondingly ¹³C- and ¹⁵N-enriched growth media. In the beginning of this period the enrichment of the ¹⁵N isotope in proteins was used. The first NMR experiments for the determination of ¹⁵N relaxation using a ¹⁵N-enriched protein sample were carried out in 1989 (Kay et al., 1989).
Meanwhile the experimental details have been improved in such a way that a fairly precise analysis of the dynamics of protein-backbone amide groups has become possible (Farrow et al., 1997; Phan et al., 1996; Tjandra et al., 1996). Recent investigations have also described the dynamics of asparagine and glutamine side chains, in addition to the N–H bond vector of the peptide backbone, using ¹⁵N relaxation times (Boyd, 1995; Buck et al., 1995). The motions of these side chains are of particular interest because they are often involved in interactions with substrates or inhibitors. Investigations of the internal dynamics of proteins using ¹³C relaxation times with selectively or completely ¹³C-enriched molecules have been carried out with only a few model systems (Engelke and Rüterjans, 1995; Yamazaki et al., 1994; Nicholson et al., 1992). Selectively ¹³C-enriched proteins
can be investigated using the same methodology for the determination and interpretation of relaxation times as with unlabeled proteins at natural abundance. However, the sensitivity of these experiments is much higher, even though the protein solutions may be less concentrated. On the other hand, selective enrichment of the isotopes may be too difficult for microbiological preparation. Therefore, the determination of the heteronuclear relaxation rates may be more convenient using completely ¹³C- and ¹⁵N-enriched protein samples, for various reasons: with such a sample the relaxation times of different ¹³C and ¹⁵N nuclei of the protein main chain and side chains can be determined with high sensitivity. Moreover, relaxation rates of nuclei which are not covalently linked to a proton can be measured. The experimental design as well as the interpretation of relaxation rates is, however, certainly more difficult than with selectively enriched proteins.
A first theoretical description of the influence of internal motions on the correlation function was given by Woessner (1962). Woessner considered the relaxation of the protons of a rotating methyl group attached to a large molecule undergoing isotropic tumbling. In this model the motion was described by free rotational diffusion of the whole molecule together with a jumplike motion between three distinct rotamer sites. Starting from these first theoretical concepts, more complex models of internal motions have been developed (Kowalewski, 1990a, 1990b), some of them based on axially symmetric or asymmetric diffusion tensors (Werbelow and Grant, 1977). Another model characterizing the internal motions was proposed by Lipari and Szabo (1982a, 1982b). This description of internal dynamics, also known as the model-free approach, uses one correlation time and one order parameter. Neither parameter is tied to a distinct physical model; they are interpreted in a second step using models of motions. This approach is particularly suitable for comparing similar motions and for estimating the amplitudes of motions. It has been used with various modifications and extensions, primarily for the description of the peptide backbone motions in proteins.
2. METHODS
2.1. Theory of Relaxation in Proteins
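As a concrete reference point for the model-free parameters introduced in Sec. 1.2, the following Python sketch evaluates the spectral density in the form commonly attributed to Lipari and Szabo (1982a). It is an illustration under stated assumptions rather than the authors' own formulation: isotropic overall tumbling is assumed, and the names `tau_c` (overall correlation time), `s2` (squared order parameter), `tau_e` (effective internal correlation time) and the 2/5 normalization are choices made here.

```python
import math


def j_rigid(omega, tau_c):
    """Spectral density of a rigid, isotropically tumbling molecule."""
    return 0.4 * tau_c / (1.0 + (omega * tau_c) ** 2)


def j_model_free(omega, tau_c, s2, tau_e):
    """Model-free spectral density (assumed Lipari-Szabo form).

    omega : angular frequency in rad/s
    tau_c : overall tumbling correlation time in s
    s2    : squared generalized order parameter, between 0 and 1
    tau_e : effective correlation time of the internal motion in s
    """
    # Overall and internal motion combine into one shorter effective time.
    tau = 1.0 / (1.0 / tau_c + 1.0 / tau_e)
    overall = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    internal = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (overall + internal)


# Hypothetical example values: 5 ns tumbling, 50 ps internal motion.
omega_n = 2.0 * math.pi * 50.0e6
for s2 in (1.0, 0.85, 0.5):
    print(s2, j_model_free(omega_n, 5.0e-9, s2, 50.0e-12))
```

With `s2 = 1` the expression reduces to the rigid-rotor result; lowering `s2` moves spectral density from the slow overall tumbling into the fast internal motion, which is how the order parameter encodes the amplitude of the motion.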
In general, two mechanisms of interaction with the environment are important for relaxation in proteins: the dipole–dipole interaction and the chemical-shift anisotropy (CSA). For the dipolar interaction the rotational motion of the vector connecting the two interacting nuclei relative to the external magnetic field is relevant, as well as the modulation of the length of this vector. The latter is not important if the distance of the two interacting nuclei is fixed by a chemical bond.
This makes ¹⁵N and ¹³C relaxation attractive for studies of mobility because these nuclei relax primarily by interaction with their directly attached protons.
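To give a feel for the relative size of the two mechanisms named above, the following Python sketch estimates the dipolar and CSA contributions to the longitudinal relaxation rate of an amide ¹⁵N nucleus. It uses widely quoted textbook expressions for an isolated spin pair and a rigid, isotropically tumbling molecule. All numerical inputs are assumptions chosen for illustration (an N–H distance of 1.02 Å, a ¹⁵N CSA of −160 ppm, a field of 11.74 T and a 5 ns tumbling time), and the function names are hypothetical.

```python
import math

# Physical constants (SI units)
MU0_OVER_4PI = 1.0e-7        # vacuum permeability / (4*pi), T m A^-1
HBAR = 1.054571817e-34       # reduced Planck constant, J s
GAMMA_H = 2.6752218744e8     # 1H gyromagnetic ratio, rad s^-1 T^-1
GAMMA_N15 = -2.71261804e7    # 15N gyromagnetic ratio, rad s^-1 T^-1


def j_iso(omega, tau_c):
    """Spectral density for isotropic tumbling of a rigid molecule."""
    return 0.4 * tau_c / (1.0 + (omega * tau_c) ** 2)


def r1_dd_and_csa(b0, tau_c, gamma_x, r_xh, delta_sigma):
    """Return (R1 dipolar, R1 CSA) in s^-1 for an isolated X-H spin pair."""
    omega_h = GAMMA_H * b0
    omega_x = gamma_x * b0
    # Dipolar coupling constant and CSA constant, both in rad/s.
    d = MU0_OVER_4PI * HBAR * GAMMA_H * gamma_x / r_xh ** 3
    c = omega_x * delta_sigma / math.sqrt(3.0)
    r1_dd = 0.25 * d ** 2 * (
        j_iso(omega_h - omega_x, tau_c)
        + 3.0 * j_iso(omega_x, tau_c)
        + 6.0 * j_iso(omega_h + omega_x, tau_c)
    )
    r1_csa = c ** 2 * j_iso(omega_x, tau_c)
    return r1_dd, r1_csa


# Assumed inputs: 11.74 T (about 500 MHz for 1H), 5 ns tumbling time,
# N-H distance 1.02 Angstrom, 15N chemical-shift anisotropy -160 ppm.
r1_dd, r1_csa = r1_dd_and_csa(11.74, 5.0e-9, GAMMA_N15, 1.02e-10, -160.0e-6)
csa_fraction = r1_csa / (r1_dd + r1_csa)
print(f"R1(DD) = {r1_dd:.2f} 1/s, R1(CSA) = {r1_csa:.2f} 1/s, "
      f"CSA share = {100.0 * csa_fraction:.0f}%")
```

For these assumed inputs the CSA share of the ¹⁵N longitudinal rate comes out somewhat below one fifth. Because the CSA constant scales with the field, the share grows at higher field strengths.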
The chemical shifts of nuclei express the shielding of the external magnetic field by the environment of a nucleus. This shielding depends on the orientation of the chemical structure relative to the external magnetic field. Rotational diffusion of the protein and internal mobility can modulate this effect and thus may cause transitions between the two energy levels of the spin-1/2 nucleus if the rates of the motions correspond to the nuclear transition frequencies. Hence, relaxation depends on whether the chemical structure to be investigated moves with frequencies corresponding to differences between energy levels. In the case of an isolated SI spin system, where S is a heteronucleus, the NMR relaxation is caused by the dipole–dipole interaction with its bound proton (DD), magnetic shielding due to CSA, and cross correlation between these two interactions (DDA). In general, equations for initial relaxation rates of the left and right line of the ¹⁵N or ¹³C doublet are (Bull, 1992; Goldman, 1984)
where
h is Planck’s constant is the internuclear distance between the nitrogen or carbon nuclei and their bound hydrogens, is the CSA constant, and and are the magnetogyric ratios. The spectral densities are defined by
where
denotes the angle between the magnetization vector of nucleus S and the and are the and resonance frequencies, respectively, and
In this spectral density function, is the second-rank spherical harmonic; are the SI bond spherical polar angles in the laboratory frame. In order to estimate the parts of the individual relaxation contributions, an isotropic reorientation of the molecule with an overall tumbling correlation time of 5 ns is assumed. For a carbon resonance frequency of 125 MHz and a nitrogen frequency of 50 MHz, the contributions of the different mechanisms are shown in Table 1. For a CH spin
system the contribution of the CSA term amounts to about 3% and can therefore be neglected. For an NH vector this approximation is not feasible, since the CSA contribution will in any case be of the order of 20%. Nevertheless, the influence of dipolar–heteronuclear CSA cross correlation is significant in both cases and can potentially affect measurements of T₁, T₂, and NOE. Fortunately, this contribution can be eliminated from relaxation measurements by continuous inversion of the proton resonance during the relaxation period. For an isolated spin system, equations for initial relaxation rates of the inner
and the left or right outer lines of the multiplet can be written as (Daragan et al., 1993a, 1993b)
where
The function
where
and
is the dipolar cross-correlation function:
are the two geminal SI vectors. For isotropic reorientation of the
molecule the spectral density function is
. 125 MHz, the longitudinal and transverse values of For and are given in Table 1. In order to show the different relaxation contributions to the individual lines of a multiplet, two one-dimensional spectra of selectively glycine, which is dissolved in a mixture of 70% glycerol and 30% are shown in Fig. 2. The different heights of the two outer lines indicate the different contributions of the CSA–dipole cross correlation to both lines. The temperature dependent relation of the inner to the outer lines indicates the dipole–dipole cross-correlation contribution. For a highly viscous solution
is
large and negative, and hence the linewidth of the outer lines is smaller than that of the inner line. In contrast, for a fast rotational motion of the molecule the contribution of is small, and the expected intensity ratio of 1:2:1 is nearly
obtained. Hence, even under proton saturation the cross correlation between the dipolar interactions leads to a double-exponential decay of the longitudinal magnetization. Concerning this problem, Zhu and co-workers (1995) started a detailed study about
the magnitude of the error made when fitting the longitudinal magnetization decay of a CH₂ group to a single exponential function. They applied several different methods of data reduction and found that the errors in T₁ and the NOE are less than 6% and 4%, respectively. Therefore, neglecting dipolar cross correlation in T₁ and NOE measurements in CH₂ systems does not lead to major problems in the interpretation of the results in terms of molecular motion. A similar result was reported by Buck and co-workers (1995), who investigated the relaxation of the ¹⁵N nucleus in an NH₂ group. Since the magnitude of the dipole–dipole cross-correlation process between the pair of internuclear vectors is scaled by an angular term which is 0.125 for the NH₂ group, the cross-correlation process does not strongly affect the longitudinal or transverse recovery (Buck, 1994). Thus, for isolated NH and CH groups the dipolar interaction(s) with the directly bound proton(s) has (have) to be considered, and for the ¹⁵N nuclei the CSA contribution must be considered as well.
2.2. Experiments for the Determination of Relaxation Rates
For the determination of ¹⁵N and ¹³C relaxation times, out-and-back pulse sequences have been applied in most cases (Fig. 3). The magnetization transfer in these measurements starts and ends at the resonances of the covalently bound protons. The sensitivity of NMR experiments depends on the gyromagnetic ratios of both the nucleus that is excited at the start of the experiment and the nucleus whose magnetization is detected. Hence, a considerable increase of sensitivity compared with a direct detection of the heteronucleus can be expected:
For nitrogen nuclei the gain in sensitivity amounts to nuclei
and for carbon
In the experiments for the determination of heteronuclear
or NOEs, the magnetization transfer starts with the resonances of the heteronucleus such that the sensitivity can be improved by a factor 75 or 8, respectively. The basic design of pulse sequences for the determination of longitudinal and transversal relaxation rates is depicted in Fig. 2. The experiments start with a polarization transfer from the proton to the heteronucleus whose relaxation rate is to be determined. The evolution of the chemical shift of the heteronucleus follows in the indirect dimension. After this period a special element has to be inserted in the pulse sequence for the measurement of the relaxation rates. In order to avoid influences of the cross correlation between the dipolar ¹H–¹⁵N or ¹H–¹³C interactions and the CSA of the ¹⁵N or ¹³C nucleus, the ¹H resonances are inverted using 180° pulses during the relaxation period (Kay et al., 1992b; Boyd et al., 1991, 1990). At the end of such a typical experiment the magnetization is transferred back to the protons for detection. Between successive transients a delay is inserted such that the magnetization can again reach thermal equilibrium.
2.2.1. Determination of the Longitudinal Relaxation Time
The longitudinal relaxation time T₁, or the so-called spin–lattice relaxation time, describes the rate with which a disturbed spin ensemble returns to the state of equilibrium. The experimental determination of this relaxation time is usually carried out with the inversion recovery experiment (Freeman and Hill, 1969; Vold et al., 1968). In its simplest form the pulse sequence consists of a 180° pulse which inverts the macroscopic magnetization from $+M_z$ to $-M_z$, the relaxation period T, and finally a 90° pulse which produces the transversal magnetization necessary for detection (Fig. 4A). Experiments with increasing values of T lead to signals starting with negative amplitudes and ending with positive amplitudes. The T dependence of the signal amplitudes may be described by the equation
$$M(T) = M_\infty - (M_\infty - M_0)\, e^{-T/T_1}$$
where $M_\infty$ is the value of the equilibrium magnetization and $M_0$ is the value of the inverted magnetization. For the determination of T₁ relaxation times a fit of three parameters is necessary. A two-parameter fit is possible only when the spin ensemble is in equilibrium before the experiment and when the 180° pulse inverts all spins such that $M_0 = -M_\infty$. Since in a usual experiment many FIDs have to be accumulated, and because off-resonance effects make a complete inversion of all spins infeasible, this prerequisite is not fulfilled in most cases. A modified inversion recovery experiment in which these problems do not arise is depicted in Fig. 4B (Sklenar et al., 1987). In this experiment the first 180° pulse is divided into two 90° pulses. The first pulse produces transversal magnetization, which is rotated by the second pulse alternately in the direction of the +z and −z axes. When the phase of the detector is also inverted accordingly, such that the two FIDs in the sequence are subtracted from each other, the resulting signal amplitude is always positive and decreases with increasing T in an exponential manner (Fig. 4C):
$$M(T) = M(0)\, e^{-T/T_1}$$
This procedure avoids artifacts in T₁ values due to off-resonance effects and leads to correct values even if the equilibrium magnetization has not yet been reached at the beginning of the experiment.
2.2.2. Determination of the Transversal Relaxation Time
The transversal relaxation time T₂, or the so-called spin–spin relaxation time, describes the rate with which the phase coherence between the spins of a spin ensemble decreases. In the classical description, following a 90° pulse almost half of the spins precess in phase on the upper cone, and the other half in phase on the lower cone, of a double cone. The macroscopic magnetization moves in the transversal plane. The phase relation between coherently precessing nuclei is lost
with the time constant T₂ or, in other words, the time constant T₂ is correlated with the loss of coherence. In contrast to spin–lattice relaxation, the energy of the spin ensemble does not change. Instead, spin–spin relaxation leads to an increase of entropy. This mechanism can be described as follows: through fluctuating magnetic fields energy may be transferred from one nucleus to another. One nucleus may change from the high-energy level to the low-energy level, whereas at the same time the adjacent nucleus changes from the low-energy level to the high-energy level. The donated energy and the accepted energy are the same. However, the phase relation is lost. In addition to the spin–spin interaction another contribution to the transversal relaxation time has to be considered. Since the sample has an extended volume and since the magnetic field is not homogeneous over this volume, not all nuclei are positioned in exactly the same magnetic field. Even chemically equivalent nuclei precess with different Larmor frequencies because of the inhomogeneities of the magnetic field. The inhomogeneity of the magnetic field therefore also decreases the transversal magnetization. This contribution to T₂ may produce artifacts in the analysis of molecular motions. In order to suppress it, a spin-echo procedure developed by Carr and Purcell (1954) and by Meiboom et al. (1958) is usually applied for the determination of the transversal relaxation time. In the so-called CPMG pulse sequence the starting magnetization evolves during the delay time τ according to the chemical shift and the inhomogeneity of the magnetic field. However, it is refocused during the second delay τ, since the 180° pulse reverses the direction of the rotation of the magnetic vector. It is a prerequisite for the refocusing that the molecules be located in the identical field during the first and the second delay τ. Successive repeats of CPMG sequences lead to an exponential decay of the signal amplitude with the time constant T₂:
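$$M(N) = M(0)\, \exp\!\left[-\frac{N\,(2\tau + t_{180})}{T_2}\right]$$
Here τ is the delay on either side of the 180° pulse, N indicates the number of CPMG elements, and $t_{180}$ is the length of the ¹⁵N or ¹³C 180° pulse (Fig. 2.4).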
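Both the difference-mode inversion recovery of Sec. 2.2.1 and the CPMG echo train decay as single exponentials, so the time constant can be estimated from a straight-line fit to the logarithm of the signal amplitude. The Python sketch below illustrates this with noise-free synthetic data; the function name `fit_exponential_decay` and all numerical values are hypothetical, and a real analysis would normally use a weighted nonlinear fit with error estimates.

```python
import math


def fit_exponential_decay(times, intensities):
    """Fit I(t) = I0 * exp(-t / T) by linear least squares on ln(I).

    Returns (I0, T). All intensities must be positive, which holds for
    the difference-mode T1 experiment and for CPMG echo amplitudes.
    """
    if len(times) != len(intensities) or len(times) < 2:
        raise ValueError("need at least two (time, intensity) pairs")
    if any(value <= 0.0 for value in intensities):
        raise ValueError("intensities must be positive")
    logs = [math.log(value) for value in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    if slope >= 0.0:
        raise ValueError("signal does not decay")
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -1.0 / slope


# Synthetic CPMG-like data: total relaxation delays in seconds and
# amplitudes generated with an assumed T2 of 0.12 s.
delays = [0.008, 0.024, 0.048, 0.080, 0.120, 0.176]
true_t2 = 0.12
amplitudes = [1000.0 * math.exp(-t / true_t2) for t in delays]

i0, t2 = fit_exponential_decay(delays, amplitudes)
print(f"I0 = {i0:.1f}, T2 = {t2:.4f} s")
```

The same routine applies to the modified T₁ experiment if the relaxation periods T are passed as `times`. The log-linear fit weights small amplitudes too heavily when noise is present, which is one reason a nonlinear fit is usually preferred for real data.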
2.2.3. Determination of the Heteronuclear NOE
The change in intensity of the NMR signal of the nucleus S during the perturbation of the equilibrium magnetization of an adjacent nucleus I is called the nuclear Overhauser enhancement effect (NOE). All so-called NOE experiments, such as the transient NOE or the NOESY experiment, are based on this effect. In order to investigate the dynamics of proteins the heteronuclear NOE is usually determined. A change in amplitude of the or resonance is effected by a saturation of or covalently bound protons. The intensity of the heteronuclear NOE can be derived directly from the longitudinal relaxation behavior of an IS spin
system (Goldman, 1988):
with
$^{13}$C NOE values should be between 1 and 3, while $^{15}$N NOE values should be between 1 and –4, since the gyromagnetic ratio of $^{15}$N is negative. Heteronuclear NOEs are determined by recording two spectra, one with proton saturation and one without saturation. The delay times between successive scans have to be chosen in such a way that the $^{15}$N or the $^{13}$C spin ensemble has again reached equilibrium.
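The quoted limits can be checked with the textbook expression for the maximum steady-state enhancement, NOE_max = 1 + γ_I/(2γ_S), which holds for purely dipolar relaxation in the extreme-narrowing limit. This expression is a standard result and an assumption here; it is not the equation of Goldman (1988) referred to above. The gyromagnetic ratios are standard literature values.

```python
# Gyromagnetic ratios in units of 1e7 rad s^-1 T^-1 (standard literature values).
GAMMA = {"1H": 26.7522, "13C": 6.7283, "15N": -2.7126}


def max_heteronuclear_noe(observed, saturated="1H"):
    """Maximum NOE (ratio of saturated to unsaturated intensity), extreme narrowing."""
    return 1.0 + GAMMA[saturated] / (2.0 * GAMMA[observed])


noe_13c = max_heteronuclear_noe("13C")
noe_15n = max_heteronuclear_noe("15N")
print(f"13C: {noe_13c:.2f}   15N: {noe_15n:.2f}")
```

The negative sign for $^{15}$N arises solely from its negative gyromagnetic ratio. For slowly tumbling proteins the observed values lie well inside these limits.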
3. BACKBONE DYNAMICS DERIVED FROM $^{15}$N RELAXATION RATES
The investigation of protein backbone dynamics using $^{15}$N relaxation time measurements has become routine during recent years. The experiments are based on a two-dimensional $^{1}$H–$^{15}$N HSQC experiment, which has to be modified according to the requirements for the determination of relaxation times. Because of the easy access to the relaxation rates and because of the relatively simple interpretation of the data, corresponding investigations have been carried out for many proteins. The results of these studies are discussed in Sec. 3.4.
3.1. Experimental Details
In the experiments for the determination of relaxation rates the methodology of water resonance suppression is of utmost importance. In early experiments the water resonance was presaturated during the delay time between two scans using a pulse of low power, or a so-called spin-lock pulse was applied. The spin-lock pulse has a high field strength and a length of a few milliseconds. It induces an equal population of the energy levels such that the water resonance disappears. Usually neither method achieves complete suppression of the water resonance, and the baseline of the spectrum is very often disturbed. Due to the exchange of amide protons with water protons, artifacts in the relaxation rates may occur; in particular, NOEs may be wrong (Grzesiek and Bax, 1993b). With the introduction of pulsed-field gradients in NMR spectroscopy, new methods for water resonance suppression have been developed. With respect to the determination of relaxation rates, two different experiments have become important: the sensitivity-enhanced HSQC experiment with pulsed-field gradients for coherence selection (SE-HSQC) and the so-called water-flipback HSQC (WFB-HSQC).
3.1.1. Sensitivity-Enhanced HSQC Experiment (SE-HSQC)
In this type of experiment pulsed-field gradients are used in order to minimize the artifacts of the spectra. The intense solvent resonance has to be suppressed and a coherence transfer pathway has to be selected whereby magnetization passes from $^{15}$N to $^{1}$H for observation. Bax and Pochapski (1992) have described in detail the use of gradients to eliminate artifacts in NMR spectra. Following their proposal, equal gradient pulses were inserted on opposite sides of simultaneous $^{1}$H/$^{15}$N 180° pulses in order to ensure that only transverse magnetization present before and after the application of the pulse pair is refocused. In addition, gradients are inserted during intervals when the magnetization is of the form $I_zN_z$, where $I_z$ and $N_z$ denote the z components of $^{1}$H and $^{15}$N magnetization. In addition to eliminating artifacts
due to pulse imperfections, insertion of gradients during these periods in the sequences improves suppression of the water resonance. In order to ensure that magnetization originates on $^{1}$H and not on $^{15}$N in the case of $T_1$ and $T_2$ measurements, the $T_1$ and $T_2$ sequences begin with a $^{15}$N 90° pulse followed by the application of a gradient pulse in order to dephase any $^{15}$N magnetization in the transverse plane. The heavily shaded gradients in Fig. 6 are used for coherence selection. As discussed previously (Muhandiram and Kay, 1994; Kay et al., 1992a), the use of pulsed-field gradients to select coherence transfer pathways in combination with the enhanced sensitivity method developed by Rance and co-workers (Cavanagh et al., 1991; Palmer et al., 1991) has generated spectra free of artifacts. Water suppression levels are also excellent, and increased sensitivity, by as much as a factor of $\sqrt{2}$ relative to the unenhanced, nongradient spectra, has been achieved. A recent comparison of spectra recorded on a number of different proteins ranging in molecular weight from approximately 10 to 20 kDa showed that when using the enhancement approach of Rance and co-workers, the use of gradients for pathway selection does not decrease the sensitivity of spectra (Muhandiram and Kay, 1994). Note that the relaxation periods have to be placed behind the indirect evolution periods, since it is not possible to preserve both cosine- and sine-modulated $t_1$ components during the relaxation periods. This is not really true in the $T_2$ sequence, but in this experiment imperfections in the pulses during the CPMG pulse train will affect the sine and cosine components of the magnetization differently, leading to potential errors in the measured $T_2$ values. The steady-state $^{1}$H–$^{15}$N NOE values are obtained by recording spectra with (NOE experiment) and without (NONOE experiment) the use of $^{1}$H saturation applied before the start of the experiment. The level of water suppression in spectra recorded without saturation is often significantly worse than that in the saturation case and can lead to difficulties in obtaining accurate values for peak intensities. Moreover, any saturation of protons prior to the start of the NONOE experiment gives rise to a small truncated NOE effect. Therefore, suppression of the strong water resonance is achieved via the use of gradients to select the coherence transfer pathway and via the application of a $^{1}$H 90° pulse followed by a gradient pulse just prior to the start of the experiment to eliminate $^{1}$H magnetization. Elimination of all $^{1}$H magnetization (including solvent) in this way can be achieved in 2–3 ms, which is sufficiently short so that cross relaxation may be neglected during this interval.
3.1.2. Water-Flipback HSQC
The excellent water suppression in the sensitivity-enhanced HSQC is achieved by dephasing of the water magnetization. Unfortunately, owing to the very long $T_1$ of water protons (4–5 s) compared to protein protons (~1.4 s), the water remains in a semisaturated state as the delay time between scans is typically much
shorter than the $T_1$ of water. Therefore, the resonance amplitudes of labile protons which exchange at a rate that is fast compared with the delay time are strongly attenuated. In contrast, the water-flipback HSQC approach leaves the water magnetization unperturbed along the $+z$ axis during most of the NMR pulse sequence, thereby avoiding saturation. Figure 7 shows the pulse schemes for nonsaturated relaxation time measurements. Immediately after the $^{1}$H magnetization has been transformed into heteronuclear two-spin magnetization (time a), the water magnetization is rotated from the y axis to the z axis by the subsequent low-power pulse. The pulsed-field gradient ensures that no radiation damping occurs. Subsequently, $^{15}$N magnetization evolves during the $t_1$ period, with refocusing of antiphase magnetization, so that at time b in the sequence the signal is in-phase. Note that the 180° pulse applied during this period rotates the water magnetization, such that it is aligned along the $-z$ axis. In order to determine the longitudinal relaxation time $T_1$, the modified inversion recovery sequence is used; i.e., the pulse at time b establishes nitrogen z magnetization, which is allowed to relax during the mixing time T. For measuring the transverse relaxation time $T_2$, the CPMG scheme is inserted in the pulse sequence at this time. After the T period, magnetization is returned to the NH proton for detection via an INEPT pulse sequence (Morris and Freeman, 1979) in both schemes. During this period the water magnetization is inverted. However, the combined action of the selective water pulse and the 90° pulse ensures that the water magnetization is restored to the $+z$ axis. The final 180° pulse is surrounded by water-selective 90° pulses and pulsed-field gradients (Piotto et al., 1992). This leaves the water magnetization in a well-defined state along the $+z$ axis, providing excellent water suppression. The $^{1}$H–$^{15}$N NOE values are measured by comparing the intensity of signal transferred from $^{15}$N to $^{1}$H in the absence and presence of $^{1}$H saturation. Magnetization exchange between amide protons and water protons, either via (exchange-mediated) NOE or hydrogen exchange, can cause amide protons to relax to their thermal equilibrium value with a time constant that can be much longer than their inherent $T_1$. For reasons of sensitivity, the experiment is frequently repeated with a delay between scans of only ~4–6 s (Stone et al., 1992; Clore et al., 1990b; Kay et al., 1989), shorter than the $T_1$ of the water resonance. As a consequence, unless special precaution is taken (Fig. 7C) to avoid saturation of water, the reference signal (with no NOE) for an amide subject to hydrogen exchange may be much smaller than its true equilibrium value. Besides the advantage of avoiding water saturation in comparison to the sensitivity-enhanced pulse schemes, a second point can be of interest. Since the relaxation periods for $T_1$ and $T_2$ can be placed before the indirect evolution period, an exchange process between conformers during the mixing time does not lead to artifacts in the spectra. This effect will be discussed in detail in Sec. 7. Which type of pulse sequence leads to better results cannot be answered in a general way. If an exchange of amide protons with water protons is expected due to a high pH value or due to an unfolded tertiary structure, it is advisable to use the water-flipback HSQC method. In most other cases the sensitivity-enhanced HSQC should be preferred, since this experiment requires only a very short phase cycle and offers higher sensitivity. For the determination of $T_1$ and $T_2$ relaxation rates, a series of 7 to 10 spectra is usually recorded. The range of delay times T has to be selected according to the
correlation time of the protein, which in most cases is not known a priori. Starting with the setup of instrument conditions, a series of first increments with increasing length of T should be recorded in order to determine the value of T at which the intensity has decreased to 1/e. The intermediate delay times should be set such that the amplitude decreases in a linear way. Typical values for the determination of longitudinal relaxation times are between 10 and 800 ms. For the determination of $T_2$ times, values between 4 and 150 ms should be set. A recycle delay of 1.4 s was employed when recording $T_1$ and $T_2$ experiments. Steady-state $^{1}$H–$^{15}$N NOE values were determined from spectra recorded in the presence and absence of proton saturation. Saturation was achieved by the application of 120° $^{1}$H pulses every 5 ms (Markley et al., 1971). NOE spectra recorded with proton saturation utilized a 3-s relaxation delay followed by a 3-s period of saturation, while spectra recorded in the absence of proton saturation employed a relaxation delay of 6 s. The number of scans depends on several factors, such as the sample concentration, the mass of the protein, the field strength of the spectrometer, the temperature, and, of course, the available time for the measurement. However, in our experience economizing on the number of scans is risky, since a low signal-to-noise ratio leads to poorly
defined relaxation rates and thus to microdynamical parameters which are difficult to interpret. A minimum of 16 scans for $T_1$ and $T_2$ measurements and 32 scans for the NOE spectra is recommended. In recent publications it has been suggested that relaxation times should be extracted from pseudo-3D spectra, as the experimental conditions for all spectra of a series are then completely identical (Tjandra et al., 1996).
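The advice to space the delays so that the amplitude decreases linearly can be turned into a small recipe once the 1/e time has been estimated from the first increments. The following is a minimal sketch under the assumption of a single-exponential decay; the function name and the `min_fraction` parameter (the residual amplitude at the longest delay) are hypothetical choices, not values from the text.

```python
import math


def linear_amplitude_delays(t_relax, n_points, min_fraction=0.1):
    """Delays T_k for which exp(-T_k / t_relax) falls in equal steps from 1 to min_fraction."""
    if n_points < 2 or not 0.0 < min_fraction < 1.0:
        raise ValueError("need n_points >= 2 and 0 < min_fraction < 1")
    step = (1.0 - min_fraction) / (n_points - 1)
    return [t_relax * math.log(1.0 / (1.0 - k * step)) for k in range(n_points)]


# Example: estimated 1/e time of 0.5 s and eight spectra in the series.
delays = linear_amplitude_delays(0.5, 8)
for t in delays:
    print(f"T = {1000.0 * t:7.1f} ms   expected amplitude = {math.exp(-t / 0.5):.3f}")
```

In practice the shortest delay is limited by the pulse sequence, and the longest one by sample heating and sensitivity, so the computed values serve only as a starting point.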
3.2. Processing of Spectra and Determination of Relaxation Rates
The processing of two-dimensional spectra for the determination of relaxation rates is usually performed in four steps: apodisation, zero filling of the data matrix, Fourier transformation, and phase correction. For the apodisation, the FIDs of each dimension are usually multiplied with a suitable window function. Zero filling in each dimension should be chosen in such a way that a digital resolution of the order of 2 Hz/point is reached. After apodisation and zero filling, Fourier transformation and phase correction of the spectra follow. Linear prediction procedures for improving the resolution of the spectra should be avoided. Skelton et al. (1993) have shown that this technique leads to a reduction of the error margins of the relaxation rates; however, there may be deviations from the real values of the relaxation rates. For the calculation of relaxation times the signal intensities of each spectrum have to be determined. For data points with coordinates $(t_i, I_i)$ an exponential function of the type
$$I(t) = A \exp(-t/T_{1,2})$$
is fitted using a simplex algorithm (Press et al., 1992) such that the average quadratic error is minimized:
$$\chi^2 = \frac{1}{N} \sum_{i=1}^{N} \left[ I_i - A \exp(-t_i/T_{1,2}) \right]^2$$
N indicates the number of signal heights in a corresponding series. It is also possible to use volume integrals for the determination of relaxation times (Farrow et al., 1995). However, in some cases the results may be of lower quality because of the difficulties with signal integration. It should be possible to improve the determination of volume integrals with the development of appropriate software. All parameters determined with fitting procedures, such as amplitudes and time constants, have inherent errors. Standard procedures for the determination of variances are described in the following. Since possible deviations in the signal heights are not known a priori, they can be estimated using the value of $\chi^2$. A prerequisite of this procedure is that all signal heights be subject to the same
range of absolute deviations. A proof of this assumption can be obtained with an analysis of the noise level in the two-dimensional spectra. For the average deviation of the signal height a value of $\sigma = \sqrt{\chi^2}$, the square root of the average quadratic error of the fit, may be chosen. In a next step the variance of the amplitude and the time constant can be determined using a Monte Carlo simulation (Kamath and Shriver, 1989). In this procedure a distribution of random numbers centered around zero is produced. The width of such a distribution is given by $\sigma$. These values are then added to the signal heights, and a new fitting of both parameters (amplitudes and time constants) with the already described methodology is carried out. In this way artificial sets of data are produced, and a normal distribution of the amplitudes and time constants can be derived, from which the standard deviations of both quantities (corresponding to a probability of 68%) are obtained. For the evaluation of relaxation rates about 500 sets of data should be simulated. The steady-state NOE values are determined from the ratios of the average intensities of the peaks with and without proton saturation. The standard deviation of the NOE value, $\sigma_{\mathrm{NOE}}$, can be determined on the basis of the measured background noise levels using the relationship (Nicholson et al., 1992)
$$\frac{\sigma_{\mathrm{NOE}}}{\mathrm{NOE}} = \left[ \left( \frac{\sigma_{I_{\mathrm{sat}}}}{I_{\mathrm{sat}}} \right)^2 + \left( \frac{\sigma_{I_{\mathrm{unsat}}}}{I_{\mathrm{unsat}}} \right)^2 \right]^{1/2}$$
where $I_{\mathrm{sat}}$ and $I_{\mathrm{unsat}}$ represent the measured intensities of a resonance in the presence and absence of proton saturation, respectively. The standard deviations of these values, calculated from the root-mean-square noise of background regions, are represented by $\sigma_{I_{\mathrm{sat}}}$ and $\sigma_{I_{\mathrm{unsat}}}$.
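The fitting and error-estimation steps of this section can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure of the cited references: instead of a simplex algorithm, the sketch uses a one-dimensional golden-section search over the rate with the amplitude solved analytically, which minimizes the same quadratic error for a two-parameter exponential. The rate bracket, the random seed, the function names, and the example data are all hypothetical.

```python
import math
import random
import statistics


def fit_exponential(times, heights, rate_min=0.1, rate_max=100.0):
    """Least-squares fit of I(t) = A * exp(-R * t); returns (A, R) with R = 1/T."""
    def amplitude(rate):
        # For a fixed rate the best amplitude follows from linear least squares.
        e = [math.exp(-rate * t) for t in times]
        return sum(h * x for h, x in zip(heights, e)) / sum(x * x for x in e)

    def squared_error(log_rate):
        rate = math.exp(log_rate)
        a = amplitude(rate)
        return sum((h - a * math.exp(-rate * t)) ** 2 for t, h in zip(times, heights))

    # Golden-section search over log(rate); the bracket (in 1/s) is an assumption.
    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = math.log(rate_min), math.log(rate_max)
    for _ in range(60):
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if squared_error(c) < squared_error(d):
            hi = d
        else:
            lo = c
    rate = math.exp(0.5 * (lo + hi))
    return amplitude(rate), rate


def monte_carlo_errors(times, heights, n_sets=500, seed=1):
    """Fit once, then refit n_sets noise-perturbed copies of the signal heights."""
    amp, rate = fit_exponential(times, heights)
    residuals = [h - amp * math.exp(-rate * t) for t, h in zip(times, heights)]
    sigma = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    rng = random.Random(seed)
    fits = [fit_exponential(times, [h + rng.gauss(0.0, sigma) for h in heights])
            for _ in range(n_sets)]
    sd_amp = statistics.stdev([a for a, _ in fits])
    sd_rate = statistics.stdev([r for _, r in fits])
    return amp, rate, sd_amp, sd_rate


def noe_with_error(i_sat, i_unsat, sd_sat, sd_unsat):
    """NOE ratio and its standard deviation from the relative intensity errors."""
    noe = i_sat / i_unsat
    relative = math.sqrt((sd_sat / i_sat) ** 2 + (sd_unsat / i_unsat) ** 2)
    return noe, abs(noe) * relative


# Hypothetical series: true amplitude 100, true rate 2 s^-1, small fixed perturbations.
delays = [0.01, 0.05, 0.1, 0.2, 0.3, 0.45, 0.6, 0.8]
noise = [0.5, -0.4, 0.3, -0.6, 0.2, -0.1, 0.4, -0.3]
heights = [100.0 * math.exp(-2.0 * t) + n for t, n in zip(delays, noise)]
amp, rate, sd_amp, sd_rate = monte_carlo_errors(delays, heights)
print(f"A = {amp:.2f} +/- {sd_amp:.2f},  R = {rate:.3f} +/- {sd_rate:.3f} s^-1")
```

The noise width is taken from the root-mean-square residual of the first fit, following the text; if the spectral noise level is known independently, it can be passed in instead.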
3.3. Calculation of Microdynamical Parameters
As shown in Sec. 2.1 the spin relaxation of the $^{15}$N nuclei is primarily caused by the dipolar interaction between the $^{15}$N nucleus and the attached proton and by the chemical-shift anisotropy. The relaxation rates and NOEs are dependent on the spectral density function $J(\omega)$ describing the contribution of five distinct frequencies (0, $\omega_N$, $\omega_H$, $\omega_H - \omega_N$, $\omega_H + \omega_N$) to the motion of the individual amide bond vectors:
The constants
and
are defined as
The assumption of an axially symmetric chemical-shift tensor has been shown to be valid for peptide bonds with $\sigma_\parallel - \sigma_\perp = -160$ ppm (Hiyama et al., 1988). In the absence of any assumption about the form of the spectral density function, the three experiments ($T_1$, $T_2$, NOE) do not provide enough information to enable a direct determination of the spectral density function at all five frequencies. Rather than performing additional relaxation measurements, as described by Peng et al. (1991a, 1991b) and Habazettl et al. (1996), to map all five spectral density values, one may describe the spectral density by using a simpler model-free formalism for an isotropically tumbling protein (Lipari and Szabo, 1982a, 1982b) as follows: (In the original publication of Lipari and Szabo the spectral density function is smaller by a factor of 5. This factor was already included in the constants defined above.)
where $\tau^{-1} = \tau_c^{-1} + \tau_e^{-1}$. The model-free approach characterizes both the rate and amplitude of internal motions for individual N–H vectors in terms of the generalized order parameter $S^2$, the overall rotational correlation time $\tau_c$, and the effective correlation time $\tau_e$.
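To show how the three model-free parameters translate into observable quantities, the following sketch evaluates $R_1$, $R_2$, and the NOE of a backbone $^{15}$N nucleus. It uses the widely quoted form of the dipolar and CSA rate expressions with the factor 2/5 kept inside the spectral density, which is an assumption about normalization and may differ from the convention of this chapter, where the factor is absorbed into the constants. The N–H distance of 1.02 Å, the anisotropy of –160 ppm, and the example parameters are likewise assumed values.

```python
import math

MU0_OVER_4PI = 1.0e-7        # T m A^-1
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752e8           # rad s^-1 T^-1
GAMMA_N = -2.7126e7          # rad s^-1 T^-1
R_NH = 1.02e-10              # N-H distance in m (assumed)
DELTA_SIGMA = -160.0e-6      # chemical-shift anisotropy (assumed)


def j_model_free(omega, tau_c, s2, tau_e):
    """Lipari-Szabo spectral density with the 2/5 normalization (assumed convention)."""
    tau = 1.0 / (1.0 / tau_c + 1.0 / tau_e)
    return 0.4 * (s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
                  + (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2))


def n15_relaxation(b0, tau_c, s2, tau_e):
    """Return (R1, R2, NOE) for an amide 15N at field b0 (tesla), without exchange."""
    w_h = abs(GAMMA_H) * b0          # Larmor frequencies as magnitudes
    w_n = abs(GAMMA_N) * b0
    d2 = (MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_N / R_NH ** 3) ** 2
    c2 = (w_n * DELTA_SIGMA) ** 2 / 3.0

    def j(w):
        return j_model_free(w, tau_c, s2, tau_e)

    r1 = d2 / 4.0 * (j(w_h - w_n) + 3.0 * j(w_n) + 6.0 * j(w_h + w_n)) + c2 * j(w_n)
    r2 = (d2 / 8.0 * (4.0 * j(0.0) + j(w_h - w_n) + 3.0 * j(w_n)
                      + 6.0 * j(w_h) + 6.0 * j(w_h + w_n))
          + c2 / 6.0 * (4.0 * j(0.0) + 3.0 * j(w_n)))
    noe = 1.0 + d2 / (4.0 * r1) * (GAMMA_H / GAMMA_N) * (6.0 * j(w_h + w_n) - j(w_h - w_n))
    return r1, r2, noe


# Example: 14.1 T (600 MHz 1H), tau_c = 5 ns, S2 = 0.85, tau_e = 20 ps (hypothetical).
r1, r2, noe = n15_relaxation(14.1, 5.0e-9, 0.85, 20.0e-12)
print(f"R1 = {r1:.2f} s^-1, R2 = {r2:.2f} s^-1, NOE = {noe:.2f}")
```

Any exchange contribution to the transverse rate would have to be added to the returned $R_2$ separately.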
The values of the generalized order parameter $S^2$ may range from 0 (corresponding to completely isotropic motion) to 1 (completely restricted motion). The effective correlation time $\tau_e$ describes the rate of motion for fluctuations of the amide bond vector occurring on a time scale faster than that of the overall motion $\tau_c$. An extended form of the model-free spectral density function has been developed (Clore et al., 1990a, 1990b) to describe internal motions that take place on two distinct time scales, differing by at least one order of magnitude. Assuming that the term of the correlation function describing the faster of the two time scales contributes a negligible amount to the relaxation, the modified spectral density function becomes
where the order parameter $S^2$ is expressed as the product of two order parameters characterizing the fast and slow internal motions, $S_f^2$ and $S_s^2$, respectively. The effective correlation time for the slow internal motions, $\tau_s$, is included using the relationship $\tau^{-1} = \tau_c^{-1} + \tau_s^{-1}$.
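The relation between the simple and the extended model-free description can be made concrete with a short numerical sketch. It assumes the commonly used two-time-scale form in which the fast-motion term is dropped and the 2/5 factor is kept inside the spectral density; the normalization, function names, and example values are assumptions and may differ from the convention of this chapter.

```python
def j_simple(omega, tau_c, s2, tau_e):
    """Simple model-free spectral density (2/5 normalization assumed)."""
    tau = 1.0 / (1.0 / tau_c + 1.0 / tau_e)
    return 0.4 * (s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
                  + (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2))


def j_extended(omega, tau_c, s2_fast, s2_slow, tau_s):
    """Extended form: S2 = S2_fast * S2_slow, fast-motion term neglected."""
    s2 = s2_fast * s2_slow
    tau = 1.0 / (1.0 / tau_c + 1.0 / tau_s)
    return 0.4 * (s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
                  + (s2_fast - s2) * tau / (1.0 + (omega * tau) ** 2))


# Hypothetical parameters: tau_c = 8 ns, slow motion of 1 ns with S2_slow = 0.7.
omega_n = 3.8e8  # rad/s, roughly the 15N frequency at 14.1 T
without_fast = j_extended(omega_n, 8.0e-9, 1.0, 0.7, 1.0e-9)
with_fast = j_extended(omega_n, 8.0e-9, 0.8, 0.7, 1.0e-9)
print(f"J without fast motion: {without_fast:.3e} s, with S2_fast = 0.8: {with_fast:.3e} s")
```

For an order parameter of 1 for the fast motion, the extended form reduces to the simple one with the slow correlation time taking the role of the effective correlation time, which is a convenient consistency check when implementing both models.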
Since transverse relaxation times can also be affected by internal motions occurring on the microsecond-to-millisecond time scale, such as those arising from conformational exchange averaging, the exchange-broadening contribution, $R_{ex}$, was introduced as a correction to the transverse relaxation rate for the calculation of model-free parameters (Farrow et al., 1994; Stone et al., 1992). The overall correlation time $\tau_c$ can be determined with two different procedures. The first procedure is based on the analysis of the ratio between the longitudinal and the transverse relaxation times. This calculation can be carried out without great difficulty. The second procedure depends on the quality of the fit of experimental data to obtain the microdynamical motional parameters. While the first procedure is of a qualitative nature, the second procedure may lead to quantitative data. According to the first procedure, the ratio $T_1/T_2$ is determined for all amino acids. An average as well as a standard deviation over all amino acids can be calculated. For those residues for which the ratio deviates from the average by less than one standard deviation, a motion is assumed for which the amplitude is restricted and for which the internal correlation time is short. For those cases the overall correlation time can be estimated from the equation
For those nuclei for which the ratio is found to be more than a standard deviation above the average, a shortening of the transverse relaxation time $T_2$ owing to the presence of conformational equilibria can be assumed. For those nuclei for which the ratio is more than a standard deviation below the average, a significant prolongation of the $T_2$ time is induced by an additional internal motion. According to the second method, the determination of the overall correlation time is carried out by a minimization of the following error function, whereby the motion of the N–H bond vector is described following the model-free approach of Lipari and Szabo:
N is the number of amino acids for which relaxation data are available, and $\sigma_{T_1,i}$, $\sigma_{T_2,i}$, and $\sigma_{\mathrm{NOE},i}$ are the standard deviations of the $T_1$, $T_2$, and NOE values of residue i. For the calculation of $\chi^2$ a value for the correlation time $\tau_c$ is presumed
and the parameters of internal motion $S^2$ and $\tau_e$ for all amino acids are varied in such a way that the sum of the deviations between the calculated and the experimental relaxation rates reaches a minimum. A summation of the single deviations of all amino acids results in a total deviation between the experimental and calculated data for a distinct overall correlation time $\tau_c$. This correlation time is then varied stepwise until a value can be assigned to the protein for which the error function reaches a minimum. This way of evaluation leads to problems when the error for single amino acids is higher by orders of magnitude than that for others. Such a deviation may occur when the model assumption describes the motion of the N–H vector in an improper way or when conformational equilibria have to be considered. In these cases the overall correlation time would be primarily determined by those amino acids with a larger deviation. Such an erroneous determination of $\tau_c$ can be avoided when only those residues are taken into account for which a moderate value of $\chi^2$ has been obtained. From a plot of $\chi^2$ versus $\tau_c$ the overall correlation time can be directly obtained from the minimum of the curve. The value of $\chi^2/N$ at the minimum should be about 1, which indicates that the average deviations of the calculated internal motional parameters are within the experimental error. In order to translate the experimental data into motional parameters, a suitable model for the description of motions has to be applied. Since the motion of the N–H bond vector can be of manifold nature, various models have been introduced. As a sort of basic model for the description of the protein backbone motions, the Lipari–Szabo model (LS) [Eq. (23)] or the modified Lipari–Szabo model (MLS) [Eq. (24)] have been used quite often. These model-free approaches are quite suitable for a comparison of various similar motions. From these two basic models five other models of motion can be derived, which are listed in Table 2. In essence they differ by the number of variable motional parameters. Provided the overall
correlation time of the molecule is known, it is possible to calculate the motional parameters for each of the amino acids with all five model assumptions by minimization of the $\chi^2$ values. In fact, the motional model which best describes the motion of the N–H bond vector can be chosen using the model-specific $\chi^2$ values. A criterion for the selection may be the simplicity of the applied model. If the difference between the calculated and the experimentally determined relaxation rates is smaller than the fluctuations of the corresponding relaxation rates when applying one of the models, there is no reason to choose a more complex model. If one applies a model for representing a confidence interval of 95%, the uncertainty for the choice of a motional model will be only 5%.
3.4. Interpretation of Microdynamical Parameters
Relaxation time studies of proteins have been carried out with more than 30 different species using two-dimensional $^{1}$H–$^{15}$N NMR spectroscopy. The investigated proteins had quite a variety of functional properties, including repressor-type or regulatory proteins, transport proteins, and enzymes. Characteristic geometrical features of these proteins and the determined overall correlation times are listed in Table 3. In most of the investigations a linear relationship between the mass M of a protein and its correlation time $\tau_c$ has been assumed. The theoretical basis of this assumption is given by the Stokes–Einstein relation for rotational diffusion (Abragam, 1961):
$$\tau_c = \frac{4 \pi \eta R^3}{3 k T} = \frac{\eta V}{k T}$$
where $\eta$ is the viscosity of the solution, R is the radius, V is the volume, k is the Boltzmann constant, and T is the absolute temperature. This relation is valid for spherical molecules with a smooth surface moving in an ideal liquid. In Fig. 8, $\tau_c$ is plotted as a function of the molecular weight, and only those proteins with a nearly spherical inertia tensor have been considered. It is obvious that the validity of the Stokes–Einstein relation for proteins is limited. A reason for this may be the difference in surface structures of the various proteins. Proteins with a rough, fissured surface, with long side chains extending into the surrounding water, or with hydrophilic side chains apparently diffuse more slowly than one would expect from their molecular mass. Proteins with a hydrophobic smooth surface, on the other hand, should diffuse at a higher rate. Since $\tau_c$ determines the quality of the NMR spectra to a high degree, it should become customary to replace the traditional classification of proteins according to their molecular mass by a characterization according to their correlation times. At least from the viewpoint of NMR spectroscopy, it can be shown that larger proteins, as, for example, the flavodoxin (16.3 kDa) of D.
vulgaris, with a correlation time of 4.5 ns, can be more readily investigated than smaller proteins with a remarkably higher $\tau_c$. The order parameter $S^2$ is by far the most often used parameter for the description of internal dynamics on the ps–ns time scale. In secondary structure elements this parameter has values between 0.7 and 1, while in less-ordered regions smaller values were usually obtained, although the dynamics in these supposedly unstructured regions is not very pronounced. In some publications order parameters derived from relaxation time measurements are compared with temperature factors (B-factors) determined with
X-ray crystallography. Staphylococcal nuclease (Kay et al., 1989) showed no correlation between order parameters and B-factors, while for interleukin-1β (Clore et al., 1990b), the C-terminal SH2 domain of phospholipase C$\gamma$1 (Farrow et al., 1994), and ribonuclease T1 (Fushman et al., 1994) a poor correlation of these parameters was found. For calbindin (Kördel et al., 1992) a very pronounced correlation of these parameters was found. There is no doubt that the poor correlation of these two parameters is connected to their very different physical origin. While B-factors are a measure of the static uncertainty of atom positions in the lattice, order parameters are derived from motion on the ps–ns time scale. Therefore, a comparison of these two parameters should be made only with care.
Besides $S^2$, the internal correlation time $\tau_e$ is of significant importance. Although a correct interpretation of this quantity is only possible in the frame of a realistic model of motion (Sec. 3.3), its order of magnitude already allows a qualitative description of the motion. However, its determination with an acceptable variance is extremely difficult. Fushman et al. (1994) showed that the $T_1$ as well as the $T_2$ relaxation times determine the order parameter $S^2$. But both quantities are relatively insensitive with respect to the determination of the local correlation time. With the prerequisite that the reorientation time of the molecule is known, one relaxation time is sufficient for the determination of $S^2$, while a determination of $\tau_e$ is not possible. Presumably because of this difficulty a detailed interpretation of $\tau_e$ is not given in most relaxation time studies. The parameter $R_{ex}$ indicates a motion on the microsecond-to-millisecond time scale. This parameter is a consequence of the change of the magnetic environment of the nucleus during the delay of a CPMG sequence. Because of this change the chemical shift is not refocused completely. A reason for this behavior is either a motion of nuclei in the environment of the nucleus, which may change its magnetic environment, or a motion of the nucleus itself in an inhomogeneous magnetic field. Relatively rapid conformational changes of protein segments which may occur during the biological function of a protein may happen on this time scale and are necessary for its activity. First contacts of substrates with binding regions of active sites may be connected with such motions; hence, motions on this time scale are of great interest when investigating the function of proteins. Of particular interest also are investigations of the dynamics of a protein in different states. Rather than absolute values of microdynamical parameters, only differences of these quantities have to be considered. Investigations of this kind have successfully been carried out for many proteins. Owing to differences in the motional parameters, contact sites between two molecules have been identified in protein–DNA interactions (Slijper et al., 1997; Yu et al., 1996). Changes in the flexibility of protein parts near redox centers have been observed during a change from the reduced state to the oxidized state, although the overall structure did not change considerably (Hrovat et al., 1997).
4. BACKBONE DYNAMICS DERIVED FROM $^{13}$C$^{\alpha}$ RELAXATION RATES
Protein backbone dynamics can be investigated not only with the $^{15}$N but also with the $^{13}$C$^{\alpha}$ relaxation times. Usually protein samples enriched with both $^{13}$C and $^{15}$N isotopes are used, whereby the sensitivity of the experiment is considerably improved in comparison with the application of nonlabeled samples. However, two additional problems arise. During the measurement of the transverse relaxation time a coherent transfer of magnetization between the $^{13}$C$^{\alpha}$ and the neighboring $^{13}$C nuclei owing to
the scalar coupling has to be inhibited. As a second aspect the kind of relaxation mechanism has to be considered: In a completely $^{13}$C-labeled protein the $^{13}$C$^{\alpha}$ nucleus is surrounded by several other nuclei with a nuclear spin, and therefore CSA and various dipole–dipole interactions contribute to its relaxation. This environment implies that the relaxation of $^{13}$C$^{\alpha}$ cannot be described in terms of an SI spin system, as is possible for the $^{15}$N nucleus.
4.1. Analysis of the Multispin Relaxation of $^{13}$C$^{\alpha}$
In a uniformly $^{13}$C/$^{15}$N-labeled protein various spin-1/2 nuclei are located in the vicinity of the $^{13}$C$^{\alpha}$ spin. Therefore, its relaxation is influenced by the CSA relaxation and by various dipole–dipole interactions. Since the CSA constant for $^{13}$C$^{\alpha}$ amounts to only 30 ppm (Ye et al., 1993), the CSA relaxation mechanism can be neglected in most cases (Palmer et al., 1993). The equations for the dipolar interaction between two spins were derived in Sec. 2. These equations are composed of a geometric factor containing the distance and the gyromagnetic ratios of both nuclei, and a motional factor, which consists of a sum of spectral density functions. According to this description the intensity of dipole–dipole interactions is proportional to $\gamma_I^2 \gamma_S^2 / r_{IS}^6$, the value of which is given for the interaction of the $^{13}$C$^{\alpha}$ nucleus with its neighboring spins in Table 4.
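The relative weight of the geometric factors can be estimated with a few lines of code. The sketch below is not a reproduction of Table 4: the distances are typical covalent bond lengths assumed for illustration, only directly bonded neighbors are included, and the dictionary keys are labels chosen here.

```python
# Gyromagnetic ratios in units of 1e7 rad s^-1 T^-1 (standard literature values).
GAMMA = {"H": 26.7522, "C": 6.7283, "N": -2.7126}

# Assumed typical bond lengths in angstrom for the neighbors of the alpha carbon.
NEIGHBORS = {
    "H-alpha": ("H", 1.09),
    "C-beta": ("C", 1.53),
    "C-carbonyl": ("C", 1.52),
    "N-amide": ("N", 1.47),
}


def dipolar_strength(partner, distance):
    """Geometric factor gamma_C^2 * gamma_partner^2 / r^6 (arbitrary units)."""
    return (GAMMA["C"] * GAMMA[partner]) ** 2 / distance ** 6


reference = dipolar_strength(*NEIGHBORS["H-alpha"])
relative = {name: dipolar_strength(*spec) / reference for name, spec in NEIGHBORS.items()}
for name, value in relative.items():
    print(f"{name:11s} {100.0 * value:6.2f} % of the C-alpha/H-alpha interaction")
```

With these assumed distances each bonded heteronucleus contributes at the percent level or below relative to the directly attached proton. Nonbonded protons, which are not included in this sketch, add further small terms.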
4.1.1. The $T_2$ Relaxation Time
In the case of the transverse relaxation the dipole–dipole relaxation rate for all relevant interactions is proportional to J(0), and therefore the contribution of each interaction depends only on the constant $\gamma_I^2 \gamma_S^2 / r_{IS}^6$. From the values of Table 4 it is obvious that the transverse relaxation of the $^{13}$C$^{\alpha}$ is dominated by its covalently bound hydrogen $^{1}$H$^{\alpha}$. The sum of all other contributions amounts to about 5% and can therefore be neglected. As in the case of $^{15}$N relaxation, the $T_2$ data can be interpreted by assuming an isolated SI spin system.
4.1.2. The $T_1$ Relaxation Time
For the longitudinal relaxation phenomena ($T_1$ and the heteronuclear NOE) three dipole–dipole interactions have to be considered: the heteronuclear interaction, which is proportional to and the two homonuclear and interactions, which are proportional to J(0). In proteins J(0) is much larger than and consequently the products and become comparable to the product In the following the longitudinal relaxation of $^{13}$C$^{\alpha}$ in a uniformly enriched $^{13}$C/$^{15}$N protein will be compared to that of an unlabeled protein. For the uniformly enriched protein the longitudinal relaxation of $^{13}$C$^{\alpha}$ can be described by
where and Note that cross relaxation between and is eliminated by proton saturation which occurs during the relaxation period T. The cross relaxation with the carbonyl spin can also be neglected because the aliphatic pulses were adjusted such that the equilibrium magnetization of C' remains undisturbed . Assuming that at the beginning of the longitudinal relaxation period and the solution for
is
where The experiment performed for measuring records the 13C chemical shift prior to the relaxation period T such that Therefore, after two-dimensional
Jan Engelke and Heinz Rüterjans
Fourier transformation the first term of Eq. (28) gives cross peaks at while the second term corresponds to cross peaks at . In order to eliminate the dependence of the relaxation rate on the equilibrium values of the longitudinal magnetization, the magnetization of alternate scans is stored on the and the at the beginning of the relaxation period T (Sklenar et al., 1987):
Since the second term of Eq. (29) is proportional to only very weak cross peaks at will be observed (Fig. 9). Expanding the first term of Eq. (29) in a power series of T gives
it is easy to show that the contribution of the cross relaxation to the value of the cross-peak intensity at is less than 8% for ns and (Fig. 9). For proteins with an overall correlation time between 5 and 10 ns, an error of no more than 2%–4% in the measured values of
should be obtained by fitting the experimental data to a single exponential. The smaller value of of a labeled protein compared with its value in an unlabeled protein is mainly due to the additional terms and It should be emphasized that this additional magnetization loss cannot be “refocused” by any pulse sequence. For proteins with a longer rotational correlation time the contribution of and to increases steadily. As a consequence, the relaxation time cannot be interpreted with an SI spin system for larger proteins.
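A single-exponential fit of the kind referred to above can be sketched as follows. The delays and intensities are synthetic, the decay rate of 2.5 s^-1 is a hypothetical value, and the log-linear regression is one simple choice of fitting method rather than the procedure used by the authors.

```python
import math


def fit_single_exponential(delays_s, intensities):
    """Fit I(T) = I0 * exp(-R * T) by linear regression on log(I).

    Returns (I0, R). All intensities must be positive.
    """
    n = len(delays_s)
    logs = [math.log(value) for value in intensities]
    mean_t = sum(delays_s) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in delays_s)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(delays_s, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


# Synthetic, noise-free decay with a hypothetical rate of 2.5 s^-1 (T1 = 400 ms).
delays = [0.01, 0.05, 0.1, 0.2, 0.4, 0.8]
synthetic = [1.0e6 * math.exp(-2.5 * t) for t in delays]

i0, rate = fit_single_exponential(delays, synthetic)
t1_ms = 1000.0 / rate
print(f"I0 = {i0:.1f}, R1 = {rate:.3f} s^-1, T1 = {t1_ms:.1f} ms")
```

With real data a weighted nonlinear fit is usually preferable, since taking logarithms distorts the noise at long delays.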
4.1.3. The Heteronuclear NOE
The heteronuclear NOE is also influenced by the additional homonuclear dipole–dipole interactions. In order to quantify these effects the heteronuclear NOE of of alanine in a uniformly labeled protein is compared to its value in an unlabeled sample. Alanine seems to be a good test compound because the large NOE value of the group should have a significant influence on the NOE of Cross relaxation takes place between and between and and between and The following equation describes these interactions:
where and and n denotes the number of protons covalently bound to For steady-state conditions the heteronuclear NOE of can be derived from Eq. (31), assuming that and are saturated. For the NOE of the following equation is obtained:
For a quantification of the heteronuclear NOE the spectral density function derived
from the model-free approach of Lipari and Szabo (1982a) is used in order to
describe the dynamic behavior of the bond and the ' bond. For the methyl group the modified model-free spectral density function is applied,
taking into account the rapid methyl jump rate around the bond and the reorientation of the bond on a time scale intermediate between the methyl jump rate and the overall tumbling rate (Clore et al., 1990a, 1990b). The comparison of the heteronuclear NOE of in the labeled, selectively labeled, and unlabeled alanine is shown in Fig. 10. For the unlabeled alanine Eq. (32) was used, assuming that For a protein with a value up to 10 ns, the difference of the NOE values of labeled and unlabeled alanine should be less than 5%. The experimental errors should not be more than 5%–10% of the absolute value, and hence it seems justified to describe as an SI spin system for the determination of NOE values. From the above outline it seems obvious that the transverse relaxation of aliphatic nuclei is dominated by its interaction with the covalently bound hydrogen(s), and, hence, with respect to its relaxation behavior these carbons form an SI spin system. For the longitudinal relaxation rate and the heteronuclear NOE, contributions of both the heteronuclear interaction(s) and the homonuclear interaction(s) have to be considered. The ratio of their contributions mainly depends on the overall correlation time; in fact, for small proteins the
heteronuclear interaction dominates and an isolated SI spin system can be assumed. With larger rotational correlation times the contribution of the homonuclear interactions increases, and the relaxation of neighboring aliphatic nuclei has to be considered in a multispin system.
4.2. Experiments to Determine the Relaxation Rates
In the determination of the transverse relaxation time a coherent transfer of magnetization between the and the nuclei owing to the scalar coupling has to be prevented. For this purpose differently shaped transverse relaxation periods T have been simulated by quantum-mechanical calculations in order to search for conditions in which the coherent transfer of magnetization is minimal (Engelke and Rüterjans, 1995). As a result, the application of a spin-lock field turned out to be preferable to the CPMG sequence. In Fig. 11
the decays of the magnetization for two nuclei of the enzyme ribonuclease T1 are depicted as a function of the applied pulse sequence. The chemical shift of the nucleus in (A) is significantly different from the chemical shift of the neighboring nucleus, such that a selective irradiation of the nucleus of this amino acid is possible. Hence, a fit of the resonance intensity to a monoexponential decay function is possible with a relatively small variance independent of the applied pulse sequence. Contrary to this case, the chemical shifts of the and the nuclei of the considered second amino acid (B) of Fig. 11 are almost identical such that a complete suppression of the coherent transfer of magnetization during the transverse relaxation period is not possible. Also in this case better results are obtained by using a spin-lock field rather than a CPMG sequence.
Pulse sequences for the determination of T1 and T2 relaxation times and for the determination of the heteronuclear NOE are depicted in Figs. 12 and 13. The sequences resemble the experiments for the determination of relaxation times (Kay et al., 1992a). The manifold couplings between the nucleus and the and nuclei during the evolution of the chemical shift of can be refocused when applying an evolution period with constant time (CT) (Vuister and Bax, 1992). In addition, the section of refocusing of the INEPT magnetization transfer can be integrated into the CT period in such a way that the antiphase magnetization at the beginning of the sequence may evolve into an in-phase magnetization, and vice versa. Contrary to the already published pulse sequences (Engelke and Rüterjans, 1995), a carbonyl carbon filter is inserted into the CT evolution period in order to suppress the very intense cross peaks of serine residues which appear in the absorption region of the resonances. For the determination of the T1 relaxation time a modified inversion recovery pulse sequence has been used.
The intervals between the 1H 180° pulses during the relaxation period for suppressing the cross correlation between the dipolar interaction and the CSA relaxation were set to 5 ms. For recording the transverse relaxation time T2 a spin-lock field has been applied. During the time of the spin-lock field, 1H 180° pulses were applied with an interval of 1.4 ms in order to suppress the relaxation due to the cross correlation between the dipolar interaction and the chemical-shift anisotropy. For the determination of times a spin-lock field with intensity has been used. In order to obtain a flip angle of 80° to 90° between the vector of magnetization and the -axis for all resonances, three series of spectra with 13C transmitter frequencies at 54, 59, and 64 ppm have been recorded. The heteronuclear NOE has been obtained from a spectrum with saturation and a spectrum without it. With the pulse sequence of Fig. 13 a spectrum with presaturation is obtained. In this case a 1H presaturation period of 3.5 s follows the delay time of 2.5 s. For spectra without NOE a delay time of 6 s between single scans has been applied. A minimum of 16 scans for T1 and T2 measurements and 32 scans for the NOE spectra is recommended.
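The reason for recording three series with different transmitter positions can be illustrated by computing the tilt angle of the effective field for a given offset. The sketch below is an illustration only: the 13C Larmor frequency (150.9 MHz) and the spin-lock field strength (2.5 kHz) are assumed values that are not given in the text, and best_transmitter is a hypothetical helper.

```python
import math


def tilt_angle_deg(spin_lock_hz, offset_hz):
    """Angle between the effective field and the z-axis, in degrees."""
    return math.degrees(math.atan2(spin_lock_hz, abs(offset_hz)))


# Hypothetical settings; neither value is taken from the text.
CARBON_MHZ = 150.9     # 13C Larmor frequency on a 600 MHz spectrometer
SPIN_LOCK_HZ = 2500.0  # assumed spin-lock field strength


def best_transmitter(shift_ppm, transmitters_ppm=(54.0, 59.0, 64.0)):
    """Pick the transmitter closest to a resonance and return (carrier, tilt)."""
    carrier = min(transmitters_ppm, key=lambda c: abs(c - shift_ppm))
    offset_hz = (shift_ppm - carrier) * CARBON_MHZ
    return carrier, tilt_angle_deg(SPIN_LOCK_HZ, offset_hz)


for shift in (52.0, 56.5, 61.0, 66.0):
    carrier, angle = best_transmitter(shift)
    print(f"{shift:5.1f} ppm -> carrier {carrier:.0f} ppm, tilt {angle:.1f} deg")
```

With these assumed numbers every resonance within about 2.9 ppm of one of the three carriers stays above a tilt angle of 80°, which is consistent with a 5 ppm spacing of the transmitter positions.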
4.3. Microdynamical Parameters Derived from Relaxation Rates
In Sec. 2.1 it was shown that the transverse relaxation and the heteronuclear NOE of the nucleus are dominated by the heteronuclear dipole–dipole interaction. For the evaluation of longitudinal relaxation rates, the homonuclear and the interactions also have to be considered. The following equations have been derived for the evaluation of microdynamical parameters:
In order to limit the number of adjustable parameters, motions of the and the bond were described with an order parameter of 0.8. Possible deviations of the time due to this assumption may be at most 2% since the ratio for in the interval from 0.6 to 1 will fluctuate
between 7% to 11%. With this assumption the spectral density function is given by
such that the contribution of the homonuclear dipolar interactions to the relaxation time depends only on the overall correlation time of the protein. For the evaluation of the overall correlation time the influence of the additional homonuclear relaxation contributions has to be considered. Hence, Eq. (24), which was used for the determination of values from relaxation time, has to be changed to the following form:
The determination of the microdynamical parameters by a minimization of the error function and the choice of motional models are identical to the procedures already described in Sec. 3.3.
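The effect of fixing the order parameter can be illustrated with the standard Lipari–Szabo model-free spectral density. The following sketch uses the common textbook form of that function; the function name j_model_free, the 13C Larmor frequency, and the correlation times are assumptions chosen for illustration and are not taken from the equations of this chapter.

```python
import math


def j_model_free(omega, tau_c, s2, tau_e=0.0):
    """Lipari-Szabo spectral density J(omega), times in seconds.

    J = (2/5) * [S2 * tau_c / (1 + (omega * tau_c)^2)
                 + (1 - S2) * tau / (1 + (omega * tau)^2)]
    with 1/tau = 1/tau_c + 1/tau_e. The internal term is dropped if tau_e == 0.
    """
    j = s2 * tau_c / (1.0 + (omega * tau_c) ** 2)
    if tau_e > 0.0:
        tau = tau_c * tau_e / (tau_c + tau_e)
        j += (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * j


OMEGA_C = 2.0 * math.pi * 150.9e6  # assumed 13C Larmor frequency in rad/s
S2_FIXED = 0.8                     # order parameter held fixed, as in the text

# Ratio J(0) / J(omega_C) for two assumed overall correlation times.
ratios = {}
for tau_c_ns in (5.0, 10.0):
    tau_c = tau_c_ns * 1e-9
    ratios[tau_c_ns] = (j_model_free(0.0, tau_c, S2_FIXED)
                        / j_model_free(OMEGA_C, tau_c, S2_FIXED))
    print(f"tau_c = {tau_c_ns:4.1f} ns: J(0)/J(omega_C) = {ratios[tau_c_ns]:.1f}")
```

Once the order parameter is fixed and the internal term is neglected, the spectral density depends on the overall correlation time alone, and the growth of J(0) relative to J(omega_C) shows why terms proportional to J(0) gain weight for larger proteins.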
5. SIDE-CHAIN DYNAMICS DERIVED FROM RELAXATION RATES
For a quantitative study of side-chain motions, a large number of degrees of freedom caused by various diffusion or jump processes around different axes of orientations have to be considered. Such an analysis seems to be very complex, in particular in the case of long side chains. As a first approach a study of the dynamics of the dihedral angle seems to be obvious. The angle connects a or group to of the corresponding amino acid. The internal dynamics of
this dihedral angle have been characterized using motionally averaged coupling constants such as vicinal carbon–proton, nitrogen–proton, and carbon–carbon coupling constants (Bystrov, 1976; Karplus, 1959, 1963). Starting with a small number of discrete conformations with different angles, the theoretical coupling constants have been calculated using the Karplus relations (Mádi et al., 1990). Populations of individual staggered rotamer conformations were then obtained by fitting the value of the calculated constant to the experimental value. However, no information about the rate constants of transfer between the various rotameric states can be obtained from averaged J-coupling constants. Instead, the use of heteronuclear relaxation rates may provide information about both the amplitude and the time constant of the motion around the angle.
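The population analysis from averaged couplings can be sketched for the simplest case of three staggered rotamers and two vicinal couplings. The Karplus coefficients below are hypothetical placeholders, the assignment of trans and gauche arrangements to the rotamers is an assumed convention, and rotamer_populations is an illustrative helper rather than the procedure of the cited work.

```python
import math


def karplus(theta_deg, a, b, c):
    """Karplus relation J(theta) = A cos^2(theta) + B cos(theta) + C, in Hz."""
    cos_t = math.cos(math.radians(theta_deg))
    return a * cos_t ** 2 + b * cos_t + c


# Hypothetical Karplus coefficients in Hz; real values depend on the coupling type.
A, B, C = 9.5, -1.6, 1.8
J_TRANS = karplus(180.0, A, B, C)
J_GAUCHE = karplus(60.0, A, B, C)


def rotamer_populations(j_obs_1, j_obs_2, j_trans=J_TRANS, j_gauche=J_GAUCHE):
    """Populations of three staggered rotamers from two averaged couplings.

    Assumed convention: coupling 1 is trans in rotamer 1 only, coupling 2 is
    trans in rotamer 2 only, and both couplings are gauche in rotamer 3.
    """
    span = j_trans - j_gauche
    p1 = (j_obs_1 - j_gauche) / span
    p2 = (j_obs_2 - j_gauche) / span
    return p1, p2, 1.0 - p1 - p2


# Round trip with assumed populations of 60%, 30%, and 10%.
true_p = (0.6, 0.3, 0.1)
j1 = true_p[0] * J_TRANS + (1.0 - true_p[0]) * J_GAUCHE
j2 = true_p[1] * J_TRANS + (1.0 - true_p[1]) * J_GAUCHE
print(rotamer_populations(j1, j2))
```

The result contains populations only. As stated above, the exchange rates between the rotamers do not enter the averaged couplings at all.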
The problem of adequately describing protein side-chain dynamics by NMR relaxation normally is connected with limited experimental data and with the choice of appropriate model assumptions. Models currently used for the analysis of NMR relaxation data include those which describe anisotropic unrestricted diffusion,
multiple rotations, restricted diffusion [see, for example, a review by London (1980)], and wobbling-in-a-cone motions (Lipari and Szabo, 1981, 1980). Various “model-free approaches” (Clore et al., 1990a, 1990b; Lipari and Szabo, 1982a, 1982b) for obtaining information about overall rotational correlation times and characteristics of restricted motions in terms of order parameters are also available. With limited experimental data, it is often difficult to decide which model is best, and the simplest model that can describe the available experimental parameters is used in most cases. For example, detailed investigations were carried out for the dynamics of the phenylalanine side chains in antamanide, but even for this small
peptide a distinction between a diffusion motion and a jump motion was difficult to make (Ernst, 1993). Thus, a decision about the type of motion for protein side chains solely from relaxation rates is difficult to make. However, using a fixed model of motion the amplitude and the time scale of the motion should be
determined quantitatively. To date, the model-free approach has been used most often. This may be reasonable for investigations of the protein backbone, but it fails in more detailed descriptions of protein side-chain dynamics. In order to cover a wide range of amplitudes and time scales of motions around the dihedral angle, the rotational restricted diffusion model has been chosen for the interpretation of
the relaxation data (London and Avitabile, 1978). In this model the bond rotates in the range In cases where extends approximately the dynamics may be adequately described with jump models. For the determination of microdynamical parameters two different types of relaxation rates are of interest: the heteronuclear longitudinal relaxation process described by the relaxation time together with the heteronuclear NOE
and the transverse heteronuclear cross-correlated cross-relaxation rate between the single-quantum and the triple-quantum operator in groups (Ernst and Ernst, 1994). The inclusion of the transverse relaxation time in the analysis might also be conceivable. However, for measuring the relaxation time on a uniformly protein a pulse sequence has to be available which effectively suppresses the scalar coupling during the evolution period. While such a pulse sequence has already been developed for the nucleus (Engelke and Rüterjans, 1995; Yamazaki et al., 1994), its application to side-chain carbon nuclei fails in most cases due to the small difference in the chemical shift of adjacent nuclei.
5.1. Dynamical Parameters Derived from Relaxation Times and Steady-State NOE
The pulse sequences used for measuring the relaxation time and the heteronuclear NOE of the nuclei in aliphatic side chains of proteins are depicted in Fig. 14. They are based on the HCCONH experiment developed by Grzesiek and co-workers (1993) with modifications concerning the first part of the
original pulse sequence. For measuring the longitudinal relaxation time the magnetization was transferred from the aliphatic protons to the directly bound carbons, using a refocused INEPT sequence. A period for the evolution of the carbon chemical shift followed. For determining the inversion recovery scheme was used. Subsequently, the magnetization was transferred by a coherent magnetization transfer from the nuclei to the nucleus using the FLOPSY-8 pulse scheme (Mohebbi and Shaka, 1991). Finally, the magnetization was transferred to the proton of the following residue for detection using the carbonyl carbon and the nucleus as relay nuclei. In order to determine NOEs the first refocused INEPT sequence was dropped. Instead, data sets with and without saturation were collected. Scheme B in Fig. 14 indicates the sequence used for the data set with saturation.
The interpretation of longitudinal relaxation rates measured on a uniformly labeled protein is rather complicated. Besides the n directly bound protons, the homonuclear dipolar interactions contribute to its relaxation. For the heteronuclear NOE an additional contribution due to cross relaxation can be expected. Following the detailed analysis of the relaxation of the nucleus, this contribution can be neglected in most cases (Engelke and Rüterjans, 1995). The correlation between the longitudinal relaxation rates and the spectral densities is given by
where m is the number of directly bound to the nucleus and the homonuclear longitudinal relaxation rate between two carbon nuclei:
is
In order to extract microdynamical parameters from the relaxation rates a specific model for the molecular motion has to be chosen. Assuming that the reorientation of the protein is isotropic and that the side-chain motion almost exclusively originates from rotations around the bond, the general expression of the autocorrelation function for the vector is (Wittebort and Szabo, 1978)
where
are the elements of the Wigner rotation matrix of second order and
The angle amounts to 109.471° for an ideal tetrahedral geometry. For a restricted diffusion motion around the dihedral angle in the range between the correlation function is (London and Avitabile, 1978)
where
The spectral density function is obtained by Fourier transformation:
with
and
The constants are given by
and = 0.1107. Since it is assumed that the internal motion is caused by a rotation around the bond, the spectral density function for the contribution is that of a rigid rotator. However, the motion of the bond is identical to that of the bond, and therefore the expression for the spectral density function of is given by Eq. (39).
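Once a spectral density function is available, the steady-state heteronuclear NOE of an isolated 13C–1H pair follows from the standard dipolar expression. The sketch below evaluates that expression for a rigid isotropic rotor with a simple Lorentzian spectral density, which is a deliberate simplification and not the restricted-diffusion function of Eq. (39); the proton frequency of 600 MHz is an assumed value.

```python
import math

GAMMA_H = 2.6752e8  # 1H gyromagnetic ratio, rad s^-1 T^-1
GAMMA_C = 6.7283e7  # 13C gyromagnetic ratio, rad s^-1 T^-1


def lorentzian(omega, tau):
    """Spectral density of a rigid isotropic rotor (constant prefactor omitted)."""
    return tau / (1.0 + (omega * tau) ** 2)


def heteronuclear_noe(tau, omega_h):
    """Steady-state 13C{1H} NOE of an isolated, purely dipolar SI spin pair."""
    omega_c = omega_h * GAMMA_C / GAMMA_H
    j_diff = lorentzian(omega_h - omega_c, tau)
    j_c = lorentzian(omega_c, tau)
    j_sum = lorentzian(omega_h + omega_c, tau)
    enhancement = (6.0 * j_sum - j_diff) / (j_diff + 3.0 * j_c + 6.0 * j_sum)
    return 1.0 + (GAMMA_H / GAMMA_C) * enhancement


OMEGA_H = 2.0 * math.pi * 600.0e6  # assumed proton Larmor frequency in rad/s

noe_fast = heteronuclear_noe(1.0e-12, OMEGA_H)  # extreme-narrowing limit
noe_slow = heteronuclear_noe(1.0e-7, OMEGA_H)   # slow-tumbling limit
print(f"fast limit: {noe_fast:.3f}, slow limit: {noe_slow:.3f}")
```

The dipolar constant cancels in the ratio, so in this approximation the NOE depends neither on the bond length nor on the number of attached protons, only on the spectral densities.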
In Fig. 15 the heteronuclear NOE of the nucleus is shown dependent on the amplitude and the internal correlation time For a strongly restricted motion a small heteronuclear NOE is expected, which is independent of the internal correlation time. Motions with larger amplitudes should lead to larger NOE values. From this dependence a certain limit for the amplitude of motion can be derived from the NOE value. For example, a NOE value of 1.8 can only be generated from amplitudes with angles larger than 45°. Other properties of the heteronuclear NOE can be extracted from Eq. (35). In case the homonuclear interactions are neglected, it is obvious that the NOE depends neither on the number n of directly bound protons nor on a geometry factor like a bond length. Because of this characteristic feature, the heteronuclear NOE seems to be a suitable parameter to describe the flexibility of side chains. The possible NOE values are supposed to be between 1.16 and 3. In correspondence to the classification of the homonuclear NOEs for structure calculations a distribution of the heteronuclear NOE in a few classes may be reasonable. While a lower limit for the amplitude of motion can be deduced for a given NOE value, an upper limit cannot be determined. In particular, when the internal correlation time is very small or very large, small NOE values
can also be generated from motions with large amplitudes. Thus, for a quantitative analysis the inclusion of the relaxation time is necessary. Using the combination of these two experimental values, the calculation of the microdynamical parameters and seems possible. For a group, which is bound to two adjacent nuclei, the angular amplitude and the internal correlation time in dependence on the relaxation time and the heteronuclear NOE is shown in the contour plots in Fig. 16. As an example, experimental values for RNase are also indicated in the
diagrams. Only for those nuclei for which the neighborhood is identical to those for which the calculation is carried out are the microdynamical parameters directly comparable. From Fig. 16A it seems obvious that the increase of NOE values and the corresponding flexibility parameter are correlated. For a nucleus with a heteronuclear NOE value of 1.7 the angular amplitude amounts to at least The backwards conclusion that a small NOE value is indicative of a strongly restricted motion is certainly wrong. Considering
as a typical example, an
amplitude of results from a NOE value of only 1.34. From the relaxation time upper limits for the internal correlation time can be determined. For a nucleus with a longitudinal relaxation time of 250 ms the motion of the corresponding bond is faster than 400 in most cases. The motion of most of the groups of RNase T1 can be characterized by an angular amplitude between 0° and 50° and an internal correlation time from 100 to 800 ps. The side chains of some residues which are located in loop regions (residues 43–55, 93–99) of RNase T1 show an increased flexibility with amplitudes up to
5.2. SIIS Cross Relaxation
The second technique is called SIIS cross relaxation (Ernst and Ernst, 1994) and can be used to analyze the dynamics of groups. The basic process relies on the motional cross correlation of two CH dipolar interactions. In a particular range of intermolecular motional correlation times, it allows a distinction of restricted and unrestricted intramolecular motion. For the determination of the SIIS cross relaxation the SIIS–CCONH pulse
sequence was developed (Fig. 17). Starting with longitudinal proton magnetization the desired three-spin operator is generated prior to the mixing time with the depicted pulse scheme and is selected with a suitable phase cycle in combination with a triple-quantum filter. After the mixing time an evolution period for the nucleus follows, and subsequently the magnetization is transferred to the amide proton of the following residue in the same manner as in the and NOE experiments. In order to suppress all undesired terms of the density operator, in particular the term, the protons were decoupled as soon as the magnetization reached the 15N nucleus. The SIIS cross relaxation depends exclusively on the cross correlation:
where is defined in Eq. (8). The tilt angle $\theta$ of the rotating frame is given by $\tan\theta = \omega_1/\Delta\omega$, where $\omega_1$ is the amplitude of the spin-locking field and $\Delta\omega$ is the frequency offset. The transverse cross-relaxation rate constant for the
restricted diffusion motion as a function of the amplitude and the internal correlation time is shown in Fig. 18. It is apparent that for amplitudes smaller than the sign of remains negative, independent of the internal correlation time. Only when the amplitude exceeds does the sign of depend on For slow motions it will be negative, and for fast motions it will be positive. Thus, the appearance of a negative cross peak in an SIIS cross-relaxation spectrum (due to a negative sign in the master equation, a negative cross-relaxation rate constant leads to a positive cross relaxation) indicates a high flexibility on a fast time scale of the corresponding group. Hence, under certain conditions the SIIS cross-relaxation rate enables one to distinguish between a restricted motion and an unrestricted motion around the dihedral angle, which is obvious already from the sign of the cross peak. Note that in proteins the experiment in the rotating frame is favorable compared to the experiment in the laboratory frame, since the cross-relaxation constant is clearly larger and the zero point is shifted to smaller internal correlation times (Ernst and Ernst, 1994).
As an example, the SIIS cross-relaxation spectrum recorded on RNase T1 is shown in Fig. 19. Most of the cross peaks in the spectra have a positive sign. Resonances with negative amplitudes are only obtained for the groups of and the groups of and (Fig. 19). The negative cross peaks for Pro55 can be explained with a rapid ring-puckering process with correlation times of ~30 ps. The three other proline residues in RNase do not appear in the spectrum. The negative cross peaks of the groups of and suggest that the rotation around is less restricted in its angular range and is rapid with a correlation time 1.8 ns. The negative cross peaks of the lysine side-chain carbons indicate a rapid and virtually unrestricted motion of these side chains, implying that the group sticks out into solution and does not undergo intramolecular hydrogen bonding. The major advantage of the data is connected with its unambiguous character. By means of the sign of the cross peaks in the spectrum a distinction between rigid and flexible side chains can easily be made. Such clear-cut evidence is known for only very few methods used for investigating dynamical features of biological macromolecules.
6. DETERMINATION OF PROTEIN DYNAMICS IN THE MICROSECOND TIME WINDOW
Conformational exchange in the microsecond time window can provide an adiabatic relaxation pathway for transverse magnetization (Kaplan and Fraenkel, 1980). Deverell et al. (1970) showed that relaxation times in the rotating frame can be used to determine exchange rate constants in the time range, since the efficiency of relaxation of transverse magnetization locked along an effective magnetic field depends on the amplitude of the spin-lock field. Early applications of this principle with small molecules included determinations of the rate constant for ring inversion in cyclohexane from measurement of of the ring protons as a function of the spin-lock power, and evaluation of the rotational barriers in a series of urea derivatives on the basis of of the nuclei. Szyperski et al. (1993) investigated conformational rate processes of BPTI by measuring the rotating frame relaxation times of the backbone spins as a function of the spin-lock power. They identified an intermolecular exchange process of the residues Cys 38 and Arg 39, which is related to isomerization of the chirality of the disulfide bond Cys 14–Cys 38. The time constant for this process amounts to 2.4 1.8 ms. The contribution of a conformational exchange process to can be determined if the frequency of this exchange process is near the Larmor frequency of the rf spin-lock field. For a system of nuclei exchanging randomly between two conformational states with arbitrary populations, is given by (Deverell et al., 1970)
where $p_A$ and $p_B$ are the populations of states A and B, respectively, is the chemical-shift difference between states A and B, is the strength of the spin-lock field, is the relaxation rate for an infinitely large spin-lock power, and is the time constant of the conformational exchange process. In Fig. 20 a contour plot of the exchange rate as a function of the field strength and the exchange time is shown. It is obvious that contributes significantly to only for motions in a limited time range. In this example, a time window between 10 and 5 ms is covered. An important feature for the experiment can also be concluded from Fig. 20. In order to obtain a strong dependence of on the spin-lock field strength, low values of should be used.
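The dependence of the rotating-frame relaxation rate on the spin-lock field can be sketched with the widely used on-resonance, fast-exchange expression for two-site exchange, R1rho = R1rho(inf) + pA pB dw^2 tau_ex / (1 + w1^2 tau_ex^2). This form is assumed here to correspond to the equation referred to above; the exchange time, shift difference, and limiting rate are hypothetical values chosen only for illustration.

```python
import math


def r1rho(omega1, tau_ex, p_a, delta_omega, r1rho_inf):
    """Rotating-frame relaxation rate for two-site exchange (on resonance).

    omega1 and delta_omega are in rad/s, tau_ex in s, rates in s^-1.
    """
    p_b = 1.0 - p_a
    r_ex = p_a * p_b * delta_omega ** 2 * tau_ex / (1.0 + (omega1 * tau_ex) ** 2)
    return r1rho_inf + r_ex


# Hypothetical parameters for the illustration.
TAU_EX = 50e-6                       # exchange time constant in s
DELTA_OMEGA = 2.0 * math.pi * 90.0   # chemical-shift difference in rad/s
R_INF = 10.0                         # rate at infinite spin-lock power in s^-1

fields_hz = (1000.0, 2000.0, 3000.0, 4000.0, 5000.0)
curve = [r1rho(2.0 * math.pi * nu, TAU_EX, 0.5, DELTA_OMEGA, R_INF)
         for nu in fields_hz]

for nu, rate in zip(fields_hz, curve):
    print(f"spin lock {nu:6.0f} Hz: R1rho = {rate:.3f} s^-1")
```

In this expression the exchange contribution drops to half of its low-field value when omega1 equals 1/tau_ex, which is why the spin-lock fields must bracket the inverse exchange time for the fit to be well determined.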
Pulse sequences for measuring accurate transverse relaxation rates were recently published by Habazettl et al. (1996). They consist of a refocused HSQC experiment in which a continuous-wave spin-lock pulse for the nuclei is integrated. The transverse relaxation rates of Glu 73 of the N-terminal fragment of Ada protein are shown in Fig. 21 as a function of the applied spin-lock field
strength The frequency of the conformational exchange process should be in the range of the experimentally sampled frequency. The solid line represents the least-squares fit of the data points with Eq. (43), setting to the average value of The frequency at the turning point of the curve represents The fit resulted in a value of Spin-lock field-strength-dependent transverse relaxation rates can also be
measured for the carbonyl carbons of the protein backbone (Engelke and Rüterjans, 1997). A suitable pulse sequence is shown in Fig. 22. The experiment is based on a magnetization transfer in a HCACO manner yielding two-dimensional spectra with the carbonyl dimension in F1 and the dimension in F2. The details of this type of correlation have been reported by Grzesiek and Bax (1993c).
As an example, the results of field-dependent measurements of the resonances of ribonuclease T1 will be reported. The relaxation had been measured at the rf field strength between 1000 and 5000 Hz. For each of
the field strengths about seven 2D spectra with different mixing times between 2.4 and 96 ms have been recorded. In order to avoid off-resonance effects, the measurements were carried out at two transmitter frequencies (174 and 179 ppm) from which data of 70 2D spectra were obtained. The time window which could be analyzed with these measurements was determined by the applied field strength. In
this way motions with a time constant between and 1 ms could be determined. From these experiments a dependence on the relaxation rate on the rf field strength was observed for the nuclei of the amino acids (Fig. 23). Assuming an exchange between two equally populated states
and
leads to a difference of chemical-shift values of about 90 Hz for . From an analysis of the relaxation times a motion of in the time scale was already indicated. These experimental results may be interpreted as a reorientation of the peptide plane with a time constant of 49 . This observation is also in agreement with the analysis of coupling constants for which two values for the angles were obtained. The exchange process 3 and for was slower by one order of magnitude. Since the following amino acid is
a proline residue, this may be a hint for a cis–trans isomerization of the peptide bond.
7. DETERMINATION OF PROTEIN DYNAMICS IN THE MILLISECOND TIME WINDOW
NMR spectroscopy is a powerful tool for the measurement of rates of interconversion of chemical systems that are in slow dynamic equilibrium (Forsen and
Hoffmann, 1963; Gutowsky and Saika, 1953). One- and two-dimensional homonuclear exchange experiments have been successfully employed for studies of exchange processes in small molecules. However, due to spectral overlap, homonuclear approaches may not be practical when applied to macromolecules such as proteins. In addition, homonuclear methods such as NOESY (Macura and Ernst, 1980) or ROESY (Bax and Davis, 1985; Bothner-By et al., 1984) may be of limited utility when the exchange cross peaks are close to the diagonal peaks. To overcome this problem, an approach that exploits the increased chemical-shift resolution of proton-detected heteronuclear correlation experiments has been proposed by Montelione and Wagner (1989). In one class of experiments, chemical
exchange rates are measured by monitoring the exchange of a two-spin heteronuclear order, between different molecular conformations. A second class of experiments makes use of the net transfer of heteronuclear longitudinal magnetization, to measure rates of interconversion. In both cases, the resultant spectra consist of “autopeaks” for each conformation of a coupled I–S pair, plus “exchange peaks” that arise due to a magnetization transfer during a mixing period. If the spectra are recorded under fully relaxed conditions in the limit of zero mixing time, the relative volumes of the autopeaks of the different conformations may be used to characterize equilibrium constants. In addition, exchange rates may be determined from the initial slopes of buildup curves, obtained from spectra recorded using a series of mixing periods. More recently, Wider et al. (1991) have proposed a difference correlation experiment to facilitate the measurement of exchange rates from the exchange peaks in spectra with minimal interference from the autopeaks. Motions which can be described with the exchange spectroscopy must have characteristic features. One of the prerequisites is that the different conformations be connected with differences of the chemical shift of the observed nuclei. In
addition, the jump rate between the considered conformations must be smaller than the difference of their chemical-shift values, $\Delta\omega$. Otherwise the chemical shifts are averaged and a single averaged signal is observed. Hence, the time constant of the exchange process, $\tau_{ex}$, has to lie in the interval $1/\Delta\omega \ll \tau_{ex} \lesssim \tau_m$. The upper limit is determined by the mixing time $\tau_m$ of the NMR experiment: a cross signal will only be observed when at least one exchange between the conformations takes place during the mixing time. The dependence of the amplitudes of the auto- and cross signals on the mixing time may be obtained by solving the Bloch equations. This calculation is straightforward for a system undergoing chemical exchange between two sites (Farrow et al., 1994; Hull and Sykes, 1975; McConnell, 1958; Gutowsky and Saika, 1953; Hahn and Maxwell, 1952).
Homonuclear and heteronuclear experiments for the determination of exchange processes will be briefly described. The identification of exchange signals in homonuclear two-dimensional spectra is difficult because of the large number of
cross peaks. Suppression of the NOESY or ROESY cross signals may be feasible by recording a clean exchange experiment or by applying off-resonance ROESY experiments. Both approaches rely on the same effect. Up to a common positive prefactor, the cross-relaxation rates of a homonuclear two-spin system in the laboratory frame, $\sigma^{NOE}$, and in the rotating frame, $\sigma^{ROE}$, depend on the spectral density function $J(\omega)$ as (compare Sec. 2) $\sigma^{NOE} \propto 6J(2\omega_0) - J(0)$ and $\sigma^{ROE} \propto 3J(\omega_0) + 2J(0)$. For a protein with a long correlation time, only the spectral density function at zero frequency, $J(0)$, contributes to the relaxation rate in a first approximation (spin-diffusion limit). Therefore the laboratory-frame cross-relaxation rate is negative and equal in magnitude to half the rotating-frame cross-relaxation rate, $\sigma^{NOE} = -\sigma^{ROE}/2$. If, during the mixing time, the magnetization is flipped rapidly between the two frames such that the average residence time in the NOESY:ROESY frames is 2:1, then the exchange of magnetization due to cross relaxation cancels out. Since chemical exchange takes place steadily, irrespective of the frame of reference, it
will contribute to cross-peak volumes in the usual manner. Another possibility to cancel out cross-relaxation cross peaks is the application of a ROESY experiment with an off-resonance spin-lock field (Desvaux et al., 1994). In such an experiment a pulse with rf amplitude $\omega_1$ is applied such that the magnetization in the rotating frame of reference no longer rotates around the z-axis but around the z´-axis, which forms the effective inclination angle $\theta = \arctan(\omega_1/\Delta)$ against the z-axis of the static field, whereby $\Delta$ is the offset frequency. If the offset frequency is considerably smaller than the strength of the spin-lock field, the inclination angle approaches 90° and the measured relaxation time corresponds to the $T_{1\rho}$ relaxation time. If the offset between the irradiation frequency of the spin-lock field and the resonance frequency of the spins, and the amplitude of the field, are chosen such that the cross-relaxation rate disappears [spin-diffusion limit: $\tan^2\theta = 1/2$, i.e., $\theta \approx 35.3°$],
then the observed cross peaks are entirely due either to exchange or to cross-correlation effects (Brüschweiler et al., 1989). The implementation of the off-resonance ROESY experiment requires a switch of the transmitter frequency during the pulse sequence. Desvaux et al. (1995a) have proposed a pulse sequence in which the magnetization at the beginning and at the end of the mixing time points along the z-axis. During a slow increase of the amplitude of the spin-lock field, the magnetization follows the effective field adiabatically, i.e., without loss of energy. In this arrangement a rotation, rather than a projection, of the magnetization is induced.
Exchange processes can also be followed using heteronuclear NMR spectroscopy. Even the pulse sequence for the determination of longitudinal relaxation times, which has been described in Sec. 3.1, can be applied. In this experiment the indirect evolution time is placed before the mixing time such that each of the spins is first labeled with its chemical shift. If exchange takes place during the subsequent mixing time, two cross peaks are observed in the spectrum, which together with the autocorrelation signals form a rectangular pattern. In Fig. 24 a section of the exchange spectrum of the N-terminal SH3 domain of the protein drk is depicted (Farrow et al., 1994). This protein was found to be in
a dynamic equilibrium between the native and the unfolded form under the chosen experimental conditions. The intensities of the autopeaks decrease with increasing mixing time, while the exchange peaks grow to a maximum and subsequently decay again. From a quantitative analysis of the cross signals in the exchange spectrum, a time constant of 1.16 s was determined for the folding process. In contrast to the previously published pulse sequences based on the exchange of two-spin order (Wider et al., 1991; Montelione and Wagner, 1989), this experiment uses the exchange of single-spin order. In proteins and larger molecules, longitudinal two-spin order decays more efficiently than heteronuclear longitudinal magnetization due to the presence of spin flips, which can effectively relax the two-spin order. The increased relaxation rates of the autopeaks in the two-spin-order experiment result in exchange peaks with reduced signal intensity relative to the single-spin-order experiment. In addition, the lower boundary of accessible exchange rates can be extended with the single-spin-order experiment.
In order to suppress autopeaks corresponding to nonexchanging IS groups, a reference spectrum can be recorded in which the order of the mixing time and the evolution time is interchanged (Montelione and Wagner, 1989). As a consequence, the longitudinal spin order is not frequency labeled, and the intensity that is located in the chemical exchange peaks remains part of the direct correlation peaks. Therefore, the difference between the two data sets will lead to a reduction of direct correlation peaks of exchanging IS groups, and the signals of nonexchanging spin systems will disappear.
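The mixing-time dependence of auto- and exchange peaks described in this section can be illustrated with a minimal sketch of two-site exchange of longitudinal magnetization. It assumes that both sites share the same longitudinal relaxation rate, which yields a simple closed-form solution; the function names and the rate constants are hypothetical and are not taken from the cited work.

```python
import math

def populations(k_ab, k_ba):
    """Equilibrium populations of sites A and B from the two rate constants."""
    k_ex = k_ab + k_ba
    return k_ba / k_ex, k_ab / k_ex

def peak_intensities(t_mix, k_ab, k_ba, r1):
    """Return (I_AA, I_BB, I_AB) after a mixing time t_mix (s).

    Assumption: both sites relax with the same longitudinal rate r1 (1/s).
    Intensities are normalized so that I_AA + I_BB = 1 at t_mix = 0.
    """
    p_a, p_b = populations(k_ab, k_ba)
    k_ex = k_ab + k_ba
    decay = math.exp(-r1 * t_mix)    # loss by longitudinal relaxation
    mix = math.exp(-k_ex * t_mix)    # approach to exchange equilibrium
    i_aa = p_a * (p_a + p_b * mix) * decay
    i_bb = p_b * (p_b + p_a * mix) * decay
    i_ab = p_a * p_b * (1.0 - mix) * decay
    return i_aa, i_bb, i_ab

def optimal_mixing_time(k_ab, k_ba, r1):
    """Mixing time at which the exchange peak reaches its maximum."""
    k_ex = k_ab + k_ba
    return math.log(1.0 + k_ex / r1) / k_ex

# Hypothetical rate constants (1/s), chosen only for illustration.
K_AB, K_BA, R1 = 0.5, 0.9, 1.5
for t in (0.0, 0.1, 0.3, optimal_mixing_time(K_AB, K_BA, R1), 1.0, 2.0):
    i_aa, i_bb, i_ab = peak_intensities(t, K_AB, K_BA, R1)
    print(f"t_mix = {t:5.2f} s  I_AA = {i_aa:.3f}  I_BB = {i_bb:.3f}  I_AB = {i_ab:.3f}")
```

In this model the autopeaks decay monotonically, whereas the exchange peak starts at zero with initial slope $p_A k_{AB}$, passes through a maximum, and then decays with the longitudinal relaxation rate, which is the qualitative behavior described for the drk SH3 domain.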
8. CONCLUDING REMARKS
With appropriate model assumptions, $^{15}$N and $^{13}$C relaxation may provide insight into the structural dynamics of proteins. However, a quantitative description of
the anisotropic motion in proteins will be possible only after a quantitative analysis of the various contributions to the relaxation parameters, i.e., the dipolar interactions, the contributions due to the chemical-shift anisotropy, and various possibilities of cross-correlation effects. A theoretical treatment of correlated motions is also lacking. At present quite a few investigations have been started to solve some of the inherent problems. Although the complexity of NMR relaxation behavior is obvious, the resulting microdynamical parameters are definitely needed for understanding the functional properties of proteins.
ACKNOWLEDGMENT. We would like to thank Dr. David Fushman, Rockefeller University, for stimulating discussions. Thanks are due to Christian Ludwig, Stefania Pfeiffer, and Yasmin Karimi-Nejad for help and the structural features of RNase T1. Stefan Geschwindner, Harald Thüring, and Norman Spitzner were of great help in preparing the and protein samples. Sigrid Fachinger assisted in preparing the manuscript. Jan Engelke was a recipient of a stipend of the Graduiertenkolleg “Protein structures, dynamics and function” of the J. W. Goethe University, Frankfurt. Grants from the Deutsche Forschungsgemeinschaft are gratefully acknowledged (Ru 145/8-7 and Ru 145/11-2). A grant was also obtained from the European Union (RTD Program, ERBFMGECT-950002).
REFERENCES
Abragam, A., 1961, Principles of Nuclear Magnetism, Wiley, New York. Akke, M., Skelton, N. J., Kördel, J., Palmer A. G. III, and Chazin, W. J., 1993, Biochemistry 32:9832.
Allerhand, A., Doddrell, D., Glushko, V., Cochran, D. W., Wenkert, E., Lawson, P. J., and Gurd, F. R. N., 1971, J. Am. Chem. Soc. 93:544.
Barbato G., Ikura, M., Kay, L. E., Pastor, R. W., and Bax, A., 1992, Biochemistry 31:5269–5278. Bax, A., and Davis, D., 1985, J. Magn. Reson. 63:207. Bax, A., and Pochapski, S. S., 1992, J. Magn. Reson. 99:638–643. Bothner-By, A. A., Stephans, R. L., Lee, J., Warren, C. D., and Jeanlox, R. W., 1984, J. Am. Chem. Soc. 106:811. Boyd, J., 1995. J. Magn. Reson. B 107:279–285. Boyd, J., Hommel, U., and Campbell, I. D., 1990, Chem. Phys. Lett. 175:477–482. Boyd, J., Hommel, U., and Krishnan, V. V, 1991, Chem. Phys. Lett. 187:317–324. Brüschweiler, R., Griesinger, C., and Ernst, R. R., 1989, J. Am. Chem. Soc. 111:8034. Buck, M., Boyd, J., Redfield, C., MacKenzie, D. A., Jeenes, D. J., Archer, D. B., and Dobson, C. M., 1995, Biochemistry 34:4041–4055. Bull, T. E., 1992, Prog. NMR Spectr. 24:377–410. Bystrov, V. F., 1976, Prog. NMR Spectr. 10:41–81. Carr, H. Y., and Purcell, E. M., 1954, Phys. Rev. 94:630–638. Cavanagh, J., Palmer A. G. III, Wright, P. E., and Rance, M., 1991, J. Magn. Reson. 91:429. Cheng J., Lepre, C. A., Chambers, S. P., Fulghum, J. R., Thomson, J. A., and Moore, J. M., 1993, Biochemistry 32:9000–9010.
Cheng, J., Lepre, C. A., and Moore, J. M., 1994, Biochemistry 33:4093–4100. Clore, G. M., Driscoll, P. C., Wingfield, P. T., and Gronenborn, A. M., 1990a, Biochemistry 29:7387–
7401. Clore, G. M., Szabo, A., Bax, A., Kay, L. E., Driscoll, P. C., and Gronenborn, A. M., 1990b, J. Am. Chem. Soc. 112:4989–4991. Daragan, V. A., Kloczewiak, M. A., and Mayo, K. H., 1993a, Biochemistry 32:10,580–10,590. Daragan, V. A., and Mayo, K. H., 1993b, Biochemistry 32:11488–11499. Davis, J. H., 1995, J. Biomol. NMR 5:433–437. Desvaux, H., Berthault, P., Birlirakis, N., and Goldman, M., 1994, J. Magn. Reson. A 108:219–229. Desvaux, H., Berthault, P., Birlirakis, N., Goldman, M., and Piotto, M., 1995a, J. Magn. Reson. A 113:47. Desvaux, H., Birlirakis, N., Wary, C., and Berthault, P., 1995b, Mol. Phys. 86:1059–1073. Desvaux, H., Wary, C., Birlirakis, N., and Berthault, P., 1995c, Mol. Phys. 86:1049–1058. Deverell, C., Morgan, R. E., and Strange, J. H., 1970, Mol. Phys. 18:553–559. Dobson, C. M., 1995, Biochemistry 34:4041–4055.
Engelke, J., and Rüterjans, H., 1995, J. Biomol. NMR 5:173–182. Engelke, J., and Rüterjans, H., 1997, J. Biomol. NMR 9:63–78. Epstein, D. M., Benkovic, S. J., and Wright, P. E., 1995, Biochemistry 34:11037–11048. Ernst, M. C., 1993, Diss. ETH Nr. 10390, Zurich. Ernst, R. R., Bodenhausen, G., and Wokaun, A., 1987, Principles of Nuclear Magnetic Resonance in One and Two Dimensions, Clarendon Press, Oxford. Ernst, M. C., and Ernst, R. R., 1994, J. Magn. Reson. A 110:202–213.
Farrow, N. A., Zhang, O., Forman-Kay, J. D., and Kay, L. E., 1994, J. Biomol. NMR 4:727–734. Farrow, N. A., Zhang, O., Forman-Kay, J. D., and Kay, L. E., 1995, Biochemistry 34:868–878. Farrow, N. A., Zhang, O., Forman-Kay, J. D., and Kay, L. E., 1997, Biochemistry 36:2390–2402.
Fejzo, J., Westler, W. M., Macura, S., and Markley, J. L., 1990, J. Am. Chem. Soc. 112:2574–2577. Forsen, S., and Hoffmann, R. A., 1963, J. Chem. Phys. 39:2892. Freeman, R., and Hill, H. D. W., 1969, J. Chem. Phys. 51:3140–3141. Fushman, D., Weisemann, R., Thüring, H., and Rüterjans, H., 1994, J. Biomol. NMR 4:61–78. Goldman, M., 1984, J. Magn. Reson. 60:437–452. Goldman, M., 1988, Quantum Description of High-Resolution NMR in Liquids, Clarendon Press, New York.
Grzesiek, S., Anglister, J., and Bax, A., 1993, J. Magn. Reson. B 110:114–119. Grzesiek, S., and Bax, A., 1993a, J. Biomol. NMR 3:627–638.
Grzesiek, S., and Bax, A., 1993b, J. Am. Chem. Soc. 115:12593–12594. Grzesiek, S., and Bax, A., 1993c, J. Magn. Reson. B 102:103–106.
Gust, D., Moon, R. B., and Roberts, J. D., 1975, Proc. Natl. Acad. Sci. U.S.A. 72:4696–4700. Gutowsky, H. S., and Saika, A., 1953, J. Chem. Phys. 21:1688. Habazettl, J., Myers, L. C., Yuan, F., Verdine, G. L., and Wagner, G., 1996, Biochemistry 35:9335–9348. Hahn, E. L., and Maxwell, D. E., 1952, Phys. Rev. 88:1070–1084. Hanssum, H., and Rüterjans, H., 1983, J. Chem. Phys. 78:4687–4697. Hawkes, G. E., Randall, E. W., and Bradley, C. H., 1975, Nature 257:767–772. Henry, G. D., Weiner, J. H., and Sykes, B. D., 1986, Biochemistry 25:590–598. Hiyama, Y., Niu, C.-H., Silverton, J. V., Bavoso, A., and Torchia, D. A., 1988, J. Am. Chem. Soc. 110:2378–2383.
Hrovat, A., Löhr, F., and Rüterjans, H., 1997, J. Biomol. NMR, 10:53–62.
Hull, W. E., and Sykes, B. D., 1975, J. Chem. Phys. 63:867–880. Jardetzky, O., and Roberts, G. C. K., 1981, NMR in Molecular Biology, Academic Press, New York. Kamath, U., and Shriver, J. W., 1989, J. Biol. Chem. 264:5586–5592. Kaplan, J. I., and Fraenkel, G., 1980, NMR of Chemically Exchanging Systems, Academic Press, New York. Karplus, M., 1959, J. Chem. Phys. 30:11–13. Karplus M., 1963, J. Am. Chem. Soc. 85:2870–2871. Kay, L. E., Jue, T. L., Bangerter, B., and Demou, P. C., 1987, J. Magn. Reson. 73:558–564. Kay, L. E., Keifer, P., and Saarinen, T., 1992a, J. Am. Chem. Soc. 114:10663. Kay, L. E., Nicholson, L. K., Delaglio, F., Bax, A., and Torchia, D. A., 1992b, J. Magn. Reson.
97:359–375. Kay, L. E., Torchia, D. A., and Bax, A., 1989, Biochemistry 28:8972–8979.
Kessler, H., Gehrke, M., and Griesinger, C., 1988, Angew. Chem. 100:507–554. Kördel, J., Skelton, N. J., Akke, M., Palmer A. G. III, and Chazin, W. J., 1992, Biochemistry 31:4856. Kowalewski, J., 1990a, Annual Reports on NMR Spectroscopy, Vol. 22, Academic Press, London. Kowalewski, J., 1990b, Annual Reports on NMR Spectroscopy, Vol. 23, Academic Press, London. Kuwata, K., and Schleich, T., 1994, J. Magn. Reson. A 111:43–49.
Lane, A. N., 1989, Eur. J. Biochem. 182:95–104. Lee, W., Recington, M., Farrow, N. A., Nakamura, A., Utsunomiya-Tate, N., Miyake, Y., Kainosho, M., and Arrowsmith, C. H., 1995, J. Biomol. NMR 5:367–375.
Lefevre, J.-F., Dayie, K. T., Peng, J. W., and Wagner, G., 1996, Biochemistry 35:2674–2686. Lipari, G., and Szabo, A., 1980, J. Biophys. 30:489–506. Lipari, G., and Szabo, A., 1981, J. Chem. Phys. 75:2971–2976. Lipari, G., and Szabo, A., 1982a, J. Am. Chem. Soc. 104:4546–4559. Lipari, G., and Szabo, A., 1982b, J. Am. Chem. Soc. 104:4559–4570. Liu, J., Prakash, O., Cai, M., Gong, Y., Huang, Y., Wen, L., Wen, J. J., Huang, J.-K., and Krishnamoorthi, R., 1996a, Biochemistry 35:1516–1524. Liu, J., Prakash, O., Huang, Y., Wen, L., Wen, J. J., Huang, J.-K., and Krishnamoorthi, R., 1996b, Biochemistry 35:12503–12510. Logan, T. M., Olejnickzak, E. T., Xu, R. X., and Fesik, S. W., 1993, J. Biomol. NMR 3:225. London, R. E., 1980, Magnetic Resonance in Biology, Vol. 1, 1–69, Wiley, New York. London, R. E., and Avitabile, J., 1978, J. Am. Chem. Soc. 100:7159–7165. Macura, S., and Ernst, R. R., 1980, Mol. Phys. 41:95.
Mádi, Z. L., Griesinger, C. and Ernst, R. R., 1990, J. Am. Chem. Soc. 112:2908–2914. Markus, M. A., Dayie, K. T., Matsudaira, P., and Wagner, G., 1996, Biochemistry 35:1722–1732. Markley, J. L., Horsley, W. J., and Klein, M. P., 1971, J. Chem. Phys. 55:3604.
McCain, D. C., and Markley, J. L., 1986, J. Am. Chem. Soc. 108:4259–4264. McConnell, H. M., 1958, J. Chem. Phys. 28:430–431. Meiboom, S., Luz, Z., and Gill, D., 1958, J. Chem. Phys. 27:1411–1412. Mohebbi, A., and Shaka, A. J., 1991, Chem. Phys. Lett. 178:374–378. Montelione, G. T., and Wagner, G., 1989, J. Am. Chem. Soc. 111:3096–3098. Morris, G. A., and Freeman, R., 1979, J. Am. Chem. Soc. 101:760. Muhandiram, D. R., and Kay, L. E., 1994, J. Magn. Reson. Ser. B 103:203–216. Nicholson, L. K., Kay, L. E., Baldisseri, D. M., Arango, J., Young, P. E., Bax, A., and Torchia, D. A., 1992, Biochemistry 31:5253–5263. Nirmala, N. R., and Wagner, G., 1988, J. Am. Chem. Soc. 110:7557–7558. Nirmala, N. R., and Wagner, G., 1989, J. Magn. Reson. 82:659–661. Norton, R. E., Clouse, A. O., Addleman, R., and Allerhand, A., 1977, J. Am. Chem. Soc. 99:79–83. Orekhov, V. Y., Pervushin, K. V., Korzhnev, D. M., and Arseniev, A. S., 1995, J. Biomol. NMR 6:113–122. Palmer A. G. III, Cavanagh, J., Wright, P. E., and Rance, M., 1991, J. Magn. Reson. 93:151–170. Palmer A. G. III, Hochstrasser, R. A., Millar, D. P., Rance, M., and Wright, P. E., 1993, J. Am. Chem. Soc. 115:6333–6345. Peng, J. W., Thanabal, V., and Wagner, G., 1991a, J. Magn. Reson. 94:82–100. Peng, J. W., Thanabal, V., and Wagner, G., 1991b, J. Magn. Reson. 95:421–427. Peng, J. W., and Wagner, G., 1992, Biochemistry 31:8571–8586. Phan, I. Q. H., Boyd, J., and Campbell, I. D., 1996, J. Biomol. NMR 8:369–378. Piotto, M., Saudek, V., and Sklenar, V., 1992, J. Biomol. NMR 2:661–665.
Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T., 1992, Numerical Recipes, Cambridge University Press, Cambridge. Richarz, R., Nagayama, K., and Wuthrich, K., 1980, Biochemistry 19:5189–5196. Rischel, C., Madsen, J. C., Andersen, K. V, and Poulsen, F. M., 1994, Biochemistry 33:13997–14002. Rohl, C. A., and Baldwin, R. L., 1994, Biochemistry 33:7760–7767. Schiksnis, R. A., Bogusky, M. J., Tsang, P., and Opella, S. J., 1989, Biochemistry 26:1373–1381. Schneider, D. M., Dellwo, M. J., and Wand, A. J., 1992, Biochemistry 31:3645. Schneider, M., Engelke, J., and Rüterjans, H., in preparation. Searle, M. S., Forster, M. J., Birdsall, B., Roberts, G. C. K., Feeney, J., Cheung, H. T. A., Kompis, I., and Geddes, A. J., 1988, Proc. Natl. Acad. Sci. U.S.A. 85:3787–3791. Shaw, G. L., Davis, B., Keeler, J., and Fersht, A. R., 1995, Biochemistry 34:2225–2233.
Slijper, M., Boelens, R., Davis, A. L., Konings, R. N. H., Marel, G. A., Boom, J. H., and Kaptein, R., 1997, Biochemistry 36:249–254. Skelton, N. J., Palmer A. G. III, Akke, M., Kördel, J., Rance, M., and Chazin, W. J., 1993, J. Magn. Reson. B 102:253–264. Sklenár, V., Torchia, D., and Bax, A., 1987, J. Magn. Reson. 73:375–379. Spera, S., Ikira, M., and Bax, A., 1991, J. Biomol. NMR 1:155–165. Stone, M. J., Chandrasekhar, K., Holmgren, A., Wright, P. E., and Dyson, H. J., 1993, Biochemistry 32:426. Stone, M. J., Fairbrother, W. J., Palmer A. G. III, Reizer, J., Saier, M. H., and Wright, P. E., 1992, Biochemistry 3:4394–4406. Szyperski, T, Luginbühl, P., Otting, G., Güntert, P., and Wüthrich, K., 1993, J. Biomol. NMR 3:151 –164. Tjandra, N., Wingfield, P., Stahl, S., and Bax, A., 1996, J. Biomol. NMR 8:273–284. Vold, R. L., Waugh, J. S., Klein, M. P., and Phelps, D. E., 1968, J. Chem. Phys. 48:3831–3832. Vuister, G. W., and Bax, A., 1992, J. Magn. Reson. 98:428–435. Werbelow, L. G., and Grant, D. M., 1977, Adv. Magn. Reson. 9:189–300. Wider, G., Neri, D., and Wüthrich, K., 1991, J. Biomol. NMR 1:93–98. Wittebort, R. J., and Szabo, A., 1978, J. Chem. Phys. 69(4): 1722–1736. Woessner, D. E., 1962, J. Chem. Phys. 36:1–4.
Wüthrich, K., 1986, NMR of Proteins and Nucleic Acids, Wiley, New York. Yamazaki, T., Muhandiram, R., and Kay, L. E., 1994, J. Am. Chem. Soc. 116:8266–8278.
Ye, C., Fu, R., Hu, J., Hou, L., and Ding, S., 1993, Mag. Reson. Chem. 31:699–704. Yu, L., Zhu, C.-X., Tse-Dinh, Y.-C., and Fesik, S. W., 1996, Biochemistry 35:9661–9666. Zhu, L., Kemple, M. D., Landy, S. B., and Buckley, P., 1995, J. Magn. Reson. Ser. B 109:19–30.
Zink, T., Ross, A., Lüers, K., Cieslar, C., Rudolph, R., and Holak, T. A., 1994, Biochemistry 33:8453– 8463.
10
Multinuclear Relaxation Dispersion Studies of Protein Hydration
Bertil Halle, Vladimir P. Denisov, and Kandadai Venu
1. INTRODUCTION
A protein solution contains at least two components: protein and water. The interaction of the two can be monitored by a variety of NMR methods utilizing either the strong water resonance or the much weaker protein resonances. While both approaches have a long history, the study of protein hydration by NMR relaxation has been transformed in the past few years by important methodological advances.
Studies of biomolecular hydration via the water resonance were among the first applications of NMR to biological systems (Shaw and Elsken, 1953; Jacobson et al., 1954; Odeblad and Lindström, 1955; Jardetzky and Jardetzky, 1957; Balazs et al., 1959). Although the first nuclear magnetic relaxation dispersion (NMRD) measurements on the water resonance in protein solutions were reported in 1969–1970 (Koenig and Schillinger, 1969; Blicharska et al., 1970; Kimmich and Noack, 1970a), the molecular basis of the observed relaxation dispersion was
Bertil Halle and Vladimir P. Denisov • Physical Chemistry 2, Lund University, S-22100 Lund, Sweden. Kandadai Venu • School of Physics, University of Hyderabad, 500046 Hyderabad, India. Biological Magnetic Resonance, Volume 17: Structure Computation and Dynamics in Protein NMR, edited by Krishna and Berliner. Kluwer Academic / Plenum Publishers, New York, 1999.
established only recently (Denisov and Halle, 1994). There were two major obstacles on the long and tortuous road to progress in the water NMRD field. First, the contribution to the relaxation rate of the observed bulk water resonance from rapidly exchanging labile protein protons could not be isolated and was generally underestimated. This problem was overcome by measuring the water $^{17}$O relaxation rate (Halle et al., 1981; Piculell and Halle, 1986; Denisov and Halle, 1995a). Second, since the measured relaxation rate is an irreducible average over all water molecules exchanging rapidly with bulk water, it was not possible to identify the water molecules responsible for the dispersion. At the time of the first NMRD studies, the understanding of protein structure was rudimentary. Only with access to high-resolution crystal structures and genetically engineered proteins could NMRD experiments be designed that discriminated among the various models
proposed and even allowed individual hydration water molecules to be dynamically characterized (Denisov and Halle, 1994, 1995a; Denisov et al., 1996). It is now firmly established that the water relaxation dispersion from protein solutions is due to a small number of crystallographically identifiable water molecules, essentially buried within the protein and typically with residence times
in the range (Denisov and Halle, 1994, 1995a, 1996). This insight has transformed the NMRD method into a quantitative tool for studies of biomolecular solvation and has made it possible to use buried water molecules as noninvasive probes of large-scale conformational dynamics, unfolding, and association of proteins and other biomolecules in solution. Unlike most high-resolution NMR techniques, the NMRD method is not limited to solutions, but can also be applied to semisolid samples such as biological tissues. Since protein-associated water molecules generally exchange rapidly with bulk water on the chemical-shift time scale, they cannot be directly observed by high-resolution NMR spectroscopy. Double-resonance techniques, however, can be used to monitor magnetization transfer from bulk water to resolved solute resonances. This alternative NMR approach was first applied to oligopeptides (Pitner et al., 1974) and proteins (Stoesz et al., 1978; Akasaka, 1979) in the 1970s. The extensive development during the 1980s of two-dimensional NMR spectroscopy and solvent suppression techniques led in 1989 to the first demonstration by NOE spectroscopy (NOESY) of cross relaxation between buried water molecules and protein protons (Otting and Wüthrich, 1989). Since then, a variety of pulse schemes have been invented for observing NOEs with (usually long-lived) water molecules in proteins (Otting, 1997). The NMRD and NOE methods, with their different strengths and weaknesses, have emerged as complementary approaches for studying protein hydration in solution. The aim of this chapter is to provide a coherent presentation of the methodology and theoretical basis of the water NMRD method, illustrated by recent applications to protein solutions.
2. METHODOLOGY OF WATER NMRD
The term nuclear magnetic relaxation dispersion (NMRD) usually refers to the dependence of the longitudinal relaxation rate on the Larmor frequency determined by the static magnetic field applied during the evolution period. In bulk water at room temperature, the relaxation dispersion occurs at about 30 GHz and, hence, is not accessible with current NMR technology. In aqueous solutions of small to medium-sized proteins at ambient temperature, however, another dispersion appears around 10 MHz, conveniently located in the middle of the frequency range accessible with conventional methods of field variation. In our laboratory, the low-field range (down to 2 MHz) is covered by a 100-MHz tunable NMR spectrometer equipped with an electromagnet, while higher frequencies (currently up to 600 MHz) are accessed by a series of fixed-field spectrometers with cryomagnets. For large proteins or biomolecular complexes, very high protein concentrations (with strong protein–protein interactions), high solvent viscosities (e.g., at subzero temperatures), and for semisolid proteins, the relaxation dispersion may extend into the kHz range and below. At such low fields, fast field cycling (FFC) is the method of choice. This method differs fundamentally from the conventional one in that the static field is varied during the course of an individual measurement. The
principal limitations of the FFC method are the relatively low maximum attainable field and the finite time required to switch the field, factors which have so far prevented FFC studies of fast-relaxing nuclei such as $^{17}$O. Due to their different strengths and weaknesses, the conventional and FFC methods thus usefully complement each other.
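The dispersion frequencies quoted above can be related to a correlation time with a minimal sketch. It assumes a single Lorentzian spectral density, $J(\omega) \propto 1/[1 + (\omega\tau)^2]$, whose midpoint lies at $\omega\tau = 1$; real NMRD profiles may contain several terms, so the numbers are only indicative. The function names and the 10-ns correlation time are hypothetical.

```python
import math

def lorentzian(freq_hz, tau_s):
    """Normalized Lorentzian J(w)/J(0) = 1 / (1 + (w*tau)^2), with w = 2*pi*freq."""
    omega = 2.0 * math.pi * freq_hz
    return 1.0 / (1.0 + (omega * tau_s) ** 2)

def midpoint_frequency(tau_s):
    """Frequency (Hz) at which the Lorentzian has dropped to one half (w*tau = 1)."""
    return 1.0 / (2.0 * math.pi * tau_s)

def frequency_at_fraction(fraction, tau_s):
    """Frequency (Hz) at which J(w)/J(0) equals `fraction` (0 < fraction < 1)."""
    return math.sqrt(1.0 / fraction - 1.0) / (2.0 * math.pi * tau_s)

def span_in_decades(upper_fraction=0.99, lower_fraction=0.01):
    """Width of the dispersion step in frequency decades; independent of tau."""
    f_start = frequency_at_fraction(upper_fraction, 1.0)
    f_end = frequency_at_fraction(lower_fraction, 1.0)
    return math.log10(f_end / f_start)

# Hypothetical correlation time of 10 ns, of the order expected for a small protein.
tau = 10e-9
print(f"midpoint frequency: {midpoint_frequency(tau) / 1e6:.1f} MHz")
print(f"1%-99% span: {span_in_decades():.2f} decades")
```

With these assumptions a correlation time of about 10 ns places the dispersion midpoint in the 10-MHz region, and the step from 99% to 1% of the low-frequency amplitude covers close to two frequency decades, which is why measurements over a wide field range are needed to define the profile.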
2.1. Conventional Field Variation
Conceptually, the simplest way to record a relaxation dispersion is to carry out relaxation measurements on a series of NMR spectrometers operating at different fixed magnetic fields. At the Lund NMR Center, relaxation data at half a dozen fields from 2.35 T to 14.1 T can currently be gathered in this way. Even for the $^2$H and $^{17}$O nuclei, however, this barely covers the upper half of the 1–100 MHz frequency range of primary interest in most studies of protein solutions. Access to the lower part of this range is most conveniently provided by a field-variable electromagnet interfaced to a 90- or 100-MHz console. NMRD studies using such tunable, or multichannel, NMR spectrometers have been carried out since the 1960s. The early NMRD measurements actually extended as far down as 50 kHz, but required large samples (50 mL) even for aqueous solutions of paramagnetic ions, where the proton density is high and the dispersion is strong (Hausser and Noack, 1964). In the early 1970s, water NMRD data in the range 0.45–160 MHz were obtained from protein solutions with this method, complemented by
FFC data at lower frequencies (Blicharska et al., 1970; Kimmich and Noack, 1970a; Kimmich, 1971; Noack, 1971; Grösch and Noack, 1976). Large samples were needed (15 mL at 0.45 MHz) to determine $T_1$ to an accuracy of 5%. In present-day NMRD studies of protein solutions by conventional field variation, the focus is on the less receptive $^2$H and $^{17}$O nuclei and, considering the high cost of isotope-enriched water and recombinant proteins, sample volumes rarely exceed 1 mL. If care is taken to minimize systematic errors (see below), an accuracy of 1% in $T_1$ can nevertheless be achieved routinely. At this level of accuracy, the NMRD method can be applied in novel ways to provide a wealth of molecular-level information, such as residence times of individual buried water molecules in proteins (Denisov et al., 1996).
2.1.1. Spectrometer Characteristics
For NMRD measurements up to 2.0 T, we use a modified Bruker MSL 100 spectrometer interfaced to an electromagnet (Drusch EAR-35N) with a field-variable external field-frequency lock. The construction of an inexpensive field-variable external lock system for use in NMRD studies has recently been described (Sitnikov et al., 1996). The lock works in parallel with a flux stabilizer, which eliminates higher-frequency field variations. A peak-to-peak field stability of better than is achieved in this way (Furó and Halle, 1995). For NMRD studies of protein solutions, the stability requirements are actually less stringent. For $^1$H, where the lowest frequencies are outside the 0.2–2.0 T operating range of our lock, good results are obtained provided that the field is allowed to settle for a few hours. For cryomagnets, a lock is not essential. Besides the intrinsic field instabilities, external transients may cause field jumps of the order The influence of such transients was minimized by extensive grounding and shielding and by enclosing the entire instrument in a Faraday cage. Three tuned preamplifiers and eight interchangeable probeheads are used to cover the frequency range 2–85 MHz. The practical low-frequency limit is determined by the frequency band of conventional NMR spectrometers. It is currently 2 MHz on our instrument, but can be further reduced with customized spectrometer components. Signal-to-noise considerations are usually not the limiting factor even at this frequency provided that suitably (20%, say) isotope-enriched water is used for $^{17}$O measurements. Since a Lorentzian NMRD profile spans approximately two frequency decades, it is often desirable to extend the relaxation measurements to $^2$H and $^{17}$O frequencies approaching 100 MHz, as provided by a 600-MHz instrument.
2.1.2. Relaxation Time Measurements
To achieve 1% accuracy in $T_1$, the signal-to-noise ratio should be kept high at all fields. While this is easily achieved for $^1$H and $^2$H, up to 4000 transients may be required at 2 MHz for $^{17}$O with 20% enrichment in 0.5–1.0 mL samples.
The inversion recovery pulse sequence with standard phase cycling is usually the best option for $T_1$ measurements, due to its large dynamic range and tolerance to pulse angle imperfections and off-resonance effects (Martin et al., 1980). Nevertheless, the pulse length should be calibrated to within a few percent for each experiment. To reduce the $B_1$ inhomogeneity, the sample should not extend beyond the length (typically 10–15 mm) of the coil. The resulting increase of the $B_0$ inhomogeneity presents no problem provided that it does not vary during a $T_1$ experiment. For $^1$H measurements at high fields, radiation damping produces an apparent (initial) relaxation enhancement if inversion is imperfect (Mao et al., 1994; Wu and Johnson, 1994). Such artifacts can easily be avoided by reducing the filling factor (e.g., by using the decoupler coil), reducing the $B_0$ homogeneity, detuning the probe, inserting a homospoil pulse, or using the saturation recovery pulse sequence with low-power irradiation or an aperiodic excitation sequence. Conventional field variation, of course, also permits relaxation dispersion studies involving coherences, such as standard spin-echo measurements and, for the spin-5/2 $^{17}$O nucleus, multiple-quantum experiments (see Sect. 3.1.2b). In general, however, longitudinal relaxation can be measured more accurately and is more straightforward to analyze.
In the inversion recovery method, the use of 20 delay times in the range taken in random order, usually gives very good results for a signal-to-noise ratio of 100. The longitudinal relaxation time is determined from a standard nonlinear three-parameter fit, where the adjustable initial and long-time intensity parameters compensate for $B_1$ inhomogeneity. If the field is stable, the use of peak amplitudes or peak integrals produces virtually identical results (Fig. 1).
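A three-parameter fit of the kind described here can be sketched in a few lines. The sketch below assumes the model $I(t) = I_\infty + (I_0 - I_\infty)\exp(-t/T_1)$ and exploits the fact that, for a fixed rate $1/T_1$, the two intensity parameters follow from linear least squares, so that only a one-dimensional search over the rate remains. This is one possible implementation and not necessarily the procedure used by the authors; the function names, the synthetic data, and the range of delay times are assumptions made for illustration.

```python
import math
import random

def linear_part(times, signal, rate):
    """For a fixed rate, least-squares fit of signal = a + b*exp(-rate*t).

    Returns (a, b, rss), where rss is the residual sum of squares.
    """
    e = [math.exp(-rate * t) for t in times]
    n = len(times)
    se, see = sum(e), sum(x * x for x in e)
    sy, sey = sum(signal), sum(x * y for x, y in zip(e, signal))
    det = n * see - se * se
    a = (see * sy - se * sey) / det
    b = (n * sey - se * sy) / det
    rss = sum((y - a - b * x) ** 2 for x, y in zip(e, signal))
    return a, b, rss

def fit_inversion_recovery(times, signal, rate_lo, rate_hi, grid=200):
    """Three-parameter fit: coarse log-grid scan over the rate, then golden-section refinement."""
    def cost(log_rate):
        return linear_part(times, signal, math.exp(log_rate))[2]

    log_lo, log_hi = math.log(rate_lo), math.log(rate_hi)
    step = (log_hi - log_lo) / (grid - 1)
    costs = [cost(log_lo + i * step) for i in range(grid)]
    i_best = min(range(grid), key=costs.__getitem__)
    lo = log_lo + max(i_best - 1, 0) * step
    hi = log_lo + min(i_best + 1, grid - 1) * step

    g = (math.sqrt(5.0) - 1.0) / 2.0   # golden-section ratio
    for _ in range(60):
        x1 = hi - g * (hi - lo)
        x2 = lo + g * (hi - lo)
        if cost(x1) < cost(x2):
            hi = x2
        else:
            lo = x1
    rate = math.exp(0.5 * (lo + hi))
    a, b, _ = linear_part(times, signal, rate)
    return {"T1": 1.0 / rate, "I_inf": a, "I_0": a + b}

# Synthetic example (all values hypothetical): 20 delays, imperfect inversion,
# Gaussian noise corresponding to a signal-to-noise ratio of about 100.
random.seed(1)
true_t1, i_inf, i_0 = 0.5, 1.0, -0.9
delays = [0.05 * true_t1 * (100.0 ** (i / 19.0)) for i in range(20)]
random.shuffle(delays)                 # delays taken in random order
data = [i_inf + (i_0 - i_inf) * math.exp(-t / true_t1) + random.gauss(0.0, 0.01)
        for t in delays]
result = fit_inversion_recovery(delays, data, rate_lo=0.05, rate_hi=50.0)
print({key: round(value, 4) for key, value in result.items()})
```

Because the initial and long-time intensities are free parameters, an imperfect inversion (here $I_0 = -0.9\,I_\infty$) does not bias the fitted $T_1$.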
2.1.3. Temperature Control
Accurate temperature control is perhaps the most important factor in reducing systematic errors in conventional NMRD. Since the measured $T_1$ is essentially inversely proportional to solvent viscosity, it follows that a 1% systematic error in $T_1$ can result from a temperature missetting of merely 0.4 K (at room temperature). For measurements on different spectrometers with different temperature controllers and different probe designs, it is therefore essential to use an accurate and robust procedure for temperature calibration, e.g., a thermocouple immersed in a dummy sample inserted at the same height in the probe as the actual sample. A measurement on a pure water reference sample with known (temperature-dependent) $T_1$ can also serve as a temperature check. Temperature calibration should be carried out before each (set of) measurements on a given probehead and preferably also afterwards to check for possible temperature drifts. With most standard temperature controllers and with a properly insulated probe, fluctuations in sample temperature can be kept below 0.1 K, and temperature gradients below Using the procedures
outlined here, $T_1$ values accurate to 1% or better are usually obtained for water and protein solutions over the entire field range (Fig. 2).
2.2. Fast Field Cycling
The polarization of a nuclear spin system is due to the interaction between the nuclear magnetic moments and the static magnetic field $B_0$. The initial nonequilibrium polarization required for a longitudinal relaxation experiment can be created either by manipulating the spin state populations without altering the Zeeman levels, as in conventional pulse NMR at constant field, or by changing the level spacing without altering the populations, as in the cyclic variation of the field performed in FFC NMR. In either case, the nonequilibrium state should preferably be established (by the rf pulse or the field switch) in a time short compared to the longitudinal relaxation time.
While the FFC method was already used in the 1950s to study relaxation in solids (Anderson and Redfield, 1959), its application to water relaxation in protein solutions dates back to the late 1960s (Koenig and Schillinger, 1969; Blicharska et al., 1970; Kimmich and Noack, 1970b). By extending the accessible NMRD range down to a few kHz, where the sensitivity scaling precludes conventional relaxation measurements, the FFC method represented an important methodological advance. In current NMRD studies of protein solutions, however, the focus is on highly accurate $^2$H and $^{17}$O relaxation measurements in the 1–100 MHz range, where conventional field variation is the best method. As mentioned above, for certain applications, complementary FFC measurements are essential. Among the three water nuclei, only $^1$H and $^2$H have so far been used in FFC work. Several excellent reviews of the FFC method and its diverse applications are available (Kimmich, 1980; Noack, 1986, 1995; Noack et al., 1997).
2.2.1. The FFC Spectrometer
A field cycling relaxation experiment generally consists of three periods (Fig. 3a, c). The sample, initially equilibrated in a given field, is brought into a nonequilibrium state by rapidly changing the field to a lower or higher value, the evolution field. After an evolution period of variable length, during which relaxation takes place in the evolution field, the field is switched to a relatively high value, the detection field, for signal detection. The NMRD frequency range accessible by the FFC method is limited from above by the maximum field of the FFC magnet and from below by interference from ambient fields, including the geomagnetic field (about 50 µT, or 2-kHz $^1$H frequency) as well as stray fields from nearby magnets or building materials, and by leakage currents in the transistors that control the magnet current. The low-frequency limit can be lowered by ambient-field compensation. While the longitudinal component (parallel to the primary field) of the ambient field is easily compensated by the magnet, the transverse component requires special compensation coils. For low-field measurements, the current through any external shim coils must also be stepped down during the evolution period.
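The correspondence between field strength and NMRD frequency used in this section follows from the Larmor relation, frequency = (gamma/2 pi) x B. The short sketch below converts between the two for the three stable water nuclei. The rounded magnetogyric ratios are standard tabulated magnitudes (the sign for oxygen-17 is ignored); the function and dictionary names are hypothetical.

```python
# Magnetogyric ratios divided by 2*pi, in MHz/T (rounded magnitudes).
GAMMA_OVER_2PI_MHZ_PER_T = {"1H": 42.577, "2H": 6.536, "17O": 5.772}


def larmor_frequency_hz(field_tesla, nucleus="1H"):
    """Larmor frequency in Hz for a given field in tesla."""
    return GAMMA_OVER_2PI_MHZ_PER_T[nucleus] * 1e6 * field_tesla


def field_for_frequency_tesla(frequency_hz, nucleus="1H"):
    """Field in tesla that gives the requested Larmor frequency in Hz."""
    return frequency_hz / (GAMMA_OVER_2PI_MHZ_PER_T[nucleus] * 1e6)


# A geomagnetic-scale field and a 0.5 T electromagnet, as proton frequencies.
print(f"50 uT  -> {larmor_frequency_hz(50e-6) / 1e3:.2f} kHz (1H)")
print(f"0.5 T  -> {larmor_frequency_hz(0.5) / 1e6:.2f} MHz (1H)")
# Field needed to reach 1 MHz for each nucleus.
for nucleus in GAMMA_OVER_2PI_MHZ_PER_T:
    print(f"1 MHz {nucleus:>3}: {field_for_frequency_tesla(1e6, nucleus) * 1e3:.1f} mT")
```

The same field therefore corresponds to a roughly sevenfold lower frequency for the deuteron or oxygen-17 than for the proton, which matters when comparing dispersion profiles from different nuclei on one frequency axis.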
FFC instruments based on home-built spectrometers and suitable for water NMRD studies of protein solutions have been developed in several laboratories (Redfield et al., 1968; Florkowski et al., 1969; Kimmich and Noack, 1970b; Conti, 1986; Koenig and Brown, 1987; Job et al., 1996), and a commercial FFC spectrometer has been available since 1995 (Sykora and Ferrante, 1995). The low-inductance air core magnets used for FFC are generally optimized for short switching times and minimal power consumption rather than for high field homogeneity (Schweikert et al., 1988). For water NMRD studies, an inhomogeneity of 200 ppm (in a 10-mm sample), as for our instrument, does not lead to significant signal loss during the ca. receiver deadtime. Air core FFC magnets with as little as 5-ppm inhomogeneity have been constructed (Noack, 1986). From the point of view of homogeneous polarization, even a 200-ppm $B_0$ inhomogeneity is, of course, far superior to typical $B_1$ inhomogeneities in conventional pulse NMR.
Switching of the field, which must be performed at a rate at least comparable to the relaxation rate, can be accomplished mechanically or electronically. The mechanical approach, where the sample is rapidly shuttled between two magnetic fields, has the advantage that the detection field can be provided by a high-field cryomagnet. The consequent sensitivity gain and improved field homogeneity may eventually permit high-resolution FFC measurements. While these factors and the wider frequency range are important advantages, the long shuttling time (typically several hundred ms) limits the mechanical approach to samples with relatively slow relaxation. With electronic switching, the maximum field is usually 0.5–2 T, but much shorter relaxation times can be measured. Our version of Stelar’s FFC spectrometer has a maximum field of 0.5 T (21-MHz $^1$H frequency) and a maximum switching rate of allowing accurate (2%) measurements of $T_1$ values down to a few ms (Sykora and Ferrante, 1995). Using the energy storage principle, switching rates as high as can be achieved (Redfield et al., 1968; Noack, 1986; Job et al., 1996).
The sensitivity of the FFC method is comparable to a conventional relaxation experiment at a fixed field equal to the FFC detection field (Noack, 1986). Whereas the evolution field is varied in the course of an NMRD experiment, the detection field is kept fixed. The rf circuitry of an FFC spectrometer can therefore be optimized to a narrow frequency band, and there is no need for retuning as the evolution field is changed. This reduces the cost of the spectrometer and makes NMRD studies by FFC much less time consuming than by conventional means. With the aid of a personal computer, the FFC experiment can be fully automated and an entire NMRD profile can often be recorded overnight. This is a decisive advantage of the FFC method, apart from access to the sub-MHz frequency range.
2.2.2. Relaxation Time Measurements
The two field cycles most frequently used for $T_1$ measurements (Noack, 1986) are schematically illustrated in Fig. 3. Unless the evolution field is close to the detection field, the preparation period, typically of duration five times $T_1$, consists of polarization at a high field, whereby the sample acquires a longitudinal magnetization. After the field has been switched to the evolution field, the magnetization starts decaying toward the new equilibrium value (Fig. 3a, b). The magnetization remaining at the end of the evolution period is monitored by switching the field to the detection field and recording the initial point of the free induction decay following a 90° rf pulse or, if the field is not sufficiently homogeneous, the amplitude of a 90°–τ–180° spin echo. Signal averaging (beyond what is required for phase cycling) can be used if needed (usually only for measurements). After repeating the experiment with about 20 different values of the evolution time, the longitudinal relaxation time $T_1$ in the evolution field is obtained from a three-parameter fit according to
where, ideally, To obtain the NMRD profile, the whole procedure must be repeated for a range of values of the continuously variable evolution field. In practice, the coefficients A and C can deviate significantly from their ideal values. This is due to relaxation (in a time-dependent field) during the switching intervals and during the delay (a few ms) inserted before the read pulse to allow the field to stabilize. Provided that the field is cycled in a reproducible way for all evolution times, these imperfections are fully accounted for by the adjustable parameters A and C and do not introduce systematic errors in the $T_1$ determination. It should also be noted that $B_1$ inhomogeneity is not an important issue in FFC work, since the nonequilibrium state is prepared by means of the $B_0$ field.
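The display form of Eq. (1) and the ideal parameter values did not survive in the text above. A generic three-parameter exponential that is consistent with the description is given below as a sketch only. The sign convention and the symbols $\tau$ (evolution time), $B_{\mathrm{pol}}$, $B_{\mathrm{evo}}$, and $M_{\mathrm{eq}}$ are assumptions introduced here for illustration and are not taken from the original.

```latex
M(\tau) = A + C \exp(-\tau / T_1), \qquad
A_{\mathrm{ideal}} = M_{\mathrm{eq}}(B_{\mathrm{evo}}), \qquad
C_{\mathrm{ideal}} = M_{\mathrm{eq}}(B_{\mathrm{pol}}) - M_{\mathrm{eq}}(B_{\mathrm{evo}})
```

With this convention the amplitude $C$ vanishes as the evolution field approaches the polarization field, which is why a prepolarized cycle loses accuracy there. For a cycle that starts from essentially zero magnetization, the same form would ideally give $C = -A$.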
When the evolution field is close to the polarization field, the parameter C in Eq. (1) becomes too small for an accurate determination with the prepolarization cycle. An alternative field cycle is then used, with essentially zero field in the preparation period (Fig. 3c, d). Equation (1) still applies, but with (in the ideal case). As an illustration of the accuracy achievable with our FFC instrument, Fig. 4
shows the water relaxation dispersion from a protein solution. The low-frequency FFC data are seen to merge smoothly with the high-frequency data acquired by conventional field variation. Primarily due to the lower quality of the detection field,
the accuracy of the FFC data (2% on average) is inferior to that of the conventional data (1%). The FFC data are also more prone to systematic error since the restricted
space available for thermal insulation of the probe in the FFC magnet does not allow as precise temperature control as desired. Temperature fluctuations of 0.2 K are
typical and temperature gradients can be even larger for measurements far from ambient temperature.
2.3. NMR Properties of the Water Nuclei
Three different hydrogen isotopes, $^1$H, $^2$H, and $^3$H, as well as the magnetic oxygen isotope $^{17}$O, are available for NMR studies of water. The radioactive $^3$H nucleus, a weak β emitter with a half-life of 12.5 yr, has spin-1/2 and a 7% larger magnetogyric ratio than $^1$H. Despite its high NMR sensitivity, $^3$H has so far found little use for water NMR studies. The radioactivity makes high enrichment levels hazardous and causes radiolysis of biological samples. Whereas $^3$H NMRD studies have not yet been attempted, a NOESY study of DNA hydration using specifically $^3$H-labeled nucleotides was recently reported (Kubinec et al., 1996). Table 1 summarizes some relevant properties of the three stable water nuclei used for NMRD studies. The choice of nucleus is largely dictated by factors related to the
relaxation mechanism, as discussed in more detail in the following sections. Here, the nuclei are compared primarily from an experimental point of view.
2.3.1. Proton Relaxation
Until recently, the vast majority of NMRD studies of protein solutions employed the $^1$H nucleus, which has the highest sensitivity and the widest accessible frequency range. These advantages, however, must be balanced against several drawbacks. First and foremost, $^1$H relaxation generally contains a potentially confounding contribution from labile protein protons. Due to proton exchange catalysis (Eigen, 1964), this contribution is more pronounced away from neutral pH and in the presence of buffers (Liepinsh and Otting, 1996). In the past, exchange-averaged labile proton contributions to $^1$H NMRD data have frequently been unjustifiably ignored. Water protons are engaged in intramolecular as well as intermolecular dipole couplings. The intermolecular couplings are quite different in bulk water and for a
water molecule buried inside a protein, thus complicating the interpretation of $^1$H NMRD data (Venu et al., 1997). The dynamic coupling between the water and protein spin systems brought about by cross relaxation (via intermolecular dipole couplings) was long thought to be essential for analyzing $^1$H NMRD data (Koenig et al., 1978; Koenig and Brown, 1991), but more recent work has shown that this effect is negligible for protein solutions (Venu et al., 1997).
Since dipolar relaxation in water is relatively slow, other mechanisms may contribute significantly to $^1$H relaxation. Bulk water in equilibrium with air at atmospheric pressure contains dissolved molecular oxygen at a concentration of 0.27 mM at 298 K. The unpaired electrons (S = 1) in O$_2$ give rise to a paramagnetic (mainly dipolar) relaxation dispersion in the range 10–300 MHz (centered at 40 MHz), resulting in a 30% shortening of $T_1$ below 10 MHz (Hausser and Noack, 1965). Among the several methods available for removing dissolved oxygen (Martin et al., 1980), we have found gentle bubbling of an inert gas such as argon to be convenient for protein solutions. Water relaxation can also be enhanced by paramagnetic metal ions slowly released from standard NMR tubes. This contribution can be minimized by pretreating NMR tubes with strong acid and EDTA solution or by using quartz tubes. If under control, the paramagnetic relaxation enhancement from intrinsic metal ions or spin labels can be used constructively to study certain aspects of protein hydration. This specialized topic falls outside the scope of this chapter.
2.3.2. Deuteron Relaxation
Despite having two orders of magnitude lower receptivity and one order of magnitude faster relaxation than $^1$H, the $^2$H nucleus can be used for water NMRD studies over a wide frequency range. The first $^2$H NMRD profiles from protein solutions, obtained by the FFC method, were reported in 1975 (Koenig et al., 1975; Hallenga and Koenig, 1976). Due to the shorter intrinsic relaxation time, $^2$H is less susceptible than $^1$H to labile hydrogen exchange averaging. Nevertheless, rapidly exchanging biopolymer deuterons can dominate water $^2$H relaxation at low and high pH values (Woessner and Snowden, 1970; van der Klink et al., 1974; Piculell and Halle, 1986; Denisov and Halle, 1995b) and can never be neglected a priori. Among the magnetic water nuclei, only $^2$H is affected by 180° flips around the $C_2$ axis of a water molecule associated with a protein (Denisov and Halle, 1995c).
2.3.3. Oxygen-17 Relaxation
The $^{17}$O resonance of water was first observed in 1951 and used to determine the nuclear spin, I = 5/2, of the $^{17}$O isotope (Alder and Yu, 1951). The first systematic $^{17}$O relaxation study of protein hydration appeared in 1981, including data over one decade in frequency for half a dozen proteins (Halle et al., 1981). Since $^{17}$O magnetization is exclusively associated with water molecules, the observed relaxation dispersion provided indisputable evidence of motional components on the time scale of protein tumbling (ca. 10 ns) for a fraction of the water molecules in the protein solution. Since 1994, we have made extensive use of improved instrumentation and higher enrichment to measure highly accurate $^{17}$O NMRD profiles from protein solutions.
Due to its fast relaxation, $^{17}$O is not easily studied by current FFC technology. As a result, the accessible frequency range is restricted to 1–100 MHz. For most protein solutions, however, this is the most interesting range. Due to the low natural abundance, ca. 20 mM in water, $^{17}$O-enriched water is essential for accurate NMRD work. Both and are commercially available at enrichment levels up to at least 60%. (For the same product, the prices can vary by a factor of 5 among different suppliers!) We generally use 20% enrichment, a compromise between cost and convenience. At this enrichment, the relative receptivity of is comparable to in Furthermore, the rapid relaxation allows short acquisition times and efficient signal averaging.
A major advantage of the $^{17}$O nucleus is that, unlike the hydrogen nuclei, it reports exclusively on water molecules. Ironically, the first $^{17}$O studies of water in biological systems (Glasel, 1968; Civan and Shporer, 1972) did not mention this advantage, but instead argued that, due to its large quadrupole coupling, the $^{17}$O nucleus should be a more sensitive probe of hydration effects. This is actually not true, since the relative relaxation enhancement is independent of the quadrupole coupling constant. On the other hand, due to its efficient quadrupolar relaxation and small magnetic moment, $^{17}$O is much less susceptible than $^1$H to paramagnetic impurities.
As a spin-5/2 nucleus, $^{17}$O does not strictly obey the Bloch equations. Under most conditions of practical interest, however, relaxation is virtually exponential and the dispersion has the same shape as for a spin-1 nucleus like $^2$H (Halle and Wennerström, 1981a). In the neutral pH range, the transverse relaxation is not purely quadrupolar, but includes a scalar relaxation contribution due to hydrogen-exchange modulation of the spin–spin coupling between $^1$H and $^{17}$O in water (see Sect. 3.3.2).
3. RELAXATION MECHANISMS
The water relaxation dispersion in protein solutions has been extensively studied using the $^1$H, $^2$H, and $^{17}$O nuclei. Longitudinal relaxation is predominantly due to fluctuating magnetic dipole–dipole couplings for $^1$H and to fluctuating electric field gradients for the quadrupolar nuclei $^2$H and $^{17}$O. The spin–lattice couplings involved in these mechanisms are of order (Table 1). In solutions containing paramagnetic species, as an integral part of the protein or as an impurity, interactions with electron spins can contribute substantially to the overall solvent relaxation, in particular for the slowly relaxing $^1$H nucleus. For the transverse relaxation, much weaker interactions, of order rad associated with (isotropic) chemical-shift differences and scalar couplings can compete with the stronger anisotropic couplings if modulated by slow processes, typically hydrogen exchange (see Sects. 3.3 and 5.7).
Except for Sect. 4.3, the following presentation is restricted to the motional-narrowing regime, where spin relaxation can be treated by the conventional second-order perturbation theory associated with the names of Bloch, Wangsness, and Redfield (Abragam, 1961). The regime of validity of BWR theory for longitudinal relaxation is essentially
where is the rigid-lattice coupling (in angular frequency units) responsible for relaxation (Table 1), is the correlation time for the motion modulating that coupling, and is the Larmor frequency.
3.1. Quadrupolar Relaxation
Relaxation of the water nuclei $^2$H and $^{17}$O can usually be fully accounted for by the coupling of the nuclear quadrupole moment with the electric field gradient present at the nuclear site. Since the electric field gradient tensor is mainly determined by the charge distribution within the water molecule, its principal components are nearly constant while the orientation of the principal axis system is modulated by rotation of the water molecule.
3.1.1. Deuteron Relaxation
The relaxation of a spin-1 nucleus like $^2$H is always exponential in isotropic solution (and in the fast-exchange limit). The longitudinal and transverse relaxation rates are (Abragam, 1961)
We define the rigid-lattice quadrupole frequency
through
with the quadrupole coupling constant (QCC) and the asymmetry parameter of the electric field gradient tensor. For the numerical prefactor is The reduced spectral density function is the Fourier cosine transform of the reduced time correlation function of the fluctuating quadrupole coupling:
where
is normalized so that
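Since the display equations of this subsection are not reproduced above, the following sketch evaluates the standard textbook expressions for a spin-1 nucleus (Abragam, 1961), written with a rigid-lattice quadrupole frequency and a reduced spectral density normalized so that J(0) equals the correlation time. Whether this matches the chapter's Eqs. (3)–(6) term by term is an assumption. The single-Lorentzian spectral density, the quadrupole coupling constant, the asymmetry parameter, and the correlation times are likewise illustrative assumptions, and all function names are hypothetical.

```python
import math


def lorentzian_j(omega, tau_c):
    """Reduced spectral density for an exponential correlation function.

    Normalized so that J(0) = tau_c (an assumed single-Lorentzian form).
    """
    return tau_c / (1.0 + (omega * tau_c) ** 2)


def quadrupole_frequency_spin1(qcc_hz, eta=0.0):
    """omega_Q in rad/s for I = 1: omega_Q^2 = (3/2) pi^2 chi^2 (1 + eta^2/3)."""
    return math.sqrt(1.5) * math.pi * qcc_hz * math.sqrt(1.0 + eta ** 2 / 3.0)


def spin1_rates(omega_q, omega_0, tau_c):
    """Longitudinal and transverse quadrupolar relaxation rates for I = 1."""
    j0 = lorentzian_j(0.0, tau_c)
    j1 = lorentzian_j(omega_0, tau_c)
    j2 = lorentzian_j(2.0 * omega_0, tau_c)
    r1 = omega_q ** 2 * (0.2 * j1 + 0.8 * j2)
    r2 = omega_q ** 2 * (0.3 * j0 + 0.5 * j1 + 0.2 * j2)
    return r1, r2


# Illustrative numbers only: a water-like deuteron coupling and two
# correlation times, one in extreme narrowing and one comparable to
# slow protein tumbling.
omega_q = quadrupole_frequency_spin1(qcc_hz=213e3, eta=0.11)
omega_0 = 2.0 * math.pi * 15.35e6   # deuteron Larmor frequency near 2.35 T
for tau_c in (2e-12, 1e-8):
    r1, r2 = spin1_rates(omega_q, omega_0, tau_c)
    print(f"tau_c = {tau_c:.0e} s: R1 = {r1:.3g} s^-1, R2 = {r2:.3g} s^-1")
```

In the extreme-narrowing limit the two rates coincide and equal the squared quadrupole frequency times the correlation time. Once the product of Larmor frequency and correlation time approaches unity, the longitudinal rate falls below the transverse rate, which is the dispersion that NMRD measures.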
3.1.2. Oxygen-17 Relaxation
The relaxation of a spin-5/2 nucleus like $^{17}$O is in general multiexponential (Abragam, 1961; Rubinstein et al., 1971; Bull et al., 1979). In isotropic solution, where the quadrupole coupling is averaged to zero, relaxation of an initial nonequilibrium magnetization outside the extreme-narrowing limit induces a coupling with higher odd-rank components of the spin density matrix. For $^{17}$O, state multipoles of rank 1 (the usual magnetization vector), 3, and 5 will thus couple, yielding triexponential relaxation. In the extreme-narrowing limit, where the relaxation matrix reduces to diagonal form, i.e., all state multipoles evolve independently, and the first-rank multipoles relax exponentially with
where, for is given by Eq. (4) with
3.1.2a. Effectively Exponential Relaxation. Even outside the extreme-narrowing limit, the deviation from exponential relaxation is usually insignificant. In water-rich systems like protein solutions, the effective spectral density function contains a large frequency-independent contribution from bulk water and mobile surface water (see Sect. 4). This contribution cancels out in the off-diagonal relaxation matrix elements but adds to the diagonal elements. Under such conditions, the observed relaxation rate is accurately approximated by the rank-1 diagonal element of the relaxation matrices (in the spherical multipole basis) for the polarizations and single-quantum coherences (Halle and Wennerström, 1981a). It can be shown with the aid of the Wigner–Eckart theorem (in Liouville space) that these elements are of the same form for any spin quantum number I, differing merely by a numerical prefactor. The rigorous results in Eqs. (3) thus hold also for $^{17}$O in the typical situation of effectively exponential relaxation. The accuracy of this single-exponential approximation for $^{17}$O is illustrated in Fig. 5. Even in the rare cases (concentrated solutions of large proteins and/or intermediate exchange conditions) where the single-exponential approximation may be inaccurate, Eqs. (3) remain valid for the average relaxation rate measured at short evolution times (McLachlan, 1964; Halle and Wennerström, 1981a).
3.1.2b. Multiple-Quantum Coherences. The dynamic coupling to high-rank density matrix components (of odd parity) induced by quadrupolar relaxation of nuclei outside the extreme-narrowing limit makes it possible to generate multiple-quantum coherences (MQCs) even in isotropic solution (Jaccard et al., 1986; Pekar and Leigh, 1986; Chung and Wimperis, 1992). At least two rf pulses are required to excite MQCs, the evolution of which can be indirectly detected after conversion to observable single-quantum coherence by a third pulse. In principle, the use of MQC filters in water relaxation studies of protein solutions offers certain advantages. First, since this is a null method, any deviation from single-exponential relaxation is more readily detected than with conventional one- or two-pulse methods. Second, since relaxation matrix elements of different coherence order usually involve different linear combinations of the three spectral densities $J(0)$, $J(\omega_0)$, and $J(2\omega_0)$, the MQC relaxation rates can provide independent information about the spectral density function. In practice, however, these advantages are offset by poor sensitivity. The proposed use of MQC filters as a diagnostic for non-extreme-narrowing conditions (Flesche et al., 1995) does not seem to offer any advantages over single-field and measurements or multiple-field measurements using conventional methods. It has also been claimed that the MQC filter technique allows observation of bound water molecules without interference from bulk water (Flesche et al., 1995; Baguet et al., 1996). This is incorrect; under the usual fast-exchange conditions any relaxation experiment probes the same spatially averaged spectral density function (see Sect. 4). Furthermore, if the deviation from single-exponential relaxation is small, as is typically the case in protein solutions, the MQC conversion yield is correspondingly low, so that extensive signal averaging is needed to obtain reasonable accuracy. A recent MQC study of protein hydration (Baguet et al., 1996) produced results in qualitative disagreement with more reliable NMRD results obtained under very similar conditions (Halle et al., 1981; Denisov and Halle, 1996). The discrepancy seems to be due partly to the inherent insensitivity of the MQC method (even at high protein concentration and high magnetic field) and partly to the difficulty of determining the correlation time from measurements at a single high field (9.4 T). Field-variable relaxation measurements (NMRD) generally provide a far more complete and accurate characterization of the spectral density function than is possible with MQC methods. MQ filters might prove useful, though, for separating contributions to transverse relaxation from quadrupolar and scalar couplings.
3.1.2c. Dynamic Shifts. In isotropic solutions, the quadrupole coupling is manifested to second order not only in relaxation but also in dynamic frequency shifts of the various coherences (Werbelow and Pouzard, 1981; Westlund and Wennerström, 1982). For the single-quantum coherences in the nearly-exponential-relaxation regime, the dynamic frequency shift of a spin-I nucleus is
where is the imaginary part of the complex spectral density function, i.e.,
The accurate determination of dynamic frequency shifts is technically demanding, and they have rarely been used constructively in relaxation studies of isotropic solutions (Tromp et al., 1990; Eliav et al., 1991; Huang Kenéz et al., 1992). Moreover, since the time correlation function can in principle be obtained from by an inverse Fourier cosine transform, the function provides no new information. Dynamic shifts could possibly be useful when the functional form of is complicated and data are available only in a restricted frequency range.
3.2. Dipolar Relaxation
Water $^1$H longitudinal relaxation in diamagnetic protein solutions is due to intra- and intermolecular dipole–dipole couplings between water protons and to intermolecular couplings between water and protein protons. Since the dipole coupling has intramolecular as well as intermolecular components, it can be modulated by rotational as well as (relative) translational motion.
3.2.1. Isolated Spin Pair
The evolution of the longitudinal magnetizations of an isolated pair of dipole-coupled protons, i and j, is governed by the Solomon equations (Solomon, 1955):
where denotes the deviation from the equilibrium magnetization. The auto- and cross-relaxation rates are given by
The rigid-lattice dipole frequency D for an isolated spin pair is defined as
With the internuclear separation r in Ångström units and D in rad The reduced spectral density function is the Fourier cosine transform of the reduced time correlation function of the fluctuating dipole coupling, as in Eq. (5), and is normalized so that In Eqs. (10) and (11), we have assumed that the Larmor frequencies of spins i and j are sufficiently close that which is generally true for the homonuclear case.
The Solomon equations (9) yield for the evolution of the i spin magnetization
It follows from Eq. (13) and the analogous result for that the sum of the i and j spin magnetizations, which is the only observable quantity for a pair of magnetically equivalent protons, decays exponentially with the longitudinal relaxation rate:
Like the populations, the coherences and of a dipole-coupled proton pair decay biexponentially in general. For two magnetically equivalent protons,
however, the sum
relaxes exponentially with the transverse relaxation
rate (Abragam, 1961):
While Eqs. (14) are of the same form as the quadrupolar relaxation rates in Eqs. (3), the spectral density functions generally reflect different aspects of the molecular motions and are identical only in the special case of an isotropically tumbling rigid molecule. In the unlike-spin limit, differential precession quenches the coupling between and leading again to exponential relaxation but with a relaxation rate (Abragam, 1961):
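For the like-spin pair discussed in this subsection, the statement that each magnetization decays biexponentially while their sum decays single-exponentially can be checked with the closed-form solution of the two coupled Solomon equations. The sketch below assumes the common sign convention d(dI_i)/dt = -rho dI_i - sigma dI_j with equal autorelaxation rates for the two spins. The function name and the numerical rates are hypothetical.

```python
import math


def solomon_pair(delta_i0, delta_j0, rho, sigma, t):
    """Deviations from equilibrium of spins i and j at time t.

    Solves d(dI_i)/dt = -rho*dI_i - sigma*dI_j and the same equation with
    i and j interchanged (equal rho for both spins is assumed).
    """
    total = (delta_i0 + delta_j0) * math.exp(-(rho + sigma) * t)
    difference = (delta_i0 - delta_j0) * math.exp(-(rho - sigma) * t)
    return 0.5 * (total + difference), 0.5 * (total - difference)


# Selective inversion of spin i (illustrative rates in s^-1).
rho, sigma = 1.0, 0.4
for t in (0.0, 0.5, 1.0, 2.0):
    d_i, d_j = solomon_pair(-2.0, 0.0, rho, sigma, t)
    print(f"t = {t:.1f} s: dI_i = {d_i:+.4f}, dI_j = {d_j:+.4f}, "
          f"sum = {d_i + d_j:+.4f}")
```

The sum decays with the single rate rho + sigma regardless of how the initial deviation is distributed over the two spins. For a nonselective perturbation the difference term is zero from the start, so only that rate is observed.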
3.2.2. Water–Protein Magnetization Transfer
Magnetization transfer from water protons to oligopeptide protons (Pitner et al., 1974) and protein protons (Stoesz et al., 1978; Akasaka, 1979) in isotropic solution was first demonstrated in the 1970s and, more recently, has provided the basis for studies of protein hydration by multidimensional nuclear Overhauser effect (NOE) spectroscopy (Otting and Wüthrich, 1989; Otting, 1997). The mechanism of magnetization transfer can be chemical exchange of labile protons, dipolar cross relaxation, or a combination of the two (van de Ven et al., 1988). Direct measurements of cross-relaxation effects on the longitudinal relaxation of the water resonance in a semisolid protein system were first carried out by Edzes and Samulski (1977, 1978). These authors also introduced a phenomenological description of magnetization transfer (cross-relaxation and/or chemical exchange), where the water and protein protons were modeled as two thermodynamic subsystems, each with a uniform spin temperature. Magnetization transfer between the two subsystems was described by extended Bloch equations of the same form as for two-site chemical exchange. Essentially the same model was used to analyze water NMRD data from protein solutions (Koenig et al., 1978), although these data did not provide direct evidence for cross-relaxation effects. Nevertheless, it was argued, mainly on the basis of NMRD data from mixtures, that to account for the NMRD profile from protein solutions it is necessary to consider explicitly the coupled evolution of the water and protein magnetizations, the latter acting as a relaxation sink for the former (Koenig et al., 1978; Koenig and Brown, 1991).
3.2.3. Dipolar Relaxation in a Multispin System with Chemical Exchange
A more rigorous analysis of cross-relaxation effects on water relaxation in protein solutions was recently undertaken (Venu et al., 1997). Consider a water or labile protein proton exchanging between bulk water and a site in a protein, where it is dipole-coupled to N nonexchanging protein protons, all of which are mutually dipole coupled. The coupled evolution of the nonequilibrium longitudinal magnetizations associated with the different protons is governed by a set of linear relaxation-exchange equations (Noggle and Schirmer, 1971):
where X(t) is the column vector of nonequilibrium magnetizations the superscripts labeling the bulk state (B), the exchanging macromolecular site (M), and the nonexchanging macromolecular sites (1, 2, ..., N). The rate matrix R takes the form
Here, f is the ratio of the equilibrium populations in the M and B states and k is the M → B exchange rate, i.e., 1/k is the mean residence time of a proton in site M. The autorelaxation rates and cross-relaxation rates of the explicitly dipole-coupled nuclei are
Although Eq. (16) predicts that the observed bulk water magnetization should decay as a sum of exponentials, in protein solutions it is usually indistinguishable from a single-exponential decay. The effective longitudinal relaxation rate measured under such conditions can be calculated as
Provided that and as is typically the case in protein solutions, one thus finds for nonselective excitation (Venu et al., 1997)
with the intrinsic relaxation rate of the exchanging macromolecular site given by
where P is the relaxation matrix for the nonexchanging macromolecular spins, comprising the lower right submatrix in Eq. (17), and the matrix is obtained from P by replacing the ith row of P by If all cross-relaxation rates vanish, then and Eq. (22) reduces to the well-known result for two-site exchange under the conditions 1 and (Luz and Meiboom, 1964). Using Eqs. (18)–(20), it can be shown that det(P) and det are nonnegative and, hence, that the effect of the cross-relaxation rates is always to reduce the intrinsic relaxation rate For Eq. (23) reduces to
and for
If the cross-relaxation rates, between the macromolecular spins are neglected, the general result, Eq. (23), reduces to
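The two-site limit mentioned above (all cross-relaxation rates set to zero) can be checked numerically. The sketch below assumes a 2 x 2 relaxation-exchange matrix for the bulk (B) and site (M) magnetizations with B to M rate f k and M to B rate k, and compares its slow eigenvalue with the familiar two-site expression R_B + f / (1/R_M + 1/k), valid when f is much smaller than 1 and the bulk rate is much smaller than the site rate. The matrix layout, the function names, and the numerical values are assumptions for illustration and are not taken from the chapter's Eq. (17).

```python
import math


def slow_eigenrate(r_bulk, r_site, f, k):
    """Smaller eigenvalue of [[r_bulk + f*k, -k], [-f*k, r_site + k]].

    Computed as determinant / larger eigenvalue to avoid cancellation.
    """
    a = r_bulk + f * k
    d = r_site + k
    off_diagonal_product = f * k * k
    fast = 0.5 * ((a + d) + math.sqrt((a - d) ** 2 + 4.0 * off_diagonal_product))
    determinant = r_bulk * r_site + r_bulk * k + f * k * r_site
    return determinant / fast


def two_site_approximation(r_bulk, r_site, f, k):
    """Dilute-site limit: bulk rate plus f / (1/r_site + 1/k)."""
    return r_bulk + f / (1.0 / r_site + 1.0 / k)


# Illustrative values: rates in s^-1, f dimensionless.
r_bulk, r_site, f = 0.3, 50.0, 1e-3
for k in (1e1, 1e2, 1e4, 1e6):
    exact = slow_eigenrate(r_bulk, r_site, f, k)
    approx = two_site_approximation(r_bulk, r_site, f, k)
    print(f"k = {k:.0e} s^-1: exact = {exact:.5f}, two-site = {approx:.5f}")
```

For fast exchange the excess rate approaches f times the site rate, while for slow exchange it is limited by f times k, so the residence time rather than the intrinsic site relaxation controls the observed enhancement.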
These results are easily generalized to the case where the exchanging entity is a two-spin system, such as the protons in a water molecule (Venu et al., 1997).
The foregoing analysis (Venu et al., 1997) shows that the relative importance of cross relaxation is essentially determined by quantities of the form Since the cross-relaxation rate is due solely to the dipole coupling between protons i and j, while the autorelaxation rates and involve all dipole couplings to these protons, one expects in general that This is particularly
obvious when i or j is a water proton, strongly coupled to its intramolecular partner. With the aid of a high-resolution crystal structure of BPTI (with the protons identified by neutron diffraction), the cross-relaxation contribution to the amplitude of the buried water dispersion was calculated explicitly and was found to be merely ca. 1% (Venu et al., 1997). The contribution from intermolecular dipole couplings (with protein protons) to the autorelaxation rate of buried water molecules is ca. 30% (Venu et al., 1997), as compared to the ca. 60% intermolecular (with other water protons) contribution in bulk water (Lankhorst et al., 1982). Similarly, the calculated cross-relaxation effect on the labile proton contribution to the dispersion was found to be only a few percent (Venu et al., 1997). Also the chemically relayed cross-relaxation mechanism (Koenig et al., 1978; van de Ven et al., 1988; Hills, 1992) is therefore unimportant for water relaxation in protein solutions. Since the conclusions do not depend on the value of the correlation time cross relaxation should be unimportant also for proteins that are much larger than BPTI, as long as BWR theory remains valid. These theoretical predictions are also consistent with recent NMRD data from BPTI solutions, which can be quantitatively accounted for in terms of buried water molecules (with intra- and intermolecular dipole couplings) and labile protons, without invoking cross relaxation (Venu et al., 1997). Water–protein magnetization transfer via cross relaxation can be monitored directly by selective saturation transfer or multidimensional NOE spectroscopy. For the transient NOE, cross relaxation is a “first-order” effect while autorelaxation is a “second-order” effect which is negligible for short mixing times. 
In contrast, for the water relaxation rate measured in a NMRD experiment, autorelaxation is a “first-order” effect while cross relaxation is a “second-order” effect which turns out to be negligible (Venu et al., 1997). In semisolid protein samples, however, the cross-relaxation rate is governed by water exchange rather than by macromolecular tumbling, whereas the autorelaxation rates of protein protons are governed by much faster internal motions. The cross-relaxation rate can therefore be much larger than the autorelaxation rates. Furthermore, if the water residence time is in the microsecond range, the conventional BWR theory of spin relaxation breaks down (see Sect. 4.3).

3.3. Relaxation due to Isotropic Couplings
For the water ¹H, ²H, and ¹⁷O nuclei, the instantaneous resonance frequency is determined by the chemical shielding experienced in the current physical environment and by the current spin state of the directly bonded water nucleus. The lifetime of the covalent O–H bond in bulk water at neutral pH and room temperature is in the millisecond range (Meiboom, 1961; Halle and Karlström, 1983). Exchange of hydrogen nuclei between different water molecules modulates the indirect spin–spin (J) coupling with the ¹⁷O nucleus, affecting the line shapes of the two J-coupled nuclei. Likewise, exchange of hydrogen nuclei between water molecules and labile groups on a protein modulates the chemical shift. Chemical-shift modulation can also take place via diffusive exchange of entire water molecules between different physical environments, as when a water molecule buried in a protein cavity exchanges with bulk water.

Since the Zeeman and (first-order) J couplings commute with $I_z$, their modulation cannot contribute to longitudinal relaxation. Line-shape effects can be treated in terms of the exchange-modified Bloch equations. Under most conditions, the line shape remains Lorentzian and can therefore be characterized by an effective transverse relaxation rate. In magnetic fields so weak that the condition for a first-order (secular) treatment of the J coupling is violated, the nonsecular part of the scalar coupling must be retained. In bulk water at neutral pH, this gives rise to a longitudinal relaxation dispersion in the kHz range (Noack, 1986).

3.3.1. Chemical-Shift Modulation
In the case of chemical-shift modulation (CSM), the relevant situation is one where water nuclei exchange between a bulk environment and an arbitrary number of protein-associated environments, each characterized by a fractional equilibrium population, a mean residence time, an intrinsic transverse relaxation rate, and an intrinsic chemical shift. For protein solutions, it is usually justified to assume that all exchange processes occur via the bulk, along with two further simplifying conditions. Under these conditions (Swift and Connick, 1962)
where
Several limiting cases of Eq. (27) are of interest. If the chemical-shift differences are sufficiently small that then
An analogous relation holds for the longitudinal relaxation rate under the conditions discussed in Sect. 4.1. If exchange is fast compared to relaxation, then
If exchange is also fast compared to the shift difference, then
In this limit, the CSM contribution to the transverse relaxation rate can also be obtained from BWR theory as the adiabatic (zero-frequency) relaxation rate induced by an exchange-modulated Zeeman coupling.

Equations (27)–(30) pertain to the transverse relaxation rate measured in a two-pulse spin-echo experiment. In a CPMG pulse-train experiment, the transverse relaxation rate characterizing the decay of the echo envelope depends on the pulse spacing. Starting from the extended Bloch equations for an arbitrary number of sites, the evolution of the echo amplitude can be calculated numerically by standard matrix techniques (Allerhand and Thiele, 1966). The preceding results for the spin-echo rate are valid also for the CPMG rate if all site correlations are lost in one pulse interval. In the opposite limit, where practically no spins exchange in one pulse interval, the shifts are refocused and the CSM contribution vanishes. In principle, the proton exchange times can therefore be determined from a CPMG dispersion experiment where the transverse relaxation rate is measured as a function of the pulse repetition frequency (Allerhand and Thiele, 1966). In practice, CPMG dispersions from protein solutions are rather well described by a two-site model (Hills et al., 1989), so it is difficult to separate individual CSM contributions. At least a partial separation of CSM contributions from different types of labile protons is possible, however, if CPMG dispersions are performed over a wide pH range (see Sect. 5.7).
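The limiting cases of chemical-shift modulation discussed in this subsection can be checked numerically. The following is a minimal sketch, assuming the standard dilute-limit form of the Swift–Connick expression as commonly quoted in the NMR literature; the function names and the symbols `f`, `tau`, `r2`, and `dw` are illustrative assumptions and are not taken from Eqs. (27)–(30).

```python
def swift_connick_excess_r2(sites):
    """Excess transverse relaxation rate (s^-1) due to exchange with dilute sites.

    sites: iterable of (f, tau, r2, dw) tuples, one per protein-associated
    environment (all names are assumptions made for this sketch):
      f   - fractional equilibrium population (assumed << 1)
      tau - mean residence time in the site (s)
      r2  - intrinsic transverse relaxation rate in the site (s^-1)
      dw  - chemical-shift difference relative to bulk (rad s^-1)
    """
    total = 0.0
    for f, tau, r2, dw in sites:
        k = 1.0 / tau  # exchange rate out of the site
        total += f * k * (r2 * (r2 + k) + dw ** 2) / ((r2 + k) ** 2 + dw ** 2)
    return total


def small_shift_limit(f, tau, r2):
    """Limit of negligible shift difference: f / (tau + 1/r2)."""
    return f / (tau + 1.0 / r2)


def fast_exchange_limit(f, tau, r2, dw):
    """Exchange fast compared to both relaxation and shift difference."""
    return f * (r2 + dw ** 2 * tau)


if __name__ == "__main__":
    site = (1e-3, 1e-6, 100.0, 1e3)  # hypothetical parameter values
    print(swift_connick_excess_r2([site]))
    print(fast_exchange_limit(*site))
```

With a vanishing shift difference the general expression reduces algebraically to `small_shift_limit`, which is why a relation of the same form can be used for longitudinal relaxation, where no shift term appears.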
3.3.2. Scalar Relaxation

As demonstrated in Meiboom’s classic study (Meiboom, 1961), modulation of the ¹H–¹⁷O scalar coupling in H₂¹⁷O by proton exchange and/or ¹⁷O spin relaxation leads to a substantial shortening of the proton $T_2$. An analogous effect is seen in the ¹⁷O relaxation (Meiboom, 1961; Halle and Karlström, 1983). Since proton exchange in water is both acid and base catalyzed, the scalar relaxation rate is maximal in the neutral pH range and can be quenched by buffers (Luz and Meiboom, 1963a; Luz and Meiboom, 1963b; Lankhorst et al., 1983) or macromolecules containing proton-exchange catalyzing groups (Schriever and Leyte, 1977; Rose and Bryant, 1980; Halle and Piculell, 1982; Lankhorst and Leyte, 1984; Conti, 1986). In aqueous protein solutions, the scalar relaxation contribution is therefore usually much smaller than in bulk water at the same pH (Halle et al., 1981; Denisov and Halle, 1995a, 1995b). In recent years, the proton $T_2$ reduction induced by the scalar coupling in H₂¹⁷O has been used to generate contrast in magnetic resonance imaging (Kwong et al., 1991; Ronen and Navon, 1994; Stolpen et al., 1997).

Analytical expressions for the line shape in the presence of a (secular) scalar coupling to two equivalent nuclei have been derived (Halle and Karlström, 1983). For hydrogen exchange times less than about 1 ms, the line shape is essentially Lorentzian with a half-width given to a good approximation by the motional-narrowing result for scalar relaxation of the first kind (Abragam, 1961):
where S is the spin of the hydrogen nuclei scalar-coupled to The scalar coupling constant has been measured for water molecularly dispersed in a variety of organic solvents, whereby proton exchange is slowed down sufficiently to reveal the fine structure. The best estimate thus obtained is 1 Hz for (Halle and Karlström, 1983; Mateescu et al., 1988). For therefore, . It is conceivable that the actual value of in bulk water differs slightly from that in organic solvents. The analogous motional-narrowing result for the scalar relaxation rate of is (Meiboom, 1961)
where x is the atom fraction of ¹⁷O in the water and the effect of ¹⁷O longitudinal relaxation (scalar relaxation of the second kind) has been neglected. As for CSM, exchange modulation of scalar couplings may be investigated by the CPMG pulse-train method (Luz and Meiboom, 1963b; Schriever and Leyte, 1977). The theory is analogous to the CSM case and the scalar relaxation contribution vanishes for short pulse spacings.
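The motional-narrowing regime of scalar relaxation of the first kind can be illustrated with a short calculation. This is a minimal sketch, assuming the textbook expressions for a single scalar coupling modulated by exchange with correlation time `tau_e`; it is not a transcription of Eqs. (31) and (32). The function name, the argument names, and the treatment of a single coupled spin are assumptions made for illustration.

```python
import math


def scalar_relaxation_first_kind(j_hz, spin_s, tau_e, delta_omega):
    """Return (R1, R2) in s^-1 for scalar relaxation of the first kind.

    j_hz        - scalar coupling constant J (Hz)
    spin_s      - spin quantum number S of the coupled nucleus
    tau_e       - exchange (modulation) time of the coupling (s)
    delta_omega - difference of the two Larmor frequencies (rad s^-1)
    """
    a = 2.0 * math.pi * j_hz                      # coupling in rad s^-1
    prefactor = a * a * spin_s * (spin_s + 1.0) / 3.0
    nonsecular = tau_e / (1.0 + (delta_omega * tau_e) ** 2)
    r1 = 2.0 * prefactor * nonsecular             # vanishes at high field
    r2 = prefactor * (tau_e + nonsecular)         # secular part survives
    return r1, r2


if __name__ == "__main__":
    # Hypothetical numbers: 100 Hz coupling to a spin-1/2, 0.1 ms exchange time,
    # Larmor frequency difference typical of a high-field experiment.
    print(scalar_relaxation_first_kind(100.0, 0.5, 1e-4, 2.0 * math.pi * 4e8))
```

At high field the nonsecular term is negligible, so only the transverse rate is affected, consistent with the statement that a secular coupling cannot contribute to longitudinal relaxation.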
4. MOLECULAR MOTIONS

The quadrupolar and dipolar relaxation rates of the water nuclei are primarily probes of water molecule rotation. On a molecular scale, a protein
solution is a heterogeneous system where water molecules sample many different local environments on the time scale of spin relaxation. Moreover, in a given microenvironment, the rotational motion of a water molecule may take place on several time scales, ranging from subpicosecond torsional vibrations to rotational diffusion of a long-lived water–protein complex on a time scale of nanoseconds or longer. A detailed characterization of this structural and dynamic complexity would require a method with spatial as well as temporal resolution. Since a relaxation dispersion profile essentially is a map of the spectral density function, the NMRD method provides the ultimate temporal resolution, but it lacks intrinsic spatial resolution. In other words, NMRD data alone can provide the time scales of water motion, but not the location of the water molecules. To identify the water molecules associated with the dynamic features reflected in the NMRD data, one must rely on extrinsic information. The principal source of such information is high-resolution protein crystal structures, which can provide the spatial coordinates of the water molecules responsible for the relaxation dispersion.

The symbiosis between NMRD and crystallography has proven extremely fruitful. The interpretational ambiguity that had plagued the NMRD field ever since the first NMRD studies of protein solutions brought forth a variety of more or less imaginative and often mutually inconsistent dynamic models. The 25-year search for the molecular mechanism behind the water relaxation dispersion came to an end only recently by using ¹⁷O NMRD to avoid labile hydrogen contributions and by correlating the dynamic NMRD information with structural information from crystallographic data on native and genetically engineered proteins (Denisov and Halle, 1994, 1995a, 1996; Denisov et al., 1996). It is now firmly established that the water relaxation dispersion, observed in the 1–100 MHz range for small and medium-sized proteins, is due to a small number of long-lived water molecules (residence times >1 ns) buried in internal cavities or in deep and narrow surface pockets. At high frequencies, above the dispersion, the excess relaxation rate (above the bulk water rate) is due to the conventional hydration layer, essentially the water molecules in contact with the protein surface. These water molecules generally have rotational correlation times as well as residence times in the subnanosecond range and therefore do not contribute to the MHz dispersion. In addition, rapidly exchanging labile protein hydrogens can make substantial contributions to the ¹H and ²H relaxation rates over the entire frequency range (Denisov and Halle, 1995b; Venu et al., 1997).

The theoretical framework needed to analyze NMRD data from protein solutions is based on the “standard model” of water relaxation, first proposed by Lars Onsager (Wang, 1955) and applied in the first detailed water relaxation study of protein solutions by the Krakow group in 1963 (Daszkiewicz et al., 1963). The standard model required two extensions (Halle and Wennerström, 1981b). First, a frequency-independent term must be added to account for the contribution from mobile surface waters and fast local motions of long-lived waters and labile hydrogens. Second, the effect of fast local motions of long-lived water molecules on the dispersion amplitude is accounted for by an order parameter formalism that does not rely on specific assumptions about the nature of this motion.
4.1. Spatial Resolution

The relaxation of the water magnetization in a protein solution is governed by molecular motions at two distinct levels. At the level of spin dynamics, translational motion of water molecules transfers magnetization between microenvironments with different intrinsic relaxation rates. If sufficiently fast, such material exchange leads to spatial averaging of the local relaxation rates. At the level of orientational time correlation functions, water rotation averages out the anisotropic spin-lattice coupling and thus determines the intrinsic spin relaxation rate (see Sect. 4.2).
4.1.1. Exchange Averaging

The theoretical framework for analyzing relaxation data from nuclei exchanging between discrete states is well established and was, in fact, first developed in connection with studies of water in microheterogeneous systems (Zimmerman and Brittin, 1957). It is not obvious, however, that a discrete-state exchange model provides a valid description of continuous water diffusion in a spatially heterogeneous system such as a protein solution (Halle and Westlund, 1988). There are two aspects to this issue. First, in the fast-exchange regime the actual exchange mechanism is irrelevant, and it is only necessary that the perturbation of water rotation induced by the protein be relatively short-ranged. This is known to be the case: relaxation studies on a variety of microheterogeneous aqueous systems show that only water molecules in direct contact with an interface are significantly perturbed (Woessner, 1980; Carlström and Halle, 1988; Volke et al., 1994). Second, the observed spin relaxation rate depends on the exchange mechanism only in the intermediate exchange regime where the residence times are in the μs–ms range. Such long residence times are only relevant for buried water molecules and labile protein protons, for which a discrete-state exchange (or jump) model is indeed appropriate. This would not necessarily be the case for water molecules at the protein surface, but they are invariably in the fast-exchange regime.

The simplest description of the effect of exchange averaging on the water longitudinal relaxation rate in a protein solution is of the form
Provided that chemical-shift differences can be ignored, an analogous result holds for the transverse relaxation rate (see Sect. 3.3.1). Since the bulk and surface contributions are generally in the extreme-narrowing limit, the subscript 1 is suppressed. The first term in Eq. (32) refers to the fraction of water molecules that are unperturbed by the protein and thus have the same relaxation rate as bulk water. The second term refers to the fraction of water molecules that are dynamically perturbed by the protein, but remain sufficiently mobile that their effective correlation times are much shorter than the tumbling time of the protein. These water molecules are responsible for (most of) the excess relaxation at frequencies above the dispersion.
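The three-term structure of the exchange-averaging relation described in this subsection can be written out as a small function. This is a minimal sketch, assuming the commonly used form in which each long-lived site contributes its fraction divided by the sum of its residence time and intrinsic relaxation time; the function and argument names are illustrative assumptions, and the exact notation of Eq. (32) may differ.

```python
def exchange_averaged_r1(r_bulk, f_s, r_s, internal_sites):
    """Observed longitudinal relaxation rate (s^-1) under exchange averaging.

    r_bulk         - relaxation rate of bulk water (s^-1)
    f_s, r_s       - fraction and mean intrinsic rate of short-lived surface water
    internal_sites - list of (f_k, tau_k, r1_k) for long-lived sites:
                     fraction, residence time (s), intrinsic rate (s^-1)
    """
    f_i = sum(f_k for f_k, _, _ in internal_sites)
    rate = (1.0 - f_s - f_i) * r_bulk      # unperturbed water
    rate += f_s * r_s                      # fast-exchanging surface water
    for f_k, tau_k, r1_k in internal_sites:
        # A site contributes fully only if exchange is fast compared to its
        # intrinsic relaxation; otherwise the residence time limits the term.
        rate += f_k / (tau_k + 1.0 / r1_k)
    return rate


if __name__ == "__main__":
    # Hypothetical values: one internal water with a 1 microsecond residence time.
    print(exchange_averaged_r1(0.3, 0.01, 1.0, [(1e-4, 1e-6, 1e3)]))
```

The denominator shows why very slowly exchanging sites drop out of the observed rate: once the residence time greatly exceeds the intrinsic relaxation time, the contribution is limited by the exchange rate rather than by the intrinsic relaxation rate.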
The third term in Eq. (32) refers to the long-lived water molecules responsible for the relaxation dispersion. Each of these water molecules has a distinct residence time and intrinsic longitudinal relaxation rate and, taken together, they account for a fraction of all water molecules in the sample. The ¹H and ²H relaxation rates generally contain contributions from water hydrogens as well as labile protein hydrogens. Since labile hydrogens generally exchange slowly compared to the tumbling of the protein, they contribute only to the third term in Eq. (32). The simple form of this term is valid provided that certain quantities are small compared to 1 (Luz and Meiboom, 1964), a condition that is satisfied with a wide margin in typical protein solutions.

Equation (32) is an essentially phenomenological description of exchange averaging and, as such, is of considerable generality. Since the NMRD method lacks intrinsic spatial resolution, the microscopic significance of the individual terms in Eq. (32) can be deduced only with the aid of extrinsic structural data, such as high-resolution crystal structures. This has been done for a variety of proteins and the general picture is now clear (Denisov and Halle, 1995a, 1996). The short-lived water molecules responsible for the second term in Eq. (32) essentially comprise the traditional hydration layer, i.e., water molecules in contact with the protein surface (hence the subscript S). The corresponding rate is the intrinsic relaxation rate averaged over all surface sites occupied by short-lived water molecules. The long-lived water molecules responsible for the third term in Eq. (32) are usually buried in cavities inside the protein or trapped in deep surface pockets with low accessibility to external water. In the following, we refer to this class of crystallographically identifiable water molecules as internal water molecules (hence the subscript I). (Crystallographers often use this term in a slightly more restrictive sense, including only water molecules that are not within hydrogen-bonding distance of external water molecules.)

4.1.2. Difference NMRD

The identification of internal water molecules as the source of the relaxation dispersion has transformed the NMRD method into a quantitative tool for investigating specific water molecules of structural and functional significance and for exploiting internal water molecules as noninvasive probes of protein structure and dynamics. The most powerful way of conducting such studies is in the form of a difference NMRD experiment where the NMRD profiles from two structurally related proteins are compared. In fact, the first demonstration of the crucial role of buried water molecules was a difference NMRD experiment where the NMRD profiles of BPTI and ubiquitin were compared (Denisov and Halle, 1994, 1995a).
These proteins are of similar size and surface structure, but differ qualitatively in one respect: BPTI contains four buried water molecules, ubiquitin none. The result (Fig. 6) is clear-cut: the virtual absence of a relaxation dispersion for ubiquitin must be due to the absence of buried water molecules in this protein. (The tiny ubiquitin dispersion can be attributed to a single weakly ordered water molecule in a surface pocket.) Subsequent work (Denisov et al., 1995, 1996) revealed that the dispersion from BPTI is actually due to only three of the four buried water molecules, the fourth one (W122) exchanging too slowly to contribute significantly at 300 K (see Sect. 5.5).

The BPTI–ubiquitin difference experiment relied for its interpretation on high-resolution crystal structures of the two proteins. With the correlation between internal water molecules and NMRD firmly established (Denisov and Halle, 1996), useful information can be obtained from difference NMRD experiments even when the structure of one of the two proteins is unknown. Partially folded proteins are a case in point. Figure 7 shows the dispersions from a protein in the native state, in the partially folded A state (“molten globule”) at pH 2, and in the unfolded state (in the presence of 4 M GuHCl and with the four disulfide bonds reduced by dithiothreitol) (Denisov et al., 1999). This experiment provides three pieces of information. First, the dispersion from the A state (corresponding to at least three long-lived water molecules) implies the existence of persistent (>10 ns) structural elements that are not present in the unfolded form. Second, the dispersion frequency is a measure of the hydrodynamic volume of the protein (see Sect. 5.2). The frequency shift between the native and A forms suggests a 30% expansion of the latter. Third, the excess relaxation rate on the high-frequency plateau provides a global measure of solvent exposure. This is seen to differ little between the native and A forms, but, as expected, is substantially higher for the unfolded form.
These two examples can be thought of as global difference NMRD experiments. More detailed information can be obtained from a local difference NMRD experiment where the relaxation dispersion is recorded before and after a site-directed structural perturbation that eliminates one or more of the internal water molecules that contribute to the relaxation dispersion from the unperturbed protein. In a more subtle version of this experiment, the perturbation does not eliminate any internal water molecules but only affects their residence times (e.g., by altering the rate of large-scale conformational fluctuations). If it can be established that the perturbation is local, e.g., from crystal structures of both forms, then should be unaffected and should be the same for all internal water molecules present in both forms. Equation (32) then yields for the difference dispersion
where the sum includes only the displaced water molecules.

A local structural perturbation can be induced in several ways. Site-directed mutagenesis is the method of choice for replacing buried water molecules. For example, in the single-point BPTI mutant G36S, the buried water molecule W122 is replaced by the hydroxyl group in the side chain of serine-36. The wild-type–G36S difference dispersions shown in Fig. 8 are thus due to a single buried water molecule. Local covalent modifications can of course also be introduced by conventional chemical methods, e.g., selective reduction of disulfide bonds (this might be a residence time perturbation).
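The logic of a local difference NMRD experiment can be sketched by subtracting two exchange-averaged rates that share the same bulk and surface terms. This is a minimal illustration, assuming the same three-term exchange model as in Sect. 4.1.1 with each internal site contributing its fraction divided by the sum of residence time and intrinsic relaxation time; all function names and numerical values are hypothetical, and the exact form of Eq. (33) is not reproduced here.

```python
def internal_term(f_k, tau_k, r1_k):
    """Contribution of one long-lived site to the observed rate (s^-1)."""
    return f_k / (tau_k + 1.0 / r1_k)


def model_r1(r_bulk, f_s, r_s, sites):
    """Three-term exchange-averaged rate; sites = [(f_k, tau_k, r1_k), ...]."""
    f_i = sum(f_k for f_k, _, _ in sites)
    return ((1.0 - f_s - f_i) * r_bulk + f_s * r_s
            + sum(internal_term(*site) for site in sites))


def difference_r1(r_bulk, f_s, r_s, sites_unperturbed, sites_perturbed):
    """Difference between the rates before and after a local perturbation."""
    return (model_r1(r_bulk, f_s, r_s, sites_unperturbed)
            - model_r1(r_bulk, f_s, r_s, sites_perturbed))


if __name__ == "__main__":
    displaced = (1e-4, 1e-6, 1e3)        # water removed by the perturbation
    retained = (2e-4, 1e-8, 5e2)         # water present in both forms
    print(difference_r1(0.3, 0.01, 1.0, [displaced, retained], [retained]))
```

In this sketch the surface term and the retained internal sites cancel exactly, so the difference depends only on the displaced site (less the small bulk-rate term for the water that replaces it in the bulk fraction).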
More accessible (but long-lived) water molecules in the native structure can be eliminated (or replaced by short-lived ones) by removing an intrinsic metal ion or cofactor or by adding a high-affinity substrate or inhibitor. If complete removal of an intrinsic ligand cannot be achieved, NMRD profiles can be recorded at a series of ligand–protein ratios and the results extrapolated to zero ligand concentration. This approach can be used, for example, for intrinsic multivalent metal ions that coordinate long-lived water molecules, as illustrated in Fig. 9 for calbindin where each of the two Ca²⁺ ions coordinates one water molecule (Denisov and Halle, 1995c). The strategy of water elimination by ligand binding is illustrated in Fig. 10 for a B-DNA dodecamer where five water molecules in the minor groove are displaced by the polyaromatic drug netropsin (Denisov et al., 1997a). Due to the relatively short residence time (1 ns), only the low-frequency part of the dispersion could be accessed (see Sect. 5.5). More complete dispersions from DNA solutions were subsequently recorded at 253 K, using an emulsion technique to avoid freezing (Jóhannesson and Halle, 1998).

In general, hydrogen exchange is less of a problem in difference NMRD experiments since any labile hydrogen contribution tends to cancel out in the difference. This is an important advantage as NMRD data from all three water nuclei can then be quantitatively compared, providing detailed information about residence times (Denisov et al., 1996) and orientational disorder (Denisov et al., 1997b) of buried water molecules. Concern about hydrogen exchange is warranted even in difference NMRD experiments, however, because the structural perturbation might affect the pKa values or exchange rates of labile hydrogens (Denisov and Halle, 1995c). Ligands carrying rapidly exchanging hydrogens may, of course, also present problems (Denisov et al., 1997a).

4.2. Temporal Resolution
By definition, the water molecules responsible for the second term in Eq. (32) do not produce a dispersion in the experimentally accessible frequency range. Even for a residence time as long as 100 ps, the dispersion would be centered around 1 GHz, an order of magnitude above the highest achievable Larmor frequency. The observed relaxation dispersion is due to the frequency dependence of the intrinsic relaxation rates of internal water molecules (and labile hydrogens). Within the BWR regime, as defined in Eq. (2), these rates are related to the spectral density function as shown in Sect. 3. The discussion in Sect. 4.2.1 applies to quadrupolar relaxation as well as to intramolecular dipolar relaxation; hence we omit the Q/D subscript on the spectral density function. Intermolecular dipolar contributions are considered in Sect. 4.2.2.
4.2.1. Intramolecular Spectral Density Function
For an internal water molecule (or labile proton spin pair) tumbling rigidly together with a spherical protein, the spectral density function has the usual Lorentzian form
where the effective correlation time is determined by the residence time in site k and the rotational correlation time of the protein according to (Beckert and Pfeifer, 1965; Hertz, 1967; Brüssau and Sillescu, 1972)
This simple relationship results from two innocuous assumptions. First, water (or labile hydrogen) exchange and protein rotation are statistically independent processes. Second, each exchange event randomizes the orientation of the spin–lattice interaction tensor. In other words, once the internal water molecule or labile hydrogen has exchanged with bulk water, the probability of returning to the same site (in the same protein molecule) before the protein has randomized its orientation is negligible. Note that the residence time appears at both levels of motional averaging: Eq. (32) describes spatial averaging of the intrinsic relaxation rates, while Eq. (35) describes orientational averaging by exchange from a locally ordered site to an isotropic bulk phase. (Water and labile hydrogen residence times are denoted by a common symbol when the two species are treated on an equal footing, and by separate symbols when either species is referred to specifically.)

Equation (34) is readily generalized to nonspherical proteins with symmetric-top rather than spherical-top rotational diffusion. The spectral density function is then a sum of three Lorentzians, weighted according to the relative orientation of the spin–lattice interaction tensor and rotational diffusion tensor (Woessner, 1962). For most globular proteins, however, the effect of anisotropic rotational diffusion on the shape of the relaxation dispersion is insignificant.

Internal water molecules (and labile hydrogens) do not in general tumble rigidly with the protein, but undergo restricted rotational motions on time scales short compared to the rotational correlation time of the protein. If the local rotation is much faster than the global isotropic motion and remains in the extreme-narrowing regime at the highest relevant frequency, then the appropriate generalization of Eq. (34) takes the simple form (Halle and Wennerström, 1981b; Lipari and Szabo, 1982)
which involves an effective correlation time for the local restricted rotation and the generalized second-rank orientational order parameter for site k, the latter defined through
In our previous work, a generalized order parameter was used. The quantity is more convenient since it has a maximum value of unity for all three water nuclei in the limit of a rigidly attached water molecule. If k refers to an internal water site, is a molecular order parameter defined as
where specifies the orientation of the water-molecule-fixed frame M (Fig. 11) relative to an arbitrary protein-fixed frame P (assuming spherical-top rotation). To relate the generalized order parameters of all three water nuclei to the same set of molecular order parameters we have introduced in Eq. (37) a set of geometric coefficients defined as
Here the orientation of the principal frame F of the spin–lattice interaction tensor (Fig. 11) is specified relative to the M frame. The explicit forms of the geometric coefficients for the three water nuclei are collected in Table 2. The relaxation rates of these nuclei are not affected by a 180° flip of an internal water molecule around its symmetry axis (Denisov and Halle, 1995c; Venu et al., 1997). If k refers to a labile hydrogen site, it is more convenient to define the order parameters directly in terms of the orientation of the interaction tensor with respect to the protein. This corresponds to a special case of Eq. (37).
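The ingredients of the intramolecular spectral density discussed in this subsection can be combined in a few lines. This is a minimal sketch under stated assumptions: the exchange and rotation rates are taken to add (as follows from statistically independent, individually exponential processes), the spectral density is normalized so that its zero-frequency value equals the correlation time, and the fast local motion enters as a frequency-independent term weighted by one minus the squared order parameter. The names `tau_rot`, `tau_res`, `order_sq`, and `tau_fast` are illustrative and are not the chapter's notation.

```python
def effective_correlation_time(tau_rot, tau_res):
    """Correlation time when protein rotation and exchange act independently."""
    return 1.0 / (1.0 / tau_rot + 1.0 / tau_res)


def spectral_density(omega, tau_c, order_sq=1.0, tau_fast=0.0):
    """Lorentzian spectral density with an optional fast local-motion term.

    omega    - angular frequency (rad s^-1)
    tau_c    - effective correlation time of the slow motion (s)
    order_sq - squared generalized order parameter (1 for a rigid site)
    tau_fast - effective correlation time of the fast local motion (s),
               assumed to be in the extreme-narrowing regime
    """
    slow = tau_c / (1.0 + (omega * tau_c) ** 2)
    return (1.0 - order_sq) * tau_fast + order_sq * slow


if __name__ == "__main__":
    # Hypothetical case: 5 ns protein tumbling, 1 microsecond residence time.
    tau_c = effective_correlation_time(5e-9, 1e-6)
    for freq_mhz in (1.0, 10.0, 100.0):
        omega = 2.0 * 3.141592653589793 * freq_mhz * 1e6
        print(freq_mhz, spectral_density(omega, tau_c, order_sq=0.8, tau_fast=1e-11))
```

When the residence time is much longer than the rotational correlation time, the effective correlation time is essentially that of protein tumbling; in the opposite case it approaches the residence time, which is the situation relevant for semisolid samples.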
4.2.2. Intermolecular Spectral Density Function

When Eq. (32) is applied to ¹H NMRD data, the intrinsic relaxation rate of an internal water molecule contains a contribution, given by Eq. (14a), from the intramolecular dipole coupling between the two water protons as well as a contribution, given by Eqs. (18) and (19), from intermolecular dipole couplings between either water proton and all protein protons. The spectral density function for the intramolecular contribution, where only the orientation of the H–H vector is modulated, is of the same form as for quadrupolar relaxation, Eq. (36), and the generalized intramolecular order parameter is given by Eq. (37). For the intermolecular contribution, where local motions can modulate both the orientation and the length r of the H–H vector, the spectral density function takes the form
where the sum runs over all internuclear vectors connecting one of the protons of the internal water molecule k with a protein proton i. The dipole frequency is given by Eq. (12) and the effective dipole frequency, averaged by local motions, can be expressed in terms of the generalized intermolecular order parameter as
with
involving the solid spherical harmonics of rank Here is the H–H vector, of length and orientation and the are (unnormalized) spherical harmonics. For a rigid water–protein complex without internal motions on the time scale of protein tumbling or faster,
The generalized intermolecular order parameter is most conveniently evaluated in a coordinate system with its origin at the center of symmetry of the internal motion rather than at the proton (Otting et al., 1997). If only one of the two coupled protons undergoes internal motion, the solid harmonics can be transformed to the center of symmetry according to (Chiu, 1964)
where and are vectors from the center of symmetry to the mobile and fixed proton, respectively, and is assumed. Furthermore,
If the mobile-proton vector r1 is distributed with spherical symmetry, it follows from the orthogonality of the spherical harmonics that whereby Inserting this into Eq. (42) and using the closure relation for spherical harmonics, one obtains , i.e., the same result as if the spherically disordered proton were fixed at the center of symmetry. For internal
motion of lower symmetry, corrections to this result appear that are proportional to a power of
For cylindrical symmetry, for example, one finds
By employing a two-center expansion for solid harmonics (Chiu, 1964), internal motions of both protons can be handled in a similar way. When both protons are spherically disordered, the result is the same as if they were located at the centers of symmetry.
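The statement that a spherically disordered proton behaves as if it were fixed at its center of symmetry can be checked numerically for one component of the dipolar interaction. The sketch below is only an illustration, not the chapter's derivation: it averages the unnormalized rank-2, m = 0 solid harmonic, $(3\cos^2\theta - 1)/r^3$, over a spherical shell of mobile-proton positions and compares the result with the value at the shell center. The function names, the midpoint quadrature, and the numerical values are assumptions; the origin is placed at the fixed proton, which must lie outside the shell.

```python
import math


def dipolar_m0(x, y, z):
    """Unnormalized rank-2, m = 0 solid harmonic: (3 cos^2(theta) - 1) / r^3."""
    r_sq = x * x + y * y + z * z
    return (3.0 * z * z - r_sq) / r_sq ** 2.5


def shell_average(center, radius, n=200):
    """Average dipolar_m0 over a sphere of given radius around center.

    Uses a midpoint rule on an n x n grid in (cos(theta), phi), which samples
    the sphere with equal-area weights.
    """
    cx, cy, cz = center
    total = 0.0
    for i in range(n):
        u = -1.0 + (2.0 * i + 1.0) / n          # cos(theta) at cell midpoint
        s = math.sqrt(1.0 - u * u)              # sin(theta)
        for j in range(n):
            phi = 2.0 * math.pi * (j + 0.5) / n
            total += dipolar_m0(cx + radius * s * math.cos(phi),
                                cy + radius * s * math.sin(phi),
                                cz + radius * u)
    return total / (n * n)


if __name__ == "__main__":
    center = (1.0, 2.0, 4.0)                    # hypothetical geometry
    print(shell_average(center, 1.0), dipolar_m0(*center))
```

The agreement follows from the mean-value property of harmonic functions, and a general spherically symmetric distribution is a superposition of such shells. The same check does not apply to the squared interaction, so it illustrates the averaged coupling rather than the rigid-lattice one.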
4.3. Water Relaxation in Semisolid Proteins

4.3.1. General Features of Semisolid Systems

A substantial fraction of all published NMR studies of water in biological systems are concerned, not with isotropic protein solutions, but with semisolid materials of relatively low water content. In this category we find a diverse collection of materials, including protein fibers and powders, protein crystals, protein gels, biological tissues, and partially frozen protein solutions. Protein fibers and powders hydrated from the vapor phase to less than a monolayer of sorbed water may seem ideal for NMR studies of protein hydration since all water molecules interact strongly with the protein, whereas in protein solutions hydration effects are “diluted” by the dominant bulk water response. The structural, energetic, and dynamic properties of sorbed water, however, are qualitatively different from those of water at a protein surface in solution. Furthermore, dehydration may significantly perturb the native protein structure. While studies of sorbed water may therefore not be directly relevant to hydration in solution, they are nevertheless of importance for a variety of applications in food and materials technology. Protein crystals and gels typically have water contents of 40% or more and are therefore better models for hydration in protein solutions and biological tissues.
From the point of view of NMR relaxation, the motional-narrowing condition provides a natural demarcation line between semisolids and solutions. In most protein solutions, all orientation-dependent terms in the spin Hamiltonian are averaged to zero by protein tumbling at a rate exceeding the anisotropic coupling frequencies. Under these conditions, the conventional BWR theory of spin relaxation applies (Abragam, 1961). In the semisolid biological materials mentioned
above, the macromolecular component is stationary on this time scale. This has several important consequences. In macroscopically anisotropic systems, incompletely averaged anisotropic couplings may give rise to dipolar or quadrupolar line splittings, the temperature dependence of which can provide information about residence times in the range. Moreover, the relaxation behavior becomes more complex, and richer in information, than in solutions. Relaxation due to relatively fast anisotropic motions becomes orientation dependent and is no longer described by a single spectral density function (as in solutions). Further averaging by slower motions often dominates relaxation. Since the protein molecules are not free to tumble, the actual exchange of internal water molecules (and labile hydrogens) with bulk water can modulate the (residual) couplings, thereby providing direct access to residence times; cf. Eq. (35). If the exchange rates are comparable to the residual couplings, however, relaxation cannot be described by BWR theory. Solutions of large or highly concentrated globular proteins may exhibit such borderline behavior (neither solid nor solution), with overall tumbling rates as well as exchange rates of the same order of magnitude as the residual anisotropic couplings. More importantly, the water relaxation
dispersion in heterogeneous semisolids tends to be dominated by water molecules
with exchange rates comparable to the residual (dipolar or quadrupolar) couplings and then cannot be described by BWR theory. Since water–protein (but not
intraprotein) dipole couplings are modulated by water exchange, cross relaxation (with water as the relaxation sink) can assume much greater importance than in solutions (see Sect. 3.2.3). Since (residual) static dipolar couplings are present, even spin diffusion (in the original sense) can be important for relaxation.
4.3.2. Generalized Relaxation Theory
If protein rotation is sufficiently slow or even inhibited, the correlation time in the spectral density function in Eq. (36) no longer equals the rotational correlation time of the protein. If the fractions of water molecules in long-lived sites are small, the mean time during which a water molecule diffuses between two successive visits to a long-lived site is sufficiently long that, after leaving a given internal site, a water molecule can reach any other site (on the same protein molecule or on a different one) with essentially equal probability. If the semisolid protein sample is macroscopically isotropic or nearly so, as for chemically cross-linked or highly concentrated protein solutions, it then follows that each exchange event brings about complete orientational randomization of the anisotropic quadrupole coupling. Equation (35) is then valid and the residence time becomes the correlation time. Since the residence times of internal water molecules span a wide range, from nanoseconds
to milliseconds at least, the motional-narrowing condition, Eq. (2), can be violated. This happens when the residence time is of the order of the inverse rigid-lattice coupling frequency or longer, i.e., about 1 µs for ²H (see Table 1). Under such conditions, the second-order perturbation treatment inherent in the BWR theory must be replaced by a more general theory, such as the stochastic Liouville equation, where spin dynamics and molecular motion appear at the same level of description. A nonperturbative stochastic theory of spin relaxation by exchange among an isotropic distribution of locally anisotropic sites has recently been developed for quadrupolar nuclei (Halle, 1996) and is directly applicable to NMRD data from chemically cross-linked (Koenig and Brown, 1993; Koenig et al., 1993) or highly concentrated (Kimmich et al., 1990) protein solutions. Since the stochastic Liouville equation can be solved analytically for the isotropic exchange model (Halle, 1996), the entire spin dynamical behavior can be calculated within the low-dimensional spin space rather than in the computationally demanding infinite-dimensional direct-product space usually employed in stochastic Liouville calculations. For the experimentally relevant dilute regime, the stochastic theory predicts that the longitudinal relaxation is exponential (as observed) with the relaxation rate obtained from Eq. (3a), but with the spectral density function in Eq. (36) replaced by the generalized spectral density function (Halle, 1996):
where is given by Eq. (37) with (since the locally averaged quadrupole tensor is taken to be uniaxial), and where, for The direct contribution from local motions has been neglected here, but can be added a posteriori if necessary (Halle, 1996). A similar (but not identical) result can be obtained less rigorously with the aid of Eqs. (3) and (32). This is not unexpected, because when the motional-narrowing condition in Eq. (2) coincides with the condition for fast-exchange averaging of local relaxation rates and when (so that the effective quadrupole coupling is sparse) BWR theory is approximately valid even when Eq. (2) is violated. As expected, Eq. (45) reduces to (the first term of) Eq. (36) when the motional-narrowing condition, Eq. (2), is satisfied. It should be noted that Eq. (45) is not subject to any restrictions on the relative magnitudes of and It is instructive to cast Eq. (45) on the form of the motional-narrowing spectral density, Eq. (36), as
with the apparent fraction and the apparent residence time given by (Halle, 1996)
If Eq. (36) is used outside its domain of validity, the internal water fraction and residence time deduced from the dispersion profile are the apparent quantities in Eqs. (47). Equation (47b) shows that if the residence time is much longer than the inverse of the residual quadrupole frequency, then the apparent residence time deduced from the dispersion profile using motional-narrowing theory is nothing but the inverse of the residual quadrupole frequency. For deuterons in buried water molecules, the residual quadrupole frequency should be close to the rigid-lattice value (Table 1), while the residence times are expected to span a wide range. Figure 12 shows how deuterons with different residence times
contribute to the magnitude of the dispersion step. The maximum contribution comes from residence times equal to the inverse of the residual quadrupole frequency, and the relative contribution is reduced by a factor of 5 (or 50) when the residence time is shifted one (or two) decades away from this value. If there is a distribution of residence times, the relaxation dispersion will thus be dominated by deuterons with residence times near this value. The dispersion profile is therefore expected to show little temperature dependence. It has been demonstrated that these theoretical considerations can account for NMRD data from rotationally immobilized protein samples (Halle and Denisov, 1995). The previous interpretation of these data in terms of a universal residence time of 1 µs for protein-associated water molecules (Koenig and Brown, 1993; Koenig et al., 1993; Koenig, 1995) thus appears to be an artifact of using the conventional (fast-exchange) perturbation theory of spin relaxation. In contrast, the nonperturbative, stochastic theory identifies the apparent correlation time of about 1 µs with the inverse of the residual quadrupole frequency, thus explaining its universality (for different proteins) and virtual independence of temperature (Halle and Denisov, 1995). The observed dispersion profiles (Fig. 13) are consistent with a broad distribution of residence times,
spanning the range. These considerations are also relevant for dipolar relaxation in immobilized protein samples and for understanding the origin of
relaxation-based contrast in MRI images of soft tissues. The BWR theory can break down even for protein solutions if the protein tumbles sufficiently slowly. This should be the case for hemocyanin (9 MDa), with an apparent correlation time of 0.9 µs deduced from the dispersion (Koenig et al., 1975) and with virtually independent of temperature (Piculell and Halle, 1986). Both these observations can be rationalized by the generalized spectral
density function in Eq. (45). Originally, however, the inference that in hemocyanin solutions was taken as an indication that the standard two-state fast-exchange model is inapplicable (Koenig et al., 1975) when, in fact, it implies that the motional-narrowing condition in Eq. (2) is violated.
5. QUANTITATIVE ANALYSIS OF NMRD DATA
Throughout most of this section, we assume that the relaxation rate is due entirely to water nuclei, as is always the case for ¹⁷O. With obvious modifications, however, most of the discussion applies also to labile hydrogens. Some considerations specific to labile hydrogens are presented in Sect. 5.7.
5.1. Parametrization of the NMRD Profile
For the purpose of analyzing experimental NMRD data, it is convenient to express Eq. (32) in the form
Here, the normalized dispersion function decreases monotonically from 1 at zero frequency to 0 far above the dispersion frequency. Furthermore, $\alpha$ is the excess relaxation rate on the high-frequency plateau above the dispersion:
while $\beta$ measures the magnitude of the dispersion step:
As long as relaxation is exponential, all relaxation rates are linear combinations of spectral densities. The decomposition of the spectral density function in Eq. (36), due to motional time scale separation, then carries over to the intrinsic relaxation rates which may be expressed as
with
in the extreme narrowing regime at all accessible frequencies. We consider first the simplest case where all water nuclei contributing to the dispersion exchange with bulk water rapidly compared to the intrinsic spin relaxation but slowly compared to the (isotropic) protein rotational diffusion. In the quadrupolar case, the (single-exponential) longitudinal relaxation rate is then given by Eq. (48) with the following identifications:
in Eq. (51) is the average of over all sites contributing to the dispersion (in analogy to the definition of and in Eq. (52) is the average of over these sites. Under the stipulated conditions, the measured relaxation dispersion is fully
characterized by the three parameters $\alpha$, $\beta$, and the rotational correlation time $\tau_R$, as illustrated for a typical dispersion in Fig. 14. The dispersion function in Eq. (53) is commonly known as a Lorentzian dispersion, although it is, in fact, a sum of two Lorentzians. Nevertheless, it can be accurately approximated by the single-Lorentzian dispersion function (Hallenga and Koenig, 1976)
The difference between the normalized dispersion functions in Eqs. (53) and (55) varies between +0.013 and −0.016, with the zero crossing at $\omega_0\tau_R = 1/\sqrt{2}$. In the case of ¹H relaxation, Eq. (48) should be replaced by
where, under fast-exchange conditions, the dispersion function for the intramolecular contribution is given by Eq. (53) and that for the intermolecular contribution by [cf. Eq. (10)]
This dispersion function differs by less than 0.009 (over the full frequency range) from
If the small shift of the dispersion frequency is neglected, Eq. (56) can therefore be cast in the form of Eq. (48).
Here, $\alpha$ is given by Eq. (51) and the intramolecular amplitude parameter by Eq. (52), with the quadrupole frequency replaced by the intramolecular dipole frequency (Table 1)
with
the H–H separation in the water molecule. The value in Table 1 was derived from the (libration-corrected) intramolecular second moment of ice Ih, obtained by subtracting the calculated intermolecular moment from the measured second moment (Whalley, 1974). When this value is inserted in Eq. (61), one obtains which is the best available estimate of the intramolecular H–H separation in ice Ih (Kuhs and Lehmann, 1986). Finally, is given by
where the sum runs over all protein protons (i), the outer brackets signify averaging over all internal water protons (k), and the inner brackets signify averaging over any local motions, also taken into account via the intermolecular order parameters as defined in Eq. (42). Cross relaxation would alter the frequency dependence of but, as discussed in Sect. 3.2.3, such contributions are generally negligible.
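The comparison made earlier in this section between the two-Lorentzian dispersion function and its single-Lorentzian approximation can be checked numerically. The following is a minimal sketch, assuming that Eq. (53) has the usual form 0.2/(1 + x²) + 0.8/(1 + 4x²) and that Eq. (55) has the form 1/(1 + 3x²), with x the product of Larmor frequency and correlation time; neither equation is reproduced in the text above, so both forms, and the function names, are assumptions made for illustration only.

```python
import math


def two_lorentzian(x):
    """Assumed form of Eq. (53): normalized sum of two Lorentzians."""
    return 0.2 / (1.0 + x * x) + 0.8 / (1.0 + 4.0 * x * x)


def single_lorentzian(x):
    """Assumed form of Eq. (55): single-Lorentzian approximation."""
    return 1.0 / (1.0 + 3.0 * x * x)


def scan_difference(n_points=4001, log_min=-3.0, log_max=3.0):
    """Scan x on a logarithmic grid and return the largest and smallest
    difference (two-Lorentzian minus single-Lorentzian) together with the
    x value where the difference changes sign from negative to positive."""
    step = (log_max - log_min) / (n_points - 1)
    xs = [10.0 ** (log_min + i * step) for i in range(n_points)]
    diffs = [two_lorentzian(x) - single_lorentzian(x) for x in xs]
    zero = None
    for i in range(n_points - 1):
        if diffs[i] < 0.0 <= diffs[i + 1]:
            # Linear interpolation between the two bracketing grid points.
            frac = -diffs[i] / (diffs[i + 1] - diffs[i])
            zero = xs[i] + frac * (xs[i + 1] - xs[i])
            break
    return max(diffs), min(diffs), zero


if __name__ == "__main__":
    hi, lo, zero = scan_difference()
    print(f"largest difference : {hi:+.4f}")
    print(f"smallest difference: {lo:+.4f}")
    print(f"zero crossing at x : {zero:.4f}")
```

Under these assumed forms the difference can also be written in closed form as (0.8x⁴ − 0.4x²)/[(1 + x²)(1 + 4x²)(1 + 3x²)], which vanishes at x² = 1/2.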
5.2. Correlation Time
If all water molecules contributing to the dispersion have residence times much longer than the rotational correlation time of the protein, then the effective correlation time deduced from a fit of Eq. (48) to the NMRD data is simply the rotational correlation time of the protein, $\tau_R$, as assumed for Eqs. (51)–(54). This assumption can be checked in several ways. For sufficiently dilute protein solutions, $\tau_R$ can be estimated from the Debye–Stokes–Einstein relation
$$\tau_R = \frac{\eta V}{k_B T} \qquad (63)$$
with V the hydrodynamic volume of the protein and $\eta$ the viscosity of the solvent.
This relation is strictly valid only for a protein that behaves as a smooth rigid sphere.
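As a rough numerical illustration of the Debye–Stokes–Einstein estimate, the sketch below evaluates the rank-2 rotational correlation time of a smooth sphere, ηV/(k_B T). The radius, viscosity and temperature in the example are illustrative assumptions, not values taken from the text, and the function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, exact SI value


def sphere_volume(radius_m):
    """Volume of a sphere of the given radius (m^3)."""
    return 4.0 / 3.0 * math.pi * radius_m ** 3


def rotational_correlation_time(volume_m3, viscosity_pa_s, temperature_k):
    """Rank-2 rotational correlation time (s) of a smooth rigid sphere:
    tau_R = eta * V / (k_B * T)."""
    return viscosity_pa_s * volume_m3 / (BOLTZMANN * temperature_k)


if __name__ == "__main__":
    # Assumed example: 1.6 nm hydrodynamic radius in water near 298 K.
    volume = sphere_volume(1.6e-9)
    tau_r = rotational_correlation_time(volume, 0.89e-3, 298.15)
    print(f"tau_R = {tau_r * 1e9:.2f} ns")
```

Because the estimate scales linearly with both V and η/T, any uncertainty in the hydrodynamic volume (for instance, whether a hydration layer is included) carries over directly to the correlation time.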
It is common to include a hydration layer in the volume V, but this practice has never been theoretically justified. Bead models have been developed to compute the hydrodynamic properties of real proteins from crystallographic data (Garcia de la Torre et al., 1994; Byron, 1997), thus taking into account surface roughness and nonsphericity. Independent experimental estimates of the rotational correlation time $\tau_R$ may also be available, e.g., from relaxation. The assumption that the effective correlation time equals $\tau_R$ may also be checked by recording NMRD profiles at different temperatures, since $\tau_R$ should have the same temperature dependence as $\eta/T$ (provided the protein structure is invariant), whereas the residence time is expected to vary more strongly (Denisov et al., 1996). If, by any of these means, it can be established that the effective correlation time equals $\tau_R$, then Eq. (35) yields a lower bound for the residence time of any water molecule that contributes
significantly to the dispersion, i.e., the residence time must be much longer than the rotational correlation time. On the other hand, if the residence time does not satisfy the inequalities, then it can in principle be accurately determined from NMRD data (see Sect. 5.5.2).
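The interplay between protein rotation and water exchange can be sketched in a few lines. The code below assumes that Eq. (35) has the usual form in which the rates add, 1/τ_C = 1/τ_R + 1/τ_W, with τ_W denoting the residence time (a symbol chosen here for illustration); Eq. (35) itself is not reproduced in this section, so this form is an assumption. Given an effective correlation time from a fit and an independent estimate of τ_R, the relation can be inverted for the residence time.

```python
def effective_correlation_time(tau_r, tau_w):
    """Assumed form of Eq. (35): rates of rotation and exchange add."""
    return 1.0 / (1.0 / tau_r + 1.0 / tau_w)


def residence_time_from_tau_c(tau_c, tau_r):
    """Invert the assumed Eq. (35) for the residence time.

    Only meaningful when the fitted tau_c is shorter than tau_r; as tau_c
    approaches tau_r the result diverges, which corresponds to the case
    where only a lower bound on the residence time can be given."""
    if not 0.0 < tau_c < tau_r:
        raise ValueError("tau_c must be positive and shorter than tau_r")
    return tau_c * tau_r / (tau_r - tau_c)


if __name__ == "__main__":
    # Assumed example values (seconds).
    tau_r, tau_w = 6.0e-9, 3.0e-9
    tau_c = effective_correlation_time(tau_r, tau_w)
    print(f"tau_C = {tau_c * 1e9:.2f} ns")
    print(f"recovered tau_W = {residence_time_from_tau_c(tau_c, tau_r) * 1e9:.2f} ns")
```

The inversion is ill-conditioned when τ_C is close to τ_R: a small error in either quantity then produces a large error in the residence time, which is why an accurate independent value of τ_R matters.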
5.3. Dispersion Amplitude
According to Eq. (52), the dispersion amplitude parameter $\beta$ contains information about the number of rapidly exchanging internal water molecules with residence times obeying the inequalities discussed in Sect. 5.2 and about their orientational order. It is convenient to express the internal water fraction as $N_I/N_T$, with $N_I$ the number of internal water molecules contributing to the dispersion and $N_T$ the total number of water molecules in the solution, both on a per-protein basis. $N_T$ is typically of order $10^4$ and can be obtained from the protein concentration and the molecular weights of protein and (isotope-labeled) water. For a quantitatively reliable analysis of $\beta$, an accurate determination of the protein concentration in the NMR sample is essential. This is particularly important in difference NMRD experiments (see Sect. 4.1.2). Whereas uncertainty in extinction coefficients usually limits the accuracy of spectrophotometrically determined protein concentrations to ca. 5% (Gill and von Hippel, 1989), chromatographic analysis of the entire amino acid content of a hydrolyzed aliquot of the protein solution can give protein concentrations to ca. 2% relative accuracy.
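The number of water molecules per protein follows from simple bookkeeping. The sketch below assumes a two-component sample (protein plus water, ignoring buffer salts) specified by the protein mass fraction; the function names and the example values are illustrative assumptions, not taken from the text.

```python
def water_per_protein(protein_mass_fraction, protein_molar_mass,
                      water_molar_mass=18.015):
    """Number of water molecules per protein molecule, N_T, for a sample
    containing only protein and water.

    protein_mass_fraction: g protein per g solution (between 0 and 1)
    protein_molar_mass, water_molar_mass: g/mol (use the molar mass of the
    isotope-labeled water actually present in the sample)
    """
    w = protein_mass_fraction
    if not 0.0 < w < 1.0:
        raise ValueError("mass fraction must lie strictly between 0 and 1")
    return (1.0 - w) / w * protein_molar_mass / water_molar_mass


def internal_water_fraction(n_internal, n_total):
    """Fraction of all water molecules that are internal: N_I / N_T."""
    return n_internal / n_total


if __name__ == "__main__":
    # Assumed example: 5% (w/w) of a 6.5 kDa protein in H2O.
    n_total = water_per_protein(0.05, 6500.0)
    print(f"N_T = {n_total:.0f}")
    print(f"fraction for 3 internal waters = {internal_water_fraction(3, n_total):.2e}")
```

Since the internal water fraction is inversely proportional to N_T, a relative error in the protein concentration translates into roughly the same relative error in any quantity derived from the dispersion amplitude.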
5.3.1. Quadrupole Coupling Constants
The water ²H and ¹⁷O quadrupole frequencies given in Table 1 refer to the rigid-lattice limit of ice Ih. The use of ice values seems a priori justified at least for extensively hydrogen-bonded internal water molecules and is supported by detailed ²H and ¹⁷O NMRD studies of the singly buried water molecule W122 in BPTI (Denisov et al., 1995, 1996, 1997b). A wealth of solid-state NMR and NQR data on ²H and ¹⁷O QCCs in crystal hydrates and different ice polymorphs (Berglund et al., 1978; Poplett, 1982) as well as large-basis-set quantum-chemical calculations on molecular clusters (Halle and Wennerström, 1981b; Cummins et al., 1985, 1987; Eggenberger et al., 1992, 1993; Ludwig et al., 1995) have established correlations between the QCCs and the geometry of hydrogen bonding (or ion coordination). Judging from such data, the QCC variation among different internal water molecules should be small. In particular, the ¹⁷O/²H QCC ratio is nearly invariant at 30.5 ± 1.5 in a variety of hydrogen-bonded solids (Poplett, 1982). Even for water molecules coordinated to ions (Halle and Wennerström, 1981b; Denisov and Halle, 1995c; Thomann et al., 1995), and for water adsorbed on NaX zeolite (Resing, 1976), the QCCs seem to differ little from the ice Ih value. In bulk water, however, the QCCs are 20%–25% larger than in ice Ih but virtually independent of temperature (van der Maarel et al., 1985, 1986;
Struis et al., 1987; Ludwig et al., 1995). These larger values are probably more appropriate for water molecules at the protein surface than for internal water molecules.
With knowledge about the protein concentration and the rigid-lattice coupling, the $\beta$ value derived from the NMRD profile can be used, via Eq. (64), to calculate the quantity $N_I S_I^2$, where $S_I^2$ is the mean-square generalized order parameter for the $N_I$ internal water molecules responsible for the relaxation dispersion. The coupling frequency entering Eq. (64) is either the quadrupole frequency in Eq. (4) or the intramolecular dipole frequency in Eq. (61). In the ¹H case, the intramolecular amplitude parameter, obtained from Eqs. (60) and (62), should be used with Eq. (64). The available NMRD data from protein solutions suggest that the order parameter is in the range 0.5–1.0 for buried water molecules. Since, by definition, $S_I$ cannot exceed 1, the quantity $N_I S_I^2$ provides a lower bound for the number of long-lived internal water molecules in the protein. The actual number of long-lived water molecules will be larger if $S_I < 1$ and/or if not all water molecules exchange rapidly with bulk water (see Sect. 5.5). On the other hand, if $N_I$ is known, as might be the case in a difference NMRD experiment, then $S_I$ can be obtained directly. If labile-hydrogen contributions (see Sect. 5.7) and intermediate-exchange effects (see Sect. 5.5) can be excluded (or corrected for), then the number $N_I$ should be the same for all three water nuclei. The ratio of the $N_I S_I^2$ values derived from, say, the ²H and ¹⁷O dispersions then yields directly the ratio of the corresponding generalized order parameters, providing information about orientational disorder of internal water molecules.
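The lower bound described above can be sketched as follows. The code assumes that the amplitude parameter is related to the other quantities by β = (N_I/N_T) ω² S_I², with ω the rigid-lattice coupling frequency, so that N_I S_I² = β N_T/ω². This form of Eq. (64) is an assumption, since the equation is not reproduced in the text, and the numerical coupling frequency used in the example is an illustrative value of the order expected for ²H, not a value quoted from Table 1.

```python
def amplitude_from_internal_waters(n_internal, s_squared, n_total, omega):
    """Assumed relation: beta = (N_I / N_T) * omega^2 * S_I^2 (units s^-2)."""
    return n_internal / n_total * omega ** 2 * s_squared


def internal_water_bound(beta, n_total, omega):
    """Assumed form of Eq. (64): N_I * S_I^2 = beta * N_T / omega^2.

    Because S_I^2 cannot exceed 1, the result is a lower bound on N_I."""
    return beta * n_total / omega ** 2


def order_parameter_ratio(beta_a, omega_a, beta_b, omega_b):
    """Ratio S_a^2 / S_b^2 for two nuclei probing the same N_I and N_T."""
    return (beta_a / omega_a ** 2) / (beta_b / omega_b ** 2)


if __name__ == "__main__":
    omega_2h = 8.7e5  # rad/s, illustrative assumption
    beta = amplitude_from_internal_waters(3, 0.8, 6000.0, omega_2h)
    print(f"beta = {beta:.3e} s^-2")
    print(f"N_I * S_I^2 = {internal_water_bound(beta, 6000.0, omega_2h):.2f}")
```

In this round trip, three internal water molecules with a squared order parameter of 0.8 yield a bound of 2.4, so at least three long-lived water molecules would be inferred.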
5.3.2. Libration Amplitudes
The generalized order parameters of the three water nuclei describe the effect on the relaxation dispersion of any reorientational motion of buried water molecules that is fast compared to the isotropic tumbling of the protein. Since the nuclear interaction tensors have different orientations with respect to the water molecule (Fig. 11), the three generalized order parameters provide independent information about the internal motion. This information is contained in the second-rank orientational order parameters in Eq. (38). To obtain a quantitative measure of the degree and anisotropy of orientational disorder, these order parameters can be translated into motional amplitudes with the aid of a model. In the anisotropic harmonic libration (AHL) model (Denisov et al., 1997b), the fast local motions are modeled in terms of three independent symmetric libration modes: (i) the rocking of the water molecule around an axis (x) perpendicular to the molecular plane, (ii) the wagging of the water molecule around an axis (y) parallel to the H–H vector, and (iii) the twisting of the water molecule around its $C_2$ axis (see Fig. 11). In addition, the possibility of a fast 180° flip around the $C_2$ axis is included. In the AHL model, the angular variables are the libration angles for the rock, wag, and twist modes, respectively. The order parameters can be expressed in terms of these variables as
where the angular brackets denote averages over the appropriate equilibrium distribution. Due to the noncommutativity of finite rotations, the order parameters in the AHL model depend on the order in which the rotations are applied. [The result in Eq. (65) corresponds to one particular order of the three rotations.] For the libration amplitudes of interest, however, this dependence is very weak and can be neglected.
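The orientational averages in the AHL model are taken, as described further on, over a Gaussian distribution of each libration angle that is normalized on the unrestricted interval. As a hedged numerical illustration, the sketch below compares the average of cos(nθ) obtained by direct integration with the standard Gaussian identity exp(−n²σ²/2), and repeats the integration on the restricted interval from −π to π. The identity is a textbook result used here for illustration and is not quoted from the text; the rms amplitude of 0.35 rad is an assumed example value.

```python
import math


def gaussian_cos_average(n, sigma, half_width=None, n_steps=20000):
    """Average of cos(n*theta) over a zero-mean Gaussian of rms width sigma,
    evaluated with the midpoint rule on [-half_width, half_width] and
    renormalized on that interval. Defaults to 12 sigma (effectively
    unrestricted)."""
    if half_width is None:
        half_width = 12.0 * sigma
    h = 2.0 * half_width / n_steps
    numerator = 0.0
    denominator = 0.0
    for i in range(n_steps):
        theta = -half_width + (i + 0.5) * h
        weight = math.exp(-0.5 * (theta / sigma) ** 2)
        numerator += weight * math.cos(n * theta)
        denominator += weight
    return numerator / denominator


def closed_form(n, sigma):
    """Gaussian identity on the unrestricted interval: exp(-n^2 sigma^2 / 2)."""
    return math.exp(-0.5 * (n * sigma) ** 2)


if __name__ == "__main__":
    sigma = 0.35  # rad, about 20 degrees (assumed example)
    for n in (1, 2):
        unrestricted = gaussian_cos_average(n, sigma)
        restricted = gaussian_cos_average(n, sigma, half_width=math.pi)
        print(n, unrestricted, restricted, closed_form(n, sigma))
```

For amplitudes of this size the restricted and unrestricted averages are numerically indistinguishable, which is the sense in which normalizing on the unrestricted interval is a harmless approximation.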
On account of the symmetry of the libration modes, there are only 5 (rather than 25) independent order parameters, namely
In the presence of a 180° flip, the order parameters must also reflect the twofold symmetry of the water molecule, which requires p to be even. The only effect of the flip is thus to make the order parameters with odd p vanish. In the AHL model, the five order parameters in Eq. (66) are not independent since they are all determined by the rms amplitudes of the three libration modes. The orientational distribution function for each mode is of the form
This distribution is normalized on the unrestricted interval $(-\infty, \infty)$ rather than on $[-\pi, \pi]$. The error introduced by this approximation is negligible for the libration amplitudes of interest. For the Gaussian distribution in Eq. (67), the five order parameters in Eq. (66) can be expressed in terms of the orientational averages $\langle \cos n\theta \rangle = \exp(-n^2\sigma^2/2)$, with $n = 1$ and 2, where $\theta$ is a libration angle and $\sigma$ its rms amplitude. Figure 15 shows the effect of each libration mode on the generalized order parameters. Some general observations can be made: (i) the ¹⁷O order parameter is most affected by the twist mode; (ii) the ¹H order parameter is unaffected by the wag mode and is equally sensitive to rock and twist librations; and (iii) only the ²H order parameter is affected by the flip. Since a fast flip can reduce the squared ²H order parameter by as much as a factor of 2.7 (in the absence of librational averaging), a comparison of the ²H and ¹⁷O dispersion amplitudes may help to diagnose this type of motion. In general, all three libration modes will be more or less excited. The preceding relations are valid for this general case and can be numerically inverted to obtain the three libration amplitudes from the
experimentally determined generalized order parameters. This strategy has recently
been implemented for several buried water molecules in BPTI and two single-point mutants (Denisov et al., 1997b). While one of these (W122) is as ordered as a water molecule in ice Ih, the others are more disordered. Converting the libration amplitudes to rotational entropies, one finds that the three extensively hydrogen-bonded buried water molecules in the Y35G mutant have a configurational entropy comparable to that of bulk water (Denisov et al., 1997b). This result clearly
challenges the conventional wisdom that bound water is highly ordered and suggests that the hydration of nonpolar cavities (Otting et al., 1997) may actually be entropically driven.
5.4. High-Frequency Plateau
According to Eq. (51), the high-frequency excess relaxation rate $\alpha$ contains contributions from reorientation and/or exchange of mobile surface waters and from local motions of internal water molecules. The latter contribution is usually negligible, since internal water molecules are greatly outnumbered by surface waters and since their local-motion contribution to relaxation is generally the smaller one. While this is certainly the case for subpicosecond librational motions, 180° flips of internal water molecules around the (dipole) axis can make a small but significant contribution to the high-frequency plateau (Denisov and Halle, 1995c). For symmetry reasons, the flip does not contribute to the ¹⁷O relaxation. If the flip is slow compared to protein tumbling, not even the ²H relaxation can be affected, since the anisotropic quadrupole coupling then has been averaged to zero before any flips have occurred. The largest flip contribution can be expected when the flip time is close to the protein rotational correlation time $\tau_R$. For large proteins (long $\tau_R$), water flips in the 1–10 ns range may actually produce an observable secondary dispersion step at higher frequencies. Usually, however, the principal effect of water flips is not the small contribution to $\alpha$ but the strong attenuation of $\beta$ (see Sect. 5.3.2). By definition, the $\alpha$ contribution refers to water molecules in the extreme-narrowing limit at all accessible frequencies. The relaxation rate is therefore proportional to an effective correlation time reflecting more or less restricted local rotation and/or exchange with bulk water. If the QCC is taken to be the same as for bulk water (see Sect. 5.3.1), the ratio of surface to bulk relaxation rates thus equals the corresponding ratio of correlation times. If the second term in Eq. (51) can be neglected, we can use the known values of the total number of water molecules per protein and of the bulk water relaxation rate (directly measured on a reference water sample at the same temperature and isotopic composition as the protein solution) to calculate the quantity
where the correlation time is averaged over the water molecules at the protein surface. From relaxation studies of water in contact with various interfaces, it is known that the dynamic perturbation is essentially confined to water molecules in direct contact with the surface (Woessner, 1980; Carlström and Halle, 1988; Volke et al., 1994). That this is the case also for proteins is suggested by molecular dynamics simulations (Brunne et al., 1993; Garcia and Stiller, 1993; Lounnas and Pettitt, 1994; Abseher et al., 1996; Rocchi et al., 1997; Kovacs et al., 1997). It is therefore reasonable to estimate the number of surface waters for a monolayer, e.g., using the solvent-accessible surface area of the protein (as computed from crystallographic data) and a molecular area of 15 Å² per water molecule. This leads to a dynamic retardation factor of 5–7 for most investigated native globular proteins (Denisov and Halle, 1996). Somewhat larger values for a few proteins, such as trypsin and BSA, may be attributed to local motions within clusters of buried water molecules, as represented by the second term in Eq. (51). Although the bulk water correlation time is known, it is useful to quote the retardation factor (rather than the surface correlation time itself), since the ratio depends neither on the tensorial rank of the interaction that induces relaxation (at least for a rotational diffusion model) nor on the isotopic composition of the water (fractionation factors are close to 1). To obtain an estimate for the time taken for a surface water molecule to rotate through one radian, the retardation factor may be multiplied by the (first-rank) dielectric relaxation time of bulk water, ca. 8 ps at 298 K. For typical globular proteins, one thus obtains values of order 50 ps (at 300 K). Being an arithmetic average over all surface waters, this value is biased toward the longer times in the (probably wide) distribution and may be markedly affected by a few “outliers.” Since both rotation and translation of exposed water molecules at the protein surface should be rate-limited by hydrogen-bond disruption, the 50-ps estimate also gives an indication of the average residence time of surface waters.
5.5. NMRD Time Scales
5.5.1. NMRD Windows
For an internal water molecule to contribute fully to the entire relaxation dispersion, its residence time must be long compared to the rotational correlation time of the protein but short compared to the zero-frequency intrinsic relaxation time. If the local motion contribution to the intrinsic relaxation rate is ignored, these conditions can be expressed as
which may be said to define the “NMRD window” on residence times. Of course, water molecules that do not satisfy Eq. (70) may still contribute to the dispersion, but do so with less than the maximum contribution. Using Eqs. (3a), (32), (35), and (36), we can express the relative dispersion step as a quantity that becomes 1 when Eq. (70) is obeyed. This quantity is plotted as a function of the residence time in Fig. 16 for all three water nuclei, for two values of the rotational correlation time (the longer being 100 ns) and with coupling frequencies from Table 1. (For ¹H, the dipole frequency has been increased by 30% to take intermolecular dipole couplings into account.) Due to the different rigid-lattice coupling frequencies, the intrinsic relaxation time is three orders of magnitude shorter for ¹⁷O than for ¹H, with ²H falling in between (Table 1). The consequent variation of the width of the NMRD windows implies, for example, that some internal water molecules may give a large relative contribution to the ²H dispersion but only a small one to the ¹⁷O dispersion. For small to medium-sized proteins, with $\tau_R$ typically 5–10 ns, such differential window effects are important for residence times longer than a few 100 ns and must be taken into account when comparing dispersion amplitudes for different nuclei. It is also clear from Fig. 16a that although a large protein (100 kDa, say) may contain numerous buried water molecules, these will only contribute partially to the dispersion. The short-residence-time edge of the NMRD windows is due to the competition of protein rotation and water exchange in orientationally averaging the anisotropic coupling, as expressed by Eq. (35). This is a pure correlation time effect and does not affect the $\beta$ value.
5.5.2. Water Residence Time
For water molecules on the central plateau of the NMRD window, only lower and upper bounds on the residence time can be established, as expressed in Eq. (70). On the wide flanks of the NMRD window, however, the residence time can be accurately determined. On the short-residence-time flank, this requires independent information about $\tau_R$ (see
Sect. 5.2). Using this strategy, the residence time of water molecules in the narrow minor groove of a B-DNA dodecamer was recently determined at 277 K (Denisov et al., 1997a) and at 253 K (Jóhannesson and Halle, 1998). Relatively short residence times, 5–10 ns at 300 K, have also been obtained for water molecules residing in deep surface pockets in ribonuclease A (Denisov and Halle, 1998) and ribonuclease T1 (Langhorst et al., 1999). Longer residence times can be determined by traversing the long-residence-time flank of the NMRD window as the temperature is varied. This is possible even within the restricted temperature range available with protein solutions since long residence times usually are associated with high (apparent) activation enthalpies. Furthermore, with decreasing temperature we not only move to the right on the residence-time axis in Fig. 16, but the long-residence-time edge of the NMRD window is also shifted to the left since $\tau_R$ increases (this actually shrinks the NMRD window from both sides). Due to the frequency dependence of the intrinsic relaxation rate, the fast-exchange condition may be more strongly violated at low frequencies than at high frequencies. Since the dispersion is then more strongly attenuated at lower frequencies, the shape of the dispersion profile is affected. Provided that all water molecules contributing to the dispersion have the same residence time, the Lorentzian form of Eq. (53) remains valid to an excellent approximation, but the dispersion is shifted to higher frequency (shorter effective correlation time) and the dispersion amplitude parameter is reduced. To show this, we return to Eq. (32), make use of the decomposition in Eq. (50), and carry out some rearrangements using the (excellent) approximation in Eq. (55). The result is again in the form of Eq. (48), but with the correlation time in Eqs. (48) and (53) replaced by the effective correlation time
and $\beta$ in Eq. (48) replaced by the effective amplitude parameter
If the local motion contribution is in the fast-exchange limit, as is usually the case, Eqs. (71) and (72) reduce to
where, in general, the correlation time is given by Eq. (35). If NMRD profiles are recorded at a series of temperatures where the flank of the NMRD window is traversed, the residence time and its activation parameters can be determined from the variation of the effective amplitude parameter with temperature, as described by Eq. (73) and a suitable parametrization of the temperature dependence of the residence time. (The temperature dependence of $\tau_R$ is usually known; cf. Sect. 5.2.) The activation parameters are particularly valuable as they provide insight about the mechanism (usually large-scale fluctuations of the protein structure) whereby a buried water molecule escapes from within a protein. The residence time can actually be obtained (at one temperature) without assuming a functional form for its temperature dependence. For example, at the temperature Eq. (73) yields (Often, when is outside the fast-exchange limit.) As an illustration of this approach, Fig. 17 shows the temperature dependence of the effective amplitude parameter deduced from ²H and ¹⁷O difference dispersions (see Sect. 4.1.2) isolating the contribution from the single buried water molecule W122 in BPTI. A joint fit to the two curves in Fig. 17 yielded the residence time at 300 K and the apparent activation enthalpy (Denisov et al., 1996). The temperature shift between the ²H and ¹⁷O curves is quantitatively accounted for by the different quadrupole frequencies of the two nuclei (Table 1).
5.6. Stretched Dispersions
The Lorentzian dispersion function is the fastest decaying function that can result from diffusive (overdamped) molecular motions. On the other hand, experimental dispersion profiles are sometimes found to be more extended than predicted
by Eq. (53). At least three factors can contribute to such dispersion stretching: (i) anisotropic protein rotation, (ii) protein–protein interactions, and (iii) a distribution of residence times extending into either or both flanks of the NMRD window. Depending on the circumstances, these effects can shift the dispersion to higher or lower frequency and/or stretch it over a wider frequency range. While it is straightforward to incorporate the effect of anisotropic rotational diffusion of the protein on the spectral density function, especially in the limit of rigid binding (Woessner, 1962), this generalization introduces not only one or two additional rotational diffusion coefficients as parameters but also requires information (available from high-resolution neutron diffraction data for a few proteins) about the orientation of all contributing internal water molecules (and
labile hydrogens) relative to the principal frame of the rotational diffusion tensor. In practice, this mechanism of dispersion stretching is probably unimportant for most globular proteins of modest aspect ratio (Denisov and Halle, 1995a). In concentrated solutions, protein–protein interactions may affect the relaxation dispersion. The hydrodynamic interference between nearby protein molecules retards their rotation to some extent; to first order the rotational diffusion coefficient is reduced by a factor that depends on the protein volume fraction (Landau and Lifshitz, 1959; Montgomery and Berne, 1977), but the Lorentzian form of the spectral
density function is not significantly affected (Montgomery and Berne, 1977; Wolynes and Deutch, 1977). Direct interactions (electrostatic, van der Waals, and short-ranged), however, can induce a microscopically heterogeneous solution structure. Little is known about such heterogeneities apart from a few cases of specific association at the dimer or oligomer level. If internal water molecules (or labile hydrogens) experience different local environments on a time scale short compared
to their spin relaxation times, then the observed relaxation dispersion will be a superposition of Lorentzian dispersions characterized by different rotational correlation times. In the case of tight association, also the parameters and could vary. Large-scale heterogeneities that are not sampled on the relaxation time scale would give rise to multiexponential relaxation, but this has not been observed in protein solutions.
Most proteins contain several internal water molecules, presumably with different residence times. Unless all residence times happen to fall on the central plateau of the NMRD window (Fig. 16), the Lorentzian dispersion term in Eq. (48) should be replaced by a sum over all contributing internal water molecules, i.e., the relaxation dispersion should be a weighted sum of Lorentzian dispersion functions with different (apparent) correlation times. If some residence times are not much longer than the rotational correlation time of the protein, Eq. (35) must be used. Provided that all contributing water molecules are in the fast-exchange limit, and are still given by Eqs. (51) and (52), but in Eq. (48) we must make the replacement
with as in Eq. (35) and the normalized amplitude factors In the event that all contributing internal water molecules have the same residence time the dispersion is Lorentzian but shifted to higher frequency, with an effective correlation time If or if and are comparable and is known, the residence time can thus be obtained directly from the dispersion. For the quadrupolar water nuclei, where Larmor frequencies above 100 MHz cannot be accessed, the shortest residence time that can be determined in this way is about 1 ns. If the fast-exchange limit is not applicable for all contributing water molecules, the dispersion can again become stretched (even if all In Eq. (48), we must then make the replacement
where
are given by Eqs. (71) and (72) with and as in Eq. (35). This mechanism for stretching and shifting the dispersion (to higher frequency) is particularly important for relaxation, where a large number of labile protons in intermediate exchange can contribute significantly to the dispersion (Denisov et al., 1997a). Stretched dispersions should also be more common for very large proteins: when is about 100 ns or longer, even the NMRD window does not exhibit a plateau region (Fig.
16a), in which case internal water molecules with different residence times will also have different effective correlation times. For relaxation, an additional complication may arise in the intermediate exchange regime in that the intrinsic relaxation
behavior may be slightly nonexponential (see Sect. 3.1.2). Traditionally, stretched dielectric and magnetic relaxation dispersions (and broad minima) have been accounted for in terms of empirical correlation time distributions (Yager, 1936; Connor, 1964). In connection with water NMRD studies of protein solutions and other aqueous biological systems, a lognormal distribution was favored initially (Blicharska et al., 1970; Kimmich and Noack, 1970a), but in
the past two decades most authors have used a so-called Cole–Cole dispersion for fitting stretched dispersions (Hallenga and Koenig, 1976). The original Cole–Cole dispersion function was used to describe dielectric dispersion data (Cole and Cole, 1941) and can be inverted to yield a particular correlation time distribution (Fuoss and Kirkwood, 1941). When this dispersion function was modified (Hallenga and Koenig, 1976) so as to be dimensionally commensurate with the real part of the spectral density function (which governs nuclear spin relaxation), its physical meaning was lost. In fact, it can be shown that the modified Cole–Cole dispersion does not correspond to any correlation time distribution (Halle et al., 1998). The significance of the effective correlation time extracted from a fit of the modified Cole–Cole dispersion to stretched NMRD data is therefore somewhat obscure. By inverting the Fourier transform in Eq. (5) and setting it follows that
The frequency integral of the modified Cole–Cole dispersion, however, exhibits an
unphysical divergence. A rigorous procedure has recently been developed for analyzing stretched NMRD profiles without the bias of an arbitrarily imposed
correlation time distribution (Halle et al., 1998). This model-free approach allows a separation of the static and dynamic information content of the dispersion data.
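The stretching produced by a superposition of Lorentzians, and the finite frequency integral that a physically meaningful dispersion must have, can be illustrated numerically. The following is only a sketch under stated assumptions: the unit-amplitude Lorentzian τ/(1 + ω²τ²) stands in for the dispersion term of Eq. (48), whose exact form is not reproduced here, and the weights, correlation times, and function names are hypothetical.

```python
import math


def lorentzian(omega, tau):
    """Unit-amplitude Lorentzian, tau / (1 + (omega * tau)^2) (assumed form)."""
    return tau / (1.0 + (omega * tau) ** 2)


def dispersion(omega, weights, taus):
    """Weighted sum of Lorentzians, one term per contributing water molecule."""
    return sum(w * lorentzian(omega, t) for w, t in zip(weights, taus))


def frequency_integral(weights, taus, points_per_decade=200):
    """Integral of the dispersion over omega, from far below to far above
    all dispersion steps, by the trapezoidal rule on a grid omega = exp(x)."""
    lo = math.log(1e-6 / max(taus))
    hi = math.log(1e6 / min(taus))
    n = max(1, int((hi - lo) / math.log(10.0) * points_per_decade))
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        omega = math.exp(lo + i * h)
        end_weight = 0.5 if i in (0, n) else 1.0
        # d(omega) = omega dx on the logarithmic grid
        total += end_weight * dispersion(omega, weights, taus) * omega
    return total * h


def width_ratio(weights, taus):
    """Ratio of the frequencies at which the dispersion has dropped to
    10% and to 90% of its zero-frequency value."""
    j0 = dispersion(0.0, weights, taus)

    def crossing(level):
        lo, hi = 1e-6 / max(taus), 1e6 / min(taus)
        for _ in range(200):  # bisection on a logarithmic frequency scale
            mid = math.sqrt(lo * hi)
            if dispersion(mid, weights, taus) > level * j0:
                lo = mid
            else:
                hi = mid
        return math.sqrt(lo * hi)

    return crossing(0.1) / crossing(0.9)


# Hypothetical cases: one correlation time versus two equally weighted ones.
for weights, taus in (([1.0], [1e-8]), ([0.5, 0.5], [1e-9, 1e-8])):
    area = frequency_integral(weights, taus)
    mean_tau = dispersion(0.0, weights, taus) / sum(weights)
    print(area, mean_tau, width_ratio(weights, taus))
```

In this sketch the frequency integral depends only on the summed weights (π/2 per unit weight), whatever the correlation times, while the zero-frequency value divided by the summed weights is the weight-averaged correlation time. The 10%-to-90% frequency ratio is 9 for a single Lorentzian and larger for the mixture, which is what "stretched" means here. This is an illustration of the idea of separating amplitude from dynamics, not the published model-free procedure.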
5.7. Labile Hydrogens
Exchange averaging of macromolecular and water hydrogens is a potential pitfall in all water and relaxation work. Failure to appreciate this point has led to even qualitatively incorrect conclusions about hydration behavior. Well-documented cases include a study of poly(methacrylic acid) (Glasel, 1970) and a recent study of an oligonucleotide (Zhou and Bryant, 1996). In both cases, subsequent studies revealed that the relaxation effects that had been attributed to hydration water were entirely due to labile hydrogens (Halle and
Piculell, 1982; Denisov et al., 1997a).
The labile hydrogen contribution to NMRD data from protein solutions has been characterized in greatest detail for BPTI. By recording and NMRD
profiles over a wide pH range (Fig. 18), the labile hydrogen contribution could be isolated and quantitatively accounted for in terms of known values and hydrogen exchange rate constants and intrinsic relaxation times of the expected magnitude (Denisov and Halle, 1995b). Since the intrinsic relaxation times of labile hydrogens are at least an order of magnitude longer for than for a larger fraction of the labile hydrogens contribute to the dispersion (Venu et al., 1997).
For BPTI, the labile proton contribution appears to dominate over the buried water contribution even at pH 7, where labile protons were previously thought to exchange too slowly to contribute to the dispersion (Koenig and Schillinger, 1969). While hydrogen exchange is a serious complication in and NMRD studies of protein hydration, it can also be used constructively to study side-chain
dynamics (via the intrinsic relaxation rates) and fast hydrogen exchange rates (not readily accessible with high-resolution techniques). More direct access to fast
proton exchange kinetics is provided by the CSM contribution to the transverse
relaxation rate (see Sect. 3.3.1). The CSM contribution usually dominates over the dipolar contribution to at frequencies of and increases strongly at higher frequencies since the chemical shifts are proportional to the magnetic field (Fig. 19). Most labile protons have chemical shifts of 1–5 ppm from the bulk water resonance. Even at moderate fields, therefore, is much larger than typical intrinsic relaxation rates of According to Eq. (29), which then applies, a given type of proton gives a maximum CSM contribution at a value where the (acid and base catalyzed) exchange rate matches the shift difference This gives rise to characteristic maxima in the dependence of (Fig. 19), which help to separate the contributions from different types of labile protons. If the chemical shifts are known, e.g., from high-resolution studies under conditions of slow exchange, a complete separation can possibly be achieved from
CPMG dispersions over a wide range (analogous to the NMRD data in Fig. 18), perhaps including data at several fields.
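The maxima described above, where the catalyzed exchange rate matches the shift difference, can be sketched numerically. Eq. (29) is not reproduced in this section, so the example assumes the limiting two-site form in which intrinsic relaxation at the labile site is negligible: a contribution f·k·Δω²/(k² + Δω²), which peaks at k = Δω with the value f·Δω/2. The rate constants, proton fraction, chemical shift, and field used below are hypothetical, as are all function names.

```python
import math


def shift_difference(shift_ppm, larmor_mhz):
    """Shift difference in rad/s; it is proportional to the field (ppm * MHz = Hz)."""
    return 2.0 * math.pi * shift_ppm * larmor_mhz


def exchange_rate(ph, k_acid, k_base, pkw=14.0):
    """Acid- and base-catalyzed exchange rate in 1/s (assumed rate law)."""
    return k_acid * 10.0 ** (-ph) + k_base * 10.0 ** (ph - pkw)


def csm_contribution(k, delta_omega, fraction):
    """Assumed limiting form of the CSM contribution to the transverse rate."""
    return fraction * k * delta_omega ** 2 / (k ** 2 + delta_omega ** 2)


def ph_maxima(delta_omega, fraction, k_acid, k_base, step=0.01):
    """Scan pH 0-14 and return (pH, rate) at every local maximum."""
    n = int(round(14.0 / step))
    grid = [i * step for i in range(n + 1)]
    values = [
        csm_contribution(exchange_rate(ph, k_acid, k_base), delta_omega, fraction)
        for ph in grid
    ]
    return [
        (grid[i], values[i])
        for i in range(1, n)
        if values[i - 1] < values[i] > values[i + 1]
    ]


# Hypothetical labile proton: 3 ppm from water at 500 MHz, fraction 1e-4,
# k_acid = 1e8 and k_base = 1e11 (both in 1/(M s)).
delta_omega = shift_difference(3.0, 500.0)
for ph, rate in ph_maxima(delta_omega, 1e-4, k_acid=1e8, k_base=1e11):
    print(f"maximum near pH {ph:.2f}: {rate:.3f} 1/s")
```

With these assumed constants the exchange rate crosses Δω once on the acid-catalyzed branch and once on the base-catalyzed branch, so the scan finds two maxima of equal height. Because Δω grows linearly with the field, both the height and the pH positions of the maxima shift when the field is changed, which is why data at several fields help to separate proton types.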
6. OUTLOOK
Although water NMRD has been applied to protein solutions for nearly three decades, it is only in the last few years that this technique has matured to the stage where it can make significant contributions to protein science. At present, multinuclear NMRD and high-resolution NOE spectroscopy are the two most powerful NMR methods available for probing protein–water interactions in solution. The information provided by these two techniques is largely complementary. While NMRD has unsurpassed temporal resolution by its ability to map out the spectral
density function in the kHz–GHz range, NOE spectroscopy provides spatial resolution by spectral assignments that can establish the proximity of water molecules to specific protein protons. Although the water relaxation rate measured in an NMRD experiment reflects all rapidly exchanging water molecules in the sample, the frequency dependence separates the contributions from the few long-lived
(biologically interesting) water molecules and the many short-lived ones. Moreover, the location of long-lived water molecules can be established by difference NMRD experiments and with recourse to high-resolution crystal structures. (Also
the water NOE method relies on extrinsic structural information to convert chemical shifts into spatial coordinates and to distinguish water NOEs from chemically relayed NOEs.) While the water NOE method has so far been applied only to
solutions of small and medium-sized proteins (up to 22 kDa), the NMRD method is also applicable to very large proteins, subzero temperatures, and semisolid
samples. Labile proton exchange is a serious problem in NMRD as well as in water NOE spectroscopy (cross peaks from direct water NOEs cannot be distinguished from proton-exchange relayed NOEs and may be obscured by intense exchange cross peaks). Oxygen-17 relaxation, however, invariably reports on water molecules. The NMRD and NOE methods will undoubtedly continue to develop in ways that will allow a more detailed structural and dynamic characterization of water
molecules interacting with proteins and will remove some of the present methodological limitations. The ultimate goal is of course to combine the temporal resolution of NMRD with the spatial resolution of multidimensional high-field spectroscopy. The development of FFC instruments with high-field cryomagnets represents a step in this direction. For semisolid protein samples, such as biological tissues, the NMRD approach might be extended in several respects by employing more sophisticated pulse schemes, polarization transfer, and relaxation anisotropy. Building on recent advances in the study of protein hydration in solution, a
quantitative understanding of the molecular basis of relaxation-based contrast in soft-tissue imaging should also be within reach.
REFERENCES
Abragam, A., 1961, The Principles of Nuclear Magnetism, Clarendon Press, Oxford.
Abseher, R., Schreiber, H., and Steinhauser, O., 1996, Proteins 25:366. Akasaka, K., 1979, J. Magn. Reson. 36:135. Alder, F., and Yu, F. C., 1951, Phys. Rev. 81:1067. Allerhand, A., and Thiele, E., 1966, J. Chem. Phys. 45:902.
Anderson, A. G., and Redfield, A. G., 1959, Phys. Rev. 116:583. Baguet, E., Chapman, B. E., Torres, A. M., and Kuchel, P. W., 1996, J. Magn. Reson. B 111:1. Balazs, E. A., Bothner-By, A. A., and Gergely, J., 1959, J. Mol. Biol. 1:147–154. Beckert, D., and Pfeifer, H., 1965, Ann. Phys. 16:262. Berglund, B., Lindgren, J., and Tegenfeldt, J., 1978, J. Mol. Struct. 43:179. Blicharska, B., Florkowski, Z., Hennel, J. W., Held, G., and Noack, F., 1970, Biochim. Biophys. Acta 207:381. Brunne, R. M., Liepinsh, E., Otting, G., Wüthrich, K., and van Gunsteren, W. F., 1993, J. Mol. Biol. 231:1040.
Brüssau, R. G., and Sillescu, H., 1972, Ber. Bunsenges. Phys. Chem. 76:31. Bull, T. E., Forsén, S., and Turner, D. L., 1979, J. Chem. Phys. 70:3106. Byron, O., 1997, Biophys. J. 72:408. Carlström, G., and Halle, B., 1988, Langmuir 4:1346. Chiu, Y., 1964, J. Math. Phys. 5:283. Chung, C.-W., and Wimperis, S., 1992, Mol. Phys. 76:47.
Civan, M. M., and Shporer, M., 1972, Biophys. J. 12:404. Cole, K. S., and Cole, R. H., 1941, J. Chem. Phys. 9:341. Connor, T. M., 1964, Trans. Faraday Soc. 60:1574. Conti, S., 1986, Mol. Phys. 59:449. Cummins, P. L., Bacskay, G. B., and Hush, N. S., 1987, Mol. Phys. 61:795. Cummins, P. L., Bacskay, G. B., Hush, N. S., Halle, B., and Engström, S., 1985, J. Chem. Phys. 82:2002. Daszkiewicz, O. K., Hennel, J. W., Lubas, B., and Szczepkowski, T. W., 1963, Nature 200:1006. Denisov, V. P., Carlström, G., Venu, K., and Halle, B., 1997a, J. Mol. Biol. 268:118. Denisov, V. P., and Halle, B., 1994, J. Am. Chem. Soc. 116:10324. Denisov, V. P., and Halle, B., 1995a, J. Mol. Biol. 245:682. Denisov, V. P., and Halle, B., 1995b, J. Mol. Biol. 245:698. Denisov, V. P., and Halle, B., 1995c, J. Am. Chem. Soc. 117:8456. Denisov, V. P., and Halle, B., 1996, Faraday Discuss. 103:227. Denisov, V. P., and Halle, B., 1998, Biochemistry 37:9595.
Denisov, V. P., Halle, B., Peters, J., and Hörlein, H. D., 1995, Biochemistry 34:9046. Denisov, V. P., Jonsson, B.-H., and Halle, B., 1999, Nature Struct. Biol. 6:253. Denisov, V. P., Peters, J., Hörlein, H. D., and Halle, B., 1996, Nature Struct. Biol. 3:505.
Denisov, V. P., Venu, K., Peters, J., Hörlein, H. D., and Halle, B., 1997b, J. Phys. Chem. B 101:9380. Edmonds, D. T., and Mackay, A. L., 1975, J. Magn. Reson. 20:515. Edmonds, D. T., and Zussman, A., 1972, Phys. Lett. 41A:167. Edzes, H. T., and Samulski, E. T., 1977, Nature 265:521. Edzes, H. T., and Samulski, E. T., 1978, J. Magn. Reson. 31:207. Eggenberger, R., Gerber, S., Huber, H., Searles, D., and Welker, M., 1992, J. Chem. Phys. 97:5898.
Eggenberger, R., Gerber, S., Huber, H., Searles, D., and Welker, M., 1993, Mol. Phys. 80:1177. Eigen, M., 1964, Angew. Chemie (Int. Ed.) 3:1. Eliav, U., Shinar, H., and Navon, G., 1991, J. Magn. Reson. 94:439. Flesche, C. W., Gruwel, M. L. H., Deussen, A., and Schrader, J., 1995, Biochim. Biophys. Acta 1244:253. Florin, A. E., and Alei, M., 1967, J. Chem. Phys. 47:4268. Florkowski, Z., Hennel, J. W., and Blicharska, B., 1969, Nukleonika (Engl. transl.) 14:9. Fuoss, R. M., and Kirkwood, J. G., 1941, J. Am. Chem. Soc. 63:385. Furó, I., and Halle, B., 1995, Phys. Rev. E 51:466. Garcia, A. E., and Stiller, L., 1993, J. Comput. Chem. 14:1396. Garcia de la Torre, J., Navarro, S., Lopez Martinez, M. C., Diaz, F. G., and Lopez Cascales, J. J., 1994, Biophys. J. 67:530. Gill, S. C., and von Hippel, P. H., 1989, Anal. Biochem. 182:319. Glasel, J. A., 1968, Nature 218:953. Glasel, J. A., 1970, J. Am. Chem. Soc. 92:375. Grösch, L., and Noack, F., 1976, Biochim. Biophys. Acta 453:218. Halle, B., 1996, Prog. NMR Spectrosc. 28:137.
Halle, B., Andersson, T., Forsén, S., and Lindman, B., 1981, J. Am. Chem. Soc. 103:500.
Halle, B., and Denisov, V. P., 1995, Biophys. J. 69:242. Halle, B., Jóhannesson, H., and Venu, K., 1998, J. Magn. Reson. 135:1. Halle, B., and Karlström, G., 1983, J. Chem. Soc., Faraday Trans. 2 79:1031. Halle, B., and Piculell, L., 1982, J. Chem. Soc., Faraday Trans. 1 78:255. Halle, B., and Wennerström, H., 1981a, J. Magn. Reson. 44:89.
Halle, B., and Wennerström, H., 1981b, J. Chem. Phys. 75:1928. Halle, B., and Westlund, P.-O., 1988, Mol. Phys. 63:97. Hallenga, K., and Koenig, S. H., 1976, Biochemistry 15:4255. Hausser, R., and Noack, F., 1964, Z. Phys. 182:93. Hausser, R., and Noack, F., 1965, Z. Naturforsch. 20a:1668.
Hertz, H. G., 1967, Ber. Bunsenges. Phys. Chem. 71:979. Hills, B. P., 1992, Mol. Phys. 76:489. Hills, B. P., Takacs, S. F., and Belton, P. S., 1989, Mol. Phys. 67:903. Hindman, J. C., 1966, J. Chem. Phys. 44:4582. Huang Kenéz, P., Carlström, G., Furó, I., and Halle, B., 1992, J. Phys. Chem. 96:9524. Jaccard, G., Wimperis, S., and Bodenhausen, G., 1986, J. Chem. Phys. 85:6282. Jacobson, B., Anderson, W. A., and Arnold, J. T., 1954, Nature 173:772. Jardetzky, C. D., and Jardetzky, O., 1957, Biochim. Biophys. Acta 26:668–669. Job, C., Zajicek, J., and Brown, M. F., 1996, Rev. Sci. Instrum. 67:2113. Jóhannesson, H., and Halle, B., 1998, J. Am. Chem. Soc. 120:6859. Kimmich, R., 1971, Z. Naturforsch. 26b:1168. Kimmich, R., 1980, Bull. Magn. Reson. 1:195. Kimmich, R., Gneiting, T., Kotitschke, K., and Schnur, G., 1990, Biophys. J. 58:1183.
Kimmich, R., and Noack, F., 1970a, Z. Naturforsch. 25a:299. Kimmich, R., and Noack, F., 1970b, Z. Angew. Phys. 29:248. Koenig, S. H., 1995, Biophys. J. 69:593. Koenig, S. H., and Brown, R. D., 1987, in NMR Spectroscopy of Cells and Organisms, Vol. II (R. K.
Gupta, ed.), CRC Press, Boca Raton, FL, pp. 75–114. Koenig, S. H., and Brown, R. D., 1991, Prog. NMR Spectrosc. 22:487.
Koenig, S. H., and Brown, R. D., 1993, Magn. Reson. Med. 30:685. Koenig, S. H., Brown, R. D., and Ugolini, R., 1993, Magn. Reson. Med. 29:77. Koenig, S. H., Bryant, R. G., Hallenga, K., and Jacob, G. S., 1978, Biochemistry 17:4348. Koenig, S. H., Hallenga, K., and Shporer, M., 1975, Proc. Nat. Acad. Sci. U.S.A. 72:2667.
Koenig, S. H., and Schillinger, W. E., 1969, J. Biol. Chem. 244:3283. Kovacs, H., Mark, A. E., and van Gunsteren, W. F., 1997, Proteins 27:395.
Kubinec, M. G., Culf, A. S., Cho, H., Lee, D. C., Burkham, J., Morimoto, H., Williams, P. G., and Wemmer, D. E., 1996, J. Biomol. NMR 7:236. Kuhs, W. F., and Lehmann, M. S., 1986, in Water Science Reviews, Vol. 2 (F. Franks, ed.), Cambridge University Press, Cambridge, pp. 1–65. Kwong, K. K., Hopkins, A. L., Belliveau, J. W., Chesler, D. A., Porkka, L. M., McKinstry, R. C., Finelli, D. A., Hunter, G. J., Moore, J. B., Barr, R. G., and Rosen, B. R., 1991, Magn. Reson. Med. 22:154. Landau, L. D., and Lifshitz, E. M., 1959, Fluid Mechanics, Pergamon Press, Oxford. Langhorst, U., Loris, R., Denisov, V. P., Doumen, J., Roose, P., Maes, D., Halle, B., and Steyaert, J., 1999, Protein Sci. 8:722. Lankhorst, D., and Leyte, J. C., 1984, Macromolecules 17:93. Lankhorst, D., Schriever, J., and Leyte, J. C., 1982, Ber. Bunsenges. Phys. Chem. 86:215.
Lankhorst, D., Schriever, J., and Leyte, J. C., 1983, Chem. Phys. 77:319. Liepinsh, E., and Otting, G., 1996, Magn. Reson. Med. 35:30. Lipari, G., and Szabo, A., 1982, J. Am. Chem. Soc. 104:4546. Lounnas, V., and Pettitt, B. M., 1994, Proteins 18:148. Ludwig, R., Weinhold, F., and Farrar, T. C., 1995, J. Chem. Phys. 103:6941. Lutz, O., and Oehler, H., 1977, Z. Naturforsch. 32a:131. Luz, Z., and Meiboom, S., 1963a, J. Am. Chem. Soc. 85:3923. Luz, Z., and Meiboom, S., 1963b, J. Chem. Phys. 39:366.
Luz, Z., and Meiboom, S., 1964, J. Chem. Phys. 40:2686. Mao, X.-A., Guo, J.-X., and Ye, C.-H., 1994, Chem. Phys. Lett. 222:417. Martin, M. L., Delpuech, J.-J., and Martin, G. J., 1980, Practical NMR Spectroscopy, Heyden, London. Mateescu, G. D., Yvars, G. M., and Dular, T., 1988, in Water and Ions in Biological Systems (P. Läuger, L. Packer, and V. Vasilescu, eds.), Birkhäuser-Verlag, Basel, pp. 239–250. McLachlan, A. D., 1964, Proc. Roy. Soc. (London), Ser. A 280:271. Meiboom, S., 1961, J. Chem. Phys. 34:375. Montgomery, J. A., and Berne, B. J., 1977, J. Chem. Phys. 67:4589. Noack, F., 1971, in NMR, Basic Principles and Progress, Vol. 3 (P. Diehl, E. Fluck, and R. Kosfeld, eds.), Springer-Verlag, Berlin, pp. 83–144.
Noack, F., 1986, Prog. NMR Spectrosc. 18:171. Noack, F., 1995, in Encyclopedia of Nuclear Magnetic Resonance (D. M. Grant, and R. K. Harris, eds.),
Wiley, New York, pp. 1980–1990. Noack, F., Becker, S., and Struppe, J., 1997, Annu. Rep. NMR Spectrosc. 33:1. Noggle, J. H., and Schirmer, R. E., 1971, The Nuclear Overhauser Effect, Academic Press, New York. Odeblad, E., and Lindström, G., 1955, Acta Radiol. 43:469. Otting, G., 1997, Prog. NMR Spectrosc. 31:259. Otting, G., Liepinsh, E., Halle, B., and Frey, U., 1997, Nature Struct. Biol. 4:396. Otting, G., and Wüthrich, K., 1989, J. Am. Chem. Soc. 111:1871. Pekar, J., and Leigh, J. S., 1986, J. Magn. Reson. 69:582. Piculell, L., and Halle, B., 1986, J. Chem. Soc., Faraday Trans. 1 82:401. Pitner, T. P., Glickson, J. D., Dadok, J., and Marshall, G. R., 1974, Nature 250:582. Poplett, I. J. F., 1982, J. Magn. Reson. 50:397. Redfield, A. G., Fite, W., and Bleich, H. E., 1968, Rev. Sci. Instrum. 39:710. Resing, H. A., 1976, J. Phys. Chem. 80:186.
Rhim, W. K., Burum, D. P., and Elleman, D. D., 1979, J. Chem. Phys. 71:3139. Rocchi, C., Bizzarri, A. R., and Cannistraro, S., 1997, Chem. Phys. 214:261. Ronen, I., and Navon, G., 1994, Magn. Reson. Med. 32:789. Rose, K. D., and Bryant, R. G., 1980, J. Am. Chem. Soc. 102:21.
Rubinstein, M., Baram, A., and Luz, Z., 1971, Mol. Phys. 20:67. Sceats, M. G., and Rice, S. A., 1980, J. Chem. Phys. 72:3236. Schriever, J., and Leyte, J. C., 1977, Chem. Phys. 21:265. Schweikert, K. H., Krieg, R., and Noack, F., 1988, J. Magn. Reson. 78:77.
Shaw, T. M., and Elsken, R. H., 1953, J. Chem. Phys. 21:565. Sitnikov, R., Furó, I., and Henriksson, U., 1996, J. Magn. Reson. A 122:76. Solomon, I., 1955, Phys. Rev. 99:559. Spiess, H. W., Garrett, B. B., Sheline, R. K., and Rabideau, S. W., 1969, J. Chem. Phys. 51:1201. Stoesz, J. D., Redfield, A. G., and Malinowski, D., 1978, FEBS Lett. 91:320. Stolpen, A. H., Reddy, R., and Leigh, J. S., 1997, J. Magn. Reson. 125:1. Struis, R. P. W. J., de Bleijser, J., and Leyte, J. C., 1987, J. Phys. Chem. 91:1639. Swift, T. J., and Connick, R. E., 1962, J. Chem. Phys. 37:307. Sykora, S., and Ferrante, G. M., 1995, NMR Relaxometry with FFC Spinmaster, Technical Note 954.1, Stelar s.n.c., Mede (PV), Italy. Thomann, H., Bernardo, M., Goldfarb, D., Kroneck, P. M. H., and Ullrich, V., 1995, J. Am. Chem. Soc. 117:8243. Tromp, R. H., de Bleijser, J., and Leyte, J. C., 1990, Chem. Phys. Lett. 175:568. van de Ven, F. J. M., Janssen, H. G. J. M., Gräslund, A., and Hilbers, C. W., 1988, J. Magn. Reson. 79:221. van der Maarel, J. R. C., Lankhorst, D., de Bleijser, J., and Leyte, J. C., 1985, Chem. Phys. Lett. 122:541. van der Maarel, J. R. C., Lankhorst, D., de Bleijser, J., and Leyte, J. C., 1986, J. Phys. Chem. 90:1470. van der Klink, J. J., Schriever, J., and Leyte, J., 1974, Ber. Bunsenges. Phys. Chem. 78:369.
Venu, K., Denisov, V. P., and Halle, B., 1997, J. Am. Chem. Soc. 119:3122. Volke, F., Eisenblätter, S., Galle, J., and Klose, G., 1994, Chem. Phys. Lipids 70:121. Wang, J. H., 1955, J. Am. Chem. Soc. 77:258. Werbelow, L., and Pouzard, G., 1981, J. Phys. Chem. 85:3887.
Westlund, P.-O., and Wennerström, H., 1982, J. Magn. Reson. 50:451. Whalley, E., 1974, Mol. Phys. 28:1105. Woessner, D. E., 1962, J. Chem. Phys. 37:647. Woessner, D. E., 1980, J. Magn. Reson. 39:297.
Woessner, D. E., and Snowden, B. S., 1970, J. Colloid Interface Sci. 34:290. Wolynes, P. G., and Deutch, J. M., 1977, J. Chem. Phys. 67:733. Wu, D., and Johnson, C. S., 1994, J. Magn. Reson. A 110:113. Yager, W. A., 1936, Physics 7:434. Zhou, D., and Bryant, R. G., 1996, J. Biomol. NMR 8:77. Zimmerman, J. R., and Brittin, W. E., 1957, J. Phys. Chem. 61:1328.
11
Hydration Studies of Biological Macromolecules by Intermolecular Water-Solute NOEs

Gottfried Otting

Gottfried Otting • Department of Medical Biochemistry and Biophysics, Karolinska Institute, S-171 77 Stockholm, Sweden.
Biological Magnetic Resonance, Volume 17: Structure Computation and Dynamics in Protein NMR, edited by Krishna and Berliner. Kluwer Academic / Plenum Publishers, New York, 1999.

From “Progress in Nuclear Magnetic Resonance Spectroscopy,” Vol. 31, Gottfried Otting, NMR Studies of Water Bound to Biological Molecules, pp. 259–285, 1997, with kind permission from Elsevier Science-NL, Sara Burgerhartstraat 25, 1055 KV Amsterdam, The Netherlands.

The exploitation of intermolecular water–solute NOEs in biological molecules was originally proposed in 1973 by N. Rama Krishna and Sidney L. Gordon in their study of the effects on solutes with coupled spin systems [J. Chem. Phys. 58 (1973), 5687–5696]. The first demonstration of an intermolecular solvent–solute NOE dates back to 1965 when Reinhold Kaiser reported the observation of an enhancement in a chloroform proton signal when the solvent cyclohexane was saturated [J. Chem. Phys. 42 (1965), 1838–1839].

1. INTRODUCTION
The use of intermolecular water–peptide NOEs (nuclear Overhauser effects) for the detection of solvent exposure was already reported in 1974 (Pitner et al., 1974). With improved equipment, it is possible today to obtain a much more complete picture of the hydration of biomolecules in aqueous solution. This chapter describes the use of the nuclear Overhauser effect in high-resolution NMR spectroscopy to detect and localize the water molecules hydrating proteins, DNA and RNA molecules. Other reviews are Kubinec and Wemmer (1992a), Wüthrich et al. (1992), Kochoyan and Leroy (1995), Billeter (1995), and Otting and Liepinsh (1995a). Intermolecular NOEs observed between the water and the solute allow the identification of individual hydration water molecules in the presence of a very large excess of bulk water which appears at the same chemical shift as the signals from the hydration water. This is possible because NOEs are effectively observed only for internuclear distances shorter than 4–5 Å. NOEs observed between the single, averaged water resonance and the solute thus report on direct interactions between the solute and the first shell of hydration. The degeneracy of the chemical shifts of hydration water and bulk water is a consequence of the chemical exchange between the two environments which is fast
on the chemical-shift time scale (milliseconds). Chemical exchange in this context refers not only to the exchange of entire water molecules but also to proton exchange between different water molecules. The proton exchange between water molecules is catalyzed by acids and bases and is slowest at neutral pH (Meiboom, 1961). Given the proton exchange rates in pure water (ca. 1000 s⁻¹ at pH 7 and 25°C, corresponding to an average proton residence time on a water oxygen of 1 ms), it is not surprising that, in general, only a single, averaged NMR resonance is observed for hydration water and bulk water, although it is in principle possible that some proteins contain single water molecules in internal cavities that exchange sufficiently slowly with the bulk water to give rise to resolved NMR signals. To date, such a case has not been reported, illustrating the presence of conformational fluctuations in proteins which trigger the exchange of internal hydration water molecules with bulk water within milliseconds even at temperatures near the freezing point of water. The signal from a hydration water proton exchanging with a rate of with the bulk water would have a linewidth of which would hardly be detectable in the crowded spectrum of a biological macromolecule. In principle, the exchange between different water molecules could be slowed down by the use of organic solvent molecules. For example, the hydration of the peptide antamanide has been studied in chloroform solution (Peng et al., 1996). Water-soluble proteins, DNA and RNA fragments, however, lose their native three-dimensional structure in pure or nearly pure organic solvents.
Water molecules can be used with four NMR-active isotopes: ¹H, ²H, ³H, and ¹⁷O. Of those, deuterium and ¹⁷O relax too rapidly to be suitable for high-resolution NMR spectroscopy, and tritium is difficult to handle at high concentration because of its radioactivity. In addition, the magnetization transfer rate due to the NOE between two spins A and B depends on the gyromagnetic ratios of the spins as γ_A²γ_B². The high natural abundance of protons in water and biomolecules provides optimum sensitivity for the observation of NOEs at no extra cost.
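Two of the estimates in this passage are easy to reproduce: the exchange broadening of a hypothetical resolved hydration-water signal and the isotope dependence of the NOE transfer rate. The sketch below assumes simple lifetime broadening of a Lorentzian line (full width at half height = k/π) and compares only the γ_A²γ_B² prefactor of the dipolar cross-relaxation rate; spin quantum numbers and differences in spectral densities are ignored. The 1000 s⁻¹ rate is taken from the 1 ms residence time quoted above, the gyromagnetic ratios are standard tabulated values, and the function names are hypothetical.

```python
import math

# Gyromagnetic ratios in rad s^-1 T^-1 (standard tabulated values).
GAMMA = {
    "1H": 267.522e6,
    "2H": 41.066e6,
    "3H": 285.350e6,
    "17O": -36.281e6,
}


def exchange_linewidth_hz(rate_per_s):
    """Lifetime broadening (full width at half height, Hz) of a Lorentzian
    line for a site that is left with first-order rate `rate_per_s`."""
    return rate_per_s / math.pi


def relative_noe_prefactor(spin_a, spin_b):
    """gamma_A^2 * gamma_B^2 relative to the proton-proton case (assumed scaling)."""
    reference = GAMMA["1H"] ** 4
    return (GAMMA[spin_a] * GAMMA[spin_b]) ** 2 / reference


print(f"linewidth for k = 1000 1/s: {exchange_linewidth_hz(1000.0):.0f} Hz")
for partner in ("1H", "2H", "3H", "17O"):
    print(f"1H-{partner}: {relative_noe_prefactor('1H', partner):.4f}")
```

Under these assumptions a proton exchanging a thousand times per second would give a line several hundred hertz wide, and replacing one proton by deuterium or ¹⁷O would reduce the prefactor by roughly two orders of magnitude, in line with the preference for ¹H stated above.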
In the following, principal differences between intermolecular and intramolecular NOEs are discussed, NMR experiments suitable for the measurement of intermolecular water–solute NOEs are evaluated, and protein hydration studies using intermolecular NOEs are reviewed. A further section briefly summarizes and compares the biophysical information obtained from NOE studies with that obtained from NMRD measurements, X-ray crystallography, and molecular dynamics simulations.
2. THEORETICAL BACKGROUND FOR INTERMOLECULAR NOEs
NOEs can be observed by the transfer of either longitudinal or transverse magnetization between spins. The latter is also referred to as ROE (rotating frame NOE). Throughout this article, the term NOE is used to describe both the cross relaxation in the laboratory frame and the rotating frame; the distinction between NOE and ROE is made only in the terms σ_NOE and σ_ROE, which describe the rates of magnetization transfer between two protons by cross relaxation in the respective frames of reference. The cross relaxation rates σ_NOE and σ_ROE are defined by (e.g., Bothner-By et al., 1984; Griesinger and Ernst, 1987)

where J(ω) is the spectral density at frequency ω, ω is the Larmor frequency, γ is the gyromagnetic ratio of the protons, ħ is Planck’s constant divided by 2π, and μ_0 is the induction constant. The spectral density functions depend on the model describing the change in length and orientation of the vector connecting the two nuclear spins involved in the dipole–dipole interaction.
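For orientation, the standard textbook expressions for homonuclear dipolar cross relaxation, written with the symbols defined above, are sketched below. Absorbing the distance dependence into J(ω) and leaving the equations unnumbered are assumptions of this sketch; it is not a reproduction of the chapter's own equations.

```latex
\sigma_{\mathrm{NOE}} = \left(\frac{\mu_0}{4\pi}\right)^{2}
    \frac{\gamma^{4}\hbar^{2}}{10}\,\bigl[\,6J(2\omega) - J(0)\,\bigr],
\qquad
\sigma_{\mathrm{ROE}} = \left(\frac{\mu_0}{4\pi}\right)^{2}
    \frac{\gamma^{4}\hbar^{2}}{10}\,\bigl[\,3J(\omega) + 2J(0)\,\bigr]
```

Written this way, σ_ROE is a sum of non-negative terms, whereas σ_NOE is a difference whose sign depends on how large J(2ω) is relative to J(0).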
Since spectral densities do not assume negative values, σ_ROE is always positive, while σ_NOE can be positive or negative. The σ_NOE values are negative when the high-frequency components of the spectral density function are unimportant compared to its component at zero frequency, i.e., for slow reorientation of the internuclear vector. Negative σ_NOE values are typically observed for intramolecular NOEs between the protons in slowly tumbling macromolecules. Positive σ_NOE values are observed for the intramolecular NOEs in small, rapidly reorientating molecules and for intermolecular NOEs, if at least one of the compounds is very mobile. Note that positive cross-relaxation rates yield negative cross peaks in NOESY and ROESY spectra (i.e., of opposite sign to the diagonal peaks), whereas negative cross-relaxation rates yield positive cross peaks.
2.1. NOE between Two Rigidly Bound Protons
Intermolecular water–solute NOEs can be treated like intramolecular NOEs within the solute, if the hydration water molecules are rigidly bound for longer than the rotational correlation time of the solute (typically nanoseconds). This is the case, for example, for hydration water molecules bound in the interior of a protein with hydrogen bonds providing an icelike environment. The spectral density for the simple case of the interaction between two protons attached to an isotropically tumbling sphere (“rigid sphere model,” Fig. 1) is
where r is the interproton distance and τ_c is the rotational correlation time describing the reorientation rate of the sphere.
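The rigid-sphere behavior can be checked numerically. The sketch below assumes the standard homonuclear dipolar expressions, σ_NOE ∝ 6J(2ω) − J(0) and σ_ROE ∝ 3J(ω) + 2J(0), with J(ω) = r⁻⁶ τ_c/(1 + ω²τ_c²); the normalization, the 2 Å distance, and all function names are assumptions made for illustration rather than the chapter's own equations.

```python
import math

MU0_OVER_4PI = 1.0e-7        # T m / A
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752218744e8     # proton gyromagnetic ratio, rad s^-1 T^-1


def spectral_density(omega, tau_c, r):
    """Rigid-sphere spectral density with the 1/r^6 factor included (assumed form)."""
    return tau_c / (r ** 6 * (1.0 + (omega * tau_c) ** 2))


def cross_relaxation_rates(larmor_hz, tau_c, r):
    """Return (sigma_NOE, sigma_ROE) in 1/s for two protons a distance r apart."""
    omega = 2.0 * math.pi * larmor_hz
    prefactor = MU0_OVER_4PI ** 2 * GAMMA_H ** 4 * HBAR ** 2 / 10.0
    j0 = spectral_density(0.0, tau_c, r)
    j1 = spectral_density(omega, tau_c, r)
    j2 = spectral_density(2.0 * omega, tau_c, r)
    return prefactor * (6.0 * j2 - j0), prefactor * (3.0 * j1 + 2.0 * j0)


def sign_change_tau(larmor_hz):
    """Correlation time where sigma_NOE = 0: 6/(1 + 4x^2) = 1, x = sqrt(5)/2."""
    return math.sqrt(5.0) / 2.0 / (2.0 * math.pi * larmor_hz)


# Hypothetical proton pair 2 Angstrom apart, observed at 600 MHz.
for tau_c in (1e-11, sign_change_tau(600e6), 1e-8):
    noe, roe = cross_relaxation_rates(600e6, tau_c, 2.0e-10)
    print(f"tau_c = {tau_c:.3e} s  sigma_NOE = {noe:+.3e}  sigma_ROE = {roe:+.3e}")
```

In this model the zero crossing of σ_NOE sits at ωτ_c = √5/2, so it moves to longer correlation times in inverse proportion to the Larmor frequency, while σ_ROE stays positive throughout.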
Plots of σ_NOE and σ_ROE calculated using Eqs. (2) and (3) for different Larmor frequencies are given in Fig. 1 as a function of the rotational correlation time τ_c. The curves show that (i) the signs of σ_NOE and σ_ROE are the same for small, rapidly tumbling molecules (i.e., small τ_c), and (ii) the sign change of σ_NOE occurs at values of τ_c corresponding to 300 ps at a spectrometer frequency of 600 MHz. For lower Larmor frequencies, the sign change shifts to longer correlation times; i.e., lower magnetic fields favor positive σ_NOE values. The point where σ_NOE changes sign separates the “fast-motional regime” from the “slow-motional regime.”
2.2. NOE between Solute Proton and Bound but Locally Reorientating Water
The description of the intermolecular σ_NOE becomes more complicated if the bound water molecule performs motions with a local correlation time shorter than the rotational correlation time of the solute. This situation is commonly encountered for slowly tumbling proteins and other biological macromolecules. In the extreme case, σ_NOE can be positive for the water–solute interaction, while the σ_NOE rates between protons of the macromolecule are negative. Using explicit models for which analytical expressions of the spectral density function are available, one can show that water–solute NOEs with positive σ_NOE rates are observed with macromolecular systems only if the water molecule is displaced by more than its own diameter within less than a nanosecond, i.e., for rapid exchange of water molecules (Otting et al., 1991a).
One of the models that can be treated analytically is the “wobbling-in-a-cone” model (Fig. 2A) (Richarz et al., 1980; Fujiwara and Nagayama, 1985). Here, a water molecule may be thought of as hydrogen bonded via its oxygen to a proton donor on the solute, with free rotation around the H-bond axis and an additional “wobbling” motion of the axis. The model predicts reduced $\sigma^{\mathrm{NOE}}$ rates for increased water mobility, especially for motions in the time regime where the rigid-sphere model would predict a sign change. Analytical expressions are also available for a model in which a water proton moves along the line connecting the center of the solute, a solute proton, and the water proton (Fig. 2B) (Luginbühl, 1996). This model predicts reduced $\sigma^{\mathrm{NOE}}$ rates if the water moves rapidly with an amplitude corresponding to complete dissociation and reassociation, but positive $\sigma^{\mathrm{NOE}}$ values are hardly obtained if the solute is a macromolecule in the slow-motional regime. It appears quite generally that positive $\sigma^{\mathrm{NOE}}$ rates result more easily from rapid reorientation of the interproton vector with respect to the main magnetic field than from rapid changes in the length of the vector. The difficulty of obtaining positive $\sigma^{\mathrm{NOE}}$ rates by local motions only is also supported by experimental results: if methane or hydrogen molecules are inserted under pressure into hydrophobic cavities of hen egg-white lysozyme, the $\sigma^{\mathrm{NOE}}$ values of the intermolecular NOEs observed are negative, although the local reorientation rate of the gas molecules is certainly in the fast-motional regime (Otting et al., 1997). The cross-relaxation rate between a “probe” proton of the solute and a proton of a small molecule trapped inside a cavity of the solute, which reorientates rapidly in the cavity with spherical symmetry, has also theoretically
Gottfried Otting
been shown to be the same as the NOE between the probe proton and a hypothetical proton located at the center of the cavity (Otting et al., 1997).

The most general representation of local motion is obtained by the use of a generalized order parameter $S^2$. In this case the spectral density is given by (Halle and Wennerström, 1981; Lipari and Szabo, 1982; Denisov et al., 1997) where $\tau_{\mathrm{loc}}$ denotes the correlation time of the rapid local motion, $\tau_c$ is the correlation time of the overall rotational tumbling of the solute, r is the internuclear distance with $\langle\,\rangle$ indicating time averaging, and $S^2$ ranges between 0 and 1 for complete disorder and complete order, respectively.
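For orientation, a commonly used model-free form of such a spectral density is sketched below. It assumes isotropic overall tumbling and a single, much faster local motion, and it omits the constant prefactor. The symbols $\tau_{\mathrm{loc}}$ and $\tau_e$ are labels chosen here for illustration, and the normalization may differ from that of Eq. (4).

```latex
J(\omega) \;\propto\; \left\langle r^{-6} \right\rangle
\left[
  \frac{S^{2}\,\tau_c}{1+\omega^{2}\tau_c^{2}}
  + \frac{\left(1-S^{2}\right)\tau_e}{1+\omega^{2}\tau_e^{2}}
\right],
\qquad
\frac{1}{\tau_e} = \frac{1}{\tau_c} + \frac{1}{\tau_{\mathrm{loc}}}
```

For $S^2 = 1$ this form reduces to the rigid-sphere spectral density, whereas for $S^2 \to 0$ the spectral density is governed by the short effective correlation time $\tau_e$.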
2.3. NOE with Rapidly Diffusing Water Molecules

In the extreme limit, water may not be bound at all, but simply diffuse past the solute with no further restriction than that imposed by the space excluded by the solute. Analytical formulas have been calculated by Ayant et al. (1977) for the case where solute and solvent molecules are represented by large and small spheres, respectively, with proton spins located at a certain distance underneath the surface of the hard spheres (Fig. 2C). The spectral density describing the intermolecular interaction is
where $D = D_I + D_S$ [$D_I$ and $D_S$ are the translational diffusion coefficients of the spheres with spins I (protein) and S (solvent)], $N_S$ is the density of the solvent spin S, and the geometric parameters are defined in Fig. 2C. In addition, K denotes the modified spherical Bessel function of the third kind, and the rotational diffusion coefficients of the two spheres are given by Stokes’ law, where k is the Boltzmann constant, T is the absolute temperature, and $\eta$ is the viscosity coefficient. In evaluating the first term of the double sum in Eq. (5), it is helpful to use the relation (Ayant et al., 1977)
Plotting $\sigma^{\mathrm{NOE}}$ and $\sigma^{\mathrm{ROE}}$ with Eqs. (5)–(9) as a function of the inverse translational diffusion coefficient D yields curves similar to those of Fig. 1. Using the Einstein–Smoluchowski relationship, Eq. (10), the diffusion coefficient D can be translated into an average residence time $\tau_{\mathrm{res}}$ of a water molecule at its hydration site on the solute, assuming that the water molecule is exchanged after a displacement x by its own diameter. Calculating $\sigma^{\mathrm{NOE}}$ and $\sigma^{\mathrm{ROE}}$ with the sphere geometry of Fig. 2C and for a frequency of 600 MHz, the sign inversion of $\sigma^{\mathrm{NOE}}$ is predicted for a diffusion coefficient that is about six times smaller than the self-diffusion coefficient of pure water at 36°C (Hausser et al., 1966). Using Eq. (10), it corresponds to a residence time of about 70 ps. This time span is four times shorter than the rotational correlation time at which $\sigma^{\mathrm{NOE}}$ changes sign in the simple model of Fig. 1.

It should be noted that the conversion of the diffusion coefficient into a residence time of the hydration water by Eq. (10) assumes three-dimensional diffusion. The corresponding equations for two-dimensional or one-dimensional diffusion, respectively (Villars and Benedek, 1974), would predict twofold or fourfold increased residence times from the same diffusion coefficient. Furthermore, the rotational correlation times and dimensions of the spheres used to represent the solute and the water molecules change the precise value of the diffusion coefficient for which $\sigma^{\mathrm{NOE}} = 0$. In particular, a smaller radius and a shorter rotational correlation time of the solute predict positive $\sigma^{\mathrm{NOE}}$ rates for longer residence times. Biological macromolecules present a surface with more curvature to the solvent than a sphere with a smooth surface. Furthermore, the solvent-exposed chemical groups are often more mobile than corresponding groups in the interior of, for example, a protein. Thus, a positive $\sigma^{\mathrm{NOE}}$ value of a water–solute NOE indicates a water residence time shorter than about 1 ns, but the residence time is difficult to pinpoint more accurately.

Intermolecular cross-relaxation rates have also been calculated for a model where the solute is represented by a planar surface, treating bulk water as a self-diffusing continuum (Brüschweiler and Wright, 1994). It was pointed out that for this model and the model of Ayant et al. (1977), $\sigma^{\mathrm{NOE}}$ and $\sigma^{\mathrm{ROE}}$ with water molecules in the fast-motional regime are approximately proportional to the inverse of the internuclear distance r, in marked contrast to the $r^{-6}$ dependence usually observed for NOEs (Brüschweiler and Wright, 1994; Wang et al., 1996a).

Residence times that are less dependent on the precise parameters of an explicit model can be derived by replacing $\tau_c$ in Eqs. (3) and (4) by an effective correlation time $\tau_{\mathrm{eff}}$, which depends on the mean residence time $\tau_{\mathrm{res}}$ and the rotational correlation time $\tau_c$ of the solute as (Clore et al., 1990; Denisov et al., 1997)

$$\frac{1}{\tau_{\mathrm{eff}}} = \frac{1}{\tau_{\mathrm{res}}} + \frac{1}{\tau_c} \qquad (11)$$
If used with Eq. (3), the resulting model assumes isotropic diffusional rotation of the solute, that the water at the hydration site is rigidly bound to the solute without local mobility, and that the water exchanges between two discrete states: the hydration site and the bulk water. If Eq. (11) is used with Eq. (4), the local mobility of the bound water is taken into account, too. Using this approach it has been demonstrated that the presence of local motions, characterized by an order parameter and a local correlation time, can shift the sign inversion of $\sigma^{\mathrm{NOE}}$ to residence times of 1 ns and longer (Denisov et al., 1997).
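The effect of a finite residence time on the sign change can be sketched numerically. The example assumes that the effective correlation time is obtained by adding rates, $1/\tau_{\mathrm{eff}} = 1/\tau_{\mathrm{res}} + 1/\tau_c$, and that it is inserted into the rigid-sphere expression $\sigma^{\mathrm{NOE}} \propto 6J(2\omega_0) - J(0)$ with $J(\omega) = \tau/(1+\omega^2\tau^2)$ and the prefactor set to 1. The function names and the 5-ns solute correlation time are hypothetical choices for illustration only.

```python
import math


def tau_eff(tau_res, tau_c):
    """Effective correlation time from residence time and solute tumbling time."""
    return 1.0 / (1.0 / tau_res + 1.0 / tau_c)


def sigma_noe(omega0, tau):
    """Rigid-sphere cross-relaxation rate, arbitrary units (assumed form)."""
    j0 = tau
    j2 = tau / (1.0 + (2.0 * omega0 * tau) ** 2)
    return 6.0 * j2 - j0


def critical_residence_time(freq_hz, tau_c):
    """Residence time at which sigma_noe changes sign.

    Returns None if the solute itself tumbles fast enough that sigma_noe
    is non-negative for every residence time.
    """
    omega0 = 2.0 * math.pi * freq_hz
    tau_zero = math.sqrt(5.0) / (2.0 * omega0)  # zero crossing of sigma_noe
    if tau_c <= tau_zero:
        return None
    return 1.0 / (1.0 / tau_zero - 1.0 / tau_c)


if __name__ == "__main__":
    t_crit = critical_residence_time(600e6, 5e-9)
    print(f"critical residence time: {t_crit * 1e12:.0f} ps")
```

With these assumptions, a rigidly bound water on a solute with $\tau_c$ = 5 ns gives a sign change at a residence time of roughly 0.3 ns at 600 MHz. Local mobility, which this sketch ignores, moves that point toward longer residence times.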
3. ASSIGNMENTS OF WATER-SOLUTE CROSS PEAKS

Water–solute cross peaks in NOESY and ROESY can come about by three principal mechanisms: (i) direct water–solute NOEs, (ii) exchange-relayed NOEs, where the magnetization is transferred from the water to the solute by chemical exchange and further to another solute proton by an intrasolute NOE, and (iii) chemical exchange between a labile solute proton and the water (Fig. 3). Direct NOEs and exchange-relayed NOEs are readily distinguished from chemical exchange peaks by their different signs in ROESY: chemical exchange peaks have the same sign as the diagonal peaks, whereas NOEs and exchange-relayed NOEs give rise to negative peaks when the diagonal peaks are plotted as positive peaks. In principle, positive ROESY cross peaks are also observed for magnetization transferred by two subsequent NOE steps during the mixing time (“spin diffusion”) (Farmer et al., 1987) and by TOCSY-type transfers. Since spin-diffusion peaks tend to be very weak in ROESY and TOCSY-type cross peaks are prominent only near the diagonal and antidiagonal of a two-dimensional ROESY spectrum (Glaser and Drobny, 1990), they are disregarded in the following discussion.

Since the rotational correlation times of biological macromolecules usually are much longer than $1/\omega_0$, where $\omega_0$ is the Larmor frequency, intramolecular NOEs invariably lead to positive NOESY cross peaks. Therefore, negative NOESY cross peaks with the water are always direct NOEs. For positive water–solute NOESY cross peaks, which have been shown not to arise from direct chemical exchange by a corresponding ROESY spectrum, it is necessary to consider the possibility of exchange-relayed NOEs before the assignment of a direct water–solute NOE can be made. The distinction between exchange-relayed NOEs and water–solute NOEs is usually not possible experimentally. Therefore, the possibility of exchange-relayed NOEs can be excluded only if the solute proton involved in the water–solute cross peak is at least 4–5 Å from any labile solute proton which exchanges rapidly with the water. It is thus important to know the three-dimensional structure of the solute before assigning water–solute cross peaks.
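The sign rules above can be summarized as a small decision function. This is only a sketch of the logic described in the text for a slowly tumbling macromolecule: spin diffusion and TOCSY-type transfers are disregarded, signs are given relative to diagonal peaks plotted as positive, and the function name, the return strings, and the 5-Å default cutoff (the upper end of the 4–5 Å range) are hypothetical choices.

```python
from typing import Optional


def classify_water_cross_peak(noesy_sign: int,
                              roesy_sign: int,
                              dist_to_labile_angstrom: Optional[float] = None,
                              cutoff_angstrom: float = 5.0) -> str:
    """Classify a water-solute cross peak from its NOESY and ROESY signs.

    Signs are +1 (same sign as the diagonal) or -1 (opposite sign).
    dist_to_labile_angstrom is the distance from the solute proton to the
    nearest rapidly exchanging labile solute proton, if the structure is known.
    """
    if noesy_sign not in (-1, 1) or roesy_sign not in (-1, 1):
        raise ValueError("signs must be +1 or -1")

    if roesy_sign > 0:
        # Positive ROESY peaks indicate chemical exchange; a negative NOESY
        # peak does not fit this picture and is left unassigned here.
        return "chemical exchange" if noesy_sign > 0 else "unassigned"

    if noesy_sign < 0:
        # Negative NOESY peaks cannot come from intrasolute NOEs.
        return "direct NOE"

    # Positive NOESY peak, negative ROESY peak: direct or exchange-relayed.
    if dist_to_labile_angstrom is not None and dist_to_labile_angstrom >= cutoff_angstrom:
        return "direct NOE"
    return "direct or exchange-relayed NOE"


if __name__ == "__main__":
    print(classify_water_cross_peak(+1, -1, dist_to_labile_angstrom=3.2))
```

Without a three-dimensional structure, the distance argument stays unset and a positive NOESY peak with a negative ROESY peak remains ambiguous.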
4. NMR EXPERIMENTS FOR THE DETECTION OF INTERMOLECULAR NOEs WITH WATER

4.1. Water Suppression

Water suppression is required because the analog-to-digital converters (ADCs) in commercial NMR spectrometers cannot adequately digitize small signals at the low signal amplification needed to digitize the entire unsuppressed water signal. Although a two-dimensional NOESY spectrum is symmetric with respect to the diagonal, intermolecular NOE cross peaks between water and solute protons can be observed with acceptable sensitivity only in the cross section along the $\omega_2$ frequency axis (row) taken at the $\omega_1$ chemical shift of the water resonance, because the corresponding cross section along the $\omega_1$ frequency axis (column) taken at the $\omega_2$ chemical shift of the water resonance is obscured by $t_1$ noise from the residual, incompletely suppressed water signal. The intermolecular water–solute NOEs detected in the row along the $\omega_2$ frequency axis arise from the magnetization transfer in which the water protons are frequency labeled during the evolution time $t_1$, part of the water magnetization is transferred to the solute during the mixing time $\tau_m$, and the magnetization is subsequently detected at the frequencies of the solute protons during the detection period $t_2$. Thus, the measurement of water–solute NOEs requires that the water suppression takes place after the NOE mixing time and before the detection period $t_2$.

Many different experimental schemes are available to excite a spectrum without exciting the water resonance (e.g., Plateau and Guéron, 1982; Hore, 1983; Sklenář et al., 1987; Sklenář and Bax, 1987a; Smallcombe, 1993). Most of these assume, however, that the water magnetization is aligned along the positive z-axis at the start. (By definition, the z-axis is the axis parallel to the main magnetic field; equilibrium magnetization is aligned along the positive z-axis.) At the end of a NOESY or ROESY mixing period, however, the water magnetization is usually not simply aligned along the z-axis. Spin-lock pulses, Watergate, and diffusion filters can be used, which suppress the water resonance irrespective of the starting conditions.
4.1.1. Pair of Spin-Lock Pulses

Spin-lock pulses defocus magnetization not aligned along the spin-lock axis by the spatial inhomogeneity of the radio-frequency field. Consequently, spin-lock pulses are most effective when applied at high power. High-power spin-lock pulses of 1 to 2 ms duration are sufficient for nearly complete averaging of the magnetization in the plane orthogonal to the spin-lock axis. A pair of orthogonal spin-lock pulses without interpulse delay suppresses all magnetization. With a free precession delay $\tau$ between the two spin-lock pulses, only the magnetization at the carrier frequency and at offsets that are multiples of $1/(2\tau)$ is suppressed. Therefore, the sequence can be used to suppress the water resonance if the carrier is at the water resonance. Solute magnetization which starts as y-magnetization and precesses by 90° during the delay $\tau$ is not suppressed. The resulting excitation profile follows the function $\sin(\Omega\tau)$, where $\Omega$ is the frequency relative to the carrier frequency. To avoid echo effects, the spin-lock pulses should be of different length, e.g., 0.5 ms for one pulse and 2 ms for the other. If the delay $\tau$ is set to 1/(spectral width), the excitation profile covers the spectral halves to the left and to the right of the carrier frequency (water frequency), each with a single lobe of the sine function (Otting et al., 1991b).

If the water suppression sequence follows a NOESY mixing time, the first spin-lock pulse can be replaced by a pulsed-field gradient (PFG) or homospoil pulse during the mixing time. The gradient selects the longitudinal magnetization, which is aligned along the y-axis after the 90° pulse at the end of the NOESY mixing time. In an analogous way, the first spin-lock pulse can be replaced by the spin lock of a ROESY mixing period. Spin-lock pulses are the quickest means of achieving adequate water suppression.
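The excitation profile of the spin-lock pair can be sketched numerically. The example assumes a profile of the form sin(2πντ), with ν the offset from the carrier in Hz, which is consistent with a single sine lobe per spectral half when τ = 1/(spectral width). The 8-kHz spectral width and the function name are hypothetical.

```python
import math


def spin_lock_pair_profile(offset_hz, delay_s):
    """Relative excitation after two orthogonal spin-lock pulses separated by delay_s.

    Assumed form sin(2*pi*offset*delay): zero at the carrier (water) and
    maximal where the magnetization precesses by 90 degrees during the delay.
    """
    return math.sin(2.0 * math.pi * offset_hz * delay_s)


if __name__ == "__main__":
    spectral_width_hz = 8000.0          # hypothetical example value
    delay = 1.0 / spectral_width_hz     # one sine lobe per spectral half
    for fraction in (-0.5, -0.25, 0.0, 0.25, 0.5):
        offset = fraction * spectral_width_hz
        value = spin_lock_pair_profile(offset, delay)
        print(f"offset {offset:+7.0f} Hz: relative excitation {value:+.3f}")
```

In this sketch the two spectral halves are excited with opposite sign, and the signal vanishes at the carrier and at both edges of the spectrum.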
4.1.2. Watergate

The Watergate sequence (Piotto et al., 1992) uses the sequence 90°(selective)–180°–90°(selective) with PFGs before and after the 90° pulses. The 90° pulses are selective for the water resonance. Therefore, the water resonance experiences a 360° or 0° rotation, while the solute resonances, which are not affected by the selective pulses, experience only the 180° pulse. With two PFGs of equal amplitude and sign, any transverse water magnetization is dephased by both PFGs, while the solute magnetization is defocused by the first PFG and refocused by the second. The sequence combines excellent water suppression with a uniform excitation profile which is decreased only near the water resonance, depending on the bandwidth of the 90° pulses. Furthermore, Watergate can be combined with selective water-flipback pulses, which selectively take the residual water magnetization to the positive z-axis before the Watergate sequence, which then only purges residual transverse water magnetization.

A drawback of the Watergate scheme compared to a pair of spin-lock pulses is the fact that the solute magnetization stays transverse for a longer time, causing some signal loss by transverse relaxation and small distortions of the solute signals by scalar coupling evolution. A typical PFG takes about 0.5–1 ms, and 90° pulses shorter than about 2 ms are no longer very selective, leading to phase and amplitude distortions in the spectrum near the water frequency. Scalar coupling evolution can be refocused if the 180° pulse in the Watergate sequence excites only some of the solute resonances, without exciting their coupling partners (Mori et al., 1994).

Another variant of the Watergate sequence uses a pulse train of six hard pulses separated by short free precession delays to replace the 90°(selective)–180°–90°(selective) sequence (Sklenář et al., 1993). The relative amplitudes of the pulses are 3:9:19:19:9:3. This 3-9-19 sequence has the advantage of robustness: no amplitudes and phases of selective pulses have to be optimized to achieve good water suppression. On the other hand, the excitation profile is no longer uniform, with a broad zero-excitation region around the water and at the ends of the spectrum.

4.1.3. Diffusion Filter
One of the simplest diffusion filters uses the spin-echo sequence $\tau$–180°–$\tau$, with a strong PFG during each of the delays $\tau$ (van Zijl and Moonen, 1990; Wider et al., 1994; Wu et al., 1995). Water and solute magnetization defocus during the first gradient and refocus during the second. For sufficiently long and strong gradients, the diffusion of the water molecules during the spin-echo sequence prevents the complete refocusing of the water magnetization, whereas the magnetization of the larger, more slowly diffusing solute is refocused more completely. This results in a preferential suppression of the water signal. Diffusion filters are typically longer than 10 ms, causing significant loss of magnetization by transverse relaxation and impure phases by evolution under scalar couplings. Their main advantages are a uniform excitation profile, which allows the detection of solute signals even under the water resonance, and the possibility of suppressing multiple solvent signals simultaneously (Ponstingl and Otting, 1997a).

4.2. Selective Water Excitation
The NMR signals of all hydration water molecules in proteins and DNAs are at the same frequency, because bound water and bulk water exchange rapidly on the NMR time scale (milliseconds). Consequently, all water–solute cross peaks are observed in two-dimensional NOESY and ROESY experiments in a single cross section. Although the intrasolute cross peaks in the two-dimensional spectra may be helpful for assigning the water–solute cross peaks, the intermolecular water–solute cross peaks could be recorded with better sensitivity and in a shorter experimental time by selectively recording the cross section of interest in a one-dimensional experiment using selective water excitation. Since two-dimensional spectra can be recorded in a few hours, one-dimensional versions, which may be more complicated to set up, do not provide important time savings. Selective water excitation is thus most important for recording two-dimensional analogs of three-dimensional NMR experiments, which would take days to record with adequate resolution. Studying biomolecular hydration by three-dimensional experiments allows the assignment of the water–solute cross peaks in crowded spectral regions. In these experiments, the magnetization transfer from water to the solute is followed by a second mixing time during which the magnetization is transferred further to other solute spins through scalar couplings or NOEs (Otting et al., 1991b; Holak et al., 1992). The two-dimensional analogs with selective water excitation can be considered as two-dimensional experiments in which the starting magnetization is obtained by the prior water–solute magnetization transfer.

The selective excitation of the water is complicated by the phenomenon of radiation damping (Abragam, 1961). Radiation damping is caused by the interaction of the precessing magnetization with the detection coil of the probehead. The current induced in the coil acts back on the precessing magnetization like a conventional radio-frequency pulse, causing a rotation of the precessing magnetization toward the positive z-axis. Consequently, transverse magnetization decays more rapidly than one would expect from relaxation. For inverted, longitudinal magnetization, any minor residual transverse component of the magnetization triggers radiation damping, increasing the amount of transverse magnetization until the magnetization passes through the transverse plane. Thus, the FID of the water signal after a 180° pulse grows and decays, with an envelope reminiscent of a Gaussian function. This envelope is a direct measure of the current induced in the coil; i.e., it represents the pulse shape acting back on the water magnetization (Otting and Liepinsh, 1995b). On a 600-MHz NMR spectrometer, radiation damping can turn the water magnetization from the negative to the positive z-axis within 50 ms. Thus, no selective pulse can effectively excite the water in the presence of radiation damping if it is longer than 50 ms. Radiation damping is proportional to the intensity of the NMR signal and to the quality factor Q of the probehead. In practice, radiation damping is important only for probeheads with a high quality factor, as they are common at frequencies above 400 MHz. In a dilute aqueous solution of a biomolecule, the water resonance is prone to radiation damping, but the resonances of the biomolecule are not.
The selectivity of radiation damping can be assessed quantitatively from the envelope of the FID observed after a 180° pulse, which describes the shape of the selective pulse arising from the current induced in the coil.
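The growth and decay of the water FID after a 180° pulse can be sketched with the classical radiation-damping solution of the Bloch equations without relaxation, in which the angle θ of the magnetization from the positive z-axis follows tan(θ/2) = tan(θ0/2)·exp(−t/T_rd). The use of this textbook model, the damping time constant of 10 ms, and the residual tilt of 0.01 rad after the inversion pulse are assumptions for illustration and are not values taken from the text.

```python
import math


def tilt_angle(t, theta0, t_rd):
    """Angle (rad) between the magnetization and +z at time t, without relaxation."""
    return 2.0 * math.atan(math.tan(theta0 / 2.0) * math.exp(-t / t_rd))


def transverse_envelope(t, theta0, t_rd):
    """Transverse magnetization (FID envelope) relative to the full magnetization."""
    return math.sin(tilt_angle(t, theta0, t_rd))


def time_of_maximum(theta0, t_rd):
    """Time at which the magnetization passes through the transverse plane."""
    return t_rd * math.log(math.tan(theta0 / 2.0))


if __name__ == "__main__":
    t_rd = 0.010                 # hypothetical radiation damping time constant, s
    theta0 = math.pi - 0.01      # nearly perfect inversion (hypothetical)
    t_max = time_of_maximum(theta0, t_rd)
    print(f"envelope maximum after {t_max * 1e3:.1f} ms")
    for t in (0.0, 0.5 * t_max, t_max, 1.5 * t_max, 2.0 * t_max):
        print(f"t = {t * 1e3:6.1f} ms  envelope = {transverse_envelope(t, theta0, t_rd):.3f}")
```

In this model the envelope is a hyperbolic secant centered on the time of maximum, which resembles a bell-shaped curve. The delay before the maximum grows only logarithmically as the inversion becomes more perfect.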
In the following, different selective water excitation schemes are discussed using one-dimensional NOESY experiments as examples. The experiments also serve as examples of different solvent suppression schemes.
4.2.1. Selective Water Excitation by a 90° Pulse

The simplest one-dimensional experiment would be a NOESY experiment, where the excitation pulse and the evolution time are replaced by a selective 90° pulse at the water frequency (Fig. 4A). As discussed, a long, selective 90° pulse does not provide good sensitivity in the presence of radiation damping. Nonetheless, straight selective or semiselective 90° pulses have been used for water excitation in a few examples (Fig. 4B–E). With isotope-labeled samples, selective water excitation can be achieved by the use of a heteronuclear filter which suppresses the signals from the labeled sample. Such experiments are discussed in this section, too.

It has been noted (Mori et al., 1996a) that E-BURP pulses (Geen and Freeman, 1991) can be used with higher selectivity than, e.g., Gaussian pulses. The reason is that the amplitude of an E-BURP pulse grows toward its end, provoking less radiation damping during the initial half of the pulse. Yet, the recommended pulse duration was not longer than 16 ms, corresponding to the excitation of a fairly wide band (Mori et al., 1996a).

In the experiment of Fig. 4B, radiation damping during the rest of the pulse sequence was avoided by the use of a PFG after the selective excitation pulse to defocus the water magnetization (Mori et al., 1994). The subsequent 90° pulse generates longitudinal water magnetization, a second PFG is used to destroy transverse magnetization, and a selective 90° pulse is used to generate transverse coherence for the NH protons. The magnetization is refocused by a gradient which is combined with a second gradient which is part of the following Watergate sequence. This Watergate variant contains only a single semiselective 180° refocusing pulse on the NH protons which does not excite the water. The experiment was proposed for the detection of chemical exchange between water and amide protons in proteins.
Because of the use of a defocusing gradient after the selective water excitation, which prevents the generation of purely longitudinal magnetization by the following 90° pulse, the sensitivity of the experiment is at most half of the sensitivity of the hypothetical experiment of Fig. 4A.

The experiment of Fig. 4C (Mori et al., 1996a) eliminates the sensitivity disadvantage of the experiment of Fig. 4B. All the excited magnetization is longitudinal during the mixing time $\tau_m$. The selection of longitudinal magnetization is supported by a short strong PFG at the beginning of the mixing time. A weak gradient during the rest of the mixing time prevents the formation of transverse water magnetization which could trigger radiation damping. The mixing time is followed by a hard 90° pulse and a water-flipback pulse, which is a selective pulse on the water applied with a phase so that the water magnetization ends up along the positive z-axis. Residual transverse water magnetization is suppressed by the following Watergate sequence. The water-flipback greatly enhances the sensitivity, since the recovery of equilibrium magnetization of water by relaxation is slow. The water-flipback pulse cannot be phase-cycled independently of the first excitation pulse. This is not expected to lead to artifacts, however, since the magnetization of interest has been transferred from the water to the solute during the mixing time, where it is no longer affected by the selective flipback pulse.

Since radiation damping prevents the use of a truly selective, long 90° excitation pulse, a spin-echo sequence was proposed to reduce those signals from the macromolecules that are excited by the selective 90° pulse (Fig. 4D) (Mori et al., 1996b). The spin-echo filter relies on the shorter transverse relaxation times of macromolecules compared to water. A PFG is applied both at the start and at the end of the spin-echo delay to prevent loss of water magnetization by radiation damping during the spin-echo sequence. Those PFGs must not be too intense to avoid loss of water magnetization by diffusion. A spin-echo delay of 40 ms was proposed for the use with proteins, where $\mathrm{H}^\alpha$ resonances overlap with the water signal. Although this delay is too short for complete relaxation of the protein magnetization, an additional suppression factor is provided by the scalar coupling evolution of the $\mathrm{H}^\alpha$ protons with respect to couplings to amide and $\mathrm{H}^\beta$ protons, which channels much of the magnetization into antiphase coherences which no longer lead to longitudinal magnetization during the NOESY mixing time. It was recommended to use two filter delays, 40 and 60 ms, to check the suppression of the solute resonances (Mori et al., 1996b).
Much longer filter delays may result in substantial loss of water magnetization, since the effective $T_2$ relaxation time of water protons in solutions of solutes with exchangeable protons can easily be shorter than 200 ms due to exchange broadening.

If ¹³C-labeled proteins are available, the selective excitation problem can be overcome by purging the signals of the protein after semiselective water excitation.
Purging of the magnetization of ¹³C-bound protons is particularly efficient, since the $^1J_{\mathrm{CH}}$ coupling constant is large and of very similar size for different CH groups. The experiment of Fig. 4E (Grzesiek and Bax, 1993a, 1993b) uses a short spin-echo sequence of selective pulses on the water resonance. The delay is chosen so that any magnetization of ¹³C-bound protons excited by the first selective pulse precesses during an effective delay of $1/(2\,{}^1J_{\mathrm{CH}})$, at which time a 90°(¹³C) pulse converts the antiphase magnetization into unobservable two-spin coherence. Residual in-phase magnetization defocuses again during the second delay into antiphase magnetization, which is also converted into unobservable two-spin coherence by the second 90°(¹³C) pulse. If water–protein NOEs are to be observed, the constant-time HSQC sequence with water-flipback following the NOESY mixing time is an efficient way of measuring the NOEs in a two-dimensional spectrum. The selective 90° water-flipback pulse (the seventh proton pulse in the pulse sequence) is phase-cycled together with the phase of the first selective 90° excitation pulse to align the water magnetization along the z-axis in each scan. The following pulses effectively rotate the water magnetization back to the positive z-axis before the detection period $t_2$.

The magnetization of solute protons not bound to ¹³C is not removed by the pulses at the beginning of the pulse sequence. In this case, it was proposed to distinguish between intramolecular NOEs and intermolecular water–solute interactions by a control experiment identical to that of Fig. 4E, except that it is preceded by selective water irradiation during the interscan relaxation delay until 200 ms before the first selective 90° pulse (Grzesiek et al., 1994). Water magnetization is removed by the selective water irradiation and does not recover very much during the following 200-ms delay. In contrast, the magnetization of the solute is either not affected by the very selective water preirradiation or it is largely replenished by intraprotein NOEs during the 200-ms delay. Therefore, water–solute NOEs are strongly suppressed in the control experiment, whereas intrasolute NOEs are much less affected.

When the solute is 100% labeled with both ¹⁵N and ¹³C, the scalar coupling evolution of the protons by the large one-bond $^1J_{\mathrm{NH}}$ and $^1J_{\mathrm{CH}}$ couplings can be used to purge the magnetization from the protons bound to ¹⁵N and ¹³C, thus selectively retaining the water magnetization. This principle was implemented in the HMQC experiment of Fig. 4F, which was designed for the observation of water–amide proton cross peaks (Gemmecker et al., 1993). After a nonselective 90° excitation pulse, the magnetization evolves under scalar couplings with respect to ¹⁵N and ¹³C. The 90°(¹⁵N) and 90°(¹³C) pulses after delays of $1/(2\,{}^1J_{\mathrm{NH}})$ and $1/(2\,{}^1J_{\mathrm{CH}})$, respectively, turn antiphase magnetization into unobservable two-spin coherence. The filters are applied twice with slightly different delays to improve the purging quality for different $^1J_{\mathrm{NH}}$ and $^1J_{\mathrm{CH}}$ coupling constants. The original experiment used neither water-flipback nor any special precautions to suppress radiation damping throughout the entire mixing time $\tau_m$. Furthermore, the magnetization of hydroxyl and sulfhydryl protons is not filtered out, even if their chemical shifts are resolved from the water resonance (Knauf et al., 1996).
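The benefit of applying a heteronuclear purge filter twice with slightly different delays can be sketched as follows. The example assumes that each filter element leaves an in-phase fraction cos(πJτ) of the magnetization of a proton with one-bond coupling J, and it ignores relaxation and pulse imperfections. The coupling range of 125–160 Hz and the tuning values of 130, 140, and 155 Hz are hypothetical example numbers, as are the function names.

```python
import math


def residual_in_phase(j_hz, delays_s):
    """Fraction of in-phase proton magnetization surviving a series of purge elements.

    Each element of duration tau leaves cos(pi * J * tau) in phase (assumed model);
    the antiphase part is taken to be removed completely by the 90-degree
    heteronuclear pulse that follows the delay.
    """
    fraction = 1.0
    for tau in delays_s:
        fraction *= math.cos(math.pi * j_hz * tau)
    return fraction


def worst_case_residual(delays_s, j_values_hz):
    """Largest surviving in-phase fraction over a set of coupling constants."""
    return max(abs(residual_in_phase(j, delays_s)) for j in j_values_hz)


if __name__ == "__main__":
    couplings = [125.0 + 5.0 * i for i in range(8)]          # 125 ... 160 Hz
    single = [1.0 / (2.0 * 140.0)]                           # one filter, tuned to 140 Hz
    double = [1.0 / (2.0 * 130.0), 1.0 / (2.0 * 155.0)]      # two filters, different tuning
    print(f"single filter, worst case: {worst_case_residual(single, couplings):.3f}")
    print(f"double filter, worst case: {worst_case_residual(double, couplings):.3f}")
```

Under these assumptions, the two differently tuned elements suppress the in-phase magnetization far more uniformly over the range of couplings than a single element tuned to the middle of the range.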
The experiment of Fig. 4G makes use of the radiation damping effect itself to achieve a long, selective 90° pulse of the water (Otting and Liepinsh, 1995b). A 180° inversion pulse is followed by a gradient to remove any residual transverse magnetization. A long selective pulse of a very small nominal flip angle is used to generate a small amount of transverse magnetization, triggering radiation damping. Once the water magnetization passes through the transverse plane, it is picked up by the following 90° pulse and converted into longitudinal magnetization. The magnetization of the solute is not affected by the radiation damping unless the signals are very close to the water resonance. In the original sequence, a train of homospoil pulses was used to suppress radiation damping during the NOESY mixing time, and the water resonance was suppressed by a spin-lock purge pulse. The radiation damping field generating the selective 90° water pulse is similar to that of a half-Gaussian pulse, which is about as selective as the Gaussian pulse (Friedrich et al., 1987). By varying the intensity of the selective pulse, the duration of the selective water excitation can be adjusted also on probeheads of not too high quality factor, where radiation damping alone would produce unacceptably long pulse durations. On a 600-MHz NMR spectrometer, radiation damping produces a 90° flip angle within about 25 ms.
The experiment of Fig. 4H (Wider et al., 1996) uses the same principle in a difference experiment. In the first experiment, the nonselective 90° excitation pulse is followed by a PFG to destroy all transverse magnetization. In the second experiment, the PFG is applied only after some delay which allows for nearly complete return of the water magnetization to the The solute magnetization, which is not affected by radiation damping, remains transverse until the PFG in either experiment and is therefore subtracted when the difference between both
experiments is calculated. Only every second experiment contributes to the desired signal in the difference experiment, leading to a twofold reduction in sensitivity. The radiation damping field generated by the selective 90° water pulse is similar to that of a time-reversed half-Gaussian pulse. The experiment of Fig. 4I is most similar to that of Fig. 4A. Instead of a continuous selective 90° excitation pulse, a time-shared 90° pulse is used with short free precession delays between the individual pulse segments as in DANTE type (Morris and Freeman, 1978) excitation (Otting and Liepinsh, 1995e). The quality
factor of the rf coil is switched high during the pulses and low during the delays. In this way, radiation damping is suppressed during the delays. Each pulse segment of the excitation pulse is more intense than the corresponding segment of a
continuous pulse of the same duration, because the overall integral of the pulse must be the same for a 90° flip angle. Therefore, the radiation damping field is more easily overcome during short pulse elements. In practice, selective 90° Gaussian pulses of 50 ms duration can be achieved in this way without significant loss of
water magnetization. The excitation sidebands produced by the DANTE-type excitation are placed outside the spectral width by setting the free precession delays shorter than the dwell time. Switching of the quality factor of the probehead requires special hardware, by which the coil can be connected to electrical ground via a rapid switch. Switch designs are available that hardly affect the sensitivity of the probehead (Anklin et al., 1995).
4.2.2. Selective Water Excitation by a 180° Pulse
The simplest conceivable NOE experiment using a selective 180° pulse for water excitation is the difference experiment sketched in Fig. 5A, where an experiment with the 180° pulse on the water resonance is subtracted from an experiment, where this 180° pulse is either absent or applied outside the spectral range of interest. Although it is a difference experiment, all scans contribute to the water–solute cross peaks, retaining the full sensitivity. As discussed before, the simple scheme does not allow for a very selective pulse in the presence of radiation damping. Nonetheless, the scheme has been used for selective water excitation with
pulse durations of up to 50 ms (Kriwacki et al., 1993). As in the experiment of Fig. 4D, the use of a diffusion filter has been proposed to help distinguish direct water–solute NOEs from intrasolute NOEs, which are less strongly affected by
diffusion during the delay (Fig. 5B) (Kriwacki et al., 1993). A different scheme for a long, water-selective 180° pulse is presented by the experiment of Fig. 5C. The experiment is a difference experiment, where the selective 180° pulse is composed of a DANTE-type series of small flip-angle pulses interleaved by short free precession delays (Böckmann and Guittet, 1996). Short bipolar gradients (Sklenář, 1995; Zhang et al., 1996) are applied during the delays to suppress radiation damping. In the second part of the difference experiment, the phase of the small flip-angle pulses is reversed in the second half of the selective excitation pulse, leading to an effective 0° flip angle for the water magnetization. The following NOE mixing time starts with a PFG to support the selection of longitudinal magnetization followed by a weak gradient throughout the mixing time to prevent radiation damping. Each bipolar gradient first defocuses and then refocuses the water magnetization. It has been shown that weak bipolar gradients of as little as 0.2 G/cm are sufficient to suppress radiation damping during the evolution time of a two-dimensional experiment (Sklenář, 1995). In the scheme of Fig. 5C, the free precession delays and thus the PFGs must be of the order of the dwell time or shorter to exclude the appearance of excitation sidebands in the spectrum. To achieve significant defocusing during
the short delay, each individual PFG must be relatively intense, yet sufficiently weak to avoid troubles from eddy currents. Figure 5D presents a scheme, where radiation damping is used to achieve a near-180° rotation of the water magnetization (Otting and Liepinsh, 1995b). Like
the scheme of Fig. 5A, the experiment is a difference experiment. Following a nonselective 160° pulse, a series of homospoil pulses or PFGs is applied in one experiment but not in the other. With the homospoil pulses, any transverse magnetization is defocused and radiation damping is suppressed. Without the homospoil pulses, the transverse component of the water magnetization triggers radiation damping, which turns the water magnetization back to the positive z-axis, while the magnetization of the solute remains unaffected as long as it precesses at frequencies different from that of the water magnetization. The effective field produced by the radiation damping resembles a Gaussian pulse. The experimental scheme of Fig. 5D yields optimum sensitivity, since almost no water magnetization is lost during the radiation damping process. In contrast, the water signal intensity observed after a selective radio-frequency pulse is always somewhat less than that observed after a nonselective pulse, mostly due to relaxation. A drawback of the excitation scheme of Fig. 5D is the poor definition of the mixing time, since the water magnetization is not longitudinal during the entire mixing time in half of the scans. As in all difference experiments based on selective 180° pulses, the water–solute NOE building up during the selective excitation scheme is not completely subtracted in the difference experiment, which may become noticeable when the excitation scheme is followed by a short ROE mixing time. Finally, it has been noted that difference experiments based on selective 180° inversion pulses tend to suffer from subtraction artifacts (Otting and Liepinsh, 1995b; Mori et al., 1996a), perhaps because of dipolar field effects (see below). The experimental schemes of Fig. 5E–G use a selective 180° refocusing pulse in the middle of a spin-echo period, during which the water magnetization is transverse.
Radiation damping is suppressed by defocusing the magnetization by a PFG applied before the selective refocusing pulse. Thus, long, selective pulses can be used without interference from radiation damping. In these schemes, magnetization transfer between water and solute during the selective excitation scheme does not result in a net magnetization transfer; i.e., the NOE or ROE mixing times in these experiments are well defined and given by It must be remembered, though, that the exchange of protons between water and solute can lead to rather short effective relaxation times of the water magnetization. Furthermore, care must be taken to adjust the phase of the selective 180° refocusing pulse. If phase-shifted by 45° relative to the phase of the hard pulses, no longitudinal magnetization is generated at the start of the mixing time. The experiment of Fig. 5E uses PFGs of opposite polarity on either side of the selective 180° pulse; i.e., the second PFG defocuses the water magnetization even
further (Dalvit, 1995; Dalvit and Hommel, 1995a). Thus, only half of the water magnetization is longitudinal during the subsequent NOE mixing time, resulting in twofold reduced sensitivity. The magnetization transferred to the solute is refocused during the Watergate scheme, which also contains PFGs of opposite polarity. The
PFGs in the Watergate sequence must be of different strength to avoid undesired echo effects. The experiment of Fig. 5F is another variant of the experiment of Fig. 5E, where the water magnetization is refocused by the PFG after the selective 180° pulse, so that all water magnetization is longitudinal during the mixing time and full sensitivity is retained (Dalvit and Hommel, 1995b). Radiation damping is suppressed during the mixing time by a weak, continuous gradient. The mixing time ends with the combination of a selective 90° pulse and a nonselective 90° pulse,
which together return the water magnetization to the positive z-axis. The following conventional Watergate sequence effectively does not excite the water resonance. Thus, high sensitivity is retained in this experiment even if the repetition rate is fast
compared to the relaxation time of the water. The excitation schemes of Figs. 5E and 5F have also been implemented in off-resonance ROESY experiments for the detection of exchange cross peaks with water (Birlirakis et al., 1996). The experiment of Fig. 5G (Wider et al., 1996) relies on a diffusion filter to separate the magnetization of the water and the solute. The selective 180° refocusing pulse is relatively short (4.1 ms) and therefore of little selectivity. The selectivity of this pulse is, however, not very important, since the water signal is selected based on the different diffusion rates of water and solute rather than frequency. The experiment is a difference experiment. In the first experiment, all magnetization
excited by the initial nonselective 90° excitation pulse is defocused by the following PFG. Only the magnetization refocused by the following selective 180° pulse is refocused by the subsequent PFG. Only little water magnetization is refocused, however, because the PFGs are applied with very high amplitude (i.e., 115 G/cm), leading to efficient suppression of the water magnetization by diffusion. In the second experiment, the first pair of PFGs is applied with weak amplitude (i.e., 10 G/cm) so that radiation damping is suppressed but magnetization losses by diffusion are unimportant. The difference between both experiments yields the cross peaks
with the water resonance and suppresses the intrasolute cross peaks between nonlabile or slowly exchanging protons. Since the diffusion of the solute during the excitation scheme also affects the solute magnetization, the total gradient power in each of the experiments is kept constant; that is, weak PFGs are used during the
Watergate sequence, if strong PFGs were used during the excitation scheme, and vice versa. In this way, the Watergate sequence acts as a diffusion filter like the excitation scheme. For comparable diffusion filtering effects, the duration of the
excitation scheme is the same as the duration of the Watergate sequence. The advantage of the experiment is the suppression of intrasolute NOEs even if the solute’s resonances are at exactly the same chemical shift as the water. As in all
other experiments of Figs. 4 and 5, however, exchange-relayed NOEs (Fig. 2B) are not suppressed. A disadvantage is the twofold loss in sensitivity, since water magnetization is retained only in every second experiment. Furthermore, the
experiment is prone to eddy current artifacts from the strong gradients. Finally, the
duration of the Watergate sequence is relatively long to match the duration of the excitation sequence.
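The diffusion filter of Fig. 5G can be illustrated with a rough numerical sketch. It uses the Stejskal–Tanner expression for a pair of rectangular gradient pulses, which is a standard textbook result and is not taken from this chapter. Only the gradient amplitudes (115 and 10 G/cm) come from the text; the gradient duration, the diffusion delay, and both diffusion coefficients are assumed values chosen purely for illustration.

```python
import math

GAMMA_H = 2.6752e8  # 1H gyromagnetic ratio, rad s^-1 T^-1


def diffusion_attenuation(g_gauss_per_cm, delta, big_delta, diff_coeff):
    """Fraction of magnetization refocused by a gradient pair.

    Stejskal-Tanner expression for rectangular gradient pulses:
    exp(-(gamma * G * delta)^2 * (Delta - delta / 3) * D)
    """
    g_tesla_per_m = g_gauss_per_cm * 0.01  # 1 G/cm = 0.01 T/m
    q = GAMMA_H * g_tesla_per_m * delta
    return math.exp(-q ** 2 * (big_delta - delta / 3.0) * diff_coeff)


# Assumed timing (hypothetical, not given in the text)
DELTA = 2e-3      # gradient pulse duration in s
BIG_DELTA = 6e-3  # separation of the two gradient pulses in s

# Assumed diffusion coefficients in m^2/s
D_WATER = 2.3e-9     # bulk water near room temperature
D_PROTEIN = 1.0e-10  # order of magnitude for a small protein

water_strong = diffusion_attenuation(115.0, DELTA, BIG_DELTA, D_WATER)
protein_strong = diffusion_attenuation(115.0, DELTA, BIG_DELTA, D_PROTEIN)
water_weak = diffusion_attenuation(10.0, DELTA, BIG_DELTA, D_WATER)

print(f"water, 115 G/cm:   {water_strong:.3f}")
print(f"protein, 115 G/cm: {protein_strong:.3f}")
print(f"water, 10 G/cm:    {water_weak:.3f}")
```

Under these assumptions the strong gradient pair removes nearly all of the water magnetization while leaving most of the solute magnetization, and the weak pair leaves the water almost untouched. The actual numbers depend strongly on the assumed timing.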
4.3. Nonselective Experiments
Nonselective experiments have the advantage that spectral artifacts such as and are readily identified, whereas they would appear as subtraction artifacts in experiments using selective water excitation. Furthermore, selective pulse shapes tend to produce negative excitation sidelobes (Hajduk et al., 1993), requiring special care in later spectral analysis. Otherwise, NOEs from solute protons excited with negative sign could easily be interpreted as negative NOESY cross peaks with the water. Nonselective experiments of higher dimensionality, however, tend to be less sensitive than selective experiments. Since quadrature detection in the indirect dimension requires that the phase of the first pulse be incremented in steps of 90°, the water magnetization cannot be channeled into longitudinal magnetization during the NOE mixing time for all FIDs as in the analogous experiments of lower dimensionality which employ selective water excitation schemes. Thus, water-flipback schemes cannot readily be implemented. Much of the sensitivity lost in experiments scrambling the water magnetization can, however, be recovered by the use of relaxation reagents which shorten the relaxation time of the water and therefore allow faster repetition rates (Otting and Liepinsh, 1995c). One of these is Gd-diethylenetriamine pentaacetic acid-bismethylamide [Gd(DTPA-BMA)], a nonionic relaxation reagent which is routinely used in MR imaging to shorten the relaxation time of water protons. Gd(DTPA-BMA) has been shown not to bind to plasma proteins and is effective at submillimolar concentrations.
The pulse sequence of Fig. 6A shows how the water magnetization can be steered into reproducible positions during the mixing time of a NOESY experiment.
The 90° pulse preceding the mixing time is phase-shifted by 45° with respect to the first 90° excitation pulse of the pulse sequence (Driscoll et al., 1989). With the carrier at the water frequency, half the water magnetization becomes longitudinal at the start of the mixing time, while the other half becomes transverse, independent of whether the first 90° excitation pulse is applied along the x- or y-axis. In this way, the amount of water magnetization that needs to be suppressed is the same for every scan. The transverse magnetization is destroyed by the strong PFG at the start of the mixing time. Radiation damping during the rest of the mixing time is suppressed by a long, weaker gradient, and the remaining water magnetization is suppressed by some water suppression scheme, e.g., a spin-lock purge pulse or a Watergate sequence. Radiation damping during the evolution time would lead to broadening of the water signal in the indirect dimension, but can be suppressed by the use of a bipolar gradient, by which the water magnetization is first defocused and then refocused (Sklenář, 1995). Alternatively, if PFGs are not available, a Q-switch (Anklin et al., 1995) or spin-lock pulses before the first 90° pulse and after the second 90° pulse (Otting, 1994) can be used for the same purpose. Three-dimensional experiments for the observation of water–solute NOEs are straightforward extensions of the corresponding two-dimensional experiments. Only a few illustrative examples are discussed here. In three-dimensional experiments, water magnetization can be suppressed either after the first or second mixing time.
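The effect of the 45° phase shift in Fig. 6A can be checked with a simple classical vector model. The sketch below is an idealization and not part of the original pulse sequence description: it assumes ideal 90° pulses, water exactly on resonance (no precession during the evolution time), and no relaxation or radiation damping. The function names are hypothetical.

```python
import math


def rotate(v, axis_phase_deg, flip_deg):
    """Rotate vector v about a transverse axis (Rodrigues formula).

    axis_phase_deg is the pulse phase: 0 = x-axis, 90 = y-axis.
    The rotation is right-handed about that axis.
    """
    phi = math.radians(axis_phase_deg)
    theta = math.radians(flip_deg)
    n = (math.cos(phi), math.sin(phi), 0.0)
    dot = sum(a * b for a, b in zip(n, v))
    cross = (
        n[1] * v[2] - n[2] * v[1],
        n[2] * v[0] - n[0] * v[2],
        n[0] * v[1] - n[1] * v[0],
    )
    c, s = math.cos(theta), math.sin(theta)
    return tuple(v[i] * c + cross[i] * s + n[i] * dot * (1.0 - c) for i in range(3))


def water_after_two_pulses(first_phase_deg):
    """Water magnetization after 90(first phase) followed by 90(45 degrees)."""
    m = (0.0, 0.0, 1.0)                   # equilibrium magnetization along +z
    m = rotate(m, first_phase_deg, 90.0)  # first excitation pulse
    m = rotate(m, 45.0, 90.0)             # pulse preceding the mixing time
    return m


m_first_x = water_after_two_pulses(0.0)
m_first_y = water_after_two_pulses(90.0)

for label, m in (("first pulse along x", m_first_x), ("first pulse along y", m_first_y)):
    transverse = math.hypot(m[0], m[1])
    print(f"{label}: Mz = {m[2]:+.3f}, |Mxy| = {transverse:.3f}")
```

In this idealized picture the water magnetization is split equally between a longitudinal and a transverse component, and the longitudinal component is the same for both phases of the first pulse, which is the reproducibility that the text relies on.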
Figure 6B shows a pulse sequence for a 3D NOESY–TOCSY experiment, where transverse water magnetization during the first mixing time is suppressed by a PFG during and longitudinal water magnetization is suppressed by the sequence where the free precession interval before the spin-lock purge pulse SL introduces a sine-shaped excitation profile in the frequency dimension (Otting et al., 1991b). Alternatively, the water suppression scheme can be implemented right before the detection period placing the nonuniform excitation profile in the dimension (Holak et al., 1992). The hydration of or labeled solutes is conveniently studied by 3D NOESY–HSQC experiments. HSQC experiments are not only very sensitive, but also offer simple ways of combining various water suppression schemes with the delays already present in the pulse sequence. For example, water suppression by spin-lock pulses can be incorporated into the first INEPT step of the HSQC sequence, as illustrated by the experiment of Fig. 6C (Messerle et al., 1989). With the carrier at the water frequency, the magnetization of the protons bound to precesses during the INEPT delay by 90°, while the water magnetization stays aligned along the y-axis and is defocused by the spin-lock purge pulse. Since one-bond coupling constants are very similar for different groups, the heteronuclear coherence is hardly affected by the spin-lock purge pulse, resulting in a uniform excitation profile in all dimensions. The
Watergate sequence is implemented with similar ease in the reverse INEPT step of a 3D NOESY–HSQC experiment (Fig. 6D) (Sklenář et al., 1993). Isotope-labeled samples offer the additional option to use gradients for coherence selection with the possibility to totally remove the residual water magnetization (Hurd, 1991). For example, a pair of PFGs of opposite polarity around a 180° pulse can be used to defocus the magnetization of the heteronuclear spins without dephasing the proton magnetization. The coherences of interest are refocused by a corresponding gradient applied to the proton magnetization immediately before detection (Fig. 6E). In the implementation of Fig. 6E, a factor of in sensitivity is lost by the use of gradients in an echo–antiecho mode (Keeler et al., 1994). Using an HSQC sequence with sensitivity enhancement, up to twofold better sensitivity can be obtained (Kay et al., 1992).
4.4. Dipolar Field Effects
The effective magnetic field experienced by solute and water spins depends also on the orientation of the water magnetization with respect to the main magnetic field. Thus, solute signals appear shifted by about 1 Hz, depending on whether the
bulk magnetization of the water is parallel or antiparallel to the main magnetic field (Edzes, 1990). The effect is present locally, too, i.e., if the water magnetization is parallel to the main magnetic field in some areas of the sample and antiparallel in others. Such inhomogeneous magnetization patterns arise when the magnetization is defocused by a PFG and partially converted into longitudinal magnetization by a following 90° pulse (Bowtell, 1992). The field shift can lead to subtraction artifacts in difference experiments and impure line shapes (Kubinec et al., 1996). It can be shown to cancel when PFGs are applied along the magic angle (54.7°) with respect to the main magnetic field (Warren et al., 1993). Both classical and quantum-mechanical descriptions are available for quantitative descriptions of this so-called dipolar field or demagnetization field effect (Broekert et al., 1996; Levitt, 1996; Richter et al., 1995).
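The magic-angle value quoted above can be reproduced numerically. The sketch assumes that the dipolar field of a magnetization pattern modulated along the gradient direction scales with the second Legendre polynomial (3cos²θ − 1)/2, where θ is the angle between the gradient direction and the main magnetic field. That angular dependence is the usual form found in the literature on the demagnetizing field; it is not written out in this chapter, and the function name is hypothetical.

```python
import math


def dipolar_angular_factor(theta_deg):
    """Second Legendre polynomial P2(cos theta) = (3 cos^2(theta) - 1) / 2."""
    c = math.cos(math.radians(theta_deg))
    return 0.5 * (3.0 * c * c - 1.0)


# The factor vanishes where cos^2(theta) = 1/3
magic_angle_deg = math.degrees(math.acos(1.0 / math.sqrt(3.0)))

print(f"magic angle: {magic_angle_deg:.2f} degrees")
for theta in (0.0, magic_angle_deg, 90.0):
    print(f"theta = {theta:6.2f} deg -> factor = {dipolar_angular_factor(theta):+.3f}")
```

The factor is largest for gradients parallel to the main field, changes sign for perpendicular gradients, and is zero at the magic angle.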
5. APPLICATIONS
5.1. Studies of Protein Hydration
After initial reports on intermolecular water–peptide NOEs observed in 1D NOE difference experiments with angiotensin II (Pitner et al., 1974) and oxytocin (Glickson et al., 1976), hydration studies by intermolecular NOEs do not seem to have been pursued any further, perhaps because of the limited sensitivity of the NMR instrumentation or the difficulty in suppressing subtraction artifacts in the 1D NOE difference experiments.
The use of intermolecular NOEs for the identification of individual hydration water molecules in proteins was first demonstrated with bovine pancreatic trypsin
inhibitor (BPTI) (Otting and Wüthrich, 1989). This study used NOESY and ROESY spectra to distinguish between chemical exchange and NOE or exchange-relayed NOEs. The cross peaks that could be assigned to intermolecular water–BPTI NOEs could all be explained by NOEs with the four internal hydration water molecules buried in the interior of BPTI, which had previously been identified by X-ray crystallography in all single-crystal structures of BPTI. The cross peaks were positive in NOESY and their intensities were comparable to those of intraprotein cross peaks. It was noted that all water protons and most hydroxyl protons appeared at the chemical shift of the bulk water. Later, the exchange between hydration water and bulk water was formally verified by adding a paramagnetic shift reagent that shifts the frequency of the bulk water signal (Otting et al., 1991c). The experiment showed that the NOEs with hydration water molecules were shifted together with the bulk water signal.
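The link between the sign of a NOESY cross peak and the time scale of the underlying motion, which is used throughout this section, can be illustrated with a deliberately simple model. The sketch assumes a proton pair whose dipolar interaction is modulated with a single exponential correlation time and a Lorentzian spectral density. This is a simplification and not the chapter's Eq. (11), which should be used for quantitative work on residence times; all function names are hypothetical.

```python
import math


def spectral_density(omega, tau_c):
    """Lorentzian spectral density J(omega) = tau_c / (1 + (omega * tau_c)^2)."""
    return tau_c / (1.0 + (omega * tau_c) ** 2)


def noesy_cross_relaxation(omega, tau_c):
    """Laboratory-frame cross-relaxation rate, up to a constant: 6 J(2w) - J(0)."""
    return 6.0 * spectral_density(2.0 * omega, tau_c) - spectral_density(0.0, tau_c)


def roesy_cross_relaxation(omega, tau_c):
    """Rotating-frame cross-relaxation rate, up to a constant: 3 J(w) + 2 J(0)."""
    return 3.0 * spectral_density(omega, tau_c) + 2.0 * spectral_density(0.0, tau_c)


def noesy_peak_sign(omega, tau_c):
    """Cross-peak sign relative to a positive diagonal (opposite to the rate)."""
    return "positive" if noesy_cross_relaxation(omega, tau_c) < 0.0 else "negative"


def noesy_crossover_tau(field_mhz):
    """Correlation time at which 6 J(2w) = J(0), i.e. omega * tau_c = sqrt(5) / 2."""
    omega = 2.0 * math.pi * field_mhz * 1e6
    return math.sqrt(5.0) / 2.0 / omega


omega_600 = 2.0 * math.pi * 600e6
tau_zero = noesy_crossover_tau(600.0)
print(f"NOESY sign change at 600 MHz: tau_c = {tau_zero * 1e12:.0f} ps")
for tau in (50e-12, 1e-9, 10e-9):
    print(f"tau_c = {tau * 1e9:6.2f} ns -> NOESY cross peak {noesy_peak_sign(omega_600, tau)}")
```

In this model the NOESY cross peak changes sign at a correlation time of roughly 0.3 ns at 600 MHz, while the rotating-frame rate keeps the same sign for all correlation times. This is why the comparison of NOESY and ROESY spectra carries information on the time scale of the water–protein interaction.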
BPTI was further used to develop homonuclear 3D NMR experiments for the study of protein hydration by intermolecular water–protein NOEs (Otting et al.,
1991b; Holak et al., 1992). The improved resolution in these experiments allowed the assignment of many more cross peaks. Negative NOESY cross peaks were observed for surface protons of BPTI, indicating nearly unhindered diffusion of the hydration water molecules on the protein surface. A control experiment performed with a 50 mM solution of oxytocin at 8°C showed that negative water–peptide NOESY cross peaks can be observed for all
protons (Otting et al., 1991a). At 8°C, all intrapeptide cross peaks were positive in NOESY. When the temperature was lowered to –25°C (with the addition of 40% acetone to prevent freezing), the sign of the water–oxytocin NOESY cross peaks turned positive, indicating longer water residence times at the very low temperature (Otting et al., 1992). As a side result of the hydration studies, exchange cross peaks were observed between water and the hydroxyl protons of BPTI at low temperatures. Their
exchange rates were subsequently measured at 4°C as a function of pH (Liepinsh et al., 1992a). This study was later complemented by measurements of the proton exchange rates of the labile side-chain protons of lysine, arginine, threonine, serine, and tyrosine in the free amino acids in the temperature range 4–36°C and as a function of pH (Liepinsh and Otting, 1996). It was also shown that carboxyl protons of solvent-exposed side chains are not readily detected by water–polypeptide NOEs (Liepinsh et al., 1993). BPTI was also used as an example for a comparative hydration study of a protein with and without the presence of 200 mM using a modified NOE–TOCSY sequence with selective excitation by radiation damping (Fig. 5D). Unfortunately, the presence of artifacts and impure phases of the cross peaks interfered with a detailed spectral analysis (Böckmann and Guittet, 1995). The same excitation
scheme worked well for the same authors in a proton exchange study (Böckmann et al., 1996). An early protein hydration study by water–protein NOEs was performed with interleukin-1β (Clore et al., 1990). The experiment used was a 3D ROESY HMQC using a spin-lock purge pulse for water suppression. Fifteen water–protein NOEs were identified and interpreted by 11 water molecules previously detected in the single-crystal structure. Although no NOESY experiment was performed, residence times were attributed to the detected hydration water molecules based on Eq. (11) and on the fact that their NOEs were sufficiently intense for detection. Human interleukin-1β was used in a later study (Ernst et al., 1995) to detect water–protein NOEs with methyl groups in a WNOESY experiment (Fig. 4E) (Grzesiek and Bax, 1993a). NOEs were detected with methyl protons lining a hydrophobic cavity in the interior of the protein, although no water molecules had been located in this cavity in any of the crystal structures. It was argued that the lack of hydrogen-bonding partners in the cavity wall could lead to a delocalization of the hydration water molecules, which would make their observation difficult by X-ray crystallography.
In a hydration study of reduced human thioredoxin, four hydration water molecules were detected by six water–protein NOEs with the amide protons in a 3D ROESY HMQC experiment (Forman-Kay et al., 1991; Clore et al., 1990). A structure calculation was performed using these NOE distance constraints supplemented by H-bond restraints with nearby carbonyl oxygens and lower-limit distance constraints for amide protons, for which no intermolecular NOE had been observed. Only those two water molecules which were characterized by two NOEs each were located at unique sites in the protein structure. Their orientation appeared disordered. The 3D ROESY HMQC experiment (Clore et al., 1990) was further used
to study the hydration of the immunoglobulin binding domain of streptococcal protein G (Clore and Gronenborn, 1992). Two solvent-exposed water molecules were identified by three NOEs with amide protons, and their binding to the protein was modeled with bifurcated hydrogen bonds. A structure computation including internal water molecules was further performed with an FK506-binding protein–ascomycin complex (Meadows et al., 1993; Xu et al., 1993). The protein was isotope-labeled, and 11 water–protein NOEs were detected in 3D ROESY HMQC (Clore et al., 1990) and 3D NOESY HMQC experiments using a spin-lock purge pulse for water suppression. The NOEs defined three internal water molecules at 30°C. The NOE distance constraints were supplemented by 18 hydrogen-bond constraints based on the crystal structure. The resulting structures were reported to be better defined in the vicinity of the water molecules when the water molecules were explicitly included in the structure calculation. The same three internal water molecules were detected
in a later study using FK506-binding protein with the PHOGSY pulse sequence (Fig. 5E) (Dalvit and Hommel, 1995a). The possibility of detecting hydration water molecules at the interface between a DNA-binding protein and DNA by intermolecular water–protein NOEs was demonstrated for a complex between an Antennapedia homeodomain mutant and a 14-base-pair DNA duplex (Qian et al., 1993). The 3D NOESY
and spectra with water suppression by spin-lock purge pulses (Fig. 6C) (Messerle et al., 1989) were recorded with samples of the complex containing protein. Three intermolecular water–protein NOEs
were identified. The experiment (WNOESY, Fig. 4E) was first demonstrated with a complex between calmodulin and an unlabeled 13-residue peptide, where intermolecular water–protein cross peaks were observed with numerous methyl groups (Grzesiek and Bax, 1993a). The same technique was used to quantify the magnetization exchange rates between water and protein protons in a sample of calcineurin B (Grzesiek and Bax, 1993b). In the absence of a three-dimensional structure, however, direct water–protein NOEs could not be distinguished from exchange-relayed NOEs.
The WNOESY and WROESY experiments (Fig. 4E) (Grzesiek and Bax, 1993a) were also used for the detection of intermolecular water–protein cross peaks
with GATA-1 in complex with a 16-base-pair DNA duplex, for which 20 direct water–protein NOEs were reported (Clore et al., 1994). Only eight NOEs were detected in the WNOESY experiments (recorded with NOE mixing times of 60 and 100 ms), one of them with the same sign as in the WROESY experiments (which were recorded with 60-ms mixing time). NOEs present in the WROESY and absent from the WNOESY experiment were ascribed to water molecules with residence times of 200–300 ps. Curiously, numerous water–protein
NOEs were observed with solvent-exposed methyl groups with good intensities in the WROESY experiment. Usually, the water–protein NOEs with solvent-exposed methyl groups yield cross peaks of the same sign and similar intensity in NOESY and ROESY (e.g., Otting et al., 1991a; Kubinec and Wemmer, 1992b; Liepinsh et al., 1992b; Radhakrishnan and Patel, 1994a). WNOESY and WROESY experiments (Grzesiek and Bax, 1993a) were further used to detect buried water molecules in the catalytic domain of stromelysin-1 complexed with a small inhibitor (Gooley et al., 1996). Seven water–protein NOEs were reported, giving evidence for three water molecules which had also been detected in the crystal structure by X-ray crystallography. A homonuclear hydration study of horse heart ferrocytochrome c and ferricytochrome c
using 2D NOE–TOCSY and ROE–TOCSY experiments with selective water excitation by a simple, sine-shaped 90° pulse and water suppression by spin-lock purge pulses reported five (six) hydration water molecules in the interior of ferri(o)cytochrome c, one of which changed position between the different
oxidation states (Qi et al., 1994). Thirty-four NOEs defined six water molecules. Two of these had not been detected by X-ray crystallography. A water molecule was detected at the interface between HIV-1 protease and a chemically synthesized inhibitor by one NOE with an amide proton (Grzesiek et al., 1994). The assignment was based on the crystal structure. A different inhibitor, designed to replace this water molecule, was shown to abolish the intermolecular water–amide proton NOE. The water–amide proton cross-relaxation rate was quantitatively measured using WNOESY and WROESY experiments (Fig. 4E) (Grzesiek and Bax, 1993a) and found to match the internuclear distance measured in the single crystal. Hence, a residence time longer than the rotational correlation time of the complex (9 ns) was attributed to this water molecule. Corresponding cross-relaxation rate measurements were performed later to characterize the hydration of HIV-1 protease in complex with the inhibitor KNI-272 (Wang et al., 1996b) and of HIV-1 protease in complex with DMP323 (Wang et al., 1996a). Four to six water molecules with residence times ns were reported for the complex with KNI-272, but only one to three such water molecules were found at the inhibitor binding site in the complex formed with DMP323. The quantitative measurement of intermolecular water–peptide NOEs in the
turn-forming peptide SYPYD demonstrated differential solvation of the proline residue under conditions of cis and trans prolyl peptide bonds and 1.8/30°C, respectively) (Yao et al., 1994). Two-dimensional ROESY and NOESY experiments were used with spin-lock purge pulses for water suppression (Otting et al., 1991b). Reduced intermolecular NOEs were observed in the cis proline form, indicating low solvent accessibility of the proline ring in the turn structure. An NOE study of human dihydrofolate reductase in complex with methotrexate and NADPH revealed six bound water molecules, five of which were also observable in the absence of NADPH (Meiering and Wagner, 1995). The observed water molecules were highly conserved between different crystal structures. It was noted that these water molecules were buried with less than 80% solvent accessibility and had low-temperature factors in the crystal structures and at least two hydrogen bonds. Three different mutants of the protein were prepared which removed a hydrogen bond to one of the water molecules (Meiering et al., 1995). Weaker water–protein NOEs were subsequently observed for this water molecule, possibly because of a shortened residence time. The experiments used were 3D (Clore et al., 1990), 3D (Messerle et al., 1989), and corresponding two-dimensional spectra using a 10-ms hyperbolic secant 90° pulse for water excitation were recorded. A homonuclear hydration study of a ribonuclease C-peptide analog showed negative NOESY cross peaks with the water resonance for all protons of all 13 residues, as far as the signals could be resolved in 2D NOESY and 3D NOESY– TOCSY spectra, although CD spectra indicated 60% (Brüschweiler et
al., 1995). Spin-lock purge pulses (Otting et al., 1991b) were used for water suppression. The e-PHOGSY pulse sequence (Fig. 5F) was demonstrated using hen egg-white lysozyme. The detection of at least three not further specified hydration water molecules was reported (Dalvit, 1996). Like interleukin-1β, hen egg-white lysozyme contains internal hydrophobic cavities, where no hydration water was detected in the crystal structures, whereas water–protein cross peaks with the protons lining the cavity walls were detected in NOESY and ROESY experiments using spin-lock purge pulses for water suppression (Otting et al., 1997). In contrast to the experiments with interleukin-1β, only weak intermolecular NOEs were observed, suggesting partial occupancies of the cavities. Partial occupancy is further suggested by the fact that one of the cavities is so small that only a single water molecule can be accommodated at a time. Thus, the difficulty of observing this water molecule by X-ray crystallography cannot be attributed to a delocalization of the hydration water. Using unlabeled, and samples, the hydration of oxidized flavodoxin from Desulfovibrio vulgaris was studied (Knauf et al., 1996) by way of homonuclear 3D NOESY–TOCSY (Otting et al., 1991b; 3D (Clore et al., 1990) and MEXICO (Gemmecker et al., 1993) experiments. The 3D NOESY–TOCSY experiment used spin-lock purge pulses for water suppression, but was modified by an additional 4-ms water-selective 90° Gaussian pulse at the end of the NOESY mixing period. The pulse was applied with orthogonal phase relative to the following hard 90° pulse.
Its purpose was to improve the water suppression by turning any longitudinal water magnetization present at the end of the mixing time into the transverse plane with a phase so that it was not affected by the following 90° hard pulse, resulting in optimum defocusing by the following spin-lock purge pulse which was applied with orthogonal phase relative to the hard 90° pulse (Otting et al., 1991b; Knauf et al., 1996). Four hydration water molecules were defined by about 10 intermolecular water–protein NOEs, one of which lies in a bridging position between the protein and the ribityl side chain of the FMN ligand. Interestingly, some of the buried hydration water molecules reported by the single-crystal structure seemed to be absent in solution. Finally, four to five intermolecular water–protein NOEs detected in 3D (Messerle et al., 1989), 3D and the corresponding ROESY experiments of E. coli flavodoxin were used for the identification of two to three buried hydration water molecules (Ponstingl and Otting, 1997b).
5.2. Studies of DNA and RNA Hydration

An early attempt to detect intermolecular NOEs between water and DNA by two-dimensional NOESY spectra failed because the imino and amino
protons of the DNA fragment exchanged too rapidly with the water under the conditions chosen (van de Ven et al., 1988). A quantitative analysis of exchange-relayed NOEs showed that all cross peaks observed with the water resonance could be interpreted as exchange-relayed NOEs. The first study demonstrating intermolecular water–DNA NOEs appeared four years later (Kubinec and Wemmer, 1992b). Using spin-lock purge pulses to suppress the water resonance in two-dimensional NOESY and ROESY experiments (Otting et al., 1991b), it was shown that the hydration water in the vicinity of the adenine 2 protons and some of the sugar protons in the minor groove of the
self-complementary DNA fragment has sufficiently long residence times to give rise to positive water–adenine 2H cross peaks in the NOESY spectrum. Negative NOESY cross peaks were reported with the thymidine methyl protons and G12
indicating short water residence times near these protons.
Positive NOESY cross peaks observed with the terminal nucleotides of the DNA duplex were probably falsely attributed to bound hydration water molecules, since the presence of hydroxyl groups at the terminal sugar moieties provides the possibility of exchange-relayed NOEs. The same DNA fragment and pulse sequences were used in a study published
shortly after, detecting the same water molecules of the spine of hydration in the minor groove and negative NOESY cross peaks with thymidine methyl groups,
guanine 8H, and some of the protons (Liepinsh et al., 1992b). Furthermore, the fragment was studied by the same techniques, where positive NOESY cross peaks with the adenine 2 protons of the central part of the duplex indicated the presence of a spine of hydration even there.
The DNA fragment was again studied later with a sample where A5 was selectively labeled with tritium at positions 2 and 8 of the base (Kubinec et al., 1996). The level of tritium labeling was sufficient to observe intermolecular water–proton to DNA–tritium NOEs in a heteronuclear NOESY experiment which was derived from the conventional three-pulse NOESY sequence by replacing the last 90° proton pulse by a 90° tritium pulse with subsequent tritium detection. Since the water did not contain tritium, the spectrum could be recorded without water suppression. It was noted, however, that the water–proton to tritium cross peaks were mostly dispersive at short mixing times, regaining pure phase at mixing times of 100 ms or longer. It is likely that the phase distortions arose from demagnetization field effects (Edzes, 1990; Bowtell, 1992; Kubinec et al., 1996; Warren et al., 1993; Broekaert et al., 1996; Levitt, 1996; Richter et al., 1995).

Using ROESY experiments with a spin-echo water suppression sequence, the water–DNA NOEs with four different phenazine-tethered matched and mismatched DNA duplexes were measured in a study that tried to correlate the intensities of the water–DNA NOEs with imino proton exchange rates and the thermodynamic stabilities of the duplexes (Maltseva et al., 1993). The validity of
the conclusions reached in this study was perhaps compromised by the fact that the
water suppression scheme used (Bax et al., 1987; Sklenář and Bax, 1987b) had not been designed for the observation of intermolecular NOEs with water, producing markedly unequal amounts of water magnetization in even- and odd-numbered FIDs recorded with different phase increments for quadrature detection in the indirect frequency dimension. Furthermore, large exchange cross peaks were observed and the possibility of exchange-relayed NOEs was not convincingly ruled out.

NOESY and ROESY spectra of the non-self-complementary duplex, recorded using water suppression by spin-lock purge pulses (Otting et al., 1991b), confirmed the presence of a spine of hydration in the minor groove with water residence times above about 1 ns, since positive NOESY cross peaks were observed with several adenine 2 protons near the center of the duplex (Fawthrop et al., 1993). Interestingly, no hydration water molecules were observed at the central A–T step in the crystal structure of a closely
related duplex. The thymidine methyl groups showed negative NOESY cross peaks, as in all B-DNA-type duplexes studied to date.

The residence times of the water molecules of the spine of hydration in the minor groove were reported to be slightly shorter near the AT base pairs in one DNA fragment than in another, since water–adenine 2H cross peaks were absent from the NOESY spectrum of the former, but positive in the latter DNA fragment, while the corresponding ROESY cross peaks were intense in both fragments (Liepinsh et al., 1994). It was speculated that the different residence times could arise from a different minor groove width. The experiments were two-dimensional NOESY and ROESY experiments using spin-lock pulses for water suppression (Otting et al., 1991b).
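The qualitative reasoning used in these DNA studies, reading a residence-time regime off the sign of the water–solute NOESY cross peak and the presence of a ROESY cross peak, can be written down as a small decision rule. The following is only a sketch: the names `Regime` and `classify_residence` are hypothetical, the threshold of about 1 ns is the approximate value quoted in the text, and the rule assumes that exchange-relayed NOEs have already been excluded by other means.

```python
from enum import Enum


class Regime(Enum):
    """Qualitative residence-time regimes (hypothetical labels for illustration)."""
    FAST = "residence time below about 1 ns"
    INTERMEDIATE = "residence time near the NOE sign change"
    SLOW = "residence time above about 1 ns"
    UNDETERMINED = "no direct water-solute NOE evidence"


def classify_residence(noesy_sign: int, roesy_observed: bool) -> Regime:
    """Map cross-peak observations onto a residence-time regime.

    noesy_sign: +1 if the NOESY cross peak has the sign of the diagonal
                ("positive" in the text), -1 if opposite, 0 if absent.
    roesy_observed: True if a ROESY cross peak with the water is seen.

    Assumption: the cross peak is a direct NOE. An exchange-relayed NOE
    would also appear positive in NOESY and is not handled here.
    """
    if noesy_sign > 0:
        return Regime.SLOW
    if noesy_sign < 0:
        return Regime.FAST
    if roesy_observed:
        # NOESY intensity vanishes near the zero crossing of the
        # cross-relaxation rate, while the ROESY peak remains.
        return Regime.INTERMEDIATE
    return Regime.UNDETERMINED


if __name__ == "__main__":
    # Adenine 2H with an absent NOESY peak but an intense ROESY peak.
    print(classify_residence(0, True).value)
    # Thymidine methyl group with a negative NOESY cross peak.
    print(classify_residence(-1, True).value)
```

The case of an absent NOESY peak with an intense ROESY peak corresponds to the fragment for which slightly shorter residence times were inferred above.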
A subsequent study of three different DNA fragments containing TTAA and AATT segments showed that positive water–adenine 2H NOESY cross peaks can be observed also with TTAA segments (Jacobson et al., 1996). The experiments used the Q-switched water-selective 90° pulse in two-dimensional NOE–NOESY and ROE–NOESY experiments (Otting and Liepinsh, 1995c), where the water– DNA cross peaks lie on the diagonal and off-diagonal peaks assist with the assignment of the diagonal peaks. It was shown that NOEs could be observed on the diagonal free from interference with the strong exchange cross peaks of the terminal hydroxyl protons which otherwise appear in the spectral region of the resonances. Negative NOESY cross peaks were observed for base protons other than adenine 2H, most of the sugar protons, and all thymidine methyl groups. The 2H resonances of adenines next to GC base pairs also yielded mostly negative NOESY cross peaks. The conclusion of the study was that the residence time of the hydration water in the minor groove of TTAA segments depends on the nucleotide sequence context. The hydration of DNA triplexes and a parallel-stranded DNA duplex has been studied by two-dimensional NOESY and ROESY experiments using spin-lock purge pulses for water suppression (Radhakrishnan and Patel, 1994a, 1994b; Wang
and Patel, 1994). Positive water–DNA NOESY cross peaks were observed at -3 to –4°C for some of the protons lining the grooves in these unusual DNA structures. It was argued that these hydration water molecules could contribute to the conformational stability of the structures by shielding against unfavorable electrostatic interactions. A hydration study of RNA was recently performed with the fragment (Conte et al., 1996). Two-dimensional NOESY and ROESY spectra were recorded using Watergate for water suppression. Weak positive water-RNA cross peaks were observed in the NOESY spectrum with two of the adenine 2H and several protons. Since the minor groove in RNA is wider than in DNA, it was argued that groove width is less important for long water residence times than opportunities for hydrogen-bond formation. The NMR signals of the hydroxyl groups were resolved in the spectra, but gave rise to large exchange cross peaks with the water. It was therefore not trivial to exclude the possibility that the cross peaks with the nonexchangeable minor groove protons originated from exchange-relayed NOEs, in particular since the exchange cross peaks with the hydroxyl protons were about 100 times more intense than the water-RNA cross peaks with the nonexchangeable RNA protons, i.e., of similar
intensity as the diagonal peaks. The argument that only weak cross peaks were observed does not mean that these NOEs are weak, since there was also very little diagonal peak intensity for the resonances due to the rapid exchange with the water during the mixing time.
6. SUMMARY OF THE RESULTS

6.1. Residence Times
By fortuitous coincidence, the sign of the NOE cross-relaxation rate changes for water residence times in the range 0.1–1 ns. Hydration water on protein surfaces and in the minor groove of DNA exhibits residence times exactly in this time range. Thus, NOE measurements provide a tool to distinguish “slow” and “fast” water molecules on this time scale. A second fortuitous coincidence is the fact that water molecules with longer residence times are much easier to detect by water-solute NOEs than rapidly diffusing water molecules. The NOE intensities increase with the residence time until the residence time becomes longer than the rotational correlation time of the solute. Therefore, water-solute NOEs cannot discriminate between different residence times in the regime above the rotational correlation time of the solute (typically several nanoseconds). Since only a few water molecules from the hydration shell of a biomolecule are in the slow-motional regime, the water–solute NOEs provide a filter for the preferential observation of these water molecules
which usually are in more intimate contact with the solute than rapidly diffusing
water. The upper limit of the residence times of slowly exchanging hydration water molecules is in the millisecond time range. A residence time of at least about 20 ms would be required to enable the observation of a NOESY cross peak at a chemical shift separate from that of the bulk water (Otting et al., 1991c). A residence time of 1 ms would broaden the signal of the water molecule by about 300 Hz, which would be difficult to observe in the one-dimensional NMR spectrum of a biomolecule. Definitely, upper limits of 100 to 200 cannot be deduced from water–solute NOE studies as claimed (Ernst et al., 1995). Attempts to distinguish rapidly diffusing bulk water from hydration water diffusing at the rate of the macromolecule in an experiment with strong PFGs yielded an upper limit of 1 ms for the residence times of the internal hydration water molecules in BPTI at 4°C (Dötsch and Wider, 1995). Since proton exchange rates between water molecules in the bulk phase occur with rates of and faster (Meiboom, 1961), all these upper limits pertain strictly speaking only to the residence times of the water protons but not the entire water molecules. Residence times in the subnanosecond time range, as documented by negative NOESY cross peaks, must be due to the exchange of entire water molecules, unless proton exchange is very strongly catalyzed. In bulk water, proton exchange lifetimes become shorter than 1 ns at (25°C) (Meiboom, 1961). Recent work by Halle and co-workers showed that accurate residence times in the nanosecond to millisecond time range can be measured for individual hydration water molecules using nuclear magnetic relaxation dispersion (NMRD) of the water nuclei and (Denisov and Halle, 1995a, 1995b, 1995c; Denisov et al., 1996; Venu et al., 1997). The NMRD data predominantly reflect the exchange of the few hydration water molecules with extended residence times on the macromolecular solute. 
The measurements report on the entire hydration of the solute, not only in the vicinity of solute protons as the water–solute NOE measurements. Although only the relaxation times of the average water NMR signals are measured, information on individual hydration water molecules can be obtained by comparison between samples with and without solvent accessible hydration sites. Hydration sites can be rendered inaccessible by site-directed mutagenesis [for example, the internal water molecule 122 in BPTI is replaced by the hydroxyl group of Ser 36 in the mutant BPTI(G36S) (Berndt et al., 1993)] or by the addition of a ligand [for example, a drug binding to the minor groove of DNA replaces hydration water molecules from the spine of hydration (Denisov et al., 1997)]. The technique yields not only residence times but also order parameters for the solute-bound water
molecules. Furthermore, the number of water molecules bound with long residence times can be determined with good accuracy.
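Two of the numerical statements in this section can be checked with a few lines of code: the sign change of the NOE cross-relaxation rate in the range 0.1–1 ns, and the broadening of about 300 Hz for a residence time of 1 ms. The sketch below rests on simplifying assumptions that are not taken from the chapter: a single Lorentzian spectral density with one effective correlation time (standing in for the residence time), a 600 MHz proton frequency chosen for illustration, and the slow-exchange lifetime broadening 1/(πτ). All function names are hypothetical, and the rates are given only up to a common positive constant.

```python
import math


def spectral_density(omega: float, tau: float) -> float:
    """Lorentzian spectral density for a single correlation time tau (s)."""
    return tau / (1.0 + (omega * tau) ** 2)


def sigma_noe(tau: float, larmor_hz: float) -> float:
    """Laboratory-frame cross-relaxation rate, up to a positive constant.

    A positive value corresponds to a negative NOESY cross peak (fast
    regime), a negative value to a positive NOESY cross peak (slow regime).
    """
    omega = 2.0 * math.pi * larmor_hz
    return 6.0 * spectral_density(2.0 * omega, tau) - spectral_density(0.0, tau)


def sigma_roe(tau: float, larmor_hz: float) -> float:
    """Rotating-frame cross-relaxation rate (same constant); never changes sign."""
    omega = 2.0 * math.pi * larmor_hz
    return 3.0 * spectral_density(omega, tau) + 2.0 * spectral_density(0.0, tau)


def noe_zero_crossing(larmor_hz: float) -> float:
    """Correlation time at which sigma_noe vanishes: omega * tau = sqrt(5) / 2."""
    omega = 2.0 * math.pi * larmor_hz
    return math.sqrt(5.0) / (2.0 * omega)


def lifetime_broadening_hz(residence_time_s: float) -> float:
    """Slow-exchange line broadening (full width at half height) in Hz."""
    return 1.0 / (math.pi * residence_time_s)


if __name__ == "__main__":
    field = 600e6  # assumed proton Larmor frequency in Hz
    print(f"NOE sign change at tau = {noe_zero_crossing(field) * 1e9:.2f} ns")
    print(f"Broadening for 1 ms residence time = {lifetime_broadening_hz(1e-3):.0f} Hz")
```

Under these assumptions the sign change falls near 0.3 ns at 600 MHz, inside the 0.1–1 ns window quoted above, and a 1 ms residence time gives a broadening of roughly 318 Hz. The rotating-frame rate keeps its sign for all correlation times, which is why ROESY cross peaks remain observable where the NOESY cross peaks vanish.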
6.2. Structural Relevance
Since intermolecular water–solute NOEs single out buried hydration water molecules with residence times longer than 1 ns, it is tempting to believe that these water molecules are of importance for the three-dimensional structure of the biomolecules. The energetic implications of slowly and rapidly exchanging hydration water molecules are, however, not so clear. A slowly exchanging hydration
water molecule may not be “more stably” bound than a water molecule that is more easily exchanged by another water molecule. This is particularly apparent for water molecules in hydrophobic cavities where hydrogen-bonding partners are missing. The problem is also well illustrated by the hydration water molecules mediating specific contacts in the trp repressor/operator-DNA cocrystal structure (Otwinowski et al., 1988). Some of these hydration water molecules appear to be approximately conserved in the single-crystal structure of the free DNA (Shakked et al., 1994). In the free DNA, these water molecules are highly solvent exposed and are probably characterized by residence times in the subnanosecond time range. Thus, rapidly exchanging water molecules may be structurally important, just as slowly exchanging water molecules may be of little structural relevance. The observation
of hydration water molecules with residence times longer than 1 ns in the interior of proteins and in the minor groove of DNA is primarily a consequence of the fact that these water molecules are buried or at least largely protected from access to the bulk solvent. Hydration water molecules buried inside a protein or located in the minor groove of DNA are almost invariably also detectable by X-ray crystallography, where they are often characterized by low B-factors. These water molecules are thought to play a structural role, when the crystal structure shows several well-defined hydrogen bonds with the solute. Usually, many more hydration water molecules are detected by X-ray crystallography than by NOE experiments, but not all hydration water molecules of the first shell of hydration are detected. This is readily explained by the fact that the electron density of the water molecules must be spatially well localized in order to be observable by X-ray crystallography. Thus, continuously diffusing or disordered water escapes X-ray detection, whereas rapidly exchanging water molecules may be observable if they exchange in a
“hopping” motion. It is not surprising that many of the hydration water molecules detected by X-ray crystallography contact one or two solute molecules in the crystal lattice. Negative water–solute NOESY cross peaks observed in solution show that most of these hydration water molecules have residence times of less than 1 ns in solution. More puzzling are reports on water molecules with residence times longer than about 1 ns near solvent-exposed methyl groups (Ernst et al., 1995; Clore et al., 1994). Rapid rotation of water pentagons about the methyl groups has been proposed to explain the fact that these water molecules were not observable in single-crystal X-ray studies, but molecular dynamics simulations do not support this interpretation.
The molecular dynamics (MD) of a protein or DNA molecule in water can be simulated with explicit water molecules for up to a few nanoseconds. The residence times of the hydration water molecules on protein surfaces predicted by the MD simulations range from tens of picoseconds to a few hundred picoseconds (Ahlström et al., 1988; Brunne et al., 1993; Knapp and Mügge, 1993; Billeter et al., 1996).
6.3. Future Perspectives
Currently, water–solute NOEs can be observed for the entire surface of small peptides, but not for protein or DNA molecules. With improved sensitivity of the
NMR equipment, intermolecular water–solute NOEs should become observable for all solvent-exposed protons of biological macromolecules. Equipment with improved sensitivity would further allow the use of heteronuclear NOEs to study the hydration of chemical groups devoid of protons, such as carbonyl groups. The feasibility in principle of such studies has been demonstrated with small organic molecules (e.g., Seba and Ancian, 1990; Canet et al., 1992). The distinction between direct and exchange-relayed NOEs continues to be a problem if the three-dimensional conformation of the solute is not known. Theoretically, diffusion filters could be used to separate the signal of rapidly diffusing bulk water from that of hydration water diffusing at the rate of the solute, but much stronger PFGs would have to be applied in a much shorter time span than is technically possible today. The attempt to identify direct NOEs in the presence of exchange-relayed NOEs by a quantitative measurement of the NOE cross-relaxation rates failed (Wang et al., 1996a). Usually, many water–solute cross peaks are observed (Fig. 7), but only a few of them can be attributed unambiguously to direct water–solute NOEs, and the number of water molecules identified by these is smaller still. Automation of the spectral analysis will greatly enhance the attractiveness of
the technique. The detection of hydration water molecules at the interface between a protein and a small organic ligand molecule would suggest the design of a new ligand which could bind with higher affinity by replacing the water molecules by functional groups (Grzesiek et al., 1994). It may be conceived that future MD simulations will cover a sufficiently long time span to allow the calculation of water–solute NOEs with all protons of the solute, which will allow the further refinement of the force fields describing biomolecular hydration and lead to a model in quantitative agreement with the experimental NMR data.
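The remark in this section that diffusion filters would need much stronger PFGs applied in a much shorter time can be illustrated with the Stejskal–Tanner relation for a pulsed-field-gradient echo, S/S0 = exp(−γ²g²δ²(Δ − δ/3)D). The sketch below is an illustration under stated assumptions rather than a calculation from the chapter: the bulk-water diffusion coefficient of 2.3e-9 m²/s, the 99% suppression target, the two gradient durations, and the choice Δ = δ with rectangular gradient pulses are all assumed here, and the function names are hypothetical.

```python
import math

GAMMA_1H = 2.675e8  # proton gyromagnetic ratio in rad s^-1 T^-1


def st_attenuation(g: float, delta: float, big_delta: float, diff: float) -> float:
    """Stejskal-Tanner signal ratio S/S0.

    g: gradient strength (T/m), delta: gradient pulse length (s),
    big_delta: diffusion delay (s), diff: diffusion coefficient (m^2/s).
    """
    b = (GAMMA_1H * g * delta) ** 2 * (big_delta - delta / 3.0)
    return math.exp(-b * diff)


def gradient_for_attenuation(target: float, delta: float,
                             big_delta: float, diff: float) -> float:
    """Gradient strength (T/m) required to reach S/S0 == target."""
    denom = diff * (GAMMA_1H * delta) ** 2 * (big_delta - delta / 3.0)
    return math.sqrt(-math.log(target) / denom)


if __name__ == "__main__":
    d_water = 2.3e-9  # assumed bulk-water diffusion coefficient, m^2/s
    target = 0.01     # assumed goal: suppress 99% of the bulk-water signal
    for delta in (1e-3, 1e-6):  # assumed gradient durations, s
        g = gradient_for_attenuation(target, delta, delta, d_water)
        print(f"delta = {delta:.0e} s -> required gradient = {g:.3g} T/m")
```

With these assumed values, a millisecond filter needs a gradient of roughly 6.5 T/m, and because the required strength scales as δ^(−3/2) when Δ = δ, a microsecond filter would need a gradient of order 10^5 T/m. Residence times in the nanosecond range would push the requirement higher still.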
7. CONCLUSION

What have we learned from hydration studies of biological macromolecules using water–solute NOEs? Perhaps the most interesting result is the short residence
times of the hydration water molecules on protein and DNA surfaces. Hydration–dehydration events would not be expected to be rate-limiting steps in protein folding and intermolecular recognition. Most of the water molecules detected by X-ray crystallography were shown not to be kinetically stable in solution. The possibility of obtaining this information for many individual water molecules in aqueous solution is unique to the NOE method. The hydration studies of proteins and other biological macromolecules by intermolecular water–solute NOEs certainly triggered the development of numerous new pulse sequences dedicated to the detection of intermolecular water–solute cross peaks. In the field of selective water excitation, the experiments with the most colorful acronyms are perhaps not the most attractive in practice. Yet the ideas developed in the context of biomolecular hydration studies may prove invaluable in the development of pulse sequences applicable to the study of NOEs between biological macromolecules and organic cosolvents in aqueous solutions. The first NOE studies of protein–organic solvent interactions are currently emerging (Liepinsh and Otting, 1997; Ponstingl and Otting, 1997a). They may significantly enhance our understanding of altered enzyme specificity observed in nonaqueous environments and provide a tool for rational drug design.
NOTE. Abergel et al. recently demonstrated an elegant modification of the selective excitation scheme of Fig. 5D, where an electronic feedback circuit is used to eliminate or enhance radiation damping at any time during the pulse sequence (Abergel, D., Louis-Joseph, A., and Lallemand, J.-Y., 1996, J. Biomol. NMR 8:15).
ACKNOWLEDGMENTS. The author thanks Dr. Edvards Liepinsh for the spectrum of Fig. 7 and helpful discussions, Dr. Bertil Halle for a critical reading of the manuscript, and the Swedish Natural Science Research Council for financial support.

REFERENCES

Abragam, A., 1961, Principles of Nuclear Magnetism, Clarendon Press, Oxford. Ahlström, P., Teleman, O., and Jönsson, B., 1988, J. Am. Chem. Soc. 110:4198. Anklin, C., Rindlisbacher, M., Otting, G., and Laukien, F. H., 1995, J. Magn. Reson. B 106:199. Ayant, Y., Belorizky, E., Fries, P., and Rosset, J., 1977, J. Phys. (Paris) 38:325. Bax, A., and Davis, D. G., 1986, J. Magn. Reson. 65:355.
Bax, A., Sklenář, V., Clore, G. M., and Gronenborn, A. M., 1987, J. Am. Chem. Soc. 109:6511. Berndt, K. D., Beunink, J., Schröder, W., and Wüthrich, K., 1993, Biochemistry 32:4564. Billeter, M., 1995, Prog. NMR Spectrosc. 27:635. Billeter, M., Güntert, P., Luginbühl, P., and Wüthrich, K., 1996, Cell 85:1057.
Birlirakis, N., Cerdan, R., and Guittet, E., 1996, J. Biomol. NMR 8:487. Böckmann, A., and Guittet, E., 1995, J. Chim. Phys. 92:1923. Böckmann, A., and Guittet, E., 1996, J. Biomol. NMR 8:87. Böckmann, A., Penin, F., and Guittet, E., 1996, FEBS Lett. 383:191.
Bothner-By, A. A., Stephens, R. L., Lee, J., Warren, C. D., and Jeanloz, R. W., 1984, J. Am. Chem. Soc. 106:811. Bowtell, R., 1992, J. Magn. Reson. 100:1. Broekaert, P., Vlassenbroek, A., Jeener, J., Lippens, G., and Wieruszeski, J.-M., 1996, J. Magn. Reson.
A 120:97. Brunne, R. M., Liepinsh, E., Otting, G., Wüthrich, K., and van Gunsteren, W. F., 1993, J. Mol. Biol.
231:1040. Brüschweiler, R., Morikis, D., and Wright, P. E., 1995, J. Biomol. NMR 5:353. Brüschweiler, R., and Wright, P. E., 1994, Chem. Phys. Lett. 229:75. Canet, D., Mahieu, N., and Tekely, P., 1992, J. Am. Chem. Soc. 114:6190. Clore, G. M., Bax, A., Omichinski, J. G., and Gronenborn, A. M., 1994, Structure 2:89. Clore, G. M., and Gronenborn, A. M., 1992, J. Mol. Biol. 223:853. Clore, G. M., Bax, A., Wingfield, P. T., and Gronenborn, A. M., 1990, Biochemistry 29:5671. Conte, M. R., Conn, G. L., Brown, T., and Lane, A. N., 1996, Nucl. Acids Res. 24:3693. Dalvit, C., 1995, J. Magn. Reson. A 113:120. Dalvit, C., 1996, J. Magn. Reson. B 112:282. Dalvit, C., and Hommel, U., 1995a, J. Biomol. NMR 5:306. Dalvit, C., and Hommel, U., 1995b, J. Magn. Reson. B 109:334. Denisov, V. P., Carlström, G., Venu, K., and Halle, B., 1997, J. Mol. Biol. 268:118.
Denisov, V. P., and Halle, B., 1995a, J. Mol. Biol. 245:682. Denisov, V. P., and Halle, B., 1995b, J. Mol. Biol. 245:698.
Denisov, V. P., and Halle, B., 1995c, J. Am. Chem. Soc. 117:8456. Denisov, V. P., and Halle, B., 1996, Faraday Discuss. 103:227. Denisov, V. P., Peters, J., Hörlein, H. D., and Halle, B., 1996, Nat. Struct. Biol. 3:505. Dötsch, V., and Wider, G., 1995, J. Am. Chem. Soc. 117:6064. Driscoll, P. C., Clore, G. M., Beress, L., and Gronenborn, A. M., 1989, Biochemistry 28:2178. Edzes, H. T., 1990, J. Magn. Reson. 86:293. Ernst, J. A., Clubb, R. T., Zhou, H.-X., Gronenborn, A. M., and Clore, G. M., 1995, Science 267:1813. Farmer, B. T. II, Macura, S., and Brown, L. R., 1987, J. Magn. Reson. 72:347. Fawthrop, S. A., Yang, J.-C., and Fisher, J., 1993, Nucl. Acids Res. 21:4860. Forman-Kay, J. D., Gronenborn, A. M., Wingfield, P. T., and Clore, G. M., 1991, J. Mol. Biol. 220:209. Friedrich, J., Davis, S., and Freeman, R., 1987, J. Magn. Reson. 75:390. Fujiwara, T., and Nagayama, K., 1985, J. Chem. Phys. 83:3110. Geen, H., and Freeman, R., 1991, J. Magn. Reson. 93:93. Gemmecker, G., Jahnke, W., and Kessler, H., 1993, J. Am. Chem. Soc. 115:11620. Glaser, S. J., and Drobny, G. P., 1990, Adv. Magn. Reson. 14:35. Glickson, J. D., Rowan, R., Pitner, T. P., Dadok, J., Bothner-By, A. A., and Walter, R., 1976, Biochemistry 15:1111.
Gooley, P. R., O’Connell, J. F., Marcy, A. I., Cuca, G. C., Axel, M. G., Caldwell, C. G., Hagmann, W. K., and Becker, J. W., 1996, J. Biomol. NMR 7:8. Griesinger, C., and Ernst, R. R., 1987, J. Magn. Reson. 75:261. Griesinger, C., Otting, G., Wüthrich, K., and Ernst, R. R., 1988, J. Am. Chem. Soc. 110:7870. Grzesiek, S., and Bax, A., 1993a, J. Am. Chem. Soc. 115:12593. Grzesiek, S., and Bax, A., 1993b, J. Biomol. NMR 3:627.
Grzesiek, S., Bax, A., Nicholson, L. K., Yamazaki, T., Wingfield, P., Stahl, S. J., Eyermann, C. J., Torchia, D. A., Hodge, C. N., Lam, P. Y. S., Jadhav, P. K., and Chang, C.-H., 1994, J. Am. Chem. Soc. 116:1581. Hajduk, P. J., Horita, D. A., and Lerner, L. E., 1993, J. Magn. Reson. A 103:40. Halle, B., and Wennerström, H., 1981, J. Chem. Phys. 75:1928. Hore, P. J., 1983, J. Magn. Reson. 55:283.
Hausser, R., Meier, G., and Noak, F., 1966, Z. Naturforsch. 21a:1410. Holak, T. A., Wiltscheck, R., and Ross, A., 1992, J. Magn. Reson. 97:632. Hurd, R. E., 1991, J. Magn. Reson. 91:648. Jacobson, A., Leupin, W., Liepinsh, E., and Otting, G., 1996, Nucl. Acids Res. 24:2911. John, B. K., Plant, D., Webb, P., and Hurd, R. E., 1992, J. Magn. Reson. 98:200. Kay, L. E., Keifer, P., and Saarinen, T., 1992, J. Am. Chem. Soc. 114:10663. Keeler, J., Clowes, R. T., Davis, A. L., and Laue, E. D., 1994, Meth. Enzymol. 239:145.
Knapp, E. W., and Mügge, I., 1993, J. Phys. Chem. 97:11339. Knauf, M. A., Löhr, F., Curley, G. P., O’Farrel, P., Mayhew, S. G., Müller, F., and Rüterjans, H., 1996, Eur. J. Biochem. 213:167. Kochoyan, M., and Leroy, J. L., 1995, Curr. Opin. Struct. Biol. 5:329. Kriwacki, R. W., Hill, R. B., Flanagan, J. M., Caradonna, J. P., and Prestegard, J. H., 1993, J. Am. Chem. Soc. 115:8907. Kubinec, M. G., Culf, A. S., Cho, H., Lee, D. C., Burkham, J., Morimoto, H., Williams, P. G., and Wemmer, D. E., 1996, J. Biomol. NMR 7:236. Kubinec, M. G., and Wemmer, D. E., 1992a, Curr. Opin. Struct. Biol. 2:828.
Kubinec, M. G., and Wemmer, D. E., 1992b, J. Am. Chem. Soc. 114:8739. Levitt, M. H., 1996, Concepts Magn. Reson. 8:77. Lipari, G., and Szabo, A., 1982, J. Am. Chem. Soc. 104:4546. Liepinsh, E., Leupin, W., and Otting, G., 1994, Nucl. Acids Res. 22:2249. Liepinsh, E., and Otting, G., 1996, Magn. Reson. Med. 35:30. Liepinsh, E., and Otting, G., 1997, Nat. Biotech. 15:264. Liepinsh, E., Otting, G., and Wüthrich, K., 1992a, J. Biomol. NMR 2:447. Liepinsh, E., Otting, G., and Wüthrich, K., 1992b, Nucl. Acids Res. 20:6549. Liepinsh, E., Rink, H., Otting, G., and Wüthrich, K., 1993, J. Biomol. NMR 3:253. Luginbühl, P., 1996, Diss. ETH Nr. 11994. Maltseva, T. V., Agback, P., and Chattopadhyaya, J., 1993, Nucl. Acids Res. 21:4246. Meadows, R. P., Nettesheim, D. G., Xu, R. X., Olejniczak, E. T., Petros, A. M., Holzman, T. F., Severin, J., Gubbins, E., Smith, H., and Fesik, S. W., 1993, Biochemistry 32:754. Meiboom, S., 1961, J. Chem. Phys. 57:375. Meiering, E. M., Li, H., Delcamp, T. J., Freisheim, J. H., and Wagner, G., 1995, J. Mol. Biol. 247:309. Meiering, E. M., and Wagner, G., 1995, J. Mol. Biol. 247:294. Messerle, B. A., Wider, G., Otting, G., Weber, C., and Wüthrich, K., 1989, J. Magn. Reson. 85:608. Mori, S., Abeygunawardana, C., van Zijl, P. C. M., and Berg, J. M., 1996a, J. Magn. Reson. B 110:96. Mori, S., Berg, J. M., and van Zijl, P. C. M., 1996b, J. Biomol. NMR 7:77. Mori, S., Johnson, M. O., Berg, J. M., and van Zijl, P. C. M., 1994, J. Am. Chem. Soc. 116:11982. Morris, G. A., and Freeman, R., 1978, J. Magn. Reson. 29:433. Otting, G., 1994, J. Magn. Reson. B 103:288. Otting, G., and Liepinsh, E., 1995a, Acc. Chem. Res. 28:171. Otting, G., and Liepinsh, E., 1995b, J. Biomol. NMR 5:420.
Otting, G., and Liepinsh, E., 1995c, J. Magn. Reson. B 107:192. Otting, G., Liepinsh, E., Farmer, B. T. II, and Wüthrich, K., 1991b, J. Biomol. NMR 1:209.
Otting, G., Liepinsh, E., Halle, B., and Frey, U., 1997, Nat. Struct. Biol. 4:396. Otting, G., Liepinsh, E., and Wüthrich, K., 1991a, Science 254:974. Otting, G., and Wüthrich, K., 1989, J. Am. Chem. Soc. 111:1871. Otting, G., Liepinsh, E., and Wüthrich, K., 1991c, J. Am. Chem. Soc. 113:4363. Otting, G., Liepinsh, E., and Wüthrich, K., 1992, J. Am. Chem. Soc. 114:7093. Otwinowski, Z., Schevitz, R. W., Zhang, R.-G., Lawson, C. L., Joachimiak, A., Marmorstein, R. Q., Luisi, B. F., and Sigler, P. B., 1988, Nature 335:321.
Peng, J. W., Schiffer, C. A., Xu, P., van Gunsteren, W. F., and Ernst, R. R., 1996, J. Biomol. NMR 8:453.
Piotto, M., Saudek, V., and Sklenář, V., 1992, J. Biomol. NMR 2:661. Pitner, T. P., Glickson, J. D., Dadok, J., and Marshall, G. R., 1974, Nature 250:582. Plateau, P., and Guéron, M., 1982, J. Am. Chem. Soc. 104:7310. Ponstingl, H., and Otting, G., 1997a, J. Biomol. NMR 9:441. Ponstingl, H., and Otting, G., 1997b, Eur. J. Biochem. 244:384.
Qi, P. X., Urbauer, J. L., Fuentes, E. J., Leopold, M. F., and Wand, A. J., 1994, Nat. Struct. Biol. 1:378. Qian, Y. Q., Otting, G., and Wüthrich, K., 1993, J. Am. Chem. Soc. 115:1189. Radhakrishnan, I., and Patel, D. J., 1994a, Structure 2:395.
Radhakrishnan, I., and Patel, D. J., 1994b, J. Mol. Biol. 241:600. Richarz, R., Nagayama, K., and Wüthrich, K., 1980, Biochemistry 19:5189. Richter, W., Lee, S., Warren, W. S., and He, Q., 1995, Science 267:654.
Seba, H. B., and Ancian, B., 1990, J. Chem. Soc. Chem. Commun.: 997. Shakked, Z., Guzikevich-Guerstein, G., Frolow, F., Rabinovich, D., Joachimiak, A., and Sigler, P. B., 1994, Nature 368:469. Sklenář, V., 1995, J. Magn. Reson. A 114:132.
Sklenář, V., and Bax, A., 1987a, J. Magn. Reson. 75:378. Sklenář, V., and Bax, A., 1987b, J. Magn. Reson. 74:469. Sklenář, V., Piotto, M., Leppik, R., and Saudek, V., 1993, J. Magn. Reson. A 102:241.
Sklenář, V., Tschudin, R., and Bax, A., 1987, J. Magn. Reson. 75:352. Smallcombe, S. H., 1993, J. Am. Chem. Soc. 115:4776. van de Ven, F. J. M., Janssen, H. G. J. M., Graslund, A., and Hilbers, C. W., 1988, J. Magn. Reson.
79:221. van Zijl, P. C. M., and Moonen, C. T. W., 1990, J. Magn. Reson. 87:18.
Venu, K., Denisov, V. P., and Halle, B., 1997, J. Am. Chem. Soc. 119:3122. Villars, F. M. H., and Benedek, G. B., 1974, Physics, Vol. 2, Chap. 2, Addison-Wesley, Reading, MA. Wang, Y.-X., Freedberg, D. I., Grzesiek, S., Torchia, D. A., Wingfield, P. T., Kaufman, J. D., Stahl, S. J., Chang, C.-H., and Hodge, C. N., 1996a, Biochemistry 35:12694. Wang, Y.-X., Freedberg, D. I., Wingfield, P. T., Stahl, S. J., Kaufman, J. D., Kiso, Y., Bhat, T. N., Erickson, J. W., and Torchia, D. A., 1996b, J. Am. Chem. Soc. 118:12287. Wang, Y., and Patel, D. J., 1994, J. Mol. Biol. 242:508. Warren, W. S., Richter, W., Andreotti, A. H., and Farmer, B. T. II, 1993, Science 262:2005. Wider, G., Dötsch, V., and Wüthrich, K., 1994, J. Magn. Reson. A 108:255.
Wider, G., Riek, R., and Wüthrich, K., 1996, J. Am. Chem. Soc. 118:11629. Wu, D., Chen, A., and Johnson, C. S. Jr., 1995, J. Magn. Reson. A 115:260.
Wüthrich, K., Otting, G., and Liepinsh, E., 1992, Faraday Discuss. 93:35. Xu, R. X., Meadows, R. P., and Fesik, S. W., 1993, Biochemistry 32:2473.
Yao, J., Brüschweiler, R., Dyson, H. J., and Wright, P. E., 1994, J. Am. Chem. Soc. 116:12051. Zhang, S., and Gorenstein, D. G., 1996, J. Magn. Reson. A 118:291.
Contents of Previous Volumes
VOLUME 1 Chapter 1
NMR of Sodium-23 and Potassium-39 in Biological Systems Mortimer M. Civan and Mordechai Shporer
Chapter 2
High-Resolution NMR Studies of Histones C. Crane-Robinson Chapter 3
PMR Studies of Secondary and Tertiary Structure of Transfer RNA in Solution Philip H. Bolton and David R. Kearns
Chapter 4 Fluorine Magnetic Resonance in Biochemistry J. T. Gerig
Chapter 5
ESR of Free Radicals in Enzymatic Systems Dale E. Edmondson 529
530
Contents of Previous Volumes
Chapter 6 Paramagnetic Intermediates in Photosynthetic Systems Joseph T. Warden Chapter 7
ESR of Copper in Biological Systems John F. Boas, John R. Pilbrow, and Thomas D. Smith
Index VOLUME 2 Chapter 1
Phosphorus NMR of Cells, Tissues, and Organelles Donald P. Hollis Chapter 2
EPR of Molybdenum-Containing Enzymes Robert C. Bray
Chapter 3
ESR of Iron Proteins Thomas D. Smith and John R. Pilbrow
Chapter 4
Stable Imidazoline Nitroxides Leonid B. Volodarsky, Igor A. Grigor’ev, and Renad Z. Sagdeev Chapter 5
The Multinuclear NMR Approach to Peptides: Structures, Conformation, and Dynamics Roxanne Deslauriers and Ian C. P. Smith
Index
VOLUME 3
Chapter 1 Multiple Irradiation Experiments with Hemoproteins Regula M. Keller and Kurt Wüthrich Chapter 2
Vanadyl(IV) EPR Spin Probes: Inorganic and Biochemical Aspects N. Dennis Chasteen
Chapter 3
ESR Studies of Calcium- and Proton-Induced Phase Separations in Phosphatidylserine-Phosphatidylcholine Mixed Membranes Shun-ichi Ohnishi and Satoru Tokutomi Chapter 4
EPR Crystallography of Metalloproteins and Spin-Labeled Enzymes James C. W. Chien and L. Charles Dickinson Chapter 5
Electron Spin Echo Spectroscopy and the Study of Metalloproteins W. B. Mims and J. Peisach Index VOLUME 4 Chapter 1
Spin Labeling in Disease D. Allan Butterfield
Chapter 2
Principles and Applications of ¹¹³Cd NMR to Biological Systems Ian M. Armitage and James D. Otvos
Chapter 3
Photo-CIDNP Studies of Proteins Robert Kaptein Chapter 4
Application of Ring Current Calculations to the Proton NMR of Proteins and Transfer RNA Stephen J. Perkins Index VOLUME 5 Chapter 1
CMR as a Probe for Metabolic Pathways in Vivo R. L. Baxter, N. E. Mackenzie, and A. I. Scott
Chapter 2
Nitrogen-15 NMR in Biological Systems Felix Blomberg and Heinz Rüterjans Chapter 3
Phosphorus-31 Nuclear Magnetic Resonance Investigations of Enzyme Systems B. D. Nageswara Rao
Chapter 4
NMR Methods Involving Oxygen Isotopes in Biophosphates Ming-Daw Tsai and Karol Bruzik Chapter 5
ESR and NMR Studies of Lipid-Protein Interactions in Membranes Philippe F. Devaux Index
VOLUME 6 Chapter 1 Two-Dimensional Spectroscopy as a Conformational Probe of Cellular Phosphates Philip H. Bolton Chapter 2 Lanthanide Complexes of Peptides and Proteins Robert E. Lenkinski Chapter 3 EPR of Mn(II) Complexes with Enzymes and Other Proteins George H. Reed and George D. Markham
Chapter 4 Biological Applications of Time Domain ESR Hans Thomann, Larry R. Dalton, and Lauraine A. Dalton Chapter 5 Techniques, Theory, and Biological Applications of Optically Detected Magnetic Resonance (ODMR) August H. Maki
Index VOLUME 7 Chapter 1 NMR Spectroscopy of the Intact Heart Gabriel A. Elgavish Chapter 2 NMR Methods for Studying Enzyme Kinetics in Cells and Tissue K. M. Brindle, I. D. Campbell, and R. J. Simpson
Chapter 3
ENDOR Spectroscopy in Photobiology and Biochemistry Klaus Möbius and Wolfgang Lubitz Chapter 4
NMR Studies of Calcium-Binding Proteins Hans J. Vogel and Sture Forsén
Index VOLUME 8 Chapter 1
Calculating Slow Motional Magnetic Resonance Spectra: A User’s Guide David J. Schneider and Jack H. Freed
Chapter 2
Inhomogeneously Broadened Spin-Label Spectra Barney Bales Chapter 3
Saturation Transfer Spectroscopy of Spin-Labels: Techniques and Interpretation of Spectra M. A. Hemminga and P. A. de Jager Chapter 4
Nitrogen-15 and Deuterium Substituted Spin Labels for Studies of Very Slow Rotational Motion Albert H. Beth and Bruce H. Robinson
Chapter 5
Experimental Methods in Spin-Label Spectral Analysis Derek Marsh Chapter 6
Electron-Electron Double Resonance James S. Hyde and Jim B. Feix
Chapter 7
Resolved Electron-Electron Spin-Spin Splittings in EPR Spectra Gareth R. Eaton and Sandra S. Eaton
Chapter 8
Spin-Label Oximetry James S. Hyde and Witold S. Subczynski
Chapter 9
Chemistry of Spin-Labeled Amino Acids and Peptides: Some New Mono- and Bifunctionalized Nitroxide Free Radicals Kálmán Hideg and Olga H. Hankovsky
Chapter 10 Nitroxide Radical Adducts in Biology: Chemistry, Applications, and Pitfalls Carolyn Mottley and Ronald P. Mason
Chapter 11
Advantages of ¹⁵N and Deuterium Spin Probes for Biomedical Electron Paramagnetic Resonance Investigations Jane H. Park and Wolfgang E. Trommer
Chapter 12
Magnetic Resonance Study of the Combining Site Structure of a Monoclonal Anti-Spin-Label Antibody Jacob Anglister Appendix
Approaches to the Chemical Synthesis of ¹⁵N and Deuterium Substituted Spin Labels Jane H. Park and Wolfgang E. Trommer Index
VOLUME 9
Chapter 1 Phosphorus NMR of Membranes Philip L. Yeagle
Chapter 2
Investigation of Ribosomal 5S Ribonucleic Acid Solution Structure and Dynamics by Means of High-Resolution Nuclear Magnetic Resonance Spectroscopy Alan G. Marshall and Jiejun Wu Chapter 3
Structure Determination via Complete Relaxation Matrix Analysis (CORMA) of Two-Dimensional Nuclear Overhauser Effect Spectra: DNA Fragments Brandan A. Borgias and Thomas L. James
Chapter 4
Methods of Proton Resonance Assignment for Proteins Andrew D. Robertson and John L. Markley Chapter 5
Solid-State NMR Spectroscopy of Proteins Stanley J. Opella Chapter 6
Methods for Suppression of the H₂O Signal in Proton FT/NMR Spectroscopy: A Review Joseph E. Meier and Alan G. Marshall Index VOLUME 10 Chapter 1
High-Resolution ¹H-Nuclear Magnetic Resonance Spectroscopy of Oligosaccharide-Alditols Released from Mucin-Type O-Glycoproteins Johannis P. Kamerling and Johannes F. G. Vliegenthart
Chapter 2
NMR Studies of Nucleic Acids and Their Complexes David E. Wemmer Index VOLUME 11 Chapter 1
Localization of Clinical NMR Spectroscopy Lizann Bolinger and Robert E. Lenkinski Chapter 2
Off-Resonance Rotating Frame Spin-Lattice Relaxation: Theory, and in Vivo MRS and MRI Applications
Thomas Schleich, G. Herbert Caines, and Jan M. Rydzewski Chapter 3 NMR Methods in Studies of Brain Ischemia Lee-Hong Chang and Thomas L. James
Chapter 4
Shift-Reagent-Aided ²³Na NMR Spectroscopy in Cellular, Tissue, and Whole-Organ Systems
Sandra K. Miller and Gabriel A. Elgavish
Chapter 5
In Vivo ¹⁹F NMR
Barry S. Selinski and C. Tyler Burt
Chapter 6
In Vivo ²H NMR Studies of Cellular Metabolism
Robert E. London
Chapter 7
Some Applications of ESR to in Vivo Animal Studies and EPR Imaging Lawrence J. Berliner and Hirotada Fujii
Index
VOLUME 12
Chapter 1
NMR Methodology for Paramagnetic Proteins Gerd N. La Mar and Jeffrey S. de Ropp
Chapter 2
Nuclear Relaxation in Paramagnetic Metalloproteins Lucia Banci
Chapter 3 Paramagnetic Relaxation of Water Protons Cathy Coolbaugh Lester and Robert G. Bryant
Chapter 4
Proton NMR Spectroscopy of Model Hemes F. Ann Walker and Ursula Simonis
Chapter 5 Proton NMR Studies of Selected Paramagnetic Heme Proteins J. D. Satterlee, S. Alam, Q. Yi, J. E. Erman, I. Constantinidis, D. J. Russell, and S. J. Moench Chapter 6
Heteronuclear Magnetic Resonance: Applications to Biological and Related Paramagnetic Molecules Joël Mispelter, Michel Momenteau, and Jean-Marc Lhoste Chapter 7
NMR of Polymetallic Systems in Proteins Claudio Luchinat and Stefano Ciurli
Index
VOLUME 13 Chapter 1 Simulation of the EMR Spectra of High-Spin Iron in Proteins Betty J. Gaffney and Harris J. Silverstone Chapter 2
Mössbauer Spectroscopy of Iron Proteins Peter G. Debrunner
Chapter 3 Multifrequency ESR of Copper: Biophysical Applications Riccardo Basosi, William E. Antholine, and James S. Hyde Chapter 4 Metalloenzyme Active-Site Structure and Function through Multifrequency CW and Pulsed ENDOR Brian M. Hoffman, Victoria J. DeRose, Peter E. Doan, Ryszard J. Gurbiel, Andrew L. P. Houseman, and Joshua Telser Chapter 5
ENDOR of Randomly Oriented Mononuclear Metalloproteins: Toward Structural Determinations of the Prosthetic Group Jürgen Hüttermann
Chapter 6 High-Field EPR and ENDOR in Bioorganic Systems Klaus Möbius Chapter 7
Pulsed Electron Nuclear Double and Multiple Resonance Spectroscopy of Metals in Proteins and Enzymes Hans Thomann and Marcelino Bernardo
Chapter 8 Transient EPR of Spin-Labeled Proteins David D. Thomas, E. Michael Ostap, Christopher L. Berger, Scott M. Lewis, Piotr G. Fajer, and James E. Mahaney
Chapter 9
ESR Spin-Trapping Artifacts in Biological Model Systems Aldo Tomasi and Anna Iannone
Index VOLUME 14 Introduction: Reflections on the Beginning of the Spin Labeling Technique Lawrence J. Berliner Chapter 1
Analysis of Spin Label Line Shapes with Novel Inhomogeneous Broadening from Different Component Widths: Application to Spatially Disconnected Domains in Membranes M. B. Sankaram and Derek Marsh
Chapter 2 Progressive Saturation and Saturation Transfer EPR for Measuring Exchange Processes and Proximity Relations in Membranes Derek Marsh, Tibor Páli, and László Horváth Chapter 3
Comparative Spin Label Spectra at X-band and W-band Alex I. Smirnov, R. L. Belford, and R. B. Clarkson
Chapter 4
Use of Imidazoline Nitroxides in Studies of Chemical Reactions: ESR Measurements of the Concentration and Reactivity of Protons, Thiols, and Nitric Oxide Valery V. Khramtsov and Leonid B. Volodarsky
Chapter 5
ENDOR of Spin Labels for Structure Determination: From Small Molecules to Enzyme Reaction Intermediates Marvin W. Makinen, Devkumar Mustafi, and Seppo Kasa
Chapter 6
Site-Directed Spin Labeling of Membrane Proteins and Peptide-Membrane Interactions Jimmy B. Feix and Candice S. Klug Chapter 7
Spin-Labeled Nucleic Acids Robert S. Keyes and Albert M. Bobst
Chapter 8
Spin Label Applications to Food Science Marcus A. Hemminga and Ivon J. van den Dries Chapter 9
EPR Studies of Living Animals and Related Model Systems (In-Vivo EPR) Harold M. Swartz and Howard Halpern
Appendix Derek Marsh and Karl Schorn
Index VOLUME 15 Chapter 1 Tracer Theory and ¹³C NMR Maren R. Laughlin and Joanne K. Kelleher Chapter 2
Isotopomer Analysis of Glutamate: A ¹³C NMR Method to Probe Metabolic Pathways Intersecting in the Citric Acid Cycle A. Dean Sherry and Craig R. Malloy Chapter 3
Determination of Metabolic Fluxes by Mathematical Analysis of Labeling Kinetics John C. Chatham and Edwin M. Chance
Chapter 4 Metabolic Flux and Subcellular Transport of Metabolites E. Douglas Lewandowski Chapter 5
Assessing Cardiac Metabolic Rates During Pathologic Conditions with Dynamic NMR Spectra Robert G. Weiss and Gary Gerstenblith
Chapter 6
Applications of Labeling to Studies of Human Brain Metabolism In Vivo Graeme F. Mason Chapter 7
In Vivo NMR Spectroscopy: A Unique Approach in the Dynamic Analysis of Tricarboxylic Acid Cycle Flux and Substrate Selection Pierre-Marie Luc Robitaille
Index VOLUME 16
Chapter 1
Determining Structures of Large Proteins and Protein Complexes by NMR G. Marius Clore and Angela M. Gronenborn
Chapter 2
Multidimensional NMR Methods for Resonance Assignment, Structure Determination, and the Study of Protein Dynamics Kevin H. Gardner and Lewis E. Kay Chapter 3
NMR of Perdeuterated Large Proteins Bennett T. Farmer II and Ronald A. Venters
Chapter 4
Recent Developments in Multidimensional NMR Methods for Structural Studies of Membrane Proteins Francesca M. Marassi, Jennifer J. Gesell, and Stanley J. Opella Chapter 5
Homonuclear Decoupling in Proteins Ēriks Kupče, Hiroshi Matsuo, and Gerhard Wagner Chapter 6
Pulse Sequences for Measuring Coupling Constants Geerten W. Vuister, Marco Tessari, Yasmin Karimi-Nejad, and Brian Whitehead Chapter 7
Methods for the Determination of Torsion Angle Restraints in Biomacromolecules C. Griesinger, M. Hennig, J. P. Marino, B. Reif, C. Richter, and H. Schwalbe
Index
Index
Acyl carrier protein, 55 distance constraints, 24 Aggregation symmetric, 132 ,449 ALFA, 59, 60 Alignment, see Molecular alignment ALPS, 59, 60 Ambiguous restraints, 67 Ambiguous distance restraints (ADRs), 131, 140–142, 155–156, 157 assignment of, 145 symmetric, 140–142 Analytical expressions for the transferred NOESY of a two-spin system, 238 Angle search, 64, 39 Anisotropic interactions, see Interactions, anisotropic magnetic susceptibility, see Magnetic susceptibility, anisotropic reorientation, see Molecular alignment Annealing protocols, 142–144 naming convention, 143 ANRS method, 61 Apo-kedarcidin, 58 Applications of water-solute NOEs, 511 Arc motion, see Motion, arc model ARIA, 145 Assessment of conformational flexibility, 209 Assignment of NOEs, 136, 155 of resonances, 136 Assignment methods, 43 Assignments of water-solute cross peaks, 493 Asymmetric labeling, 137–138 Atom swapping, 63 Atomic B-factors, 12
AURELIA, 38 AUTOASSIGN, 56, 67 Automated methods, 40 Automated peak picking, 68 Automated resonance assignments, 40, 57 ALPS, 85 AUTOASSIGN, 85–97 CONTRAST, 84–85 FELIX, 83 NOESY spectra assignment, 67 program of Abbott Laboratories NMR Group, 84–85 program of Bristol-Myers Squibb NMR Group, 83 stereospecific assignments, 33 Averaging sum, 141 Back calculation of NMR spectra, 206 transferred NOESY spectra, 265, 281 Backbone dynamics derived from relaxation rates, 385 analysis of the multispin relaxation of 386 experiments to determine the relaxation rates, 391 heteronuclear NOE, 389 relaxation time, 387 relaxation time 387 Backbone dynamics derived from relaxation rates, 370 calculation of microdynamical parameters, 377 experimental details, 370 interpretation of microdynamical parameters, 381 processing of spectra and determination of relaxation rates, 376 sensitivity-enhanced HSQC experiment (SE-HSQC), 370 water-flipback HSQC, 371 Basic fibroblast growth factor (FGF-2), 90, 94, 97 Bayesian parameter estimation, 331
Bayesian posterior probabilities, 86 Bicelles, see Molecular alignment, Protein alignment Blood group A trisaccharide, 283, 289 Boltzmann average, 4, 28 constant, 6 ensemble, 5, 6 factor, 5 probability distribution, 15 sampling, 21 Bovine pancreatic ribonuclease, 90–97 Bovine seminal ribonuclease, 46 BPTI, 51, 448, 450, 474–475, 478–479 Branched polymers, 22 BSA, 461 Calbindin D9k, 450, 452 Calculation of concentrations, 247 Calmodulin, 58 CARNIVAL, 206 Cellobiohydrolase I, 65 Channel blocker, 214 Chemical exchange, 493, 501 Chemical shift dispersion degeneracy, 136–141 symmetry degeneracy, 136, 137, 140–141 Chemical shifts, 4, 12, 19, 23 CLAIRE, 45 Coherence-transfer delays, 124 Cold-shock protein (Csp A), 90, 97 Comonomer NOEs, 137, 138, 146 Complete hybrid matrix, 204 Complete relaxation matrix, 165, 204, 282 CORCEMA, 223 CORMA, 203 IRMA, 172 MARDIGRAS, 172, 204 MORASS, 172 PDB2 NOE, 285 Cone motion, see Motion, cone model Conformational annealing search, 142–144 averaging, 210 exchange, 224
heterogeneity, 86 sampling, 214 Conformational exchange matrix, 233
Constraint adiabatic distance, 24 methods, 6 propagation, 56, 87 CONTRAST, 84 CORCEMA, 223, 265 analysis of transferred NOESY data, 289–301 calculations for finite delays, 232 program, 246, 248 theory, 230
Corepressor tryptophan with repressor-operator
complex, 297 CORMA, 203, 224 Correlation time, 204, 453, 465–466 COSY (ECOSY), 328 Coupling constants, 203, 207 Couplings, measurement of effects of cross correlation, see Spin relaxation, cross correlation effects of dynamic frequency shifts, see Spin relaxation, dynamic frequency shifts frequency based experiments, 323–325, 328–333 intensity based experiments, 325, 333–336 precision of measurement, 323, 325, 330–332 systematic errors, 324, 335–336 Couplings, residual dipolar angular constraints, 312, 319, 321–322; see also Structure determination determination of sign, 320 field dependence, 313, 319, 323, 325 field induced, 311 history of observation, 320–322 in the study of motion, see Motion measurement of, see Couplings, measurement of separation from scalar, 319 theory, 314–320 Couplings, scalar measurement of, see Couplings, measurement of CPMG, 443, 444
Crambin, 73 Cramér-Rao lower bound (CRLB), see Couplings, measurement of, precision of measurement Cross validation, 217 Degeneracy dispersion, 136, 141 symmetry, 136, 137, 140–141 Determination of protein dynamics in the microsecond time window, 406 in the millisecond time window, 409 Diamagnetic susceptibility, see Magnetic susceptibility, diamagnetic Diamagnetic systems, see Magnetic susceptibility, diamagnetic DIAMOD, 69 DIANA, 65
Difference NMRD, 447 Difference spectroscopy, 138 Diffraction, 3, 8
Diffusion filter, 496 Dipolar field effects, 511 Dipolar relaxation with chemical exchange, 439 Dipolar couplings, see Couplings, residual dipolar, field-induced
Hamiltonian, 315, 316; see also Couplings, residual dipolar, theory shifts (pseudocontact shifts), 341 DISGEO, 66 Dispersion amplitude, 466 function, 463, 474–477 stretched, 474–477 Distance constraints, 6, 24 holonomic, 24 Distance geometry, 61–63, 202 DISGEO, 66 DGEOM, 282 self-correcting, 143 Distance restraints (constraints), 62 ambiguous, 140–142 bounds, 141 restraint function symmetry restraints, 139–140, 156–157 DNA and RNA hydration, 516 DNA, 450, 453, 472 DNA duplexes, 207 complexed to GATA-1, see Structure determination, examples, GATA-1 complexed to DNA
magnetic susceptibility, see Magnetic susceptibility,
diamagnetic, in DNA structure refinement, 322; see also Structure determination, examples, GATA-1 complexed to DNA DNA three way junction, 190–194 final structure, 192–193 hybrid-hybrid matrix refinement, 190–194 refinement summary, 192 sequence, 190 Dolichos biflorus lectin, 283, 287 Double-quantum-filtered COSY (2QF-COSY), 207 Dynamic frequency shift, see Spin relaxation, dynamic
frequency shifts Dynamic matrix, 230, 238 Dynamic shift, 436–437 Dynamics of protein structures, 311, 357 from field-induced dipolar couplings, 311 from and relaxation, 357
general features of dynamics, 357 microdynamic motional parameters, 359 ECOSY spectroscopy, 328 Effect of ligand-receptor ratio on tr-NOESY, 270 Effect of motions of transferred NOESY, 278 Electric field alignment, see Molecular alignment, using electric field Electron density, 5, 12 Electron spin, see Magnetic susceptibility, paramagnetic, in myoglobin Encounter complexes, 240
Energy, see potential minimization, 144 Ensemble, 6, 29 average, 14, 19, 29 calculations, 201 generation of, 6, 8
Ensemble of structures, 212 Equations of motion, 5, 6, 23, 24 Er-2, 72 Error analysis, 205
E-selectin, 290 Euler rotations, see Rotations, Euler Ewald techniques, 27 Exchange rate, 202 Experimental NMR restraints, 202
accuracy, 202, 206 internal inconsistencies, 213, 217 redundancy of restraints, 219 Extended system restraining methods, 6 Fast conformational exchange, 236 Fast field cycling, 421, 424–429 FELIX, 38, 47 3D data set to process, 191 Ferrocytochrome c, 60
FFC, see fast field cycling Field-induced residual dipolar couplings, 311 Field variation, 421 Finite receptor off-rates, 227, 268 FKBP, 67 Floating chirality, 63, 64, 67 Force Field, 5, 7, 9, 19 GROMOS, 20, 87 GROMOS 43A1, 9 parameters, 142 Forssman pentasaccharide, 287, 289 Frequency domain experiments, see Couplings, measurement of, frequency domain experiments
Function of off-rate, 277 Fuzzy graphs, 47 GAL 4, 62 GARANT, 50, 51 GATA-1 magnetic susceptibility, see Magnetic susceptibility,
diamagnetic, in DNA structure refinement, see Structure determination, examples, GATA-1 complexed to DNA GCN4 homodimer, 149–151
Generalized intensity (I) matrix, 234 Generalized kinetic (K) matrix, 234 Generalized relaxation rate (R) matrix, 233 Generic spin system object, 86 Genetic algorithms, 50 Global optimization, 50
GLOMSA, 64, 65
Glutamine-binding protein, 60 Goodness of refinement, 186 Graph theory, 41, 49 Grid search, 63 HABAS, 63, 65 Hamiltonian, see Dipolar Helix motion in myoglobin, see Motion, in myoglobin relative orientations, see Structure determination,
example, myoglobin Hemocyanin, 462 High temperature approximation, 318 Hinge-bending motion, 228–229, 240–243 Hirudin, 72 HIV integrase fragment dimer, 134 HnRNP C RNA-binding domain, 58 Holonomic distance constraints, 24 HSQC, 323–324; see also Couplings, measurement of
Human fibrinopeptide analogs, 282 Human RNP C RNA-binding domain, 58 Human transforming growth factor (hTGF), 57 Hybrid duplex, 216
Hybrid-hybrid matrix method, 163–199 advantages, 196–198 effect of added noise, 176 experimental refinement, 190–194 for 3D NOESY-NOESY data analysis, 171–176 iterative refinement calculation, 179–190 procedure, 175 refinement of a duplex DNA, 177–190 refinement of a DNA three way junction, 190–194 theory, 173 Hybrid-matrix-based algorithms, 265 IRMA, 172 MARDIGRAS, 204 MORASS, 172 Hybrid matrix of NOE intensities, 204 Hydration studies by intermolecular water-solute NOEs, 485 Indices of agreement, 209; see also R-factor and NOE R-factor crystallographic R-factor, 210 sixth-root-weighted
R-factor, 210
Insulin hexamer, 135, 154–155 Integration time step, 23, 24, 25, 26, 27 Intensity based experiments, see Couplings, measurement of, intensity-based experiments Intensity-restrained refinement, 264
Interaction function, 5, 7, 9, 19, 23 Interaction tensor, 455 Interactions anisotropic dipolar, see Couplings, residual dipolar, theory
electric quadrupole, 314; see also Quadrupolar
isotropic scalar couplings, 314
Zeeman, 314 Interface filter, 147 Interleukin-8 dimer, 147, 148, 149 Intermolecular ligand-receptor dipolar relaxation, 227 Intermolecular NOE hydration studies, 485 in transferred NOESY, 229, 244, 255, 277 solvent-solute, 485, 523 theory, 487 water-DNA, 516 water-protein, 511 water-solute, 485
Intermolecular potentials, 141 Intermolecular transferred NOESY, 229, 244, 255, 277 methods for observing, 255–261 Intermonomer NOEs, 137–146
Internal motion, 202 Interproton distance restraints, 203, 217
dynamically averaged distances, 206 Intramonomer NOEs, 137 Irreducible spherical tensor (IRE), see Tensor, irreducible spherical tensors Isolated spin pair approximation (ISPA), 224, 437 Isotope-selected/filtered methods, 261 Isotropic
interactions, see Interactions, isotropic reorientation, see Molecular alignment Iterative structure calculation, 145, 157 4,12,16,19, 23 J-modulation experiments, 328 J-resolved spectroscopy, 328 Jun homodimer, 148, 149–151 Karplus relation, 5, 12, 31 Killer toxin, 72 Kinetic matrix, 230 Labile hydrogens, 477–480 Lac repressor, 47 Ladder, 56, 58 Leakage-shell model, 298, 299
Leucine zipper homodimers, 146, 148, 149–151 Libration amplitudes, 467
Ligand motions in the bound state, 229 Ligand-protein intermolecular dipolar relaxation, 272 Ligand-protein/DNA complex, 297 Ligand-protein intermolecular NOESY intensity, 277 Ligand-receptor complexes, 223 calculation of concentrations, 247–249 reversibly binding, examples of, 225, 226
Ligand-receptor interactions, 233 encounter complex, 240 multistate models, 240 two-state model, 233 LINSHA, 207 Liquid crystals, see Molecular alignment, using liquid crystals LISP, 96 Local-elevation search method, 21 Logical constraint propagation, 56 Magnetic field alignment, see Molecular alignment, using magnetic field Magnetic susceptibility anisotropic, 313, 316–328, 340–342, 348, 350 concentration dependence, 348 determination, 320–322, 340–342, 348 diamagnetic, in ubiquitin, 324–326 in aromatic systems, 317, 319–322 in DNA, 322, 326, 342 in myoglobin, 321, 340–342, 348 interaction with magnetic field, see Molecular alignment, using magnetic field
origin of, 321 paramagnetic determination, 340, 341 in myoglobin, 323–325, 340–341 in small inorganic complexes, 321 origin, see Magnetic susceptibility, paramagnetic, in myoglobin principal axis system, see Principal axis system, magnetic susceptibility Magnetization transfer, 438–442 Main chain directed strategy, 39 MARCOPOLO, 43 MARDIGRAS, 204, 281 Maximum common subgraph, 42 Mean-field approximations, 22 MEDUSA, 213 Met repressor dimer, 147, 148, 149 Metalloprotein, 209 Methods for relaxation rate determination, 361 determination of the heteronuclear NOE, 369 determination of the longitudinal relaxation time T1, 366 determination of the transversal relaxation time T2, 368 experiments for the determination of relaxation rates, 365 theory of relaxation in proteins, 361 Methods for suppressing or identifying protein-mediated spin diffusion, 249 Methylphosphonate, 216 Metric tensor, 25, 27, 31 Model free expressions for transition probabilities, 231
Molecular alignment, see also Protein alignment using bicelles, 327 using dilute liquid crystals, 313, 327 using electric field, 320 using magnetic field, 312, 314, 316–322; see also Magnetic susceptibility Molecular complexes, 224, see also Ligand-receptor complexes Molecular conformations, 202 pool of conformers, 202, 213 Molecular dynamics in torsion angle space, 157 simulated annealing with, 142–144 Molecular dynamics (MD) simulation, 9, 13, 14, 21, 23, 29 in four-dimensional Cartesian space, 22 Molecular motion libration, 467–470 models, 444–446 Monte Carlo, 59 Monte Carlo simulation, 6, 21 MORASS, 172, 174–194 3D version of, 174 iterative refinement cycles, 191
Motion, see also Motions amplitudes, 346–347 arc model, 344–347 cone model, 344–348 effects on magnetic susceptibility, 341, 348 effects on NOE measurement, 344 effects on residual dipolar couplings, 344–345, 348–352 in myoglobin function, 345 librations from spin relaxation, see Spin relaxation order parameters and time scales order matrix analysis, see Order matrix, motion characterization slow collective motion in myoglobin, 346–347 Motional model, 204 Motions bond-angle bending, 24, 26 bond-stretching, 24, 26 dominated by Coulomb interactions, 24, 27 dominated by van der Waals contacts, 24, 27 torsional, 26 water librational, 26 Multiple copy refinement, 212 Multiple-time-step algorithms, 23, 25, 27 Multiple-quantum coherence, 434–436 Mutual information method, 50 Myoglobin diamagnetic susceptibility, see Magnetic Susceptibility, diamagnetic, in myoglobin motion, see Motion paramagnetic susceptibility, see Magnetic Susceptibility, paramagnetic, in myoglobin
structure refinement, see Structure determination, examples, myoglobin Network editing sequences, 253 Neural networks, 52 Neutron diffraction intensities, 4, 19 NMR CLUST, 212 NMR data, 202 NMR experiments for intermolecular NOEs with water, see Pulse sequences for water-solute NOEs NMR methods for suppressing protein-mediated spin diffusion, 250
NMRD, see Nuclear Magnetic Relaxation Dispersion NOAH, 71, 44 NOAH/DIAMOD, 40, 47 NOE, 203, 311–312, 342; see also Motion, effects on NOE measurement between solute proton and bound but locally reorientating water, 489 between two rigidly bound protons, 488 connectivities, 45, 74
with rapidly diffusing water molecules, 490 intermolecular, 255–261 NOE assignment between symmetry mates, 156 comonomer, 137, 146 intermonomer, 137, 146 intramonomer, 137 restraint potential, 142
NOE-NOESY, 522 NOE R-factor, 298, 299; see also R-factor NOESY, 420; see also NOE 3D NOESY-NOESY data deconvolution of, 173–176, 191 gradient method for the analysis of, 165 simulation studies, 167–171 Non-bonded potential, 143 Non-crystallographic symmetry, 138–139, 156–157 Nonselective experiments, 508 Nonspecific binding, 229, 244 Non-structural protein (NS-1) from influenza A virus, 90, 97
Non-symmetric aggregation, 132 Normalization of calculated and experimental intensities, 267 Nuclear Overhauser enhancement (NOE) distance bounds, 12, 15, 19, 23 intensities, 4, 12, 15, 19 relaxation matrix calculation, 12, 23, 165, 204, 223 Nuclear magnetic relaxation dispersion (NMRD), 419–421 data analysis, 462 difference, 447–451 window, 471–473
Nuclei 319, 326, 338 see Quadrupolar , 323–325 , 431, 432 Nucleic acid, see DNA
, 214 Order, magnetic field induced, 314 Order matrix, 348 diagonalization, 349 motion characterization, 350–353
ordering director, 348, 350–351 relation to magnetic susceptibility parameters, 350 structure determination, 349–352 theory, 348–350 Order parameter, 29, 231, 490 intermolecular, 456–545 intramolecular, 454–456, 467–470 Order parameters from residual dipolar couplings, see Order matrix, theory from spin relaxation, see Spin relaxation, order parameters and time scales tetramerization domain, 136, 135, 148, 151–152 Packing restraints, 146 Pair of spin-lock pulses, 495 Paramagnetic susceptibility, see Magnetic susceptibility, paramagnetic Paramagnetic systems, see Magnetic susceptibility, paramagnetic
PARSE, 212 Particle-particle-particle-mesh methods, 27 Partial relaxation, 205 Pathogenesis-related protein from tomato, 72 PDB2 NOE, 282, 285 PDQPRO, 213, 214, 218 Peak ambiguity, 39 Peak picking for resonance assignments, 99 Penalty function, 6, 7, 13, 31 for lower-bound restraining, 7 for upper-bound restraining, 7 Perdeuterated receptors, 249 Phage 434 repressor, 72 Pitfalls in structure determination, 31 Platelet factor 4/IL-8 chimer tetramer, 152
Point groups, 132–133, 134–135 Potential distance symmetry, 139–140, 156–157 NOE, 142 non-crystallographic symmetry, 139–140, 156–157 repel, 143 soft-square, 139, 142 square-well, 139, 142
Principal axis system (PAS) dipolar interaction, 315 magnetic susceptibility, 315–319, 323 order tensor, see Order matrix, ordering director and diagonalization Probabilities of conformers, 213 PROSPECT, 45 Protein alignment, see also molecular alignment magnetic field induced, 311, 316 using bicelles, 327 using liquid crystals, 313, 327 Protein association, 132 Protein hydration, 511 Protein-hydration, 419 semisolid sample, 457–458 Protein-leakage effects, 277 Protein-mediated spin-diffusion effects, 228, 272 methods for suppressing, 249–255 Protein motions at the active site, 228, 229 Proteins, see GATA-1, myoglobin and ubiquitin Protocols for symmetric oligomers, 143 Pseudoatom, 62 Pseudorotation phase angle, 208, 217 Pseudosymmetry, 133–134 Pseudocontact shifts, see dipolar, shifts Pucker amplitude, 208 Pulse sequences amplitude modulated HSQC, 333 CBCA(CO)NH, 121 CBCANH, 121–125 CPMG for 369 for 393, 397 for 393, 408 for NOE, 397 for heteronuclear NOE, 394 for NOE, 372, 374 for .. 372, 374 for 372, 374 for transverse SIIS cross relaxation, 403 HNCA, 108, 110 HN(CA)CO, 104, 107 HACA(CO)NH, 109–113 HACANH, 114–116 HNCO, 104, 105 HSQC, 101–103 inversion recovery sequences for 367 multiple-quantum triple-resonance spectra, 127 NOESY-NOESY, 164 phase modulated HSQC, 334 phase-type triple-resonance spectra, 115–120, 126–127 selective coupling enhanced HSQC, 329 water flip-back HSQC, 371 Pulse sequences for water-solute NOEs 1D NOE, 499 3D NOESY-TOCSY, 509
3D , 509 3D 509 90° excitation by radiation damping, 499 HYDRA-N, 505 MEXICO, 499 PHOGSY, 505 water excitation by 180° pulse, 505 with 160° water excitation, 505 with diffusion filter, 505 with Q-switched selective 90° pulse, 499 with WANTED sequences, 505 with WEX-I filter, 499 with WEX-II filter, 499 WNOESY, 499 Q-switch, 503 Q(1/6) factor, 180 Quadratic objective function, 214 Quadratic programming algorithm, 213 Quadrupolar relaxation, 433 Quadrupolar coupling constant, 321 interaction, see Interactions, anisotropic, electric quadrupole nuclei, see Nuclei, splittings, 320–321 Quadrupole coupling constant, 466–467 Quantitative J, see Couplings, measurement of, intensity based experiments Quasi-symmetry, 133 example of leucine zippers, 133 Quiet-bird-NOESY, 255 Quiet-EXSY, 255 Quiet-NOESY, 255 R-factor, 29, 156, 157, 180, 210, 298, 299; see also NOE R-factor averaging, 65, 67 summation, 65, 67 Radiation damping, 497 Ramachandran map, 326 RANDMARDI, 205 Real space assignment, 39, 61 Relaxation filters, 257–260 spin-echo, 257–260 spin-lock, 257–260 Relaxation rate, 214 Relaxation rate matrix, 165, 230 Relaxation-reagent, 508 Relaxation time, 24, 25 Relaxation BWR theory, 433, 458 chemical-shift modulation, 442–443, 479–480 cross, 437–442 due to isotropic couplings, 442
deuteron, 431, 433–434 dipolar, 437–442 dispersion, see Nuclear Magnetic Relaxation Dispersion effectively exponential, 434 exchange averaging, 446–447 filters, 257–260 generalized theory, 458 mechanisms, 432–444 oxygen-17, 431–432, 434 proton, 430–431 quadrupolar, 433–437 scalar, 443–444 stochastic theory, 458–462 temporal resolution, 351 time measurement, 422, 428 Reliability distance, 71
Reorientation local, 489 Repel potential, 143 Residence times, 519 Residual dipolar couplings, see Couplings, residual dipolar Restrained molecular dynamics, 202, 208 time-averaged molecular dynamics, 212 trajectories, 212 Restraints, 138–140 comonomer, 146 distance symmetry, 139–140, 156–157 NOE distance, 142 non-bonded, 143 non-crystallographic symmetry, 138–139, 156–157 packing, 146 space group, 132 Reversibly binding molecular complexes, 225, 226 Ribonuclease, 472 RID method, 56, 57 RNA DNA hybrid, 207, 217 RNA hairpin, 206 ROESY, 206 Hartmann-Hahn transfer of magnetization (HOHAHA), 206 Rotations Euler, 315–318, 351 Wigner, 316–318, 351 SAR-by-NMR, 301 Saturation of receptor resonances, 251 Sanson-Flamsteed projection, 350–351 Screening of compound libraries, 301 SECODG, 40, 67, 68 Selective water excitation, 496 Selective water excitation by a 90° pulse, 498 Selective water excitation by a 180° pulse, 504 Self-correcting distance geometry, 40, 67, 68
SERENDIPITY, 39, 46, 49 SH3, 66 SHAKE method, 26 Sialyl tetrasaccharide, 290, 294 complexed to E-selectin, 297 Side-chain derived from relaxation rates, 395 dynamical parameters derived from relaxation times and steady state NOE, 396 SIIS Cross Relaxation, 402 Simulated annealing, 59, 60 see Structure determination, simulated annealing using molecular dynamics, 142–144 Simulated temperature annealing, 22, 25 Simulated transferred NOESY, 267 Simulation of NMR cross-peaks, 207 Single target function, 264 Soft-square potential, 139, 142 Spectral Density Function, 452–457 SPHINX, 207 Spin relaxation contributions to multiplets, 336–338 CSA/dipole-dipole, 336–337 dipole-dipole/dipole-dipole, 337–338 dynamic frequency shifts, 337–339 effects on couplings measurement, 337–339 nuclear dipole/Curie spin-nuclear dipole, 339 order parameters and time scales, 344–346 Square-well potential, 139, 142 dimer (Single-stranded DNA binding protein), 148, 149 Staphylococcal protein A, 57 Stereoconfiguration, 217 STEREOSEARCH, 64, 65 Stereospecific assignment, 209 Stereospecific assignment, 62, 63 Stochastic dynamics (SD) simulation, 21, 23 restraining methods, 6 Structural relevance, 521 Structural restraints, 203 Structure based design, 227, 301, 302 factor, 5, 12, 19, 23 refinement, 24, 202 well defined, 144 Structure calculation iterative, 145, 157 protocols, 142, 144 Structure determination examples, 322–344 GATA-1 complexed to DNA, 326 myoglobin, 322–323, 342–344 order matrix approach, see Order matrix, structure determination protocols, 339 simulated annealing, 339–344
Structure determination (cont.) ubiquitin, 324–326 Structure-based drug design, 302 Structure-based filters, 70, 71 Studies of protein hydration, 511 Studies of DNA and RNA hydration, 516 Sugar conformation, 208
Surface hydration water, 470–471 SYMM, 205 Symmetric oligomers, 131
examples of, 148 interface between, 147 possible point group symmetries, 133–135 solved by NMR, 131, 149–151, 152–155 structure calculation method, 138–147 Symmetric aggregation, 132 dimers, 133, 134–135 hexamers, 133, 134–135 oligomers, 131 pentamers, 133, 134–135 trimers, 133, 134–135 tetramers, 133, 134–135 Symmetrization matrix, 235 Symmetry ADR method, 138 problems, 155 pseudo, 133–134 quasi, 133 Symmetry degeneracy, 136, 137, 138 linear group, 132 point group, 132–133, 134–135
spin-echo relaxation filters, 257
spin-lock filter, 257 Temperature used in simulated annealing, 143–144 Temperature control, 423
Toxin III, 66 Tr-NOESY-based screening of compound libraries, 301 Transferred NOESY, 223, 225 analytical expressions for a two-spin system, 238 CORCEMA theory of, 223 effect of finite off-rates, 227, 268–270 effect of ligand-receptor ratio, 270–272 effect of motions in the protein-ligand complex, 278–281 effect of protein-mediated spin diffusion, 227, 228, 272–277
effect of protein-leakage, 228, 272–270 in structure-based design, 227, 301 intermolecular, 229, 277–278 for multi-state models, 240–243
for two state models, 233–240 on systems with encounter complexes, 240 screening of compound libraries, 301 simulation using CORCEMA, 289–301 simulation using PDB2 NOE, 282, 285 Transferred NOESY difference spectroscopy, 256 Transferred NOESY with short mixing times, 250 Transverse relaxation, 336 Treatment for more than two states, 240
Triple-resonance NMR, 82; see also Pulse sequences Troponin C EF hand, TNCIIIdimer, 147, 148, 149 Trp-repressor-complex operator, 227, 297 Twin-range method, 25 Two-dimensional ROESY, 251 Two-state model of ligand-receptor interactions, 233 Ubiquitin, 448 diamagnetic susceptibility, see Magnetic susceptibility, diamagnetic, in ubiquitin structure refinement, see Structure determination, examples, ubiquitin United-atom model, 8 Upper distance constraints, 74
Tendamistat, 19, 51
Variable target function, 21, 68, 69, 264
Tensor irreducible spherical (IRE), 315–319 magnetic susceptibility, see Magnetic susceptibility operators, 315–319 order, see Order matrix Theoretical background for intermolecular NOEs, 487 Thermolysin-inhibitor complex, 241 Three dimensional volume matrix, 166
VTB, B subunit of verotoxin, 135, 154–155
Three-spin effects, 224 Thrombin, 282 Time averaging, 14, 15 restraining, 14, 15, 16, 18, 21, 31 structure refinement, 15 Torsion-angle dynamics, 27, 31 Torsion angle restraints, 207 Toxin OSK1, 66
Water excitation, selective, 496 Water flipback, 502 Watergate, 495 demagnetizing fields, 511 diffusion filter, 496 dipolar field effects, 506, 511
spin-lock pulses, 495 Water-internal, 447 residence time, 453, 472–475 Water-protein magnetization transfer, 438–439 Water relaxation in semisolid proteins, 457 Water residence time, 491, 519 Water suppression, 494 Weak-coupling restraining methods, 6
Well-defined structure, 144 Wigner rotations, see Rotations, Wigner X-ray diffraction intensities, 4, 15, 19 scattering factors, 12
X-filtered spectroscopy, 138, 156 X-PLOR, 138, 139 XEASY, 38, 51 Z-Domain of staphylococcal protein A, 90, 97 Zeeman interaction, see Interactions, isotropic, Zeeman