Protein NMR Spectroscopy: Practical Techniques and Applications
Edited by
Lu-Yun Lian, NMR Centre for Structural Biology, Institute of Integrative Biology, The University of Liverpool, Liverpool, UK
Gordon Roberts, Henry Wellcome Laboratories of Structural Biology, Department of Biochemistry, University of Leicester, Leicester, UK
This edition first published 2011
© 2011 John Wiley & Sons Ltd

Registered office: John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose.
This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for every situation. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.

Library of Congress Cataloging-in-Publication Data
Protein NMR spectroscopy : practical techniques and applications / edited by Lu-Yun Lian, Gordon Roberts.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-72193-3
1. Proteins–Analysis. 2. Nuclear magnetic resonance spectroscopy. I. Lian, Lu-Yun. II. Roberts, G. C. K. (Gordon Carl Kenmure)
QP551.P69725 2011
547'.7–dc22
2011010948

A catalogue record for this book is available from the British Library.

Print ISBN: 9780470721933
ePDF ISBN: 9781119972013
oBook ISBN: 9781119972006
ePub ISBN: 9781119972822
Mobi: 9781119972884

Set in 10/12 pt Times by Thomson Digital, Noida, India
Contents

List of Contributors

Introduction (Lu-Yun Lian and Gordon Roberts)
  References

1 Sample Preparation, Data Collection and Processing (Frederick W. Muskett)
  1.1 Introduction
  1.2 Sample Preparation
    1.2.1 Initial Considerations
    1.2.2 Additives
    1.2.3 Sample Conditions
    1.2.4 Special Cases
    1.2.5 NMR Sample Tubes
      1.2.5.1 3 mm Tubes
  1.3 Data Collection
    1.3.1 Locking
    1.3.2 Tuning
    1.3.3 Shimming
    1.3.4 Calibrating Pulses
    1.3.5 Acquisition Parameters
    1.3.6 Fast Acquisition Methods
  1.4 Data Processing
  References

2 Isotope Labelling (Mitsuhiro Takeda and Masatsune Kainosho)
  2.1 Introduction
  2.2 Production Methods for Isotopically Labelled Proteins
    2.2.1 Recombinant Protein Expression in Living Organisms
      2.2.1.1 Escherichia coli
      2.2.1.2 Yeast Cells
      2.2.1.3 Other Host Cells
    2.2.2 Cell-Free Synthesis
      Protocol 1: Preparation of the Amino Acid Free S30 Extract
      Protocol 2: Cell-Free Reaction on a Small Scale
  2.3 Uniform Isotope Labelling of Proteins
    2.3.1 Uniform 15N Labelling
    2.3.2 Uniform 13C, 15N Labelling
    2.3.3 2H Labelling
  2.4 Selective Isotope Labelling of Proteins
    2.4.1 Amino Acid Type-Selective Labelling
    2.4.2 Reverse Labelling
    2.4.3 Stereo-Selective Labelling
  2.5 Segmental Labelling
  2.6 SAIL Methods
    2.6.1 Concept of SAIL
    2.6.2 Practical Procedure for the SAIL Method
      Protocol 3: Production of SAIL Proteins by the E. coli Cell-Free Method
    2.6.3 Residue-Selective SAIL Method
      Protocol 4: Optimisation of the Amount of SAIL Amino Acids for the Production of Calmodulin Selectively Labelled by SAIL Phenylalanine
  2.7 Concluding Remarks
  Acknowledgements
  References

3 Resonance Assignments (Lu-Yun Lian and Igor L. Barsukov)
  3.1 Introduction
  3.2 Resonance Assignment of Unlabelled Proteins
    3.2.1 Spin System Assignments
    3.2.2 Sequence-Specific Assignments
    3.2.3 Possible Difficulties
  3.3 15N-Edited Experiments
  3.4 Triple Resonance
    3.4.1 3D Triple Resonance
      3.4.1.1 Identification of Spin Systems
      3.4.1.2 Sequential Assignment
      3.4.1.3 Proline Residues
    3.4.2 4D Triple Resonance
    3.4.3 Computer-Assisted Backbone Assignments
    3.4.4 Unstructured Proteins
    3.4.5 Large Proteins
  3.5 Side-Chain Assignments
  References

4 Measurement of Structural Restraints (Geerten W. Vuister, Nico Tjandra, Yang Shen, Alex Grishaev and Stephan Grzesiek)
  4.1 Introduction
  4.2 NOE-Based Distance Restraints
    4.2.1 Physical Background
    4.2.2 NMR Experiments for Measuring the NOE
    4.2.3 Set-up of NOESY Experiments
      4.2.3.1 Estimation of T2s
      Recipe 4.1: 1–1 Echo Experiment
      Recipe 4.2: Set-up of Optimal Acquisition Times
      Recipe 4.3: Set-up of a 3D 15N-Edited NOESY Experiment (Figure 4.2a)
      Recipe 4.4: Set-up of a 3D 13C-Edited NOESY Experiment
    4.2.4 Deriving Structural Information from NOE Cross-peaks
      Recipe 4.5: Extraction of Distances Using Classes
      Recipe 4.6: Extraction of Distances Using the Two-Spin Approximation
    4.2.5 Information Content of NOE Restraints
  4.3 Dihedral Restraints Derived from J-Couplings
    4.3.1 Physical Background
    4.3.2 NMR Experiments for Measuring J-Couplings
      Recipe 4.7: E.COSY Experiment
      Recipe 4.8: Quantitative J-Correlation
    4.3.3 Deriving Structural Information from J-Couplings
  4.4 Hydrogen Bond Restraints
    4.4.1 NMR H-Bond Observables
    4.4.2 Detection of N–H···O=C H-Bonds in Proteins
      Recipe 4.9: Setting up a Long-Range HNCO Experiment for H-Bond Detection
  4.5 Orientational Restraints
    4.5.1 Physical Background
      4.5.1.1 Dipolar Couplings in Anisotropic Solution
      4.5.1.2 The Alignment Tensor
      4.5.1.3 Chemical Shifts in Anisotropic Solution
    4.5.2 Alignment Methods
      4.5.2.1 Intrinsic Molecular Alignment
      4.5.2.2 Indirect Alignment by External Media
    4.5.3 Measurements and Data Analysis
    4.5.4 Determination of the Alignment Tensor
      4.5.4.1 Degeneracy of Solutions
      4.5.4.2 Prediction of the Alignment Tensor from the Structure
    4.5.5 RDCs in Structure Validation
      4.5.5.1 Q-Factor
      4.5.5.2 Using RDC Values for Database Screening
    4.5.6 RDCs in Structure Determination
      4.5.6.1 Structure Refinement
      4.5.6.2 Domain Orientation
      4.5.6.3 De Novo Structure Determination
    4.5.7 Conclusion
  4.6 Chemical Shift Structural Restraints
    4.6.1 Origin of Chemical Shifts and Its Relation to Protein Structure
    4.6.2 Obtaining Chemical Shifts
    4.6.3 Backbone Dihedral Angle Restraints from Chemical Shifts (TALOS)
      Recipe 4.10: Using the TALOS+ Program (for details see http://spin.niddk.nih.gov/bax/software/TALOS+/)
    4.6.4 Protein Structure Determination from Chemical Shifts (CS-Rosetta)
      Recipe 4.11: CS-Rosetta Structure Calculation
  4.7 Solution Scattering Restraints
    4.7.1 Physical Background
    4.7.2 Shape Reconstructions from Solution Scattering Data
    4.7.3 Use of SAXS in High-Resolution Structure Determination
    4.7.4 Sample Preparation
    4.7.5 Data Collection
    4.7.6 Data Processing and Initial Analysis
  Acknowledgement
  References

5 Calculation of Structures from NMR Restraints (Peter Güntert)
  5.1 Introduction
  5.2 Historical Development
  5.3 Structure Calculation Algorithms
    5.3.1 Molecular Dynamics Simulation versus NMR Structure Calculation
    5.3.2 Potential Energy – Target Function
    5.3.3 Torsion Angle Dynamics
      5.3.3.1 Tree Structure
      5.3.3.2 Kinetic Energy
      5.3.3.3 Forces = Torques = Gradient of the Target Function
      5.3.3.4 Equations of Motion
      5.3.3.5 Torsional Accelerations
      5.3.3.6 Time Step
    5.3.4 Simulated Annealing
      Protocol for Simulated Annealing
  5.4 Automated NOE Assignment
    5.4.1 Ambiguity of Chemical Shift Based NOESY Assignment
    5.4.2 Ambiguous Distance Restraints
    5.4.3 Combined Automated NOE Assignment and Structure Calculation with CYANA
    5.4.4 Network-Anchoring
    5.4.5 Constraint Combination
    5.4.6 Structure Calculation Cycles
  5.5 Nonclassical Approaches
    5.5.1 Assignment-Free Methods
    5.5.2 Methods Based on Residual Dipolar Couplings
    5.5.3 Chemical Shift-Based Structure Determination
  5.6 Fully Automated Structure Analysis
  References

6 Paramagnetic Tools in Protein NMR (Peter H.J. Keizers and Marcellus Ubbink)
  6.1 Introduction
  6.2 Types of Restraints
    6.2.1 Paramagnetic Dipolar Relaxation Enhancement
    6.2.2 Other Types of Relaxation
    6.2.3 Residual Dipolar Couplings
    6.2.4 Contact and Pseudocontact Shifts
  6.3 What Metals to Use?
  6.4 Paramagnetic Probes
    6.4.1 Substitution of Metals
    6.4.2 Free Probes
    6.4.3 Nitroxide Labels
    6.4.4 Metal Binding Peptides
    6.4.5 Synthetic Metal Chelating Tags
      Protocol for the Application of Paramagnetic NMR on Diamagnetic Proteins
  6.5 Examples
    6.5.1 Structure Determination of Paramagnetic Proteins
    6.5.2 Structure Determination Using Artificial Paramagnets
    6.5.3 Structures of Protein Complexes
    6.5.4 Studying Dynamics with Paramagnetism
  6.6 Conclusions and Perspective
  References

7 Structural and Dynamic Information on Ligand Binding (Gordon Roberts)
  7.1 Introduction
  7.2 Fundamentals of Exchange Effects on NMR Spectra
    7.2.1 Definitions
    7.2.2 Lineshape
    7.2.3 Identification of the Exchange Regime
  7.3 Measurement of Equilibrium and Rate Constants
    7.3.1 Lineshape Analysis
      7.3.1.1 Slow Exchange
      7.3.1.2 Fast Exchange
    7.3.2 Magnetisation Transfer Experiments
      7.3.2.1 Saturation Transfer
      7.3.2.2 Inversion Transfer
      7.3.2.3 Two-Dimensional Exchange Spectroscopy
    7.3.3 Relaxation Dispersion Experiments
  7.4 Detecting Binding – NMR Screening
    7.4.1 Detecting Binding by Changes in Rotational and Translational Mobility of the Ligand
    7.4.2 Detecting Binding by Magnetisation Transfer
      7.4.2.1 Saturation Transfer Difference (STD) Spectroscopy
      7.4.2.2 Water-LOGSY
  7.5 Mechanistic Information
    7.5.1 Problems of Fast Exchange
    7.5.2 Identification of Kinetic Mechanisms
      7.5.2.1 Slow Exchange
      7.5.2.2 Fast Exchange
  7.6 Structural Information
    7.6.1 Ligand Conformation – the Transferred NOE
      7.6.1.1 Exchange Rate
      7.6.1.2 Contributions from Other Species
      7.6.1.3 Spin Diffusion
      7.6.1.4 Structure Calculation
    7.6.2 Interligand Transferred NOEs
      7.6.2.1 Two Ligands Bound Simultaneously
      7.6.2.2 Competitive Ligands – INPHARMA
    7.6.3 Ligand Conformation – Transferred Cross-Correlated Relaxation
    7.6.4 Chemical Shift Mapping – Location of the Binding Site
    7.6.5 Paramagnetic Relaxation Experiments
    7.6.6 Isotope-Filtered and -Edited Experiments
  References

8 Macromolecular Complexes (Paul C. Driscoll)
  8.1 Introduction
  8.2 Spectral Simplification through Differential Isotope Labelling
  8.3 Basic NMR Characterisation of Complexes
    Protocol for Protein–Protein Titrations
  8.4 3D Structure Determination of Macromolecular Protein–Ligand Complexes
    8.4.1 NOEs
    8.4.2 Saturation Transfer
    8.4.3 Residual Dipolar Couplings
    8.4.4 Paramagnetic Relaxation Enhancements
    8.4.5 Pseudo-Contact Shifts
    8.4.6 Data-Driven Docking
    8.4.7 Small Angle X-Ray Scattering (SAXS)
  8.5 Literature Examples
    8.5.1 Protein–Protein Interactions
    8.5.2 Protein–DNA Interactions
    8.5.3 Protein–RNA Interaction
      8.5.3.1 Protein–dsRNA
      8.5.3.2 Protein–ssRNA
  References

9 Studying Partially Folded and Intrinsically Disordered Proteins Using NMR Residual Dipolar Couplings (Malene Ringkjøbing Jensen, Valery Ozenne, Loic Salmon, Gabrielle Nodet, Phineus Markwick, Pau Bernadó and Martin Blackledge)
  9.1 Introduction
  9.2 Ensemble Descriptions of Unfolded Proteins
  9.3 Experimental Techniques for the Characterisation of IDPs
  9.4 NMR Spectroscopy of Intrinsically Disordered Proteins
    9.4.1 Chemical Shifts
    9.4.2 Scalar Couplings
    9.4.3 Nuclear Overhauser Enhancements
    9.4.4 Paramagnetic Relaxation Enhancements
    9.4.5 Residual Dipolar Couplings
  9.5 Residual Dipolar Couplings
    9.5.1 Interpretation of RDCs in Disordered Proteins
    9.5.2 RDCs in Highly Flexible Systems: Explicit Ensemble Models
    9.5.3 RDCs to Detect Deviation from Random Coil Behaviour in IDPs
    9.5.4 Multiple RDCs Increase the Accuracy of Determination of Local Conformational Propensity
    9.5.5 Quantitative Analysis of Local Conformational Propensities from RDCs
    9.5.6 Conformational Sampling in the Disordered Transactivation Domain of p53
  9.6 Conclusions
  References

Index
List of Contributors

Igor Barsukov, NMR Centre for Structural Biology, The University of Liverpool, School of Biological Sciences, Biosciences Building, Crown Street, Liverpool L69 7ZB, United Kingdom
Pau Bernadó, Institute for Research in Biomedicine, c/Baldiri Reixac 10, 08028 Barcelona, Spain
Martin Blackledge, Institut de Biologie Structurale, UMR 5075 CEA-CNRS-UJF, 41 Rue Jules Horowitz, Grenoble 38027, France
Paul Driscoll, Division of Molecular Structure, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, United Kingdom
Alex Grishaev, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, 5 Memorial Drive, Bethesda, MD 20892
Stephan Grzesiek, Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
Peter Güntert, Institut für Biophysikalische Chemie, BMRZ, J.W. Goethe-Universität, Max-von-Laue-Str. 9, 60438 Frankfurt am Main, Germany
Malene Ringkjøbing Jensen, Institut de Biologie Structurale, UMR 5075 CEA-CNRS-UJF, 41 Rue Jules Horowitz, Grenoble 38027, France
Masatsune Kainosho, Center for Structural Biology, Graduate School of Science, Nagoya University, Chikusa Ku, Furo Cho, Nagoya, Aichi 4648602, Japan
Peter H.J. Keizers, Leiden Institute of Chemistry, Leiden University, Gorlaeus Laboratories, P.O. Box 9502, 2300 RA Leiden, The Netherlands
Lu-Yun Lian, NMR Centre for Structural Biology, The University of Liverpool, School of Biological Sciences, Biosciences Building, Crown Street, Liverpool L69 7ZB, United Kingdom
Phineus Markwick, Howard Hughes Medical Institute, 9500 Gilman Drive, La Jolla, California 92093-0378, USA
Frederick W. Muskett, Henry Wellcome Laboratories of Structural Biology, Department of Biochemistry, University of Leicester, PO Box 138, Lancaster Road, Leicester LE1 9HN, United Kingdom
Gabrielle Nodet, Institut de Biologie Structurale, UMR 5075 CEA-CNRS-UJF, 41 Rue Jules Horowitz, Grenoble 38027, France
Valery Ozenne, Institut de Biologie Structurale, UMR 5075 CEA-CNRS-UJF, 41 Rue Jules Horowitz, Grenoble 38027, France
Gordon Roberts, Henry Wellcome Laboratories of Structural Biology, Department of Biochemistry, University of Leicester, PO Box 138, Lancaster Road, Leicester LE1 9HN, United Kingdom
Loic Salmon, Institut de Biologie Structurale, UMR 5075 CEA-CNRS-UJF, 41 Rue Jules Horowitz, Grenoble 38027, France
Yang Shen, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, 5 Memorial Drive, Bethesda, MD 20892
Mitsuhiro Takeda, Center for Structural Biology, Graduate School of Science, Nagoya University, Chikusa Ku, Furo Cho, Nagoya, Aichi 4648602, Japan
Nico Tjandra, Laboratory of Molecular Biophysics, National Heart, Lung, and Blood Institute, National Institutes of Health, 50 South Drive, Bethesda, Maryland 20892, USA
Marcellus Ubbink, Leiden Institute of Chemistry, Leiden University, Gorlaeus Laboratories, PO Box 9502, 2300 RA Leiden, The Netherlands
Geerten Vuister, Protein Biophysics, Institute of Molecules and Materials, Radboud University Nijmegen, PO Box 9101, 6500 HB Nijmegen, The Netherlands; and Henry Wellcome Laboratories of Structural Biology, Department of Biochemistry, University of Leicester, PO Box 138, Lancaster Road, Leicester LE1 9HN, United Kingdom
Figure 4.21 (a) Flowchart of the TALOS+ method. (b) Graphic TALOS+ inspection interface for TolR protein. (See details at the TALOS+ webpage, http://spin.niddk.nih.gov/bax/software/TALOS+/)
[Panel (a) flowchart labels: chemical shifts and sequence of each tri-peptide (i-1, i, i+1); similarity calculation against a predefined tri-peptide database holding chemical shifts, sequence, and ϕ and ψ angles; ten matched tri-peptides (j-1, j, j+1); ANN-predicted ϕ/ψ distribution; consistency check; average (ϕ, ψ) and standard deviation; predicted (ϕi, ψi).]
Figure 4.22 (a) Flowchart of CS-Rosetta protocol. (b–e) CS-Rosetta structure generation of TolR protein. (b) Plot of Rosetta full-atom energy versus Cα rmsd relative to the experimental TolR monomer structure for all CS-Rosetta models. (c) as (b) but showing the Rosetta full-atom energy rescored (augmented) by the chemical shift deviation (χ²) energy. (d) as (c) showing the Rosetta full-atom energy rescored by the chemical shift deviation (χ²) energy versus the Cα rmsd from the lowest energy CS-Rosetta model. (e) Backbone ribbon representation of 10 CS-Rosetta models with lowest energy (dark grey) superimposed on the experimental monomer NMR structure (light grey) for TolR
[Panel (a) flowchart labels: chemical shifts and sequence; 1. MFR fragment selection from a predefined structural database with chemical shifts calculated by SPARTA; fragment candidates; 2. Rosetta fragment assembly; all-atom models; 3. chemical shift score calculation; rescored models; convergence check; predicted structure. Plot axes: Rosetta energy versus Cα RMSD to TolR, and Rosetta energy versus Cα RMSD to the lowest energy model.]
Figure 5.7 Structures obtained by fully automated structure determination with the FLYA algorithm (blue) superimposed on the corresponding NMR structures determined by conventional methods (dark red). (a) ENTH domain At3g16270(9–135) from Arabidopsis thaliana [147]. (b) Rhodanese homology domain At4g01050(175–295) from Arabidopsis thaliana [148]. (c) Src homology domain 2 (SH2) from the human feline sarcoma oncogene Fes [143]
Figure 6.5 The complex of nitrite reductase (NiR) and pseudoazurin (Paz). NiR is shown in spacefill, with its subunits in blue, pink and green. The best twenty Paz orientations are shown as Cα traces. The Paz copper atoms are shown as green spheres and the positions of the Gd3+ ions in the CLaNP molecules as orange spheres. Reprinted from Vlasie et al. [96], Copyright (2008) with permission from Elsevier
Figure 6.6 The ensemble of complex structures of adrenodoxin (Adx) and cytochrome c (Cc) from a PCS-based simulation illustrates the degree of dynamics between the two proteins. Adx is shown in a surface representation coloured according to the electrostatic potential with red for negative and blue for positive; the Fe2S2 binding loop is in yellow. The centres of mass of Cc are represented by green spheres. Reprinted with permission from Xu et al. [125]. Copyright 2008, American Chemical Society
Figure 7.4 Relaxation dispersion. (A) Schematic representation of signal dephasing during CPMG pulse trains based on the analogy to the runners described in the text, where the y axis plots the distance of the runners from the starting position. A blue or red line indicates a spin in the major or minor state, respectively. Dashed lines correspond to spins experiencing at least one conformational transition, whereas the solid lines correspond to no transitions. Reproduced by permission from Mittermaier and Kay, Science (2006) 312, 224–228. (B) 15N CPMG relaxation dispersion profiles obtained at 500 MHz (blue, lower) and 800 MHz (red, upper) proton Larmor frequencies for Leu7 of the Fyn SH3 domain partially saturated with its ligand, a 12-residue peptide. Data are shown for 20, 30, 35, 40, 45, and 50 °C (a)–(f), illustrating the effects on the relaxation dispersion of the changes in koff, from 11.7 s⁻¹ at 20 °C to 331 s⁻¹ at 50 °C. Best-fit curves were generated using a single value of pB optimised for all temperatures, koff values fit globally, and a Δω value taken from HSQC spectra of the free and peptide-saturated states. Reproduced by permission from Demers and Mittermaier, J. Amer. Chem. Soc. (2009) 131, 4355–4367
Figure 8.6 Top: best-fit superposition of the backbone atoms of the 40 simulated annealing structures of the N-terminal domain of enzyme I (EIN) complexed to the histidine-containing phosphocarrier protein (HPr). Bottom: ribbon diagrams illustrating two views of the 40 kDa EIN–HPr complex. HPr is shown in green, the α-domain of EIN in red, and the α/β-domain and C-terminal helix of EIN in blue. Also shown in gold are the side-chains of active site histidine residues of both EIN and HPr. Image taken from [59]
Figure 8.7 NMR data for the titration of insulin-like growth factor-2 (IGF2) receptor (IGF2R) domain 11 with IGF2 as reported by Williams et al. [148]. Left: 2D 15N,1H-correlation spectra of IGF2R domain 11 in the absence (black) and presence (red) of IGF2. The insert panel shows an expanded view of the boxed region. Right: the pattern of IGF2-binding-dependent chemical shift perturbations for IGF2R domain 11 mapped on a molecular surface representation. Residues with shift perturbations > 0.05 ppm are red, and residues with shift perturbations < 0.05 ppm are orange. Blue indicates NH resonances that broaden and disappear (i.e. are ‘bleached’) upon IGF2 binding (cf. Figure 1.2d); grey indicates little or no change in chemical shift upon binding. Image taken from [148]
Figure 8.8 Structural models of IGF2R-D11-IGF2 complex generated using HADDOCK. The two lowest energy structures in each of the candidate clusters are shown, with IGF2R-D11 depicted in surface mode, the IGF2 backbone in ribbon mode and selected side-chains as sticks. The core of the IGF2 binding site is coloured blue, and the side-chain of E1544, which is known to negatively regulate IGF2 binding, is drawn in red. The orientation of IGF2 differs by approximately 20° between the two models. Image taken from [148]
Figure 8.10 NMR-derived structure of the complex formed between the Rnt1p RNAse III dsRBD protein and snR47h AGNN tetraloop hairpin RNA determined by Wu et al. [155]. Left: best-fit superposition of the 15 lowest energy NMR models with the protein shown in blue and RNA in green. Right: schematic representation of the lowest energy structure in the bundle with the RNA helical backbone indicated by thin blue cylinder, the RNA atoms shown in stick form and the protein as ribbons with residues populating the protein-RNA interface shown as ball and sticks. Image adapted from [155]
Figure 8.11 Representations of the molecular components of the structure of the complex between Rous sarcoma virus (RSV) μΨ packaging signal RNA and Zn-binding RSV nucleocapsid (NC) protein reported by Zhou et al. [164]. The predicted secondary structure elements of the RNA and the coordination of the Zn atoms are shown. Nonnative nucleotides used to enable in vitro transcription and protease cleavage of the expressed fusion protein are depicted in red and grey respectively. Image taken from [164]
Figure 8.12 Top: two sample regions of the 3D 13C-edited NOESY-HMQC spectrum recorded for double 13C,15N isotope-labelled Rous sarcoma virus (RSV) nucleocapsid (NC) protein bound to unlabelled RSV μΨ packaging signal RNA investigated by Zhou et al. [164]. The crosspeaks correspond to intermolecular NOE contacts associated with residues Arg16 and Ala32 of the NC N-terminal Zn-knuckle, respectively. Bottom: Overlay of the 2D 1H NOESY spectra obtained for specifically protonated GH-μΨ (black) and UH-μΨ (red) bound to NC showing intermolecular NOE crosspeaks connecting the stem-loop C (SL-C) tetraloop RNA residues U217, G218 and G220 to the NC N-terminal Zn-knuckle residues Tyr22 and Tyr30. Image taken from [164]
Figure 8.13 (a) Rendering of the 20 NMR-derived structures of the NC:μΨ complex showing the relative convergence of the secondary structure elements, obtained by best-fit superposition of the SL-C stem carbon atoms. The result shows that the relative positions of SL-B (green), SL-C (brown), O3 (red), the linkers (orange), and the NC Zn-knuckles (blue) are well defined by the NMR data, but the position of SL-A (purple) is not; (b) and (c) show two different stereo views of a representative structure, showing the relative positions of the NC and μΨ secondary structure elements. Image taken from [164]
Figure 9.3 Residual dipolar couplings (1DNH and 2DC′NH) from the two-domain protein, PX, from Sendai virus. (a) 1DNH and 2DC′NH RDCs are well reproduced throughout the protein using flexible meccano (black). Experimental values are shown in grey. (b) Ensemble representation of PX. Reprinted from Malene Ringkjøbing Jensen, Phineus R.L. Markwick, Sebastian Meier, Christian Griesinger, Markus Zweckstetter, Stephan Grzesiek, Pau Bernadó, and Martin Blackledge, Quantitative determination of the conformational properties of partially folded and intrinsically disordered proteins using NMR dipolar couplings, Structure, Vol. 17(9): 1169–1185, Copyright 2009 with permission from Elsevier
Introduction
Lu-Yun Lian and Gordon Roberts
The nuclear magnetic resonance (NMR) method is one of the principal techniques used to obtain physical, chemical, electronic and three-dimensional structural information about molecules in solution, whether small molecules, proteins, nucleic acids, or carbohydrates. NMR is a physical phenomenon based upon the magnetic properties of certain atomic nuclei. When exposed to a very strong magnetic field (2–21.1 T) these nuclei align with this field. During an NMR experiment, the alignment is perturbed using a radiofrequency signal (typically a few hundred megahertz). When the radio transmitter is turned off, the nuclei return to equilibrium and in the process re-emit radio waves. The usefulness of this technique in biochemistry results largely from the fact that nuclei of the same element in different chemical and magnetic environments give rise to distinct spectral lines. This means that each NMR-active atom in a large molecule such as a protein can be observed and can provide information on structure, conformation, ionisation state, pKa, and dynamics.

The nuclei which are most relevant to the study of biological macromolecules are shown in Table 1. The proton (1H) is the most sensitive nucleus for NMR detection. For biological studies, 13C and 15N are now just as important, although enrichment with these stable isotopes is necessary. The first published NMR spectrum of a biological macromolecule was the 40 MHz 1H spectrum of pancreatic ribonuclease reported in 1957. Since then, the significant milestones for NMR include:
• Fourier Transform NMR in the late 1960s;
• the development of two-dimensional NMR in the early 1970s;
• the development of INEPT/HMQC pulse sequences in the late 1970s;
Protein NMR Spectroscopy: Practical Techniques and Applications, First Edition. Edited by Lu-Yun Lian and Gordon Roberts. © 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd.
Table 1 Properties of nuclei of interest in NMR studies of proteins

Isotope   Spin   Frequency (MHz) at 14.0954 T   Natural abundance (%)   Relative sensitivity (a)
1H        1/2    600.13                         99.99                   1.00
2H        1      92.124                         0.0115                  9.65 × 10^-3
3H        1/2    640.123                        0                       1.21
13C       1/2    150.903                        1.108                   1.59 × 10^-2
14N       1      43.367                         99.636                  1.01 × 10^-3
15N       1/2    60.834                         0.37                    1.04 × 10^-3
17O       5/2    81.356                         0.038                   2.91 × 10^-2
19F       1/2    564.686                        100                     0.83
31P       1/2    242.937                        100                     6.63 × 10^-2
111Cd     1/2    127.32                         12.80                   9.66 × 10^-2
113Cd     1/2    133.188                        12.22                   1.11 × 10^-2

(a) For equivalent numbers of nuclei (i.e. 100 % isotope).
• the application of NMR to solve the full three-dimensional structure of a protein in solution in the early 1980s;
• the introduction, in the late 1980s, of three- and four-dimensional heteronuclear experiments for use with 13C/15N isotopically labelled proteins, followed in the late 1990s by the TROSY experiments (which require protein deuteration in addition).
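The frequency and sensitivity columns of Table 1 are linked by two simple proportionalities, and a short calculation makes the link concrete. The sketch below is illustrative only: the function names are hypothetical, the input values are copied from Table 1 for a few nuclei, and it assumes the standard relations that the resonance frequency scales linearly with field strength and that the sensitivity at constant field, for equal numbers of nuclei, scales as γ³I(I+1). Neither relation is stated explicitly in the text above.

```python
# Illustrative sketch; input values are copied from Table 1.
B0_TABLE_T = 14.0954  # field strength (tesla) at which Table 1 frequencies are quoted

# isotope -> (spin quantum number I, resonance frequency in MHz at B0_TABLE_T)
NUCLEI = {
    "1H": (0.5, 600.13),
    "2H": (1.0, 92.124),
    "13C": (0.5, 150.903),
    "15N": (0.5, 60.834),
    "31P": (0.5, 242.937),
}


def larmor_mhz(isotope: str, field_tesla: float) -> float:
    """Resonance frequency at another field, scaled linearly from the table value."""
    _, freq_at_table_field = NUCLEI[isotope]
    return freq_at_table_field * field_tesla / B0_TABLE_T


def relative_sensitivity(isotope: str) -> float:
    """Sensitivity relative to 1H at constant field, for equal numbers of nuclei.

    Assumes sensitivity is proportional to gamma^3 * I(I+1); the ratio of
    tabulated frequencies stands in for the ratio of gyromagnetic ratios.
    """
    spin, freq = NUCLEI[isotope]
    spin_h, freq_h = NUCLEI["1H"]
    gamma_ratio = freq / freq_h
    return gamma_ratio**3 * spin * (spin + 1) / (spin_h * (spin_h + 1))


if __name__ == "__main__":
    for name in NUCLEI:
        print(
            f"{name:>4}  {larmor_mhz(name, 21.1):8.2f} MHz at 21.1 T   "
            f"relative sensitivity {relative_sensitivity(name):.2e}"
        )
```

Under these assumptions the calculated values for 2H, 13C, 15N and 31P agree with the relative sensitivity column of Table 1 to the precision quoted there, and the 1H frequency at 21.1 T comes out at about 898 MHz, consistent with the "few hundred megahertz" range mentioned in the text.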
Each stage of these developments has been accompanied by improvements in the spectrometer hardware. In particular, the increases in magnetic field strengths, improved probeheads such as cryogenically-cooled probeheads and better electronics have together led to very substantial improvements in resolution and sensitivity. In addition, continuous advances in molecular biology and sample preparation have allowed these NMR-based improvements to be exploited, particularly in the speed with which samples of significant quantities can be produced in a cost-effective way and the ease with which stable isotope enrichment can be accomplished. Finally, the data analysis is now more streamlined and in the case of very high-quality data, the structure determination process, from resonance assignment to structure calculation, can be automated. These developments are important in order for NMR to remain a mainstream technique for high-resolution structure determination and to make significant contributions in structural biology.

For structural biology, NMR is unique in that it can be used for studies of macromolecules in both the solution and solid states and it is, furthermore, the only method that can provide information on dynamics at the atomic level.

This book focuses on the use of NMR to study protein structure and interactions in solution, with the aim of providing a practical guide to users of the method. The book attempts to deal with methods, approaches and issues commonly encountered in the everyday use of NMR in structural biology. No attempt is made to provide a description of the fundamental physics of NMR, but in some chapters it is necessary to detail the theoretical aspects of the methodology in order that the methods can be appropriately applied. A full discussion of the fundamental basis of the wide range of solution NMR experiments used in structural biology can be found in [1] and other valuable introductions to modern NMR spectroscopy include [2,3].
The success of any application of NMR depends on the correct sample preparation, the appropriate use of parameters for data acquisition and processing; these are covered in Chapter 1. Once initial data has been collected to assess if a protein system is suitable for NMR studies, the next step will depend upon whether the objective is a determination of the three-dimensional structure of a protein or its complex, or a more limited specific objective such as screening for ligand binding or the determination of pKa values. For all but the smallest proteins, isotope-labelling will be required (Chapter 2), and to go beyond purely qualitative experiments resonance assignments (Chapter 3) will be essential. Structure determination involves the acquisition and treatment of structural restraints (Chapter 4) and the use of these to obtain structural ensembles (Chapter 5). Chapter 6 describes the additional information on protein structure or complex formation which can be obtained when the protein contains a paramagnetic species – either naturally occurring or introduced specifically for the purpose. Chapter 7 describes different approaches to the study of the binding of small molecules, ranging from screening to full structure determination of the complex; this requires an understanding of the theoretical and practical aspects of the effects of chemical exchange in NMR, which is also important in many other areas of biological NMR. Chapter 8 provides a comprehensive description of the use of NMR to study macromolecular complexes; this is a challenging area and the chapter outlines the problems and approaches which can be taken to overcome these challenges. Chapter 9 focuses on the structural studies of intrinsically disordered proteins. The widespread existence and significance of these proteins are becoming increasingly recognised and NMR is currently the best method to provide detailed information on their conformational distributions. 
NMR is uniquely suited for the characterisation of biomolecular dynamics. Since so many nuclei can be detected simultaneously, NMR can provide a comprehensive description of the internal motions and conformational fluctuations at atomic resolution, and NMR methods have been developed to quantify motions that occur at a wide range of timescales, from picoseconds to days and months. At the same time, consideration of dynamics and the averaging processes to which they lead is an essential part of the use of NMR to obtain structural information. As a result, several chapters in this book deal with methods for obtaining dynamic information from NMR. For additional information the reader is also directed to the following reviews [4–7].
Over the last few years there have been significant developments in the application of solid-state NMR techniques as a tool for determining the high-resolution structures of proteins, ranging from microcrystalline soluble proteins to protein fibrils and membrane proteins. It is now possible to assign the spectra of proteins larger than 100 amino acids using 13C,15N-labelling [8]. However, as yet this remains an area for the expert and it is not covered in detail in this book (although some of the methods for isotope-labelling described in Chapter 2 will also be relevant to solid-state studies). For useful reviews, the reader is directed to [9,10].
Note Added in Proof
Several valuable relevant reviews have appeared while this book was in production. In particular, two useful qualitative introductions to biomacromolecular NMR for the newcomer to the field would serve as valuable initial reading [11,12]. Clore [13] has
reviewed the use of relaxation methods (see Chapters 6 and 7) to observe species with low population, Wishart [14] has reviewed the use of chemical shifts in structure determination (see Chapters 4 and 5), and Dominguez et al. [15] have reviewed the use of NMR in the study of protein-RNA complexes (see Chapter 8).
References
1. Cavanagh, J., Fairbrother, W.J., Palmer, A.G. III et al. (2007) Protein NMR Spectroscopy: Principles and Practice, 2nd edn, Academic Press, San Diego.
2. Keeler, J. (2005) Understanding NMR Spectroscopy, John Wiley & Sons, Ltd, Chichester.
3. Levitt, M.H. (2001) Spin Dynamics: Basis of Nuclear Magnetic Resonance, John Wiley & Sons, Ltd, Chichester.
4. Mittermaier, A.K. and Kay, L.E. (2009) Observing biological dynamics at atomic resolution using NMR. TIBS, 34, 601–611.
5. Baldwin, A.J. and Kay, L.E. (2009) NMR spectroscopy brings invisible protein states into focus. Nature Chem. Biol., 5, 808–814.
6. Jarymowycz, V.A. and Stone, M.J. (2006) Fast time scale dynamics of protein backbones: NMR relaxation methods, applications, and functional consequences. Chem. Revs., 106, 1624–1671.
7. Igumenova, T.I., Frederick, K.K. and Wand, A.J. (2006) Characterization of the fast dynamics of protein amino acid side chains using NMR relaxation in solution. Chem. Revs., 106, 1672–1699.
8. Schuetz, A. et al. (2010) Protocols for the sequential solid-state NMR spectroscopic assignment of a uniformly labeled 25 kDa protein: HET-s(1-227). ChemBioChem., 11, 1543–1551.
9. Renault, M., Cukkemane, A. and Baldus, M. (2010) Solid-state NMR spectroscopy on complex biomolecules. Angew. Chem. Int. Ed., 49, 8346–8357.
10. McDermott, A. (2009) Structure and dynamics of membrane proteins by magic angle spinning solid-state NMR. Ann. Rev. Biophys., 38, 385–403.
11. Kwan, A.H., Mobli, M., Gooley, P.R. et al. (2011) Macromolecular NMR spectroscopy for the nonspectroscopist. FEBS Journal, 278, 687–703.
12. Bieri, M., Kwan, A.H., Mobli, M. et al. (2011) Macromolecular NMR spectroscopy for the nonspectroscopist: beyond macromolecular solution structure determination. FEBS Journal, 278, 704–715.
13. Clore, G.M. (2011) Exploring sparsely populated states of macromolecules by diamagnetic and paramagnetic NMR relaxation. Protein Sci., 20, 229–246.
14. Wishart, D.S. (2011) Interpreting protein chemical shift data. Prog. Nucl. Magn. Reson. Spectrosc., 58, 1–61.
15. Dominguez, C., Schubert, M., Duss, O. et al. (2011) Structure determination and dynamics of protein–RNA complexes by NMR spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc., 58, 62–87.
1 Sample Preparation, Data Collection and Processing Frederick W. Muskett
Purgamentum init, exit purgamentum
1.1 Introduction
The power of NMR spectroscopy for the analysis of biological macromolecules is undisputed. During the last two decades, the development of spectrometers and the experiments they perform, software, and the molecular biological techniques for the expression and purification of proteins have progressed at a formidable rate. Enrichment of molecules in the three major isotopes used in NMR (15N, 13C and 2H) is now commonplace and the cost is no longer prohibitive. The software used to analyse the plethora of data we can generate makes spectral assignment and the extraction of data straightforward for all but the most challenging systems. With all these developments it is easy to forget some of the more fundamental requirements for obtaining good quality NMR data, namely a good sample and a well-set-up NMR experiment.
1.2 Sample Preparation
The first, and possibly one of the most important, steps before embarking on an NMR-based project is the preparation of the sample. Spending some time optimising sample
conditions for concentration, ionic strength, pH and temperature before collecting large amounts of data will pay dividends, particularly if the sample is difficult and expensive to produce. Ideally, the optimised sample will not only give the best possible NMR data, but will also have long-term stability as, assuming the project requires backbone and side-chain assignments, the total acquisition time required can be in the order of several weeks. The following sections will outline the general requirements of a biological sample that is to be used to record NMR data. The assumption has been made that full resonance assignment is required; however, these guidelines could, and probably should, be applied to all samples regardless of the intention of the experiments.
1.2.1 Initial Considerations
This optimisation can be performed in the NMR spectrometer but much can be done using other biophysical techniques such as circular dichroism or fluorescence spectroscopy. These methods require much lower concentrations and do not require isotopic enrichment. The effects of buffer composition on secondary structure content and the melting temperature of the sample give a useful starting point. Once the initial conditions have been determined, final optimisation in the spectrometer can begin. If the sample is a protein, although much can be learned from a simple one-dimensional proton experiment, by far the most useful experiment is the 15N-edited HSQC. This type of experiment removes a great deal of resonance overlap, allowing the user to see in much more detail the effects of varying pH, ionic strength and temperature.
NMR has intrinsically poor sensitivity and, as a result, the concentration of the sample needs to be in the millimolar range. For a conventional room temperature probe ideally the sample concentration needs to be 1 mM but can be as low as 0.5 mM. With the development of cryogenically cooled probes this concentration can be reduced to 0.2 mM and given the right sample can be as low as 0.05 mM (depending on the experiments performed). Whilst the sample can be exchanged into the buffer intended for NMR experiments in the last step of purification (usually gel filtration chromatography) it is rarely at the concentration required. The two main methods used to increase concentration are lyophilisation with subsequent re-suspension in a lower volume and ultra-filtration, or a combination of the two. Unfortunately, whether a sample will survive either method cannot be predicted; in the end one must simply try and see what happens. However, lyophilisation is generally considered the more dangerous of the two.
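As a rough planning aid, the concentrations quoted above can be converted into the mass of protein needed and the degree of concentration required after purification. The sketch below is illustrative only: the 600 µl sample volume, the 20 kDa molecular mass and the 0.05 mM starting concentration are assumed example values, and the function names are hypothetical.

```python
def protein_mass_mg(conc_mM: float, volume_ul: float, mw_kda: float) -> float:
    """Mass of protein (mg) needed for a sample of given concentration and volume."""
    # mmol/L * L * g/mol = mg  (volume converted µl -> L, mass converted kDa -> g/mol)
    return conc_mM * (volume_ul * 1e-6) * (mw_kda * 1e3)


def fold_concentration(initial_mM: float, target_mM: float) -> float:
    """Factor by which the volume must be reduced to reach the target concentration."""
    if initial_mM <= 0:
        raise ValueError("initial concentration must be positive")
    return target_mM / initial_mM


if __name__ == "__main__":
    # Assumed example: 600 µl sample of a 20 kDa protein at the concentrations in the text.
    for conc in (1.0, 0.5, 0.2, 0.05):
        print(f"{conc:4.2f} mM -> {protein_mass_mg(conc, 600, 20):5.2f} mg")
    # Assumed example: column eluate at 0.05 mM concentrated to 1 mM.
    print(f"fold concentration: {fold_concentration(0.05, 1.0):.0f}x")
```

The calculation assumes no losses during concentration, which is rarely true in practice, so the result is best read as a lower bound on the material required.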
The number and type of disposable ultra-filtration devices on the market is large, each with their own characteristics regarding compatibility with a particular sample and the effective volumes with which they can be used; again, try and see what happens. As a final step, either passing the sample through a 0.2 µm filter or centrifuging in a benchtop micro-centrifuge, to remove any insoluble material or dust, will greatly help sample homogeneity.
With modern solvent suppression techniques that can effectively eliminate the 110 M protons from the water signal, dissolving the sample in 2H2O would, at first, no longer seem to be required. However, such samples are still important in recording the experiments designed to allow assignment of protein side-chain resonances and for 13C-edited nuclear Overhauser effect (NOE) experiments. Even the most efficient solvent suppression techniques still leave residual solvent signal and at the same time suppress or distort the signals of interest in that area of the spectrum. In addition, the use of 2H2O allows one
to carry out experiments in which the coherences are recorded in the directly detected dimension where they have the highest resolution. These two advantages alone outweigh the effort required to transfer the sample into 2H2O. Methods for exchanging the solvent for 2H2O are the same as for concentrating samples, either lyophilisation or repeated concentration and dilution with 2H2O. Alternatively, if the sample is unlikely to survive those methods, the sample can be passed down a short de-salting gel-filtration column that has been pre-equilibrated in 2H2O.
1.2.2 Additives
As many NMR experiments require hours or days to complete, addition of anti-microbial agents is highly recommended. Sodium azide at a concentration of 0.02 % w/v is an almost universal method; however, in the rare cases where the azide ion interacts with the sample (e.g. some cytochromes) micromolar concentrations of an antibiotic such as ampicillin or chloramphenicol can be substituted. EDTA or AEBSF are frequently added to NMR samples at a concentration of 0.1–5 mM in order to reduce proteolysis. However, these compounds have nonexchangeable protons that can interfere with the spectrum of the sample. Excessive use of these compounds is best avoided, a better approach being to improve the purification protocol. If the protein sample contains free cysteines, reducing agents such as DTT or TCEP are required to stop the protein forming dimers or multimers, which can result in precipitation. In addition, degassing the sample can help; however, samples inevitably re-dissolve oxygen during subsequent sample manipulations or during the course of the NMR experiment unless special care is taken to seal the NMR tubes.
1.2.3 Sample Conditions
Although the primary choice of buffer must be that which promotes long-term stability of the sample, some buffer salts are more convenient for NMR than others. As buffer concentrations are typically between 10 and 50 mM, any covalently bonded protons in the buffer will give rise to sharp and obtrusive signals in the spectrum. This has resulted in phosphate buffer being the primary choice if its buffering range is appropriate to your sample and if it does not interact with your protein (many proteins bind ligands containing phosphate groups, from ATP to phosphoproteins and DNA, and may bind inorganic phosphate weakly). Otherwise, many of the more common buffer salts are available with deuterium replacing the nonlabile protons. When selecting a pH it should be borne in mind that the exchange rates of amide protons are such that pH values between 3 and 7 are most conducive to observing the signals arising from these groups. For many biological samples, the addition of salts (typically sodium chloride) to the buffer increases solubility and decreases aggregation. Unfortunately, in NMR the dielectric losses at high ionic strength (greater than 150 mM) are severe, particularly in cryogenically cooled probes and at high magnetic fields. Such losses can be dramatic, as degrading the signal-to-noise ratio twofold results in a fourfold increase in acquisition time to achieve comparable spectra. Recently, alternatives to traditional buffer systems have been proposed, such as dipolar ions [1], low conductivity buffers [2] or the use of ‘solubilising salts’ [3]. The general applicability of these alternatives is yet to be realised; however, they should be investigated if the sample has low solubility and/or requires high ionic strength for stability.
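The fourfold figure follows from the fact that signal-to-noise grows with the square root of the number of scans, so the time needed to recover a loss scales as the square of that loss. A minimal sketch of this arithmetic is given below; the function name is illustrative, and it assumes that all other acquisition parameters are unchanged.

```python
def relative_acquisition_time(snr_loss_factor: float) -> float:
    """Time multiplier needed to recover a given loss in signal-to-noise.

    Assumes S/N is proportional to sqrt(number of scans), i.e. to sqrt(time).
    """
    if snr_loss_factor <= 0:
        raise ValueError("loss factor must be positive")
    return snr_loss_factor ** 2


if __name__ == "__main__":
    for loss in (1.5, 2.0, 3.0):
        print(f"S/N down {loss}x -> {relative_acquisition_time(loss):.2f}x the time")
```

Seen this way, even a modest salt-induced loss is expensive: an experiment planned for two days becomes one of more than a week if the signal-to-noise ratio is halved.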
The final parameter to consider in optimising the sample conditions is the temperature at which the experiments are to be performed. As the temperature increases, the correlation time of the molecules decreases and so the resonances become narrower. In addition, varying the temperature will lead to changes in the chemical shift of temperature-sensitive groups and may help to resolve any resonance overlaps. Typically, NMR experiments are performed in the temperature range of 293–308 K but if you have determined the thermal stability of the sample in advance (e.g. by CD spectroscopy), you will have a better idea of the attainable upper limit.
1.2.4 Special Cases
There are two types of sample that require extra attention: integral membrane proteins and samples intended for ligand titrations. The use of solution NMR methods to study membrane proteins, although not mainstream, is now feasible. The solubilisation of integral membrane proteins in detergent micelles is relatively straightforward and the procedures for preparing membrane protein samples are essentially as described. There is much debate about which detergents are best for preparing NMR samples and it is apparent that no one detergent will suit all proteins. As a result, screening of several different detergent types at different concentrations is required. In addition, significant improvement of the spectrum can be achieved by using sample temperatures significantly higher than those used for soluble proteins (>310 K). The reader is referred to some excellent reviews on sample optimisation [4–6]. The study of ligand interactions via NMR is a well-established technique, enabling the identification of a specific binding site and the determination of kinetic information (see Chapter 7). The usual method for obtaining this information is to run successive spectra with increasing concentrations of ligand whilst observing the spectral changes that result. However, there is a danger that addition of the ligand will lead to changes in the sample conditions other than those due to ligand binding, resulting in artefactual/artificial changes in the spectrum. It can be difficult to obtain identical buffer conditions on mixing different proportions of two or more samples even if they have been dialysed against the same buffer but in separate dialysis tubes, as many biological macromolecules have a high affinity for electrolytes. In addition, if the ligand is a small molecule it may be difficult to solubilise and impossible to dialyse into the same buffer as the macromolecule. 
The major concern in mixing two samples together or adding a ligand to a sample is a change in pH, as this alone can result in considerable chemical shift changes in the molecule of interest, as any ionisable groups change state. Fortunately, this property of ionisable groups can be used to monitor the pH of the NMR sample. Addition of a small molecule (e.g. imidazole) to the sample can be used to monitor the sample’s pH as additions of ligand are made. Any pH shift will become immediately apparent as the chemical shift of the imidazole resonances will change. A number of these molecules have recently been characterised and the reader is referred to this article for more details [7]. In order to minimise the number of manipulations during a titration experiment, and so reduce the likelihood of systematic errors, the following procedure is recommended (see also Chapter 8 for a more detailed description). Two samples should be prepared, one of the biological macromolecule alone and the other with the macromolecule and the ligand at the concentration of its maximum titre – estimated, for example, by a biochemical assay.
A spectrum is recorded of each sample, giving the initial and end point of the titration series. Assuming 600 µl sample volumes and that the titration series is to be incremented in steps of 0.1 molar equivalents of ligand, 60 µl of each sample is removed and mixed with the other so giving ratios of 1 : 9 and 9 : 1. Spectra are again recorded, so giving the next highest and lowest points in the titration series. This procedure is repeated until both samples are of equal concentration of macromolecule and ligand, at which point a spectrum of each is recorded and compared. At this point, the spectra should, of course, be identical unless an error was made. The advantages of this method are that the concentration of the macromolecule never changes due to dilution by the addition of ligand and also, because there is a clearly defined end-point, any errors, or sudden changes in the sample condition, will not go unnoticed.
1.2.5 NMR Sample Tubes
For high-resolution NMR experiments, the homogeneity of the magnetic field in which the sample sits must of course be very high. This is most easily achieved with a small sample volume, but on the other hand, the sensitivity, always limiting in an NMR experiment, sets a limit to how small a sample can be used. The sample is usually housed in a 5 mm diameter tube; the optimal volume of sample for the highest signal-to-noise ratio and magnetic field homogeneity (and hence the optimal resonance lineshape) will be dependent on the probe design and will vary between manufacturers and between probe generations from the same manufacturer. This information should be provided by the manufacturer but is usually of the order of 600 µl in a conventional 5 mm NMR tube. However, the use of ‘standard’ 5 mm tubes is now in decline due to the development of a reduced volume symmetrical microtube by Shigemi Inc., commonly referred to as a ‘Shigemi tube’. These tubes have a plug of susceptibility matched glass at the bottom of the tube with the equivalent in the form of a plunger that forms the top of the sample. The advantage of these tubes is that the sample volume need only match the susceptible volume of the transmitter/receiver coils of the probe. The glass plugs eliminate the change in magnetic susceptibility of the solvent/glass or solvent/air interface and hence the ‘end effects’ associated with short samples observed in standard NMR tubes. Therefore, sample volumes can be reduced to 300–350 µl. The temptation to reduce the sample volume even further should be resisted as the resonance lineshape, particularly of the residual water, deteriorates rapidly. One drawback to the use of these tubes is that samples tend to degas over a period of hours at temperatures much above 298 K. This results in air bubbles forming at the top of the sample, which has disastrous effects on the sample homogeneity and, as a result, on the residual water resonance.
Therefore, the sample should be pre-incubated at the required temperature prior to insertion of the sample in the magnet. (Alternatively, the sample homogeneity should be checked several hours after inserting the sample in the magnet and before starting a long NMR experiment.) If conventional 5 mm tubes are to be used, selecting the correct grade is important. Although the intrinsic linewidth of a high molecular weight biological sample is high (e.g. for human ubiquitin, Mr 8.5 kDa, amide resonances range between 10 and 15 Hz), using cheap tubes limits the attainable field homogeneity. However, due to these larger intrinsic linewidths extremely high precision, and therefore expensive, NMR tubes are not really required and those graded for use in 500 MHz and above will suffice.
Whichever style of NMR tube is selected, none should be used directly from the box. The chemicals used during their manufacture will result in intense signals contaminating the spectra of the sample. Soaking in commercially available laboratory detergents (e.g. Decon 90) is usually sufficient for both new and used tubes. However, if a sample has precipitated, the residue adhering to the glass can be particularly stubborn and soaking in a strong mineral acid may be required. However, considering the cost of making the sample in the first place, perhaps such tubes should be consigned to the glass bin. A variety of NMR tube cleaners are commercially available (e.g. Sigma-Aldrich or GPE Scientific Ltd); these are extremely useful for ensuring the tubes are thoroughly rinsed after soaking in either detergent or acid. If the sample is to be dissolved in 2H2O, the tubes may be soaked in 2H2O to exchange the residual water bound to the glass. Once rinsed, tubes are best dried in a stream of dry nitrogen gas as heating the tubes can cause distortions which will adversely affect field homogeneity.
1.2.5.1 3 mm Tubes
As discussed previously, the ionic strength of a sample can have profound effects on the signal-to-noise ratio of the observed spectrum. If the sample cannot tolerate low levels of salt then a final option is to use 3 mm NMR tubes, even if the probe is designed to be used with 5 mm tubes. In effect, the sample adds a resistance to the receiver coils of the probe; therefore reducing the volume of sample reduces this additional resistance and so does not degrade the overall sensitivity of the probe. Figure 1.1 shows the effect of increasing ionic strength on the signal-to-noise ratio observed in a 600 MHz cryogenically cooled probe when using a standard 5 mm, 3 mm or Shigemi tube. At low ionic strength (i.e. <75 mM) signal-to-noise is dominated by the total amount of sample in the susceptible volume of the transmitter/receiver coils of the probe. Once the ionic strength reaches 100 mM, the signal-to-noise ratio for the 3 mm and 5 mm tubes is equivalent, even though there is only 30 % of the sample in the 3 mm tube. Shigemi tubes give higher signal-to-noise ratios until extremely high ionic strengths are reached (>400 mM), due to a combination of sample volume and the physical dimensions of the sample.
Figure 1.1 The effect of ionic strength on the signal-to-noise ratio observed in a 600 MHz cryogenically cooled probehead. Signal-to-noise was estimated using the signal of a 0.5 mM solution of DSS dissolved in 2H2O. The ionic strength of the samples was obtained using potassium chloride.
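The figure of roughly 30 % quoted above can be rationalised from tube geometry: for a fixed coil height, the sample volume inside the coil scales with the square of the inner diameter of the tube. The sketch below illustrates this. The inner diameters used (about 4.2 mm and 2.4 mm) are assumed typical values for thin-walled 5 mm and 3 mm tubes, not figures taken from this chapter, and the 18 mm coil height is purely hypothetical; the specification of the tubes and probe actually in use should be checked.

```python
import math


def volume_in_coil_ul(inner_diameter_mm: float, coil_height_mm: float) -> float:
    """Volume (µl) of a cylindrical sample column within the coil region."""
    radius_mm = inner_diameter_mm / 2.0
    return math.pi * radius_mm ** 2 * coil_height_mm  # 1 mm^3 equals 1 µl


def volume_ratio(id_small_mm: float, id_large_mm: float) -> float:
    """Ratio of sample volumes for two tubes filled over the same height."""
    return (id_small_mm / id_large_mm) ** 2


if __name__ == "__main__":
    ID_5MM, ID_3MM = 4.2, 2.4   # assumed inner diameters in mm
    COIL_HEIGHT = 18.0          # hypothetical coil height in mm
    print(f"5 mm tube: {volume_in_coil_ul(ID_5MM, COIL_HEIGHT):.0f} µl in coil")
    print(f"3 mm tube: {volume_in_coil_ul(ID_3MM, COIL_HEIGHT):.0f} µl in coil")
    print(f"ratio: {volume_ratio(ID_3MM, ID_5MM):.2f}")
```

With these assumed diameters the ratio comes out close to one third, in line with the value given in the text.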
1.3 Data Collection
The initial four steps required to obtain an NMR spectrum are identical regardless of the type of experiment to be carried out. These are: to establish field/frequency lock, to tune and match the probe, to optimise the magnetic field homogeneity (shimming) and to calibrate the radio frequency pulse lengths. It is recommended that the user follows these four operations every time a sample is used, even if it has been used frequently or recently, as changes may indicate a problem, either with the sample itself or with the spectrometer. Depending on the age of the spectrometer, these steps may now be either fully or partially automatic. However, knowledge of how to perform these steps manually is essential if the sample is challenging or if better than merely ‘good’ data is required. The assumption is made throughout this section that the NMR spectrometer to be used is well maintained, i.e. all heteronuclear pulses are properly calibrated, the VT unit is calibrated and the magnetic field homogeneity is optimised. Additionally, no attempt has been made to discuss which experiments should be collected or the theory behind them. Such a discussion is well beyond the scope of this chapter and the reader is directed to later chapters in this volume and to the following excellent discussions [8–11].
1.3.1 Locking
Once the sample is safely in the magnet and its temperature has equilibrated the first step is to establish a ‘field-frequency lock’. Even though the magnetic field produced by the superconducting magnet is extremely stable, there is some drift that will adversely affect the lineshape of the spectrum. The function of locking is to ensure that the magnetic field the sample experiences is constant. The use of deuterium as the locking nucleus is now universal and in the case of samples dissolved in H2O, 2H2O should be added at a level of 5–10 %. To lock, the magnetic field is first adjusted to bring the deuterium signal on resonance with the lock frequency of the spectrometer. Care should be taken that too much power is not used on the lock transmitter or the magnetic field stability will be poor and shimming will be unresponsive. Optimal lock transmitter power can be found by increasing the lock power stepwise, and observing the lock signal. Once the signal becomes ‘saturated’, indicated by a rise and then drop in lock level when the power is increased, lock power should be reduced by several dB such that a stable lock signal is obtained. Lock phase should also be adjusted to give the highest signal, ensuring that the lock circuit is at its most sensitive, thus providing optimal stability to the system.
1.3.2 Tuning
Putting a biological sample into the transmit/receive coils of an NMR probe affects the tuned circuit of the coil used to excite the sample and to detect the NMR signal. Therefore,
the probe needs to be tuned back to the correct resonance frequency and matched to the correct impedance. The tuning of the observe coils (usually proton) is strongly sample dependent and proper tuning is essential to obtain the highest sensitivity. This circuit should be tuned and matched to each sample and the procedure repeated if, for example during a titration experiment, additions are made to the sample. Conversely, the indirect or heteronuclear coils of the lock circuit are reasonably insensitive to the sample and do not need routine tuning. Modern spectrometers have a visual feedback (the precise nature of which is manufacturer dependent) to facilitate the tuning and matching procedure as each requires the physical adjustment of capacitors in the circuit. As the two adjustments tend to interact with each other this process can be troublesome. However, assuming the tuning of the circuit is not too far off, if an iterative approach is adopted, i.e. the match is first fully optimised, and then the tune and so on until no improvement is achieved, the process should be relatively straightforward.
1.3.3 Shimming
The superconducting magnet of the spectrometer alone cannot produce a magnetic field homogeneous enough for NMR experiments. Therefore, additional room-temperature electromagnetic coils, known as ‘shim-coils’, are placed around the sample and are used to cancel out the residual inhomogeneities of the main magnetic field. These inhomogeneities are either inherent to the magnet itself or are introduced when the sample is put into the bore of the magnet. The process of correcting these inhomogeneities is known as shimming. Shimming can be carried out either manually or semi-automatically using gradient-based techniques. The effect of adjusting a shim coil can be observed in one of two ways, by observing the Free Induction Decay (FID) or the lock level. As most biological samples will at some time be dissolved in a nondeuterated solvent it is usual to shim these samples using the lock level. The intense signal from the water decays much more rapidly than expected due to the phenomenon known as radiation damping; consequently, shimming on the FID is particularly unproductive for these samples. Modern spectrometers may have up to 40 shim coils named according to the field profile they generate. Due to the field profile of the shims, the higher order shims interact with the lower order shims such that overall shimming is not a first order process. As a consequence, the manual shimming of an NMR magnet has become shrouded in mystery and folklore and there are probably as many protocols to obtain homogeneity as there are NMR spectroscopists. Coupled with the rapid development of automatic gradient-based shimming methods, manual shimming is slowly being ousted to the point that many structural biologists cannot manually shim their samples. But, however robust the gradient-based methods are they can, and do, fail to shim satisfactorily and then the spectroscopist must intervene.
The following texts give extensive descriptions of the shimming process and protocols that the reader might try [8,12,13]. As almost all studies of biological macromolecules depend extensively upon 2D and 3D experiments, these samples are never spun, and unlike those for small molecules, these shimming protocols do not include the use of sample spinning. Biological samples usually have linewidths of 10 Hz or more and this should be taken into account during the shimming process. There is little to be gained in spending hours shimming a sample in an attempt to reduce the solute linewidths by a few fractions of a hertz.
However, the presence of a very strong solvent signal in samples dissolved in H2O makes reaching ‘good’ shimming much more critical. For these samples, the lineshape of the residual water resonance is important, as a poorly shimmed sample will have long tails from the intense solvent peak obscuring solute signals. In addition, the spectrum will have a distorted baseline that can be particularly detrimental in multidimensional experiments. An easy way to assess the quality of the shimming in an H2O sample is to acquire a one-dimensional pre-saturation experiment. After optimisation of the transmitter offset to maximise solvent signal suppression the solvent signal should be easily suppressed using a pre-saturation duration of 1.5 s and a maximum B1 field of 100 Hz, with the residual solvent signal being no more than 150 Hz wide at its base. Such a residual signal should be achievable even on a cryogenically cooled probe where radiation damping is even more of a problem. Failure to achieve such a ‘lineshape’ will result in poor quality spectra. Although pre-saturation is rarely used in modern NMR experiments it is nevertheless a good test of shimming, as failure to adequately suppress the solvent by this method will not be compensated for by more sophisticated methods (e.g. WATERGATE [14]).
1.3.4 Calibrating Pulses
All but the most basic NMR experiments used to study biological samples are multipulse experiments that depend on applying pulses with the correct flip angle. Even the simplest ‘pulse and acquire’ experiment benefits from a properly calibrated 90° pulse, as maximum intensity is then obtained. The heteronuclear (i.e. ¹³C, ¹⁵N and ³¹P) pulse widths are insensitive to the sample; once calibrated they need only be checked occasionally (e.g. every six months or so, unless there has been a change in hardware) and do not need recalibrating for each sample. Calibration of heteronuclear pulses is described in detail in [15]. However, proton pulse widths are sensitive to the sample, particularly its ionic strength, and should be calibrated each time the sample is used or if anything is added to the sample.
Calibration of proton pulse widths is a straightforward process provided a few simple rules are followed: place the signals of interest in the centre of the spectrum and allow a long enough relaxation delay for the signals to relax fully. Normal practice is to increase the flip angle until the first signal maximum, or the first or second signal null, is found. A coarse calibration is performed first, i.e. linearly increasing the pulse width in 4 μsec steps; when an approximate value is found, a finer-grained calibration can be performed. The null methods are more precise, as the maxima are rather broad, and if the 360° null is chosen a shorter relaxation delay can be used. Once the 360° null is found, simply dividing by four gives the 90° pulse width. If the sample is dissolved in H2O then it is most practical to use this resonance. However, even if the receiver gain is set to its lowest value, ADC overflow will occur, so the null methods are recommended.
Pulses designed to selectively excite the solvent resonance are commonly encountered in many NMR experiments optimised for biological samples.
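The null-based calibration described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signal is taken to follow the sine of the flip angle, the pulse is rectangular, and the function names and the simulated data are hypothetical rather than part of any spectrometer software.

```python
import math


def first_rising_null(pulse_widths_us, intensities):
    """Interpolate the pulse width at which the signal crosses zero going
    from negative to positive. For a signal proportional to sin(flip angle)
    this is how the 360 degree null appears in a fine-grained calibration."""
    points = list(zip(pulse_widths_us, intensities))
    for (pw0, y0), (pw1, y1) in zip(points, points[1:]):
        if y0 < 0.0 <= y1:
            # Linear interpolation between the two bracketing points.
            return pw0 + (0.0 - y0) * (pw1 - pw0) / (y1 - y0)
    raise ValueError("no negative-to-positive zero crossing in the data")


def pw90_from_360_null(pw360_us):
    """Dividing the 360 degree null by four gives the 90 degree pulse."""
    return pw360_us / 4.0


def b1_field_hz(pw90_us):
    """RF field strength, gamma*B1/(2*pi), in Hz for a rectangular pulse:
    a 90 degree pulse lasts one quarter of a nutation period."""
    return 1.0 / (4.0 * pw90_us * 1e-6)


# Hypothetical fine calibration around the 360 degree null, simulated for
# a true 90 degree pulse of 10 microseconds (assumed value).
widths_us = [36.0 + 0.5 * i for i in range(17)]  # 36.0 ... 44.0
signal = [math.sin(math.pi / 2.0 * pw / 10.0) for pw in widths_us]

pw360 = first_rising_null(widths_us, signal)
pw90 = pw90_from_360_null(pw360)
field = b1_field_hz(pw90)
```

With real data the intensities would be the integrals of the chosen resonance at each pulse width; the field strength returned by `b1_field_hz` is the kind of quantity from which soft-pulse powers are usually scaled.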
The spin-lattice relaxation time of H2O is much longer than those of the macromolecule, and the water proton resonance is therefore partially saturated at the end of the pulse sequence. Chemical exchange and spin diffusion between protons on the molecule of interest (in proteins the HN and Hα spins are affected most) and the partially saturated water can lead to a partial saturation of the protons of the macromolecule. This can be avoided by selectively returning the water magnetisation to the z-axis, the so-called ‘water-flip-back’.
Such selective excitation can be achieved either by simply reducing the radio frequency (RF) field strength and/or by increasing the length of the pulse (referred to as selective pulses or soft pulses) or by the use of a shaped pulse. In a shaped pulse, both the amplitude and the phase are varied during the period of the pulse to achieve the desired excitation profile (commonly used shapes are Gaussian and sinc shapes).¹ Whether selective or shaped pulses are used, their excitation profile should be such that only the solvent peak is excited and not the resonances of interest. Modern spectrometer software will calculate the approximate power of such pulses based on the calibrated 90° pulse width supplied by the user. Fine tuning of this power by the user is required to achieve optimum solvent suppression. In addition to the power of the pulse, fine tuning of the transmitter offset may be required.
1.3.5 Acquisition Parameters
Once the initial four steps are complete, attention shifts to the acquisition parameters of the NMR experiments themselves. Many of these parameters are sample-dependent and therefore no hard and fast rules can be supplied. However, once the acquisition parameters are optimised they can be applied on subsequent occasions. Modern spectrometer software allows the user to write macros for common commands to facilitate spectrometer set-up and data processing; these can be easily adapted to loading experimental parameters, thus avoiding mistakes and hence the waste of long hours of spectrometer time (and sample life). The parameters that are applicable to the vast majority of experiments (from 1D to 4D experiments) are considered below, from a practical point of view rather than a theoretical one. The author has avoided the use of manufacturer-specific terminology to avoid confusion.
The transmitter offset is usually positioned such that the signals of interest are centred. If the spectrum contains an intense solvent peak, it is usual to position the transmitter offset on this resonance. Apart from the solvent suppression method employed, this makes the use of convolution functions (see data processing, below) more convenient and reduces the possibility of spectral artefacts arising from such an intense signal.
Inspection of the BMRB database (Biological Magnetic Resonance Data Bank, http://www.bmrb.wisc.edu/) shows that the vast majority of proton resonances found in biological macromolecules (protein, DNA and RNA) are between 14 and −1.0 ppm. However, it should never be assumed that this spectral range will contain all the resonances of interest. For example, paramagnetic systems have much larger spectral widths, sometimes in the order of 100 ppm (see Chapter 6). For such samples, acquisition of all the proton resonances in a single spectrum is impossible.
¹ Such pulses are also commonly used to decouple the carbonyl and Cα spins of proteins.
Initially, a relatively wide spectral width should be acquired to avoid aliasing or folding in signals that are shifted outside the ‘normal’ range as a result of bound ligands. In addition, modern digital filters are extremely efficient, and resonances that are significantly outside the set spectral width will be filtered out and so never observed. Similarly, the heteronuclear chemical shift ranges are well known (e.g. 90–140 ppm for backbone amide nitrogens and 5–85 ppm for aliphatic side-chain carbons) but should be checked for unusually shifted resonances. Arginine side-chain guanidine nitrogen signals are commonly observed in ¹⁵N-HSQC type experiments, usually folded into the spectrum from 70 ppm; less commonly, lysine side-chain amino nitrogens are observed (folded from 20 ppm).
A common practice is to reduce the spectral width for indirect heteronuclear dimensions (particularly ¹³C) and deliberately fold resonances in, so allowing a higher resolution with the same total acquisition time. If the initial incremental delay for the aliased dimension is set to 1/(2SW), where SW is the spectral width (and quadrature is achieved with the States-TPPI method), then resonances that have been folded or aliased an odd number of times will have opposite phase to those that are not aliased (or have been aliased an even number of times). However, care should be taken when folding spectra lest signals be folded on top of others and so increase the resonance overlap (or signal cancellation) in already crowded spectra.
The long correlation times of macromolecules result in the signals relaxing relatively quickly, and as a consequence acquisition times are quite short (i.e. 100 ms). There is a temptation to increase the acquisition time of the FID in the misconception that this will enhance the resolution of the spectrum. If data is recorded after the signal has decayed, all that is then measured is noise and, after Fourier transformation, the signal-to-noise ratio of the resulting spectrum is degraded. Determination of the optimal acquisition time is best performed experimentally. If a 1D spectrum with good signal-to-noise is acquired, it can be compared to subsequent spectra either acquired with a shorter acquisition time or with the FID truncated during processing. A judgement must be made as to the point where the gain in signal-to-noise is outweighed by the loss in resolution. Acquisition times in the indirect dimension(s) of 2-, 3- and 4-dimensional experiments are set via the number of acquisitions, or increments, and are in turn related to the spectral widths in these dimensions.
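The folding behaviour described above lends itself to a small worked example. The sketch below assumes simple wrap-around aliasing in a complex-sampled indirect dimension and a first increment at half a dwell time, 1/(2SW), so that each fold inverts the peak; the function name and the example window (centre 118 ppm, width 36 ppm) are hypothetical.

```python
def alias_position(true_ppm, centre_ppm, sw_ppm):
    """Return (apparent_ppm, n_folds, sign) for a resonance at true_ppm when
    the indirect dimension spans centre_ppm +/- sw_ppm / 2 and peaks outside
    that window wrap around (alias).

    sign is +1 or -1 and assumes a half-dwell initial delay, so that a peak
    aliased an odd number of times appears inverted."""
    lower_edge = centre_ppm - sw_ppm / 2.0
    # Python's % always returns a value in [0, sw_ppm) for a positive divisor.
    apparent = lower_edge + (true_ppm - lower_edge) % sw_ppm
    n_folds = int(round(abs(true_ppm - apparent) / sw_ppm))
    sign = -1 if n_folds % 2 else 1
    return apparent, n_folds, sign


# Hypothetical 15N window covering 100-136 ppm.
backbone = alias_position(120.0, 118.0, 36.0)   # inside the window
side_chain = alias_position(70.0, 118.0, 36.0)  # folded once
far_away = alias_position(34.0, 118.0, 36.0)    # folded twice
```

Running the same function over a list of expected shifts before the experiment is one way to check that a deliberately narrowed spectral width does not fold a signal onto a crowded region.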
The relationship AQ = N/(2SW) (where AQ is the total acquisition time, N is the number of increments and SW the spectral width) is well known. Theoretically, optimal resolution requires an acquisition time of up to 3/R2, and the maximum signal-to-noise is obtained at 1.26/R2, where R2 is the transverse relaxation rate [16,19], but this is rarely possible to achieve, and so a balance must be struck. This balance is additionally complicated by the constraints of the phase cycle of the NMR experiment itself, which sets a minimum number of scans per increment; only integer multiples of a phase cycle should be used. As a general guide for two-dimensional homonuclear experiments, acquisition times in the indirect dimension of 40–60 msec should give adequate resolution. For heteronuclear three-dimensional experiments, both total acquisition time constraints (experiments usually require 3–4 days) and relaxation limit the proton acquisition time to 20 msec. For ¹⁵N dimensions, acquisition times of 30 msec should be used, and for ¹³C, acquisition times are usually limited to 9 msec so as not to resolve one-bond carbon couplings. If carbonyl resonances are being recorded, this can be extended to 25 msec to give better resolution in this crowded region. However, when the experiments make use of constant-time periods [17–19] it is usual to acquire the maximum number of points allowable in these dimensions. Although these acquisition times limit the resolution of the spectra, this can be regained during processing with judicious linear prediction of the spectrum if sufficient signal-to-noise is available.
Invariably, an NMR experiment consists of multiple scans that are added together. Failure to leave an adequate relaxation delay between scans is likely to result in artefacts in a multidimensional experiment, as the spin systems will not be in the same state at the start of each scan. In addition, if the relaxation delay is too short the spins will not have fully relaxed and so the signal-to-noise ratio will decrease. Traditionally, in an attempt to compromise between signal-to-noise and an artefact-free spectrum, an interscan delay of one to one and a half times T1 is used. For a fixed total experiment time there is a trade-off between the increase in observed signal as the relaxation delay is increased and the concomitant decrease in the total number of scans that can be acquired. This becomes more apparent with deuterated proteins, which have longer relaxation times. In order to double the signal-to-noise ratio of the spectrum, the number of scans acquired needs to increase fourfold. Therefore, it may be better to reduce the relaxation delay, and so reduce the observed signal per scan, in order to increase the total number of scans and hence the final signal-to-noise ratio. Most modern NMR probes have a maximum duty cycle, usually in the order of 15%, that should not be exceeded. As a result, relaxation delays are in the range of one to one and a half seconds.
Additionally, steady-state or dummy scans are used to ensure the sample is at thermal equilibrium before data is recorded. The duration of the steady-state period depends upon the experiment being performed. For example, if the experiment contains spin lock periods or decoupling then a steady-state period of five to ten minutes may be required.
Avoiding baseline distortions is particularly important in multidimensional experiments, especially when the intensity of the cross-peak is to be determined (e.g. nuclear Overhauser effect spectroscopy (NOESY) spectra). Baseline distortions that manifest as positive and negative ridges can be particularly damaging to a spectrum and may obscure cross-peaks altogether. Reducing these distortions can be achieved by shimming the sample to minimise the linewidth of the remaining solvent signal and by adjusting the pre-acquisition delay to remove the need for frequency-dependent phase corrections in the detected dimension.
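The trade-off between relaxation delay and number of scans described above can be explored numerically. The sketch below rests on deliberately simple assumptions that are not taken from the text: a 90° excitation in every scan, mono-exponential recovery with time constant T1, a fixed total experiment time, and a repetition time dominated by the relaxation delay. Under those assumptions the signal per scan scales as 1 − exp(−x), with x the delay in units of T1, and the number of scans as 1/x, so the signal-to-noise per unit time scales as (1 − exp(−x))/√x.

```python
import math


def relative_sensitivity(x):
    """Signal-to-noise per unit experiment time, up to a constant factor,
    for a relaxation delay of x * T1 (simple saturation-recovery model)."""
    return (1.0 - math.exp(-x)) / math.sqrt(x)


def best_delay_ratio(lo=0.1, hi=5.0, steps=4900):
    """Grid search for the delay/T1 ratio that maximises the sensitivity."""
    candidates = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return max(candidates, key=relative_sensitivity)


optimum = best_delay_ratio()
# Sensitivity at 1.0 and 1.5 times T1, relative to the optimum.
at_1_0 = relative_sensitivity(1.0) / relative_sensitivity(optimum)
at_1_5 = relative_sensitivity(1.5) / relative_sensitivity(optimum)
```

In this model the optimum lies close to 1.26 times T1, and the traditional range of one to one and a half times T1 costs only about one per cent in sensitivity. The same functional form, with R2 × AQ in place of x, would give the 1.26/R2 figure quoted for the acquisition time, although that connection is offered here as an observation rather than as the derivation used in [16].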
In the indirect dimensions, the initial incremental delay should be set to obtain either a 0° or 180° first-order phase correction. In some cases baseline distortions result if the first few points of the FID are corrupted; this can be caused if the receiver gain has been set too high (referred to as a clipped FID). After Fourier transformation the spectrum shows ‘sinc wiggles’ or truncation artefacts. Additionally, a baseline roll may arise from the transient response of the audio filters to the signal; again, the first few points of the FID are corrupted. If only the first point is affected then the result is a constant baseline offset, but as more points are affected, the baseline distortions become more severe.
In order to compare the resonance positions between different samples and spectrometers, they are measured relative to a standard compound. Tetramethylsilane (TMS) is the universal reference for ¹H NMR of organic molecules. For biological macromolecules, the situation is less straightforward, since TMS is not soluble in water. The recommended reference for biological samples is the methyl ¹H resonance of 2,2-dimethyl-2-silapentane-5-sulphonic acid (DSS) at 0.00 ppm. However, DSS can interact with biological molecules and a suitable alternative is dioxane, although its resonance, at 3.75 ppm, appears in a more crowded region of the spectrum. Once the proton shifts have been referenced, the heteronuclear chemical shifts are referenced indirectly, using the relevant gyromagnetic ratios. The software used to acquire the data usually carries out this procedure automatically. For a detailed discussion of chemical shift referencing in biological NMR the reader is referred to [20].
1.3.6 Fast Acquisition Methods
Traditionally, multidimensional NMR experiments are acquired by linearly, and systematically, incrementing the indirect evolution periods. This has the advantage that frequencies are not overlooked, but is time inefficient as regions of frequency space are explored where there is only noise. This can result in total acquisition times of several days. However, new acquisition methods are being developed with the goal of speeding up total acquisition times without loss of information. Experimental time can be reduced from days to hours, hours to minutes and in some cases minutes to seconds. These new methods include projection [21,22], G-matrix [23] and Hadamard NMR [24,25], nonlinear time domain sampling [26,27], and fast-pulsing NMR [28,29]. All have their advantages and disadvantages, some of which will depend on the ‘traditional’ spectrum the sample gives, but all require good signal-to-noise ratios. Consequently, no recommendations can be made as to which the user should try. However, apart from fast-pulsing NMR, all require specific data processing algorithms and/or data manipulation to reconstruct a traditional multidimensional spectrum. Such algorithms are not readily available within the core data processing software. In addition, spectral analysis software development is lagging behind these advances, making analysis of such spectra far from routine. Hopefully, as these methods become more mainstream, so will the processing and analysis.
1.4 Data Processing
Once the experiment is finished, the FID(s) are stored on the computer’s hard drive. Data processing describes the practice of performing mathematical manipulations on the FID prior to conversion from the time domain to the frequency domain by Fourier transformation and, if required, further manipulations on the frequency domain before the spectrum is ready for analysis. Although modern NMR processing software is equipped with a considerable array of tools for ‘fixing’ poor quality data it will nevertheless be suboptimal and its interpretation made all the harder. It is much better to collect the experiments well in the first place rather than depending on software to ‘clean it up’ afterwards. Additionally, some care is needed in the correct use of these mathematical tools as overuse can do more harm than good. The purpose of this section is to provide the user with some practical guidelines to the commonly used data processing tools. The choice of software available for the processing of NMR experiments is a very personal one. The software supplied by the spectrometer manufacturers is more than adequate for the processing of multidimensional experiments and is equipped with the vast majority of algorithms one might want to use. In addition, powerful third party software, such as NMRPipe [30] (http://www.nmrscience.com/nmrpipe.html), is available that can not only process ‘traditional’ multidimensional NMR experiments but can also process some of the fast acquisition methods mentioned above. Additionally, it has built-in functions for data analysis and is readily customisable by the user. Regardless of which software is used the basic steps in transforming the FID into a frequency domain spectrum that the user can analyse are the same and their use is outlined below. The FID stored on the computer consists of a time domain signal that has been sampled at regular intervals and then converted to a digital format. 
The total number of points in the FID is composed of both real and imaginary data, which allows the sign of the frequency with respect to the transmitter offset to be determined. Therefore, the actual spectrum displayed after Fourier transform contains only half the original number of points collected. A one-dimensional spectrum is most usually displayed as a line, which is in fact an interpolation of these points into a smooth line. Therefore, the more points that make up the line, the smoother that line will be. Taking the original FID and adding zeroes to the end of it before Fourier transformation is known as zero filling, with the result that the displayed line is represented by more data points. Doubling the number of points in the FID can be repeated as many times as the user wishes (referred to as zero filling once, twice and so on). Zero filling does not adversely affect the spectrum, but nor does it improve the resolution, as the measured signal remains the same. It is applied for purely cosmetic reasons. However, the Fourier transform algorithm used by computer programs works best if the number of data points is a power of two. Therefore, it is usual to zero fill the time domain data, prior to Fourier transformation, so that the total number of points is a power of two. The same applies to multidimensional experiments. These are normally displayed as contour plots, and the more points, the smoother the contours will look.
In one-dimensional NMR it is usual to record the FID until it has decayed into the noise. In multidimensional NMR experiments this may not be possible due to the time restrictions discussed above. Not recording the signal until it has fully decayed gives a truncated FID resulting, upon Fourier transformation, in oscillations at the base of the peak. These oscillations are referred to as sinc wiggles, as the peak shape is related to a sinc function. The more the FID is truncated, the more severe the sinc wiggles. These oscillations are undesirable as they distort the baseline of the spectrum and can obscure nearby signals, sometimes completely. Therefore, the only solution is to apply a weighting function to the FID that drives the signal to zero by the end of the FID (referred to as apodisation). Use of apodisation functions (or weighting functions) is ubiquitous in biological NMR spectroscopy.
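Zero filling to a power of two, as described above, amounts to very little code. The following is a minimal sketch using plain Python lists of complex numbers; the function names and the `extra_doublings` parameter are hypothetical and do not correspond to any particular processing package.

```python
def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    size = 1
    while size < n:
        size *= 2
    return size


def zero_fill(fid, extra_doublings=0):
    """Pad the FID with zeroes up to the next power of two, then double
    the length a further extra_doublings times. The measured points are
    left untouched; only zeroes are appended."""
    target = next_power_of_two(len(fid)) * 2 ** extra_doublings
    return list(fid) + [0j] * (target - len(fid))


# Hypothetical FID of 600 complex points.
fid = [complex(1.0, -1.0)] * 600
filled = zero_fill(fid)
filled_again = zero_fill(fid, extra_doublings=1)
```

Because only zeroes are appended, the information content of the data is unchanged, which is consistent with the point made above that the measured signal remains the same.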
However, they should be used with caution, as poorly matched functions can have side effects, broadening the resonances and reducing the signal-to-noise ratio. There are two basic types of weighting function that can be applied to an FID: sensitivity-enhancing and resolution-enhancing functions. Sensitivity-enhancing functions, which attenuate the later part of the FID, improve the signal-to-noise ratio at the expense of resolution. The simplest example of this type of weighting function is an exponential. Multiplication of the FID by this function results in the envelope of the FID decaying more rapidly; as a result, the resonances become broader (hence they are also known as line-broadening functions) and the noise is suppressed. Resolution-enhancing functions improve resolution at the expense of signal-to-noise by attenuating the first part of the FID and enhancing the latter part. To avoid degrading the signal-to-noise ratio excessively, it is usual to apply a second decaying weighting function, commonly a Gaussian function, to drive the tail of the FID to zero, so de-emphasising the noise in the final spectrum. The parameters that define these weighting functions have to be set by the user and additionally have to be ‘matched’ to the FID. Trial and error is used to find the appropriate values.
These ‘basic’ weighting functions are rarely used in biological NMR, where sine bell functions, and to a lesser extent Gaussian functions, are the most popular. The basic sine bell function is adjusted so that it fits exactly over the FID, resulting in a resolution enhancement. This rather severe function gives the typical appearance of a highly resolution-enhanced spectrum, where the resonances are very sharp, signal-to-noise has been degraded and deep troughs are found on either side of the resonance lines. A simple modification to this function is to shift its maximum towards the beginning of the FID, to the limit where the maximum of the function is at the start of the FID.
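The weighting functions discussed here can be written down compactly. The sketch below assumes one common parameterisation: an exponential defined by a line broadening in Hz, and a sine bell that always ends at zero on the last point and whose starting phase is a user-set shift. The function names and parameters are hypothetical, and other software may define the same windows slightly differently.

```python
import math


def exponential_window(n_points, dwell_s, lb_hz):
    """Sensitivity-enhancing window exp(-pi * LB * t); it adds LB Hz to the
    linewidth of a Lorentzian line."""
    return [math.exp(-math.pi * lb_hz * k * dwell_s) for k in range(n_points)]


def sine_bell(n_points, shift_rad=0.0):
    """Sine bell spanning the whole FID. shift_rad = 0 gives the unshifted,
    strongly resolution-enhancing bell; larger shifts move the maximum
    towards the start of the FID."""
    span = math.pi - shift_rad
    return [math.sin(shift_rad + span * k / (n_points - 1))
            for k in range(n_points)]


def apply_window(fid, window):
    """Multiply the FID point by point with the weighting function."""
    return [point * weight for point, weight in zip(fid, window)]


unshifted = sine_bell(5)                 # starts and ends at zero
shifted = sine_bell(5, math.pi / 2.0)    # maximum on the first point
broadened = exponential_window(4, 1e-4, 10.0)
weighted = apply_window([1 + 1j] * 5, shifted)
```

Whatever the shift, the window in this sketch reaches zero at the last point, which is what suppresses the truncation artefacts of a short FID.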
This is referred to as a 90° or π/2 shifted sine function and is now simply a decaying function, and therefore will broaden the resonances but improve signal-to-noise. The degree of shift (anywhere between 0° and 90°, but usually expressed as π/x, where π/2 is 90°) is set by the user in order to optimise resolution without adversely affecting the signal-to-noise. However, the deep troughs on either side of the resonance lines should be avoided, particularly in multidimensional experiments, where signals can be obscured by these distortions.
When the FID has been zero filled, a weighting function applied and then Fourier transformed, the resulting spectrum will require phase correction to give pure absorption mode signals. There are two phase corrections to be applied: one that is independent of the resonance frequency (zero-order phase correction) and one that is dependent on the resonance frequency (first-order phase correction). Resonances with zero-order phase errors have the same degree of dispersive character across the spectrum, whereas resonances with first-order phase errors have varying degrees of dispersive character. The usual procedure to correct phase errors is to first correct the low frequency resonances with a zero-order phase correction and then (if required) to phase the higher frequency resonances with first-order phase correction. The procedure is an iterative one and may require two or three ‘cycles’ of phasing before the spectrum is properly phased. A high degree of first-order phase correction will cause baseline distortions, and the problem causing it should be corrected (usually by adjusting the pre-acquisition delay). In the indirect dimensions the phase adjustment should have already been corrected in the experimental design. If this is not the case, the procedure is the same as for the directly detected dimension. In principle, the spectrum should now be ready for analysis.
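The two phase corrections described above can be expressed as a single multiplication of each complex spectral point. In the sketch below the first-order term is assumed to grow linearly from zero at the first point of the spectrum; the pivot position and the sign convention differ between software packages, so both are assumptions, as is the function name.

```python
import cmath
import math


def phase_correct(spectrum, ph0_deg, ph1_deg=0.0):
    """Apply a zero-order (ph0) and a first-order (ph1) phase correction,
    in degrees, to a complex spectrum. The first-order term increases
    linearly across the spectrum, starting from zero at the first point."""
    n_points = len(spectrum)
    corrected = []
    for k, point in enumerate(spectrum):
        angle = math.radians(ph0_deg + ph1_deg * k / n_points)
        corrected.append(point * cmath.exp(1j * angle))
    return corrected


# Hypothetical spectrum of unit-height points carrying a 30 degree constant
# phase error plus a 90 degree linear error across the spectral width.
distorted = [cmath.exp(-1j * math.radians(30.0 + 90.0 * k / 8)) for k in range(8)]
phased = phase_correct(distorted, 30.0, 90.0)
```

After correction every point in this example is purely real and positive, which is the numerical counterpart of a pure absorption mode lineshape.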
There are, however, two other procedures that might be used if required: a convolution filter to remove an intense solvent signal and a linear prediction procedure to calculate additional data points in a truncated FID. The most common post-acquisition water suppression technique is the convolution difference low-pass filter [31]. This method removes the low-frequency components from the spectrum. Although some baseline distortion occurs near the water signal, it is a very effective method of removing this intense signal. However, the user should be aware that this method does not discriminate between the water signal and any signals arising from the sample that are within the filter bandwidth applied. Thus, it should not be considered as an alternative to good solvent suppression.
A theoretical description of linear prediction is well beyond the scope of this chapter and the reader is referred to the following excellent articles [32,33]. Here, some general guidance is provided to give the inexperienced user a starting point when using linear prediction algorithms. As mentioned earlier, the indirectly detected dimensions of multidimensional NMR experiments are almost always truncated, and linear prediction provides an effective method of ‘extending’ this data, so improving the resolution and spectral quality of these dimensions. In addition, linear prediction is useful for correcting the first few points of a corrupted FID. In general, linear prediction works best for data with relatively high signal-to-noise ratios, and FIDs should not generally be extended by more than a factor of two, as artefacts can arise as well as distortions to the signals themselves. If the ¹H dimensions are Fourier transformed first, the number of signals in the heteronuclear dimensions is reduced; this simplifies the prediction problem and makes the algorithm more stable. Finally, if constant-time experiments have been recorded, the interferogram does not decay as in the FID. In this case, much better results will be obtained if the mirror image linear prediction algorithm is used.
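The idea behind the convolution difference filter mentioned above can be illustrated with a toy example. The sketch assumes that the transmitter is on the water resonance, so that the solvent appears as a slowly varying component of the FID, and it uses a plain moving average with a shrinking window at the edges as the low-pass filter. These are simplifications chosen for brevity and are not claimed to reproduce the kernel or edge handling of the method in [31]; the function name and parameters are hypothetical.

```python
import cmath
import math


def convolution_difference(fid, half_width):
    """Subtract a low-pass filtered copy of the FID from the FID itself.
    The low-pass copy is a moving average over 2 * half_width + 1 points
    (fewer near the edges), which tracks the on-resonance solvent signal."""
    n_points = len(fid)
    filtered = []
    for k in range(n_points):
        lo = max(0, k - half_width)
        hi = min(n_points, k + half_width + 1)
        low_pass = sum(fid[lo:hi]) / (hi - lo)
        filtered.append(fid[k] - low_pass)
    return filtered


# Hypothetical FID: a huge on-resonance 'water' component of amplitude 100
# plus a unit-amplitude solute signal with a period of 8 points.
solute = [cmath.exp(2j * math.pi * k / 8) for k in range(64)]
fid = [100.0 + s for s in solute]
cleaned = convolution_difference(fid, 8)
```

In this example the constant component is removed completely, while the solute signal survives with a slight loss of amplitude away from the edges. A solute signal close to the transmitter frequency would be attenuated as well, which is the lack of discrimination noted above.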
References
1. Lane, A.N. and Arumugam, S. (2005) Improving NMR sensitivity in room temperature and cooled probes with dipolar ions. J. Magn. Reson., 173, 339–343.
2. Kelly, A.E. et al. (2002) Low-conductivity buffers for high-sensitivity NMR measurements. J. Amer. Chem. Soc., 124, 12013–12019.
3. Hautbergue, G.M. and Golovanov, A.P. (2008) Increasing the sensitivity of cryoprobe protein NMR experiments by using the sole low-conductivity arginine glutamate salt. J. Magn. Reson., 191, 335–339.
4. Krueger-Koplin, R.D. et al. (2004) An evaluation of detergents for NMR structural studies of membrane proteins. J. Biomol. NMR, 28, 43–57.
5. Tamm, L.K. and Liang, B.Y. (2006) NMR of membrane proteins in solution. Prog. Nucl. Magn. Reson. Spectrosc., 48, 201–210.
6. Tian, C.L. et al. (2005) Membrane protein preparation for TROSY NMR screening. Meth. Enzymol., 394, 321–334.
7. Baryshnikova, O.K., Williams, T.C. and Sykes, B.D. (2008) Internal pH indicators for biomolecular NMR. J. Biomol. NMR, 41, 5–7.
8. Cavanagh, J., Fairbrother, W.J., Palmer, A.G. III, et al. (2007) Protein NMR Spectroscopy: Principles and Practice, 2nd edn, Academic Press, San Diego, p. 587.
9. Keeler, J. (2005) Understanding NMR Spectroscopy, 1st edn, John Wiley & Sons, Ltd, Chichester, p. 476.
10. Hounsell, E.F. (1995) H-1 NMR in the structural and conformational analysis of oligosaccharides and glycoconjugates. Prog. Nucl. Magn. Reson. Spectrosc., 27, 445–474.
11. Flinders, J. and Dieckmann, T. (2006) NMR spectroscopy of ribonucleic acids. Prog. Nucl. Magn. Reson. Spectrosc., 48, 137–159.
12. Conover, W.W. (1984) Topics in Carbon-13 NMR Spectroscopy, vol. 4 (ed. G.C. Levy), John Wiley & Sons, Ltd, Chichester, p. 282.
13. Chmurny, G.N.H. and Hoult, D.I. (1990) The ancient and honourable art of shimming. Concepts Magn. Reson., 2, 131–149.
14. Piotto, M., Saudek, V. and Sklenar, V. (1992) Gradient-tailored excitation for single-quantum NMR-spectroscopy of aqueous-solutions. J. Biomol. NMR, 2, 661–665.
15. Braun, S., Kalinowski, H.-O. and Berger, S. (1998) 150 and More Basic NMR Experiments, 2nd edn, Wiley VCH, Weinheim, p. 610.
16. Rovnyak, D. et al. (2004) Resolution and sensitivity of high field nuclear magnetic resonance spectroscopy. J. Biomol. NMR, 30, 1–10.
17. Bax, A. and Freeman, R. (1981) Investigation of complex networks of spin-spin coupling by two-dimensional NMR. J. Magn. Reson., 44, 542–561.
18. Rance, M. et al. (1984) Application of omega-1-decoupled 2D correlation spectra to the study of proteins. J. Magn. Reson., 59, 250–261.
19. Bax, A., Mehlkopf, A.F. and Smidt, J. (1979) Absorption spectra from phase-modulated spin echoes. J. Magn. Reson., 35, 373–377.
20. Wishart, D.S. et al. (1995) H-1, C-13 and N-15 chemical-shift referencing in biomolecular NMR. J. Biomol. NMR, 6, 135–140.
21. Kupce, E. and Freeman, R. (2003) Projection-reconstruction of three-dimensional NMR spectra. J. Amer. Chem. Soc., 125, 13958–13959.
22. Kupce, E. and Freeman, R. (2008) Hyperdimensional NMR spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc., 52, 22–30.
23. Kim, S. and Szyperski, T. (2003) GFT NMR, a new approach to rapidly obtain precise high-dimensional NMR spectral information. J. Amer. Chem. Soc., 125, 1385–1393.
24. Kupce, E. and Freeman, R. (2003) Frequency-domain Hadamard spectroscopy. J. Magn. Reson., 162, 158–165.
25. Kupce, E. and Freeman, R. (2003) Fast multi-dimensional NMR of proteins. J. Biomol. NMR, 25, 349–354.
26. Rovnyak, D. et al. (2004) Accelerated acquisition of high resolution triple-resonance spectra using non-uniform sampling and maximum entropy reconstruction. J. Magn. Reson., 170, 15–21.
27. Marion, D. (2005) Fast acquisition of NMR spectra using Fourier transform of non-equispaced data. J. Biomol. NMR, 32, 141–150.
28. Schanda, P., Kupce, E. and Brutscher, B. (2005) SOFAST-HMQC experiments for recording two-dimensional heteronuclear correlation spectra of proteins within a few seconds. J. Biomol. NMR, 33, 199–211.
29. Lescop, E., Schanda, P. and Brutscher, B. (2007) A set of BEST triple-resonance experiments for time-optimized protein resonance assignment. J. Magn. Reson., 187, 163–169.
30. Delaglio, F. et al. (1995) NMRPipe - a multidimensional spectral processing system based on Unix pipes. J. Biomol. NMR, 6, 277–293.
31. Marion, D., Ikura, M. and Bax, A. (1989) Improved solvent suppression in one-dimensional and two-dimensional NMR-spectra by convolution of time-domain data. J. Magn. Reson., 84, 425–430.
32. Stephenson, D.S. (1988) Linear prediction and maximum-entropy methods in NMR spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc., 20, 515–626.
33. Stern, A.S., Li, K.B. and Hoch, J.C. (2002) Modern spectrum analysis in multidimensional NMR spectroscopy: Comparison of linear-prediction extrapolation and maximum-entropy reconstruction. J. Amer. Chem. Soc., 124, 1982–1993.
2 Isotope Labelling
Mitsuhiro Takeda and Masatsune Kainosho
2.1 Introduction
The isotopic labelling of proteins is the basis of current NMR methodology, and almost all NMR studies are performed with isotope-labelled proteins. The isotopic labelling of proteins has played crucial roles in addressing the fundamental problems encountered in NMR studies of proteins: signal overlap and line-broadening. In current isotope-aided NMR studies, the atoms with isotopic compositions that are often altered are hydrogen, nitrogen and carbon, and 2 H, 13 C and 15 N nuclei are commonly used for this purpose. The aims of isotope labelling can be categorised as follows: a. Mitigation of NMR peak overlap For biological macromolecules, a huge number of 1 H resonances that are prone to overlap each other can be observed in a resolved manner by using hetero-nuclear correlation methods, such as 1 H -15 N and 1 H -13 C HSQC spectra. The substitution of 1 H atoms by 2 H further reduces the overlap of 1 H resonances. b. Enhancement of signal-to-noise ratio Replacements of 1 H by 2 H lead to reduced dipole and scalar interactions between nearby atoms. Enrichment with 13 C and 15 N is useful for the direct detection of the enriched carbon and nitrogen atoms. c. Resonance assignment Isotopic enrichment of nitrogen and carbon makes it possible to perform sequential backbone and side-chain resonance assignments that utilise the one-bond or two-bond scalar couplings. Protein NMR Spectroscopy: Practical Techniques and Applications, First Edition. Edited by Lu-Yun Lian and Gordon Roberts. 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd.
d. Obtaining structural information. The NOEs between 1H atoms, the secondary chemical shifts, the residual dipolar couplings and the presence of hydrogen bonds can be investigated by using 13C and/or 15N labelled proteins. 2H labelling also effectively reduces spin diffusion, as well as dipolar and scalar couplings.
e. Spin relaxation analysis. Dynamic information about proteins can be obtained by the individual analysis of the relaxation properties of 13C and 15N atoms.

There are two types of isotopic labelling: uniform and selective. In uniformly labelled proteins, all of the atoms in the target proteins are labelled without any selectivity, which is relatively easy to accomplish. In selective labelling, by contrast, only selected positions are isotopically labelled, which requires special techniques in some cases. Several methods are now available for the production of these isotopically labelled proteins.

In this chapter, we will first outline the production methods for isotopically labelled proteins. Uniform and selective labelling techniques will then be discussed. Finally, we will describe the stereo-array isotope labelling (SAIL) method. In addition to this chapter, there are several useful reviews on isotope labelling of proteins for NMR [1–6].
2.2 Production Methods for Isotopically Labelled Proteins
2.2.1 Recombinant Protein Expression in Living Organisms
2.2.1.1 Escherichia coli
By virtue of the molecular cloning techniques developed in the late 1980s and ’90s, target proteins can readily be produced by using specific host organisms. By growing cells transformed with DNA encoding a target gene, the target protein is efficiently produced in the host cells. Amongst the many systems available for heterologous protein production, the Gram-negative bacterium Escherichia coli (E. coli) is the most widely used host for the production of recombinant proteins [7,8]. E. coli can synthesise all 20 amino acids from a simple carbon source and a nitrogen salt, and thus can grow on minimal media containing only a few kinds of nutrients. Therefore, by supplying isotope-labelled nutrients, the E. coli cells can produce a large quantity of an isotope-labelled target protein. One frequently employed system is the bacteriophage-derived T7 promoter system [9], in which a plasmid encoding a target protein, under the control of the T7 promoter, is transformed into an E. coli strain, such as a K12 or B strain.

The advantage of the E. coli expression system over those of other host organisms is that E. coli can grow rapidly and produce large amounts of heterologous proteins. In addition, its metabolic and catabolic pathways for amino acids have been well characterised, and mutations that block key steps are readily available. At present, the use of E. coli is usually considered first in an attempt to express a new protein, unless posttranslational modifications, such as glycosylation, are required for biological activity. However, overexpression of recombinant proteins often results in the formation of insoluble protein inclusion bodies,
which are composed of densely packed, denatured proteins in the form of particles [10]. This problem is especially vexatious for proteins with numerous disulfide bonds. If the protein within the inclusion body can be refolded into a functional form, then the insolubility is not a major problem. However, if refolding is impossible, then slowing the protein expression rate by growing the cells at lower temperatures is worth considering. The expression levels vary for different genes due to many factors, such as mRNA stability, codon bias, protein degradation, and folding efficiency. To circumvent these problems, the optimisation of the DNA sequence (codon usage), the deletion of genes encoding proteases, and the addition of chaperones can be considered.

2.2.1.2 Yeast Cells
A major problem encountered in the use of an E. coli expression system is the difficulty of producing functional proteins with numerous disulfide bonds and/or posttranslational modifications. As an alternative host organism to E. coli, the yeast Pichia pastoris (P. pastoris) has gained popularity [11,12]. The advantage of P. pastoris is that it secretes heterologous proteins into the medium when a specific expression vector is used, which facilitates their purification. In addition, correct disulfide bonds are more likely to be formed than in the E. coli production system. Quite recently, an expression system using the hemiascomycete yeast Kluyveromyces lactis (K. lactis) has been reported [13,14]. One of the major differences between P. pastoris and K. lactis lies in the promoters used for expression of the target gene: target protein expression in P. pastoris and K. lactis is induced by adding methanol and galactose, respectively. The target proteins expressed by the yeast cells are modified by a heterogeneous, high-mannose glycan. 
In some cases, digestion of the attached glycan by a glycosidase is additionally needed to achieve homogeneity of the sample, unless the glycan is involved in some functional aspect.

2.2.1.3 Other Host Cells
Baculovirus-infected cells are also regarded as a potentially important expression system. The expression of rhodopsin labelled with specific amino acids by using baculovirus-infected Spodoptera frugiperda (Sf9) cells has been reported [15,16]. Amino acid-type selective labelling of the catalytic domain of c-Abl kinase in Sf9 cells has also been achieved [17]. To accomplish correct folding and posttranslational modification of target proteins, the use of mammalian cells, such as Chinese hamster ovary (CHO) cells, is also a promising approach [18,19]. Compared with E. coli systems, however, the production of the target protein in mammalian cells is difficult and expensive. In general, mammalian cells do not grow on media used for bacteria, and they require amino acids, vitamins, cofactors, and in most cases, serum. Hence, well-defined expression media supplemented with isotope-labelled amino acids are commonly used.

2.2.2 Cell-Free Synthesis
In some cases, using E. coli cells and yeast as protein production systems presents some difficulties. For instance, the expression of a protein toxic to the host cells will not be tolerated. Cross-labelling of amino acids also occurs for specific amino acids, which leads
to the inefficient and incorrect incorporation of the label into the target protein. Recently, cell-free synthesis systems have drawn intense interest as an alternative method for producing isotope-labelled proteins [20–24]. In cell-free synthesis, the protein production is carried out in a vessel in which the protein synthesis system is reconstituted.

The E. coli cell-free protein production system utilises an E. coli extract, which contains the protein synthesis machinery. The advantage of the E. coli cell-free system is that the metabolic conversion that is present in living cells is strongly suppressed, which enables the selective labelling of target proteins for almost all amino acids. In addition, the incorporation rate of the labelled amino acids is much higher than in the in vivo system, which is especially important for the SAIL method [25]. The expression of trans-membrane regions in membrane proteins by a cell-free system is also possible, by adding various detergents to the cell-free reaction mixture [26]. The E. coli extract is commercially available from several companies, and can also be prepared in the laboratory [24].

Cell-free synthesis using wheat germ extracts has gained attention as a promising approach for NMR sample production [27,28]. The wheat germ cell-free system is considered to be RNase-free and suitable for the expression of large proteins. Amino acid-selective labelling with the use of the wheat germ cell-free system has been reported [29]. To our knowledge, the preparation of the wheat germ extract seems to be more difficult than that of the E. coli extract. The wheat germ extract for cell-free protein synthesis is commercially available from Cell-Free Sciences Co. (http://www.cfsciences.com/).

In the production of isotope-labelled proteins, it is crucial that unlabelled amino acids are not included in the S30 extract. Otherwise, the isotopically labelled amino acids employed are diluted. 
The protocol for the preparation of the E. coli S30 extract with minimal unlabelled amino acids is described in Protocol 1.
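The dilution effect mentioned above can be made concrete with a simple mixing estimate. The sketch below is only illustrative: it assumes that labelled and residual unlabelled amino acids form one well-mixed pool that is used without preference, and the function name and the example concentrations are hypothetical rather than taken from the protocol.

```python
def effective_enrichment(labelled_mM, unlabelled_mM, label_purity=1.0):
    """Estimate the isotopic enrichment of an amino acid pool.

    Assumption (not from the chapter): labelled and unlabelled amino acids
    mix completely and are incorporated without preference.
    label_purity is the isotopic purity of the labelled stock (0..1).
    """
    total = labelled_mM + unlabelled_mM
    if total <= 0:
        raise ValueError("total amino acid concentration must be positive")
    return label_purity * labelled_mM / total


# Hypothetical numbers: 1 mM of a 98 % enriched amino acid diluted by
# 0.25 mM of unlabelled amino acid carried over from the extract.
print(f"{effective_enrichment(1.0, 0.25, 0.98):.3f}")
```

Under this simple model, even a modest carry-over of unlabelled amino acids lowers the enrichment noticeably, which is the motivation for the amino acid free extract.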
Protocol 1: Preparation of the Amino Acid Free S30 Extract
1. Inoculate a stock of E. coli (A19, BL21 Star etc.) into 10 ml of LB medium in a 50 ml tube and grow the cells overnight at 37 °C with shaking.
2. Inoculate 10 ml of the overnight culture into 1 l of incomplete rich medium in a 2-litre flask.
   Incomplete rich medium. For 1 l, combine the following: 5.6 g KH2PO4, 28.9 g K2HPO4, 1 g Bacto yeast extract, 1.5 mg thiamine. Autoclave and add the following: 50 ml 40 % (w/v) D-glucose, 10 ml 0.1 M Mg(OAc)2.
3. Grow the cells at 37 °C with shaking to an OD650 of 0.7. The growth rate of the cells should be monitored, since it correlates with the activity of the resulting extract.
4. Centrifuge the cells (5000 × g, 4 °C, 10 min) and wash them three times with 200 ml of ice-cold S30 buffer containing 0.05 % (v/v) 2-mercaptoethanol by gentle resuspension. Do not allow foaming of the suspension.
   S30 buffer. For 1 l, combine the following: 10 ml 1 M Tris-acetate (pH 8.2), 10 ml 1.4 M Mg(OAc)2, 10 ml 6 M KOAc, 1 ml 1 M DTT (add after autoclaving).
5. Gently resuspend the cell pellet with 200 ml of ice-cold S30 buffer containing 0.05 % (v/v) 2-mercaptoethanol. Centrifuge the suspension (5000 × g, 4 °C, 10 min) and weigh the E. coli pellets. Resuspend the pellet in 1.27 ml of S30 buffer per gram of E. coli.
6. Disrupt the cells with a French Press at 20 000 psi (1400 kg cm⁻²). Add 30 µl of 1 M DTT to the lysate immediately after the disruption of the cells. Centrifuge the lysate (30 000 × g, 4 °C, 30 min) using RNase-free centrifuge tubes. Carefully remove approximately 1.4 ml of the supernatant per gram of E. coli, without mixing with the precipitate.
7. Transfer the supernatant to RNase-free tubes. Centrifuge them (30 000 × g, 4 °C, 30 min) and remove approximately 1.0 ml of the supernatant per gram of E. coli into a 50 ml tube.
8. Shake the tube at 37 °C for 80 min.
9. Dialyse the solution at 4 °C for 45 min against 2 l of S30 buffer using a dialysis tube with a MWCO of 6000–8000. Allow a little air into the tube to let it float. Repeat the dialysis twice, and then centrifuge the solution (15 000 × g, 10 min, 4 °C) and collect the supernatant.
10. Uniformly fill an open column (Econo-column chromatography column, 2.5 × 20 cm) with Sephadex G25 resin and place the column vertically in a cold space (4 °C). Attach an Econo-column funnel to the top end of the column. Pour 500 ml of S30 buffer through the funnel into the column.
11. Apply the supernatant from step 9 to the column that was pre-equilibrated at 4 °C in step 10. After loading the supernatant, continue to supply the funnel with S30 buffer to maintain the flow in the column. When the first fraction reaches the bottom, start to collect 1.4 times the volume of the applied extract. Identify the first fraction by its colour (yellow) and turbidity.
12. Dialyse the eluate at 4 °C against 700 ml of an equal weight mixture of PEG-8000 and S30 buffer. Before use, the PEG-S30 buffer (at 4 °C) should always be stirred to avoid PEG deposition. Adjust the dialysis time so as to concentrate the extract to 0.86 times the volume. Then dialyse it at 4 °C for 60 min against 2 l of S30 buffer.
13. Transfer the extract to 1.5 ml tubes. Freeze the tubes in liquid nitrogen and store them at −80 °C.

The activity of the produced S30 extract should be evaluated by performing small-scale cell-free reactions with some proteins (Protocol 2). Empirically, the prepared S30 extract can be stored at −80 °C for at least several months.
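Several volumes in Protocol 1 are specified per gram of cell pellet or relative to the volume applied to the column. As a bookkeeping aid, the following minimal sketch scales those figures to an actual pellet weight. The function and key names are hypothetical, and the supernatant volumes are the approximate values quoted in steps 6 and 7, not measured yields.

```python
def s30_volumes_ml(pellet_g):
    """Scale the per-gram volumes of Protocol 1 to a wet pellet weight (g).

    The factors 1.27, 1.4 and 1.0 ml per gram come from steps 5, 6 and 7;
    the dictionary keys are illustrative names only.
    """
    if pellet_g <= 0:
        raise ValueError("pellet weight must be positive")
    return {
        "s30_buffer_for_resuspension": 1.27 * pellet_g,    # step 5
        "supernatant_after_first_spin": 1.4 * pellet_g,    # step 6, approximate
        "supernatant_after_second_spin": 1.0 * pellet_g,   # step 7, approximate
    }


def column_collection_ml(applied_extract_ml):
    """Step 11: collect 1.4 times the volume of extract applied to the column."""
    return 1.4 * applied_extract_ml


volumes = s30_volumes_ml(8.0)  # example: an 8 g pellet (hypothetical value)
for name, ml in volumes.items():
    print(f"{name}: {ml:.2f} ml")
print(f"collect from column: {column_collection_ml(7.5):.2f} ml")
```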
Protocol 2: Cell-Free Reaction on a Small Scale
1. Prepare the reaction solution and the dialysis solution by mixing the components as follows:

Stock solution                                      Reaction solution    Dialysis solution
RNase-free water                                    123.2 µl             1160.8 µl
1.4 M NH4OAc                                        9.8 µl               39.2 µl
0.5 M Mg(OAc)2                                      15 µl                60 µl
Mixture of 20 unlabelled amino acids (1 mM each)    20 µl                80 µl
0.645 M creatine phosphate                          40 µl                160 µl
LM mixture (a)                                      125 µl               500 µl
1 mg/ml template DNA                                10 µl                -
11 mg/ml T7 RNA polymerase                          4.5 µl               -
40 units/ml RNase inhibitor                         1.25 µl              -
10 mg/ml creatine kinase                            12.5 µl              -
S30 extract                                         150 µl               -
Total volume                                        0.5 ml               2 ml

(a) LM mixture. For 200 ml, combine the following: 22 ml 2 M HEPES-KOH pH 7.5, 33.4 ml 6 M KOAc, 210 mg DTT, 530 mg ATP, 338 mg CTP, 335 mg GTP, 310 mg UTP, 172 mg cAMP, 28 mg folinic acid, 140 mg tRNA, 64 ml 50 % (w/v) PEG-8000, and RNase-free water up to 200 ml. The prepared LM mixture can be kept frozen at −20 °C for one month or more. For SDS-PAGE analysis of the cell-free reaction, the PEG-8000, which hampers SDS-PAGE analysis, should be removed by ethanol precipitation before the addition of sample dye to the reaction solution.
Thaw the frozen S30 extract on ice. Prepare the creatine phosphate in RNase-free water just prior to use. Heating the amino acid mix to 60 °C is effective for dissolving the amino acids. Excessive heating of SAIL amino acids may cause racemisation, especially at high pH.
2. Pour the dialysis solution into the outer tube. Place the inner membrane apparatus of the Float-A-Lyzer inside the outer tube, and pour the reaction solution into the inner membrane.
3. Shake the tube to allow for the production of target proteins. The optimal temperature and incubation times should be determined with small-scale cell-free reactions.
4. Retrieve the reaction solution and the dialysis solution. If the protein produced has a molecular weight smaller than the molecular weight cut-off of the membrane, check the outer solution for the presence of the protein.
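The final concentrations implied by the Protocol 2 table follow from simple dilution arithmetic. The sketch below works through three of the components, assuming that the component volumes are in microlitres and that the nominal totals of 0.5 ml and 2 ml apply; the variable and function names are illustrative only.

```python
# Stock concentrations (M) and volumes (µl) as listed in the Protocol 2 table.
# Assumption: the listed component volumes are microlitres.
STOCK_M = {"NH4OAc": 1.4, "Mg(OAc)2": 0.5, "creatine phosphate": 0.645}
REACTION_UL = {"NH4OAc": 9.8, "Mg(OAc)2": 15.0, "creatine phosphate": 40.0}
DIALYSIS_UL = {"NH4OAc": 39.2, "Mg(OAc)2": 60.0, "creatine phosphate": 160.0}


def final_mM(stock_M, added_ul, total_ul):
    """Final concentration (mM) after diluting added_ul of stock into total_ul."""
    return stock_M * 1000.0 * added_ul / total_ul


def table_final_mM(volumes_ul, total_ul):
    """Apply final_mM to every component of one column of the table."""
    return {name: final_mM(STOCK_M[name], ul, total_ul)
            for name, ul in volumes_ul.items()}


reaction = table_final_mM(REACTION_UL, 500.0)    # nominal 0.5 ml
dialysis = table_final_mM(DIALYSIS_UL, 2000.0)   # nominal 2 ml
for name in STOCK_M:
    print(f"{name}: reaction {reaction[name]:.2f} mM, dialysis {dialysis[name]:.2f} mM")
```

With these assumptions the small-molecule components come out at the same concentration on both sides of the membrane, which is what one would expect for a dialysis-mode reaction. The same helper can be used to see how changing the volume of the Mg(OAc)2 stock shifts the final magnesium concentration.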
The yield of the cell-free synthesis varies depending on different factors, including the sequence of the target gene, the expression plasmid, the buffer conditions, the amount of
amino acids used, and the temperature. At a minimum, pilot experiments should be performed with small-scale reactions using unlabelled amino acids. It seems that the protein production level in the cell-free synthesis is correlated to some extent with that in the in vivo expression. In addition, the incubation time and the temperature affect the expression level of the target proteins. It is especially worthwhile to optimise the magnesium concentration. The utilisation of a special expression vector optimised for cell-free synthesis and the introduction of silent mutations into the target DNA sequence are also worth considering [24].

After optimisation of the conditions for the cell-free reaction, uniformly 15N or 13C/15N labelled proteins are commonly produced in a large-scale reaction. For cost control, a commercially available amino acid mixture, such as one from an algal source, enriched with 15N and/or 13C is often used. However, the amino acid composition of the mixture should be checked prior to use. In our experience, glutamine, asparagine, cysteine and tryptophan are missing in the case of an algal lysate. Although glutamine and asparagine are not needed, due to their formation from glutamate/aspartate and ammonium ion mediated by a transaminase, tryptophan and/or cysteine should be supplemented if these residues are present in the amino acid sequence of the target protein.

For a comparison between cell-free and in vivo expression, preparation of the target protein by in vivo expression is also highly recommended. The 1H-15N HSQC spectra of the two samples should be compared carefully. It should be emphasised that the processing of the N-terminus by peptide deformylase is likely to be incomplete in the case of E. coli cell-free expression, thus producing an inhomogeneous sample. In such a case, the residues close to the N-terminus produce doubled resonances, which correspond to the formylated and deformylated forms. 
One method to overcome this problem is to use a cleavable N-terminal tag. The N-terminal tag can also be used to increase the expression level [24].
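The check on algal amino acid mixtures described above can be expressed as a small helper. This is a minimal sketch that encodes only the statements made in the text (Gln, Asn, Cys and Trp reported as missing; Gln and Asn formed in the reaction); the function name and the example sequences are hypothetical, and the actual composition of any commercial mixture still has to be verified.

```python
# One-letter codes of the residues reported as missing from an algal lysate.
MISSING_IN_ALGAL_MIX = {"Q": "glutamine", "N": "asparagine",
                        "C": "cysteine", "W": "tryptophan"}
# Gln and Asn are described as being formed from Glu/Asp in the reaction,
# so they are not flagged for supplementation.
FORMED_IN_REACTION = {"Q", "N"}


def residues_to_supplement(sequence):
    """Return the names of amino acids to add for a given one-letter sequence."""
    present = set(sequence.upper())
    return sorted(
        name
        for code, name in MISSING_IN_ALGAL_MIX.items()
        if code in present and code not in FORMED_IN_REACTION
    )


print(residues_to_supplement("MKWQNCA"))  # toy sequence containing Trp and Cys
print(residues_to_supplement("MKQNA"))    # toy sequence with neither
```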
2.3 Uniform Isotope Labelling of Proteins
2.3.1 Uniform 15N Labelling
An NMR study of a new protein often starts with the preparation of the uniformly 15N labelled protein. The use of uniformly 15N labelled proteins is suitable for initial characterisation, due to its low cost. Uniformly 15N labelled proteins are usually produced by growing E. coli cells transformed with the target DNA on minimal medium (M9) that contains 15N labelled ammonium chloride (15NH4Cl) as the sole nitrogen source [1]. The quality of the NMR sample is evaluated based on the dispersion of peaks, the number of peaks and the uniformity of the peak intensities. Good-quality NMR spectra promise the success of further studies, such as NMR structure determination, dynamic studies, and interaction analyses. If the quality of the NMR spectra is not sufficient, then the sample conditions, such as buffer composition, temperature and concentration, and the construct should be optimised (see Chapter 1).
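When the number of peaks is used as a quality criterion, it helps to know how many backbone amide peaks to expect. The sketch below counts one peak per residue, excluding prolines (which have no amide proton) and the N-terminal residue (whose free amino group is normally not observed). It is only a rough guide: side-chain NH groups add peaks, and overlap or exchange broadening removes them. The function name and the example sequence are hypothetical.

```python
def expected_backbone_amide_peaks(sequence):
    """Count backbone amide peaks expected in a 1H-15N HSQC spectrum.

    Skips the N-terminal residue and all prolines; side-chain NH groups
    (e.g. Trp, Asn, Gln) are deliberately not counted.
    """
    seq = sequence.upper()
    return sum(1 for residue in seq[1:] if residue != "P")


print(expected_backbone_amide_peaks("MKPLAGPW"))  # toy sequence
```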
2.3.2 Uniform 13C, 15N Labelling
The 13C enrichment of a target protein further expands the scope of NMR experiments. Uniformly 13C/15N labelled proteins are commonly produced by growing E. coli cells in minimal M9 medium containing a 15N salt and, in addition, a 13C labelled precursor. Although E. coli cells can utilise several kinds of precursors, including glucose, pyruvate, acetate, succinate and glycerol, as sole carbon sources, glucose is commonly used, since it facilitates a high expression level [4]. The 13C enrichment enables a variety of heteronuclear multidimensional experiments involving the backbone and side-chain carbon atoms [30–32]. The secondary chemical shifts of the 13Cα, 13Cβ and 13C carbonyl carbons also provide information about secondary structure elements. Some excellent protocols for secondary structure prediction have been developed [33,34]. Recently, a strategy to determine the tertiary structures of proteins, using a limited number of NOE-derived constraints and chemical shifts, has been reported [35].

Although one-bond 13C-13C couplings are useful for the assignment of the side-chains in proteins, they complicate relaxation studies of 13C nuclei. For such studies, a protein possessing an alternating 13C-12C labelling pattern in the side chains is useful; it can be prepared by growing E. coli in medium containing either [2-13C] glycerol and NaH12CO3 or [1,3-13C] glycerol and NaH12CO3 as the carbon source [36]. This approach was applied to the relaxation study of thioredoxin, with the concomitant use of 50 % random fractional deuteration. A notable example of this isotope labelling strategy was the structure determination of an SH3 domain by solid-state NMR [37]. As described above, 13C-12C alternate labelling is also useful for backbone assignment [38].

2.3.3 2H Labelling
Protein deuteration has long been regarded as a key method to study proteins by NMR [39–41]. The magnetogyric ratio of 2H is 6.5 times lower than that of 1H, and thus the substitution of 1H by 2H mitigates the unwanted dipolar and scalar couplings in proteins. Theoretically, the substitution of 1H by 2H at the α position is expected to lead to a longer T2 relaxation time (about 12-fold for a 50 kDa protein) for the α carbon. Deuteration also removes the scalar coupling and reduces the spin diffusion. Based on the level of deuteration, the 2H labelling schemes can be classified into two groups: full deuteration, termed ‘perdeuteration’ [42], and random fractional deuteration (a deuteration level of around 50–90 %), which will be described in the following subsection.

Deuterated proteins are commonly produced by growing E. coli cells on minimal medium (M9) containing 2H2O and either protonated or deuterated carbon sources. While E. coli cells can tolerate 2H2O, culturing them in 2H2O medium involves a significant reduction in growth rate and yield, due to deuterium isotope effects. To overcome this problem, several protocols for deuteration have been reported [43–45]. The production method for the deuterated protein is selected based on the intended level of deuteration. When a deuteration level of up to 75–80 % is required, one easy method is to grow the E. coli cells on H2O medium to a high cell density, resuspend the isolated cells in medium containing the intended ratio of 2H2O, and then induce the protein expression [4,46]. To achieve higher levels of deuteration, growing E. coli cells adapted to 2H2O medium is necessary.
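The magnetogyric ratio argument made at the start of this subsection can be checked numerically. The sketch below uses tabulated magnetogyric ratios and the textbook scaling of a dipolar relaxation contribution with γ² I(I+1) of the interacting spin. It considers only the dipolar contribution of the replaced hydrogen and ignores all other relaxation mechanisms and differences in spectral densities, so it is not expected to reproduce the T2 gain quoted above.

```python
# Magnetogyric ratios in rad s^-1 T^-1 (standard tabulated values).
GAMMA_1H = 267.522e6
GAMMA_2H = 41.066e6


def dipolar_weight(gamma, spin):
    """Relative dipolar relaxation weight of a spin: gamma^2 * I(I+1)."""
    return gamma ** 2 * spin * (spin + 1.0)


# Ratio of magnetogyric ratios, 1H relative to 2H.
ratio = GAMMA_1H / GAMMA_2H

# Dipolar contribution of a 2H (I = 1) relative to a 1H (I = 1/2) at the
# same distance, under the simplifying assumptions stated in the lead-in.
scaling = dipolar_weight(GAMMA_2H, 1.0) / dipolar_weight(GAMMA_1H, 0.5)

print(f"gamma(1H)/gamma(2H) = {ratio:.2f}")
print(f"relative dipolar contribution of 2H = {scaling:.3f}")
```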
The adaptation process involves gradually increasing the ratio of 2H2O to 1H2O during the culture of E. coli cells. Step-by-step procedures for the deuteration have been well documented in other reviews [43,47]. Protonated glucose is often used in the case of random fractional deuteration. The use of acetate as a carbon source has also been reported [48]. When acetate is used, the deuteration level of the protein correlates linearly with the 2H2O ratio in the culture medium [49]. The culture of E. coli cells in 2H2O-containing medium reportedly affects carbon metabolism [50]. However, the use of glucose presently appears to be the most reliable way to obtain a high yield, as compared to other carbon sources. The addition of a mixture of deuterated amino acids to the culture medium is effective in enhancing the growth of E. coli cells and also in increasing the expression levels of target proteins.

One problem often encountered in the analysis of deuterated proteins is that the amide groups embedded in the core of a protein molecule are highly protected from the solvent, so that their deuterons are not readily exchanged back to protons. The back-exchange of such amide groups can often be accomplished by denaturing the deuterated protein in 1H2O, provided that it can be refolded. On the other hand, when 1H2O medium is used, the α and some β positions are partially protonated, even when fully deuterated amino acids are used [51]. Recently, P. pastoris has also been successfully used to prepare deuterated proteins [52].

Deuteration expands the size of proteins amenable to NMR analyses. In a perdeuterated protein, sequential backbone assignments can be accomplished in the absence of the relaxation effect derived from aliphatic protons. Of course, the assignment of the side-chain protons is no longer possible in completely deuterated proteins, although that of the side-chain 13C nuclei can still be performed by using a 13C-originating pulse scheme [53]. 
For the observation of 13C atoms attached to 2H atoms, 2H decoupling is often required to reduce the line-widths of the deuterated 13C signals. Irradiation with an RF field sufficiently stronger than the inverse 2H T1 removes the broadening due to the residual scalar interactions between 2H and 13C, and thus results in a much narrower 13C line-width, relative to that of the protonated 13C [54].

The complete absence of aliphatic protons also minimises the effects of spin diffusion, thus facilitating the observation of long-distance NOEs. The problem is that the available NOEs are limited to 1HN-1HN pairs. Based on simulations by Venters and coworkers for human carbonic anhydrase II and human profilin, both the backbone and side-chain 1HN-derived NOEs are needed, and a distance constraint greater than 6 Å is required to obtain a protein structure of reasonable quality [55]. Another important application of a perdeuterated protein is the cross-saturation experiment and its modified version, the transferred cross-saturation experiment [56–58]. These methods are applicable to a wide variety of biologically relevant interactions [59–61]. A number of these applications of perdeuteration are described in other chapters of this volume.

The TROSY (transverse relaxation optimised spectroscopy) technique is often used when analysing large deuterated proteins [62–64]. This method exploits the destructive interference between distinct relaxation mechanisms. For instance, the interference between the 1H-15N dipolar interaction and the 15N chemical shift anisotropy results in distinct line-widths for the two 1H-coupled 15N resonances. The cross-correlation effect divides the coherence into two classes: one that relaxes fast, due to constructive interference between the cross-correlated relaxation mechanisms, and another that relaxes slowly, due to destructive interference. In the TROSY approach, the sharper line is selectively observed. 
The development of the TROSY method has significantly increased the upper limit of molecular weights amenable to NMR studies [65]. TROSY was originally applied to modify
the duration of the chemical shift encoding. The TROSY approach was subsequently utilised in the polarisation transfer step, which is known as CRIPT (cross-correlated relaxation induced polarisation transfer) and CRINEPT (cross-correlated relaxation induced INEPT) [63]. To date, the methods for observing sharper lines arising from the cross-correlation of different relaxation mechanisms have been applied to amide 1H-15N moieties [62], aromatic 1H-13C moieties [66] and methyl groups [67].

If side-chain non-exchangeable protons must be observed, then random fractional deuteration may be used for this purpose, as demonstrated for a variety of proteins, such as thioredoxin and staphylococcal nuclease [68,69]. Random fractional deuteration is a compromise between the loss of 1H NMR information and the reduced line-width of the remaining 1H resonances. A deuteration level of 50 % has been recommended for structure determination [70]. In combination with methyl-selective protonation, which will be described in the next section, the global fold of a large protein can be determined [71].

A serious problem encountered in the employment of random fractionally deuterated proteins is the presence of multiple isotopomers in the methylene and methyl groups. The chemical shifts in the deuterated sample are slightly different from those of the fully protonated sample, due to deuterium-induced isotope shifts. The substitution of 1H by 2H induces upfield chemical shift changes of the carbons separated by one to three chemical bonds from the substituted atom. In the case of methylene groups, there are four isotopomers: CD2, CH(R)D(S), CD(R)H(S) and CH2. With the use of a special editing technique, the coherence for specific isotopomers can be eliminated, which enables relaxation analysis using a specific isotopomer, such as CH2D in methyl and CHD in methylene groups [72–74].
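To get a feel for how the isotopomer populations depend on the deuteration level, a simple binomial estimate can be used. This is a sketch under a strong assumption that is not made in the chapter: each hydrogen site of a group is deuterated independently and with the same probability, which real biosynthetic incorporation will not strictly obey. The function name is hypothetical.

```python
from math import comb


def isotopomer_fractions(n_sites, deuteration):
    """Fractions of a CHn group carrying k = 0..n deuterons.

    Assumes independent, equally probable deuteration at each site
    (binomial statistics); deuteration is the 2H fraction, 0..1.
    """
    if not 0.0 <= deuteration <= 1.0:
        raise ValueError("deuteration level must lie between 0 and 1")
    return [
        comb(n_sites, k) * deuteration ** k * (1.0 - deuteration) ** (n_sites - k)
        for k in range(n_sites + 1)
    ]


methylene = isotopomer_fractions(2, 0.5)  # [CH2, CHD (both stereo forms), CD2]
methyl = isotopomer_fractions(3, 0.5)     # [CH3, CH2D, CHD2, CD3]
print("methylene:", methylene)
print("methyl:   ", methyl)
```

In this model the CHD fraction of a methylene group is the sum of the two stereochemically distinct singly deuterated isotopomers, which would be equally populated only if incorporation were not stereoselective.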
2.4 Selective Isotope Labelling of Proteins
2.4.1 Amino Acid Type-Selective Labelling
Amino acid type-selective 15N labelling in proteins is useful for the reliable identification of the amino acid type of resonances observed in 1H-15N HSQC spectra. Although the well-established sequential assignment method using a uniformly 13C/15N labelled protein is a powerful tool (see Chapter 3), it is desirable to confirm the assignment by different approaches. In an amino acid type-selective 15N labelled protein, the nitrogen atoms in a specific amino acid are selectively enriched with 15N. In its 1H-15N HSQC spectrum, therefore, only the resonances originating from this selected residue type are observed, which enables the reliable assignment of the amino acid type for the peaks. The amino acid-selective 15N-labelled protein can be produced by using E. coli cells for specific residues, and by the cell-free synthesis system for a wide range of amino acid residues [21].

When E. coli cells are used to produce the 15N-selectively labelled protein, two potential problems are isotopic dilution and cross-labelling, caused by various metabolic pathways. Provided that the selected amino acid is located downstream in the biosynthetic pathway, the selective enrichment can be achieved by adding the labelled amino acid to the unlabelled amino acid pool just prior to the induction of protein expression [75]. However, this approach does not work well in the cases where the amino acids are key intermediates in amino acid metabolic pathways, such as with Gly, Ser, Asx
and Glx. To ensure the controlled incorporation of an isotope-labelled amino acid into a target protein, auxotrophic strains, such as E. coli with lesions in the appropriate amino acid metabolic pathways, are commonly employed [76,77]. For instance, by using E. coli deficient in the biosynthesis of shikimic acid, the selective labelling of aromatic amino acids (Phe, Tyr and Trp) can be accomplished [78]. Many different auxotrophic strains of E. coli are available from a stock centre (http://cgsc.biology.yale.edu/). The addition of a large amount of unlabelled amino acids is also known to suppress metabolic conversion effectively in E. coli cells. As an alternative approach, a cell-free production system is suitable for the amino acid selective labelling method. In the cell-free reaction system, the interconversion of amino acids is suppressed to a large extent.

However, isotope-labelled amino acids are generally costly, as compared to isotope-labelled carbon sources and ammonium salts. Amino acid selective 14N reverse labelling of a protein, which is lower in cost than selective 15N labelling, has also been reported [79]. In this method, specific amino acids are added in their unlabelled forms to M9 medium containing 15NH4Cl. Consequently, in the spectrum of the resulting protein, some peaks are missing or weakened as compared to those in the uniformly 15N labelled sample.

Specific 13C amino acid type-selective labelling of proteins is also useful when combined with 13C direct detection. Recent advances in cryogenic probe technology have enabled the direct observation of carbon atoms with high sensitivity even in large proteins, thus encouraging researchers to revisit this traditional technique [80,81]. The direct observation of 13C nuclei has some advantages over that of 1H nuclei. 
For instance, the 13C nuclei are less affected by paramagnetic relaxation than the 1H nuclei, which enables the detection of atoms close to the paramagnetic centre in analyses of metal-binding proteins [82]. In the case of an NMR study of Streptomyces subtilisin inhibitor (SSI), amino acid selective labelling of the 13C carbonyl carbons has been employed [83,84]. A characteristic of carbonyl carbon observation is that the CSA of the carbonyl carbon is large, which induces line broadening at higher magnetic fields. Carbonyl detection can therefore be performed to advantage at a lower magnetic field. Recently, deuterium-induced isotope shifts for carbonyl carbons were used to monitor the amide proton exchange rates of the residues embedded in the hydrophobic core of a globular protein, leading to investigation of the detailed mechanism of the dynamic fluctuation of a five-stranded β-sheet in SSI [84,85].

The amino acid type selective 13C carbonyl and 15N double labelling method is a powerful tool for reliable resonance assignment. This method utilises proteins in which the main-chain carbonyl carbons of a specific amino acid are labelled with 13C, and the amide nitrogens of another kind of amino acid are labelled with 15N. The NMR signals of amino acids that possess a 15N-13CO linkage are extracted on the basis of 13C-15N spin couplings. This method was first demonstrated for the assignment of the methionyl carbonyl carbons within SSI. SSI contains three Met residues, at positions 70, 73 and 103. Their succeeding residues are Cys71, Val74 and Asn104, respectively. In this case, two SSI samples were prepared: SSI doubly labelled with [1-13C]-Met and 15N-Val, and SSI doubly labelled with [1-13C]-Met and 15N-Cys. In the 1D 13C spectra of the two samples, three peaks from the methionyl carbonyl carbons were observed. Based on the scalar coupling with the directly bonded 15N in the succeeding residue (15 Hz), the peak assignments were readily obtained [83]. The experimental procedures for this approach
have been described earlier [1]. The 13C-carbonyl, 15N double labelling approach can be effectively applied to very large proteins. Amino acid type identification of peaks in 1H-15N HSQC spectra can be achieved by using 19 different samples, in each of which one of the 19 non-proline residues is selectively 15N-labelled [86], although this is obviously time-consuming and costly. The same information can be obtained with much less effort by the use of combinatorial selective labelling (CSL) [79,87–89]. In this approach, several amino acids are simultaneously 15N or 13C carbonyl-labelled. The idea of combinatorial amino acid labelling originated within 14N reverse labelling [79], in which a mixture of unlabelled (14N) amino acids was added to M9 medium containing 15NH4Cl. Parker and coworkers developed a CSL method capable of yielding large numbers of residue-type and sequence-specific backbone amide assignments, which involves comparing the cross-peak intensities in 1H-15N HSQC and 1H-15N HNCO spectra collected for five samples containing different combinations of 13C- and 15N-labelled amino acids [87,88]. The main practical concern in this approach is cross-labelling. Because a cell-free system largely avoids this problem, its use expands the applicability of the method.
2.4.2 Reverse Labelling
As mentioned above, the depletion of a specific isotope in a selected atom, as well as selective isotopic enrichment, is an important concept in stable isotope labelling approaches. In some cases, for example, specific protonation of selected residues in a fully deuterated background is effective for elucidating the structure. Amino acid-type selective labelling often involves the addition of protonated 15N- or 15N,13C-labelled amino acids to the deuterated expression medium [90,91]. The procedure for residue-selective labelling in a deuterium background has been well documented [44]. By combining deuteration with selective 1H-, 13C- and 15N-labelling of a limited number of amino acid residues, a sufficient number of NOEs can be identified to determine the global folds of large proteins. The utility of this procedure has been demonstrated for the structure determination of the 25 kDa tryptophan repressor from E. coli [92,93]. Although the deuteration of nonexchangeable protons in proteins improves the quality of the NH-based triple resonance experiments employed for backbone assignments [94–96], the selective protonation of the α position in the deuterium background is useful for the backbone assignments in some cases, because the Hα atom is used as the starting point of the triple resonance experiment [97]. The Hα-based multidimensional experiment has an advantage over NH-based experiments, in that it can be performed in 100 % 2H2O buffer. The quality of both the NH- and Hα-based experiments for backbone assignment depends largely on the linewidth of the 13Cα resonance. In this regard, the presence of 13Cα-13Cβ coupling (40 Hz) is unfavourable for these experiments. A commercially available 2H, 13C and 15N labelled amino acid mixture can be back-protonated at the α position by using a simple chemical reaction [98].
The β subunit of human chorionic gonadotropin has been prepared in CHO cells cultured in a medium containing amino acids labelled only in the backbone (N, Cα, Hα, C′) atoms [99]. Since mammalian cells require nearly all naturally occurring amino acids, no amino acid scrambling would be expected. The backbone 13C, 15N, (50%) 2H-labelled leucine,
phenylalanine and valine were synthesised and utilised. For selective 13Cα enrichment, glycerol can be used with 12C-13C alternate labelling [74]. By growing E. coli cells on deuterium-containing minimal medium, using either [2-13C] glycerol and NaH13CO3 or [1,3-13C] glycerol and NaH13CO3 as carbon sources, proteins with Cα atoms enriched with 13C and carbonyl and Cβ atoms with 12C are obtained. When [2-13C] glycerol is used, the Cα positions of Ala, Cys, Phe, Gly, His, Lys, Ser, Val, Trp and Tyr residues are efficiently enriched with 13C. On the other hand, when [1,3-13C] glycerol is used, those of Glu, Leu, Gln, Pro and Arg residues are enriched with 13C. The Cα atoms of Asp, Ile, Met, Asn and Thr are partially enriched in both cases. In this labelling pattern, transverse (T2) relaxation is slowed, due to the absence of 13Cβ, 1Hα and 13C′ nuclei. With the use of the 13C direct observation technique, the backbone assignment has been accomplished via 15N-13Cα connectivities [38]. As compared to 13C-attached protons, 12C-attached protons yield sharper peaks, due to the absence of one-bond 1H-13C dipolar relaxation. If one prepares a protein in which selected amino acids are unlabelled (12C) while the others are uniformly 13C labelled, then the NOEs between the 12C-amino acid residues and 13C-labelled amino acids can be observed by using isotope-editing and -filtering techniques. The phenylalanine aromatic resonances of the DNA binding domain of Drosophila heat shock factor were assigned by this method [100]. This type of approach has been widely used to study many proteins, such as calcium-free calmodulin [101], the 24 kDa Dbl homology domain [102], and the 25 kDa anti-apoptotic protein, Bcl-xL [103]. Side-chain methyl groups are valuable probes in NMR studies of the structures and dynamics of proteins [72,104,105].
Recent advances in labelling techniques have made it possible to introduce 1H and 13C only into methyl positions of otherwise deuterated proteins [71,106,107]. The merit of this type of methyl labelling is its applicability to large proteins. For the production of methyl-protonated proteins, an isotope labelled precursor is commonly employed. Perdeuterated proteins, in which the methyl groups of Ile (γ2 only), Leu, Val and Ala are selectively protonated, can be produced by adding protonated pyruvate to a 2H2O-based medium [106]. The problem with the use of pyruvate is that the methyl groups are obtained as a mixture of isotopomers: 13CH3, 13CH2D, 13CHD2 and 13CD3 [106]. The 13C and 1H chemical shifts of these isotopomers, except for CD3, are slightly different, due to 2H isotope shifts of 0.3 and -0.02 ppm per deuteration [108]. Hence, α-ketobutyrate (for Ile (δ1)) and α-ketoisovalerate (for Leu, Val) are now commonly used for methyl labelling. With the use of commercially available α-ketobutyrates and α-ketoisovalerates with 13CHD2 or 13CH2D labelling, in addition to the 13CH3-based precursors, one can now produce samples that are uniform in the isotopomer of choice [75,109]. The methyl-protonation of Ile, Leu, and Val residues by using precursors has been well established. More recently, the methyl selective labelling of alanine residues has also been reported [110], using methyl-protonated Ala itself instead of a precursor. In a protein methyl-protonated in a deuterium background, the global fold can be determined by the use of a set of methyl-methyl, methyl-NH, and NH-NH distance restraints, as demonstrated for the C-terminal SH2 domain of phospholipase Cγ1 [71]. The three methyl isotopomers (13CH3, 13CHD2 and 13CH2D) have distinct relaxation properties, and the 2D correlation pulse schemes that yield the best resolution for each of them have been reported [108]. The applications of the three isotopomers are also distinct.
In the case of the 13 CH3 methyl isotopomer, by using
an HMQC pulse scheme, the methyl groups of an extremely large protein complex, such as the 670 kDa 20S proteasome, can be observed [111]. In the case of the 13CHD2 isotopomer, a 1H relaxation analysis can be performed [112], while in the case of the 13CH2D isotopomer, a 2H relaxation analysis is performed. Due to the poor deuterium chemical shift dispersion and the rapid decay of the deuterium magnetisation, the relaxation times are obtained indirectly, as those of the triple spin terms, IzCzDz or IzCzDy. The picosecond to nanosecond time scale dynamics of methyl-containing side-chains have been studied by using this 2H-based NMR relaxation approach, which has been applied to analyse the dynamic behaviour of the methyl side-chains of the C-terminal SH2 domain of phospholipase Cγ1 [73]. The assignment of the methyl signals can be achieved by utilising the side-chain 13C-13C connectivity [113,114]. In the case of the 20S proteasome, the assignments of the methyl groups were transferred from those obtained for the 21 kDa α monomer, and then the remaining signals were assigned by introducing mutations [111].
2.4.3 Stereo-Selective Labelling
The stereospecific assignment of diastereotopic groups in Leu and Val residues is essential for defining the precise orientations of the isopropyl groups of Val and Leu residues [115]. Biosynthetic fractional 13C labelling can be used for the unambiguous stereospecific assignment of diastereotopic methyl groups in Val and Leu residues [116–118]. In this method, a mixture of roughly 90 % [12C6]-glucose and 10 % uniformly labelled [13C6]-glucose is used as the sole carbon source. This method exploits the fact that the biosynthesis of Val and Leu residues from glucose is stereo-selective. The isopropyl group of valine and leucine is composed of two pyruvate molecules originating from glucose. In this isopropyl group, the pro-R methyl group (δ1 in Leu and γ1 in Val) and the adjacent carbon atom (γ in Leu and β in Val) originate from the same pyruvate molecule, so the two carbons have the same isotopic composition. On the other hand, the pro-S methyl group (δ2 in Leu and γ2 in Val) and the adjacent carbon atom originate from two different pyruvate molecules. Therefore, when 90 % [12C6]-glucose and 10 % uniformly labelled [13C6]-glucose are used as precursors, the 13C atoms located in the pro-R methyl group are always directly bonded to 13C, whereas a large proportion (90 %) of the 13C atoms located in the pro-S methyl group are bonded to 12C. Based on the presence or absence of the 13C-13C J coupling, the stereospecific assignment of the isopropyl methyl groups can readily be performed. If the observation of the peaks of the diastereotopic methyl groups is hampered by severe overlaps with the methyl groups from other amino acids, such as Ile, Lys and Thr, then unlabelling of the problematic amino acid is useful for overcoming this problem [118].
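The expected splitting pattern in the fractional labelling scheme can be sketched numerically. The following Python snippet is a minimal illustration only: it assumes that pyruvate units are incorporated intact and drawn at random from the glucose pool, it ignores natural-abundance 13C, and the function name is a hypothetical choice made for this example.

```python
def coupled_fraction(f_13c: float, same_pyruvate: bool) -> float:
    """Fraction of 13C methyl carbons whose adjacent carbon is also 13C.

    Simplified model (an assumption of this sketch): pyruvate units are
    incorporated intact and drawn at random from a glucose pool in which
    a fraction f_13c is uniformly 13C labelled.
    """
    if not 0.0 <= f_13c <= 1.0:
        raise ValueError("f_13c must lie between 0 and 1")
    # Same pyruvate unit: a 13C methyl always has a 13C neighbour.
    # Different unit: the neighbour is 13C only with probability f_13c.
    return 1.0 if same_pyruvate else f_13c


f = 0.10  # 10 % [13C6]-glucose, 90 % [12C6]-glucose

# pro-R methyl: methyl and adjacent carbon come from the same pyruvate.
pro_r_doublet = coupled_fraction(f, same_pyruvate=True)
# pro-S methyl: methyl and adjacent carbon come from different pyruvates.
pro_s_doublet = coupled_fraction(f, same_pyruvate=False)

print(f"pro-R 13C methyls split by 13C-13C coupling: {pro_r_doublet:.0%}")
print(f"pro-S 13C methyls split by 13C-13C coupling: {pro_s_doublet:.0%}")
```

Under these assumptions the pro-R methyl appears as a doublet in the 13C dimension, while the pro-S methyl is predominantly a singlet, which is the distinction used for the assignment.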
The use of ‘block’-13 C-labelled Val and Leu is a straightforward approach for the precise stereospecific assignment of their prochiral methyl groups, and this has been applied to the analysis of cystatin A and phosphoprotein CPI-17 [119,120]. Stereoselective deuteration of one methyl group in Leu residues has also been demonstrated for L. casei dihydrofolate reductase [121]. The stereospecific assignment of methylene protons is also of considerable importance. In this case, an amino acid in which one 1 H atom in the methylene group is stereo-selectively substituted by 2 H is incorporated into the target protein [122,123]. Staphylococcal nuclease H124L, in which the Gly residues were labelled with [2 HR ; 2-13 C] Gly or [2 HS ; 2-13 C] Gly,
was prepared for the NMR analysis [124]. The linewidth of a resonance is improved more by deuteration within the same amino acid residue than by deuteration of the neighbouring residues. Thus, the extensive stereospecific isotope labelling of target proteins promises considerable improvement of NMR spectra, which was ultimately accomplished by the SAIL method [25].
2.5 Segmental Labelling
Segmental labelling is a promising strategy for studying large proteins. For instance, when a target protein comprises two structural domains, the N-terminal half of the polypeptide chain can be labelled with 13C and 15N while the remaining C-terminal half is left unlabelled, thereby reducing the observable NMR signals to a manageable number. The segmental labelling method differs from the aforementioned amino acid selective labelling in that the labelled amino acids are contiguous in the primary sequence, thus enabling the conventional sequential backbone assignment and the structure determination for the isotope labelled segment. In addition, segmental labelling can be used for NMR studies of interdomain interactions [125,126]. In segmental labelling schemes, an individual protein segment is expressed in medium containing the desired isotopic precursors or amino acids, while the rest of the protein segment(s) is expressed in unlabelled medium. Ligation of the independently prepared segments yields the segmentally labelled protein. Two methods have been used for the segment ligation: protein trans-splicing (PTS) and expressed protein ligation (EPL). PTS is based on the utilisation of a protein splicing reaction. Protein splicing is a posttranslational processing event in which an internal protein segment, the intein, catalyses its own excision from a precursor protein and concomitantly ligates the flanking regions to form the mature protein. The first application of PTS for NMR was performed for the C-terminal domain of the E. coli RNA polymerase α-subunit, using an intein, PI-PfuI from Pyrococcus furiosus, which can be split into N-terminal and C-terminal portions [127].
When combined, the N- and C-terminal portions can adopt the correct fold of the intein, which in turn leads to the protein splicing reaction. Thus, the N-terminal segment of the target protein, attached to the N-terminal half of PI-PfuI, and the C-terminal segment of the target protein, attached to the C-terminal half of PI-PfuI, were mixed, heat denatured, and refolded. As a result, the PI-PfuI fold was formed, and then the protein splicing reaction occurred. The same strategy was also utilised for the 52 kDa β subunit of F1-ATPase [128] and for maltose-binding protein [129,130]. PTS can also be performed in E. coli cells [131]. In this method, two plasmids were used. One plasmid contained the T7/lac promoter for expression of the N-terminal portion of Ssp DnaE (inteinN) fused to GB1. The other plasmid was designed to express the C-terminal portion of Ssp DnaE (inteinC) fused to the chitin-binding domain (CBD) under the control of the araBAD promoter. The E. coli cells were first grown in unlabelled medium, and then the expression of the C-terminal fragment (inteinC-CBD) was induced by adding L-arabinose. Subsequently, the cells were harvested by centrifugation and resuspended in 15N-labelled M9 medium. Expression of the N-terminal fragment (GB1-inteinN) was then induced by adding IPTG. In the E. coli cells, trans-splicing between the 15N-labelled N-terminal fragment and the unlabelled C-terminal fragment occurred. The advantage of this method is
that it does not require either the individual preparation of precursor fragments before protein ligation or additional chemical reagents.
EPL is based on a reaction originally employed in native chemical ligation, where the C-terminal thioester of one peptide reacts with the N-terminal cysteine residue of a second peptide [132–135]. Nucleophiles that react with thioesters, such as thiols (e.g. β-mercaptoethanol, DTT, cysteine) and hydroxylamine, are used to shift the N-S equilibrium by attacking the thioester, which in turn induces the N-terminal cleavage of the intein. The choice of thiol depends on the accessibility of the catalytic pocket of the intein/extein splicing domain and the properties of the target proteins. Protein ligation requires a C-terminal thioester group and an N-terminal α-cysteine at the ends of the protein fragments. Such protein termini can be generated by expressing the protein fragments as fusions with a full-length or truncated intein, and subsequently inducing intein cleavage. As compared to the PTS method described above, EPL can be performed under milder conditions. The utility of this strategy has been demonstrated for over 40 proteins, such as two folded domains (SH2 and SH3) of Abelson protein tyrosine kinase [136] and σ70-like factors [137]. A synthetic peptide can also be inserted into a polypeptide in this way, and thus a fluorescent probe can be site-specifically introduced [138]. The IMPACT (Intein-mediated purification with an affinity chitin binding tag) kit is commercially available from New England Biolabs. This system utilises a modified intein in conjunction with a chitin-binding domain (CBD). The target protein is produced as a fusion with the intein-CBD at its N-terminus or C-terminus, and the fusion protein is adsorbed onto a chitin column.
The immobilised protein is then induced to undergo self-cleavage under mild conditions, resulting in the release of the target protein while the intein-CBD remains bound to the column [139].
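The reduction in spectral complexity that motivates segmental labelling can be estimated with a short calculation. The sketch below counts the backbone amide cross-peaks expected in a 1H-15N HSQC spectrum for a fully labelled protein and for each labelled half. It is illustrative only: the sequence, the segment boundary and the function name are hypothetical, and side-chain NH signals and exchange-broadened residues are ignored.

```python
def expected_amide_peaks(sequence: str, start: int, end: int) -> int:
    """Count backbone amide peaks from 15N-labelled residues start..end.

    Positions are 1-based and inclusive. Proline has no amide proton, and
    the N-terminal amino group is assumed not to give an observable peak.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("segment boundaries must lie within the sequence")
    count = 0
    for position in range(start, end + 1):
        residue = sequence[position - 1]
        if residue == "P":   # proline: no backbone NH
            continue
        if position == 1:    # N-terminal residue: free amino group
            continue
        count += 1
    return count


# Hypothetical 28-residue sequence, ligated between residues 14 and 15.
sequence = "MKTAYIAKQRPGLSEDLVNHAGFPWTKE"
boundary = 14

full = expected_amide_peaks(sequence, 1, len(sequence))
n_half = expected_amide_peaks(sequence, 1, boundary)
c_half = expected_amide_peaks(sequence, boundary + 1, len(sequence))

print(f"uniformly labelled: {full} peaks")
print(f"N-terminal segment labelled: {n_half} peaks")
print(f"C-terminal segment labelled: {c_half} peaks")
```

For a real multi-domain protein the same bookkeeping shows how many of the amide signals each segmentally labelled sample is expected to contribute.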
2.6 SAIL Methods
2.6.1 Concept of SAIL
Protein deuteration is the key to studying large proteins by NMR. However, random fractional deuteration suffers from the presence of numerous isotopomers and the dilution of 1H. Selective protonation, such as methyl-selective protonation, in a deuterium background improves this situation to some extent. However, the methyl groups are localised to specific sites in the protein molecule, and distance constraints covering the whole molecule cannot be obtained. To overcome the size limitations of NMR structure determination without compromising the accuracy of the structure to be determined, the stereo-array isotope labelling (SAIL) method was developed [25]. The concept of the SAIL method is to utilise proteins exclusively composed of special amino acids that are stereo- and regio-specifically isotope labelled with 2H, 13C and 15N (Figure 2.1). The optimal isotope labelling pattern for protein structure determination is designed as follows:
a. In methylene groups, one of the two protons is stereo-selectively substituted by a deuteron. In this labelling pattern, the remaining 1H atoms produce sharpened peaks, due to the absence of 1H-1H dipolar and scalar couplings within the methylene group. In addition, the 1H signal no longer overlaps with that of the substituted proton, thereby simplifying the NMR spectra of the methylene region. Furthermore, precise NOE-derived constraints can readily be obtained, due to the known stereospecific assignment of the 1H atom. Once the positions of the methylene proton and the carbon are defined, the position of the substituted 2H is automatically defined through the geometry of the methylene group.
b. In methyl groups, two of the three protons are substituted with 2H. The aim of this labelling pattern is the reduction of the spin diffusion effect. In addition, this labelling pattern might be used for advanced relaxation analyses of methyl groups.
c. In the diastereotopic methyl groups in Leu and Val residues, one prochiral methyl is 13CHD2, and the other one is 12CD3. The aim of this labelling pattern is the observation of one methyl group with a known stereospecific assignment.
d. In aromatic groups, the target proton-carbon moieties are 1H-13C, and the other moieties are 2H and 12C, thus eliminating the 13C-13C scalar coupling and the 1H-1H dipolar coupling within aromatic rings.
Figure 2.1 SAIL amino acids. Design concepts in the SAIL amino acids. Panels: Methylene, Methyl, Prochiral Methyl, Aromatic Ring
The SAIL approach profoundly simplifies the NMR spectra, by reducing the number of nonexchangeable protons, which are prone to overlap each other, to less than half of the original number and thus reducing the expected number of NOE peaks by 40–45% [25,140,141]. Since the NOEs eliminated provide information on either fixed (geminal) or redundant distance constraints, the reduction of the proton density is not a problem for structure determination. Conversely, the NOEs involving protons with known stereospecific assignments contribute greatly towards defining the molecular conformation of the target protein. Another important advantage of the SAIL protein is the improved signal-to-noise ratio, which is mainly derived from the increased T2 relaxation times, due to the replacement
of unneeded 1H and 13C with 2H and 12C, respectively. In the case of the aromatic groups of Phe, Tyr and Trp, the absence of one-bond carbon-carbon coupling eliminates the need for a constant time scheme, thereby reducing the duration of the pulse scheme [141,142]. To perform this method, 20 special amino acids with a complete stereo- and regio-specific pattern of isotope labelling (SAIL amino acids) have been chemically and enzymatically synthesised (Figure 2.2) [25,142–144]. These SAIL amino acids are commercially available from SAIL Technologies, a company that was established to supply SAIL amino acids to the NMR community (www.sail-technologies.com). To address more difficult cases, such as NMR studies of membrane proteins, other SAIL amino acids with different isotope labelling patterns are now being designed and synthesised [145].
Figure 2.2 Chemical structures of the SAIL amino acids. Symbols: H, 1H; D, 2H
2.6.2 Practical Procedure for the SAIL Method
The production of SAIL proteins starts with cell-free synthesis, using SAIL amino acids. As described in the protein production section, the advantages of the cell-free system are its minimised metabolic scrambling and high incorporation rate of the added amino acid into the target protein. Thus far, the E. coli cell-free expression system has been used for the production of SAIL proteins. Once the pilot experiment has been accomplished, the production of the SAIL protein is performed according to Protocol 3.
Protocol 3: Production of SAIL Proteins by the E. coli Cell-Free Method
1. Prepare the reaction solution and the dialysis solution by mixing the components as follows:

Stock solution                            Reaction solution   Dialysis solution
RNase-free water                          1269.5 μl           11 608 μl
1.4 M NH4OAc                              98 μl               392 μl
0.5 M Mg(OAc)2                            150 μl              600 μl
SAIL amino acid mixture (total 60 mg)     200 μl              800 μl
0.645 M creatine phosphate                400 μl              1600 μl
LM mixture                                1250 μl             5000 μl
1 mg/ml template DNA                      100 μl              -
11 mg/ml T7 RNA polymerase                45 μl               -
40 units/ml RNase inhibitor               12.5 μl             -
10 mg/ml creatine kinase                  125 μl              -
S30 extract                               1500 μl             -
Total volume                              5 ml                20 ml

   Dissolve the SAIL amino acid mixture in water and then add it to the cell-free reaction solution. If the SAIL amino acids appear to be insoluble in water, then warm the solution up to 60 °C; L-tryptophan and L-tyrosine are especially likely to be insoluble, as compared to other amino acids.
2. Cut the outer tube of the Float-A-Lyzer at an appropriate height such that the inner solution in the tube will be completely immersed in the dialysis solution when the inner membrane apparatus is placed within the outer tube. Pour the dialysis solution into the outer tube. Place the inner membrane apparatus of the Float-A-Lyzer within the outer tube, and pour the reaction solution into the inner membrane. Cover the tube with Parafilm.
3. Shake the tube to facilitate the production of target proteins under optimised conditions.
4. Retrieve the reaction solution and the dialysis solution. If the produced protein has a molecular weight smaller than the molecular weight cut-off of the membrane, then check the outer solution for the presence of the protein.
5. Purify the produced protein according to the purification procedures for the target protein. The N-terminus of the protein produced by cell-free expression may be heterogeneous, due to incomplete deformylation by peptide deformylase. This can be overcome by using a cleavable N-terminal tag.
6. Transfer the prepared sample into the NMR tube.
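As a quick consistency check on the recipe in Protocol 3, the component volumes can be tabulated, summed and scaled down for a pilot experiment. The sketch below is illustrative only. It assumes that the component volumes are in microlitres (with that reading, the dialysis solution sums to exactly 20 ml); the dictionary names, the scale_recipe helper and the 10 % pilot scale are hypothetical choices, not part of the protocol.

```python
# Component volumes in microlitres, transcribed from Protocol 3.
# The microlitre unit is an assumption of this sketch.
REACTION_UL = {
    "RNase-free water": 1269.5,
    "1.4 M NH4OAc": 98.0,
    "0.5 M Mg(OAc)2": 150.0,
    "SAIL amino acid mixture": 200.0,
    "0.645 M creatine phosphate": 400.0,
    "LM mixture": 1250.0,
    "template DNA": 100.0,
    "T7 RNA polymerase": 45.0,
    "RNase inhibitor": 12.5,
    "creatine kinase": 125.0,
    "S30 extract": 1500.0,
}

DIALYSIS_UL = {
    "RNase-free water": 11608.0,
    "1.4 M NH4OAc": 392.0,
    "0.5 M Mg(OAc)2": 600.0,
    "SAIL amino acid mixture": 800.0,
    "0.645 M creatine phosphate": 1600.0,
    "LM mixture": 5000.0,
}


def scale_recipe(recipe: dict[str, float], factor: float) -> dict[str, float]:
    """Return a copy of the recipe with every volume multiplied by factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return {name: volume * factor for name, volume in recipe.items()}


reaction_total = sum(REACTION_UL.values())
dialysis_total = sum(DIALYSIS_UL.values())
print(f"reaction components: {reaction_total:.1f} ul")
print(f"dialysis components: {dialysis_total:.1f} ul")

# Hypothetical pilot reaction at one tenth of the full scale.
pilot = scale_recipe(REACTION_UL, 0.1)
for name, volume in pilot.items():
    print(f"{name:28s} {volume:8.2f} ul")
```

With the volumes as listed, the reaction components add up to 5150 μl, slightly above the nominal 5 ml total, so the water volume is worth rechecking when the mixture is assembled.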
When the SAIL protein is prepared, 1 H-15 N HSQC, 1 H-13 C constant time HSQC for the aliphatic region (with 2 H decoupling during 13 C chemical shift encoding), and 1 H-13 C HSQC for the aromatic region are commonly acquired. In the case of SAIL proteins, the number of time points for the indirect 13 C dimension is set to a relatively large number, and the window function should be optimised. A comparison of the quality of the NMR spectra between uniformly labelled and SAIL proteins is highly recommended (Figure 2.3). Firstly, the chemical shifts of the protons and carbons should be slightly different between them due to isotope shift effects, which confirms the desired isotope labelling pattern. Secondly, the linewidth of each resonance in the SAIL protein should be much less than that in the uniformly labelled protein. The NMR experiments to be acquired for the SAIL proteins for structure determination are essentially the same as those for the uniformly labelled proteins. 2 H decoupling during the 13 C chemical shift encoding ensures the narrow linewidth in the 13 C dimension. In the case of the chemical shift encoding of aromatic carbons, a constant time scheme is no longer required because of the absence of one-bond carbon-carbon couplings (Figure 2.4). If the quality of these three spectra is good, then the set of NMR experiments needed for resonance assignment and structure determination should be acquired (see Chapters 3 and 4). In our laboratory, two 13 C-edited NOESY-HSQCs for aliphatic and aromatic regions and a 15 N-edited NOESY-HSQC are acquired. The pulse sequences employed in these NOESY experiments are the same as those used for the uniformly labelled proteins, except that 2 H decoupling is applied in the 13 C chemical shift encoding. 
Since the 1H density in SAIL proteins is about half of that in the corresponding uniformly labelled proteins, the optimal mixing time for the SAIL protein is expected to be longer than that for the uniformly labelled protein [146]. The aromatic resonances in Phe and Tyr residues are connected to the Hβ-Cβ moieties within the given aromatic amino acids. Some caution is needed, because the chemical shifts differ between the SAIL and uniformly labelled proteins; when the TALOS program is employed, the input chemical shifts should therefore be adjusted prior to use [33,34]. Details of the structure determination of SAIL-labelled proteins have been described [145].
2.6.3 Residue-Selective SAIL Method
Along with the full SAIL labelling method, residue-selective labelling by SAIL amino acid(s) is also a powerful approach. As compared to the full SAIL approach, the residue-selective SAIL method has some obvious advantages. For example, well-established and thus more robust in vivo expression systems can be used for protein production, provided that the added SAIL amino acid is not affected by metabolic scrambling. As compared to in vitro expression, however, in vivo expression requires a much larger amount of the amino acid to obtain the intended amount in the target protein. One strategy to overcome this problem involves the use of an auxotrophic E. coli strain. By growing the E. coli auxotrophic strain in minimal medium containing a small amount of the target isotope labelled amino acid and a large amount of the 19 other unlabelled amino acids, a reasonable yield of the selectively labelled protein can be obtained. The minimum amount of the SAIL amino acid required for efficient expression varies for different target proteins and labelled amino acids. To keep the cost low, it is desirable to determine the minimum amount of SAIL amino acid required in a small-scale culture (Protocol 4). In favourable cases, an incorporation rate of more than 10 % of the SAIL amino acid into the target protein can be achieved, which is the same level as that in cell-free reactions. The growth rate of E. coli cells becomes slower as the amount of added amino acid decreases, and thus the expression level also decreases.
Figure 2.3 1H-13C CT-HSQC spectra of maltose-binding protein (MBP). (a) Aliphatic region of methylene groups in SAIL-MBP. (b) Enlargement of the rectangular region of the methylene group marked in (a). Assignments are indicated. (c, d) Corresponding regions for uniformly 13C, 15N-labelled MBP. (e) Cross-section at the position indicated in (b) and (d). The spectra for SAIL MBP and uniformly labelled MBP were acquired under the same conditions and were scaled for equal noise levels
Figure 2.4 Comparisons of NMR spectra for the SARS-CoV NP CTD between UL and SAIL in the aromatic region. (a, b) Chemical structures of the aromatic rings for UL- (a) and SAIL-phenylalanine (b). (c, d) Phenylalanine signals of 1H-13C HSQC for UL- (c) and SAIL- (d) SARS-CoV NP CTD. (e, f) Tyrosine signals of 1H-13C HSQC for UL- (e) and SAIL- (f) SARS-CoV NP CTD. To demonstrate the absence of 1JCC coupling of aromatic rings for SAIL phenylalanine and tyrosine residues, all 1H-13C HSQC spectra for the aromatic regions were recorded without the constant time technique
Protocol 4: Optimisation of the Amount of SAIL Amino Acids for the Production of Calmodulin Selectively Labelled by SAIL Phenylalanine
1. Transform the expression vector into an auxotrophic E. coli strain. We often use AB2826 (DE3) strains [78] for selective labelling by SAIL aromatic amino acids.
2. Prepare minimal medium: amino acid-containing M9 medium with different amounts of phenylalanine.
   For 1 L, combine the following: Na2HPO4·12H2O 15.1 g, KH2PO4 3.0 g, 15NH4Cl 1.0 g, NaCl 0.5 g, L-Alanine 400 mg, L-Arginine 400 mg, L-Aspartic Acid 250 mg, L-Cystine 50 mg, L-Glutamic Acid 400 mg, L-Glycine 400 mg, L-Histidine 100 mg, L-Isoleucine 100 mg, L-Leucine 100 mg, L-Lysine 150 mg, L-Methionine 50 mg, L-Phenylalanine 50 mg, L-Proline 150 mg, L-Serine 1000 mg, L-Threonine 100 mg, L-Tryptophan 50 mg, L-Tyrosine 100 mg, L-Valine 100 mg.
   After autoclaving, add the following: 1 M MgSO4 1 ml, 0.1 M CaCl2 1 ml, 8 % thiamine 0.5 ml, 20 % D-glucose 10 ml.
   The most important point in this method is that the amount of the target amino acid can be decreased without reducing the yield. For instance, in the case of phenylalanine, the amount can be decreased to around 10 mg/L.
3. Pick the colonies and inoculate them into LB medium.
4. Spin down the E. coli cells grown on the LB medium, and resuspend the cells in the minimal medium.
5. Grow the E. coli cells to an OD600 of 0.6–0.7.
6. Induce the expression by adding IPTG.
7. After a defined time, stop the culture and check the protein expression by SDS-PAGE.
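The arithmetic behind a small-scale phenylalanine titration can be sketched as follows. This is only an illustrative helper, not part of the published protocol: the per-litre amino acid amounts are taken from the recipe in step 2, whereas the culture volume (50 ml), the titration series (50, 25, 10 and 5 mg/L) and the function name `scale_recipe` are assumptions chosen for the example.

```python
# Per-litre amounts (mg) of the unlabelled amino acids, copied from step 2.
# Phenylalanine is kept separate because it is the quantity being optimised.
BASE_MG_PER_L = {
    "L-Alanine": 400, "L-Arginine": 400, "L-Aspartic acid": 250,
    "L-Cystine": 50, "L-Glutamic acid": 400, "L-Glycine": 400,
    "L-Histidine": 100, "L-Isoleucine": 100, "L-Leucine": 100,
    "L-Lysine": 150, "L-Methionine": 50, "L-Proline": 150,
    "L-Serine": 1000, "L-Threonine": 100, "L-Tryptophan": 50,
    "L-Tyrosine": 100, "L-Valine": 100,
}


def scale_recipe(volume_ml, phe_mg_per_l):
    """Return the mass (mg) of each amino acid for a culture of volume_ml."""
    if volume_ml <= 0 or phe_mg_per_l < 0:
        raise ValueError("volume must be positive and phenylalanine non-negative")
    amounts = {name: mg * volume_ml / 1000.0 for name, mg in BASE_MG_PER_L.items()}
    amounts["L-Phenylalanine (SAIL)"] = phe_mg_per_l * volume_ml / 1000.0
    return amounts


# Hypothetical test conditions: these values are assumptions, not from the protocol.
CULTURE_ML = 50
PHE_SERIES_MG_PER_L = [50, 25, 10, 5]

if __name__ == "__main__":
    for phe in PHE_SERIES_MG_PER_L:
        amounts = scale_recipe(CULTURE_ML, phe)
        print(f"{phe:>3} mg/L Phe -> weigh out "
              f"{amounts['L-Phenylalanine (SAIL)']:.2f} mg SAIL Phe "
              f"for {CULTURE_ML} ml of medium")
```

Running one culture per entry in the series and comparing the SDS-PAGE bands from step 7 would then indicate the lowest phenylalanine concentration that still gives an acceptable expression level.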
2.7 Concluding Remarks
The labelling of proteins with stable isotopes enhances NMR methods for the analysis of structure, dynamics and interactions. Along with the optimisation of buffer conditions, the choice of a suitable isotope labelling strategy is now a key part of sample optimisation. The selection of the strategy depends on many factors: the available production method, cost, yield and the intended study. Appropriate stable isotope labelling resolves the numerous peaks and sharpens each line, thereby yielding reliable information on protein structure, dynamics and function.
Acknowledgements
The authors thank our collaborators, Drs. Tsutomu Terauchi, Akira Mei Ono, Toshiya Hayano, Masato Shimizu, Takuya Torizawa, Teppei Ikeya and Peter Güntert, for their contributions to the development of the SAIL method described in this chapter. We are grateful for financial support from Core Research for Evolutional Science and Technology (JST) and the Targeted Proteins Research Program (MEXT).
References
1. Markley, J.L. and Kainosho, M. (1993) Stable isotope labeling and resonance assignments in larger proteins, in NMR of Macromolecules: A Practical Approach (ed. G.C.K. Roberts), Oxford University Press, Oxford.
2. LeMaster, D.M. (1994) Isotope labeling in solution protein assignment and structural analysis. Prog. Nucl. Magn. Reson. Spectrosc., 26, 371–419.
3. Kainosho, M. (1997) Isotope labelling of macromolecules for structure determinations. Nat. Struct. Biol., 4, 854–857.
4. Lian, L.-Y. and Middleton, D.A. (2001) Labelling approaches for protein structural studies by solution-state and solid-state NMR. Prog. Nucl. Magn. Reson. Spectrosc., 39, 171–190.
5. Goto, N.K. and Kay, L.E. (2000) New developments in isotope labeling strategies for protein solution NMR spectroscopy. Curr. Opin. Struct. Biol., 10, 585–592.
6. Ohki, S. and Kainosho, M. (2008) Stable isotope labeling methods for protein NMR spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc., 53, 208–226.
7. Makrides, S.C. (1996) Strategies for achieving high-level expression of genes in Escherichia coli. Microbiol. Rev., 60, 512–538.
8. Baneyx, F. (1999) Recombinant protein expression in Escherichia coli. Curr. Opin. Biotechnol., 10, 411–421.
9. Studier, F.W., Rosenberg, A.H., Dunn, J.J. and Dubendorff, J.W. (1990) Use of T7 RNA polymerase to direct expression of cloned genes. Methods Enzymol., 185, 60–89.
10. Singh, S.M. and Panda, A.K. (2005) Solubilization and refolding of bacterial inclusion body proteins. J. Biosci. Bioeng., 99, 303–310.
11. Woods, M.J. and Komives, E.A. (1999) Production of large quantities of isotopically labeled protein in Pichia pastoris by fermentation. J. Biomol. NMR, 13, 149–159.
12. Massou, S., Puech, V., Talmont, F. et al. (1999) Heterologous expression of a deuterated membrane-integrated receptor and partial deuteration in methylotrophic yeasts. J. Biomol. NMR, 14, 231–239.
13. Colussi, P.A. and Taron, C.H. (2005) Kluyveromyces lactis LAC4 promoter variants that lack function in bacteria but retain full function in K. lactis. Appl. Environ. Microbiol., 71, 7092–7098.
14. Sugiki, T., Shimada, I. and Takahashi, H. (2008) Stable isotope labeling of protein by Kluyveromyces lactis for NMR study. J. Biomol. NMR, 42, 159–162.
15. DeLange, F., Klaassen, C.H.W., Wallace-Williams, S.E. et al. (1998) Tyrosine structural changes detected during the photoactivation of rhodopsin. J. Biol. Chem., 273, 23735–23739.
16. Creemers, A.F.L., Klaassen, C.H.W., Bovee-Geurts, P.H.M. et al. (1999) Solid state 15N NMR evidence for a complex Schiff base counterion in the visual G-protein-coupled receptor rhodopsin. Biochemistry, 38, 7195–7199.
17. Strauss, A., Bitsch, F., Cutting, B. et al. (2003) Amino-acid-type selective isotope labeling of proteins expressed in baculovirus-infected insect cells useful for NMR. J. Biomol. NMR, 26, 367–372.
18. Hansen, A.P., Petros, A.M., Mazar, A.P. et al. (1992) A practical method for uniform isotopic labeling of recombinant proteins in mammalian cells. Biochemistry, 31, 12713–12718.
19. Archer, S.J., Bax, A., Roberts, A.B. et al. (1993) Transforming growth factor beta 1: NMR signal assignments of the recombinant protein expressed and isotopically enriched using Chinese hamster ovary cells. Biochemistry, 32, 1152–1163.
20. Spirin, A.S., Baranov, V.I., Ryabova, L.A. et al. (1988) A continuous cell-free translation system capable of producing polypeptides in high yield. Science, 242, 1162–1164.
21. Kigawa, T., Muto, Y. and Yokoyama, S. (1995) Cell-free synthesis and amino acid-selective stable isotope labeling of proteins for NMR analysis. J. Biomol. NMR, 6, 129–134.
22. Kigawa, T., Yabuki, T., Yoshida, Y. et al. (1999) Cell-free production and stable-isotope labeling of milligram quantities of proteins. FEBS Lett., 442, 15–19.
23. Ozawa, K., Headlam, M.J., Schaeffer, P.M. et al. (2004) Optimization of an Escherichia coli system for cell-free synthesis of selectively 15N-labelled proteins for rapid analysis by NMR spectroscopy. Eur. J. Biochem., 271, 4084–4093.
24. Torizawa, T., Shimizu, M., Taoka, M. et al. (2004) Efficient production of isotopically labeled proteins by cell-free synthesis: A practical protocol. J. Biomol. NMR, 30, 311–325.
25. Kainosho, M., Torizawa, T., Iwashita, Y. et al. (2006) Optimal isotope labelling for NMR protein structure determinations. Nature, 440, 52–57.
26. Berrier, C., Park, K.H., Abes, S. et al. (2004) Cell-free synthesis of a functional ion channel in the absence of a membrane and in the presence of detergent. Biochemistry, 43, 12585–12591.
27. Madin, K., Sawasaki, T., Ogasawara, T. and Endo, Y. (2000) A highly efficient and robust cell-free protein synthesis system prepared from wheat embryos: plants apparently contain a suicide system directed at ribosomes. Proc. Natl. Acad. Sci. USA, 97, 559–564.
28. Endo, Y. and Sawasaki, T. (2003) High-throughput, genome-scale protein production method based on the wheat germ cell-free expression system. Biotechnol. Adv., 21, 695–713.
29. Morita, E.H., Shimizu, M., Ogasawara, T. et al. (2004) A novel way of amino acid-specific assignment in 1H-15N HSQC spectra with a wheat germ cell-free protein synthesis system. J. Biomol. NMR, 30, 37–45.
30. Ikura, M., Kay, L.E. and Bax, A. (1990) A novel approach for sequential assignment of proton, carbon-13, and nitrogen-15 spectra of larger proteins: heteronuclear triple-resonance three-dimensional NMR spectroscopy. Application to calmodulin. Biochemistry, 29, 4659–4667.
31. Kay, L.E., Ikura, M., Tschudin, R. and Bax, A. (1990) Three-dimensional triple-resonance NMR spectroscopy of isotopically enriched proteins. J. Magn. Reson., 89, 496–514.
32. Clore, G.M. and Gronenborn, A.M. (1994) Multidimensional heteronuclear nuclear magnetic resonance of proteins. Methods Enzymol., 239, 349–363.
33. Wishart, D.S., Sykes, B.D. and Richards, F.M. (1992) The chemical shift index: a fast and simple method for the assignment of protein secondary structure through NMR spectroscopy. Biochemistry, 31, 1647–1651.
34. Wishart, D.S. and Sykes, B.D. (1994) The 13C chemical-shift index: a simple method for the identification of protein secondary structure using 13C chemical-shift data. J. Biomol. NMR, 4, 171–180.
35. Shen, Y., Lange, O., Delaglio, F. et al. (2008) Consistent blind protein structure generation from NMR chemical shift data. Proc. Natl. Acad. Sci. USA, 105, 4685–4690.
36. LeMaster, D.M. and Kushlan, D.M. (1996) Dynamic mapping of E. coli Thioredoxin via 13C NMR relaxation analysis. J. Am. Chem. Soc., 118, 9255–9264.
37. Castellani, F., van Rossum, B., Diehl, A. et al. (2002) Structure of a protein determined by solid-state magic-angle-spinning NMR spectroscopy. Nature, 420, 98–102.
38. Takeuchi, K., Sun, Z.Y. and Wagner, G. (2008) Alternate 13C-12C Labeling for complete main-chain resonance assignments using Cα direct-detection with applicability toward fast relaxing protein systems. J. Am. Chem. Soc., 130, 17210–17211.
39. Crespi, H.L., Rosenberg, R.M. and Katz, J.J. (1968) Proton magnetic resonance of proteins fully deuterated except for 1H-leucine side chains. Science, 161, 795–796.
40. Markley, J.L., Putter, I. and Jardetzky, O. (1968) High-resolution nuclear magnetic resonance spectra of selectively deuterated staphylococcal nuclease. Science, 161, 1249–1251.
41. Kalbitzer, H.R., Leberman, R. and Wittinghofer, A. (1985) 1H-NMR spectroscopy on elongation factor Tu from Escherichia coli. FEBS Lett., 180, 40–42.
42. Venters, R.A., Farmer, B.T. II, Fierke, C.A. and Spicer, L.D. (1996) Characterizing the use of perdeuteration in NMR studies of large proteins: 13C, 15N and 1H assignments of human carbonic anhydrase II. J. Mol. Biol., 264, 1101–1116.
43. Gardner, K.H. and Kay, L.E. (1998) The use of 2H, 13C, 15N multidimensional NMR to study the structure and dynamics of proteins. Annu. Rev. Biophys. Biomol. Struct., 27, 357–406.
44. Fiaux, J., Bertelsen, E.B., Horwich, A.L. and Wüthrich, K. (2004) Uniform and residue-specific 15N-labeling of proteins on a highly deuterated background. J. Biomol. NMR, 29, 289–297.
45. Paliy, O. and Gunasekera, T.S. (2007) Growth of E. coli BL21 in minimal media with different gluconeogenic carbon sources and salt contents. Appl. Microbiol. Biotechnol., 73, 1169–1172.
46. Marley, J., Lu, M. and Bracken, C. (2001) A method for efficient isotopic labeling of recombinant proteins. J. Biomol. NMR, 20, 71–75.
47. Venters, R.A., Huang, C.C., Farmer, B.T. II et al. (1995) High-level 2H/13C/15N labeling of proteins for NMR studies. J. Biomol. NMR, 5, 339–344.
48. Venters, R.A., Calderone, T.L., Spicer, L.D. and Fierke, C.A. (1991) Uniform 13C isotope labeling of proteins with sodium acetate for NMR studies: application to human carbonic anhydrase II. Biochemistry, 30, 4491–4494.
49. Leiting, B., Marsilio, F. and O’Connell, J.F. (1998) Predictable deuteration of recombinant proteins expressed in Escherichia coli. Anal. Biochem., 265, 351–355.
50. Hochuli, M., Szyperski, T. and Wüthrich, K. (2000) Deuterium isotope effects on the central carbon metabolism of Escherichia coli cells grown on a D2O-containing minimal medium. J. Biomol. NMR, 17, 33–42.
51. Etezady-Esfarjani, T., Hiller, S., Villalba, C. and Wüthrich, K. (2007) Cell-free protein synthesis of perdeuterated proteins for NMR studies. J. Biomol. NMR, 39, 229–238.
52. Morgan, W.D., Kragt, A. and Feeney, J. (2000) Expression of deuterium-isotope-labelled protein in the yeast Pichia pastoris for NMR studies. J. Biomol. NMR, 17, 337–347.
53. Farmer, B.T. II and Venters, R.A. (1995) Assignment of side-chain 13C resonances in perdeuterated proteins. J. Am. Chem. Soc., 117, 4187–4188.
54. Grzesiek, S., Anglister, J., Ren, H. and Bax, A. (1993) Carbon-13 line narrowing by deuterium decoupling in deuterium/carbon-13/nitrogen-15 enriched proteins. Application to triple resonance 4D J connectivity of sequential amides. J. Am. Chem. Soc., 115, 4369–4370.
55. Venters, R.A., Metzler, W.J., Spicer, L.D. et al. (1995) Use of 1HN-1HN NOEs to determine protein global folds in perdeuterated proteins. J. Am. Chem. Soc., 117, 9592–9593.
56. Takahashi, H., Nakanishi, T., Kami, K. et al. (2000) A novel NMR method for determining the interfaces of large protein-protein complexes. Nat. Struct. Biol., 7, 220–223.
57. Nakanishi, T., Miyazawa, M., Sakakura, M. et al. (2002) Determination of the interface of a large protein complex by transferred cross-saturation measurements. J. Mol. Biol., 318, 245–249.
58. Shimada, I. (2005) NMR techniques for identifying the interface of a larger protein–protein complex: cross-saturation and transferred cross-saturation experiments. Methods Enzymol., 394, 483–506.
59. Takeda, M., Ogino, S., Umemoto, R. et al. (2006) Ligand-induced structural changes of the CD44 hyaluronan-binding domain revealed by NMR. J. Biol. Chem., 281, 40089–40095.
60. Takeda, M., Terasawa, H., Sakakura, M. et al. (2003) Hyaluronan recognition mode of CD44 revealed by cross-saturation and chemical shift perturbation experiments. J. Biol. Chem., 278, 43550–43555.
61. Nishida, N., Sumikawa, H., Sakakura, M. et al. (2003) Collagen-binding mode of vWF-A3 domain determined by a transferred cross-saturation experiment. Nat. Struct. Biol., 10, 53–58.
62. Pervushin, K., Riek, R., Wider, G. and Wüthrich, K. (1997) Attenuated T2 relaxation by mutual cancellation of dipole-dipole coupling and chemical shift anisotropy indicates an avenue to NMR structures of very large biological macromolecules in solution. Proc. Natl. Acad. Sci. USA, 94, 12366–12371.
63. Riek, R., Wider, G., Pervushin, K. and Wüthrich, K. (1999) Polarization transfer by cross-correlated relaxation in solution NMR with very large molecules. Proc. Natl. Acad. Sci. USA, 96, 4918–4923.
64. Salzmann, M., Pervushin, K., Wider, G. et al. (1998) TROSY in triple-resonance experiments: new perspectives for sequential NMR assignment of large proteins. Proc. Natl. Acad. Sci. USA, 95, 13585–13590.
65. Fiaux, J., Bertelsen, E.B., Horwich, A.L. and Wüthrich, K. (2002) NMR analysis of a 900K GroEL–GroES complex. Nature, 418, 207–211.
66. Pervushin, K., Riek, R., Wider, G. and Wüthrich, K. (1998) Transverse relaxation-optimized spectroscopy (TROSY) for NMR studies of aromatic spin systems in 13C-labeled proteins. J. Am. Chem. Soc., 120, 6394–6400.
67. Tugarinov, V., Hwang, P.M., Ollerenshaw, J.E. and Kay, L.E. (2003) Cross-correlated relaxation enhanced 1H-13C NMR spectroscopy of methyl groups in very high molecular weight proteins and protein complexes. J. Am. Chem. Soc., 125, 10420–10428.
68. LeMaster, D.M. and Richards, F.M. (1988) NMR sequential assignment of Escherichia coli thioredoxin utilizing random fractional deuteration. Biochemistry, 27, 142–150.
69. Torchia, D.A., Sparks, S.W. and Bax, A. (1988) Delineation of α-helical domains in deuterated staphylococcal nuclease by 2D NOE NMR spectroscopy. J. Am. Chem. Soc., 110, 2320–2321.
70. Nietlispach, D., Clowes, R.T., Broadhurst, R.W. et al. (1996) An approach to the structure determination of larger proteins using triple resonance NMR experiments in conjunction with random fractional deuteration. J. Am. Chem. Soc., 118, 407–415.
71. Gardner, K.H., Rosen, M.K. and Kay, L.E. (1997) Global folds of highly deuterated, methyl-protonated proteins by multidimensional NMR. Biochemistry, 36, 1389–1401.
72. Muhandiram, D.R., Yamazaki, T., Sykes, B.D. and Kay, L.E. (1995) Measurement of 2H T1 and T1ρ relaxation times in uniformly 13C-labeled and fractionally 2H-labeled proteins in solution. J. Am. Chem. Soc., 117, 11536–11544.
73. Kay, L.E., Muhandiram, D.R., Farrow, N.A. et al. (1996) Correlation between dynamics and high affinity binding in an SH2 domain interaction. Biochemistry, 35, 361–368.
74. LeMaster, D.M. and Kushlan, D.M. (1996) Dynamical mapping of E. coli thioredoxin via 13C NMR relaxation analysis. J. Am. Chem. Soc., 118, 9255–9264.
75. Tugarinov, V., Kanelis, V. and Kay, L.E. (2006) Isotope labeling strategies for the study of high-molecular-weight proteins by solution NMR spectroscopy. Nat. Protoc., 1, 749–754.
76. Muchmore, D.C., McIntosh, L.P., Russell, C.B. et al. (1989) Expression and nitrogen-15 labeling of proteins for proton and nitrogen-15 nuclear magnetic resonance. Methods Enzymol., 177, 44–73.
77. Waugh, D.S. (1996) Genetic tools for selective labeling of proteins with α-15N-amino acids. J. Biomol. NMR, 8, 184–192.
78. Rajesh, S., Nietlispach, D., Nakayama, H. et al. (2003) A novel method for the biosynthesis of deuterated proteins with selective protonation at the aromatic rings of Phe, Tyr and Trp. J. Biomol. NMR, 27, 81–86.
79. Shortle, D. (1994) Assignment of amino acid type in 1H-15N correlation spectra by labeling with 14N-amino acids. J. Magn. Reson. B, 105, 88–90.
80. Bertini, I., Duma, L., Felli, I.C. et al. (2004) A heteronuclear direct-detection NMR spectroscopy experiment for protein-backbone assignment. Angew. Chem., Int. Ed., 43, 2257–2259.
81. Bermel, W., Bertini, I., Felli, I.C. et al. (2006) 13C-detected protonless NMR spectroscopy of proteins in solution. Prog. Nucl. Magn. Reson. Spectrosc., 48, 25–45.
82. Babini, E., Bertini, I., Capozzi, F. et al. (2004) Direct carbon detection in paramagnetic metalloproteins to further exploit pseudocontact shift restraints. J. Am. Chem. Soc., 126, 10496–10497.
83. Kainosho, M. and Tsuji, T. (1982) Assignment of the three methionyl carbonyl carbon resonances in Streptomyces subtilisin inhibitor by a carbon-13 and nitrogen-15 double-labeling technique. A new strategy for structural studies of proteins in solution. Biochemistry, 21, 6273–6279.
84. Kainosho, M., Nagao, H. and Tsuji, T. (1987) Local structural features around the C-terminal segment of Streptomyces subtilisin inhibitor studied by carbonyl carbon nuclear magnetic resonances of three phenylalanyl residues. Biochemistry, 26, 1068–1075.
85. Uchida, K., Markley, J.L. and Kainosho, M. (2005) Carbon-13 NMR method for the detection of correlated hydrogen exchange at adjacent backbone peptide amides and its application to hydrogen exchange in five antiparallel beta strands within the hydrophobic core of Streptomyces subtilisin inhibitor (SSI). Biochemistry, 44, 11811–11820.
86. Yamazaki, T., Yoshida, M., Kanaya, S. et al. (1991) Assignments of backbone 1H, 13C, and 15N resonances and secondary structure of ribonuclease H from Escherichia coli by heteronuclear three-dimensional NMR spectroscopy. Biochemistry, 30, 6036–6047.
87. Parker, M.J., Aulton-Jones, M., Hounslow, A.M. and Craven, C.J. (2004) A combinatorial selective labeling method for the assignment of backbone amide NMR resonances. J. Am. Chem. Soc., 126, 5020–5021.
88. Wu, P.S., Ozawa, K., Jergic, S. et al. (2006) Amino-acid type identification in 15N-HSQC spectra by combinatorial selective 15N-labelling. J. Biomol. NMR, 34, 13–21.
89. Craven, C.J., Al-Owais, M. and Parker, M.J. (2007) A systematic analysis of backbone amide assignments achieved via combinatorial selective labelling of amino acids. J. Biomol. NMR, 38, 151–159.
90. Kelly, M.J., Krieger, C., Ball, L.J. et al. (1999) Application of amino acid type-specific 1H- and 14N-labeling in a 2H-, 15N-labeled background to a 47 kDa homodimer: Potential for NMR structure determination of large proteins. J. Biomol. NMR, 14, 79–83.
91. Metzler, W.J., Wittekind, M., Goldfarb, V. et al. (1996) Incorporation of 1H/13C/15N-{Ile, Leu, Val} into a perdeuterated, 15N-labeled protein: Potential in structure determination of large proteins by NMR. J. Am. Chem. Soc., 118, 6800–6801.
92. Arrowsmith, C.H., Pachter, R., Altman, R.B. et al. (1990) Sequence-specific proton NMR assignments and secondary structure in solution of Escherichia coli trp repressor. Biochemistry, 29, 6332–6341.
93. Zhang, H., Zhao, D., Revington, M. et al. (1994) The Solution Structures of the trp Repressor-Operator DNA Complex. J. Mol. Biol., 238, 592–614.
94. Yamazaki, T., Lee, W., Revington, M. et al. (1994) An HNCA pulse scheme for the backbone assignment of 15N,13C,2H-labeled proteins: application to a 37-kDa Trp repressor-DNA Complex. J. Am. Chem. Soc., 116, 6464–6465.
95. Yamazaki, T., Lee, W., Arrowsmith, C.H. et al. (1994) A suite of triple resonance NMR experiments for the backbone assignment of 15N, 13C, 2H labeled proteins with high sensitivity. J. Am. Chem. Soc., 116, 11655–11666.
96. Shan, X., Gardner, K.H., Muhandiram, D.R. et al. (1996) Assignment of 15N, 13Cα, 13Cβ and HN resonances in an 15N,13C,2H labeled 64 kDa Trp repressor-operator complex using triple-resonance NMR spectroscopy and 2H-decoupling. J. Am. Chem. Soc., 118, 6570–6579.
97. Löhr, F., Katsemi, V., Hartleib, J. et al. (2003) A strategy to obtain backbone resonance assignments of deuterated proteins in the presence of incomplete amide 2H/1H back-exchange. J. Biomol. NMR, 25, 291–311.
98. Yamazaki, T., Tochio, H., Furui, J. et al. (1997) Assignment of backbone resonances for larger proteins using the 13C-1H coherence of a 1Hα-, 2H-, 13C-, and 15N-labeled sample. J. Am. Chem. Soc., 119, 872–880.
99. Coughlin, P.E., Anderson, F.E., Oliver, E.J. et al. (1999) Improved resolution and sensitivity of triple-resonance NMR methods for the structural analysis of proteins by use of a backbone-labeling strategy. J. Am. Chem. Soc., 121, 11871–11874.
100. Vuister, G.W., Kim, S.J., Wu, C. and Bax, A. (1994) 2D and 3D NMR study of phenylalanine residues in proteins by reverse isotopic labeling. J. Am. Chem. Soc., 116, 9206–9210.
101. Kuboniwa, H., Tjandra, N., Grzesiek, S. et al. (1995) Solution structure of calcium-free calmodulin. Nat. Struct. Biol., 2, 768–776.
102. Aghazadeh, B., Zhu, K., Kubiseski, T.J. et al. (1998) Structure and mutagenesis of the Dbl homology domain. Nat. Struct. Biol., 5, 1098–1107.
103. Medek, A., Olejniczak, E.T., Meadows, R.P. and Fesik, S.W. (2000) An approach for high-throughput structure determination of proteins by NMR spectroscopy. J. Biomol. NMR, 18, 229–238.
104. Tugarinov, V. and Kay, L.E. (2004) An isotope labeling strategy for methyl TROSY spectroscopy. J. Biomol. NMR, 28, 165–172.
105. Tugarinov, V., Choy, W.Y., Orekhov, V.Y. and Kay, L.E. (2005) Solution NMR-derived global fold of a monomeric 82-kDa enzyme. Proc. Natl. Acad. Sci. USA, 102, 622–627.
106. Rosen, M.K., Gardner, K.H., Willis, R.C. et al. (1996) Selective methyl group protonation of perdeuterated proteins. J. Mol. Biol., 263, 627–636.
107. Gardner, K.H. and Kay, L.E. (1997) Production and incorporation of 15N, 13C, 2H (1H-δ1 methyl) isoleucine into proteins for multidimensional NMR studies. J. Am. Chem. Soc., 119, 7599–7600.
108. Ollerenshaw, J.E., Tugarinov, V., Skrynnikov, N.R. and Kay, L.E. (2005) Comparison of 13CH3, 13CH2D, and 13CHD2 methyl labeling strategies in proteins. J. Biomol. NMR, 33, 25–41.
109. Goto, N.K., Gardner, K.H., Mueller, G.A. et al. (1999) A robust and cost-effective method for the production of Val, Leu, Ile (δ1) methyl-protonated 15N-, 13C-, 2H-labeled proteins. J. Biomol. NMR, 13, 369–374.
110. Isaacson, R.L., Simpson, P.J., Liu, M. et al. (2007) A new labeling method for methyl transverse relaxation-optimized spectroscopy NMR spectra of alanine residues. J. Am. Chem. Soc., 129, 15428–15429.
111. Sprangers, R. and Kay, L.E. (2007) Quantitative dynamics and binding studies of the 20S proteasome by NMR. Nature, 445, 618–622.
112. Ishima, R., Louis, J.M. and Torchia, D.A. (2001) Optimized labeling of 13CHD2 methyl isotopomers in perdeuterated proteins: Potential advantages for 13C relaxation studies of methyl dynamics of larger proteins. J. Biomol. NMR, 21, 167–171.
113. Gardner, K.H., Zhang, X.C., Gehring, K. and Kay, L.E. (1998) Solution NMR studies of a 42 kDa Escherichia coli maltose binding protein/β-cyclodextrin complex: Chemical shift assignments and analysis. J. Am. Chem. Soc., 120, 11738–11748.
114. Tugarinov, V. and Kay, L.E. (2003) Ile, Leu, and Val methyl assignments of the 723-Residue Malate Synthase G using a new labeling strategy and novel NMR methods. J. Am. Chem. Soc., 125, 13868–13878.
115. Wüthrich, K. (1986) NMR of Proteins and Nucleic Acids, John Wiley & Sons, Inc., New York.
116. Senn, H., Werner, B., Messerle, B.A. et al. (1989) Stereospecific assignment of the methyl 1H NMR lines of valine and leucine in polypeptides by nonrandom 13C labelling. FEBS Lett., 249, 113–118.
117. Neri, D., Szyperski, T., Otting, G. et al. (1989) Stereospecific nuclear magnetic resonance assignments of the methyl groups of valine and leucine in the DNA-binding domain of the 434 repressor by biosynthetically directed fractional carbon-13 labeling. Biochemistry, 28, 7510–7516.
118. Atreya, H.S. and Chary, K.V. (2001) Selective ‘unlabeling’ of amino acids in fractionally 13C labeled proteins: An approach for stereospecific NMR assignments of CH3 groups in Val and Leu residues. J. Biomol. NMR, 19, 267–272.
119. Tate, S., Ushioda, T., Utsunomiya, N. et al. (1995) Solution structure of a human cystatin A variant, cystatin A2-98 M65L by NMR spectroscopy. A possible role of the interactions between the N- and C-termini to maintain the inhibitory active form of cystatin A. Biochemistry, 34, 14637–14648.
120. Ohki, S., Eto, M., Kariya, E. et al. (2001) Solution NMR structure of the myosin phosphatase inhibitor protein CPI-17 shows phosphorylation-induced conformational changes responsible for activation. J. Mol. Biol., 314, 839–849.
121. Ostler, G., Soteriou, A., Moody, C.M. et al. (1993) Stereospecific assignments of the leucine methyl resonances in the 1H NMR spectrum of Lactobacillus casei dihydrofolate reductase. FEBS Lett., 318, 177–180.
122. Kainosho, M., Ajisaka, K., Kamisaku, M. et al. (1975) Conformational analysis of amino acids and peptides using specific isotope substitution. I. Conformation of L-phenylalanylglycine. Biochem. Biophys. Res. Commun., 64, 425–432.
123. Kainosho, M. and Ajisaka, K. (1975) Conformational analysis of amino acids and peptides using specific isotope substitution. II. Conformation of serine, tyrosine, phenylalanine, aspartic acid, asparagine, aspartic acid β-methyl ester in various ionization states. J. Am. Chem. Soc., 97, 5630–5631.
124. Kushlan, D.M. and LeMaster, D.M. (1993) Resolution and sensitivity enhancement of heteronuclear correlation for methylene resonances via 2H enrichment and decoupling. J. Biomol. NMR, 3, 701–708.
125. Vitali, F., Henning, A., Oberstrass, F.C. et al. (2006) Structure of the two most C-terminal RNA recognition motifs of PTB using segmental isotope labeling. EMBO J., 25, 150–162.
126. Skrisovska, L. and Allain, F.H. (2008) Improved segmental isotope labeling methods for the NMR study of multidomain or large proteins: Application to the RRMs of Npl3p and hnRNP L. J. Mol. Biol., 375, 151–164.
127. Yamazaki, T., Otomo, T., Oda, N. et al. (1998) Segmental isotope labeling for protein NMR using peptide splicing. J. Am. Chem. Soc., 120, 5591–5592.
128. Yagi, H., Tsujimoto, T., Yamazaki, T. et al. (2004) Conformational change of H+-ATPase β monomer revealed on segmental isotope labeling NMR spectroscopy. J. Am. Chem. Soc., 126, 16632–16638.
129. Otomo, T., Ito, N., Kyogoku, Y. and Yamazaki, T. (1999) NMR observation of selected segments in a larger protein: central-segmental isotope labeling through intein-mediated ligation. Biochemistry, 38, 16040–16044.
130. Otomo, T., Teruya, K., Uegaki, K. et al. (1999) Improved segmental isotope labeling of proteins and application to a larger protein. J. Biomol. NMR, 14, 105–114.
131. Züger, S. and Iwai, H. (2005) Intein-based biosynthetic incorporation of unlabeled protein tags into isotopically labeled proteins for NMR studies. Nat. Biotechnol., 23, 736–740.
132. Dawson, P.E., Muir, T.W., Clark-Lewis, I. and Kent, S.B. (1994) Synthesis of proteins by native chemical ligation. Science, 266, 776–779.
133. Dawson, P.E. and Kent, S.B. (2000) Synthesis of native proteins by chemical ligation. Annu. Rev. Biochem., 69, 923–960.
134. Muir, T.W. (2003) Semisynthesis of proteins by expressed protein ligation. Annu. Rev. Biochem., 72, 249–289.
135. David, R., Richter, M.P. and Beck-Sickinger, A.G. (2004) Expressed protein ligation. Method and applications. Eur. J. Biochem., 271, 663–677.
136. Xu, R., Ayers, B., Cowburn, D. and Muir, T.W. (1999) Chemical ligation of folded recombinant proteins: Segmental isotopic labeling of domains for NMR studies. Proc. Natl. Acad. Sci. USA, 96, 388–393.
137. Camarero, J.A., Shekhman, A., Campbell, E. et al. (2002) Autoregulation of a bacterial σ factor explored by using segmental isotopic labeling and NMR. Proc. Natl. Acad. Sci. USA, 99, 8536–8541.
138. Cotton, G.J., Ayers, B., Xu, R. and Muir, T.W. (1999) Insertion of a synthetic peptide into a recombinant protein framework: A protein biosensor. J. Am. Chem. Soc., 121, 1100–1101.
139. Chong, S., Mersha, F.B., Comb, D.G. et al. (1997) Single-column purification of free recombinant proteins using a self-cleavable affinity tag derived from a protein splicing element. Gene, 192, 271–281.
140. Takeda, M., Sugimori, N., Torizawa, T. et al. (2008) Structure of the putative 32 kDa myrosinase binding protein from Arabidopsis (At3g16450.1) determined by SAIL-NMR. FEBS J., 275, 5873–5884.
141. Takeda, M., Chang, C.K., Ikeya, T. et al. (2008) Solution structure of the C-terminal dimerization domain of SARS coronavirus nucleocapsid protein solved by the SAIL-NMR method. J. Mol. Biol., 380, 608–622.
142. Torizawa, T., Ono, A.M., Terauchi, T. and Kainosho, M. (2005) NMR assignment methods for the aromatic ring resonances of Phenylalanine and Tyrosine residues in proteins. J. Am. Chem. Soc., 127, 12620–12626.
143. Terauchi, T., Kobayashi, K., Okuma, K. et al. (2008) Stereoselective synthesis of triply isotope-labeled Ser, Cys, and Ala: amino acids for stereoarray isotope labeling technology. Org. Lett., 10, 2785–2787.
144. Okuma, K., Ono, A.M., Tsuchiya, S. et al. (2009) Asymmetric synthesis of (2S,3R)- and (2S,3S)-[2-13C;3-2H] glutamic acid. Tetrahedron Lett., 50, 1482–1484.
145. Ikeya, T., Terauchi, T., Güntert, P. and Kainosho, M. (2006) Evaluation of stereo-array isotope labeling (SAIL) patterns for automated structural analysis of proteins with CYANA. Magn. Reson. Chem., 44, S152–S157.
146. Takeda, M., Ikeya, T., Güntert, P. and Kainosho, M. (2007) Automated structure determination of proteins with the SAIL-FLYA NMR method. Nat. Protoc., 2, 2896–2902.
3 Resonance Assignments
Lu-Yun Lian and Igor L. Barsukov
3.1 Introduction
The assignment of the resonances in the spectrum of a protein to individual amino acids in the sequence is an essential first step towards detailed studies of the protein by NMR. For determination of the high-resolution structure of the protein, complete proton assignments are required. Less complete assignments may be sufficient for more specific studies, especially when the structure of the protein is known; examples include identifying regions of interaction with other molecules, characterising the dynamics of the protein and determining pKa values. The first protein resonance assignments were carried out on unlabelled proteins, using a strategy based on first making through-bond (scalar coupling) connections between protons within an amino acid, followed by sequential through-space (NOE-based) connections between protons in neighbouring amino acids [1]. By the early 1990s several significant advances had occurred which changed the way protein NMR spectra were assigned and revolutionised the use of NMR for biomolecular studies. The advances included the ease of producing quantities of recombinant proteins labelled with 13C and 15N, the advent of three-dimensional NMR, major improvements in spectrometer hardware and the development of stable, progressively higher-field magnets. It became possible to assign the NMR spectrum of a protein by simply making covalent connections between the protons, nitrogen and carbon atoms, thereby linking almost all the atoms of the entire polypeptide chain without
Protein NMR Spectroscopy: Practical Techniques and Applications, First Edition. Edited by Lu-Yun Lian and Gordon Roberts. 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd.
recourse to through-space NOE connectivities [2]. These advances opened up the applicability of NMR to high-resolution studies of larger proteins; the increased dimensionality of the NMR data to a large extent resolved the resonance overlap problem, and using connectivities based on covalent interactions to connect two adjacent residues significantly alleviated the problem of rapid transverse relaxation of protons which is characteristic of large proteins. Further progress was made in the late 1990s when TROSY experiments were introduced [3]; again these experimental techniques were accompanied by technological developments, with the availability of NMR spectrometers at field strengths of 18.4 T and above, and the possibility and affordability of making highly deuterated 13C, 15N-labelled proteins. In addition, site-specific selective labelling became more readily available, facilitating side-chain assignment even for very large proteins.
In this chapter we describe first the NOE-based resonance assignment strategy, which is appropriate both for unlabelled proteins (Section 3.2) and when only 15N-labelled proteins are available (Section 3.3). We then describe the most common approaches based on triple resonance HSQC and TROSY experiments, the latter being required for proteins over 30 kDa. Section 3.4 describes the resonance assignment of the polypeptide backbone and Section 3.5 that of the side-chains. The triple resonance approach is very efficient and almost all the automatic resonance assignment programmes make use of data from these experiments. It is useful to stress that in both the NOE-based method and the main-chain directed method using the triple resonance 3D data, the essential first step is the identification of spin systems from individual amino-acid residues and the types of amino acid from which they arise. For some amino acids the chemical shifts of the protons and carbons follow unique patterns.
It is advisable, when undertaking resonance assignments, to have available a list of proton and carbon chemical shifts of all twenty amino acids; these are readily available from standard textbooks or databases. In the second step, the NOE-based method relies on two adjacent residues being close in space, whereas the main-chain directed method relies on the presence of a covalent linkage (through-bond scalar coupling) between 15N and 13C atoms in adjacent amino-acid residues in the polypeptide chain.
3.2 Resonance Assignment of Unlabelled Proteins
It is possible to assign the spectrum of a protein of molecular mass up to about 8 kDa using proton-only experiments, without recourse to stable isotope labelling. Although most proteins studied are recombinant proteins, expressed predominantly in E. coli, there are sometimes situations which necessitate the use of unlabelled materials. Examples of these include purified native proteins such as toxins, peptides containing unusual amino acids such as the lantibiotics, and synthetic peptides. In situations when only 15N-labelled proteins can be made or when 13C-labelled proteins are prohibitively expensive to produce, 3D 15N-edited experiments are used; in this case, the strategy for the assignment procedure is identical to that described for unlabelled proteins, with the extra 15N dimension being particularly advantageous for resolving resonance overlap, especially in proteins with very high helical contents. The assignment of the second IgG-binding domain of Streptococcus Protein G (henceforth referred to as GB1) is used here to illustrate the NOE-based sequential resonance assignment strategy [4]. For spin-system identification, double-quantum filtered COSY
(DQF-COSY) and TOCSY data were collected; for sequential assignment, NOESY experiments were acquired.
3.2.1 Spin System Assignments
As indicated above, the overall strategy is, firstly, to identify all the spin systems and the type of amino acid from which each originates, and, secondly, to link the proton resonances from each amino acid in such a way as to match the protein or peptide sequence. Spin systems are delineated using through-bond coupling connections whereas sequential links are achieved by making through-space dipolar coupling connections. The protons within an individual amino acid constitute a ‘spin system’. Hence, a spin system is defined as a recognisable pattern of chemical shifts belonging to a particular amino acid. The most convenient way to obtain information on complete spin systems is to use a TOCSY spectrum, but first the DQF-COSY spectrum is examined to identify the direct three-bond scalar connectivities. The first region of the spectrum to examine is the CαH-NH region (Figure 3.1); this is called the fingerprint region. The success of the assignment process relies upon obtaining the maximal number of cross-peaks in this region. GB1 has 55 amino acids; hence, the expected number of cross-peaks is 54 (55 minus N-terminal Thr; there are no prolines in GB1). All the expected CαH-NH cross-peaks are observed. Once all these connectivities are identified, TOCSY spectra at 60 and 120 ms mixing times are used to delineate almost all the spin systems by developing the spin system from the NH resonance. The TOCSY pattern of the complete spin system for each representative type of amino acid from GB1 is shown in Figure 3.2; there are no arginine, methionine, histidine, serine, cysteine or proline residues in GB1. Figure 3.3 shows the aromatic side-chain spin systems for tyrosine, phenylalanine and tryptophan for data acquired in H2O. Also appearing in this region of the spectrum are the Asn and Gln amide NH2 resonances, which are present in the TOCSY but not the COSY spectrum. Generally, alanine, threonine and glycine have unique, easily identifiable spin systems. Valine, isoleucine and leucine are in principle distinguishable from each other. However, owing to their long side-chains and the decreased efficiency of magnetisation transfer through successive couplings, the entire spin system may not be easily delineated from a single TOCSY spectrum. The amino acids Asp, Asn, Phe, Trp, Tyr, Cys and Ser show a simpler AMX spin system as far as the NH, CαH and CβH protons are concerned. Of the remaining long side-chains, Lys is distinguishable from the others by the presence of the CεH methylene resonance at around 3.02 ppm. For larger proteins, it is sometimes not possible to delineate the complete spin systems for the long side-chain amino acids by developing only from the amide proton. This is often due either to unfavourable intervening scalar coupling constants or to substantial attenuation of the CαH-NH cross-peaks by amide exchange. For these, the amide to side-chain connectivities are identified as far along the chain as possible and the identification of the spin system is completed by using the aliphatic part of the spectrum. For instance, the methyl regions of the TOCSY and DQF-COSY spectra are particularly useful for completing the spin systems of the Leu and Ile residues by obtaining relayed connectivities from the methyl groups to the methylene resonances and ‘meeting up’ with the partial spin system developed from the amide resonances.
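The counting rule for the fingerprint region described above can be written down as a short sketch. The function name and the example peptide are hypothetical and serve only as an illustration; the rule itself (every residue except the N-terminal one and the prolines contributes a CαH-NH cross-peak) is the one stated in the text.

```python
def expected_fingerprint_peaks(sequence: str) -> int:
    """Number of residues expected to give a CαH-NH fingerprint cross-peak.

    The N-terminal residue and all prolines are excluded, as described in
    the text. Each remaining residue is counted once.
    """
    if not sequence:
        return 0
    # Skip the N-terminal residue, then count every non-proline residue.
    return sum(1 for residue in sequence[1:] if residue != "P")


# Hypothetical 10-residue peptide with one internal proline (not GB1).
print(expected_fingerprint_peaks("TAPKLVGDEF"))  # 8
```

Comparing this number with the number of cross-peaks actually picked in the DQF-COSY spectrum gives a quick check on whether peaks are missing or overlapped before the spin systems are developed.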
Figure 3.1 NH amide/aromatic (F2, x-axis) - aliphatic (F1, y-axis) region of the [1H,1H] COSY (left) and [1H,1H] TOCSY (right) spectra of the second IgG-binding domain of Protein G (GB1) at 1.5 mM in 90 % H2O/10 % D2O at 25 °C, pH 4.2. These spectra form the starting point for the sequential assignment of the proton spectrum of GB1. The expected 53 cross-peaks from all the amino acids (except the proline and N-terminus amino acids) are observed in the COSY spectrum. These cross-peaks represent the fingerprint region and from these, and in combination with the TOCSY spectrum, spin systems will be developed to obtain the resonance positions for as many resonances for each amino acid as possible. Another GB1 sample in 99.96 % 2H2O was also prepared; this sample was required mainly to simplify the spectrum of the aromatic region, where NH2 resonances from the asparagine and glutamine side-chains complicated the assignment of the aromatic side-chain resonances. Due to potential resonance overlaps, and in order to shift the water resonance to allow resonances near the water peak to be observed, datasets were also collected at two pH values, 4.2 and 3.1, and for each pH value at two different temperatures, 25 °C and 37 °C. For all the experiments, water suppression was achieved using gradient pulses.
Although identification of the CβH2 resonances of Asn and the CγH2 resonances of Gln is possible from the TOCSY spectra alone, it is also possible to make use of the additional NOE connectivities between the amide side-chain signals and their respective aliphatic side-chain resonances to simplify or confirm these spin systems. This method will distinguish Asn from Asp and Gln from Glu side-chain resonances. Similarly, once the aromatic side-chain resonances are assigned from the distinct spin system for each of the four aromatic rings – Trp, Tyr, Phe and His – in the COSY and TOCSY spectra, these assignments can assist with the identification of the CβH2 resonances from the aromatic residues using NOE connectivities with their respective side-chain aromatic protons, as shown in Figure 3.4.
Figure 3.2 [1H,1H] TOCSY spectrum showing connectivities which make up the spin system for a particular amino acid. The mixing times for the TOCSY experiments were 60 and 120 ms. TOCSY spectra at the shorter mixing time showed shorter range scalar connectivities; at the longer mixing time, it is often possible to observe all the scalar-coupled cross-peaks which will define the complete spin system of an amino acid. It is important to ensure that a sufficient number of data points is collected in the indirect dimension in order to obtain good resolution for the cross-peaks. Missing peaks are generally due to chemical exchange broadening. Shown are spin systems from each of the different amino acids found in GB1. The pattern of cross-peaks, that is the chemical shifts of the protons, can be unique for some of the amino acids; such recognisable patterns are very useful for assigning peaks to amino acid types prior to full sequential assignment.
3.2.2 Sequence-Specific Assignments
The sequence-specific resonance assignment is achieved through the inter-residue through-space connectivities obtained from the NOESY spectrum. The most convenient starting points for GB1 are the unique Trp, Ile and Gln residues. The complete spin systems of each of these residues can be identified from a combination of the TOCSY and NOESY experiments. The most useful NOE effects for sequential assignment involve the CαH, CβH of residue i and the NH of adjacent residue i + 1, dαN(i,i+1) and dβN(i,i+1), and the NH of residues i and i + 1, dNN(i,i+1). These inter-residue sequential NOE connectivities connect the CαH, CβH and NH of residue i and the NH of residue i + 1. Figure 3.5 shows examples of dαN(i,i+1) and dNN(i,i+1) connectivities that allowed the complete sequential assignment of GB1.
Figure 3.3 The aromatic region of GB1 showing three-bond covalent coupling (COSY) and relayed covalent coupling (TOCSY, 120 ms mixing) patterns for a sample dissolved in 90 % H2O/10 % D2O. Also shown are the amide NH2 side-chain TOCSY cross-peaks. Assignment to the different aromatic amino acid types is based on the unique cross-peak pattern for each residue type.
The GB1 domain used in these experiments does not contain a proline residue. However, if a proline is present, its sequence-specific assignment will be based on the NOE connectivities between Pro CδH and CαH(i-1).
3.2.3 Possible Difficulties
The most common difficulties encountered in the approach described above are overlapping and missing peaks. Missing peaks could result from line-broadening due to chemical exchange (see Chapter 7), while resonance overlap is particularly likely to be a problem in proteins with a very high helical content. In the latter case most of the CαH resonances are between 3.5 and 4.5 ppm and the NH resonances between 7 and 8.5 ppm. Steps that can be taken to alleviate these problems include recording spectra at several temperatures and pH values (see Chapter 1).
3.3 15N-Edited Experiments
In the unusual situation when it is only possible to make 15N-labelled proteins rather than 15N, 13C-doubly labelled proteins (for example where poor expression makes 13C labelling prohibitively expensive), the approach for resonance assignments is identical to the one described for the unlabelled protein. The only difference is that the spin system can be developed from the 15N-1HN two-dimensional cross-peak, thereby significantly improving the resolution and overcoming resonance overlap caused by degenerate amide proton resonances. Even in the unfortunate cases when both the 15N and 1H resonances are degenerate for a set of residues, the extra 15N dimension would still reduce the possible
Figure 3.4 [1H,1H] NOESY spectrum of GB1 showing the connectivities between the side-chains of aromatic residues, Gln and Asn which are useful for assigning AMX spin systems. For the aromatic residues, the NOESY data are best acquired for a sample dissolved in 2H2O in order to simplify the spectrum and remove many of the connectivities involving labile amide protons. In addition, the 2H2O data will distinguish the aromatic connectivities from those of Gln and Asn, since the NH2 groups from Gln/Asn will be absent. For sequential assignment, NOESY experiments with mixing times of 100 ms and 150 ms were acquired.
candidate residues at these positions to a very small number; often changing the temperature and/or pH would cause small chemical shift changes sufficient to resolve these signal overlaps. For the aromatic side-chains and proline residues, it is still necessary to analyse the COSY, TOCSY and NOESY spectra of the unlabelled protein. Figure 3.6 illustrates slices through the 15N dimension of the 15N-edited TOCSY spectrum and Figures 3.7 and 3.8 show the use of 15N-edited NOESY for the sequential resonance assignments, using strips for the α-helical region A23-N37 and the β-strand region K4-L12. Figures 3.7 and 3.8, respectively, show clearly the characteristic strong dNN(i,i+1) and dαN(i,i+1) connectivities for the α-helical and β-sheet regions.
Figure 3.5 Examples of dαN(i,i+1) (upper) and dNN(i,i+1) (lower) connectivities of the 150 ms NOESY spectrum of GB1. The HN(i+1)-Hα(i) sequential connectivities for some of the residues are indicated; peaks are labelled using the format shown (e.g. I7/V6 = I7NH/V6Hα). For HN(i+1)-NH(i) connectivities, the positions of the NH peaks are indicated on the diagonal, and the sequential HN(i+1)-NH(i) connectivities are indicated by the off-diagonal cross-peaks.
3.4 Triple Resonance
3.4.1 3D Triple Resonance
If uniform 13C, 15N (and when necessary 2H) isotope labelling is feasible, powerful triple-resonance experiments are used to assign backbone resonances. The principles of the experiments are extensively described in NMR textbooks [5,6] and reviews [7,8] and are not discussed here (see Chapter 4). Most of the variations of the triple resonance experiments are incorporated into the standard pulse sequence libraries. The main spectra used for assignments correlate the 1H and 15N resonances of each NH group with one or more carbon or proton resonances of the same or the preceding residue, as summarised in Figure 3.9. The 15N-1HN correlations are common to all the experiments and are detected independently in the 1H,15N-HSQC spectra. The triple-resonance correlations can be divided into two categories depending on the coherence transfer pathways. The first category is based on the N-C′ transfer and is restricted to the connection between the NH group of residue i and
Figure 3.6 Example of the TOCSY connectivities observed in the 3D [15N] HSQC-TOCSY spectrum of GB1. The residue assigned to the spin system is indicated at the top of each slice. The number at the bottom left-hand corner of each slice is the nitrogen chemical shift of the amide resonance and the position on the x-axis is the corresponding proton shift. Extra peaks in each slice are due to overlapping peaks in the amide nitrogen and proton dimensions. The arrows for T18 and T16 indicate that the Hγ resonances are present although not plotted, whereas for T17 the Hγ resonance is not observed due to line-broadening.
the carbon atoms of the preceding residue i-1. This generates correlations $H_i N_i C'_{i-1}$ in HNCO, $H_i N_i C^{\alpha}_{i-1}$ in HN(CO)CA and $H_i N_i C^{\alpha}_{i-1}/C^{\beta}_{i-1}$ in CBCA(CO)NH experiments. The second type involves N-Cα transfer that is possible between $N_i$ and $C^{\alpha}_i$ of the same residue, as well as between $N_i$ and $C^{\alpha}_{i-1}$ of the preceding residue, leading to the correlations $H_i N_i C'_i/C'_{i-1}$ in HN(CA)CO, $H_i N_i C^{\alpha}_i/C^{\alpha}_{i-1}$ in HNCA and $H_i N_i C^{\alpha}_i/C^{\beta}_i/C^{\alpha}_{i-1}/C^{\beta}_{i-1}$ in HNCACB experiments. Cross-peaks corresponding to the sequential correlations are usually substantially weaker or absent because of a smaller value of the two-bond N-Cα coupling. The cross-peaks corresponding to Cα and Cβ atoms are of opposite sign in HNCACB experiments and this helps to distinguish the Cα and Cβ resonances in Ser and Thr residues; however, it may lead to the cancellation of cross-peaks if these chemical shifts are very similar. The assignment procedure can be summarised as matching intra-residue 13C chemical shifts identified from one NH group with the sequential chemical shifts identified from a different NH group. Normally the procedure is subdivided into two relatively distinct stages – (i) identification of the correlations for each HN group and grouping them into spin systems, similar to the concept described for the assignment of unlabelled proteins; and (ii) assembling this information in a sequential order that follows residue connections by through-bond coupling, rather than through-space coupling, in the polypeptide chain. The main steps of the assignment protocol are listed in Figure 3.10 and described in detail in the following text.
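The two categories of correlation described above can be summarised as a small lookup table. The following is a minimal sketch; the dictionary layout and the function names are illustrative assumptions, while the correlations themselves are those listed in the text (offset 0 denotes the residue i carrying the NH group, offset -1 the preceding residue).

```python
# (atom, offset) pairs observed for the NH group of residue i in each
# experiment: offset 0 = same residue, offset -1 = preceding residue.
CORRELATIONS = {
    "HNCO":       [("CO", -1)],
    "HN(CO)CA":   [("CA", -1)],
    "CBCA(CO)NH": [("CA", -1), ("CB", -1)],
    "HN(CA)CO":   [("CO", 0), ("CO", -1)],
    "HNCA":       [("CA", 0), ("CA", -1)],
    "HNCACB":     [("CA", 0), ("CB", 0), ("CA", -1), ("CB", -1)],
}


def max_peaks(experiment: str) -> int:
    """Maximum number of cross-peaks per NH group in one experiment."""
    return len(CORRELATIONS[experiment])


def complementary_pairs():
    """Pair each sequential-only experiment with the experiment that adds
    intra-residue peaks for exactly the same atom types."""
    pairs = []
    for seq_exp, seq_atoms in CORRELATIONS.items():
        if any(offset == 0 for _, offset in seq_atoms):
            continue  # not a sequential-only experiment
        wanted = {atom for atom, _ in seq_atoms}
        for full_exp, full_atoms in CORRELATIONS.items():
            intra = {atom for atom, offset in full_atoms if offset == 0}
            if intra == wanted:
                pairs.append((seq_exp, full_exp))
    return pairs


print(max_peaks("HNCACB"))    # 4
print(complementary_pairs())
```

The pairs returned are HNCO with HN(CA)CO, HN(CO)CA with HNCA, and CBCA(CO)NH with HNCACB. The counts are upper limits: glycine has no Cβ, and weak sequential peaks may be absent.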
Figure 3.7 Example of the dNN(i,i+1) connectivities for residues A23 to N37 observed in the 3D [15N] HSQC-NOESY spectrum of GB1. This region forms an α-helix. The assigned residue is indicated at the top of each slice. The numbers at the bottom are the nitrogen and proton chemical shifts of the amide resonance. Starting from the 1H,1H diagonal of either unique or easily identifiable spin systems (and hence residue types), the NOE peaks to the NH of the adjacent residue are identified as shown by the dashed arrows. By arranging the slices from adjacent residues next to each other, it is possible to ‘walk’ along the polypeptide backbone, enabling sequence-specific resonance assignments to be made. For each slice, both dNN(i,i+1) and dNN(i,i-1) connectivities are observed. The dNN(i,i-1) connectivities are shown as solid arrows.
3.4.1.1 Identification of Spin Systems
Spin-system grouping is usually achieved by starting from a ‘root’ experiment and locating cross-peaks in other triple-resonance experiments that have the same 1HN and 15N chemical shifts, as schematically illustrated in Figure 3.11. The best spectra to use as a ‘root’ are those of highest sensitivity and resolution, preferably containing a single cross-peak per residue. For small to medium size proteins the 1H,15N-HSQC is the usual root spectrum, as most of the cross-peaks are normally well resolved. In cases of strong overlap, as is common for large or unfolded proteins, the HNCO spectrum provides a good alternative. Cross-peaks in the root spectrum are picked automatically and a separate spin system will be developed from each of the cross-peaks. Obvious noise and artefact cross-peaks need to be removed prior to further steps. Cross-peaks in other triple-resonance spectra are then picked automatically or manually at positions close to those of the root peaks. When using the automatic peak-picking procedure, the range for the peak search is set equal to the line-width at half height in the 1H and 15N dimensions, while for manual peak-picking orthogonal slices are best displayed at the 1H and 15N coordinates of the root peaks for interactive peak selection.
Figure 3.8 Example of the dαN(i,i+1) connectivities for residues K4 to L12 observed in the 3D HSQC-NOESY spectrum of GB1. This region forms a β-sheet secondary structure. The assigned residue is indicated at the top of each slice. The numbers at the bottom are the nitrogen and proton chemical shifts of the amide resonance. Starting from the NOESY cross-peak between L12 NH (residue i + 1) and T11 Hα (residue i) on the right-hand panel, the horizontal arrows link the αN(i,i+1) cross-peak with the intra-residue αN cross-peak. The vertical arrows link the intra-residue cross-peak to the next inter-residue cross-peak. By arranging the slices from adjacent residues next to each other, it is possible to ‘walk’ along the polypeptide backbone, enabling sequence-specific resonance assignments to be made. There are two αH cross-peaks for G9 but, for simplicity, only one of the two dαN(i,i+1) connectivities is shown.
The choice between manual and automatic peak picking depends primarily on the quality of the spectra. For high sensitivity, artefact-free spectra the automatic method is fast and reliable, while for spectra with low signal-to-noise ratio automated peak-picking may either generate too many noise peaks or miss some real peaks with low intensities. In practice, a combination of the two methods usually works best, with automated peak-picking performed at a high threshold level, followed by manual inspection at a contour level set close to that of the noise. Even if manual peak picking is not used, it is best to check manually the results of peaks picked automatically to ensure that no cross-peaks have been missed due to overlap or low intensity, as having noise-free and reliable peak tables can save substantial time at the later stages. Peaks from different spectra are combined into spin systems by correlating them with the root cross-peaks. For well-resolved root peaks the correlation can be done automatically by grouping together cross-peaks that are within a certain tolerance from the root peak in the 1H and 15N dimensions. The tolerances are normally set to a fraction of the line width (in practice 0.02–0.03 ppm in the 1H and 0.2–0.3 ppm in the 15N dimension) and can be determined by inspecting strips corresponding to several well-resolved root peaks. The results of the peak correlation need to be checked manually by going through all the root
peaks and displaying corresponding slices in all spectra. For well-resolved root peaks a single-slice orientation, normally the 1H,13C view, is sufficient, while the overlapped peaks need to be examined carefully using both orientations of the slices (1H,13C and 15N,13C). Any inappropriately grouped peaks must be corrected manually at this stage and new root peaks and spin systems created if needed. The grouped peaks are also compared with the expected number in each of the spectra. In particular, a spin system should contain: one cross-peak in each of the HNCO and HN(CO)CA spectra; two cross-peaks in each of the HNCA and HN(CA)CO spectra, of which one corresponds to the cross-peak in the corresponding CO-based experiment; two cross-peaks in the CBCA(CO)NH spectrum; and four cross-peaks in the HNCACB spectrum, with opposite phases for the Cα and Cβ peaks. Some of the expected cross-peaks may not be present because of their low intensity, but a number of peaks that is larger than expected would indicate an overlap between different spin systems. The intensities of the cross-peaks need to be checked for large deviations. Very intense, sharp peaks correspond to dynamic unstructured regions of the protein, while small peaks could indicate impurities or multiple states of the protein.
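The automatic grouping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of any particular analysis package: the class and function names and the example peak lists are hypothetical, and the tolerances are simply the upper ends of the ranges quoted in the text. Peaks that fall within tolerance of more than one root are set aside for manual inspection rather than assigned.

```python
from dataclasses import dataclass, field

H_TOL = 0.03  # ppm, 1H tolerance (upper end of the range quoted in the text)
N_TOL = 0.3   # ppm, 15N tolerance (upper end of the range quoted in the text)


@dataclass
class SpinSystem:
    h: float  # 1H shift of the root peak, ppm
    n: float  # 15N shift of the root peak, ppm
    peaks: dict = field(default_factory=dict)  # experiment -> 13C shifts


def group_peaks(roots, peaks_3d, h_tol=H_TOL, n_tol=N_TOL):
    """Attach 3D peaks (experiment, h, n, c) to root peaks (h, n).

    Returns the spin systems, the peaks matching several roots (ambiguous)
    and the peaks matching no root (orphans).
    """
    systems = [SpinSystem(h, n) for h, n in roots]
    ambiguous, orphans = [], []
    for peak in peaks_3d:
        experiment, h, n, c = peak
        hits = [s for s in systems
                if abs(s.h - h) <= h_tol and abs(s.n - n) <= n_tol]
        if len(hits) == 1:
            hits[0].peaks.setdefault(experiment, []).append(c)
        elif hits:
            ambiguous.append(peak)   # overlapped roots: check manually
        else:
            orphans.append(peak)     # possible noise or missing root peak
    return systems, ambiguous, orphans


# Hypothetical peak lists: the first two roots overlap.
roots = [(8.20, 120.5), (8.22, 120.6), (7.50, 115.0)]
peaks_3d = [
    ("HNCA", 7.51, 115.1, 56.1),
    ("HNCA", 7.50, 114.9, 54.3),
    ("HNCA", 8.21, 120.55, 58.0),  # within tolerance of two roots
    ("HNCA", 9.00, 130.0, 60.0),   # no root nearby
]
systems, ambiguous, orphans = group_peaks(roots, peaks_3d)
print(systems[2].peaks)              # {'HNCA': [56.1, 54.3]}
print(len(ambiguous), len(orphans))  # 1 1
```

Comparing the number of shifts collected per experiment with the expected counts listed above then flags incomplete or overlapped spin systems.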
Figure 3.9 Correlations observed in triple-resonance experiments. All experiments record resonances of NH groups correlated to different 13C or 1H intra-residue or sequential resonances, as marked by circles for each experiment type. The maximum number of cross-peaks in each experiment for individual residues corresponds to the number of encircled non-HN atoms. Experiments that provide complementary intra-residue and sequential correlations are arranged horizontally.
Figure 3.10 Main steps of the assignment protocol based on triple resonance experiments
The chemical shifts of 13Cα and 13Cβ strongly depend on the residue type, as summarised in Figure 3.12. In particular, Gly, Leu, Ile, Val, Thr, Ser, Ala and Pro residues tend to have highly characteristic chemical shift values that can be used to identify them once the spin systems are assembled. This is achieved either by manual inspection of the spin systems or automatically by assigning scores based on the agreement between the observed values and
Figure 3.11 Spin system identification in triple-resonance spectra. Cross-peaks in a 1H,15N-HSQC spectrum are picked automatically and used as roots. For each root peak, 1H,13C planes corresponding to the 15N chemical shift are selected in triple-resonance spectra and cross-peaks are picked at the 1H chemical shift position. In this example all four cross-peaks are present in the HNCACB experiment, with the sequential cross-peaks of significantly lower intensity than the intra-residue cross-peaks. Positive cross-peaks are represented by black and negative by grey contours. Note the opposite sign of the Cα and Cβ cross-peaks in the HNCACB spectrum, which helps with correlation type identification. For overlapping HSQC cross-peaks that have close 1H chemical shifts, such as the two peaks highlighted by a box, orthogonal 15N,13C planes in the triple-resonance experiments are used to resolve the correlations.
the value expected for each type of residue. Comparison of the number of spin systems identified for each residue type with the number of residues of this type in the sequence shows the quality of the data and serves to highlight problems at this early stage. If the number of spin systems corresponding to unambiguously identified residue types is larger than expected from the protein sequence, multiple protein forms or impurities may be present. The dominant form can usually be identified from the intensities of the cross-peaks. On the other hand, a smaller than expected number indicates missing cross-peaks or incorrectly assembled spin systems.
3.4.1.2 Sequential Assignment
At the second stage of assignment, spin systems are arranged sequentially by matching intra-residue peaks of one spin system with sequential peaks of another, as illustrated in Figure 3.13a. The sequential matching of the spin systems is normally performed within the spectral analysis software, which generates a list of possible connections for each spin system ranked according to the matching quality. The high-scoring connections are checked
Figure 3.12 Mean values and standard deviations of 13Cα and 13Cβ chemical shifts for amino acids in nonparamagnetic proteins (data from http://www.bmrb.wisc.edu/). Amino acids with distinct chemical shifts are marked in italic bold. This information is used to determine residue types.
graphically by displaying strips drawn through the corresponding spin systems. Particular attention must be paid to the accuracy of the chemical shift match and similarities of the peak shapes. Computer matching is necessarily based on tolerances that are higher than can be detected visually, to account for possible noise and peak distortions. This can generate incorrect matches that are straightforward to identify graphically. Additionally, the sequential spin systems are checked for similarities in the intensities and line widths, as their dynamic properties are usually very similar. Large differences may indicate that the spin systems are not sequential or that multiple forms are present. Although any spin system can be used to start sequential matching, it is often beneficial to use a spin system with distinct 13C chemical shift values, as illustrated in Figure 3.13b. Since each spin system has information on intra-residue and sequential shifts, their values define a dipeptide fragment, which may be unique in the protein sequence. In such a case the spin system can be assigned to a specific position in the protein sequence, such as the last spin system in Figure 3.13b that corresponds to the unique AA fragment. For the selected spin system, sequential connections to the preceding and following spin systems are checked and, if unique, used to generate a sequence of linked spin systems. The process is repeated for the newly joined spin systems until sequential connections become ambiguous. Residue types in the stretch of connected spin systems determined from the 13C chemical shift values are checked against the protein sequence. Stretches of spin systems that match unique positions in the sequence are assigned to the specific residues. The partial sequence-specific assignment reduces the number of unassigned residues, while the identified sequential connections reduce the number of spin systems available for the assignments, decreasing the complexity of the system.
The procedure is repeated with the unassigned residues and unconnected spin systems.
Figure 3.13 Sequence-specific assignment. (a) Finding sequential spin systems. Intra-residue peaks are identified for spin system A in HNCACB (left) and HN(CA)CO (right) spectra shown in grey. 13C shifts of these peaks are matched against sequential peaks in the complementary CBCA(CO)NH (left) and HNCO (right) spectra shown in black for all other spin systems. Four different spin systems B-E with at least one matching shift are shown. Only spin system B has all three matching sequential shifts and is selected as following spin system A in the protein sequence.
The complete assignment procedure is illustrated in Figure 3.14 for a hypothetical four-residue peptide fragment LQTE, using schematic spectra for clarity. (i) Planes are selected in 3D HNCACB and CBCA(CO)NH spectra on the basis of 15N chemical shifts in the root HSQC spectrum, and peaks corresponding to intra-residue and sequential 13Cα and 13Cβ nuclei are identified in the planes. Spin system B has highly characteristic intra-residue chemical shifts of Thr or Ser and can be assigned to the position in the sequence that includes only a single Thr residue. (ii) The spin system preceding B is identified by comparing the CBCA(CO)NH chemical shifts of B with the HNCACB chemical shifts of other spin systems. Only spin system D has matching chemical shifts, resulting in a unique sequential connectivity. (iii) Spin system C that follows B is identified by comparing the HNCACB chemical shifts of B with the CBCA(CO)NH chemical shifts of other spin systems. Matching chemical shifts demonstrate that spin system A precedes D, corresponding to the first residue of the fragment. (iv) Strips from the HNCACB and CBCA(CO)NH spectra are arranged in the sequential order, demonstrating matching connectivities. In addition, the chemical shifts of spin system A have values characteristic of Leu, in agreement with the sequence.
Often in the course of the assignment it becomes apparent that the spin system list is incomplete, as no sequential connections can be identified for some of the spin systems. The main reasons for this are: (i) no signals are detected for some residues because of resonance broadening and/or low signal-to-noise ratio; (ii) spin systems are not identified correctly due to resonance overlap; (iii) incomplete peak picking in some of the spectra, particularly the root spectrum. The problem can be resolved by the direct analysis of the triple resonance spectra, as illustrated in Figure 3.15.
If found, new spin systems are added to the list and the assignment procedure is resumed. If no new matching spin systems are found in this way, more sensitive experiments may be required to detect the signals for the particular region of the protein. Note that residues following and preceding prolines will have missing sequential spin systems. Spin systems of the residues that follow prolines can often be identified by the characteristic chemical shift values of the sequential Cα and Cβ cross-peaks.
For small globular proteins, backbone resonance assignment is usually a straightforward process that can often be completed with just a complementary pair of HNCACB/CBCA(CO)NH experiments. For larger proteins some regions may remain unassigned after following the standard procedure because of overlapping or missing cross-peaks. In such cases additional experiments can be used to improve and/or validate the assignments. Highly sensitive complementary data are available from the [1H,15N]-NOESY-HSQC/TOCSY-HSQC experiments discussed above. When a [13C,15N]-labelled sample is available it is more effective to start with triple-resonance experiments because of their higher resolution and lower ambiguity, and to use the [1H,15N]-based experiments for validation and for difficult regions. Even if the triple-resonance assignment is straightforward, the independent validation improves the reliability of the assignments.
Additionally, a complementary pair of HBHANH/HBHA(CO)NH triple-resonance experiments can be used to resolve ambiguities and derive assignments for the Hα and Hβ protons. These experiments provide selective correlations between the signals of the NH group and the intra-residue and/or sequential Hα and Hβ protons, and may allow their detection more reliably than the [1H,15N]-NOESY-HSQC/TOCSY-HSQC experiments. The sensitivity of the experiments is the main factor limiting their use, particularly for HBHANH. As a guide, the expected sensitivity is at least 50 % lower than that of the corresponding HNCACB/CBCA(CO)NH experiments because of the fast relaxation of proton magnetisation. If sensitivity is a limiting factor, HA(CA)NH/HA(CO)NH experiments can be used to detect Hα correlations. In summary, on completion of the triple-resonance assignments it is best to validate the results using (i) [1H,15N]-NOESY-HSQC/TOCSY-HSQC and (ii) HBHANH/HBHA(CO)NH experiments.
Figure 3.13 (Continued) Note the good shift match for spin system D in the HNCACB/CBCA(CO)NH pair, but a clear mismatch in the HN(CA)CO/HNCO pair. Use of multiple spectra often helps in resolving the ambiguity of sequential matches. (b) Assigning sequentially linked spin systems to positions in the protein sequence. Five spin systems are connected sequentially through chemical shift matches in the HNCACB/CBCA(CO)NH (left) and HN(CA)CO/HNCO (right) experiments. Three of the spin systems, highlighted in italic bold, have distinct chemical shifts that can be used to identify the sequence TxxAA corresponding to this stretch of spin systems. The protein sequence has only one corresponding fragment, TDEAA, allowing unambiguous assignment of these spin systems to positions in the protein sequence.
Figure 3.14 Illustration of the assignment procedure for a hypothetical four-residue peptide fragment LQTE. Cross-peaks are shown schematically by circles. CBCA corresponds to the HNCACB and CBCACO to the CBCA(CO)NH spectrum. Larger and smaller circles in the HNCACB spectrum represent intra-residue and sequential cross-peaks, respectively. (i) Selection of strips in the 3D spectra corresponding to the root HSQC peaks. Cross-peaks of the 3D spectra are assembled into spin systems A–D. (ii) Identification of the spin system preceding spin system B through the match between the CBCA(CO)NH peaks of B and the HNCACB peaks of other spin systems. The chemical shifts of B correspond to a Ser or Thr residue. (iii) Identification of the spin system following spin system B through the match between the HNCACB peaks of B and the CBCA(CO)NH peaks of other spin systems. (iv) Strips from the 3D spectra arranged in sequential order on the basis of the matching chemical shifts.
Figure 3.15 Identifying sequential connections by direct analysis of HNCACB (grey) and CBCA(CO)NH (black) spectra. From left to right: to locate the spin system that follows spin system A, [1H,15N]-planes of the CBCA(CO)NH spectrum are displayed at the chemical shifts corresponding to the intra-residue Cα and Cβ (marked respectively). Cross-peaks with the same [1H,15N] coordinates in both planes are identified (dashed lines) and the [1H,13C] strip corresponding to these cross-peaks is compared with the strip containing spin system A. In this example only one spin system, B, has cross-peaks present in both the Cα and Cβ planes. A similar procedure is applied to identify the preceding spin system, using the sequential chemical shifts of Cα(i-1) and Cβ(i-1) to select the planes and the HNCACB spectrum to identify spin systems. For large proteins, or in the presence of relaxation broadening, one of the peaks in the Cα and Cβ planes may be missing, in which case the spin systems corresponding to the single peaks are checked.
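The sequential matching illustrated in Figures 3.13-3.15 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the class and function names (`SpinSystem`, `predecessors`, `successors`, `fragment_positions`), the 0.2 ppm tolerance, the shift values and the test sequence are all hypothetical and are not taken from any assignment package. Real data would also need to handle missing Cβ peaks (Gly, broadened residues) and partial matches.

```python
import re
from dataclasses import dataclass


@dataclass
class SpinSystem:
    name: str
    ca: float       # intra-residue 13C-alpha shift from HNCACB (ppm)
    cb: float       # intra-residue 13C-beta shift from HNCACB (ppm)
    ca_prev: float  # sequential 13C-alpha(i-1) shift from CBCA(CO)NH (ppm)
    cb_prev: float  # sequential 13C-beta(i-1) shift from CBCA(CO)NH (ppm)


def predecessors(target, systems, tol=0.2):
    """Spin systems whose intra-residue shifts match the (i-1) shifts of target."""
    return [s for s in systems
            if s.name != target.name
            and abs(s.ca - target.ca_prev) <= tol
            and abs(s.cb - target.cb_prev) <= tol]


def successors(target, systems, tol=0.2):
    """Spin systems whose (i-1) shifts match the intra-residue shifts of target."""
    return [s for s in systems
            if s.name != target.name
            and abs(s.ca_prev - target.ca) <= tol
            and abs(s.cb_prev - target.cb) <= tol]


def fragment_positions(pattern, sequence):
    """0-based start positions where a residue-type pattern fits the sequence.

    'x' in the pattern stands for a residue of unknown type; a lookahead is
    used so that overlapping placements are also reported.
    """
    regex = re.compile("(?=" + pattern.replace("x", ".") + ")")
    return [m.start() for m in regex.finditer(sequence)]


# Hypothetical shifts (ppm) for the LQTE example: A = Leu, D = Gln, B = Thr, C = Glu
A = SpinSystem("A", 55.0, 42.3, 58.1, 33.0)
B = SpinSystem("B", 61.9, 69.7, 55.8, 29.2)
C = SpinSystem("C", 56.6, 30.0, 61.9, 69.6)
D = SpinSystem("D", 55.8, 29.3, 55.1, 42.3)
systems = [A, B, C, D]

before_B = [s.name for s in predecessors(B, systems)]
after_B = [s.name for s in successors(B, systems)]
before_D = [s.name for s in predecessors(D, systems)]

# Hypothetical sequence containing the TDEAA fragment of Figure 3.13b
positions = fragment_positions("TxxAA", "MKTDEAAGL")
```

A placement is unambiguous only when exactly one candidate (or one sequence position) is returned; an empty or multi-element result signals a missing spin system or an ambiguity that needs additional spectra.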
3.4.1.3 Proline Residues
Due to the absence of amide protons, no intra-residue correlations are observed for proline residues in the triple-resonance experiments. However, the Cα/Cβ and Hα/Hβ resonances of prolines followed by a non-proline residue can be detected and assigned in CBCA(CO)NH and HBHA(CO)NH spectra, respectively. Poly-proline stretches can be assigned using experiments that correlate CαH signals with intra-residue or sequential resonances. The most sensitive complementary pair comprises the HACAN/HACA(CO)N experiments. Note that, in contrast to the sequential CBCA(CO)NH experiment, the HACA(CO)N experiment correlates the CH resonances with the 15N resonance of the following, rather than the preceding, residue. Since many proteins contain short polyproline sequences, detection of a single sequential connection is often sufficient for unambiguous assignment. For longer stretches additional triple-resonance experiments can be used, although they have reduced sensitivity. Alternatively, the [1H,13C]-NOESY-HSQC experiment can be applied to detect sequential NOEs from the CδH2 group. With respect to the NOE connectivities, this group has characteristics similar to the backbone NH group of a non-proline residue, with δH2(i)/αH(i-1) and δH2(i)/δH2(i-1) NOEs detectable for the majority of the residues. In practice, the chemical shifts of the CδH2 and CαH groups are similar, so the corresponding cross-peaks are located near the diagonal, and the signal dispersion is lower than for the HN signals. This makes the cross-peaks difficult to resolve and often requires the [1H,13C]-NOESY-HSQC spectrum to be collected in 100 % 2H2O to minimise the baseline distortions from the water signal.
3.4.2
4D Triple Resonance
Small and medium-sized globular proteins can usually be fully assigned with 3D triple-resonance spectra. In some cases, however, signal separation in the 3D spectra may be insufficient for establishing unambiguous sequential connectivities. The use of 4D spectra may help to resolve the ambiguity, although often at the expense of sensitivity and measurement time. Triple-resonance 4D spectroscopy offers two types of improvement. The first helps with resolving individual spin systems, as schematically illustrated in Figure 3.16a using HNCOCA correlations as an example. If both the 1H and 15N resonances of two residues overlap, only a single cross-peak is observed in the 1H-15N HSQC spectrum, and the 13Cα chemical shift from the HN(CO)CA spectrum cannot be associated with the 13CO chemical shift detected in the HNCO spectrum, as all the cross-peaks will have identical 1H and 15N coordinates. In the 4D HNCOCA spectrum the peaks have four coordinates (1H, 15N, 13Cα and 13CO) and the correct combination of CO and Cα chemical shifts is automatically detected. All triple-resonance experiments that involve two or more different 13C nuclei can be acquired in a 4D mode, allowing detection of H(i)-N(i)-CO(i-1)-Cα(i-1) correlations in 4D HNCOCA, H(i)-N(i)-CO(i-1)-Cα(i-1)-Cβ(i-1) in 4D HNCOCACB and H(i)-N(i)-Cα(i)-Cβ(i) in 4D HNCACB spectra. In practice, complete overlap of both 1H and 15N resonances is rare and the ambiguity in the spin systems can be resolved by checking different combinations of chemical shifts, with only a single combination usually showing sequential connectivities.
The second type of 4D spectrum allows direct correlation of the HN resonances of two sequential residues in the 4D HN(COCA)NH experiment, as illustrated in Figure 3.16b. The coherence transfer involves two relatively small couplings, J(N(i),CO(i-1)) and J(N(i-1),Cα(i-1)), which strongly decreases the sensitivity of the experiment as the molecular weight increases. For that reason, it is only applicable to small, unfolded or deuterated proteins.
Figure 3.16 Use of 4D spectra in resonance assignment. (a) Resolving spin systems with simultaneous overlap of 1HN and 15N resonances. In this case the 1H,15N-HSQC (left) cross-peaks of spin systems 1 and 2 have the same coordinates, and the cross-peaks of the triple-resonance experiments cannot be separated into individual spin systems, as schematically illustrated in the figure for the 3D HN(CO)CA and HNCO experiments. Each of the corresponding slices contains two cross-peaks and it is not possible to decide which of the HN(CO)CA and HNCO cross-peaks belong to the same spin system. The 13Cα and 13CO chemical shifts can be correlated in the 4D HNCOCA experiment, allowing separation of the cross-peaks into spin systems. (b) Direct sequential correlation of the 1H,15N-HSQC cross-peaks with the 4D HN(COCA)NH experiment. The 1H,15N-HSQC (left) cross-peaks of the sequential spin systems are labelled i and i-1. The [H(i),N(i)] plane of the 4D experiment contains a single cross-peak with coordinates [H(i-1),N(i-1)], corresponding to the 1H,15N-HSQC cross-peak of the preceding spin system. These coordinates are used to select the next 4D plane and to identify the next sequential spin system (right). By repeating the plane selections, spin systems are directly aligned in sequential order. Note that the correlations of spin system X are not present in the selected planes despite the overlap of the 1H resonances.
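The plane-by-plane walk described in panel (b) of Figure 3.16 can be sketched as follows. This is a hedged illustration only: the function name `walk_back`, the peak labels, the coordinates and the tolerances (0.02 ppm in 1H, 0.2 ppm in 15N) are assumptions made for the example, not values from the text or from any software package.

```python
def walk_back(start, hsqc, peaks_4d, tol_h=0.02, tol_n=0.2):
    """Follow i -> i-1 links through a 4D HN(COCA)NH peak list.

    hsqc     : dict mapping a peak label to its (1H, 15N) shifts in ppm
    peaks_4d : list of (H_i, N_i, H_prev, N_prev) cross-peak coordinates in ppm
    Returns the labels visited, starting at `start` and moving towards the
    N-terminus, stopping when the next step is missing or ambiguous.
    """
    chain = [start]
    while True:
        h_i, n_i = hsqc[chain[-1]]
        # Cross-peaks lying in the [H(i), N(i)] plane of the 4D spectrum
        in_plane = [(hp, n_prev) for (h, n, hp, n_prev) in peaks_4d
                    if abs(h - h_i) <= tol_h and abs(n - n_i) <= tol_n]
        if len(in_plane) != 1:
            break  # no sequential cross-peak, or more than one candidate
        hp, n_prev = in_plane[0]
        # HSQC peaks at the [H(i-1), N(i-1)] coordinates, excluding visited ones
        candidates = [label for label, (ph, pn) in hsqc.items()
                      if abs(ph - hp) <= tol_h and abs(pn - n_prev) <= tol_n
                      and label not in chain]
        if len(candidates) != 1:
            break
        chain.append(candidates[0])
    return chain


# Hypothetical HSQC peaks; "x" overlaps with "a" in 1H only, as in Figure 3.16b
hsqc = {"a": (8.20, 120.1), "b": (7.95, 115.4),
        "c": (8.61, 123.8), "x": (8.20, 110.0)}
peaks_4d = [(8.20, 120.1, 7.95, 115.4),   # a -> b
            (7.95, 115.4, 8.61, 123.8),   # b -> c
            (8.20, 110.0, 7.50, 118.0)]   # x -> peak absent from the HSQC list

chain = walk_back("a", hsqc, peaks_4d)
```

Reversing the returned list gives the spin systems in sequence order. Stopping at the first ambiguous plane mirrors the manual procedure, in which the walk continues only while each plane contains a single usable cross-peak.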
When sensitivity is sufficient, the experiment is extremely powerful, as sequential 1H-15N cross-peaks can be directly and unambiguously identified from the coordinates of the 4D cross-peak. For resolved HSQC peaks the assignment process is reduced to displaying the H(i)N(i) plane in the 4D spectrum, identifying the H(i-1)N(i-1) cross-peak, displaying the H(i-1)N(i-1) plane, identifying the H(i-2)N(i-2) cross-peak and continuing the process until the
sequential connections become ambiguous. As each cross-peak in the 4D spectrum is directly associated with a 1H-15N HSQC peak, the correlation of the peaks in the 4D spectrum automatically generates a set of sequential spin systems. This set can then be associated with the protein sequence using the 13C chemical shifts detected in 3D triple-resonance experiments. In addition to low sensitivity, 4D experiments suffer from limited digital resolution, which increases the cross-peak overlap and introduces additional ambiguity. To overcome this, the 4D spectrum is normally combined with 3D triple-resonance experiments.
3.4.3
Computer-Assisted Backbone Assignments
Algorithmically, backbone assignment based on triple-resonance data is reduced to matching well-defined chemical shifts sequentially and comparing them to the values expected for different residue types. Such a procedure is highly suitable for automation, which has led to the development of a number of systems with varying degrees of manual contribution. Graphical spectral analysis packages such as Sparky [9] and CCPN Analysis [10] have internal modules to assist with manual assignment. The user is presented with a list of sequentially matching spin systems and possible positions for the spin systems in the sequence. Based on this information the spin systems can be connected into fragments and matched against the protein sequence. Once unique positions are found, the chemical shifts are assigned to the appropriate residues with a single mouse click. Such assistance significantly accelerates the assignment process, but the user still has to make decisions on the uniqueness of the sequential connectivities and of the positions in the protein sequence. High ambiguity due to low chemical shift dispersion and strong resonance overlap can make the manual procedure unreliable.
The assignment procedure can be enhanced by fully automated approaches with programs such as AUTOASSIGN [11] or MARS [12]. In the automated procedures the connectivities and the positions in the protein sequence are optimised simultaneously for all spin systems using all the information available, rather than the subset related to the small number of spin systems that it is possible to analyse manually. This improves the reliability of the assignments, particularly when some of the spin systems are missing. The drawback of the automated procedures is often a long-range effect of incorrectly assigned spin systems, where assignment mismatches occur in several regions of the protein sequence.
This is particularly prominent when a minor form of the same protein is present, leading to duplication of spin systems. The results of the automated assignment need to be checked graphically for mismatches in the sequential connectivities and between the residue type and the chemical shifts of the spin system. In addition, the intensities of the cross-peaks corresponding to residues in sequential positions should be comparable, unless selective resonance broadening indicates intensity loss due to relaxation.
3.4.4
Unstructured Proteins
Unstructured proteins (see Chapter 9) are characterised by low chemical shift dispersion, leading to strong resonance overlap and assignment ambiguities. At the same time the relaxation properties are favourable for the detection of triple resonance experiments at high resolution and sensitivity. Often the resolution in the HSQC experiment is sufficiently high
to separate the majority of the cross-peaks, and most of the ambiguities are related to the overlap of 13C resonances. To resolve such ambiguities it is often sufficient to collect spectra that have high resolution in the 13C dimension, such as the HNCO/HN(CA)CO spectra. Additionally, the HBHANH experiment can be recorded with high sensitivity and resolution, offering additional sequential connectivities in combination with the HBHA(CO)NH experiment. If the increased resolution and the detection of additional sequential connectivities are still insufficient for unambiguous assignment, the 4D HN(COCA)NH experiment can be recorded to observe direct correlations between HN resonances. The overlap between HN resonances can be reduced by measuring triple-resonance experiments at different temperatures.
3.4.5
Large Proteins
The sensitivity of triple-resonance experiments for large proteins is low because of fast relaxation. Uniform deuteration of the protein, in combination with the TROSY method for relaxation compensation, has been used successfully to overcome the relaxation limitations and has led to the backbone assignment of proteins as large as 100 kDa [13]. The assignment strategy is similar to that for smaller proteins, with the chemical shift information derived from the corresponding TROSY variants of the triple-resonance experiments [14]. The main challenge in the assignment procedure is the ambiguity caused by the large number of resonances and the lack of unique sequential combinations of residues. The ambiguities in the sequential connection of the spin systems can be resolved in 4D spectra, with the 4D HN(COCA)NH experiment often being sensitive enough because of deuteration. The effectiveness of the assignment is dramatically improved by the use of automated assignment software that can analyse a large number of assignment combinations.
Expression of proteins in 2H2O leads to deuteration of amide groups (see Chapter 2 for more details of protein deuteration). Most of these groups become protonated on transfer to H2O-based buffer, although a number of groups may remain deuterated for a significant time in the case of a stable protein fold. The resonances of these residues cannot be detected until a significant fraction of the group is protonated, leading to missing signals in the triple-resonance experiments. In such cases an additional set of spectra needs to be recorded once all the groups are protonated. In some cases full exchange may require special buffer conditions. Nonetheless, the reduced number of signals in the spectra at the initial stage may be beneficial for resolving ambiguities in the assignments.
3.5
Side-Chain Assignments
On completion of backbone assignments using triple-resonance experiments, both the 1H and 13C chemical shifts of the CαH and CβH groups are normally known. If the assignment was done using only 13C chemical shifts, the corresponding 1H shifts can be determined in HBHA(CO)NH/HBHANH experiments. Starting from the CαH and CβH groups, the rest of the side-chain can be assigned using a combination of H(C)CH-TOCSY and (H)CCH-TOCSY spectra, as illustrated in Figure 3.17. These experiments correlate a cross-peak in a [1H,13C]-HSQC experiment with all proton or carbon resonances within the same side-chain, respectively. The H(C)CH-TOCSY spectrum is usually best measured with the
acquisition dimension corresponding to that of the HSQC to maximise the separation of resonances correlated to different groups. With sufficient sensitivity all resonances are observed within the CαH and CβH slices and the only uncertainty is the relation between the 1H and 13C resonances. The carbon chemical shifts can normally be assigned to specific positions in the side-chain on the basis of the chemical shifts, and to a certain degree this is possible for protons as well. The assignments are validated by displaying the slice corresponding to the newly assigned resonances alongside the slices with confirmed assignments, as well as by checking for the presence of the cross-peak in the [1H,13C]-HSQC spectrum. If matches are not found, a different combination of 1H and 13C shifts is tested. On completion of the procedure a set of matching strips is identified in the spectra, validating the side-chain assignments (Figure 3.17).
This assignment procedure is quick and straightforward to apply for small globular proteins, but for larger proteins it can be complicated by resonance overlap and missing cross-peaks. In the case of missing cross-peaks, additional resonances can be identified by displaying slices further along the side-chain. In particular, for Leu and Ile strong correlations are observed between CαH and methyl groups, but other correlations may be absent in these slices because of the fast relaxation of non-methyl groups. However, in the slices corresponding to the methyl groups such correlations are present owing to a shorter transfer path and can be identified once the methyl groups are assigned. The more sensitive HCCH-COSY experiment is beneficial for the detection of missing correlations and the identification of direct correlations, although the cross-peak separation in this experiment is lower than in HCCH-TOCSY. For large proteins the information from HCCH-TOCSY experiments may be too ambiguous for reliable assignments, in particular for long side-chains.
In such cases 15N-separated NOESY-HSQC can be used to resolve ambiguities. This experiment benefits from the high dispersion of HN resonances and from high sensitivity even for large proteins, owing to the efficiency of the NOE transfer. To aid in the assignment, the slices of the 15N-separated NOESY-HSQC corresponding to the intra-residue and sequential NH groups are displayed alongside the H(C)CH-TOCSY strips and the peaks are cross-checked. The majority of intra-residue and sequential NOEs are usually observed in the NOESY-HSQC spectra, particularly for large proteins. As a result, cross-peaks present in H(C)CH-TOCSY spectra that do not have a corresponding NOESY-HSQC peak usually belong to a different side-chain. The 15N-separated NOESY-HSQC spectra may also help to detect missing side-chain resonances.
Figure 3.17 Side-chain assignments using a combination of HCCH-TOCSY (top) and (H)CCH-TOCSY (bottom) experiments. The example illustrates assignment of the Ile side-chain. Slices on the left, corresponding to the CαH and CβH groups, are selected on the basis of chemical shifts determined in the course of the backbone assignments. These slices display cross-peaks associated with other 1H and 13C atoms of the side-chain, as marked. Combinations of chemical shifts corresponding to the specific CH groups are used to select additional slices in the 3D experiments and match them against the CαH and CβH slices (right). In this example 1H chemical shifts determine the position of the plane, while 13C chemical shifts are used to identify the cross-peaks in the plane. The 13C resonances can usually be assigned to the specific atoms in the side-chain on the basis of the chemical shift, while assignment of the 1H resonances may be ambiguous. The ambiguity is resolved by checking different combinations of the 1H and 13C chemical shifts in the 1H,13C-HSQC and 3D HCCH-TOCSY/(H)CCH-TOCSY spectra. An incorrect combination often does not correspond to any cross-peak in the 1H,13C-HSQC spectrum and has no cross-peaks matching the CαH and CβH slices. Some of the cross-peaks in the 3D spectra may be missing because of low sensitivity.
For unstructured proteins with high internal mobility, resonance overlap is the largest difficulty in the side-chain assignments. However, the 15N-HSQC still has sufficient peak separation, and the H(CCCO)NH, H(CCCA)NH, C(CCO)NH and C(CCA)NH experiments, which correlate side-chain resonances with the signals of the backbone NH groups, provide a powerful assignment strategy. In these experiments all intra-residue or sequential 13C or 1H chemical shifts can be observed within a single slice. The assignment of the resonances to the specific groups in the side-chain is done on the basis of the chemical shifts and validation against 13C-HSQC and HCCH-TOCSY spectra. The relatively low sensitivity of the experiments restricts their usability to unstructured or small proteins, unless the proteins are deuterated to reduce relaxation.
Once completed, the assignments can be validated in a 3D 13C-separated NOESY-HSQC spectrum measured in H2O. In addition to the correlations detected in the H(C)CH-TOCSY experiment, the NOESY spectra contain well-resolved correlations between the side-chains and HN groups that can be reliably detected even in large proteins. In most cases Hα, Hβ and methyl proton resonances exhibit intra-residue and sequential correlations with the resonances of the corresponding HN protons. Missing cross-peaks may be caused by unfavourable relaxation properties, but if neither intra-residue nor sequential peaks are present while cross-peaks to other HN resonances are observed from the same group, the assignment is likely to be incorrect. The validation is best conducted residue by residue, by simultaneously displaying the 13C-NOESY-HSQC strips corresponding to all 1H resonances of the same side-chain.
The positions of the intra-residue 1H chemical shifts, as well as that of the sequential 1HN, are marked on the slices and cross-peak detection is analysed. The procedure is repeated for all residues in sequential order.
Following the assignment of the aliphatic protons, the side-chain NH2 groups of Asn and Gln residues can be assigned using CBCA(CO)NH and HBHA(CO)NH experiments. Intra-residue correlations between the resonances of the NH2 and the CαH/CβH2 groups for Asn, or the CβH2/CγH2 groups for Gln, are identified in the experiments and compared against the chemical shifts determined at the previous stage to generate the assignments. Use of both 1H and 13C chemical shifts is normally sufficient to resolve ambiguities. When optimised for detection of the backbone resonances these experiments do not generate any transferable magnetisation for the NH2 groups. However, the 5–10 % 2H2O present in the solvent leads to a corresponding fraction of HN2H groups, with correlations that are detectable without any parameter adjustments, although the sensitivity of the experiment is reduced proportionately. The cross-peaks corresponding to the HN2H groups are observed in 15N-HSQC spectra as low-intensity satellites displaced vertically from the main NH2 signals by the isotope shift value. As a result, the cross-peaks in the triple-resonance experiments are similarly displaced from the main HSQC peaks, so the spectral display planes have to be adjusted accordingly to detect the peaks. If the intensity of the satellite peaks is too low, the experiments are acquired with adjusted parameters allowing detection of the main NH2 correlations. As an alternative or additional assignment approach, 15N-NOESY-HSQC spectra can be used to detect intra-residue NOE correlations of the NH2 groups, with the most intense correlations normally corresponding to the interactions with the CβH2 or CγH2 groups of Asn and Gln, respectively (see Section 3.2.1 above and Figure 3.4).
A number of experiments have been proposed for the resonance assignment of aromatic rings based on coherence transfer through 13C spin couplings, correlating the CβH2 groups with the aromatic 1H and 13C resonances [15]. However, multistage transfer pathways and the often unfavourable relaxation properties of the aromatic groups make these experiments applicable only to small proteins at high concentrations. Even if 13C-labelled samples are available, the assignments of the aromatic groups are still mainly based on intra-residue NOEs. The aromatic resonances are best resolved in a [1H,13C] constant-time TROSY experiment [16]. The intra-ring resonance systems can be identified using 3D HCCH-COSY and HCCH-TOCSY spectra, although in some cases better signal separation can be achieved in 2D homonuclear NOESY and TOCSY experiments. The NOE correlations are resolved in 13C-NOESY-HSQC spectra with the HSQC dimensions corresponding to either aromatic or aliphatic groups. The choice of experiment depends on the resonance overlap in the spectra, and in difficult cases both spectra have to be used to cross-check the correlations.
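The combination test used earlier in this section for the aliphatic side-chains, in which candidate pairs of 1H and 13C shifts are checked against the [1H,13C]-HSQC peak list, can be sketched as below. The function name `supported_pairs`, the tolerances and the Ile-like shift values are hypothetical and chosen only for illustration. The sketch covers the HSQC check alone and does not attempt the additional comparison of TOCSY strips.

```python
from itertools import product


def supported_pairs(h_shifts, c_shifts, hsqc_peaks, tol_h=0.02, tol_c=0.2):
    """Return the (1H, 13C) combinations that coincide with an HSQC cross-peak.

    h_shifts   : 1H shifts (ppm) seen in the CaH/CbH slices of H(C)CH-TOCSY
    c_shifts   : 13C shifts (ppm) seen in the matching (H)CCH-TOCSY slices
    hsqc_peaks : list of (1H, 13C) peak positions from the [1H,13C]-HSQC
    """
    pairs = []
    for h, c in product(h_shifts, c_shifts):
        # A combination is kept only if some HSQC peak lies within tolerance
        if any(abs(h - ph) <= tol_h and abs(c - pc) <= tol_c
               for ph, pc in hsqc_peaks):
            pairs.append((h, c))
    return pairs


# Hypothetical shifts for part of an Ile side-chain (beta, gamma2, delta1)
h_shifts = [1.85, 0.90, 0.85]
c_shifts = [38.6, 17.5, 13.5]
hsqc_peaks = [(1.85, 38.6), (0.90, 17.5), (0.85, 13.5), (4.10, 61.0)]

pairs = supported_pairs(h_shifts, c_shifts, hsqc_peaks)
```

Of the nine possible combinations only three survive in this example. In crowded spectra several combinations may remain, which is where the strip comparison and the NOESY-based checks described above are needed.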
References
1. Wüthrich, K., Wider, G., Wagner, G. and Braun, W. (1982) Sequential resonance assignments as a basis for determination of spatial protein structures by high-resolution proton nuclear magnetic resonance. J. Mol. Biol., 155, 311–319.
2. Clore, G.M. and Gronenborn, A.M. (1991) Applications of 3-dimensional and 4-dimensional heteronuclear NMR-spectroscopy to protein-structure determination. Prog. NMR Spectrosc., 23, 43–92.
3. Pervushin, K., Riek, R., Wider, G. and Wüthrich, K. (1997) Attenuated T2 relaxation by mutual cancellation of dipole-dipole coupling and chemical shift anisotropy indicates an avenue to NMR structures of very large biological macromolecules in solution. Proc. Natl. Acad. Sci. USA, 94, 12366–12371.
4. Lian, L.Y., Yang, J.C., Derrick, J.P. et al. (1991) Sequential 1H NMR assignments and secondary structure of an IgG-binding domain from Protein G. Biochemistry, 30, 5335–5340.
5. Cavanagh, J., Fairbrother, W.J., Rance, M. et al. (2007) Protein NMR Spectroscopy: Principles and Practice, Second Edition, Academic Press, Amsterdam.
6. Rule, G.S. and Hitchens, T.K. (2006) Fundamentals of Protein NMR Spectroscopy, Springer.
7. Permi, P. and Annila, A. (2004) Coherence transfer in proteins. Prog. NMR Spectrosc., 44, 97–137.
8. Sattler, M., Schleucher, J. and Griesinger, C. (1999) Heteronuclear multidimensional NMR experiments for the structure determination of proteins in solution employing pulsed field gradients. Prog. NMR Spectrosc., 34, 93–158.
9. Goddard, T.D. and Kneller, D.G. (2008) SPARKY 3, University of California, San Francisco.
10. Vranken, W.F., Boucher, W., Stevens, T.J. et al. (2005) The CCPN data model for NMR spectroscopy: Development of a software pipeline. Proteins – Struct. Funct. Bioinform., 59, 687–696.
11. Zimmerman, D.E., Kulikowski, C.A., Huang, Y.P. et al. (1997) Automated analysis of protein NMR assignments using methods from artificial intelligence. J. Mol. Biol., 269, 592–610.
12. Jung, Y.S. and Zweckstetter, M. (2004) Mars – robust automatic backbone assignment of proteins. J. Biomol. NMR, 30, 11–23.
13. Tzakos, A.G., Grace, C.R.R., Lukavsky, P.J. and Riek, R. (2006) NMR techniques for very large proteins and RNAs in solution. Ann. Rev. of Biophys. Biomol. Structure, 35, 319–342.
14. Zhu, G. and Yao, X.J. (2008) TROSY-based NMR experiments for NMR studies of large biomolecules. Prog. NMR Spectrosc., 52, 49–68.
15. Prompers, J.J., Groenewegen, A., Hilbers, C.W. and Pepermans, H.A.M. (1998) Two-dimensional NMR experiments for the assignment of aromatic side chains in 13C-labeled proteins. J. Mag. Res., 130, 68–75.
16. Pervushin, K., Riek, R., Wider, G. and Wüthrich, K. (1998) Transverse relaxation-optimized spectroscopy (TROSY) for NMR studies of aromatic spin systems in 13C-labeled proteins. J. Am. Chem. Soc., 120, 6394–6400.
4
Measurement of Structural Restraints
Geerten W. Vuister, Nico Tjandra, Yang Shen, Alex Grishaev and Stephan Grzesiek
4.1
Introduction
Structure determination by high-resolution NMR spectroscopy in solution has traditionally relied on the use of Nuclear Overhauser Enhancement (NOE) derived distance restraints, complemented by dihedral restraints obtained from the analysis of J-couplings [1]. This approach has proven to be very successful in solving the three-dimensional structures of proteins up to the 20–30 kDa range [2]. More recently, additional sources of structural information have become available. These include chemical shifts [3,4], reliable information about hydrogen bonding [5], residual dipolar couplings (RDCs) [6,7] and small-angle X-ray (SAXS) [8,9] or neutron scattering (SANS) data [10]. The different types of data are highly complementary and their combined use has led to strongly improved NMR structures. Since the information from NOE, J-couplings and chemical shifts is inherently short-range in nature, cumulative errors from such restraints can translate into inaccurate long-range structural behaviour [11]. This is particularly the case for non-globular biomolecules, such as larger DNA-molecules or protein (complexes) made up of loosely associated modules. In contrast, RDCs and SAXS/SANS data provide long-range information by reporting on the orientation of local groups relative to an overall molecular frame and constraining the overall shape of the molecule. The combination of both types of data has opened avenues for the accurate study of a much wider range of systems, in particular large complexes, dynamic domain–domain interactions and partially or completely unfolded proteins.
Protein NMR Spectroscopy: Practical Techniques and Applications, First Edition. Edited by Lu-Yun Lian and Gordon Roberts. 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd.
In this chapter, we describe the use of these parameters in structure determination. For each parameter, we discuss the physical basis, the practical experimental setup, how structural information is extracted from the data and potential caveats.
4.2
NOE-Based Distance Restraints
4.2.1
Physical Background
A magnetic nucleus with spin $\vec{I}$ and gyromagnetic ratio $\gamma_I$ generates a magnetic dipolar field $\vec{B}_d$, which is given by
$$\vec{B}_d = -\frac{\gamma_I \hbar \mu_0}{4\pi r^3}\left\{ \vec{I} - 3\,\frac{\vec{r}\,(\vec{I}\cdot\vec{r})}{r^2} \right\} \tag{4.1a}$$
where $\vec{r}$ is the distance vector to the nucleus and $\mu_0$ is the vacuum permeability. Another magnetic nucleus, with spin $\vec{S}$ and gyromagnetic ratio $\gamma_S$, will feel this field as a dipole-dipole interaction (Figure 4.1) with energy
$$H_d = -\vec{\mu}_S\cdot\vec{B}_d = -(\gamma_S \hbar \vec{S})\cdot\vec{B}_d = \frac{\gamma_S \gamma_I \hbar^2 \mu_0}{4\pi r^3}\left\{ \vec{S}\cdot\vec{I} - 3\,\frac{(\vec{S}\cdot\vec{r})(\vec{I}\cdot\vec{r})}{r^2} \right\} \tag{4.1b}$$
In an isotropic solution, rapid rotational motion averages the dipolar interaction to zero, and hence it does not contribute to the energy levels of the system. However, the fluctuating
Figure 4.1 The magnetic dipole field associated with the spin $\vec{I}$ of a magnetic nucleus exerts a force on a second magnetic nucleus with spin $\vec{S}$ in an orientation- and distance-dependent manner
dipolar field generated by the molecular motion constitutes an effective relaxation mechanism for the spins $\vec{I}$ and $\vec{S}$, which induces transitions between the different energy levels and alters their populations. Transitions between the longitudinal spin states (z-magnetisation) occur in the form of auto-relaxation (affecting only one spin) and cross-relaxation (affecting both spins). Longitudinal auto-relaxation is also referred to as $T_1$ or spin-lattice relaxation, whereas longitudinal dipolar cross-relaxation is the source of the Nuclear Overhauser enhancement (NOE) and gives rise to cross-peaks in a NOESY spectrum.
Relaxation matrix theory. Longitudinal relaxation in a system of N spins is governed by a set of coupled differential equations of the form [12,13] (boldface indicates matrices or vectors)

$$\frac{d}{dt}\mathbf{I}_z(t) = -\mathbf{R}\,\bigl(\mathbf{I}_z(t) - \mathbf{I}_z^0\bigr) \tag{4.2}$$
where $\mathbf{I}_z(t)$ represents the time-dependent vector of the z-magnetisations of all N spins, $\mathbf{I}_z^0$ is its value at thermal equilibrium and $\mathbf{R}$ is the relaxation matrix, which contains the auto- and cross-relaxation rate constants. For a system of dipolar-coupled, homonuclear spin-1/2 nuclei the diagonal and off-diagonal elements of $\mathbf{R}$ are given by

$$R_{ii} = \rho_{ii} = \frac{1}{10}\gamma^4\hbar^2\left(\frac{\mu_0}{4\pi}\right)^2 \sum_{j\neq i}\left[J_{ij}(0) + 3J_{ij}(\omega_0) + 6J_{ij}(2\omega_0)\right] + R_{ii,\mathrm{leak}} \tag{4.3a}$$

$$R_{ij} = \sigma_{ij} = -\frac{1}{10}\gamma^4\hbar^2\left(\frac{\mu_0}{4\pi}\right)^2 \left[J_{ij}(0) - 6J_{ij}(2\omega_0)\right] \quad \text{for } i\neq j \tag{4.3b}$$
where $\rho_{ii}$ denotes the auto-relaxation rate with $R_{ii,\mathrm{leak}}$ as an additional term to account for all non-dipolar relaxation mechanisms, $\sigma_{ij}$ represents the cross-relaxation rate between spins i and j, and $\omega_0$ is the Larmor frequency. The dipolar field fluctuations enter into Equation 4.3 in the form of the spectral density function $J_{ij}(\omega)$, which corresponds to the Fourier transform of the autocorrelation of the fluctuating dipolar field between spins i and j. Thus the dipolar relaxation rates are proportional to the square of the dipole-dipole interaction (Equation 4.1b), and dipolar relaxation is most effective for high-γ nuclei; i.e. amongst ¹H, ¹⁵N and ¹³C it is most effective for ¹H. For an isotropic rigid rotator with rotational correlation time $\tau_c$, the spectral density $J_{ij}(\omega)$ is given by

$$J_{ij}(\omega) = \frac{1}{r_{ij}^6}\,\frac{\tau_c}{1 + (\omega\tau_c)^2} \tag{4.4}$$
which shows the proportionality of the NOE effect to the inverse sixth power of the internuclear distance, $r_{ij}$. For molecules in the slow-tumbling limit, i.e. $\omega_0\tau_c \gg 1$, which applies to most biomolecules with a molecular weight >3–4 kDa, the $J_{ij}(2\omega_0)$ spectral density term of Equation 4.3b can safely be neglected, resulting in

$$\sigma_{ij} \approx -\frac{1}{10}\gamma^4\hbar^2\left(\frac{\mu_0}{4\pi}\right)^2 \frac{\tau_c}{r_{ij}^6} \tag{4.5}$$
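As a numerical illustration of Equations 4.3b, 4.4 and 4.5, the following minimal sketch evaluates the cross-relaxation rate of a single ¹H–¹H pair. The function names, the rounded physical constants, and the example values (0.25 nm distance, 600 MHz ¹H frequency) are assumptions chosen for illustration and are not taken from the text.

```python
import math

# Physical constants in SI units, rounded to five significant figures.
GAMMA_H = 2.6752e8      # 1H gyromagnetic ratio, rad s^-1 T^-1
HBAR = 1.0546e-34       # reduced Planck constant, J s
MU0_OVER_4PI = 1.0e-7   # vacuum permeability divided by 4 pi, T m A^-1

# Common prefactor (1/10) gamma^4 hbar^2 (mu0/4pi)^2 of Equations 4.3 and 4.5.
K_DD = 0.1 * GAMMA_H**4 * HBAR**2 * MU0_OVER_4PI**2


def spectral_density(omega, tau_c, r):
    """Equation 4.4: J(omega) of a rigid isotropic rotator (r in metres)."""
    return tau_c / (r**6 * (1.0 + (omega * tau_c) ** 2))


def cross_relaxation_rate(r, tau_c, omega0):
    """Equation 4.3b: sigma_ij in s^-1 for a homonuclear spin pair."""
    j0 = spectral_density(0.0, tau_c, r)
    j2 = spectral_density(2.0 * omega0, tau_c, r)
    return -K_DD * (j0 - 6.0 * j2)


def cross_relaxation_rate_slow(r, tau_c):
    """Equation 4.5: slow-tumbling limit with J(2*omega0) neglected."""
    return -K_DD * tau_c / r**6


if __name__ == "__main__":
    omega0 = 2.0 * math.pi * 600.0e6   # assumed 1H Larmor frequency, rad/s
    r = 0.25e-9                        # assumed 1H-1H distance, m
    for tau_c in (50e-12, 1e-9, 10e-9):
        sigma = cross_relaxation_rate(r, tau_c, omega0)
        print(f"tau_c = {tau_c:.1e} s  sigma = {sigma:+.3f} s^-1")
```

With these definitions the rate changes sign at $\omega_0\tau_c = \sqrt{5}/2$, where $J(0) = 6J(2\omega_0)$, and approaches the slow-tumbling expression for correlation times of several nanoseconds.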
4.2.2 NMR Experiments for Measuring the NOE
The NOE effect described by the cross-relaxation rate $\sigma_{ij}$ is best measured using a transient NOE experiment. For this purpose, a deviation from equilibrium z-magnetisation is created at a certain point in the pulse sequence and magnetisation is then allowed to exchange via cross-relaxation during a subsequent NOE mixing period. Denoting by $\Delta\mathbf{I}_z(t) = \mathbf{I}_z(t) - \mathbf{I}_z^0$ the deviation from thermal equilibrium, the magnetisation after the NOE mixing time $\tau_m$ is given by the formal solution of Equation 4.2

$$\Delta\mathbf{I}_z(\tau_m) = \exp(-\mathbf{R}\tau_m)\,\Delta\mathbf{I}_z(0) \tag{4.6}$$
where $\Delta\mathbf{I}_z(0)$ is the deviation at the beginning of the mixing period and the matrix exponential $\exp(-\mathbf{R}\tau_m)$ is the NOE transfer matrix. In the conventional two-dimensional (2D) version of this NOESY (NOE SpectroscopY) experiment, the frequencies of protons before and after the NOE transfer are measured, yielding a 2D (F1,F2) NOE spectrum, which identifies the cross-relaxing protons i and j at frequency positions ($F_{Hi}$,$F_{Hj}$) and ($F_{Hj}$,$F_{Hi}$). The intensities of the peaks are proportional to $\Delta\mathbf{I}_z(\tau_m)$, thus yielding a measure for $\sigma_{ij}$ and $r_{ij}$ via Equations 4.3–4.6. Obviously, the correct assignment of an NOE cross-peak is essential for use as a structural restraint, and misinterpreted NOEs can have disastrous effects on the structure determination. The introduction of ¹³C and ¹⁵N isotope labelling has greatly facilitated correct NOE assignment via editing the 2D NOESY in additional dimensions by the heteronuclei attached to the interacting protons. A commonly used experiment is the 3D ¹⁵N-edited NOESY-HSQC (Figure 4.2a). The sequence consists of a concatenation of an NOE and a ¹H-¹⁵N-HSQC building block. Selective water flip-back pulses ensure scan-to-scan preservation of the water z-magnetisation, thus avoiding saturation transfer to the protein that would otherwise result in signal attenuation [15]. The resulting 3D NOESY-HSQC spectrum will yield H ↔ HN (Figure 4.2b) cross-peaks at frequencies ($F_{Hi}$,$F_{Nj}$,$F_{Hj}$), where Hj, Nj constitute an amide proton, nitrogen pair connected by the $^1J_{HN}$ coupling. Since NOEs between two carbon-attached protons are not detected in such a ¹⁵N-edited NOESY-HSQC, this crucial information for side-chain contacts is usually obtained from a complementary 3D ¹³C-edited NOESY-HSQC or NOESY-HMQC experiment. The latter yields H ↔ HC cross-peaks at frequencies ($F_{Hi}$,$F_{Cj}$,$F_{Hj}$), where Hj, Cj now constitute a directly bonded proton, carbon pair (Figure 4.2b).
The increase to four dimensions appears as a natural extension of the 3D NOESYs for the unambiguous identification of NOE cross-peaks. Such an extension is easily achieved by the addition of a further HMQC or HSQC element (Figure 4.2b). Three complementary versions exist, detecting either NOEs between two nitrogen-attached protons (NH ↔ HN), between a carbon-attached proton and a nitrogen-attached proton (CH ↔ HN) or between two carbon-attached protons (CH ↔ HC). However, practical considerations reduce the usefulness of 4D NOESY experiments: (i) For realistic total experimental times on the order of several days, the additional sampling of the fourth dimension limits the achievable digital resolution in all indirect dimensions, i.e. the maximal acquisition times in the indirect $t_1$, $t_2$ and $t_3$ dimensions, to values below the physical limit set by the effective lifetime of the resonances (see below). (ii) The addition of an extra dimension reduces the sensitivity by a factor of √2 relative to the 3D counterpart because of the necessity to sample both the real and the imaginary part of the fourth dimension. (iii) The additional pulses and delays needed for transfer to the ¹³C
Figure 4.2 (a) Pulse sequence of the 3D ¹⁵N-edited NOESY-HSQC experiment (H ↔ HN) for ¹⁵N- or ¹⁵N/¹³C-labelled samples, consisting of frequency labelling, NOE transfer and HSQC blocks. Narrow and wide pulses have flip angles of 90° and 180°, respectively, and are applied with phase x unless specified otherwise. The ¹H, ¹⁵N, and ¹³C carriers are set to 4.7 (H2O), 116.5, and 70 ppm, respectively. ¹H pulses have an RF field strength of 28 kHz with the exception of the water-selective flip-back pulses applied as 2 ms sinc 90° pulses (semi-ellipses) and low-power, rectangular 1 ms 90° pulses (smaller rectangles). The ¹⁵N pulses have an RF field strength of 6.25 kHz. ¹⁵N WALTZ-16 decoupling during acquisition is applied at an RF field strength of 1.5 kHz. The 90° ¹³C pulses have a strength of 22 kHz; the ¹³C 180° pulse (semi-ellipse) is implemented as a 400–500 μs hyperbolic-secant pulse. For $t_1$ values shorter than the length of this pulse, the pulse can be applied as overlapping with the ¹H 90° pulses or prior to the first ¹H pulse. ¹³Cα and ¹³C′ 180° decoupling pulses are applied at 56 and 177 ppm, respectively, and have an RF field strength of 14.0 kHz. On cryoprobes, ¹³C and ¹⁵N 180° pulses are usually not applied simultaneously, but back-to-back to avoid arcing. As an alternative, the final soft 90°(x) – hard 180°(x) – soft 90°(x) WATERGATE scheme can be replaced by a 3–9–19 sequence (see [14] for details). Gradient durations (z-direction, sine-bell shaped; 30 G/cm at centre): G1,2,3,4,5 = 6.0, 5.0, 3.0, 2.0, 0.4 ms. Delays: δ = 2.25 ms, $\tau_m$ see text. Phases: φ1 = (45°, 225°); φ2 = 2x, 2(−x); φ3 = 4x, 4y, 4(−x), 4(−y); receiver = x, 2(−x), x. Quadrature detection in the indirect dimensions is achieved by incrementing phases φ1 (¹H) and φ2 (¹⁵N) in the usual States-TPPI manner. (b) Schematic overview of nuclei connected in 2D, 3D and 4D NOE experiments
or ¹⁵N nuclei result in further losses. For these reasons, it is often advisable to limit the dimensionality of an NMR experiment. Thus three-dimensional projections of the 4D NOESYs such as N ↔ HN, C ↔ HN, or C ↔ HC are often more useful as a complement to the ‘regular’ H ↔ HN or H ↔ HC 3D-NOESY experiments. Recent developments employing sparse sampling techniques partially overcome some of the limitations with respect to a decrease in digital resolution [16,17].
4.2.3 Set-up of NOESY Experiments
4.2.3.1 Estimation of T2s
For the proper set-up of any NMR experiment, it is essential to have an approximate idea of the relaxation times of the different nuclei involved in the magnetisation transfer pathways and in the frequency detection periods. A simple estimate of amide proton T2s can be obtained from the 1–1 spin-echo experiment (Figure 4.3a [18]). This experiment is a selective spin-echo with good water suppression that can also be used on unlabelled samples in H2O and hence is very suitable for general, fast characterisation of any
Figure 4.3 Estimation of T2 values in proteins from a 1–1 spin-echo experiment. (a) Result of the 1–1 echo experiment applied to the HIV-1 Nef protein (horizontal axis: ¹H chemical shift in ppm). The inset shows the pulse sequence of the experiment. The carrier is set on water. The delay τ is adjusted such that the excitation maximum is in the centre of the ¹HN region: $|\nu_{max}| = (4\tau)^{-1}$, where $\nu_{max}$ denotes the frequency offset from the water. Phases: φ1 = x, y, −x, −y; φ2 = 4x, 4y, 4(−x), 4(−y); receiver = x, −y, −x, y, −x, y, x, −y. The delay Δ is varied to estimate T2. The Nef spectrum was recorded with delays Δ1 and Δ2 of 0.1 and 2.9 ms, respectively. The intensity ratio I2/I1 of the leftmost amide proton resonances is about 0.65, yielding a T2 value of 13 ms and a correlation time $\tau_c$ of about 15 ns (see text). (b) Typical T2 values ([ms] at 600 MHz ¹H) of backbone and side-chain nuclei in a protein with a 15-ns $\tau_c$
biomacromolecule. It works as follows: the ¹H carrier is set to the water frequency. The two pairs of 1–1 or jump-return 90°(φ)/90°(−φ) pulses, which are separated by delays τ and 2τ, respectively, leave the water along its equilibrium position in the positive z-direction. For nuclei which resonate at a frequency offset of $(4\tau)^{-1}$ from the water, the first and second 1–1 pair act as selective 90° and 180° pulses, respectively. The delay τ is set such that the excitation maximum is around 8.5 ppm, i.e. the centre of the ¹HN region. The two Δ periods surrounding the selective 180° pulse are the spin-echo relaxation delays during which ¹HN transverse relaxation is monitored. Two experiments are carried out with different Δ values and T2 is calculated from the intensity ratios (Recipe 4.1). It should be noted that the experiment also decouples ¹Hα protons from the HN protons, since the selective 180° pulse does not excite the ¹Hα in the vicinity of the water frequency. For a rigid, isotropically tumbling molecule, the rotational correlation time $\tau_c$ can be estimated from the ¹HN T2 as $\tau_c$ [ns] ≈ 1/(5 T2 [s]). In the slow-tumbling limit, the T2s of all nuclei of a rigid rotator are proportional to each other. Thus the T2s of the various ¹H, ¹³C and ¹⁵N nuclei of a protein can be estimated from the ¹HN T2. Figure 4.3b shows typical T2s for a 15-ns tumbler that can serve as a guide for other proteins.
Recipe 4.1: 1–1 Echo Experiment
1. Ensure proper shimming and tuning. Calibrate a high-power ¹H 90° pulse. Set the ¹H carrier to water. Set the ¹⁵N carrier and decoupling for ¹⁵N-labelled samples.
2. Set τ such that the excitation maximum is around 8.5 ppm (82 μs at 800 MHz) and Δ to a small value (100 μs).
3. Set the power of all proton pulses and the length of the two first pulses of the 1–1 pairs (90°(φ1), 90°(φ2) pulses) as calibrated. Tweak the length and the phases of the second pulses (90°(−φ1), 90°(−φ2)) such that minimal water excitation results. Typically, the pulse lengths are slightly shorter (0.05 μs) than for the first pulses, since radiation damping during the τ periods already moves the water in the direction of the positive z-axis. Typical phase variations are 1–2° depending on the accuracy of the water frequency setting.
4. Run the 1–1 echo experiment with a short relaxation delay (Δ1 ≈ 100 μs) and with a longer delay (Δ2). Estimate the T2 from the intensity ratio of resonances in the downfield part of the ¹HN spectrum as $T_2 = 2(\Delta_2 - \Delta_1)/\ln(I_1/I_2)$. Note that the downfield part usually corresponds to strongly hydrogen-bonded amide protons in the folded part of the protein. The best sensitivity for T2 estimation is obtained when Δ2 is close to T2/2.
Recipe 4.2: Set-up of Optimal Acquisition Times
The acquisition periods of an NMR experiment should be set up such that the resolution is maximised according to the available signal. Note that the resolution achieved is a function of the maximal acquisition time $t_{max}$ and the signal-to-noise ratio. Good results can be obtained by following simple rules:
1. The acquisition time $t_{max}$ in the directly observed dimension should be about 3 T2 of the observed resonances. In this case, good digital filtering is obtained when using a 60°-shifted sine-square bell function.
2. For decaying signals in an indirectly detected dimension, the acquisition time $t_{max}$ should be set to a value where the signal has decayed to about 1/3. This corresponds approximately to the T2 of the observed resonances, when the decay is solely caused by relaxation. Note, however, that the decay can also be due to unresolved J-couplings, for example $J_{CC}$ for ¹³C resonances. Also in this case, decay to about 1/3 is a good compromise.
Good digital filtering is obtained for such a signal with a 60°-shifted sine bell function.
3. Rule 2 obviously does not apply to constant-time experiments, where no decay is occurring. For such cases, $t_{max}$ should be set to the maximal achievable time in the constant-time period.
4. The initial delay $t_0$ of the first sampled time point requires additional consideration. During $t_0$ chemical shift evolution occurs, which necessitates an appropriate phase correction during processing. For conventional quadrature detection and phasing definitions, usually ph1 = 360° $t_0$/Δt, where ph1 is the first-order phase and Δt is the time increment. The zero-order phase ph0 usually is then given as ph0 = −0.5 ph1, if there is no additional contribution to ph0 from the hardware, phase shifts or Bloch-Siegert effects. Note that on some systems (e.g. nmrPipe [19]) the sign of the phases is inverted. Furthermore, the initial delay $t_0$ is best set to values of 0, 1/2 Δt or 1 Δt, since all other values result in curved spectral baselines due to the intricacies of the discrete Fourier transform [20,21]. For the calculation of $t_0$, all initial chemical shift evolution needs to be taken into account, for example also the evolution during the flanking 90° pulses in a typical P90-$t_1$-P90 evolution period. For rectangular 90° pulses of duration P90, the chemical shift evolution time is equivalent to 2 P90/π. Finally, during processing the first data point needs to be multiplied by the factor 0.5, 1, or 1 for the initial delay value $t_0$
of 0, 1/2 Δt or 1 Δt, respectively. In the last case ($t_0$ = Δt), a constant baseline correction in the frequency domain is required because the signal at time zero (= integral over frequency domain) is missing.
Recipe 4.3: Set-up of a 3D ¹⁵N-Edited NOESY Experiment (Figure 4.2a)
1. Ensure proper shimming and tuning. Calibrate ¹H, ¹³C and ¹⁵N pulses.
2. For best water suppression, optimise the WATERGATE scheme [22] at the end of the HSQC part in a separate experiment. For this, apply a single-scan soft 90°(x) – hard 180°(x) – soft 90°(x) – acquire sequence without any gradients. Set the ¹H carrier to the water resonance. Set power levels and lengths of the pulses as calibrated (90° hard pulse: 7–10 μs, 90° soft pulse: 1 ms). Optimise the duration and phase of the soft pulses for minimal water excitation. Use these values for the full 3D ¹⁵N-NOESY-HSQC experiment.
3. Use the same ¹H carrier, pulse durations and phase values for the ¹⁵N-NOESY-HSQC experiment. Set the ¹⁵N carrier to the middle of the ¹⁵N region (116.5 ppm). For ¹³C-labelled samples set the ¹³C carrier for the 180° pulse during $t_1$ to 70 ppm, and to 56/177 ppm for the two 180° pulses during $t_2$.
Checking the individual 2D planes
4. Record an HSQC test plane of the ¹⁵N-edited NOESY. For this, omit the $t_1$ incrementation. For largest signal, set $t_1$ ≈ 0 and omit the φ3 decoupling pulses on ¹⁵N and ¹³C. Other settings: $\tau_m$ = 80 ms, ¹H (f3) spectral width 14–16 ppm (to cover all ¹HN resonances), ¹⁵N (f2) spectral width 20–30 ppm, recycle delay 0.9–1.0 s. Set the number of increments in $t_2$ and $t_3$ such that the maximal acquisition times $t_{max}$ match about 1× and 3×, respectively, the estimated T2 of the resonances (Recipes 4.1/4.2). Recording the experiment with a small number of scans (2 or 4) should yield a spectrum with sensitivity similar to a normal HSQC. Check that the phases in the indirect f2 dimension correspond to the settings according to the initial time delay (Recipe 4.2 point 4).
Initially, the phases in the directly detected $t_3$ dimension may be arbitrary. From the respective ph1 value obtained, a correction for the initial time delay can be calculated as Δ$t_3$ ph1/360°, where Δ$t_3$ is the dwell time. The sweep width in the f2 dimension may be optimised to minimise peak overlap. However, the total acquisition time $t_{2,max}$ should be kept at about T2. Note that some reduction in sweep width (leading to folding) may be beneficial to minimise the total required experimental time.
5. Record a ¹H (f1)-¹HN (f3) test plane of the ¹⁵N-edited NOESY. For this, omit the $t_2$ incrementation. For largest signal, set $t_2$ ≈ 0 and omit the ¹³C decoupling pulses in the $t_2$ evolution interval. Set the ¹H (f1) spectral width to about 10 ppm and the maximal acquisition time $t_{1,max}$ to the estimated T2. Other parameters as under point 4. Record the experiment with a small number of scans. Transform the spectrum: mainly the diagonal will be visible, since cross-peak intensity is usually on the order of a few percent of the diagonal. Check that the phases of the diagonal peaks in the indirect f1 dimension correspond to the settings according to the initial time delay (Recipe 4.2 point 4). An additional + or −45° is required on ph0 due to the phase offset of the φ1 pulse settings. The sweep width in the f1 dimension may be optimised in a similar way to that indicated under point 4.
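The set-up arithmetic used in Recipes 4.1 to 4.3 up to this point can be collected in a few helper functions. This is a minimal sketch: all function names are hypothetical, and the sign convention of the returned phases is an assumption, since the text notes that some processing programs invert the signs.

```python
import math


def jump_return_delay(offset_ppm, field_mhz):
    """Delay tau (s) that puts the 1-1 excitation maximum at 1/(4 tau)."""
    nu_max_hz = abs(offset_ppm) * field_mhz   # ppm times MHz gives Hz
    return 1.0 / (4.0 * nu_max_hz)


def t2_from_echo(delta1, delta2, i1, i2):
    """Recipe 4.1, step 4: T2 = 2 (Delta2 - Delta1) / ln(I1 / I2)."""
    return 2.0 * (delta2 - delta1) / math.log(i1 / i2)


def tau_c_ns_from_t2(t2_seconds):
    """Rule of thumb from the text: tau_c [ns] is about 1 / (5 T2 [s])."""
    return 1.0 / (5.0 * t2_seconds)


def effective_initial_delay(t1_first, p90):
    """Initial delay of a P90-t1-P90 period with rectangular 90 degree pulses.

    Each flanking pulse contributes 2 P90 / pi of chemical shift evolution.
    """
    return t1_first + 2.0 * (2.0 * p90 / math.pi)


def phase_corrections(t0, dwell):
    """Recipe 4.2, rule 4: (ph0, ph1) in degrees for an initial delay t0.

    Sign convention assumed here: ph1 = 360 t0 / dwell, ph0 = -ph1 / 2.
    """
    ph1 = 360.0 * t0 / dwell
    return -0.5 * ph1, ph1


if __name__ == "__main__":
    tau = jump_return_delay(8.5 - 4.7, 800.0)          # water at 4.7 ppm
    t2 = t2_from_echo(0.1e-3, 2.9e-3, 1.0, 0.65)        # values of Figure 4.3
    print(f"tau   = {tau * 1e6:.1f} us")
    print(f"T2    = {t2 * 1e3:.1f} ms")
    print(f"tau_c = {tau_c_ns_from_t2(t2):.1f} ns")
    print("ph0, ph1 for a half-dwell delay:", phase_corrections(0.5, 1.0))
```

Run with the Nef values quoted in Figure 4.3, the sketch reproduces the T2 of about 13 ms and the correlation time of about 15 ns given there, as well as the 82 μs delay of Recipe 4.1 at 800 MHz.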
Optimising the mixing time
6. Mixing times of about 80–100 ms are usually a good compromise for proteins in the 10–40 kDa range for maximising the intensity of the cross-peaks while ensuring only moderate levels of spin diffusion. For other situations (deuterated, larger or smaller proteins), maximal cross-peak sensitivity can be obtained by setting the mixing time to approximately the value of the decay time of the diagonal peaks, i.e. the selective $T_{1,sel}$. An estimate of $T_{1,sel}$ can be determined by recording the first increment of the ¹⁵N-edited NOESY with a short (e.g. 20 ms) and a long mixing time (e.g. 120 ms) and a sufficient number of scans. Since most of the signal arises from diagonal resonances, $T_{1,sel}$ can be estimated from the ratio of intensities of the 1D spectra after Fourier transform. As in Recipe 4.1, the downfield side of amide resonances should be used, since they represent the folded protein. Using this method, optimal NOE mixing times on the order of several hundred ms are found for protons on the background of a deuterated protein [23].
Run the final experiment
7. Run the 3D experiment with mixing time, sweep widths, and acquisition times as optimised in the previous steps. Since the indirect dimension acquisition times are limited to
the duration of the observable FID. Maximal acquisition times $t_{max}$ larger than 7–9 ms are thus rarely useful.
4.2.4 Deriving Structural Information from NOE Cross-peaks
The intensity $I_{ij}^{\mathrm{peak}}$ of a NOESY cross-peak between protons j and i results from the magnetisation $\Delta I_{z,ij}(\tau_m)$ transferred from proton j to proton i during the NOE mixing period $\tau_m$. Using Equation 4.6, $\Delta I_{z,ij}(\tau_m)$ corresponds to

$$\Delta I_{z,ij}(\tau_m) = \exp(-\mathbf{R}\tau_m)_{ij}\,\Delta I_{z,j}(0) \tag{4.7}$$
where $\exp(-\mathbf{R}\tau_m)_{ij}$ are the ij matrix elements of the matrix exponential $\exp(-\mathbf{R}\tau_m)$.
Two-spin approximation. For an isolated pair of protons i and j, the matrix exponential is easily evaluated to give

$$\Delta I_{z,ij}(\tau_m) = \left(e^{-(\rho_{ii}+\sigma_{ij})\tau_m} - e^{-(\rho_{ii}-\sigma_{ij})\tau_m}\right)\Delta I_{z,j}(0)/2 \tag{4.8}$$
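where we have used the fact that $\sigma_{ij} = \sigma_{ji}$ and assumed that $\rho_{ii} = \rho_{jj}$. For short mixing times ($|\sigma_{ij}|\tau_m, \rho_{ii}\tau_m \ll 1$) Equation 4.8 reduces to

$$\Delta I_{z,ij}(\tau_m) \approx -\sigma_{ij}\tau_m\,\Delta I_{z,j}(0) \tag{4.9}$$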
The latter is also obtained in the more general case for short mixing times, since then $\exp(-\mathbf{R}\tau_m) \approx \mathbf{1} - \mathbf{R}\tau_m$. Assuming that the initial magnetisation, relaxation and transfer losses during the total NOESY experiment are the same for every proton, the peak intensities $I_{ij}^{\mathrm{peak}}$ should be proportional to $\sigma_{ij}$, and one can derive the distances between protons i and j based on Equations 4.5 and 4.9 and a known reference distance $r_{\mathrm{ref}}$ with observed reference peak intensity $I_{\mathrm{ref}}^{\mathrm{peak}}$

$$r_{ij} = r_{\mathrm{ref}}\left(\frac{\sigma_{\mathrm{ref}}}{\sigma_{ij}}\right)^{1/6} = r_{\mathrm{ref}}\left(\frac{I_{\mathrm{ref}}^{\mathrm{peak}}}{I_{ij}^{\mathrm{peak}}}\right)^{1/6} \tag{4.10}$$

The accuracy of distance determination by Equation 4.10 is severely affected by three problems: spin diffusion, local motions, and varying efficiencies of excitation and detection for different protons. These limitations are discussed in the following.
Spin diffusion. In the general case, the magnetisation transfer according to Equation 4.7 must be calculated from a full evaluation of the matrix exponential:

$$\exp(-\mathbf{R}\tau_m)_{ij} = \delta_{ij} - \tau_m R_{ij} + \frac{1}{2}\tau_m^2\sum_k R_{ik}R_{kj} - \dots \tag{4.11}$$

The last term of Equation 4.11 shows that magnetisation transfer between spins i and j can also occur as a result of indirect transfer via a third spin k. This so-called spin diffusion effect becomes more pronounced with increasing mixing time. Due to the dependence of the NOE on the inverse sixth power of the distance, the relayed transfer following a consecutive chain of close protons may become more efficient than the direct transfer between the distant start and end points of the chain. As a consequence, when such NOE cross-peaks are interpreted using Equation 4.10, the distances between spins i and j are underestimated. Several programs have been developed, in which model structures are refined against the full relaxation matrix exponential [24–26].
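The underestimation caused by spin diffusion can be illustrated with a toy calculation. The sketch below assumes three collinear protons A, B and C spaced 0.25 nm apart, a correlation time of 10 ns, the slow-tumbling rates of Equation 4.5, and an arbitrary leakage rate of 1 s⁻¹; it evaluates the transfer matrix of Equation 4.7 by a plain Taylor series and then applies Equation 4.10 with the A–B peak as reference. The geometry, the parameter values and all function names are illustrative assumptions, not part of the original protocol.

```python
# Physical constants in SI units, rounded to five significant figures.
GAMMA_H = 2.6752e8
HBAR = 1.0546e-34
MU0_OVER_4PI = 1.0e-7
K_DD = 0.1 * GAMMA_H**4 * HBAR**2 * MU0_OVER_4PI**2


def relaxation_matrix(positions_nm, tau_c, leak=1.0):
    """Slow-tumbling R for collinear protons.

    R_ij = sigma_ij from Equation 4.5; R_ii = sum_j |sigma_ij| + leak,
    i.e. only the J(0) term of Equation 4.3a is kept (assumption).
    """
    n = len(positions_nm)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                r = abs(positions_nm[i] - positions_nm[j]) * 1e-9
                R[i][j] = -K_DD * tau_c / r**6
                R[i][i] += K_DD * tau_c / r**6
        R[i][i] += leak
    return R


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transfer_matrix(R, tau_m, n_terms=40):
    """exp(-R tau_m) from its Taylor series (fine for small |R tau_m|)."""
    n = len(R)
    M = [[-R[i][j] * tau_m for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, n_terms):
        term = [[x / k for x in row] for row in mat_mul(term, M)]
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result


def apparent_distance(A, i, j, ref_i, ref_j, r_ref_nm):
    """Equation 4.10 with peak intensities taken from the matrix A."""
    return r_ref_nm * (A[ref_i][ref_j] / A[i][j]) ** (1.0 / 6.0)


if __name__ == "__main__":
    R = relaxation_matrix([0.0, 0.25, 0.50], tau_c=10e-9)
    for tau_m in (0.005, 0.05, 0.1, 0.2):
        A = transfer_matrix(R, tau_m)
        d = apparent_distance(A, 0, 2, 0, 1, 0.25)
        print(f"tau_m = {tau_m * 1e3:5.0f} ms  apparent r(AC) = {d:.3f} nm")
```

In this model the apparent A–C distance stays close to the true 0.50 nm only for very short mixing times and drops well below it at 100 ms, because the relayed A→B→C pathway dominates the A–C cross-peak.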
Besides affecting the accuracy of distance measurements, spin diffusion also limits the sensitivity of the NOE transfer, since magnetisation is diluted across the entire proton reservoir. The effect of spin diffusion can be reduced by the introduction of deuterons (²H) into the molecule [27]. The most efficient procedures use full deuteration and selective protonation of specific positions such as amide [23] or methyl hydrogens [28]. For such highly deuterated molecules, very long NOE transfer periods of several hundred milliseconds can be used to obtain high sensitivity for large proteins or detection of very long distance NOEs. Rotating-frame Nuclear Overhauser Spectroscopy (ROESY) is a further method to assess and limit the effects of spin diffusion. ROESY has been developed as an alternative to NOESY to avoid problems associated with the zero crossing of the longitudinal cross-relaxation rate $\sigma_{ij}$ (Equation 4.3b) for fast rotational correlation times, i.e. small molecules [29]. In contrast to the NOE ($\sigma_{ij} < 0$ for large molecules), the ROE cross-relaxation rate is positive across the entire motional regime. As a consequence, cross-peaks resulting from a single-step ROE transfer have the opposite sign to the diagonal (i.e. non-transferred magnetisation), and each subsequent relayed ROE transfer alternates the sign of the transferred signal. In the common case where a two-step spin-diffusion peak coincides with a peak resulting from a direct ROE transfer, the total signal will be attenuated, resulting in a rather harmless overestimation of the distance. In contrast, the underestimation of distances from NOE spin diffusion induces detrimental, unrealistic strains on the molecule. Hence, ROE-derived upper bounds can be set more ‘tightly’ as compared to their NOE counterparts. The ROESY experiment also distinguishes the ROE effect from chemical exchange (see Chapter 7).
The latter results in positive cross-peaks in the ROE spectrum, which are easily discriminated from the negative ROE cross-peaks. Experimentally, optimal ROE transfer is obtained by setting the mixing period approximately equal to the T2 of the protons. Caution must be exercised to minimise the potentially interfering Hartmann-Hahn transfer by avoiding the Hartmann-Hahn match condition ($|\omega_i - \omega_{carrier}| - |\omega_j - \omega_{carrier}| \ll \omega_{RF}$), where $\omega_i$, $\omega_j$ and $\omega_{carrier}$ denote the frequencies of proton i, proton j and the carrier, and $\omega_{RF}$ is the field strength of the ROESY spin-lock. In practice, this can be achieved by placing the carrier at the edge of the spectrum and using a suitably weak spin-lock field.
Local motion. A further complication in the interpretation of NOE intensities originates from the fact that the relative motion of protons in a protein usually cannot be described by a single rotational correlation time as assumed in Equation 4.9. In particular, side-chains often experience motions with considerable amplitude. Thus methyl groups rotate around the carbon-carbon chemical bond on the picosecond time scale, whereas phenylalanine and tyrosine aromatic rings flip around the Cβ–Cγ axis on the microsecond time scale. Several specific models such as ‘wobbling-in-a-cone’ or ‘three-site jump’ have been developed to correct NOEs for such local motion effects. For local motions faster than the nanosecond time scale, the model-free approach can be used, which scales the cross-relaxation rate by the order parameter $S_{ij}^2$ to account for the relative local motion of the protons i and j

$$\sigma_{ij} \approx -\frac{1}{10}\gamma^4\hbar^2\left(\frac{\mu_0}{4\pi}\right)^2 \frac{S_{ij}^2\,\tau_c}{r_{ij}^6} \tag{4.12}$$
However, the order parameter $S_{ij}^2$ is usually not known from experimental data, but may be obtained from molecular dynamics simulations [25]. For motions that are slow with respect
to the overall tumbling, but fast with respect to the chemical shift differences, the NOE matrix $\mathbf{A} = \exp(-\mathbf{R}\tau_m)$ may be treated as an average over several NOE matrices $\mathbf{A}_k$ of individual conformers k. This procedure corresponds to an $r_{ij}^{-6}$ averaging over the various allowed conformations.
Varying efficiencies of proton excitation and detection. According to Equation 4.7 the NOESY cross-peak intensity $I_{ij}^{\mathrm{peak}}$ depends on the deviation $\Delta I_{z,j}(0)$ of the z-magnetisation of spin j from thermal equilibrium at the start of the NOE mixing period. Due to differing transfer efficiencies and relaxation behaviour, the factor $\Delta I_{z,j}(0)$ may not be the same for all protons. In particular, exchange with saturated water may bleach out some of the resonances. Similarly, the efficiency of detection of spin i after the NOE transfer may not be the same for all protons, again mainly as a result of differing relaxation properties. Representing these efficiencies of excitation of spin j and detection of spin i as $e_j$ and $d_i$, respectively, cross and diagonal peak intensities for short mixing times take the form

$$I_{ij}^{\mathrm{peak}} \propto -d_i\,\sigma_{ij}\tau_m\,e_j, \qquad I_{ii}^{\mathrm{peak}} \propto d_i\,(1 - \rho_{ii}\tau_m)\,e_i \tag{4.13}$$
Thus, if the mirror peak between proton i and proton j and both diagonal peaks for protons i and j are also observable, the unknown factors $e_{i/j}$ and $d_{i/j}$ can be eliminated [30] by using

$$\sqrt{\frac{I_{ij}^{\mathrm{peak}}\,I_{ji}^{\mathrm{peak}}}{I_{ii}^{\mathrm{peak}}\,I_{jj}^{\mathrm{peak}}}} = \sqrt{\frac{d_i\,\sigma_{ij}\tau_m\,e_j\;d_j\,\sigma_{ji}\tau_m\,e_i}{d_i\,(1-\rho_{ii}\tau_m)\,e_i\;d_j\,(1-\rho_{jj}\tau_m)\,e_j}} \approx |\sigma_{ij}|\,\tau_m \tag{4.14}$$

where again $\sigma_{ij} = \sigma_{ji}$ is used. In cases where the variations in $e_{i/j}$ and $d_{i/j}$ are severe, this may be a viable procedure to obtain a more adequate estimate of $\sigma_{ij}$.
Practical extraction of distance information. The effects described limit the precision of NOE-derived distance restraints. To address this problem in practice, rather large error margins are commonly used. Thus NOEs are often classified into a number of distance categories: for example strong, medium and weak (cf. Recipe 4.5). Despite these somewhat loose definitions, the quality of structures obtained is usually not strongly affected, since the lack of distance precision can be compensated by a high number of observed NOEs. In addition, the strongest structural restraints are exerted by long-range NOEs between residues that are far apart in the primary sequence (see below), and such topological restraints are not very sensitive to the distance precision. Most common structure calculation programs, like ARIA [31] and CYANA [32], implement internal scaling routines via the two-spin approximation (cf. Equation 4.10, Recipe 4.6). By using distinct categories for different types of atoms, more realistic distance bounds can be obtained. For example, NOEs involving methyl groups should be analysed separately to account for the specific spectral properties of these moieties.
In addition, ARIA can perform a limited relaxation matrix analysis [33], thus partially correcting for the effects of spin diffusion. These automated two-spin procedures are both convenient and usually yield somewhat more accurate restraints.
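The bookkeeping behind Recipes 4.5 and 4.6 and Equation 4.14 is simple enough to sketch in a few lines. The function names below are hypothetical, and the single-point calibration is a simplification of the calibration curve of Recipe 4.6; the sketch is not the scaling routine of ARIA or CYANA.

```python
import math

# Bounds of Recipe 4.5 in nm.
LOWER_BOUND_NM = 0.18
UPPER_BOUNDS_NM = {"strong": 0.27, "medium": 0.33, "weak": 0.50}


def bounds_from_class(noe_class):
    """Recipe 4.5: (lower, upper) bound for a strong, medium or weak NOE."""
    return LOWER_BOUND_NM, UPPER_BOUNDS_NM[noe_class]


def distance_two_spin(i_peak, i_ref, r_ref_nm):
    """Equation 4.10: r_ij = r_ref (I_ref / I_ij)^(1/6)."""
    return r_ref_nm * (i_ref / i_peak) ** (1.0 / 6.0)


def bounds_from_distance(d_nm, rel_error=0.2):
    """Recipe 4.6, step 4: L = 0.8 D and U = 1.2 D for rel_error = 0.2."""
    return (1.0 - rel_error) * d_nm, (1.0 + rel_error) * d_nm


def symmetrised_transfer(i_ij, i_ji, i_ii, i_jj):
    """Equation 4.14: estimate of |sigma_ij| tau_m free of e and d factors."""
    return math.sqrt((i_ij * i_ji) / (i_ii * i_jj))


if __name__ == "__main__":
    # Reference: sequential d_alphaN(i,i+1) = 0.21 nm in a beta-sheet (Recipe 4.6).
    d = distance_two_spin(i_peak=1.0, i_ref=64.0, r_ref_nm=0.21)
    print("distance:", round(d, 3), "nm, bounds:", bounds_from_distance(d))
    print("class bounds for a weak NOE:", bounds_from_class("weak"))
```

Because of the sixth-root dependence, a cross-peak 64 times weaker than the reference corresponds to only twice the reference distance, which is why coarse intensity classes already give useful restraints.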
Recipe 4.5: Extraction of Distances Using Classes
Define a lower (L) and upper bound (U) by using a classification:
L = 0.18 nm
U = 0.27 nm for strong NOEs
U = 0.33 nm for medium NOEs
U = 0.50 nm for weak NOEs
Recipe 4.6: Extraction of Distances Using the Two-Spin Approximation
1. Identify NOE cross-peaks between protons in fixed geometry (e.g. Tyr, Phe, Trp aromatic ring protons) and in regular secondary structural elements (e.g. those corresponding to $d_{\alpha N}(i,i+1)$ = 0.21 nm in β-sheets and $d_{\alpha N}(i,i+3)$ = 0.34 nm in α-helices).
2. Using the cross-peaks of step 1, generate a simple calibration curve of cross-peak intensity ($I^{\mathrm{peak}}$) vs distance (D).
3. Using the calibration curve of step 2, derive D from $I^{\mathrm{peak}}$ for every cross-peak of interest.
4. Define a lower (L) and upper bound (U) by using a relative error such as L = 0.8 D, U = 1.2 D.
4.2.5 Information Content of NOE Restraints
Modern NMR structure determination often includes restraints from NOEs, J-couplings, chemical shifts, residual dipolar couplings and other observables. However, NOE-derived distances still constitute probably the most important source of structural information. Figure 4.4 illustrates why these restraints are so important. The observation of an NOE immediately implies a short distance (<5–6 Å) between two protons. If these protons are far
Figure 4.4 NOE information in proteins. (a) An NOE between two protons limits the conformational space. (b) All NOE-derived restraints in the PDZ2 domain of PTP-BL (PDB code 1GM1) are indicated by light-grey tubes. Dark grey ribbons indicate regions of secondary structure
apart in the primary sequence, such a long-range NOE will strongly limit the accessible conformational space (Figure 4.4a). Furthermore, proteins are often globular and have a high and homogeneous proton density. This usually leads to a large number of observable NOEs. In particular, the protein core can often be very well defined by a dense network of long-range NOEs (Figure 4.4b). The information content of NOEs can be defined more rigorously by information-theoretical approaches [34]. The results show that long-range NOEs (i.e. residue separation by more than four amino acids in the primary sequence) typically comprise 85–95% of all structural information, whereas they constitute only 27–43% of the total number of NOE restraints. In contrast, intraresidue NOEs carry only 0.4% of the information, while constituting a similar fraction of the total number of restraints. Sequential and medium-range NOEs account for the remaining fraction of about 1/3 of the NOE restraints with an information content between 5 and 15%. This finding underlines the importance of an extensive search for long-range NOEs. Unfortunately, intraresidual or sequential NOE peaks not only dominate the NOESY spectrum in number, but also in intensity, as the corresponding distances are typically short. In contrast, the long-range NOEs typically correspond to longer distances and have weak signal intensities. Therefore, the comprehensive assignment of almost all, i.e. thousands, of NOEs is usually required to extract all available information. This is a laborious exercise, but automated assignment protocols such as CYANA/CANDID [32] or ARIA [31,33] can extract the majority of the NOE information from the spectra. Not surprisingly, the short-range accuracy of NMR structures, which are derived solely from NOEs and torsion-angle information, is higher than the long-range accuracy.
This is a particular problem for elongated molecules or molecules with protruding regions. In such cases, often not many NOEs that would accurately orient the different parts of the molecule are observed. This situation is even more serious for nucleic acids, which typically are non-globular and in addition suffer from a low proton density that further limits the number of observable NOEs. The inclusion of RDC and SAXS data can often remedy such problems of long-range accuracy of structures (see Sections 4.5 and 4.7 of this chapter).
4.3 Dihedral Restraints Derived from J-Couplings
Dihedral angles derived from three-bond J-couplings (3J) present the second class of ‘traditional’ restraints. The introduction of 13C and 15N labelling has greatly expanded the number of measurable J-couplings (Figure 4.5b) [35,36] and consequently enhanced the available structural information. Besides angular information, 3J-couplings are also very useful for establishing stereo-specific assignments and for the determination of rotameric states [37].

4.3.1 Physical Background
J-couplings (also called spin-spin or scalar couplings) arise from magnetisation transfer of one magnetic nucleus via polarisation of the electron cloud to a second magnetic nucleus. Usually the Fermi contact of the s-electrons with the nucleus is the dominant mechanism of magnetisation transfer between nucleus and electrons (cf. Figure 4.5a). The subsequent
Figure 4.5 (a) Schematic representation of the J-coupling interaction between spins I and S through polarisation of the electron clouds of chemical bonds. (b) Selected observable 3J-couplings for determination of φ, ψ and χ1 angles (dashed lines) as well as cross hydrogen bond h1J- and h3J-couplings (dotted lines) in 15N/13C labelled proteins. Approximate absolute magnitudes are indicated in Hz
transfer from one atom to another commonly happens via the electron clouds of covalent chemical bonds. However, in a more general sense, any interatom contact that results in electron correlation may also induce scalar coupling transfer. For isotropic liquids, the Hamiltonian of the J-interaction between spins I and S is given by

$$H_J = h\,{}^{n}J_{IS}\,\vec{I}\cdot\vec{S} \qquad (4.15)$$

where nJIS represents the magnitude of the J-interaction in Hertz and n indicates the number of intervening bonds between the two nuclei. In the weak coupling case (nJIS ≪ |νI − νS|), Equation 4.15 reduces to

$$H_J = h\,{}^{n}J_{IS}\,I_z S_z \qquad (4.15')$$
For biomolecules, nJ-interactions are usually only detected for n = 1, 2, 3 as longer-range effects tend to be too small [38,39]. As for any nuclear magnetic interaction, the size of the J-coupling is proportional to the magnetic moments of the involved nuclei (JIS ∝ γI γS), thus typically |nJHH| > |nJCH| > |nJCC| > |nJCN| for similar bond geometries.

4.3.2 NMR Experiments for Measuring J-Couplings
The requirement to measure small J-couplings (typically <10 Hz) with sufficient accuracy in a biomolecule presents a considerable challenge since the broadness of the biomolecular resonance lines often prohibits direct observation of the splitting from the resonance line. E. COSY (Exclusive Correlation Spectroscopy) [40], quantitative J-Correlation [41,42] and IPAP (in-phase/anti-phase) [43] are three types of experiments that are designed to overcome this problem, since they do not depend on the extraction of the J-value from a single 1D spectral trace. These techniques can also be used for the measurement of hydrogen-bond J-interactions (Section 4.4) and RDCs (Section 4.5).
Figure 4.6 The E. COSY principle. (a) Coupling topology of three spins I, S and X required for the E. COSY. Spins I and S are both J-coupled to the passive spin X. A further interaction (either J-coupling or NOE) is present between spins I and S, by which magnetisation can be transferred during the mixing period. (b) Schematic E. COSY pulse sequence. The decoupling pulses on the X-spin (180° pulse during t1 and composite decoupling scheme during t2, grey) are shown for explanatory purposes only (see text). (c–f) Schematic spectra resulting from the pulse scheme in (b), using X-spin decoupling during t1 and t2 (c), only during t1 (d), only during t2 (e) and without any X-spin decoupling (f)
E. COSY. The E. COSY principle [40] requires two spins I and S being both J-coupled to a third, so-called passive spin X (Figure 4.6a). In a two-dimensional experiment (Figure 4.6b), magnetisation is transferred from spin S to spin I between the t1 and t2 evolution periods. If during both periods the JSX and JIX interactions are decoupled (both the 180° X-spin pulse and the X-spin composite decoupling scheme are applied), the resulting spectrum shows a single cross-peak at frequencies (νS, νI) (Figure 4.6c). In contrast, omitting decoupling of JIX during t2 or of JSX during t1 will yield a doublet in F2 or F1, respectively (Figure 4.6d and e). Note that in both cases, the upfield component is associated with the α spin state of the X-spin (assuming both positive JSX and JIX). Finally, in the situation where the interactions with the X-spin are decoupled neither during t1 nor during t2, the X-spin does not change its state and the resulting 2D spectrum displays a so-called E. COSY multiplet pattern (Figure 4.6f), resulting from an upfield shift in both dimensions for the molecules with the X-spin in the α state and a downfield shift in both dimensions for the molecules with the X-spin in the β state. Practical E. COSY experiments are designed such that one of the chosen J-interactions, for example JSX, is large enough to provide a well-resolved splitting of the two multiplet components along the F1 axis, thus allowing the small, otherwise unresolved JIX-interaction to be determined from the displacement of the multiplet components along the F2 axis. Note that the E. COSY scheme yields the relative signs of JSX and JIX and is equally suited for measuring positive, negative and near-zero JIX values.

Recipe 4.7: E. COSY Experiment
1. Record the appropriate experiment (see e.g. [35] for a list of suitable experiments) with sufficient resolution in the relevant dimensions (Figure 4.6f): the splitting in F1 needs to be resolved, which implies sufficient t1,max > 1.5/JSX; the splitting in F2 should be sampled with the highest possible resolution, i.e. t2,max = (1–3) T2 of the relevant nuclei.
2. Process with sufficient digital resolution in the dimension from which the J-coupling is to be measured, i.e. use zero filling at least by a factor of 4.
3. Pick peaks using a peak picker that interpolates between data points for best resolution. Pick only well-resolved peaks with sufficient signal-to-noise ratio, for example >4.
4. Ensure that peak frequency data files are written with sufficient precision, for example <0.1 of the digital resolution. (Preferably switch the frequency format to Hz.)
5. Extract J-couplings as the frequency differences from the appropriate peaks.
6. If possible repeat the experiment and data analysis under identical conditions and determine average and standard deviations for the J-values. The independent repetition of the experiment gives a very good idea of the cumulative statistical error resulting from either the experiment or the data analysis procedure.

Quantitative J-Correlation. The second general class of experiments for measuring J-couplings comprises the so-called quantitative-J (QJ) experiments (Figure 4.7) [41,42]. In these experiments, the size of the JIS-coupling is determined from the amplitude of in-phase (e.g. Ix) or anti-phase magnetisation (e.g. 2IySz) terms generated during a fixed period T where the JIS interaction is active. Thus, the method ‘translates’ the frequency shift of the coupling into an intensity variation. The simplest QJ experiment is the quantitative spin-echo (Figure 4.7a). With the 180°(S) pulse in position a, the JIS interaction is effectively decoupled, whereas in position b, the JIS
Figure 4.7 Quantitative-J (QJ) pulse sequence elements. Narrow and wide pulses have flip angles of 90° and 180°, respectively, and are applied with phase x unless specified otherwise. (a) Spin-echo based QJ experiment. Two experiments are recorded with the 180°(S) pulse in positions a and b, respectively. (b) COSY-based QJ experiment. φ1 = y, −y; φ2 = 2(y), 2(−y); receiver = x. (c) HMQC-based QJ experiment. The two-dimensional ‘cross’ experiment is recorded by the usual t1-evolution with phase settings φ1 = x, −x; receiver = x, −x. This experiment yields a cross-peak at frequency position (νS, νI) and intensity proportional to sin²(πJIST). For the reference, a one-dimensional experiment can be recorded by omitting the t1-evolution and the pulses on the S-spin (see text). (d) Schematic outline of an F1 strip taken at the F2 = νI frequency of the COSY-based QJ experiment displaying non-transferred magnetisation as ‘diagonal peak’ and transferred magnetisation as ‘cross-peak’ with opposite sign (grey)
interaction is active for time T, yielding an effective attenuation of the signal by a factor cos(πJIST). Hence, JIS can be calculated from the relation

$$\frac{I_b}{I_a} = \cos(\pi J_{IS} T) \qquad (4.16)$$

where Ia and Ib denote the signal intensities recorded with the 180°(S) pulse in positions a and b, respectively. In the homonuclear COSY-based QJ experiment (Figure 4.7b), the I and S spins evolve as in-phase and anti-phase terms during the first constant-time period T, they are then subjected to the COSY-mixing pulse, frequency-labelled during the t1 period and transferred via the reverse pathway during the second constant-time period T of the scheme. The resulting spectrum (Figure 4.7d) consists of a so-called diagonal peak at F1 = νI with intensity Id proportional to cos²(πJIST) and a so-called cross-peak at F1 = νS with intensity Ic proportional to sin²(πJIST) [41]. Hence, JIS can be calculated from

$$\frac{I_c}{I_d} = \tan^2(\pi J_{IS} T) \qquad (4.17)$$
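As a minimal sketch of how Equations 4.16 and 4.17 are inverted in practice, the following Python functions return the magnitude of the coupling from measured intensities. The function names are hypothetical, relaxation corrections are ignored, and only the principal branch of the inverse trigonometric functions is returned, so the sketch assumes πJIST < π (spin-echo) or πJIST < π/2 (COSY-based).

```python
import math

def j_from_spin_echo(i_a, i_b, t):
    """|J_IS| in Hz from I_b / I_a = cos(pi * J * T), with T in seconds."""
    ratio = i_b / i_a
    if not -1.0 <= ratio <= 1.0:
        raise ValueError("intensity ratio outside [-1, 1]")
    return math.acos(ratio) / (math.pi * t)

def j_from_cosy_qj(i_cross, i_diag, t):
    """|J_IS| in Hz from |I_c / I_d| = tan^2(pi * J * T), with T in seconds.

    The absolute value is taken because the cross-peak and diagonal
    peak can have opposite sign in the recorded spectrum.
    """
    return math.atan(math.sqrt(abs(i_cross / i_diag))) / (math.pi * t)
```

For small couplings the cross-peak intensity scales with the square of J, so the relative error of J is roughly half the relative error of the intensity ratio.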
The COSY-based QJ experiment can be easily (and typically without loss of signal) combined with a coherence transfer to a heteronucleus, for example 15N for 3JHNHA [41] or 13C for 3JHNHB [44] and 3JCC [45], yielding 3D experiments with improved resolution. The HMQC-based QJ experiments (Figure 4.7c) follow the same principle as the homonuclear COSY-based QJ. A difficulty arises as the in-phase I magnetisation is not detected in the indirect t1-evolution of the S spin and thus cannot be quantified from the 2D HMQC. However, by omitting the t1-evolution and the pulses on the S-spin, the intensity of the I magnetisation, which is not attenuated by the JIS coupling, can be quantified from the resulting 1D experiment [46]. After suitable scaling to account for the difference of 2D and 1D detection, the JIS coupling can be determined from the relation

$$\frac{I_c}{I_d} = \sin^2(\pi J_{IS} T) \qquad (4.18)$$

where Ic is the intensity of the 2D cross-peak and Id is the scaled intensity of the 1D ‘diagonal’ peak. Additional correction factors may have to be applied for incomplete S-spin isotope labelling or imperfections of the S-spin pulses [46]. For all QJ experiments, differential relaxation of the in-phase and anti-phase terms introduces systematic errors in the derived J-values. The increased relaxation rate of the anti-phase term relative to the in-phase term results in an underestimate of the J-coupling, an effect which can be corrected for by a suitable scaling factor [41]. Other systematic errors originate from imperfect pulses and incomplete labelling [35].

Recipe 4.8: Quantitative J-Correlation
1. Record the appropriate experiment (see e.g. [35] for a list of suitable experiments) using appropriate digital sampling in the respective dimensions according to Recipe 4.2. Note that for the experiment in Figure 2.3b t1,max should be limited to about 0.1/JIS, because otherwise unwanted J-evolution occurs during t1. For optimal sensitivity of cross-peaks in the weak coupling case (πJIST2, πJIST ≪ 1), choose T ≈ T2.
2. Process using window functions with only limited resolution enhancement. Zero fill at least once in each dimension.
3. Pick peaks (both ‘diagonal’ and ‘cross’ peaks) using a peak-picking program that employs interpolation. Also pick diagonal peaks for which no cross-peak is observed. In these instances, the noise level of the experiment can provide an upper limit for the J-coupling.
4. Ensure that the intensity (volume) within the peak list file is written with sufficient precision and in absolute scaling mode (i.e. not scaled relative to any external factor, such as contour level).
5. Extract J-couplings using the appropriate peaks and formula. Be aware of the factor ‘two’ related to the de/rephasing periods that may result from the actual implementation of the pulse sequence. Correct for differential relaxation effects by applying a correction factor (see [41] for a recipe). Use the lowest contour level at which unambiguous peak detection is possible as an upper limit for the cross-peak amplitude of J-couplings where no cross-peaks are observed. This then yields upper limit estimates for the J-coupling. Note that typically the lowest reasonable contour level is about three times the rmsd of the noise.
6. If appropriate, improve the (diagonal) peak definition by simulation of the spectrum; for example using the NlinLS routine of the NMRPipe package [19]. Typically this may be useful to extract the proper intensities of partially overlapping peaks or when diagonal and cross-peaks have strongly differing line widths.
7. If possible repeat the experiment under identical conditions to obtain estimates on the statistical errors (see Recipe 4.7).

IPAP. In IPAP schemes (Figure 4.8) pulse sequence elements are used that alternately generate in-phase or anti-phase multiplets in nD spectra. Addition or subtraction of the two spectra then yields either the upfield or downfield multiplet component. The method is most often used for measuring relatively small deviations from an otherwise large and
Figure 4.8 IPAP experiments for measuring J-couplings. Left: an in-phase signal is obtained in a normal t1-evolution. The insertion of a δ-180°(I,S)-δ (IPAP) element with δ = 1/(4JIS) into the evolution period creates an anti-phase signal. Right: addition and subtraction of the two signals yields separated downfield and upfield parts of the JIS doublet
uniform J-couplings; for example for measuring the RDC contributions to heteronuclear 1J and 2J couplings (see Section 4.5). Several pulse sequence elements, such as IPAP, S3E and S3CT, have been designed to create either the in-phase or anti-phase spectra [43,47,48]. These can often be inserted without difficulty into an existing nD experiment. The IPAP pulse sequence element (Figure 4.8) constitutes a very simple building block. By setting δ = 1/(4JIS) the anti-phase spectrum is created, whereas the regular in-phase spectrum is obtained in the absence of the IPAP element. Mismatches in δ, originating from variability in the JIS-values, may result in poor cancellation during addition or subtraction. This effect can become more pronounced for RDC measurements, as the effective J-values in the aligned phase may vary considerably (see Section 4.5).

4.3.3 Deriving Structural Information from J-Couplings
The magnitude of the nJ-coupling constants depends on the local electronic environment, which is related to the conformation and dynamics of the (bio-)molecule. Although certain 1J and 2J coupling constants do contain structural information [49–52], the 3J coupling constants are the most valuable and easiest to use for deriving structural restraints. The magnitude of these 3JIS coupling constants can be correlated with the dihedral angle θ, defined by the three bonds that connect spin I and spin S, through the Karplus equation [53,54]

$$^{3}J(\theta) = A\cos^2(\theta) + B\cos(\theta) + C \qquad (4.19)$$

where A, B and C are adjustable parameters. Thus, provided that the parameters A, B and C are known, the J-coupling can be translated into a structural restraint (cf. Figure 4.9). For example, the 3JHNHA-coupling between the amide proton and the α-proton depends on the backbone torsion angle φ (Figure 4.9a). A recent parametrisation using simultaneous optimisation of multiple Karplus parameters involving several 3J couplings all related to the φ dihedral angle yielded [55,56]

$$^{3}J_{HNHA}(\phi)/\mathrm{Hz} = 7.97\cos^2(\phi - 60^\circ) - 1.26\cos(\phi - 60^\circ) + 0.63 \qquad (4.20)$$
The use of the Karplus relation for translating an experimentally derived 3J value into a dihedral angle is complicated by several factors. First, the multivalued nature of the Karplus curves can yield multiple solutions corresponding to a single 3J value. Additional information, such as other 3J couplings related to the same dihedral angle, is needed to resolve this ambiguity. An alternative and often preferred solution is direct refinement in the structure calculation protocol against the actual 3J value [57–60]. Second, the Karplus coefficients are themselves subject to experimental uncertainty, since they are usually derived from a comparison of measured values and dihedral angles obtained from high-resolution X-ray or NMR structures. This inherent uncertainty translates into an uncertainty of the derived dihedral angle restraint. In particular, if sampling of the experimental data points for the derivation of the Karplus coefficients was limited to a certain dihedral angle range, large uncertainties are to be expected for dihedral angles outside of this region.
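The multivalued nature of the Karplus curve can be made concrete with a short Python sketch. It assumes the coefficients of Equation 4.20 with signs A = 7.97, B = −1.26 and C = 0.63 Hz, solves the quadratic in cos(φ − 60°) and lists every φ consistent with a measured coupling. The function names are hypothetical and experimental errors are ignored.

```python
import math

def karplus_hnha(phi_deg, a=7.97, b=-1.26, c=0.63):
    """3J(HN,HA) in Hz for a backbone angle phi in degrees (Eq. 4.20)."""
    x = math.cos(math.radians(phi_deg - 60.0))
    return a * x * x + b * x + c

def phi_solutions(j_obs, a=7.97, b=-1.26, c=0.63):
    """All phi values (degrees, in [-180, 180)) consistent with j_obs in Hz."""
    disc = b * b - 4.0 * a * (c - j_obs)
    if disc < 0.0:
        return []  # coupling below the minimum of the curve
    roots = {(-b + s * math.sqrt(disc)) / (2.0 * a) for s in (1.0, -1.0)}
    angles = set()
    for x in roots:
        if -1.0 <= x <= 1.0:  # keep only valid cosine values
            theta = math.degrees(math.acos(x))
            for t in (theta, -theta):
                # phi - 60 = +/- theta, wrapped into [-180, 180)
                phi = (t + 60.0 + 180.0) % 360.0 - 180.0
                angles.add(round(phi, 3))
    return sorted(angles)
```

A coupling of 5 Hz, for instance, yields four candidate angles with this parametrisation, which is why additional couplings or direct refinement against the J value are needed to resolve the ambiguity.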
Figure 4.9 Karplus relations for the φ angle. (a) Fisher projection along the Ni–Cαi bond. The φ angle of residue i is defined by the C′i−1–Ni–Cαi–C′i atoms. Note that the depicted definition corresponds to a negative φ angle of about −145°. (b) 3JHNHA(φ) Karplus correlation (solid line), its estimated error (dashed lines) and 3JC′C′(φ) correlation (dashed-dotted line)
Like most NMR parameters, the 3J values are conformationally averaged [61]. While this effect provides a handle on probing the dynamical process, it may also introduce large errors on the derived restraints, if not properly accounted for. In particular, the averaging of sidechain rotameric states can greatly change the observed magnitude of the 3J value, an effect that can only be evaluated from analysing multiple 3J coupling constants in parallel [62]. A further complication is that the magnitude of the 3J-coupling also depends on other factors such as substituent effects, geometric distortions and hydrogen bonding [56]. Nevertheless, the high redundancy of information from several J-couplings defining the same torsion angle has led to remarkably accurate, self-consistent sets of Karplus coefficients for the φ [56] and χ1 [62] angles.
4.4 Hydrogen Bond Restraints
4.4.1 NMR H-Bond Observables
H-bonds are key features of biomacromolecules, stabilising secondary and tertiary structures and participating in many enzymatic reactions [63]. Their presence can be detected by a number of NMR observables comprising chemical shifts, coupling constants, hydrogen exchange rates and hydrogen/deuterium fractionation factors [5,64]. Besides their general physicochemical importance, a knowledge of specific H-bond donor-acceptor pairs provides strong constraints for NMR structure calculations. In proteins, the slow hydrogen exchange of amide protons is often used as evidence for the existence of an intramolecular H-bond [65]. The exchange experiment is easily carried out either by lyophilising a protein sample dissolved in H2O and subsequently dissolving it in D2O or by appropriate mixing with D2O. The exchange is then detected by the disappearance of resonances in a time series of sequential 1H-15N correlation experiments. Depending on pH and the protection of the NH group, exchange rates can vary from milliseconds to
years. The slow exchange of amide hydrogens forming intramolecular H-bonds is easily distinguished from the fast exchange of amides involved in transient H-bonds to water. The exchange can also be quantified for very large proteins since the detection of 1H-15N correlations is the only requirement. However, the method does not provide information about the H-bond acceptor, whose identity is usually inferred from NOE data or by spatial proximity in preliminary structure calculations. The observation of scalar couplings across the H-bonds in biomolecules provides an unambiguous way to identify both donor and acceptor groups. The H-bond scalar couplings (HBCs) connect magnetically active nuclei on both sides of the hydrogen bridge via the magnetic polarisation of its electron cloud. In biomolecules, H-bonds observable by HBCs comprise the canonical N–H···N and N–H···O=C H-bonds of regular secondary structure elements in nucleic acids [66,67] and proteins [68,69]. In favourable cases, complete H-bond networks in biomacromolecules can be established from COSY-type experiments. The size of the HBCs is determined by the overlap of H-bond donor and acceptor electronic orbitals [70] and thus provides a sensitive measure of H-bond geometry, i.e. HBCs depend exponentially on donor-acceptor distances [71] as well as on certain H-bond angles [5]. Due to the requirement that the HBCs connect NMR-observable nuclei, their observation in biomolecules has been limited to HBCs between 15N, 13C, 1H or 31P nuclei, but excludes oxygen. Experimental procedures for their detection, theoretical descriptions and applications of HBCs have been reviewed extensively [5].

4.4.2 Detection of N–H···O=C H-Bonds in Proteins
In proteins, N–H···O=C H-bonds (Figures 4.5 and 4.10) induce detectable HBCs between the donor amide 15N or 1HN nuclei and the carbonyl acceptor 13C′ nuclei [68,69,73]. The three-bond h3JNC′ HBCs are easier to detect than the two-bond h2JHC′ HBCs, but their small size of about 0.2–0.9 Hz limits the sensitivity of the experiments for larger molecular weights. Nevertheless, h3JNC′ HBCs are easily accessible in small (10–15 kDa) 13C/15N-labelled proteins, provided they can be studied at concentrations of ≥0.5 mM. Additional deuteration in combination with TROSY allows detection even for medium-sized proteins of up to about 30 kDa [74]. The h3JNC′ HBCs can be detected and quantified by the long-range HNCO experiment (TROSY version, Figure 4.11), which is a modification of the standard HNCO for backbone assignments with longer N ↔ C′ transfer times [68,69]. During the N → C′ and C′ → N INEPT transfer periods of the long-range HNCO, both h3JNC′ and the large covalent 1JNC′ (≈15 Hz) couplings are active for a period 2T − 2ε, where the length of the delay ε is determined by the position of the 13C′ 180° pulses. Optimal transfer from the 15N nucleus via the H-bond to the 13C′ nucleus of the acceptor and back is achieved when the one-bond coupling 1JNC′ is approximately refocused, i.e. 2T − 2ε = n/1JNC′, and when the total transfer time is set to a value that is close to the T2 relaxation time of the 15N nucleus. In practice, a total transfer time of about 133 ms (n = 2) works well for small proteins, which is realised by setting the delays T = 66.5 ms and ε = 4 ms ≈ 0. As a result, the long-range HNCO yields cross-peaks between the donor amide 1HNi and 15Ni as well as the acceptor

1 The symbol hnJAB is used for HBCs between nuclei A and B in order to emphasise that one of the n bonds connecting the two nuclei in the chemical structure is actually an H-bond.
Figure 4.10 Selected region of the 2D long-range H(N)CO TROSY spectrum recorded on a 2 mM (monomer) sample of 15N/13C RANTES dimer (MWT 2 × 8.5 kDa), pH 3.8, 310 K, 800 MHz spectrometer with cryogenic probe. The total experimental time was 46.5 h. Cross-peaks marked as Resi···Resj are due to h3JNiC′j HBCs between the 15N nucleus of residue i and the 13C′ nucleus of residue j (see inset for definition). Superscripts ‘m’ and ‘d’ denote intramonomer and intermonomer (= dimer) h3JNiC′j correlations, respectively. Cross-peaks marked by a single residue number correspond to incompletely suppressed sequential 1JNiC′i−1 (superscript ‘s’) or intraresidue 2JNiC′i (superscript ‘i’) correlations. Adapted from [72]
carbonyl 13C′j nuclei with intensities (Icross) proportional to cos²(2π 1JNC′ T) sin²(2π h3JNC′ T). Thus all partner nuclei of the H-bond are ‘visible’. To determine the size of the h3JNC′ coupling by quantitative J-correlation, a reference experiment is recorded using the same total lengths 2T for the two INEPT periods, but with ε = 1/(4 1JNC′) ≈ 16.5 ms. This reduces the effective time for 15N–13C′ de- and refocusing to 2T − 2ε = (n − 1/2)/1JNC′, yielding primarily transfer via the large 1JNC′ couplings. The resulting 3D HNCO reference spectrum shows correlations between amide 1HNi, 15Ni and the carbonyl 13C′i−1 nuclei with intensities proportional to sin²(2π 1JNC′ (T − ε)) cos²(2π h3JNC′ (T − ε)). Therefore, the intensity ratio of cross and reference peaks is given by

$$\frac{I_{cross}}{I_{ref}} = \frac{\cos^2(2\pi\,{}^{1}J_{NC'}\,T)\,\sin^2(2\pi\,{}^{h3}J_{NC'}\,T)}{\sin^2(2\pi\,{}^{1}J_{NC'}\,(T-\epsilon))\,\cos^2(2\pi\,{}^{h3}J_{NC'}\,(T-\epsilon))} \qquad (4.21)$$

and the size of h3JNC′ can be determined by a numerical inversion of this implicit function using the known values of 1JNC′. As an alternative, the coupling can be estimated to good
Figure 4.11 Long-range-HNCO-TROSY pulse sequence. The sequence corresponds to a conventional 3D HNCO-TROSY. However, the delays T and ε are adjusted to achieve transfer either via h3JNiC′j H-bond couplings (‘cross’ experiment) or via the usual 1JNiC′i−1 couplings (‘reference’ experiment, see text). Adapted from [72]
approximation as jh3 JNC0 j ðIcross =Iref Þ1=2 =ð2pTÞ for 1JNC0 values close to 15 Hz and j2ph3 JNC0 Tj 1. For smaller proteins, the resolution of the 2D H(N)CO versions of the long-range and reference experiments is often sufficient. In these cases, the 15 N t2-evolution can be omitted in the scheme of Figure 4.11. Recipe 4.9: Setting up a Long-Range HNCO Experiment for H-Bond Detection 1. The success of the experiment depends critically on the relaxation behaviour and concentration of the protein. Typical amide nitrogen (TROSY for deuterated proteins) T2s should be on the order of 100 ms or larger. Protein concentrations should be in excess of 0.5 mM. 2. Set up pulse power levels, carrier frequencies, sweep widths and acquisition times parameters as for a normal HNCO or HNCO-TROSY (for deuterated proteins) sequence. A very detailed protocol is given in [72]. 3. Record a 1 JNC0 2D H(N)CO test spectrum. Set the total length of the N–C0 transfer periods to a small value, which is optimal for 1 JNC0 detection, for example T ¼ 16.5 ms, e ¼ 4 ms. Record a 2D H(N)CO spectrum with a small number of scans (e.g. 4). This should result in a normal 2D H(N)CO spectrum showing all the expected 1 H-13 C0 i-1 correlations with good sensitivity, for example signal/noise > 10. 4. Record the 1 JNC0 reference spectrum. Set the total length of the N–C0 transfer periods to the same value as will be used for the cross experiment and the e delay for best observation of the 1 JNC0 ‘diagonal’ (1 H-13 C0 i-1 ) peaks (e.g. T ¼ 66 ms, e ¼ 16.5 ms). Set all other parameters as for the test spectrum and acquire the data. The spectrum should still have the same appearance as a normal H(N)CO, but at a lower signal/noise ratio. For example, protonated (deuterated) 15 N/13 C-labelled ubiquitin (1 mM) yields typical signal/noise ratios of 43:1 at 298 K [75] and protonated 15 N/13 C-labelled dimeric
Measurement of Structural Restraints
RANTES (MWT 2 × 8.5 kDa, 2 mM monomer) signal/noise ratios 4:1 at 310 K after 20 minutes acquisition on an 800 MHz instrument equipped with a cryoprobe.
5. Estimate the experimental time (number of scans) needed for the H-bond experiment. This time can be estimated from the signal/noise ratio of the reference experiment as
$$ |J_{min}(NS_{cross})| = (NS_{ref}/NS_{cross})^{1/4}\,(S_{ref}/N_{top,ref})^{-1/2}/(2\pi T) $$
where $J_{min}(NS_{cross})$ denotes the minimal detectable J value, $NS_{ref}$ and $NS_{cross}$ are the numbers of scans in the reference and cross experiments, $S_{ref}$ is the peak height of a 'diagonal' peak in the reference spectrum and $N_{top,ref}$ is the top level of the noise in the reference spectrum. Typical $|{}^{h3}J_{NC'}|$ values are in the range of 0.2–0.9 Hz, with $|{}^{h3}J_{NC'}|$ values in β-sheets (0.5 ± 0.2 Hz) larger than in α-helices (0.3 ± 0.2 Hz). Check whether the $J_{min}$ for your anticipated experiment is sufficient for the detection of this $|{}^{h3}J_{NC'}|$ range. Otherwise increase $NS_{cross}$. Note that the necessary experimental time rapidly becomes limiting. In this case an increase in protein concentration or deuteration may help. For the protonated (deuterated) ubiquitin example, a total experimental time of 12.3 h ($NS_{cross}$ = 136) yields $|J_{min}|$ = 0.28 (0.07) Hz, which covers most of the usual ${}^{h3}J_{NC'}$ values. For RANTES, a total experimental time of 46.5 hours ($NS_{cross}$ = 512) yields $|J_{min}|$ = 0.36 Hz, which still allows detection of most β-sheet H-bonds.
6. Record the long-range H(N)CO spectrum. Set the length of the N–C′ transfer periods to the same value used for the reference experiment (T = 66 ms) and the ε delay for best observation of ${}^{h3}J_{NC'}$ correlations (ε = 4 ms). Adjust the number of scans according to step 5. Note that if a 3D HNCO is required for resolution, steps 4–6 apply accordingly to the 3D versions. The 2D and 3D versions of the sensitivity-enhanced TROSY of Figure 4.11 have identical signal/noise ratios.
7. Process the spectrum and evaluate the data according to Equation 4.21.
8. If possible, repeat the experiment to obtain estimates of the statistical errors.
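The estimate in step 5 can be turned into a small planning helper. The following is only a sketch of the formula quoted above; the function name and the reference values used in the example (8 reference scans, a reference signal/noise ratio of 20) are hypothetical and are not taken from the ubiquitin or RANTES data.

```python
import math


def j_min(ns_ref, ns_cross, snr_ref, transfer_time):
    """Minimal detectable |J| in Hz.

    ns_ref, ns_cross : number of scans in the reference and cross experiments
    snr_ref          : S_ref / N_top,ref of a 'diagonal' peak in the reference spectrum
    transfer_time    : N-C' transfer period T in seconds
    """
    return ((ns_ref / ns_cross) ** 0.25
            * snr_ref ** -0.5
            / (2.0 * math.pi * transfer_time))


# Hypothetical reference experiment: 8 scans, signal/noise 20, T = 66 ms.
for ns_cross in (136, 512, 2048):
    print(ns_cross, round(j_min(8, ns_cross, 20.0, 0.066), 3), "Hz")
```

Because the number of scans enters only as the fourth root, a 16-fold longer cross experiment lowers the detection limit by a factor of two, which is why higher concentration or deuteration is usually the more effective remedy.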
The detected H-bonds can be incorporated into structure calculations by the usual H-bond distance restraints ($r_{HO}$ = 2.1 ± 0.5 Å, $r_{NO}$ = 2.0 ± 0.5 Å) or by fitting to the heuristic formula [70,76]
$$ {}^{h3}J_{NC'} = -357\ \mathrm{Hz}\,\exp(-3.2\,r_{HO}/\text{Å})\,\cos^2\theta \qquad (4.22) $$
where $\theta$ represents the H···O=C angle.
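As a quick illustration of how steeply Equation 4.22 depends on geometry, the sketch below evaluates it for a few H···O distances. The function name and the chosen geometries are illustrative assumptions, not values from the text.

```python
import math


def h3j_nc(r_ho_angstrom, theta_deg):
    """Heuristic h3J(NC') in Hz from Equation 4.22.

    r_ho_angstrom : H...O distance in Angstrom
    theta_deg     : H...O=C angle in degrees
    """
    cos_theta = math.cos(math.radians(theta_deg))
    return -357.0 * math.exp(-3.2 * r_ho_angstrom) * cos_theta ** 2


# Hypothetical geometries: linear and bent H-bonds at several distances.
for r_ho in (1.8, 2.0, 2.2):
    for theta in (180.0, 150.0):
        print(r_ho, theta, round(h3j_nc(r_ho, theta), 2), "Hz")
```

A change of 0.2 Å in the H···O distance alters the predicted coupling by almost a factor of two, so the measured couplings restrain the distance quite tightly.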
4.5 Orientational Restraints
The NMR structural restraints discussed so far are only local in nature. NOEs provide information over a relatively short distance, scalar couplings are restricted to a few chemical bonds, and chemical shifts likewise report only on the local electronic environment. In spite of their general availability in large numbers, such restraints lead to an imprecise definition of long-range order in biomolecules due to the accumulation of errors. Such missing information on long-range order can be provided by methods using weak anisotropic orientation of biomolecules in solution. In the anisotropic situation, dipolar, CSA or quadrupolar interactions no longer average to zero. The resulting 'residual' tensorial interactions report on the orientation of local groups relative to a common overall director, thereby yielding information about the relative orientation of individual groups even over long distances.
Anisotropic orientation can be achieved by several mechanisms. First, all molecules possess an intrinsic anisotropic magnetic susceptibility [77], which is particularly pronounced for paramagnetic molecules [75,78]. The anisotropic susceptibility leads to a net alignment in a static magnetic field. However, this intrinsic alignment is typically on the order of 10⁻⁴ or less at current magnetic field strengths, which results in residual interactions of only a few Hertz at best. Stronger alignment can be accomplished by the introduction of external substances such as anisotropic liquid crystalline media into the biomolecular solution [79]. The degree of alignment by such media can usually be tuned such that the residual interactions are in the tens of Hertz, i.e. not too strong to cause complications from the breakdown of the weak coupling limit, but strong enough to be detectable with high sensitivity. In the following, we discuss the physical background, the different alignment methods, practicalities of measurement, data analysis and the use as structural information of these residual anisotropic interactions.
4.5.1 Physical Background
4.5.1.1 Dipolar Couplings in Anisotropic Solution
According to Equation 4.1b, the dipolar energy $H_d$ between two spins $\vec{I}$ and $\vec{S}$ in a molecule depends on the size and direction of the internuclear vector $\vec{r}$
$$ H_d = \frac{\gamma_S \gamma_I \hbar^2 \mu_0}{4\pi r^3} \left\{ \vec{S}\cdot\vec{I} - 3\,\frac{(\vec{S}\cdot\vec{r})(\vec{I}\cdot\vec{r})}{r^2} \right\} \qquad (4.23) $$
In the heteronuclear case, the transverse spin components oscillate rapidly and average to zero. Thus they can be neglected (secular approximation) and only the z-components remain
$$ H_d^{hetero} = \frac{\gamma_S \gamma_I \hbar^2 \mu_0}{4\pi r^3} \left\{ S_z I_z - 3\,(S_z \cos\theta)(I_z \cos\theta) \right\} = -\frac{\gamma_S \gamma_I \hbar^2 \mu_0}{4\pi r^3}\, S_z I_z\, (3\cos^2\theta - 1) \qquad (4.24) $$
where $\theta$ is the angle between the internuclear vector $\vec{r}$ and the static magnetic field. Rotational diffusion of the molecule in solution changes the orientation of the internuclear vector $\vec{r}$ relative to the field and the effective dipolar interaction becomes an average over both time and all possible states of the ensemble
$$ \langle H_d^{hetero} \rangle = -\frac{\gamma_S \gamma_I \hbar^2 \mu_0}{2\pi r^3}\, S_z I_z \left\langle \frac{3\cos^2\theta - 1}{2} \right\rangle = -\frac{\gamma_S \gamma_I \hbar^2 \mu_0}{2\pi r^3}\, S_z I_z\, \langle P_2(\cos\theta) \rangle \qquad (4.25) $$
where it has been assumed that the internuclear distance r is constant and $P_2(x) = \tfrac{1}{2}(3x^2 - 1)$ is the second Legendre polynomial. In isotropic solution, the average $\langle P_2(\cos\theta) \rangle$ and hence the heteronuclear dipolar Hamiltonian $\langle H_d^{hetero} \rangle$ (Equation 4.25) vanishes. However, under anisotropic conditions the average no longer vanishes and may become observable, containing valuable information about the average orientation of the internuclear vector. Denoting by D the residual dipolar coupling (RDC) constant and by $D_{max}$ its maximal value
$$ D = -\frac{\gamma_I \gamma_S \hbar \mu_0}{4\pi^2 r^3}\, \langle P_2(\cos\theta) \rangle = D_{max}\, \langle P_2(\cos\theta) \rangle \qquad (4.26) $$
the average dipolar Hamiltonian takes the form
$$ \langle H_d^{hetero} \rangle = h D\, S_z I_z = h D_{max}\, \langle P_2(\cos\theta) \rangle\, S_z I_z \qquad (4.27) $$
The spin part $S_z I_z$ of $\langle H_d^{hetero} \rangle$ is identical to the spin part of the heteronuclear J-coupling Hamiltonian $H_J^{hetero} = hJ\,S_z I_z$. Therefore the heteronuclear dipolar Hamiltonian behaves exactly like a scalar coupling, leading to a total splitting T of the resonance line as
$$ T = J + D \qquad (4.28) $$
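The size of the prefactor in Equation 4.26 is easy to check numerically. The sketch below evaluates it for an amide 1H-15N pair; the function name is hypothetical, the gyromagnetic ratios are approximate literature values, and the bond length of 1.04 Å is an assumed input.

```python
import math

MU_0 = 4.0e-7 * math.pi       # vacuum permeability in T m / A (to good approximation)
HBAR = 1.054571817e-34        # reduced Planck constant in J s
GAMMA_1H = 2.6752218744e8     # 1H gyromagnetic ratio in rad / (s T)
GAMMA_15N = -2.7126e7         # 15N gyromagnetic ratio in rad / (s T), approximate


def d_max(gamma_i, gamma_s, r):
    """Prefactor D_max of Equation 4.26 in Hz for an internuclear distance r in metres."""
    return -gamma_i * gamma_s * HBAR * MU_0 / (4.0 * math.pi ** 2 * r ** 3)


d_max_nh = d_max(GAMMA_1H, GAMMA_15N, 1.04e-10)
print(round(d_max_nh / 1000.0, 1), "kHz")

# A degree of order of about 1e-3 scales this down to the tens-of-Hz range.
print(round(d_max_nh * 1.0e-3, 1), "Hz")
```

The result is a little under 22 kHz, and it is positive because the gyromagnetic ratio of 15N is negative. The strong 1/r³ dependence means that the assumed bond length has a noticeable effect on the absolute scale of the alignment tensor.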
The maximal dipolar coupling between nearby nuclei is very substantial, for example $D_{max}$ equals 21 kHz for the amide 1H-15N spin pair ($r_{NH}$ = 1.04 Å). Thus, if the anisotropy becomes too large, the strong coupling between many spins will lead to very complicated spectra. It is therefore best to adjust the degree of anisotropy such that the RDCs are not yet too strong, but still easy to detect [79]. Optimal degrees of orientation are on the order of $10^{-3}$, where for example maximal ${}^1D_{HN}$ RDCs are in the range of 20–30 Hz.
4.5.1.2 The Alignment Tensor
The residual dipolar coupling in Equation 4.27 is proportional to the ensemble average $\langle P_2(\cos\theta) \rangle$ containing the angle between the internuclear vector and the static magnetic field. For a rigid molecule, this ensemble average can be described in a moving molecular coordinate system by defining unit vectors $\hat{e}_x, \hat{e}_y, \hat{e}_z$ along the molecular x-, y-, and z-directions. In this system, an internuclear unit vector $\hat{r}$ with coordinates $(c_x, c_y, c_z)$ takes the form
$$ \hat{r} = c_x \hat{e}_x + c_y \hat{e}_y + c_z \hat{e}_z = (\hat{e}_x\ \hat{e}_y\ \hat{e}_z) \begin{pmatrix} c_x \\ c_y \\ c_z \end{pmatrix} \qquad (4.29) $$
The cosine of $\theta$ corresponds to the scalar product between $\hat{r}$ and a unit vector $\hat{b}$ parallel to the direction of the magnetic field:
$$ \cos\theta = \hat{b}\cdot\hat{r} = (\hat{b}\cdot\hat{e}_x\ \ \hat{b}\cdot\hat{e}_y\ \ \hat{b}\cdot\hat{e}_z) \begin{pmatrix} c_x \\ c_y \\ c_z \end{pmatrix} = (C_x\ C_y\ C_z) \begin{pmatrix} c_x \\ c_y \\ c_z \end{pmatrix} = \sum_{i=x,y,z} C_i c_i \qquad (4.30) $$
where $C_{x,y,z}$ are the direction cosines of $\hat{b}$ relative to the molecular coordinate system $\hat{e}_x, \hat{e}_y, \hat{e}_z$. Therefore by substitution,
$$ \langle P_2(\cos\theta) \rangle = \frac{3}{2} \left\langle \Big( \sum_{i=x,y,z} C_i c_i \Big)^2 \right\rangle - \frac{1}{2} = \frac{3}{2} \sum_{i=x,y,z} \sum_{j=x,y,z} \langle C_i C_j \rangle\, c_i c_j - \frac{1}{2} \qquad (4.31) $$
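Here, the ensemble and time average is over the molecular orientation with respect to the magnetic field, while the coordinates $(c_x, c_y, c_z)$ of the internuclear vector $\hat{r}$ are time-independent in the molecular frame. Since $\hat{r}$ is a unit vector, $c_x^2 + c_y^2 + c_z^2 = 1$, and therefore Equation 4.31 can be written as
$$ \langle P_2(\cos\theta) \rangle = \sum_{i=x,y,z} \sum_{j=x,y,z} c_i S_{ij} c_j = (c_x\ c_y\ c_z) \begin{pmatrix} S_{xx} & S_{xy} & S_{xz} \\ S_{yx} & S_{yy} & S_{yz} \\ S_{zx} & S_{zy} & S_{zz} \end{pmatrix} \begin{pmatrix} c_x \\ c_y \\ c_z \end{pmatrix} \qquad (4.32) $$
where
$$ S_{ij} = \frac{3}{2} \langle C_i C_j \rangle - \frac{1}{2}\, \delta_{ij} \qquad (4.33) $$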
are the elements of the Saupe order matrix [80] and $\delta_{ij}$ is the Kronecker delta [$\delta_{ij}$ = 1 (for i = j), 0 (otherwise)]. The Saupe matrix contains all the information about the distribution of the molecular alignment, which is needed to calculate the RDCs. It is also called the alignment tensor. The Saupe matrix is real, traceless ($\langle C_x^2 + C_y^2 + C_z^2 \rangle = 1$), and symmetric ($\langle C_i C_j \rangle = \langle C_j C_i \rangle$), and thus contains only five independent elements. These properties ensure that there is always a coordinate system where the Saupe matrix is diagonal. In this principal axis system (PAS), the dipolar coupling as obtained by Equations 4.26 and 4.32 simplifies to
$$ D = D_{max}\, \langle P_2(\cos\theta) \rangle = D_{max} \left[ \tilde{S}_{xx}\tilde{c}_x^2 + \tilde{S}_{yy}\tilde{c}_y^2 + \tilde{S}_{zz}\tilde{c}_z^2 \right] \qquad (4.34) $$
where the tilde denotes the elements of S and c in the principal axis system. Usually the principal axis system is chosen such that $|\tilde{S}_{zz}| > |\tilde{S}_{yy}| > |\tilde{S}_{xx}|$. Using polar coordinates $\tilde{\Theta}, \tilde{\Phi}$ for the description of the internuclear unit vector in the principal axis system, Equation 4.34 becomes
$$ D = D_{max}\, \tilde{S}_{zz} \left[ P_2(\cos\tilde{\Theta}) + \frac{\eta}{2} \sin^2\tilde{\Theta} \cos 2\tilde{\Phi} \right] \qquad (4.35) $$
with $\eta$ being the asymmetry parameter defined as $\eta = (\tilde{S}_{xx} - \tilde{S}_{yy})/\tilde{S}_{zz}$. Alternative forms are obtained by substituting the axial, $\tilde{S}_a = \tilde{S}_{zz}/2$, and rhombic, $\tilde{S}_r = (\tilde{S}_{xx} - \tilde{S}_{yy})/3$, components of the Saupe matrix as
$$ D = D_{max} \left[ \tilde{S}_a \left( 3\cos^2\tilde{\Theta} - 1 \right) + \frac{3}{2}\, \tilde{S}_r \sin^2\tilde{\Theta} \cos 2\tilde{\Phi} \right] = D_a \left[ \left( 3\cos^2\tilde{\Theta} - 1 \right) + \frac{3}{2}\, R \sin^2\tilde{\Theta} \cos 2\tilde{\Phi} \right] \qquad (4.36) $$
where $D_a = D_{max}\tilde{S}_a$ and $R = \tilde{S}_r/\tilde{S}_a = 2\eta/3$ is the rhombicity.
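The equivalence of the Cartesian form (Equation 4.34) and the axial/rhombic form (Equation 4.36) can be verified numerically. The sketch below does this for a hypothetical traceless set of principal values; all names and numbers are illustrative assumptions rather than data from the text.

```python
import math

D_MAX = 21.7e3                                  # Hz, approximate 1H-15N value
S_XX, S_YY, S_ZZ = -0.06e-3, -1.14e-3, 1.2e-3   # hypothetical principal values (traceless)

D_A = D_MAX * S_ZZ / 2.0                        # axial component, Da = Dmax * Sa
R = ((S_XX - S_YY) / 3.0) / (S_ZZ / 2.0)        # rhombicity, R = Sr / Sa


def rdc_cartesian(theta, phi):
    """Equation 4.34: RDC from the direction cosines in the principal axis system."""
    cx = math.sin(theta) * math.cos(phi)
    cy = math.sin(theta) * math.sin(phi)
    cz = math.cos(theta)
    return D_MAX * (S_XX * cx * cx + S_YY * cy * cy + S_ZZ * cz * cz)


def rdc_axial_rhombic(theta, phi):
    """Equation 4.36: the same RDC expressed through Da and R."""
    return D_A * ((3.0 * math.cos(theta) ** 2 - 1.0)
                  + 1.5 * R * math.sin(theta) ** 2 * math.cos(2.0 * phi))


for theta, phi in [(0.0, 0.0), (math.pi / 2, 0.0), (math.pi / 2, math.pi / 2), (0.7, 2.1)]:
    print(round(rdc_cartesian(theta, phi), 3), round(rdc_axial_rhombic(theta, phi), 3))
```

The first three orientations lie along the z-, x- and y-axes of the PAS and return $D_{max}\tilde{S}_{zz}$, $D_{max}\tilde{S}_{xx}$ and $D_{max}\tilde{S}_{yy}$, respectively. The two forms agree only because the principal values sum to zero.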
These equations can also be reformulated by using irreducible representations of the Saupe matrix and the molecular coordinates [81]
$$ D = D_{max}\, \frac{4\pi}{5} \sum_{i=-2}^{2} S_i^{*}\, Y_{2,i}(\Theta, \Phi) \qquad (4.37) $$
where $Y_{2,i}(\Theta, \Phi)$ is a second order spherical harmonic and $S^{*}_{-2,-1,\ldots,2}$ are five complex numbers representing the Saupe order matrix in irreducible form:
$$ S_0 = \sqrt{\frac{5}{4\pi}}\, S_{zz}; \quad S_1 = -\sqrt{\frac{5}{4\pi}} \sqrt{\frac{2}{3}} \left( S_{xz} + i S_{yz} \right); \quad S_2 = \sqrt{\frac{5}{4\pi}} \left( \sqrt{\frac{1}{6}}\, S_{xx} - \sqrt{\frac{1}{6}}\, S_{yy} + i \sqrt{\frac{2}{3}}\, S_{xy} \right); \quad S_i^{*} = (-1)^i S_{-i} \qquad (4.38) $$
Equation 4.37 is valid in any coordinate system and shows that all parameters describing the molecular alignment enter into the RDC in a linear manner. This property greatly simplifies the problem of finding the best alignment tensor for a given set of RDCs and coordinates [81–84].
4.5.1.3 Chemical Shifts in Anisotropic Solution
Similar to the dipolar coupling, the chemical shift is orientation-dependent. This is expressed in the chemical shift (Zeeman) Hamiltonian of spin I as
$$ H = -\hbar \gamma_I\, \vec{I}\, (1 + \delta_I)\, \vec{B}_0 \qquad (4.39) $$
where $\vec{B}_0$ is the static magnetic field and $\delta_I$ is the chemical shift tensor of spin I, which is a quantity that is fixed in the molecular frame and represented by a 3×3 matrix. In isotropic solution, rotational averaging of the chemical shift tensor leads to the observation of the isotropic chemical shift, $\delta_{iso}$, which is the average of the diagonal elements of this tensor.
$$ \delta_{iso} = \frac{1}{3} \left( \delta_{xx} + \delta_{yy} + \delta_{zz} \right) \qquad (4.40) $$
In an anisotropic medium the remaining part of the chemical shift anisotropy tensor does not average to zero. This results in a small, but detectable shift deviation $\Delta\delta$ from $\delta_{iso}$. An analogous derivation as in the previous section for the dipolar Hamiltonian yields the value of this residual chemical shift anisotropy (RCSA) [85] as
$$ \Delta\delta = \frac{2}{3} \sum_{i=x,y,z} \sum_{j=x,y,z} S_{ij}\, \delta_{ji} = \frac{2}{3} \sum_{i=x,y,z} \sum_{j=x,y,z} \tilde{S}_{ii} \cos^2\theta_{ij}\, \tilde{\delta}_{jj} \qquad (4.41) $$
where $S_{ij}$ is again the Saupe matrix, $\tilde{S}$ and $\tilde{\delta}$ are the respective tensors in their principal axes system, and $\theta_{ij}$ is the angle between the i-th principal axis of $\tilde{S}$ and the j-th principal axis of $\tilde{\delta}$. In contrast to the RDC, which provides the orientation of the internuclear vector in the molecular alignment frame, the RCSA corresponds to a projection of the CSA tensor onto
the molecular alignment frame. In general, the principal components of the CSA tensor are not collinear with internuclear vectors involving the particular nucleus. Therefore the RCSA can offer distinct structural information. Similar to RDCs, the information can be used as restraints to refine and validate structures [85–88]. A drawback is that some prior knowledge about the size and orientation of the chemical shift tensor is required. Furthermore, the determination of precise chemical shift differences between oriented and non-oriented samples under exactly identical conditions (temperature, pH, etc.) presents challenges. For practical reasons RCSAs are typically measured for nuclei having a large CSA, such as nitrogen, carbonyl or aromatic carbon nuclei in proteins or nucleic acids. Under the usual alignment conditions, the observed RCSA values for those nuclei typically range between 100 and 200 ppb.
4.5.2 Alignment Methods
Current methods for obtaining anisotropic alignment of biomolecules either rely on their intrinsic anisotropic magnetic susceptibility or use auxiliary molecules or particles introduced into the solvent, which in turn induce a net alignment of solute biomolecules by steric, electrostatic or chemical interactions.
4.5.2.1 Intrinsic Molecular Alignment
The anisotropy of the magnetic susceptibility of a molecule causes a net alignment in an external magnetic field, which may lead to detectable residual 'tensorial' couplings in NMR. Very strong alignments by the intrinsic anisotropic magnetic susceptibility were originally observed in the ordered phases of liquid crystals [80]. However, in this case, the dipolar couplings are very strong (kHz), leading to highly complex spectra of the liquid crystal molecules. In the early 1980s Bothner-By, Van Zijl and co-workers showed that the magnetic alignment of smaller organic molecules is much weaker and leads to easily interpretable first order spectra with detectable RDCs [89,90]. In addition to magnetic alignment, electric alignment has also been shown to yield detectable residual quadrupolar splittings for small molecules in organic solvents [91]. Unfortunately, electrical alignment is not practical for biomolecules in aqueous solvents due to their strong conductance. In general, the anisotropic magnetic susceptibility is caused by the electrons of a molecule and is particularly strong for paramagnetic molecules. Thus Tolman and Prestegard [78] could show the first readily detectable RDCs in a biomolecule for the paramagnetic myoglobin, where the iron-containing heme induces DHN RDCs on the order of a few Hz. At this time, Kung et al. [92] also demonstrated that diamagnetic double-stranded DNA aligns itself in a magnetic field and leads to DHC RDCs of a few Hz. The anisotropy of diamagnetic molecules mainly results from electron-rich chemical groups such as aromatic rings or peptide bonds.
The effects of these groups are additive and therefore stronger for the regularly stacked bases of nucleic acid duplexes. For the more irregular protein structures, they are, however, usually very small with typical RDC sizes below one Hz. Therefore, intrinsic magnetic alignment has been exploited mainly in nucleic acid complexes [93] and in metal-binding proteins complexed to paramagnetic lanthanides [94]. More recently, several protocols have been developed to artificially introduce lanthanides into proteins by lanthanide binding tags [95], which in favourable cases have yielded DHN RDCs on the order of 20 Hz [96].
4.5.2.2 Indirect Alignment by External Media
The intrinsic alignment methods are limited by the weak alignment of diamagnetic molecules or the requirement of having a paramagnetic centre available in a biomolecule. A much more general and practical method is the indirect alignment by additional substances that yield substantially larger RDCs [79]. These systems are usually also tunable in their strength by the variation of their concentration or other modifications. A large number of such solvent/media systems have been developed, which achieve alignment by a variety of mechanisms. The topic has been reviewed by Prestegard et al. [97]. Table 4.1 lists the most common media and some of their properties.
Choice of medium. Unfortunately, there is no single, ideal medium that is applicable to any biomolecule, as each system has different tolerances towards conditions such as salt, pH, temperature, detergent or organic solvent content. Therefore, a good understanding of physico-chemical properties and, usually, trial-and-error are required for finding a suitable medium for a particular application. A prime consideration is the principal difference between uncharged and charged media: uncharged media mainly interact by steric exclusion of the solute, whereas charged media in addition have strong electrostatic interactions. This leads to desired variations in the alignment tensor and hence to new information. However, charged media strongly attract solutes of opposing charge, which may make NMR observation impossible. Many biological media used for alignment, such as Pf1 or fd phages, DNA, cellulose, or purple membranes are negatively charged. For such media, it is often only possible to work with proteins with net negative charges, i.e. at a pH above the pI. Electrostatic interactions are decreased at higher ionic strengths. Thus too strong interactions can sometimes be attenuated to an acceptable level by the addition of salt.
Conversely, uncharged media like bicelles and alkyl-poly(ethylene glycol)/hexanol Lα phases often contain hydrophobic components, which may interact too strongly with hydrophobic proteins, such as membrane-interacting proteins. In this case, as in other cases of too strong interactions, it may help to reduce the concentrations of alignment medium and solute. The stability of the medium under the required physicochemical conditions is a mandatory condition. Thus bicelles only form above room temperature and are not suitable for working at low temperature. In contrast, many other media are not stable at elevated temperatures. Polyacrylamide gel [115,116] is more or less inert and can tolerate a wide range of temperatures and pH, mild detergent or organic solvents. However, it is not suited for too large protein sizes and occasionally interactions with parts of the protein surface have been observed. In some cases, additives such as detergents or organic solvents may be required to keep the proteins solubilised. This destabilises many media such as bicelles, alkyl-poly(ethylene glycol)/hexanol Lα phases, Pf1 or fd phages.
Optimisation of sample conditions. The essential criterion in optimising sample conditions is the achievement of suitable alignment with minimal distortion in the resonance line shape. Evidence for suitable alignment is a range of measured RDCs at least one order of magnitude larger than the measurement errors. An upper limit in achievable alignment usually results from a deterioration of spectral quality. Thus, too strong alignment causes a splitting of 1H resonances into broad multiplets from 1H-1H long-range dipolar couplings. In addition, the increased viscosity in the alignment medium or more specific interactions of the solute molecule with the medium often lead to a decrease in transverse relaxation times and subsequent line broadening. Figure 4.12 illustrates the change of the quality of
Table 4.1 Properties of common media for anisotropic orientation of biomolecules

Medium | Charge | Advantage | Limitations | Specific Applications | Reference
Uncharged
lipid bicelles | neutral | very good spectral quality | limited temperature range, stability, hydrophobic interactions | well soluble, not too hydrophobic proteins | [79,98,99]
alkyl-poly(ethylene glycol)/hexanol Lα phases | neutral | very good spectral quality | limited temperature range, hydrophobic interactions | well soluble, not too hydrophobic proteins | [100,101]
mechanically stressed polyacrylamide | neutral | wide temperature range, inert | limited to smaller MWT (<20 kDa) | <20 kDa MWT, extreme solution conditions | [102,115,116]
collagen gels | neutral | good temperature, pH, detergent stability | polymerisation carried out at 37 °C | | [103]
Charged
filamentous phage Pf1 | negative | easy to add and to remove | strong interaction with positive biomolecules (a) | biomolecules with net negative charge (pH > pI) | [104,105]
fd phages | negative | as Pf1 | as Pf1 | as Pf1 | [106]
charged bicelles | negative/positive | | limited temperature range, stability | | [107,129]
charged gels | negative/positive | | limited to smaller MWT, problematic mechanical handling | extreme solution conditions | [108,109]
purple membranes | negative | | | | [81,110]
cellulose crystallites | negative | | | | [111]
Helfrich phases | positive | | | | [112]
DNA nanotubes | negative | good behaviour for larger proteins, good detergent stability | | large biomolecules or membrane protein-detergent complexes | [113]
di-guanosine | weakly negative | good pH and salt stability | | | [114]

Further entries listed for the remaining media. Advantage: inert; wide temperature, pH stability; good detergent stability. Limitations: costly from commercial source; limited spectral quality; involved media preparation. Specific Applications: membrane protein-detergent complexes; large biomolecules or membrane protein-detergent complexes; charged and hydrophobic proteins.

(a) All charged media interact more or less strongly with solutes of opposing charge.
[Figure 4.12: panels (a)–(d); left panels show 2H spectra (Hz), right panels show 1H-15N HSQC spectra (1H and 15N, ppm)]
Figure 4.12 NMR spectral quality as a function of phage concentration. The HSQC spectra of 0.5 mM GlnBP in the glutamine-bound form in 20 mM Tris/HCl pH 7.2 are shown at four different Pf1 phage concentrations: 0 mg/ml (a), 6 mg/ml (b), 8 mg/ml (c), and 12 mg/ml (d). Their corresponding deuterium spectra (except for 0 mg/ml) are shown in the left panels with quadrupolar splittings of 5.7 Hz, 7.8 Hz, and 11.7 Hz, respectively. The increasing resonance linewidths observed in the HSQC spectra are due to long-range dipolar coupling interactions that are no longer negligible at high Pf1 concentration
resonance lines of the protein GlnBP as a function of increasing alignment medium concentration. At high phage concentration, the 1H-1H long-range RDCs become increasingly important and cause a splitting of the 1H resonances into broad multiplets. Typically, useful alignment corresponds to maximal 15N-1H RDCs of about 10–30 Hz, in which case the one-bond heteronuclear couplings are not yet significantly affected, such that the usual INEPT transfers are still efficient. The quality of the alignment medium itself in the presence or absence of the solute biomolecule of interest can often be judged from the residual alignment of water molecules, which is detectable by the 2H-quadrupolar splitting. Figure 4.12 also shows this 2H splitting for the GlnBP sample. The splitting increases with increasing phage concentration, and the 2H resonance is relatively narrow, indicative of good sample homogeneity. However, in certain cases, the water resonance may be broadened by exchange with medium or protein to an extent that the 2H splitting cannot be resolved. This does not necessarily indicate that the spectral quality of the solute biomolecule is also unsatisfactory, and hence its behaviour should be tested independently. Optimising anisotropic sample conditions requires good knowledge of the characteristics of both medium and biomolecule. However, often only a slight modification of the physicochemical conditions (e.g. concentration, salt, pH, temperature, detergent) is sufficient to achieve the desired result. Further options of variation are available by combining two different alignment media [117] or direct and indirect alignment methods [118].
4.5.3 Measurements and Data Analysis
According to Equation 4.28 the RDC causes a change in the total coupling constant between two nuclei. Therefore, RDCs are generally determined as the difference between the total coupling with the biomolecule in an aligned and an isotropic state. For indirect alignment this implies measuring the coupling in anisotropic and isotropic media, whereas for intrinsic alignment, such as in paramagnetic proteins, the RDC can be separated from the isotropic scalar coupling by carrying out the measurement at different magnetic fields, or by measuring a paramagnetic protein with and without its paramagnetic ligand. Any scheme to determine J-couplings can also be used for measuring RDCs. All considerations that apply to J-coupling measurements, such as cross-correlation effects, sensitivity issues, and peak shape distortions, also apply to RDC measurements. However, besides such general caveats, no further special precautions are necessary for RDCs. Figure 4.13 shows as an example the 1H-15N IPAP spectra of the protein GlnBP recorded in the isotropic and aligned phase. The comparison reveals changes in the total coupling TNH, which correspond to a DNH of −7.3 Hz and +20.6 Hz for residues L180 and F13, respectively. Many J-detection schemes, such as quantitative-J and IPAP, only yield the absolute value of the scalar coupling. Thus the sign of the RDC is often only determined relative to the isotropic J-value. If the sign of J is known and |D| < |J|, the sign of D follows from D = sign(J) (|T| − |J|). The signs of the relevant one-bond J-couplings are 1JCH, 1JCC > 0 and 1JNH, 1JNC < 0 and their magnitude is generally large compared to RDCs. Therefore the sign of biomolecular one-bond RDCs is usually easy to obtain. In cases where the sign cannot be obtained from the absolute value of the total coupling, for instance for smaller multibond couplings, alternative J-detection schemes such as E.COSY may yield this information.
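The sign rule D = sign(J) (|T| − |J|) can be sketched in a few lines. The function name is hypothetical, and the pairing of the four splittings annotated in Figure 4.13 with residues F13 and L180 is inferred here from the quoted DNH values rather than stated in the text.

```python
import math


def rdc_from_splittings(abs_total, j_isotropic):
    """Signed RDC from the absolute total splitting |T| and the signed isotropic J.

    Valid only when |D| < |J|, so that T = J + D has the same sign as J.
    """
    return math.copysign(1.0, j_isotropic) * (abs_total - abs(j_isotropic))


# Assumed pairing of the splittings annotated in Figure 4.13 (Hz).
d_f13 = rdc_from_splittings(73.14, -93.72)
d_l180 = rdc_from_splittings(100.61, -93.28)
print(round(d_f13, 2), round(d_l180, 2))
```

For the negative 1JNH a smaller apparent splitting therefore corresponds to a positive RDC, and a larger apparent splitting to a negative one.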
[Figure 4.13: panels (a) and (b); 1H-15N IPAP spectra with axes 1H (ppm) and 15N (ppm); the doublets of F13 and L180 are annotated with splittings of −93.72 Hz, −93.28 Hz, −73.14 Hz and −100.61 Hz]
Figure 4.13 J-coupled NMR spectra to measure dipolar coupling. The isotropic (a) and anisotropic (b) IPAP spectra of 0.5 mM GlnBP in the glutamine-bound form in 6 mg/ml of Pf1 phage are shown. The pair of IPAP spectra corresponding to the up- and down-field components is superposed. The up-field components of the IPAP spectra are shown in dashed lines. Residue L180 shows an increase, while F13 shows a decrease of apparent 15N-1H J-couplings in the anisotropic medium
If this is also not possible, some structural information can still be extracted by using the absolute value of the RDC. Further complications may arise when one nucleus experiences RDCs from several adjacent nuclei that cannot be separated. For example, for the 13C nucleus in a 13CH2 spin system only the sum of 1DCH1 and 1DCH2 is detected in a quantitative J-modulated HSQC with constant-time 13C evolution [119]. Nonetheless, the sum 1DCH1 + 1DCH2 can still be incorporated as valuable information into structure calculations. Note that the two couplings can also be separated when detecting the RDCs on the 1H methylene resonances. However, this is often less precise and sensitive due to the faster relaxation of 1H as compared to 13C nuclei. A simpler situation arises for CH3 groups. Since methyl groups in proteins rotate around the C–C bond on the picosecond time scale, the rotational averaging of the tetrahedral arrangement scales the 13C–1H dipolar coupling (DCH) by −1/3, the value of P2(cos θ) at the tetrahedral angle. The threefold larger change in splitting observed for the outer methyl quartet components compensates the factor 1/3, leaving only the negative sign. The observed RDC is then exactly opposite to the one expected for a regular CH bond that points in the direction of the C–C bond vector [119]. According to Equation 4.25 the magnitude of the RDC is given as the time and ensemble average of the dipolar interaction. This average includes not only the anisotropic Brownian rotation of the entire molecule, but also internal motions. The RDC time average ranges over the entire time of the measurement, that is up to milliseconds. It thus presents an
experimental handle to detect motions in the biologically interesting 10⁻⁸ to 10⁻⁴ s range where traditional relaxation measurements have a 'blind spot' [6]. In general, flexible regions in a protein will have smaller dipolar couplings than rigid portions. To a first description, such motions can be quantified by a generalised order parameter (S) of the particular internuclear vector [79]. More elaborate descriptions may involve averages over explicit multiconformer ensembles of a molecule [120–122], which can give interesting insights about the accessible conformational space.
4.5.4 Determination of the Alignment Tensor
The use of RDCs for structure validation, determination and refinement of rigid molecules requires knowledge of the Saupe matrix (Equation 4.33) or equivalently of its diagonal elements (amplitude, rhombicity) (Equations 4.35 and 4.36) and its Euler angles in the principal axes system (PAS). For an unknown structure, these diagonal elements can be determined via the histogram method [123]. This is illustrated by the example of the glutamine binding protein (GlnBP) of E. coli with its L-glutamine substrate bound [124]. The measured DNH values for GlnBP are shown in Figure 4.14a as a function of the residue number. In line with the previously formulated criterion, the range of observed DNH is about 30 Hz, with an estimated experimental error of less than 1.0 Hz, corresponding to a dynamic range >10. Figure 4.14b shows the histogram of the DNH values for GlnBP. The shape of the histogram is identical to that of powder patterns in solid state NMR. The positions of its three extrema give the principal values of the alignment tensor, Axx ≈ 0 Hz, Ayy ≈ 25 Hz, and Azz ≈ 30 Hz, where Aii = Dmax S̃ii, according to the previous convention (|Azz| > |Ayy| > |Axx|). The extrema correspond to NH vectors that are oriented in the direction of the Saupe matrix principal axes. From the Aii values, estimates for Da (15 Hz) and R (0.45–0.67) can be obtained. This pronounced rhombicity is also directly obvious in Figure 4.14a from the large difference between the Axx and Ayy values. Note that the estimated values for Axx, Ayy, and Azz do not satisfy the traceless condition of the alignment tensor (Axx + Ayy + Azz = 0). This effect is primarily due to the sparse sampling of all possible dipole orientations and becomes particularly problematic when working with small proteins where the number of measured RDCs is very small or for proteins with a skewed distribution of dipole orientations.
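The idea behind the histogram method can be illustrated with simulated data. The sketch below draws uniformly distributed bond orientations, computes their RDCs from Equation 4.36 and reads the three principal values off the resulting powder pattern. It is an idealised illustration only: the function names are hypothetical, the input values Da = 13.0 Hz and R = 0.60 are taken with a positive sign as an assumption, and a real protein provides far fewer and less evenly distributed vectors.

```python
import math
import random


def rdc(d_a, rhombicity, cos_theta, phi):
    """Equation 4.36 for a unit vector given by cos(theta) and phi in the PAS."""
    sin2 = 1.0 - cos_theta * cos_theta
    return d_a * ((3.0 * cos_theta * cos_theta - 1.0)
                  + 1.5 * rhombicity * sin2 * math.cos(2.0 * phi))


def principal_values_from_histogram(values, bin_width=1.0):
    """Axx from the most populated bin, Ayy and Azz from the two extremes."""
    counts = {}
    for value in values:
        index = math.floor(value / bin_width)
        counts[index] = counts.get(index, 0) + 1
    mode_bin = max(counts, key=counts.get)
    a_xx = (mode_bin + 0.5) * bin_width
    low, high = min(values), max(values)
    a_zz, a_yy = (high, low) if abs(high) >= abs(low) else (low, high)
    return a_xx, a_yy, a_zz


rng = random.Random(0)
# Uniform sampling of the sphere: cos(theta) uniform in [-1, 1], phi uniform in [0, 2 pi).
simulated = [rdc(13.0, 0.60, rng.uniform(-1.0, 1.0), rng.uniform(0.0, 2.0 * math.pi))
             for _ in range(200000)]

a_xx, a_yy, a_zz = principal_values_from_histogram(simulated)
estimated_da = a_zz / 2.0
estimated_r = (2.0 / 3.0) * (a_xx - a_yy) / a_zz
print(a_xx, round(a_yy, 1), round(a_zz, 1), round(estimated_da, 1), round(estimated_r, 2))
```

With dense, uniform sampling the extremes and the maximum of the distribution recover the input tensor closely. With only a few dozen vectors the extremes are systematically underestimated, which is the sampling problem described above.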
One possible solution to this problem is to use multiple types of RDCs to construct a better sampled histogram [123]. For instance, the DCH, DCαC′, and DNC′ vectors are all oriented differently in the molecular frame relative to DNH and thus enlarge the orientational sampling. Prior to their inclusion into the histogram, they have to be renormalised to the DNH values by correcting for the differences in their gyromagnetic ratios and bond lengths. This correction amounts to simple multiplication by factors of 0.46, 8.3 and 5.0 for DCαHα, DNC′, and DCαC′, respectively [125]. A maximum likelihood method has also been proposed to determine Da and R [126]. This is particularly useful for RNA and DNA, where the number of available dipoles is typically very low, and Da and R can be determined reliably [127] by using additional shape information, for example as obtained by the program SSIA [128]. If a structure is available, the determination of the alignment tensor is much simpler. Its five free parameters (2 traceless diagonal elements, 3 Euler angles) can be fitted by minimising the chi-square difference between measured RDCs and the prediction according
[Figure 4.14: panel (a) DNH (Hz) against residue number; panel (b) count number against DNH (Hz), with the positions of Axx, Ayy and Azz marked]
Figure 4.14 15N-1H dipolar couplings for GlnBP in the glutamine-bound form. (a) The measured DNH for GlnBP in the bound form in 6 mg/ml Pf1 phage are plotted against residue number. A DNH range of almost 30 Hz can be observed. (b) A histogram distribution of DNH for GlnBP. The distribution of the DNH for GlnBP in the bound form is plotted as a histogram. It shows a fully asymmetric distribution with designated components Axx, Ayy, and Azz marked with arrows. The closer the values of Axx and Ayy, the more axially symmetric the alignment tensor
to Equations 4.34 or 4.37. Using Da and R determined above as starting values, a simple nonlinear minimisation to the GlnBP X-ray structure resulted in an optimised alignment tensor with Da = 13.0 Hz and R = 0.60. The orientation of the PAS of this alignment tensor is described by the three Euler angles α = 184°, β = 160°, γ = 9° within the GlnBP molecular frame. Alternatively, the fit can be formulated as a linear optimisation problem and solved by a simple matrix inversion, for example using singular value decomposition
[Figure 4.15: panel (a) DNH(calc) (Hz) against DNH(meas) (Hz), with an inset showing the structure and the x, y, z axes of the PAS; panel (b) DNH(meas) − DNH(calc) (Hz) against residue number]
Figure 4.15 Validation of GlnBP structure by DNH RDCs. (a) The calculated DNH are plotted against the measured values. The fitting was done with a gradient minimisation. The inset shows the GlnBP in the bound form as a ribbon representation with the principal axis system (PAS) next to it. (b) The differences between measured and calculated DNH values are plotted against the residue number. Residues with the largest deviation are typically located in the loop regions (depicted as open circles). The loops in GlnBP in the bound form in the X-ray structure apparently do not adopt the same conformation as they do in solution
(SVD) [81,82]. The agreement between measured and predicted DNH values is shown in Figure 4.15a where the PAS has been drawn within the structure of the GlnBP molecule. For the fitting of the alignment tensor, values of the exposed loop regions of GlnBP have been excluded. When the predicted RDCs for these loops are compared to measured values, larger deviations are observed (Figure 4.15b). This indicates that the loop conformations of the solution state differ from the crystalline state.
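The linear formulation mentioned above can be sketched as follows. This is a minimal illustration, not the procedure of refs [81,82]: it assumes that each RDC is a quadratic form v·T·v of the unit internuclear vector v, with a symmetric, traceless tensor T expressed directly in Hz (any constant prefactor of the equations in the text is absorbed into T). The function names are hypothetical.

```python
import numpy as np

def design_matrix(vecs):
    """Rows of the linear system D = A @ s for internuclear vectors (x, y, z).

    s = (Tyy, Tzz, Txy, Txz, Tyz); Txx = -(Tyy + Tzz) because T is traceless.
    """
    v = np.asarray(vecs, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)  # normalise to unit length
    x, y, z = v.T
    return np.column_stack([y**2 - x**2, z**2 - x**2, 2*x*y, 2*x*z, 2*y*z])

def fit_tensor(vecs, rdcs):
    """Least-squares estimate of the 3x3 tensor (Hz) from vectors and RDCs."""
    a = design_matrix(vecs)
    # numpy's lstsq solves the linear problem with an SVD-based routine
    s, *_ = np.linalg.lstsq(a, np.asarray(rdcs, dtype=float), rcond=None)
    tyy, tzz, txy, txz, tyz = s
    return np.array([[-(tyy + tzz), txy, txz],
                     [txy, tyy, tyz],
                     [txz, tyz, tzz]])

def predict_rdcs(tensor, vecs):
    """Back-calculate RDCs (Hz) for the given internuclear vectors."""
    v = np.asarray(vecs, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    return np.einsum("ni,ij,nj->n", v, tensor, v)
```

Diagonalising the fitted tensor (for example with `numpy.linalg.eigh`) gives the principal axes; the residuals `rdcs - predict_rdcs(tensor, vecs)` correspond to the kind of per-residue comparison plotted in Figure 4.15b.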
4.5.4.1 Degeneracy of Solutions
From the quadratic dependency of the RDC on the internuclear vector coordinates (Equation 4.34) it is obvious that internuclear vectors with inverted x-, y-, or z-directions also fulfil the RDC constraints. The general solution of Equations 4.34 or 4.35 for a single (fixed distance) RDC corresponds to a dipole vector oriented anywhere on two oppositely oriented cones [129]. This degeneracy can be reduced by information from multiple alignments. For instance, if a single dipolar vector is constrained by RDCs from two sufficiently different alignment tensors, the degeneracy decreases to eight or fewer, according to the intersection points of the two pairs of cones. Such different alignment tensors can be obtained from different alignment media, by changing the solution conditions of the alignment medium (e.g. salt and pH for electrostatic alignment mechanisms), by changing the protein construct (e.g. adding a his-tag), and also by combining different alignment methods (see also Section 4.5.2). When several RDCs are determined between the nuclei of a chiral chemical moiety under rhombic alignment conditions, the fixed stereochemistry restricts the solutions to only four orientations of the chemical group, because mirror-symmetric solutions are excluded. These four possible orientations can be easily created by 180° rotations around the x-, y-, and z-axis of the PAS (see e.g. Figure 4.15a). Sometimes, additional information, such as steric clashes or chemical contact shift maps, may resolve such ambiguities and may lead to unique solutions, for example for the relative orientation of two independent domains.
4.5.4.2 Prediction of the Alignment Tensor from the Structure
In certain cases, the alignment tensor of a molecule can be predicted from its structure.
The comparison to measured data then provides further possibilities for structural analysis, such as the determination of the oligomerisation state, multiple conformations, flexibility and structural homology, to name just a few. The prediction requires knowledge of the interaction between the solute and the alignment medium. Very good agreement between predicted and measured RDCs has been obtained for electrically neutral alignment media such as lipid bicelles, polyethylene glycol liquid crystals and strained polyacrylamide. In this case, the interaction is based on steric exclusion, and the alignment medium can be modelled by two parallel, rigid walls. The probability of a certain molecular orientation is then calculated as the volume that the molecule can occupy in this orientation divided by the total volume between the walls [128,130,131]. The steric alignment prediction is available via the program PALES [128] and has also been introduced as a potential for structure calculations into the program Xplor-NIH [131]. For neutral alignment media, it has been shown that the principal axes of the alignment tensor almost coincide with the axes of the rotational diffusion tensor [132], clearly indicating that the shape of the molecule governs both diffusion and steric alignment of a molecule. Besides steric repulsion, the most important alignment mechanism results from electrostatic interactions between the solute and the alignment medium. Prediction in this case is much harder, since for example the exact location and dynamics of surface charges on a biomolecule are often not known. To account for electrostatic alignment, the PALES program has been extended with some success by an additional Boltzmann weighting factor of the
steric orientation probability that accounts for the electrostatic potential of the solute in the field surrounding the alignment medium [133].
4.5.5 RDCs in Structure Validation
4.5.5.1 Q-Factor
Since RDCs can be calculated purely from the geometry of a structure and the alignment tensor, they provide a simple means for the quantitative validation of structures. For this, the alignment tensor is determined by a best fit of suitable regions of the molecule. The quality of the structure is then quantified by the Q-factor [85] as the RMS difference between measured RDCs and their values predicted by the alignment tensor divided by the RMS of the measured RDCs:
\[ Q = \frac{\left\langle \left( D_{\mathrm{meas}} - D_{\mathrm{calc}} \right)^{2} \right\rangle^{1/2}}{\left\langle \left( D_{\mathrm{meas}} \right)^{2} \right\rangle^{1/2}} \tag{4.42} \]
where the angular brackets denote the average over all RDCs. The Q-factor varies in the range from 0 to about 1, with a lower value indicating a better agreement. The data in Figure 4.15a correspond to a Q-factor of 0.25, which is a typical value for an X-ray structure resolution of 1.94 Å as in the present case for GlnBP. Crystal structures with resolution better than 1.0 Å usually result in Q-factors of less than 0.2. In contrast, larger Q-factors (greater than 0.4) are commonly observed for NMR structures derived from NOE data alone.
4.5.5.2 Using RDC Values for Database Screening
The RDC quality factor can also be used to search for structural homologies of a protein for which RDCs but no structure are available. Thus Annila and coworkers [134] compared the measured DNH RDCs of calerythrin, an EF hand containing protein, against the RDCs predicted for four different known EF hand protein structures with low (less than 30%) sequence similarity. Residues between the unknown and known structures were aligned by using multiple sequence similarities and secondary structure information. The comparison yielded very good agreement between the RDCs measured for calerythrin and those predicted for the sarcoplasmic calcium binding proteins from two different organisms. In this way, the general fold for calerythrin could be established with minimal effort.
4.5.6 RDCs in Structure Determination
The use of RDCs in structure determination is still evolving, comprising improvements in the structural definition of native structures as well as of conformational ensembles of folded and unfolded proteins. We limit ourselves here to the most widespread applications: structure refinement, domain orientation including biomolecular complexes, and de novo structure determination.
4.5.6.1 Structure Refinement
Structure refinement by RDCs requires that some conventional NMR structural information such as NOE and dihedral restraints is available. The structure refinement protocol consists of the following steps: estimation of the alignment tensor, definition of its axes, set-up of
[Figure 4.16 plot residue: 15N chemical shift (114 to 126 ppm) versus 1HN chemical shift (9.2 to 7.4 ppm).]
Figure 4.16 Region of the HSQC spectrum of SeBP, a 44 kDa homo-pentamer. Broad resonances are observed due to structural heterogeneity and conformational exchange between the monomers
RDC restraints, simulated annealing, optimisation of the alignment tensor, evaluation of the resulting structure. The recently determined NMR structure of the selenium binding protein (SeBP) of Methanococcus vannielii [135] will be used as an illustrative example for this process. SeBP is a homopentamer consisting of 8.8 kDa monomer subunits. With respect to NMR properties SeBP is a non-ideal case. The effective size of SeBP is 44 kDa and slight structural variations between the monomers lead to broad resonances even in the isotropic medium (Figure 4.16). Thus only a limited number of conventional restraints was obtained. Nevertheless an accurate structure determination was possible due to RDCs. Since a priori no structure was available, the SeBP alignment tensor was estimated by the histogram method. Three data sets containing in total 63 DNH, 42 DCαHα and 54 side-chain DCH RDCs were measured in 12 mg/ml Pf1 phage. Despite the broad spectral lines, the errors for DNH and DCαHα based on the experimental reproducibility are quite small, i.e. 0.93 and 1.1 Hz, respectively (Figure 4.17a and b). The sparse sampling of orientations for the individual DNH and DCαHα data made it necessary to combine both data sets into a single dipolar histogram for the estimation of the alignment tensor components as described in the previous section (Figure 4.17c). In contrast to GlnBP, the SeBP dipolar distribution
[Figure 4.17 plot residue: (a) DNH (Hz) and (b) DCαHα (Hz) versus residue number; (c) count versus Dmeas (Hz), with Axx = Ayy and Azz marked.]
Figure 4.17 Dipolar couplings for SeBP. The measured DNH (a) and DCαHα (b) for SeBP in 12 mg/ml Pf1 phage and 20 mM Tris/HCl, pH 7.1 are plotted against residue number. Missing values of dipolar couplings belong to residues undergoing conformational exchange, whose resonances could not be observed. (c) Histogram distribution of combined DNH and DCαHα RDCs for SeBP. The designated components Axx, Ayy, and Azz are marked with arrows. Similar values of Axx and Ayy indicate that the alignment tensor is axially symmetric
shows a nearly axial profile with components Axx and Ayy being almost equal. As a consequence, the rhombicity for the SeBP alignment tensor in Pf1 phage is close to zero. Most likely, this axial symmetry around the z-axis of the alignment tensor results from the fivefold rotational symmetry expected for the homopentamer. Both symmetry axes apparently coincide. Using the procedure described in the previous section, the estimated magnitude of the alignment tensor for SeBP is Da = 19 Hz. The refinement of SeBP was carried out using the program XPLOR-NIH [136]. The XPLOR-NIH routine requires that an explicit axes system for the PAS of the alignment tensor be defined. This is achieved via a pseudo molecule consisting of four atoms OO, XX, YY, and ZZ connected by chemical bonds OO–XX, OO–YY, and OO–ZZ. These three bonds are kept in a fixed perpendicular geometry representing a right-handed x-, y-, and z-coordinate system. During the calculation, the pseudo molecule is allowed to rotate freely around the OO atom (origin) while keeping the relative perpendicular geometry of the bonds. Thus the PAS can adjust to the best orientation to minimise the RDC pseudo forces. Typically an RDC restraint entry in XPLOR-NIH is a list consisting of the two atoms that make up the dipole, the target (measured) RDC value, and its error margins. For SeBP the situation is more complicated since the RDCs result from the average over all monomers of the homo-pentamer. Therefore, all RDC restraints were refined against this average using the sum averaging option of XPLOR-NIH. The error margins for the restraints were set according to the RDC experimental reproducibility. The target function for the restraints was implemented as a harmonic potential of the difference between measured and calculated RDCs, except for those RDCs that showed dynamic flexibility such as DNH for a few residues at the termini (determined by high 15N T2 values) as well as all side chain DCHs.
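The two kinds of restraint treatment mentioned here (harmonic for rigid sites, a one-sided form for flexible ones, whose rationale the text gives next) can be sketched as below. This is an illustration under stated assumptions, not the XPLOR-NIH implementation: the optional tolerance `err`, the force constant `k`, the reading of the one-sided form as "calculated magnitude must not fall below the measured one", and the plain arithmetic mean over monomers are all assumptions, and the function names are hypothetical.

```python
def rdc_restraint_energy(d_calc, d_meas, err=0.0, k=1.0, half_open=False):
    """Energy of a single RDC restraint in arbitrary units.

    Harmonic form: k * (|d_calc - d_meas| - err)^2 outside the tolerance err.
    Half-open form: only a calculated coupling that is smaller in magnitude
    than the measured one is penalised (lower limit, no upper limit).
    """
    if half_open:
        sign = 1.0 if d_meas >= 0 else -1.0
        # positive when d_calc falls short of d_meas along the sign of d_meas
        violation = max(0.0, sign * (d_meas - d_calc) - err)
    else:
        violation = max(0.0, abs(d_calc - d_meas) - err)
    return k * violation ** 2

def mean_over_monomers(d_calc_per_monomer):
    """Average the back-calculated RDC of one site over all monomers."""
    values = list(d_calc_per_monomer)
    return sum(values) / len(values)
```

For a symmetric oligomer, the value passed as `d_calc` would be `mean_over_monomers(...)` of the per-subunit predictions, since only the average over the monomers is observed.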
Since the dynamic averaging attenuates the apparent magnitude of these RDCs, they were incorporated using a half-open (lower limit, but no upper limit) potential target function. At every molecular dynamics (MD) step of the simulated annealing calculation, the RDCs associated with the instantaneous structure of the protein were evaluated against the
prediction of Equation 4.36. The larger the difference between the calculated and measured RDCs at a particular MD step, the larger the force associated with the RDC that drives the next MD step. The overall RDC force is scaled by a force constant that is typically very small or zero during the initial high temperature equilibration, but slowly increases during the later cooling process of the MD run. The final value of the constant is adjusted such that the RMSD between the calculated and measured RDCs agrees with the experimental errors. Since the RDC term is only one of several constraining potentials, the rate of increase for its force constant is adjusted to not substantially affect all other energy terms and the overall convergence of the calculation. Conversely, problems associated with the RDC term such as large experimental errors, incorrect assignments or alignment parameters may manifest themselves as a high overall energy and poor convergence. The starting alignment tensor parameters (Da and R) used in the refinement of SeBP were estimated from the histogram method, which is sensitive to incomplete sampling. In the previous example for GlnBP (Figure 4.14b), the histogram estimates for Da and R differed by about 15 and 25% from the optimised values. To achieve better refinement these parameters were automatically adjusted during the simulated annealing. The procedure followed a suggestion [83,84] to obtain the parameters of the orientation tensor at every time step by a simple, linear fit. Following this protocol, the optimised Da value for SeBP was finally determined as 19.5 Hz, while R was equal to zero. The final agreement between measured and calculated RDCs is shown in Figure 4.18. The RMSDs of 0.7 and 1.2 Hz reflect the associated experimental errors for DNH and DCαHα, respectively, and correspond to quite small Q-values (QNH = 0.08 and QCαHα = 0.04).
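The Q-values quoted here follow the definition of Equation 4.42, which can be written as a short function. The sketch below assumes only that measured and calculated couplings are supplied in the same order; the function name is hypothetical.

```python
import numpy as np

def q_factor(d_meas, d_calc):
    """Q-factor: RMS(measured - calculated) divided by RMS(measured)."""
    m = np.asarray(d_meas, dtype=float)
    c = np.asarray(d_calc, dtype=float)
    return float(np.sqrt(np.mean((m - c) ** 2) / np.mean(m ** 2)))
```

Applying the function separately to each coupling type gives values such as QNH and QCαHα; applying it only to couplings that were left out of a refinement gives a cross-validated Q.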
These small Q-values are not surprising since the structure was refined against a rather small number of RDCs. To independently assess the quality of the structures, a separate criterion is needed. Simple cross-validation can be achieved by leaving out a complete subset or a random fraction of RDC restraints and recalculating the structures and their respective Q values. For SeBP, leaving out the DNHs in the refinement yields QCαHα = 0.37, whereas leaving out the DCαHαs results in QNH = 0.63. These relatively high Q values reflect the fact that the system is quite underdetermined. The incomplete NOE and dipolar restraints in SeBP allow the N–H bonds to move without affecting the orientation of the Cα–Hα bonds, and vice versa. When cross-validation was performed by random omission of 10% of DNHs or DCαHαs in this example, the structure refinement resulted in a QNH of 0.16 and QCαHα of 0.14, respectively [135]. A comparison of the structures calculated with and without RDCs is shown in Figure 4.19. As expected since the restraints were sparse, the overall precision of the calculated structures is not improved significantly by the inclusion of RDC restraints. Thus the average RMSDs to the mean of the ten best structures calculated without and with RDCs were 0.80 and 0.70 Å, respectively. However, as indicated in Figure 4.19a, there are distinct differences between the two families of structures that result from the reorientation of the subunits. For example, the angle between the lone helix and the symmetry axis of the pentamer is 48° and 40° for structures refined with and without RDCs, respectively. This reorientation significantly affects the central channel of SeBP.
4.5.6.2 Domain Orientation
RDCs can be very helpful in determining domain orientation in biomolecules as well as in assessing whether interdomain dynamics are present. If structures of the individual domains are already available and there are no interdomain dynamics, the problem reduces to reorienting
[Figure 4.18 plot residue: (a) DNH(meas) (Hz) versus DNH(calc) (Hz); (b) DCH(meas) (Hz) versus DCH(calc) (Hz).]
Figure 4.18 Agreement between measured and calculated RDCs for SeBP. Measured and calculated DNH and DCαHα values from the lowest energy structure of SeBP are compared in panels A and B, respectively. Their RMSD values correspond to their respective experimental errors
the domains such that their alignment tensors coincide. This procedure was used for example for the relative orientation of the B and C domain structures from barley lectin [137] that could not be orientated without RDC data due to the lack of interdomain NOEs. Alternatively, the domain orientation can be forced to correspond to the RDCs during structure calculation by incorporating the RDCs as constraints (see previous section). This method was used for example to determine the relative orientation of three zinc fingers in TFIIIA bound to its target DNA [138]. Without RDC restraints, the relative orientation between the first and second zinc fingers differed by 32° from the crystal structure. After refinement with RDCs, a significant difference of 20° remained, indicating that the solution conformation truly differs from the crystal.
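Reorienting one domain so that its alignment tensor coincides with that of another can be sketched as an eigenvector problem. The example below is a minimal illustration under stated assumptions: both tensors are given as symmetric 3x3 matrices in the coordinate frames of the respective domain structures, they share the same non-degenerate principal values (rhombic alignment, no interdomain motion), and the function names are hypothetical. It returns four rotations, which differ by 180° rotations about the principal axes, in line with the degeneracy discussed in Section 4.5.4.1.

```python
import numpy as np

# Identity plus the three 180-degree rotations about the principal axes.
PROPER_FLIPS = [np.diag(d) for d in
                ([1.0, 1.0, 1.0], [1.0, -1.0, -1.0],
                 [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0])]

def principal_axes(tensor):
    """Eigenvalues (ascending) and a right-handed matrix of eigenvectors."""
    vals, vecs = np.linalg.eigh(np.asarray(tensor, dtype=float))
    if np.linalg.det(vecs) < 0:
        vecs[:, 0] *= -1.0  # flip one axis so that the frame is right-handed
    return vals, vecs

def domain_rotations(tensor_a, tensor_b):
    """Rotations R for which R @ tensor_b @ R.T reproduces tensor_a."""
    _, va = principal_axes(tensor_a)
    _, vb = principal_axes(tensor_b)
    return [va @ flip @ vb.T for flip in PROPER_FLIPS]
```

Each returned matrix, applied to the coordinates of the second domain, yields one of the four orientations compatible with the RDC data; further information such as steric clashes or linker length is needed to choose among them.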
Figure 4.19 Comparison of SeBP structures calculated with and without RDCs. The ten best out of 100 structures of SeBP calculated with and without dipolar couplings are shown in panel A and B, respectively. For clarity, the flexible N- and C-terminal residues (M1–D5 and A76–A81) are not shown. Distinct conformations of the central loop (T58–L65) forming the SeBP core channel and the strand (I25–I29) connecting the helix to the top of the β-sheet in the two families of structures are indicated with arrows. In structures calculated with RDCs the conformations of the loops forming the centre of the core result in a smaller opening. The backbone traces of subunits of SeBP (best fit as pentamers) calculated with and without RDCs are shown in stereo in black and grey, respectively, in panel C. The Cα atoms of I25, I29, T58, and L65 are shown as black spheres. The T58–L65 loop in the bottom of the figure shows a significant displacement in the two structures and corresponds to the loop forming the core channel of SeBP
For the described use of RDCs in domain orientation, interdomain motions should be absent. Such motions can be assessed by a comparison of the size of the alignment tensors of the individual domains. In the absence of a structure, the histogram method can be used for estimating their size provided sufficient sampling is achieved [139]. If the structures of both domains are available, the alignment tensors can be derived directly from fits of the RDC data. In the absence of motion, both domains will have alignment tensors with very similar amplitudes and rhombicities. Conversely, significant differences in these values indicate differences in the degree of alignment of the domains and thus the presence of interdomain motions. In such cases, the RDC data correspond to a population average over different conformations and may be interpreted by using suitable models for the interdomain motions such as wobbling in a cone or others [140].
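A histogram-type estimate of the tensor size for each domain can be sketched as follows. This is an illustration under assumptions that are not taken from the text: the principal values of the coupling distribution are assumed to follow the common convention Dzz = 2 Da, Dyy = -Da (1 + 1.5 R), Dxx = -Da (1 - 1.5 R); Dzz and Dyy are read from the extreme values of the data, and Dxx is obtained from the traceless condition rather than from the most populated histogram bin. The function name is hypothetical.

```python
import numpy as np

def histogram_estimate(rdcs):
    """Estimate (Da, R) from the extremes of a set of RDCs of one type."""
    d = np.asarray(rdcs, dtype=float)
    lo, hi = d.min(), d.max()
    # Dzz is the extreme of largest magnitude, Dyy the opposite extreme.
    dzz, dyy = (hi, lo) if abs(hi) >= abs(lo) else (lo, hi)
    dxx = -(dzz + dyy)  # principal values of a traceless tensor sum to zero
    da = dzz / 2.0
    rhombicity = (2.0 / 3.0) * (dxx - dyy) / dzz
    return float(da), float(rhombicity)
```

Running the function on the couplings of each domain separately and comparing the two (Da, R) pairs gives a first indication of whether the domains share a common alignment; as noted in the text, the estimate is only reliable when the orientations are sufficiently sampled.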
The same approaches as for domain orientation can also be used to study protein–ligand complexes. For instance, 1H-13C RDCs were used to determine the structures of α-methyl mannose [141] and of trimannoside [142] bound to the 53-kDa mannose binding protein A. The RDCs of the sugar ligands were measured in the unsaturated protein complexes where they represent the weighted average of RDCs of sugars in the free and bound states because both states are in fast exchange. The RDCs of the free sugars were measured separately, and used to extrapolate from the measured RDCs in the exchanging complexes to the sugar RDCs in the fully bound forms based on the known dissociation constants of the complexes. Since the protein is a C3-symmetric homo-trimer, the PAS of the alignment tensor in the bound state is determined solely from the symmetry. Thus no protein RDCs were necessary to orientate the sugar ligands relative to the protein structure. In a similar approach, RDCs could be measured for a transducin peptide that transiently binds to rhodopsin-containing disks from rod outer segments of the retina, which are strongly orientated in the magnetic field [143,144]. The binding is triggered by light activation, and orientational order is then transferred from the rhodopsin disk to the peptide, inducing RDCs. Since exchange between free and bound peptide is fast, RDCs can be detected on the free peptide. Using these ‘transferred’ RDCs, the conformation of the peptide and its orientation relative to the disk could be determined in the light-activated, rhodopsin-bound form.
4.5.6.3 De Novo Structure Determination
The derivation of the correct calerythrin fold from screening its RDCs against existing structures [134] shows the potential of RDCs for rapid fold determination. However, the determination of complete de novo structures from RDC data alone is a much more challenging problem.
The difficulty arises from the degeneracies of the dipolar coupling constant function (Equation 4.26) with respect to the direction and length of the internuclear vector. This degeneracy leads to a very complex surface of the scoring function for structure calculation and bad convergence properties. Obviously, the problem can be overcome by the inclusion of a sufficient, but hopefully small number of additional distance restraints. For instance, for the 56-residue GB1, easily obtainable backbone RDCs (DHN, DHαCα, DCαC', DHNC', DNC') together with distance restraints based solely on hydrogen bonds were sufficient to define its structure at 1 Å accuracy [145]. However, for the approximately 100-residue BAF and cyanovirin-N proteins additional long-range NOEs were necessary to obtain the same accuracy. For even larger proteins, such as the maltodextrin binding protein (MBP), the combination of RDCs with HN-HN, HN-CH3, CH3-CH3 NOEs observed in the deuterated/methyl-protonated protein resulted in very poor convergence of the calculated structures [146]. This problem could be remedied by defining the orientation of each peptide plane from a set of five backbone RDCs prior to the global structure calculation. Thus the combination of several RDCs to define local structures before arranging them into larger segments appears to be a promising way to overcome the convergence problem. Indeed, in the molecular fragment replacement (MFR) approach [147] such a strategy has been used successfully for structure determination without NOEs. Here, sequential, overlapping 7–10 residue fragments whose predicted backbone RDCs agree with the measured values are selected from a database of high-resolution structures. These fragments are then combined into an initial fold, which is further refined by minimising the difference between measured and calculated RDCs. Additional terms, such as the chemical shift, can be
incorporated to improve the accuracy of the refined structure. For ubiquitin, the resulting final structure had an RMSD of 0.88 Å relative to the crystal structure. A different variant of the local RDC combination strategy was implemented by Hus et al. in the program meccano [148]. This program builds the backbone of a protein solely from RDC data by assuming each peptide plane as rigid and defining its orientation in the alignment frame by multiple backbone RDCs. The planes are then added sequentially to construct the full backbone. The backbone structure obtained for ubiquitin is essentially identical to the crystal structure. In a further application on the MsrAEchmi protein [149], meccano was used to determine the conformation of a catalytically important loop region within the context of the previously known remaining structure. The inclusion of further RDC data improves the accuracy of the meccano approach. Thus, an ultra-high resolution structure of GB1 (RMSD to crystal structure 0.25 Å) could be derived solely from RDC data, when DHNHN RDCs from the perdeuterated protein were also included [150]. These examples show that in favourable cases structures can be obtained from RDCs alone. However, probably the most efficient use of RDCs in early stages of structure determination is being made by combination with ab initio structure prediction programs and chemical shifts. Thus Rohl and Baker [151] showed that incorporation of backbone RDCs into the structure prediction program Rosetta substantially improved the contact order and accuracy of predicted structures for smaller proteins. More recently, this work has been extended to significantly larger proteins of up to 25 kDa by the inclusion of backbone chemical shifts and 1HN-1HN NOEs, which achieved accuracies in the range of 1.4–3.0 Å for 18 out of 26 structures [152].
4.5.7 Conclusion
In this section, we have focused on the practical aspects of RDCs for structure determination. However, there are also substantial efforts to use RDCs to characterise conformational dynamics in folded proteins [122,153,154], protein folding/unfolding [121,155,156], and transition states [157]. In these systems, all possible conformers are sampled from the sub-ns to the ms time scale and contribute to the measured RDCs, thereby providing hitherto inaccessible information on the long-time scale order parameters that report on motions in biomolecules. By their ease of use and the increase in structural precision they provide, weak alignment techniques have opened a wide range of structural and dynamical questions that can now be addressed by solution NMR. These applications are still evolving together with additional improvements in techniques for data acquisition and sample preparation. Nevertheless, the results so far make it clear that they present a unique tool to connect structure and dynamics of molecules in solution to their biological mechanisms [158].
4.6 Chemical Shift Structural Restraints
4.6.1 Origin of Chemical Shifts and Its Relation to Protein Structure
The NMR resonance frequencies of individual atomic nuclei in a molecule depend on their local electronic environment, i.e. the chemical structure. The electrons shield the external magnetic field by a tiny amount (parts per million). Differences in the electronic structure
and dynamics lead to variations of the local magnetic field at the position of the nucleus and thus to variations of the resonance frequency. The chemical shifts are defined as the differences in observed resonance frequencies of nuclei relative to the resonance frequency of a standard (e.g. DSS or TMS [159]). In general, the chemical shift depends on the orientation of the molecule. However, in isotropic solutions of biomolecules the anisotropic contributions are usually averaged due to the rapid reorientation of the molecules. Thus only the isotropic part of the chemical shift is observed. Chemical shifts in proteins are exquisitely sensitive to the local conformation. However, their dependence on multiple factors, including backbone and side-chain torsion angles [160–164], neighbouring residues [165–167], ring currents [168,169], hydrogen bonding [170,171], electric fields [172] and solvent exposure [173], has complicated attempts to derive an explicit functional description of the influence of individual parameters (Figure 4.20a). Observed isotropic chemical shifts in proteins, δobs, are commonly partitioned into the random coil chemical shift δrc and the secondary shift Δδ:
\[ \delta_{\mathrm{obs}} = \delta_{\mathrm{rc}} + \Delta\delta \tag{4.43} \]
δrc is defined as the chemical shift observed in a conformationally disordered peptide [174–179] and depends on the residue type and the nucleus. Δδ depends on the conformation of the molecule and contains contributions from the secondary and tertiary
Figure 4.20 (a) Cartoon representation of the protein backbone and the typical factors that influence the secondary chemical shifts of atoms in a given residue of a protein, like hydrogen bonding, side-chain torsion angles and ring current effects. (b) Distributions of 13Cα (left) and 13Cβ (right) secondary chemical shifts in α-helical (filled bars) and β-strand (open bars) conformations in the TALOS database. (c) Contour plots of the 13Cα (left) and 13Cβ (right) secondary chemical shifts as a function of φ/ψ angles for all residues in the TALOS database; the background greyscale contour plots show the residue densities
structure. Secondary chemical shifts are assumed to be independent of the residue type and display characteristic patterns for many structural motifs. In particular, the dependencies of the secondary shifts of backbone nuclei on the secondary structure [180] (Figure 4.20b) and the backbone torsion angles [161,162] (Figure 4.20c) are well characterised. Typically two questions are of practical interest: the prediction of chemical shifts for a known protein structure and the inverse problem of predicting the protein structure from known chemical shifts.
Prediction of chemical shifts. Currently there are multiple approaches for predicting chemical shifts for a given protein structure. These include ab initio quantum mechanical (QM) calculations [181,182], empirical Δδ(φ,ψ) shielding surface analysis [161,167], secondary structure and hydrogen bonding [170,180], sequence homology [183], artificial neural networks [184], database mining [171] and hybrid methods [172]. All of these methods have their individual strengths and weaknesses. For example, the empirical approaches are relatively rapid and can cover chemical shifts over a wide range, but with modest accuracy, whereas the QM approach potentially offers relatively high accuracy but can require extremely long computation times and is very sensitive to the precise knowledge of the local geometry. Amongst the present methods, the hybrid predictive method SHIFTX [172] and the empirical database search approach SPARTA [171] appear to yield the best compromise between prediction accuracy, speed, and completeness.
4.6.2 Obtaining Chemical Shifts
Triple resonance correlation experiments collected on 15N- and 13C-enriched protein samples are commonly used to assign the chemical shifts of the 1H, 13C and 15N nuclei. The procedure is usually divided into two steps starting with the sequence-specific assignment of backbone resonances and continuing from there to the side-chains (see Chapter 3). Due to the larger number of atoms and the lower chemical shift dispersion, the assignment of the side-chains is more labour-intensive and difficult than the assignment of the backbone. Nearly complete chemical shift assignments for both backbone and side-chain resonances are normally required to (iteratively) assign NOESY spectra and derive internuclear distances for protein structure determination by conventional methods. By contrast, the backbone (1Hα, 13C', 13Cα, 15N and 1HN) and 13Cβ chemical shifts are comparatively easy to obtain and are available at an early stage of the analysis. Thus it is particularly interesting that they already contain relevant information on the backbone torsion angles [162] and the secondary structure [185]. This information may be used as a complement to conventional NOE distance restraints in liquid-state NMR or internuclear distance restraints in solid-state NMR. More recently, full structure determination from these chemical shifts has become possible for smaller proteins [3,4,186,187]. For both approaches correct chemical shift referencing is a prerequisite. The chemical shift for all nuclei is obtained according to IUPAC recommendations [159] from a 1H standard. The 1H frequency corresponding to zero ppm may be obtained as the measured frequency of DSS (4,4-dimethyl-4-silapentane-1-sulfonic acid) dissolved either directly in the protein solution or in a separate small capillary inserted into the NMR tube. As an alternative, the known water chemical shift of about 4.7 ppm may be used to calculate the absolute 1H zero frequency from the measured water frequency.
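The referencing arithmetic of this section can be sketched in a few lines. The two frequency ratios are the values the text quotes for 13C and 15N; the function names, and the default of 0 ppm for a DSS reference signal, are illustrative assumptions. Note that the water shift of about 4.7 ppm depends on temperature and sample conditions, so a value appropriate to the actual sample should be supplied.

```python
XI_13C = 0.251449530  # 13C/1H zero-frequency ratio quoted in the text
XI_15N = 0.101329118  # 15N/1H zero-frequency ratio quoted in the text

def proton_zero_frequency(f_ref_hz, delta_ref_ppm=0.0):
    """1H frequency at 0 ppm from a reference line of known chemical shift.

    Use delta_ref_ppm=0.0 for DSS, or about 4.7 for the water signal.
    """
    return f_ref_hz / (1.0 + delta_ref_ppm * 1e-6)

def indirect_zero_frequencies(f0_1h_hz):
    """Zero-ppm frequencies of 13C and 15N derived from the 1H zero frequency."""
    return {"13C": f0_1h_hz * XI_13C, "15N": f0_1h_hz * XI_15N}

def chemical_shift_ppm(f_obs_hz, f0_hz):
    """Chemical shift in ppm of a line at f_obs_hz relative to f0_hz."""
    return (f_obs_hz - f0_hz) / f0_hz * 1e6
```

For example, `indirect_zero_frequencies(proton_zero_frequency(f_water, 4.7))` gives the 13C and 15N reference frequencies from a measured water frequency `f_water` in Hz.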
The zero frequency of 13C (15N) can then be obtained by multiplying the 1H zero frequency by 0.251 449 530
132
Protein NMR Spectroscopy
(0.101 329 118). Proper referencing can also be checked a posteriori by a comparison to expected shift behaviour by the program SHIFTCOR [188] at http://redpoll.pharmacy. ualberta.ca/ or by the linear analysis of chemical shifts (LACS) method [189], which estimates referencing errors from the correlation between the secondary chemical shift DdX of nucleus X and the Dd 13 Ca Dd13 Cb difference. The latter is also implemented in the TALOS þ program (see below). 4.6.3
Backbone Dihedral Angle Restraints from Chemical Shifts (TALOS)
With the rapid increase of high-resolution protein structures available in the Protein Data Bank (PDB) [190] and protein chemical shift assignments in the BioMagResBank (BMRB) [191], more accurate and precise empirical descriptions of the relation between protein structure and chemical shifts have become possible. In the following, the use of the programs TALOS [162] and its successor TALOS+ [192] will be described. These programs yield accurate predictions of φ and ψ backbone torsion angles from the knowledge of ¹³Cα, ¹³Cβ, ¹³C′, ¹⁵N, ¹Hα and ¹HN chemical shifts. The predicted φ and ψ torsion angles can then be used as restraints in structure calculation and refinement protocols or to cross-validate structures. In brief, TALOS searches a protein tripeptide database, currently constructed from about 200 proteins with available high-resolution X-ray structures and NMR chemical shift assignments, for the 10 best matches with similar secondary chemical shifts and residue types to a given tripeptide of the target protein (Figure 4.21a). The assumption is made that fragments with similar chemical shifts and residue type typically have similar backbone conformations. If these 10 matches indicate consistent values for the φ and ψ angles of the centre residues, their averages and standard deviations are used as the prediction for the φ and ψ angles of the centre residue of the query tripeptide. With this quality criterion, TALOS predicts φ and ψ torsion angles for, on average, 72 % of the residues in the proteins. For these residues the average errors of the predicted φ and ψ angles are about 13 degrees. To extend these accurate predictions to more residues, TALOS has been augmented by an artificial neural network (ANN) module, which first predicts a three-state distribution in the Ramachandran map, i.e. alpha, beta and positive-phi, which is then used to guide the database search procedure to identify the 10 best matches.
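The consensus step described above (accept a prediction only if the 10 best matches agree, then report their average and standard deviation) can be illustrated with a small sketch. This is not the TALOS implementation: the circular statistics, the `max_sd` threshold of 30 degrees and the function names are assumptions chosen purely for illustration.

```python
import math


def circular_mean_sd(angles_deg):
    """Circular mean and circular standard deviation of angles given in degrees."""
    n = len(angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    mean = math.degrees(math.atan2(s, c))
    r = min(1.0, math.hypot(s, c))  # mean resultant length, clamped against rounding
    sd = math.degrees(math.sqrt(-2.0 * math.log(r))) if r > 0.0 else float("inf")
    return mean, sd


def consensus_prediction(matches, max_sd=30.0):
    """Return ((phi, sd_phi), (psi, sd_psi)) if the matches are consistent, else None.

    matches: list of (phi, psi) pairs of the centre residues of the best matches.
    max_sd:  hypothetical consistency threshold in degrees (not the TALOS criterion).
    """
    phi = circular_mean_sd([m[0] for m in matches])
    psi = circular_mean_sd([m[1] for m in matches])
    if phi[1] <= max_sd and psi[1] <= max_sd:
        return phi, psi
    return None  # treated as ambiguous: no restraint is generated


# Ten helix-like matches agree, so a prediction is returned.
helical = [(-60.0 + d, -45.0 - d) for d in range(-5, 5)]
print(consensus_prediction(helical))
```

Circular rather than arithmetic averaging is used so that angles near ±180 degrees are handled correctly.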
This improved TALOS+ method is able to predict backbone φ and ψ torsion angles with the same accuracy for, on average, 88 % of the total residues. Importantly, most of these additional predictions are made for the residues in loop or turn regions. These additional torsion angle restraints are very valuable for protein structure calculation and refinement. Using chemical shifts, TALOS+ also predicts the regular secondary structure and the backbone order parameters S² [193] of the target protein (Figure 4.21b, lower right panel). The S² prediction can be used to identify dynamic residues and to avoid erroneous φ/ψ predictions for such cases.

Recipe 4.10: Using the TALOS+ Program (for details see http://spin.niddk.nih.gov/bax/software/TALOS+/)
1. Prepare a TALOS+ formatted input table (‘inCS.tab’) with the sequence and chemical shift assignments. Note that in the ‘DATA SEQUENCE’ field, reduced and oxidised cysteine residues are defined as ‘C’ and ‘c’, respectively. CYANA users can use a standard CYANA macro ‘taloslist.cya’.
2. Use the script talos+ to perform the database search: talos+ -in inCS.tab. A file ‘pred.tab’ is created, where a summary of the prediction results is stored.
3. Execute rama+ to manually inspect and adjust the predictions made by the program: rama+ -in inCS.tab (cf. Figure 4.21b). The -ref PDBfile option can be supplied if a reference structure is available. During this inspection step, a graphic interface displays, for any residue of the target protein, the φ/ψ distribution of the centre residue of the 10 best database matches in an empirical Ramachandran map (Figure 4.21b). A ‘good’ prediction is obtained if the centre residues of all 10 best matches fall in a consistent φ/ψ region and if the artificial neural network yields high confidence for its φ/ψ distribution prediction. All other cases are considered as ‘ambiguous’ or ‘no prediction’. The ‘bad’ classification is only used for deviations relative to a known reference structure. The prediction files are overwritten to reflect any changes made interactively, and a final ‘pred.tab’ is created containing the adjusted φ and ψ predictions (average and standard deviation).
4. Convert the TALOS+ predictions (pred.tab) into dihedral angle restraints for a conventional NMR structure calculation in Xplor or CYANA by the scripts talos2xplor.com or talosaco.com, respectively. Both scripts are included in the TALOS+ distribution.

Figure 4.21 (a) Flowchart of TALOS+ method. (b) Graphic TALOS+ inspection interface for TolR protein. (See details at TALOS+ webpage http://spin.niddk.nih.gov/bax/software/TALOS+/). Please refer to the colour plate section

4.6.4 Protein Structure Determination from Chemical Shifts (CS-Rosetta)
In the past, chemical shifts also have been used in combination with other NMR observables such as RDC or sparse NOE data for generating protein structures with relatively modest quality [147,194,195]. More recently, several computational protocols [3,4,186,187] have been developed that use the NMR chemical shifts alone as input for structure determination of proteins. These approaches, represented by CS-Rosetta [4] and Cheshire [3], are based on similar strategies (Figure 4.22a), which in essence involve three steps:
1. Fragment library generation. The experimental backbone and ¹³Cβ chemical shifts are matched to a structural database to identify short protein fragments (3 and 9 residues long) with similar chemical shift and amino acid sequence patterns. As the number of solved protein structures with NMR assignments remains relatively small and would not cover all query fragments, the database is augmented by further proteins of known structure without chemical shift assignments, but which are ‘assigned’ by empirical chemical shift prediction methods such as SPARTA [171]. The recent improvements in empirical chemical shift prediction and fragment search methods [195] yield sufficient accuracy for the following steps of structure determination.
2. Fragment assembly and structure generation. The selected protein fragments are then used as input for a Monte Carlo fragment assembly and relaxation procedure to generate all-atom protein structures. For example, in the CS-Rosetta protocol the Rosetta program [196] is used. This step relies on accurate force fields to obtain proper local geometries, for example with respect to hydrogen bonding, hydrophobic packing, and so on.
3. Structure evaluation and selection. The generated full-atom structures are further inspected for their fit to the input chemical shift data.
If proper agreement is observed for models with low structural energy, the prediction is deemed to be good and the structures with the lowest energy are selected.

These approaches to structure determination have been demonstrated for dozens of proteins with sizes of up to 15 kDa and a wide variety of folds. For the vast majority, convergence is obtained and yields all-atom protein models that compare well with experimental structures, with root mean square deviations (rmsd) from the conventionally determined reference structures in the 0.7–2.0 Å range for the backbone atoms, and 1.4–3.0 Å when considering all atoms [3,4]. The CS-Rosetta procedure has been successfully applied for tens of structural genomics target proteins, prior to completion of the conventional NMR structure determination process [4]; it has also been shown to be robust for proteins with incomplete NMR chemical shift assignments [197]. More recently chemical shift-based structure determination has also been applied to the solid-state [197,198], protein-protein complex docking [199] and small protein assemblies [200]. Considering that the backbone chemical shifts are available at the early stage of a traditional NMR structure determination, prior to the collection and analysis of structural restraints, the chemical shift-based approach presents a viable and efficient alternative to NOE-based structure determination for small to medium-size proteins, and potentially provides a new direction for high-throughput structure determination in structural genomics [201].

Figure 4.22 (a) Flowchart of CS-Rosetta protocol. (b–e) CS-Rosetta structure generation of TolR protein. (b) Plot of Rosetta full-atom energy versus Cα rmsd relative to the experimental TolR monomer structure for all CS-Rosetta models. (c) as (b) but showing the Rosetta full-atom energy rescored (augmented) by the chemical shift deviation (χ²) energy. (d) as (c) showing the Rosetta full-atom energy rescored by the chemical shift deviation (χ²) energy versus the Cα rmsd from the lowest energy CS-Rosetta model. (e) Backbone ribbon representation of 10 CS-Rosetta models with lowest energy (dark grey) superimposed on the experimental monomer NMR structure (light grey) for TolR. Please refer to the colour plate section

Recipe 4.11: CS-Rosetta Structure Calculation
1. Data Preparation. Prepare a TALOS formatted input table (‘inCS.tab’) with the sequence and chemical shift assignments as described under Recipe 4.10 step 1. Note that the residue numbering must start with 1, and residues in long flexible tails are better removed in order to improve the convergence of generated structures and to decrease the computing time. Ensure proper referencing; possible referencing errors and chemical shift outliers will, however, also be checked in the next step prior to the database search.
2. Fragment Selection. The script runCSRjob.com is the master script to search the protein database and generate fragment candidates. To use this script, the input chemical shift file is specified in the command line as runCSRjob.com inCS.tab.
This will create a set of 200 best-matched fragments, with similar chemical shift and amino acid type patterns, selected from the protein database for each overlapped 9-residue and 3-residue fragment in the target protein.
3. Structure Generation. The script runRosetta.com, generated by runCSRjob.com, is then used to start the standard Rosetta fragment assembly and full-atom relaxation from the protein fragments. By default, this step generates 5000 Rosetta full-atom models, each of which is scored by a Rosetta full-atom energy related to the hydrogen bonding, the hydrophobic packing, and so on (Figure 4.22b).
4. Structure Evaluation and Selection. The generated full-atom structures need to be further evaluated for their agreement with the original experimental chemical shifts. This can be performed by the command runCSrescore.com silent_file.out inCS.tab. This procedure compares the SPARTA-predicted chemical shifts of each model with the experimental chemical shifts; the difference is added to the Rosetta full-atom energy as a ‘pseudo chemical shift’ energy term (Figure 4.22c). The rescored Rosetta energy is stored in an output file along with a Cα rmsd value relative to the lowest energy structure, for each generated Rosetta structure (Figure 4.22d). Convergence is assumed if the 10 structures with the lowest Rosetta energy all cluster within 2 Å rmsd for the backbone atoms (or Cα atoms only). The 10 lowest energy structures are selected as the final predicted ensemble. Note that even if the convergence criterion is not achieved, the results may still be useful for further analysis, but should be interpreted with caution.
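The selection and convergence test of step 4 can be sketched in a few lines of Python. The tuple layout used here (model name, rescored energy, Cα rmsd to the lowest energy model) is a hypothetical in-memory representation, not the format of the file written by the rescoring script; only the criterion itself (10 lowest energy models, all within 2 Å) is taken from the recipe.

```python
def select_ensemble(models, n_best=10, rmsd_cutoff=2.0):
    """Pick the n_best lowest-energy models and test the convergence criterion.

    models: list of (name, rescored_energy, ca_rmsd_to_lowest_energy_model) tuples
            (hypothetical layout; rmsd in Angstrom).
    Returns (selected_models, converged).
    """
    ranked = sorted(models, key=lambda m: m[1])  # lowest rescored energy first
    best = ranked[:n_best]
    converged = len(best) == n_best and all(m[2] <= rmsd_cutoff for m in best)
    return best, converged


# Synthetic example: ten low-energy models close to the best one, plus decoys.
models = [(f"model_{i}", -100.0 + i, 0.2 * i) for i in range(10)]
models += [(f"decoy_{i}", -50.0 + i, 8.0) for i in range(20)]
ensemble, converged = select_ensemble(models)
print(converged, [name for name, _, _ in ensemble])
```

If `converged` is false, the ensemble is still returned so that it can be inspected, in line with the caution expressed in the recipe.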
Figure 4.22b–e shows the results of the CS-Rosetta calculation for the high-resolution monomer structure of the homo-dimeric protein TolR [202]. This protein consists of 2 × 72 residues. For TolR, all 10 lowest energy CS-Rosetta structures (of a total of 10 000 calculated) are within 1.0 Å rmsd from the backbone atoms of the TolR NMR structure (2.0 Å rmsd for all heavy atoms). Note that CS-Rosetta is able to determine this monomer structure within the dimeric TolR. However, caution should be taken for such cases, in particular when applying the approach to intertwined dimers. With the current CS-Rosetta protocol, 5000–20 000 predicted structures are generally required to obtain convergence. For small proteins (up to 90–100 amino acids), 1000 to 5000 CS-Rosetta structures often suffice. Note that the generation of thousands of Rosetta structures is computationally demanding (on the order of 5–10 minutes per structure for a 70–100 amino acid protein on a single 2.4 GHz CPU). Thus the use of a computer cluster is usually necessary. As an alternative, the calculation can be carried out at the eNMR CS-Rosetta webserver (http://www.enmr.eu/webportal/).
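To make the computational cost concrete, the short sketch below turns the quoted figures (5–10 minutes per structure) into total CPU hours and into wall-clock time on a cluster. The core count of 100 and the assumption of perfect parallel scaling are illustrative only.

```python
def cpu_hours(n_models, minutes_per_model):
    """Total single-CPU time in hours for n_models structures."""
    return n_models * minutes_per_model / 60.0


def wall_hours(n_models, minutes_per_model, n_cores):
    """Wall-clock hours on n_cores, assuming ideal (perfect) parallel scaling."""
    return cpu_hours(n_models, minutes_per_model) / n_cores


for minutes in (5, 10):
    total = cpu_hours(5000, minutes)
    print(f"{minutes} min/model: {total:.0f} CPU h, "
          f"{wall_hours(5000, minutes, 100):.1f} h on 100 cores")
```

For 5000 models this gives roughly 420–830 CPU hours, which is why a single workstation is rarely sufficient.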
4.7 Solution Scattering Restraints
The solution NMR data discussed above can be very effective at specifying torsion angles and short interatomic distances, as well as orientations of interatomic vectors relative to a common frame, but typically lack long-range translational information. For multidomain systems or macromolecular complexes, interdomain or intermolecular NOEs are often sparse and difficult to obtain. Invariably, the density of NMR restraints decreases with increasing molecular size. Additional long-range constraints are thus important for accurate structure determination. Both paramagnetic NMR methods (Chapter 6) and solution X-ray or neutron scattering are valuable in this context (see Chapter 8 for applications to macromolecular complexes). The usefulness of small angle X-ray/neutron scattering (SAXS/SANS) data to define the macromolecular shape and the speed with which they can be acquired and analysed have increased the popularity of solution scattering methods in the last decade. Since the scattering data carry long-range translational information, they ideally complement global orientational NMR restraints such as RDCs. NMR and scattering data can be acquired under very similar conditions, thus minimising the risk of structural or biochemical changes between the two measurements. The information content of the scattering data within a given resolution range and their signal/noise ratio increase with the size of the particle. When scattering data are used for structural analysis, care must be taken to avoid artifacts from sample heterogeneity, radiation damage, or interparticle interference. This section briefly discusses the basics of solution scattering, practical procedures for data acquisition and analysis and their structural applications; for further details the reader is referred to several excellent reviews [203–205].

4.7.1 Physical Background
When a radiation wavefront hits a target, spherical waves are generated at each scatterer, which interfere according to their phase differences at a given point. The amplitude of the scattered X-ray wave is inversely proportional to the mass of the charged scattering particle. Therefore nearly all X-ray scattering originates from electrons. Neutrons, on the other hand, interact strongly with the atomic nuclei. As a result, the X-ray scattering amplitudes increase with the atomic number while the corresponding values for neutrons span a relatively narrow range and depend on the nuclear spin state. An important aspect of neutron scattering is a significant difference between the scattering amplitudes of the ¹H and ²H hydrogen isotopes. For the following description we will assume that (i) scattering is coherent with no loss of energy, (ii) detection occurs in the Fraunhofer (far field) regime, and (iii) the first Born approximation holds, i.e. the scattering amplitude of a composite object results from the sum over the individual scatterers that constitute the object. The scattering amplitude for an electron density $\rho(\mathbf{r})$ in the direction $\mathbf{q}$ of the inverse space is given by its Fourier transform

$$F(\mathbf{q}) = \int d^3r\, \rho(\mathbf{r}) \exp(i\mathbf{q}\cdot\mathbf{r}) \tag{4.44}$$
where the length $q$ of the scattering vector equals $4\pi \sin(\theta)/\lambda$, and $2\theta$ and $\lambda$ are the scattering angle and the radiation wavelength, respectively. For a set of point-like atomic scatterers, the above expression becomes

$$F(\mathbf{q}) = \sum_{j=1}^{N} f_j(q) \exp(i\mathbf{q}\cdot\mathbf{r}_j) \tag{4.45}$$
where $\mathbf{r}_j$ are the atomic coordinates and the atomic form factors $f_j(q)$ are the Fourier transforms of the respective electron densities. In neutron scattering, the point sizes of the nuclei relative to the neutron wavelength cause the scattering amplitudes to be $q$-independent. Since the detector measures the flux of the energy per unit area, the scattering intensity is proportional to $I(\mathbf{q}) = F(\mathbf{q})F^{*}(\mathbf{q}) = |F(\mathbf{q})|^2$, which for an isotropic solution takes the form

$$I(q) = \left\langle |F(\mathbf{q})|^2 \right\rangle_{\Omega} \tag{4.46}$$
where the angular bracket denotes the isotropic average over all orientations $\Omega$ of the scatterer relative to the incident photons. Solution scattering data are measured as the difference between the scattering by the sample and the matching buffer. This difference corresponds to the contrast between the macromolecule including its surface layer of perturbed solvent and the bulk solvent occupying the same volume

$$I(q) = \left\langle \left| F_{\mathrm{mol}}(\mathbf{q}) + \delta F_{\mathrm{surf}}(\mathbf{q}) - F_{\mathrm{displ}}(\mathbf{q}) \right|^2 \right\rangle_{\Omega} \tag{4.47}$$
CRYSOL [206] and CRYSON [207] are popular programs that employ this expression for evaluating the agreement between solution X-ray or neutron scattering data and the atomic coordinates of the macromolecule. The average over all orientations in Equation 4.47 can also be carried out analytically producing the Debye formula:

$$I(q) = \sum_{i,j=1}^{N_{\mathrm{scat}}} f'_i(q)\, f'_j(q)\, \frac{\sin(q r_{ij})}{q r_{ij}} \tag{4.48}$$
Here, the double sum is over all scatterer atom pairs, $f'(q)$ are the displaced solvent-corrected form factors, and $r_{ij}$ the interatomic distances. The computation of Equations 4.47 and 4.48 can be accelerated by combining groups of spatially proximal atoms into ‘globs’ [208]. Scattering form factors for each glob are calculated over all its constituting atoms as

$$f_j^{\mathrm{glob}}(q) = \left( \sum_{k,l=1}^{N_j^{\mathrm{glob}}} f_k(q)\, f_l(q)\, \frac{\sin(q r_{kl})}{q r_{kl}} \right)^{1/2} \tag{4.49}$$
and the sums in Equations 4.45 and 4.48 are then over globs rather than atoms at the expense of certain systematic errors [209]. According to the convolution theorem, the scattering intensities correspond to the (inverse) Fourier transform of the distance auto-correlation function of the scatterers, $P(\mathbf{r})$:

$$\mathrm{FT}[F(\mathbf{q})F^{*}(\mathbf{q})] = \mathrm{FT}[F(\mathbf{q})] \otimes \mathrm{FT}[F^{*}(\mathbf{q})] = \rho(\mathbf{r}) \otimes \rho(-\mathbf{r}) = \int d^3r'\, \rho(\mathbf{r}')\rho(\mathbf{r}'+\mathbf{r}) = P(\mathbf{r}) \tag{4.50}$$
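As an illustration of the Debye formula (Equation 4.48), the following NumPy sketch evaluates I(q) for a small set of point scatterers. It is a minimal example and not the algorithm used by CRYSOL or CRYSON: the form factors are assumed to be q-independent constants (as is appropriate for neutrons), no solvent or surface-layer terms are included, and the function name is hypothetical.

```python
import numpy as np


def debye_intensity(coords, q_values, form_factors=None):
    """Evaluate I(q) = sum_ij f_i f_j sin(q r_ij) / (q r_ij) for point scatterers.

    coords:       (N, 3) array of scatterer positions (Angstrom).
    q_values:     1-D array of scattering vector lengths (1/Angstrom).
    form_factors: optional (N,) array of q-independent form factors (assumed 1).
    """
    coords = np.asarray(coords, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    n = len(coords)
    f = np.ones(n) if form_factors is None else np.asarray(form_factors, dtype=float)
    # Pairwise distances r_ij, including the i == j terms with r = 0.
    diff = coords[:, None, :] - coords[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))
    ff = np.outer(f, f)
    intensity = np.empty_like(q_values)
    for k, q in enumerate(q_values):
        # np.sinc(x) = sin(pi x)/(pi x), so sinc(q r / pi) = sin(q r)/(q r),
        # which correctly equals 1 in the limit q r -> 0.
        intensity[k] = (ff * np.sinc(q * r / np.pi)).sum()
    return intensity


# Two unit scatterers 10 Angstrom apart: I(q) = 2 + 2 sin(10 q) / (10 q).
q = np.array([0.0, 0.1, 0.3])
print(debye_intensity([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]], q))
```

The double sum scales quadratically with the number of scatterers, which is the motivation for the ‘glob’ approximation of Equation 4.49.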
The average of $P(\mathbf{r})$ over the isotropic orientations in solution (Equation 4.46) produces a 1-D probability distribution of the scattering density within the scattering object, $P(r)$. The mathematical conversion between $I(q)$ and $P(r)$ should occur without any loss of information. However, the correct $P(r)$ can rarely be obtained via the Fourier transform of the experimental $I(q)$ due to limitations in the observable $q$-range. So-called regularised FT methods implemented in software such as GNOM or GIFT [210,211] use constraints $P(r) \geq 0$, $P(0) = P(D_{\max}) = 0$ (where $D_{\max}$ is the maximum dimension of the scatterer) as well as others to describe a ‘proper’ $P(r)$ for a compact object. As a consequence, the reconstructed $P(r)$ depends on the program’s regulariser in addition to the scattering data. Therefore, using the experimental $I(q)$ directly for structure refinement rather than the indirectly obtained $P(r)$ may represent a cleaner approach.

4.7.2 Shape Reconstructions from Solution Scattering Data
While the knowledge of the atomic coordinates of the scattering object is sufficient to predict the scattering profile, the inverse problem of obtaining the 3-D structure from 1-D scattering data is indeterminate. Therefore, low-resolution shape reconstruction from scattering data alone represents one of the most significant achievements of the technique. The main idea of these methods is regularisation of the reconstructed model by requiring a compact, connected shape with a constant electron density. DAMMIN [211] and GASBOR [208,212] are popular programs for ab initio shape reconstructions. Both allow specifying the point symmetry and anisometry of the object, thereby increasing the accuracy of the reconstructions when such information is known. The price for the apparent simplicity of ab initio reconstructions is that the obtained shapes do not necessarily correspond to true low-resolution structures of the macromolecules; rather the averaged scattering from these shapes corresponds to the measured data. The reasons for this include (i) loss of information due to orientational averaging, (ii) the effect of the regulariser and (iii) the systematic errors in the predicted scattering profile calculated from densely packed spheres rather than all-atom models [208]. In addition, often several distinct classes of reconstructed shapes fit the scattering data equally well. In particular, the mirror image of any reconstructed low-resolution model yields an identical scattering profile.
4.7.3 Use of SAXS in High-Resolution Structure Determination
The notion of high-resolution structure refinement against SAXS data seems counterintuitive since their nominal resolution is quite low. For instance, a q_max of 0.5 Å⁻¹ corresponds to a maximal resolution of only about 12 Å (resolution = 2π/q_max). It thus appears that such data cannot be sensitive to structural rearrangements on the scale of 1–5 Å, which typically occur during structure refinement. However, this is not necessarily the case, as illustrated in Figure 4.23. In addition to the nominal resolution range, the signal/noise ratio and the amount of other structural information available for constructing a molecular model determine how much the scattering data can reveal. Thus all-atom structural models usually yield the best accuracy of the derived structures, since they do not contain systematic errors from lower-resolution approximations and easily incorporate general knowledge about the biomolecular structures as well as further data from other experimental techniques such as NMR.

Figure 4.23 X-ray solution scattering data for lysozyme (APS synchrotron, 5 mg/mL) and CRYSOL fits of structures 193L and 1E8L (backbone rmsd of 1.5 Å, inset)

In practice, all-atom structural models are derived from constrained molecular dynamics simulations, which minimise a χ² penalty function of the deviation between the predicted and measured scattering intensities divided by the experimental uncertainties σ_expt:

$$\chi^2 = \frac{1}{N} \sum_{j=1}^{N} \left( \frac{I_{\mathrm{expt}}(q_j) - I_{\mathrm{calc}}(q_j)}{\sigma_{\mathrm{expt}}(q_j)} \right)^2 \tag{4.51}$$

Thus the penalty function emphasises the more intense lower-q data over the higher-q data, which invariably exhibit lower precision and accuracy. For the calculation of I_calc, either Equations 4.45–4.47 or Equation 4.48 can be used with the Xplor-NIH or CNS structural refinement packages [8,213,214]. In both cases, the ‘globbic approximation’ (Equation 4.49) and sparse sampling of data points (e.g. a Δq increment of 0.005–0.01 Å⁻¹) can speed up the computation. The forces generated during fitting of the solution scattering data can cause unwanted distortions on the small length scale. For this reason, interatomic clash statistics have to be monitored, especially in cases when fitting against SAXS data produces a decrease in the radius of gyration of the macromolecule. Deterioration of interatomic clash statistics may necessitate increasing the van der Waals radii. Furthermore, the SAXS χ² force constant and the q_max of the fitted data have to be optimised by validating the structural quality against the measured scattering data outside the fitted resolution range or against RDCs that are left out of the refinement for cross-validation. To counteract small length scale distortions, it is recommended to keep the structure locally rigid but globally flexible [214]. For example, in a multidomain structure refinement it is beneficial to completely fix the structures of the individual domains, if they are well defined, and to leave the interdomain arrangement flexible. When a structural homologue exists for the refined model, an effective strategy is to keep short-range overlapping sequential segments of the refined structure close to the homologue [214].
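The χ² penalty of Equation 4.51 is straightforward to evaluate once predicted and measured intensities are available on the same q grid. The sketch below is a minimal NumPy illustration; it assumes that I_calc has already been scaled to the experimental data, and the function name and example numbers are hypothetical.

```python
import numpy as np


def saxs_chi2(i_expt, i_calc, sigma_expt):
    """Reduced chi-squared between measured and predicted intensities (Equation 4.51).

    All inputs are 1-D arrays sampled at the same q values; i_calc is assumed
    to be on the same scale as i_expt.
    """
    i_expt = np.asarray(i_expt, dtype=float)
    i_calc = np.asarray(i_calc, dtype=float)
    sigma_expt = np.asarray(sigma_expt, dtype=float)
    return float(np.mean(((i_expt - i_calc) / sigma_expt) ** 2))


# Hypothetical three-point example: residuals of 1, 0 and -2 sigma.
print(saxs_chi2([10.0, 5.0, 1.0], [9.0, 5.0, 2.0], [1.0, 0.5, 0.5]))
```

Because each residual is divided by its experimental uncertainty, the precisely measured low-q points dominate the sum, as noted in the text.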
The combination of orientational NMR restraints from residual dipolar couplings (RDCs) with SAXS data is particularly effective due to the high complementarity of such data [8,202,213–217]. It has been shown for both RNA and proteins that accurate assembly of multisubunit constructs is possible with RDC and SAXS data alone, without any interdomain NOE distance restraints [202,216].

4.7.4 Sample Preparation
Monodispersity of the scattering object is the most important requirement for the sample. In practice, sample purity has to be better than 95 %, in particular with respect to the larger-size contaminants. For proteins with a MW < 200 kDa, 1–10 mg/mL is a suitable range for measurements; for larger sizes, lower concentrations of 0.1–2 mg/mL should be used. Typical sample volumes are on the order of about 20–100 µL for SAXS and 300–600 µL for SANS. Salts in the buffer are used to suppress long-range electrostatic interactions between solutes that would otherwise affect the low-q data at a finite solute concentration (so-called structure factor). They also increase incoherent background scattering and decrease solute/solvent contrast. However, such effects are negligible up to 0.5–1.0 M for NaCl. Due to the higher surface charge of oligonucleotides as compared to proteins, the suppression of the interparticle correlations by salts is weaker for oligonucleotides at a given salt and solute concentration. A salt concentration of 150 mM NaCl is usually sufficient to suppress the structure factor at q values above 0.02 Å⁻¹ for protein concentrations below 5–8 mg/mL and for DNA/RNA concentrations below 1 mg/mL. Buffer compounds with high electron density or X-ray absorption should be avoided. Examples include transition metal ions, phosphates, and guanidinium chloride. Since the scattering profile of a biomolecular sample in solution is obtained from the small differences between relatively weak scattering signals of the solute and the buffer reference (Figure 4.24), a very good match between the two is required. Typically this can be achieved by a 16–48 hour dialysis or by repeatedly washing the sample with the buffer in a centrifugal filter device. Free radical scavengers should be included in the buffer for synchrotron data collection to minimise radiation damage, especially for proteins containing surface-exposed cysteine residues. Common choices are DTT (2–10 mM), TCEP (1–2 mM), glycerol or sugars (5–10 %), or organic buffering agents such as TRIS or HEPES. To decrease bubble formation during measurement, the buffer should be degassed, which is especially important for multihour data collections. Detergents should be avoided unless absolutely necessary as their signal can complicate data interpretation. To remove large particles it is also recommended to pass the samples through a 0.2 µm filter. Aggregation is the most common problem that can render data uninterpretable. Aggregation can be detected by dynamic light scattering or analytical ultracentrifugation. It may be possible to remove aggregates by native gel filtration or centrifugation through a high-MW cutoff membrane. Freshly prepared samples kept at low concentrations until data collection work best in difficult cases. Freeze/thaw cycles should also be avoided since they can promote aggregation. When freezing is unavoidable, cryo-protectants (5–10 % glycerol) can be used.

Figure 4.24 X-ray solution scattering data for Dickerson dodecamer DNA (APS synchrotron, 5 mg/mL). Approximately 1 out of 10⁵ incident photons is scattered by biomolecular samples

4.7.5 Data Collection
The instrumentation for scattering comprises the radiation source (X-ray photons or neutrons), a monochromator, a beam collimator, the sample cell, an evacuated enclosure between the sample and the detector, the beamstop, and monitors for scattered and transmitted radiation. X-ray photons for SAXS have typical wavelengths of 0.5–2 Å (synchrotron beamlines) or 1.542 Å (Cu Kα radiation of laboratory-based instruments), whereas the thermal neutrons for SANS from nuclear reactors usually have a wavelength range of 5–20 Å. Scattering data collection minimally involves the sequential measurements of the buffer and the sample along with transmitted and incident intensities. In order to minimise protein deposition, cleaning of the sample cell between measurements is crucial. A sequence of water/3 % sodium hypochlorite/isopropanol/water works well with quartz capillaries. The final wash before any measurement should be the matching buffer. Data measurement is typically done in sequential frames, which can reveal problems such as bubble formation or radiation damage. In the latter case, beam attenuation, decreased exposure, or a flow-through set-up are recommended. To test for concentration dependence, scattering data should be acquired as a 3–6 point series covering an order of magnitude in concentration. It is best to measure data in SAXS and WAXS (wide-angle X-ray scattering) sample-to-detector configurations, covering the ranges from 0.008 Å⁻¹ to 0.2–0.3 Å⁻¹ and from 0.1 Å⁻¹ to 2.2 Å⁻¹, respectively, and then merge the two data sets. SAXS can be used to detect interparticle interference, aggregation or radiation damage, whereas WAXS expands the resolution range and can be used to evaluate the sample-to-buffer matching. Exposure times should be optimised to prevent radiation damage or detector saturation. Higher solute concentrations (5–20 mg/mL) can be used for WAXS since structure factors at these conditions rarely extend beyond 0.1 Å⁻¹. For lab-based X-ray sources, radiation damage is a smaller issue, with the signal/noise becoming the limiting factor. Minimal practical protein concentrations for such measurements are often 1 mg/mL, corresponding to data collection times on the scale of hours. Multihour data collections are also typical for SANS experiments at reactor-based neutron sources, since the neutron fluxes are roughly 10⁴ times smaller than the X-ray fluxes of synchrotrons. For SANS typical protein concentration ranges are 2–10 mg/mL. In practice, biomolecular SANS has advantages over SAXS in two cases. (i) Neutrons produce virtually no radiation damage for biomolecular samples. Thus SANS may be helpful for samples that exhibit extreme sensitivity to X-rays. (ii) The differential scattering of ¹H and ²H can be used to separate signals from different components of the scattering particle, for example for multiple subunits such as protein/protein or protein/RNA/DNA complexes. In such cases, the scattering profiles of individual subunits and their cross-terms can be distinguished from measurements at variable solvent H₂O/D₂O ratios. For a two-part object, the SANS data have the form

$$I(q) = \Delta\rho_1^2 I_{11}(q) + \Delta\rho_2^2 I_{22}(q) + \Delta\rho_1 \Delta\rho_2 I_{12}(q) \tag{4.52}$$
where $\Delta\rho_1$ and $\Delta\rho_2$ are the contrasts between the scattering length densities of the two components and the solvent, and $I_{11}$, $I_{22}$, and $I_{12}$ are the scattering profiles for the components and the cross-term, which can be reconstructed from measurements at more than three different solvent ¹H/²H ratios [218]. Differences between $\Delta\rho_1$ and $\Delta\rho_2$ arise either from the different chemical nature of the subunits (protein vs. RNA/DNA) or, for protein/protein complexes, when one of the constituents is protonated and the other is deuterated. At the contrast match point (42, 65, and 70 % D2O for proteins, RNA, and DNA, respectively), the protonated component becomes invisible to neutrons. The contrast-matched measurements allow a precise determination of the centre-of-mass separation between the deuterated subunits, useful for their positioning within the complex (Figure 4.25).

Figure 4.25 Panel (a) shows experimental SANS data (filled circles) for the complex between protonated neuroligin 1 and deuterated neurexin proteins in 42 % D2O buffer [219] (NG3 instrument, NCNR, NIST, 2 m and 5 m sample-to-detector distances, 2 hr and 4 hr collection times, $\Delta\lambda/\lambda = 15$ %, 5.2 mg/mL). The solid line indicates the indirect Fourier transform fit via GNOM. Panel (b) shows the corresponding P(r) distribution with uncertainties indicated by vertical bars. The inset depicts the structure of the neuroligin/neurexin complex. The position of the maximum at 107 Å corresponds to the centre-of-mass separation between the deuterated neurexin subunits. A subsidiary maximum at 58 Å corresponds to the correlation between the interacting neuroligin and neurexin subunits due to inhomogeneity of the scattering length density of the protonated neuroligin
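The contrast-variation decomposition of Equation 4.51 can be illustrated with a minimal sketch. At each q value, the intensities measured at several solvent H2O/D2O ratios form a linear system in the three unknowns I11, I22, and I12, which can be solved by least squares. The function names, the plain normal-equation solver, and the numerical values are illustrative assumptions, not part of any published SANS package; a real analysis would also propagate the experimental uncertainties.

```python
def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("contrast points do not determine all three terms")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def decompose_contrast_series(contrasts, intensities):
    """Least-squares estimate of (I11, I22, I12) at a single q value.

    contrasts   -- one (drho1, drho2) pair per solvent H2O/D2O ratio
    intensities -- the measured I(q) at that q, one value per ratio
    """
    if len(contrasts) != len(intensities) or len(contrasts) < 3:
        raise ValueError("need at least three matched contrast points")
    # Each row holds the coefficients of I11, I22 and I12 in Equation 4.51.
    rows = [(d1 * d1, d2 * d2, d1 * d2) for d1, d2 in contrasts]
    # Normal equations (A^T A) x = A^T b of the overdetermined system.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * y for r, y in zip(rows, intensities)) for i in range(3)]
    i11, i22, i12 = solve_linear(ata, atb)
    return i11, i22, i12


# Synthetic example in arbitrary units (values are made up for illustration).
contrasts = [(2.0, -1.0), (1.0, 1.0), (0.5, 3.0), (-1.0, 2.0)]
true_terms = (4.0, 1.5, -0.7)
intensities = [d1 * d1 * true_terms[0] + d2 * d2 * true_terms[1]
               + d1 * d2 * true_terms[2] for d1, d2 in contrasts]
recovered = decompose_contrast_series(contrasts, intensities)
```

Applying the function independently at every q value yields the three basis profiles. Three contrast points are the algebraic minimum; additional points overdetermine the system and make the estimate less sensitive to noise.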
4.7.6 Data Processing and Initial Analysis
After initial corrections for dark current profile and detection efficiency, and removal of spurious spots from window material and cosmic radiation, the scattering data are azimuthally integrated to give a one-dimensional I(q) profile, which is normalised for the incident beam intensity and the transmission coefficient. The scattering from the buffer reference is then subtracted from that of the sample, taking into account the volume fraction of the solvent in the sample, as

\[ I(q) = I_{\mathrm{sample}}(q) - \alpha\, I_{\mathrm{buffer}}(q) \tag{4.53} \]
Here the rescaling constant $\alpha$ is calculated from the sample concentration $c$ (in mg/mL) as $\alpha = 1 - 7.425 \times 10^{-4}\, c$ for proteins and $\alpha = 1 - 5.4 \times 10^{-4}\, c$ for RNA/DNA, and $I_{\mathrm{sample}}$ and $I_{\mathrm{buffer}}$ are scattering intensities scaled by the respective incident beam intensities and transmissions. The scattering profiles recorded for each individual data frame are then superimposed, and outliers due to bubbles, beam position instability, or radiation damage are removed. The subtracted scattering data have to be analysed to ensure the absence of concentration dependence at low q, or extrapolated to zero concentration, going as low in concentration as the signal allows, i.e. typically 0.2–0.5 mg/mL on a synchrotron or 1–3 mg/mL on a lab source for a protein. Sample aggregation or polydispersity, indicated by nonlinearity of the Guinier plot or by the indeterminate nature of the $D_{\max}$ parameter of the P(r) curve, has to be ruled out as well. In cases when the amount of a dimer is below 10 %, these effects may be subtle, with P(r) going to zero at a finite distance but the extracted $D_{\max}$ and radius of gyration, $R_{\mathrm{Gyr}}$, being higher than expected for the monomeric form. In some cases, interparticle interference and aggregation can mask each other, making $R_{\mathrm{Gyr}}$ appear normal at a particular concentration, as they produce opposing effects on the low-q data. In such cases, concentration-dependent effects can be detected by plotting $R_{\mathrm{Gyr}}$ and $I(0)/c$ as functions of the concentration $c$. The contamination of the scattering profile by very large aggregates is often easier to correct than the contamination by a small oligomer (dimer/trimer etc.) continuum. In the former case, the unwanted signal is concentrated in the lowest-q data, while for the latter, the entire curve is affected.

A number of model-free geometric parameters can be obtained from the scattering data. $R_{\mathrm{Gyr}}$ can be calculated either using the Guinier approximation

\[ \ln[I(q)] \approx \ln[I(0)] - q^2 R_{\mathrm{Gyr}}^2 / 3 \tag{4.54} \]
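valid for $q_{\max} R_{\mathrm{Gyr}} < 1.3$ for globular and for $q_{\max} R_{\mathrm{Gyr}} < 0.8$ for elongated particles, or as half the normalised second moment of the P(r) distribution:

\[ R_{\mathrm{Gyr}}^2 = \frac{\int_0^{D_{\max}} r^2 P(r)\, dr}{2 \int_0^{D_{\max}} P(r)\, dr} \tag{4.55} \]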
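The buffer subtraction of Equation 4.53 and the Guinier estimate of Equation 4.54 can be sketched as follows. This is a minimal, unweighted illustration under stated assumptions: the function names are hypothetical, q is in Å⁻¹, the concentration is in mg/mL, and the fit is an ordinary least-squares line through ln I versus q². Production software would weight the points by their experimental errors and choose the fitting range automatically.

```python
import math


def subtract_buffer(i_sample, i_buffer, conc_mg_ml, coeff=7.425e-4):
    """Equation 4.53: I(q) = I_sample(q) - alpha * I_buffer(q).

    alpha = 1 - coeff * c, with coeff = 7.425e-4 for proteins and
    5.4e-4 for RNA/DNA (the values quoted in the text, c in mg/mL).
    """
    alpha = 1.0 - coeff * conc_mg_ml
    return [s - alpha * b for s, b in zip(i_sample, i_buffer)]


def guinier_fit(q, intensity, q_max):
    """Fit ln I(q) = ln I(0) - q^2 Rg^2 / 3 over points with q <= q_max.

    Returns (Rg, I0, q_max * Rg). The caller should check that the last
    value stays below about 1.3 (globular) or 0.8 (elongated particles).
    """
    pts = [(qi * qi, math.log(ii))
           for qi, ii in zip(q, intensity) if qi <= q_max and ii > 0.0]
    if len(pts) < 3:
        raise ValueError("too few usable points in the Guinier range")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    if slope >= 0.0:
        raise ValueError("non-negative Guinier slope; Rg is undefined")
    rg = math.sqrt(-3.0 * slope)
    i0 = math.exp(mean_y - slope * mean_x)
    return rg, i0, q_max * rg


# Synthetic check: an ideal Guinier curve on top of a flat buffer signal.
q = [0.01 + 0.005 * k for k in range(15)]
alpha = 1.0 - 7.425e-4 * 5.0
buffer_curve = [2.0] * len(q)
sample_curve = [100.0 * math.exp(-(qi * 15.0) ** 2 / 3.0) + alpha * 2.0 for qi in q]
subtracted = subtract_buffer(sample_curve, buffer_curve, conc_mg_ml=5.0)
rg, i0, q_rg = guinier_fit(q, subtracted, q_max=0.08)
```

Repeating the fit for each point of a concentration series and plotting the resulting radius of gyration and I(0)/c against c gives the concentration-dependence check described in the text.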
The maximum particle dimension $D_{\max}$ can be calculated with a practical accuracy of 5 % as the distance at which P(r) becomes 0, for situations where reliable scattering data are available with $q_{\min} < 2\pi/D_{\max}$. The particle volume can be obtained from the Porod invariant

\[ V = 2\pi^2 I(0) \Big/ \int_0^{\infty} I(q)\, q^2\, dq \tag{4.56} \]

with the accuracy depending on the shape and signal-to-noise of the data and becoming low for asymmetric particles and $q_{\max}$ above 0.2 Å⁻¹. The molecular mass MW of the protein can be calculated from the I(0) values as

\[ MW_{\mathrm{macromol}} = MW_{\mathrm{standard}}\, \frac{I_{\mathrm{macromol}}(0)\, c_{\mathrm{standard}}}{c_{\mathrm{macromol}}\, I_{\mathrm{standard}}(0)} \tag{4.57} \]
using standards such as lysozyme. As established from measurements on a series of proteins, the errors in MW usually do not exceed 10 %, making determination of the aggregation state quite robust [220]. The detection of flexibility of the scattering object from the SAXS data is possible but not without challenges. A significant amount of flexibility can manifest itself in the disappearance of the maximum in the so-called Kratky plot ($q^2 I(q)$ vs. $q$, Figure 4.26). On the other hand, a limited degree of flexibility can sometimes be difficult to detect from scattering data. For flexible systems, the $D_{\max}$ parameter derived from P(r) can be systematically underestimated [221]. Model calculations show that size polydispersity as caused by flexibility can severely decrease the maximal q value for which the Guinier approximation (Equation 4.54) holds, thus making correct determination of $R_{\mathrm{Gyr}}$ challenging [222]. The ensemble optimisation method (EOM) is a useful tool for fitting a structural ensemble to the scattering data of completely unfolded or flexibly linked systems [223]. The individual members of the fitted ensembles should not be considered snapshots of conformations present in the solution, due to the ill-defined nature of the problem. On the other hand, the aggregate characteristics of these ensembles, such as the distribution of $R_{\mathrm{Gyr}}$ and anisometries, are more reliable. Similar to the rigid case, joint refinement against SAXS and solution NMR data yielding a structural ensemble is feasible.

Figure 4.26 Kratky plots for a rigid compact protein (lysozyme), a protein with two rigid domains connected by a flexible linker (Pin1 Pro-isomerase), and an intrinsically unfolded protein (Sic1 cyclin-dependent kinase inhibitor). All data were collected at APS on 5 mg/mL samples. The number of the features in the scattering data decreases as the amount of structural disorder increases
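The model-free quantities of Equations 4.56 and 4.57 and the Kratky transform can be sketched in a few lines. The helper names are hypothetical, the integral of the Porod invariant is simply truncated at the last measured q value (an approximation, since the equation integrates to infinity), and all numbers in the example are made up for illustration.

```python
import math


def trapezoid(x, y):
    """Trapezoidal integral of y(x) over the sampled points."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k])
               for k in range(len(x) - 1))


def porod_volume(q, intensity, i0):
    """Equation 4.56, with the integral truncated at the measured q range."""
    invariant = trapezoid(q, [ii * qi * qi for qi, ii in zip(q, intensity)])
    return 2.0 * math.pi ** 2 * i0 / invariant


def mass_from_i0(i0, conc, i0_standard, conc_standard, mw_standard):
    """Equation 4.57: molecular mass relative to a standard such as lysozyme."""
    return mw_standard * (i0 * conc_standard) / (conc * i0_standard)


def kratky(q, intensity):
    """Return q^2 * I(q), the ordinate of the Kratky plot."""
    return [qi * qi * ii for qi, ii in zip(q, intensity)]


# Illustrative numbers only: a standard of 14.3 kDa measured at 5 mg/mL with
# I(0) = 10, and a sample measured at 2.5 mg/mL with I(0) = 20.
mw_estimate = mass_from_i0(20.0, 2.5, 10.0, 5.0, 14.3)

# Gaussian test profile I(q) = exp(-(a q)^2), for which the integral in
# Equation 4.56 has the closed form sqrt(pi) / (4 a^3).
a = 10.0
q_grid = [0.0005 * k for k in range(2001)]
gauss = [math.exp(-(a * qi) ** 2) for qi in q_grid]
volume = porod_volume(q_grid, gauss, i0=1.0)
kratky_curve = kratky(q_grid, gauss)
```

The Gaussian profile is used only because its integral is known analytically; it is not a realistic particle scattering curve. For measured data, the truncation of the integral and the noise at high q limit the accuracy, in line with the caveats given in the text.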
Acknowledgement
The authors gratefully acknowledge the fantastic postdoctoral training they received from Dr. Ad Bax, National Institutes of Health, USA, which shaped many of the concepts outlined in this chapter.
References
1. Wüthrich, K. (2003) NMR studies of structure and function of biological macromolecules (Nobel Lecture). J. Biomol. NMR, 27, 13–39.
2. Billeter, M., Wagner, G. and Wüthrich, K. (2008) Solution NMR structure determination of proteins revisited. J. Biomol. NMR, 42, 155–158.
3. Cavalli, A., Salvatella, X., Dobson, C.M. and Vendruscolo, M. (2007) Protein structure determination from NMR chemical shifts. Proc. Natl. Acad. Sci. U.S.A., 104, 9615–9620.
4. Shen, Y., Lange, O., Delaglio, F. et al. (2008) Consistent blind protein structure generation from NMR chemical shift data. Proc. Natl. Acad. Sci. U.S.A., 105, 4685–4690.
5. Grzesiek, S., Cordier, F., Jaravine, V.A. and Barfield, M. (2004) Insights into biomolecular hydrogen bonds from hydrogen bond scalar couplings. Prog. Nucl. Magn. Reson. Spectrosc., 45, 275–300.
6. Bax, A. and Grishaev, A. (2005) Weak alignment NMR: a hawk-eyed view of biomolecular structure. Curr. Opin. Struct. Biol., 15, 563–570.
7. Blackledge, M. (2005) Recent progress in the study of biomolecular structure and dynamics in solution from residual dipolar couplings. Prog. Nucl. Magn. Reson. Spectrosc., 46, 23–61.
8. Grishaev, A., Wu, J., Trewhella, J. and Bax, A. (2005) Refinement of multidomain protein structures by combination of solution small-angle X-ray scattering and NMR data. J. Am. Chem. Soc., 127, 16621–16628.
9. Lipfert, J. and Doniach, S. (2007) Small-angle X-ray scattering from RNA, proteins, and protein complexes. Annu. Rev. Biophys. Biomol. Struct., 36, 307–327.
10. Neylon, C. (2008) Small angle neutron and X-ray scattering in structural biology: recent examples from the literature. Eur. Biophys. J., 37, 531–541.
11. Nabuurs, S.B., Spronk, C.A.E.M., Vuister, G.W. and Vriend, G. (2006) Traditional biomolecular structure determination by NMR spectroscopy allows for major errors. PLoS Comput. Biol., 2, e9.
12. Solomon, I. (1955) Relaxation processes in a system of two spins. Phys. Rev., 99, 559.
13. Neuhaus, D.P. and Williamson, M. (2000) The Nuclear Overhauser Effect in Structural and Conformational Analysis, John Wiley & Sons, Inc., New York.
14. Sklenar, V., Piotto, M., Leppik, R. and Saudek, V. (1993) Gradient-tailored water suppression for 1H-15N HSQC experiments optimized to retain full sensitivity. J. Magn. Reson. Ser. A, 102, 241–245.
15. Grzesiek, S. and Bax, A. (1993) The importance of not saturating H2O in Protein NMR – application to sensitivity enhancement and NOE measurements. J. Am. Chem. Soc., 115, 12593–12594.
16. Tugarinov, V., Kay, L.E., Ibraghimov, I. and Orekhov, V.Y. (2005) High-resolution four-dimensional 1H-13C NOE spectroscopy using methyl-TROSY, sparse data acquisition, and multidimensional decomposition. J. Am. Chem. Soc., 127, 2767–2775.
17. Jaravine, V., Ibraghimov, I. and Orekhov, V.Y. (2006) Removal of a time barrier for high-resolution multidimensional NMR spectroscopy. Nat. Methods, 3, 605–607.
18. Sklenar, V. and Bax, A. (1987) Spin-echo water suppression for the generation of pure-phase two-dimensional NMR-spectra. J. Magn. Reson., 74, 469–479.
19. Delaglio, F., Grzesiek, S., Vuister, G.W. et al. (1995) NMRpipe – a multidimensional spectral processing system based on Unix pipes. J. Biomol. NMR, 6, 277–293.
20. Bax, A., Ikura, M., Kay, L. and Zhu, G. (1991) Removal of F1-base-line distortion and optimization of folding in multidimensional NMR-spectra. J. Magn. Reson., 91, 174–178.
21. Zhu, G., Torchia, D.A. and Bax, A. (1993) Discrete Fourier transformation of NMR signals – the relationship between sampling delay-time and spectral base-line. J. Magn. Reson. Ser. A, 105, 219–222.
22. Piotto, M., Saudek, V. and Sklenar, V. (1992) Gradient-tailored excitation for single-quantum NMR-spectroscopy of aqueous-solutions. J. Biomol. NMR, 2, 661–665.
23. Grzesiek, S., Wingfield, P., Stahl, S. et al. (1995) 4-dimensional 15N-separated NOESY of slowly tumbling perdeuterated 15N-enriched proteins – application to HIV-1 Nef. J. Am. Chem. Soc., 117, 9594–9595.
24. Boelens, R., Koning, T.M.G. and Kaptein, R. (1988) Determination of biomolecular structures from proton–proton NOEs using a relaxation matrix approach. J. Mol. Struct., 173, 299–311.
25. Bonvin, A.M., Vis, H., Breg, J.N. et al. (1994) Nuclear magnetic resonance solution structure of the Arc repressor using relaxation matrix calculations. J. Mol. Biol., 236, 328–341.
26. Borgias, B.A. and James, T.L. (1990) Mardigras – a procedure for matrix analysis of relaxation for discerning geometry of an aqueous structure. J. Magn. Reson., 87, 475–487.
27. LeMaster, D.M. (1990) Deuterium labelling in NMR structural analysis of larger proteins. Q. Rev. Biophys., 23, 133–174.
28. Rosen, M.K., Gardner, K.H., Willis, R.C. et al. (1996) Selective methyl group protonation of perdeuterated proteins. J. Mol. Biol., 263, 627–636.
29. Bothner-By, A., Stephens, R., Lee, J. et al. (1984) Structure determination of a tetrasaccharide: transient nuclear Overhauser effects in the rotating frame. J. Am. Chem. Soc., 106, 811–813.
30. Grzesiek, S. and Bax, A. (1997) A three-dimensional NMR experiment with improved sensitivity for carbonyl-carbonyl J correlation in proteins. J. Biomol. NMR, 9, 207–211.
31. Habeck, M., Rieping, W., Linge, J.P. and Nilges, M. (2004) NOE assignment with ARIA 2.0: the nuts and bolts. Methods Mol. Biol., 278, 379–402.
32. Güntert, P. (2004) Automated NMR structure calculation with CYANA. Methods Mol. Biol., 278, 353–378.
33. Linge, J.P., Habeck, M., Rieping, W. and Nilges, M. (2004) Correction of spin diffusion during iterative automated NOE assignment. J. Magn. Reson., 167, 334–342.
34. Nabuurs, S.B., Spronk, C.A.E.M., Krieger, E.K. et al. (2003) Quantitative evaluation of experimental NMR restraints. J. Am. Chem. Soc., 125, 12026–12034.
35. Vuister, G.W., Tessari, M., Karimi-Nejad, Y. and Whitehead, B. (1999) Pulse sequences for measuring coupling constants. Biol. Magnet. Reson., 16, 195–257.
36. Griesinger, C., Hennig, M., Marino, J. et al. (1999) Methods for the determination of torsion angle restraints in biomacromolecules. Biol. Magnet. Reson., 16, 259–367.
37. Xu, R.X., Olejniczak, E.T. and Fesik, S.W. (1992) Stereospecific assignments and χ1 rotamers for FKBP when bound to ascomycin from 3JHαHβ and 3JNHβ coupling constants. FEBS Lett., 305, 137–143.
38. Bystrov, V. (1976) Spin-spin coupling and the conformational states of peptide systems. Prog. Nucl. Magn. Reson. Spectrosc., 10, 41–81.
39. Vuister, G.W. and Bax, A. (1994) Measurement of four-bond HN-Hα J-couplings in staphylococcal nuclease. J. Biomol. NMR, 4, 193–200.
40. Griesinger, C., Sørensen, O.W. and Ernst, R.R. (1985) Two-dimensional correlation of connected NMR transitions. J. Am. Chem. Soc., 107, 6394–6396.
41. Vuister, G.W. and Bax, A. (1993) Quantitative J correlation: a new approach for measuring homonuclear three-bond JHNHα coupling constants in 15N-enriched proteins. J. Am. Chem. Soc., 115, 7772–7777.
42. Bax, A., Vuister, G.W., Grzesiek, S. et al. (1994) Measurement of homo- and heteronuclear J couplings from quantitative J correlation. Methods Enzymol., 239, 79–105.
43. Ottiger, M., Delaglio, F. and Bax, A. (1998) Measurement of J and dipolar couplings from simplified two-dimensional NMR spectra. J. Magn. Reson., 131, 373–378.
44. Grzesiek, S., Kuboniwa, H., Hinck, A.P. and Bax, A. (1995) Multiple-quantum line narrowing for measurement of Hα-Hβ J-couplings in isotopically enriched proteins. J. Am. Chem. Soc., 117, 5312–5315.
45. Bax, A., Max, D. and Zax, D. (1992) Measurement of long-range 13C-13C J couplings in a 20-kDa protein-peptide complex. J. Am. Chem. Soc., 114, 6923–6925.
46. Zhu, G. and Bax, A. (1993) Measurement of long-range 1H-13C coupling constants from quantitative 2D heteronuclear multiple-quantum correlation spectra. J. Magn. Reson. Ser. A, 104, 353–357.
47. Andersson, P., Weigelt, J. and Otting, G. (1998) Spin-state selection filters for the measurement of heteronuclear one-bond coupling constants. J. Biomol. NMR, 12, 435–441.
48. Sørensen, M.D., Meissner, A. and Sørensen, O.W. (1999) 13C natural abundance S3E and S3CT experiments for measurement of J coupling constants between 13Cα or 1Hα and other protons in a protein. J. Magn. Reson., 137, 237–242.
49. Delaglio, F., Torchia, D.A. and Bax, A. (1991) Measurement of 15N-13C J couplings in staphylococcal nuclease. J. Biomol. NMR, 1, 439–446.
50. Vuister, G.W., Delaglio, F. and Bax, A. (1992) An empirical correlation between 1JCαHα and protein backbone conformation. J. Am. Chem. Soc., 114, 9674–9675.
51. Cornilescu, G., Bax, A. and Case, D.A. (2000) Large variations in one-bond 13Cα-13Cβ couplings in polypeptides correlate with backbone conformation. J. Am. Chem. Soc., 122, 2168–2171.
52. Juranic, N., Dannenberg, J.J., Cornilescu, G. et al. (2008) Structural dependencies of protein backbone 2JNC′ couplings. Protein Sci., 17, 768–776.
53. Karplus, M. (1959) Contact electron-spin coupling of nuclear magnetic moments. J. Chem. Phys., 30, 11–15.
54. Karplus, M. (1963) Vicinal proton coupling in nuclear magnetic resonance. J. Am. Chem. Soc., 85, 2870–2871.
55. Wang, A.C. and Bax, A. (1995) Reparametrization of the Karplus relation for 3JHαN and 3JHNC′ in peptides from uniformly 13C/15N-enriched human ubiquitin. J. Am. Chem. Soc., 117, 1810–1813.
56. Vögeli, B., Ying, J., Grishaev, A. and Bax, A. (2007) Limits on variations in protein backbone dynamics from precise measurements of scalar couplings. J. Am. Chem. Soc., 129, 9377–9385.
57. Kim, Y. and Prestegard, J.H. (1990) Refinement of the NMR structures for acyl carrier protein with scalar coupling data. Proteins, 8, 377–385.
58. Mierke, D.F. and Kessler, H. (1992) Combined use of homo- and heteronuclear coupling constants as restraints in molecular dynamics simulations. Biopolymers, 32, 1277–1282.
59. Torda, A.E., Brunne, R.M., Huber, T. et al. (1993) Structure refinement using time-averaged J-coupling constant restraints. J. Biomol. NMR, 3, 55–66.
60. Garrett, D.S., Kuszewski, J., Hancock, T.J. et al. (1994) The impact of direct refinement against three-bond HN-CαH coupling constants on protein structure determination by NMR. J. Magn. Reson. Ser. B, 104, 99–103.
61. Hoch, J.C., Dobson, C.M. and Karplus, M. (1985) Vicinal coupling constants and protein dynamics. Biochemistry, 24, 3831–3841.
62. Perez, C., Löhr, F., Rüterjans, H. and Schmidt, J.M. (2001) Self-consistent Karplus parametrization of 3J couplings depending on the polypeptide side-chain torsion χ1. J. Am. Chem. Soc., 123, 7081–7093.
63. Jeffrey, G.A. and Saenger, W. (1991) Hydrogen Bonding in Biological Structures, Springer, New York.
64. Becker, E.D. (1996) Encyclopedia of Nuclear Magnetic Resonance, vol. 4 (eds D.M. Grant and R.K. Harris), John Wiley & Sons, Inc., New York, pp. 2409–2415.
65. Englaender, S.W. (1996) Encyclopedia of Nuclear Magnetic Resonance, vol. 4 (eds D.M. Grant and R.K. Harris), John Wiley & Sons, Inc., New York, pp. 2415–2420.
66. Dingley, A. and Grzesiek, S. (1998) Direct observation of hydrogen bonds in nucleic acid base pairs by internucleotide 2JNN couplings. J. Am. Chem. Soc., 120, 8293–8297.
67. Pervushin, K., Ono, A., Fernandez, C. et al. (1998) NMR scalar couplings across Watson-Crick base pair hydrogen bonds in DNA observed by transverse relaxation-optimized spectroscopy. Proc. Natl. Acad. Sci. U.S.A., 95, 14147–14151.
68. Cordier, F. and Grzesiek, S. (1999) Direct observation of hydrogen bonds in proteins by interresidue 3hJNC′ scalar couplings. J. Am. Chem. Soc., 121, 1601–1602.
69. Cornilescu, G., Hu, J.-S. and Bax, A. (1999) Identification of the hydrogen bonding network in a protein by scalar couplings. J. Am. Chem. Soc., 121, 2949–2950.
70. Barfield, M. (2002) Structural dependencies of interresidue scalar coupling h3JNC′ and donor 1H chemical shifts in the hydrogen bonding regions of proteins. J. Am. Chem. Soc., 124, 4158–4168.
71. Cornilescu, G., Ramirez, B.E., Frank, M.K. et al. (1999) Correlation between 3hJNC′ and hydrogen bond length in proteins. J. Am. Chem. Soc., 121, 6275–6279.
72. Cordier, F., Nisius, L., Dingley, A.J. and Grzesiek, S. (2008) Direct detection of N–H···O=C hydrogen bonds in biomolecules by NMR spectroscopy. Nature Protocols, 3, 235–241.
73. Cordier, F., Rogowski, M., Grzesiek, S. and Bax, A. (1999) Observation of through-hydrogen-bond 2hJNC′ in a perdeuterated protein. J. Magn. Reson., 140, 510–512.
74. Wang, Y.-X., Jacob, J., Cordier, F. et al. (1999) Measurement of h3JNC′ connectivities across hydrogen bonds in a 30 kDa protein. J. Biomol. NMR, 14, 181–184.
75. Gaponenko, V., Sarma, S., Altieri, A. et al. (2004) Improving the accuracy of NMR structures of large proteins using pseudocontact shifts as long-range restraints. J. Biomol. NMR, 28, 205–212.
76. Sass, H.J., Schmid, F.F. and Grzesiek, S. (2007) Correlation of protein structure and dynamics to scalar couplings across hydrogen bonds. J. Am. Chem. Soc., 129, 5898–5903.
77. Tjandra, N., Grzesiek, S. and Bax, A. (1996) Magnetic field dependence of nitrogen-proton J splittings in 15N-enriched human ubiquitin resulting from relaxation interference and residual dipolar coupling. J. Am. Chem. Soc., 118, 6264–6272.
78. Tolman, J.R., Flanagan, J.M., Kennedy, M.A. and Prestegard, J.H. (1995) Nuclear magnetic dipole interactions in field-oriented proteins: information for structure determination in solution. Proc. Natl. Acad. Sci. U.S.A., 92, 9279–9283.
79. Tjandra, N. and Bax, A. (1997) Direct measurement of distances and angles in biomolecules by NMR in a dilute liquid crystalline medium. Science, 278, 1111–1114.
80. Saupe, A. and Englert, G. (1963) High-resolution nuclear magnetic resonance spectra of orientated molecules. Phys. Rev. Lett., 11, 462–464.
81. Sass, J., Cordier, F., Hoffmann, A. et al. (1999) Purple membrane induced alignment of biological macromolecules in the magnetic field. J. Am. Chem. Soc., 121, 2047–2055.
82. Losonczi, J.A., Andrec, M., Fischer, M.W. and Prestegard, J.H. (1999) Order matrix analysis of residual dipolar couplings using singular value decomposition. J. Magn. Reson., 138, 334–342.
83. Moltke, S. and Grzesiek, S. (1999) Structural constraints from residual tensorial couplings in high resolution NMR without an explicit term for the alignment tensor. J. Biomol. NMR, 15, 77–82.
84. Sass, H.J., Musco, G., Stahl, S.J. et al. (2001) An easy way to include weak alignment constraints into NMR structure calculations. J. Biomol. NMR, 21, 275–280.
85. Cornilescu, G., Marquardt, J., Ottiger, M. and Bax, A. (1998) Validation of protein structure from anisotropic carbonyl chemical shifts in a dilute liquid crystalline phase. J. Am. Chem. Soc., 120, 6836–6837.
86. Lipsitz, R.S. and Tjandra, N. (2001) Carbonyl CSA restraints from solution NMR for protein structure refinement. J. Am. Chem. Soc., 123, 11065–11066.
87. Choy, W.Y., Tollinger, M., Mueller, G.A. and Kay, L.E. (2001) Direct structure refinement of high molecular weight proteins against residual dipolar couplings and carbonyl chemical shift changes upon alignment: an application to maltose binding protein. J. Biomol. NMR, 21, 31–40.
88. Lipsitz, R.S. and Tjandra, N. (2003) 15N chemical shift anisotropy in protein structure refinement and comparison with NH residual dipolar couplings. J. Magn. Reson., 164, 171–176.
89. Bothner-By, A., Domaille, P. and Gayathri, C. (1981) Ultra-high-field NMR spectroscopy – observation of proton-proton dipolar coupling in paramagnetic bis[tolyltris(pyrazolyl)borato] cobalt(II). J. Am. Chem. Soc., 103, 5602–5603.
90. van Zijl, P.C.M., Ruessink, B.H., Bulthuis, J. and MacLean, C. (1984) NMR of partially aligned liquids – magnetic-susceptibility anisotropies and dielectric-properties. Acc. Chem. Res., 17, 172–180.
91. Plantenga, T., Van Zijl, P. and MacLean, C. (1982) Studies of quadrupolar and dipolar electric field effects in the NMR spectra of binary mixtures of liquids. Chem. Phys., 66, 1–9.
92. Kung, H.C., Wang, K.Y., Goljer, I. and Bolton, P.H. (1995) Magnetic alignment of duplex and quadruplex DNAs. J. Magn. Reson. Ser. B, 109, 323–325.
93. Tjandra, N., Omichinski, J.G., Gronenborn, A.M. et al. (1997) Use of dipolar 1H-15N and 1H-13C couplings in the structure determination of magnetically oriented macromolecules in solution. Nat. Struct. Biol., 4, 732–738.
94. Bertini, I., Luchinat, C. and Parigi, G. (2002) Paramagnetic constraints: An aid for quick solution structure determination of paramagnetic metalloproteins. Concepts Magn. Reson., 14, 259–286.
95. Otting, G. (2008) Prospects for lanthanides in structural biology by NMR. J. Biomol. NMR, 42, 1–9.
96. Häussinger, D., Huang, J.-R. and Grzesiek, S. (2009) DOTA-M8: An extremely rigid, high-affinity lanthanide chelating tag for PCS NMR spectroscopy. J. Am. Chem. Soc., 131, 14761–14767.
97. Prestegard, J.H., Bougault, C.M. and Kishore, A.I. (2004) Residual dipolar couplings in structure determination of biomolecules. Chem. Rev., 104, 3519–3540.
98. Cavagnero, S., Dyson, H.J. and Wright, P.E. (1999) Improved low pH bicelle system for orienting macromolecules over a wide temperature range. J. Biomol. NMR, 13, 387–391.
99. Ottiger, M. and Bax, A. (1999) Bicelle-based liquid crystals for NMR-measurement of dipolar couplings at acidic and basic pH values. J. Biomol. NMR, 13, 187–191.
100. Ruckert, M. and Otting, G. (2000) Alignment of biological macromolecules in novel nonionic liquid crystalline media for NMR experiments. J. Am. Chem. Soc., 122, 7793–7797.
101. Freyssingeas, E., Nallet, F. and Roux, D. (1996) Measurement of the membrane flexibility in lamellar and ‘sponge’ phases of the C(12)E(5)/hexanol/water system. Langmuir, 12, 6028–6035.
102. Chou, J., Gaemers, S., Howder, B. et al. (2001) A simple apparatus for generating stretched polyacrylamide gels, yielding uniform alignment of proteins and detergent micelles. J. Biomol. NMR, 21, 377–382.
103. Ma, J., Goldberg, G. and Tjandra, N. (2008) Weak alignment of biomacromolecules in collagen gels: an alternative way to yield residual dipolar couplings for NMR measurements. J. Am. Chem. Soc., 130, 16148–16149.
104. Hansen, M., Mueller, L. and Pardi, A. (1998) Tunable alignment of macromolecules by filamentous phage yields dipolar coupling interactions. Nat. Struct. Biol., 5, 1065–1074.
105. Zweckstetter, M. and Bax, A. (2001) Characterization of molecular alignment in aqueous suspensions of Pf1 bacteriophage. J. Biomol. NMR, 20, 365–377.
106. Clore, G., Starich, M. and Gronenborn, A. (1998) Measurement of residual dipolar couplings of macromolecules aligned in the nematic phase of a colloidal suspension of rod-shaped viruses. J. Am. Chem. Soc., 120, 10571–10572.
107. Losonczi, J.A. and Prestegard, J.H. (1998) Improved dilute bicelle solutions for high-resolution NMR of biological macromolecules. J. Biomol. NMR, 12, 447–451.
108. Meier, S., Häussinger, D. and Grzesiek, S. (2002) Charged acrylamide copolymer gels as media for weak alignment. J. Biomol. NMR, 24, 351–356.
109. Ulmer, T.S., Ramirez, B.E., Delaglio, F. and Bax, A. (2003) Evaluation of backbone proton positions and dynamics in a small protein by liquid crystal NMR spectroscopy. J. Am. Chem. Soc., 125, 9179–9191.
110. Koenig, B., Hu, J., Ottiger, M. et al. (1999) NMR measurement of dipolar couplings in proteins aligned by transient binding to purple membrane fragments. J. Am. Chem. Soc., 121, 1385–1386.
111. Prosser, R., Losonczi, J. and Shiyanovskaya, I. (1998) Use of a novel aqueous liquid crystalline medium for high-resolution NMR of macromolecules in solution. J. Am. Chem. Soc., 120, 11010–11011.
112. Douglas, S., Chou, J. and Shih, W. (2007) DNA-nanotube-induced alignment of membrane proteins for NMR structure determination. Proc. Natl. Acad. Sci. U.S.A., 104, 6644–6648.
113. Lorieau, J., Yao, L. and Bax, A. (2008) Liquid crystalline phase of G-tetrad DNA for NMR study of detergent-solubilized proteins. J. Am. Chem. Soc., 130, 7536–7537.
114. Fleming, K., Gray, D., Prasannan, S. and Matthews, S. (2000) Cellulose crystallites: A new and robust liquid crystalline medium for the measurement of residual dipolar couplings. J. Am. Chem. Soc., 122, 5224–5225.
115. Tycko, R., Blanco, F. and Ishii, Y. (2000) Alignment of biopolymers in strained gels: A new way to create detectable dipole-dipole couplings in high-resolution biomolecular NMR. J. Am. Chem. Soc., 122, 9340–9341.
116. Sass, H., Musco, G., Stahl, S. et al. (2000) Solution NMR of proteins within polyacrylamide gels: Diffusional properties and residual alignment by mechanical stress or embedding of oriented purple membranes. J. Biomol. NMR, 18, 303–309.
117. Ruan, K. and Tolman, J.R. (2005) Composite alignment media for the measurement of independent sets of NMR residual dipolar couplings. J. Am. Chem. Soc., 127, 15032–15033.
118. Barbieri, R., Bertini, I., Lee, Y.-M. et al. (2002) Structure-independent cross-validation between residual dipolar couplings originating from internal and external orienting media. J. Biomol. NMR, 22, 365–368.
119. Ottiger, M., Delaglio, F., Marquardt, J.L. et al. (1998) Measurement of dipolar couplings for methylene and methyl sites in weakly oriented macromolecules and their use in structure determination. J. Magn. Reson., 134, 365–369.
120. Bouvignies, G., Bernado, P., Meier, S. et al. (2005) Identification of slow correlated motions in proteins using residual dipolar and hydrogen-bond scalar couplings. Proc. Natl. Acad. Sci. U.S.A., 102, 13885–13890.
121. Meier, S., Blackledge, M. and Grzesiek, S. (2008) Conformational distributions of unfolded polypeptides from novel NMR techniques. J. Chem. Phys., 128, 052204.
122. Lange, O., Lakomek, N., Fares, C. et al. (2008) Recognition dynamics up to microseconds revealed from an RDC-derived ubiquitin ensemble in solution. Science, 320, 1471–1475.
123. Clore, G.M., Gronenborn, A.M. and Bax, A. (1998) A robust method for determining the magnitude of the fully asymmetric alignment tensor of oriented macromolecules in the absence of structural information. J. Magn. Reson., 133, 216–221.
124. Sun, Y.J., Rose, J., Wang, B.C. and Hsiao, C.D. (1998) The structure of glutamine-binding protein complexed with glutamine at 1.94 Å resolution: comparisons with other amino acid binding proteins. J. Mol. Biol., 278, 219–229.
125. Baber, J.L., Libutti, D., Levens, D. and Tjandra, N. (1999) High precision solution structure of the C-terminal KH domain of heterogeneous nuclear ribonucleoprotein K, a c-myc transcription factor. J. Mol. Biol., 289, 949–962.
126. Warren, J.J. and Moore, P.B. (2001) A maximum likelihood method for determining DaPQ and R for sets of dipolar coupling data. J. Magn. Reson., 149, 271–275.
127. Warren, J.J. and Moore, P.B. (2001) Application of dipolar coupling data to the refinement of the solution structure of the sarcin-ricin loop RNA. J. Biomol. NMR, 20, 311–323.
128. Zweckstetter, M. and Bax, A. (2000) Prediction of sterically induced alignment in a dilute liquid crystalline phase: Aid to protein structure determination by NMR. J. Am. Chem. Soc., 122, 3791–3792.
129. Ramirez, B. and Bax, A. (1998) Modulation of the alignment tensor of macromolecules dissolved in a dilute liquid crystalline medium. J. Am. Chem. Soc., 120, 9106–9107.
130. van Lune, F., Manning, L., Dijkstra, K. et al. (2002) Order-parameter tensor description of HPr in a medium of oriented bicelles. J. Biomol. NMR, 23, 169–179.
131. Huang, J.-R. and Grzesiek, S. (2010) Ensemble calculations of unstructured proteins constrained by RDC and PRE data: A case study of urea-denatured ubiquitin. J. Am. Chem. Soc., 132, 694–705.
132. de Alba, E., Baber, J. and Tjandra, N. (1999) The use of residual dipolar coupling in concert with backbone relaxation rates to identify conformational exchange by NMR. J. Am. Chem. Soc., 121, 4282–4283.
133. Zweckstetter, M., Hummer, G. and Bax, A. (2004) Prediction of charge-induced molecular alignment of biomolecules dissolved in dilute liquid-crystalline phases. Biophys. J., 86, 3444–3460.
134. Annila, A., Aitio, H., Thulin, E. and Drakenberg, T. (1999) Recognition of protein folds via dipolar couplings. J. Biomol. NMR, 14, 223–230.
135. Suzuki, M., Lee, D.-Y., Inyamah, N. et al. (2008) Solution NMR structure of selenium-binding protein from Methanococcus vannielii. J. Biol. Chem., 283, 25936–25943.
136. Schwieters, C.D., Kuszewski, J.J., Tjandra, N. and Clore, G.M. (2003) The Xplor-NIH NMR molecular structure determination package. J. Magn. Reson., 160, 65–73.
137. Fischer, M.W.F., Losonczi, J.A., Lim Weaver, J. and Prestegard, J.H. (1999) Domain orientation and dynamics in multidomain proteins from residual dipolar couplings. Biochemistry, 38, 9013–9022.
138. Tsui, V., Zhu, L., Huang, T.H. et al. (2000) Assessment of zinc finger orientations by residual dipolar coupling constants. J. Biomol. NMR, 16, 9–21.
139. Braddock, D.T., Cai, M., Baber, J.L. et al. (2001) Rapid identification of medium- to large-scale interdomain motion in modular proteins using dipolar couplings. J. Am. Chem. Soc., 123, 8634–8635.
140. Häussinger, D., Ahrens, T., Sass, H. et al. (2002) Calcium-dependent homoassociation of E-cadherin by NMR spectroscopy: changes in mobility, conformation and mapping of contact regions. J. Mol. Biol., 324, 823–839.
141. Bolon, P.J., Al-Hashimi, H.M. and Prestegard, J.H. (1999) Residual dipolar coupling derived orientational constraints on ligand geometry in a 53 kDa protein-ligand complex. J. Mol. Biol., 293, 107–115.
142. Jain, N.U., Noble, S. and Prestegard, J.H. (2003) Structural characterization of a mannose-binding protein-trimannoside complex using residual dipolar couplings. J. Mol. Biol., 328, 451–462.
143. Koenig, B.W., Mitchell, D.C., König, S. et al. (2000) Measurement of dipolar couplings in a transducin peptide fragment weakly bound to oriented photo-activated rhodopsin. J. Biomol. NMR, 16, 121–125.
144. Koenig, B.W., Kontaxis, G., Mitchell, D.C. et al. (2002) Structure and orientation of a G protein fragment in the receptor bound state from residual dipolar couplings. J. Mol. Biol., 322, 441–461.
145. Clore, G., Starich, M., Bewley, C. et al. (1999) Impact of residual dipolar couplings on the accuracy of NMR structures determined from a minimal number of NOE restraints. J. Am. Chem. Soc., 121, 6513–6514.
146. Mueller, G.A., Choy, W.Y., Yang, D. et al. (2000) Global folds of proteins with low densities of NOEs using residual dipolar couplings: application to the 370-residue maltodextrin-binding protein. J. Mol. Biol., 300, 197–212.
147. Delaglio, F., Kontaxis, G. and Bax, A. (2000) Protein structure determination using molecular fragment replacement and NMR dipolar couplings. J. Am. Chem. Soc., 122, 2142–2143.
148. Hus, J.C., Marion, D. and Blackledge, M. (2001) Determination of protein backbone structure using only residual dipolar couplings. J. Am. Chem. Soc., 123, 1541–1542.
149. Beraud, S., Bersch, B., Brutscher, B. et al. (2002) Direct structure determination using residual dipolar couplings: reaction-site conformation of methionine sulfoxide reductase in solution. J. Am. Chem. Soc., 124, 13709–13715.
150. Bouvignies, G., Meier, S., Grzesiek, S. and Blackledge, M. (2006) Ultrahigh-resolution backbone structure of perdeuterated protein GB1 using residual dipolar couplings from two alignment media. Angew. Chem.-Int. 
Edit., 45, 8166–8169. 151. Rohl, C.A. and Baker, D. (2002) De novo determination of protein backbone structure from residual dipolar couplings using Rosetta. J. Am. Chem. Soc., 124, 2723–2729.
154
Protein NMR Spectroscopy
152. Raman, S., Lange, O.F., Rossi, P. et al. (2010) NMR structure determination for larger proteins using backbone-only data. Science, 327, 1014–1018. 153. Clore, G.M. and Schwieters, C.D. (2006) Concordance of residual dipolar couplings, backbone order parameters and crystallographic B-factors for a small alpha/beta protein: a unified picture of high probability, fast atomic motions in proteins. J. Mol. Biol., 355, 879–886. 154. Showalter, S.A. and Br€uschweiler, R. (2007) Quantitative molecular ensemble interpretation of NMR dipolar couplings without restraints. J. Am. Chem. Soc., 129, 4158–4159. 155. Jensen, M.R., Markwick, P.R.L., Meier, S. et al. (2009) Quantitative determination of the conformational properties of partially folded and intrinsically disordered proteins using NMR dipolar couplings. Structure, 17, 1169–1185. 156. Korzhnev, D.M., Religa, T.L., Banachewicz, W. et al. (2010) A transient and low-populated protein-folding intermediate at atomic resolution. Science, 329, 1312–1316. 157. Vallurupalli, P., Hansen, D. and Kay, L. (2008) Structures of invisible, excited protein states by relaxation dispersion NMR spectroscopy. Proc. Natl. Acad. Sci. U.S.A., 105, 11766–11771. 158. Grzesiek, S. and Sass, H.-J. (2009) From biomolecular structure to functional understanding: new NMR developments narrow the gap. Curr. Opin. Struct. Biol., 19, 585–595. 159. Markley, J.L., Bax, A., Arata, Y. et al. (1998) Recommendations for the presentation of NMR structures of proteins and nucleic acids. IUPAC-IUBMB-IUPAB Inter-Union Task Group on the Standardization of Data Bases of Protein and Nucleic Acid Structures Determined by NMR Spectroscopy. J. Biomol. NMR, 12, 1–23. 160. Saitoˆ, H. (1986) Conformation-dependent 13C chemical shifts: A new means of conformational characterization as obtained by high-resolution solid-state 13C NMR. Magn. Reson. Chem., 24, 835–852. 161. Spera, S. and Bax, A. 
(1991) Empirical correlation between protein backbone conformation and Ca and Cb 13C Nuclear Magnetic Resonance chemical shifts. J. Am. Chem. Soc., 113, 5490–5492. 162. Cornilescu, G., Delaglio, F. and Bax, A. (1999) Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J. Biomol. NMR, 13, 289–302. 163. Williamson, M.P., Kikuchi, J. and Asakura, T. (1995) Application of 1H NMR chemical shifts to measure the quality of protein structures. J. Mol. Biol., 247, 541–546. 164. de Dios, A.C., Pearson, J.G. and Oldfield, E. (1993) Secondary and tertiary structural effects on protein NMR chemical shifts: An ab initio approach. Science, 260, 1491–1496. 165. Wishart, D.S. and Case, D.A. (2002) Use of chemical shifts in macromolecular structure determination. Methods Enzymol., 338, 3–34. 166. Wang, Y.J. and Jardetzky, O. (2002) Investigation of the neighboring residue effects on protein chemical shifts. J. Am. Chem. Soc., 124, 14075–14084. 167. Wang, Y.J. and Jardetzky, O. (2004) Predicting 15N chemical shifts in proteins using the preceding residue-specific individual shielding surfaces from f, yi1, and c1 torsion angles. J. Biomol. NMR, 28, 327–340. 168. Haigh, C.W. and Mallion, R.B. (1979) Ring current theories in Nuclear Magnetic Resonance. Progr. NMR Spectrosc., 13, 303–344. 169. Case, D.A. (1995) Calibration of ring-current effects in proteins and nucleic acids. J. Biomol. NMR, 6, 341–346. 170. Wagner, G., Pardi, A. and W€uthrich, K. (1983) Hydrogen bond length and proton NMR chemical shifts in proteins. J. Am. Chem. Soc., 105, 5948–5949. 171. Shen, Y. and Bax, A. (2007) Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology. J. Biomol. NMR, 38, 289–302. 172. Neal, S., Nip, A.M., Zhang, H.Y. and Wishart, D.S. (2003) RefDB: a database of uniformly referenced protein chemical shifts. J. Biomol. NMR, 26, 215–240. 173. Avbelj, F., Kocjan, D. and Baldwin, R.L. 
(2004) Protein chemical shifts arising from a-helices and b-sheets depend on solvent exposure. Proc. Natl. Acad. Sci. U.S.A., 101, 17394–17397. 174. Richarz, R. and W€uthrich, K. (1978) Carbon-13 NMR chemical shifts of the common amino acid residues measured in aqueous solutions of the linear tetrapeptides H-Gly-Gly- X-L-Ala-OH. Biopolymers, 17, 2133–2141.
Measurement of Structural Restraints
155
175. Bundi, A. and W€uthrich, K. (1979) 1H NMR parameters of the common amino acid residues measured in aqueous solutions of the linear tetrapeptides H-Gly-Gly-X-L-Ala-OH. Biopolymers, 18, 285–297. 176. Braun, D., Wider, G. and Wuethrich, K. (1994) Sequence-corrected 15N ‘random coil’ chemical shifts. J. Am. Chem. Soc., 116, 8466–8469. 177. Merutka, G., Jane Dyson, H. and Wright, P. (1995) ‘Random coil’ 1H chemical shifts obtained as a function of temperature and trifluoroethanol concentration for the peptide series GGXGG. J. Biomol. NMR, 5, 14–24. 178. Wishart, D.S., Bigam, C., Holm, A. et al. (1995) 1H, 13C and 15N random coil NMR chemical shifts of the common amino acids. I. Investigations of nearest-neighbor effects. J. Biomol. NMR, 5, 67–81. 179. Schwarzinger, S., Kroon, G.J.A., Foss, T.R. et al. (2001) Sequence-dependent correction of random coil NMR chemical shifts. J. Am. Chem. Soc., 123, 2970–2978. 180. Wishart, D.S., Sykes, B.D. and Richards, F.M. (1991) Relationship between nuclear magnetic resonance chemical shift and protein secondary structure. J. Mol. Biol., 222, 311–333. 0 181. Xu, X.P. and Case, D.A. (2001) Automated prediction of 15N, 13Ca, 13Cb and 13C chemical shifts in proteins using a density functional database. J. Biomol. NMR, 21, 321–333. 182. Le, H.B. and Oldfield, E. (1996) Ab initio studies of amide-N-15 chemical shifts in dipeptides: Applications to protein NMR spectroscopy. J. Phys. Chem., 100, 16423–16428. 183. Wishart, D.S., Watson, M.S., Boyko, R.F. and Sykes, B.D. (1997) Automated 1H and 13C chemical shift prediction using the BioMagResBank. J. Biomol. NMR, 10, 329–336. 184. Meiler, J. and Baker, D. (2003) Rapid protein fold determination using unassigned NMR data. Proc. Natl. Acad. Sci. U.S.A., 100, 15404–15409. 185. Wishart, D.S., Sykes, B.D. and Richards, F.M. (1992) The chemical shift index: a fast and simple method for the assignment of protein secondary structure through NMR spectroscopy. J. Am. Chem. Soc., 31, 1647–1651. 
186. Gong, H., Shen, Y. and Rose, G.D. (2007) Building native protein conformation from NMR backbone chemical shifts using Monte Carlo fragment assembly. Protein Sci., 16, 1515–1521. 187. Wishart, D.S., Arndt, D., Berjanskii, M. et al. (2008) CS23D: a web server for rapid protein structure generation using NMR chemical shifts and sequence data. Nucl. Acids Res., 36, W496–502. 188. Zhang, H., Neal, S. and Wishart, D.S. (2003) RefDB: a database of uniformly referenced protein chemical shifts. J. Biomol. NMR, 25, 173–195. 189. Wang, L., Eghbalnia, H., Bahrami, A. and Markley, J. (2005) Linear analysis of carbon-13 chemical shift differences and its application to the detection and correction of errors in referencing and spin system identifications. J. Biomol. NMR, 32, 13–22. 190. Berman, H.M., Westbrook, J., Feng, Z. et al. (2000) The protein data bank. Nucl. Acids Res., 28, 235–242. 191. Doreleijers, J.F., Nederveen, A.J., Vranken, W. et al. (2005) BioMagResBank databases DOCR and FRED containing converted and filtered sets of experimental NMR restraints and coordinates from over 500 protein PDB structures. J. Biomol. NMR, 32, 1–12. 192. Shen, Y., Delaglio, F., Cornilescu, G. and Bax, A. (2009) TALOS þ : a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J. Biomol. NMR, 44, 213–223. 193. Berjanskii, M.V. and Wishart, D.S. (2005) A simple method to predict protein flexibility using secondary chemical shifts. J. Am. Chem. Soc., 127, 14970–14971. 194. Bowers, P.M., Strauss, C.E.M. and Baker, D. (2000) De novo protein structure determination using sparse NMR data. J. Biomol. NMR, 18, 311–318. 195. Kontaxis, G., Delaglio, F. and Bax, A. (2005) Molecular fragment replacement approach to protein structure determination by chemical shift and dipolar homology database mining. Methods Enzymol., 394, 42–78. 196. Rohl, C.A., Strauss, C.E.M., Misura, K.M.S. and Baker, D. (2004) Protein structure prediction using Rosetta. 
Methods Enzymol., 383, 66–93.
156
Protein NMR Spectroscopy
197. Shen, Y., Vernon, R., Baker, D. and Bax, A. (2009) De novo protein structure generation from incomplete chemical shift assignments. J. Biomol. NMR, 43, 63–78. 198. Robustelli, P., Cavalli, A. and Vendruscolo, M. (2008) Determination of protein structures in the solid state from NMR chemical shifts. Structure, 16, 1764–1769. 199. Montalvao, R.W., Cavalli, A., Salvatella, X. et al. (2008) Structure determination of protein–protein complexes using NMR chemical shifts: Case of an endonuclease colicin-immunity protein complex. J. Am. Chem. Soc., 130, 15990–15996. 200. Das, R. Andre, I., Shen, Y. et al. (2009) Simultaneous prediction of protein folding and docking at high resolution. Proc. Natl. Acad. Sci. U.S.A., 106, 18978–18983. 201. Gryk, M.R. and Hoch, J.C. (2008) Local knowledge helps determine protein structures. Proc. Natl. Acad. Sci., 105, 4533–4534. 202. Parsons, L.M., Grishaev, A. and Bax, A. (2008) The periplasmic domain of TolR from Haemophilus influenzae forms a dimer with a large hydrophobic groove: NMR solution structure and comparison to SAXS data. Biochemistry, 47, 3131–3142. 203. Svergun, D. and Koch, M. (2003) Small-angle scattering studies of biological macromolecules in solution. Rep. Prog. Phys., 66, 1735–1782. 204. Koch, M., Vachette, P. and Svergun, D. (2003) Small-angle scattering: a view on the properties, structures and structural changes of biological macromolecules in solution. Q. Rev. Biophys., 36, 147–227. 205. Jacques, D.A. and Trewhella, J. (2010) Small-angle scattering for structural biology – expanding the frontier while avoiding the pitfalls. Protein Sci., 19, 642–657. 206. Svergun, D., Barberato, C. and Koch, M.H.J. (1995) CRYSOL – A program to evaluate x-ray solution scattering of biological macromolecules from atomic coordinates. J. Appl. Cryst., 28, 768–773. 207. Svergun, D.I., Richard, S., Koch, M.H.J. et al. (1998) Protein hydration in solution: Experimental observation by x-ray and neutron scattering. Proc. Natl. Acad. 
Sci. U.S.A., 95, 2267–2272. 208. Svergun, D.I., Petoukhov, M.V. and Koch, M.H.J. (2001) Determination of domain structure of proteins from X-ray solution scattering. Biophys. J., 80, 2946–2953. 209. Svergun, D.I. (1992) Determination of the regularization parameter in indirect-transform methods using perceptual criteria. J. Appl. Cryst., 25, 495–503. 210. Bergmann, A., Fritz, G. and Glatter, O. (2000) Solving the generalized indirect Fourier transformation (GIFT) by Boltzmann simplex simulated annealing (BSSA). J. Appl. Cryst., 33, 1212–1216. 211. Svergun, D.I. (1999) Restoring low resolution structure of biological macromolecules from solution scattering using simulated annealing. Biophys. J., 76, 2879–2886. 212. Petoukhov, M.V. and Svergun, D.I. (2003) New methods for domain structure determination of proteins from solution scattering data. J. Appl. Cryst., 36, 540–544. 213. Schwieters, C. and Clore, G. (2007) A physical picture of atomic motions within the Dickerson DNA dodecamer in solution derived from joint ensemble refinement against NMR and largeangle X-ray scattering data. Biochemistry, 46, 1152–1166. 214. Grishaev, A., Ying, J., Canny, M. et al. (2008) Solution structure of tRNAVal from joint refinement against dual alignment residual dipolar couplings and SAXS data. J. Biomol. NMR, 42, 99–109. 215. Grishaev, A., Tugarinov, V., Kay, L. et al. (2008) Refined solution structure of the 82-kDa enzyme Malate Synthase G from joint NMR and synchrotron SAXS restraints. J. Biomol. NMR, 40, 95–106. 216. Zuo, X., Wang, J., Foster, T. et al. (2008) Global molecular structure and interfaces: refining an RNA:RNA complex structure using solution X-ray scattering data. J. Am. Chem. Soc., 130, 3292–3293. 217. Wang, J., Zuo, X., Yu, P. et al. (2009) Determination of multicomponent protein structures in solution using global orientation and shape restraints. J. Am. Chem. Soc., 131, 10507–10515. 218. Whitten, A., Cai, S. and Trewhella, J. 
(2008) MULCh: Modules for the analysis of small-angle neutron contrast variation data from bio-molecular assemblies. J. Appl. Cryst., 41, 222–226. 219. Comoletti, D., Grishaev, A., Whitten, A. et al. (2007) Synaptic arrangement of the neuroligin/ beta-neurexin complex revealed by X-ray and neutron scattering. Structure, 15, 693–705.
Measurement of Structural Restraints
157
220. Mylonas, E. and Svergun, D. (2003) Accuracy of molecular mass determination of proteins in solution by small-angle X-ray scattering. J. Appl. Cryst., 40, s245–s249. 221. Heller, W. (2004) Influence of multiple well-defined conformations on small-angle scattering of proteins in solution. Acta Crystallogr. D, 61, 33–44. 222. Wang, Y., Trewhella, J. and Goldenberg, D. (2008) Small-angle X-ray scattering of reduced ribonuclease A: effects of solution conditions and comparisons with a computational model of unfolded proteins. J. Mol. Biol., 377, 1576–1592. 223. Bernado, P., Mylonas, V., Petoukhov, M.V. et al. (2007) Structural characterization of flexible proteins using small-angle x-ray scattering studies of biological macromolecules in solution. J. Am. Chem. Soc., 129, 5656–5664.
5 Calculation of Structures from NMR Restraints
Peter Güntert
5.1 Introduction
When the NMR method for protein structure determination was introduced in the early 1980s, the new approach met with enthusiasm amongst NMR spectroscopists but with scepticism and disbelief amongst structural biologists. This changed when the simultaneous but independent determinations of the three-dimensional structure of the protein tendamistat by X-ray crystallography [1] and NMR spectroscopy [2,3] yielded virtually identical results [4]. Since that time, NMR has become a firmly established method for determining the three-dimensional structures of proteins. More than 7800 structures in the Protein Data Bank [5] of March 2009 have been determined by NMR (Figure 5.1). This remarkable achievement would not have been possible without the development of sophisticated computational methods to compute three-dimensional protein structures from NMR-derived conformational restraints, and of increasingly automated approaches for analysing multidimensional NMR spectra.

NMR structure calculations can be performed in several ways that differ essentially in the extent to which the analysis of the spectra is automated. In a basic structure calculation, all spectra are analysed by the spectroscopist, who also interprets the data and provides the structure calculation program with geometric restraints in the form of allowed interatomic distance ranges, ranges of allowed torsion angle values, and
Protein NMR Spectroscopy: Practical Techniques and Applications, First Edition. Edited by Lu-Yun Lian and Gordon Roberts. 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd.
[Figure 5.1, panel axes: (a) new PDB entries versus deposition year; (b) PDB entries versus protein mass in kDa; (c) BMRB entries versus protein mass in kDa. Panels (a) and (b) distinguish NMR and X-ray structures.]

Figure 5.1 (a) Annual depositions of X-ray and NMR structures 1988–2007. (b) Size distribution of X-ray and NMR structures in the Protein Data Bank of January 2009. (c) Completeness of chemical shift assignments in the Biological Magnetic Resonance Data Bank (BMRB) of January 2009. Completeness of the chemical shift assignments of backbone amide 1H and aliphatic 1H chemical shifts: 10–30 %, black bars; 30–70 %, dark grey bars; 70–90 %, medium grey bars; more than 90 %, light grey bars
possibly additional types of restraints. In this case the software deals with the purely geometric problem of finding a three-dimensional arrangement of the atoms that is compatible with the primary structure of the protein, the conformational restraints from NMR, and steric repulsion. Instead of using conformational restraints, NMR structure calculation software can also read assigned NOESY peak lists and convert this information into upper bounds on distances between the corresponding pairs of hydrogen atoms using a given or automatically derived peak volume-to-distance relationship. A first significant degree of automation was reached by approaches that combined the automated assignment of NOESY peaks with the structure calculation. These algorithms start from the given chemical shift assignments and unassigned lists of NOESY peak positions and intensities. Only recently has it become possible to automate the analysis of NMR spectra completely, with an algorithm that takes a set of uninterpreted, multidimensional NMR spectra as input. Finally, several unconventional approaches to NMR structure determination have been proposed that do not rely on sequence-specific chemical shift assignments and/or NOESY data.

This chapter gives an overview of the principles, basic algorithms and popular implementations of NMR structure calculation methods, including automated, assignment-free, and chemical shift-based approaches. For consistency and simplicity, the following conventions will be used. An interaction between two or more atoms is manifested by a signal in a multidimensional spectrum. A peak refers to an entry in a peak list that has been derived from an experimental spectrum by peak picking. A peak may or may not represent a signal, and there may be signals that are not represented by a peak. Chemical shift assignment is the process and the result of attributing a specific chemical shift value to an atom. Peak assignment is the process and the result of identifying in each spectral dimension the atom(s) that are involved in the signal represented by the peak. NOESY assignment is peak assignment in NOESY spectra.
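The conversion of NOESY peak volumes into upper distance bounds mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the familiar isolated spin pair relationship V = k/d^6 with a single calibration constant k taken from one reference peak; the function names, the reference values and the clamping limits of 2.4 and 5.5 Å are assumptions chosen for illustration and are not taken from any particular program.

```python
def calibration_constant(reference_volume, reference_distance):
    """Return k in V = k / d**6 from one peak with a known distance (in Å)."""
    return reference_volume * reference_distance ** 6


def upper_bound(volume, k, d_min=2.4, d_max=5.5):
    """Convert a peak volume into an upper distance bound (in Å).

    The result is clamped to [d_min, d_max]; both limits are
    illustrative assumptions.
    """
    if volume <= 0:
        raise ValueError("peak volume must be positive")
    d = (k / volume) ** (1.0 / 6.0)
    return min(max(d, d_min), d_max)


# Hypothetical reference: a peak of volume 1000 between protons 2.5 Å apart
k = calibration_constant(1000.0, 2.5)

# Weaker peaks give longer bounds, up to the assumed maximum of 5.5 Å
bounds = [upper_bound(v, k) for v in (1000.0, 100.0, 1.0)]
```

Because of the sixth root, a tenfold drop in peak volume lengthens the bound by a factor of only about 1.47, which is why even roughly integrated volumes can yield usable upper bounds.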
5.2 Historical Development
With the first attempts to determine protein structures by NMR it became clear that new computer algorithms for structure calculation would be indispensable for solving three-dimensional protein structures, and that existing techniques developed for X-ray diffraction data would be as inadequate for the task as manual model building or interactive computer graphics. The mathematical theory of distance geometry [6] was the first method to be used for protein structure calculation. The basic idea of distance geometry is to formulate the problem not in the Cartesian space of the atom positions but in the high-dimensional space of all interatomic distances, where it is straightforward to find configurations that satisfy a network of distance measurements. The crucial step is then the embedding of a solution found in distance space into Cartesian space. With this method, a computer program was used for the first time to calculate the solution structure of a nonapeptide on the basis of experimental NOE measurements [7], and later the NMR solution structure of a 35-residue globular protein [8]. An improved version of the original embedding
algorithm was implemented in DISGEO [9], the first complete program package for NMR protein structure calculation.

Finding molecular conformations that are in agreement with geometrical restraints can be formulated as the minimisation of a suitable ‘target function’. The variable target function method in torsion angle space [10] used the method of conjugate gradients [11] for the minimisation of a multidimensional function. Recognising that fluctuations of the covalent bond lengths and bond angles around their equilibrium values are small, fast, and not measurable by NMR, only the torsion angles were retained as degrees of freedom. A fast recursive method made it possible to rapidly calculate the gradient of the target function with respect to the torsion angles [12]. However, as a local minimiser that takes exclusively downhill steps, conjugate gradient minimisation of a target function representing the complete network of NMR-derived restraints and the steric repulsion in a protein was virtually always trapped in local minima far from the correct solution. To alleviate this problem, the variable target function method implemented in the programs DISMAN [10] and DIANA [13] went through a series of minimisations of different target functions that gradually included restraints between atoms further and further separated along the polypeptide chain, thereby increasing the complexity of the target function step by step. This was a natural idea for helical proteins, but less successful for β-sheet topologies, which are characterised by many nonlocal contacts. This convergence problem was later cured in part by the use of redundant torsion angle restraints [14]. In this iterative procedure, redundant torsion angle restraints were generated on the basis of the torsion angle values found in a previous round of structure calculations.
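The stepwise inclusion of restraints in the variable target function method can be illustrated with a short sketch. It shows only how the set of active restraints grows with the allowed separation along the chain. The restraint tuples, the level schedule and the `minimise` callback are hypothetical placeholders, and the code does not reproduce DISMAN or DIANA.

```python
def active_restraints(restraints, level):
    """Select restraints between residues at most `level` positions apart."""
    return [r for r in restraints if abs(r[0] - r[1]) <= level]


def variable_target_function(restraints, levels, minimise, conformation):
    """Minimise a sequence of target functions of increasing complexity.

    `minimise(conformation, subset)` stands for a user-supplied local
    minimiser (for example conjugate gradients) that returns the
    improved conformation for the given subset of restraints.
    """
    for level in levels:
        subset = active_restraints(restraints, level)
        conformation = minimise(conformation, subset)
    return conformation


# Hypothetical restraints: (residue i, residue j, upper bound in Å)
restraints = [(1, 2, 3.0), (3, 6, 4.5), (2, 40, 5.0), (10, 55, 5.5)]

# Number of restraints that are active at each minimisation level
counts = [len(active_restraints(restraints, lv)) for lv in (1, 5, 40, 60)]
```

Local restraints, which dominate in helices, enter at the first levels, whereas the long-range contacts typical of β-sheets enter only at the end. This is consistent with the weaker convergence for β-sheet topologies noted above.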
In parallel with these developments, NMR structure calculation methods based on simulated annealing [15] driven by molecular dynamics simulation were developed. By numerically solving Newton’s equations of motion of classical mechanics, trajectories for the atoms of a protein can be obtained. In the context of protein structure calculation the basic advantage of molecular dynamics simulation over minimisation techniques is the presence of kinetic energy, which allows the system to escape from local minima. The efficiency of structure calculations using molecular dynamics simulation was enhanced by replacing the full ‘physical’ force field [16] with a simplified ‘geometric’ energy function, by using a modified potential for NOE restraints with asymptotically linear slope for large violations [17–19], and by simulated annealing [15]. Three different protocols for simulated annealing by molecular dynamics, each using a different way to produce the starting structure for the molecular dynamics run, were established: ‘hybrid distance geometry-dynamical simulated annealing’ [17] used a start conformation obtained from metric matrix distance geometry, the second method started from an extended polypeptide chain [18], and the third from a random array of atoms [19]. These protocols were implemented in the molecular dynamics program X-PLOR [20], which was written especially for biomolecular structure determination by NMR and X-ray diffraction, and in its later successors CNS [21] and Xplor-NIH [22].

It became clear that a method working in torsion angle space and using simulated annealing by molecular dynamics would benefit from the advantages of both approaches, because the absence of high-frequency bond length and bond angle vibrations in torsion angle space would allow for longer integration time steps and/or higher temperatures during the simulated annealing. Mazur and Abagyan [23,24] derived explicit formulas
for Lagrange’s equations of motion of a polymer using internal coordinates as degrees of freedom. Independently, Bae and Haug [25] and Jain et al. [26] found improved torsion angle dynamics algorithms whose computational effort scaled linearly with the system size, as in Cartesian space molecular dynamics, such that the advantage of longer integration time steps in torsion angle dynamics could be exploited for systems of any size. Both algorithms were adapted for protein structure calculation on the basis of NMR data, the first [25] in the program X-PLOR [27], the other [26] in the programs DYANA and CYANA [28], and in the NIH version of X-PLOR [29]. Experience with these programs confirmed that torsion angle dynamics was the most efficient way to calculate NMR structures of biological macromolecules, and showed that the computation time with DYANA and CYANA was about one order of magnitude shorter than with other programs [28]. Simulated annealing by torsion angle dynamics became the standard method to calculate NMR protein structures. A recent survey (Table 1 in [30]) revealed that the structure calculation programs that are cited most often in the NMR protein structures deposited in the Protein Data Bank [5] in September 2005–2008 were CYANA [28] (1160 citations), CNS [21] (242 citations), Xplor-NIH [22] (153 citations), ARIA [31,32] (122 citations), DYANA [28] (114 citations), AutoStructure [33] (103 citations), and X-PLOR [20] (75 citations).

When the basic problem of NMR protein structure calculation was solved satisfactorily by these programs, interest turned towards automating the most time-consuming part of NMR spectral analysis, namely the assignment of multidimensional NOESY spectra for the collection of conformational restraints. Because of the extensive degeneracy of the chemical shifts, this task is cumbersome and error-prone if done manually.
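How chemical shift degeneracy complicates NOESY assignment can be seen from a small sketch that lists all proton pairs whose shifts match a two-dimensional peak position within a tolerance. The atom names, the shift values and the tolerance of 0.02 ppm are hypothetical, and the function is an illustration rather than the matching procedure of any specific program.

```python
def candidate_assignments(peak, shifts, tolerance=0.02):
    """List all atom pairs whose chemical shifts match a 2D NOESY peak.

    peak   -- (w1, w2) peak position in ppm
    shifts -- dict mapping an atom name to its 1H chemical shift in ppm
    """
    first = [a for a, s in shifts.items() if abs(s - peak[0]) <= tolerance]
    second = [a for a, s in shifts.items() if abs(s - peak[1]) <= tolerance]
    return [(a, b) for a in first for b in second if a != b]


# Hypothetical 1H chemical shifts (ppm), for illustration only
shifts = {
    "ALA5.HA": 4.31, "LEU12.HA": 4.32, "SER20.HA": 4.30,
    "VAL8.HG1": 0.91, "ILE30.HD1": 0.90, "GLY3.HN": 8.45,
}

# Three protons match the first dimension and two match the second,
# so this single peak has six possible assignments.
candidates = candidate_assignments((4.31, 0.905), shifts)
```

Even in this six-proton toy example one peak admits six assignments. In a real protein with hundreds of protons the number of possibilities per peak grows accordingly.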
After semiautomatic approaches [34,35], the feasibility of automated NOESY cross-peak assignment was demonstrated by the NOAH algorithm [36,37] implemented in the program DIANA [13]. Automated NOESY assignment became of practical relevance with the introduction of ambiguous distance restraints [38], which allowed one to make use of NOESY cross-peaks in the structure calculation even if they had multiple possible assignments [39]. Ambiguous distance restraints became the central feature of the ARIA algorithm [31,32,40]. The CANDID [41] algorithm implemented in DYANA and CYANA [42] made use of ambiguous distance restraints, and improved the robustness of structure calculations with automated NOESY assignment by ‘network anchoring’ and ‘constraint combination’. Network anchoring reduced the initial ambiguity of NOESY cross-peak assignments by inducing self-consistency with the network of other assigned NOEs, and constraint combination minimised the impact of erroneous distance restraints on the resulting structure. At present, the combination of the automated assignment of NOESY cross-peaks and the structure calculation with CYANA or ARIA has become the standard approach to protein structure analysis by NMR [30]. Alternative approaches for the automated assignment of NOESY cross-peaks were implemented in the AutoStructure [33], PASD [43], and KNOWNOE [44] algorithms, and in a Bayesian approach [45].

The complete automation of protein structure determination is one of the challenges of biomolecular NMR spectroscopy that has, despite early optimism [46], proved difficult to achieve. The unavoidable imperfections of experimental NMR spectra and the intrinsic ambiguity of peak assignments that results from the limited accuracy of frequency measurements turn the tractable problem of finding the chemical shift assignments from
Figure 5.2 Steps of an NMR protein structure determination and their resulting data
ideal spectra into a formidably difficult one under realistic conditions. Many attempts have been made to automate further parts of the structure determination process, including peak identification [46–60] and the sequence-specific assignment of the chemical shifts [61–107]. However, fully automated NMR structure determination was more demanding than automating individual parts of NMR structure analysis because the cumulative effect of imperfections at successive steps could easily render the overall process unsuccessful (Figure 5.2). Systems designed to handle the whole process therefore generally required certain human interventions [55,62]. Only recently was the purely computational FLYA algorithm [108] developed, which is capable of determining the 3D structure of proteins on the basis of uninterpreted spectra.

Nowadays, most NMR protein structure determinations make use of sophisticated computational methods but nevertheless follow in essence the original approach (Figure 5.2) that was introduced in the early 1980s [109]. Alternative methods that circumvented the chemical shift assignment step [110–116], or replaced the NOESY information by residual dipolar couplings [117–121] or chemical shift data [122,123], have been developed. De novo protein structure determination by these approaches has not been reported yet, and it remains to be seen whether they will provide the reliability and the structural quality of the conventional method.
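The ambiguous distance restraints introduced earlier in this section are commonly evaluated through an effective distance obtained by summing the inverse sixth powers of all candidate distances, d_eff = (Σ_k d_k^-6)^(-1/6). The sketch below assumes this standard form; the function names and the distances are made up for illustration.

```python
def effective_distance(distances):
    """Return the r^-6-summed effective distance (in Å) of an ambiguous restraint."""
    if not distances:
        raise ValueError("at least one assignment possibility is required")
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def is_satisfied(distances, upper_bound):
    """A restraint is fulfilled if the effective distance is within the bound."""
    return effective_distance(distances) <= upper_bound


# Hypothetical peak with three possible assignments; the distances (Å)
# are those of the three candidate atom pairs in the current structure.
d_eff = effective_distance([3.2, 7.5, 12.0])
```

The sum is dominated by the shortest distance, so the effective distance never exceeds the smallest candidate distance. A single correct assignment is therefore enough to fulfil the restraint, whatever the other candidates contribute.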
5.3 Structure Calculation Algorithms

This section presents the core algorithms for NMR protein structure calculation by simulated annealing in torsion angle space, as implemented in the widely applied programs CYANA [28] and X-PLOR/CNS [20,21].

5.3.1 Molecular Dynamics Simulation versus NMR Structure Calculation
There is a fundamental difference between molecular dynamics simulation, which aims to simulate the trajectory of a molecular system as realistically as possible in order to extract molecular quantities of interest, and NMR structure calculation, which is driven by experimental restraints. Classical molecular dynamics simulations rely on a full ‘physical’ force field to ensure proper stereochemistry, and are generally run at a constant temperature, close to room temperature. Substantial amounts of computation time are
required because the physical energy function includes long-range pair interactions that are time-consuming to evaluate, and because conformation space is explored slowly at room temperature. When molecular dynamics algorithms are used for NMR structure calculations, however, the objective is quite different. Here, such algorithms simply provide a means to efficiently optimise a target function that takes the role of the potential energy. Details of the calculation, such as the course of a trajectory, are unimportant, as long as its end point is close to the global minimum of the target function. Therefore, the efficiency of NMR structure calculation can be enhanced by simplifying the force field and/or the algorithm without significantly altering the location of the global minimum (the correctly folded structure) but shortening, in terms of the computation time needed, the path by which it can be reached from the start conformation. A typical ‘geometric’ force field used in NMR structure calculation therefore retains only the most important part of the nonbonded interaction by a simple repulsive potential that replaces the Lennard-Jones and electrostatic interactions of the full empirical energy function. This short-range repulsive function can be calculated much faster and significantly facilitates large-scale conformational changes that are required during the folding process by lowering energy barriers induced by the overlap of atoms.

5.3.2 Potential Energy – Target Function
For simulated annealing a simplified potential energy function, the ‘target function’, is used that includes a simple repulsive potential instead of the Lennard-Jones and electrostatic nonbonded interactions, as well as terms for distance and torsion angle restraints. In Cartesian space the target function also comprises terms to maintain the covalent geometry of the structure by means of harmonic bond length and bond angle potentials, torsion angle potentials, and terms to enforce the proper chiralities and planarities. These terms are not needed in torsion angle space. For instance, in the program X-PLOR [20],

$$
\begin{aligned}
E_{\mathrm{pot}} ={} & \sum_{\mathrm{bonds}} k_b (r - r_0)^2 + \sum_{\mathrm{angles}} k_\theta (\theta - \theta_0)^2 + \sum_{\mathrm{dihedrals}} k_\phi \bigl(1 + \cos(n\phi + \delta)\bigr) \\
& + \sum_{\mathrm{impropers}} k_\psi (\psi - \psi_0)^2 + \sum_{\substack{\mathrm{nonbonded} \\ \mathrm{pairs}}} k_{\mathrm{repel}} \bigl[\max\bigl(0, (sR_{\min})^2 - R^2\bigr)\bigr]^2 \\
& + \sum_{\substack{\mathrm{distance} \\ \mathrm{restraints}}} k_d \Delta_d^2 + \sum_{\substack{\mathrm{angle} \\ \mathrm{restraints}}} k_a \Delta_a^2 .
\end{aligned}
$$

$k_b$, $k_\theta$, $k_\phi$, $k_\psi$, $k_{\mathrm{repel}}$, $k_d$ and $k_a$ denote the various force constants, $r$ the actual and $r_0$ the correct bond length, respectively, $\theta$ the actual and $\theta_0$ the correct bond angle, $\phi$ the actual torsion angle, $\psi$ the improper angle and $\psi_0$ the correct improper angle, $n$ the number of minima of the torsion angle potential, $\delta$ an offset of the torsion angle and improper potentials, $R_{\min}$ the distance where the van der Waals potential has its minimum, $R$ the actual distance between a nonbonded atom pair, $s$ a scaling factor, and $\Delta_d$ and $\Delta_a$ the size of the distance or torsion angle restraint violation. As an alternative to the square-well potential, distance restraints are often represented by a potential with linear asymptote for
large violations [20,124], which limits the maximal force exerted by a violated distance constraint. In this case the violation $\Delta_d$ of a single distance restraint is computed as

$$
\Delta_d^2 =
\begin{cases}
(d - l)^2 & \text{if } d < l, \\
0 & \text{if } l \le d \le u, \\
(d - u)^2 & \text{if } u < d < u + a, \\
a(3a - 2\gamma) + \gamma (d - u) + \dfrac{a^2 (\gamma - 2a)}{d - u} & \text{if } d \ge u + a.
\end{cases}
$$
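As an illustration, the piecewise definition above can be written as a short function. This is a minimal sketch and not code taken from X-PLOR: the function name `squared_violation` and the example numbers are hypothetical, while the arguments follow the symbols of the equation (actual distance `d`, lower and upper bounds `l` and `u`, asymptotic slope `gamma`, and switch point `a`).

```python
def squared_violation(d, l, u, gamma, a):
    """Squared violation of one distance restraint with a linear asymptote.

    d: actual distance; l, u: lower and upper bounds;
    gamma: slope of the asymptote; a: violation at which the
    potential switches from harmonic to asymptotic behaviour.
    """
    if d < l:
        return (d - l) ** 2
    if d <= u:
        return 0.0
    x = d - u                      # size of the upper-bound violation
    if x < a:
        return x ** 2              # harmonic region
    # asymptotic region: value and slope match the harmonic branch at x == a
    return a * (3 * a - 2 * gamma) + gamma * x + a ** 2 * (gamma - 2 * a) / x


# Hypothetical example values (distances in Angstrom)
for d in (1.5, 3.0, 5.2, 5.5, 8.0):
    print(d, squared_violation(d, l=1.8, u=5.0, gamma=2.0, a=0.5))
```

At the switch point the two upper branches agree, and for large violations the function grows linearly with slope `gamma`, which is what limits the force exerted by a badly violated restraint.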
Here, $d$ denotes the actual distance, $l$ and $u$ are the lower and upper distance bounds, $\gamma$ is the slope of the asymptotic potential, and $a$ is the violation at which the potential switches from harmonic to asymptotic behaviour. In the program CYANA the target function [13,28] is defined such that it is zero if and only if all experimental distance restraints and torsion angle restraints are fulfilled and all nonbonded atom pairs satisfy a check for the absence of steric overlap. A conformation that satisfies the restraints more closely than another one will lead to a lower target function value. The CYANA target function for distance restraints and torsion angle restraints is defined by

$$
V = \sum_{c = u, l, v} w_c \sum_{(\alpha, \beta) \in I_c} w^c_{\alpha\beta} \, (d_{\alpha\beta} - b_{\alpha\beta})^2
+ w_a \sum_{i \in I_a} w_i \left[ 1 - \frac{1}{2} \left( \frac{\Delta_i}{\Gamma_i} \right)^2 \right] \Delta_i^2 .
$$

Upper and lower bounds, $b_{\alpha\beta}$, on distances $d_{\alpha\beta}$ between two atoms $\alpha$ and $\beta$, and restraints on individual torsion angles $\theta_i$ in the form of allowed intervals $[\theta_i^{\min}, \theta_i^{\max}]$ are considered. $I_u$, $I_l$ and $I_v$ are the sets of atom pairs $(\alpha, \beta)$ with violated upper, lower or van der Waals distance bounds, respectively, and $I_a$ is the set of restrained torsion angles. $w_u$, $w_l$, $w_v$ and $w_a$ are overall weighting factors for the different types of restraints, and $w^c_{\alpha\beta}$ and $w_i$ are relative weighting factors for individual restraints. $\Gamma_i = \pi - (\theta_i^{\max} - \theta_i^{\min})/2$ denotes the half-width of the forbidden range of torsion angle values, and $\Delta_i$ is the size of the torsion angle restraint violation. The target function may include additional terms for restraints on vicinal scalar coupling constants, residual dipolar couplings, and pseudocontact shifts, as well as identity and symmetry restraints for symmetric multimers. Alternatives to the simple square potential for violated distance restraints have also been implemented.
5.3.3 Torsion Angle Dynamics
Torsion angle dynamics, i.e. molecular dynamics simulation using torsion angles instead of Cartesian coordinates as degrees of freedom [23–26], provides at present the most efficient way to calculate NMR structures of biological macromolecules. The only degrees of freedom are the torsion angles, i.e. rotations about single bonds, such that the conformation of the molecule is uniquely specified by the values of all torsion angles. The efficiency of the torsion angle dynamics algorithm [26] implemented in the program CYANA, and, previously, in DYANA [28], is due to the fact that it requires a computational effort that increases only linearly with the system size. In contrast, the computation
time for ‘naïve’ approaches to torsion angle dynamics rises with the third power of the system size [24], which renders these algorithms unsuitable for use with macromolecules. With the fast torsion angle dynamics algorithm in CYANA the advantages of torsion angle dynamics, especially that much longer integration time steps can be used, are effective for molecules of all sizes.
5.3.3.1 Tree Structure
A key idea of the fast torsion angle dynamics algorithm in CYANA [26,28] is to exploit the fact that a chain molecule such as a protein or nucleic acid can be represented in a natural way as a tree structure consisting of $n + 1$ rigid bodies or ‘clusters’ that are connected by $n$ rotatable bonds. Each rigid body is made up of one or several mass points (atoms) with fixed relative positions. The tree structure starts from a base, typically at the N-terminus of the polypeptide chain, and terminates with ‘leaves’ at the ends of the side-chains and at the C-terminus. The only degrees of freedom are rotations about single bonds, and parameters that define the position and orientation of the molecule in space. The clusters are numbered from 0 to $n$. The base cluster has the number $k = 0$. Each of the other clusters, with numbers $k \ge 1$, has a single nearest neighbour in the direction toward the base, which has a number $p(k) < k$. The torsion angle between the rigid bodies $p(k)$ and $k$ is denoted by $\theta_k$. The conformation of the molecule is uniquely specified by the values of all torsion angles, $(\theta_1, \ldots, \theta_n)$.
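The numbering rule $p(k) < k$ is what allows the tree to be traversed with simple loops. The following minimal sketch illustrates this with a hypothetical five-cluster tree stored as a parent list. The names `check_tree` and `moved_clusters` and the toy topology are assumptions made for illustration and are not part of CYANA.

```python
def check_tree(parent):
    """parent[k] is p(k) for k >= 1; parent[0] is None for the base cluster."""
    assert parent[0] is None
    for k in range(1, len(parent)):
        assert 0 <= parent[k] < k, "p(k) must be smaller than k"


def moved_clusters(parent):
    """For each rotatable bond k >= 1, the clusters displaced when the
    torsion angle k is changed and the base cluster is held fixed."""
    n = len(parent) - 1
    moved = [{k} for k in range(n + 1)]
    # Backward loop, from the leaves towards the base. Because p(k) < k,
    # every child of k has already been merged into moved[k] at this point.
    for k in range(n, 1, -1):
        moved[parent[k]] |= moved[k]
    return {k: moved[k] for k in range(1, n + 1)}


# Toy tree: 0 is the base, 1 hangs on 0, clusters 2 (side-chain leaf)
# and 3 (backbone) both hang on 1, and 4 hangs on 3.
parent = [None, 0, 1, 1, 3]
check_tree(parent)
print(moved_clusters(parent))
```

A single pass in decreasing order of $k$ suffices, which is the same traversal order that the recursive algorithms described later in this section rely on.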
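The following quantities are defined for each cluster $k$ (Figure 5.3): the ‘reference point’, $\mathbf{r}_k$, which is the position vector of the end point of the bond between the clusters $p(k)$ and $k$; $\mathbf{v}_k = \dot{\mathbf{r}}_k$, the velocity of the reference point; $\boldsymbol{\omega}_k$, the angular velocity of the cluster; $\mathbf{Y}_k$, the vector from the reference point to the centre of mass of the cluster; $m_k$, the mass of the cluster $k$; $I_k$, the inertia tensor of the cluster $k$ with respect to the reference point, given by

$$
I_k = \sum_{\alpha} m_\alpha \, I(\mathbf{y}_\alpha),
$$

where the sum runs over all atoms $\alpha$ in the cluster $k$, $m_\alpha$ is the mass of the atom $\alpha$, $\mathbf{y}_\alpha$ is the vector from the reference point of cluster $k$ to the atom $\alpha$, and $I(\mathbf{y})$ is the symmetric $3 \times 3$ matrix defined by the relation $I(\mathbf{y})\mathbf{x} = \mathbf{y} \wedge (\mathbf{x} \wedge \mathbf{y})$ for all three-dimensional vectors $\mathbf{x}$. The symbol ‘$\wedge$’ denotes the vector product. All position vectors are in an inertial frame of reference that is fixed in space.
5.3.3.2 Kinetic Energy
The angular velocity vector $\boldsymbol{\omega}_k$ and the linear velocity $\mathbf{v}_k$ of the reference point of the rigid body $k$ are calculated recursively from the corresponding quantities of the preceding rigid body $p(k)$:

$$
\boldsymbol{\omega}_k = \boldsymbol{\omega}_{p(k)} + \mathbf{e}_k \dot{\theta}_k, \qquad
\mathbf{v}_k = \mathbf{v}_{p(k)} - (\mathbf{r}_k - \mathbf{r}_{p(k)}) \wedge \boldsymbol{\omega}_{p(k)},
$$

where $\mathbf{e}_k$ is the unit vector along the rotatable bond between the clusters $p(k)$ and $k$. The kinetic energy can then be computed as a sum over all rigid bodies:

$$
E_{\mathrm{kin}} = \frac{1}{2} \sum_{k=0}^{n} \left[ m_k \mathbf{v}_k^2 + \boldsymbol{\omega}_k \cdot I_k \boldsymbol{\omega}_k + 2 \mathbf{v}_k \cdot (\boldsymbol{\omega}_k \wedge m_k \mathbf{Y}_k) \right].
$$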
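The recursion for the velocities and the sum for the kinetic energy can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the CYANA implementation: the function `kinetic_energy`, its argument layout (a parent list, per-cluster reference points `r`, bond unit vectors `e`, torsional velocities `theta_dot`, and per-cluster lists of `(mass, y)` atoms) and the small vector helpers are all hypothetical. The inertia term uses the identity $\boldsymbol{\omega} \cdot I(\mathbf{y})\boldsymbol{\omega} = |\boldsymbol{\omega} \wedge \mathbf{y}|^2$, which follows from the definition of $I(\mathbf{y})$.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def scale(s, a):
    return tuple(s * x for x in a)


def kinetic_energy(parent, r, e, theta_dot, atoms,
                   omega0=(0.0, 0.0, 0.0), v0=(0.0, 0.0, 0.0)):
    """Kinetic energy of a tree of rigid clusters.

    parent[k] = p(k); r[k] = reference point; e[k] = bond unit vector;
    theta_dot[k] = torsional velocity (entries with k = 0 are unused);
    atoms[k] = list of (mass, y), y measured from the reference point r[k].
    """
    n = len(parent) - 1
    omega = [omega0] + [None] * n
    v = [v0] + [None] * n
    for k in range(1, n + 1):               # forward loop, base to leaves
        p = parent[k]
        omega[k] = add(omega[p], scale(theta_dot[k], e[k]))
        v[k] = sub(v[p], cross(sub(r[k], r[p]), omega[p]))
    e_kin = 0.0
    for k in range(n + 1):
        m_k = sum(m for m, _ in atoms[k])
        m_y = (0.0, 0.0, 0.0)               # m_k * Y_k = sum of m_a * y_a
        rot = 0.0                           # omega . I_k omega
        for m, y in atoms[k]:
            m_y = add(m_y, scale(m, y))
            w_y = cross(omega[k], y)
            rot += m * dot(w_y, w_y)
        e_kin += 0.5 * (m_k * dot(v[k], v[k]) + rot
                        + 2.0 * dot(v[k], cross(omega[k], m_y)))
    return e_kin
```

For a small chain the result can be checked against the direct sum $\frac{1}{2}\sum m_\alpha |\mathbf{v}_k + \boldsymbol{\omega}_k \wedge \mathbf{y}_\alpha|^2$ over all atoms, which is a convenient sanity test for any implementation of the recursion.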
Figure 5.3 (a) Tree structure of torsion angles for the tripeptide Val-Ser-Ile. Circles represent rigid units. Rotatable bonds are indicated by arrows that point towards the part of the structure that is rotated if the corresponding torsion angle is changed. (b) Excerpt from the tree structure formed by the torsion angles of a molecule, and definition of quantities required by the CYANA torsion angle dynamics algorithm
5.3.3.3 Forces = Torques = −Gradient of the Target Function
The torques about the rotatable bonds, i.e. the negative gradients of the potential energy or target function with respect to torsion angles, $-\nabla V(\theta)$, are calculated by a fast recursive algorithm [12]. The gradient of the target function can be calculated efficiently because the target function is a sum of functions of individual interatomic distances and torsion angles. The partial derivative of the target function $V$ with respect to a torsion angle $\theta_k$ is given by

$$
\frac{\partial V}{\partial \theta_k} = -\mathbf{e}_k \cdot \mathbf{f}_k - (\mathbf{e}_k \wedge \mathbf{r}_k) \cdot \mathbf{g}_k
+ 2 w_a \sum_{i \in I_a} w_i \left[ 1 - \left( \frac{\Delta_i}{\Gamma_i} \right)^2 \right] \Delta_i \, \delta_{ik}
$$

with

$$
\mathbf{f}_k = \sum_{c = u, l, v} w_c \sum_{\substack{(\alpha, \beta) \in I_c \\ \alpha \in M_k}} 2 w^c_{\alpha\beta} \, \frac{d_{\alpha\beta} - b_{\alpha\beta}}{d_{\alpha\beta}} \, (\mathbf{r}_\alpha \wedge \mathbf{r}_\beta),
$$

$$
\mathbf{g}_k = \sum_{c = u, l, v} w_c \sum_{\substack{(\alpha, \beta) \in I_c \\ \alpha \in M_k}} 2 w^c_{\alpha\beta} \, \frac{d_{\alpha\beta} - b_{\alpha\beta}}{d_{\alpha\beta}} \, (\mathbf{r}_\alpha - \mathbf{r}_\beta).
$$

$\mathbf{r}_\alpha$ and $\mathbf{r}_\beta$ are the position vectors of the atoms $\alpha$ and $\beta$, and $M_k$ denotes the set of all atoms whose position is affected by a change of the torsion angle $\theta_k$ if the base cluster is held fixed. Because $M_k$ is a subset of $M_{p(k)}$, the quantities $\mathbf{f}_k$ and $\mathbf{g}_k$ can be calculated recursively for $k = n, n - 1, \ldots, 1$ starting from the leaves of the tree structure by evaluating the interaction for each atom pair $(\alpha, \beta)$ only once.
5.3.3.4 Equations of Motion
The calculation of the torsional accelerations, i.e. the second time derivatives of the torsion angles, is the crucial point of a torsion angle dynamics algorithm. The equations of motion for a classical mechanical system with generalised coordinates are the Lagrange equations

$$
\frac{d}{dt} \frac{\partial L}{\partial \dot{\theta}_k} - \frac{\partial L}{\partial \theta_k} = 0 \qquad (k = 1, \ldots, n)
$$

with the Lagrange function $L = E_{\mathrm{kin}} - E_{\mathrm{pot}}$. They lead to equations of motion of the form

$$
M(\theta) \ddot{\theta} + C(\theta, \dot{\theta}) = 0 .
$$

In the case of $n$ torsion angles as degrees of freedom, the $n \times n$ mass matrix $M(\theta)$ and the $n$-dimensional vector $C(\theta, \dot{\theta})$ can be calculated explicitly [23,24]. To generate a trajectory
this linear set of $n$ equations would have to be solved in each time step for the torsional accelerations $\ddot{\theta}$, formally by

$$
\ddot{\theta} = -M(\theta)^{-1} C(\theta, \dot{\theta}) .
$$

This requires a computational effort proportional to $n^3$, which is prohibitively expensive for larger systems. Therefore, in CYANA the fast recursive algorithm of [26] is implemented to compute the torsional accelerations, which makes explicit use of the tree structure of the molecule in order to obtain $\ddot{\theta}$ with a computational effort that is only proportional to $n$. The mathematical details and a proof of correctness of the CYANA torsion angle dynamics algorithm are given in [26].
5.3.3.5 Torsional Accelerations
The algorithm [26] computes a factorisation of the inverse of the mass matrix, $M(\theta)^{-1}$, into a product of highly sparse matrices with nonzero elements only in $6 \times 6$ blocks on or near the diagonal. As a result, the torsional accelerations can be obtained by executing a series of three linear loops over all rigid bodies similar to the single loop that is needed to compute the kinetic energy, $E_{\mathrm{kin}}$. The computation of the torsional accelerations $\ddot{\theta}_k$ is initialised by calculating for all rigid bodies, $k = 1, \ldots, n$, the six-dimensional vectors $a_k$, $e_k$ and $z_k$:

$$
a_k = \begin{bmatrix} (\boldsymbol{\omega}_k \wedge \mathbf{e}_k) \, \dot{\theta}_k \\ \boldsymbol{\omega}_{p(k)} \wedge (\mathbf{v}_k - \mathbf{v}_{p(k)}) \end{bmatrix}, \qquad
e_k = \begin{bmatrix} \mathbf{e}_k \\ \mathbf{0} \end{bmatrix}, \qquad
z_k = \begin{bmatrix} \boldsymbol{\omega}_k \wedge I_k \boldsymbol{\omega}_k \\ (\boldsymbol{\omega}_k \cdot m_k \mathbf{Y}_k) \, \boldsymbol{\omega}_k - |\boldsymbol{\omega}_k|^2 m_k \mathbf{Y}_k \end{bmatrix},
$$

and the $6 \times 6$ matrices $P_k$ and $\phi_k$:

$$
P_k = \begin{bmatrix} I_k & m_k A(\mathbf{Y}_k) \\ -m_k A(\mathbf{Y}_k) & m_k 1_3 \end{bmatrix}, \qquad
\phi_k = \begin{bmatrix} 1_3 & A(\mathbf{r}_k - \mathbf{r}_{p(k)}) \\ 0_3 & 1_3 \end{bmatrix} .
$$

The three-dimensional zero vector is denoted by $\mathbf{0}$, $0_3$ is the $3 \times 3$ zero matrix, $1_3$ is the $3 \times 3$ unit matrix, and $A(\mathbf{y})$ denotes the antisymmetric $3 \times 3$ matrix associated with the cross product, defined by the relation $A(\mathbf{y})\mathbf{x} = \mathbf{y} \wedge \mathbf{x}$ for all vectors $\mathbf{x}$.
Next, several auxiliary quantities are calculated by executing a recursive loop over all rigid bodies in the backward direction, $k = n, n - 1, \ldots, 1$:

$$
\begin{aligned}
D_k &= e_k \cdot P_k e_k \\
G_k &= P_k e_k / D_k \\
\varepsilon_k &= -e_k \cdot (z_k + P_k a_k) - \frac{\partial V}{\partial \theta_k} \\
P_{p(k)} &\leftarrow P_{p(k)} + \phi_k (P_k - G_k e_k^T P_k) \, \phi_k^T \\
z_{p(k)} &\leftarrow z_{p(k)} + \phi_k (z_k + P_k a_k + G_k \varepsilon_k) .
\end{aligned}
$$

$D_k$ and $\varepsilon_k$ are scalars, $G_k$ is a six-dimensional vector, and ‘$\leftarrow$’ means: ‘assign the result of the expression on the right-hand side to the variable on the left-hand side.’ Finally, the torsional accelerations are obtained by executing another recursive loop over all rigid bodies in the forward direction, $k = 1, \ldots, n$:

$$
\begin{aligned}
\alpha_k &= \phi_k^T \alpha_{p(k)} \\
\ddot{\theta}_k &= \varepsilon_k / D_k - G_k \cdot \alpha_k \\
\alpha_k &\leftarrow \alpha_k + e_k \ddot{\theta}_k + a_k .
\end{aligned}
$$

The auxiliary quantities $\alpha_k$ are six-dimensional vectors, with $\alpha_0$ being equal to the zero vector.
5.3.3.6 Time Step
The integration scheme for the equations of motion in torsion angle dynamics is a variant of the ‘leap-frog’ algorithm [125] used in Cartesian space molecular dynamics. To obtain a trajectory, the equations of motion are numerically integrated by advancing the (generalised) coordinates $q_i$ and velocities $\dot{q}_i$ ($i = 1, \ldots, n$) that describe the system by a small but finite time step $\Delta t$:

$$
\begin{aligned}
\dot{q}_i(t + \Delta t/2) &= \dot{q}_i(t - \Delta t/2) + \Delta t \, \ddot{q}_i(t) + O(\Delta t^3) \\
q_i(t + \Delta t) &= q_i(t) + \Delta t \, \dot{q}_i(t + \Delta t/2) + O(\Delta t^3)
\end{aligned}
$$

The degrees of freedom, $q_i$, are the Cartesian coordinates of the atoms in conventional molecular dynamics simulation, or the torsion angles in CYANA. The $O(\Delta t^3)$ terms indicate that the errors with respect to the exact solution incurred by the use of a finite time step $\Delta t$ are proportional to $\Delta t^3$. The time step $\Delta t$ must be small enough to sample adequately the fastest motions. Because the fastest motions in conventional molecular dynamics simulation are oscillations of bond lengths and bond angles, which are ‘frozen’ in torsion angle space, longer time steps can be used for torsion angle dynamics than for molecular dynamics in Cartesian space [28]. The temperature is controlled by weak coupling to an external bath [126] and the length of the time step is adapted automatically based on the accuracy of energy conservation [28]. It has been shown that in practical applications with proteins time steps of about 100, 30 and 7 fs at low (1 K), medium (400 K) and high (10 000 K) temperatures,
respectively, can be used in torsion angle dynamics calculations with CYANA [28], whereas time steps in Cartesian space molecular dynamics simulation generally have to be in the range of 2 fs. The concomitant fast exploration of conformation space provides the basis for the efficient CYANA structure calculation protocol. With the CYANA torsion angle dynamics algorithm it is possible to efficiently calculate protein structures on the basis of NMR data. Even for a system as complex as a protein, the program CYANA can execute thousands of torsion angle dynamics steps within minutes of computation time. Furthermore, since an NMR structure calculation always involves the computation of a group of conformers, it is highly efficient and straightforward with CYANA to run calculations of multiple conformers in parallel. Nearly ideal speed-up, i.e. an overall computation time almost inversely proportional to the number of processors, can be achieved with CYANA [28].
5.3.4 Simulated Annealing
The potential energy landscape of a protein is complex and studded with many local minima, even in the presence of experimental restraints and when using a simplified target function. Because the temperature, i.e. the kinetic energy, determines the maximal height of energy barriers that can be overcome in a molecular dynamics trajectory, the temperature schedule is important for the success and efficiency of a simulated annealing calculation. Elaborate protocols have been devised for structure calculations using molecular dynamics in Cartesian space [17,20]. In addition to the temperature, other parameters such as force constants and repulsive core radii are varied in these schedules, which may involve several stages of heating and cooling. However, the fast exploration of conformation space with torsion angle dynamics allows for simpler schedules.
Protocol for Simulated Annealing
The standard simulated annealing protocol in the program CYANA includes $N$ torsion angle dynamics time steps. It starts from a conformation with all torsion angles treated as independent, uniformly distributed random variables and consists of five stages:
1. Initial minimisation. A short conjugate gradient minimisation is applied to reduce high energy interactions that could otherwise disturb the torsion angle dynamics algorithm: 100 conjugate gradient minimisation steps are performed, including only distance restraints between atoms up to 3 residues apart along the sequence, followed by a further 100 minimisation steps including all restraints. For efficiency, all hydrogen atoms are excluded from the check for steric overlap, the repulsive core radii of heavy atoms without covalently bound hydrogen atoms are decreased by 0.2 Å with respect to their standard values, and the radii of heavy atoms with covalently bound hydrogens are decreased by 0.05 Å. The weighting factors in the target function are set to 1 for user-defined upper and lower distance bounds, and to 0.5 for steric lower distance bounds.
2. First simulated annealing stage with reduced heavy atom radii. A torsion angle dynamics trajectory with $(N - 200)/3$ time steps is generated. Typically, one-fifth of these torsion angle dynamics steps are performed at a constant high reference temperature $T_{\mathrm{high}}$ of, typically, 10 000 K, followed by slow cooling according to a fourth-power law to an intermediate reference temperature $T_{\mathrm{med}} = T_{\mathrm{high}}/20$. The time step is initialised to 2 fs. The list of van der Waals lower distance bounds is updated every 50 steps using a cutoff equal to twice the largest van der Waals radius plus 1 Å (= 4.2 Å for proteins) for the van der Waals pair list generation throughout all torsion angle dynamics phases.
3. Second simulated annealing stage with normal heavy atom radii and, later, normal hydrogen atom radii. The repulsive core radii of all heavy atoms are reset to their standard values, 50 conjugate gradient minimisation steps are performed, and the torsion angle dynamics trajectory is continued for $2(N - 200)/3$ time steps, starting with an initial time step that is half as long as the last preceding time step. The reference temperature is decreased according to a fourth-power law from the intermediate temperature $T_{\mathrm{med}}$ to zero reference temperature. After two-thirds of these time steps, the hydrogen atoms are included, with their standard radii, in the steric overlap check, and 50 conjugate gradient minimisation steps are performed before continuing the trajectory, starting with a time step that is half as long as the last preceding time step.
4. Low temperature phase with increased weight for steric repulsion. The weighting factor for steric restraints is increased to 2, and 50 conjugate gradient minimisation steps are performed, followed by 200 torsion angle dynamics steps at zero reference temperature, starting with a time step that is half as long as the last preceding time step.
5. Final minimisation. A final minimisation with a maximum of, typically, 1000 conjugate gradient steps is applied.
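The bookkeeping of this schedule can be sketched as follows. The division of the $N$ steps over stages 2 to 4 follows the protocol above. The exact functional form of the ‘fourth-power law’ is not given in the text, so the cooling curve used here, $T(s) = T_{\mathrm{end}} + (T_{\mathrm{start}} - T_{\mathrm{end}})(1 - s)^4$ with $s$ running from 0 to 1, is only an assumed form chosen for illustration. The function names and the value $N = 8000$ are likewise hypothetical and not taken from CYANA.

```python
def stage_lengths(n_steps):
    """Split N torsion angle dynamics steps over stages 2-4 of the protocol."""
    stage2 = (n_steps - 200) // 3          # (N - 200)/3 steps
    stage3 = n_steps - 200 - stage2        # 2(N - 200)/3 steps
    return {"stage2": stage2, "stage3": stage3, "stage4": 200}


def fourth_power(t_start, t_end, s):
    """ASSUMED cooling curve: falls from t_start (s = 0) to t_end (s = 1)."""
    return t_end + (t_start - t_end) * (1.0 - s) ** 4


def reference_temperature(step, n_steps, t_high=10000.0):
    """Reference temperature (K) at a given torsion angle dynamics step."""
    lengths = stage_lengths(n_steps)
    n2, n3 = lengths["stage2"], lengths["stage3"]
    t_med = t_high / 20.0
    hot = n2 // 5                          # one-fifth of stage 2 at T_high
    if step < hot:
        return t_high
    if step < n2:                          # cooling from T_high to T_med
        return fourth_power(t_high, t_med, (step - hot) / (n2 - hot))
    if step < n2 + n3:                     # cooling from T_med to zero
        return fourth_power(t_med, 0.0, (step - n2) / n3)
    return 0.0                             # stage 4, zero reference temperature


print(stage_lengths(8000))
for step in (0, 520, 1500, 2600, 5000, 7800):
    print(step, round(reference_temperature(step, 8000), 1))
```

The minimisation steps of stages 1 and 5 and the intermediate conjugate gradient steps are not counted among the $N$ torsion angle dynamics steps and are therefore not part of this sketch.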
5.4 Automated NOE Assignment
Obtaining a comprehensive set of distance restraints from a NOESY spectrum is in practice by no means straightforward. Resonance and peak overlap turn NOE assignment into an iterative process in which preliminary structures, calculated from limited numbers of distance restraints, serve to reduce the ambiguity of the cross-peak assignments. Additional difficulties may arise from spectral artifacts and noise, and from the absence of expected signals because of fast relaxation. These inevitable shortcomings of NMR data collection are the main reason why laborious interactive procedures have dominated this central step of NMR protein structure determination for a long time. Automated procedures follow the same general scheme as the interactive approach but do not require manual intervention during the assignment/structure calculation cycles. Two main obstacles have to be overcome by an automated method starting without any prior knowledge of the structure. First, the number of cross-peaks with unique assignments based on chemical shift alignment alone is in general not sufficient to define the fold of the protein [127]. An automated method must therefore also have the capability to use NOESY cross-peaks that cannot (yet) be assigned unambiguously. Secondly, the automated program must be able to cope with the erroneously picked or inaccurately positioned peaks and with the incompleteness of the chemical shift assignment of typical experimental data sets. An automated procedure needs devices to substitute for the intuitive decisions made by an experienced spectroscopist in dealing with the imperfections of experimental NMR data.
Besides semi-automatic approaches [34,35,128], several algorithms have been developed for the automated analysis of NOESY spectra given the chemical shift assignments of the backbone and side-chain resonances, namely NOAH [36,37], ARIA [31,32,40,129], AUTOSTRUCTURE [33], KNOWNOE [44], CANDID [41] and a similar algorithm implemented in CYANA [130], PASD [43], and a Bayesian approach [45]. Automated NOE assignment algorithms generally require a high degree of completeness of the backbone and side-chain chemical shift assignments [131].
5.4.1 Ambiguity of Chemical Shift Based NOESY Assignment
In de novo three-dimensional structure determinations of proteins in solution by NMR spectroscopy, the key conformational data are upper distance limits derived from nuclear Overhauser effects (NOEs) [34–37]. In order to extract distance constraints from a NOESY spectrum, its cross-peaks have to be assigned, i.e. the pairs of interacting hydrogen atoms have to be identified. The NOESY assignment is based on previously determined chemical shift values that result from the chemical shift assignment. Because of the limited accuracy of chemical shift values and peak positions many NOESY cross-peaks cannot be attributed to a single unique spin pair but have an ambiguous NOE assignment comprising multiple spin pairs. A simple mathematical model of the NOESY assignment process by chemical shift matching gives insight into this problem [37]. It assumes a protein with $n$ hydrogen atoms, for which complete and correct chemical shift assignments are available, and $N$ cross-peaks picked in a 2D NOESY spectrum with an accuracy of the peak position of $\Delta\omega$, i.e. the position of the picked peak differs from the resonance frequency of the underlying signal by no more than $\Delta\omega$ in both spectral dimensions. Under the simplifying assumption of a uniform distribution of the proton chemical shifts over a range $\Delta\Omega$, the chemical shift of a given proton falls within an interval of half-width $\Delta\omega$ about a given peak position with probability $p = 2\Delta\omega/\Delta\Omega$. Peaks with unique chemical shift-based assignment have in both spectral dimensions exactly 1 out of all $n$ proton shifts inside the tolerance range $\Delta\omega$ from the peak position. Their expected number,

$$
N^{(1)} = N (1 - p)^{2n - 2} \approx N e^{-2np} = N e^{-4n\Delta\omega/\Delta\Omega},
$$

decreases exponentially with increasing size of the protein ($n$) and increasing chemical shift tolerance range ($\Delta\omega$).
For a typical small protein with 100 amino acid residues, $n = 500$ proton chemical shifts, and $N = 2000$ NOESY cross-peaks within a range of $\Delta\Omega = 10$ ppm, one expects that only about 2 % of the NOEs can be assigned unambiguously based solely on chemical shift information with an accuracy of $\Delta\omega = 0.02$ ppm, which is an insufficient number to calculate a preliminary three-dimensional structure. For peak lists obtained from 3D $^{13}$C- or $^{15}$N-resolved NOESY spectra, the ambiguity in one of the proton dimensions can usually be resolved by reference to the hetero-spin, so that the expected number of unambiguously assignable NOEs becomes

$$
N^{(1)} \approx N e^{-np} = N e^{-2n\Delta\omega/\Delta\Omega} .
$$
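The numbers quoted above are easy to reproduce. The short sketch below evaluates the model for the 2D and the 3D case. The function name `expected_unique` and its argument names are hypothetical, and the exact expression used for the 3D case, $N(1-p)^{n-1}$, is an assumption made by analogy with the 2D formula, since the text gives only the exponential approximation for 3D spectra.

```python
import math


def expected_unique(n_peaks, n_protons, tol, shift_range, ambiguous_dims=2):
    """Expected number of peaks with a unique shift-based assignment.

    ambiguous_dims = 2 for a 2D NOESY spectrum, 1 for a 3D heteronuclear-
    resolved spectrum (one proton dimension resolved by the hetero-spin).
    Returns (exact model value, exponential approximation).
    """
    p = 2.0 * tol / shift_range
    exact = n_peaks * (1.0 - p) ** (ambiguous_dims * (n_protons - 1))
    approx = n_peaks * math.exp(-ambiguous_dims * n_protons * p)
    return exact, approx


# Example from the text: n = 500, N = 2000, tolerance 0.02 ppm, range 10 ppm
for dims in (2, 1):
    exact, approx = expected_unique(2000, 500, 0.02, 10.0, ambiguous_dims=dims)
    print(dims, round(exact, 1), round(approx, 1), round(100 * approx / 2000, 1))
```

With these parameters the 2D case gives a fraction of roughly 1.8 %, in line with the ‘about 2 %’ stated in the text, whereas the 3D case gives roughly 13.5 %.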
With regard to assignment ambiguity, 3D NOESY spectra are thus equivalent to homonuclear NOESY spectra from a protein of half the size or with twice the accuracy in the determination of the chemical shifts and peak positions.
5.4.2 Ambiguous Distance Restraints
Ambiguous distance restraints [39] provide a powerful concept for handling ambiguities in the initial, chemical shift-based NOESY cross-peak assignments. Prior to the introduction of ambiguous distance restraints in the ARIA algorithm [40], in general only unambiguously assigned NOEs could be used as distance restraints in the structure calculation. Since the majority of NOEs cannot be assigned unambiguously from chemical shift information alone, this lack of a general way to include ambiguous data into the structure calculation considerably hampered the performance of early automatic NOESY assignment algorithms. When using ambiguous distance restraints, every NOESY cross-peak is treated as the superposition of the signals from each of its possible assignments by applying relative weights proportional to the inverse sixth power of the corresponding interatomic distances. A NOESY cross-peak with a unique assignment possibility gives rise to an upper bound $b$ on the distance $d(\alpha, \beta)$ between two hydrogen atoms, $\alpha$ and $\beta$. A NOESY cross-peak with $n > 1$ assignment possibilities can be regarded as the superposition of $n$ degenerate signals and interpreted as an ambiguous distance restraint, $d_{\mathrm{eff}} \le b$, with the ‘effective’ or ‘$r^{-6}$-summed’ distance

$$
d_{\mathrm{eff}} = \left( \sum_{k=1}^{n} d_k^{-6} \right)^{-1/6} .
$$
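A small numerical sketch makes the behaviour of the $r^{-6}$-summed distance concrete. The function name `effective_distance` and the example distances are hypothetical and chosen only for illustration.

```python
def effective_distance(distances):
    """r^-6-summed effective distance over all assignment possibilities."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


# One correct assignment at 4.0 A plus two incorrect possibilities that
# correspond to long distances in the true structure (hypothetical values).
upper_bound = 5.0
candidates = [4.0, 12.0, 20.0]
d_eff = effective_distance(candidates)
print(round(d_eff, 3), d_eff <= upper_bound)
```

Because every term in the sum is positive, the effective distance cannot exceed the shortest individual distance, so adding further assignment possibilities can only make the restraint easier to satisfy, never harder.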
Each of the distances $d_k = d(\alpha_k, \beta_k)$ in the sum corresponds to one assignment possibility to a pair of hydrogen atoms, $\alpha_k$ and $\beta_k$. The effective distance $d_{\mathrm{eff}}$ is always shorter than any of the individual distances $d_k$. Thus, an ambiguous distance restraint will be fulfilled by the correct structure provided that the correct assignment is included amongst its assignment possibilities, regardless of the possible presence of other, incorrect assignment possibilities. Ambiguous distance restraints make it possible to interpret NOESY cross-peaks as correct conformational restraints even if a unique assignment cannot be determined at the outset of a structure determination. Including multiple assignment possibilities, some but not all of which may later turn out to be incorrect, does not result in a distorted structure but only in a decrease of the information content of the ambiguous distance restraints.
5.4.3 Combined Automated NOE Assignment and Structure Calculation with CYANA
A widely used algorithm for the automated interpretation of NOESY spectra is implemented in the NMR structure calculation program CYANA [28,130]. This algorithm is a reimplementation of the former CANDID algorithm [41] on the basis of a probabilistic treatment of the NOE assignment, combined in an iterative process that comprises seven cycles of automated NOE assignment and structure calculation, followed by a final structure calculation using only unambiguously assigned distance restraints. Between successive
cycles, information is transferred exclusively through the intermediary 3D structures. The molecular structure obtained in a given cycle is used to guide the NOE assignments in the following cycle. Otherwise, the same input data are used for all cycles, that is, the amino acid sequence of the protein, one or several chemical shift lists from the sequence-specific resonance assignment, and one or several lists containing the positions and volumes of cross-peaks in 2D, 3D or 4D NOESY spectra. The input may further include previously assigned NOE upper distance bounds or other previously assigned conformational restraints for the structure calculation. In each cycle, first all assignment possibilities of a peak are generated on the basis of the chemical shift values that match the peak position within given tolerance values, and the quality of the fit is expressed by a Gaussian probability, $P_{\mathrm{shifts}}$. Secondly, in all but the first cycle the probability $P_{\mathrm{structure}}$ for agreement with the preliminary structure from the preceding cycle, represented by a bundle of conformers, is computed as the fraction of the conformers in which the corresponding distance is shorter than the upper distance bound plus the acceptable distance restraint violation cutoff. Thirdly, each assignment possibility is evaluated for its network anchoring (see below), which is quantified by the probability $P_{\mathrm{network}}$. Only assignment possibilities for which the product of the three probabilities is above a threshold,

$$
P_{\mathrm{tot}} = P_{\mathrm{shifts}} \, P_{\mathrm{structure}} \, P_{\mathrm{network}} \ge P_{\min},
$$

are accepted (Figure 5.4). Cross-peaks with a single accepted assignment yield a conventional unambiguous distance restraint. Otherwise, an ambiguous distance restraint is generated that embodies multiple accepted assignments.
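The acceptance criterion can be sketched in a few lines. This is only an illustration of the logic described above, not the CYANA implementation: all function names, argument names and numerical values are hypothetical, $P_{\mathrm{shifts}}$ is simply passed in as a number because its exact Gaussian form is not specified here, and the combination rule for the network-anchoring contributions is the one given in Section 5.4.4.

```python
def structure_probability(distances, upper_bound, violation_cutoff):
    """Fraction of conformers in which the distance is shorter than the
    upper bound plus the acceptable violation cutoff."""
    ok = sum(1 for d in distances if d < upper_bound + violation_cutoff)
    return ok / len(distances)


def network_probability(contributions):
    """P_network = 1 - (1 - P1)(1 - P2)... for individual contributions."""
    product = 1.0
    for p in contributions:
        product *= 1.0 - p
    return 1.0 - product


def accepted(p_shifts, p_structure, p_network, p_min):
    """An assignment possibility is kept if P_tot reaches the threshold."""
    return p_shifts * p_structure * p_network >= p_min


# Hypothetical assignment possibility evaluated against a 4-conformer bundle
p_structure = structure_probability([4.2, 4.8, 5.6, 6.3],
                                    upper_bound=5.0, violation_cutoff=0.5)
p_network = network_probability([0.5, 0.5])
print(p_structure, p_network, accepted(0.9, p_structure, p_network, p_min=0.3))
```

In the first cycle no preliminary structure is available, so in a sketch like this $P_{\mathrm{structure}}$ would have to be replaced by a neutral value; how CYANA handles this internally is not described in the text.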
Figure 5.4 Schematic illustration of the effect of constraint combination in the case of two distance restraints, a correct one connecting atoms A and B, and a wrong one between atoms C and D. A structure calculation that uses these two restraints as individual restraints that have to be satisfied simultaneously will, instead of finding the correct structure (shown, schematically, in the first panel), result in a distorted conformation (second panel), whereas a combined restraint, which will be fulfilled already if one of the two distances is sufficiently short, leads to an almost undistorted solution (third panel). The formation of a combined restraint from the assignments of two peaks is shown in the right panel
5.4.4 Network-Anchoring
Each assignment possibility is evaluated for its network anchoring, i.e. its embedding in the network formed by the assignment possibilities of all the other peaks and the covalently restricted short-range distances. The network anchoring probability $P_{\mathrm{network}}$ that the distance corresponding to an assignment possibility is shorter than the upper distance bound plus the acceptable violation is computed given the assignments of the other peaks but independent of knowledge of the three-dimensional structure. Contributions to the network anchoring probability for a given, ‘current’, possible assignment result from other peaks with the same assignment, from pairs of peaks that connect indirectly the two atoms of the current possible assignment via a third atom, and from peaks that connect an atom in the vicinity of the first atom of the current assignment with an atom in the vicinity of the second atom of the current assignment. For network anchoring, short-range distances that are constrained by the covalent geometry take the same role as an unambiguously assigned NOE. Individual contributions to the network anchoring of the current assignment possibility are expressed as probabilities, $P_1, P_2, \ldots$, that the distance corresponding to the current assignment possibility satisfies the upper distance bound. The network anchoring probability is obtained from the individual probabilities as $P_{\mathrm{network}} = 1 - (1 - P_1)(1 - P_2) \cdots$, which is never smaller than the highest probability of an individual network anchoring contribution.
5.4.5 Constraint Combination
In practice, spurious distance restraints may arise from the misinterpretation of noise and spectral artifacts, in particular at the outset of a structure determination, before 3D structure-based filtering of the restraint assignments can be applied. The key technique used in CYANA to reduce structural distortions from erroneous distance restraints is ‘constraint combination’ [41]. Ambiguous distance restraints are generated with combined assignments from different, in general unrelated, cross-peaks (Figure 5.5). The basic property of ambiguous distance restraints that the restraint will be fulfilled by the correct structure whenever at least one of its assignments is correct, regardless of the presence of additional, erroneous assignments, then implies that such combined restraints have a lower probability of being erroneous than the corresponding original restraints, provided that the fraction of erroneous original restraints is smaller than 50 %. Constraint combination aims at minimising the impact of such imperfections on the resulting structure at the expense of a temporary loss of information. It is applied to medium- and long-range distance restraints in the first two cycles of combined automated NOE assignment and structure calculation with CYANA.
5.4.6 Structure Calculation Cycles
The distance restraints are then included in the input for the structure calculation with simulated annealing by the fast CYANA torsion angle dynamics algorithm [28]. The structure calculations typically comprise seven cycles. The second and subsequent cycles differ from the first cycle by the use of additional selection criteria for cross-peaks and NOE assignments that are based on assessments relative to the protein 3D structure from the preceding cycle. The precision of the structure determination normally improves with each subsequent cycle. Accordingly, the cutoff for acceptable distance restraint violations in the calculation of Pstructure is tightened from cycle to cycle. In the final cycle, an additional filtering step ensures that all NOEs have either unique assignments to a single pair of hydrogen atoms, or are eliminated from the input for the structure calculation. This facilitates the subsequent use of refinement and analysis programs that cannot handle ambiguous distance restraints. A CYANA structure calculation with automated NOE assignment can be completed in less than one hour for a 10–15 kDa protein, provided that the structure calculations can be performed in parallel, for instance on a Linux cluster system.
Figure 5.5 Three conditions that must be fulfilled by a valid assignment of a NOESY cross-peak to two protons A and B in the CYANA automated NOESY assignment algorithm: (a) Agreement between the proton chemical shifts ωA and ωB and the peak position (ω1, ω2) within a tolerance of Δω, i.e. |ω1 − ωA| < Δω and |ω2 − ωB| < Δω. (b) Spatial proximity (distance dAB) in a (preliminary) structure. (c) Network-anchoring. The NOE between protons A and B must be part of a network of other NOEs or covalently restricted distances that connect the protons A and B indirectly through other protons
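As a hedged illustration of the network-anchoring probability described earlier in this section, the following minimal Python sketch combines individual contributions P1, P2, ... into Pnetwork = 1 − (1 − P1)(1 − P2)... The function name and the example values are assumptions made for illustration only; they are not taken from CYANA.

```python
def network_anchoring_probability(contributions):
    """Combine individual anchoring probabilities P_i into
    P_network = 1 - prod(1 - P_i).

    `contributions` is an iterable of probabilities in [0, 1].
    (Hypothetical helper for illustration, not a CYANA routine.)
    """
    remaining = 1.0  # probability that no contribution supports the assignment
    for p in contributions:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each contribution must lie in [0, 1]")
        remaining *= 1.0 - p
    return 1.0 - remaining


# Illustrative values only: three independent supporting contributions.
example_contributions = [0.5, 0.2, 0.1]
p_network = network_anchoring_probability(example_contributions)
```

Because every factor (1 − Pi) lies between 0 and 1, adding a further contribution can only raise the result, which is why Pnetwork is never smaller than the largest individual contribution.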
5.5
Nonclassical Approaches
Nonclassical approaches that do not rely on sequence-specific resonance assignments, and methods that use residual dipolar couplings or chemical shifts in conjunction with molecular modelling to determine the backbone structure without the need for side-chain assignments, have also been proposed.
5.5.1
Assignment-Free Methods
Much of the NMR measurement time and the spectral analysis effort is devoted to finding sequence-specific resonance assignments. However, the chemical shift assignment by itself has no biological relevance. It is required only as an intermediate step in the interpretation of the NMR spectra. Consequently, strategies for NMR protein structure determination were sought that circumvented the chemical shift assignment step. Assignment-free NMR structure calculation methods exploit the fact that NOESY spectra provide distance information even in the absence of chemical shift assignments. This proton-proton distance information is used to calculate a spatial proton distribution. Since there is no association with the covalent structure at this point, the protons of the protein are treated as a cloud of unconnected particles. Provided that the emerging proton distribution is sufficiently clear, a model can then be built into the proton density in a manner analogous to X-ray crystallography where a structural model is placed into the electron density. This general idea was first tested with simulated data [110–114]. The most recent approach to NMR structure determination without chemical shift assignment is the CLOUDS protocol [115,116] which has demonstrated the feasibility of assignment-free structure determination using experimental rather than simulated data. A ‘gas’ of unassigned, unconnected hydrogen atoms is condensed into a structured proton distribution (cloud) via a molecular dynamics simulated annealing scheme in which the internuclear distances and van der Waals repulsive terms are the only active restraints. Proton densities are generated by combining a large number of such clouds, each computed from a different trajectory. The primary structure is threaded through the unassigned proton density by a Bayesian approach, for which the probabilities of sequential connectivity hypotheses are inferred from likelihoods of HN-HN, HN-Hα and Hα-Hα interatomic distances as well as 1H NMR chemical shifts, both derived from public databases. Side chains are placed by a similar procedure. As in all NMR spectrum analysis, resonance overlap also presents a major difficulty for assignment-free strategies. At present, a de novo protein structure determination by the assignment-free approach has not yet been reported.
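To make concrete the idea of a proton ‘gas’ restrained only by internuclear distances and van der Waals repulsion, the following Python sketch evaluates a simple penalty function for a set of unconnected points. It is an illustration under stated assumptions and not the CLOUDS implementation: the quadratic penalty forms, the function name `cloud_energy` and the minimum separation `r_min` are all hypothetical choices.

```python
import math
from itertools import combinations


def cloud_energy(coords, upper_bounds, r_min=2.0):
    """Penalty for a cloud of unconnected protons.

    coords       : list of (x, y, z) tuples in angstrom
    upper_bounds : dict mapping index pairs (i, j), i < j, to NOE-derived
                   upper distance bounds in angstrom
    r_min        : assumed minimum separation for the repulsive term

    Both terms are simple quadratic penalties (an assumption of this sketch).
    """
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        # Repulsive term: penalise pairs that come closer than r_min.
        if d < r_min:
            energy += (r_min - d) ** 2
        # Distance term: penalise violations of an upper bound, if one exists.
        bound = upper_bounds.get((i, j))
        if bound is not None and d > bound:
            energy += (d - bound) ** 2
    return energy


# Illustrative data: three protons on a line, two upper distance bounds.
example_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
example_bounds = {(0, 1): 5.0, (0, 2): 6.0}
example_energy = cloud_energy(example_coords, example_bounds)
```

A simulated annealing scheme would lower such a function while moving the points; only the evaluation step is shown here, since the details of the published protocol are not given in the text.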
5.5.2
Methods Based on Residual Dipolar Couplings
Methods using residual dipolar couplings to determine the backbone structure without the need for side-chain assignments have been developed [117]. In a first approach [118] the Protein Data Bank was searched for fragments of seven contiguous amino acid residues that fitted the measured residual dipolar couplings. From consensus values of the torsion angles for the nonterminal residues of these fragments, an initial structure was built from overlapping fragments by ‘molecular fragment replacement’ (MFR). Errors in the MFR-derived backbone torsion angles accumulate when building the initial model because the long-range information contained in the residual dipolar couplings is not yet used. However, this global orientational information could be reintroduced when using these rough models as starting structures in a subsequent refinement procedure based on a simple iterative gradient approach that adjusted the values of the backbone torsion angles φ and ψ to minimise the difference between measured and best-fitted dipolar couplings, and between measured chemical shifts and those predicted by the model. It was demonstrated that the 3D structure of large protein backbone segments, and in favourable cases an entire small protein, could be calculated exclusively from dipolar couplings and chemical shifts [118]. This and similar approaches [119] require assignments of the backbone chemical shifts as input. In a further step, automated algorithms were developed that simultaneously perform the assignment and the determination of low resolution backbone structures on the basis of unassigned chemical shifts and residual dipolar couplings [120,121]. The latter method relied on the de novo protein structure prediction algorithm ROSETTA [132] and a Monte Carlo search for chemical shift assignments that produced the best fit of the experimental NMR data to a candidate 3D structure.
5.5.3
Chemical Shift-Based Structure Determination
The chemical shift is the NMR parameter that can be measured most easily and accurately. Because the chemical shifts are highly sensitive to their local environment they are widely used to monitor conformational changes or ligand binding, and they can yield information about specific features of protein conformations, notably dihedral angles [133] and secondary structure [134]. However, the complex relationship between chemical shifts and 3D structure has impeded their direct use for tertiary structure determination. Recently, two approaches to 3D protein structure determination have been developed that use exclusively chemical shifts as experimental input data [122,123]. The methods do not rely on the quantum mechanical calculation of chemical shifts from first principles but exploit the availability of an ever-growing database of 3D protein structures [5] and corresponding chemical shifts [135] to extract from known protein structures molecular fragment conformations that match the experimentally determined secondary chemical shifts of the protein under study. A secondary chemical shift is the deviation of a chemical shift from the residue-type-dependent random coil chemical shift value of the corresponding atom. This separates the conformation dependence of the chemical shift from its residue-type dependence, which is a prerequisite for the sequence-independent identification of molecular fragments with similar conformation. The molecular fragment conformations are found by extending the database search method of the program TALOS [133] to contiguous segments of several residues [122,136]. The fragment conformations are then assembled into a 3D structure of the entire protein using molecular modelling approaches. The CHESHIRE algorithm was the first program to generate near-atomic resolution structures from chemical shifts [122].
It first uses the 1Hα, 15N, 13Cα and 13Cβ secondary chemical shifts to predict the secondary structure of the protein and the backbone torsion angles, followed by the identification of three- and nine-residue segments on the basis of the secondary chemical shifts, the predicted secondary structure and the predicted backbone dihedral angles. Low resolution structures in which the side-chains are represented by a single Cβ atom are calculated by a Monte Carlo algorithm using the CHARMM force field [16] complemented with terms for secondary structure packing and cooperative hydrogen bonding. The previously determined three- and nine-residue fragments guide Monte Carlo moves. All-atom conformers are then generated. Finally, the 500 best scoring all-atom conformers are refined by a Monte Carlo protocol during which an additional energy term is active that describes the correlation between experimental and predicted chemical shifts. The CHESHIRE algorithm yielded the structures of 11 proteins of 46–123 residues with an accuracy of 2 Å or better for the backbone RMSD. The CS-ROSETTA method is based on the same concept [123]. It combines the ROSETTA structure prediction program [137] with a recently enhanced empirical relation between structure and chemical shifts [136], which allows the selection of database fragments that better match the structure of the unknown protein. Generating new protein structures by CS-ROSETTA involves two separate stages. First, polypeptide fragments are selected from a protein structural database, based on the combined use of 13Cα, 13Cβ, 13C′, 15N, 1Hα and 1HN chemical shifts and the amino acid sequence pattern. In the second stage, these fragments are used for de novo structure generation, using the standard ROSETTA Monte Carlo assembly and relaxation methods. The method was calibrated using 16 proteins of known structure, and then successfully tested for nine proteins with 65–147 residues under study in a structural genomics project. For these, the CS-ROSETTA algorithm yielded full-atom models with 0.6–2.1 Å RMSD for the backbone atoms relative to the independently determined NMR structures. Both methods require as experimental input the chemical shift assignments for the backbone and 13Cβ atoms. These shifts are generally available at an early stage of the traditional NMR structure determination process, before the collection and analysis of structural restraints. Side chain chemical shift assignments beyond Cβ, which are considerably harder to obtain than those for the backbone, are not necessary. In contrast to the NOE-based conventional approach, for which a well-established theory exists relating each piece of NMR data (the NOESY peak volume) to a corresponding conformational restraint, chemical shift-based structure determination is an empirical approach in which it is assumed that the entire sequence of the protein can be covered by overlapping fragments with a similar conformation in already existing structures. There are no experimentally derived long-range conformational restraints. This implies that the correct tertiary structure has to be found – or may be missed – by the underlying molecular modelling algorithm. In practice, convergence decreases with increasing protein size, and is adversely affected by the presence of long, disordered loops [123]. The CS-ROSETTA approach works for proteins up to about 130 residues.
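The secondary chemical shift defined earlier in this section, i.e. the observed shift minus the residue-type-dependent random coil value for the same atom, can be sketched in a few lines of Python. The reference numbers in the table below are placeholders chosen for illustration and the function name is hypothetical; a real analysis would take random coil values from a published reference table.

```python
# Placeholder random-coil reference shifts in ppm, keyed by
# (residue type, atom name). Illustrative values only, not a curated table.
RANDOM_COIL_PPM = {
    ("ALA", "CA"): 52.5,
    ("ALA", "CB"): 19.1,
    ("GLY", "CA"): 45.1,
}


def secondary_shift(residue, atom, observed_ppm, table=RANDOM_COIL_PPM):
    """Return observed shift minus the random-coil value for this atom.

    Raises KeyError if the (residue, atom) pair is missing from the table.
    """
    return observed_ppm - table[(residue, atom)]


# Example: an alanine C-alpha observed at 54.8 ppm (made-up measurement).
delta_ca = secondary_shift("ALA", "CA", 54.8)
```

Subtracting a residue-specific reference is what makes shifts from different amino acid types comparable, so that database fragments can be matched by conformation rather than by sequence.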
5.6
Fully Automated Structure Analysis
Fully automated NMR structure determination is more demanding than automating individual parts of NMR structure analysis because the cumulative effect of imperfections at successive steps can easily render the overall process unsuccessful. For example, it has been demonstrated that reliable automated NOE assignment and structure calculation requires around 90 % completeness of the chemical shift assignment [41,131], which is not straightforward to achieve by unattended automated peak picking and automated resonance assignments. Present systems designed to handle the whole process therefore generally require certain human interventions [55,62]. The interactive validation of peaks and assignments, however, still constitutes a time-consuming obstacle for high-throughput NMR protein structure determination. The crucial indicator for a fully automated NMR structure determination method is the accuracy of the resulting 3D structures when real experimental input data is used and any human interventions at intermediate steps are avoided. Even ‘small’ manual corrections, or the use of idealised input data, can lead to substantially altered conclusions, and prejudice the assessment of different methods. Fully automated structure determination of proteins in solution (FLYA) yields, without human intervention, 3D protein structures starting from a set of multidimensional NMR spectra [108]. As in the classical manual approach, structures are determined by a set of experimental NOE distance restraints without reference to already existing structures or empirical molecular modelling information. In addition to the 3D structure of the protein, FLYA yields backbone and side-chain chemical shift assignments, and cross-peak assignments for all spectra.
Figure 5.6 Flowchart of the fully automated structure determination algorithm FLYA
The FLYA algorithm (Figure 5.6) uses as input data only the protein sequence and multidimensional NMR spectra. Any combination of commonly used hetero- and homonuclear two-, three- and four-dimensional NMR spectra can be used as input for the FLYA algorithm, provided that it affords sufficient information for the assignment of the backbone and side-chain chemical shifts and for the collection of conformational restraints. Peaks are identified in the multidimensional NMR spectra using the automated peak picking algorithm of NMRView [48] or AUTOPSY [47]. Peak integrals for NOESY cross-peaks are determined simultaneously. Since no manual corrections are applied, the resulting raw peak
lists may contain, in addition to the entries representing true signals, a significant number of artifacts (see Figures 5.2 and 5.4 of [108]). The following steps of the fully automated structure determination algorithm can tolerate the presence of such artifacts, as long as the majority of the true peaks have been identified. Based on the peak positions and, in the case of NOESY spectra, peak volumes, peak lists are prepared by CYANA [28,127]. Depending on the spectra, the preparation may include unfolding aliased signals, systematic correction of chemical shift referencing, and removal of peaks near the diagonal or water lines. The peak lists resulting from this step remain unchanged throughout the rest of the procedure. An ensemble of initial chemical shift assignments is obtained by multiple runs of a modified version of the GARANT algorithm [100,101] with different seed values for the random number generator [138]. The original GARANT algorithm was modified for new spectrum types and for the treatment of NOESY spectra when 3D structures are available. In analogy to NMR structure calculation in which not a single structure but an ensemble of conformers is calculated using identical input data but different randomised start conformers, the initial chemical shift assignment produces an ensemble rather than a single chemical shift value for each 1H, 13C and 15N nucleus. The peak position tolerance is typically set to 0.03 ppm for the 1H dimensions and to 0.4 ppm for the 13C and 15N dimensions. These initial chemical shift assignments are consolidated by CYANA into a single consensus chemical shift list. The most highly populated chemical shift value in the ensemble is computed for each 1H, 13C and 15N spin and selected as the consensus chemical shift value that will be used for the subsequent automated assignment of NOESY peaks. The consensus chemical shift for a given nucleus is the value $\omega$ that maximises the function
$$\mu(\omega) = \sum_j \exp\left(-\frac{(\omega - \omega_j)^2}{2\,\Delta\omega^2}\right),$$
where the sum runs over all chemical shift values $\omega_j$ for the given nucleus in the ensemble of initial chemical shift assignments, and $\Delta\omega$ denotes the aforementioned chemical shift tolerance. NOESY cross-peaks are assigned automatically [41] on the basis of the consensus chemical shift assignments and the same peak lists and chemical shift tolerance values used already for the chemical shift assignment. The automated NOE assignment algorithm of the program CYANA is used. The overall probability for the correctness of possible NOE assignments is calculated as the product of three probabilities that reflect the agreement between the chemical shift values and the peak position, the consistency with a preliminary 3D structure [34], and network-anchoring [41], i.e. the extent of embedding in the network formed by other NOEs. Restraints with multiple possible assignments are represented by ambiguous distance restraints [39]. Seven cycles of combined automated NOE assignment and structure calculation by simulated annealing in torsion angle space and a final structure calculation using only unambiguously assigned distance restraints are performed. Constraint combination [41] is applied in the first two cycles to all NOE distance restraints spanning at least three residues in order to minimise distortions of the structures by erroneous distance restraints that may result from spurious entries in the peak lists and/or incorrect chemical shift assignments.
A complete FLYA calculation comprises three stages. In the first stage, the chemical shifts and protein structures are generated de novo (stage I). In the next stages (stages II
and III), the structures generated by the preceding stage are used as additional input for the determination of chemical shift assignments. Stages II and III are particularly important for aromatic residues and other resonances whose assignments rely on through-space NOESY information. At the end of the third stage, the 20 final CYANA conformers with the lowest target function values are subjected to restrained energy minimisation in explicit solvent against the AMBER force field [139] using the program OPALp [140,141]. The complete procedure is driven by the NMR structure calculation program CYANA, which is also used for parallelisation of all time-consuming steps. The performance of the FLYA algorithm can be monitored at different steps of the procedure by quality measures that can be computed without referring to external reference assignments or structures [108]. Structure calculations with the FLYA algorithm yielded 3D structures of three 12–16 kDa proteins that coincided closely with the conventionally determined structures (Figure 5.7). Deviations were below 0.95 Å for the backbone atom positions, excluding the flexible chain termini, and 96–97 % of all backbone and side-chain chemical shifts in the structured regions were assigned to the correct residues. The purely computational FLYA method is thus suitable to replace all manual spectral analysis and overcomes a major efficiency limitation of the NMR method for protein structure determination.
Figure 5.7 Structures obtained by fully automated structure determination with the FLYA algorithm (blue) superimposed on the corresponding NMR structures determined by conventional methods (dark red). (a) ENTH domain At3g16270(9–135) from Arabidopsis thaliana [147]. (b) Rhodanese homology domain At4g01050(175–295) from Arabidopsis thaliana [148]. (c) Src homology domain 2 (SH2) from the human feline sarcoma oncogene Fes [143]. Please refer to the colour plate section
The number of input spectra can be reduced for well-behaved proteins. This is of particular interest because a considerable amount of NMR measurement time was necessary to record the 13–14 3D spectra that were used as input for the aforementioned FLYA structure determinations. The influence of reduced sets of experimental spectra on the quality of NMR structures obtained with FLYA was investigated for the 12 kDa Src homology domain 2 from the human feline sarcoma oncogene Fes (Fes SH2) [142]. FLYA calculations were performed for five reduced data sets selected from the complete set of 13 3D spectra of the earlier conventional structure determination [143]. The reduced data sets utilised only CBCA(CO)NH and CBCANH for the backbone assignments and either all, some or none of the five original side-chain assignment spectra. In four of the five cases tested, the 3D structures deviated by less than 1.3 Å backbone RMSD from the conventionally determined Fes SH2 reference structure. The FLYA algorithm can thus also be used with reduced sets of input spectra. A further improvement resulted in conjunction with stereo-array isotope labelling (SAIL) [144,145]. SAIL provides a complete stereo- and regiospecific pattern of stable isotopes, which yields much sharper resonance lines and reduced signal overlap without loss of information (see Chapter 2). Automated signal identification can be achieved with higher reliability for the fewer, sharper and more intense peaks of SAIL proteins. The danger of making erroneous assignments decreases as the number of nuclei and peaks to assign is reduced, and less spin diffusion allows NOEs to be interpreted more quantitatively. As a result of the superior quality of the SAIL NMR spectra, reliable fully automated analysis of the NMR spectra and structure calculation are possible using fewer input spectra than with conventional uniformly 13C/15N-labelled proteins. FLYA calculations with SAIL ubiquitin using a single ‘through-bond’ 3D spectrum in addition to the 13C-edited and 15N-edited NOESY spectra for the restraint collection yielded structures with an accuracy of 0.82–1.15 Å for the backbone RMSD to the conventionally determined solution structure [146], showing the feasibility of fully automated NMR structure analysis from a minimal set of spectra.
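The consensus chemical shift used by FLYA, defined earlier in this section as the value that maximises a sum of Gaussians centred on the shifts found in the ensemble of initial assignments, can be sketched as follows. This is a minimal illustration rather than the CYANA implementation: the grid search, the function names and the example ensemble are assumptions.

```python
import math


def density(w, shifts, tol):
    """mu(w): sum over the ensemble of Gaussians of width `tol` centred on
    each chemical shift value w_j."""
    return sum(math.exp(-((w - wj) ** 2) / (2.0 * tol ** 2)) for wj in shifts)


def consensus_shift(shifts, tol, points_per_tol=100):
    """Return the grid point between min(shifts) and max(shifts) that
    maximises mu(w). The grid spacing is tol / points_per_tol (an
    assumption of this sketch; any one-dimensional maximiser would do)."""
    lo, hi = min(shifts), max(shifts)
    step = tol / points_per_tol
    n_points = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n_points)]
    return max(grid, key=lambda w: density(w, shifts, tol))


# Illustrative 1H ensemble in ppm: three runs agree, one run is an outlier.
ensemble = [4.50, 4.51, 4.52, 7.00]
best = consensus_shift(ensemble, tol=0.03)  # tolerance of 0.03 ppm for 1H
```

With a tolerance much smaller than the distance to the outlier, the isolated value contributes almost nothing near the cluster, so the maximum falls at the centre of the three agreeing values. This is the sense in which the consensus is the most highly populated shift rather than an average.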
References
1. Pflugrath, J.W., Wiegand, G., Huber, R. and Vertesy, L. (1986) Crystal structure determination, refinement and the molecular model of the α-amylase inhibitor Hoe-467a. J. Mol. Biol., 189, 383–386.
2. Kline, A.D., Braun, W. and Wüthrich, K. (1986) Studies by 1H nuclear magnetic resonance and distance geometry of the solution conformation of the α-amylase inhibitor tendamistat. J. Mol. Biol., 189, 377–382.
3. Kline, A.D., Braun, W. and Wüthrich, K. (1988) Determination of the complete three-dimensional structure of the α-amylase inhibitor tendamistat in aqueous solution by nuclear magnetic resonance and distance geometry. J. Mol. Biol., 204, 675–724.
4. Billeter, M., Kline, A.D., Braun, W. et al. (1989) Comparison of the high-resolution structures of the α-amylase inhibitor tendamistat determined by nuclear magnetic resonance in solution and by X-ray diffraction in single crystals. J. Mol. Biol., 206, 677–687.
5. Berman, H.M., Westbrook, J., Feng, Z. et al. (2000) The protein data bank. Nucleic Acids Res., 28, 235–242.
6. Blumenthal, L.M. (1953) Theory and Applications of Distance Geometry, Cambridge University Press, Cambridge, UK.
7. Braun, W., Bösch, C., Brown, L.R. et al. (1981) Combined use of proton-proton Overhauser enhancements and a distance geometry algorithm for determination of polypeptide conformations. Application to micelle-bound glucagon. Biochim. Biophys. Acta, 667, 377–396.
8. Arseniev, A.S., Kondakov, V.I., Maiorov, V.N. and Bystrov, V.F. (1984) NMR solution spatial structure of ‘short’ scorpion insectotoxin I5A. FEBS Lett., 165, 57–62.
9. Havel, T. and Wüthrich, K. (1984) A distance geometry program for determining the structures of small proteins and other macromolecules from nuclear magnetic resonance measurements of intramolecular 1H-1H proximities in solution. Bull. Math. Biol., 46, 673–698.
10. Braun, W. and Go, N. (1985) Calculation of protein conformations by proton-proton distance constraints - a new efficient algorithm. J. Mol. Biol., 186, 611–626.
11. Powell, M.J.D. (1977) Restart procedures for the conjugate gradient method. Math. Program., 12, 241–254.
12. Abe, H., Braun, W., Noguti, T. and Go, N. (1984) Rapid calculation of 1st and 2nd derivatives of conformational energy with respect to dihedral angles for proteins - General recurrent equations. Comput. Chem., 8, 239–247.
13. Güntert, P., Braun, W. and Wüthrich, K. (1991) Efficient computation of three-dimensional protein structures in solution from nuclear magnetic resonance data using the program DIANA and the supporting programs CALIBA, HABAS and GLOMSA. J. Mol. Biol., 217, 517–530.
14. Güntert, P. and Wüthrich, K. (1991) Improved efficiency of protein structure calculations from NMR data using the program DIANA with redundant dihedral angle constraints. J. Biomol. NMR, 1, 447–456.
15. Kirkpatrick, S., Gelatt, C.D. and Vecchi, M.P. (1983) Optimization by simulated annealing. Science, 220, 671–680.
16. Brooks, B.R., Bruccoleri, R.E., Olafson, B.D. et al. (1983) CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem., 4, 187–217.
17. Nilges, M., Clore, G.M. and Gronenborn, A.M. (1988) Determination of three-dimensional structures of proteins from interproton distance data by hybrid distance geometry-dynamical simulated annealing calculations. FEBS Lett., 229, 317–324.
18. Nilges, M., Gronenborn, A.M., Brünger, A.T. and Clore, G.M. (1988) Determination of three-dimensional structures of proteins by simulated annealing with interproton distance restraints. Application to crambin, potato carboxypeptidase inhibitor and barley serine proteinase inhibitor 2. Protein Eng., 2, 27–38.
19. Nilges, M., Clore, G.M. and Gronenborn, A.M. (1988) Determination of three-dimensional structures of proteins from interproton distance data by dynamical simulated annealing from a random array of atoms - circumventing problems associated with folding. FEBS Lett., 239, 129–136.
20. Brünger, A.T. (1992) X-PLOR, Version 3.1. A System for X-ray Crystallography and NMR, Yale University Press, New Haven, CT.
21. Brünger, A.T., Adams, P.D., Clore, G.M. et al. (1998) Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr. D, 54, 905–921.
22. Schwieters, C.D., Kuszewski, J.J., Tjandra, N. and Clore, G.M. (2003) The Xplor-NIH NMR molecular structure determination package. J. Magn. Reson., 160, 65–73.
23. Mazur, A.K. and Abagyan, R.A. (1989) New methodology for computer-aided modeling of biomolecular structure and dynamics 1. Non-cyclic structures. J. Biomol. Struct. Dyn., 6, 815–832.
24. Mazur, A.K., Dorofeev, V.E. and Abagyan, R.A. (1991) Derivation and testing of explicit equations of motion for polymers described by internal coordinates. J. Comput. Phys., 92, 261–272.
25. Bae, D.S. and Haug, E.J. (1987) A recursive formulation for constrained mechanical system dynamics. 1. Open loop-systems. Mech. Struct. Mach., 15, 359–382.
26. Jain, A., Vaidehi, N. and Rodriguez, G. (1993) A fast recursive algorithm for molecular dynamics simulation. J. Comput. Phys., 106, 258–268.
27. Stein, E.G., Rice, L.M. and Brünger, A.T. (1997) Torsion-angle molecular dynamics as a new efficient tool for NMR structure calculation. J. Magn. Reson., 124, 154–164.
28. Güntert, P., Mumenthaler, C. and Wüthrich, K. (1997) Torsion angle dynamics for NMR structure calculation with the new program DYANA. J. Mol. Biol., 273, 283–298.
29. Schwieters, C.D. and Clore, G.M. (2001) Internal coordinates for molecular dynamics and minimization in structure determination and refinement. J. Magn. Reson., 152, 288–302.
30. Williamson, M.P. and Craven, C.J. (2009) Automated protein structure calculation from NMR data. J. Biomol. NMR, 43, 131–143.
31. Linge, J.P., Habeck, M., Rieping, W. and Nilges, M. (2003) ARIA: automated NOE assignment and NMR structure calculation. Bioinformatics, 19, 315–316.
32. Rieping, W., Habeck, M., Bardiaux, B. et al. (2007) ARIA2: Automated NOE assignment and data integration in NMR structure calculation. Bioinformatics, 23, 381–382.
33. Huang, Y.J., Tejero, R., Powers, R. and Montelione, G.T. (2006) A topology-constrained distance network algorithm for protein structure determination from NOESY data. Proteins, 62, 587–603.
34. Güntert, P., Berndt, K.D. and Wüthrich, K. (1993) The program ASNO for computer-supported collection of NOE upper distance constraints as input for protein structure determination. J. Biomol. NMR, 3, 601–606.
35. Duggan, B.M., Legge, G.B., Dyson, H.J. and Wright, P.E. (2001) SANE (Structure assisted NOE evaluation): An automated model-based approach for NOE assignment. J. Biomol. NMR, 19, 321–329.
36. Mumenthaler, C. and Braun, W. (1995) Automated assignment of simulated and experimental NOESY spectra of proteins by feedback filtering and self-correcting distance geometry. J. Mol. Biol., 254, 465–480.
37. Mumenthaler, C., Güntert, P., Braun, W. and Wüthrich, K. (1997) Automated combined assignment of NOESY spectra and three-dimensional protein structure determination. J. Biomol. NMR, 10, 351–362.
38. Nilges, M. (1993) A calculation strategy for the structure determination of symmetrical dimers by 1H-NMR. Proteins, 17, 297–309.
39. Nilges, M. (1995) Calculation of protein structures with ambiguous distance restraints. Automated assignment of ambiguous NOE crosspeaks and disulfide connectivities. J. Mol. Biol., 245, 645–660.
40. Nilges, M., Macias, M.J., O’Donoghue, S.I. and Oschkinat, H. (1997) Automated NOESY interpretation with ambiguous distance restraints: The refined NMR solution structure of the pleckstrin homology domain from beta-spectrin. J. Mol. Biol., 269, 408–422.
41. Herrmann, T., Güntert, P. and Wüthrich, K. (2002) Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J. Mol. Biol., 319, 209–227.
42. Güntert, P. (2009) Automated structure determination from NMR spectra. Eur. Biophys. J., 38, 129–143.
43. Kuszewski, J., Schwieters, C.D., Garrett, D.S. et al. (2004) Completely automated, highly error-tolerant macromolecular structure determination from multidimensional nuclear Overhauser enhancement spectra and chemical shift assignments. J. Am. Chem. Soc., 126, 6258–6273.
44. Gronwald, W., Moussa, S., Elsner, R. et al. (2002) Automated assignment of NOESY NMR spectra using a knowledge based method (KNOWNOE). J. Biomol. NMR, 23, 271–287.
45. Hung, L.H. and Samudrala, R. (2006) An automated assignment-free Bayesian approach for accurately identifying proton contacts from NOESY data. J. Biomol. NMR, 36, 189–198.
46. Pfändler, P., Bodenhausen, G., Meier, B.U. and Ernst, R.R. (1985) Toward automated assignment of nuclear magnetic resonance spectra - pattern recognition in two-dimensional correlation spectra. Anal. Chem., 57, 2510–2516.
47. Koradi, R., Billeter, M., Engeli, M. et al. (1998) Automated peak picking and peak integration in macromolecular NMR spectra using AUTOPSY. J. Magn. Reson., 135, 288–297.
48. Johnson, B.A. (2004) Using NMRView to visualize and analyze the NMR spectra of macromolecules. Meth. Mol. Biol., 278, 313–352.
49. Garrett, D.S., Powers, R., Gronenborn, A.M. and Clore, G.M. (1991) A common sense approach to peak picking two-, three- and four-dimensional spectra using automatic computer analysis of contour diagrams. J. Magn. Reson., 95, 214–220.
50. Herrmann, T., Güntert, P. and Wüthrich, K. (2002) Protein NMR structure determination with automated NOE-identification in the NOESY spectra using the new software ATNOS. J. Biomol. NMR, 24, 171–189.
51. Antz, C., Neidig, K.P. and Kalbitzer, H.R. (1995) A general Bayesian method for an automated signal class recognition in 2D NMR spectra combined with a multivariate discriminant analysis. J. Biomol. NMR, 5, 287–296.
52. Kleywegt, G.J., Boelens, R. and Kaptein, R. (1990) A versatile approach toward the partially automatic recognition of cross peaks in 2D 1H NMR spectra. J. Magn. Reson., 88, 601–608.
53. Rouh, A., Louisjoseph, A. and Lallemand, J.Y. (1994) Bayesian signal extraction from noisy FT NMR spectra. J. Biomol. NMR, 4, 505–518.
188
Protein NMR Spectroscopy
54. Dancea, F. and G€unther, U. (2005) Automated protein NMR structure determination using wavelet de-noised NOESY spectra. J. Biomol. NMR, 33, 139–152. 55. Huang, Y.P.J., Moseley, H.N.B., Baran, M.C. et al. (2005) An integrated platform for automated analysis of protein NMR structures. Methods Enzymol., 394, 111–141. 56. Moseley, H.N.B., Riaz, N., Aramini, J.M. et al. (2004) A generalized approach to automated NMR peak list editing: application to reduced dimensionality triple resonance spectra. J. Magn. Reson., 170, 263–277. 57. Corne, S.A., Johnson, A.P. and Fisher, J. (1992) An artificial neural network for classifying cross peaks in two-dimensional NMR spectra. J. Magn. Reson., 100, 256–266. 58. Carrara, E.A., Pagliari, F. and Nicolini, C. (1993) Neural networks for the peak picking of nuclear magnetic resonance spectra. Neural Networks, 6, 1023–1032. 59. Neidig, K.P., Saffrich, R., Lorenz, M. and Kalbitzer, H.R. (1990) Cluster analysis and multiplet pattern recognition in two-dimensional NMR spectra. J. Magn. Reson., 89, 543–552. 60. Meier, B.U., Bodenhausen, G. and Ernst, R.R. (1984) Pattern recognition in two-dimensional NMR spectra. J. Magn. Reson., 60, 161–163. 61. Moseley, H.N.B. and Montelione, G.T. (1999) Automated analysis of NMR assignments and structures for proteins. Curr. Opin. Struct. Biol., 9, 635–642. 62. Gronwald, W. and Kalbitzer, H.R. (2004) Automated structure determination of proteins by NMR spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc., 44, 33–96. 63. Baran, M.C., Huang, Y.J., Moseley, H.N.B. and Montelione, G.T. (2004) Automated analysis of protein NMR assignments and structures. Chem. Rev., 104, 3541–3555. 64. Altieri, A.S. and Byrd, R.A. (2004) Automation of NMR structure determination of proteins. Curr. Opin. Struct. Biol., 14, 547–553. 65. Volk, J., Herrmann, T. and W€uthrich, K. (2008) Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH. J. Biomol. NMR, 41, 127–138. 66. 
Kamisetty, H., Bailey-Kellogg, C. and Pandurangan, G. (2006) An efficient randomized algorithm for contact-based NMR backbone resonance assignment. Bioinformatics, 22, 172–180. 67. Atreya, H.S., Chary, K.V.R. and Govil, G. (2002) Automated NMR assignments of proteins for high throughput structure determination: TATAPRO II. Curr. Sci., 83, 1372–1376. 68. Friedrichs, M.S., Mueller, L. and Wittekind, M. (1994) An automated procedure for the assignment of protein 1 HN, 15 N, 13 Ca , 1 Ha , 13 Cb and 1 Hb resonances. J. Biomol. NMR, 4, 703–726. 69. Hare, B.J. and Prestegard, J.H. (1994) Application of neural networks to automated assignment of NMR spectra of proteins. J. Biomol. NMR, 4, 35–46. 70. Olson, J.B. and Markley, J.L. (1994) Evaluation of an algorithm for the automated sequential assignment of protein backbone resonances: A demonstration of the connectivity tracing assignment tools (CONTRAST) software package. J. Biomol. NMR, 4, 385–410. 71. Buchler, N.E.G., Zuiderweg, E.R.P., Wang, H. and Goldstein, R.A. (1997) Protein heteronuclear NMR assignments using mean-field simulated annealing. J. Magn. Reson., 125, 34–42. 72. Li, K.B. and Sanctuary, B.C. (1997) Automated resonance assignment of proteins using heteronuclear 3D NMR. 1. Backbone spin systems extraction and creation of polypeptides. J. Chem. Inf. Comput. Sci., 37, 359–366. 73. Lukin, J.A., Gove, A.P., Talukdar, S.N. and Ho, C. (1997) Automated probabilistic method for assigning backbone resonances of (C-13,N-15)-labeled proteins. J. Biomol. NMR, 9, 151–166. 74. Zimmerman, D.E., Kulikowski, C.A., Huang, Y.P. et al. (1997) Automated analysis of protein NMR assignments using methods from artificial intelligence. J. Mol. Biol., 269, 592–610. 75. Leutner, M., Gschwind, R.M., Liermann, J. et al. (1998) Automated backbone assignment of labeled proteins using the threshold accepting algorithm. J. Biomol. NMR, 11, 31–43. 76. Atreya, H.S., Sahu, S.C., Chary, K.V.R. and Govil, G. 
(2000) A tracked approach for automated NMR assignments in proteins (TATAPRO). J. Biomol. NMR, 17, 125–136. 77. Bailey-Kellogg, C., Widge, A., Kelley, J.J. et al. (2000) The NOESY JIGSAW: Automated protein secondary structure and main-chain assignment from sparse, unassigned NMR data. J. Comput. Biol., 7, 537–558.
Calculation of Structures from NMR Restraints
189
78. Bailey-Kellogg, C., Chainraj, S. and Pandurangan, G. (2005) A random graph approach to NMR sequential assignment. J. Comput. Biol., 12, 569–583. 79. G€untert, P., Salzmann, M., Braun, D. and W€uthrich, K. (2000) Sequence-specific NMR assignment of proteins by global fragment mapping with the program MAPPER. J. Biomol. NMR, 18, 129–137. 80. Bhavesh, N.S., Panchal, S.C. and Hosur, R.V. (2001) An efficient high-throughput resonance assignment procedure for structural genomics and protein folding research by NMR. Biochemistry, 40, 14727–14735. 81. Moseley, H.N.B., Monleon, D. and Montelione, G.T. (2001) Automatic determination of protein backbone resonance assignments from triple resonance nuclear magnetic resonance data. Method Enzymol., 339, 91–108. 82. Andrec, M. and Levy, R.M. (2002) Protein sequential resonance assignments by combinatorial enumeration using 13 Ca chemical shifts and their (i, i1) sequential connectivities. J. Biomol. NMR, 23, 263–270. 83. Chatterjee, A., Bhavesh, N.S., Panchal, S.C. and Hosur, R.V. (2002) A novel protocol based on HN(C)N for rapid resonance assignment in (15 N, 13 C) labeled proteins: implications to structural genomics. Biochem. Biophys. Res. Commun., 293, 427–432. 84. Coggins, B.E. and Zhou, P. (2003) PACES: Protein sequential assignment by computer-assisted exhaustive search. J. Biomol. NMR, 26, 93–111. 85. Bernstein, R., Cieslar, C., Ross, A. et al. (1993) Computer-assisted assignment of multidimensional NMR spectra of proteins - application to 3D NOESY-HMQC and TOCSY-HMQC Spectra. J. Biomol. NMR, 3, 245–251. 86. Chen, Z.Z., Lin, G.H., Rizzi, R. et al. (2005) More reliable protein NMR peak assignment via improved 2-interval scheduling. J. Comput. Biol., 12, 129–146. 87. Kjaer, M., Andersen, K.V. and Poulsen, F.M. (1994) Automated and semiautomated analysis of homonuclear and heteronuclear multidimensional nuclear magnetic resonance spectra of proteins - the program PRONTO. Methods Enzymol., 239, 288–307. 88. 
Lin, H.N., Wu, K.P., Chang, J.M. et al. (2005) GANA - a genetic algorithm for NMR backbone resonance assignment. Nucleic Acids Res., 33, 4593–4601. 89. Masse, J.E. and Keller, R. (2005) AutoLink: Automated sequential resonance assignment of biopolymers from NMR data by relative-hypothesis-prioritization-based simulated logic. J. Magn. Reson., 174, 133–151. 90. Vitek, O., Bailey-Kellogg, C., Craig, B. et al. (2005) Reconsidering complete search algorithms for protein backbone NMR assignment. Bioinformatics, 21, 230–236. 91. Vitek, O., Bailey-Kellogg, C., Craig, B. and Vitek, J. (2006) Inferential backbone assignment for sparse data. J. Biomol. NMR, 35, 187–208. 92. Wang, J.Y., Wang, T.Z., Zuiderweg, E.R.P. and Crippen, G.M. (2005) CASA: An efficient automated assignment of protein mainchain NMR data using an ordered tree search algorithm. J. Biomol. NMR, 33, 261–279. 93. Wu, K.P., Chang, J.M., Chen, J.B. et al. (2006) RIBRA - An error-tolerant algorithm for the NMR backbone assignment problem. J. Comput. Biol., 13, 229–244. 94. Xu, Y., Xu, D., Kim, D. et al. (2002) Automated assignment of backbone NMR peaks using constrained bipartite matching. Comput. Sci. Eng., 4, 50–62. 95. Xu, Y.Z., Wang, X.X., Yang, J. et al. (2006) PASA - A program for automated protein NMR backbone signal assignment by pattern-filtering approach. J. Biomol. NMR, 34, 41–56. 96. Eghbalnia, H.R., Bahrami, A., Wang, L.Y. et al. (2005) Probabilistic identification of spin systems and their assignments including coil-helix inference as output (PISTACHIO). J. Biomol. NMR, 32, 219–233. 97. Masse, J.E., Keller, R. and Pervushin, K. (2006) SideLink: Automated side-chain assignment of biopolymers from NMR data by relative-hypothesis-prioritization-based simulated logic. J. Magn. Reson., 181, 45–67. 98. Xu, J., Straus, S.K., Sanctuary, B.C. and Trimble, L. (1993) Automation of protein 2D proton NMR assignment by means of fuzzy mathematics and graph theory. J. Chem. Inf. Comput. Sci., 33, 668–682.
190
Protein NMR Spectroscopy
99. Xu, J., Straus, S.K., Sanctuary, B.C. and Trimble, L. (1994) Use of fuzzy mathematics for complete automated assignment of peptide 1 H 2D NMR spectra. J. Magn. Reson. B, 103, 53–58. 100. Bartels, C., Billeter, M., G€untert, P. and W€uthrich, K. (1996) Automated sequence-specific NMR assignment of homologous proteins using the program GARANT. J. Biomol. NMR, 7, 207–213. 101. Bartels, C., G€untert, P., Billeter, M. and W€uthrich, K. (1997) GARANT- A general algorithm for resonance assignment of multidimensional nuclear magnetic resonance spectra. J. Comput. Chem., 18, 139–149. 102. Choy, W.Y., Sanctuary, B.C. and Zhu, G. (1997) Using neural network predicted secondary structure information in automatic protein NMR assignment. J. Chem. Inf. Comput. Sci., 37, 1086–1094. 103. Croft, D., Kemmink, J., Neidig, K.P. and Oschkinat, H. (1997) Tools for the automated assignment of high-resolution three-dimensional protein NMR spectra based on pattern recognition techniques. J. Biomol. NMR, 10, 207–219. 104. Li, K.B. and Sanctuary, B.C. (1997) Automated resonance assignment of proteins using heteronuclear 3D NMR. 2. Side chain and sequence-specific assignment. J. Chem. Inf. Comput. Sci., 37, 467–477. 105. Gronwald, W., Willard, L., Jellard, T. et al. (1998) CAMRA: Chemical shift based computer aided protein NMR assignments. J. Biomol. NMR, 12, 395–405. 106. Pristovsˇek, P., R€uterjans, H. and Jerala, R. (2002) Semiautomatic sequence-specific assignment of proteins based on the tertiary structure - the program st2nmr. J. Comput. Chem., 23, 335–340. 107. Hitchens, T.K., Lukin, J.A., Zhan, Y.P. et al. (2003) MONTE: An automated Monte Carlo based approach to nuclear magnetic resonance assignment of proteins. J. Biomol. NMR, 25, 1–9. 108. Lo´pez-Mendez, B. and G€untert, P. (2006) Automated protein structure determination from NMR spectra. J. Am. Chem. Soc., 128, 13112–13122. 109. W€uthrich, K. (1986) NMR of Proteins and Nucleic Acids, John Wiley & Sons, Inc., New York. 110. 
Malliavin, T.E., Rouh, A., Delsuc, M.A. and Lallemand, J.Y. (1992) Approche directe de la determination de structures moleculaires a partir de l’effet Overhauser nucleaire. C. R. Acad. Sci. II, 315, 653–659. 111. Oshiro, C.M. and Kuntz, I.D. (1993) Application of distance geometry to the proton assignment problem. Biopolymers, 33, 107–115. 112. Kraulis, P.J. (1994) Protein three-dimensional structure determination and sequence-specific assignment of 13 C-separated and 15 N-separated NOE data - a novel real-space ab-initio approach. J. Mol. Biol., 243, 696–718. 113. Atkinson, R.A. and Saudek, V. (1997) Direct fitting of structure and chemical shift to NMR spectra. J. Chem. Soc. Faraday T, 93, 3319–3323. 114. Atkinson, R.A. and Saudek, V. (2002) The direct determination of protein structure by NMR without assignment. FEBS Lett., 510, 1–4. 115. Grishaev, A. and Llinas, M. (2002) CLOUDS, a protocol for deriving a molecular proton density via NMR. Proc. Natl. Acad. Sci. USA, 99, 6707–6712. 116. Grishaev, A. and Llinas, M. (2002) Protein structure elucidation from NMR proton densities. Proc. Natl. Acad. Sci. USA, 99, 6713–6718. 117. Prestegard, J.H., Mayer, K.L., Valafar, H. and Benison, G.C. (2005) Determination of protein backbone structures from residual dipolar couplings. Methods Enzymol., 394, 175–209. 118. Delaglio, F., Kontaxis, G. and Bax, A. (2000) Protein structure determination using molecular fragment replacement and NMR dipolar couplings. J. Am. Chem. Soc., 122, 2142–2143. 119. Rohl, C.A. and Baker, D. (2002) De novo determination of protein backbone structure from residual dipolar couplings using Rosetta. J. Am. Chem. Soc., 124, 2723–2729. 120. Jung, Y.S., Sharma, M. and Zweckstetter, M. (2004) Simultaneous assignment and structure determination of protein backbones by using NMR dipolar couplings. Angew. Chem. Int. Edit., 43, 3479–3481. 121. Meiler, J. and Baker, D. (2003) Rapid protein fold determination using unassigned NMR data. Proc. Natl. Acad. Sci. 
U.S.A., 100, 15404–15409. 122. Cavalli, A., Salvatella, X., Dobson, C.M. and Vendruscolo, M. (2007) Protein structure determination from NMR chemical shifts. Proc. Natl. Acad. Sci. U.S.A., 104, 9615–9620.
Calculation of Structures from NMR Restraints
191
123. Shen, Y., Lange, O., Delaglio, F. et al. (2008). Consistent blind protein structure generation from NMR chemical shift data. Proc. Natl. Acad. Sci. U.S.A., 105, 4685–4690. 124. Nilges, M. and O’Donoghue, S.I. (1998) Ambiguous NOEs and automated NOE assignment. Prog. Nucl. Magn. Reson. Spectrosc., 32, 107–139. 125. Allen, M.P. and Tildesley, D.J. (1987) Computer Simulation of Liquids, Clarendon Press, Oxford. 126. Berendsen, H.J.C., Postma, J.P.M., van Gunsteren, W.F. et al. (1984) Molecular dynamics with coupling to an external bath. J. Chem. Phys., 81, 3684–3690. 127. G€untert, P. (2003) Automated NMR protein structure calculation. Prog. Nucl. Magn. Reson. Spectrosc., 43, 105–125. 128. Meadows, R.P., Olejniczak, E.T. and Fesik, S.W. (1994) A computer-based protocol for semiautomated assignments and 3D structure determination of proteins. J. Biomol. NMR, 4, 79–96. 129. Habeck, M., Rieping, W., Linge, J.P. and Nilges, M. (2004) NOE assignment with ARIA 2.0: the nuts and bolts. Meth. Mol. Biol., 278, 379–402. 130. G€untert, P. (2004) Automated NMR structure calculation with CYANA. Meth. Mol. Biol., 278, 353–378. 131. Jee, J. and G€untert, P. (2003) Influence of the completeness of chemical shift assignments on NMR structures obtained with automated NOE assignment. J. Struct. Funct. Genom., 4, 179–189. 132. Simons, K.T., Kooperberg, C., Huang, E. and Baker, D. (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol., 268, 209–225. 133. Cornilescu, G., Delaglio, F. and Bax, A. (1999) Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J. Biomol. NMR, 13, 289–302. 134. Wishart, D.S. and Sykes, B.D. (1994) The 13 C chemical-shift index: a simple method for the identification of protein secondary structure using 13 C chemical-shift data. J. Biomol. NMR, 4, 171–180. 135. Seavey, B.R., Farr, E.A., Westler, W.M. 
and Markley, J.L. (1991) A relational database for sequence-specific protein NMR data. J. Biomol. NMR, 1, 217–236. 136. Shen, Y. and Bax, A. (2007) Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology. J. Biomol. NMR, 38, 289–302. 137. Bradley, P., Misura, K.M. and Baker, D. (2005) Toward high-resolution de novo structure prediction for small proteins. Science, 309, 1868–1871. 138. Malmodin, D., Papavoine, C.H.M. and Billeter, M. (2003) Fully automated sequence-specific resonance assignments of heteronuclear protein spectra. J. Biomol. NMR, 27, 69–79. 139. Cornell, W.D., Cieplak, P., Bayly, C.I. et al. (1995) A second generation force field for the simulation of proteins, nucleic acids and organic molecules. J. Am. Chem. Soc., 117, 5179–5197. 140. Luginb€uhl, P., G€untert, P., Billeter, M. and W€uthrich, K. (1996) The new program OPAL for molecular dynamics simulations and energy refinements of biological macromolecules. J. Biomol. NMR, 8, 136–146. 141. Koradi, R., Billeter, M. and G€untert, P. (2000) Point-centered domain decomposition for parallel molecular dynamics simulation. Comput. Phys. Commun., 124, 139–147. 142. Scott, A., Lo´pez-Mendez, B. and G€untert, P. (2006) Fully automated structure determinations of the Fes SH2 domain using different sets of NMR spectra. Magn. Reson. Chem., 44, S83–S88. 143. Scott, A., Pantoja-Uceda, D., Koshiba, S. et al. (2005) Solution structure of the Src homology 2 domain from the human feline sarcoma oncogene Fes. J. Biomol. NMR, 31, 357–361. 144. Kainosho, M., Torizawa, T., Iwashita, Y. et al. (2006) Optimal isotope labelling for NMR protein structure determinations. Nature, 440, 52–57. 145. Takeda, M., Ikeya, T., G€untert, P. and Kainosho, M. (2007) Automated structure determination of proteins with the SAIL-FLYA NMR method. Nature Protocols, 2, 2896–2902.
192
Protein NMR Spectroscopy
146. Ikeya, T., Takeda, M., Yoshida, H. et al. (2009) Automated NMR structure determination of stereo-array isotope labeled ubiquitin from minimal sets of spectra using the SAIL-FLYA system. J. Biomol. NMR, 44, 261–272. 147. Lo´pez-Mendez, B., Pantoja-Uceda, D., Tomizawa, T. et al. (2004) Letter to the Editor: NMR assignment of the hypothetical ENTH-VHS domain At3g16270 from Arabidopsis thaliana. J. Biomol. NMR, 29, 205–206. 148. Pantoja-Uceda, D., Lo´pez-Mendez, B., Koshiba, S. et al. (2005) Solution structure of the rhodanese homology domain At4g01050(175–295) from Arabidopsis thaliana. Protein Sci., 14, 224–230.
6 Paramagnetic Tools in Protein NMR
Peter H.J. Keizers and Marcellus Ubbink
6.1 Introduction
Unpaired electrons have a strong magnetic moment and consequently affect nearby nuclear spins. The nuclear resonances can be broadened or shifted, and these paramagnetic effects depend on distance and orientation in a well-understood fashion. Stable unpaired electrons are found on metals and protected radicals, but this does not mean that the observation of paramagnetic effects is limited to metalloproteins. Such effects can also be generated in other proteins by introducing a paramagnetic centre, for example by attaching a paramagnetic tag or by substituting a diamagnetic metal, like Ca²⁺, with a paramagnetic one, like a lanthanide. Paramagnetic effects offer distance restraints up to 60 Å, a way to cause partial alignment for the generation of RDCs without the need for external media, and the possibility to study protein dynamics and to visualise minor populations. Thus, paramagnetic effects are remarkably powerful and complement more classical restraints, like the NOE. This chapter aims to give the uninitiated reader sufficient knowledge of paramagnetic NMR tools to select the method of choice for the particular problem at hand. After a discussion of the types of restraints, the choice of metals and the available paramagnetic tags, practical hints are given in a protocol, followed by several examples that illustrate the current possibilities. The chapter is not meant as a theoretical description of paramagnetism, for which the reader is referred to other sources [1–3].
Protein NMR Spectroscopy: Practical Techniques and Applications, First Edition. Edited by Lu-Yun Lian and Gordon Roberts. 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd.
6.2 Types of Restraints
Paramagnetic centres are characterised by two properties that determine the type of restraints they generate. The first is the electronic relaxation time, $\tau_S$, the longitudinal relaxation time of the unpaired electron, which is much shorter than for nuclei and ranges from microseconds down to picoseconds at ambient temperature. Centres with slow electronic relaxation (ns–μs) cause strong nuclear relaxation, and thus line broadening. This is called paramagnetic relaxation enhancement (PRE). The effect of centres with small $\tau_S$ (ps) on nuclear relaxation is much smaller. The second is the anisotropy of the paramagnetic effect, described by the magnetic susceptibility tensor ($\chi$-tensor), which causes shifts in the resonance positions (pseudocontact shifts, PCSs), as well as partial alignment in strong magnetic fields. As a rule of thumb, centres that cause much line broadening are not very anisotropic and thus cause negligible shifts, whereas for highly anisotropic centres, line-broadening effects are limited to a small sphere around the metal.

6.2.1 Paramagnetic Dipolar Relaxation Enhancement
Unpaired electrons may enhance nuclear relaxation rates of nearby nuclei because of the dipolar interaction between the unpaired electron and the nucleus. The fluctuating field at the nucleus that produces dipolar relaxation is caused by flipping of the electron spin due to its fast relaxation as well as by tumbling of the molecule in the magnetic field. Therefore, the relevant correlation time, $\tau_c$, is determined by $\tau_S$ and $\tau_r$, the rotational correlation time of the molecule, according to $\tau_c^{-1} = \tau_S^{-1} + \tau_r^{-1}$. When studying proteins ($\tau_r > 5$ ns), $\tau_c$ is often dominated by $\tau_S$, but this is not necessarily the case when the paramagnetic centre is a metal with slow electronic relaxation or a stable radical, in which case $\tau_r < \tau_S$. Both longitudinal and transverse relaxation rates are affected by the presence of a paramagnet. The relation between the longitudinal relaxation rate enhancement ($R_1^{\mathrm{para}}$) and the distance from nucleus to paramagnet was described by Solomon, and is shown in its simplified form in Equation 6.1, for the case where $\tau_S$ dominates $\tau_c$ [4].

$$R_1^{\mathrm{para}} = \frac{2}{15}\left(\frac{\mu_0}{4\pi}\right)^2 \frac{\gamma_I^2\, g_e^2\, \mu_B^2\, S(S+1)}{r_{IM}^6}\left(\frac{7\tau_S}{1+\omega_S^2\tau_S^2} + \frac{3\tau_S}{1+\omega_I^2\tau_S^2}\right) \qquad (6.1)$$

The meaning of the symbols and the physical constants can be found in Table 6.1. The range of distances for which significant effects are observed depends on the type of nucleus ($\gamma_I$) and on the number of unpaired electrons in the paramagnet, which determines the quantum number S. The squared $\gamma_I$ in Equations 6.1 and 6.2 implies that, of the common nuclei, ¹H is by far the most sensitive to PRE. The effects on ¹³C and ¹⁵N are 16 and 100 times smaller, respectively. For instance, distances of up to 16.7 Å have been reported from the haem iron atom to distal protons of the substrate 12-Br-laurate bound to cytochrome P450 BM3, based on the T1 relaxation enhancement caused by the active-site haem iron atom [5].
Although in principle R1 relaxation rates could be used to estimate distances from metals to protein nuclei, this area has not been explored much due to experimental difficulties in retrieving the paramagnetic component in the relaxation rate and the interference of cross relaxation [6]. Nevertheless, the structure of a ferredoxin was refined using T1 based distance restraints [7], after careful analysis of the data.
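Equation 6.1 and the relation for the correlation time are easy to evaluate numerically. The following Python sketch is an illustration only: the function names, the example centre (S = 5/2, $\tau_S$ = 0.1 ns, 14.1 T) and the rounded gyromagnetic ratios are assumptions made here, not values taken from the text, and the electron Larmor frequency is simply computed from the free-electron g-factor.

```python
MU0_OVER_4PI = 1.0e-7        # vacuum permeability / 4 pi, T m A^-1
MU_B = 9.2740100783e-24      # Bohr magneton, J T^-1
HBAR = 1.054571817e-34       # reduced Planck constant, J s
G_E = 2.0023                 # free-electron g-factor

# Magnitudes of the gyromagnetic ratios in rad s^-1 T^-1 (rounded values)
GAMMA = {"1H": 2.6752e8, "13C": 6.7283e7, "15N": 2.7116e7}


def correlation_time(tau_s, tau_r):
    """Effective correlation time from 1/tau_c = 1/tau_S + 1/tau_r."""
    return 1.0 / (1.0 / tau_s + 1.0 / tau_r)


def r1_para(r, tau_s, spin, b0, nucleus="1H"):
    """Simplified Solomon expression (Equation 6.1), for tau_S dominating tau_c.

    r in metres, tau_s in seconds, b0 in tesla; returns a rate in s^-1.
    """
    gamma = GAMMA[nucleus]
    omega_i = gamma * b0                  # nuclear Larmor frequency, rad/s
    omega_s = G_E * MU_B * b0 / HBAR      # electron Larmor frequency, rad/s
    prefactor = ((2.0 / 15.0) * MU0_OVER_4PI**2 * gamma**2
                 * G_E**2 * MU_B**2 * spin * (spin + 1.0))
    spectral = (7.0 * tau_s / (1.0 + (omega_s * tau_s) ** 2)
                + 3.0 * tau_s / (1.0 + (omega_i * tau_s) ** 2))
    return prefactor * spectral / r**6


# A fast-relaxing centre (10 ps) on a 10 ns protein: tau_c is essentially tau_S
tau_c = correlation_time(tau_s=10e-12, tau_r=10e-9)

# Relative PRE sensitivity from the squared gyromagnetic ratios
ratio_13c = (GAMMA["1H"] / GAMMA["13C"]) ** 2   # roughly 16
ratio_15n = (GAMMA["1H"] / GAMMA["15N"]) ** 2   # roughly 97, i.e. about 100

# Doubling the distance reduces the enhancement 64-fold (sixth power)
near = r1_para(10e-10, tau_s=1e-10, spin=2.5, b0=14.1)
far = r1_para(20e-10, tau_s=1e-10, spin=2.5, b0=14.1)
```

The two ratios reproduce the factors of 16 and 100 quoted above, and the ratio `near / far` shows the sixth-power distance dependence directly.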
Table 6.1 Symbols and physical constants used

Symbol                                  Quantity
$\tau_S$                                Electronic relaxation time
$\tau_c$                                Correlation time
$\tau_r$                                Rotational correlation time
$R_1^{\mathrm{para}}$                   Paramagnetic contribution to the longitudinal relaxation rate
$R_2^{\mathrm{para}}$                   Paramagnetic contribution to the transverse relaxation rate
$R_2^{\mathrm{dia}}$                    Diamagnetic contribution to the transverse relaxation rate
$\pi$                                   Pi
$\mu_B$                                 Bohr magneton
$\gamma_I$                              Gyromagnetic ratio of nucleus I
$g_e$                                   Electronic g-factor
$S$                                     Spin quantum number
$J$                                     Quantum number used for lanthanides
$r_{IM}$                                Nucleus to metal distance
$I^{\mathrm{para}}, I^{\mathrm{dia}}$   Resonance intensity of paramagnetic, diamagnetic sample
$t$                                     Time during which transverse relaxation was active
$t_a$                                   Delay time a
$D^{\mathrm{res}}$                      Residual dipolar coupling
$\delta_C$                              Fermi contact shift
$\delta_{PC}$                           Pseudocontact shift
$r_{ij}$                                i-j internuclear distance
$\omega_I$                              Larmor frequency of nucleus I
$B_0$                                   Magnetic field
$k$                                     Boltzmann constant
$T$                                     Absolute temperature
$h$                                     Planck's constant
$\Delta\chi_{ax}$                       Axial component of the magnetic susceptibility tensor
$\Delta\chi_{rh}$                       Rhombic component of the magnetic susceptibility tensor
The paramagnetically enhanced transverse nuclear relaxation rate, $R_2^{\mathrm{para}}$, is also related to the distance between paramagnet and nucleus, and is therefore a direct means of retrieving structural information from NMR data. The relation between the nucleus–unpaired electron distance ($r_{IM}$) and the relaxation rate enhancement ($R_2^{\mathrm{para}}$) is shown in its simplified form in Equation 6.2.

$$R_2^{\mathrm{para}} = \frac{1}{15}\left(\frac{\mu_0}{4\pi}\right)^2 \frac{\gamma_I^2\, g_e^2\, \mu_B^2\, S(S+1)}{r_{IM}^6}\left(4\tau_c + \frac{3\tau_c}{1+\omega_I^2\tau_c^2}\right) \qquad (6.2)$$

Note that Equations 6.1 and 6.2 indicate that the PRE drops off with the sixth power of the distance, yielding a limited useful distance range. On the other hand, it offers the possibility to study minor species of molecular conformation in solution, as is illustrated in Section 6.5. Also note that Equation 6.2 uses $\tau_c$, for which an accurate value is not easily determined. Fortunately, an accurate $\tau_c$ is usually not required, because $R_2^{\mathrm{para}}$ depends on the sixth power of the distance. If $\tau_r$ dominates, the protein rotational correlation time can be used, but this may not always be valid, for example when using a nitroxide radical tag on the protein surface, since such a tag will have mobility independent of the protein as a whole. Sometimes it is possible to fit $\tau_c$, as was done for the data shown at the end of this section.
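The remark that an accurate correlation time is usually not needed can be checked with a small calculation. The sketch below evaluates Equation 6.2 for an amide proton, inverts it to obtain a distance, and then repeats the inversion with a correlation time that is wrong by a factor of two. All numerical inputs (a Gd³⁺-like centre with S = 7/2, a correlation time of 3.3 ns, a 600 MHz ¹H frequency and a 25 Å distance) and all function names are hypothetical choices for illustration.

```python
import math

MU0_OVER_4PI = 1.0e-7      # vacuum permeability / 4 pi, T m A^-1
MU_B = 9.2740100783e-24    # Bohr magneton, J T^-1
G_E = 2.0023               # free-electron g-factor
GAMMA_H = 2.6752e8         # 1H gyromagnetic ratio, rad s^-1 T^-1


def pre_constant(tau_c, spin, omega_i):
    """Distance-independent part of Equation 6.2, so that R2para = K / r**6."""
    prefactor = ((1.0 / 15.0) * MU0_OVER_4PI**2 * GAMMA_H**2
                 * G_E**2 * MU_B**2 * spin * (spin + 1.0))
    spectral = 4.0 * tau_c + 3.0 * tau_c / (1.0 + (omega_i * tau_c) ** 2)
    return prefactor * spectral


def r2_para(r, tau_c, spin, omega_i):
    """Transverse PRE (s^-1) at distance r (m), Equation 6.2."""
    return pre_constant(tau_c, spin, omega_i) / r**6


def distance_from_r2(r2, tau_c, spin, omega_i):
    """Invert Equation 6.2 to obtain the metal-to-nucleus distance (m)."""
    return (pre_constant(tau_c, spin, omega_i) / r2) ** (1.0 / 6.0)


omega_h = 2.0 * math.pi * 600e6   # 1H Larmor frequency, rad/s
spin_gd = 3.5                     # S = 7/2
tau_c = 3.3e-9                    # hypothetical correlation time, s

rate = r2_para(25e-10, tau_c, spin_gd, omega_h)
r_true = distance_from_r2(rate, tau_c, spin_gd, omega_h)

# Same rate, but interpreted with a correlation time that is twice too long
r_wrong_tau = distance_from_r2(rate, 2.0 * tau_c, spin_gd, omega_h)
rel_err = r_wrong_tau / r_true - 1.0   # roughly 12 %
```

Under these assumptions a twofold error in the correlation time changes the derived distance by only about one eighth, which is why loose restraints with generous error margins remain useful.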
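$R_2^{\mathrm{para}}$ can be retrieved from the broadening of the resonance in a paramagnetic sample by comparison to its diamagnetic counterpart, according to Equation 6.3 [8].

$$\frac{I^{\mathrm{para}}}{I^{\mathrm{dia}}} = \frac{R_2^{\mathrm{dia}}\exp\left(-R_2^{\mathrm{para}}\,t\right)}{R_2^{\mathrm{dia}} + R_2^{\mathrm{para}}} \qquad (6.3)$$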
where $I^{\mathrm{para}}$ and $I^{\mathrm{dia}}$ are the peak heights under paramagnetic and diamagnetic conditions, respectively, and t is the time during which transverse relaxation was active in the pulse programme, that is, the INEPT time in an HSQC experiment. A disadvantage of this approach is that significant errors can be introduced when the paramagnetic sample is contaminated with a considerable amount of diamagnetic sample (>5%). Furthermore, peak fitting is required to obtain $R_2^{\mathrm{dia}}$. Therefore, Iwahara et al. proposed another method, based on the measurement of the resonance intensities using two delay times during the INEPT of the pulse sequence [9]. Without requiring any fitting procedures, the paramagnetic contribution to the relaxation is obtained from Equation 6.4.

$$R_2^{\mathrm{para}} = \frac{1}{t_b - t_a}\,\ln\frac{I^{\mathrm{dia}}(t_b)\,I^{\mathrm{para}}(t_a)}{I^{\mathrm{dia}}(t_a)\,I^{\mathrm{para}}(t_b)} \qquad (6.4)$$
where $t_a$ and $t_b$ are the two delay times. The benefit of this method is that no long repetition delays are required for signal recovery and that it is better able to deal with diamagnetic impurities. However, the required experimental time is much greater, due to the doubling of the number of spectra and, more importantly, due to the relaxation occurring during the longer delay time. We found the one-point method to produce reliable distance restraints between 20 and 30 Å for a Gd³⁺ probe [10], with a precision of 3 Å, see Figure 6.1. The range of distances that can be studied depends on the relaxation agent. Mn²⁺-EDTA was found to have significant effects over a range of 16–25 Å [11], rather similar to a nitroxide, which affected nuclei 12–22 Å away [8]. In contrast to NOEs, restraints can also be defined for the remaining nuclei. For those whose resonances broaden beyond detection an upper bound restraint can be given and, likewise, a lower bound restraint is given to unaffected resonances, belonging to nuclei that are too far from the paramagnetic centre. Given the above limitations it is important to generate restraints with sufficiently large error margins. As with NOEs, it is better to have many loose restraints than a few precise ones.

Figure 6.1 Correlation of the calculated metal-to-amide proton distances of pseudoazurin with the tag CLaNP-3-Gd attached, plotted versus the experimental PRE-based distances. The error margins, with 3 Å boundaries, are marked by the dashed lines. The arrows pointing down and up emphasise that for some residues only an upper or a lower experimental distance bound could be determined, at 20 and 30 Å, because the resonances are broadened beyond detection or are not affected, respectively. Reproduced with permission from Vlasie et al. [10], copyright Wiley-VCH Verlag GmbH & Co. KGaA

6.2.2 Other Types of Relaxation
The net magnetisation caused by the Zeeman splitting of the unpaired electron has a dipolar interaction with nearby nuclei. The fluctuations of this dipolar interaction cause Curie spin relaxation, which is manifested as a contribution to T2 relaxation. This type of relaxation depends only on $\tau_r$ and not on $\tau_S$, because it is caused by the time-averaged component of the electron spin [3].

$$R_2^{\mathrm{para}} = \frac{1}{5}\left(\frac{\mu_0}{4\pi}\right)^2 \frac{\omega_I^2\, g_e^4\, \mu_B^4\, S^2(S+1)^2}{(3kT)^2\, r_{IM}^6}\left(4\tau_r + \frac{3\tau_r}{1+\omega_I^2\tau_r^2}\right) \qquad (6.5)$$
Curie relaxation can be the dominant paramagnetic relaxation for lanthanides with short $\tau_S$ (so not Gd³⁺) and in large proteins (large $\tau_r$) at high fields [12], as is illustrated in Figure 6.2. The use of Curie spin relaxation as a source of structural restraints is not widespread, although an example is given by Banci et al. [13] and other references can be obtained from the review by Arnesano et al. [14]. Cross-correlation occurs between Curie relaxation and regular dipolar relaxation, affecting the linewidth difference between the components of a multiplet. For example, the linewidths of the two components of the resonance of a ¹H attached to a ¹⁵N in the neighbourhood of a paramagnetic centre will differ from those of the diamagnetic control [15]. The size of this effect depends on the distance between the paramagnet and the ¹H, as well as on the angle between the metal–¹H and the ¹H–¹⁵N vectors [16]. Thus, paramagnetic cross-correlation can yield distance and angle information [17]. Cross-correlated relaxation may interfere with the observation of T1 relaxation and vice versa, so care should be taken when either of these processes is used to acquire structural restraints [7].

6.2.3 Residual Dipolar Couplings
Protein RDCs were first measured by exploiting the partial alignment of a metalloprotein caused by the anisotropy of its paramagnetic centre. Prestegard and co-workers showed that the anisotropy of cyanometmyoglobin was sufficient to detect dipolar couplings at 17.6 T [18]. Nowadays, RDCs are mostly measured using external alignment media [19] (see Chapter 4), but with the strong and rigid lanthanide tags (Section 6.4), RDCs of sufficient size can readily be obtained even at 14 T. Paramagnetic alignment offers a good alternative to external alignment and has a clear advantage when studying dynamics between domains or in a protein complex [20,21], because interpretation of differences in alignment between domains (or partners) is much simpler.
Figure 6.2 Distance dependence of PRE and PCS. The plots give the increase in the linewidth ($\Delta\nu = \Delta R_2/\pi$) as a function of the distance from the paramagnetic centre. Solid line, dipolar relaxation; dotted line, Curie relaxation. In the panels on the right, the maximal PCS is also indicated, by the thick solid line and using the right-hand axis. The plots were generated using Equations 6.2, 6.5 and 6.10, using $\omega_L$ = 600 MHz (¹H), $\tau_r$ = 10 ns, T = 300 K and $\tau_S$ = 1 μs (spin label), 5 ns (Gd³⁺) and 10 ps (Co²⁺, Yb³⁺ and Tm³⁺). $\chi_{ax}$ was 6 (Co²⁺), 9 (Yb³⁺) and 55 × 10⁻³² m³ (Tm³⁺)
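The comparison between dipolar and Curie broadening can be reproduced in outline with Equations 6.2 and 6.5. The sketch below is a simplified illustration rather than the calculation behind Figure 6.2: both hypothetical centres are given S = 7/2 and the free-electron g-factor, whereas real lanthanides other than Gd³⁺ would require J and the corresponding g-value, which is not attempted here. Function names and the 20 Å test distance are likewise assumptions.

```python
import math

MU0_OVER_4PI = 1.0e-7      # vacuum permeability / 4 pi, T m A^-1
MU_B = 9.2740100783e-24    # Bohr magneton, J T^-1
K_B = 1.380649e-23         # Boltzmann constant, J K^-1
G_E = 2.0023               # free-electron g-factor
GAMMA_H = 2.6752e8         # 1H gyromagnetic ratio, rad s^-1 T^-1


def dipolar_r2(r, tau_s, tau_r, spin, omega_i):
    """Dipolar transverse PRE (Equation 6.2), with 1/tau_c = 1/tau_S + 1/tau_r."""
    tau_c = 1.0 / (1.0 / tau_s + 1.0 / tau_r)
    prefactor = ((1.0 / 15.0) * MU0_OVER_4PI**2 * GAMMA_H**2
                 * G_E**2 * MU_B**2 * spin * (spin + 1.0))
    spectral = 4.0 * tau_c + 3.0 * tau_c / (1.0 + (omega_i * tau_c) ** 2)
    return prefactor * spectral / r**6


def curie_r2(r, tau_r, spin, omega_i, temperature):
    """Curie spin relaxation (Equation 6.5); depends on tau_r, not on tau_S."""
    prefactor = ((1.0 / 5.0) * MU0_OVER_4PI**2 * omega_i**2 * G_E**4 * MU_B**4
                 * spin**2 * (spin + 1.0) ** 2 / (3.0 * K_B * temperature) ** 2)
    spectral = 4.0 * tau_r + 3.0 * tau_r / (1.0 + (omega_i * tau_r) ** 2)
    return prefactor * spectral / r**6


def linewidth_hz(r2):
    """Line broadening in Hz from a transverse rate, delta_nu = R2 / pi."""
    return r2 / math.pi


omega_h = 2.0 * math.pi * 600e6   # 600 MHz 1H frequency, rad/s
tau_r = 10e-9                     # rotational correlation time, s
r = 20e-10                        # 20 A, in metres

dip_slow = dipolar_r2(r, tau_s=5e-9, tau_r=tau_r, spin=3.5, omega_i=omega_h)
dip_fast = dipolar_r2(r, tau_s=10e-12, tau_r=tau_r, spin=3.5, omega_i=omega_h)
curie = curie_r2(r, tau_r=tau_r, spin=3.5, omega_i=omega_h, temperature=300.0)

broadening = {name: linewidth_hz(rate) for name, rate in
              [("dipolar, tau_S = 5 ns", dip_slow),
               ("dipolar, tau_S = 10 ps", dip_fast),
               ("Curie", curie)]}
```

With these inputs the dipolar term dominates for the slowly relaxing centre, while for the fast-relaxing centre the Curie term is the larger contribution, in line with the trend described for Figure 6.2.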
RDCs are apparent as a change in the peak separation of the multiplet components in HSQC-like spectra acquired without decoupling, so the RDC adds to the J-coupling. The RDC can be measured with the in-phase anti-phase (IPAP) pulse sequence [22], or by acquiring J-modulated spectra [23,24]. The RDC depends on the orientation of the internuclear vector relative to the magnetic susceptibility tensor ($\chi$-tensor) of the paramagnetic centre, according to Equation 6.6 (see also Chapter 4).

$$D^{\mathrm{res}} = -\frac{B_0^2}{15kT}\,\frac{\gamma_I\gamma_J h}{16\pi^3 r_{IJ}^3}\left[\Delta\chi_{ax}\left(3\cos^2\theta - 1\right) + \frac{3}{2}\,\Delta\chi_{rh}\sin^2\theta\cos 2\varphi\right] \qquad (6.6)$$
where $\Delta\chi_{ax}$ and $\Delta\chi_{rh}$ are the axial and rhombic components of the magnetic susceptibility tensor, respectively, I and J are the observed nuclei, $r_{IJ}$ is the internuclear distance (e.g. 1.02 Å for HN), and $\theta$ and $\varphi$ determine the orientation of the I–J vector relative to the $\chi$-tensor. RDCs are independent of the distance to the paramagnetic centre and can therefore be obtained for residues all over the protein, making them a powerful source of long-range restraints. The analytical challenge that has to be overcome when applying RDC-based structural restraints is the degeneracy of the alignment by the $\chi$-tensor. In general, another set of NMR restraints is required, for instance from PCSs caused by the same lanthanide ion that gives the alignment [25]. RDCs are sensitive to dynamics, because dynamics lead to averaging of the orientations of the bond vector. The alignment tensors calculated from RDCs, even for paramagnetic centres that are rigid relative to the protein [26,27], are 10–20% smaller than those calculated from pseudocontact shifts (discussed in the next section), probably due to local dynamics within the protein.

6.2.4 Contact and Pseudocontact Shifts
Paramagnetic centres having an anisotropic χ-tensor cause NMR resonances to shift. The shift difference between a paramagnetic and a diamagnetic sample (Δδobs, Equation 6.7) arises from contributions of the Fermi contact shift (δC), the pseudocontact shift (PCS or δPC) and other contributions originating from dissimilar diamagnetic properties between oxidised and reduced forms of the sample in the case of metalloproteins (δS) [28].
\[ \Delta\delta_{obs} = \delta_{C} + \delta_{PC} + \delta_{S} \tag{6.7} \]
Contact shifts are caused by the delocalisation of the spin density of the unpaired electron through the network of covalent bonds between the metal and its coordinating ligands. This effect has a very local character and it can only be of use to retrieve structural information about the ligands of a metal [29]. Contact shifts are related to the angle between the metal–ligand bond and the magnetic field, according to Equations 6.8 and 6.9.
\[ \delta_{C} = \langle S_Z \rangle D f(\phi_i) \tag{6.8} \]
\[ f(\phi_i) = b_0 + b_2 \cos^2 \phi_i \tag{6.9} \]
where ⟨SZ⟩ is the spin magnetisation on the metal atom and D, b0 and b2 are constants. The angle φi is the Fe–S–CB–HB dihedral, in the case of σ-spin delocalisation of a thio-linked haem [30]. Similar equations have been used in the refinement of blue copper protein metal sites [31]. Through space, the effect of the dipolar coupling between the magnetic moment of the nucleus and the anisotropic component of the unpaired electron can reach much farther than the metal coordinating ligands and cause PCSs, according to Equation 6.10 [3].
\[ \delta_{PC} = \frac{1}{12\pi r_{IM}^3}\left[\Delta\chi_{ax}\left(3\cos^2\theta - 1\right) + \frac{3}{2}\Delta\chi_{rh}\sin^2\theta\cos 2\Omega\right] \tag{6.10} \]
Figure 6.3 Back-predicted PCSs versus experimentally determined PCSs of backbone amide protons of cytochrome c (Cc) with the tag CLaNP-5-Yb attached. Reprinted from Xu et al. [21], with kind permission of Springer Science and Business Media
Significant PCSs can be measured up to 60 Å away from the paramagnetic centre when a strong lanthanide is used as the metal, and thus PCSs form an important source of long-distance restraints [25,32], useful in structure determination of large proteins or protein complexes [33]. To relate the PCS to the metal-to-nucleus distance, the χ-tensor of the paramagnet needs to be determined. This can be done in a separate experiment [25], in parallel [34], or the shift agent can be pre-calibrated [27]. Figure 6.3 shows a fit obtained between predicted and observed PCSs after performing the five-parameter fit required to determine Δχax, Δχrh and the three Euler angles that define the χ-tensor orientation. If the protein structure is available and more than 20 PCSs have been measured, this is a trivial procedure. Already in the first NMR studies on paramagnetic molecules, shifted resonances were indicative of nuclei coordinating to, or at least being close to, the paramagnetic centre [35]. Thus, PCSs can not only be used as restraints to obtain or optimise a protein structure, they have also been applied successfully to assign resonances [36]. In principle, the use of paramagnetic effects in the assignment of resonances is not restricted to PCSs, and other types of paramagnetic spectral changes can provide helpful assignment information.
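Equations 6.6 and 6.10 share the same angular term, so both can be evaluated with a few lines of code. The sketch below is illustrative only: the function names, the use of SI units with the anisotropies expressed in m^3, the rounded physical constants and the negative sign of the RDC prefactor are assumptions made for this example, not a prescription from the text.

```python
import math

# Physical constants in SI units (rounded standard values; assumed here).
H_PLANCK = 6.62607015e-34      # J s
K_BOLTZMANN = 1.380649e-23     # J/K
GAMMA_1H = 2.6752e8            # rad s^-1 T^-1
GAMMA_15N = -2.7126e7          # rad s^-1 T^-1


def angular_term(dchi_ax, dchi_rh, theta, phi):
    """Bracketed term shared by Equations 6.6 and 6.10 (angles in radians)."""
    return (dchi_ax * (3.0 * math.cos(theta) ** 2 - 1.0)
            + 1.5 * dchi_rh * math.sin(theta) ** 2 * math.cos(2.0 * phi))


def pcs_ppm(r, theta, omega, dchi_ax, dchi_rh=0.0):
    """Pseudocontact shift in ppm (Equation 6.10); r in m, anisotropies in m^3."""
    return 1e6 * angular_term(dchi_ax, dchi_rh, theta, omega) / (12.0 * math.pi * r ** 3)


def rdc_hz(b0, temperature, theta, phi, dchi_ax, dchi_rh=0.0,
           gamma_i=GAMMA_1H, gamma_j=GAMMA_15N, r_ij=1.02e-10):
    """Residual dipolar coupling in Hz (Equation 6.6); b0 in T, r_ij in m."""
    prefactor = -(b0 ** 2 / (15.0 * K_BOLTZMANN * temperature)) * (
        gamma_i * gamma_j * H_PLANCK / (16.0 * math.pi ** 3 * r_ij ** 3))
    return prefactor * angular_term(dchi_ax, dchi_rh, theta, phi)


# Example: the axial anisotropy quoted for Tm3+ in Figure 6.2 (55e-32 m^3),
# a nucleus 20 angstrom from the metal on the tensor z-axis, and an H-N bond
# parallel to that axis, at the field corresponding to 600 MHz (1H) and 300 K.
b0 = 2.0 * math.pi * 600e6 / GAMMA_1H
pcs_on_axis = pcs_ppm(20e-10, 0.0, 0.0, 55e-32)
rdc_on_axis = rdc_hz(b0, 300.0, 0.0, 0.0, 55e-32)
print(f"PCS: {pcs_on_axis:.2f} ppm, RDC: {rdc_on_axis:.1f} Hz")
```

Because the PCS falls off with the third power of the distance, halving the metal-to-nucleus distance increases the shift eightfold, whereas the RDC does not depend on that distance at all.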
6.3
What Metals to Use?
The paramagnetic properties of the metal depend on its coordination number and its oxidation and spin states. For the transition metals the latter is affected by the coordinating ligands. Commonly used are Mn2+ and Cu2+ for PREs and Co2+, low-spin Fe3+ and Ni2+ for PCSs [37–42], either as naturally occurring cofactors in metalloproteins or as artificial substitutes. Mn2+, Co2+ and sometimes Cu2+ are also used in paramagnetic tags [43–45]. Lanthanides do not occur naturally in proteins, yet provide excellent paramagnetic properties, and they are readily exchanged for Ca2+ and Mg2+. Many lanthanide tags are also available (see Section 6.4). Virtually all of the f-block lanthanides are used because the paramagnetic properties vary greatly throughout the series. The f-shell electrons are not part of the coordination bonds and are in deeply buried, shielded atomic orbitals, which makes their magnetic properties essentially independent of coordination geometries. Because of Hund's rules, an electronic shell more than half-filled has the quantum number J in the highest possible ground state, which causes the second half of the lanthanide ions to be stronger paramagnets than the first half [46]. As the f-electrons are hardly mixed with the binding electrons, lanthanides show similar affinities for binding sites, even though lanthanide ions have a decreasing ionic radius (ranging from 117 pm to 100 pm for the trivalent ions). Some lanthanide ions are diamagnetic and are therefore used as controls (Lu3+ and La3+; the similar transition metal Y3+ is also used for this purpose). Others are strongly paramagnetic (Dy3+, Tb3+, Tm3+) or moderately paramagnetic (Er3+, Ho3+, Yb3+). Gd3+ is a special case in the series because it contains 7 unpaired electrons with an isotropic distribution in the f-shell. The paramagnetic effect of Gd3+, alone amongst the lanthanides, is therefore fully isotropic, that is, there is no anisotropy in the magnetic susceptibility tensor, and thus Gd3+ causes only broadening of lines instead of shifts [3]. Gd3+ has a long τS and is the most suitable lanthanide for PREs, whereas the others are normally used to obtain PCSs or RDCs.
When an aromatic group is present near the lanthanide coordination site, it may act as an antenna that absorbs light and transmits the energy to the metal, causing Eu3+ and Tb3+ to luminesce with a millisecond lifetime, a property that can be convenient to monitor labelling efficiency or to do fluorescence experiments as well [47]. In a protein or peptide, the aromatic group could be a Trp residue and, of the synthetic lanthanide probes, CLaNP-5 (Figure 6.4) shows this behaviour [27,48]. In Table 6.2 some relevant properties of commonly used paramagnetic metals are listed.
Table 6.2 Paramagnetic properties of various metals. The columns RDC, PRE and PCS indicate the suitability of the metal to obtain these restraints and the distance range in which significant PREs or PCSs are obtained is indicated
Paramagnet   S/J(a)   τS, s            Distance range, Å   Diamagnetic analogue       Ref.
Fe3+ HS      5/2      10^-9–10^-11     5–12(b)             Fe2+ LS                    [131]
Fe3+ LS      1/2      10^-11–10^-13    5–17(b)             Fe2+ LS                    [40]
Mn2+         5/2      10^-8            16–25               Ca2+, Mg2+                 [37]
Co2+ LS      3/2      10^-9–10^-10     4–13                Zn2+, Cd2+                 [132]
Ni2+         2/2      10^-10–10^-12    12–22(b)            Zn2+, Cd2+                 [41]
Cu2+         1/2      10^-8–10^-9      5–25                Cu+, Zn2+                  [39]
Gd3+         7/2      10^-8–10^-9      20–30               La3+, Lu3+, Y3+            [10]
Er3+         15/2     10^-12–10^-13    10–40               La3+, Lu3+, Y3+            [34]
Ho3+         16/2     10^-12–10^-13    10–40               La3+, Lu3+, Y3+            [133]
Yb3+         7/2      10^-12–10^-13    10–40               La3+, Lu3+, Y3+            [95]
Dy3+         15/2     10^-12–10^-13    15–60               La3+, Lu3+, Y3+            [79]
Tb3+         12/2     10^-12–10^-13    15–60               La3+, Lu3+, Y3+            [80]
Tm3+         12/2     10^-12–10^-13    15–60               La3+, Lu3+, Y3+            [78]
Nitroxide    1/2      10^-7            12–25               reduced radical, MTS(c)    [8]
RDC, PRE and PCS suitability marks as printed, column by column (the row alignment of these marks is not preserved in this copy):
+ + + + + ++ ++ ++
+ + ++ + + ++ ++
+ + + + + + + + + +
+ + + + + +
(a) J is the quantum number used for lanthanides, which is a vectorial addition of S and L, the total orbital angular momentum quantum number [46].
(b) Based on PRE.
(c) MTS, (1-acetyl-2,2,5,5-tetramethyl-3-pyrroline-3-methyl)-methanethiosulfonate, diamagnetic analogue of MTSL (Figure 6.4) [72].
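When choosing a paramagnet for a given problem, the distance ranges of Table 6.2 can be queried programmatically. The following is a minimal sketch: the ranges are transcribed from the "Distance range" column of the table, while the dictionary layout and the function name are hypothetical choices made for illustration.

```python
# Distance ranges in angstrom, transcribed from Table 6.2.
# The data structure and function name are illustrative assumptions.
DISTANCE_RANGE_ANGSTROM = {
    "Fe3+ HS": (5, 12), "Fe3+ LS": (5, 17), "Mn2+": (16, 25),
    "Co2+ LS": (4, 13), "Ni2+": (12, 22), "Cu2+": (5, 25),
    "Gd3+": (20, 30), "Er3+": (10, 40), "Ho3+": (10, 40),
    "Yb3+": (10, 40), "Dy3+": (15, 60), "Tb3+": (15, 60),
    "Tm3+": (15, 60), "Nitroxide": (12, 25),
}


def paramagnets_covering(distance):
    """Return the paramagnets whose tabulated range includes `distance` (angstrom)."""
    return [name for name, (low, high) in DISTANCE_RANGE_ANGSTROM.items()
            if low <= distance <= high]


for target in (8, 50):
    print(target, "angstrom:", ", ".join(paramagnets_covering(target)))
```

Such a lookup only narrows the choice by distance; whether a candidate is suited to RDCs, PREs or PCSs still has to be read from the corresponding columns of the table.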
Figure 6.4 Chemical structures of the nitroxide spin label (1-oxyl-2,2,5,5-tetramethyl-3-pyrroline-3-methyl)methanethiosulphonate (MTSL, 1) [64], the nitroxide-containing amino acid 2,2,6,6-tetramethylpiperidine-1-oxyl-4-amino-4-carboxylic acid (TOAC, 2) [75], and the metal chelators usable as paramagnetic tags: diethylenetriaminepentaacetic acid cyclic anhydride (3) [85], 2,2′,2″,2‴-((((4-amino-1,2-phenylene)bis(oxy))bis(ethane-2,1-diyl))bis(azanetriyl))tetraacetic acid (4) [82], S-mesyl-cysteine-EDTA (5) [91], CLaNP-1 (6) [95], CLaNP-3 (7) [10], CLaNP-5 (8) [92], and 4-mercaptomethyl-dipicolinic acid (9) [93]. Probes 1, 5, 6, 7 and 8 are cysteine-reactive by their one or two methanethiosulphonate leaving groups. Probe 9 was designed to form a sulfur bridge with an activated cysteine. Probes 2 and 4 can be introduced into a peptide or attached to a protein, respectively, via peptide chemistry methodology and probe 3 is amine-specific via its anhydrides
6.4
Paramagnetic Probes
Paramagnetic metals are sometimes present in proteins, but this is generally not the case. To take advantage of paramagnetism, the unpaired electrons therefore have to be introduced into the sample. In some cases substitution of diamagnetic metals for paramagnetic ones is an option. For specific purposes, paramagnetic compounds, like Mn2+ ions or caged lanthanides, can simply be added to the solution. As a more general method to obtain paramagnetic restraints from diamagnetic proteins, paramagnetic tags can be attached specifically to the protein termini or to surface-engineered residues. Care must be taken when designing these alterations and control experiments should always be performed to validate that the protein structure or the interaction site (for complexes) has not changed. The various probes that have been used can be divided into three groups: nitroxide spin labels, metal binding peptides and synthetic metal chelators. Examples of such peptide sequences are displayed in Table 6.3 and examples of spin labels and metal chelators are shown in Figure 6.4. In this section a description of the various paramagnetic shift and relaxation labels is given and several examples of applications are provided.
6.4.1
Substitution of Metals
Already in 1975, Ca2+ was replaced by Gd3+ in staphylococcal nuclease, to determine distances between the protons of the substrate thymidine 3′,5′-diphosphate and the substituted metal cofactor via both R1 and R2 relaxation measurements on proton and phosphorus [49]. Substitution of Ca2+ in parvalbumin with Pr3+ allowed the determination of PCS, indicating this to be a promising method to characterise Ca2+ binding sites, although the analysis was not performed quantitatively at that stage [50]. In general, the affinity of Ca2+-binding proteins is higher for lanthanide ions than for Ca2+, due to their trivalent nature [51], and orders of magnitude stronger than that of, for example, EGTA [52]. Ca2+ binding sites show high sequence similarity and are often so-called EF-hand motifs [53]. In many NMR studies, beginning with the study of Griffin et al. [49], Ca2+ was substituted with a lanthanide ion in an EF-hand motif. An elegant quantitative study comes from the group of Bertini, in which the χ-tensor of a variety of lanthanide ions substituted in the EF-hand domain protein calbindin D was characterised [54]. Similar substitution studies have been performed for the identification of the metal coordinating ligands [55].
Table 6.3 Lanthanide-binding peptide sequences
Sequence                      Reference
DNDGDGKIGADE                  [81]
KRRWKKNFIAVSAANRFKKISSSGAL    [80]
YIDTNNDGWYEGDELLA             [78]
CYVDTNNDGAYEGDEL              [26]
YVDTNNDGAYEGDELC              [26]
YCDTNNDGAYEGDEL               [26]
A lanthanide ion can also be exchanged for Mg2+, which, like Ca2+, is a group 2 alkaline earth metal [56]. Diamagnetic transition metals have also been exchanged for paramagnetic ones, for instance to obtain PCSs to monitor protein domain motions after introduction of Co2+ in the Zn2+ binding site in protein PA0128 [57], and Co2+ and Ni2+ were exchanged for Cu2+ in the blue copper protein azurin to obtain contact and pseudocontact shifts of the metal coordinating residues [41,58].
6.4.2
Free Probes
Small compounds containing paramagnetic centres can be added to a protein solution to study surface accessibility or ligand interactions. Both nitroxide radicals and Gd-chelators have been applied for this purpose [59–62]. The biggest advantage of this approach lies in its simplicity, for it does not require any protein modifications. It cannot readily be used to generate quantitative information, however, because no significant PCSs will be generated due to the mobility of the probe relative to the protein. PREs will be observed for protein areas accessible to the probe and in this way a qualitative map of surface-exposed regions can be obtained. In a similar fashion, the disappearance of PREs upon complex formation with a partner molecule can reveal interface regions. It should be realised, though, that any probe may have preferred interaction sites on protein surfaces, resulting in a bias in the measured PREs.
6.4.3
Nitroxide Labels
The simplest and most widely used paramagnetic tags are probably the nitroxide spin labels (Figure 6.4). These are organic compounds that contain an unpaired electron at a sterically protected nitroxide, which makes them relatively stable radicals under physiological conditions [63]. The label is generally attached to a surface-exposed cysteine residue and enhances R1 and R2 rates of nearby nuclei [64]. As one of the first examples, the structure of the 25 kDa protein eIF4E was refined by placing site-directed spin labels on five positions on the protein surface [8]. By making use of the relation between line broadening and nucleus-to-electron distance (Equation 6.2), the global fold was correctly predicted. The RMSD of the precision and accuracy of the structure decreased from 8.3 to 2.3 Å and from 7.9 to 3.2 Å, respectively, after including the spin label restraints in addition to a limited set of NOEs (HN-HN only in perdeuterated protein) and chemical shift derived angle restraints. Spin labels have been successfully applied in studies of complexes of proteins with ligands [65–67], proteins with nucleic acids [68–70] and proteins with other proteins [71,72].
The nitroxide is also used on peptides, for instance to mimic interacting peptide hormones. Distances between spin-labelled peptides and neurophysin protons were determined from R1 enhancement effects, confirming the presence of a secondary binding site. Furthermore, with the distances obtained from the paramagnetic ligands, it could be demonstrated that observed NOEs were caused by spin diffusion rather than true contacts between peptide and protein [73]. Spin labels are mobile groups that sample a considerable conformational space on the protein. When defining PRE restraints derived from spin labels this has to be taken into account (see step 5 of the protocol at the end of this section) [74]. In the case of peptides it is possible to use TOAC, an amino acid with a stable radical in a rigid conformation (Figure 6.4), which can be built into peptides using standard solid phase synthesis methodology [75]. Again, it is important to check the effect of TOAC incorporation on the structure and activity of the peptide. In an SH3-binding peptide, placing the TOAC in the binding motif decreased the affinity considerably, but when situated just outside the recognition motif, the effect was small [67].
Nitroxide spin labels attached via a phosphodiester bond to both the 3′ and 5′ ends of DNA reduced the complexity of the TOCSY spectra of the bacteriophage M13 gene V protein, which allowed the assignment of more connectivities [68]. With a similar label, DNA-to-protein distances have also been determined using the one-point approach on the complex between Mrf2 and its cognate DNA [69]. Protein internal flexibility of troponin C did not become clear from crystallography or conventional NMR investigations. From the effects of a spin label on the selectively 13C-enriched methyl groups of the ten endogenous methionines, it was shown that the central helix was flexible and that this flexibility was enhanced upon interaction with troponin I [71]. Another example of the use of spin labels for determination of structure and dynamics in a protein complex is described in Section 6.5.
6.4.4
Metal Binding Peptides
Peptides that mimic metal binding sites of metalloproteins can be inserted at the termini of other proteins to serve as a paramagnetic source once chelated to the appropriate metal. For instance, the Cu2+ binding motif of human albumin was used as a template to design a three-residue N-terminal Cu2+/Ni2+ tag, providing R2^para-based structural restraints, as demonstrated for ubiquitin [44]. Similarly, a His-His-Pro tag, usable to purify proteins by Ni-affinity chromatography, was rebound to Ni2+ after purification to yield R1^para-based restraints for the refinement of the structure of thioredoxin [76].
As mentioned above, EF-hand containing proteins are easily equipped with paramagnetic lanthanide ions [77]. This property was explored further in order to develop lanthanide-binding peptides (LBPs) that can be used as paramagnetic centres, for instance to attach to proteins [48]. Such tags equipped with the appropriate lanthanide deliver restraints such as RDCs [78] and PCSs [79], and in principle PREs could also be obtained when chelating Gd3+. Large effects have been reported for LBPs and their affinity for various lanthanide ions is high. The peptides can be engineered genetically at protein termini [34], or attached to a surface-exposed cysteine via a disulfide bond [26]. The sequences of some LBPs are shown in Table 6.3. A minimum length of twelve amino acids, of which at least four are acidic residues, appears to be required to ligate the metal, although an LBP based on the calmodulin Ca2+ site is much longer and without any negative charge [80]. A genetically introduced lanthanide-binding sequence can only be at one of the termini and may be rather flexible, making it most suitable for obtaining PREs. Lanthanide-binding peptides attached to Cys residues are generally more than 10 residues long and therefore more bulky than synthetic probes, and thus may have greater effects on the protein structure or protein complex formation. On the other hand, it was shown that the bulkiness limits the mobility of the peptide relative to the protein backbone, which yields reliable restraints [26].
Given the difficulties in structure determination of membrane proteins, obtaining RDCs for this class of proteins is highly beneficial. An EF-hand motif has been used successfully to align the membrane protein Vpu in the magnetic field [81]. Significant RDCs could be obtained at 750 MHz. It was observed that the orientation of the protein in the magnetic field varied with the lanthanide ion used, perhaps due to interference by partial alignment caused by the micelles that were required for protein stabilisation. As a general approach, structures of soluble proteins attached to LBPs can be modelled from the large RDCs and PCSs originating from a series of lanthanide ions immobilised in the LBP [26].
6.4.5
Synthetic Metal Chelating Tags
Metal chelators such as EDTA, EGTA, DTPA and DOTA can be modified with cysteine- or amine-reactive groups for surface immobilisation or protein attachment and as such used as paramagnetic tags [82–86]. Examples of synthetic probes are displayed in Figure 6.4. Such probes are used to place paramagnetic atoms at specified locations on protein surfaces. Cysteine-reactive probes are used more frequently than amine-reactive ones because they readily allow for site-specific attachment. To achieve this, endogenous surface-exposed Cys residues need to be replaced by Ala before a new Cys residue (or Cys residue pair, see below) is introduced. Cys residues are much rarer on protein surfaces than amines, making the former more suitable. Depending on the desired effects, either transition metals or lanthanides can be used. For instance, the commercially available cysteaminyl-EDTA chelated to Mn2+ has been attached to a surface-accessible cysteine of a protein to monitor complex formation with a partner protein [45,87,88]. In these studies, R2^para values revealed the transient self-association of the protein HPr, and a nonspecific encounter complex between HPr and EIN could also be identified. A similar EDTA-based ligand, chelated to Co2+, provided PCS-based restraints for the structural refinement of STAT4 [43]. By attaching Mn2+-EDTA to DNA, Clore and co-workers were able to show nonspecific binding modes of the transcription factor HOXD9 to noncognate DNA sites and concluded that the structure of this complex was similar to the one formed with the recognition site [89].
Chelating a lanthanide ion instead of a transition metal may increase the size of the effects. For instance, lanthanide ions chelated to cysteine-attached EDTA offer the opportunity to observe larger PCSs than those obtained from Co2+ and to obtain significant RDCs at 800 MHz [90,91]. Also, when Gd3+ was chelated to a cysteine-reactive DTPA analogue, the obtained PREs were larger than those obtained with a nitroxide spin label [61].
As for the nitroxide spin label, a clear disadvantage of the metal-containing tags described above is their mobility relative to the protein, causing averaging of their effects and, consequently, yielding restraints that are both less accurate and less precise. A solution to this problem has been found in rigidifying the attachment point of tag to protein. By attaching a lanthanide probe via two Cys residues on the protein surface, as is done in the CLaNP probes (Figure 6.4), mobility is greatly reduced [92]. This leads to stronger alignment and larger, nonaveraged PCSs. Moreover, the position and even the orientation of the χ-tensor can be predicted a priori if the structure of the protein that will carry the tag is available, for example in the study of protein complexes [27]. In our hands, the introduction of two Cys residues does not cause more problems than a single one and, when the Cys residues are selected close together in the sequence, the site-directed mutagenesis can be performed in a single step. In all cases it was found that attachment occurs via two arms, not just a single one. This approach has been applied to a range of proteins already, with several positions on a protein surface, and seems generally applicable.
An alternative way to obtain the required rigidity is to use a single cysteine and make use of a proximal carboxylate to ligate the lanthanide ion [93]. The dipicolinic acid tag, developed for this purpose, is easily synthesised and has a minimal size, which is advantageous in studies involving complex formation with other macromolecules. A disadvantage could be that the lanthanide has to be introduced after attachment of the probe to the surface, which may cause nonspecific labelling of metals and protein aggregation [93]. Also, the method is less general than using two Cys residues and it may not always be possible to predict a priori whether a carboxyl group from the protein will coordinate and what the location of the lanthanide ion will be.
Another important aspect of lanthanide probes is their internal rigidity. It is known from EDTA, DTPA and DOTA-like molecules that they exchange between conformations on timescales important for NMR [94], which can lead to multiple PCSs per protein nucleus for tags based on these compounds [95]. This can be overcome by rigidifying the cages, for example by designing the pendant arms on the polyaza macrocycle of DOTA to enforce a single conformation, as has been shown for probe CLaNP-5 [27] (Figure 6.4). Conformational exchange of these tags is not a problem when obtaining PREs with Gd3+, because the relaxation effect is isotropic and does not depend on the χ-tensor. For example, the DTPA-based probe CLaNP-1 was applied to obtain interprotein distances in a 152 kDa complex via PRE measurements [96] (see Section 6.5).
Protocol for the Application of Paramagnetic NMR on Diamagnetic Proteins
In this section a general protocol is provided on how to apply paramagnetic NMR to retrieve structural information from diamagnetic proteins and their complexes.
1. First the type of restraint is selected, depending on the required information. To retrieve long-range structural information, PCS or RDC analyses are the methods of choice. For dynamics analysis PREs or RDCs should be determined. To determine distances or relative positions on or between proteins, PREs or PCSs should be acquired (see also the examples in Section 6.5). The best metal(s) for a given type of restraint can be found in Table 6.2.
2. The paramagnetic tag and the position of attachment on the protein are selected. This position (or several positions) should not influence the protein structure or interfere with complex formation with partner molecules; control experiments are required to validate this. The various tags have been discussed in Section 6.4 and important features are: bulkiness, genetic versus chemical attachment, mobility and the type of metal(s) the tag binds. Cysteine tags are the most common (Figure 6.4), involving either a single Cys residue or a pair of Cys residues. Thus, surface-accessible Cys residues have to be introduced via site-directed mutagenesis. Endogenous surface-accessible Cys residues have to be removed by mutation to alanine. Buried Cys residues are usually not problematic. The introduction of Cys residues may affect expression and solubility, which we experienced to be less troublesome for large proteins than for smaller ones, but any effect is difficult to predict. Therefore, it is advisable to design one or two Cys positions more than required, in case a mutant gives problems.
3. Probe attachment to Cys requires that any dimers are removed first by treatment with a reductant such as DTT. Then DTT is removed quickly via a buffer exchange column and probe is added to the protein. A ratio of 5 : 1 of probe over protein is a good starting point. It is recommended to use degassed buffers to prevent formation of dimers. After labelling, any dimers and excess probe can be removed via size exclusion chromatography. The degree of labelling can be determined in various ways: using quantitative EPR to measure spin label concentration [72], metal analysis via atomic absorption [96], in relative quantity by MS [27], or, in the case of pseudocontact shifts, simply by NMR, by checking for diamagnetic peaks that still remain for resonances that experience PCS.
4. At least two NMR experiments, with diamagnetic and paramagnetic samples, are required per position of the paramagnetic label. In some cases both PCSs and RDCs can be obtained from the same sample and from similar experiments (e.g. IPAP and HSQC). In general, greater signal-to-noise and resolution are required for RDCs than for PCSs. For PREs, the peak intensities need to be determined, which requires much better signal-to-noise than for shifts (see Section 6.2.1).
5. The shifts, splittings or broadenings are quantified from the difference between the diamagnetic and paramagnetic samples. Subsequently, these are translated into distance restraints, using Equations 6.1–6.10, or used directly as paramagnetic restraints in specialised software (see below). Any averaging of paramagnetic effects should be taken into account; for instance, extreme probe orientations should be used and averaged over to model the effects realistically [74].
6. Starting with a previously determined protein 3D structure, for example from conventional NMR data, the experimental paramagnetic NMR restraints or their derived distances are used to refine the structure or to model a complex.
With the widespread application of paramagnetic restraints to guide structure calculations, software has been developed specifically to optimise structures and to dock proteins, based on experimental paramagnetic NMR data. X-PLOR-NIH is a widely used structure calculation program based on input from NMR and X-ray crystallography data [97]. Interatomic distances derived from paramagnetic NMR data are usable as input in the program [98]. Alternatively, paramagnetic packages are available to directly use restraints from paramagnetic NMR experiments. The group of Bertini has developed several such software packages, specifically designed to incorporate paramagnetic NMR in structure calculations. Thus new paramagnetic modules were incorporated in CYANA and DYANA [99–101], and similar scripts were added as modules into X-PLOR via PARA-Restraints [102].
HADDOCK is a docking program used to find lowest energy structures of complexes of proteins with proteins, DNA, ligands and peptides [103]. The program has been successfully applied in using experimental data, like NOE and chemical shift perturbations obtained from NMR experiments, to guide the docking of macromolecules. Also in this program, interatomic distances determined from paramagnetic restraints can be used as input [69]. The current server-based version is also able to make use of RDCs as restraints [104,105].
To determine the χ-tensor and refine structures using RDCs, whether or not of paramagnetic origin, several other software packages have been developed, including Module [106], by the group of Blackledge, and REDCAT and REDCRAFT [107,108], from the Prestegard group. The groups of Otting and Huber have developed several dedicated software packages to calculate χ-tensors, find structures and assign spectra based primarily on paramagnetic NMR data obtained from lanthanide-bound proteins. Recent examples of these are Platypus, Numbat and Echidna [36,109,110].
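Step 5 of the protocol above can be sketched in a few lines. The example takes PCSs and RDCs as per-residue differences between the paramagnetic and diamagnetic samples, and it illustrates averaging over several probe positions with an r^-6 weighting, which reflects the distance dependence of dipolar PRE. All peak values, distances and function names below are hypothetical and serve only as an illustration; dedicated software treats the averaging in far more detail.

```python
def paramagnetic_minus_diamagnetic(para, dia):
    """Per-residue difference for the residues observed in both samples."""
    return {res: para[res] - dia[res] for res in para if res in dia}


def r6_averaged_distance(distances):
    """Effective distance <r^-6>^(-1/6) over a set of sampled probe positions."""
    mean_r6 = sum(r ** -6 for r in distances) / len(distances)
    return mean_r6 ** (-1.0 / 6.0)


# Hypothetical 1H chemical shifts (ppm) keyed by residue number.
shift_dia = {12: 8.31, 13: 7.95, 14: 8.60}
shift_para = {12: 8.56, 13: 7.83, 14: 8.60}
pcs = paramagnetic_minus_diamagnetic(shift_para, shift_dia)

# Hypothetical 1H-15N splittings (Hz); the RDC is the change in splitting.
split_dia = {12: -93.1, 13: -92.8}
split_para = {12: -88.4, 13: -97.0}
rdc = paramagnetic_minus_diamagnetic(split_para, split_dia)

# Hypothetical electron-to-nucleus distances (angstrom) for four probe conformers.
r_eff = r6_averaged_distance([14.0, 16.0, 18.0, 22.0])
print(pcs, rdc, round(r_eff, 1))
```

The r^-6 average is dominated by the shortest distances, so the effective distance is always shorter than the arithmetic mean of the sampled distances; this is why a mobile probe cannot be modelled by a single average position.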
6.5
Examples
6.5.1
Structure Determination of Paramagnetic Proteins
Traditionally it has been considered impossible to solve the full structure of proteins containing a paramagnetic centre by NMR, due to the large broadening and shifts of the resonances of nuclei close to the metal [111]. However, since the 1990s techniques have been developed to use the paramagnetic cofactor to retrieve structural restraints to solve metal binding site structures. Some excellent reviews are available explaining how to use a paramagnetic cofactor to deliver structural restraints [14,17,112–115]. As an example, the overall solution structure of the blue copper protein plastocyanin with the copper cofactor in its oxidised, paramagnetic state is discussed [116]. Copper has a long τS, causing much line broadening but small PCSs. It was known from previous studies that the copper atom forms a distorted tetrahedral coordination sphere with the four ligand residues, two His, one Cys and one Met [117,118]. The unpaired copper electron is delocalised over the ligating atoms, which makes it difficult to accurately estimate the effective distance between unpaired electron and proton. The electron spin density was parameterised by seven hydrogen-like atomic orbitals and the resulting equation was added to the structure calculations, making use of the longitudinal PREs of nearby proton resonances, conventional NOEs and dihedral angle restraints to obtain the effective distance and refine the structure. Eventually, this approach led to a structure with a well-defined metal site, highly similar to crystallised blue copper proteins. Similar techniques have been applied to haem proteins. Initially, proton resonances shifted far from the polypeptide signals could be assigned to haem porphyrin protons and the histidine ligand coordinating to the haem iron atom [119]. More recently, paramagnetic restraints originating from the iron have also been used to refine the distal parts of the protein, for instance in the case of cytochrome c556, allowing a structural model of the protein to be built [101].
6.5.2
Structure Determination Using Artificial Paramagnets
The two Ca2+ atoms of the N-terminal domain of calmodulin were exchanged for the paramagnetic Ce3+ in order to determine the metal positions in the structure [120]. From the Ce3+-induced PCSs the positions of both metal atoms were determined with higher resolution than was possible from conventional NMR and along the way the entire protein structure was refined significantly. During the structure calculations using DYANA, two paramagnetic centres with their own χ-tensors were taken into account, as determined using FANTASIA [121]. The structure was initially determined based on NOEs and Ce3+-enhanced R1 rates and subsequently the PCSs were also introduced as restraints. The addition of the paramagnetic metals led to a reduction in the number of observed NOEs, but due to the paramagnetic restraints a better resolved structure was obtained. Eventually, the RMSD for the two metal sites decreased from 0.65 and 1.03 Å to 0.18 and 0.16 Å, respectively.
Ubiquitin was equipped with an N-terminal Cu2+-binding peptide to retrieve R2para restraints for structure refinement [44]. In this study, the position of the Cu2+ atom was determined and the protein torsion angles and side-chain dihedral angles were optimised using the experimental restraints and an available crystal structure of the protein. For this, a paramagnetic module PMAG was introduced for use in CNS, providing an energy term for the difference in the experimentally obtained relaxation rate versus the calculated one, based on a model of the peptide attached to the ubiquitin crystal structure. The ten lowest energy solutions all show the Cu2+ atom in narrow spatial distribution and consequently the back-predicted Cu2+-proton distances match well with the experimental values. So far, the application of paramagnetic NMR to determine de novo structures of macromolecules has been limited.
6.5.3 Structures of Protein Complexes
In many studies the paramagnetic effects from one protein have been used as restraints to dock a binding protein partner (see also Chapter 8). Interprotein amide-to-metal distances are determined and subsequently used to guide the docking of the two protein structures, where their individual solution structures are known. In the first of such studies, both chemical shift perturbations and haem-induced PCSs were used to guide the docking of the photosynthetic redox proteins plastocyanin and cytochrome f [122]. The results show that plastocyanin can be docked with good precision. The benefit from paramagnetic restraints is that these are easily related to distances, unlike chemical shift perturbations [123]. Another example is the use of Gd3+ chelated in CLaNP-1 (Figure 6.5), which was attached to nitrite reductase (NiR), a trimeric enzyme of 110 kDa involved in
Figure 6.5 The complex of nitrite reductase (NiR) and pseudoazurin (Paz). NiR is shown in spacefill, with its subunits in blue, pink and green. The best twenty Paz orientations are shown as Cα traces. The Paz copper atoms are shown as green spheres and the positions of the Gd3+ ions in the CLaNP molecules as orange spheres. Reprinted from Vlasie et al. [96], Copyright (2008) with permission from Elsevier. Please refer to the colour plate section
denitrification [96]. Cys pairs were engineered at three places on NiR and the tag was attached. Subsequently, perdeuterated, 15N-enriched pseudoazurin, the 14 kDa electron donor of NiR, was bound and the intermolecular PRE effects were observed in TROSY spectra. The resulting set of long-range distance restraints, in combination with a few chemical shift perturbations, was sufficient to dock the proteins and yield an ensemble of pseudoazurin structures with an RMSD of 1.5 Å (Figure 6.5). The PREs extended to 35 Å, which is a larger distance than is obtained for small proteins equipped with the same probe [10], because the larger τr of the complex results in larger paramagnetic effects (see Equation 6.2). This study shows the great potential of paramagnetic restraints for studying large protein complexes.
6.5.4 Studying Dynamics with Paramagnetism
The paramagnetic effects discussed in this chapter are all very sensitive to changes in interatomic distances or angles. In many cases time-dependent processes are monitored indirectly, and therefore information on the movement of proteins, or parts of them, can be extracted. The complex formed between the yeast redox proteins cytochrome c (Cc) and cytochrome c peroxidase (CcP) has been crystallised [124], suggesting that the complex has a well-defined orientation. To study this complex in solution, five mutants with an exposed Cys residue were produced and labelled with the stable nitroxide MTSL (Figure 6.4). The intermolecular PREs observed on Cc were used to dock the proteins. The resulting structure was almost identical to the crystal structure, confirming the validity of docking based on PREs [72]. Interestingly, for several spin label positions, some defined regions of Cc showed stronger PREs than expected from the solution structure. This provided evidence for other conformations of the cytochrome within the complex, representing a dynamic encounter state that accounts for about 20 % of the complex. PREs are very sensitive to minor states that bring the nucleus close to the paramagnetic centre, because the inverse sixth power dependence on the distance (see Equation 6.2) results in strong relaxation in such states. Other PRE-based studies also indicate that in many cases a single conformation of the complex is not enough to describe a protein interacting with another protein or DNA [45,89]. Two studies on an extremely dynamic complex of two small redox proteins used a range of paramagnetic effects, arising from the naturally present metals, from a spin label and from a two-armed lanthanide tag [21,125]. All data showed consistently that the two proteins sample a large surface area on the partner, as illustrated in Figure 6.6.
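The sensitivity of PREs to a minor state can be made concrete with a small numerical sketch. It assumes only that the PRE scales as the inverse sixth power of the distance and that the states exchange fast enough for the rates to be population averaged. The 80:20 populations echo the example above, but the two distances are hypothetical values chosen for illustration.

```python
def mean_inverse_r6(states):
    """Population-weighted <r^-6> for (population, distance in angstrom) pairs."""
    return sum(p * r ** -6 for p, r in states)

def apparent_distance(states):
    """Single distance that would give the same averaged PRE as the ensemble."""
    return mean_inverse_r6(states) ** (-1.0 / 6.0)

# Hypothetical nucleus: 25 A from the spin label in the major state,
# 10 A from it in an encounter state populated to 20 %.
major_only = [(1.0, 25.0)]
with_encounter = [(0.8, 25.0), (0.2, 10.0)]

# Factor by which the minor state increases the observed PRE.
enhancement = mean_inverse_r6(with_encounter) / mean_inverse_r6(major_only)
print(round(enhancement, 1), round(apparent_distance(with_encounter), 2))
```

Under these assumptions the averaged PRE is dominated almost entirely by the minor state, so interpreting it with a single structure would place the nucleus far closer to the label than it is in the major conformation.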
Similar experiments can be used to study domain dynamics, as was illustrated for calmodulin containing lanthanides in one of the Ca2+ binding sites [20]. In this study, both paramagnetically induced RDCs and PCSs were applied. Various approaches have been proposed for visualisation of dynamic complexes characterised with paramagnetism, mostly based on a minimum set of structures that can describe the data [20,72,88,125–127]. This area is still under development and is hindered not only by the fact that NMR provides only averages over all the populated conformers, but also by the fact that the averaged observables do not scale simply with the fractional populations, because of the high powers in the distance dependence of PCS and PRE. Although this allows for the detection of lowly populated states, it is difficult to describe these states both structurally and in terms of populations.
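The difficulty of separating structure from population can be written out explicitly. As a sketch, assume N discrete conformers with populations p_i in fast exchange and keep only the leading distance dependence of each effect. The geometric factor G_i is a placeholder symbol introduced here, not notation from this chapter, and the numbers in the worked line are hypothetical.

```latex
\langle \Gamma_2 \rangle = \sum_{i=1}^{N} p_i\,\Gamma_{2,i} \;\propto\; \sum_{i=1}^{N} p_i\, r_i^{-6},
\qquad
\langle \delta^{\mathrm{PCS}} \rangle = \sum_{i=1}^{N} p_i\,\delta^{\mathrm{PCS}}_i \;\propto\; \sum_{i=1}^{N} p_i\, G_i\, r_i^{-3}

% Worked example: two different minor states give the same PRE contribution
0.20 \times (10\,\text{\AA})^{-6} \;=\; 0.05 \times (7.94\,\text{\AA})^{-6} \;\approx\; 2.0 \times 10^{-7}\,\text{\AA}^{-6}
```

A single averaged PRE therefore cannot distinguish a 20 % state at 10 Å from a 5 % state at about 7.9 Å, which is one reason why data from several probe positions, or several types of paramagnetic effect, are usually combined.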
Figure 6.6 The ensemble of complex structures of adrenodoxin (Adx) and cytochrome c (Cc) from a PCS-based simulation illustrates the degree of dynamics between the two proteins. Adx is shown in a surface representation coloured according to the electrostatic potential with red for negative and blue for positive; the Fe2S2 binding loop is in yellow. The centres of mass of Cc are represented by green spheres. Reprinted with permission from Xu et al. [125]. Copyright 2008, American Chemical Society. Please refer to the colour plate section
For measuring the dynamics of proteins by NMR, paramagnetic tools are not required per se. Nevertheless, they may ease the analysis and offer additional possibilities. For instance, when obtaining RDCs to study the millisecond dynamics of protein domains, as has been done for ubiquitin [128], paramagnetic tags [26,27] could give a well-defined and easy-to-use means of alignment. Paramagnetic probes could be used to enhance relaxation dispersion experiments as well [129]. Rigid anisotropic paramagnetic probes can create a gradient of chemical shift through the protein (PCSs). Protein mobility along this gradient can be visualised using relaxation dispersion [130]. Potentially this can lead to determination of the structure of the lowly populated state, because the PCS for the minor state can be derived and interpreted in structural terms, in contrast to the ordinary chemical shifts of the minor state derived from this type of experiment. However, this places high demands on the rigidity of the probe relative to the protein backbone, because probe movement would be difficult to distinguish from mobility within the protein. It is expected that this approach may yield valuable information in the future, because lowly populated states can only be observed at atomic level by NMR. Using relaxation dispersion it is possible to distinguish mobile from rigid parts in proteins, and the timescales of these motions can be estimated.
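The link between a PCS gradient and relaxation dispersion can be sketched numerically. The example below assumes two-site exchange in the fast-exchange regime and uses the widely quoted Luz-Meiboom expression for the effective transverse relaxation rate in a CPMG experiment. It further assumes that the chemical shift difference between the states arises purely from the difference in PCS. All function names and parameter values are hypothetical and are not taken from the studies cited above.

```python
import math

def delta_omega(pcs_major_ppm, pcs_minor_ppm, larmor_mhz):
    """Shift difference (rad/s) between two states that differ only in PCS."""
    # ppm multiplied by the Larmor frequency in MHz gives Hz.
    return 2.0 * math.pi * larmor_mhz * (pcs_minor_ppm - pcs_major_ppm)

def r2_eff(tau_cp, r2_0, p_minor, d_omega, k_ex):
    """Luz-Meiboom fast-exchange R2,eff (1/s).

    tau_cp : spacing between 180 degree pulses in the CPMG train (s)
    r2_0   : relaxation rate in the absence of exchange (1/s)
    k_ex   : exchange rate constant (1/s), assumed much larger than d_omega
    """
    phi_ex = p_minor * (1.0 - p_minor) * d_omega ** 2
    x = k_ex * tau_cp / 2.0
    return r2_0 + (phi_ex / k_ex) * (1.0 - math.tanh(x) / x)

# Hypothetical amide proton at 600 MHz whose PCS differs by 0.1 ppm between
# the major state and a minor state populated to 5 %.
d_w = delta_omega(0.0, 0.1, 600.0)
for tau_cp in (0.0002, 0.001, 0.005, 0.02):
    print(tau_cp, round(r2_eff(tau_cp, 10.0, 0.05, d_w, 2000.0), 2))
```

Fitting such a curve yields the magnitude of the shift difference, and with a rigid probe that difference could in principle be read as the change in PCS, which is the quantity that carries the structural information on the minor state.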
6.6 Conclusions and Perspective
The application of paramagnetic NMR in research aimed at obtaining and refining protein structures, and at studying protein dynamics and protein complex formation, holds great promise for the future. With the technical developments that have been made in the last two decades, the paramagnetic NMR toolbox has expanded and can be used for any protein. Paramagnetic probes are widely available, and the paramagnetic effects can usually be measured in simple and sensitive experiments, like HSQC and TROSY, offering opportunities for large proteins and complexes. The software required to use paramagnetic
restraints has been developed, molecular biology methods to provide isotope-enriched and mutated proteins are available, and via various consortia and collaborations high-field NMR facilities are also within reach of nonspecialised scientists. By using paramagnetic NMR, large protein complexes can be studied, and so can the interactions with all kinds of ligands, which is in many cases impossible by other structural techniques like conventional NMR or X-ray crystallography. Furthermore, recent applications have shown the possibilities paramagnetic NMR offers in studying dynamic processes, such as the interactions between macromolecules in transient complexes. Hidden, lowly populated conformational states could be visualised using relaxation experiments, and the near future will show more applications, such as using relaxation dispersion in combination with paramagnetic restraints to refine short-lived structures of highly dynamic molecules.
References
1. Luginbühl, P. and Wüthrich, K. (2002) Semi-classical nuclear spin relaxation theory revisited for use with biological macromolecules. Prog. NMR Spectrosc., 40, 199–247. 2. Clore, G.M. and Gronenborn, A.M. (1998) New methods of structure refinement for macromolecular structure determination by NMR. Proc. Natl. Acad. Sci. USA, 95, 5891–5898. 3. Bertini, I. and Luchinat, C. (1996) NMR of Paramagnetic Substances, Elsevier, Amsterdam. 4. Solomon, I. (1955) Relaxation processes in a system of two spins. Phys. Rev., 99, 559–565. 5. Modi, S., Primrose, W.U., Boyle, J.M. et al. (1995) NMR studies of substrate binding to cytochrome P450 BM3: comparisons to cytochrome P450 cam. Biochemistry, 34, 8982–8988. 6. Granot, J. (1982) Paramagnetic relaxation in dipolar-coupled homonuclear spin systems. J. Magn. Reson., 49, 257–270. 7. Bertini, I., Donaire, A., Luchinat, C. and Rosato, A. (1997) Paramagnetic relaxation as a tool for solution structure determination: Clostridium pasteurianum ferredoxin as an example. Proteins, 29, 348–358. 8. Battiste, J.L. and Wagner, G. (2000) Utilization of site-directed spin labeling and high-resolution heteronuclear nuclear magnetic resonance for global fold determination of large proteins with limited nuclear Overhauser effect data. Biochemistry, 39, 5355–5365. 9. Iwahara, J., Tang, C. and Clore, G.M. (2007) Practical aspects of 1H transverse paramagnetic relaxation enhancement measurements on macromolecules. J. Magn. Reson., 184, 185–195. 10. Vlasie, M.D., Comuzzi, C., van den Nieuwendijk, A.M. et al. (2007) Long-range-distance NMR effects in a protein labeled with a Lanthanide-DOTA chelate. Chem. Eur. J., 13, 1715–1723. 11. Pintacuda, G., Moshref, A., Leonchiks, A. et al. (2004) Site-specific labelling with a metal chelator for protein-structure refinement. J. Biomol. NMR, 29, 351–361. 12. Caravan, P., Greenfield, M.T. and Bulte, J.W. (2001) Molecular factors that determine Curie spin relaxation in dysprosium complexes. Magn.
Reson Med., 46, 917–922. 13. Banci, L., Bertini, I., Marconi, S. and Pierattelli, R. (1993) 1H-NMR study of reduced heme proteins myoglobin and cytochrome P450. Eur. J. Biochem., 215, 431–437. 14. Arnesano, F., Banci, L. and Piccioli, M. (2005) NMR structures of paramagnetic metalloproteins. Q. Rev. Biophys., 38, 167–219. 15. Madhu, P.K., Grandori, R., Hohenthanner, K. et al. (2001) Geometry dependent two-dimensional heteronuclear multiplet effects in paramagnetic proteins. J. Biomol. NMR, 20, 31–37. 16. Reif, B., Diener, A., Hennig, M. et al. (2000) Cross-correlated relaxation for the measurement of angles between tensorial interactions. J. Magn. Reson., 143, 45–68. 17. Bertini, I., Luchinat, C., Parigi, G. and Pierattelli, R. (2005) NMR spectroscopy of paramagnetic metalloproteins. Chembiochem., 6, 1536–1549.
18. Tolman, J.R., Flanagan, J.M., Kennedy, M.A. and Prestegard, J.H. (1995) Nuclear magnetic dipole interactions in field-oriented proteins: Information for structure determination in solution. Proc. Natl. Acad. Sci. USA, 92, 9279–9283. 19. Prestegard, J.H. and Kishore, K.I. (2001) Partial alignment of biomolecules: an aid to NMR characterization. Curr. Opin. Chem. Biol., 5, 584–590. 20. Bertini, I., Del Bianco, C., Gelis, I. et al. (2004) Experimentally exploring the conformational space sampled by domain reorientation in calmodulin. Proc. Natl. Acad. Sci. USA, 101, 6841–6846. 21. Xu, X., Keizers, P.H., Reinle, W. et al. (2009) Intermolecular dynamics studied by paramagnetic tagging. J. Biomol. NMR, 43, 247–254. 22. Tjandra, N., Omichinski, J.G., Gronenborn, A.M. et al. (1997) Use of dipolar 1H-15N and 1H-13C couplings in the structure determination of magnetically oriented macromolecules in solution. Nat. Struct. Biol., 4, 732–738. 23. Tjandra, N., Grzesiek, S. and Bax, A. (1996) Magnetic field dependence of nitrogen-proton J splittings in 15N-enriched human ubiquitin resulting from relaxation interference and residual dipolar coupling. J. Am. Chem. Soc., 118, 6264–6272. 24. Tolman, J.R., Flanagan, J.M., Kennedy, M.A. and Prestegard, J.H. (1997) NMR evidence for slow collective motions in cyanometmyoglobin. Nat. Struct. Biol., 4, 292–297. 25. Biekofsky, R.R., Muskett, F.W., Schmidt, J.M. et al. (1999) NMR approaches for monitoring domain orientations in calcium-binding proteins in solution using partial replacement of Ca2+ by Tb3+. FEBS Lett., 460, 519–526. 26. Su, X.C., McAndrew, K., Huber, T. and Otting, G. (2008) Lanthanide-binding peptides for NMR measurements of residual dipolar couplings and paramagnetic effects from multiple angles. J. Am. Chem. Soc., 130, 1681–1687. 27. Keizers, P.H., Saragliadis, A., Hiruma, Y. et al.
(2008) Design, synthesis, and evaluation of a lanthanide chelating protein probe: CLaNP-5 yields predictable paramagnetic effects independent of environment. J. Am. Chem. Soc., 130, 14802–14812. 28. Gochin, M. and Roder, H. (1995) Protein structure refinement based on paramagnetic NMR shifts: applications to wild-type and mutant forms of cytochrome c. Protein Sci., 4, 296–305. 29. Phillips, W.D., Poe, M., McDonald, C.C. and Bartsch, R.G. (1970) Proton magnetic resonance studies of Chromatium high-potential iron protein. Proc. Natl. Acad. Sci. USA, 67, 682–687. 30. Busse, S.C., La Mar, G.N. and Howard, J.B. (1991) Two-dimensional NMR investigation of iron-sulfur cluster electronic and molecular structure of oxidized Clostridium pasteurianum ferredoxin. Interpretability of contact shifts in terms of cysteine orientation. J. Biol. Chem., 266, 23714–23723. 31. Kalverda, A.P., Salgado, J., Dennison, C. and Canters, G.W. (1996) Analysis of the paramagnetic copper(II) site of amicyanin by 1H NMR spectroscopy. Biochemistry, 35, 3085–3092. 32. Allegrozzi, M., Bertini, I., Janik, M.B. et al. (2000) Lanthanide-induced pseudocontact shifts for solution structure refinements of macromolecules in shells up to 40 Å from the metal ion. J. Am. Chem. Soc., 122, 4154–4161. 33. Otting, G. (2008) Prospects for lanthanides in structural biology by NMR. J. Biomol. NMR, 42, 1–9. 34. Pintacuda, G., Park, A.Y., Keniry, M.A. et al. (2006) Lanthanide labeling offers fast NMR approach to 3D structure determinations of protein-protein complexes. J. Am. Chem. Soc., 128, 3696–3702. 35. Iizuka, T. and Morishima, I. (1975) NMR studies of hemoproteins. VI. Acid-base transitions of ferric myoglobin and its imidazole complex. Biochim. Biophys. Acta, 400, 143–153. 36. Pintacuda, G., Keniry, M.A., Huber, T. et al. (2004) Fast structure-based assignment of 15N HSQC spectra of selectively 15N-labeled paramagnetic proteins. J. Am. Chem. Soc., 126, 2963–2970. 37. Grimaldi, J.J. and Sykes, B.D.
(1975) Concanavalin A: a stopped flow nuclear magnetic resonance study of conformational changes induced by Mn2+, Ca2+, and alpha-methyl-D-mannoside. J. Biol. Chem., 250, 1618–1624. 38. Inubushi, T., Ikeda-Saito, M. and Yonetani, T. (1983) Isotropically shifted NMR resonances for the proximal histidyl imidazole NH protons in cobalt hemoglobin and iron-cobalt hybrid hemoglobins. Binding of the proximal histidine toward porphyrin metal ion in the intermediate state of cooperative ligand binding. Biochemistry, 22, 2904–2907. 39. Beattie, J.K., Fensom, D.J., Freeman, H.C. et al. (1975) An NMR investigation of electron transfer in the copper-protein, plastocyanin. Biochim. Biophys. Acta, 405, 109–114. 40. Schejter, A., Lanir, A., Vig, I. and Cohen, J.S. (1978) Direct observation of the methionine residues of cytochrome c by 13C nuclear magnetic resonance spectroscopy. J. Biol. Chem., 253, 3768–3770. 41. Salgado, J., Jimenez, H.R., Moratal, J.M. et al. (1996) Paramagnetic cobalt and nickel derivatives of Alcaligenes denitrificans azurin and its M121Q mutant. A 1H NMR study. Biochemistry, 35, 1810–1819. 42. Gochin, M. (2000) A high-resolution structure of a DNA–chromomycin–Co(II) complex determined from pseudocontact shifts in nuclear magnetic resonance. Structure, 8, 441–452. 43. Gaponenko, V., Sarma, S.P., Altieri, A.S. et al. (2004) Improving the accuracy of NMR structures of large proteins using pseudocontact shifts as long-range restraints. J. Biomol. NMR, 28, 205–212. 44. Donaldson, L.W., Skrynnikov, N.R., Choy, W. et al. (2001) Structural characterization of proteins with an ATCUN motif by paramagnetic enhancement NMR spectroscopy. J. Am. Chem. Soc., 123, 9843–9847. 45. Tang, C., Iwahara, J. and Clore, G.M. (2006) Visualization of transient encounter complexes in protein-protein association. Nature, 444, 383–386. 46. Cotton, S. (2006) Lanthanide and Actinide Chemistry, John Wiley & Sons, Ltd, Chichester. 47. Franz, K.J., Nitz, M. and Imperiali, B. (2003) Lanthanide-binding tags as versatile protein coexpression probes. Chembiochem., 4, 265–271. 48. Martin, L.J., Hahnke, M.J., Nitz, M. et al. (2007) Double-lanthanide-binding tags: design, photophysical properties, and NMR applications. J. Am. Chem. Soc., 129, 7106–7113. 49. Griffin, J.H., Furie, B. and Schechter, A.N. (1975) Application of nuclear magnetic resonance spectroscopy to proteins. Biochimie, 57, 453–460. 50. Lee, L., Sykes, B.D. and Birnbaum, E.R. (1979) A determination of the relative compactness of the Ca2+-binding sites of a Ca2+-binding fragment of troponin-C and parvalbumin using lanthanide-induced 1H NMR shifts. FEBS Lett., 98, 169–172. 51. Gariepy, J., Sykes, B.D. and Hodges, R.S. (1983) Lanthanide-induced peptide folding: variations in lanthanide affinity and induced peptide conformation. Biochemistry, 22, 1765–1772. 52. Chantler, P.D. (1983) Lanthanides do not function as calcium analogues in scallop myosin. J. Biol. Chem., 258, 4702–4705. 53. Kretsinger, R.H. and Nockolds, C.E. (1973) Carp muscle calcium-binding protein. II. Structure determination and general description. J. Biol. Chem., 248, 3313–3326. 54. Bertini, I., Janik, M.B., Lee, Y.M. et al. (2001) Magnetic susceptibility tensor anisotropies for a lanthanide ion series in a fixed protein matrix. J. Am. Chem. Soc., 123, 4181–4188. 55. Bertini, I., Jimenez, B., Piccioli, M. and Poggi, L. (2005) Asymmetry in 13C-13C COSY spectra provides information on ligand geometry in paramagnetic proteins. J. Am. Chem. Soc., 127, 12216–12217. 56. Barden, J.A., Cooke, R., Wright, P.E. and dos Remedios, C.G. (1980) Proton nuclear magnetic resonance and electron paramagnetic resonance studies on skeletal muscle actin indicate that the metal and nucleotide binding sites are separate. Biochemistry, 19, 5912–5916. 57. Wang, X., Srisailam, S., Yee, A.A. et al. (2007) Domain-domain motions in proteins from time-modulated pseudocontact shifts. J. Biomol. NMR, 39, 53–61. 58. Donaire, A., Salgado, J. and Moratal, J.M. (1998) Determination of the magnetic axes of cobalt(II) and nickel(II) azurins from 1H NMR data: Influence of the metal and axial ligands on the origin of magnetic anisotropy in blue copper proteins. Biochemistry, 37, 8659–8673. 59. Scarselli, M., Bernini, A., Segoni, C. et al. (1999) Tendamistat surface accessibility to the TEMPOL paramagnetic probe. J. Biomol. NMR, 15, 125–133. 60. Yuan, T., Ouyang, H. and Vogel, H.J. (1999) Surface exposure of the methionine side chains of calmodulin in solution. J. Biol. Chem., 274, 8411–8420. 61. Pintacuda, G. and Otting, G. (2002) Identification of protein surfaces by NMR measurements with a paramagnetic Gd(III) chelate. J. Am. Chem. Soc., 124, 372–373.
62. Bernini, A., Spiga, O., Venditti, V. et al. (2006) NMR studies of lysozyme surface accessibility by using different paramagnetic relaxation probes. J. Am. Chem. Soc., 128, 9290–9291. 63. Griffith, O.H. and McConnell, H.M. (1966) A nitroxide-maleimide spin label. Proc. Natl. Acad. Sci. USA, 55, 8–11. 64. Anglister, J., Frey, T. and McConnell, H.M. (1985) NMR technique for assessing contributions of heavy and light chains to an antibody combining site. Nature, 315, 65–67. 65. Lee, Y.H., Currie, B.L. and Johnson, M.E. (1986) Interaction of a spin-labeled phenylalanine analogue with normal and sickle hemoglobins: detection of site-specific interactions through spin-label-induced 1H NMR relaxation. Biochemistry, 25, 5647–5654. 66. Cutting, B., Strauss, A., Fendrich, G. et al. (2004) NMR resonance assignment of selectively labeled proteins by the use of paramagnetic ligands. J. Biomol. NMR, 30, 205–210. 67. Lindfors, H.E., de Koning, P.E., Drijfhout, J.W. et al. (2008) Mobility of TOAC spin-labelled peptides binding to the Src SH3 domain studied by paramagnetic NMR. J. Biomol. NMR, 41, 157–167. 68. Folkers, P.J., van Duynhoven, J.P., van Lieshout, H.T. et al. (1993) Exploring the DNA binding domain of gene V protein encoded by bacteriophage M13 with the aid of spin-labeled oligonucleotides in combination with 1H-NMR. Biochemistry, 32, 9407–9416. 69. Cai, S., Zhu, L., Zhang, Z. and Chen, Y. (2007) Determination of the three-dimensional structure of the Mrf2-DNA complex using paramagnetic spin labeling. Biochemistry, 46, 4943–4950. 70. Ramos, A. and Varani, G. (1998) A new method to detect long-range protein-RNA contacts: NMR detection of electron-proton relaxation induced by nitroxide spin-labeled RNA. J. Am. Chem. Soc., 120, 10992–10993. 71. Kleerekoper, Q., Howarth, J.W., Guo, X. et al. (1995) Cardiac troponin I induced conformational changes in cardiac troponin C as monitored by NMR using site-directed spin and isotope labeling. Biochemistry, 34, 13343–13352. 72. 
Volkov, A.N., Worrall, J.A., Holtzmann, E. and Ubbink, M. (2006) Solution structure and dynamics of the complex between cytochrome c and cytochrome c peroxidase determined by paramagnetic NMR. Proc. Natl. Acad. Sci. USA, 103, 18945–18950. 73. Lord, S.T. and Breslow, E. (1980) Synthesis of peptide spin-labels that bind to neurophysin and their application to distance measurements within neurophysin complexes. Biochemistry, 19, 5593–5602. 74. Iwahara, J., Schwieters, C.D. and Clore, G.M. (2004) Ensemble approach for NMR structure refinement against 1H paramagnetic relaxation enhancement data arising from a flexible paramagnetic group attached to a macromolecule. J. Am. Chem. Soc., 126, 5879–5896. 75. Bettio, A., Gutewort, V., Pöppl, A. et al. (2002) Electron paramagnetic resonance backbone dynamics studies on spin-labelled neuropeptide Y analogues. J. Peptide Sci., 8, 671–682. 76. Jensen, M.R., Lauritzen, C., Dahl, S.W. et al. (2004) Binding ability of a HHP-tagged protein towards Ni2+ studied by paramagnetic NMR relaxation: The possibility of obtaining long-range structure information. J. Biomol. NMR, 29, 175–185. 77. Lee, L. and Sykes, B.D. (1980) Nuclear magnetic resonance determination of metal-proton distances in the EF site of carp parvalbumin using the susceptibility contribution to the line broadening of lanthanide-shifted resonances. Biochemistry, 19, 3208–3214. 78. Wöhnert, J., Franz, K.J., Nitz, M. et al. (2003) Protein alignment by a coexpressed lanthanide-binding tag for the measurement of residual dipolar couplings. J. Am. Chem. Soc., 125, 13338–13339. 79. Su, X.C., Huber, T., Dixon, N.E. and Otting, G. (2006) Site-specific labelling of proteins with a rigid lanthanide-binding tag. Chembiochem., 7, 1599–1604. 80. Feeney, J., Birdsall, B., Bradbury, A.F. et al. (2001) Calmodulin tagging provides a general method of using lanthanide induced magnetic field orientation to observe residual dipolar couplings in proteins in solution. J. Biomol.
NMR, 21, 41–48. 81. Ma, C. and Opella, S.J. (2000) Lanthanide ions bind specifically to an added ‘EF-hand’ and orient a membrane protein in micelles for solution NMR spectroscopy. J. Magn. Reson., 146, 381–384. 82. Tei, L., Baranyai, Z., Botta, M. et al. (2008) Synthesis and solution thermodynamic study of rigidified and functionalised EGTA derivatives. Org. Biomol. Chem., 6, 2361–2368.
83. Lewis, M.R. and Shively, J.E. (1998) Maleimidocysteineamido-DOTA derivatives: new reagents for radiometal chelate conjugation to antibody sulfhydryl groups undergo pH-dependent cleavage reactions. Bioconjug. Chem., 9, 72–86. 84. Leonov, A., Voigt, B., Rodriguez-Castaneda, F. et al. (2005) Convenient synthesis of multifunctional EDTA-based chiral metal chelates substituted with an S-mesylcysteine. Chem. Eur. J., 11, 3342–3348. 85. Lauffer, R.B. and Brady, T.J. (1985) Preparation and water relaxation properties of proteins labeled with paramagnetic metal chelates. Magn. Reson. Imaging, 3, 11–16. 86. Arano, Y., Uezono, T., Akizawa, H. et al. (1996) Reassessment of diethylenetriaminepentaacetic acid (DTPA) as a chelating agent for indium-111 labeling of polypeptides using a newly synthesized monoreactive DTPA derivative. J. Med. Chem., 39, 3451–3460. 87. Suh, J.Y., Tang, C. and Clore, G.M. (2007) Role of electrostatic interactions in transient encounter complexes in protein-protein association investigated by paramagnetic relaxation enhancement. J. Am. Chem. Soc., 129, 12954–12955. 88. Tang, C., Ghirlando, R. and Clore, G.M. (2008) Visualization of transient ultra-weak protein self-association in solution using paramagnetic relaxation enhancement. J. Am. Chem. Soc., 130, 4048–4056. 89. Iwahara, J. and Clore, G.M. (2006) Detecting transient intermediates in macromolecular binding by paramagnetic NMR. Nature, 440, 1227–1230. 90. Rodriguez-Castaneda, F., Haberz, P., Leonov, A. and Griesinger, C. (2006) Paramagnetic tagging of diamagnetic proteins for solution NMR. Magn. Reson. Chem., 44 Spec No, S10–S16. 91. Ikegami, T., Verdier, L., Sakhaii, P. et al. (2004) Novel techniques for weak alignment of proteins in solution using chemical tags coordinating lanthanide ions. J. Biomol. NMR, 29, 339–349. 92. Keizers, P.H., Desreux, J.F., Overhand, M. and Ubbink, M. (2007) Increased paramagnetic effect of a lanthanide protein probe by two-point attachment. J. Am. Chem.
Soc., 129, 9292–9293. 93. Su, X.C., Man, B., Beeren, S. et al. (2008) A dipicolinic acid tag for rigid lanthanide tagging of proteins and paramagnetic NMR spectroscopy. J. Am. Chem. Soc., 130, 10486–10487. 94. Spirlet, M.R., Rebizant, J., Desreux, J.F. and Loncin, M.F. (1984) Crystal and molecular structure of sodium aqua(1,4,7,10-tetraazacyclododecane-1,4,7,10-tetraacetato)europate(III) tetrahydrate Na+(EuDOTA.H2O)-.4H2O, and its relevance to NMR studies of the conformational behavior of the lanthanide complexes formed by the macrocyclic ligand DOTA. Inorg. Chem., 23, 359–363. 95. Prudencio, M., Rohovec, J., Peters, J.A. et al. (2004) A caged lanthanide complex as a paramagnetic shift agent for protein NMR. Chem. Eur. J., 10, 3252–3260. 96. Vlasie, M.D., Fernandez-Busnadiego, R., Prudencio, M. and Ubbink, M. (2008) Conformation of pseudoazurin in the 152 kDa electron transfer complex with nitrite reductase determined by paramagnetic NMR. J. Mol. Biol., 375, 1405–1415. 97. Schwieters, C.D., Kuszewski, J.J., Tjandra, N. and Clore, G.M. (2003) The Xplor-NIH NMR molecular structure determination package. J. Magn. Reson., 160, 65–73. 98. Lin, J., Abeygunawardana, C., Frick, D.N. et al. (1997) Solution structure of the quaternary MutT-M2+-AMPCPP-M2+ complex and mechanism of its pyrophosphohydrolase action. Biochemistry, 36, 1199–1211. 99. Güntert, P., Mumenthaler, C. and Wüthrich, K. (1997) Torsion angle dynamics for NMR structure calculation with the new program DYANA. J. Mol. Biol., 273, 283–298. 100. Jee, J. and Güntert, P. (2003) Influence of the completeness of chemical shift assignments on NMR structures obtained with automated NOE assignment. J. Struct. Funct. Genomics, 4, 179–189. 101. Bertini, I., Faraone-Mennella, J., Gray, H.B. et al. (2004) NMR-validated structural model for oxidized Rhodopseudomonas palustris cytochrome c(556). J. Biol. Inorg. Chem., 9, 224–230. 102. Banci, L., Bertini, I., Cavallaro, G. et al.
(2004) Paramagnetism-based restraints for Xplor-NIH. J. Biomol. NMR, 28, 249–261. 103. Dominguez, C., Boelens, R. and Bonvin, A.M. (2003) HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc., 125, 1731–1737.
104. van Dijk, A.D., Fushman, D. and Bonvin, A.M. (2005) Various strategies of using residual dipolar couplings in NMR-driven protein docking: application to Lys48-linked di-ubiquitin and validation against 15N-relaxation data. Proteins, 60, 367–381. 105. Zhang, W., Pochapsky, S.S., Pochapsky, T.C. and Jain, N.U. (2008) Solution NMR structure of putidaredoxin–cytochrome P450cam complex via a combined residual dipolar coupling–spin labeling approach suggests a role for Trp106 of putidaredoxin in complex formation. J. Mol. Biol., 384, 349–363. 106. Dosset, P., Hus, J.C., Marion, D. and Blackledge, M. (2001) A novel interactive tool for rigid-body modeling of multi-domain macromolecules using residual dipolar couplings. J. Biomol. NMR, 20, 223–231. 107. Valafar, H. and Prestegard, J.H. (2004) REDCAT: a residual dipolar coupling analysis tool. J. Magn. Reson., 167, 228–241. 108. Bryson, M., Tian, F., Prestegard, J.H. and Valafar, H. (2008) REDCRAFT: a tool for simultaneous characterization of protein backbone structure and motion from RDC data. J. Magn. Reson., 191, 322–334. 109. Schmitz, C., John, M., Park, A.Y. et al. (2006) Efficient chi-tensor determination and NH assignment of paramagnetic proteins. J. Biomol. NMR, 35, 79–87. 110. Schmitz, C., Stanton-Cook, M.J., Su, X.C. et al. (2008) Numbat: an interactive software tool for fitting Δχ-tensors to molecular coordinates using pseudocontact shifts. J. Biomol. NMR, 41, 179–189. 111. Bertini, I., Felli, I.C., Kastrau, D.H. et al. (1994) Sequence-specific assignment of the 1H and 15N nuclear magnetic resonance spectra of the reduced recombinant high-potential iron-sulfur protein I from Ectothiorhodospira halophila. Eur. J. Biochem., 225, 703–714. 112. Ubbink, M., Worrall, J.A., Canters, G.W. et al. (2002) Paramagnetic resonance of biological metal centers. Annu. Rev. Biophys. Biomol. Struct., 31, 393–422. 113. Jensen, M.R., Hass, M.A., Hansen, D.F. and Led, J.J.
(2007) Investigating metal-binding in proteins by nuclear magnetic resonance. Cell Mol. Life Sci., 64, 1085–1104. 114. Cheng, H. and Markley, J.L. (1995) NMR spectroscopic studies of paramagnetic proteins: iron-sulfur proteins. Annu. Rev. Biophys. Biomol. Struct., 24, 209–237. 115. Bertini, I., Luchinat, C., Parigi, G. and Pierattelli, R. (2008) Perspectives in paramagnetic NMR of metalloproteins. Dalton Trans., 3782–3790. 116. Hansen, D.F. and Led, J.J. (2006) Determination of the geometric structure of the metal site in a blue copper protein by paramagnetic NMR. Proc. Natl. Acad. Sci. USA, 103, 1738–1743. 117. Penfield, K.W., Gewirth, A.A. and Solomon, E.I. (1985) Electronic structure and bonding of the blue copper site in plastocyanin. J. Am. Chem. Soc., 107, 4519–4529. 118. Hansen, D.F. and Led, J.J. (2004) Mapping the electronic structure of the blue copper site in plastocyanin by NMR relaxation. J. Am. Chem. Soc., 126, 1247–1252. 119. Shulman, R.G., Wüthrich, K., Yamane, T. et al. (1969) Nuclear magnetic resonances of reconstituted myoglobins. Proc. Natl. Acad. Sci. USA, 63, 623–628. 120. Bentrop, D., Bertini, I., Cremonini, M.A. et al. (1997) Solution structure of the paramagnetic complex of the N-terminal domain of calmodulin with two Ce3+ ions by 1H NMR. Biochemistry, 36, 11605–11618. 121. Banci, L., Bertini, I., Bren, K.L. et al. (1996) The use of pseudocontact shifts to refine solution structures of paramagnetic metalloproteins: Met80Ala cyano-cytochrome c as an example. J. Biol. Inorg. Chem., 1, 117–126. 122. Ubbink, M., Ejdeback, M., Karlsson, B.G. and Bendall, D.S. (1998) The structure of the complex of plastocyanin and cytochrome f, determined by paramagnetic NMR and restrained rigid-body molecular dynamics. Structure, 6, 323–335. 123. Stark, J. and Powers, R. (2008) Rapid protein-ligand costructures using chemical shift perturbations. J. Am. Chem. Soc., 130, 535–545. 124. Pelletier, H. and Kraut, J.
(1992) Crystal structure of a complex between electron transfer partners, cytochrome c peroxidase and cytochrome c. Science, 258, 1748–1755. 125. Xu, X., Reinle, W., Hannemann, F. et al. (2008) Dynamics in a pure encounter complex of two proteins studied by solution scattering and paramagnetic NMR spectroscopy. J. Am. Chem. Soc., 130, 6395–6403.
Paramagnetic Tools in Protein NMR
219
126. Clore, G.M., Tang, C. and Iwahara, J. (2007) Elucidating transient macromolecular interactions using paramagnetic relaxation enhancement. Curr. Opin. Struct. Biol., 17, 603–616. 127. Hulsker, R., Baranova, M.V., Bullerjahn, G.S. and Ubbink, M. (2008) Dynamics in the transient complex of plastocyanin-cytochrome f from Prochlorothrix hollandica. J. Am. Chem. Soc., 130, 1985–1991. 128. Lange, O.F., Lakomek, N.A., Fares, C. et al. (2008) Recognition dynamics up to microseconds revealed from an RDC-derived ubiquitin ensemble in solution. Science, 320, 1471–1475. 129. Eichm€uller, C. and Skrynnikov, N.R. (2007) Observation of ms time-scale protein dynamics in the presence of Ln3 þ ions: application to the N-terminal domain of cardiac troponin C. J. Biomol. NMR, 37, 79–95. 130. Palmer, A.G., Kroenke, C.D. and Loria, J.P. (2001) Nuclear magnetic resonance methods for quantifying microsecond-to-millisecond motions in biological macromolecules. Methods Enzymol., 339, 204–238. 131. Liu, Y., Zhang, X., Yoshida, T. and La Mar, G.N. (2005) Solution 1H NMR characterization of the distal H-bond network and the effective axial field in the resting-state, high-spin ferric, substratebound complex of heme oxygenase from N. meningitidis. J. Am. Chem. Soc., 127, 6409–6422. 132. Tu, K. and Gochin, M. (1999) Structure determination by restrained molecular dynamics using NMR pseudocontact shifts as experimentally determined constraints. J. Am. Chem. Soc., 121, 9276–9285. 133. Horrocks, W.D. Jr and Sipe, J.P. 3rd, (1972) Lanthanide complexes as nuclear magnetic resonance structural probes: Paramagnetic anisotropy of shift reagent adducts. Science, 177, 994–996.
7 Structural and Dynamic Information on Ligand Binding Gordon Roberts
7.1 Introduction
The function of the vast majority of proteins in the cell depends upon their noncovalent interactions with other molecules, ranging from enzyme-substrate interactions to the array of protein-protein and protein-nucleic acid interactions involved in, for example, the regulation of transcription. NMR has proved very valuable in the study of the interaction of proteins with both small and large molecules. In this chapter the focus will be on the basic principles of the study of ligand binding by NMR and on applications to small molecule–protein interactions. Studies of macromolecular complexes are covered in Chapter 8. The use of NMR to study protein-ligand interactions is a vast area and, for reasons of space, many of the references herein are to reviews rather than to original papers; my apologies to those whose important contributions have not been cited directly. NMR can of course provide information on the structure of protein-ligand complexes, just as it can on the structure of the protein itself. However, the NMR spectrum is also sensitive to both the equilibrium and the kinetics of the binding process. These effects can provide valuable information in themselves but, most importantly, they must be understood in order to interpret the spectrum correctly.
Protein NMR Spectroscopy: Practical Techniques and Applications, First Edition. Edited by Lu-Yun Lian and Gordon Roberts. 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd.
7.2 Fundamentals of Exchange Effects on NMR Spectra
There are many situations in protein NMR where the nucleus being examined can exist in two or more environments and can exchange between them – for example, the protonated or unprotonated state of an ionisable group on an amino-acid side-chain, an enzyme with or without a substrate bound or a protein alone or complexed with another (see Chapter 8). The effects of these exchange processes on the spectrum depend upon their rate relative to the frequency difference in the NMR parameter being measured, as illustrated for a simple two-state case in Figure 7.1 (see also Chapter 8). In this case, a single nucleus is being observed, which can exist with equal probability in two different environments; the chemical shift of the nucleus differs by 637 Hz between the two environments. If the exchange rate between the two states is very slow, then we simply see two separate signals. When the molecule jumps between the two states, the precession frequency of the nucleus changes suddenly; these sudden changes in precession frequency cause increased dephasing of transverse magnetisation, and hence increased line-broadening, as shown in Figure 7.1. When the rate of exchange between the two states is of the same order as the chemical shift difference between them, the distinction in precession frequency is lost, and the two resonances coalesce into one broad signal at a chemical shift equal to the average of that of the two states. As the rate of exchange increases further, the resonance line sharpens; with more frequent jumps between the states, the lifetime of the nucleus in either state becomes too short for it to accumulate a significant phase difference between jumps.

Before proceeding to a more quantitative description of exchange effects, it is important to emphasise three fundamental characteristics of exchange effects on NMR spectra:

• the effects depend on the rate of the exchange process relative to the parameter being measured, whether it be chemical shift (as in Figure 7.1), relaxation, or coupling.
• by the same token, two different nuclei affected by the same exchange process can show different behaviour depending on, for example, the chemical shift change which they experience (see Figure 7.2 below).
• these effects on the NMR spectrum are observed while the system is in chemical equilibrium; this ability to measure kinetics in a system at equilibrium is an important advantage of NMR.
7.2.1 Definitions
As will be clear from the qualitative discussion above, a key parameter in the description of exchange effects on NMR spectra is the lifetime of a particular state, or the rate of exchange between them, defined as follows:

$$A \underset{k_{-1}}{\overset{k_{+1}}{\rightleftharpoons}} B$$

For a two-state first-order exchange the lifetime in state A is $\tau_A = 1/k_{+1}$, and the lifetime in state B is $\tau_B = 1/k_{-1}$.
Figure 7.1 Exchange effects on the lineshape of NMR spectra. Observed (a) and calculated (b) 188.3 MHz 19F-n.m.r. spectra of 3′,5′-difluoromethotrexate bound to L. casei dihydrofolate reductase. The sample contained slightly less than 1 molar equivalent of difluoromethotrexate, so that all the ligand is bound to the enzyme. The small sharp resonance marked X represents a small amount (<5 %) of the impurity N-(3′,5′-difluoro-4-methylaminobenzoyl)-L-glutamate. For each pair of experimental and calculated spectra the experimental temperature and ‘best-fit’ rate constant are indicated. The frequency axis is referred to the averaged bound difluoromethotrexate resonance as zero. Reproduced by permission from Clore et al., Biochem. J. (1984) 217, 659–666
For a second-order exchange, such as a ligand, L, binding to a protein, E, which will be the focus of this chapter, we need to distinguish between nuclei in the protein and the ligand molecule.

$$E + L \underset{k_{-1}}{\overset{k_{+1}}{\rightleftharpoons}} EL$$
For a nucleus in the ligand molecule, the lifetime in state EL is

$$\tau_{EL} = 1/k_{-1} \tag{7.1}$$

and the lifetime in state L is

$$\tau_{L} = 1/(k_{+1}[E]) \tag{7.2}$$

where [E] is the concentration of free protein and $k_{+1}[E]$ is a pseudo-first-order rate constant. Using the definition of the equilibrium constant for the reaction

$$K_d = \frac{[E][L]}{[EL]} = \frac{k_{-1}}{k_{+1}} \tag{7.3}$$

and substituting for [E] in Equation 7.2,

$$\tau_L = \frac{[L]}{k_{-1}[EL]} = \frac{p_L}{k_{-1}\,p^{L}_{EL}} \tag{7.4}$$

where

$$p_L = [L]/L_T \tag{7.5}$$

and

$$p^{L}_{EL} = [EL]/L_T = (1 - p_L), \tag{7.6}$$

with $L_T = [L] + [EL]$, represent the mole fractions of the ligand in the free state and in the complex, respectively.

For a nucleus in the protein molecule, we similarly have the lifetime in state EL,

$$\tau_{EL} = 1/k_{-1} \tag{7.7}$$

and the lifetime in state E,

$$\tau_E = \frac{[E]}{k_{-1}[EL]} = \frac{p_E}{k_{-1}\,p^{E}_{EL}} \tag{7.8}$$

where the mole fractions are

$$p_E = [E]/E_T \tag{7.9}$$

and

$$p^{E}_{EL} = [EL]/E_T = (1 - p_E), \tag{7.10}$$
with $E_T = [E] + [EL]$. Two points should be noted. First, from the definitions of $\tau_L$ and $\tau_E$ (Equations 7.4 and 7.8), the lifetimes, and hence the exchange effects on the spectrum, will depend on the concentration of the ligand. Secondly, while the form of the equations for the lifetimes is the same for ligand and protein nuclei, at any given concentration of protein and ligand the fractions of ligand and of protein in the two states will generally be different (cf. Equations 7.5, 7.6, 7.9 and 7.10), so that the lifetimes relevant to protein and ligand nuclei, and hence their exchange behaviour, will be different. In many discussions of exchange effects, a single exchange rate (or lifetime) is used to characterise the process: for a ligand nucleus

$$k_{ex} = 1/\tau = 1/\tau_{EL} + 1/\tau_L = k_{-1}/p_L \tag{7.11}$$

and for a protein nucleus

$$k_{ex} = 1/\tau = 1/\tau_{EL} + 1/\tau_E = k_{-1}/p_E \tag{7.12}$$
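As a rough numerical illustration of these definitions, the following Python sketch computes the lifetimes and the single exchange rate for ligand and protein nuclei at one point in a titration. The function names and the concentrations and rate constants are assumptions chosen only for illustration; the complex concentration is obtained from the mass-balance quadratic that follows from the definition of the dissociation constant.

```python
import math

def complex_concentration(E_T, L_T, K_d):
    """[EL] from mass balance and K_d = [E][L]/[EL] (physically meaningful root)."""
    s = E_T + L_T + K_d
    return (s - math.sqrt(s * s - 4.0 * E_T * L_T)) / 2.0

def exchange_parameters(E_T, L_T, k_on, k_off):
    """Lifetimes (s) and k_ex (1/s) for ligand and protein nuclei.

    k_on is k+1 in 1/(M s), k_off is k-1 in 1/s; concentrations are in M.
    """
    EL = complex_concentration(E_T, L_T, k_off / k_on)
    E, L = E_T - EL, L_T - EL          # free protein and free ligand
    p_L, p_E = L / L_T, E / E_T        # Equations 7.5 and 7.9
    return {
        "tau_EL": 1.0 / k_off,                  # Equations 7.1 and 7.7
        "tau_L": p_L / (k_off * (1.0 - p_L)),   # Equation 7.4
        "tau_E": p_E / (k_off * (1.0 - p_E)),   # Equation 7.8
        "kex_ligand": k_off / p_L,              # Equation 7.11
        "kex_protein": k_off / p_E,             # Equation 7.12
    }

# Illustrative values only: 1 mM protein, 2 mM ligand, K_d = 10 uM.
params = exchange_parameters(E_T=1e-3, L_T=2e-3, k_on=1e7, k_off=100.0)
for name, value in params.items():
    print(f"{name}: {value:.4g}")
```

With ligand in excess, most of the protein is bound while about half the ligand is free, so the same dissociation rate constant gives a much larger value of the exchange rate for protein nuclei than for ligand nuclei.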
In describing exchange effects on NMR spectra, three ‘exchange regimes’ can be broadly defined by considering the lifetimes of each state (or the rate of interconversion between them), relative to the difference in the NMR parameters, defining an ‘NMR timescale’. Thus, slow exchange is defined by $k_{ex} \ll |P_A - P_B|$, where $P_A$ and $P_B$ represent the values of the NMR parameters (such as chemical shift, scalar coupling or relaxation rate) in the two states. Intermediate exchange is defined by $k_{ex} \approx |P_A - P_B|$, and fast exchange by $k_{ex} \gg |P_A - P_B|$. It is important to recognise that the criteria for fast and slow exchange must be applied separately to the different NMR parameters. For example, if exchange is slow on the relaxation rate timescale, two separate resonances with different linewidths will always be present, regardless of whether these resonances are in fact resolved. Note that the chemical shift is for these purposes expressed in frequency units (Hertz or radian s⁻¹) rather than in the dimensionless ppm units, and the appearance of the spectrum will be different at different magnetic field strengths, spectra at higher fields showing slower exchange behaviour. Typical chemical shift differences are of the order of a few hundreds of Hertz; linewidth (transverse relaxation rate) differences will be several tens of Hertz in diamagnetic systems, and much larger in systems with a paramagnetic centre (see Chapter 6). Differences in scalar coupling – for example, in three-bond couplings which change as a result of changes in ligand conformation on binding – are small, almost always in fast exchange, and difficult to measure. The nature of the experiment will depend to a large extent upon which of these three exchange regimes obtains in a given situation, so it is important to begin by establishing this from an initial qualitative analysis of the spectrum.
To do this, we must consider the way in which the lineshape depends on the relative populations of the two states and on the rate of exchange between them.

7.2.2 Lineshape
In the absence of scalar coupling effects, the NMR lineshape in the presence of exchange can be described by a modification of the Bloch equations, as originally described by McConnell [1]. When scalar couplings change as a result of the exchange process the density operator formalism must be used [2–6]; the review by Bain [6] contains a useful comparison of the McConnell and density operator approaches. For the present illustrative purposes, we will simply present the result for a two-site exchange of an isolated spin (i.e. with no scalar coupling effects); succinct derivations are given by Bain [6] and in Chapter 4 of Cavanagh et al. [7]. In this case the lineshape (intensity of absorption as a function of frequency, $\nu$) is given by the imaginary part of the complex quantity $G(\nu)$; for a ligand resonance:
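$$G(\nu) = -iC\,\frac{2 p_L p^{L}_{EL}\tau - \tau^2\left(p_L\alpha_{EL} + p^{L}_{EL}\alpha_L\right)}{p_L p^{L}_{EL} - \tau^2\alpha_L\alpha_{EL}} \tag{7.13}$$

where C is a scaling factor, $\tau$ and the relative populations of the two states are defined above,

$$\alpha_L = -\left[2\pi i(\nu_L - \nu) + R_{2,L} + p^{L}_{EL}/\tau\right] \tag{7.14}$$

and

$$\alpha_{EL} = -\left[2\pi i(\nu_{EL} - \nu) + R_{2,EL} + p_L/\tau\right] \tag{7.15}$$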
with $\nu_i$ and $R_{2,i}$ ($i$ = L, EL) being the chemical shift and the transverse relaxation rate in the absence of exchange for the nucleus in the two states. A number of programs are available which carry out iterative lineshape fitting ([5] and references therein; [8]). In the situation illustrated in Figure 7.1 the two states are by definition equally populated and the linewidth in the absence of exchange is the same in both, so that the only variable in the simulated spectra shown in Figure 7.1 is the rate of exchange between them. In slow exchange, when $k_{ex} \ll 2\pi|\nu_A - \nu_B|$, separate signals are observed at the chemical shifts characteristic of the two states. At the other extreme, in fast exchange, when $k_{ex} \gg 2\pi|\nu_A - \nu_B|$, a single resonance is observed at the average chemical shift, $(\nu_A + \nu_B)/2$. The transition between the two, when the spectrum shows a broad peak (centred at the average chemical shift) with a flat top, is referred to as the coalescence point. At this point, the rate $k_{coalescence} \approx \pi|\nu_A - \nu_B|/\sqrt{2}$ (or for a second-order exchange process $k_{coalescence} \approx \pi|\nu_A - \nu_B|/(p_A p_B)^{1/2}$). It should be noted that in intermediate exchange, around the coalescence point, the appearance of the spectrum is extremely sensitive to the rate of exchange; as can be seen from Figure 7.1, a change in rate of only a factor of 2 is sufficient to go from a spectrum with two quite well-defined peaks, through the coalescence point, to a spectrum with a single, albeit broad, peak. In ligand binding experiments, where increasing concentrations of ligand are commonly added to a fixed concentration of protein, we obviously need to consider the effects of changes in the relative populations. From Equations 7.4 and 7.8, the lifetimes of the two states will also change as a function of the ligand concentration and consequently changes in lineshape of resonances will often be observed in the course of a ligand titration.
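The dependence of the lineshape on the exchange rate can be explored numerically. The sketch below is a minimal illustration, not one of the fitting programs cited above: it evaluates the steady-state solution of the McConnell equations for two sites, written in terms of rate constants rather than $\tau$ (with $k_{ex} = 1/\tau$), and takes the absorption as the real part because the overall factor of $i$ and the scaling constant have been dropped. The function names, the 20 s⁻¹ relaxation rate, the frequency grid and the trial exchange rates are all assumptions for illustration.

```python
import math

def two_site_lineshape(nu, nu_A, nu_B, p_A, k_ex, R2_A, R2_B):
    """Absorption intensity (arbitrary units) at frequency nu (Hz)."""
    p_B = 1.0 - p_A
    k_AB = p_B * k_ex      # rate of leaving state A, i.e. p_B / tau
    k_BA = p_A * k_ex      # rate of leaving state B, i.e. p_A / tau
    a_A = 2j * math.pi * (nu_A - nu) + R2_A + k_AB
    a_B = 2j * math.pi * (nu_B - nu) + R2_B + k_BA
    # Sum of the steady-state transverse magnetisations of the two sites.
    G = (p_A * a_B + p_B * a_A + 2.0 * p_A * p_B * k_ex) / (a_A * a_B - k_AB * k_BA)
    return G.real

def count_peaks(k_ex, delta_nu=637.0, R2=20.0):
    """Number of local maxima for two equally populated sites delta_nu apart."""
    freqs = [-600.0 + 2.0 * i for i in range(601)]
    y = [two_site_lineshape(f, -delta_nu / 2, delta_nu / 2, 0.5, k_ex, R2, R2)
         for f in freqs]
    return sum(1 for i in range(1, len(y) - 1) if y[i - 1] < y[i] > y[i + 1])

# Slow, intermediate and fast exchange for a 637 Hz shift difference.
for k_ex in (50.0, 2000.0, 50000.0):
    print(f"k_ex = {k_ex:g} s^-1: {count_peaks(k_ex)} peak(s)")
```

Plotting the computed intensities for a series of exchange rates between these extremes gives a family of curves of the kind shown in Figure 7.1.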
In slow exchange, as the ligand concentration is increased the intensities of the resonances corresponding to a given nucleus in the two states will simply change as a function of the relative populations of the relevant species. At the other extreme, in very fast exchange, a single resonance will be observed at a chemical shift which is the weighted average of the shifts in the two states (see Equation 7.20 below); its position will therefore change as a function of ligand concentration. In both exchange regimes there will also be linewidth changes, as a result of the concentration-dependence of the lifetimes. An example of this for the fast exchange case is given in Figure 7.2, which shows a superposition of a series of ¹H-¹⁵N HSQC spectra of the calcium-binding regulatory protein S100B, alone and with increasing concentrations of a peptide corresponding to its binding site on the actin-capping protein CapZ [9]. Three residues whose ¹H and ¹⁵N chemical shifts change on peptide binding are indicated; in all three cases exchange is fast on the chemical shift timescale. For Glu72, the chemical shift changes are small, and a simple progressive movement of the cross-peak from $\nu_E$ to $\nu_{EL}$ is seen, indicative of very fast exchange. This is also the case for Ser62, where the shift changes are larger, but here the progressive shift of the cross-peak is accompanied by a clear broadening at intermediate concentrations. Finally, for Ala78, which shows the largest changes in ¹H and ¹⁵N shifts, the exchange broadening at intermediate peptide concentrations is such that the peak disappears completely. Figure 7.2 also illustrates the key point that the ‘NMR timescale’ is a relative one. For a given equilibrium, while the rate constants are indeed constant, different resonances will show different exchange behaviour – here differences in the degree of line-broadening – depending on how much their chemical shifts differ between the free and bound states.

Figure 7.2 Ligand titration in fast exchange. An expanded region of the overlaid HSQC spectra from the titration of S100B with a peptide from CapZ. The protein concentration was 1 mM and the peptide concentrations ranged from 0 to 3 mM. Chemical-shift changes for Ala-78, Glu-72, and Ser-62 are indicated. Copyright (1997) Wiley. Reproduced by permission from Kilby et al., Protein Science (1997) 6, 2494–2503

7.2.3 Identification of the Exchange Regime
The general precautions regarding sample preparation outlined in Chapter 1 of course apply when studying exchange effects, but it is worth reiterating that, in view of the sensitivity of rate processes to pH and temperature, these must be accurately controlled. It is also important to determine the concentrations of protein and ligand as accurately as possible. Even for the initial qualitative analysis it is very valuable to be sure when a ligand:protein ratio of 1 has been reached, while when rate or equilibrium constants are being measured the results will only be as accurate as the concentrations. The usual experiment consists of adding ligand progressively in small concentration increments to a solution of protein.¹ It is important to begin this ligand titration at concentrations much less than that of the protein (approximately 0.1–0.2 molar equivalents), to make a number of additions at <1 molar equivalent and to continue until at least 5 molar equivalents of ligand have been added.² After each addition of ligand, a spectrum is acquired; a simple 1D ¹H spectrum will provide preliminary information about both ligand and protein resonances, while a ¹H-¹⁵N HSQC spectrum will provide more detailed information about a labelled protein (cf. Figure 7.2). A number of possible changes in the spectrum may be observed (see also Chapter 8):

• a new resonance appears, increases in intensity and shifts progressively as the ligand concentration is increased, approaching the chemical shift of a ligand resonance at high ligand concentrations – a ligand resonance in fast exchange.
• a resonance shifts progressively without change in intensity – a protein resonance in fast exchange.
• in either of these cases, the resonance may broaden as well as shifting if the exchange is only ‘moderately fast’.
• an existing resonance (from the protein) or a new resonance (from the ligand), which is sharp at low ligand concentrations, broadens markedly as the ligand concentration is increased, often disappearing at $L_T/E_T \approx 2$; a signal reappears at high ligand concentrations, perhaps shifting slightly – intermediate exchange.
• a new resonance appears and increases in intensity without shifting – a resonance from either ligand or protein in slow exchange.
Note that during a ligand titration experiment, the signal from the species with the lower population will be preferentially broadened and may be undetectable; it is therefore important to cover the full range of ligand:protein ratios to ensure that the correct identification of the exchange regime is made. In addition, the dependence of the linewidth on the external magnetic field strength can be a very valuable indication of the exchange regime [10]. Bearing in mind the relativity of the ‘NMR timescale’, it is useful to estimate the range of rates which will produce visible exchange effects. For chemical shift differences in the range 10–1000 Hz, intermediate exchange behaviour will be observed for dissociation rate constants in the range 10–10⁴ s⁻¹; assuming that the association rate constant is diffusion-limited, 10⁷–10⁸ M⁻¹ s⁻¹, this will correspond to dissociation constants in the range 10⁻³–10⁻⁷ M.
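The order-of-magnitude estimate above is easy to reproduce. The following sketch classifies the exchange regime on the chemical shift timescale and converts rate constants into a dissociation constant; the factor of 10 used to separate ‘slow’ and ‘fast’ from ‘intermediate’ is an arbitrary threshold assumed here for illustration, not a value taken from the text, and the function names are hypothetical.

```python
import math

def exchange_regime(k_ex, delta_nu_hz, factor=10.0):
    """Classify exchange by comparing k_ex (1/s) with 2*pi*|delta nu| (rad/s).

    'factor' is an assumed, arbitrary margin defining 'much less/greater than'.
    """
    ratio = k_ex / (2.0 * math.pi * abs(delta_nu_hz))
    if ratio < 1.0 / factor:
        return "slow"
    if ratio > factor:
        return "fast"
    return "intermediate"

def dissociation_constant(k_off, k_on):
    """K_d (M) from k-1 (1/s) and k+1 (1/(M s))."""
    return k_off / k_on

# The extremes quoted in the text for diffusion-limited association.
print(dissociation_constant(1e4, 1e7))   # fastest dissociation, slowest association
print(dissociation_constant(10.0, 1e8))  # slowest dissociation, fastest association
print(exchange_regime(600.0, 100.0))     # k_ex comparable to 2*pi*100 Hz
```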
¹ It is of course possible to do the inverse experiment, adding protein progressively to a ligand solution, but this is much more difficult to do satisfactorily and is best avoided if at all possible.

² In order to minimise the number of manipulations of the sample and the likelihood of systematic errors (e.g. from pH changes), there are advantages in carrying out the experiment by successive mixing of a sample of free protein and one of protein saturated with ligand, as described in Chapters 1 and 8.
7.3 Measurement of Equilibrium and Rate Constants
The measurement of exchange rates by the analysis of the lineshape of NMR signals has a long and distinguished history; supplemented by homonuclear magnetisation exchange experiments, it remains valuable in studies of protein-ligand interactions if the resonances of interest are adequately resolved. However, over the last decade a number of new heteronuclear experiments, notably 2D zz-exchange spectroscopy and relaxation dispersion measured by CPMG or $R_{1\rho}$ experiments, have been developed which allow exchange effects to be measured on many resonances of the protein and which cover exchange rates from 0.1 s⁻¹ to 10⁵ s⁻¹, thus providing a wealth of mechanistic and structural information.

7.3.1 Lineshape Analysis
In principle, measurements of the lineshape (of either a ligand or a protein resonance) as a function of ligand concentration can, by least-squares fitting, provide estimates of the exchange rate constants, the relaxation rates and the populations (and hence the equilibrium constant). In fact, while this can be done [11], it is rarely practical, unless the protein is small or specific isotope labelling (or incorporation of a fluorinated amino-acid) makes it possible to isolate a single resonance. Signal overlap is an obvious problem, as is the limiting signal-to-noise ratio of the broad signals – in a ligand titration experiment, where populations are unequal, the signal from the species with the lower population will be preferentially broadened and may be undetectable. Finally, ¹³C and ¹⁵N spectra of macromolecules are usually recorded in an indirect dimension of multidimensional ¹H-detected NMR experiments, and the limited digital resolution may make accurate lineshape analysis impossible. If intermediate exchange is observed, it is generally preferable to change to a different spectrometer frequency (or to change the temperature a few degrees) to shift the system into slower or faster exchange, where there are useful simplifications which make analysis straightforward.

7.3.1.1 Slow Exchange

This is defined by $k_{ex} \ll 2\pi|\nu_A - \nu_B|$ on the chemical shift timescale and $k_{ex} \ll |R_{i,A} - R_{i,B}|$ on the relaxation timescale, where A and B are, respectively, the bound and free states for either the ligand or the protein. There are separate resonances for each state at chemical shifts (for the protein) of $\nu_E$ and $\nu_{EL}$, or (for the ligand) of $\nu_L$ and $\nu_{EL}$. These signals will change in intensity but not in chemical shift as the ligand concentration increases. In principle, these changes in relative intensity can be used to estimate the dissociation constant; in practice this is only of any value under very restrictive conditions.
First, the signals must be sufficiently well-resolved to allow accurate intensity measurements; this can be achieved by appropriate isotope labelling – for example, using perdeuterated protein to study ligand resonances. Second, the signal-to-noise ratio must be sufficient to allow accurate intensity measurements at low ligand concentrations. This latter problem is compounded by the fact that accurate measurement of $K_d$ requires that $E_T < K_d$. Slow exchange implies a slow rate of dissociation and a relatively low value of $K_d$ and hence the need for a low protein concentration. For example, for a chemical shift difference of 100 Hz, observation of slow exchange would imply that $k_{-1} \le 300$ s⁻¹, so that (assuming a diffusion-limited association rate) $K_d \le 3 \times 10^{-5}$ M and one would need to make accurate intensity measurements at protein concentrations of 5 μM. It is therefore very rarely practical to make reliable estimates of $K_d$ from spectra in the slow exchange regime. Turning to the linewidth, the transverse relaxation rates of the two ligand resonances will be given by:

$$R_{2,EL(obs)} = R_{2,EL} + k_{-1} \tag{7.16}$$

$$R_{2,L(obs)} = R_{2,L} + k_{-1}\,p^{L}_{EL}/(1 - p^{L}_{EL}) \tag{7.17}$$
with analogous equations for protein resonances. There is thus an exchange contribution to the relaxation rate, and hence the linewidth. In the case of the signal from the free ligand, the linewidth will change as a function of the relative populations, becoming narrower as the ligand concentration increases. The value of $p^{L}_{EL}$ at each ligand concentration can be calculated from an independently determined value of $K_d$:

$$p^{L}_{EL} = \frac{[EL]}{L_T} = \frac{(E_T + L_T + K_d) - \sqrt{(E_T + L_T + K_d)^2 - 4E_T L_T}}{2L_T} \tag{7.18}$$

and a plot of linewidth (or $R_{2,L(obs)}$) against $p^{L}_{EL}/(1 - p^{L}_{EL})$ will have a slope of $k_{-1}$. In practice, it will often be the case that $L_T > E_T > K_d$, and then

$$p^{L}_{EL}/(1 - p^{L}_{EL}) \approx \frac{E_T}{L_T - E_T} \tag{7.19}$$
so that $k_{-1}$ can be determined without prior knowledge of $K_d$. The two terms in Equations 7.16 and 7.17 have opposite temperature dependence; the ‘natural’ linewidth will decrease with increasing temperature, while $k_{-1}$, and hence the exchange contribution to the linewidth, will increase with increasing temperature (for an example see [12]); this provides a useful diagnostic for the existence of a significant exchange contribution.

7.3.1.2 Fast Exchange

In fast exchange, defined by $k_{ex} \gg 2\pi|\nu_A - \nu_B|$ on the chemical shift timescale, a single average resonance is observed. This will not change in intensity as the ligand concentration increases but its chemical shift will be the weighted average of the shifts in the two states; for a ligand resonance, when exchange is very fast:

$$\nu_{obs} = \nu_L p_L + \nu_{EL}\,p^{L}_{EL} \tag{7.20}$$

with a corresponding equation for protein resonances. From Equations 7.18 and 7.20,

$$\nu_{obs} - \nu_L = (\nu_{EL} - \nu_L)\,\frac{(E_T + L_T + K_d) - \sqrt{(E_T + L_T + K_d)^2 - 4E_T L_T}}{2L_T} \tag{7.21}$$
and fitting Equation 7.21 to the data ($\nu_{obs}$ as a function of $L_T$) will provide estimates of $K_d$ and $\nu_{EL}$. This is the commonest and generally the most satisfactory NMR method for determining $K_d$; for a full discussion of this topic see [13]. If $k_{ex} \gg |R_{2,L} - R_{2,EL}|$ on the relaxation timescale, the transverse relaxation rate of the single ligand resonance is given by

$$R_{2,obs} = R_{2,L}\,p_L + R_{2,EL}\,p^{L}_{EL} + p_L\,p^{L}_{EL}\,\Delta\omega^2/k_{ex} \tag{7.22}$$

where $\Delta\omega = (\omega_{EL} - \omega_L)$. Equations analogous to (7.20)–(7.22) will describe the behaviour of protein resonances. The first two terms in Equation 7.22 represent the weighted average of the relaxation rates in the free and bound states, while the third term represents an exchange contribution to the linewidth. This third term will depend upon the magnitude of $\Delta\omega^2/k_{ex}$. When this is small (fast exchange and/or small chemical shift difference; ‘very fast exchange’) the third term will be negligible and $R_2$ will simply be the weighted average of the values in the free and bound states.

$$R_{2,obs} = R_{2,L}\,p_L + R_{2,EL}\,p^{L}_{EL} \tag{7.23}$$
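The fast-exchange relationships can be put to work in a few lines of code. The sketch below, offered as an illustration under stated assumptions rather than as a recommended fitting procedure, generates noise-free synthetic shift data from Equation 7.21, recovers $K_d$ by a simple grid search (the bound-state shift difference follows by linear least squares at each trial $K_d$), and evaluates Equation 7.22 using $k_{ex} = k_{-1}/p_L$ from Equation 7.11. All function names, concentrations, shifts and rates are hypothetical values chosen for illustration; real data would need a proper non-linear fit with error analysis.

```python
import math

def fraction_bound(E_T, L_T, K_d):
    """Mole fraction of ligand in the complex, Equation 7.18."""
    s = E_T + L_T + K_d
    return (s - math.sqrt(s * s - 4.0 * E_T * L_T)) / (2.0 * L_T)

def fit_kd(E_T, L_T_values, shifts, kd_grid):
    """Grid search over K_d for data of the form nu_obs - nu_L (Equation 7.21)."""
    best = None
    for K_d in kd_grid:
        f = [fraction_bound(E_T, L_T, K_d) for L_T in L_T_values]
        # Equation 7.21 is linear in (nu_EL - nu_L), so solve for it directly.
        d_max = sum(fi * si for fi, si in zip(f, shifts)) / sum(fi * fi for fi in f)
        sse = sum((si - d_max * fi) ** 2 for fi, si in zip(f, shifts))
        if best is None or sse < best[0]:
            best = (sse, K_d, d_max)
    return best[1], best[2]

def r2_observed(p_EL, R2_L, R2_EL, delta_nu_hz, k_off):
    """Equation 7.22 for a ligand resonance, with k_ex = k_off / p_L."""
    p_L = 1.0 - p_EL
    delta_omega = 2.0 * math.pi * delta_nu_hz
    return R2_L * p_L + R2_EL * p_EL + p_L * p_EL * delta_omega ** 2 / (k_off / p_L)

# Synthetic titration: 0.1 mM protein, K_d = 0.1 mM, bound shift 50 Hz from free.
E_T, true_kd, true_shift = 1e-4, 1e-4, 50.0
L_T_values = [0.2e-4 * n for n in range(1, 51)]
shifts = [true_shift * fraction_bound(E_T, L_T, true_kd) for L_T in L_T_values]
kd_grid = [10 ** (-6 + 0.05 * i) for i in range(81)]
kd_fit, shift_fit = fit_kd(E_T, L_T_values, shifts, kd_grid)
print(f"K_d = {kd_fit:.3g} M, nu_EL - nu_L = {shift_fit:.3g} Hz")

# Position of the maximum exchange broadening when R2 is equal in both states.
fractions = [i / 100 for i in range(1, 100)]
p_max = max(fractions, key=lambda p: r2_observed(p, 10.0, 10.0, 100.0, 1000.0))
print(f"largest exchange contribution at p_EL = {p_max:.2f}")
```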
When $\Delta\omega^2/k_{ex}$ is not small, sometimes called ‘moderately fast exchange’, the magnitude of the third term in Equation 7.22 will depend on the ligand concentration, being greatest at $p^{L}_{EL} \approx 1/3$. Again, the exchange rate can be measured by fitting the linewidth as a function of ligand concentration to Equation 7.22; as always, it is important to make measurements over as wide a range of ligand concentration as possible. For large proteins, a large value of $R_{2,EL}$ may partly ‘mask’ the exchange contribution and an actual maximum in linewidth may not be observed. In practice, relaxation experiments of the kind described below in Section 7.3.3 are better ways of measuring this exchange contribution, and hence the exchange rate, than linewidth measurements. However, it remains important to measure the linewidth as a function of ligand concentration during ligand titrations intended to determine $K_d$ using Equation 7.21, since the simple relationship of Equation 7.20 applies only in very fast exchange. If the third term in Equation 7.22 is significant, this is diagnostic of moderately fast exchange, where averaging of chemical shifts is not totally complete, and inappropriate application of Equations 7.20 and 7.21 can lead to significant systematic errors in the estimated $K_d$ [14].

7.3.2 Magnetisation Transfer Experiments
In these experiments, which depend upon the nucleus of interest being in slow exchange on the chemical shift timescale between the two states so that separate resonances are observed, a selective perturbation of a nucleus in one state is transferred to the same nucleus in the other state by the exchange process;³ this transfer depends upon the rate of exchange being at least comparable to the relaxation rate of the spin. The first two kinds of experiment to be described, saturation transfer and inversion transfer, are limited by the need for selective pulses and hence by the need for one of the resonances of interest to be adequately resolved in the spectrum; they tend to be restricted to ligand resonances. Two-dimensional exchange experiments, on the other hand, can be applied to both protein and ligand resonances.

³ It is important to distinguish between magnetisation transfer experiments in which the transfer is simply an exchange step linking the same nucleus (ligand or protein) in different states, and those in which in addition there is an intermolecular transfer, magnetisation being transferred by dipolar interactions between ligand and protein nuclei. Only the former are discussed in this section; for the latter, see Section 7.5.1.

The time-course of the effects observed in a magnetisation transfer experiment can be described by McConnell’s modification of the Bloch equations [6,15,16]. To illustrate the principle of the methods, we consider only an isolated spin undergoing a two-state exchange process; extensions to multisite exchange have been described [17]. For a ligand resonance:

$$dM_L/dt = -(M_L - M_L^{e})R_{1,L} - k_{+1}[E]M_L + k_{-1}M_{EL} \tag{7.24}$$

$$dM_{EL}/dt = -(M_{EL} - M_{EL}^{e})R_{1,EL} - k_{-1}M_{EL} + k_{+1}[E]M_L \tag{7.25}$$
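where $M_L$ and $M_{EL}$ are the instantaneous magnetisations of the nucleus in the two states, and $M_L^{e}$ and $M_{EL}^{e}$ are the corresponding equilibrium magnetisations. The general solution to these equations has the form of a double exponential:

$$M_L(t) = C_1\exp(\lambda_+ t) + C_2\exp(\lambda_- t) + M_L^{e} \tag{7.26}$$

$$M_{EL}(t) = C_3\exp(\lambda_+ t) + C_4\exp(\lambda_- t) + M_{EL}^{e} \tag{7.27}$$

where the apparent rate constants $\lambda_+$ and $\lambda_-$ are each combinations of $k_{+1}$, $k_{-1}$, $R_{1,L}$ and $R_{1,EL}$. Using the shorthand notation $k_L = k_{+1}[E] + R_{1,L}$ and $k_{EL} = k_{-1} + R_{1,EL}$, and with $M_L^{0}$ and $M_{EL}^{0}$ representing the magnetisations immediately after the perturbation,

$$\lambda_+ = \tfrac{1}{2}\left\{-(k_L + k_{EL}) + \left[(k_L - k_{EL})^2 + 4k_{+1}k_{-1}[E]\right]^{1/2}\right\} \tag{7.28}$$

$$\lambda_- = \tfrac{1}{2}\left\{-(k_L + k_{EL}) - \left[(k_L - k_{EL})^2 + 4k_{+1}k_{-1}[E]\right]^{1/2}\right\} \tag{7.29}$$

$$C_1 = \left[(\lambda_- + k_L)(M_L^{e} - M_L^{0}) - k_{-1}(M_{EL}^{e} - M_{EL}^{0})\right]/(\lambda_+ - \lambda_-) \tag{7.30}$$

$$C_2 = \left[-(\lambda_+ + k_L)(M_L^{e} - M_L^{0}) + k_{-1}(M_{EL}^{e} - M_{EL}^{0})\right]/(\lambda_+ - \lambda_-) \tag{7.31}$$

$$C_3 = \left[-k_{+1}[E](M_L^{e} - M_L^{0}) + (\lambda_- + k_{EL})(M_{EL}^{e} - M_{EL}^{0})\right]/(\lambda_+ - \lambda_-) \tag{7.32}$$

$$C_4 = \left[k_{+1}[E](M_L^{e} - M_L^{0}) - (\lambda_+ + k_{EL})(M_{EL}^{e} - M_{EL}^{0})\right]/(\lambda_+ - \lambda_-) \tag{7.33}$$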
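A useful check on any closed-form expression for the time course is to compare it with a direct numerical integration of Equations 7.24 and 7.25. The sketch below does this for an inversion of the bound-ligand resonance. The biexponential is written with negative exponents $\lambda_\pm$ and coefficients fixed by the initial magnetisations and initial slopes; the rate constants, relaxation rates and magnetisations are arbitrary illustrative values (chosen so that $k_{+1}[E]M_L^{e} = k_{-1}M_{EL}^{e}$, as required at equilibrium), and the function names are hypothetical.

```python
import math

def analytic_solution(kon_E, k_off, R1_L, R1_EL, M_eq, M_0):
    """Return a function t -> (M_L, M_EL); kon_E is the product k+1[E] in 1/s."""
    k_L, k_EL = kon_E + R1_L, k_off + R1_EL
    root = math.sqrt((k_L - k_EL) ** 2 + 4.0 * kon_E * k_off)
    lam_p = 0.5 * (-(k_L + k_EL) + root)   # slower decay (less negative)
    lam_m = 0.5 * (-(k_L + k_EL) - root)   # faster decay
    a, b = M_eq[0] - M_0[0], M_eq[1] - M_0[1]
    d = lam_p - lam_m
    C1 = ((lam_m + k_L) * a - k_off * b) / d
    C2 = (-(lam_p + k_L) * a + k_off * b) / d
    C3 = (-kon_E * a + (lam_m + k_EL) * b) / d
    C4 = (kon_E * a - (lam_p + k_EL) * b) / d

    def magnetisation(t):
        ep, em = math.exp(lam_p * t), math.exp(lam_m * t)
        return (C1 * ep + C2 * em + M_eq[0], C3 * ep + C4 * em + M_eq[1])
    return magnetisation

def integrate_rk4(kon_E, k_off, R1_L, R1_EL, M_eq, M_0, t_end, steps):
    """Fourth-order Runge-Kutta integration of Equations 7.24 and 7.25."""
    def deriv(M):
        M_L, M_EL = M
        return (-(M_L - M_eq[0]) * R1_L - kon_E * M_L + k_off * M_EL,
                -(M_EL - M_eq[1]) * R1_EL - k_off * M_EL + kon_E * M_L)
    h, M = t_end / steps, tuple(M_0)
    for _ in range(steps):
        k1 = deriv(M)
        k2 = deriv((M[0] + 0.5 * h * k1[0], M[1] + 0.5 * h * k1[1]))
        k3 = deriv((M[0] + 0.5 * h * k2[0], M[1] + 0.5 * h * k2[1]))
        k4 = deriv((M[0] + h * k3[0], M[1] + h * k3[1]))
        M = (M[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
             M[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)
    return M

# Illustrative values: k+1[E] = 5 1/s, k-1 = 20 1/s, bound resonance inverted.
args = (5.0, 20.0, 1.0, 3.0, (4.0, 1.0), (4.0, -1.0))
exact = analytic_solution(*args)
numeric = integrate_rk4(*args, t_end=0.2, steps=2000)
print(exact(0.2), numeric)
```

The free-ligand magnetisation, which was not perturbed directly, first falls below its equilibrium value and then recovers, as described for the inversion transfer experiment.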
Structural and Dynamic Information on Ligand Binding
233
7.3.2.1 Saturation Transfer
In this experiment [15], a resonance, commonly of the ligand, is saturated by an appropriate selective pulse or selective low-power irradiation, and the transfer of magnetisation is followed in real time by observing the intensity of the same nucleus in the other state. By interleaving experiments with on-resonance and control off-resonance irradiation, a difference spectrum can be obtained. If the saturation of the resonance of the bound ligand is instantaneous and complete, $M_{EL} = 0$ and the solution of the Bloch equations simplifies to a single exponential:
$$ M_L = M_L^e\left[\frac{k_{+1}[E]}{k_L}\exp(-k_L t) + \frac{R_{1,L}}{k_L}\right] \tag{7.34} $$
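The step from the coupled equations to the single exponential can be written out explicitly. The short derivation below is a sketch under the stated assumption of instantaneous, complete saturation of the bound-state resonance; the symbol $M_L^{\infty}$ for the steady-state intensity is introduced here for convenience and does not appear in the original equations.

```latex
% Set M_EL = 0 in Equation 7.24 and collect terms in M_L:
\frac{dM_L}{dt} = -\left(k_{+1}[E] + R_{1,L}\right)M_L + R_{1,L}M_L^e
               = -k_L M_L + R_{1,L}M_L^e
% Steady state (dM_L/dt = 0):
M_L^{\infty} = M_L^e\,\frac{R_{1,L}}{k_L}
% Integrate from M_L(0) = M_L^e:
M_L(t) = M_L^{\infty} + \left(M_L^e - M_L^{\infty}\right)\exp(-k_L t)
       = M_L^e\left[\frac{k_{+1}[E]}{k_L}\exp(-k_L t) + \frac{R_{1,L}}{k_L}\right]
% Rearranging the steady state gives the exchange rate from two observables:
k_{+1}[E] = R_{1,L}\left(\frac{M_L^e}{M_L^{\infty}} - 1\right)
```

The last line shows why an independent estimate of $R_{1,L}$ is needed if only the steady-state intensity, rather than the full time-course, is measured.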
The intensity of the resonance of the free ligand decreases as a function of time after saturation with a rate constant kL = k+1[E] + R1,L. If the free ligand is present in excess, it will usually be possible to measure its resonance intensity with sufficient accuracy. However, there are problems in using this experiment to estimate the exchange rate; it is often difficult to ensure simultaneously that the irradiation is selective and that sufficient power is used for saturation to be effectively instantaneous (to occur in a time short compared to 1/kL). In addition, since in practice we are not dealing with isolated spins, the same experiment can give rise to NOEs ([18]; Chapter 4), complicating the interpretation unless exchange is substantially faster than cross-relaxation. Where saturation transfer has proved valuable is in locating the resonance of the ligand in the bound state, by saturating a resonance of the ligand, present in excess; the corresponding resonance of the bound ligand will often be readily detected in the (on-resonance – off-resonance) difference spectrum (e.g. [19,20]). It should be noted, however, that the results can depend on the kinetic mechanism and the ‘bound’ resonance located by saturation transfer may arise from a minor species [21,22].
7.3.2.2 Inversion Transfer
In this version of the magnetisation transfer experiment, the resonance of the nucleus in one site is selectively inverted (by a selective 180° pulse; [15,16]) and the intensities of both resonances are followed as a function of time after the inversion; the inverted resonance will recover, while the other resonance will first decrease in intensity and then recover. For both resonances the behaviour will be described by a double exponential and the analysis requires use of the full Equations 7.26 and 7.27.
As with any other double exponential, it is important to collect data at a sufficient number of time points, and even so the two exponents λ+ and λ− can be determined independently only if they are sufficiently different. A second experiment in which the other resonance is inverted provides another pair of independent combinations of rate constants, and combined analysis of the two experiments together can permit determination of all four rates (exchange and relaxation) [15,16]. Thus the inversion transfer experiment is more demanding experimentally than the simple saturation transfer, but provides significantly more reliable estimates of the exchange rates.
7.3.2.3 Two-Dimensional Exchange Spectroscopy
The limitation of both saturation and inversion transfer is the requirement for a selective perturbation of one resonance; in the crowded spectra of protein-ligand complexes this can often make such experiments very difficult. As in so many other areas of protein NMR,
the use of two-dimensional methods can make it possible to observe exchange processes in the range 0.1–10 s⁻¹ even in crowded spectra. The simplest homonuclear 2D exchange experiments, sometimes referred to as EXSY experiments, are carried out using the same basic three-pulse sequence used for NOESY experiments. Indeed, the two experiments are formally and practically identical [6,23]: during the mixing time the magnetisation is transferred either by exchange (between the same nucleus in two different states) or by cross-relaxation (between two different nuclei). As the mixing time, τm, is increased, the intensity of the diagonal peak decreases according to:
$$ I_{AA} = I_{BB} = \tfrac{1}{2}\left\{\left[1 + \exp(-2k\tau_m)\right]\exp(-R\tau_m)\right\} \tag{7.35} $$
while the intensity of the cross-peak first increases and then decreases according to:
$$ I_{AB} = I_{BA} = \tfrac{1}{2}\left\{\left[1 - \exp(-2k\tau_m)\right]\exp(-R\tau_m)\right\} \tag{7.36} $$
where these expressions refer to the simple case of exchange between two equally populated sites with equal relaxation rates (k+1 = k−1 = k; RA = RB = R) (Figure 7.3; [6]). In principle the virtue of this experiment is that the very appearance of a cross-peak at (νA, νB) is evidence that exchange is occurring between sites A and B – permitting the location and assignment of resonances from the bound ligand – and the rate of this exchange
Figure 7.3 Homonuclear 2D exchange experiment. Plot of the diagonal peak volume (dotted line) and cross-peak volume (solid line) as a function of mixing time in a homonuclear 2D exchange experiment, calculated for exchange between two equally populated sites with equal relaxation rates (Equations 7.35 and 7.36). Reproduced by permission from Bain, Prog. Nucl. Magn. Reson. Spectrosc. (2003) 43, 63–103
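The curves of Figure 7.3 follow directly from Equations 7.35 and 7.36 and are easy to regenerate. The sketch below is illustrative only: the function names and the values of k and R are hypothetical, and the expression for the mixing time at which the cross-peak is largest is obtained here by setting the derivative of Equation 7.36 to zero rather than being taken from the text.

```python
import math


def exsy_intensities(t_mix, k, R):
    """Diagonal and cross-peak intensities for symmetric two-site exchange.

    t_mix in s; k (exchange) and R (relaxation) in s^-1.
    """
    decay = math.exp(-R * t_mix)
    transfer = math.exp(-2.0 * k * t_mix)
    diagonal = 0.5 * (1.0 + transfer) * decay
    cross = 0.5 * (1.0 - transfer) * decay
    return diagonal, cross


def cross_peak_maximum(k, R):
    """Mixing time maximising the cross-peak (from dI_AB/dt = 0)."""
    return math.log(1.0 + 2.0 * k / R) / (2.0 * k)


# Hypothetical rates chosen only to illustrate the shape of the curves.
k, R = 2.0, 1.0
t_best = cross_peak_maximum(k, R)
for t_mix in (0.05, 0.2, t_best, 1.0, 2.0):
    diagonal, cross = exsy_intensities(t_mix, k, R)
    print(f"t_mix = {t_mix:5.3f} s  diagonal = {diagonal:.3f}  cross = {cross:.3f}")
```

At short mixing times the cross-peak grows with initial slope k, which is the basis for extracting the exchange rate from the initial build-up.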
can be measured from the intensity of the cross-peaks as a function of τm at short τm. This is straightforward for isotopically labelled ligands or ligands containing fluorine, for example [24]. However, in the more general case of ¹H experiments there are real practical difficulties with this experiment in its simple form, because both exchange and cross-relaxation will contribute to the cross-peak intensities – and also in some cases transferred NOE cross-peaks (see Section 7.5.1 below) will be observed. In the simple NOESY/EXSY experiment, exchange and cross-relaxation cannot be separated (except conceivably by studying the dependence on ligand concentration, temperature or magnetic field). The exchange and cross-relaxation contributions to the cross-peaks in a homonuclear ¹H 2D exchange experiment can most readily be distinguished by carrying out a spin-locked NOE or ROESY experiment [25,26]. In a ROESY experiment, cross-peaks due to NOEs have phase opposite to that of the diagonal peaks, while those due to exchange will be in phase with the diagonal peaks, so a comparison of the two experiments allows the distinction to be made. Markley and co-workers [27] have described a sequence which relies on the differences in sign and build-up rate for the NOE and ROE cross-peaks; because for a macromolecule the ROE builds up twice as fast as the NOE and with opposite sign, setting the ROE mixing time to half that of the NOE mixing time minimises the contribution of cross-relaxation to the spectrum. The spin-lock principle has been extended to the design of efficient ‘pure exchange’ experiments such as CLEANEX-PM [28,29]. The 2D heteronuclear zz-magnetisation exchange experiment has proved very valuable for studies of protein nuclei exchanging slowly between two states [6,30–34].
This is essentially an inverse detection ¹H–¹⁵N correlation experiment with an adjustable delay, τm, inserted between the ¹⁵N chemical shift encoding period and the direct acquisition of the ¹H signal, during which the signal of interest is stored as longitudinal ¹⁵N magnetisation. Assuming that we are looking at a resonance from ¹⁵N-labelled protein, if free protein binds ligand during τm it will give rise to a cross-peak with the ¹⁵N chemical shift of the free protein and the ¹H chemical shift of the complex; similarly, if the complex dissociates during τm this will generate a cross-peak at the ¹⁵N chemical shift of the complex and the ¹H chemical shift of the free protein. The intensities of the auto and cross-peaks as a function of τm follow double exponentials and can be analysed to yield the exchange rates and the ¹⁵N R1 in the free and bound states [6,35]. If the exchange rate is fast, account must be taken of exchange during the magnetisation transfer steps [35].
7.3.3 Relaxation Dispersion Experiments
These experiments, which have been developed extensively over the past ten years for both fast and slow exchange, provide a means of measuring the exchange contribution to the transverse relaxation rate caused by the exchange of nuclei between states in which they have different chemical shifts (cf. Equation 7.22). The stochastic exchange process dephases the coherent magnetisation, which can be refocused by applying either a train of 180° pulses separated by delays of length τcp, as in the Carr-Purcell-Meiboom-Gill (CPMG) experiment, or a spin-lock RF field of strength ωeff, as in the rotating frame relaxation (R1ρ) experiment. The dependence of the transverse relaxation rate, R2,eff, on the strength (1/τcp or ωeff) of the refocusing RF field defines a relaxation dispersion profile, from which estimates of exchange rates, relative populations (and hence equilibrium constants) and chemical shift differences can be obtained by curve fitting [7,32,36–41]. These
experiments have proved to be an extremely powerful means of studying both ligand binding (e.g. [35,42–44]) and protein folding [45–48]. The basic CPMG relaxation dispersion experiment can be explained as follows, using the elegant athletic analogy of Mittermaier and Kay [41]. Imagine that a number of runners, comprising two groups of slow and fast individuals, start a race at the same time. If at the halfway point of the race all runners are made to stop, turn around, and run back to the starting position, it is clear that both slow and fast individuals will cross the starting line at the same time, giving rise to a spontaneous ordering referred to in the NMR context as an echo. Now suppose that during the course of this ‘out and back’ race the runners can interconvert stochastically, in the sense that a slow runner can become fast and vice versa (corresponding to molecules exchanging between a pair of states). Although the positions of the runners at the end will depend on how often the interconversion process occurs, in general they will not all finish the race simultaneously, and a plot of the distribution of runners crossing the starting line versus time gives rise to a peak that is broader than that observed in the absence of exchange. The breadth of the peak provides information about the relative populations of fast and slow runners, their rate of interconversion (kinetics), and the difference in running speeds between them (analogous to chemical shift differences). The CPMG relaxation dispersion experiment involves applying a variable number of refocusing pulses during a fixed time interval, where each pulse corresponds to runners stopping and turning around in the analogy above. If the pulses are applied at a moderate rate, there is time between the pulses for the runners to get out of phase with each other and the exchange process leads to broad lines (or, equivalently, large R2,eff values) as described above (Figure 7.4A, top).
By contrast, when the pulses are applied at a fast rate there is less time for dephasing, the effects of the interconversion are reduced and narrower peaks (smaller R2,eff values) are obtained (Figure 7.4A, bottom). An example of the application of this approach to the binding of a peptide to an SH3 domain is shown in Figure 7.4B [35]. Measurements of R2,eff as a function of the interpulse delay τcp (relaxation dispersion) can be fitted to the appropriate equation to determine the parameters of interest. For practical reasons – experimental constraints on the minimum value of τcp – the CPMG experiment is generally limited to kex < 10⁴ s⁻¹. In a ligand-binding experiment under intermediate exchange conditions, values of pEL, k−1 and Δω can be extracted directly from the fits of the relaxation dispersion. When exchange is fast, the values of pEL and Δω cannot be extracted separately from the fits and instead one can determine kex and Φex, where
$$ \Phi_{ex} = p_{EL}(1 - p_{EL})\Delta\omega^2 \tag{7.37} $$
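The reason the population and the shift difference cannot be separated in fast exchange is that Equation 7.37 contains them only as a product. The following sketch makes the degeneracy concrete; the function names and numbers are hypothetical, and the shift difference is assumed to be expressed as an angular frequency (rad s⁻¹).

```python
import math


def phi_ex(p_bound, delta_omega):
    """Equation 7.37: the fitted fast-exchange parameter."""
    return p_bound * (1.0 - p_bound) * delta_omega ** 2


def delta_omega_from_phi(phi, p_bound):
    """Shift difference implied by phi once the bound population is known."""
    return math.sqrt(phi / (p_bound * (1.0 - p_bound)))


# A hypothetical fitted value of phi, in rad^2 s^-2.
phi = phi_ex(0.05, 1000.0)

# Very different (population, shift difference) pairs reproduce the same phi.
for p in (0.02, 0.05, 0.20, 0.50):
    dw = delta_omega_from_phi(phi, p)
    print(f"p_EL = {p:4.2f}  delta_omega = {dw:7.1f} rad/s  phi = {phi_ex(p, dw):.1f}")
```

Because p(1 − p) is largest at p = 0.5, the shift difference recovered for equal populations is the smallest one compatible with a given fitted value.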
In practice, the sample conditions will often be such that pEL(1 − pEL) can be estimated and hence Δω can be determined. For example, if a ligand resonance is being studied under conditions where LT ≫ ET ≫ Kd (taking advantage of the sensitivity of the CPMG experiment to exchange with a species, here the bound ligand, having a low fractional population), then the bound fraction of the ligand is $p^{L}_{EL} \approx E_T/L_T$. If, on the other hand, one is studying protein resonances under conditions where LT < ET ≫ Kd, then the bound fraction of the protein is $p^{E}_{EL} \approx L_T/E_T$. (The minimum chemical shift difference can always be estimated by assuming that the populations of the two states are equal, when $|\Delta\omega|_{min} = 2\sqrt{\Phi_{ex}}$.) Data obtained at more than one external magnetic field strength are generally essential. The CPMG experiment can also be used under slow exchange conditions, where oscillatory behaviour of R2,eff as a function of τcp is observed, but under
Figure 7.4 Relaxation dispersion. (A) Schematic representation of signal dephasing during CPMG pulse trains based on the analogy to the runners described in the text, where the y axis plots the distance of the runners from the starting position. A blue or red line indicates a spin in the major or minor state, respectively. Dashed lines correspond to spins experiencing at least one conformational transition, whereas the solid lines correspond to no transitions. Reproduced by permission from Mittermaier and Kay, Science (2006) 312, 224–228. (B) ¹⁵N CPMG relaxation dispersion profiles obtained at 500 MHz (blue, lower) and 800 MHz (red, upper) proton Larmor frequencies for Leu7 of the Fyn SH3 domain partially saturated with its ligand, a 12-residue peptide. Data are shown for 20, 30, 35, 40, 45, and 50 °C (a)–(f), illustrating the effects on the relaxation dispersion of the changes in koff, from 11.7 s⁻¹ at 20 °C to 331 s⁻¹ at 50 °C. Best-fit curves were generated using a single value of pB optimised for all temperatures, koff values fit globally, and a Δω value taken from HSQC spectra of the free and peptide-saturated states. Reproduced by permission from Demers and Mittermaier, J. Amer. Chem. Soc. (2009) 131, 4355–4367. Please refer to the colour plate section
these conditions the 2D heteronuclear zz-magnetisation exchange experiment may be preferable [49]. In recent years a series of improvements to the basic pulse sequences for the CPMG relaxation dispersion experiment have been reported, which have helped to minimise artefacts, have extended the range of rates which can be studied and, in combination with appropriate labelling schemes, have extended its application from ¹⁵N and ¹HN to include ¹³Cα and ¹³CO signals, allowing a detailed characterisation of the exchange process. The sequences have also been modified to incorporate the TROSY sequence and extend the application to larger proteins. Care must be taken both in the experimental design and in the analysis of the dispersion curves to ensure accurate results, but excellent guidance is available in a number of publications [32,35,36,40,50,51]. Similarly, an excellent review of the theoretical and practical aspects of the R1ρ method, which can be extended to kex < 10⁵ s⁻¹, is available [37]. A key feature of this class of experiment is that it makes it possible to obtain structural information on species which are present as only a small proportion (a few percent) of the equilibrium population. With the improved methods for obtaining structural information from chemical shifts (see Chapter 5), together with experiments which allow orientational information from the minor species to be obtained from residual dipolar couplings [43,52], atomic resolution structural information on such low-population species can now be obtained ([53]; this paper refers to the structure of a protein folding intermediate, but the same approach could be applied to intermediates in ligand binding processes).
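The shape of a CPMG dispersion profile in the fast-exchange limit can be sketched with a closed-form expression. The chapter does not state which fitting equation is meant, so the form below is an assumption: it is the fast-exchange expression commonly attributed to Luz and Meiboom, with τcp taken as the delay between successive 180° pulses, and with an exchange-free rate (here called r2_0) and parameter values that are purely hypothetical.

```python
import math


def r2_eff_fast_exchange(tau_cp, r2_0, phi, k_ex):
    """Assumed fast-exchange (Luz-Meiboom type) CPMG dispersion expression.

    tau_cp : delay between successive 180 degree pulses, s
    r2_0   : transverse relaxation rate in the absence of exchange, s^-1
    phi    : p(1 - p) * delta_omega**2, rad^2 s^-2
    k_ex   : exchange rate constant, s^-1
    """
    x = k_ex * tau_cp
    return r2_0 + (phi / k_ex) * (1.0 - 2.0 * math.tanh(x / 2.0) / x)


# Hypothetical parameters for illustration only.
r2_0, phi, k_ex = 10.0, 47500.0, 2000.0
for tau_cp in (10e-3, 2e-3, 1e-3, 0.5e-3, 0.2e-3, 0.1e-3):
    r2 = r2_eff_fast_exchange(tau_cp, r2_0, phi, k_ex)
    print(f"1/tau_cp = {1.0 / tau_cp:8.0f} s^-1   R2,eff = {r2:6.2f} s^-1")
```

The profile falls from roughly r2_0 + phi/k_ex at slow pulsing towards r2_0 at fast pulsing, and since only phi and k_ex enter the expression, the population and shift difference are not separately determined by such a fit.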
7.4 Detecting Binding – NMR Screening
In recent years, NMR has been applied with some success as a technique for the identification of lead compounds in the drug discovery process [54–63], particularly in the detection of low-affinity (Kd micromolar to millimolar) specific ligands as part of a ‘fragment-based’ drug design strategy. In this section we will focus on NMR methods for simply detecting binding of small molecules to a target macromolecule that can be applied to screening compound libraries. In addition, NMR can of course provide structural information on the ligand-target complex to follow on from the initial lead identification, and approaches to this are discussed in Section 7.6 below. Screening may proceed by ligand- or protein-based methods. Protein-based methods observe and compare the NMR parameters of the protein resonances in the presence and absence of candidate compounds or compound mixtures. These methods, such as the widely used ‘chemical shift mapping’ approach, intrinsically provide more information than simply whether a compound binds and will be discussed in Section 7.6. While protein-based methods generally provide more information, they have the disadvantage that many important pharmaceutical targets are not amenable to NMR. The ligand-based methods, which typically involve comparison of the NMR parameters of a mixture of compounds in the presence and absence of the protein molecules, rely on the exchange-mediated transfer of bound state information to the free state. This biases ligand-based methods towards identification of weakly binding ligands (rapid exchange), but it does render the molecular weight of the protein molecule largely irrelevant, and indeed the approach has been applied to the ribosome, to receptors in membranes and to whole cells [64–70]. Competition
experiments can be used to extend the range of the relaxation approach to tighter binding ligands, including those which are not in fast exchange. For example, ¹⁹F relaxation would be expected to be particularly sensitive to binding, due to its large chemical shift range (and hence large exchange contribution to R2,obs) and the chemical shift anisotropy contributions to R2,EL. While fluorine-containing ligands may not be of direct interest, Dalvit and co-workers [71] have demonstrated the utility of the relaxation properties of a small set of ¹⁹F ‘spy’ compounds in reporting on the binding of higher affinity binders by competitive displacement. Analogous approaches for ¹H STD and waterLOGSY experiments (see below) have been described [57,72–74]. Ligand-based NMR experiments generally detect binding by exploiting either (i) the decreased rotational and translational mobility of the ligand in the bound state or (ii) a transfer of ¹H magnetisation from the protein to the ligand. They are almost universally carried out under conditions of fast exchange, with a large excess of ligand over protein. This simplifies the spectrum greatly and the fast exchange acts as an ‘amplifier’, allowing the detection of the binding of only a small proportion of the ligand; however it does raise the possibility of interference from nonspecific binding, and it is important wherever possible to repeat the experiment in the presence of a specific high-affinity ligand as a control.
7.4.1 Detecting Binding by Changes in Rotational and Translational Mobility of the Ligand
The observation of the broadening of the resonance lines of a ligand in the presence of a protein, due to the increased rotational correlation time in the bound state, was one of the first NMR studies of a protein–ligand interaction [75], and this experiment can still be valuable. However, observation of line broadening can be difficult, especially if the effect is small (common for LT/ET ≫ 1) or there is significant spectral crowding from the protein and/or other compounds in the mixture. A generally preferred alternative is to compare the resonance intensities of the compound(s) in the presence and absence of protein using transverse relaxation experiments such as the CPMG or spin-lock (R1ρ) sequences [57,76]. These experiments lead to attenuation of resonances of nuclei with rapid relaxation rates; thus, they simultaneously reduce or eliminate protein resonances and facilitate the detection of compound binding through the increased average R2,obs which results under the fast exchange condition (Equation 7.22). Just as the ligand undergoes slower rotational motion when bound to the protein, so it will undergo slower translational motion, and this can similarly be used to detect binding. A pulsed-field gradient stimulated echo experiment (PFG-STE) is based on the application of two gradients of opposite sign separated by an interval t. Owing to fast diffusion during t, a small molecule is situated at a different position at the time of the second gradient relative to its position at the time of the first. The effect of the two gradients thus does not cancel out, causing loss of phase coherence and consequently loss of signal intensity. The conditions for complete resonance cancellation depend on the diffusion time t and the strength of the gradients; conditions can be found in which all the resonances of small molecules of 200–400 Da either strongly decrease or even disappear from the spectrum.
When the ligand binds to a protein its diffusion is slowed down, and as a result the intensities of the averaged ligand resonances are attenuated to a lesser extent than those of the free ligand and therefore remain in the spectrum [55,76–78].
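The effect of binding on a diffusion-filtered spectrum can be estimated with a simple calculation. The sketch below rests on two assumptions that are not stated in the text: that the signal attenuation follows the standard Stejskal-Tanner expression for rectangular gradient pulses, and that under fast exchange the ligand shows a population-weighted average diffusion coefficient. The function names, diffusion coefficients and gradient settings are hypothetical values chosen for illustration.

```python
import math

GAMMA_1H = 2.675e8  # 1H gyromagnetic ratio, rad s^-1 T^-1


def observed_diffusion(p_bound, D_free, D_complex):
    """Population-weighted diffusion coefficient under fast exchange (m^2/s)."""
    return p_bound * D_complex + (1.0 - p_bound) * D_free


def pfg_attenuation(D, g, delta, Delta, gamma=GAMMA_1H):
    """Assumed Stejskal-Tanner attenuation I/I0 for rectangular gradients.

    g in T/m, gradient duration delta and diffusion delay Delta in s.
    """
    b = (gamma * g * delta) ** 2 * (Delta - delta / 3.0)
    return math.exp(-b * D)


# Hypothetical values: small ligand, slowly diffusing complex, 20 % bound.
D_free, D_complex = 5.0e-10, 1.0e-10
g, delta, Delta = 0.3, 2.0e-3, 0.1

a_free = pfg_attenuation(D_free, g, delta, Delta)
a_mixed = pfg_attenuation(observed_diffusion(0.2, D_free, D_complex), g, delta, Delta)
a_complex = pfg_attenuation(D_complex, g, delta, Delta)
print(f"free ligand: {a_free:.3f}  20% bound: {a_mixed:.3f}  complex: {a_complex:.3f}")
```

In this model a ligand that spends part of its time bound retains more signal than a non-binding compound of the same size, which is the contrast the diffusion filter relies on.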
7.4.2 Detecting Binding by Magnetisation Transfer
The most widely used and probably the most sensitive NMR experiments for screening applications are those which depend upon transfer of magnetisation by cross-relaxation from protons of the protein to protons of the bound ligand and hence, by chemical exchange, to the bulk ligand. These experiments are closely related to the more structurally specific transferred NOE experiment which is discussed in detail in Section 7.6.1 below.
7.4.2.1 Saturation Transfer Difference (STD) Spectroscopy
This has become a very popular method for screening due to its simplicity, the small amounts of protein required (10 μM), and its compatibility with large targets. The essence of the experiment [57,62,79,80] is to saturate protein protons selectively via a train of frequency-selective RF pulses applied to a frequency region that contains protein but not ligand resonances (e.g. 0.0 to 1.0 ppm). The saturation propagates from these initially saturated spins to other protein protons via the network of intramolecular ¹H–¹H cross-relaxation pathways, and the bound ligand picks up the saturation via intermolecular ¹H–¹H cross-relaxation at the ligand–protein interface. The spin-diffusion process is efficient in large proteins with long τc values – one of the few instances where an NMR experiment is easier in larger proteins. The ligand then dissociates back into free solution where the saturated state persists due to the slow longitudinal relaxation in the free state. Ligand exchanges on and off the receptor while saturation energy continues to enter the system through the sustained application of RF, so that saturated free ligands accumulate during the saturation time. A complementary reference experiment in which the identical RF train is applied far off-resonance is recorded in an interleaved fashion and the two spectra are subtracted. The resulting difference spectrum yields only those resonances that have experienced saturation – i.e.
the resonances of the compounds which bind. (The difference spectrum also includes signals from the protein, but these are generally too weak to be observed, due to the low protein concentration; if necessary they can be eliminated by transverse relaxation filtering.) For the STD technique to work properly, k−1 for the complex must be at least of the same order of magnitude as the intermolecular cross-relaxation rate; the Kd range of the method has been estimated to be 10⁻⁸ M < Kd < 10⁻³ M [79]. For relatively large ligands, the magnitude of the STD effect on different protons of the ligand can in principle be used to define a ‘binding epitope’ – that is, to identify the protons of the ligand which are in contact with the binding site [72]. Jayalakshmi & Krishna [81,82] have carried out a detailed analysis of the STD experiment using a combined exchange and relaxation matrix approach and suggest that caution is required in this kind of quantitative analysis. Relaxation properties of the ligand can considerably alter the relative intensities of the STD saturation factors, independently of the ligand-target contact surface, and the assumption that the spin-diffusion mechanism uniformly spreads the saturation across all protons of the protein may not be correct. Recently Claridge and colleagues [83] have proposed modifications to the STD experiment which promise substantially to overcome these problems. In addition to the usual conditions of large ligand excess and fast exchange, they measure the STD values under saturation equilibrium conditions and use the nonselective T1 values for the ligand protons to account approximately for cross-relaxation within the ligand. As noted above, the use of a competitive ligand is valuable in eliminating effects due to nonspecific binding; competition can also be used in the reverse fashion, by screening
for compounds which decrease the STD effects seen with a known ligand, hence screening for ligands which bind to the same site as the known ligand [84,85]. Hajduk et al. [86] have described an extension of the STD experiment, named SOS-NMR, where a series of samples are used in which the protein is perdeuterated except for a single amino-acid type; for samples such as this, the spin-diffusion which is a feature of the STD experiment will be much more limited, and it is preferable to measure specific intermolecular NOEs (see Section 7.6).
7.4.2.2 WaterLOGSY
This experiment (water-ligand observed via gradient spectroscopy; [87,88]), like STD, relies on perturbation of protein protons through a selective RF pulse scheme and transfer of the perturbation to the bound ligand by cross-relaxation. However, waterLOGSY achieves this indirectly by selective inversion (not saturation) of the bulk water magnetisation as opposed to direct perturbation of protein magnetisation. The intended transfer of magnetisation is therefore water → protein → ligand. Inverted water magnetisation can be transferred to the bound ligands via two different pathways. One involves direct ¹H–¹H cross-relaxation between the bound ligand and either ‘bound’ waters or exchangeable protein protons (NH and OH) within the binding site, whose magnetisation has been perturbed by chemical exchange of water molecules or protons with bulk water. In the second pathway, magnetisation of bound water molecules or exchangeable protons elsewhere on the protein, perturbed by chemical exchange with bulk water in the same way, is transferred to the ligand by intra- and intermolecular spin diffusion in a similar way to the processes involved in the STD experiment. In both these pathways, the cross-relaxation with the bound ligands is associated with the long rotational correlation time of the protein and will thus yield negative NOEs.
By contrast, the dipolar interactions of free compounds (either ligand molecules in the free state or compounds which do not bind) have much shorter τc, leading to positive NOEs. For the ligand, the bound and free contributions will be averaged in the spectrum, so that if LT ≫ ET the contribution from the free state, which is of opposite sign, can overwhelm that of the bound state, resulting in a false negative. To eliminate this problem, large ligand/protein ratios should be avoided and a reference spectrum of the compounds in the absence of protein should be recorded and subtracted from the spectrum in the presence of protein.
7.5 Mechanistic Information
Of course, not all protein-ligand interactions follow as simple a kinetic mechanism as
$$ E + L \underset{k_{-1}}{\overset{k_{+1}}{\rightleftharpoons}} EL $$
and this is relevant to NMR experiments of the kind discussed here in two ways. First, it may be possible to use experiments of the kind discussed in Section 7.3 to obtain information on the kinetic mechanism of binding. Secondly, interpretation of the NMR parameters derived from an NMR exchange experiment, particularly one in which exchange is fast, in structural terms is only reliable if the experiment is analysed in terms of the correct kinetic mechanism.
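The consequence of analysing fast-exchange data with too simple a mechanism can be illustrated numerically. In the sketch below the complex is assumed to exist as two rapidly interconverting forms, EL1 and EL2; the shift observed for the 'bound state' is then a population-weighted average, while a parameter that depends on distance as r⁻⁶ is averaged non-linearly. The function names, populations, shifts and distances are all hypothetical.

```python
def bound_state_fractions(K_iso):
    """Fractions of EL1 and EL2 for EL1 <=> EL2 with K_iso = [EL2]/[EL1]."""
    f2 = K_iso / (1.0 + K_iso)
    return 1.0 - f2, f2


def apparent_shift(f1, shift_1, f2, shift_2):
    """Fast-exchange (population-weighted) chemical shift of the complex."""
    return f1 * shift_1 + f2 * shift_2


def apparent_distance(f1, r1, f2, r2):
    """Distance inferred from a population-averaged r^-6 dependence."""
    mean_inv6 = f1 * r1 ** -6 + f2 * r2 ** -6
    return mean_inv6 ** (-1.0 / 6.0)


# Hypothetical case: 95 % EL1 (nucleus 15 A from a paramagnetic centre)
# and 5 % EL2 (nucleus 5 A from the same centre).
f1, f2 = bound_state_fractions(0.05 / 0.95)
shift = apparent_shift(f1, 8.0, f2, 9.0)           # ppm
r_apparent = apparent_distance(f1, 15.0, f2, 5.0)  # angstrom
r_linear = f1 * 15.0 + f2 * 5.0
print(f"apparent shift = {shift:.3f} ppm")
print(f"apparent distance = {r_apparent:.2f} A (linear average {r_linear:.2f} A)")
```

The averaged shift is dominated by the major form, whereas the apparent distance is pulled strongly towards the minor, close-approach form, so neither value need correspond to a real species.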
7.5.1 Problems of Fast Exchange
A significant limitation to the interpretation of NMR spectra in the fast exchange regime must always be borne in mind: the spectra represent an average over all the states which the nuclei experience, but we do not know a priori how many states there are, nor what their NMR parameters may be. Fast exchange spectra are always interpreted in terms of a model (even if this is not always explicitly stated) and the conclusions will only be meaningful if the model is correct. It is clearly sensible to use the simplest model, and often this is easy to choose; for example, changes as a function of pH in the chemical shift of a nucleus close to an ionisable group would naturally be interpreted in terms of a two-state equilibrium between the protonated and unprotonated states of the ionisable group. For ligand binding to a protein, the simplest model is of course E + L ⇌ EL. However, if the complex exists in two states, so that the kinetic mechanism is E + L ⇌ EL1 ⇌ EL2 or, more generally, a scheme in which E + L can form either EL1 or EL2 directly and the two complexes also interconvert,
$$ E + L \rightleftharpoons EL_1, \qquad E + L \rightleftharpoons EL_2, \qquad EL_1 \rightleftharpoons EL_2 $$
then the NMR parameters deduced from the experiment for the species EL, on the basis of the simplest model, will in fact be the weighted average of those for EL1 and EL2 and not those of any real species. The situation is complicated by the fact that many NMR parameters of course have a nonlinear relationship to structure; for example, because of the r⁻⁶ dependence of the magnitude of paramagnetic relaxation, a significant effect may be observed under fast exchange averaging conditions even when the species from which it arises is only a minor one in terms of population. It is always important to obtain a full range of chemical shift and relaxation data for as many resonances as possible in such a situation, to maximise the possibility of detecting any apparent inconsistencies which might indicate that the model being used is incorrect; comparison with structural and kinetic information from other techniques can also be helpful (see below).
7.5.2 Identification of Kinetic Mechanisms
As with any other approach to the elucidation of kinetic mechanisms of protein-ligand interactions, NMR can only lead to the identification of the simplest mechanism necessary to explain the data. In essence, it can provide information on the minimum number of species present at detectable levels and on the exchange pathways which link them. It has a particular value in that the experiments can be carried out under equilibrium conditions, without the need for perturbations such as rapid mixing or temperature-jump, and thus studies of enzyme-substrate interactions including both binding and catalysis are possible [5,42,89–95]. The relaxation dispersion experiments described in Section 7.3.3 are particularly valuable in this context, since under favourable conditions they make it possible to determine the relative populations and chemical shifts of species with low populations.
7.5.2.1 Slow Exchange
The simplest situation to interpret is one in which exchange between the free species and all forms of the complex is slow on the chemical shift timescale, so that separate resonances are observed. It is thus in principle possible to distinguish readily between the simple situation in which resonances are observed only for one form of the free protein and one of the complex, and more complex cases where the free protein and/or the complex exist in more than one slowly-interconverting state. If multiple resonances are indeed observed for individual nuclei – for example, in a ¹H–¹⁵N HSQC spectrum – it is important to exclude other explanations, such as chemical heterogeneity of the protein (‘fraying’ at the termini, partial de-amidation, etc.), before concluding that multiple conformational states are present. The clearest way of demonstrating that the multiple resonances do indeed represent different conformational states is to detect exchange between them by 2D exchange or magnetisation transfer experiments, provided that the exchange rate is not too slow. The exchange experiments will also provide information on the kinetic mechanism and the rates of interconversion, and structural information, such as NOEs, can be obtained for each form of the complex. While exchange between different forms of the complex which is sufficiently slow for separate resonances to be observed is not very common, it has been observed in a number of situations, notably in a wide range of complexes of dihydrofolate reductase (reviewed by Feeney [96]). In some cases the slow interconversion was due to a slow conformational isomerisation of the ligand [24,97], for example rotation about a partial double bond, but in others the rate-limiting step was clearly associated with changes in the protein [22,98–101].
These papers provide examples of the kind of approaches which can be used to study multiple conformations in slow exchange, and emphasise the importance of isotopic labelling of both the ligand and the protein in order to identify multiple resonances from individual nuclei in the complicated spectrum of a protein-ligand complex.
7.5.2.2 Fast Exchange
When the resonances of the different species present are averaged, information on the kinetic mechanism, i.e. on the number of species, is of necessity less direct, but it can sometimes be established that the simple two-state model cannot account for the data. For example, in the widely used experiment where ¹H-¹⁵N or ¹H-¹³C HSQC spectra are recorded on successive additions of the ligand, for fast exchange the correlation peaks almost always move linearly in the spectrum (Figure 7.2), sometimes with broadening at intermediate ligand concentrations. This is a necessary consequence of the weighted average equation (Equation 7.20) if there is a simple two-state equilibrium between free and bound states. If, however, the change in peak position follows a curved path when the spectra at successive ligand concentrations are superimposed (Figure 7.5 [102]; see also [103]) then the equilibrium must be a more complex one,
E + L ⇌ EL1 ⇌ EL2,
involving more than one mode of ligand binding (in which the ratios of ¹H to ¹³C or ¹⁵N shifts differ). The ligand concentration-dependence of the linewidth can also provide valuable information. From Equation 7.22 and the analogous equation for protein resonances, for a simple binding process the third (exchange) term will be zero when the ligand is wholly bound to the protein (or the protein is wholly saturated with the
Figure 7.5 Ligand titration – evidence for more than two states. Titration of [methyl-13C]Methionine-labelled cardiac troponin C with bepridil and trifluoperazine (TFP). Chemical shift changes for Met residues located in the N-terminal domain (panels A and C) are plotted separately from those in the C-terminal domain (panels B and D). Bepridil was added in 0.5 equivalents to a drug/protein ratio of 4 (panels A and B). TFP was added in 1.0 equivalents to a drug:protein ratio of 6 (panels C and D). The cross-peaks for successive drug additions are shown only for Met-45 and -157, respectively; for other cross-peaks, only the initial and final positions are shown. Reproduced by permission from Kleerekoper et al., J. Biol. Chem. (1998) 273, 8153–8160
ligand); however, if the exchange process is in fact E + L ⇌ EL1 ⇌ EL2, then even under these conditions there may be an exchange contribution to the linewidth from the interconversion between EL1 and EL2. The substrate concentration-dependence of R2 (measured by relaxation dispersion experiments) has been used to great effect to separate binding and protein isomerisation events in studies of enzymes undergoing catalytic turnover [42]. As noted above, the key to correct analysis of experiments carried out under fast exchange conditions is to obtain as many different kinds of data for as many different resonances as possible. As a simple example, in the studies of Birdsall et al. [99] of the ternary complexes of dihydrofolate reductase with folinic acid and NADP⁺ or NADPH, the rates of
dissociation of the coenzyme from the complex estimated by ¹H saturation transfer experiments and by ³¹P lineshape analysis differed by a factor of 5 (NADP⁺) or 8 (NADPH), clearly indicating that a simple two-state equilibrium was not sufficient to explain the data; it was shown that a model involving two forms of the complex could explain the results. A particularly powerful approach to testing the validity of the simple two-state model is to analyse both chemical shift titration and relaxation dispersion experiments for as many protein residues as possible. Both kinds of experiment can yield estimates of Kd and of Δω; the estimates of Kd should of course be the same within experimental error for all residues, while the estimates of Δω for each individual resonance should be the same whether derived from chemical shift titration or relaxation dispersion. An additional useful comparison is between the values for relative populations, exchange rates and Δω obtained from relaxation dispersion data either by fitting data for each resonance individually or by carrying out a ‘global fit’, in which the populations and exchange rates are constrained to be the same for all residues, with only Δω differing between them. A thorough analysis of zz exchange and CPMG experiments, together with isothermal titration calorimetry, on the binding of a 12-residue peptide to the SH3 domain from the Fyn tyrosine kinase showed that all the data were consistent with a simple two-state binding equilibrium [35]. By contrast, in a study of the interaction of an SH3 domain of CIN85 with ubiquitin [44], while the chemical shift titration data and the relaxation dispersion data were each individually consistent with a two-state model, the two kinds of experiment yielded different values for Kd and for Δω, showing that the two-state model was inadequate; the data could be satisfactorily accounted for by a three-state model.
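The linearity argument used for fast-exchange titrations can be checked numerically. The following is a minimal sketch rather than a published procedure: it assumes the standard quadratic solution for 1:1 binding and a population-weighted average shift (the form referred to in the text as Equation 7.20). The function names, concentrations and peak positions are all hypothetical, and the two axes are compared in raw ppm, whereas a real analysis would normally scale the ¹⁵N or ¹³C axis first.

```python
import math

def bound_fraction(e_total, l_total, kd):
    """Fraction of protein bound for 1:1 binding (quadratic solution).

    All three arguments must be in the same concentration units.
    """
    b = e_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * e_total * l_total)) / (2.0 * e_total)

def two_state_path(free_peak, bound_peak, e_total, ligand_totals, kd):
    """Fast-exchange peak positions as population-weighted averages."""
    path = []
    for l_total in ligand_totals:
        p = bound_fraction(e_total, l_total, kd)
        path.append(tuple(f + p * (b - f) for f, b in zip(free_peak, bound_peak)))
    return path

def max_deviation_from_line(path):
    """Largest perpendicular distance of any point from the straight line
    joining the first and last peak positions."""
    (x0, y0), (x1, y1) = path[0], path[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("first and last peak positions coincide")
    return max(abs(dx * (y - y0) - dy * (x - x0)) / length for x, y in path)

# Hypothetical titration: 0.5 mM protein, Kd = 0.1 mM, ligand from 0 to 4 mM.
ligand = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]
linear = two_state_path((8.20, 120.0), (8.50, 123.0), 0.5, ligand, 0.1)

# Invented peak positions that bend, standing in for a titration in which
# more than one binding mode contributes.
curved = [(8.20, 120.0), (8.30, 120.4), (8.40, 121.2), (8.46, 122.2), (8.50, 123.0)]

print(max_deviation_from_line(linear))  # essentially zero for two states
print(max_deviation_from_line(curved))  # clearly non-zero
```

A deviation well above the uncertainty in peak position, seen consistently for several residues, would be the numerical counterpart of the curved paths in Figure 7.5; a deviation within the noise does not by itself prove that the two-state model is correct.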
It should be noted that these kinds of analysis require not only very extensive data, but also a careful statistical analysis to ensure that a more complex kinetic model is indeed justified by the data [51]. Comparison with information from other methods can also be valuable in analysing fast exchange experiments. For example, in a study of coenzyme A binding to chloramphenicol acetyltransferase [104], the dissociation rates estimated by NMR were 2.5–3 times faster than those measured by stopped-flow fluorescence experiments, and the conformation of the bound ligand determined by transferred NOE experiments (see below) differed from that determined by X-ray crystallography. Both of these discrepancies could be accounted for by a model in which the complex exists in two states, the predominant one dissociating more slowly and having the conformation seen in the crystal and the other having a more compact conformation of coenzyme A, with shorter internuclear distances and hence having a predominant influence on the NOE experiment. This sort of comparison is also key in assessing the biological relevance of the exchange processes involved, by showing that a protein isomerisation process takes place at a rate consistent with it contributing to the rate-limiting step in an enzyme-catalysed reaction; experiments of this kind have proved extremely valuable in throwing light on the relatively slow and probably rate-limiting motions of enzymes involved in catalysis (e.g. [42,90,91,94,95,105–109]). NMR ‘probes’ can also be used in this way; chemical modification with a ¹³C-labelled group or incorporation of a fluorine-containing amino-acid has been used to position a ‘probe’ in a strategic place in an enzyme and to detect an isomerisation process which occurs at a rate corresponding to the turnover rate of the enzyme or to product release [110,111]; these kinds of experiments necessarily provide
much less detailed information about the nature of the isomerisation process, but they can be applied to relatively large proteins.
7.6
Structural Information
In this section, the focus will be on methods for obtaining information on the location of the binding site and the conformation of the bound ligand. From the preceding discussions, it will not be a surprise to the reader that the approaches to this differ according to the rate of exchange of the ligand on and off the protein. If the ligand binds tightly and the rate of exchange is slow, it will almost always be possible to use a sample in which ET ≫ Kd and the ligand is as close as possible to equimolar with the protein. Under these conditions, the protein-ligand complex will be the predominant species in solution, and artifacts arising from the very sharp lines from the low concentration of free ligand will not interfere. The structure of the protein-ligand complex can then be determined on the basis of NOEs, residual dipolar couplings, and so on, as described in Chapters 4 and 5. Isotope-filtering and -editing methods (Section 7.6.4 and Chapter 8) can be used to focus in on the binding site by specifically observing ligand resonances and ligand-protein NOEs. These methods work best when applied to stable complexes with Kd less than about 10⁻⁶ M; for weaker complexes they can suffer from chemical-exchange-induced line broadening or confusing chemical exchange cross-peaks in filtered NOESY spectra. When exchange is fast, it may not be possible to obtain a fully-formed complex, and more indirect methods, such as chemical shift mapping (Section 7.6.4) and the transferred NOE experiment (Section 7.6.1) will be required.
7.6.1
Ligand Conformation – the Transferred NOE
This experiment, commonly abbreviated trNOE (see footnote 4), has proved to be widely applicable and can provide valuable information on the conformation of a ligand bound to a protein without the need to observe the resonances of the bound ligand directly. It is a simple experiment to perform, but care is required if it is to be interpreted quantitatively. The essence of the experiment is shown in Figure 7.6, for the simple situation of two spins, I and S, with cross-relaxation between them, which exist in two states, bound and free. If the rate of exchange between the two states is much faster than the spin-lattice relaxation rate in the bound state (kex ≫ R1b), then changes in magnetisation of a bound ligand proton resulting from cross-relaxation (NOEs) will be transferred to the free ligand by exchange. The average cross-relaxation rate in a ligand exchanging rapidly on the relaxation timescale between free and bound states is given, by analogy to Equation 7.23, by
$$\langle\sigma\rangle = p_{L}\,\sigma_{L} + p_{EL}\,\sigma_{EL} \qquad (7.38)$$
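As an illustration of how Equation 7.38 weights the two states, the short sketch below evaluates the averaged cross-relaxation rate for a ligand in excess over the protein. It is only a numerical aid: the function names and the input values (a 20-fold ligand excess with a saturated site, and cross-relaxation magnitudes of 0.1 and 19 s⁻¹) are hypothetical, and the rates are treated as magnitudes, as in the text.

```python
def averaged_cross_relaxation(p_bound, sigma_free, sigma_bound):
    """Equation 7.38: <sigma> = p_L * sigma_L + p_EL * sigma_EL,
    with p_L = 1 - p_EL for a ligand that is either free or bound."""
    if not 0.0 <= p_bound <= 1.0:
        raise ValueError("p_bound must lie between 0 and 1")
    return (1.0 - p_bound) * sigma_free + p_bound * sigma_bound

def bound_to_free_ratio(p_bound, sigma_free, sigma_bound):
    """Ratio of the bound-state term to the free-state term in Equation 7.38.
    A value much greater than 1 means the bound state dominates."""
    free_term = (1.0 - p_bound) * abs(sigma_free)
    if free_term == 0.0:
        return float("inf")
    return p_bound * abs(sigma_bound) / free_term

# Hypothetical case: 20-fold ligand excess with the site saturated, so that
# roughly 1 ligand molecule in 20 is bound at any instant.
p_el = 0.05
sigma_l = 0.1    # s^-1, assumed magnitude for the small free ligand
sigma_el = 19.0  # s^-1, assumed magnitude for the ligand bound to the protein

print(averaged_cross_relaxation(p_el, sigma_l, sigma_el))
print(bound_to_free_ratio(p_el, sigma_l, sigma_el))
```

With these assumed numbers the bound-state term is about ten times the free-state term even though only 5% of the ligand is bound, which is the situation the experiment relies on.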
Cross-relaxation depends upon the rotational correlation time of the molecule and will be substantially more efficient in the macromolecule than in the free ligand; thus the
Footnote 4: It has been suggested [112] that the abbreviation should be etNOE, for exchange transferred NOE, to avoid confusion, as trNOE has been used to refer to the transient NOE experiment; while this is certainly more logical, the usage of trNOE for the transferred NOE experiment is so widespread that it will be retained here.
Figure 7.6 The transferred NOE experiment. Two spins, I and S, exist in two states, bound (Ib and Sb) and free (If and Sf), between which they exchange (solid arrows). The spin-lattice relaxation of the two spins in each of the two states is indicated by dotted arrows. If the rate of exchange between the two states is much faster than the spin-lattice relaxation rate in the bound state (kex ≫ RI,b, RS,b), then changes in magnetisation of a bound ligand proton resulting from cross-relaxation (NOEs) will be transferred to the free ligand by exchange. There is cross-relaxation between spins I and S (heavy dashed arrows), at rates σL for the free state and σEL for the bound state (with σEL > σL). There is also cross-relaxation with protein protons (light dashed arrows), leading to spin diffusion.
cross-relaxation rate in the complex, σEL, is much greater than the cross-relaxation rate of the free ligand, σL. It is therefore easy to produce a situation where the observed NOE is dominated by cross-relaxation in the bound state. In the usual situation in which the off-rate is fast on the chemical shift timescale as well as on the relaxation timescale, the only signals observed are sharp and virtually identical to those of free ligand, and consequently are easily assigned. We therefore observe NOEs characteristic of the bound state, but seen on the signals of the free state, allowing information about the bound state to be readily obtained. Unusually for protein NMR, the experiment actually gets better as the protein gets bigger, because σEL increases, and therefore the NOE is dominated more and more by the cross-relaxation rate in the bound state. Depending on the exact values of σB and σF (which depend on the tumbling rate of bound and free ligand), ligand:protein ratios of 10–40 can often be used, so that one can use protein concentrations in the low mM range and still obtain useful quantitative results. The trNOE can be either intermolecular, where one of the spins involved is on the protein and the other on the ligand, or intramolecular, where both spins are on the ligand. The first observations of the transferred NOE were intermolecular ones [113–115], and intermolecular trNOEs have been used, for example, to study peptide-antibody complexes [116]. However, there are clear problems with the assignment of the nuclei on the protein
contributing to the cross-relaxation, which require extensive selective deuteration for their solution [86,117,118], and the developments in heteronuclear NMR methods for the study of large proteins have meant that in most cases intermolecular protein-ligand NOEs can now more readily be studied directly (see below), except for very weak complexes. The major application of the trNOE remains the study of the conformation of the bound ligand by means of intramolecular NOEs between ligand protons, and this will be the focus of this section. In the early applications of the method, one-dimensional selective saturation experiments were used, with the associated limitations to ‘clear’ regions of the spectrum and the dangers of spin-diffusion effects arising from the accidental saturation of protein resonances, but the two-dimensional NOESY experiment is now the method of choice. Similarly, early applications were frequently interpreted only qualitatively, although this often led to useful conclusions, whereas quantitative analysis is now usually attempted. A number of groups have described frameworks for such quantitative analysis based on a combined relaxation and exchange matrix approach [81,119–122]. The full equations can be found in these references; here the emphasis will be on the conclusions which can be drawn from them for the design and conduct of the experiment (see, for example, [119,120]). In a typical trNOE experiment, either the intensities of NOESY cross-peaks or the initial build-up rates of NOESY cross-peak intensity as a function of mixing time for the ligand resonances are measured and internuclear distances in the bound ligand are estimated from
$$\frac{r_{ij}^{EL}}{r_{ref}^{EL}} = \left(\frac{a_{ref}}{a_{ij}}\right)^{1/6} \qquad (7.39)$$
where aij and aref are the intensities or initial build-up rates for a given cross-peak and for a cross-peak corresponding to a known internuclear distance, such as a methylene proton pair or a pair of ortho protons on an aromatic ring. This is referred to as the ‘isolated spin pair approximation’ (ISPA). Provided that a sufficient number of distances can be estimated in this way, the conformation of the bound ligand can then be determined by the same methods used for determining protein structures from NOE restraints (Chapter 5). However, for this to be successful, it is essential to consider the conditions which must be fulfilled for Equation 7.39 to be used.
7.6.1.1 Exchange Rate
The primary criterion is that the exchange should be fast on the relaxation timescale, for Equation 7.38 to be valid. In the first report of an intramolecular trNOE giving information on the conformation of a bound ligand [123], the rate of exchange between the bound and free states was in fact slow on the chemical shift timescale, so that separate resonances were observed for the bound and free ligand, but fast on the relaxation timescale, so that bound-state NOEs were transferred to the free ligand; this emphasises again that these two timescales are different. An advantage of carrying out trNOE experiments under conditions of slow exchange on the chemical shift timescale is illustrated by experiments on coenzyme binding to dihydrofolate reductase, using one-dimensional selective saturation experiments [124]. At 20 °C, the resonance of the N2 proton of free NADP⁺ showed trNOE effects of essentially equal magnitude whether the irradiation was centred on the N1′ (free) or the N1′ (bound) proton resonance frequencies. Thus, with reference to Figure 7.6, the observed effect is the same for the pathway SF → SB → IB → IF as for the pathway SB → IB → IF,
where N1′ = S and N2 = I. It follows that the rate of exchange of the coenzyme between the two states must be much faster than the relaxation rate of the N1′ proton in the bound state – a condition for the applicability of Equation 7.38. By contrast, at 3 °C, where the estimated dissociation rate constant is 18 s⁻¹ as compared to 29 s⁻¹ at 29 °C, irradiation at the resonance position of N1′ (free) has a distinctly smaller effect on the intensity of the N2 (free) signal than irradiation at N1′ (bound), and exchange is therefore not fast enough for averaging of the cross-relaxation rates. Thus, under conditions of slow exchange on the chemical shift timescale, there is a simple criterion by which to establish whether exchange is fast on the relaxation timescale. However, the vast majority of subsequent applications of the trNOE have involved systems in fast exchange on both shift and relaxation timescales. These have the advantage that very simple spectra are observed, with sharp and readily assigned signals from the ligand (when, as is commonly the case, a substantial excess of ligand over protein is used), but they have the disadvantage that it is much more difficult to establish with certainty that exchange is truly fast. When exchange is somewhat slower, the initial rate of the NOE build-up curve is decreased and is no longer simply a function of the cross-relaxation rates and the fraction of bound ligand, but has a complicated dependence on the dissociation rate of the protein-ligand complex. For quantitative results, where the NOE intensity is related only to the internuclear distance, we require that k−1 ≫ σEL. This inequality is least likely to be satisfied for protons which are close together in space, so that σEL is large. Using the suggested [112] approximation
$$\sigma \approx \frac{57\,\tau_{c}(\mathrm{ns})}{r_{ij}^{6}(\mathrm{\AA})} = \frac{24\,\mathrm{MolWt}(\mathrm{kDa})}{r_{ij}^{6}(\mathrm{\AA})} \qquad (7.40)$$
we can estimate that for an internuclear distance of 2 Å, in a protein of 50 kDa σEL ≈ 19 s⁻¹ and in a protein of 150 kDa σEL ≈ 56 s⁻¹, and that we require k−1 > 100 s⁻¹ and > 300 s⁻¹ respectively. Assuming diffusion-limited association rates, we require Kd ≳ 1 µM. For Kd > 1 mM, it will be difficult to ensure that sufficient ligand is present in the bound state to contribute to the overall relaxation, and these values define a ‘window’ in which quantitative analysis of trNOEs is possible. TrNOEs can be observed for more tightly binding ligands (with Kd values as low as 0.03 µM; [125]), but in these cases only qualitative analysis is possible – though this may suffice to draw useful conclusions about the conformation of the bound ligand. Unfortunately it is far from easy to establish from the trNOE experiment itself whether the fast exchange criterion is satisfied. Simulations have shown that exchange limitation can lead to a lag in the trNOE build-up rate at short mixing times, but the low-intensity cross-peaks at short mixing times are often difficult to measure, and such a lag can also arise from spin-diffusion (see below). A truly quantitative analysis requires the comparison of the experimental data with simulations accounting for both relaxation and exchange processes, and this requires an independent measurement of koff, for example from relaxation dispersion experiments.
7.6.1.2 Contributions from Other Species
The simple analysis assumes that pL σL ≪ pEL σEL, so that the observed averaged cross-relaxation rates are dominated by the bound state scaled down by the fraction of bound ligand. In general this will be true, and indeed for many ligands of interest ωτc ≈ 1 for the
free ligand, so that the NOEs in the free state will be close to zero. However, it is essential to check this by running a control NOE experiment in the absence of protein. While decreasing the ligand concentration will decrease the contribution from the free state, a substantial excess of free ligand is advantageous from the practical point of view. At higher ligand:protein ratios, although the percentage NOE enhancement decreases, the absolute magnitude of the signal increases, giving an improved signal-to-noise ratio, and the build-up rate as a function of mixing time is slower, avoiding the need for very short mixing times. Simulations have led to the recommendation that it is best to use very low pEL values and moderately long mixing times [119]. Note that the relative contribution of the free and bound states also depends on the size of the protein (on the relative magnitude of σEL and σL); for small proteins it is necessary to use higher pEL values (lower ligand to protein ratios) to ensure that the inequality above is satisfied [126]. A potentially greater problem arises from nonspecific binding, since the experiments are usually carried out with a relatively large excess of ligand. For example, with a 30-fold excess of ligand, binding to a specific site and to a 100-fold weaker nonspecific site will give almost equal contributions to the observed (averaged) trNOEs, and hence misleading distance information. Depending on the accessible concentration range, it may be possible to minimise nonspecific binding by decreasing the ligand and protein concentrations while maintaining a constant ligand:protein molar ratio [127]. However, the most clear-cut control is to repeat the trNOE experiment with the protein binding site blocked by a tight-binding inhibitor [122,128].
7.6.1.3 Spin Diffusion
The ‘isolated spin pair approximation’ referred to above ignores all the nuclei in the system except spins I and S.
This is clearly wrong, in general; it is apparent from Figure 7.6 that there will in principle be other relaxation pathways, referred to as spin diffusion pathways, for I and S. This is the single greatest problem in a truly quantitative analysis of the trNOE. The traditional means by which spin diffusion is identified in NOE experiments is the existence of a ‘lag’ at the beginning of the build-up of NOE intensity as a function of mixing time, but this is of little practical use in trNOE experiments since the very small peaks at short mixing times are very hard to measure accurately and, as noted above, a lag can also arise as a result of exchange limitation. Spin diffusion can be detected using a transferred ROESY experiment, in which spin diffusion gives rise to smaller peaks of opposite sign to direct NOEs. Bax and co-workers have provided an elegant example of a trNOE that disappeared on using a transferred ROESY experiment, showing that it was due to spin diffusion [129]. However, this experiment typically requires a low ligand:protein ratio and more careful control experiments in the absence of protein, because the ratio of σB to σF is not as great as for NOESY, and it is also more prone to artifacts. Transferred ROESY has not in fact been very widely used, but can be very valuable as a check that measured NOEs are genuine, rather than the result of spin-diffusion [120]. In considering the effects of spin diffusion, it is important to distinguish between intramolecular spin diffusion (within the ligand, involving ligand protons in addition to I and S) and intermolecular spin diffusion (involving protein protons). Spin diffusion through the ligand protons decreases reasonably rapidly as the fraction of bound ligand decreases so that, under the conditions of excess ligand which are commonly used, it is less of a problem. It is also possible to use a relaxation matrix approach to include all the dipolar
interactions between all the spins in the ligand, thus taking into account spin diffusion [104,120,130]. The success of this approach of course requires that a large number of trNOEs can be measured, so that a large number of distance restraints define the relative positions of the ligand protons and the spin diffusion effects can be estimated. Spin diffusion involving protein protons decreases less rapidly as the fraction of bound ligand decreases and can become a severe problem at typical ligand:protein ratios [121,131]. Simulations have been used [112,132,133] to argue that intermolecular spin diffusion is no more important than intramolecular spin diffusion in trNOE experiments, at least for peptide ligands; however, this will obviously depend on the nature and conformation of the ligand and on whether it binds to the surface of the protein or in a cleft. It is useful to distinguish two kinds of intermolecular spin diffusion effects. The first leads to loss of magnetisation from the ligand spin(s) of interest and hence to a decreased intensity of the trNOE peak, leading to an overestimate of the distance between the ligand nuclei. This can to a large extent be accounted for by the conventional use of the distance estimated from the trNOE intensity as an upper bound in calculating the ligand conformation. Much more serious is the situation where there is spin diffusion from a bound ligand proton to a protein proton and thence back to a second ligand proton, leading to an increase in the trNOE and making it appear that the two ligand protons are closer together than is in fact the case. This can obviously not be accounted for by using the trNOE-derived distance as an upper bound, and the relaxation matrix approach cannot be used for intermolecular spin diffusion effects, since in general the positions of the protein protons are not known. 
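The effect of these intensity errors on the derived distance can be illustrated with the isolated spin pair relation (Equation 7.39). The sketch below is a numerical aid only: the function name is hypothetical, the reference distance of 1.78 Å for a methylene proton pair is an assumed typical value, and the intensities are invented.

```python
def ispa_distance(a_ij, a_ref, r_ref):
    """Equation 7.39 rearranged: r_ij = r_ref * (a_ref / a_ij) ** (1/6).

    a_ij and a_ref are cross-peak intensities or initial build-up rates;
    r_ref is the known reference distance (same units as the result).
    """
    if min(a_ij, a_ref, r_ref) <= 0.0:
        raise ValueError("intensities and reference distance must be positive")
    return r_ref * (a_ref / a_ij) ** (1.0 / 6.0)

R_REF = 1.78          # angstrom; assumed geminal (methylene) proton distance
A_REF = 1.0           # reference cross-peak intensity, arbitrary units
A_TRUE = A_REF / 64   # invented intensity for the proton pair of interest

# Undistorted intensity.
r_true = ispa_distance(A_TRUE, A_REF, R_REF)
# Half the intensity lost to protein protons: distance is overestimated.
r_leak = ispa_distance(0.5 * A_TRUE, A_REF, R_REF)
# Intensity doubled by a ligand -> protein -> ligand relay: underestimated.
r_relay = ispa_distance(2.0 * A_TRUE, A_REF, R_REF)

for label, r in (("true", r_true), ("leak", r_leak), ("relay", r_relay)):
    print(f"{label}: {r:.2f} angstrom")
```

Because of the sixth-root dependence, a twofold error in intensity changes the distance by only about 12%. The leak case errs on the long side and is therefore still compatible with an upper-bound restraint, whereas the relay case yields a distance shorter than the true one, which an upper bound cannot correct.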
At the same time, a spin diffusion contribution relayed through protein protons cannot be excluded even if there is a linear dependence of cross-peak intensity on mixing time [120]. Transferred ROESY experiments can sometimes help to identify spin diffusion effects [120], but the only general solution is to use perdeuterated protein (e.g. [104]).
7.6.1.4 Structure Calculation
There are thus substantial limitations to the applicability of the isolated spin pair approximation (Equation 7.39), and a full relaxation and exchange matrix approach, taking account of both ligand and protein protons, is not possible (except in the case where the structure of the complex is already known). In spite of these limitations, valuable and relatively precise structural information can be obtained provided that a substantial number of trNOEs can be observed between ligand protons distant in the chemical structure. The calculated distance, rij, from Equation 7.39 must be interpreted only as an upper bound to the distance between the ligand protons in the bound state; this is often done by classifying the trNOEs as strong, medium or weak, with corresponding distance ranges. It is very useful to measure the cross-peak intensities as a function of the NOE mixing times and/or the fraction of bound ligand [112,119,120,122]. The structure calculation then proceeds as for protein structure determination, using the distance bounds as restraints in simulated annealing or similar calculations, recognising that, because many fewer restraints will be available, the structure will be less precise.
7.6.2
Interligand Transferred NOEs
These experiments provide information on the relative position of two ligands in the binding site rather than on ligand conformation, but they are included here since the experiment involved is the same as the trNOE described above.
7.6.2.1 Two Ligands Bound Simultaneously
Just as intermolecular magnetisation transfer between the ligand and the protein can be observed in a trNOE experiment [113,115,117,118], so can magnetisation transfer between two ligands bound in adjacent positions in the binding site; this can obviously provide valuable information about the relative positions of two ligands in a ternary complex. This effect was first observed in the ternary complex of chloramphenicol and coenzyme A with chloramphenicol acetyltransferase [104] and has subsequently been observed in a number of other systems [134–139] and named the ILOE (interligand Overhauser effect). A theoretical analysis has been developed for the evaluation of interligand Overhauser effects in a ternary complex [140], providing guidance on the conditions which must be fulfilled for correct interpretation. As in the case of intraligand NOEs, ILOEs can arise from spin diffusion. This is made worse by the fact that ILOE experiments typically require longer mixing times than intraligand trNOE experiments, because the internuclear distances between the ligands will typically be longer than those within a ligand. There is thus a greater opportunity for spin diffusion; spin diffusion from a proton in one ligand to a protein proton and thence back to a proton in the second ligand will make it appear that the two ligands are closer together than is in fact the case. In addition, studies of ternary complexes by transfer experiments are considerably more challenging than those of binary complexes because the exchange rates of both ligands must be fast on the relaxation timescale (see footnote 5), while at the same time the affinities and concentrations must be such as to ensure that a substantial fraction of the protein is present as the ternary complex. It has also been shown that the detailed behaviour of the ILOE depends on the kinetic mechanism of binding – whether the two ligands bind in a random or ordered fashion [140].
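Since each ligand must separately satisfy the fast-exchange condition, it can be helpful to estimate the dissociation rate that condition implies. The sketch below uses the molecular-weight form of the approximation given earlier as Equation 7.40. The function names are hypothetical, and two inputs are assumptions rather than values taken from the text: a fivefold safety margin between koff and the cross-relaxation rate (chosen to reproduce roughly the 100 and 300 s⁻¹ quoted earlier), and a diffusion-limited association rate constant of 10⁸ M⁻¹ s⁻¹.

```python
def sigma_bound(mol_wt_kda, r_angstrom):
    """Equation 7.40, second form: sigma ~ 24 * MolWt(kDa) / r**6, in s^-1."""
    if mol_wt_kda <= 0.0 or r_angstrom <= 0.0:
        raise ValueError("molecular weight and distance must be positive")
    return 24.0 * mol_wt_kda / r_angstrom ** 6

def minimum_koff(sigma, margin=5.0):
    """Dissociation rate needed for k_off >> sigma.
    The fivefold margin is an assumption, not a value from the text."""
    return margin * sigma

def kd_from_koff(koff, kon=1.0e8):
    """Kd (in M) for a given k_off, assuming an association rate constant
    kon (M^-1 s^-1); 1e8 is an assumed diffusion-limited value."""
    return koff / kon

for mol_wt in (50.0, 150.0):
    sigma = sigma_bound(mol_wt, 2.0)   # closest proton pair, 2 angstrom
    koff = minimum_koff(sigma)
    print(f"{mol_wt:.0f} kDa: sigma = {sigma:.2f} s^-1, "
          f"k_off > {koff:.1f} s^-1, Kd > {kd_from_koff(koff):.1e} M")
```

For a ternary complex the same estimate would have to hold for both ligands, with the molecular weight taken as that of the whole complex.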
It is thus true to an even greater extent for ILOEs than for trNOEs in general that the derived distance constraints must be treated with caution, though again even qualitative information can be valuable. For example, Sledz et al. have described the use of ILOEs in fragment-based inhibitor design [141]. In the case of chloramphenicol acetyltransferase [104], the availability of crystal structures of the two binary complexes made it possible to use an iterative relaxation matrix approach, including protein protons, leading to models of the ternary complex and of the transition state for the reaction. It is important to note, however, that if information on the protein is not available, the constraints from the interligand NOEs can only move the ligands closer together, perhaps leading to an inaccurate structure of the ternary complex.
7.6.2.2 Competitive Ligands – INPHARMA
This ingenious experiment [142,143] exploits the existence of ligand-protein spin diffusion, which is such a problem for the interpretation of trNOEs, to obtain information about the relative orientation of two competing ligands in the binding site. Cross-relaxation between proton HA of ligand A and a protein proton HP will lead to magnetisation transfer between them while ligand A is bound. If ligand A then dissociates and ligand B binds, magnetisation can now be transferred from HP to HB, on ligand B. This can result in an intermolecular
Footnote 5: In principle, transferred interligand NOEs could also be observed if one of the two ligands bound tightly (in slow exchange) and the second was in fast exchange, provided that the order of binding is either random or with the rapidly-exchanging ligand binding second. Under these circumstances, the tightly binding ligand is essentially behaving as part of the protein and there will be a similar resonance assignment problem.
trNOE peak between HA and HB on the two different ligands, corresponding to the pathway HA → HP → HB. Clearly this pathway can only exist if both HA and HB are close to HP in their respective complexes, providing information about the relative orientation of the two competing ligands in the binding site. The experiment obviously requires that both ligands are exchanging rapidly relative to the cross-relaxation rate with the protein proton, and at similar rates to one another; this may limit its general applicability (for detailed simulations see [144]).
7.6.3
Ligand Conformation – Transferred Cross-Correlated Relaxation
Cross-correlated relaxation (CCR) originates from the interference between two relaxation processes (either dipolar or chemical shift anisotropy) and depends on the projection angle between the vectors defining the two interfering relaxation processes. For example, the CCR rate between two 13C-1H dipolar interactions, C1-H1 and C2-H2, depends on the projection angle defined by the directions of the C1-H1 and C2-H2 vectors. CCR thus provides angular rather than distance information and in principle is a valuable adjunct to NOE measurements. The CCR rate increases with the correlation time of the molecule and thus, as for trNOEs, transferred-CCR rates can be measured for a ligand in the presence of a protein and interpreted to provide information on the bound conformation of the ligand. This approach has been used successfully in a limited number of cases [145–147]. However, the fast exchange condition for CCR rates requires $k_{\mathrm{off}} \gg R_{2,B}$ for C1 and C2, as opposed to $k_{\mathrm{off}} \gg \sigma_B, R_{1,B}$ for the trNOE; this condition is substantially more restrictive, typically by a factor of 10, which limits the application of transferred CCR [148].

7.6.4 Chemical Shift Mapping – Location of the Binding Site
Of the NMR parameters that change upon complex formation, the chemical shift is the easiest to measure and has been extensively employed in the location of ligand binding sites by ‘chemical shift mapping’ (see also Chapter 8). With the use of uniformly [2H,13C,15N]-labelled proteins, it is now possible to assign the backbone resonances of proteins up to at least 50 kDa. The [1H-15N]-HSQC spectrum of such a protein therefore contains one signal for each amino-acid residue except proline. On addition of the ligand, the signals of those amides whose environments are changed by ligand binding will change position.

If the ligand binds in fast exchange, addition of increasing concentrations of the ligand will lead to progressive shifts of the resonances, so that each amide signal can be followed from its position in the free protein to its position in the complex. For slow exchange, affected residues will be characterised by the disappearance of the signal from the free protein and the appearance of a signal from the complex. In the slow exchange situation, if (as is usually the case) resonance assignments for the complex are not available, the ‘minimum chemical shift approach’ is employed. In this method, for each cross-peak in the assigned [1H-15N]-HSQC spectrum of the free protein, the nearest cross-peak in the unassigned spectrum of the complex is identified and the chemical shift difference between them is measured. This conservative approach may underestimate the shift differences in crowded regions of the spectrum, and will clearly not identify all those residues whose environment is different in the two situations, but it has proved very valuable.

In both the fast and slow exchange situations, the 1H and 15N chemical shift differences are often combined into a single parameter, using a formula such as

$$\Delta(H,N) = \sqrt{(\Delta_H W_H)^2 + (\Delta_N W_N)^2} \qquad (7.41)$$

where $W_H$ and $W_N$ are weighting factors for the 1H and 15N amide shifts respectively and $\Delta = \nu_{\mathrm{complex}} - \nu_{\mathrm{free}}$. Values of $W_H = 1$, $W_N = 0.154$ [149,150] are commonly used; Schumann et al. [151] have recently discussed the optimum formulae and weighting factors (though in the context of protein-protein interactions) and conclude that there may be an advantage in using amino-acid specific weighting factors derived from the BMRB.

This chemical shift parameter allows one to identify those amide groups whose environment is most affected by ligand binding. These will include groups both in residues that make contact with the ligand and in residues that are affected indirectly by ligand-induced changes in protein structure. Although these groups cannot be distinguished on the basis of this experiment, in practice a clear surface patch of affected residues is generally observed when the affected residues are mapped onto the protein structure, indicating the location of the binding site. A program, SAMPLEX, which carries out automated mapping of perturbed and unperturbed regions of the protein has been described [152]. Where comparisons with subsequently determined structures have been made, chemical shift mapping has proved, even when the minimum chemical shift approach is used, to be a reliable indicator of the location of ligand binding sites in proteins (e.g. [153,154]). Further refinement is possible by using the shifts as ambiguous restraints in programs such as HADDOCK ([155]; see Chapter 8).

Recently it has been shown [156] that small changes in the magnitude of the one-bond 1H-15N scalar coupling in backbone amide groups can be observed on ligand binding.
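The combination of shift differences in Equation 7.41 and the minimum chemical shift approach described earlier in this section can be sketched numerically. The Python fragment below is an illustration only, not the procedure used in the cited work: it assumes that peak positions are supplied in ppm as (1H, 15N) pairs, it uses the commonly quoted weights WH = 1 and WN = 0.154 as defaults, and the function names `combined_shift` and `minimum_shift` are hypothetical.

```python
import numpy as np

W_H, W_N = 1.0, 0.154  # commonly used weighting factors (see Equation 7.41)


def combined_shift(delta_h, delta_n, w_h=W_H, w_n=W_N):
    """Combined 1H/15N shift difference of Equation 7.41 (inputs assumed in ppm)."""
    delta_h = np.asarray(delta_h, dtype=float)
    delta_n = np.asarray(delta_n, dtype=float)
    return np.sqrt((delta_h * w_h) ** 2 + (delta_n * w_n) ** 2)


def minimum_shift(free_peaks, complex_peaks, w_h=W_H, w_n=W_N):
    """Minimum chemical shift approach for slow exchange.

    free_peaks    : (n, 2) array of assigned (1H, 15N) positions, free protein
    complex_peaks : (m, 2) array of unassigned (1H, 15N) positions, complex
    Returns, for each free-protein peak, the combined shift difference to the
    nearest peak of the complex.
    """
    free = np.asarray(free_peaks, dtype=float)
    cplx = np.asarray(complex_peaks, dtype=float)
    # Pairwise differences between every free peak and every complex peak.
    d_h = cplx[None, :, 0] - free[:, None, 0]
    d_n = cplx[None, :, 1] - free[:, None, 1]
    return combined_shift(d_h, d_n, w_h, w_n).min(axis=1)


# Hypothetical peak lists: the first amide is unperturbed, the second moves
# by 0.1 ppm in the 1H dimension.
free = [[8.0, 120.0], [7.5, 115.0]]
bound = [[8.0, 120.0], [7.6, 115.0]]
perturbation = minimum_shift(free, bound)
```

For fast exchange, `combined_shift` would be applied directly to the assigned free and bound positions of each amide. For slow exchange without assignments of the complex, `minimum_shift` can never exceed the true shift difference of a residue, which is why the approach is described as conservative.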
Though these changes in scalar coupling are small, they can be measured reliably, and the fact that there is only a modest correlation with $\Delta(H,N)$ suggests that they contain additional information.

An ingenious extension of chemical shift mapping has been developed by Fesik and colleagues and dubbed ‘SAR (structure–activity relationships) by NMR’ [157]. [1H-15N]-HSQC spectra are used to screen a library of small molecules to identify molecules that bind to two distinct sites. When these ‘lead’ molecules have been identified, analogues are synthesised and tested to optimise the affinity at the two sites. Once two ligands with reasonable affinity have been identified, knowledge of the protein structure, and hence of the relative positions of the two sites identified from the [1H-15N]-HSQC experiments, is used to design a ‘linker’ to combine the two relatively weakly binding molecules into one that binds much more strongly. Very substantial increases in affinity have been obtained in this way [55,157,158].

While backbone amide chemical shifts are commonly used for chemical shift mapping aimed at the location of the ligand binding site, selective labelling can be used to answer specific questions about the effects of ligand binding, for example in a recently reported ‘conformational assay’ which rapidly identified compounds binding to Abl kinase which affected the conformation of helix I [159].

7.6.5 Paramagnetic Relaxation Experiments
The magnetic moment of an unpaired electron is of the order of $10^3$ times that of the proton, and thus the presence of a paramagnetic species can have a marked effect on the NMR spectrum. Both relaxation and shift effects are observed, these effects being distance- and orientation-dependent and thus containing valuable structural information. These effects are discussed in detail in Chapter 6, but are mentioned briefly here because dipolar relaxation of protons by paramagnetic species has been extensively used to obtain information on the distance between nuclei in the ligand and the paramagnetic species, and hence on the conformation and mode of binding of ligands in protein binding sites (e.g. [160–169]). The paramagnetic species can be naturally occurring, such as iron in haem proteins; a substituted metal ion, such as manganese replacing the essential magnesium in kinases or cobalt as a replacement in a number of zinc-dependent enzymes; or an artificially attached metal ion or spin-label (see Chapter 6).

The magnitude of the relaxation effect of the paramagnetic species is such that experiments are carried out under conditions where the concentration of the protein-ligand complex containing the paramagnetic species is very low relative to the total concentration of the ligand, with the requirement that exchange between the bound and free states is fast, so that average relaxation rates are measured and extrapolated to the bound state. The measurements are thus easy to carry out, but the interpretation is not always straightforward. We are interested in the paramagnetic contribution to the relaxation rate in the bound state, $R_{1,P}$, so it is necessary to correct for diamagnetic contributions ($R_{1,D}$) by carrying out a control experiment with, for example, Mg(II) in place of Mn(II); we then have, when $L_T \gg E_T$,

$$(R_{1,\mathrm{obs}} - R_{1,D}) - R_{1,L} = \frac{E_T}{K_d + L_T}\,(R_{1,P} - R_{1,L}) \qquad (7.42)$$

and a plot of $(R_{1,\mathrm{obs}} - R_{1,D})$ against $L_T$ will give estimates of $R_{1,P}$ and $K_d$. Since the paramagnetic effects are large, the extrapolation to the bound state is a long one, and it is important to make measurements at as low a ligand concentration as feasible (limited by the broadening of the resonances). Now

$$R_{1,P} = \frac{p_{LM}}{T_{1,M} + \tau_M} \qquad (7.43)$$
where $p_{LM}$ is the fraction of the ligand in a complex with both the protein and the paramagnetic species, $\tau_M$ is the lifetime of this species and $T_{1,M} = 1/R_{1,M}$. Since it is $R_{1,M}$ which contains the distance information we seek, the first step in the analysis is to establish whether $T_{1,M}$ or $\tau_M$ dominates in Equation 7.43, that is, whether exchange is indeed fast; only in this case can we obtain distance information. When a haem protein, for example, is being studied, the metal ion is very firmly bound to the protein, and the exchange process of concern is simply that of the ligand on and off the enzyme. However, in the case of the use of Mn(II) to study kinases one also needs to consider the binding of the metal to free nucleotide and the formation of a protein-ligand complex without bound metal.⁶

⁶ Nageswara Rao and colleagues [163,167–169] have noted that in this latter case the experiments are simplified if all of the ligand is bound to the protein (LT < ET) and the Mn(II) is present at a much lower concentration; however, this does of course limit the experiments to the observation of 31P or of 13C in labelled ligands.

The relation between $R_{1,M}$ and the distance from the nucleus to the paramagnetic species is given by the Solomon-Bloembergen equations (see Chapter 6). The assumptions involved in the application of these equations are outlined by Dwek [170] and by Jardetzky and Roberts [171]. In a simplified form, and neglecting any contribution from scalar relaxation or from outer-sphere relaxation, the Solomon-Bloembergen equation is

$$R_{1,M} = \frac{C}{r^6}\, f(\tau_c) \qquad (7.44)$$
where $C$ contains physical constants characteristic of the paramagnetic species and the relaxing nucleus, $r$ is the nucleus-electron distance and $\tau_c$ is the correlation time of the dipolar interaction, with

$$\frac{1}{\tau_c} = \frac{1}{\tau_R} + \frac{1}{\tau_s} + \frac{1}{\tau_M} \qquad (7.45)$$

where $\tau_R$ is the rotational correlation time of the complex and $\tau_s$ the electron spin relaxation time. Thus, in order to estimate $r$ it is necessary, first, to establish that exchange is fast ($\tau_M \ll T_{1,M}$) and, secondly, to determine the value of $\tau_c$. In order to do this, $R_{1,P}$ must be measured as a function of temperature and of spectrometer frequency. If $R_{1,P}$ shows a linear increase as a function of reciprocal temperature, this confirms that the fast exchange condition is satisfied. When $R_{1,P}$ is limited by $\tau_M$, it shows the opposite temperature dependence and in addition is frequency-independent. If $\tau_c$ is itself frequency-independent, then a plot of $R_{1,P}$ versus $\omega^2$ will be linear; however, $\tau_s$ (and hence $\tau_c$) can under some circumstances (e.g. for Mn(II) complexes) itself be frequency-dependent, making the plot nonlinear. In either event, measurements at a minimum of three frequencies are required to fit the data to the Solomon-Bloembergen equation and allow extraction of distance estimates.

7.6.6 Isotope-Filtered and -Edited Experiments
When exchange of the ligand on and off the protein is slow, the most clear-cut information on the location of the binding site and the conformation of the bound ligand comes from direct observation of the resonances of the bound ligand. When [13C,15N]-labelled protein and/or ligand are available, ligand and protein resonances can be separately observed, and indeed ligand–ligand or ligand–protein NOEs can be selectively observed, by so-called isotope-filtered and -edited experiments.⁷

⁷ The term ‘filter’ is used to denote rejection (for example, a 15N-filter removes the signals of protons attached to 15N from the spectrum, but passes all others) and the term ‘edit’ denotes selection of isotope-attached proton magnetisation.

The basic idea of the isotope-filtered and -edited approach to studying a protein–ligand complex is shown in Figure 7.7. In this example the macromolecule is labelled uniformly with 15N and 13C, while the ligand is at natural isotopic abundance. (Analogous experiments can be carried out with isotopically labelled ligand, though in most cases it is simpler to label the protein biosynthetically, particularly since one will often wish to study the binding of several different ligands.) Correlations within the labelled protein may be selectively observed using isotope-edited 2D or isotope-separated nD experiments (thin solid arrows). Resonances of the bound ligand can be specifically observed using isotope-filtered NMR experiments (thin dashed arrow) in order to remove the 1H resonances of the labelled protein, which would otherwise swamp the bound ligand spectrum. Intermolecular NOE correlations may be selectively observed using experiments in which one proton dimension is isotope-filtered, to select for resonances of the unlabelled ligand while removing protein resonances, and the other proton dimension is isotope-edited (2D experiment) or isotope-separated (3D experiment), to select for resonances of the labelled protein (thick dotted arrows); see also Chapter 8.

Figure 7.7 The principle of isotope-filtered and -edited experiments on a protein-ligand complex. The macromolecule is labelled uniformly with 15N and 13C, while the ligand is at natural isotopic abundance. Correlations within the labelled protein may be selectively observed using isotope-edited or isotope-separated experiments (thin solid arrows). Resonances of the bound ligand can be specifically observed using isotope-filtered NMR experiments (thin dashed arrow). Intermolecular NOE correlations may be selectively observed using experiments in which one proton dimension is isotope-filtered, to select for resonances of the unlabelled ligand, and the other proton dimension is isotope-edited or isotope-separated, to select for resonances of the labelled protein (thick dotted arrows).

Editing experiments, in which protons attached to 13C or 15N are selectively observed, are closely related to the ubiquitous 13C- or 15N-separated experiments and are technically simpler than filtering experiments, in which protons attached to 13C or 15N are selectively removed, since in the latter case complete removal of the undesired signals is important to avoid interference with the much smaller number of resonances from the unlabelled ligand. However, filtering experiments are of key importance in observing intramolecular and intermolecular NOEs involving ligand protons and hence in obtaining information on the conformation of the bound ligand and on the mode of ligand binding.

For the latter experiments, first, the highest possible levels of isotopic incorporation into the protein are critical; any protons bound to residual 12C or 14N in the labelled protein will give rise to spurious resonances in the isotope-filtered dimension(s) of the spectrum. It is important to avoid introduction of any possible extraneous sources of 12C or 14N during the fermentation process; for example, methods involving growth to high cell density in unlabelled media, followed by harvesting, resuspension in labelled media and induction of protein expression are best avoided if possible, since they are likely to result in slight but significant carry-over of 12C and 14N into the labelled protein. Secondly, the efficiency of the isotope filtering sequence is crucial; see Chapter 8 and [7,172,173].

The power of this approach has led to the development of a wide range of different pulse sequences. These are based either on difference methods (the ‘X-half-filter’; [174]), in which $90^\circ_x\,90^\circ_{\pm x}$ pulses to the X-nucleus, giving effective flip angles of 180° or zero, are applied on alternate scans and the data stored separately for addition or subtraction, or on ‘purge’ methods, in which the undesired coherences are destroyed, for example by conversion to unobservable multiple-quantum coherences [173,175,176]. The principles of these are discussed in Cavanagh et al. [7] and in an excellent review by Breeze [172], who describes in detail the advantages and disadvantages of the different pulse sequences and the requirements and procedures for such experiments. The practicalities of the selection of experiments for application to protein–protein complexes are described in detail in Chapter 8, and many of the same considerations apply to studies of protein–ligand complexes, of which this approach has become an important part.

As an example, Figure 7.8 shows part of a 12C/13C-filtered 2D NOESY spectrum acquired from a sample containing unlabelled methotrexate and 13C-labelled dihydrofolate reductase showing the NOEs (both protein–ligand and ligand–ligand) involving the methotrexate N10-CH3 and Hα protons [177], together with the structure of methotrexate indicating the 54 protein-ligand NOEs identified from this spectrum.

Figure 7.8 Detection of intermolecular NOEs in an isotope-filtered experiment. (a) Part of a 12C/13C-filtered 2D NOESY spectrum acquired from a sample containing unlabelled methotrexate and 13C-labelled L. casei dihydrofolate reductase showing the NOEs involving the methotrexate N10-CH3 and Hα protons. (b) Molecular structure of methotrexate, showing the NOEs detected between methotrexate and protein protons in the L. casei dihydrofolate reductase-methotrexate complex. Reproduced by permission from Gargaro et al., J. Mol. Biol. (1998) 277, 119–134.

In the case of a complex between a [13C,15N]-labelled protein and an unlabelled ligand, a general strategy is as follows. Proton resonances of the bound ligand are assigned by using 2D F1,F2-13C,15N-filtered TOCSY, COSY, and/or NOESY experiments. Intramolecular distance restraints for the bound ligand are obtained from the 2D F1,F2-13C,15N-filtered NOESY data.
Intermolecular protein–ligand distance restraints are derived from a 3D 13C-edited, 13C,15N-filtered HSQC-NOESY or a 3D 15N-edited, 13C,15N-filtered HSQC-NOESY spectrum. These spectra contain exclusively NOE peaks between ligand proton resonances (along the F3 dimension) and resonances from 13C-attached or 15N-attached protons in the protein (along the F1 (1H) and F2 (13C or 15N) dimensions).
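The two requirements noted in this section for filtering experiments, a high level of isotopic incorporation and an efficient filter, can be illustrated with a simple bookkeeping model of the difference (‘X-half-filter’) method. The sketch below is a toy calculation under stated assumptions, not a spin-dynamics simulation and not a published protocol: spectra are represented as arrays of intensities, and the parameters `labelling` (fraction of protein protons attached to 13C) and `inversion_efficiency` (fraction of the 13C-attached proton signal whose sign is reversed in the 180° scan) are hypothetical quantities introduced only for illustration.

```python
import numpy as np


def half_filter_scans(ligand, protein, labelling=1.0, inversion_efficiency=1.0):
    """Return the two alternate scans of an idealised X-half-filter.

    ligand  : intensities of protons on the unlabelled (12C) ligand
    protein : intensities of the protein protons
    """
    ligand = np.asarray(ligand, dtype=float)
    protein = np.asarray(protein, dtype=float)
    on_13c = labelling * protein                   # affected by the X pulses
    on_12c = ligand + (1.0 - labelling) * protein  # unaffected by the X pulses
    # Effective flip angle of zero on X: all signals keep their sign.
    scan_zero = on_12c + on_13c
    # Effective flip angle of 180 degrees on X: a fraction
    # `inversion_efficiency` of the 13C-attached signal changes sign.
    scan_pi = on_12c + (1.0 - 2.0 * inversion_efficiency) * on_13c
    return scan_zero, scan_pi


def filtered_and_edited(scan_zero, scan_pi):
    """Combine the separately stored scans by addition and subtraction."""
    filtered = 0.5 * (scan_zero + scan_pi)  # passes 12C-attached protons
    edited = 0.5 * (scan_zero - scan_pi)    # passes 13C-attached protons
    return filtered, edited


# Hypothetical three-point 'spectrum': one ligand peak and two protein peaks.
ligand = np.array([1.0, 0.0, 0.0])
protein = np.array([0.0, 10.0, 5.0])
filtered, edited = filtered_and_edited(
    *half_filter_scans(ligand, protein, labelling=0.98, inversion_efficiency=0.95)
)
```

In this model the residual protein intensity in the filtered spectrum is (1 − labelling) + labelling × (1 − inversion_efficiency) of the protein signal, so that even a few per cent of unlabelled protein or of filter inefficiency leaves breakthrough peaks comparable in size to the ligand resonances when the protein signals are much more intense.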
References
1. McConnell, H. (1958) Reaction rates by nuclear magnetic resonance. J. Chem. Phys., 28, 430–431.
2. Binsch, G. (1968) The study of intramolecular rate processes by dynamic nuclear magnetic resonance. Top. Stereochem., 3, 97–191.
3. Kaplan, J.I. and Fraenkel, G. (1980) NMR of Chemically Exchanging Systems, Academic Press, New York.
4. Ernst, R.R., Bodenhausen, G. and Wokaun, A. (1987) Principles of Nuclear Magnetic Resonance in One and Two Dimensions, Clarendon Press, Oxford.
5. Nageswara Rao, B. (1989) Nuclear magnetic resonance line-shape analysis and determination of exchange rates. Meth. Enzymol., 176, 279–311.
6. Bain, A. (2003) Chemical exchange in NMR. Prog. Nucl. Magn. Reson. Spectrosc., 43, 63–103.
7. Cavanagh, J., Fairbrother, W.J., Palmer, A.G. III et al. (2007) Protein NMR Spectroscopy: Principles and Practice, Elsevier Academic Press, Amsterdam.
8. Günther, U.L. and Schaffhausen, B. (2002) NMRKIN: Simulating line shapes from two-dimensional spectra of proteins upon ligand binding. J. Biomol. NMR, 22, 201–209.
9. Kilby, P.M., Van Eldik, L.J. and Roberts, G.C.K. (1997) Identification of the binding site on the S100B protein for the actin capping protein CapZ. Prot. Sci., 6, 2494–2503.
10. Millet, O., Loria, J.P., Kroenke, C.D. et al. (2000) The static magnetic field dependence of chemical exchange linebroadening defines the NMR chemical shift time scale. J. Amer. Chem. Soc., 122, 2867–2877.
11. Schmitt, T., Zheng, Z. and Jardetzky, O. (1995) Dynamics of tryptophan binding to Escherichia coli trp repressor wild type and AV77 mutant: an NMR study. Biochemistry, 34, 13183–13189.
12. Bevan, A., Roberts, G.C.K., Feeney, J. and Kuyper, L. (1985) 1H and 15N NMR studies of protonation and hydrogen-bonding in the binding of trimethoprim to dihydrofolate reductase. Eur. J. Biophys., 11, 211–218.
13. Fielding, L. (2007) NMR methods for the determination of protein-ligand dissociation constants. Prog. Nucl. Magn. Reson. Spectrosc., 51, 219–242.
14. Feeney, J., Batchelor, J.G., Albrand, J.P. and Roberts, G.C.K. (1979) The effects of intermediate exchange processes on the estimation of equilibrium constants by NMR. J. Magn. Reson., 33, 519–529.
15. Lian, L.-Y. and Roberts, G.C.K. (1993) Effects of chemical exchange on NMR spectra, in NMR of Biological Macromolecules (ed. G.C.K. Roberts), IRL Press at Oxford University Press, Oxford, pp. 153–182.
16. Led, J.J., Gesmar, H. and Abildgaard, F. (1989) Applicability of magnetization transfer nuclear magnetic resonance to study chemical exchange reactions. Meth. Enzymol., 176, 311–329.
17. Gesmar, H. and Led, J.J. (1986) Optimizing the multisite magnetization-transfer experiment. J. Magn. Reson., 68, 95–101.
18. Neuhaus, D. and Williamson, M.P. (2000) The Nuclear Overhauser Effect in Structural and Conformational Analysis, Wiley-VCH, New York.
19. Cayley, P.J., Albrand, J.P., Feeney, J. et al. (1979) Nuclear magnetic resonance studies of the binding of trimethoprim to dihydrofolate reductase. Biochemistry, 18, 3886–3895.
20. Hyde, E.I., Birdsall, B., Roberts, G.C.K. et al. (1980) Proton nuclear magnetic resonance saturation transfer studies of coenzyme binding to Lactobacillus casei dihydrofolate reductase. Biochemistry, 19, 3738–3746.
21. Clore, G.M., Roberts, G.C.K., Gronenborn, A. et al. (1981) Transfer of saturation nmr studies of protein-ligand complexes. Three-site exchange. J. Magn. Reson., 45, 151–161.
22. Gronenborn, A., Birdsall, B., Hyde, E.I. et al. (1981) 1H and 31P nmr characterisation of two conformations of the trimethoprim-NADP+-dihydrofolate reductase complex. Mol. Pharmacol., 20, 145–153.
23. Jeener, J., Meier, B.H., Bachmann, P. and Ernst, R.R. (1979) Investigation of exchange processes by two-dimensional NMR spectroscopy. J. Chem. Phys., 71, 4546–4553.
24. Tendler, S.J.B., Griffin, R.J., Birdsall, B. et al. (1988) Direct 19F nmr observation of the conformational selection of optically active rotamers of the antifolate compound fluoronitropyrimethamine bound to the enzyme dihydrofolate reductase. FEBS Lett., 240, 201–204.
25. Bothner-By, A., Stephens, R.L., Lee, J. et al. (1984) Structure determination of a tetrasaccharide: transient nuclear Overhauser effects in the rotating frame. J. Amer. Chem. Soc., 106, 811.
26. Davis, D.G. and Bax, A. (1985) Separation of chemical exchange and cross-relaxation effects in two-dimensional NMR spectroscopy. J. Magn. Reson., 64, 533–535.
27. Fejzo, J., Westler, W.M., Macura, S. and Markley, J.L. (1990) Elimination of cross-relaxation effects from two-dimensional chemical-exchange spectra of macromolecules. J. Amer. Chem. Soc., 112, 2574–2577.
28. Hwang, T.L. and Shaka, A.J. (1998) Multiple-pulse mixing sequences that selectively enhance chemical exchange or cross-relaxation peaks in high-resolution NMR spectra. J. Magn. Reson., 135, 280–287.
29. Subramanian, S., Briggs, S.L. and Kline, A.D. (2006) Monitoring the ligand binding mode by proton NMR chemical shift differences. ChemMedChem, 1, 1197–1199.
30. Farrow, N., Zhang, O., Forman-Kay, J.D. and Kay, L.E. (1994) A heteronuclear correlation experiment for simultaneous determination of 15N longitudinal decay and chemical exchange rates of systems in slow equilibrium. J. Biomol. NMR, 4, 727–734.
31. Montelione, G.T. and Wagner, G. (1989) 2D Chemical exchange NMR spectroscopy by proton-detected heteronuclear correlation. J. Amer. Chem. Soc., 111, 3096–3098.
32. Palmer, A., Kroenke, C.D. and Loria, J.P. (2001) Nuclear magnetic resonance methods for quantifying microsecond-to-millisecond motions in biological macromolecules. Meth. Enzymol., 339, 204–238.
33. Wagner, G., Bodenhausen, G., Müller, N. et al. (1985) Exchange of two-spin order in nuclear magnetic resonance: separation of exchange and cross-relaxation processes. J. Amer. Chem. Soc., 107, 6440–6446.
34. Wider, G., Neri, D. and Wüthrich, K. (1991) Studies of slow conformational equilibria in macromolecules by exchange of heteronuclear longitudinal 2-spin-order in a 2D difference correlation experiment. J. Biomol. NMR, 1, 93–98.
35. Demers, J. and Mittermaier, A. (2009) Binding mechanism of an SH3 domain studied by NMR and ITC. J. Amer. Chem. Soc., 131, 4355–4367.
36. Palmer, A., Grey, M.J. and Wang, C. (2005) Solution NMR spin relaxation methods for characterizing chemical exchange in high-molecular-weight systems. Meth. Enzymol., 394, 430–465.
37. Palmer, A. and Massi, F. (2006) Characterization of the dynamics of biomacromolecules using rotating-frame spin relaxation NMR spectroscopy. Chem. Revs., 106, 1700–1719.
38. Akke, M. (2002) NMR methods for characterizing microsecond to millisecond dynamics in recognition and catalysis. Curr. Opin. Struct. Biol., 12, 642–647.
39. Kay, L.E. (2005) NMR studies of protein structure and dynamics. J. Magn. Reson., 173, 193–207.
40. Hansen, D.F., Vallurupalli, P., Lundström, P. et al. (2008) Probing chemical shifts of invisible states of proteins with relaxation dispersion NMR spectroscopy: How well can we do? J. Amer. Chem. Soc., 130, 2667–2675.
41. Mittermaier, A. and Kay, L.E. (2006) New tools provide new insights in NMR studies of protein dynamics. Science, 312, 224–228.
42. Kern, D., Eisenmesser, E.Z. and Wolf-Watz, M. (2005) Enzyme dynamics during catalysis measured by NMR spectroscopy. Meth. Enzymol., 394, 507–524.
43. Vallurupalli, P., Hansen, D.F., Stollar, E. et al. (2007) Measurement of bond vector orientations in invisible excited states of proteins. Proc. Natl. Acad. Sci. U.S.A., 104, 18473–18477.
44. Korzhnev, D.M., Bezsonova, I., Lee, S. et al. (2009) Alternate binding modes for a ubiquitin-SH3 domain interaction studied by NMR spectroscopy. J. Mol. Biol., 386, 391–405.
45. Korzhnev, D.M., Salvatella, X., Vendruscolo, M. et al. (2004) Low-populated folding intermediates of Fyn SH3 characterized by relaxation dispersion NMR. Nature, 430, 586–590.
46. Korzhnev, D.M., Neudecker, P., Mittermaier, A. et al. (2005) Multiple-site exchange in proteins studied with a suite of six NMR relaxation dispersion experiments: An application to the folding of a Fyn SH3 domain mutant. J. Amer. Chem. Soc., 127, 15602–15611.
47. Korzhnev, D.M. and Kay, L.E. (2008) Probing invisible, low-populated states of protein molecules by relaxation dispersion NMR spectroscopy: An application to protein folding. Acc. Chem. Res., 41, 442–451.
48. Neudecker, P., Lundström, P. and Kay, L.E. (2009) Relaxation dispersion NMR spectroscopy as a tool for detailed studies of protein folding. Biophys. J., 96, 2045–2054.
49. Tollinger, M., Skrynnikov, N.R., Mulder, F.A.A. et al. (2001) Slow dynamics in folded and unfolded states of an SH3 domain. J. Amer. Chem. Soc., 123, 11341–11352.
50. Kovrigin, E.L., Kempf, J.G., Grey, M.J. and Loria, J.P. (2006) Faithful estimation of dynamics parameters from CPMG relaxation dispersion measurements. J. Magn. Reson., 180, 93–104.
51. Neudecker, P., Korzhnev, D.M. and Kay, L.E. (2006) Assessment of the effects of increased relaxation dispersion data on the extraction of 3-site exchange parameters characterizing the unfolding of an SH3 domain. J. Biomol. NMR, 34, 129–135.
52. Igumenova, T.I., Brath, U., Akke, M. and Palmer, A.G. (2007) Characterization of chemical exchange using residual dipolar coupling. J. Amer. Chem. Soc., 129, 13396–13397.
53. Korzhnev, D.M., Religa, T.L., Banachewicz, W. et al. (2010) A transient and low-populated protein-folding intermediate at atomic resolution. Science, 329, 1312–1316.
54. Fesik, S.W. (1991) NMR studies of molecular complexes as a tool in drug design. J. Med. Chem., 34, 2937–2945.
55. Hajduk, P.J., Meadows, R.P. and Fesik, S.W. (1999) NMR-based screening in drug discovery. Quart. Revs. Biophys., 32, 211–240.
56. Lepre, C.A., Moore, J.M. and Peng, J.W. (2004) Theory and applications of NMR-based screening in pharmaceutical research. Chem. Revs., 104, 3641–3675.
57. Peng, J.W., Moore, J. and Abdul-Manan, N. (2004) NMR experiments for lead generation in drug discovery. Prog. Nucl. Magn. Reson. Spectrosc., 44, 225–256.
58. Villar, H.O., Yan, J.L. and Hansen, M.R. (2004) Using NMR for ligand discovery and optimization. Curr. Opin. Chem. Biol., 8, 387–391.
59. Jhoti, H., Cleasby, A., Verdonk, M. and Williams, G. (2007) Fragment-based screening using X-ray crystallography and NMR spectroscopy. Curr. Opin. Chem. Biol., 11, 485–493.
60. Pellecchia, M., Sem, D.S. and Wüthrich, K. (2002) NMR in drug discovery. Nature Revs. Drug Discov., 1, 211–219.
61. Stockman, B.J. and Dalvit, C. (2002) NMR screening techniques in drug discovery and drug design. Prog. Nucl. Magn. Reson. Spectrosc., 41, 187–231.
62. Meyer, B. and Peters, T. (2003) NMR spectroscopy techniques for screening and identifying ligand binding to protein receptors. Angew. Chem. Int. Ed. Engl., 42, 864–890.
63. Pellecchia, M., Bertini, I., Cowburn, D. et al. (2008) Perspectives on NMR in drug discovery: a technique comes of age. Nature Revs. Drug Discov., 7, 738–745.
64. Assadi-Porter, F.M., Tonelli, M., Maillet, E. et al. (2008) Direct NMR detection of the binding of functional ligands to the human sweet receptor, a heterodimeric family 3 GPCR. J. Amer. Chem. Soc., 130, 7212–7213.
65. Gyi, J.I., Brennan, R.J., Pye, D.A. and Barber, J. (1991) The binding of erythromycin A to bacterial ribosomes – a H-1 transferred NOE study. J. Chem. Soc. Chem. Commun., 1471–1473.
66. Verdier, L., Gharbi-Benarous, J., Bertho, G. et al. (2002) Antibiotic resistance peptides: Interaction of peptides conferring macrolide and ketolide resistance with Staphylococcus aureus ribosomes. Conformation of bound peptides as determined by transferred NOE experiments. Biochemistry, 41, 4218–4229.
67. Furukawa, H., Hamada, T., Hayashi, M.K. et al. (2002) Conformation of ligands bound to the muscarinic acetylcholine receptor. Mol. Pharmacol., 62, 778–787.
68. Assadi-Porter, F.M., Tonelli, M., Maillet, E.L. et al. (2010) Interactions between the human sweet-sensing T1R2-T1R3 receptor and sweeteners detected by saturation transfer difference NMR spectroscopy. Biochim. Biophys. Acta, 1798, 82–86.
69. Mari, S., Invernizzi, C., Spitaleri, A. et al. (2010) 2D TR-NOESY experiments interrogate and rank ligand-receptor interactions in living human cancer cells. Angew. Chem. Int. Ed. Engl., 49, 1071–1074.
70. Bartoschek, S., Klabunde, T., Defossa, E. et al. (2010) Drug design for G-protein-coupled receptors by a ligand-based NMR method. Angew. Chem. Int. Ed. Engl., 49, 1426–1429.
71. Dalvit, C. (2007) Ligand- and substrate-based 19F NMR screening: Principles and applications to drug discovery. Prog. Nucl. Magn. Reson. Spectrosc., 51, 243–271.
72. Mayer, M. and Meyer, B. (2001) Group epitope mapping by saturation transfer difference NMR to identify segments of a ligand in direct contact with a protein receptor. J. Amer. Chem. Soc., 123, 6108–6117.
73. Dalvit, C., Flocco, M., Knapp, S. et al. (2002) High-throughput NMR-based screening with competition binding experiments. J. Amer. Chem. Soc., 124, 7702–7709.
74. Dalvit, C., Fasolini, M., Flocco, M. et al. (2002) NMR-based screening with competition water-ligand observed via gradient spectroscopy experiments: Detection of high-affinity ligands. J. Med. Chem., 45, 2610–2614.
75. Fischer, J.J. and Jardetzky, O. (1965) Nuclear magnetic relaxation study of intermolecular complexes. The mechanism of penicillin binding to serum albumin. J. Amer. Chem. Soc., 87, 3237.
76. Hajduk, P.J., Olejniczak, E.T. and Fesik, S.W. (1997) One-dimensional relaxation- and diffusion-edited NMR methods for screening compounds that bind to macromolecules. J. Amer. Chem. Soc., 119, 12257–12261.
77. Altieri, A.S., Hinton, D.P. and Byrd, R.A. (1995) Association of biomolecular systems via pulsed field gradient NMR self-diffusion measurements. J. Amer. Chem. Soc., 117, 7566–7567.
Structural and Dynamic Information on Ligand Binding
263
78. Brand, T., Cabrita, E.J. and Berger, S. (2005) Intermolecular interaction as investigated by NOE and diffusion studies. Prog. Nucl. Magn. Reson. Spectrosc., 46, 159–196. 79. Mayer, M. and Meyer, B. (1999) Characterization of ligand binding by saturation transfer difference NMR spectroscopy. Angew. Chem. Int. Ed. Engl., 38, 1784–1788. 80. Cutting, B., Shelke, S.V., Dragic, Z. et al. (2007) Sensitivity enhancement in saturation transfer difference (STD) experiments through optimized excitation schemes. Magn. Reson. Chem., 45, 720–724. 81. Jayalakshmi, V. and Rama Krishna, N. (2002) Complete relaxation and conformational exchange matrix (CORCEMA) analysis of intermolecular sturation transfer effects in reversibly forming ligand-receptor complexes. J. Magn. Reson., 155, 106–118. 82. Krishna, N.R. and Jayalakshmi, V. (2006) Complete relaxation and conformational exchange matrix analysis of STD-NMR spectra of ligand-receptor complexes. Prog. Nucl. Magn. Reson. Spectrosc., 49, 1–25. 83. Kemper, S., Patel, M.K., Errey, J.C. et al. (2010) Group epitope mapping considering relaxation of the ligand (GEM-CRL): Including longitudinal relaxation rates in the analysis of sturation transfer difference (STD) experiments. J. Magn. Reson., 203, 1–10. 84. McCoy, M.A., Senior, M.M. and Wyss, D.F. (2005) Screening of protein kinases by ATP-STD NMR spectroscopy. J. Amer. Chem. Soc., 127, 7978–7979. 85. Wang, Y.-S., Liu, D. and Wyss, D.F. (2004) Competition STD NMR for the detection of highaffinity ligands and NMR-based screening. Magn. Reson. Chem., 42, 485–489. 86. Hajduk, P.J., Mack, J.C., Olejniczak, E.T. et al. (2004) SOS-NMR: A saturation transfer NMRbased method for determining the structures of protein-ligand complexes. J. Amer. Chem. Soc., 126, 2390–2398. 87. Dalvit, C., Fogliatto, G., Stewart, A. et al. (2001) WaterLOGSY as a method for primary NMR screening: Practical aspects and range of applicability. J. Biomol. NMR., 21, 349–359. 88. 
Gossert, A.D., Henry, C., Blommers, M.J.J. et al. (2009) Time efficient detection of proteinligand interactions with the polarization optimized PO-WaterLOGSY NMR experiment. J. Biomol. NMR., 43, 211–217. 89. Nageswara Rao, B.D. (1989) Determination of equilibrium constants of enzyme-bound reactants and products by nuclear magnetic resonance. Meth. Enzymol., 177, 358–3755. 90. Eisenmesser, E.Z., Bosco, D.A., Akke, M. and Kern, D. (2002) Enzyme dynamics during catalysis. Science., 295, 1520–1523. 91. Eisenmesser, E.Z., Millet, O., Labeikovsky, W. et al. (2005) Intrinsic dynamics of an enzyme underlies catalysis. Nature, 438, 117–121. 92. Henzler-Wildman, K. and Kern, D. (2007) Dynamic personalities of proteins. Nature, 450, 964–972. 93. Henzler-Wildman, K.A., Lei, M., Thai, V. et al. (2007) A hierarchy of timescales in protein dynamics is linked to enzyme catalysis. Nature, 450, 913–916. 94. Henzler-Wildman, K.A., Thai, V., Lei, M. et al. (2007) Intrinsic motions along an enzymatic reaction trajectory. Nature, 450, 838–844. 95. Loria, J.P., Berlow, R.B. and Watt, E.D. (2008) Characterization of enzyme motions by solution NMR relaxation dispersion. Acc. Chem. Res., 41, 214–221. 96. Feeney, J. (2000) NMR studies of ligand binding to dihydrofolate reductase. Angew. Chem. Int. Ed., 39, 290–312. 97. Feeney, J., Birdsall, B., Albrand, J.P. et al. (1981) A 1H nmr study of the complexes of two diastereoisomers of folinic acid with dihydrofolate reductase. Biochemistry, 20, 1837. 98. Gronenborn, A., Birdsall, B., Hyde, E.I. et al. (1981) Direct observation by NMR of two coexisting conformations of an enzyme-ligand complex in solution. Nature, 290, 273–274. 99. Birdsall, B., Burgen, A.S.V., Hyde, E.I. et al. (1981) Negative cooperativity between folinic acid and coenzyme in their binding to L. casei dihydrofolate reductase. Biochemistry, 20, 7186–7195. 100. Birdsall, B., Gronenborn, A., Hyde, E.I. et al. 
(1982) H-1, C-13, and P-31 nuclear magnetic resonance studies of the dihydrofolate reductase - nicotinamide adenine dinucleotide phosphate - folate complex - Characterization of 3 coexisting conformational states. Biochemistry, 21, 5831–5838.
264
Protein NMR Spectroscopy
101. Birdsall, B., Bevan, A.W., Pascual, C. et al. (1984) Multinuclear NMR characterization of two coexisting conformational states of the Lactobacillus casei dihydrofolate reductase-trimethoprim-NADP þ complex. Biochemistry, 23, 4733–4742. 102. Kleerekoper, Q., Liu, W., Choi, D. and Putkey, J.A. (1998) Identification of binding sites for bepridil and trifluoperazine on cardiac troponin C. J. Biol. Chem., 273, 8153–8160. 103. Ababou, A., Pfuhl, M. and Ladbury, J.E. (2009) Novel insights into the mechanisms of CIN85 SH3 domains binding to Cbl proteins: solution-based investigations and in vivo implications. J. Mol. Biol., 387, 1120–1136. 104. Barsukov, I.L., Lian, L.Y., Ellis, J. et al. (1996) The conformation of coenzyme A bound to chloramphenicol acetyltransferase determined by transferred NOE experiments. J. Mol. Biol., 262, 543–558. 105. Boehr, D.D., Dyson, H.J. and Wright, P.E. (2008) Conformational relaxation following hydride transfer plays a limiting role in dihydrofolate reductase catalysis. Biochemistry, 47, 9227–9233. 106. Boehr, D.D., McElheny, D., Dyson, H.J. and Wright, P.E. (2010) Millisecond timescale fluctuations in dihydrofolate reductase are exquisitely sensitive to the bound ligands. Proc. Natl. Acad. Sci. U.S.A., 107, 1373–1378. 107. Boehr, D.D., McElheny, D., Dyson, H.J. and Wright, P.E. (2006) The dynamic energy landscape of dihydrofolate reductase catalysis. Science., 313, 1638–1642. 108. McElheny, D., Schnell, J.R., Lansing, J.C. et al. (2005) Defining the role of active-site loop fluctuations in dihydrofolate reductase catalysis. Proc. Natl. Acad. Sci. USA., 102, 5032–5037. 109. Venkitakrishnan, R.P., Zaborowski, E., McElheny, D. et al. (2004) Conformational changes in the active site loops of dihydrofolate reductase during the catalytic cycle. Biochemistry, 43, 16046–16055. 110. Waldman, A.D.B., Birdsall, B., Roberts, G.C.K. and Holbrook, J.J. 
(1986) C-13-NMR and transient kinetic studies on lactate dehydrogenase Cys(CN)165-C-13 - direct measurement of a rate-limiting rearrangement in protein structure. Biochim. Biophys. Acta., 870, 102–111. 111. Rozovsky, S., Jogl, G., Tong, L. and McDermott, A.E. (2001) Solution-state NMR investigations of triosephosphate isomerase active site loop motion: ligand release in relation to active site loop dynamics. J. Mol. Biol., 310, 271–280. 112. Post, C.B. (2003) Exchange-transferred NOE spectroscopy and bound ligand structure determination. Curr. Opin. Struct. Biol., 13, 581–588. 113. Balaram, P., Bothner-By, A.A. and Breslow, E. (1972) Localization of tyrosine at binding-site of neurophysin II by negative nuclear Overhauser effects. J. Amer. Chem. Soc., 94, 4017–4018. 114. Bothner-By, A.A. and Gassend, R. (1973) Binding of small molecules to proteins. Ann. NY Acad. Sci., 222, 668–676. 115. James, T.L. and Cohn, M. (1974) The role of the lysyl residue at the active site of creatine kinase: nuclear Overhauser effect studies. J. Biol. Chem., 249, 2599–2604. 116. Anglister, J. and Naider, F. (1991) Nuclear magnetic resonance for studying peptide antibody complexes by transferred nuclear Overhauser effect difference spectroscopy. Meth. Enzymol., 203, 228–241. 117. Anglister, J. (1990) Use of deuterium labelling in NMR studies of antibody combining site structure. Quart. Revs. Biophys., 23, 175–203. 118. Pellecchia, M., Meininger, D., Dong, Q. et al. (2002) NMR-based structural characterization of large protein-ligand interactions. J. Biomol. NMR., 22, 165–173. 119. Campbell, A.P. and Sykes, B.D. (1993) The two-dimensional transferred nuclear Overhauser effect - theory and practice. Ann. Revi. Biophys. Biomol. Struct., 22, 99–122. 120. Lian, L.Y., Barsukov, I.L., Sutcliffe, M.J. et al. (1994) Protein ligand iinteractions - exchange processes and determination of ligand conformation and protein ligand contacts. Meth. Enzymol., 239, 657–700. 121. London, R.E., Perlman, M.E. 
and Davis, D.G. (1992) Realxation-matrix analysis of the transferred nuclear Overhauser effect for finite exchange rates. J. Magn. Reson., 97, 79–98. 122. Ni, F. (1994) Recent developments in transferred NOE methods. Prog. Nucl. Magn. Reson. Spectrosc., 26, 517–606.
Structural and Dynamic Information on Ligand Binding
265
123. Albrand, J.P., Birdsall, B., Feeney, J. et al. (1979) Use of tranferred nuclear Overhauser effects in the study of the conformations of small molecules bound to proteins. Int. J. Biol. Macromol., 1, 37–41. 124. Feeney, J., Birdsall, B., Roberts, G.C.K. and Burgen, A.S.V. (1983) Use of transferred nuclear Overhauser effect measurements to compare binding of coenzyme analogs to dihydrofolate reductase. Biochemistry, 22, 628–633. 125. Weimar, T., Petersen, B.O., Svensson, B. and Pinto, B.M. (2000) Determination of the solution conformation of D-gluco-dihydroacarbose, a high-affinity inhibitor, bound to glucoamylase by transferred NOE NMR spectroscopy. Carbohyd. Res., 326, 50–55. 126. Lee, Y.C., Jackson, P.L., Jablonsky, M.J. and Muccio, D.D. (2007) Conformation of 3’CMP bound to RNase A using TrNOESY. Arch. Biochem. Biophys., 463, 37–46. 127. Murali, N., Jarori, G.K., Landy, S.B. and Rao, B.D.N. (1993) Two-dimensional transferred nuclear Overhauser effect spectroscopy (TRNOESY) studies of nucleotide conformations in creatine kinase complexes: effects due to weak nonspecific binding. Biochemistry, 32, 12941–12948. 128. Behling, R.W., Yamane, T., Navon, G. and Jelinski, L.W. (1988) Conformation of acetylcholine bound to the nicotinic acetylcholine-receptor. Proc. Natl. Acad. Sci. U. S. A., 85, 6721–6725. 129. Arepalli, S.R., Glaudemans, C.P.J., Daves, G.D. et al. (1995) Identification of protein-mediated indirect NOE effects in a disaccharide-Fab’ complex by transferred ROESY. J. Magn. Reson. B., 106, 195–198. 130. Ni, F. and Zhu, Y. (1994) Accounting for ligand-protein interactions in the relaxation-matrix analysis of transferred nuclear Overhauser effects. J. Magn. Reson. B., 103, 180–184. 131. Moseley, H., Curto, EV. and Rama Krishna, N. (1995) Complete relaxation and conformation exchange matrix (CORCEMA) analysis of NOESY spectra of interacting systems; twodimensional transferred NOESY. J. Magn. Reson. B., 108, 243–261. 132. Zabell, A.P.R. and Post, C.B. 
(2002) Intermolecular relaxation has little effect on intra-peptide exchange-transferred NOE intensities. J. Biomol. NMR., 22, 303–315. 133. Eisenmesser, E.Z., Zabell, A.P.R. and Post, C.B. (2000) Accuracy of bound peptide structures determined by exchange transferred nuclear Overhauser data: A simulation study. J. Biomol. NMR., 17, 17–32. 134. Li, D.W., DeRose, E.F. and London, R.E. (1999) The inter-ligand Overhauser effect: A powerful new NMR approach for mapping structural relationships of macromolecular ligands. J. Biomol. NMR., 15, 71–76. 135. Li, D.W., Levy, L.A., Gabel, S.A. et al. (2001) Interligand Overhauser effects in type II dihydrofolate reductase. Biochemistry, 40, 4242–4252. 136. Li, D. and London, R.E. (2002) Ligand discovery using the inter-ligand nuclear Overhauser effect: horse liver alcohol dehydrogenase. Biotechnol. Letts., 24, 623–629. 137. Becattini, B., Culmsee, C., Leone, M. et al. (2006) Structure-activity relationships by interligand NOE-based design and synthesis of antiapoptotic compounds targeting. Bid. Proc. Natl. Acad. Sci. USA., 103, 12602–12606. 138. Becattini, B. and Pellecchia, M. (2006) SAR by ILOEs: An NMR-based approach to reverse chemical genetics. Chem. Eur. J., 12, 2658–2662. 139. Becattini, B., Sareth, S., Zhai, D.Y. et al. (2004) Targeting apoptosis via chemical design: Inhibition of bid-induced cell death by small organic molecules. Chem. & Biol., 11, 1107–1117. 140. London, R.E. (1999) Theoretical analysis of the inter-ligand Overhauser effect: A new approach for mapping structural relationships of macromolecular ligands. J. Magn. Reson., 141, 301–311. 141. Sledz, P., Silvestre, H.L., Hung, A.W. et al. (2010) Optimization of the interligand Overhauser effect for fragment linking: application to inhibitor discovery against Mycobacterium tuberculosis pantothenate synthetase. J. Amer. Chem. Soc., 132, 4544–4545. 142. Reese, M., Sanchez-Pedregal, V.M., Kubicek, K. et al. 
(2007) Structural basis of the activity of the microtubule-stabilizing agent epothilone A studied by NMR spectroscopy in solution. Angew. Chem. -Int. Ed., 46, 1864–1868. 143. Sanchez-Pedregal, V.M., Reese, M., Meiler, J. et al. (2005) The INPHARMA method: Proteinmediated interligand NOEs for pharmacophore mapping. Angew. Chem. -Int. Ed., 44, 4172–4175.
266
Protein NMR Spectroscopy
144. Orts, J., Griesinger, C. and Carlomagno, T. (2009) The INPHARMA technique for pharmacophore mapping: A theoretical guide to the method. J. Magn. Reson., 200, 64–73. 145. Blommers, M.J.J., Stark, W., Jones, C.E. et al. (1999) Transferred cross-correlated relaxation complements transferred NOE: Structure of an IL-4R-derived peptide bound to STAT-6. J. Amer. Chem. Socy., 121, 1949–1953. 146. Carlomagno, T., Felli, I.C., Czech, M. et al. (1999) Transferred cross-correlated relaxation: application to the determination of sugar pucker in an aminoacylated tRNA-mimetic weakly bound to EF-Tu. J. Amer. Chem. Soc., 121, 1945–1948. 147. Carlomagno, T. (2005) Ligand-target interactions: What can we learn from NMR? Ann. Rev. Biophys. Biomol. Struct., 34, 245–266. 148. Ravindranathan, S., Mallet, J.M., Sinay, P. and Bodenhausen, G. (2003) Transferred crossrelaxation and cross-correlation in NMR: effects of intermediate exchange on the determination of the conformation of bound ligands. J. Magn. Reson., 163, 199–207. 149. Ayed, A., Mulder, F.A.A., Yi, G.-S. et al. (2001) Latent and active p53 are identical in conformation. Nature Struct. Biol., 8, 756–760. 150. Seavey, B., Farr, E.A., Westler, W.M. and Markley, J.L. (1991) A relational database for sequence-specific protein NMR data. J. Biomol. NMR, 1, 217–236. 151. Schumann, F.H., Riepl, H., Maurer, T. et al. (2007) Combined chemical shift changes and amino acid specific chemical shift mapping of protein-protein interactions. J. Biomol. NMR., 39, 275–289. 152. Krzeminski, M., Loth, K., Boelens, R. and Bonvin, A.M.J.J. (2010) SAMPLEX: automatic mapping of perturbed and unperturbed regions of proteins and complexes. BMC Bioinformatics., 11, 51. 153. Williamson, R.A., Carr, M.D., Frenkiel, T.A. et al. (1997) Mapping the binding site for matrix metalloproteinase on the N-terminal domain of the tissue inhibitor of metalloproteinases-2 by NMR chemical shift perturbation. Biochemistry, 36, 13882–13889. 154. 
Muskett, F.W., Frenkiel, T.A., Feeney, J. et al. (1998) High resolution structure of the N-terminal domain of tissue inhibitor of metalloproteinases-2 and characterization of its interaction site with matrix metalloproteinase-3. J. Biol. Chem., 273, 21736–21743. 155. Dominguez, C., Boelens, R. and Bonvin, A.M.J.J. (2003) HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. J. Amer. Chem. Soc., 125, 1731–1737. 156. Ma, J., Gruschus, J.M. and Tjandra, N. (2009) 15N-1H scalar coupling perturbation: an additional probe for measuring structural changes due to ligand binding. J. Amer. Chem. Soc., 131, 9884–9885. 157. Shuker, S.B., Hajduk, P.J., Meadows, R.P. and Fesik, S.W. (1996) Discovering high-affinity ligands for proteins: SAR by NMR. Science, 274, 1531–1534. 158. Hajduk, P.J., Sheppard, G., Nettesheim, D.G. et al. (1997) Discovery of potent nonpeptide inhibitors of stromelysin using SAR by NMR. J. Amer. Chem. Soc., 119, 5818–5827. 159. Jahnke, W., Grotzfeld, R.M., Pelle, X. et al. (2010) Binding or bending: distinction of allosteric Abl kinase agonists from antagonists by an NMR-based conformational assay. J. Amer. Chem. Soc., 132, 7043–7048. 160. Mildvan, A.S. (1977) Magnetic resonance studies of the conformations of enzyme-bound substrates. Acc. Chem. Res., 10, 246–252. 161. Mildvan, A.S. and Gupta, R.K. (1978) Nuclear relaxation measurements of the geometry of enzyme-bound substrates and analogs. Meth. Enzymol., 49, 322–359. 162. Cohn, M. and Nageswara Rao, B.D. (1979) 31P NMR studies of enzymatic reactions. Bull. Magn. Reson., 1, 38–60. 163. Jarori, G.K., Ray, B.D. and Nageswara Rao, B.D. (1985) Structure of metal-nucleotide complexes bound to creatine kinase: 31P NMR measurements using Mn(II) and Co(II). Biochemistry, 24, 3487–3494. 164. Crull, G.B., Kennington, J.W., Garber, A.R. et al. 
(1989) 19F nuclear magnetic resonance as a probe of the spatial relationship between the heme iron of cytochrome P-450 and its substrate. J. Biol. Chem., 264, 2649–2655.
Structural and Dynamic Information on Ligand Binding
267
165. Modi, S., Primrose, W.U., Boyle, J.M.B. et al. (1995) NMR studies of substrate binding to cytochrome P450 BM3: Comparisons to cytochrome P450 cam. Biochemistry, 34, 8982–8988. 166. Modi, S., Paine, M.J., Sutcliffe, M.J. et al. (1996) A model for human cytochrome P450 2D6 based on homology modeling and NMR studies of substrate binding. Biochemistry, 35, 4540–4550. 167. Ray, B.D., Chau, M.H., Fife, W.K. et al. (1996) Conformation of manganese(II)-nucleotide complexes bound to rabbit muscle creatine kinase: 13C NMR measurements using [2-13C]ATP and [2-13C]ADP. Biochemistry, 35, 7239–7246. 168. Raghunathan, V., Chau, M.H., Ray, B.D. and Nageswara Rao, B.D. (1999) Structural characterization of manganese(II)-nucleotide complexes bound to yeast 3-phosphoglycerate kinase: 13 C relaxation measurements using [U-13C]ATP and [U-13C]ADP. Biochemistry, 38, 15597–15605. 169. Lin, Y. and Nageswara Rao, B.D. (2000) Structural characterization of adenine nucleotides bound to Escherichia coli adenylate kinase. 2. 31P and 13C relaxation measurements in the presence of cobalt(II) and manganese(II). Biochemistry, 39, 3647–3655. 170. Dwek, R.A. (1973) Nuclear Magnetic Resonance in Biochemistry: Applications to Enzyme Systems, Clarendon Press, Oxord. 171. Jardetzky, O. and Roberts, G.C.K. (1981) NMR in Molecular Biology, Academic Press, New York. 172. Breeze, A. (2000) Isotope-filtered NMR methods for the study of biomolecular strucutre and interactions. Prog. Nucl. Magn. Reson. Spectrosc., 36, 323–372. 173. Lee, W., Reevington, M.J., Arrowsmith, C. and Kay, L.E. (1994) A pulsed field gradient isotopefiltered 3D 13C HMQC-NOESY experiment for extracting intermolecular NOE contacts in molecular complexes. FEBS Letts., 350, 87–90. 174. Otting, G. and Wuthrich, K. (1990) Heteronuclear filters in two-dimensional [1H, 1H]-NMR spectroscopy: combined use with isotope labelling for studies of macromolecular conformation and intermolecular interactions. Quart. Revs. Biophys., 23, 39–96. 175. 
Kogler, H., Sorensen, O.W. and Ernst, R.R. (1983) Low-pass J-filters - suppression of neighbor peaks in heteronuclear relayed correlation spectra. J. Magn. Reson., 52, 157–163. 176. Ikura, M. and Bax, A. (1992) Isotope-filtered 2D NMR of a protein-peptide complex: study of a skeletal muscle myosin light chain kinase fragment bound to calmodulin. J. Amer. Chem. Soc., 114, 2433–2440. 177. Gargaro, A.R., Soteriou, A., Frenkiel, T.A. et al. (1998) The solution structure of the complex of Lactobacillus casei dihydrofolate reductase with methotrexate. J. Mol. Biol., 277, 119–134.
8 Macromolecular Complexes Paul C. Driscoll
8.1
Introduction
It is clear that applications of solution state NMR spectroscopy to the interaction of small molecule ligands with biological macromolecules have proved immensely useful in a number of fields of activity, ranging from fundamental biological research to the commercial development of therapeutic agents. In a similar vein, NMR investigations of the mutual interactions between macromolecules themselves have provided insights that are often not obtainable by other methods, or that complement those garnered by alternative approaches. There is significant overlap between the basic principles of the application of NMR to complexes of macromolecules with small molecules and those that apply to macromolecular assemblies. A major factor in the latter case, however, is that the approach is usually constrained by the tendency of the target to yield broader resonances as a result of slow(er) overall tumbling, reflecting the fundamental hydrodynamic characteristics of globular assemblies. Combined with the higher intrinsic complexity of the spectra of such systems (because of the larger number of distinct nuclear spins), considerations of how to handle such line-broadening tend to dominate the experimental design, and one typically has to combine several approaches to tackle this type of problem.

It is transparent to anyone with exposure to solution state applications of NMR spectroscopy that this is an area that will always face tough competition from colleagues who approach such targets by X-ray crystallography, not least because the latter technique does not suffer (to first order) from the impact of increased molecular size. Nevertheless NMR provides a purely solution state method that can compete with, corroborate and even sometimes correct the view obtained from the crystal lattice. Moreover, the NMR method does not possess the stringent requirement for crystals that diffract to high resolution, nor does it depend to anything like the same degree on the intrinsic thermodynamic stability of the complex or the apparently important minimisation of ‘flexible’ regions of the target constructs. The utility of NMR in this area has been extended dramatically in the last few years by descriptions of the weakly populated encounter complexes between proteins, or between protein and nucleic acid binding partners [1], as well as of dynamic exchange between alternate conformations of an intermolecular interface between two proteins [2].

Despite its promise, one should recognise that the application of NMR to macromolecular complexes remains nontrivial and, whilst clearly laboured over in a significant fraction of NMR laboratories, this strand of activity has perhaps not witnessed the same level of exploitation as the application of NMR in other more tractable areas. Nevertheless the research imperatives in structural proteomics and functional genomics constantly push us towards systems of assembled macromolecules, and it is probable that the future will witness wider application and new developments in the field of NMR spectroscopy for the challenge posed by such targets. There is already a large body of relevant literature [3–12], to which it is difficult to do justice in this single exposition. The following attempts to describe the headline concepts that must be taken into account, illustrated with selected examples of activity in this field. By way of a disclaimer, this description is biased towards binary complexes between globular proteins. However, many of the concepts described are equally applicable to complexes involving protein interactions with nucleic acids, carbohydrates or model membranes (e.g. lipid micelles).

This chapter illustrates how NMR can be used to study macromolecular complexes, using approaches and techniques that have already been described in previous chapters. For convenience, some of these concepts are covered again in sufficient detail for this chapter to stand alone; however, where appropriate and for more detailed discussions, specific references are made to other chapters.
8.2
Spectral Simplification through Differential Isotope Labelling
Whatever the nature of the macromolecular complex target, it is likely that the experimenter will want to exploit the potential that such systems usually offer to simplify the NMR spectrum by differential isotope labelling of the components (see Chapter 2). Thus for a protein-nucleic acid complex it is ordinarily straightforward to isotope label with 15N and 13C the protein component [13–18] or, at least for RNA, the nucleic acid component [19–28]. For complexes between proteins it is normal to prepare at least two different samples of the complex: one with the isotope labels in the first protein of the pair, and the other with the labels in the second protein (see Figure 8.1). Typically the separate double-labelled components will have been investigated on their own to obtain resonance assignments and to characterise the local backbone dynamics, so that the behaviour of the isolated proteins is more-or-less understood prior to complex formation. (This strategy presumes that the complex can be reconstituted from the separate components. If this is not the case then one can perhaps appeal to sophisticated in vivo labelling strategies by which the components are expressed in the host organism at different times in different media [29].)

Figure 8.1 Schematic diagram to illustrate the potential for ‘differential’ or ‘asymmetric’ isotope labelling of the components of a binary protein-protein complex. Options depicted include single (typically 15N) or double (15N and 13C) isotope labelling in either a protonated (upper braces) or ‘perdeuterated’ (lower braces) background. In the latter case specific patterns of methyl group protonation can prove advantageous for higher molecular weight examples (ILV stands for isoleucine, leucine and valine; A stands for alanine). Moreover the experimenter can select whether the sample is prepared in H2O or D2O solution in order to retain or eliminate the exchangeable proton (HN, and perhaps HS, HO) resonances in the spectrum. Similar differential isotope-labelling considerations are applicable to protein-RNA and protein-DNA complexes (see main text).

It has been pointed out that in certain circumstances it proves advantageous to double (13C,15N) isotope-label one component of the target complex, and to mix this with a single (15N) isotope-labelled second component [30]. Although both proteins are then 15N-labelled and will yield (overlapping) 2D 1H,15N correlation spectra in a single experiment, the presence of J-couplings to 13C atoms makes it possible to edit the spectrum for or against the double-labelled component, thereby permitting separation of the two component subspectra. The pulse sequences that have been developed for this purpose have been dubbed ‘isotope-discriminated NMR’ or IDIS-NMR. Under these conditions the spectroscopist can be sure to measure certain NMR parameters (chemical shifts, 15N relaxation times, residual dipolar couplings) for the two components of a complex under intrinsically ‘matched’ conditions, albeit perhaps with a small loss of sensitivity due to the additional RF pulses required [30–32].

In favourable cases the spectroscopist might be able to choose to perform such isotope labelling along with perdeuteration of the nonexchangeable protons by preparing the protein in heavy water cultures (see Chapter 2) [33–36]. Application of perdeuteration generally assists with the suppression of adverse relaxation pathways in larger systems. This advantage arises in a number of contexts. First, because the magnetogyric ratio of the deuteron is approximately 6.5 times smaller than that of the proton, replacement of the 13C-bonded protons with deuterons greatly attenuates the dipolar relaxation mechanism for these nuclei.
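The size of this effect can be estimated with a back-of-the-envelope calculation. The sketch below assumes the standard result that the prefactor of the heteronuclear dipolar relaxation rate of a 13C spin scales as the square of the magnetogyric ratio of the attached spin S multiplied by S(S + 1), and it ignores the change in the spectral density terms that follows from the different Larmor frequency of 2H. The numerical magnetogyric ratios are standard literature values, not numbers taken from this chapter.

```python
# Rough estimate of how much the one-bond 13C dipolar relaxation prefactor
# drops when the attached 1H is replaced by 2H.
# Assumption: the prefactor is proportional to gamma_S**2 * S * (S + 1).

GAMMA_1H = 267.522e6  # rad s^-1 T^-1, standard literature value
GAMMA_2H = 41.066e6   # rad s^-1 T^-1, standard literature value
SPIN_1H = 0.5
SPIN_2H = 1.0


def dipolar_prefactor(gamma, spin):
    """Part of the dipolar rate that depends on the attached spin only."""
    return gamma ** 2 * spin * (spin + 1.0)


gamma_ratio = GAMMA_1H / GAMMA_2H
reduction = dipolar_prefactor(GAMMA_1H, SPIN_1H) / dipolar_prefactor(GAMMA_2H, SPIN_2H)

print(f"gamma(1H)/gamma(2H) = {gamma_ratio:.2f}")
print(f"one-bond dipolar prefactor reduced about {reduction:.0f}-fold")
```

Under these assumptions the one-bond dipolar term falls by roughly a factor of 16. The reduction in the overall transverse relaxation rate is expected to be smaller, because other mechanisms (for example chemical shift anisotropy and dipolar interactions with remote spins) are not removed by deuteration.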
Most dramatically the transverse relaxation rates of Cα nuclei are slowed fivefold and, combined with the introduction of 2H multiple-pulse decoupling during the 13C evolution periods, this property allows the extension of triple resonance through-bond (J-coupled) NMR experiments to globular molecules well in excess of the 30 kDa ceiling that otherwise represents the approximate upper limit for this class of experiment. Secondly, certain classes of deuterium-adapted constant-time triple resonance NMR experiments yield cross-peak sign patterns that facilitate rapid sequential resonance assignment [37]. Thirdly, the effective dilution of protons in the target molecule that is concomitant with deuteration has advantages for the realisation of the benefits of transverse relaxation-optimised spectroscopy (TROSY) (see Chapter 3) and related methods (CRINEPT, CRIPT) that are designed to select or record particular submultiplet components of un-decoupled heteronuclear correlation spectra [38–44]. Thus, to first order, high-level deuteration minimises the contributions to the linewidth of the NH cross-peaks that would arise from (15N)1H–(12/13C)1H dipolar interactions. A related advantage is that this ‘proton dilution’ means that 1H NOEs between NH groups are less prone to the effects of spin-diffusion and can be observed for significantly greater interproton separations (e.g. up to 7 Å) than in fully protonated systems [45,46].

Despite the very significant advantages of deuteration outlined above, there are experimental caveats. First, successful application of this strategy relies on the experimenter being able to back-exchange the N-2H groups to N-1H in normal H2O: this is not always straightforward, especially when the perdeuterated protein has very high stability in the folded state or cannot be coaxed to refold after dissolution in a denaturing buffer.
Secondly, it is generally agreed that very high levels of 2H incorporation are the most beneficial in terms of optimising the performance of multidimensional heteronuclear correlation spectroscopy. Typically >80 % deuteration is achievable by growing recombinant bacteria in heavy water-based minimal media with a protonated carbon source. Under these conditions replacement of Hα atoms with deuterons is somewhat higher (near 100 %) because of the lability of the Cα–Hα bond with respect to exchange in the biosynthetic pathways of the expression host, but side-chain CH groups will typically not be completely deuterated. If one wants to remove these remaining protons then expression of the protein in media containing a (more expensive) deuterated carbon source is required. It pays to be aware that the 1HN T1 relaxation time constants in such fully perdeuterated samples become rather long, limiting the interscan repetition rate for multipulse NMR experiments. Such effects are mitigated to a degree by deliberate reincorporation of methyl protons in specific positions, such as is obtained when the source bacteria are grown in media that contain specific amino acid precursors (e.g. [3-2H]-α-ketoisovalerate and [3,3-2H2]-α-ketobutyrate) that lead to protonation of Leu, Val and Ile(δ1) side-chain methyl groups in an otherwise deuterated background [47]. Clearly this last strategy, whilst having certain benefits, particularly with respect to the recording of 13C-ILV-methyl TROSY NMR spectra, comes at the cost of reintroducing CH protons, and in certain situations pertaining to protein complexes (see below) this would not be sensible.

In summary, as a general rule the investigator will want to assess the benefits that perdeuteration might bring to the study of the target macromolecular complex. If that target has an overall molecular weight in excess of about 30 kDa then it is extremely likely that perdeuteration of one component or the other, or both, will be obligatory if the maximum information content of the NMR spectra is to be obtained.
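The consequence of long 1HN T1 values for the interscan repetition rate, noted above, can be illustrated with a simple estimate. The sketch below assumes 90° excitation with mono-exponential recovery from saturation, so that the signal per scan is proportional to 1 − exp(−Tr/T1) and the sensitivity per unit measuring time is proportional to that quantity divided by the square root of the repetition time Tr. The function names and the T1 values are hypothetical and are not taken from this chapter.

```python
import math


def relative_sensitivity(recycle_time_s, t1_s):
    """Signal per scan divided by sqrt(time per scan), in arbitrary units."""
    recovered = 1.0 - math.exp(-recycle_time_s / t1_s)
    return recovered / math.sqrt(recycle_time_s)


def best_recycle_time(t1_s, step_s=0.001, max_multiple=5.0):
    """Grid search for the repetition time that maximises sensitivity per unit time."""
    n_steps = int(max_multiple * t1_s / step_s)
    candidates = [step_s * i for i in range(1, n_steps + 1)]
    return max(candidates, key=lambda tr: relative_sensitivity(tr, t1_s))


for t1 in (1.0, 2.0, 3.0):  # hypothetical 1HN T1 values in seconds
    tr = best_recycle_time(t1)
    print(f"T1 = {t1:.1f} s -> best repetition time ~ {tr:.2f} s")
```

Within this simple model the optimum repetition time is close to 1.26 × T1, so a sample whose 1HN T1 has doubled needs roughly twice as long per scan to reach its best sensitivity per unit time.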
8.3 Basic NMR Characterisation of Complexes
The differential labelling strategy outlined above simplifies the NMR spectra that are recorded to make an initial, if not complete, characterisation of the resultant complex. Thus, by labelling only one component of the system and using 2D 15N-1H or (more rarely) 13C-1H heteronuclear correlation spectroscopy to monitor the titration of the other component into the NMR tube, one can focus upon the effects on the first component of contact with the second without interference from a trivial increase in spectral complexity. Let us take the common example of a heterodimeric complex comprised of two proteins P and Q, denoted P:Q.
$$\mathrm{P} + \mathrm{Q} \rightarrow \mathrm{P{:}Q}$$
Almost certainly, from a practical perspective, the first experiment that a spectroscopist will perform in an attempt to characterise the interaction is to titrate one molecule (usually unlabelled, e.g. protein Q) into a solution of the other, typically enriched in, at minimum, 15N nuclei (e.g. protein P). The experimenter records a 2D 15N-1H correlation spectrum (e.g. using the 15N,1H heteronuclear single quantum coherence – HSQC – pulse sequence) for a series of admixtures of the two proteins P and Q ranging from free P to excess Q. The experimenter should take care that the two proteins are both prepared in the same buffer (certainly at the identical pH), and ideally should check that the sample pH is maintained constant throughout the titration. A convenient way to perform this titration that tends to minimise chemical shift artifacts and makes the most efficient use of the ‘expensive’ labelled component of the complex is as follows (also described in Chapter 1).
Protocol for Protein–Protein Titrations
The initial NMR sample (sample A) contains about 0.5 mL total volume of 0.1–0.5 mM protein P in 90 % H2O/10 % D2O buffer, and therefore has a [protein P]:[protein Q] molar ratio of 1:0. A second NMR sample tube (sample B) is prepared containing the same 0.5 mL total volume and the same concentration of protein P, together with excess protein Q, so that [protein P] < [protein Q]. A practical way to perform this step is to slightly over-concentrate the stock solution of protein P and add identical volumes (< 0.5 mL) to the A and B sample tubes. Sample tube A is then topped up with buffer, and sample tube B with the buffered solution of protein Q.
First, 2D HSQC spectra of the A and B sample tubes are recorded. These spectra represent the end points of the titration series. Then a series of additional 2D HSQC NMR experiments is performed in which the concentration of protein P is held constant and the protein Q concentration is varied in stepwise increments to give a series of intermediate [protein P]:[protein Q] molar ratios. These spectra are obtained by simultaneously removing an equal aliquot (initially about 5–20 %) from both sample tube A and sample tube B, then transferring the aliquot from tube A to tube B and the one from tube B to tube A. This procedure of simultaneously exchanging equal-volume aliquots is repeated until a series of experiments covering the whole range of [protein P]:[protein Q] has been completed. Since the initial protein P concentration is the same in sample tubes A and B, [protein P] remains constant and only [protein Q] varies across the titration series (this outcome can be validated by performing SDS-PAGE on 5 μL samples of the A and B tube contents extracted at each aliquot exchange). Note that as the experiment proceeds the exchange volume needs to be increased to avoid asymptotic convergence upon nonoverlapping values of the molar ratio in sample tubes A and B.
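The bookkeeping behind the aliquot-exchange protocol can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `exchange_titration`, the starting ratio of 4.0 and the exchange fractions are hypothetical, [protein P] is taken to be exactly equal in both tubes, and pipetting losses are ignored.

```python
def exchange_titration(q_over_p_in_b, fractions):
    """Track the [Q]/[P] molar ratio in tubes A and B over successive swaps.

    q_over_p_in_b : starting [Q]/[P] in tube B (tube A starts at 0).
    fractions     : fraction of the tube volume exchanged at each step.
    Returns a list of (ratio_A, ratio_B) tuples, starting with the end points.
    """
    ratio_a, ratio_b = 0.0, q_over_p_in_b
    history = [(ratio_a, ratio_b)]
    for f in fractions:
        # [P] is the same in both tubes, so swapping equal volumes only
        # redistributes Q; each tube keeps (1 - f) of its own Q and
        # receives a fraction f of the Q content of the other tube.
        ratio_a, ratio_b = ((1 - f) * ratio_a + f * ratio_b,
                            (1 - f) * ratio_b + f * ratio_a)
        history.append((ratio_a, ratio_b))
    return history


# Hypothetical example: tube B starts at a four-fold excess of Q.
steps = exchange_titration(4.0, [0.1, 0.1, 0.2])
```

In this model each swap of a fraction f shrinks the gap between the two tubes by a factor (1 - 2f), so with a fixed small f the two ratios creep towards the midpoint without ever meeting. That is consistent with the advice above to increase the exchange volume as the titration proceeds.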
A number of outcomes of such a titration experiment are formally possible. These are outlined in a schematic manner in Figure 8.2. We start with the HSQC spectrum of the labelled component, protein P, in the free state. This spectrum will be more or less well dispersed with, to first order (leaving aside side-chain resonances), approximately as many NH cross-peaks as there are residues in the protein chain. The chemical shifts of these cross-peaks reflect the local physicochemical environment of each NH group. As ligand protein Q is titrated into the solution of protein P we would anticipate that the spectrum of protein P will be perturbed in a way which reflects the change in the individual environments of the 15N-H (or 13C-H) groups in any complex that is formed between the two proteins. Figure 8.2a–c illustrates, in an idealised manner, three different scenarios that are commonly encountered in such titration experiments (see Chapter 7 for a comprehensive discussion of chemical exchange). Each panel in the schematic diagram represents just three heteronuclear cross-peaks, sampled from what would be a much more complex (often overlapped) spectrum; in each case only a relatively small number of cross-peaks is observed to be significantly perturbed during the course of the titration, but in Figure 8.2 two of the three cross-peaks depicted are supposed to represent NH groups in or close to the intermolecular binding site. In the first case (Figure 8.2a), these perturbed cross-peaks migrate systematically and in a linear fashion towards a new position in the spectrum with little, if any, change in peak height. A plot of the change in chemical shift versus the concentration of protein Q yields a rectangular hyperbola that can be readily fitted to a standard Michaelis equilibrium binding curve. Such behaviour is characteristic of ‘fast-exchange’ between the free and the bound states of protein P (see also Chapter 7). 
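As an illustration of how a fast-exchange titration curve might be fitted, the following Python sketch uses a 1:1 binding model. It rests on assumptions that go beyond the text: the quadratic (ligand-depletion) form of the isotherm is used rather than the simple hyperbola, because [protein P] and [protein Q] are of comparable magnitude in a typical NMR titration; the data are synthetic and noise-free; and all names and numerical values are hypothetical. For each trial KD the limiting shift change is obtained by linear least squares, and KD itself is found by a coarse logarithmic grid search.

```python
import math


def fraction_bound(p_tot, q_tot, kd):
    """Fraction of P in the P:Q complex for 1:1 binding (same units throughout)."""
    s = p_tot + q_tot + kd
    return (s - math.sqrt(s * s - 4.0 * p_tot * q_tot)) / (2.0 * p_tot)


def fit_fast_exchange(p_tot, q_tots, shifts, kd_grid):
    """Return (kd, delta_max) minimising the sum of squared residuals.

    In fast exchange the observed shift change is taken to be
    delta_max * fraction_bound, so delta_max is linear for a given kd.
    """
    best = None
    for kd in kd_grid:
        fb = [fraction_bound(p_tot, q, kd) for q in q_tots]
        denom = sum(f * f for f in fb)
        if denom == 0.0:
            continue  # no information if nothing is bound at any point
        d_max = sum(f * y for f, y in zip(fb, shifts)) / denom
        sse = sum((y - d_max * f) ** 2 for f, y in zip(fb, shifts))
        if best is None or sse < best[0]:
            best = (sse, kd, d_max)
    return best[1], best[2]


# Synthetic, noise-free example (all values hypothetical, concentrations in M).
p_tot = 0.2e-3
q_tots = [0.0, 0.05e-3, 0.1e-3, 0.2e-3, 0.4e-3, 0.8e-3, 1.6e-3]
true_kd, true_dmax = 50e-6, 0.30  # KD in M, limiting shift change in ppm
shifts = [true_dmax * fraction_bound(p_tot, q, true_kd) for q in q_tots]

kd_grid = [10 ** (-7 + 0.01 * i) for i in range(501)]  # 0.1 uM to 10 mM
kd_fit, dmax_fit = fit_fast_exchange(p_tot, q_tots, shifts, kd_grid)
```

With real data one would normally use a proper nonlinear least-squares routine and estimate parameter uncertainties. The grid search is used here only to keep the sketch free of external dependencies.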
A second distinctive outcome, depicted in Figure 8.2b, can arise where the intensities of a subset of cross-peaks for the free state of protein P progressively diminish in the presence of increasing amounts of protein Q, accompanied by the initial appearance and subsequent growth in intensity of ‘new’ cross-peaks corresponding to the bound state of protein P. This ‘slow-exchange’ behaviour can also be exploited to extract the apparent dissociation constant of the complex by fitting the changes in intensity of the free- and bound-state cross-peaks to a Michaelis curve. True fast- and slow-exchange titration behaviours represent idealised limiting cases: typically fast-exchange occurs where the off-rate for protein P from the complex is relatively rapid, and slow-exchange when the off-rate is relatively slow (in quantitative terms, when the dissociation rate constant koff is, respectively, substantially larger than or smaller than the difference in chemical shifts of the free and bound states, expressed in Hz). Generally, fast-exchange occurs for relatively weak intermolecular interactions, and slow-exchange for relatively strong binding. (Note that slow-exchange can also be observed when the on-rate for complexation is slow, though such observations are comparatively rare.) When koff is of the order of the chemical shift
Figure 8.2 Schematic representations of the appearance of a 2D 15N,1H heteronuclear correlation spectrum for the titration (left to right) of the labelled component of a complex (‘protein P’, see main text) with increasing concentration of (unlabelled) ligand (‘protein Q’). At left, the spectrum of the ligand-free protein P is depicted containing three cross-peaks representing the backbone NH signals for (fictional) residues Lys39, Arg58 and Ser74. Scenarios (a)–(e) depict different titration behaviours. In each case the spectrum recorded with increasing [protein Q] is depicted on a grey scale: black = free labelled protein P; light grey = excess ligand Q. At the bottom the successive steps of each titration are superposed in a single panel. Italic text depicts the cross-peak position in the final spectrum of the titration. In each case the NH groups of Lys39 and Ser74 are supposed to be within or close to the binding interface between the protein and the ligand, whereas the NH of Arg58 is distant from the interaction site. See the main text for a full description of each case: (a) ‘fast-exchange’ titration characteristics; (b) ‘slow-exchange’ characteristics; (c) ‘intermediate exchange’ characteristics; (d) fast/intermediate exchange with selective cross-peak ‘bleaching’; (e) formation of an ‘NMR-invisible’ large complex. Note: in real-life experiments, the spectroscopist might observe more than one of these behaviour patterns for different cross-peaks in the same spectrum
differences involved, intermediate-exchange conditions prevail and the titration characteristic is as shown in Figure 8.2c. In this situation one observes progressive changes in the chemical shifts of perturbed cross-peaks accompanied by initial line broadening and diminution and then recovery of the crosspeak intensity across the titration. The intensity
profile reflects the exchange contribution to the linewidths of one or both of the resonances contributing to the crosspeak, and can be characterised by a variety of sophisticated analyses (see Chapter 7) that permit the extraction of the exchange rate constant(s) and the limiting (bound-state) chemical shifts (e.g. 15 N relaxation dispersion analysis). The cartoon representations of chemical shift titrations depicted in Figure 8.2a–c are deliberately simplistic. First, it is likely that complexation could give rise to an overall broadening of all cross-peaks in the spectrum, reflecting the increase in the overall rate of transverse magnetisation decay associated with the slower tumbling time of the complex compared to the isolated protein P. Moreover, in general, since the exchange regime that applies to any given crosspeak depends upon the magnitude of the ligand-dependent chemical shift change, one can observe instances of fast-, intermediate- and slow-exchange behaviour in the same experiment. Since the absolute frequency differences between free and bound state signals depend upon the applied magnetic field strength, then the outcome of the experiment is in principle dependent upon the particular spectrometer being used for the experiment. Note that in the cases of fast- and intermediate-exchange the chemical shifts of the bound-state can be directly related to the signals in the free-state, making the transfer of the resonance assignments from the free- to the bound-state a trivial exercise. In the case of slow-exchange, a priori there is no reliable method by which to transfer the assignments from the free- to the bound-state spectrum by inspection alone. 
Rather one should take steps to reassign the bound-state spectrum, either by traditional methods that would typically exploit double 13C,15N-isotope labelling and triple resonance 3D NMR methods, or by recording a spectrum at the midpoint in the titration that yields exchange cross-peaks between the free- and bound-state signals (e.g. 2D 15N,1H ZZ-exchange [48], or 2D 1H-NOESY [49] or 1H-ROESY [50,51] pulse sequences). It is worth noting that less straightforward outcomes of such titration experiments are sometimes obtained. Figure 8.2d depicts a relatively frequent occurrence that is similar to the case of intermediate-exchange (Figure 8.2c) but differs in that the perturbed protein P cross-peaks exhibit systematically diminishing peak intensity which does not recover by the end-point of the titration (i.e. at maximal [protein Q]). In effect, a subset of cross-peaks is ‘bleached’. The absence of any NMR signal at this point makes precise interpretation of the underlying mechanism somewhat speculative. Either: (i) the complex is in intermediate exchange between free and bound states but the achievable maximum [protein Q] is insufficient to approach the mole fraction of intact P:Q complex that corresponds to recovery of the cross-peak linewidth (which could be when protein P is close to being saturated with ligand) – remember that the Michaelis curve approaches 100 %-bound asymptotically; or (ii) the cross-peak bleaching properly represents the spectrum of the fully bound complex, for which we are then required to invoke (intermediate timescale) chemical exchange broadening for the bleached NH groups, presumably reflecting interchange between two or more conformations of protein P (or protein Q) within the bound state on a timescale that leads to broad lines:
$$\underbrace{\mathrm{P} + \mathrm{Q}}_{\text{free}} \rightarrow \underbrace{\{\langle \mathrm{P{:}Q}\rangle_1 \leftrightarrow \langle \mathrm{P{:}Q}\rangle_2 \leftrightarrow \cdots \langle \mathrm{P{:}Q}\rangle_n\}}_{\text{bound}}$$
Resolution of whether case (i) applies might be obtained by measurement of the KD value by an independent method (e.g. 
isothermal titration calorimetry). Validation of whether case
(ii) applies is inherently difficult, though variation in the temperature or magnetic field strength might conceivably provide an indication of whether intracomplex exchange processes are operative. Finally, Figure 8.2e illustrates an altogether more challenging situation that can easily arise in the titration of macromolecules that form complexes. The scenario that is depicted is that, as the titration proceeds, the vast majority of cross-peaks in the spectrum systematically diminish in intensity, usually without any evidence of selective chemical shift perturbations along the way. In these circumstances one could invoke global ligand-dependent induction of exchange broadening of the protein P spectrum. However, the more sensible explanation for this behaviour is that the P:Q complex formed is in essence invisible to the NMR experiment being employed, most likely because the overall size of the complex means that the transverse relaxation time constants are too short to permit efficient magnetisation transfer. That the complex remains in solution and has not sedimented in the NMR tube is usually revealed by the persistence of a small number of cross-peaks from highly flexible (often terminal) main-chain or side-chain NH and NH2 groups. This type of situation represents a significant test of the ability of NMR to provide further details. Whilst the spectroscopist can appeal to the latest frontier methods in NMR that are applicable to high molecular weight systems (see below), these must be pursued alongside other structural and analytical approaches. For example, in such a case it is important to independently establish the effective molecular mass and stoichiometric composition of the complex by analytical size exclusion chromatography, static or dynamic light-scattering, analytical ultracentrifugation, capillary-nanoflow electrospray-ionisation mass spectrometry, or even crystallisation and X-ray diffraction.
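The exchange-regime criterion used in this section (koff compared with the free/bound shift difference expressed in Hz) and its dependence on the spectrometer field can be made concrete with a short Python sketch. The function name, the numerical examples and in particular the factor-of-ten `margin` that stands in for ‘substantially larger or smaller’ are assumptions made for illustration; the boundaries between regimes are not sharp in practice.

```python
def exchange_regime(k_off, delta_ppm, larmor_mhz, margin=10.0):
    """Classify a cross-peak as 'fast', 'intermediate' or 'slow' exchange.

    k_off      : dissociation rate constant in s^-1.
    delta_ppm  : free/bound chemical shift difference in ppm.
    larmor_mhz : Larmor frequency of the observed nucleus in MHz
                 (e.g. about 60.8 MHz for 15N on a '600 MHz' instrument).
    margin     : arbitrary factor standing in for 'substantially'.
    """
    delta_hz = abs(delta_ppm) * larmor_mhz  # ppm x MHz gives Hz
    if k_off > margin * delta_hz:
        return "fast"
    if k_off * margin < delta_hz:
        return "slow"
    return "intermediate"


# The same 1H shift difference (0.15 ppm) and off-rate at two field strengths.
at_500 = exchange_regime(1000.0, 0.15, 500.0)  # 75 Hz separation
at_900 = exchange_regime(1000.0, 0.15, 900.0)  # 135 Hz separation
```

Because the shift difference in Hz scales with the field, a cross-peak that is classified as fast exchange at one field strength can move towards intermediate exchange at a higher field, as in the two calls above.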
8.4 3D Structure Determination of Macromolecular Protein–Ligand Complexes
8.4.1 NOEs
The traditional approach to 3D structure determination of individual proteins relies heavily upon the collection of interproton distance restraints from multidimensional, often 13 C- or 15 N-separated, 1 H-NOESY data. It is still true that NOE-based structure determination remains a valid approach to macromolecular complex systems. Often it is convenient to approach such problems with prior knowledge of the resonance assignments (and 3D structures) of the isolated component parts of the complex, and to appeal to the opportunity for differential isotope labelling for the simplification of the NOE spectra. This can be achieved by exploiting the potential to design NMR pulse sequences that can ‘filter’ and ‘edit’ the spectra for certain classes of 1 H-NOE connections (see Chapter 7 for definitions). Let us consider the heterodimeric protein-protein complex P:Q, as before. In general we can usually choose to isotope label either protein P or protein Q, or both, prior to formation of the complex, with one or both of 15 N and 13 C isotopes. A common approach, one that emerges naturally from characterising the isolated proteins, is to prepare complex samples in which one of the components is double-13 C,15 N-isotope labelled and the other is left unlabelled, that is, P(13 C,15 N):Q or P:Q(13 C,15 N). The asymmetric double-labelling pattern allows for the chemical shift assignments of the bound state of each protein to be obtained
by application of standard triple resonance correlation experiments that rely on the intramolecular scalar couplings. The simple 2D 1H-NOESY spectrum of each of the complexes contains the full complement of intra- and intermolecular NOE connectivities, comprising intramolecular NOE cross-peaks relating 12C-H and 14N-H (‘unlabelled’) protons, intramolecular 1H-NOEs between X-bonded (‘labelled’) protons (X = 13C, 15N), and intermolecular NOEs that connect X-bonded protons in one molecule with ‘unlabelled’ protons in the other. In principle, it is possible to separate these three classes of NOE connections and thereby to simplify their analysis (see Figure 8.3). Derivatives of the 2D 1H-NOESY pulse sequence have been devised that, for each dimension, either reject or select magnetisation arising from X-bound protons (reviewed in [52]). Early approaches
Figure 8.3 The principle of isotope-filtered/edited 2D NOE spectroscopy. Top left: a 2D 1H,1H-NOESY spectrum of a binary macromolecular complex is schematically represented as a set of cross-peaks depicted as black dots. The full set of cross-peaks comprises intramolecular NOE correlations for each of the components of the complex (depicted as black and grey dots respectively, top right) and intermolecular NOEs between the components (open circles). Let us assume that the molecular component that in the latter case yields the black dot cross-peaks is uniformly isotope-labelled with a heteronucleus X, such that all protons yielding signals in the spectrum are directly bonded to an X-nucleus (X-H). Then, in the ideal application (which takes into account the nonuniform nature of the 1JXH couplings), the full set of X-filtered/edited combination NOESY experiments can be used to provide, as separate subspectra: (a) F1,F2-double edited NOESY (intramolecular HX-HX NOEs); (b) F1,F2-double filtered NOESY (intramolecular H-H NOEs); (c) F1-filtered, F2-edited NOESY and (d) F1-edited, F2-filtered NOESY (both intermolecular HX-H NOEs). Note: in practice not all combinations of these spectra would necessarily be recorded, and complications necessarily arise depending upon whether single isotope (e.g. 15N) or double (15N,13C)-isotope labelling and/or perdeuteration of the nonexchangeable protons is employed (see main text)
to this problem relied upon the use of pairs of spin-echo half-filter elements that could be combined into four different variants of the 2D 1H-NOESY pulse sequence [53,54]. The half-filter elements are tuned to either 15N-1H or 13C-1H 1J-couplings and the 180°-X pulses applied either on- or off-resonance. Combinatorial summation/subtraction of the outputs of these four experiments yields four different spectra: F1,F2 double-filtered NOESY; F1,F2 double-edited NOESY; F1-edited, F2-filtered NOESY; and F1-filtered, F2-edited NOESY. Respectively, these spectra contain NOE correlations between: non-X-bonded protons; X-bonded protons; X-bonded protons (F1) and non-X-bonded protons (F2); and non-X-bonded protons (F1) and X-bonded protons (F2). Thus, with the asymmetric isotope labelling pattern described above, the first two results yield intramolecular NOE cross-peaks, and the latter two intermolecular NOEs. The double half-filter approach appears extremely elegant at first sight, but a number of complications arise in its experimental implementation. First, it is impossible to tune the half-filter elements simultaneously to 15N-1H and 13C-1H couplings, so that this approach will work only when applied separately to 15N- or 13C-bound protons. Secondly, one-bond 13C-1H J-couplings exhibit a significant range of values (1JCH ≈ 120–150 Hz and 160–230 Hz for aliphatic and aromatic CH groups respectively) that undermine the overall performance of the 13C-half filter element (which works perfectly only for a single value of 1JCH for a given filter delay time) and give rise to artifactual breakthrough of 13C-bound proton signals where one needs them to be suppressed. Moreover, the incorporation of two half-filter elements into the pulse sequence leads to substantial loss of sensitivity due to transverse relaxation during the spin-echo periods, especially for higher molecular weight systems. 
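The sensitivity of a half-filter to the spread of 1JCH values can be estimated with a simple product-operator argument, sketched below in Python. The assumptions are an idealised single filter stage with a total J-evolution delay of 1/(2J) for the tuned coupling, perfect pulses and no relaxation, in which case the fraction of in-phase magnetisation of an X-bound proton that escapes purging is cos(πJτ). The function name and the tuning value of 135 Hz are hypothetical.

```python
import math


def filter_leakage(j_actual, j_tuned):
    """Residual in-phase fraction after one idealised half-filter stage.

    The delay is set for the tuned coupling, tau = 1 / (2 * j_tuned), so a
    proton with exactly that coupling is converted fully to anti-phase
    magnetisation and removed; other couplings leave a cos(pi * J * tau)
    remainder that breaks through the filter.
    """
    tau = 1.0 / (2.0 * j_tuned)
    return math.cos(math.pi * j_actual * tau)


# Leakage across the aliphatic and aromatic 1JCH ranges quoted in the text
# for a filter tuned (hypothetically) to 135 Hz.
leak_table = {j: filter_leakage(j, 135.0) for j in (120, 135, 150, 160, 200, 230)}
```

In this idealised picture a filter tuned to the middle of the aliphatic range leaves a breakthrough of roughly 17 % at either end of that range, and performs far worse for the aromatic couplings. That is one way to see why a single delay cannot serve both classes of CH group.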
Taking these features into account, a more practical approach is as follows: the asymmetric 13C,15N-labelled complexes are used to record typical 3D 15N-separated and 13C-separated 1H-NOESY spectra (as would be acquired for the structure determination of each protein on its own). The 13C-experiment can be recorded in both H2O and D2O solution to distinguish 1H signals close to the solvent water resonance. These spectra are in effect equivalent to 2D F2-X-edited NOESY spectra where the half-filter element has been converted into a chemical shift evolution period that encodes the frequency of the X-nucleus bonded to the destination proton. These spectra contain both the intramolecular NOEs of the labelled protein and the intermolecular NOEs. These two classes of NOEs can then be distinguished by comparison to data sets that contain only intermolecular NOEs, such as can be obtained in modern, optimised variants of X-filtered, X-edited 1H-NOESY spectroscopy. Lewis Kay and colleagues have developed a version of the 13C-filtered, 13C-edited 1H-NOESY that uses frequency-swept 13C-inversion pulses to purge the 13C-1H signals in F1 [55]. The use of the adiabatic pulse is designed to exploit the approximately linear correlation between the 1JCH value and the 13C chemical shift that occurs in both polypeptides and, with different parameters, RNA oligonucleotides. The pulse sequence is extended to encode the 13C-chemical shift of the destination proton in a third dimension, and can be applied either in D2O to detect intermolecular H(12C)-H(13C) NOEs or (with modifications) in H2O to obtain H(14N)/H(12CH)-H(15N)/H(13CH) NOEs. A related approach to compensate for 1JCH variation in proteins, with superior resolution in the indirect 1H (source proton) dimension, has been described by Palmer and co-workers [56]. 
In addition, Melacini has described methods that appeal to J-correlation spectroscopy that permit the recording of both intermolecular and intramolecular 1 H-NOEs in separate
regions of the same 4D NMR experiment [57] (and, in brief, Nietlispach et al. have described extensions of this distinctive approach [9]). Isotope-filtered/edited experiments contain many pulse and delay elements that are prone to compromise the overall sensitivity. Moreover the targeted NOE effects are weak and can have overall intensity that is the same order of magnitude as ‘breakthrough’ spectral artefacts, particularly close to the 1 H,1 H diagonal, that arise from very flexible regions of the complex structure. In light of this one can take an entirely different course that amounts to simplification of the 1 H-NOESY spectrum by biosynthetic replacement of subsets of protons with deuterons. Thus, as alluded to above, high level replacement of nonexchangeable protons with deuterons can be achieved by preparation of the target protein in a (bacterial) expression host cultured in heavy water and a perdeuterated carbon source (or alternatively on a commercially supplied medium based upon perdeuterated algal lysates or similar). When back-exchanged in normal water the NMR spectrum of such a sample contains essentially only the backbone and side-chain NH resonances. (Note: complete back exchange may require a denaturation-refolding step.) Preparation of the target complex with one or both of the components in the perdeuterated state provides samples for which (by definition) the various potential multidimensional 1 H-NOESY experiments are ‘edited’ in that only certain subsets of inter- and intramolecular contacts will be apparent as cross-peaks. Perdeuteration also improves the performance of variants of these experiments for larger complexes that incorporate TROSY-type signal selection, and can lead to the enhancement of the sensitivity for long-range 1 H-NOEs because of the effective dilution of 1 H-1 H dipolar relaxation pathways. (Note: side-chain hydroxyl OH and sulfydryl SH protons are also formally present in perdeuterated samples in H2O. 
Often, but not always, the chemical shift of these signals is coincident with the bulk water resonance because of rapid chemical exchange. Nevertheless the potential for the NMR resonances of such groups – which might be locked down in important hydrogen bond interactions – to be present should be borne in mind during the analysis of the spectra.) As an example let us return to our hypothetical P:Q heterodimer. The complex could be prepared with 15N-labelled perdeuterated protein P mixed with 13C-labelled protein Q; that is, to make the complex P(15N,2H):Q(13C). The application of a 3D F1-13C-edited, F2-15N-separated 1H-NOESY pulse sequence will yield a spectrum that contains only intermolecular HC(protein Q)-HN(protein P) NOE cross-peaks. In principle this experiment can be extended to four frequency dimensions to additionally encode the 13C-chemical shift. These data can be compared to the cross-peak inventory of a 3D 15N-separated 1H-NOESY experiment obtained from a sample of the complex prepared with 15N,2H-labelled protein P and unlabelled protein Q. This experiment contains the intermolecular HC-HN NOEs, as before, but also the intermolecular HN-HN contacts. The perdeuteration of protein P makes advantageous the application of TROSY-based signal readout for superior resolution, and in the latter case in which unlabelled protein Q is used the HC proton relaxation is slowed due to the absence of the 13C magnetic moment, leading to sharper lines in the indirect 1H dimension. This 15N readout-focused strategy is most beneficial when there is potential for short-range contacts involving NH groups between the two components in the complex, such as occurs when an intermolecular β-sheet is formed. In the more general case one would anticipate that side-chain–side-chain interactions might dominate in the interface, focussing interest on methods that facilitate the analysis of intermolecular 1H NOEs between aliphatic
and aromatic protons. Wagner and co-workers have reported an approach to this type of contact based on asymmetric isotope labelling and perdeuteration that, for large systems, can complement or potentially supplant the X-filtered/edited 1H-NOESY paradigm [58]. Recognising that protein-protein interfaces tend to be populated by hydrophobic interactions between residues containing methyl groups and aromatic rings, these researchers proposed 13C-ILV-methyl, u-2H labelling for one component of the complex (e.g. protein P) and mixing this with the other component (protein Q) in unlabelled form; that is, to prepare the complex P(13C-CH3-ILV,2H):Q(12C) (see Figure 8.4). Application of 3D 13C-HMQC-1H-NOESY spectroscopy to this sample then yields a spectrum with 2D 1H,1H-planes containing cross-peaks that correlate the protein P ILV-methyl group protons (F2) with nearby aliphatic and aromatic HC protons from protein Q (F3). Evidently, the labelling pattern can be reversed to obtain a complementary pattern of Q(labelled)-P(unlabelled) 1H NOEs. It should be apparent that there are a number of ways of mounting an assault on the determination of the structure of a macromolecular complex via the analysis of NOE contacts. However, once embarked on this strategy the spectroscopist faces the conundrum of how to separate inter- from intramolecular NOE cross-peaks. One major approach to this problem is the application of multidimensional X-filtered/edited 1H-NOESY pulse sequences to differentially 13C,15N-isotope labelled samples. Another avenue is
Figure 8.4 A cartoon representation of the asymmetric sample deuteration strategy suggested by Gross et al. [58] for obtaining intermolecular NOEs focused upon side-chain contacts. One protein in the complex (here, protein eIF4E shown on the left) is prepared using perdeuterated media containing precursors for the synthesis of isoleucine (δ1 position), leucine and valine (ILV) residues containing 13CH3 groups (Goto, Gardner et al. [47]). The partner protein in the complex (here, eIF4G shown on the right) is unlabelled. 13C-edited 1H-NOESY spectroscopy recorded in D2O solution provides cross-peaks for intramolecular (narrow dashes) and intermolecular (wide dashes) contacts involving the 13C-labelled ILV side-chains. Image taken from [58]
asymmetric isotope labelling combined with perdeuteration to ‘chemically’ edit the NOE spectrum. These two modes of attack can be appropriately combined where feasible. On the other hand, when one of the components of the target complex is for any reason less tractable, then the choice of experiment is more limited. For example, in the case of protein–DNA complexes, it is not normal to isotopically label the DNA, and the options for mixing and matching the experimental paradigms for resonance assignment and NOE measurement are narrowed. In truth there is a plethora of combinations of isotope-labelling and NOESY pulse sequence variants that can be, and have been, employed in NMR investigations of macromolecular complexes. Famously, in the seminal work by Clore and Gronenborn’s group on the 40 kDa complex formed between the proteins EIN and HPr (vide infra), sixteen different 2H/13C/15N-labelled samples and eight different 3D or 4D NOESY experiments were employed [59]. The NOE data obtained in this comprehensive (and instrument-time-consuming!) analysis provided for an impressively high-resolution structure, and arguably still represents a pinnacle of achievement in terms of the application of NMR to the ab initio structure determination of a protein complex. Nevertheless, following this example, there has been a rapid realisation that the extension of the traditional NOE-focussed approach to macromolecular complexes is likely to be very inefficient. Only a relatively small number of all the interproton NOE contacts represented in a macromolecular complex arise from the intermolecular contact zone, and often these seem to be influenced by interfacial dynamics that either broaden the corresponding resonances beyond detection or act to quench the magnitude of the NOEs. 
These considerations have led to the exploration of a variety of other means to obtain structural information concerning the nature of large complexes, either in terms of the identification of the interfacial contact surfaces, or of the overall shape. Either on their own or more commonly in combination with computer software incorporating database potentials that is targeted to the prediction of protein-ligand complex structures, these alternative routes can prove extremely useful.
8.4.2 Saturation Transfer
As described above, chemical shift perturbation mapping is an obvious and, in practice, the most facile means to initiate the structural characterisation of a protein-ligand complex. The approach works particularly well for systems in fast-exchange, or for systems in slow-exchange when the spectra of the complexed state are tractable (in terms of establishing the resonance assignments, etc.). For systems in intermediate-exchange where many cross-peaks may be broadened beyond detection (Figure 8.2d), or when the final complex is rather too large to obtain high-quality spectra (Figure 8.2e), another method based upon transfer of magnetisation by cross-saturation can provide a means to identify the contact interface. Indeed, even for systems in fast-exchange this ‘saturation transfer’ approach can provide complementary and arguably more precise localisation of the interaction surfaces. The concept of saturation transfer is simple to grasp: we engineer the system so that it is possible to selectively saturate the NMR transitions for one component of the complex, and then we record the effect of this saturation on the spectrum of the other component [60–64]. Where there exist one or more magnetisation transfer pathways comprised of tandem short hops between 1H nuclei, the intensity of the spectrum of the second component will be diminished as nuclear polarisation is ‘drained’ into the saturated spin systems of the first component (Figure 8.5).
Figure 8.5 Top, the principle of the saturation transfer experiment. The target complex comprises two components, one of which (protein II) is unlabelled; the other (protein I) is 15N,2H-labelled. The proton spectrum of the unlabelled component is saturated using selective RF irradiation of the aliphatic region. Cross-saturation of signals in the spectrum of protein I corresponding to NH groups in the contact zone can be assessed by comparison of 2D 15N,1H correlated spectra obtained with and without the RF irradiation. Bottom, the concept of cross-saturation can be generalised to a situation of exchange between free- and bound-state protein I. Thus if exchange is fast on the timescale of the intermolecular cross relaxation rate then the ‘transferred’ cross-saturation effect can be detected in the spectrum of the excess free protein I, which will likely have NMR characteristics superior to those of the bound-state spectrum. Image taken from [64].
Returning to our P:Q heterodimer (see above), the practical requirement is that we are able to perform the saturation selectively on (say) component P so that the effects we observe for component Q can be safely ascribed to the magnetisation transfer process, and that those resonances correspond to atoms in or near the contact region between the two molecules. A straightforward way to obtain selective saturation is to employ narrow-band low-power
irradiation of the upfield-shifted methyl group region. The methodology is related to the cross-saturation NMR approach that is routinely applied in the investigation of transient interactions of small molecules with substoichiometric amounts of a binding protein. Usually in the latter case, the small molecule spectra have a restricted 1H chemical shift dispersion that does not extend upfield of δ 0.7 ppm, affording the opportunity to apply selective RF irradiation to a subset of protein 1H signals that normally populate the region δ < 0.5 ppm. Because of the slow tumbling of the protein, 1H spin diffusion leads to efficient transfer of the protein saturation both to other protein signals and to the bound ligand. When the ligand is in sufficiently fast-exchange between the free- and bound-states, the effect of the protein-mediated saturation is transferred to the spectrum of the free ligand, and the signal intensity for one or more of the ligand resonances is diminished. A control experiment is usually run in parallel where the RF irradiation is applied off-resonance (e.g. at δ = 20 ppm). The difference between the on- and off-resonance spectra yields solely the spectrum of the affected ligand signals. As a measure of the utility of this saturation transfer difference (STD) approach, it is noteworthy that the methodology is widely used (amongst other approaches) in high-throughput NMR screening for small molecule interactions with target proteins [65–68]. However, application of the saturation transfer method to protein-protein complexes presents a complication. It is unlikely that the 1H spectrum of one component has sufficiently different chemical shift dispersion to offer a window in which the saturation RF irradiation can easily be applied entirely selectively. Moreover, in high affinity complexes the quality of the NMR spectrum may be limited by the effective high molecular mass and slow tumbling rate.
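The on/off-resonance comparison described above amounts to simple arithmetic on matched peak intensities. The following is a minimal sketch, assuming intensities have already been extracted for the same ligand resonances in both spectra; the function name `std_difference` and the numerical values are hypothetical and serve only as an illustration.

```python
def std_difference(off_resonance, on_resonance):
    """Return the STD difference and fractional saturation for matched peaks.

    Both inputs are sequences of peak intensities (arbitrary units) measured
    for the same ligand resonances, in the same order.
    """
    if len(off_resonance) != len(on_resonance):
        raise ValueError("on- and off-resonance data must have equal length")
    # Difference spectrum: off-resonance (control) minus on-resonance (saturated).
    difference = [off - on for off, on in zip(off_resonance, on_resonance)]
    # Fractional effect, normalised to the control intensity of each peak.
    fraction = [
        (off - on) / off if off > 0 else 0.0
        for off, on in zip(off_resonance, on_resonance)
    ]
    return difference, fraction


# Hypothetical intensities for three ligand resonances.
i_off = [100.0, 80.0, 50.0]   # irradiation applied far off-resonance (control)
i_on = [70.0, 80.0, 45.0]     # irradiation applied to the protein methyl region
difference, fraction = std_difference(i_off, i_on)
for d, f in zip(difference, fraction):
    print(f"difference = {d:5.1f}   fractional STD = {f:.2f}")
```

Resonances with a zero difference are unaffected by the protein saturation and therefore vanish from the difference spectrum.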
A solution is provided by again appealing to differential isotope-labelling, as first demonstrated by Takahashi and co-workers [61]. The complex is assembled so that one component (e.g. protein P) is in a highly deuterated, 15N-labelled state, P(2H,15N); the other component (protein Q) is not specifically labelled. Thus the upfield region of the 1H spectrum of the complex contains only signals from protein Q. Assuming back exchange (2H → 1H) of the protein P amide groups has been successfully achieved – by refolding if necessary – the downfield region of the complex spectrum contains aromatic 1H signals from protein Q together with NH signals from both protein P and protein Q. The spectrum of protein P within the complex can be selectively recorded using a 2D 15N,1H-HSQC or 15N,1H-TROSY pulse sequence. The absence of aliphatic protons from protein P means that selective narrow-band 1H irradiation of the protein Q spectrum can be obtained using, for example, a WURST-2 pulse train [69] covering the aliphatic chemical shift range during the pulse sequence preparation period [64]. As a result of the high proton density in protein Q, spin diffusion effects mean that not only the aliphatic signals but also the aromatic HC and amide HN resonances are efficiently saturated. By contrast the relatively low proton density in the double-labelled protein P means that the spin diffusion saturation transfer effect should be relatively localised to HN nuclei closest to the interfacial region. This expectation is somewhat undermined in circumstances where the NH groups in protein P are rather close to each other, such as consecutive backbone amide groups in an alpha-helix (HN(i)–HN(i+1) ~2.5 Å). Increasing the lock D2O/solvent H2O ratio in the sample buffer can have the advantage of diluting out these second order effects.
This adjustment is at the cost of the overall sensitivity of the NMR experiment, but provides for more specific information about the intermolecular contact surface on the protein P. Experience shows that, under conditions where care is taken to specifically reduce HN-HN spin diffusion (e.g. in 10 %
H2O/90 % D2O solution), amide proton magnetisation within the interfacial region can be diminished by up to 90 % by intermolecular saturation transfer [64]. Model calculations suggest that HN atoms up to 7 Å distant from a typical protein–protein interface can experience a > 50 % reduction in intensity. In general, the surfaces mapped out by such cross-saturation experiments appear to be less extensive than suggested by corresponding chemical shift mapping analysis, indicating that in the latter type of experiment the perturbations of crosspeak position can reflect a range of stereoelectronic effects (including small conformational changes such as alteration of hydrogen bond geometry and side-chain rotameric distributions) that can be transmitted a significant distance away from the intermolecular contact zone. In this context the results obtained from saturation transfer experiments are expected to provide a more precise identification of the interfacial contact surface than chemical shift mapping [64,70,71]. (Despite this advantage, it is important to bear in mind that for a given NH crosspeak the absence of a significant cross-saturation effect is not necessarily an indication that it is not close to the intermolecular interface, since it may be that the local proton density in the opposing surface is such that polarisation transfer is relatively inefficient.) Recognising the potential difficulties of working with a 15N,1H-HSQC/TROSY readout in a buffer containing low levels of H2O, which leads to intrinsically low sensitivity, and that backbone HN atoms may be further from the protein–protein contact surface than many side-chain protons, alternative modes of saturation transfer experiments have been sought. One demonstrated approach is to exploit the labelling scheme described above that leads to 1H,13C-ILV-methyl group incorporation in a protein that otherwise contains 12C and 2H atoms [47,72].
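The comparison of spectra recorded with and without irradiation reduces to a per-residue intensity ratio. The sketch below, a hedged illustration rather than a prescribed protocol, computes that ratio and flags residues whose signal falls below a cut-off. The 0.5 threshold merely echoes the > 50 % reduction quoted above from model calculations and is an assumption, as are the function names and the residue numbers and intensities.

```python
def cross_saturation_ratios(i_sat, i_ref):
    """Intensity ratio I(with irradiation) / I(without) for each shared residue."""
    return {
        res: i_sat[res] / i_ref[res]
        for res in i_ref
        if res in i_sat and i_ref[res] > 0
    }


def flag_interface(ratios, threshold=0.5):
    """Residues whose intensity ratio falls below the chosen threshold."""
    return sorted(res for res, ratio in ratios.items() if ratio < threshold)


# Hypothetical amide crosspeak intensities keyed by residue number.
i_ref = {10: 100.0, 11: 120.0, 12: 90.0, 13: 110.0}   # no irradiation
i_sat = {10: 95.0, 11: 48.0, 12: 27.0, 13: 99.0}      # with irradiation
ratios = cross_saturation_ratios(i_sat, i_ref)
candidates = flag_interface(ratios)
print(candidates)
```

As the text cautions, a ratio close to 1 does not prove that a residue lies outside the interface, so a list of this kind is best read as a set of positive hits only.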
The fully exchanged D2O solution 1H spectrum of such a sample contains only 1H signals in the upfield region of the spectrum (typically δ < 2.0 ppm). Therefore, when mixed with an unlabelled (12C,1H) binding partner, selective 1H saturation can be achieved by band-selective RF irradiation across the window covering both aromatic and many non-methyl group aliphatic protons (8.5 ppm > δ > 3.5 ppm) [73]. Although the readout of this type of experiment, based on a 2D 13C,1H-HSQC or TROSY pulse sequence, is limited to the methyl groups of Ile, Leu and Val residues, there is some support for the claim that these moieties populate typical protein–protein interfaces with high frequency. In addition, the relatively sharp lines of the methyl signals, combined with the potential to exploit the methyl-TROSY effect for very large complexes, mean that the sensitivity of this mode of saturation transfer experiment can outcompete that obtained using the 15N,1H readout described above. As the range of selective 13C,1H labelling schemes is expanded (for example selective 13C,1H-labelling of alanine methyl groups [74,75]), the completeness with which protein interaction surfaces can be mapped by saturation transfer techniques will improve. Saturation transfer methods can be applied to systems exhibiting a variety of thermodynamic and exchange kinetic characteristics. For high affinity complexes, which typically display slow exchange NMR behaviour, transfer of the resonance assignments from the free- to the bound-state spectrum may require recording of standard triple resonance backbone data sets in both conditions, or the exploitation of pulse sequences that can reveal correlations between free- and bound-state cross-peaks in a partially saturated sample of the complex. Where the target complex is in fast-exchange the potential to use the transferred cross-saturation effect, effectively the same as employed in STD-NMR for small molecules, may be advantageous (Figure 8.5).
In this situation the X,2H-labelled ‘readout’ component
(X = 15N or 13C,1H-methyl) is included in excess of the protonated binding partner. Since the T1 relaxation time constants of the ‘readout’ signals are relatively long, saturation transfer that occurs for protein molecules within the complex is maintained upon complex dissociation, effectively averaging the peak intensity between the free and bound state populations. The magnitude of this transferred cross-saturation effect will depend upon the precise characteristics of the system under study, such as the dissociation constant for the complex, the molar ratio of the component molecules, and the dissociation rate constant (‘off-rate’). In general the faster the off-rate the more important it is to prepare the sample so as to contain a high proportion of the ‘readout’ component in the bound state. However, because the efficiency of spin diffusion is a function of the overall tumbling rate of the internuclear vectors, in the limit that the binding partner has a very long rotational correlation time (perhaps such as an integral membrane protein embedded in a micellar structure [76,77]) the proportion of bound ‘readout’ protein needed to obtain substantial saturation transfer effects can be very low (1 %).
8.4.3 Residual Dipolar Couplings
Over the past decade or so, the application of heteronuclear NMR spectroscopy to problems in the analysis of macromolecular structure and function has been massively enhanced by the realisation that a wealth of data can be obtained from the measurement of residual dipolar couplings from samples prepared in media that impart a weakly orienting influence on the solute molecules [78–82]. Typical ‘dilute liquid crystal’ media [83] that can be employed to partially align biological macromolecules include mixtures of short- and long-chain lipids that produce bi-layered micelle nematic phases (so-called bicelles) [84–91], aqueous poly(ethylene glycol)/alcohol mixtures [92], Helfrich phases (aqueous cetylpyridinium bromide/sodium bromide/hexanol) [93,94], filamentous bacteriophage particles [95–97], purple membranes [98], compressed or stretched polyacrylamide gels [99–101], cellulose crystallites [102] and polymerised collagen [103]. When dissolved in such anisotropic media, the nonuniformity in the overall molecular tumbling of the solute molecules, which relates to their overall shape and charge distribution, can be assessed by application of pulse sequences that are used to extract spin–spin coupling constants. The observed splittings for partially orientated solute contain a contribution from the incompletely averaged dipolar interactions between the sampled nuclei. The ‘residual’ n-bond dipolar coupling can be estimated by comparison of the apparent nJ-values measured in isotropic and anisotropic media:
$${}^{n}D = {}^{n}J_{aniso} - {}^{n}J_{iso}$$
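In practice the subtraction above is carried out residue by residue for whichever crosspeaks can be measured in both media. A minimal sketch follows, assuming the apparent splittings (in Hz) have already been extracted; the function name and the values are hypothetical, and the sign convention adopted for the splittings is left to the user.

```python
def residual_dipolar_couplings(j_iso, j_aniso):
    """Return D = J_aniso - J_iso (Hz) for residues measured in both media."""
    return {
        res: j_aniso[res] - j_iso[res]
        for res in sorted(j_iso)
        if res in j_aniso
    }


# Hypothetical apparent one-bond N-H splittings (Hz), keyed by residue number.
j_iso = {5: 93.0, 6: 92.5, 7: 93.5}   # isotropic sample
j_aniso = {5: 101.0, 6: 88.0}         # aligned sample; residue 7 was overlapped
rdcs = residual_dipolar_couplings(j_iso, j_aniso)
print(rdcs)
```

Residues missing from either data set are simply skipped, so no RDC is reported for a crosspeak that could not be measured under both conditions.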
The scope of the utility of RDC measurements in NMR is vast, and the underlying theoretical concepts are a little dense, so the comments provided here are necessarily abbreviated (see Chapter 4 for a detailed discussion of the practical aspects of using residual dipolar couplings and Chapter 9 for the application of RDCs to the studies of partially folded proteins). For a given dipolar vector, the measured RDC is dependent upon the orientation and dynamic order of the internuclear vector with respect to the principal axis frame of the alignment tensor. Thus the RDC between two nuclei A and B, $D_{AB}$, as a function of the polar coordinates $(\theta,\phi)$ in the principal axis system of the alignment tensor A, is given by
$$D_{AB} = \tfrac{1}{2} D_{0,AB} \left\{ A_{ax} (3\cos^2\theta - 1) + \tfrac{3}{2} A_{rh} \sin^2\theta \cos 2\phi \right\}$$
where
$$D_{0,AB} = -\left( \frac{\mu_0 h}{8\pi^3} \right) S \, \gamma_A \gamma_B \left\langle r_{AB}^{-3} \right\rangle$$
and $A_{ax}$ and $A_{rh}$ describe the axial and rhombic components of the alignment tensor, $r_{AB}$ is the internuclear distance, $\gamma_A$ and $\gamma_B$ are the magnetogyric ratios for A and B, $\mu_0$ is the magnetic permeability of a vacuum and $h$ is Planck’s constant. The alignment tensor describes the ordering of the molecule frame with respect to the external alignment influence (the liquid crystal). The great experimental utility of RDC measurements is that, measured for a sufficiently large number of internuclear vectors, they permit an estimation of the relative orientation of arbitrarily selected regions of a particular structure – be it individual peptide bond units within a single polypeptide, globular domains within a single protein chain, or different components of a macromolecular complex. Within the context of NMR of complexes, the power of RDC measurements is most readily evident in the case of the stable interaction of two globular domains, for which the 3D structures of the isolated components are already known and are not strongly perturbed in the bound state. In many respects this situation is formally equivalent to the use of RDC measurements to delineate the relative orientation of separate domains within a single molecule (e.g. tethered domains of a tandem domain protein [104,105]), for which several helpful reviews have appeared [81,82,106–110]. For a well-ordered high affinity complex, where the signals of any excess ‘free’ components are in slow-exchange and can be separately identified, the measured RDCs for either molecule involved in the complex will reflect a common mode of alignment. In other words, when the measured RDCs are analysed in terms of the known free-state structures, the derived parameters and error ranges that describe alignment tensors for each component should ‘overlap’.
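The angular dependence in the expression above is easy to explore numerically. The sketch below evaluates the RDC for a vector at polar angles (θ, φ) in the principal axis frame; the function name `rdc` and the values chosen for D0, A_ax and A_rh are purely illustrative assumptions, not data for any real system.

```python
import math


def rdc(theta, phi, d0, a_ax, a_rh):
    """D_AB = 1/2 * D0 * [A_ax (3 cos^2(theta) - 1) + 3/2 A_rh sin^2(theta) cos(2 phi)].

    theta and phi are in radians; d0 is in Hz; a_ax and a_rh are dimensionless.
    """
    axial = a_ax * (3.0 * math.cos(theta) ** 2 - 1.0)
    rhombic = 1.5 * a_rh * math.sin(theta) ** 2 * math.cos(2.0 * phi)
    return 0.5 * d0 * (axial + rhombic)


# Illustrative (hypothetical) parameters.
D0 = 20000.0    # Hz
A_AX = 1.0e-3
A_RH = 2.0e-4

along_z = rdc(0.0, 0.0, D0, A_AX, A_RH)              # vector along the principal z axis
theta, phi = math.radians(40.0), math.radians(25.0)
forward = rdc(theta, phi, D0, A_AX, A_RH)
inverted = rdc(math.pi - theta, phi + math.pi, D0, A_AX, A_RH)  # same vector, reversed
print(along_z, forward, inverted)
```

Reversing the direction of the vector leaves the computed RDC unchanged, which illustrates why a single RDC constrains the orientation of an internuclear vector but not its sense.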
The alignment tensors should appear to be the same for each part of the complex. Such an outcome is more likely to arise when the analysis of the RDC measurements is restricted to the more rigid parts of the structure, say the regular secondary structure elements, and away from the surface loop regions. A valuable aspect of RDC measurements, especially for large systems, is that in principle only a relatively small number (≥ 5) of well-defined RDC values are required to determine the alignment tensor for a given component structure. Although methods have been devised for obtaining RDC values for many pairs of dipolar-coupled atom-types, the most common, and arguably the most practical, RDCs can be obtained for directly bonded NH pairs (1DNH). The RDC values can be readily extracted from the variation of the separation of the 15N doublet components in modified F1-undecoupled 15N,1H-HSQC type spectra, tagged ‘in-phase/antiphase’ (IPAP), obtained for the sample in the aligned and nonaligned condition [111–113]. In general such spectra have a good signal-to-noise ratio and are reasonably well-resolved, facilitating the analysis. Extensions of the 2D experiment into a third dimension (e.g. as a modified 3D HNCO pulse sequence) can help relieve cross-peak overlap in busy spectra. In addition 1DNH values can be obtained for systems where perdeuteration of nonexchangeable protons has been applied, for example for higher molecular weight systems. Although all of the 1DNH RDC values for
a given complex could be obtained in a single shot (i.e. with both components 15N-labelled), it usually proves more practical to measure these separately for each partner in the complex using two different samples, each with a single component 15N-labelled. In this case, it may prove important to uniformly scale the measured RDC values (or the derived alignment tensor order parameters) for each set of measurements to control for experimental variation of the degree of alignment obtained in each case. Alternatively, the experimenter can appeal to the concept of combined single-/double-isotope labelling described above [30] combined with IDIS-type NMR pulse sequences for the extraction of the RDC values. In this way the RDC values for each component of the complex are obtained under equivalent alignment conditions for each step in the titration [31], effectively eliminating the potential for mismatch in buffer or alignment conditions that will almost inevitably occur for measurements made on separately labelled samples. In contrast to high affinity complexes, in the case of a complex that demonstrates a relatively weak affinity, such that the NMR spectra of the component molecules both demonstrate fast-exchange chemical shift perturbation behaviour (Figure 8.2a and b), the measured RDCs will represent averages of the alignment of the component molecules in the free and bound states. In such an example, because of the asymptotic nature of the Michaelis binding curve, it is essentially impossible to obtain the spectrum of the pure (saturated) complex, meaning that the limiting values of the RDCs for the complex are practically inaccessible. Blackledge and co-workers have recently shown, however, that RDC measurements made for a small number of component admixtures throughout a titration series can be combined with prior knowledge of the KD value for the complex (e.g. from a fit to chemical shift changes, or from an independent methodology such as isothermal titration calorimetry) and with precise measurement of the component molecule concentrations in the NMR samples, and then analysed to extrapolate the variation of the experimental RDCs with the ‘fraction-bound’ for each component to the hypothetical ‘fully-bound’ values [32]. Importantly this approach absolutely relies upon combined single-/double-isotope labelling [30] together with IDIS NMR experiments to obtain the intermediate RDC values. Since RDC measurements provide only orientational, but not translational, information for different bond vectors or molecular fragments, ab initio characterisation of macromolecular complexes on the basis of RDCs alone is not possible. Moreover, on their own, sets of RDC measurements for different components of a complex recorded in a single alignment medium lead to fourfold symmetry-related but degenerate solutions for the relative orientation of the different parts. Such orientational ambiguities can be lifted by obtaining RDCs in a second alignment medium sufficiently different from the first as to yield substantial variation of the alignment tensor, for example by using one medium that aligns the solute molecules predominantly on the basis of their hydrodynamic shape, and another that additionally interacts strongly with their charge distribution. Critically, the lack of translational information arising from even the most complete RDC datasets means that these measurements must be supplemented by additional information that provides constraints on the separation distance of the components, such as intermolecular interproton distance restraints from NOESY data.
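The titration-based extrapolation to ‘fully-bound’ RDC values can be sketched in two steps: compute the fraction bound at each titration point from KD and the total concentrations, then fit the observed RDC against that fraction and read off the value at a fraction of 1. The code below is a simplified illustration only. It assumes 1:1 binding and a population-weighted (linear) average of free- and bound-state RDCs under fast exchange; it is not the published procedure of [32], and all names and numbers are hypothetical.

```python
import math


def fraction_bound(observed_total, partner_total, kd):
    """Fraction of the observed component in the complex, assuming 1:1 binding.

    All three arguments must share the same concentration units.
    """
    b = observed_total + partner_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * observed_total * partner_total)) / 2.0
    return complex_conc / observed_total


def extrapolate_to_bound(fractions, rdcs):
    """Least-squares line through (fraction, RDC) points, evaluated at fraction = 1."""
    n = len(fractions)
    mean_f = sum(fractions) / n
    mean_d = sum(rdcs) / n
    sxx = sum((f - mean_f) ** 2 for f in fractions)
    if sxx == 0.0:
        raise ValueError("at least two distinct fractions are required")
    sxy = sum((f - mean_f) * (d - mean_d) for f, d in zip(fractions, rdcs))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_f
    return intercept + slope


# Hypothetical titration: 0.5 mM observed component, KD = 0.05 mM.
partner_mM = [0.0, 0.1, 0.25, 0.5]
fractions = [fraction_bound(0.5, q, 0.05) for q in partner_mM]

# Synthetic 'measured' RDCs for one N-H vector, built from assumed end points.
D_FREE, D_BOUND = 2.0, 10.0
observed = [(1.0 - f) * D_FREE + f * D_BOUND for f in fractions]

d_bound_estimate = extrapolate_to_bound(fractions, observed)
print([round(f, 3) for f in fractions], round(d_bound_estimate, 3))
```

Because the fraction bound approaches 1 only asymptotically, the extrapolated value is sensitive to errors in KD and in the concentrations, which is why the text stresses their precise measurement.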
In fact, because other data types are required, it is not strictly necessary to obtain RDCs under multiple alignment conditions in order to make good use of such measurements, though clearly more information of any type is likely to improve the precision and accuracy of the final outcome. In practice it may be challenging to find more than one alignment medium that is compatible with maintaining
the solute molecules in solution – a universal issue for RDC measurements – and the experimenter may want to strike a balance between the number of different alignment media trials and the expense (time and cost) of procuring the complex samples.
8.4.4 Paramagnetic Relaxation Enhancements
The exploitation of paramagnetic centres in biological macromolecules has been practised within multiple fields of magnetic resonance for decades. In the context of NMR, the influence of paramagnetic centres on chemical shifts, relaxation rate constants and partial alignment is well understood and has been used to determine aspects of structure and dynamics in a variety of contexts [114,115]. There has been a resurgence of interest in these concepts with the realisation that recombinant DNA technology, protein engineering and advances in metal-ion ligation paradigms can be combined with multidimensional heteronuclear NMR to broaden the scope of these methods [1]. In particular it is now relatively straightforward to engineer a paramagnetic centre into a specific site within a given protein (or RNA) molecule [116]. Much as for RDCs, to do justice to the breadth of the utility of paramagnetic centres in biological NMR itself would require a lengthy treatise and the reader is referred to other sources, including Chapter 6, for a more complete discussion of this topic than can be presented here. Paramagnetic centres come in a variety of ‘flavours’ and the precise nature of their influence in the NMR context depends upon the isotropy of the unpaired electron magnetic susceptibility tensor, the extent of electron delocalisation over neighbouring atoms, and the unpaired electron relaxation properties. A class of paramagnetic centres, typically with an anisotropic electron magnetic susceptibility tensor, gives rise to pseudo-contact shifts (see below). Other types of paramagnetic centres have a more significant influence on the longitudinal and transverse relaxation rates. In principle, any paramagnetic centre, whether intrinsic to – or engineered into – the molecule of interest, can be exploited to yield useful information concerning the overall structure. 
Here the discussion is focused on a single aspect, namely the so-called paramagnetic relaxation enhancement (PRE) effect, which can be put to good use in investigations of a variety of macromolecular complexes. In the presence of a paramagnetic centre (defined as consisting of one or more unpaired electrons) the relaxation rates of a given NMR-active nucleus will be elevated compared to the situation where that centre is either removed or converted (e.g. by oxidation or reduction) to be diamagnetic (i.e. no unpaired electrons). The difference in relaxation rates is referred to as the PRE. The PRE effect principally arises because of the dipolar interaction between the nucleus and the unpaired electron(s). Although a general phenomenon displayed by all paramagnetic systems, the PRE is most readily interpreted in instances when the unpaired electron centre possesses an essentially isotropic g-tensor, under which circumstances pseudo-contact shifts and Curie-spin relaxation effects can be safely disregarded. Free-radical paramagnets based upon metastable nitroxide spin labels provide such an opportunity, as do systems based upon chelated Mn2+ ions, for example EDTA-Mn2+. Methods have been devised to incorporate these moieties into polypeptide chains and nucleic acids in such a way as to allow a great deal of flexibility in the choice of attachment site. In the case of proteins the spin label is usually attached by a disulphide coupling to a surface-exposed Cys side-chain. To ensure site-specific labelling the target protein may need to be engineered to possess only a single such exposed Cys. (If more than a single reactive Cys exists on the
protein surface, it is necessary to use site-directed mutagenesis to eliminate the extra thiols.) Derivatisation can be effected by treatment with S-(2-pyridylthio)-cysteaminyl-EDTA-metal (where e.g. metal = Mn2+, Cu2+, etc.) or (1-oxyl-2,2,5,5-tetramethyl-Δ3-pyrroline-3-methyl)methanethiosulfonate spin label (more simply referred to as MTSSL), which introduces a nitroxide spin label [117,118]. Incorporation of EDTA-metal groups into DNA can be achieved by synthesis of the target oligonucleotide using EDTA-derivatised deoxythymidine (dT-EDTA) [119]. This reagent was originally developed for sequence-specific cleavage of double-stranded (ds) DNA and the phosphoramidite derivative of dT-EDTA is commercially available. Similarly nitroxide-containing tetraalkylpyrrolidine-N-oxyl (‘proxyl’) groups can be coupled to RNA synthesised to contain single 4-thiouracil bases at the desired position by treatment with 3-(2-iodoacetamidoproxyl) [120]. Less general, but still potentially useful, is the use of an amino-terminal NH2-Gly-Gly-His tripeptide motif (denoted ATCUN) that binds Cu2+ ions with very high affinity [121]. Although PRE effects operate on both protons and X-nuclei, and upon both the longitudinal and transverse relaxation rates of these nuclei, the most popular mode of analysis is in terms of the PRE effect on the transverse relaxation rates of protons, denoted 1H-Γ2. PRE effects on 1H longitudinal relaxation rates are complicated by sensitivity to internal motions and cross-relaxation effects. The 1H-Γ2 is defined as the difference in 1H transverse relaxation rate constant R2 between the paramagnetic sample and an equivalent diamagnetic molecule:
$$\Gamma_2 = R_{2,para} - R_{2,dia} = \frac{1}{15} \left( \frac{\mu_0}{4\pi} \right)^2 \gamma_I^2 g^2 \mu_B^2 S(S+1) \, r^{-6} \left\{ 4\tau_c + \frac{3\tau_c}{1 + (\omega_H \tau_c)^2} \right\}$$
where $\mu_0$ is the permeability of a vacuum, $\gamma_I$ the nuclear gyromagnetic ratio, $g$ the electron g-factor, $\mu_B$ the Bohr magneton, $S$ the electron spin quantum number and $\omega_H/2\pi$ the nuclear Larmor frequency. $\tau_c$ is the correlation time relevant to the PRE, given by $\tau_c^{-1} = \tau_r^{-1} + \tau_s^{-1}$, where $\tau_r$ is the molecular correlation time and $\tau_s$ the electron relaxation time [122]. Like the NOE, the magnitude of the PRE scales with $1/r^6$, where $r$ is the separation between the nucleus and the paramagnet, making the magnitude strongly distance dependent. Unlike the NOE, however, because the magnetic moment of the unpaired electron(s) is large, detectable PRE effects can extend over distances similar to the overall dimensions of biological macromolecules, sometimes up to 35 Å. In the context of macromolecular complexes, because of the physical bulk of the spin-label, care needs to be exercised to ensure that the modification of the component macromolecule does not perturb or interfere with the interaction of interest. However, as a result of the potentially long-range nature of the PRE, it is usually straightforward to avoid such problems. Measurement of the 1H-Γ2 amounts to estimating the difference in the 1H linewidth between the paramagnetic and diamagnetic states of the modified macromolecule. Conventionally this is performed by NMR measurement in the paramagnetic state followed either by chemical reduction of the nitroxide-type spin label with an excess of ascorbic acid or by replacement of the metal ion in MnEDTA-type labels with a diamagnetic cation such as Ca2+, and final NMR measurement in the diamagnetic state. It has been emphasised that accurate assessment of 1H-Γ2 is best carried out using a two-time point measurement [122]. Compared to the seemingly more straightforward ‘single time point’ estimation of 1H-Γ2 that relies on comparison of crosspeak intensities in simple HSQC spectra of the two spin-label redox states, this approach does not require complex data fitting or error estimation, and avoids a potentially unwarranted
assumption about the 1H relaxation behaviour between scans. In order to avoid nonspecific PRE effects arising from collisions between spin-labelled molecules, it is recommended to perform the PRE experiments on relatively dilute solutions of the target complex (< 0.5 mM). When applied to a heterodimeric complex in which one component is appropriately spin-labelled, 1H-Γ2 measurements yield useful structural distance constraints. Because a small error in 1H-Γ2 translates into a large interval in the predicted separation distance, precision and accuracy of the measurements are at a premium. In this context an important aspect arises from the potential for a residual diamagnetic contribution to the 1H-Γ2 rate in the experiment designed to measure R2,para. Diamagnetic species can arise because of: incomplete crosslinking of the spin-label appendage or incomplete purification of the conjugated species; diamagnetic impurities in the source of paramagnetic metal ions; or the slow chemical decay of the spin-labelled species, for example by disulphide exchange between the paramagnetic ‘heterodimer’ spin-labelled state of the macromolecule M(Cys)-SS-SL and the diamagnetic cross-linked homodimers M(Cys)-SS-(Cys)M and SL-SS-SL. The contribution of the diamagnetic state to the 1H-Γ2 measurements can be assessed by careful examination of 1D traces through cross-peaks displaying essentially no signal in the paramagnetic state [122]. The analysis of PREs in terms of structural restraints is subject to certain important caveats. The first of these is that the spin-label is usually attached to one of the complex components by a tether which displays a degree of flexibility, leading to some ambiguity in its precise position with respect to the target macromolecules. Sophisticated calculation methods have been devised that allow for this variability to be taken into account [123].
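To make the two-time-point idea and the distance dependence concrete, the sketch below first extracts Γ2 from peak intensities recorded at two relaxation delays in the paramagnetic and diamagnetic states, and then converts Γ2 to a distance by inverting the expression given earlier for the 1H-Γ2. It assumes mono-exponential transverse decay, an isotropic g-factor of 2.0023 and S = 1/2 (as for a nitroxide); the function names, field strength, correlation time and intensities are all hypothetical.

```python
import math

MU0_OVER_4PI = 1.0e-7          # T m A^-1 (to within about 1 part in 10^9)
GAMMA_H = 2.6752218744e8       # 1H gyromagnetic ratio, rad s^-1 T^-1
G_ELECTRON = 2.0023            # assumed isotropic electron g-factor
MU_B = 9.2740100783e-24        # Bohr magneton, J T^-1


def gamma2_two_point(i_dia_a, i_dia_b, i_para_a, i_para_b, t_a, t_b):
    """Gamma2 (s^-1) from intensities at delays t_a < t_b, assuming exponential decay."""
    return math.log((i_dia_b * i_para_a) / (i_dia_a * i_para_b)) / (t_b - t_a)


def pre_prefactor(spin=0.5):
    """(1/15) (mu0/4pi)^2 gamma_I^2 g^2 mu_B^2 S(S+1), in m^6 s^-2."""
    return (MU0_OVER_4PI ** 2 * GAMMA_H ** 2 * G_ELECTRON ** 2
            * MU_B ** 2 * spin * (spin + 1.0)) / 15.0


def spectral_term(tau_c, omega_h):
    """4 tau_c + 3 tau_c / [1 + (omega_H tau_c)^2], in seconds."""
    return 4.0 * tau_c + 3.0 * tau_c / (1.0 + (omega_h * tau_c) ** 2)


def gamma2_from_distance(r, tau_c, omega_h, spin=0.5):
    """Gamma2 (s^-1) for an electron-nucleus distance r given in metres."""
    return pre_prefactor(spin) * spectral_term(tau_c, omega_h) / r ** 6


def distance_from_gamma2(gamma2, tau_c, omega_h, spin=0.5):
    """Electron-nucleus distance (metres) corresponding to a measured Gamma2."""
    return (pre_prefactor(spin) * spectral_term(tau_c, omega_h) / gamma2) ** (1.0 / 6.0)


# Hypothetical experiment: 600 MHz, tau_c = 10 ns, delays of 0 and 10 ms.
OMEGA_H = 2.0 * math.pi * 600.0e6
TAU_C = 10.0e-9
t_a, t_b = 0.0, 0.010
# Synthetic intensities generated from R2,dia = 20 s^-1 and R2,para = 50 s^-1.
i_dia_a, i_dia_b = 1.0, math.exp(-20.0 * t_b)
i_para_a, i_para_b = 0.8, 0.8 * math.exp(-50.0 * t_b)

gamma2 = gamma2_two_point(i_dia_a, i_dia_b, i_para_a, i_para_b, t_a, t_b)
r_angstrom = distance_from_gamma2(gamma2, TAU_C, OMEGA_H) * 1.0e10
print(f"Gamma2 = {gamma2:.1f} s^-1, r = {r_angstrom:.1f} Angstrom")
```

Note that the overall scaling of the paramagnetic intensities (0.8 here) cancels in the ratio, and that the sixth-root dependence means a sizeable relative error in Γ2 produces a much smaller relative error in the distance for a single, rigid electron position.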
The second critical caveat for the interpretation of PRE measurements is that internal motions within the target complex lead to nonlinear averaging of the observed effects on resonance linewidths. Thus in a dynamic system the excursion of the spin-label to weakly populated sites can still give substantial PRE effects whose weight can be difficult to ascertain. Indeed, in the worst case one can observe combinations of PRE measurements that are not self-consistent, though identification of this problem is likely to be elusive without some prior knowledge of the separate 3D structures of the components of the complex. On a more positive note the utility of PRE measurements of dynamic systems has been exploited to estimate the envelope of encounter complexes formed between weakly associated macromolecules [1,124–128]. As with all of the methods described here, the combination of PREs with other data-types is likely to prove more reliable than focussing on this type of measurement on its own.
8.4.5 Pseudo-Contact Shifts
It has been known for quite some time that the association of lanthanide metal ions with biological macromolecules (particularly proteins) has a dramatic influence upon their NMR spectra through PREs and the induction of pseudo-contact shifts [129]. There has been a recent resurgence in the application of lanthanide tagging of macromolecules with the realisation that the anisotropic magnetic susceptibility of these ions can be exploited to induce partial alignment of the appended macromolecule and thereby obtain RDC values in the absence of exogenous (e.g. liquid crystal) alignment media. Although applications of lanthanides for RDC measurements of macromolecular complexes are not (yet) described, Otting and co-workers have recently demonstrated the utility of lanthanide-induced pseudo-contact shifts (PCSs) in this context [130].
The PCS arises from a through-space dipole–dipole interaction between the NMR probe nucleus and the unpaired f-electrons of the lanthanide ion (generally denoted Ln3+). PCSs have a significant magnitude for all of the trivalent lanthanide cations that possess a fast unpaired electron relaxation time ($\tau_e \sim 10^{-13}$ s), in practice meaning all Ln3+ except Gd3+ (see Chapter 6). The nonisotropic distribution of the f-electron density around the lanthanide gives rise to an anisotropic magnetic susceptibility tensor χ that governs the interaction of the unpaired electron dipole moment with the applied magnetic field. The magnitude of the PCS is proportional to the χ tensor anisotropy (Δχ), which can be decomposed in terms of the axial and rhombic components $\Delta\chi_{ax}$ and $\Delta\chi_{rh}$:
$$\Delta\delta^{PCS} = \delta_{para} - \delta_{dia} = \frac{1}{12\pi r^{3}} \left[ \Delta\chi_{ax}\,(3\cos^{2}\theta - 1) + \tfrac{3}{2}\,\Delta\chi_{rh}\,\sin^{2}\theta\,\cos 2\varphi \right]$$
where $\Delta\delta^{PCS}$ is the difference in chemical shifts between paramagnetic and (equivalent) diamagnetic samples, and r, θ and φ represent the polar coordinates describing the distance and position of the nucleus relative to the metal ion and the principal axis system of the Δχ tensor. Depending upon the lanthanide ion used, measurable PCSs can be observed for metal–nucleus distances as great as 40 Å [131]. In general PCSs can be detected well beyond the radius within which line broadening due to the lanthanide-induced PRE operates. Since experimental values of PCS relate to both the orientation of the target molecule relative to the susceptibility tensor and the distances to the paramagnetic centre, these data can represent very powerful structural constraints on the system. Indeed such measurements have sometimes been adopted for the refinement of individual macromolecular structures wherein the parameters defining the magnitude and orientation of the Δχ tensor are optimised as part of the process.
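The distance and angular dependence of the PCS expression can be made concrete with a short numerical sketch. This is only an illustration of the formula as written above: the function name `pcs_ppm` and the tensor magnitudes are hypothetical choices, not values taken from [130] or [131], and SI units (Δχ in m^3, r in m) are assumed.

```python
import math

def pcs_ppm(dchi_ax, dchi_rh, r, theta, phi):
    """Pseudo-contact shift in ppm for one nucleus.

    dchi_ax, dchi_rh: axial and rhombic anisotropy components in m^3 (SI).
    r: metal-nucleus distance in metres.
    theta, phi: polar angles (radians) in the tensor principal axis frame.
    """
    angular = (dchi_ax * (3.0 * math.cos(theta) ** 2 - 1.0)
               + 1.5 * dchi_rh * math.sin(theta) ** 2 * math.cos(2.0 * phi))
    # The formula yields a dimensionless fraction; scale by 1e6 to get ppm.
    return 1e6 * angular / (12.0 * math.pi * r ** 3)

# Hypothetical tensor magnitudes, chosen only for illustration.
DCHI_AX = 30e-32  # m^3
DCHI_RH = 10e-32  # m^3

# A nucleus on the tensor z axis (theta = 0) at increasing distances.
for r_angstrom in (10.0, 20.0, 40.0):
    shift = pcs_ppm(DCHI_AX, DCHI_RH, r_angstrom * 1e-10, 0.0, 0.0)
    print(f"r = {r_angstrom:4.0f} A  PCS = {shift:7.3f} ppm")
```

The inverse-cube dependence means that doubling the distance reduces the shift eightfold, which is much gentler than the r^-6 dependence of the PRE and is consistent with the observation that PCSs remain detectable beyond the PRE broadening radius.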
Within the context of macromolecular complexes of components with known structures, it has been shown that even a relatively small number of PCS measurements can be sufficient to guide the rigid body docking process and determine the nature of the intermolecular interaction [130]. In such a case the pattern of PCS values induced in both molecules by the lanthanide ‘label’ attached to one of them can be used to determine the eight Δχ tensor parameters relevant for the first molecule, and the eight Δχ tensor parameters relevant to the partner molecule. The structure of the complex is then obtained by rotation and translation of the molecules so as to superpose the Δχ tensors. An apparent advantage of the long range nature of the PCS effects is that they can ‘reach across’ intermolecular interfaces whose NMR signals often exhibit chemical exchange broadening. Pintacuda et al. demonstrated the principle of this approach for the complex formed between two subunits of the E. coli DNA polymerase III, one of which has a natural metal-ion binding site that can accommodate lanthanide ions. 15N and 1H PCS measurements were made for a subset of cross-peaks in complexes in which either subunit was 15N-labelled and contained either Er3+ or Dy3+. For each of the two lanthanides, the tensor parameters were derived by minimising the difference between the experimental PCSs and values back-calculated from the known structure of the metal-ion containing subunit (the precise location of the metal ion within this structure was refined as a part of this calculation). In general the fit residuals were larger than the uncertainty in the PCS measurements, suggesting perhaps that the subunit structures in the complex differ slightly from the reference structures used for the fit. Superposition of the principal axis systems for the Δχ tensors of the two components (for one of the metal ions) yielded the model of the heterodimer complex.
In principle there are multiple degenerate solutions to this superposition, corresponding to rotations around the x, y and z axes. Not all of these solutions
would necessarily be plausible because of intermolecular overlap or limited contact regions. However the availability of the second Δχ tensor obtained with a different lanthanide cation can allow resolution of any ambiguities in this respect (so long as the principal axis systems of the two Δχ tensors are noncoincident).
8.4.6
Data-Driven Docking
It should be apparent from the foregoing description of the variety of NMR methods that can be used to explore macromolecular complexes that no single approach is likely to provide a high enough density of structural data to constrain the system within ab initio calculations aimed at deriving the 3D structure. Rather it proves sensible to combine different approaches to maximise the total experimental information with which to build compatible models. Moreover when the 3D structures of the isolated components of the complex are already known or can be accurately modelled, it would appear prudent to use this information as a means to bootstrap the overall process. A brief survey of the literature indicates that this is becoming the consensus view. Several independent examples of this type of hybrid approach have been reported. These efforts converge upon software solutions able to integrate the different data types. In this regard, one very widely used application has emerged: HADDOCK [132], standing for High Ambiguity Driven Docking, has been developed by Bonvin and co-workers as an extension of Python scripts that derive from the popular ARIA (Ambiguous Restraints for Iterative Assignment) [133,134] component of the CNS structure determination package [135]. HADDOCK was originally developed for protein–protein docking, but has been expanded to cover protein–peptide, protein–nucleic acid and protein–ligand targets. HADDOCK has been refined so that docking calculations, including analysis of the outputs, can be effected automatically and is available as a Web-based service (http://www.nmr.chem.uu.nl/haddock/). The basic concept behind HADDOCK is that experimental data determined for the macromolecular complex, be it in the form of biophysical (NMR chemical shift perturbations, PREs, RDCs, etc.)
or biochemical (mutagenesis) measurements, can be introduced into the calculation of the complex structure in terms of ambiguous interaction restraints (AIRs) that drive the molecules together within the context of a force field that describes the bonded and nonbonded interaction energies of the system. At the end of the calculation, the structures are ranked by assessment of a pseudo-energy that combines the goodness-of-fit to the experimental restraints with the standard molecular potentials. Essentially HADDOCK performs a data-driven intermolecular docking calculation. Because HADDOCK is layered on top of CNS, the experimenter can allow any desired degree of flexibility of the combining molecules. Thus one can perform anything from rigid-body docking (e.g. using component 3D structures independently obtained by NMR or X-ray crystallography) to ab initio calculations of the overall structure from random starting configurations. It should be borne in mind that the extent to which the outputs of HADDOCK (and related programs) provide a substitute for high-resolution structure determination will depend upon the quality and quantity of the experimental restraints applied. However in the majority of circumstances this type of approach has been proven to yield results for the target system that are adequate either to withstand scrutiny against independent biochemical or biophysical data or to provide a platform for the design of further experimental validation of the structure [10].
Let us consider the application of HADDOCK with chemical shift perturbation data. The program requires the user to distinguish ‘active’ from ‘passive’ residues. Active residues are those declared to exhibit a strong chemical shift change upon complex formation and a high relative solvent accessibility in the free form of the protein (e.g. >50 % solvent accessibility); passive residues are those residues that show a weaker chemical shift perturbation and/or are physically adjacent to the active residues on the protein surface. An AIR is represented as an intermolecular distance restraint between ambiguously defined sets of atoms, $d_{iAB}$, with a user-definable upper bound (in the range 2–3 Å) between any atom m of an active residue i of molecule A ($m_{iA}$) and any atom n from either active or passive residues k of molecule B ($n_{kB}$), and vice versa. For each restraint the instantaneous value of the effective distance $d^{\mathrm{eff}}_{iAB}$ is given by
$$d^{\mathrm{eff}}_{iAB} = \left( \sum_{m_{iA}=1}^{N_{\mathrm{atom}}} \; \sum_{k=1}^{N_{\mathrm{res}}} \; \sum_{n_{kB}=1}^{N_{\mathrm{atom}}} \frac{1}{d^{6}_{m_{iA},n_{kB}}} \right)^{-1/6}$$
Here Natom is the number of atoms in a given residue and Nres is the total number of residues defined as either active or passive for a given molecule. This definition means that the AIRs provide pairwise reciprocal attractive forces between sets of active residues on the component macromolecules and between active residues on one molecule and passive residues on the other, without any explicit extra forces between sets of passive residues. The 1/6-power sum-averaging mimics the attractive part of the Lennard-Jones potential and the upper bound of 2–3 Å allows for close approach of the complex components at a distance compatible with van der Waals contacts between heavy atoms. There are many user-selectable options and the reader is referred to the online manual for full information. The basic underlying docking protocol is built on a three-step strategy [132].
Step 1. Randomisation of the orientations of the complex components at a suitable separation distance, by random rotations of the molecules around their centre of mass, followed by rigid body energy minimisation that allows for both rotation and translation of the molecules towards a minimum of the intermolecular energy function.
Step 2. A multistage simulated annealing (SA) protocol for the complex in torsion angle space starting with a high temperature rigid body search, rigid body SA, and then semiflexible SA progressively allowing for internal mobility of the interfacial side-chains and then backbone atoms.
Step 3. Refinement in Cartesian coordinate space of the complex model in an explicit solvent layer (typically 8 Å thick for aqueous systems), with position restraints eventually limited to the noninterfacial heavy atoms.
Typically 1000 models are generated in Step 1, and the 200 lowest energy models are taken forwards into Steps 2 and 3. The refined models are clustered on the basis of the pairwise backbone atom RMSD within the interfacial region, denoted iRMSD.
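Returning briefly to the AIR definition, the behaviour of the effective distance can be explored with a minimal sketch. It assumes plain Cartesian coordinates and a simple "excess over the upper bound" measure of violation; the function names and the toy coordinates are hypothetical, and this is not HADDOCK or CNS code.

```python
import math

def effective_distance(atoms_a, atoms_b):
    """Return (sum over all atom pairs of d^-6)^(-1/6).

    atoms_a: (x, y, z) coordinates of every atom of one active residue of A.
    atoms_b: coordinates of every atom of all active and passive residues
             of B, flattened into one list (this merges the sums over
             residues k and atoms n of the AIR definition).
    """
    total = 0.0
    for a in atoms_a:
        for b in atoms_b:
            total += math.dist(a, b) ** -6
    return total ** (-1.0 / 6.0)

def air_violation(atoms_a, atoms_b, upper_bound=2.0):
    """Amount by which d_eff exceeds the upper bound (zero if satisfied)."""
    return max(0.0, effective_distance(atoms_a, atoms_b) - upper_bound)

# Toy coordinates in angstroms, invented for illustration only.
residue_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
partner_b = [(4.0, 0.0, 0.0), (12.0, 3.0, 0.0), (20.0, 0.0, 5.0)]

d_eff = effective_distance(residue_a, partner_b)
print(f"d_eff = {d_eff:.2f} A, violation = {air_violation(residue_a, partner_b):.2f} A")
```

Because of the inverse sixth power, the sum is dominated by the closest atom pair, so the effective distance is always slightly shorter than the shortest individual distance. A single close contact with any residue of the partner set is therefore enough to satisfy the restraint, which is what makes it ambiguous.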
Clusters are defined for a set of models for which iRMSD is smaller than 1 Å. The clusters are optionally ranked according to user-defined criteria, typically the average interaction energies within the adopted force field and the mean buried surface area. This apparently straightforward application of chemical shift perturbation data was demonstrated to work remarkably well for a number of systems for which the structure of the
complex was already known [132]. Interestingly it was shown that Steps 2 and 3 of the calculation do relatively little to improve the iRMSD score relative to the target structure, suggesting that the rigid body energy minimisation efficiently finds low energy structures that capture the basic configuration of the complex. Rather the last two phases allow the interfacial side-chains to reorient to adopt stereochemically sensible conformations, leading to superior scoring of the interfacial energy. Within the HADDOCK concept, the definition of the ‘active’ and ‘passive’ residues sets can be extended beyond chemical shift perturbations to the effects of site-directed mutagenesis experiments (e.g. active residues are those that when mutated have a significant effect on complex formation), hydrogen-deuterium exchange measurements, PREs, STD effects, and so on. Clearly for a given HADDOCK calculation the precise impact of the AIRs will depend upon the criteria adopted by the user to delineate ‘significant’ versus ‘minor’ effects. In practice the experimenter will need to ‘titrate’ this definition according to the output of the calculation, in particular by iterating the classification of active and passive against the proximity of the residues to the molecular surface of the individual components. The partly subjective aspect of the HADDOCK procedure recalls that in the final analysis it is a modelling exercise, rather than an ab initio structure determination, making imperative clear exposition of the restraints used in any publication outputs that arise from application of the procedure. The more that such calculations can be supplemented by unambiguous structural restraint data such as quantitative PREs, RDCs and intermolecular NOEs, the more reliable is likely to be the outcome. 
Interestingly, AIR restraints based upon chemical shift perturbation and mutagenesis, whilst defining the interfacial surfaces within a complex, provide relatively little information to define the relative orientation of the complex components, suggesting that when HADDOCK converges the protocol finds the relevant solution mostly on the basis of shape complementarity (which contributes to the van der Waals interaction energy) and the distribution of hydrophobic and hydrophilic moieties (contributing to the electrostatic energy). The discrimination of the correct solution is likely to be challenging when the interface lacks a degree of asymmetry in either respect. Both in this context and more generally, the capability to include more ‘global’ restraint types, such as RDCs and rotational diffusion anisotropy data, into HADDOCK-style calculations can prove extremely beneficial. In a reciprocal manner the energy terms of the force field adopted in HADDOCK provide the translational constraint on the complex that is lacking from RDC measurements on their own, as well as lifting the degeneracy of compatible intermolecular orientations that arise for RDCs obtained in a single alignment medium. The HADDOCK/CNS package allows for either direct use of RDC restraints within the CNS force field where the projection angles are defined with respect to a separate frame that defines the alignment tensor, or as a set of intervector projection angle restraints [136]. The latter treatment means that one can avoid working with an explicit alignment tensor during the structure calculations, distinguish between inter- and intramolecular restraints (useful given that part of the system will typically be kept rigid during the docking procedure), and additionally facilitate the use of multiple RDC datasets obtained in different alignment media. In either case, prior estimation of the alignment tensor components will be required. 
In the context of complexes between single domain components of known structure, these parameters can be obtained using external software such as PALES (prediction of alignment from structure) [137], for which HADDOCK provides a convenient script to generate the
requisite input files. Amongst other utilities, PALES can best-fit the dipolar coupling tensor to the corresponding 3D structures. A sensible approach is to weight the RDCs corresponding to bond vectors with the least contribution of internal motion, such as from residues in regular secondary structure elements.
8.4.7
Small Angle X-Ray Scattering (SAXS)
In recent times, with the expansion of high power sources of X-rays (particularly synchrotrons, but also by adaptation of in-house diffraction equipment) and the development of relatively user-friendly software, it has become reasonably straightforward to obtain small angle X-ray scattering (SAXS) data for biological macromolecules (see Chapter 4 for a detailed description of the technique). SAXS (and similarly small angle neutron scattering, SANS, though to date this has not been exploited to the same degree) can provide information that complements restraints derived from NMR data. There has been a relatively long tradition of using SAXS profiles to evaluate and filter models for multidomain proteins where there is scope for interdomain flexibility [138–144]. More recently methods have been devised to use SAXS data as an explicit restraint term during structure calculations, greatly extending the utility of the combination of NMR and scattering measurements to macromolecular systems [145,146]. Essentially the SAXS scattering curve depends upon the difference in electron density between the solute and the solvent and maps the Fourier transform of the distribution of interatomic distances within the macromolecular target, which can be straightforwardly related to the molecular shape and internal structure (principally the nonuniform electron density, which can be treated numerically by robust approximations). As the density of the solvent layer surrounding the protein is slightly higher than that of the bulk solvent, the scattering profile includes a contribution from this layer, which should be accounted for during the data analysis. In the context of NMR investigations of complex structures, SAXS has the advantage that it can be applied in a relatively efficient manner.
Thus the SAXS measurements can be matched to the conditions used for NMR experiments, can be applied to the same isotope-labelled preparations, and typically require substantially smaller sample volumes (tens of microlitres). On the other hand, for SAXS methodology to be practically useful, care must be taken: to obtain conditions where the sample is nonaggregated (the scattering intensity per particle scales with the square of the particle volume, so that even a small aggregated fraction can dominate the profile); to measure the contribution to scattering from the bulk solvent with high precision (this is measured separately and then subtracted from the scattering profile of the target complex); and to guard against the potential for X-ray-induced radiation damage to the sample. In general, avoiding aggregation of the target complex is also a requirement for NMR investigation, so this criterion is not likely to present a major hurdle. Radiation damage occurs largely by bond breakage due to the generation of reactive hydroxyl (OH) free radicals. Maintaining a reducing environment in the sample buffer (e.g. with high concentrations of DTT), and checking how the reproducibility of the scattering curve responds to attenuation of the beam intensity and to changes in exposure time, helps to limit this issue. In part because SAXS measurements are usually obtained at a specialist synchrotron facility, with its local idiosyncratic setup, a full discussion of the practical aspects of SAXS measurements and data analysis is not warranted here, and the reader is referred to the specialist literature on this topic for a more comprehensive discussion. Suffice to say here a
few groups have been able to demonstrate that the incorporation of SAXS data into refinement of the structure of macromolecules (both multidomain monomers and heteromultimers) improves the precision (and likely accuracy) of the outcome, particularly highlighting that SAXS data can assist in circumstances where NOE-based distance restraints are sparse. For example, the Bax group has developed a very elegant procedure to include SAXS data alongside NOE and RDC restraints within CNS [145,147]. These researchers report that an explicit back-calculation of the scattering curve during the refinement procedure would be computationally intractable within a reasonable timeframe. Their solution to this problem incorporates a numerical reduction of the density of points on the scattering curve. Independently, Sattler et al. have reported a similar procedure that adopts a polynomial approximation of the scattering curve [146]. An important aspect to bear in mind is that on their own the SAXS data provide a low resolution envelope that captures the time-averaged electron density of the structure. If the target structure is a comparatively featureless (near isotropic) object then the added value of the SAXS data will also be relatively limited. Nevertheless the notable success of the relatively few reports of combined SAXS/NMR refinement of macromolecular structure, together with increased ease of access to high intensity X-ray sources, presages the likely expansion of this type of activity in the future as NMR spectroscopists increasingly target larger, potentially more dynamic, macromolecular assemblies.
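The link between the scattering curve and the distribution of interatomic distances noted in this section can be illustrated with the Debye formula, I(q) = Σi Σj wi wj sin(q rij)/(q rij). The sketch below is a deliberately crude assumption-laden toy: it treats the molecule as a handful of identical point scatterers and ignores atomic form factors, solvent contrast and the hydration layer. It is not the back-calculation scheme used in [145] or [146], and all names and coordinates are hypothetical.

```python
import math

def debye_intensity(coords, q_values, weights=None):
    """Scattering intensity I(q) of point scatterers via the Debye formula.

    coords: list of (x, y, z) positions; q_values: momentum transfers in the
    reciprocal of the coordinate unit; weights: per-scatterer amplitudes
    (default 1.0 each, i.e. identical scatterers).
    """
    n = len(coords)
    if weights is None:
        weights = [1.0] * n
    # Pre-compute each unique pair distance once; it is reused for every q.
    pairs = [(weights[i] * weights[j], math.dist(coords[i], coords[j]))
             for i in range(n) for j in range(i + 1, n)]
    self_term = sum(w * w for w in weights)  # i == j terms, sin(x)/x -> 1
    curve = []
    for q in q_values:
        total = self_term
        for ww, r in pairs:
            x = q * r
            total += 2.0 * ww * (math.sin(x) / x if x > 1e-12 else 1.0)
        curve.append(total)
    return curve

# Two toy four-bead 'complexes' (angstroms): compact versus extended.
compact = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (10, 10, 0)]
extended = [(0, 0, 0), (10, 0, 0), (20, 0, 0), (30, 0, 0)]
q_grid = [0.0, 0.05, 0.10, 0.20]  # inverse angstroms

for name, model in (("compact", compact), ("extended", extended)):
    values = ", ".join(f"{v:6.2f}" for v in debye_intensity(model, q_grid))
    print(f"{name:9s} I(q) = {values}")
```

Both arrangements give the same intensity at q = 0, since this depends only on the total scattering mass, but the curves diverge at higher q, which is why the profile can discriminate between models of different overall shape. The double sum also shows why explicit back-calculation becomes expensive for large systems: the number of pair terms grows with the square of the number of scatterers.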
8.5
Literature Examples
The scope of the application of NMR spectroscopy to the examination of macromolecular complexes is enormous. The following examples represent only a skimming of the surface, chosen to highlight some of the methodologies described above.
8.5.1
Protein–Protein Interactions
As mentioned previously, one of the earliest examples of the application of NMR to reveal the 3D structure of the complex formed between two globular proteins was that of the N-terminal domain of Enzyme I (EIN) and the histidine-containing phosphocarrier protein (HPr), components of the bacterial phosphoenolpyruvate:sugar phosphotransferase system [59]. This complex forms in fast exchange on the chemical shift timescale with a Kd of about 7 µM. The rotational correlation time at 40 °C was 15.5 ns with a rotational diffusion anisotropy of 1.7. Structure determination of the complex was conducted in the ‘classical’ mode, relying on the essentially exhaustive detection of unambiguously assigned intra- and intermolecular NOE-derived distance restraints and a small number of RDCs. This tour de force involved the utilisation of a very large number of NMR samples and thirty 3D and eight 4D NMR datasets which, with additional 2D NMR experiments, amounted to 3500 hours of spectrometer time. RDCs were obtained for 244 NH bond vectors in the nematic phase of a colloidal suspension of fd bacteriophage. Of the 5474 interproton distance restraints derived from the various NOESY spectra, only 110 describe intermolecular NOE contacts. So, whilst the resulting high resolution complex structure (Figure 8.6) represents a truly impressive result, not least because the 40 kDa molecular weight was (and perhaps still is) at the edge of what is ordinarily possible for NMR applications, it quickly became clear
Figure 8.6 Top: best-fit superposition of the backbone atoms of the 40 simulated annealing structures of the N-terminal domain of enzyme I (EIN) complexed to the histidine-containing phosphocarrier protein (HPr). Bottom: ribbon diagrams illustrating two views of the 40 kDa EIN-HPr complex. HPr is shown in green, the α-domain of EIN in red, and the α/β-domain and C-terminal helix of EIN in blue. Also shown in gold are the side-chains of active site histidine residues of both EIN and HPr. Image taken from [59]. Please refer to the colour plate section
that NMR spectroscopists should strive to find more efficient means to arrive at biologically useful results. A more typical example of the investigation of complex formation between globular protein domains is provided by the interaction of insulin-like growth factor 2 (IGF2) with domain 11 of the IGF2 receptor/mannose-6-phosphate receptor (IGF2R) reported by Crump and co-workers [148]. IGF2R mediates trafficking of mannose-6-phosphate (M6P)-containing proteins and the mitogenic hormone IGF2. Mutation of IGF2R is often associated with human carcinogenesis, suggesting that it acts as a tumour suppressor. IGF2 interacts with domain 11 (of 15) of the extracellular region of IGF2R (IGF2R-D11). Both the X-ray crystal structure of IGF2R-D11 and the solution structure of IGF2 had previously been reported, but efforts to visualise the structure of the complex between the two were hampered by the poor tractability of IGF2. After developing a translational fusion system to enhance the expression yield of properly folded, biologically active IGF2 and having established de novo the chemical shift assignments and 3D solution structure of IGF2R-D11, these researchers exploited chemical shift mapping and site-directed mutagenesis data to perform data-driven protein–protein docking using HADDOCK. The investigation was complicated by the fact that IGF2 is difficult to work with at pH > 3.0 and at millimolar concentrations, due to a tendency to self-associate, and IGF2R-D11 tends to precipitate at pH values below 4.5. As a result the samples of the complex were prepared at low concentration by dissolving lyophilised unlabelled IGF2 in a solution of 15N-labelled IGF2R-D11. The pH was adjusted to 5.5 prior to NMR. Where possible, shifted peaks were assigned using 3D 15N TOCSY-HSQC and 15N NOESY-HSQC spectra.
The small number of cross-peaks that could not be assigned unambiguously, but were still close enough to the corresponding peak in free IGF2R-D11 to be assignable, were classified as either ‘intermediate’ (Δδ < 0.05 ppm) or ‘large’ (Δδ > 0.05 ppm) (Figure 8.7). Those peaks that broadened beyond detection or could not be assigned by comparison to the unbound spectrum were classified as ‘disappeared’ (cf. Figure 8.2d). When assessed in terms of the 3D structure of IGF2R-D11 the highest concentration of perturbed residues corresponds to a patch that encompasses Ile1572, mutation of which was previously known to abolish IGF2 binding to the receptor [149]. A number of additional residues that showed chemical shift changes are scattered across the protein structure or buried within the protein interior. Such shifts were presumed to arise from subtle structural rearrangements induced upon binding or from secondary effects, such as allostery or changes in mobility upon binding. For the HADDOCK calculations ‘active’ residues in IGF2R-D11 were defined as having a chemical shift perturbation upon complex formation greater than 0.05 ppm with an average relative solvent accessibility higher than 50 %. All amino acids neighbouring the active residues with a high solvent accessibility (>50 %) were defined as ‘passive’. Two separate definitions of the active residues in IGF2 were adopted. In one instance (Case 1) all of the existing mutagenesis data available in the database on IGF2 binding to IGF2R were used to define the active residue set. In the other instance (Case 2) a more selective definition incorporating only IGF2 residues more recently shown to be critical for binding to IGF2R domain 11 was used. The AIR effective distances were defined with an upper limit of 2 Å. A number of regions for each protein, corresponding to loops with demonstrably elevated internal motion evidenced by relatively low {1H}-15N heteronuclear NOE values, were declared as ‘flexible’.
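The active/passive selection rule described for IGF2R-D11 is simple enough to express as a short sketch. The thresholds (0.05 ppm, 50 %) follow the text; everything else is an assumption made for illustration: the function name, the dictionary-based inputs, the way surface neighbours are supplied, and the residue numbers and values, which are invented and are not data from [148].

```python
def classify_residues(csp, rsa, neighbours, csp_cutoff=0.05, rsa_cutoff=50.0):
    """Split residues into 'active' and 'passive' sets for AIR definition.

    csp: residue number -> chemical shift perturbation (ppm).
    rsa: residue number -> relative solvent accessibility (%).
    neighbours: residue number -> residues adjacent to it on the surface
                (how adjacency is determined is left to the user).
    """
    # Active: strongly perturbed AND solvent exposed.
    active = {res for res, shift in csp.items()
              if shift > csp_cutoff and rsa.get(res, 0.0) > rsa_cutoff}
    # Passive: exposed surface neighbours of active residues.
    passive = set()
    for res in active:
        for nb in neighbours.get(res, ()):
            if nb not in active and rsa.get(nb, 0.0) > rsa_cutoff:
                passive.add(nb)
    return active, passive

# Invented example values (hypothetical residue numbers 1-5).
csp = {1: 0.08, 2: 0.02, 3: 0.09, 4: 0.01}
rsa = {1: 60.0, 2: 70.0, 3: 20.0, 4: 80.0, 5: 30.0}
neighbours = {1: [2, 5], 3: [4]}

active, passive = classify_residues(csp, rsa, neighbours)
print("active:", sorted(active), "passive:", sorted(passive))
```

In this toy input residue 3 is strongly perturbed but buried, so it is excluded from the active set and its neighbour is not promoted to passive. This mirrors the treatment in the text of shifted residues in the protein interior, whose perturbations were attributed to secondary effects rather than direct contact.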
The resulting models of the complex were clustered using a 2.5 Å
Figure 8.7 NMR data for the titration of insulin-like growth factor-2 (IGF2) receptor (IGF2R) domain 11 with IGF2 as reported by Williams et al. [148]. Left: 2D 15N,1H-correlation spectra of IGF2R domain 11 in the absence (black) and presence (red) of IGF2. The insert panel shows an expanded view of the boxed region. Right: the pattern of IGF2-binding-dependent chemical shift perturbations for IGF2R domain 11 mapped on a molecular surface representation. Residues with shift perturbations >0.05 ppm are red, and residues with shift perturbations <0.05 ppm are orange. Blue indicates NH resonances that broaden and disappear (i.e. are ‘bleached’) upon IGF2 binding (cf. Figure 8.2d); grey indicates little or no change in chemical shift upon binding. Image taken from [148]. Please refer to the colour plate section
cut-off based on the pairwise backbone RMSD matrix after superposition on the backbone atoms of IGF2R-D11. Interestingly the HADDOCK calculations for Case 1 did not converge in a useful way, yielding nine clusters representing two major topologies for the complex that failed to show interactions between the full set of residues included for IGF2. On the other hand the Case 2 calculations yielded two highly populated and essentially equi-energetic clusters differing only in a small (20°) rotation in the relative orientation of the binding partners. In both clusters the IGF2 wraps around the IGF2 binding site of IGF2R-D11, forming an extensive binding interface of up to 1775 Å² (Figure 8.8). The models predict that IGF2 residues Thr16, Phe19, Asp52, Leu53, Ala54, and Glu57 fit into the IGF2R-D11 binding groove and form extensive contacts with the three loops which are flexible in the apo-state. The models suggest that hydrophobic interactions most likely drive the binding of the two proteins with an additional ‘O’-ring of electrostatic interactions stabilising the final conformation. This example of the application of data-driven docking based upon chemical shift perturbation and mutagenesis data illustrates that 3D models of protein–protein complexes can be obtained in a more straightforward manner (at least compared to the exhaustive NOE-based approach), both in terms of the number of labelled samples and the amount of NMR instrument time required. The resulting models are often of sufficiently good ‘resolution’ to guide more detailed biochemical or structural investigations. However it should be borne in mind that use of HADDOCK or similar approaches is to a greater or lesser extent, depending upon the amount of experimental data to hand, essentially a modelling exercise
Figure 8.8 Structural models of the IGF2R-D11-IGF2 complex generated using HADDOCK. The two lowest energy structures in each of the candidate clusters are shown, with IGF2R-D11 depicted in surface mode, the IGF2 backbone in ribbon mode and selected side-chains as sticks. The core of the IGF2 binding site is coloured blue, and the side-chain of E1544, which is known to negatively regulate IGF2 binding, is drawn in red. The orientation of IGF2 differs by approximately 20° between the two models. Image taken from [148]. Please refer to the colour plate section
and to a degree, as in this example, obtaining reasonable convergence of the docking algorithm can depend upon the subjective interpretation of the experimental data in terms of the AIRs. In this particular case, a more recent X-ray diffraction study of IGF2 complexed to larger fragments of IGF2R provided a structure that is broadly consistent, though not identical, with the NMR-based model [150].
8.5.2
Protein–DNA Interactions
In general, applications of NMR to investigate the structure of complexes formed between proteins and DNA are confronted by the difficulty of obtaining isotope-labelled samples of the target DNA. Although methods to synthesise isotope-labelled DNA have been devised, these are likely to be rather laborious or expensive. Coupled with the fact that the conformational variability of DNA is relatively limited, NMR investigations in this field have focused upon making the best of samples in which 13C,15N-isotope-labelled protein is titrated or mixed with unlabelled DNA oligonucleotides. Often, to guard against uncontrolled precipitation, such mixing is performed in relatively dilute solution, followed by a concentration step before NMR spectroscopy. A noteworthy example of NMR applied to protein–DNA complexes is provided by the high mobility group (HMG) protein domain, a subclass of which derives from transcription factors and binds DNA in a sequence specific manner. Prior to structural studies it was anticipated from the results of biochemical experiments that HMG domains bind mainly to the minor groove of the DNA, and induce significant deformation of the B-form double-stranded DNA target. Initial reports of the solution structures of the lymphoid enhancer factor-1 (LEF-1) HMG domain bound to a 15-base pair (bp) DNA duplex [151] and the human testis-determining gene SRY (for sex-determining region Y) HMG domain bound to
an 8 bp DNA target [152] relied mainly upon the analysis of isotope-edited/filtered NOESY spectra to derive intra- and intermolecular interatomic distance restraints. Chemical shift assignments for the DNA component as well as intra-DNA 1H-NOEs were obtained using 12C,14N-filtered TOCSY and NOESY pulse sequences respectively. NOEs involving protons of the protein were obtained from 3D and 4D isotope-separated/filtered NOESY experiments in both H2O and D2O solutions. The details of the structure calculations differ between the two examples. For the SRY-complex the DNA was loosely constrained by torsion angle restraints that encompass both A- and B-form DNA; for the LEF-1-complex constraints that enforce a B-form DNA conformation were systematically relaxed to allow the experimental NOE data to determine the outcome. In each case the structures revealed that the HMG domain forms a twisted L-shape that presents a concave surface comprising three α-helices and the N- and C-terminal strands to the widened minor groove of the DNA. In the LEF-1 complex a C-terminal extension of the HMG polypeptide is directed to the compressed major groove. The DNA itself is helically unwound relative to classical A- or B-form DNA with an overall bend of the order of 80° (SRY) or 120° (LEF-1). In each case a hydrophobic HMG domain side-chain (Met, Ile) is partially intercalated between DNA base pairs in the centre of the specific recognition sequence, which otherwise maintain standard inter-DNA-strand Watson-Crick H-bonds. Since the DNA target duplexes populate B-form DNA conformations in isolation, it is argued that the structural distortion engendered by interaction with the HMG-domain proteins represents an example of classical ‘induced-fit’.
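As a rough illustration of how NOESY cross-peak intensities are turned into distance restraints of the kind described above, the sketch below applies the isolated spin-pair approximation (intensity proportional to r^-6) against a reference cross-peak of known distance, and then assigns a conservative upper bound. The function names, the reference values and the bin limits are illustrative assumptions, not values taken from the LEF-1 or SRY studies.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate an interproton distance (in Angstrom) from a NOE intensity.

    Assumes the isolated spin-pair approximation (intensity ~ r**-6) and a
    reference cross-peak whose distance is known.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Hypothetical upper-bound classes (Angstrom); each study chooses its own limits.
UPPER_BOUNDS = ((2.7, "strong"), (3.5, "medium"), (5.0, "weak"))


def restraint_class(distance, bounds=UPPER_BOUNDS):
    """Return (upper_bound, label) for the tightest class containing distance."""
    for upper, label in bounds:
        if distance <= upper:
            return upper, label
    return None  # too long to be converted into a restraint


# Illustrative numbers only: a cross-peak 64 times weaker than a 2.0 A reference.
d = noe_distance(intensity=1.0, ref_intensity=64.0, ref_distance=2.0)
print(round(d, 2), restraint_class(d))
```

Spin diffusion makes such estimates unreliable at long mixing times, which is one reason why NOE restraints are usually applied as loose ranges rather than as exact distances.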
The SRY-DNA complex has more recently been re-examined, partly by re-design of the protein and DNA constructs (both of which have extra residues included to take into account new structural and biochemical data) but also with the adoption of more contemporary NMR methodologies [153]. This more highly refined structure provided a platform for investigation of a naturally occurring mutation in SRY of a methionine residue that participates with the intercalating Ile in a hydrophobic wedge that forces the characteristic HMG-dependent DNA bend. Circular permutation gel shift assays suggest that this Met → Ile SRY HMG variant bends DNA recognition sites approximately 20° less than the wild-type protein, with a significant reduction in binding affinity. In this study, which utilised a 14mer DNA duplex, NMR spectroscopy was performed on samples containing 1:1 complexes of 15N-SRY HMG and unlabelled DNA, 15N/13C-SRY HMG and unlabelled DNA, and 15N-SRY HMG and 15N/13C-DNA. In addition to NOE measurements, a large number of 3-bond scalar couplings were measured, including 3J(H3′,P) couplings, which are related to the DNA ε (C4′-C3′-O3′-P) sugar-phosphate backbone torsion angles, obtained using a 12C-filtered constant time 1H-1H{31P} COSY difference experiment [154]. Importantly, several classes of RDC measurements were made for samples of the complex prepared with mixed lipid bicelles. These included heteronuclear 1D(NH) and 1D(CH) splittings for both protein and DNA, 1D(NC′) and 2D(HNC′) couplings for protein only, and homonuclear D(HH) splittings for the DNA. The latter were encoded as approximate restraints grouped into ranges corresponding to strong, medium and weak intensity cross-peaks in a 2D 12C-filtered COSY spectrum recorded on a sample comprising 15N/13C-labelled protein and unlabelled DNA in bicelles. The RDC data provided long-range orientational information, critical for enhancing the precision and accuracy of the structures.
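The orientational content of an RDC can be illustrated with the widely used expression for a dipolar coupling in the principal axis frame of the alignment tensor, D(θ, φ) = Da[(3cos²θ − 1) + (3/2)R sin²θ cos2φ], where θ and φ are the polar angles of the internuclear vector, Da is the axial component of the tensor and R its rhombicity. The following is a minimal sketch of that relation; the function name and the Da and R values are hypothetical and are not taken from the SRY study.

```python
import math


def rdc(theta_deg, phi_deg, da_hz, rhombicity):
    """Dipolar coupling (Hz) for an internuclear vector with polar angles
    (theta, phi) in the principal axis frame of the alignment tensor."""
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    axial = 3.0 * math.cos(theta) ** 2 - 1.0
    rhombic = 1.5 * rhombicity * math.sin(theta) ** 2 * math.cos(2.0 * phi)
    return da_hz * (axial + rhombic)


# Hypothetical tensor (Da = 10 Hz, R = 0.3), evaluated for three orientations.
for theta in (0.0, 54.7, 90.0):
    print(theta, round(rdc(theta, 0.0, 10.0, 0.3), 2))
```

Because the coupling depends on the orientation of each vector relative to a common frame rather than on a distance, vectors in the protein and in the DNA are restrained relative to one another even when they are far apart.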
Each SRY HMG-DNA complex was solved using 2700 experimental NMR restraints, including 167 intermolecular interproton distance restraints and more than 350 RDCs. The high resolution of the resulting structures allowed the authors to conclude that whilst the general characteristics of the interaction of the wild-type and mutant SRY HMG proteins with the DNA target are conserved at the local level, the overall bend angle for the 14 bp DNA in the wild-type complex is significantly greater (by 13°) than that for the Met → Ile mutant. The precision of the structures is so high as to permit speculation that the removal of the putative hydrogen bond formed between an Arg side-chain guanidinium group and the Sδ atom of the Met side-chain, coupled with the ‘shorter’ Ile side-chain replacement which makes suboptimal packing with the sugar moiety of an adenine base, is the basis for the difference in observed dissociation constants and overall DNA bend angles.
8.5.3 Protein–RNA Interaction
8.5.3.1 Protein–dsRNA
Unlike the case for DNA, which is typified by almost universal adoption of a double-stranded polynucleotide structure, RNA can function in either double- or single-stranded form. This extra structural ‘freedom’ for RNA is conveniently matched by the greater capacity (compared to DNA) to obtain isotope-labelled material for NMR studies. With DNA such labelling is usually not economically viable, but assumptions about the overall structure can reasonably be made, thereby facilitating model building. In contrast, the fact that RNA can be isotope-labelled in a relatively facile manner [19,20,158] means that direct assessment of the conformation of RNA by heteronuclear NMR is feasible, and is arguably the norm in published accounts of RNA solution structure [27]. An impressive contemporary example of the investigation of the 3D structure of a complex between a protein and a double-stranded RNA (dsRNA) is provided by the work of Feigon and colleagues on the recognition of the AGNN tetraloop RNA fold by the double-stranded RNA-binding domain (dsRBD) of Rnt1p RNase III [155]. Rnt1p is a member of the RNase III family of dsRNA endonucleases and a key component of the Saccharomyces cerevisiae RNA-processing machinery. The dsRBD of Rnt1p has been implicated in substrate-targeting of the endonuclease through recognition of dsRNA hairpins closed by AGNN tetraloops. The NMR investigation focused on the dsRBD interaction with the 5′-terminal hairpin from the small nucleolar RNA substrate of Rnt1p known as the snR47 precursor. NMR samples of the complex comprised a 90-residue dsRBD with a 32-nucleotide RNA. Samples of isotope-labelled RNA were prepared from DNA templates using T7 RNA polymerase [156,157] and 13C,15N-labelled nucleotide triphosphates (NTPs) [158,159].
During titration of the labelled dsRBD with unlabelled dsRNA, and vice versa, formation of a protein-RNA complex was observed with fast-exchange characteristics and specific chemical shift perturbations to subsets of protein and RNA resonances. Resonance assignment and the identification of intermolecular interproton distance restraints were based upon a differential isotope labelling strategy. Protein resonances in the complex were assigned from a standard set of triple resonance NMR spectra recorded on 13C,15N-labelled Rnt1p dsRBD/unlabelled snR47h samples. The spectrum of the snR47h RNA in the
Figure 8.9 A selected region of the F2 13C-filtered NOESY spectrum [161] acquired for the 13C,15N-isotope-labelled Rnt1p RNase III dsRBD bound to unlabelled snR47h RNA in D2O solution, as reported by Wu et al. [155]. The assignments of the intermolecular NOE contacts are indicated: RNA chemical shifts at top, protein side-chain chemical shifts to the right. Image taken from [155].
complex was assigned from 2D NOESY, TOCSY, and 1H-13C HSQC and 3D HCCH-TOCSY and HCCH-COSY spectra on unlabelled and labelled samples respectively [159,160]. 2D NOESY, 3D 13C-NOESY-HMQC and 3D 15N-NOESY-HSQC experiments were used to obtain NOEs for distance restraints. Additional RNA and all intermolecular NOEs were assigned using a suite of four 2D isotope-filtered/edited NOESY experiments [161] applied to four samples of unlabelled Rnt1p dsRBD complexed with RNA synthesised with specific (i.e. separate) incorporation of 13C,15N-A, -U, -C, or -G, as well as a sample of 13C,15N-labelled Rnt1p dsRBD bound to unlabelled RNA (Figure 8.9). The selective incorporation of double-isotope labelled bases in the RNA component represents a general strategy to aid the assignment and NOE identification for large RNA molecules. RDCs were measured from apparent 1J(HN) differences in F1-coupled HSQC spectra in the presence and absence of C12E6/hexanol [92]. The structure of the complex was calculated using XPLOR-NIH [162] with standard simulated annealing protocols starting with 100 templates generated from extended protein and RNA chains with randomised orientations. The protein and RNA were initially separated by 70 Å and were folded simultaneously during 80 ps of high-temperature dynamics followed by 75 ps of slow cooling. The structures were then annealed in a second cooling phase during which the RDC restraints were introduced. Hydrogen-bond restraints were used for the 80 amino acids that possessed slowly exchanging amide protons and for
Figure 8.10 NMR-derived structure of the complex formed between the Rnt1p RNase III dsRBD protein and snR47h AGNN tetraloop hairpin RNA determined by Wu et al. [155]. Left: best-fit superposition of the 15 lowest energy NMR models with the protein shown in blue and RNA in green. Right: schematic representation of the lowest energy structure in the bundle with the RNA helical backbone indicated by a thin blue cylinder, the RNA atoms shown in stick form and the protein as ribbons with residues populating the protein-RNA interface shown as ball and sticks. Image adapted from [155]. Please refer to the colour plate section.
the 14 Watson-Crick base pairs. Restraints for the ribose sugar conformation and χ angles were assessed [159] and α, β, γ, and ε torsion angle restraints were included for nucleotides with NOE patterns consistent with A-form geometry. The resulting conformer bundle (Figure 8.10) reveals that Rnt1p dsRBD adopts the αβββα-fold topology characteristic of dsRBD structures but with the addition of a third α-helix at the C-terminus. The conserved AGNN tetraloop conformation, which had previously been investigated in isolation [159], was retained in the complex. The dsRBD contacts the RNA at successive minor, major, and tetraloop minor grooves on one side of the dsRNA helix. The structure unexpectedly showed that neither the universally conserved tetraloop G nor the highly conserved A is recognised by specific hydrogen bonds to the bases. Instead the dsRBD N-terminal helix fits snugly into the minor groove of the RNA tetraloop and top of the stem, interacting in a non-sequence-specific manner with the sugar-phosphate backbone and the two nonconserved tetraloop bases. This structure was used to design further mutagenesis experiments to probe the role of the dsRBD amino acid residues that contact the tetraloop region in the context of the intact Rnt1p protein in vivo, revealing them to be functionally important for RNA processing.
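The fast-exchange titration behaviour noted for the dsRBD-RNA complex can be illustrated with the standard two-state model, in which the observed chemical shift is the population-weighted average of the free and bound shifts and the bound fraction follows from the 1:1 binding equilibrium. The sketch below is a generic illustration; the function names, concentrations, shifts and dissociation constant are hypothetical and are not data from the Rnt1p study.

```python
import math


def fraction_bound(p_total, l_total, kd):
    """Fraction of protein bound in a 1:1 complex.

    All three arguments must be in the same concentration units.
    """
    b = p_total + l_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0
    return complex_conc / p_total


def observed_shift(delta_free, delta_bound, p_total, l_total, kd):
    """Population-weighted chemical shift under fast exchange (ppm)."""
    fb = fraction_bound(p_total, l_total, kd)
    return delta_free + fb * (delta_bound - delta_free)


# Hypothetical titration: 100 uM protein, Kd = 100 uM, amide 1H shift
# moving from 8.00 ppm (free) towards 8.50 ppm (bound).
for ligand in (0.0, 50.0, 100.0, 200.0, 400.0):
    print(ligand, round(observed_shift(8.00, 8.50, 100.0, ligand, 100.0), 3))
```

Fitting this expression to the shifts of several well-resolved resonances is a common way to estimate a dissociation constant, provided that the exchange really is fast on the chemical shift timescale.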
8.5.3.2 Protein–ssRNA
A second example of the investigation of protein-RNA complex structure is provided by the work of the Summers group on the combination of the 82-nucleotide segment of the
Figure 8.11 Representations of the molecular components of the structure of the complex between Rous sarcoma virus (RSV) μΨ packaging signal RNA and Zn-binding RSV nucleocapsid (NC) protein reported by Zhou et al. [164]. The predicted secondary structure elements of the RNA and the coordination of the Zn atoms are shown. Nonnative nucleotides used to enable in vitro transcription and protease cleavage of the expressed fusion protein are depicted in red and grey respectively. Image taken from [164]. Please refer to the colour plate section.
5′-untranslated region (5′-UTR) of the Rous sarcoma virus (RSV) retroviral genome (denoted μΨ) with the cognate 89-residue nucleocapsid protein NC [163,164] (Figure 8.11). This region of the RSV RNA genome is sufficient to support RNA packaging during virus assembly. In the absence of a binding partner μΨ yielded essentially intractable NMR spectra, with broad lines suggesting exchange between multiple conformations. High quality NMR spectra were obtained in the presence of the NC protein, which contains two ‘zinc-knuckle’ domains in which each of the Zn atoms is coordinated by the side-chains of a His and three Cys residues; this is consistent with the low-nanomolar dissociation constant established by isothermal titration calorimetry.
The 3D structure of the NC:μΨ complex was obtained using a combination of isotope-edited NMR experiments coupled with an isotope-labelling strategy that exploited the capability to obtain specifically protonated, 2H-labelled μΨ RNA. As before, the RNA was prepared by in vitro transcription using phage T7 RNA polymerase with purification by denaturing gel electrophoresis. Selective protonation for a given base was achieved by combination of three 85 %-deuterated NTPs (available commercially) with the respective protonated NTP in the synthesis reaction. The strategy of specific protonation means that the 2D 1H NMR spectra of the protein-RNA complex are greatly simplified, allowing sequential assignment of the RNA component through the analysis of the patterns of 1H NOEs connecting neighbouring bases. Comparison of the μΨ NMR spectra obtained for separate samples in which each of the four bases was selectively protonated yielded nearly complete 1H resonance assignments for the RNA component of the complex. The secondary structure of μΨ RNA was predicted to contain three stem-loops (SL-A, -B and -C) sprouting from a main double-stranded stem denoted O3 (Figure 8.11). NMR studies of chemically synthesised oligoribonucleotide hairpins corresponding to SL-A, SL-B and O3 yielded spectra that were consistent with the spectrum of μΨ within the NC-complex. In addition the pattern of intra-RNA NOEs suggests that each of these three SL elements takes up an A-form double-helix conformation. The 15N,13C-labelled RSV NC protein component was studied both in isolation and in the complex with μΨ RNA using standard triple resonance NMR methods. Although several of the NC resonances exhibited large RNA-dependent perturbations, the overall pattern of the chemical shifts and intramolecular NOEs is consistent with maintenance of the tandem Zn-knuckle conformation in the complex.
Notably, the absence of interknuckle NOEs suggested that the two structural elements are formed independently. The 3D structure of the NC:μΨ complex was calculated with the torsion angle space dynamics program CYANA [165,166] (http://www.cyana.org) starting from randomised conformations of the protein and RNA components (see also Chapter 5 for more details of the CYANA program). Intermolecular NOEs were obtained by combined analysis of 3D 1H,15N HSQC-NOESY, 3D 1H,13C HMQC-NOESY and 4D 15N,13C-edited NOESY data of samples containing isotope-labelled NC protein along with 2D NOESY spectra for AH-μΨ, UH-μΨ, GH-μΨ, CH-μΨ (containing protonated A, U, G and C, respectively, with the remaining nucleotides perdeuterated) or fully protonated μΨ bound to unlabelled NC, respectively (Figure 8.12). Additional constraints were employed to limit the approach of μΨ phosphate atoms across the major grooves of the A-helical regions to greater than 8 Å, and to keep all other long-range P(i)-P(i + 4) distances further than 6 Å. The final structures were refined with a total of 28 restraints per residue, including 31 intermolecular distance restraints (Figure 8.13). At a local level, the O3 helix, SL-C and the stems of SL-A and SL-B are relatively well defined by the NMR data. The SL-A and SL-B loop residues are poorly defined due to the absence of restraints in these regions. Sequential internucleotide NOEs between two guanosines indicate that SL-B and SL-C form a continuous stack. Moreover the combination of experimental NMR restraints and van der Waals terms means that the relative position and orientation of the NC Zn-knuckles, O3, SL-B and SL-C, but not SL-A, are well defined. The hairpin tetraloop of SL-C interacts with the N-terminal Zn-knuckle of NC, with the exposed first guanosine of the UGCG sequence bound in a hydrophobic pocket
Figure 8.12 Top: two sample regions of the 3D 13C-edited NOESY-HMQC spectrum recorded for double 13C,15N isotope-labelled Rous sarcoma virus (RSV) nucleocapsid (NC) protein bound to unlabelled RSV μΨ packaging signal RNA investigated by Zhou et al. [164]. The cross-peaks correspond to intermolecular NOE contacts associated with residues Arg16 and Ala32 of the NC N-terminal Zn-knuckle, respectively. Bottom: Overlay of the 2D 1H-NOESY spectra obtained for specifically protonated GH-μΨ (black) and UH-μΨ (red) bound to NC showing intermolecular NOE cross-peaks connecting the stem-loop C (SL-C) tetraloop RNA residues U217, G218 and G220 to the NC N-terminal Zn-knuckle residues Tyr22 and Tyr30. Image taken from [164]. Please refer to the colour plate section.
reminiscent of NC:RNA complexes from divergent retroviruses. The C-terminal Zn-knuckle interacts with μΨ in a manner not observed before, binding to two adenosine bases from the O3-SL-A and SL-A/SL-B linker segments. The study authors suggest that this observation is consistent with the fact that chimeric viruses containing swapped NC domains are only sometimes capable of packaging the genome from which the NC coding sequence was derived. Overall the structural model of the RSV NC:μΨ complex obtained by NMR is consistent with the abundant biochemical data that describe the effect of mutagenesis of the 5′-UTR on genome packaging for this virus, and provides a platform for further experimental interrogation of this process. In the broader context of NMR investigation of protein-RNA (and indeed RNA-RNA [167] and RNA-DNA [168]) complexes, the divide-and-conquer strategy adopted here, in which RNA secondary structure prediction is exploited to derive stable substructures of larger polynucleotides, is likely to prove a powerful experimental paradigm. Such an approach immediately lends itself to model building based upon sparse NMR restraint data (NOE, RDC, PRE) used in combination with SAXS analysis.
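The phosphate-phosphate lower-bound constraints used in the NC:μΨ structure calculation (distances between P(i) and P(i + 4) kept above 6 Å) can be sketched generically as a list of restraints plus a simple violation check. This is an illustration only: the function names, the tuple representation and the toy coordinates are hypothetical, and the code does not reproduce the CYANA input format or the authors' actual restraint lists.

```python
import math


def phosphate_lower_bounds(n_nucleotides, offset=4, lower=6.0, first=1):
    """List (i, j, lower_bound) restraints for all P(i)-P(i + offset) pairs."""
    last = first + n_nucleotides - 1
    return [(i, i + offset, lower) for i in range(first, last - offset + 1)]


def violations(coords, restraints):
    """Return the restraints whose P-P distance falls below the lower bound.

    coords maps nucleotide number to an (x, y, z) phosphorus position in
    Angstrom.
    """
    return [(i, j, lower) for i, j, lower in restraints
            if math.dist(coords[i], coords[j]) < lower]


# Toy example: six phosphorus atoms spaced 1.2 A apart along x, so every
# P(i)-P(i + 4) pair is only 4.8 A apart and violates the 6 A bound.
restraints = phosphate_lower_bounds(6)
toy_coords = {n: (1.2 * n, 0.0, 0.0) for n in range(1, 7)}
print(restraints)
print(violations(toy_coords, restraints))
```

Repulsive restraints of this kind carry no experimental information of their own; they simply prevent under-restrained regions of the RNA from collapsing during the calculation.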
Figure 8.13 (a) Rendering of the 20 NMR-derived structures of the NC:μΨ complex showing the relative convergence of the secondary structure elements, obtained by best-fit superposition of the SL-C stem carbon atoms. The result shows that the relative positions of SL-B (green), SL-C (brown), O3 (red), the linkers (orange), and the NC Zn-knuckles (blue) are well defined by the NMR data, but the position of SL-A (purple) is not; (b) and (c) show two different stereo views of a representative structure, showing the relative positions of the NC and μΨ secondary structure elements. Image taken from [164]. Please refer to the colour plate section.
References
1. Clore, G.M., Tang, C. et al. (2007) Elucidating transient macromolecular interactions using paramagnetic relaxation enhancement. Curr. Opin. Struct. Biol., 17(5), 603–616.
2. Korzhnev, D.M., Bezsonova, I. et al. (2009) Alternate binding modes for a ubiquitin-SH3 domain interaction studied by NMR spectroscopy. J. Mol. Biol., 386(2), 391–405.
3. Clore, G.M. and Gronenborn, A.M. (1998) Determining the structures of large proteins and protein complexes by NMR. Trends Biotechnol., 16(1), 22–34.
4. Clore, G.M. and Gronenborn, A.M. (1998) NMR structure determination of proteins and protein complexes larger than 20 kDa. Curr. Opin. Chem. Biol., 2(5), 564–570.
5. Allen, M., Varani, L. et al. (2001) Nuclear magnetic resonance methods to study structure and dynamics of RNA-protein complexes. Meth. Enzymol., 339, 357–376.
6. Dötsch, V. (2001) Protein-DNA interactions. Meth. Enzymol., 339, 343–357.
7. Qin, J., Vinogradova, O. et al. (2001) Protein-protein interactions probed by nuclear magnetic resonance spectroscopy. Meth. Enzymol., 339, 377–389.
8. Walters, K.J., Ferentz, A.E. et al. (2001) Characterizing protein-protein complexes and oligomers by nuclear magnetic resonance spectroscopy. Meth. Enzymol., 339, 238–258.
9. Nietlispach, D., Mott, H.R. et al. (2004) Structure determination of protein complexes by NMR. Methods Mol. Biol., 278, 255–288.
10. Bonvin, A.M., Boelens, R. et al. (2005) NMR analysis of protein interactions. Curr. Opin. Chem. Biol., 9(5), 501–508.
11. Takeuchi, K. and Wagner, G. (2006) NMR studies of protein interactions. Curr. Opin. Struct. Biol., 16(1), 109–117.
12. Foster, M.P., McElroy, C.A. et al. (2007) Solution NMR of large molecules and assemblies. Biochemistry, 46(2), 331–340.
13. Muchmore, D.C., McIntosh, L.P. et al. (1989) Expression and N-15 labeling of proteins for proton and N-15 nuclear-magnetic-resonance. Meth. Enzymol., 177, 44–73.
14. McIntosh, L.P. and Dahlquist, F.W. (1990) Biosynthetic incorporation of N-15 and C-13 for assignment and interpretation of nuclear-magnetic-resonance spectra of proteins. Q. Rev. Biophys., 23(1), 1–38.
15. Lemaster, D.M. (1994) Isotope labeling in solution protein assignment and structural-analysis. Prog. Nucl. Mag. Res. Sp., 26, 371–419.
16. Lian, L.Y. and Middleton, D.A. (2001) Labelling approaches for protein structural studies by solution-state and solid-state NMR. Prog. Nucl. Mag. Res. Sp., 39(3), 171–190.
17. Tyler, R.C., Sreenath, H.K. et al. (2005) Auto-induction medium for the production of [U-N-15]- and [U-C-13, U-N-15]-labeled proteins for NMR screening and structure determination. Protein Expr. Purif., 40(2), 268–278.
18. Ohki, S.Y. and Kainosho, M. (2008) Stable isotope labeling methods for protein NMR spectroscopy. Prog. Nucl. Mag. Res. Sp., 53(4), 208–226.
19. Nikonowicz, E.P., Sirr, A. et al. (1992) Preparation of 13C and 15N labelled RNAs for heteronuclear multi-dimensional NMR studies. Nucleic Acids Res., 20(17), 4507–4513.
20. Batey, R.T., Inada, M. et al. (1992) Preparation of isotopically labeled ribonucleotides for multidimensional NMR spectroscopy of RNA. Nucleic Acids Res., 20(17), 4515–4523.
21. Puglisi, J.D. and Wyatt, J.R. (1995) Biochemical and NMR studies of RNA conformation with an emphasis on RNA pseudoknots. Meth. Enzymol., 261, 323–350.
22. Zimmer, D.P. and Crothers, D.M. (1995) NMR of enzymatically synthesized uniformly (Cn)-C-13-N-15-Labeled DNA oligonucleotides. Proc. Natl. Acad. Sci. USA, 92(8), 3091–3095.
23. Varani, G., Aboulela, F. et al. (1996) NMR investigation of RNA structure. Prog. Nucl. Mag. Res. Sp., 29, 51–127.
24. Smith, D.E., Su, J.Y. et al. (1997) Efficient enzymatic synthesis of 13C, 15N-labeled DNA for NMR studies. J. Biomol. NMR, 10(3), 245–253.
25. Werner, M.H., Gupta, V. et al. (2001) Uniform C-13/N-15-labeling of DNA by tandem repeat amplification. Meth. Enzymol., 338, 283–304.
26. Lukavsky, P.J. and Puglisi, J.D. (2004) Large-scale preparation and purification of polyacrylamide-free RNA oligonucleotides. RNA, 10(5), 889–893.
27. Flinders, J. and Dieckmann, T. (2006) NMR spectroscopy of ribonucleic acids. Prog. Nucl. Mag. Res. Sp., 48(2–3), 137–159.
28. Nelissen, F.H.T., van Gammeren, A.J. et al. (2008) Multiple segmental and selective isotope labeling of large RNA for NMR structural studies. Nucleic Acids Res., 36(14), e89.
29. (a) Burz, D.S., Dutta, K. et al. (2006) In-cell NMR for protein-protein interactions (STINT-NMR). Nat. Protoc., 1(1), 146–152; (b) Burz, D.S., Dutta, K. et al. (2006) Mapping structural interactions using in-cell NMR spectroscopy (STINT-NMR). Nat. Methods, 3(2), 91–93.
30. Golovanov, A.P., Blankley, R.T. et al. (2007) Isotopically discriminated NMR spectroscopy: A tool for investigating complex protein interactions in vitro. J. Am. Chem. Soc., 129(20), 6528–6535.
31. Bermel, W., Tkach, E.N. et al. (2009) Simultaneous measurement of residual dipolar couplings for proteins in complex using the isotopically discriminated NMR approach. J. Am. Chem. Soc., 131(24), 8564–8570.
32. Ortega-Roldan, J.L., Jensen, M.R. et al. (2009) Accurate characterization of weak macromolecular interactions by titration of NMR residual dipolar couplings: application to the CD2AP SH3-C:ubiquitin complex. Nucleic Acids Res., 37(9), e70.
33. (a) Lemaster, D.M. (1990) Deuterium labeling in NMR structural-analysis of larger proteins. Q. Rev. Biophys., 23(2), 133–174; (b) Lemaster, D.M. (1990) Uniform and selective deuteration in 2-dimensional NMR of proteins. Annu. Rev. Biophys. Biophys. Chem., 19, 243–266.
34. Venters, R.A., Huang, C.C. et al. (1995) High-level H-2/C-13/N-15 labeling of proteins for NMR studies. J. Biomol. NMR, 5(4), 339–344.
35. Sattler, M. and Fesik, S.W. (1996) Use of deuterium labeling in NMR: overcoming a sizeable problem. Structure, 4(11), 1245–1249.
36. Gardner, K.H. and Kay, L.E. (1998) The use of H-2, C-13, N-15 multidimensional NMR to study the structure and dynamics of proteins. Annu. Rev. Bioph. Biom., 27, 357–406.
37. Shan, X., Gardner, K.H. et al. (1996) Assignment of N-15, C-13(alpha), C-13(beta), and HN resonances in an N-15, C-13 H-2 labeled 64 kDa trp repressor-operator complex using triple-resonance NMR spectroscopy and H-2-decoupling. J. Am. Chem. Soc., 118(28), 6570–6579.
38. Pervushin, K., Riek, R. et al. (1997) Attenuated T2 relaxation by mutual cancellation of dipole-dipole coupling and chemical shift anisotropy indicates an avenue to NMR structures of very large biological macromolecules in solution. Proc. Natl. Acad. Sci. USA, 94(23), 12366–12371.
39. Salzmann, M., Pervushin, K. et al. (1998) TROSY in triple-resonance experiments: new perspectives for sequential NMR assignment of large proteins. Proc. Natl. Acad. Sci. USA, 95(23), 13585–13590.
40. Riek, R., Wider, G. et al. (1999) Polarization transfer by cross-correlated relaxation in solution NMR with very large molecules. Proc. Natl. Acad. Sci. USA, 96(9), 4918–4923.
41. Wider, G. and Wüthrich, K. (1999) NMR spectroscopy of large molecules and multimolecular assemblies in solution. Curr. Opin. Struct. Biol., 9(5), 594–601.
42. Pervushin, K. (2000) Impact of transverse relaxation optimized spectroscopy (TROSY) on NMR as a technique in structural biology. Q. Rev. Biophys., 33(2), 161–197.
43. Riek, R., Pervushin, K. et al. (2000) TROSY and CRINEPT: NMR with large molecular and supramolecular structures in solution. Trends Biochem. Sci., 25(10), 462–468.
44. Wider, G. (2005) NMR techniques used with very large biological macromolecules in solution. Meth. Enzymol., 394, 382–398.
45. Venters, R.A., Metzler, W.J. et al. (1995) Use of H-1(N)-H-1(N) NOEs to determine protein global folds in perdeuterated proteins. J. Am. Chem. Soc., 117(37), 9592–9593.
46. Mal, T.K., Matthews, S.J. et al. (1998) Some NMR experiments and a structure determination employing a [15N, 2H] enriched protein. J. Biomol. NMR, 12(2), 259–276.
47. Goto, N.K., Gardner, K.H. et al. (1999) A robust and cost-effective method for the production of Val, Leu, Ile (delta 1) methyl-protonated N-15-, C-13-, H-2-labeled proteins. J. Biomol. NMR, 13(4), 369–374.
48. Farrow, N.A., Zhang, O.W. et al. (1994) A heteronuclear correlation experiment for simultaneous determination of N-15 longitudinal decay and chemical-exchange rates of systems in slow equilibrium. J. Biomol. NMR, 4(5), 727–734.
49. Macura, S. and Ernst, R.R. (1980) Elucidation of cross relaxation in liquids by two-dimensional NMR spectroscopy. Mol. Phys., 41(1), 95–117.
50. Bothner-By, A.A., Stephens, R.L. et al. (1984) Structure determination of a tetrasaccharide – transient nuclear Overhauser effects in the rotating frame. J. Am. Chem. Soc., 106(3), 811–813.
51. Bax, A. and Davis, D.G. (1985) Practical aspects of two-dimensional transverse NOE spectroscopy. J. Magn. Reson., 63(1), 207–213.
52. Breeze, A.L. (2000) Isotope-filtered NMR methods for the study of biomolecular structure and interactions. Prog. Nucl. Mag. Res. Sp., 36(4), 323–372.
53. Otting, G., Senn, H. et al. (1986) Editing of 2D H-1-NMR spectra using X half-filters – combined use with residue-selective N-15 labeling of proteins. J. Magn. Reson., 70(3), 500–505.
54. (a) Otting, G. and Wüthrich, K. (1989) Extended heteronuclear editing of 2D H-1-NMR spectra of isotope-labeled proteins, using the X(Omega-1, Omega-2) double half filter. J. Magn. Reson., 85(3), 586–594; (b) Otting, G. and Wüthrich, K. (1990) Heteronuclear filters in two-dimensional [1H, 1H]-NMR spectroscopy: combined use with isotope labelling for studies of macromolecular conformation and intermolecular interactions. Q. Rev. Biophys., 23(1), 39–96.
55. Zwahlen, C., Legault, P. et al. (1997) Methods for measurement of intermolecular NOEs by multinuclear NMR spectroscopy: Application to a bacteriophage lambda N-peptide/boxB RNA complex. J. Am. Chem. Soc., 119(29), 6711–6721.
56. Stuart, A.C., Borzilleri, K.A. et al. (1999) Compensating for variations in H-1-C-13 scalar coupling constants in isotope-filtered NMR experiments. J. Am. Chem. Soc., 121(22), 5346–5347.
57. Melacini, G. (2000) Separation of intra- and intermolecular NOEs through simultaneous editing and J-compensated filtering: A 4D quadrature-free constant-time J-resolved approach. J. Am. Chem. Soc., 122(40), 9735–9738.
58. Gross, J.D., Gelev, V.M. et al. (2003) A sensitive and robust method for obtaining intermolecular NOEs between side chains in large protein complexes. J. Biomol. NMR, 25(3), 235–242.
59. Garrett, D.S., Seok, Y.J. et al. (1999) Solution structure of the 40,000 Mr phosphoryl transfer complex between the N-terminal domain of enzyme I and HPr. Nat. Struct. Biol., 6(2), 166–173.
60. Mayer, M. and Meyer, B. (1999) Characterization of ligand binding by saturation transfer difference NMR spectroscopy. Angew. Chem. Int. Edit., 38(12), 1784–1788.
61. Takahashi, H., Nakanishi, T. et al. (2000) A novel NMR method for determining the interfaces of large protein-protein complexes. Nat. Struct. Biol., 7(3), 220–223.
62. Mayer, M. and Meyer, B. (2001) Group epitope mapping by saturation transfer difference NMR to identify segments of a ligand in direct contact with a protein receptor. J. Am. Chem. Soc., 123(25), 6108–6117.
63. Shimada, I. (2005) NMR techniques for identifying the interface of a larger protein-protein complex: cross-saturation and transferred cross-saturation experiments. Meth. Enzymol., 394, 483–506.
64. Shimada, I., Ueda, T. et al. (2009) Cross-saturation and transferred cross-saturation experiments. Prog. Nucl. Mag. Res. Sp., 54(2), 123–140.
65. Stockman, B.J. and Dalvit, C. (2002) NMR screening techniques in drug discovery and drug design. Prog. Nucl. Mag. Res. Sp., 41(3–4), 187–231.
66. Meyer, B. and Peters, T. (2003) NMR spectroscopy techniques for screening and identifying ligand binding to protein receptors. Angew. Chem. Int. Edit., 42(8), 864–890.
67. Lepre, C.A., Moore, J.M. et al. (2004) Theory and applications of NMR-based screening in pharmaceutical research. Chem. Rev., 104(8), 3641–3675.
68. Peng, J.W., Moore, J. et al. (2004) NMR experiments for lead generation in drug discovery. Prog. Nucl. Mag. Res. Sp., 44(3–4), 225–256.
69. Kupce, E. and Wagner, G. (1995) Wideband homonuclear decoupling in protein spectra. J. Magn. Reson. Series B, 109(3), 329–333.
70. Gouda, H., Shiraishi, M. et al. (1998) NMR study of the interaction between the B domain of staphylococcal protein A and the Fc portion of immunoglobulin G. Biochemistry, 37(1), 129–136.
71. Morgan, W.D., Frenkiel, T.A. et al. (2005) Precise epitope mapping of malaria parasite inhibitory antibodies by TROSY NMR cross-saturation. Biochemistry, 44(2), 518–523.
72. Gardner, K.H. and Kay, L.E. (1997) Production and incorporation of N-15, C-13 H-2 (H-1-delta 1 methyl) isoleucine into proteins for multidimensional NMR studies. J. Am. Chem. Soc., 119(32), 7599–7600.
73. Takahashi, H., Miyazawa, M. et al. (2006) Utilization of methyl proton resonances in cross-saturation measurement for determining the interfaces of large protein-protein complexes. J. Biomol. NMR, 34(3), 167–177.
74. Isaacson, R.L., Simpson, P.J. et al. (2007) A new labeling method for methyl transverse relaxation-optimized spectroscopy NMR spectra of alanine residues. J. Am. Chem. Soc., 129(50), 15428.
75. Ayala, I., Sounier, R. et al. (2009) An efficient protocol for the complete incorporation of methyl-protonated alanine in perdeuterated protein. J. Biomol. NMR, 43(2), 111–119.
76. Kutateladze, T. and Overduin, M. (2001) Structural mechanism of endosome docking by the FYVE domain. Science, 291(5509), 1793–1796.
77. Brunecky, R., Lee, S. et al. (2005) Investigation of the binding geometry of a peripheral membrane protein. Biochemistry, 44(49), 16064–16071.
78. Tjandra, N., Garrett, D.S. et al. (1997) Defining long range order in NMR structure determination from the dependence of heteronuclear relaxation times on rotational diffusion anisotropy. Nat. Struct. Biol., 4(6), 443–449.
79. Tjandra, N. (1999) Establishing a degree of order: obtaining high-resolution NMR structures from molecular alignment. Struct. Fold. Des., 7(9), R205–R211.
80. Bax, A., Kontaxis, G. et al. (2001) Dipolar couplings in macromolecular structure determination. Meth. Enzymol., 339, 127–174.
81. de Alba, E. and Tjandra, N. (2002) NMR dipolar couplings for the structure determination of biopolymers in solution. Prog. Nucl. Mag. Res. Sp., 40(2), 175–197.
82. Lipsitz, R.S. and Tjandra, N. (2004) Residual dipolar couplings in NMR structure analysis. Annu. Rev. Bioph. Biom., 33, 387–413.
83. Fleming, K. and Matthews, S. (2004) Media for studies of partially aligned states. Methods Mol. Biol., 278, 79–88.
84. Vold, R.R., Prosser, R.S. et al. (1997) Isotropic solutions of phospholipid bicelles: A new membrane mimetic for high-resolution NMR studies of polypeptides. J. Biomol. NMR, 9(3), 329–335.
85. Losonczi, J.A. and Prestegard, J.H. (1998) Improved dilute bicelle solutions for high-resolution NMR of biological macromolecules. J. Biomol. NMR, 12(3), 447–451.
86. Ottiger, M. and Bax, A. (1998) Characterization of magnetically oriented phospholipid micelles for measurement of dipolar couplings in macromolecules. J. Biomol. NMR, 12(3), 361–372.
87. Prosser, R.S., Hwang, J.S. et al. (1998) Magnetically aligned phospholipid bilayers with positive ordering: A new model membrane system. Biophys. J., 74(5), 2405–2418.
88. Prosser, R.S., Losonczi, J.A. et al. (1998) Use of a novel aqueous liquid crystalline medium for high-resolution NMR of macromolecules in solution. J. Am. Chem. Soc., 120(42), 11010–11011.
89. Ottiger, M. and Bax, A. (1999) Bicelle-based liquid crystals for NMR-measurement of dipolar couplings at acidic and basic pH values. J. Biomol. NMR, 13(2), 187–191.
90. Tan, C.B., Fung, B.M. et al. (2002) Phospholipid bicelles that align with their normals parallel to the magnetic field. J. Am. Chem. Soc., 124(39), 11827–11832.
91. Prosser, R.S., Evanics, F. et al. (2006) Current applications of bicelles in NMR studies of membrane-associated amphiphiles and proteins. Biochemistry, 45(28), 8453–8465.
92. Ruckert, M. and Otting, G. (2000) Alignment of biological macromolecules in novel nonionic liquid crystalline media for NMR experiments. J. Am. Chem. Soc., 122(32), 7793–7797.
314
Protein NMR Spectroscopy
93. Barrientos, L.G., Dolan, C. et al. (2000) Characterization of surfactant liquid crystal phases suitable for molecular alignment and measurement of dipolar couplings. J. Biomol. NMR, 16(4), 329–337. 94. Barrientos, L.G., Gawrisch, K. et al. (2002) Structural characterization of the dilute aqueous surfactant solution of cetylpyridinium bromide/hexanol/sodium bromide. Langmuir, 18(10), 3773–3779. 95. Clore, G.M., Starich, M.R. et al. (1998) Measurement of residual dipolar couplings of macromolecules aligned in the nematic phase of a colloidal suspension of rod-shaped viruses. J. Am. Chem. Soc., 120(40), 10571–10572. 96. Hansen, M.R., Mueller, L. et al. (1998) Tunable alignment of macromolecules by filamentous phage yields dipolar coupling interactions. Nat. Struct. Biol., 5(12), 1065–1074. 97. Zweckstetter, M. and Bax, A. (2001) Characterization of molecular alignment in aqueous suspensions of Pf1 bacteriophage. J. Biomol. NMR, 20(4), 365–377. 98. Koenig, B.W., Hu, J.S. et al. (1999) NMR measurement of dipolar couplings in proteins aligned by transient binding to purple membrane fragments. J. Am. Chem. Soc., 121(6), 1385–1386. 99. Tycko, R., Blanco, F.J. et al. (2000) Alignment of biopolymers in strained gels: A new way to create detectable dipole-dipole couplings in high-resolution biomolecular NMR. J. Am. Chem. Soc., 122(38), 9340–9341. 100. Ishii, Y., Markus, M.A. et al. (2001) Controlling residual dipolar couplings in high-resolution NMR of proteins by strain induced alignment in a gel. J. Biomol. NMR, 21(2), 141–151. 101. Meier, S., Haussinger, D. et al. (2002) Charged acrylamide copolymer gels as media for weak alignment. J. Biomol. NMR, 24(4), 351–356. 102. Fleming, K., Gray, D. et al. (2000) Cellulose crystallites: A new and robust liquid crystalline medium for the measurement of residual dipolar couplings. J. Am. Chem. Soc., 122(21), 5224–5225. 103. Ma, J.H., Goldberg, G.I. et al. 
(2008) Weak alignment of biomacromolecules in collagen gels: an alternative way to yield residual dipolar couplings for NMR measurements. J. Am. Chem. Soc., 130(48), 16148. 104. Fischer, M.W.F., Losonczi, J.A. et al. (1999) Domain orientation and dynamics in multidomain proteins from residual dipolar couplings. Biochemistry, 38(28), 9013–9022. 105. Braddock, D.T., Cai, M.L. et al. (2001) Rapid identification of medium- to largescale interdomain motion in modular proteins using dipolar couplings. J. Am. Chem. Soc., 123(35), 8634–8635. 106. Prestegard, J.H., Al-Hashimi, H.M. et al. (2000) NMR structures of biomolecules using field oriented media and residual dipolar couplings. Q. Rev. Biophys., 33(4), 371–424. 107. Prestegard, J.H. and Kishore, A.I. (2001) Partial alignment of biomolecules: an aid to NMR characterization. Curr. Opin. Chem. Biol., 5(5), 584–590. 108. Fushman, D., Varadan, R. et al. (2004) Determining domain orientation in macromolecules by using spin-relaxation and residual dipolar coupling measurements. Prog. Nucl. Mag. Res. Sp., 44(3–4), 189–214. 109. Blackledge, M. (2005) Recent progress in the study of biomolecular structure and dynamics in solution from residual dipolar couplings. Prog. Nucl. Mag. Res. Sp., 46(1), 23–61. 110. Tolman, J.R. and Ruan, K. (2006) NMR residual dipolar couplings as probes of biomolecular dynamics. Chem. Rev., 106(5), 1720–1736. 111. Ottiger, M., Delaglio, F. et al. (1998) Measurement of J and dipolar couplings from simplified two-dimensional NMR spectra. J. Magn. Reson., 131(2), 373–378. 112. Ding, K.Y. and Gronenborn, A.M. (2003) Sensitivity-enhanced 2D IPAP, TROSY-anti-TROSY, and E. COSY experiments: alternatives for measuring dipolar N-15-H-1(N) couplings. J. Magn. Reson., 163(2), 208–214. 113. Yao, L.S., Ying, J.F. et al. (2009) Improved accuracy of N-15-H-1 scalar and residual dipolar couplings from gradient-enhanced IPAP-HSQC experiments on protonated proteins. J. Biomol. NMR, 43(3), 161–170. 114. 
Bertini, I., Luchinat, C. et al. (2001) Paramagnetic probes in metalloproteins. Meth. Enzymol., 339, 314–340.
Macromolecular Complexes
315
115. Ubbink, M., Worrall, J.A.R. et al. (2002) Paramagnetic resonance of biological metal centers. Annu. Rev. Bioph. Biom., 31, 393–422. 116. Kosen, P.A. (1989) Spin labeling of proteins. Meth. Enzymol., 177, 86–121. 117. Battiste, J.L. and Wagner, G. (2000) Utilization of site-directed spin labeling and high-resolution heteronuclear nuclear magnetic resonance for global fold determination of large proteins with limited nuclear overhauser effect data. Biochemistry, 39(18), 5355–5365. 118. Gaponenko, V., Howarth, J.W. et al. (2000) Protein global fold determination using site-directed spin and isotope labeling. Protein Science, 9(2), 302–309. 119. Iwahara, J., Anderson, D.E. et al. (2003) EDTA-derivatized deoxythymidine as a tool for rapid determination of protein binding polarity to DNA by intermolecular paramagnetic relaxation enhancement. J. Am. Chem. Soc., 125(22), 6634–6635. 120. Ramos, A. and Varani, G. (1998) A new method to detect long-range protein-RNA contacts: NMR detection of electron-proton relaxation induced by nitroxide spin-labeled RNA. J. Am. Chem. Soc., 120(42), 10992–10993. 121. Mal, T.K., Ikura, M. et al. (2002) The ATCUN domain as a probe of intermolecular interactions: Application to calmodulin-peptide complexes. J. Am. Chem. Soc., 124(47), 14002–14003. 122. Iwahara, J., Tang, C. et al. (2007) Practical aspects of H-1 transverse paramagnetic relaxation enhancement measurements on macromolecules. J. Magn. Reson., 184(2), 185–195. 123. Iwahara, J., Schwieters, C.D. et al. (2004) Ensemble approach for NMR structure refinement against (1)H paramagnetic relaxation enhancement data arising from a flexible paramagnetic group attached to a macromolecule. J. Am. Chem. Soc., 126(18), 5879–5896. 124. Volkov, A.N., Ferrari, D. et al. (2005) The orientations of cytochrome c in the highly dynamic complex with cytochrome b5 visualized by NMR and docking using HADDOCK. Protein Sci., 14(3), 799–811. 125. Volkov, A.N., Worrall, J.A.R. et al. 
(2006) Solution structure and dynamics of the complex between cytochrome c and cytochrome c peroxidase determined by paramagnetic NMR. Proc. Natl. Acad. Sci. USA, 103(50), 18945–18950. 126. Iwahara, J. and Clore, G.M. (2006) Detecting transient intermediates in macromolecular binding by paramagnetic NMR. Nature, 440(7088), 1227–1230. 127. Tang, C., Iwahara, J. et al. (2006) Visualization of transient encounter complexes in proteinprotein association. Nature, 444(7117), 383–386. 128. Ubbink, M. (2009) The courtship of proteins: Understanding the encounter complex. FEBS Lett., 583(7), 1060–1066. 129. Otting, G. (2008) Prospects for lanthanides in structural biology by NMR. J. Biomol. NMR, 42(1), 1–9. 130. Pintacuda, G., Park, A.Y. et al. (2006) Lanthanide labeling offers fast NMR approach to 3D structure determinations of protein-protein complexes. J. Am. Chem. Soc., 128(11), 3696–3702. 131. Allegrozzi, M., Bertini, I. et al. (2000) Lanthanide-induced pseudocontact shifts for solution structure refinements of macromolecules in shells up to 40 angstrom from the metal ion. J. Am. Chem. Soc., 122(17), 4154–4161. 132. Dominguez, C., Boelens, R. et al. (2003) HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc., 125(7), 1731–1737. 133. Linge, J.P., Habeck, M. et al. (2003) ARIA: automated NOE assignment and NMR structure calculation. Bioinformatics, 19(2), 315–316. 134. Rieping, W., Habeck, M. et al. (2007) ARIA2: Automated NOE assignment and data integration in NMR structure calculation. Bioinformatics, 23(3), 381–382. 135. Brunger, A.T., Adams, P.D. et al. (1998) Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Cryst. D, 54, 905–921. 136. van Dijk, A.D., Fushman, D. et al. (2005) Various strategies of using residual dipolar couplings in NMR-driven protein docking: application to Lys48-linked di-ubiquitin and validation against 15 N-relaxation data. 
Proteins, 60(3), 367–381. 137. Zweckstetter, M. (2008) NMR: prediction of molecular alignment from structure using the PALES software. Nat. Protoc., 3(4), 679–690.
316
Protein NMR Spectroscopy
138. Svergun, D.I., Petoukhov, M.V. et al. (2001) Determination of domain structure of proteins from X-ray solution scattering. Biophys. J., 80(6), 2946–2953. 139. Svergun, D.I. and Koch, M.H.J. (2002) Advances in structure analysis using small-angle scattering in solution. Curr. Opin. Struct. Biol., 12(5), 654–660. 140. Svergun, D.I. and Koch, M.H.J. (2003) Small-angle scattering studies of biological macromolecules in solution. Rep. Prog. Phys., 66(10), 1735–1782. 141. Petoukhov, M.V. and Svergun, D.I. (2007) Analysis of X-ray and neutron scattering from biomacromolecular solutions. Curr. Opin. Struct. Biol., 17(5), 562–571. 142. Svergun, D.I. (2007) Small-angle scattering studies of macromolecular solutions. J. Appl. Crystallogr., 40, S10–S17. 143. Bernado, P., Mylonas, E. et al. (2007) Structural characterization of flexible proteins using smallangle X-ray scattering. J. Am. Chem. Soc., 129(17), 5656–5664. 144. Blobel, J., Bernado, P. et al. (2009) Low-resolution structures of transient protein-protein complexes using small-angle X-ray scattering. J. Am. Chem. Soc., 131(12), 4378–4386. 145. Grishaev, A., Wu, J. et al. (2005) Refinement of multidomain protein structures by combination of solution small-angle X-ray scattering and NMR data. J. Am. Chem. Soc., 127(47), 16621–16628. 146. Gabel, F., Simon, B. et al. (2008) A structure refinement protocol combining NMR residual dipolar couplings and small angle scattering restraints. J. Biomol. NMR, 41(4), 199–208. 147. Grishaev, A., Tugarinov, V. et al. (2008) Refined solution structure of the 82-kDa enzyme malate synthase G from joint NMR and synchrotron SAXS restraints. J. Biomol. NMR, 40(2), 95–106. 148. Williams, C., Rezgui, D. et al. (2007) Structural insights into the interaction of insulin-like growth factor 2 with IGF2R domain 11. Structure, 15(9), 1065–1078. 149. Linnell, J., Groeger, G. et al. 
(2001) Real time kinetics of insulin-like growth factor II (IGF-II) interaction with the IGF-II/mannose 6-phosphate receptor – The effects of domain 13 and pH. J. Biol. Chem., 276(26), 23986–23991. 150. Brown, J., Delaine, C. et al. (2008) Structure and functional analysis of the IGF-II/IGF2R interaction. EMBO J., 27(1), 265–276. 151. Love, J.J., Li, X.A. et al. (1995) Structural basis for DNA bending by the architectural transcription factor Lef-1. Nature, 376(6543), 791–795. 152. Werner, M.H., Bianchi, M.E. et al. (1995) Nmr spectroscopic analysis of the DNA conformation induced by the human testis-determining factor SRY. Biochemistry, 34(37), 11998–12004. 153. Murphy, E.C., Zhurkin, V.B. et al. (2001) Structural basis for SRY-dependent 46-X, Y sex reversal: Modulation of DNA bending by a naturally occurring point mutation. J. Mol. Biol., 312(3), 481–499. 154. Clore, G.M., Murphy, E.C. et al. (1998) Determination of three-bond (1)H30 -P-31 couplings in nucleic acids and protein nucleic acid complexes by quantitative J correlation spectroscopy. J. Magn. Reson., 134(1), 164–167. 155. Wu, H., Henras, A. et al. (2004) Structural basis for recognition of the AGNN tetraloop RNA fold by the double-stranded RNA-binding domain of Rnt1p RNase III. Proc. Natl. Acad. Sci. USA, 101(22), 8307–8312. 156. Milligan, J.F., Groebe, D.R. et al. (1987) Oligoribonucleotide synthesis using T7 RNA polymerase and synthetic DNA templates. Nucleic Acids Res., 15(21), 8783–8798. 157. Milligan, J.F. and Uhlenbeck, O.C. (1989) Synthesis of small RNAs using T7 RNA polymerase. Meth. Enzymol., 180, 51–62. 158. Peterson, R.D., Bartel, D.P. et al. (1994) 1H NMR studies of the high-affinity Rev binding site of the Rev responsive element of HIV-1 mRNA: base pairing in the core binding element. Biochemistry, 33(18), 5357–5366. 159. Wu, H., Yang, P.K. et al. (2001) A novel family of RNA tetraloop structure forms the recognition site for Saccharomyces cerevisiae RNase III. EMBO J., 20(24), 7240–7249. 
160. Dieckmann, T. and Feigon, J. (1997) Assignment methodology for larger RNA oligonucleotides: application to an ATP-binding RNA aptamer. J. Biomol. NMR, 9(3), 259–272. 161. Peterson, R.D., Theimer, C.A. et al. (2004) New applications of 2D filtered/edited NOESY for assignment and structure elucidation of RNA and RNA-protein complexes. J. Biomol. NMR, 28(1), 59–67.
Macromolecular Complexes
317
162. Schwieters, C.D., Kuszewski, J.J. et al. (2003) The Xplor-NIH NMR molecular structure determination package. J. Magn. Reson., 160(1), 65–73. 163. Zhou, J., McAllen, J.K. et al. (2005) High affinity nucleocapsid protein binding to the muPsi RNA packaging signal of Rous sarcoma virus. J. Mol. Biol., 349(5), 976–988. 164. Zhou, J., Bean, R.L. et al. (2007) Solution structure of the Rous sarcoma virus nucleocapsid protein: muPsi RNA packaging signal complex. J. Mol. Biol., 365(2), 453–467. 165. Guntert, P., Mumenthaler, C. et al. (1997) Torsion angle dynamics for NMR structure calculation with the new program DYANA. J. Mol. Biol., 273(1), 283–298. 166. Herrmann, T., Guntert, P. et al. (2002) Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J. Mol. Biol., 319(1), 209–227. 167. Spriggs, S., Garyu, L. et al. (2008) Potential intra- and intermolecular interactions involving the unique-50 region of the HIV-1 50 -UTR. Biochemistry, 47(49), 13064–13073. 168. Collin, D., van Heijenoort, C. et al. (2000) NMR characterization of a kissing complex formed between the TAR RNA element of HIV-1 and a DNA aptamer. Nucleic Acids Res., 28(17), 3386–3391.
9 Studying Partially Folded and Intrinsically Disordered Proteins Using NMR Residual Dipolar Couplings
Malene Ringkjøbing Jensen, Valery Ozenne, Loic Salmon, Gabrielle Nodet, Phineus Markwick, Pau Bernadó and Martin Blackledge
9.1 Introduction
The classical structure-function paradigm is based on the assumption that the determination of structural features of a protein will reveal the molecular and eventually the atomic basis of its biological activity [1]. The observation that many proteins adopt a single folded structure that is both energetically and conformationally stable has led to the adoption of techniques such as X-ray diffraction of crystalline samples or NMR spectroscopy to determine a rigid representation of the protein that will in turn form the basis of a rational understanding of protein function. However, a fundamentally different and equally important paradigm has emerged over the last two decades, in which this classical view no longer holds [2,3]. Following groundbreaking and at the time controversial contributions in the 1990s from a small number of dedicated groups, it is now accepted that a significant fraction of proteins, over 40 % of the human proteome, are not folded in their functional form [4–6]. These intrinsically disordered proteins (IDPs) or regions (IDRs) have been shown to play key roles in a wide range of cellular processes, including signalling, cell cycle control, molecular recognition, transcription and replication, as well as in the development of numerous human pathologies such as neurodegenerative disease and cancer, for which the fraction of disordered proteins rises to 80 %. IDPs necessarily fall outside the realm of classical structural biology due to their conformational flexibility, posing a new set of questions concerning the molecular understanding of functional biology. As an example, a number of IDPs have been observed to fold upon binding, adding a further dimension to the characterisation of protein interactions and their relation to function and providing a novel paradigm concerning the relationship between intrinsic conformational propensity and the structure adopted by the protein in its bound form [1–3,7,8]. It has also been postulated that some IDPs even retain this flexibility in physiological complexes [9]. The development of new techniques to characterise the conformational behaviour of IDPs, and to provide tools to study important biological and medical questions involving these proteins, is an essential and active field of research.
9.2 Ensemble Descriptions of Unfolded Proteins
Determination of a single set of atomic coordinates obviously would have little meaning for a highly disordered protein that can be best described in terms of a continuum in coordinate space. A conformational description of IDPs must be able to identify the rules that define the behaviour of the chain in terms of the probability of adopting a certain conformation. In practice this is often achieved by describing the protein in terms of an explicit ensemble of interconverting structures. The conformational space available to IDPs can be expected to be extremely large, such that accurate mapping of the conformational energy landscape will necessarily depend on exploitation of extensive and complementary experimental techniques such as those reporting on short-range and long-range structural parameters. Once the conformational energy surface has been mapped, we must consider an additional level of complexity relating to the accurate description of the dynamic nature of these structural ensembles. Interconversion rates between representative members of the ensemble can be considered to be of equal importance for understanding the molecular mechanisms underpinning physiological interactions involving these proteins. It has been recognised that the study of IDPs is strongly related to the study of chemically or thermally denatured proteins, although subtle differences have become apparent concerning their specific behaviour. The paradigm is therefore shifted from the determination of a single, static description of the protein, to the development of meaningful ensemble descriptions that accurately capture the conformational behaviour of the protein, in the hope that this will lead to a better understanding of the relationship between primary sequence and biological function in IDPs.
9.3 Experimental Techniques for the Characterisation of IDPs
A range of complementary spectroscopic techniques have been proposed to characterise IDPs, including circular dichroism [4], Raman [10,11] and infrared spectroscopy [12]. These techniques all report on local structural propensities averaged over the whole molecule, and can indicate the tendency of the backbone of the protein to sample specific regions of conformational space. Although these methods are instructive in detecting global tendencies, their disadvantage lies in the fact that short stretches or low populations of local structure can be difficult to detect. In spite of their highly dynamic nature, IDPs can exhibit transient tertiary structure that may be important for physiological function. Such long-range interactions between distant parts of the chain will affect the overall dimensions of the protein, and these dimensions can be probed using, for example, size-exclusion chromatography, dynamic light scattering, X-ray or neutron scattering [13,14] or fluorescence correlation spectroscopy [15,16]. IDPs have been identified on the basis of many of these available experimental techniques, allowing for the development of bioinformatics tools to predict the level of disorder, including prediction of folding upon interaction [3,17].

NMR spectroscopy is probably the most powerful biophysical tool for studying IDPs [18], reporting on both local and long-range conformational behaviour at atomic resolution on timescales varying over many orders of magnitude. The interpretation of dynamically averaged NMR observables is well understood, rendering them particularly appropriate for the development of ensemble descriptions of the unfolded state. The motional properties of IDPs permit the use of multidimensional NMR experiments that can compensate for the spectral crowding of peaks experienced in the amide region of the proton spectrum, allowing the assignment of ¹H, ¹⁵N and ¹³C resonances from throughout the protein. The complete backbone resonance assignment of the full-length tau protein (441 amino acids) is a particularly powerful example of the use of NMR to study even the most intimidating members of the IDP family [19].
Here recent advances are described in the study of highly disordered proteins using NMR spectroscopy, in combination with biophysical techniques such as small angle scattering, to develop explicit ensemble descriptions that can be used to understand the conformational behaviour of unfolded proteins. Particular emphasis is placed on recent applications of residual dipolar couplings (RDCs) to describe the level of local structure and transient long-range order in both intrinsically disordered and chemically denatured proteins.
9.4 NMR Spectroscopy of Intrinsically Disordered Proteins
9.4.1 Chemical Shifts
NMR spectroscopy represents a key technique for studying highly flexible systems, as has been continually demonstrated over the last 20 years. During this period NMR has furnished an enormous volume of conformational information describing the unfolded state [20–23]. Even the most basic spectroscopic measurement, the chemical shift, reports on population-weighted averages over conformations sampled by all molecules in the ensemble that interchange on so-called 'fast' timescales up to the millisecond. Chemical shifts have been shown to depend strongly on the local physico-chemical environment of the observed nucleus [24,25], to the extent that structure determination approaches have been successfully developed that depend uniquely on chemical shifts as experimental restraints [26,27] (see also Chapter 5). It is perhaps not surprising, then, that average chemical shifts measured from a broad conformational equilibrium also report on the local conformational propensity of the observed nucleus. For specific nuclei, in particular backbone ¹³C spins, the dominant contribution originates from the type of amino acid, so that amino-acid-specific 'random coil' shifts, normally calibrated from short unstructured peptides, have to be subtracted from the measured value. The resulting so-called secondary chemical shift can then be used to identify the presence of transient structure in flexible chains [28–30]. For example, in the case of Cα spins, successive positive secondary shifts result from a propensity to form alpha-helical segments. A potential problem associated with this kind of approach can result from incorrect frequency referencing, and in order to address this problem the Cα and Cβ chemical shifts (which shift in opposing directions for alpha-helical segments) have been used in tandem to estimate the level of secondary structure in disordered proteins [31,32].
9.4.2 Scalar Couplings
Three-bond scalar couplings between nuclei in the backbone of the protein also depend on backbone dihedral angles [33,34], and in a disordered ensemble they likewise represent a population-weighted average that can be interpreted in terms of conformational propensity. As with chemical shifts, random coil values have been measured in small peptides and these can be compared to experimental values to determine the level of transient local structure. For these parameters, the calibration depends on so-called Karplus relationships that parameterise the dependence of the measured coupling on the dihedral angle. A potential source of error concerns the influence of near neighbours on the conformational preferences of the amino acid of interest. Consideration of the persistence length, beyond which the rest of the chain exerts a negligible effect, is important for all approaches that are based on the interpretation of experimental measurements made in intact proteins in comparison to short peptides. The relevance of this persistence length may vary over the protein, and must depend on the composition of the primary sequence. Schwalbe and co-workers used scalar coupling measurements to study the importance of such effects, to detect differences between the conformational sampling in short peptides in isolation and in the context of a longer chain [35].
9.4.3 Nuclear Overhauser Enhancements
More detailed information about local conformational sampling can in theory be derived from interproton NOEs [36]; however, quantitative interpretation of cross-relaxation is confounded by the sensitivity of the interaction to the range of dynamic timescales that are characteristic of backbone reorientation in unfolded proteins. This makes it difficult to extract precise information concerning the distance distribution function from measured NOEs. On the other hand, ¹⁵N spin relaxation has been shown to provide information that correlates with local order [37] as long as it occurs on the μs to ms timescale.
9.4.4 Paramagnetic Relaxation Enhancements
Transient long-range contacts can be characterised by measuring dipolar relaxation between an unpaired electron, for example attached to an artificially introduced nitroxide group, and the observed spin (commonly termed paramagnetic relaxation enhancement or PRE; see Chapter 6) [38]. The advantage of such interactions over measurement of proton-proton relaxation via NOE is that they rely on stronger interactions (the gyromagnetic ratio of the electron spin is 660 times larger than that of the proton and enters quadratically in all formulae describing relaxation). As such they provide longer-range information about distance distribution functions, and concomitantly provide information about more weakly populated transient contacts. Experimental data can be interpreted in terms of average distance restraints between the unpaired electron and the observed spin, and then incorporated directly into a restrained molecular dynamics approach [39,40], or in terms of probability distributions [41]. The production of the necessary number of cysteine-carrying mutants of the protein, and the possible influence that the non-native moiety may have on native long-range contacts, can be considered as disadvantages of this kind of approach, but the approach is also extremely powerful, providing unambiguous evidence of fluctuating tertiary structure that can be otherwise very difficult to detect. Recent developments have shown that accounting for the mobility of the sidechain carrying the paramagnetic probe is very important in the accurate characterisation of long-range contacts between different parts of the protein [42].
9.4.5 Residual Dipolar Couplings
Over the last decade, residual dipolar couplings (RDCs), measured between pairs of nuclei in partially aligned proteins, have been shown to be sensitive probes of time- and ensemble-averaged conformational equilibria on timescales up to the millisecond in folded proteins (see Chapter 4) [43]. RDCs can also be used to characterise the conformational behaviour of unfolded proteins [44,45]. Remarkable progress has been made in developing a clearer understanding of the nature of RDCs in the unfolded state, using analytical random chain descriptions derived from polymer physics or statistical coil-based conformational ensemble descriptions of the protein. In the remainder of this chapter some recent results are described that demonstrate the extraordinary power of RDCs to describe the conformational behaviour of IDPs.
9.5 Residual Dipolar Couplings
Dipolar couplings between two spins i and j depend on the orientational properties of the internuclear spin vector (see Chapter 4):

$$D_{ij} = -\frac{\gamma_i \gamma_j \hbar \mu_0}{4\pi^2 r^3} \left\langle \frac{3\cos^2\theta(t) - 1}{2} \right\rangle = D_{\max} \left\langle P_2(\cos\theta(t)) \right\rangle \tag{9.1}$$

with

$$D_{\max} = -\frac{\gamma_i \gamma_j \hbar \mu_0}{4\pi^2 r^3} \tag{9.2}$$

Here θ is the instantaneous orientation of the internuclear vector with respect to the static magnetic field and r is the vibrationally averaged distance. The angular parentheses in Equation 9.1 describe an average over conformations that exchange with rates faster than the millisecond timescale. The strength of the static dipolar coupling between covalently bound spins can be intrinsically very high (many kilohertz), but if all orientations θ are sampled with equal probability, as in free solution, the value of the measured coupling averages to zero. Dipolar couplings do however become measurable when the protein is dissolved in a weakly aligning medium such as lipid bicelles [43], filamentous phages [46–48], lyotropic ethylene glycol/alcohol phases [49], or mechanically strained polyacrylamide gels [50,51] (also described in Chapters 4 and 8). Alignment in many of these media results from a steric repulsion between the protein and the dilute liquid crystal medium, although bacteriophage or charged forms of the other media result in a combination of electrostatic and steric interactions. The interpretation of RDCs in IDPs resulting from electrostatic alignment in terms of local structure is more complicated than in the case of steric alignment [52].

RDCs determined from folded proteins aligned in dilute liquid crystalline media report on the orientation of internuclear vectors connecting pairs of spins relative to a common molecular alignment tensor. This alignment tensor describes the net orientation of the protein molecular frame relative to the magnetic field, and this is expressed in terms of a second rank order matrix. Assuming that global alignment is not coupled to local fluctuations, RDCs for different spin pairs can be expressed in terms of different orientations of the internuclear vectors relative to a common molecular frame:

$$D = D_{\max} A_{zz} \left[ P_2(\cos\theta) + \frac{\eta}{2} \sin^2\theta \cos 2\varphi \right] \tag{9.3}$$

where A_zz is the main component of the alignment tensor, η is the rhombicity of the tensor defined as η = (A_xx − A_yy)/A_zz and {θ, φ} are the polar coordinates of the internuclear vector relative to these axes. This obviously provides long-range information about the different orientations of internuclear vectors relative to the molecular frame and therefore to each other, a type of information that is difficult to extract from isotropic solution state NMR. RDCs measured from folded proteins have been shown to be crucial for precise structure determination [53–55], long-range order in extended molecules [56] or protein complexes [57,58]. RDCs also report on local dynamic amplitudes and motional modes in folded proteins [59–67].
9.5.1 Interpretation of RDCs in Disordered Proteins
If we now consider the case of highly conformationally heterogeneous proteins such as IDPs, the different conformations of the molecule that contribute to the time and ensemble average in rapid exchange will be expected to have different shapes and sizes and will therefore align differently. The average RDC must then be described in terms of the mean over the different averages for all N molecules in the ensemble:

$$D = D_{\max} \frac{1}{N} \sum_{k=1}^{N} \frac{1}{t_{\max}} \int_{t=0}^{t_{\max}} P_2(\cos\theta_k(t))\,dt \qquad (9.4)$$
As each copy of the protein can be expected to sample the conformational space of the ensemble [68], this can be further simplified in the following way:

$$D = D_{\max} \frac{1}{t_{\max}} \int_{t=0}^{t_{\max}} P_2(\cos\theta(t))\,dt \qquad (9.5)$$
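As a numerical aside on the averages in Equations 9.1 and 9.5, the following minimal sketch evaluates the mean of P2(cos θ) for two idealised cases: isotropic sampling, and sampling restricted to a cone about the magnetic field. It also evaluates a static coupling prefactor for a 15N-1H pair. The cone model, the 60° half-angle, the 1.04 Å bond length, the function names and the assumed form D_max = -γiγjħμ0/(4π²r³) are illustrative assumptions, not values or methods taken from the studies discussed here.

```python
import math

HBAR = 1.054571817e-34       # reduced Planck constant, J s
MU_0 = 4.0e-7 * math.pi      # vacuum permeability, T m / A
GAMMA_1H = 2.6752218744e8    # 1H gyromagnetic ratio, rad s^-1 T^-1
GAMMA_15N = -2.7126e7        # 15N gyromagnetic ratio, rad s^-1 T^-1


def p2(x):
    """Second-order Legendre polynomial P2(x) = (3x^2 - 1)/2."""
    return 0.5 * (3.0 * x * x - 1.0)


def mean_p2(cos_min, cos_max=1.0, n=100000):
    """Mean of P2(cos theta) for orientations distributed uniformly on the
    sphere but restricted to cos_min <= cos(theta) <= cos_max.
    Uniform sampling on a sphere is uniform in cos(theta); midpoint rule."""
    step = (cos_max - cos_min) / n
    total = sum(p2(cos_min + (k + 0.5) * step) for k in range(n))
    return total / n


def d_max_hz(gamma_i, gamma_j, r_m):
    """Static prefactor in Hz, assuming -gi*gj*hbar*mu0 / (4 pi^2 r^3)."""
    return -gamma_i * gamma_j * HBAR * MU_0 / (4.0 * math.pi ** 2 * r_m ** 3)


isotropic = mean_p2(-1.0)                      # all orientations equally likely
cone = mean_p2(math.cos(math.radians(60.0)))   # hypothetical 60 degree cone
d_nh = d_max_hz(GAMMA_15N, GAMMA_1H, 1.04e-10)  # assumed N-H distance of 1.04 A

print(f"isotropic <P2>      = {isotropic:.6f}")
print(f"60-degree cone <P2> = {cone:.6f}")
print(f"15N-1H prefactor    = {d_nh / 1000.0:.2f} kHz")
```

The isotropic average is zero to within quadrature error, whereas the cone average matches the analytical value cos θc (1 + cos θc)/2 = 0.375 for θc = 60°. With these assumed constants the prefactor is a little under 22 kHz, which is why even a very small degree of net alignment leaves a coupling of a few hertz.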
The average in Equation 9.5 is potentially very complex in the case of IDPs, and it might be thought unlikely to provide any useful information, or that the RDC would be too small to be measured in a truly unfolded state because of near-complete orientational averaging. Despite this, relatively large RDCs have been measured in chemically denatured proteins, indicating that the orientational sampling of internuclear vectors is far from isotropic in these proteins (Figure 9.1). Dipolar couplings were measured in denatured or partially denatured proteins such as Staphylococcal nuclease in 8 M urea [44], eglin C [69], protein GB1 [70], apo-myoglobin [71] and acyl-CoA binding protein (ACBP) [72].

Figure 9.1 ¹D_NH residual dipolar couplings (light grey) from the urea-unfolded proteins (a) apomyoglobin and (b) Staphylococcal nuclease Δ131Δ mutant aligned in radially squeezed polyacrylamide gel. Simulated RDCs using the explicit-ensemble flexible-meccano approach are shown in dark grey for comparison. In each molecule all RDCs are multiplied by a common scaling factor to best reproduce the data. Reprinted from Malene Ringkjøbing Jensen, Phineus R. L. Markwick, Sebastian Meier, Christian Griesinger, Markus Zweckstetter, Stephan Grzesiek, Pau Bernadó, and Martin Blackledge, Quantitative determination of the conformational properties of partially folded and intrinsically disordered proteins using NMR dipolar couplings, Structure, Vol. 17(9): 1169–1185, Copyright 2009 with permission from Elsevier

The ¹D_NH couplings were found to be predominantly negative, with the largest magnitudes in the central region of the chain, following a bell-shaped distribution that tapers off to zero at the extremities. If the partially denatured or unfolded proteins contained residual secondary structure, as for example previously identified via chemical shift measurements, the residual dipolar couplings were found to deviate strongly from the bell-shaped distribution. Changes in magnitude and sign of ¹D_NH couplings have been observed in acid-denatured states of apo-myoglobin and ACBP, and these changes of sign correlated with postulated raised helical propensities. If we consider the orientational dependence of the RDC, the observed change in sign can be rationalised: in an unfolded chain that is preferentially aligned parallel to the magnetic field, the average orientation of the amide bond vectors is expected to be approximately orthogonal to the field, and the coupling is therefore expected to have a negative sign. The presence of helical elements will induce a change in sign of the measured coupling, because the bond vector is then aligned, on average, parallel to the average chain direction (and therefore to the field). Because the angular averaging term P2(cos θ) changes sign between these two conditions, the dipolar coupling will also change its sign (Figure 9.2).
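The sign argument can be made concrete with the limiting values of P2(cos θ): -1/2 for a vector orthogonal to the field and +1 for a parallel one. The sketch below uses a deliberately crude two-state picture, in which a fraction p of conformers is "parallel" and the remainder "orthogonal", to show where the averaged angular term changes sign. The two-state model and all names are assumptions made for illustration only; real ensembles sample a continuum of orientations, and the sign of the measured coupling also depends on the sign of the prefactor.

```python
import math


def p2_of_angle(theta_deg):
    """P2(cos theta) for an angle given in degrees."""
    c = math.cos(math.radians(theta_deg))
    return 0.5 * (3.0 * c * c - 1.0)


def two_state_average(p_parallel):
    """Population-weighted angular term for a hypothetical mixture of
    'parallel' (theta = 0) and 'orthogonal' (theta = 90 degrees) vectors."""
    return (p_parallel * p2_of_angle(0.0)
            + (1.0 - p_parallel) * p2_of_angle(90.0))


for p in (0.0, 0.2, 0.5, 1.0):
    print(f"p_parallel = {p:.1f}  ->  <P2> = {two_state_average(p):+.3f}")
```

In this idealised limit the averaged term crosses zero at p = 1/3, so a minority population of parallel-oriented vectors is already enough to change the sign of the average.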
Figure 9.2 Representation of angular averaging properties of ¹⁵N-¹H vectors in an unfolded protein dissolved in weakly aligning medium with the director along the magnetic field. RDCs measured for ¹⁵N-¹H vectors in more extended conformations (θ ≈ 90°), more commonly found in unfolded proteins, will have negative values (a), while those in helical or turn conformations align more or less parallel with the direction of the chain (θ ≈ 0°) and will have larger positive values (b). Reprinted from Malene Ringkjøbing Jensen, Phineus R.L. Markwick, Sebastian Meier, Christian Griesinger, Markus Zweckstetter, Stephan Grzesiek, Pau Bernadó, and Martin Blackledge, Quantitative determination of the conformational properties of partially folded and intrinsically disordered proteins using NMR dipolar couplings, Structure, Vol. 17(9): 1169–1185, Copyright 2009 with permission from Elsevier
A number of studies then exploited this apparent sensitivity to local structure to follow folding and unfolding events in partially aligned proteins. Grzesiek et al. studied the diminution of the structure of the experimental RDC profile with increasing temperature in a β-hairpin in the fibritin foldon domain [73], while a gradual decrease of RDCs was observed with increasing temperature and decreasing salt concentration in α-helical ribonuclease S-peptide [74], and in thermal unfolding of protein GB1 [70]. RDCs were used to characterise the amino acid conformational specificity in short peptides [75] and to investigate local and long-range structure in disordered proteins such as Tau [76,77a,b] and α-synuclein [40,78,79].

A more quantitative understanding of the time and ensemble average represented in Equation 9.5 has been derived from polymer theory [45,80,81], describing the unfolded protein as a chain of connected segments of equal length undergoing a restricted random walk. Equation 9.5 is integrated over all available orientations of each segment in the presence of an obstacle, and the expected RDCs are analysed. Because orientational sampling is more restricted in the centre of the chain than at the termini, nonvanishing RDCs are predicted, even in the presence of random sampling of backbone torsion angles along the polymer chain. Central segments, with more neighbours, are less flexible than those at the ends, resulting in the experimentally observed bell-shaped distribution. These polymer-based models are thus capable of rationalising many experimentally observed aspects of the physical alignment of the unfolded polypeptide and as such present a relatively simple conceptual framework with which to understand RDC measurements. Proteins are, however, complex heteropolymers whose conformational heterogeneity cannot be adequately described in terms of a simple homopolymer exhibiting random flight behaviour.
This means that such analytical models cannot easily be adapted to take into account the specificity of each primary sequence that defines the particular protein and therefore its function. Incorporation of glycine and proline residues into simulations of random homopolymers resulted in predicted RDC profiles in which the absolute value of the coupling increased or decreased according to the flexibility of the specific amino acids, demonstrating the dependence of RDC values on primary sequence [82]. This was supported experimentally, with the observation of significant site-to-site variation of experimental RDCs along unfolded peptide chains, underlining the need to introduce amino acid-specific conformational behaviour into any interpretative model of RDCs measured in disordered proteins.

9.5.2 RDCs in Highly Flexible Systems: Explicit Ensemble Models
Following the approach applied to the interpretation of chemical shifts and scalar couplings [33,34], two groups proposed very similar statistical coil approaches to the interpretation of RDCs from unfolded proteins [83,84]. These methods rely on the generation of explicit ensemble descriptions of the unfolded state. RDCs were obtained as the average over all couplings derived from an extensive conformational ensemble and predicted on the basis of the molecular shape, or on the basis of electrostatic charge distribution in the case of electrostatic alignment:

$$D = D_{\max} \frac{1}{M} \sum_{k=1}^{M} A_{k,zz} \left[ P_2(\cos\theta_k) + \frac{\eta_k}{2} \sin^2\theta_k \cos 2\varphi_k \right] \qquad (9.6)$$
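The explicit ensemble average of Equation 9.6 can be sketched in a few lines. The example below is a minimal illustration, assuming that an alignment tensor (A_zz, η) and the polar angles of the internuclear vector in that tensor's principal frame have already been obtained for each conformer by some alignment-prediction step, which is not reproduced here. The three-conformer "ensemble", its numerical values and the function names are hypothetical.

```python
import math


def p2(x):
    """Second-order Legendre polynomial."""
    return 0.5 * (3.0 * x * x - 1.0)


def conformer_rdc(d_max, a_zz, eta, theta, phi):
    """Single term of the ensemble sum for one conformer.
    theta and phi (radians) are the polar angles of the internuclear
    vector in that conformer's own alignment-tensor principal frame."""
    angular = (p2(math.cos(theta))
               + 0.5 * eta * math.sin(theta) ** 2 * math.cos(2.0 * phi))
    return d_max * a_zz * angular


def ensemble_rdc(d_max, conformers):
    """Unweighted mean over M conformers, each (a_zz, eta, theta, phi)."""
    terms = [conformer_rdc(d_max, *c) for c in conformers]
    return sum(terms) / len(terms)


# Hypothetical per-conformer values: (A_zz, eta, theta, phi)
conformers = [
    (1.0e-3, 0.2, math.radians(0.0), 0.0),
    (1.0e-3, 0.5, math.radians(90.0), 0.0),
    (2.0e-3, 0.0, math.radians(90.0), math.radians(45.0)),
]

d_max = 21700.0  # Hz; assumed prefactor of the order expected for 15N-1H
print(f"ensemble-averaged RDC = {ensemble_rdc(d_max, conformers):.3f} Hz")
```

Each conformer carries its own tensor because, in a disordered chain, shape and therefore alignment differ from one conformer to the next; this is what distinguishes the sum from the single-tensor expression used for folded proteins.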
When dipolar couplings are averaged over a sufficient number (M) of conformers they are assumed to fully sample the available conformational space, and the average is calculated. Although this number may be of the order of many thousands for an unfolded strand of 50 amino acids in length, it has been demonstrated that convergence of RDCs towards experimental data can be achieved with more tractable ensemble sizes when the protein is divided into small, uncoupled segments [85]. This decoupling of distant regions in the chain needs to be treated with care, as long-range contacts that may modulate the experimental RDCs are absent in this approach and the resulting sampling may then not represent the true nature of the conformational space [42,86].

Such approaches explicitly take into account the heteropolymeric character of the peptide chain, sampling amino-acid-specific {φ/ψ} propensities to construct the conformational ensemble [83,84]. Bernadó et al. sampled conformations selected from a library of coil conformations comprising non-α-helical and non-β-sheet conformations from the 500 highest resolution X-ray crystal structures [87], with additional sampling regimes to account, for example, for amino acids preceding prolines. Simple steric clash repulsive terms were used to account for contacts between amino acid side chains, removing structures when an overlap occurred between residue-specific spheres centred on the β-carbon atoms of each amino acid (α-proton in the case of glycines). Random sampling of the amino acid specific coil library for each amino acid resulted in the construction of an ensemble of conformers, and RDCs were predicted for each copy of the ensemble using shape-based alignment algorithms [88,89].

The first application of this approach, termed flexible-meccano, was to the study of a two-domain viral protein, protein X, from Sendai virus phosphoprotein (Figure 9.3), comprising a small folded domain and a disordered domain. Two sets of RDCs (¹D_NH and ²D_C′NH) were experimentally measured in the partially aligned protein (PEG-hexanol alignment). The experimental RDCs are quite well reproduced throughout the protein by the flexible-meccano approach, both in terms of amplitude and site-specific distribution. In this particular system RDCs from each copy of the protein, in both folded and unfolded domains, depend on the relative alignment of the two domains, so that a comparison of the relative amplitude of RDCs measured in the folded and unfolded domains constitutes a quantitative test of the ability of the approach to reproduce experimental data.

Figure 9.3 Residual dipolar couplings (¹D_NH and ²D_C′NH) from the two-domain protein, PX, from Sendai virus. (a) ¹D_NH and ²D_C′NH RDCs are well reproduced throughout the protein using flexible-meccano (thick lines). Experimental values are shown as thin lines. (b) Ensemble representation of PX. Reprinted from Malene Ringkjøbing Jensen, Phineus R.L. Markwick, Sebastian Meier, Christian Griesinger, Markus Zweckstetter, Stephan Grzesiek, Pau Bernadó, and Martin Blackledge, Quantitative determination of the conformational properties of partially folded and intrinsically disordered proteins using NMR dipolar couplings, Structure, Vol. 17(9): 1169–1185, Copyright 2009 with permission from Elsevier. Please refer to the colour plate section

Following this initial observation, that the RDCs in the unfolded domain were in reasonable agreement with those predicted from an extremely simple random coil model, further systems were studied using the same approach. It was found to be possible to reproduce the overall distribution of experimentally measured ¹D_NH couplings in the Δ131Δ mutant of nuclease [44], again simply on the basis of local conformational propensities. A number of further examples then demonstrated that the observation was of a general nature, for example the prediction of ¹D_NH RDCs measured in 8 M urea unfolded apo-myoglobin [71]. Thus a statistical coil approach was, at least to a first approximation, capable of predicting the overall distribution of 'random coil' RDCs that result directly from the conformational properties of the primary sequence. These random coil values can be considered to constitute an unfolded 'baseline' of RDCs that will be expected in the absence of significant secondary or tertiary structure. The absolute level of alignment is not known in these simulations, so that RDCs are generally scaled by the appropriate optimal scaling factor in order to best reproduce the experimental data.
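One common way to obtain such a scaling factor is a linear least-squares fit of a single multiplicative constant. The sketch below assumes this criterion, which is not necessarily the one used in the studies cited here; the function names and the five-residue profiles are hypothetical.

```python
def optimal_scale(predicted, experimental):
    """Scale s minimising sum((exp - s * pred)^2): s = sum(p*e) / sum(p*p)."""
    numerator = sum(p * e for p, e in zip(predicted, experimental))
    denominator = sum(p * p for p in predicted)
    if denominator == 0.0:
        raise ValueError("predicted couplings are all zero")
    return numerator / denominator


def rmsd(a, b):
    """Root-mean-square deviation between two equally long profiles."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


# Hypothetical bell-shaped profiles (Hz); the simulation has arbitrary amplitude
predicted = [-1.0, -2.0, -3.0, -2.0, -1.0]
experimental = [-2.1, -3.9, -6.2, -4.0, -1.8]

scale = optimal_scale(predicted, experimental)
scaled = [scale * p for p in predicted]
print(f"scale = {scale:.4f}")
print(f"RMSD before = {rmsd(predicted, experimental):.3f} Hz, "
      f"after = {rmsd(scaled, experimental):.3f} Hz")
```

Because a single constant is fitted per data set, the comparison tests the shape of the predicted profile along the sequence rather than its absolute amplitude.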
Related observations were made correlating amino-acid side-chain bulkiness and the amplitude of experimental ¹D_NH couplings [90].

9.5.3 RDCs to Detect Deviation from Random Coil Behaviour in IDPs
A random coil description of intrinsically disordered or unfolded proteins as presented above provides a tool for calculating expected RDC profiles assuming that the protein can be described as a random coil, obeying only local structural propensities and devoid of any specific or persistent local or long-range structure. The establishment of these approaches is clearly essential in order to perform the next, significantly more demanding step: the development of techniques whereby a departure from baseline values can be interpreted in terms of specific local or long-range conformational behaviour. In the following we will describe some recent efforts to develop a quantitative description of conformational detail in regions of IDPs that exhibit local structure, and in doing so we will present the evolution of the approaches that we have used in this direction over recent years.

The natively unfolded protein Tau controls microtubule dynamics and stability in neuronal cells, but also represents a significant fraction of the proteins found in tangles in Alzheimer's disease [91]. In a recent study of a 130 amino acid construct of the protein, ¹D_NH couplings were measured throughout the chain [77a]. As shown in Figure 9.4, local sign inversion of ¹D_NH RDCs was observed in homologous regions of four repeat domains (R1–R4) comprising the hexapeptide segments that are known both to interact with microtubules and to nucleate self-association, forming paired helical filaments that eventually result in aggregation.
Figure 9.4 Residual dipolar couplings (¹D_NH) measured in the 130 amino acid domain of Tau protein aligned in polyacrylamide gel (grey). Experimental RDCs (black) are reproduced throughout the protein using flexible-meccano (a). Four regions show deviations from expected RDCs, with inversion of the sign. Backbone dihedral angle sampling from accelerated molecular dynamics simulation (d) of pentadecapeptides centred on these regions differs from the statistical coil sampling (c). This sampling is incorporated into flexible-meccano, to replace the coil database, and RDCs are better reproduced (b). (e) Ribbon diagram of the K18 construct of Tau protein summarises conformational sampling from NMR data. The four strands identified in this study as containing turn propensities (252–255, 283–286, 314–317, 345–348), the three GGG motifs (271–273, 333–335, 365–367) and the regions identified as having propensity towards β-sheet conformations (274–283, 306–313 and 336–345) are indicated. Reprinted from Malene Ringkjøbing Jensen, Phineus R.L. Markwick, Sebastian Meier, Christian Griesinger, Markus Zweckstetter, Stephan Grzesiek, Pau Bernadó, and Martin Blackledge, Quantitative determination of the conformational properties of partially folded and intrinsically disordered proteins using NMR dipolar couplings, Structure, Vol. 17(9): 1169–1185, Copyright 2009 with permission from Elsevier

Following the logic presented previously, the fact that this inversion of the sign of the RDCs was not reproduced by the statistical coil flexible-meccano approach can be qualitatively interpreted as indicating the presence of local helical or turn motifs. The authors attempted to go further, using extensive simulation, to determine whether the simple observation of ¹D_NH RDC sign inversion within the sequence can be unambiguously interpreted as the presence of helical propensity for specific amino acids. This turns out not to be true: for example, if a neighbouring amino acid samples a left-handed helix backbone dihedral angle, the influence on a measured RDC at the site of interest is essentially identical to that predicted when a right-handed helix is present at this site. In this study the authors therefore used a molecular dynamics based approach, accelerated molecular dynamics (AMD) simulation, that extends the effective temporal range compared to standard MD simulation by many orders of magnitude [92], to predict the behaviour of pentadecapeptides centred around these turn regions. The simulation clearly identified a tendency to form βI-turns for these repeat domains, and these reproduced the experimental data well when the intrinsic statistical coil backbone dihedral angles for the four amino acids involved in each turn region were replaced with the backbone dihedral sampling resulting from the AMD simulations (Figure 9.4). This demonstration that the local βI-turn conformations and their predicted populations were in good agreement with the experimental RDC data substantiated the relevance of the simulation, and again underlined the sensitivity of RDCs to local conformational propensity, even when this propensity is weak (20–30 % in this case).

RDCs are of course not only sensitive to local structure, but also to the presence of transient long-range contacts in disordered proteins [42]. Indeed, in the case of folded proteins, RDCs are regularly used to define the orientations of bond vectors relative to a common molecular alignment tensor, so that components from long-range order will be expected to contribute to the measured RDC in partially folded states. Partially disordered proteins often lie in a conformational state somewhere between fully folded and completely random coil proteins, and may therefore exhibit characteristics associated with both extremes. In the following example we present a case where such contacts would be important for the interpretation of experimental RDCs measured in a study of the IDP α-synuclein.

¹D_NH couplings were measured throughout the chain and compared to the flexible-meccano prediction, revealing significant and systematic deviations from the random coil RDC profile in both the N- and C-termini of the protein (Figure 9.5) [78,84]. These deviations may stem from local structure that is not predicted by the coil model, but the potential relevance of an alternative explanation was also demonstrated. Extensive ensemble averages were predicted containing transient long-range contacts between different segments of the molecule (RDCs were averaged for ensembles containing contacts of less than 15 Å between any residues present in different pairs of segments of the protein; see Figure 9.5). Predicted RDCs clearly depend on the presence of long-range contacts. Even when these contacts are weak and relatively nonspecific they induce a modulation of the underlying baseline of the expected RDCs [42], reinforcing values in the vicinity of the broad regions experiencing contacts, and quenching values in the intervening regions. Importantly, the backbone conformational sampling of the amino acids showing increased RDCs compared to the completely unfolded ensemble is not measurably different in the presence and absence of the contacts. The dependence of measured RDCs on transient contacts and fluctuating tertiary structure has important consequences for approaches that divide the unfolded protein into short uncorrelated segments to improve the efficiency of RDC prediction [85,86], as such long-range effects would necessarily be absent from this kind of simulation.

Figure 9.5 Experimental ¹D_NH RDCs (black lines) measured in α-synuclein aligned in bacteriophage depict the sensitivity of RDCs to the presence of transient long-range structure in IDPs. RDC profiles are reasonably well reproduced by flexible-meccano except for the N- and C-terminal regions (a). The importance of interactions between different parts of the protein is shown by dividing the 140 amino acid chain into seven 20-residue strands (1–20, 21–40, and so on). The flexible-meccano procedure was repeated, and conformers were only accepted if a Cβ from one of the 20-residue domains was less than 15 Å from a Cβ from the other specified domain (c). Best reproduction is found for a contact between the N- and C-terminal regions (b). Reprinted from Malene Ringkjøbing Jensen, Phineus R.L. Markwick, Sebastian Meier, Christian Griesinger, Markus Zweckstetter, Stephan Grzesiek, Pau Bernadó, and Martin Blackledge, Quantitative determination of the conformational properties of partially folded and intrinsically disordered proteins using NMR dipolar couplings, Structure, Vol. 17(9): 1169–1185, Copyright 2009 with permission from Elsevier
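The acceptance criterion described for Figure 9.5 (a Cβ of one 20-residue segment within 15 Å of a Cβ of another) can be sketched as a simple filter over candidate conformers. This is an illustration of the criterion only, not the flexible-meccano code: the function names are hypothetical, and the two "conformers" are artificial straight and folded-back chains built purely to exercise the filter.

```python
import math


def segment(index, length=20):
    """Zero-based half-open residue range of the 1-based segment `index`."""
    start = (index - 1) * length
    return start, start + length


def has_contact(cb_coords, seg_a, seg_b, cutoff=15.0):
    """True if any C-beta in seg_a is closer than `cutoff` (angstrom)
    to any C-beta in seg_b. cb_coords holds one (x, y, z) per residue."""
    for i in range(*seg_a):
        for j in range(*seg_b):
            if math.dist(cb_coords[i], cb_coords[j]) < cutoff:
                return True
    return False


def filter_conformers(conformers, seg_a, seg_b, cutoff=15.0):
    """Keep only the conformers that contain the requested contact."""
    return [c for c in conformers if has_contact(c, seg_a, seg_b, cutoff)]


# Artificial 140-residue test chains with 3.8 angstrom spacing
extended = [(3.8 * i, 0.0, 0.0) for i in range(140)]
folded_back = ([(3.8 * i, 0.0, 0.0) for i in range(70)]
               + [(3.8 * (139 - i), 10.0, 0.0) for i in range(70, 140)])

accepted = filter_conformers([extended, folded_back], segment(1), segment(7))
print(f"{len(accepted)} of 2 conformers contain an N- to C-terminal contact")
```

RDCs would then be averaged over the accepted conformers only, so that the ensemble retains statistical coil sampling locally while being biased towards the specified long-range contact.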
In the case of α-synuclein, experimental RDCs were best reproduced in the presence of a long-range contact between the N- and C-terminal regions (Figure 9.5). This hypothesis is supported by independent evidence: interactions between the terminal regions have been detected using PREs and have been shown to disappear at high temperatures, upon polyamine binding and upon addition of denaturant. Such experimental conditions favour aggregation of α-synuclein in vitro, suggesting that the long-range interactions in α-synuclein play a role in protecting against misfolding and aggregation [39,93].

9.5.4 Multiple RDCs Increase the Accuracy of Determination of Local Conformational Propensity
From the preceding descriptions we have seen that ¹D_NH RDCs provide evidence of the relevance of statistical coil models, as well as being sensitive probes of local conformational propensity. Nevertheless, the ambiguity of their interpretation in terms of local structural propensities underlines the need for complementary structural information. This necessity was recognised by Grzesiek and co-workers, who measured multiple additional RDCs between different pairs of nuclei on the protein backbone in highly deuterated urea- and acid-denatured ubiquitin, even showing that long-range order can be detected from RDCs by the measurement of ¹H^N-¹H^N RDCs. ¹H^N-¹H^N RDCs were measured across the N-terminal β-hairpin of ubiquitin, indicating the presence of significantly populated (around 20%) native-like local structure [94]. In this detailed study, up to seven RDCs, including ¹H^N-¹H^N and ¹H^N-¹H^α, were measured per peptide unit.

The flexible-meccano approach was used to predict one-bond RDCs (¹D_NH, ¹D_CαHα and ¹D_CαC′) from the unfolded chain, resulting in a surprising observation. Profiles of experimental ¹D_NH and ¹D_CαHα RDCs were reasonably well reproduced by simulation, but critically a different scaling factor was required to reproduce the different RDC types from the same ensemble. Extensive simulation revealed that the standard statistical coil distribution of dihedral angles was inappropriate for the description of RDCs from urea-unfolded proteins (Figure 9.6). The conformational sampling distribution was then refined, to invoke a generally higher propensity for extended conformations {50° < ψ < 180°}, to achieve simultaneous reproduction of the different types of covalently bound RDCs (¹D_NH, ¹D_CαHα and ¹D_CαC′).
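One simple way to picture a "higher propensity for extended conformations" is to reweight a coil library so that entries with 50° < ψ < 180° are drawn more often. The sketch below assumes this reweighting scheme purely for illustration; it is not claimed to be the refinement procedure used in the study, and the four-entry library, the boost factor and the function names are hypothetical.

```python
def is_extended(psi_deg):
    """Extended region as defined in the text: 50 < psi < 180 degrees."""
    return 50.0 < psi_deg < 180.0


def reweight(library, boost):
    """Normalised sampling weights for (phi, psi) entries, with entries in
    the extended region favoured by the (hypothetical) factor `boost`."""
    raw = [boost if is_extended(psi) else 1.0 for _, psi in library]
    total = sum(raw)
    return [w / total for w in raw]


def extended_fraction(library, weights):
    """Total sampling weight carried by extended entries."""
    return sum(w for (_, psi), w in zip(library, weights) if is_extended(psi))


# Hypothetical coil-library entries for one residue type: (phi, psi) in degrees
library = [(-60.0, -45.0), (-75.0, 150.0), (-120.0, 130.0), (60.0, 40.0)]

standard = reweight(library, boost=1.0)   # unmodified statistical coil
enhanced = reweight(library, boost=2.0)   # extended region favoured twofold

print(f"extended fraction, standard = {extended_fraction(library, standard):.3f}")
print(f"extended fraction, enhanced = {extended_fraction(library, enhanced):.3f}")
```

Conformers would then be built by drawing (φ, ψ) pairs with these weights, and all RDC types recomputed from the same ensemble to test whether a single scaling factor now suffices.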
The need for more extended sampling was experimentally substantiated by a comparison of calculated and experimental interproton RDCs (¹H^N_i-¹H^α_i and ¹H^N_i-¹H^α_{i−1}), and interamide proton RDCs measured using quantitative J-coupling approaches [95] in perdeuterated ubiquitin (¹H^N_i-¹H^N_{i+1} and ¹H^N_i-¹H^N_{i+2}). An additional scaling factor was required in order to reproduce all ¹H-¹H RDCs, suggesting that increased mobility of these vectors is not accounted for in the standard flexible-meccano statistical coil model. Combination of flexible-meccano with an ensemble selection algorithm, ASTEROIDS (A Selection Tool for Ensemble Representation Of Intrinsically Disordered States), was then applied to the development of a direct selection approach on the basis of the experimental data available from urea-denatured ubiquitin [86]. This approach allowed the identification of local conformational sampling properties of urea-unfolded ubiquitin, again indicating a general trend towards more extended conformers, but showing that the backbone sampling of certain types of charged or polar amino acids, in particular threonine, glutamic acid and arginine, is affected more strongly by urea binding than that of amino acids with hydrophobic side chains. In general these observations support the proposition that urea denaturation extends the unfolded amino acid chain, an observation that would be in agreement with local binding of urea to the polypeptide chain, inducing a more extended sampling of backbone dihedral angles [94]. Analysis of ³J_HNHα scalar couplings measured under the same conditions indicated that while the ψ angle was more extended, the φ dihedral angle appeared to span both polyproline II and extended β-regions such that neither dominated the additional sampling of extended conformations.

Figure 9.6 Conformational sampling in urea-unfolded proteins. ¹D_NH, ¹D_CαHα, ¹D_CαC′, ¹H^N_i-¹H^α_i and ¹H^N_i-¹H^α_{i−1} were measured in ubiquitin at pH 2 and 8 M urea, and ¹H^N_i-¹H^N_{i+1} and ¹H^N_i-¹H^N_{i+2} measured under the same conditions in perdeuterated ubiquitin and compared to expected couplings from the standard statistical coil database (black). All couplings are scaled using scaling factors appropriate for the ¹D_NH coupling (left). The general disagreement between experimental and simulated RDCs appears to stem from the nature of the statistical coil model, which, when modified to reflect enhanced sampling in the more extended regions of Ramachandran space (right), provides a better overall reproduction of the RDCs. In this case RDCs between covalently bound spins are scaled using scaling factors appropriate for the ¹D_NH coupling, while all ¹H-¹H RDCs are scaled using the best scaling for ¹H^N_i-¹H^α_{i−1} couplings. Four sample Ramachandran plots are shown to illustrate this enhanced sampling. Reprinted from Malene Ringkjøbing Jensen, Phineus R.L. Markwick, Sebastian Meier, Christian Griesinger, Markus Zweckstetter, Stephan Grzesiek, Pau Bernadó, and Martin Blackledge, Quantitative determination of the conformational properties of partially folded and intrinsically disordered proteins using NMR dipolar couplings, Structure, Vol. 17(9): 1169–1185, Copyright 2009 with permission from Elsevier
9.5.5 Quantitative Analysis of Local Conformational Propensities from RDCs
One key element of the complex relationship between structural behaviour and biological function in IDPs is the ability of members of this family to undergo a disorder-to-order transition upon interaction with physiological partners, where molecular recognition is accompanied by local folding into a characteristic three-dimensional conformation [7–9,96]. Such binding events can exhibit high specificity but often have low affinity, with rapid dispersal due to high k_on and k_off rates. They may be promiscuous, with observed binding to multiple partners apparently via conformational plasticity in the interaction site. These kinds of weak, highly fluctuating interactions fall outside the range of classical approaches to protein structural biology. The development of an understanding of the physical basis of induced folding upon binding requires an accurate description of the conformational behaviour of the pre-recognition, free form of the protein. The dynamics of peptide folding upon interaction have been studied using rotating frame relaxation, identifying the formation of initial encounter complexes via weak, nonspecific interactions that facilitate the formation of a partially folded state upon binding [97].

As we have seen, RDCs are sensitive probes of local conformational sampling in the unfolded state, and as such can significantly improve our ability to characterise the extent to which regions of a protein that play a role in binding and function are pre-configured prior to interaction, although existing methods for the interpretation of RDCs in terms of local conformational propensities were initially rather qualitative. In a recent study a key step was taken towards the quantitative, and hopefully insightful, analysis of local structure from RDCs. Flexible-meccano was used to characterise the conformational properties of the partially ordered interaction site of the C-terminal domain, N_TAIL, of Sendai virus nucleoprotein [98a].
This domain is important for the replication and transcription of the viral RNA, processes that are initiated by the interaction between N_TAIL and the C-terminal three-helix bundle domain, PX, of the phosphoprotein P [99]. Chemical shift perturbation and prediction based on primary sequence had been used to propose that the molecular recognition element presents a nascent α-helix in free solution that further folds upon interaction with PX via a negative patch on the surface of PX [100]. Multiple RDCs (¹D_HN, ¹D_CαC′, ²D_HNC′ and ¹D_CαHα) were measured from the partially aligned N_TAIL in liquid crystalline ethylene glycol/alcohol. Not unexpectedly, the helical region exhibits strongly positive ¹D_HN RDCs, indicating the presence of a region with helical propensity, while the remaining regions have predominantly negative RDCs and appear to act as a disordered random coil (Figure 9.7). In this study all possible combinations and populations of continuous helical segments, from a minimum of 4 amino acids to a maximum of 20, throughout the molecular recognition element region, were used to propose a minimum description of the quantitative conformational sampling (Figure 9.7), including the possibility of an unfolded state in equilibrium with the helices. The effective RDC is then given by the population weighted average over the equilibrium of states:

$$D_{ij,\mathrm{eff}} = \sum_{k=1,n} p_k D^{k}_{ij} + \left( 1 - \sum_{k=1,n} p_k \right) D^{U}_{ij} \qquad (9.7)$$
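The population-weighted average of Equation 9.7, together with the χ² comparison to experiment defined in Equation 9.8, can be sketched as follows. The sketch assumes that per-site RDC profiles have already been predicted for each helical conformer and for the unfolded state; the two-conformer, three-site numbers and the function names are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def effective_rdc(populations, helix_rdcs, unfolded_rdc):
    """Population-weighted coupling at one site: helical conformers k with
    populations p_k, the remainder (1 - sum p_k) being unfolded."""
    p_unfolded = 1.0 - sum(populations)
    weighted = sum(p * d for p, d in zip(populations, helix_rdcs))
    return weighted + p_unfolded * unfolded_rdc


def chi_squared(populations, helix_profiles, unfolded_profile,
                experimental, sigma):
    """Sum over sites of ((D_eff - D_exp) / sigma)^2."""
    chi2 = 0.0
    for site, d_exp in enumerate(experimental):
        per_helix = [profile[site] for profile in helix_profiles]
        d_eff = effective_rdc(populations, per_helix, unfolded_profile[site])
        chi2 += ((d_eff - d_exp) / sigma[site]) ** 2
    return chi2


# Hypothetical predicted couplings (Hz) at three sites
helix_profiles = [[8.0, 10.0, 9.0], [2.0, 9.0, 10.0]]
unfolded_profile = [-2.0, -3.0, -2.0]
experimental = [3.2, 5.8, 6.0]     # synthetic data built from p = (0.4, 0.3)
sigma = [0.5, 0.5, 0.5]

for trial in ([0.4, 0.3], [0.5, 0.2]):
    chi2 = chi_squared(trial, helix_profiles, unfolded_profile,
                       experimental, sigma)
    print(f"populations {trial}: chi2 = {chi2:.3f}")
```

In a full analysis the populations would be optimised for every candidate combination of conformers, and the smallest ensemble giving a statistically acceptable χ² retained; only the scoring step is shown here.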
In Equation 9.7, $p_k$ are the populations of the n helical conformers, $D^{k}_{ij}$ are the individual predicted couplings between nuclei i and j, and $D^{U}_{ij}$ are couplings from the unfolded state. The effective couplings are compared to experimental data:

$$\chi^2 = \sum \left( D_{ij,\mathrm{eff}} - D_{ij,\mathrm{exp}} \right)^2 / \sigma_{ij}^2 \qquad (9.8)$$

where $\sigma_{ij}$ represents the uncertainty on the experimental coupling. All available RDCs were therefore used to develop a minimum ensemble representation of the molecular recognition element of N_TAIL in terms of an ensemble of interconverting conformational states, demonstrating that, rather than fraying randomly, the molecular recognition sequence of N_TAIL preferentially populates three specific helical conformers in equilibrium with the unfolded state. Interestingly, the highest populated conformers differ by one turn of a helix at both termini, with the helix termini enclosing the recognition site amino acids (Figure 9.7). The three interconverting helical conformers are stabilised by N-capping interactions via hydrogen bonds between the side chain of the N-capping amino acid (in this case aspartic acid or serine) and the backbone amide in position two or three in the helical elements. The fact that these helices are stabilised by N-capping motifs suggests that the preferred conformations are pre-encoded in the primary sequence of the molecular recognition element, providing clear detail of mechanisms used by the disordered sequence to control conformational sampling prior to function. Intriguingly, the direction in which the disordered strands neighbouring the helix are projected is also selectively controlled as a result of these stabilising interactions. A mechanism by which the partially folded form of the protein could project the unfolded strands in the most functionally useful direction to achieve efficient nonspecific interactions is thereby identified.

It is interesting to look in more detail at the information content of RDCs in partially structured elements.
A periodicity is immediately apparent in the $^1D_{HN}$ and $^1D_{C\alpha H\alpha}$ couplings within helical elements (Figure 9.7) and, to a lesser extent, in the $^1D_{C\alpha C'}$ and $^2D_{HNC'}$ couplings. Assuming that the helix is more or less canonical in structure, a periodicity should only be observed if the effective net orientation of the vectors on either side of the helix differs relative to the magnetic field (averaged over the ensemble). This would result in an effective tilt of the main axis of the helical element, again on average, with respect to the field direction. From simulation it was predicted that the effective tilt of the helix relative to the alignment axis in disordered proteins is determined by the directionality of the unfolded chains projected from the helix termini. The properties of this so-called ‘dipolar wave’ have indeed been shown to depend in a predictable way on helix length [98b], thereby obviating the need for construction of explicit ensembles.

Figure 9.7 Determination of the conformational equilibrium in the molecular recognition element of NTAIL in solution. Explicit structural ensembles were simulated using specific helical elements of all possible combinations and populations of continuous helical segments, from a minimum of 4 amino acids to a maximum of 20 in the range 476–495. Ensemble equilibria comprising combinations of increasing numbers of conformers (n = 0, 1, 2, 3, 4) of the 153 helical conformations were compared to the experimental data. In each case the population of each member of the ensemble was optimised. The four conformations are presented as: a single structure, representing the 25 ± 4 % unfolded conformers; the shortest helical element, comprising 6 amino acids (479–484), populated at a level of 36 ± 3 %; 476–488, populated at 28 ± 1 %; and a longer stretch, 478–492, populated at a level of 11 ± 1 %. The molecular recognition site arginines are shown as sticks. 20 randomly selected conformers are shown for each of the helical segments to illustrate the directionality of the adjacent chains projected from the helix caps. Reproduction of experimental data is shown compared to simulation in the molecular recognition element on the right. Each of the helices is found to be preceded by an amino acid capable of forming an N-capping interaction that can stabilise the formation of helices in flexible peptides (shown in dark color on the ribbon and primary sequence). Reprinted from Malene Ringkjøbing Jensen, Phineus R.L. Markwick, Sebastian Meier, Christian Griesinger, Markus Zweckstetter, Stephan Grzesiek, Pau Bernadó, and Martin Blackledge, Quantitative determination of the conformational properties of partially folded and intrinsically disordered proteins using NMR dipolar couplings, Structure, Vol. 17(9): 1169–1185, Copyright 2009 with permission from Elsevier

9.5.6
Conformational Sampling in the Disordered Transactivation Domain of p53
As mentioned earlier in the chapter, a full understanding of the vast conformational space available to intrinsically disordered proteins requires experimental data from as many complementary biophysical techniques as possible. The combination of dipolar couplings with complementary biophysical techniques was illustrated by the first explicit ensemble description of the human tumour suppressor p53 [101], a protein that plays a key role in maintaining the integrity of the human genome. p53 adopts a homotetrameric structure in solution, with folded tetramerisation and core domains flanked by disordered domains at the N- and C-termini. The three-dimensional structure and the quaternary geometry of the folded domains have been determined using NMR and X-ray crystallography, but no explicit model of the unfolded domain had been obtained in the context of the entire protein. Wells et al. used the RDC-based approaches applied to the Sendai virus NTAIL protein, combined with accelerated MD and small angle scattering, to study the intrinsically disordered N-terminal transactivation domain of p53 in isolation, in the full-length form of the protein bound to DNA, and in the unbound form. Differential flexibility was found in different regions of the N-terminal disordered transactivation domain of p53 (Figure 9.8).

Figure 9.8 Defining the conformational behaviour of the p53 N-terminal intrinsically unfolded domain using RDCs. (a) $^1D_{NH}$ RDCs (grey) from p53(1–93) compared to amino acid specific statistical coil predictions from flexible-meccano (black). All simulated values are scaled by the same prefactor to best reproduce the experimental data. (b) $^1D_{NH}$ RDCs from p53(1–93) compared to predicted values from amino acid specific statistical coil predictions including the presence of a single-turn helix at amino acids 22–24, populated at a level of 30 %. Helical values were centred on the conformations present in the X-ray crystal structure of the α-helix formed when [1–93] binds to the ubiquitin ligase MDM2. (c) Conformational sampling as in (b) with an increased level of polyproline II sampling for each amino acid in the region [58–91]. (d) Comparison of experimental N-HN RDCs from the 1–93 region of intact p53[1–393]:DNA (grey) with predictions from flexible-meccano using the full-length intact model of the DNA-bound form (black). Conformational sampling of the unfolded domain as in (c). (e) N-terminal domain ensemble in our model with one representative full-length p53 molecule included for illustration. The flexible C-terminus is not shown for reasons of clarity. 20 copies are shown for each monomer. Reprinted from Malene Ringkjøbing Jensen, Phineus R.L. Markwick, Sebastian Meier, Christian Griesinger, Markus Zweckstetter, Stephan Grzesiek, Pau Bernadó, and Martin Blackledge, Quantitative determination of the conformational properties of partially folded and intrinsically disordered proteins using NMR dipolar couplings, Structure, Vol. 17(9): 1169–1185, Copyright 2009 with permission from Elsevier
In particular, multiple RDCs measured throughout the N-terminal domain revealed the presence of a single helical turn at the MDM2 interaction site in the transactivation domain (MDM2 is an important negative regulator of p53). The population of the helix was estimated to be approximately 30 %, consistent with accelerated MD (AMD) calculations, supporting earlier suggestions of a nascent helix that fully folds upon interaction with MDM2. This interaction motif is preceded by an aspartic acid, as in Sendai virus NTAIL and the beta turns present in Tau K18, again suggesting that the nascent structure is prepared and stabilised via N-capping interactions. The proline-rich region, attached to the folded core domain and positioned between the MDM2 binding site and the surface of the folded domain, exhibited enhanced stiffness relative to the transactivation domain, thereby allowing the projection of the MDM2 interaction site away from the surface of the protein. Most importantly, the dynamic properties of the disordered N-terminal domain allowed RDC measurements from the entire disordered domain in the context of the full-length tetrameric protein, in both the free and the DNA-complexed forms (particles of molecular mass approximately 240 kDa). The analysis of these results clearly showed that local conformational sampling of the N-terminal domain is essentially the same in the full-length protein and in isolation; these results were validated against small angle scattering data for both the isolated and intact forms.
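The way a helix population such as the 30 % quoted above can be compared to data may be sketched as a two-state mixture. The Python below is a hedged illustration, not the procedure used by Wells et al.: it assumes per-residue RDC predictions are available for a pure statistical coil and for the helical state, mixes them linearly, and applies a single least-squares prefactor to all simulated values, as the caption of Figure 9.8 describes. All function names and numbers are hypothetical.

```python
def mix_two_states(rdc_coil, rdc_helix, p_helix):
    """Population-weighted average of coil and helix predictions, per residue."""
    return [(1.0 - p_helix) * c + p_helix * h for c, h in zip(rdc_coil, rdc_helix)]


def best_prefactor(simulated, experimental):
    """Single scale factor k minimising sum((k * sim - exp)^2) over all residues."""
    denom = sum(s * s for s in simulated)
    if denom == 0.0:
        raise ValueError("simulated couplings are all zero")
    return sum(s * e for s, e in zip(simulated, experimental)) / denom


def scan_helix_population(rdc_coil, rdc_helix, experimental, steps=100):
    """Grid search over the helix population; returns (population, prefactor, rss)."""
    best = None
    for i in range(steps + 1):
        p = i / steps
        sim = mix_two_states(rdc_coil, rdc_helix, p)
        k = best_prefactor(sim, experimental)
        rss = sum((k * s - e) ** 2 for s, e in zip(sim, experimental))
        if best is None or rss < best[2]:
            best = (p, k, rss)
    return best


# Synthetic check (values are invented): data built from a 30 % helix, prefactor 2.
coil = [-1.0, -2.0, -1.5, -1.0]
helix = [3.0, 4.0, 2.0, 5.0]
synthetic = [2.0 * v for v in mix_two_states(coil, helix, 0.3)]
p_fit, k_fit, rss_fit = scan_helix_population(coil, helix, synthetic)
```

A grid search is used purely for transparency. Because the prefactor and the population enter the model as a product, the two can only be separated when the coil and helix predictions are not proportional to each other.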
9.6
Conclusions
NMR is a remarkably sensitive tool for the study of intrinsically disordered proteins. In this chapter we have reviewed some of the most recent developments, concentrating in particular on the ability of RDCs to provide sensitive and agile probes of local structural propensity in IDPs. The development of appropriate methods to interpret measured couplings in terms of conformational behaviour is evolving very rapidly, as befits a nascent field of research. By combining appropriate ensemble descriptions with experimental constraints, it has been shown that unique and important information can be extracted that accurately reports on the conformational propensities of IDPs. The future success of NMR as a quantitative tool for the study of IDPs depends on the establishment of robust approaches that can unambiguously identify structural properties of IDPs directly from experimental data. If we manage to address this challenge successfully, NMR is destined to make significant and original contributions to our understanding of two relationships: that between conformational behaviour in the disordered state and molecular function and malfunction, and that between conformational interconversion timescales and kinetic rates.
References
1. Aloy, P. and Russell, R.B. (2004) Ten thousand interactions for the molecular biologist. Nat. Biotechnol., 22, 1317–1321.
2. Dyson, H.J. and Wright, P.E. (2002) Coupling of folding and binding for unstructured proteins. Curr. Opin. Struct. Biol., 12, 54–60.
3. Fuxreiter, M., Simon, I., Friedrich, P. and Tompa, P. (2008) Preformed structural elements feature in partner recognition by intrinsically unstructured proteins. J. Mol. Biol., 338, 1015–1026.
4. Uversky, V.N. (2002) Natively unfolded proteins: a point where biology waits for physics. Protein Science, 11, 739–756.
5. Tompa, P. (2002) Intrinsically unstructured proteins. TIBS, 27, 527–533.
6. Fink, A.L. (2005) Natively unfolded proteins. Curr. Opin. Struct. Biol., 15, 35–41.
7. Vacic, V., Oldfield, C.J., Mohan, A. et al. (2007) Characterization of molecular recognition features, MoRFs, and their binding partners. J. Prot. Res., 6, 2351–2366.
8. Vucetic, S., Obradovic, Z., Vacic, V. et al. (2005) DisProt: a database of protein disorder. Bioinformatics, 21, 137–140.
9. Tompa, P. and Fuxreiter, M. (2008) Fuzzy complexes: polymorphism and structural disorder in protein-protein interactions. TIBS, 33, 2–8.
10. Syme, C.D., Blanch, E.W., Holt, C. et al. (2002) A Raman optical activity study of rheomorphism in caseins, synucleins and tau. New insight into the structure and behaviour of natively unfolded proteins. Eur. J. Biochem., 269, 148–156.
11. Maiti, N.C., Apetri, M.M., Zagorski, M.G. et al. (2004) Raman spectroscopic characterization of secondary structure in natively unfolded proteins: α-synuclein. J. Am. Chem. Soc., 126, 2399–2408.
12. Denning, D.P., Uversky, V., Patel, S.S. et al. (2002) The Saccharomyces cerevisiae nucleoporin Nup2p is a natively unfolded protein. J. Biol. Chem., 277, 33447–33455.
13. Millett, I.S., Doniach, S. and Plaxco, K.W. (2002) Toward a taxonomy of the denatured state: small angle scattering studies of unfolded proteins. Adv. Protein Chem., 62, 241–262.
14. Bernadó, P., Mylonas, E., Petoukhov, M.V. et al. (2007) Structural characterization of flexible proteins using small-angle X-ray scattering. J. Am. Chem. Soc., 129, 5656–5664.
15. Jeganathan, S., von Bergen, M., Brutlach, H. et al. (2006) Global hairpin folding of tau in solution. Biochemistry, 45, 2283–2293.
16. Schuler, B. and Eaton, W. (2008) Protein folding studied by single-molecule FRET. Curr. Opin. Struct. Biol., 18, 16–26.
17. Dunker, A.K., Silman, I., Uversky, V.N. and Sussman, J. (2008) Function and structure of inherently disordered proteins. Curr. Opin. Struct. Biol., 18, 756–764.
18. Dyson, H.J. and Wright, P.E. (2004) Intrinsically unstructured proteins and their functions. Chem. Rev., 104, 3607–3622.
19. Mukrasch, M.D., Bibow, S., Korukottu, J. et al. (2009) Structural polymorphism of 441-residue Tau at single residue resolution. PLoS Biology, 7, 399–414.
20. Neri, D., Billeter, M., Wider, G. and Wuthrich, K. (1992) NMR determination of residual structure in a urea-denatured protein, the 434 repressor. Science, 257, 1559–1563.
21. Alexandrescu, A.T., Abeygunawardana, C. and Shortle, D. (1994) Structure and dynamics of a denatured 131-residue fragment of staphylococcal nuclease: a heteronuclear NMR study. Biochemistry, 33, 1063–1072.
22. Shortle, D. (1996) The denatured state (the other half of the folding equation) and its role in protein stability. FASEB J., 10, 27–34.
23. Schwalbe, H., Fiebig, K.M., Buck, M. et al. (1997) Structural and dynamical properties of a denatured protein. Heteronuclear 3D NMR experiments and theoretical simulations of lysozyme in 8M urea. Biochemistry, 36, 8977–8991.
24. Spera, S. and Bax, A. (1991) Empirical correlation between protein backbone conformation and C-alpha and C-beta C-13 NMR chemical shifts. J. Am. Chem. Soc., 113, 5490–5492.
25. Wishart, D., Sykes, B. and Richards, F. (1992) The chemical shift index: a fast and simple method for the assignment of protein secondary structure through NMR spectroscopy. Biochemistry, 31, 1647–1651.
26. Cavalli, A., Salvatella, X., Dobson, C.M. and Vendruscolo, M. (2007) Protein structure determination from NMR chemical shifts. Proc. Natl. Acad. Sci. U.S.A., 104, 9615–9620.
27. Shen, Y. et al. (2008) Consistent blind protein structure generation from NMR chemical shift data. Proc. Natl. Acad. Sci. U.S.A., 105, 4685–4690.
28. Wishart, D.S., Bigam, C.G., Holm, A. et al. (1995) H-1, C-13 and N-15 random coil NMR shifts of the common amino acids. 1. Investigations of nearest neighbour effects. J. Biomol. NMR, 5, 67–81.
29. Schwarzinger, S., Kroon, G.J.A., Foss, T.R. et al. (2001) Sequence-dependent correction of random coil NMR chemical shifts. J. Am. Chem. Soc., 123, 2970–2978.
30. Wang, Y. and Jardetzky, O. (2002) Probability-based protein secondary structure identification using combined NMR chemical-shift data. Protein Science, 11, 852–861.
31. Wang, L.Y., Eghbalnia, H.R., Bahrami, A. and Markley, J.L. (2005) Linear analysis of carbon-13 chemical shift differences and its application to the detection and correction of errors in referencing and spin system identifications. J. Biomol. NMR, 32, 13–22.
32. Marsh, J.A., Singh, V.K., Jia, Z.C. and Forman-Kay, J.D. (2006) Sensitivity of secondary structure propensities to sequence differences between alpha- and gamma-synuclein: Implications for fibrillation. Protein Science, 15, 2795–2804.
33. Serrano, L. (1995) Comparison between the φ Distribution of the Amino Acids in the Protein Database and NMR Data Indicates that Amino Acids have Various φ Propensities in the Random Coil Conformation. J. Mol. Biol., 254, 322–333.
34. Smith, L.J., Bolin, K.A., Schwalbe, H. et al. (1996) Analysis of main chain torsion angles in proteins: prediction of NMR coupling constants for native and random coil conformations. J. Mol. Biol., 255, 494–506.
35. Graf, J., Nguyen, P.H., Stock, G. and Schwalbe, H. (2007) Structure and dynamics of the homologous series of alanine peptides: a joint molecular dynamics/NMR study. J. Am. Chem. Soc., 129, 1179–1189.
36. Macura, S. and Ernst, R.R. (1980) Elucidation of cross relaxation in liquids by two-dimensional NMR-spectroscopy. Mol. Phys., 41, 95–117.
37. Klein-Seetharaman, J., Oikawa, M., Grimshaw, S.B. et al. (2002) Long-range interactions within a nonnative protein. Science, 295, 1719–1722.
38. Gillespie, J.R. and Shortle, D. (1997) Characterization of long-range structure in the denatured state of staphylococcal nuclease. I. Paramagnetic relaxation enhancement by nitroxide spin labels. J. Mol. Biol., 268, 158–169.
39. Dedmon, M.M., Lindorff-Larsen, K., Christodoulou, J. et al. (2005) Mapping long-range interactions in alpha-synuclein using spin-label NMR and ensemble molecular dynamics simulations. J. Am. Chem. Soc., 127, 476–477.
40. Bertoncini, C.W., Jung, Y.S., Fernandez, C.O. et al. (2005) Release of long-range tertiary interactions potentiates aggregation of natively unstructured α-synuclein. Proc. Natl. Acad. Sci. U.S.A., 102, 1430–1435.
41. Felitsky, D.J., Lietzow, M.A., Dyson, H.J. and Wright, P.E. (2008) Modeling transient collapsed states of an unfolded protein to provide insights into early folding events. Proc. Natl. Acad. Sci. U.S.A., 105, 6278–6283.
42. Salmon, L., Nodet, G., Ozenne, V. et al. (2010) NMR characterization of long-range order in intrinsically disordered proteins. J. Am. Chem. Soc., 132, 8407–8418.
43. Tjandra, N. and Bax, A. (1997) Direct measurement of distances and angles in biomolecules by NMR in a dilute liquid crystalline medium. Science, 278, 1111–1114.
44. Shortle, D. and Ackerman, M.S. (2001) Persistence of native-like topology in a denatured protein in 8 M urea. Science, 293, 487–489.
45. Louhivuori, M., Pääkkönen, K., Fredriksson, K. et al. (2003) On the origin of residual dipolar couplings from denatured proteins. J. Am. Chem. Soc., 125, 15647–15650.
46. Torbet, J. and Maret, G. (1978) Fibres of highly oriented PF1 bacteriophage produced in a strong magnetic field. J. Mol. Biol., 134, 843–845.
47. Hansen, M.R., Mueller, L. and Pardi, A. (1998) Tunable alignment of macromolecules by filamentous phage yields dipolar coupling interactions. Nat. Struct. Biol., 5, 1065–1074.
48. Clore, G.M., Starich, M.R. and Gronenborn, A.M. (1998) Measurement of residual dipolar couplings of macromolecules aligned in the nematic phase of a colloidal suspension of rod-shaped viruses. J. Am. Chem. Soc., 120, 10571–10572.
49. Ruckert, M. and Otting, G. (2000) Alignment of biological macromolecules in novel nonionic liquid crystalline media for NMR experiments. J. Am. Chem. Soc., 122, 7793–7797.
50. Sass, H.J., Musco, G., Stahl, S.J. et al. (2000) Solution NMR of proteins within polyacrylamide gels: Diffusional properties and residual alignment by mechanical stress or embedding of oriented purple membranes. J. Biomol. NMR, 18, 303–309.
51. Tycko, R., Blanco, F.J. and Ishii, Y. (2000) Alignment of biopolymers in strained gels: a new way to create detectable dipole–dipole couplings in high-resolution biomolecular NMR. J. Am. Chem. Soc., 122, 9340–9341.
52. Skora, L., Cho, M.K., Kim, H.Y. et al. (2006) Charge-induced molecular alignment of intrinsically disordered proteins. Angewandte Chemie International Edition, 45, 7012–7015.
53. Bax, A. (2003) Weak alignment offers new NMR opportunities to study protein structure and dynamics. Protein Science, 12, 1–18.
54. Prestegard, J.H., Bougault, C.M. and Kishore, A.I. (2004) Residual dipolar couplings in structure determination of biomolecules. Chem. Rev., 104, 3519–3540.
55. Blackledge, M. (2005) Recent advances in the use of residual dipolar couplings for the study of biomolecular structure and dynamics in solution. Progress in Nuclear Magnetic Resonance Spectroscopy, 46, 23–61.
56. Tjandra, N., Omichinski, J., Gronenborn, A.M. et al. (1997) Use of dipolar 1H-15N and 1H-13C couplings in the structure determination of magnetically oriented macromolecules in solution. Nat. Struct. Biol., 4, 732–738.
57. Clore, G.M. (2000) Accurate and rapid docking of protein–protein complexes on the basis of intermolecular nuclear Overhauser enhancement data and dipolar couplings by rigid body minimization. Proc. Natl. Acad. Sci. U.S.A., 97, 9021–9025.
58. Ortega-Roldan, J.L., Jensen, M.R., Brutscher, B. et al. (2009) Accurate characterization of weak macromolecular interactions by titration of NMR residual dipolar couplings: application to the CD2AP SH3-C:Ubiquitin complex. Nucleic Acids Research, 37, e70.
59. Meiler, J., Prompers, J., Griesinger, C. and Brüschweiler, R. (2001) Model-free approach to the dynamic interpretation of residual dipolar couplings in globular proteins. J. Am. Chem. Soc., 123, 6098–6107.
60. Clore, G.M. and Schwieters, C.D. (2004) How much backbone motion in ubiquitin is required to account for dipolar coupling data measured in multiple alignment media as assessed by independent cross-validation? J. Am. Chem. Soc., 126, 2923–2938.
61. Tolman, J. (2002) A novel approach to the retrieval of structural and dynamic information from residual dipolar couplings using several oriented media in biomolecular NMR spectroscopy. J. Am. Chem. Soc., 124, 12020–12030.
62. Briggman, K.B. and Tolman, J.R. (2003) De novo determination of bond orientations and order parameters from residual dipolar couplings with high accuracy. J. Am. Chem. Soc., 125, 10164–10165.
63. Bernadó, P. and Blackledge, M. (2004) Local dynamic amplitudes on the protein backbone from dipolar couplings: towards the elucidation of slower motions in biomolecules. J. Am. Chem. Soc., 126, 7760–7761.
64. Ulmer, T.S., Ramirez, B.E., Delaglio, F. and Bax, A. (2004) Evaluation of backbone proton positions and dynamics in a small protein by liquid crystal NMR spectroscopy. J. Am. Chem. Soc., 125, 9179–9191.
65. Bouvignies, G., Markwick, P.R.L., Brüschweiler, R. and Blackledge, M. (2006) Simultaneous determination of protein backbone structure and dynamics from residual dipolar couplings. J. Am. Chem. Soc., 128, 15100–15101.
66. Lakomek, N.A., Walter, K.F., Fares, C. et al. (2008) Self-consistent residual dipolar coupling based model-free analysis for the robust determination of nanosecond to microsecond protein dynamics. J. Biomol. NMR, 41, 139–155.
67. Salmon, L., Bouvignies, G., Markwick, P. et al. (2009) Protein conformational flexibility from structure-free analysis of NMR dipolar couplings: quantitative and absolute determination of backbone motion in ubiquitin. Angewandte Chemie International Edition, 48, 4154–4157.
68. Meier, S., Blackledge, M. and Grzesiek, S. (2008) Conformational distributions of unfolded polypeptides from novel NMR techniques. J. Chem. Phys., 128, 052204.
69. Ohnishi, S., Lee, A.L., Edgell, M.H. and Shortle, D. (2004) Direct demonstration of structural similarity between native and denatured eglin C. Biochemistry, 43, 4064–4070.
70. Ding, K., Louis, J.M. and Gronenborn, A.M. (2004) Insights into conformation and dynamics of protein GB1 during folding and unfolding by NMR. J. Mol. Biol., 335, 1299–1307.
71. Mohana-Borges, R., Goto, N.K., Kroon, G.J. et al. (2004) Structural characterization of unfolded states of apomyoglobin using residual dipolar couplings. J. Mol. Biol., 340, 1131–1142.
72. Fieber, W., Kristjansdottir, S. and Poulsen, F.M. (2004) Short-range, long-range and transition state interactions in the denatured state of ACBP from residual dipolar couplings. J. Mol. Biol., 339, 1191–1199.
73. Meier, S., Guthe, S., Kiefhaber, T. and Grzesiek, S. (2004) Foldon, the natural trimerization domain of T4 fibritin, dissociates into a monomeric A-state form containing a stable beta-hairpin: Atomic details of trimer dissociation and local beta-hairpin stability from residual dipolar couplings. J. Mol. Biol., 344, 1051–1069.
74. Alexandrescu, A.T. and Kammerer, R.A. (2003) Structure and disorder in the ribonuclease S-peptide probed by NMR residual dipolar couplings. Protein Science, 12, 2132–2140.
75. Dames, S.A., Aregger, R., Vajpai, N. et al. (2006) Residual dipolar couplings in short peptides reveal systematic conformational preferences of individual amino acids. J. Am. Chem. Soc., 128, 13508–13514.
76. Sibille, N., Sillen, A., Leroy, A. et al. (2006) NMR investigation of the interaction between the neuronal protein Tau and the microtubules. Biochemistry, 45, 12560–12572.
77. Mukrasch, M.D., Markwick, P., Biernat, J. et al. (2007a) Highly populated turn conformations in natively unfolded Tau protein identified from residual dipolar couplings and molecular simulation. J. Am. Chem. Soc., 129, 5235–5243; Mukrasch, M.D., von Bergen, M., Biernat, J. et al. (2007b) The “jaws” of the Tau-microtubule interaction. J. Biol. Chem., 282, 12230–12239.
78. Bernadó, P., Bertoncini, C.W., Griesinger, C. et al. (2005) Defining long-range order and local disorder in native α-synuclein using residual dipolar couplings. J. Am. Chem. Soc., 127, 17968–17969.
79. Sung, Y.H. and Eliezer, D. (2007) Residual structure, backbone dynamics, and interactions within the synuclein family. J. Mol. Biol., 372, 689–707.
80. Fredriksson, K., Louhivuori, M., Permi, P. and Annila, A. (2004) On the interpretation of residual dipolar couplings as reporters of molecular dynamics. J. Am. Chem. Soc., 126, 12646–12650.
81. Obolensky, O.I., Schlepckow, K., Schwalbe, H. and Solov’yov, A.V. (2007) Theoretical framework for NMR residual dipolar couplings in unfolded proteins. J. Biomol. NMR, 39, 1–16.
82. Louhivuori, M., Fredriksson, K., Paakkonen, K. et al. (2004) Alignment of chain-like molecules. J. Biomol. NMR, 29, 517–524.
83. Jha, A.K., Colubri, A., Freed, K.F. and Sosnick, T.R. (2005) Statistical coil model of the unfolded state: Resolving the reconciliation problem. Proc. Natl. Acad. Sci. U.S.A., 102, 13099–13104.
84. Bernadó, P., Blanchard, L., Timmins, P. et al. (2005) A structural model for unfolded proteins from residual dipolar couplings and small-angle x-ray scattering. Proc. Natl. Acad. Sci. U.S.A., 102, 17002–17007.
85. Marsh, J.A., Baker, J.M.R., Tollinger, M. and Forman-Kay, J.D. (2008) Calculation of residual dipolar couplings from disordered state ensembles using local alignment. J. Am. Chem. Soc., 130, 7804–7805.
86. Nodet, G., Salmon, L., Ozenne, V. et al. (2009) Quantitative description of backbone conformational sampling of unfolded proteins at amino acid resolution from NMR residual dipolar couplings. J. Am. Chem. Soc., 131, 17908–17918.
87. Lovell, S.C., Davis, I.W., Arendall, W.B. III et al. (2003) Structure validation by C alpha geometry: phi, psi and C beta deviation. Proteins, 50, 437–450.
88. Zweckstetter, M. and Bax, A. (2000) Prediction of sterically induced alignment in a dilute liquid crystalline phase: aid to protein structure determination by NMR. J. Am. Chem. Soc., 122, 3791–3792.
89. Almond, A. and Axelsen, J.B. (2002) Physical interpretation of residual dipolar couplings in neutral aligned media. J. Am. Chem. Soc., 124, 9986–9987.
90. Cho, M.-K., Kim, H.-Y., Bernadó, P. et al. (2007) Amino acid bulkiness defines the local conformations and dynamics of natively unfolded α-synuclein and Tau. J. Am. Chem. Soc., 129, 3032–3033.
91. Mandelkow, E.-M. and Mandelkow, E. (1998) Tau in Alzheimer’s disease. Trends in Cell. Biol., 8, 425–427.
92. Markwick, P.R.L., Bouvignies, G. and Blackledge, M. (2007) Exploring multiple timescale motions in protein GB3 using accelerated molecular dynamics and NMR. J. Am. Chem. Soc., 129, 4724–4730.
93. Bertoncini, C., Fernandez, C., Griesinger, C. et al. (2005) Familial mutants of alpha-synuclein with increased neurotoxicity have a destabilized conformation. J. Biol. Chem., 280, 30649–30652.
94. Meier, S., Strohmeier, M., Blackledge, M. and Grzesiek, S. (2007a) Direct observation of dipolar couplings and hydrogen bonds across a β-Hairpin in 8 M urea. J. Am. Chem. Soc., 129, 754–755; Meier, S., Grzesiek, S. and Blackledge, M. (2007b) Mapping the conformational landscape of urea-denatured ubiquitin using residual dipolar couplings. J. Am. Chem. Soc., 129, 9799–9807.
95. Meier, S., Häussinger, D., Jensen, P., Rogowski, M. and Grzesiek, S. (2003) High-accuracy residual 1HN-13C and 1HN-1HN dipolar couplings in perdeuterated proteins. J. Am. Chem. Soc., 125, 44–45.
96. Sickmeier, M., Hamilton, J.A., LeGall, T. et al. (2007) DisProt: the database of disordered proteins. Nucleic Acids Res., 35, D786–D793.
97. Sugase, K., Dyson, H.J. and Wright, P.E. (2007) Mechanism of coupled folding and binding of an intrinsically unstructured protein. Nature, 447, 1021–1025.
98. Jensen, M.R., Houben, K., Lescop, E. et al. (2008) Quantitative conformational analysis of partially folded proteins from residual dipolar couplings: application to the molecular recognition element of sendai virus nucleoprotein. J. Am. Chem. Soc., 130, 8055–8061; Jensen, M.R. and Blackledge, M. (2008) On the origin of NMR dipolar waves in transient helical elements of partially folded proteins. J. Am. Chem. Soc., 130, 11266–11267.
99. Blanchard, L., Tarbouriech, N., Blackledge, M. et al. (2004) Structure and dynamics of the nucleocapsid-binding domain of the Sendai virus phosphoprotein in solution. Virology, 319, 201–211.
100. Houben, K., Marion, D., Tarbouriech, N. et al. (2007) Interaction of the C-terminal domains of sendai virus N and P proteins: Comparison of polymerase-nucleocapsid interactions within the paramyxovirus family. J. Virology, 81, 6807–6816.
101. Wells, M., Tidow, H., Rutherford, T.J. et al. (2008) Structure of tumor suppressor p53 and its intrinsically disordered N-terminal transactivation domain. Proc. Natl. Acad. Sci. U.S.A., 105, 5762–5767.
Index
4D triple resonance 74–7, 86–87, 279–80, 302, 307
Acquisition parameters 14–16, 89–91, 106
AIR, ambiguous interaction restraints, see HADDOCK
Alignment tensor 109–11, 118–22, 286–8, 324
Apomyoglobin 325, 329
ARIA 94, 96, 163, 174, 175
AUTOASSIGN 76
AutoStructure 163, 174
Binding
  detecting 230–41
  equilibrium constant 224
    measurement 229–31, 235, 274
  kinetic mechanism 233, 241–6, 252
  location of site 253–4, 256
  rate constants 224, 233, 249
    measurement 229–238
Bloch equations 225, 232, 233
CANDID 96, 174
CCPN 76
Cell-free synthesis 25–29, 41
  protocols for 26–30, 41
Chemical shift
  anisotropy 111–12
  mapping 253–4, 285, 294–5, 299–300
  paramagnetic, see Paramagnetic effects
  prediction of 130–1
  referencing 16, 131–2
  structural calculation using 180–1
  see also CHESHIRE, CS-ROSETTA and TALOS
CHESHIRE 180
CNS 163
Contact shifts 199–200
Correlation time 8, 85, 93, 194, 195, 239, 241, 246, 253, 256
Coupling
  dipolar, see Residual dipolar coupling
  scalar, see Scalar coupling
CRINEPT, CRIPT 32, 272
Cross-relaxation, see Relaxation
Cross-saturation, see Saturation transfer
CS-ROSETTA 134–7, 180–1
CYANA 94, 96, 133, 163, 166–74, 175–8
Data collection
  acquisition parameters 14–16, 89–91, 106
  fast acquisition 16–17
  pulse calibration 13–14
  selective pulses 13–14, 86–7, 88
Data processing 17–19
  linear prediction 19
  programs
    CCPN 76
    NMRPipe 17
    SPARKY 76
  weighting functions 18–19
  zero filling 18
Denatured proteins 325–7, 333
Deuteration 30–2, 77, 93, 104, 270–2, 280–2
  fractional 30, 32
  perdeuteration 30–2, 229, 270–2, 280–2
  selective 35–6
  see also SAIL
Dipolar coupling, see Residual dipolar coupling
Dipolar relaxation, see Relaxation
  see also Paramagnetic effects
Docking complexes with NMR restraints, see HADDOCK
DYANA 163
Protein NMR Spectroscopy: Practical Techniques and Applications, First Edition. Edited by Lu-Yun Lian and Gordon Roberts. 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd.
Dynamics 36, 129, 190, 211–12, 235–8, 242–3, 245
Electronic relaxation time, τs 194, 197, 203, 256
Escherichia coli
  protein expression in 24–5
Exchange
  definitions 222–5
  effects, fundamentals of 222–8, 273–7
  lifetimes 224
  lineshape, effects on 225–7, 229, 274–7
  rate constants, measurement of 229–39
  regimes 225
    fast 230–1, 243–6, 248–9, 274–5
      problems of 242
    identification 227–8, 274–7
    intermediate 228, 274–6
    moderately fast 228, 231
    slow 229–30, 243, 274–5
  spectroscopy, 2D 233–5
  zz spectroscopy 235, 274, 276
  see also Magnetization transfer
Fast acquisition methods 16–17
Flexible Meccano 328, 331, 333, 335
FLYA 164, 182–5
HADDOCK 254, 293–6, 299–301
IDIS-NMR, isotope-discriminated NMR, see Isotope-edited and -filtered spectra INPHARMA 252–3 Intein 37–8 Intrinsically disordered proteins chemical shifts 321–2 NOEs 322 paramagnetic relaxation enhancement 322–3 residual dipolar couplings 323–40 resonance assignment 76–7, 80 scalar couplings 322 Inversion transfer 233 Isotope-edited and -filtered spectra 35, 256–9, 278–80, 303–308 Isotope labelling Ca, CaH 34–5 differential, in protein complexes 256–9, 270–2, 277–81, 284 methyl 35–6, 271–2, 281, 285
reverse 34–6, 93 RNA 304, 307–8 SAIL 38–45, 185 protocols for 41, 45 segmental 37–8 selective 32–4, 35 amino-acid type 25, 26, 32–4, 35, 42–5, 272 combinatorial 34 stereo-selective 36–7 uniform 29–32 Isotope shift 32, 33, 35, 80 ISPA, isolated spin-pair approximation 92, 94–5, 248, 250–1 J-coupling, see Scalar coupling Karplus equation 102–3 Kinetic mechanism, identification of 242–6 KNOWNOE 163, 174
Lineshape 9, 11, 13, 225–7, 229–31 Linewidth 32, 113 exchange contribution to 225–7, 229–31, 245 paramagnetic relaxation contribution to 194–6, 204, 254–6 Longitudinal relaxation rate 13, 85, 194, 246, 255–6, 272, 286 Low-populated states, observation of 211–12, 235–8 Magnetization transfer by cross-relaxation 86, 92–3 by exchange 231–8, 240–1 MARS 76 Molecular dynamics simulation 93, 140, 162–3, 164–72, 307, 323, 330–1 see also Simulated annealing Nitroxide labels 195, 201–2, 204–5, 211, 289–90 NMR timescale 225, 227, 228 NOAH 163, 174 NOEs 84–96 assignment 173–8 automated 163, 173–8 ARIA 96, 163 CYANA/CANDID 96, 163, 175–8 intermolecular 247, 256-, 278–82
measurement 86–92 rotating frame 93 structural restraints from 92–96 local motion, effects of 93–4 spin diffusion, effects of 92–3 PASD 163, 174 Peak picking 64–5, 71, 101, 161, 181–2 Paramagnetic effects alignment see Residual dipolar coupling dynamics, study of, using 211–12 ligand binding 254–6 probes 202–8 metal-binding tags 112, 205–7, 290–1 protocol for use of 207–9 metals choice of 200–4 lanthanides 112, 197, 199, 200, 201–4, 203–7, 211, 291–3 see also nitroxide labels protein-protein complexes 210–11, 289–91 relaxation Curie spin relaxation 197 dipolar 194–6, 254–6, 322–3 measurement 196–7 shifts 199–200 contact 199–200, 204 pseudocontact 166, 194, 195, 199–200, 204, 208, 291–3 PCS, pseudocontact shifts, see Paramagnetic effects Peptides, metal-binding 203, 205–6, 210 Perdeuteration see Deuteration PRE, paramagnetic relaxation enhancement, see Paramagnetic effects Protein expression, for isotope labelling 24–5 Protein-DNA complexes 301–3 Protein-protein complexes differential isotope labelling 270–2, 277–82, 282–6, 299–301, 303–9 intermolecular NOEs 278–82 protocol for titrations 273–7 saturation transfer 282–6 structure determination 293–6, 297–8, 299–301 Protein-RNA complexes 303–9 Proteins, specific acyl-CoA binding protein 326 calcium-binding proteins
paramagnetic metal substitution 203–5, 209 chloramphenicol acetyltransferase 245, 252 cytochrome c peroxidase 211 dihydrofolate reductase 36, 222–3, 226, 243, 244–5, 248–9, 258–9 enzyme I / HPr complex 297–8 glutamine-binding protein (E. coli) 115–20 IGF2 / receptor complex 299–301 LEF-1 HMG domain / DNA complex 301–3 maltose-binding protein 43 mannose-binding protein 128 nitrite reductase 210–11 nuclease (Staphylococcus) 203, 325, 329 nucleocapsid protein / RNA complex 306–9 p53 protein 339–40 plastocyanin 209, 210 proteasome 36 protein G (Streptococcus), IgG-binding domain 56 ff. S100B 226–7 selenium-binding protein (Methanococcus) 123–7 Sendai virus nucleoprotein 328–9, 335–7 SH3 domain 237, 245 SRY HMG domain / DNA complex 302–3 subtilisin inhibitor (Streptomyces) 33 α-synuclein 331–3 troponin C 244 ubiquitin 210, 245, 333–4 Pseudocontact shifts, see Paramagnetic effects Pulse calibration 13–14 Pulse sequences CBCA(CO)NH 63, 66, 70–3, 80 COSY, DQF-COSY 57–60, 99–100, 302 C(CCO)NH 80 C(CCA)NH 80 CLEANEX-PM 235 CPMG (Carr-Purcell-Meiboom-Gill) 235–8, 239 E.COSY 98–9 EXSY 233 HBHA(CO)NH 66, 73, 77, 80 HBHANH 66, 73, 77 H(CCCA)NH 80 H(CCCO)NH 80 HCCH-COSY 79 HCCH-TOCSY 77 HNCA 63, 66 HNCACB 63, 66, 68, 70–3
HN(CA)CO 63, 66, 68, 70–3 HNCO 63, 66, 68, 70–3, 104–7 HN(CO)CA 63, 66 HSQC 6, 34, 115, 133, 226–7, 253–4, 273–4, 284–5 IPAP 101–2 NOESY 59, 61–2 NOESY-HSQC 64–5, 68, 79–81, 86–7, 259, 307 protocol 87–92 PFG-STE 239 quantitative J-correlation 99–101 ROESY 93, 235, 250–1, 276 TOCSY 57–60 TOCSY-HSQC 63, 299 TROSY 31, 77, 81, 104–6, 238, 272, 280, 284–5 Random coil behaviour, deviation from 329–33 RDC, see Residual dipolar coupling Relaxation cross-relaxation 85, 86, 93, 240, 246–7, 322 see also NOE, transferred NOE dipolar 84–5, 241, 271–2, 280 dispersion 212, 235–8 matrix 85, 86, 92, 94, 240, 248, 250–1 paramagnetic see paramagnetic effects, relaxation rotating frame 93, 235, 238, 250–1, 335 spin diffusion 92–3, 241, 250–1 transferred see transferred NOE transferred cross-correlated 253 Reverse labelling 33, 34–6 Residual dipolar coupling 108–11 analysis 118–21 intrinsically disordered proteins 323–40 ligand binding 128 measurement 116–8 alignment media 113–16 paramagnetic proteins 108, 112, 197–8 protein complexes 286–9, 295–6, 297, 302, 303 structure calculation using 122–9, 179–80 structure validation using 121–2 RNA-binding domain / RNA complex 303–5 Rotational correlation time 85, 93, 194, 239, 241, 246, 253, 256, 286, 290 Resonance assignment 55 ff.
backbone assignment 59–62 computer-assisted 76 NOE-based 59–62 scalar coupling-based 57–59, 62–81 sequence-specific 59–62, 68–77 side-chain 57–9, 77–81 aromatic 60, 81 proline 74 spin system assignment 57–59, 64–8 triple resonance-based 62–81 Rotational correlation time, see Correlation time SAIL 38–45, 185 Sample additives 7 concentration 6 conditions 6–8 for RDC measurements 113–16 preparation 5–9 tubes 9–11 SAR by NMR 254 Saturation transfer 233, 240–1, 282–6 Scalar coupling across hydrogen bonds 104–7 analysis 96–7 assignment, use in 57–9, 62–81 measurement 97–102 structural restraints from 102–3, 302 Screening for binding 238–41 Simulated annealing 123–5, 172–3, 294, 298, 304 protocol for 172–3 Small Angle Neutron Scattering (SANS) see Solution scattering Small Angle X-ray Scattering (SAXS) see Solution scattering Solomon-Bloembergen equation 194–5, 256 Solution scattering 137–47 measurement 141–7 in NMR structure determination 140–1 protein-protein complexes 297–8 SPARKY 76 Spectrometer set-up locking 11 pulse calibration 13–14 shimming 12–13 tuning 11 Spin diffusion 92–3 Spin-echo 87–9, 279 see also CPMG
Spin label, see Nitroxide Spin-lattice relaxation rate, see Longitudinal relaxation rate Spin-spin relaxation rate, see Transverse relaxation rate Spin systems 57–9 STD, saturation transfer difference, see Saturation transfer Structural restraints ambiguous 163, 175 chemical shifts see CS-ROSETTA and TALOS distance 94–6 see NOE hydrogen-bond 103–7 orientation see Residual dipolar coupling torsion angle see Scalar coupling see also TALOS Structure calculation assignment-free 178–9 automated 163–4, 182–5 chemical shift-based 180–1 distance geometry 161–2 programs 163 residual dipolar coupling-based 179–80 simulated annealing 162, 172–3 target functions 162, 165–6 torsion angle dynamics 162–3, 166–72
T1, spin-lattice relaxation time see Longitudinal relaxation rate T2, spin-spin relaxation time see Transverse relaxation rate τc, see Correlation time τM, exchange lifetime 255–6 τr, see Correlation time τs, see Electronic relaxation time TALOS 42, 130, 132–4, 180 Tau microtubule protein 329–31 Titrations, ligand & protein binding 8–9, 226–8, 229, 244, 245, 273–7, 288, 300, 303 Transferred NOE (trNOE) 246–53 exchange requirements 248–9 interligand 251–3 spin diffusion 250–1 Transverse relaxation rate 15, 30, 35, 39, 104, 195, 196, 226, 230–1, 239, 272, 277, 279, 290 measurement 87–9 Translational mobility, measurement of 239 Two-spin approximation 92, 95 Unstructured proteins see Intrinsically disordered proteins Triple resonance experiments 62–81 TROSY 31–2, 77, 104, 280, 285 WaterLOGSY 241
XPLOR and XPLOR-NIH 163, 165