Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques (Methods in Molecular Biology Vol 17)


Author: Christopher Jones | Barbara Mulloy | Adrian H. Thomas


Methods in Molecular Biology
John M. Walker, SERIES EDITOR

17. Spectroscopic Methods and Analyses, edited by Christopher Jones, Barbara Mulloy, and Adrian H. Thomas, 1993
16. Enzymes of Molecular Biology, edited by Michael M. Burrell, 1993
15. PCR Protocols, edited by Bruce A. White, 1993
14. Glycoprotein Analysis in Biomedicine, edited by Elizabeth F. Hounsell, 1993
13. Protocols in Molecular Neurobiology, edited by Alan Longstaff and Patricia Revest, 1992
12. Pulsed-Field Gel Electrophoresis, edited by Margit Burmeister and Levy Ulanovsky, 1992
11. Practical Protein Chromatography, edited by Andrew Kenney and Susan Fowell, 1992
10. Immunochemical Protocols, edited by Margaret M. Manson, 1992
9. Protocols in Human Molecular Genetics, edited by Christopher G. Mathew, 1991
8. Practical Molecular Virology, edited by Mary K. L. Collins, 1991
7. Gene Transfer and Expression Protocols, edited by Edward J. Murray, 1991
6. Plant Cell and Tissue Culture, edited by Jeffrey W. Pollard and John M. Walker, 1990
5. Animal Cell Culture, edited by Jeffrey W. Pollard and John M. Walker, 1990
4. New Nucleic Acid Techniques, edited by John M. Walker, 1988
3. New Protein Techniques, edited by John M. Walker, 1988
2. Nucleic Acids, edited by John M. Walker, 1984
1. Proteins, edited by John M. Walker, 1984

Spectroscopic Methods and Analyses NMR, Mass Spectrometry, and Metalloprotein Techniques

Edited by

Christopher Jones, Barbara Mulloy, and Adrian H. Thomas
National Institute for Biological Standards and Control, South Mimms, Potters Bar, UK

Humana Press

Totowa, New Jersey

© 1993 Humana Press Inc.

999 Riverview Drive, Suite 208
Totowa, New Jersey 07512

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise without written permission from the Publisher.

Photocopy Authorization Policy: Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by The Humana Press Inc., provided that the base fee of US $3.00 per copy, plus US $00.20 per page is paid directly to the Copyright Clearance Center at 27 Congress Street, Salem, MA 01970. For those organizations that have been granted a photocopy license from the CCC, a separate system of payment has been arranged and is acceptable to The Humana Press Inc. The fee code for users of the Transactional Reporting Service is: [0-89603-215-9/93 $3.00 + $00.20].

Printed in the United States of America

Library of Congress Cataloging in Publication Data
Main entry under title: Methods in molecular biology.
Spectroscopic methods and analyses: NMR, mass spectrometry, and metalloprotein techniques / edited by Christopher Jones, Barbara Mulloy, and Adrian H. Thomas.
p. cm. - (Methods in molecular biology ; 17)
Includes index.
ISBN 0-89603-215-9
1. Nuclear magnetic resonance spectroscopy. 2. Mass spectrometry. 3. Proteins-Analysis. 4. Metalloproteins-Analysis. 5. Glycoproteins-Analysis. I. Jones, Christopher, 1954- . II. Mulloy, Barbara. III. Thomas, A. H. (Adrian H.) IV. Series: Methods in molecular biology (Totowa, NJ) ; 17.
[DNLM: 1. Metalloproteins--analysis. 2. Nuclear Magnetic Resonance. 3. Spectrum Analysis. W1 ME9616J v.17 1993 / QD 96.N8 S741 1993]
QP519.9.N83S74 1993
574.19'285--dc20
DNLM/DLC 92-48555 for Library of Congress CIP

Preface

The three volumes in Methods in Molecular Biology covering Physical Methods of Analysis (vol. 1, Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques; vol. 2, Optical Spectroscopy and Macroscopic Techniques; vol. 3, Crystallographic Methods and Techniques) differ from others in this series in several ways. Each volume covers a group of techniques for the characterization of biological molecules and their interactions that involve the application of modern techniques of physical chemistry. These techniques by and large do not lend themselves to the "hands-on" approach and cannot usually be carried out by the molecular biologist alone, but most often require collaboration with a specialist. The biologist or biochemist contemplating such a collaboration may feel somewhat at a distance from the experimental work and further isolated by the use of the jargons of analytical and physical chemistry.

Physical methods have been used in molecular biology from the earliest days, from simple applications of optical spectroscopy to the complexity of X-ray crystallography, and the full range of these methods will be covered over the three volumes. The methods dealt with in this first volume have largely developed from beginnings in small molecule chemistry to the point where they play a valuable role in the characterization of biological macromolecules.

There are three groups of techniques covered here: first, nuclear magnetic resonance spectroscopy and its applications to proteins, peptides, nucleic acids, and carbohydrates; second, mass spectrometry and the soft ionization techniques that allow it to be applied to biological molecules; and third, a variety of techniques that can be used to characterize the metal center in metalloproteins. All these techniques require sophisticated and expensive equipment that will not be found in every laboratory, and all need specialist expertise to optimize experimental methods and fully interpret the data.


The authors of the individual chapters in this initial volume, Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques, all specialize in the application of these techniques to biological problems, and they have been asked to highlight the practical aspects their biological colleagues need to take into account when designing a multidisciplinary project. This includes information on the kinds of problems that can fruitfully be tackled; what commitment of time and expense is involved; how much sample is required and how pure it should be; and an overview of the information that can be obtained.

We hope that the material in this volume will prove useful to molecular biologists at all levels, and help smooth the daily working collaborations between scientists across the many different disciplines that play an ever-increasing part in modern biological studies. We are particularly grateful for the help of Robin Wait in the assembly and editing of the chapters on mass spectrometry.

Christopher Jones Barbara Mulloy Adrian H. Thomas

Contents

Preface
Contributors

PART I. NMR SPECTROSCOPY
CH. 1. Introduction to Nuclear Magnetic Resonance, Christopher Jones and Barbara Mulloy
CH. 2. Structural Studies of Proteins in Solution Using Proton Nuclear Magnetic Resonance, David Neuhaus and Philip A. Evans
CH. 3. Peptide Structure Determination by NMR, Michael P. Williamson
CH. 4. High-Resolution NMR of DNA and Drug-DNA Interactions, Jill Barber, Helen F. Cross, and John A. Parkinson
CH. 5. Structural Characterization of the Carbohydrate Moieties of Glycoproteins by High-Resolution ¹H-NMR Spectroscopy, Herman van Halbeek
CH. 6. The Application of Nuclear Magnetic Resonance to Structural Studies of Polysaccharides, Christopher Jones and Barbara Mulloy
CH. 7. Dynamic and Exchange Processes in Macromolecules Studied by NMR Spectroscopy, Lu-Yun Lian

PART II. MASS SPECTROMETRY
CH. 8. Introduction to Mass Spectrometry, Robin Wait
CH. 9. Laser Desorption Ionization Mass Spectrometry of Bioorganic Molecules, Michael Karas and Ute Bahr
CH. 10. 252-Californium Plasma Desorption Time-of-Flight Mass Spectrometry of Peptides and Proteins, Peter Roepstorff
CH. 11. Fast Atom Bombardment Mass Spectrometry of Peptides, Robin Wait
CH. 12. Tandem Mass Spectrometry, Catherine E. Costello

PART III. METALLOPROTEIN TECHNIQUES
CH. 13. Mössbauer Spectroscopy, Dominic P. E. Dickson
CH. 14. Electron Paramagnetic Resonance Spectroscopy of Metalloproteins, Richard Cammack
CH. 15. Resonance Raman Spectroscopy of Metalloproteins Using CW Laser Excitation, Roman S. Czernuszewicz
CH. 16. The Application of X-Ray Absorption Spectroscopy to Characterize Metal Centers in Proteins, C. David Garner

Index

Contributors

UTE BAHR, Institute of Medical Physics, University of Münster, Germany
JILL BARBER, Department of Pharmacy, University of Manchester, UK
RICHARD CAMMACK, Division of Biomolecular Sciences, King's College London, UK
CATHERINE E. COSTELLO, Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA
HELEN F. CROSS, Department of Pharmacy, University of Manchester, UK
ROMAN S. CZERNUSZEWICZ, Department of Chemistry, University of Houston, TX
DOMINIC P. E. DICKSON, Department of Physics, University of Liverpool, UK
PHILIP A. EVANS, Department of Biochemistry, University of Cambridge, UK
C. DAVID GARNER, Department of Chemistry, University of Manchester, UK
CHRISTOPHER JONES, National Institute for Biological Standards and Control, Potters Bar, UK
MICHAEL KARAS, Institute of Medical Physics, University of Münster, Germany
LU-YUN LIAN, University of Leicester, UK
BARBARA MULLOY, National Institute for Biological Standards and Control, Potters Bar, UK
DAVID NEUHAUS, MRC Laboratory of Molecular Biology, Addenbrookes Hospital Site, Cambridge, UK
JOHN A. PARKINSON, Department of Chemistry, University of Edinburgh, UK
PETER ROEPSTORFF, Department of Molecular Biology, Odense University, Odense, Denmark
HERMAN VAN HALBEEK, Complex Carbohydrate Research Center, University of Georgia, Athens, GA
ROBIN WAIT, Division of Pathology, Public Health Laboratory Service Centre for Applied Microbiology and Research, Salisbury, UK
MICHAEL P. WILLIAMSON, Department of Molecular Biology and Biotechnology, University of Sheffield, UK

CHAPTER 1

Introduction to Nuclear Magnetic Resonance

Christopher Jones and Barbara Mulloy

From: Methods in Molecular Biology, Vol. 17: Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques. Edited by C. Jones, B. Mulloy, and A. H. Thomas. Copyright © 1993 Humana Press Inc., Totowa, NJ

1. Introduction

This brief guide is not intended as a full explanation of the theory and practice of nuclear magnetic resonance (NMR), on which there are a large number of excellent texts (1-3), but as an introduction to the terms used in the subsequent chapters. The section as a whole does not provide a comprehensive outline of the NMR of organic compounds, which would be out of place in this volume, but is a selection of particular applications likely to be of use to molecular biologists and biochemists.

Over the last few years, the number of publications dealing with NMR determinations of protein and peptide conformation in solution has increased dramatically, and this is reflected in the amount of space given here to the subject in Chapters 2 and 3. The use of NMR in the study of internal mobility in proteins and in interactions between molecules is covered in Chapter 7. Chapters 5 and 6 deal with structural studies on complex carbohydrates, which have thrived on recent advances in NMR. Nucleic acids and their interactions are covered in Chapter 4.

2. Basics of NMR

When the sample is placed in a magnetic field, the nuclei of some of its constituent atoms (usually ¹H, but ¹³C, ¹⁵N, ¹⁹F, ³¹P, and ²H are also commonly encountered in biomedical research) are forced into alignment with the field. In this state, the absorption or emission of electromagnetic radiation with a suitable, resonant frequency becomes possible. The frequency of the absorbed energy is directly proportional to the strength of the magnetic field, so the resonance condition can be achieved either by scanning the frequency of the electromagnetic radiation at constant field (as effectively happens in modern Fourier transform [FT] spectrometers) or by scanning the magnetic field at a constant irradiation frequency (as usually happened in older continuous-wave [CW] instruments). The nomenclature of NMR is complicated by the fact that both options are enshrined in the terminology independently of the experimental setup used.

3. Fourier Transform NMR

The older CW instruments utilized a monochromatic irradiation frequency and observed an absorption spectrum, whereas FT machines use a broad-band pulse of radiation to equalize the populations of the high- and low-energy states and then observe an emission spectrum. This pulse methodology has proven extremely powerful in its practical application, and the subsequent chapters assume that such instruments are available. A significant advantage of the FT method is that, since the whole spectrum is acquired in a few seconds following a single pulse, data from many such acquisitions may be added together to give much improved signal-to-noise ratios and sensitivity (the signal-to-noise ratio grows as the square root of the number of acquisitions added). Without this advantage, the use of relatively insensitive nuclei, such as ¹³C, in studies of biological samples would be impossible.

4. The 1D Spectrum

The NMR peak as usually seen in the one-dimensional (1D) spectrum can be characterized by four basic parameters: the frequency (or field) at which resonance occurs, the intensity of the peak, couplings to other nuclei as revealed by the multiplicity of the peak, and a series of parameters, such as linewidth, based on relaxation behavior (see Fig. 1).

4.1. Chemical Shift

The resonance frequency is usually quoted as the difference, in parts per million, from that of a reference standard arbitrarily set to 0 ppm. The advantage of this scale is that it is independent of the base operating frequency of the instrument (i.e., the strength of the field produced by the magnet), and is equally applicable to field or frequency scanning. The primary standard chosen for ¹H work, tetramethylsilane (TMS), is more strongly shielded than most other ¹H nuclei, so at constant field it resonates at lower frequency than they do (equivalently, at higher field when the field is scanned), and the ppm scale is conventionally plotted increasing from right to left. The ends of the spectrum are often referred to as high field (low frequency, small ppm values) or low field (high frequency, large ppm values). Individual ¹H nuclei resonate at different frequencies, because they are shielded from the applied magnetic field by the electrons around them. Thus, an effect that moves a resonance to low field corresponds to an additional deshielding of the nucleus, and movement to high field is a shielding (see Fig. 2).

Fig. 1. Four parameters that can be measured from the 1D NMR spectrum of a biological macromolecule. (A) Expansions of the 500-MHz NMR spectrum of the meningococcal type A polysaccharide (recorded at 343 K = 70°C). (B) A further expansion of the H2 doublet. The frequency at which resonance occurs is the chemical shift; the intensity of the peak is usually measured by integration; the multiplicity of the signal reflects spin-spin coupling, and the difference between the frequencies of the two lines gives the spin-spin coupling constant; the linewidth is related to the rate at which the nucleus relaxes from its excited state.

Fig. 2. The chemical shift scale for protons in a 500-MHz spectrometer (in which the magnetic field is 11.744 T). The frequency of the tetramethylsilane (TMS) resonance is taken as an arbitrary reference point, and other frequencies are expressed in terms of parts per million (ppm) of this frequency.

Aromatic systems with a cyclic π-electron system generate a magnetic field of their own when placed in an external field, which affects the chemical shift of nearby nuclei in a manner that depends on the geometry of the system. This is referred to as ring-current anisotropy.

In a given magnetic field, different types of nuclei resonate at different frequencies, and instruments are usually described by their proton frequency (e.g., a 500-MHz instrument). In such an instrument ¹³C nuclei resonate near 125 MHz and ³¹P nuclei near 202 MHz. Thus, the individual spectra are widely separated in frequency. The typical widths of the spectra of different elements can be very different, too: most of the ¹H spectrum is between 0 and 10 ppm, whereas the ¹³C spectrum occupies 200 ppm. The chemical shift of a resonance is dependent primarily on its local chemical environment and less critically on geometric factors.

4.2. Intensity

With reasonable care in the choice of experimental conditions, the intensity (integral) of a resonance is proportional to the number of nuclei contributing to it. Thus, integration of parts of the spectrum can be used to show how many of each type of nucleus are present within a complex molecule, or to quantify the amounts of two or more distinct chemical entities.

4.3. Coupling

A single resonance may be split into several separate lines by the influence of the spins of nearby nuclei. This interaction, called J- or scalar-coupling, is small and independent of the applied magnetic field (and so is quoted in Hertz), and operates through chemical bonds. The magnitude of the splitting depends on the number of chemical bonds involved and the geometry of the interaction. The most commonly measured and used coupling constant is that between ¹H nuclei separated by three bonds (written as ³J_H,H), which depends on the dihedral angle about the central (usually C-C) bond.
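The contrast between the two scales met so far (chemical shifts quoted in ppm so that they do not depend on the magnet, couplings quoted in Hz because they do not change with field) can be illustrated with a little arithmetic. The following is a minimal sketch and is not taken from the chapter; the function names, the two spectrometer frequencies, and the shift and coupling values are hypothetical and chosen only for illustration. The last function computes the ratio of shift difference (in Hz) to coupling constant, which is the quantity that decides how weakly or strongly two coupled spins behave.

```python
def ppm_to_hz(delta_ppm, spectrometer_mhz):
    """Frequency offset from the reference, in Hz.

    One ppm of a spectrometer frequency given in MHz is that same number in Hz,
    so 1 ppm is 500 Hz on a 500-MHz instrument.
    """
    return delta_ppm * spectrometer_mhz


def doublet_separation_ppm(j_hz, spectrometer_mhz):
    """Separation of the two lines of a doublet when read off the ppm axis."""
    return j_hz / spectrometer_mhz


def shift_difference_over_j(delta_a_ppm, delta_b_ppm, j_hz, spectrometer_mhz):
    """Ratio of the shift difference (converted to Hz) to the coupling constant."""
    return abs(ppm_to_hz(delta_a_ppm - delta_b_ppm, spectrometer_mhz)) / j_hz


if __name__ == "__main__":
    # Hypothetical proton frequencies; shifts and couplings are made-up values.
    for mhz in (250.0, 500.0):
        print(
            f"{mhz:.0f} MHz: 10 ppm = {ppm_to_hz(10.0, mhz):.0f} Hz; "
            f"a 9 Hz doublet spans {doublet_separation_ppm(9.0, mhz):.3f} ppm; "
            f"shift difference / J = "
            f"{shift_difference_over_j(3.80, 3.75, 12.0, mhz):.2f}"
        )
```

On these assumptions, doubling the field doubles the separation in Hz between two resonances with fixed ppm values, whereas a coupling of fixed size in Hz occupies half as much of the ppm axis.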
The dependence of the three-bond coupling on dihedral angle can be fitted to an equation, called the Karplus curve, relating the coupling (typically 1-10 Hz) to the dihedral angle between the two C-H bonds, although the coefficients depend on the nature of other substituents on the carbon skeleton. An example, showing a Karplus curve characterized for the fragment H-N-Cα-H in peptides, is shown in Fig. 3 (4).

$$ {}^{3}J_{\mathrm{C\alpha H,NH}} = 2.17 - 1.27\cos\theta + 5.41\cos^{2}\theta $$

Fig. 3. The Karplus relationship between the three-bond proton-proton coupling constant (³J_H,H) between the α and NH protons of amino acids in peptides, and the dihedral angle θ between the C-H and N-H bonds, as characterized in ref. 4. The curve is symmetrical about 180°.

Because the difference in energy caused by this coupling is small, the populations of the two levels are only slightly different. Thus, the two peaks in a resonance (split into a doublet) have almost identical intensities, although the situation may appear more complex if a resonance is coupled to more than one other nucleus. Conventionally, the signal arising from a particular type of nucleus is usually described as a single resonance even if it is split by scalar coupling. When two coupled resonances have very similar chemical shifts, multiplets become distorted, and it is no longer possible to measure coupling constants directly from the spectrum. This is “strong coupling.” Spectroscopists use an alphabetical convention to describe systems of coupled spins, in which adjacent letters in the alphabet denote strong coupling (an AB system has two strongly coupled spins) and distant letters in the alphabet denote weak coupling (an AMX system has three weakly coupled spins). ¹³C spectra would be greatly complicated by coupling to protons, except for the fact that they are usually recorded with some form of broad-band irradiation of the frequencies absorbed by protons, which effectively decouples the proton and carbon nuclei.

4.4. Relaxation

The equalization of the population of the nuclear spin states caused by the RF pulse (in modern instruments) creates a high-energy system that relaxes back to thermal equilibrium by a variety of mechanisms, and analysis of the relaxation rates and pathways provides a great deal of information about the geometry and dynamics of the system. The characteristic rates (R1, R2) of relaxation resulting from these mechanisms, or their reciprocal relaxation times (T1 = 1/R1, T2 = 1/R2), are important not only as data, but because their values determine optimal conditions for the acquisition of spectra. There are many NMR experiments for which it is important that a delay between pulses is incorporated sufficient to give effectively full relaxation. The spin-lattice (or longitudinal) relaxation time (T1) cannot be deduced from a simple 1D spectrum, but must be measured in a separate experiment, usually the inversion recovery experiment. The spin-spin (or transverse) relaxation time (T2) can be estimated from the width of lines in the 1D spectrum or more accurately measured by special experiments, usually based on the “spin-echo” experiment. These experiments are described in ref. 2.

The nuclear Overhauser enhancement (NOE) is also a relaxation phenomenon. Nuclei close to each other in space transfer energy to each other during relaxation, and the extent of this transfer is related to the distance between the nuclei. The NOE from protons to their attached carbons is conveniently exploited to increase the intensity of the ¹³C spectrum, by irradiating the sample at a frequency absorbed by protons while the ¹³C spectrum is accumulated (a routine measure in any case to remove multiplicity in the carbon spectrum owing to J-coupling between protons and ¹³C). The NOE between two protons can be used to estimate the distance between them and is of great importance in the determination of three-dimensional structures of biological macromolecules (see Chapter 2).
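The T1 and T2 measurements described above rest on simple exponential relations that are easy to evaluate. The sketch below is not from the chapter. It assumes the standard textbook forms: single-exponential recovery M(τ) = M0(1 - 2 exp(-τ/T1)) after an ideal inversion pulse, and a Lorentzian line whose width at half height is 1/(πT2). The function names and the numerical values are hypothetical.

```python
import math


def inversion_recovery(tau, t1, m0=1.0):
    """Magnetization a time tau after an ideal inversion pulse."""
    return m0 * (1.0 - 2.0 * math.exp(-tau / t1))


def t1_from_null(tau_null):
    """T1 estimated from the delay at which the inverted signal passes through zero."""
    return tau_null / math.log(2.0)


def t2_from_linewidth(linewidth_hz):
    """T2 estimated from the width at half height of a Lorentzian line.

    This is only an estimate: magnet inhomogeneity also broadens lines,
    so the true T2 may be longer than the value returned here.
    """
    return 1.0 / (math.pi * linewidth_hz)


def fraction_recovered(delay, t1):
    """Fraction of equilibrium magnetization regained after saturation."""
    return 1.0 - math.exp(-delay / t1)


if __name__ == "__main__":
    t1 = 1.5  # seconds, a made-up value
    for tau in (0.0, 0.5, t1 * math.log(2.0), 3.0, 7.5):
        print(f"tau = {tau:5.2f} s  M/M0 = {inversion_recovery(tau, t1):+.3f}")
    print(f"T2 for a 2 Hz wide line: {t2_from_linewidth(2.0):.3f} s")
    print(f"recovery after 5 x T1: {fraction_recovered(5 * t1, t1):.4f}")
```

Under these assumptions, a recycle delay of about five times T1 restores more than 99% of the equilibrium magnetization, which is one way of putting a number on “effectively full relaxation.”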


All these relaxation parameters are strongly influenced by the mobility in solution and, hence, the molecular size of the compound of interest. For large molecules, T1 and T2 are reduced, and the interproton NOEs become negative in sign. The magnitude of all three of these relaxation effects can be expressed in terms of the correlation time (τc), which is itself a characteristic of the rate of random reorientation of a molecule in solution. In NMR studies of biological molecules, it is usually assumed that relaxation by a single mechanism, known as dipolar relaxation, takes place between directly bonded nuclei with magnetic spins. There are other relaxation mechanisms that become important in specific circumstances, for example, relaxation via a paramagnetic nucleus (see Chapter 2, Section 7).

5. Transfer of Magnetization

Transfer of magnetization from one resonance in the spectrum to another may be the result of mechanisms other than the NOE. If a nucleus is involved in a chemical reaction while it is excited, it will take its remaining magnetization with it to its new environment. This mechanism, known as chemical exchange, can be used to study the reaction concerned (see Chapter 7). In systems where both NOEs and chemical exchange are taking place, it can be difficult to tell them apart without the use of elaborate two-dimensional (2D) techniques.

For experiments of this kind, it is necessary to irradiate one resonance and observe the results elsewhere in the spectrum. FT spectrometers use a very short, intense pulse of RF radiation (a “hard pulse”) that has a bandwidth (related to the reciprocal of the pulse length) considerably wider than the spectral width. Continuous irradiation at the desired frequency can be used to saturate a single resonance. This is the method used in saturation transfer experiments and sometimes for measurement of NOEs. For some experiments, however, it is necessary to deliver a pulse to an individual resonance, for example, where selective inversion (of the spin of a particular nucleus) is required. This can be done using a relatively long, low-power pulse (a “soft pulse”) that gives excitation of a narrow bandwidth, but is otherwise identical to the “hard pulse.” The selectivity of the pulse can be further improved (5,6) if, instead of a square pulse in which the transmitter is switched on at the final power, the pulse is “shaped” as a Gaussian or half-Gaussian (Fig. 4).


Fig. 4. Shaped pulses. (A) A simple square pulse; a long, “soft” pulse (i.e., of low power) with this shape will excite one part of the spectrum selectively, but will cause artifacts. (B) A Gaussian pulse will be as selective as the square pulse and avoid some of the artifacts. There are many other possible shapes for pulses.
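The link between pulse length, pulse shape, and excited bandwidth can be explored numerically. The sketch below is not from the chapter. It assumes the small-flip-angle (linear response) approximation, in which the excitation profile is taken to be the magnitude of the Fourier transform of the pulse envelope; real pulses with large flip angles deviate from this. It uses NumPy, and the function names, the pulse lengths (10 µs “hard”, 10 ms “soft”), and the 1% truncation level of the Gaussian are hypothetical choices.

```python
import numpy as np


def excitation_profile(envelope, dt, n_fft=1 << 20):
    """Normalized |FT| of a pulse envelope (linear-response approximation)."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft(envelope, n=n_fft)))
    freqs = np.fft.fftshift(np.fft.fftfreq(n_fft, d=dt))
    return freqs, spectrum / spectrum.max()


def square_pulse(duration, dt):
    """Rectangular envelope: transmitter switched straight to full power."""
    return np.ones(int(round(duration / dt)))


def gaussian_pulse(duration, dt, truncation=0.01):
    """Gaussian envelope cut off where it has fallen to `truncation` of its peak."""
    n = int(round(duration / dt))
    t = (np.arange(n) - (n - 1) / 2.0) * dt
    sigma = (duration / 2.0) / np.sqrt(2.0 * np.log(1.0 / truncation))
    return np.exp(-(t ** 2) / (2.0 * sigma ** 2))


def half_height_width(freqs, profile):
    """Full width (Hz) of the region where the profile is at least half its maximum."""
    above = freqs[profile >= 0.5]
    return above.max() - above.min()


def max_sidelobe(freqs, profile, beyond_hz):
    """Largest relative excitation further than `beyond_hz` from resonance."""
    return profile[np.abs(freqs) > beyond_hz].max()


if __name__ == "__main__":
    dt = 1e-6  # 1 microsecond time step
    pulses = {
        "hard square, 10 us": square_pulse(10e-6, dt),
        "soft square, 10 ms": square_pulse(10e-3, dt),
        "soft Gaussian, 10 ms": gaussian_pulse(10e-3, dt),
    }
    for name, envelope in pulses.items():
        freqs, profile = excitation_profile(envelope, dt)
        print(f"{name}: width at half height = "
              f"{half_height_width(freqs, profile):.0f} Hz, "
              f"max excitation beyond 500 Hz = "
              f"{max_sidelobe(freqs, profile, 500.0):.4f}")
```

In this simplified picture the half-height width scales as the reciprocal of the pulse length, and the square soft pulse retains sidelobes far from resonance that the Gaussian of the same length largely avoids, at the cost of a somewhat broader central lobe.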

6. 2D NMR Experiments

Modern spectrometers using pulse methods do not record the spectrum directly, but rather record an interferogram of magnetization vs time, which is digitized and Fourier transformed to a spectrum of magnetization vs frequency (Fig. 5). Excitation of the sample need not be by a single pulse, and multipulse sequences have been introduced to give a wide variety of informative experiments, including the 2D methods. If a delay between the pulses is introduced and a series of spectra are collected at various values of this delay, a second FT of intensity vs incremented delay generates a second frequency axis. This is the basis of 2D NMR. The spectra are usually plotted as a contour map with intensity as the z axis. The most common series of 2D spectra has the original spectrum occupying the diagonal (frequency 1 = frequency 2) and a number of off-diagonal peaks with frequency coordinates connecting two peaks in the original spectrum. The position and intensity of these peaks generate additional information and extend the power of the NMR methods. Chapter 2 in this section gives a more detailed account of the principles behind 2D NMR spectroscopy, particularly of 2D NOE spectroscopy or NOESY; here we give an overview of the range of methods available and the information they provide.

Fig. 5. (A) An interferogram of magnetization vs time (or free induction decay [FID]) recorded on a pulse FT NMR spectrometer. This is digitized and Fourier transformed to give (B) a spectrum of magnetization vs frequency (in this case a ¹³C spectrum). The two panels are related by the Fourier transform $F(\omega) = \int_{-\infty}^{\infty} f(t)\exp(-i\omega t)\,dt$.

1D NMR spectra are usually recorded in the “phase-sensitive” mode, which is to say that the real and imaginary data points resulting from the FT are used to distinguish between absorbance and dispersion components of the spectrum. 2D spectra are sometimes recorded in the power mode, by taking the square of all the data points; this is economical in time and in computing power and memory, but does not give as good resolution and line shape as the phase-sensitive method.

Correlation spectra:

COSY (Correlation SpectroscopY): In these spectra, crosspeaks are located at the frequencies of resonances that are spin-coupled. They allow specific resonances to be assigned to individual protons by building up a network of coupling connectivities (and hence bonded atoms). Figure 6 shows an example of this kind of spectrum and its relationship to the structure of a simple molecule. An extension of this experiment, the relayed COSY, generates crosspeaks where two resonances couple to a common partner and is valuable when extensive spectral overlap makes assignment difficult.

HOHAHA (HOmonuclear HArtmann-HAhn) and TOCSY (Total Correlation SpectroscopY) generate essentially the same information using different spin physics, although the degree of relay depends on the length of a “spin-lock” pulse rather than additional pulses in the sequence. The individual lines making up the fine structure of a crosspeak in a phase-sensitive COSY spectrum are antiphase, with some positive and some negative. As the linewidth of these components approaches the separation between them, cancellation can occur, reducing sensitivity. In HOHAHA and TOCSY experiments, the fine structure is in phase, and cancellation does not occur.

NOESY (NOE SpectroscopY) generates crosspeaks at the frequencies of resonances that are close in space within the molecule, rather than linked through covalent bonding.

ROESY (Rotating frame nOE SpectroscopY) generates similar information, but the dependence of the magnitude of the NOEs on molecular motion is different and spin diffusion is less of a problem, although other artifacts occur.
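The way a second Fourier transform over the incremented delay produces a second frequency axis, with the original spectrum on the diagonal and crosspeaks off it, can be mimicked with a purely numerical toy. The sketch below is not from the chapter and contains no spin physics: in a real experiment the crosspeaks arise from coupling or NOE transfer during the pulse sequence, whereas here the diagonal and off-diagonal components are simply put into the synthetic signal by hand. It uses NumPy; all names, frequencies, amplitudes, and the decay constant are hypothetical.

```python
import numpy as np


def toy_2d_signal(peaks, n1, n2, dwell, decay_s):
    """Synthetic time-domain data: rows follow the incremented delay (t1),
    columns follow the acquisition time (t2).

    `peaks` is a list of (f1_hz, f2_hz, amplitude) tuples.
    """
    t1 = np.arange(n1)[:, None] * dwell
    t2 = np.arange(n2)[None, :] * dwell
    signal = np.zeros((n1, n2), dtype=complex)
    for f1, f2, amplitude in peaks:
        signal += amplitude * np.exp(2j * np.pi * (f1 * t1 + f2 * t2))
    # Exponential decay in both time dimensions gives the peaks a finite width.
    return signal * np.exp(-(t1 + t2) / decay_s)


def spectrum_2d(signal, dwell):
    """Magnitude spectrum after Fourier transformation in both dimensions."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(signal)))
    f1 = np.fft.fftshift(np.fft.fftfreq(signal.shape[0], d=dwell))
    f2 = np.fft.fftshift(np.fft.fftfreq(signal.shape[1], d=dwell))
    return f1, f2, spec / spec.max()


def intensity_at(f1, f2, spec, f1_hz, f2_hz):
    """Intensity at the grid point nearest to (f1_hz, f2_hz)."""
    row = np.argmin(np.abs(f1 - f1_hz))
    col = np.argmin(np.abs(f2 - f2_hz))
    return float(spec[row, col])


if __name__ == "__main__":
    dwell = 1e-3             # 1 ms sampling interval, i.e., 1000 Hz spectral width
    fa, fb = 125.0, -250.0   # two made-up resonance offsets in Hz
    peaks = [
        (fa, fa, 1.0), (fb, fb, 1.0),  # diagonal peaks
        (fa, fb, 0.5), (fb, fa, 0.5),  # crosspeaks, inserted by hand
    ]
    data = toy_2d_signal(peaks, 256, 256, dwell, decay_s=0.1)
    f1, f2, spec = spectrum_2d(data, dwell)
    for x, y in [(fa, fa), (fb, fb), (fa, fb), (fb, fa), (fa, 0.0)]:
        print(f"({x:7.1f}, {y:7.1f}) Hz -> {intensity_at(f1, f2, spec, x, y):.3f}")
```

Plotting `spec` as a contour map against `f2` and `f1` would give the familiar layout of a diagonal with symmetrically placed crosspeaks; the last position printed is an empty region of the map, included for comparison.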
The ROESY experiment uses a spin-locking pulse, and so is related to the HOHAHA and TOCSY experiments.

1D versions of many of these experiments can be performed where selective irradiation of a resonance is possible. These experiments generate a spectrum looking like a 2D cross-section, but can be obtained with the very high digitization needed for accurate measurement of coupling constants. These 1D equivalents are usually just called 1D-COSY, and so on, although the ROESY equivalent is also known as CAMELSPIN (7).

Fig. 6. (A) The 1D ¹H spectrum, and (B) a COSY spectrum of the methyl glycoside of β-D-galactose. Starting from the anomeric doublet (³J_H1,H2 ≈ 9 Hz), connectivities can be traced between vicinal protons around the sugar ring. The difference between the chemical shifts of the two H6 protons is of the same order as the geminal ²J_H,H between them, so they are strongly coupled, and the multiplet shape is distorted.

7. Heteronuclear Correlation

The coupling phenomenon and the correlation methods based on it are not restricted to the case where both nuclei are the same, but allow, for instance, a 13C resonance to be correlated with that from the attached proton. These methods require pulses to be applied at both the proton and heteroatom resonance frequencies, and the signal may be detected at either. In practice, heteroatom detection is simpler, but less sensitive, and results in the standard heteronuclear correlation experiment (8). Modified schemes, such as COLOC (9), can be used when the experiment is tuned for the smaller couplings arising over more than one bond (e.g., 2JC,H); this is particularly valuable for establishing a covalent framework when quaternary carbons are present. Proton detection, often referred to as inverse detection, is more sensitive and gives better dispersion in the crowded proton domain, but requires spectrometer hardware that is not always available. Since the natural abundance of the “useful” heteroatom isotopes is rarely complete, signals from protons attached at unlabeled sites (e.g., to 12C) must be suppressed by careful design of the pulse sequence. The standard pulse sequence for this work is called HMQC (Heteronuclear Multiple Quantum Coherence) (10), but relayed versions (relayed HMQC) (11) and long-range versions (HMBC, Heteronuclear Multiple Bond Correlation) (12) are possible.

References

1. Neuhaus, D. and Williamson, M. P. (1989) The Nuclear Overhauser Effect in Structural and Conformational Analysis. Verlag Chemie, Weinheim.
2. Abraham, R. J., Fisher, J., and Loftus, P. (1988) Introduction to NMR Spectroscopy. Wiley, Chichester.
3. Ernst, R. R., Bodenhausen, G., and Wokaun, A. (1987) Principles of Nuclear Magnetic Resonance in One and Two Dimensions. Oxford University Press, Oxford.
4. DeMarco, A., Llinas, M., and Wüthrich, K. (1978) Analysis of the proton NMR spectra of ferrichrome peptides, Part 2. The amide resonances. Biopolymers 17, 637-650.
5. Kessler, H., Oschkinat, H., and Griesinger, C. (1986) Transformation of homonuclear two-dimensional NMR techniques into one-dimensional techniques using Gaussian pulses. J. Magn. Reson. 70, 106-133.
6. Kessler, H., Schmieder, P., Kock, M., and Kurz, M. (1990) Improved resolution in proton-detected heteronuclear long-range correlation. J. Magn. Reson. 88, 615-618.
7. Bothner-By, A. A., Stephens, R. L., Lee, J., Warren, C. D., and Jeanloz, R. W. (1984) Structure determination of a tetrasaccharide: transient nuclear Overhauser effects in the rotating frame. J. Am. Chem. Soc. 106, 811-813.
8. Bax, A. and Morris, G. A. (1981) An improved method for heteronuclear chemical shift correlation by two dimensional NMR. J. Magn. Reson. 42, 501-505.
9. Kessler, H., Griesinger, C., Zarbock, J., and Loosli, H. R. (1984) Assignment of carbonyl carbons and sequence analysis in peptides by heteronuclear shift correlation via small coupling constants with broadband decoupling in t1 (COLOC). J. Magn. Reson. 57, 331-336.
10. Bax, A. and Subramanian, S. (1986) Sensitivity enhanced two dimensional heteronuclear shift correlation NMR spectroscopy. J. Magn. Reson. 67, 565-569.
11. Lerner, L. and Bax, A. (1986) Sensitivity enhanced two dimensional heteronuclear relayed coherence transfer NMR spectroscopy. J. Magn. Reson. 69, 375-380.
12. Bax, A. and Summers, M. F. (1986) 1H and 13C assignments from sensitivity enhanced detection of heteronuclear multiple-bond connectivity by 2D multiple quantum NMR. J. Am. Chem. Soc. 108, 2093-2096.

CHAPTER 2

Structural Studies of Proteins in Solution Using Proton Nuclear Magnetic Resonance

David Neuhaus and Philip A. Evans

From Methods in Molecular Biology, Vol. 17: Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques. Edited by C. Jones, B. Mulloy, and A. H. Thomas. Copyright © 1993 Humana Press Inc., Totowa, NJ

1. Introduction

Nuclear magnetic resonance (NMR) spectroscopy has become established in recent years as a uniquely powerful technique for studying the structures of proteins in solution. In a 1H spectrum, each hydrogen atom in the molecule gives rise to an individual signal, and in favorable cases, it is possible to resolve each of them and assign each to an identified atom. The power of the method then lies in the wealth of information that can be obtained concerning both through-bond and through-space connectivities between individual nuclei. This makes it possible to determine in detail the three-dimensional (3D) conformation from NMR data, and that is the major subject of this chapter. The feasibility of such a full structure determination depends crucially on the completeness with which signals can be assigned to individual nuclei and conformation-dependent parameters determined. The key to achieving this has been the development of two-dimensional (2D) NMR, which, by dispersing the signals more thoroughly and in a structurally significant manner, permits the resolution of large numbers of resonances and elucidation of the connectivities between them. The principles of 2D NMR and the particular experiments that are commonly employed in studies of proteins will be outlined, together with the strategies employed to make specific resonance assignments. A number of approaches are then possible for turning the accumulated spectral information into structural detail, and these are reviewed briefly. Such a detailed analysis is not always feasible, however, because it may not be possible to resolve and assign the spectrum in sufficient detail; in that case, the structural information that can be obtained will necessarily be more limited, though it may nonetheless be valuable. We illustrate briefly how “partial answers” to some structural questions may be obtained.

This chapter is intended to give a general outline of the principles of protein structure determination from NMR data, from a practical perspective. Thus, we do not attempt to review the literature comprehensively, but rather to provide a limited number of useful references, particularly reviews, where they can helpfully expand upon a particular point. In particular, the book by Wüthrich (1) and a number of recent general reviews (2-4) give a more detailed account of resonance assignment and structure determination methodologies.

2. Scope and Limitations

The chief limitations on the applicability of NMR are sensitivity and resolution. NMR is a particularly insensitive method, so that sample concentrations much higher than those used for other spectroscopic studies are generally necessary. Protein solutions of at least 1 mM and preferably higher are desirable for 1H NMR. Many small proteins are perfectly soluble at this concentration, but in some cases, there can be a problem both in terms of the amount of material required and its solubility. NMR of other nuclei is much more insensitive still, and in the case of 13C and 15N, the problem is compounded by low natural abundance. Heteronuclear NMR studies are therefore largely dependent on being able to enrich the protein with the isotope concerned. Sensitivity in NMR experiments improves greatly as the magnetic field strength employed is increased. For this reason, protein NMR requires a high-field spectrometer; at present, typical fields are between about 9 and 14 T, corresponding to 1H frequencies of 400-600 MHz.

Detailed characterization of the spectrum of a macromolecule requires resolution of a large number of resonances, and the determination of both through-bond and through-space connectivities between the nuclei from which they arise. Resolution is limited by:


1. The number of resonances in the spectrum;
2. The dispersion of their chemical shifts; and
3. Their linewidths.

Linewidths, in particular, can also pose problems if they become comparable to the coupling constants in the spectrum: if couplings are not resolved because of broad lines, then the results of correlation experiments, which depend on these interactions, will be impaired. Each of these factors can thus be an important consideration in NMR studies of proteins, and we will consider them briefly in turn:

1. The number of resonances in a protein spectrum obviously increases more or less linearly as the number of residues increases. For a protein of 100 residues, for example, the 1H spectrum will contain in the region of 600 resonances, most of which need to be resolved and assigned if a full structural analysis is to be carried out. In some cases, it is possible to alleviate overcrowding by means of spectral editing techniques. The simplest such trick is to dissolve the protein in D2O, so that exchangeable (NH and OH) protons will be progressively replaced by deuterons and their 1H resonances will be lost from the spectrum. If some NHs are protected from the solvent (for example, by hydrogen bonding), they may be resistant to exchange and be selectively retained in the spectrum. This kind of editing has the added advantage that it provides additional structural information. Other editing techniques typically exploit couplings to specific heteronuclei to isolate particular classes of resonances in the spectrum (5,6). These methods are proving to be extremely valuable in extending the range of NMR to larger proteins, but it must be noted that they depend on being able to introduce isotopic labels, such as 13C or 15N, into the protein.
2. The dispersion of lines in a protein spectrum results because different protons resonate at slightly different frequencies depending on their exact molecular environment. The primary determinant of this “chemical shift” is the covalent structure (see Chapter 1), but there are also conformation-dependent perturbations, which are crucial for separating resonances that would otherwise be degenerate (e.g., if there is a recurrence of any particular amino acid in the sequence, resonances of corresponding protons of each of them would be expected to be coincident in the absence of conformational effects). In compactly folded structures, these perturbations may be quite large as a result of through-space interactions with magnetically anisotropic groups, such as aromatic rings and carbonyl groups (7), and this usually provides sufficient resolution


to permit assignment of individual resonances. In proteins that do not have a fixed tertiary structure, however, dynamic averaging can greatly diminish these effects, leading to severe problems with spectral overlap; for example, this is a major difficulty in studies of residual structure in nonnative states of proteins, which are important in relation to protein folding (8). Of course, lack of a unique tertiary structure also poses fundamental problems for structure determination! Unless alternative conformations interconvert very slowly (

undertaken, although resolution is limited under these conditions, and only rather small polypeptides have thus far been studied in this way (see, e.g., ref. 11). In solution, lines are narrowed, because the tumbling motions of solute molecules greatly reduce the rates of dipolar relaxation (see Chapter 1). However, as the molecular size increases, tumbling becomes slower and the narrowing achieved is less. For very large molecules, the linewidths can then become too great for high-resolution studies. This, together with the increasing problem of overcrowding, effectively limits the range of proteins for which detailed structures may be determined by NMR. These effects are illustrated by the 1D spectra shown in Fig. 1. The zinc finger from SWI5 is a rather small protein of 35 residues; it gives a well-resolved spectrum with narrow lines, permitting the multiplicities of individual resolved resonances to be seen. It has been possible to analyze this spectrum in detail using 2D methods (12), and we use this as an illustration in subsequent sections. Hen egg-white lysozyme is substantially larger, at 129 residues. It is nonetheless apparent that individual signals are still reasonably narrow, and it has been possible to assign virtually the entire proton spectrum of this protein (13); proteins of this size are, however, close to the limit of the mol-wt range for which detailed assignment and analysis are possible without aid from heteronuclear studies using isotopically enriched protein. IgG is a much larger protein, comprising a total of approx 1200 residues. Even though these are organized into domains, with limited flexibility between them, the slow overall tumbling of this much larger molecule makes the majority of its resonances extremely broad, so that detailed assignment and structural analysis would not be feasible.

It should be emphasized that there are no hard and fast rules as to the size of protein for which detailed NMR investigations are applicable. The particular structure, dynamics, and intermolecular interactions of an individual protein are also of great significance in determining resolution. Some quite small proteins turn out to give rather poor 1H spectra if, for example, they are prone to aggregation at the relatively high concentrations typically required for NMR study. Figure 2 shows the example of glucagon (30 residues), which gives rather broad-lined spectra because it tends to trimerize in aqueous solution at high concentrations (14). Conversely, relatively large molecules may give surprisingly good spectra if there is significant segmental mobility; this is well illustrated by the example of urokinase, which is discussed further in Section 6. This protein has a total mol wt of 60 kDa, but comprises three distinct domains that appear to be able to move relatively independently, with the result that some lines are quite well resolved and can be used as markers for the behavior of the individual domains (15).

Fig. 1. One-dimensional 1H NMR spectra of three proteins, illustrating the effects on resolution of increasing mol wt. (A) A zinc-finger peptide from SWI5: 4 kDa; (B) hen egg-white lysozyme: 14.5 kDa; (C) bovine immunoglobulin G: 150 kDa.

Fig. 2. Concentration dependence of the 1H spectrum of glucagon in aqueous solution. The spectra (which have been manipulated to enhance resolution) were obtained at protein concentrations of (A) 0.5 mM and (B) 5.0 mM. The shifting and broadening of signals evident at the higher concentration are associated with partial trimerization of the peptide under these conditions. This illustrates that even relatively modestly sized peptides do not necessarily give very high resolution spectra and that it may be necessary to experiment in order to obtain optimal solution conditions. (Adapted, with permission, from ref. 14.)

If line broadening owing to dipolar relaxation is a problem, it may be possible to ameliorate the situation in various ways. The simplest is to try to enhance molecular mobility. This may often be achieved by warming the sample, provided that the folded state is reasonably thermostable. It may also be possible to reduce the extent of intermolecular interactions, which tend to impede tumbling, by varying solution conditions, such as pH and ionic strength; this can only be investigated by trial and error. More sophisticated tricks are, as usual, available if isotopic labeling is possible. One such trick is to prepare a protein in which there is random replacement of protons by deuterons, to obtain a spectrum in which all proton resonances are still present, albeit at reduced intensity. This has the result that, on average, a given proton will have many fewer other protons in close proximity than in the normal protein, thereby reducing the efficiency of dipolar relaxation and, hence, the linewidths (16). Other applications of labeling


that may allow one to live with broad lines utilize the fact that heteronuclear coupling constants may be much larger than 1H-1H couplings, so that although homonuclear 2D experiments, which rely on scalar coupling, may fail, heteronuclear variants may still be useful.

In certain cases, lines may be broadened by processes other than dipolar relaxation. One commonly encountered in studies of metalloproteins results from interaction with a paramagnetic center. This can cause virtual obliteration from the spectrum of resonances of protons close in space to the paramagnetic group. Although this may be a problem, paramagnetic perturbations can, in some circumstances, be used to obtain valuable structural information if, for example, it is possible to interconvert between paramagnetic and diamagnetic forms of the protein. A good example is provided by Megasphaera elsdenii flavodoxin. This is a protein of 137 residues whose spectral assignment was facilitated by selective suppression of the resonances of protons close to the flavin prosthetic group, when the latter was in a paramagnetic oxidation state (17).

One feature of proteins that may be encountered in NMR studies is interconversion between different structural forms. The effect of this on the spectrum depends critically on its rate. If it is very slow, distinct resonances of the alternative forms will occur in the spectrum; a number of such cases are known, for example, conformational differences associated with prolyl peptide bond isomerism can give rise to this kind of behavior (18). On the other hand, if conformational interconversion is fast, a single average resonance will appear for each proton. The danger with this, of course, is that the existence of more than one conformer is not immediately apparent in the spectrum, and there is the risk that one will proceed with a structural analysis on the false assumption that there is a single structure. Fortunately, the existence of significant populations of substantially different conformers of a protein does not seem to be common. On the other hand, it is certainly true that there is local motion of many residues within protein structures, so that the NMR spectrum will reflect an average conformation in these cases. The use of NMR to study this dynamic behavior is actually a major research area in itself (19) (see Chapter 7), but it is beyond the scope of this chapter. Conformational interconversion at intermediate rates, comparable to the resonant frequency differences for individual protons in the different states, gives rise to broadening


of the spectrum. The degree of broadening depends on the chemical shift disparity between the different states for each nucleus, so that different signals are in general broadened to different extents, and, indeed, some may become so broad as to be undetectable. It may be possible to eliminate exchange broadening by driving the conformational equilibrium over toward a single form in some way; for example, binding of a ligand may have this effect. Alternatively, raising the temperature may increase the interconversion rates sufficiently to give fast exchange, where single, sharp lines are observed; it then has to be borne in mind, as previously discussed, that these represent an average over different conformations.

The general conclusion to be drawn from these considerations is that there are many potential difficulties in NMR studies of proteins. Some can be overcome by judicious choice of experimental conditions; others are more fundamental and limit the extent to which structural information is accessible for some proteins. For all that, it is now clear that for small, nonaggregating proteins, NMR is a high resolution structural method of considerable generality.

3. Experiments

Although NMR has been used to study proteins for many years, the main impetus for the recent surge in structural applications has been the development of 2D NMR. In this section and the next, we give an overview of the principal 2D NMR experiments useful for proteins, concentrating on the type of information available from each experiment, and its place in the overall strategy of assignment and structure determination. To begin, we give as background a much simplified picture of the general features of 2D NMR, using the NOESY experiment as an example.

Figure 3 is a schematic representation of a NOESY experiment for a one-line spectrum. As in all FT NMR experiments, the data are recorded as decaying signals in real time, called free induction decays or FIDs (see Chapter 1). These result from excitation by pulses (three in the case of NOESY) of radio-frequency irradiation. In a 2D experiment, a whole series of 1D experiments is carried out in sequence, increasing the value of the delay t1 systematically from one experiment to the next, and storing the results separately for each value of t1. When the Fourier transform of each FID is calculated, the resulting

Fig. 3. Schematic representation of a NOESY experiment for a one-line spectrum; see text for discussion. For simplicity, only the first ten increments are shown, and the incrementation of t1 is exaggerated. (Reproduced, with permission, from ref. 20.)

spectra show the influence of the variation in t1 as a modulation of the amplitudes (or, in some experiments, the phases) of the lines in the spectrum. This stage corresponds to the “interferogram” S(t1, F2) shown in Fig. 3, where the amplitude of the single line in this spectrum varies as a cosine function of t1. In fact, the modulation arises because, during t1, the signal of interest is present in the transverse plane and therefore precesses (i.e., rotates). At the end of t1, only one component is selected by the pulse sequence to contribute to the finally observed spectrum. A trace running “backwards” through the interferogram (i.e., in the direction of increasing t1), linking the center of the peak as it appears in each spectrum, thus corresponds to an indirectly detected FID, charting the precession of the magnetization during t1. A second Fourier transformation, this time with respect to t1, turns the indirectly detected time domain t1 into an indirectly detected frequency domain F1.

In Fig. 3, this leads to a 2D spectrum containing just one line, but Figs. 4 and 5 show what happens for a two-line NOESY spectrum. If nothing occurs to cause appreciable interactions between the signals during the time between t1 and t2, then each will continue to precess with the same frequency during t2 as it had during t1. This gives results of the sort shown in Fig. 4. The lower frequency signal S (assuming zero frequency to be at the left-hand edge of the spectrum, for simplicity) shows the lower frequency modulation in the interferogram, while the higher frequency signal I shows the higher. Because both signals maintained their original frequencies throughout, their positions in the 2D spectrum after the second Fourier transform both lie on the 45° diagonal defined by F1 = F2. Such behavior would be expected, for instance, for a NOESY experiment with a very short mixing time (called τm in the figures).

If these diagonal peaks were the only signals detected, 2D NMR would be of little interest. The whole value of the method arises because the spins can be made to interact during the time between t1 and t2, resulting in off-diagonal peaks, or crosspeaks, linking the F1 and F2 frequencies of the interacting signals. There are several possible interactions exploited in different 2D experiments, and we will discuss the more important ones shortly, but for now, we concentrate on NOESY because the interaction is particularly simple to picture in this case. The nuclear Overhauser effect (NOE), on which the NOESY experiment is based,

Fig. 4. Schematic representation of a NOESY experiment for a two-line spectrum with τm set to zero, to show the origin of the diagonal peaks. (Reproduced, with permission, from ref. 20.)
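As a rough numerical illustration of the two-stage Fourier transformation described in the text, the following Python sketch builds the time-domain matrix S(t1, t2) for two noninteracting lines and transforms it first with respect to t2 and then with respect to t1. It is a simplified sketch, not the processing scheme used by the authors: the function name, the line frequencies, the spectral width, and the decay constant are all illustrative assumptions, and the t1 modulation is taken to be complex (phase) modulation rather than the cosine amplitude modulation drawn in Fig. 3, so that each line gives a single peak.

```python
import numpy as np


def simulate_diagonal_spectrum(freqs_hz, sw_hz=1000.0, n_points=128, t_decay=0.05):
    """Magnitude 2D spectrum for lines that do not interact during mixing.

    freqs_hz : precession frequencies of the lines in Hz (assumed values)
    sw_hz    : spectral width in both dimensions in Hz (assumed value)
    n_points : number of t1 increments and of t2 sampling points
    t_decay  : decay time constant of the signals in s (assumed value)
    """
    dwell = 1.0 / sw_hz
    t1 = np.arange(n_points) * dwell  # systematically incremented delay
    t2 = np.arange(n_points) * dwell  # detection period

    # Each line precesses at the same frequency during t1 and during t2.
    s = np.zeros((n_points, n_points), dtype=complex)
    for f in freqs_hz:
        evolve_t1 = np.exp((2j * np.pi * f - 1.0 / t_decay) * t1)
        evolve_t2 = np.exp((2j * np.pi * f - 1.0 / t_decay) * t2)
        s += np.outer(evolve_t1, evolve_t2)

    # First Fourier transform (t2 -> F2) gives the interferogram S(t1, F2).
    interferogram = np.fft.fft(s, axis=1)
    # Second Fourier transform (t1 -> F1) gives the 2D spectrum S(F1, F2).
    spectrum = np.fft.fft(interferogram, axis=0)

    axis_hz = np.fft.fftfreq(n_points, d=dwell)
    return np.abs(spectrum), axis_hz


# Two hypothetical lines, S at 125 Hz and I at 250 Hz.
spectrum, axis_hz = simulate_diagonal_spectrum([125.0, 250.0])
row, col = np.unravel_index(np.argmax(spectrum), spectrum.shape)
print("strongest peak at F1 = %.1f Hz, F2 = %.1f Hz" % (axis_hz[row], axis_hz[col]))
```

With these assumed values, the only strong peaks fall at (125, 125) and (250, 250) Hz, that is, on the diagonal F1 = F2, which is the situation expected when nothing couples the two signals between t1 and t2.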

Fig. 5. Schematic representation of a NOESY experiment for a two-line spectrum with τm not set to zero, to show the origin of the crosspeaks. (Reproduced, with permission, from ref. 20.)

causes the intensities of signals I and S to become interdependent during τm, provided that the corresponding spins are spatially close enough together in the structure (see discussion of the NOE later in this section). For molecules the size of small proteins, this results effectively in an exchange of magnetization between the spins. Thus, when the intensity values at the start of τm are different for signals I and S (e.g., in Fig. 4, when signal I has passed through half a cycle of modulation and is positive, but signal S is still negative), then the NOE will act to make the intensities of I and S converge during τm. As shown in Fig. 5, this results in new modulations in the interferogram, since by the end of τm the intensity of each signal depends both on its own precession frequency during t1 and on that of the other signal. It is these new modulations that give rise to crosspeaks in the 2D spectrum after the second Fourier transform.

Clearly, 2D experiments take longer to carry out than 1D experiments, other things being equal. The length of a 2D experiment depends mainly on the number of transients recorded per increment and on the resolution required, particularly in F1 (since this is dictated by the number of t1


increments used). Normally, a 2D experiment lasts for several hours at least, and overnight periods or longer are common for protein experiments. However, when speed is essential (e.g., for kinetic measurements), sacrificing both resolution and sensitivity, combined with some tricks, can sometimes bring times down to some fraction of an hour (21). Note particularly that the length of a 2D experiment does not depend on how many signals are in the spectrum, but only on the spectral widths and resolutions in the two dimensions; the NOESY spectrum of a small molecule takes just as long to run as that of a high-mol-wt protein at the same resolution, if the spectral widths are similar.

Normally, 2D spectra are displayed in the form of an intensity contour plot. Note that, provided the experiment is phase sensitive, as it should always be (see ref. 22), contours can represent either positive or negative intensity. These are usually represented using lines of different color, but this sign information is often discarded when preparing monochrome figures for publication.

Table 1 shows a few of the more important 2D experiments, together with the interactions (called mixing processes) that give rise to crosspeaks in each case. The homonuclear 2D experiments in Table 1 fall into two distinct classes: those based on J-couplings, and those based on the NOE. J-coupling, which is transmitted between nuclei through electronic interactions in the covalent bonding network of the protein, is responsible for the splitting of signals into multiplets. If two protons are J-coupled, the corresponding signals both appear as doublets whose components are separated by the coupling constant, and more complicated patterns result when several couplings act on one proton. Since J-couplings are normally only appreciable over three or fewer bonds, interproton J-couplings can normally only be observed between protons within the same residue. Thus, the experiments based on correlation through homonuclear J-couplings (COSY, RELAY, TOCSY, and so on) provide information that links together the signals within groups called spin systems, each spin system corresponding to a particular residue in the protein.

The NOE, in contrast to J-couplings, is an intensity effect related to the populations of the energy states of the spins; it is transmitted directly through space and leads to no directly observable features in the simple 1D spectrum (20). When two protons are sufficiently close in space, the magnetic field owing to each is appreciable at the site of the other.

Table 1
Common 2D NMR Experiments and Their Information Content

Experiment: (1H, 1H) COSY or DQF COSY
Requirement for observing a crosspeak: 1H-1H J-coupling
Comments: Many variants.

Experiment: (1H, 1H) RELAY
Requirement for observing a crosspeak: 1H-1H-1H pathway of one or two J-couplings
Comments: Also detects COSY crosspeaks. Not all possible peaks occur, dependent on setting of mixing time.

Experiment: (1H, 1H) TOCSY
Requirement for observing a crosspeak: 1H-1H ... 1H-1H pathway of several J-couplings
Comments: Not all possible peaks occur, dependent on setting of mixing time.

Experiment: (1H, 1H) Double quantum correlation
Requirement for observing a crosspeak: 1H-1H J-coupling
Comments: Different layout and information content from COSY.

Experiment: (1H, X) Heteronuclear shift correlation
Requirement for observing a crosspeak: 1H-X J-coupling over one bond
Comments: Not all possible peaks occur, depending on setting of tuned delay. Selection of one-bond correlations based on large size of one-bond coupling constants.

Experiment: (1H, X) Long-range heteronuclear shift correlation
Requirement for observing a crosspeak: 1H-X J-coupling over one, two, or three bonds
Comments: Not all possible peaks occur, dependent on setting of tuned delay. Also detects one-bond correlations.

Experiment: (1H, 1H) NOESY
Requirement for observing a crosspeak: 1H- - - -1H short distance
Comments: Relationship between intensity and distance difficult to quantify.

Experiment: (1H, 1H) ROESY
Requirement for observing a crosspeak: 1H- - - -1H short distance
Comments: Applicable to “medium-sized” molecules where NOESY fails.

This condition allows the protons to indulge in “dipolar cross-relaxation,” which simply means relaxation events in which both protons simultaneously change their spin state. The implication is that, when the resonance intensity of one proton is altered, by a change in its populations, the intensity of signals owing to its near neighbors will subsequently change also, as a result of cross-relaxation. These changes


at the signals owing to neighbors are called NOE enhancements. For large molecules, the NOE causes a change for the neighbor in the same sense as that of the perturbed signal (as in Fig. 5, discussed earlier), whereas for small molecules, it is in the opposite sense; for the NOESY experiment discussed earlier, this means that crosspeaks for large molecules have the same sign as the diagonal, whereas for small molecules crosspeaks and diagonal peaks have opposite signs.*

Figure 6 shows some calculated examples of how the intensities of NOESY crosspeaks change as a function of τm (23). During the early part of the buildup, the change is essentially linear and is caused only by cross-relaxation between the two spins whose signals are connected by the crosspeak (I and S in Fig. 5). This early period is called the “initial rate regime,” and its significance is that only while it lasts is there a simple relationship between crosspeak intensity and the single internuclear distance between I and S. At later times, intensity changes caused by the NOE themselves start to perturb intensities of other near neighbors, so that crosspeak intensities become dependent on many internuclear distances rather than one. This process is called spin diffusion, and intensities in NOESY spectra where appreciable spin diffusion has occurred are only calculable numerically by methods requiring knowledge of the whole structure (24).

Until recently, protein work almost exclusively involved homonuclear (1H, 1H) experiments, such as those previously discussed, but now that many proteins are available from expression of cloned genes in microorganisms, they can be obtained containing various stable isotopic labels to facilitate heteronuclear NMR experiments. The most useful heteronuclei are 15N and 13C, which can be incorporated globally (although this is very expensive in the case of 13C) or specifically (in the sense that a particular position is labeled in all occurrences of a particular amino acid). Using suitable pulse sequences, it is possible to edit the 1H spectrum to yield signals from just those protons directly bonded to a heteronuclear label, and then to determine the interactions of these protons with others (25). Another important recent development is that of three-dimensional spectroscopy (26). Here, a second, independently incremented delay is introduced, and the resulting matrix of FIDs, written S(t1, t2, t3), is Fourier transformed to give a spectrum with three independent frequency dimensions. Two mixing periods are then available, one between t1 and t2, the other between t2 and t3, and these may involve different mixing processes, leading to a very wide variety of possible experiments. It is already clear that these new developments will extend significantly the range of proteins accessible to study by NMR.

*The conditions “large” and “small” here really refer to the rate of molecular tumbling, characterized by the correlation time τc; “large” means ωτc >> 1, “small” means ωτc << 1, where ω is the field strength of the spectrometer expressed in radians s-1.

Fig. 6. Calculated time-course (intensity vs τm, in seconds) for three crosspeaks in the NOESY spectrum of bovine phospholipase A2. The crosspeaks all involve the backbone NH of Lys 56, interacting with Ala 55 NH (curve labeled N in the figure), Lys 56 Cβ1 (labeled β1), and Lys 56 Cβ2 (labeled β2). The relative distances from Lys 56 NH to each of these spins are in the ratio 1.13 (N):1.00 (β1):1.61 (β2). Intensities were calculated by a numerical integration procedure involving all nearby spins, using distances determined from the known crystal structure of this protein, and assuming a spectrometer frequency of 400 MHz and a correlation time for molecular tumbling of 3 x 10-9 s in A and 3 x 10-8 s in B. Note the effect of spin diffusion particularly on curve β2, where a very short period of slow growth (the initial rate corresponding to the long distance between these spins) is rapidly followed by buildup of intensity transmitted through the intermediate spin β1. (Reproduced and modified, with permission, from ref. 23.)
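The buildup behavior discussed in connection with Fig. 6 can be mimicked with a small relaxation-matrix calculation. The Python sketch below is a toy model under stated assumptions, not the numerical integration procedure of ref. 23: three spins in a line, cross-relaxation rates proportional to r^-6 with an arbitrary scaling constant k, the slow-tumbling limit in which the self-relaxation rate of each spin is taken as the sum of the magnitudes of its cross-relaxation rates plus a small leakage term, and NOESY intensities given by the matrix exponential exp(-R τm). All function names and numerical values are hypothetical.

```python
import numpy as np


def relaxation_matrix(distances, k=64.0, leak=0.1):
    """Build a toy relaxation matrix R (s^-1) from interproton distances.

    distances : symmetric matrix of distances in angstroms (diagonal ignored)
    k         : arbitrary scaling constant in angstrom^6 s^-1 (assumption)
    leak      : external leakage rate in s^-1 (assumption)
    """
    r = np.asarray(distances, dtype=float)
    n = r.shape[0]
    R = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                # Cross-relaxation rate; negative in the slow-tumbling limit.
                R[i, j] = -k / r[i, j] ** 6
    for i in range(n):
        # Diagonal is still zero here, so the row sum covers only j != i.
        R[i, i] = leak - R[i].sum()
    return R


def noesy_mixing(R, tau_m):
    """Return exp(-R * tau_m); element [i, j] is the (i, j) peak intensity."""
    eigenvalues, vectors = np.linalg.eigh(R)  # R is symmetric
    return (vectors * np.exp(-eigenvalues * tau_m)) @ vectors.T


# Hypothetical linear arrangement: spins 0-1 and 1-2 are 2 A apart, 0-2 is 4 A.
distances = [[0.0, 2.0, 4.0],
             [2.0, 0.0, 2.0],
             [4.0, 2.0, 0.0]]
R = relaxation_matrix(distances)

for tau_m in (0.001, 0.01, 0.1):
    A = noesy_mixing(R, tau_m)
    direct_only = -R[0, 2] * tau_m  # initial-rate estimate for the distant pair
    print(tau_m, A[0, 2], direct_only)
```

At very short mixing times the crosspeak between the distant spins 0 and 2 follows the initial-rate estimate, but at longer mixing times it grows well beyond it because magnetization is relayed through the intermediate spin 1, which is the spin-diffusion behavior described for curve β2.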


4. Assignment

It is clearly impossible to give a detailed account here of the way in which the full assignment is carried out for a new protein, so what follows is necessarily a rather cursory overview. More detailed accounts may be found in the general references cited earlier (14). Spectra are acquired in a mixture of 90% H2O and 10% D2O, the D2O being necessary for the spectrometer's field-frequency lock. Depending on the quality of the spectra, in particular of the suppression of the H2O signal, it may also be necessary to run spectra in 100% D2O. Spectra in D2O lack most of the signals resulting from exchangeable protons, but are more sensitive and reveal crosspeaks closer to the water signal.

Before tackling the methodology of assignment, a few points about chemical shifts in proteins are needed. Proton chemical shifts are mainly determined by electronic influences through bonds from immediately neighboring groups and atoms. Thus, for instance, backbone amide NH signals are usually in the range of approx 10 to 6 ppm, CαH signals are usually near midfield at approx 5.5 to 3.0 ppm, and methyl groups bound to sp3 carbon are usually at high field, approx 2.0 to 0.5 ppm. However, there are important subsidiary effects on chemical shifts caused by more remote interactions, often transmitted through space rather than through bonds (e.g., aromatic ring current shifts). These depend on the wider environment of the proton concerned, and so are properties of the whole protein conformation. For example, if a protein contains several alanine residues, each will contribute NH, CαH, and CβH3 signals (usually in the gross regions just mentioned), but, barring chance coincidences, these will all be at different shifts for each residue. Such interresidue chemical shift differences cannot be predicted without extensive knowledge of the whole structure (if then), so assignments cannot be made on grounds of chemical shift alone.
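The point that shift alone identifies the type of proton but not the residue can be made concrete with a small sketch. The three regions are the approximate ranges quoted above; the function name and the example shifts are hypothetical.

```python
# Approximate 1H shift regions quoted in the text (ppm). Real ranges overlap and
# individual signals can fall outside them, so this is only a rough guide.
SHIFT_REGIONS = {
    "backbone NH": (6.0, 10.0),
    "C-alpha H": (3.0, 5.5),
    "methyl on sp3 carbon": (0.5, 2.0),
}


def candidate_types(shift_ppm):
    """Return every proton type whose approximate region contains the shift."""
    return [name for name, (low, high) in SHIFT_REGIONS.items()
            if low <= shift_ppm <= high]


# Three hypothetical alanine methyl shifts: all map to the same gross region,
# so the shift says "methyl" but not which alanine the signal belongs to.
alanine_methyls = [1.21, 1.38, 1.52]
regions = [candidate_types(s) for s in alanine_methyls]
```

Every entry of `regions` is the same single-element list, which is exactly why the assignment has to rest on interactions between protons rather than on shifts.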
As will be shown, the way out of this difficulty is to base assignments on interactions between protons, as determined by 2D experiments. The first task is to establish as far as possible the connectivities through couplings that define the spin systems, using experiments, such as COSY, RELAY, and TOCSY, that are based on J-couplings. Some amino acids give patterns of crosspeaks that are often easily recognized and lead quickly to a unique classification. For instance, alanine shows crosspeaks linking the methyl signal at high field to the CαH signal at midfield in COSY, and to the NH signal in RELAY. No other residue gives the same combination of crosspeaks; threonine gives (methyl, CβH) crosspeaks in much the same spectral region as alanine (methyl, CαH) crosspeaks, but it does not show (methyl, NH) crosspeaks in RELAY. Similarly, glycine is the only residue to have two CαH signals, and thus to give two (NH, CαH) crosspeaks in COSY or a (CαH + CαH, NH) crosspeak in a double quantum spectrum. Using distinctions based on arguments of this sort, it is often relatively simple to pick out spin systems corresponding to Gly, Ala, and Val residues, and to find at least parts of the spin systems of Leu, Ile, and Thr.

Other amino acids give patterns of crosspeaks that can only be categorized into groups. The largest such group is the "AMX" residues, so called because, in D2O, the CαH and two CβH signals form an AMX spin system (see Chapter 1). This group comprises the aromatic residues Phe, Tyr, His, and Trp, and also Asp, Asn, Cys, and Ser. Of these, often only Ser can be distinguished at this stage, based on the lower field CβH shifts and smaller geminal CβH coupling than for other AMX residues (both these differences are caused by the oxygen substitution at Cβ). The other large group comprises the residues with long side chains, namely Lys, Arg, Met, Glu, and Gln. Of these, Met, Glu, and Gln can sometimes be distinguished by their lower CγH shifts and simpler spin systems, but correlations involving more distant side chain protons in "long side chain" residues are often ambiguous or indistinct, since they give crosspeaks in very crowded spectral regions, or involve inefficient transfer of magnetization over several couplings.
By the same token, if only partial connectivities can be found for Leu residues, there may be no recognizable indication that the methyl groups are linked to the NH, CαH, and CβH signals, so that the spin system can be indistinguishable from a "long." Finally, Pro is often the hardest spin system to characterize, since it has no NH proton, and must be identified using correlations to the CαH and CδH signals at midfield. Figure 7 shows schematically some of the patterns expected for various spin systems.

Fig. 7. Schematic representations of the patterns of crosspeaks expected for the various amino acid spin systems in COSY, RELAY, and TOCSY experiments for samples in H2O. Diagonal peaks are omitted for clarity. For samples in D2O, crosspeaks involving exchangeable protons (amide and guanidine NH) will be lost. For spectra in H2O, the exchangeable protons observed depend on the experimental conditions, particularly pH. At low pH, other signals, such as side chain NH3+ of Lys, may become detectable because of decreased exchange rate, whereas NHε of Arg may not be observable at higher pH because of increased exchange rate. Other types of exchangeable proton (e.g., Thr OH) may be observable if their exchange is slowed by some special factor, e.g., a stable hydrogen bond. The exact values taken by particular chemical shifts depend on the details of the local environment, so, for instance, not all alanine methyl groups will resonate at exactly the same frequency. Thus, multiple occurrences of particular amino acid types generally give separately resolved crosspeaks for most occurrences. The chemical shift values shown here are therefore only intended to be approximate indicators of the ranges actually found; also, protons not too far apart in shift might sometimes be permuted. For lysine, signals owing to the CγH and CδH protons are often too crowded to be separately identified, so the corresponding envelope of signals is represented here by a jagged line covering the appropriate typical range of shifts. [Diagrams not reproduced. Panels: alanine, AMX spin system, arginine, five-spin system, glycine, isoleucine, leucine, lysine, proline, threonine, and valine. Separate symbols mark crosspeaks seen in COSY, RELAY, and TOCSY; in RELAY and TOCSY only; and in TOCSY only.]

In addition to these spin systems, Asn and Gln residues contribute a pair of exchangeable amide signals from their side chains (these do not J-couple to other protons in the residue, but can couple to one another, and often show strong mutual exchange peaks in NOESY), and aromatic residues contribute nonexchangeable signals usually in the low-field region. The J-coupling connectivity patterns expected for Phe, Tyr, Trp, and His are characteristically different in that Tyr contributes only two coupled aromatic signals, Phe contributes three, and Trp four (in addition to the NHε to Hδ connectivity), whereas His contributes a pair of sharp singlets with a small (unresolved) coupling between them. For Phe and Tyr, more complex patterns can result if the rate at which the aromatic ring flips is slow on the NMR time scale. All these additional spin systems from CONH2 and aromatic groups are isolated from the rest of the molecule as far as J-coupling is concerned, but are linked in to the other assignments using NOESY connectivities during the sequential assignment stage.

As just shown, J-couplings allow one to classify spin systems according to the residue type from which they originate, but it is not possible to assign each spin system to its correct location in the sequence using homonuclear coupling data alone. For this we need to know the sequential neighbors of each spin system, and such knowledge can only come from the through-space information in NOESY (or ROESY) spectra or from heteronuclear J-couplings along the backbone. Only the former approach has been extensively used so far, mainly because of the very low sensitivity of heteronuclear experiments in the absence of isotopic enrichment and the small size of many long-range heteronuclear coupling constants (i.e., heteronuclear couplings transmitted over more than one bond). Figure 8 shows the three main types of close contact, dNN, dαN, and dβN, that lead to sequential NOE crosspeaks from NH protons in a NOESY spectrum. Connectivities from NH signals are generally the most useful, because they are the most abundant and suffer the least from spectral overlap.
Fig. 8. Types of short interproton distances that give rise to sequential NOESY crosspeaks involving backbone NH signals; see text for discussion.

Of course, a priori one does not know whether a given interresidue crosspeak necessarily represents a sequential connectivity; in fact, the existence of nonsequential interresidue crosspeaks is vital to the later determination of the three-dimensional fold. Statistically, an interresidue crosspeak is more likely to be sequential than not, though not overwhelmingly so (a study of 19 proteins containing in total 3224 residues (1) showed that, if the NOE is assumed to be appreciable only up to 3.6 Å, the percentages of contacts found to be sequential are 76% for dNN, 72% for dαN, and 66% for dβN). However, when a stretch of several spin systems is linked by a network of interresidue crosspeaks and can be fitted to the sequence, the combined confidence in this stretch of assignments becomes very much higher.

To illustrate the process of fitting the spin systems to the known sequence, Fig. 9 shows the NOE connectivities that were used to make sequential assignments for a 35-residue zinc-finger peptide from SWI5 (in its metal-bound form) (12). Before turning to the NOESY spectrum, the spin systems identified from COSY, TOCSY, and RELAY spectra comprised 13 AMX systems (excluding serines), 12 "long" spin systems (among which are included one of the leucines and the three five-spin systems, not yet distinguishable from the other "longs"; the N-terminal Met of course shows no amide NH signal), two alanines, two serines, two prolines, one valine, one isoleucine, the other leucine (readily identifiable since TOCSY showed connectivities from the αH through to the methyls), and one glycine. Clearly, the Val, Ile, and Gly residues are uniquely assignable already, and form convenient "start points" for the sequential assignment process.

Starting from the NH resonance of Val, dαN and dβN connectivities were found to an AMX system, and from the NH resonance of this AMX system, dαN and dβN connectivities were found to one of the Ala spin systems. Given that there are only two alanines in the sequence, this represents another "anchor point" for the assignments, and allows high confidence to be placed in the assignment of the intervening AMX system. Furthermore, there are NOESY connectivities from the CβH resonances of this AMX to the ortho protons of a phenyl ring spin system, reinforcing its assignment as a Phe, and simultaneously assigning the aromatic signals (note, however, that these aromatic signals also show NOESY connectivities to other CβH signals, so that the correct combination of an AMX with an aromatic spin system for this Phe could not have been deduced prior to the sequential assignment stage). From the NH of the Ala, the connectivities continue to a "long," then to an AMX, followed by another AMX, and then a Gly. Once again, the Gly represents an "anchor point," reinforcing the confidence that can be placed in the intervening three assignments. This particular stretch of connectivities involving NHs necessarily ends at P47 (although connectivities from the proline CδH protons to CαH of H46 make a bridge to the next stretch of connectivities).

To the C-terminal side, from V54 connectivities are found to a "long" and then to an AMX, but here the path stops, at least in this spectrum, because H57 turns out to have a particularly weak NH signal because of rapid exchange with solvent water protons. The assignment of the AMX to N56 is strongly reinforced by observation of enhancements from these CβH signals to a pair of side chain CONH2 signals. The region of a NOESY spectrum containing the dαN and dβN crosspeaks is shown in Fig. 10; the dNN crosspeaks appear near the diagonal in a region below that shown. Before turning to the remaining assignments in this peptide, there are some more general points that this example brings out.

Fig. 9. Sequential and helical connectivities found for a 35-residue zinc-finger peptide from SWI5 (a yeast transcriptional activator protein). Approximate relative intensities are indicated. See text for discussion. (Reproduced and amended, with permission, from ref 12.) [Diagram not reproduced. Rows: dNN, dαN, and dαN(i, i+3), drawn above the sequence, residues 36 to 70: MLEDRPYSCDHPGCDKAFVRNHDLIRHKKSHQEKA.]
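The bookkeeping behind this example can be sketched in a few lines. The sequence string is a reading of Fig. 9 in which positions 65 and 70 are taken as Ser and Ala because the text refers to S65 and A70; the class labels and function names are hypothetical. The sketch classifies by residue type, so both leucines appear as "L" and there are 11 "long" systems, whereas experimentally one Leu was at first indistinguishable from the "longs," giving the 12 quoted in the text.

```python
from collections import Counter

# Sequence of the 35-residue peptide as read from Fig. 9 (assumed reading).
SEQUENCE = "MLEDRPYSCDHPGCDKAFVRNHDLIRHKKSHQEKA"
FIRST_RESIDUE = 36  # residue number of the N-terminal Met

AMX = set("FYHWDNC")   # Ser is counted separately, as in the text
LONG = set("KRMEQ")    # residues with long side chains


def spin_system_class(residue):
    """Map a one-letter residue code to the spin-system class used in the text."""
    if residue in AMX:
        return "AMX"
    if residue in LONG:
        return "long"
    return residue  # G, A, V, I, L, S, T, P keep their own label


counts = Counter(spin_system_class(r) for r in SEQUENCE)


def fit_pattern(pattern, sequence=SEQUENCE, first=FIRST_RESIDUE):
    """Return the residue numbers at which a run of classes (N to C) fits."""
    classes = [spin_system_class(r) for r in sequence]
    n = len(pattern)
    return [first + i for i in range(len(classes) - n + 1)
            if classes[i:i + n] == list(pattern)]


# The stretch traced in the text, written N to C: Ala, AMX, Val.
unique_fit = fit_pattern(["A", "AMX", "V"])
# A short pattern without an anchor residue fits in several places.
ambiguous_fit = fit_pattern(["AMX", "long"])
```

The three-residue pattern containing Val fits only at residue 52, whereas the unanchored two-residue pattern fits at four positions. This is the sense in which unique residue types act as "anchor points" and longer linked stretches raise the confidence of an assignment.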
Among the clearest tertiary NOE enhancements involving this part of the peptide are those from K51 NH to both the CβH signals of C44, and they illustrate how the distinction between tertiary and sequential enhancements proceeds during the assignment. Viewed in isolation, these enhancements could initially equally well have been sequential, but the assumption that they are sequential leads to no self-consistent assignments for this region of the peptide (the AMX spin system actually owing to C44 would be incorrectly assigned to D50, and the pattern of NOE connectivities observed from it is inconsistent with the actual neighbors of D50).

Fig. 10. Part of a NOESY spectrum used to make the sequential assignments shown in Fig. 9. Crosspeaks corresponding to intraresidue contacts from backbone NH signals are shown filled, and are all identified by residue type and sequence position. The classification of these according to residue type depends on information already deduced from COSY, RELAY, and TOCSY experiments, but the sequence-specific assignments arise as a result of analyzing this (and other) NOESY data, as discussed in the text. The sequential connectivities for the sequence fragment P47-N56 are traced through the spectrum (in the direction C terminus to N terminus), and are discussed in more detail in the text. The dαN crosspeaks are identified by rectangles, the dβN crosspeaks by diamonds, and the dNN peaks appear in a spectral region below that shown here. Also shown (using circles) are the crosspeaks used to link the CβH signals of F53 to the H2,6 signals of its aromatic ring, and the crosspeaks used to link the CβH signals of N56 to the corresponding CONH2 side chain amide signals. The position of the water signal is marked in F1, showing where a narrow band of signals has been lost owing to the presaturation applied at this frequency. The positions of crosspeaks in this region are known from spectra recorded at other temperatures and are marked here by crosses. [Spectrum not reproduced.]

Usually, it is necessary to record spectra at more than one temperature to complete the assignments. Exchangeable signals are particularly temperature sensitive, so that changing the temperature slightly achieves two things: (a) it moves the water signal, revealing previously buried CαH signals (e.g., D50 and A52 in Fig. 10), and (b) it causes differential movements among the NH signals, so that by comparing spectra at both temperatures, NH overlap can often be resolved (e.g., the degeneracy of H62 NH with Q67 NH and the near degeneracy of F53 NH with S45 NH seen in Fig. 10 are not present in a spectrum recorded at 27°C). Overlap among nonexchangeable signals is harder to deal with, since it is less likely to be affected by temperature. For example, G48 and P47 have a common CαH shift both at 10°C and 27°C, so it is impossible to tell whether the crosspeak at δ 4.18, δ 8.98 is an intraresidue (NH, CαH) interaction within G48, or a sequential dαN close contact to P47 (or a combination of both).

Similar stretches of assignments can be made for fragment H57-A70, "anchored" principally on I60, S65, and A70, fragment P41-H46, "anchored" on S43, and fragment M36-R40; as with P47, crosspeaks from P41 CδH to R40 CαH link the two N-terminal fragments. Although there is no sequential link from H57 to N56, the assignment is secure on both sides of H57, and in this instance, there is additional information from dαN(i, i+3) connectivities, because this region of the peptide forms an α-helix.

There is no rigid division between the process of spin system assignment and sequential assignment, so that information from the sequential assignment stage often "finishes off" the spin systems. In this example, the spectra are sufficiently simple that this was hardly required, but sometimes, for instance, the second CβH signal of an AMX or the more distant signals in a long side chain are confirmed by sequential crosspeaks when the intraresidue crosspeaks are overlapped. Variations of the sequential assignment strategy have recently been proposed in which patterns of sequential crosspeaks are searched for before the side chain assignments are complete (27), but it seems likely that these methods are only applicable in regions of well-ordered secondary structure, where predictable patterns occur in the NOESY spectra.

One important omission from the discussion so far is that of stereospecific assignments. For methylene groups that give two resolved signals, there is generally an ambiguity as to which signal corresponds to the pro-R proton and which to the pro-S. This applies particularly to CβH signals. There is a similar ambiguity for the diastereotopic methyl signals of Val and Leu, and also for the two CαH signals of Gly. Various methods have recently been proposed to make such assignments. For CβH signals, these involve interpreting (CαH, CβH) coupling constants, but the relationship between 3J and dihedral angle is such that the two staggered gauche conformations (g+ and g-) cannot be distinguished, and additional information is needed to break this ambiguity. Usually, such information comes from differential (NH, CβH) intraresidue NOE enhancements at short mixing times (28,29), but (15N, CβH) or (13C=O, CβH) heteronuclear three-bond couplings can also be used.
For methyl groups of Val and Leu, a method based on the stereospecificity of biosynthetic incorporation of 13C has been developed (30). In cases where these methods are unsuccessful or inapplicable, it may be possible to make stereospecific assignments during the structure calculation itself, if one assignment leads to significantly smaller violations of the NOE constraints (31). In general, however, the problem of missing stereospecific assignments is handled by sacrificing the stereospecific information, referring all distance constraints involving either diastereotopic atoms or groups to a single "pseudo-atom" at the geometric centroid of the group of protons involved (e.g., for a methylene group, the midpoint between the CβH protons). A similar approach is used to handle the ambiguity that exists between the two sides of a symmetrical aromatic ring when fast ring flipping leads to averaged signals.

Clearly, the present example is a relatively simple case. In larger proteins, overlap becomes a much more serious problem, and sensitivity is likely to be lower owing both to the lower molar concentrations likely to be available and to the broader signals. Fitting to the sequence also becomes more complicated, since there will be fewer if any unique residue types, so that the assignments have to be "anchored" on identifiable, unique, di-peptide or tri-peptide fragments. However, there have been some recent developments that have improved the situation and promise to extend the size range of proteins that can be studied. If the protein is available in good yield via overexpression of a cloned gene, it may be possible to incorporate 15N globally throughout the protein with high efficiency (>95%). Heteronuclear variants of the spectra so far discussed are available, in which each directly coupled NH pair contributes signals at its 15N frequency in F1 rather than at its 1H frequency (32). Since there is generally no correlation between the two shifts, the combination of homonuclear and heteronuclear spectra provides a powerful tool to resolve overlap. Still more powerful is the application of 3D spectroscopy, both homonuclear and heteronuclear, and the first experiments (both 2D and 3D) with globally 13C-labeled proteins have recently appeared in the literature (33-35). It is very likely that developments such as these will lead to significant changes in the way in which sequential assignments are carried out for those cases where labeling is viable.

5. Structure Determination

Most of this section is concerned with the calculation of 3D structures of proteins from NMR-derived distance constraints, but first we consider what structural features can be deduced from the assigned spectra by inspection, short of actual calculation. Essentially, this is limited to characterizing secondary structural elements, in particular α-helices and β-sheets. The residues involved in an α-helix can often be recognized by the combination of:
1. Strong sequential dNN connectivities;
2. Relatively weak dαN connectivities; and
3. Small J (NH, αH) coupling constants.


Even more characteristic are enhancements transmitted across one turn of the helix, the most useful of which are dαN(i, i+3) connectivities. As an example, several such connectivities are indicated on Fig. 9; not only do these indicate a helix running from N56 to Q67, but they are also a useful independent check on sequential assignments in this part of the structure (for instance, they bridge the gap in sequential connectivities at H57). In much the same way, a regular β-sheet often shows:
1. Strong sequential dαN connectivities;
2. Relatively weak dNN connectivities; and
3. Large J (NH, αH) coupling constants.
Further evidence can sometimes be found from cross-strand NOE connectivities (although of course these represent tertiary structural information). Characterization of turns, other than in the simplest case of a tight turn linking two strands of antiparallel β-sheet, is in general more difficult and often emerges only during the calculation of the overall structure.

In addition to this evidence from NOE connectivities and J-couplings, regions of regular secondary structure are often associated with slowly exchanging NH signals. If the protein can be transferred rapidly into D2O (e.g., by lyophilization from H2O followed by dissolving in D2O), these signals can often be identified directly, since those NH protons protected from solvent exchange by hydrogen bonding within secondary structural elements may persist for some hours or even longer. However, exchange rates also depend on the particular dynamics of the protein structure, and in some cases, this may obscure the influence of the H-bonding pattern.

Turning now to the determination of the overall structure by calculation, the first task is to assign as many of the crosspeaks in the NOESY spectrum as possible.
Note that this represents an additional level of assignment beyond that already achieved; even when the chemical shift of every signal is known, the origin of a given crosspeak is ambiguous whenever one or both of its shift coordinates correspond to two or more signals. Some such ambiguities can be solved by comparing spectra acquired under different conditions, and others may be resolved once one has some preliminary knowledge of the structure, but in general these ambiguities limit the number of crosspeaks that can be used to provide clearly identified distance constraints. It is to be hoped that, for proteins where global 13C labeling is possible, heteronuclear experiments may largely remove this problem (33-35).

The remaining task in preparing input data for structure calculation is to classify the enhancement intensities into semiquantitative groups, and to calibrate these against distance. This area poses a number of difficulties, discussed below. The result is that classification by distance can only be approximate, so that NOE-derived constraints are expressed as allowed distance ranges, rather than specific values. Within such an allowed range, all distance values are usually taken to be equally probable.

First, there is the point raised in Section 3 that NOE intensities have a simple distance dependence only during the "initial rate regime," that is, for short mixing times τm. Within this approximation, crosspeak intensity is taken to be proportional to r⁻⁶, that is, the inverse sixth power of internuclear separation. As τm increases, enhancements at directly neighboring protons themselves become large enough that they, in turn, disturb the balance of cross-relaxation at their near neighbors. Thus, enhancements propagate through the network of protons within the structure, and the intensity of a given crosspeak becomes a complicated function of the geometrical arrangement of all nearby protons. Still worse, new crosspeaks start to appear, corresponding to pairs of protons separated not by one short distance, but rather by a pathway of two or more short distances via intervening protons. This process is called spin diffusion, and its influence increases as the tumbling rate of the solute decreases, so the larger the molecule under study, the more severe the problem becomes.
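Spin diffusion can be illustrated with a minimal three-spin simulation. This is a sketch under stated assumptions and not the calculation used for Fig. 6: it assumes a rigid molecule tumbling isotropically, dipolar relaxation between the three protons only, the standard two-spin expressions for the auto- and cross-relaxation rates, and hypothetical distances (2.5, 1.78, and 4.0 Å). The crosspeak intensities are obtained by integrating dM/dt = -R M from a unit perturbation on one spin.

```python
import math

MU0_OVER_4PI = 1.0e-7        # T m A^-1
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752218744e8     # rad s^-1 T^-1, proton gyromagnetic ratio
Q = 0.1 * MU0_OVER_4PI ** 2 * HBAR ** 2 * GAMMA_H ** 4   # m^6 s^-2


def spectral_density(omega, tau_c):
    """Lorentzian spectral density for isotropic tumbling."""
    return tau_c / (1.0 + (omega * tau_c) ** 2)


def pair_rates(r_angstrom, tau_c, freq_hz):
    """Return (rho, sigma) in s^-1 for two protons r_angstrom apart."""
    omega = 2.0 * math.pi * freq_hz
    k = Q / (r_angstrom * 1.0e-10) ** 6
    j0 = spectral_density(0.0, tau_c)
    j1 = spectral_density(omega, tau_c)
    j2 = spectral_density(2.0 * omega, tau_c)
    return k * (j0 + 3.0 * j1 + 6.0 * j2), k * (6.0 * j2 - j0)


def relaxation_matrix(distances, tau_c, freq_hz):
    """Build R with summed rho on the diagonal and sigma off the diagonal."""
    n = len(distances)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                rho, sigma = pair_rates(distances[i][j], tau_c, freq_hz)
                R[i][i] += rho
                R[i][j] = sigma
    return R


def buildup(R, source, tau_m, steps=2000):
    """Integrate dM/dt = -R M (fourth-order Runge-Kutta) from a unit perturbation."""
    n = len(R)
    m = [1.0 if i == source else 0.0 for i in range(n)]
    h = tau_m / steps

    def deriv(v):
        return [-sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]

    for _ in range(steps):
        k1 = deriv(m)
        k2 = deriv([m[i] + 0.5 * h * k1[i] for i in range(n)])
        k3 = deriv([m[i] + 0.5 * h * k2[i] for i in range(n)])
        k4 = deriv([m[i] + h * k3[i] for i in range(n)])
        m = [m[i] + h * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0
             for i in range(n)]
    return m


# Hypothetical geometry (angstroms): spin 0 is close to spin 1 and far from spin 2,
# while spins 1 and 2 are a geminal pair.
three_spin = [[0.0, 2.5, 4.0], [2.5, 0.0, 1.78], [4.0, 1.78, 0.0]]
two_spin = [[0.0, 4.0], [4.0, 0.0]]   # the long distance treated as an isolated pair

TAU_C, FREQ, TAU_M = 3.0e-9, 400.0e6, 0.1   # s, Hz, s
full = buildup(relaxation_matrix(three_spin, TAU_C, FREQ), 0, TAU_M)
isolated = buildup(relaxation_matrix(two_spin, TAU_C, FREQ), 0, TAU_M)
```

With these assumed values, the intensity reaching spin 2 at a mixing time of 0.1 s (`full[2]`) is several times larger than the isolated-pair value (`isolated[1]`), because most of it arrives through spin 1. Interpreting that intensity with a simple r⁻⁶ law would therefore give a distance that is too short.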
Within the initial rate regime, as the name implies, enhancements grow linearly, so one way to reject spin diffusion is to measure NOESY crosspeak intensities at several mixing times, and then to take the initial slope of the time-course as being proportional to r⁻⁶. However, the time during which the initial rate approximation is valid is different for each proton, and for some geometries associated with rapid spin diffusion, the true initial rate may escape detection (e.g., curve β2 in Fig. 6). This leads to incorrect distances if the simple r⁻⁶ dependence is assumed.

The other most important reason for uncertainty in the relationship between NOE intensity and distance is that of motion. Quite apart from their dependence on internuclear distance, NOE intensities depend critically on the motion of the internuclear vector connecting the interacting protons, so if the simple assumption that there is a single rigid structure tumbling isotropically is invalid, this will alter particular intensity values. For globular proteins, the main consideration is that of internal motions, and these have two important effects. First, if the distance between the interacting protons changes as a result of the motion, then the measured intensity represents an average over the motion. This average may be strongly weighted toward shorter distances because of the r⁻⁶ dependence. Second, whether or not the distance changes, NOE intensity is affected by the local mobility of the interacting protons. As pointed out in Section 3, the NOE is positive for small molecules and negative for large. If a large molecule includes a region of high local mobility, NOE interactions involving one or more protons in the mobile region will behave as if they occurred in a smaller, more rapidly tumbling molecule. Since proteins are large enough to be in the negative NOE regime, this means that motion tends to reduce NOE intensities. It is quite common, for instance, for a few residues at the C or N terminus of a protein to show very weak NOESY crosspeaks, as a result of the greater flexibility in these regions.

At a more practical level, there is also the matter of measuring NOESY crosspeak intensities. Volume integration is certainly the correct method and is being used increasingly. However, methodology is still developing in this area, and at present, volume integration can be difficult in regions of overlap, or where the base surface of the 2D spectrum is distorted or noisy. For convenience, measurement of peak height (often simply by counting contours in an evenly contoured plot) is sometimes substituted for volume integration.
However, this should be combined with at least approximate corrections for individual linewidths and multiplet structures, since these factors alter the relationship between integral and peak height for each crosspeak.

Approximate calibration of the data against distance can often be achieved by examining the overall intensity distribution of the NOESY crosspeaks, particularly once something is known about the location of secondary structural elements. Thus, dαN crosspeaks in regions of regular β structure are very intense, corresponding to a distance of 2.2 Å, whereas dαN(i, i+3) interactions in α-helices correspond to separations of 3.4 Å. If a more formal calibration is required, this can be obtained by identifying and quantifying one or more crosspeaks corresponding to known "reference" distances. Crosspeak intensity ratios can then be used to estimate the ratio of an unknown distance to the reference distance, using the equation a1/a2 = (r1/r2)⁻⁶, where a1 and a2 are the two intensities, and r1 and r2 are the two distances. A geminal methylene interaction (e.g., the [CαH, CαH] crosspeak of glycine, r = 1.75 to 1.8 Å) is often used as a reference or, alternatively, the interaction between adjacent aromatic ring protons, e.g., of Tyr (r = 2.8 Å). Since there is always the possibility of errors owing to internal motions and spin diffusion, it is as well to compare results with several reference distances and to assess whether the calibration is reasonable in terms of the implied range of distances observed for sequential contacts. Note also that, for proteins, it is common practice to set only the upper bounds of the distance constraints according to this calibration, the lower bounds being set in each case to the sum of the appropriate van der Waals contact radii. This is again to allow for internal motions; if a crosspeak is weak, it cannot be assumed that this implies the interacting protons are necessarily distant, since a short-range interaction could always have been quenched by high local mobility.

With these factors in mind, it can be seen that quantification requires caution. Rather than attempting to find a rigid relationship between intensity and distance, crosspeak intensities are divided into semiquantitative groups (e.g., "strong," "medium," and "weak"), and each group is associated with an appropriate calibrated value of the upper bound for the corresponding distance constraints. The longer the distance these upper bounds are set to, the more certain it is that the data are not overinterpreted and that the various sources of error mentioned earlier are allowed for, but the less active the distance constraints are in determining the structure during the subsequent calculations.
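The calibration equation and the grouping into semiquantitative classes can be sketched as follows. The reference distance of 1.78 Å is one value within the glycine range quoted above; the three upper bounds (2.7, 3.5, and 5.0 Å) are assumptions chosen for illustration, since the text gives no values, and the function names are hypothetical.

```python
def distance_from_reference(intensity, ref_intensity, ref_distance):
    """Solve a1/a2 = (r1/r2)**-6 for the unknown distance r1 (same units as r2)."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Assumed upper bounds (angstroms) for the semiquantitative classes.
UPPER_BOUNDS = [("strong", 2.7), ("medium", 3.5), ("weak", 5.0)]


def classify(intensity, ref_intensity, ref_distance):
    """Return (class label, upper bound) for a crosspeak intensity."""
    r = distance_from_reference(intensity, ref_intensity, ref_distance)
    for label, bound in UPPER_BOUNDS:
        if r <= bound:
            return label, bound
    # Any observed crosspeak weaker than the last class still gets the loosest bound.
    return UPPER_BOUNDS[-1]


# Reference: a geminal crosspeak of unit intensity at an assumed 1.78 angstroms.
r_weak = distance_from_reference(1.0 / 64.0, 1.0, 1.78)   # 64-fold weaker peak
peak_class = classify(1.0 / 64.0, 1.0, 1.78)
```

Because of the sixth-root dependence, a 64-fold drop in intensity corresponds to only a twofold increase in distance (3.56 Å here), which is why fairly coarse intensity classes are adequate. Only the upper bound is taken from the class; the lower bound would be set separately to the van der Waals contact distance, as described above.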
In addition to constraints based on NOE data, increasing use is being made of coupling constant data to specify torsion angles. Accurate values of coupling constants are not trivial to obtain for proteins, and the Karplus equation (see Chapter 1), which relates coupling constants to torsion angles, is semi-empirical and somewhat approximate. Therefore, much as for NOE-derived constraints, coupling constants lead to acceptable ranges for the corresponding torsion angle, rather than specific values. Also, because the Karplus equation is multivalued, a given coupling constant is compatible with either two or four possible angles (or ranges). For these reasons, often only those coupling constants with extreme values are used to provide constraints. This


minimizes the ambiguities and maximizes the chance that the coupling originates in a region of defined local conformation. Coupling constant data are in many ways complementary to NOE-derived constraints, since couplings relate to local structural detail, which is precisely where the approximate nature of NOE constraints leads to difficulties. For proteins that can be labeled with 13C or 15N, it is to be expected that heteronuclear coupling constant measurements will prove very useful in the future.
Several methods of calculation are available for determining structure from the NMR constraints. Some aim to tackle the purely geometrical problem of fitting the maximum number of constraints while maintaining the covalent connectivity and minimizing van der Waals contacts. Others combine this process with energy calculations, which necessitates expressing the NMR constraints as if they were additional energy terms. Because of the number and approximate nature of the constraints, there is no one structure that uniquely fits the NMR data. For this reason, it is usual to carry out a series of calculations using randomly different starting conditions (the meaning of this depends on the particular method) and to compare the results for the whole series. Each calculated result then represents one point in the conformational space compatible with the NMR constraints. If the method used is itself unbiased, and if a sufficient number of calculations are carried out, the set of conformations represents this conformational space. Also, some judgment can then be made as to how well different parts of the structure are defined by the data, since well defined regions will vary little between the different computed conformations, whereas poorly defined regions will differ more considerably. As an example, Fig. 11 shows a set of five conformations for the small protein BUSI IIA, calculated using a distance geometry algorithm.
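One simple way to put numbers on how well different parts of a structure are defined across a set of computed conformations is sketched below. It avoids superimposing the structures by working only with interatomic distances: for each atom, it reports the root-mean-square of the standard deviations, across the models, of that atom's distances to all other atoms. This is a superposition-free proxy chosen here for brevity; it is not the method used to produce Fig. 11, and the function name and input layout are assumptions.

```python
import math
from itertools import combinations


def per_atom_variability(ensemble):
    """Per-atom spread of interatomic distances across an ensemble.

    ensemble: list of models; each model is a list of (x, y, z) tuples,
    with atoms in the same order in every model.
    Returns one value per atom (same units as the coordinates).
    """
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    var_sum = [0.0] * n_atoms  # summed variance of distances involving each atom
    for i, j in combinations(range(n_atoms), 2):
        d = [math.dist(model[i], model[j]) for model in ensemble]
        mean = sum(d) / n_models
        var = sum((x - mean) ** 2 for x in d) / n_models
        var_sum[i] += var
        var_sum[j] += var
    # Each atom takes part in (n_atoms - 1) distances.
    return [math.sqrt(v / (n_atoms - 1)) for v in var_sum]


if __name__ == "__main__":
    # Two toy four-atom models: the first three atoms keep the same internal
    # geometry, while the last atom sits in a different place in model b.
    a = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
    b = [(5, 5, 5), (5, 6, 5), (5, 7, 5), (5, 9, 5)]
    print(per_atom_variability([a, b]))
```

Atoms in well defined regions give values near zero, because their distances to the rest of the molecule are reproduced from one calculation to the next; atoms in poorly defined regions give larger values.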
Calculations of this sort necessarily assume that there is a single conformation to be found, and it is reasonable to ask: What are the consequences when this assumption breaks down? If, as may often be the case, the gross conformation is preserved while the local detail varies, this is unlikely to affect interpretation of the NOE constraints very much because of their already approximate nature. In other cases, the consequences depend on the nature and extent of the conformational heterogeneity and its effect on the NOE constraints. As indicated earlier, flexibility often reduces NOE intensities, so that flexible


Fig. 11. Five conformations calculated for the small protein BUSI IIA using the program DISGEO. In addition to the covalent structure (including disulfide bridges), the input data for these calculations comprised 202 assigned NOE constraints, several constraints derived from coupling constants, and explicit constraints for hydrogen bonds within the regular secondary structural elements once these elements had been located from initial calculations. For certain regions of the protein where there are many constraints, such as in the α-helix and triple-stranded β-sheet, there is good agreement between the individual structures, whereas in others where there are fewer constraints, such as in the N-terminal region and the loop at the bottom of the structure (as shown), there is much more divergence. (Reproduced, with permission, from ref. 36.)

regions may be poorly constrained by the data, leading to a large divergence between individual calculated structures in such regions. On the other hand, if constraints are available from flexible regions, it is likely that many of them will be impossible to fulfill in any single structure, since they represent an average over different conformations. This would lead to large constraint violations in individual calculated structures. If there are regions of stable structure amid more flexible parts (e.g., domains connected by flexible linkers), then interpretation is likely only to be possible for the structured regions in isolation. However, if there are many contributing conformers that differ grossly throughout the molecule, interpretation cannot be expected to succeed.
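The point that constraints from flexible regions represent an average over conformations can be made concrete with a small calculation. The sketch below assumes simple population-weighted averaging of r^-6 over the contributing conformers, which is itself an approximation (it ignores the timescale of the motion and any angular effects); the function name and the example distances are illustrative assumptions.

```python
def apparent_distance(distances, populations=None):
    """Distance implied by an NOE averaged as <r**-6> over several conformers.

    distances: interproton distance in each conformer.
    populations: fractional populations (default: equal), which must sum to 1.
    """
    if populations is None:
        populations = [1.0 / len(distances)] * len(distances)
    if abs(sum(populations) - 1.0) > 1e-9:
        raise ValueError("populations must sum to 1")
    mean_inv6 = sum(p * r ** -6 for p, r in zip(populations, distances))
    return mean_inv6 ** (-1.0 / 6.0)


if __name__ == "__main__":
    # A proton pair that is 2.5 angstroms apart in one conformer and
    # 6.0 angstroms apart in another, equally populated, conformer.
    r_app = apparent_distance([2.5, 6.0])
    print(round(r_app, 2))  # dominated by the short distance
```

Because the average is dominated by the closest approach, the apparent distance lies much nearer to the shorter value than to the arithmetic mean. If a proton shows such averaged contacts to two partners that are close to it only in different conformers, no single structure can satisfy both constraints at once, which is the origin of the violations described above.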


Among the purely geometrical methods, there are two quite separate approaches, although in at least one case the results they give do not seem to differ very greatly (37). The program DISMAN works by systematically varying torsion angles, while minimizing a target function that represents the sum of the NMR constraint violations and the van der Waals interactions (38). In order to avoid local minima, the constraints are often introduced progressively, beginning with those that span only a few residues, and including longer range constraints only later in the calculation. Because the method operates on a starting conformation, it is possible to start either from randomly generated conformations or from some known model structure if desired.
The other geometrical approach is that of distance-geometry calculation. For a structure of N atoms, any geometry in (N - 1)-dimensional space will be compatible with a full set of N(N - 1)/2 distances measured between the atoms in three dimensions (39). Distance-geometry calculations project such a high-dimensional geometry into three-dimensional space using a process called “embedding,” while minimizing the extent to which constraint violations are introduced. Further optimization of the structure is then carried out. For calculations based on NMR data, distance geometry methods also have to cope with the fact that only some small fraction of the total number of distances is known, and although the covalently bonded distances are known precisely, the NOE-derived distances are not. As mentioned earlier, this imprecision is usually handled by running several calculations. For each calculation, particular values are chosen at random from between the upper and lower bounds for each constraint, and some further checking is carried out to make these choices as mutually consistent as possible.
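The random choice of trial distances between bounds can be sketched as follows. As one illustration of a consistency check, the sketch first tightens the upper bounds using the triangle inequality (no distance can exceed the sum of the other two sides of any triangle of atoms) and only then draws a trial value for each atom pair. This is a simplified stand-in chosen for illustration, not the procedure of DISGEO or any other named program; the function names, the seed argument, and the placeholder value used for an unknown upper bound are all assumptions.

```python
import random


def smooth_upper_bounds(upper):
    """Tighten upper bounds so that u[i][j] <= u[i][k] + u[k][j] for all k."""
    n = len(upper)
    u = [row[:] for row in upper]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if u[i][k] + u[k][j] < u[i][j]:
                    u[i][j] = u[i][k] + u[k][j]
    return u


def sample_trial_distances(lower, upper, seed=None):
    """Draw one symmetric matrix of trial distances between the bounds."""
    rng = random.Random(seed)
    n = len(upper)
    trial = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            trial[i][j] = trial[j][i] = rng.uniform(lower[i][j], upper[i][j])
    return trial


if __name__ == "__main__":
    UNKNOWN = 999.0  # placeholder for "no experimental upper bound"
    upper = [[0.0, 1.5, UNKNOWN],
             [1.5, 0.0, 1.5],
             [UNKNOWN, 1.5, 0.0]]
    lower = [[0.0, 1.0, 1.0],
             [1.0, 0.0, 1.0],
             [1.0, 1.0, 0.0]]
    smoothed = smooth_upper_bounds(upper)
    print(smoothed[0][2])  # the unknown bound is limited by the two known ones
    print(sample_trial_distances(lower, smoothed, seed=1))
```

Repeating the sampling with different seeds gives a different set of trial distances for each calculation, which is how a family of structures, rather than a single answer, is generated from the same constraint list.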
Note that, unlike the DISMAN program, distance-geometry calculations do not operate on a starting conformation, but take only the covalent connectivities and NMR constraints as input. The NMR constraints can also be incorporated into the force field used for various types of energy calculation; the calculations are then said to be “restrained.” In such calculations, violating an NMR constraint is associated with an energy penalty, the functional form of which is often chosen to be parabolic for convenience. The relative weight attached to the NMR constraints, as opposed to the purely


energetic terms, can also be varied, sometimes during the calculation itself. Methods in this category include restrained energy minimization, restrained molecular dynamics, and “simulated annealing,” which in this context is essentially a simplified version of molecular dynamics. In practice, several techniques are often combined. Quite often, the geometrical methods yield rather high-energy conformations. Therefore, a typical strategy might be to begin with a series of geometrical calculations, to select those that have converged, and then to refine these using restrained molecular dynamics or energy minimization.
Another method of refinement at present being developed is that of back-calculation of the NOESY spectrum. As was mentioned in Section 3, it is possible to obtain theoretical crosspeak intensities for a known structure even when there is spin diffusion, by using a relaxation matrix calculation. Thus, once an initial structure has been obtained, it can be optimized by iterative comparison of the back-calculated theoretical NOESY intensities with the real NOESY data to obtain the best match (40). This process transforms limited spin diffusion from a problem into a source of information and has the added advantage that it is largely independent of the external decisions necessary in the earlier preparation of input data, particularly in the quantification and calibration of the NOE intensities. However, there are some practical problems in implementing the method, and the problem of accounting for dynamics of the structure remains.
6. Partial Answers: Lower Resolution Information
We have seen in the preceding sections how NMR spectroscopy can be used to determine in detail the 3D structures of small proteins. The applicability of these methods is constantly being extended with the development of new experiments, but there are nonetheless many instances, particularly when the protein is rather large, when such a detailed analysis will not be possible.
It may nonetheless be possible to devise NMR experiments that can yield valuable, albeit more limited, structural information in these cases. There have been many such studies, using widely differing experimental strategies; many examples can be found in the book by Jardetzky and Roberts (41). Here we attempt only the briefest of overviews.


There are numerous examples where, although the majority of lines in a spectrum remain unresolved and unassigned, a limited number of “marker” resonances can be found that are particularly well resolved and can provide useful structural information, provided that they can be assigned. Often such marker signals are those with extreme values of chemical shift that place them outside the broad envelopes of overlapping signals. Although full sequential assignments obviously cannot be made under these circumstances, it may be possible to assign marker resonances by studying chemically modified or mutant proteins, or protein fragments. This principle underlies many NMR studies of larger proteins in particular. Among the most widely studied of marker resonances in protein spectra are those of the C2H protons of histidine residues (42). These resonances are often relatively sharp and in an uncrowded region of the spectrum (in D2O), facilitating their observation. In addition, their chemical shifts are pH dependent, reflecting the ionization of the side chain imidazole function, and this permits their pKa values to be determined. This can provide mechanistically important information about the ionization states of specific histidines in the active site of an enzyme, for example. Conveniently observed resonances such as these can also be employed as more general probes for detecting conformational change in proteins. For example, Fig. 12 illustrates how the histidine resonances of staphylococcal nuclease turned out to be very useful in studies of the proline isomerization equilibria in this protein (18). Hemoglobin is a protein that has been much studied with the aid of a limited number of assigned resonances in its 1H spectrum (44,45). These include many of the His C2H resonances as well as others that are well resolved as a result of large shift perturbations caused by the iron-porphyrin group.
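The determination of a histidine pKa from the pH dependence of its C2H chemical shift, mentioned above, can be sketched as a simple curve fit. The sketch assumes fast exchange between the protonated and deprotonated forms and a single titrating group, so that the observed shift is a population-weighted average of two limiting shifts. For each trial pKa, the two limiting shifts are obtained by linear least squares, and the trial value giving the smallest residual is kept. The function names, the search range, and the synthetic titration data are illustrative assumptions.

```python
def fraction_protonated(ph, pka):
    """Henderson-Hasselbalch fraction of the protonated form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def fit_pka(ph_values, shifts, pka_min=4.0, pka_max=9.0, step=0.01):
    """Grid search over pKa; limiting shifts come from a straight-line fit.

    Model: shift = shift_deprot + (shift_prot - shift_deprot) * f(pH, pKa).
    Returns (pKa, shift of protonated form, shift of deprotonated form).
    """
    n = len(ph_values)
    s_mean = sum(shifts) / n
    best = None
    for k in range(int(round((pka_max - pka_min) / step)) + 1):
        pka = pka_min + k * step
        f = [fraction_protonated(ph, pka) for ph in ph_values]
        f_mean = sum(f) / n
        sff = sum((x - f_mean) ** 2 for x in f)
        if sff == 0.0:
            continue  # no variation in f: the line fit is undefined
        slope = sum((x - f_mean) * (y - s_mean) for x, y in zip(f, shifts)) / sff
        intercept = s_mean - slope * f_mean
        sse = sum((intercept + slope * x - y) ** 2 for x, y in zip(f, shifts))
        if best is None or sse < best[0]:
            best = (sse, pka, intercept + slope, intercept)
    if best is None:
        raise ValueError("data do not constrain the fit")
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Synthetic titration: pKa 6.5, limiting shifts 8.7 and 7.7 ppm (illustrative).
    phs = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5]
    obs = [7.7 + 1.0 * fraction_protonated(p, 6.5) for p in phs]
    print(fit_pka(phs, obs))
```

Real titration curves may deviate from this single-site form when neighboring groups titrate over the same pH range, in which case a more elaborate model would be needed.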
Assignment of these resonances has been facilitated by the availability of a wide range of mutant hemoglobins. For example, if a specific histidine residue is absent in a specific mutant, it is often possible to identify the C2H resonance of that residue in the wild-type protein spectrum simply on the basis of its absence in the mutant protein spectrum (this approach works only if the mutation does not cause significant wider conformational change; otherwise the shifts of many resonances will be altered, and interpretation will no longer be straightforward. Thus, it is particularly well suited to


Fig. 12. (A) Spectrum of staphylococcal nuclease in D2O (horizontal axis: chemical shift, ppm). All of the amide NHs have been allowed to exchange out, revealing clearly the C2H resonances of the four histidines to low field in the spectrum. The smaller signals denoted by * arise from a minor conformer of the protein, which differs marginally from the major form, in particular by cis-trans isomerism about a single prolyl peptide bond. These are the only resonances of the minor form that are clearly resolved in the spectrum, demonstrating the utility of His C2H peaks as marker signals. (B) Spectrum of a mutant nuclease (P117G). The proline that was thought to isomerize was replaced by a glycine to see if this would remove the conformational heterogeneity. The minor resonances have indeed disappeared from the spectrum, supporting the hypothesis. (Reproduced, with permission, from ref. 43.)

surface residues, as histidines often are). These NMR studies, particularly measurements of histidine pKa values, have provided important information concerning the mechanism of cooperative oxygen binding and the structural changes associated with it. If increasing size in proteins meant simply increasing the dimensions of a single globular structure, then we would expect that NMR spectra would become ever broader and less informative. However,


this is not necessarily the case in reality, because large proteins tend to be segregated into structural domains. In some cases, the interactions between these are relatively weak, and there may then be sufficient relative mobility of individual domains to give surprisingly good spectra. Although much of the spectrum may be hopelessly overcrowded, a limited number of well resolved peaks may be sufficient to provide a useful range of structural probes. Individual domains, isolated from the remainder of the protein by, for example, proteolysis, may fold independently, and comparison of their spectra may make it possible to identify individual resonances in the intact protein spectrum. This has recently been shown for the multidomain fibrinolytic protein, urokinase, as illustrated in Fig. 13A (15). In principle, it might then be possible to investigate the effects of interdomain interactions. An interesting feature of multidomain proteins that has recently been explored is that individual domains may have different thermal stabilities, so that it is possible to obtain spectra of partially unfolded states in which only certain domains remain folded. This was also demonstrated for urokinase, where independent unfolding of four separate domains was observed (see Fig. 13B) (15). Studies of this type are of interest in terms of protein folding, but they also offer the prospect of investigating the possible presence of distinct structural domains where nothing is otherwise known about the structure. Even in cases where the dimensions of a protein are such that resonances of nuclei buried in the core of the structure are hopelessly broadened, there are sometimes more mobile segments of the molecule, giving rise to well resolved lines in the spectrum. It may be possible to identify the origin of these regions, for example, by comparing spectra of partially proteolyzed derivatives.
The existence of such regions of enhanced flexibility may be of functional significance, and their identification through NMR in this way thus constitutes valuable information in itself. A good example of this is the pyruvate dehydrogenase multienzyme complex, which has a mol wt of approx 6 million, but has profitably been studied by NMR, which revealed the presence of a flexible linker segment that apparently provides for rapid conformational changes that are crucial to the catalytic mechanism (46). The idea of focusing on particular, readily observed, resonances is particularly appealing if it is possible to select them so that they directly reflect a region of interest in the molecule, such as the active site. In


Fig. 13. (A) Spectra of human urokinase and various fragments thereof. The intact protein (spectrum a) has a mol wt of approx 60 kDa, but some resonances in its spectrum are nonetheless quite well resolved. This is because the protein is constructed in a modular fashion from three quasi-independent domains: a serine protease domain (spectrum c), a kringle domain (spectrum a), and an EGF-like domain (spectrum b is of a fragment comprising the kringle and EGF domains). Comparison with spectra of the isolated domains permits resonances in the intact protein spectrum to be assigned; for example, the well resolved resonance at -1.0 ppm is present in the spectrum of the isolated kringle and can therefore be assigned to this domain. (B) Spectra of urokinase acquired over a range of temperatures. It is apparent that some of the resolved upfield-shifted resonances disappear from the spectrum at lower temperatures than others. This reflects noncooperative thermal unfolding; for example, the disappearance of the resonance at -1.0 ppm from the spectrum above 50°C shows that the kringle domain has unfolded while more thermostable parts of the protein remain intact. (Reproduced, with permission, from ref. 15.)


studies of protein-ligand complexes, for example, one can focus on the ligand (see Chapter 7). Thus, one can study ligands by heteronuclear NMR methods if they are labeled or, in the case of metal ions, if they can be substituted by ions, such as 113Cd2+, that can be studied directly by NMR. For example, in studies of metallothionein, it was possible to determine which residues coordinate the metal ion by detecting coupling of cysteinyl CβH protons to 113Cd2+ (47). Alternatively, it may be possible to study the conformation of the bound ligand when it is in equilibrium with the free form (which may be in excess, so that its spectrum is readily observed) by the detection of transferred NOEs (48). It may also be possible to use labeled ligands to obtain structural information about the residues of the protein itself, to which they are bound. NOEs can also be detected between nuclei of the ligand and of the protein, potentially providing a very powerful specific probe of the binding site; however, the success of such experiments has so far proven to be limited in practice, principally because they do not overcome the problem of assigning the protein resonances concerned. NMR also has considerable potential as a technique for studying nonnative states of proteins. This is of considerable importance in protein folding studies, and even though the information available may be rather limited, it is, in most cases, virtually the only residue-specific structural information obtainable and therefore very valuable. The problem with partially unfolded states is that they tend to give very poorly resolved spectra, so that direct assignment and structural interpretation are very difficult. However, it may be possible to use the well resolved native state spectrum to obtain information indirectly about the nonnative one (8,49).
For example, chemical shifts of individual protons in the partially folded state may be determined by magnetization transfer from the corresponding native state resonances where the two forms are interconverting. Hydrogen bonded structure in the partially folded form may be detectable by means of the protection it offers against exchange of NHs for deuterons when the protein is dissolved in D2O. The pattern of labeling in the partially folded form can be determined by allowing the protein to fold and determining the extent of proton occupancy at individual sites in the well resolved spectrum of the native form. This idea has now been extended to the characterization of transient structural intermediates on refolding pathways (50,51).

7. Practical Considerations

The feasibility of undertaking a detailed structural study of a protein by NMR depends, in part, on the intrinsic properties of the protein, as discussed in Section 2. A full 3D structure determination generally requires a level of assignment and analysis that is only currently attainable for relatively small proteins. Thus, the NMR spectroscopist’s first question about a protein is always said to be “how big is it?” As we pointed out before, there are no hard and fast rules; as a guideline, we would suggest that if the mol wt is <10 kDa, it is a possibility well worth considering, although only preliminary studies to gauge the quality of the spectrum can really tell. For proteins between 10 and 15 kDa, it may still be possible, but the undertaking becomes increasingly onerous. A few spectra of proteins of this size have been assigned using only homonuclear 1H NMR, although in these cases, other tricks were generally used to obtain “edited” spectra in order to resolve problems of resonance overcrowding: for example, differential solvent exchange rates of amide protons in the case of lysozyme (13) and the variable oxidation state of the prosthetic group in the case of flavodoxin (17). In the case of proteins much larger than approx 15 kDa, 13C or 15N labeling to permit heteronuclear studies will undoubtedly become necessary if much progress with assignment is to be made. The alternative with these and still larger proteins is to settle for seeking more limited structural information, as discussed in Section 6. The other major limitation of NMR is its insensitivity. Obtaining a 1-mM solution, the minimum desirable for 1H NMR, requires 5 mg of protein to be dissolved in a 0.5-mL sample, in the case of a protein of 10 kDa. The overall requirement, both in terms of the amount of protein needing to be purified and its solubility, could therefore in some cases be excessive.
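The sample requirement quoted above follows from simple arithmetic, which the short sketch below reproduces so that it can be repeated for other molecular weights and sample volumes. The function name and the second example (a 25-kDa protein) are illustrative assumptions.

```python
def protein_mass_mg(conc_mM, volume_mL, mol_wt_kDa):
    """Mass of protein (mg) needed for a sample.

    mM x mL gives micromoles, and 1 micromole of a 1-kDa species
    (1 kg/mol) weighs 1 mg, so the three numbers simply multiply.
    """
    return conc_mM * volume_mL * mol_wt_kDa


if __name__ == "__main__":
    # The case given in the text: 1 mM, 0.5 mL, 10 kDa.
    print(protein_mass_mg(1.0, 0.5, 10.0))
    # The same concentration and volume for a hypothetical 25-kDa protein.
    print(protein_mass_mg(1.0, 0.5, 25.0))
```

The requirement scales linearly with molecular weight, so the amount of material needed rises in step with the spectroscopic difficulty as proteins become larger.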
It is also important to note that the protein must not only be soluble at this concentration, but it must also not aggregate appreciably; otherwise, the effective mol wt will of course be greatly increased, and the spectrum will correspondingly tend to be poorer. It may be necessary to experiment with a variety of solution conditions in order to optimize the spectral quality obtainable. In preparing a sample for NMR studies, several factors unique to this technique need to be borne in mind. In particular, the solvent conditions may have to be adjusted. NMR studies require that the


solvent water be at least 10% deuterated to permit the field frequency lock to function; for some studies, it may be desirable to work in virtually 100% D2O. Thus, some form of buffer exchange is generally necessary. Since NMR samples are typically more concentrated than those used for other studies, this is usually associated with a concentration step. The simplest method is to freeze-dry the protein, in the absence of added buffer salts, and then redissolve the product in a buffer appropriate to NMR work. Some proteins cannot be freeze-dried, however, and in that case, it may be necessary to use some form of concentrator that employs a semipermeable membrane to effect buffer exchange and concentration. A particular problem with protein samples is the presence of small molecule signals that can interfere with the spectrum. The most obvious of these is water. Since it is in general necessary to work in H2O rather than D2O solution, in order to observe all of the exchangeable NH signals, considerable effort has been put into developing methods for suppressing the water peak in protein spectra. This may be achieved either by selective saturation of the water or selective excitation of the remainder of the spectrum (52). Whatever technique is applied, the key to success is excellent field homogeneity, since line-shape distortions can lead to poor suppression of parts of the peak, resulting in serious baseline distortions in the protein spectra obtained. It must be remembered that any solute molecule containing protons will also give rise to signals in the 1H spectrum and, therefore, that it is desirable to remove such species as far as possible. This can usually be achieved by dialysis or gel filtration, but it can present more of a problem in relation to the buffer requirements of the particular protein.
The simplest solution is to use an inorganic buffer, such as phosphate, whose only protons are in fast exchange with the water and can readily be preexchanged for deuterons if need be simply by freeze-drying from D2O solution; alternatively, several common buffer salts are available as perdeuterated derivatives, and simple compounds, such as (d4)-acetic acid, are indeed quite cheap to obtain. If it is necessary to use a protonated buffer, its concentration should be kept to an absolute minimum. It is desirable, as with all experiments, to minimize contaminants of any kind, but certain types present particular hazards to NMR samples, and it may be necessary to take special steps to avoid them. Paramagnetic impurities can cause broadening and shifting of resonances in protein spectra, and need to be excluded if they are found to be present


in a protein sample. To remove trace metal ions, it may be sufficient to add a low concentration of EDTA or EGTA to the solution, but it is probably preferable to remove them altogether by dialysis against one of these agents or by passing the protein through a column of a metal ion sequestering resin. Of course, the problem may need more careful consideration in the case of a metalloprotein! It should be noted that molecular oxygen is itself a paramagnetic impurity, and some workers remove dissolved oxygen from NMR samples by freeze-thaw methods. However, the effects are rather marginal in practice, and since the process is time-consuming and may have deleterious effects on some proteins, it is not very common today. Optimum linewidths in NMR spectra depend on a number of factors. One is the condition of the sample, which should be free of extraneous matter. This can usually be achieved by centrifugation prior to placing the sample in the NMR tube. Most important of all is the homogeneity of the magnetic field, which must be optimized by “shimming.” This may be a very tedious process (although increasingly it is possible to get the instrument to do it, at least in part, automatically), but is absolutely necessary. Some workers find that it is best to shim using a sample of a small molecule first, where resolution of very fine couplings provides a very stringent test of homogeneity; it should then only be necessary to make small final adjustments on introducing the protein sample. Another factor that is a key to the success of 2D experiments is stability; this is to some extent dependent on the quality and situation of the instrument, but care should also be taken by the experimenter to ensure, for example, that the probe temperature is fully equilibrated before commencing acquisition. Running the experiment without spinning the sample also improves stability substantially, without degrading the resolution noticeably, provided the shimming is adequate.
The discussion presented here can serve only to provide a brief introduction to the study of protein structure by NMR. Today NMR is becoming an increasingly accessible technique, no longer solely the preserve of specialist spectroscopists. Nonetheless, it should be clear that an NMR study, particularly at the level of detailed assignment and structural analysis, is still a major undertaking requiring a considerable input of time, far beyond that required just to acquire the spectra, and that a degree of acquired expertise is necessary to obtain useful spectra and interpret them correctly. Protein NMR is still far from being a routine


technique, but the opportunities to take advantage of the unique structural information that it can generate are constantly increasing.
References
1. Wüthrich, K. (1986) NMR of Proteins and Nucleic Acids. Wiley, New York.
2. Wüthrich, K. (1989) Protein structure determination in solution by nuclear magnetic resonance spectroscopy. Science 243, 45-50.
3. Wüthrich, K. (1989) The development of nuclear magnetic resonance spectroscopy as a technique for protein structure determination. Accounts Chem. Res. 22, 36-44.
4. Clore, G. M. and Gronenborn, A. M. (1990) Determination of three-dimensional structures of proteins and nucleic acids in solution by nuclear magnetic resonance spectroscopy. CRC Crit. Rev. Biochem. 24, 419-564.
5. Griffey, R. H. and Redfield, A. G. (1987) Proton-detected heteronuclear edited and correlated nuclear magnetic resonance and nuclear Overhauser effect in solution. Q. Rev. Biophys. 19, 51-82.
6. McIntosh, L. P. and Dahlquist, F. W. (1990) Biosynthetic incorporation of 15N and 13C for assignment and interpretation of NMR spectra of proteins. Q. Rev. Biophys. 23, 1-38.
7. Perkins, S. J. (1982) Applications of ring current calculations to the proton NMR of proteins and nucleic acids, in Biological Magnetic Resonance, vol. 4 (Berliner, L. J. and Reuben, J., eds.), pp. 193-336.
8. Baum, J., Dobson, C. M., Evans, P. A., and Hanley, C. (1989) Characterisation of a partly folded protein by NMR methods. Studies on the molten globule state of α-lactalbumin. Biochemistry 28, 7-13.
9. Smith, S. O. and Griffin, R. G. (1988) High resolution solid state NMR of proteins. Annu. Rev. Phys. Chem. 39, 511-536.
10. Tappin, M. J., Pastore, A., Norton, R. S., Freer, J. H., and Campbell, I. D. (1988) High resolution NMR study of the solution structure of δ-hemolysin. Biochemistry 27, 1643-1647.
11. Braun, W., Wider, G., Lee, K. H., and Wüthrich, K. (1983) Conformation of glucagon in a lipid-water interphase by 1H nuclear magnetic resonance. J. Mol. Biol. 169, 921-948.
12. Neuhaus, D., Nakaseko, Y., Nagai, K., and Klug, A. (1990) Sequence-specific [1H]NMR resonance assignments and secondary structure identification for 1- and 2-zinc finger constructs from SWI5; a hydrophobic core involving four invariant residues. FEBS Lett. 262, 179-184.
13. Redfield, C. and Dobson, C. M. (1988) Sequential assignments and secondary structure of hen egg-white lysozyme in solution. Biochemistry 27, 122-136.
14. Wagman, M. E., Dobson, C. M., and Karplus, M. (1980) Proton NMR studies of the association and folding of glucagon in solution. FEBS Lett. 119, 265-270.
15. Bogusky, M., Dobson, C. M., and Smith, R. A. G. (1989) Reversible independent unfolding of the domains of urokinase monitored by proton NMR. Biochemistry 28, 6728-6735.


16. LeMaster, D. (1990) Deuterium labeling in NMR structural analysis of larger proteins. Q. Rev. Biophys. 23, 133-174.
17. van Mierlo, C. P. M., Vervoort, G., Muller, F., and Bacher, A. (1990) A two-dimensional 1H NMR study on Megasphaera elsdenii flavodoxin in the reduced state; sequential assignments. Eur. J. Biochem. 187, 521-541.
18. Evans, P. A., Kautz, R. A., Fox, R. O., and Dobson, C. M. (1989) A magnetization transfer NMR study of the folding of staphylococcal nuclease. Biochemistry 28, 362-370.

19. Wagner, G. (1983) Characterisation of the distribution of internal motions in the bovine pancreatic trypsin inhibitor using a large number of internal NMR probes. Q. Rev. Biophys. 16, 1-58.
20. Neuhaus, D. and Williamson, M. P. (1989) The Nuclear Overhauser Effect in Structural and Conformational Analysis. VCH, New York.
21. Marion, D., Ikura, M., Tschudin, R., and Bax, A. (1989) Rapid recording of 2D NMR spectra without phase cycling. Application to the study of hydrogen exchange in proteins. J. Magn. Reson. 85, 393-399.
22. Keeler, J. and Neuhaus, D. (1985) Comparison and evaluation of methods for two-dimensional NMR spectra with absorption-mode lineshapes. J. Magn. Reson. 63, 454-472.

23 Williamson, M P (1987) Guidelines for the design of kinetic NOE expertments from computer simulation Magn Reson Chem 25, 356-36 1 24. Borgras, B. A and James, T. L. (1989) Two-dimensional nuclear Overhauser effect. complete relaxation matrix analysis Methods Enzymol 176, 169-l 83 25 Otting, G. and Wuthrich, K (1990) Heteronuclear filters in 2D [ lH, lH] NMR spectroscopy Combmed use with tsotoprc labelling for studies of macromolecular conformatton and intermolecular mteractrons. Q. Rev Blophys. 23,39-96. 26 Grresinger, C , Sorensen, 0 W , and Ernst, R R. (1989) Novel three-dimensional NMR techniques for studies of peptides and brologtcal macromolecules .I. Am. Chem Sot 109,7227-7228.

27 Englander, S W. and Wand, A J. (1987) Main-chain-directed strategy for the assignment of ‘H NMR spectra of proteins. Biochemistry 26,5953-5958 28 Hyberts, S. G , Markt, W and Wagner, G. (1987) Stereospectfic assignments of side-chain protons and characterisatton of torsteron angles m Eghn c Eur J. Blochem 164,625-635

29 Guntert, P , Braun, W , Brlleter, M , and Wuthrrch, K. (1989) Automated stereospectftc ‘H assignments and their impact on the precision of protein structure determinations in solutron J Am Chem. Sot 111, 39974004. 30 Neri, D , Szyperskt, T , Ottmg, G , Senn, H , and Wuthrrch, K (1989) Stereospectfic nuclear magnetic resonance assignments of the methyl groups of valme and leucine m the DNA-binding domain of the 434 repressor by blosynthettcally directed fractional t3C labelling Biochemistry 28,7510-7516 31 Weber, P L , Morrison, R , and Hare, D. (1988) Determmmg stereo-specific ‘H nuclear magnetic resonance assignments from distance geometry calculations

J. Mol. Blol. 204,483-487

32 Gronenborn, A M., Bax, A., Wmgfteld,

P. T , and Clore, G M (1989) A

66

Neuhaus and Evans

powerful method of sequential proton resonance assignment m proteins using relayed 15N-‘H multiple quantum coherence spectroscopy. FEBS Lett 243, 93-98 33 Feslk, S W , Eaton, H L , OleJniczak, E T , and Zmderweg, E. R P. (1990) 2D and 3D NMR spectroscopy employing 13C - 13C magnetisatlon transfer by isotropic mixing Spm system ldentlflcatlon m large protems J. Am Chem sot. 112,886-888 34. Wang, J , Hmck, A P., Loh, S N., and Markley, J L (1990) Two-dlmensional NMR studies of staphylococcal nuclease. 2 Sequence-specific asslgnments of carbon- 13 and mtrogen- 15 signals from the nuclease H124L-thymedme 3’S’-blsphosphate-Ca*+ ternary complex. Blochemwy 29, 102-I 13 35. Ikura, M , Kay, L. E , Tschudm, R., and Bax, A (1990) Three-dimensional NOESY-HMQC spectroscopy of a 13C labelled protein J Mugn Reson 86, 204-209 36. Wllhamson, M P , Havel, T F , and Wuthnch, K (1985) Solution conformation of protemase mhlbltor IIA from bull seminal plasma by ‘H nuclear magnetlc resonance and distance geometry J. Mol Biol. 182,295-3 15 3’7. Wagner, G , Braun, W , Havel, T F , Schaumann, T , Go, N , and Wuthnch, K (1987) Protem structures m solution by nuclear magnetic resonance and distance geometry. The polypeptide fold of the basic pancreatic trypsin mhibitor determmed usmg two different algorithms, DISGEO and DISMAN J Mol Biol. 196,6 1 l-639 38. Braun, W. and Go, N. (1985) Calculation of protein conformation by protonproton distance constraints, a new efficient algorithm J Mol. Biol 186,61 l626 39. Crippen, G. M. and Havel, T F (1988) Dcstance Geometry and Molecular Conformation Wiley, New York 40 Boelens, R , Koning, T M G., and Kaptem, R. (1988) Determmatlon of blomolecular structures from proton-proton NOE’s usmg a relaxation matrix approach J. Mol. Struct. 173,299-311. 41 Jardetzky, 0 and Roberts, G C K (1981) NMR ln Molecular Biology. Academic, New York 42. Markley, J. L. (1975) Observation of hlstldme residues m protems by means of NMR spectroscopy Act Chem. Res 8,70-80 43. 
Evans, P A , Dobson, C M., Kautz, R A , Hatfull, G , and Fox, R. 0 (1987) Prolme lsomerlsm m staphylococcal nuclease characterlsed by NMR and site directed mutagenesis. Nature 329,266-268 44. Shulman, R G , Hopfield, J. J , and Ogawa, S (1975) Allosteric mterpretatlon of haemoglobm propertles. Q Rev. Brophys 8,325-420. 45. Ho, C. and Russu, I. M. (1987) How much do we know about the Bohr effect m haemoglobin? Biochemistry 26,6299-6305. 46. Radford, S E., Laue, E. D , Perham, R. N., Miles, J S , and Guest, J R (1987) Segmental structure and protein domams m the pyruvate dehydrogenase multienzyme complex of Escherichla coli Biochem J 247,641-649 47. Frey, M. H , Vasak, M., Sorenson, 0 W , Neuhaus, D , Worgotter, E , Kagl J

Proteins

48. 49 50 51 52

in Solution

67

H R , Ernst, R R , and Whthrtch, K (1985) Polypeptide-metal cluster connecttvmes m metallothionem-2 by novel ‘H - lt3Cd heteronuclear 2D NMR experiments. J. Am. Chem Sot 107,6847-6851 Clore, G. M and Gronenborn, A. (1982) Theory and applications of the transferred NOE to the study of the conformations of small hgands bound to proteins J Magn. Reson 48,402-417 Roder, H (1989) Structural charactertsatton of protein folding mtermediates by proton magnetic resonance and hydrogen exchange. Methods Enzymol 176, 446-473. Udgaonkar, J B and Baldwin, R L (1989) NMR evidence for an early framework Intermediate on the folding pathway of rtbonuclease A Nature 335, 694-699 Roder, H , Elove, G., and Englander, S. W (1989) Structural charactertsatton of folding intermediates m cytochrome c by hydrogen exchange labellmg and NMR Nature 335,700-704 Hore, P (1989) Solvent suppression. Methods Enzymol. 176,64-77

CHAPTER 3

Peptide Structure Determination by NMR

Michael P. Williamson

From Methods in Molecular Biology, Vol. 17: Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques. Edited by C. Jones, B. Mulloy, and A. H. Thomas. Copyright ©1993 Humana Press Inc., Totowa, NJ

1. Introduction

The difference between peptides and proteins (the subject of Chapter 2) is that peptides are molecules too small to have a “globular” structure. This means that the spectral assignment process is often much simpler for peptides than it is for proteins, because there are fewer signals present in peptide spectra; on the other hand, peptides seldom adopt a single, well defined structure in solution, which makes the interpretation of structural data more contentious for peptides than it is for proteins. The emphasis in this chapter is therefore different from that in Chapter 2. The acquisition of structurally relevant data is straightforward, given a familiarity with modern two-dimensional (2D) NMR techniques, and is given less emphasis here, but the analysis of the data is seen as the key to obtaining a meaningful answer, and is the area where experience and expertise are most necessary.

The difficulty in dealing with flexible structures by NMR derives from the fact that intramolecular motion (i.e., rotation about single bonds) causes most NMR parameters, such as NOE, coupling constant, and chemical shift, to be averaged, rather than giving a superposition of values, as is seen in many other branches of spectroscopy. This has a number of consequences. First, it is not at all obvious from inspection of a spectrum whether one conformation or many conformations are present (see Note 1). Second, the different conformational parameters are averaged in a very nonlinear way (1). Thus, for example, the size of the NOE depends not on the internuclear distance r, but on r⁻⁶, so that a very close contact between two protons in a minor conformer can still give a very strong NOE after conformational averaging. Third, it is usually impossible to use the observed averaged NMR parameters to deduce the nature of the constituent conformers; in other words, the problem is underdetermined.

The consequence of these three points is that, in deriving peptide structures from NMR data, it is usually assumed that only one conformation is present, often without any serious attempt to justify the assumption. Analysis of the data will then often produce a structure that may fit the data, but may have very little relation to the actual conformations present. On the other hand, if one is more cautious and assumes that more than one conformation may be present, how does one limit the choice of possible conformations? In the rest of this chapter, we describe methods for tackling the problem, based on the plan:

1. Acquire as much and as varied information as possible;
2. Analyze it to see if it could fit a single conformation; and
3. Adopt a cautious approach to structure determination, bearing in mind the underdetermined nature of the problem.

Several aspects of this approach have been discussed in reviews (2,3).
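The nonlinear averaging of the NOE can be made concrete with a short calculation. The sketch below is illustrative only: the function name, the two-conformer mixture, and the distances are hypothetical values chosen for the example, and it assumes that the observed NOE reflects a population-weighted average of r⁻⁶.

```python
def apparent_distance(populations, distances):
    """Distance that would be inferred from an NOE averaged as <r^-6>."""
    if abs(sum(populations) - 1.0) > 1e-9:
        raise ValueError("populations must sum to 1")
    mean_r6 = sum(p * r ** -6 for p, r in zip(populations, distances))
    return mean_r6 ** (-1.0 / 6.0)

# Hypothetical mixture: 90% of molecules with the two protons 5.0 A apart,
# 10% with the two protons 2.2 A apart.
populations = [0.9, 0.1]
distances = [5.0, 2.2]

r_app = apparent_distance(populations, distances)
r_mean = sum(p * r for p, r in zip(populations, distances))
print(f"apparent distance from averaged NOE: {r_app:.2f} A")
print(f"population-weighted mean distance:   {r_mean:.2f} A")
```

For these assumed values, the distance inferred from the averaged NOE lies much closer to the short contact of the minor conformer than the population-weighted mean distance does, which is why a single-conformation interpretation of such data can mislead.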

2. Materials

We assume that a Fourier transform (FT) mode NMR spectrometer is available. The alternative is a continuous-wave (CW) machine, which is generally less sensitive and is incapable of doing 2D experiments. A 300-MHz instrument would be quite adequate for almost all the experiments described here, although higher field instruments are more sensitive and have better spectral dispersion. Peptides longer than eight to ten residues would benefit greatly from the extra dispersion available from higher field machines. The volume of solution needed for NMR is approx 0.5 mL, and the minimum concentration is approx 1 mM. Lower concentrations may be acceptable at very high fields (500 or 600 MHz), but higher concentrations (in the range 5–10 mM) are more normally used, especially if working in protonated solvents (see Note 2).
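The volume and concentrations quoted here translate directly into the amount of peptide required. The following is a minimal sketch of that arithmetic; the function name and the molecular weight of 1000 g/mol are assumptions made for illustration, not values taken from this chapter.

```python
def peptide_mass_mg(conc_mM, volume_mL, mol_weight):
    """Mass (mg) needed for a given concentration (mM), volume (mL), and MW (g/mol)."""
    moles = (conc_mM * 1e-3) * (volume_mL * 1e-3)  # mol/L multiplied by L
    return moles * mol_weight * 1e3                # g converted to mg

MW = 1000.0  # g/mol, hypothetical peptide of roughly eight to ten residues
for conc in (1.0, 5.0, 10.0):
    print(f"{conc:4.1f} mM in 0.5 mL: {peptide_mass_mg(conc, 0.5, MW):.2f} mg")
```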

The NMR response is proportional to the number of nuclei present. It is therefore important to ensure that the sample is free from any proton-carrying material, including buffers. For work in aqueous solutions, phosphate is a convenient nonproton-carrying buffer. Many transition metal ions broaden NMR signals and should be removed using chelating agents. The solvent used should be fully deuterated, to avoid the need for suppressing the large signal from solvent protons. In protic solvents, such as water or methanol, there is chemical exchange between solvent protons and the amide protons on peptides. The use of D₂O or CD₃OD therefore removes the very valuable conformational information obtainable from amide protons, and for water and methanol, it is common to put up with the need for solvent suppression and use 90% protic/10% deuterated solvent mixtures, the deuterated component being required for the field-frequency lock. This difficulty is one reason for the popularity of dimethylsulfoxide, in which this solvent exchange reaction cannot occur, as a solvent for peptide work (see Notes 3 and 4).

3. Methods

3.1. Assignment

Assignment of peptides relies on the analysis of spin-spin coupling patterns and nuclear Overhauser effects (NOEs), as described for proteins in Chapter 2. It is usually much more straightforward for peptides, because there are fewer signals present; in addition, the greater mobility of peptides makes linewidths narrower, thereby further reducing spectral crowding and giving greater net intensity in many 2D spectra. Spin-spin couplings can be analyzed completely for small peptides using COSY alone, which is easily implemented on all modern FT machines. The assignment of side chain protons is facilitated by relaying coupling information along the side chain, which can be done by relayed COSY, or preferably by TOCSY (also called HOHAHA). This technique is somewhat more complicated to set up than COSY, and may not be possible on machines built before about 1988 or on “low-budget” machines. Both 2D techniques take between 3 and 12 h to acquire the data, plus some time for Fourier transforming and plotting the data. The time taken for this processing depends strongly on the instrumentation available, and can vary between a few minutes and several hours.
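The acquisition time of a 2D experiment can be estimated roughly before it is started. The sketch below assumes that the duration is simply the number of recorded increments multiplied by the number of scans per increment and by the time per scan; the function name and all parameter values are hypothetical, and mixing times, pulses, and dummy scans are neglected.

```python
def experiment_hours(n_increments, n_scans, acquisition_s, relaxation_delay_s):
    """Approximate duration (h) of a 2D experiment, neglecting pulses and mixing times."""
    seconds = n_increments * n_scans * (acquisition_s + relaxation_delay_s)
    return seconds / 3600.0

# Hypothetical settings: 512 increments, 16 scans each,
# 0.3 s acquisition time and 1.5 s relaxation delay per scan.
hours = experiment_hours(512, 16, 0.3, 1.5)
print(f"estimated acquisition time: {hours:.1f} h")
```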

To complete the sequence-specific assignments, NOEs are usually also necessary, as elaborated in Chapter 2. There are two ways of collecting NOE information: either using the normal longitudinal cross-relaxation pathways, with 1D NOEs or NOESY, or via the rotating frame NOE, called ROE or CAMELSPIN in the 1D experiment, or ROESY in two dimensions. Both techniques are typically run overnight. ROESY is more prone to produce misleading signals and is also more difficult to set up (4), so NOESY is preferred where possible. However, the biggest factor affecting the size of the NOE is the tumbling rate of the interproton vectors concerned, as shown in Fig. 1. As a rough guide, for moderate or high-field machines operating at room temperature, small peptides in nonviscous solvents have ωτc << 1, and neither NOE nor ROE will work very well; linear peptides in nonaqueous solvents and cyclic peptides in nonviscous solvents have ωτc ≈ 1, and only ROE will work; long linear peptides in water and cyclic peptides in dimethylsulfoxide or water have ωτc > 1, and NOE and ROE will both work, although NOE is usually preferable, as previously described. At lower field strengths or higher temperatures, ωτc becomes smaller. If in doubt, it is simpler to try NOESY first, but expert help is strongly advisable for any NOE experiments. If using ROESY, it is advisable to use low spin-lock field strengths and to acquire two spectra with different offsets of the spin-lock field, in order to reduce and identify some of the undesirable signals. The undesirable signals are most troublesome when the transmitter frequency is midway between the two protons giving the ROEs (4).

COSY, TOCSY, and NOESY/ROESY are usually sufficient for a complete ¹H assignment; sometimes other techniques are used, particularly ones that do not make use of the NOE, such as COLOC (5). All these techniques have 1D analogs, but in nearly all cases, the 2D version is simpler to set up (see Note 5). The exact method of implementation of these techniques relies heavily on the instrumentation available, which varies widely.

3.2. NMR Parameters Available

Some of these parameters are discussed more fully in a review (6).

3.2.1. The NOE
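The dependence of the NOE and ROE on ωτc discussed in Section 3.1 can be sketched numerically. The expressions below are the standard textbook cross-relaxation factors for an isolated pair of like spins undergoing isotropic tumbling, divided by their common prefactor and by τc; they are an assumption brought in for illustration and are not taken from this chapter. The function names and the correlation times are likewise hypothetical.

```python
import math

def omega_tau_c(field_mhz, tau_c_ns):
    """Product of observation frequency (rad/s) and correlation time (dimensionless)."""
    return 2.0 * math.pi * field_mhz * 1e6 * tau_c_ns * 1e-9

def noe_factor(wtc):
    """Relative longitudinal cross-relaxation; changes sign near wtc = 1.12."""
    return 6.0 / (1.0 + 4.0 * wtc ** 2) - 1.0

def roe_factor(wtc):
    """Relative rotating-frame cross-relaxation; positive for all wtc."""
    return 2.0 + 3.0 / (1.0 + wtc ** 2)

# Hypothetical correlation times at a 500-MHz observation frequency.
for tau_c_ns in (0.05, 0.35, 2.0):
    wtc = omega_tau_c(500.0, tau_c_ns)
    print(f"tau_c = {tau_c_ns:4.2f} ns  wtc = {wtc:5.2f}  "
          f"NOE factor = {noe_factor(wtc):+.2f}  ROE factor = {roe_factor(wtc):+.2f}")
```

Under these assumptions the longitudinal factor passes through zero close to ωτc = 1, whereas the rotating-frame factor never does, which is consistent with the rough guide that only ROE works in the intermediate regime.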

The NOE is generally the most useful parameter available, since it is very sensitive to internuclear distance (intensity proportional to r⁻⁶). As described in Section 3.1, it is not straightforward and takes some time: 2D experiments are generally done overnight (see Note 6). It can sometimes be useful to obtain heteronuclear {NH}-¹³CO NOEs, but these are time-consuming and difficult to interpret (4). NOE intensities from 2D spectra are best measured by integration of crosspeak volume, having first ensured that the baseplane around the crosspeak is corrected (see Notes 7 and 8).

Fig. 1. Dependence of NOE intensity on ωτc for longitudinal NOE (N) and transverse (rotating frame) NOE (R). ω is the observation frequency (in rad s⁻¹) and τc is the rotational correlation time. The figure depicts the maximum observable value in a 2D experiment from an isolated two-spin system. In practice, NOE values will be smaller, particularly for values of ωτc << 1, because of external relaxation. In 1D experiments, numerical values are slightly more than twice as large and inverted (i.e., R is always positive, whereas N starts off positive and goes negative with increasing ωτc).

3.2.2. Coupling Constant (³JHNα)

Protons separated by three bonds have signals split by a coupling constant, J, which varies in a somewhat complicated fashion with the dihedral angle between the protons, as shown in Fig. 2. For peptide NH protons, ³JHNα is usually measurable directly from the normal 1D spectrum, making it a very easily measured parameter. However, note that, for accurate measurement of J, the digital resolution of the spectrum should be better than 0.2 Hz/point, meaning that the spectrum should normally contain 32K points or more. If the spectrum is too crowded to measure J directly, coupling constants can be measured from a high-resolution, phase-sensitive COSY experiment (8), which is simple to acquire, but tiresome to analyze because of the very large data matrix needed for adequate digitization of crosspeaks.

Fig. 2. Variation of ³JHNα with dihedral angle φ for peptides, using the equation of ref. (7), J = 6.4 cos²θ − 1.4 cosθ + 1.9 (θ = |φ − 60°|). Typical values for α-helix, β-sheet, and random coil are indicated.

3.2.3. Temperature Dependence of NH Shifts

Solvent-exposed amide protons shift roughly 0.006–0.008 ppm (6–8 ppb) upfield per degree temperature increase, whereas amide protons hidden from the solvent shift much less (see Note 9). Thus, the temperature dependence of the shift (Δδ/ΔT) is widely used as a measure of the extent of hydrogen bonding of amide protons, with values of <3 ppb/K taken as indicative of well formed hydrogen bonds (e.g., ref. 9). This is an easy parameter to measure, which is one reason for its popularity. It is important to leave sufficient time for the temperature of the sample to equilibrate after altering the temperature (at least 15 min, depending on the spectrometer). If it has not been done recently, it is also advisable to check that the temperature reading of the spectrometer is accurate, using a reference sample of methanol or ethylene glycol.

3.2.4. NH Exchange Rate

An amide proton in a well formed hydrogen bond (or an amide proton otherwise shielded from solvent) will have an exchange rate slow enough that its signal can still be seen after dissolving the peptide in D₂O or CD₃OD (10). Exchange rates are at their slowest at a pH of 3–3.5; the peptide should therefore be lyophilized from the proton-carrying solvent at this pH, then dissolved in the deuterated solvent, and immediately observed (see Note 10). Faster exchange rates can be detected by 1D saturation transfer or 2D NOESY (see Chapter 7 of this vol.), which are harder to perform than the straightforward exchange experiment and also harder to interpret.

3.2.5. Chemical Shift

In an unstructured peptide, protons have chemical shifts dependent only on the amino acid type. These are known as the random-coil shifts, and their values have been tabulated (11). Chemical shifts very different from these values indicate some form of preferred structure, without any indication of what that structure may be (10). Considerable care is needed, since chemical shifts may be affected by nearby aromatic rings, titratable groups, or hydrogen bonds from side chains. Chemical shifts of ¹⁵N do not seem to be very good indicators of hydrogen bond formation, and ¹³CO shifts seem to be only poor indicators, but it is still too early to say much about the usefulness of heteronuclear shifts.

3.2.6. Solvent Titration

Exposed amide protons are sensitive to the hydrogen-bonding capability of the solvent. Thus, on adding chloroform to a dimethylsulfoxide solution, exposed amide protons will become less hydrogen bonded and shift to higher field (12). The absence of a chemical shift change is indicative of shielding from solvent. Naturally, these arguments are only relevant if the conformation does not change on altering the solvent composition; this is not always an easy point to decide, as discussed in Section 3.3. Alternatively, solutes, such as shift reagents or free radicals, can be added to perturb resonances in a more or less predictable manner, hopefully without altering the peptide conformation (6). All of these methods are difficult to interpret meaningfully.

3.3. How Many Conformations?

As outlined earlier in this chapter, the number of conformations present is the key question that needs to be addressed, since if more than one conformation is present, structural analysis becomes much harder. As a general rule, acyclic peptides of fewer than 30 residues are likely to be mobile in solution. If they do have structured regions, these are likely to be in fast exchange with random-coil conformations (see discussion in Section 3.4.2). Cyclic peptides tend to be more structured, although a range of conformations can often exist in fast exchange. Side chains are likely to be in fast exchange between the two or three staggered conformations. A single conformation can be assumed if all the following hold:

1. Most structural parameters indicate a preferred structure. In other words, there should be non-random-coil NOEs, extreme values of J (<6 or >8.5 Hz), low temperature coefficients (<3 ppb/K) and slow exchange of any NH implicated in a hydrogen bond, and some unusual chemical shift values (differing from random-coil values by >0.4 ppm for NH and >0.2 ppm for other protons, which cannot be explained by ring-current or titration effects). Diastereotopic protons (especially Gly CαH) should have different chemical shifts and coupling constants. Conformational preferences of side chains are only worth considering if diastereotopic CαH have different chemical shifts and coupling constants. Another way of putting this is to say that the amino acids in the sequence should show sequence-dependent differences in their coupling constants, NOEs, and so on. A good example of this is quoted by Kessler (3): in cyclo(Gly₅), all the Gly NH are equivalent, with Δδ/ΔT equal to 2.96 ppb/K, whereas in cyclo(Ala-Gly₄), all five residues are distinguishable, having temperature coefficients (starting with Ala) of 4.16, 2.45, 3.46, 3.21, and 1.87 ppb/K, respectively, thereby providing evidence for a preferred structure (but not necessarily only one single preferred structure).
2. Temperature changes do not alter the parameters or do at least alter them in a linear fashion. This applies particularly to Δδ/ΔT, for which a nonlinear variation is indicative of multiple conformations, or at least of the unfolding of a folded conformation with increase in temperature.
3. All structural parameters are self-consistent; thus, Δδ/ΔT, NH exchange, and solvent titration should all implicate the same amide protons.
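The temperature-coefficient test in the criteria above can be sketched as a straight-line fit of amide shift against temperature. This is a minimal illustration rather than a prescribed procedure: the function names, the proton labels, and the shift values are hypothetical, the 3 ppb/K threshold is the one quoted in the text, and R² is reported only as one possible indicator of linearity (no cutoff for it is given in this chapter).

```python
def linear_fit(x, y):
    """Least-squares straight line; returns (slope, intercept, r_squared)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

def temperature_coefficient(temps_K, shifts_ppm):
    """Return (magnitude of the shift change in ppb/K, R^2 of the linear fit)."""
    slope, _, r2 = linear_fit(temps_K, shifts_ppm)
    return abs(slope) * 1000.0, r2

# Hypothetical amide shifts (ppm) measured at five temperatures (K).
temps = [298.0, 303.0, 308.0, 313.0, 318.0]
amides = {
    "NH-a": [8.300, 8.265, 8.230, 8.195, 8.160],
    "NH-b": [7.900, 7.890, 7.880, 7.870, 7.860],
}

for name, shifts in amides.items():
    coeff, r2 = temperature_coefficient(temps, shifts)
    label = "below 3 ppb/K" if coeff < 3.0 else "3 ppb/K or above"
    print(f"{name}: {coeff:.1f} ppb/K ({label}), R^2 = {r2:.4f}")
```

A clearly curved plot of shift against temperature would show up as a reduced R² and systematic residuals, which under criterion 2 would argue against a single conformation.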

Peptide conformations can depend markedly on solvent composition (see Note 3). If structural parameters (e.g., ³J and NOE) do not change as the solvent composition is altered, it can be assumed that the conformational equilibrium has not altered, and thus almost certainly a single conformation is indicated. If they do change, considerable caution is called for. At the very least, it shows that several conformations are accessible, while leaving open the question of how many conformations coexist in any given solvent. Finally, we should repeat the warning given earlier: It is very unwise to assume that only a single conformation is present without careful examination of all available data.

3.4. Structure Analysis

Here, as elsewhere in this chapter, the golden rule is that as many parameters as possible should be used to reach structural conclusions. Non-NMR parameters, such as CD and fluorescence quenching, should also be used if applicable (see refs. 13-17). We treat the simpler case first, where only one conformation is present in solution.

3.4.1. Single Conformation

There are several ways of deriving structures from NMR data. Distance geometry or molecular dynamics can be applied, as described in Chapter 2, but these methods are often unsatisfactory for cyclic peptides because of the restraints imposed by the ring system. (Not only are some programs incapable of handling cyclic systems adequately, but the high energy barriers to internal rotation of the backbone in small cyclic peptides can mean that dynamics calculations cannot access all the available conformations.) The normal approach is to go through each of the possible types of local structure in turn and see if it fits the data. This approach is risky, since it is easy to overlook other possibilities once a conformation has been identified and to ignore conflicting data. We stress again that a claim for a single conformation requires that all structural data be satisfied by the conformation postulated. A promising new approach, particularly for cyclic systems, is to calculate all the low-energy backbone conformations accessible and see which one fits the NMR data best. This approach has the major advantage that it is far less subjective than the manual approach, but it is not yet generally available.

Peptides in solution normally adopt a limited range of structures. For L-amino acids, these are the “random coil,” the α-helix, and the type I and type II β-turns, and their mirror images, the type I′ and II′ turns. For type I′, II, and II′ turns, the geometry is such that glycine or a D-amino acid is strongly preferred as one of the two residues in the turn: residue 2 in I′ and particularly in II′, and residue 3 in II. (Throughout this chapter, the internal residues of a β-turn will be designated residues 2 and 3, and the internal residue of a γ-turn will be designated residue 2.) There are two further types of β-turns possible, VIa and VIb (18), which involve a cis amide bond, and are usually only found with proline or N-alkyl amino acids. Cis amide bonds are readily recognizable by a short distance between Cα protons on either side of the bond. Particularly in cyclic pentapeptides, γ-turns and reverse γ-turns can also be found, usually with a bulky residue (e.g., Pro, Val, Phe, or Aib) in the turn. The reverse γ-turn is less sterically strained than the γ-turn for L-amino acids. These turns are depicted in Fig. 3, and some characteristic angles and distances are given in Tables 1 and 2. In crystal structures of proteins (18), local geometries can differ considerably from those used to produce the data in Tables 1 and 2, implying that the distances and angles in real peptides may vary quite markedly from those given in the tables.

In practice, it is usually Δδ/ΔT, NOE, and J that are used to identify structural features, but many other techniques should be used to confirm the conclusions reached using these parameters. Δδ/ΔT is much quicker to measure than the NOE, and is therefore more often quoted (especially in earlier work), although it is not as effective at distinguishing different secondary structures as the NOE (9).

3.4.2. Multiple Conformations

As stressed earlier and in many other places (e.g., ref. 4), it is only when a single conformation is present that structures can be derived with any degree of reliability. Because of the averaging of NMR properties by intramolecular reorientation, NMR cannot easily be used to characterize multiple conformers, unless some independent knowledge or assumptions are used as to the nature of the conformations present. For example, imagine a flexible peptide for which most of the NMR parameters are fit by a type I turn. Assume that a better fit can be obtained by including a small amount of a type II′ turn, plus smaller proportions of conformations involving hydrogen bonds from side chains to backbone atoms. By suitable juggling of the populations of these conformers, the data can be fit very well, but this has meant introducing a large number of experimentally undetermined parameters (i.e., conformations and populations): Anything can be fit in this way, provided enough new conformations are introduced, and the exercise is therefore largely meaningless.

The only conformation that can reasonably be introduced without good experimental evidence is the random coil, which is generally assumed to mean the conformational space occupied by unstructured peptides. This is by definition a mobile structure, but a wide range of information from NMR and elsewhere implies that it is predominantly an extended (β-sheet-like) conformation. However, because of its mobility and the nonlinear averaging of NMR parameters (particularly the NOE), it has some characteristics of more folded structures; for example, low-intensity dNN NOEs are commonly found in random-coil peptides. If a peptide exists in fast exchange between random coil and one particular folded structure (for example, an α-helix), then the NMR parameters will be an appropriately weighted average of the two sets of NMR parameters (i.e., an r⁻⁶-weighted average for the NOE, an (A cos²θ + B cosθ + C)-weighted average for dihedral angles, and somewhat less-defined averages for other parameters). If the NMR data fit such a model, it is reasonable to assume that such a conformational equilibrium is occurring, particularly if solvent titration can be used to shift the conformation from random coil to the folded structure. If the NMR data do not fit a simple random coil ⇌ folded structure model, then it is very hard to deduce anything reliable. Sometimes, comparison of different peptide sequences can be useful (10), but sequence comparisons can be misleading, for example, if interactions with the side chain lead to the perturbation of Δδ/ΔT (9).

There is no established way of dealing with the problem of multiple conformations. One promising method, particularly for cyclic peptides, is to obtain the conformational models either from crystal structures or from molecular mechanics or both (1), and use the NMR data to assess the populations of each conformation. In a similar approach, Nikiforovich et al. (19) calculated a large number (nearly 15,000) of accessible conformations of the linear peptide angiotensin II, which were categorized into 12 families. They then used NMR and fluorescence data to give statistical weights to the different conformers. No single conformer could adequately describe the conformation of the peptide, but five different “indispensable” conformers were shown to be the minimum number necessary to account for the experimental data adequately.

Fig. 3. (A) A β-turn; (B) a γ-turn.

Table 1
Characteristic Angles and Coupling Constants in Secondary Structure

                       Residue 2                      Residue 3
Structure        φ       ψ      ³JHNα(a)        φ       ψ      ³JHNα(a)
α-Helix         -57     -47     3.9             -       -      -
Random coil      -       -      6.5–8.5         -       -      -
Turn I          -60     -30     4.6            -90      0      7.9
Turn II         -60     120     4.6             80      0      6.2
Turn II′         60    -120     6.9            -80      0      6.7
Turn γ           80     -65     6.2             -       -      -
Reverse γ       -80      65     6.7             -       -      -

(a) ³JHNα values are calculated using the equation of ref. 7.

Table 2
Short Distances in Secondary Structures (Å)

[Table 2 lists the short NH–NH and NH–CαH distances involving residues 2–4 for the α-helix, random coil, turns I, II, and II′, the γ-turn, and the reverse γ-turn. α′ represents the other CαH of Gly or the CαH of a D-amino acid, and is given where such residues occur frequently.]
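The two-state random coil ⇌ folded model lends itself to a short numerical sketch. The code below evaluates the relation quoted in the caption of Fig. 2, J = 6.4 cos²θ − 1.4 cosθ + 1.9 with θ = |φ − 60°|, and then estimates the folded population from an observed coupling constant, assuming that J is a linear population-weighted average of the two states. The function names, the random-coil value of 7.0 Hz (chosen from within the 6.5–8.5 Hz range of Table 1), and the observed value of 6.0 Hz are hypothetical.

```python
import math

def j_hn_alpha(phi_deg):
    """3J(HN-Halpha) in Hz: J = 6.4cos^2(theta) - 1.4cos(theta) + 1.9, theta = |phi - 60|."""
    theta = math.radians(abs(phi_deg - 60.0))
    c = math.cos(theta)
    return 6.4 * c * c - 1.4 * c + 1.9

def folded_fraction(j_obs, j_folded, j_coil):
    """Folded population in a two-state fast-exchange model with linearly averaged J."""
    return (j_obs - j_coil) / (j_folded - j_coil)

j_helix = j_hn_alpha(-57.0)  # alpha-helix phi from Table 1
j_coil = 7.0                 # hypothetical random-coil coupling constant (Hz)
j_obs = 6.0                  # hypothetical observed coupling constant (Hz)

p_helix = folded_fraction(j_obs, j_helix, j_coil)
print(f"J(helix) = {j_helix:.2f} Hz")
print(f"estimated helical population = {p_helix:.2f}")
```

A population obtained in this way is only as good as the two-state assumption; as the text stresses, it should be checked against the other parameters (NOE, Δδ/ΔT, exchange rates) before being taken seriously.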

4. Notes isomerlzation about amide bonds is usually slow enough to lead to two setsof signals in the NMR spectrum. It is particularly common for prohne and N-alkyl amino acids. If the rate of exchange between the two isomers is slow enough, they can be treated as two separate compounds. However, If it is faster than l/7’, (the spin-lattice relaxation rate), NOES will be partially or completely averaged between the two conformations, even though the two conformations give separate NMR signals (4).

1. Cdtruns

2. The upper limit to the concentration suitable for NMR experiments is determined either by solubility or by mtermolecular mteraction, but in any case, measurement of the concentratton dependence of the chemical shifts, A6/T, and couplmg constants is recommended to check that there are no overt concentration-dependent effects. Chloroform is particularly prone to aggregatton phenomena. 3. The choice of solvent IS crucial to a meanmgful result. Because of their flexibility, pepttdes can often adopt different structures in different solvents It then becomes debatable what the significance of a structure IS, particularly if the solvent is nonphysiological. As discussed in Section

Williamson 3.3. of this chapter, it is m any case advisable to use several solvents or solvent mixtures to obtain a more complete picture of the conformattonal heterogeneity of the peptide. There ISno general rule as to the “best” solvent to use.Most peptides are normally found in aqueous environments, and water would therefore seem an obvious choice. However, peptides act at protein surfaces or m membranes, which are less polar, and therefore lesspolar solvents may give a more relevant result; lesspolar solvents also tend to induce more structure m peptides, because the hydrogenbonding potential of the solvent is weaker. Dtmethylsulfoxide is a common choice and also a good solvent for peptides, whereas either methanol or 2,2,2-trifluoroethanol is often added to aqueous solutions to induce helix formation, the assumption being that the helices seen m such solvent systems are representative of the helices formed m their native environment (usually in membranes) (20) Water/dimethylsulfoxide mixtures have been suggested for use at temperatures below 273 K, as a way of mcreasmg z, (in order to make NOESY crosspeaks larger) and to freeze out some conformational riotion (21). It has been suggested that chloroform mduces conformations of enkephalm analogs with a better correlation to their activities than does dimethylsulfoxide (22), whereas a study of somatostatm analogs (23) showed that conformations m dimethylsulfoxide are good predictors of the presence or absence of biological activity, although structure-activity relationships are better when conformations in aqueous solution are used. Different receptor environments are probably best modeled by different solvent systems, but many more structure-activity studies are necessary before any general conclusions can be drawn m this area. 4. Samples can be recovered from chloroform and methanol by solvent evaporation in a stream of dry mtrogen, and from water by lyophilization. 
Lyophilization can also be used to recover samples from dimethylsulfoxide, but only if it is frozen in a thin film and often only if water is added. Alternatively, desalting columns provide a rapid way of exchanging dimethylsulfoxide to water for subsequent lyophilization.
5. All the 2D experiments described here, with the exception of COLOC, should be performed in the phase-sensitive mode. COSY should be run as the double-quantum filtered version.
6. For peptides with 07,

7. The NOE intensity is only proportional to r⁻⁶ at short mixing times, because at longer times, spin diffusion and magnetization decay produce intensity distortion. This topic is discussed at length in Chapter 2 and ref. 4, but as a rough guide for peptides, mixing times of longer than 150 ms should be avoided for quantitative work.
8. When the exchange rate between conformers is faster than the overall rotational correlation rate, which is often the case (particularly for side chain rotation), the NOE should be averaged not as ⟨r⁻⁶⟩ but as ⟨r⁻³⟩² (4).
9. Low values of amide temperature coefficients can arise from hydrogen bonding to side chains, such as glutamate H-bonding to its own amide, or aspartate H-bonding to a residue 3 ahead in the sequence. Obviously, pH will have a marked effect on H-bonding from side chains. In some cases, anomalous results can be obtained; for example, in a 22-residue peptide from the Herpes simplex virus glycoprotein D-1 antigenic domain, the amide proton of Val-14 has a very high temperature coefficient, but is the most slowly exchanging amide proton in the peptide (10). The large coefficient was ascribed to loss of the local structure around Val-14 on increasing the temperature.
10. The exchange rate of a solvent-exposed amide proton in water depends on the nature of the side chain on either side of it. The sequence effects have been tabulated (24) and confirmed by numerous experiments since. Most sequences give exchange rates within a 10-fold range of the value for -Ala-Ala-. The major exceptions are residues at either terminus of the peptide, and His⁺, which can give a base-catalyzed exchange rate in the peptide -His⁺-His⁺- some 300 times faster than that in -Ala-Ala-.
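The two averaging schemes in Note 8 can be compared numerically. The following is a minimal sketch, not a procedure from this chapter: the function name, the two-conformer distances, and the equal populations are illustrative assumptions, and the angular part of the fast-motion average is ignored.

```python
def effective_distances(distances, populations=None):
    """Return (r_slow, r_fast): effective interproton distances for
    conformers exchanging slowly or rapidly relative to overall tumbling.

    distances   -- interproton distance in each conformer (e.g., in angstroms)
    populations -- relative populations (assumed equal if omitted)
    """
    n = len(distances)
    if populations is None:
        populations = [1.0 / n] * n
    total = sum(populations)
    weights = [p / total for p in populations]

    # Slow exchange between conformers: the NOE averages as <r^-6>.
    mean_r6 = sum(w * r ** -6 for w, r in zip(weights, distances))
    # Fast exchange: the NOE averages as <r^-3>^2 (angular terms neglected).
    mean_r3 = sum(w * r ** -3 for w, r in zip(weights, distances))

    r_slow = mean_r6 ** (-1.0 / 6.0)
    r_fast = (mean_r3 ** 2) ** (-1.0 / 6.0)
    return r_slow, r_fast


if __name__ == "__main__":
    # Hypothetical side chain sampling two rotamers at 2.5 and 4.0 angstroms.
    r_slow, r_fast = effective_distances([2.5, 4.0])
    print(f"<r^-6> average:   {r_slow:.2f} A")
    print(f"<r^-3>^2 average: {r_fast:.2f} A")
```

For this hypothetical pair, the two schemes give apparent distances of about 2.78 Å and 2.93 Å, respectively. Both are weighted heavily toward the shorter distance, which is why a single "average" distance derived from an NOE should be interpreted with care for flexible peptides.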

References

1. Kessler, H., Griesinger, C., Lautz, J., Müller, A., van Gunsteren, W. F., and Berendsen, H. J. C. (1988) Conformational dynamics detected by nuclear magnetic resonance NOE values and J coupling constants. J. Am. Chem. Soc. 110, 3393-3396.
2. Kessler, H. and Bermel, W. (1986) Conformational analysis of peptides by two-dimensional NMR spectroscopy, in Applications of NMR Spectroscopy to Problems in Stereochemistry and Conformational Analysis (Takeuchi, Y. and Marchand, A. P., eds.), VCH, Weinheim, pp. 179-205.
3. Kessler, H. (1982) Peptide conformations. Part 19. Conformation and biological effects of cyclic peptides. Angew. Chem. Int. Ed. 21, 512-523.
4. Neuhaus, D. and Williamson, M. P. (1989) The Nuclear Overhauser Effect in Structural and Conformational Analysis. VCH, Weinheim.
5. Kessler, H., Griesinger, C., and Lautz, J. (1984) Determination of the connectivities of weak proton-carbon couplings with a variation of the two-dimensional NMR technique. Angew. Chem. Int. Ed. 23, 444-445.
6. Smith, J. A. and Pease, L. G. (1980) Reverse turns in peptides and proteins. CRC Crit. Rev. Biochem. 8, 315-399.
7. Pardi, A., Billeter, M., and Wüthrich, K. (1984) Calibration of the angular dependence of the amide proton-Cα proton coupling constant, ³JHNα, in a globular protein. J. Mol. Biol. 180, 741-751.
8. Marion, D. and Wüthrich, K. (1983) Application of phase sensitive two-dimensional correlated spectroscopy (COSY) for measurements of ¹H-¹H spin-spin coupling constants in proteins. Biochem. Biophys. Res. Commun. 113, 967-974.
9. Dyson, H. J., Rance, M., Houghten, R. A., Lerner, R. A., and Wright, P. E. (1988) Folding of immunogenic peptide fragments of proteins in aqueous solution. I. Sequence requirements for the formation of a reverse turn. J. Mol. Biol. 201, 161-200.
10. Williamson, M. P., Hall, M. J., and Handa, B. K. (1986) ¹H-NMR assignment and secondary structure of a Herpes simplex virus glycoprotein D-1 antigenic domain. Eur. J. Biochem. 158, 527-536.
11. Wüthrich, K. (1986) NMR of Proteins and Nucleic Acids. Wiley, New York.
12. Urry, D. W. and Long, M. M. (1976) Conformations of the repeat peptides of elastin in solution: an application of proton and carbon-13 magnetic resonance to the determination of polypeptide secondary structure. CRC Crit. Rev. Biochem. 4, 1-45.
13. Urry, D. W. (1985) Absorption, circular dichroism and optical rotatory dispersion of polypeptides, proteins, prosthetic groups and biomembranes, in Modern Physical Methods in Biochemistry (Neuberger, A. and Van Deenen, L. L. M., eds.), Elsevier, Amsterdam.
14. Drake, A. F. (1993) Optical spectroscopy, in Methods in Molecular Biology: Protocols for Optical Spectroscopy and Macroscopic Techniques (Jones, C., Mulloy, B., and Thomas, A. H., eds.), Humana, Totowa, NJ, in press.
15. Varley, P. G. (1993) Fluorescence spectroscopy, in Methods in Molecular Biology: Protocols for Optical Spectroscopy and Macroscopic Techniques (Jones, C., Mulloy, B., and Thomas, A. H., eds.), Humana, Totowa, NJ, in press.
16. Drake, A. F. (1993) Circular dichroism, in Methods in Molecular Biology: Protocols for Optical Spectroscopy and Macroscopic Techniques (Jones, C., Mulloy, B., and Thomas, A. H., eds.), Humana, Totowa, NJ, in press.
17. Haris, P. I. and Chapman, D. (1993) Analysis of polypeptide and protein structure using Fourier transform infrared spectroscopy, in Methods in Molecular Biology: Protocols for Optical Spectroscopy and Macroscopic Techniques (Jones, C., Mulloy, B., and Thomas, A. H., eds.), Humana, Totowa, NJ, in press.
18. Richardson, J. S. (1981) The anatomy and taxonomy of protein structures. Adv. Prot. Chem. 34, 167-339.
19. Nikiforovich, G. V., Vesterman, B., Betins, J., and Podins, L. (1987) The space structure of a conformationally labile oligopeptide in solution: angiotensin. J. Biomol. Struct. Dynamics 4, 1119-1135.
20. Bazzo, R., Tappin, M. J., Pastore, A., Harvey, T. S., Carver, J. A., and Campbell, I. D. (1988) The structure of melittin. A ¹H-NMR study in methanol. Eur. J. Biochem. 173, 139-146.
21. Motta, A., Picone, D., Tancredi, T., and Temussi, P. A. (1987) NOE measurements on linear peptides in cryoprotective aqueous mixtures. J. Magn. Reson. 75, 364-370.
22. Temussi, P. A., Tancredi, T., Pastore, A., and Castiglione-Morelli, M. A. (1987) Experimental attempt to simulate receptor site environment. A 500-MHz ¹H nuclear magnetic resonance study of enkephalin amides. Biochemistry 26, 7856-7863.
23. Wynants, C., Coy, D. H., and van Binst, G. (1988) Conformational study of superactive analogues of somatostatin with reduced ring size by ¹H NMR. Tetrahedron 44, 941-973.
24. Molday, R. S., Englander, S. W., and Kallen, R. G. (1972) Primary structure effects on peptide group hydrogen exchange. Biochemistry 11, 150-158.

CHAPTER 4

High-Resolution NMR of DNA and Drug-DNA Interactions

Jill Barber, Helen F. Cross, and John A. Parkinson

From: Methods in Molecular Biology, Vol. 17: Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques. Edited by C. Jones, B. Mulloy, and A. H. Thomas. Copyright ©1993 Humana Press Inc., Totowa, NJ.

1. Introduction

The advantage of NMR over most other spectroscopic techniques lies in the ability to gain structural and dynamic information at atomic resolution. Every nucleus with spin gives rise to a signal that is characterized by a number of parameters (chemical shift, J-couplings, relaxation data, and NOE connectivities) that can be used to obtain quite detailed structural information about the molecule under study. They can also be used to determine kinetic properties, for example, the interconversion rates of different conformations of a molecule and the exchange rate of free with bound ligand on a macromolecule. NMR has been widely used to study both static and dynamic aspects of DNA structure and drug-DNA interactions.

Several atomic nuclei are available for the study of DNA by NMR. ¹H is the most common, but ³¹P NMR is especially useful for studying the effects of ligand binding on the phosphate groups of DNA. The simplicity and large chemical shift range of ¹³C spectra, relative to ¹H spectra, sometimes make this the nucleus of choice. Other nuclei that may be considered are ¹⁵N, a good nucleus if isotopic enrichment of the DNA is possible; ²H and ¹⁴N, which are quadrupole nuclei and only really suitable for specialized solid-state studies; ¹⁷O and ¹⁸O, which may be detected indirectly via isotopic shifts in the ¹³C or ³¹P spectrum; and ³H, the best NMR nucleus of all in terms of sensitivity, the use of which is, sadly, almost completely prohibited by financial and safety considerations.

NMR spectroscopy can, if necessary, be used as an alternative to X-ray crystallography. On rare and much publicized occasions, it has even been used to correct information obtained by X-ray diffraction. In general, however, when crystals are available, it is easier to obtain precise information, such as absolute stereochemistry and interatomic distances, from diffraction data than from NMR data. The advantage of the NMR experiment is its versatility. Information can be obtained at various temperatures, and solvents of different pH, ionic strength, or dielectric constant can be used. NMR is a particularly good technique for the study of interactions of small molecules with macromolecules, such as DNA; the effects of changing experimental conditions can be monitored relatively easily, and there is a wealth of conformational and dynamic information to be extracted.

The technique, naturally, is not without disadvantages. The most obvious drawback is the cost of a suitable spectrometer (hundreds of thousands of pounds). If the money is available, one then has to contend with the inherent insensitivity of the technique; 100 µM solutions are usually the minimum requirement (i.e., approx 0.4 mg of a decamer duplex). The most fundamental problem, however, is that of the broad lines associated with NMR spectra of large molecules. Figure 1A shows part of a 270 MHz ¹H NMR spectrum of calf thymus DNA. Uninterpretable lumps like this are typical of NMR spectra of large molecules. For NMR spectroscopy of DNA (unlike proteins), this problem can be alleviated rather easily by the use of smaller fragments of DNA. Sonicated or sheared DNA may be used, but most usually synthetic self-complementary oligonucleotides of 4-16 bp are chosen. Symmetry, of course, helps to simplify NMR spectra. In Fig.
1B, the spectrum of a synthetic oligonucleotide bound to a dye, the lines are much narrower than in Fig. 1A, but still broader than those obtained from small molecules. Although spectra of this sort can be interpreted with the aid of two-dimensional experiments, a great deal of skill and patience is required in preparing the sample, setting up the instrument, and interpreting the fearsome 2D data sets obtained from DNA fragments. Spectra become still more complex when drugs are bound to DNA. Additional peaks resulting from the drug are, of course, normally observed, and, when the DNA fragment is self-complementary, dyad symmetry may be lifted, doubling many of the peaks.

Fig. 1. (A) Calf thymus DNA, aromatic region. (B) d(CGCGAATTCGCG)₂·Hoechst 33258 1:1 complex.

Figure 2 shows a short section of B-DNA illustrating the principal modes of binding of drugs. Many antitumor drugs are intercalators. They slip between the base pairs of DNA. In order to do this, they need to be flat, generally containing a number of fused aromatic rings. Ethidium bromide (Structure 1), acridines (Structure 2), and actinomycin D (Structure 3, which appears as part of Fig. 9 later in this chapter) are familiar intercalators. Intercalators have limited ability to read sequence, and even those used as drugs are usually very toxic.

Drugs that bind in the minor groove of DNA are able to read sequence to some extent. Most minor groove binders bind selectively in AT-rich regions of DNA, but there have been attempts, meeting with increasing success (1-4), to design molecules that read sequence from the minor groove.

The major groove of DNA is, as the name implies, large. It is able to accommodate proteins with functions that necessarily involve reading sequence very precisely. Restriction enzymes, proteins that regulate gene expression, and DNA repair enzymes all operate from the major groove. The major groove is therefore receiving considerable attention as a target for natural and synthetic drugs. Although most of the drugs that act in the major groove are small alkylating agents, some sequence selectivity is being achieved (5). These alkylating agents are among the drugs that bind covalently to DNA. Drugs that bind covalently to DNA, like those that bind noncovalently, may be major or minor groove binders or intercalators.

Fig. 2. A short section of DNA showing schematically the modes of binding of intercalators, minor-groove binders, and major-groove binders.

Structure 1.

Structure 2.


Table 1. Typical Chemical Shift Ranges for Proton Resonances in NMR Spectra of Nucleic Acids

Proton type                        Expected chemical shiftᵃ   Distinguishing features
T CH₃                              1.00-2.00 ppm              Sharp singlet
Sugar 2′H and 2″H                  2.00-3.00 ppm
5′-terminal 5′H and 5″H            3.70 ppm
Sugar 5′H, 5″H, and 4′H            4.00-4.50 ppm
Sugar 3′H                          4.50-5.20 ppm
Sugar 1′H                          5.30-6.20 ppm
C C5H                              5.30-6.20 ppm              ³J = 10-Hz doublet
A C2H, C8H; G C8H                  6.50-8.20 ppm
T C6H; C C6H                       6.50-8.20 ppm              C C6H: ³J = 10-Hz doublet
C amino N4H (1)ᵇ                   6.40-6.80 ppm              Exchange out in D₂O
C amino N4H (2)ᵇ                   8.30-8.50 ppm              Exchange out in D₂O
G imino N1Hᵇ                       12.50-13.00 ppm            Exchange out in D₂O
T imino N3Hᵇ                       13.50-14.00 ppm            Exchange out in D₂O

ᵃChemical shifts relative to internal TSP.
ᵇFor Watson-Crick base pairs.
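The ranges in Table 1 lend themselves to a simple lookup when a first pass is made through a 1D spectrum. The sketch below is an illustrative assumption rather than part of the original protocol: the function name, the 0.05-ppm tolerance, and the grouping of all aromatic base protons into one entry are choices made here for clarity.

```python
# Ranges transcribed from Table 1 (ppm, relative to internal TSP).
# The four aromatic base proton types share one range and are grouped.
SHIFT_RANGES = [
    ("T CH3", 1.00, 2.00),
    ("Sugar 2'H and 2''H", 2.00, 3.00),
    ("5'-terminal 5'H and 5''H", 3.70, 3.70),
    ("Sugar 5'H, 5''H, and 4'H", 4.00, 4.50),
    ("Sugar 3'H", 4.50, 5.20),
    ("Sugar 1'H", 5.30, 6.20),
    ("C C5H", 5.30, 6.20),
    ("Aromatic base protons (A C2H/C8H, G C8H, T C6H, C C6H)", 6.50, 8.20),
    ("C amino (1)", 6.40, 6.80),
    ("C amino (2)", 8.30, 8.50),
    ("G imino", 12.50, 13.00),
    ("T imino", 13.50, 14.00),
]


def candidate_protons(shift_ppm, tolerance=0.05):
    """List the proton types whose typical range contains shift_ppm.

    tolerance (ppm) widens each range slightly; its value is an assumption.
    """
    return [
        name
        for name, low, high in SHIFT_RANGES
        if low - tolerance <= shift_ppm <= high + tolerance
    ]


if __name__ == "__main__":
    for shift in (1.5, 5.8, 13.7):
        print(shift, candidate_protons(shift))
```

Overlapping ranges (for example, sugar 1′H and cytosine C5H) return more than one candidate, which reflects the real ambiguity of 1D data and the need for the 2D methods described later in this chapter.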

2. Measurement and Interpretation of NMR Spectra of DNA and DNA-Ligand Complexes

2.1. Sample Conditions

It is generally accepted that, at low concentrations and in low-ionic-strength solutions, self-complementary oligonucleotides tend to form hairpin loops as well as double helices (6). Oligonucleotides required for the NMR study of double-helical structures in solution are therefore usually dissolved in a buffer containing 100 mM NaCl and 10 mM phosphate. At room temperature (295 K), such a double helix produces a characteristic ¹H NMR response, details of which are shown in Table 1.

2.2. 1D NMR

The classification of ¹H chemical shifts is primarily a consequence of neighboring group effects. Variations of chemical shift within these groups arise from secondary effects of global and local structure perturbations away from canonical structure. For example, unstacking of base-paired double-helical DNA by thermal denaturing to form two single-stranded DNAs is usually manifest as a shift of ¹H resonances to higher frequencies. A plot of chemical shift change vs temperature for any single resonance will usually yield the melting temperature, Tm, at which half the population has converted to a second form. Dissociation enthalpies for the duplex to hairpin and duplex to single-strand conversions can also be derived using such data (7,8). Clearly, variation of solvent and solute composition can markedly alter the global DNA structure; for example, conversion of B-DNA to left-handed Z-DNA can be effected by the addition of methanol to an aqueous sample (9).

2.3. 1D Multiple-Pulse Experiments

For most NMR studies of aqueous solutions, the sample is dissolved in D₂O. This is inappropriate for observation of imino and amino protons of DNA, which exchange rapidly with the solvent even when base paired. It is therefore necessary to dissolve the sample in 90% H₂O (with 10% D₂O present to provide a lock signal), and to suppress or attenuate the huge water signal. This is most simply achieved by presaturation, but saturation transfer to the imino protons results in reductions in the intensity of these signals (10). Although the loss of signals can be used to one’s advantage, an alternative, known as the binomial pulse sequence, can be applied (11,12). This technique attenuates the water signal without continuous irradiation of the water resonance and does not require a decoupler channel. By using a binomial pulse (1-1 or 1-3-3-1) as the observe pulse, the decoupler channel can be used to generate 1D NOEs between imino protons and near neighbors, such as adenine C2 protons. Such NOEs can be difficult to detect in 2D experiments. The NOE in such cases, which is a measure of interatomic separation <5 Å, is usually negative, owing to the slow tumbling time of the molecule in solution. Two-pulse experiments provide means of measuring other parameters, such as the spin-lattice relaxation times (T₁). A variation of T₁s for particular types of proton along an oligonucleotide chain can point to sequence-specific structural variations (13). Measurement of T₁s can provide a means of identifying adenine C2 hydrogens, which have characteristic T₁s lying between 4 and 10 s.

Although most macromolecular NMR information derives from proton studies, heteronuclear studies are proving increasingly useful. The low natural abundance of ¹⁵N and ¹³C prevents many detailed experiments from being carried out, but phosphorus can be detected readily at natural abundance and serves as a probe for DNA backbone conformation. Under normal circumstances, a right-handed B-DNA structure gives rise to ³¹P resonances clustered around -3 ppm relative to internal phosphate. Variations away from this central position by 1 or 2 ppm have been shown to be indicative of alternative DNA backbone conformations (9). Assignments of phosphorus spectra can be made either by ¹H-detected HMQC (2D Heteronuclear Multiple Quantum Coherence) (14) or in 1D by specific ¹⁷O and ¹⁸O labeling (15). Labeling with ¹⁷O directly attached to phosphorus broadens the ³¹P signal below detection, whereas ¹⁸O labeling causes an isotopic shift to the low-frequency side of the normal ³¹P signal. Enrichment is necessary for the observation of ¹³C DNA resonances, and by combining ³¹P with ¹³C NMR, it can be possible to formulate some understanding of DNA backbone motion (16). In a similar way, base-pair mobility has been investigated by appropriate ²H and ¹⁵N enrichment (16).

Such 1D NMR applications are of benefit, but despite improved signal dispersion at higher magnetic field strengths, the combination of substantial signal overlap and poor resolution, especially for ¹H nuclei, imposes major limitations on the assignment of spectra of DNA and ligand-DNA complexes. For this reason, 2D and 3D NMR methods have emerged as the principal sources of both assignments and structural and dynamic information on such materials.

2.4. 2D NMR

Homonuclear 2D NMR experiments provide the basis for a considerable amount of both qualitative and quantitative structural data for DNA and DNA-ligand complexes in solution. COSY (Homonuclear Shift Correlated Spectroscopy) and its variants (DQCOSY, E.COSY, and P.E.COSY) provide scalar coupling information (see Chapter 1) and can be used to deduce bond torsion angles (17).
NOESY (Nuclear Overhauser Enhancement Spectroscopy) provides information relating to the spatial arrangement of atoms with respect to one another. Additional experiments, such as TOCSY (Total Correlation Spectroscopy, also called HOHAHA) and ROESY (Rotating Frame NOESY), can be used, but for DNA work, NOESY- and COSY-type experiments are the most useful. For these experiments to be executed within a sensible time period (3-12 h), 1-5 mM solutions (10-20 mg oligonucleotide) are required. This provides sufficient material without causing aggregation or providing too much signal for the spectrometer ADC to handle. Data should normally be acquired in phase-sensitive mode on a nonspinning sample and, where possible, all necessary data (i.e., COSY, NOESY, TOCSY) should be acquired without removing the sample from the magnet (18). Whereas useful 1D experiments may be performed at medium field strengths, for all but the shortest oligonucleotides, 2D experiments should be carried out at 500-600 MHz.

2.5. Strategy for the ¹H NMR Assignment of DNA

The strategy used for assigning 2D NMR spectra of DNA involves connecting adjacent spin systems on the basis of the spatial information inherent in the NOESY data. Three scalar-coupled spin systems (isolated from one another in terms of direct bonding) exist in DNA, namely the sugar ring (H1′, H2′, H2″, H3′, H4′, H5′, H5″), cytosine C5H-C6H, and thymine CH₃-C6H (⁴J-coupling). Figure 3 shows a typical COSY spectrum for a regular B-DNA structure; off-diagonal responses correlate nuclei that are scalar-coupled. In NOESY data sets, the off-diagonal responses correlate nuclei that are dipolar-coupled. When the NOESY experiment is run with a mixing time of 200-300 ms, dipolar couplings are seen relating all pairs of nuclei separated in space by 5 Å or less. These dipolar couplings can be predicted on the basis of known DNA structure (see Table 2) and used as the basis of a sequential method of assignment of ¹H NMR data (19).

Figure 4A shows one of the most useful quadrants of a DNA NOESY data set used to make the sequential assignment in the duplex [d(AGACGTCT)]₂ (20). The starting point for the assignment is the crosspeak labeled “a.” This is the NOE correlating T8 C1′H and T8 C6H, as shown by the dashed line “a” in Fig. 4B. The second crosspeak to T8 C6H, “b,” is the NOE between T8 C6H and C7 C1′H, indicated by “b” in Fig. 4B. The third crosspeak, “c,” occurs between C7 C1′H and C7 C6H, and similarly C7 C6H is correlated to T6 C1′H by an NOE, “d.” The full “walk” for the sequence [d(AGACGTCT)]₂ can be seen, with part of the structure shown. This “walk” procedure can be applied not only to the aromatic-C1′H region, but also to the aromatic-C2′H, C2″H region.

Fig. 3. Magnitude mode 2D ¹H COSY NMR spectrum of the 8-mer [d(AGACGTCT)]₂ at 296 K recorded at 500 MHz. Although the spectrum is shown symmetrized, this practice is now not recommended for 2D spectra.

Table 2. Characteristic Dipolar Couplings in a B-DNA Structureᵃ

Pu C8H(n)/Py C6H(n) to C1′H(n), C2″H(n), C1′H(n - 1), C2″H(n - 1)
Pu C8H(n)/Py C6H(n) to C C5H(n) or T CH₃(n + 1)
C1′H(n) to C2″H(n), C3′H(n), C4′H(n)
C2′H(n) to C2″H(n), C3′H(n)
C3′H(n) to C4′H(n), C5′H(n), C5″H(n)
C4′H(n) to C5′H(n), C5″H(n)

ᵃPu represents purine bases, and Py represents pyrimidine bases. If n is the number of the residue in the oligonucleotide sequence, then (n - 1) is the 5′ neighboring nucleotide unit, and (n + 1) is the 3′ neighboring unit (19).

Fig. 4. (A) “Fingerprint” 1′H to aromatic crosspeak region of the phase-sensitive ¹H 2D NOESY spectrum of the 8-mer [d(AGACGTCT)]₂ acquired at 400 MHz. (B) Representation of the 3′ end of the 8-mer duplex in the right-handed format.

Fig. 5. Spatial NOE connectivities expected for 5′-CpGpC-3′ in right- and left-handed DNA structures.

2.6. Structural Characterization of DNA from NMR Data

Once an assignment has been made, certain qualitative conclusions can be drawn concerning the type of structure formed in solution, based on the NMR data acquired. As shown earlier, 1D NMR studies can reveal whether a material has double-helical characteristics. The assignment “walk” from 2D NOESY data can show whether that duplex is left- or right-handed. Predictions of NOE “contacts” for both types provide a basis for making the judgment (Fig. 5) (21). Since the NMR data for a single conformation have a unique solution, the assignment falls into one of these two categories. Thus, for the example in Fig. 4, the structure is right-handed. Two different forms of right-handed duplex DNA, A- and B-DNA, show subtly different NMR responses. X-ray studies show that distances between purine H8 or pyrimidine H6 base protons and neighboring 2′ protons are different for the two geometries:

           H6/H8(n) to H2′     H6/H8(n) to H2′(n - 1)
A-DNA      3.9 Å               1.7 Å
B-DNA      1.9 Å               3.9 Å

H2′(n - 1) relates to the 2′ proton on the 5′ flanking nucleotide relative to “n.”
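Because the initial NOE build-up scales as r⁻⁶, the distances tabulated above translate into very different relative crosspeak intensities for the two helix types. The following is a rough sketch under the isolated two-spin, short-mixing-time approximation; the function names and the simple greater-than/less-than classification are illustrative assumptions, not a published procedure.

```python
def noe_ratio(r_intra, r_inter):
    """Expected intranucleotide/internucleotide NOE intensity ratio.

    Assumes NOE intensity proportional to r**-6 (initial-rate regime,
    no spin diffusion), so the ratio is (r_inter / r_intra)**6.
    """
    return (r_inter / r_intra) ** 6


def helix_type_from_noes(noe_intra, noe_inter):
    """Qualitative call from measured H6/H8-H2' crosspeak intensities."""
    if noe_intra > noe_inter:
        return "B-like"
    if noe_intra < noe_inter:
        return "A-like"
    return "ambiguous"


if __name__ == "__main__":
    # Distances (angstroms) taken from the table above.
    print("B-DNA ratio:", round(noe_ratio(1.9, 3.9), 1))
    print("A-DNA ratio:", round(noe_ratio(3.9, 1.7), 4))
```

With these distances, the intranucleotide crosspeak is predicted to be tens of times stronger than the sequential one in B-DNA, and more than a hundred times weaker in A-DNA. In practice, spin diffusion at longer mixing times reduces the contrast, so the comparison is best made at short mixing times.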


These distances are clearly reflected in NOESY data. For a B-DNA structure, NOEs between base C6 and C8 hydrogens and the C2′ sugar hydrogen of the same nucleoside unit are larger than NOEs between base C6 and C8 hydrogens and the C2′ hydrogen of the neighboring 5′ nucleoside unit. For an A-DNA structure, the opposite is true (22). Such is the power of the NOESY experiment that it is now being used as the source of quantitative structures of DNA in solution. This subject is beyond the scope of the current discussion, but references for further reading are provided (23-25).

3. Structural and Dynamic Studies of DNA

A number of different forms of DNA exist and even coexist in the same sample. NMR lends itself particularly well to their study, especially in cases where no suitable crystals can be obtained for X-ray work. Both structural and dynamic studies are in progress on unusual DNA structures known to be responsible for mutagenic effects.

3.1. Left-Handed Z-DNA

Z-DNA, with its name reflecting the zigzag appearance of the phosphate backbone, is known to occur in CG-rich regions of DNA, and is believed to be recognized by certain types of DNA-binding proteins (26). Its detection has fueled speculation about its occurrence in nature and the conditions under which it forms. NMR is being used to probe Z-DNA structure, B-Z DNA interfaces, and B to Z conversions, in short, easily manageable fragments. A typical example is the (m5dCdG)-repeat structure studied as a function of methanol concentration in 0.1M NaCl (9). At low-salt and low-methanol concentration, the B form is more favored. Higher methanol concentrations convert the species to Z-DNA. Bromination and alkylation at the 5 position of guanine appear to provoke Z-DNA formation in G- and C-rich fragments (6). Generally, guanine takes up the syn conformation with respect to the sugar ring, which is puckered in the C3′-endo form; this contrasts with B-DNA, in which all residues are anti and sugar rings are usually in the C2′-endo form. Characteristic NMR responses for Z-DNA are thus very clear. H4′, H5′, and H5″ resonances occur at lower frequencies than for B-DNA; the intranucleotide G(C8H to H1′) NOE is very much stronger than in B-DNA, and, as previously alluded to, the ³¹P spectrum shows resonances at higher frequencies. An assignment walk strategy, similar to that used for B-DNA, can be used to assign the ¹H NMR spectrum of Z-DNA (21). Alternatively, a 1:1 mixture of Z- and B-DNA can be generated by the addition of methanol to B-DNA (9), and the Z-DNA spectrum assigned by chemical exchange using the (previously assigned) B-DNA spectrum. This procedure is possible because the B- and Z-DNA are in slow exchange (giving separate distinct signals) on the NMR time scale. Slow exchange also allows the percentage Z-DNA to be calculated (from the integration of resolved signals). Data recorded at different temperatures have enabled Arrhenius plots of ln(%B/%Z) against 1/T (K⁻¹) to act as a source of ΔH and ΔS values for the interconversions. The results indicated that the enthalpy term favors Z-DNA and the entropy term, B-DNA (9).

3.2. Hairpin-Loop DNA

DNA sequences consisting of inverted repeats (such as AGCTAGCT) are often present in regulatory sequences of DNA, and are of particular interest since they mimic hairpin-loop structures present in cruciforms and RNA. The possibility that these hairpin-loop forms have some regulatory capacity has fueled interest in the duplex to hairpin-loop conversion, the factors that govern hairpin-loop stability, and general consideration of conformational pathways to chain folding in solution. Hairpin-loop DNA crystals have so far proved elusive, and NMR is therefore the only source of structural data on short DNA hairpin structures. NOESY NMR data of hairpin loops show base stacking features similar to B-DNA (7). Usually the stem is a right-handed B-DNA duplex, with base stacking continuing into the loop region. Many thymine loop-containing oligonucleotides studied by NMR, such as d(ATCCTATTTTTAGGAT) and d(CGCGTTTTCGCG), have been proposed to be stabilized by alternative T·T base pairing in the loop region. It remains to be established how the sequence and structure of the stem region influence the nature of the loop region in such materials.

3.3. Base Mispairing and Defective DNA Structures

DNA repair mechanisms and mutagenesis provide a further source of DNA/NMR studies. Misincorporation of a base by base mispairing and the subsequent mutagenic potential created have resulted in a significant interest in the structure and stability of DNA containing GT, GA, AC, and TC mismatches. NMR studies of the GT mismatch DNA d(AAATTTTCAAA)·d(TTTGAGAATTT) show that GT occupies a normal position in the helix (26). One effect of GT base-pair formation is the reduction in stability of the duplex, as evidenced by a decrease in the melting temperature of aberrant structures compared to the parent materials. The melting temperature of d(CGTGAATTCGCG)·d(CGCGAATTCGCG) containing two GT mismatches was found to be 52°C compared with 72°C for the parent [d(CGCGAATTCGCG)]₂ (27).

3.4. Extrahelical Bases and Frame-Shift Mutagenesis

DNA sequences that incorporate an extra unpaired base into the structure, such as the sequence shown, are of interest in the study of frame-shift mutagenesis.

C-G-C-A-G-A-A-T-T-C---G-C-G
G-C-G---C-T-T-A-A-G-A-C-G-C

Although X-ray crystal studies of this material showed the A’s to be looped out (28), NMR reveals that A is stacked into the double-helical structure in solution (29). By contrast, NMR studies of d(CA₃CA₃G)·d(CT₆G) show the extrahelical C to be looped out (30). Further studies by NMR of triple-helical DNA, α-DNA, base-alkylated DNA, and backbone-alkylated DNA are all adding to the understanding of factors influencing DNA structure and mutagenic repair.
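The thermodynamic analysis mentioned in Section 3.1, in which ln(%B/%Z) is plotted against 1/T, amounts to a straight-line fit whose slope and intercept give ΔH and ΔS. The following is a minimal sketch, assuming a two-state Z to B equilibrium with K = %B/%Z; the function names and all numerical values are hypothetical and are not data from ref. 9.

```python
import math

R_GAS = 8.314  # gas constant, J mol^-1 K^-1


def vant_hoff_fit(temps_k, percent_b, percent_z):
    """Least-squares fit of ln(%B/%Z) against 1/T.

    With K = %B/%Z for the Z -> B conversion,
    ln K = -dH/(R*T) + dS/R, so slope = -dH/R and intercept = dS/R.
    Returns (dH in J/mol, dS in J/mol/K).
    """
    x = [1.0 / t for t in temps_k]
    y = [math.log(b / z) for b, z in zip(percent_b, percent_z)]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return -slope * R_GAS, intercept * R_GAS


def synthetic_percentages(temps_k, delta_h, delta_s):
    """Generate noise-free %B and %Z values for testing the fit."""
    percent_b, percent_z = [], []
    for t in temps_k:
        k = math.exp(-delta_h / (R_GAS * t) + delta_s / R_GAS)
        percent_b.append(100.0 * k / (1.0 + k))
        percent_z.append(100.0 / (1.0 + k))
    return percent_b, percent_z


if __name__ == "__main__":
    temps = [280.0, 290.0, 300.0, 310.0, 320.0]
    # Hypothetical values: positive dH and dS for Z -> B.
    pb, pz = synthetic_percentages(temps, 20000.0, 65.0)
    dh, ds = vant_hoff_fit(temps, pb, pz)
    print(f"dH = {dh / 1000:.1f} kJ/mol, dS = {ds:.1f} J/mol/K")
```

In this sign convention, a positive ΔH for the Z to B direction corresponds to enthalpy favoring Z-DNA, and a positive ΔS corresponds to entropy favoring B-DNA, which is the qualitative pattern reported in Section 3.1. The percentages themselves would come from integration of resolved B and Z signals in slow exchange.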


4. Drug-DNA Interactions

4.1. Minor Groove Binders

DNA is the known target of many compounds used as antiviral and/or antitumor agents. Many NMR and indeed X-ray studies of DNA-ligand complexes are carried out with a view to understanding the details governing the molecular recognition events and providing a long-term framework on which to base future drug research. Such ligands exhibit complementarity to specific base sequences based on shape, charge, polarity, and the pattern of hydrogen bonds. DNA-binding drugs specific for the minor groove of DNA generally prefer AT-rich, rather than GC-rich, regions of DNA. The drugs are usually planar and crescent shaped, and possess donor/acceptor functionality, and the DNA minor groove possesses an electrostatic potential minimum attractive to many such ligands. Studies of these complexes by NMR involve analysis of binding modes, complex lifetimes, and base specificity. Examples of materials studied in this way are shown in Table 3.

Table 3. Examples of Ligand-DNA Complexes Studied by NMR

Ligand            DNA sequence                        Ref.
Netropsin         d(GGAATTCC)                         31
Netropsin         d(GGTATACC)                         33
Distamycin A      d(CGCGAAATTGGC)·d(GCCAATTTCGCG)     35
Lexitropsin       d(CATGGCCATG)                       34
Hoechst 33258     d(CGCGAATTCGCT)                     32

4.1.1. Structural Features of Complexes from NMR Data

The broadening of DNA ‘H resonances on the addition of a suitable minor-groove binding ligand has often been taken as primary evidence of complex formation. The broadening is a reflection of the increased rotational correlationtime of the DNA with a ligand tightly bound to it. The ‘H spectrum of a drug-DNA complex is dependent on its rate of dissociation. Slow exchange on the NMR time scale is characterized by the simultaneous appearance of clearly resolved signals from both free and ligand-bound oligonucleotides, when the ligand to oligonucleotide molar ratio is ~1: 1, An example is shown in Fig. 6. The appearance of “doubled” resonances, also apparent in Fig. 6, arises from the use of self-complementary oligonucleotides and nonsymmetrical ligands. The previously identical oligonucleotide strands in the unbound oligonucleotide become magnetically nonequivalent in the complex when binding of such a nonsymmetrical ligand is tight. This inequivalence only occurs in and around the binding site, thus providing a macroscopic feel for the position of binding. In cases where the rate of exchange is intermediate on the NMR time scale, this inequivalence is less apparent. In general, detailed definitions of binding sites are provided by NOE data. DNA minor grooves provide useful probes by which this definition can be made; sugar 1’hydrogens

Fig. 6. One-dimensional ¹H NMR spectra of mixtures of free d(CGCGAATTCGCG) and Hoechst 33258-bound d(CGCGAATTCGCG) in slow exchange, in the ratios of DNA to ligand indicated. (Traces are labeled 1, 0.75, 0.5, and 0.25 ligand to 1 DNA; the chemical shift axis runs from about 3.0 to 1.0 ppm.)

point in toward the minor groove, and base-pair imino and adenine C2 hydrogens line the floor of the minor groove. By assessing both the complexation shifts of these resonances and the NOEs between these protons and resonances assigned to the ligand, a clear picture of the binding mode is created. Complexation shifts are well represented by illustration, the change in chemical shift between free and bound DNA (Δδ) being plotted against chain position (Fig. 7). In and around the binding site, C1′ hydrogen resonances move to lower frequency (more shielded), and base-paired imino and adenine C2 hydrogen resonances move to higher frequency (more deshielded). Ring current anisotropy from ligand aromatic rings

Fig. 7. NMR spectra (1D) of the thymine methyl region of d(CGCGAATTCGCG)2 in 10 mM phosphate, 100 mM NaCl, and 0.1 mM NaN3 at apparent pH 7.0, 99.96% D2O, referenced to internal 3-trimethylsilyl[2,2,3,3-²H4]propionate. Spectra were recorded at 500 MHz in the presence of the indicated molar ratios of added Hoechst 33258. (Plot axes: Δδ against chain position.)

is seen as the main cause of these shifts, C1′ hydrogens lying perpendicular to the plane of the rings. Intermolecular NOEs provide a more detailed source of binding position information. Their analysis requires an assignment of the ¹H spectrum of the complex for both DNA and ligand protons, a task that

Structures 4 and 5. Examples of intermolecular NOEs observed for both netropsin (left, Structure 4) and Hoechst 33258 (right, Structure 5) bound at an AATT binding site.

may take some months to complete, following the protocol outlined in Section 2. A NOESY off-diagonal response that correlates a ligand proton resonance with a DNA proton resonance represents a close contact between DNA and ligand. Charting such responses for all ligand protons provides a series of structural constraints corresponding to intermolecular distances <5 Å. Typical intermolecular contacts are shown for both netropsin (Structure 4) and Hoechst 33258 (Structure 5) bound at an AATT site (31,32). (See Structures 4 and 5.) The majority of contacts are between imino and adenine C2 hydrogens and ligand aromatic/NH hydrogens, which lie on the concave edges of the drugs. The general feature of the NMR data from such complexes is characteristic of a DNA molecule whose gross structural characteristics are similar in both free and bound forms. NOESY assignments and the presence of imino proton resonances in the correct position for Watson-Crick base pairs indicate that base pairing remains intact when a ligand is bound.

Fig. 8. The existence of transferred NOE arises through drug flipping on the NMR time scale. Closed and open circles represent two sets of two protons that are chemically equivalent in the free DNA, but become inequivalent in the presence of an unsymmetrical ligand. "Normal" NOEs are shown by solid double-headed arrows. "Transfer" NOEs are shown by double-headed dashed arrows.

4.1.2. Dynamic Features of Ligand-DNA Complexes

Certain dynamic features of drug-DNA complexes are directly accessible from the NOESY data used to assign the complex. Many instances are reported in which the ligand occupies two identical and equally populated binding sites (31-35). An unsymmetrical ligand imposes asymmetry upon an otherwise symmetrical oligonucleotide double helix. If a single DNA proton has two identities dependent upon the orientation of a ligand, then, in the slow exchange limit, the two resonances related by chemical exchange give rise to a NOESY response correlating them. Such an exchange crosspeak is clear evidence for drug reorientation on the NMR time scale. A second type of NOESY crosspeak, the exchange NOE, is also a feature of such data. It arises when one partner from a pair of protons that normally show an NOE correlation takes on a new identity within the NMR time period, so that a response is observed between the new partners (see Fig. 8). Such "NOEs" can be seen for residues >10 Å apart in space. Exchange crosspeaks identify an on/off process in which the ligand has time to reorient itself on the NMR time scale. The two-site exchange

Structure 6. Distamycin A.

process is deemed to go by one of four mechanisms, namely intermolecular exchange between DNA molecules, intramolecular sliding, "walking" of the ligand, or a flip-flop exchange (34). The last, and most favored, is expected to go via a loosely associated complex between drug and DNA in which the rate-determining step is the departure of the drug from where it orients itself along the minor groove floor. By way of example, the complex between distamycin A (Structure 6) and [d(CGCGAATTCGCG)]2 has been calculated to flip at a rate of 2 s⁻¹ and exchange at a rate of 4 s⁻¹ (36). The figures are arrived at by use of Eq. 1, where Δν is the frequency separation of the resonances of two interconverting centers measured at coalescence.

\[ k_{\mathrm{coal}} = \frac{\pi}{\sqrt{2}}\,\Delta\nu = 2.22\,\Delta\nu \tag{1} \]

The activation energy for exchange, ΔG, at the coalescence temperature, T_coal, can be calculated from Eq. 2.

\[ \Delta G = 19.14\,T_{\mathrm{coal}}\left[9.97 + \log\left(T_{\mathrm{coal}}/\Delta\nu\right)\right] \quad (\mathrm{J\,mol^{-1}}) \tag{2} \]
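As a rough numerical illustration of Eqs. 1 and 2, the following Python sketch evaluates the coalescence rate constant and the corresponding activation free energy. The function names and the input values (a 10 Hz separation and a 298 K coalescence temperature) are hypothetical and chosen only for illustration; the constant 9.97 is the value that follows from combining the Eyring equation with Eq. 1.

```python
import math


def coalescence_rate(delta_nu_hz):
    """Eq. 1: exchange rate constant (s^-1) at coalescence for two
    equally populated sites separated by delta_nu_hz (Hz)."""
    return math.pi / math.sqrt(2.0) * delta_nu_hz


def activation_free_energy(t_coal_k, delta_nu_hz):
    """Eq. 2: free energy of activation (J/mol) at the coalescence
    temperature t_coal_k (K), using a base-10 logarithm."""
    return 19.14 * t_coal_k * (9.97 + math.log10(t_coal_k / delta_nu_hz))


# Hypothetical inputs, not taken from any of the cited studies.
delta_nu = 10.0   # Hz
t_coal = 298.0    # K

print("k_coal = %.1f s^-1" % coalescence_rate(delta_nu))
print("dG = %.1f kJ/mol" % (activation_free_energy(t_coal, delta_nu) / 1000.0))
```

Because ΔG depends only logarithmically on Δν, a modest error in the measured separation has little effect on the result, whereas an error in the coalescence temperature carries through almost linearly.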

Without going to these lengths, some qualitative feel for exchange rates can be gained from resonance linewidths, a good example of which exists in the comparison of the netropsin/AATT and netropsin/TATA complexes in solution (31). At identical temperatures, certain notable resonances in the former complex were sharper than in the latter. Raising the temperature of both complexes was seen to lift the asymmetry in the ¹H spectrum of the latter complex faster than in the former, the implication being a tighter binding process for the AATT complex. The differences in their relative affinities have been put down to different hydrogen bonding strengths, the conclusion being that such H-bond contributions are important for specificity, but not for overall complex formation.
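The link between linewidth and exchange rate can be made semi-quantitative in the slow exchange limit, where each departure from a site shortens the lifetime of the spin state and adds k/π (in Hz) to the linewidth at half height. The sketch below applies this standard lifetime-broadening relation, which is not taken from the studies cited here. The function name and the linewidths are hypothetical, and the estimate assumes that all of the excess broadening comes from exchange.

```python
import math


def slow_exchange_rate(linewidth_obs_hz, linewidth_ref_hz):
    """Estimate the rate constant (s^-1) for leaving a site in the
    slow exchange limit from the excess linewidth at half height.

    linewidth_obs_hz: linewidth of the exchanging resonance (Hz)
    linewidth_ref_hz: linewidth of the same resonance without exchange (Hz)
    """
    excess = linewidth_obs_hz - linewidth_ref_hz
    if excess < 0:
        raise ValueError("observed linewidth is narrower than the reference")
    return math.pi * excess


# Hypothetical example: a resonance broadened from 2 Hz to 5 Hz.
print("k = %.1f s^-1" % slow_exchange_rate(5.0, 2.0))
```

Comparing the same resonance in two complexes at one temperature, as in the netropsin example, then amounts to comparing two such estimates.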


4.2. Intercalators

NMR studies of drugs binding to DNA by intercalation now frequently parallel studies of minor groove binders. The accessibility of milligram quantities of self-complementary oligonucleotides has greatly facilitated very detailed studies of the molecular basis of the interaction of minor-groove binders and intercalators. NMR studies of intercalators, including carcinogens, such as ethidium bromide, and antitumor drugs, such as actinomycin D, predate these developments, and many ingenious studies have been published based on relatively simple NMR techniques. It would be wrong to suggest that these have been superseded. Detailed information about binding interactions is almost always obtained from analysis of complex 2D data, but, if such detail is not required, this analysis should be avoided!

One-dimensional ³¹P NMR spectroscopy can provide useful information about the binding of intercalators to DNA (37). Chemical shifts of ³¹P are sensitive to conformational changes in DNA, and intercalating drugs cause downfield shifts in the ³¹P signal, whereas divalent cations cause upfield shifts. Relaxation parameters (linewidths and T1 measurements) are also sensitive to intercalating drugs. When ethidium bromide is added to sonicated calf thymus DNA, there is a downfield shift of the (unresolved) ³¹P resonance, indicating that the complex is in fast exchange. As more ligand is added, the ³¹P resonance continues to move downfield until the DNA becomes saturated (at 0.5 molar equivalents of drug per base pair). The linewidth also increases; when saturated with ethidium bromide, the linewidth of the ³¹P resonance of calf thymus DNA increased by about 50 Hz (37). The spin-lattice relaxation time, T1, is, however, affected very little by the presence of ethidium bromide. The authors suggested that ethidium may have little effect on the internal motion of DNA (reflected in T1), but may slow the overall motion of the molecule (reflected in the increased linewidth).
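The steady downfield movement of a single ³¹P resonance during the ethidium titration is what fast exchange predicts: the observed shift is the population-weighted average of the free and bound shifts. The sketch below illustrates this. The function names and the shift values are hypothetical, and the bound fraction is modeled in the crudest possible way, assuming that every added drug molecule binds until saturation at 0.5 drug per base pair.

```python
def observed_shift(delta_free, delta_bound, fraction_bound):
    """Fast-exchange chemical shift (ppm): the population-weighted
    average of the free and bound shifts."""
    if not 0.0 <= fraction_bound <= 1.0:
        raise ValueError("fraction_bound must lie between 0 and 1")
    return (1.0 - fraction_bound) * delta_free + fraction_bound * delta_bound


def fraction_bound_tight(drug_per_base_pair, saturation=0.5):
    """Fraction of sites occupied, assuming (for illustration only)
    stoichiometric binding up to the saturation ratio."""
    if drug_per_base_pair < 0:
        raise ValueError("drug_per_base_pair must be non-negative")
    return min(drug_per_base_pair / saturation, 1.0)


# Hypothetical shifts: free DNA at 0.00 ppm, saturated complex 0.40 ppm downfield.
for ratio in (0.0, 0.125, 0.25, 0.5, 0.75):
    f_bound = fraction_bound_tight(ratio)
    print(ratio, round(observed_shift(0.0, 0.40, f_bound), 3))
```

Under these assumptions the shift grows linearly with added drug and levels off at saturation, which is the qualitative behavior described for ethidium. A real titration curve would also depend on the binding constant.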
The intercalating drugs quinacrine and daunorubicin behaved qualitatively similarly to ethidium, although the magnitudes of the chemical shift changes were quite different. Actinomycin D (Structure 3), however, binds more tightly to DNA, and slow exchange was observed. When actinomycin D was added to sonicated calf thymus DNA, two ³¹P signals were seen. One had a very similar shift to that of unbound DNA, although it was somewhat broadened. The other was substantially downfield shifted and was very broad indeed. Actinomycin D


shows strong GC selectivity, and it was concluded that the GC phosphates are responsible for the shifted resonance, whereas the more distant AT phosphates give rise to the unshifted peak. Methodology for analyzing fast-exchange intercalators has been extended (38), and the binding of ethidium to Z-DNA (39) is among the systems studied by NMR. Further notable 1D studies of intercalation include a ¹⁹F study of fluoroquinacrine binding to DNA (40), an investigation of the interaction of propidium with oligonucleotides containing mismatch (G-T) base pairs (41), and a ²³Na study of the effect of intercalators on the association of sodium ions with DNA (42). Particularly instructive, however, are the long-term NMR investigations of the binding of actinomycin D and the bis-intercalator echinomycin to DNA. Echinomycin intercalates with GC selectivity, with the connecting peptides lying in the minor groove. The many NMR studies leading to a detailed model for its binding have been reviewed in depth recently (43), so this discussion principally concerns actinomycin D.

In 1986, a 1D NMR study of the binding of actinomycin D to a number of self-complementary oligonucleotides all containing the (GCGC)2 sequence was published (44). In this study, ³¹P NMR and ¹H NMR (observing the imino protons at δ 11.5-14.5) spectra were used to elucidate several points. Actinomycin D binds preferentially in 5′-GC sequences (rather than CG or any other sequence). When two GC sequences are available, both may be bound by actinomycin D, even if they are adjacent. The length of the flanking sequence has little effect on the binding. When the drug and oligonucleotides are present in a 1:1 ratio, two distinct 1:1 adducts are formed; at higher drug concentration, a unique 2:1 adduct is formed. In essence, these conclusions were arrived at by counting the imino peaks in the ¹H spectrum! The asymmetry of the drug molecule suggested a molecular basis for the results obtained in these experiments.
Integration of the imino protons showed that one of the two 1:1 complexes was favored by a factor of two or three over the other, and chemical shift analysis suggested that the favored complex was the one in which the benzenoid ring lies over the central G (see Fig. 9). In the 2:1 complex, only this orientation is permitted, resulting in a simplified NMR spectrum. Further studies, including 2D experiments, have been carried out by the same group (45,46), confirming and extending these findings.
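To give a sense of scale for the population ratio obtained by integration, a ratio between two species at equilibrium can be converted into a free energy difference with ΔΔG = RT ln(ratio). The sketch below does this for ratios of two and three. The temperature of 298 K is an assumption made for illustration and is not a value reported in the studies cited, and the function name is hypothetical.

```python
import math

R_GAS = 8.314  # gas constant, J mol^-1 K^-1


def free_energy_difference(population_ratio, temperature_k=298.0):
    """Free energy difference (J/mol) between two species whose
    equilibrium populations are in the given ratio."""
    if population_ratio <= 0:
        raise ValueError("population_ratio must be positive")
    return R_GAS * temperature_k * math.log(population_ratio)


# Integrated imino intensities in a ratio of 2:1 or 3:1 (assumed 298 K).
for ratio in (2.0, 3.0):
    print(ratio, round(free_energy_difference(ratio) / 1000.0, 2), "kJ/mol")
```

On these assumptions, a preference of two or three to one corresponds to a difference of only about 2-3 kJ/mol, so the two orientations are energetically close.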

Fig. 9. The binding of actinomycin D to d(GCGC)2: (left) more favored 1:1 complex, (center) less favored 1:1 complex, and (right) 2:1 complex. Structure 3 is shown at bottom.

4.3. Major Groove Binders

Many antitumor drugs bind in the major groove, generally covalently through N-7 of guanine. Nitrogen mustards, mitozolomides, and cisplatin all act in the major groove, but studies of their modes of action have mainly involved techniques other than NMR. The particular strength of NMR in the study of drug-DNA interactions is the ability of the NOESY experiment to identify intramolecular and intermolecular contacts in noncovalently bound complexes involving relatively large drugs. To date, such drugs as have been discovered or synthesized have mainly been found to bind by intercalation or in the

Structure 7. Cisplatin.

minor groove (6). Chromomycin A3 was at one stage in its checkered history believed to be a major groove binder, having started life as an intercalator. Recent NMR studies by two groups (43,47) provide very strong evidence for the drug binding in the minor groove. One notable set of NMR experiments concerns the binding of cisplatin to DNA fragments. Cisplatin (Structure 7) binds to two adjacent guanine bases, linking them. Very high-resolution NMR studies have been carried out on the cisplatin d(GG) adduct (48). This is quite a small molecule and gives very sharp lines in the ¹H NMR spectrum. The NMR spectrum was assigned completely using 1D techniques. Proton-phosphorus coupling constants were then converted to conformational information using an empirical mathematical relationship, a modified Karplus equation. It was concluded that the 5′ G sugar moiety is distorted by the drug so that the sugar is in the N (C3′-endo) conformation, whereas the 3′ G sugar adopts predominantly the normal S (C2′-endo) conformation. These studies have been extended to longer oligonucleotide chains (49). NMR studies of carcinogens binding DNA in the major groove are currently more numerous than those of drugs. The techniques involved are directly transferable, and this field has been reviewed in depth (50).

5. Conclusions

The availability of synthetic oligonucleotides, especially of self-complementary strands of DNA of defined sequence, has greatly facilitated very detailed NMR (and X-ray) studies of DNA structure and DNA-ligand binding. NMR is the only technique available to date that allows solution structures of DNA to be explored to atomic resolution. In conjunction with DNA footprinting, it is also an invaluable tool in determining the molecular basis of drug action. Clearly NMR, together with X-ray crystallography and molecular dynamics, will continue to be used in drug design. The aim must be to


design molecules that can read sequence, and so (subject to nontrivial pharmacokinetic and toxicological considerations) be used to control the replication of viruses and the expression of oncogenes. At the moment, this effort is largely concentrated on minor-groove binders and on extended intercalators, such as actinomycin D, whose peptide chains lie in the minor groove. The major groove is now attracting attention, however, and since proteins read sequence from the major groove, it is likely that drugs will be able to do the same and that NMR will be used extensively to determine their modes of binding.

References

1. Dervan, P. B. (1986) Design of sequence-specific DNA-binding molecules. Science 232, 464-471.
2. Goodsell, D. and Dickerson, R. E. (1986) Isohelical analysis of DNA groove-binding drugs. J. Med. Chem. 29, 727-733.
3. Kissinger, K., Krowicki, K., Dabrowiak, J. C., and Lown, J. W. (1987) Molecular recognition between oligopeptides and nucleic acids. Monocationic imidazole lexitropsins that display enhanced GC sequence dependent DNA binding. Biochemistry 26, 5590-5595.
4. Zakrzewska, K. and Pullman, B. (1988) Theoretical study of the sequence selectivity of isolexins, isohelical DNA groove binding ligands. Proposals for the GC minor groove specific compounds. J. Biomol. Struct. Dyn. 5, 1043-1058.
5. Thurston, D. E. and Thompson, A. S. (1990) The molecular recognition of DNA. Chem. Br. 26, 767-772.
6. van de Ven, F. J. M. and Hilbers, C. W. (1988) Nucleic acids and nuclear magnetic resonance. Eur. J. Biochem. 178, 1-38.
7. Wemmer, D. E., Chou, S. H., Hare, D. R., and Reid, B. R. (1985) Duplex-hairpin transitions in DNA: NMR studies on CGCGTATACGCG. Nucleic Acids Res. 13, 3755-3772.
8. Delort, A.-M., Neumann, J. M., Molko, D., Hervé, M., Téoule, R., and Tran Dinh, S. (1985) Influence of uracil defect on DNA structure: ¹H NMR investigation at 500 MHz. Nucleic Acids Res. 13, 3343-3354.
9. Feigon, J., Wang, A. H.-J., van der Marel, G. A., van Boom, J. H., and Rich, A. (1984) A one- and two-dimensional NMR study of the B to Z transition of (m5dC-dG)3 in methanolic solution. Nucleic Acids Res. 12, 1243-1263.
10. Rajagopal, P., Gilbert, D. E., van der Marel, G. A., van Boom, J. H., and Feigon, J. (1988) Observation of exchangeable proton resonances of DNA in two-dimensional NOE spectra using a presaturation pulse: application to d(CGCGAATTCGCG) and d(CGCGAm6ATTCGCG). J. Magn. Reson. 78, 1243-1263.
11. Hore, P. J. (1983) A new method for water suppression in the proton NMR spectrum of aqueous solutions. J. Magn. Reson. 54, 539-542.
12. Hore, P. J. (1983) Solvent suppression in Fourier transform NMR. J. Magn. Reson. 55, 283-300.


13. Lefevre, J.-F., Lane, A. N., and Jardetzky, O. (1987) Solution structure of the trp operator of E. coli determined by NMR. Biochemistry 26, 5076-5090.
14. Byrd, R. A., Summers, M. F., and Zon, G. (1986) A new approach for assigning ³¹P NMR signals and correlating adjacent nucleoside deoxyribose moieties via ¹H-detected multiple-quantum NMR. Application to the adduct of d(TGGT) with the anticancer agent (ethylenediamine)dichloroplatinum. J. Am. Chem. Soc. 108, 504, 505.
15. Shah, D. O., Lai, K., and Gorenstein, D. G. (1984) Facile synthesis and ³¹P NMR spectra of a double-labelled oligonucleotide d(Ap(¹⁷O)Gp(¹⁸O)Cp(¹⁶O)T). J. Am. Chem. Soc. 106, 4302, 4303.
16. James, T. L., Bendel, P., James, J. L., Keepers, J. W., Kollman, P. A., Lapidot, A., Murphy-Boesch, J., and Taylor, J. E. (1983) Conformational flexibility of nucleic acids. Jerusalem Symp. Quantum Chem. Biochem. 16, 155-167.
17. Kessler, H., Gehrke, M., and Griesinger, C. (1988) Two-dimensional NMR spectroscopy: background and overview of the experiments. Angew. Chem. Int. Ed. Engl. 27, 490-536.
18. Wüthrich, K. (1986) NMR of Proteins and Nucleic Acids. Wiley, New York.
19. Hare, D. R., Wemmer, D. E., Chou, S.-H., Drobny, G., and Reid, B. R. (1983) Assignment of the non-exchangeable proton resonances of d(CGCGAATTCGCG) using two dimensional nuclear magnetic resonance methods. J. Mol. Biol. 171, 319-336.
20. Parkinson, J. A. (1989) NMR Studies on the Met J Operator from Escherichia coli. Ph.D. Thesis, University of Leeds, UK.
21. Orbons, L. P. M., van der Marel, G. A., van Boom, J. H., and Altona, C. (1986) The B and Z forms of the d(m5C-G)3 and d(br5C-G)3 hexamers in solution: a 300 MHz and 500 MHz two-dimensional NMR study. Eur. J. Biochem. 160, 131-139.
22. Haasnoot, C. A. G., Westerink, H. P., van der Marel, G. A., and van Boom, J. H. (1984) Discrimination between A-type and B-type conformations of double helical nucleic acid fragments in solution by means of two dimensional nuclear Overhauser experiments. J. Biomol. Struct. Dyn. 2, 345-360.
23. Nerdal, W., Hare, D. R., and Reid, B. R. (1988) Three-dimensional structure of the wild-type lac Pribnow promoter DNA in solution. J. Mol. Biol. 201, 717-739.
24. Clore, G. M., Oschkinat, H., McLaughlin, L. W., Benseler, F., Happ, C. S., Happ, E., and Gronenborn, A. M. (1988) Refinement of the structure of the DNA dodecamer 5′-d(CGCGPATTCGCG)2 containing a stable purine-thymine base pair: combined use of nuclear magnetic resonance and restrained molecular dynamics. Biochemistry 27, 4185-4197.
25. Metzler, W. J., Wang, C., Kitchen, D. B., Levy, R. M., and Pardi, A. (1990) Determining local conformational variation in DNA. J. Mol. Biol. 214, 711-736.
26. Quignard, E., Fazakerley, G. V., van der Marel, G. A., van Boom, J. H., and Guschlbauer, W. (1987) Comparison of the conformation of an oligonucleotide containing a central G-T base pair with non-mismatch sequence by proton NMR. Nucleic Acids Res. 15, 3397-3409.


27. Patel, D. J., Kozlowski, S. A., Marky, L. A., Rice, J. A., Broka, C., Dallas, J., Itakura, K., and Breslauer, K. J. (1982) Structure, dynamics, and energetics of deoxyguanosine-thymidine wobble base pair formation in the self complementary d(CGTGAATTCGCG) duplex in solution. Biochemistry 21, 437-444.
28. Joshua-Tor, L., Rabinovich, D., Hope, H., Frolow, F., Appella, E., and Sussman, J. L. (1988) The three-dimensional structure of a DNA duplex containing looped-out bases. Nature 334, 82-84.
29. Patel, D. J., Kozlowski, S. A., Marky, L. A., Rice, J. A., Broka, C., Itakura, K., and Breslauer, K. J. (1982) Extra adenosine stacks into the self complementary d(CGCAGAATTCGCG) duplex in solution. Biochemistry 21, 445-451.
30. Morden, K. M., Chu, Y. G., Martin, F. H., and Tinoco, I. (1983) Unpaired cytosine in the deoxyoligonucleotide duplex dCA3CA3G·dCT6G is outside of the helix. Biochemistry 22, 5557-5563.
31. Patel, D. J. and Shapiro, L. (1985) Molecular recognition in noncovalent antitumour agent-DNA complexes: NMR studies of the base and sequence dependent recognition of the DNA minor groove by netropsin. Biochimie 67, 887-915.
32. Parkinson, J. A., Barber, J., Douglas, K. T., Sharples, D., and Rosamond, J. (1990) Minor-groove recognition of the self-complementary duplex d(CGCGAATTCGCG)2 by Hoechst 33258: a high-field NMR study. Biochemistry 29, 10,181-10,190.
33. Patel, D. J. (1982) Antibiotic-DNA interactions: intermolecular nuclear Overhauser effects in the netropsin-d(CGCGAATTCGCG) complex in solution. Proc. Natl. Acad. Sci. USA 79, 6424-6428.
34. Lee, M., Hartley, J. A., Pon, R. T., Krowicki, K., and Lown, J. W. (1988) Sequence specific molecular recognition by a monocationic lexitropsin of the decadeoxyribonucleotide d(CATGGCCATG)2: structural and dynamic aspects deduced from the high field ¹H NMR studies. Nucleic Acids Res. 16, 665-684.
35. Pelton, J. G. and Wemmer, D. E. (1989) Structural characterisation of a 2:1 distamycin A·d(CGCAAATTGGC) complex by two-dimensional NMR. Proc. Natl. Acad. Sci. USA 86, 5723-5727.
36. Klevit, R. E., Wemmer, D. E., and Reid, B. R. (1986) ¹H NMR studies of the interaction between distamycin A and a symmetrical DNA dodecamer. Biochemistry 25, 3296-3303.

37. Wilson, W. D. and Jones, R. L. (1982) Interaction of actinomycin D, ethidium, quinacrine, daunorubicin, and tetralysine with DNA: ³¹P NMR chemical shift and relaxation investigation. Nucleic Acids Res. 10, 1399-1410.
38. Chandrasekaran, S., Kusuma, S., Boykin, D. W., and Wilson, W. D. (1986) A new approach utilising high-resolution proton NMR in structural analysis on intercalation complexes of natural DNA. Magn. Reson. Chem. 24, 630-637.
39. Shafer, R. H., Brown, S. C., Delbarre, A., and Wade, D. (1984) Binding of ethidium and bis(methidium)spermine to Z DNA by intercalation. Nucleic Acids Res. 12, 4679-4690.
40. Mirau, P. A., Shafer, R. H., James, T. L., and Bolton, P. H. (1982) Fluoroquinacrine binding to nucleic acids: investigation of the ¹⁹F NMR, optical, and fluorescence properties in the presence of DNA, poly(A) and tRNA. Biopolymers 21, 909-921.
41. Wilson, W. D., Jones, R. L., Zon, G., Banville, D. L., and Marzilli, L. G. (1986) Specificity in DNA interactions: an NMR investigation of the interaction of propidium with oligodeoxynucleotides containing normal and G-T base pairs. Biopolymers 25, 1991-2015.
42. Mariam, Y. H. and Wilson, W. D. (1983) Effect of intercalating drugs and temperature on the association of sodium ions with DNA: ²³Na NMR studies. J. Am. Chem. Soc. 105, 627, 628.
43. Gao, X. and Patel, D. J. (1989) Antitumour drug-DNA interactions: NMR studies of echinomycin and chromomycin complexes. Q. Rev. Biophys. 22, 93-138.
44. Wilson, W. D., Jones, R. L., Scott, E. V., Zon, G., and Marzilli, L. G. (1986) Actinomycin D binding to oligonucleotides with 5′d(GCGC)3′ sequences. Definitive ¹H and ³¹P NMR evidence for two distinct d(GC) 1:1 adducts and for adjacent site binding in a unique 2:1 adduct. J. Am. Chem. Soc. 108, 7113, 7114.
45. Jones, R. L., Scott, E. V., Zon, G., Marzilli, L. G., and Wilson, W. D. (1988) An NMR investigation of the binding of the anticancer drug actinomycin D to oligodeoxyribonucleosides with isolated 5′d(GC)3′ binding sites. Biochemistry 27, 6021-6026.
46. Scott, E. V., Jones, R. L., Banville, D. L., Zon, G., Marzilli, L. G., and Wilson, W. D. (1986) ¹H and ³¹P NMR investigations of actinomycin D binding selectivity with deoxyribonucleosides containing multiple adjacent d(GC) sites. Biochemistry 27, 915-923.
47. Banville, D. L., Keniry, M. A., Kam, M., and Shafer, R. H. (1990) NMR studies of the interaction of chromomycin A3 with small DNA duplexes. Binding to GC-containing sequences. Biochemistry 29, 6521-6534.
48. den Hartog, J. H. J., Altona, C., Chottard, J.-C., Girault, J.-P., Lallemand, Y., de Leeuw, F. A. A. M., Marcelis, A. T. M., and Reedijk, J. (1982) Conformational analysis of the adduct cis-[Pt(NH3)2 d(GpG)]+ in aqueous solution. A high field (500-300 MHz) NMR study. Nucleic Acids Res. 10, 4715-4730.
49. den Hartog, J. H. J., Altona, C., van der Marel, G. A., and Reedijk, J. (1984) A ¹H and ³¹P NMR study of cis-Pt(NH3)2[d(CpGpG)-N7(2),N7(3)]. Eur. J. Biochem. 147, 371-379.
50. Harris, T. M., Stone, M. P., and Harris, C. M. (1988) Application of NMR spectroscopy to studies of reactive intermediates and their interactions with nucleic acids. Chem. Res. Toxicol. 1, 79-97.

Structural Characterization of the Carbohydrate Moieties of Glycoproteins by High-Resolution ¹H-NMR Spectroscopy

Herman van Halbeek

1. Introduction

The biochemical/biomedical research community, the pharmaceutical industry, and, indeed, molecular biologists generally are faced with the increasing need for characterization of carbohydrate structures of recombinant glycoproteins and natural analogs. Cultured mammalian cells (such as Chinese hamster ovary [CHO] cells) are used to produce glycoproteins for therapeutic and diagnostic use because of their ability to perform glycosylation. The presence of oligosaccharide moieties is often essential in defining several biological properties of glycoproteins, including clearance rate, immunogenicity, and specific biological activity. Since a number of factors that influence glycosylation still elude our control (such as culture environment and age of the cells), the same gene expressed in the same type of cell may not always yield a product with exactly the same glycosylation pattern, presenting drug batch quality-control problems for the pharmaceutical industry. Nuclear magnetic resonance (NMR) spectroscopy provides a powerful nondestructive means to characterize glycoprotein carbohydrates structurally and is an indispensable part of the current methodology of glycosylation site mapping.

From: Methods in Molecular Biology, Vol. 17: Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques. Edited by: C. Jones, B. Mulloy, and A. H. Thomas. Copyright ©1993 Humana Press Inc., Totowa, NJ


We will limit the discussion in this chapter to solution-state ¹H-NMR spectroscopy as a method for the characterization of the primary structure of N-type oligosaccharide chains of glycoproteins. In presenting NMR spectroscopy as a "fingerprinting" technique, we need only discuss single-pulse, one-dimensional (1D) ¹H-NMR spectroscopy. A slightly more complicated experiment would be performed only for solvent suppression. We will illustrate the applicability of the 1D ¹H-NMR fingerprinting method for the structural elucidation of the carbohydrate chains of three recombinant glycoproteins, namely, recombinant soluble human CD4 (rCD4) (1) and recombinant human tissue plasminogen activator (rtPA) (2), both expressed in CHO cells, and a recombinant hepatitis B virus (HBV) surface antigen glycoprotein (pre-S2 + S), expressed in cells of the mnn9 mutant strain of the yeast Saccharomyces cerevisiae (3). We will briefly compare the pre-S2 + S structures to the structures of carbohydrate chains of allergen Art v II, as elucidated by ¹H-NMR spectroscopy (4).

The 1D ¹H-NMR spectrum of an oligosaccharide or glycopeptide represents an "identity card" of the carbohydrate. A 1D ¹H-NMR study may suffice for primary structure determination if the oligosaccharide itself, or a compound of closely related structure, has been characterized previously. The usefulness of the "structural-reporter group concept" (5,6) in this context will be outlined in detail in Section 3.5. Fingerprinting a carbohydrate through 1D ¹H-NMR spectroscopy is possible if at least 10 nmol of pure oligosaccharide/glycopeptide are available. Primary structure determination of glycoprotein-derived oligosaccharides can be performed by ¹H-NMR spectroscopy employing radio frequencies higher than 300 MHz. We strongly advise using a spectrometer operating at 500 or 600 MHz because of the advantages in sensitivity and spectral resolution.
Although high-field (300-600 MHz) NMR spectrometers are widespread these days, mere access to such an instrument does not always guarantee the successful structural elucidation of the carbohydrate in question. Only a skillful NMR spectroscopist familiar with the peculiar aspects of carbohydrate NMR spectra will record spectra of the quality and under the conditions that allow reliable interpretation. An experienced specialist is almost always required to interpret the NMR data in terms of the structure of the carbohydrate. Efforts are beginning, however, to automate interpretation of ¹H-NMR spectra of carbohydrates (7,8). Thus, biological researchers often seek collaboration with laboratories specializing in NMR spectroscopy of carbohydrates. The author's laboratory is part of the US National Institutes of Health (NIH) Resource Center for Biomedical Complex Carbohydrates; it welcomes any requests regarding structural analysis of glycoprotein carbohydrates and provides advice on preparing samples, on the suitability of samples for NMR analysis, and so on. NMR spectra of carbohydrates are run at 500 or 600 MHz at nominal costs on a nonprofit basis, and help in interpretation of the data is provided.

2. Materials

A glycoprotein, whether natural or recombinant, consists of a protein in which one or more amino acids bear carbohydrate (are glycosylated). A carbohydrate chain attached to the amide (CO-NH2) group in the side chain of an Asn residue is referred to as an N-type oligosaccharide; oligosaccharides attached to the hydroxyl group of a Ser or Thr residue are O-type chains. N-type oligosaccharide chains have the following pentasaccharide core structure in common:

      4′
    Manα(1→6)
             \
              Manβ(1→4)GlcNAcβ(1→4)GlcNAc
             /   3          2         1
    Manα(1→3)
      4

Depending on the extension of the core, N-type carbohydrates are subdivided into three types termed: 1. N-acetyllactosamine; 2. High-mannose; and 3. Hybrid (see Scheme 1). Examples of each of these categories commonly found in recombinant glycoproteins are discussed later. Characteristically, a glycoprotein does not have a single carbohydrate structure attached to a specific glycosylation site. This microheterogeneity, illustrating the phenomenon that a glycoprotein can take on various glycoforms, greatly complicates the structural characterization of glycoprotein carbohydrates. However, since NMR spectroscopy can handle mixtures of closely related structures, we often can cope with heterogeneity without resorting

Scheme 1. Typical structures of N-type oligosaccharides as released from recombinant glycoproteins: (a) N-acetyllactosamine type (diantennary and tetra-antennary); (b) high-mannose type; (c) hybrid type.

to rigorous purification of a glycoprotein into its individual glycoforms. Nevertheless, the glycoprotein must be purified to homogeneity, as the presence of contammatmg (glyco-)proteins must be eliminated before tackling the structural analysis of the carbohydrates, The mol wt of the glycoprotein and the number of glycosylation sites occupied (i.e., the carbohydrate content) determine how much sample is needed to begin structural analysis. Ordinardy, one needs about 20 mg of pure glycoprotein starting material. Glycoproteins arem generaltoo large to be studied asintact macromolecules by ‘H-NMR spectroscopyfor the structure of their carbohydrates. The spectraof intact glycoprotems show mostly fairly wide lines and lack the resolution needed for fine structural analysis. The severe overlap of resonances,includmg the overlap of carbohydrate and protein signals, and the mrcroheterogeneity of the sample render spectral analysis of an mtact glycoprotem asyet impossible. Degradation to partial structures is mandatory when detailed primary structural information on oligosaccharides 1sdesrred.Partral structures suitable for NMR spectroscopy are: 1. Glycopeptrdes; 2. Olrgosaccharrdes; and 3. Reducedohgosaccharrdes. Glycopeptrdes are the partial structures of choice when we must preserve the information on the glycosylation site m the protein. Glycopeptrdes can be generated by specific (e.g., trypsin, chymotrypsin) or nonspecific (pronase, pepsin) proteolytic digestion of the protein portion of the glycoprotem. Although larger peptides can be handled, NMR spectroscopic analysis ISfacrlitated when the glycopeptides (once purified from the peptrdes) have a peptide chain no longer than -10 amino acids. Also, preferably, the glycopeptides should be homogeneous in their peptide portion, although peptide heterogeneity is usually not an insurmountable problem, since only the positions of the signals of the first couple of monosaccharide resrdues attached to a peptide are affected. 
However, for glycopeptides to be analyzed successfully, they should contain only one glycosylation site; the heterogeneity in the carbohydrate structure at that site can then be adequately characterized by ¹H-NMR spectroscopic analysis. Oligosaccharides are generated by enzymatic release from the glycoprotein (sometimes after denaturation) or from (tryptic) glycopeptides, by


N-glycanase (also known as PNGase F) or by an endo-glycosidase, such as endo-H or endo-F. Scheme 2 gives a typical example of such an approach for complete structural characterization of the carbohydrates of a recombinant glycoprotein. N-glycanase cleaves fairly aspecifically the N-glycosidic linkage between core residue GlcNAc-1 and Asn, resulting in a reducing oligosaccharide with an intact N,N′-diacetylchitobiose moiety. The endo enzymes cleave the linkage between the two GlcNAc residues in the core, producing an oligosaccharide that ends in just one GlcNAc residue at the reducing end. The endo enzymes vary in specificity. For example, endo-H cleaves high-mannose and hybrid structures, but not N-acetyllactosamine-type oligosaccharides. The advantages of oligosaccharides over glycopeptides are: (1) they show no signal overlap in the NMR spectrum with amino acid protons; and (2) they are easier than glycopeptides to purify to homogeneity of carbohydrate chain, which is reflected in the NMR spectrum. The drawback of analyzing oligosaccharides is the loss of information about the glycosylation site and, sometimes, the anomerization of the reducing oligosaccharide. The oligosaccharide in solution is present as a mixture of the α and β anomers, and this may affect the NMR spectrum to such an extent that the interpretation becomes ambiguous. To avoid the anomerization effect on the ¹H-NMR spectrum, to simplify subsequent purification procedures, or to render the oligosaccharide amenable to techniques of structural analysis other than NMR, the oligosaccharide may be reduced by NaBH4 (or NaBD4 for mass spectrometry) to its corresponding oligosaccharide-alditol.† To enhance sensitivity of detection during purification of the oligosaccharides, the reduction may be carried out with NaBT4, thereby incorporating a radioactive label in the compound.
Neither the D nor the T label in the oligosaccharide-alditols has an adverse effect on ¹H-NMR spectroscopic analysis of the compounds. It is evident that, in all cases, the partial degradation technique and subsequent chemical modifications applied should not affect the structure of the carbohydrate. Desialylation, defucosylation, deacetylation, desulfation, aspecific cleavage at the reducing end, and so forth, are known to occur when chemical methods are used for deglycosylation or when enzymatic methods are followed by workup procedures under conditions not mild enough (pH, temperature, chromatographic conditions for purifications) to keep the carbohydrate structures intact.

†Mucin-type, Ser/Thr-linked carbohydrates are most conveniently studied as oligosaccharide-alditols. Mucin-type glycopeptides, ordinarily containing clusters of carbohydrates, are not suitable for NMR analysis.

Scheme 2. Release, isolation, and purification of N-type oligosaccharides from recombinant soluble human CD4. The flow chart covers tryptic digestion of rCD4; separation of the peptides and the Asn-271 and Asn-300 glycopeptides by reversed-phase HPLC; release of the oligosaccharides by N-glycanase or endo-H; reversed-phase HPLC of the resulting peptide and oligosaccharides; desalting on G-15; and FPLC on Mono Q. The fractions are then analyzed by (1) ¹H-NMR spectroscopy; (2) high-pH anion exchange (HPAE) chromatography; (3) glycosyl composition analysis; (4) glycosyl linkage analysis (methylation); and (5) FAB-MS (of peptides and glycopeptides). For details of the experimental procedures, the reader is referred to ref. (1). The structural characterization of fractions Asn-271 Q2 and Asn-300 endo-H is discussed in the text.


Finally, the (reduced) oligosaccharides must be purified before undergoing NMR analysis. Important in this respect are removal of any (proton-containing and nonproton-containing) contaminants and purification of the carbohydrates to virtual homogeneity (charge, size, and so on). The most frequently observed noncarbohydrate contaminants that can disturb the NMR spectrum are salts/buffers (acetate, lactate [from the fingers of the biologist], SDS, EDTA, and so forth). Even salts that do not contain protons (NaCl, phosphates, and so on) will impair the NMR spectrum (line broadening) and shift the HOD peak (effect on dielectric constant and pH of the solution). One should be aware that carbohydrates only detected by their radioactive label are usually the ones most seriously contaminated with all sorts of nonradioactive, but NMR-disturbing substances.

The amount of pure oligosaccharide needed to obtain a 1D ¹H-NMR spectrum in <6 h of data acquisition time depends on the field strength of the spectrometer available and the sensitivity of the probe. As a rule of thumb, 15 nmol of oligosaccharide (corresponding to 25-30 µg of a decasaccharide) are the minimum requirement for primary structural analysis on a 500-MHz spectrometer. For analysis at 600 MHz, 7-10 nmol (15-20 µg) are sufficient, whereas 35-40 nmol are needed at 400 MHz and approx 100 nmol at 300 MHz.

The other materials needed for NMR analysis of oligosaccharides are a deuterated solvent (D2O for underivatized glycoprotein oligosaccharides) and an NMR tube. As for the D2O, used for exchange of OH and NH protons for deuterium atoms, one should bear in mind that the actual spectrum is recorded using a solution of the compound in 0.4 or 0.5 mL of D2O of the highest available deuteration grade ("100.0%," "gold-label," in practice >99.99%), but the initial exchange steps can be performed with 99.8% D2O.
Also, the deuteration percentage on the label of the bottle or ampule is only real if the D2O is handled correctly (see Section 3.1.). Typically, ¹H-NMR analyses of oligosaccharide samples are carried out in 5-mm NMR tubes. It is of the utmost importance to purchase high-quality glass tubes. Wilmad 535-PP tubes are recommended for analysis at 500 or 600 MHz; Wilmad 528-PP tubes are acceptable. Tubes of inferior quality can produce bad-quality spectra on an otherwise excellent NMR spectrometer and waste valuable instrument time.
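The rule of thumb on sample amounts can be turned into a small lookup, shown here as a minimal Python sketch. The amounts in nmol are those quoted above; the function names and the mol wt of 1800 used in the usage note are illustrative assumptions, not values from this chapter.

```python
# Rule-of-thumb minimum amounts (nmol) quoted in the text, keyed by
# spectrometer frequency in MHz and stored as (low, high) ranges.
MIN_NMOL = {
    300: (100.0, 100.0),
    400: (35.0, 40.0),
    500: (15.0, 15.0),
    600: (7.0, 10.0),
}


def nmol_to_ug(nmol, mol_wt):
    """Convert an amount in nmol to micrograms for a given mol wt (g/mol)."""
    # nmol * g/mol gives nanograms; divide by 1000 for micrograms.
    return nmol * mol_wt / 1000.0


def minimum_mass_ug(field_mhz, mol_wt):
    """Return the (low, high) minimum sample mass in micrograms."""
    low, high = MIN_NMOL[field_mhz]
    return nmol_to_ug(low, mol_wt), nmol_to_ug(high, mol_wt)
```

For a hypothetical decasaccharide of mol wt 1800, `minimum_mass_ug(500, 1800)` gives 27 µg, which falls within the 25-30 µg range stated for 15 nmol at 500 MHz.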


3. Methods

3.1. Sample Preparation

A glycopeptide or oligosaccharide sample submitted for NMR analysis is stored in dry state at -20°C until use. The first steps in the actual preparation of the sample for NMR analysis are proton exchanges in D2O. The sample is dissolved twice in D2O (99.8 and 99.96% D, respectively) at room temperature and pD 6 with intermediate lyophilization. The purpose of the exchange treatments is the complete conversion of OH and NH groups in the constituent monosaccharides into OD and ND (chemical exchange), and the preparation of an eventual solution of the carbohydrate with as low as possible a residual amount of HOD. The exchanges are usually performed in small glass vials; some researchers prefer to perform the exchange directly in the NMR tube. Each step in the exchange procedure (dissolving the sample in ~0.5 mL D2O, allowing the exchange process to take place, and subsequent lyophilization) may take 6-8 h. Check the pH (pD) of the solution immediately after dissolving the sample for the first exchange step. If the pH does not fall between 6 and 8, adjust it with dilute DCl or NaOD. Remember that the glycosidic linkages of sialic acid residues tend to hydrolyze at pH 5 or lower, whereas esters, such as O-acetates, are cleaved under both acidic (pH < 5) and basic (pH > 8) conditions.

One last purification step critical to the quality of the NMR spectrum is removal of paramagnetic impurities (metal ions). This step is typically carried out in the NMR lab just prior to the recording of the NMR spectrum; paramagnetic impurities may not bother the biologist in any other application of the glycoprotein and/or its oligosaccharides. Chelex is the best material to use for the routine removal of paramagnetic ions. A few particles of Chelex are sufficient to sequester the metals.
Incubate in a small vial during the first and/or second exchange step, under slow, continuous swirling for 30 min, to remove the paramagnetics, and then pipet off the solution. In this way, a 0.5-mL sample can be processed with minimal loss from adsorption and with no significant dilution (9). The sample is then dissolved three more times in highest quality D2O (99.996% D) at room temperature and pD 6, with intermediate lyophilization. Altogether, a total of at least 48 h are necessary to prepare a sample for NMR analysis, taken from the time it arrives in the NMR lab.
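The pH (pD) limits given in this section can be collected into a simple check, sketched below in Python. The thresholds are the ones stated in the text; the function name and the wording of the warnings are hypothetical.

```python
def check_pd(pd_value):
    """Return warnings for a pH (pD) reading, using the limits in the text."""
    warnings = []
    # Sialic acid glycosidic linkages tend to hydrolyze at pH 5 or lower.
    if pd_value <= 5.0:
        warnings.append("sialic acid linkages may hydrolyze")
    # Esters such as O-acetates are cleaved at pH < 5 and at pH > 8.
    if pd_value < 5.0 or pd_value > 8.0:
        warnings.append("O-acetates and other esters may be cleaved")
    # The working range for the sample solution is 6 to 8.
    if not 6.0 <= pd_value <= 8.0:
        warnings.append("adjust toward 6-8 with dilute DCl or NaOD")
    return warnings
```

A reading of 7 returns an empty list, whereas a reading of 4.5 triggers all three warnings.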


A few final remarks pertinent to sample preparation: Several companies market D2O of the quality (deuteration grade, free of paramagnetic impurities, and so on) required for this type of NMR analysis. It is best to purchase D2O (especially the 99.996% D-grade D2O) in small ampules (0.5-1.0 mL) rather than in large bottles (over 10 mL). In any case, the D2O should be handled in dry atmosphere so as to preserve its quality after the container is opened. Perform the exchanges in a glove box maintaining humidity at ~7 or 8%. Allow samples that have been stored in a freezer to warm to room temperature before dissolving them in D2O. Prerinse syringes and pipet tips in D2O just prior to use. Moisture in the air is the NMR spectroscopist's worst enemy. Lyophilization can be replaced by flash evaporation. One way or the other, try to prevent contact of the sample with the air.

Prior to ¹H-NMR spectroscopic analysis, the sample is redissolved in 0.4 or 0.5 mL of D2O (99.996% D) and transferred into a 5-mm NMR tube. The actual volume depends on the length of the RF receiver coil in the probe that the NMR spectrometer uses. The sample should be dissolved at least 12 h before the actual spectrum is recorded to ensure complete solvation. This time lapse significantly improves the quality of the NMR spectrum over that of a spectrum recorded immediately after dissolving the sample. When transferring the sample into the NMR tube, filter it (over cotton wool, prerinsed with highest quality D2O) to remove any insoluble particulates. Check the pD of the resulting solution, either before or after the NMR spectrum is run, by putting a droplet on pH paper. The pD of the solution should be between 6 and 8. It is not necessary to degas the solution and/or to seal the NMR tube for the types of NMR experiments described here.

Acetone is the most widely used internal standard for ¹H-NMR spectroscopy in D2O; its chemical shift is δ 2.225 ppm (referenced to DSS in D2O at 0 ppm).
Typically, the carefully cleaned NMR tube used for the analysis contains a trace amount of acetone. Of course, 0.5-1.0 µL acetone may be added from a syringe to the oligosaccharide solution as a standard before the NMR experiment. Sometimes, the sample may contain a small amount of free acetate. The acetate methyl protons will show up in the ¹H-NMR spectrum as a singlet at δ 1.908 ppm and may be used as an alternative internal standard for chemical shift calibration, unless the pD of the solution deviates significantly from 6-8. Occasionally, deuterated acetone (CD3COCD3) (~50 µL) is added to the solution for lock purposes (see Section 3.2.); the residual CD3COCD2H gives rise to a multiplet, centered around 2.167 ppm, and provides another means for the calibration of chemical shifts.

3.2. The NMR Spectrometer

¹H-NMR spectroscopy is performed on a pulse-FT NMR spectrometer, operating at a radio frequency in the range of 300-600 MHz for ¹H. For the purposes of this type of analysis, there are no major differences in performance of NMR spectrometers of different manufacturers (Bruker, General Electric, JEOL, Varian) of the same field strength. The spectra shown in this chapter were recorded in the author's laboratory on a Bruker AM-500 spectrometer (operating at 500 MHz for ¹H) interfaced with an Aspect-3000 computer. It is important to use a high-sensitivity 5-mm probe for recording the ¹H-NMR spectra of the oligosaccharide. The D signal of the solvent serves as a reference for the field-frequency lock.

The temperature of the sample in the probe can be selected between 20 and 30°C but must be kept constant during the NMR experiment. If constant room temperature is maintained in the environment where the NMR spectrometer is located, spectra can be acquired at ambient temperature without temperature control by the NMR spectrometer. However, when recording the spectrum of a weak sample over 4-6 h, the temperature should be controlled by the spectrometer. In that case, the temperature of the sample is typically controlled at a few degrees above room temperature (e.g., 27°C), because most commercially available variable temperature (VT) control units on NMR spectrometers cannot maintain constant sample temperature well if they are set the same as room temperature. The temperature fluctuations induced in the sample by heating and cooling of the VT unit are reflected in the resonance position of the D signal of D2O and the ¹H signal of HOD. Since the position of the HOD signal is by far the most temperature sensitive in the ¹H spectrum, the only sharp peak resulting from lock compensation is the solvent peak, while the remainder of the signals shift back and forth with temperature fluctuations, resulting in broad lines.
If the spectrometer has problems maintaining stable temperature in the temperature range in which you wish to conduct your NMR experiment, use another deuterated solvent for the lock, miscible with D2O, but with a temperature-insensitive lock frequency. Such a solvent is acetone-d6. As little as 50 µL is sufficient for lock in the presence of 450 µL D2O.

After carefully adjusting the shims, i.e., optimizing the magnetic field homogeneity over the sample, the scene is set for data collection. During the NMR experiment, the sample is spun at a constant rate of ~15-20 Hz. If the gain in resolution does not outweigh the occurrence of spinning side bands (especially around the strong residual solvent "HOD" signal), the NMR spectra are recorded without spinning the sample. In our experience, it is not necessary to spin a sample if the nonspinning shim gradients are carefully adjusted.

3.3. Recording the ¹H-NMR Spectrum

Standard acquisition parameters for routine 1D ¹H-NMR spectroscopic analysis of oligosaccharides in D2O are as follows. With the spectral width set to 10-12 ppm and a time domain of 16K or 32K data points, we get an acquisition time of 2-4 s/scan. The flip angle of the pulse used is 70-75°. An additional relaxation delay between consecutive scans is not necessary under these conditions. (Typical T1 values for ¹H in medium-size oligosaccharides are 0.1-0.5 s.) As examples, we present the ¹H-NMR spectra of two oligosaccharide samples isolated from rCD4 (Scheme 2). The actual values for all relevant acquisition parameters can be found in the legend to Fig. 1. Data collection is continued until the signal-to-noise (S/N) ratio in the anomeric-proton region of the spectrum is at least 3, but preferably 5 or better. Depending on the amount of carbohydrate material available for analysis, reaching this S/N value may require a few hundred up to several thousand scans. Thus, the total time to obtain the NMR spectrum is a few minutes (for ~1 µmol of carbohydrate) up to 6 h (for 10 nmol of carbohydrate).

Fig. 1. (A) 500-MHz ¹H-NMR spectrum (D2O, pD 6, 27°C) of the disialyl oligosaccharides (200 µg) released from Asn-271 of recombinant soluble human CD4 by N-glycanase (1). Acquisition parameters: number of data points, 32K (both in the time domain and in the frequency domain); spectral width, 5000 Hz; acquisition time (= total interpulse delay), 3.28 s; pulse width, 5.0 µs (~75° flip angle); number of scans, 8000. (B) Structural-reporter-group regions of spectrum (A), after resolution enhancement by Lorentzian-to-Gaussian transformation (LB -2.0, GB 0.18); final digital resolution, 0.3 Hz/pt. Fraction Q2 contains two compounds, namely, Q2 - F and Q2 + F, in the ratio of 4:1. The bold numbers above and below the spectrum refer to the corresponding glycosyl residues in the structures (see also Table 1 later in this chapter). Assignments above the signals in the spectrum refer to the major component in the mixture. The signals for the minor component in the mixture are indicated below the spectrum when they differ from those for the major one. The NAc CH3 singlets are shown at a different intensity from the other sections of the spectrum. Regions labeled in the spectrum: NAc CH3, HOD, H1 atoms, Man H2, Fuc CH3, NeuAc H3eq, and NeuAc H3ax.

Despite taking all the above precautions when preparing the sample for NMR analysis, the residual HOD signal will still appear as the dominant peak in all spectra but those of the most concentrated samples. The HOD signal is found at δ 4.75-4.80 ppm; its exact position varies with temperature, pH, and concentration of the solution and, therefore, should not be used for calibration of the chemical shift scale. If the residual HOD signal obscures any signals of interest, we have two ways to make signals in the immediate vicinity of the HOD peak visible. We can either modify the routine single-pulse NMR experiment (into a "water suppression" experiment) while maintaining sample temperature, or we can repeat the standard NMR experiment at higher temperature, since raising the sample temperature to 40 or 45°C is usually sufficient to observe the region around δ 4.7-4.9 undisturbed. However, solvent suppression is preferred over temperature elevation. Not only does the HOD signal shift when the sample temperature is changed, but some of the carbohydrate proton signals shift, too. Although the chemical shifts vary only slightly with temperature changes, such effects usually prevent unambiguous assignment of signals based on ambient temperature data. There is also the risk of degrading the sample (e.g., desialylation) at high temperature.

The simplest way to suppress the solvent signal is by fast pulsing. That is the major reason for not using an additional relaxation delay in data collection when the acquisition time is already on the order of a second (see earlier). Since carbohydrate protons have much shorter relaxation times than the HOD proton, the signals of the former will not be affected by "fast pulsing," and, therefore, the sensitivity of the method will not be degraded.
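The relation between the acquisition parameters can be checked with a few lines of Python. This is a minimal sketch under two stated assumptions: the data points are sampled sequentially, so that the dwell time is 1/(2 × spectral width), and S/N grows with the square root of the number of scans (standard for signal averaging). The function names are illustrative.

```python
import math


def acquisition_time_s(n_points, spectral_width_hz):
    """Acquisition time per scan, assuming sequential sampling
    with a dwell time of 1 / (2 * spectral width)."""
    return n_points / (2.0 * spectral_width_hz)


def total_time_h(n_scans, acq_time_s, relaxation_delay_s=0.0):
    """Total measuring time in hours for a given number of scans."""
    return n_scans * (acq_time_s + relaxation_delay_s) / 3600.0


def scans_for_target_sn(current_sn, current_scans, target_sn):
    """Scans needed to reach target_sn, given that S/N scales with
    the square root of the number of scans."""
    return math.ceil(current_scans * (target_sn / current_sn) ** 2)
```

With 32K points (32768) and a 5000-Hz spectral width, `acquisition_time_s` returns about 3.28 s, the value given in the legend to Fig. 1. Going from S/N 3 to S/N 6 requires four times as many scans.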
Alternatively, presaturation, a technique very popular with peptide and nucleotide NMR spectroscopists (10), can be used to suppress the residual HOD signal. Careful adjustment of the irradiation time and power level of the presaturation pulse, usually generated by the ¹H decoupler channel of the spectrometer, is required to obtain a spectrum that still holds information on signals close to the HOD peak.

A third water-suppression technique is a multiple-pulse sequence called a WEFT (water-eliminated Fourier transform) experiment. It utilizes the difference in relaxation times between the HOD and carbohydrate protons. This experiment is based on the "inversion-recovery" principle. Typically, a (180°-τ-90°-acquisition) sequence, in which the delay τ is empirically optimized, gives quite satisfactory results, especially if the 180° pulse is composite (90°x 180°y 90°x) (4). If, for sensitivity reasons, selective inversion of the HOD peak is desired, the 180° pulse in the preceding scheme may be replaced by a selective 180° pulse (usually a DANTE pulse train [11] or a shaped pulse). Both the nonselective and the selective WEFT experiment leave the regions immediately to the right and the left of the HOD signal unaffected.

3.4. Data Processing

The result of the data collection just described is called a free induction decay (FID), or an NMR spectrum in the time domain. To convert the FID into the NMR spectrum in the frequency domain, one routinely applies FT, followed by phase correction. The FID can be manipulated before FT, depending on the aspect of the resulting NMR spectrum to be emphasized. When multiplied by an exponential function, the result after FT is a sensitivity-enhanced spectrum at the cost of resolution (line-broadening); however, when a sinusoidal or Gaussian function is used for window multiplication (S/N ratio permitting), the result is a resolution-enhanced spectrum. Often, after Gaussian multiplication, the number of data points is increased before FT ("zero-filling") so as to ensure a sufficient number of data points to achieve a digital resolution of ~0.2-0.3 Hz/pt. Artificial resolution enhancement was applied to the spectra displayed in Figs. 1B and 2B; parameters are specified in the legends. The latter technique is most useful in the methyl proton region of the ¹H-NMR spectrum, where small differences in chemical shift between sharp methyl singlets (NAc signals) or doublets (Fuc C6 protons) are very significant for structural analysis.
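The window functions and zero-filling described here can be sketched in plain Python. The exponential window is standard. The Lorentzian-to-Gaussian form below is an assumption modeled on the common LB/GB parameterization (negative LB, GB as the fraction of the acquisition time at which the window peaks); the chapter does not state the exact function used by the author's software. All function names are illustrative.

```python
import math


def exponential_window(t, lb_hz):
    """Exponential multiplication; positive lb_hz broadens lines."""
    return math.exp(-math.pi * lb_hz * t)


def lorentz_to_gauss_window(t, lb_hz, gb, acq_time_s):
    """Assumed Lorentzian-to-Gaussian window: lb_hz must be negative,
    and the window reaches its maximum at t = gb * acq_time_s."""
    a = math.pi * lb_hz
    b = -a / (2.0 * gb * acq_time_s)
    return math.exp(-a * t - b * t * t)


def zero_fill(fid, n_total):
    """Pad the FID with zeros up to n_total points."""
    return list(fid) + [0.0] * max(0, n_total - len(fid))


def digital_resolution_hz(spectral_width_hz, n_real_points):
    """Digital resolution in Hz per point of the transformed spectrum."""
    return spectral_width_hz / n_real_points
```

With the Fig. 1 legend values (LB -2.0, GB 0.18, acquisition time 3.28 s), this window would peak near 0.59 s into the FID. A 5000-Hz spectral width spread over 16384 real points gives roughly 0.3 Hz/pt; the point count here is an assumption chosen to illustrate the order of magnitude.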
Sometimes, spectral integration is applied in the anomeric-proton region of the spectrum, mainly to verify the purity of the sample by determining the ratio in which two or more components in the sample occur.

3.5. Spectral Interpretation

The spectrum obtained represents an "identity card" of the carbohydrate. The ¹H-NMR spectrum of a given carbohydrate is unique; mere comparison of the ¹H-NMR spectra of compounds (even without detailed interpretation) can demonstrate the identity of compounds. Thus, a 1D ¹H-NMR study may suffice for primary structure determination if the oligosaccharide has been characterized previously. Several glycoprotein carbohydrate ¹H-NMR data bases are available for N-type glycopeptides and oligosaccharides in D2O, for example (5,12,13). When the spectrum does not match any of the spectra in existing data bases, attempts can be made to interpret the ¹H-NMR spectrum in terms of (partial elements of) the primary structure of the carbohydrate (including anomeric configurations and positions of glycosidic linkages) by using the well documented structural-reporter-group concept (5,6). As an example of this approach, the interpretation of the 500-MHz ¹H-NMR spectra of two oligosaccharide samples isolated from soluble human rCD4 (Figs. 1 and 2) is discussed. We basically ignore the crowded region in the center of the spectrum (between 3 and 4 ppm; Figs. 1A and 2A), and only the positions and patterns of those signals that are individually observable are examined. Particularly useful structural reporter groups in such 1D analyses are: 1. The anomeric (H1) protons; 2. The protons attached to the carbon atoms in the direct vicinity of a substitution position; 3. The protons attached to deoxy carbon atoms; and 4. Methyl protons, e.g., in N-acetyl groups. The structural-reporter-group regions of the spectra are depicted in Figs. 1B and 2B. The chemical shifts of the structural reporter groups are measured and compiled in a table. These values are then compared with literature data on similar N-type oligosaccharides and/or glycopeptides.
Fig. 2. (A) 500-MHz ¹H-NMR spectrum (D2O, pD 6, 27°C) of the hybrid-type oligosaccharides (~250 µg) released from Asn-300 of recombinant soluble human CD4 by endo-H (1). NMR acquisition parameters are the same as listed in the legend to Fig. 1A. (B) Expanded and resolution-enhanced structural-reporter-group regions of spectrum (A). The fraction contains two compounds, EH-2 and EH-1, in the ratio of 3:2. The bold numbers above and below the spectrum refer to the corresponding glycosyl residues in the structures (see also Table 3 later in this chapter). Assignments above the signals in the spectrum refer to the major component in the mixture. The signals for the minor component are indicated below the spectrum when they differ from those for the major one. The NAc CH3 singlets are shown at a different intensity from the other sections of the spectrum. Labeled in the spectrum: NAc CH3, HOD, and NeuAc H3ax; horizontal axis, δ (ppm); structures EH-1 (-B) and EH-2 (+B).

Figure 1 shows the ¹H-NMR spectrum of fraction Q2 released from rCD4 glycosylation site Asn-271 by N-glycanase and purified on the basis of its charge (Scheme 2). The NMR spectrum indicates that the sample contains a mixture of two di-antennary N-acetyllactosamine-type oligosaccharides ending in N,N′-diacetylchitobiose (compare Scheme 1). The di-antennary type of branching is evident from the set of chemical shifts of the Man H1 and H2 atoms (Table 1) (cf. 1-3,5). Tri-, tri'-, and tetra-antennary structures would reveal their degree of branching by virtue of, among other features, different sets of Man H1 and H2 chemical shifts (see Table 2). Both branches of the di-antennary oligosaccharides Q2 terminate in NeuAc attached in α(2→3)-linkage to Gal, as seen by the pair of NeuAc H3ax and H3eq signals (δ 1.80 and 2.76, respectively; Table 1). The precise position of the NeuAc H3ax signal reflects the branch location of the NeuAc residue. The position of the H3ax signal is different for NeuAc in the C3-linked (δ 1.796) vs the C6-linked branch (δ 1.799). These values, along with the positions of the Gal-6 and Gal-6' H1 doublets and the GlcNAc-5 and GlcNAc-5' NAc signals, are valuable for determining both the type of linkage of the NeuAc residue to Gal and the branch location of the residue (2,5). (It is worth noting that α(2→6)-linked NeuAc residues [although not found in the carbohydrate chains of glycoproteins expressed in CHO cells] have different characteristic chemical shifts for their H3ax and H3eq signals. They also exert different effects on the chemical shifts of the aforementioned structural reporter groups of residues in the sialylated branch; see ref. 14.)

The ¹H-NMR spectrum of fraction Q2 (Fig. 1) shows that the disialyl fraction has two components whose structures differ in the presence of the fucosyl group at C6 of GlcNAc-1.
The major (>80%) component, Q2 - F, is a di-antennary oligosaccharide without Fuc; the minor component in the mixture, Q2 + F, has additional Fuc α(1→6) attached to GlcNAc-1 in the core. The presence of such a Fuc residue manifests itself in the NMR spectrum of the oligosaccharide by (1) structural-reporter-group signals of the Fuc residue itself, and (2) chemical shifts induced on the reporter group signals of residues GlcNAc-1 and GlcNAc-2. Typical chemical shifts for Fuc H1 are δ 4.89 (for the α-anomer of the reducing oligosaccharide) and 4.90 (for the β-anomer), for H5 δ 4.095 (α) and 4.13 (β), and for the CH3 protons δ 1.21 (α) and 1.22 (β); each pair of signals occurs in the intensity ratio typical of reducing oligosaccharides ending in GlcNAc, α:β ≈ 2:1. All three of the structural-reporter-group signals of Fuc show relatively large anomerization effects (Δδα-β). Extending the chitobiose unit by Fuc α(1→6) at GlcNAc-1 affects the chemical shifts of H1 (Δδ 0.055 ppm) and NAc protons (Δδ 0.013 ppm) of GlcNAc-2, and of H1 in the α-anomer of GlcNAc-1 (Δδ -0.008 ppm) (see Table 1). The latter effect was used to determine the ratios of fucosyl and nonfucosyl compounds in mixture Q2 (Fig. 1) as a complementary aid to the intensity ratio of the NAc signals of GlcNAc-2 at δ 2.096/2.093 for the fucosyl and δ 2.082/2.081 for the nonfucosyl compound (Table 1).

The oligosaccharides released by endo-H from the Asn-300-containing tryptic glycopeptide of rCD4 (fraction Asn-300 endo-H, Scheme 2, further denoted as EH) were investigated by 500-MHz ¹H-NMR spectroscopy as a mixture. The NMR spectrum of fraction EH (Fig. 2) indicates the presence of high-mannose and/or hybrid-type oligosaccharides by virtue of the multiple signals in the anomeric region with shapes typical of Man H1. Unlike the oligosaccharides released by N-glycanase (see Section 2.), the EH oligosaccharides end in a reducing GlcNAc-2 residue. The oligosaccharides were judged to be of the hybrid type (see Scheme 1) because of the occurrence of NeuAc signals in the same spectrum. The signals of Gal H3 (δ 4.12), and of NeuAc H3ax and H3eq (at δ 1.80 and 2.76, respectively) indicate the linkage between NeuAc and Gal to be α(2→3). The spectrum in Fig. 2 was essentially a superimposition of the spectra of two previously identified hybrid-type oligosaccharides with structures EH-1 and EH-2. Those oligosaccharides differ in the presence of the residue Man-B (see Fig. 2 and Table 3). When released from rtPA, oligosaccharides EH-1 and EH-2 were separated by HPAE and were characterized in pure state by 500-MHz ¹H-NMR spectroscopy (2).
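Turning intensity ratios of reporter signals into the composition of a mixture is simple arithmetic, sketched below in Python. The function names are illustrative, and the integrals in the usage note are hypothetical numbers, not measured values from this chapter.

```python
def fractions_from_ratio(*parts):
    """Convert an intensity ratio such as 4:1 into mole fractions."""
    total = float(sum(parts))
    return [p / total for p in parts]


def mean_major_fraction(integral_pairs):
    """Average the major-component fraction over several reporter signals.

    integral_pairs holds (major_integral, minor_integral) tuples, one per
    reporter signal that is resolved for both components.
    """
    fractions = [major / (major + minor) for major, minor in integral_pairs]
    return sum(fractions) / len(fractions)
```

A 4:1 ratio corresponds to fractions of 0.8 and 0.2, and a 3:2 ratio to 0.6 and 0.4. Averaging over two independent reporter signals, for example hypothetical integral pairs (8.1, 1.9) and (7.9, 2.1), mirrors the use of the GlcNAc-1 H1 and GlcNAc-2 NAc signals as complementary estimates.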
The chemical shifts of the structural reporter groups of compounds EH-1 and EH-2 have been included in Table 3. (For reference purposes, we have compiled in Table 3 the chemical shifts of the structural reporter groups of high-mannose oligosaccharides with compositions Man5-9GlcNAc-2.) We deduced that EH-1 and EH-2 occur in the mixture EH from Asn-300 in rCD4 in a ratio of 2:3. We came to this conclusion because of the intensity ratio of the H1 signals of the α-anomer of the reducing GlcNAc-2 (at δ 5.252 and 5.248, respectively), and also because of the ratio of the H1 signals of

Table 1
¹H Chemical Shifts of Structural Reporter Groups of Constituent Monosaccharides for Di-antennary Oligosaccharides of the N-Acetyllactosamine Type Released by N-Glycanase

                                     Chemical shift,ᵇ ppm inᶜ
Reporter
group    Residueᵃ     Anomer   Q0+F    Q0-F    Q1+F    Q1-F    Q1′+F   Q1′-F   Q2+F    Q2-F

H-1      GlcNAc-1ᵈ    α        5.182   5.191   5.183   5.190   5.183   5.190   5.182   5.191
                      β        4.70    4.70    4.696   4.696   4.696   4.696   4.697   4.697
         GlcNAc-2     α        4.66    4.615   4.665   4.614   4.665   4.614   4.663   4.613
                      β        4.66    4.606   4.669   4.605   4.669   4.605   4.667   4.603
         Fucα(1→6)    α        4.889   -       4.889   -       4.89    -       4.893   -
                      β        4.895   -       4.896   -       4.90    -       4.900   -
         Man-3        α,β      4.77    4.77    4.77    4.77    4.77    4.77    4.77    4.77
         Man-4        α,β      5.121   5.121   5.119   5.119   5.119   5.119   5.118   5.118
         Man-4′       α,β      4.927   4.927   4.928   4.928   4.926   4.926   4.924   4.924
         GlcNAc-5     α,β      4.585   4.585   4.575   4.575   4.583   4.583   4.573   4.573
         GlcNAc-5′    α,β      4.585   4.585   4.583   4.583   4.575   4.575   4.573   4.573
         Gal-6        α,β      4.469   4.469   4.544   4.544   4.467   4.467   4.544   4.544
         Gal-6′       α,β      4.474   4.474   4.474   4.474   4.550   4.550   4.550   4.550
H-2      Man-3        α,β      4.248   4.248   4.247   4.247   4.247   4.247   4.246   4.246
         Man-4        α,β      4.190   4.190   4.191   4.191   4.191   4.191   4.190   4.190
         Man-4′       α,β      4.110   4.110   4.108   4.108   4.108   4.108   4.114   4.114
H-3      Gal-6        α,β      ndᵉ     ndᵉ     4.113   4.113   ndᵉ     ndᵉ     4.113   4.113
         Gal-6′       α,β      ndᵉ     ndᵉ     ndᵉ     ndᵉ     4.113   4.113   4.118   4.118
H-3ax    NeuAc        α,β      -       -       1.796   1.796   -       -       1.796   1.796
         NeuAc′       α,β      -       -       -       -       1.799   1.799   1.800   1.800
H-3eq    NeuAc        α,β      -       -       2.757   2.757   -       -       2.759   2.759
         NeuAc′       α,β      -       -       -       -       2.757   2.757   2.759   2.759
H-5      Fucα(1→6)    α        4.097   -       4.095   -       4.10    -       4.095   -
                      β        4.130   -       4.135   -       4.13    -       4.136   -
CH3      Fucα(1→6)    α        1.209   -       1.210   -       1.210   -       1.212   -
                      β        1.220   -       1.220   -       1.220   -       1.222   -
NAc      GlcNAc-1     α,β      2.039   2.039   2.039   2.039   2.039   2.039   2.039   2.039
         GlcNAc-2     α        2.095   2.082   2.096   2.082   2.096   2.082   2.096   2.082
                      β        2.091   2.082   2.093   2.080   2.093   2.080   2.093   2.081
         GlcNAc-5     α,β      2.051   2.051   2.048   2.048   2.051   2.051   2.048   2.048
         GlcNAc-5′    α,β      2.049   2.049   2.048   2.048   2.045   2.045   2.043   2.043
         NeuAc        α,β      -       -       2.031   2.031   -       -       2.032   2.032
         NeuAc′       α,β      -       -       -       -       2.031   2.031   2.032   2.032

ᵃThe numbering system used for denoting glycosyl residues in the di-antennary oligosaccharides is as follows:

    N′          6′        5′          4′
    NeuAcα(2→3)Galβ(1→4)GlcNAcβ(1→2)Manα(1→6)
                                              \              Fucα(1→6)
                                               \                      \
                                                Manβ(1→4)GlcNAcβ(1→4)GlcNAc
                                               /   3         2         1
    NeuAcα(2→3)Galβ(1→4)GlcNAcβ(1→2)Manα(1→3)
    N           6         5           4

ᵇData were acquired at 500 MHz for neutral solutions of the compounds in D2O at 27°C.
ᶜOligosaccharides were released from recombinant soluble human CD4 or from recombinant human tissue plasminogen activator (1,2); for complete structures, compare Scheme 1. Q0 denotes asialo, Q1 denotes monosialyl, and Q2 stands for disialyl oligosaccharide. Q1′ denotes a monosialyl di-antennary oligosaccharide having its sialic acid residue attached to Gal-6′. The F stands for an α(1→6)-fucosyl residue at GlcNAc-1. Structures are schematically illustrated in the table heading using a shorthand symbolic notation for GlcNAc, Gal, Man, NeuAc, and Fuc. The peripheral unit on the left corresponds to the glycosyl residues 5-6-N, the unit on the right to the 5′-6′-N′ glycosyl residues.
ᵈData for corresponding, reduced oligosaccharides are compiled in (14).
ᵉnd, Not determined.

GlcNAc- 1

H-l

Man-3 Man-4 Man-4 GlcNAc-5 GlcNAc-5’ GlcNAc-7 GlcNAc-7’ Gal-6 Gal-6’ Gal-8

GlcNAc-2

Fucoc(1+6)

Residuea

Reporter group

if %P a$ a-3 a$ %P 0 %P %P @P a$

if

a. P

Anomer of ohgosacchande

4 663 668 4 760 5 114 4 910 4 562 4 573 4 542 4 542 4 549 4 546

4 893 899

5 181 4 690

Q3+F

5 190 4.690 4.615 4 606 4 760 5.114 4 910 4 562 4.573 4 542 4 542 4 549 4 546

Q3-F

5 181 4 690 4 893 4 899 4 663 4 668 4 760 5 123 4871 4.572 4 590 4 562 4 546 4 546 -

Q3’+F

5 190 4 690 4 614 4 605 4 760 5 123 4 871 4 572 4 590 4 562 4.546 4 546 -

Q3’-F

5 182 4.688 4.902 4 910 4 662 4 667 4 76 5 131 4 858 4 563 4 594 4 542 4 562 4 542 4.547 4 547

Q4+F

5.190 4 688 4 615 4.606 4.76 5 131 4.858 4 563 4.594 4 542 4 562 4.542 4 547 4 547

Q4-F

Chemical shlft,b ppm mc l)+F

5 181 4 688 4900 4 907 4 660 4 660 4 76 5 129 4 856 4 563 4.595 4 540 4 562 4 546 4 450 4 546

Q(4+

Table 2
¹H Chemical Shifts of Structural Reporter Groups of Constituent Monosaccharides for Tri-antennary and Tetra-antennary Oligosaccharides of the N-Acetyllactosamine Type Released by N-Glycanase

5 181 4.689 4 897 4.906 4 659 4 659 4 76 5 128 4.855 4564 4.595 4.540 4.556 4 547 4.452 4 543

4(4+2)-F

1813 1.813 1813 -

4095 4 136

1.211 1221

NeuAc NeuAc’ NeuAc* NeuAc*’

NeuAc Neu Ad NeuAc* NeuAc*’

Gal-6 Gal-8’

Fuca( 1+6)

Fucct( 1+6)

H-3ax

H-3eq

H-4

H-S

CH3

ndf ndf

2.756 2 756 2 756 -

4 122 4 122 4 122 -

Gal-6 Gal-6 Gal-8 Gal-8’ GalP4add Galf14add

H-3

nd ndf

2.756 2 756 2.756 -

1.813 1.813 1.813 -

4.122 4.122 4 122 -

4 214 4.214 4 107

4 214 4 214 4 107

Man-3 Man-4 Man-4

-

-

H-2

Gal-8’ GlcNAcP3 GlcNA@ Galp4add Galp4add

1.211 1221

4.095 4 136

ndf nd

2 756 2.756 2.756

1.813 1.813 1813

4 122 4.122 4 122 -

4 253 4.196 4091

4.562 -

nd ndf -

2.756 2 756 2.756

1.813 1.813 1 813

4.122 4.122 4.122 -

4 253 4.196 4091

4.562 -

4.095 4 135 1.211 1221

ndf ndf

2 756 2.756 2 756 2.756

1.805 1 805 1.805 1.805

4 120 4.120 4 120 4.120 -

4209 4.224 4092

4 562 -

nB nd

2 756 2 756 2 756 2.756

1.805 1.805 1.805 1.805

4.120 4 120 4.120 4 120 -

4.209 4 224 4.092

4 562 -

1210 1220

4095 4.135

4 162 ndf

2.756 2.756 2.756 2 756

1.803 1 803 1.803 1.803

4 117 ndf 4 117 4 117 4.117 -

4.210 4.223 4.090

4.562 4 696 4 556 -

1 210 1.220

4095 4.135

4 162 4.162

1.803 1.803 1.803 1.803 2 757 2.757 2 757 2.757

4.212 4.224 4.090 4.117 ndf 4.117 rid 4.117 4 117

4.467 4 697 4.697 4.556 4.556

% 0 (2” 8 kl 0’ i% Be : e g s R R’ P

Ei

*

Reporter group NAc

GlcNAc-5 GlcNAc-5’ GlcNAc-7 GlcNAc-7’ GlcNAcP3 GlcNAcP3’ NeuAc NeuAc’ NeuAc* NeuAc*’

GlcNAc- 1 GlcNAc-2

Residuea

Anomer of ohgosacchande 2 039 2097 2095 2044 2044 2.074 2031 2031 2031

Q3+F

2 039 2.083 2081 2044 2.044 2 074 2031 2031 2031

Q3-F

(continued)

2 039 2095 2091 2 052 2 039 2 039 2031 2031 2031

Q3’+F

Table 2

2 039 2 082 2081 2 052 2 039 2 039 2031 2031 2031

Q3’-F

2 039 2.095 2091 2.048 2 039 2 075 2 039 2 030 2 030 2 030 2 030

Q4+F

2.039 2.082 2 080 2048 2 039 2 075 2 039 2 030 2.030 2 030 2 030

Q4-F

Chemical shlft,b ppm in’

2.038d 2094 2090 2047 2 037d 2 075 2 035d 2 036d 2 030 2 030 2 030 2 030

2.038p 2091 2.088 2047 2 036e 2 075 2 036e 2.036’ 2 035’ 2.030 2 030 2.030 2 030

d

Q(4 + l)+F Q(4 + 2)-F

ᵃThe numbering system used for denoting glycosyl residues in the tri- and tetra-antennary oligosaccharides is as follows:

    N*′         8′        7′
    NeuAcα(2→3)Galβ(1→4)GlcNAcβ(1→6)
                                     \
    N′          6′        5′          4′
    NeuAcα(2→3)Galβ(1→4)GlcNAcβ(1→2)Manα(1→6)
                                              \              Fucα(1→6)
                                               \                      \
                                                Manβ(1→4)GlcNAcβ(1→4)GlcNAc
                                               /   3         2         1
    NeuAcα(2→3)Galβ(1→4)GlcNAcβ(1→2)Manα(1→3)
    N           6         5           4
                                     /
    NeuAcα(2→3)Galβ(1→4)GlcNAcβ(1→4)
    N*          8         7

The residues in the additional N-acetyllactosamine units in compounds Q(4 + 1) + F and Q(4 + 2) + F are denoted GlcNAcβ3 and Galβ4add (see footnote c).
ᵇData were acquired at 500 MHz for neutral solutions of the compounds in D2O at 22–27°C.
ᶜOligosaccharides were released from recombinant human tissue plasminogen activator (2) or from recombinant human erythropoietin (Watson, Blithe, and Van Halbeek, in preparation); for complete structures, compare Scheme 1. Q3 denotes trisialyl tri-antennary, Q3′ denotes trisialyl tri′-antennary, and Q4 stands for tetrasialyl tetra-antennary oligosaccharide. Q(4 + 1) denotes a tetrasialyl tetra-antennary oligosaccharide having an additional (sialylated) N-acetyllactosamine unit β(1→3)-attached to Gal-6. Q(4 + 2) denotes a tetrasialyl tetra-antennary oligosaccharide having two additional (sialylated) N-acetyllactosamine units β(1→3)-attached to Gal-6 and Gal-8, respectively. The F stands for an α(1→6)-fucosyl residue at GlcNAc-1. Structures are schematically illustrated in the table heading using a shorthand symbolic notation for GlcNAc, Gal, Man, NeuAc, and Fuc. The peripheral units, from left to right, correspond to the glycosyl residues 5-6-N, 7-8-N*, 5′-6′-N′, and 7′-8′-N*′, respectively.
ᵈ,ᵉAssignments may have to be interchanged.
ᶠnd, Not determined.

H-2

GlcNAc- 1

H-l

Man-4 Man-C

Man-D, Man-B Man-D, Man-E Man-D, Glc-NAc-5 Gal-6 Man-3

Man-3 Man-4 Man-C Man-D, Man-4 Man-A

GlcNAc-2

Residue”

Reporter group

; a$ a$

;1 a# a.P a$ 4 a# 0 a3

; a# a,P a,P 0 4

a P

Anomer of oligosacchande

4244 4 232 4118 4069

245 72 77 108 4 874 5 083 5 108 4911 -

4 255 4244 4 069 -

5 4 4 5

5 245 4 72 4 77 5 352 5 054 4 874 5 083 5 108 4911 -

-

4.230 ndd ndd

5 189 4 698 4 597 4.765 5340 5.046 4 870 5 093 4 909 -

HM (6 + 2)

(::)

ml

(5 + 1)

4 597 4.765 5 340 5.301 5046 4 870 5 093 4 909 4 230 ndd ndd

5 189 4 698

(?2)

5 142 5046 4 230 ndd ndd

5 189 4 698 4 597 4 765 5 340 5 301 5.046 4 870 5 093

(:?2)

4.166 4 158 4 089 4 117

5 249 4.720 4 782 5.347 5304 5 050 4 874 5 085 5 115 5 147 5042 4 932 -

-

(9”+“1)

Chenucal shift? ppm mc

5 189 4.698 4 597 4 765 5.340 5 301 5046 4 870 5404 5 046 5 142 5.046 4 230 ndd ndd

(!?--)

5.192 470 4.602 4 77 5.338 5 301 5048 4 873 5 095 5 141 5.048 5 141 5042 4.261 4 106 4 092

(1E2)

5 252 472 4 77 5 124 4 897 5 094 5 124 4 575 4544 4.256 4 239 4 197 -

-

EH- 1”B)

Table 3
¹H Chemical Shifts of Structural Reporter Groups of Constituent Monosaccharides for Oligosaccharides of the High-Mannose and Hybrid Types Released by Endo-H or by N-Glycanase

-

4 4 4 4 4

545 576 256 239 202 -

-

248 72 77 121 4 876 5 079 5 105 4911 -

5 4 4 5

EH - 2(+W

ii s 22 cw

8

a$ a$ a$ a$ a$ a$ a$ a$ a$ 43 a.P a$ a$

a$ a$

4144 4.069 3.98 2043 -

-

4144 4069 3.99 -

2.043 -

4 143 ndd ndd 2.039 2064 -

ndd 4 143 ndd ndd 2 039 2064 -

ndd 4.143 ndd ndd ndd 2.039 2064 -

4 074 4 150 4.053 4.027 4.074 3991 2 045 -

ndd 4.143 ndd ndd ndd ndd 2.039 2064 -

-

2 038 2.065 -

-

4 067 4 149 4067 4018 4 067 4.018 4.067 -

-

4 116 I .797 2 755 2.045 2 050 2031

-

4.127 4049 -

4.146 4049 3.98 4.114 I 797 2 757 2045 2 050 2031

ᵃThe numbering system used for denoting glycosyl residues in the high-mannose oligosaccharides is as follows: Manα(1→2)Manα(1→6) B Manα(1→6) D3 Manα(1→2)Manα(1→3) 4′ > A Manβ(1→4)GlcNAcβ(1→4)GlcNAc D2 3 2 1 Manα(1→2)Manα(1→6) > E Manα(1→3) D4 Manα(1→2)Manα(1→2) > 4 C Q; and in the hybrid oligosaccharides: B Manα(1→6) 4′ A Manα(1→6) Manα(1→3) Manβ(1→4)GlcNAcβ(1→4)GlcNAc NeuAcα(2→3)Galβ(1→4)GlcNAcβ(1→2)Manα(1→3) > 3 2 1 4 N 6 5.
ᵇData were acquired at 500 MHz for neutral solutions of the compounds in D2O at 27°C.
ᶜOligosaccharides were released from recombinant soluble human tissue plasminogen activator (2), from recombinant hepatitis B surface antigen preS2 + S (3), and/or from allergen Art v II (4); for complete structures, compare Scheme 1. HM(5 + 1) denotes high-mannose Man5GlcNAc1, HM(6 + 2) denotes Man6GlcNAc2, and so on; EH-1 stands for endo-H released hybrid oligosaccharide-1. Structures are schematically illustrated in the table heading using a shorthand symbolic notation for GlcNAc, Gal, Man, and NeuAc. The peripheral unit on the left corresponds to the glycosyl residues C-D1, and the unit on the right to the B-D3 glycosyl residues.
ᵈnd, Not determined.

H-3 H-fax H-3eq NAc

Man-D, Man-4 Man-A Man-D, Man-B Man-D, Man-E Man-D, Gal-6 NeuAc NeuAc GlcNAc- 1 GlcNAc-2 GlcNAc-5 NeuAc

Man-A in the α-anomer of the respective oligosaccharides (at δ 5.094 and 5.079) (see Fig. 2B). The structures of the oligosaccharides released from Asn-271 and Asn-300 of rCD4, and their relative abundances, have been published (1). A pictorial representation of the site heterogeneity of the carbohydrate structures of recombinant soluble CD4 expressed in CHO cells is given in Fig. 3.

Fig. 3. Histogram showing the glycosylation site heterogeneity of recombinant soluble human CD4 expressed in CHO cells (1), with separate bars for Asn-271 and Asn-300. The symbolic notation for GlcNAc, Gal, Man, Fuc, and NeuAc is explained in Table 1. A small portion (9%) of the structures EH-1 and EH-2 is attached to Asn-300 via C6-fucosylated GlcNAc; the remainder (91%) of the structures is linked to Asn-300 through GlcNAc devoid of fucose. The N-acetyllactosamine-type structures occur in the glycoprotein as shown.

4. Notes

1. Advantages and disadvantages of NMR. NMR spectroscopy is a powerful method for primary structural characterization of glycoprotein carbohydrates, but, standing alone, the method has its limitations. Therefore, NMR should be the first, but never the only, step in the structural analysis procedure. Partial or even complete primary structure determination is possible from the 1D ¹H-NMR spectrum, provided that structurally related compounds have been previously characterized by ¹H-NMR spectroscopy. It is recommended that the glycosyl-residue composition be obtained independently by chemical analysis and the mol wt be verified by FAB mass spectrometry. The most important advantage of NMR spectroscopy over other techniques used for structural analysis of carbohydrates is its nondestructive nature. The oligosaccharide/glycopeptide sample, after NMR analysis, can be recovered 100% unimpaired and used for other analyses, biological activity tests, and so forth. Also, mixtures of structurally closely related components can be analyzed successfully. The most important limitation of NMR spectroscopy is its sensitivity. Not only are at least 10–15 nmol of pure carbohydrate required to record an NMR spectrum, even at 600 MHz, but heterogeneity occurring in low abundance in the sample may also escape attention. For example, the occurrence of NeuGc eluded NMR analysis (4% of total sialic acid in the samples discussed in Figs. 1 and 2, as determined by sialic acid analysis, see Fig. 4; cf. [15]).

¹H-NMR spectroscopy may also fail to detect the presence of nonmagnetically active nuclei in the carbohydrate: Although it is relatively straightforward to detect the presence of a phosphate (16) or O-acetate (17) group in an oligosaccharide by NMR, sulfate may escape detection (see, however, ref. 18). With regard to size limitations, N-type oligosaccharides as large as a pentadeca- to eicosasaccharide (i.e., oligosaccharides with 15–20 constituent glycosyl residues) have been successfully characterized by 500-MHz ¹H-NMR spectroscopy (see, e.g., refs. 2 and 3). Degrees of branching as great as six (so-called "intersected penta-antennary" structures) have been identified by NMR (19,20). However, with increasing size, it may be possible to define structural elements that extend the core and backbone of the common structures (Scheme 1), but it is not always possible to delineate unambiguously their branch location by NMR alone. A classical example of the latter situation is the so-called poly-N-acetyllactosamine type structures, i.e., extensions of the basic di-, tri-, or tetra-antennary oligosaccharides (Scheme 1) with a number of N-acetyllactosamine units (in series and/or parallel) attached β(1→3) and/or β(1→6) to Gal residues (21) (compare Table 2). Also, blood group and other antigenic determinants in the peripheral regions of N-type oligosaccharides cannot always be located in an exact branch, depending on the complexity of substitution (22).

Fig. 4. Determination, by high-pH anion-exchange chromatography with pulsed amperometric detection (PAD), of sialic acids in recombinant soluble human CD4 expressed in CHO cells after mild acid hydrolysis (0.1M TFA, 80°C, 1 h; then Dionex AS6). The chromatogram is plotted against time (min). The glycoprotein was found to contain NeuAc (R = Ac [CO-CH3]) and NeuGc (R = Gc [CO-CH2OH]) in the ratio of 96:4.
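The 96:4 ratio quoted for Fig. 4, like the 2:3 ratio of EH-1 to EH-2 deduced from H1 signal intensities, comes down to normalizing peak areas. The following is a minimal sketch of that arithmetic, assuming each signal arises from the same number of nuclei per molecule (or, for PAD, that the detector responds equally to both species). The function name and the input numbers are illustrative and are not taken from the original analysis.

```python
def relative_abundances(intensities):
    """Convert integrated signal intensities into mole fractions.

    Assumes every intensity corresponds to the same number of nuclei
    (or the same detector response) per molecule.
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {name: value / total for name, value in intensities.items()}


# Illustrative inputs chosen to reproduce the ratios quoted in the text.
sialic_acids = relative_abundances({"NeuAc": 96.0, "NeuGc": 4.0})
eh_mixture = relative_abundances({"EH-1": 2.0, "EH-2": 3.0})

for label, fractions in (("sialic acids", sialic_acids), ("EH mixture", eh_mixture)):
    summary = ", ".join(f"{name} {fraction:.0%}" for name, fraction in fractions.items())
    print(f"{label}: {summary}")
```

A component at the 4% level contributes correspondingly little signal, which is consistent with the remark in Note 1 that low-abundance heterogeneity may escape attention in the NMR spectrum.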

2. Automation of the method. With a dedicated microcomputer at the heart of the NMR spectrometer, the recording of spectra is easily automated. However, sample preparation will remain the responsibility of a researcher. The most time-consuming part of the structural analysis of carbohydrates by NMR has, until now, been spectral interpretation. It is here that efforts along two different avenues are underway to automate the method. The use of a search algorithm to compare a list of chemical shifts of structural reporter groups with all those in a data base appears to be rather straightforward. Indeed, several such computer programs have been written to assist in the interpretation of glycoprotein oligosaccharide ¹H-NMR spectra (7,23). A much more elegant and potentially faster way is to use the entire spectrum for pattern recognition, including the 3–4 ppm envelope region. The NMR spectrum, already available in digital format, would not be reduced to a list of chemical shifts, as is done for human interpretation. In our laboratories at the CCRC, artificial neural networks have recently been applied successfully to automated spectral interpretation, including NMR spectra (8). In the foreseeable future, (NMR) spectral data bases will be connected to the complex carbohydrate structure data base (CCSD) (24). The neural network search algorithms will be made available to the scientific community much like CarbBank.

3. De novo structural elucidation of carbohydrates by NMR spectroscopy. When the 1D ¹H spectrum does not resemble that of a known oligosaccharide structure, a combination of multiple-pulse ¹H-NMR spectroscopic techniques (chiefly TOCSY and ROESY) may be applied for the de novo sequencing of the carbohydrate, provided that 1–3 µmol of pure substance are available for the analysis. The TOCSY technique permits subspectral editing of the ¹H spectrum for each constituting monosaccharide and, consequently, the virtually complete assignment of all the multiplet patterns in the ¹H-NMR spectrum. Subsequently, from the ROESY spectrum, we can deduce the sequence of the monosaccharide residues, including identification of the positions and configurations of glycosidic linkages. A discussion of more sophisticated NMR techniques is beyond the scope of this chapter. However, the interested reader is referred to recent monographs (25,26) and review articles (27–30). As mentioned earlier, for de novo sequencing of the carbohydrates by experiments such as 1D and 2D TOCSY and ROESY, typically 100 times the amount of sample mentioned for the 1D analysis is needed (e.g., 1 µmol at 500 MHz).

4. Solution conformation analysis by NMR spectroscopy. ¹H-NMR is presented here as a method eminently suited for the elucidation of the primary structure of glycoprotein carbohydrates. It is also the method of choice for solution conformation analysis. Complete ¹H resonance assignments and primary structure determination are a prerequisite for the analysis of the solution conformation based on quantitation of (¹H,¹H) NOEs. Often assisted by other NMR parameters (¹³C chemical shifts, heteronuclear coupling constants and NOE effects, isotope shift effects, and so on [31]) and always evaluated by theoretical conformational analysis, i.e., potential energy calculations of one sort or another (HSEA, AMBER, MM2, Monte Carlo, molecular dynamics, and so on) (32,33), 2D and 3D ¹H-NMR spectroscopy is the key experimental technique for solution conformation analysis of carbohydrates and glycoconjugates. Ultimately, the knowledge of primary structures, 3D conformations, and the dynamics/flexibility of glycoprotein oligosaccharides in the natural environment of their glycoprotein macromolecule will broaden our insights into their functioning as mediators of numerous biological cell-cell and cell-molecule interactions. It is the author's conviction that NMR spectroscopy is the most valuable contributor toward this understanding.
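The database search described in Note 2 (comparing a list of reporter-group chemical shifts with stored entries) can be sketched as follows. This is an illustration only, not the algorithm of the programs cited in refs. 7 and 23. The reference values are the H-1 shifts for the α-anomers as read from Table 1. The 0.005-ppm tolerance, the scoring rule, the function names, and the observed peak list are all assumptions made for the example.

```python
# H-1 reporter-group shifts (ppm, alpha-anomer values) as read from Table 1.
REFERENCE_SHIFTS = {
    "Q0+F": [5.182, 4.66, 4.889, 4.77, 5.121, 4.927, 4.585, 4.585, 4.469, 4.474],
    "Q0-F": [5.191, 4.615, 4.77, 5.121, 4.927, 4.585, 4.585, 4.469, 4.474],
    "Q2+F": [5.182, 4.663, 4.893, 4.77, 5.118, 4.924, 4.573, 4.573, 4.544, 4.550],
    "Q2-F": [5.191, 4.613, 4.77, 5.118, 4.924, 4.573, 4.573, 4.544, 4.550],
}


def match_score(observed, reference, tolerance=0.005):
    """Fraction of reference shifts that have an observed peak within `tolerance` ppm.

    The tolerance is an assumed value, not one taken from the chapter.
    """
    hits = sum(
        1
        for ref in reference
        if any(abs(ref - obs) <= tolerance for obs in observed)
    )
    return hits / len(reference)


def rank_candidates(observed, database=REFERENCE_SHIFTS, tolerance=0.005):
    """Return (compound, score) pairs sorted from best to worst match."""
    scores = [
        (name, match_score(observed, reference, tolerance))
        for name, reference in database.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical anomeric-region peak list for an unknown sample.
observed_peaks = [5.190, 4.612, 4.771, 5.119, 4.925, 4.574, 4.545, 4.551]

for name, score in rank_candidates(observed_peaks):
    print(f"{name}: {score:.2f}")
```

A list-based score of this kind ignores signal intensities and the 3–4 ppm envelope, which is the limitation that motivates the whole-spectrum pattern-recognition approach mentioned in the same note.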

Acknowledgments

Research in the author's lab is supported by National Institutes of Health Grants P41-RR-05351, P01-AI-27135, and R01-HL-38213. The author is indebted to Rosemary Nuri for editing the manuscript.

Abbreviations

CHO, Chinese hamster ovary; 1D, one-dimensional; 2D, two-dimensional, and so on; CCSD, complex carbohydrate structure data base; DSS, sodium 4,4-dimethyl-4-silapentane-1-sulfonate; FAB, fast-atom bombardment; FID, free induction decay; FT, Fourier transform(ation); ¹H (H), hydrogen; D, deuterium; T, tritium; HBV, hepatitis B virus; NOE, nuclear Overhauser effect; rCD4, recombinant cluster differentiation antigen; RF, radio frequency; ROESY, rotating-frame NOE-correlated spectroscopy; rtPA, recombinant human tissue plasminogen activator; S/N, signal-to-noise ratio; TOCSY, total correlation spectroscopy; WEFT, water-eliminated FT.

References

1. Spellman, M. W., Leonard, C. K., Basa, L. J., Gelineo, I., and Van Halbeek, H. (1991) Carbohydrate structures of recombinant soluble human CD4 expressed in Chinese hamster ovary cells. Biochemistry 30, 2395–2406.
2. Spellman, M. W., Basa, L. J., Leonard, C. K., Chakel, J., O'Connor, J. V., Wilson, S., and Van Halbeek, H. (1989) Carbohydrate structures of human tissue plasminogen activator expressed in Chinese hamster ovary cells. J. Biol. Chem. 264, 14,100–14,111.
3. Yu Ip, C. C., Miller, W. J., Kubek, D. J., Strang, A.-M., Van Halbeek, H., Piesecki, S. J., and Alhadeff, J. A. (1992) Structural characterization of the N-glycans of a recombinant hepatitis B surface antigen derived from yeast. Biochemistry 31, 285–295.
4. Nilsen, B. M., Sletten, K., Smestad Paulsen, B., O'Neill, M., and Van Halbeek, H. (1991) Structural analysis of the glycoprotein allergen Art v II from the pollen of mugwort (Artemisia vulgaris L.). J. Biol. Chem. 266, 2660–2668.
5. Vliegenthart, J. F. G., Dorland, L., and Van Halbeek, H. (1983) High-resolution ¹H-nuclear magnetic resonance spectroscopy as a tool in the structural analysis of carbohydrates related to glycoproteins. Adv. Carbohydr. Chem. Biochem. 41, 209–374.
6. Van Halbeek, H. (1984) Structural analysis of the carbohydrate chains of mucin-type glycoproteins by high-resolution ¹H-NMR spectroscopy. Biochem. Soc. Trans. 12, 601–605.
7. Hounsell, E. F. and Wright, D. J. (1990) Computer-assisted interpretation of ¹H-NMR spectra in the analysis of the structure of oligosaccharides. Carbohydr. Res. 205, 19–29.
8. Meyer, B., Hansen, T., Nute, D., Albersheim, P., Darvill, A. G., York, W. S., and Sellers, J. (1991) Identification of the ¹H-NMR spectra of complex oligosaccharides with artificial neural networks. Science 251, 542–544.
9. Oppenheimer, N. J. (1989) Basic techniques. Sample preparation. Methods Enzymol. 176, 78–92.
10. Hore, P. J. (1989) Basic techniques. Solvent suppression. Methods Enzymol. 176, 64–77.
11. Haasnoot, C. A. G. (1983) Selective solvent suppression in ¹H FT-NMR using a DANTE pulse; its application in normal and NOE measurements. J. Magn. Reson. 52, 153–158.
12. Carver, J. P. and Grey, A. A. (1981) Determination of glycopeptide primary structure by 360-MHz proton magnetic resonance spectroscopy. Biochemistry 20, 6607–6616.
13. Brockhausen, I., Grey, A. A., Pang, H., Schachter, H., and Carver, J. P. (1988) N-acetylglucosaminyltransferase substrates prepared from glycoproteins by hydrazinolysis of the GlcNAc-Asn linkage. Purification and structural determination of oligosaccharides with mannose and N-acetylglucosamine at the nonreducing termini. Glycoconjugate J. 5, 419–448.
14. Green, E. D., Adelt, G., Baenziger, J. U., Wilson, S., and Van Halbeek, H. (1988) The asparagine-linked oligosaccharides of bovine fetuin: Structural analysis of N-glycanase-released oligosaccharides by 500-MHz ¹H-NMR spectroscopy. J. Biol. Chem. 263, 18,253–18,268.
15. Hokke, C. H., Bergwerff, A. A., Van Dedem, G. W. K., Van Oostrum, J., Kamerling, J. P., and Vliegenthart, J. F. G. (1990) Sialylated carbohydrate chains of recombinant glycoproteins expressed in Chinese hamster ovary cells contain traces of N-glycolylneuraminic acid. FEBS Lett. 275, 9–14.
16. Couso, R. O., Van Halbeek, H., Reinhold, V. N., and Kornfeld, S. (1987) The high-mannose oligosaccharides of Dictyostelium discoideum glycoproteins contain a novel intersecting N-acetylglucosamine residue. J. Biol. Chem. 262, 4521–4527.
17. Damm, J. B. L., Voshol, H., Hård, K., Kamerling, J. P., and Vliegenthart, J. F. G. (1989) Analysis of N-acetyl-4-O-acetylneuraminic acid-containing N-linked carbohydrate chains released by N-glycanase. Application to the structure determination of the carbohydrate chains of equine fibrinogen. Eur. J. Biochem. 180, 101–110.
18. De Waard, P., Koorevaar, A., Kamerling, J. P., and Vliegenthart, J. F. G. (1991) Structure determination by ¹H-NMR spectroscopy of (sulfated) sialylated N-linked carbohydrate chains released from porcine thyroglobulin by N-glycanase. J. Biol. Chem. 266, 4237–4243.
19. Paz Parente, J., Wieruszeski, J. M., Strecker, G., Montreuil, J., Fournet, B., Van Halbeek, H., Dorland, L., and Vliegenthart, J. F. G. (1982) A novel type of carbohydrate structure present in hen ovomucoid. J. Biol. Chem. 257, 13,173–13,176.
20. Paz Parente, J., Strecker, G., Leroy, Y., Montreuil, J., Fournet, B., Van Halbeek, H., Dorland, L., and Vliegenthart, J. F. G. (1983) Primary structure of a novel N-glycosidic carbohydrate unit derived from hen ovomucoid, a 500-MHz ¹H-NMR study. FEBS Lett. 152, 145–152.
21. Fukuda, M., Bothner, B., Ramsamooj, P., Dell, A., Tiller, P. R., Varki, A., and Klock, J. C. (1985) Structures of sialylated fucosyl polylactosaminoglycans isolated from chronic myelogenous leukemia cells. J. Biol. Chem. 260, 12,957–12,967.
22. Finne, J., Breimer, M. E., Hansson, G. C., Karlsson, K. A., Leffler, H., Vliegenthart, J. F. G., and Van Halbeek, H. (1989) Novel polyfucosylated N-linked glycopeptides with blood group A, H, X, and Y determinants from human small-intestinal epithelial cells. J. Biol. Chem. 264, 5720–5735.
23. Bot, D. S. M., Cleij, P., Van 't Klooster, H. A., Van Halbeek, H., Veldink, G. A., and Vliegenthart, J. F. G. (1988) Identification and substructure analysis of oligosaccharide chains derived from glycoproteins by computer retrieval of high-resolution ¹H-NMR spectra. J. Chemometrics 2, 11–27.
24. Doubet, R. S., Bock, K., Smith, D. M., Darvill, A. G., and Albersheim, P. (1989) The complex carbohydrate structure database. Trends Biochem. Sci. 14, 475–477.
25. Derome, A. E. (1987) Modern NMR Techniques for Chemistry Research. Pergamon, Oxford.
26. Sanders, J. K. M. and Hunter, B. K. (1987) Modern NMR Spectroscopy: A Guide for Chemists. Oxford University Press, Oxford.
27. Bush, C. A. (1988) High-resolution NMR in the determination of structure in complex carbohydrates. Bull. Magn. Reson. 10, 73–95.
28. Dabrowski, J. (1989) Analytical methods: Two-dimensional proton magnetic resonance spectroscopy. Methods Enzymol. 179, 122–156.
29. Van Halbeek, H. (1990) NMR of complex carbohydrates, in Frontiers of NMR in Molecular Biology, UCLA Symposia Series vol. 109 (Live, D., Armitage, I. M., and Patel, D., eds.), Liss, New York, pp. 195–213.
30. Van Halbeek, H. and Poppe, L. (1992) Structure elucidation of oligosaccharides by NMR spectroscopy. Adv. Carbohydr. Chem. Biochem. (in preparation).
31. Poppe, L., Stuike-Prill, R., Meyer, B., and Van Halbeek, H. (1992) The solution conformation of sialyl-α(2→6)-lactose studied by modern NMR techniques and Monte Carlo simulations. J. Biomol. NMR 2, 109–136.
32. Homans, S. W. (1990) Oligosaccharide conformations: Application of NMR and energy calculations. Progr. NMR Spectrosc. 22, 55–81.
33. Meyer, B. (1990) Conformational aspects of oligosaccharides. Top. Curr. Chem. 154, 141–208.

CHAPTER 6

The Application of Nuclear Magnetic Resonance to Structural Studies of Polysaccharides

Christopher Jones and Barbara Mulloy

1. Introduction

1.1. Polysaccharides: Occurrence and Importance

Polysaccharides are ubiquitous components of living tissues. They are storage compounds in both animals and plants, and form important structural elements in, for example, plant cell walls, insect exoskeletons, and animal connective tissues. In bacteria, they are important both as structural elements in the cell wall (the teichoic and teichuronic acids) and as surface antigens, such as the O-antigenic oligo- or polysaccharide chain of the lipopolysaccharides (LPS) of gram-negative species, and the capsular polysaccharides (CPS) found on many pathogenic bacteria. These extracellular bacterial polysaccharides have a protective function, preventing desiccation of the organism, and are important determinants of virulence, since they shield the bacterium from the body's defenses. Polysaccharides also occur in many mammalian and other systems as the glycosaminoglycan (GAG) side chains of proteoglycans, with both biochemical functions (such as those of cell-surface heparan sulfate [1]) and structural functions (for example, the chondroitin sulfates of connective tissue [2]). An increasing range of polysaccharides is now being exploited commercially. Bacterial CPS mixtures are in use as human vaccines (3), and the glycosaminoglycan heparin has long been used clinically as an anticoagulant and antithrombotic agent (4).

From: Methods in Molecular Biology, Vol. 17: Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques. Edited by: C. Jones, B. Mulloy, and A. H. Thomas. Copyright ©1993 Humana Press Inc., Totowa, NJ.

1.2. Polysaccharide Structures

Two classes of polysaccharides will be considered in this chapter: first, those having a strictly regular repeat unit, such as the capsular polysaccharides and LPS O-antigens, and, second, polysaccharides, such as the glycosaminoglycans, in which heterogeneity occurs as a result of varying substitution and/or epimerization of β-D-glucuronic acid to α-L-iduronic acid.

1.3. Comparison Between Polysaccharides and Peptides

A knowledge of the ways in which polysaccharide structure differs from that of polypeptides rationalizes the different approaches used to obtain and interpret nuclear magnetic resonance (NMR) data for these two types of biopolymers. The repertoire of commonly occurring monomers is about the same size in each case, but, whereas the peptide linkage is rigidly defined, monosaccharides may be linked together in a wider variety of ways. Each sugar can be present either as the α- or β-anomer and may be linked to any of the free hydroxyl groups on the adjacent sugar residue (see Note 1). Both linear and branched systems occur in polysaccharides, with a wide variety of nonsugar substituents: acetate esters, sulfate esters, pyruvate acetals, and so on. This would lead to an impossibly complex spectrum, but for the fact that even in relatively heterogeneous polysaccharides there is a strong repeating element of not more than seven sugars, rather than the nonrepeating linear sequence found in globular proteins. Consequently, a single resonance in the spectrum usually does not arise from a single residue in the primary sequence, but is the superposition of signals from similar residues at various positions along the chain. Polysaccharides are also almost invariably polydisperse; for structural studies, this does not introduce difficulties in NMR measurement or interpretation. Capsular polysaccharides have mol wt of typically hundreds of kilodaltons, but can give surprisingly sharp signals, unlike proteins of the same size (but see Note 2). Most glycosaminoglycans have mol wt of 10–50 kDa, though hyaluronic acid may reach a mol wt of over a million. Their NMR spectra are often more complex than those of capsular polysaccharides (because of heterogeneity) and have broader signals, since steric crowding of the bulky substituents tends to make the polysaccharide chain stiffer.

1.4. Scope of NMR Studies of Polysaccharides

NMR is the single most powerful technique for solving the structures of intact polysaccharides. Information can be obtained on the composition, sequence, linkage, and substitution positions of polysaccharides, as well as the anomeric configuration. The absolute configuration of the sugar residues cannot normally be determined by NMR; for this, GC or optical techniques must be used (5). The nondestructive nature of NMR spectroscopy allows it to precede other techniques, such as methylation analysis (6). Structural studies on carbohydrates by NMR involve some consideration of conformational properties as a matter of necessity, but the use of NMR techniques in the determination of "secondary" and "tertiary" structures of polysaccharides will not be dealt with here.

2. Sample Preparation

2.1. Removal of Protein and Nucleic Acid Impurities

Samples must be free from protein and nucleic acid impurities. The extent of protein and nucleic acid contamination can be readily estimated by measurement of UV absorption at 280 and 254 nm; a pure polysaccharide sample will have little or no absorption at these wavelengths. Enzymic digestion with ribonuclease, deoxyribonuclease, and proteases followed by dialysis or gel filtration is valuable, since glycosidase impurities in these enzymes are not significant.

2.2. Removal of Unwanted Counterions from Anionic Polysaccharides

Very acidic polysaccharides, such as sulfated glycosaminoglycans, may show a high affinity for paramagnetic heavy metal impurities, which broaden resonances in the spectrum to an unacceptable extent even when present in very small quantities. The most convenient method of dealing with these is to run the sample through a small column (we use 1 × 8 cm) of a suitable ion-exchange resin in the sodium form (Dowex 50X8 or Chelex, Bio-Rad, Richmond, CA). The ion exchanger should be very well washed with distilled water (about 1 mL/min for at least 4-5 h or, better, overnight) immediately before use. In other sulfate-containing samples and samples prone to gelling, control of the counterion can also be important and is achieved by the same process.

2.3. Sample Quantity

For capsular polysaccharides or LPS O-antigens, approx 5 mg is required for a full proton study at high field. More material, typically 20 mg, is required for carbon analysis. These quantities depend on the sample to some extent (more is needed when a large repeat unit is present or for a very viscous sample), but methods possible on newer instruments, particularly proton-detected heteronuclear correlation spectra, are improving sensitivity here. Larger samples of glycosaminoglycans may be necessary: 10-20 mg for proton and 50-100 mg for carbon studies.

2.4. Solvents: Exchange with D2O

The NMR spectrum will almost invariably be collected in aqueous solution (see Note 3), and, since polysaccharides carry a large number of exchangeable protons, deuterium exchange is very strongly recommended. These polysaccharides are almost invariably thermally stable and not prone to denaturation. Comprehensive freeze-drying and exchange with D2O are both possible and desirable. Dissolve the sample in the minimum amount of D2O (CPSs and GAGs are very soluble), freeze, and lyophilize; repeat this process three times (see Note 4). Solvent suppression in proton spectra should then be unnecessary, thereby simplifying the experimental procedure and the final spectrum. Many important peaks are close to the solvent resonance and can be seen more clearly without solvent suppression. The information that can be obtained from the exchangeable protons (OH and NH) is relatively little, in contrast to the use made of the amide proton in peptide and protein studies.

2.5. Control of pH

The final pH of a solution may lie in the range in which uronic acids or phosphate esters titrate, and the chemical shift of the H-5, C-5, C-6, and the phosphate system is then sensitive to small differences in experimental conditions. In these cases, control of pH with dilute buffer is recommended; phosphate at pH 7 is useful unless 31P spectra are required. The buffer should, of course, be deuterium exchanged before use.
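The sensitivity to pH described above can be illustrated with a short calculation. This is only a minimal sketch, assuming fast exchange between the protonated and ionized forms, so that the observed shift is the population-weighted average of the two limiting shifts. The pKa and the limiting shifts used here are hypothetical placeholders chosen for illustration, not measured values.

```python
def ionized_fraction(ph, pka):
    """Fraction of a monoprotic acid present as the anion (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def observed_shift(ph, pka, shift_acid, shift_anion):
    """Population-weighted chemical shift (ppm), assuming fast exchange."""
    f = ionized_fraction(ph, pka)
    return f * shift_anion + (1.0 - f) * shift_acid


# Hypothetical values, for illustration only (not taken from the chapter).
PKA = 3.3            # assumed pKa of a uronic acid carboxyl group
SHIFT_ACID = 4.40    # assumed H-5 shift of the protonated form, ppm
SHIFT_ANION = 4.10   # assumed H-5 shift of the ionized form, ppm


def shift_change(ph_low, ph_high):
    """Absolute change in the observed shift between two pH values."""
    high = observed_shift(ph_high, PKA, SHIFT_ACID, SHIFT_ANION)
    low = observed_shift(ph_low, PKA, SHIFT_ACID, SHIFT_ANION)
    return abs(high - low)


if __name__ == "__main__":
    print(f"pH 3.0 to 3.5: {shift_change(3.0, 3.5):.3f} ppm")
    print(f"pH 7.0 to 7.5: {shift_change(7.0, 7.5):.3f} ppm")
```

With these assumed numbers, a 0.5-unit pH difference near the pKa moves the resonance far more than the same difference at pH 7, which is the practical reason for buffering the sample.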


3. Conditions for the Collection of Spectra

3.1. Temperature

Spectra can be obtained at high temperatures, since polysaccharides are usually quite heat stable. Increasing the probe temperature to 70-90°C considerably sharpens the resonances and increases sensitivity, particularly in 2D experiments (see Note 5).

3.2. Residual Water

Some interesting resonances may be obscured by the water peak, but this can be moved upfield by increasing the temperature. One-dimensional proton spectra should therefore be collected at more than one temperature. In general, resonances arising from the polysaccharide will show small temperature coefficients, and temperature changes do not additionally complicate the assignment.

3.3. 1D Proton and Carbon Spectra

Preliminary 1D proton spectra should be obtained for any sample, as a check on its suitability for NMR, before large amounts of spectrometer time (and spectroscopist time) are committed to more elaborate studies. Carbon spectra take much longer, but are very informative.

3.4. 2D Spectra

Most of the usual repertoire of 2D experiments can be applied, but some fail because of poor sensitivity. In general, those experiments that obtain correlations through small coupling constants by using relatively long tuning delays cause problems, since the signal decays (by rapid T2 relaxation) before acquisition begins. Experiments that fall into this category include the J-resolved proton experiment, long-range correlation experiments (homo- and heteronuclear), and sometimes even the standard one-bond carbon-proton correlation experiment, which can fail with very viscous samples (7). On the other hand, rapid T1 and T2 relaxation does allow time to be saved on relaxation delays (see Chapter 5).

3.5. Planning an NMR Study

A set of 1D proton and carbon, COSY, RELAY COSY, NOESY, and heteronuclear correlation spectra can be collected in a total of about 48 h on a 500-MHz spectrometer, if everything works the first time and the quantity of material is not limited. This will provide most of the information needed for a structural study or highlight specific problems to be solved by other methods.

3.6. Field Strength and Sensitivity

Proton spectra should be run at the highest possible field (300 MHz at least), but worthwhile carbon spectra can be obtained even at a field as low as 20 MHz. Proton spectra are of course much more sensitive than carbon spectra, and sensitivity increases dramatically with increasing field strength. There have been very few studies on labeled (13C in this context) polysaccharides.

4. Interpretation of Spectra and Structure Determination

4.1. Repeating Polysaccharides: Composition

High-mol-wt repeating polysaccharides usually show remarkably simple spectra with insignificant complications owing to end groups. Unfortunately, these spectra are very crowded, which creates a different set of problems. The structure (3) of the repeat unit of a typical bacterial polysaccharide, the CPS from Streptococcus pneumoniae Type 10A, is shown in Fig. 1. Nearly all the nonexchangeable protons are present in either H-C(-C)(-C)-O or H-C(-C)(-C)-N systems, and give signals between 3.3 and 4.5 ppm. Other structural elements that frequently occur are uronic acids, O- and N-acetyl groups, and pyruvate acetals, all of which give rise to quite characteristic resonances.

-5)-β-D-Galf-(1-3)-β-D-Galp-(1-4)-β-D-GalpNAc-(1-3)-α-D-Galp-(1-2)-D-Ribol-(5-OPO2-
(with a β-D-Galf side-chain branch)

Fig. 1. The structure (3) of the repeat unit of the capsular polysaccharide from Streptococcus pneumoniae Type 10A. This polysaccharide contains sugars in both the pyranose and furanose ring form, in both the α- and β-anomeric configuration, and an alditol phosphate linkage.

The 500-MHz proton spectrum of pneumococcal type 10A CPS is shown in Fig. 2 and the 125-MHz carbon spectrum in Fig. 3. The usual ranges of important peaks are shown. Counting the resonances of each type in these spectra usually answers most of the basic compositional questions, such as (Tables 1 and 2):

How many carbon atoms are there in the repeat unit?
How many sugars are there in the repeat unit?
How many aminosugars are present?
How many 6-deoxysugars are present?
How many uronic acids are present?
How many free hydroxymethyl groups are present?
Are there any noncarbohydrate substituents present?
How many sugars are present as the α-anomer?
How many as the β-anomer?

Are any furanose ring-form sugars present?
Are any 13C-31P couplings apparent (typically 5 Hz)?
(See Notes 6-8.)

The 1D 13C DEPT-135 spectrum can be used to quantify the number of -CH2- resonances in the repeat unit, valuable if the presence of alditols or (1-6)-linked sugars is suspected. The one-bond proton-carbon coupling constant (measured from the 1D 13C spectrum recorded without the usual broad-band or composite pulse proton decoupling) is diagnostic of anomeric configuration, being approx 170 Hz for α-anomers and approx 160 Hz for β-anomers (8). This method can be particularly useful for manno-residues, where the (small) 3JH1,H2 value is difficult to measure and only poorly diagnostic of anomeric configuration. A 1D phosphorus spectrum can usually be readily obtained if phosphorylation is suspected, and chemical shift correlation should distinguish between phosphomonoesters and phosphodiesters. More detailed structural analysis requires resorting to 2D methods.

[Spectrum. Horizontal axis: chemical shift (ppm), 5.5 to 1.0. Labeled regions: anomeric hydrogens, ring hydrogens, OAc, C-Me.]

Fig. 2. The 500-MHz proton spectrum of the pneumococcal 10A polysaccharide (obtained at 70°C) showing the typical positions of a number of common resonances. “Ring hydrogens” include all the hydrogens on the sugar ring and hydroxymethyl pendant groups. The resonance marked n is from ethanol, and the resonances marked * arise from contamination by the cell-wall polysaccharide.

[Spectrum. Horizontal axis: chemical shift (ppm), 110 to 20. Labeled regions: anomeric carbons, ring carbons, CH2OH, CH-N, C-Me.]

Fig. 3. The 125-MHz DEPT-135 carbon spectrum of the pneumococcal 10A polysaccharide labeled with the positions of the resonances from a number of common groups. This spectrum was obtained at 70°C, and is phased so that resonances from methine and methyl carbons are negative, and resonances from methylene carbons are positive. Resonances from quaternary carbons will not appear. The resonance at 55 ppm arises from an impurity.

Table 1
Typical Proton Chemical Shifts and Coupling Patterns for Frequently Resolved Resonances in the NMR Spectra of Polysaccharides

Resonance                       Chemical shift range, ppm    Coupling pattern
H-1 of α sugars                 5.5-5.0                      d, 3.5 Hz
H-1 of β sugars                 4.4-5.0                      d, 8 Hz
H-1 of α-manno sugars           5.5-4.9                      d, <2 Hz
H-1 of β-manno sugars           5.0-4.6                      d, <2 Hz
ManNAc H-2                      4.3-4.1                      d, 3.5 Hz
α-GalA H-5                      4.6-5.0                      d, approx 1 Hz
α-IdoA H-5                      4.6-5.0                      d, approx 3 Hz
α-manno H-2                     3.9-4.4                      d, 4 Hz
β-Glc H-2 (a)                   3.4-3.3                      t, 10 Hz
N-acetylaminosugar methyl       2.15-2.00                    s
H-6 of 6-deoxysugars            1.35-1.15                    d, 6 Hz

(a) Unsubstituted.

Table 2
Typical Chemical Shifts in the Carbon NMR Spectra of Polysaccharides

Resonance                                  Chemical shift, ppm
Uronic acid C-6                            170-180
N-acetylaminosugar carbonyl carbon         170-180
Furanose sugar C-1 (α, β)                  101-111
Pyranose sugar C-1 (α)                     91-101
Pyranose sugar C-1 (β)                     95-105
Unsubstituted (a) furanose ring carbons    70-85
Unsubstituted (a) pyranose ring carbons    65-75
Glycosylated hydroxymethyl groups          65-68
Unsubstituted (a) hydroxymethyl groups     61-64
C-N in amino sugars                        48-55
N-acetyl methyls                           23-25
C-6 in 6-deoxyhexoses                      15-19

(a) Substitution, e.g., with sulfate groups in the glycosaminoglycans, causes a downfield shift (of 2-7 ppm) of the resonance due to the carbon atom at the position of substitution.

4.2. Glycosaminoglycans

All glycosaminoglycans, except keratan sulfate, consist of a linear chain of alternating hexuronic acids and hexosamines, so that the basic repeating unit is a disaccharide. Commonly found substituents are N-acetyl and N- and O-sulfate groups. The structure of heparan sulfate is shown in Fig. 4, as an example; Fransson (2) can be consulted for other aspects of GAG structure. Heterogeneity can arise in several ways. α-L-Iduronic acid in heparin, heparan sulfate, and dermatan sulfate arises from the postpolymerization epimerization of β-D-GlcA, which may not be complete. There may also be considerable variation in substitution with sulfate groups (see Fig. 4). Some well defined atypical primary sequences may be found, for example, as the remnant of the linkage regions at the reducing end of the polysaccharide chain, originally attached to the protein core of the proteoglycan (9) and occasionally as the capping region (10) of the GAG side chains of proteoglycans. An unusual primary sequence is also found as the binding site on heparin for antithrombin, and contributions from this sequence can be seen in the 1H spectrum of heparin (11).

The emphasis in NMR studies of GAGs is as a consequence rather different from that in structural determination of bacterial polysaccharides. Establishment of the main backbone structure is likely to be somewhat simpler, with the positions of substituents and identification of variant sequences a more important goal. Figure 5 shows a detail of the anomeric region of the high-field 1H spectrum of heparan sulfate. The NMR spectra of heparin (11,12), heparan sulfate (11,13), dermatan sulfate (14,15), keratan sulfate (7,16), and chondroitin sulfate (17) have been discussed in a number of papers.

Sequence A: -4)-β-D-GlcA-(1-4)-α-D-GlcNAc-(1-
Sequence B: -4)-α-L-IdoA(2OSO3-)-(1-4)-α-D-GlcNSO3-(6OSO3-)-(1-

Fig. 4. Structure of heparan sulfate. The backbone is a block copolymer of two different repeating disaccharide sequences, A and B. In sequence A, the glucosamine residue is occasionally N-sulfated rather than N-acetylated; in sequence B, either of the O-sulfate groups may be missing. The sequence (-GlcNAc-IdoA) never occurs.

[Spectrum. Horizontal axis: chemical shift (ppm), 5.8 to 4.6.]

Fig. 5. Part of the 500-MHz NMR spectrum of heparan sulfate, showing the well resolved resonances owing to the anomeric protons (H-1) and to H-5 of α-L-iduronic acid. The signals are numbered as follows: 1, H-1 of N-sulfated glucosamine linked at C-1 to β-D-glucuronic acid; 2, H-1 of N-acetylated glucosamine linked through C-1 to glucuronic acid (as in Fig. 4, sequence A) and N-sulfated glucosamine linked at C-1 to α-L-iduronic acid (as in Fig. 4, sequence B); 3, H-1 of 2-O-sulfated iduronic acid (mostly as in Fig. 4, sequence B); 4, H-1 of unsulfated iduronic acid linked to 6-O-sulfated glucosamine; 5, H-1 of unsulfated iduronic acid linked to glucosamine, not 6-O-sulfated; 6, H-5 of iduronic acid; 7, H-1 of glucuronic acid linked to N-acetylated glucosamine (as in Fig. 4, sequence A).

4.3. Spectral Assignments

Detailed structure analysis (which sugars are present, how are these sugars linked together, what is the sequence of sugars, and which positions are substituted) has to begin with assignment of the proton, and, usually, the carbon spectra. Proton assignments can be made by the standard NMR correlation methods of COSY, relayed COSY, and HOHAHA/TOCSY. This usually gives sufficient information, but problems may remain because of severe spectral overlap or when small coupling constants result in extremely weak correlation crosspeaks; the galacto H-4/H-5 correlation (3JH4,H5 typically <1 Hz) is the most common problem. In cases where the connectivity is lost, NOE methods must be used, but these can create circular arguments if NOE arguments are also used to determine sugar residue sequence.

When interpreting a 2D correlated spectrum (e.g., COSY), the low-field, well resolved H-1 resonances are usually used as starting points, and the connectivities established from there, but any assignable, well resolved peak, such as the methyl group of a 6-deoxysugar, a high-field H-5 (β-gluco or β-manno), or a low-field H-5 (α-uronic acids) can be used as a starting point. Each monosaccharide acts as an isolated spin system and can be assigned separately. Alditols (usually present as alditol phosphates) are a particular problem, since they usually lack well resolved, easily assigned resonances that can be used as starting points for the connectivity analysis. In such cases, proton assignments from 31P-1H or 13C-1H correlation experiments may be very useful. Pulse sequences that may simplify the spectrum and make specific assignments, such as triple-quantum filtered COSY (for hydroxymethyl H5/H6/H6' systems), are valuable; the assignment is only a means to an end.

Carbon assignments are made by correlation of a 13C resonance to an assigned resonance in the proton spectrum using either traditional, proton-detected, CHORTLE (18) or the modified COLOC experiments (7).
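The connectivity walk used for proton assignment (start at a resolved H-1 and follow COSY crosspeaks through one isolated spin system) can be sketched as a simple graph traversal. This is only an illustrative sketch: the function name, the matching tolerance, and the peak list are hypothetical, and real crosspeak lists require judgment about overlap that no such routine replaces.

```python
from collections import deque


def trace_spin_system(cross_peaks, start_ppm, tol=0.02):
    """Return the shifts reached from start_ppm by following COSY crosspeaks.

    cross_peaks is a list of (shift_a, shift_b) pairs in ppm; two shifts are
    treated as the same resonance when they agree within tol (an assumed value).
    """
    def same(a, b):
        return abs(a - b) <= tol

    found = [start_ppm]
    queue = deque([start_ppm])
    while queue:
        current = queue.popleft()
        for a, b in cross_peaks:
            # A crosspeak is symmetric, so test both orientations.
            for here, there in ((a, b), (b, a)):
                if same(here, current) and not any(same(there, s) for s in found):
                    found.append(there)
                    queue.append(there)
    return found


# Hypothetical crosspeak list for two residues (ppm), for illustration only.
example_peaks = [
    (5.10, 3.85), (3.85, 3.95), (3.95, 4.02),   # residue 1: H-1/H-2, H-2/H-3, H-3/H-4
    (4.55, 3.35), (3.35, 3.52),                 # residue 2: H-1/H-2, H-2/H-3
]

if __name__ == "__main__":
    print(trace_spin_system(example_peaks, 5.10))
    print(trace_spin_system(example_peaks, 4.55))
```

In this made-up list the walk for residue 1 stops at H-4 because no H-4/H-5 crosspeak is present, which mirrors the galacto problem noted in the text: the remaining protons would have to be reached by another route.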

J1,2 = 3 Hz; J2,3 = 10 Hz; J3,4 = 4 Hz; J4,5 = 1 Hz; J5,6 = J5,6' = 6 Hz; J6,6' = 12 Hz

Fig. 6. Coupling constants around the α-Gal residue.
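The way coupling constants discriminate between sugar residues can be sketched in a few lines. The sketch takes the Fig. 6 values as J2,3 = 10 Hz, J3,4 = 4 Hz, and J4,5 = 1 Hz, and the 1JCH values of approx 170 Hz (α) and approx 160 Hz (β) from Section 4.1. The large/small threshold, the 165-Hz cutoff, and the gluco and manno patterns are assumptions based on general expectations for 4C1 pyranose rings, not values given in this chapter.

```python
def coupling_size(j_hz, large_threshold=7.0):
    """Classify a vicinal coupling as 'large' (axial-axial) or 'small'.

    The 7-Hz threshold is an assumed dividing line between the 8-10 Hz
    and 1-4 Hz ranges.
    """
    return "large" if j_hz >= large_threshold else "small"


# Assumed patterns of (J2,3, J3,4, J4,5) for pyranoses in the 4C1 form.
PATTERNS = {
    ("large", "large", "large"): "gluco",
    ("large", "small", "small"): "galacto",
    ("small", "large", "large"): "manno",
}


def guess_configuration(j23, j34, j45):
    """Suggest a relative configuration from three ring couplings (Hz)."""
    key = tuple(coupling_size(j) for j in (j23, j34, j45))
    return PATTERNS.get(key, "unassigned")


def anomer_from_1jch(j_ch_hz, cutoff=165.0):
    """Suggest the anomeric configuration from the one-bond C-1/H-1 coupling.

    The cutoff is an assumed midpoint between approx 170 Hz and approx 160 Hz.
    """
    return "alpha" if j_ch_hz >= cutoff else "beta"


if __name__ == "__main__":
    print(guess_configuration(10.0, 4.0, 1.0))   # couplings taken from Fig. 6
    print(anomer_from_1jch(171.0))
```

Any residue whose ring is not in the 4C1 form, and any furanose, falls outside these patterns and should be left as "unassigned" rather than forced into a category.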

Proton-detected experiments are to be recommended, since they provide both high sensitivity and the best resolution in the crowded proton domain (19,20). The limitation on unambiguous assignment of the carbon spectrum is usually the overlap of the proton resonances, and carbon chemical shift arguments can resolve these ambiguities.

4.4. Assignments of Spin Systems to Specific Sugar Residues

In contrast to amino acids, the spin systems arising from various sugar residues show similar chemical shifts, and they must be differentiated on the basis of their interproton coupling constants. These coupling constants are often only available from 2D correlation spectra, which should therefore be obtained at the highest possible digital resolution, or from the 1D variants of the various 2D correlation techniques. Omitting well resolved and easily assigned high-field resonances reduces the spectral width and improves digitization without undue increases in data storage requirements, and the spectrum may be processed with strong resolution enhancement to resolve the fine structure of the crosspeaks, from which interproton coupling constants can be estimated.

Typical interproton coupling constants are shown for the α-galactose system in Fig. 6. In general, antiperiplanar hydrogens (those in an axial-axial relationship) show an 8-10 Hz coupling, and protons in an axial-equatorial relationship show a 3-4 Hz coupling, dropping to ca 1 Hz if both hydrogens are antiperiplanar to an oxygen across the bond. These relationships hold for pyranose sugars in the standard 4C1 ring forms, but are not applicable to furanose sugars (where there are fewer data available) or when a pyranose ring is not in the 4C1 conformation (see Note 9).

4.5. Substituents

The position of O-acetyl groups can be determined from the strong downfield shift on the α proton; effects in the carbon spectra are relatively small upfield shifts on the substituted carbons, although long-range C-H correlation has been used (21). De-O-acetylation and reanalysis of the sample can provide additional evidence. N-acetyl groups can be located by correlation to NH in H2O solution or from the downfield position of the N-CH proton resonance, but the proton shifts are not really diagnostic and our preferred method is from the high-field 13C resonance of the C-N system at ca 48-55 ppm. The position of phosphorylation can be determined from coupling in the 13C spectrum if the resonance is resolved or by 1H-31P correlation, which has high sensitivity even when detected through the 31P nucleus. In some cases, such as the phosphodiester link through the anomeric position found in some capsular polysaccharides, the low-field chemical shift and 1H-31P coupling constant (typically 8 Hz) are diagnostic. Dephosphorylation can be carried out by treatment with 48 or 60% hydrofluoric acid (22) (handle this with care; it is extremely dangerous) or phosphodiesters cleaved with strong base (23), to generate either oligosaccharides or a dephosphorylated polymer depending on the polysaccharide. Pyruvate acetals generate a characteristic proton resonance at ca 1.5 ppm, and the carbon chemical shift of the acetal carbon distinguishes between five- or six-membered ring systems.
Various methods are available to determine the stereochemistry at the pyruvate C-2 that generally rely on comparison of chemical shifts with those of model systems (24-26). These substituents can be removed by dilute acid hydrolysis.

4.6. Linkage Position and Sequence

The best ways to establish linkage positions in the absence of a methylation analysis are carbon chemical shift (Table 2), interproton interresidue NOEs (27), selective INEPT experiments (20,21,28), and proton chemical shift effects. Comparison of observed proton chemical shifts with those of mono- and oligosaccharide model systems is not a reliable way to locate the position of glycosylation, but can be very useful to assign terminal residues in side chains. We are not aware of any complete tabulations of proton chemical shifts for oligosaccharides (29), although they are available for carbon chemical shifts. Glycosylation causes a large (6-8 ppm) downfield shift of the α carbon and a small (1-2 ppm) upfield shift of the β carbons, and so is diagnostic of the linkage position (30,31), but the situation is more complex at branch-point residues.

Interproton interresidue NOEs will occur when an anomeric proton on one residue is close in space to a ring proton on another residue, and so provide both linkage and sequence data (27). The strongest interresidue NOE is almost always observed between hydrogens attached to the carbon atoms involved in the glycosidic link. This method is quite reliable, but some linkages, especially (1-6)-links, give weak NOEs, and NOEs (usually weaker) to the proton on the carbon adjacent to the linkage can occur (32). Depending on the precise local conformation, other NOEs may be observed to confirm the analysis; well reported cases are between Hex H-1 and Gal H-6s in α-D-Hex-(1-4)-D-Gal and between Hex H-5 and Man H-2 in α-D-Hex-(1-3)-D-Man(NAc) systems, but such cases are obviously dependent on the anomeric configuration, relative configuration, and absolute configuration of the sugars concerned (33). These NOEs are usually obtained from NOESY spectra, but the 1D NOE difference experiment has its place, since the anomeric resonances are usually well resolved and the high digital resolution allows the multiplicity of enhanced peaks to be determined, which is of particular value in heavily overlapped systems. The literature on interresidue NOEs involving furanose sugars is much smaller, but the same patterns are to be expected.
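The carbon chemical shift argument from the start of this section can be sketched as a comparison of observed shifts with those of an unsubstituted reference residue, together with the hydroxymethyl ranges of Table 2. This is a minimal sketch: the function names and the 5-ppm threshold are assumptions (set a little below the 6-8 ppm quoted above), and the shifts in the usage example are invented for illustration, not measured data.

```python
# Hydroxymethyl carbon ranges taken from Table 2 (ppm).
HYDROXYMETHYL_RANGES = {
    "glycosylated": (65.0, 68.0),
    "unsubstituted": (61.0, 64.0),
}


def classify_hydroxymethyl(shift_ppm):
    """Place a hydroxymethyl 13C shift in one of the Table 2 ranges."""
    for label, (low, high) in HYDROXYMETHYL_RANGES.items():
        if low <= shift_ppm <= high:
            return label
    return "ambiguous"


def candidate_linkage_carbons(observed, reference, min_downfield=5.0):
    """List carbons shifted downfield by at least min_downfield ppm.

    observed and reference map carbon labels to 13C shifts in ppm; the
    reference should be the matching unsubstituted residue.
    """
    return [carbon for carbon in observed
            if carbon in reference
            and observed[carbon] - reference[carbon] >= min_downfield]


if __name__ == "__main__":
    # Invented shifts for a hypothetical 3-substituted hexopyranose residue.
    reference = {"C-1": 103.0, "C-2": 74.0, "C-3": 76.5,
                 "C-4": 70.5, "C-5": 76.5, "C-6": 61.5}
    observed = {"C-1": 103.2, "C-2": 73.1, "C-3": 83.5,
                "C-4": 69.0, "C-5": 76.4, "C-6": 61.6}
    print(candidate_linkage_carbons(observed, reference))
    print(classify_hydroxymethyl(observed["C-6"]))
```

As the text cautions, branch-point residues and sulfated positions can produce shifts that this simple threshold would misread, so the result is a list of candidates to be confirmed by NOE or methylation data.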
Correlation experiments tuned for the small interproton coupling constant across the glycosidic link usually fail because spin-spin relaxation is too fast, but a selective 1D long-range carbon-proton correlation across the glycosidic link has been used successfully in some cases (28). These experiments require a well resolved anomeric resonance and an assigned carbon spectrum, but fail on viscous samples where proton T2s are short (i.e., where lines are broad).

4.7. Conformation

Long-range NOEs have not been observed in polysaccharides (and their repeat nature would make assignment to a particular repeat unit impossible), and the general paucity of interresidue NOEs suggests that methods developed for the conformational analysis of proteins and oligonucleotides are not applicable to polysaccharides. Shorter range interresidue NOEs may, in favorable cases, be used to determine the approximately helical “secondary” structures of polysaccharides (34).

5. Notes

1. There is, however, a wide range of rare sugars in the bacterial polysaccharides that are found occasionally and that can greatly complicate the analysis.
2. The fact that lines are sufficiently narrow to give interpretable polysaccharide spectra reflects internal mobility rather than overall tumbling; linewidth is therefore at least as much a function of chemical structure as of mol wt. Polysaccharides with a particularly flexible link (e.g., a [1-6] link, a phosphodiester, or an alditol) tend to give better spectra than those lacking this kind of feature. Similarly, highly charged polysaccharides with counterion shells and accompanying solvation, or crowding owing to many large substituents, give much poorer spectra. Both these problems are encountered in the glycosaminoglycans. Sometimes chemical modification can help (e.g., de-O-acetylation); this works by changing the internal motions rather than the overall mol wt. Sonication of very high-mol-wt samples can lead to partial, random depolymerization to approx 50 kDa, so reducing linewidths and improving the spectra (35).
3. Very few studies have been carried out in DMSO solution; the limiting factor is solubility. There is a growing body of solid-state studies (limited to 13C studies) on polysaccharides with a simple repeat unit.
4. We find a vacuum centrifuge useful; freeze-drying in a small desiccator evacuated by means of an oil pump, protected by a dry ice/methanol cold trap, seems more efficient than the big freeze-dryers.
5. We have encountered problems owing to spectrometer instability at elevated temperature, frequently because of such simple mechanical problems as unbalanced air flows or slow thermal equilibration within the probe. In such cases, obtaining the spectrum without sample spinning would be advantageous, and the effects on linewidth are not important in such high-mol-wt samples.
6. Even the regular repeating polysaccharides can suffer from incomplete or heterogeneous substitution, which complicates the spectra. This heterogeneity can sometimes be removed chemically, for example (27), by de-O-acetylation with 5% ammonia for 5 h at 37°C. Nonintegral ratios of the areas under resonances are usually indicative of some form of incomplete substitution.
7. Some of the most interesting polysaccharides form highly ordered, high-mol-wt complexes. Of these, some, such as chitin and cellulose, are insoluble; some, such as agarose and the carrageenans, form gels; and many of the fungal (1-3)-β-glucans form highly ordered triple-helical structures. These systems are amenable to 13C solid-state NMR studies, which provide a lot of structural and conformational information, but will not be considered further here (36).
8. Pulse methods: Rapid repetition, usually with a short pulse, improves signal-to-noise for protonated carbons, but unprotonated carbons (which for carbohydrates are mainly ketose C-2, carbonyls, and pyruvate acetal carbons) will give signals of much lower intensity owing to their slow relaxation rates. If your carbon spectra are poor because of small NOEs, running spectra in polarization transfer mode should be well worthwhile.
9. An example of this is the α-L-iduronate residue found in several glycosaminoglycans, where the pyranose ring is mobile, and its conformation can best be described as an equilibrium between two or three contributing forms. In such cases, interpretation is not simple, and the considerable literature on this example should be consulted (37-39).
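The check for incomplete substitution mentioned in Note 6 amounts to comparing integrals on a per-proton basis. The following is a minimal sketch; the function names, the 0.1 tolerance, and the integral values are hypothetical, and the reference resonance is assumed to be a resolved signal of known proton count in the repeat unit (e.g., a single anomeric proton).

```python
def substitution_fraction(area_substituent, n_protons_substituent,
                          area_reference, n_protons_reference=1):
    """Substituent groups per repeat unit, from two integrated resonances."""
    per_proton_substituent = area_substituent / n_protons_substituent
    per_proton_reference = area_reference / n_protons_reference
    return per_proton_substituent / per_proton_reference


def looks_nonintegral(ratio, tolerance=0.1):
    """True when a ratio differs from the nearest integer by more than tolerance.

    The tolerance is an assumed allowance for integration error.
    """
    return abs(ratio - round(ratio)) > tolerance


if __name__ == "__main__":
    # Hypothetical integrals: O-acetyl methyl (3 H) against one anomeric proton.
    ratio = substitution_fraction(2.1, 3, 1.0)
    print(f"O-acetyl groups per repeat unit: {ratio:.2f}")
    print("incomplete substitution suspected:", looks_nonintegral(ratio))
```

A ratio well short of a whole number, as in this invented example, would point to partial O-acetylation and could be followed up by de-O-acetylation and reanalysis as described in Note 6.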

References

1. Gallagher, J. T., Lyon, M., and Steward, W. P. (1986) Structure and function of heparan sulphate proteoglycans. Biochem. J. 236, 313-325.
2. Fransson, L. A. (1985) Mammalian glycosaminoglycans, in The Polysaccharides, vol. 3 (Aspinall, G., ed.), Academic, New York.
3. Jennings, H. J. (1990) Capsular polysaccharides as vaccine candidates, in Current Topics in Microbiology and Immunology, vol. 150 (Jann, K. and Jann, B., eds.), Springer-Verlag, Berlin, pp. 97-128.
4. Lane, D. A. and Lindahl, U. (1989) Heparin. Edward Arnold, London.
5. Gerwig, G. J., Kamerling, J. P., and Vliegenthart, J. F. G. (1979) Determination of the absolute configuration of monosaccharides in complex carbohydrates by capillary g.l.c. Carbohydr. Res. 77, 1-7.
6. Lindberg, B. and Lonngren, J. (1978) Methylation analysis of complex carbohydrates: general procedure and application for sequence analysis. Methods Enzymol. 50, 3-33.
7. Cockin, G. H., Huckerby, T. N., and Nieduszynski, I. A. (1986) High-field NMR studies of keratan sulphates; 1H and 13C assignments of keratan sulphate from shark cartilage. Biochem. J. 236, 921-924.
8. Bock, K. and Pedersen, C. (1974) A study of 13CH coupling constants in hexopyranoses. J. Chem. Soc. Perkin Trans. II, 293-297.
9. Van Halbeek, H., Dorland, L., Veldink, G. A., Vliegenthart, J. F. G., Garegg, P. J., Norberg, T., and Lindberg, B. (1982) A 500 MHz proton-magnetic-resonance study of several fragments of the carbohydrate-protein linkage region commonly occurring in proteoglycans. Eur. J. Biochem. 127, 1-6.
10. Huckerby, T. N., Dickensen, J. M., and Nieduszynski, I. A. (1991) Characterisation of oligosaccharides from the non-reducing termini of keratan sulphate chains. Carbohydr. Res., in press.
11. Mulloy, B. and Johnson, E. A. (1987) Assignment of the 1H-NMR spectra of heparin and heparan sulphate. Carbohydr. Res. 170, 151-164.
12. Gatti, G., Casu, B., Hamer, G. K., and Perlin, A. S. (1979) Studies on the conformation of heparin by 1H and 13C NMR spectroscopy. Macromolecules 12, 1001-1007.
13. Huckerby, T. N. and Nieduszynski, I. A. (1982) Proton chemical shifts in the NMR spectra of heparan and heparin. Carbohydr. Res. 103, 141-145.
14. Sanderson, P. N., Huckerby, T. N., and Nieduszynski, I. A. (1989) Chondroitinase ABC digestion of dermatan sulphate; NMR spectroscopic characterization of the oligo- and poly-saccharides. Biochem. J. 257, 347-354.
15. Bossennec, V., Petitou, M., and Perly, B. (1990) 1H-NMR investigation of naturally occurring and chemically oversulphated dermatan sulphates. Biochem. J. 267, 625-630.
16. Hounsell, E., Feeney, J., Scudder, P., Tang, P. W., and Feizi, T. (1986) 1H-NMR studies at 500 MHz of a neutral disaccharide and sulphated di-, tetra-, hexa- and larger oligosaccharides obtained by endo-β-galactosidase treatment of keratan sulphate. Eur. J. Biochem. 157, 375-384.
17. Welti, D., Rees, D. A., and Welsh, E. J. (1979) Solution conformation of glycosaminoglycans: assignment of the 300 MHz 1H-magnetic resonance spectra of chondroitin 4-sulphate, chondroitin 6-sulphate and hyaluronate, and investigation of an alkali-induced conformational change. Eur. J. Biochem. 94, 505-514.
18. Altman, E., Brisson, J.-R., and Perry, M. B. (1988) Structure of the O-antigen polysaccharide of Haemophilus influenzae serotype 3 (ATCC 27090) lipopolysaccharide. Carbohydr. Res. 179, 245-258.
19. Byrd, R. A., Egan, W., Summers, M. F., and Bax, A. (1987) New NMR spectroscopic approaches for structural studies of polysaccharides: application to the Haemophilus influenzae type a lipopolysaccharide. Carbohydr. Res. 166, 47-58.
20. Tsui, F. P., Egan, W., Summers, M. F., Byrd, R. A., Schneerson, R., and Robbins, J. B. Determination of the structure of the E. coli K100 capsular polysaccharide, cross-reactive with the capsule from Type B Haemophilus influenzae. Carbohydr. Res. 173, 65-74.
21. Bax, A., Summers, M. F., Egan, W., Guirgis, N., Schneerson, R., Robbins, J. B., Orskov, I., and Vann, V. F. (1988) Structural studies of the E. coli K93 and K53 capsular polysaccharides. Carbohydr. Res. 173, 53-64.
22. Moreau, M., Richards, J. C., Perry, M. B., and Kniskern, P. J. (1988) Structural analysis of the specific capsular polysaccharide of Streptococcus pneumoniae Type 45 (American Type 72). Biochemistry 27, 6820-6829.
23. Watson, M. J., Tyler, J. M., Buchanan, J. G., and Baddiley, J. G. (1972) The type-specific substance from Pneumococcus Type 13. Biochem. J. 130, 45-54.
24. Garegg, P. J., Jansson, P.-E., Lindberg, B., Lindh, F., Lonngren, J., Kvarnstrom, I., and Nimmich, W. (1980) Configuration of the acetal carbon of pyruvic acid acetals in some bacterial polysaccharides. Carbohydr. Res. 78, 127-132.
25. Gorin, P. A. J., Mazurek, M., Duarte, H. S., Iacomini, M., and Duarte, J. H. (1982) Properties of 13C-NMR spectra of O-(1-carboxyethylidene) derivatives of methyl β-D-galactopyranoside: models for the determination of pyruvic acetal structure in polysaccharides. Carbohydr. Res. 100, 1-15.
26. Jones, C. (1990) A novel method for the determination of the stereochemistry of pyruvate acetal substituents applied to the capsular polysaccharide from Streptococcus pneumoniae Type 4. Carbohydr. Res. 198, 353-357.
27. Moreau, M., Richards, J. C., Perry, M. B., and Kniskern, P. J. (1988) Application of high-resolution NMR spectroscopy to the elucidation of the structure of the specific capsular polysaccharide of Streptococcus pneumoniae type 7F. Carbohydr. Res. 182, 79-99.
28. Richards, J. C. and Leitch, R. A. (1989) Elucidation of the structure of the Pasteurella haemolytica serotype T10 lipopolysaccharide O-antigen by NMR spectroscopy. Carbohydr. Res. 186, 275-286.
29. Bock, K. and Thogersen, H. (1982) Nuclear magnetic resonance spectroscopy in the study of mono- and oligosaccharides. Ann. Rep. NMR Spectroscopy 13, 1-57.
30. Bock, K. and Pedersen, C. (1983) Carbon-13 nuclear magnetic resonance spectroscopy of monosaccharides. Adv. Carbohydr. Chem. Biochem. 43, 27-66.
31. Bock, K., Pedersen, C., and Pedersen, H. (1984) Carbon-13 nuclear magnetic resonance data for oligosaccharides. Adv. Carbohydr. Chem. Biochem. 42, 193-225.
32. Lipkind, G. M., Shashkov, A. S., Mamyan, S. S., and Kochetkov, N. K. (1988) The nuclear Overhauser effect and structural factors determining the conformations of disaccharide glycosides. Carbohydr. Res. 181, 1-12.
33. Jones, C. and Currie, F. (1989) Pneumococcal polysaccharide S4; a structural revision. Carbohydr. Res. 184, 279-84.
34. Forster, M., Jones, C., and Mulloy, B. (1989) NOEMOL: integrated molecular graphics and the simulation of nuclear Overhauser effects in NMR spectroscopy. J. Mol. Graphics 7, 196-217.
35. Szu, S. C., Zen, G., Schneerson, R., and Robbins, J. B. (1986) Ultrasonic irradiation of bacterial polysaccharides. Characterization of the depolymerised products and some applications of the process. Carbohydr. Res. 152, 7-20.
36. Saitô, H. and Ando, I. (1990) High-resolution solid-state NMR studies of synthetic and biological macromolecules. Ann. Rep. NMR Spectroscopy 21, 209-290.
37. Sinay, P. (1986) Active fragments of natural oligosaccharides. Pure and Appl. Chem. 61, 481-483.
38. Sanderson, P. N., Huckerby, T. N., and Nieduszynski, I. A. (1987) Conformational equilibria of α-L-iduronic residues in disaccharides derived from heparin. Biochem. J. 243, 175-181.
39. Paulsen, H., Pollex, A., Sinnwell, V., and van Boeckel, C. A. A. (1988) Konformationanalyse von heparin-analogen di- und trisacchariden mit α-L-idopyranose-einheiten. Liebigs Ann. Chem. 411-418.

CHAPTER 7

Dynamic and Exchange Processes in Macromolecules Studied by NMR Spectroscopy

Lu-Yun Lian

1. Introduction

It is normal for a biological system to be in a dynamic state. The function of many biological systems depends on their flexibility, and here NMR can provide the experimental basis for investigating the mechanical function of such systems. The types of motions that are thought to occur in proteins, their frequency ranges, and the methods for their detection are summarized in Table 1. It is important to distinguish ubiquitous thermal vibrations or group rotations from the more extensive motions that can be propagated through larger segments of a structure.

Motions and dynamics are reflected by five different NMR parameters: chemical shift, spin-spin coupling constant, the area enclosed by a resonance, relaxation time, and nuclear Overhauser effect (1). The NMR data can be used to provide either qualitative evidence of flexibility or quantitative measurements of exchange rates.

This chapter describes several basic experimental and analytical NMR techniques frequently used for the qualitative and quantitative analysis of dynamic and exchange processes, focusing on protein systems; the same approach can be applied to most biological macromolecules. The analysis of data for dynamic processes, such as the determination of rate constants and binding constants, can be rather complicated; wherever possible, attempts are made here to simplify the mathematics involved.

From: Methods in Molecular Biology, Vol. 17: Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques. Edited by: C. Jones, B. Mulloy, and A. H. Thomas. Copyright © 1993 Humana Press Inc., Totowa, NJ

Table 1. Motions in Proteins: Approximate Frequency Ranges and Methods of Detection

Types of motion                              Frequency, Hz           Methods of detection
Bond and angle vibration                     10^12 to 10^14          IR, Raman
                                             10^9 to 10^13           X-ray, molecular dynamics simulation
Side chain and protein rotation              10^8 to 10^11           NMR relaxation, fluorescence depolarization, ESR
Aromatic side chain rotation                 10^0 to 10^5            NMR chemical shift averaging
Conformational changes, protein unfolding    10^-5 to 10^?           Isotope exchange, deuterium exchange

2. Sample Preparation and Experimental Conditions

This section deals with the considerations that should be borne in mind in order to acquire good data for further analysis, with an emphasis on experiments to determine exchange and dynamics. These considerations are: the quality of the sample preparation, the factors influencing exchange rates, and the experimental NMR parameters to be used.

2.1. Sample Preparation

The general guidelines for sample preparation in NMR studies should be adhered to (2). If the sample is to be prepared in D2O, the easiest method is repetitive lyophilization with a D2O buffer, although care must be taken to ensure that the stability of the protein (as monitored by turbidity on redissolving or by loss of biological activity) is not affected by lyophilization. The presence of salts and of some buffers can influence stability. Where proteins cannot be lyophilized, alternative methods, such as dialysis against a D2O buffer, membrane ultrafiltration using microconcentrators, centrifugal filtration, or, less often, size-exclusion gel filtration, may be possible. If the sample is to be kept in D2O solution for some time (>3 d), approx 50 mM of sodium azide should be added to prevent microbial growth.

Solubility, the possibility of aggregation, and the availability of the macromolecule often limit the protein concentration used. In the case of nucleotides, there is a tendency to stack at high concentrations. Low solution concentrations mean long data acquisition times and, hence, greater usage of expensive spectrometer time. In addition, the stability of proteins (enzymes) or the possibility of the degradation of ligands (sometimes by the host protein itself) limits the duration over which an experiment can be performed.

If exchange rates are to be determined using enzyme-bound substrate complexes, the substrate concentrations should be sufficiently smaller than those of the enzyme that over 90% of the substrate is in the bound form. Therefore, the limiting factor in deciding the feasibility of an NMR experiment for the determination of exchange rates is the highest manageable protein concentration, the availability of substrates not normally being a problem.* For experiments carried out on most high-field spectrometers (>400 MHz), for which the signal-to-noise ratio has improved significantly over the last few years, protein concentrations of approx 0.5-1 mM in D2O and approx 2-3 mM in H2O (with the substrate concentrations therefore slightly less) are typically used.

*In this chapter, sets of experiments carried out at constant protein concentration will generally be described.

Paramagnetic cations can produce dramatic changes in the NMR linewidth of nuclei in their vicinity, even if their concentrations are very low (approx $10^{-4}$ to $10^{-5}$ times those of the enzyme or substrate). Hence, paramagnetic ions should be removed using chelating agents, such as EDTA (5-10 µM).

2.2. Factors Influencing Exchange and Dynamic Processes

Many exchange rates are either acid- or base-catalyzed or both, and most increase with temperature. Care must also be taken to remove the possibility of any extraneous exchanges, since these will invalidate certain assumptions made in the interpretation of the results.

2.3. NMR Parameters

Implementation of many of the experiments and the interpretation of the results may be quite complex, and some specialist help is advisable at both stages. When attempting line-shape analysis, it is important to obtain the best possible data: by avoiding saturation (and hence partial relaxation) of a spin system, since this will result in a distorted line shape; by using the maximum number of points per hertz for good representation of the line shape; and by acquiring data with the best possible signal-to-noise ratio.

For 1D magnetization transfer experiments (Section 3.2.), it is important to ensure high-quality selective pulses both for selective inversion and for saturation. The pulse power should be carefully calibrated and adjusted to avoid spill-over effects; a shaped pulse can sometimes be helpful, since many spectrometers suffer from "unclean" selective square pulses.

All the general conditions required for the acquisition of good 2D data apply to the 2D experiments described here (see Chapter 2). Postacquisition data processing has now reached (but by no means "plateaued" at) such a state of sophistication that many alternatives to discrete Fourier transformation are available. These alternative data-processing methods (3,4) are geared toward enhancing the resolution and signal-to-noise ratio of a data set. Such methods can be particularly beneficial, since many data sets are wasted as a result of a poor processing approach.

3. NMR Techniques for Qualitative and Quantitative Analysis of Dynamic and Exchange Processes

This section discusses the NMR techniques that are used for determining exchange rates or for observing dynamic processes. One very important dynamic process that has been studied extensively using a combination of all the techniques discussed in this section is protein unfolding; hence, a separate section is devoted to it.

3.1. Chemical Exchange

When discussing dynamic processes, it is necessary to define a time scale.
The physical event being detected, for example, the chemical shift in an NMR spectrum, specifies a particular time scale, and all molecular events happening on this time scale will be directly reflected in its measurement (e.g., the observed chemical shift). Table 2 gives the definition of slow, intermediate, and fast exchange rates on the NMR time scale when the measured variable is either chemical shift, $\delta$, coupling constant, $J$, or relaxation time, $T_1$ or $T_2$. It is clear that the three parameters define quite different, although partially overlapping, time scales.

Table 2. NMR Time Scale

                                                          Exchange rate
Time scale                                    Slow                        Intermediate                   Fast
Chemical shift, $\delta$ (a)                  $k \ll \delta_A - \delta_B$        $k \approx \delta_A - \delta_B$        $k \gg \delta_A - \delta_B$
Coupling constant, $J$ (b)                    $k \ll J_A - J_B$                  $k \approx J_A - J_B$                  $k \gg J_A - J_B$
$T_2$ relaxation ($\Delta\nu_{1/2} = 1/\pi T_2$) (c)    $k \ll 1/T_{2A} - 1/T_{2B}$ (d)    $k \approx 1/T_{2A} - 1/T_{2B}$        $k \gg 1/T_{2A} - 1/T_{2B}$

(a) $(\delta_A - \delta_B)$ is typically of the order of hundreds of Hz.
(b) $J$ is typically of the order of 1-10 Hz.
(c) $\Delta\nu_{1/2}$ = linewidth at half height.
(d) $|1/T_{2A} - 1/T_{2B}|$ is typically of the order of 1-20 Hz for proteins.

The chemical shift is expressed as $\omega$ in rad s$^{-1}$; that is, if the shift is $\delta$ ppm and the spectrometer frequency is $\omega_0$ MHz, then $\omega = \delta \times 2\pi \times \omega_0$ rad s$^{-1}$. Thus, an exchange process that is fast at a low spectrometer frequency can sometimes be found to be in intermediate or even slow exchange on higher frequency spectrometers.

Consider two very common biochemical situations, one where A and B are interconverting forms of a molecule, and the other where ligand A binds to a macromolecule B to give a complex AB:

$$ A \underset{k_b}{\overset{k_a}{\rightleftharpoons}} B \qquad (1) $$

$$ A + B \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} AB \qquad (2) $$

The spectral parameters of A are $\delta_A$, $J_A$, $1/T_{2A}$, and so on, with a similar set for B and for AB. In the case of (1), assume for simplicity that $k_a = k_b = k$. The appearance of the NMR spectrum, as illustrated schematically in Fig. 1, depends on the lifetimes of states A and B (for [1]) and of states A and AB (for [2]), A and AB being either the free and bound forms of the ligand, the free and bound forms of a protein, or the protonated and unprotonated forms of an amino acid side chain. If the exchange rate is slow (Fig. 1A), separate signals are observed for the molecule in each state; if the exchange rate is fast (Fig. 1E), a single (averaged) signal is observed. At intermediate exchange, that is, where the chemical shift difference between the resonances in the absence of exchange is comparable to the exchange rate, non-Lorentzian line shapes are observed that depend on the value of the exchange rate and on the chemical shift difference (Figs. 1B, C, and D).

Fig. 1. Change in chemical shifts and linewidths in the presence of chemical exchange between two equally populated environments. (A) Slow exchange; (B), (C), and (D) progressively more rapid intermediate rates of exchange; (E) fast exchange. Each panel is labeled with the value of $k$ relative to $|\omega_A - \omega_B|$.
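The behavior sketched in Fig. 1 can be reproduced numerically. The following is a minimal sketch, not the author's procedure: it assumes the standard two-site Bloch-McConnell line shape for process (1) with equal populations and $k_a = k_b = k$, and every numerical value (a 0.2 ppm shift difference at 500 MHz, a 3 Hz natural linewidth) is hypothetical. It also applies the ppm-to-rad s$^{-1}$ conversion described above.

```python
import math

def two_site_spectrum(k, delta_ppm=0.2, spectrometer_mhz=500.0,
                      linewidth_hz=3.0, span_hz=150.0, step_hz=0.5):
    """Absorption line shape for A <-> B, equal populations, k_a = k_b = k (s^-1).

    Returns (frequencies in Hz, intensities). All defaults are illustrative.
    """
    # ppm x MHz gives the shift difference in Hz; times 2*pi gives rad/s.
    delta_hz = delta_ppm * spectrometer_mhz
    omega_a = 2.0 * math.pi * (+delta_hz / 2.0)
    omega_b = 2.0 * math.pi * (-delta_hz / 2.0)
    r2 = math.pi * linewidth_hz          # from linewidth = 1/(pi*T2)
    p_a = p_b = 0.5

    n_steps = int(round(2.0 * span_hz / step_hz))
    freqs = [-span_hz + i * step_hz for i in range(n_steps + 1)]
    spectrum = []
    for nu in freqs:
        omega = 2.0 * math.pi * nu
        a = complex(r2 + k, omega - omega_a)   # site A: relaxation + exchange + offset
        d = complex(r2 + k, omega - omega_b)   # site B
        # Closed-form inverse of the 2x2 exchange matrix, summed over both sites.
        value = ((d + k) * p_a + (a + k) * p_b) / (a * d - k * k)
        spectrum.append(value.real)
    return freqs, spectrum

def count_peaks(values):
    """Number of local maxima in a list of intensities."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i] > values[i - 1] and values[i] >= values[i + 1])

_, slow = two_site_spectrum(k=5.0)       # k much smaller than the shift difference
_, fast = two_site_spectrum(k=5000.0)    # k much larger than the shift difference
print(count_peaks(slow), count_peaks(fast))
```

With these assumed values the shift difference is 100 Hz (about 628 rad s$^{-1}$), so the two cases sit well on either side of coalescence. Rerunning the function at a higher `spectrometer_mhz` with the same `k` shows how a process that looks fast at low field can move toward intermediate exchange.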


What can be deduced from the line shape in each of the different exchange situations? Consider a situation where the signals are single nonoverlapping Lorentzian lines without a multiplet structure, as in Fig. 1. Under slow exchange, the exchange rates are deduced simply from the broadening of each resonance; under fast exchange, the spectrum generally contains no measurable information about exchange rates other than the implicit fact that these rates must be much larger than the chemical shift differences in the absence of exchange. For intermediate exchange, a detailed comparison of the observed line shapes with those predicted by analytical expressions is necessary to determine the exchange rate (5). In the present chapter, however, a simplified approach is taken where quantitative binding constants are to be determined. Because the two processes (1) and (2) just described are somewhat different in terms of their analysis by NMR, they are treated separately.

3.1.1. Slow Exchange

3.1.1.1. Process (1)

The condition for slow exchange is as shown in Table 2. The observed linewidths of A and B are governed by the following transverse relaxation times, bearing in mind that the linewidth at half height is $\nu_{1/2} = 1/\pi T_2$:

$$ 1/T_{2A,obs} = 1/T_{2A} + k_a \quad \text{and} \quad 1/T_{2B,obs} = 1/T_{2B} + k_b \qquad (3) $$

The magnetization of A (or B) decays as it would in the absence of exchange ($1/T_{2A}$ or $1/T_{2B}$), with an additional relaxation process caused by the exchange at rate $k_a$ (or $k_b$).

3.1.1.2. Process (2)

When monitoring only the ligand resonances A and AB, the transverse relaxation times are given by:

$$ 1/T_{2A,obs} = 1/T_{2A} + k_1[B] \quad \text{and} \quad 1/T_{2AB,obs} = 1/T_{2AB} + k_{-1} \qquad (4) $$

[B] being the concentration of the macromolecule. The range of $k$ that can be measured with these equations is 10-$10^2$ s$^{-1}$, since relaxation is related to observable linewidths. Note that the linewidth of the signal from the free ligand is concentration-dependent, whereas that of the bound ligand is not.
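As a worked illustration of Eqs. (3) and (4), the sketch below converts an exchange broadening in Hz into a rate using $\nu_{1/2} = 1/\pi T_2$, so that $k = \pi(\nu_{1/2,obs} - \nu_{1/2})$. The function names and all linewidths and concentrations are hypothetical, chosen only so that the rates fall in the measurable range quoted above.

```python
import math

def exchange_rate_from_linewidths(width_obs_hz, width_no_exchange_hz):
    """Rate (s^-1) from Eq. (3): 1/T2,obs = 1/T2 + k, with linewidth = 1/(pi*T2)."""
    return math.pi * (width_obs_hz - width_no_exchange_hz)

def association_rate_constant(width_obs_hz, width_free_hz, macromolecule_conc_molar):
    """k_1 (M^-1 s^-1) from Eq. (4): 1/T2A,obs = 1/T2A + k_1[B]."""
    pseudo_first_order = exchange_rate_from_linewidths(width_obs_hz, width_free_hz)
    return pseudo_first_order / macromolecule_conc_molar

# Bound-ligand line broadened from 7 to 12 Hz: k_-1 = 5*pi s^-1.
k_off = exchange_rate_from_linewidths(12.0, 7.0)
# Free-ligand line broadened from 2 to 6 Hz in the presence of 1 mM macromolecule.
k_on = association_rate_constant(6.0, 2.0, 1.0e-3)
print(round(k_off, 2), round(k_on, 1))
```

The second function makes the concentration dependence noted in the text explicit: the broadening of the free-ligand line scales with [B], whereas the bound-ligand broadening gives $k_{-1}$ directly.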

3.1.2. Fast Exchange

3.1.2.1. Process (1)

A and B interconvert sufficiently fast to make resonances A and B indistinguishable, and a new resonance is observed that exhibits the weighted average of the observable NMR parameter, $P$, in the two states:

$$ P_{obs} = p_A P_A + p_B P_B \qquad (5) $$

where $p_A$ and $p_B$ are the mole fractions of the A and B species present in solution, with $p_A + p_B = 1$.

3.1.2.2. Process (2)

When monitoring resonances of species A and AB (which can be free and bound forms of either ligand or protein), the observed NMR parameter $P$ ($\delta_A$, $\delta_{AB}$, $1/T_{2A}$, $1/T_{2AB}$, and so on) will be a weighted average of the A and AB parameters:

$$ P_{obs} = p_A P_A + p_{AB} P_{AB} \qquad (6) $$

where $p_A = [A]/A_{tot}$, $p_{AB} = [AB]/A_{tot}$, $A_{tot} = [A] + [AB]$, $B_{tot} = [B] + [AB]$, and $p_A + p_{AB} = 1$. It is possible to obtain the ligand-binding constant by analyzing the behavior of one of the measurable NMR parameters, e.g., chemical shift, as a function of the ligand concentration at constant macromolecular concentration. One way of doing this is to express the changes in chemical shift in a form from which $K_d$, the dissociation constant, can be readily obtained. Let:

$$ \Delta_0 = P_{AB} - P_A \quad \text{and} \quad \Delta = P_{obs} - P_A \qquad (7) $$

where $P_A$ is the observable NMR parameter (here the chemical shift) of the free species (ligand or protein) and $P_{AB}$ is the chemical shift of the bound form (given by the shift as $p_{AB} \to 1$, i.e., at zero free ligand or protein concentration). Since $p_A + p_{AB} = 1$, it is possible to write:

$$ \Delta = \Delta_0\, p_{AB} \qquad (8) $$
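Equations (6) to (8) amount to a population-weighted average and its inversion. The short sketch below illustrates both directions for a chemical shift; the shift values (7.20 ppm free, 7.80 ppm bound) and the bound fraction are hypothetical, and the function names are introduced here for illustration only.

```python
def observed_parameter(p_free, p_bound, fraction_bound):
    """Eq. (6): population-weighted average of the free and bound values."""
    return (1.0 - fraction_bound) * p_free + fraction_bound * p_bound

def fraction_bound_from_shift(p_obs, p_free, p_bound):
    """Invert Eq. (8): p_AB = Delta / Delta_0, with Delta and Delta_0 from Eq. (7)."""
    delta_0 = p_bound - p_free
    delta = p_obs - p_free
    return delta / delta_0

# One quarter of species A bound: the averaged shift lies a quarter of the way
# from the free shift towards the bound shift.
shift_obs = observed_parameter(7.20, 7.80, 0.25)
p_ab = fraction_bound_from_shift(shift_obs, 7.20, 7.80)
print(round(shift_obs, 3), round(p_ab, 3))
```

In practice the bound shift is rarely measured directly; it is the limiting value approached as $p_{AB} \to 1$, which is why $\Delta_0$ is usually treated as a fitted quantity alongside $K_d$.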

At equilibrium, the dissociation constant $K_d$ can be written as:

$$ K_d = [A][B]/[AB] = k_{-1}/k_1 \qquad (9) $$

Therefore, $p_{AB}$ can be expressed in terms of the dissociation constant and concentrations as:

$$ p_{AB} = \frac{[AB]}{A_{tot}} = \frac{[A][B]}{A_{tot} K_d} = \frac{[B]}{[B] + K_d} \qquad (10) $$

or

$$ p_{AB} = \frac{[A]\, B_{tot}}{A_{tot}\{[A] + K_d\}} \qquad (11) $$

At this point, it is important to make a distinction between two common situations:

(1) The species A, whose NMR parameter is observed, is held at constant concentration, for example, at constant protein concentration, $A_{tot}$, and variable ligand concentration, $B_{tot}$. The change in chemical shift $\Delta$ can be expressed as:

$$ \Delta = \frac{\Delta_0 [B]}{[B] + K_d} \qquad (12) $$

Eq. (12) resembles a Michaelis-Menten equation, and the standard graphical methods of analyzing this type of data can be used. A plot of the dependence of the change in chemical shift, $\Delta$, on the ligand concentration, [B], is a hyperbola (Fig. 2A). The dissociation constant $K_d$ can be deduced directly from this plot; it is the concentration of the ligand that gives half-maximal binding, that is, the concentration at $\Delta_0/2$.

(2) The species A, whose parameter is observed, is varied in concentration; for example, A is the ligand that is added to a constant protein concentration $B_{tot}$. Eq. (11) is now applicable; this equation is less simple to analyze, but if $[A] \gg [AB]$, so that $[A] \approx A_{tot}$, Eq. (11) simplifies to give:

$$ \Delta = \Delta_0\, p_{AB} = \frac{\Delta_0 B_{tot}}{A_{tot} + K_d} \qquad (13) $$

A plot of the change in chemical shift as a function of ligand concentration gives a rectangular hyperbola (Fig. 2B); the binding constant can now be deduced using a best-fit curve analysis.

Fig. 2. (A) A theoretical plot of the change in chemical shift of a protein resonance as a function of ligand concentration (mM). These data are obtained for the experiment in which variable concentrations of a ligand are added to a constant concentration of the protein and a resonance associated with the protein is monitored. (B) A theoretical plot of the change in chemical shift of the ligand resonance as a function of ligand concentration, obtained at constant protein concentration.

3.1.3. Moderately Fast Exchange

3.1.3.1. Process (1)

Figure 1 (B, C, and D) shows the situation where the exchange rate is in this regime; line broadening of up to about six times the linewidth in the fast exchange regime is observed. No detailed analysis is possible.

3.1.3.2. Process (2)

Instead of the observed resonance being either a simple weighted average of the chemical shifts of A and AB, or two distinct resonances for A and AB, the observed chemical shift changes progressively from that of AB to that of A as the ligand concentration is increased (at constant protein concentration). The linewidth does not change steadily as a function of ligand concentration, but rather passes through a maximum, reflecting a contribution from the exchange process to the linewidth. The observed linewidth $\nu_{1/2}$ ($1/\pi T_2 = \nu_{1/2}$) is given by:

$$ \frac{1}{T_{2,obs}} = \frac{p_A}{T_{2A}} + \frac{p_{AB}}{T_{2AB}} + \frac{4\pi^2 p_A p_{AB} \Delta_0^2}{k_1[B] + k_{-1}} \qquad (14) $$

The first two terms represent the weighted average of the relaxation rates in the two states; the third term accounts for the exchange contribution. The range of $k_1$ that can be detected depends on $\Delta_0$, usually being in the range $10^2$-$10^5$ s$^{-1}$. To obtain an accurate binding constant from the observed linewidth, a full line-shape analysis is required (5).
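The ligand-observed titration of situation (2) can be sketched numerically to show both the progressive shift change and the linewidth maximum. This is an illustration under stated assumptions rather than the chapter's analysis: the exact bound concentration is taken from the usual quadratic solution of the mass balance (not given in the text), [B] in Eq. (14) is taken to be the free protein concentration, $k_1$ is obtained from $K_d = k_{-1}/k_1$, and all numerical values are hypothetical.

```python
import math

def bound_complex(a_tot, b_tot, kd):
    """[AB] from mass balance and Kd = [A][B]/[AB] (smaller root of the quadratic)."""
    s = a_tot + b_tot + kd
    return (s - math.sqrt(s * s - 4.0 * a_tot * b_tot)) / 2.0

def ligand_titration_point(a_tot, b_tot, kd, k_off, delta0_hz,
                           width_free_hz, width_bound_hz):
    """Shift change (Hz) and linewidth (Hz) of the ligand resonance.

    Concentrations and kd share one unit (mM here); k_off is k_-1 in s^-1.
    """
    ab = bound_complex(a_tot, b_tot, kd)
    p_ab = ab / a_tot
    p_a = 1.0 - p_ab
    k_on = k_off / kd                      # from Kd = k_-1 / k_1
    k_ex = k_on * (b_tot - ab) + k_off     # k_1[B] + k_-1, [B] = free protein
    shift_change = delta0_hz * p_ab        # Eq. (8)
    r2 = (p_a * math.pi * width_free_hz
          + p_ab * math.pi * width_bound_hz
          + 4.0 * math.pi ** 2 * p_a * p_ab * delta0_hz ** 2 / k_ex)  # Eq. (14)
    return shift_change, r2 / math.pi

ligand_mM = [0.1, 0.5, 1.0, 2.0, 3.0, 5.0, 10.0]
points = [ligand_titration_point(a, b_tot=1.0, kd=0.1, k_off=500.0, delta0_hz=50.0,
                                 width_free_hz=2.0, width_bound_hz=10.0)
          for a in ligand_mM]
shifts = [p[0] for p in points]
widths = [p[1] for p in points]
for conc, shift, width in zip(ligand_mM, shifts, widths):
    print(f"{conc:5.1f} mM  shift change {shift:6.2f} Hz  linewidth {width:5.2f} Hz")
```

With these assumed parameters the shift change falls steadily as ligand is added, while the linewidth rises and then falls again. At the highest ligand concentration the exact result is close to the approximation of Eq. (13), as expected once $[A] \gg [AB]$.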

3.2. Magnetization Transfer

NMR spectroscopy provides the ability to measure rate constants by monitoring a system at equilibrium. Examples include: reaction pathways, which can be deduced by following the transfer of nuclei between two positions, one on the substrate and one on the product molecule; and the estimation of the exchange rates of labile hydrogens in peptides and proteins with the hydrogens in the bulk water, which provides valuable information concerning the conformational and dynamic properties of these molecules. When rate constants are in the slow exchange regime, $10^{-2}$-$10^2$ s$^{-1}$, the magnetization transfer technique can be used for their determination.

Consider a slow exchange process $A + B \rightleftharpoons AB$. Perturbation of one resonance by selective irradiation, for example, the resonance of A, will cause changes in the intensity of the other observable resonance, in this case that of AB, owing to transfer of magnetization from one to the other as a result of exchange. The three magnetization transfer experiments commonly used are: saturation transfer, inversion transfer, and 2D exchange. Only the two-site exchange case is discussed here, this being rather more straightforward than a multisite case, in which extra care must be taken in the analysis to account for as many of the processes involved as possible.

3.2.1. Saturation Transfer

Using the aforementioned slow-exchange process as an example, if resonance A is saturated, the fractional change in intensity of the AB resonance at steady state is given by the equation

$$ I'_{AB}/I_{AB} = R_{1AB}/(R_{1AB} + k_{-1}) \qquad (15) $$

where $I_{AB}$ and $I'_{AB}$ are the intensities of the AB resonance before and after irradiation of the A resonance, respectively, and $R_{1AB}$ is the longitudinal relaxation rate† of AB, which can be determined independently. The advantage of the saturation transfer method in the steady state over the inversion transfer approach (see following section) is that the time-course of the intensity of only one signal (AB) needs to be analyzed. The saturation transfer experiment can also be used qualitatively; for example, systematic irradiation throughout the relevant region of the spectrum permits location of the resonance(s) of bound ligand by observing selective decreases in intensity of the corresponding resonance(s) of the free ligand.

3.2.2. Inversion Transfer

This experiment is performed in the same manner as the saturation transfer experiment previously described, with the exception that a selective 180° pulse is used to completely invert a selected resonance. The pulse sequence is 180° (selective) - t - 90° (nonselective) - acquisition, t being a variable delay. The pulse sequence is repeated for different t values. Although the inversion transfer approach affords more experimental information concerning the rate constants involved, and covers a larger range of rates than the saturation transfer approach, its major drawback is the multiexponential time dependence of the signal intensities (6). This disadvantage excludes simple data analyses based on semilogarithmic plots and initial slopes. A computerized nonlinear least-squares analysis using a complete theoretical model has to be used for correct estimation of the rate constants.

3.2.3. 2D Exchange
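Before the 2D experiment is considered, the steady-state saturation transfer relation of Section 3.2.1. (Eq. [15]) can be illustrated with a short numerical sketch. The values used are hypothetical, and the function names are introduced here for illustration only; the second function simply rearranges Eq. (15) to $k_{-1} = R_{1AB}(I_{AB}/I'_{AB} - 1)$.

```python
def steady_state_ratio(r1_ab, k_off):
    """Eq. (15): I'_AB / I_AB at steady state with resonance A saturated."""
    return r1_ab / (r1_ab + k_off)

def k_off_from_ratio(ratio, r1_ab):
    """Eq. (15) rearranged for k_-1, given an independently measured R_1AB."""
    return r1_ab * (1.0 / ratio - 1.0)

# Hypothetical case: R_1AB = 2 s^-1 and k_-1 = 3 s^-1 leave 40% of the intensity.
ratio = steady_state_ratio(r1_ab=2.0, k_off=3.0)
k_recovered = k_off_from_ratio(ratio, r1_ab=2.0)

# When k_-1 is much smaller than R_1AB the intensity barely changes,
# so the measurement becomes insensitive to the rate.
weak_effect = steady_state_ratio(r1_ab=2.0, k_off=0.02)
print(ratio, round(k_recovered, 6), round(weak_effect, 3))
```

The last line of the example shows why the method works best when the rate constant is comparable to or larger than the relaxation rate: a 1% change in intensity is difficult to measure reliably.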

From the experimental point of view, both the 1D saturation and inversion transfer methods described earlier have major selectivity and experimental-time disadvantages, particularly for macromolecules. In addition, in the case of the saturation transfer approach, the rate constant needs to be greater than the relaxation rate, $R_1$. Clearly, the 2D magnetization transfer (2D exchange) experiment, which uses the same pulse sequence as the 2D NOESY experiment (see Chapter 2), is more efficient; it allows the entire matrix of all the exchange processes in a system to be obtained from a single experiment.

†The relaxation rate, $R_{1AB}$, is the same as $1/T_{1AB}$ used previously, but is used in this form for convenience in subsequent equations.

To illustrate the analysis of data obtained from a 2D exchange experiment, the example used here is the determination of the rate of hydrogen exchange of the labile hydrogens of a peptide with water (7). The 2D exchange spectra using various mixing times are acquired by means of an observation pulse that does not excite the water signal, such as a Redfield pulse or a 1331 pulse. Analysis of the variation of intensities of the cross- and diagonal-peaks with mixing time can be simplified by making some assumptions:

1. This is a simple, two-site exchange, A(NH) $\rightleftharpoons$ B(H2O);
2. The mole fraction, $X$, of NH is much smaller than the mole fraction of H2O, that is, $X_{NH} \ll X_{H_2O}$ and $X_{H_2O} \approx 1$; and
3. The normalized rate constant, $k$, is given by $k = k_A X_A = k_B X_B$.

The original equations (8), which describe the dependence of the kinetics on mixing time, $t_m$, can be reduced to:

$$ a_{AA} = X_A \exp[-(R_{1A} + k)\,t_m] $$

$$ a_{BA} = \frac{X_A k}{R_{1A} + k - R_{1B}} \{\exp(-R_{1B} t_m) - \exp[-(R_{1A} + k)\,t_m]\} \qquad (16) $$

where $a_{AA}$ and $a_{BA}$ are the mixing coefficients, $a_{AA}$ being proportional to the intensity (or volume) of the diagonal peak and $a_{BA}$ being proportional to the intensity (or volume) of the cross peak. $R_{1A}$ is the spin-lattice relaxation rate of A (the reciprocal of its spin-lattice relaxation time). An example of a plot of the scaled intensities (scaled to account for the variations in linewidths and peak heights in the 1D spectrum) against the mixing time $t_m$ is shown in Fig. 3. Considering only the diagonal peak, the time dependence of the intensity gives a value of $(R_{1A} + k)$ for each resonance. This value can then be used in Eq. (16) to determine $k$, since $R_{1B}$, the spin-lattice relaxation rate of the water, can be obtained using the inversion-recovery method.

3.3. Isotope Exchange Methods
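As an aside on the 2D exchange analysis of Section 3.2.3., the two-step procedure described there (decay rate from the diagonal peak, then $k$ from the cross peak) can be sketched as follows. The sketch assumes the reduced two-site expressions $a_{AA} = X_A \exp[-(R_{1A}+k)t_m]$ and $a_{BA} = [X_A k/(R_{1A}+k-R_{1B})]\{\exp(-R_{1B}t_m) - \exp[-(R_{1A}+k)t_m]\}$, noiseless synthetic data, and hypothetical rate values. Taking the ratio $a_{BA}/a_{AA}$ removes the unknown $X_A$.

```python
import math

def diagonal_coefficient(t_m, x_a, r1_a, k):
    """a_AA: decays with the combined rate R1A + k."""
    return x_a * math.exp(-(r1_a + k) * t_m)

def cross_coefficient(t_m, x_a, r1_a, r1_b, k):
    """a_BA: builds up through exchange, then decays with R1B."""
    decay = r1_a + k
    return x_a * k / (decay - r1_b) * (math.exp(-r1_b * t_m) - math.exp(-decay * t_m))

# Hypothetical values: amide R1A = 2 s^-1, water R1B = 0.5 s^-1, k = 3 s^-1.
x_a, r1_a, r1_b, k_true = 0.001, 2.0, 0.5, 3.0
mixing_times = [0.05, 0.10, 0.20, 0.40]
diagonal = [diagonal_coefficient(t, x_a, r1_a, k_true) for t in mixing_times]
cross = [cross_coefficient(t, x_a, r1_a, r1_b, k_true) for t in mixing_times]

# Step 1: the slope of ln(a_AA) against t_m is -(R1A + k).
decay_rate = -(math.log(diagonal[-1]) - math.log(diagonal[0])) / (
    mixing_times[-1] - mixing_times[0])

# Step 2: a_BA / a_AA = [k / (D - R1B)] * (exp((D - R1B) * t_m) - 1), D = R1A + k,
# so k follows from one cross/diagonal ratio once R1B is known independently.
t = mixing_times[1]
ratio = cross[1] / diagonal[1]
k_estimate = ratio * (decay_rate - r1_b) / (math.exp((decay_rate - r1_b) * t) - 1.0)
print(round(decay_rate, 6), round(k_estimate, 6))
```

With real data the slope in step 1 would come from a fit over all mixing times, and $k$ would be averaged over several cross/diagonal ratios rather than taken from a single point.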

The study of enzyme kinetics by the observation of isotope exchange is very similar to that using the magnetization transfer approach (9). One way of using isotope labels is to introduce the isotopic label chemically into the molecule under study and then to observe the transfer of this label to other molecules via NMR. Isotope exchange experiments are independent of NMR relaxation times, since they depend only on the concentration of the permanently labeled chemical groups. The inverse of the exchange rate for isotope exchange experiments should be approximately of the order of magnitude of the time needed to acquire a good NMR spectrum of the sample.

Fig. 3. Representative plot of the variation of intensity (or volume) of the diagonal and cross peaks as a function of mixing time, $t_m$ (0-1.0 s), in a 2D magnetization transfer experiment.

The simplest way in which exchange of isotopes may be followed using NMR is to monitor the appearance or disappearance of a signal by direct detection of a magnetically active nucleus. For example, one can detect proton replacement by a deuterium atom from the disappearance of the relevant proton resonance (see Section 3.5.). It is also possible to follow the exchange of isotopes of nuclei that are inaccessible to NMR detection by observing their influence on nuclei that are easily detected; for example, the replacement of $^{18}$O by $^{16}$O can be probed by means of $^{31}$P resonance multiplicity characteristics. Other indirect methods of isotope detection include the spin-echo technique, which allows the observation of the exchange of spin-1/2 nuclei of low sensitivity (such as $^{13}$C, $^{15}$N) via attached protons (10).

3.4. Relaxation Time Measurements

The three main NMR relaxation parameters (spin-lattice relaxation time, $T_1$; spin-spin relaxation time, $T_2$; and dipolar relaxation rate, NOE) have long been used to provide a dynamic description of protein structures. The dynamic aspects of a protein influence these relaxation parameters, and an appreciation of how these parameters in turn affect the final NMR data acquired is important for the design and execution of experiments and for the interpretation of data. However, this approach is a complex and difficult one, since the problem is not merely one of determining rates of motion within the framework of a particular dynamic model, but one of formulating an actual description of the motion. Nevertheless, qualitative analyses of relaxation effects can be carried out, and some of these are described here.

For example, in the case of smaller proteins (mol wt <20 kDa), having a typical rotational correlation time for overall motion of $10^{-9}$ to $10^{-8}$ s, observation of linewidths narrower than those expected from the mol wt can be explained by side chain motion, especially the rotation of methyl groups. Rotation about a single bond in the side chain takes place in the range $10^{-9}$ to $10^{-10}$ s. Any observed differences in the linewidths of, for example, the different methyl groups may reflect either a difference in their rates of rotation, the presence of additional motions, or a difference in the number of neighboring atoms contributing to relaxation. To distinguish between the different possibilities, a detailed analysis of the relaxation parameters, especially the nuclear Overhauser effects, is necessary.

In large molecules (mol wt >30 kDa), where the rotational correlation times are typically $10^{-8}$ to $10^{-6}$ s and NMR linewidths are >50 Hz, the appearance of linewidths of 5-20 Hz in the spectrum can be taken to indicate the presence of more extensive motions in addition to individual side chain rotations. Examples are the existence of a random-coil segment in a protein or of structured subdomains, with internal motions, in a multidomain protein (11).

Experimentally, it is straightforward to measure the spin-lattice relaxation time ($T_1$) and the spin-spin relaxation time ($T_2$). The spin-lattice relaxation time is commonly measured by the inversion recovery method, using the pulse sequence 180° - t - 90° - acquire, where t is a variable delay. The amplitude of the signal after Fourier transformation is given by:

$$ A(t) = A_\infty - (A_\infty - A_0)\,e^{-t/T_1} \qquad (17) $$

where $A_\infty$ is the thermal equilibrium value and $A_0$ the value immediately after inversion ($A_0 \approx -A_\infty$, in which case $A(t) = A_\infty - 2A_\infty e^{-t/T_1}$).

The simplest pulse sequence for measuring $T_2$ (when the magnetic field is inhomogeneous, as in the majority of cases) is the spin-echo experiment. The pulse sequence is 90° - t - 180° - t - acquire, where t is a variable delay. Further details of these experiments are given in most standard NMR textbooks.

3.5. Protein Folding and Unfolding
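As a brief aside on the inversion recovery measurement of Section 3.4., which also supplies the $R_1$ values needed for the magnetization transfer analyses used below, the following sketch extracts $T_1$ from a recovery curve. It assumes the single-exponential form $A(t) = A_\infty - (A_\infty - A_0)e^{-t/T_1}$, noiseless synthetic amplitudes, and a known $A_\infty$; the delays and the $T_1$ value are hypothetical.

```python
import math

def inversion_recovery(t, a_inf, t1, a_0=None):
    """Signal amplitude after delay t; a_0 defaults to perfect inversion (-a_inf)."""
    if a_0 is None:
        a_0 = -a_inf
    return a_inf - (a_inf - a_0) * math.exp(-t / t1)

def t1_from_recovery(delays, amplitudes, a_inf):
    """Least-squares slope of ln(a_inf - A(t)) against t; the slope is -1/T1."""
    ys = [math.log(a_inf - a) for a in amplitudes]
    n = len(delays)
    mean_t = sum(delays) / n
    mean_y = sum(ys) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(delays, ys))
             / sum((t - mean_t) ** 2 for t in delays))
    return -1.0 / slope

delays = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]          # variable delay t, in seconds
signal = [inversion_recovery(t, a_inf=100.0, t1=0.5) for t in delays]
t1_fit = t1_from_recovery(delays, signal, a_inf=100.0)

# For perfect inversion the signal passes through zero at t = T1 * ln 2.
null_signal = inversion_recovery(0.5 * math.log(2.0), a_inf=100.0, t1=0.5)
print(round(t1_fit, 6), abs(null_signal) < 1e-9)
```

The semilogarithmic fit is adequate for clean data with a well-determined $A_\infty$; with noisy data or imperfect inversion, a three-parameter nonlinear fit of $A_\infty$, $A_0$, and $T_1$ is generally preferable.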

One very important dynamic process that has been successfully studied using NMR is that of protein unfolding. The dynamics of structural changes can be investigated over a wide range of time scales using magnetization transfer, hydrogen exchange, line-shape analysis, and relaxation methods. Many of the methods used depend on the ability to monitor changes in the structural environment of individual protons and on a knowledge of the specific assignment of the proton resonances. The three aspects of protein folding/unfolding most closely examined using NMR are the dynamics of protein folding/unfolding, protein stability via hydrogen exchange kinetics, and the structural characterization of folding/unfolding intermediates in addition to the unfolded form.

3.5.1. Dynamics of Protein Folding

The interconversion between the folded and unfolded forms of a protein is usually slow on the NMR time scale, giving rise to separate lines for the native and denatured states at equilibrium; the spectrum at an intermediate denaturant concentration or temperature is a superposition of the native and unfolded spectra. Generally, two types of spectrum can be obtained when studying folding/unfolding transitions at equilibrium: spectra in which all the resonances can be attributed to either the folded or the unfolded state, and spectra that contain additional resonances from a discrete partially folded state, the latter case being illustrated in Fig. 4 for the refolding of β-lactamase. The absence of intermediate state resonances can simply imply that the intermediates are either short-lived, in low population, or in rapid exchange with either the folded or the unfolded conformation.

Fig. 4. $^1$H spectrum showing the refolding of β-lactamase at 294 K. (A) Native state, (B) "stable" intermediate refolding state, and (C) unfolded state. The His resonances of the intermediate state are marked in the figure.

Quantitative kinetic information on the structural exchange between the unfolded and folded forms of a protein at equilibrium can be obtained using a time-resolved saturation transfer method (see Section 3.2.1.). A resolved signal in the spectrum of the unfolded protein is irradiated, and the intensity of the corresponding signal in the folded state is normalized and then plotted as a function of the saturation time, t. In a two-state model, the intensity decays exponentially from the equilibrium value $M_0$ to a limiting value $M_\infty$, where $M_\infty/M_0 = R_f/(R_f + k_u)$ (see Eq. [15]), $R_f$ being the relaxation rate in the folded form and $k_u$ the rate of unfolding. Conversely, if the resonance in the folded state is irradiated and that in the unfolded state observed, the rate of folding $k_f$ can be measured. The saturation transfer experiment can, in addition, reveal the existence of multiple folded and unfolded conformations (12). The presence of multiple forms must be taken into consideration when quantitative analysis of the saturation transfer experiment is undertaken.

3.5.2. Hydrogen Exchange and Protein Stability

The hydrogen exchange rates of the amide protons, which are often involved in the internal hydrogen bonds of a protein, are highly informative for studying its internal motions and for monitoring its stability. When individually assigned protons are observed by ¹H NMR, hydrogen exchange is seen, which can be caused by a wide spectrum of fluctuations ranging from local distortions that break a small number of hydrogen bonds to global transitions approaching general unfolding. Where major structural unfolding is responsible for the exchange of internal labile hydrogen atoms with the solvent, as is generally the case when destabilizing conditions exist, a structural unfolding model is most frequently used to derive quantitative rate values:

$$\mathrm{N(H)} \underset{k_f}{\overset{k_u}{\rightleftharpoons}} \mathrm{U(H)} \xrightarrow[\text{(in D}_2\text{O)}]{k_c} \mathrm{U(D)} \underset{k_u}{\overset{k_f}{\rightleftharpoons}} \mathrm{N(D)} \tag{18}$$

N and U are the native and unfolded states of a protein, respectively, $k_u$ is the unfolding rate, $k_f$ the folding rate, and $k_c$ the rate of chemical exchange from the unfolded state. Peptide hydrogen exchange is either acid- or base-catalyzed, and $k_c$ can be written as:

$$k_c = k_H[\mathrm{H^+}] + k_{OH}[\mathrm{OH^-}] \tag{19}$$
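As a rough illustration of the acid- and base-catalyzed exchange expression above, the following sketch evaluates the exchange rate constant $k_c$ at a few pH values. It uses the order-of-magnitude rate constants quoted in this section (acid term 1 s⁻¹ M⁻¹, base term 10⁸ s⁻¹ M⁻¹). The ion product of water (1e-14) and the function names are assumptions introduced only for illustration, and no distinction is made between H₂O and D₂O.

```python
import math

K_H = 1.0      # acid-catalysis rate constant, s^-1 M^-1 (order of magnitude from the text)
K_OH = 1.0e8   # base-catalysis rate constant, s^-1 M^-1 (order of magnitude from the text)
K_W = 1.0e-14  # assumed ion product of water near 25 C (not from the text)


def exchange_rate(ph, k_h=K_H, k_oh=K_OH, k_w=K_W):
    """Return k_c = k_H[H+] + k_OH[OH-] in s^-1 for a given pH."""
    h = 10.0 ** (-ph)
    oh = k_w / h
    return k_h * h + k_oh * oh


def ph_of_minimum_rate(k_h=K_H, k_oh=K_OH, k_w=K_W):
    """pH at which the acid and base terms are equal, where k_c is smallest."""
    # d(k_c)/d[H+] = 0  ->  [H+]^2 = k_OH * K_w / k_H
    h_min = math.sqrt(k_oh * k_w / k_h)
    return -math.log10(h_min)


if __name__ == "__main__":
    for ph in (2.0, 3.0, 5.0, 7.0):
        print(f"pH {ph:4.1f}: k_c = {exchange_rate(ph):.3g} s^-1")
    print(f"slowest exchange near pH {ph_of_minimum_rate():.1f}")
```

With these assumed values the slowest exchange falls near pH 3, which is in line with the use of acidic conditions to quench exchange in label-trapping experiments.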

Base catalysis is much more efficient than acid catalysis, with values of $k_{OH} = 10^8\ \mathrm{s^{-1}\,M^{-1}}$ and $k_H = 1\ \mathrm{s^{-1}\,M^{-1}}$. It is important to understand the kinetic exchange mechanisms governing these exchange rates in order to interpret the hydrogen exchange data (13). The correlation between the rate of hydrogen exchange and protein stability has been demonstrated in many proteins. In the ¹H spectrum of medium-sized proteins, such as ribonuclease A and bovine phospholipase A₂, subsets of NH protons are observed when the proteins are dissolved in D₂O. These protons are attributed either to deeply buried, hydrogen-bonded protons distributed throughout the protein structure, or to a whole strand of buried secondary structure, such as an α-helix; such slowly exchanging protons become accessible to the solvent only after unfolding under drastic conditions.

3.5.3. Structural Characterization and Folding Pathways

It is possible to characterize, in some detail, the structure of folding intermediates by using a combination of advanced NMR and rapid solution mixing techniques. In this combined approach, proton or deuteron labels are trapped within the backbone and side chain protons in the refolded protein, and their location and quantity determined using homonuclear and heteronuclear 2D NMR (NOESY, HOHAHA, HMQC) experiments. The relative proton occupancy (that is, the number of proton labels incorporated), P, at each site is calculated by using the formula:

(20)

where $I$ is the measured resonance intensity or volume, $I_0$ the signal intensity or volume of the fully protonated group, and $f$ the residual fraction of H₂O present in the reaction mixture. The two main approaches for label trapping experiments are described in the following sections.

3.5.3.1. COMPETITION METHOD

This technique is suitable for examining early refolding events. It aims to balance the rate of amide exchange against the rate of refolding, so that exchangeable protons are trapped in parts of the protein that refold early (14). The following steps are involved:

1. Unfold the protein in H₂O (using a denaturant or at an extreme pH).
2. Refold the protein, and at the same time induce H-D exchange, by rapid dilution of the denaturant with D₂O buffer.
3. After a reaction time, t, quench the exchange process by rapid lowering of pH; refolding will continue to completion without further H-D exchange.
4. Recover the protein (by freeze-drying or a multiple concentration-wash procedure; see Section 2.).
5. Prepare an NMR sample under conditions that minimize further H-D exchange, e.g., low temperature and moderately acidic pH.

3.5.3.2. PULSE LABELING METHOD

This method is designed for investigating the later stages of folding (15,16). The following steps are involved:

1. Refold the initially deuterated protein in D₂O for a variable period.
2. Pulse label for approx 50 ms with an excess of H₂O buffer; NH sites that are still exposed are selectively protonated. Change to basic pH briefly to ensure that all exposed sites are fully labeled and protected groups remain deuterated.
3. Quench the exchange by lowering the pH, and allow the protein to refold to the native form.
4. Prepare an NMR sample as in Section 3.5.3.1. (steps 4 and 5).

The backbone and side chain labile protons are ideal conformational probes because they are distributed throughout the protein structure; amide proton exchange rates are determined predominantly by intramolecular hydrogen bonding, thereby reflecting important aspects of the protein structure; and hydrogen-deuterium exchange is usually associated with negligible structural changes.


As far as the unfolded protein is concerned, its structural characterization requires the assignment of the ¹H NMR spectrum. Two-dimensional, exchange-mediated magnetization transfer experiments (see Section 3.2.3.) can be used for the assignment of the spectrum of a reversibly unfolded protein provided that, first, the specific assignment of the resonances in the folded protein is known, and second, the rate of exchange or structural interconversion between the folded and unfolded forms is slow on the NMR time scale (of the order of 1 s). The NMR spectrum of an unfolded protein contains amino acid resonances that deviate from those in the unstructured peptide. These deviations have been attributed to short-range interactions between hydrophobic side chains, rather than to residual secondary or tertiary structures that are of any significance.

References

1. Jardetzky, O. and Roberts, G. C. K. (1981) Protein dynamics, in NMR in Molecular Biology, Academic, New York, pp. 448-492.
2. Oppenheimer, N. J. (1989) Sample preparation, in Methods in Enzymology, vol. 167, Part A, Academic, New York, pp. 78-89.
3. Stephenson, D. S. (1988) Linear prediction and maximum entropy methods in NMR spectroscopy, in Progress in NMR Spectroscopy, vol. 13, Pergamon, Oxford, pp. 515-626.
4. Hoch, J. C. (1989) Modern spectrum analysis in nuclear magnetic resonance: alternatives to the Fourier transform, in Methods in Enzymology, vol. 167, Part A, Academic, New York, pp. 216-241.
5. Rao, B. D. N. (1989) Nuclear magnetic resonance line-shape analysis and determination of exchange rates, in Methods in Enzymology, vol. 167, Part A, Academic, New York, pp. 279-311.
6. Dahlquist, F. W., Longmuir, K. J., and Du Vernet, R. B. (1975) Direct observation of chemical exchange by a selective pulse NMR technique. J. Magn. Reson. 17, 406.
7. Dobson, C. M., Lian, L.-Y., Redfield, C., and Topping, K. D. (1986) Measurement of hydrogen exchange rates using 2D NMR spectroscopy. J. Magn. Reson. 69, 201-209.
8. Jeener, J., Meier, B. H., Bachmann, P., and Ernst, R. R. (1979) Investigation of exchange processes by two-dimensional NMR spectroscopy. J. Chem. Phys. 71, 4546-4553.
9. Brindle, K. M. and Campbell, I. D. (1987) NMR studies of kinetics in cells and tissues. Q. Rev. Biophys. 19(3/4), 159-182.
10. Griffey, R. H. and Redfield, A. G. (1987) Proton-detected heteronuclear edited and correlated nuclear magnetic resonance and nuclear Overhauser effect in solution. Q. Rev. Biophys. 19(1/2), 51-82.
11. Oswald, R. E., Bogusky, M. J., Bamberger, M., Smith, R. A. G., and Dobson, C. M. (1989) Dynamics of the multidomain fibrinolytic protein urokinase from two-dimensional NMR. Nature (London) 337, 579-582.
12. Evans, P. A., Dobson, C. M., Kautz, R. A., Hatfull, G., and Fox, R. O. (1987) Proline isomerism in staphylococcal nuclease characterized by NMR and site-directed mutagenesis. Nature (London) 329, 266-268.
13. Creighton, T. E. (1984) Proteins in solution, in Proteins, W. H. Freeman, New York, pp. 265-328.
14. Schmid, F. X. and Baldwin, R. L. (1979) Detection of an early intermediate in the folding of ribonuclease A by protection of amide protons against exchange. J. Mol. Biol. 135, 199-215.
15. Udgaonkar, J. B. and Baldwin, R. L. (1988) NMR evidence for an early framework intermediate on the folding pathway of ribonuclease A. Nature (London) 335, 694-699.
16. Roder, H., Elöve, G. A., and Englander, S. W. (1988) Structural characterization of folding intermediates in cytochrome c by H-exchange labelling and proton NMR. Nature (London) 335, 700-704.

CHAPTER 8

Introduction to Mass Spectrometry

Robin Wait

1. Introduction

Mass spectrometry (MS) is a sensitive and powerful analytical technique, in which ionized sample molecules are separated according to their mass to charge ratios (m/z) by the application of electric and/or magnetic fields. If the ionization regime deposits sufficient excess energy, a proportion of the sample molecules will dissociate, the pattern of product ions formed being dependent on the structure of the intact compound (Fig. 1). A mass spectrum thus consists of the masses (strictly mass to charge ratios, m/z) of these ions plotted against abundance. Interpretation of the spectrum therefore affords information about both the mol wt and the structure of the sample. By the standards of most other physical methods, mass spectrometry is fairly sensitive, requiring somewhere between low picomoles and nanomoles of material, depending on the ionization method employed, but against this must be set its destructive nature.

The present introduction aims to provide a brief overview of the technique, to define some of the key terms, and to offer a short tour of some of the different instruments that are more or less legitimately called mass spectrometers. Readers wishing a more detailed account should consult refs. 1-9. A recent volume of Methods in Enzymology (5) devoted entirely to mass spectrometry is particularly recommended, since both instrumentation and applications are comprehensively covered.

All mass spectrometers consist of a means of ion generation, a mass analyzer for their separation, and an ion detector. In the following sections, these elements will be briefly considered in relation to the requirements for the analysis of biological molecules.

From Methods in Molecular Biology, Vol. 17: Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques. Edited by C. Jones, B. Mulloy, and A. H. Thomas. Copyright ©1993 Humana Press Inc., Totowa, NJ

Fig. 1. The principle of mass spectrometry. Sample molecules (M) are ionized, and a proportion of the molecular ions (M⁺·) dissociates, forming fragments f₁, f₂, and so forth. The masses of the molecular ions and fragment ions are determined, and plotted against abundance. [Figure labels: sample molecule; ionization; fragmentation; mass spectrum]

Fig. 2. The 10% valley definition of resolution. Two peaks of equal intensity are said to be resolved when the height of the valley between them is 10% of the maximum peak intensity.

2. Mass Analyzers

The characteristics of mass analyzers that determine suitability for biological applications include the mass range, the transmission efficiency (which will influence the overall sensitivity), compatibility with the appropriate ionization techniques and sample introduction devices, and the resolution. Resolution is a measure of the ability of an instrument to separate ions of similar mass. The resolving power of a magnetic sector mass spectrometer is usually defined as in Fig. 2. The resolution of two adjacent peaks of equal intensity and masses M and M + ΔM is equal to M/ΔM when the height of the valley between them is 10% of the maximum peak intensity. Thus, a resolution of 1000 is required to separate masses 1000 and 1001. The different types of mass analyzers are described at greater length in refs. 8 and 9. A comprehensive account of the optics of charged particles is available (for the mathematically sophisticated) in ref. 10.

2.1. Magnetic Sector Mass Spectrometers

In a magnetic sector mass spectrometer, mass measurement is performed by deflecting the ions with a magnetic field; the extent of the deflection is inversely related to the mass of the ion, more massive ions experiencing smaller deflection at a given field strength. Ions are accelerated out of the source region by the application of an accelerating voltage, V, acquiring thereby kinetic energy equivalent to mv²/2, i.e.:

$$zV = mv^2/2 \tag{1}$$

where z is the number of charges on the ion under consideration (in units of the charge on an electron), v is its velocity, and m is its mass. When the ion enters a magnetic field of strength B, it experiences a deflecting force of magnitude Bzv at right angles to its direction of travel, which forces it to describe a circular orbit of radius, r, such that:

$$Bzv = mv^2/r \tag{2}$$

Combining Eqs. (1) and (2) and rearranging gives the fundamental mass spectrometer equation:

$$m/z = B^2r^2/2V \tag{3}$$

Inspection of Eq. (2) shows that a magnetic sector is strictly a momentum analyzer that separates ions according to the product of their mass and velocity rather than mass alone. It is important therefore that ions enter the magnetic field with the same energy, since otherwise ions of the same mass, but differing in their velocity, will be brought into focus at different points, which will degrade the resolution. The energy spread of the ion beam may be reduced by means of an electrostatic analyzer (ESA; the electric sector). The ESA consists of two curved plates, across which a fixed electric field, E, is applied. Ions entering the field are constrained into circular orbits of radius R, such that:

$$zE = mv^2/R \tag{4}$$

By rearranging, it can be seen that:

$$R = 2(mv^2/2)/zE \tag{5}$$

Fig. 3. Schematic representation of the ion optics of a double-focusing magnetic sector instrument of forward geometry. If the ion detector is placed at the point of intersection of the direction and velocity focusing curves, the image will be independent of the velocity and angular spread of the ion beam. [Figure labels: source slit; electric sector; magnet; directional focusing curve; velocity focusing curve; point of double focus]

By combination of Eq. (4) with Eq. (1), it follows that:

$$R = 2V/E \tag{6}$$
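The relations in Eqs. (1)-(6) can be checked numerically. The following is a minimal sketch, assuming SI units, so the elementary charge e is written explicitly (the text folds it into z). The field, radius, and voltage values are illustrative assumptions rather than the settings of any particular instrument, and the function names are hypothetical.

```python
E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def ion_speed(mass_da, z, accel_voltage):
    """Speed (m/s) after acceleration through V volts: z*e*V = m*v^2/2 (Eq. 1)."""
    return (2.0 * z * E_CHARGE * accel_voltage / (mass_da * DALTON)) ** 0.5


def sector_mz(b_tesla, radius_m, accel_voltage):
    """m/z (Da per elementary charge) transmitted by the magnet (Eq. 3)."""
    return E_CHARGE * b_tesla**2 * radius_m**2 / (2.0 * accel_voltage * DALTON)


def esa_radius(mass_da, z, accel_voltage, e_field):
    """Orbit radius (m) in the electric sector from z*e*E = m*v^2/R (Eq. 4)."""
    v = ion_speed(mass_da, z, accel_voltage)
    return mass_da * DALTON * v**2 / (z * E_CHARGE * e_field)


if __name__ == "__main__":
    # Illustrative (assumed) operating values, not taken from the text.
    B, r, V, E = 1.0, 0.65, 8000.0, 4.0e4   # tesla, m, volts, V/m
    print(f"m/z in focus: {sector_mz(B, r, V):.1f}")
    # Halving the accelerating voltage doubles the m/z brought to the same radius.
    print(f"m/z at V/2:   {sector_mz(B, r, V / 2):.1f}")
    # Ions of different mass but equal energy follow the same ESA orbit (Eq. 6).
    for m in (500.0, 5000.0):
        print(f"mass {m:6.0f} Da: R = {esa_radius(m, 1, V, E):.3f} m "
              f"(2V/E = {2 * V / E:.3f} m)")
```

The electric sector radius is computed from Eq. (4) with the velocity taken from Eq. (1), so its agreement with 2V/E for both masses is a direct check of Eq. (6).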

Thus, ions of the same charge and kinetic energy will follow the same orbit, irrespective of their mass, and will be brought to a focus at the same point, whereas ions of differing energy will follow slightly divergent paths. The energy spread of the ion beam may therefore be reduced by placing a slit at the exit of the ESA so that only ions of the selected range of energies pass into the magnetic analyzer. The ESA also acts as a directional focusing device, counteracting tendencies toward angular divergence of the ion beam. The combination of an electric and a magnetic sector is described as double focusing, because the two fields are so arranged that the direction and velocity dispersion produced in one is counteracted by that from the other (Fig. 3). The detector is placed at the point where the direction and velocity focusing curves intersect, so that the final image is independent of the velocity and directional spread of the ion beam. By varying (scanning) the magnetic field strength, ions of different mass are sequentially brought into focus at this point.

Instruments in which the ESA precedes the magnet are said to be of forward geometry, whereas in reverse geometry instruments, the ESA is placed after the magnet. Double-focusing instruments of either geometry are capable of extremely high resolution (up to 100,000), but this is achieved by reducing the widths of the various resolving slits, which results in a considerable sacrifice of sensitivity. In practice, it is seldom necessary to operate above about 5000 resolution when analyzing biological molecules, which still enables mass measurement accuracy of better than 0.5 dalton across the entire mass range.

The current upper mass limit attainable by these instruments at full accelerating voltage (8 or 10 kV) is around 15 kDa. Inspection of Eq. (3) shows that the mass range can be extended by operating at higher magnetic field strength, by increasing the radius of the ion trajectory, or by reducing the accelerating voltage. The latter course is the simplest, but is achieved at the cost of a reduction in sensitivity, which usually becomes unacceptable once the voltage drops below 3 keV.* In practice, it is difficult to achieve magnetic field strengths much higher than about 2.3 T with electromagnets. It is not possible to take advantage of the much higher fields attainable with superconducting magnets because of the need for scanning operation. Increasing the radius causes a rapid increase in the overall ion optical path length, resulting in impracticably large and expensive instruments. This effect can be reduced to some extent by decreasing the focal length, by introducing the ion beam into the magnetic field at nonnormal incidence angles, and by the use of inhomogeneous fields.

*Note on units: Energy values are frequently quoted in units of electron volts (eV) in the literature of mass spectrometry. This is convenient because the translational energies of ions can be immediately and intuitively related to the accelerating voltage employed. One electron volt is equivalent to 96.49 kJ mol⁻¹.

2.2. Quadrupole Mass Filters and Ion Traps

The quadrupole mass analyzer consists of four rods of circular or hyperbolic cross-section, arranged as in Fig. 4A. Voltages having a DC component U and a radiofrequency component of the form V₀cos ωt are applied across opposite rods as shown in Fig. 4B. The radiofrequency period is short compared to the transit time of ions through the device. At any given field, ions with a narrow range of m/z values are constrained into stable trajectories and pass down the main axis of the quadrupole; all other ions describe unstable paths until they collide with the rods or are lost between them. At fixed frequency, a mass spectrum may be obtained by scanning U and V₀ so as to maintain a constant ratio, which will enable the sequential transmission of ions of different masses. Voltages can be switched more rapidly than a magnetic field can be scanned, so quadrupoles are frequently used in fast-scanning GC-MS systems. The mass range, resolution, and mass measurement accuracy of quadrupoles are inferior to those of double-focusing magnetic instruments, the current upper mass limit being about 4000 dalton, with unit resolution achievable throughout this range. The transmission efficiency of quadrupoles is potentially high, since there is no necessity for resolving slits. Another advantage is that quadrupole analyzers are most effective at separating ions of low velocity, so their ion sources operate close to ground potential, the injected ions generally having energies ~100 eV; this greatly simplifies the task of interfacing to LC systems and to atmospheric pressure ionization techniques, such as electrospray.

Fig. 4A,B. The principle of the quadrupole analyzer. [Figure label: U + V₀cos ωt]

A triple quadrupole arrangement provides a relatively cheap route to tandem mass spectrometry. The first stage is used for precursor ion selection, and the intermediate (radiofrequency only) quadrupole functions as a collision cell. The product ions are recorded by scanning the third analyzer. The main limitation of these instruments is that they are restricted to the analysis of low-energy collisions (see Section 5.1. and Chapter 12).

The ion trap (sometimes known as a QUISTOR, for quadrupole ion store) is a device that operates on similar principles to a conventional quadrupole (11). It consists of three electrodes, one toroidal and two end caps. The electrodes are machined so as to provide hyperbolic inner surfaces, and the device thus resembles a quadrupole in cross-section. Ions of a given mass can be constrained into stable orbits by the application of suitable potentials and then sequentially ejected into an external electron multiplier detector. Ion traps have principally been used as components of low-cost GC-MS systems, but the devices have considerable potential for tandem MS, and have recently been shown to be capable of both high-mass (>40 kDa) and high-resolution operation (12).

2.3. Time-of-Flight Analyzers

In the time-of-flight (TOF) mass spectrometer, ions are accelerated through a potential (V) and then drift down a tube toward a detector. If all the ions arrive at the beginning of the drift tube with the same energy ($mv^2/2 = zeV$), then those differing in mass will have different velocities:

$$v = (2zeV/m)^{1/2} \tag{7}$$

so for a tube of length L, the time of flight of an ion is given by:

$$t = L/v = (L^2m/2zeV)^{1/2} \tag{8}$$

from which its mass (m) may be easily calculated. It is clearly important that the ions should be produced at an accurately known start time and preferably should all originate from the same spatial position. For these reasons, TOF analyzers are mainly used in conjunction with pulsed ionization techniques, such as plasma desorption and laser desorption. Some variation in ion energy is difficult to avoid, and this is responsible for the relatively low mass resolution of the devices (often ~1000). The energy spread among ions of the same mass can be reduced by the use of various types of ion reflectors; more energetic ions penetrate further into the reflecting field than less energetic ones of the same mass, and thus, their flight time is slightly increased, resulting in tighter bunching among isomass ions. Resolving powers of several thousand have been achieved using reflector technology. The mass range of these instruments is virtually unlimited, and the absence of resolving slits ensures very high transmission efficiencies compared to magnetic mass spectrometers, resulting in excellent sensitivity.

2.4. Fourier Transform Mass Spectrometers

Ions (of charge z) contained within a strong magnetic field (B) will describe a circular motion in a direction perpendicular to the applied field, the angular frequency ($\omega_c$) of this motion being inversely proportional to the mass (m) of each ion:

$$\omega_c = zB/m \tag{9}$$
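To give a feel for the magnitudes involved in Eq. (9), the sketch below converts between m/z and the cyclotron frequency in hertz (the angular frequency divided by 2π). It is only an illustration: SI units are assumed with the elementary charge written explicitly, the 7 T field is an assumed example value, and the function names are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def cyclotron_frequency_hz(mz, b_tesla):
    """Cyclotron frequency f = omega_c / (2*pi) for an ion of given m/z (Da per charge)."""
    return E_CHARGE * b_tesla / (2.0 * math.pi * mz * DALTON)


def mz_from_frequency(freq_hz, b_tesla):
    """Invert the relation: m/z (Da per charge) from a measured frequency."""
    return E_CHARGE * b_tesla / (2.0 * math.pi * freq_hz * DALTON)


if __name__ == "__main__":
    B = 7.0  # tesla; assumed field of a superconducting magnet, for illustration only
    for mz in (100.0, 1000.0, 10000.0):
        f = cyclotron_frequency_hz(mz, B)
        print(f"m/z {mz:8.0f}: f = {f / 1e3:10.2f} kHz")
```

The tenfold drop in frequency for each tenfold increase in m/z illustrates why high-mass ions give the low-frequency signals that limit detection.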

The coherent circular motion of the ions sets up an image current in a detector, which is amplified and Fourier transformed to convert the time domain signal into a frequency domain signal. A mass spectrum can be obtained from the frequency domain signal, because each discrete frequency corresponds to a single mass. Conventional ion optics are not required, since mass separation and ion detection occur in the same cell; accordingly, transmission losses are greatly reduced, and the sensitivity is correspondingly good. The resolution of a Fourier transform instrument is a function of the observation period of the time domain signal and is in principle extremely high. The actual resolution attainable will be determined by the lifetimes of the trapped ions, which are not indefinite because the coherence of the orbital motion of ions of the same mass is gradually lost as a result of collisions with residual gas in the detection cell. The instruments operate at fixed magnetic field strength; there is no requirement for magnet scanning, so superconducting magnets, which develop much higher fields than electromagnets, can be used. The mass range is directly proportional to the magnetic field strength, so very high mass operation is possible; the main practical restriction is imposed by the detection limits of the low-frequency signals characteristic of high-mass ions. FT instruments are well suited to use with pulsed ionization methods, since all the products of a single ionization event can be trapped and analyzed. Powerful tandem MS experiments are also possible with the trapped ions. The principal technical difficulties are a consequence of the very stringent vacuum requirements of the method.

3. Methods of Ion Production

3.1. Vapor-Phase Ionization Methods

The oldest method for generating ions for mass analysis that is still in routine use is electron impact (EI) ionization. Sample molecules in the vapor phase collide with electrons emitted from a heated metal filament, causing ejection of an electron from the sample molecule, which is thus left carrying a net positive charge:

$$\mathrm{M} + e^- \rightarrow \mathrm{M}^{+\cdot} + 2e^- \tag{10}$$
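Energies in electron impact work are usually quoted in electron volts, whereas bond strengths are often tabulated in kJ/mol. The short sketch below converts between the two using the factor of 96.49 kJ/mol per eV given in the note on units earlier in this chapter, applied to values typical of this ionization mode (about 70 eV for the bombarding electrons, up to about 15 eV for ionization potentials, and 3-10 eV for bond dissociation). The function names are illustrative only.

```python
KJ_PER_MOL_PER_EV = 96.49  # conversion factor quoted in the chapter's note on units


def ev_to_kj_per_mol(energy_ev):
    """Convert an energy per particle in eV to kJ/mol."""
    return energy_ev * KJ_PER_MOL_PER_EV


def kj_per_mol_to_ev(energy_kj_per_mol):
    """Convert an energy in kJ/mol to eV per particle."""
    return energy_kj_per_mol / KJ_PER_MOL_PER_EV


if __name__ == "__main__":
    examples = {
        "bombarding electron": 70.0,
        "upper ionization potential": 15.0,
        "weak bond": 3.0,
        "strong bond": 10.0,
    }
    for label, ev in examples.items():
        print(f"{label:28s} {ev:5.1f} eV = {ev_to_kj_per_mol(ev):7.1f} kJ/mol")
```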

The radical cations produced by electron impact are known as odd electron ions, since they possess an unpaired electron. The energy of the bombarding electrons is about 70 eV, whereas the ionization potentials of most organic molecules are below 15 eV, so up to 50 eV of excess energy are imparted by the ionization process. Since the dissociation energies of most organic bonds fall within the range 3-10 eV (300-1000 kJ/mol), considerable fragmentation usually results. This fragmentation is reproducible and characteristic of the molecule, and therefore offers a powerful technique for the structure determination of unknowns. In some cases, the fragmentation is so extensive that no molecular ions are observed, and the spectra are dominated by relatively uninformative low-mass ions. This problem may be overcome by the use of various soft ionization techniques that limit the excess energy deposited, and so control the extent of fragmentation.

Chemical ionization (CI) is one such method. In this technique, the sample vapor is introduced into the ion source in the presence of a reagent gas, such as methane or isobutane. Bombardment with electrons ionizes molecules of the reagent gas, initially forming radical cations, from which reactive ionic species, such as CH₅⁺ and C₂H₅⁺, are generated by collision with neutral gas molecules in the high-pressure CI source. It is these ionic species that ionize the sample molecules, usually by proton attachment or electrophilic addition, but sometimes by charge exchange. The collision processes allow equilibration of the energy deposited by the primary ionization event, so the excess energy imparted to the sample molecules is small (generally approx 5 eV), and molecular ion production predominates over fragmentation. The cationized molecules produced by CI (sometimes called quasi-molecular ions in the older literature) are even electron species, in contrast to the odd electron ions characteristic of EI. The probability of direct ionization of sample molecules by electron impact is low because of the much higher concentration of the reagent gas.

Electron impact and chemical ionization sources can be fitted to most types of mass spectrometer, and are generally provided as part of the standard equipment of commercial instruments, particularly those equipped with a gas chromatographic inlet system. The major drawback of EI and CI mass spectrometry from the point of view of the biochemist is the requirement for the sample to be presented for ionization in the vapor phase. This limits their application to compounds of low mass (generally <1200 dalton) that are either intrinsically thermally stable and volatile, or that can be made so by suitable chemical derivatization. Thus, most biopolymers are amenable to analysis only after chemical or enzymatic degradation and conversion to volatile derivatives.

The requirement for sample volatility is less stringent in Desorption Chemical Ionization (DCI) (also called direct chemical ionization), in which the solid sample is deposited on a wire and introduced into the plasma within a chemical ionization source. On electrical heating, cationized sample molecules are ejected. Although heating of the sample is still required, molecular ions can be obtained from materials of higher mass and polarity than are suitable for analysis by conventional CI. As the mass of the sample increases, however, sensitivity falls and pyrolytic processes become increasingly significant. The following sections describe methods of ionization that are less subject to this limitation.

3.2. Field Desorption

Desorption methods refer to ionization techniques in which ionized sample molecules are ejected directly from a condensed phase into the vacuum of the mass spectrometer, without prior conversion into a neutral vapor.

In Field Desorption (FD), the sample is applied to a heatable metal emitter that has been activated by the growth of carbon microneedles over its surface. The emitter is maintained at accelerating potential and functions as the anode. Very high potential gradients (of the order of 10⁸ V/cm) are present at the sharp tips of the needles, which permit quantum tunneling of electrons from the sample molecules into vacant orbitals in the anode, resulting in desorption of the sample ions by Coulombic repulsion. Cation attachment and thermal ionization may also aid desorption, particularly of more polar samples (13). The resulting molecular ions have little excess energy, and fragments are therefore weak or absent, which facilitates analysis of mixtures. Field desorption has the (possibly undeserved) reputation of being a difficult technique, and, particularly since the advent of FAB, has suffered a decline in popularity. This is unfortunate, since it is sometimes successful where other techniques fail, particularly for samples of low polarity. Moreover, it does not produce the high level of background characteristic of matrix-assisted desorption techniques, such as FAB. This absence of background means that FD is well suited to use with integrating detectors, such as microchannel arrays (Section 4.), which can overcome some of the limitations imposed by the weak and fluctuating ion currents associated with the technique.

3.3. Particle Desorption Methods

In this group of techniques, ionization is effected by bombarding the sample with energetic particles (keV to MeV depending on the technique), which has the rather paradoxical effect of desorbing a proportion of the sample molecules as ions with fairly low internal energy, generally because much of the impact energy is equilibrated and dissipated within the solid or liquid matrix employed.

In Plasma Desorption, the bombarding particles are the fission fragments of californium-252 (²⁵²Cf), which have energies in the region of 80-100 MeV. The solid sample is deposited from solution as a thin layer on a metal foil target. Each fission event gives rise to two particles, moving in opposite directions. One of these strikes the sample foil, desorbing sample molecules from the impact zone into the time of flight mass analyzer (Section 2.3.), while the other hits a detector, triggering a start signal. Sample ions are desorbed as singly, and sometimes as doubly or triply, protonated species; structurally useful fragmentations tend not to be observed, so in general no sequence information is obtained. However, PD-MS is a high-mass technique and can be used to measure molecular masses of proteins of up to about 45 kDa, to an accuracy of plus or minus a few dalton. The ultimate mass accuracy attainable is limited by the low resolution of the time of flight analyzer, and by peak broadening owing to metastable decompositions. A detailed account of the practical aspects of PD-MS is given by Roepstorff in Chapter 10.

The desorption methods that have seen the widest application in the biological sciences over the past few years are probably Fast Atom Bombardment (FAB) and the related technique Liquid Secondary Ionization Mass Spectrometry (LSIMS). Both techniques employ bombardment by particles with energy in the keV range: xenon or argon atoms of 6-10 keV in the case of FAB, whereas LSIMS uses somewhat more energetic cesium ions (10-30 keV). The defining characteristic of both experiments is that the sample is presented for ionization as a solution in a liquid of low vapor pressure, such as glycerol, which protects the sample molecules from excessive radiation damage. The bombarding particles desorb both positive and negative sample ions; positive ions are formed by cation attachment, whereas negative ions result from deprotonation. Singly charged species predominate in both cases. The molecular ions are usually accompanied by fragmentation products, particularly below mass 3000 or so. In the case of carbohydrate polymers and peptides, fragmentation occurs predominantly at the glycosidic and peptide bonds, and so affords a means of sequence determination. The upper mass limit for ion production by FAB and LSIMS is between 10 and 20 kDa, though the technique is most successful up to about mass 6000.
FAB/LSIMS sources can be used with most types of mass analyzer, but are particularly successful in combination with double-focusing magnetic instruments, the high resolution of which enables mass measurement to at least 0.3 dalton over the entire mass range. Chapter 11 describes the use of FAB ionization for the characterization of peptides.

The main disadvantage of conventional FAB is that the liquid matrix contributes a high level of chemical background, which limits sensitivity and may obscure fragment ions. These problems are much less severe in continuous-flow FAB (sometimes called dynamic FAB), in which a continuous flow of solvent is delivered to the tip of the FAB probe via a fused silica capillary. Typical solvents include water, aqueous methanol, or acetonitrile (possibly containing a small proportion of glycerol) at a flow rate of between 5 and 10 µL/min. The low concentration of matrix contributes a significantly lower level of background ionization than in conventional FAB, which results in better detection limits. The device may be used in flow injection mode, the sample being introduced directly into the solvent stream, enabling quantitative measurements to be made more reliably than in conventional FAB. Alternatively, a separation technique, such as microbore HPLC or capillary electrophoresis, may be directly coupled. Continuous-flow FAB and its applications have been fully described in a recent book (14).

3.4. Laser Desorption Methods

Laser energy may also be used to ionize and desorb samples for mass spectrometric analysis. Both far infrared radiation (from a CO2 laser) and 266-nm UV radiation (from a frequency-quadrupled neodymium/yttrium-aluminium-garnet [Nd/YAG] laser) have been used for the purpose. The pulsed nature of the ionization mode dictates the use of a time-of-flight or Fourier transform-type mass analyzer. In the absence of a matrix, the technique appears to be restricted to samples below mass 2000. The recently introduced matrix-assisted laser desorption technique, however, is capable of analysis of materials of several hundred thousand daltons in mass. A few picomoles of sample are mixed with an excess of a matrix, such as nicotinic acid or sinapinic acid, that has an absorption band at the laser wavelength employed. On irradiation, sample ions are ejected into a time-of-flight tube and mass analyzed. The predominant species desorbed are singly charged monomers, but multiply charged ions and dimers are also observed. Few fragments are formed, and since the technique is not especially subject to suppression and discrimination effects, it is suitable for mixture analysis.
The mass accuracy of the technique is currently somewhat less than that of electrospray (Section 3.5.), mainly because of the low resolution of the TOF analyzers employed and the formation of artifacts by matrix adduction. A full account of the technique is presented in Chapter 9 by two of its pioneers.

3.5. Ion Evaporation Methods

Several ionization methods rely on the evaporation of ions from charged droplets, of which the most important are thermospray and the various forms of electrospray. In thermospray ionization, the


sample in solution is introduced into the mass spectrometer via a heated capillary tube, which partially vaporizes and nebulizes the solvent, forming a supersonic jet of fine droplets that enters the mass spectrometer ion source. As the droplets continue to evaporate, ionized sample molecules are ejected and can be mass analyzed. The chief importance of the thermospray source is that, because it can tolerate high liquid flow rates (typically 0.5-1.5 mL/min) and the presence of (volatile) buffer salts, it provides one of the most successful means of directly interfacing mass spectrometry and liquid chromatography (15).

In the electrospray ion source (which operates near atmospheric pressure), the sample solution enters through a fused silica capillary that is maintained at a high potential with respect to a counterelectrode, so as to produce a field gradient of the order of 3 kV/cm. Under these conditions, the solution emerges from the capillary as a finely dispersed aerosol of highly charged droplets. Further evaporation results in shrinkage of the droplets until the mutual repulsion of the multiply charged sample molecules is sufficient to overcome the forces holding the droplet together, and the so-called Coulombic explosion ejects the sample molecules into the vapor phase. A proportion of these ions passes through a sampling orifice into a region of intermediate pressure and then through a further orifice ("the skimmer") into the high vacuum of the mass spectrometer (Fig. 5). The sample (at a concentration of 1-10 pmol/µL) is usually introduced in solution in water/methanol or water/acetonitrile containing a few percent of acetic acid, at a flow rate of 1-10 µL/min, although flow rates up to 50 µL/min may be accommodated. The compatibility with low flow rates is useful and simplifies coupling with capillary electrophoresis.
Various designs of electrospray interface exist, in which the rate of desolvation may be enhanced by heating of the capillary or by a flow of bath gas. This latter, pneumatically assisted version is sometimes known as ionspray and may enable somewhat greater flows of solvent (up to about 200 µL/min) to be introduced into the system, which is an advantage when interfacing with HPLC. The significant point about all forms of this ionization process is that multiply charged ions are formed from samples, such as proteins, that have sufficient charge-bearing sites. Because mass spectrometers measure mass-to-charge ratio (m/z), rather than mass directly, these

Fig. 5. Schematic of an electrospray ion source. A fine spray of charged droplets emerges from the capillary; as the droplets evaporate, multiply charged sample ions are ejected and pass through the sampling orifices into the mass spectrometer. (Labeled parts: capillary tube at atmospheric pressure, counter electrode, sampling orifice, skimmer, lenses.)

multiply charged species are observed at low apparent masses; a protein of $M_r$ 40,000 bearing 20 charges will be detected at about mass 2000, for example. An electrospray spectrum of a protein typically consists of a series of peaks, ascending in mass, in which each member of the series in the high-mass direction corresponds to a molecular ion carrying one fewer proton. It is not uncommon for proteins ionized by electrospray to bear as many as 40 charges, so even quite large molecules can be brought within the range of conventional quadrupole and magnetic sector mass spectrometers. Measurement of the actual mass requires the determination of the number of charges on each ion, which is easily achieved provided that two peaks that differ by one charge can be identified. The ions in the series all have the general formula $(M_r + zH)^{z+}$, where $M_r$ is the molecular mass of the protein, $z$ is the integer number of charges, and $H$ is the mass of a proton (1.0079 dalton). If we consider two adjacent peaks, $m_2$ and $m_1$, differing by one charge, where:

$$m_2 = \frac{M_r + zH}{z} \tag{11}$$

$$m_1 = \frac{M_r + (z+1)H}{z+1} \tag{12}$$


then the charge, $z$, on $m_2$ is given by:

$$z = \frac{m_1 - H}{m_2 - m_1} \tag{13}$$

A simple computer algorithm can repeat this process for every peak in the spectrum. In some cases, the charge state can be assigned by measurement of the mass separation between adjacent isotope peaks; the apparent mass difference between a doubly charged ion and its first 13C peak will be 0.5, for example.

Electrospray is usable to over 100 kDa, with a claimed mass resolution of 0.1% and an accuracy of 0.01%. These relatively high mass measurement accuracies are largely attributable to the superior resolutions of quadrupole and (especially) magnetic sector instruments over TOF analyzers. Mass measurement can be achieved on about a picomole of pure protein, and in a few favorable cases, sensitivities in the femtomolar range have been obtained. Electrospray does not involve a matrix, and therefore the spectra are relatively free of background contributions; in consequence, it is possible to detect minor components that constitute as little as 1% of a mixture.

The ions produced by electrospray have fairly low internal energy, and therefore little spontaneous fragmentation is observed. However, because the technique can be implemented on instruments with a tandem MS capability, it is possible to generate fragmentation artificially by collisional activation (see Section 5.1. and Chapter 12). Collisional activation is less straightforward when the precursor ions have more than a single charge, since charge stripping may occur, with the product ions appearing at higher mass-to-charge ratio than their parents, and it may be difficult to assign the correct number of charges. Most of the published examples of sequence determination by MS/MS in conjunction with electrospray have used precursor ions of relatively low mass and charge number. Collision spectra have been successfully recorded from multiply charged protein molecules, but the difficulties of interpretation are at present formidable (16).
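The charge assignment of Eqs. 11-13, and its repetition over every peak in a series, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the algorithm of any particular instrument data system: the function names and the synthetic three-peak series are invented for the example, the peaks are assumed to belong to a single protonated charge series, and H is set to the 1.0079 value quoted in the text.

```python
PROTON = 1.0079  # mass of the charge carrier H in daltons, as quoted in the text


def charge_from_adjacent(m1, m2, h=PROTON):
    """Eq. 13: charge z on the higher-m/z peak m2, given its lower-m/z neighbor m1."""
    if m2 <= m1:
        raise ValueError("m2 must be the higher-m/z peak of the pair")
    return (m1 - h) / (m2 - m1)


def mass_from_peak(mz, z, h=PROTON):
    """Eq. 11 rearranged: M_r = z * (m/z - H)."""
    return z * (mz - h)


def deconvolute(peaks, h=PROTON):
    """Average the M_r estimates from every adjacent pair in one charge series."""
    peaks = sorted(peaks)
    estimates = []
    for m1, m2 in zip(peaks, peaks[1:]):
        z = round(charge_from_adjacent(m1, m2, h))  # charge on m2
        estimates.append(mass_from_peak(m2, z, h))
        estimates.append(mass_from_peak(m1, z + 1, h))  # m1 carries one more charge
    return sum(estimates) / len(estimates)


def charge_from_isotope_spacing(delta_mz):
    """Charge from the apparent spacing between adjacent isotope peaks (about 1/z)."""
    return round(1.0 / delta_mz)


# Synthetic series (an assumption for illustration): M_r = 40,000 with 19-21 charges.
series = [(40000.0 + z * PROTON) / z for z in (19, 20, 21)]

if __name__ == "__main__":
    print("peaks (m/z):", [round(p, 2) for p in sorted(series)])
    print("estimated M_r:", round(deconvolute(series), 2))
    print("charge for 0.5 isotope spacing:", charge_from_isotope_spacing(0.5))
```

With real data the peak positions carry measurement error, so rounding z to the nearest integer before computing the mass, as done here, is what keeps the individual estimates consistent.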
Electrospray mass spectrometry is reviewed in refs. 17-19.

4. Methods of Ion Detection

A detector is required to convert the mass-separated ion current into an electrical signal that can be amplified, computer processed, and output to a printer/plotter. Magnetic sector and quadrupole mass spectrometers generally employ some form of electron multiplier for this purpose.


A discrete dynode type of electron multiplier consists of a series of between 10 and 20 electrodes (known as dynodes), electrically linked by a chain of resistors, so as to maintain a fixed potential difference between adjacent dynodes. When an ion strikes the first dynode, secondary electrons are emitted, which are accelerated toward the second dynode, where each triggers a further burst of electrons, which in turn strikes the third dynode, and so on until the electrons emitted from the final dynode are collected on an anode at the far end of the device (Fig. 6). Electron multipliers are characterized by very high gains ($>10^6$) and very rapid response times. Continuous dynode multipliers are also sometimes used: these work on the same principle, but consist of a curved ceramic tube, the inner wall of which is coated with an electron-emitting resistive layer. When ions strike this coating near the entrance, secondary electrons are ejected, which in their turn strike the wall, emitting further secondary electrons.

Some instruments use a photomultiplier detector, which consists of a conversion dynode that produces a high yield of secondary electrons when struck by the ion beam. The secondary electrons are accelerated toward a photoemissive surface, and the resulting photons are detected by an optically coupled photomultiplier. The advantage of this system is that the photocathode and electron multiplier are sealed in their own vacuum envelope, and thus do not suffer deterioration in performance as a result of contamination or ion burns.

The sensitivity of a multiplier-type detector is (in part) a function of the number of secondary electrons produced by the impact of each ion at the first dynode. This yield per ion is proportional to the velocity of the impacting ion.
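Because the secondary electron yield depends on ion velocity, it is useful to see how velocity scales with mass when every ion receives the same kinetic energy. The sketch below uses the standard relation $v = \sqrt{2zeV/m}$; the 8-kV accelerating voltage and the list of masses are assumed illustrative values, not figures from the text.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs
DALTON = 1.66053906660e-27  # atomic mass constant in kilograms


def ion_velocity(mass_da, accel_volts, charge=1):
    """Velocity (m/s) of an ion of the given mass after acceleration through accel_volts."""
    kinetic_energy = charge * E_CHARGE * accel_volts  # joules
    return math.sqrt(2.0 * kinetic_energy / (mass_da * DALTON))


if __name__ == "__main__":
    accel_volts = 8000.0  # assumed accelerating voltage, for illustration only
    for mass in (100, 1000, 10000, 100000):
        print(f"m = {mass:>6} Da  v = {ion_velocity(mass, accel_volts):.3e} m/s")
```

At fixed energy the velocity falls as the inverse square root of the mass, so a 100-fold increase in mass reduces the impact velocity 10-fold.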
Since ions are accelerated to the same kinetic energy in the mass spectrometer, it follows that ions of higher mass have lower velocities, and that therefore the efficiency of detection falls as the mass is increased. This problem is to some extent overcome by the use of postacceleration detection (PAD). A typical PAD consists of a polished aluminium electrode that is maintained at a high potential with respect to an off-axis electron multiplier. The ion beam is deflected toward the PAD electrode, causing, on impact, emission of a shower of electrons, which are accelerated toward the first dynode of the multiplier. Ion detection is inherently inefficient in instruments in which the mass spectrum is recorded by scanning, because only one mass is

Fig. 6. Principle of the electron multiplier. A chain of resistors applies a potential difference between successive dynodes. Ions from the mass spectrometer striking the first dynode release a shower of electrons. These are accelerated to the next dynode, triggering the emission of further electrons on impact. (Labeled parts: incident ion, dynodes, resistor chain, output to amplifier.)

sampled at a time, all other ions being discarded. Array detectors address this weakness by detecting all ions within a given mass range simultaneously, which can result in considerable enhancements of sensitivity. Current implementations have a fairly restricted mass window (typically a few percent of the mass range), so some need for scanning remains, and a spectrum is built up by stepping the array over the mass range of interest. The full benefits of array detection are likely to be realized in applications, such as tandem mass spectrometry and field desorption, where sensitivity is limited by the absolute level of signal, not the ratio of signal to background.

5. Collisional Activation and Linked Scanning

5.1. Collisional Activation

The various soft ionization methods described in Section 3. mostly favor the production of molecular ions with relatively little tendency to fragment. Although this facilitates molecular mass measurement, more detailed structure determination may not be possible in the absence of fragmentation. One strategy to increase the number of structurally significant ions in a mass spectrum is to induce fragmentation after the initial ionization event, which can be effected by interposing a region of high-pressure gas contained in a collision cell in the flight path of the ion beam. When ions collide with the target gas, a proportion of their translational energy is converted into vibrational energy, which may be sufficient to induce bond breakage. This collisional activation process may be performed in magnetic sector instruments, in which case the kinetic energy of the parent ion beam will be in the kiloelectron volt range (in the laboratory frame), or in a quadrupole-type system, where the incident beam energy is usually <100 eV. In either case, only a proportion of the kinetic energy in the laboratory frame is available for conversion into the internal energy of the colliding system (20,21). The maximum energy available for conversion into internal modes ($E_{CM}$) is related to the difference between the initial and final relative kinetic energies of the incident ion and the target gas, and is given by:

$$E_{CM} = \left[\frac{m_t}{m_t + m_p}\right] E_{lab} \tag{14}$$

where $m_p$ is the mass of the incident ion, $m_t$ is the mass of the target gas, $E_{CM}$ is the energy in the center-of-mass frame, and $E_{lab}$ is the energy in the laboratory frame (i.e., the kinetic energy of the incident beam,


equivalent to the potential difference between the ion source and the collision cell times the charge of the particle). The mechanisms of excitation differ between the high-energy processes observed in sector instruments and the low-energy collisions characteristic of quadrupole and hybrid instruments. It is therefore not surprising that different fragmentation pathways are sometimes seen.

5.2. Linked Scanning Techniques

The most satisfactory way to analyze the products of a collision experiment is by means of tandem mass spectrometry, in which the collision cell is placed between two mass analyzers connected in series, the first being used for selection of the precursor ion beam for fragmentation, and the second being used to mass analyze the products. The various types of tandem mass spectrometry and the applications of the technique are described by Costello in Chapter 12.

It is also possible to analyze the products of collisional activation with a conventional magnetic sector instrument of either geometry by means of linked scanning techniques, which effectively use the electrostatic and magnetic sectors as separate mass-analysis stages. Consider a double-focusing instrument of forward geometry, such as that illustrated in Fig. 3. For normal operation, the field across the ESA plates is maintained in a fixed relationship to the accelerating voltage so as to transmit ions with a narrow range of translational energies. If a collision cell is placed in the field-free region between the source and the electrostatic analyzer, and the ion M+ fragments there, its translational energy will be distributed between the product species according to the law of conservation of momentum. These product ions will therefore have reduced translational energy and will be unable to pass through the ESA. To mass analyze them, we must reduce the voltage of the ESA to the value required to transmit them.
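Two of the relations above lend themselves to a quick numerical check: the center-of-mass energy of Eq. 14, and the reduced ESA voltage needed to transmit a fragment formed in the field-free region. The sketch below is illustrative only. It assumes that a fragment keeps the velocity of its precursor, so that its translational energy is the fraction m_fragment/m_precursor of the precursor's (kinetic energy release on fragmentation is neglected), and the helium and argon targets and all numerical values are assumptions chosen for the example.

```python
def center_of_mass_energy(e_lab, m_ion, m_target):
    """Eq. 14: maximum energy available for internal excitation, in the units of e_lab."""
    return e_lab * m_target / (m_target + m_ion)


def fragment_energy_fraction(m_precursor, m_fragment):
    """Fraction of the precursor's translational energy carried by a fragment.

    Assumes the fragment keeps the precursor's velocity (energy release neglected).
    """
    if not 0 < m_fragment <= m_precursor:
        raise ValueError("fragment mass must be positive and not exceed the precursor mass")
    return m_fragment / m_precursor


def esa_voltage_for_fragment(esa_volts_precursor, m_precursor, m_fragment):
    """ESA voltage that transmits the fragment, scaled from the precursor setting."""
    return esa_volts_precursor * fragment_energy_fraction(m_precursor, m_fragment)


if __name__ == "__main__":
    # Sector instrument: 8-keV ion of mass 1000 on helium (mass 4), assumed values.
    print("E_CM, sector:", round(center_of_mass_energy(8000.0, 1000.0, 4.0), 2), "eV")
    # Quadrupole: 30-eV ion of mass 1000 on argon (mass 40), assumed values.
    print("E_CM, quadrupole:", round(center_of_mass_energy(30.0, 1000.0, 40.0), 2), "eV")
    # Fragment of mass 600 from a precursor of mass 1000, in units of the precursor setting.
    print("relative ESA voltage:", esa_voltage_for_fragment(1.0, 1000.0, 600.0))
```

For these assumed values, only about 32 eV of the 8-keV laboratory energy is available in the sector case, and a little over 1 eV in the quadrupole case, which illustrates why heavy ions are hard to activate by collision with a light target gas.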
By scanning the ESA voltage (E) and the magnetic field (B) in a fixed ratio (at constant accelerating voltage), we can therefore selectively analyze the products of collision processes without interference from ions formed in the source, which will be too energetic to pass through the ESA. Among the most useful of such linked scans are the $B/E$ scan (fragment ion scan), the $B^2/E$ scan (precursor ion scan), and the constant neutral loss scan. In the $B/E$ scan, the ESA and magnet are scanned so as to maintain a constant ratio of B to E. This records only the decomposition


products of the selected precursor ion and is thus useful for obtaining clean spectra from the components of mixtures, though with two-sector instruments the precursor ion resolution is fairly poor. On four-sector instruments, a $B/E$ scan of the second mass spectrometer is the usual way of obtaining a collision spectrum of an ion selected with the first analyzer. Scanning the ESA and magnet in a constant $B^2/E$ ratio records all parents of a selected product ion; the resolution of the parent ion is poor because of the effects of translational energy release. Finally, the constant neutral loss scan records all species that decompose by elimination of a selected neutral fragment. The experiment is performed by varying B and E so that the ratio $B^2(1 - E)/E^2$ is constant.

References

1. Rose, M. E. and Johnson, R. A. W. (1982) Mass Spectrometry for Chemists and Biochemists. Cambridge University Press, Cambridge, UK.
2. Watson, J. T. (1985) Introduction to Mass Spectrometry. Raven, New York.
3. McLafferty, F. W. (1980) Interpretation of Mass Spectra. University Science Books, Mill Valley, CA.
4. Chapman, J. R. (1985) Practical Organic Mass Spectrometry. Wiley, Chichester.
5. McCloskey, J. A. (ed.) (1990) Mass spectrometry. Methods in Enzymology, vol. 193. Academic, San Diego.
6. Burlingame, A. L. and McCloskey, J. A. (eds.) (1990) Biological Mass Spectrometry. Elsevier, Amsterdam.
7. Suelter, C. H. and Watson, J. T. (eds.) (1990) Biomedical Applications of Mass Spectrometry. Methods in Biochemical Analysis, vol. 34. Wiley, New York.
8. White, F. A. and Wood, G. M. (1986) Mass Spectrometry: Applications in Science and Engineering. Wiley, New York.
9. Duckworth, H. E., Barber, R. C., and Venkatasubramanian, V. S. (1986) Mass Spectroscopy. Cambridge University Press, Cambridge, UK.
10. Wollnik, H. (1987) Optics of Charged Particles. Academic, New York.
11. March, R. E. and Hughes, R. J. (1989) Quadrupole Storage Mass Spectrometry. Chemical Analysis Series, vol. 102. Wiley, New York.
12. March, R. E. (1991) A musing on the present state of the ion trap and prospects for future applications. Org. Mass Spectrom. 26, 627-632.
13. Lattimer, R. P. and Schulten, H.-R. (1989) Field ionization and field desorption mass spectrometry: past, present and future. Anal. Chem. 61, 1201A-1215A.
14. Caprioli, R. M. (1990) Continuous Flow Fast Atom Bombardment Mass Spectrometry. Wiley, New York.
15. Yergey, A. L., Edmonds, C. G., Lewis, I. A. S., and Vestal, M. L. (1990) Liquid Chromatography/Mass Spectrometry: Techniques and Applications. Plenum, New York.
16. Smith, R. D., Loo, J. A., Barinaga, C. J., Edmonds, C. G., and Udseth, H. R. (1990) Collisional activation and collision-activated dissociation of large multiply charged polypeptides and proteins produced by electrospray ionization. J. Am. Soc. Mass Spectrom. 1, 53-65.
17. Fenn, J. B., Mann, M., Meng, C. K., Wong, S. F., and Whitehouse, C. M. (1990) Electrospray ionization: principles and practice. Mass Spectrom. Reviews 9, 37-70.
18. Mann, M. (1990) Electrospray: its potential and limitations as an ionization method for biomolecules. Org. Mass Spectrom. 25, 575-587.
19. Smith, R. D., Loo, J. A., Edmonds, C. G., Barinaga, C. J., and Udseth, H. R. (1990) New developments in biochemical mass spectrometry: electrospray ionization. Anal. Chem. 62, 882-899.
20. Cooks, R. G. (ed.) (1978) Collision Spectroscopy. Plenum, New York.
21. Busch, K. L., Glish, G. L., and McLuckey, S. A. (1988) Mass Spectrometry/Mass Spectrometry: Techniques and Applications of Tandem Mass Spectrometry. VCH, New York.

CHAPTER 9

Laser Desorption Ionization Mass Spectrometry of Bioorganic Molecules

Michael Karas and Ute Bahr

From: Methods in Molecular Biology, Vol. 17: Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques. Edited by: C. Jones, B. Mulloy, and A. H. Thomas. Copyright ©1993 Humana Press Inc., Totowa, NJ

1. Introduction

Matrix-assisted laser desorption/ionization (LDI) mass spectrometry (MS) is a very young method that has overcome the mass limitations for the mass spectrometry of biopolymers (1-4). Four years ago, a UV-absorbing matrix was used to extend the accessible mass range of UV-LDI of peptides, and a strong dependence on the UV absorption properties of the matrix was demonstrated. The mol-wt determination capability is now ca. 300,000, and since the method is still evolving, its potential is far from fully exploited.

Two other mass spectrometric methods are currently applicable to high-molecular-mass determination. Plasma desorption mass spectrometry (Chapter 10 in this vol.) enables the production and detection of molecular ions of proteins up to about 30 kDa. Another recent technique, electrospray MS (5), works by spraying a solution of sample into an electric field, producing highly charged droplets, from which molecular ions are desorbed. The characteristic feature of electrospray mass spectra is a distribution of multiply charged molecular ions, allowing mass measurement of proteins up to 70 kDa (see Chapter 8).

UV-laser desorption of organic molecules without a matrix, intensively studied in the authors' group for the past few years, shows systematic limitations. The ability to desorb intact molecular ions (and structure-specific fragment ions) was found to be related to a strong


resonance absorption band of the analyte at the laser wavelength used and required careful maintenance of the applied laser power density (irradiance, W/cm$^2$) at the lowest possible value (threshold). For larger molecules especially, intense fragmentation could not be avoided, imposing a mass limit of ca. 1500 dalton.

These limitations were overcome by the introduction of the matrix desorption technique. The principle of this method, as we so far understand it, is as follows: The analyte molecules are embedded in an excess matrix of small organic molecules that show a high resonant absorption at the laser wavelength used. The matrix absorbs the laser energy, thus inducing a soft disintegration of the sample-matrix mixture into free (gas phase) matrix and analyte molecules and molecular ions. A more detailed description of the process is given in ref. 3. In general, only molecular ions of the analyte molecules are produced, and almost no fragmentation occurs. This makes the method well suited for mol-wt determinations and mixture analysis.

The instructions given in the following are based on the experience of the authors' group and that of Beavis and Chait at Rockefeller University (6-8); currently, more than 300 different proteins have been successfully analyzed, and no major limitations have yet emerged.

2. Materials

A few microliters of a $10^{-5}$ to $10^{-7}$ M solution of the biopolymer sample, typically 1 pmol or less, suffice for analysis. The practical limit is determined more by the volume of sample solution that can be handled easily than by the amount consumed in the analysis. Since the analysis itself is estimated to consume only about $10^{-17}$ mol, most of the sample material can be recovered afterward. The smallest amount of sample that has been used to date is about 50 fmol ($5 \times 10^{-14}$ mol). Table 1 shows the substances usable as matrices, and their corresponding solvents and wavelengths.
The best results are obtained by using materials of the highest available purity for analytes, matrix, and solvents. The degree of salt contamination, and the concentrations of additives, such as buffers, compatible with high-quality results depend both on their nature and on the matrix used. When using nicotinic acid, salt concentrations of up to about $10^{-3}$ M cause only slight degradation of the signal quality, and the same is true for detergents, such as dodecylsulfate, Tween®, or Triton X-100. In the case of cinnamic acid matrices, the protein may be loaded from a buffered solution (e.g., 50 mM Na-citrate). In a sinapinic acid matrix, salt concentrations of 1 M are tolerable. This behavior is highly desirable for practical applications, because proteins can be examined under physiological conditions, and problems owing to denaturation and limited solubility can be avoided.

Table 1
Commonly Used Matrices

Matrix                              Solvent                                Wavelength (nm)
Nicotinic acid                      Water/10% ethanol                      266, 220-290
Ferulic acid*                       1:1 mixture of ethanol and 0.1% TFA    266, 337, 355
Sinapinic acid*                     1:1 mixture of ethanol and 0.1% TFA    266, 337, 355
Caffeic acid*                       1:1 mixture of ethanol and 0.1% TFA    266, 337, 355
2,5-Dihydroxybenzoic acid           Water/10% ethanol                      266, 337, 355
Vanillic acid                       Water/10% ethanol                      266
3-Nitrobenzylalcohol                (Liquid matrix)                        266
Pyrazine-carboxylic acid            1:1 water/ethanol                      266
2-Aminobenzoic acid                 Water/20-30% ethanol                   266, 337, 355
3-Aminopyrazine-2-carboxylic acid   1:1 water/ethanol                      337
7,8-Dihydroxycoumarin               Water/10% ethanol                      266, 337, 355

*Caffeic acid is 3,4-dihydroxycinnamic acid, ferulic acid is the 3-methoxy derivative, and sinapinic acid is the 3,4-dimethoxy derivative.
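The sample amounts quoted at the start of this section follow from simple arithmetic, which the short Python sketch below makes explicit. The function names are invented for the example, and the 1-µL volume and $10^{-6}$ M concentration are illustrative choices within the ranges given in the text.

```python
def moles_loaded(conc_molar, volume_ul):
    """Amount of analyte (mol) in volume_ul microliters of a conc_molar solution."""
    return conc_molar * volume_ul * 1e-6  # 1 microliter = 1e-6 L


def fraction_consumed(loaded_mol, consumed_mol=1e-17):
    """Fraction of the loaded sample used up, taking the text's estimate of ~1e-17 mol."""
    return consumed_mol / loaded_mol


if __name__ == "__main__":
    loaded = moles_loaded(1e-6, 1.0)  # 1 microliter of a 1e-6 M solution
    print(f"loaded: {loaded:.1e} mol")
    print(f"fraction consumed: {fraction_consumed(loaded):.1e}")
```

One microliter of a $10^{-6}$ M solution carries 1 pmol of analyte, of which roughly one part in $10^5$ is consumed, consistent with the statement that most of the sample can be recovered.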

3. Methods

3.1. Sample Preparation

Sample preparation is a critical step for successful matrix desorption. Even though matrix and sample will usually form a homogeneous solution, separation of matrix and analyte may occur during drying, which is fatal to the laser desorption technique. The following very simple preparation technique is generally usable.

3.1.1. Proteins and Glycoproteins

For protein and glycoprotein analysis, the solid sample is dissolved in double-distilled water either alone or containing about 10% ethanol. Depending on its solubility, the matrix is dissolved in water, water/ethanol, or water/acetonitrile (5-10 g/L). Trifluoroacetic acid (TFA)


is usually added to a 0.1% concentration (see Table 1). The sample and matrix solutions (0.5-1 µL each) are mixed on an inert metallic (Ag, Pt) sample target, and are dried in a stream of (warm) air or in vacuo inside the mass spectrometer. A greater excess of matrix is preferable when using cinnamic acid, and 1 µL of protein sample is therefore mixed with 10 µL of matrix solution; 0.5-1 µL of this final solution is used for analysis. The sample is then introduced into the vacuum chamber of the mass spectrometer for analysis.

3.1.2. Carbohydrates

For carbohydrate analysis, about 1 g/L of aqueous solution is used. This class of compounds works best in a 1:4 mixture with dihydroxybenzoic acid at an excitation wavelength of 337 or 355 nm, or with nicotinic acid at 266 nm.

3.1.3. Nucleotides

Nucleotide samples are dissolved in double-distilled water at 0.1 to 1 g/L and mixed with $5 \times 10^{-2}$ M matrix in a ratio of 1:1. The matrices are aminobenzoic acid or aminopyrazine carboxylic acid with a laser wavelength of 337 nm.

3.2. Apparatus

Until now, laser ionization time-of-flight mass spectrometers were only commercially available as microprobe instruments (LAMMA 1000, Leybold Heraeus, Köln, Germany; LIMA, Cambridge Instr./Kratos, Cambridge, UK). For our experiments, a LAMMA 1000 laser microprobe prototype instrument was used, which is described in detail elsewhere (9). Figure 1 shows the schematic diagram of a laser mass spectrometer. Since then, the first instrumentation dedicated to the LDI of biopolymers has been introduced (Vestec Corp., Houston, TX), and further suppliers are expected to follow in the near future.

The sample is irradiated with short pulses of UV laser light, either with 5-10 ns laser pulses from a Q-switched Nd-YAG laser at a wavelength of 266 nm (frequency-quadrupled) or at 355 nm (frequency-tripled). Pulses (3 ns) from a nitrogen laser at 337 nm may also be used. In each case, a matrix that absorbs at the appropriate wavelength is required. With microprobe instruments, the focus diameter is between 3 and 30 µm. Larger focal areas of ca. 100-500 µm, attainable by single quartz lenses of 10-50 cm focal length, can be used and are advantageous,

Fig. 1. Schematic diagram of a reflector-type time-of-flight mass spectrometer with laser ion source.

because the microheterogeneity of the sample is then less important. Good results are only obtained over a narrow irradiance range, which (since it depends on the focal area) has to be determined for the instrument used. Typical values are in the $10^6$-$10^7$ W/cm$^2$ range. For a focal area of $10^{-3}$ cm$^2$, an irradiance of $10^7$ W/cm$^2$ corresponds to a laser pulse energy of only 100 µJ (taking a 10-ns pulse: $10^7$ W/cm$^2$ × $10^{-3}$ cm$^2$ × $10^{-8}$ s); thus, low-power, inexpensive lasers, such as nitrogen lasers, can be used. Each pulse produces ions (positive as well as negative) that are accelerated by an appropriate voltage, focused through an ion-optical system, mass separated in a time-of-flight tube (with or without an ion reflector), and detected with a secondary ion multiplier. A digital oscilloscope providing fast analog-to-digital conversion (record length of 32K samples and vertical resolution of 8 bit) and a maximum time resolution of 10 ns (e.g., LeCroy 9400) is needed for signal recording. Although each laser pulse produces a complete mass spectrum, the signal-to-noise ratio is usually improved by summing 10-30 single-shot spectra. Spectra are then

Fig. 2. LDI mass spectrum of human albumin, obtained with 337 nm and ferulic acid as matrix. Twenty single spectra have been accumulated.

further processed (e.g., calibration and mass assignment) using personal computer-based software. The time for a single analysis, including sample preparation, is typically <15 min.

3.3. Analysis of Proteins and Glycoproteins

Figure 2 shows the mass spectrum of the protein human serum albumin (mol wt 66,437), obtained using a 337-nm wavelength laser and a ferulic acid matrix; it represents the accumulation of 20 single laser shots. The molecular ion is the base peak, and is accompanied by doubly and triply charged molecular ions and cluster molecular ions $[M]_n^{+/-}$. All of these signals can be used for mol-wt determination. From results obtained for peptides and small proteins, it can be deduced that, whereas molecular ions are mainly (de)protonated species, the contribution of charged species formed by addition of alkali metal ions also has to be taken into account. Low-mass signals can be attributed exclusively to the matrix or low-mol-wt contaminants. Fragment ions owing to the cleavage of covalent bonds (e.g., peptide bonds) are not observed even with increasing irradiance. Too high an irradiance will initially degrade the high-mass signal intensities and quality, until the whole process switches to a mode where only unspecific low-mol-wt

Fig. 3. LDI mass spectrum of a mixture of cyclodextrin and maltoheptaose.

fragment ions are observed. Glycoprotems with a carbohydrate content of up to 80% have been analyzed and show the same characteristics. 3.4. Analysis

of Carbohydrates

To date, only underivatized sugars and sugar mixtures, such as maltodextrins with a mol wt of 3500 and mannosides containing 1 N-acetylglucosamine up to a mol wt of 1700, have been analyzed. Quasimolecular ions formed by sodium or potassium ion addition to a neutral molecule are the only species detected; neither multiply charged nor cluster ions are produced. Figure 3 shows the LDI spectrum of a mixture of maltoheptaose and cyclodextrin with mol wts of 1175 and 1134. The sodium-adducted molecular ions of both substances and the potassium adduct of maltoheptaose are desorbed.

3.5. Analysis of Nucleotides

Spectra from nucleic acids and oligonucleotides up to a mol wt of 39,000 have been obtained. Better signal intensities are given in the negative ion mode for oligonucleotides. When operating at 337 nm, several matrices can be used, but the best results were obtained with aminopyrazinecarboxylic acid and aminobenzoic acid. As the mol wt increases, the peaks become broader, because protons are progressively exchanged for alkali metal ions, complicating mol-wt determination. Figure 4 shows the negative ion spectrum of oligodeoxythymidylic acid d(pT)15 with a mol wt of 4581. Twenty-five single spectra were summed to produce the spectrum, which was obtained using an aminopyrazinecarboxylic acid matrix.

Fig. 4. LDI mass spectrum of oligodeoxythymidylic acid (d[pT]15) with a mol wt of 4581. The spectrum represents the sum of 25 single laser shots with aminopyrazinecarboxylic acid as matrix at 337-nm wavelength.

3.6. Structural Information

Besides mol-wt information, matrix UV-LDI may also yield some information about the subunit structure of proteins. Figure 5 shows the spectrum of lectin (from Canavalia ensiformis) obtained using a nicotinic acid matrix. The matrix solution contained 10% ethanol. The molecular ion pattern observed differs from that of simple single-chain proteins. The intensity distribution cannot be interpreted as being the result of multiple charging of a larger protein or of the production of cluster ions from a smaller protein because of the absence of triply charged species. Thus, the spectrum represents the subunit structure of lectin with the stable tetra-, tri-, di-, and monomeric species, as well as their doubly charged species, as indicated in the assignment of Fig. 5. The dissociation into subunits is mainly the result of the conditions in the matrix-analyte mixture and is strongly promoted by the addition of alcohol. This was proven experimentally and also explains why matrices that are less water-soluble and thus need a higher percentage of organic solvent in the matrix solution (e.g., the cinnamic acid derivatives) show only signals corresponding to the subunits. All in all, there is strong evidence that matrix-LDI simply reflects the aggregation state in the initial matrix solution and may therefore be used to determine the state of association of proteins. Future work will show if enzyme/substrate and/or antibody/antigen complexes can also be desorbed intact.

Fig. 5. LDI spectrum of lectin (from Canavalia ensiformis) with a mol wt of 102,000.

4. Notes

4.1. Mass Resolution and Mass Determination Accuracy

The mass resolution achieved in the microprobe instrument used by the authors is typically 300-500 daltons in the low-mass range. For high-mass ions, peak widths at half-height corresponding to a mass resolution of 150 are the best observed, whereas Beavis and Chait report an upper limit of 700 with their experimental arrangement. Several factors contributing to a reduced mass resolution have been defined so far; others need more careful examination. The different methods of ion detection (separate conversion dynode in front of a venetian blind multiplier vs double-channel plate detector) are presumed to contribute.


Another factor is the tendency of some matrices, e.g., nicotinic and vanillic acid, to form artifacts by the addition of matrix molecules, fragments, or photochemical products to the analyte ion. At low resolution, this may reduce the accuracy of mass determination, especially at high masses where shoulders arising from adduct signals are not resolved. A simple solution to this problem is the use of cinnamic acid matrices that do not promote adduct ion formation. Finally, mass resolution may only appear to be low because of a chemical heterogeneity in the molecules being analyzed.

Despite the rather low experimental mass resolution, the accuracy of the mol-wt determination may reach a relatively high level. This strictly relates to the calibration procedure. Calibration with low-mass signals will only give absolute accuracy values in the range of 0.2-0.5% because of the necessarily imprecise time measurement and the need for extrapolation into the high-mass range. The highest possible accuracy is achieved by using a biopolymer of accurately known mass to provide the calibration masses. Masses are assigned by centroiding the upper symmetric part of all molecular ions having adequate signal-to-noise ratios (as in 252Cf plasma desorption mass spectrometry, Chapter 10). Figure 6 shows the spectrum of a mixture of trypsinogen from bovine pancreas and cytochrome c. The singly and doubly charged molecular ions of cytochrome c are used as calibration masses. The measured mol mass is 23,985 ± 12 dalton, which corresponds to a mass accuracy of ±0.05%. The mol wt calculated from the sequence is 23,981. Better mass resolution, enabling accuracies of about 10⁻⁴ (up to 20 kDa), has been demonstrated by Beavis and Chait for well-defined proteins (10).

4.2. Mass Range

The mass range accessible to the LDI technique reaches up to ca. 300 kDa. The trimer of urease subunits, which is known to be stable in solution, having a mass of 272,500 dalton, is the largest molecular ion detected. The heaviest ion measured so far is the dimer of glucose isomerase at a mass of 344,800 dalton.

4.3. Combination with Biochemical Methods

Matrix-LDI mass spectrometry, like other mass spectrometric techniques, shows its full potential in combination with other biochemical techniques. Only a few examples can be given here.

Fig. 6. LDI mass spectrum of trypsinogen with an admixture of cytochrome c for mass calibration.

4.3.1. Combination with Enzymatic Methods

This was exemplified by the authors using enzymatic cleavage of the carbohydrate constituent of a glycoprotein. The sugar chain of beef spleen violet phosphatase (mol wt 35,050) was removed by cleavage with the enzyme endoglycosidase H. Comparison of the spectra from the intact and the treated sample thus gives an easy and accurate determination of the mol wt of the sugar component (11). Conversely, if the amino acid composition is known, as in the case of Endoglucanase III (from the fungus Trichoderma reesei), the difference between the measured mol wt of 48,780 and the calculated value enabled determination of a sugar content of 6620 dalton, corresponding to 15.7% (12).

Proteins composed of disulfide-linked subunits can be separated into their constituent units by reduction of the disulfide bonds with dithiothreitol (DTT). This has been demonstrated by nicotinic acid matrix LDI of a monoclonal antibody (IgG from mouse against a human lymphokine). Before cleavage, the spectrum shows the intact protein with the molecular ion as base peak, and with multiply charged and cluster ions present. After addition of DTT, the spectrum gives information about the mol wt of the light and heavy chains.

Chait and Field have proposed monitoring the time-course of enzymatic reactions by plasma desorption mass spectrometry of proteins bound to nitrocellulose membranes (13). Laser desorption mass spectrometry may equally well be used in place of PD-MS, e.g., for the (partial) sequencing of proteins. To date, the strategy has only been applied to small peptides, for example, the removal of the C-terminal amino acid of porcine renin substrate by enzymatic cleavage with carboxypeptidase Y. With ferulic acid as the matrix, the enzyme can be used under normal buffering conditions. In practice, the enzyme solution is added to a small droplet of peptide solution. After a suitable interval, the matrix solution is added, which stops the enzymatic reaction by lowering the pH. The spectra taken after solvent evaporation can thus be used to monitor the extent of enzymatic digestion. Future improvements in mass resolution will enable the technique to be applied in the 10-20 kDa range as an interesting tool when rapid determination of partial sequence is sufficient. Even in its existing form, matrix-LDI can also be used to monitor the time-course of other chemical or enzymatic reactions, and for process control applications.

4.3.2. Combination with Blotting Techniques

Because lasers can easily be focused to a small, precisely located region of the target, the technique is capable of producing spatially resolved mass spectra. The combination of LDI-MS with one- or two-dimensional separation techniques, such as SDS-PAGE, is thus a challenging prospect. In this way, the correct mol wt could be assigned to the separated species, and incomplete separation detected. The most promising approach is the use of blotting techniques, which simultaneously retain spatial resolution and effect clean-up of the separated sample molecules. The prerequisite for this is that common blotting membranes can be used as substrates for subsequent matrix LDI. Whereas nitrocellulose gave poor results, experiments using polyvinylidene difluoride (PVDF) were much more encouraging. In the initial investigations, solutions of insulin and β-lactoglobulin were dripped onto a piece of PVDF and thoroughly washed with water and water/TFA solution. One microliter of a 30% formic acid solution of ferulic or sinapinic acid was then placed on the spot. These conditions are expected to at least partially overcome the strong protein-PVDF interaction. Experiments applying the technique to electroblotted proteins after PAGE are currently in progress in the authors' laboratory. An additional, highly interesting aspect of the successful use of PVDF substrates is that the strong binding of the proteins to the membranes allows easy micropurification of the sample. Matrix-LDI may also make use of the further advantages of such a technique, e.g., performing microscale chemical reactions directly on the substrate.

Note Added in Proof

Although this article covers the principal aspects, the MALDI technique has quickly developed and spread since this chapter was written. Several dedicated commercial instruments are available today. For further reading the following article is recommended: Hillenkamp, F., Karas, M., Beavis, R. C., and Chait, B. T. (1991) Anal. Chem. 63, 1193A.

References

1. Karas, M., Bahr, U., Ingendoh, A., and Hillenkamp, F. (1989) Laser desorption/ionization mass spectrometry of proteins of mass 100,000 to 250,000 dalton. Angew. Chem. Int. Ed. Engl. 28, 760,761.
2. Karas, M., Ingendoh, A., Bahr, U., and Hillenkamp, F. (1989) Ultraviolet-laser desorption/ionization mass spectrometry of femtomolar amounts of large proteins. Biomed. Environ. Mass Spectrom. 18, 841-843.
3. Karas, M., Bahr, U., and Hillenkamp, F. (1989) UV-laser matrix desorption/ionization mass spectrometry of proteins in the 100,000 dalton range. Int. J. Mass Spectrom. Ion Processes 92, 231-242.
4. Karas, M. and Hillenkamp, F. (1988) Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal. Chem. 60, 2299-2301.
5. Smith, R. D., Loo, J. A., Edmonds, C. G., Barinaga, C. J., and Udseth, H. R. (1990) New developments in biochemical mass spectrometry: Electrospray ionization. Anal. Chem. 62, 882-889.
6. Beavis, R. C. and Chait, B. T. (1989) Factors affecting the ultraviolet laser desorption/ionization mass spectrometry. Rapid Commun. Mass Spectrom. 3, 233-237.
7. Beavis, R. C. and Chait, B. T. (1989) Matrix-assisted laser desorption mass spectrometry using 355 nm radiation. Rapid Commun. Mass Spectrom. 3, 436-439.
8. Beavis, R. C. and Chait, B. T. (1989) Cinnamic acid derivatives as matrices for ultraviolet laser desorption mass spectrometry of proteins. Rapid Commun. Mass Spectrom. 3, 432-435.
9. Feigl, P., Schueler, B., and Hillenkamp, F. (1983) LAMMA 1000, a new instrument for bulk microprobe mass analysis by pulsed laser irradiation. Int. J. Mass Spectrom. Ion Phys. 47, 15-18.
10. Beavis, R. C. and Chait, B. T. (1990) High accuracy molecular mass determination of proteins using matrix assisted laser desorption mass spectrometry. Anal. Chem. 62, 1836-1840.
11. Hillenkamp, F., Karas, M., Ingendoh, A., and Stahl, B. (1990) Matrix-assisted UV-laser desorption/ionization: A new approach to mass spectrometry of large biomolecules, in Biological Mass Spectrometry (Burlingame, A. and McCloskey, J. A., eds.), Elsevier, Amsterdam, pp. 49-60.
12. Karas, M., Bahr, U., Ingendoh, A., Nordhoff, E., Stahl, B., Strupat, K., and Hillenkamp, F. (1990) Anal. Chim. Acta 241, 175-185.
13. Chait, B., Chaudhary, T., and Field, F. H. (1987) Mass spectrometric characterization of microscale enzyme catalyzed reactions on surface-bound peptides and proteins, in Methods in Protein Sequence Analysis (Walsh, K. A., ed.), Humana, Clifton, NJ, pp. 483-492.

CHAPTER 10

252Californium Plasma Desorption Time-of-Flight Mass Spectrometry of Peptides and Proteins

Peter Roepstorff

1. Introduction

Plasma desorption mass spectrometry (PDMS) (1) is a method for the mol-wt determination of peptides and small proteins. The upper mass limit is, in optimal cases, approx 30 kDa, with a precision of about 0.1%. This precision far exceeds that of classical biochemical methods, such as SDS-gel electrophoresis or gel permeation chromatography. The molecular weights determined by PDMS depend only on the atomic composition of the molecule and not, as in other methods, on extraneous properties, such as hydrophobicity or shape. Instrumentation for PDMS is presently only available from one manufacturer, Applied Biosystems AB (P.O. Box 15045, S-750 45 Uppsala, Sweden). The instruments are relatively cheap, and their operation and maintenance simple compared to other mass spectrometers with high-mass capability. A majority of the current applications are in the field of protein chemistry, and the method is rapidly becoming a routine technique in the protein chemistry laboratory (2).

In practical biochemical studies, the amount of sample is often a limiting factor, and the quantity available for the mass spectrometric measurement may often be all that is recovered at the end of a tedious and costly preparation procedure. It is, therefore, very important that the mass spectrometric procedure employed gives the best chance of success at the first attempt and that the maximum amount of information is extracted from a given quantity of sample. The principal information obtained by PDMS analysis is the mol wt, but since the method consumes a very small proportion of the sample, it is possible to recover the remainder for chemical or enzymatic degradation. Thus, the molecular weights of the degradation products may be obtained without consuming further sample. The methods currently employed in the author's laboratory to prepare and apply the samples and to obtain further structural information on the sample are described in the following sections.

From Methods in Molecular Biology, Vol. 17: Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques. Edited by C. Jones, B. Mulloy, and A. H. Thomas. Copyright ©1993 Humana Press Inc., Totowa, NJ.

2. Principles of PDMS and of Sample Preparation

All particle-induced desorption methods are based on desorption of a sample from the solid or liquid state by bombardment with a beam of neutral molecules or ions whose energies range from a few kiloelectronvolts (keV) to megaelectronvolts (MeV). In PDMS, a low flux of primary ions in the 100 MeV energy range produced by spontaneous fission of californium-252 is used. The principle of the plasma desorption mass spectrometer is illustrated in Fig. 1. The sample is deposited on a 0.5-1 µm-thick foil of aluminized polyester and placed in front of a 10-µCi 252Cf source. In each fission event, two collinear fission fragments are created, one of which hits the start detector and triggers the time measurement, while the other penetrates the sample and causes desorption of a number of secondary ions derived from the sample and sample matrix. These ions are accelerated by a 10-20 kV potential between the sample foil and the acceleration grid at earth potential, and are allowed to drift through the field-free flight tube to the stop detector, where their flight times are recorded by the time-to-digital converter (TDC). Each fission event results in the formation of only a few sample ions, and so to obtain sufficient ion statistics, it is necessary to accumulate data from a large number of fission events (10⁵-10⁷). This corresponds to anything between a few minutes and several hours of recording time. The flight time, $T$, is related to the mass-to-charge ratio of the ion ($m/z$) by the equation $T = k_1 (m/z)^{1/2} + k_2$, where $k_1$ and $k_2$ are constants. Calibration of a spectrum can thus be performed if the masses and flight times of two peaks are known; normally the peaks for H+ and NO+ are used, since these are always abundant in the spectra when a nitrocellulose support is used.
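The two-point calibration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure implemented in the commercial instrument software: the function names are hypothetical, all ions are taken to be singly charged, and the monoisotopic m/z values used for H+ (1.00728) and NO+ (29.99744) are standard values supplied here rather than taken from the chapter. Flight times may be in any consistent unit.

```python
import math

# Assumed monoisotopic m/z values for the two calibration ions (z = 1).
MZ_H_PLUS = 1.00728
MZ_NO_PLUS = 29.99744


def calibrate(t_a, mz_a, t_b, mz_b):
    """Solve T = k1 * sqrt(m/z) + k2 for k1 and k2 from two known peaks."""
    root_a, root_b = math.sqrt(mz_a), math.sqrt(mz_b)
    if root_a == root_b:
        raise ValueError("calibration peaks must have different m/z values")
    k1 = (t_b - t_a) / (root_b - root_a)
    k2 = t_a - k1 * root_a
    return k1, k2


def mz_from_flight_time(t, k1, k2):
    """Invert the calibration equation: m/z = ((T - k2) / k1) ** 2."""
    return ((t - k2) / k1) ** 2


def flight_time(mz, k1, k2):
    """Forward form of the calibration equation, handy for checks."""
    return k1 * math.sqrt(mz) + k2
```

In use, the measured flight times of the H+ and NO+ peaks would be passed to `calibrate`, and the resulting constants applied to the flight time of the peak of interest with `mz_from_flight_time`. Because both calibration ions lie at very low mass, any error in their flight times is magnified when extrapolating to a peptide or protein peak.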

Fig. 1. Principle of the plasma desorption mass spectrometer. Fission of the 252Cf nucleus creates two particles, one of which triggers the start detector, whereas the other causes desorption of sample ions. These are accelerated through the earthed grid and pass through the field-free flight tube to the stop detector. From the individual flight times measured after a large number of fission events, the mass spectrum is reconstructed in the computer. (Reproduced by courtesy of Applied Biosystems AB, Uppsala.)

The commercial instrument is fully automated and very simple to operate. Also, unlike conventional mass spectrometers, there are no adjustments or ion-focusing controls. The quality of the spectrum therefore relies only on the quality of the sample and the sample preparation. Like other desorption ionization methods, PDMS is very sensitive to low-mol-wt contaminants, especially metal ions, whereas ammonium ions and many neutral contaminants have little or no adverse effect on the quality of the spectra obtained (3). Alkali metal salts are often used in buffers in protein studies and are also frequently present as trace impurities in water and organic solvents. Since the presence of even very small amounts of alkali metal ions strongly suppresses the molecular ion signal in the PD spectra of peptides and proteins, it is of utmost importance to use clean solvents and to design sample preparation methods that minimize contamination or allow removal of such contaminants. First, glassware should, whenever possible, be replaced with polypropylene, and second, high-performance liquid chromatography (HPLC) with ultrapure solvents is advisable as the final purification step. The use of nitrocellulose as sample support allows removal of salt contaminants, because they can be washed from the surface with pure solvents after adsorption of a peptide or protein sample (4). The nitrocellulose support has the further advantage that increased sensitivity and increased molecular ion yields are obtained (5).

3. Materials

1. Nitrocellulose solution, prepared by dissolving a piece of nitrocellulose membrane (Bio-Rad Laboratories, Richmond, CA) in acetone to a concentration of 2 µg/µL.
2. Electrospray equipment (Applied Biosystems AB, Uppsala, Sweden).
3. Sample solution: 0.1% trifluoroacetic acid (TFA), 15% ethanol or acetonitrile in water (v/v/v). The water in this and all other solutions must be ultra-high quality (UHQ), 15-18 MΩ·cm resistivity water, for example, as prepared with an Elgastat UHQ apparatus (ELGA Ltd., High Wycombe, Bucks, UK) or a Milli-Q water purification unit (Millipore Waters, Milford, MA).
4. Sample spinner consisting of a vertically mounted variable-speed motor (with a top speed of at least 2500 rpm) equipped with a small target holder at the end of the motor shaft.
5. Washing solution: 0.1% TFA in UHQ water.
6. Reduction solution: 0.08M dithiothreitol (DTT) in 0.1M ammonium bicarbonate, pH 7.8.
7. Digestion solutions: 1 µg/µL of trypsin or Staphylococcus aureus protease in 0.1M ammonium hydrogen carbonate adjusted to pH 7.4, or 1 µg/µL of carboxypeptidase Y or carboxypeptidase MII (malt carboxypeptidase II) in 0.05M ammonium acetate adjusted to pH 4-4.3 with acetic acid.

4. Methods

1. The nitrocellulose solution (25-50 µL) is placed in the spray capillary of the electrospray equipment, and the high voltage turned on to start the spray. The focusing voltage is then adjusted to give a spot size of approx 7 mm in diameter, and the spray continued until all the solution is used. The nitrocellulose targets are visually inspected for quality and homogeneity (see Note 2).
2. The sample is dissolved in the solvent to a concentration of 0.01-1 µg/µL, and 2-5 µL of this solution are placed in the center of the nitrocellulose-covered target mounted on the sample spinner. The sample is distributed over the surface by gradually increasing the motor speed and dried by spinning at full speed. The drying that propagates from the center is easily observed (see Note 3).
3. If the sample is known to be contaminated by salts or other low-mol-wt compounds, it is washed by adding 5-200 µL of the washing solution to a slowly spinning target and dried at full speed as previously described. If necessary after recording a spectrum, the target can be removed from the mass spectrometer, washed, and reanalyzed (see Note 4).
4. After automatic or manual calibration of the spectrum, the centroids of the peaks of interest are determined. To avoid influence from metastable contributions and unresolved adduct ions, i.e., the broad base of the peak, centroid determination is, whenever possible, carried out by considering only the upper half of the peak as indicated in Fig. 2.
5. After recording a spectrum, most of the sample is still intact and can be cleaved chemically or enzymatically in situ (6). The target is removed from the mass spectrometer, 2-4 µL of the appropriate reduction or digestion solution are distributed over the surface, and the moist target is placed in a small plastic box containing a moist filter paper to prevent evaporation of the solution. The reaction is stopped after 5-20 min by spin-drying (see also Note 5).
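The upper-half centroiding of step 4 can be illustrated with a short Python sketch. It is a minimal interpretation rather than the instrument's own routine: the function name is hypothetical, the peak is assumed to be supplied as matching lists of m/z values and intensities covering a single peak, and the weighting by intensity above the half-height line is an assumption made here to mimic the shaded area described for Fig. 2.

```python
def upper_half_centroid(mz, intensity, fraction=0.5):
    """Centroid of the part of a single peak lying above fraction * apex height.

    Only the contiguous run of points around the apex that stays at or above
    the cut-off is used, and each point is weighted by its height above the
    cut-off, so the broad base of the peak does not contribute.
    """
    if len(mz) != len(intensity) or not mz:
        raise ValueError("mz and intensity must be non-empty and equal in length")
    apex_height = max(intensity)
    if apex_height <= 0:
        raise ValueError("peak has no positive intensity")
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in the range [0, 1)")
    cutoff = fraction * apex_height
    apex = intensity.index(apex_height)

    # Walk outwards from the apex while the signal stays above the cut-off.
    left = apex
    while left > 0 and intensity[left - 1] >= cutoff:
        left -= 1
    right = apex
    while right < len(mz) - 1 and intensity[right + 1] >= cutoff:
        right += 1

    weights = [intensity[i] - cutoff for i in range(left, right + 1)]
    weighted_sum = sum(w * mz[i] for w, i in zip(weights, range(left, right + 1)))
    return weighted_sum / sum(weights)
```

For a peak with a tail on the high-mass side, this value lies closer to the apex than a centroid taken over the whole peak, which is the behavior the procedure is meant to achieve.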

Fig. 2. Plasma desorption spectrum of the molecular ion region of human insulin. The bar indicates the position of the cursors in the centroiding operation, and the shaded area is the part of the peak used for the calculation of the centroid. The broadening of the lower part of the peak is owing to metastable decay in the field-free flight tube.

5. Notes

1. HPLC using UHQ solvents is the preferred final purification step prior to mass spectrometric analysis. It is possible to apply an aliquot of the eluent directly to the nitrocellulose target, but the composition of the solvent and the concentration of sample may not be optimal. Therefore, the collected fractions are normally lyophilized in polypropylene Eppendorf tubes in a vacuum centrifuge and redissolved in an appropriate amount of sample solution prior to application.
2. For analysis of molecules above 7-10 kDa and for very small sample amounts, it is important that targets with a thick homogeneous nitrocellulose layer are used. The best nitrocellulose targets are therefore used for such analysis, whereas lower quality targets may be used for less critical samples.
3. Small or very hydrophilic peptides may not adsorb strongly to the nitrocellulose and, thus, may be removed to the periphery of the target in the spinning process. If this is suspected, spinning is omitted and the sample solution simply left to dry. After drying, careful washing with a very small volume of washing solution may be possible.
4. Poor sample ion yield indicates the need to wash the sample. This may be the result of too little sample or too much sample, or a sample that is difficult to desorb, but most frequently it is the result of too high a content of alkali metal ions. This can be ascertained by observation of the intensity of Na+ and K+ ions at m/z 23 and 39, respectively. If the summed abundance of these two ions is more than half of that for H+, washing is recommended. Washing will also improve the result if too much sample has been applied.
5. The following in situ reactions have been successfully performed: reduction with DTT (7,8); enzymatic digestion with trypsin (7), Staphylococcus aureus protease (7,8), and carboxypeptidases (9). When such reactions are performed, some of the reaction products may be too small to adsorb to the nitrocellulose, and so are lost in the spin-drying process. Some components may not be observed because of suppression effects (7,8). Although informative, these procedures do not always give a complete picture.

References

1. Torgerson, D. F., Skowronski, R. P., and Macfarlane, R. D. (1974) New approach to the mass spectrometry of non-volatile compounds. Biochem. Biophys. Res. Commun. 60, 616-621.
2. Roepstorff, P. (1989) Plasma desorption mass spectrometry of peptides and proteins. Acc. Chem. Res. 22, 421-427.
3. Mann, M., Nielsen, H. R., and Roepstorff, P. (1990) Practical aspects of calibration and effect of non-protein compounds on spectrum quality in protein analysis by PDMS, in Ion Formation from Organic Solids (Hedin, A., Sundqvist, B. U. R., and Benninghoven, A., eds.), Wiley, Chichester, UK, pp. 47-54.
4. Jonsson, G. P., Hedin, A. B., Håkansson, P. L., Sundqvist, B. U. R., Säve, B. G., Nielsen, P. F., Roepstorff, P., Johansson, K. E., Kamensky, I., and Lindberg, M. S. L. (1986) Plasma desorption mass spectrometry of peptides and proteins adsorbed on nitrocellulose. Anal. Chem. 58, 1084-1087.
5. Nielsen, P. F., Klarskov, K., Højrup, P., and Roepstorff, P. (1988) Optimization of sample preparation for plasma desorption mass spectrometry of peptides and proteins using a nitrocellulose matrix. Biomed. Environ. Mass Spectrom. 17, 355-362.
6. Chait, B. T., Chaudhary, T., and Field, F. H. (1987) Mass spectrometric characterization of microscale enzyme catalyzed and chemical reactions on surface bound peptides and proteins, in Methods in Protein Sequence Analysis (Walsh, K. A., ed.), Humana, Clifton, NJ, pp. 483-492.
7. Nielsen, P. F. and Roepstorff, P. (1988) Suppression effects in peptide mapping by plasma desorption mass spectrometry. Biomed. Environ. Mass Spectrom. 18, 131-137.
8. Nielsen, P. F., Roepstorff, P., Clausen, I. B., Jensen, E. G., Jonassen, I., Svendsen, A., Balschmidt, P., and Hansen, F. B. (1989) Plasma desorption mass spectrometry, an analytical tool in protein engineering: characterisation of modified insulins. Protein Eng. 2, 449-457.
9. Klarskov, K., Breddam, K., and Roepstorff, P. (1989) C-Terminal sequence determination of peptides degraded with carboxypeptidases of different specificities and analysed by 252Cf plasma desorption mass spectrometry. Anal. Biochem. 180, 28-37.

Fast Atom Bombardment Mass Spectrometry of Peptides

Robin Wait

1. Introduction

The contribution of mass spectrometry to the solution of problems in protein biochemistry was limited until the development of methods of ionization that do not require derivatization or prior vaporization of the sample. Fast atom bombardment (FAB), introduced by Barber et al. in 1981 (1), is one of the most important of these methods, and has been widely applied in the peptide and protein field. In the FAB experiment (Fig. 1), the sample is dissolved in a liquid of low vapor pressure, often glycerol or thioglycerol ("the matrix"), and is bombarded by a beam of energetic particles, such as xenon atoms, that sputter sample molecules from the surface layers of the matrix into the mass spectrometer vacuum. Proton or other cation attachment produces abundant (positive) ions characteristic of the sample's molecular mass. A proportion of these molecular ions dissociate, producing structurally informative fragments that are generally less intense than the molecular ions, since the ionization process imparts relatively little excess energy. Negatively charged ions are also generated, and spectra may be recorded in either mode by appropriate selection of the polarity of the ion extraction voltages. At low mass, FAB spectra are generally dominated by signals attributable to ionization of the matrix. The background of "chemical noise" extending to high mass, which gives FAB spectra their characteristic peak-at-every-mass appearance, is probably attributable to direct hits on sample and matrix molecules by the bombarding species. Figure 2 shows a typical FAB spectrum of the cyclic heptapeptide microcystin-LR, obtained from the cyanobacterium Microcystis aeruginosa.

Fig. 1. The FAB experiment. The sample, dissolved in a liquid of low vapor pressure on the probe tip, is bombarded with a beam of energetic particles, such as xenon atoms, which sputters ionized sample molecules into the mass spectrometer vacuum. The resulting ions are then mass-analyzed in the usual way.

The advantages of the technique over conventional methods are that unusual and modified amino acid residues are easily identified, mixtures containing several peptides are amenable to analysis, and since there is no requirement for a free N-terminus, cyclic (2) and N-terminally blocked materials can be readily characterized. The areas where FAB-MS-based techniques have proven particularly successful include the identification of posttranslational modifications (3) (including glycosylation [4], phosphorylation [S], and sulfation [6]), checking the correctness of cDNA-derived protein sequences (7), confirmation of the fidelity of synthesis of recombinant materials (8), verifying the products of solid-phase peptide synthesis (9), the characterization of variant forms of proteins, such ashemoglobin (IO), and the assignment of disulfide bridges (11,12). It should be stressed that the information provided by FAB-MS is complementary to that afforded by classical protein methodologies, so the technique is most effectively deployed in conjunction with conventional strategies for protein characterization. The present chapter provides an introductory account of the practical aspects of obtaining and interpreting the FAB spectra of peptideand protein-derived samples. It is particularly aimed at biochemists

Fig. 2. FAB mass spectrum of a cyclic heptapeptide toxin, microcystin-LR, from Microcystis aeruginosa. The inset shows an expansion of the molecular ion region. The signal at m/z 1149.6 is an artifact owing to addition of a molecule of the matrix compound, dithiothreitol. The spectrum was obtained on a Kratos MS80, using xenon as bombarding gas and a 5:1 mixture of dithiothreitol:dithioerythritol as matrix. The mass-assigned profile plot represents the average of eight scans. These compounds are particularly suited to analysis by FAB, since the absence of a free N-terminal and the presence of unusual amino acids, such as N-methyl dehydroalanine, β-methylaspartate, and 3-amino-9-methoxy-2,6,8-trimethyl-10-phenyl-4,6-decadienoic acid, make them difficult to characterize by conventional methods.


contemplating collaboration with mass spectrometrists, but may also be found useful by spectrometrists who have not previously worked with peptides. Although the main focus is on the use of two-sector magnetic mass spectrometers, much of what follows is equally applicable to other types of instruments. Relatively few detailed applications are discussed, but many will be found in the more comprehensive references listed below (13-19).

2. Instrumentation and Materials

2.1. Instrumentation

Fast atom bombardment sources can be fitted to most types of mass spectrometers, including magnetic sector, quadrupole, and Fourier transform instruments. Magnetic sector instruments, however, are particularly suitable in that their mass range at full accelerating voltage (currently >10,000 dalton) approximately matches that over which FAB produces usable secondary ion currents. Although the upper mass limit of a magnetic sector instrument equipped with FAB is lower than that of time-of-flight mass spectrometers fitted with laser ionization or plasma desorption sources (see below; Chapters 9 and 10), mass measurement accuracies of 0.3 dalton or better are possible over much of the range of sector instruments, permitting unambiguous identification of most peptide modifications, including C-terminal amidation. Moreover, because FAB produces continuous ion currents rather than the short pulses of these other techniques, it is more easily combined with supplementary techniques, such as collisional activation and peak matching.

Apart from a mass spectrometer, the equipment for FAB-MS consists of a source of bombarding particles and a probe, by means of which the sample is introduced into the instrument. The bombarding species may be neutral, for example, argon or xenon atoms, or charged particles, such as cesium ions (20). Xenon is much to be preferred over argon, since greater momentum transfer results in significantly more intense ion currents (21,22); the higher cost is relatively insignificant given the extremely low rate of consumption (typically a few mL/d). Cesium guns produce ions of up to 35 keV energy, compared to the typical 6-10 keV of fast atom guns. Consequently, better secondary ion yields are obtained, particularly in the high-mass (~3000 dalton) region (23). Increased sample lifetimes have also been reported owing to a lower flux of bombarding particles. The latter technique is more

accurately called liquid secondary ionization mass spectrometry (LSIMS), since the primary beam is not composed of neutral atoms. However, the crucial aspect of both experiments is the use of a liquid matrix, not the charge state of the bombarding particles (indeed the “neutral” beam from a saddle-field FAB gun contains a substantial proportion of xenon ions [22]). The principal difference is thus that higher primary beam energies are used in LSIMS; the resulting spectra are often virtually indistinguishable, so the term FAB is sometimes used somewhat loosely to describe both techniques. The main focus of this chapter is FAB-MS using xenon bombardment, but most of what follows is equally applicable to LSIMS with a cesium ion beam.

For most peptides, positive ion operation is more appropriate, since ion currents (and therefore sensitivity) are higher, and the fragmentation processes are more informative. There may be some advantage in recording negative ion spectra if electrophoresis indicates a net negative charge; however, as the number of amino acids increases, the imbalance between acidic and basic residues usually becomes relatively less significant, and such factors as overall hydrophobicity become more important in determining ionization efficiency. Negative ion operation, however, does appear to be superior for the analysis of peptides containing sulfated tyrosine residues, the [M - H]- ions of which are much less prone to desulfation than [M + H]+ ions (6).

For introduction into the mass spectrometer, the sample solution is deposited on a removable target mounted on the probe. Targets are generally constructed of copper, gold-plated copper, or stainless steel. Copper is not especially satisfactory, since it sometimes contributes copper adduct ions to the sample spectrum, and cleaning with nitric acid results in rapid dissolution of the target. Improved sensitivity has been reported from the use of gold-plated copper targets (24). Stainless steel is the most generally useful target material; its “wettability” by some matrices is poor, but this can be improved by roughening with mild abrasive. Coating the target surface with a thin layer of nitrocellulose has been reported to improve sensitivity (25). This can be achieved by dissolving a piece of nitrocellulose filter disk in acetone to a concentration of about 1 mg/mL, applying 1 µL to the target, and allowing it to dry. It is also possible to effect some desalting of hydrophobic samples by applying them to the nitrocellulose surface, and carefully washing with 10 µL of deionized water.


2.2. Materials

2.2.1. Matrix Choice

The defining characteristic of the FAB experiment is not bombardment with a neutral beam, but the presentation of the sample for ionization as a solution in a liquid matrix. This matrix protects the sample molecules from excessive radiation damage, and is responsible for the very stable and persistent analyte ion currents obtained, since surface ablation, and diffusion and convection processes within the matrix continually replenish the supply of sample molecules available for desorption. The chemistry of the sample-matrix interaction is crucial, since in the positive mode, ionization is effected by transfer of protons from matrix to analyte, whereas in the negative mode, the matrix functions as a proton acceptor (26). The matrix has to dissolve the sample, since better data are obtained from solutions than from dispersions of peptides. The selected matrix should be miscible with water or other polar solvents, because peptide samples are generally loaded from aqueous solutions. A high dielectric constant is a useful property, because reduction of the coulombic interaction between solvated ion pairs will lower the energy required for their desorption. It is important that the matrix has a low vapor pressure, so that it does not evaporate too rapidly. This is necessary both to ensure adequate sample lifetimes (since the analyte spectrum will not be observed once the matrix is exhausted) and to preserve the vacuum in the ion source housing. Cooling of the probe is a means of prolonging ion currents when using volatile matrices, such as thioglycerol (27). Matrix viscosity is another significant experimental variable; generally low viscosity is desirable, since free diffusion of the sample molecules is thereby facilitated. Increasing the matrix viscosity has been shown to cause a severe degradation of spectral quality (28).
The matrix does, however, need to be sufficiently viscous to form a stable film on the target, especially for instruments where the sample target is mounted vertically in the source. Finally, ions derived from ionization of the matrix should not obscure informative regions of the sample spectrum, and the matrix should preferably not react chemically with the sample. No single matrix ideally satisfies all these criteria, and some degree of compromise has to be accepted. The role of the liquid matrix has been reviewed (29,30) and a compilation of the most commonly used compounds has been published (31). A useful tabulation of relevant physical properties, including viscosity, dielectric constants, heats of vaporization, and proton affinities, has recently appeared (32).

For work with peptides, the most commonly used matrices are glycerol, thioglycerol, 3-nitrobenzyl alcohol, and a eutectic mixture of dithiothreitol and dithioerythritol, though sulfolane, 2-hydroxyethyl disulfide (33), thiodiethylene glycol (34), and others have also been used. Glycerol is still among the most widely used FAB matrices. It is an excellent solvent of all but the most hydrophobic peptides, and because of its low vapor pressure, beam currents persist for 20 min or more. The background spectrum is well characterized, consisting of protonated clusters [93 + 92n]+ ([91 + 92n]- in the negative mode), which extend to beyond mass 1200. Thioglycerol is more acidic than glycerol and, consequently, affords more intense [M + H]+ signals. A mixture of thioglycerol and trifluoroacetic acid is particularly successful for the analysis of high-mass peptides (23,35). Thioglycerol is extremely volatile, and the resulting ion currents are therefore short-lived (typically <5 min), which limits the application of repetitive scanning and signal averaging methods. Sample lifetimes can be increased by the admixture of 50% glycerol, which also reduces the propensity of glycerol to form high-mass clusters. The background spectrum of thioglycerol consists of clusters of mass [109 + 108n]+. These are very weak above m/z 649 (n = 5). Commercial samples of thioglycerol frequently contain ammonium ions, which is why ammonium cationization ([MH + 17]+) of both matrix and sample ions (particularly with neutral sugar samples) is sometimes observed.

A dithiothreitol/dithioerythritol matrix is prepared by mixing DTT and DTE in the proportions 5:1 (w/w), and gently warming to 60°C or until melted. The resulting eutectic mixture remains liquid on cooling to room temperature.
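The glycerol and thioglycerol background series quoted above are simple arithmetic progressions, so candidate matrix peaks can be listed or recognized with a few lines of code. The following is a minimal sketch using nominal (integer) masses only; the function names and the 0.5 dalton tolerance are illustrative assumptions, not part of any instrument data system.

```python
def cluster_series(base, unit, n_max):
    """Nominal m/z values base + unit*n for n = 0..n_max."""
    return [base + unit * n for n in range(n_max + 1)]


def nearest_cluster(mz, base, unit, tol=0.5):
    """Return n if mz lies within tol of base + unit*n (n >= 0), else None.

    The tolerance is an assumed value chosen for illustration.
    """
    n = round((mz - base) / unit)
    if n >= 0 and abs(mz - (base + unit * n)) <= tol:
        return n
    return None


# Series quoted in the text (nominal masses, not exact masses).
GLYCEROL_POS = (93, 92)        # [93 + 92n]+
GLYCEROL_NEG = (91, 92)        # [91 + 92n]-
THIOGLYCEROL_POS = (109, 108)  # [109 + 108n]+

if __name__ == "__main__":
    print(cluster_series(*GLYCEROL_POS, 12))
    print(cluster_series(*THIOGLYCEROL_POS, 5))
```

With n = 5 the thioglycerol series gives m/z 649, in agreement with the value stated in the text.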
Sample ion currents are almost as persistent as those obtained with glycerol, but are more intense and the background ionization is weaker. Cluster ions are produced at m/z 155, 309, and so on by increments of 154, but are very weak above m/z 463. If the matrix has not been freshly prepared, significant amounts of the oxidized form (2 dalton lower in mass) may be present, particularly in the larger clusters. Prolonged bombardment of samples in this matrix can result in the reduction of disulfide bridges (Section 2.2.2.).

3-Nitrobenzyl alcohol (36) is particularly useful for high-mol-wt and hydrophobic materials, which may give solubility problems in


more polar matrices. It is less readily cationized than hydroxylated matrices, and so can also be useful for samples that are difficult to desalt completely. Cluster ions ([154 + 153n]+) are intense at low mass, but much weaker above m/z 460. The higher mass clusters exhibit multiple dehydroxylations. On xenon bombardment, this matrix turns a disconcerting orange-brown color, but this seems to be without adverse effect on the sample.

Provided the best available grade is purchased, any of these matrices can be satisfactorily used as supplied. Some workers, however, prefer to purify glycerol and thioglycerol by vacuum distillation. It is prudent to degas matrices thoroughly before use, either under vacuum or ultrasonically, because this reduces the risk of frothing and spitting in the vacuum system.

2.2.2. Matrix-Induced Artifacts

Chemical interactions between sample and matrix can result in the formation of artifacts, such as redox reaction products and adducts. The primary atom beam may itself play a direct role in these processes by generating radical and other reactive species as a result of radiation damage of matrix or sample molecules (37). Reactions between peptides and the matrix can be difficult to recognize, because the rate and extent of reaction can be variable and unreproducible. If sufficient sample remains, it is often useful to repeat an FAB experiment using a different matrix to check that unexpected or unassigned peaks do not exhibit mass shifts, which would indicate that they correspond to adducts or other artifacts. Mixed clusters may be formed between sample and matrix ions, resulting in [M + H + 92]+ with a glycerol matrix. Noncovalent addition of 154 dalton to molecular ions has been reported in both positive and negative modes when using DTT/DTE matrix (38). [MH + 12]+ species are sometimes observed when analyzing peptides in glycerol or other hydroxylated matrices (39,40). Other adducts corresponding to the addition of 24, 42, 56, 74, 86, 104, 116, and 134 dalton to the [M + H]+ have also been reported (41). Peptides containing several basic residues are particularly prone to form these adducts, especially in dilute solution and on prolonged bombardment. They probably originate via attachment of glycerol to primary amine groups in the peptide, followed by fragmentation, reaction at multiple sites


being responsible for the higher mass species. Acidification and the use of sulfur-containing matrices minimize formation of these artifacts. The reduced artifact formation in thioglycerol may be the result of the production of S-centered radicals, which are more likely to decompose or dimerize than the C-centered radicals obtained when glycerol is irradiated (39).

Unexpected ions have also been reported in FAB spectra of peptides run in 3-nitrobenzyl alcohol in both the positive and negative ion modes (42). These ions, corresponding to the addition of 133 dalton to the molecular ion (i.e., [M + H + 133]+ and [M - H + 133]-), are thought to arise via Schiff’s base formation between 3-nitrobenzaldehyde and primary amine groups of the peptide. The 3-nitrobenzaldehyde may be present in the matrix as an impurity or may be generated in situ as a result of radiation damage. The extent of this reaction can be reduced by decreasing the nucleophilicity of the amine groups by protonation.

Redox reactions are frequently observed (43,44), which are responsible for the production of various protonated and deprotonated species of the general type [M + nH]+. Usually, these artifact peaks are relatively weak compared to the molecular ions of peptides, but they may be of comparable intensity to the monoisotopic peak in larger peptides, and so make identification of the true molecular ion difficult (45). Under some circumstances, disulfide bond reduction has been reported, particularly in thiol-containing matrices, such as DTT/DTE and thioglycerol. In the case of a peptide containing a single intrachain disulfide bond, this is manifested by an enhancement of the intensity of the [M + 3]+ peak, which results in a distortion of the theoretical isotope pattern. Reduction of interchain S-S linkages results in the appearance of additional signals in the spectrum, corresponding to the molecular ions of the constituent peptides.
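The nominal mass shifts listed in this section lend themselves to a simple lookup when screening unassigned peaks above the protonated molecule. The sketch below is illustrative only: the function name, the labels, and the 0.5 dalton tolerance are assumptions, and the example m/z values used to exercise it are hypothetical rather than measured. A match suggests a possible artifact but does not prove one; as noted above, rerunning the sample in a different matrix remains the practical check.

```python
# Nominal shifts relative to [M + H]+ quoted in the text.
MATRIX_ADDUCTS = {
    12: "[MH + 12]+ (glycerol or other hydroxylated matrices)",
    92: "[M + H + 92]+ (mixed cluster with glycerol)",
    133: "[M + H + 133]+ (Schiff base, 3-nitrobenzyl alcohol matrix)",
    154: "+154 (noncovalent addition, DTT/DTE matrix)",
}
# Further glycerol-derived additions reported for basic peptides.
GLYCEROL_SERIES = (24, 42, 56, 74, 86, 104, 116, 134)


def classify_shift(observed_mz, mh_mz, tol=0.5):
    """Return labels of known matrix artifacts matching the mass difference.

    observed_mz: m/z of the unassigned peak.
    mh_mz: m/z of the presumed [M + H]+ ion.
    tol: matching tolerance in dalton (an assumed value).
    """
    delta = observed_mz - mh_mz
    hits = [label for shift, label in MATRIX_ADDUCTS.items()
            if abs(delta - shift) <= tol]
    hits += ["+%d (glycerol-derived adduct series)" % shift
             for shift in GLYCEROL_SERIES if abs(delta - shift) <= tol]
    return hits


if __name__ == "__main__":
    # Hypothetical peptide with [M + H]+ at m/z 1000.0.
    for peak in (1012.0, 1092.1, 1154.0, 1050.0):
        print(peak, classify_shift(peak, 1000.0))
```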
The extent of reduction varies with the matrix used and the length of xenon bombardment, and appears to be reduced by acidification of the matrix. There is also some dependence on structure, since many disulfide-bridged peptides are stable over the time scale of the FAB experiment, even in thiol-containing matrices.

2.2.3. Additives and Cosolvents

The rate of dissolution of solid peptides in the viscous liquids used as FAB matrices is often slow, so samples are generally first dissolved in a suitable solvent that is miscible with the selected matrix. Trifluoroacetic (0.1%) or 30% acetic acids are often used for this purpose. The latter is useful for many peptides of otherwise poor solubility. Acidification of the matrix frequently results in an enhancement of the intensity of the molecular ion signal. The usual explanation of this phenomenon is that protonation increases the concentration of preformed ions available for desorption. A recent report, however, suggests that additional mechanisms are at work, including improved solvation of the analyte, changes in surface activity, modification of analyte volatility, and alteration of the effects of radiation chemistry (46). Acidification can alternatively be achieved by adding a drop of 0.1M HCl or 5% acetic acid to the mixture of sample and matrix on the probe. Acids, such as oxalic or p-toluenesulfonic, which are less liable to be pumped away in the vacuum lock, can be used instead. The latter, being surface active, may enhance sensitivity by increasing the surface concentration of the protonated sample by ion pairing. Heptafluorobutyric acid has been recommended for this reason (47), but should only be added to samples that are effectively salt free, since alkali metal heptafluorobutyrates form readily desorbed clusters. Additives can also be used to increase the amount of structurally useful information in the FAB spectra of peptides; a considerable enhancement of the absolute degree of fragmentation was obtained by addition of 1 µL of 1M perchloric or sulfuric acid to peptides run from a glycerol/thioglycerol matrix (48).

2.3. Sample Considerations

2.3.1. Sample Purification

The quality of the sample can determine the success or failure of an FAB experiment. The criteria of sample purity are rather different from those encountered elsewhere in biochemistry; FAB-MS is extremely unforgiving of surface-active and ionic impurities, but it is often possible to obtain useful data from samples containing mixtures of peptides.
The primary atom or ion beam does not penetrate more than 10-100 Å into the matrix, so sample desorption is restricted to this region. Thus, contaminants, such as detergents, which are more surface active than the sample, will successfully compete for the surface layers of the matrix and may be preferentially desorbed, in the worst case resulting in the complete suppression of sample ionization. A series of peaks separated by 44 dalton is often indicative of detergent

contamination. Wherever possible, sample preparation should avoid detergent-solubilization steps, and apparatus and containers that come into contact with the sample should be acid washed and thoroughly rinsed in deionized water. Ionic materials employed in sample preparation, such as ammonium sulfate, denaturing agents, like urea and guanidinium HCl, and ion-pairing reagents can also seriously interfere with FAB-MS and should be stringently removed in subsequent steps. Similarly, sodium and potassium salts, which are ubiquitous in samples of biological origin, cause problems. Trace amounts of alkali salts are not always disadvantageous and can be useful for distinguishing molecular ions from fragments, since the former will be accompanied by cationized satellite peaks. The presence of excessive salt contamination is generally indicated by cationization of matrix molecules, which introduces additional nonsample ions into the spectrum and may obscure structurally informative fragment ions. Sodium cationization of glycerol clusters, for example, results in a series of ions at m/z 115, 207, 299, and so on; if the intensity of these amounts to >10% of the corresponding protonated clusters, further desalting is almost certainly indicated. Cation attachment to sample molecules reduces sensitivity by spreading the sample ion current over several molecular species, e.g., [M + Na]+ and [M + K]+, respectively, 22 and 38 dalton above the [M + H]+ peak. Replacement of exchangeable protons in the sample by metal cations introduces further complexity into the spectra and, in extreme cases, prevents unambiguous determination of the molecular mass. Metal cation attachment can also inhibit sequence-specific fragmentation, since protonation of the amide nitrogen is required for the genesis of the a_n, b_n, and y_n + 2 series of backbone cleavage ions (49) (Section 3.2.).
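The 10% rule of thumb for sodiated glycerol clusters, and the positions of the cationized satellites of the molecular ion, can be expressed as a short check. This is a minimal sketch under stated assumptions: the peak list is taken to be a dictionary of nominal m/z to intensity (a hypothetical input format), only the first three clusters are examined, and the intensities in the usage example are invented for illustration.

```python
def sodiated_glycerol(n):
    """Nominal m/z of the sodium-cationized cluster of n glycerols: 115, 207, 299, ..."""
    return 23 + 92 * n


def protonated_glycerol(n):
    """Nominal m/z of the protonated cluster of n glycerols: 93, 185, 277, ..."""
    return 1 + 92 * n


def desalting_indicated(intensity, max_clusters=3, threshold=0.10):
    """True if any sodiated cluster exceeds threshold x its protonated counterpart.

    intensity: dict mapping nominal m/z to peak intensity (assumed format).
    """
    for n in range(1, max_clusters + 1):
        h = intensity.get(protonated_glycerol(n), 0)
        na = intensity.get(sodiated_glycerol(n), 0)
        if h > 0 and na > threshold * h:
            return True
    return False


def cation_satellites(mh_mz):
    """Expected positions of [M + Na]+ and [M + K]+ relative to [M + H]+."""
    return {"[M + Na]+": mh_mz + 22, "[M + K]+": mh_mz + 38}


if __name__ == "__main__":
    spectrum = {93: 1000, 115: 150, 185: 400, 207: 30}  # invented intensities
    print(desalting_indicated(spectrum))
    print(cation_satellites(1000))
```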
Replacement of protons by other cations alters the energy of the system and, thus, reduces the intensity of these fragments. In cases of serious salt contamination, the sample-derived signal may be suppressed altogether because of preferential ionization of ionic clusters. Where possible, alkali and other metal ions should be avoided in the work-up of samples, e.g., by the use of volatile buffers, such as ammonium hydrogen carbonate, ammonium acetate, pyridinium acetate, or N-ethyl morpholine, which can be removed by freeze-drying. Volatile buffer salts may be contaminated by involatile impurities that will be concentrated by freeze-drying, so buffer solutions should always be


prepared from materials of the highest available purity, and the minimum possible volumes and concentrations used. Sodium and potassium salts are leached from many types of glass, so glass apparatus and containers should be avoided as far as possible, and sample manipulations carried out in conical-bottomed polypropylene Eppendorf tubes. This is in contrast to gas chromatography-mass spectrometry, for which plasticware is generally unsuitable because of the danger of attack by organic solvents, and consequent contamination of the sample with plasticizers and other interfering compounds (50,51). For procedures involving aggressive reagents or solvents, for which plastics are undesirable, good-quality borosilicate glass containers (Wheaton vials, reactivials, or equivalent) may be employed instead. The final stage of sample preparation should normally be reversed-phase HPLC using a C18, C8, or other suitable column, and an appropriate gradient of acetonitrile/water/TFA, which usually effects adequate desalting for FAB-MS. It is also possible to use disposable solid-phase extraction cartridges packed with a C18-type phase; the sample is loaded in aqueous solution, the salts washed through with deionized water, and the peptides eluted with acetonitrile/water/TFA. An automated procedure for desalting samples for FAB, using disposable high-capacity solid-phase extraction columns, has recently been described (52). An appropriate blank should be subjected to all stages of sample preparation and analyzed by FAB to demonstrate that unacceptable levels of contaminants are not being introduced by the protocol.

2.3.2. Quantity of Sample Required

It is difficult to generalize about the quantity of peptide required, because sensitivity is sample, matrix, instrument, and operator dependent, and will vary with the information desired; a simple mass measurement can be achieved with 10- or 20-fold less material than would be needed to deduce a partial sequence. Similarly, a collisional activation experiment on a two-sector instrument will consume more sample than required for a conventional FAB spectrum, though the former experiment will usually generate more information. Sensitivity will be highest when the mass spectrometer is operated at its full accelerating voltage and when transmission is maximized by using the widest setting of the resolving slits (i.e., low-resolution conditions). Sensitivity is related to the concentration of sample in the matrix, rather than the


absolute amount, and will thus be increased by the use of targets of low surface area loaded with small volumes of matrix. Some peptides are intrinsically more sensitive, producing ion currents an order of magnitude higher than others run under similar conditions. In general, ion currents fall off rapidly with increasing molecular mass, even allowing for the molar advantage enjoyed by materials of low Mr. Contamination with traces of ionic or surface-active impurities will drastically increase the amount of sample needed to obtain a satisfactory spectrum. Normally, the sample requirement for FAB lies somewhere between a few picomoles and a nanomole, but it should not be necessary to load more than about 10 µg of a peptide sample to obtain acceptable data, and significantly increasing the quantity may actually result in a deterioration of the spectrum owing to increased matrix viscosity. It is unlikely that all of the sample will be consumed in the course of an analysis, so the residue may be washed off the target and purified by HPLC.

2.4. Acquisition of FAB Spectra

2.4.1. Calibration Compounds and Calibration

To assign a mass axis to the spectra, the instrument must first be calibrated. This is generally achieved by recording the spectrum of a reference compound that produces peaks of accurately known mass. Identification of the reference peaks then enables mass assignment of the unknown peaks, either manually or by computer. The most generally useful mass calibrant for FAB-MS is cesium iodide, which produces clusters of formula [(CsI)nCs]+ covering the mass range of 133 to >25,000 in the positive ion mode. Negative ion calibration can be achieved using [(CsI)nI]- clusters, but the ion currents are weaker by at least an order of magnitude, so the range of the resulting calibration is usually narrower. Both cesium and iodine are monoisotopic elements, so the reference peaks are singlets, devoid of isotopic complexity. Identification of the reference masses is facilitated by characteristic intensity discontinuities at m/z 1692, 2471, 3510, and 5849. A calibration sample is prepared by dissolving cesium iodide (of 99.5% or greater purity) in distilled water to a concentration of about 100 mg/mL, applying about 1 µL evenly to the sample stage and allowing the water to evaporate. The temptation to speed this process by performing the evaporation in the vacuum lock should be resisted, since rapid boiling off of the solvent will probably result in contamination of the


lock with solid CsI, which may lead to scratching of the vacuum seals. No advantage is gained by using larger quantities of CsI, since excessive quantities usually produce a weaker spectrum. When irradiated by the xenon atom beam, CsI emits a faint blue-white fluorescence, which provides a convenient verification of correct gun operation. A calibration sample so prepared is good for several hours’ use, but should be renewed once or twice per day. The target should be thoroughly cleaned before fresh calibrant is applied. It is best to keep one target solely for calibration compounds to avoid introducing ionic contaminants into otherwise salt-free samples. Commercial CsI may be contaminated by low concentrations of RbI, which is manifested by the presence of doublets of satellite peaks 48 and 46 dalton below the main references. The use of copper sample stages can also result in the incorporation of copper ions in the clusters, in which case signals are observed 70 and 68 dalton below the references. At high gain, these signals can give the low-mass end of the spectra a rather confusing aspect. For some purposes, a calibration below m/z 133 may be needed (e.g., to mass assign the immonium ions in a peptide spectrum or to calibrate a B/E linked scan to low mass). Sodium or lithium iodides may be used to provide calibration down to m/z 23 and m/z 6, respectively. The mass increment between references in CsI (260) is rather too large for high-resolution mass measurements by the peak matching technique, particularly at relatively low mass. More suitable calibrants for this purpose include mixtures of CsI or NaI and glycerol, and an equimolar mixture of CsI and RbI. The former is particularly useful for calibration down to low mass in the negative mode, since the glycerol contributes several references below m/z 127.
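The CsI reference masses follow directly from the cluster formulas, which makes it easy to generate a reference table and to confirm the satellite offsets described above. The sketch below is an illustration, not a vendor calibration file: the isotope masses are standard values rounded to five decimals, the electron mass is neglected, and the function and constant names are assumptions.

```python
# Monoisotopic masses (u), rounded; electron mass neglected in this sketch.
CS133 = 132.90545
I127 = 126.90447
RB85, RB87 = 84.91179, 86.90918
CU63, CU65 = 62.92960, 64.92779

CSI = CS133 + I127  # spacing between successive references, about 260


def csi_positive(n):
    """Mass of the [(CsI)nCs]+ reference cluster."""
    return CS133 + n * CSI


def csi_negative(n):
    """Mass of the [(CsI)nI]- reference cluster."""
    return I127 + n * CSI


def satellite_offsets():
    """Mass deficits when one Cs in a cluster is replaced by Rb or Cu."""
    return {
        "85Rb": CS133 - RB85, "87Rb": CS133 - RB87,
        "63Cu": CS133 - CU63, "65Cu": CS133 - CU65,
    }


if __name__ == "__main__":
    for n in (0, 6, 9, 13, 22):
        print(n, round(csi_positive(n), 2))
    print({k: round(v, 2) for k, v in satellite_offsets().items()})
```

Rounded to nominal mass, n = 6, 9, 13, and 22 give the m/z values at which the text notes intensity discontinuities, and the Rb and Cu substitutions reproduce the 48/46 and 70/68 dalton offsets.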
A suitable CsI/glycerol mixture may be prepared by the addition of 2 vol of an aqueous CsI solution (260 mg/mL) to 1 of glycerol (53), or by mixing equal weights of CsI and glycerol and briefly heating at 90°C (24). The exact masses of the CsI/glycerol reference spectra in the positive (24) and negative (54) modes have been published. It should be noted that the spectrum varies over time, glycerol clusters tending to predominate initially, whereas cesium clusters become more significant after a few minutes. The most suitable spectrum for calibration is obtained after 1-2 min of bombardment. The CsI/RbI mixture is used exactly as pure CsI calibrant; mixed CsI/RbI clusters differing in mass by 48 dalton are desorbed, which


provide additional references between the (CsI)Cs and (RbI)Rb clusters. A calibration extending to low mass can be obtained by the addition of NaI, LiI, and KI to this mixture.

2.4.2. Data Acquisition

It is desirable to operate the FAB gun under conditions of reproducible primary beam flux, a parameter that is not easily measured directly in a conventional saddle-field gun. Beam flux is controlled by the interaction of gas flow rate, anode potential, and emission current (the current that flows between the anode and cathode of the gun) (22). At a constant gas flow rate and anode potential, the sample ion yield is more or less proportional to the emission current. For most purposes, an emission current of 1 mA and 8 kV anode potential is suitable. The gun can be set up to provide reproducible conditions of beam flux by monitoring the intensity of m/z 133 from a CsI sample as a function of emission current and primary beam energy (55). Once correctly set up, the gun can be switched off between samples, which will decrease the frequency of electrode replacement, an operation requiring the removal and dismantling of the gun. A recent report (56) suggests that optimum secondary ion yields are obtained from bovine insulin when using cesium ion bombardment at about 15 keV primary beam energy; if the bombarding energy is increased, the secondary ion yield actually falls, possibly because the beam penetrates too far into the matrix for optimum ionization.

To perform an acquisition, a suitable peak in the mass range of interest is displayed on the VDU or oscilloscope, and the tuning controls of the mass spectrometer are adjusted for maximum sensitivity and optimum peak shape at the chosen resolution. On most instruments, the tunings for positive and negative ion operation will be found to be different. A reference mass from CsI or other appropriate calibration compound (preferably close to the mass of the sample if this is known) is suitable for preliminary tuning. If there is no shortage of sample, the molecular ion can be used for final tuning.
If peak centroiding and mass assignment are to be performed in real time, the reference data would be acquired at this point and the instrument calibrated. To acquire the spectrum of the analyte, 1-2 µL of matrix are applied to the probe tip via a microsyringe or micropipet, and a similar volume of sample solution is added and thoroughly mixed. Suitable solvents for peptide samples include 0.1M TFA, 30% acetic acid, or 50% water/


acetonitrile. Peptides that are insoluble in aqueous solvents can usually be persuaded to dissolve in dimethylsulfoxide. Care should be taken not to add too large a volume of solvent, since if the mixture on the probe becomes too fluid, it can run off the target. The concentration of the solution should therefore be adjusted so that 1 µL or less contains sufficient sample for the analysis. If the amount of sample is very limited, it may be necessary to dissolve it in a larger volume and to make multiple applications, removing excess solvent in the vacuum lock in between. Alternatively, an equal volume of matrix can be added to a solution of the sample in an Eppendorf tube (sonication may be used to ensure complete dissolution) and a few microliters transferred to the probe. The latter is the preferred method when samples are limited in quantity or when solubility in the matrix is poor.

The sample having been loaded by either method, the probe is introduced into the lock, and the solvent is pumped off in the rough vacuum; exhaustive removal of solvent is probably unnecessary, since evaporation of the final traces seems to facilitate desorption of sample ions. It is prudent to withdraw the probe at this stage to check that sufficient matrix is present and that the target is evenly coated. Volatile matrices, such as thioglycerol, frequently coevaporate with the solvent, and it may therefore be necessary to add a little more. The probe is reinserted into the vacuum lock, the high-vacuum valve opened, and the probe carefully introduced into the ion source, to a point a few millimeters short of the operating position. The FAB gun is switched on, and a peak characteristic of the sample (either the [M + H]+ or a prominent fragment) is displayed on the VDU. If the sample is a complete unknown, a matrix ion can be displayed instead. The probe is then fully inserted; its position is adjusted to maximize the intensity of the displayed peak and data acquisition initiated.
If the instrument is equipped with a data system, there are two possible acquisition strategies: data can be centroided and mass assigned in real time, in which case the spectra will be presented in bar-graph format, or the raw data can be written to disk and processed off-line. In FAB-MS, particularly at high mass, raw data acquisition (sometimes called profile or MCA acquisition), in which the digitized but uncentroided collector signal is stored, is preferable, since the operator has much more control over the subsequent treatment of the data. Weak signals can be improved by repetitive scanning and computerized addition of the scans (57), the improvement in signal-to-noise ratio being proportional to the square root of the number of scans summed. This approach is particularly useful when scanning over a narrow mass range, since a hundred or more scans can be recorded before the sample is exhausted. It is also possible to select appropriate values for the threshold and other peak detection parameters manually, to check that centroiding of unresolved multiplets has been performed correctly, and to apply various smoothing algorithms to the data. Moreover, visual inspection of peak profile plots is more likely to distinguish weak fragment ions from noise spikes and other spurious signals, which a data system might centroid erroneously. Once centroided, such signals are indistinguishable from genuine sample-related ions, since centroiding and conversion of the data into bar chart form destroys all information on peak shape and width. Peak centroiding is not in fact necessary, since FAB spectra are more satisfactorily presented in the form of mass-assigned profile plots.

When the precise mass of a sample is unknown, it is generally desirable to perform a survey scan at low resolution (about 1000) over the entire mass range (58). This will define an average (chemical) M_r. The accuracy of this mass determination can then be improved by performing a narrow magnet scan over the molecular ion region. Scanning over the five adjacent CsI references is generally suitable. If the sensitivity permits, this experiment can be performed at a resolution sufficient to separate the 13C isotopic contributions, and so define the monoisotopic M_r. Where the sample is known to be of low M_r (i.e., below about 3000), a survey scan is unnecessary, and the instrument is operated at a resolution of ca. 3000 and scanned from a start mass of 3500 down to about 100 at a scan rate of 30 s/decade of mass.

2.4.3. Target Cleaning

The FAB probe tip should be carefully cleaned between samples. The following is a suitable protocol for stainless-steel targets:
1. Wipe away any remains of the previous sample (or carefully wash into a polypropylene Eppendorf tube if it is necessary to recover and repurify it).
2. Clean the target with a fine grade of abrasive paper.
3. Immerse the tip in concentrated nitric acid for a few seconds (well away from the instrument and preferably in a fume hood).
4. Rinse off the nitric acid in deionized water, and wash the target in HPLC-grade methanol, ideally with ultrasonic agitation; remove and dry.


Notes:
1. Rather less aggressive cleaning regimes should be used for copper and gold-plated targets. Copper is rapidly dissolved by concentrated nitric acid, so 50% acid should be used and the period of immersion shortened. Abrasives and concentrated HNO3 should not be used on gold-plated targets.
2. It is best to reserve one target for use with ionic calibration compounds, such as CsI and NaI, since very exhaustive cleaning is required to remove all traces of these materials.

3. The Interpretation and Use of FAB-MS Spectra of Peptides

3.1. Molecular Mass Measurement

The most fundamental piece of information in the FAB spectra of peptides is the value of the relative molecular mass (M_r), which is obtained by the subtraction of the mass of hydrogen from that of the protonated molecular ion. For some purposes, a molecular mass measurement alone may be sufficient; for example, where the sequence is supposedly known from nucleotide sequencing, or for the confirmation of the structures of synthetic and recombinant peptides. Inspection of Fig. 2 reveals a cluster of ions in the region of the [M + H]+, rather than a single molecular species. This complexity arises principally because of the presence of the naturally occurring isotope 13C, which has an abundance of 1.1% compared to 12C. By convention, the species containing the lowest mass number of each isotope is regarded as being "the" molecular ion (i.e., consisting of only 12C, 14N, 1H, 16O, and 32S, in the case of peptides). For a peptide of n carbon atoms, therefore, the isotope peak that contains a single 13C atom will be n x 1.1% of the intensity of the molecular ion peak. The consequence of this is that, for peptides containing more than about 18 amino acid residues, the monoisotopic peak is not the most intense in the molecular ion cluster, and as the number of carbon atoms increases, it becomes even less significant (59). This is illustrated in Fig. 3, which shows the calculated isotope distributions expected for peptides of 10, 26, and 51 residues. For the decapeptide angiotensin I, the strongest signal is given by the [M + H]+, m/z 1296.7. In the 26-residue bee venom peptide melittin, however, the peak containing one 13C atom is more intense than the molecular ion, whereas in human insulin, the monoisotopic molecular ion (m/z 5804.646) is only the eighth most intense signal in the molecular ion cluster.

Furthermore, the various 13C isotope peaks are not themselves homogeneous, but contain small contributions from 34S, 15N, 18O, and 2H. With molecules the size of insulin and larger, it is in any case difficult to identify the monoisotopic peak unambiguously, since it can easily become lost in the background, particularly if ion statistical effects distort the theoretical isotope distribution. The problem is compounded by the superposition of a pattern of satellite peaks, such as M+, [M - 1]+, and so forth, because of redox processes (Section 2.2.2.). Furthermore, the narrow resolving slits required to separate the individual 13C isotope peaks result in a considerable decrease of sensitivity by reducing transmission through the instrument. The distribution of the ion current over several molecular species results in a further loss of sensitivity. For these reasons, it is often preferable when analyzing large peptides to operate the instrument with the resolving slits fully open (i.e., at low [

Fig. 3. Calculated isotopic distributions of the protonated molecules of 1: angiotensin I (10 residues, [M + H]+ = 1296.7), 2: melittin (26 residues, [M + H]+ = 2845.8), and 3: human insulin (51 residues, [M + H]+ = 5804.6). As the number of residues increases, the isotopically pure monoisotopic ion becomes relatively less significant. (Peak labels are rounded to the nearest integer.)
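The n x 1.1% relationship described in this section can be checked numerically. The following Python sketch models only the carbon contribution, using a binomial distribution. The 13C abundance of 1.1% is taken from the text; the carbon counts (62, 131, and 257 for angiotensin I, melittin, and human insulin) are assumptions based on commonly quoted elemental formulae, and the function names are illustrative. Contributions from 15N, 18O, 34S, and 2H are ignored, so the intensities are approximate.

```python
from math import comb

P13C = 0.011  # natural abundance of 13C, as quoted in the text


def carbon_isotope_pattern(n_carbons, n_peaks=8, p=P13C):
    """Binomial probability of finding k atoms of 13C among n_carbons, for k < n_peaks."""
    return [comb(n_carbons, k) * p**k * (1 - p) ** (n_carbons - k)
            for k in range(n_peaks)]


def most_intense_peak(n_carbons):
    """Number of 13C atoms in the strongest peak of the (carbon-only) cluster."""
    pattern = carbon_isotope_pattern(n_carbons)
    return pattern.index(max(pattern))


# Carbon counts are assumptions (commonly quoted formulae), not values from the text.
for name, n_c in [("angiotensin I", 62), ("melittin", 131), ("human insulin", 257)]:
    pattern = carbon_isotope_pattern(n_c)
    ratio = pattern[1] / pattern[0]  # close to n_c x 1.1%
    print(f"{name}: one-13C / monoisotopic = {ratio:.2f}; "
          f"strongest peak contains {most_intense_peak(n_c)} x 13C")
```

Under these assumptions the monoisotopic peak is the strongest only for the decapeptide, in line with the trend shown in Fig. 3. A fuller treatment would convolve the distributions of all elements present.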
Table 1. Monoisotopic and Average Residue Masses of Common Amino Acids and Terminal Groups(a)

Amino acid              Abbreviation   Monoisotopic mass   Average mass
Alanine                 A               71.03711            71.0788
Arginine                R              156.10111           156.1875
Asparagine              N              114.04293           114.1038
Aspartic acid           D              115.02694           115.0886
Carboxymethylcysteine   CMCys          161.01466           161.1755
Cysteine                C              103.00919           103.1388
Glutamic acid           E              129.04259           129.1155
Glycine                 G               57.02146            57.0519
Histidine               H              137.05891           137.1411
Homoserine              Hse            101.04768           101.105
Homoserine lactone      Hse>            83.03712            83.090
Isoleucine              I              113.08406           113.1594
Leucine                 L              113.08406           113.1594
Lysine                  K              128.09496           128.1741
Methionine              M              131.04049           131.1926
Phenylalanine           F              147.06841           147.1766
Proline                 P               97.05276            97.1167
Pyroglutamic acid                      111.03203           111.1002
Serine                  S               87.03203            87.0782
Threonine               T              101.04768           101.1051
Tryptophan              W              186.07931           186.2132
Tyrosine                Y              163.06333           163.1760
Valine                  V               99.06841            99.1326
Hydrogen                                 1.00782             1.00794
Amino                                   16.01872            16.0226
Hydroxyl                                17.00274            17.0073
Methyl ester                            31.01839            31.0342
Acetate                                 43.01839            43.0452

(a) Peptide molecular masses may be calculated using the data in this table by summing the values of the residue masses for the sequence and adding the appropriate terminal groups (i.e., H and OH for a peptide with a free amino and carboxyl terminal, H and NH2 for a carboxy-terminal amide, and so on). An additional proton must be added to give the value of the [M + H]+ ion.
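The calculation described in the footnote to Table 1 can be sketched in a few lines of Python. The residue and terminal-group masses are transcribed from Table 1. The function name and the angiotensin I sequence (DRVYIHPFHL) are assumptions introduced for illustration and do not appear in the text. Following the footnote, the hydrogen value from the table is used for the added proton, which differs from the true proton mass by about 0.0005.

```python
# Monoisotopic residue masses transcribed from Table 1 (one-letter codes only).
RESIDUE_MONO = {
    "A": 71.03711, "R": 156.10111, "N": 114.04293, "D": 115.02694,
    "C": 103.00919, "E": 129.04259, "G": 57.02146, "H": 137.05891,
    "I": 113.08406, "L": 113.08406, "K": 128.09496, "M": 131.04049,
    "F": 147.06841, "P": 97.05276, "S": 87.03203, "T": 101.04768,
    "W": 186.07931, "Y": 163.06333, "V": 99.06841,
}

# Monoisotopic terminal-group masses from Table 1.
H, NH2, OH = 1.00782, 16.01872, 17.00274


def peptide_mh(sequence, n_term=H, c_term=OH):
    """[M + H]+ = residue masses + terminal groups + one additional hydrogen."""
    residues = sum(RESIDUE_MONO[aa] for aa in sequence)
    return residues + n_term + c_term + H


# Assumed sequence of angiotensin I (not given in the text).
angiotensin_i = "DRVYIHPFHL"
print(f"free acid:        {peptide_mh(angiotensin_i):.3f}")
print(f"C-terminal amide: {peptide_mh(angiotensin_i, c_term=NH2):.3f}")
```

For the free acid this gives a value that rounds to 1296.7, the [M + H]+ quoted for angiotensin I in Section 3.1. Substituting the average masses from Table 1 would give the average (chemical) value instead.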

conjunction with mass assignment by manual counting of oscilloscope traces, which effectively ignores the fractional component of the mass. Most mass spectrometer data systems include software for the calculation of molecular masses from atomic compositions, which the user can modify to include additional functional groups and amino acid residue masses. Alternatively, the calculations may be performed with a spreadsheet program, such as Lotus 1-2-3. The single-letter codes of amino acids can be used as labels for macros that enter the appropriate residue mass, which enables the automatic calculation of the molecular ion and fragment ion masses corresponding to a given sequence.

3.2. Fragmentation of Peptides

It was recognized very early that structurally significant fragment ions are present in the FAB spectra of peptides (61-63). These fragments originate from cleavages of the amide backbone, the mass differences between peaks corresponding to the residue masses of consecutive amino acids. Conveniently, these all differ by at least 1 dalton, except for the pairs leucine/isoleucine and lysine/glutamine (Table 1). The task of sequence determination is thus reduced to the correct identification of consecutive fragment ions (Fig. 4). Charge retention can occur on either the amino- or the carboxyl-terminal-containing fragments, and the cleavage point can be the peptide bond itself, or the adjacent N-C and C-C bonds, with or without hydrogen transfer (Fig. 5), thus giving rise to at least six families of sequence ions. Internal fragment ions, generally amino acylium or immonium ions (Fig. 6), are produced by the operation of multiple cleavages. Amino acid immonium ions ([H2N=CH-R]+) at low mass identify some (though not necessarily all) of the residues in the sequence (Table 2) and are thus a useful interpretative aid. In two-sector spectra, however, the low-mass end of the spectrum is frequently dominated by the matrix background, which is likely to obscure the immonium ions, except when high sample loadings are used. Losses of amino acid side chain fragments from the [M + H]+ ion provide a further indication of amino acid composition (Table 2). These various fragmentation processes have been summarized (13) and a consistent nomenclature generally adopted. Fragments derived from the N-terminus of the molecule are designated as A, B, or C, depending on the point of cleavage, whereas C-terminal-containing ions are similarly described as X, Y, and Z (Fig. 5). The residue at which cleavage occurs is indicated by a subscripted numeral, fragments being numbered in ascending order from the terminal that retains the charge.

Fig. 4. Part of the FAB spectrum of a 17-residue synthetic peptide GIVPPDEELPGLVSLNC, showing sequence-related fragment ions. The mass-assigned profile plot is an average of six raw data scans and was obtained with a Kratos MS80, using xenon as bombarding gas and DTT/DTE as matrix. The fragments are named according to the system defined in Fig. 5. A proportion of the intensity of the signal at m/z 1480.7 is owing to the [M + H]+ of an impurity from which the three N-terminal residues have been deleted.

Fig. 5. The Roepstorff and Fohlman system (64) for the nomenclature of peptide sequence ions. Fragments in which charge is retained on the N-terminus-derived product are named A, B, or C, depending on the point of cleavage, whereas C-terminal fragments are similarly designated X, Y, and Z. The residue at which cleavage occurs is indicated by a subscripted arabic numeral, the fragments being numbered in ascending order from the charge-retaining terminal. Hydrogen transfers are explicitly designated by means of arabic numerals, e.g., Y_n + 2.

This system enables unambiguous description of both positive and negative ion mode fragmentation processes. The original nomenclature of Roepstorff and Fohlman (64) indicated hydrogen transfers by superscripted primes (e.g., Y_n''). In more recent practice, such transfers are either designated explicitly by the use of arabic numerals (e.g., Y_n + 2) or else are deemed to be implicit, particularly in the case of the Y_n + 2 and C_n + 2 fragments. These are thus simply described as Y_n and C_n, since two hydrogens are almost invariably transferred in the course of their formation. The use of lowercase letters has also been recommended and widely adopted (65), to avoid confusion with the single-letter codes for amino acids. Some of these fragments are depicted in Figs. 6 and 7, together with the rules for calculating their masses from a given amino acid sequence. It should be stressed that, in most cases, the structures are hypothetical, not having been determined in any rigorous sense, and are justified principally by their heuristic usefulness. Although the nature and extent of cleavage is highly structure dependent, the factors that determine which fragments are formed in the two-sector FAB spectra of peptides are poorly understood. Most peptides undergo fragmentation by at least some of these processes, the c + 2 and the y + 2 pathways being the most common. The presence of a basic residue at or near the N-terminal favors production of c + 2 ions, whereas a basic residue located toward the C-terminal encourages y + 2 and z + 2 ion formation. Cleavage at proline often produces particularly intense y + 2 fragments. In two-sector spectra obtained without collisional activation, z + 2 fragments are observed more commonly than the z and z + 1 species found in tandem spectra. Extended series of

[Fig. 6 structure drawings: protonated peptide; A_n (mass = Σn - 27); A_n + 1 (mass = Σn - 26); B_n (mass = Σn + 1); C_n + 2 (mass = Σn + 18); internal acyl ion; internal immonium ion.]

Fig. 6. Common N-terminal cleavage ions. Their masses may be calculated by summing the residue masses of the amino acids in the sequence up to the cleavage point (counting from the N-terminal end), and adding or subtracting the indicated numbers. Note that internal acyl and immonium ions originate by the operation of multiple cleavages, and do not contain the original N-terminal.

Table 2. Nominal Masses of Amino Acid Immonium Ions and Side Chain Losses

Amino acid       Mass of immonium ion   Side chain mass, loss from [M + H]+
Alanine           44                     15
Arginine         129                    100
Asparagine        87                     58
Aspartic acid     88                     59
Cysteine          76                     47
Glutamic acid    102                     73
Glycine           30                      -
Histidine        110                     81
Isoleucine        86                     57
Leucine           86                     57
Methionine       104                     75
Phenylalanine    120                     91
Proline           70                      -
Serine            60                     31
Threonine         74                     45
Tryptophan       159                    130
Tyrosine         136                    107
Valine            72                     43

x fragments are likewise relatively uncommon. The presence of particular amino acids may be indicated by characteristic processes; serine- and threonine-containing fragments, for example, often lose water. More recently, three additional families of fragments have been described in high-energy collision spectra, designated d_n, v_n, and w_n, which arise by side chain losses from backbone cleavage ions (66,67). Fission of the β-bond of the side chain of the newly generated terminal residue of an a_n + 1 fragment produces a d_n ion, and w_n ions are produced analogously from z_n + 1 fragments. The importance of these ions is that they allow the distinction of isomeric amino acids, such as isoleucine and leucine (68) and 3-hydroxy- and 4-hydroxyprolines (69). An ion 42 dalton below an a_n ion, for example, is diagnostic of a C-terminal leucine residue in the fragment (loss of a 43-dalton isopropyl group from a_n + 1), whereas isoleucine in the corresponding position would eliminate 28 dalton (loss of a 29-dalton ethyl group from a_n + 1). In the negative ion mode, the most usually observed fragmentation processes produce ions of the y, z, and c series.

[Fig. 7 structure drawings: protonated peptide; X_n (mass = Σn + 45); Y_n + 2 (mass = Σn + 19); Z_n + 2 (mass = Σn + 4); Z_n (mass = Σn + 2).]

Fig. 7. Common C-terminal cleavage ions. Their masses are calculated by summing the residue masses of the amino acids in the sequence up to the cleavage point (counting from the C terminal) and adding the number indicated. Other C-terminal ions sometimes encountered include Z_n + 1 and Y_n - 2.
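The mass rules of Figs. 6 and 7 lend themselves to a short calculation. The Python sketch below applies them to the 17-residue peptide of Fig. 4, using the monoisotopic residue masses of Table 1. The offsets are the nominal integers given in the two figures, so the results are accurate only to a few hundredths of a dalton; the function and dictionary names are illustrative assumptions.

```python
# Monoisotopic residue masses transcribed from Table 1 (one-letter codes only).
RESIDUE_MONO = {
    "A": 71.03711, "R": 156.10111, "N": 114.04293, "D": 115.02694,
    "C": 103.00919, "E": 129.04259, "G": 57.02146, "H": 137.05891,
    "I": 113.08406, "L": 113.08406, "K": 128.09496, "M": 131.04049,
    "F": 147.06841, "P": 97.05276, "S": 87.03203, "T": 101.04768,
    "W": 186.07931, "Y": 163.06333, "V": 99.06841,
}

# Nominal offsets added to the summed residue masses (Figs. 6 and 7).
N_TERM_OFFSETS = {"a": -27, "b": 1, "c+2": 18}
C_TERM_OFFSETS = {"x": 45, "y+2": 19, "z+2": 4}


def sequence_ions(sequence):
    """Return {series: {n: mass}} for cleavage positions n = 1 .. len(sequence) - 1."""
    masses = [RESIDUE_MONO[aa] for aa in sequence]
    positions = range(1, len(masses))
    ions = {}
    for series, offset in N_TERM_OFFSETS.items():
        # N-terminal ions: sum the first n residues.
        ions[series] = {n: sum(masses[:n]) + offset for n in positions}
    for series, offset in C_TERM_OFFSETS.items():
        # C-terminal ions: sum the last n residues.
        ions[series] = {n: sum(masses[-n:]) + offset for n in positions}
    return ions


ions = sequence_ions("GIVPPDEELPGLVSLNC")  # peptide of Fig. 4
for n in (8, 11):
    print(f"b{n}   = {ions['b'][n]:.1f}")
print(f"y1+2 = {ions['y+2'][1]:.1f}")
```

Because every series is built from the same residue sums, the constant spacings between families (for example, y + 2 lying 15 dalton above z + 2, and b lying 28 dalton above a) follow directly from the offsets.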

Because of these complex and varied fragmentation pathways, there are comparatively few instances of the unaided sequencing of peptides of unknown structure by conventional (i.e., two-sector) FAB mass spectrometry (70,71). This is unsurprising given that it is necessary to rely on spontaneous fragmentations to produce an intense and readily assignable set of sequence-defining ions. First, the fragments are in practice weak, so that in all but the most favorable cases, large samples (2-5 nmol or more) are required to ensure that sequence ions are not obscured by the matrix background, particularly at low mass. Second, the pattern of fragmentation is generally complex, resulting in a formidable task of assignment; members of most of the possible families of backbone cleavage products may be represented without necessarily an extended set of any one type. The presence of extensive internal fragmentation further complicates interpretation. Third, a pure sample is needed, so that the molecular ions of minor contaminants are not misidentified as fragments, and so that sequence ions can be unambiguously associated with their correct precursor ions. For these reasons, the most challenging sequencing problems are more likely to succumb to a tandem mass spectrometry-based approach (see Chapter 12). However, when sequences and partial sequences are known or suspected, FAB-MS on a two-sector instrument provides a convenient and rapid means of their verification or correction, particularly when supplemented by linked scanning and microchemical procedures.

3.2.1. The Interpretation of Fragmentation Patterns

The task of interpreting any peptide spectrum is simplified by taking into account all available non-mass-spectrometric data, such as the results of amino acid analysis and other microchemical procedures, the sequences of possibly homologous materials, and the specificities of any enzymes used in the preparation of the sample. First, identify and mark on the spectrum all peaks attributable to matrix clusters; these need not be considered any further (but note any in the high-mass region that are unexpectedly intense and may therefore correspond to matrix ions isobaric with sample ions). Next, identify the [M + H]+ ion, which should be the most prominent peak at the high-mass end of the spectrum; if more than one molecular ion is present, sequencing will be impossible except by collisional activation and linked scanning.

The first step in extracting the sequence information from the spectrum is to assign as many fragments as possible to C- and N-terminal series. Paradoxically, this may be simpler in complex spectra where several types of fragments are present, since ions corresponding to cleavage of adjacent bonds will maintain a constant numerical relationship. The C-terminal y_n + 2 fragments, for example, are often accompanied by weaker z_n + 2 satellites 15 dalton lower in mass (or 16 and 17 dalton, respectively, in the case of z + 1 and z ions). Similarly, a c_n + 2 ion will be 17 dalton heavier than the corresponding b_n fragment, which will in turn be 28 dalton above an a_n ion. Derivatization can be a useful tool for assigning sequence ions: acetylation of the N-terminal (if unblocked) with aqueous acetic anhydride (Section 5.5.1.) will shift a_n, b_n, and c_n fragments by +42, while C-terminal ions are unaffected. Conversely, x_n, y_n, and z_n ions will be mass shifted by esterification of the C-terminal (Section 5.5.2.). However, the modified peptide may exhibit altered fragmentation behavior, and interpretation can be further complicated by acetylation of the ε-amino group of lysine and esterification of the side chains of aspartic and glutamic acids. Examination of the low-mass region of the spectrum may locate amino acid immonium ions, which afford data on the composition of the peptide (Table 2). Losses from the molecular ion corresponding to the side chain fragmentations listed in Table 2 should also be identified and marked on the spectrum.

The next step is to identify the N- and C-terminal residues, which is achieved by searching the mass window between [M + H]+ - 30 and [M + H]+ - 231 for ions resulting from elimination from the molecular ion of the neutral fragment masses in Table 3. Note that there is some degeneracy in this table; for example, loss of glycine from the C terminal by an a-type cleavage cannot be distinguished from the loss of an N-terminal glutamic acid by an x-type cleavage. In such cases, the best strategy is to continue searching for losses of further residues, in the hope that the ion series can be identified. Some side chain fragmentations are also a potential source of confusion; for example, a signal at [M + H]+ - 100 could correspond to either an N-terminal threonine or to elimination of an arginine side chain. If fragment ions are present, but no plausible candidates for one or the other terminal

Table 3. Data for Identification of N- and C-Terminal Residues(a)

                      Mass of C-terminal         Mass of N-terminal
                      neutral fragment           neutral fragment
Amino acid residue    a      b      c + 2        x      y + 2   z + 2
Alanine               116     88     71           44     70      85
Arginine              201    173    156          129    155     170
Asparagine            159    131    114           87    113     128
Aspartic acid         160    132    115           88    114     129
Cysteine              148    120    103           76    102     117
Glutamic acid         174    146    129          102    128     143
Glutamine             173    145    128          101    127     142
Glycine               102     74     57           30     56      71
Histidine             182    154    137          110    136     151
Isoleucine            158    130    113           86    112     127
Leucine               158    130    113           86    112     127
Lysine                173    145    128          101    127     142
Methionine            176    148    131          104    130     145
Phenylalanine         192    164    147          120    146     161
Proline               142    114     97           70     96     111
Serine                132    104     87           60     86     101
Threonine             146    118    101           74    100     115
Tryptophan            231    203    186          159    185     200
Tyrosine              208    180    163          136    162     177
Valine                144    116     99           72     98     113

(a) See text for explanation.

residues emerge, consider the possibility of N- or C-terminal modification. Once a fragment corresponding to elimination of a terminal amino acid has been identified, the next step is to subtract the residue mass of each of the residues in Table 1 and check for the presence of a fragment at each resulting mass; when one is located, the process is repeated until no further matches are found. It is important to use the full set of residue masses for each step, which will probably result in several candidate sequences, of which the longest is the most likely to be correct. Having located a consecutive series of fragments, search for related ions that might confirm the assignment; for example, look for z + 2 fragments 15 dalton below a putative set of y + 2 ions. Once the sequence has been identified, it should be relatively straightforward to assign the majority of the signals in the spectrum, including internal fragments. If a significant proportion of major ions resist assignment, particularly at high mass, then the candidate sequence is unlikely to be correct.

In many spectra, the sequence ions do not extend to high mass. In such cases, the procedure is to assign as many fragments as possible to C- and N-terminal series, as described earlier, and then, starting from two peaks separated by an amino acid residue mass, systematically search for the next peak in the series, both above and below, using the full set of residue masses.

Additional experiments should always be performed to verify a mass spectrometrically determined sequence. If the peptide has a free N-terminal, the sequence of the first few residues can be determined by performing manual subtractive Edman degradations, with mass spectrometric measurement of the shortened peptides in between each cycle (Section 5.4.). It may also be possible to obtain some C-terminal sequence information by digestion with carboxypeptidases (72,73).

4. Peptide Mapping by FAB-MS

Peptide mapping by mass spectrometry (74,75), variously known as "FAB mapping" (8) and "Digit printing" (10), is a rapid method for checking the structures of peptides of supposedly known sequence, such as are obtained from recombinant DNA methods. The peptide (reduced and carboxymethylated, if necessary) is specifically cleaved, usually by means of a protease enzyme but sometimes chemically, to generate a mixture of shorter peptides, which is analyzed by FAB-MS, either directly or after fractionation by HPLC in the case of larger materials. The masses of the [M + H]+ ions are then compared with those predicted on the basis of the sequence and the known specificity of the cleaving enzyme. In the ideal case, all the predicted molecular ions and no others are observed, thus confirming the structure of the peptide.
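The comparison of predicted and observed [M + H]+ values can be sketched as follows in Python. This is a minimal illustration under several assumptions: trypsin is modeled simply as cleaving after every lysine and arginine; the 22-residue segment is a hypothetical sequence chosen so that its peptides fall near the m/z 906, 1458, and 2328 values discussed for Fig. 8, and it should not be read as the published sequence; the glutamine residue mass (128.05858) is a standard value that is not among the Table 1 entries; and all function names are illustrative.

```python
# Monoisotopic residue masses from Table 1; "Q" is an added assumption (standard value).
RESIDUE_MONO = {
    "A": 71.03711, "R": 156.10111, "N": 114.04293, "D": 115.02694,
    "C": 103.00919, "E": 129.04259, "G": 57.02146, "H": 137.05891,
    "I": 113.08406, "L": 113.08406, "K": 128.09496, "M": 131.04049,
    "F": 147.06841, "P": 97.05276, "S": 87.03203, "T": 101.04768,
    "W": 186.07931, "Y": 163.06333, "V": 99.06841, "Q": 128.05858,
}
H, OH = 1.00782, 17.00274  # terminal groups from Table 1
WATER = H + OH


def tryptic_peptides(sequence):
    """Simplified trypsin model: cleave on the C-terminal side of every K and R."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in ("K", "R"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh(peptide):
    """[M + H]+ of a peptide with free amino and carboxyl terminals."""
    return sum(RESIDUE_MONO[aa] for aa in peptide) + H + OH + H


def unmatched(predicted, observed, tol=0.5):
    """Predicted peptides with no observed signal within tol of their [M + H]+."""
    return [p for p in predicted
            if not any(abs(mh(p) - m) <= tol for m in observed)]


segment = "NGFIQSLKDDPSQSANLLAEAK"   # hypothetical segment, not from the text
predicted = tryptic_peptides(segment)
observed = [2328.2]                  # hypothetical observed signal

for p in predicted:
    print(f"{p}: predicted [M + H]+ = {mh(p):.1f}")
print("missing from digest:", unmatched(predicted, observed))
# A single uncleaved peptide that has lost one water molecule:
print(f"uncleaved, dehydrated: {mh(segment) - WATER:.1f}")
```

In this sketch both predicted peptides are reported as missing, while the observed signal is explained by the uncleaved, dehydrated segment. A real digest would also require rules for incomplete or nonspecific cleavage.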
Structural modification of the peptide is manifested by missing and anomalous signals in the digest, since such errors as insertions and deletions, oxidation of methionine, disulfide bridge formation, posttranslational modifications, and C-terminal heterogeneity resulting from proteolytic processing ("ragged ends") alter the mass of the peptide containing the modification. If the protein sequence has been deduced by translation of the corresponding cDNA, then the method provides a simple way of checking for errors in the nucleotide sequence, from misidentified bases causing single amino acid changes to more serious problems resulting in erroneous reading frames and the consequent failure of most of the signals in the digest to map onto the sequence (7). Digestion of most peptides with enzymes, such as trypsin, rarely results in fragments >3500 dalton in mass, so materials that greatly exceed the mass range of magnetic sector mass spectrometers are amenable to analysis. Other advantages of the FAB mapping strategy are, first, that errors at any point in the sequence are in principle located with equal probability, in contrast to conventional methods where errors that are remote from either terminal are often detected only with difficulty, and second, that the digest can be examined directly, without the need to separate and purify the individual peptides.

Figure 8 shows a simple example of this technique. Tryptic peptides of the predicted mass verify most of the sequence of a 58-residue synthetic analog of staphylococcal protein A. However, instead of two signals at m/z 906 and m/z 1458 expected from cleavage at lysine35 and lysine49, an anomalous signal at m/z 2328 was observed, corresponding to the mass of the sequence between asparagine28 and lysine49, minus 18. The interpretation is that an aspartic acid residue has cyclized to form aspartimide, rendering the adjacent lysine residue trypsin resistant (76). A useful tabulation of the mass changes attributable to various combinations of amino acid substitution has been published (10). Additional experiments can be performed to aid assignment of the spectra; one or more manual cycles of Edman degradation will cause appropriate mass shifts to all peptides with a free N terminus. In a tryptic digest, the peptide (or peptides if C-terminal ragged ends are present) containing the original C terminal can be identified by treatment with carboxypeptidase B, which specifically removes lysine and arginine from the C terminus.
Since cleavage with trypsin produces peptides terminating with these two amino acids, any peptide that is not mass shifted by subsequent carboxypeptidase B digestion must contain the C terminus. If all the signals exhibit mass shifts, then either the original C-terminal residue is itself lysine or arginine, or the C-terminal peptide is subject to suppression and so is not observed in the digest. Extensions of the FAB mapping technique provide elegant strategies for the location of glycosylation sites and for the assignment of disulfide bridges. Digestion of a glycopeptide with a proteolytic enzyme will generate

Fig. 8. FAB spectrum of a tryptic digest of a 58-residue synthetic analog of staphylococcal protein A. The predicted peptides with [M + H]+ signals at m/z 906 and 1458 are replaced by an unexpected peptide, [M + H]+ = 2328, which results from dehydration of the aspartic acid in position 36 of the sequence, rendering the adjacent lysyl bond trypsin resistant. The inset shows the sequence of the peptide and its predicted tryptic cleavage fragments. Peptides observed in the digest are shown shaded. The spectrum was obtained on a Kratos MS80 using xenon bombardment and DTT/DTE matrix. The mass-assigned profile plot was obtained by averaging 10 scans.

a mixture of peptides, some of which are glycosylated and some of which are not. The glycosylated peptides, being more hydrophilic, are not generally observed in the FAB spectrum of the digest. Molecular ion signals that can be mapped onto the known sequence thus correspond to the unglycosylated fragments of the peptide. Treatment of the mixture with a second enzyme, peptide-N-glycosidase F ("N-glycanase"), cleaves off the sugar chains, converting the asparagine residue at the point of attachment to aspartic acid. If a second FAB spectrum is recorded, additional signals will now be present corresponding to previously glycosylated peptides. The molecular ions of these peptides will be 1 dalton heavier per attachment site than predicted from the sequence because of the conversion of Asn to Asp (4).

To assign the positions of disulfide bonds (again, it must be stressed, in materials of known sequence), the peptide is first cleaved, ideally under conditions that minimize disulfide reshuffling, i.e., at low pH, using for example pepsin, V8 protease, or CNBr/formic acid. An FAB spectrum is then recorded, followed by reduction (Section 5.5.3.) and reanalysis. Comparison of the two spectra reveals (in favorable cases) the locations of disulfide bonds in the intact polypeptide (11,12). The main practical problem with this strategy is that potential cleavage sites may be inaccessible when the disulfide bridges are intact, particularly if the investigator is restricted to enzymes active below pH 7.

Unexpected signals may be observed in FAB maps for reasons other than structural modification of the peptide. The enzymatic digestion may not proceed as expected, perhaps because some potential sites are partially resistant to cleavage, or because the enzyme is not completely specific or is contaminated with another protease. Some of the signals in the digest may be attributable to contaminating proteins or to digestion products of the enzyme itself (77).

The major limitation of the FAB mapping technique is that some of the components of the digest may not be observed in the spectrum ("suppression"). This is often the result of differences in surface activity between the peptides in the mixture; the more hydrophobic species tend to concentrate at the matrix/vacuum interface and so are preferentially sampled, whereas hydrophilic peptides, being more readily solvated, are less likely to occupy the surface layers of the matrix, and hence have a lower probability of desorption, and may even be completely absent from the spectrum (78-80). Often the more hydrophilic components of a mixture are observed only after several minutes of bombardment, so when analyzing mixtures, it is prudent to record data over the entire lifetime of the sample. The extent of suppression is proportional to concentration, and it may sometimes be possible to ameliorate the effect by dilution of the sample. Average hydrophobicity indices (79), calculated using the hydrophobicity values of amino acids measured by Bull and Breese (81), are a good predictor of which peptides in a mixture will be observed in an FAB spectrum. Hydrophobicity indices are calculated using the data in Table 4, by summing the values for each amino acid in the peptide, and dividing by the total number of residues.

Table 4. Bull and Breese Index Values for Amino Acid Residues(a)

Amino acid residue   Bull and Breese index value
Alanine               +610
Arginine              +690
Asparagine            +890
Aspartic acid         +610
Cystine               +360
Glutamine             +970
Glutamic acid         +510
Glycine               +810
Histidine             +690
Isoleucine           -1450
Leucine              -1650
Lysine                +460
Methionine            -660
Phenylalanine        -1520
Proline               -170
Serine                +420
Threonine             +290
Tryptophan           -1200
Tyrosine             -1430
Valine                -750

(a) The Bull and Breese index for a peptide is obtained by summing the values of the constituent amino acids and dividing by the number of residues. A positive value denotes a hydrophilic peptide, whereas negative values imply greater hydrophobicity. In mixtures that encompass a range of hydrophobicities, peptides with high negative indices are the least likely to be suppressed in the FAB spectra.


The more negative the value, the more hydrophobic the peptide, and vice versa. Peptides with positive values are likely to be suppressed when mixed with those with more negative values. Note that both will be observed if analyzed separately. Some caution should be exercised, however, when considering small peptides, because the terminal amino and carboxyl groups exert a disproportionate effect on the overall hydrophobicity, which the index value does not take into account.

If calculation of Bull and Breese indices suggests that there is a wide spread of hydrophobicities in a given peptide digest, measures to reduce suppression of the more hydrophilic components should be considered. Fractionation by HPLC prior to FAB-MS is among the simplest. It is not normally necessary to separate each peptide; chromatography on a C18 reversed-phase column with an acetonitrile/water/TFA gradient will tend to elute peptides in approximate order of hydrophobicity, so collection of fractions containing four to six components will probably reduce the hydrophobicity distribution to an acceptable degree.

In some cases, suppression effects can be reduced by the use of a more hydrophobic matrix; for example, the use of 1,2,6-hexanetriol in place of glycerol- and thiol-containing matrices reduced the suppression of several phosphopeptides relative to their dephosphorylated analogs (82). Increasing the ionic strength within the matrix by the addition of a strong mineral acid, such as perchloric acid, can also increase the likelihood of observing the more hydrophilic components of mixtures. Sample introduction via a continuous-flow FAB probe has also been shown to reduce the extent of suppression, though the phenomenon is not abolished (83). Esterification of carboxyl groups by treatment with acidic (2M HCl) methanol, isopropanol, or hexanol is an alternative means of overcoming suppression problems by reducing the differences in hydrophobicity among peptides (84).
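The Bull and Breese index calculation described above (sum the Table 4 values over the residues, then divide by the number of residues) can be sketched as follows. The function names are hypothetical, and assigning the "Cystine" entry of Table 4 to the one-letter code C is an assumption made for this illustration.

```python
# Bull and Breese index values from Table 4, keyed by one-letter code.
# Assumption: the "Cystine" entry is used for code C.
BULL_BREESE = {
    "A": 610, "R": 690, "N": 890, "D": 610, "C": 360,
    "Q": 970, "E": 510, "G": 810, "H": 690, "I": -1450,
    "L": -1650, "K": 460, "M": -660, "F": -1520, "P": -170,
    "S": 420, "T": 290, "W": -1200, "Y": -1430, "V": -750,
}


def bull_breese_index(sequence):
    """Average index: sum of residue values divided by number of residues."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    return sum(BULL_BREESE[r] for r in sequence) / len(sequence)


def rank_by_expected_visibility(peptides):
    """Order peptides from most hydrophobic (least likely to be suppressed)
    to most hydrophilic (most likely to be suppressed)."""
    return sorted(peptides, key=bull_breese_index)


if __name__ == "__main__":
    digest = ["GSK", "LIF", "AEDR"]  # hypothetical digest components
    for peptide in rank_by_expected_visibility(digest):
        print(peptide, round(bull_breese_index(peptide), 1))
```

A wide spread between the first and last indices in such a ranking is the situation in which the text recommends fractionation or another countermeasure. As noted above, the index is less reliable for small peptides, because the terminal groups are not taken into account.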
If most of a sequence has been verified in an FAB mapping experiment, but some of the tryptic peptides are missing, possibly as a result of suppression, then digestion of the original sample with a second protease of contrasting specificity will usually produce a mixture of peptides with a different spread of hydrophobicities, which may enable observation of signals representative of the unverified portions of the sequence. Digestion with a second enzyme is also a useful means of further pinpointing the sites of any anomalies revealed by the first experiment.


5. Experimental Procedures

5.1. Digestion with Trypsin

Trypsin cleaves on the C-terminal side of arginine and lysine residues. The reaction should be performed in a volatile buffer, such as ammonium hydrogen carbonate, which can be removed by freeze-drying.

1. Dissolve the sample in a suitable volume of ammonium hydrogen carbonate buffer (50 mM, pH 8.5) in a polypropylene Eppendorf tube. Typically 0.1 mg of peptide and 100 µL of buffer would be used.
2. Add a sufficient amount of a 1 µg/µL solution of trypsin in ammonium hydrogen carbonate buffer to give an enzyme-to-substrate ratio of 1:100 (w/w).
3. Incubate for 2-4 h at 37°C. The progress of the reaction may be monitored by HPLC. Digestion is complete when the chromatographic profile has stabilized.
4. Terminate the reaction by freezing, and lyophilize; a second lyophilization stage, after addition of further water, may be necessary to achieve complete removal of buffer. Alternatively, the digestion may be stopped by the addition of 10 µL of 30% acetic acid, and the sample concentrated on a vacuum centrifuge. A few microliters of propanol should be added to prevent precipitation of the peptides during concentration.
5. Redissolve the lyophilized sample in a suitable volume of 30% acetic acid prior to FAB-MS.

Notes:
1. Ammonium hydrogen carbonate buffer (50 mM) is prepared by dissolving 0.4 g of NH4HCO3 in 100 mL of deionized water, and adjusting to pH 8.5 by the addition of a few microliters of 0.88 ammonia solution.
2. A sequencing grade of trypsin, treated with tosyl phenylalanyl chloromethyl ketone (TPCK) to inhibit chymotryptic activity, should be used.
3. Arg-Pro and Lys-Pro bonds are not usually cleaved. The rate of cleavage may also be reduced by adjacent acidic residues.
4. Even when TPCK-treated trypsin is used, cleavage at other sites, such as phenylalanine and tyrosine, is sometimes observed, particularly at high enzyme/substrate ratios and with extended reaction times.
5. Longer digestions (6-12 h) may sometimes be needed. Since trypsin inactivates itself by autolysis, it may be more advantageous to add fresh enzyme after 4 h.
6. Trypsin autolysis products are sometimes observed in the FAB spectra of digests (77); a control digestion should therefore be set up containing buffer and enzyme, but no peptide. Trypsin autolysis is reduced by the inclusion of 1 mM CaCl2 in the buffer.
7. Disulfide-bridged peptides may require reduction and carboxymethylation or pyridylethylation prior to digestion. If this is carried out under denaturing conditions using 6M guanidinium chloride, the reduced and carboxymethylated material should be desalted by HPLC prior to digestion and FAB-MS.
8. Samples may be withdrawn at intervals throughout the digestion and analyzed by FAB-MS. Monitoring the incomplete digestion products as a function of time then enables the construction of sequence-ordered tryptic maps (85).
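When planning a tryptic digest of a peptide of known sequence, the expected fragments and the volume of enzyme stock can be worked out beforehand. The sketch below is illustrative only: the function names are hypothetical, the cleavage rule is the simple one stated above (C-terminal to Arg and Lys, with Arg-Pro and Lys-Pro left intact), and the slower or nonspecific cleavages mentioned in the notes are ignored.

```python
def tryptic_digest(sequence):
    """Split a one-letter sequence C-terminal to K or R, except before P."""
    if not sequence:
        return []
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        # Cleave after K/R unless the next residue is proline (notes, item 3).
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


def trypsin_volume_ul(substrate_mg, ratio=100.0, stock_ug_per_ul=1.0):
    """Volume of trypsin stock (µL) for a 1:ratio (w/w) enzyme:substrate digest."""
    enzyme_ug = substrate_mg * 1000.0 / ratio
    return enzyme_ug / stock_ug_per_ul


if __name__ == "__main__":
    print(tryptic_digest("AKRPGRLK"))   # hypothetical sequence
    print(trypsin_volume_ul(0.1))       # 0.1 mg peptide, 1:100, 1 µg/µL stock
```

For the typical quantities given in step 1 (0.1 mg of peptide), a 1:100 ratio corresponds to 1 µg of trypsin, i.e., 1 µL of the 1 µg/µL stock.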

5.2. Conditions for Other Proteases

Other proteases are also sometimes useful for FAB-mapping experiments; chymotrypsin, V8 protease from S. aureus (sold by Boehringer Mannheim [Indianapolis, IN] as endoproteinase Glu-C), and endoproteinase Lys-C are among the most commonly used. Lys-C cleaves with high specificity at lysine; minor nonspecific cleavages have been reported, primarily at asparagine. V8 protease cleaves at glutamyl and to a lesser extent aspartyl bonds. Chymotrypsin has a broader specificity than the other enzymes, cleaving on the C-terminal side of tryptophan, tyrosine, and phenylalanine, and less readily at leucine and histidine. The conditions for chymotrypsin, Lys-C, and V8 protease are similar to those for trypsin, except that the pH of the ammonium hydrogen carbonate buffer should be adjusted to 7.8 for V8 protease. Incubation temperatures of 37°C and enzyme-to-substrate ratios of 1:100 (w/w) are appropriate. V8 protease can also be used in ammonium acetate buffer at pH 4, which allows digestion under conditions that minimize disulfide bond scrambling. At pH 8.4, the specificity is effectively restricted to glutamic acid.
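The specificities listed above can be captured in a small lookup table for predicting digest fragments. This is a rough sketch under stated assumptions: the names are hypothetical, all three enzymes are taken to cleave on the C-terminal side of the listed residues, and the "less readily cleaved" sites are included only on request.

```python
# (major sites, minor sites) for each protease, taken from the text above.
# Assumption: cleavage is C-terminal to the listed residue in every case.
PROTEASE_SITES = {
    "Lys-C": ("K", ""),
    "V8": ("E", "D"),
    "chymotrypsin": ("WYF", "LH"),
}


def digest(sequence, protease, include_minor=False):
    """Predict fragments for a one-letter sequence and a named protease."""
    major, minor = PROTEASE_SITES[protease]
    sites = major + (minor if include_minor else "")
    peptides = []
    start = 0
    # The final residue is skipped: cleaving after it would give an empty fragment.
    for i, residue in enumerate(sequence[:-1]):
        if residue in sites:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if sequence:
        peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    seq = "AEGDKFL"  # hypothetical sequence
    for name in PROTEASE_SITES:
        print(name, digest(seq, name), digest(seq, name, include_minor=True))
```

Running the same sequence through two enzymes of contrasting specificity in this way gives a quick view of how the fragment set, and hence the spread of hydrophobicities, would change.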

5.3. Cleavage with Cyanogen Bromide

Cyanogen bromide cleaves with high efficiency and specificity at methionine, generating a C-terminal homoserine or homoserine lactone residue (masses 101.047 and 83.037, respectively) at the point of cleavage.

1. The peptide is dissolved to a concentration of 5-10 µg/µL in 70% aqueous formic acid, and cyanogen bromide is added in 50-100-fold molar excess. This may be achieved by the addition of a few crystals of CNBr, but given its toxic nature, it is preferably introduced as a freshly made concentrated solution in 70% formic acid.
2. Flush the tube with nitrogen, and incubate in the dark at room temperature for 16-24 h.
3. Terminate the reaction by the addition of 10 vol of water, and lyophilize twice (after the addition of fresh water).

Notes:
1. Cyanogen bromide is extremely toxic, and all manipulations should be performed in a fume cupboard.
2. The cyanogen bromide solution should be prepared freshly before use and, if not completely colorless, should be discarded.
3. Various side reactions have been reported, including cleavage of some Asp-Pro bonds, cyclization of freshly exposed N-terminal glutaminyl residues, and cleavage at tryptophan and tyrosine residues. The possibility of artifactual N-formylation should also be considered.
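Because each internal CNBr fragment ends in either homoserine or homoserine lactone, two (M + H)+ values are expected per fragment. The sketch below lists both; the function name is hypothetical, the residue masses are standard monoisotopic values assumed for illustration, and the side reactions mentioned in the notes are not modeled.

```python
# Monoisotopic residue masses in daltons (standard values, assumed for this sketch).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
HOMOSERINE = 101.04768          # residue mass quoted as 101.047 in the text
HOMOSERINE_LACTONE = 83.03711   # residue mass quoted as 83.037 in the text


def cnbr_fragments(sequence):
    """Return (fragment, mh_homoserine, mh_lactone) for each CNBr fragment.

    Fragments produced by cleavage after Met carry a modified C-terminus;
    the C-terminal fragment of the peptide is unmodified, so both mass
    fields hold the same ordinary (M + H)+ value.
    """
    results = []
    start = 0
    for i, residue in enumerate(sequence[:-1]):
        if residue == "M":
            body = sum(RESIDUE_MASS[r] for r in sequence[start:i])
            base = body + WATER + PROTON
            results.append((sequence[start:i + 1],
                            base + HOMOSERINE,
                            base + HOMOSERINE_LACTONE))
            start = i + 1
    if sequence:
        tail = sum(RESIDUE_MASS[r] for r in sequence[start:]) + WATER + PROTON
        results.append((sequence[start:], tail, tail))
    return results


if __name__ == "__main__":
    for frag, hse, lactone in cnbr_fragments("GAMLKMF"):  # hypothetical peptide
        print(frag, round(hse, 3), round(lactone, 3))
```

The two values for an internal fragment differ by the mass of water (about 18 dalton), which gives a recognizable pair of signals for each methionine-terminated fragment.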

5.4. Manual Edman Degradation

Edman degradation selectively removes the N-terminal residue of unblocked peptides. The strategy is somewhat different from conventional protein sequencing in that the phenylthiohydantoin derivatives are discarded and the identity of the N-terminal residue is deduced from the mass difference between the original and the shortened peptide. In contrast to conventional Edman sequencing, the experiment may be performed on unfractionated mixtures of peptides, making it an extremely useful adjunct to the FAB mapping technique. Given sufficient sample, it is possible to perform several consecutive cycles of degradation, recording spectra between each, though the buildup of reaction products results in a cumulative deterioration of signal quality.

1. The dry sample (5-20 nmol) is dissolved in 50 µL of distilled water in a plastic Eppendorf tube, 50 µL of a 5% (w/v) solution of phenylisothiocyanate in pyridine is added, and the mixture is incubated for 30 min at 37°C.
2. The sample is extracted with two successive 100-µL vol of heptane:ethyl acetate (2:1), and the organic (upper) phase is discarded.
3. The residual aqueous phase is lyophilized and treated with 50 µL of anhydrous trifluoroacetic acid for 10 min at 37°C.
4. The trifluoroacetic acid is evaporated in a stream of nitrogen, and the residue is dissolved in 50 µL of deionized water and extracted twice with 100 µL of butyl acetate to remove the amino acid anilinothiazolinones, which are discarded along with the organic phase.
5. The samples are lyophilized and redissolved in a suitable volume of 30% acetic acid, and loaded onto the FAB probe.

Notes:
1. If only a single Edman cycle is required, it may be possible to omit the thiazolinone extraction step and analyze the sample at the end of stage 3, after removal of the TFA.
2. Treatment with phenyl isothiocyanate also offers a convenient means of distinguishing the isobaric residues lysine and glutamine, since the side chain amino group of lysine reacts to form a phenylthiocarbamyl derivative, whereas glutamine is unaffected. Lysine-containing ions are thus mass-shifted by 135 dalton.
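Deducing the N-terminal residue from the mass difference between the original and the shortened peptide amounts to a table lookup. The following is a minimal sketch with hypothetical names; the residue masses are standard monoisotopic values assumed for illustration, and the 135-dalton shift of lysine side chains described in note 2 is deliberately ignored, so the masses supplied must already be corrected for it.

```python
# Monoisotopic residue masses in daltons (standard values, assumed for this sketch).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def removed_residue_candidates(mh_before, mh_after, tolerance=0.5):
    """Residues whose mass matches the loss observed after one Edman cycle.

    tolerance is in daltons; 0.5 roughly corresponds to nominal-mass data.
    More than one candidate is returned when residues cannot be resolved.
    """
    loss = mh_before - mh_after
    return sorted(
        residue
        for residue, mass in RESIDUE_MASS.items()
        if abs(mass - loss) <= tolerance
    )


if __name__ == "__main__":
    # Hypothetical molecular ions recorded before and after one cycle.
    print(removed_residue_candidates(1000.60, 887.52))
    print(removed_residue_candidates(1000.60, 872.53))
```

Leucine and isoleucine always appear together in the output, and lysine and glutamine do so at nominal-mass tolerance, which is why the phenylthiocarbamyl shift of note 2 is useful for telling the latter pair apart.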

5.5. Other Useful Microchemical Reactions

5.5.1. Acetylation

Primary amine groups are acetylated by treatment with acetic anhydride in methanol. If a 1:1 mixture of acetic anhydride and its perdeuteriated analog is used in the reaction, then all N-terminal-containing ions will be recognizable as a characteristic pattern of doublets, separated by 3 dalton (61).

1. Dissolve the peptide (up to 5 nmol) in 5 µL water in an Eppendorf tube.
2. Add 50 µL of a 1:1 mixture of acetic anhydride and methanol (the acetic anhydride may be an equimolar mixture of (CH3CO)2O and (CD3CO)2O), and allow to react for 1 min.
3. Quench the reaction with water and either freeze-dry or vacuum evaporate, redissolve in 30% acetic acid, and apply to the FAB probe.

Notes:
1. Derivatization of the α-amino group is complete after 1 min; the ε-amino group of lysine reacts more slowly. Some degree of acetylation at other sites, however, is usually observed.
2. The reaction can be performed on the probe tip by addition of 1 µL of the methanol:acetic anhydride reagent to the sample and matrix.

5.5.2. Methylation

Treatment with acidic methanol converts the carboxyl terminus and the side chains of aspartic, glutamic, and S-carboxymethylcysteine residues to their methyl esters, the mass increment being 14 dalton/residue derivatized. Hence, the number of acidic residues in the peptide can be calculated from the mass shift of the molecular ion, and assignment of fragment ions may also be facilitated, since N-terminal-containing ions will be unshifted unless they contain one or more of these residues.

1. Methanolic HCl is prepared by the dropwise addition of 1.6 mL of high-purity acetyl chloride to 10 mL of dry methanol. The solution is allowed to stand for 10 min at room temperature before use.
2. One hundred microliters of this reagent are added to approx 10 nmol of the dry peptide in a polypropylene Eppendorf tube.
3. After 2 h at room temperature, the methanol is evaporated in a stream of dry nitrogen, and the derivatized peptide is redissolved in a suitable volume of 30% acetic acid, and analyzed by FAB-MS.

Notes:
1. The methanolic HCl reagent should be prepared in a fume cupboard, and additions of acetyl chloride made from a dropping funnel into a vented flask, to eliminate spitting.
2. Partial replacement of amide groups is sometimes observed with this procedure.
3. The reaction may also be performed on the FAB probe itself; a microliter of the methanolic HCl reagent is added to the mixture of sample and matrix on the target, and allowed to stand for 10 min at room temperature. The excess reagents are then pumped away in the vacuum lock, and the spectrum is recorded in the usual way. The shorter reaction time may result in incomplete derivatization.
4. Other esterification reactions may be performed by substituting the appropriate alcohol in place of methanol.

5.5.3. Disulfide Bond Reduction

Usually, spectra of both oxidized and reduced forms of peptides are required, so reduction of disulfide bonds is most appropriately effected on the probe tip in DTT/DTE matrix. One microliter of 0.88 ammonia solution is added to the mixture of sample and DTT/DTE matrix. After standing for 10 min, the ammonia is removed in the vacuum lock, the matrix is reacidified with 0.1M HCl or 30% acetic acid, and the spectrum is recorded. Alternatively, dissolve the peptide in 100 µL of pH 8.5 ammonium hydrogen carbonate buffer, add 5 µL of a 100 mg/mL solution of DTT, and incubate for 4 h at room temperature under a nitrogen atmosphere.
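The mass shifts expected from the microchemical reactions in Section 5.5 can be tabulated before the spectra are interpreted. The sketch below uses hypothetical function names and standard monoisotopic increments that are assumed here rather than taken from the text, which quotes them only to the nearest dalton (3 for the acetyl doublet spacing, 14 per methyl ester).

```python
ACETYL = 42.01057        # C2H2O added per acetylated amine
ACETYL_D3 = 45.02940     # same group carrying three deuterium atoms
METHYL_ESTER = 14.01565  # CH2 added per esterified carboxyl group
TWO_HYDROGENS = 2.01565  # gained on reduction of one disulfide bond


def acetyl_doublet(mh, n_amines=1):
    """(light, heavy) masses after acetylation with a 1:1 d0/d3 reagent.

    Simplification: every site in one molecule is assumed to carry the
    same isotopic form, so mixed light/heavy species are not listed.
    """
    return mh + n_amines * ACETYL, mh + n_amines * ACETYL_D3


def expected_methylation_shift(sequence):
    """Shift for complete esterification: C-terminus plus each Asp and Glu.

    S-carboxymethylcysteine is not represented in one-letter sequences
    and is therefore not counted here.
    """
    n_carboxyl = 1 + sequence.count("D") + sequence.count("E")
    return n_carboxyl * METHYL_ESTER


def carboxyl_groups_from_shift(observed_shift):
    """Number of esterified carboxyl groups implied by an observed shift."""
    return round(observed_shift / METHYL_ESTER)


def reduced_mh(mh_oxidized, n_disulfides):
    """(M + H)+ after reduction of intrachain disulfide bonds.

    For an interchain bridge the reduced chains appear as separate ions,
    so this simple addition does not apply.
    """
    return mh_oxidized + n_disulfides * TWO_HYDROGENS


if __name__ == "__main__":
    light, heavy = acetyl_doublet(1000.5)         # hypothetical (M + H)+
    print(round(heavy - light, 3))                # doublet spacing
    print(round(expected_methylation_shift("ADEK"), 3))
    print(carboxyl_groups_from_shift(42.05))
    print(round(reduced_mh(1000.5, 1), 3))
```

Subtracting one from the result of `carboxyl_groups_from_shift` (for the C-terminus) gives the number of acidic side chains, which is the quantity the methylation experiment is meant to reveal.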


References

1. Barber, M., Bordoli, R. S., Sedgwick, R. D., and Tyler, A. N. (1981) Fast atom bombardment of solids as an ion source in mass spectrometry. Nature 293, 270-275.
2. Eckart, K., Schwartz, H., Tomer, K. B., and Gross, M. L. (1985) Tandem mass spectrometry methodology for the sequence determination of cyclic peptides. J. Am. Chem. Soc. 107, 6765-6769.
3. Carr, S. A. and Biemann, K. (1984) Identification of posttranslationally modified amino acids in proteins by mass spectrometry. Methods Enzymol. 106, 29-58.
4. Carr, S. A. and Roberts, G. D. (1986) Carbohydrate mapping by mass spectrometry: A novel method for identifying attachment sites of Asn-linked sugars in glycoproteins. Anal. Biochem. 157, 396-406.
5. Poulter, L., Ang, S.-G., Gibson, B. W., Holmes, C. F. B., Caudwell, F. B., Pitcher, J., and Cohen, P. (1988) Analysis of the in vivo phosphorylation state of rabbit skeletal muscle glycogen synthase by fast-atom-bombardment mass spectrometry. Eur. J. Biochem. 175, 497-510.
6. Arlandini, E., Gioia, B., Perseo, G., and Vigevani, A. (1984) Fast atom bombardment mass spectrometry of ceruletide and [Tyr4] ceruletide. Int. J. Peptide Protein Res. 24, 386-391.
7. Gibson, B. W. and Biemann, K. (1984) Strategy for the mass spectrometric verification and correction of the primary structures of proteins deduced from their DNA sequences. Proc. Natl. Acad. Sci. USA 81, 1956-1960.
8. Morris, H. R., Panico, M., and Taylor, G. W. (1983) FAB-mapping of recombinant-DNA protein products. Biochem. Biophys. Res. Commun. 117, 299-305.
9. Canova-Davis, E., Chloupek, R. C., Baldonado, I. P., Battersby, J. E., Spellman, M. W., Basa, L. J., O'Connor, B., Pearlman, R., Quan, C., Chakel, J. A., Stults, J. T., and Hancock, W. S. (1988) Analysis by FAB-MS and LC of proteins produced by either biosynthetic or chemical techniques. Am. Biotechnol. Lab. 6, 8-17.
10. Wada, Y., Matsuo, T., and Sakurai, T. (1989) Structure elucidation of hemoglobin variants and other proteins by digit-printing method. Mass Spectrom. Rev. 8, 379-434.
11. Morris, H. R. and Pucci, P. (1985) A new standard method for rapid assignment of S-S bridges in proteins. Biochem. Biophys. Res. Commun. 126, 1122-1128.
12. Yasdanparast, R., Andrews, P. C., Smith, D. L., and Dixon, J. E. (1987) Assignment of disulfide bonds in proteins by fast atom bombardment mass spectrometry. J. Biol. Chem. 262, 2507-2513.
13. Biemann, K. and Martin, S. A. (1987) Mass spectrometric determination of the amino acid sequence of peptides and proteins. Mass Spectrom. Rev. 6, 1-76.
14. McNeal, C. J. (ed.) (1988) The Analysis of Peptides and Proteins by Mass Spectrometry. Wiley, Chichester, UK.
15. Desiderio, D. M. (ed.) (1990) Mass Spectrometry of Peptides. CRC, Boca Raton, FL.
16. McEwen, C. N. and Larsen, B. S. (eds.) (1990) Mass Spectrometry of Biological Materials. M. Dekker, New York.

17. McCloskey, J. A. (ed.) (1990) Methods in Enzymology, vol. 193: Mass Spectrometry. Academic, San Diego, CA.
18. Burlingame, A. L. and McCloskey, J. A. (eds.) (1990) Biological Mass Spectrometry. Elsevier, Amsterdam.
19. Suelter, C. H. and Watson, J. T. (eds.) (1990) Methods in Biochemical Analysis, vol. 34: Biomedical Applications of Mass Spectrometry. Wiley, New York.
20. Aberth, W., Straub, K. M., and Burlingame, A. L. (1982) Secondary ion mass spectrometry with cesium ion primary beam and liquid target matrix for analysis of bioorganic compounds. Anal. Chem. 54, 2029-2034.
21. Martin, S. A., Costello, C. E., and Biemann, K. (1982) Optimization of experimental procedures for fast atom bombardment mass spectrometry. Anal. Chem. 54, 2362-2368.
22. Alexander, A. J. and Hogg, A. M. (1986) Characterization of a saddle-field discharge gun for FABMS using different discharge vapours. Int. J. Mass Spectrom. Ion Processes 69, 297-311.
23. Barber, M. and Green, B. N. (1987) The analysis of small proteins in the molecular weight range 10-24 kDa by magnetic sector mass spectrometry. Rapid Commun. Mass Spectrom. 1, 80-83.
24. Buko, A. M., Phillips, L. R., and Fraser, B. A. (1983) Peptide studies using a fast atom bombardment high field mass spectrometer and data system: 1-Sample introduction, data acquisition and mass calibration. Biomed. Mass Spectrom. 10, 324-333.
25. Van Bremen, R. B. and Le, J. C. (1989) Enhanced sensitivity of peptide analysis by fast atom bombardment mass spectrometry using nitrocellulose as a substrate. Rapid Commun. Mass Spectrom. 3, 20-24.
26. Fenselau, C. and Cotter, R. J. (1987) Chemical aspects of fast atom bombardment. Chem. Rev. 87, 501-512.
27. Falick, A. M., Walls, F. C., and Laine, R. A. (1986) Cooled sample introduction probe for liquid secondary ionization mass spectrometry. Anal. Biochem. 159, 132-137.
28. Shiea, J. T. and Sunner, J. (1990) Effects of matrix viscosity on FAB spectra. Int. J. Mass Spectrom. Ion Processes 96, 243-265.
29. De Pauw, E. (1986) Liquid matrices for secondary ion mass spectrometry. Mass Spectrom. Rev. 5, 191-212.
30. De Pauw, E. (1990) Matrix selection in liquid secondary ion and fast atom bombardment mass spectrometry. Methods Enzymol. 193, 201-214.
31. Gower, J. L. (1985) Matrix compounds for fast atom bombardment mass spectrometry. Biomed. Mass Spectrom. 12, 191-196.
32. Cook, K. D., Todd, P. J., and Friar, D. H. (1989) Physical properties of matrices used for fast atom bombardment. Biomed. Environ. Mass Spectrom. 18, 492-497.
33. Kenny, P. T. M. (1990) The use of 2-hydroxyethyl disulphide as a matrix in liquid secondary-ion mass spectrometry. Rapid Commun. Mass Spectrom. 4, 156-158.
34. De Angelis, F., Nicoletti, R., and Santi, A. (1988) Thiodiethyleneglycol: A very efficient matrix compound for fast atom bombardment mass spectrometry (FABMS). Org. Mass Spectrom. 23, 800-803.
35. Green, B. N. and Bordoli, R. S. (1990) The molecular weight determination of large peptides by magnetic sector mass spectrometry, in Mass Spectrometry of Peptides (Desiderio, D. M., ed.), CRC, Boca Raton, FL, pp. 109-119.
36. Meili, J. and Seibl, J. (1984) A new versatile matrix for fast atom bombardment analysis. Org. Mass Spectrom. 19, 581, 582.
37. Field, F. H. (1982) Fast atom bombardment study of glycerol: Mass spectra and radiation chemistry. J. Phys. Chem. 86, 5115-5123.
38. Buko, A. M. and Fraser, B. A. (1985) Peptide studies using a fast atom bombardment high field mass spectrometer and data system. 4. Disulfide containing peptides. Biomed. Mass Spectrom. 12, 577-585.
39. Keough, T. (1988) Matrix effects on the formation of beam-induced adduct ions during fast atom bombardment of N-alkylpyridinium salts. Int. J. Mass Spectrom. Ion Processes 86, 155-168.
40. Lehmann, W. D., Kessler, M., and Konig, W. A. (1984) Investigations on basic aspects of fast atom bombardment mass spectrometry. Biomed. Mass Spectrom. 11, 217-222.
41. Dass, C. and Desiderio, D. M. (1988) Particle beam induced reactions between peptides and liquid matrices. Anal. Chem. 60, 2723-2729.
42. Barber, M., Bell, D. J., Morris, M., Tetler, L. W., Woods, M. D., Monaghan, J. J., and Morden, W. E. (1988) The interaction of meta-nitrobenzyl alcohol with compounds under fast atom bombardment conditions. Rapid Commun. Mass Spectrom. 2, 181-183.
43. Kyranos, J. N. and Vouros, P. (1990) Reduction processes in fast atom bombardment mass spectrometry: Interdependence of analyte and matrix redox potentials. Biomed. Environ. Mass Spectrom. 19, 628-634.
44. Fujita, Y., Matsuo, T., Sakurai, T., Matsuda, H., and Katakuse, I. (1985) Mass distribution of peptide molecular ions in the secondary ionization process. Int. J. Mass Spectrom. Ion Processes 63, 231-240.
45. Verkey, K. (1990) Interference effects caused by oxidation and reduction processes in fast atom bombardment mass spectrometry. Int. J. Mass Spectrom. Ion Processes 97, 265-282.
46. Shiea, J. and Sunner, J. (1991) The acid effect in fast atom bombardment. Org. Mass Spectrom. 26, 38-44.
47. Kausler, W., Schneider, K., and Spiteller, G. (1988) Practical hints for peptide sequencing by soft ionization methods. Biomed. Environ. Mass Spectrom. 17, 15-19.
48. Naylor, S. and Moneti, G. (1989) Factors affecting the fragmentation of peptides in fast atom bombardment mass spectrometry. Biomed. Environ. Mass Spectrom. 18, 405-412.
49. Mueller, D. R., Eckersley, M., and Richter, W. J. (1988) Hydrogen transfer reactions in the formation of "Y + 2" sequence ions from protonated peptides. Org. Mass Spectrom. 23, 217-222.
50. Ende, M. and Spiteller, G. (1982) Contaminants in mass spectrometry. Mass Spectrom. Rev. 1, 29-62.

51. Middleditch, B. S. (1989) Analytical Artifacts. Elsevier, Amsterdam.
52. Moon, D.-C. and Kelley, J. A. (1988) A simple desalting procedure for fast atom bombardment mass spectrometry. Biomed. Environ. Mass Spectrom. 17, 229-237.
53. Sato, K., Asada, T., Ishihara, M., Kunihiro, F., Kammei, Y., Kubota, E., Costello, C. E., Martin, S. A., Scoble, H. A., and Biemann, K. (1987) High performance tandem mass spectrometry: Calibration and performance of linked scans of a four-sector instrument. Anal. Chem. 59, 1652-1659.
54. Buko, A. M., Phillips, L. R., and Fraser, B. A. (1983) Peptide studies using a fast atom bombardment high field mass spectrometer and data system: 3-Negative ionization. Mass calibration, data acquisition and structural characterization. Biomed. Mass Spectrom. 10, 387-393.
55. Reynolds, J. D. and Cook, K. D. (1990) Improving fast atom bombardment mass spectra: the influence of some controllable parameters on spectral quality. J. Am. Soc. Mass Spectrom. 1, 149-157.
56. Aberth, W. H. and Burlingame, A. L. (1988) Effect of primary beam energy on the secondary ion sputtering efficiency of liquid secondary ionization mass spectrometry in the 5-30 keV range. Anal. Chem. 60, 1426-1428.
57. Grotjahn, L. and Taylor, L. C. E. (1985) The use of signal averaging techniques for the quantitation and mass measurement of high molecular weight compounds using fast atom bombardment mass spectrometry. Org. Mass Spectrom. 20, 146-152.
58. Cotter, R. J., Larsen, B. S., Heller, D. N., Campana, J. E., and Fenselau, C. (1985) Wide mass range scanning for the fast atom bombardment mass spectrometry of very large compounds. Anal. Chem. 57, 1479-1480.
59. Yergey, J., Heller, D. N., Hansen, G., Cotter, R. J., and Fenselau, C. (1983) Isotope distributions in mass spectra of large molecules. Anal. Chem. 55, 353-356.
60. Yergey, J., Cotter, R. J., Heller, D. N., and Fenselau, C. (1984) Resolution requirements for middle-molecule mass spectrometry. Anal. Chem. 56, 2262, 2263.
61. Morris, H. R., Panico, M., Barber, M., Bordoli, R. S., Sedgwick, R. D., and Tyler, A. (1981) Fast atom bombardment: A new mass spectrometric method for peptide sequence analysis. Biochem. Biophys. Res. Commun. 101, 623-631.
62. Williams, D. H., Bradley, C. V., Santikarn, S., and Bojesen, G. (1982) Fast-atom-bombardment mass spectrometry: a new technique for the determination of molecular weights and amino acid sequences of peptides. Biochem. J. 201, 105-117.
63. Barber, M., Bordoli, R. S., Sedgwick, R. D., and Tyler, A. N. (1982) Fast atom bombardment mass spectrometry of the angiotensin peptides. Biomed. Mass Spectrom. 9, 208-214.
64. Roepstorff, P. and Fohlman, J. (1984) Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed. Mass Spectrom. 11, 601.
65. Biemann, K. (1988) Contribution of mass spectrometry to peptide and protein structure. Biomed. Environ. Mass Spectrom. 16, 99-111.
66. Johnson, R. S., Martin, S. A., and Biemann, K. (1988) Collision-induced fragmentation of (M + H)+ ions of peptides. Side chain specific sequence ions. Int. J. Mass Spectrom. Ion Processes 86, 137-154.
67. Johnson, R. S., Martin, S. A., Biemann, K., Stults, J. T., and Watson, J. T. (1987) Novel fragmentation of peptides by collision induced decomposition in a tandem mass spectrometer: Differentiation of leucine and isoleucine. Anal. Chem. 59, 2621-2625.
68. Stults, J. T. and Watson, J. T. (1987) Identification of a new type of fragment ion in the collisional activation spectra of peptides allows leucine/isoleucine differentiation. Biomed. Environ. Mass Spectrom. 14, 583-586.
69. Kassel, D. B. and Biemann, K. (1990) Differentiation of hydroxyproline isomers and isobars in peptides by tandem mass spectrometry. Anal. Chem. 62, 1691-1695.
70. Morris, H. R., Panico, M., Karplus, A., Lloyd, P. E., and Riniker, B. (1982) Elucidation by FAB-MS of the structure of a new cardioactive peptide from Aplysia. Nature 300, 643-645.
71. Seki, S., Kambara, H., and Naoki, H. (1985) Sequence analysis for an unknown peptide by molecular secondary ion mass spectrometry. Org. Mass Spectrom. 20, 18-24.
72. Bradley, C. V., Williams, D. H., and Hanley, M. R. (1982) Peptide sequencing using the combination of Edman degradation, carboxypeptidase digestion and fast atom bombardment mass spectrometry. Biochem. Biophys. Res. Commun. 104, 1223-1230.
73. Caprioli, R. M. and Fan, T. (1986) Peptide sequence analysis using exopeptidases with molecular analysis of the truncated polypeptides by mass spectrometry. Anal. Biochem. 154, 596-603.
74. Biemann, K. (1982) Sequencing of proteins. Int. J. Mass Spectrom. Ion Phys. 45, 183-194.
75. Wada, Y., Hayashi, A., Masanori, F., Katakuse, I., Ichihara, T., Nakabushi, H., Matsuo, T., Sakurai, T., and Matsuda, H. (1983) Characterization of a new fetal hemoglobin variant, Hb F Izumi Aγ 6 Glu→Gly, by molecular secondary ion mass spectrometry. Biochim. Biophys. Acta 749, 244-248.
76. Wait, R., James, B., and Calder, M. R. (1991) Synthesis and characterization by fast atom bombardment mass spectrometry of peptides related to the B-domain of staphylococcal protein A. Org. Mass Spectrom. 26, 458-462.
77. Vestling, M. M., Murphy, C. M., and Fenselau, C. (1990) Recognition of trypsin autolysis products by high-performance liquid chromatography and mass spectrometry. Anal. Chem. 62, 2391-2394.
78. Clench, M. R., Garner, G. V., Gordon, D. B., and Barber, M. (1985) Surface effects in FAB mapping of proteins and peptides. Biomed. Mass Spectrom. 12, 355-357.
79. Naylor, S., Findeis, A. F., Gibson, B. W., and Williams, D. H. (1986) An approach toward the complete FAB analysis of enzymic digests of peptides and proteins. J. Am. Chem. Soc. 108, 6359-6363.
80. Naylor, S., Moneti, G., and Guyan, S. (1988) Hydrophobic effects in the fast atom bombardment mass spectra of proteins and large peptides. Biomed. Environ. Mass Spectrom. 17, 393-397.
81. Bull, H. B. and Breese, K. (1974) Surface tension of amino acid solutions: A hydrophobicity scale of the amino acid residues. Arch. Biochem. Biophys. 161, 665-670.

82. Poulter, L., Ang, S.-G., Williams, D. H., and Cohen, P. (1987) Observations on the quantitation of the phosphate content of peptides by fast atom-bombardment mass spectrometry. Biochim. Biophys. Acta 929, 296-301.
83. Caprioli, R. M., Moore, W. T., and Fan, T. (1987) Improved detection of "suppressed" peptides in enzymic digests analysed by FAB mass spectrometry. Rapid Commun. Mass Spectrom. 1, 15-17.
84. Falick, A. M. and Maltby, D. A. (1989) Derivatization of hydrophilic peptides for liquid secondary ion mass spectrometry at the picomole level. Anal. Biochem. 182, 165-169.
85. Whaley, B. and Caprioli, R. M. (1991) Identification of nearest-neighbor peptides in protease digests by mass spectrometry for construction of sequence-ordered tryptic maps. Biol. Mass Spectrom. 20, 210-214.

CHAPTER 12

Tandem Mass Spectrometry

Catherine E. Costello

1. Introduction For more than 30 years, electron ionization mass spectrometry (EIMS) has played a key role in the structural determination of small biological compounds, largely becauseit has three advantagesto offer: very high sensitivity compared to other structural methods, such as nuclear magnetic resonance and infrared spectrometry, the possibility for analysis of mixtures, and the wealth of data in the spectra that can provide information about structural details. However, the use of EIMS for the structure elucidation of larger biological molecules is limited by the necessity for vaporizing samples before ionization, a process that causes the thermal degradation of high-mol-wt and/or polar compounds. More recently, the development of “softer” methods of ionization that do not require vaporization prior to ionization has substantially overcome the problem of thermal decomposition, but these ionization methods impart little excess energy to the molecular ions and result in spectra that contain few, if any, fragment ions. In order to obtain detailed information about structure, therefore, the molecular ions must be decomposed and the mass spectra of the decomposition products recorded. For this type of analysis, a tandem mass spectrometer is employed. The resulting spectra include product (fragment) ions derived from a single precursor (parent) ion, provide structural details, such as amino acid or sugar sequence and residue modifications, and identify the components of conjugated lipids or From Methods in Molecular Stology, Vol 17. Spectroscopic Methods and Analyses NMR, Mass Spectrometry, and MetaNoprotern Technfques E&ted by C Jones, B Mulloy, and A H Thomas Copyright 01993 Humana Press Inc , Totowa, NJ

285

286

Costello

other adducts. In the case of samples that are mixtures, the structure of each component can be specifically determined. A brief survey of tandem mass spectrometry as it is employed for the elucidation of several important compound types is presented here, using examples from the author’s research and collaborations. For a comprehensive review of current mass spectral approaches to the structure determination of biologically significant compounds, the reader is referred to a recent volume edited by J. A. McCloskey (1).

2. Instrumentation for Tandem Mass Spectrometry

2.1. General Description

Throughout the following discussion, the examples used are spectra of biologically important compounds that have been ionized by liquid secondary ion mass spectrometry (LSIMS), a process in which an accelerated beam of primary particles (e.g., Cs+ or Xe0, having energies of 6-25 keV) is directed at a target that contains sample dissolved in a liquid matrix, such as glycerol or triethanolamine. When a neutral primary beam is used, the process is also called fast atom bombardment (FAB). LSIMS is presently the most commonly used ionization method for samples that have mol wt in the 1000-10,000 dalton range. LSIMS produces mostly (M + H)+ or (M - H)- ions and causes little fragmentation. Adduct ions that include cationic or anionic impurities (salts, buffers) and/or matrix may also be observed.

In a tandem mass spectrometer, two stages of mass selection are carried out. The first separation (MS-1) resolves species formed in the ion source (when “soft” ionization methods are used, these species are mostly molecular ions, (M + H)+ or (M - H)-). Scanning MS-1 and recording the signal after this separation results in a “normal” mass spectrum that includes the molecular ions of all the species present and some low-abundance fragment ions, as well as cluster ions formed from the matrix alone or from a combination of matrix and sample ions. For the tandem (MS/MS) measurement, an ion selected by MS-1 is passed into MS-2, and the spectrum of its decomposition products is obtained by scanning MS-2. Because ions produced by liquid SIMS have little excess energy and do not undergo much spontaneous decomposition, additional energy that will force decomposition is usually imparted to the ions selected by MS-1 by allowing them to undergo collisions with

Tandem MS


Fig. 1. Principle of a magnetic deflection tandem mass spectrometer. I.S. = ion source; DET = detector; P_n = peaks in the “normal” mass spectrum following FAB ionization; these are primarily (M + H)+ or (M - H)-. Here, the magnet is set to transmit P3 into the collision region, where it is decomposed into fragments F_n that form its CID mass spectrum. (Reproduced, with permission, from Science [3].)

an inert gas, such as helium or xenon, in a collision cell located in the field-free region between MS-1 and MS-2. This process is referred to as Collision Induced Decomposition (CID). The product ions and the remaining precursor are directed into the MS-2 analyzer, where they are resolved and detected. Because the product ions differ both in mass and energy, the MS-2 analyzer can produce only very low-resolution spectra, unless both dimensions are included in the scan. Figure 1 shows a cartoon depicting the CID MS/MS experiment (3).

2.2. Triple Quadrupole Instruments

In these instruments, the separation devices and the collision cell are all quadrupole mass analyzers. The central quadrupole has only radiofrequency (rf) applied to it, so it transmits all masses and functions as a region that confines the ion beam long enough for collisions with the gas to occur. Quadrupoles 1 and 3 have both rf and dc voltages, and act as mass filters. They may easily be scanned in a variety of patterns with respect to each other, so that, for example, all products from a given precursor, all precursors of a given product, or all ions


linked by the same neutral loss are transmitted. Collisions take place at 10-200 eV and multiple collisions generally occur, so that the spectra may be quite complex. Some triple quadrupole instruments have a quoted mass range of 4000 dalton, but poor sensitivity at high mass limits their utility. MS/MS experiments are most successful below m/z 1000, although some laboratories report very useful peptide spectra to ca. m/z 2000 (2a,b), with a sacrifice of resolution for sensitivity. An interesting new application of these instruments is CID of multiply charged ions generated by electrospray ionization (4,5).

2.3. Hybrid Instruments

The use of a double-focusing instrument rather than a quadrupole as MS-1 allows higher resolution in the selection of the precursor ion and higher transmission for high-mass precursor ions. When this type of MS-1 is coupled with a two-stage quadrupole, the first quadrupole is used as the collision cell and the final quadrupole separates the product ions. As in the triple quadrupole instrument, collisions take place at low energy, and the transmission of high-mass products is compromised, so these hybrid instruments also work best below m/z 1000 (6,7). High-energy collisions may be carried out with such an instrument, by using a cell in the field-free region between the magnetic and electric sectors, but the sensitivity is poor for this type of measurement. Other combinations of analyzers have been made as experimental hybrid instruments, but are not commercially available.

2.4. Magnetic Sector Instruments

The highest performance is achieved when both MS-1 and MS-2 are double-focusing instruments, so that both precursor and product ions may be separated at unit resolution or better. The collision cell can generally be floated at any potential between ground and the accelerating voltage. Usually, collisions take place at 3-10 kV with respect to ground (Elab) so that high-energy processes are observed. The gas pressure and collision cell dimensions are such that each ion undergoes only one or two collisions, a factor that results in relatively simple spectra. When MS-2 is a magnetic double-focusing instrument, MS-1 is set to transmit the selected ion, and the magnetic and electric fields of MS-2 are scanned at a constant ratio (linked scan at constant B/E). Other relationships can also be used to produce different types of

information (all precursors, neutral losses). The spectra included in this chapter have been obtained with such an instrument (JEOL HX110/HX110), the performance characteristics of which have been described elsewhere (8).

3. Peptides

3.1. Amino Acid Sequence

Although the mol wt of a peptide may usually be readily determined from the LSIMS spectrum, information about the amino acid sequence is most often missing or at best incomplete, because the fragment ions have only low abundances, are obscured by the background, or, in the case of mixtures, cannot be unambiguously related to a specific molecular ion. In the tandem mass spectrometry experiment, the fields of MS-1 are set to transmit the molecular ion of interest into the collision region between MS-1 and MS-2, where it is subjected to CID. The product ion spectrum recorded by MS-2 shows more abundant fragmentation and contains only fragments derived from the selected mass value. The information content of the spectrum is thus enhanced, and the contribution of extraneous components to the spectrum is minimized.

Figures 2A and B, respectively, show the LSIMS spectrum of the tetradecapeptide renin substrate (Asp-Arg-Val-Tyr-Leu-His-Pro-Phe-His-Leu-Leu-Val-Tyr-Ser) and the CID mass spectrum of its (M + H)+ m/z 1758.9. Assignments of fragments are indicated by symbols used according to the Biemann modification (9) of the Roepstorff and Fohlman nomenclature (10). These structures are shown in Scheme 1. The two-sector spectrum (Fig. 2A) contains a nearly complete set of a_n ions, but, above m/z 1000, their abundances are low and nearly the same as fragments belonging to other ion series. Although the assignments for a known peptide can be made with confidence, it would be difficult to deduce the amino acid sequence of an unknown peptide from such a spectrum.

As is commonly observed for CID mass spectra of peptides, the spectrum shown in Fig. 2B includes abundant low-mass ions that are immonium ions of most (but not usually all) of the amino acids contained in the sequence. Similar ions are present in the two-sector mass spectrum, but most are obscured by the presence of very abundant ions owing to the matrix. At the high-mass end of the CID mass spectrum, abundant ions are also observed, this set arising via cleavages of the


Fig. 2. Mass spectra of the tetradecapeptide renin substrate, Asp-Arg-Val-Tyr-Leu-His-Pro-Phe-His-Leu-Leu-Val-Tyr-Ser, Mr 1757.9, dissolved in 1:1 0.5% TFA/glycerol. Accelerating voltage 10 kV; postacceleration at the detectors, 18 kV. Symbols refer to the fragment types whose general structures are shown in Scheme 1. (A) “Normal” (two-sector) mass spectrum. Inset shows the (M + H)+ molecular ion region. Only the monoisotopic peak, m/z 1758.9, was selected for CID, and spectrum B therefore does not include 13C isotope peaks. Asterisks (*) mark matrix peaks. (B) CID mass spectrum of the (M + H)+, m/z 1758.9. Collision cell floated at 3 kV with respect to ground.
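As a rough cross-check of the m/z values quoted for renin substrate, the sketch below sums monoisotopic residue masses to reproduce the (M + H)+ value and then reads a sequence back from the intervals of an a-type ion ladder. The residue-mass table, the function names, and the working formula a_n = (sum of the first n residue masses) + proton - CO are assumptions introduced for illustration; they are standard textbook values rather than anything stated in the chapter, and the ladder is calculated, not measured.

```python
# Monoisotopic residue masses in daltons (standard textbook values, assumed here).
RESIDUE = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Leu": 113.08406, "Ile": 113.08406,
    "Asn": 114.04293, "Asp": 115.02694, "Gln": 128.05858, "Lys": 128.09496,
    "Glu": 129.04259, "His": 137.05891, "Phe": 147.06841, "Arg": 156.10111,
    "Tyr": 163.06333,
}
WATER, PROTON, CO = 18.01056, 1.00728, 27.99491

RENIN = "Asp Arg Val Tyr Leu His Pro Phe His Leu Leu Val Tyr Ser".split()


def mh_plus(seq):
    """Monoisotopic m/z of the (M + H)+ ion of a linear, unmodified peptide."""
    return sum(RESIDUE[r] for r in seq) + WATER + PROTON


def a_ions(seq):
    """m/z of a_1 .. a_(N-1), assuming a_n = (first n residues) + proton - CO."""
    series, running = [], 0.0
    for residue in seq[:-1]:
        running += RESIDUE[residue]
        series.append(running + PROTON - CO)
    return series


def read_intervals(mz_values, tol=0.02):
    """Name the residue(s) whose mass matches each gap between adjacent peaks."""
    calls = []
    for low, high in zip(mz_values, mz_values[1:]):
        gap = high - low
        names = sorted(n for n, m in RESIDUE.items() if abs(m - gap) <= tol)
        calls.append("/".join(names) or "?")
    return calls


ladder = a_ions(RENIN)
print(f"(M + H)+ = {mh_plus(RENIN):.1f}")
print(read_intervals(ladder))
```

The intervals return every residue after the first, but each Leu position is reported as "Ile/Leu" because the two residues have identical masses; a ladder of this kind cannot separate them on its own.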

[Scheme 1 diagram not reproduced: peptide backbone from H2N- to -OH showing the N-terminal fragment types a_n, b_n, c_n and the C-terminal fragment types x_n, y_n, z_n, together with the side-chain cleavage ions d_n, v_n, and w_n.]

Scheme 1. Nomenclature for peptide fragment ions in tandem mass spectra (12).

various amino acid side chains. These provide further information about the amino acid content of the peptide. The relative abundances of the members of these two sets of fragments do not, however, indicate the number of each type of residue, since their abundances are structure-dependent. It has been found experimentally that the CID spectra of most peptides are dominated by a single sequence-ion fragment type, and that the amino acid sequence can be determined by calculating the intervals between the peaks of the major ion series. For renin substrate, the a_n series is dominant (11), probably because there is a basic residue (Arg) in the penultimate position with respect to the

[Scheme 2 diagram not reproduced. Legible labels: localization of the positive charge, graded from none through significant to complete; N-acylated peptides containing no basic amino acids; peptides containing no basic amino acids; His, Lys peptides; Arg peptides; fragment types "mainly b_n, some y_n" and "mainly a_n, d_n and v_n, w_n, y_n, respectively".]

Scheme 2. Summary of rules relating the degree of charge localization and fragmentation processes observed in high-energy CID mass spectra of peptide (M + H)+ ions. Reproduced, with permission, from ref. 12.

N-terminus. The dominant series shifts as the likely site for charge localization moves from one terminus to the other and is especially clear when a precharged site (e.g., quaternary ammonium) exists at one terminus. If no favored charge site is present, the spectrum is dominated by b_n and y_n ions, the products of cleavage at the backbone amide N-C bond. This pattern is summarized in Scheme 2 (12).

3.2. Differentiation of Isomeric and Isobaric Residues

The intervals between the sequence ions give an indication of the amino acid sequence that is definitive, except in the case of residues that are isomeric or have the same nominal mass value, and thus would yield isobaric ions. In most mass spectra of peptides, these residues could not be differentiated, but in high-energy collision spectra, a secondary cleavage that involves scission of the β,γ C-C bond in the side chain of the terminal residue in an a_n + 1 or z_n + 1 fragment ion leads to further sets of ions, termed d_n, and v_n or w_n, respectively, whose structures are shown in Scheme 1 (12). Numerous d_n ions can be seen in Fig. 2B, and these indicate clearly that the Xle residues are all Leu. When a strongly basic residue (Arg, Lys) is present and serves as the most likely site of charge localization, side chain cleavage fragment ions become especially prominent, and an apparently irregular set of sequence ions is observed. Assignment of sequence in such a


spectrum is difficult to achieve manually, but can be accommodated through use of computer programs that sort the peaks observed by matching them to calculated values (9,13,14).

The spectrum shown in Fig. 3 was obtained for the (M + H)+ m/z 1256.7 of a peptide found to be one component of a tryptic digest of the hemoglobin from a patient suspected to have a hemoglobin abnormality. All the peptide (M + H)+ m/z values that would be predicted for tryptic digestion of the Hb β-chain were observed, except for the peak expected at m/z 1314.7, the (M + H)+ of a peptide containing the residues 18-30 (Val-Asn-Val-Asp-Glu-Val-Gly-Gly-Glu-Ala-Leu-Gly-Arg), which was replaced by a peak at m/z 1256.7 (15). Only three possible single amino acid exchanges would cause the observed mass shift: (21Asp→Gly), (22Glu→Ala), or (26Glu→Ala). The observed spectrum fits only one of these possibilities (22Glu→Ala), and the modification could therefore be unequivocally assigned, chiefly on the basis of the w_n-series ions. (A literature search revealed that this modification had been reported previously as Hemoglobin G Coushatta [16].)

3.3. Posttranslational Modifications

Posttranslational modifications, such as acetylation or other fatty acyl substitution, phosphorylation, sulfation, or glycosylation, and, more importantly, C-terminus fraying are common and greatly affect the biological activity and distribution of peptides and proteins. The amino acid sequence translated from the gene may also be truncated so that the product differs from that predicted on the basis of the gene sequence. Modifications to the N-terminus block the Edman-based sequenator methods; other modifications may be relatively transparent, their presence suggested only by low yields in some sequenator cycles and impossible to characterize by these methods. These modifications are especially important to define for genetically engineered systems where the final peptides intended for use as pharmaceuticals are produced by organisms that may or may not carry out the “normal” posttranslational modifications. An LSIMS mass spectrum will generally indicate the mass shift that accompanies such a modification, but a CID spectrum is usually necessary to locate the site of the modification and details of its structure (3). Proteins modified by glycosylation present the analyst with the dual challenge of determining the structures of both peptide and carbohydrate portions, as well as the attachment sites. These are discussed in Section 4.4.


Fig. 3. CID mass spectrum of the (M + H)+, m/z 1256.7, of a peptide found in the tryptic digest of a hemoglobin β-chain variant (15). Sample dissolved in 1:1 0.5% TFA/glycerol. Accelerating voltage 10 kV. Collision cell floated at 3 kV with respect to ground. The sequence assigned on the basis of this spectrum was Val-Asn-Val-Asp-Ala-Val-Gly-Gly-Glu-Ala-Leu-Gly-Arg.
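The mass-shift argument for this hemoglobin peptide can be sketched in a few lines. The example below, a minimal illustration rather than the procedure actually used for Fig. 3, lists every single-residue exchange in the normal β-chain 18-30 peptide that would move the (M + H)+ from m/z 1314.7 to 1256.7. The residue-mass table, the function names, and the 0.3 dalton matching tolerance are assumptions made for the sketch.

```python
# Monoisotopic residue masses in daltons (standard textbook values, assumed here).
RESIDUE = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Cys": 103.00919, "Leu": 113.08406,
    "Ile": 113.08406, "Asn": 114.04293, "Asp": 115.02694, "Gln": 128.05858,
    "Lys": 128.09496, "Glu": 129.04259, "Met": 131.04049, "His": 137.05891,
    "Phe": 147.06841, "Arg": 156.10111, "Tyr": 163.06333, "Trp": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728

# Normal beta-chain tryptic peptide, residues 18-30, as quoted in the text.
NORMAL = "Val Asn Val Asp Glu Val Gly Gly Glu Ala Leu Gly Arg".split()
FIRST_POSITION = 18


def mh_plus(seq):
    """Monoisotopic m/z of the (M + H)+ ion of a linear, unmodified peptide."""
    return sum(RESIDUE[r] for r in seq) + WATER + PROTON


def candidate_substitutions(seq, observed_shift, first_position=1, tol=0.3):
    """List (position, old, new) exchanges whose mass change matches the shift."""
    hits = []
    for index, old in enumerate(seq):
        for new, mass in RESIDUE.items():
            if new != old and abs((mass - RESIDUE[old]) - observed_shift) <= tol:
                hits.append((first_position + index, old, new))
    return hits


shift = 1256.7 - 1314.7  # observed minus expected (M + H)+
hits = candidate_substitutions(NORMAL, shift, FIRST_POSITION)
for position, old, new in hits:
    print(f"{position}{old}->{new}")
```

Under these assumptions the search returns the same three candidates named in the text. Mass alone cannot choose among them, which is why the CID spectrum is needed to place the exchange at position 22.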

3.4. Derivatization

Derivatization can be used to improve the ionization efficiency and, thus, the sensitivity for detection of a peptide, or to introduce a site for charge localization that improves the information content of the spectrum by producing a fragment ion series that favors a full set of sequence ions. Both N-terminal and C-terminal derivatives can be used for this purpose, as suggested in Scheme 2 (17,18).

4. Carbohydrates and Glycoconjugates

4.1. Sugar Sequence

The structural determination of oligosaccharides and glycoconjugates is a more difficult problem than is the amino acid sequence determination of peptides, because the possibility of branching sites leads to more potential structures from the same set of building blocks, and because the rate of occurrence of heterogeneity in individual biological samples is generally higher. The sequence of sugar residue types (hexose, pentose, hexosamine, and so on) and the branch points are often quite clear, even in the normal LSIMS mass spectrum of rigorously purified native or permethylated oligosaccharides and glycoconjugates (19,20). For many biological samples, however, the situation is complicated by the presence of heterogeneity in the aglycon and of species with variations in the size of the carbohydrate, so the assignment of fragment ions may be somewhat arbitrary or the fragment ions may not have sufficient abundance to be useful. The CID MS/MS approach enables the analyst to relate fragment ions to a specific parent, and to enhance the amount of sequence and aglycon structural information by increasing and directing the extent of fragmentation.

For a saponin isolated from the Western subterranean termite whose triterpene component (echinocystic acid) was known, but for which the carbohydrate structure remained to be determined, the LSIMS spectrum contained an (M + H)+ m/z 940.5 and aglycon fragments at m/z 455.4 and 437.4, but did not include fragments that would indicate the carbohydrate composition or sequence. The CID mass spectrum, shown in Fig. 4, obtained for a few micrograms of sample, made possible the assignment of the sugar sequence. The structure determined for the unknown was found to match that reported by another group at about the same time (21). The technique becomes even more useful when mixtures of closely related oligosaccharides are encountered or extensive branching is present.

[Fig. 4 spectrum not reproduced. Legible labels: (M + H)+ 940.5; fragment peaks at m/z 808, 676, 336, and 204; a loss marked -132 (pentose); O-pentose and O-GlcNAc linkages.]

Fig. 4. CID mass spectrum of the (M + H)+, m/z 940.5, of a saponin from Western subterranean termites. Sample dissolved in 1:1 DMSO/glycerol. Accelerating voltage 10 kV. Grounded collision cell. Symbols refer to the fragmentation processes illustrated in Scheme 3C.
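The way a sugar sequence is read from successive neutral losses can be illustrated with the few peak labels that remain legible for Fig. 4. The sketch below takes the precursor at m/z 940.5 and the labels 808 and 676 at face value (the labels are nominal, so a loose 0.6 dalton tolerance is used) and names the anhydro sugar residue that matches each step. The residue masses, the tolerance, and the function name are assumptions for illustration, and the output is not a substitute for the published assignment.

```python
# Monoisotopic masses of anhydro sugar residues in daltons (standard values, assumed).
SUGAR_RESIDUE = {
    "pentose": 132.0423,
    "deoxyhexose": 146.0579,
    "hexose": 162.0528,
    "HexNAc": 203.0794,
}
PROTON = 1.00728


def assign_losses(peaks, tol=0.6):
    """For a descending list of m/z values, name the residue lost at each step."""
    steps = []
    for high, low in zip(peaks, peaks[1:]):
        loss = high - low
        names = [n for n, m in SUGAR_RESIDUE.items() if abs(m - loss) <= tol]
        steps.append((high, low, names))
    return steps


# Precursor from the caption, followed by two nominal peak labels from the figure.
PEAKS = [940.5, 808.0, 676.0]
steps = assign_losses(PEAKS)
for high, low, names in steps:
    print(f"{high} -> {low}: {', '.join(names) or 'unassigned'}")

# Small sugar-only ions: a protonated HexNAc, and HexNAc plus one pentose.
hexnac_ion = SUGAR_RESIDUE["HexNAc"] + PROTON
hexnac_pentose_ion = hexnac_ion + SUGAR_RESIDUE["pentose"]
print(f"{hexnac_ion:.1f} {hexnac_pentose_ion:.1f}")
```

Both steps match a pentose, and the two calculated sugar-only ions fall at nominal m/z 204 and 336, the values of two further labels in the figure. That is consistent with a chain containing an N-acetylhexosamine and two pentoses, but the linkage order still has to come from the full spectrum.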

Burlingame and his colleagues have recently used CID to assign the structures of the complex mixture of oligosaccharides from the lipo-oligosaccharides of a pyocin-resistant Neisseria gonorrhoeae (22) and to correct an earlier structural assignment of the MansGlcNAc2 core oligosaccharide of the mnn mutant of Saccharomyces cerevisiae (23). Isomeric residues and linkage sites can only be differentiated when fragments are present that result from cleavages occurring within the carbohydrate rings. Derivatization or cationization may favor such pathways (24-29). Some information on configuration may be obtained for small oligosaccharides (30,31), but conclusions about the generality of this approach must await the accumulation of a larger body of experimental results. Complementary data from degradative techniques


and NMR are useful when adequate amounts of pure samples of individual components are available. Particularly effective for linkage site determination is periodate cleavage of vicinal diols, an approach that has been optimized for oligosaccharides and glycoconjugates by Nilsson and coworkers (32), who have applied this method prior to two-sector FABMS. MS/MS analysis of the oxidation products offers a further opportunity to remove any remaining ambiguities.

4.2. Lipid Structures in Glycosphingolipids

The biological activity of glycosphingolipids depends on the structures of both components of the molecules: the carbohydrate and the lipid (a long-chain base, such as 4E-sphingenine, usually with an N-acyl substituent, such as palmitoyl). Samples recovered from a biological source are usually complex mixtures that vary in both (33-35). It is therefore imperative that an analytical method for these compounds be able to characterize the individual components as completely as possible within the limits of available sample amounts and handling time. Methods that minimize either sample requirement or procedural steps obviously are advantageous. Tandem mass spectrometry has high sensitivity and permits structural elucidations without requiring the complete resolution of all components, yet retains a high degree of specificity. It is therefore well suited to glycosphingolipid analysis.

As noted in the preceding section, information on carbohydrate sequence and branching is often available from the FABMS spectrum, but may be more clearly related to individual molecular weights from the MS/MS spectrum. In addition, for each component, the total weight of the sphingolipid portion, the chain lengths of the long-chain base and fatty acyl group, and the sites of unsaturation or hydroxylation may be determined (36). Figure 5 shows the negative-ion CID mass spectrum obtained for the (M - H)-, m/z 890.6, of lactosyl-N-stearoyl-sphinganine. Scheme 3 indicates the types of fragmentation observed in these spectra (36-38). For underivatized compounds, the negative ion FABMS/MS spectrum of the (M - H)- presents more information about the structure of the carbohydrate portion, whereas the positive ion spectrum of the (M + H)+ is generally more informative about the lipid (37). Similar patterns are observed for glycophosphosphingolipids. Although the figures presented here show FABMS/MS spectra from

[Fig. 5 spectrum not reproduced. Legible peak labels include m/z 566, Y1 at m/z 728, and (M - H)- at m/z 890.6.]

Fig. 5. CID mass spectrum of the (M - H)-, m/z 890.6, of lactosyl-N-stearoyl-sphinganine. Sample dissolved in 1:1 DMSO/triethanolamine. Accelerating voltage -10 kV. Collision cell floated at -3 kV with respect to ground. Fragment nomenclature is as described in Scheme 3.
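The m/z values attached to Fig. 5 can be checked from elemental composition. The sketch below is an illustration under stated assumptions: the formula C48H93NO13 is inferred here from the compound name (lactose + stearic acid + sphinganine, less two waters) and is not quoted in the chapter; the atomic masses are standard monoisotopic values; and each Y-type glycosidic cleavage in the negative-ion spectrum is assumed to remove one anhydrohexose, C6H10O5.

```python
import re

# Monoisotopic atomic masses in daltons (standard values, assumed here).
ATOM = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647


def mono_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C48H93NO13'."""
    return sum(
        ATOM[element] * int(count or 1)
        for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    )


# Assumed composition of lactosyl-N-stearoyl-sphinganine (not given in the text).
GLYCOLIPID = "C48H93NO13"
ANHYDROHEXOSE = "C6H10O5"

m_minus_h = mono_mass(GLYCOLIPID) - PROTON
y1 = m_minus_h - mono_mass(ANHYDROHEXOSE)   # loss of the terminal hexose
y0 = y1 - mono_mass(ANHYDROHEXOSE)          # loss of both hexoses (ceramide anion)

print(f"(M - H)- {m_minus_h:.2f}  Y1 {y1:.2f}  Y0 {y0:.2f}")
```

With these inputs the deprotonated molecule comes out near m/z 890.66, and the two sugar losses fall at nominal m/z 728 and 566, in line with the caption and the peak labels.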


Scheme 3. Nomenclature for glycosphingolipid fragment ions in tandem mass spectra. (A) Positive ion product ions from fragmentations within the ceramide portion. (B) Negative ion ceramide product ions. (C) Product ions from fragmentations within the carbohydrate. The number of hydrogen transfers during ring cleavages (A_i, X_i) may vary and must be indicated in each case. Fragments that arise via glycosidic cleavages include predictable hydrogen transfers and are thus defined: B_i+ = [B_i]+, C_i+ = [C_i + 2H]+, Y_i+ = [Y_i + 2H]+, Z_i+ = [Z_i]+, B_i- = [B_i - 2H]-, C_i- = [C_i]-, Y_i- = [Y_i]-, Z_i- = [Z_i - 2H]-. Reproduced, with permission, from refs. 36 and 38.

a four-sector instrument, it is worth noting that some of this information may be obtained during two-sector linked scanning at constant B/E ratio with FAB ionization (19,39) or with supercritical fluid chromatography-chemical ionization mass spectrometry, using NH3 as the reagent gas (40) and magnetic scans. For complex mixtures that contain components that differ in mass by only a few daltons, however, the limited mass resolution for the precursor in these latter types of experiments can leave some ambiguity about structural components that are close in mass.

4.3. Derivatization

As is the case for peptides, derivatization may be used to improve sensitivity and to direct fragmentation during the tandem mass spectral analysis of glycolipids. Derivatization also helps to lessen deleterious effects from the endogenous salts often present in biological samples, by blocking their probable binding sites. As the carbohydrate content of the compounds increases, their observed FAB sensitivity decreases. Permethylation or peracetylation reverses this effect (19,20). For CID mass spectra, permethylation is preferable, because the fragmentation patterns of these derivatives contain a wealth of structural information, whereas the CID spectra of the peracetylated compounds are dominated by multiple losses of acetic acid and ketene. The spectra of the permethylated compounds contain fragments related to the lipid structure, as well as fragmentations within the carbohydrate, predominantly of the 1,5X_n type, that provide sequence information.

Reduction of the amide bonds increases the sensitivity and helps to control the fragmentation (36,41) in the CID spectrum whether or not permethylation is also carried out. Reduction with LiAlH4 in combination with permethylation has been used very successfully for EI mass spectral analysis of glycolipids by Karlsson and coworkers (33). Reduction with BH3 (or BD3) allows sample handling at the low nmol to pmol level, and subsequent treatment with H2O2/NH3 results in concomitant conversion of -CH=CH- groups to -CH2CHOH- (or -CHDCHOH-). These hydroxyl groups introduce a mass shift in the fragmentations along the base and fatty acyl hydrocarbon chains that permits location of the site(s) of the initial unsaturation. Manipulation of reaction conditions determines whether all amide groups (including N-acetyl) or only the ceramide carbonyl is reduced (36,41). The spectra contain


abundant fragments that allow most features of the glycolipid structure to be defined. Figure 6 shows the CID mass spectrum of the (M + H)+, m/z 2178.5, of the higher homolog of fully reduced, permethylated ganglioside GD1.

Fig. 6. CID mass spectrum of the (M + H)+, m/z 2178.5, of the higher homolog of the fully reduced, permethylated ganglioside GD1. Accelerating voltage 10 kV. Collision cell floated at 3 kV with respect to ground.

4.4. Glycopeptides, Other Glycoconjugates, and Related Compounds

Strategies have been developed for the location of occupied glycosylation sites on peptides that contain one or many more potential sites that may be variously substituted to yield multiple glycoforms, and for sequence determinations of both the peptide and the carbohydrate(s) (42-44). This process may be carried out so that the two components are separated early in the purification scheme and sequenced independently. Although this result is useful, it is even more informative to retain the connection between glycosylation site(s) and structure, so that the specificity or heterogeneity at each site may be determined. CID mass spectra of glycopeptides that contain oligosaccharides attached to moderate-length peptides provide information on the sequence of both (44). For these compounds, derivatization is not necessary. This type of analysis is particularly important for recombinant glycoproteins and may well become a prerequisite for quality assurance of recombinant drugs.

Tandem mass spectrometry has also proven to be useful for the characterization of glycerophospholipids, which contain one or two fatty acyl substituents and a third substituent, such as phosphocholine, phosphoserine, or phosphoinositol (45,46). CID of selected fragment ions, rather than the molecular ions, provides identification of both the chain length and the position of the substituents.

5. Other Compound Classes

5.1. Nucleotides and Oligonucleotides

Tandem mass spectrometry has, as yet, been little used for nucleic acid research (47), although mass spectral analysis has for many years played a critical role in the identification of new and unusual nucleic acids. In collaboration with Grotjahn (48) and with McCloskey (49), Gross et al. have investigated the CID mass spectral behavior of some small oligonucleotides and nucleic acids, using a triple-sector (EBE) mass spectrometer. Grotjahn et al. (50) showed that, for deoxyoligonucleotides containing up to 12 or more bases, the sequence information in the normal negative-ion FAB mass spectrum is quite complete; in this instance, there is no advantage to CID unless the sample contains a mixture or includes unusual components. Such examples are beginning to appear in the literature, and many more are surely to be expected. Cushnir and colleagues (51) have used a hybrid instrument and deuterium exchange to identify alkylated DNA bases found in human urine. Dino et al. have reported four-sector CID studies of tetranucleotides modified with the ultimate benzo[a]pyrene metabolite (52). Claeys and coworkers have employed constant neutral loss scans on a hybrid instrument to detect deoxynucleosides and their adducts as a means for assessing exposure to phenyl glycidyl ether, a mutagen used in the production of epoxy resins (53). They have also investigated the CID behavior of bis-nucleobase adducts with the anticancer drug cisplatin (54).

We have carried out analogous studies of cisplatin-oligonucleotide adducts, and have observed fragmentation pathways that indicate the sites of platinum binding (55). The spectrum shown in Fig. 7 was obtained for trans-{Pt(NH3)2(Guo)[d(CpG)]}+, m/z 1050.9, recovered by enzymatic digestion of the 1,4-intrastrand crosslinked reaction product of trans-diamminedichloroplatinum(II) with 5’-d(TCTACGCGTTCT) (56). The spectrum clearly showed that platinum is bound to one guanosine and one cytidine nucleoside, consistent with other experimental results, and verified that the kinetically favored 1,3-intrastrand crosslinked product had undergone rearrangement.

5.2. Inorganic, Organometallic, and Coordination Compounds

The soft ionization methods, although each was first popularized for the analysis of peptides and later for other organic compounds, have begun to find roles also for the characterization of inorganic, organometallic, and coordination compounds. The use of FABMS has recently been reviewed (57). Field desorption (58), DCI (59), and 252Cf-plasma desorption mass spectrometry (60) are also useful, particularly for high-mol-wt samples and for salts, and can provide complementary results that are especially important when questions arise concerning the possibility of chemical interactions between the analyte and the FAB matrix. Although inorganics are not generally regarded within the realm of biological compounds, the widespread utilization

[Fig. 7 spectrum not reproduced. Legible peak labels: (G + H)+ 152; 195Pt(G)(C) 456; 195Pt(NH3)(Guo)(C) 589.]

Fig. 7. CID mass spectrum of the ion at m/z 1050.9 in the FAB mass spectrum of trans-{Pt(NH3)2(Guo)[d(CpG)]}+ (56). Inset shows molecular ion region of two-sector spectrum. Asterisks (*) mark matrix-related peaks. Guo = guanosine, S = sugar (deoxyribose). Ammonia losses from low-abundance ions are not marked, but occur 17 u and 34 u below the marked ions. The peak selected for CID contains primarily 195Pt(M - H*)+, but there is some contribution from the 194Pt species containing an additional hydrogen. Sample dissolved in 2:1 water/glycerol. Accelerating voltage 10 kV. Grounded collision cell.

of inorganic complexes as diagnostic imaging and therapeutic agents presents an important application for mass spectrometry as a means to ascertain or verify structures of synthetic complexes and their metabolites. CID is quite efficient for inorganic compounds and complexes, and is practical at higher mass values than are usually attempted for organic compounds, more than 4000 dalton with a four-sector instrument (61) and 2000 dalton with a hybrid (62). The CID behavior of complexes may give clues to chemical reactivity, such as the elimination of alkenes and neutrals that contain one or two oxygen atoms from technetium complexes, such as that shown in Fig. 8. These compounds were found to be effective catalysts for alkene oxidation (63). Tc-based radioimaging agents are clinically important, and both FABMS/MS and DCI/MS/MS have become valuable tools for their characterization (59,64). In addition to radioimaging applications, nuclear magnetic resonance and novel antibody-based chemotherapeutic agents employ inorganic complexes, and the study of metalloenzymes offers a further arena for MS and MS/MS studies.

6. Conclusions

Tandem mass spectrometry adds a wealth of new opportunities for the analyst in the characterization of minute amounts of biological materials even when they cannot be purified to homogeneity. The increasing availability of a range of instrument types and a body of pertinent literature has brought the method out of the developers’ hands and into the realm of biochemists. The very recent developments of powerful new ionization methods for high-mol-wt compounds, matrix-assisted laser desorption (65) and electrospray (66), will provide yet more opportunity for MS/MS experimental methods. We are still on the very steep part of the learning curve, however, and the next few years promise to be even more exciting and rewarding than the hectic recent past!

Acknowledgments

The author is grateful to her MIT colleagues, especially K. Biemann, S. A. Martin, I. A. Papayannopoulos, B. Domon, and J. E. Vath, for many helpful and thought-provoking discussions on sample preparation and the acquisition and interpretation of tandem mass spectra, as well as to the many elsewhere who have generously shared their results

100

150

200

327.0

250

I 300

T

311

I

327 381

\

467

Fig 8 CID massspectrumof the (M - Cl)+, m/z 467.1, m the FAE3massspectrum of chloro-( 1,2-di-n-butyl- 1,2-dloxalato)-oxo(l,lO-phenanthrolmetechnetmm(V) complex dissolvedm 1 1 CH.$l+z-mtrobenzyl alcohol (63) R1,Rg= butyl, R2& = H, phen = phenanthroline Accelerating voltage 10 kV Grounded collision cell

m/z

r’I’r’l’J’r’l’l’l’(‘l’l’i’l~~‘l’l’l’l’~’l~l’l~r

(phen+H)+ 181 I

m/z

295

-

2 E % F

Tandem MS

309

and insights. The hemoglobm digest was provided by T. Matsuo and Y. Wada (Osaka), the saponin sample by I. Kubo (Berkeley), the platinated oligonucleotide by S. J. Lippard (MIT), and the technetium complex by A. Davison (MIT). The MIT Mass Spectrometry Facility is supported by Grant No. RR003 17, from the NIH Center for Research Resources. References 1 McCloskey, J A (ed.) (1990) Methods in Enzymology, vol 193 Mass Spectrometry. Academtc, San Dtego, CA. 2 (a) Hunt, D F , Buko, A M , Ballard, J. B , Shabanowrtz, J , and Grordam, A B (198 1) Sequence analysts of polypeptides by collision acttvated dissociation on a triple quadrupole mass spectrometer. Biomed Mass Spectrom 8, 397-408. (b) Hunt, D F , Yates, J R III, Shabanowrtz, J., Winston, S , and Hauer, C R (1986) Protein sequencmg by tandem mass spectrometry Proc. Natl. Acad Scz USA 83, 6233-6237. 3. Bremann, K and Scoble, H A. (1987) Characterrzatron by tandem mass spectrometry of structural modrficattons in protems. Science 237,992-998 4. Huang, E C and Hemon, J D (1990) LC/MS and LC/MS/MS determmatton of protem tryptrc drgests J Am Sot. Mass Spectrom. 1, 158-165 5 Smith, R. D., Loo, Joseph A , Barinaga, C. J , Edmonds, C. G., and Udseth, H R. (1990) Colhstonal activatton and colliston-activated drssoctation of large multrply charged polypeptides and proteins produced by electrospray tomzatton. J Am Sot. Mass Spectrom 1,.53-45 6 Poulter, L and Taylor, L C. E (1989) A comparison of low and high energy collistonally activated decomposmon MS-MS for peptide sequencmg lnt. J. Mass Spectrom Ion Proc. 91, 183-197. 7. Alexander, A J , Thtbault, P , Boyd, R K , Curtis, J M., and Rinehart, K. L (1990) Colhston induced drssoctatton of peptrde tons Part 3. Comparison of results obtained usmg sector-quadrupole hybrrds with those from tandem double-focusing instruments Int J Mass Spectrom. Ion Proc 98, 107-134 8 Sato, K , Asada, T , Ishrhara, M , Kunihiro, F., Kammei, Y., Kubota, E., Costello, C. 
E , Martm, S A , Scoble, H A , and Btemann, K. (1987) High-performance tandem mass spectrometry Calibration and performance of linked scans of a four-sector instrument Anal. Chem 59, 1652-1659 9. Johnson, R S and Btemann, K. (1989) Computer program (SEQPEP) to aid m the Interpretation of htgh-energy colhsron tandem mass spectra of pepttdes Boomed Env. Mass Spectrom 18,945-957 10 Roepstorff, P. and Fohlman, J. (1984) Proposal for a common nomenclature for sequence ions m mass spectra of peptides. Biomed. Mass Spectrom. 11,601. 11 Martin, S A , Johnson, R. S , Costello, C. E , and Biemann, K (1988) The structure determmatton of peptides by tandem mass spectrometry, m Analysis of Peptides and Proteins (McNeal, C J , ed ), Wiley, Chchester, UK, pp 135-150 12. Johnson, R S , Martin, S A , and Biemann, K. (1988) Collrston-induced frag-

310

Costello

mentation of (M + H)+ tons of peptides. Stde cham specific sequence ions. Int. J. Mass Spectrom. Ion Proc. 86, 137-154.

13. Papayannopoulos, I. A. and Biemann, K. (1991) A computer program (COMPOST) for predicting mass spectrometry information from known amino acid sequences. J. Am. Soc. Mass Spectrom. 2, 174-177.

14. Lee, T. D. and Vemuri, S. (1990) MacProMass: a computer program to correlate mass spectral data to peptide and protein structures. Biomed. Env. Mass Spectrom. 19, 639-645.

15. Matsuo, T. (1989) High performance sector mass spectrometers: past and present. Mass Spectrom. Rev. 8, 203-236.

16. Wada, Y., Matsuo, T., and Sakurai, T. (1989) Structure elucidation of hemoglobin variants and other proteins by digit-printing method. Mass Spectrom. Rev. 8, 379-434.

17. Vath, J. E., Zollinger, M., and Biemann, K. (1988) A method for the derivatization of organic compounds at the sub-nanomole level with reagent vapor. Fres. Z. Anal. Chem. 331, 248-252.

18. Vath, J. E. and Biemann, K. (1990) Microderivatization of peptides placing a fixed positive charge at the N-terminus to modify high energy collision fragmentation. Int. J. Mass Spectrom. Ion Proc. 100, 287-299.

19. Egge, H. and Peter-Katalinic, J. (1987) Fast atom bombardment mass spectrometry for structural elucidation of glycoconjugates. Mass Spectrom. Rev. 6, 331-393.

20. Dell, A. (1987) F.A.B.-Mass spectrometry of carbohydrates. Adv. Carbohydr. Chem. Biochem. 45, 19-72.

21. Carpani, G., Orsini, F., Sisti, M., and Verotta, L. (1989) Saponins from Albizzia anthelmintica. Phytochemistry 28, 863-866.

22. Gibson, B. W., Webb, J. W., Yamasaki, R., Fisher, S. J., Burlingame, A. L., Mandrell, R. E., Schneider, H., and Griffiss, J. H. (1989) Structure and heterogeneity of the oligosaccharides from the lipopolysaccharides of a pyocin-resistant Neisseria gonorrhoeae. Proc. Natl. Acad. Sci. USA 86, 17-21.

23. Hernandez, L. M., Ballou, L., Alvarado, E., Gillece-Castro, B. L., Burlingame, A. L., and Ballou, C. E. (1989) A new Saccharomyces cerevisiae mnn mutant N-linked oligosaccharide structure. J. Biol. Chem. 264, 11,849-11,856.

24. Poulter, L. and Burlingame, A. L. (1990) Desorption mass spectrometry of oligosaccharides coupled with hydrophobic chromophores, in Methods in Enzymology, vol. 193: Mass Spectrometry (McCloskey, J. A., ed.), Academic, San Diego, CA, pp. 661-689.

25. Richter, W. J., Muller, and Domon, B. (1990) Tandem mass spectrometry in structural characterization of oligosaccharide residues in glycoconjugates, in Methods in Enzymology, vol. 193: Mass Spectrometry (McCloskey, J. A., ed.), Academic, San Diego, CA, pp. 607-623.

26. Guevremont, R. and Wright, J. L. C. (1987) FAB and sequential mass spectrometry with a VG ZAB-EQ: hexose stereoisomers. Rapid Commun. Mass Spectrom. 1, 12-13.

27. Puzo, G., Fournie, J.-J., and Prome, J.-C. (1985) Identification of stereoisomers of some hexoses by mass spectrometry using fast atom bombardment and mass ion kinetic energy. Anal. Chem. 57, 892-894.

28. Gage, D. A., Rathke, E., Costello, C. E., and Jones, M. Z. (1992) Determination of sequence and linkage of tissue oligosaccharides in caprine β-mannosidosis by FAB-CAD-MS/MS. Glycoconjugate J. 9.

29. Orlando, R., Bush, C. A., and Fenselau, C. (1990) Structural analysis of oligosaccharides by tandem mass spectrometry: collisional activation of sodium adduct ions. Biomed. Environ. Mass Spectrom. 19, 747-754.

30. Laine, R. A., Pamidimukkala, K. M., French, A. D., Hall, R. W., Abbas, S. A., Jain, R. K., and Matta, K. L. (1988) Linkage position in oligosaccharides by fast atom bombardment ionization, collision-activated dissociation, tandem mass spectrometry and molecular modeling. J. Am. Chem. Soc. 110, 6931-6939.

31. Garozzo, D., Giuffrida, M., Impallomeni, G., Ballistreri, A., and Montaudo, G. (1990) Determination of linkage position and identification of the reducing end in linear oligosaccharides by negative ion fast atom bombardment mass spectrometry. Anal. Chem. 62, 279-286.

32. Angel, A.-S., Lindh, F., and Nilsson, B. (1987) Determination of binding positions in oligosaccharides and glycosphingolipids by fast-atom-bombardment mass spectrometry. Carbohydr. Res. 168, 15-31.

33. Breimer, M. L., Hansson, G. C., Karlsson, K.-A., Leffler, H., Pimlott, W., and Samuelsson, B. E. (1979) Selected ion monitoring of glycosphingolipid mixtures. Identification of several blood group type glycolipids in the small intestine of an individual rabbit. Biomed. Mass Spectrom. 6, 231-241.

34. Kanfer, J. N. and Hakomori, S. (1983) Handbook of Lipid Research, vol. 3: Sphingolipid Biochemistry. Plenum, New York.

35. Ladisch, S., Sweeley, C. C., Becker, H., and Gage, D. (1989) Aberrant fatty acyl α-hydroxylation in human neuroblastoma tumor gangliosides. J. Biol. Chem. 264, 12,097-12,105.

36. Costello, C. E. and Vath, J. E. (1990) Tandem mass spectrometry of glycolipids, in Methods in Enzymology, vol. 193: Mass Spectrometry (McCloskey, J. A., ed.), Academic, San Diego, CA, pp. 738-768.

37. Domon, B. and Costello, C. E. (1988) Structure elucidation of glycosphingolipids and gangliosides using high performance tandem mass spectrometry. Biochemistry 27, 1534-1543.

38. Domon, B. and Costello, C. E. (1988) A systematic nomenclature for carbohydrate fragmentations in FABMS/MS of glycoconjugates. Glycoconjugate J. 5, 397-409.

39. Ohashi, Y., Iwamori, M., Ogawa, T., and Nagai, Y. (1987) Analysis of long-chain bases in sphingolipids by positive ion fast atom bombardment or matrix-assisted secondary ion mass spectrometry. Biochemistry 26, 3990-3995.

40. Kuei, J., Her, G. R., and Reinhold, V. N. (1989) Supercritical fluid chromatography of glycosphingolipids. Anal. Biochem. 172, 228-234.

41. Domon, B., Vath, J. E., and Costello, C. E. (1990) Analysis of derivatized ceramides and cerebrosides by high performance tandem mass spectrometry. Anal. Biochem. 184, 151-164.


42. Carr, S. A., Roberts, G. D., Jurewicz, A., and Frederick, B. (1988) Structural fingerprinting of Asn-linked carbohydrates from specific attachment sites in glycoproteins by mass spectrometry: application to tissue plasminogen activator. Biochimie 70, 1445-1454.

43. Gillece-Castro, B. L., Fisher, S. J., Tarentino, A. L., Peterson, D. L., and Burlingame, A. L. (1987) Structure of the oligosaccharide portion of human hepatitis B surface antigen. Arch. Biochem. Biophys. 256, 194-201.

44. Vath, J. E., Jankowski, M. A., Martin, S. A., and Scoble, H. A. (1990) Characterization of recombinant glycoproteins by mass spectrometry. Abstr. 38th ASMS Conference on Mass Spectrometry and Allied Topics, Tucson, AZ, pp. 351,352.

45. Kayganich, K. and Murphy, R. C. (1991) Molecular species analysis of arachidonate containing glycerophosphocholines by tandem mass spectrometry. J. Am. Soc. Mass Spectrom. 2, 45-54.

46. Huang, Z.-H., Gage, D. A., and Sweeley, C. C. (1992) Characterization of diacylglycerylphosphocholine molecular species by FAB-CAD-MS/MS: a general method not sensitive to the nature of the fatty acyl groups. 3, 71-78.

47. Crain, P. F. (1990) Mass spectrometric techniques in nucleic acid research. Mass Spectrom. Rev. 9, 505-554.

48. Cerny, R. L., Gross, M. L., and Grotjahn, L. (1986) Fast atom bombardment combined with tandem mass spectrometry for the study of dinucleotides. Anal. Biochem. 156, 424-435.

49. Crow, F. W., Tomer, K. B., Gross, M. L., McCloskey, J. A., and Bergstrom, D. F. (1984) Fast atom bombardment combined with tandem mass spectrometry for the determination of nucleosides. Anal. Biochem. 139, 243-262.

50. Grotjahn, L., Blocker, H., and Frank, R. (1985) Mass spectroscopic sequence analysis of oligonucleotides. Biomed. Mass Spectrom. 12, 514-524.

51. Cushnir, J. R., Naylor, S., Lamb, J. H., and Farmer, P. B. (1990) Deuterium exchange studies in the identification of alkylated DNA bases found in urine, by tandem mass spectrometry. Rapid Commun. Mass Spectrom. 4, 426-431.

52. Dino, J. J., Jr., Guenat, C. R., Tomer, K. B., and Kaufman, D. G. (1987) Analyses of carcinogen-modified oligonucleotides by fast atom bombardment/tandem mass spectrometry. Rapid Commun. Mass Spectrom. 1, 69-71.

53. Claereboudt, J., Esmans, E. L., Van den Eeckhout, E. G., and Claeys, M. (1990) Constant neutral loss scanning for the characterization and sensitive analysis of deoxynucleosides and derivatives desorbed by fast atom bombardment. Abstr. 8th International Symposium on Mass Spectrometry in Life Sciences, Ghent, Belgium, p. 43.

54. Claereboudt, J., De Spiegeleer, Lippert, B., De Bruijn, E. A., and Claeys, M. (1989) Fast atom bombardment and tandem mass spectrometry for the structural characterization of cisplatin analogs and bis-nucleobase adducts with cisplatin. Spectros. Int. J. 7, 91-112.

55. Plaziak, A. S., Costello, C. E., Comess, K. M., Bancroft, D. P., and Lippard, S. J. (1990) High performance tandem mass spectrometry of platinated oligonucleotide fragments. Abstr. 38th ASMS Conference on Mass Spectrometry and Allied Topics, Tucson, AZ, pp. 792-793.


56. Comess, K. M., Costello, C. E., and Lippard, S. J. (1990) Identification and characterization of a novel linkage isomerization in the reaction of trans-diamminedichloroplatinum(II) with 5'-d(TCTACGCGTTCT). Biochemistry 29, 2102-2110.

57. Miller, J. M. (1990) Fast atom bombardment mass spectrometry (FAB MS) of organometallic, coordination and related compounds. Mass Spectrom. Rev. 9, 319-348.

58. Schulten, H. R. (1979) Biochemical, medical and environmental applications of field-ionization and field-desorption mass spectrometry. Int. J. Mass Spectrom. Ion Phys. 32, 97-283.

59. Unger, S. E., McCormick, T. J., Treher, E. N., and Nunn, A. D. (1987) Comparison of desorption ionization methods for the analysis of neutral seven-coordinate technetium radiopharmaceuticals. Anal. Chem. 59, 1145-1149.

60. Fackler, J. P., Jr., McNeal, C. J., Pignolet, L. H., and Winpenny, R. E. P. (1989) 252Cf-Plasma desorption mass spectrometry as a tool for studying very large clusters; evidence for vertex-sharing icosahedra as components of Au67(PPh3)14Cl8. J. Am. Chem. Soc. 41, 111-114.

61. Wasfi, S. H., Costello, C. E., Rheingold, A. L., and Haggerty, B. S. (1991) The preparation and characterization of two new isomorphous heteropoly oxofluorotungstate [CoW17O56F6NaH4]9- and [FeW17O56F6NaH4]8- anions. Inorg. Chem. 30, 1788-1792.

62. Bott, G., Ogden, S., and Leary, J. A. (1990) Collision-energy ramp. A modification to an RF-only quadrupole collision cell. Rapid Commun. Mass Spectrom. 4, 341-344.

63. Pearlstein, R. M., Lock, C. J. L., Faggiani, R., Costello, C. E., Zeng, C.-H., Jones, A. G., and Davison, A. (1988) Synthesis and characterization of technetium(V) complexes with amine, alcoholate and chloride ligands. Inorg. Chem. 27, 2409-2413.

64. Nicolini, M., Bandoli, G., and Mazzi, U. (1990) Technetium and Rhenium in Chemistry and Nuclear Medicine 3. Cortina International, Verona, and Raven, New York.

65. Hillenkamp, F., Karas, M., Beavis, R. C., and Chait, B. T. (1991) Matrix-assisted laser desorption/ionization mass spectrometry of biopolymers. Anal. Chem. 63, 1193A-1203A.

66. Fenn, J. B., Mann, M., Meng, C. K., and Wong, S. F. (1990) Electrospray ionization-principles and practice. Mass Spectrom. Rev. 9, 37-70.

CHAPTER 13

Mössbauer Spectroscopy

Dominic P. E. Dickson

1. Introduction

Mössbauer spectroscopy involves the emission and absorption of γ-rays by nuclei in solids. This technique is based on the Mössbauer effect, whereby certain nuclei, when in a solid, can emit and absorb γ-rays without losing energy to recoil. This leads to resonant absorption with an extremely high precision, which can be used to investigate the very small changes in the nuclear energy levels that result from the hyperfine interactions between the nucleus and its electronic environment. Thus, the Mössbauer nucleus in a solid acts as a probe of the chemical and physical state of the atom, molecule, and solid in which the nucleus is situated.

The Mössbauer effect only occurs in certain nuclei that emit and absorb low-energy γ-rays. Of these, 57Fe is by far the most suitable, and essentially all applications of Mössbauer spectroscopy to the study of metalloproteins have involved 57Fe Mössbauer spectroscopy and the investigation of iron-containing proteins. The samples usually form the absorber and contain the 57Fe in its stable nonradioactive form, which constitutes 2% of natural iron.

Mössbauer spectroscopy is specific to the type of nucleus being studied and is also a very local probe, giving detailed information about the chemistry in the immediate vicinity of the 57Fe nucleus. These attributes make the technique particularly suitable for investigating proteins in which the iron atoms have a central and crucial role (e.g., hemoglobin).

From: Methods in Molecular Biology, Vol. 17: Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques. Edited by: C. Jones, B. Mulloy, and A. H. Thomas. Copyright ©1993 Humana Press Inc., Totowa, NJ


Mössbauer spectroscopy using 57Fe can give information on the valence state of the iron atoms, on the nature and arrangement of the ligands, on the spin state of the iron atoms, and on the degree of magnetic order. Frequently, this information can be more readily interpreted by combining the data from Mössbauer spectroscopy with data from other physicochemical techniques, such as electron spin resonance (Chapter 14) or X-ray absorption (Chapter 16).

Each different chemical environment of the Mössbauer nucleus within a sample gives rise to a distinct contribution to the Mössbauer spectrum. Thus, Mössbauer spectroscopy can be used to identify the number of different forms of the Mössbauer atom within a sample, and to determine the nature of these different forms by comparison of the spectra with those of known materials or by direct interpretation of the spectral parameters.

2. The Mössbauer Effect and Mössbauer Spectroscopy

2.1. Principles of Mössbauer Spectroscopy

Certain nuclei emit low-energy γ-rays when a radioactive excited state decays to the stable ground state. When the nucleus is in a solid, the emission and absorption of these γ-rays can take place without any recoil or consequent energy loss. This is because any recoil energy would be taken up by vibrations in the solid, and since these vibrations only have certain energies (i.e., they are quantized), recoil does not always occur. If the emission and subsequent absorption of a particular γ-ray by a certain type of nucleus (e.g., 57Fe) are both recoilless, the linewidth or energy resolution of the resonant absorption is determined by the lifetime of the excited state and can be very narrow (~10^-8 eV in the case of 57Fe). Mössbauer spectroscopy utilizes the high resolution of the resonant absorption provided by the Mössbauer effect to investigate the details of the nuclear energy levels.

The usual arrangement is to have a radioactive source emitting the required Mössbauer γ-rays, which then pass through the sample under investigation, containing the Mössbauer isotope in its stable ground state. A detection system monitors whether the γ-rays from the source pass through the sample or are absorbed. The energy scan of the technique is provided by moving the source relative to the sample and thus Doppler shifting the energy of the emitted γ-rays. The resulting Mössbauer spectrum consists of a plot of counts against the velocity of the source.
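The link between source velocity, energy shift, and linewidth can be made concrete with a short calculation. The following is a minimal sketch, not part of the original chapter: it assumes a γ-ray energy of 14.4 keV and an excited-state half-life of about 98 ns for 57Fe (neither value is quoted in the text), and uses the first-order Doppler relation dE = E v/c together with the lifetime-limited width Γ = ħ/τ.

```python
import math

# Physical constants (rounded)
C_MM_PER_S = 2.99792458e11      # speed of light, mm/s
HBAR_EV_S = 6.582119569e-16     # reduced Planck constant, eV s

# Assumed 57Fe parameters (not given in the text)
E_GAMMA_EV = 14.4e3             # energy of the Mössbauer γ-ray, eV
HALF_LIFE_S = 98e-9             # half-life of the excited nuclear state, s


def doppler_shift_ev(velocity_mm_s, e_gamma_ev=E_GAMMA_EV):
    """First-order Doppler shift dE = E * v / c for a source moving at v."""
    return e_gamma_ev * velocity_mm_s / C_MM_PER_S


def natural_linewidth_ev(half_life_s=HALF_LIFE_S):
    """Lifetime-limited width Gamma = hbar / tau, with tau = t_half / ln 2."""
    tau = half_life_s / math.log(2)
    return HBAR_EV_S / tau


if __name__ == "__main__":
    per_mm_s = doppler_shift_ev(1.0)
    gamma = natural_linewidth_ev()
    print(f"1 mm/s corresponds to {per_mm_s:.2e} eV")
    print(f"natural linewidth: {gamma:.2e} eV = {gamma / per_mm_s:.3f} mm/s")
```

Under these assumptions, velocities of a few mm/s span energy shifts of the order of 10^-7 eV, and the lifetime-limited width is a few times 10^-9 eV, which is consistent with the order of magnitude quoted above once the source and absorber widths are combined.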


2.2. Types of Information from Mössbauer Spectroscopy

If the nuclear energy levels in both the source and absorber were unaffected by the hyperfine interactions between the nucleus and the surrounding electrons, the Mössbauer spectrum would consist of a single absorption line at zero velocity. The hyperfine interactions generally make the Mössbauer spectrum more complex than this. The hyperfine interactions fall into three main categories: chemical shift, quadrupole splitting, and magnetic splitting. Each of these yields a different type of information.

The chemical shift in the position of the Mössbauer absorption line(s), relative to zero velocity, results from the Coulomb interaction between the charge on the nucleus and the electronic charge density at the nucleus. This depends on the oxidation state, degree of covalency, nature of the ligands of the Mössbauer atom, and so on, hence the name. For example, in the case of iron-containing systems, the chemical shift enables Fe2+ and Fe3+ to be distinguished.

The quadrupole splitting arises from the interaction between the nonspherical charge distribution of the nucleus and any asymmetry in the atomic charge distribution around the nucleus. In the case of 57Fe, the spectrum then consists of two absorption lines, with their separation (the quadrupole splitting) giving information about the electronic structure of the Mössbauer atom itself, and the charge and position of the surrounding ligands.

The nuclear energy levels are normally associated with a nuclear magnetic moment, and in the presence of a magnetic field, the nuclear energy levels are split in a way that is considerably more complex than in the case of the purely electrostatic interactions described above. For 57Fe, the magnetic splitting gives a spectrum with six lines, with the separation between the outer lines being directly proportional to the magnetic field at the 57Fe nucleus.
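The three simple line patterns described above (singlet, doublet, and sextet) can be illustrated with a small model. This is a sketch under stated assumptions rather than a method from the chapter: the reference sextet positions and the 33 T field are approximate calibration values for metallic iron that are assumed here, all lines are given equal depth for simplicity, and the combined case of quadrupole and magnetic interactions acting together is not treated. The function names are hypothetical.

```python
# Assumed calibration (not from the text): approximate line positions, in mm/s,
# of the six-line spectrum of metallic iron, whose field is taken as 33 T.
REF_FIELD_T = 33.0
REF_SEXTET_MM_S = [-5.33, -3.08, -0.84, 0.84, 3.08, 5.33]


def lorentzian(v, center, fwhm):
    """Lorentzian line shape normalised to 1 at its centre."""
    half = fwhm / 2.0
    return half * half / ((v - center) ** 2 + half * half)


def line_positions(shift, quad_splitting=0.0, field_tesla=0.0):
    """Return absorption line positions (mm/s) for the three simple cases.

    - neither interaction: a singlet at the chemical shift
    - quadrupole splitting only: a doublet centred on the chemical shift
    - magnetic splitting only: a sextet whose spread scales with the field
    """
    if field_tesla:
        scale = field_tesla / REF_FIELD_T
        return [shift + scale * p for p in REF_SEXTET_MM_S]
    if quad_splitting:
        return [shift - quad_splitting / 2.0, shift + quad_splitting / 2.0]
    return [shift]


def transmission(v, positions, depth=0.05, fwhm=0.25):
    """Relative transmission at velocity v: baseline 1 minus absorption dips."""
    return 1.0 - sum(depth * lorentzian(v, p, fwhm) for p in positions)


if __name__ == "__main__":
    doublet = line_positions(0.3, quad_splitting=0.6)
    sextet = line_positions(0.4, field_tesla=50.0)
    print("doublet lines (mm/s):", doublet)
    print("sextet outer-line separation (mm/s):", sextet[-1] - sextet[0])
    print("transmission at line centre:", transmission(doublet[0], doublet))
```

Because the outer-line separation is proportional to the field, halving the field in this model halves the separation, which is the relationship the text uses to read the field at the nucleus from a six-line spectrum.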
This effective magnetic field depends on the magnetic moment of the Mössbauer atom, the degree of magnetic order in the sample as a whole, and the presence of any external applied field. For 57Fe, the magnetic splitting is most frequently observed in magnetically ordered materials, and the effective field is characteristic of the different oxidation and spin states of the iron atoms.

The magnetic splitting has an associated Mössbauer measurement time, and if the magnetic field sensed by the nucleus is changing on a comparable time scale as a result of thermal or other fluctuations, the magnetic splitting may be reduced or completely eliminated. This phenomenon is known as magnetic relaxation, and it is often observed in magnetically split Mössbauer spectra as a function of temperature.

Because the Mössbauer effect is associated with the movement (or lack of it) of the Mössbauer atom, the linewidth and absorption intensity of the Mössbauer spectrum can contain information concerning the dynamics of the Mössbauer atom within the molecule or solid in which it is located. In the case of iron metalloproteins, the iron atom and its motion may be crucial to the function of the protein.

Further details on the types of information that can be obtained using Mössbauer spectroscopy can be found in a number of standard texts (e.g., 1,2).

2.3. Limitations

The use of Mössbauer spectroscopy for studying metalloproteins is limited to those proteins containing a Mössbauer element, which virtually restricts the use to iron-containing metalloproteins. There are Mössbauer isotopes of tin, gold, zinc, nickel, potassium, and various rare earth elements, but the Mössbauer characteristics are generally far less advantageous than in the case of 57Fe, and much less information can be extracted from the Mössbauer spectra. Taken together with the importance of iron in biological molecules, this means that virtually all applications of Mössbauer spectroscopy to the study of metalloproteins involve 57Fe Mössbauer spectroscopy.

Another limitation imposed by Mössbauer spectroscopy results from the requirement that the sample must be in a solid form: crystalline, freeze-dried, or a frozen solution. In addition, the technique requires much more concentrated samples than many other techniques do. These requirements are discussed in more detail in Section 3.2.

3. Experimental Techniques

3.1. Spectrometers

A block diagram of a typical Mössbauer spectrometer is shown in Fig. 1. The source contains the Mössbauer isotope in a radioactive state, which decays to the ground state with the emission of the Mössbauer γ-ray. For 57Fe Mössbauer spectroscopy, the source used is 57Co, which decays to 57Fe with a half-life of 270 d. The source strength of typically 4 x 10^9 Bq (100 mCi) means that adequate radiation protection is required and that all investigators must be registered radiation workers. The half-life of the source and its cost (of the order of $10,000) combine to produce an important element in the running costs of the technique.

Fig. 1. Schematic representation of a typical Mössbauer spectrometer. A 57Co source mounted on a transducer moves backwards and forwards in a cycle governed by the waveform generator. The energy of the γ-ray emitted is altered by the Doppler effect, its transmission through the 57Fe-containing sample is detected, and an electronic signal is generated. Correlation of these signals with the source velocity then gives the spectrum of γ-ray counts against source velocity. The sample can be cooled or held in an externally applied magnetic field.

The source is mounted on a transducer that gives a range of source velocities and provides the energy scan of the technique. The transducer is driven by a special waveform that produces equal times spent in equal velocity ranges. The Mössbauer γ-rays are detected by standard nuclear instrumentation. The resulting counts are normally recorded in a data acquisition system that provides synchronization with the transducer motion and, hence, produces a Mössbauer spectrum of counts against velocity.
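The effect of the 270-d half-life of the 57Co source on running costs can be estimated with simple decay arithmetic. The sketch below is illustrative only: it assumes that the useful count rate is proportional to the source activity (ignoring background and detector dead time), and the function names are hypothetical. The conversion 1 mCi = 3.7 x 10^7 Bq follows from the definition of the curie.

```python
BQ_PER_MCI = 3.7e7          # 1 mCi = 3.7e7 Bq (definition of the curie)
HALF_LIFE_DAYS = 270.0      # 57Co half-life as quoted in the text


def mci_to_bq(mci):
    """Convert an activity in millicuries to becquerels."""
    return mci * BQ_PER_MCI


def activity_after(initial_activity, days, half_life_days=HALF_LIFE_DAYS):
    """Exponential decay A(t) = A0 * 2**(-t / t_half)."""
    return initial_activity * 2.0 ** (-days / half_life_days)


def counting_time_factor(days, half_life_days=HALF_LIFE_DAYS):
    """Factor by which counting time grows to collect the same total counts,
    assuming the count rate is proportional to source activity."""
    return 2.0 ** (days / half_life_days)


if __name__ == "__main__":
    a0 = mci_to_bq(100.0)
    for days in (0, 270, 540, 810):
        print(f"day {days:4d}: {activity_after(a0, days):.2e} Bq, "
              f"counting time x{counting_time_factor(days):.1f}")
```

On these assumptions, a source that starts at 100 mCi is down to a quarter of its strength after about a year and a half, so a spectrum that took a few hours with a fresh source takes roughly four times as long.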


To accumulate a spectrum with an adequate signal-to-noise ratio, many velocity scans are required, and typical counting times are of the order of hours. In the case of metalloproteins containing only a few Mössbauer atoms in a large molecule, it may be necessary to accumulate the spectrum for much longer than this.

It is often useful to be able to make measurements over a range of temperatures and in large applied magnetic fields. Most spectrometers incorporate arrangements for mounting the sample holder in a cryostat with liquid nitrogen or liquid helium, giving temperatures down to around 1 K or even lower. Large magnetic fields can be obtained by incorporating a superconducting solenoid in a liquid helium cryostat.

A Mössbauer spectroscopy setup with its associated cryogenics and vacuum systems, as well as the necessary data analysis hardware and software, is usually only found in laboratories specializing in this technique. Further information on the experimental aspects can be found elsewhere (1,3).

3.2. Sample Requirements

The first requirement is that the Mössbauer nuclei should be in a solid environment. For metalloproteins, the measurements can be made on frozen solutions, freeze-dried samples, or protein concentrated by centrifugation. In circumstances where single crystals are available, these allow extra structural information to be obtained from the analysis. The technique is nondestructive, and the protein sample can be recovered at the end of the experiment.

In many cases, the metalloprotein may be present within part of a living system, such as tissue or whole bacterial cells. This presents no problems to the technique, and measurements can be easily made on frozen or freeze-dried tissue samples or frozen bacterial cell paste. Clearly, the conditions under which the measurements are made must be taken into account when interpreting the data.

The requirement for a solid sample can sometimes be slightly relaxed. A very large molecule in a viscous liquid can produce a system that is effectively solid as far as the very short time scale of Mössbauer spectroscopy is concerned. Thus, it can be possible to obtain Mössbauer spectra from a complete system in vivo. A related aspect is that any deviation from complete rigidity of the Mössbauer atom within the sample produces changes in the Mössbauer spectra, which can be used to investigate protein dynamics.

Fig. 2. Cross-sectional view of a typical cylindrical sample holder made from nylon (dimensions marked on the drawing: 16 mm and 5 mm).

To obtain an adequate Mössbauer spectrum, the sample must contain a certain minimum number of the Mössbauer nuclei per unit area. This requirement imposes a major constraint on what can be achieved using this technique. The area of the sample is constrained by the geometry of the spectrometer, and its thickness is limited by the nonresonant absorption of γ-rays. A typical sample holder would be 0.5 cm thick and would have a cross-sectional area of around 2 cm2, giving a vol of 1 cm3 (see Fig. 2).

The concentration of the 57Fe Mössbauer nuclei depends on the concentration of iron in the sample and the isotopic abundance of 57Fe (2%) relative to natural iron (which is mostly 56Fe, with 57Fe as a minor isotope). In the case of metalloproteins, which may contain only a few iron atoms in a very large molecule, it may be necessary to enrich the samples with 57Fe. This can be done either by growing the relevant organism on a 57Fe-rich medium, or by removing the iron-containing moiety and then reconstituting the protein with 57Fe.

For 57Fe Mössbauer spectroscopy, the concentration required to produce a spectrum with a good signal-to-noise ratio, but without suffering unduly from saturation effects, is around 5 mg of natural iron/cm2 of sample. This corresponds to a total of about 200 µmol of natural iron (4 µmol of 57Fe) in a typical sample. The minimum feasible amount of iron in the sample is of the order of 2 µmol of natural iron (40 nmol of 57Fe). The generally rather high concentration of the material required for Mössbauer spectroscopy should of course be taken into account when interpreting the data.
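The quantities quoted above follow from straightforward arithmetic, which the sketch below reproduces. It is an illustration only: the molar mass of iron (55.845 g/mol) is a standard value assumed here rather than taken from the text, the 16 mm and 5 mm holder dimensions are read from Fig. 2 as a diameter and a thickness, and the function names are hypothetical.

```python
import math

FE_MOLAR_MASS = 55.845      # g/mol for natural iron (assumed standard value)
FE57_ABUNDANCE = 0.02       # fraction of 57Fe, as rounded in the text


def holder_geometry(diameter_cm=1.6, thickness_cm=0.5):
    """Return (area in cm2, volume in cm3) of a cylindrical sample holder."""
    area = math.pi * (diameter_cm / 2.0) ** 2
    return area, area * thickness_cm


def iron_in_sample(areal_density_mg_cm2=5.0, area_cm2=2.0,
                   fe57_fraction=FE57_ABUNDANCE):
    """Return (mg Fe, µmol Fe, µmol 57Fe) for a given areal density of iron."""
    mass_mg = areal_density_mg_cm2 * area_cm2
    umol_fe = mass_mg / FE_MOLAR_MASS * 1000.0   # mg / (g/mol) = mmol; x1000 = µmol
    return mass_mg, umol_fe, umol_fe * fe57_fraction


if __name__ == "__main__":
    area, volume = holder_geometry()
    mass_mg, umol_fe, umol_fe57 = iron_in_sample()
    print(f"holder: {area:.2f} cm2, {volume:.2f} cm3")
    print(f"iron: {mass_mg:.0f} mg = {umol_fe:.0f} µmol, of which "
          f"{umol_fe57:.1f} µmol is 57Fe")
    # µmol per cm3 (mL) is numerically equal to mmol/L
    print(f"iron concentration in a 1 cm3 sample: {umol_fe / 1.0:.0f} mM")
```

With these inputs the sample holds 10 mg of iron, or roughly 180 µmol (about 3.6 µmol of 57Fe), in line with the rounded figures of about 200 µmol and 4 µmol given above. Passing a larger `fe57_fraction` shows how isotopic enrichment reduces the total iron needed for the same number of 57Fe nuclei.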


57Fe Mössbauer spectroscopy is sensitive to all the forms of iron in the sample. Although this can be very useful analytically, it means that great care must be taken with regard to the purity of the sample with respect to adventitious iron-containing components. The presence of other heavy elements within the sample should also be avoided if possible, since they give nonresonant absorption and can drastically reduce the intensity of the spectrum.

3.3. Data Analysis

Computer fitting of the Mössbauer spectra is the normal practice, since the hyperfine interactions have a good theoretical model. In simple cases, this analysis involves finding the parameters of a series of lines, in singlet, doublet, and sextet combinations, which give the best fit to the experimental spectrum. In other situations, a more complex model of the hyperfine interactions is required. In all cases, the fitted parameters are interpreted in terms of information on oxidation and spin state, types of ligands, magnetic structure, the forms of iron atom present, and so forth, as discussed above.

4. Applications

Mossbauer spectroscopy has been applied to the study of all the main groups of iron-containing metalloproteins: heme proteins, ironsulfur proteins, iron-transport proteins, and iron-storage proteins. In order to give some illustrative examples of the use of the technique, we will briefly consider Mossbauer studies of two of these classes. A complete overview of biological applications of this technique may be found elsewhere (4). 4.1. Iron-Sulfur Proteins The iron-sulfur proteins have active centers containing one, two, three, or four iron atoms. In all cases, the iron atoms are in an approximately tetrahedral arrangement, surrounded by four sulfur atoms. The similarity of the iron environment in the various members of the group means that the Mijssbauer parameters can be calibrated by measurements on the simpler members and can then be used to investigate more complex proteins. Mossbauer spectroscopy has been particularly helpful in assigning the valence state of the iron atoms and the degree of electron delocalization between them. The Mossbauer chemical shift is particularly

Mlissbauer Spectroscopy

323

Fig. 3 Mossbauer spectra taken at 77 K from a two-iron ferredoxm m the OXIdized and reduced state. The oxidized state has both iron atoms in the Fe3+ state (shown by the doublet with a narrow splitting), whereas the reduced state contains one Fe*+ atom (corresponding to the doublet with a much larger splitting)

sensitive to the valence state, and from measurements on a wide range of proteins, a chemical shift vs valence scale has been developed (5). Rubredoxins have one atom in the active center, which is Fe3+in the oxidized form of the protein and Fe2+ in the reduced form. In the proteins with two iron atoms in the active center, the two iron atoms are both Fe3+in the oxidized form, whereas in the reduced form, one is Fe3+and the other is Fe2+.This information is clearly evident in the Miissbauer spectra shown in Fig. 3, where the narrowly split doublet corresponds to Fe3+and the widely split doublet to Fe2+.Proteins with three iron atoms in the active center undergo a one-electron reduction with the extra electron being shared between two iron atoms, which then have a valence intermediate between Fe3+and Fe2+.In the fouriron centers,an evengreaterdegreeof electron delocalization is observed. The magnetic moment of the iron atoms leads to a coupling mechanism when there are more than one iron atoms in a center. Mijssbauer spectroscopy, particularly with an external applied magnetic field, has


[Figure 4: spectra plotted against velocity (mm/s), from -10 to +10 mm/s.]

Fig. 4. Variable-temperature Mössbauer spectra of horse-spleen hemosiderin. Hemosiderin has a central core containing up to 4000 iron atoms, which leads to magnetic ordering at low temperatures. The variation in the spectrum with temperature can be analyzed in terms of the size distribution and composition of the iron-containing cores.

given considerable information on the nature of this coupling, which has implications for the electronic structure.

4.2. Iron-Storage Proteins

The most widespread member of this group of proteins is ferritin, which consists of a protein shell with a central cavity of approx 8 nm diameter, containing a small particle of an iron oxyhydroxide. The central cavity can contain up to 4000 iron atoms. Because of the large


number and proximity of the iron atoms, there is magnetic ordering at low temperatures. The observation of this magnetic ordering in the Mössbauer spectra is temperature dependent in a way that depends on the distribution of particle sizes within the ferritin sample. The behavior of the Mössbauer spectra as a function of temperature can be interpreted to give information on this distribution and on certain magnetic properties of the iron-containing material in the protein cavity. In Fig. 4, this behavior can be seen in a material called hemosiderin. Hemosiderin is very similar to ferritin and is found in conditions of iron overload. In addition to ferritins of broadly similar types in virtually all higher organisms, there are ferritins in bacteria that are significantly different. The magnetic effects observed in Mössbauer spectra have been instrumental in showing up these differences, as well as differences in the forms of hemosiderin found in different pathological iron overload syndromes.

References

1. Greenwood, N. N. and Gibb, T. C. (1971) Mössbauer Spectroscopy. Chapman and Hall, London.
2. Dickson, D. P. E. and Berry, F. J. (1986) Mössbauer Spectroscopy. Cambridge University Press, Cambridge.
3. Dickson, D. P. E. and Johnson, C. E. (1984) Mössbauer spectroscopy, in Structural and Resonance Techniques in Biological Research (Rousseau, D. L., ed.), Academic, New York, pp. 245-293.
4. Dickson, D. P. E. (1984) Application to biological systems, in Mössbauer Spectroscopy Applied to Inorganic Chemistry, vol. 2 (Long, G. J., ed.), Plenum, New York, pp. 339-389.
5. Cammack, R., Dickson, D. P. E., and Johnson, C. E. (1977) Evidence from Mössbauer spectroscopy and magnetic resonance on the active centers of the iron-sulfur proteins, in Iron-Sulfur Proteins, vol. 3 (Lovenberg, W., ed.), Academic, New York, pp. 283-330.

CHAPTER 14

Electron Paramagnetic Resonance Spectroscopy of Metalloproteins

Richard Cammack

1. Introduction

Electron paramagnetic resonance (EPR), or electron spin resonance (ESR) spectroscopy, is a technique for studying paramagnetic materials, the molecules of which contain unpaired electrons. These comprise organic free radicals and transition metal ions. The biological significance of these rare species is that they often occur at the active sites of enzymes and in the electron-transfer systems of bioenergetics. Manganese ions are of special significance in molecular biology, because they can often bind to nucleotides and nucleic acids in a similar way to magnesium, and serve as probes of their environment.

There are at least three main areas of interest for biological EPR. The first is to examine the naturally occurring transition metal ions and radicals in a sample, and learn about their function and environment. The second is to use artificially introduced radicals or spin-labels as probes of the environment and dynamics of a particular biological species, such as a protein or lipid. The third is to use spin-traps to identify the short-lived radicals produced during biological processes, such as reactions with oxygen. Each of these is a large area, but there are some areas of technique that overlap. For example, they all use the same type of spectrometer, and there are similar problems of sensitivity and contamination.

From: Methods in Molecular Biology, Vol. 17: Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques. Edited by: C. Jones, B. Mulloy, and A. H. Thomas. Copyright © 1993 Humana Press Inc., Totowa, NJ


This chapter covers the first of these objectives, with particular reference to transition metal ions. It is intended to provide practical guidance to biochemists and molecular biologists who need to make EPR measurements of their material. Mention will be made of the problems of obtaining samples in the right oxidation state and in sufficient concentration to provide recognizable EPR signals, and of resolving the complex signals that arise in biological material. More details about the operation of the EPR spectrometer and the theory of transition-metal EPR are provided elsewhere (1-4).

Because the magnetic moment of the electron is hundreds of times larger than those of nuclei, such as 1H, EPR spectroscopy is in principle much more sensitive than NMR. EPR can be used to measure transition ions in relatively low concentrations. It can be applied, for example, to a metalloenzyme, to cell extracts containing that enzyme, or even, if it is present in sufficient amounts, to whole cells.

In order to measure transition metal ions by EPR, it is often necessary to use low temperatures. This is a consequence of their extremely rapid electron-spin relaxation rates. As a result, the sample must be in the frozen state, which prevents most measurements of mobility and kinetics. Only free radicals (including spin-labels and spin-traps) and a few transition ions, notably manganese, are readily detected at room temperature. However, EPR is a nondestructive method, so the sample can be reused if it survives freezing.

Although it can be used quantitatively, EPR is not a general method to detect the total amount of a metal ion in solution. Standard methods, such as atomic absorption spectrophotometry, are adequate for this. The value of EPR lies in its selectivity. It can observe individual chemical forms of the metal or radical, even in quite complex mixtures, and can provide information such as their valence state and ligands. As in NMR, spin-spin interactions between the electron and other electrons or nuclei can be informative about structure. By studying changes that occur in the EPR spectrum of one of these species as a result of biological reactions, we can learn about its function.

1.1. Systems That Can Be Studied by EPR

Paramagnetic transition metal ions of biological interest that can readily be detected by EPR spectroscopy are iron, copper, manganese, molybdenum, and, more rarely, vanadium, cobalt, and nickel. The

Table 1
Properties of Transition Ions Relevant to Biological Systems

Metal ion     Paramagnetic        Other states(a)      Isotope(b)   Nuclear    % Natural abundance
              oxidation states                                      spin(b)    of isotope(b)
Vanadium      V(IV)*              V(V)*, V(III)        51V          7/2        99.76
Manganese     Mn(II)*, Mn(IV)     Mn(III)              55Mn         5/2        100
Iron          Fe(III)*            Fe(II)*              57Fe         1/2        2.19
Cobalt        Co(II)*             Co(I)*, Co(III)*     59Co         7/2        100
Nickel        Ni(I), Ni(III)*     Ni(II)*              61Ni         3/2        1.134
Copper        Cu(II)              Cu(I)                63Cu         3/2        69.09
                                                       65Cu         3/2        30.91
Molybdenum    Mo(V)*              Mo(IV)*, Mo(VI)*     95Mo         5/2        15.72
                                                       97Mo         5/2        9.46

*Denotes commonly occurring states in biological material.
(a) Other oxidation states exist for these elements, which will generally not be detectable by EPR.
(b) Other isotopes also exist for these elements. Compounds of these other isotopes will be EPR-detectable, but without hyperfine splittings.

measurable oxidation states of these elements, and their isotopes that have nuclear spins, are summarized in Table 1. The ions are sometimes combined in clusters, such as those in iron-sulfur proteins, which are also paramagnetic. Another source of paramagnetism in electron-transfer systems arises from free radicals of organic molecules, such as flavins and quinones. Radicals may also be induced by reactions of reduced organic matter with oxygen and by irradiation.

1.2. Principle

In most compounds, the electrons are paired, and the compounds are therefore diamagnetic. Only those molecules that have unpaired electrons are detected. The electron has an associated spin, S = 1/2, which gives rise to a magnetic moment. EPR is therefore a magnetic resonance phenomenon, like nuclear magnetic resonance. It relies on the splitting of the energy levels of the electron spin states, ms = ±1/2, by an applied magnetic field. Resonant absorption results from the excitation of electrons from the lower to the higher energy level, by interaction with microwave radiation.


A conventional continuous-wave (cw) EPR spectrum is obtained by measuring the microwave absorption at fixed frequency, while continuously scanning the applied magnetic field, B0. The position of resonance of a paramagnet is defined by the g-factor, derived from the equation:

$$h\nu = g\mu_B B_0 \qquad (1)$$

or

$$g = h\nu/(\mu_B B_0) = 71.448\,(\nu/B_0) \qquad (2)$$

if ν is in GHz and B0 in mT. Here h is Planck's constant, ν is the microwave frequency, and μB is the Bohr magneton. It can be seen from this equation that the higher the g-factor, the lower the resonant magnetic field, for a given irradiation frequency. A typical EPR measurement would involve absorption of microwaves with a frequency of 9 GHz in a magnetic field of about 320 mT (3200 G). This corresponds to a g-factor close to the free-electron value of 2.0023. Transition metal ions may have g-factors that are substantially different from 2 because of interactions between the electron spin and the orbitals. The g-factor is treated as a characteristic parameter of the particular spin system. It may be used in a diagnostic way, analogous to the chemical shift in an NMR spectrum.

The EPR spectrum is conventionally presented as the first derivative of the microwave absorption. This derivative spectrum is produced by the detection system, which employs magnetic field modulation to improve the signal:noise ratio. Therefore, the typical EPR spectrum has features both above and below the baseline. A simple spectrum, without hyperfine splittings, only crosses the baseline once.

1.3. Detectable Valence States

In general, EPR is readily detected in systems with a single unpaired electron, such as Cu(II) (electron spin S = 1/2), or an odd number of unpaired electrons, such as Mn(II) (S = 5/2). Oxidation or reduction of a paramagnetic ion will change it either into a form that has zero net spin, such as Cu(I) or low-spin Fe(II), or into one that has an integer spin, such as high-spin Fe(II) (S = 2). As a result, EPR is sensitive to the oxidation state of the transition ion.

1.4. Factors Affecting the Line Shape of the EPR Spectrum

The form and characteristics of the EPR spectrum are influenced by a number of effects. Often quite subtle differences in the coordination state of a transition ion can drastically affect the shape of the EPR spectrum. Space does not permit a description of the quantum-mechanical basis of these effects, which are described in standard texts (3). They will only be mentioned in terms of the practical information that can be derived from them in a biochemical system. These include:

• Spin-orbit coupling, which affects the g-factor.
• Hyperfine coupling, A, with the metal nucleus, and superhyperfine couplings to ligand nuclei, which give rise to broadening or splitting of the spectrum into (2I + 1) lines, where I is the spin quantum number of the nucleus.
• Zero-field splittings in multielectron ions, which can cause extreme shifts in the apparent g-factor.
• Electron spin-lattice relaxation rate, T1, which affects the temperature dependence of the EPR signal.
• Electron spin-spin relaxation rate, T2, which affects the line width of the spectral lines.
• Interaction with distant electron spins. The electrons interact with each other by exchange coupling and dipolar coupling. Strong exchange coupling between two adjacent spins will completely alter the form of the spectrum or cause it to disappear completely. Weaker interactions can be observed between spins at distances up to 2 nm, as splittings or broadening of the constituent spectra, or as increased relaxation.
• Anisotropy is an important concept, which means that all of these effects on the EPR resonance vary with the direction in which the B0 field is applied to the molecule. Since the molecules in a typical sample are oriented randomly, the resultant spectrum is the average of the spectra of all orientations.
• Strain, which results from distortions of the local geometry of the metal centers. The statistical distribution of strain, together with the way in which it influences the g-factor, A-values, and relaxation, causes further broadening.

All of these factors affect the form of the EPR spectrum and the conditions in which it is detected. Examples are shown in Fig. 1. The spectrum of Mo(V) (Fig. 1c) is a simple S = 1/2 system, with a typical g-factor slightly <2. The spectrum of Ni(III) (Fig. 1d) has a g-factor >2, with pronounced g-factor anisotropy. Manganese (Fig. 1e) has a g-factor close to 2.0, but the spectrum is split into six lines by hyperfine interaction with the 55Mn nucleus. The spectrum of Cu(II) (Fig. 1b) has a typical axial line shape, with hyperfine splitting of g∥ into four lines.
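The resonance condition of Eq. 2 and the (2I + 1) hyperfine rule lend themselves to a quick numerical check. The following is a minimal Python sketch, not part of the original protocol; the function names are hypothetical, and the constant 71.448 is simply the factor quoted in Eq. 2 for frequency in GHz and field in mT.

```python
# Illustrative sketch only: the function names below are hypothetical and
# are not taken from the chapter or from any spectrometer software.

H_OVER_MU_B = 71.448  # h / mu_B, valid for frequency in GHz and field in mT (Eq. 2)


def g_factor(freq_ghz, field_mt):
    """Return the g-factor for a resonance at field_mt (mT) and freq_ghz (GHz)."""
    return H_OVER_MU_B * freq_ghz / field_mt


def resonant_field_mt(freq_ghz, g):
    """Return the resonant field (mT) for a given g-factor; Eq. 2 rearranged."""
    return H_OVER_MU_B * freq_ghz / g


def hyperfine_line_count(nuclear_spin):
    """Return the number of hyperfine lines, 2I + 1, for nuclear spin I."""
    return int(round(2 * nuclear_spin + 1))


if __name__ == "__main__":
    # The typical measurement quoted in the text: 9 GHz and about 320 mT.
    print("g at 9 GHz, 320 mT:", round(g_factor(9.0, 320.0), 4))
    # Where features with the g-factors mentioned for Fig. 1 fall at 9 GHz.
    for g in (6.0, 4.3, 2.0023):
        print("g =", g, "-> field (mT):", round(resonant_field_mt(9.0, g), 1))
    # 55Mn (I = 5/2) and 63Cu (I = 3/2), as listed in Table 1.
    print("55Mn lines:", hyperfine_line_count(5 / 2))
    print("63Cu lines:", hyperfine_line_count(3 / 2))
```

Running the sketch shows why a wide field scan is needed for iron-containing samples: at a fixed frequency, a feature at g = 6 resonates at roughly one-third of the field of a feature near g = 2.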

[Figure 1: EPR spectra (a)-(i). Upper axis: g-factor (10 to 1.5); lower axis: magnetic field, mT (100 to 400).]

Fig. 1. Examples of EPR spectra, plotted on a wide field scan to emphasize the range of g-factors of the spectra of transition metals in proteins. The samples and temperatures of measurement were (a) an organic radical, at g = 2.003 (room temperature); (b) Cu(II) in superoxide dismutase (77 K); (c) the desulfo-inhibited molybdenum(V) signal from milk xanthine oxidase (150 K) (the field scan is too wide to reveal the rhombic line shape of this spectrum); (d) nickel(III) in hydrogenase from Desulfovibrio gigas (80 K); (e) Mn(II) ions in solution (room temperature); (f) reduced [2Fe-2S] cluster in spinach ferredoxin (24 K); (g) low-spin Fe(III) in metmyoglobin azide (30 K); (h) high-spin Fe(III) in D. gigas rubredoxin (12 K); (i) high-spin Fe(III) in methaemoglobin (10 K).

The spectra of iron are more complex, because this multielectron ion can take up two different spin states, depending on the ligands around it and their geometry. High-spin Fe(III) has all five of its 3d electrons in separate orbitals (S = 5/2). Low-spin Fe(III) has four of its 3d electrons paired, so that the net spin of the ion is S = 1/2. The


resulting EPR spectra are quite distinct. The g-factors are spread over a very wide range because of zero-field splittings, which are usually highly anisotropic. In Fig. 1(i), the spectrum of a high-spin ferriheme is axial, with minimum and maximum g-factors of g∥ = 2 and g⊥ = 6. These values correspond to the cases where the applied magnetic field lies parallel and perpendicular to the normal to the heme plane, respectively. The signal at g = 4.3 (Fig. 1h) is typical of highly distorted Fe(III) centers and is commonly seen in spectra of biological systems. The spectrum of low-spin Fe(III) (Fig. 1g), as observed in many oxidized cytochromes, has g-factors <4.

When several metal ions form a cluster, their electron spins are coupled together, forming a new spin system. An example is the [2Fe-2S] iron-sulfur cluster in a ferredoxin (Fig. 1f). In the reduced protein, ferric ion (S = 5/2) and ferrous ion (S = 2) couple to give a net spin of S = 1/2, which gives rise to the EPR spectrum. Spectra of such clusters often have irregular properties, such as unusual g-factors and a strong dependence on the temperature of measurement. It can be seen that the form of the spectrum of any particular metal ion depends on the type of metal ion, its valence state, and its environment.

2. Materials and Methods

EPR spectroscopy can readily be applied to purified proteins and, in favorable cases, even to whole cells.

2.1. Spectrometers

At present, two main manufacturers, Bruker Analytische Messtechnik GmbH (Germany) and Jeol Instrument Company (Japan), produce commercial cw EPR spectrometers. There are still a large number of instruments made by Varian (USA) in operation. All of them use the same general principle.

2.1.1. Computers

EPR spectrometers are now provided with a computer system to assist in the setting up and running of the instrument. Specific applications include:

• Signal averaging: to enhance the signal:noise ratio of spectra of dilute samples.
• Spectral subtractions: to resolve complex spectra containing multiple overlapping signals.
• Storage of spectra on disk: for comparison and replotting.
• Simulation of spectra: The parameters of a spectrum may be derived by computer simulation, in which the shape of the EPR spectrum is calculated from assumed values for g, A, and a general line shape function. The values of these parameters are varied, sometimes iteratively, to find the best fit to the experimental spectrum. This is the most exact way of estimating the values of these parameters and of resolving spectra that contain a number of components.
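The simulation approach in the last item of the list above can be illustrated with a deliberately simplified Python sketch. It assumes an isotropic g-factor, a first-order hyperfine pattern of (2I + 1) equally spaced lines, and a Gaussian first-derivative line shape; real simulation programs also treat anisotropy and strain, which are omitted here. All function names and numerical values are hypothetical, chosen only for illustration.

```python
import math

# Simplified, hypothetical sketch: isotropic g, first-order hyperfine
# splitting, Gaussian derivative line shape. Not a full powder simulation.

H_OVER_MU_B = 71.448  # h / mu_B for frequency in GHz and field in mT


def gaussian_derivative(b, b0, width):
    """First derivative of a unit-area Gaussian absorption line centred at b0."""
    x = (b - b0) / width
    return -x / (width ** 2 * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * x * x)


def simulate_spectrum(fields_mt, freq_ghz, g, a_mt, nuclear_spin, width_mt):
    """Return the derivative spectrum as a list, one value per field point."""
    center = H_OVER_MU_B * freq_ghz / g
    n_lines = int(round(2 * nuclear_spin + 1))
    # Nuclear spin projections m_I run from -I to +I in integer steps.
    m_values = [-nuclear_spin + k for k in range(n_lines)]
    spectrum = []
    for b in fields_mt:
        total = sum(gaussian_derivative(b, center + a_mt * m, width_mt)
                    for m in m_values)
        spectrum.append(total / n_lines)
    return spectrum


def sum_squared_residual(observed, simulated):
    """Goodness-of-fit measure: sum of squared differences, point by point."""
    return sum((o - s) ** 2 for o, s in zip(observed, simulated))


def best_hyperfine(fields_mt, observed, freq_ghz, g, nuclear_spin, width_mt, trials):
    """Return the trial hyperfine splitting (mT) with the smallest residual."""
    return min(
        trials,
        key=lambda a: sum_squared_residual(
            observed,
            simulate_spectrum(fields_mt, freq_ghz, g, a, nuclear_spin, width_mt)),
    )


if __name__ == "__main__":
    fields = [280.0 + 0.1 * k for k in range(801)]  # 280 to 360 mT
    # Stand-in for an experimental spectrum: a six-line pattern (I = 5/2)
    # generated with arbitrary illustrative parameters.
    observed = simulate_spectrum(fields, 9.0, 2.0, 9.0, 5 / 2, 1.0)
    trials = [8.0, 8.5, 9.0, 9.5, 10.0]
    print("best-fit A (mT):", best_hyperfine(fields, observed, 9.0, 2.0, 5 / 2, 1.0, trials))
```

A grid search over one parameter is used here only to keep the sketch short; in practice several parameters are varied together, often with an iterative least-squares routine.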

2.1.2. Temperature Control

For measurements at low temperatures, the spectrometer is fitted with a cryostat. Although immersion cryostats that contain liquid nitrogen (77 K) or liquid helium (4.2 K) are simple to operate, they provide only a fixed temperature. Flow cryostats are more flexible and are now most widely used. Liquid nitrogen is satisfactory for most studies of copper, molybdenum, or manganese ions. Liquid helium is necessary to achieve temperatures below 77 K, which are required for iron-containing proteins.

2.2. Cell Holders

Sample holders are normally made of pure quartz; borosilicate glass contains metallic contaminants, which give spurious EPR signals. Some plastics may also be used. It is best to check these for EPR signals before use.

2.2.1. Cells for Aqueous Samples

Liquid water and other polar solvents are "lossy." They attenuate the microwave power so much that the instrument cannot be tuned. This can be minimized by the appropriate shape and position of the sample holder in the cavity. In the standard rectangular cavity, the optimum configuration is a flat cell (thickness

so for quantitative comparisons, the internal dimensions of the tubes should be identical. The sensitive region of the cavity is only 10-20 mm, so if the sample is correctly positioned, the minimum volume required for optimum signal is about 100 µL. Tubes can be filled and additions can be made by using a syringe with a long needle. Additions of substrates and so on to the sample are made directly into the EPR tube. The samples are thoroughly mixed by stirring up and down with a stainless-steel wire formed into a loop at the end. Any bubbles formed can be removed by vigorously shaking the tube, like a clinical thermometer.

Samples are frozen in liquid nitrogen. To avoid breakage, hold the tube at the surface and freeze carefully and slowly, starting from the bottom. Faster freezing may be achieved by plunging the tubes into a small container of 2-methylbutane or methanol, precooled in liquid nitrogen. Store the sample tubes in a container of liquid nitrogen, but above the surface, otherwise oxygen will condense inside. To thaw the sample tubes, hold them in the air for a few seconds to allow any liquid gases to evaporate and then plunge them into warm water.

WARNING: Aqueous samples in quartz tubes that have been frozen in liquid nitrogen can easily break when they are warmed up, sometimes explosively. Safety spectacles should be worn while handling them.

Some commonly used buffers, particularly phosphate and Tris, can drastically change their pH on freezing. It is better to use zwitterionic buffers, which do not suffer from this effect.

2.3. Sample Preparation

The range of concentrations needed to obtain a perceptible EPR signal is extremely wide, and it is hard to give a precise estimate of the amount needed. It depends on the molecular weight, the instrumental conditions, and, critically, the line width of the spectrum. For a narrow-line spectrum, such as that of a free radical or manganese ions in solution, the minimum detectable concentration may be as low as 0.1 µM, whereas a species with a very broad spectrum might need 50 µM. Concentrations needed to obtain a well-defined spectrum are about ten times greater.

In practice, another limitation to the amount that can be detected is contamination, either in the form of background signals from the cavity and cryostat, or by paramagnetic impurities in the sample. The former can be minimized by cleaning or subtracted out with the computer, whereas the latter can be minimized by appropriate purification.


Mn2+ ions are present in most cell extracts and give a strong six-line spectrum around g = 2, which can be substantially broadened out by the addition of 2 mM EDTA. Occasionally, a speck of dust or other contamination in the sample holder can cause spurious signals in the spectrum. For this reason, and because the conditions for preparation may prove not to be optimum, it is prudent to prepare duplicate samples if possible. On the basis of the preliminary measurements, it may be decided to alter the conditions of preparation. Sometimes it may be possible to add further reagents to the sample, but in other cases, the treatment is irreversible. It is always safest to keep some of the material in reserve for further preparations.

2.3.1. Concentration Methods

Although EPR is more sensitive than NMR, it still requires much higher concentrations of protein than are used in conventional biochemical procedures, such as enzyme assays. For good spectra, samples should be as concentrated as possible. Fortunately, only small volumes are needed. For membrane preparations, such as mitochondria, samples should be pelleted in a centrifuge and resuspended in the minimum volume. For solutions, there are various concentrating methods, perhaps the most convenient for proteins being small-scale centrifugal concentrators that use an ultrafiltration membrane. A cheap alternative, if the solution is not too salty, is to dry the sample down under a stream of nitrogen.

2.3.2. Oxidation and Reduction Treatments

Often samples must be oxidized or reduced to bring the transition metal ion into the paramagnetic oxidation state. Potassium hexacyanoferrate(III) (ferricyanide) is a commonly used general-purpose oxidizing agent, whereas ascorbate and sodium dithionite are used as reducing agents. Dithionite reacts spontaneously with oxygen, forming acid bisulfite and oxygen radicals, so it should be added as a freshly made solution, prepared in a vessel flushed with nitrogen or argon. In an open EPR tube, the contents can be protected from reoxidation by "blanketing" the surface with argon through a syringe needle. The dithionite is injected directly into the solution with a long-needle microliter syringe. Normally an excess of reductant is added (e.g., 2 mM), but more precise reduction may be made either by stoichiometric addition of reducing agent or by mediator titration (5).


These additions of substrates are relatively slow, and the sample will generally have reached equilibrium before the measurement is made. For kinetic studies, it is necessary to use rapid-freeze techniques (6).

2.3.3. Isotope Substitutions

In order to identify the metal in a paramagnetic center, or the nuclei in its vicinity, it is often desirable to replace the metal or other atoms by a stable isotope with a different nuclear spin. This will cause a predictable change in the spectrum owing to hyperfine or superhyperfine interactions. For example, to determine if a signal is owing to iron, 57Fe (I = 1/2) might be substituted for the naturally abundant 56Fe (I = 0). Unlike radioactive tracer experiments, where only small amounts of the isotope are introduced, the stable isotope must account for most of the element in the product. This can be done by removal and replacement of the metal, using the techniques of protein chemistry (7). Alternatively, the organism may be grown on a medium containing the appropriate isotope. This is not usually feasible in a mammal for 57Fe, though it has been done in a cow, for 95Mo (8). It is easier to use a microorganism; an overproducing strain is best, provided it inserts the metal correctly.

2.3.4. Preparation of Oriented Samples

Since EPR spectra of most paramagnetic ions are anisotropic, it follows that if the molecules in a sample can be aligned, the spectrum will change as the sample is rotated in the EPR cavity. Such alignment is found in crystals, and if this can be achieved, it is possible to determine the direction of the principal axes of g and A relative to the crystal axes. This is rarely practicable in biological systems. However, it is found that proteins, such as cytochromes, in biological membranes, such as those of mitochondria, are oriented relative to the membrane plane. These membranes can be stacked into an EPR sample by centrifugation onto a Mylar sheet and careful drying down under controlled humidity. In this way, the direction of the heme planes relative to the membrane plane has been identified (9).

3. Experimental

3.1. Measurement of the EPR Spectrum

As may be seen from the number of parameters that affect the form of the EPR spectrum, it is necessary to manipulate the EPR spectrometer conditions to optimize the spectrum of a particular component.


Often the choice of correct conditions can make the difference between an almost imperceptibly weak signal and a strong one. The most important parameters that affect the EPR spectrum are as follows.

3.1.1. Temperature

The optimum temperature for any particular measurement is determined by trial and error. If the temperature is too high, the spectrum disappears because of relaxation broadening. Generally, the sensitivity improves as the temperature is decreased, until microwave power saturation sets in.

3.1.2. Microwave Power

The signal:noise ratio of the spectrum improves with increased power, at least up to 20 mW. At low temperatures and high power, the spectrum may be diminished and distorted by microwave power saturation.

3.1.3. Magnetic Field Range

This is usually defined by center field and sweep width. These must be wide enough to cover the spectrum of interest. With unknown samples, it is best to start with a wide sweep (e.g., 50-450 mT), and then narrow down on features of interest.

3.1.4. Modulation Amplitude

Effectively, this is a bandwidth defining the narrowest line that can be obtained. To avoid distortion, this should be no more than 1/3 of the width of the narrowest feature to be observed. Typically, it is set to 1 mT (10 G) for general broad scans.

3.1.5. Scan Time and Time Constant

These are adjusted to give reasonable signal:noise, a longer time constant giving less noise, but requiring a slower scan. For very noisy signals, it is preferable to take multiple scans and signal average.

4. Information from EPR Spectra

The spectrum of a typical cell extract or impure preparation will contain signals from free radicals at g = 2, ferric iron around g = 4.3, and manganese, centered around g = 2. These may be species of interest, but often derive from contaminants. The species of interest may be identified, for example, by oxidation or reduction with specific substrates. EPR spectroscopy is used to identify and measure specific forms of metal ions and radicals in biological materials.

4.1. Identification

In practice, regardless of the complications of coordination geometry, the characteristic line shape and temperature dependence of the EPR spectrum may often be used in a diagnostic way to identify a metal center in a protein. Examples are given in Fig. 1. If the nucleus of the transition ion has a spin, such as 55Mn (nuclear spin I = 5/2) or 63Cu (I = 3/2), the hyperfine splitting can aid the identification of EPR signals, particularly in complex spectra with several components.

4.2. Resolution of Spectra of Impure Biological Extracts

It is possible to vary the conditions of sample preparation and instrumental parameters so as to emphasize the spectra of the individual components. This can be done by adjusting:

• The redox potential;
• The addition of specific substrates;
• The temperature of measurement; and
• Other instrument parameters, e.g., modulation amplitude.

4.3. Quantitation

A distinguishing feature of EPR spectroscopy is that the absolute intensity of a spectrum, regardless of its shape, is a direct measure of the number of electron spins in the sample. This value is obtained from the first-derivative spectrum by integrating it twice. It is important that the whole spectrum is measured in isolation, with a flat baseline on either side. Before integration, corrections are made for baseline position and slope (Fig. 2). It is possible to estimate the concentration by comparing the double integral of the spectrum with that of a sample of a paramagnet of known concentration, such as a frozen solution of 1 mM copper + 10 mM EDTA. If both samples are of similar geometry and measured under the same conditions, the double integrals of the spectra may be compared directly. This method is only applicable under restricted conditions (namely, that the spectrum corresponds to an S = 1/2 spin system and is not saturated with microwave power). Practical aspects of quantitative EPR and the effects of various instrumental parameters have been discussed in detail by Randolph (10).
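The double-integration procedure can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions rather than the author's software: the spectrum is assumed to be a list of equally spaced first-derivative values, the baseline is assumed to be a straight line estimated from the signal-free ends of the scan, and all function names are hypothetical.

```python
import math

# Hypothetical sketch of double integration with linear baseline leveling.
# Assumes equally spaced field points and signal-free regions at both ends.


def subtract_linear_baseline(y, n_edge):
    """Subtract the straight line through the means of the first and last n_edge points."""
    n = len(y)
    left = sum(y[:n_edge]) / n_edge
    right = sum(y[-n_edge:]) / n_edge
    x_left = (n_edge - 1) / 2.0           # mean index of the left edge region
    x_right = n - 1 - (n_edge - 1) / 2.0  # mean index of the right edge region
    slope = (right - left) / (x_right - x_left)
    return [v - (left + slope * (i - x_left)) for i, v in enumerate(y)]


def running_integral(y, step):
    """Cumulative numerical summation of y with point spacing step."""
    total, out = 0.0, []
    for v in y:
        total += v * step
        out.append(total)
    return out


def double_integral(derivative, step, n_edge=20):
    """Level, integrate, level again, and integrate a first-derivative spectrum."""
    leveled = subtract_linear_baseline(derivative, n_edge)
    absorption = running_integral(leveled, step)
    absorption = subtract_linear_baseline(absorption, n_edge)
    return running_integral(absorption, step)[-1]


def spin_concentration(sample_di, standard_di, standard_conc):
    """Scale the known standard concentration by the ratio of double integrals."""
    return standard_conc * sample_di / standard_di


def synthetic_derivative(fields, center, width, area, offset=0.0, slope=0.0):
    """Gaussian derivative line of given area plus a sloping baseline (test data only)."""
    out = []
    for i, b in enumerate(fields):
        x = (b - center) / width
        line = -area * x / (width ** 2 * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * x * x)
        out.append(line + offset + slope * i)
    return out


if __name__ == "__main__":
    step = 0.1
    fields = [200.0 + step * k for k in range(2401)]  # 200 to 440 mT
    standard = synthetic_derivative(fields, 320.0, 3.0, area=1.0)
    sample = synthetic_derivative(fields, 320.0, 2.0, area=0.5, offset=0.01, slope=1e-5)
    conc = spin_concentration(double_integral(sample, step),
                              double_integral(standard, step), 1.0)
    print("estimated sample concentration (mM):", round(conc, 3))
```

The synthetic sample and standard have different line widths on purpose, to illustrate that the double integral, unlike the peak height, does not depend on the shape of the spectrum.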


[Figure 2: traces of the spectrum, its first integral, and its second integral.]

Fig. 2. Method of double integration of a spectrum of an S = 1/2 species. The spectrum, obtained as a series of digital values on the computer, is first leveled by subtracting a sloping baseline. Next, the first integral is obtained by numerical summation of the points. A second correction is made to level the spectrum again, and the second integral is taken. The value shown is proportional to the concentration of spins in the sample.

4.4. Microwave Power Saturation

This occurs when the electron spins cannot dissipate the microwave energy sufficiently rapidly. The rate of energy dissipation, in the form of heat, to the surroundings is defined by the electron spin-lattice relaxation time, T1. It leads to a decreased signal amplitude and distortion of the spectrum. Power saturation is much more pronounced at lower temperatures, since T1 is strongly temperature-dependent. Power saturation may be observed experimentally by measuring the intensity


[Figure 3: Signal/√Power plotted against Power, mW (0.001 to 1000), on logarithmic axes.]

Fig. 3. Example of a power saturation curve. The amplitude of an EPR signal is measured with varying microwave power, keeping other parameters, especially temperature, constant. The value P1/2 is a measure of the degree of microwave power saturation and is a function of the electron-spin relaxation times T1 and T2.

of the EPR signal as a function of the applied microwave power. Under nonsaturating conditions, the signal should be proportional to the square root of the power, but it will fall below this proportionality at higher power as the signal is saturated. The data may be plotted as shown in Fig. 3. The power saturation is characterized by a power for half-saturation, P1/2, which can be derived from the plot. Although in principle this parameter is related to the product T1T2, its most practical application lies in observation of changes of electron-spin relaxation as a result of spin-spin interactions (11).

4.5. Coordination Geometry

The dependence of g and A on the direction of the B0 field should in principle provide information about the coordination geometry around a metal center. Unfortunately, it is often misleading. For example, the spectrum of a Cu(II) protein, such as plastocyanin, indicates axial symmetry, which might be taken to imply square-planar or pyramidal geometry; in fact, the coordination is a highly distorted tetrahedron.
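Returning briefly to the power saturation measurement of Section 4.4, the derivation of P1/2 from a saturation curve can be sketched numerically. The chapter does not give an explicit formula; the sketch below assumes a commonly used empirical expression, signal = K√P / (1 + P/P1/2)^(b/2), with b = 1 taken to represent an inhomogeneously broadened line. The expression, the parameter b, and all function names are assumptions introduced here for illustration.

```python
import math

# Hypothetical sketch. The saturation expression below is an assumed
# empirical form, not one stated in the chapter.


def saturation_signal(power_mw, p_half_mw, k=1.0, b=1.0):
    """Signal amplitude at a given power for the assumed saturation expression."""
    return k * math.sqrt(power_mw) / (1.0 + power_mw / p_half_mw) ** (b / 2.0)


def estimate_p_half(powers_mw, signals, b=1.0):
    """Estimate P1/2 from a saturation curve.

    powers_mw must be in ascending order, and the lowest power is assumed
    to be nonsaturating. Returns None if the curve never reaches the
    half-saturation level.
    """
    # Signal / sqrt(power) is constant when the signal is not saturated.
    reduced = [s / math.sqrt(p) for p, s in zip(powers_mw, signals)]
    # At P = P1/2 the reduced signal has dropped by a factor of 2**(b/2).
    target = reduced[0] * 2.0 ** (-b / 2.0)
    for i in range(1, len(reduced)):
        if reduced[i] <= target:
            # Interpolate linearly in log10(power) between bracketing points.
            x0 = math.log10(powers_mw[i - 1])
            x1 = math.log10(powers_mw[i])
            frac = (reduced[i - 1] - target) / (reduced[i - 1] - reduced[i])
            return 10.0 ** (x0 + frac * (x1 - x0))
    return None


if __name__ == "__main__":
    # Synthetic data: powers from 0.001 to 1000 mW, four points per decade.
    powers = [0.001 * 10.0 ** (k / 4.0) for k in range(25)]
    signals = [saturation_signal(p, p_half_mw=5.0) for p in powers]
    print("estimated P1/2 (mW):", round(estimate_p_half(powers, signals), 2))
```

Comparing P1/2 values obtained in this way for the same signal in different samples, at the same temperature, is one way of detecting the enhanced relaxation caused by a neighboring paramagnet.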

4.6. Environment of the Metal Ion

The identity of ligands may be ascertained from the superhyperfine interaction with the spins of ligand nuclei, notably 14N (I = 1). These nuclear spins may cause splitting or broadening of the EPR spectrum. This is particularly noticeable when the spectra of samples containing different isotopes with different nuclear spins are compared, for example, 15N (I = 1/2). Similarly, interactions with exchangeable protons may be detected by a decrease in the line width on exchanging into 2H2O. The effects may be seen in spectra with narrow line widths, when a number of ligand nuclei may be identified. A good example is the measurement of the coordination of the molybdenum in xanthine oxidase by Bray (12). The effects are less easily detected in spectra of broad species, such as most iron or copper complexes. In order to resolve very small hyperfine splittings, it is necessary to use more specialist techniques, such as electron-nuclear double resonance (ENDOR) or electron spin-echo envelope modulation (ESEEM).

4.7. Redox Potentials

The midpoint oxidation-reduction potentials of individual components can be resolved by means of the mediator titration technique. This method has been used extensively for membrane-bound electron transport components. It involves adjustment of a solution of the material to a known redox potential, and then measurement of the extent of oxidation or reduction from the amplitude of its EPR signal. Often this means removing a sample and freezing it for spectroscopic measurements. A vessel for this purpose and details of sample preparation have been described by Dutton (5). From a series of samples poised at known redox potentials, it is possible to plot a graph from which the midpoint potential can be estimated (Fig. 4). The advantage of this method is that the species does not have to be pure, as long as some portion of its EPR spectrum is detectable, and indeed the midpoint potentials of several species can be estimated in the same set of samples.

5. Conclusions

EPR can be used to identify and quantify specific forms of metal ions in biological systems. Important considerations are that the samples are prepared in sufficiently high concentrations, in the appropriate redox state, and avoiding compounds that give interfering signals. The

[Figure 4: vertical axis 0 to 100; horizontal axis redox potential, -200 to 300 mV.]

Fig. 4. Example of a redox titration curve. For each data point, the sample is poised at the potential shown, and the EPR spectrum is taken, if necessary, by removing a sample and freezing. The amplitude of some feature in the spectrum is measured and plotted against the potential. The data are fitted to a curve calculated from the Nernst equation, with the maximum signal amplitude and midpoint potential Em as variables.
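The fit described in the caption to Fig. 4 can be sketched roughly as follows. The sketch assumes that the EPR-detectable species is the reduced form of a single n-electron couple, that the potentials refer to the temperature at which the sample was poised (taken here as 298.15 K), and that a 1-mV grid search is adequate; the function names and the synthetic data are illustrative and are not taken from the text.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1

def reduced_fraction(e_mv, em_mv, n=1, temp_k=298.15):
    """Nernst fraction of the couple in the reduced state at potential e_mv."""
    x = n * F * (e_mv - em_mv) / (1000.0 * R * temp_k)
    return 1.0 / (1.0 + math.exp(x))

def fit_midpoint(potentials_mv, amplitudes, n=1, temp_k=298.15):
    """Fit Em (1-mV grid, -500 to +500 mV) and the maximum amplitude.

    For each trial Em the maximum amplitude follows from linear least
    squares; the pair with the smallest squared error is returned.
    """
    best = None
    for em in range(-500, 501):
        shape = [reduced_fraction(e, em, n, temp_k) for e in potentials_mv]
        denom = sum(s * s for s in shape)
        if denom == 0.0:
            continue
        a_max = sum(s * a for s, a in zip(shape, amplitudes)) / denom
        err = sum((a_max * s - a) ** 2 for s, a in zip(shape, amplitudes))
        if best is None or err < best[0]:
            best = (err, em, a_max)
    return best[1], best[2]

# Synthetic (not measured) titration: Em = 50 mV, maximum amplitude 80.
potentials = list(range(-200, 301, 50))
amplitudes = [80.0 * reduced_fraction(e, 50.0) for e in potentials]
em_fit, a_max_fit = fit_midpoint(potentials, amplitudes)
```

If the signal belongs to the oxidized form instead, the same approach applies with 1 - reduced_fraction(...) as the model. For a species that is detectable only in an intermediate redox state, two midpoint potentials would be needed, which this sketch does not cover.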

choice of the measurement temperature and running conditions is important for acquiring good spectra. EPR offers the ability to determine the effects of spin-spin interactions with ligand nuclei and other unpaired electrons, which complicate the spectra, but offer the possibility of obtaining additional information.

Acknowledgments

I thank E. C. Hatchikian and R. C. Bray for some of the samples used in Fig. 1. This work was supported by grants from the U.K. Science and Engineering Research Council.

Abbreviations and Symbols

cw, continuous-wave; EDTA, ethylenediamine tetra-acetate; ENDOR, electron-nuclear double resonance; EPR, electron paramagnetic resonance; ESEEM, electron spin-echo envelope modulation; ESR, electron spin resonance; GHz, gigahertz; kHz, kilohertz; mT, millitesla; NMR, nuclear magnetic resonance; A, hyperfine coupling constant; B0, magnetic flux density (magnetic field); Em, midpoint oxidation-reduction potential; g, spectroscopic g-factor; h, Planck's constant; I, nuclear spin; mM, µM, millimolar, micromolar; ms, electron spin quantum number; P1/2, microwave power for half-saturation; S, electron spin; T1, T2, electron spin-lattice and spin-spin relaxation times; ν, microwave frequency; β, Bohr magneton.

References

1. Czoch, R. and Francik, A. (1989) Instrumental Effects in Homodyne Electron Paramagnetic Resonance Spectrometers. Ellis Horwood, Chichester; Wiley, New York.
2. Eaton, G. R. and Eaton, S. (1990) Electron paramagnetic resonance, in Analytical Instrumentation Handbook (Ewing, G. W., ed.), Marcel Dekker, New York, pp. 467-530.
3. Wertz, J. E. and Bolton, J. R. (1986) Electron Spin Resonance. McGraw-Hill, New York.
4. Gibson, J. F. (1987) Electron paramagnetic resonance in metalloproteins, in Spectroscopy of Inorganic-Based Materials (Hester, R., ed.), Wiley, Chichester, pp. 333-406.
5. Dutton, P. L. (1978) Redox potentiometry: determination of midpoint potentials of oxidation-reduction components of biological electron-transfer systems, in Methods in Enzymology: Biomembranes, vol. 54 (Fleischer, S. and Packer, L., eds.), Academic, New York, pp. 411-435.
6. Ballou, D. P. (1978) Freeze-quench and chemical-quench techniques, in Methods in Enzymology: Biomembranes, vol. 54 (Fleischer, S. and Packer, L., eds.), Academic, New York, pp. 85-93.
7. Maret, W. and Zeppezauer, M. (1988) Preparation of metal-hybrid enzymes, in Methods in Enzymology, vol. 158 (Riordan, J. F. and Vallee, B. L., eds.), Academic, New York, pp. 79-94.
8. George, G. N. and Bray, R. C. (1988) Studies by electron paramagnetic resonance spectroscopy of xanthine oxidase enriched with molybdenum-95 and with molybdenum-97. Biochemistry 27, 3603-3610.
9. Poole, R. K., Blum, H., Scott, R. I., Collinge, A., and Ohnishi, T. (1980) The orientation of cytochromes in membrane multilayers prepared from aerobically grown Escherichia coli K12. J. Gen. Microbiol. 119, 145-154.
10. Randolph, M. L. (1972) Quantitative considerations in electron spin resonance studies of biological materials, in Biological Applications of Electron Spin Resonance (Swartz, H. M., Bolton, J. R., and Borg, D. C., eds.), Wiley-Interscience, New York, pp. 119-133.
11. Rupp, H., Rao, K. K., Hall, D. O., and Cammack, R. (1978) Electron spin relaxation of iron-sulphur proteins studied by microwave power saturation. Biochim. Biophys. Acta 537, 255-269.
12. Bray, R. C. (1988) The inorganic biochemistry of molybdoenzymes. Q. Rev. Biophys. 21, 299-330.

CHAPTER 15

Resonance Raman Spectroscopy of Metalloproteins Using CW Laser Excitation

Roman S. Czernuszewicz

1. Introduction

The study of metalloproteins by resonance Raman (RR) spectroscopy began two decades ago, with the publication by Long and coworkers of the first RR spectrum of the iron-sulfur protein rubredoxin (1,2). This simple spectrum, which contained only four bands attributed to the Fe-S stretching and bending vibrations of the protein FeS4 cluster, generated much interest because of the potential of RR spectroscopy for monitoring structures of metal centers in complex biological systems (3,4). The unique ability of this technique to study the coordination environment of transition metals in proteins derives from its dramatic increase in detection sensitivity and selectivity for vibrations closely associated with atoms at the absorbing center(s) in the molecule. When the molecule is excited with strong monochromatic light whose energy matches that of an electric-dipole allowed electronic transition, vibronic coupling with the electronically excited state increases the probability of observing Raman scattering from vibrational transitions in the electronic ground state, and the modes that do show enhancement are localized on the chromophore (i.e., on the group of atoms that gives rise to the electronic transition). Since vibrational frequencies are sensitive to molecular bond strength, number of atoms, geometry, and coordination environment, the positions of the enhanced Raman bands can be used to monitor the chromophoric structure. Metalloproteins frequently exhibit allowed electronic transitions, owing to π-π* and/or ligand-metal charge-transfer (CT) transitions (5), and consequently, they give wide scope to the application of RR spectroscopy. A great many RR studies of heme proteins, cobalamin, chlorophylls, carotenoids, flavin nucleotides, the visual pigments, bacteriorhodopsin, and a variety of iron and copper metalloprotein sites have been carried out in laboratories around the world (6-11).

From Methods in Molecular Biology, Vol. 17: Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques. Edited by C. Jones, B. Mulloy, and A. H. Thomas. Copyright ©1993 Humana Press Inc., Totowa, NJ

Fig. 1. The origin of infrared and Raman spectra. The IR spectrum originates in the v → v′ transitions between two vibrational levels of the molecule in the ground electronic state. The corresponding frequencies are observed as absorption peaks in the infrared region. The Raman spectrum originates in the electronic polarization caused by energy transfer from the incident UV or visible photons to the molecule during a scattering process. The vibrational frequencies are observed as Raman shifts from the incident frequency ν0 in the UV or visible region.

2. Physical Principles

2.1. Raman Effect

Transitions between different vibrational energy levels in a molecule occur in the infrared region of the electromagnetic spectrum. Such transitions can therefore be probed by direct absorption of infrared photons (infrared spectroscopy), or alternatively by inelastic light scattering of visible and/or UV photons (Raman spectroscopy), as illustrated diagrammatically in Fig. 1. Upon inelastic scattering, a photon can lose energy by raising a molecule to an excited vibrational state or gain energy by inducing the reverse process, producing Stokes or anti-Stokes lines, respectively, in the Raman spectrum, as shown in Fig. 2 for rubredoxin excited with the ν0 = 20141 cm⁻¹ laser line at 77 K.

[Figure 2: D. gigas rubredoxin, frozen solution (77 K), ν0 = 20141 cm⁻¹; Stokes and anti-Stokes scattering on either side of the Rayleigh line; horizontal axis Δν/cm⁻¹, -400 to 400.]

Fig. 2. Resonance Raman spectrum of oxidized Desulfovibrio gigas rubredoxin, showing Rayleigh, Stokes, and anti-Stokes bands. When monochromatic radiation of frequency ν0 is incident on a molecular system, a fraction of the light is scattered. Most of this scattered light consists of radiation of the same frequency as the incident light, although it differs in the direction of propagation (Rayleigh scattering). A small fraction of the scattered light acquires new, modified frequencies (ν0 ± νvib), the frequency changes being equal to the frequencies associated with transitions between vibrational levels of the system (Raman scattering). Radiation scattered with a frequency lower than that of the incident beam (i.e., of the type ν0 - νvib) is referred to as Stokes, whereas that at the higher frequency (i.e., of the type ν0 + νvib) is called anti-Stokes. Since the fraction of molecules occupying excited vibrational states depends on the Boltzmann factor (kT/hc = 209 cm⁻¹ at T = 300 K), the intensity of anti-Stokes lines falls off rapidly with decreasing temperature (kT/hc = 54 cm⁻¹ at T = 77 K) and increasing vibrational frequency. When ν0 is offset to zero, the positions of Raman peaks correspond to the vibrational frequencies, νvib. The 496.5-nm (20141-cm⁻¹) excitation wavelength for the spectrum was provided by a Coherent Innova 90-6 Ar+ ion laser. The scattered light was dispersed by a SPEX 1403 double monochromator equipped with 1800 grooves/mm holographic gratings and detected at 6 cm⁻¹ slit widths by a cooled Hamamatsu 928 photomultiplier tube, under the control of a SPEX DM3000 data station. The spectrum was recorded digitally by frontscattering directly off the surface of a frozen protein solution kept in a liquid-N2 Dewar. The protein sample (3 mM) was in 0.05 M Tris-HCl buffer, pH 7.5.
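The thermal energies quoted in the caption to Fig. 2 can be checked with a few lines of arithmetic. The sketch below converts temperature to kT/hc in wavenumbers and evaluates the Boltzmann population factor for an excited vibrational level. The optional anti-Stokes/Stokes ratio uses the standard expression for ordinary (nonresonant) Raman scattering, which is an assumption here and is not stated in the text; under resonance conditions the measured ratio can deviate from it. The 300 cm⁻¹ shift is an arbitrary illustrative value.

```python
import math

K_OVER_HC = 0.695035  # Boltzmann constant / (h c), in cm^-1 per kelvin

def thermal_energy_cm1(temp_k):
    """kT/hc in wavenumbers (cm^-1)."""
    return K_OVER_HC * temp_k

def boltzmann_factor(shift_cm1, temp_k):
    """Relative population of a level lying shift_cm1 above the ground level."""
    return math.exp(-shift_cm1 / thermal_energy_cm1(temp_k))

def anti_stokes_to_stokes(shift_cm1, temp_k, nu0_cm1):
    """Nonresonant intensity ratio (assumed form, intensities in power units):
    ((nu0 + nu) / (nu0 - nu))**4 * exp(-h c nu / k T)."""
    frequency_term = ((nu0_cm1 + shift_cm1) / (nu0_cm1 - shift_cm1)) ** 4
    return frequency_term * boltzmann_factor(shift_cm1, temp_k)

kt_300 = thermal_energy_cm1(300.0)   # about 209 cm^-1
kt_77 = thermal_energy_cm1(77.0)     # about 54 cm^-1

# Illustrative 300 cm^-1 mode excited at 20141 cm^-1.
ratio_300 = anti_stokes_to_stokes(300.0, 300.0, 20141.0)
ratio_77 = anti_stokes_to_stokes(300.0, 77.0, 20141.0)
```

For this illustrative mode the estimated ratio drops by roughly two orders of magnitude between 300 K and 77 K, in line with the caption's statement that anti-Stokes lines weaken rapidly on cooling.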

If the incident frequency is offset to zero, then Raman peaks occur at the same frequencies as peaks in the infrared spectrum. Different selection rules, however, govern the intensity of the respective vibrational modes. Infrared absorptions depend on the dipole moment change during molecular vibration, and IR intensity $I_{\mathrm{IR}}^{1/2} \propto (\partial\mu/\partial Q)_0$, where $\mu$ is the dipole moment and $Q$ is the vibrational coordinate. On the other hand, Raman scattering depends on the variation of the polarizability (or induced dipole moment) of the molecule, $\alpha$, during vibration, and Raman intensity $I_{\mathrm{R}}^{1/2} \propto (\partial\alpha/\partial Q)_0$. Thus, symmetric bond stretching modes, which produce large polarizability changes, usually dominate the Raman spectra, whereas asymmetric stretches and deformation modes, involving large dipole moment changes, tend to be more intense in infrared spectra. This makes Raman spectroscopy more favorable for the study of biological materials, since there is considerably less spectral interference from the deformation modes associated with the H-bond network of water molecules; these are often dominant features in the infrared spectra of aqueous samples.

2.2. Resonance Raman Effect

Resonance Raman (RR) scattering occurs when a material is irradiated with light corresponding to an absorption band region. The RR spectrum is thus produced in the same way as the emission spectrum and has qualitatively the same vibrational structure. In both cases, the absorbed light quanta are reemitted with changed frequencies. In RR spectroscopy, the excited state created by the absorption-emission sequence is coherent, but in ordinary absorption-emission spectroscopy, the excited state created by the absorption process first undergoes relaxation before reemitting the light, so that all phase relations between absorption and emission are lost. The absence of a relaxation step in RR scattering means that no information is lost, so that an effective resolution can be obtained. It is, thus, a form of high-resolution vibronic spectroscopy, in which the intensity of vibrational light scattering is examined as a function of the proximity of the excitation wavelength to electronic absorption bands of the molecule. When this excitation falls close to or within an electronic absorption band of the sample, most of the Raman bands are attenuated by the absorption, but certain bands, those that arise from vibrational modes that mimic the distortion of the molecule in its resonant excited state (12), are greatly

enhanced (10^3-10^4-fold increases in the intensities have been observed). The correct identification of the vibrational modes showing RR enhancement will therefore aid in the assignment of the resonant electronic transition and vice versa. Moreover, this technique can provide detailed information about chromophoric centers of the molecule, because only modes closely associated with atoms at the absorbing center in the molecule display the RR effect.

2.3. Electronic Structure and RR Enhancement

Although vibrational frequencies in RR spectroscopy are still a function of the electronic ground state, the vibrational intensities are determined by the properties of the electronic excited state(s), as discussed briefly in this section. Theoretically, two means of resonance enhancement are important: the Franck-Condon principle, which recognizes the molecular mechanism involving a displacement of the potential minima of the ground and excited states along a vibrational normal coordinate Q, and the Herzberg-Teller vibronic coupling, which involves a transfer of the transition moment between different excited electronic states induced by a vibrational excitation (13). These are also known as the A and B term scatterings, respectively, after the two leading terms of a rapidly converging Taylor expansion of the Raman polarizability with respect to the coordinate Q. The A term is:

$$A = \mu_e^2 \, \frac{1}{h} \sum_v \frac{\langle j | v \rangle \langle v | i \rangle}{\nu_{vi} - \nu_0 + i\Gamma_v} \tag{1}$$

where $\mu_e$ is the pure electronic transition moment for the resonant excited state e, of which v is a particular vibrational level of bandwidth $\Gamma_v$; $\nu_{vi}$ is the transition frequency from the ground vibrational level i to the level v; and $\nu_0$ is the excitation laser frequency (exact resonance occurs when $\nu_{vi} = \nu_0$). The numerator contains Franck-Condon overlap integrals between the vibrational wave function of the intermediate level v of the resonant excited state and those of the initial and final levels i and j, which are usually the 0 and 1 vibrational levels of the ground electronic state (fundamental Raman transitions). Consequently, A term scattering occurs when the electronic transition involved is strongly allowed, with the A term scaling with the square of $\mu_e$ (which by definition is large for such a transition). The relative enhancements for different vibrational modes are determined by the values of the Franck-Condon integrals, which are nonzero and may become large if there is a substantial excited state potential shift along the vibrational coordinate. Only totally symmetric modes are subject to A term enhancement, since they can satisfy the latter condition by symmetry. The B term reflects the dependence of the electronic transition moment on the vibrational coordinate Q, and is given by (13):

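$$B = \mu_e \mu_e' \, \frac{1}{h} \sum_v \frac{\langle j | Q | v \rangle \langle v | i \rangle + \langle j | v \rangle \langle v | Q | i \rangle}{\nu_{vi} - \nu_0 + i\Gamma_v} \tag{2}$$

where $\mu_e' = (\partial\mu_e/\partial Q)_0$. When the electronic resonant transition is weakly allowed, $\mu_e'$ can be of the same magnitude as $\mu_e$, or even exceed it, if the excited state e can gain absorption strength from other excited states via vibronic coupling. In the Herzberg-Teller formalism (13):

$$\mu_e' = \mu_s \, \frac{\langle s | \partial H/\partial Q | e \rangle}{\nu_s - \nu_e} \tag{3}$$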

where $\nu_s$ and $\mu_s$ are the frequency and transition dipole moment of the mixing electronic state, s, and $\langle s | \partial H/\partial Q | e \rangle$ is the matrix element of the vibronic coupling operator that connects the two excited states e and s. The stronger the transition to the state s, and the closer it is in energy to e, the larger $\mu_e'$ will be. Thus, B term scattering becomes especially important in resonance with a weak electronic transition that is coupled vibronically with a closely lying strong one. The B term denominator is the same as for the A term, but its numerator contains the Q-dependent vibrational overlap integrals, as well as Franck-Condon factors; it does not vanish even if there is no excited state shift of the potential. Thus, B term scattering provides a prime mechanism of RR activity for nontotally symmetric modes, because from Eq. (3) the vibrations enhanced may have any symmetry that is contained in the direct product of the symmetry representations of the mixing electronic transitions. The relative enhancement factors for different mode symmetries depend on the effectiveness of the vibration in coupling the excited states. Hence, on moving into the resonance region, those fundamentals that reflect the change in geometry when converting the molecule from its ground to excited state (Franck-Condon allowed) or those that are able to couple the resonant excited state vibronically to some other electronic state with a different transition moment (Herzberg-Teller allowed) will be strongly enhanced. This selective enhancement is one of the most important and valuable aspects of RR spectroscopy, since it leads to a considerable simplification of the observed Raman spectra; they consist primarily of bands arising from either totally or nontotally symmetric fundamentals, depending on the nature of the resonant electronic transition.

2.4. Illustration: Metalloporphyrins

The advantage of selective enhancement has made RR spectroscopy a favorite method for the study of heme proteins and metalloporphyrins (14,15), whose extended aromatic macrocycle gives rise to low-lying π-π* electronic transitions that are polarized in the porphyrin plane and are conveniently excited with visible lasers. Their first two excited states are well described by Gouterman's four-orbital model (16), illustrated in Fig. 3, along with the absorption spectrum of the heme model complex, nickel octaethylporphyrin (NiOEP). The LUMO is a degenerate π* pair of eg symmetry (in the idealized D4h point group), whereas the two HOMOs, a2u and a1u, have nearly the same energy. Consequently, the excitations eg* ← a2u and eg* ← a1u, which have the same symmetry (a1u × eg = a2u × eg = Eu), interact strongly, producing well-separated states, with energies in the violet and yellow-green regions of the spectrum. The transition dipoles add up for the higher energy transition to produce the very strong absorption B (or Soret) band at 392 nm, but nearly cancel for the lower energy transition, producing a much weaker Q0 (or α) band at 552 nm. The latter borrows about 10% of the intensity from the former via vibronic mixing, producing a side band at 516 nm, called the Q1 or β band. For this highly symmetric chromophore, configuration interactions and vibronic mixing between these transitions facilitate a strong resonance enhancement of the in-plane vibrational modes of the porphyrin ring and of the peripheral substituents, the relative intensities of which depend dramatically on the laser excitation wavelengths.

[Figure 3: absorption spectrum of NiOEP, 300-600 nm (horizontal axis λ/nm), with the 552-nm band marked.]

Fig. 3. Electronic absorption spectrum of NiOEP in CH2Cl2 and the corresponding Gouterman's four-orbital model, explaining its origin. The eg* ← a2u, a1u orbital excitations (labeled as 1 and 2) are of the same symmetry (Eu in the D4h point group of the chromophore), and thus interact strongly, producing a strong B transition at 392 nm (transition dipoles add up) and a weak Q0 transition at 552 nm (transition dipoles subtract). The latter vibronically mixes with the former, producing a Q1 side band at 516 nm.

This is nicely illustrated in Fig. 4, which displays survey RR spectra of NiOEP (17-20), obtained with three different laser excitation wavelengths. Excitation in the vicinity of the Soret absorption band (406.7 nm, top) produces a spectrum that is dominated by A term (Franck-Condon) scattering and polarized Raman peaks (p, with depolarization ratios ρ = I⊥/I∥ < 3/4) arising from totally symmetric vibrations, A1g. In contrast, when the laser is near resonance with the Q absorption bands (530.9 and 568.2 nm, middle and bottom), the spectra are dominated by B term (Herzberg-Teller) scattering and the vibrations that are effective in mixing the Q and Soret transitions. Since both transitions are of Eu symmetry, the allowed symmetries of the vibronically active modes are Eu × Eu = A1g + B1g + B2g + A2g. However, the A1g modes were shown to be ineffective in mixing the transition moments because of the high porphyrin chromophore symmetry.
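The direct products quoted above (a1u × eg = a2u × eg = Eu, and Eu × Eu = A1g + A2g + B1g + B2g) can be verified from the D4h character table with the usual reduction formula. The sketch below hard-codes the standard character table; the function names are illustrative choices and not part of the original text.

```python
# D4h classes in the conventional order, with the number of operations in each.
CLASSES = ["E", "2C4", "C2", "2C2'", "2C2''", "i", "2S4", "sigma_h", "2sigma_v", "2sigma_d"]
CLASS_SIZES = [1, 2, 1, 2, 2, 1, 2, 1, 2, 2]

# Characters for the five proper-rotation classes (the D4 subgroup).
D4_CHARACTERS = {
    "A1": [1, 1, 1, 1, 1],
    "A2": [1, 1, 1, -1, -1],
    "B1": [1, -1, 1, 1, -1],
    "B2": [1, -1, 1, -1, 1],
    "E": [2, 0, -2, 0, 0],
}

# D4h = D4 x Ci: gerade irreps repeat the D4 characters for the improper
# classes, ungerade irreps change their sign.
CHARACTERS = {}
for name, chi in D4_CHARACTERS.items():
    CHARACTERS[name + "g"] = chi + chi
    CHARACTERS[name + "u"] = chi + [-c for c in chi]

def direct_product(*irreps):
    """Characters of the direct product of the named irreducible representations."""
    chars = [1] * len(CLASSES)
    for name in irreps:
        chars = [a * b for a, b in zip(chars, CHARACTERS[name])]
    return chars

def decompose(chars):
    """Reduce a set of characters into irreps: n_i = (1/h) * sum(g * chi * chi_i)."""
    order = sum(CLASS_SIZES)
    result = {}
    for name, chi in CHARACTERS.items():
        n = sum(g * a * b for g, a, b in zip(CLASS_SIZES, chars, chi)) // order
        if n:
            result[name] = n
    return result

q_soret_mixing = decompose(direct_product("Eu", "Eu"))
a1u_to_eg = decompose(direct_product("A1u", "Eg"))
a2u_to_eg = decompose(direct_product("A2u", "Eg"))
```

The reduction only gives the symmetries that are allowed to be vibronically active; as noted above, it says nothing about how effective a given symmetry actually is in mixing the Q and Soret transitions.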

[Figure 4: RR spectra of NiOEP; panel labels λex = 406.7 nm (p modes) and λex = 530.9 nm (ap modes).]

Fig. 4. Resonance Raman spectra in parallel (∥) and perpendicular (⊥) scattering of NiOEP in CH2Cl2 solution (17), showing the selective enhancement of different modes under multiple resonant state conditions. Resonance with the B (Soret) absorption band (λex = 406.7 nm) enhances predominantly polarized bands (p), arising from totally symmetric modes, A1g (Franck-Condon scattering), whereas the spectra obtained with Q band excitations (λex = 530.9 and 568.2 nm) are dominated by depolarized (dp) and anomalously polarized (ap) bands, arising from nontotally symmetric modes, B1g and B2g, and A2g, respectively (Herzberg-Teller scattering). The modes labeled νi refer to porphyrin in-plane skeletal modes, whereas bands marked by asterisks are due to CH2Cl2. All spectra were obtained by frontscattering from a spinning NMR tube with a SPEX 1401 double monochromator and a cooled RCA 31034A PMT tube using an Ortec 9315 photon-counting system, under the control of a MINC II (DEC) minicomputer. Exciting radiation for RR spectra was provided by a Coherent Innova 100-K3 Kr+ ion laser.
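The band labels used in Fig. 4 follow from the depolarization ratio ρ = I⊥/I∥: polarized (p) for ρ < 3/4, depolarized (dp) for ρ = 3/4, and anomalously polarized (ap) for ρ > 3/4. A minimal sketch of such a classification is given below; the tolerance used to decide when a measured ratio counts as "equal" to 3/4 is an arbitrary assumption, as are the function names and the example intensities.

```python
def depolarization_ratio(i_perp, i_par):
    """rho = I_perpendicular / I_parallel for one Raman band."""
    return i_perp / i_par

def classify_band(i_perp, i_par, tol=0.05):
    """Label a band p, dp, or ap from its measured intensities.

    tol is an assumed experimental tolerance around rho = 3/4.
    """
    rho = depolarization_ratio(i_perp, i_par)
    if rho < 0.75 - tol:
        return "p"    # polarized: totally symmetric modes
    if rho > 0.75 + tol:
        return "ap"   # anomalously polarized: antisymmetric scattering tensor
    return "dp"       # depolarized: nontotally symmetric modes

# Illustrative (not measured) intensity pairs (I_perp, I_par).
labels = [classify_band(12.0, 100.0), classify_band(75.0, 100.0), classify_band(150.0, 100.0)]
```

In practice the tolerance would be set by the accuracy of the polarization measurement, which is why a polarization scrambler and analyzer are needed for reliable ratios.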

Indeed, the RR spectra with excitation in the Q bands are dominated by depolarized (dp, ρ = 3/4, B1g and B2g) and anomalously polarized (ap) peaks, the latter having greater intensity in the perpendicular than in the parallel scattering component (ρ > 3/4). Anomalously polarized Raman bands have contributions from A2g modes, which have antisymmetric scattering tensors, αxy = -αyx. The Q-band resonance is also subject to an interference effect between Q0 and Q1 vibronic transitions, which produces an additional RR selectivity among the mixing nontotally symmetric modes. It selects for A2g modes at excitation wavelengths falling in between the Q0 and Q1 transitions, whereas excitation on either side selects for B1g and B2g modes. This is the reason why ap bands are dominant with 530.9 nm excitation, whereas dp bands dominate the 568.2 nm excited spectrum (Fig. 4). Because the Q0 absorption band strength is not negligible (Fig. 3), polarized Raman peaks of moderate intensity are also seen in the Q-resonant spectra, although their intensity pattern is different from that of the Soret-resonant spectrum (Fig. 4). Thus, the high-frequency bands arising from the ν2, ν3, and ν4 modes dominate the 406.7 nm excited spectrum, but have negligible intensity in the 530.9 and 568.2 nm excited spectra, where lower-frequency modes, among them ν7, are the strongest polarized peaks. This difference implies quite different shapes for the Soret and Q excited state potentials (17).

3. Methods

In general, the method of Raman studies is to obtain and assign frequencies associated with molecular vibrations of interest. Since Raman scattering can be 10^3-10^6-fold weaker than Rayleigh scattering, its detection requires the use of an intense monochromatic light source, a high-resolution monochromator, and a highly efficient photodetector. In practice, visible and UV laser lines provide the best incident radiation, and the instrument is calibrated for the particular ν0 being used. The scattered photons of varying frequencies (νs) are resolved by a scanning monochromator or a diode array instrument, and their intensities are plotted as a function of vibrational frequency, νvib = ν0 - νs. The use of computer-related capabilities, such as multiple scanning, signal averaging, background subtraction, and data smoothing, leads to significant enhancements in spectral quality. Basic components of a Raman spectrometer are shown schematically in Fig. 5.
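The relation νvib = ν0 - νs is easily applied when the monochromator reads in wavelength rather than wavenumber. The sketch below converts between the two and computes the Raman shift; it assumes that both wavelengths are quoted in the same medium and ignores the refractive index of air, and the function names and the 300 cm⁻¹ example are illustrative.

```python
def wavenumber_cm1(wavelength_nm):
    """Convert a wavelength in nm to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1.0e7 / wavelength_nm

def raman_shift_cm1(excitation_nm, scattered_nm):
    """nu_vib = nu_0 - nu_s; positive for Stokes, negative for anti-Stokes."""
    return wavenumber_cm1(excitation_nm) - wavenumber_cm1(scattered_nm)

def scattered_wavelength_nm(excitation_nm, shift_cm1):
    """Wavelength at which a band with the given Raman shift is observed."""
    return 1.0e7 / (wavenumber_cm1(excitation_nm) - shift_cm1)

# A 300 cm^-1 mode (illustrative) excited with the 496.51-nm argon line.
stokes_nm = scattered_wavelength_nm(496.51, 300.0)
anti_stokes_nm = scattered_wavelength_nm(496.51, -300.0)
```

For this example the Stokes and anti-Stokes bands fall only a few nanometers on either side of the laser line, which illustrates why a monochromator with good stray-light rejection close to the Rayleigh line is required.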

[Figure 5: block diagram; recoverable labels include data capture and display, spectrometer control, and polarization scrambler.]

Fig. 5. Schematic diagram of the basic components of a Raman spectrometer. See text for description.

3.1. The Laser

The use of RR spectroscopy as a structure probe of metal active sites requires tuning of the laser excitation wavelength within the electronic absorption of the metalloprotein. The most reliable sources for this purpose are continuous-wave (cw) lasers, which emit a continuous light beam of constant power. Figure 6 depicts the components of a simple gas laser. Ion gas lasers, in which the light-emitting species is ionized argon or krypton, are by far the most widely used cw systems, since they can be readily tuned between various discrete wavelengths in the visible and near-UV region, as listed in Table 1. Argon lasers are

[Figure 6: labeled components include beam splitter, output coupler, resonator, prism, and window.]

Fig. 6. Schematic diagram of a cw ion gas laser showing its three key elements: the plasma tube, resonator, and power supply. The laser tube contains the active gas medium (ionized He-Ne, Ar, and Kr) and confines the plasma discharge; the best commercially available argon and krypton laser systems use rugged metal-ceramic tube construction to ensure high output power and reliability. The resonator holds the two laser mirrors, a semitransparent curved output coupler (90-99% reflection) and a high reflector (flat or curved), in a precise orientation such that an optical cavity with a high degree of stimulated emission is formed. In this configuration, many of the laser lines will oscillate simultaneously, producing a multiline output (commonly used to pump organic dye lasers). To allow only one wavelength to oscillate at a time, an intracavity prism is placed between a Brewster-angle window and a high reflector; by tilting the prism, different single lines may easily be selected. The power supply delivers the electrical energy required to sustain the plasma discharge. The laser output power is monitored by a calibrated internal photodetector.

used primarily for their high output power in the blue/green region (454.5-514.5 nm), whereas krypton tubes are used principally for their red (647.1 and 676.5 nm), yellow/green (568.2-520.8 nm), and violet (406.7-415.3 nm) outputs (the latter region is particularly useful in RR studies of heme proteins because of their strong Soret absorption bands near 400-430 nm). At high output powers, both lasers can also be purchased with a UV option, which extends their tunability down to 334 (Ar) and 324 nm (Kr). Also available on the market are helium-neon and helium-cadmium lasers, which emit very useful Raman excitation lines at 632.8 nm and at 441.6 and 325 nm, respectively, although at lower power levels. In addition, Ar+ and Kr+ lasers are convenient light sources for pumping dye lasers, which are capable of producing cw radiation of any desired wavelength within the broad fluorescence emission spectrum (typically 50 nm) of a given organic dye. In recent years, a variety of pulsed laser systems,

Table 1
Lasing Lines of Continuous-Wave Lasers for Raman Excitation in the Near-UV and Visible Region(a)

Wavelength  Wavenumber            Output power/W
nm          cm⁻¹         Ar+(b)   Kr+(b)   He-Ne(c)  He-Cd(d)
799.32      12510.6               0.30
793.14      12608.1               0.20
752.55      13288.2               0.80
676.44      14783.2               0.60
647.09      15453.9               2.00
632.82      15802.3                        0.08
568.19      17599.8               0.80
530.87      18837.2               1.00
528.69      18914.7      0.42
520.83      19200.1               0.45
514.53      19435.1      2.40
501.72      19931.6      0.48
496.51      20140.7      0.72
487.99      20492.4      1.80
482.49      20724.7               0.25
476.49      20987.0      0.72
476.24      20997.7               0.25
472.69      21155.7      0.24
468.04      21365.7               0.40
465.79      21468.9      0.18
457.94      21837.2      0.42
454.51      22002.0      0.14
441.56      22646.8                                  0.06
415.44      24070.9               0.15
413.13      24205.5               1.20
406.74      24585.7               0.60
379.53      26348.2      0.01
363.79      27488.5      0.40
356.42      28056.6               0.27
351.42      28456.1      0.01
351.11      28480.9      0.30
350.74      28511.0               1.30
337.50      29629.6               0.17
335.85      29775.3      0.01
334.47      29897.9      0.02
333.61      29974.9      0.05
325.03      30766.4                                  0.02
323.95      30869.0               0.02

(a) Values are expressed as wavelengths and wavenumbers in air.
(b) Power values for Coherent Models Innova 90-6 (Ar+) and Innova 200-K2 (Kr+).
(c) Power value for Spectra-Physics Model 125A.
(d) Power value for Liconix Model 2042NB.

i.e., lasers emitting short pulses of high photon flux radiation (down to femtosecond duration), have emerged as important Raman excitation sources, particularly in time-resolved (21-23) and deep-UV Raman (24,25) spectroscopies.

3.2. Optical System Around the Sample

In any Raman experiment, the function of the optical system is to bring an incident laser beam to a focus either inside the sample (solutions) or at its surface (frozen solutions or solids), and to project an enlarged image of this focus on the monochromator slit (Fig. 5). Since a laser source emits not only the high-intensity laser line, but also some spontaneous emission lines (nonlasing lines from the plasma) that form a background to the laser line, these additional lines must be excluded from the laser beam before it strikes the sample. For this purpose, the beam is passed through an interference filter or, as indicated in Fig. 5, a special dispersive prism block. Although interference filters are convenient to use, a large number of them are needed, one for each individual laser line. In addition, they may allow plasma lines to leak through as they age. A direct vision prism (also known as an Amici prism) used in conjunction with a short focal length lens (300-400 mm), a small iris diaphragm, and a fixed-width slit (typically 50 or 75 µm wide) is favored in the author's laboratory. This design can be applied over a wide wavelength range (limited only by the prism glass material), and it has the advantage of completely isolating any desired laser line. The Amici prism can be inserted anywhere in the laser light path before the sample focusing lens, but the slit, whose function is to block the plasma lines dispersed by the prism, must be placed at the focal point of the nearby short focal length lens.

The laser radiation, uncluttered by the plasma lines, is directed into the sample area by dielectric mirrors or right-angle prisms. The beam is brought to a focus by a simple lens (lenses of 50-100 mm focal length can be used), and the sample is accurately placed at the focus with an x,y,z-translation stage. The function of this lens is to increase photon flux at the sample and, thus, to increase the Raman signal. It can be either plano-convex, for a pinpoint focus, or cylindrical, for a sharp-line focus. The focus position on the sample and the center of the collection optics must be on the optic axis of the spectrometer, and a

Resonance Raman Spectroscopy

359

varrety of arrangements based on lens or mirror systems are available (26-28). In the author’s laboratory, the scattered light is gathered to a focus on the monochromator slit by a simple two-lens system, consisting of a standard 50-mm camera lens q- number = 1.2) coupled to a 2 in diameter plano-convex fused silica lens. To fill the spectrometer optics exactly, the coupling lens should have anf-number similar to that of the spectrometer. Overfilling (smallerf-numbers) will result in stray light and loss of potential Raman signal, and underfilling (larger f-numbers too large) will reduce resolution. For most spectral runs, a quartz wedge (polarization scrambler) is placed before the monochromator entrance slit, whose function 1sto change linear mto circular polarization of the light entering the slit to avoid measurement errors owing to the variable spectrometer transmittance of the light polarized in different directions. A complete optrcal system should also include a polarization analyzer (e-g,, a Polaroid sheet) for measuring the depolartzatron of Raman bands (v&e ~nfra). 3.3. The Spectrometer

and Detection

System

The low intensity of Raman (even when resonance enhanced) vs Rayleigh scattering requires the use of a high-quality monochromator that gives a low background in the spectrum. Nowadays, Raman instruments are equipped with monochromators in which the scattered photons from the sample are spatially dispersed by two or three diffraction gratings. The Raman spectrum is displayed as a series of small wavelength lines at the focal plane of the monochromator exit port. As shown in Fig. 5, there are two main methods by which this spectrum can be detected. The most widely used method involves a scanning spectrometer consisting of a double monochromator in conjunction with a photomultiplier tube (PMT) equipped with photon-counting electronics for maximum sensitivity. In a monochromator, a narrow exit slit of the same size as the entrance slit is used to isolate a single wavelength line from all the wavelengths that strike the focal plane. Different wavelength lines are moved sequentially across the slit and detected as the gratings are slowly rotated by the accurate drive of the spectrometer under computer control. An alternative approach is to operate the monochromator as a spectrograph by removing the exit slit and capturing the dispersed light with spatially sensitive video-type multichannel detectors (MD), such as photodiode arrays, vidicon tubes, and charge-coupled devices (29-31). These detectors are actually arrays of a large number of closely spaced miniature photoelectric detectors (typically 500-2000) that allow an entire section of the Raman spectrum to be measured simultaneously (the spectral coverage depending on the spectrograph resolving power and the width of the MD target), thus enormously reducing the time required to obtain a spectrum. This is particularly advantageous in time-resolved experiments, when short-lived transient species are of interest, or when the sample is undergoing chemical or photochemical changes during data acquisition and/or prolonged exposure to the laser beam. However, one must be aware of some limitations of nonscanning instruments. Only coarse changes in spectral resolution are possible with multichannel detection (via replacement of the gratings), and stray light might compete with Raman scattering in the low-frequency region of the spectrum. Triple monochromators are often employed to circumvent the latter problem, but their lower throughput removes some of the speed advantage of MD detection.

3.4. Sampling Techniques

The beam diameter of laser excitation in Raman spectroscopy is approx 2 mm, and it is reduced to a few microns after focusing, thus decreasing the effective scattering volume to several microliters. This represents the minimum volume of protein expected to yield a useful RR spectrum. Because its beam is also unidirectional, the laser has lifted many restrictions on sample illumination configurations and offers practically unlimited freedom in the design and use of sample cells. Nowadays, the scattered light can be viewed at 90° to the direction of the exciting beam (conventional 90° scattering geometry), at 180° (backscattering), or at any angle in between (frontscattering).
In the latter two geometries, either a small right-angle prism or a small front-surface mirror placed between the sample cell and the focusing lens is used to direct the laser beam to the examination point within the sample. Furthermore, samples may be examined in any physical state, shape, or aggregation. Figure 7 illustrates various ways of positioning the sample in the laser beam that have been found to be useful in studying metalloproteins. The simplest Raman cell for solutions is the glass or quartz capillary tube held transverse to the laser beam, from which scattered radiation is collected at 90°.

[Fig. 7 panel labels: capillary (sealed or flow); cylinder (spin); low-temperature cryotip (frozen solution); NMR tube (spin or stir); microscope (single crystal).]

Fig. 7. Schematic diagram of a variety of useful sampling techniques for measuring Raman spectra of metalloproteins. See text for description.

To protect solutions from evaporation and exposure to oxygen, moisture, and impurities, the ends of quartz capillaries can be sealed with paraffin film or rubber septa, whereas Pyrex™ tubes can be heat-sealed under vacuum or in an inert atmosphere. Microliter quantities of protein can be examined in this way, provided that the sample, being stationary, can withstand the localized heating by the focused laser beam. Such heating is particularly troublesome in RR studies, since the excitation wavelength is intentionally tuned to the protein absorption band, and thermal effects as well as photoeffects are expected to occur. Defocusing the laser beam at the sample often helps to minimize these problems, but a more successful way is to flow the protein solution through the capillary cell. This can be achieved in two ways, depending on the amount of solution available. If only a few milliliters of protein are available, a recirculating closed-loop arrangement can be set up with a peristaltic pump. With larger volumes in hand (5-20 mL), it is preferable to flow the solution through the capillary only once by using a syringe pump. The latter method has the advantage of a smooth motion and of not recycling possible photoproducts through the laser beam. Alternatively, the protein solution can be put inside a rotating glass or quartz cell to distribute the incident radiation over a large volume. A variety of spinning devices have been put forward to accomplish this, and the cylindrical rotating cell introduced by Kiefer and Bernstein (32) in 1971 has been especially popular. About 1 mL of solution is necessary in the smallest practical cylindrical cell. Since this device entails 90° scattering through the corner of the cylinder, it requires a high-quality glass or quartz seal and precise optical alignment. The focus of the laser must fall very close to the wall inside the solution to minimize the absorption of Raman scattered light by the solution itself, but not on the glass wall, since that will cause spurious lines originating within the glass to be observed. Because of this self-absorption problem, the spinning cylinder is most suitable for transparent liquids or solutions. For highly colored solutions, a frontscattering spinning NMR tube arrangement, introduced by Shriver and Dunn (33), has proven to be convenient for many Raman applications. In most applications, the angle of incidence is ca. 135°, as shown, although 180° backscattering may also be obtained if desired. It is a simple and versatile design that allows simultaneous rotation and cooling of the sample by flowing a stream of cold nitrogen over the spinning tube near the sample illumination area. Also, small volumes (as little as 50-100 µL can be used in flat-bottom tubes) and/or air-sensitive samples requiring a confined space can easily be examined in this configuration. A disadvantage of the spinning NMR tube is that there is very little lateral mixing of the solution, and the illuminated volume tends to be limited to a circle around the tube. This problem can be circumvented by inserting a small magnetic stirrer (e.g., a spin-fin) in the tube. All glass sample containers have limitations when used to investigate the low-frequency spectral region (0-500 cm⁻¹), however, because broad nonresonant Raman scattering from the glass or quartz cylinders and NMR tubes produces an ill-defined envelope of bands between 300 and 500 cm⁻¹. These bands often dominate RR spectra in this

region, obscuring weak sample bands.

[Fig. 8 labels: liquid N₂; pump; flat O-ring flange joint; glass-to-metal seal; cryotip; optical flat; laser beam; to spectrometer.]

Fig. 8. A simple liquid-N₂ temperature cryostat for RR studies of frozen protein solutions (34). The protein solution (5-15 µL of 1-10 mM concentration) is placed in a copper cup on the end of a cold finger to give a flat surface. The glass or quartz shroud is clamped over the cold finger, and the sample is frozen by pouring liquid N₂ in the horizontal dewar. Once the sample is frozen, the dewar is turned to a vertical position and evacuated to 10⁻⁴-10⁻⁶ Torr. The dewar, filled with liquid N₂, is transferred to the Raman sample compartment, and the scattered light is collected via 135° frontscattering geometry directly from the surface of the frozen protein solution.

These difficulties have led us to design several low-temperature devices (34-37) that can be used to obtain RR spectra via 135° frontscattering directly from the surface of a frozen protein solution (Figs. 8 and 9). In most applications, the protein solution, optimally having an absorbance between 1 and 2 in a 1-mm absorption cuvet, is placed in a copper cup on the end of a liquid-N₂-cooled finger (77 K) or a cryotip attached to a liquid-He refrigerator (10 K) to give a flat surface with minimal meniscus. Efficient cooling of the

sample permits the use of higher laser powers (300-400 mW), and hence, better quality spectra can be obtained. Also, Raman bands are considerably narrower in the frozen state, since molecular motions responsible for broadening are heavily suppressed. Thus, an increase in spectral resolution is typically observed in comparison to room temperature data (34). Only very small quantities of sample are required (~30 µL), and this is especially valuable for studying biological molecules that are isolated in low yields or may be very expensive. An additional advantage is that the cell atmosphere can easily be controlled, thus allowing protein solutions that require anaerobic handling in certain redox states to be examined. Finally, RR spectra can also be obtained from protein single crystals by directing the laser beam through a microscope and collecting the backscattered light with the microscope objective in a Raman microprobe arrangement.

[Fig. 9 labels: optical flat; frozen solution sample; CCR sample chamber; 2-in. diameter, 1/8-in. thick quartz window; sample holder; flat retaining nut; collection lens; to spectrometer.]

Fig. 9. (A) A miniature Raman cell for frozen protein solutions that, once loaded with sample, can conveniently be shipped between laboratories in dry ice or liquid N₂ and then attached without further manipulation to a helium closed-cycle refrigerator (CCR) station for RR measurements (36). (B) Orientation of the minicell and sampling optics in normal operation. The main advantages of this design are (1) very small quantities of sample are required (one to two drops); (2) the cell atmosphere can be controlled; (3) cryogenic temperatures are obtained, down to 10 K; and (4) Raman scattering originates directly from the surface of a frozen solution without interference from glass or quartz scattering.

3.5. Spectral Interpretation and Band Assignment

Understanding in detail the nature of the vibrational modes being monitored by RR spectroscopy is the key to applying it as a metalloprotein structure probe. The fundamentals of vibrational motion and spectra of molecules in terms of their chemical structures and properties have been extensively treated in the literature (38-43). In this section, we shall concentrate on general considerations only.
1. Resonance excitation is unique in that the enhancement is restricted to Raman lines arising from electrons making up the chromophoric bonds. Often they are the only features observable. For example, resonance with π-π* electronic transitions, which give rise to strong absorption bands of polyene and aromatic chromophores, enhances predominantly stretching modes of the π bonds (Fig. 4). Similarly, resonance with ligand → metal CT transitions, which are present in many proteins containing transition metals at their active site, enhances modes in which the metal-ligand and internal ligand bonds are stretched (Fig. 2). Consequently, the task of ascribing the RR-enhanced bands to a particular structural element is greatly simplified as compared to the IR and normal Raman (i.e., far from resonance) spectra, where exceedingly numerous bands are expected to appear (a molecule consisting of N atoms has 3N - 6 [3N - 5 if it has a linear structure] vibrational modes of motion).
2. Inasmuch as the positions of peaks in the RR spectrum are ground-state vibrational frequencies, the principles of interpretation are, of course, the same as those for IR and normal Raman spectra (44-48). Thus, the

frequencies of the RR scattered light are also interpreted in terms of atomic masses, molecular geometry (bond distances and bond angles), force constants (bond strength), and local environmental influences (e.g., solvent and conformational effects, pH changes, H-bonding, spin and redox states, and so on). It should be noted that complete theoretical and empirical procedures exist for calculating the forms and frequencies of all modes of vibration of relatively small molecules (39). Although it might be complex and time-consuming, an extension of these procedures to large molecules is frequently straightforward, because many submolecular groups of atoms give rise to vibrational bands in characteristic narrow ranges of frequencies (known as group frequencies, localized modes) (44), which can be identified with particular classes of structures.
3. Useful information on the scattering process and on the symmetry of the vibration involved can be obtained by measuring the depolarization ratio, $\rho$, of a Raman band. If the incident radiation is polarized, as it is with a cw laser source, the Raman scattered light can be polarized to various degrees that depend on the nature of the active vibration. The depolarization ratio is given by $\rho = I_{\perp}/I_{\parallel}$, where $I_{\perp}$ and $I_{\parallel}$ are the intensities of Raman light polarized perpendicular and parallel, respectively, to the polarization of the excitation radiation. Experimentally, it is obtained by inserting a polarization analyzer between the sample and the monochromator, which upon rotation by 90° passes $I_{\perp}$ or $I_{\parallel}$ (Fig. 5). For totally symmetric vibrations, which by nature preserve molecular symmetry during the motion of the nuclei, the incident beam polarization is largely maintained in the scattered light, leading to small $\rho$ values ($0 \le \rho < 3/4$, polarized Raman bands). The higher the symmetry of a molecule, the closer to zero is the value of $\rho$. On the other hand, if the vibrational motion distorts the symmetry (nontotally symmetric vibrations), a significant depolarization can occur, producing depolarized Raman bands. From scattering theory, it is predicted that for such vibrations $\rho = 3/4$. Under resonance or near-resonance conditions, however, certain nontotally symmetric modes can also give rise to anomalously ($\rho > 3/4$) or even inversely ($\rho = \infty$, in theory) polarized Raman bands. Nickel octaethylporphyrin provides a good example of the relation between symmetric and various types of nontotally symmetric modes and depolarization ratios (see Fig. 4).
4. The most reliable tool that can be used to aid vibrational interpretation, however, is isotopic substitution (49). Upon isotope replacement, a frequency shift is produced for any vibration that involves appreciable movement of the exchanged atom(s), and the larger the isotopic

mass difference, the greater the frequency shift will be. An example is the shifts in the Fe-S stretching frequencies of the FeS₄ cluster in rubredoxin on forming the ⁵⁴FeS₄ moiety. In their pioneering study of rubredoxin, Long and coworkers (1,2) found a set of four RR bands in the metal-ligand stretching and bending region, just the number of normal modes expected for a tetrahedral iron-sulfur complex. A strong polarized band at 314 cm⁻¹ was assigned to the totally symmetric breathing mode ν₁(A₁), involving the in-phase stretching of all four Fe-S bonds. The other three stretches (there must be four altogether, one for each bond) were assigned to the triply degenerate asymmetric mode ν₃(T₂), observed weakly as a depolarized band at 368 cm⁻¹. The remaining two bands at 126 and 150 cm⁻¹ were identified with the SFeS bending modes, ν₂(E) (doubly degenerate) and ν₄(T₂) (triply degenerate), respectively. Recent reexamination of the Fe-S stretching region at higher resolution in frozen protein solution at 77 K (50) has revealed the situation to be more complex, however, as shown in Figs. 2 and 10. When the low-temperature spectrum was recorded with 568.2-nm excitation, three bands were seen at 376, 363, and 348 cm⁻¹, not one, in the region where ν₃(T₂) was expected, implying that the degeneracy was completely lifted. In addition, a 324 cm⁻¹ shoulder became apparent on the ν₁ band. A firm assignment of the 376, 363, and 348 cm⁻¹ bands to the three ν₃ components was established by examining the isotope shifts in ⁵⁴Fe-reconstituted protein via RR difference spectroscopy (36), as shown in Fig. 10. All three bands showed clear ⁵⁴Fe upshifts, as expected for asymmetric Fe-S stretching vibrations. The dominant RR band at 314 cm⁻¹ did not shift on ⁵⁴Fe substitution, confirming its assignment to the FeS₄ breathing mode, ν₁ (the Fe atom does not move in such a mode). Likewise, a side band at 324 cm⁻¹ showed no discernible ⁵⁴Fe shift; the intensity is cancelled in the difference spectrum over the 314-324 cm⁻¹ band envelope (Fig. 10). This band was therefore assigned to a cysteine SCC bending mode (50).
5. The experimental assignments can be tested by a calculation of the frequencies using the techniques of normal coordinate analysis (NCA) (3944), in which a physically reasonable force field is developed so that the best fit between observed and calculated frequencies can be obtained. Such a calculation requires information on the spatial arrangement of the atoms within the molecule (geometry), the masses of the atoms, and the force constants. A geometric model for the molecule is assumed by utilizing results obtained by X-ray diffraction or other spectroscopic techniques, but those are often not available for metalloproteins, since the former technique, for example, requires well-grown crystals. In such

cases, one must rely on structural parameters of judiciously chosen inorganic model complexes of metalloprotein active sites (vide infra). It should also be noted that the force field is not known in advance, and the usual empirical approach is to extrapolate well-established force constants from small molecules having similar bonding properties (17). Again, isotopic data can be invaluable in this situation, because the correct reproduction of isotope shifts in the NCA calculation provides the most critical test of the adequacy of the force constants used, as well as of the vibrational assignments.
6. Finally, when applying RR spectroscopy to structural problems of metalloproteins, an exploration of protein sites should go hand in hand with a similar study of their model complexes (51-56), particularly those that have the ability to mimic the metalloprotein properties. This is advantageous for several reasons:
a. In every case the exact structures of analogs are available from X-ray crystallographic data;
b. Synthetic routes for preparing such complexes are worked out, and thus more extensive isotope labeling is usually possible;
c. RR spectra can be complemented with infrared absorption spectra, which is rarely the case for metalloproteins;
d. Since molecular parameters and full vibrational signatures are available, detailed NCA calculations can be carried out, and accurate pictures for all 3N - 6 cluster vibrations can be obtained; and
e. The RR spectral pattern and band assignments of analogs will provide the reference points for analyzing the protein spectra, since their respective vibrational patterns must be compatible.
Typical examples include extensive RR studies of metalloporphyrins (15), iron-sulfur complexes (57), and µ-oxo iron and copper dimers (14).

[Fig. 10 plot: traces A, B, and A-B; λ = 568.2 nm excitation; ν₃(T₂) band region labeled; horizontal axis Δν/cm⁻¹ with ticks at 280, 310, 340, 370, and 400.]

Fig. 10. Resonance Raman spectra of oxidized Desulfovibrio gigas rubredoxin (A), its ⁵⁴Fe-substituted protein (B), and the corresponding difference spectrum (A-B) (50), obtained by using a tuning fork difference Raman cell (35). This device moved a divided cup through the focused laser beam, using the swinging motion of a spring attached to a copper cold finger (77 K) and driven at its resonance frequency (30-40 Hz) by a solenoid and magnet. In this way two frozen protein samples were alternately excited, and a synchronous signal was sent to gating and photon-counting electronics to allow independent analysis of the scattered light in two channels. To obtain a difference spectrum, the resulting two independent spectra were subtracted digitally. Both spectra were measured with 568.2-nm Spectra Physics 171 Kr⁺ laser excitation and 4 cm⁻¹ slit widths, whereas the monochromator (SPEX 1401 double monochromator) was advanced in 0.2 cm⁻¹ increments. The protein samples (4 mM) were in 0.05 M Tris-HCl buffer, pH 7.5.
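
The depolarization-ratio and isotope-shift arguments of this section can be made concrete with a small numerical sketch. The Python below is illustrative only: the function names, the 0.05 tolerance on the depolarization ratio, and the use of a single harmonic Fe-S oscillator are assumptions made for this example and are not the chapter's procedure. The diatomic model keeps the force constant fixed and lets only the reduced mass change, so it gives a rough upper estimate of the ⁵⁴Fe shift for a stretch in which the iron atom moves; the rigorous treatment is the normal coordinate analysis described in this section.

```python
import math

# Approximate isotopic masses in atomic mass units.
MASS_U = {"54Fe": 53.9396, "56Fe": 55.9349, "32S": 31.9721}


def depolarization_ratio(i_perp, i_par):
    """rho = I_perp / I_par; infinite when the parallel intensity is zero."""
    if i_par == 0:
        return math.inf
    return i_perp / i_par


def classify_band(rho, tol=0.05):
    """Label a band from rho; tol is an assumed allowance for experimental error."""
    if math.isinf(rho):
        return "inversely polarized"
    if rho > 0.75 + tol:
        return "anomalously polarized"
    if rho >= 0.75 - tol:
        return "depolarized"
    return "polarized"


def reduced_mass(m1, m2):
    """Reduced mass of a two-body oscillator."""
    return m1 * m2 / (m1 + m2)


def diatomic_isotope_shift(freq_cm1, m_old, m_new, m_partner):
    """Shift (cm^-1) of a harmonic diatomic stretch when m_old is replaced by m_new.

    The force constant is assumed unchanged, so the frequency scales as
    1/sqrt(reduced mass).
    """
    mu_old = reduced_mass(m_old, m_partner)
    mu_new = reduced_mass(m_new, m_partner)
    return freq_cm1 * (math.sqrt(mu_old / mu_new) - 1.0)


if __name__ == "__main__":
    print(classify_band(depolarization_ratio(0.1, 1.0)))
    shift = diatomic_isotope_shift(
        363.0, MASS_U["56Fe"], MASS_U["54Fe"], MASS_U["32S"]
    )
    print(f"Estimated 54Fe upshift of a 363 cm^-1 Fe-S stretch: {shift:.1f} cm^-1")
```

Under these assumptions, a stretch near 363 cm⁻¹ is estimated to move up by roughly 2 to 3 cm⁻¹ on ⁵⁴Fe substitution, whereas a breathing mode in which the Fe atom stays at rest would not be expected to shift at all.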

4. Notes

1. Sample purity is often the key to obtaining a good-quality RR spectrum from metalloproteins. For example, light scattering from particulate matter present in the protein solution may ruin a good spectrum by dramatically increasing its background. Low-speed centrifugation or passage through 10-100 µm filters will ensure that the liquids or solutions are optically homogeneous.
2. A more frustrating source of high backgrounds in the RR spectrum, especially for a newcomer to the technique, is luminescence emission resulting from fluorescence or phosphorescence processes, or both. RR scattering, even though enhanced, is inherently less probable than luminescence; thus, it can easily be obscured by a much stronger luminescence signal. Shifting the laser excitation into the blue often reduces this interference, because the absolute positions of Raman peaks change with excitation wavelength (Fig. 1), whereas the wavelength of maximum luminescence does not. Burning off the fluorescence by prolonged irradiation of the sample with high laser powers or adding a quenching agent to the sample solution (e.g., potassium iodide) will also reduce fluorescence in many cases, provided that such treatments do not denature the protein being studied. Since luminescence from impurities is the most common occurrence, however, the best solution is to purify the sample by employing such techniques as dialysis, ultrafiltration, and/or size-exclusion chromatography.
3. When choosing the laser excitation power, it should be remembered that too high a power can cause protein denaturation resulting from localized heating by a tightly focused laser beam. Strong absorption of large amounts of incident radiation in RR spectroscopy may also increase the probability of photodegradation of chromophoric metal sites in biological molecules. In addition to simple reduction of laser power, these problems are satisfactorily eliminated by:
a. Defocusing the laser beam on the sample;
b. Attenuating the incident radiation by inserting neutral density filters;
c. Using a cylindrical lens to produce a line focus on the sample surface;
d. Tuning the laser to the red side of a chromophore’s absorption spectrum;
e. Rotating or stirring the sample;
f. Cooling or freezing the sample; and
g. Any combination of the above.
It is always wise to check the integrity of the protein being studied by monitoring its biological activity and/or other spectroscopic properties before and after recording the RR spectra.
4. It should be remembered that concentrated buffer solutions can contribute Raman peaks, and running a matrix solution of the metalloprotein as a control is recommended. Also, too high a buffer concentration is especially critical for measurements on frozen protein samples, since buffer molecules can readily crystallize around the protein molecules, effectively blocking the exciting radiation. Frozen samples may also exhibit a medium-intensity Raman peak near 230 cm⁻¹ and a much weaker feature near 310 cm⁻¹, which are associated with ice lattice vibrations (34).
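
Note 2 rests on the fact that a Raman band sits at a fixed shift from the laser line, so its absolute wavelength moves with the excitation, while a luminescence maximum does not. The short Python sketch below illustrates the arithmetic. The function name is hypothetical; the 314 cm⁻¹ shift and the 568.2-nm line are taken from the rubredoxin example in this chapter, and the 406.7-nm line is included only as an assumed second, bluer krypton-ion excitation wavelength for comparison.

```python
def stokes_wavelength_nm(excitation_nm, shift_cm1):
    """Absolute wavelength (nm) of a Stokes Raman band.

    The scattered photon's wavenumber is the laser wavenumber minus the
    Raman shift; the factor 1e7 converts between nm and cm^-1.
    """
    laser_cm1 = 1.0e7 / excitation_nm
    return 1.0e7 / (laser_cm1 - shift_cm1)


if __name__ == "__main__":
    for line_nm in (568.2, 406.7):
        band_nm = stokes_wavelength_nm(line_nm, 314.0)
        print(f"{line_nm:.1f} nm excitation -> 314 cm^-1 band at {band_nm:.2f} nm")
```

Moving the laser from 568.2 to 406.7 nm carries the same band well over 100 nm toward the blue on the absolute wavelength scale, which is why a luminescence background that is fixed in wavelength can often be left behind.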

Acknowledgments

The author acknowledges the valuable contributions of coworkers whose work is cited herein and the support from The Robert A. Welch Foundation.


References

1. Long, T. V. and Loehr, T. M. (1970) The possible determination of iron coordination in nonheme iron proteins using laser-Raman spectroscopy. Rubredoxin. J. Am. Chem. Soc. 92, 6384-6386.
2. Long, T. V., Loehr, T. M., Alkins, J. R., and Lovenberg, W. (1971) Determination of iron coordination in nonheme iron using laser-Raman spectroscopy. II. Clostridium pasteurianum rubredoxin in aqueous solution. J. Am. Chem. Soc. 93, 1809-1811.
3. Spiro, T. G. (1974) Resonance Raman spectroscopy: A new structure probe for biological chromophores. Acc. Chem. Res. 7, 339-344.
4. Spiro, T. G. and Gaber, B. P. (1977) Laser Raman scattering as a probe of protein structure. Annu. Rev. Biochem. 46, 553-572.
5. Que, L., Jr. (ed.) (1988) Metal Clusters in Proteins. ACS Symposium Series 372, American Chemical Society, Washington, DC.
6. Carey, P. R. (1982) Biochemical Applications of Raman and Resonance Raman Spectroscopies. Academic, New York.
7. Tu, A. T. (1982) Raman Scattering in Biology. Wiley, New York.
8. Parker, F. S. (ed.) (1983) Applications of Infrared, Raman and Resonance Raman Spectroscopy in Biochemistry. Plenum, New York.
9. Moore, C. B. (ed.) (1974 to present) Chemical and Biochemical Applications of Lasers, vol. 1 to present, Academic, New York.
10. Clark, R. J. H. and Hester, R. E. (eds.) (1970 to 1985) Advances in Infrared and Raman Spectroscopy, vols. 1-12, and (1986 to present) Advances in Spectroscopy, vol. 13 to present, Wiley, Chichester.
11. Spiro, T. G. (ed.) (1988) Biological Applications of Raman Spectroscopy, vols. 1-3, Wiley-Interscience, New York.
12. Spiro, T. G. and Stein, P. (1977) Resonance effect in vibrational scattering from complex molecules. Annu. Rev. Phys. Chem. 28, 501-521.
13. Albrecht, A. C. (1961) On the theory of Raman intensities. J. Chem. Phys. 34, 1476-1484.
14. Spiro, T. G. (ed.) (1988) Biological Applications of Raman Spectroscopy, vol. 3, Resonance Raman Spectra of Heme and Metalloproteins, Wiley-Interscience, New York.
15. Spiro, T. G., Czernuszewicz, R. S., and Li, X.-Y. (1990) Metalloporphyrin structure and dynamics from resonance Raman spectroscopy. Coord. Chem. Rev. 100, 514-571.
16. Gouterman, M. (1979) Optical spectra and electronic structure of porphyrins and related rings, in Porphyrins, vol. III, Part A (Dolphin, D., ed.), Academic, New York, pp. 1-156.
17. Li, X.-Y., Czernuszewicz, R. S., Kincaid, J. R., Stein, P., and Spiro, T. G. (1990) Consistent porphyrin force field. 2. Nickel octaethylporphyrin skeletal and substituent mode assignments from ¹⁵N, meso-d4, and methylene-d16 Raman and infrared isotope shifts. J. Phys. Chem. 94, 47-61.
18. Li, X.-Y., Czernuszewicz, R. S., Kincaid, J. R., and Spiro, T. G. (1989) Consistent porphyrin force field. 3. Out-of-plane modes in the resonance Raman spectra of planar and ruffled nickel octaethylporphyrin. J. Am. Chem. Soc. 111, 7012-7023.
19. Czernuszewicz, R. S., Li, X.-Y., and Spiro, T. G. (1989) Nickel octaethylporphyrin ruffling dynamics from resonance Raman spectroscopy. J. Am. Chem. Soc. 111, 7024-7031.
20. Czernuszewicz, R. S., Macor, K. A., Li, X.-Y., Kincaid, J. R., and Spiro, T. G. (1989) Resonance Raman spectroscopy reveals a1u vs a2u character and pseudo-Jahn-Teller distortion in radical cations of Ni(II), Cu(II), and ClFe(III) octaethyl- and tetraphenylporphyrins. J. Am. Chem. Soc. 111, 3860-3869.
21. Atkinson, G. H. (1983) Time-Resolved Vibrational Spectroscopy. Academic, New York.
22. Hamaguchi, H. (1987) Transient and time-resolved resonance Raman spectroscopy of short-lived intermediate species, in Vibrational Spectra and Structure, vol. 16 (Durig, J. R., ed.), Dekker, New York, pp. 227-309.
23. Rousseau, D. L. and Friedman, J. M. (1988) Transient and cryogenic studies of photodissociated hemoglobin and myoglobin, in Biological Applications of Raman Spectroscopy, vol. 3 (Spiro, T. G., ed.), Wiley, New York, pp. 133-215.
24. Tsuboi, M., Nishimura, Hirakawa, A. Y., and Peticolas, W. (1988) Resonance Raman spectroscopy and normal modes of the nucleic acid bases, in Biological Applications of Raman Spectroscopy, vol. 2 (Spiro, T. G., ed.), Wiley, New York, pp. 109-179.
25. Hudson, B. S. and Mayne, L. C. (1988) Peptides and protein side chains, in Biological Applications of Raman Spectroscopy, vol. 2 (Spiro, T. G., ed.), Wiley, New York, pp. 181-209.
26. Kiefer, W. (1977) Recent techniques in Raman spectroscopy, in Advances in Infrared and Raman Spectroscopy, vol. 3 (Clark, R. J. H. and Hester, R. E., eds.), Heyden, London, pp. 1-42.
27. Strommen, D. P. and Nakamoto, K. (1984) Laboratory Raman Spectroscopy, Wiley, New York, pp. 16-20.
28. Gardiner, D. J. and Graves, D. J. (eds.) (1989) Practical Raman Spectroscopy, Springer-Verlag, Berlin.
29. Talmi, Y. (1982) Spectrophotometry and spectrofluorometry with the self-scanned photodiode array. Appl. Spectrosc. 36, 1-18.
30. Jones, D. G. (1985) Photodiode array detectors in UV-VIS spectroscopy. Part I. Anal. Chem. 57, 1057A-1073A.
31. Pemberton, J. E., Sobocinski, R. L., Bryant, M. A., and Carter, D. A. (1990) Raman spectroscopy using charge-coupled device detection. Spectroscopy 5, 26-36.
32. Kiefer, W. and Bernstein, H. J. (1971) A cell for resonance Raman excitation with lasers in liquids. Appl. Spectrosc. 25, 500,501.
33. Shriver, D. F. and Dunn, J. B. R. (1974) The backscattering geometry for Raman spectroscopy of colored materials. Appl. Spectrosc. 28, 319-323.
34. Czernuszewicz, R. S. and Johnson, M. K. (1983) A simple low-temperature cryostat for resonance Raman studies of frozen protein solutions. Appl. Spectrosc. 37, 297-298.


35. Eng, J. F., Czernuszewicz, R. S., and Spiro, T. G. (1985) Raman difference spectroscopy via backscattering from a spinning tube and from a low-temperature tuning fork. J. Raman Spectrosc. 16, 432-437.
36. Czernuszewicz, R. S. (1986) Closed-cycle refrigerator solution and rotating solid sample cells for anaerobic resonance Raman spectroscopy. Appl. Spectrosc. 40, 571-573.
37. Drozdzewski, P. M. and Johnson, M. K. (1988) A simple anaerobic cell for low-temperature Raman spectroscopy. Appl. Spectrosc. 42, 1575-1577.
38. Herzberg, G. (1945) Molecular Spectra and Molecular Structure. II. Infrared and Raman Spectra of Polyatomic Molecules, Van Nostrand Reinhold, Princeton.
39. Wilson, E. B., Jr., Decius, J. C., and Cross, P. C. (1955) Molecular Vibrations: The Theory of Infrared and Raman Vibrational Spectra. McGraw-Hill, New York.
40. Steele, D. (1971) Theory of Vibrational Spectroscopy. W. B. Saunders, Philadelphia.
41. Woodward, L. A. (1972) Introduction to the Theory of Molecular Vibrations and Vibrational Spectroscopy. Oxford University Press, London.
42. Cyvin, S. J. (1972) Molecular Structure and Vibrations. Elsevier, Amsterdam.
43. Nakamoto, K. (1986) Infrared and Raman Spectra of Inorganic Coordination Compounds, 4th ed. Wiley-Interscience, New York.
44. Colthup, N. B., Daly, L. H., and Wiberley, S. E. (1990) Introduction to Infrared and Raman Spectroscopy, 3rd ed. Academic, New York.
45. Dollish, F. R., Fateley, W. G., and Bentley, F. F. (1974) Characteristic Raman Frequencies of Organic Compounds, 2nd ed. Academic, New York.
46. Adams, D. M. (1967) Metal-Ligand and Related Vibrations. Edward Arnold, London.
47. Szymanski, H. A. (1964, 1966, 1967) Interpreted Infrared Spectra, vols. I-III, Plenum, New York.
48. Ferraro, J. R. (1971) Low Frequency Vibrations of Inorganic and Coordination Compounds. Plenum, New York.
49. Mohan, N., Muller, A., and Nakamoto, K. (1970) The metal isotope effect on molecular vibrations, in Advances in Infrared and Raman Spectroscopy, vol. 1 (Clark, R. J. H. and Hester, R. E., eds.), Heyden, London, pp. 173-226.
50. Czernuszewicz, R. S., LeGall, J., Moura, I., and Spiro, T. G. (1986) Resonance Raman spectra of rubredoxin: New assignments and vibrational coupling mechanism from iron-54/iron-56 isotope shifts and variable-wavelength excitation. Inorg. Chem. 25, 696-700.
51. Yachandra, V. K., Hare, J., Gewirth, A., Czernuszewicz, R. S., Kimura, T., Holm, R. H., and Spiro, T. G. (1983) Resonance Raman spectra of spinach ferredoxin and adrenodoxin and of analog complexes. J. Am. Chem. Soc. 105, 6462-6468.
52. Han, S., Czernuszewicz, R. S., and Spiro, T. G. (1989) Vibrational spectra and normal mode analysis for [2Fe-2S] protein analogues using ³⁴S, ⁵⁴Fe, and ²H substitution: coupling of Fe-S stretching and S-C-C bending modes. J. Am. Chem. Soc. 111, 3496-3504.
53. Han, S., Czernuszewicz, R. S., Kimura, T., Adams, M. W. W., and Spiro, T. G. (1989) Fe2S2 protein resonance Raman spectra revisited: structural variations among adrenodoxin, ferredoxin, and red paramagnetic protein. J. Am. Chem. Soc. 111, 3505-3511.
54. Johnson, M. K., Czernuszewicz, R. S., Spiro, T. G., Fee, J. A., and Sweeney, W. V. (1983) Resonance Raman spectroscopic evidence for a common [3Fe-4S] structure among proteins containing three-iron centers. J. Am. Chem. Soc. 105, 6671-6678.
55. Czernuszewicz, R. S., Macor, K. A., Johnson, M. K., Gewirth, A., and Spiro, T. G. (1987) Vibrational mode structure and symmetry in proteins and analogues containing Fe4S4 clusters: resonance Raman evidence for different degrees of distortion in HiPIP and ferredoxin. J. Am. Chem. Soc. 109, 7178-7187.
56. Czernuszewicz, R. S., Sheats, J. E., and Spiro, T. G. (1987) Resonance Raman spectra and excitation profile for [Fe2O(O2CCH3)2(HB(pz)3)2], a hemerythrin analogue. Inorg. Chem. 26, 2063-2067.
57. Spiro, T. G., Czernuszewicz, R. S., and Han, S. (1988) Iron-sulfur proteins and analog complexes, in Biological Applications of Raman Spectroscopy, vol. 3 (Spiro, T. G., ed.), Wiley, New York, pp. 523-553.

CHAPTER 16

The Application of X-Ray Absorption Spectroscopy to Characterize Metal Centers in Proteins

C. David Garner

From: Methods in Molecular Biology, Vol. 17: Spectroscopic Methods and Analyses: NMR, Mass Spectrometry, and Metalloprotein Techniques. Edited by: C. Jones, B. Mulloy, and A. H. Thomas. Copyright © 1993 Humana Press Inc., Totowa, NJ

1. Introduction

The availability of synchrotron radiation has transformed X-ray absorption spectroscopy (XAS) from a topic of relatively minor interest into one of major scientific importance and activity. A continuum of electromagnetic radiation with high intensity throughout a large part of the X-ray region has permitted reliable XAS data to be obtained for a wide range of elements. XAS is ideally suited to probing the immediate environment of specific atoms in a material, irrespective of its physical state.

X-ray crystallography is rightly regarded as the most powerful structural technique for providing the architectural details of a protein molecule. However, sometimes the resolution of crystallographic data is insufficient to draw meaningful conclusions concerning the detailed nature of a metal center within a protein. Also, some proteins refuse to crystallize or to yield crystals suitable for a high-resolution structure determination. Furthermore, it is important to establish structural details for proteins, and especially their catalytic centers, when maintained at conditions similar to their working environment: typically, in aqueous media in the presence of substrate, inhibitor, or suitable redox partner.

XAS is a local structural probe, the information content of which derives from electron diffraction. For a metalloprotein, the electron source and detector is the metal atom that is probed, since selective excitation is achieved by scanning a range of X-ray wavelengths particularly appropriate to the element of central interest. The selectivity and the local nature of the diffraction process give the technique its major strength. For example, metal-ligand distances can be determined to an accuracy of ca. ±0.02 Å (see Notes 1 and 2). In addition, XAS does not require crystalline materials; thus, aqueous protein samples are readily probed under a variety of conditions.

Since 1975, when the first measurement of an X-ray absorption spectrum was recorded using a synchrotron radiation source (1), XAS has become established as an important technique for probing the environment of transition metals in proteins. Numerous studies have been accomplished, and many significant advances made. The majority of these have been identified and collated in several excellent reviews (e.g., 2-8) to which the reader is referred. This chapter provides a qualitative description of the theoretical basis of the technique and outlines important experimental considerations.

2. Theoretical Basis

2.1. Description

The absorption of X-rays by a material may be expressed as Eq. (1):

$$I = I_0 e^{-\mu x} \tag{1}$$

where $I_0$ = incident X-ray intensity; $I$ = transmitted intensity; $\mu$ = absorption coefficient; $x$ = thickness of absorber.

Figure 1 contains an illustration of a typical X-ray absorption spectrum, that for Cu,Zn-metallothionein. For all materials, the generally smooth variation of absorption with increasing photon energy is punctuated by sharp increases in absorption, called absorption edges. These occur at positions where the incident photons are of an energy that is just sufficient to promote an electron from a core level of a particular atom to a valence level and beyond to an unbound state. For a 1s (or K-shell) electron, K-absorption edges result, and Fig. 1 shows the selectivity inherent in the technique, in that the copper and zinc K-edges are clearly resolved. Above the absorption edge, $\mu$ decreases with increasing photon energy, and apart from gaseous monoatomic species, low-amplitude oscillations are usually observed up to several hundred eV above the edge. Historically, and because of the different theoretical treatment necessary to interpret the data, it has been customary to classify the oscillations within ca. 50 eV of the edge as the X-ray Absorption Near Edge Structure (XANES) and those that extend beyond this region as the Extended X-ray Absorption Fine Structure (EXAFS). The theoretical basis of the latter is considered mature, and interpretations of EXAFS data have been reported confidently for over a decade. In contrast, progress in the understanding and, therefore, the application of XANES has been relatively slow, and data in this spectral region are generally not interpreted, but used qualitatively to “fingerprint” a metal site.

Fig. 1. The X-ray absorption spectrum of Cu,Zn-metallothionein, showing the Cu and Zn K-edges, the demarcation between the X-ray Absorption Near Edge Structure (XANES) and the Extended X-ray Absorption Fine Structure (EXAFS), and a pictorial representation of the origin of EXAFS. As the energy of the incident X-ray changes, so does the wavelength of the emitted photoelectron. This leads to constructive or destructive interference between the outgoing and backscattered wave, changing the absorption coefficient of the material.

X-ray absorption spectra generally contain two other types of information: (1) The actual energy of a particular absorption edge of an element depends on the oxidation state and the nature of the immediate chemical environment of that element. Typically, one unit increase of oxidation state increases the energy of the edge by 1-3 eV. (2) The excitation of a core electron into the continuum may be convoluted with transitions to the valence levels. These promotions give rise to preedge and edge features that can provide information concerning the chemical nature and electronic structure of the primary absorber.

2.2. EXAFS Interpretation

The EXAFS (or XANES) amplitude $\chi(k)$ is defined as the modulation of the absorption coefficient, $\mu$, of a particular atom relative to the smooth background absorption coefficient, $\mu_0$, normalized by the absorption coefficient that would be observed for a free atom (Eq. [2]).

$$\chi(k) = (\mu - \mu_0)/\mu_0 \tag{2}$$

Although EXAFS and XANES are recorded as a function of energy, it is conventional to plot the data as a function of $k$, the photoelectron wave vector (Eq. [3]):

$$k = (2\pi/h)\,[2m(E - E_0)]^{1/2} \tag{3}$$

where $E$ is the energy of the X-ray photon; $E_0$ is the energy of the absorption threshold; $m$ is the mass of the electron; and $h$ is Planck's constant.

Lee and Pendry (9) and Ashley and Doniach (10) showed that, except for the energies very close to the absorption threshold, a single scattering formalism is usually sufficient to describe the observed EXAFS. The effect is shown pictorially in Fig. 1, which depicts a photoelectron being generated by the absorption of an X-ray photon by a copper atom and backscattered from an adjacent atom (see Note 3). As the photon beam energy is smoothly increased beyond the ionization threshold, the wavelength of the photoelectron decreases, and there is a smooth and continuous movement through constructive, then destructive, then constructive (and so on) interference of the outgoing and backscattered waves. Constructive and destructive interference respectively raise and lower the absorption coefficient, as compared to the free atom value. When the energy of the photoelectron is sufficiently high, the curvature of the photoelectron wave can be neglected, and the theory can be greatly simplified as the plane-wave approximation. In this approximation, the oscillatory EXAFS function, $\chi(k)$, associated with a K-absorption edge may be written as Eq. (4):

$$\chi(k) = -\sum_j 3\cos^2\theta_j \,\frac{N_j}{kR_j^2}\,|f_j(\pi)|\,\sin(2kR_j + 2\delta_1 + \phi_j)\,\exp(-2\sigma_j^2k^2)\,\exp(-2R_j/\lambda) \tag{4}$$
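Before turning to the individual terms of Eq. (4), it may help to put numbers on Eq. (3) and on the statement that the photoelectron wavelength shortens as the photon energy rises above the threshold. The following is a minimal sketch, not part of the original chapter. It assumes energies in eV and lengths in Å, uses the standard value of about 3.81 eV Å² for ħ²/2m (with ħ = h/2π), and takes 8979 eV as an illustrative copper K-edge threshold. The function names are hypothetical.

```python
import math

# hbar^2 / (2 m_e) in eV * Angstrom^2 (standard constant, rounded).
HBAR2_OVER_2M = 3.81


def photoelectron_k(energy_ev, e0_ev):
    """Photoelectron wave vector k in 1/Angstrom, following Eq. (3)."""
    if energy_ev <= e0_ev:
        raise ValueError("photon energy must exceed the threshold E0")
    return math.sqrt((energy_ev - e0_ev) / HBAR2_OVER_2M)


def photoelectron_wavelength(energy_ev, e0_ev):
    """De Broglie wavelength of the photoelectron in Angstrom (2*pi/k)."""
    return 2.0 * math.pi / photoelectron_k(energy_ev, e0_ev)


if __name__ == "__main__":
    e0 = 8979.0  # illustrative threshold near the Cu K-edge (assumed value)
    for above_edge in (50.0, 200.0, 500.0):
        k = photoelectron_k(e0 + above_edge, e0)
        wavelength = photoelectron_wavelength(e0 + above_edge, e0)
        print(f"E - E0 = {above_edge:5.0f} eV  k = {k:5.2f} 1/A  wavelength = {wavelength:4.2f} A")
```

Under these assumptions, 50 eV above the edge corresponds to k of about 3.6 Å⁻¹ (wavelength about 1.7 Å), and 500 eV to about 11.5 Å⁻¹ (about 0.55 Å). The wavelength is therefore comparable to typical metal-ligand distances over the EXAFS range, which is why the interference condition changes repeatedly as the energy is scanned.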

Equation (4) shows the structural basis of EXAFS, in that $\chi(k)$ is dependent on:

1. The number of scattering atoms, $N_j$;
2. The distance of the scattering atoms, $R_j$, from the primary absorber;
3. The type of scattering atom, through the characteristic energy dependence of the backscattering amplitude $|f_j(\pi)|$, the magnitude of which generally increases with increasing Z;
4. The $2\delta_1$ phase shift due to the potential of the emitting atom; and
5. The phase of the backscattering factor, $\phi_j$.
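The dependences listed above can be explored with a short numerical sketch of the plane-wave expression, Eq. (4). This is an illustration only. It assumes the angular term 3cos²θ is replaced by its average of 1 (solutions and polycrystalline samples), and it uses deliberately crude placeholders for the backscattering amplitude and phase (a constant and zero, respectively), whereas real values vary with k and with the scattering element. All function names and shell parameters are hypothetical.

```python
import math


def shell_chi(k, n, r, sigma2, mfp, f_amp, phi, delta1=0.0):
    """One-shell term of Eq. (4), with 3*cos^2(theta) taken as its average of 1.

    k      photoelectron wave vector (1/Angstrom)
    n      number of scattering atoms in the shell (N_j)
    r      absorber-scatterer distance in Angstrom (R_j)
    sigma2 Debye-Waller factor in Angstrom^2 (sigma_j^2)
    mfp    elastic mean-free path in Angstrom (lambda)
    f_amp  callable giving the backscattering amplitude |f_j(pi)| at k
    phi    callable giving the backscattering phase at k
    delta1 absorber phase shift (enters the sine as 2*delta1)
    """
    amplitude = n * f_amp(k) / (k * r * r)
    damping = math.exp(-2.0 * sigma2 * k * k) * math.exp(-2.0 * r / mfp)
    return -amplitude * damping * math.sin(2.0 * k * r + 2.0 * delta1 + phi(k))


def chi(k, shells, mfp, f_amp, phi):
    """Sum the one-shell terms over a list of (N_j, R_j, sigma_j^2) tuples."""
    return sum(shell_chi(k, n, r, s2, mfp, f_amp, phi) for n, r, s2 in shells)


def flat_amplitude(k):
    """Placeholder amplitude (assumption): constant, independent of k and Z."""
    return 1.0


def zero_phase(k):
    """Placeholder phase (assumption): no backscattering phase shift."""
    return 0.0


if __name__ == "__main__":
    shells = [(4, 2.0, 0.005)]  # invented shell: 4 atoms at 2.0 A
    for k in (4.0, 8.0, 12.0):
        print(k, chi(k, shells, 6.0, flat_amplitude, zero_phase))
```

In this form the amplitude scales linearly with the number of scatterers, while the Debye-Waller term damps the signal progressively at high k; both act on the amplitude alone, whereas the distance enters mainly through the argument of the sine.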

The mean square variation in $R_j$ is represented by the Debye-Waller factor,* $\sigma_j^2$; $\lambda$ is the elastic mean-free path of the photoelectron, and it is the damping term [$\exp(-2R_j/\lambda)$] that invariably limits backscattering contributions to <4 Å from a metal atom in a biological system. The jth neighbor makes an angle $\theta_j$ with the polarization vector of the incident X-ray, and the term $3\cos^2\theta_j$ averages to 1 for solutions and polycrystalline samples.

*The Debye-Waller parameter used here is not the same as that used in X-ray crystallography. The latter refers to the ellipsoid of atom movement, whereas EXAFS includes a consideration of atom positions along the metal-ligand direction only.

At lower photoelectron energies, the plane-wave approximation (Eq. [4]) breaks down and leads to errors in the calculated phase, which in turn can result in an incorrect determination of the interatomic distances. Furthermore, it is this low-energy part of the EXAFS spectrum that usually contains most of the backscattering from the low-Z backscatterers. This is particularly important in the case of biological systems, where the range of the EXAFS data is often limited since backscattering is weaker from low-Z atoms and it is the location of atoms, such as nitrogen or oxygen, that is crucial to many investigations. The low-energy part of the EXAFS spectrum can be analyzed by use of the exact theory given by Lee and Pendry (9), which takes account of the curvature of the electron wave and thus has been named the “spherical wave method.” Unfortunately, this exact theory has not been used in a majority of studies because of its mathematical complexity and requirement for large computational time. However, this exact theory has been simplified by performing an average over the angular positions of the scattering atoms relative to the X-ray beam direction (11) and has been consolidated in the Daresbury analysis program EXCURV. This simplification is strictly only applicable to data analysis for polycrystalline or amorphous samples, but is well suited for virtually all studies of transition metals in biological systems.

The first stage in EXAFS data analysis is the removal of the background absorption, the extraction of the EXAFS, and its normalization to that for a unit metal atom. An inadequate removal of the background can result in a number of deficiencies in the EXAFS data, which may lead to inaccurate determination of the amplitude function and/or distortions of the low “R” contribution. Often it is convenient to weight the data by $k^m$ (usually m = 2 or 3) to enhance the data at high k. Figure 2 shows the iron K-edge EXAFS × $k^3$ recorded for the iron-molybdenum cofactor (FeMoco) extracted from the FeMo-protein of the nitrogenase of Klebsiella pneumoniae (12), together with its Fourier transform.

Fig. 2. $k^3$-Weighted EXAFS data associated with the iron K-edge of the iron-molybdenum cofactor (FeMoco) extracted from the FeMo-protein of the nitrogenase of Klebsiella pneumoniae and its Fourier transform (12). Axes: photoelectron wave vector (k, Å⁻¹) for the EXAFS; metal-ligand distance (Å) for the Fourier transform.

2.3. Fourier Transform

Sayers et al. (13) recognized that the Fourier transform of $\chi(k)$ would yield a radial distribution function, providing a visual representation of the atomic arrangement about the primary absorber. This peaks at distances close to the corresponding $R_j$ values (Eq. [4]), the area of the peak depending upon the number of backscattering atoms ($N_j$) within the shell and their backscattering amplitude $|f_j(\pi)|$. Unless a correction is applied, the peaks in the Fourier transform occur at a lower $R_j$ than the actual value because of the phase shift that the backscattered photoelectron experiences. This was recognized by Sayers et al., and they subsequently showed that the phase shift of an absorber and scatterer pair can be extracted from a related compound of known structure and used to obtain an accurate value for the corresponding distance in an unknown system. Alternatively, it is possible to calculate the absorber and backscatterer contributions to the phase shift separately (9). Both of these approaches have been successfully used for structural studies of metal centers in proteins.

2.4. EXAFS Analysis and Accuracy

EXAFS data analysis consists of varying $E_0$ and the values of, at least, $R_j$, $N_j$, and $\sigma_j^2$ for each shell of atoms adjacent to the primary absorber, to produce optimum agreement between the experimental and simulated data and their Fourier transforms. The level of agreement may be assessed by a fit index (e.g., Eq. [5]), summed over all n points. However, a low fit index should not be accepted at the expense of deficiencies obvious from a visual inspection. The level of improvement achieved by addition of an extra shell needs to be carefully monitored and the statistical significance assessed (14).

$$FI = \frac{1}{100\,n}\sum_{i=1}^{n}\left[k_i^3\left(\chi_{\mathrm{exp}}(i) - \chi_{\mathrm{th}}(i)\right)\right]^2 \tag{5}$$
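A fit index of this kind is simple to evaluate once the experimental and simulated EXAFS are tabulated on the same k grid. The sketch below assumes a k-weighted sum of squared residuals divided by 100n, with k³ weighting as the default; the exact normalization used by a given analysis program should be checked against its documentation. The function name, argument names, and sample values are hypothetical.

```python
def fit_index(k, chi_exp, chi_th, weight=3):
    """k-weighted least-squares fit index over n data points.

    k        sequence of photoelectron wave vectors (1/Angstrom)
    chi_exp  experimental EXAFS values at each k
    chi_th   simulated (theoretical) EXAFS values at each k
    weight   power m of the k**m weighting (assumed default: 3)
    """
    n = len(k)
    if n == 0 or len(chi_exp) != n or len(chi_th) != n:
        raise ValueError("k, chi_exp and chi_th must be non-empty and of equal length")
    total = sum(
        (ki ** weight * (exp_i - th_i)) ** 2
        for ki, exp_i, th_i in zip(k, chi_exp, chi_th)
    )
    return total / (100.0 * n)


if __name__ == "__main__":
    # Invented three-point data set, for illustration only.
    k = [4.0, 6.0, 8.0]
    experiment = [0.050, -0.020, 0.010]
    simulation = [0.048, -0.021, 0.011]
    print(fit_index(k, experiment, simulation))
```

Because the residuals are multiplied by a power of k, mismatches at high k contribute heavily to the index. This is one reason a numerically small value still needs to be checked by visual inspection of both the EXAFS and its Fourier transform.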

The major source of uncertainty in parameter determination by EXAFS analysis arises from the correlation that exists between the coordination number ($N_j$) and the Debye-Waller factor ($\sigma_j^2$) for each shell. This correlation occurs through the amplitude of the backscattered wave (Eq. [4]) and results in the uncertainty in $N_j$ being ±25%. The primary manifestation of $R_j$ is in the relative phases of the outgoing and backscattered waves, and in general, $R_j$ values can be determined to an accuracy of ca. ±0.02 Å. The atomic number of the backscattering atoms cannot usually be determined to a precision better than ±1, since $|f_j(\pi)|$ and $\phi_j$ (Eq. [4]) do not show a marked dependence on Z. The present limitations in the accuracy with which $N_j$ and Z values can be determined are especially frustrating for the characterization of metal centers in proteins. Therefore, wherever possible, the information obtained from other spectroscopic and structural techniques should be incorporated into the EXAFS analysis to help remove ambiguity.

2.5. Multiple Scattering

Multiple scattering contributions are generally not important in the EXAFS region. However, this is not so for systems that have colinear arrangements of three or more atoms. For transition metals in biological systems, multiple scattering effects are very important for M-C≡O and M-C≡N moieties and for coordination by imidazole groups (Fig. 3). Theoretical developments (15), allowing the inclusion of double- and triple-scattering contributions, have allowed satisfactory data analyses of multiple scattering contributions (16). These and other (17) developments are advancing the possibilities of successfully interpreting the XANES region, where extensive multiple scattering of the photoelectron occurs.

Fig. 3. The important multiple-scattering pathways of an imidazole group that will be manifest in the copper K-edge EXAFS are indicated by the broken lines.

3. Experimental Considerations

The basic arrangement of apparatus for recording an X-ray absorption spectrum is illustrated in Fig. 4.

3.1. Source

Within an X-ray absorption spectrum, the signal is generally small compared with the atomic absorption resulting from the excitation of the core electron of the atom of interest (Fig. 1). Thus, good-quality EXAFS and XANES data require an intense and stable X-ray source.

Fig. 4. Schematic arrangement for measurement of an X-ray absorption spectrum. (Labeled components: X-ray, collimation slit, monochromator, ion chamber, sample position, ion chamber.)

Fig. 5. Spectral distribution from the Daresbury Synchrotron Radiation Source (SRS, 2 GeV) at the normal bending and wiggler magnets. (Horizontal axis: wavelength, Å.)

To date, there has been no XAS study of a metalloprotein using a laboratory X-ray source, such as a rotating anode; all such studies have required synchrotron radiation sources. A recent valuable development at many such facilities has been the inclusion of insertion devices, especially wavelength shifters and multipole wiggler magnets, to increase the intensity available, particularly at shorter wavelengths. Synchrotron radiation sources (Fig. 5) are continuous and intense X-ray sources for the study of the K-edge XAS of elements with Z < 65. However, there are two limitations to this range: (1) The application of XAS to elements with Z < 20 is limited at many facilities because of the use of beryllium windows on the majority of the X-ray beam lines of synchrotron radiation sources; these are necessary to separate the high vacuum of the source from the experiment. (2) The large lifetime broadening owing to the limited electron and hole lifetime at high excitation energies places an upper limit on Z. The heaviest element that has been investigated by K-edge EXAFS is iodine (Z = 53), which has its K-edge at 0.37 Å (33,500 eV) and a lifetime broadening >7 eV. For the higher-Z elements, L-edge XAS can be recorded, concomitant with the excitation of a 2s or 2p electron; typically, L-edge data can be recorded for molybdenum (Z = 42) to uranium (Z = 92).

3.2. Monochromator

The “channel-cut” monochromator is the simplest type employed experimentally. A channel is cut in a perfect crystal (e.g., Si) to provide two parallel reflecting surfaces that have a particular crystal plane (e.g., the Si[220]) parallel to the surface. The Bragg condition is used to select a particular wavelength, and the reflected beam emerges parallel to the incident beam, but vertically displaced by $2D\cos\theta$, where D is the distance between the two faces and $\theta$ the angle between the beam and the Bragg planes. The accuracy of data collected using channel-cut crystal monochromators may be limited because of harmonic contamination of the reflected beam. Harmonic suppression can be achieved by use of a double crystal monochromator that has the two crystal faces slightly offset; this offset is chosen to give a high acceptance of the particular fundamental wavelength together with good rejection of its harmonics.

3.3. Detectors

The conventional XAS experiment involves the direct measurement of the incident and transmitted beam intensity using ionization chambers. The first chamber contains a weakly absorbing gas that permits ca. 70% of the incident radiation to fall on the sample, and the second ionization chamber contains a mixture of inert gases that will absorb virtually all of the transmitted intensity. The measured absorption coefficient comprises that owing to the matrix ($\mu_M$) and that owing to the atom of interest ($\mu_A$). The application of the transmission method is ultimately limited by the incident number of photons and the ratio of $\mu_M$ to $\mu_A$. In cases where $\mu_M/\mu_A$ = 1, it is difficult to use the transmission method, and for ratios >10, it is almost impossible.

The detection sensitivity can be enhanced if a discrimination can be made between the matrix and host absorption. X-ray fluorescence offers just this possibility. Since the fluorescence yield is practically independent of the excitation energy over an EXAFS spectrum (~1000 eV above the edge), a change in the absorption cross-section is directly reflected by a change in the fluorescence yield. This increased contrast arises since the fluorescence of the element of interest, in the region of one of its absorption edges, is considerably greater than that of the lighter matrix atoms. Fluorescence detection is now a standard procedure for recording XAS for metal atoms in biological systems. Originally, Tl-doped NaI scintillators were employed, but now a new generation of solid-state detectors, with improved sensitivity and stability, is favored. These allow data to be collected at concentrations ≤1 mM in the element of interest.

3.4. Sample

The basic requirement for any sample to be investigated by XAS is that the element of interest be present at a concentration of at least (say) 1 mM, for a volume of ca. 0.5 mL. Although both of these limits can be lowered, the present technical specifications for sources and detectors mean that for concentrations <1 mM, although the edge may be distinct, the EXAFS profiles will generally be of poor quality and limited range; therefore, multiple scans (say up to 16) will be essential for any meaningful interpretation of the data. Brighter sources and detectors with an enhanced discrimination of the signal above the background absorption will lead to improved sensitivity in the foreseeable future, but an order of magnitude improvement is not envisaged (see Note 4).

Recording data at low (77 or 4 K) temperature reduces the Debye-Waller parameter, and this usually permits extension of the data range with a consequent improvement in resolution. Therefore, it is advantageous to study metalloproteins in a frozen glass contained between thin plastic (e.g., Mylar) windows glued to a robust (e.g., aluminium or perspex) frame. The solution containing the necessary buffers, salts, and other mediators should be made up at room temperature, injected into the cell, and rapidly frozen to liquid nitrogen temperature. An “antifreeze” agent (e.g., glycerol or ethylene glycol) should be added to prevent crystallization; should this occur, Bragg reflections may dominate part of the spectrum. The use of solutions has the major advantage that air-sensitive samples can be loaded into the cell in an inert atmosphere in the biochemical laboratory and then frozen prior to storage, transportation, and loading into the cryostat at the synchrotron source. Also, the study of metalloproteins in solution readily allows the introduction of redox partners, substrates or substrate analogs, inhibitors, and so forth, to monitor directly their effect at the metal center.


As an alternative to studying solutions, lyophilized powders may be investigated, again at low temperature. However, the production of these may lead to some degradation of the biological sample, and this form of material is generally less convenient and flexible than solutions. The polarization inherent in synchrotron radiation readily permits the measurement of anisotropy in the X-ray absorption spectra for oriented samples. This has been used to good effect for single crystals of the iron-molybdenum protein of nitrogenase (18) and the manganese reaction center of photosystem II in oriented chloroplasts (19).

4. Notes

XAS, especially with respect to EXAFS, has many advantages as a probe of metal centers in biological materials, and a whole host of systems have now been studied (2-8). Beyond the absence of a requirement for crystalline materials, the major attractions are the specificity and sensitivity of the technique, and the provision of interatomic distances with an accuracy of ±0.02 Å within (say) 4 Å of the primary absorber. However, it should be noted that:

1. No angular information is obtained.

2. Rarely does the structural information extend beyond 4 Å from the metal probed.

3. The spectrum sums data for all atoms of a particular element and, if the element of interest is present in more than one chemical form, an average environment is obtained. This can still provide useful information, as seen from the iron K-edge investigations of the iron-molybdenum cofactor of the nitrogenases (Fig. 2) (12), which contains at least six iron atoms. However, it is important to establish independently the number of different sites of the atom of interest in the system to be studied.

4. The possibility of radiation damage must be anticipated, and the integrity of samples should be monitored after and, if possible, during measurement. For enzymes, the activity before and after study should be determined. The impact of a high flux of X-rays on a metalloprotein can lead to the production of radicals and, especially in the presence of polar solvent molecules, solvated electrons.

5. XAS is a “sporting method,” and the strength of any interpretation will benefit from other information. Interpretation of EXAFS invariably requires calibration by corresponding measurement and analysis of data for chemical analogs of a known structure. Such comparisons can also serve to “fingerprint” a metal site in a protein by matching the X-ray absorption spectrum with that of a well-characterized chemical system. Protein crystallography is especially complementary to XAS. Thus, the latter generally achieves a more precise determination of metal-ligand bond lengths than the former, as manifest for rubredoxin (20). Also, the knowledge of the groups adjacent to the metal provided by protein crystallography, in (say) the native protein, removes many of the interpretive ambiguities inherent to EXAFS, and provides an excellent base from which to monitor and interpret how the metal center responds to changes in redox status, pH, and reagents.

The main emphasis in any X-ray absorption spectroscopic study should be to collect the best-quality data possible for the systems of interest. This will require access to a synchrotron radiation source with state-of-the-art instrumentation. Beyond this, there is a real need for professional expertise in data collection and, more importantly, data interpretation. Therefore, collaborations between biologists, with expertise in the isolation and purification of samples, and those cognizant in the arts of XAS measurement and analysis are sensible and mutually beneficial.

References

1. Kincaid, B. M. and Eisenberger, P. (1975) Synchrotron radiation studies of the K-edge photoabsorption spectra of Kr, Br2, and GeCl4: a comparison of theory and experiment. Phys. Rev. Lett. 34, 1361-1367.
2. Cramer, S. P. and Hodgson, K. O. (1979) X-ray absorption spectroscopy: a new structural method and its application to bioinorganic chemistry. Prog. Inorg. Chem. 25, 1-39.
3. Powers, L. (1982) X-ray absorption spectroscopy: application to biological molecules. Biochim. Biophys. Acta 683, 1-38.
4. Cramer, S. P. (1983) Molybdenum enzymes: a survey of structural information from EXAFS and EPR spectroscopy, in Advances in Inorganic and Bioinorganic Mechanisms (Sykes, A. G., ed.), Academic, London, pp. 260-288.
5. Scott, R. A. (1985) Measurement of metal-ligand distances by EXAFS. Methods Enzymol. 117, 414-459.
6. Hasnain, S. S. (1987) Application of EXAFS and XANES to metalloproteins. Life Chem. Rep. 4, 273-331.
7. Hasnain, S. S. and Garner, C. D. (1987) Characterization of metal centres in biological systems by X-ray absorption spectroscopy. Prog. Biophys. Mol. Biol. 50, 47-65.
8. Hasnain, S. S. (ed.) (1990) Synchrotron Radiation and Biophysics. Ellis Horwood, Chichester, pp. 9-121.
9. Lee, P. A. and Pendry, J. B. (1975) Theory of the extended X-ray absorption fine structure. Phys. Rev. B11, 2795-2811.
10. Ashley, C. A. and Doniach, S. (1975) Theory of extended X-ray absorption edge fine structure (EXAFS) in crystalline solids. Phys. Rev. B11, 1279-1288.
11. Gurman, S. J., Binsted, N., and Ross, I. (1986) A rapid, exact curved-wave theory for EXAFS. J. Phys. C 17, 143-151.
12. Arber, J. M., Flood, A. C., Garner, C. D., Gormal, C. A., Hasnain, S. S., and Smith, B. E. (1988) Iron K-edge absorption spectroscopy of the iron-molybdenum cofactor of the nitrogenase from Klebsiella pneumoniae. Biochem. J. 252, 421-425.
13. Sayers, D. E., Stern, E. A., and Lytle, F. W. (1971) New technique for investigating noncrystalline structures: Fourier analysis of the extended X-ray absorption fine structure. Phys. Rev. Lett. 27, 1204-1207.
14. Joyner, R. W., Martin, K. J., and Meehan, P. (1987) Some applications of statistical tests in analysis of EXAFS and SEXAFS data. J. Phys. C 20, 4005-4012.
15. Gurman, S. J., Binsted, N., and Ross, I. (1986) A rapid, exact, curved-wave theory for EXAFS calculations: II. The multiple-scattering contributions. J. Phys. C 19, 1845-1861.
16. Blackburn, N. J., Strange, R. W., McFadden, L. M., and Hasnain, S. S. (1987) Anion binding to bovine erythrocyte superoxide dismutase studied by X-ray absorption spectroscopy: a detailed structural analysis of the native enzyme and the azido and cyano derivatives using a multiple-scattering approach. J. Am. Chem. Soc. 109, 7162-7170.
17. Durham, P. J., Pendry, J. B., and Hodges, C. H. (1982) Calculation of X-ray absorption near edge structure, XANES. Comp. Phys. Commun. 25, 193-205.
18. Flank, A. M., Weininger, M., Mortenson, L. E., and Cramer, S. P. (1986) Single crystal EXAFS of nitrogenase. J. Am. Chem. Soc. 108, 1049-1055.
19. George, G. N., Prince, R. C., and Cramer, S. P. (1989) The manganese site of the photosynthetic water-splitting enzyme. Science 243, 789-791.
20. Watenpaugh, K. D., Sieker, L. C., and Jensen, L. H. (1980) Crystallographic refinement of rubredoxin at 1.2 Å resolution. J. Mol. Biol. 138, 615-633.

A Actinomycin D, 107-109 Albumin, 220 Alpha helix, 4 7 4 8 Amide protons, in NMR exchange, 17,48, 74, 75, 83, 181, 182, 185-187 temperature sensitivity, 74, 83 Amino acid residues in proteins, molecular weights, 256-258 NMR spectra, 33-40 Anisotropy, in EPR, 33 1 in NMR,17, 18, 102, 103 Anomerization, of oligosaccharides, 120 Antibodies, 225 Array detectors, 208-2 10 B Beta sheet, 48 Beta turns, 77, 78 Broadband decoupling, 7 C Carbohydrates, mass spectrometry, 22 1, 296-298 methylation, 302 NMR spectroscopy, 115-1 65 Carboxypeptidase B, 268 CD4, recombinant human, 121, 126, 127, 130-142 Chemical exchange, 172-179, see also Magnetization transfer, Conformational interconversion Chemical ionization mass spectrometry, see Mass spectrometry, ion production Chemical shift, in Mossbauer spectroscopy, 3 17, 322,323

in NMR, 2-5 of amide protons, 83 of amino acid residues, 33-40 of oligosaccharides, 129-142 of oligonucleotides, 91, 93 in peptides, 32, 75 reference, 3-5 temperature dependence, 74 Cisplatin, 110, 305 Collision cell, 210, 21 1, 287 Collisional activation, see Mass spectrometry, basic principles Conformational interconversion, 18, 22, 23 52, 53, 69, 70, 75-8 1,165, see also Proline, cis-trans isomerization Correlation time, 8, 30, 31, 73, 83,101,164 Cyanogen bromide, 274,275

D
Debye-Waller factor, 379, 382
Dimethylsulfoxide, 71, 82
Dipolar relaxation, 8, 28-30
DISMAN, 54
Distance geometry, 53, 54
Dithioerythritol, 243-245
Dithiothreitol, 243-245
DNA, see Oligonucleotides
Dynode, 208, 209, 223

E
Edman degradation, 275, 276
Electron impact mass spectrometry, see Mass spectrometry, ion production
Electron paramagnetic resonance spectrometry,
  information content, 338, 339
  instrumentation, 333-335
  principles, 329-333
  quantitation, 339, 340
  sample preparation, 335-337
  sensitivity, 335
  suitable nuclei, 329
Electrospray mass spectrometry, see Mass spectrometry, ion production
Electron spin resonance spectroscopy, see Electron paramagnetic resonance spectroscopy
Endo H, 120
Ethidium bromide, 107, 108
Exchange processes, 172-179
  fast exchange, 176, 177
  intermediate exchange, 177-179
  slow exchange, 175

F
FAB mapping, see Fast atom bombardment mass spectrometry, peptide mapping
Fast atom bombardment mass spectrometry,
  artifacts, 244, 245
  basic principles, 237, 238
  calibration, 249-251
  choice of fast atom, 240, 241
  choice of matrix, 242-246
  continuous flow, 203, 204
  data collection, 251-253
  instrumentation, 240, 241
  interfering contaminants, 246-248
  ion production, 202-204
  peptide mapping, 267-275
  sample preparation, 246-249, 273-277
  sensitivity, 248, 249
  suppression effects, 270-272
  target cleaning, 253, 254
Ferritin, 324, 325
Field desorption mass spectrometry, see Mass spectrometry, ion production
Flavodoxin, 22
Fourier transform,
  in NMR, 2, 10, 23-27
  in X-ray absorption, 380-382
Fourier transform mass spectrometry, see Mass spectrometry, mass analyzers

G
g-factor, in EPR, 330-333
Gamma turns, 77, 78
Glycolipids, mass spectrometry, 298-303
Glycopeptides,
  mass spectrometry, 268-270
  NMR spectroscopy, 119
  release of carbohydrate chains, 119, 120
Glycoproteins, 119, 220, 221, 225, 294
Glycoprotein carbohydrate chains, structure determination by NMR,
  analysis of NMR spectrum, 129-142
  conformation, 145, 146
  diantennary, 132
  experimental conditions, 125-129
  fucosylated, 132-135
  high mannose, 133, 140, 141
  hybrid, 133, 140, 141
  poly-N-acetyllactosamine, 143
  sample preparation, 123-125
  sulfated, 142
  structures, 117-119
  triantennary, 136-139
Glycosaminoglycans, 150-152, 158

H
Hemoglobin, 56, 57
Heptafluorobutyric acid, 246
Heteronuclear correlation, see NMR pulse sequences
Heteronuclear editing, 30
Histidine, pKa measurement, 56, 57
Hoechst 33258, 102-104
Hybrid mass spectrometers, 288
Hydrogen bonding, 74
Hyperfine coupling, in EPR, 331

I
Immonium ion, in MS, 262
Ion traps, see Mass spectrometry, mass analyzers
Iron storage proteins, 324, 325
Iron-sulfur proteins, 322-324, 367, 368
Isotope distribution, in mass spectrometry, 254, 255
Isotopic enrichment,
  in EPR, 337
  in Mossbauer spectroscopy, 321
  in NMR, 17, 21, 22, 30, 47, 61
  in resonance Raman spectroscopy, 366, 367

J
J-coupling, see Spin coupling

K
Karplus curve, 5, 6, 51, 52, 74

L
β-Lactamase, 184, 185
Laser desorption ionization mass spectrometry,
  basic principles, 215-220
  of glycoproteins, 221, 222
  instrumentation, 218-220
  ion production, 204, 216
  matrices, 216, 217
  of proteins, 220, 221
  protein sequencing, 226
  of protein subunits, 222, 223
  resolution, 223, 224
  sample preparation, 216-218
  sensitivity, 216
Lineshape analysis, 173-179
Lipids, see Glycolipids
Longitudinal relaxation, see Spin-lattice relaxation
LSIMS, 241, 286

M
Magnetic sector mass spectrometers, 288, 289
Magnetization transfer, 8, 172, 179-184
Mass spectrometry, see also Fast atom bombardment mass spectrometry, Laser desorption ionization mass spectrometry, Plasma desorption ionization mass spectrometry
  basic principles, 191-212
    collisional activation, 210-212
    fast atom bombardment, 237, 238
    ion detection, 207-210
    laser desorption ionization mass spectrometry, 215-220
    linked scanning, 211, 212
    plasma desorption ionization mass spectrometry, 229-232
    resolution, 193
    sensitivity, 191
    tandem mass spectrometry, 285-289
  fragmentation of peptides, 258-267
  ion production,
    in chemical ionization mass spectrometry, 200, 201
    in electron impact mass spectrometry, 200, 201
    in electrospray, 204, 207
    in FABMS, 202-204
    in field desorption MS, 201, 202
    in LDIMS, 204, 216
  mass analyzers,
    Fourier transform, 199, 200
    ion traps, 198, 199
    magnetic sector instruments, 194-196
    quadrupole mass filters, 196-199
    time-of-flight analyzers, 198, 199
Matrix-assisted laser desorption mass spectrometry, see Laser desorption ionization mass spectrometry
Membrane-bound proteins, 18, 337
Metalloporphyrins, 351-354
Metalloproteins, 22
Molecular reorientation, see Correlation time
Molecular weight, 254
Mossbauer spectroscopy,
  information content, 317, 318
  instrumentation, 318-320
  principles, 316
  sample preparation, 320-322
  sensitivity, 321
  suitable nuclei, 315, 318

N
Netropsin, 106
N-glycanase, 119, 120, 270
3-Nitrobenzylalcohol, 243-245
Nitrocellulose, 226, 234-234, 241
NMR pulse sequences, 11-14
  COLOC, 13
  COSY, 11, 12, 29
  double-quantum correlation, 29
  heteronuclear correlation, 29, 47
  HMBC, 13
  HMQC, 13, 93
  HOHAHA, 11, 29
  NOESY, 11, 23-29, 31, 72
  ROESY, 29, 72
  TOCSY, see HOHAHA
  solvent suppression, 62, 92, 128, 129
  zero-filling, 129
NMR spectroscopy, see also Lineshape analysis, Nuclear Overhauser enhancement
  averaging of parameters, 64-70, 79, 80
  buffers, 62
  deuterium exchange, 62
  linewidths, 18-21, 63, 88, 89, 107, 164, 171, 172
  magnetic field homogeneity, 63
  of oligonucleotides, spectral assignments, 94-97
  paramagnetic impurities, 62, 63, 123, 171
  of peptides, 69-83
  of proteins, 15-67
    scope and limitations, 16-23
  quantitation, 5
  sample preparation, 170, 171
  sensitivity, 16, 61, 70, 88, 94, 122, 142, 152
  spectral dispersion, 17
  three-dimensional, 30, 31
  two-dimensional, 23-31
NOE, see Nuclear Overhauser enhancement, NMR pulse sequences
Nuclear magnetic moment, in Mossbauer spectroscopy, 317
Nuclear Overhauser enhancement (NOE), 7, 11, 47-55, 82, 83
  back-calculation, 55
  initial rate regime, 30, 31, 49
  and internal motion, 49, 50
  measurement, 50, 51
  spin diffusion, 30, 49

O
Oligonucleotides,
  A-DNA, 97, 98
  B-DNA, 93, 97, 98
  dynamics, 98-100
  hairpin loop DNA, 99
  major groove, 89, 90
  mass spectrometry, 221, 222, 303-305
  melting temperature, 92
  minor groove, 89
  mismatch duplexes, 99, 100
  NMR spectroscopy,
    sample preparation, 91
    spectral assignments, 94-97
  Z-DNA, 98, 99
Oligonucleotide-ligand interactions, 89-94, 100-111
  dynamic aspects, 105, 106
  intercalation, 107-109
  major groove binding, 109, 110
  minor groove binding, 100-105
Oligosaccharides,
  from glycoproteins, 117-119, see also Glycoprotein carbohydrate chains
  mass spectrometry, 221, 296-298
  NMR spectroscopy, 115-146
  release from glycopeptides, 119, 120
  reduced, 120

P
Peptide bond cleavage, see Mass spectrometry, fragmentation of peptides
Peptide sequence determination, by MS, 264-267, 289-294
Peptides,
  assignment of NMR spectrum, 71, 72
  hydrophobicity index, 271-273
  identification of C-terminal peptide, 268
  methylation, 276, 277
  molecular weight determination by MS, 254-257
  N-acetylation, 276
  NMR spectroscopy, 69-83
  structure determination, 77-81
Phase sensitive spectra, 9, 28
Plasma desorption mass spectrometry,
  basic principles, 230-232
  sample preparation, 232-235
PNGase F, see N-glycanase
Polarization, in resonance Raman spectroscopy, 351-354, 366
Polysaccharides,
  analysis of NMR spectrum, 158-162
  conformation, 163, 164
  linkage analysis, 162, 163
  sample preparation, 151, 152
  sequence determination, 162, 163
  structure determination, 154-158
  substituents, 162
Postacceleration detection, 208
Posttranslational modification, 294, 303
Proline, cis-trans isomerization, 22, 56, 81
Proteins,
  aggregation, 19-21, 222, 223
  complexes with ligands, 60
  disulfide linkages, 225, 270, 277, 278
  domains, 58, 59
  folding, 18, 58-60, 184-188
  molecular weight determination, by PDMS, 220, 221
  NMR assignment, 32-47
  sequencing, 226
  structure determination, 47-55
Pseudoatoms, in protein structure determination, 46, 47
Pyruvate dehydrogenase, 58

R
Raman spectroscopy, see Resonance Raman spectroscopy
Random coil, 78-81
Redox potential, 342, 343
Renin substrate, 290-292
Resonance Raman spectroscopy,
  instrumentation, 354-365
  principles, 345-351
Rubredoxin, 323, 345, 347, 367, 368

S
Saturation transfer, see Magnetization transfer
Scalar coupling, see Spin coupling
Secondary structure, 77-81
Sequential assignments, see Proteins, NMR assignment
Shaped pulses, in NMR, 8, 9
Soft pulses, in NMR, 8, 9
Solvent effects, in NMR spectroscopy, 75, 81, 82
Spin coupling, in NMR, 5-7, 28, 73, 74, 161, 162
  as constraint in protein structure determination, 51, 52
Spin diffusion, see Nuclear Overhauser enhancement, spin diffusion
Spin-lattice relaxation (T1),
  in EPR, 331, 340
  in NMR, 7, 8, 18, 92, 107, 182, 183
Spin-spin relaxation (T2),
  in EPR, 331
  in NMR, 7, 8, 18, 182, 183
Staphylococcal nuclease, 56, 57
Stereospecific assignments, see Proteins, NMR assignment
Structural reporter groups, 116, 130

T
T1, see Spin-lattice relaxation
T2, see Spin-spin relaxation
Thioglycerol, 243, 245
Time-of-flight mass spectrometry, see Mass spectrometry, mass analyzers
Transverse relaxation, see Spin-spin relaxation
Triple quadrupole mass spectrometers, 287, 288
Trypsin, 268-269, 273, 274
Two-dimensional NMR, 9-14, see also NMR pulse sequences

U
Urokinase, 58, 59

X
XAFS, see X-ray absorption spectroscopy
XANES, see X-ray absorption spectroscopy
Xenon, 240
X-ray absorption spectroscopy,
  instrumentation, 383-387
  principles, 367-383
  sample preparation, 387, 388