Elsevier Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands Linacre House, Jordan Hill, Oxford OX2 8DP, UK
First edition 2009
Copyright © 2009 Elsevier B.V. All rights reserved
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher
Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email:
[email protected]. Alternatively you can submit your request online by visiting the Elsevier web site at http://www.elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material
Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
ISBN: 978-0-444-53055-4
ISSN: 0166-526X
For information on all Elsevier publications visit our website at books.elsevier.com
Printed and bound in Hungary 09 10 11 12 13 10 9 8 7 6 5 4 3 2 1
ADVISORY BOARD
Joseph A. Caruso, University of Cincinnati, Cincinnati, OH, USA
Hendrik Emons, Joint Research Centre, Geel, Belgium
Gary Hieftje, Indiana University, Bloomington, IN, USA
Kiyokatsu Jinno, Toyohashi University of Technology, Toyohashi, Japan
Uwe Karst, University of Münster, Münster, Germany
György Marko-Varga, AstraZeneca, Lund, Sweden
Janusz Pawliszyn, University of Waterloo, Waterloo, Ont., Canada
Susan Richardson, US Environmental Protection Agency, Athens, GA, USA
Wilson & Wilson’s
COMPREHENSIVE ANALYTICAL CHEMISTRY
Edited by D. BARCELÓ
Research Professor, Department of Environmental Chemistry, IIQAB-CSIC, Jordi Girona 18-26, 08034 Barcelona, Spain
Wilson & Wilson’s
COMPREHENSIVE ANALYTICAL CHEMISTRY PROTEIN MASS SPECTROMETRY
VOLUME 52
Edited by JULIAN P. WHITELEGGE
The Jane & Terry Semel Institute for Neuroscience and Human Behavior, David Geffen School of Medicine, The Molecular Biology Institute, The Brain Research Institute, University of California Los Angeles, 760 Westwood Plaza, Los Angeles, CA 90024, USA
Amsterdam Boston Heidelberg London New York Oxford Paris San Diego San Francisco Singapore Sydney Tokyo
CONTRIBUTORS TO VOLUME 52
Rinat R. Abzalimov, Department of Chemistry, LGRT #701, University of Massachusetts, 710 North Pleasant Street, Amherst, MA 01003, USA
Paul V. Attwood, School of Biomedical, Biomolecular and Chemical Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
Jon Barbour, Medizinisches Proteom-Center, Ruhr-Universität Bochum, D-44780 Bochum, Germany
Summer L. Bernstein, Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA 93106, USA
Paul G. Besant, School of Biomedical, Biomolecular and Chemical Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
Michael T. Bowers, Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA 93106, USA
Leticia Britos, Beckman Center, Stanford University, Stanford, CA 94305, USA
Robert J. Chalkley, Mass Spectrometry Facility, Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94143, USA
Xunming Chen, Applied Biosystems, Framingham, MA 01581, USA
Jason J. Cournoyer, Department of Chemistry, Boston University, 590 Commonwealth Avenue, Boston, MA 02115, USA
Alek N. Dooley, Pasarow Mass Spectrometry Laboratory, The Semel Institute for Neuroscience and Human Behavior and The Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
John R. Engen, Department of Chemistry & Chemical Biology and The Barnett Institute of Chemical & Biological Analysis, Northeastern University, Boston, MA 02115, USA
Kym F. Faull, Pasarow Mass Spectrometry Laboratory, The Semel Institute for Neuroscience and Human Behavior and The Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
Michael C. Fitzgerald, Department of Chemistry, Duke University, Durham, NC 27708-0346, USA
Leonard J. Foster, UBC Centre for High-Throughput Biology, Department of Biochemistry and Molecular Biology, 406-2125 East Mall, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
Wendell P. Griffith, Department of Chemistry MS 602, University of Toledo, 2801 W. Bancroft St., Toledo, OH 43606-3390, USA
Michael L. Gross, Departments of Chemistry and Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA
Frederic Halgand, Laboratoire de physiologie de la reproduction, Université de Tours, UMR INRA-CNRS 6175, 37380 Nouzilly, France
David M. Hambly, Department of Analytical and Formulation Sciences, Amgen Inc., Seattle, WA 98119, USA
Adrian D. Hegeman, Department of Biochemistry and Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706, USA
Erin D. Hopper, Department of Chemistry, Duke University, Durham, NC 27708-0346, USA
Edward L. Huttlin, Department of Biochemistry and Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706, USA
Jodie V. Johnson, Department of Chemistry, University of Florida, Gainesville, FL, USA
Jonathan E. Katz, Spielberg Family Center for Applied Proteomics, Cedars-Sinai Medical Center, Los Angeles, CA, USA
Igor A. Kaltashov, Department of Chemistry, 701A LGRT, University of Massachusetts, 710 North Pleasant Street, Amherst, MA 01003, USA
Lars Konermann, Department of Chemistry, The University of Western Ontario, London, ON, N6A 5B7, Canada
Arthur Laganowsky, Department of Chemistry and Biochemistry, University of California Los Angeles, Los Angeles, CA 90095, USA
Martin R. Larsen, Department of Biochemistry and Molecular Biology, Campusvej 55, DK-5230, University of Southern Denmark, Odense, Denmark
Helmut E. Meyer, Medizinisches Proteom-Center, Ruhr-Universität Bochum, D-44780 Bochum, Germany
Jennifer L. Mitchell, Department of Chemistry, University of New Mexico, Albuquerque, NM 87131, USA
Anirban Mohimen, Schering-Plough Research Institute, Summit, NJ 07901, USA
Thomas A. Neubert, Department of Pharmacology and Skirball Institute of Biomolecular Medicine, 540 First Avenue, Lab 5-18, University School of Medicine, New York, NY 10016, USA
Andrew J. Norris, Pasarow Mass Spectrometry Laboratory, The Semel Institute for Neuroscience and Human Behavior and The Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
Peter B. O’Connor, Department of Biochemistry, Boston University School of Medicine, 670 Albany Street, Boston, MA 02118, USA
Silke Oeljeklaus, Medizinisches Proteom-Center, Ruhr-Universität Bochum, D-44780 Bochum, Germany
Jingxi Pan, Department of Chemistry, The University of Western Ontario, London, ON N6A 5B7, Canada
Darryl Pappin, Applied Biosystems, Framingham, MA 01581, USA
Phillip J. Robinson, Cell Signaling Unit, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
Philip L. Ross, Applied Biosystems, Framingham, MA 01581, USA
Christopher M. Ryan, Pasarow Mass Spectrometry Laboratory, The Semel Institute for Neuroscience and Human Behavior and The Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
Gerold Schmitt-Ulms, Centre for Research in Neurodegenerative Diseases, Department of Laboratory Medicine and Pathobiology, University of Toronto, 6 Queen’s Park Crescent West, Ontario M5S 3H2, Canada
Lucy Shapiro, Beckman Center, Stanford University, Stanford, CA 94305, USA
Tujin Shi, Centre for Research in Neurodegenerative Diseases, University of Toronto, 6 Queen’s Park Crescent West, ON M5S 3H2, Canada
Lorelei D. Shoemaker, Center for Regenerative Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
Anna E. Speers, Department of Pharmacology, University of Colorado School of Medicine, Aurora, CO 80045, USA
Michael R. Sussman, Department of Biochemistry and Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706, USA
Liangjie Tang, Department of Chemistry, Duke University, Durham, NC 27708-0346, USA
Esteban Toro, Beckman Center, Stanford University, Stanford, CA 94305, USA
Keith Vosseller, Department of Biochemistry and Molecular Biology, Drexel University College of Medicine, Philadelphia, PA 19130, USA
Bettina Warscheid, Medizinisches Proteom-Center, ZKF E.042, Ruhr-Universität Bochum, Universitätsstr. 150, D-44801 Bochum, Germany
Rasanjala Weerasekera, Centre for Research in Neurodegenerative Diseases, Department of Laboratory Medicine and Pathobiology, University of Toronto, 6 Queen’s Park Crescent West, Ontario M5S 3H2, Canada
Adam B. Weinglass, Merck Research Laboratories, Department of Ion Channels, Rahway, NJ 07065, USA
Lance Wells, Department of Biochemistry and Molecular Biology, Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602, USA
Julian P. Whitelegge, The Pasarow Mass Spectrometry Laboratory, NPI – Semel Institute for Neuroscience and Human Behavior, David Geffen School of Medicine, and The Molecular Biology Institute, and The Brain Research Institute, University of California, Los Angeles, CA 90024, USA
Derek J. Wilson, Department of Chemistry, York University, 4700 Keele Street, Toronto, ON, M3J 1P3, Canada
Christine C. Wu, Department of Pharmacology, University of Colorado School of Medicine, Aurora, CO 80045, USA
Chong-Feng Xu, Department of Pharmacology and Skirball Institute of Biomolecular Medicine, 540 First Avenue, Lab 5-18, University School of Medicine, New York, NY 10016, USA
Guoan Zhang, Department of Pharmacology and Skirball Institute of Biomolecular Medicine, 540 First Avenue, Lab 5-18, University School of Medicine, New York, NY 10016, USA
Xin-Lin Zu, School of Biomedical, Biomolecular and Chemical Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
VOLUMES IN THE SERIES
Vol. 1A  Analytical Processes; Gas Analysis; Inorganic Qualitative Analysis; Organic Qualitative Analysis; Inorganic Gravimetric Analysis
Vol. 1B  Inorganic Titrimetric Analysis; Organic Quantitative Analysis
Vol. 1C  Analytical Chemistry of the Elements
Vol. 2A  Electrochemical Analysis; Electrodeposition; Potentiometric Titrations; Conductometric Titrations; High-Frequency Titrations
Vol. 2B  Liquid Chromatography in Columns; Gas Chromatography; Ion Exchangers; Distillation
Vol. 2C  Paper and Thin Layer Chromatography; Radiochemical Methods; Nuclear Magnetic Resonance and Electron Spin Resonance Methods; X-ray Spectrometry
Vol. 2D  Coulometric Analysis
Vol. 3  Elemental Analysis with Minute Sample; Standards and Standardization; Separation by Liquid Amalgams; Vacuum Fusion Analysis of Gases in Metals; Electroanalysis in Molten Salts
Vol. 4  Instrumentation for Spectroscopy; Atomic Absorption and Fluorescence Spectroscopy; Diffuse Reflectance Spectroscopy
Vol. 5  Emission Spectroscopy; Analytical Microwave Spectroscopy; Analytical Applications of Electron Microscopy
Vol. 6  Analytical Infrared Spectroscopy
Vol. 7  Thermal Methods in Analytical Chemistry; Substoichiometric Analytical Methods
Vol. 8  Enzyme Electrodes in Analytical Chemistry; Molecular Fluorescence Spectroscopy; Photometric Titrations; Analytical Applications of Interferometry
Vol. 9  Ultraviolet Photoelectron and Photoion Spectroscopy; Auger Electron Spectroscopy; Plasma Excitation in Spectrochemical Analysis
Vol. 10  Organic Spot Tests Analysis; The History of Analytical Chemistry
Vol. 11  The Application of Mathematical Statistics in Analytical Chemistry; Mass Spectrometry; Ion Selective Electrodes
Vol. 12  Thermal Analysis
         Part A. Simultaneous Thermoanalytical Examination by Means of the Derivatograph
         Part B. Biochemical and Clinical Application of Thermometric and Thermal Analysis
         Part C. Emanation Thermal Analysis and other Radiometric Emanation Methods
         Part D. Thermophysical Properties of Solids
         Part E. Pulse Method of Measuring Thermophysical Parameters
Vol. 13  Analysis of Complex Hydrocarbons
         Part A. Separation Methods
         Part B. Group Analysis and Detailed Analysis
Vol. 14  Ion-Exchangers in Analytical Chemistry
Vol. 15  Methods of Organic Analysis
Vol. 16  Chemical Microscopy; Thermomicroscopy of Organic Compounds
Vol. 17  Gas and Liquid Analysers
Vol. 18  Kinetic Methods in Chemical Analysis; Application of Computers in Analytical Chemistry
Vol. 19  Analytical Visible and Ultra-violet Spectrometry
Vol. 20  Photometric Methods in Inorganic Trace Analysis
Vol. 21  New Developments in Conductometric and Oscillometric Analysis
Vol. 22  Titrimetric Analysis in Organic Solvents
Vol. 23  Analytical and Biomedical Applications of Ion-Selective Field-Effect Transistors
Vol. 24  Energy Dispersive X-ray Fluorescence Analysis
Vol. 25  Preconcentration of Trace Elements
Vol. 26  Radionuclide X-ray Fluorescence Analysis
Vol. 27  Voltammetry
Vol. 28  Analysis of Substances in the Gaseous Phase
Vol. 29  Chemiluminescence Immunoassay
Vol. 30  Spectrochemical Trace Analysis for Metals and Metalloids
Vol. 31  Surfactants in Analytical Chemistry
Vol. 32  Environmental Analytical Chemistry
Vol. 33  Elemental Speciation – New Approaches for Trace Element Analysis
Vol. 34  Discrete Sample Introduction Techniques for Inductively Coupled Plasma Mass Spectrometry
Vol. 35  Modern Fourier Transform Infrared Spectroscopy
Vol. 36  Chemical Test Methods of Analysis
Vol. 37  Sampling and Sample Preparation for Field and Laboratory
Vol. 38  Countercurrent Chromatography: The Support-Free Liquid Stationary Phase
Vol. 39  Integrated Analytical Systems
Vol. 40  Analysis and Fate of Surfactants in the Aquatic Environment
Vol. 41  Sample Preparation for Trace Element Analysis
Vol. 42  Non-destructive Microanalysis of Cultural Heritage Materials
Vol. 43  Chromatographic-mass Spectrometric Food Analysis For Trace Determination of Pesticide Residues
Vol. 44  Biosensors and Modern Biospecific Analytical Techniques
Vol. 45  Analysis and Detection by Capillary Electrophoresis
Vol. 46  Proteomics and Peptidomics: New Technology Platforms Elucidating Biology
Vol. 47  Modern Instrumental Analysis
Vol. 48  Passive Sampling Techniques in Environmental Monitoring
Vol. 49  Electrochemical (Bio) Sensor Analysis
Vol. 50  Analysis, Fate and Removal of Pharmaceuticals in the Water Cycle
Vol. 51  Food Contaminants and Residue Analysis
FOREWORD
PROTEIN MASS SPECTROMETRY: A PERSONAL HISTORICAL VIEW
In the early part of the 1990s the use of mass spectrometry for protein studies exploded with the start of the proteomics era. Many young scientists in the field believe that protein mass spectrometry started at that time. This is not true. The first attempts to analyze peptides and proteins date back to the mid-1960s. A landmark publication for me personally was a paper by Mickey Barber and a number of French scientists in which they described the structure determination of a heavily and heterogeneously modified peptidolipid called Fortuitine, with a molecular weight of 1,359, using electron impact (EI) mass spectrometry [1]. At that time all mass spectrometric analysis required volatile samples, i.e., samples which could be brought into the gas phase by heating without thermal destruction. Fortuitine was a very fortuitous sample because it was N- and C-terminally blocked and, in addition, a number of the amide nitrogen atoms were methylated, all of which made the molecule more volatile. EI supplied sufficient energy to the molecular ion to cause fragmentation along the peptide backbone. The mass accuracy of the double-focusing AEI MS9 electrostatic/magnetic sector instrument was so high that the masses of the molecular ion as well as the fragment ions could be determined to within a few ppm.
A few other groups worldwide were also trying to analyze peptides by mass spectrometry. Among the most prominent was Klaus Biemann at MIT, who combined separation with mass spectrometry by producing derivatives of small peptides which were amenable to GCMS. He also developed computer programs to interpret the data.
In a paper from 1966 [2] he makes the following statement in a discussion: “In view of the small amount of material required (microgram quantities) and of the extreme speed (1–3 min of computer time) with which this objective and exhaustive interpretation is achieved, this approach shows considerable promise for the routine determination of the amino acid sequence of small peptides obtained upon partial hydrolysis of the oligopeptides resulting from enzymatic cleavage of large polypeptides. It should also be useful in synthetic work since the principle is independent of the end groups and additional protecting groups, the mass of which can be read in with the data”. This almost prophetic statement truly foresaw the future of mass spectrometry in protein studies.
In the late 1960s and early 1970s a number of groups, including my own, took up peptide mass spectrometry. We were still limited by the need to produce volatile derivatives. Two main lines were followed: the use of permethylated
peptides developed by Das et al. [3] and further improved by Thomas et al. [4] using direct inlet probes, and the reduced and trimethylsilylated derivatives introduced by Biemann, with separation and introduction of the samples by GCMS. At that time it gradually became clear that post-translational protein modifications were frequent and that mass spectrometry would be an important analytical method to identify them and to determine their nature and positions. My personal breakthrough, which changed the attitude of many of my colleagues from considering the combination of mass spectrometry and protein chemistry a utopia to seeing it as potentially valuable for protein studies, was the discovery of γ-carboxyglutamic acid residues in the blood clotting factors [5,6]. A number of other modifications such as methylations, acetylations and hydroxylations were also discovered by mass spectrometry in that period. Unfortunately, we could not prepare volatile derivatives of glycosylated and phosphorylated peptides and consequently could not see these frequent modifications with our mass spectrometers.
Another focus of peptide mass spectrometry was protein sequencing. There we were in competition with the stepwise chemical Edman degradation and the automated peptide sequencer. Because mass spectrometry required derivatization and the size of the peptides we could analyze was limited, we could not catch up with the sequencer. Whenever we made a little progress, for example, the ability to analyze peptide mixtures and automatically interpret the resulting complex spectra [7], we were outperformed by improvements of the sequencer.
In the 1970s two new ionization methods, chemical ionization (CI) and field desorption (FD), were invented. Both were tested for peptide analysis, e.g. [8,9]. CI resulted in more uniform fragment peak intensity than EI throughout the mass range of the spectra but did not eliminate the need for volatile derivatives. 
FD was more promising because it allowed analysis of underivatized peptides. However, the spectra were extremely difficult to obtain because the desorption took place in a very short time window, and with the scanning instruments used at that time it was pure luck to be at the right position of the scan when the peptide ion was generated. Consequently neither of these new ionization techniques resulted in a breakthrough for protein mass spectrometry.
In the 1970s mass spectrometry was widely recognized as a tool for discovery of certain types of protein modifications and for sequencing of N-terminally blocked peptides which were not amenable to Edman degradation. However, most of the time it was a hard uphill walk and only a few scientists believed in us.
At the beginning of the 1980s the situation changed to excitement. The main reasons were the developments in the nucleic acid field, which resulted in cDNA sequencing and made de novo sequencing at the protein level partially redundant, and the advent of new ionization methods that allowed mass spectrometric analysis of underivatized peptides and intact proteins. Already in 1974 a new ionization method, plasma desorption mass spectrometry (PDMS), based on desorption of the sample directly from the solid state by bombardment with high-energy (100 MeV) ions, was described by Macfarlane’s group at Texas A&M [10]. The method remained largely unrecognized by protein mass spectrometrists until Bo Sundqvist and collaborators at Uppsala University demonstrated that it was possible to obtain mass spectra of intact proteins using PDMS, first
insulin [11] and later larger proteins [12,13]. They also developed the first commercially available plasma desorption mass spectrometer through the spin-off company BioIon. It was a fully automated instrument based on the almost forgotten time-of-flight principle. I was lucky to collaborate with the group in Uppsala and to get the first of these new instruments in my laboratory and thereby to participate in this exciting new development. Independently Mickey Barber (the one who analyzed Fortuitine in 1964) published another method, fast atom bombardment (FAB) [14], that allowed desorption of underivatized peptides from a glycerol matrix. The method was quickly demonstrated also to allow mass spectrometry of insulin [15,16] and some larger intact proteins. FAB could readily be installed on existing sector field mass spectrometers (including our own) and therefore quickly became available in many laboratories, whereas acceptance of PDMS was slower because it required new instrumentation and also, due to the 252Cf source, extensive safety precautions.
In the period that followed, most of the principles now used in proteomics were established. Thus, characterization of proteins by peptide mass mapping using FAB-MS was suggested by Howard Morris [17] and soon also taken up by PDMS, the latter allowing digestion of the proteins directly on the nitrocellulose-covered target after their mass spectra had been recorded. A combined top-down and bottom-up strategy could therefore be performed on a single sample preparation. Although FAB and PD are soft ionization methods, sufficient energy was supplied to the formed ions to cause some fragmentation. This was investigated for both methods, and the now widely used nomenclature for mass spectrometric peptide fragmentation was suggested [18]. To generate more fragment ions, collision-induced dissociation (CID) was introduced in mass spectrometric peptide analysis using either triple quadrupole or the giant four-sector instruments. 
The latter were extensively used by Klaus Biemann, e.g. [19]. They allowed high-resolution mass spectra to be recorded from parent as well as fragment ions, and he succeeded in de novo sequencing of several small- to medium-sized proteins using such instruments. What is now the key element in proteomics, i.e., identification of proteins and peptides based on comparison of their masses with cDNA sequence information, was also initiated in that period [20,21].
The scientists who in the 1980s investigated the possibility of mass spectrometry of involatile molecules were a rather small group, dominated by physicists who tried to understand the desorption phenomenon but including a few chemists and biologists who wanted to apply the techniques. The group, including among many others Franz Hillenkamp, Ken Standing, Michael Karas, Brian Chait and Marvin Vestal, met at regular symposia, first the Ion Formation of Organic Solids (IFOS) meetings organized by A. Benninghoven and later the Desorption meetings. The atmosphere was enthusiastic, with open exchange of information. By that time it had become clear to us that FAB and PD were limited in terms of the molecular size of the proteins to approximately 25 kDa and 35 kDa, respectively. Although the methods were quite sensitive, there was also a need for better sensitivity to cope with the challenges in biological research.
The next and very important breakthrough came in 1988 when John Fenn presented the first electrospray ionization (ESI) spectra of intact proteins at the
ASMS conference in San Francisco, and Hillenkamp and Karas the first matrix-assisted laser desorption ionization (MALDI) spectra at the International Mass Spectrometry Conference in Bordeaux. There was no doubt that these two new ionization techniques would revolutionize protein mass spectrometry. Proteomics took off in the following years even though the term was first introduced several years later. Several of us constructed our own instruments to take advantage of the new techniques. The instrument manufacturers also saw the light and started to build new instruments for this new market. Since then the protein mass spectrometry community as well as the number of papers have been growing exponentially, an entirely new generation of highly efficient automated mass spectrometers has become available, some of which achieve mass accuracy on the same level as Mickey Barber’s instrument in 1965, and new applications have emerged. As described in the chapters of this book, mass spectrometry is now a key analytical technique in protein studies and in molecular and cellular biology. The ugly duckling has become a swan, to stay with the terms of Hans Christian Andersen, who was born in the home city of my university.
REFERENCES
1 M. Barber, P. Jolles, E. Vilkas and E. Lederer, Determination of amino acid sequences in oligopeptides by mass spectrometry. I. The structure of fortuitine, an acyl-nonapeptide methyl ester, Biochem. Biophys. Res. Commun., 18 (1965) 469–473.
2 K. Biemann, C. Cone and B.R. Webster, Computer-aided interpretation of high-resolution mass spectra. II. Amino acid sequences of peptides, J. Am. Chem. Soc., 88 (1966) 2597–2598.
3 B.C. Das, S.D. Gero and E. Lederer, N-methylation of N-acyl oligopeptides, Biochem. Biophys. Res. Commun., (1967) 211–215.
4 D.W. Thomas, Mass spectrometry of permethylated peptide derivatives: Extension of the technique to peptides containing aspartic acid, glutamic acid and tryptophane, Biochem. Biophys. Res. Commun., 33 (1968) 483–486.
5 J. Stenflo, P. Fernlund, W. Egan and P. Roepstorff, Vitamin K dependent modifications of glutamic acid residues in Prothrombin, Proc. Natl. Acad. Sci. USA, 71 (1974) 2730–2733.
6 P. Fernlund, J. Stenflo, P. Roepstorff and J. Thomsen, Vitamin K and the biosynthesis of prothrombin γ-carboxy-glutamic acids, the vitamin K dependent structures of Prothrombin, J. Biol. Chem., 250 (1975) 6125–6133.
7 P. Roepstorff and K. Kristiansen, The use of Edman degradation in peptide mixture analysis by mass spectrometry, Biomed. Mass Spectrom., 1 (1974) 231–236.
8 A.A. Kiryushkin, H.M. Fales, T. Axenrod, E.J. Gilbert and G.W.A. Milne, Chemical ionization mass spectrometry of complex molecules – VI: Peptides, Org. Mass Spectrom., 5 (1971) 19–31.
9 H.U. Winkler and H.D. Beckey, Field desorption mass spectrometry of peptides, Biochem. Biophys. Res. Commun., 46 (1972) 391–398.
10 D.F. Torgerson, R.P. Skowronski and R.D. Macfarlane, New approach to the analysis of non-volatile compounds, Biochem. Biophys. Res. Commun., 60 (1974) 616–618.
11 P. Håkansson, I. Kamensky, B. Sundqvist, J. Fohlman, P. Peterson, C. McNeal and R.D. Macfarlane, 127-I plasma desorption mass spectrometry of insulin, J. Am. Chem. Soc., 104 (1982) 2948–2949.
12 B. Sundqvist, I. Kamensky, P. Håkansson, J. Kjellberg, M. Salehpour, S. Widdiyasekera, J. Fohlmann, P.A. Peterson and P. Roepstorff, Californium-252 plasma desorption time of flight mass spectrometry of proteins, Biomed. Mass Spectrom., 11 (1984) 242–257.
13 B. Sundqvist, P. Roepstorff, J. Fohlman, A. Hedin, P. Håkansson, I. Kamensky, M. Lindberg, M. Salehpour and G. Säve, Molecular weight determination of proteins by Californium plasma desorption mass spectrometry, Science, 226 (1984) 696–698.
14 M. Barber, R.S. Bordoli, G.J. Elliot, R.D. Sedgwick, A.N. Tyler and B.N. Green, Fast atom bombardment of solids (FAB): A new ion source for mass spectrometry, J. Chem. Soc. Chem. Commun., (1981) 325–327.
15 M. Barber, R.S. Bordoli, G.J. Elliot, R.D. Sedgwick, A.N. Tyler and B.N. Green, Fast atom bombardment of bovine insulin and other large peptides, J. Chem. Soc. Chem. Commun., (1982) 936–938.
16 A. Dell and H.R. Morris, Fast atom bombardment high field mass spectrometry of 6000 dalton polypeptides, Biochem. Biophys. Res. Commun., 196 (1982) 1456–1462.
17 H.R. Morris, M. Panico and G.W. Taylor, FAB-mapping of recombinant-DNA protein products, Biochem. Biophys. Res. Commun., 117 (1983) 299–305.
18 P. Roepstorff and J. Fohlman, Proposal for a common nomenclature for sequence ions in mass spectra of peptides, Biomed. Mass Spectrom., 11 (1984) 601.
19 R.S. Johnson and K. Biemann, The primary structure of thioredoxin from Chromatium vinosum determined by high performance tandem mass spectrometry, Biochemistry, 26 (1985) 1209–1214.
20 G.J. Feistner, P. Højrup, C.J. Evans, D.F. Barofsky, K.F. Faull and P. Roepstorff, Mass spectrometric charting of bovine posterior/interior pituitary peptides, Proc. Natl. Acad. Sci. USA, 86 (1989) 6013–6017.
21 H.V. Scheller, J.S. Okkels, P.B. Høj, I. Svendsen, P. Roepstorff and B.L. Møller, The primary structure of a 4.0 kDa photosystem I polypeptide encoded by the chloroplast psaI gene, J. Biol. Chem., 264 (1989) 18402–18406.
Peter Roepstorff Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
PREFACE
Welcome to Protein Mass Spectrometry. There is no doubt that mass spectrometry has revolutionized protein biochemistry, as described in the eloquent foreword by my colleague Peter Roepstorff, who recounts the historical development of the field. The collection of chapters assembled in this volume will introduce the trainee biochemist to many of the approaches currently driving the field forward at an ever-increasing pace. It is of course an incomplete collection but, I believe, comprehensive enough to cover the most important lessons. I hope it is apparent from the work and ideas described that the field is moving forward with great momentum, fuelled by increasingly sophisticated instrumentation and powerful computational and informatics resources.
I am heartened by the positive support I have received from the mass spectrometry community as I have undertaken this project. I thank all the authors for their diligent work preparing the chapters and apologize sincerely for delays in production that can be blamed solely on me. I thank Professor Damià Barceló of the Ministerio de Ciencia y Technologia in Barcelona for the invitation to propose this volume, and the staff at Elsevier, especially Joan Anuels and Anne Russum, who have patiently guided me toward production, and Krishnan Balakrishnan, in command of final typesetting.
I will finish with a piece of advice for young scientists considering their direction: do not underestimate the contributions you might make to this field; it is still young and developing rapidly. Try to imagine what might be accomplished five or ten years from now with improved mass spectrometers, faster electronics and less expensive access to supercomputing resources, alongside the ever-expanding genomics databases.
Julian P. Whitelegge
SERIES EDITOR’S PREFACE
I am very pleased to introduce the second volume on proteomics in the Comprehensive Analytical Chemistry series. The first, edited by G. Marko-Varga, was entitled “Proteomics and peptidomics: New technology platforms elucidating biology” and was published in 2005 as volume 46. The field of proteomics has continued to grow over the last few years and more and more analytical groups are now involved in this type of work. This volume, entitled “Protein Mass Spectrometry” and edited by Julian P. Whitelegge, is a comprehensive compilation of chapters on the various aspects of this field.

In his foreword, Peter Roepstorff describes, in clear language that newcomers can understand, how mass spectrometry of proteins started far earlier than the 1990s. In the 1960s the first attempts were made to analyze peptides and proteins by mass spectrometry. The real breakthrough, however, took place at the ASMS conference in San Francisco in 1988, when John Fenn reported electrospray ionization spectra of intact proteins for the first time. John Fenn was later awarded the Nobel Prize for the discovery of electrospray as a new interfacing system in mass spectrometry. After the success of this work and other papers, it became clear to the scientific community that mass spectrometry is the key analytical technique in protein studies and in molecular and cellular biology.

The first five chapters introduce the reader to the principles and concepts of mass spectrometry and to ion mobility, and are useful for understanding the various applications of protein characterization. The middle chapters discuss many applications of protein analysis, such as integral membrane proteins, shotgun approaches, phosphoproteomics, histidine phosphorylation and analysis of deamidation in proteins. The last four chapters are devoted to quantitative proteomics, the accuracy of quantitation from metabolic labelling experiments and multiplexed quantitative proteomics.
This is certainly a timely book, since interest in mass spectrometry of proteins still dominates most of the mass spectrometry conferences taking place worldwide. This book will be of interest to a broad audience of analytical chemists, and especially to those who are already working in, or planning to enter, the field of proteins in biology or biomedicine.
Finally, I would like to thank all the contributing authors of this book for their time and efforts in preparing this comprehensive compilation of research papers that will make this volume on protein mass spectrometry a key reference book in this field.

D. Barceló
Department of Environmental Chemistry, IIQAB-CSIC
Barcelona, Spain
CHAPTER 1

An Introduction to the Basic Principles and Concepts of Mass Spectrometry

Kym F. Faull, Alek N. Dooley, Frederic Halgand, Lorelei D. Shoemaker, Andrew J. Norris, Christopher M. Ryan, Arthur Laganowsky, Jodie V. Johnson and Jonathan E. Katz
Contents

1. Opening Remarks
2. The Instrument
3. Vacuum Systems
4. Definitions
5. Resolution
6. Mass Accuracy
7. Isotopes
8. Reconciling Theoretical and Measured Masses
9. Charge State Assignment
10. The Need for Chromatography
11. The Myth of Defining Elemental Compositions
12. Desorption Ionization: Laser Desorption
13. Spray Ionization: Electrospray Ionization
14. Mass Analyzers
15. Time-of-Flight Mass Spectrometers
16. Linear Quadrupole Mass Filters
17. Quadrupole Ion Traps
18. Linear Ion Traps
19. Ion Cyclotron Cells and Fourier Transform Mass Spectrometry
20. The Orbitrap
21. Detectors
22. Electron Multipliers
23. Conversion Dynodes or High-Energy Dynodes
24. Quantification
25. Structural Elucidation by Mass Spectrometry
26. Gas Phase Ion Stabilities and Energetics of the Collisionally-Activated Dissociation Process
27. Collision-Induced Dissociation
28. Electron Capture Dissociation
29. Electron Transfer Dissociation
30. Scan Modes in Tandem Mass Spectrometry
31. Conclusions
Acknowledgements
References

Comprehensive Analytical Chemistry, Volume 52
ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00201-8
© 2009 Elsevier B.V. All rights reserved.
1. OPENING REMARKS

The definition of mass spectrometry as the science of manipulating gas phase ions identifies the physical technique with which this volume is concerned [1]. If a molecule can be converted into a gas phase ion it can be interrogated by this technique. Making possible the study of proteins by mass spectrometry required the development of methods to convert them into gas phase ions, and of techniques to separate the ions and detect them. At first this seemed an impossible task. How can a delicate polymeric chain of chemically dissimilar amino acids (which may contain thousands of carbon, nitrogen, oxygen, sulfur and phosphorus atoms) that is thermally unstable and has negligible vapor pressure be converted into a gas phase ion? The processes involved, however, are astonishingly straightforward. In this opening chapter the reader is introduced to the principles of these processes as a prelude to the more detailed and complex chapters that follow.

The technique of mass spectrometry began with the work of Thomson, Dempster and Aston [2] in the early 1900s. Since then, the field has grown exponentially, spurred on by several major developments. The literature is replete with accounts of these developments; a distinctive perspective is provided by Brunnée [3]. Combined gas chromatography-mass spectrometry was one of these developments in the 1960s. It allows for the direct analysis of complex mixtures, thus obviating the prior need for sample purity. This development more or less coincided with the use of the technique to monitor contamination of the environment with man-made chemicals including pesticides and other pollutants. Another significant development came in the 1970s and 1980s when the first desorption ionization techniques were introduced. Thermal stability was not necessary for successful ionization with these methods.
The so-called plasma desorption and fast atom bombardment ionization techniques, stemming from the work of Macfarlane [4] and Barber [5] and their colleagues, respectively, stand out as being particularly significant developments. These developments provided an inspirational lead for others to follow, and the subsequent discoveries of laser desorption (LD) and electrospray ionization (ESI), for which Tanaka [6], Karas and Hillenkamp [7] and Fenn [8] are widely acknowledged, provided robust ionization methods for a wide range of
both thermally stable and thermally unstable molecules including proteins and peptides. The chapters in this volume bear testimony to the significance of these discoveries.

Amino acids, peptides and proteins have several characteristics that make them particularly suitable for mass spectrometric analysis. Firstly, their ionizable functionalities render these molecules excellent candidates for ESI and LD ionization. Secondly, with the exception of isoleucine and leucine (Table 1), the unique masses of 18 of the 20 common protein amino acids allow identification on the basis of mass alone. Thirdly, the universal amide bond that links the amino acids means that the characterization of polymers of amino acids is not confounded by complications that arise from linkage heterogeneity, as is the case with carbohydrate characterization. Known rare exceptions to this third point include the isoaspartyl and isodityrosine bonds that occur in some mammalian proteins and peptide cross-linkages that occur in some bacterial cell wall proteins.
Table 1 The amino acids of proteins

                                                Residue molecular weights (Da)
Amino acid(a)    Three-letter   Single-letter   Integer   Average     Monoisotopic
                 symbol         symbol
Alanine          Ala            A               71        71.0788     71.03711
Arginine         Arg            R               156       156.1876    156.10111
Asparagine       Asn            N               114       114.1039    114.04293
Aspartic acid    Asp            D               115       115.0886    115.02694
Cysteine         Cys            C               103       103.1448    103.00919
Glutamine        Gln            Q               128       128.1308    128.05858
Glutamic acid    Glu            E               129       129.1155    129.04259
Glycine          Gly            G               57        57.0520     57.02146
Histidine        His            H               137       137.1412    137.05891
Isoleucine       Ile            I               113       113.1595    113.08406
Leucine          Leu            L               113       113.1595    113.08406
Lysine           Lys            K               128       128.1742    128.09496
Methionine       Met            M               131       131.1986    131.04049
Phenylalanine    Phe            F               147       147.1766    147.06841
Proline          Pro            P               97        97.1167     97.05276
Serine           Ser            S               87        87.0782     87.03203
Threonine        Thr            T               101       101.1051    101.04768
Tryptophan       Trp            W               186       186.2133    186.07931
Tyrosine         Tyr            Y               163       163.1760    163.06333
Valine           Val            V               99        99.1326     99.06841

(a) Amino acids have unique masses. The exceptions are isoleucine and leucine, which have identical elemental compositions. Lysine and glutamine have similar masses.
Figure 1 The basic components of a mass spectrometer.
2. THE INSTRUMENT

In the simplest configuration, a mass spectrometer consists of a sample introduction device, an ion source, a mass analyzer, a detector and a data system. These components are most easily conceived as being arranged in a linear array (Figure 1). Samples are introduced into the ion source in a solid, liquid (in solution or neat) or gaseous state. Proteins are only introduced in solution or as solids. Gas phase ions are made from the sample in the ion source. Spray (ESI, micro- and nano-spray) and LD ionization are the two most important methods for creating gas phase ions from proteins. These gas phase ions are then separated on the basis of their mass/charge (m/z) ratio in the mass analyzer. Quadrupole (Q), quadrupole ion trap (QIT), linear ion trap (LIT), time-of-flight (TOF), ion cyclotron resonance (ICR) and orbitrap analyzers are the most common mass analyzers used in protein research today. Often these analyzers are used in a linked or tandem arrangement in such a way that two analyzers, separated by a gas collision or reaction cell, are assembled in a linear array. The analyzers in a tandem arrangement may have a similar or a different design. Finally the ions are detected. Usually ion currents are measured, often with an electron multiplier, multichannel plate or photomultiplier. Ion counting is sometimes also used. The frequency of currents induced in detector plates (image current) is measured in ICR and orbitrap instruments. The data system allows the manipulation of the recorded signals.
3. VACUUM SYSTEMS

In the mass analyzer it is undesirable to have the gas phase analyte ions either deviate from their desired trajectories, or fragment by colliding with gas molecules, unless one deliberately chooses to orchestrate such a collision (as in tandem mass spectrometry or MS/MS). Therefore mass analyzers operate in a vacuum (usually ≤10⁻⁵ Torr; 1 Torr = 1 mm Hg). Vacuums of this quality are typically generated through two stages of pumping in which a high vacuum pump (ion pump, cryogenic pump, oil diffusion pump or, more commonly on modern instruments, a turbomolecular pump) is connected in series to a rotary mechanical pump.
4. DEFINITIONS

The molecular ion (designated M) represents the intact molecule, and is the precursor of any fragment or adduct ions formed during the ionization process. Typically the charge on the ion is indicated. If the charge results from loss of an electron, attachment of a proton or protons, alkali metal adduction or loss of a proton, the status of the molecular ion is so indicated (e.g. M+•, MH+, (M+6H)6+, MNa+ and (M–H)−, respectively). The unit used in mass measurement is the Dalton (Da). This is defined as 1/12th the mass of carbon-12 (12C). Bovine insulin, for example, has an elemental formula of C254H377N65O75S6 and a molecular weight of 5729.6009 Da for the monoisotopic (12C254 1H377 14N65 16O75 32S6) species.

Mass spectra are displayed graphically as ion current intensity on the ordinate (y-axis) plotted against the m/z ratio of the ions on the abscissa (x-axis). The ion current intensity can be expressed relative to the most intense signal in the spectrum (percent relative intensity), or alternatively as percentage total ionization (%Σ), which represents the abundance of individual ions expressed as a percentage of the total abundance of all the ions in the spectrum, or as absolute ion intensity (ion counts or ion current).

Integer mass refers to the mass of an element to the nearest whole number. Monoisotopic mass refers to the exact mass of the lightest isotope of each element in the formula. Thus for 12C the integer and monoisotopic masses are, by definition, the same. However, the exact masses of all other elements deviate slightly from their corresponding integer values (Table 1). This deviation from the integer value is referred to as the mass defect, which can be calculated from the monoisotopic mass. Hydrogen, for example, has an integer mass of 1 Da, while the monoisotopic mass of the 1H atom is 1.00782504 Da.
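The monoisotopic mass quoted above for the insulin formula can be reproduced by summing isotope masses. The following is a minimal sketch in Python; the isotope masses in the dictionary are standard reference values supplied here as an assumption (they are not listed in the text), and the function names are illustrative only.

```python
# Monoisotopic masses (Da) of the lightest stable isotope of each element.
# Standard reference values, assumed here; they are not taken from the chapter text.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
}
# Integer (nominal) masses of the same isotopes.
INTEGER = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32}


def monoisotopic_mass(formula):
    """Sum the isotope masses for a formula given as {element: atom count}."""
    return sum(MONOISOTOPIC[el] * count for el, count in formula.items())


def integer_mass(formula):
    """Sum the nominal (whole-number) masses for the same formula."""
    return sum(INTEGER[el] * count for el, count in formula.items())


def mass_defect(formula):
    """Difference between the monoisotopic mass and the integer mass."""
    return monoisotopic_mass(formula) - integer_mass(formula)


# Elemental formula quoted in the text for insulin.
insulin = {"C": 254, "H": 377, "N": 65, "O": 75, "S": 6}
print("monoisotopic mass:", round(monoisotopic_mass(insulin), 4), "Da")
print("integer mass:     ", integer_mass(insulin), "Da")
print("mass defect:      ", round(mass_defect(insulin), 4), "Da")
```

For a molecule of this size the accumulated mass defect amounts to several daltons, which is why integer masses cannot be used when comparing against a resolved spectrum.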
When the naturally occurring hydrogen isotope deuterium (2H, 2.014101787 Da, 0.015% natural abundance) is taken into account, the average mass of hydrogen is 1.00794 Da; this is also referred to as the chemical mass.

To interpret a spectrum it is essential to know the charge state of the ions. This can usually be determined with certainty, although there are occasions, particularly with low-resolution mass analyzers, when ambiguity arises. The base peak in a mass spectrum is the most intense peak in the spectrum.

There is a distinction to be made between profile and centroid data (Figure 2). Profile data is the direct readout from the detector (Figure 2, top panel). In profile data, each signal has a width associated with it, which results from a distribution of imperfect ion measurements. This width is described in terms of resolution (see below), and different analyzers vary in their resolving power. Peaks tend to be symmetrical; however, the asymmetry sometimes evident in profile data can be due to elemental heterogeneity in the ions contributing to the signal, and to imperfections in the performance of the mass analyzer. To simplify data manipulation and presentation (such as for textbook presentations), profile data can be converted to a centroid format by arithmetic manipulation. The centroid of a profile peak is defined as the m/z obtained from the weighted center-of-mass
Figure 2 A selected region of the electron ionization mass spectrum of perfluorinated kerosene collected on a time-of-flight mass spectrometer. Panel A shows a region of the profile spectrum with the apex mass and centroid at half-width/half-height indicated. Note the difference between the centroid mass position and the apex of the profile peak. Panel B shows the centroid mass position of the profile peak from panel A.
determination of that peak. Thus, the m/z of a centroid peak is not always the same as the m/z of the tallest point of the profile peak.
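The weighted center-of-mass definition of a centroid can be illustrated with a short Python sketch. The peak below is hypothetical (invented sample points for a slightly tailing peak), and the function names are illustrative; instrument data systems may apply thresholds or smoothing that are not modeled here.

```python
def centroid(mz, intensity):
    """Intensity-weighted mean m/z of the points that make up one profile peak."""
    total = sum(intensity)
    if total <= 0:
        raise ValueError("peak contains no signal")
    return sum(m * i for m, i in zip(mz, intensity)) / total


def apex(mz, intensity):
    """m/z of the tallest sampled point of the profile peak."""
    return max(zip(intensity, mz))[1]


# Hypothetical, slightly tailing profile peak sampled every 0.01 m/z units.
mz = [100.00, 100.01, 100.02, 100.03, 100.04, 100.05]
intensity = [5, 40, 100, 70, 30, 10]

print("apex m/z:    ", apex(mz, intensity))
print("centroid m/z:", round(centroid(mz, intensity), 4))
```

Because the example peak tails toward higher m/z, its centroid falls above the apex, which is the kind of offset the text describes.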
5. RESOLUTION

Resolution refers to the ability to discriminate between ions of similar m/z. This is a characteristic determined by the mass analyzer but can be influenced by the energy spread of ions emerging from the ion source. There are two methods by which resolution is calculated. The traditional method is defined as M/ΔM and should be accompanied by notation of the degree of peak separation used when making the calculation (Figure 3). A 10% valley (nadir to zenith ratio) is most commonly used while a 50% valley is often used for TOF analyzers (early TOF analyzers produced spectra with broad “feet”). While this method of calculating resolution is referred to as the M/ΔM method, it should more correctly be referred to as the (m/z)/(Δm/z) method, because the m/z values are used in the calculations.

There are three general levels of resolution. Low resolution usually implies unit resolution in the mass range of interest, as in Figure 3, for example, where m/z 906.5 is separated from 907.5 with a 10% valley. Medium resolution implies a resolution between 4,000 and 10,000. High resolution implies a resolution of 10,000 or greater. Conventional quadrupole and QIT analyzers are unit resolution instruments, although the new generation instruments, including LIT instruments, are capable
Figure 3 M/ΔM method for calculating resolution. The difference (ΔM) between the monoisotopic peak at 906.5 and the 13C1 peak at 907.5 is 1 Da. Thus, 906.5 Da divided by 1 Da equals 906.5, which is the resolution (M/ΔM method) at this point in the mass spectrum. Unit resolution is the term used when resolution equals the mass. This signal, from an ammonium adduct of a polypropylene glycol component, was collected on an electrospray triple quadrupole instrument.
of higher resolution. Double focusing magnetic sector instruments (magnetic and electrostatic analyzers in series) are high-resolution instruments. Older linear TOF analyzers are low-resolution instruments, but when equipped with a reflector, and particularly with time-lagged focusing, TOF analyzers are capable of high-resolution measurements. The orbitrap and ICR cells are capable of extraordinarily high resolution.

An example of the need for high resolution emerged during the analysis of the first moon rock samples brought back to earth from the Apollo mission in 1969. At this time there was interest in determining if these samples contained organic molecules indicative of life. Several laboratories prepared extracts of the samples and used mass spectrometry to search for amino acids, porphyrins and other biochemicals in the samples. One of the few findings was the presence of relatively large amounts of CO in the samples, but recognition of CO required sufficient resolution to separate CO from N2.

12C16O   Nominal mass 28   Monoisotopic mass 27.9949
14N2     Nominal mass 28   Monoisotopic mass 28.0061

The resolution required for the separation of CO and N2 is ≥28/0.0112 = 2,500. This resolution could not be obtained on quadrupole mass analyzers, and the discovery of the relatively large amounts of CO in the samples was made by the group that used a magnetic sector instrument with a much higher resolution (B. Halpern, personal communication) [9].
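The M/ΔM arithmetic for the CO and N2 example can be checked with a few lines of Python. This is a minimal sketch using only the masses quoted above; the function name is illustrative, and the nominal mass 28 is used as M, as in the text.

```python
def resolving_power(m, delta_m):
    """M/ΔM: the mass divided by the smallest mass difference to be separated."""
    if delta_m <= 0:
        raise ValueError("delta_m must be positive")
    return m / delta_m


# Monoisotopic masses quoted in the text.
mass_co = 27.9949   # 12C16O
mass_n2 = 28.0061   # 14N2

delta = mass_n2 - mass_co
print("mass difference:    ", round(delta, 4), "Da")
print("required resolution:", round(resolving_power(28, delta)))

# The unit-resolution example of Figure 3: peaks at m/z 906.5 and 907.5.
print("Figure 3 resolution:", resolving_power(906.5, 907.5 - 906.5))
```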
Figure 4 Full width at half height method (FWHM) for calculating resolution. The width of this peak at m/z 148.9605 at half height is 0.0337 Da. Thus, the FWHM resolution is calculated as 148.9605 divided by 0.0337 (= 4420). This signal is an EI fragment ion from a phthalate collected on a time-of-flight mass spectrometer connected to a gas chromatograph.
The other method for calculating resolution was developed for spectra obtained on TOF analyzers, but has since become the more commonly used approach. Termed the “full width at half height method” (FWHM), this calculation can be made from single ions (Figure 4). The M/ΔM and FWHM approaches for calculating resolution produce widely differing numerical values from the same data (Figure 5), hence the requirement for method stipulation when resolution values are cited.
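The FWHM calculation can be sketched in Python for a sampled profile peak. The example assumes a single, isolated peak and uses a synthetic Gaussian built to match the m/z and width quoted for Figure 4; the peak shape, sampling interval and function names are assumptions made for illustration, not details from the text.

```python
import math


def _crossing(x0, y0, x1, y1, y):
    """m/z at which the straight line through two samples reaches height y."""
    return x0 + (y - y0) * (x1 - x0) / (y1 - y0)


def fwhm(mz, intensity):
    """Full width at half height of a single isolated peak (linear interpolation)."""
    half = max(intensity) / 2.0
    top = intensity.index(max(intensity))
    left = top
    while left > 0 and intensity[left] > half:
        left -= 1
    right = top
    while right < len(mz) - 1 and intensity[right] > half:
        right += 1
    if intensity[left] > half or intensity[right] > half:
        raise ValueError("peak does not fall below half height within the data")
    low = _crossing(mz[left], intensity[left], mz[left + 1], intensity[left + 1], half)
    high = _crossing(mz[right - 1], intensity[right - 1], mz[right], intensity[right], half)
    return high - low


def fwhm_resolution(mz, intensity):
    """Peak m/z (taken at the apex) divided by the full width at half height."""
    top = intensity.index(max(intensity))
    return mz[top] / fwhm(mz, intensity)


# Synthetic Gaussian peak (assumed shape) at m/z 148.9605 with a width of 0.0337.
center, width = 148.9605, 0.0337
sigma = width / (2.0 * math.sqrt(2.0 * math.log(2.0)))
mz = [center + k * 0.001 for k in range(-100, 101)]
intensity = [math.exp(-0.5 * ((m - center) / sigma) ** 2) for m in mz]

print("FWHM:      ", round(fwhm(mz, intensity), 4))
print("resolution:", round(fwhm_resolution(mz, intensity)))
```

Linear interpolation between neighboring samples is the simplest choice; with coarser sampling a fitted peak shape would give a more reliable width.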
6. MASS ACCURACY

This refers to the accuracy with which the m/z value of a peak can be assigned. The precision of this assignment is a characteristic of the mass analyzer and is influenced by the scan-to-scan variation in the m/z assignment of the peak (i.e. reproducibility of the scan by the mass analyzer). The deviation between a calculated mass and the experimentally measured mass is expressed in millidaltons (mDa) or parts per million: ppm = (calculated mass − measured mass) × 10⁶ / calculated mass. A reasonable rule of thumb is that, with the current technology and ions up to about m/z 4,000, the best mass measurement accuracies are in the low ppm range (1–10).

All mass spectrometers must be calibrated. For external calibration the mass analyzer is calibrated across the mass range of interest with a standard that may be a single compound but more commonly is a mixture of compounds. The unknown(s) are then analyzed and the m/z values of detected ions are assigned
Figure 5 Comparison of M/ΔM and FWHM methods for calculating resolution. See Figures 3 and 4 for a description of the M/ΔM and FWHM methods, respectively. Notice the large difference in the numerical values for resolution calculated from the same data. This spectrum is the 16+ charge state electrospray ion of horse myoglobin recorded with a Fourier transform mass spectrometer.
using the calibration established with the standard. While external calibration is easier to perform, the most precise measurements are made with internal calibration in which the standard and the unknown are present in the source simultaneously. After collection of the spectra the signals from the standard are assigned, and then the assignments for the unknown signals are made by interpolation or extrapolation. With internal calibration the difficulty lies in obtaining a good balance between the ion intensities from the calibrant and the unknown(s), and the method requires the availability of suitable compounds for use as internal calibrants. Calibrants commonly used for electron impact and chemical ionization include perfluorinated kerosene (commonly referred to as heptacosa or FC-43) which are available as a range of boiling point mixtures (the thermal stability and negative mass defect of fluorinated calibrants are an advantage). Cesium iodide (CsI) and mixtures of CsI and RbI are often used as calibrants for fast atom bombardment (FAB) and liquid secondary ionization (LSIMS) sources, and sometimes for ESI and laser desorption (LD) sources. A variety of peptides, proteins and polypropylene glycols ([CH2CH2CH2O]n) are used as calibrants for ESI and matrix-assisted LD (MALDI).
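The ppm and mDa expressions for mass measurement error given earlier in this section translate directly into code. The sketch below is in Python; the measured value is hypothetical and the function names are illustrative.

```python
def ppm_error(calculated, measured):
    """(calculated - measured) x 10^6 / calculated, as defined in the text."""
    return (calculated - measured) * 1e6 / calculated


def mda_error(calculated, measured):
    """The same deviation expressed in millidaltons."""
    return (calculated - measured) * 1e3


def ppm_window(calculated, ppm):
    """Lower and upper mass limits that correspond to a +/- ppm tolerance."""
    half_width = calculated * ppm * 1e-6
    return calculated - half_width, calculated + half_width


calculated = 5729.6009   # monoisotopic mass quoted in the text for insulin
measured = 5729.5780     # hypothetical measured value, for illustration only

print("error:", round(mda_error(calculated, measured), 1), "mDa")
print("error:", round(ppm_error(calculated, measured), 1), "ppm")
print("5 ppm window:", tuple(round(x, 4) for x in ppm_window(calculated, 5)))
```

The same absolute deviation corresponds to a smaller ppm value at higher mass, which is why ppm rather than mDa is normally used when comparing accuracy across a wide mass range.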
7. ISOTOPES

Most elements have naturally occurring stable isotopes; this can be both a bane and a blessing. The bane comes from the added complexity they contribute
to the spectra; the blessing comes from the help they provide for determination of the chemical formula and the charge state of an ion (see below). In organic and biochemical applications, the appearance and interpretation of resolved mass spectra is complicated by the naturally occurring isotopes of C, H, O, N, P and S. Of these, the naturally occurring carbon-13 (13C) isotope adds the most complexity because of the prevalence of carbon and the fact that carbon-13 accounts for about 1.1% of the carbon on earth. If an ion contains 10 carbon atoms, then the signal corresponding to the 12C9 13C1 component will have a relative intensity of 10 × 1.1% = 11% compared to the 12C10 component. As illustrated in Figure 6, as the number of carbon atoms in an ion increases, so does the relative intensity of the 13C1 component (and 13C2, etc.) until it exceeds the intensity of the all-12C component. The 13C1, 13C2, etc. containing peaks are referred to as isotopomers. The masses and abundances of the common biological elements and their common isotopes are shown in Figure 7.
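The growth of the 13C1 signal with carbon number can be sketched with a binomial model. The Python example below considers carbon only and uses the rounded 1.1% abundance quoted in the text; the contributions of the other elements are ignored, and the function names are illustrative.

```python
from math import comb

P13C = 0.011  # fraction of carbon that is 13C, the rounded figure quoted in the text


def carbon_isotope_pattern(n_carbons, max_heavy=4, p=P13C):
    """Binomial probabilities of finding 0..max_heavy 13C atoms among n carbons."""
    return [
        comb(n_carbons, k) * p**k * (1 - p) ** (n_carbons - k)
        for k in range(min(max_heavy, n_carbons) + 1)
    ]


def relative_to_all_12c(n_carbons, p=P13C):
    """Intensity of the 13C1 peak divided by that of the all-12C peak."""
    pattern = carbon_isotope_pattern(n_carbons, max_heavy=1, p=p)
    return pattern[1] / pattern[0]


def first_crossover(p=P13C):
    """Smallest carbon count at which the 13C1 peak exceeds the all-12C peak."""
    n = 1
    while relative_to_all_12c(n, p) <= 1.0:
        n += 1
    return n


print("10 carbons, 13C1 relative to 12C10:", round(relative_to_all_12c(10), 3))
print("13C1 exceeds all-12C from", first_crossover(), "carbons upward")
```

The exact binomial ratio for 10 carbons is slightly above the 11% obtained from the simple product in the text, because the product ignores the small reduction in the all-12C population.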
Figure 6 Molecular ion regions of the ammoniated adduct ions formed by electrospray from acetonitrile (A) and polypropylene glycol components (B–F). Note the increased relative intensity of the 13C1 isotopomers with respect to carbon content. Panel E is not labeled for increased clarity.
Figure 7 Integer, monoisotopic and average (chemical) masses of the elements found in proteins.
8. RECONCILING THEORETICAL AND MEASURED MASSES

When the mass of a molecule is calculated (theoretical mass) for comparison with a measured mass, the resolution in the mass spectrum dictates which of the elemental masses (monoisotopic or chemical) should be used. In Figure 8, the comparison should be made with calculations based on monoisotopic masses for the resolved data (right panel), and on average (chemical) masses for the unresolved data (left panel). The molecular weight of a polymer is calculated as the sum of the residue masses of the monomer units, to which the mass of the end groups is added. In the case of peptides and proteins, as already stated, the amino acid masses are unique, with the exception of leucine and isoleucine (Table 1), making them particularly amenable to characterization on the basis of their mass.
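The residue-sum calculation can be sketched in Python using the monoisotopic residue masses of Table 1. The end groups of a free, unmodified linear peptide (H and OH) together add one water; the water mass used below is a standard value supplied as an assumption, and the sequences are arbitrary illustrations.

```python
# Monoisotopic residue masses (Da) copied from Table 1.
RESIDUE = {
    "A": 71.03711, "R": 156.10111, "N": 114.04293, "D": 115.02694,
    "C": 103.00919, "Q": 128.05858, "E": 129.04259, "G": 57.02146,
    "H": 137.05891, "I": 113.08406, "L": 113.08406, "K": 128.09496,
    "M": 131.04049, "F": 147.06841, "P": 97.05276, "S": 87.03203,
    "T": 101.04768, "W": 186.07931, "Y": 163.06333, "V": 99.06841,
}
# H + OH end groups of a free linear peptide (standard value, assumed here).
WATER = 18.01056


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified linear peptide (single-letter code)."""
    return sum(RESIDUE[aa] for aa in sequence) + WATER


print("PEPTIDE:", round(peptide_mass("PEPTIDE"), 4), "Da")
# Leucine and isoleucine cannot be told apart by mass alone.
print("I/L difference:", peptide_mass("PEPTIDE") - peptide_mass("PEPTLDE"))
# Lysine and glutamine share an integer mass but differ in exact mass.
print("K/Q difference:", round(RESIDUE["K"] - RESIDUE["Q"], 5), "Da")
```

For comparison with unresolved data the same function would be used with the average residue masses of Table 1 and the average mass of water.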
9. CHARGE STATE ASSIGNMENT

Charge state assignment is essential for the interpretation of any mass spectrum. Singly charged ions of organic molecules are characterized by a 1 Da separation between the 13C-containing isotopomers (ΔM), doubly charged ions are characterized by 0.5 Da separation, triply charged by 0.33 Da separation, etc. Because the charge state of an ion is the inverse of the separation between the 13C-isotopomers (charge state = 1/ΔM), the charge state can be calculated from the data in a mass spectrum so long as there is sufficient resolution of the
Figure 8 Comparison of insulin mass spectra collected under non-resolving and resolving conditions. The figure shows the molecular ion region of bovine insulin collected on a MALDI-TOF instrument at low resolution (left panel, linear mode with time-lagged focusing) and at a resolution of about 5,000 (FWHM, reflectron mode with time-lagged focusing, right panel). Both of these spectra are the average from 500 laser shots at a spot containing 5 pmol of bovine insulin and α-cyano-4-hydroxycinnamic acid matrix.
isotopomer ions. For example, the ion in Figure 3 has a charge state of 1 (1/1), the ions in Figure 5 have a charge state of 16 (1/0.0619 = 16.2), all the ions depicted in Figure 6 have a charge state of 1 (1/1), and the insulin ion in Figure 8 (right panel) is singly charged (1/1), but the charge state cannot be determined from the unresolved insulin ion (Figure 8, left panel).
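The rule that the charge state is the inverse of the isotopomer spacing can be sketched in Python. The peak lists below are illustrative: the first uses the two m/z values quoted for Figure 3, and the others are hypothetical values chosen for the example.

```python
def charge_state(isotope_mz):
    """Estimate the charge as 1 / (mean spacing between adjacent isotopomer peaks)."""
    if len(isotope_mz) < 2:
        raise ValueError("at least two resolved isotopomer peaks are needed")
    spacings = [b - a for a, b in zip(isotope_mz, isotope_mz[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(1.0 / mean_spacing)


# Monoisotopic and 13C1 peaks quoted for Figure 3.
print(charge_state([906.5, 907.5]))
# Hypothetical doubly charged ion: isotopomers 0.5 m/z units apart.
print(charge_state([500.0, 500.5, 501.0]))
# Hypothetical peaks with the 0.0619 spacing quoted for Figure 5.
print(charge_state([1060.0000, 1060.0619, 1060.1238]))
```

Averaging over several spacings reduces the influence of the error in any single peak position, which matters most at high charge states where the spacing is small.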
10. THE NEED FOR CHROMATOGRAPHY

The idea of eliminating chromatography from the workflow of complex sample analysis, thus limiting manipulation of the sample and shortening analysis time, has been repeatedly revisited. In 1966 chemical ionization (CI) was discovered, and at the time was thought to have promise in this regard [10]. The reason for this expectation was the hope that the simple spectra produced during CI would have sufficient inherent specificity for the identification and quantification of targeted compounds in complex extracts from direct analysis by solid probe techniques. For complex samples this hope did not materialize except in a few isolated circumstances [11,12]. The subsequent development of tandem mass spectrometry revitalized this hope, and the more recent development of MALDI-TOF has had the same effect. Nevertheless, complex samples almost invariably require chromatography, and the time delay and extra complications imposed appear inescapable facts of life.

The direct coupling of LC to MS is more difficult than GC/MS coupling because of the gas load created by vaporization of the LC effluent (e.g. 100 µl/min
of water will create 124.4 ml/min of vapor at standard temperature and pressure). Despite this problem there have been several generations of LC-MS interfaces. Today, for peptides and proteins LC-MS is virtually the exclusive domain of ESI [13]. However, because of limitations imposed by ESI, there are restrictions on the type of eluants that can be used for LC during LC-MS. Because ESI is sensitive to involatile inorganic salts (the inorganic salts tend to sequester the ion current and strip the charge from the organic analytes of interest), eluants containing common biochemical buffers such as phosphate, sulfate, Tris, sodium and potassium are poorly tolerated. Reverse phase LC using aqueous eluants mixed with an organic modifier (methanol or acetonitrile) is well tolerated, with dilute acid for the positive ion mode (commonly 0.1% formic or acetic acid, because trifluoroacetic acid tends to suppress the ion signal) and appropriate pH adjustment for the negative ion mode (ammonium acetate, pH 5).

Multidimensional chromatography is required for particularly complex samples. The combination of ion exchange and reverse phase, developed by Yates and colleagues [14], has been an important development because of the unidimensional design and the high capacity of both chromatographic modalities. The disadvantage of this method is the relatively low resolution of the ion exchange step, which results in the appearance of an analyte in more than one ion exchange fraction. Other multidimensional techniques, such as chromatofocusing combined with reverse phase chromatography in a two-dimensional (2D) format, have also been used, but the low resolution of chromatofocusing has limited the applicability of this approach. There remains a need for a robust, high capacity, high resolution, mass spectrometrically compatible, orthogonal separation modality that can be conveniently used in combination with reverse phase chromatography.
Hydrophilic interaction chromatography (HILIC) meets some of these requirements, and although the newer resins appear to have adequately stable behavior and good resolving power, their capacity has yet to be thoroughly tested.
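The gas load quoted earlier in this section for vaporized LC effluent can be checked with the ideal gas molar volume. The Python sketch below takes the liquid flow as 100 µl/min of water, the value that reproduces the quoted 124.4 ml/min; the constants are standard values supplied as assumptions, and the function name is illustrative.

```python
MOLAR_MASS_WATER = 18.015   # g/mol (standard value, assumed)
MOLAR_VOLUME_STP = 22.414   # litres per mole of ideal gas at 0 °C and 1 atm (assumed)
DENSITY_WATER = 1.0         # g/ml (assumed)


def vapor_ml_per_min(liquid_ul_per_min):
    """Volume of water vapor at STP generated per minute from a liquid flow in µl/min."""
    grams = liquid_ul_per_min * 1e-3 * DENSITY_WATER   # µl -> ml -> g
    moles = grams / MOLAR_MASS_WATER
    return moles * MOLAR_VOLUME_STP * 1e3              # litres -> ml


print(round(vapor_ml_per_min(100), 1), "ml/min of vapor from 100 µl/min of water")
```

The roughly 1,200-fold expansion on vaporization is the reason conventional LC flow rates place a heavy demand on the pumping system.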
11. THE MYTH OF DEFINING ELEMENTAL COMPOSITIONS

Because all elements have a unique mass, it has long been proposed that if the m/z of an ion is measured with sufficient accuracy, an elemental composition can be determined from first principles without additional information. However, in practice this is only fulfilled at relatively low m/z and with the best available mass measurement accuracies (low ppm range; Figure 9). At higher m/z there is an exponentially increasing number of elemental possibilities that can account for the observed mass of an ion. Therefore, to restrict the number of acceptable elemental combinations, an exponentially increasing mass measurement accuracy would be required, and this is not provided by any of the available mass analyzers. Conversely, an accurate mass measurement can always be used to substantiate a hypothesized elemental formula, regardless of the molecular weight.

At the current level of technological development the following rules of thumb are worthy of note. Mass measurement accuracies of less than 10 ppm are
Figure 9 The power of exact mass. The table lists the combinations of carbon, hydrogen, nitrogen, oxygen, phosphorus and sulfur that can account for the three listed ions at four different mass measurement accuracies. The elemental compositions are restricted to 0–500 for carbon, 0–1,000 for hydrogen, 0–6 for nitrogen and oxygen, 0–3 for phosphorus and 0–4 for sulfur.
generally accepted as adequate for reconciling measured m/z signals with calculations based on molecular formulas, particularly with molecular weights below 2 kDa. Mass measurement accuracies better than 1 ppm are rarely achieved on a routine basis, although very recent results with the orbitrap instrument suggest such measurements may become more common in the future (Joshua Coon, unpublished observations). Measurements by Fourier transform mass spectrometry (FTMS) typically achieve accuracies better than 3 ppm. New generation TOF mass spectrometers can now produce 2–5 ppm accuracy. The orbitrap mass spectrometer can produce low ppm accuracies. Quadrupole and QIT mass measurement accuracies are generally no better than 20 ppm.
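The dependence of the number of candidate formulas on mass measurement accuracy can be explored with a brute-force search. The Python sketch below is restricted to C, H, N and O (phosphorus and sulfur are omitted for brevity), limits nitrogen and oxygen to 0–6, and applies no valence or chemical-sense rules, so it is pure arithmetic rather than the procedure behind Figure 9. The isotope masses are standard values supplied as assumptions, and the target (the formula C8H10N4O2) is chosen only for illustration.

```python
# Monoisotopic masses (Da); standard reference values, assumed here.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def formula_mass(c, h, n, o):
    """Monoisotopic mass of a CcHhNnOo composition."""
    return c * MASS["C"] + h * MASS["H"] + n * MASS["N"] + o * MASS["O"]


def candidate_formulas(target, ppm, max_n=6, max_o=6):
    """All (C, H, N, O) counts whose mass lies within +/- ppm of the target.

    Assumes the tolerance is far smaller than 1 Da, so that for each C, N, O
    combination only the nearest hydrogen count needs to be tested.
    """
    tol = target * ppm * 1e-6
    hits = []
    for c in range(int(target // MASS["C"]) + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = target - formula_mass(c, 0, n, o)
                if rest < -tol:
                    break  # already too heavy; adding more oxygen only adds mass
                h = max(0, round(rest / MASS["H"]))
                if abs(formula_mass(c, h, n, o) - target) <= tol:
                    hits.append((c, h, n, o))
    return hits


target = formula_mass(8, 10, 4, 2)  # C8H10N4O2, an illustrative target
for ppm in (100, 10, 1):
    print(ppm, "ppm:", len(candidate_formulas(target, ppm)), "candidate formulas")
```

Running the same search on progressively heavier targets is a simple way to see the number of arithmetic possibilities grow with m/z at a fixed ppm tolerance.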
12. DESORPTION IONIZATION: LASER DESORPTION

Today LD is the most important desorption ionization method in use, and the acronym MALDI is used to denote when LD is used in conjunction with a matrix to assist the ionization process. Work in the 1970s and 1980s, particularly that of Ron Macfarlane with 252Cf-desorption from a solid surface, and of Michael Barber and colleagues with atom (and subsequently ion) beam desorption from a liquid surface, provided the inspiration for others to search for improved methods for the efficient ionization of large molecules [4,5,15,16]. LD had also been under investigation since the 1980s, but crucial observations by Tanaka, and by Karas and Hillenkamp, who obtained strong ion currents for proteins when they were
An Introduction to the Basic Principles and Concepts of Mass Spectrometry
co-crystallized with a small organic molecule which absorbed energy in the region of the laser light source, ushered in the MALDI era [6,7]. This technique is usually used in conjunction with a TOF mass analyzer because of the compatibility of the pulsed ion source and the TOF analyzer (Figure 10). Probably the most gentle mode of ionization available today, it is used for the analysis of a wide range of compounds, from low-mass materials to proteins with molecular weights >100 kDa. The ionization process can tolerate inorganic salts and other additives commonly used in biochemical preparations (except sodium dodecyl sulfate, SDS). The process can be extraordinarily sensitive (fmol–pmol required), but the high chemical background derived from the matrix can be a problem in the low m/z range (m/z < 500). LDMS spectra are generally characterized by intense singly charged ions. The mechanism of ion formation is probably complex and it is likely that there is more than one process involved. A simple analogy can be made with the CI and perhaps also the FAB ionization processes. Ion formation in CI and FAB has been thoroughly investigated. It is generally thought to involve a reaction between a gas phase ion (a proton donor derived from the reagent gas or liquid matrix) and a gas phase molecule (the analyte). A similar reaction may take place during LDMS. Immediately following absorption of the laser pulse a dense plume of gas phase material is generated. Desorbed analyte molecules react in the gas phase with proton donors that are formed upon molecular disintegration of the organic molecules (matrix and sample) in the immediate vicinity of the point of laser impact. Such a gas phase ion-molecule reaction would have favorable rate constants and would predominantly result in the formation of singly protonated analyte molecules.
Figure 10 Simplified schematic representation of a laser desorption ion source.
Figure 11 Some commonly used MALDI matrices: 2,5-dihydroxybenzoic acid (left, DHB, used for peptides, carbohydrates and glycolipids), α-cyano-4-hydroxy-cinnamic acid (center, commonly referred to as “alpha”, used for peptides) and 4-hydroxy-3,5-dimethoxy-cinnamic acid (right, sinapinic acid or sinapic acid, used for proteins).
The art in MALDI analysis lies in the preparation and co-crystallization of the matrix and sample. A number of matrices have been successfully used. Those favored for peptides and proteins are simple aromatic organic acids that absorb at or near the wavelength of the laser pulse. Those most commonly used are shown in Figure 11. The accuracy of molecular weight measurements of proteins by MALDI-TOF is <0.1% of the mass.
13. SPRAY IONIZATION: ELECTROSPRAY IONIZATION This phenomenon was originally observed by Malcolm Dole, but rediscovered by John Fenn and colleagues at Yale in the early 1980s, work for which Fenn was awarded the 2002 Nobel Prize in Chemistry [17–21]. Alexandrov and colleagues in Russia independently described the phenomenon, but their contribution is generally not widely acknowledged in the West [22]. The process works by forcing a solution of the analyte through a capillary to which a voltage is applied (Figure 12). For flow rates above several microliters per minute (typically referred to as electrospray), a coaxial gas is applied to aid in the formation of a spray from the tip of the capillary. For flow rates less than several microliters per minute (typically referred to as micro- or nano-spray), no gas is needed as a spray develops naturally due to the high voltage applied. The emerging droplets of the spray are charged, lending the name to the technique. On encountering heat and frequently an opposing flow of gas (bath gas), the solvent in the droplets begins to evaporate. The prevailing theory is that, as the droplet size decreases due to this evaporation, the charge density in the droplet increases and eventually naked ions are ejected as a consequence of charge repulsion. Molecules emerge with one or more proton adducts; for example, MH+ in the positive mode (Figure 13A), (M–H)− in the negative mode (Figure 13B); and larger molecules emerge with
Figure 12 Simplified schematic representation of an electrospray ionization source. Adapted from the Ionspray™ design of Applied Biosystems/MDS Sciex.
many charges (Figure 13C). These ions then enter the high vacuum of the mass spectrometer through varied inlet technologies and geometries depending on the type of analyzer and the ESI source design. This technique is extraordinarily effective at producing gas phase ions from a wide range of involatile molecules. Thermal stability is not a requirement, but the process is intolerant of involatile inorganic ions and of sodium dodecyl sulfate. The ionization mass range is presumably unlimited, certainly >100 kDa, but because the observed m/z values of ions from large molecules (e.g. proteins) are almost always less than 4,000 (often less than 2,500) due to the multiple charge states, ESI is frequently attached to a mass analyzer with a limited mass range (e.g. quadrupole and ion trap instruments). The production of multiply charged ions from macromolecules such as proteins (Figure 13C) may at first glance appear to complicate the interpretation of the spectra [23]. However, a number of arithmetic algorithms have been developed for deconvolution of the mass spectra into true molecular weight spectra (Figure 13D). One easy way to visualize the process is to calculate the molecular weights for each ion in the spectrum across a range of charge states (Figure 14). Ions of differing charge state originating from the same molecule then emerge as matching molecular weights across the diagonal of the display. Ions that are not part of the series do not fit the matching molecular weight at any charge state. The accuracy of molecular weight measurements by ESI is astonishingly good, and for proteins is usually quoted as better than ±0.01% of the mass. A celebrated example of the accuracy of this mass assignment emerged shortly following the use of ESI for the analysis of proteins. The molecular weight of myoglobin (153 residue protein) calculated from the X-ray structure is 16,950.5 Da. The molecular weight measured by ESI was found unequivocally to be 16,951.5 Da [24].
Apparently residue 122 was mis-assigned in the X-ray
Figure 13 Representative ESI mass spectra collected on a quadrupole mass spectrometer: (A) dimyristoyl phosphatidyl choline, positive ion mode, (B) bovine brain sulfatides, negative ion mode, (C) bovine heart myoglobin, positive ion mode and (D) molecular weight spectrum derived from the mass spectrum shown in (C).
structure as asparagine rather than aspartic acid (–COOH = 45 Da, –CONH2 = 44 Da). This mis-assignment may not be important for the function of myoglobin, but in biology there are many examples where the 1 Da conversion of the free carboxyl form of a peptide hormone to the amide form has a major effect on the biological potency of the molecule.
Observed ions (m/z) and molecular weights (Da) calculated for ion charge states (z) 12–24
    m/z      z=12      z=13      z=14      z=15      z=16      z=17      z=18      z=19      z=20      z=21      z=22      z=23      z=24
1305.00   15647.9   16951.9*  18255.9   19559.9   20863.9   22167.9   23471.9   24775.9   26079.8   27383.8   28687.8   29991.8   31295.8
1211.80   14529.5   15740.3   16951.1*  18161.9   19372.7   20583.5   21794.3   23005.1   24215.8   25426.6   26637.4   27848.2   29059.0
1131.10   13561.1   14691.2   15821.3   16951.4*  18081.5   19211.6   20341.7   21471.8   22601.8   23731.9   24862.0   25992.1   27122.2
1060.50   12713.9   13773.4   14832.9   15892.4   16951.9*  18011.4   19070.9   20130.4   21189.8   22249.3   23308.8   24368.3   25427.8
 998.20   11966.3   12963.5   13960.7   14957.9   15955.1   16952.3*  17949.5   18946.7   19943.8   20941.0   21938.2   22935.4   23932.6
 942.70   11300.3   12242.0   13183.7   14125.4   15067.1   16008.8   16950.5*  17892.2   18833.8   19775.5   20717.2   21658.9   22600.6
 893.20   10706.3   11598.5   12490.7   13382.9   14275.1   15167.3   16059.5   16951.7*  17843.8   18736.0   19628.2   20520.4   21412.6
 848.60   10171.1   11018.7   11866.3   12713.9   13561.5   14409.1   15256.7   16104.3   16951.8*  17799.4   18647.0   19494.6   20342.2
 808.20    9686.3   10493.5   11300.7   12107.9   12915.1   13722.3   14529.5   15336.7   16143.8   16951.0*  17758.2   18565.4   19372.6
 771.50    9245.9   10016.4   10786.9   11557.4   12327.9   13098.4   13868.9   14639.4   15409.8   16180.3   16950.8*  17721.3   18491.8
 738.00    8843.9    9580.9   10317.9   11054.9   11791.9   12528.9   13265.9   14002.9   14739.8   15476.8   16213.8   16950.8*  17687.8
Figure 14 Interpretation of the spectrum of multiply charged ions derived from the ESI spectrum of myoglobin shown in Figure 13C. The observed ions from the mass spectrum (Figure 13C) are listed in the left-hand column. The body of the table lists the molecular weights calculated for each ion at charge states 12–24 using the formula molecular weight = (m/z × charge state) − (charge state × mass of the charging species), where the charging species gained is H, mass 1.0079 Da. The correspondence marked with asterisks along the diagonal reveals the matching calculations, which show that the observed ions have charge states ranging from 13 (m/z 1305.0) to 23 (m/z 738.0). The average of these calculations (16951.38) is within experimental error (±0.01% for measurements of this type made on quadrupole mass spectrometers at unit resolution) of the molecular weight calculated from the protein sequence (16951.5 Da).
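The diagonal search in Figure 14 can be reproduced with a few lines of code. The sketch below is a simplified illustration, not one of the published deconvolution algorithms: it assumes the peaks are listed from highest to lowest m/z, that adjacent peaks differ by exactly one charge, and that every charge is carried by H (1.0079 Da, as in the figure). The function names `neutral_mass` and `assign_charge_series` are hypothetical.

```python
H_MASS = 1.0079  # Da, mass of the charging species used in Figure 14

# Observed m/z values from Figure 14, highest m/z (lowest charge) first.
OBSERVED = [1305.00, 1211.80, 1131.10, 1060.50, 998.20, 942.70,
            893.20, 848.60, 808.20, 771.50, 738.00]


def neutral_mass(mz, z, adduct=H_MASS):
    """Molecular weight implied by an ion at m/z carrying z adducts."""
    return mz * z - z * adduct


def assign_charge_series(mzs, z_candidates):
    """Pick the starting charge that makes all peaks agree on one mass.

    For each trial charge z0 of the first (highest m/z) peak, consecutive
    peaks are given charges z0, z0 + 1, ... and the spread of the resulting
    molecular weights is computed. The trial with the smallest spread wins.
    Returns (z0, mean molecular weight).
    """
    best = None
    for z0 in z_candidates:
        masses = [neutral_mass(mz, z0 + i) for i, mz in enumerate(mzs)]
        spread = max(masses) - min(masses)
        if best is None or spread < best[0]:
            best = (spread, z0, masses)
    _, z0, masses = best
    return z0, sum(masses) / len(masses)


if __name__ == "__main__":
    z_first, mw = assign_charge_series(OBSERVED, range(5, 31))
    print(f"first peak charge = {z_first}, mean molecular weight = {mw:.2f} Da")
```

Minimizing the spread is only one possible scoring choice; it works here because a wrong starting charge shifts neighboring estimates apart by roughly one m/z spacing, far more than the measurement scatter.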
14. MASS ANALYZERS Gas phase ions can be separated on the basis of their m/z ratio using the physical principles that define the motion of ions within magnetic or electromagnetic fields, or the time taken to drift through a field-free space following acceleration. These techniques spread the ions in time and/or space, thus allowing their independent detection. Mass analyzers are usually grouped according to the physical principles by which ions of different m/z are separated, for example magnetic sector, TOF, ICR, etc. Another way to group mass analyzers is according to their mode of operation with respect to how the ion beam is generated; thus, there are instruments that have a continuous mode of operation (so-called scanning instruments; magnetic sectors, quadrupoles), others with a pulsed mode of operation (TOF), and yet others with an ion trapping mode (QITs, ICR cells, orbitraps). Yet another way to group mass analyzers is with respect to how the ion beam is measured, either as integrating or non-integrating instruments. All mass analyzers have advantages and disadvantages, and there is no single instrument that is ideal for all applications and experiments. The nature of the problem and the instruments available in the laboratory dictate the mass analyzer that will be used for any given experiment. The distinction between integrating and non-integrating analyzers should be dealt with in more detail. At any point in time, a non-integrating mass analyzer (e.g. a quadrupole analyzer) focuses only a small portion of the ion beam at the detector, and the remainder of the ion beam is discarded. However, integrating mass analyzers (e.g. TOF and ICR cells) detect essentially the entire ion beam. Thus with pulsed beam ion sources, integrating analyzers have an advantage over their non-integrating counterparts. With continuous beam ion sources, the
advantage of integrating instruments is diminished unless the duty cycle of the analyzer is such that a major portion of the ion beam is accepted. The disadvantage for non-integrating instruments can be offset, at least partially, for targeted compound analysis, by focusing on a specific ion or ions. In the MS mode this is referred to as selected ion monitoring (SIM), and in the MS/MS mode this is referred to as selected reaction monitoring (SRM, parent ion to product ion transition or reaction) or multiple reaction monitoring (MRM). Thus with SIM, SRM and MRM, the ion- or reaction-specific traces or chromatograms are collected in lieu of complete mass spectra. These modes of data collection are commonly used with chromatographic sample introduction for quantitative analyses.
15. TIME-OF-FLIGHT MASS SPECTROMETERS The method of ion separation by these instruments is most easily visualized by imagining a collection of ions aligned on a plane in a vacuum (Figure 15). A plate positioned behind the ions is then charged with a high electrical potential of the same polarity. The ions are repelled away from the plate. The force of repulsion depends on the charge of the ion and not on its mass. Two singly charged ions of different masses therefore feel the same repulsive force; however, the lower inertia of the less massive ion translates into a higher velocity away from the plate. After passing through some grids that shield the ions from the force fields, they then drift through a field-free region until they strike the detector. The ions with low m/z strike the detector first, followed by ions of higher m/z. The time taken for the ions to traverse the drift tube is recorded. This time depends on mass, the number of charges, and the acceleration potential. The flight time (t)
Figure 15 Schematic representation of a MALDI-TOF instrument equipped with a reflectron. The sample is co-crystallized with matrix (see Figure 10). The function of the grids is to shield ions in the flight tube from the voltages associated with the source. In this way, the flight tube is ‘‘field free’’, allowing the ions to drift toward the detector after their initial acceleration. Adapted from an Applied Biosystems instrument design.
can be calculated from the equation:
$$t = \frac{d}{\sqrt{2U}} \sqrt{\frac{m}{q}}$$
where d is the distance traveled, U the voltage used to repel the ion into the field-free drift tube, m the mass of the ion and q the charge on the ion. Typical instrument designs have flight tubes of 1 m or so in length and use repelling (usually referred to as accelerating) voltages of around 20 kV. So, for example, the singly charged ions of mass 23 (Na+), 100, 1,000 and 4,000 Da in a 1-m flight path with 20 kV accelerating voltage experience flight times of 2.44, 5.09, 16.10 and 32.19 μs, respectively. Of course it is impossible to perfectly align a collection of ions on a plane in the gas phase. However, laser pulses, when directed at samples embedded in a matrix on a flat surface, produce packets of ions with relatively small dispersion in energy and space. Hence the combination of LD and TOF analyzers is a compatible configuration. In addition, orthogonal extraction of ions from a continuous ion beam by the use of a repeller (or pusher) operated at high voltage and frequency effectively produces packets of ions also with small energy spread. Thus these analyzers are also used in combination with continuous beam ion sources. The performance of TOF analyzers is improved with the use of time-lagged focusing of the ions in the ion source. In the case of laser desorption, this can be visualized by imagining the packet or plume of gas phase ions that result from the impact of a laser pulse as a cloud. With the repelling voltage on continuously, the ions will move down the flight tube as a cloud immediately after formation, thus limiting the separation (resolution) of ions of different m/z. If, however, the repelling voltage is applied a small interval after the impact of the laser pulse, those ions in the cloud closest to the repelling plate will experience a slightly stronger repulsive force than those ions on the distant side of the cloud.
The effect will be to narrow the width of the cloud and change its shape into a pancake. Thus the energy and spatial dispersion of ions will be reduced, resulting in improved separation between ions of similar m/z as they travel down the flight tube. The performance of these instruments is also improved with the use of a reflectron, otherwise known as a Mamyrin ring after the inventor Boris Mamyrin (Figure 15), which is in effect an electrostatic mirror. These devices serve to correct, during flight, for small differences in the velocity spread of ions with the same m/z, and thus improve resolution. The reflector is positioned toward the end of the drift tube. Ions with higher velocity penetrate deeper into the mirror than those with the same m/z but lower velocity. On emerging from the mirror, ions with the same m/z have less spatial separation and are more clearly separated from ions with a slightly different m/z. Thus an instrument may be equipped with two detectors, one positioned at the end of the flight tube for use when the reflector is not used (linear mode of operation), and the other positioned off-axis to collect ions as they emerge from the reflector (Figure 15). On commercial TOF instruments the reflectors are effective at separating the
carbon isotope clusters in molecules out to about m/z 6,000. Examples of insulin MALDI-TOF spectra collected both with and without the use of a reflector are shown in Figure 8. Most modern TOF analyzers are equipped with both a reflector and time-lagged focusing to improve resolution.
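The flight times quoted earlier in this section for a 1-m drift tube and 20 kV acceleration follow directly from the flight-time equation once the mass and charge are expressed in SI units. The sketch below is an idealized calculation under the same assumptions as the equation (all ions start on a plane with no initial velocity and drift a fixed distance); the function name `flight_time_s` and its default arguments are hypothetical, and the physical constants are standard values not given in the chapter.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def flight_time_s(mass_da, charge=1, distance_m=1.0, voltage_v=20e3):
    """Ideal drift time t = d / sqrt(2U) * sqrt(m / q), in seconds."""
    m = mass_da * DALTON
    q = charge * E_CHARGE
    return distance_m / math.sqrt(2.0 * voltage_v) * math.sqrt(m / q)


if __name__ == "__main__":
    for mass in (23, 100, 1000, 4000):
        print(f"{mass:>5} Da: {flight_time_s(mass) * 1e6:.2f} microseconds")
```

Because t scales with the square root of m/q, a 40-fold increase in mass (100 to 4,000 Da) lengthens the flight time only about 6.3-fold, which is why the time axis of a TOF spectrum is compressed at high m/z.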
16. LINEAR QUADRUPOLE MASS FILTERS The operation of these instruments (and the QITs and LITs described below) is based on the motion of ions in modulated electric fields, a combination of radio frequency (RF) and direct current (DC) fields. Wolfgang Paul and Hans Dehmelt were awarded the Nobel Prize in Physics in 1989 for their independent work developing the theory behind their operation and performance, and for constructing the first instruments. In the late 1950s, Robert Finnigan saw the practical potential of these devices and was the first to commercialize their construction. This effort was aided by the growing need for robust methods to monitor environmental contamination with industrial and other pollutants, and it coincided with the virtually simultaneous development of combined GC/MS. At this time, magnetic sector instruments were the most common type of mass analyzer in use. Quadrupoles appeared more suited to GC coupling because they operated at higher pressures than magnetic sectors and, because they were not hampered by the difficulties inherent in rapidly changing a magnetic field, they could be scanned at rates more compatible with the 10–20 s GC peak widths common on the packed columns that were then in use. Linear quadrupole mass filters, or commonly, quadrupole mass filters or “quadrupoles”, are not to be confused with their three-dimensional (3D) or cubic counterparts (the QITs discussed in the next section). The filter consists of an assembly of four symmetrically arranged rods to which RF and DC voltages are supplied (Figure 16). Theoretically, rods with hyperbolic cylindrical cross-sections perform better, although for convenient fabrication most are constructed with carefully machined rods with a circular cross-section. The rods are positioned with identical diagonal distances between them.
With the z-axis running longitudinally down the center of the assembly, and the x–y axes perpendicular to the rods, ions are injected along the z-axis. The motion of ions in the x and y planes, but not in the z-direction, is influenced by the superimposed RF and DC fields. This motion is described by the Mathieu equations [25]. Optimal performance of the filter depends on precise and stable DC and RF fields along the entire length of the rod assembly. The injected ions undergo transverse oscillations in the x–y plane, with frequencies that depend on their m/z value and with oscillations about the z-axis that depend on the magnitude of the applied RF and DC fields. With proper selection of RF and DC, ions of a given m/z will have relatively small oscillations about the z-axis, and they will pass through the assembly without striking either the rods or the walls of the device, and impinge upon the detector positioned at the end of the assembly; ions with other m/z values will have larger oscillations about the z-axis that increase in amplitude until they collide with the rods or the walls of the device, thus being neutralized
Figure 16 Schematic representation of a set of quadrupole rods. The x, y and z axes are labeled. Ions injected into the assembly along the z-axis undergo oscillations of varying magnitude in the x and y planes as they travel down the length of the rods. Two cross-section diagrams are shown, one with circular rods, the other with more efficient, but harder to manufacture, hyperbolic rods.
and removed from the ion beam. A mass spectrum is usually obtained by linearly increasing the DC and RF voltages, keeping their ratio and the oscillator frequency constant, allowing the successive emergence of ions of increasingly higher m/z. Advantages of this design include the ease with which electric fields can be controlled, as opposed to magnetic fields, and the linear mass scale. The m/z range of commercial instruments is up to 4,000 with unit resolution throughout the range, although newer instruments can operate with peak widths of 0.1 m/z (FWHM). The small size, fast scan speed, relative mechanical simplicity of the device (although the ion motion itself is complex) and low construction costs are also attractive features of these instruments. Another important use of these devices emerged with the realization that with the DC voltage set to zero and the RF maintained, the ions remained focused along the z-axis with essentially no mass selectivity. When operated in this RF-only mode (wide band pass) the assembly is used as an ion guide and as a gas phase collision cell in tandem instruments.
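The linear mass scale mentioned above can be illustrated numerically. The sketch below is not taken from this chapter: it assumes the common textbook form of the Mathieu parameters for a linear quadrupole, a = 8eU/(m r0² Ω²) and q = 4eV/(m r0² Ω²), with U the DC voltage, V the zero-to-peak RF amplitude, r0 the field radius and Ω the angular RF frequency, and it assumes the scan line passes through the apex of the first stability region (a ≈ 0.237, q ≈ 0.706). Other voltage conventions differ by factors of two, real instruments operate slightly below the apex, and the rod radius, frequency and function names used here are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg

# Apex of the first Mathieu stability region (textbook values, assumed here).
A_APEX = 0.23699
Q_APEX = 0.70600


def dc_for_rf(rf_volts):
    """DC voltage that keeps U/V = a/(2q), i.e. a scan line through the apex."""
    return rf_volts * A_APEX / (2.0 * Q_APEX)


def transmitted_mz(rf_volts, r0_m, freq_hz):
    """m/z (Da per elementary charge) sitting at q = Q_APEX for this RF level."""
    omega = 2.0 * math.pi * freq_hz
    kg_per_coulomb = 4.0 * rf_volts / (Q_APEX * r0_m ** 2 * omega ** 2)
    return kg_per_coulomb * E_CHARGE / DALTON


if __name__ == "__main__":
    r0, freq = 4.0e-3, 1.0e6  # hypothetical 4 mm field radius, 1 MHz drive
    for rf in (250.0, 500.0, 1000.0, 2000.0):
        print(f"V = {rf:6.0f} V, U = {dc_for_rf(rf):6.1f} V, "
              f"m/z = {transmitted_mz(rf, r0, freq):7.1f}")
```

Under these assumptions the transmitted m/z is directly proportional to V when U/V and the frequency are held constant, which is the origin of the linear mass scale.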
17. QUADRUPOLE ION TRAPS QIT mass spectrometers first trap ions and in so doing no mass spectrum is recorded. The spectrum is only recorded when the ions of different m/z are
selectively expelled from the trap onto a detector. Thus, the operation is not continuous but involves a series of discrete steps in chronological order which include filling the trap with ions, removing unwanted ions that are outside the m/z range of interest, and then sequentially expelling the trapped ions and recording their m/z values and intensities. These instruments are 3D counterparts of the linear quadrupole mass filters described above. Wolfgang Paul and Helmut Steinwedel first described ion motion in these devices, but the demonstration of their potential practical utility came from the work of a number of individuals as detailed by March, Todd and Hughes, and from the work of George Stafford and colleagues at Finnigan Corporation [26–32]. The device consists of an evacuated cavity created by three hyperbolic surfaces (electrodes). These are a doughnut-shaped ring electrode and two end-cap electrodes (Figure 17). The layout resembles a 2D slice through a linear quadrupole mass filter. The instrument is operated by the application of DC and RF voltages to the ring and end-cap electrodes. Prior to 1984, the trap was operated in the mass-selective stability mode in which ions with a stable trajectory were stored by application of RF and DC voltages to the electrodes, and then ejected from the ion trap via a pulse of DC to an end-cap electrode, to impinge upon an external detector, usually an electron multiplier. To obtain a mass spectrum, the RF and DC were scanned slowly at a constant ratio while the above scan function was repeated. With only this mode of operation, the relative complexity of the scan function and the lack of any apparent advantage over the linear quadrupoles available at the time hindered their development as analytical instruments. However, three important developments in the 1980s led to vast improvements in their operation.
The mass-selective instability mode of operation, developed by Stafford et al., takes advantage of the low-mass cut-off inherent with all instruments of this design [32]. The low-mass cut-off refers to the m/z value below which ions have unstable trajectories and are lost from the trap. As a general rule of thumb, on commercial instruments the low-mass cut-off approximates to about one third of the m/z of the parent ion in the MS/MS mode. In the mass-selective instability
Figure 17 Schematic representation of a 3D quadrupole ion trap.
mode of operation, the trap is filled with ions in the RF-only mode with the voltage applied to the ring electrode and the end-cap electrodes grounded. Under these conditions, the ions with m/z values above the low-mass cut-off have a stable trajectory within the trap. Then the RF on the ring electrode is increased (the end-cap electrodes remain grounded), and ions of increasing m/z are moved toward the point at which their trajectories become unstable. Upon reaching this point they are expelled from the trap in the z-direction and impinge upon the detector positioned outside the end-cap electrode. In this way a complete mass range can be scanned with a single sweep of the RF, and if the RF is ramped linearly, the mass range is scanned linearly. The next important development was the use of the bath gas. These instruments are almost always operated now in the presence of helium bath gas at relatively high pressure (approximately 1 mTorr). Without the bath gas the ions are distributed throughout the volume of the trap. In the presence of the bath gas the ions experience many non-fragmenting collisions with the helium, and in so doing lose kinetic energy. These collisions effectively dampen the oscillations of the ions and result in their confinement closer to the center of the trap. This results in a significant improvement in mass resolution and improved trapping of the ions introduced into the trap (sensitivity). The final important development came from appreciation of the significance of space-charge effects. With too many ions in the trap, space-charge influences of one ion upon the motion of another become significant. These effects must be minimal for good scan-to-scan reproducibility and accuracy of m/z assignment. The current instruments are able to adjust the filling time so that the trap contains an optimal population of ions to yield maximum sensitivity without significant space-charging.
Recent instrument designs include additional sophistication to the scan functions that result in significant improvements in resolution. For example, during the mass-selective instability scan of the RF, the addition of a resonant excitation voltage across the end-cap electrodes at voltages sufficiently high to result in ion ejection significantly increases the mass resolution. This is referred to as axial modulation, and some designs apply the axial modulation frequency close to the low-mass cut-off region. By varying the applied frequency, the mass range of the ion trap can be increased, and in addition high resolution across narrow m/z ranges can be achieved. Application of resonant excitation voltages at lower voltages causes kinetic excitation of mass-isolated ions, resulting in collision-induced dissociation (CID) to form product ions. This is the method by which fragmentation is induced in these instruments (see below). QITs are relatively inexpensive devices that require low mechanical tolerances, operate at relatively high pressures (10⁻⁴–10⁻³ Torr) and can trap mass-selected ions for seconds to minutes. Their low cost, small size, fast scanning capability and multiple operational modes have led to their widespread use. Note that in the linear quadrupole mass filter described above, ions with unstable trajectories are lost (i.e. not detected), whereas in the QIT ions need to acquire unstable motion before they escape the trap and are recorded.
18. LINEAR ION TRAPS Yet another type of quadrupole trapping device is now available [33]. The linear quadrupole design described above can be adapted as an ion trap by including two end electrodes (Figure 18). Ions injected into the device along the z-axis can be trapped radially by operating the quadrupole in the RF-only mode with helium (~3 mTorr) present as a kinetic energy damping gas. To prevent ions from escaping axially, a DC potential is applied to the end electrodes. Thus ions injected into the device will become trapped. This design is essentially a 2D ion trap in contrast to the 3D QIT described above. At any point ions can be expelled axially (along the z-axis), by removal of the DC potential on the end electrode, or radially, by ramping the RF voltages to increase the magnitude of the x–y
Figure 18 Schematic representation of the design and operation of a linear quadrupole ion trap. The device consists of a set of quadrupole rods identical to those used in triple (or single) quadrupole mass spectrometers. At either end of the rods an electrode is placed. When the electrodes are energized the passage of ions is prevented. The electrodes can be in the form of a lens as shown, or a set of stubby quadrupole rods. (A) In the filling mode, the end electrode is energized, preventing ions from escaping out the back end of the trap. (B) Once the trap is full, the front electrode is also energized, thus preventing more ions from entering the trap and preventing the trapped ions from escaping. The trapped ions now oscillate along the length of the trap. (C) Two methods can be employed to record a mass spectrum. In axial ejection/detection, an appropriate voltage is applied to the end electrode to extract the trapped ions along the z-direction. In radial ejection/detection, the electrodes remain energized, and a resonance voltage is applied to force the ions out of the trap through slits cut into two of the rods. Radial detection requires the synchronized operation of two detectors, one positioned over each slit in the rods.
oscillations such that the ions exit in the x–y plane. This is equivalent to the mass-selective instability mode described for the QIT. Two designs are available commercially today: one with axial ejection of the ions, which requires a detector positioned in the normal position along the z-axis at the end of the device; and the other, which is less intuitive, with radial ejection of the ions. For radial ejection, two opposing electrodes (of the four) have slots cut in them behind which are situated dual detectors (electron multipliers), the operation of which must be synchronized. As with the 3D ion trap, in conjunction with the RF ramp, a resonant excitation voltage can be applied across the electrodes to increase the efficiency of ejection (sensitivity) and mass resolution. Advantages of this design over the QIT include enhanced trapping efficiency of injected ions, and the ability to trap significantly more ions prior to the onset of significant space-charging. Both features result in the LIT having greater sensitivity than the QIT. LITs can be used as standalone mass analyzers, or more commonly in combination with other mass analyzers in tandem configurations. The tandem configuration takes advantage of the strong focusing of the ions along the z-axis.
19. ION CYCLOTRON CELLS AND FOURIER TRANSFORM MASS SPECTROMETRY The principle of operation of ICR cells and FTMS is based on the fact that ions rotate in a plane perpendicular to a superimposed magnetic field in a direction defined by the so-called right-hand rule at a frequency dependent on their m/z. The rotating ions can be detected based on an image current that is induced in detector plates positioned outside of the cyclotron cell. The measured frequencies of the image current can be converted into m/z values with the cyclotron equation:
$$\omega = \frac{qB}{m}$$
where ω is the cyclotron frequency, q the charge on the ion, B the magnetic field strength and m the mass of the ion (Figure 19). Thus light ions have a high cyclotron frequency, and more massive ions have a lower frequency. Importantly, the cyclotron frequency does not depend upon the entering velocity or energy of the ion, partly explaining why the technique is able to produce such extraordinarily high resolution. The ions are trapped inside a cell that is positioned inside a unidirectional, constant and homogeneous magnetic field. The cell can be cubic, rectangular or, more commonly today, cylindrical. Under these conditions the ions are first stored in the cell where they undergo continuous rotation or cyclotronic motion at a very small radius (Figure 19). The ions are then excited to a larger radius of rotation by a pulse of radio-frequency energy. Ions with the same m/z are excited to a coherent cyclotronic motion. The characteristics of the cyclotronic motion are recorded as the frequency of the image current induced in detector plates that form the sides of the cell. The image current frequency is a composite sine wave consisting of the sum of the image currents formed by each packet of ions with
Kym F. Faull et al.
Figure 19 Principles of the Ion Cyclotron Resonance Cell. Top: Cyclotron motion results from a balance between the centrifugal force and the magnetic force. Bottom: Ions in a homogeneous magnetic field (B) move in circular orbits and in so doing induce image currents in the externally positioned detector plates. Adapted from ref. [34].
the same m/z. FT analysis is used to transpose the sampled image current measurements (referred to as the transient) from the time domain into the frequency domain. This has the effect of deconvolving the complex image current waveform into the individual contributions from the frequencies of each of the separate ion packets. The frequency domain data is easily translated into a mass spectrum by a simple calibration function. Cyclotron frequencies are measured with high precision, and the magnetic field is constant; these factors contribute to the production of spectra with unprecedented accuracy of m/z assignment and resolution.
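As a rough numerical illustration of the cyclotron equation, the sketch below converts between m/z and the unperturbed cyclotron frequency in hertz, f = ω/2π = qB/(2πm). The function names are hypothetical, the constants are CODATA values, and the trapping-field corrections that a real instrument calibration includes are ignored.

```python
import math

# Physical constants (CODATA values)
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def cyclotron_frequency_hz(mz, field_tesla):
    """Unperturbed cyclotron frequency f = qB / (2*pi*m) in hertz.

    mz is in daltons per elementary charge, so q/m = e / (mz * Da).
    """
    omega = ELEMENTARY_CHARGE * field_tesla / (mz * DALTON)  # rad/s
    return omega / (2.0 * math.pi)


def mz_from_frequency(freq_hz, field_tesla):
    """Invert the cyclotron equation to recover m/z from a frequency."""
    return ELEMENTARY_CHARGE * field_tesla / (2.0 * math.pi * freq_hz * DALTON)


# Lighter ions rotate faster than heavier ions in the same field.
for mz in (100.0, 400.0, 1000.0):
    f = cyclotron_frequency_hz(mz, 7.0)
    print(f"m/z {mz:7.1f} at 7 T: {f / 1e3:8.1f} kHz")
```

Under these assumptions an ion of m/z 400 in a 7 T field rotates at roughly 269 kHz, which shows why frequency, a quantity that can be measured very precisely, is such a favorable observable for mass analysis.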
An Introduction to the Basic Principles and Concepts of Mass Spectrometry
In practice the physics works superbly provided the transients are not disturbed by collisions of the ions with gas molecules, or by interactions between the rotating ions (space-charge effects). For the production of accurate high-quality spectra the transients need to be stable for an adequate duration (~1 s for 100,000 resolving power at m/z 400 with a 7 Tesla magnet), and space-charge influences of one ion upon the rotation of another must be minimal. Hence the requirement that the ICR cell be held at less than 10⁻⁹ Torr to minimize collisions between ions and gas molecules, and the limit on the number of ions that can be tolerated within the cell (~10⁴–10⁶ ions in a cubic centimeter cell). In addition, FT analysis requires adequate computing speed.

The basic principles of ion cyclotron motion were described in the 1930s by Lawrence, who was awarded the 1939 Nobel Prize in Physics for the invention and development of the cyclotron and for results obtained with it, especially with regard to artificial radioactive elements. The first ICR cells for organic mass spectrometry were built by John Baldeschwieler and colleagues [35]. The technique was quickly recognized for producing spectra with unprecedented resolution and accuracy of m/z assignment, but it did not gain widespread use until several obstacles were overcome. These include the development of the FT procedure for the rapid conversion of the transients to mass spectra, the availability of affordable computers with adequate data storage capacity and computing speed, the development of robust pumping systems that can achieve and maintain the necessary vacuum, the development of methods for introducing ions into the ICR cell from external ion sources, and the development of methods for accurately controlling or estimating the number of ions introduced into the ICR cell [36,37]. Commercial instruments today use wide-bore superconducting magnets with field strengths in the range 7–15 T.
The tandem combination of a LIT with an ICR cell is a successful configuration with which the introduction of ions from an external ion source can be accurately controlled. This configuration is finding widespread application, particularly within the proteomics field.
20. THE ORBITRAP

The orbitrap mass analyzer is the first fundamentally new mass analyzer introduced commercially in over 20 years. The device is an ion trap, but there are no RF or magnetic fields. Moving ions are trapped as they rotate around an electrode, with electrostatic attraction toward the electrode balanced by the centrifugal force arising from the rotation. This trapping principle was first implemented in the Kingdon trap [38], then modified by Knight [39] and finally developed by Makarov [40] as the basis for the orbitrap analyzer. In it the ions are trapped in a high electrostatic field and allowed to orbit a spindle-shaped central electrode within a barrel-shaped outer electrode (Figure 20). The frequency of the harmonic axial oscillations is inversely proportional to the square root of the m/z, and the resulting image current is recorded in the time domain similarly to image currents in FT-ICR instruments. Such a design represents an alternative to the magnet-based ion cyclotron motion used for mass analysis in an ICR cell.
Note that, unlike the FT-ICR where the motion along the z-axis is ignored and it is motion in the x–y plane that is measured, in the orbitrap it is the z-motion that is measured and the x–y motion that is ignored. The development of the technology into a useful mass spectrometer required implementation of several technological advances [41]. One fundamental problem was that of introducing ions into the orbitrap from an externally positioned ion source. Ions coming from the outside into a static electric field will normally continue unabated through the field and emerge on the other side, much like the passage of comets across the solar system. By lowering the voltage on the central electrode during the ion introduction step, it is possible to allow the ions to remain in the electrostatic field and assume a cyclic rotation around the central electrode. This in effect traps the ions, and is referred to as electrodynamic squeezing. The next advance involved developing a technology for introducing ultra-short packets of ions (<1 ms duration) from the external ion source. The introduction of ultra-short packets of ions is necessary if the resulting spectra are to have resolutions approaching those theoretically possible. A solution to this problem emerged with the development of the so-called “C” trap, an RF-only (i.e. non-mass-selective) curved quadrupole in which ions are stored and their vibrations damped by non-fragmenting collisions with a bath gas. The exit from the C-trap is positioned opposite the entrance to the orbitrap, which is displaced from the equatorial circumference of the orbitrap (Figure 20). Thus short packets of ions are injected from the C-trap by application of a DC voltage and converge on the orbitrap entrance. Ions entering the orbitrap are trapped by the electrodynamic squeezing phenomenon, and assume a cyclic rotation around the central electrode.
The ions move to and fro along the z-axis of the trap, as dictated by the spindle-shaped curvature of the electrode. Ions of the same m/z develop coherent motion and gather as a ring-shaped cloud, much like the rings of Saturn. The frequency of the axial movement of these clouds of ions along the z-axis is recorded on split outer electrodes, and the time-domain signal is converted into an m/z signal by fast Fourier transformation. The current status of this instrument is that resolutions (M/ΔM) between 20,000 and 160,000 across the 100–10,000 m/z range can be routinely achieved and, as already stated, very recent results with internal lock mass calibration show that sub-ppm accuracy is possible, making this a high-performance instrument suitable for many applications (Joshua Coon and refs. [42,43]).
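The inverse square-root dependence of the axial frequency on m/z can be illustrated with a short sketch. It assumes a single calibration point, here a hypothetical 500 kHz at m/z 400 chosen purely for illustration and not taken from any instrument, and scales other values from it; the function names are likewise hypothetical.

```python
import math


def axial_frequency_hz(mz, ref_mz, ref_freq_hz):
    """Scale a reference axial frequency to another m/z using
    f proportional to 1 / sqrt(m/z)."""
    return ref_freq_hz * math.sqrt(ref_mz / mz)


def mz_from_axial_frequency(freq_hz, ref_mz, ref_freq_hz):
    """Invert the scaling: m/z is proportional to 1 / f**2."""
    return ref_mz * (ref_freq_hz / freq_hz) ** 2


# Hypothetical calibration point (an assumption for illustration only).
REF_MZ = 400.0
REF_FREQ_HZ = 500e3

for mz in (100.0, 400.0, 1600.0):
    f = axial_frequency_hz(mz, REF_MZ, REF_FREQ_HZ)
    print(f"m/z {mz:7.1f}: {f / 1e3:7.1f} kHz")
```

Because m/z varies as the inverse square of the frequency, quadrupling the m/z only halves the frequency, and a given relative error in frequency produces twice that relative error in m/z.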
21. DETECTORS

The final component of the mass spectrometer depicted in Figure 1 is the detector. Most mass analyzers produce a beam of ions that can be detected as an electrical current when the beam impinges upon a responsive surface. For example, currents of the order of picoamps are typically produced by ESI sources after amplification by an electron multiplier. Some detectors measure ion current, and some detectors count ions. The types of devices that are used to detect the ion
Figure 20 Schematic representation of an orbitrap. Labeled parts: injected ions, barrel-shaped outer electrode, spindle-shaped central electrode; operating pressure <10⁻¹⁰ Torr. The approximate ion path is shown. Ions are injected into the device in short packets (<1 ms duration) from a curved linear ion trap (not pictured) positioned orthogonally to the orbitrap. Injected ions of the same m/z assume a coherent rotation around the spindle-shaped central electrode, and the frequency of their transverse oscillations is recorded. Adapted from a Thermo Scientific instrument design.
beam include electron multipliers, photomultipliers and multichannel plates. Off-axis positioning of the device is common to avoid recording the impact from neutral species that would otherwise cause a high background signal. High-energy conversion dynodes are also used to magnify the signal from an incoming ion (see below). Focused ion beams are detected by point detectors that include electron- and photomultipliers, and dispersed beams are detected by array detectors which include multichannel plates. ICR cells and the orbitrap are unusual in that they use a different, nondestructive method for detecting the ion beam. Both these instruments are ion traps and contain a population of ions that coherently move within the confines of the trap in a manner dependent on their m/z. The rotating ions generate an image current on appropriately positioned plates at the side of the trap, and the systems work by translating the time-domain image current data into the frequency domain, and then subsequently converting those frequencies into a mass spectrum.
22. ELECTRON MULTIPLIERS

These are common detectors on quadrupole, QIT and LIT instruments. When energetic ions collide with a suitable solid surface (one that requires a relatively small amount of energy to remove an electron from the surface, referred to as a surface with a low work function), electrons are liberated. The conversion of ions to electrons and subsequent electron amplification can be achieved using discrete collisions on separate electrodes (dynodes), the potentials of which are arranged so that electrons are accelerated from one stage to the next, producing a cascading effect, as in the so-called venetian blind electron multipliers.
Alternatively, one can apply a potential across a semiconductor material so that the electrons produced in one collision event are accelerated into the device to undergo further electron-surface collisions. Typically 10–20 stages of electron amplification are used. A single electron gives more than 10⁶ electrons in just 12 stages of gain when the average gain factor on each collision is just 3.3, a typical number.
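The arithmetic behind the quoted gain is simple compounding, and can be checked with the sketch below. It assumes every stage has the same average gain, which is an idealization; the function names are hypothetical.

```python
import math


def multiplier_gain(gain_per_stage, stages):
    """Overall gain of a cascade in which every stage multiplies the
    electron count by the same average factor."""
    return gain_per_stage ** stages


def stages_needed(gain_per_stage, target_gain):
    """Smallest whole number of stages that reaches the target gain."""
    return math.ceil(math.log(target_gain) / math.log(gain_per_stage))


total = multiplier_gain(3.3, 12)
print(f"12 stages at 3.3 per stage: {total:.3g} electrons per incident electron")
print(f"stages needed for 1e6 at 3.3 per stage: {stages_needed(3.3, 1e6)}")
```

With a per-stage factor of 3.3 the overall gain after 12 stages is about 1.7 × 10⁶, so 12 is indeed the smallest number of stages that exceeds 10⁶.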
23. CONVERSION DYNODES OR HIGH-ENERGY DYNODES

The velocity of the incoming ions determines the probability and number of electrons released from the low-work-function surface of the electron multiplier. As a result, high-mass ions, which are relatively slow moving even at the customary energies of a few keV, often need to be converted into smaller, more energetic ions. This initial ion-to-ion conversion is achieved by impact of the primary ions onto a suitable dynode conversion surface. Ions and electrons resulting from impact with the dynode are accelerated into the throat of an appropriately positioned electron multiplier for further amplification of the signal. Negative ions are detected with the electron multiplier operating at the same potentials as for positive ion detection, but with the conversion dynode at the opposite polarity.
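The reason that heavy ions arrive slowly follows from the kinetic energy relation zeV = mv²/2, so that v = sqrt(2zeV/m). The sketch below evaluates this for an assumed accelerating potential of 5 kV, a value chosen only as an example of "a few keV"; the function name is hypothetical.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def ion_velocity_m_per_s(mass_da, charge, accel_volts):
    """Non-relativistic velocity of an ion accelerated from rest
    through accel_volts: v = sqrt(2 * z * e * V / m)."""
    energy_joules = charge * ELEMENTARY_CHARGE * accel_volts
    return math.sqrt(2.0 * energy_joules / (mass_da * DALTON))


ACCEL_VOLTS = 5000.0  # assumed example value, not taken from the text

for mass in (1_000.0, 10_000.0, 100_000.0):
    v = ion_velocity_m_per_s(mass, 1, ACCEL_VOLTS)
    print(f"{mass:9.0f} Da, 1+: {v:8.0f} m/s")
```

At fixed energy the velocity falls as the inverse square root of the mass, so a singly charged 100 kDa ion moves ten times more slowly than a 1 kDa ion, which is the situation a conversion dynode is intended to remedy.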
24. QUANTIFICATION

Accurate quantification is only achieved by using an internal standard (IS). The rationale for the use of an IS is to compensate for errors that are otherwise difficult to control. Such errors include variations in the yield of a purification process and of chemical modification (for example disulfide reduction, thiol alkylation, enzymatic or chemical digestion, etc.), and losses due to adsorption on glassware and chromatographic surfaces. The best IS is one that has physical and chemical properties that are as similar as possible to those of the analyte in question. Chemical analogues are often used as ISs, but molecules labeled with stable isotopes of low natural abundance (e.g. ²H, ¹³C, ¹⁵N and ¹⁸O) are the preferred choice. In the case of stable isotope-labeled ISs there is virtually no distinction made in the analytical work-up between the naturally occurring molecules and the IS, either by adsorption or during extraction and derivatization. Furthermore, because the IS and analyte usually co-elute, they are subjected to the same mass spectrometric conditions. Mass spectrometry has the unique ability to distinguish between such closely related molecules and provides a ratio of their signal intensities. Provided an absolute standard is available, this ratio can be converted to moles of analyte. Without an absolute standard, one is left with relative quantification. The recently developed iTRAQ™ and related methods for peptide (protein) quantification with MS rely on measurement of relative analyte amounts in different samples.
The example selected to demonstrate the principles of quantification by mass spectrometry uses GC/MS, but the same principles apply regardless of the method by which samples are introduced into the mass spectrometer (Figure 21). In this example the task was to measure the absolute concentrations of a serotonin metabolite (5-hydroxyindoleacetic acid, 5HIAA) in human cerebrospinal fluid (CSF), which surrounds the brain and spinal cord. Five standard samples were prepared in an isotonic salt solution (artificial CSF) with known amounts of
Sample       Human CSF  Added unlabelled  Added labelled  Analyte      IS           Analyte/IS  Analyte in
             (ml)       analyte (ng)      IS (ng)         peak height  peak height  ratio       human CSF (ng)
Standard 1   0          0.00              15.0            12           2077         0.01        -
Standard 2   0          2.80              15.0            1224         4835         0.25        -
Standard 3   0          6.99              15.0            2163         3578         0.60        -
Standard 4   0          13.98             15.0            4916         4295         1.14        -
Standard 5   0          21.96             15.0            9775         4522         2.16        -
Unknown 1A   1          -                 15.0            1001         984          1.02        11.1
Unknown 1B   1          -                 15.0            1781         1768         1.01        11.0
Unknown 2A   1          -                 15.0            2833         2816         1.01        10.9
Unknown 2B   1          -                 15.0            2031         1874         1.08        11.8
Unknown 3A   1          -                 15.0            623          504          1.24        13.4
Unknown 3B   1          -                 15.0            1341         1157         1.16        12.5

Standard curve: peak height ratio (analyte peak height/internal standard peak height) plotted against ng analyte; y = 0.0957x − 0.0416, R² = 0.9874.
Figure 21 Quantitation using selected ion monitoring with a deuterated internal standard. Data set from an experiment in which the concentration of 5-hydroxyindoleacetic acid (5HIAA, analyte) was measured in human lumbar cerebrospinal fluid (CSF) using ²H₃-5HIAA as the internal standard. The experimental setup is as described in the text. The amount of analyte in each human CSF sample was calculated by interpolation from the standard curve.
authentic 5HIAA (0–22 ng), and three samples of human CSF were prepared in duplicate (A and B series). The same amount of IS (²H₃-5HIAA, 15 ng) was added to the standards and the human CSF samples. All samples were then processed in the same manner by extraction with an organic solvent, concentration under a stream of nitrogen gas, chemical derivatization and analysis by GC/MS [44]. The SIM traces for the signals corresponding to the molecular ions of 5HIAA (m/z 622) and IS (m/z 625) were recorded, and the absolute peak heights (arbitrary units) for these two virtually co-eluting signals are shown in Figure 21, along with the ratio of their peak heights (R, 5HIAA/²H₃-5HIAA peak height). Calibration curves prepared from the data for the standard samples show that the absolute peak height for 5HIAA, and the R-values, are linear when plotted against the amount of added 5HIAA. The absolute peak heights for 5HIAA in the duplicate samples from human CSF vary widely, reflecting experimental variations that result from the extraction, derivatization and injection procedures. The R-values for the same samples, however, agree within the errors expected for this type of work. Thus the amount of 5HIAA in each human CSF sample can be accurately computed by interpolation from the standard curve using the R-values. This example reveals the value of using an IS to correct for errors that unavoidably occur during sample preparation and analysis.
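The interpolation step can be reproduced from the peak heights listed in Figure 21. The sketch below fits an ordinary least-squares line to the five standards and then inverts it for each unknown. Ordinary least squares is an assumption about how the published line was obtained, and the variable and function names are hypothetical.

```python
# Data transcribed from Figure 21.
standard_ng = [0.00, 2.80, 6.99, 13.98, 21.96]
standard_analyte_height = [12, 1224, 2163, 4916, 9775]
standard_is_height = [2077, 4835, 3578, 4295, 4522]

# Unknowns: sample label -> (analyte peak height, IS peak height)
unknown_heights = {
    "1A": (1001, 984), "1B": (1781, 1768),
    "2A": (2833, 2816), "2B": (2031, 1874),
    "3A": (623, 504), "3B": (1341, 1157),
}


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Standard curve: peak height ratio R against ng of added analyte.
standard_ratio = [a / s for a, s in zip(standard_analyte_height, standard_is_height)]
slope, intercept = fit_line(standard_ng, standard_ratio)
r2 = r_squared(standard_ng, standard_ratio, slope, intercept)
print(f"R = {slope:.4f} * ng + ({intercept:.4f}), R^2 = {r2:.4f}")

# Interpolate each unknown: ng = (R - intercept) / slope.
unknown_ng = {
    label: (analyte / internal - intercept) / slope
    for label, (analyte, internal) in unknown_heights.items()
}
for label, ng in unknown_ng.items():
    print(f"Unknown {label}: {ng:.1f} ng")
```

The fitted line should agree with the one printed in Figure 21 to within rounding. Note that the absolute analyte peak heights of duplicates such as 3A and 3B differ roughly twofold, while their interpolated amounts differ by only a few percent.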
25. STRUCTURAL ELUCIDATION BY MASS SPECTROMETRY

Desorption and spray ionization methods tend to produce abundant molecular ions with little or no fragmentation. This is ideal for determination of molecular weights, but provides no information on structural details. One approach to this problem is to fragment or dissociate the ion and measure the m/z values of the resulting charged fragments. For proteins and peptides, dissociation can be induced by collision with a gas molecule (CID, also known as collisionally activated dissociation (CAD)), by capture of an electron (electron capture dissociation, ECD), or by transfer of an electron (electron transfer dissociation, ETD). Provided the dissociation of the parent ion is not a random process and certain bonds break more easily than others, such an experiment should yield useful structural information. Dissociation induced by such processes is not random but highly reproducible, and the rules governing fragmentation of many classes of gas phase ions, including those from peptides, are now known. The spectra of fragment ions generated in this way can often be interpreted to yield a sequence, or more commonly a partial sequence. These experiments are referred to as tandem mass spectrometry (MS/MS), and an arrangement of analyzers within an instrument in which such an experiment could be performed is represented schematically in Figure 22. There is a long history of development of different methods to induce fragmentation of gas phase ions, including intense electromagnetic radiation (laser-induced fragmentation), collisions with an inert surface (surface-induced dissociation), photodissociation and infrared
Figure 22 Structural elucidation by MS/MS.
multiphoton dissociation (IRMPD). Today for protein chemistry the most important methods are CID, usually with argon or nitrogen, ECD and ETD. MS/MS experiments can be carried out by positioning two analyzers in series (referred to as MS1 and MS2), separated by a collision cell in which dissociation is induced (Figure 22). The operation of the two analyzers requires synchronization of their scan modes. A successful tandem design can be assembled with identical analyzers for MS1 and MS2 (e.g. two double focusing magnetic sector instruments, or more commonly, two quadrupole, or two TOF analyzers), or with dissimilar analyzers for MS1 and MS2 (e.g. hybrid quadrupole/TOF instruments). However, experiments such as these can also be conveniently carried out using a single trapping mass spectrometer (QITs and LITs): by isolating a specific parent ion, allowing the parent to fragment, and then measuring the m/z values of the resulting charged fragments. In this case, the experiment is performed with a single analyzer with obvious cost advantages.
26. GAS PHASE ION STABILITIES AND ENERGETICS OF THE COLLISIONALLY-ACTIVATED DISSOCIATION PROCESS

In the gas phase a distinction is drawn between ions in three different stability regimes. These are unstable ions of high internal energy with dissociation rates >10⁶ s⁻¹ that are not observed in mass spectra; stable ions with low internal energy and dissociation rates <10⁵ s⁻¹ that constitute the bulk of the ion current observed in mass spectra and are selected for fragmentation in MS/MS experiments; and finally metastable ions with intermediate internal energy and dissociation rates between 10⁶ and 10⁵ s⁻¹ that fragment between the ion source and detector. Metastable ions are occasionally observed in mass spectra as broad, poorly resolved peaks.
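The link between dissociation rate and what is observed can be made concrete by treating dissociation as a first-order process, for which the fraction of ions still intact after time t is exp(−kt). The sketch below evaluates this for several rate constants, assuming a transit time of 10 µs between source and detector. That transit time is an illustrative assumption, not a value from the text, and real values depend on the instrument; the function name is hypothetical.

```python
import math


def fraction_surviving(rate_per_s, time_s):
    """Fraction of ions still intact after time_s for first-order
    dissociation with rate constant rate_per_s."""
    return math.exp(-rate_per_s * time_s)


TRANSIT_TIME_S = 10e-6  # assumed source-to-detector time (illustrative only)

for rate in (1e4, 1e5, 1e6, 1e7):
    surviving = fraction_surviving(rate, TRANSIT_TIME_S)
    print(f"k = {rate:.0e} s^-1: {surviving:.3g} of ions reach the detector intact")
```

Under this assumption almost no ions with rates above 10⁶ s⁻¹ survive the journey, most ions with rates well below 10⁵ s⁻¹ arrive intact, and ions in between fragment in flight, which is the behavior attributed to metastable ions.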
Depending on the energy absorbed by the parent ion, there are two possible fragmentation pathways dictated by the energy regime. Most MS/MS instruments work in the low-energy regime, while high-energy ion fragmentation typically occurs in tandem TOF and tandem magnetic sector instruments. Ions undergoing low-energy fragmentation have an average kinetic energy prior to collision of 10–100 eV, whereas ions undergoing high-energy fragmentation have an average kinetic energy of a few keV prior to collision [45]. The two modes also differ in that the low-energy mode involves multiple collisions, whereas high-energy fragmentation often involves a single collision event. In general, fragmentation of a parent ion can occur when the transmitted collision energy is sufficiently high that the ion is excited beyond the dissociation threshold. Regardless of the mass spectrometer used, the selection window for the parent ion is an important parameter. A narrow window imparts greater selectivity by restricting the experiment to a homogeneous population of parent ions, but a wide window will increase the signal strength of the product ions, and hence sensitivity. Quadrupoles and QITs can be tuned for a wide range of parent ion acceptance windows, from wide (>±1.0 Da) to unit (±1.0 Da) to narrow (<±1.0 Da). There is less flexibility in parent ion acceptance window selection with TOF analyzers, and they are generally operated with a 3–10-Da wide selection gate.
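How much of the ion's kinetic energy can be transmitted in a collision is bounded by the center-of-mass energy, E_com = E_lab × m_gas/(m_gas + m_ion), for a stationary target gas. This standard two-body result is not stated in the text and is added here as background. The sketch below evaluates it for a hypothetical 1000 Da ion at 50 eV colliding with nitrogen or argon; the function name and example values are assumptions.

```python
def center_of_mass_energy_ev(lab_energy_ev, ion_mass_da, gas_mass_da):
    """Upper limit on the energy convertible to internal energy in one
    collision with a stationary gas molecule (two-body kinematics)."""
    return lab_energy_ev * gas_mass_da / (gas_mass_da + ion_mass_da)


ION_MASS_DA = 1000.0   # hypothetical peptide ion
LAB_ENERGY_EV = 50.0   # within the 10-100 eV low-energy range

# Average molecular masses of two common collision gases, in Da.
for gas, gas_mass in (("N2", 28.006), ("Ar", 39.948)):
    e_com = center_of_mass_energy_ev(LAB_ENERGY_EV, ION_MASS_DA, gas_mass)
    print(f"{gas}: at most {e_com:.2f} eV per collision")
```

Only a few percent of the laboratory-frame energy is available per collision for an ion of this size, which is consistent with low-energy fragmentation relying on multiple collisions.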
27. COLLISION-INDUCED DISSOCIATION

CID describes an interaction wherein the projectile ion is dissociated as a result of interaction with a neutral target species. This is brought about by conversion of part of the translational (kinetic) energy of the ion to internal energy in the ion during collision. CID can occur in the ion source-mass analyzer interface, where ions in the gas phase emerging from the source encounter residual gaseous molecules from the atmosphere and drying gas. There is no parent ion selection with these events, so interpretation of the resulting fragment ion spectrum can be difficult. In contrast, CID can also occur in the collision cell that is positioned between the two analyzers of the tandem mass spectrometer. In this case there is opportunity for parent ion selection, and interpretation of the resulting fragment ion spectrum is not confounded by uncertainty about the m/z of the parent ion. The efficiency of CID may be increased by using a more massive gas, such as argon or xenon. With different instruments, CID can be carried out with high (keV) or low (10–100 eV) collision energies, with some differences being noted in the types and relative abundances of the product ions. However, low-energy CID MS/MS experiments are, by far, the most common method used to dissociate peptide ions. Although gas phase dissociation of peptides has been investigated for many years, and the general rules governing the process are known, it remains an important field of ongoing research because de novo peptide sequencing is still a challenge. This is of particular importance because of the increasing numbers of MS/MS spectra generated through automated MS methodologies and used for
protein identification through database searches. The general model used to explain peptide fragmentation in the gas phase is called the “mobile proton model” [46]. The idea behind this is that bond cleavages are charge directed. During ionization, protonation occurs on basic sites, and the population of different protonated forms of a peptide or a protein depends on both the internal energy of the ions and the gas phase basicities of the protonation sites. There is a correlation between the gas phase basicities of protonation sites and the energy required to induce fragmentation. The presence of a charge at a specific site may promote the fragmentation of adjacent bonds. For example, protonation at the peptide backbone initiates charge-directed cleavages of the peptide skeleton, leading primarily to cleavage of the amide bond and production of so-called b- and y-type ions (Figure 23). Peptides carrying a basic amino acid at the C-terminus (e.g. peptides produced by trypsin digestion) tend to show more intense and higher m/z y-ions in the spectra, attributed to the stability of these ions. The corresponding b-ions are thought to fragment further because the charge site is mobile, and thus they appear as less intense signals in the lower m/z region of the spectra. Although peptide CID spectra are generally dominated by b- and y-type ions, there are other ions formed, and the spectrum of fragment ions produced can be complex. For example, b-ion isomerization leads to proton
Figure 23 Peptide fragmentation nomenclature in which y-type ions contain the C-terminus, and b-type ions contain the N-terminus of the peptide. Commonly each ion breaks in just one place. Rarely are two bonds broken.
scrambling, that is, the random movement of protons within an ion, while charge-remote fragmentation results in the neutral loss of water and ammonia [47–49]. In addition, production of internal fragments resulting from the multiple collisions that occur in low-energy MS/MS instruments is another frequently observed feature of peptide CID spectra [48,50]. The difficulties in de novo peptide sequencing with CID arise primarily from incomplete b- and y-ladders in the spectra, but insufficient or overabundant fragmentation, or unusual and unexpected fragmentation processes, also complicate interpretation of the spectra. Recently, a comprehensive study of synthetic peptides sharing common features showed that half of the compounds displayed low-energy fragmentation patterns that were in agreement with the mobile proton model [47]. The remaining half of the peptides underwent alternative dissociation pathways with complex rearrangements. These unusual fragmentation processes include charge-remote fragmentation, intra-molecular interactions, gas phase rearrangements and b-ion scrambling. Apart from the difficulties for de novo peptide sequencing mentioned above, other important deficiencies of CID spectra include ineffectiveness for probing site-specific chemical modifications, especially labile post-translational modifications such as phosphorylation and some glycosylations. To overcome these limitations other fragmentation modes have been investigated. The most useful to date, ECD and ETD, appear to offer needed complementarity to the CID process. Both ECD and ETD use low-energy electrons to promote fragmentation of the peptide backbone, and differ only in the way that the low-energy electrons are transferred to gas phase peptide and protein ions.
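The b- and y-ion ladders discussed in this section can be predicted for a given sequence by summing residue masses. The sketch below computes singly charged monoisotopic b and y ions under the usual conventions (b = residue sum plus a proton; y = residue sum plus water plus a proton). The residue masses are standard monoisotopic values, cysteine is omitted because its mass depends on any alkylation used, and the test sequence PEPTIDE and the function names are illustrative assumptions only.

```python
# Monoisotopic residue masses in Da (standard values; Cys omitted).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_ions(sequence):
    """Singly charged b ions b1..b(n-1), which contain the N-terminus."""
    ions, running = [], 0.0
    for residue in sequence[:-1]:
        running += RESIDUE_MASS[residue]
        ions.append(running + PROTON)
    return ions


def y_ions(sequence):
    """Singly charged y ions y1..y(n-1), which contain the C-terminus."""
    ions, running = [], 0.0
    for residue in reversed(sequence[1:]):
        running += RESIDUE_MASS[residue]
        ions.append(running + WATER + PROTON)
    return ions


peptide = "PEPTIDE"  # illustrative sequence
for i, (b, y) in enumerate(zip(b_ions(peptide), y_ions(peptide)), start=1):
    print(f"b{i} = {b:9.4f}    y{i} = {y:9.4f}")
```

Complementary pairs (b1 with y6, b2 with y5, and so on) always sum to the mass of the intact peptide plus two protons, a consistency check that is often used when assigning incomplete ladders.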
28. ELECTRON CAPTURE DISSOCIATION

ECD was first introduced by McLafferty and colleagues in 1998, and has since proved efficient, so that its use has become increasingly widespread (see Zubarev, this volume) [51–54]. In an ECD experiment, multiply charged ions are trapped in an ICR cell and allowed to react with near-thermal energy electrons (≤0.2 eV) produced within the cell by an electron gun. Capture of such an electron by a protonated peptide is exothermic by ~6 eV, and yields peptide backbone fragmentation in a non-ergodic manner. Two phenomena are then observed: partial neutralization of the parent ion leading to charge state reduction, and extensive backbone cleavage yielding predominantly c and z• fragment ion series (Figure 24). This is in contrast to CID which, as already stated, produces predominantly b- and y-fragment ions. Whereas b- and y-fragment ions originate from cleavage of the amide bond, c and z• fragment ions result from cleavage of the N–Cα bond of the backbone (Figure 23). Thus the ECD and CID processes produce complementary fragmentation patterns. ECD does not exclusively produce c and z• fragments, as there is some production of b- and y-fragments, with restricted fragmentation N-terminal to proline residues [55–58]. In the ICR cell, ECD works on a millisecond timescale with a precursor-to-fragment ion conversion efficiency of about 30%. This reaction timescale is
Figure 24 Scheme of the ECD process.
compatible with all forms of liquid chromatography in use today, providing ample time for the collection of many spectra during the elution of a single chromatographic peak. The process typically produces fragment ion spectra with near-uniform signal intensity of the fragment ions. This is in contrast to CID spectra in which the intensity of the fragment ions can vary dramatically. Importantly, because the peptide backbone cleavages occur in an independent manner, this process preserves labile chemical modifications that are often lost during CID. Ongoing investigations are focusing on the use of this technique for characterization of labile modifications such as phosphorylation, O-glycosylation, acylation, sulfation and nitration [7,51,59–61]. Another important impact of ECD in the biological mass spectrometry field is the development of ‘‘top-down’’ analyses of proteins. The ‘‘top-down’’ approach refers to the direct MS/MS fragmentation of intact proteins in the gas phase for their identification and characterization. Recently, ECD experiments were shown to preserve quaternary structures in the gas phase and could be used in the future to probe arrangements of protein complexes [62].
29. ELECTRON TRANSFER DISSOCIATION

The main drawback of ECD is that it can only be implemented on instruments in which near-thermal electrons have a sufficiently long residence time (milliseconds or longer) to allow reaction with gas phase analyte ions. Such electrons have long residence times in ICR cells, and the technique can only be used today on FTMS instruments. ETD was developed to implement the same reaction on other ion trap mass spectrometers [63,64]. In this process, low-energy electrons are transferred to multiply charged peptide ions using gas phase ion–ion chemistry. The kinetics of such chemistry have long been known, and electron-donating species, such as gas phase anthracene anions, are used to deliver the electrons to the isolated parent ions of interest. The gas phase electron-donor ions, generated outside the reaction chamber by a separate ionization process, most commonly electron capture chemical ionization of gas phase molecules, are injected into the reaction chamber to initiate the ETD process. This method is analogous to ECD and produces c and z• fragment ions with similar efficiency and relatively uniform intensity. ETD also preserves labile post-translational modifications. The benefits are the same as for ECD, and include the ability to analyze large and non-tryptic peptides with complete or nearly complete sequence coverage, including post-translationally modified residues [53,65]. The ability to perform this type of fragmentation on less expensive and widely used ion trap instruments is leading to the widespread use of this fragmentation method in protein mass spectrometry.
30. SCAN MODES IN TANDEM MASS SPECTROMETRY

The linear arrangement of two scanning mass spectrometers in series, shown simplistically in Figure 22, is a logical design that provides a unique analytical
platform with which multiple types of experiments can be performed. The product (fragment) ion scan has already been introduced: a parent ion is selected in MS1, fragmented in the collision cell, and the m/z values of the resulting charged products (fragments) are recorded by scanning MS2 (Figures 25 and 26A). The reverse experiment can be performed, in which MS1 is scanned and MS2 is set to transmit a single product (Figure 26B). This experiment, referred to as a “parent ion scan”, yields the m/z of the parent ions that give rise to a common product. A third scan modality can be implemented in which both MS1 and MS2 are scanned with a set m/z offset (Figure 26C). This neutral loss scan will record the m/z of all parent ions that undergo the same m/z loss upon fragmentation. In the field with which this volume is concerned (protein mass spectrometry), product ion and neutral loss scans are the most important. Product ion scans yield fragmentation patterns that are used to sequence and identify peptides. Neutral loss scans are used to identify peptides that lose unique masses upon collisions,
Figure 25 Scheme of events that take place during a fragment ion scan.
42
Kym F. Faull et al.
Figure 26 Triple quadrupole scan modes.
such as 98 Da from the loss of a molecule of phosphoric acid (H2PO4) as occurs readily during CID of phosphopeptides. For quantitative experiments there is yet another scan mode that is used. If MS1 is set to transmit a single parent (P), and MS2 set to transmit a single fragment (F), then the resulting trace will represent the intensity from the P-F transition (Figure 26D). This mode of data collection, referred to as ‘‘reaction monitoring’’, can only be employed when the parents and the products are known, and is usually done in combination with chromatography. With an appropriate IS the relative intensity of the transition for the analyte (PA-FA) and reference compound (PIS-FIS) is a quantitative ratio, and the amount of analyte in the sample can be accurately calculated from the ratio if the molar responses of the two compounds are known. Modern instruments are capable of monitoring many P-F transitions during the time taken for elution of a single chromatographic peak. Thus it is generally referred to as ‘‘multiple reaction monitoring’’ (MRM). The four scan modalities (fragment ion scan, parent ion scan, neutral loss scan and MRM) described above and in Figure 26 can be performed in tandem mass spectrometers with a linear arrangement. However, certain combinations of instrument types for MS1 and MS2 are better for some of these experiments. By far the most successful and versatile combination of instruments is that of a dual quadrupole arrangement. In this design, the collision chamber is also a quadrupole, but operated in the RF-only mode to focus the incoming and out-going ion beams along the z-axis to reduce scatter resulting from collisions
and improve ion transmission efficiency. Thus these instruments are called triple quadrupoles, frequently abbreviated as triple quads, QQQ or QqQ.
MS/MS can be achieved without two linked analyzers with a QIT instrument. The ions from the sample are injected into the trap and held there briefly. All unwanted ions are then ejected, leaving the desired parent ion. This is done by the application of RF voltages and resonant excitation waveforms to eject all ions from the ion trap except for the parent ion of interest. The behavior of the chosen parent ion can then be observed in isolation. It is possible to supply a supplementary RF potential at an appropriate frequency to excite these ions. In the absence of the helium bath gas the ions would gain kinetic energy, develop an unstable trajectory and be lost from the trap. However, in the presence of the helium bath gas, collisions between the retained and excited parent ions and the bath gas occur, converting some of the kinetic energy of the ions into internal energy, hence exciting them internally and facilitating dissociation. A conventional scan then reveals the resulting fragment ions. This CID event is efficient if the excitation voltage is optimized. With QITs the processes of parent selection, fragmentation and fragment detection are separated in time, not in space as in a QQQ instrument. However, the fragment ion spectra produced by both MS/MS methods (QIT and QQQ) are surprisingly similar. Some, but not all, of the scan modalities described above and in Figure 26 can be performed in QITs. Most notably, the parent and neutral loss scans are not readily implemented on QITs, and indeed no commercial QITs offer these two scan modes. 
This shortcoming can be at least partially offset by the development of software for reconstruction of the data set after all parents have been individually fragmented, a feature that is possible to implement on a practical basis because of the exceptionally fast scanning ability of new QIT designs. The QIT design, however, offers an important advantage over the QQQ design for other applications. Multistage CID MS/MS experiments, referred to as MSn, are possible on a QIT. In a linear format such experiments would require multiple mass analyzers. Although restricted to fragment ion scans, this technique is advantageous for the elucidation of structural details and fragmentation pathways that are beyond the reach of MS/MS experiments. For example, MSn is finding important application in elucidating the fine structure of the glycosyl moieties attached to proteins.
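The arithmetic behind two of the scan modes described in this section can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function names, peak areas and amounts are hypothetical, the neutral-loss calculation assumes that the fragment retains all of the charges of the parent, and the quantitation assumes a simple linear response for both the analyte and the IS.

```python
# Illustrative arithmetic for neutral loss scans and (multiple) reaction monitoring.
# All function names and numeric inputs are hypothetical.

H3PO4_MASS = 97.977  # monoisotopic mass of phosphoric acid, Da


def neutral_loss_offset(neutral_mass, charge):
    """m/z offset to set between MS1 and MS2 for a given neutral loss.

    Assumes the fragment keeps every charge of the parent ion, so the
    offset on the m/z scale is the neutral mass divided by the charge.
    """
    return neutral_mass / charge


def analyte_amount(area_analyte, area_is, amount_is, response_ratio=1.0):
    """Amount of analyte from the ratio of analyte and IS transition intensities.

    response_ratio is (molar response of analyte) / (molar response of IS);
    a value of 1.0 assumes both compounds respond identically.
    """
    return (area_analyte / area_is) * amount_is / response_ratio


# Offset for loss of H3PO4 from singly, doubly and triply charged parents
for z in (1, 2, 3):
    print(f"{z}+ parent: scan MS2 {neutral_loss_offset(H3PO4_MASS, z):.3f} m/z units below MS1")

# Hypothetical P->F peak areas for analyte and IS, with 10 pmol of IS added
print(analyte_amount(area_analyte=5000.0, area_is=10000.0, amount_is=10.0), "pmol")
```

Because the offset scales inversely with charge, a neutral loss scan for phosphopeptides is usually set up with a particular parent charge state in mind.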
31. CONCLUSIONS
The field of mass spectrometry has witnessed a phenomenal development over the past decade. This may be intimidating to the novice in the field. However, the principles upon which the technique is based do not change, and a sound grasp of these principles will place a new student in a position to understand and appreciate the potential that this technique offers. In this chapter we have attempted to lay the groundwork for understanding these principles in simple, non-technical terms. It is hoped that with this information at hand the subsequent chapters in this volume can be fully appreciated by all those wishing to learn more about this important and exciting field.
ACKNOWLEDGEMENTS
The authors wish to acknowledge the advice received from Franz Hillenkamp, John Fenn, Alexander Makarov and Robert McIver, who read and edited selected portions of this chapter, Hans Barnard for his excellent editorial advice and Jae Oh Yoon for help with some of the figures.
REFERENCES
1 P.S.H. Wong and R.G. Cooks, Ion trap mass spectrometry, Curr. Sep., 16(3) (1997) 85–92.
2 M.A. Grayson (Ed.), Measuring Mass: From Positive Rays to Proteins, Chemical Heritage Press, Philadelphia, PA, 2002.
3 C. Brunnée, The ideal mass analyzer: Fact or fiction, Int. J. Mass Spectrom. Ion Process., 76 (1987) 121–237.
4 D.F. Torgerson, R.P. Skowronski and R.D. Macfarlane, A new approach to the mass spectrometry of non-volatile compounds, Biochem. Biophys. Res. Commun., 60 (1974) 616–624.
5 M. Barber, R.S. Bordoli, R.D. Sedgwick and A.N. Tyler, Fast atom bombardment of solids as an ion source in mass spectrometry, Nature, 293 (1981) 270–275.
6 K. Tanaka, H. Waki, Y. Ido, S. Akita, Y. Yoshida and T. Yoshida, Rapid Commun. Mass Spectrom., 2(8) (1988) 151–153.
7 M. Karas and F. Hillenkamp, Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons, Anal. Chem., 60 (1988) 2299–2301.
8 J.B. Fenn, M. Mann, C.K. Meng, S.F. Wong and C.M. Whitehouse, Electrospray ionization for mass spectrometry of large biomolecules, Science, 246(4926) (1989) 64–71.
9 A.L. Burlingame, M. Calvin, J. Han, W. Henderson, W. Reed and B.R. Simoneit, Lunar organic compounds: Search and characterization, Earth, Moon, Planets, 1(3) (1970) 396 (Proceedings of the Lunar Science Conference, Houston, Texas, USA, January 5–8, 1970).
10 M.S.B. Munson and F.H. Field, Chemical ionization mass spectrometry. I. General introduction, J. Am. Chem. Soc., 88 (1966) 2621–2630.
11 J.M.L. Mee, J. Korth and B. Halpern, Rapid and quantitative blood analysis for free fatty acids by chemical ionization mass spectrometry, Anal. Lett., 9(12) (1976) 1075–1083.
12 J.M.L. Mee, J. Korth, B. Halpern and L.B. James, Rapid and quantitative blood amino acid analysis by chemical ionization mass spectrometry, Biomed. Mass Spectrom., 4(3) (1977) 178–181.
13 C.M. Whitehouse, R.N. Dreyer, M. Yamashita and J.B. Fenn, Electrospray interface for liquid chromatographs and mass spectrometers, Anal. Chem., 57(3) (1985) 675–679.
14 M.P. Washburn, D. Wolters and J.R. Yates, Large-scale analysis of the yeast proteome by multidimensional protein identification technology, Nat. Biotechnol., 19 (2001) 242–247.
15 R.D. Macfarlane and D.F. Torgerson, Californium-252 plasma desorption mass spectrometry, Science, 191 (1976) 920.
16 M. Barber, R.S. Bordoli, R.D. Sedgwick and A.N. Tyler, Fast atom bombardment of solids (FAB): A new ion source for mass spectrometry, J. Chem. Soc. Chem. Commun., (1981) 325–327.
17 M. Dole, L.L. Mack, R.L. Hines, R.C. Mobley, L.D. Ferguson and M.B. Alice, Molecular beams of macroions, J. Chem. Phys., 49 (1968) 2240–2249.
18 M. Yamashita and J.B. Fenn, Electrospray ion source: Another variation on the free-jet theme, J. Phys. Chem., 88(20) (1984) 4451–4459.
19 M. Yamashita and J.B. Fenn, Negative ion production with the electrospray ion source, J. Phys. Chem., 88 (1984) 4671–4675.
20 J.B. Fenn, M. Mann, C.K. Meng, S.F. Wong and C.M. Whitehouse, Electrospray ionization-principles and practice, Mass Spectrom. Rev., 9(1) (1990) 37–70.
21 J.B. Fenn, Electrospray: Wings for molecular elephants (Nobel Lecture), Angew. Chem., Int. Ed., 42 (2003) 3871–3894.
22 M.L. Alexandrov, L.N. Gall, M.V. Krasnov, V.I. Nikolaev and V.A. Shkurov, J. Anal. Chem. USSR, 40 (1985) 1227–1236.
23 M. Mann, C.K. Meng and J.B. Fenn, Interpreting mass spectra of multiply charged ions, Anal. Chem., 61(15) (1989) 1702–1708.
24 D.S. Ashton, C.R. Beddell, B.N. Green and R.W.A. Oliver, Rapid validation of molecular structures of biological samples by electrospray-mass spectrometry, FEBS Lett., 342(1) (1994) 1–6.
25 P.H. Dawson, Quadrupole Mass Spectrometry and its Applications, Elsevier Science Publisher, New York, 1976.
26 W. Paul and H. Steinwedel, A new mass spectrometer without magnetic field, Z. Naturforsch., 8a (1953) 448–450.
27 R.E. March and R.J. Hughes, Quadrupole Storage Mass Spectrometry, Wiley, New York, 1989.
28 J.F. Todd and R.E. March, Practical Aspects of Ion Trap Mass Spectrometry-Volume I: Fundamentals of Ion Trap Mass Spectrometry, CRC Press, Boca Raton, 1995, ISBN 0-8493-4452-2.
29 J.F. Todd and R.E. March, Practical Aspects of Ion Trap Mass Spectrometry-Volume II: Ion Trap Instrumentation, CRC Press, Boca Raton, 1995, ISBN 0-8493-8253-X.
30 J.F. Todd and R.E. March, Practical Aspects of Ion Trap Mass Spectrometry-Volume III: Chemical, Environmental, and Biomedical Applications, CRC Press, Boca Raton, 1995, ISBN 0-8493-8251-3.
31 J.F. Todd and R.E. March, Quadrupole Ion Trap Mass Spectrometry, 2nd ed., Wiley-Interscience, New York, 2005, ISBN 0-471-48888-7.
32 G.C. Stafford, P.E. Kelley, J.E.P. Syka, W.E. Reynolds and J.F.J. Todd, Recent improvements in and analytical applications of advanced ion trap technology, Int. J. Mass Spectrom. Ion Process., 60(1) (1984) 85–98.
33 J.C. Schwartz, M.W. Senko and J.E.P. Syka, A two-dimensional quadrupole ion trap mass spectrometer, J. Am. Soc. Mass Spectrom., 13 (2002) 659–669.
34 R.T. McIver and J.R. McIver, Fourier Transform Mass Spectrometry: Principles and Applications, Ionspec Corporation, Irvine, California, USA, 2006.
35 J.D. Baldeschwieler, Ion cyclotron resonance spectroscopy. Cyclotron double resonance provides a new technique for the study of ion-molecule reaction mechanisms, Science, 159 (1968) 263.
36 M.B. Comisarow and A.G. Marshall, Fourier transform ion cyclotron resonance spectroscopy, Chem. Phys. Lett., 25(2) (1974) 282–283.
37 R.T. McIver, R.L. Hunter and W.D. Bowers, Coupling a quadrupole mass spectrometer and a Fourier transform mass spectrometer, Int. J. Mass Spectrom. Ion Phys., 64 (1985) 67–77.
38 K.H. Kingdon, A method for the neutralization of electron space charge by positive ionization at very low gas pressures, Phys. Rev., 21(4) (1923) 408–418.
39 R.D. Knight, Storage of ions from laser-produced plasmas, Appl. Phys. Lett., 38 (1981) 221–223.
40 A. Makarov, Electrostatic axially harmonic orbital trapping: A high-performance technique of mass analysis, Anal. Chem., 72 (2000) 1156.
41 Q. Hu, R.J. Noll, H. Li, A. Makarov, M. Hardman and G. Cooks, The orbitrap: A new mass spectrometer, J. Mass Spectrom., 40(4) (2005) 430–443.
42 A. Makarov, Theory and practice of the orbitrap mass analyzer, Proceedings of the 54th ASMS Conference on Mass Spectrometry and Allied Topics, Seattle, Washington, USA, 2006.
43 J.V. Olsen, L.M.F. de Godoy, G. Li, B. Macek, P. Mortensen, R. Pesch, A. Makarov, O. Lange, S. Horning and M. Mann, Parts per million mass accuracy on an orbitrap mass spectrometer via lock mass injection into a C-trap, Mol. Cell Proteomics, 4(12) (2005) 2010–2021.
44 K.F. Faull, P.J. Anderson, J.D. Barchas and P.A. Berger, Selected ion monitoring assay for biogenic amine metabolites and probenecid in human cerebrospinal fluid, J. Chromatogr. B, Biomed. Appl., 163 (1979) 337–349.
45 J.M. Wells and S.A. McLuckey, Collision-induced dissociation (CID) of peptides and proteins, Methods Enzymol., 402 (2005) 148–185.
46 B. Paizs and S. Suhai, Fragmentation pathways of protonated peptides, Mass Spectrom. Rev., 24(4) (2005) 508–548.
47 A.G. Harrison, A.B. Young, C. Bleiholder, S. Suhai and B. Paizs, Scrambling of sequence information in collision-induced dissociation of peptides, J. Am. Chem. Soc., 128(32) (2006) 10364–10365.
48 L. Mouls, J.L. Aubagnac, J. Martinez and C. Enjalbal, Low energy peptide fragmentations in an ESI-Q-Tof type mass spectrometer, J. Proteome Res., 6(4) (2007) 1378–1391.
49 L. Mouls, G. Subra, J.L. Aubagnac, J. Martinez and C. Enjalbal, Tandem mass spectrometry of amidated peptides, J. Mass Spectrom., 41(11) (2006) 1470–1483.
50 X. Chen and F. Turecek, Simple b ions have cyclic oxazolone structures. A neutralization-reionization mass spectrometric and computational study of oxazolone radicals, J. Am. Soc. Mass Spectrom., 16(12) (2005) 1941–1956.
51 R. Bakhtiar and Z. Guan, Electron capture dissociation mass spectrometry in characterization of peptides and proteins, Biotechnol. Lett., 28(14) (2006) 1047–1059.
52 F.W. McLafferty, E.K. Fridriksson, D.M. Horn, M.A. Lewis and R.A. Zubarev, Biomolecule mass spectrometry, Science, 284(5418) (1999) 1289–1290.
53 L.M. Mikesh, B. Ueberheide, A. Chi, J.J. Coon, J.E. Syka, J. Shabanowitz and D.F. Hunt, The utility of ETD mass spectrometry in proteomic analysis, Biochim. Biophys. Acta, 1764(12) (2006) 1811–1822.
54 R.A. Zubarev, Electron-capture dissociation tandem mass spectrometry, Curr. Opin. Biotechnol., 15(1) (2004) 12–16.
55 H.J. Cooper, Investigation of the presence of b ions in electron capture dissociation mass spectra, J. Am. Soc. Mass Spectrom., 16(12) (2005) 1932–1940.
56 S. Lee, S.Y. Han, T.G. Lee, G. Chung, D. Lee and H.B. Oh, Observation of pronounced b,y cleavages in the electron capture dissociation mass spectrometry of polyamidoamine (PAMAM) dendrimer ions with amide functionalities, J. Am. Soc. Mass Spectrom., 17(4) (2006) 536–543.
57 R.A. Zubarev, D.M. Horn, E.K. Fridriksson, N.L. Kelleher, N.A. Kruger, M.A. Lewis, B.K. Carpenter and F.W. McLafferty, Electron capture dissociation for structural characterization of multiply charged protein cations, Anal. Chem., 72(3) (2000) 563–573.
58 N.L. Kelleher, R.A. Zubarev, K. Bush, B. Furie, B.C. Furie, F.W. McLafferty and C.T. Walsh, Localization of labile posttranslational modifications by electron capture dissociation: The case of gamma-carboxyglutamic acid, Anal. Chem., 71(19) (1999) 4250–4253.
59 R. Bakhtiar and Z. Guan, Electron capture dissociation mass spectrometry in characterization of post-translational modifications, Biochem. Biophys. Res. Commun., 334(1) (2005) 1–8.
60 K. Breuker and F.W. McLafferty, Native electron capture dissociation for the structural characterization of noncovalent interactions in native cytochrome C, Angew. Chem. Int. Ed. Engl., 42(40) (2003) 4900–4904.
61 Y.O. Tsybin, M. Ramstrom, M. Witt, G. Baykut and P. Hakansson, Peptide and protein characterization by high-rate electron capture dissociation Fourier transform ion cyclotron resonance mass spectrometry, J. Mass Spectrom., 39(9) (2004) 1077.
62 Y. Xie, J. Zhang, S. Yin and J.A. Loo, Top-down ESI-ECD-FT-ICR mass spectrometry localizes noncovalent protein-ligand binding sites, J. Am. Chem. Soc., 128(45) (2006) 14432–14433.
63 J.J. Coon, B. Ueberheide, J.E. Syka, D.D. Dryhurst, J. Ausio, J. Shabanowitz and D.F. Hunt, Protein identification using sequential ion/ion reactions and tandem mass spectrometry, Proc. Natl. Acad. Sci. USA, 102(27) (2005) 9463–9468.
64 J.E. Syka, J.J. Coon, M.J. Schroeder, J. Shabanowitz and D.F. Hunt, Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry, Proc. Natl. Acad. Sci. USA, 101(26) (2004) 9528–9533.
65 R.J. Chalkley, C.S. Brinkworth and A.L. Burlingame, Side-chain fragmentation of alkylated cysteine residues in electron capture dissociation mass spectrometry, J. Am. Soc. Mass Spectrom., 17(9) (2006) 1271–1274.
CHAPTER 2
Characterization of Protein Higher Order Structure and Dynamics with ESI MS
Wendell P. Griffith, Anirban Mohimen, Rinat R. Abzalimov and Igor A. Kaltashov
Contents
1. Introduction
2. Charge-State Distributions of Protein Ions in ESI MS and Large-Scale Conformational Dynamics of Single Polypeptide Chains
3. Conformational Dynamics in Multi-Component Systems: Assembly of Hemoglobin Tetramers
4. Charge-State Distribution and the Estimation of the Solvent-Exposed Surface Areas of Proteins
5. Limitations of the Use of Charge-State Distributions for Determining Protein Conformational Heterogeneity
6. Future Outlook
Acknowledgements
References
Comprehensive Analytical Chemistry, Volume 52. ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00202-X. © 2009 Elsevier B.V. All rights reserved.

1. INTRODUCTION
Characterization of protein higher order structure traditionally depended on various spectroscopic and diffraction techniques, such as NMR and X-ray crystallography. Recently, the unique characteristics of ESI MS (smaller sample size, low working protein sample concentrations, ability to handle heterogeneous systems, etc.) brought it to the forefront of protein structure research. It has become a potent experimental tool in biophysics, which in many cases provides valuable information on various aspects of protein behavior in solution. One feature that distinguishes the ESI process from other ionization methods used in mass spectrometry is its ability to produce ions of polar and thermally labile species with multiple charges and transfer them to the gas phase. Although ionic species carrying more than one elementary charge are not uncommon in mass spectra of macromolecular ions generated by means of fast atom bombardment (FAB) or matrix-assisted laser desorption/ionization (MALDI), they typically account for only a small fraction of the overall signal. Furthermore, the average number of charges per unit mass is much lower for the macromolecular ions produced by FAB and MALDI than for ions generated by ESI. Multiple charging of proteins in ESI MS most commonly occurs in the positive-ion mode in the form of protonation (the attachment of multiple protons, H+). However, other types of polycation formation can also occur, e.g., via the attachment of ubiquitous alkali metal ions, such as Na+ and K+. Although there are a number of factors that can influence the number of charges that can be carried by a protein ion in the gas phase, the extent of protonation of a protein molecule is determined in large part by its physical size in solution or, put more precisely, the amount of its surface area that is exposed to the solvent. The solvent-exposed surface area of a protein in solution is dictated by its higher order structure and, as a result, the extent of multiple charging observed in ESI mass spectra of a protein reflects its conformation in solution. It is for this reason that the analysis of protein charge-state distributions in ESI mass spectra provides a very effective, though simple, means of evaluating the integrity of the higher order structure of proteins and their complexes as well as assessing their conformational heterogeneity.
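The relationship between the number of attached cations and the position of an ion on the m/z scale can be made concrete with a short calculation. The sketch below is illustrative only: the protein mass is a hypothetical round number, the function names are not taken from any software package, and the charge-assignment helper assumes two neighboring peaks of the same purely protonated series.

```python
# Cation masses in Da (neutral atom minus one electron).
ADDUCT_MASS = {"H": 1.007276, "Na": 22.989221, "K": 38.963158}


def mz(neutral_mass, n_charges, adduct="H"):
    """m/z of an ion formed by attaching n_charges identical cations."""
    return (neutral_mass + n_charges * ADDUCT_MASS[adduct]) / n_charges


def charge_from_adjacent_peaks(mz_low, mz_high):
    """Charge of the ion at mz_high, given the next peak of the same
    protonated series at mz_low (which carries one more proton)."""
    return round((mz_low - ADDUCT_MASS["H"]) / (mz_high - mz_low))


mass = 80_000.0  # hypothetical protein mass, Da
for n in (19, 20, 21):
    print(f"+{n}: m/z {mz(mass, n):.2f}")

# Recover the charge of the higher-m/z peak from two adjacent peaks
print(charge_from_adjacent_peaks(mz(mass, 20), mz(mass, 19)))
```

A compact protein carrying few charges therefore appears at high m/z, whereas the same polypeptide carrying many more charges after unfolding appears at much lower m/z.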
2. CHARGE-STATE DISTRIBUTIONS OF PROTEIN IONS IN ESI MS AND LARGE-SCALE CONFORMATIONAL DYNAMICS OF SINGLE POLYPEPTIDE CHAINS
Not long after the introduction of ESI as an ionization technique in MS, it was observed that dramatic changes of charge-state distributions of protein ions were directly linked to protein denaturation [1,2]. These observations brought about the realization of the potential of ESI MS as a means of probing protein higher order structure and detecting large-scale conformational transitions in solution. Natively folded proteins in solution by definition have a compact structure and undergo ESI to produce ions carrying a relatively small number of charges. This is because the compact shape of tightly folded polypeptide chains in solution does not allow the accommodation of a significant number of protons on their surface upon transition from solution to the gas phase. For this reason, ion peaks in ESI mass spectra of proteins in aqueous solutions at neutral pH typically dominate the high m/z regions of the mass spectra and are almost always characterized by a narrow distribution of charge states. Unlike folded proteins, conformers lacking native structure (i.e., those that are either partially or fully unfolded in solution as a result of denaturation) give rise to ions carrying a significantly larger number of charges and their charge-state distributions are significantly broader. This is because once the protein ion loses its compactness upon denaturation (or unfolding), a significantly larger number
of charges can be accommodated on its surface. Native and non-native protein states often coexist at equilibrium under mildly denaturing conditions. In such situations protein ion charge-state distributions become bimodal, reflecting the presence of both native and denatured states. Dramatic changes of protein charge-state distributions therefore often serve as gauges of large-scale conformational changes [3]. This unique feature of ESI MS has been utilized extensively in the recent past in numerous studies of large-scale protein dynamics ranging from simple protein folding [4,5] and small ligand binding [6,7], to the more complex processes of multi-unit protein assembly [8,9], as well as protein interaction with other biopolymers [10]. Unless certain gas-phase processes in the ESI interface cause a distortion of the charge-state distributions (vide infra), the ionic signal corresponding to the natively folded protein molecules maintains a nearly constant charge-state distribution (compare traces A and B in Figure 1). The portion of the protein ion signal representing non-native or less compact states, however, is much more heterogeneous and evolves as the solution conditions change. This phenomenon is a result of most proteins having not just a single non-native state, but rather multiple conformations retaining varying levels of native structure. In most cases these non-native conformations do not give rise to distinct ion signals in ESI mass spectra because of the insufficient differences in the solvent-accessible surface area (SASA) among individual conformers. This results in either unresolved or poorly resolved charge-state distributions (such as those in Figure 1B and C), where two or more different protein conformers may produce ions carrying equal numbers of charges in the gas phase. 
In many cases, however, the information on individual protein states can be extracted from such unresolved charge-state distributions using chemometric tools [11,12], as illustrated in Figure 2 using acid unfolding of apomyoglobin (aMb) as an example. aMb is a single-chain protein free of a heme group. Removal of the heme group from an otherwise stable holomyoglobin has a two-fold effect. It causes local changes in the native-like state of this protein (partial loss of two helical segments), and also induces global unfolding in a significant fraction of the protein population, as judged by ESI MS [11,13]. Four states are known to exist both kinetically (in refolding experiments) and at equilibrium, commonly termed N (native, or native-like), I (the so-called pH 4 intermediate), E (extended), and U (unfolded) [14]. The degree of conformational disorder clearly increases as the protein solution is acidified (Figure 2A–C). Singular value decomposition (SVD) of the entire data array acquired in the pH range 2.5–10.0 (a 22 × 12 matrix) yields four significant singular values, together accounting for 96% of the total signal variance (Figure 2D). More rigorous analysis of ESI MS data suggests that evolution of the charge-state distributions of aMb ions during acid unfolding of the protein can indeed be reconstructed using four independent components [12]. In other words, four conformers become populated during such a process. Once the number of independent components is known, a supervised minimization routine can be used to determine the average charge and the width of the ionic envelope representing each particular conformer, as well as its contribution to the overall protein ion signal (Figure 2).
Figure 1 ESI mass spectra of an 80-kDa protein transferrin acquired under near-native (10 mM ammonium acetate, pH 7.0, panel A), mildly denaturing (10 mM ammonium acetate, pH adjusted to 5.0, panel B), and strongly denaturing (water/methanol/acetic acid, 47:50:3 v:v:v, panel C) conditions. Emergence of non-native (partially unfolded) states is evident in (B) as the charge-state distribution becomes bimodal. Further unfolding of the protein (population of significantly less compact states) is manifested in (C) by a dramatic increase of the abundance of highly charged protein ions. Adapted with permission from [32].
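The shift from a narrow to a broad charge-state distribution can be summarized numerically by the intensity-weighted average charge and the width of the distribution. The following sketch assumes that peak abundances have already been extracted for each charge state; the abundances used here are invented for illustration and are not read from Figure 1.

```python
import math


def charge_stats(intensities):
    """Intensity-weighted mean charge and standard deviation.

    intensities maps charge state -> ion abundance (arbitrary units).
    """
    total = sum(intensities.values())
    mean = sum(z * a for z, a in intensities.items()) / total
    var = sum(a * (z - mean) ** 2 for z, a in intensities.items()) / total
    return mean, math.sqrt(var)


# Hypothetical compact (native-like) envelope: few, low charge states
native = {19: 40.0, 20: 100.0, 21: 40.0}

# Hypothetical unfolded envelope: a broad Gaussian-shaped set of high charge states
denatured = {z: math.exp(-((z - 41) ** 2) / (2 * 6.0 ** 2)) for z in range(25, 56)}

for label, dist in (("native-like", native), ("denatured", denatured)):
    mean, width = charge_stats(dist)
    print(f"{label}: average charge {mean:.1f}, width {width:.1f}")
```

A single pair of numbers describes a unimodal envelope reasonably well; a bimodal distribution such as that described for panel B would need to be split into its components before such a summary is meaningful.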
Figure 2 Conformational heterogeneity and acid-induced unfolding of apomyoglobin (aMb) monitored by ESI MS. Panels (A–C) illustrate fitting of representative charge-state distributions of aMb ions (abundance versus charge state n; fitted components labeled N, I, E and U); the mass spectra were acquired in 10 mM CH₃CO₂NH₄ aqueous solutions whose pH levels were adjusted to (A) 7.4, (B) 4.5, and (C) 2.5. A plot of singular values $w_i$ obtained from SVD of the entire ESI MS data array of aMb is shown in panel D. A plot of cumulative variance accounted for by the first i singular values is shown in panel E. Adapted with permission from [12].
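The kind of cumulative-variance plot described for panel E can be produced from a list of singular values with a few lines of code. This is a minimal sketch rather than the procedure used in [12]: it assumes the common convention that the variance carried by each component is proportional to the square of its singular value, and the singular values below are hypothetical, not those of the aMb data set.

```python
def cumulative_variance(singular_values):
    """Fraction of total variance captured by the first i singular values.

    Assumes variance is proportional to the squared singular value.
    """
    squares = [s * s for s in singular_values]
    total = sum(squares)
    running, fractions = 0.0, []
    for sq in squares:
        running += sq
        fractions.append(running / total)
    return fractions


def components_needed(singular_values, threshold):
    """Smallest number of leading components reaching the variance threshold."""
    for i, frac in enumerate(cumulative_variance(singular_values), start=1):
        if frac >= threshold:
            return i
    return len(singular_values)


# Hypothetical singular values, sorted in descending order
w = [10.0, 6.0, 4.0, 3.0, 1.5, 1.0, 0.8, 0.5]
for i, frac in enumerate(cumulative_variance(w), start=1):
    print(f"first {i} component(s): {100 * frac:.1f}% of variance")
print("components needed for 96%:", components_needed(w, 0.96))
```

The threshold is a judgment call; in practice the number of significant components is usually cross-checked against the noise level and against independent knowledge of the system.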
3. CONFORMATIONAL DYNAMICS IN MULTI-COMPONENT SYSTEMS: ASSEMBLY OF HEMOGLOBIN TETRAMERS
In the example of aMb large-scale dynamics discussed in the previous section, the analysis was carried out using only one characteristic of protein ions, namely their charge states. Masses of protein ions become important when protein conformational dynamics is monitored in a multi-component system. Protein conformational analysis in such systems is an extremely challenging task for most spectroscopic methods because of signal interference and the resulting difficulty of separating signals corresponding to the different protein species. ESI MS resolves this problem in the most elegant way, as the ion peaks corresponding to the different protein components of the mixture generally appear at different m/z values and do not interfere with each other. Even in the most unfavorable cases, when the ion peaks corresponding to different ionic species do overlap (e.g., for monomeric and dimeric ions whose numbers of charges differ by a factor of two, $\mathrm{MH}_n^{\,n+}$ and $\mathrm{M_2H}_{2n}^{\,2n+}$), a distinction between the two species can be made if the resolving power of the mass analyzer is high enough to allow the spacing between the isotopic peaks to be determined. Thus, mass measurement provides an important second dimension in the analysis of protein ion charge-state distributions, making ESI MS unrivaled in its ability to characterize large-scale protein conformational dynamics in highly heterogeneous mixtures. This feature of ESI MS was recently used to study assembly and dissociation of mammalian tetrameric hemoglobin (Hb). Members of the mammalian Hb family are tetrameric proteins, which are composed of two α- and two β-globins, each containing a non-covalently bound prosthetic heme group [15]. Hb, whose main physiological function is the delivery of dioxygen from the lungs to peripheral respiring tissues and carbon dioxide in the opposite direction [16], has served for many years as a paradigm of small ligand binding, cooperativity, and allostery. Despite being highly stable while within the red blood cells, the quaternary structure of tetrameric Hb is very dynamic when the protein is released into plasma upon hemolysis of erythrocytes [17,18], where the tetrameric form (α*β*)₂ is in rapid and continuous equilibrium with the dimeric form α*β* (the asterisk denotes the presence of the heme group on a globin chain) [19]. The ESI mass spectrum of a dilute solution of bovine Hb acquired under near-native conditions is shown in Figure 3. The spectrum reveals the presence of at least five different species in solution, whose signals are clearly resolved: Hb tetramer, (α*β*)₂, Hb dimer, α*β*, semi-Hb dimer α*β, and two monomeric species, α* and β. The α-globin, which is always in its holo- or heme-bound form, maintains a highly compact structure as indicated by its narrow charge-state distribution. This native (N) conformation predominates in the mass spectra even under mildly acidic conditions (Figure 3, middle trace). Once the solution pH is decreased to 4, the α-globin becomes sufficiently destabilized to allow the heme group to dissociate from the polypeptide chain (Figure 3, top trace). 
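The monomer/dimer overlap mentioned earlier in this section can be illustrated numerically. The sketch below uses a hypothetical monomer mass and charge, and it approximates the spacing between neighboring isotopic peaks as the ¹³C/¹²C mass difference divided by the charge; both the function names and the numbers are assumptions made for illustration.

```python
C13_C12_DIFF = 1.003355  # mass difference between 13C and 12C, Da
PROTON = 1.007276        # proton mass, Da


def mz_protonated(neutral_mass, n_protons):
    """m/z of a species carrying n_protons attached protons."""
    return (neutral_mass + n_protons * PROTON) / n_protons


def isotope_spacing(charge):
    """Approximate m/z spacing between adjacent isotopic peaks."""
    return C13_C12_DIFF / charge


def charge_from_spacing(spacing):
    """Charge state inferred from a measured isotopic peak spacing."""
    return round(C13_C12_DIFF / spacing)


monomer_mass, n = 15_000.0, 8  # hypothetical values
monomer_mz = mz_protonated(monomer_mass, n)
dimer_mz = mz_protonated(2 * monomer_mass, 2 * n)

print(f"monomer +{n}: m/z {monomer_mz:.3f}, isotope spacing {isotope_spacing(n):.4f}")
print(f"dimer +{2 * n}: m/z {dimer_mz:.3f}, isotope spacing {isotope_spacing(2 * n):.4f}")
print("charge inferred from the dimer spacing:", charge_from_spacing(isotope_spacing(2 * n)))
```

The two ions coincide on the m/z scale, but the isotopic peaks of the dimer are spaced half as far apart, which is why sufficient resolving power removes the ambiguity.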
In contrast to the α-globin, the β-globin is very flexible, as reflected by the significant increase of the extent of multiple charging under near-native conditions. In fact, the degree of structural disorder within the monomeric β-chains is so significant that they are not able to retain the heme group efficiently, although binding to α monomers dramatically reduces their flexibility, apparently “snapping” them into the “correct” fold. Thus, formation of semi-Hb (a dimeric species with a single heme group, α*β) endows β-chains with heme-binding competency, leading to formation of a “normal” Hb dimer, α*β*, whose dimerization produces a tetrameric Hb species (α*β*)₂. A detailed analysis of conformational dynamics of individual globin chains at near-neutral and mildly acidic pH suggests that the ordered oligomerization process remains efficient as long as the natively folded α chain serves as a rigid template for β-globin binding. Acidification of the protein solution to pH 4 leads to significant unfolding of α chains and heme group dissociation, which apparently arrests the ordered oligomerization, replacing it with inefficient random dimerization (Figure 3). Despite a significant degree of structural heterogeneity exhibited by monomeric β-globins, in the absence of α-chains they (as well as their fetal analogs, γ-globins) are able to form homotetrameric assemblies in vivo [20,21], whose quaternary structure is remarkably similar to that of normal Hb species
Figure 3 ESI mass spectra of bovine Hb acquired at pH 3 (top trace), pH 4 (middle trace), and pH 8 (bottom trace), plotted as relative abundance versus m/z. Ionic signal above m/z 2,400 in the pH 3 spectrum, and below m/z 1,650 in the pH 8 mass spectrum has been multiplied by a factor of 5 for clear visualization. Adapted with permission from [10].
(ab)2 [22–24]. This behavior contrasts sharply with that of a-globins, which do not undergo ordered oligomerization to form homodimeric or tetrameric species in the absence of their counterparts and in fact require a specialized chaperone system [25] converting excessive free a monomers in erythrocytes to chemically
Wendell P. Griffith et al.
inert states [26]. Although it may seem puzzling that it is the globin lacking a well-defined structure that is capable of forming tetrameric species with a native-like structure, analysis of ionic charge-state distributions in ESI MS of isolated β-globins suggests that this polypeptide actually populates several states in solution (Figure 4). One of these states is as compact as natively folded globins, such as the native conformation of myoglobin (vide supra). This conformation is represented by the ionic signal at charge states +7, +8, and +9. Therefore, it is probably more appropriate to describe β-globin flexibility not in terms of disorder (lack of structure), but rather as a high degree of conformational heterogeneity, since the polypeptides lacking well-defined structure coexist at equilibrium with the highly compact (and most likely folded) state. Although it may be tempting to call this compact β-globin state “native,” such a term would be a misnomer in this case, as it is not the only state of the protein populated under native conditions. Instead, we will continue to refer to this state as “compact,” keeping in mind that it is very likely that its higher order structure follows the blueprint of a generic globin fold [27,28].
[Figure 4 graphic: ESI mass spectra in two panels, (A) α-globin + heme and (B) β-globin + heme; x-axis m/z (500–3500), y-axis relative abundance. Labeled charge states range from +7 to +9 in panel A and from +8 to +21 in panel B.]
Figure 4 ESI mass spectra of the isolated α- (A) and β-chains (B) of bovine Hb saturated with a large molar excess of heme. Calculated m/z values for putative dimeric ions (ββ)+12, (β*β)+12, and (β*)2+12 are indicated with dotted lines.
When the protein solution is saturated with heme, this compact conformation begins to display a limited ability to bind and retain the heme group, although a significant fraction of folded polypeptide chains remain heme-free. It seems logical to assume that this conformer also acts as a “marginal” template for dimerization, providing the requisite scaffold to which a flexible β-chain can adapt. In a heme-saturated solution, the initial binding event appears to be a β–β interaction, leading to formation of a homo-analog of a semi-Hb dimer, β*β. Just as in the case of semi-Hb, this binding event locks the flexible apo-polypeptide chain in a proper conformation, making it competent to bind a heme group. In a heme-saturated solution, this leads to efficient formation of a homodimer β2 and, eventually, of a homotetrameric species, β4. Unlike β-globins, heme-reconstituted α-chains largely fail to dimerize under near-native conditions, as can be seen from the ESI mass spectrum of an isolated α-globin solution saturated with heme (Figure 4). Analysis of protein ion charge-state distributions clearly indicates that the protein exists mostly in a compact, tightly folded conformation in the pH range down to 5, which may serve as an excellent binding template but is not flexible enough for efficient self-assembly (vide supra). The small fraction of polypeptides that are less structured (charge states +10 and above) is likely to lack the requisite flexibility as well, since these chains appear to maintain their heme-binding capacity. That, as well as their low Boltzmann weight, does not allow them to act as efficient flexible binding partners of compact, highly structured states of α-globins. The ability of β-globins to undergo ordered oligomerization to form dimers and tetramers both in vitro and in vivo highlights the extreme importance of chain dynamics and conformational heterogeneity for the protein assembly process.
Neither stable structure nor extreme flexibility alone appears to be sufficient to drive the ordered oligomerization process forward, while a combination of a highly ordered structural template and a flexible partner does result in efficient dimerization. The benefits of intrinsic structural disorder for protein binding have become apparent in recent years [29], challenging the traditional perception of a well-defined structure as a prerequisite for recognition and efficient interaction. The results of our work provide further indication that the two concepts are not necessarily contradictory, and that disorder possibly evolves in complex biological systems alongside structure as a means to attain the highest possible efficiency in biomolecular interactions.
4. CHARGE-STATE DISTRIBUTION AND THE ESTIMATION OF THE SOLVENT-EXPOSED SURFACE AREAS OF PROTEINS
In addition to being an efficient tool to probe conformational dynamics, the correlation between the extent of multiple charging of protein ions in ESI MS and the degree of protein compactness in solution can also be used to provide estimates of the solvent-accessible surface area (SASA) of natively folded proteins and their complexes [30]. Although evaluation of SASA based on the measurements of average charges of protein ions in ESI MS cannot presently rival the established techniques as far as
[Figure 5 graphic: (A) plot of ln(average charge) versus ln(surface area, Å²), with the points for D, T, and TT marked; (B) ESI mass spectrum, relative abundance versus m/z (2500–5000), with labeled ions D+11 to D+13, T+17 to T+19, and (TT)+26 to (TT)+28.]
measurement precision, it may be extremely useful for characterizing protein assemblies in solution that are not amenable to analysis using traditional biophysical tools due to their transient nature or heterogeneous character. One such example is shown in Figure 5, where the surface area of a non-globular protein assembly (an octameric form of sickle cell hemoglobin, HbS) is estimated by measuring the average charge of protein ions in ESI MS and using the charge–surface correlation [30] as a “calibration curve.” The octameric form of HbS is often viewed as a precursor to HbS polymerization in red blood cells, a process leading to erythrocyte deformation [19]. The contact area between the two tetramers in the octameric structure of HbS is very limited, giving the entire assembly an appearance of “touching spheres” (see the insert in Figure 5A). The octamer ion signal (labeled TT in Figure 5) is prominent in the ESI mass spectrum of a diluted HbS sample acquired under aerobic conditions, as are the dimer (D) and tetramer (T) ionic components. Average charges are calculated for each of these three species based on the observed distributions of charge states, and the surface areas of the corresponding assemblies are calculated using the charge–surface correlation (Figure 5A). Surface areas of all three species estimated using this procedure are indicated with arrows on the graph (1.40 × 10⁴ Å² for D, 2.48 × 10⁴ Å² for T, and 4.56 × 10⁴ Å² for TT) and appear to match reasonably well the surfaces calculated from the available crystal structure, which are indicated with solid vertical lines on the graph (1.36 × 10⁴ Å² for D, 2.43 × 10⁴ Å² for T, and 4.76 × 10⁴ Å² for TT). The most significant deviation (4%) is observed for the octameric species, most likely due to its highly concave shape.
This deviation, however, is insignificant when compared to the difference between the crystal structure-based surface of the HbS octamer and the surface estimate produced by simple summation of the solvent-exposed surface areas of its monomeric constituents, 60,900 Å² (vertical dashed lines in Figure 5A), which corresponds to a 34% deviation from the surface calculated based on the HbS crystal structure. Therefore, it appears that the charge–surface correlation can be used to provide reasonable estimates of solvent-shielded surface at protein–protein interfaces within macromolecular assemblies in solution.
Figure 5 The use of charge–surface correlation for natively folded proteins (A) as a calibration curve for estimation of surface area of the octameric species of human sickle cell hemoglobin (HbS). The ESI mass spectrum of Hb S (40 mM solution at neutral pH) is shown in panel B. The evaluation of surface areas of dimers (D), tetramers (T), and octamers (TT) was based on the experimentally measured average number of charges accommodated by the respective protein ions (open circles in panel A). Shaded circles on the graph in panel A represent the set of protein ions (ranging from a 5-kDa insulin to 0.5-MDa ferritin) used to establish the charge–surface correlation and construct the calibration curve. None of the sickle-cell hemoglobin species was used to construct the calibration curve. Vertical lines projected from the x-axis represent the numerical values of surfaces of the protein species based upon crystal structures of oligomers (solid) and simple summation of the surfaces of the constituent monomeric species (dashed). Arrows indicate the numerical value of surfaces estimated based on the ESI MS data, with shaded areas representing confidence intervals for such estimates. Adapted with permission from [30].
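The bookkeeping behind this comparison can be sketched in a few lines of Python. This is only an illustration: the relative peak intensities assigned to the tetramer are hypothetical (only the charge-state labels +17 to +19 come from Figure 5), the function names are ours, and the calibration curve itself is not reproduced because its parameters are not given in the text. The surface areas are the values quoted above.

```python
def average_charge(intensities):
    """Intensity-weighted mean charge of one species.

    `intensities` maps charge state z -> peak intensity (arbitrary units).
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return sum(z * i for z, i in intensities.items()) / total


def percent_deviation(estimate, reference):
    """Unsigned relative deviation of an estimate from a reference, in percent."""
    return abs(estimate - reference) / reference * 100.0


# Hypothetical relative intensities for the tetramer (T) charge states.
tetramer_peaks = {17: 0.5, 18: 1.0, 19: 0.5}
print(f"average charge of T: {average_charge(tetramer_peaks):.2f}")

# Surface areas in square angstroms, as quoted in the text:
# (ESI MS estimate, crystal-structure value).
surfaces = {
    "D": (1.40e4, 1.36e4),
    "T": (2.48e4, 2.43e4),
    "TT": (4.56e4, 4.76e4),
}
deviations = {
    name: percent_deviation(estimate, reference)
    for name, (estimate, reference) in surfaces.items()
}
for name, deviation in deviations.items():
    print(f"{name}: {deviation:.1f}% deviation from the crystal-structure surface")
```

With the quoted numbers the largest deviation is that of the octamer (TT), in line with the 4% figure given in the text.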
5. LIMITATIONS OF THE USE OF CHARGE-STATE DISTRIBUTIONS FOR DETERMINING PROTEIN CONFORMATIONAL HETEROGENEITY
In the preceding discussion we implicitly assumed that macromolecular geometry in solution is the major determinant of the extent of multiple charging of ESI-generated protein ions. However, charge-state distributions can be affected by a variety of other factors, mostly as a result of processes occurring in the ESI interface region [31]. One particularly common situation is the apparent reduction of the number of protons carried by protein ions in the gas phase, which is frequently encountered when acid unfolding of proteins in solution is induced by adjusting the pH of weak buffer solutions (such as ammonium acetate, CH₃CO₂NH₄) with acids (such as acetic acid, CH₃CO₂H). Evolution of charge-state distributions following mild acidification (down to pH = pKa of acetic acid) does not reveal any noticeable contributions of gas-phase ion chemistry [31]. However, continued acidification of the protein solution (below the pKa) often results in a detectable decrease of the average charges of ionic species representing native protein conformations [31]. This is due to protein–anion adduct formation in solution (e.g., MHₙⁿ⁺·CH₃CO₂⁻). At pH equal to the pKa of acetic acid, the CH₃CO₂⁻/CH₃CO₂H pair acts as a strong buffer, and large quantities of CH₃CO₂H are required to bring about even a relatively small pH decrease. This results in a dramatic increase of the CH₃CO₂⁻ concentration and, in turn, in more efficient MHₙⁿ⁺·CH₃CO₂⁻ complex formation. This complex dissociates in the gas phase to produce MHₙ₋₁⁽ⁿ⁻¹⁾⁺ and neutral CH₃CO₂H, since charge separation in the gas phase (to produce MHₙⁿ⁺ and CH₃CO₂⁻ ions) would carry a very significant enthalpic penalty.
The result of these processes is an apparent reduction of the charge carried by protein ions, which can be incorrectly interpreted (if the gas-phase ion chemistry is ignored) as a result of tightening of the protein structure in solution [31]. Under certain conditions, ionic charge-state distributions may also be affected by other gas-phase processes, such as dissociation of non-covalent aggregates [32]. Such dissociation often proceeds via so-called asymmetric charge partitioning, in which a highly charged monomer is ejected from the complex [33,34]. When significant quantities of such metastable aggregate ions are produced, the ESI mass spectra may contain ion peaks corresponding to monomeric protein species carrying a rather high number of charges. The presence of such ionic species in ESI mass spectra may be (incorrectly) interpreted as a manifestation of protein unfolding in solution [32]. Analysis of charge-state distributions of protein ions in ESI MS therefore provides reliable information on protein conformational dynamics only if the observed changes in the extent of multiple charging are due to changes in protein shape and solvent-exposed surface area. Ignoring the gas-phase processes that may alter the ionic charge distributions can introduce a significant systematic error in the evaluation of protein conformational heterogeneity and solution dynamics.
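The buffering argument in this section can be made concrete with the Henderson–Hasselbalch relation, pH = pKa + log10([CH₃CO₂⁻]/[CH₃CO₂H]). The sketch below is a rough illustration rather than a model of a real spray solution: it assumes ideal behavior, takes the textbook pKa of 4.76 for acetic acid, ignores the ammonium counter-ion, and uses a hypothetical 10 mM acetate concentration that is held fixed.

```python
PKA_ACETIC_ACID = 4.76  # textbook value near 25 °C (assumption, not from the chapter)


def acid_to_acetate_ratio(ph, pka=PKA_ACETIC_ACID):
    """Ratio [CH3CO2H]/[CH3CO2-] needed to hold a given pH (Henderson-Hasselbalch)."""
    return 10.0 ** (pka - ph)


acetate_mM = 10.0  # hypothetical acetate concentration, held fixed for simplicity
for ph in (5.76, 4.76, 4.26, 3.76):
    ratio = acid_to_acetate_ratio(ph)
    print(f"pH {ph:.2f}: [acid]/[acetate] = {ratio:6.2f}, "
          f"acetic acid needed = {ratio * acetate_mM:6.1f} mM")
```

Under these assumptions each additional pH unit below the pKa requires a tenfold larger excess of acetic acid over acetate, which is the sense in which a small pH decrease demands large quantities of acid.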
6. FUTURE OUTLOOK
A very important advantage offered by the analysis of protein ion charge-state distributions as a means to monitor conformational changes is the ability of this technique to deal with complex multi-component systems. Indeed, signal interference generally does not present a challenge to the approach based on the analysis of protein ion charge-state distributions in ESI MS, since ion peaks corresponding to different protein components of the mixture will generally have different m/z values and, therefore, will not interfere with each other. However, in some unfavorable situations signal interference does present a problem even for ESI MS measurements, as high degrees of structural heterogeneity (e.g., those frequently encountered among highly glycosylated proteins) lead to very significant broadening of ion peaks and, in extreme cases, overlap of peaks corresponding to different species and charge states. A successful application of the charge-state deconvolution procedure in such a situation may require a more sophisticated mathematical apparatus than the one presented here (e.g., maximum entropy-based methods to resolve individual ionic species and charge states in the mass spectra before applying chemometric tools). One example of such a system is the interaction between Hb and an Hb-binding protein, haptoglobin (Hp). Hps are a group of genetically polymorphic serum glycoproteins found in vertebrates [35,36]. In its simplest form, Hp 1-1, human Hp has a tetrachain structure that is composed of two light chains (L) and two heavy chains (H), which are covalently connected by disulfide bridges in an H-L-L-H sequence (see color inset in Figure 6). Hps are most widely known for their function in the sequestration of free Hb molecules in the blood, their transport to the liver, and subsequent catabolism [37]. Each Hp 1-1 molecule can bind Hb in a 1:1 ratio (with each Hp H-chain binding to one Hb αβ-dimer).
The binding is effectively irreversible and is among the strongest of all known non-covalent protein–protein interactions in nature [38]. ESI MS readily detects interaction between Hp 1-1 and Hb and provides information on binding stoichiometry (Figure 6). Furthermore, the average charges of Hp ions and their complexes with Hb can be used to provide valuable information on protein geometry in solution in the absence of crystal structures. However, such information is difficult to obtain for higher Hp oligomers and their complexes with Hb, whose ions are also present in the ESI mass spectra (m/z > 6,000). The high degree of conformational heterogeneity of Hp (due to natural variation of its saccharide content) leads to very significant peak broadening and does not allow distinct detection of ions at different charge states. More sophisticated methods of ESI MS data analysis will be required in order to obtain reliable information on both masses and charges of these high-mass ions. Although the focus of this chapter is the use of ESI MS to characterize conformation and large-scale dynamics of proteins, it seems inevitable that the analysis of charge-state distributions as a means to probe higher order structure will be expanded in the near future to include other types of biopolymers and even polymers of abiotic origin. Charge-state distribution analysis has already been applied to monitor conformational transitions in protein–DNA systems; however, the focus was on the protein component [39]. Although oligonucleotides
[Figure 6 graphic: overlaid ESI mass spectra, relative abundance versus m/z (2000–9000). Labeled ions include (α*β*)+11, L2H2+22, L2H2+23, (α*β*)L2H2+23, and (α*β*)L2H2+24; the inset shows a schematic of Hp (L2H2) with bound αβ dimers.]
Figure 6 ESI mass spectra of haptoglobin (Hp 1-1, gray trace) and an Hp–Hb mixture (black trace). Letters L and H refer to light and heavy chains of Hp 1-1, and α and β refer to globin chains. The inset shows putative structures of Hp/Hb complexes. Only the top structure (unsaturated with Hb) can be confidently identified in the ESI mass spectrum of the Hp/Hb mixture, despite the apparent excess of free (unbound) Hb dimers in solution.
also produce multiply charged ions in ESI MS, it remains to be seen whether the extent of protonation (or deprotonation in the negative-ion mode) of oligonucleotides and their complexes with protein molecules truly reflects their compactness in solution. If this is indeed the case, analysis of ionic charge-state distributions in ESI mass spectra of oligonucleotide–protein complexes will shed light on intimate details of processes ranging from gene duplication to expression to protein synthesis. Finally, application of this technique to study behavior of polymers that are not genetically controlled (both biological and synthetic) may provide an urgently needed means to carry out conformational analysis in highly heterogeneous systems, a development that will greatly benefit multiple fields ranging from glycobiology to biopharmaceuticals (e.g., design of polymer-conjugated therapeutics) to nanotechnology (e.g., design of bio-inspired nanomaterials).
ACKNOWLEDGEMENTS
This work was supported by grants R01 GM061666 from the National Institutes of Health and CHE-0406302 from the National Science Foundation.
REFERENCES
1 S.K. Chowdhury, V. Katta and B.T. Chait, Probing conformational changes in proteins by mass spectrometry, J. Am. Chem. Soc., 112 (1990) 9012–9013.
2 J.A. Loo, R.R. Loo, H.R. Udseth, C.G. Edmonds and R.D. Smith, Solvent-induced conformational changes of polypeptides probed by electrospray-ionization mass spectrometry, Rapid Commun. Mass Spectrom., 5 (1991) 101–105.
3 L. Konermann and D.J. Douglas, Acid-induced unfolding of cytochrome c at different methanol concentrations: Electrospray ionization mass spectrometry specifically monitors changes in the tertiary structure, Biochemistry, 36 (1997) 12296–12302.
4 L. Konermann and D.J. Douglas, Equilibrium unfolding of proteins monitored by electrospray ionization mass spectrometry: Distinguishing two-state from multi-state transitions, Rapid Commun. Mass Spectrom., 12 (1998) 435–442.
5 R. Grandori, Detecting equilibrium cytochrome c folding intermediates by electrospray ionisation mass spectrometry: Two partially folded forms populate the molten-globule state, Protein Sci., 11 (2002) 453–458.
6 D.R. Gumerov and I.A. Kaltashov, Dynamics of iron release from transferrin N-lobe studied by electrospray ionization mass spectrometry, Anal. Chem., 73 (2001) 2565–2570.
7 E.T. van den Bremer, W. Jiskoot, R. James, G.R. Moore, C. Kleanthous, A.J. Heck and C.S. Maier, Probing metal ion binding and conformational properties of the colicin E9 endonuclease by electrospray ionization time-of-flight mass spectrometry, Protein Sci., 11 (2002) 1738–1752.
8 W.P. Griffith and I.A. Kaltashov, Highly asymmetric interactions between globin chains during hemoglobin assembly revealed by electrospray ionization mass spectrometry, Biochemistry, 42 (2003) 10024–10033.
9 D.H. Simmons, D.J. Wilson, G.A. Lajoie, A. Doherty-Kirby and L. Konermann, Subunit disassembly and unfolding kinetics of hemoglobin studied by time-resolved electrospray mass spectrometry, Biochemistry, 43 (2004) 14792–14801.
10 W.P. Griffith and I.A. Kaltashov, Mass spectrometry in the study of hemoglobin: From covalent structure to higher order assembly, Curr. Org. Chem., 10 (2006) 535–553.
11 A. Dobo and I.A. Kaltashov, Detection of multiple protein conformational ensembles in solution via deconvolution of charge state distributions in ESI MS, Anal. Chem., 73 (2001) 4763–4773.
12 A. Mohimen, A. Dobo, J.K. Hoerner and I.A. Kaltashov, A chemometric approach to detection and characterization of multiple protein conformers in solution using electrospray ionization mass spectrometry, Anal. Chem., 75 (2003) 4139–4147.
13 F. Wang and X. Tang, Conformational heterogeneity and stability of apomyoglobin studied by hydrogen/deuterium exchange and electrospray ionization mass spectrometry, Biochemistry, 35 (1996) 4069–4078.
14 R. Gilmanshin, M. Gulotta, R.B. Dyer and R.H. Callender, Structures of apomyoglobin’s various acid-destabilized forms, Biochemistry, 40 (2001) 5127–5136.
15 R.E. Dickerson and I. Geis, Hemoglobin: Structure, Function, Evolution, and Pathology, Benjamin/Cummings Pub. Co, Menlo Park, CA, 1983.
16 N. Maclean, Haemoglobin, Edward Arnold, London, 1978.
17 L. Lunelli, P. Zuliani and G. Baldini, Evidence of hemoglobin dissociation, Biopolymers, 34 (1994) 747–757.
18 M.S. Hargrove, T. Whitaker, J.S. Olson, R.J. Vali and A.J. Mathews, Quaternary structure regulates hemin dissociation from human hemoglobin, J. Biol. Chem., 272 (1997) 17385–17389.
19 J.M. Manning, A. Dumoulin, X. Li and L.R. Manning, Normal and abnormal protein subunit interactions in hemoglobins, J. Biol. Chem., 273 (1998) 19359–19362.
20 D.A. Rigas, R.D. Koler and E.E. Osgood, New hemoglobin possessing a higher electrophoretic mobility than normal adult hemoglobin, Science, 121 (1955) 372–372.
21 R.T. Jones, W.A. Schroeder, J.E. Balog and J.R. Vinograd, Gross structure of hemoglobin H, J. Am. Chem. Soc., 81 (1959) 3161–3161.
22 G.E.O. Borgstahl, P.H. Rogers and A. Arnone, The 1.8 Å structure of carbonmonoxy-β4 hemoglobin: Analysis of a homotetramer with the R quaternary structure of liganded α2β2 hemoglobin, J. Mol. Biol., 236 (1994) 817–830.
23 G.E.O. Borgstahl, P.H. Rogers and A. Arnone, The 1.9 Å structure of deoxy β4 hemoglobin: Analysis of the partitioning of quaternary-associated and ligand-induced changes in tertiary structure, J. Mol. Biol., 236 (1994) 831–843.
24 R.D. Kidd, H.M. Baker, A.J. Mathews, T. Brittain and E.N. Baker, Oligomerization and ligand binding in a homotetrameric hemoglobin: Two high-resolution crystal structures of hemoglobin Bart’s γ4, a marker for α-thalassemia, Protein Sci., 10 (2001) 1739–1749.
25 L. Feng, D.A. Gell, S. Zhou, L. Gu, Y. Kong, J. Li, M. Hu, N. Yan, C. Lee and A.M. Rich, Molecular mechanism of AHSP-mediated stabilization of α-hemoglobin, Cell, 119 (2004) 629–640.
26 S. Zhou, J.S. Olson, M. Fabian, M.J. Weiss and A.J. Gow, Biochemical fates of alpha hemoglobin bound to alpha hemoglobin stabilizing protein (AHSP), J. Biol. Chem., 281 (2006) 32611–32618.
27 A.M. Lesk and C. Chothia, How different amino acid sequences determine similar protein structures: The structure and evolutionary dynamics of the globins, J. Mol. Biol., 136 (1980) 225–230.
28 D. Bashford, C. Chothia and A.M. Lesk, Determinants of a protein fold: Unique features of the globin amino acid sequences, J. Mol. Biol., 196 (1987) 199–216.
29 A.K. Dunker, C.J. Brown, J.D. Lawson, L.M. Iakoucheva and Z. Obradovic, Intrinsic disorder and protein function, Biochemistry, 41 (2002) 6573–6582.
30 I.A. Kaltashov and A. Mohimen, Estimates of protein surface areas in solution by electrospray ionization mass spectrometry, Anal. Chem., 77 (2005) 5370–5379.
31 D.R. Gumerov, A. Dobo and I.A. Kaltashov, Protein-ion charge-state distributions in electrospray ionization mass spectrometry: Distinguishing conformational contributions from masking effects, Eur. J. Mass Spectrom., 8 (2002) 123–129.
32 R.R. Abzalimov, A.K. Frimpong and I.A. Kaltashov, Gas-phase processes and measurements of macromolecular properties in solution: On the possibility of false positive and false negative signals of protein unfolding, Int. J. Mass Spectrom., 253 (2006) 207–216.
33 J.C. Jurchen and E.R. Williams, Origin of asymmetric charge partitioning in the dissociation of gas-phase protein homodimers, J. Am. Chem. Soc., 125 (2003) 2817–2826.
34 F. Sobott, M.G. McCammon and C.V. Robinson, Gas-phase dissociation pathways of a tetrameric protein complex, Int. J. Mass Spectrom., 230 (2003) 193–200.
35 J.C. Wejman, D. Hovsepian, J.S. Wall, J.F. Hainfeld and J. Greer, Structure of haptoglobin and the haptoglobin–hemoglobin complex by electron microscopy, J. Mol. Biol., 174 (1984) 319–341.
36 N. Urushibara, T. Kumazaki and S. Ishii, Hemoglobin-binding site on human haptoglobin, J. Biol. Chem., 267 (1992) 13413–13417.
37 M.R. Langlois and J.R. Delanghe, Biological and clinical significance of haptoglobin polymorphism in humans, Clin. Chem., 42 (1996) 1589–1600.
38 V. Kaartinen and I. Mononem, Hemoglobin binding to deglycosylated haptoglobin, Biochim. Biophys. Acta, 953 (1988) 345–352.
39 H.B. Kamadurai, S. Subramaniam, R.B. Jones, K.B. Green-Church and M.P. Foster, Protein folding coupled to DNA binding in the catalytic domain of bacteriophage lambda integrase detected by mass spectrometry, Protein Sci., 12 (2003) 620–626.
CHAPTER 3
Noncovalent Protein Interactions
Summer L. Bernstein and Michael T. Bowers
Contents
1. Introduction
2. Instrumentation and Technical Development
2.1 Tandem MS: The Q-TOF
2.2 Fourier transform ion cyclotron resonance: ICR-MS/BIRD/ECD
2.3 Ion mobility mass spectrometry
2.4 Ion mobility: Traveling wave (T-wave)
3. Protein Misfolding and Aggregation
3.1 Amyloid β
3.2 α-Synuclein
4. Ligand–Receptor Interactions
5. Heterogeneous Complexes: TRAP
6. Subunit Exchange of Transthyretin
7. Future Directions
8. Conclusions
Acknowledgments
References
Comprehensive Analytical Chemistry, Volume 52, ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00203-1
© 2009 Elsevier B.V. All rights reserved.
1. INTRODUCTION
Proteins are fundamental components of living cells that have the innate ability to fold, function, and self-assemble into complicated macromolecular structures in a complex environment [1]. The biological activity and function of a protein are based on its ability to adopt a stable three-dimensional structure. Every major cellular process is maintained and carried out by multimeric protein assemblies of 10 or more proteins [2]. Proteins also interact with other biomolecules such as nucleic acids (DNA/RNA), metals, ligands, peptides, small molecules, saccharides, and cofactors to form complex molecular protein machines. However, little is known about these macromolecular structures and their relationship with their cellular functions. Since the development of electrospray ionization combined with mass spectrometry (ESI-MS) by Fenn and coworkers in the late 1980s [3], MS has become an important complementary tool to structural techniques such as NMR, X-ray crystallography, and electron microscopy for studying the structural aspects of biological macromolecular assemblies. ESI-MS can measure the exact mass of a protein complex and of the individual components of macromolecular assemblies, making it an attractive tool for drug discovery [4]. Noncovalent interactions in solution involve electrostatic interactions, hydrogen bonding, hydrophobic interactions, and van der Waals forces. Of interest here is the fact that several studies have shown that noncovalent assemblies often survive the solvent depletion process and retain much of their quaternary structure [5–8]. Several reviews on the subject over the past 10 years have demonstrated the usefulness of MS for studying these types of systems, not only as an analytical technique but also as a powerful tool that gives new insight and a deeper understanding into the nature of protein interactions within complex biological systems [8–14]. Given that the investigation of noncovalent complexes has already been reviewed extensively, the focus of this chapter will be to highlight only a few important systems that utilize a number of complementary mass spectrometric approaches. We will begin by discussing two peptides, amyloid β-protein and α-synuclein (α-syn), which are involved in homogeneous oligomerization and implicated in protein misfolding and aggregation diseases. Next we will consider the assembly of the tryptophan RNA-binding protein, TRAP, and how MS is well suited to study highly complex heterogeneous macromolecular systems [5,15].
And finally, we will examine how dynamic processes such as assembly and subunit exchange can be monitored by MS. Like most scientific quests, one approach is not sufficient to solve the entire problem. All these examples reflect the advantage of applying complementary MS techniques which in turn reflects the multidisciplinary approach required to solve difficult and interesting biological problems.
2. INSTRUMENTATION AND TECHNICAL DEVELOPMENT
ESI is the principal soft ionization method for proteins because a typical protein has multiple charge sites, which allows analysis of high-molecular-weight species in a mass spectrometer with a limited m/z range. Of particular importance is nanospray-ESI (nano-ESI), which reduces the amount of sample (mg to µg) and solution volume (mL to µL) required and produces a finer spray than traditional ESI. Nano-ESI allows more efficient desolvation of the droplets, a critical parameter for preservation, detection, and analysis of noncovalent complexes in aqueous solutions containing metal ions and buffers [8]. Sample preparation is an important factor. An objective when studying biological systems is to try to maintain the system in its native functional state. This is quite often a challenge for MS since many buffers used in biochemistry are
not compatible with the mass spectrometer. Ammonium acetate and ammonium bicarbonate are the most popular choices for MS applications, and either dialysis or centrifugal membrane/gel filtration columns are used for salt removal, buffer exchange, and sample concentration. In all the experiments described here great care was taken with each sample to maintain the integrity of the system when sprayed. Of course some structural change occurs following solution evaporation, but it can be minimized, especially for noncovalent complexes.
2.1 Tandem MS: The Q-TOF
The combination of nano-ESI with time-of-flight mass spectrometry (TOF) opened the door to the study of systems over an essentially unlimited mass-to-charge (m/z) range. However, in the high mass range the separation between adjacent charge states gets smaller, a problem that is compounded by heterogeneity. Thus tandem MS (MS/MS) with selective collision-induced dissociation (CID) has become an essential tool for deciphering noncovalent heterogeneous complexes [12]. In MS/MS the macromolecular ions are mass selected in the first mass analyzer (often a quadrupole), injected into a collision cell and subjected to multiple low-energy collisions (usually with Ar gas), and finally passed through the second mass analyzer (typically a TOF) for identification of the dissociation products. Proteins in a native solution tend to carry relatively low charge, and thus large noncovalent complexes have relatively high m/z. This becomes a problem when they are mass analyzed by a quadrupole. The Robinson group has made significant contributions in this area with modifications to the original Q-TOF (Q-TOF2; Waters-Micromass, UK Ltd.), which helped overcome the mass range dilemma by increasing the quadrupole operating range up to 22,000 m/z [16]. The charge distribution over a wide m/z range often provides valuable information about topology, quaternary structure, and stability that other techniques are not able to provide [13]. Further, numerous studies of protein complexes have shown that CID of a specific macromolecular complex often allows identification of interacting subunits by dissociation of oligomers into (relatively) highly charged monomers and (relatively) low-charged (n−1) oligomers. The capabilities of these techniques have been demonstrated on a variety of systems including intact ribosomes [17], viruses [18], transmembrane proteins [19], proteasomes [20], and molecular chaperones such as the GroEL complex [21,22].
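The narrowing of charge-state spacing at high mass can be illustrated with a short calculation. The sketch below assumes simple protonation (each charge adds one proton mass) and uses hypothetical masses and charge states chosen only for illustration; the function names are ours. It also shows the standard way a charge is assigned from two adjacent peaks of the same species.

```python
PROTON_MASS = 1.007276  # Da


def mz(mass, z):
    """m/z of an ion of neutral mass `mass` (Da) carrying z protons."""
    return (mass + z * PROTON_MASS) / z


def charge_from_adjacent_peaks(mz_z, mz_z_plus_1):
    """Charge z of the higher-m/z peak, given two adjacent peaks of one species.

    mz_z        : m/z of the ion with z charges (the higher-m/z peak)
    mz_z_plus_1 : m/z of the ion with z + 1 charges (the lower-m/z peak)
    """
    return round((mz_z_plus_1 - PROTON_MASS) / (mz_z - mz_z_plus_1))


# Hypothetical (mass in Da, charge) pairs, from a small protein to a large complex.
examples = ((17_000.0, 8), (64_500.0, 17), (800_000.0, 70))
relative_spacings = []
for mass, z in examples:
    upper, lower = mz(mass, z), mz(mass, z + 1)
    relative = (upper - lower) / upper
    relative_spacings.append(relative)
    print(f"M = {mass:9.0f} Da, z = {z:2d}: m/z = {upper:8.1f}, "
          f"spacing to z+1 = {upper - lower:6.1f} ({100 * relative:.1f}% of m/z)")

# Recover the charge from two adjacent peaks of the 64.5-kDa example.
print(charge_from_adjacent_peaks(mz(64_500.0, 17), mz(64_500.0, 18)))
```

Because the spacing between neighboring charge states is roughly 1/(z + 1) of the m/z value, peaks of highly charged, heterogeneous assemblies crowd together, which is one reason mass selection followed by CID is helpful.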
2.2 Fourier transform ion cyclotron resonance: ICR-MS/BIRD/ECD
Fourier transform ion cyclotron resonance (FT-ICR) MS is another popular method for studying proteins because of its high resolution and mass accuracy capabilities. Ions can be sequentially stored, selected, and fragmented in the ion trap. Blackbody infrared dissociation (BIRD) [23,24] and electron capture dissociation (ECD) [25] are two useful techniques carried out in the FT-ICR MS
Summer L. Bernstein and Michael T. Bowers
to study fragmentation and dissociation of noncovalent assemblies. In BIRD the ICR is equipped with a temperature-controllable cell where the ions can be stored for variable lengths of time, after which the fragments are detected and analyzed. When ions stored in the ICR are instead subjected to ECD, product ions arise from cleavage of backbone bonds (primarily N–Cα) in the protein chain. ECD is often used as a top-down sequencing approach for intact proteins in order to detect post-translational modifications. However, in selected cases ECD can dissociate covalent bonds while retaining the noncovalent interactions in noncovalent complexes [25]. These experiments are useful for revealing specific binding sites within a noncovalent complex, as will be discussed in the following example of spermine ligation with α-syn.
2.3 Ion mobility mass spectrometry
MS coupled with ion mobility (IM) measurement [7,26,27] is also emerging as a prominent technique to study protein conformation, as shown in work by Bowers and co-workers [28–30], Jarrold and co-workers [31,32], and Clemmer and co-workers [33]. IM is a noninvasive method capable of separating molecules of different size without chemical bias. Analogous to the separation of molecules in size exclusion chromatography (SEC) or, under a constant electric field, in gel electrophoresis, IM separates ions in a cell filled with an inert gas (usually He) under the influence of a weak electric field. IM measures the time it takes for a short pulse of ions to drift through the He buffer gas and arrive at the detector, giving an arrival time distribution (ATD). Using the average drift time and kinetic theory [34], it is possible to obtain accurate values of the collision cross section (σ) for a mass-selected ion of interest. To obtain molecular-level detail, these experimental cross sections are compared with high-level theoretical predictions. Several programs are available for theoretical modeling [35], and either extended dynamics runs or simulated annealing protocols are used to obtain low-energy families of model structures [36]. IM is also a useful tool for separating ions with multiple conformations and/or oligomers of the same m/z, as we will demonstrate in the following example on the amyloid β-protein.
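As a rough illustration of how a drift time is turned into a cross section, the sketch below uses the low-field Mason–Schamp relation from kinetic theory. The text does not specify which expression is used, so this choice is an assumption, as are the function names and the idea that the drift time passed in has already been corrected for time spent outside the drift cell.

```python
import math

E_CHARGE = 1.602176634e-19   # C, elementary charge
K_B = 1.380649e-23           # J/K, Boltzmann constant
AMU = 1.66053906660e-27      # kg per Da
N0 = 2.6867811e25            # m^-3, gas number density at 273.15 K and 760 Torr
HE_MASS = 4.002602           # Da, helium buffer gas


def reduced_mobility(drift_time_s, length_m, voltage_v, pressure_torr, temp_k):
    """Reduced mobility K0 in m^2 V^-1 s^-1 from a drift time in a uniform field."""
    k = length_m ** 2 / (voltage_v * drift_time_s)  # K = L / (t_d * E), with E = V / L
    return k * (pressure_torr / 760.0) * (273.15 / temp_k)


def _sigma_times_k0(ion_mass_da, z, temp_k, gas_mass_da):
    """Product sigma * K0 (in m^2 * m^2 V^-1 s^-1) from the Mason-Schamp relation."""
    mu = ion_mass_da * gas_mass_da / (ion_mass_da + gas_mass_da) * AMU  # reduced mass, kg
    return (3.0 * abs(z) * E_CHARGE / (16.0 * N0)) * math.sqrt(
        2.0 * math.pi / (mu * K_B * temp_k))


def cross_section_angstrom2(drift_time_s, length_m, voltage_v, pressure_torr,
                            temp_k, ion_mass_da, z, gas_mass_da=HE_MASS):
    """Collision cross section in square angstroms."""
    k0 = reduced_mobility(drift_time_s, length_m, voltage_v, pressure_torr, temp_k)
    return _sigma_times_k0(ion_mass_da, z, temp_k, gas_mass_da) / k0 * 1e20


def drift_time_s(sigma_angstrom2, length_m, voltage_v, pressure_torr,
                 temp_k, ion_mass_da, z, gas_mass_da=HE_MASS):
    """Inverse relation: expected drift time for a given cross section."""
    k0 = _sigma_times_k0(ion_mass_da, z, temp_k, gas_mass_da) / (sigma_angstrom2 * 1e-20)
    k = k0 * (760.0 / pressure_torr) * (temp_k / 273.15)
    return length_m ** 2 / (voltage_v * k)
```

At fixed pressure, temperature, and field, the cross section is proportional to the drift time multiplied by the charge. Ions of the same m/z but different oligomer order can therefore separate in the ATD.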
2.4 Ion mobility: Traveling wave (T-wave)
Recently IM has been adapted into a commercially available quadrupole-orthogonal TOF instrument manufactured by Waters Inc., UK [37]. In this instrument the ions are propelled through the IM cell by a series of voltage pulses that move in time from one electrode to the next (T-wave). Although traditional kinetic theory cannot be used to obtain an accurate cross section, the arrival time of the ions in the T-wave instrument can be calibrated against systems with known cross sections to obtain an approximate cross section. One of the major advantages of this device is that it can analyze large noncovalent complexes while retaining high sensitivity as compared to a purpose-built instrument with a standard drift mobility cell.
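One simple way to picture such a calibration is a power-law fit of known cross sections against measured arrival times. The sketch below assumes the form σ = A·t^B, which the text does not specify, and omits the charge and reduced-mass corrections that a practical calibration would typically include. All names and numbers are hypothetical.

```python
import math


def fit_power_law(arrival_times, cross_sections):
    """Least-squares fit of sigma = a * t**b in log-log space; returns (a, b)."""
    xs = [math.log(t) for t in arrival_times]
    ys = [math.log(s) for s in cross_sections]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two calibrant points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def estimate_cross_section(arrival_time, a, b):
    """Approximate cross section of an unknown from its arrival time."""
    return a * arrival_time ** b


if __name__ == "__main__":
    # Hypothetical calibrants: arrival times (ms) and known cross sections (A^2).
    times = [2.0, 4.0, 8.0, 16.0]
    known = [455.0, 690.0, 1045.0, 1585.0]
    a, b = fit_power_law(times, known)
    print(f"a = {a:.1f}, b = {b:.3f}, "
          f"sigma(6 ms) = {estimate_cross_section(6.0, a, b):.0f} A^2")
```

Because the result is interpolated from calibrants, its accuracy depends on how closely the calibrants resemble the unknown. This is consistent with the text's description of the value as approximate.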
Noncovalent Protein Interactions
3. PROTEIN MISFOLDING AND AGGREGATION
One area of research in our laboratory focuses on neurological protein misfolding disease. When proteins misfold they frequently expose hydrophobic regions that are normally buried in the interior of the native state. Thus the misfolded proteins are prone to interact with other misfolded proteins, leading to an increased propensity for aggregation and eventual formation of insoluble amyloid fibrils. This process is known as amyloidosis [38–40]. The ability of proteins to misfold and form fibrils is now recognized as a generic feature of many polypeptide chains [40,41]. However, in rare instances amyloidosis has been implicated in several devastating neurologic diseases including Alzheimer’s disease (AD), Parkinson’s disease (PD), transmissible spongiform encephalopathies (TSE), Huntington’s disease, and others. Although the pathology of each disease is unique, the common end result, plaque formation, suggests a general mechanism of aggregation that can be targeted for therapeutic intervention [42]. An increasing body of evidence now supports the hypothesis that soluble, noncovalently bound oligomeric assemblies of proteins are key pathogenic effectors of these neurologic diseases [43–45]. Presently little is known about the initial phases of folding and assembly of these soluble oligomers due to their metastable nature. Understanding the key events that lead to misfolding, identifying any partially misfolded intermediates, and determining early oligomer size distributions are major goals of present theoretical and experimental work in the field.
3.1 Amyloid β
Assembly of the amyloid β-protein (Aβ) is a seminal feature of AD [46]. The two most common isoforms of Aβ arising from the enzymatic cleavage of a larger transmembrane protein are Aβ40 and Aβ42, composed of 40 and 42 amino acids, respectively. Although the concentration of Aβ42 is only 10% of that of Aβ40 in the brains of healthy individuals, Aβ42 is the predominant form in senile plaques [47,48] and is found at higher concentrations in individuals affected by inherited forms of AD caused by various genetic mutations [42]. Furthermore, in vitro biophysical studies have shown that Aβ42 forms fibrils significantly faster and has more potent neurotoxic activity than Aβ40 [42]. This large body of evidence strongly implicates Aβ42 as a major contributory factor in the etiology of AD. Thus, much of the recent research has focused on discovering therapeutics that target the assembly of the Aβ peptides and on understanding the differences between Aβ40 and Aβ42 in their self-association process. Unfortunately the dynamic, noncovalent, homotypic self-association of Aβ42 presents problems for biochemical and functional analyses. It appears that Aβ42 monomer exists in steady state with higher-order assemblies. This situation complicates quantitative determination of the oligomer size distribution and determination of structure-activity relationships. The propensity of Aβ to form fibrils also precludes application of classical structure determination methods,
including solution-phase NMR and X-ray crystallography. As a consequence, we have applied MS and IM methods to this problem. When a pH 7.4 solution of Aβ42 is analyzed on the Q-TOF2, the +4 and +3 monomer charge states are the dominant peaks in the mass spectrum but make up only a minor fraction of the total intensity (Figure 1) [49]. The most striking feature of this mass spectrum is the large mound-like distribution from about m/z 500 to 8,000. A very broad range of oligomers with a range of charge states makes up the unresolved distribution, with a maximum intensity near m/z 3,000. To investigate the composition of the large distribution, a segment of it centered on m/z 3,200 was isolated in the first analyzer region and subjected to CID in the collision cell. The resulting mass spectrum obtained in the TOF region is shown as an inset in Figure 1. After CID, a series of distinct peaks corresponding to z/n = +3 to +6 are observed at m/z 500–1,600, having dissociated from the species selected in the first analyzer region. Here z is the charge and n is the oligomer number (n = 1 is monomer, n = 2 is dimer, etc.). The CID spectrum also shows a large distribution centered about m/z 2,300. This distribution is due to noncovalent “stripped oligomers” maintained after collisions with Ar. This result agrees with the explanation that the large mound centered on m/z 3,000 in the mass spectrum
Figure 1 Positive ion mass spectrum for Aβ42 obtained from the modified Q-TOF2 (Waters-Micromass, Manchester, UK). The solution in the capillary was heated to 90 °C to detach ammonium adducts. Inset: CID mass spectrum of Aβ42 isolated at m/z 3,200. Adapted from Ref. 49.
(Figure 1) is due to unresolved large oligomers. Although it does not help identify the oligomer distribution, it does indicate that Aβ42 readily forms large oligomers, most likely highly charged protofibrils. Figure 1 shows that, under suitable conditions, MS can be used to observe the early stages of protein assembly. [Pro19]Aβ42 is an alloform of Aβ42 with the synthetic substitution Phe19→Pro19 near the central hydrophobic region. This substitution is known to completely disrupt fibril formation [50]. Compared to Aβ42, [Pro19]Aβ42 gives a drastically different positive ion mass spectrum (Figure 2) [49]. High-resolution 13C isotope patterns reveal that the +4 and +3 peaks are monomers while the remaining peaks are dimers and larger oligomers. Upon magnification of the region between m/z 2,500 and 3,500, oligomer peaks are clearly visible and are assigned as trimer (Tr), tetramer (Te), and pentamer (P). In summary, wild-type Aβ42 forms a broad distribution of highly charged large oligomers while [Pro19]Aβ42, under similar conditions, forms only small amounts of small oligomers, results consistent with the observation that the Phe19→Pro19 substitution strongly reduces the fibrillization of wild-type Aβ42 [42]. Under physiologic conditions, Aβ has an overall solution charge state of −3 due to six acidic sites and three basic sites. Since we are interested in comparing the MS results to the solution assembly, we chose to conduct the IM-MS studies in negative ion mode. High-resolution 13C-isotope distributions and ATDs were
Figure 2 Positive ion mass spectrum of [Pro19]Aβ42 on the Q-TOF2. The expanded region shows the presence of oligomers where Tr = trimer, Te = tetramer, and P = pentamer. Adapted from Ref. 49.
obtained for each peak observed in the mass spectra for Aβ42, [Pro19]Aβ42, and several other alloforms of Aβ, including the significantly less neurotoxic Aβ40. Each spectrum had intense peaks for the z/n = −4 and −3 charge states. Isotope patterns indicated these are due to monomers (n = 1). All of the spectra had oligomers observed at z/n = −5/2, and [Pro19]Aβ42 and Aβ40 had significant z/n = −2 peaks, which are primarily oligomers [50]. The z/n = −5/2 ATD for Aβ42 is shown in Figure 3A. Analysis (using variable ion injection energies) reveals hexamers (2,898 Å²) and dodecamers (4,308 Å²) of Aβ42 as the most stable oligomers sprayed from solution [50]. Upon more recent and careful analysis of the z/n = −5/2 ATD it has been determined that the decamer (3,870 Å²) is also present as a shoulder to longer times of the dodecamer feature. A large gap between the hexamer and decamer features in the ATD is observed, indicating the absence of octamer. The ATD shown in Figure 3A indicates that the early stages of Aβ42 oligomerization in solution involve a steady-state distribution of monomer ⇌ hexamer (pentamer) ⇌ dodecamer (decamer). Another important point is the absence of any oligomers larger than dihexamer in the IM-MS experiment. This is an intriguing situation since from our experimental observations [49] we are certain that insoluble aggregates are rapidly forming and clogging our spray capillaries. One feasible explanation for this observation is that once a larger oligomer is formed, for example a trihexamer, a structural conversion takes place (β-sheet formation) that allows for the rapid addition of either monomer or oligomer units to form a broad “protofibril” oligomer distribution [49]. The z/n = −5/2 ATDs for [Pro19]Aβ42 (Figure 3B) and Aβ40 (Figure 3C) indicate that the tetramer is the largest soluble oligomer formed.
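The reason IM is needed at a single z/n value can be seen with a short calculation. The sketch below assumes an average Aβ42 monomer mass of about 4,514 Da (an approximate value not given in the text). It also assumes that, under fixed drift conditions, drift time scales as the cross section divided by the magnitude of the charge. The cross sections are those quoted above, and z denotes the magnitude of the negative charge.

```python
from fractions import Fraction

PROTON_MASS = 1.007276   # Da
ABETA42_MASS = 4514.1    # Da, approximate average monomer mass (assumption)

# Experimental cross sections quoted in the text, in square angstroms.
SIGMA_A2 = {6: 2898.0, 10: 3870.0, 12: 4308.0}


def charge(n: int, z_over_n: Fraction) -> int:
    """Magnitude of the charge on an n-mer at a given z/n ratio."""
    z = n * z_over_n
    if z.denominator != 1:
        raise ValueError(f"an n = {n} oligomer cannot have z/n = {z_over_n}")
    return int(z)


def mz_negative(n: int, z_over_n: Fraction, monomer_mass: float = ABETA42_MASS) -> float:
    """m/z of a deprotonated n-mer, [nM - zH]z-."""
    z = charge(n, z_over_n)
    return (n * monomer_mass - z * PROTON_MASS) / z


def relative_drift_time(n: int, z_over_n: Fraction) -> float:
    """Quantity proportional to drift time under the stated assumption: sigma / z."""
    return SIGMA_A2[n] / charge(n, z_over_n)


if __name__ == "__main__":
    ratio = Fraction(5, 2)
    for n in (2, 4, 6, 10, 12):
        print(f"n = {n:2d}, z = {charge(n, ratio):2d}, m/z = {mz_negative(n, ratio):.2f}")
    for n in (12, 10, 6):
        print(f"n = {n:2d}: sigma/z = {relative_drift_time(n, ratio):.1f}")
```

Every oligomer with z/n = 5/2 appears at the same m/z, so the mass spectrum alone cannot distinguish them. Under the σ/z assumption the dodecamer arrives first, the decamer slightly later, and the hexamer well after both, which matches the shoulder and the gap described in the text.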
In addition, the z/n = −2 peak (data not shown) is predominantly composed of monomer and dimer for Aβ40 [49], and monomer, dimer, and trimer for [Pro19]Aβ42 [49]. Hence, Aβ40 has a steady-state distribution of monomer ⇌ dimer ⇌ tetramer while [Pro19]Aβ42 has a steady-state distribution of monomer ⇌ dimer ⇌ trimer ⇌ tetramer. These results are consistent with the Q-TOF2 positive-ion mass spectrum for [Pro19]Aβ42 (Figure 2). In addition to ATD evidence that no
Figure 3 The z/n = −5/2 ATDs for (A) Aβ42, (B) [Pro19]Aβ42, and (C) Aβ40, plotted against drift time. The peaks are assigned to different oligomeric species (see text).
Table 1 Summary of oligomer size distributions(a) from ion mobility data and 13C isotope patterns

z/n     Aβ42                                        [Pro19]Aβ42                    Aβ40
−5/2    D⁻⁵ + Te⁻¹⁰ + H⁻¹⁵ + (P)₂⁻²⁵ + (H)₂⁻³⁰        D⁻⁵ + Te⁻¹⁰                    D⁻⁵ + Te⁻¹⁰
−2      –                                           M⁻² + D⁻⁴ + Tr⁻⁶ + Te⁻⁸        M⁻² + D⁻⁴ + Tr⁻⁶

(a) Only dominant features are included, where M = monomer, D = dimer, Tr = trimer, Te = tetramer, P = pentamer, and H = hexamer. Note that the relative abundance of different oligomers for any given alloform at a particular z/n value depends on drift-cell injection energy and that lower-order oligomers are not necessarily observed at low injection energies.
oligomers larger than the tetramer are being formed for either alloform, the fact that the [Pro19]Aβ42 and Aβ40 samples spray continuously and easily also signifies no formation of higher-order oligomers. Table 1 summarizes our findings on the oligomer size distributions of the three alloforms Aβ42, [Pro19]Aβ42, and Aβ40 determined using IM data and high-resolution MS. The combination of all the data presented here leads us to conclude that the oligomers observed reflect genuine solution-phase assemblies. It is desirable not only to determine the oligomeric distributions of Aβ but also to obtain structural information on the assemblies. Recall that for each oligomer observed in the ATDs an experimental cross section was obtained. Modeling these oligomers atomistically is not currently possible. However, we can get a good idea of the shape of the oligomers using a simplified “hard sphere” model. If the sphere is fit to the monomer cross section, all reasonable shapes for the subsequent oligomer cross sections are too big. This is not surprising because this simple model does not account for the ability of the monomer subunit to compact itself to accommodate the addition of adjacent subunits. Much of this accommodation can be accounted for by fitting the dimer cross section rather than the monomer. Once this is done we can construct tetramers, hexamers, etc., of different shapes and get predicted cross sections for comparison with experiment. Details are given elsewhere [49]. The comparison indicates that the hexamer prefers a planar hexagonal arrangement (σexp = 2,898 Å², σtheory = 3,100 Å²) rather than a closest packed configuration (σtheory = 5,728 Å²) or a quasi-linear arrangement (σtheory = 3,450 Å²). The dodecamer prefers a stacked arrangement of two hexagons (σexp = 4,308 Å², σtheory = 4,562 Å²) rather than a side-by-side hexamer arrangement (σtheory = 2,578 Å²). The decamer also prefers a stacked arrangement of two pentagons of the form [(Aβ42)₅]₂ (σexp = 3,870 Å², σtheory = 3,864 Å²).
From solution-based studies it was postulated that Aβ42 monomers associate to form pentamer/hexamer units called paranuclei [42]. These paranuclei are then believed to form larger oligomers, which eventually go on to form protofibrils and finally fibrils. At some point a conversion in conformation must take place from an unstructured assembly with α-helix character to the β-sheet structure that has been found in the fibrils. The findings from the IM-MS data suggest an oligomerization mechanism where the dodecamer plays a leading role (Figure 4). In our revised mechanism, monomer forms a steady-state distribution with dimer and tetramer that associate to form paranuclei. The paranuclei then associate to form dihexamers (or dipentamers) with a stacked ring structure.
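The comparison between experimental and hard-sphere model cross sections can be expressed as a simple selection of the candidate geometry with the smallest deviation. The sketch below only illustrates that bookkeeping. The numbers are transcribed from the text above, and the assignment of the two alternative-geometry values should be checked against Ref. 49. The dictionary layout and function names are hypothetical.

```python
# Cross sections in square angstroms, transcribed from the text.
CANDIDATES = {
    "hexamer": {
        "sigma_exp": 2898.0,
        "models": {
            "planar hexagonal": 3100.0,
            "closest packed": 5728.0,
            "quasi-linear": 3450.0,
        },
    },
    "dodecamer": {
        "sigma_exp": 4308.0,
        "models": {
            "stacked hexagons": 4562.0,
            "side-by-side hexamers": 2578.0,
        },
    },
    "decamer": {
        "sigma_exp": 3870.0,
        "models": {"stacked pentagons": 3864.0},
    },
}


def percent_deviation(sigma_theory: float, sigma_exp: float) -> float:
    """Signed deviation of a model cross section from experiment, in percent."""
    return 100.0 * (sigma_theory - sigma_exp) / sigma_exp


def best_model(oligomer: str):
    """Return (geometry, percent deviation) for the closest-matching model."""
    entry = CANDIDATES[oligomer]
    name, sigma = min(entry["models"].items(),
                      key=lambda item: abs(item[1] - entry["sigma_exp"]))
    return name, percent_deviation(sigma, entry["sigma_exp"])


if __name__ == "__main__":
    for oligomer in CANDIDATES:
        geometry, deviation = best_model(oligomer)
        print(f"{oligomer}: {geometry} ({deviation:+.1f}%)")
```

A smallest-deviation rule is only a first screen, since the hard-sphere model is itself a simplification of the real assemblies.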
Figure 4 Aβ42 aggregation scheme. Adapted from Ref. 49. [Scheme panels: old Aβ42 aggregation mechanism (M (U), paranuclei (U), large oligomers (U), fibrils (β)); Aβ42 mechanism using MS/IM-MS (M, D, Te, paranuclei, “Toxic?”, protofibrils (β)); [Pro19]Aβ42 and Aβ40 (M, D, Te).]
At this point it appears that addition of a third paranucleus leads to a transition to β-sheet and very rapid growth via monomer addition. This model is consistent with recent information from animal studies that indicates a special role for a 56-kDa species [51], consistent with our stacked dihexamer.
3.2 α-Synuclein
α-Syn is the primary proteinaceous material implicated in the pathogenesis of PD. Although the etiology of PD is still unknown, α-syn became the focus of PD research when it was discovered to be the primary fibrillar component of the Lewy bodies (LBs) and Lewy neurites (LNs) found in the brains of PD victims [52]. Additionally, rare inherited autosomal early-onset forms of PD are linked to two missense mutations (A53T and A30P) in the gene that codes for α-syn [53]. In recent years, a growing body of evidence has suggested that oligomeric assemblies of α-syn are responsible for the toxic effects seen in PD and that the fibrillar deposits are perhaps a byproduct of neuronal death. α-Syn is a highly soluble 14-kDa protein containing seven imperfect repeats near the N-terminal region and a highly acidic C-terminal region. In vitro studies show that α-syn is natively unfolded, having little or no ordered structure in the soluble form under physiological conditions [54]. Small-angle X-ray scattering studies indicate that at neutral pH the protein has a radius of gyration of about 40 Å, larger than expected for a folded globular conformation (15 Å) and smaller than predicted for a random coil conformation (52 Å) [55]. Fibrillization, however, is believed to occur by a nucleation-dependent mechanism [56,57] initiated by a critical structural transformation from an unfolded conformation to a partially folded intermediate [56]. These studies indicate that α-syn most likely populates several intermediate conformations, both monomeric and oligomeric, on the pathway to the mature fibril form.
Our lab demonstrated that the highly unstructured nature of α-syn can be revealed using IM-MS [58]. In the case of wild-type α-syn and the A30P mutant [59], the charge distribution at pH 7.4 tends to be broad, ranging from −6 to −16, centered at −11, and with cross sections at higher charge states consistent with extended protein structures. Another interesting feature observed in both mass spectra is dimer peaks for the −17 to −21 charge states. Data obtained for pH 2.5 solutions, however, showed a narrower monomer charge-state distribution from −6 to −11, centered at −8, and no sign of dimer formation. Usually proteins under physiologic solution conditions are natively folded and have fewer sites exposed for either protonation or deprotonation during the electrospray process, resulting in rather narrow charge-state distributions centered at low charge states. Solutions containing more extended conformers have more sites available for excess charging and produce broader charge-state distributions centered at higher charge states. However, α-syn is natively unfolded at physiological pH, explaining the rather broad distribution. α-Syn also appears to become more compact at low pH, which is consistent with the narrower monomer charge-state distribution and lower charge states observed in the mass spectrum. These differences observed in the charge-state distributions and cross sections under the different conditions are consistent with general trends in MS, where changes in the charge-state distribution of a protein are thought to be due to changes in the protein’s secondary solution conformation [60]. The IM data for the pH 7 solutions indicate that there are two distinct families of structures: one consisting of relatively compact proteins with eight or fewer negative charges and one consisting of relatively extended structures with nine or more charges. The transition from one family to the other occurs between charge states −8 and −9 with an increase in cross section of 50% (~800 Å²).
A plot of cross section versus charge state, Figure 5, shows the trends for both WT and A30P. For the compact family of structures, A30P generally has a larger cross section than the analogous structures for WT [59]. The cross sections of the extended conformations for both species are nearly identical. Cross sections of structures within the family of extended conformations increase with increasing charge state, indicating that structures continue to unfold as more charges are added, presumably due to Coulomb repulsion. The more compact structures are believed to be more closely related to the solution structure (α-syn prefers to be −8 or −9 in solution), and therefore we suggest that A30P has a more extended solution conformation than the wild type, a fact that could have implications for its increased tendency for oligomerization [61]. Obtaining a 3-D structure for α-syn is a big challenge because there are no NMR or crystal structures to start with, and an all-atom molecular dynamics simulation is unrealistic at this point due to the size of the molecule. Therefore cross sections were calculated for two limiting model structures: an extended, all-helical structure and a compact globular structure (see Figure 5). Neither of these models is expected to be a realistic description of the actual protein; they are intended as “yardsticks” for comparison with the experimental data. It is evident that the low charge state cross sections are in good agreement with the cross section of the compact globular structure. The experimental cross sections
Figure 5 Plot of cross section (Å²) versus charge state for the WT (○) and A30P (□) α-syn (negative ions, pH 7), with the cross sections of the all-helical and globular model structures marked for reference. Two families of structures under high and low injection energy (IE) conditions are observed: the relatively compact structures dominant at low injection energies (open squares and open circles), and the extended structures dominant at high injection energies (filled squares and filled circles) [59]. Adapted from Ref. 58.
of the higher charge states are much larger than that of the globular theoretical structure, but not as large as that of the all-helical structure, consistent with a substantially unfolded family of structures. Since more compact structures dominate at low pH and since aggregation appears to be enhanced at low pH, important partially folded states of α-syn may be formed that lead to eventual aggregation [61]. How this observation correlates with PD is yet to be understood.
4. LIGAND–RECEPTOR INTERACTIONS
Another interesting type of intermolecular interaction in multicellular organisms occurs when a low-molecular-weight molecule approaches a cell from the outside and binds to a specific receptor in or on the cell membrane. The low-molecular-weight molecule, the ligand, can be a hormone released in one part of the organism to transmit a signal to the receptor-containing cell. Other small molecules targeting a specific receptor or enzyme include synthetic drugs. A ligand binding to a receptor may either trigger a response in the cell (agonist) or inhibit a response (antagonist) by blocking the receptor from activation by an agonist.
Examination of the ligand–receptor interaction is often very difficult using any of the traditional structural methods available today. For instance, many hormone receptors are membrane proteins embedded in the plasma membrane. Correct folding of these receptors for hormone attachment is only possible in the presence of the membrane and often of other membrane or cytoplasmic proteins. However, partial knowledge of the receptor-binding site available from sequence homology and point mutation studies is sometimes enough to place structural restraints on the ligand. MS methods can be applied to low-molecular-weight ligands to explore conditions yielding a ligand structure that matches the receptor requirements. For instance, a structural analysis of the nine-residue peptide hormone oxytocin (Cys-Tyr-Ile-Gln-Asn-Cys-Pro-Leu-Gly) by IMS-MS carried out in our laboratory reveals that the complex of oxytocin with certain divalent metal ions such as Co2+, Mn2+, and Zn2+ fits the structural requirements of the receptor much better than the bare hormone [62]. A number of biochemical studies indicate that the presence of these metal ions is beneficial for receptor binding, whereas other divalent metal ions such as Cu2+ do not have a beneficial effect. We have been able to establish that Cu2+ forms a square-planar coordination geometry with oxytocin, leading to an unfavorable arrangement of the oxytocin side chains with respect to interaction with the receptor, while other divalent metals form an octahedral complex with backbone carbonyls, providing the required structure for binding to the receptor [63]. An increasing number of studies in which the intact ligand–receptor complex could be electrosprayed and examined in the mass spectrometer indicate that MS opens fascinating possibilities to quantitatively measure solution-phase as well as gas-phase noncovalent interaction strengths [64].
One example is the characterization of protein–ligand complexes and associated enzyme intermediates of GlcNAc-6-O-sulfotransferase using FT-ICR-MS [65]. The comparison of kinetics and binding properties of enzyme–substrate and enzyme–inhibitor complexes probed by both solution- and gas-phase methods produces evidence suggesting that the binding domain is preserved in transferring the complexes from solution into the gas phase [65]. Another important example is the specific binding of α-syn to several natural polycations including spermidine, spermine, and basic histone proteins, which has implications for α-syn aggregation [66]. The interaction leads to increased oligomerization although it does not induce any significant change in the secondary structure of the natively unfolded state [66]. ESI-ECD-MS/MS has been applied by Xie et al. to study the specific binding of spermine to α-syn [67]. According to the mass spectrum, the addition of spermine ligand did not significantly affect the charge-state distribution, only decreasing the average charge state from +15.8 to +14.7. The ECD FT-ICR-MS of a 1:1 holo α-syn-spermine solution at physiologic pH, however, revealed spermine-bound C-terminal products only and no unbound or N-terminal bound products. Additionally, they determined that spermine binding is localized between Gly106 and Pro138. While several studies have indicated that the addition of spermine does not induce secondary structure [66,68,69], there is some debate about other structural changes which may occur upon spermine binding. Recent NMR and ECD findings show
that when spermine binds specifically to α-syn, the natively unstructured protein extends into a more open conformation yet remains highly unstructured [67,70]. It is believed that the charge shielding that results from spermine binding to the C-terminus reduces intramolecular electrostatic interactions, thereby reducing the overall compactness of α-syn and possibly promoting aggregation. Recent IM-MS data obtained in our group show the opposite effect of spermine binding to α-syn. Rather than inducing larger, more extended structures, we found that when spermine bound to α-syn near physiologic pH, the protein underwent a charge reduction from −10, the net charge in solution, to form a complex with net charge −6, which was accompanied by a dramatic size reduction of about a factor of two. The presence of spermine shifted the conformational equilibrium in solution to favor this more compact form, which presumably is a partially folded intermediate that then proceeds to form aggregates.
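Average charge states such as the +15.8 and +14.7 values quoted above are commonly computed as intensity-weighted means over the observed charge-state peaks. The sketch below assumes that definition, which the text does not state, and uses hypothetical peak intensities.

```python
from typing import Mapping


def average_charge(intensities: Mapping[int, float]) -> float:
    """Intensity-weighted mean charge state: sum(z * I_z) / sum(I_z)."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return sum(z * i for z, i in intensities.items()) / total


if __name__ == "__main__":
    # Hypothetical peak intensities (arbitrary units), not data from Ref. 67.
    free_protein = {13: 10.0, 14: 25.0, 15: 40.0, 16: 45.0, 17: 30.0, 18: 12.0}
    with_ligand = {12: 8.0, 13: 22.0, 14: 40.0, 15: 42.0, 16: 28.0, 17: 10.0}
    shift = average_charge(with_ligand) - average_charge(free_protein)
    print(f"free: {average_charge(free_protein):.2f}, "
          f"bound: {average_charge(with_ligand):.2f}, shift: {shift:+.2f}")
```

A shift of about one charge unit in this mean is small relative to the width of the distribution, which is why the charge-state envelope alone reveals little about where the ligand binds.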
5. HETEROGENEOUS COMPLEXES: TRAP
The heterogeneous nature of biological assemblies is a major issue in structural biology. Several examples previously reviewed show that MS can aid in deciphering subunit composition and stoichiometry of complexes up to the megadalton mass range. The tryptophan (trp) binding attenuation protein (TRAP) from Bacillus subtilis studied by the Robinson group serves as an excellent example of the utility and relevance of MS approaches for structural studies on intact heterogeneous protein assemblies [5,15]. TRAP is a small 8-kDa protein. The X-ray crystal structure shows that TRAP in the presence of trp self-assembles into a stable ring of 11 subunits with 11 trp binding sites [71]. In addition, a 53-base segment of the trp leader mRNA binds around the perimeter of TRAP, stabilizing the solution structure [15]. Although the topology was uncovered using X-ray crystallography, the stoichiometry was unknown until MS approaches were applied. When an aqueous 4:1 solution of trp to TRAP monomer under physiologic conditions was first analyzed in the Q-TOF2 by Robinson et al., three different species were observed [15]. To their surprise these species corresponded to an 11-mer, 12-mer, and 24-mer. The 11-mer and 12-mer were hypothesized to correspond to two types of stable single ring structures while the 24-mer was believed to correspond to a stable double ring. MS/MS reveals that the 11-mer and 12-mer both bind 11 trp molecules. The TRAP 24-mer MS/MS data, however, reveal 22, 17, and 11 bound trp molecules. This implies that the double ring formation of the 24-mer must have asymmetric stacking of the TRAP rings where only 11 trp molecules are exposed and the other 11 trp molecules are sandwiched between the two rings. Furthermore, the results indicate that the addition of trp significantly stabilizes the TRAP complex, with the addition of RNA providing further stabilization [15].
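Stoichiometry assignments of this kind rest on two small calculations: the expected mass of a given subunit and ligand combination, and the charge state inferred from two adjacent peaks of one charge-state series. The sketch below assumes protonated positive ions and takes the subunit mass as the rounded 8 kDa quoted above. The tryptophan mass (204.23 Da) and all function names are supplied here for illustration only.

```python
PROTON_MASS = 1.007276   # Da
TRP_MASS = 204.23        # Da, average molecular mass of tryptophan
SUBUNIT_MASS = 8000.0    # Da, rounded "8-kDa" value from the text (approximate)


def complex_mass(n_subunits: int, n_trp: int,
                 subunit_mass: float = SUBUNIT_MASS) -> float:
    """Expected neutral mass of a TRAP n-mer carrying n_trp tryptophans."""
    return n_subunits * subunit_mass + n_trp * TRP_MASS


def mz_positive(mass_da: float, z: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass_da + z * PROTON_MASS) / z


def charge_from_adjacent(mz_z: float, mz_z_plus_1: float) -> float:
    """Charge z of the higher-m/z peak, given the neighbouring (z + 1) peak."""
    return (mz_z_plus_1 - PROTON_MASS) / (mz_z - mz_z_plus_1)


def mass_from_peak(mz_value: float, z: int) -> float:
    """Neutral mass recovered from one peak once its charge is known."""
    return z * (mz_value - PROTON_MASS)


if __name__ == "__main__":
    for n, k in [(11, 11), (12, 11), (24, 22)]:
        m = complex_mass(n, k)
        print(f"{n}-mer + {k} trp: M = {m:.0f} Da, "
              f"m/z(22+) = {mz_positive(m, 22):.1f}")
```

With the rounded subunit mass, the 11-mer and 12-mer differ by about 8 kDa, far more than the mass of one trp. Subunit number and ligand number can therefore be read off independently.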
Although it was shown that MS provides valuable information about the stoichiometry of TRAP not otherwise obtained from traditional structural techniques, it was still unclear whether the ring structure remained stable upon desolvation or collapsed. As in most situations, several techniques working
Figure 6 (A) Mass spectrum of the 11-member TRAP complex bound to 11 trp molecules. (B) (a–d) Ion mobility (IM) data, plotted against collision cross section (Å²), for the charge states of the trp-bound TRAP complex. Dark and grey dashed lines correspond to collision cross sections for ring and collapsed structures based on the native state (6,500 Å²) and close-packed icosahedral (5,600 Å²) arrangements of spheres, respectively. Adapted with permission from Ref. 5.
synergistically provided the most insight. To answer this question of structural integrity, TRAP was analyzed using the recently introduced T-wave IM device [37] to determine the protein’s conformation. This was accomplished by measuring its collision cross section and comparing it to the cross section calculated from its crystal structure [5]. A mass spectrum of the 11-mer TRAP-trp complex is shown in Figure 6A. IM data were obtained for each of the charge states of the TRAP complex in ligand-bound form (Figure 6B). The experimental cross section for the +20 charge state of the TRAP complex is 6,400 Å² (Figure 6B (c)), a value that agrees well with the estimated cross section determined from the crystal structure (~6,600 Å²) [71]. However, as the charge state increases the cross section gets smaller, most likely due to the destabilization of the ring from excessive charging. The cross sections are compared to model structures composed of compressible spheres with the approximate diameter of TRAP subunits. The dark and grey dashed lines in Figure 6B correspond to collision cross sections for a stable ring structure based on the planar, circular native state (6,500 Å²) and a close-packed icosahedron arrangement (5,600 Å²). In addition, the 11-mer TRAP-trp-RNA complex in the +19 charge state has a cross section of 7,400 Å², a value 22% larger than that of the TRAP-trp complex. These results indicate that RNA does bind on the periphery of the TRAP ring, accounting for the increase in cross section, and is completely stabilizing the ring structure. These results demonstrate
Summer L. Bernstein and Michael T. Bowers
that MS and IM-MS are useful techniques to explore the topology of individual components in complex heterogeneous macromolecular assemblies.
6. SUBUNIT EXCHANGE OF TRANSTHYRETIN

Macromolecular assemblies are dynamic entities that are affected by environmental influences such as temperature, pH, and concentration. A major challenge is to observe assembly and disassembly of complexes and to monitor conformational changes within these assemblies. Several impressive examples have been demonstrated and reviewed [9]. A recent study by Keetch et al. demonstrated that complexes are not static entities but, in fact, that their subunits can readily exchange [72]. In the case of transthyretin, a 55-kDa tetrameric transporter protein found in blood plasma and cerebrospinal fluid, this exchange may have implications in the development of familial polyneuropathy amyloidosis, the most common form of systemic amyloid disease. Heterozygous individuals, who co-express wild-type (WT) transthyretin and a variant (L55P in this particular study), are most severely affected by this debilitating disease. Although several models exist for the mechanism of oligomerization from the transthyretin tetramer to amyloid fibrils, all agree that dissociation of the tetramer at low pH contributes to the disease [72–74].

Subunit composition and exchange of intact WT and variant L55P tetramers were monitored using MS as a function of time. One set of homotetramers was isotopically labeled with 13C and 15N throughout and the other was in its native form. The mass spectrum indicated that five types of tetramers exist, as determined from their unique masses. First, it was established that the most plausible mechanism for subunit exchange was that both monomers and dimers are exchanged between tetramers, yet dimers dominated the exchange in the presence of L55P. In the mixed WT and L55P case, tetramers containing two or three L55P subunits dominated in the early stages of formation. Next, the rate of exchange was determined. It was found that the rate increases by more than an order of magnitude for L55P compared to WT and that dissociation of the WT protein is the rate-limiting step. These results suggest that in the presence of L55P, dimers of WT have a higher propensity to accumulate and ultimately may become trapped in insoluble aggregates.
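The observation of five tetramer types follows from simple counting: a tetramer built from labeled and unlabeled subunits can contain 0, 1, 2, 3, or 4 labeled subunits, and each composition has a distinct mass. The sketch below illustrates this in Python. The monomer masses are hypothetical (the unlabeled value is just 55 kDa divided by four, and the labeled value is an assumed number, not taken from the study), and the binomial populations assume purely random mixing of single subunits, which is a simplification; the study itself reports that dimer exchange dominates in the presence of L55P.

```python
from math import comb


def hybrid_masses(m_light, m_heavy, n_subunits=4):
    """Map the number of heavy (labeled) subunits to the assembly mass."""
    return {k: (n_subunits - k) * m_light + k * m_heavy
            for k in range(n_subunits + 1)}


def random_mixing_fractions(frac_heavy, n_subunits=4):
    """Binomial populations expected if subunits mixed purely at random.

    This is an idealized reference case, not the mechanism reported in
    the study.
    """
    return {k: comb(n_subunits, k)
               * frac_heavy ** k
               * (1 - frac_heavy) ** (n_subunits - k)
            for k in range(n_subunits + 1)}


# Hypothetical monomer masses in Da (assumptions for illustration only).
M_LIGHT = 13_750.0   # 55 kDa tetramer / 4
M_HEAVY = 14_500.0   # assumed mass of a 13C/15N-labeled monomer

masses = hybrid_masses(M_LIGHT, M_HEAVY)
fractions = random_mixing_fractions(frac_heavy=0.5)

for k in masses:
    print(f"{k} labeled subunits: {masses[k]:9.1f} Da, "
          f"random-mixing fraction {fractions[k]:.4f}")
```

Comparing measured hybrid populations against such a random-mixing reference is one way to see whether exchange proceeds through monomers, dimers, or both.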
7. FUTURE DIRECTIONS

Applications of MS to novel and important biological problems have become a driving force in structural biology. In this post-genomic era, there will continue to be a high demand to understand the relationship between protein structure and function. As more complex systems and interactions are discovered, the technology of MS will continue to advance, grow, and aid in this discovery. It is apparent that in order to solve difficult biological problems several techniques
Noncovalent Protein Interactions
will need to be employed in a synergistic way. MS, often coupled to IM, will be increasingly important in this regard.
8. CONCLUSIONS

This chapter demonstrates through a few interesting examples how MS can reveal structure, assembly states, and mechanistic information about intact protein assemblies. Assembly stoichiometry appears to be conserved in most instances and is the most significant connection between solution and the dehydrated system observed in MS [20]. However, how much of the conformation is retained upon desolvation is still one of the most debated, criticized, and sought-after questions in using MS approaches to solve problems in structural biology. It has been demonstrated that MS in combination with IM can provide unique insights into the oligomerization behavior of aggregating species from solution, as in the example of Aβ, through its ability to resolve systems of identical m/z into unique structural elements. The ability to calculate cross sections provided by IM not only allows insight into the oligomerization mechanism but also generates structural constraints for in silico modeling of protein assembly. As in most situations, several techniques working synergistically provided the most insight.
ACKNOWLEDGMENTS

SLB would like to acknowledge Joseph Loo for his continuous encouragement and inspiration through the impact of his work in the field of mass spectrometry. We would also like to acknowledge Carol V. Robinson and her group for collaboration, useful discussions, and for their unremitting drive to examine noncovalent interactions of proteins using mass spectrometry. Thomas Wyttenbach is also acknowledged for his contribution to this chapter and much of the research presented. We also acknowledge Megan Grabenauer and Nicholas Dupuis for their contribution to many of the results discussed here. Finally, MTB is pleased to acknowledge the National Science Foundation under grant CHE-0503728 and the National Institutes of Health under grant PO1-AG027818.
REFERENCES

1 R.J. Ellis and A.P. Minton, Cell biology: Join the crowd, Nature, 425(6953) (2003) 27–28.
2 B. Alberts, The cell as a collection of protein machines: Preparing the next generation of molecular biologists, Cell, 92(3) (1998) 291–294.
3 J.B. Fenn, M. Mann, C.K. Meng, S.F. Wong and C.M. Whitehouse, Electrospray ionization for mass spectrometry of large biomolecules, Science, 246(4926) (1989) 64–71.
4 S.A. Hofstadler and K.A. Sannes-Lowery, Applications of ESI-MS in drug discovery: Interrogation of noncovalent complexes, Nat. Rev. Drug Discov., 5(7) (2006) 585–595.
5 B.T. Ruotolo, K. Giles, I. Campuzano, A.M. Sandercock, R.H. Bateman and C.V. Robinson, Evidence for macromolecular protein rings in the absence of bulk water, Science, 310(5754) (2005) 1658–1661.
6 E.S. Baker, S.L. Bernstein and M.T. Bowers, Structural characterization of G-quadruplexes in deoxyguanosine clusters using ion mobility mass spectrometry, J. Am. Soc. Mass Spectrom., 16(7) (2005) 989–997.
7 T. Wyttenbach and M.T. Bowers, Gas-phase conformations: The ion mobility/ion chromatography method, Top. Curr. Chem., 225 (2003) 207–232.
8 J.A. Loo, Electrospray ionization mass spectrometry: A technology for studying noncovalent macromolecular complexes, Int. J. Mass Spectrom., 200(1–3) (2000) 175–186.
9 A.J.R. Heck and R.H.H. van den Heuvel, Investigation of intact protein complexes by mass spectrometry, Mass Spectrom. Rev., 23(5) (2004) 368–389.
10 R.H. van den Heuvel and A.J.R. Heck, Native protein mass spectrometry: From intact oligomers to functional machineries, Curr. Opin. Chem. Biol., 8(5) (2004) 519–526.
11 A.E. Ashcroft, Recent developments in electrospray ionisation mass spectrometry: Noncovalently bound protein complexes, Nat. Prod. Rep., 22(4) (2005) 452–464.
12 J.L.P. Benesch, J.A. Aquilina, B.T. Ruotolo, F. Sobott and C.V. Robinson, Tandem mass spectrometry reveals the quaternary organization of macromolecular assemblies, Chem. Biol., 13(6) (2006) 597–605.
13 J.L. Benesch and C.V. Robinson, Mass spectrometry of macromolecular assemblies: Preservation and dissociation, Curr. Opin. Struc. Biol., 16(2) (2006) 245–251.
14 J.A. Loo, Studying noncovalent protein complexes by electrospray ionization mass spectrometry, Mass Spectrom. Rev., 16(1) (1997) 1–23.
15 M.G. McCammon, H. Hernandez, F. Sobott and C.V. Robinson, Tandem mass spectrometry defines the stoichiometry and quaternary structural arrangement of tryptophan molecules in the multiprotein complex TRAP, J. Am. Chem. Soc., 126(19) (2004) 5950–5951.
16 F. Sobott, H. Hernandez, M.G. McCammon, M.A. Tito and C.V. Robinson, A tandem mass spectrometer for improved transmission and analysis of large macromolecular assemblies, Anal. Chem., 74(6) (2002) 1402–1407.
17 A.A. Rostom and C.V. Robinson, Detection of the intact GroEL chaperonin assembly by mass spectrometry, J. Am. Chem. Soc., 121(19) (1999) 4718–4719.
18 M.A. Tito, K. Tars, K. Valegard, J. Hajdu and C.V. Robinson, Electrospray time-of-flight mass spectrometry of the intact MS2 virus capsid, J. Am. Chem. Soc., 122(14) (2000) 3550–3551.
19 L.L. Ilag, I. Ubarretxena-Belandia, C.G. Tate and C.V. Robinson, Drug binding revealed by tandem mass spectrometry of a protein-micelle complex, J. Am. Chem. Soc., 126(44) (2004) 14362–14363.
20 J.A. Loo, B. Berhane, C.S. Kaddis, K.M. Wooding, Y.M. Xie, S.L. Kaufman and I.V. Chernushevich, Electrospray ionization mass spectrometry and ion mobility analysis of the 20S proteasome complex, J. Am. Soc. Mass Spectrom., 16(7) (2005) 998–1008.
21 C.V. Robinson, M. Gross, S.J. Eyles, J.J. Ewbank, M. Mayhew, F.U. Hartl, C.M. Dobson and S.E. Radford, Conformation of GroEL-bound α-lactalbumin probed by mass spectrometry, Nature, 372(6507) (1994) 646–651.
22 C.V. Robinson, M. Gross and S.E. Radford, Probing conformations of GroEL-bound substrate proteins by mass spectrometry, Method Enzymol., 290 (1998) 296–313.
23 R.C. Dunbar, BIRD (blackbody infrared radiative dissociation): Evolution, principles, and applications, Mass Spectrom. Rev., 23(2) (2004) 127–158.
24 W.D. Price, P.D. Schnier and E.R. Williams, Binding energies of the proton-bound amino acid dimers Gly·Gly, Ala·Ala, Gly·Ala, and Lys·Lys measured by blackbody infrared radiative dissociation, J. Phys. Chem. B, 101(4) (1997) 664–673.
25 R.A. Zubarev, N.L. Kelleher and F.W. McLafferty, Electron capture dissociation of multiply charged protein cations. A nonergodic process, J. Am. Chem. Soc., 120(13) (1998) 3265–3266.
26 D.E. Clemmer and M.F. Jarrold, Ion mobility measurements and their applications to clusters and biomolecules, J. Mass Spectrom., 32(6) (1997) 577–592.
27 G. von Helden, M.-T. Hsu, P.R. Kemper and M.T. Bowers, Structures of carbon cluster ions from 3 to 60 atoms: Linears to rings to fullerenes, J. Chem. Phys., 95(5) (1991) 3835–3837.
28 J. Gidden, P.R. Kemper, E. Shammel, D.P. Fee, S. Anderson and M.T. Bowers, Application of ion mobility to the gas-phase conformational analysis of polyhedral oligomeric silsesquioxanes (POSS), Int. J. Mass Spectrom., 222(1–3) (2003) 63–73.
29 J. Gidden, T. Wyttenbach, A.T. Jackson, J.H. Scrivens and M.T. Bowers, Gas-phase conformations of synthetic polymers: Poly(ethylene glycol), poly(propylene glycol), and poly(tetramethylene glycol), J. Am. Chem. Soc., 122(19) (2000) 4692–4699.
30 T. Wyttenbach, G. von Helden and M.T. Bowers, Gas-phase conformation of biological molecules: Bradykinin, J. Am. Chem. Soc., 118(35) (1996) 8355–8364.
31 K.B. Shelimov, D.E. Clemmer, R.R. Hudgins and M.F. Jarrold, Protein structure in vacuo: Gas-phase conformations of BPTI and cytochrome c, J. Am. Chem. Soc., 119(9) (1997) 2240–2248.
32 D.E. Clemmer, R.R. Hudgins and M.F. Jarrold, Naked protein conformations: Cytochrome c in the gas phase, J. Am. Chem. Soc., 117(40) (1995) 10141–10142.
33 J.W. Li, J.A. Taraszka, A.E. Counterman and D.E. Clemmer, Influence of solvent composition and capillary temperature on the conformations of electrosprayed ions: Unfolding of compact ubiquitin conformers from pseudonative and denatured solutions, Int. J. Mass Spectrom., 187 (1999) 37–47.
34 E.A. Mason and E.W. McDaniel, Transport Properties of Ions in Gases, Wiley, New York, 1988.
35 D.A. Case, D.A. Pearlman, J.W. Caldwell, T.E. Cheatham III, J. Wang, W.S. Ross, C.L. Simmerling, T.A. Darden, K.M. Merz, R.V. Stanton, A.L. Cheng, J.J. Vincent, M. Crowley, V. Tsui, H. Gohlke, R.J. Radmer, Y. Duan, J. Pitera, I. Massova, G.L. Seibel, U.C. Singh, P.K. Weiner and P.A. Kollman, AMBER 7, University of California, San Francisco, 2002.
36 J. Gidden, A. Ferzoco, E.S. Baker and M.T. Bowers, Duplex formation and the onset of helicity in poly d(CG)n oligonucleotides in a solvent-free environment, J. Am. Chem. Soc., 126(46) (2004) 15132–15140.
37 K. Thalassinos, S.E. Slade, K.R. Jennings, J.H. Scrivens, K. Giles, J. Wildgoose, J. Hoyes, R.H. Bateman and M.T. Bowers, Ion mobility mass spectrometry of proteins in a modified commercial mass spectrometer, Int. J. Mass Spectrom., 236(1–3) (2004) 55–63.
38 B. Caughey and P.T. Lansbury, Protofibrils, pores, fibrils, and neurodegeneration: Separating the responsible protein aggregates from the innocent bystanders, Annu. Rev. Neurosci., 26 (2003) 267–298.
39 E.H. Koo, P.T. Lansbury and J.W. Kelly, Amyloid diseases: Abnormal protein aggregation in neurodegeneration, Proc. Natl. Acad. Sci. USA, 96(18) (1999) 9989–9990.
40 C.M. Dobson, Principles of protein folding, misfolding and aggregation, Semin. Cell Dev. Biol., 15(1) (2004) 3–16.
41 C.M. Dobson, Protein misfolding, evolution and disease, Trends Biochem. Sci., 24(9) (1999) 329–332.
42 G. Bitan, S.S. Vollers and D.B. Teplow, Elucidation of primary structure elements controlling early amyloid β-protein oligomerization, J. Biol. Chem., 278(37) (2003) 34882–34889.
43 J. Wang, D.W. Dickson, J.Q. Trojanowski and V.M.Y. Lee, The levels of soluble versus insoluble brain Aβ distinguish Alzheimer's disease from normal and pathologic aging, Exp. Neurol., 158(2) (1999) 328–337.
44 W.L. Klein, G.A. Krafft and C.E. Finch, Targeting small Aβ oligomers: The solution to an Alzheimer's disease conundrum? Trends Neurosci., 24(4) (2001) 219–224.
45 M.D. Kirkitadze, G. Bitan and D.B. Teplow, Paradigm shifts in Alzheimer's disease and other neurodegenerative disorders: The emerging role of oligomeric assemblies, J. Neurosci. Res., 69(5) (2002) 567–577.
46 D.J. Selkoe, Alzheimer's disease: Genes, proteins, and therapy, Physiol. Rev., 81(2) (2001) 741–766.
47 D. Scheuner, C. Eckman, M. Jensen, X. Song, M. Citron, N. Suzuki, T.D. Bird, J. Hardy, M. Hutton, W. Kukull, E. Larson, E. Levy-Lahad, M. Viitanen, E. Peskind, P. Poorkaj, G. Schellenberg, R. Tanzi, W. Wasco, L. Lannfelt, D. Selkoe and S. Younkin, Secreted amyloid β-protein similar to that in the senile plaques of Alzheimer's disease is increased in vivo by the presenilin 1 and 2 and APP mutations linked to familial Alzheimer's disease, Nat. Med., 2(8) (1996) 864–870.
48 T.E. Golde, C.B. Eckman and S.G. Younkin, Biochemical detection of Aβ isoforms: Implications for pathogenesis, diagnosis, and treatment of Alzheimer's disease, Biochim. Biophys. Acta, 1502(1) (2000) 172–187.
49 S.L. Bernstein, N.F. Dupuis, N.D. Lazo, T. Wyttenbach, M.M. Condron, G. Bitan, D.B. Teplow, J.-E. Shea, B.T. Ruotolo, C.V. Robinson and M.T. Bowers, Early amyloid β-protein oligomerization and the importance of tetramers and dodecamers, Nat. Chem. Biol., (2008) submitted.
50 S.L. Bernstein, T. Wyttenbach, A. Baumketner, J.-E. Shea, G. Bitan, D.B. Teplow and M.T. Bowers, Amyloid β-protein: Monomer structure and early aggregation states of Aβ42 and its Pro19 alloform, J. Am. Chem. Soc., 127(7) (2005) 2075–2084.
51 S. Lesne, M.T. Koh, L. Kotilinek, R. Kayed, C.G. Glabe, A. Yang, M. Gallagher and K.H. Ashe, A specific amyloid-β protein assembly in the brain impairs memory, Nature, 440(7082) (2006) 352–357.
52 M.G. Spillantini, M.L. Schmidt, V.M.Y. Lee, J.Q. Trojanowski, R. Jakes and M. Goedert, α-Synuclein in Lewy bodies, Nature, 388(6645) (1997) 839–840.
53 M.H. Polymeropoulos, C. Lavedan, E. Leroy, S.E. Ide, A. Dehejia, A. Dutra, B. Pike, H. Root, J. Rubenstein, R. Boyer, E.S. Stenroos, S. Chandrasekharappa, A. Athanassiadou, T. Papapetropoulos, W.G. Johnson, A.M. Lazzarini, R.C. Duvoisin, G. DiIorio, L.I. Golbe and R.L. Nussbaum, Mutation in the α-synuclein gene identified in families with Parkinson's disease, Science, 276(5321) (1997) 2045–2047.
54 V.N. Uversky, J. Li and A.L. Fink, Evidence for a partially folded intermediate in α-synuclein fibril formation, J. Biol. Chem., 276(14) (2001) 10737–10744.
55 V.N. Uversky, H.J. Lee, J. Li, A.L. Fink and S.J. Lee, Stabilization of partially folded conformation during α-synuclein oligomerization in both purified and cytosolic preparations, J. Biol. Chem., 276(47) (2001) 43495–43498.
56 K.A. Conway, J.D. Harper and P.T. Lansbury, Accelerated in vitro fibril formation by a mutant α-synuclein linked to early-onset Parkinson disease, Nat. Med., 4(11) (1998) 1318–1320.
57 K.A. Conway, S.J. Lee, J.C. Rochet, T.T. Ding, R.E. Williamson and P.T. Lansbury, Acceleration of oligomerization, not fibrillization, is a shared property of both α-synuclein mutations linked to early-onset Parkinson's disease: Implications for pathogenesis and therapy, Proc. Natl. Acad. Sci. USA, 97(2) (2000) 571–576.
58 S.L. Bernstein, D.F. Liu, T. Wyttenbach, M.T. Bowers, J.C. Lee, H.B. Gray and J.R. Winkler, α-Synuclein: Stable compact and extended monomeric structures and pH dependence of dimer formation, J. Am. Soc. Mass Spectrom., 15(10) (2004) 1435–1443.
59 M. Grabenauer, S.L. Bernstein, J.C. Lee, T. Wyttenbach, N.F. Dupuis, H.B. Gray, J.R. Winkler and M.T. Bowers, Spermine binding to Parkinson's protein α-synuclein and its disease-related A30P and A53T mutants, J. Phys. Chem. B, (2008) submitted.
60 S.K. Chowdhury, V. Katta and B.T. Chait, Probing conformational changes in proteins by mass spectrometry, J. Am. Chem. Soc., 112(24) (1990) 9012–9013.
61 J. Li, V.N. Uversky and A.L. Fink, Effect of familial Parkinson's disease point mutations A30P and A53T on the structural properties, aggregation, and fibrillation of human α-synuclein, Biochemistry, 40(38) (2001) 11604–11613.
62 D.F. Liu, A.B. Seuthe, O.T. Ehrler, X.H. Zhang, T. Wyttenbach, J.F. Hsu and M.T. Bowers, Oxytocin-receptor binding: Why divalent metals are essential, J. Am. Chem. Soc., 127(7) (2005) 2024–2025.
63 T. Wyttenbach, D. Liu and M.T. Bowers, Interactions of the hormone oxytocin with divalent metal ions, J. Am. Chem. Soc., 130(18) (2008) 5993–6000.
64 J.M. Daniel, S.D. Friess, S. Rajagopalan, S. Wendt and R. Zenobi, Quantitative determination of noncovalent binding interactions using soft ionization mass spectrometry, Int. J. Mass Spectrom., 216(1) (2002) 1–27.
65 Y.H. Yu, C.E. Kirkup, N. Pi and J.A. Leary, Characterization of noncovalent protein-ligand complexes and associated enzyme intermediates of GlcNAc-6-O-sulfotransferase by electrospray ionization FT-ICR mass spectrometry, J. Am. Soc. Mass Spectrom., 15(10) (2004) 1400–1407.
66 J. Goers, V.N. Uversky and A.L. Fink, Polycation-induced oligomerization and accelerated fibrillation of human α-synuclein in vitro, Protein Sci., 12(4) (2003) 702–707.
67 Y.M. Xie, J. Zhang, S. Yin and J.A. Loo, Top-down ESI-ECD-FT-ICR mass spectrometry localizes noncovalent protein-ligand binding sites, J. Am. Chem. Soc., 128(45) (2006) 14432–14433.
68 T. Antony, W. Hoyer, D. Cherny, G. Heim, T.M. Jovin and V. Subramaniam, Cellular polyamines promote the aggregation of α-synuclein, J. Biol. Chem., 278(5) (2003) 3235–3240.
69 L.D. Morrison, X.C. Cao and S.J. Kish, Ornithine decarboxylase in human brain: Influence of aging, regional distribution, and Alzheimer's disease, J. Neurochem., 71(1) (1998) 288–294.
70 C.W. Bertoncini, Y.S. Jung, C.O. Fernandez, W. Hoyer, C. Griesinger, T.M. Jovin and M. Zweckstetter, Release of long-range tertiary interactions potentiates aggregation of natively unstructured α-synuclein, Proc. Natl. Acad. Sci. USA, 102(5) (2005) 1430–1435.
71 A.A. Antson, E.J. Dodson, G. Dodson, R.B. Greaves, X.P. Chen and P. Gollnick, Structure of the trp RNA-binding attenuation protein, TRAP, bound to RNA, Nature, 401(6750) (1999) 235–242.
72 C.A. Keetch, E.H.C. Bromley, M.G. McCammon, N. Wang, J. Christodoulou and C.V. Robinson, L55P transthyretin accelerates subunit exchange and leads to rapid formation of hybrid tetramers, J. Biol. Chem., 280(50) (2005) 41667–41674.
73 S.L. McCutchen, W. Colon and J.W. Kelly, Transthyretin mutation Leu-55-Pro significantly alters tetramer stability and increases amyloidogenicity, Biochemistry, 32(45) (1993) 12119–12127.
74 X. Jiang, C.S. Smith, H.M. Petrassi, P. Hammarstrom, J.T. White, J.C. Sacchettini and J.W. Kelly, An engineered transthyretin monomer that is nonamyloidogenic, unless it is partially denatured, Biochemistry, 40(38) (2001) 11442–11452.
CHAPTER 4

Protein Analysis with Hydrogen–Deuterium Exchange Mass Spectrometry

Jennifer L. Mitchell and John R. Engen
Contents

1. Introduction
1.1 Review of protein structure
1.2 Obtaining information about conformation and dynamics
2. Experimental Protocol
2.1 Deuterium introduction
2.2 Global versus local information
2.3 HPLC and MS
2.4 Data interpretation
3. Illustrative Examples
3.1 Protein conformation and the effects of mutation
3.2 Binding interactions
3.3 Investigating proteins lacking structural data
4. Conclusions
Acknowledgements
References
1. INTRODUCTION

There are some properties of proteins that remain hidden to a mass spectrometer during simple molecular weight analyses. These hidden properties include protein conformation, protein dynamics and protein interactions. How can these properties be revealed when mass spectrometers measure molecular weight, not protein conformation? One way to uncover these properties with a mass spectrometer is to use a labeling method that "captures" the structural information before mass analysis occurs. This chapter describes one of these labeling methods: hydrogen–deuterium exchange.

Comprehensive Analytical Chemistry, Volume 52 ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00204-3
© 2009 Elsevier B.V. All rights reserved.
1.1 Review of protein structure

Before the details of hydrogen–deuterium exchange (HX) and mass spectrometry (MS) are described, let's review proteins and protein structure (see also Refs [1–3] for more background). Recall that proteins are composed of a linear chain of amino acids (the primary structure). For simple mass analyses, the sequence is all the information that is required. By adding together the molecular weights of the amino acids in the chain, the expected molecular weight of the whole protein is obtained. In most cases, protein molecular weight is a unique number. Therefore, when the mass spectrum of a protein is obtained and the molecular weight is found to be the same as was predicted from the sequence, the identity of the protein is confirmed. Any deviations from the expected molecular weight provide information. Post-translational modifications or mutations from the expected amino acid sequence, for example, produce obvious mass deviations.

In addition to the primary structure, there are several more levels of structural information. These higher levels are hidden from simple mass analysis. The primary structure may fold three-dimensionally to produce the secondary structure of the protein. The main types of secondary structure are the α-helix and the β-sheet (composed of β-strands). Those parts of the primary structure that do not adopt a helix or sheet conformation are often designated as loops or unstructured regions. Two pieces of sequence that have very different primary structures (and are therefore distinct in simple mass analyses) may have identical secondary structure, or vice versa. Secondary structural elements organize into tertiary structure, in which helices and sheets pack in some organized fashion to produce the final three-dimensional structure of the protein (Figure 1A). Again, almost no information about the tertiary structure is provided by simple mass analyses.
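The mass calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for established tools: it uses standard average residue masses, adds one water for the chain termini, and ignores disulfide bonds, modifications, and charge. The example sequences are hypothetical.

```python
# Average masses (Da) of amino acid residues, i.e., the amino acid minus
# the water lost on forming a peptide bond.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # Da, accounts for the free N- and C-termini


def average_mass(sequence):
    """Expected average mass of an unmodified, neutral polypeptide."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unknown residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


# Hypothetical peptides differing by a single Glu -> Val substitution.
wild_type = "VHLTPEEK"
mutant = "VHLTPVEK"
shift = average_mass(mutant) - average_mass(wild_type)
print(f"wild type: {average_mass(wild_type):.2f} Da")
print(f"mutant:    {average_mass(mutant):.2f} Da")
print(f"mass shift from E->V: {shift:.2f} Da")
```

A single substitution changes the expected mass by the difference between the two residue masses, which is why such mutations show up as clear deviations in a simple mass analysis.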
A fourth level of structure is the quaternary structure in which folded tertiary units may assemble into dimers, trimers, etc. Understanding the connections between structure and function is of great interest. The creation of protein structure begins at the earliest stages of polypeptide construction as the developing chain is assembled by the ribosome. The final structure, after synthesis is complete, dictates the eventual function of the protein. The relationship between higher-order structure (tertiary and quaternary) and function can be different from protein to protein. Modifications as simple as a single change from one amino acid to another can alter the folding of the protein and the eventual function of the molecule. The classic example of this is hemoglobin in which a single mutation from glutamic acid to valine results in abnormal hemoglobin packing/interactions which lead to sickle-cell anemia [4]. In other cases, multiple changes to the primary structure do not cause major variation in the tertiary structure [5]. In addition to the four levels of protein structure, another property of proteins that is sometimes overlooked is protein dynamics. In solution, proteins are moving continuously. Motions range from bond vibrations and ring flips to whole subunit translocations and changes in tertiary structure. As a result of protein dynamics, proteins exist in populations, with the majority of the population usually found in one or more preferred, or lowest energy folded forms (for many
Figure 1 Protein structure and hydrogen exchange. (A) Ribbon diagram of a protein illustrating the various levels of structural complexity. (B) Details of primary structure. The backbone amide hydrogens are indicated with black circles. All amino acids except proline have a backbone amide hydrogen. (C) Hydrogen bonding associated with β-sheets. The β-sheet shown here is parallel, but anti-parallel β-sheets also form similar hydrogen bonds. The amide nitrogens/hydrogens are boxed. (D) Hydrogen bonding in an α-helix. This α-helix is the traditional type with 3.6 residues per turn. Other helical structures (e.g., 3₁₀ helices) contain related hydrogen bonds. The one visible amide nitrogen/hydrogen is boxed.
proteins, >99% of the molecules may exist in nearly the same conformation). The folded state, therefore, is an ensemble of structures often referred to as the native state. The native state may fluctuate, sending part of the population into other related conformations for periods of time. Some proteins have significant portions of their populations in different conformations simultaneously; because these alternate conformations comprise a significant portion of the overall population of protein molecules (e.g., Refs [6,7]), they can be detected. Other proteins prefer to exist in a denatured state and fold to the functional conformation upon interaction with partner proteins [8,9]. It is safe to say that, in general, proteins cannot be fully understood without studying their dynamics and conformational changes.
The degree to which proteins change conformation in solution depends on the thermodynamic properties of the protein and the protein's environment. Protein conformation can also be greatly influenced by binding and interaction(s) with other proteins, cellular machinery and small molecules. The conformational variation during normal function, as well as abnormal conformational changes associated with various disease states, is of great interest.
1.2 Obtaining information about conformation and dynamics

Methods such as X-ray crystallography, NMR spectroscopy and electron microscopy generally describe only one conformation of a protein (perhaps the most stable form). For mainly technical reasons, such methods are also generally unable to characterize transitions of the population from one form to the other. While NMR methods can reveal protein dynamics, there are limitations in terms of the size of the proteins that can be investigated. MS can be used to investigate protein conformation and movements. As described in the introduction, MS is not often associated with analysis of protein conformation and dynamics but rather is regarded as a method to measure molecular weight and quantify molecules. However, with labeling methods, many details about protein conformation, dynamics and interactions can be revealed by MS.

One major labeling method that can be used to investigate protein conformation and dynamics with MS is hydrogen exchange. Hydrogen exchange itself is a phenomenon in which labile hydrogens in proteins exchange positions with hydrogens in the surrounding solvent [10–12]. This phenomenon has been known for nearly 100 years and occurs all the time for all proteins. The exchange can be observed when an isotope of hydrogen other than protium (H) is used. Tritium oxide (T2O) was employed first, and now deuterium oxide (D2O) is mostly used for labeling. Because tritium and deuterium have different spectroscopic properties than protium, the exchange can be detected. For example, deuterium is NMR silent whereas protium is not. In an NMR experiment where HX is occurring, amide hydrogen peaks will slowly disappear as deuterium replaces hydrogen. Because deuterium has a mass of about 2 Da whereas protium has a mass of about 1 Da, the mass of a protein increases slightly as it becomes deuterated. Measuring the mass of the deuterated protein with a mass spectrometer indicates how much deuterium was incorporated (reviewed recently in Refs [13,14]). Determining where the deuterium gets incorporated then provides local information about the conformation and conformational changes in the protein during the labeling process.

Some of the most interesting and useful hydrogens that exchange are the backbone amide hydrogens (Figure 1B). While labile hydrogens in side chains can also exchange, they are typically not analyzed with HX MS (see below). Hydrogens bonded to carbon essentially do not exchange. Every amino acid except proline has a backbone amide hydrogen, meaning that each amino acid can be used as an exchange sensor. Because secondary structural elements are held together by hydrogen bonds that involve these backbone amide hydrogens (Figure 1C and D), information about protein conformation can be obtained.
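As a rough illustration of the bookkeeping described above, the following Python sketch counts the backbone amide hydrogens available in a sequence and converts a measured mass increase into a number of incorporated deuterons. Several things here are assumptions rather than statements from the text: the N-terminal residue is excluded from the count because it carries a free amino group rather than an amide, the calculation ignores deuterium loss during analysis and incomplete D2O content, and the sequence and masses are hypothetical.

```python
# Mass difference between deuterium and protium (Da), from the
# monoisotopic atomic masses 2.014102 and 1.007825.
MASS_SHIFT_PER_DEUTERIUM = 2.014102 - 1.007825


def backbone_amide_count(sequence):
    """Number of backbone amide hydrogens in a polypeptide.

    Proline has no backbone amide hydrogen. The first residue is skipped
    because its nitrogen is a free N-terminal amino group (assumption
    made explicit here; it is not discussed in the text).
    """
    seq = sequence.upper()
    return sum(1 for aa in seq[1:] if aa != "P")


def deuterium_uptake(mass_labeled, mass_unlabeled):
    """Apparent number of deuterons incorporated, from the mass increase."""
    return (mass_labeled - mass_unlabeled) / MASS_SHIFT_PER_DEUTERIUM


# Hypothetical peptide and masses, for illustration only.
peptide = "GAPLKVPE"
n_sites = backbone_amide_count(peptide)
uptake = deuterium_uptake(mass_labeled=1003.02, mass_unlabeled=1000.00)
print(f"{n_sites} exchangeable backbone amides")
print(f"apparent uptake: {uptake:.2f} D ({100 * uptake / n_sites:.0f}% of sites)")
```

Expressing uptake relative to the number of available amide sites makes results from peptides of different length easier to compare.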
The exchange rates of backbone amide hydrogens are significantly slower in structured regions of the protein than they would be if no structure were present. It is therefore possible to derive some information about protein conformation if the amide hydrogen exchange rates are measured. Proteins that are completely denatured or devoid of any structure will exchange fast and gain mass quickly relative to proteins where there is structure. Structured portions will exchange and increase in mass much more slowly than parts of the protein with no structure. An additional factor that affects the exchange rate is solvent accessibility. Those amide hydrogens that have easy access to the solvent, typically those found on the surface of the protein, can exchange quickly compared to those that are buried in the hydrophobic core of the protein and have less access to solvent. The primary mechanism whereby exchange occurs in structured regions involves local unfolding reactions (Figure 2) (reviewed in Ref. [10]). A protein may unfold partially or globally to expose amide hydrogens to solvent. The unfolding may involve breaking single hydrogen bonds and backbone displacements of a few tenths of an angstrom, or it may involve breaking multiple hydrogen bonds and movements of multiple angstroms. If the amide hydrogen does not participate in hydrogen bonding, solvent accessibility becomes the dominant factor determining the exchange rate. The rate of the partial/local/global unfolding, k1, is distinct from the rate of the exchange reaction, k2, which
Figure 2 Local unfolding model of hydrogen exchange [10]. Unfolding events (rate constants k1 and k−1) expose amide hydrogens (H) to deuterium (D) in the solution (usually present in great excess). Base-catalyzed exchange (k2, with D2O and OD⁻) is shown. The rate of exchange, k2, can be calculated for unstructured sequences [15].
can be predicted for unstructured polypeptides [15]. The actual exchange reaction is both acid and base catalyzed, with the minimum exchange rate for backbone amide hydrogens occurring at about pH 2.5–2.6 [12,15,16]. At physiological pH, where most labeling experiments are done, the base-catalyzed exchange mechanism is dominant. If the exposed amide hydrogen finds itself in the presence of OD⁻ catalyst and D2O (Figure 2), exchange can occur. Again, the rate of protein unfolding (k1), or dynamics in solution, is a function of the protein itself. Some regions are highly flexible and become deuterated quickly while others are very stable and are deuterated very slowly. Any alteration in the conformation of the protein, such as a change from inactive to active, unbound to bound, or wild type to mutant, can alter the location and rate of deuterium labeling. The mass spectrometer is used to measure the amount and location of the deuteration.
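The two-step scheme above (unfolding with k1 and k−1, followed by chemical exchange with k2) is commonly treated with a steady-state expression for the observed exchange rate, often attributed to Linderstrøm-Lang. The text does not state this expression, so the Python sketch below is an assumption-laden illustration: it presumes the folded form is strongly favored (k1 much smaller than k−1), and the rate constants used are hypothetical. The names `k_open`, `k_close`, and `k_chem` stand for k1, k−1, and k2.

```python
def observed_exchange_rate(k_open, k_close, k_chem):
    """Steady-state observed rate for: closed <-> open -> exchanged.

    k_open  : local unfolding rate (k1 in Figure 2)
    k_close : refolding rate (k-1 in Figure 2)
    k_chem  : chemical exchange rate of the exposed amide (k2 in Figure 2)
    """
    return k_open * k_chem / (k_close + k_chem)


def kinetic_regime(k_close, k_chem, factor=10.0):
    """Classify the limiting behaviour (the threshold factor is arbitrary).

    EX2: refolding is much faster than exchange, so the observed rate
         approaches (k_open / k_close) * k_chem.
    EX1: exchange is much faster than refolding, so the observed rate
         approaches k_open.
    """
    if k_close >= factor * k_chem:
        return "EX2"
    if k_chem >= factor * k_close:
        return "EX1"
    return "intermediate"


# Hypothetical rate constants in s^-1, for illustration only.
k_obs = observed_exchange_rate(k_open=0.01, k_close=1000.0, k_chem=10.0)
print(f"k_obs = {k_obs:.3e} s^-1, regime: {kinetic_regime(1000.0, 10.0)}")
```

In the EX2 limit the observed rate reports on the unfolding equilibrium, whereas in the EX1 limit it reports directly on the unfolding rate, which is why the two regimes are usually distinguished when interpreting exchange data.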
2. EXPERIMENTAL PROTOCOL
2.1 Deuterium introduction
To begin the labeling reaction, the H2O buffer that proteins are normally in must be replaced with a D2O buffer. The most common way to do this is by dilution with at least a 15–20-fold excess of D2O [17], as shown in Figure 3. The resulting solution is a mixture of H2O and D2O, but it is at least 95% D2O. Because the H2O is overwhelmed in this way, the labeling reaction (k2 in Figure 2) is effectively unidirectional. If a 50:50 mixture of H2O/D2O were used, there would also be a contribution from k-2, the reverse D→H exchange.
In the most common type of experiment, a continuous labeling experiment, the labeling reaction is allowed to proceed for various amounts of time: 10 s, 30 s, 1 min, etc. A typical continuous labeling experiment may have 12–15 time points ranging from 3 s to 24 h of labeling. Another variety of labeling is pulsed labeling [18]. In pulsed labeling, protein conformation is perturbed for various amounts of time with one or more parameters such as denaturant, heat, pH, or binding. A quick, fixed-length D2O exposure (the pulse), usually lasting only 10 s or less, labels only those regions that have little or no structure. With pulsed labeling, it is possible to monitor the instantaneous population of folded versus unfolded molecules, while with continuous labeling the transition between the states is observed.
Protein labeling normally occurs at physiological pH (pH 7.0–8.0), where the majority of proteins are in their most biologically relevant conformation. To stop or quench the exchange so that mass analysis can occur, the pH of the solution is reduced to the minimum value for exchange, pH 2.5 (Figure 3). If it were not for the minimum exchange rate occurring at pH 2.5, hydrogen exchange MS would not be possible. By reducing the pH from 7.0 to 2.5, the exchange rate for the average backbone amide hydrogen is reduced about 10,000-fold (although this is not true for all positions, see Refs [19,20]).
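The dilution step at the start of this section can be checked with simple arithmetic. The sketch below assumes that the protein sample is pure H2O, that the labeling buffer is pure D2O, that volumes are additive, and that an "n-fold excess" means n volumes of D2O added to one volume of sample; the function names are hypothetical. Under this reading, a 15-fold excess gives just under 94% D2O, and roughly a 19-fold excess is needed to reach 95%.

```python
def d2o_fraction(fold_excess):
    """Volume fraction of D2O after adding `fold_excess` volumes of pure D2O
    to one volume of pure-H2O sample (additive volumes assumed)."""
    return fold_excess / (1.0 + fold_excess)


def fold_excess_needed(target_fraction):
    """Volumes of D2O per volume of sample needed to reach `target_fraction`."""
    return target_fraction / (1.0 - target_fraction)


for fold in (15, 20):
    print(f"{fold}-fold excess: {100 * d2o_fraction(fold):.1f}% D2O")
print(f"95% D2O requires about a {fold_excess_needed(0.95):.0f}-fold excess")
```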
To make sure the analysis is reproducible, it is critical to always label and quench with the same pH values. It is also important to know these values as accurately as possible. The optimal
Protein Analysis with Hydrogen–Deuterium Exchange MS
[Figure 3 flowchart: protein in H2O, pH 7.0, 20°C → add 20-fold excess D2O → protein labeling → reduce pH to 2.5, temperature to 0°C → labeling quenched → local analysis? If yes: digest with pepsin (pH 2.5, 0°C), then HPLC (pH 2.5, 0°C). If no: HPLC (pH 2.5, 0°C) directly. Both paths → mass analysis → data interpretation]
Figure 3 General experimental scheme for hydrogen exchange mass spectrometry [14,19,20]. A continuous labeling experiment using electrospray ionization mass spectrometry (ESIMS) [18] is shown. After labeling is quenched, one decides whether to continue with local analysis or global analysis.
way to ensure reproducibility and reliable quenching is to use a buffer with a pKa near pH 2.5. Phosphate buffers are particularly convenient in this regard because there is a pK1 at 2.15 for the quench and a pK2 at 6.82 for the label (see Ref. [17]). At pH 2.5, if a deuterated protein were suddenly placed in a 100% H2O environment, the deuterium label would revert to hydrogen with a half-life of about 10–15 min. Owing to the constraints of the mass analysis (see below), this is not enough time to make the mass measurement without substantial loss of deuterium. To reduce the rate of reversion (known as back-exchange) by another order of magnitude, the temperature of the quenched sample is lowered to 0°C. To minimize back-exchange, the quench conditions of pH 2.5 and 0°C must be maintained at all times once the quench is initiated.
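To see why the extra order of magnitude matters, the loss of label can be sketched as a single first-order decay. This is a simplification: real proteins contain amides with a wide spread of rates, and the half-lives below (12.5 min at pH 2.5 without cooling, ten times longer at 0°C) are illustrative values chosen from the range quoted above, not measurements.

```python
def fraction_label_remaining(minutes, half_life_min):
    """Fraction of deuterium label left after `minutes`, assuming
    single-exponential (first-order) back-exchange."""
    return 0.5 ** (minutes / half_life_min)


ANALYSIS_TIME_MIN = 10.0  # hypothetical time spent in H2O before detection

# Illustrative half-lives (assumptions): quench pH only, and quench pH at 0 °C.
for label, half_life in (("pH 2.5 only", 12.5), ("pH 2.5 and 0 °C", 125.0)):
    remaining = fraction_label_remaining(ANALYSIS_TIME_MIN, half_life)
    print(f"{label}: {100 * remaining:.0f}% of label remains")
```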
2.2 Global versus local information
Once a protein has been labeled and the labeling reaction quenched, the analysis of the deuterium incorporation can occur. The analysis may take one of two forms: analysis of the intact protein or analysis of peptides produced in a digest of the labeled protein (Figure 3). One or both of these may be performed in a typical experiment.
Figure 4 Example of data from a global hydrogen exchange mass spectrometry experiment. (A) Transformed spectra of a 47 kDa protein that was incubated in D2O for the amounts of time shown on the left. Dotted lines help show the increase in mass as deuteration proceeds. The centroid mass of each peak is shown on the right-hand side. (B) Conversion of the mass increase information in the spectra in panel A into a deuterium uptake curve. The mass of unlabeled protein is subtracted from the mass at each labeling timepoint (relative deuterium level) and plotted versus labeling time. The data have not been adjusted for deuterium back-exchange (see text).
To determine the global incorporation of deuterium, the intact, labeled protein is analyzed. The analysis method is described in Section 2.3. Example data from a global analysis are shown in Figure 4. The mass of the undeuterated protein was obtained. The increase in mass as a result of deuterium labeling for various amounts of time (in this case 10 s, 1 h and 8 h) was determined by subtracting the mass of the undeuterated protein from the mass measured at each time point. The resulting data (Figure 4B) were plotted as the increase in mass versus the time in deuterium (on a log scale, x-axis); the meaning of "Relative Deuterium Level" in Figure 4B will be discussed below. Several valuable pieces of information are obtained during a global analysis. First, the number of rapidly exchanging amide hydrogens in the protein can be estimated if the first labeling time point is sufficiently short (such as 10 s or less). The amide hydrogens that exchange during such a short labeling time must be highly exposed to solvent and are probably not involved in hydrogen bonds within secondary structural elements [21]. Second, the number of amide hydrogens that do not exchange at all during the course of the experiment can be determined. In the example in Figure 4B, there are 387 amide hydrogens in this protein but only about 215 have exchanged within 8 h. The remaining 172 amide hydrogens are most likely found in highly stable regions of the protein and are resistant to exchange during the 8 h time course. A third piece of information that can be obtained from global analysis is the number of amide hydrogens that
[Figure 5 plot: Relative Deuterium Level (Da), axis range 140–260, versus Time in Deuterium (min), log scale from 1 to 1000]
Figure 5 Example of data from mutants of the protein in Figure 4 [24]: (●) wild-type protein; (~) mutant 1, a single R169E mutation; (&) mutant 2, a triple mutation of I385A, V386A, F387A. While the sites of mutation are far apart in this protein, they have the same overall effect on the global deuterium exchange.
undergo some kind of unfolding or fluctuations during the labeling time course (10 s to 8 h in Figure 4B). Although not discussed in this chapter, additional information about the protein dynamics and kinetic mechanisms can be derived from the shape or isotopic distribution profile of the mass spectrum (Figure 4A). For additional information about these topics, see Refs [7,13,14,20,22,23]. Figure 5 illustrates data for the same protein as shown in Figure 4, but with more data points, more replicate analyses and additional versions of the protein [24]. The wild-type protein illustrated in Figure 4 is shown as the solid circle in Figure 5. Two mutant forms of this protein were analyzed at the same times as the wild-type protein. Both mutations cause protein activation although they are located far from each other (see Section 3.1 for more details). The deuterium incorporation of the mutants is greater than that of the wild-type protein. Because the difference in deuterium level is apparent from the shortest labeling times (where the rapidly exchanging amide hydrogens are detected), it can be concluded that both of these mutations cause some change to the overall conformation of this protein that converts some residues into rapidly exchanging amides (exposed to solvent and not hydrogen bonded). The difference in mass between the curve for the wild-type protein and the curves for the mutants indicates how many residues were affected. The location of these residues can be determined by digestion experiments. The information obtained from global analysis may be useful in deciding whether to pursue digestion experiments: if an effect is seen at the global level, then effects should also be seen at the peptide level after digestion (the opposite
is not necessarily true). To obtain local information rather than global information (Figure 3), the labeled and quenched protein can be digested into pieces and the deuteration of each of the resulting peptides determined at each deuterium labeling time [19]. By doing the peptide analysis, the regions where changes occur can be determined. Such information is perhaps the most valuable result that HX MS can offer. To perform the digestion, several factors must be considered. The quench conditions of pH 2.5 and 0°C must be maintained. Under these conditions, an acid protease must be used. Pepsin is the preferred enzyme for digestion of HX samples, although other enzymes have been utilized [25]. Pepsin digestion is not predictable from protein sequence. Unlike a more specific enzyme such as trypsin, which is known to cut after arginine and lysine residues, pepsin does not have such a defined specificity. It is, however, a reproducible enzyme and will cleave proteins in the same places given the same conditions. One drawback to pepsin digestion, therefore, is that the identity of the peptides that are produced (i.e., the amino acid sequence and hence the location in the protein) cannot be predicted. Each peptic peptide must be identified, usually with MS/MS and exact mass analyses. Another factor to consider during digestion is the speed of digestion. Because the digestion must take place in solution, there may be some artifactual exchange during the digestion step. If digestion is performed in a predominantly D2O buffer, there will be artifactual forward exchange, or addition of deuterium at positions where it was not incorporated in the folded, pH 7 conformation used for labeling. If the digestion is performed in a predominantly H2O solution, there will be artifactual back-exchange, or loss of deuterium label to hydrogen in the solution. To minimize these effects, quench conditions must be maintained and the digestion must proceed as rapidly as possible.
In-solution digestion of 5 min or less in a predominantly H2O solution results in deuterium losses, or back-exchange, of about 7–10% [17,20]. To reduce these losses and to produce a more efficient digestion, pepsin can be immobilized on agarose beads. Not only does digestion with immobilized enzyme produce the same peptides much more efficiently and in less time, but the beads can also be packed into a column for online digestion [26] and eventual automation [27].
2.3 HPLC and MS
As shown in Figure 3, once digestion is complete for local analysis (or if global analysis is chosen instead), the actual mass determination is undertaken. Although MALDI MS has been used to measure deuterium incorporation [28–30], electrospray ionization (ESI) is the far more common method and will be described here. Prior to ESI analysis, an HPLC step is inserted. In addition to providing desalting prior to sample introduction into the mass spectrometer, the HPLC step serves another important purpose. Because the HPLC is performed with protiated solvent (H2O), back-exchange occurs. Any deuterium that had exchanged into side-chain positions, which have very fast exchange rates even
under quench conditions, will revert back to hydrogen during the HPLC step. Therefore, any deuterium incorporation that is measured is a result of deuteration at backbone amide positions only; the side-chain deuteration is washed away. This substantially simplifies the data interpretation step because the maximum number of deuteriums that could be found in each amino acid that has a backbone amide hydrogen will be one. The HPLC step must, however, maintain quench conditions or all of the deuterium, including that at backbone amide positions, will be washed away. The quench pH of 2.5 can be maintained with 0.05% TFA in the mobile phases [19], an ideal scenario for separation of peptides and proteins. The columns, injectors, and all associated tubing must be maintained at 0°C, usually with an ice bath. There is, however, deuterium back-exchange of backbone amide hydrogens during the HPLC step that cannot be avoided even in the best of circumstances. Just as short digestion minimizes back-exchange during digestion, keeping the HPLC step as short as possible helps minimize the reversion of deuterium to protium. A typical HPLC run is completed in less than 10 min and will lead to around 10% deuterium loss in a well-optimized system. A problem is that chromatography at 0°C is relatively poor (see Figure 6A for an example). The modest resolution afforded by typical C18 columns usually does not become problematic unless the size of the protein to be digested exceeds about 40–50 kDa. In Figure 6, the HPLC of a 47 kDa protein digestion does not provide well-resolved chromatography (Figure 6A), but each peptide is fairly well isolated and resolved from other peptides on the m/z scale (Figure 6C and D). Attempts to improve the chromatography at 0°C by using smaller particle packing materials at higher pressure have been successful [31] but are not yet commonplace.
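Because only backbone amide deuterium survives the HPLC step, the maximum uptake of a peptide can be estimated by counting its backbone amide hydrogens. The sketch below is a simple illustration: it assumes one-letter amino acid codes, counts one position per residue, and excludes the N-terminal residue (which has a free amino group rather than an amide) and prolines (which have no amide hydrogen). The function name and example sequences are hypothetical.

```python
def max_backbone_deuteriums(peptide):
    """Count backbone amide hydrogens in a peptide given in one-letter codes.

    The first residue is skipped because its nitrogen is a free amino group,
    and proline (P) is skipped because it has no amide hydrogen.
    """
    sequence = peptide.upper()
    return sum(1 for residue in sequence[1:] if residue != "P")


# Hypothetical peptides, for illustration only.
for peptide in ("ACDEFGHIK", "APGPK"):
    print(peptide, max_backbone_deuteriums(peptide))
```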
Once the HPLC step is complete, the eluent is directed into a mass spectrometer for mass analysis and data processing. To process data for intact proteins, the methods described above and in Figures 4 and 5 are used. When local information is desired, the mass increase of each peptic peptide must be determined. Even in instruments with modest resolution (~2,000), the isotopic peaks of peptides will be resolved. The average mass of each isotopic envelope is determined for the undeuterated and the deuterated species by finding the centroid of each isotopic envelope (Figure 6D) [17]. These data processing steps are repeated for all the peptic peptides produced during the digestion. The analysis is then repeated for each time point in the time course.
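The centroid calculation described above can be sketched as an intensity-weighted mean of the isotopic peaks. The sketch assumes each envelope is supplied as a list of (m/z, intensity) pairs for a single, known charge state and that the charge carriers are protons; the function names and peak lists are hypothetical, and real software would also handle peak picking and overlapping envelopes.

```python
PROTON_MASS = 1.007276  # Da, mass of the charge carrier assumed here


def centroid_mz(peaks):
    """Intensity-weighted mean m/z of an isotopic envelope.

    `peaks` is a list of (mz, intensity) pairs.
    """
    total_intensity = sum(intensity for _, intensity in peaks)
    if total_intensity <= 0:
        raise ValueError("envelope has no signal")
    return sum(mz * intensity for mz, intensity in peaks) / total_intensity


def neutral_mass(mz, charge):
    """Convert an m/z value to neutral mass for a protonated ion."""
    return charge * (mz - PROTON_MASS)


def relative_deuterium_level(labeled_peaks, unlabeled_peaks, charge):
    """Mass increase (Da) of the labeled peptide over the unlabeled peptide."""
    labeled = neutral_mass(centroid_mz(labeled_peaks), charge)
    unlabeled = neutral_mass(centroid_mz(unlabeled_peaks), charge)
    return labeled - unlabeled


# Hypothetical doubly charged peptide before and after labeling.
unlabeled = [(500.0, 1.0), (500.5, 1.0)]
labeled = [(501.0, 1.0), (501.5, 1.0)]
print(relative_deuterium_level(labeled, unlabeled, charge=2))
```

No back-exchange adjustment is applied here, so the result corresponds to a relative deuterium level.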
2.4 Data interpretation
The final step in a hydrogen exchange experiment is to interpret the deuterium incorporation data (Figure 3). To visualize the data, the deuterium uptake is often converted into a graph, or a deuterium uptake curve (as in Figure 4B or Figure 7). If no adjustment is made for the deuterium back-exchange that occurs during analysis, the y-axis can be labeled as relative deuterium level. As described in detail in Ref. [14], plotting data as relative deuterium level has a number of advantages and is most often done when no totally deuterated control sample is available to quantify the amount of back-exchange that occurred during analysis.
Figure 6 Scheme for data processing in local analysis. The HPLC chromatogram (A) illustrates the modest quality of the separation at 0°C and pH 2.5. Separation was performed on a C18 microbore column, 1 × 50 mm, 4 µm particles. A combination of all the scans in the chromatogram (B) demonstrates the complexity of the peptic digestion. However, selection of just a few scans (C) simplifies the spectra. Each of the peaks observed in spectra from selected portions of the chromatogram is magnified (D) for data processing. The same processing steps are repeated for all the time points in the experiment.
Figure 7 Data illustrating the effects of binding on deuterium incorporation at the protein level (main graph) and the peptide level (inset) [37]. Free protein is shown with an open circle (○) and bound protein with a closed circle (●). For this experiment, >85% of the protein was bound to a peptide ligand with a Kd of 10 µM. The peptide shown in the inset corresponds to amino acids 44–61.
If the location of changes in various forms of the protein (e.g., Figures 4 and 7) is desired, all the analyses must occur together under identical conditions so the relative differences have meaning. On the other hand, if the absolute deuterium level is of greater interest, back-exchange adjustment is required and the necessary controls must be analyzed along with the sample set. An adjustment can be made for back-exchange to arrive at an absolute number of deuteriums that exchanged at each time point [19]. Although several variations [32] of the original adjustment formula [19] have been described, the original adjustment method has found the most use in hydrogen exchange experiments. If a crystal structure is available for the protein being investigated, the data interpretation can continue with the known structure as a guide. The deuterium incorporation information can be plotted onto the crystal structure rendering of the protein, considered in light of the hydrogen bonds found in the crystal structure and interpreted along with the structural elements found in each peptic peptide (recall that the deuterium exchange reaction captures the conformational information of the protein prior to digestion). Note that data in HX MS are obtained at the peptic peptide level. Exchange cannot be localized in most cases to individual amide positions, even with MS/MS (reviewed in Ref. [14]). Therefore, the interpretation of the results must consider that, for example, changes involving 2–3 deuteriums in a peptide of 20 amino acids do not occur throughout the entire
peptide but are confined to only a few amino acids within that peptide. The identity of those few residues remains unknown.
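The back-exchange adjustment mentioned above can be illustrated with one commonly used form, in which the observed mass increase is scaled by the increase measured for a fully deuterated control processed under the same conditions. This form is an assumption made for illustration; whether it matches the formula of Ref. [19] term for term should be checked against that reference. The function and parameter names are hypothetical.

```python
def corrected_deuterium_level(m_observed, m_undeuterated, m_fully_deuterated, n_amides):
    """Estimate the absolute number of deuteriums at one time point.

    m_observed          centroid mass of the partially deuterated peptide
    m_undeuterated      centroid mass of the unlabeled (0%) control
    m_fully_deuterated  centroid mass of the fully labeled (100%) control,
                        analyzed with the same digestion and HPLC steps
    n_amides            number of backbone amide hydrogens in the peptide
    """
    span = m_fully_deuterated - m_undeuterated
    if span <= 0:
        raise ValueError("fully deuterated control must be heavier than the unlabeled control")
    return n_amides * (m_observed - m_undeuterated) / span


# Hypothetical peptide with 10 amides whose 100% control recovered only 8 Da.
print(corrected_deuterium_level(1004.0, 1000.0, 1008.0, 10))
```

The design choice is that both controls experience the same losses as the sample, so the ratio cancels the average back-exchange without needing to know it explicitly.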
3. ILLUSTRATIVE EXAMPLES
So what can be done with this hydrogen–deuterium labeling method? What types of information can be obtained and why might HX MS offer advantages over other methods for determining such information? In the following section, three examples of applying HX MS to various protein questions will be given. These examples are meant to illustrate the breadth of the experiments that are possible. Please refer to the recent reviews on HX MS for more examples [13,14] of how one can use HX MS to answer other important questions about proteins.
3.1 Protein conformation and the effects of mutation
Mutations are changes to the amino acid sequence that make the protein different from the normal sequence. What is considered the "normal" sequence? For proteins that have only one isoform, normal is the sequence that is always found. A mutant is an abnormal translation product that might include a few changes to the amino acid sequence. For many proteins, however, defining the "normal" or wild-type sequence is much more difficult. Sequencing results may disagree; there may be multiple isoforms, or the protein may have been isolated from multiple cellular hosts that each contain slightly different sequences. In such cases, the consensus sequence is used to represent what is normal. Mutation may be a natural result of different genes. If, for example, >90% of protein X is found with one sequence and only 10% is found with a modified form, the 10% would be considered the "mutant" form. Mutation may result from alterations in the gene that are propagated from generation to generation, or mutations may be more isolated events that occur only on a limited basis until the DNA is repaired. UV radiation, for example, can cause DNA mutation that is then passed along to cause mutant proteins. Creating mutations by site-directed mutagenesis [33] permits investigation of the effects of mutation on protein conformation. Hydrogen exchange and MS can be used to determine the effects of natural or induced mutation on protein conformation. To do such an experiment, deuterium incorporation is measured for the wild-type or "normal" protein form and compared with deuterium incorporation in the various mutant forms of interest. The utility of these kinds of experiments is illustrated with Figure 5, which has already been partially described. The protein being probed in the Figure 5 example is arrestin2, a signaling molecule that is believed to change its conformation prior to interaction with G-protein-coupled receptors (GPCRs) [24].
The wild-type protein cannot interact with GPCRs. The conformation of the form that can interact with GPCRs is unknown but several mutants have been described that can bind to GPCRs. It is thought that these mutations alter the structure of the wild-type arrestin2 protein in a similar fashion to what happens normally in vivo just before GPCR interaction.
Figure 5 illustrates the hydrogen exchange comparison of the wild-type protein with two of the arrestin2 mutants that have altered conformations permitting GPCR interaction. These data are a good example of analyzing the global, or protein-wide, impact of mutation. Both mutations cause the protein to incorporate about 15–20 more deuterium atoms from the earliest time points to the conclusion of the experiment. Such a dramatic change in the deuterium uptake results from changing only a few amino acids and indicates that the mutations must cause a change in the conformation of arrestin2. To correlate the overall difference in exchange with specific regions of the protein, local analysis (Figure 3) was performed on these three proteins. Reference 20 describes the results and locations of the changes and suggests that mutations as far away as 30 Å can result in conformational modifications similar to those caused by mutation right near the site of conformational change. In other words, the affected regions are common to both mutants even though the mutations are not anywhere near one another. The results further point out that a modest change in primary structure can lead to a major change in tertiary structure. Such hydrogen exchange measurements could be made on any protein. While there are countless examples of such experiments from laboratories around the world, the reader is referred to two other examples from our laboratory in which we used mutation to help describe the interaction surface of two proteins [34] and to help understand the internal binding sites and intramolecular interactions of a tyrosine kinase [35].
3.2 Binding interactions
As many proteins participate in protein complexes that complete complex biological tasks [36], the effects of interaction on protein conformation are of great interest. Any changes to protein conformation that alter hydrogen exchange rates can be investigated with hydrogen exchange and MS. An example of detecting binding-induced changes in protein conformation and dynamics at both the protein and peptide levels is shown in Figure 7. The SH3 domain of the tyrosine kinase Lck interacts with a peptide from the Tip protein with a dissociation constant of about 10 µM [37]. The interaction slows deuterium uptake on a global scale in the Lck SH3 domain (Figure 7). The bound form (solid circle) is eventually deuterated to the same extent as the unbound form (open circle), but the rate of uptake is reduced in the bound form. The number of amide hydrogens whose rates are slowed can be quantified by measuring the change to the deuterium uptake at a given time in deuterium. In Figure 7, for example, about 15 amide hydrogens (not accounting for back-exchange) are affected at the 1 min D2O incubation time point (compare open circles with closed circles). Changes such as those in Figure 7 at the global level are diagnostic for the interaction of the protein with its ligand. By varying the concentration of the ligand in a titration-type experiment, an approximate Kd can be obtained [34]. Such an experiment can be performed for protein–protein interactions, protein–peptide interactions, protein–small molecule interactions and protein–nucleic acid interactions, provided there is a detectable change in the deuterium uptake upon
binding. The tighter the binding, the easier it becomes to detect changes in deuterium exchange [38]. Complexes with dissociation constants as high as 100 µM have been investigated in our laboratory [39]. In addition to obtaining global-level information, the regions that are affected by interactions can be localized with digestion experiments. The inset in Figure 7 shows data for one of the seven peptides obtained from digestion of the Lck SH3 domain [37]. The shapes of the bound and free curves for this peptide are similar to the shapes of the curves for the intact protein. The relative number of backbone amides altered by binding (in this example ~4 at the 1 min labeling time) can be determined from the graph. Similar deuterium uptake graphs for the other peptides provide information about where the changes occur upon interaction. In regions where no changes occur upon binding, the deuterium uptake curves are identical for the bound and free forms of the protein. A few complicating factors exist, however, and must be kept in mind when investigating interactions with hydrogen exchange and MS. First, in some complexes, interactions are driven by the association of side chains, which may not necessarily alter the exchange rates of backbone amide hydrogens, particularly those found in stable secondary structural elements [34]. Second, finding decreases in deuterium exchange in a particular region, especially changes at short labeling times (<30 s), does not necessarily help to locate the binding interface formed in a complex. It is entirely possible that binding may alter the conformation of one of the proteins such that deuterium exchange is altered in regions distant from the binding surface [40], although more sophisticated label/wash experiments [41] can help identify such events. Third, binding interaction experiments are much more technically challenging than analyses of deuterium uptake in single proteins.
In most cases, the analysis of binding interactions via HX MS requires a comparative examination of unbound and bound protein states. Baseline HX MS data is collected for each partner individually and for the mixture of the two partners [42]. If all the experiments are performed under identical conditions, changes in the relative deuterium level in peptides (as in Figure 7) indicate which regions have undergone changes due to interaction. Additional difficulties in interaction experiments include the need for extra HPLC separation, mass spectral signal suppression of one protein by the other, added complexity of analyzing all the peptides from both partners, etc. Almost all of these difficulties can be overcome to produce data about protein interactions that are not only extremely valuable, but may also be quite difficult to obtain with other methods.
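When planning a binding experiment such as the one in Figure 7, it helps to estimate what fraction of the protein is bound under the chosen conditions. The sketch below assumes simple 1:1 binding at equilibrium and solves the resulting quadratic for the complex concentration; all three inputs must be in the same concentration units. The function name and the example concentrations are hypothetical and are not taken from the experiments described here.

```python
import math


def fraction_protein_bound(protein_total, ligand_total, kd):
    """Fraction of protein in complex for 1:1 binding at equilibrium.

    Solves Kd = [P][L]/[PL] with mass balance on protein and ligand.
    All arguments share the same concentration units.
    """
    if protein_total <= 0:
        raise ValueError("protein concentration must be positive")
    total = protein_total + ligand_total + kd
    # Physically meaningful (smaller) root of the quadratic in [PL].
    complex_conc = (total - math.sqrt(total * total - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / protein_total


# Hypothetical conditions: 20 units of protein, 100 units of ligand, Kd of 10 units.
print(f"{100 * fraction_protein_bound(20.0, 100.0, 10.0):.0f}% bound")
```

Repeating the calculation across a range of ligand concentrations gives the expected titration curve against which measured changes in deuterium uptake could be compared.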
3.3 Investigating proteins lacking structural data
Many hydrogen exchange MS experiments are performed on proteins for which there is structural information such as an X-ray crystal structure or an NMR structure. Can HX MS be used as a tool in structural determination? The answer is yes, but to a limited extent. There are also other distinctive aspects of using HX MS to probe the conformation of proteins for which there is limited structural information. One difference is that
HX MS studies (like NMR) take place on the solution conformation, which in a few cases is not the same as that which crystallizes. A typical hydrogen exchange MS experiment to probe unknown conformation may require less than 50 nM of material. Because these requirements are so much less than those for NMR, for example, many proteins that cannot be prepared in high concentrations or amounts are suitable for investigation with hydrogen exchange and MS. As described in Section 1, the main factors that determine hydrogen exchange rates in folded proteins are hydrogen bonding and solvent accessibility. These two factors cannot be separated from one another. For example, a well-structured, hydrogen-bonded α-helix on the surface of a protein may have the same deuterium uptake as an unstructured, non-hydrogen-bonded loop that is well protected from solvent and buried in the center of a protein. The coexistence of these two factors, therefore, prohibits structural classification and secondary structure assignment of proteins with hydrogen exchange methods. What can be done, however, is to distinguish regions that are protected from those that are less protected. An example of applying HX MS to investigate the structure of a protein where there is limited structural information is shown in Figure 8. Only partial structural information exists for the Nef protein from HIV. The core regions (from amino acid 75–207) have been analyzed multiple times with crystallography and NMR.
[Figure 8 graphic: schematic of the 210-aa protein from N- to C-terminus with residues 60, 75, 152, 163, 178 and 207 marked; plot of Percent Deuterated (0–100) versus Residue Number (20–200) after 10 s in D2O]
Figure 8 Deuteration of a protein for which only part of the structure is known [43]. The organization of the protein is shown at the top. Structural information is known by NMR for the parts shaded with black (top part of protein) and by crystallography for the parts shaded with grey (bottom part of protein). Small bars above the protein indicate the known secondary structural elements: white line, PPII helix; black line, α-helix; grey line, β-strand. The percentage deuteration for 12 peptic peptides was determined by measuring the deuterium level after 10 s in D2O. Exchange after other amounts of time was also determined [43].
A long loop between residues 152–178 was removed to facilitate crystallization and the conformation of this loop was not determined. NMR investigations have helped determine the conformation of the N-terminal portion from residues 1–60 and provided more information about the loop from 152 to 178. Hydrogen exchange MS was performed on the HIV Nef protein [43] and a summary of the results is shown in Figure 8. As expected, the regions with known structural elements, such as the first two α-helices, had significant protection from exchange after 10 s in D2O. The N-terminal region, known to be dynamic in solution, was heavily (but not completely) deuterated after 10 s in deuterium, as was the loop between residues 152–178. Some may argue that no new information has been revealed during such an analysis. The location of unstructured regions was known because when they were removed, crystallization occurred. This notion is true for this example because the HX MS was done after the unstructured regions were determined. In the reverse experiment, HX MS can be used before attempting crystallization to assist X-ray crystallographers in identifying unstructured regions of a protein [44]. Once the unstructured regions, often loops, are modified or removed, the structured core of the protein may become amenable to crystallization. Another benefit of determining the hydrogen exchange for proteins of unknown or limited structure is that baseline data are obtained for comparison with the same proteins when they are part of complexes. The growing awareness of intrinsically unfolded proteins that only become structured upon interaction with partners [8] makes HX MS experiments on uncomplexed proteins lacking structural information quite attractive, both for characterizing the uncomplexed structure and for determining what happens to that structure upon binding.
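The percentage deuteration plotted per peptide in Figure 8 can be sketched as the measured deuterium level divided by the number of exchangeable backbone amides in that peptide. The sketch below assumes both values are already known for each peptide; the function name, peptide labels and numbers are hypothetical and are not taken from the Nef data.

```python
def percent_deuterated(deuterium_level, max_exchangeable):
    """Deuterium level of a peptide as a percentage of its exchangeable
    backbone amide hydrogens."""
    if max_exchangeable <= 0:
        raise ValueError("peptide must have at least one exchangeable amide")
    return 100.0 * deuterium_level / max_exchangeable


# Hypothetical peptides: (label, deuterium level after 10 s in Da, exchangeable amides).
peptides = [("peptide A", 9.0, 10), ("peptide B", 2.0, 16)]
for label, level, n_max in peptides:
    print(f"{label}: {percent_deuterated(level, n_max):.0f}% deuterated")
```

A peptide near 100% after a short labeling time would be consistent with little protection, while a low percentage would indicate a protected region, although the value alone cannot say whether hydrogen bonding or burial is responsible.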
In addition to analysis of the Nef protein from HIV, the Nef protein from SIV (simian immunodeficiency virus) was also investigated with HX MS [43]. Prior to the HX MS analyses, no structural information existed for the SIV Nef protein. HX MS results showed that there were some similar unstructured regions, but that there were also some regions that were protected from exchange in the parts of SIV Nef that are different from HIV Nef. A global hydrogen exchange analysis of SIV versus HIV Nef also indicated that the SIV protein was more dynamic in solution than its HIV counterpart. In all, the hydrogen exchange data for the HIV and SIV Nef proteins provided new and significant conformational information about these proteins. Such methods can be applied to other proteins as well.
4. CONCLUSIONS
As illustrated in this chapter, MS can be used for much more than just measuring the molecular weight of proteins. With the labeling method of HX, protein conformations and changes in conformation can be investigated with MS. Because so many proteins are uncharacterized and are not amenable to other types of structural analyses, HX MS is expected to make major contributions towards understanding proteins and their functions.
ACKNOWLEDGEMENTS
We thank D. D. Weis, J. M. Hochrein and T. E. Wales for obtaining the data shown in Figures 7 and 8. We gratefully acknowledge funding from the NIH (R01-070590 and R01-068901).
REFERENCES
1 T.E. Creighton, Protein Folding, 2nd ed., W. H. Freeman and Co, New York, NY, 1992.
2 D. Voet and J.G. Voet, Biochemistry, 3rd ed., Wiley, New York, 2005.
3 C.-I. Brändén and J. Tooze, Introduction to Protein Structure, 2nd ed., Garland Publishing, New York, 1999.
4 W.A. Eaton and J. Hofrichter, Sickle cell hemoglobin polymerization, Adv. Protein Chem., 40 (1990) 63–279.
5 Q. Yi, P. Rajagopal, R.E. Klevit and D. Baker, Structural and kinetic characterization of the simplified SH3 domain FP1, Protein Sci., 12 (2003) 776–783.
6 M. Tollinger, N.R. Skrynnikov, F.A. Mulder, J.D. Forman-Kay and L.E. Kay, Slow dynamics in folded and unfolded states of an SH3 domain, J. Am. Chem. Soc., 123 (2001) 11341–11352.
7 T.E. Wales and J.R. Engen, Partial unfolding of diverse SH3 domains on a wide timescale, J. Mol. Biol., 357 (2006) 1592–1604.
8 H.J. Dyson and P.E. Wright, Unfolded proteins and protein folding studied by NMR, Chem. Rev., 104 (2004) 3607–3622.
9 H.J. Dyson and P.E. Wright, Insights into the structure and dynamics of unfolded proteins from nuclear magnetic resonance, Adv. Protein Chem., 62 (2002) 311–340.
10 S.W. Englander and N.R. Kallenbach, Hydrogen exchange and structural dynamics of proteins and nucleic acids, Q. Rev. Biophys., 16 (1984) 521–655.
11 S.W. Englander, L. Mayne, Y. Bai and T.R. Sosnick, Hydrogen exchange: The modern legacy of Linderstrom-Lang, Protein Sci., 6 (1997) 1101–1109.
12 S.W. Englander and A. Poulsen, Hydrogen-tritium exchange of the random chain polypeptide, Biopolymers, 7 (1969) 379–393.
13 A.N. Hoofnagle, K.A. Resing and N.G. Ahn, Protein analysis by hydrogen exchange mass spectrometry, Annu. Rev. Biophys. Biomol. Struct., 32 (2003) 1–25.
14 T.E. Wales and J.R. Engen, Hydrogen exchange mass spectrometry for the analysis of protein dynamics, Mass Spectrom. Rev., 25 (2006) 158–170.
15 Y. Bai, J.S. Milne, L. Mayne and S.W. Englander, Primary structure effects on peptide group hydrogen exchange, Proteins: Struct. Funct. Genet., 17 (1993) 75–86.
16 R.S. Molday, S.W. Englander and R.G. Kallen, Primary structure effects on peptide group hydrogen exchange, Biochemistry, 11 (1972) 150–158.
17 J.R. Engen and D.L. Smith, Investigating the higher order structure of proteins: Hydrogen exchange, proteolytic fragmentation and mass spectrometry. In: J.R. Chapman (Ed.), Mass Spectrometry of Proteins and Peptides, Humana Press, Totowa, NJ, 2000, Vol. 146 of Methods of Molecular Biology, pp. 95–112.
18 Y. Deng, H. Pan and D.L. Smith, Comparison of continuous and pulsed labeling amide hydrogen exchange/mass spectrometry for studies of protein dynamics, J. Am. Soc. Mass Spectrom., 10 (1999) 675–684.
19 Z. Zhang and D.L. Smith, Determination of amide hydrogen exchange by mass spectrometry: A new tool for protein structure elucidation, Protein Sci., 2 (1993) 522–531.
20 D.L. Smith, Y. Deng and Z. Zhang, Probing the non-covalent structure of proteins by amide hydrogen exchange and mass spectrometry, J. Mass Spectrom., 32 (1997) 135–146.
21 K. Dharmasiri and D.L. Smith, Mass spectrometric determination of isotopic exchange rates of amide hydrogens located on the surfaces of proteins, Anal. Chem., 68 (1996) 2340–2344.
22 I.A. Kaltashov and S.J. Eyles, Mass Spectrometry in Biophysics: Conformation and Dynamics of Biomolecules, Wiley-Interscience, New York, 2005.
23 D.D. Weis, M. Hotchko, T.E. Wales, L.F. Ten Eyck and J.R. Engen, Identification and characterization of EX1 kinetics in H/D exchange mass spectrometry by peak width analysis, J. Am. Soc. Mass Spectrom., 17 (2006) 1498–1509.
24 J.M. Carter, V.V. Gurevich, E.R. Prossnitz and J.R. Engen, Conformational differences between arrestin2 and pre-activated mutants as revealed by hydrogen exchange mass spectrometry, J. Mol. Biol., 351 (2005) 865–878.
25 L. Cravello, D. Lascoux and E. Forest, Use of different proteases working in acidic conditions to improve sequence coverage and resolution in hydrogen/deuterium exchange of large proteins, Rapid Commun. Mass Spectrom., 17 (2003) 2387–2393.
26 L. Wang, H. Pan and D.L. Smith, Hydrogen exchange-mass spectrometry: Optimization of digestion conditions, Mol. Cell Proteomics, 1 (2002) 132–138.
27 M.J. Chalmers, S.A. Busby, B.D. Pascal, Y. He, C.L. Hendrickson, A.G. Marshall and P.R. Griffin, Probing protein ligand interactions by automated hydrogen/deuterium exchange mass spectrometry, Anal. Chem., 78 (2006) 1005–1014.
28 J.G. Mandell, A.M. Falick and E.A. Komives, Measurement of amide hydrogen exchange by MALDI-TOF mass spectrometry, Anal. Chem., 70 (1998) 3987–3995.
29 A. Baerga-Ortiz, C.A. Hughes, J.G. Mandell and E.A. Komives, Epitope mapping of a monoclonal antibody against human thrombin by H/D-exchange mass spectrometry reveals selection of a diverse sequence in a highly conserved protein, Protein Sci., 11 (2002) 1300–1308.
30 A. Nazabal, M. Laguerre, J.M. Schmitter, J. Vaillier, S. Chaignepain and J. Velours, Hydrogen/deuterium exchange on yeast ATPase supramolecular protein complex analyzed at high sensitivity by MALDI mass spectrometry, J. Am. Soc. Mass Spectrom., 14 (2003) 471–481.
31 Y. Wu, J.R. Engen and W.B. Hobbins, Ultra performance liquid chromatography (UPLC) further improves hydrogen/deuterium exchange mass spectrometry, J. Am. Soc. Mass Spectrom., 17 (2006) 163–167.
32 A.N. Hoofnagle, K.A. Resing and N.G. Ahn, Practical methods for deuterium exchange/mass spectrometry, Methods Mol. Biol., 250 (2004) 283–298.
33 D. Botstein and D. Shortle, Strategies and applications of in vitro mutagenesis, Science, 229 (1985) 1193–1201.
34 J.R. Engen, Analysis of protein complexes with hydrogen exchange and mass spectrometry, Analyst (London), 128 (2003) 623–628.
35 J.M. Hochrein, E.C. Lerner, A.P. Schiavone, T.E. Smithgall and J.R. Engen, An examination of dynamics crosstalk between SH2 and SH3 domains by hydrogen/deuterium exchange and mass spectrometry, Protein Sci., 15 (2006) 65–73.
36 A.C. Gavin and G. Superti-Furga, Protein complexes and proteome organization from yeast to man, Curr. Opin. Chem. Biol., 7 (2003) 21–27.
37 D.D. Weis, P. Kjellen, B.M. Sefton and J.R. Engen, Altered dynamics in Lck SH3 upon binding to the LBD1 domain of Herpesvirus saimiri Tip, Protein Sci., 15 (2006) 2402–2410.
38 J.G. Mandell, A.M. Falick and E.A. Komives, Identification of protein-protein interfaces by decreased amide proton solvent accessibility, Proc. Natl. Acad. Sci. USA, 95 (1998) 14705–14710.
39 J.R. Engen, T.E. Smithgall, W.H. Gmeiner and D.L. Smith, Identification and localization of slow, natural, cooperative unfolding in the hematopoietic cell kinase SH3 domain by amide hydrogen exchange and mass spectrometry, Biochemistry, 36 (1997) 14384–14391.
40 L. Mayne, Y. Paterson, D. Cerasoli and S.W. Englander, Effect of antibody binding on protein motions studied by hydrogen-exchange labeling and two-dimensional NMR, Biochemistry, 31 (1992) 10678–10685.
41 R.A. Garcia, D. Pantazatos and F.J. Villarreal, Hydrogen/deuterium exchange mass spectrometry for investigating protein-ligand interactions, Assay. Drug Dev. Technol., 2 (2004) 81–91.
42 S. Kaveti and J.R. Engen, Protein interactions probed with mass spectrometry. In: R.S. Larson (Ed.), Bioinformatics and Drug Discovery, Totowa, NJ, Humana Press, 2006, Vol. 316 of Methods of Molecular Biology, pp. 179–197.
43 J.M. Hochrein, T.E. Wales, E.C. Lerner, A.P. Schiavone, T.E. Smithgall and J.R. Engen, Conformational features of the full-length HIV and SIV Nef proteins determined by mass spectrometry, Biochemistry, 45 (2006) 7733–7739.
44 G. Spraggon, D. Pantazatos, H.E. Klock, I.A. Wilson, V.L. Woods, Jr. and S.A. Lesley, On the use of DXMS to produce more crystallizable proteins: Structures of the T. maritima proteins TM0160 and TM1171, Protein Sci., 13 (2004) 3187–3199.
CHAPTER 5
Biochemical Reaction Kinetics Studied by Time-Resolved Electrospray Ionization Mass Spectrometry
Lars Konermann, Jingxi Pan and Derek J. Wilson
Contents
1. Introduction
2. Time-Resolved ESI-MS
2.1 Experimental setup
2.2 Theory and data analysis
3. Selected Applications
3.1 Enzyme kinetics
3.2 Protein folding and assembly
3.3 Unfolding and subunit disassembly of noncovalent protein complexes
3.4 Hydrogen/deuterium exchange: Continuous and pulse-labeling approaches
4. Conclusions and Outlook
Acknowledgements
References
Comprehensive Analytical Chemistry, Volume 52. ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00205-5. © 2009 Elsevier B.V. All rights reserved.
1. INTRODUCTION
Owing to their superb sensitivity and selectivity, mass spectrometry (MS)-based techniques are among the most important analytical tools for a wide range of applications including proteomics, pharmacology, and environmental monitoring. Another particularly interesting area is the use of MS for studying the kinetics of chemical and biochemical processes. Experiments using this approach can provide valuable information on reaction mechanisms, rate constants, and
short-lived intermediates [1–3]. MS can offer insights that are complementary to those obtained by traditional kinetic techniques employing optical detection methods, for example, UV-Vis absorption or fluorescence spectroscopy. Of the numerous ionization techniques available for MS, electrospray ionization (ESI) is most suitable for kinetic studies on solution-phase processes, since it provides a seamless connection between solution-phase chemistry and gas-phase detection. ESI-MS can be applied to a wide variety of analytes, ranging from low molecular weight compounds to large noncovalent assemblies [4]. Time-dependent changes in the chemical composition of a reaction mixture can often be monitored on-line, that is, by simply injecting the solution into the ESI source while the process of interest occurs. Lee et al. were the first to demonstrate the viability of this concept, shortly after ESI-MS had been developed [5]. An alternative approach is to terminate the reaction at selected time points, by exposing the solution to suitable quenching agents such as acid and/or low temperature [6–9]. Typically these quench-flow techniques employ off-line strategies for sample analysis, thereby facilitating the incorporation of clean-up steps by liquid chromatography or dialysis. Additional sample processing, such as chemical derivatization or limited proteolysis may also be employed. The application of off-line methods can be problematic in cases where analytes are not stable under the conditions of the quenched reaction mixture. Moreover, quench-flow studies can become very labor intensive because temporal profiles have to be assembled from multiple individual experiments representing different time points. Any kinetic experiment has to be designed such that its time resolution is adequate for monitoring the processes of interest. 
Because chemical and biochemical reactions occur on time scales ranging from sub-milliseconds to hours, different approaches may be required for studying different types of processes. Most reactions can be initiated by the mixing of two initially separated solutions. The duration of this mixing step directly affects the time resolution of the experiment. Another important factor is the amount of time required after mixing until data acquisition can commence. Manual mixing is suitable for slow processes that occur within minutes to hours. In contrast, reactions occurring on faster time scales require the use of automated rapid mixing devices. Traditional rapid mixing schemes operate under turbulent flow conditions and normally require high flow rates, which often implies large sample consumption [10,11]. With the advent of microfluidics [12], various laminar flow devices have been developed that are based on hydrodynamic focusing and diffusive mixing. Microsecond time resolution with optical detection has been demonstrated for both turbulent and laminar flow [13–16].
This chapter focuses on the on-line coupling of ESI-MS with continuous-flow rapid mixing devices. “Time-resolved” ESI-MS measurements using these systems have proven to be highly effective tools for kinetic studies on processes occurring on the time scale of milliseconds to seconds. Time-resolved ESI-MS offers the capability to monitor multiple reactive species simultaneously, thereby providing a “global view” of (bio)chemical reactions, highly accurate rate parameters, as well as detailed insights into reaction mechanisms. Following a description of the experimental methodology and underlying theory, we will discuss the application of time-resolved ESI-MS to enzymatic and protein-folding/unfolding reactions. In addition, we describe the combination of time-resolved ESI-MS with isotopic labeling strategies. Alternative approaches such as stopped-flow MS [17–19] and membrane inlet MS (MIMS) [20] will not be considered here due to space limitations.
2. TIME-RESOLVED ESI-MS
2.1 Experimental setup
Time-resolved ESI-MS studies can be carried out in a number of different ways. A continuous-flow system equipped with a static capillary mixer offers the simplest possible approach for this type of experiment (Figure 1A) [21–23]. Reactant solutions are continuously passed from two stepper motor-driven syringes into a mixing tee, where the reaction of interest is initiated. The tee empties into a fused silica reaction capillary that is connected to the ESI source of the mass spectrometer. The reaction time is determined by the solution flow rate and by the capillary dimensions. One way to control the reaction time is to alter the flow rate. However, this strategy is not always advisable because it may result in artifactual changes of analyte ion abundances [24]. A better, but more labor-intensive, way to study different time points is to install reaction capillaries of different lengths while keeping the flow rate constant. In this way, intensity-time profiles of ionic signals representing solution-phase species of interest can be obtained from series of individual measurements. Static capillary mixers of the type depicted in Figure 1A are most useful for experiments that require information on only relatively few time points.
Figure 1 Schematic depiction of different on-line continuous-flow capillary mixing systems for time-resolved ESI-MS. (A) Static device. The reaction of interest is initiated by mixing of solutions from syringes 1 and 2. The reaction time is proportional to the length of the capillary connecting the mixing tee and the ESI source. (B) Concentric capillary mixer with adjustable reaction chamber volume. The distance between the mixer and the ESI source (and hence the reaction time) can be adjusted by altering the position of syringe 1, as indicated by the dashed arrow. (C) Concentric capillary mixer similar to (B), but with a second (static) mixer added downstream of mixer 1. Arrows indicate the direction of liquid flow. Adapted with permission from ref. [25]. Copyright 2003 American Chemical Society.
Figure 1B shows a somewhat more advanced continuous-flow setup that employs a concentric capillary mixer [25]. In contrast to the device discussed above (Figure 1A), the reaction chamber volume of this system is variable and can be controlled by a stepper motor-driven mechanism. Kinetic data can be acquired for a wide range of different time points in an automated fashion without changing the solution flow rate. Reactant solutions are continuously expelled from both syringes. Syringe 1 is connected to the inner capillary, whereas the solution delivered by syringe 2 flows through the outer capillary. The end of the inner capillary is blocked by a polyimide plug. Solution from syringe 1 escapes through a notch cut ~2 mm from the capillary end. This forces both reactant solutions to undergo diffusive mixing in the ~8 µm intercapillary space. The reaction proceeds while the mixture flows towards the outlet of the apparatus. Kinetic experiments are performed by initially positioning the mixer very close to the outlet of the outer capillary (so that the two capillary ends are flush). This represents the earliest time point that can be monitored (ca. 5 ms for a total flow rate of 100 µL/min). High voltage is applied directly to the stainless steel outer capillary, and ESI occurs at the outlet of the device. The use of a collateral nebulizer gas flow around the outer capillary increases the signal intensity and stability. The inner capillary is then slowly pulled back, while solutions from the two syringes are continuously passed through the reaction zone. The mass spectrometer monitors the ions emitted from the source, thus yielding data at successively later reaction times. These experiments provide data in three dimensions: the time axis is determined by the mixer position, the m/z axis provides information on the identity of the species in the reaction mixture, and the intensity axis is related to the concentration of each of the various species. The software routines required to operate the mass spectrometer during these kinetic measurements are identical to those employed during the acquisition of LC-MS profiles. Mass spectra for specific reaction times, as well as intensity-time profiles for selected ionic species, can be extracted from the measured total ion current (TIC).
Capillary mixing devices of the type discussed here have a time resolution that allows the measurement of rate constants up to at least 100 s⁻¹ [25]. They can be coupled to many different types of ESI mass spectrometers, for example, triple quadrupole or time-of-flight (TOF) instruments [25,26]. A number of experimental protocols require the incorporation of two sequential mixing steps. One possible implementation of such a system is shown in Figure 1C. The operation of this setup is analogous to that of Figure 1B, with the exception that a second (static) mixer is mounted downstream of the first one. The reaction time is determined by the capillary volume and the flow rate between the two mixers. The second mixer may be used for pulsed isotope labeling (see Section 3.4). Alternatively, it allows the addition of an “electrospray-friendly” make-up solvent (e.g., a methanol/acetic acid solution) immediately prior to ESI in cases where the original reaction mixture contains substances that interfere with the ionization process.
2.2 Theory and data analysis
2.2.1 Laminar flow effects
The analysis of kinetic data generated by the continuous-flow devices in Figure 1 would be simplest in the hypothetical case of plug flow, where all particles experience the same flow velocity. In this case it would be possible to associate any position along the reaction capillary with a unique, well-defined reaction time t. However, the flow rate $\dot{V}$ and capillary cross-sectional area A in typical time-resolved ESI-MS experiments result in Reynolds numbers far below the threshold value of 2,000. Under these conditions, the flow within the reaction capillary is laminar, with a velocity profile v(r) that is given by [13,27]
$$v(r) = v_{\max}\left(1 - \frac{r^2}{R^2}\right) \tag{1}$$
where R is the inner radius of the reaction capillary and r represents the radial position within the capillary. The flow velocity at the center of the capillary, $v_{\max}$, is twice the average flow velocity $\bar{v}$, defined as $\bar{v} = \dot{V}/A$. This parabolic velocity profile has a tendency to distort the measured kinetics by “blurring” the time axis, because individual positions l along the reaction capillary cannot be associated with specific reaction times t. Instead, each value of l corresponds to a range of reaction times that are spread around the average value $\bar{t}$, where $\bar{t} = l/\bar{v}$. For example, $\dot{V}$ = 30 µL/min and l = 10 cm results in $\bar{t}$ = 0.88 s for a reaction capillary with an inner diameter of 75 µm.
The distortion of the measured kinetics due to laminar flow effects has to be taken into account for a detailed analysis of experimental data [25]. For an analyte solution traveling through the reaction capillary, the “age” a of each molecule is defined as the time required to move from the mixing point to the ESI source. The probability that an analyte molecule has an age in the range a to a + da is given by $P(\bar{t}, a)\,da$, where $P(\bar{t}, a)$ is the “age distribution function”. For laminar flow, $P(\bar{t}, a)$ is given by [13]
$$P(\bar{t}, a) = \frac{\bar{t}^{\,2}}{2a^3} \quad \text{for } a \ge \bar{t}/2 \qquad \text{and} \qquad P(\bar{t}, a) = 0 \quad \text{for } a < \bar{t}/2 \tag{2}$$
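The relations above can be checked numerically. The following is a minimal sketch, not part of the original work: the function names are hypothetical, and only the Python standard library is assumed. It reproduces the mean reaction time of the worked example (30 µL/min, 10 cm, 75 µm inner diameter) and verifies that the laminar-flow age distribution of Equation (2) is normalized and has a mean age equal to the average reaction time.

```python
import math

def mean_residence_time(flow_ul_per_min, length_cm, inner_diameter_um):
    """Mean reaction time t_mean = l / v_mean, with v_mean = V_dot / A."""
    flow_m3_per_s = flow_ul_per_min * 1e-9 / 60.0   # 1 uL = 1e-9 m^3
    radius_m = 0.5 * inner_diameter_um * 1e-6
    area_m2 = math.pi * radius_m ** 2
    v_mean = flow_m3_per_s / area_m2
    return (length_cm * 1e-2) / v_mean

def age_density(a, t_mean):
    """Laminar-flow age distribution P(t_mean, a) of Equation (2)."""
    return t_mean ** 2 / (2.0 * a ** 3) if a >= 0.5 * t_mean else 0.0

def age_moment(power, t_mean, n=2000):
    """Integral of a**power * P(t_mean, a) over all ages (midpoint rule).

    The substitution a = t_mean / (2u), 0 < u <= 1, maps the infinite
    age range onto a finite interval.
    """
    du = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        a = t_mean / (2.0 * u)
        jacobian = t_mean / (2.0 * u ** 2)           # |da/du|
        total += a ** power * age_density(a, t_mean) * jacobian * du
    return total

t_mean = mean_residence_time(30.0, 10.0, 75.0)       # example from the text
norm = age_moment(0, t_mean)                         # normalization of P
mean_age = age_moment(1, t_mean)                     # first moment of P
print(f"t_mean = {t_mean:.2f} s, earliest arrival = {t_mean / 2:.2f} s")
print(f"norm = {norm:.6f}, mean age = {mean_age:.4f} s")
```

The earliest arrival time of half the mean reaction time reflects the fact that molecules on the capillary axis travel at twice the average velocity.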
Now consider a kinetic process that is associated with a time-dependent signal I(m/z, t). Blurring of the time axis due to laminar flow causes the mass spectrometer to detect an average signal $\langle I(m/z)\rangle(\bar{t})$ that can be expressed as:
$$\langle I(m/z)\rangle(\bar{t}) = \int_0^{\infty} I(m/z, a)\,P(\bar{t}, a)\,da \tag{3}$$
This equation is valid for any age distribution function $P(\bar{t}, a)$. For the laminar flow conditions considered here, substitution of Equation (2) into Equation (3) results in
$$\langle I(m/z)\rangle(\bar{t}) = \frac{\bar{t}^{\,2}}{2}\int_{\bar{t}/2}^{\infty} I(m/z, a)\,\frac{da}{a^3} \tag{4}$$
Equation (4) is valid as long as radial diffusion of analyte molecules within the reaction capillary is negligible. This has been shown to be the case for processes that have essentially gone to completion within a time window of [25]
$$\bar{t} \le \frac{R^2}{36D} \tag{5}$$
where D is the diffusion coefficient of the analyte of interest. For D = 1 × 10⁻¹⁰ m²/s (a typical value for proteins) and R = 91 µm, R²/36D = 2.3 s. In other words, for most practical applications Equation (4) provides a reasonable approximation as long as the experimental time window does not exceed a few seconds. A more simplistic analysis that neglects laminar flow effects, assuming that $\langle I(m/z)\rangle(\bar{t}) = I(m/z, \bar{t})$, will result in errors of the measured rate constants on the order of 30% [13,25].
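To illustrate how Equation (4) blurs a kinetic trace, the sketch below averages a single-exponential decay over the laminar-flow age distribution and compares it with the plug-flow value. It is an illustration under stated assumptions rather than the authors' analysis code: the rate constant k is hypothetical, the function names are invented for this example, and the integral is evaluated with a simple midpoint rule after substituting a = t̄/(2u), which turns Equation (4) into an integral of I(t̄/(2u))·2u over 0 < u ≤ 1.

```python
import math

def laminar_average(signal, t_mean, n=4000):
    """Equation (4): (t_mean^2 / 2) * integral_{t_mean/2}^inf signal(a) / a^3 da.

    With a = t_mean / (2u) this equals integral_0^1 signal(t_mean / (2u)) * 2u du,
    evaluated here with the midpoint rule.
    """
    du = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        total += signal(t_mean / (2.0 * u)) * 2.0 * u * du
    return total

def diffusion_window(radius_m, diffusion_m2_per_s):
    """Time window R^2 / (36 D) of Equation (5), in seconds."""
    return radius_m ** 2 / (36.0 * diffusion_m2_per_s)

k = 2.0  # hypothetical first-order rate constant, s^-1

def decay(a):
    """Single-exponential signal I(a) = exp(-k a)."""
    return math.exp(-k * a)

for t_mean in (0.1, 0.5, 1.0):
    plug = decay(t_mean)                      # signal if all molecules had age t_mean
    laminar = laminar_average(decay, t_mean)  # signal averaged over laminar-flow ages
    print(f"t_mean = {t_mean:.1f} s: plug flow {plug:.4f}, laminar flow {laminar:.4f}")

window = diffusion_window(91e-6, 1e-10)       # values quoted in the text
print(f"R^2 / (36 D) = {window:.1f} s")
```

Because the exponential is convex and the mean age equals t̄, the laminar-flow average of a decaying signal always lies above the plug-flow value, which is why fitting the raw trace with a plain exponential misestimates the rate constant.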
2.2.2 Global data analysis
ESI of proteins and many other analytes produces ions in various charge states. Time-resolved ESI-MS data, therefore, often exhibit a considerable complexity and can comprise hundreds of individual intensity-time profiles. Instead of analyzing all of these profiles individually, it is advantageous to apply a global analysis procedure that results in a relatively small number of parameters which completely describe the whole data set. Many chemical processes can be described by a system of coupled first-order differential equations [28]. This implies that a master equation of the general form [29]
$$\frac{d\vec{x}}{dt} = \hat{A}\,\vec{x} \tag{6}$$
can be established, where $\vec{x} = (x_1(t), \ldots, x_n(t))$ is a vector containing all time-dependent concentrations of the n species that are involved in the reaction. $\hat{A}$ is the n × n rate matrix of the system with eigenvalues $\lambda_1, \ldots, \lambda_n$. Except for $\lambda_n$ (which is 0), the apparent rate constants $\lambda_j$ are functions of all the microscopic rate constants in the matrix $\hat{A}$. Equation (6) can be solved for any set of initial conditions, resulting in multi-exponential expressions for the concentration profiles $x_j(t)$. Accordingly, all the time-dependent ESI-MS intensity profiles accompanying the kinetics can be described by
$$I(m/z, t) = \sum_{j=1}^{n-1} C_j(m/z)\,\exp(-t/\tau_j) + C_n(m/z) \tag{7}$$
This equation can be substituted into Equation (4) in order to account for laminar flow effects. The (n − 1) relaxation times, $\tau_j$, in Equation (7) are given by $\tau_j = -(\lambda_j)^{-1}$. These $\tau_j$ values are common to all the $I(m/z, \bar{t})$ profiles, and the observed kinetics differ only in the amplitudes $C_1(m/z), \ldots, C_n(m/z)$ [30,31]. These amplitudes can be positive (indicating an exponential decay) or negative (indicating an exponential rise). As it represents the best fit to the entire set of measured intensity-time profiles, global analysis more accurately reflects the kinetics of the system than a conventional peak-by-peak analysis, and it allows a more reliable determination of the relaxation times $\tau_j$.
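The link between the rate matrix of Equation (6) and the shared relaxation times of Equation (7) can be made concrete with the smallest possible case, a reversible two-state process A ⇌ B (n = 2). The sketch below is illustrative only: the rate constants k_f and k_r are hypothetical, the helper names are invented for this example, and the sign convention dx/dt = Âx is assumed, so that the nonzero eigenvalue is negative and τ = −1/λ. The closed-form single-exponential profiles are compared with a direct numerical integration of the master equation.

```python
import math

def eigenvalues_2x2(m):
    """Eigenvalues of a real 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    root = math.sqrt(trace * trace - 4.0 * det)  # real for this reversible scheme
    return (trace - root) / 2.0, (trace + root) / 2.0

def rk4_step(m, x, dt):
    """One fourth-order Runge-Kutta step for dx/dt = m x (two species)."""
    def f(v):
        return [m[0][0] * v[0] + m[0][1] * v[1],
                m[1][0] * v[0] + m[1][1] * v[1]]
    k1 = f(x)
    k2 = f([x[i] + 0.5 * dt * k1[i] for i in range(2)])
    k3 = f([x[i] + 0.5 * dt * k2[i] for i in range(2)])
    k4 = f([x[i] + dt * k3[i] for i in range(2)])
    return [x[i] + dt * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0
            for i in range(2)]

k_f, k_r = 3.0, 1.0                        # hypothetical rate constants, s^-1
rate_matrix = [[-k_f, k_r], [k_f, -k_r]]   # A <-> B; columns sum to zero
lam_1, lam_n = eigenvalues_2x2(rate_matrix)
tau = -1.0 / lam_1                         # single relaxation time (n - 1 = 1)

x0 = [1.0, 0.0]                            # start with pure A
x_eq = [k_r / (k_f + k_r), k_f / (k_f + k_r)]
amplitudes = [x0[i] - x_eq[i] for i in range(2)]   # C_1 for A and for B

def closed_form(t):
    """Equation (7) for n = 2: C_1 exp(-t / tau) + C_n for each species."""
    return [amplitudes[i] * math.exp(-t / tau) + x_eq[i] for i in range(2)]

x, dt, steps = x0[:], 0.001, 500
for _ in range(steps):
    x = rk4_step(rate_matrix, x, dt)
t_end = dt * steps

print(f"eigenvalues: {lam_1:.3f}, {lam_n:.3f}; tau = {tau:.3f} s")
print("numerical  :", [round(v, 6) for v in x])
print("closed form:", [round(v, 6) for v in closed_form(t_end)])
```

Both species share the same relaxation time and differ only in the sign and size of their amplitudes (positive for the decaying species A, negative for the rising species B), which is the property that a global fit exploits.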
3. SELECTED APPLICATIONS
3.1 Enzyme kinetics
Enzymes are vital components of all biological systems. It is not surprising, therefore, that there is a strong motivation to understand the basic principles underlying their activity. Enzymatic reactions can be simple two-step processes, or they can be more complicated and involve a sequential series of intermediates, often along with major conformational changes of the protein [32–34]. Kinetic experiments are among the most important tools for elucidating these reaction mechanisms. Most enzyme studies are carried out under steady state conditions, where high time resolution is not required. However, kinetic experiments in the pre-steady state regime can often provide more mechanistic information. Pre-steady state studies explore the short period of time immediately following the initiation of an enzymatic reaction. It is during this period that reaction intermediates become successively populated, such that the rate constants of individual steps can be measured [11,35,36]. Rapid mixing experiments with optical detection are a well-established tool for pre-steady state kinetic experiments. Unfortunately, most reactions of enzymes with their biologically relevant substrates cannot be studied in this way, because there are no associated chromophoric changes. For this reason, kineticists often use artificial substrate analogs that undergo a color change upon turnover. Obviously, this approach is problematic because the kinetics observed with these analogs may be different from those that would be observed with the natural substrate(s). In some cases the use of radioactively labeled substrates provides an alternative approach [37]. However, radiochemical methods are somewhat cumbersome, and problems can arise due to nonspecific entrapment of the label. Kinetic studies by time-resolved ESI-MS do not require chromophoric substrates or radioactive labeling and, therefore, the application of this approach to enzymatic systems is very attractive [2,5,17,38–42].
α-Chymotrypsin is a serine protease, responsible for the breakdown of proteins and peptides in the small intestine [43]. Ser195 represents the reactive nucleophile in the active site of the enzyme. Chymotrypsin also catalyzes the hydrolysis of esters, including several synthetic substrate analogs. In basic terms, the chymotrypsin reaction mechanism can be expressed as [11,35,44]
$$\mathrm{E_{free}} + \mathrm{S} \;\overset{K_d}{\rightleftharpoons}\; \mathrm{ES} \;\xrightarrow[-\mathrm{P_1}]{k_2}\; \mathrm{EP_2} \;\xrightarrow{k_3}\; \mathrm{E_{free}} + \mathrm{P_2} \tag{8}$$
In the first step of this sequence, free enzyme E_free and substrate S form a noncovalent enzyme–substrate complex, ES, that is characterized by a dissociation constant K_d. Subsequently, Ser195 forms a covalent bond with the carbonyl carbon of the substrate, thus releasing the first hydrolysis product P1. The rate constant of this acylation step is denoted as k2. The subsequent deacylation has a rate constant of k3, and it leads to regeneration of the free enzyme by hydrolysis of the Ser195 ester bond, through release of the second hydrolysis product P2. For conditions where S is present in large excess, the kinetics can be described as [11,35]
$$[\mathrm{EP_2}](t) = A\left[1 - \exp(-k_{obs}\,t)\right] \tag{9}$$
and
$$\left([\mathrm{E_{free}}] + [\mathrm{ES}]\right)(t) = B\,\exp(-k_{obs}\,t) + C \tag{10}$$
A, B, and C in these equations are constants, and k_obs is given by
$$k_{obs} = k_3 + \frac{k_2\,[\mathrm{S}]}{K_d + [\mathrm{S}]} \tag{11}$$
where [S] is the substrate concentration. Measurements of k_obs as a function of substrate concentration allow the determination of the parameters k2, k3, and K_d in Equation (8). The application of time-resolved ESI-MS is illustrated for the conversion of a chromophoric substrate analog, para-nitrophenyl acetate (pNPA) [45]. pNPA was chosen here because it provides a simple way to validate the ESI-MS-based kinetic experiments by standard UV-Vis stopped-flow spectroscopy. Chymotrypsin catalyzes the conversion of pNPA to para-nitrophenol (pNP, a bright yellow-colored substance) and acetate. In the framework of Equation (8), pNP corresponds to P1, and acetate corresponds to P2. The setup depicted in Figure 1C
was employed for monitoring the reaction kinetics, using enzyme solution in syringe 1, substrate in syringe 2, and an acidic make-up solvent in syringe 3. A quadrupole mass spectrometer, operated in multiple ion mode, was used for monitoring intensity-time profiles for the 12+ charge state of unmodified α-chymotrypsin (MW = 25,234 Da) and acetylated enzyme (E-P2; MW = [25,234 + 42] Da) (Figure 2A). Note that the noncovalent ES complex does not remain intact under the make-up flow conditions employed here, such that ES contributes to the signal intensity of the “free” enzyme.
Figure 2 Enzyme kinetics monitored by time-resolved ESI-MS. (A) Intensity-time profiles (ESI-MS intensity in cps versus time in s), reflecting the accumulation of acetylated chymotrypsin (E-P2) and depletion of (E_free + ES) following the exposure of the enzyme to 5 mM of pNPA substrate. Solid lines represent fits to the experimental data according to Equations (9) and (10). The data were obtained by monitoring the 12+ ion abundance of free and acetylated enzyme (m/z 2,103 and 2,107, respectively) as a function of time. (B) Measured k_obs values (s⁻¹) as a function of substrate concentration (mM). Solid symbols refer to ESI-MS data for the two forms of the protein (α and δ′). Open circles depict values determined by optical stopped-flow spectroscopy. The fitted curves are based on Equation (11). Adapted with permission from ref. [45]. Copyright 2004 American Chemical Society.
Interestingly, ESI-MS analysis of commercially supplied chymotrypsin revealed the presence of an additional form of the enzyme, having a slightly higher mass of 25,450 Da. This protein corresponds to a catalytically active precursor of α-chymotrypsin, termed δ′. Time-resolved ESI-MS allows the acylation kinetics of the α and δ′ forms to be monitored independently. In contrast, optical experiments that monitor the formation of pNP cannot distinguish between the two forms of the enzyme; rather, the observed kinetics will be an average of the two. The resulting profiles are well described by Equations (9) and (10) (solid lines in Figure 2A), such that the dependence of k_obs on the substrate concentration can be determined for both α- and δ′-chymotrypsin (Figure 2B). Evidently, the catalytic activity of the two chymotrypsin forms is very similar. An analysis of the measured k_obs values based on Equation (11) results in K_d values of (1.4 ± 0.2) and (1.7 ± 0.2) mM, and rate constants k2 of (3.2 ± 0.3) and (3.7 ± 0.3) s⁻¹, respectively, for α and δ′. Unfortunately, for the system studied here k3 is too small to be determined in pre-steady state measurements (the y-axis intercept of the curves in Figure 2B is close to 0). These results show the deacylation step (k3) to be rate limiting for the hydrolysis of pNPA by chymotrypsin. Also shown in Figure 2B are the results of control experiments where the release of pNP was monitored by optical stopped-flow spectroscopy. The corresponding values of K_d and k2 are close to those obtained by ESI-MS, (1.6 ± 0.1) mM and (3.6 ± 0.2) s⁻¹, respectively. The data discussed in this section demonstrate that time-resolved ESI-MS is well suited for monitoring the kinetics of enzymatic reactions. The time resolution of this approach is adequate for studies in the steady state and in the pre-steady state regimes [45].
As noted earlier, a chromophoric substrate was chosen here as an example only to allow a direct validation of the ESI-MS kinetics in control experiments employing classical techniques. Obviously, the presence of a chromophore is not a requirement for MS-based experiments and, therefore, a much wider range of catalytic processes can be monitored that are not amenable to standard optical experiments [2,5,17,38,40,41,45–52]. Using isotope exchange approaches of the type discussed in Section 3.4, time-resolved ESI-MS may also be used for monitoring the conformational dynamics of enzymes during catalysis [32,33].
3.2 Protein folding and assembly
The mechanisms by which denatured proteins spontaneously fold into their native structures have been a focal point of research for several decades [53]. In recent years, many fundamental principles governing the folding of single-chain polypeptides have been uncovered [54–57]. In contrast, surprisingly little effort has been directed towards the assembly of multi-subunit structures from their unfolded constituents. The formation of quaternary protein assemblies adds another layer of complexity to the protein-folding problem because folding and binding can be closely intertwined [58–67]. Kinetic studies on protein association processes are challenging because spectroscopic signals such as fluorescence, circular dichroism, or even NMR methods cannot readily distinguish unimolecular events from those that are linked to the formation of intermolecular
contacts. Light and X-ray scattering methods exhibit relatively poor selectivity. Size fractionation and chemical cross-linking approaches have a limited kinetic resolution [68]. ESI-MS offers an interesting alternative for monitoring protein–ligand and protein–protein interactions [69]. Intact protein assemblies can be transferred into the gas phase as multiply protonated entities, such that binding stoichiometries can be deduced directly from the mass of the observed ions [70–72]. In addition, the ESI charge state distribution provides a highly sensitive probe for the overall conformational properties of proteins in solution. Compact protein structures give rise to relatively low protonation states. Solution-phase unfolding greatly increases the extent of protonation, thereby shifting the overall peak distribution to higher charge states [70,73,74]. Thus, ESI-MS provides a tool for monitoring protein conformational changes that are associated with the formation or disruption of intermolecular noncovalent interactions [69,75,76] (see also Chapter 2 in this volume). The application of time-resolved ESI-MS to a folding/binding process can be illustrated using the calcium-binding protein S100A11 as an example. This protein is largely α-helical and forms a symmetric, homodimeric quaternary structure both in its apo- and calcium-loaded form [77]. Our group has investigated the folding mechanism of S100A11 both in the presence and in the absence of calcium [78]. For the sake of simplicity, the discussion here will be restricted to the folding of apo-S100A11. Time-resolved ESI mass spectra recorded at different times after a pH jump from 2.4 to 8.5 allow tracking the conformational changes associated with the formation of tightly folded apo-S100A11 homodimers, starting from acid-denatured monomers (Figure 3). These experiments employed a single-step mixing device of the type depicted in Figure 1A.
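The relation between mass, charge state, and peak position that underlies such assignments can be sketched as follows. The code assumes purely protonated ions, [M + zH]z+, with a proton mass of 1.00728 Da; the function names and the 25,000 Da example mass are hypothetical and serve only as an illustration.

```python
PROTON = 1.00728  # proton mass in Da


def mz(mass_da, z):
    """m/z of an [M + zH]z+ ion of neutral mass mass_da."""
    return (mass_da + z * PROTON) / z


def mass_from_adjacent(mz_z, mz_z_plus_1):
    """Charge state and neutral mass from two neighboring peaks of one species.

    mz_z is the peak carrying z charges (higher m/z); mz_z_plus_1 is the
    neighboring peak carrying z + 1 charges (lower m/z).
    """
    z = round((mz_z_plus_1 - PROTON) / (mz_z - mz_z_plus_1))
    return z, z * (mz_z - PROTON)


if __name__ == "__main__":
    m = 25000.0  # hypothetical monomer mass in Da
    print(mass_from_adjacent(mz(m, 12), mz(m, 13)))
    # A dimer carrying 2z charges overlaps with the monomer carrying z charges.
    print(mz(2 * m, 10), mz(m, 5))
    # An acetyl group (about 42.011 Da) shifts a 12+ ion by about 3.5 m/z units.
    print(mz(m + 42.011, 12) - mz(m, 12))
```

Two general consequences follow from this arithmetic: a dimer ion with an even charge 2z appears at the same m/z as the monomer ion with charge z, so odd dimer charge states are the unambiguous markers of dimer formation; and a covalent modification of about 42 Da shifts a 12+ ion by roughly 3.5 m/z units, in line with the 2,103 and 2,107 values quoted for free and acetylated chymotrypsin in Figure 2.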
For a reaction time of 10 ms (Figure 3B), the protein is still predominantly monomeric. Notably, the charge state distribution obtained under these conditions is bimodal, with maxima at 11+ and 7+. This observation indicates the presence of two distinct solution-phase conformations that differ in their overall compactness [79–81]. Charge states around 11+ are assigned to a more unfolded conformation, termed MU. The charge state distribution of this species suggests that it has an overall compactness similar to that of the acid-denatured protein at pH 2.4 (Figure 3A). The slight shift in maximum from 12+ to 11+ may reflect a small conformational change. Alternatively, this effect could result from the different ionization conditions used (pH 2.4 versus pH 8.5), which might cause a reduced charge acquisition during ESI [82]. Charge states around 7+ in Figure 3B are assigned to monomeric proteins in a more tightly folded conformation, referred to as MF. Figures 3C, D show data recorded for reaction times of 200 and 800 ms, respectively. These spectra reveal increasing contributions of dimeric protein ions in charge states 9+ to 11+, whereas the relative intensities of the monomeric peaks steadily decrease. The final stage of the folding and assembly process is characterized by the data in Figure 3E, recorded 5 min after the pH jump. This spectrum is dominated by dimeric S100A11. The persistence of low intensity monomeric signals is consistent with the known dissociation constant of the
[Figure 3 graphic: stacked mass spectra, panels A–E; y-axis: normalized ESI-MS intensity; x-axis: m/z (1000–2500); peak groups labeled MU, MF, and Dimer.]
Figure 3 Protein folding and assembly monitored by time-resolved ESI-MS. (A) Acid-denatured S100A11 in aqueous solution at pH 2.4. Time-resolved mass spectra recorded 10 ms, 200 ms, 800 ms, and 5 min after a pH jump from 2.4 to 8.5 are shown in panels (B)–(E). Notation: “m” represents monomeric protein ions; “D” represents dimeric species. The charge states of some selected ionic species are indicated as well. Dashed lines separate the ionic signals attributed to three different kinetic species, MU (unfolded monomer), MF (folded monomer), and folded dimer. Adapted with permission from ref. [78]. Copyright 2006 American Chemical Society.
protein [83]. Spectra obtained on protein samples that had not been previously exposed to acidic conditions are indistinguishable from that depicted in Figure 3E (data not shown). Overall, these ESI-MS data reveal that a pH jump from 2.4 to 8.5 causes S100A11 to undergo a transition from a denatured
monomeric structure to a tightly folded dimer on the time scale of ~1 s. This conversion is not a simple two-state process. Instead, it involves a monomeric kinetic intermediate, MF, that is associated with charge states around 7+, which indicates a highly compact structure. Additional ESI-MS experiments allowed a detailed kinetic model of the S100A11 folding and assembly process to be developed; details on this work can be found in ref. [78].
3.3 Unfolding and subunit disassembly of noncovalent protein complexes
Time-resolved ESI-MS not only represents a powerful approach for monitoring the self-assembly of biological systems (see Section 3.2), it also provides detailed information on the reverse process, that is, the breakdown of highly ordered macromolecular complexes [84]. The core oxygenase domain of inducible nitric oxide synthase (iNOSCOD) is a homodimeric protein complex of 103 kDa. Each of the two subunits binds a heme cofactor. In addition, two tetrahydrobiopterin (H4B) moieties are sequestered at the dimer interface. The ESI mass spectrum recorded under native solvent conditions indicates the predominant presence of dimeric iNOSCOD with cofactors bound, but also contributions from tightly folded heme-bound (“holo”) monomers (Figure 4A). Exposure of the protein to acidic conditions results in extensive denaturation within a few seconds. An increase in the relative intensities of monomer ions and a concomitant decrease of the relative dimer intensities are observed 9 ms after a pH jump from 7.5 to 2.8 (Figure 4B). The disruption of protein–protein interactions is accompanied by the loss of H4B. A reaction time of 500 ms (Figure 4C) marks the point at which the relative intensities of heme-bound monomers are highest. Interestingly, these ions show a trimodal charge state distribution, thus indicating the presence of at least three solution-phase structures. The first group of these peaks, centered around 14+, is attributed to heme-bound monomers in a relatively tightly folded conformation. The other two distributions, with maxima around 21+ and 27+, represent increasingly unfolded species. The transient population of significantly unfolded proteins that remain bound to a heme cofactor is not unprecedented; a similar behavior has previously been observed for hemoglobin [26] and myoglobin [85]. At a reaction time of 4 s (Figure 4D), the iNOSCOD dimer has become almost unobservable.
Instead, the spectrum is dominated by apo-monomers in high charge states, extending up to at least 60+. Charge states in this range were not observed for apo-monomers formed early during the reaction (Figure 4B). This indicates that the reaction mixture for t = 4 s contains heme-free monomeric proteins that are significantly more unfolded than for early time points. A more detailed view of the denaturation process is obtained when subjecting the intensity-time profiles corresponding to the various protein species to global analysis (Equation (7)). Signals corresponding to the intact protein dimer show a rapid decay that is characterized by a relaxation time of 360 ms (Figure 5A). This process occurs concomitantly with the formation of monomeric heme-bound proteins that appear in a wide range of different charge states, representing
[Figure 4 graphic: stacked mass spectra, panels A–D; y-axis: relative ESI-MS intensity; x-axis: m/z (1000–5000); peaks labeled D, h, and a, with selected charge states indicated.]
Figure 4 Time-resolved ESI mass spectra of the iNOSCOD, representing different time points during acid-denaturation of the protein. (A) Native protein at pH 7.5. The other three panels show spectra recorded (B) 9 ms, (C) 500 ms, and (D) 4 s following a change in solution conditions from pH 7.5 to 2.8. Peaks corresponding to the intact protein dimer are marked as “D”, heme-bound monomers are indicated by “h”, and heme-free (apo) monomers are marked as “a”. Adapted with permission from ref. [84]. Copyright 2005 American Chemical Society.
solution-phase structures from tightly folded conformers all the way to significantly unfolded species (Figure 5B). These heme-bound monomers represent short-lived intermediates that are depleted with a relaxation time of 620 ms. The main products of this process are highly unfolded monomeric proteins in their heme-free (apo) form (Figure 5C). In simple terms, the denaturation of iNOSCOD can thus be described as a sequential process.
[Figure 5 graphic: three-dimensional intensity-time profiles, panels A–C; axes: m/z, time (s), and intensity (cps); signal labels Dimer20+, holo-M14+, and apo-M34+.]
Figure 5 Unfolding kinetics of iNOSCOD monitored by time-resolved ESI-MS. Shown in this figure are intensity-time profiles obtained for ionic signals representing the (A) native protein dimer, (B) monomeric, heme-containing reaction intermediates, and (C) unfolded, heme-free monomers. Red lines represent the result of a global fitting procedure according to Equation (7). Adapted with permission from ref. [84]. Copyright 2005 American Chemical Society. (See colour Plate Section at the end of this book.)
Disruption of the native protein dimer generates monomeric species that subsequently undergo unfolding with concomitant loss of heme. A detailed global analysis of the kinetic data reveals a number of additional steps and parallel processes that cannot be discussed here due to space constraints [84]. When considered in the context of other recent studies [26], the iNOSCOD kinetics highlighted here suggest that the occurrence of complex reaction mechanisms involving short-lived intermediates is a common feature for the denaturation of large noncovalent protein complexes.
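As a rough illustration of the sequential picture, the sketch below evaluates the textbook solution for two consecutive irreversible first-order steps, dimer → heme-bound monomer → apo-monomer. Treating the two relaxation times quoted above (360 and 620 ms) as the inverse rate constants of such a scheme is a simplifying assumption; the global analysis in ref. [84] involves additional steps and parallel processes, and the function names are hypothetical.

```python
import math


def sequential(t, tau1, tau2):
    """Fractions (A, B, C) at time t for A -> B -> C, starting from pure A.

    tau1 and tau2 are the relaxation times (1/k) of the two steps and must
    differ from each other.
    """
    k1, k2 = 1.0 / tau1, 1.0 / tau2
    a = math.exp(-k1 * t)
    b = k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    return a, b, 1.0 - a - b


def time_of_max_intermediate(tau1, tau2):
    """Time at which the intermediate B reaches its maximum population."""
    k1, k2 = 1.0 / tau1, 1.0 / tau2
    return math.log(k1 / k2) / (k1 - k2)


if __name__ == "__main__":
    tau_dimer, tau_holo = 0.36, 0.62  # s, relaxation times quoted in the text
    t_max = time_of_max_intermediate(tau_dimer, tau_holo)
    print(f"intermediate peaks at t = {t_max:.2f} s")
    for t in (0.009, 0.5, 4.0):  # time points of Figure 4B-D
        a, b, c = sequential(t, tau_dimer, tau_holo)
        print(f"t = {t:5.3f} s  dimer {a:.2f}  holo {b:.2f}  apo {c:.2f}")
```

With these inputs the simplified model places the maximum of the intermediate near 0.47 s, which is in line with the observation that the heme-bound monomer signals are most intense at a reaction time of about 500 ms.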
3.4 Hydrogen/deuterium exchange: Continuous and pulse-labeling approaches
Hydrogen/deuterium exchange (HDX) methods are among the most powerful tools for studying protein folding, dynamics, and function. Upon exposure of a protein to a D2O-containing solvent, labile hydrogens in backbone amide
groups and amino acid side chains can be replaced with deuterium. Notably, the occurrence of these HDX events is closely related to the structure and dynamics of the protein. Sites that are solvent accessible and not involved in stable hydrogen bonds undergo rapid exchange. HDX at sites that are protected by steric shielding and/or hydrogen bonding occurs more slowly. The latter type of isotope exchange is mediated by conformational fluctuations that are associated with transient opening events. In the commonly encountered EX2 regime, conformational opening/closing processes occur on time scales ranging from sub-microseconds to milliseconds, and most exchangeable sites have to cycle through many of these transitions before exchange occurs [6,86,87]. In contrast, HDX in the EX1 regime leads to complete exchange of entire domains, or even the whole protein, during one single cooperative opening event [88–91]. A more complete discussion of the mechanisms underlying EX1 and EX2 exchange can be found in refs. [89,90,92,93] (also refer to Chapters 2 and 4 in this volume). Following HDX, proteins can be analyzed by two-dimensional NMR spectroscopy. NMR provides spatially resolved information on the proton occupancy of individual exchangeable sites [87]. NMR measurements exploit the different nuclear spins of protium (spin 1/2) and deuterium (spin 1). Deuterons do not contribute to 1H NMR signals, and the measured peak areas are thus directly proportional to the proton occupancy of each site. It is important to note that these proton occupancies represent an average over the entire protein population within a sample [88,94]. In recent years, the use of MS for HDX studies has become increasingly popular [70]. HDX-MS measurements are based on the different masses of protium (1.0078 Da) and deuterium (2.0141 Da). In contrast to NMR, MS data are not averaged over all the molecules in the sample.
Instead, co-existing protein conformations can be detected, and their HDX properties can be monitored individually. This aspect greatly facilitates the differentiation between HDX events occurring in the EX1 and EX2 regimes [88,95]. Spatially resolved HDX information by MS can be obtained by using peptide-mapping approaches [6,96,97] (also refer to Chapter 4 in this volume). Although NMR techniques have undergone a rapid development in recent years, studies on proteins larger than 30 kDa remain challenging. In contrast, there is virtually no upper limit for the analysis of proteins by MS. Also, the concentration and the amount of protein required for MS-based experiments are usually orders of magnitude less than for NMR [6,98]. HDX studies aimed at characterizing protein structural dynamics typically monitor the isotope exchange of proteins as a function of time, employing experimental windows of tens of minutes to hours. Manual mixing is adequate for many of these studies. Depending on the solvent conditions and the conformational properties of the protein, however, HDX processes sometimes occur much faster, thus necessitating the application of rapid mixing techniques. For example, the structural dynamics of semi-denatured myoglobin have been explored using the time-resolved ESI-MS setup depicted in Figure 1A in measurements extending from a few milliseconds up to 3 s [90]. Rist et al. have developed an integrated system capable of performing sub-second HDX, proteolytic digestion, desalting, and peptide separation in an on-line fashion [99].
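The EX1 and EX2 limits mentioned above follow from the standard two-step exchange scheme, closed ⇌ open → exchanged. The sketch below uses the usual steady-state expression k_HDX = k_op·k_ch/(k_cl + k_ch), valid when opening is much slower than closing, where k_op and k_cl are the opening and closing rate constants and k_ch is the chemical exchange rate constant of an unprotected site. The symbol names and all numerical values are illustrative assumptions, not values taken from the studies cited here.

```python
def k_hdx(k_op, k_cl, k_ch):
    """Observed exchange rate constant for closed <-> open -> exchanged.

    Steady-state expression, assuming k_op << k_cl (native-like conditions).
    """
    return k_op * k_ch / (k_cl + k_ch)


if __name__ == "__main__":
    # EX2 limit (k_cl >> k_ch): many opening events are needed per exchange,
    # and k_hdx approaches (k_op / k_cl) * k_ch.
    print(k_hdx(k_op=1.0, k_cl=1.0e4, k_ch=10.0))
    # EX1 limit (k_ch >> k_cl): every opening event leads to exchange,
    # and k_hdx approaches k_op.
    print(k_hdx(k_op=0.01, k_cl=1.0, k_ch=1.0e3))
```

Because k_ch increases with pD, raising the pD moves a given site from EX2 towards EX1 behavior in this scheme, which is one reason the solvent conditions of a labeling experiment matter.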
Quite different from these continuous-labeling studies is the application of pulse-labeling strategies that expose the protein to a D2O-containing solvent only for a very brief amount of time. This type of experiment can be carried out using a double-mixing setup of the type depicted in Figure 1C. It allows HDX to take place during a short time interval that is immediately followed by ESI [82]. HDX predominantly occurs in unfolded protein regions that are solvent-exposed and that possess a non-intact hydrogen-bonding network. A slightly basic pD is required to ensure that unprotected sites exchange within a few milliseconds [100,101]. Pulse-labeling strategies represent a very powerful tool for the detection and characterization of short-lived protein-folding intermediates [7,89,102,103]. For the double-mixing setup in Figure 1C, the first mixing step exposes an initially denatured protein to refolding conditions for a variable amount of time. Subsequently, mixing of the protein with D2O initiates a 25 ms HDX labeling pulse that is terminated by desolvation during ESI. The HDX level and ESI charge state distribution represent complementary structural probes; the former reports on the intactness of the hydrogen-bonding network and the accessibility of exchangeable sites, whereas the latter monitors the overall compactness of the protein [70,82,104]. Importantly, pulse-labeling studies employing quench-flow MS methods provide information only on the HDX properties of the protein. The ESI charge state distribution cannot be used as a structural probe under these conditions [7,103,105]. The question whether the folding kinetics of ubiquitin (MW 8.5 kDa, 144 exchangeable hydrogen atoms) follow a simple two-state mechanism, or whether there is evidence for the occurrence of a folding intermediate is currently being debated in the literature [82]. 
Our group studied the refolding of this protein (initially denatured in methanol/water at pH 2.0) using time-resolved ESI-MS with on-line pulsed HDX. Prior to refolding, the protein shows an ESI mass spectrum exhibiting a broad charge state distribution with a maximum around 11+ (Figure 6A). Refolding was triggered by a change in solution conditions to pH 10.0. Mass spectra acquired during refolding have a bimodal appearance. They exhibit a relatively broad charge state distribution centered at 9+, and a narrower one encompassing the 6+ and 5+ charge states. Refolding of the protein is reflected in a gradual intensity decrease of ions in charge states around 9+, and a concomitant increase of the 6+ and 5+ ions (Figure 6B and C). After pulsed HDX, the mass distributions observed for charge states around 9+ exhibit a large shift of (83 ± 4) Da relative to unlabeled ubiquitin (Figure 6D and E, dashed lines). For t = 3.3 s, the 6+ and 5+ ions exhibit a much smaller shift of (58 ± 2) Da (Figure 6E, solid line). Interestingly, however, for t = 40 ms the HDX behavior observed for the 6+/5+ charge states coincides with that observed for the 13+ to 7+ signals (Figure 6D). It is concluded that the refolding of ubiquitin under the conditions studied here involves three kinetic species, (i) the denatured state D (charge states around 9+, large mass shift), (ii) a folding intermediate (charge states 6+/5+, large mass shift), and (iii) the folded state F (charge states 6+/5+, small mass shift). The fact that D and the intermediate show indistinguishable HDX characteristics suggests that these two species undergo rapid interconversion during the labeling pulse. This implies that D and the intermediate are separated by a
[Figure 6 graphic: panels A–C, ESI mass spectra (denatured protein, t = 40 ms, t = 3.3 s); y-axis: normalized ESI-MS intensity; x-axis: m/z (700–1900). Panels D and E, mass shift distributions; x-axis: mass shift (30–150 Da).]
Figure 6 Ubiquitin folding monitored by time-resolved ESI-MS with on-line pulsed HDX. (A) Spectrum of the unfolded protein prior to triggering folding. Spectra for reaction times of 40 ms and 3.3 s are shown in panels (B) and (C), respectively. (D, E) Mass shift distributions resulting from HDX for the 9+ (dashed line) and 5+ (solid line) charge states for reaction times of 40 ms and 3.3 s. Adapted with permission from ref. [82]. Copyright 2005 American Chemical Society.
relatively low energy barrier. In contrast, the transition to F occurs on a slower time scale, which is indicative of a major barrier. In summary, the refolding of ubiquitin under the conditions used here is not a simple two-state (D → F) process. Time-resolved ESI-MS with on-line HDX reveals the presence of an intermediate, most likely an on-pathway species according to D ⇌ intermediate → F. This result is in line with the notion that the occurrence of folding intermediates is more widespread than commonly thought, especially in cases where a cursory analysis indicates two-state behavior [82].
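The mass shifts reported for ubiquitin (roughly 83 and 58 Da) can be converted into approximate numbers of exchanged sites using the protium and deuterium masses quoted earlier in this section. The sketch assumes complete labeling in pure D2O and no back-exchange, which will not hold exactly for the mixed solvent of a pulse-labeling experiment, so the resulting numbers are indicative only; the function name is hypothetical.

```python
M_H = 1.0078  # protium mass in Da, as quoted in the text
M_D = 2.0141  # deuterium mass in Da, as quoted in the text
N_EXCHANGEABLE = 144  # exchangeable hydrogens of ubiquitin, as quoted in the text


def deuterons_incorporated(mass_shift_da):
    """Approximate number of H -> D replacements behind a given mass shift."""
    return mass_shift_da / (M_D - M_H)


if __name__ == "__main__":
    for label, shift in (("charge states around 9+", 83.0),
                         ("6+/5+ at t = 3.3 s", 58.0)):
        n = deuterons_incorporated(shift)
        print(f"{label}: about {n:.0f} sites, "
              f"{100 * n / N_EXCHANGEABLE:.0f}% of exchangeable hydrogens")
```

Under these assumptions the difference between the two populations corresponds to about 25 sites that become protected once the folded state has formed.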
4. CONCLUSIONS AND OUTLOOK
Time-resolved ESI-MS represents a powerful alternative to conventional methods for monitoring the kinetics of rapid biochemical processes. The assembly and use of both static and adjustable capillary mixers for time-resolved ESI-MS are relatively straightforward, and usually require only minimal modifications of commercially available ion sources. Care must be taken for the analysis of kinetic
data collected under laminar flow conditions, since the parabolic velocity profile within the reaction capillary leads to a blurring of the time axis. This distortion can be rectified by considering the specific shape of the age distribution function during data analysis. The highly selective nature of ESI-MS permits multiple co-existing species to be monitored simultaneously. For complex systems, this can result in the acquisition of numerous interdependent intensity-time profiles. Analyzing these data globally increases the accuracy of the extracted numerical parameters, aids the interpretation of the measured kinetics, and facilitates the understanding of reaction mechanisms. Although time-resolved ESI-MS techniques are at a stage of development where they are readily applicable to numerous problems in biological chemistry, there remains considerable room for improvement. The time resolution of ESI-coupled capillary mixing devices, for example, is still considerably lower than for comparable optical methods. The dead-time of the experiments could be improved by optimizing the mixer geometry, and by speeding up the analyte transfer from the Taylor cone at the capillary outlet into the gas phase. A drawback of direct on-line methods is the low tolerance of ESI towards high concentrations of non-volatile solvent additives such as buffers, salts, denaturants, and surfactants [106,107]. The implication is that time-resolved ESI experiments must either be limited to the use of “non-physiological” volatile salts, such as ammonium acetate, or employ rapid on-line desalting and sample clean-up techniques. There is a growing number of reports to support the feasibility of the latter approach [106,108,109]. In addition, the use of integrated quench-flow systems with on-line sample processing is a very promising development [99].
Also, the combination of time-resolved methods with fused droplet ESI-MS [110], or on-line MALDI-MS methods [111] offers some very interesting prospects. In any case, kinetic studies employing MS-based methods are certain to play an ever-increasing role in many areas of chemistry and biochemistry.
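The blurring of the time axis under laminar flow, noted above, can be illustrated with a small numerical sketch. It assumes the classical flow-weighted residence time distribution for a parabolic velocity profile without molecular diffusion, E(t) = τ²/(2t³) for t ≥ τ/2, where τ is the mean residence time. This is a textbook limiting case and not necessarily the age distribution function used in this chapter, which may also account for diffusion [13]; the function name and the numerical values are hypothetical.

```python
import math


def blurred_signal(f, tau, n=20000):
    """Average f(t) over E(t) = tau**2 / (2 * t**3), t >= tau / 2.

    Substituting s = (tau / 2) / t maps the integral onto
    int_0^1 2 * s * f(tau / (2 * s)) ds, evaluated with the midpoint rule.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        total += 2.0 * s * f(tau / (2.0 * s))
    return total * h


if __name__ == "__main__":
    k, tau = 10.0, 0.1  # hypothetical rate constant (s^-1) and mean age (s)
    ideal = math.exp(-k * tau)  # plug flow: all molecules have age tau
    blurred = blurred_signal(lambda t: math.exp(-k * t), tau)
    print(f"plug flow: {ideal:.3f}   laminar flow: {blurred:.3f}")
```

For a first-order decay the laminar-flow average lies above the plug-flow value at the same mean reaction time, so fitting such data with a simple exponential would bias the rate constant; this is the distortion that an explicit age distribution in the fitting function is meant to remove.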
ACKNOWLEDGEMENTS
The Konermann laboratory is financially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Foundation for Innovation, the Ontario Innovation Trust, the University of Western Ontario, and by the Canada Research Chairs Program.
REFERENCES
1 D. Fabris, Mass spectrometric approaches for the investigation of dynamic processes in condensed phase, Mass Spectrom. Rev., 24 (2005) 30–54.
2 A. Liesener and U. Karst, Monitoring enzymatic conversions by mass spectrometry: A critical view, Anal. Bioanal. Chem., 382 (2005) 1451–1464.
3 L.S. Santos, L. Knaack and J.O. Metzger, Investigation of chemical reactions in solution using API-MS, Int. J. Mass Spectrom., 246 (2005) 84–104.
4 J.B. Fenn, Electrospray wings for molecular elephants (Nobel Lecture), Angew. Chem. Int. Ed., 42 (2003) 3871–3894.
5 E.D. Lee, W. Mück, J.D. Henion and T.R. Covey, Real-time reaction monitoring by continuous-introduction ion-spray tandem mass spectrometry, J. Am. Chem. Soc., 111 (1989) 4600–4604.
6 D.L. Smith, Y. Deng and Z. Zhang, Probing the noncovalent structure of proteins by amide hydrogen exchange mass spectrometry, J. Mass Spectrom., 32 (1997) 135–146.
7 V. Tsui, C. Garcia, S. Cavagnero, G. Siuzdak, H.J. Dyson and P.E. Wright, Quench-flow experiments combined with mass spectrometry show apomyoglobin folds through an obligatory intermediate, Protein Sci., 8 (1999) 45–49.
8 C.T. Houston, W.P. Taylor, T.S. Widlanski and J.P. Reilly, Investigation of enzyme kinetics using quench-flow techniques with MALDI-TOF mass spectrometry, Anal. Chem., 72 (2000) 3311–3319.
9 D. Bökenkamp, A. Desai, X. Yang, Y.-C. Tai, E.M. Marzluff and S.L. Mayo, Microfabricated silicon mixers for submillisecond quench-flow analysis, Anal. Chem., 70 (1998) 232–236.
10 M.C.R. Shastry, S.D. Luck and H. Roder, A continuous-flow mixing method to monitor reactions on the microsecond time scale, Biophys. J., 74 (1998) 2714–2721.
11 A.R. Fersht, Structure and Mechanism in Protein Science, W. H. Freeman & Co, New York, 1999.
12 G.M. Whitesides, The origins and the future of microfluidics, Nature, 442 (2006) 368–373.
13 L. Konermann, Monitoring reaction kinetics by continuous-flow methods: The effects of convection and molecular diffusion under laminar flow conditions, J. Phys. Chem. A, 103 (1999) 7210–7216.
14 H. Roder, K. Maki and H. Cheng, Early events in protein folding explored by rapid mixing methods, Chem. Rev., 106 (2006) 1836–1861.
15 D.E. Hertzog, B. Ivorra, B. Mohammadi, O. Bakajin and J.G. Santiago, Optimization of a microfluidic mixer for studying protein folding kinetics, Anal. Chem., 78 (2006) 4299–4306.
16 H.Y. Park, X. Qiu, E. Rhoades, J. Korlach, L.W. Kwok, W.R. Zipfel, W.W. Webb and L. Pollack, Achieving uniform mixing in a microfluidic device: Hydrodynamic focusing prior to mixing, Anal. Chem., 78 (2006) 4465–4473.
17 A.A. Paiva, R.F. Tilton, G.P. Crooks, L.Q. Huang and K.S. Anderson, A novel method for the detection and identification of transient enzyme intermediates using rapid mixing and electrospray mass spectrometry, Biochemistry, 36 (1997) 15472–15476.
18 H. Ørsnes, T. Graf and H. Degn, Stopped-flow mass spectrometry with rotating ball inlet: Applications to the ketone-sulfite reaction, Anal. Chem., 70 (1998) 4751–4754.
19 B.M. Kolakowski and L. Konermann, From small-molecule reactions to protein folding: Studying biochemical kinetics by stopped-flow electrospray mass spectrometry, Anal. Biochem., 292 (2001) 107–114.
20 D.B. Northrop and F.B. Simpson, Kinetics of enzymes with isomechanisms: Britton induced transport catalyzed by bovine carbonic anhydrase II, measured by rapid-flow mass spectrometry, Arch. Biochem. Biophys., 352 (1998) 288–292.
21 J.W. Sam, X.J. Tang and J. Peisach, Electrospray mass spectrometry of iron bleomycin: Demonstration that activated bleomycin is a ferric peroxide complex, J. Am. Chem. Soc., 116 (1994) 5250–5256.
22 L. Konermann, B.A. Collings and D.J. Douglas, Cytochrome c folding kinetics studied by time-resolved electrospray ionization mass spectrometry, Biochemistry, 36 (1997) 5554–5559.
23 L. Konermann, F.I. Rosell, A.G. Mauk and D.J. Douglas, Acid-induced denaturation of myoglobin studied by time-resolved electrospray ionization mass spectrometry, Biochemistry, 36 (1997) 6448–6454.
24 L. Konermann, E.A. Silva and O.F. Sogbein, Electrochemically induced pH changes resulting in protein unfolding in the ion source of an electrospray mass spectrometer, Anal. Chem., 73 (2001) 4836–4844.
25 D.J. Wilson and L. Konermann, A capillary mixer with adjustable reaction chamber volume for millisecond time resolved studies by electrospray mass spectrometry, Anal. Chem., 75 (2003) 6408–6414.
26 D.A. Simmons, D.J. Wilson, G.A. Lajoie, A. Doherty-Kirby and L. Konermann, Subunit disassembly and unfolding kinetics of hemoglobin studied by time-resolved electrospray mass spectrometry, Biochemistry, 43 (2004) 14792–14801.
27 R.F. Probstein, Physicochemical Hydrodynamics, 2nd ed., Wiley, New York, 1994.
28 R.H. Pain, Mechanisms of Protein Folding, 2nd ed., Oxford University Press, New York, 2000.
29 M.N. Berberan-Santos and J.M.G. Martinho, The integration of kinetic rate equations by matrix methods, J. Chem. Ed., 67 (1990) 375–379.
30 J.M. Beechem, M. Ameloot and L. Brand, Global and target analysis of complex decay phenomena, Anal. Instrum., 14 (1985) 379–402.
31 A.R. Holzwarth, Time-resolved fluorescence spectroscopy, Methods Enzymol., 246 (1995) 334–362.
32 Y.-H. Liu and L. Konermann, Enzyme conformational dynamics during catalysis and in the ‘resting state’ monitored by hydrogen/deuterium exchange mass spectrometry, FEBS Lett., 580 (2006) 5137–5142.
33 L.S. Busenlehner and R.N. Armstrong, Insights into enzyme structure and dynamics elucidated by amide H/D exchange mass spectrometry, Arch. Biochem. Biophys., 433 (2005) 34–46.
34 Y.J. Huang and G.T. Montelione, Proteins flex to function, Nature, 438 (2005) 36–37.
35 K. Hiromi, Kinetics of Fast Enzyme Reactions: Theory and Practice, Wiley, New York, 1979.
36 L. Konermann and D.J. Douglas, Pre-steady state kinetics of enzymatic reactions studied by electrospray mass spectrometry with on-line rapid-mixing techniques, Methods Enzymol., 354 (2002) 50–64.
37 K.S. Anderson, J.A. Sikorski and K.A. Johnson, A tetrahedral intermediate in the EPSP synthase reaction observed by rapid quench kinetics, Biochemistry, 27 (1988) 7395–7406.
38 D.B. Northrop and F.B. Simpson, Beyond enzyme kinetics: Direct determination of mechanisms by stopped-flow mass spectrometry, Bioorg. Med. Chem., 5 (1997) 641–644.
39 D.L. Zechel, L. Konermann, S.G. Withers and D.J. Douglas, Pre-steady state kinetic analysis of an enzymatic reaction monitored by time-resolved electrospray ionization mass spectrometry, Biochemistry, 37 (1998) 7664–7669.
40 Z. Li, A.K. Sau, S. Shen, C. Whitehouse, T. Baasov and K.S. Anderson, A snapshot of enzyme catalysis using electrospray mass spectrometry, J. Am. Chem. Soc., 125 (2003) 9938–9939.
41 A.J. Norris, J.P. Whitelegge, K.F. Faull and T. Toyokuni, Analysis of enzyme kinetics using electrospray ionization mass spectrometry and multiple reaction monitoring: Fucosyltransferase V, Biochemistry, 40 (2001) 3774–3779.
42 K.S. Anderson, Detection of novel enzyme intermediates in PEP-utilizing enzymes, Arch. Biochem. Biophys., 433 (2005) 47–58.
43 D.M. Blow, Structure and mechanism of chymotrypsin, Acc. Chem. Res., 9 (1976) 145–152.
44 G. Zubay, Biochemistry, 4th ed., Wm. C. Brown Publisher, Dubuque, IA, 1998.
45 D.J. Wilson and L. Konermann, Mechanistic studies on enzymatic reactions by electrospray ionization MS using a capillary mixer with adjustable reaction chamber volume for time-resolved measurements, Anal. Chem., 76 (2004) 2537–2543.
46 C. Wu, D.H.L. Robertson, S.J. Hubbard, S.J. Gaskell and R.J. Beynon, Proteolysis of native proteins: Trapping of a reaction intermediate, J. Biol. Chem., 274 (1999) 1108–1115.
47 D. Fabris, Steady-state kinetics of ricin A-chain reaction with the sarcin-ricin loop and with HIV-1 psi-RNA hairpins evaluated by direct infusion electrospray ionization mass spectrometry, J. Am. Chem. Soc., 122 (2000) 8779–8780.
48 P. Wang, D.F. Snavley, M.A. Freitas and D. Pei, Screening combinatorial libraries for optimal enzyme substrates by mass spectrometry, Rapid Commun. Mass Spectrom., 15 (2001) 1166–1171.
49 N. Pi and J.A. Leary, Determination of enzyme/substrate specificity constants using a multiple substrate ESI-MS assay, J. Am. Soc. Mass Spectrom., 15 (2004) 233–243.
50 N. Pi, Y. Yu, J.D. Mougous and J.A. Leary, Observation of a hybrid random ping-pong mechanism of catalysis for NodST: A mass spectrometry approach, Protein Sci., 13 (2004) 903–912.
51 J.M. Wiseman, Z. Takats, B. Gologan, V.J. Davisson and R.G. Cooks, Direct characterization of enzyme-substrate complexes by using electrosonic spray ionization mass spectrometry, Angew. Chem. Int. Ed., 44 (2005) 913–916.
52 S. Shipovskov, T. Karlberg, M. Fodje, M.D. Hansson, G.C. Ferreira, M. Hanson, C.T. Reinmann and S. Al-Karadaghi, Metallation of the transition-state inhibitor N-methyl mesoporphyrin by ferrochelatase: Implications for the catalytic reaction mechanism, J. Mol. Biol., 352 (2005) 1081–1090.
53 C.B. Anfinsen, Principles that govern the folding of protein chains, Science, 181 (1973) 223–230.
54 D. Baker, A surprising simplicity to protein folding, Nature, 405 (2000) 39–42.
55 V. Daggett and A. Fersht, The present view of the mechanism of protein folding, Nat. Rev. Mol. Cell Biol., 4 (2003) 497–502.
56 J. Rumbley, L. Hoang, L. Mayne and S.W. Englander, An amino acid code for protein folding, Proc. Natl. Acad. Sci. USA, 98 (2001) 105–112.
57 J.N. Onuchic and P.G. Wolynes, Theory of protein folding, Curr. Opin. Struct. Biol., 14 (2004) 70–75.
58 C.J. Mann and C.R. Matthews, Structure and stability of an early folding intermediate of Escherichia coli trp aporepressor measured by Far-UV stopped-flow circular dichroism and 8-Anilino-1-naphthalene sulfonate binding, Biochemistry, 32 (1993) 5282–5290.
59 R. Jaenicke, Protein folding: Local structures, domains, subunits, and assemblies, Biochemistry, 30 (1991) 3147–3161.
60 A.P. Minton, Implications of macromolecular crowding for protein assembly, Curr. Opin. Struct. Biol., 10 (2000) 34–39.
61 A. Nichtl, J. Buchner, R. Jaenicke, R. Rudolph and T. Scheibl, Folding and association of β-Galactosidase, J. Mol. Biol., 282 (1998) 1083–1091.
62 M.G. Mateu, M.M.S.D. Pino and A.R. Fersht, Mechanism of folding and assembly of a small tetrameric protein domain from tumor suppressor p53, Nat. Struct. Biol., 6 (1999) 191–198.
63 E.I. Shakhnovich, Folding by association, Nat. Struct. Biol., 6 (1999) 99–102.
64 G.M. Verkhivker, D. Bouzida, D.K. Gehlhaar, P.A. Rejto, S.T. Freer and P.W. Rose, Simulating disorder-order transition in molecular recognition of unstructured proteins: Where folding meets binding, Proc. Natl. Acad. Sci. USA, 100 (2003) 5148–5153.
65 K. Gunasekaran, C.-J. Tsai, S. Kumar, D. Zanuy and R. Nussinov, Extended disordered proteins: Targeting function with less scaffold, Trends Biochem. Sci., 28 (2003) 81–85.
66 B.A. Shoemaker, J.J. Portman and P.G. Wolynes, Speeding molecular recognition by using the folding funnel: The fly-casting mechanism, Proc. Natl. Acad. Sci. USA, 97 (2000) 8868–8873.
67 M.O. Crespin, B.L. Boys and L. Konermann, The reconstitution of unfolded myoglobin with hemin dicyanide is not accelerated by fly casting, FEBS Lett., 579 (2005) 271–274.
68 R. Seckler, Assembly of multi-subunit structures. In: R.H. Pain (Ed.), Mechanisms of Protein Folding, Oxford University Press, Oxford, 2000.
69 M. Fändrich, M.A. Tito, M.R. Leroux, A.A. Rostom, F.U. Hartl, C.M. Dobson and C.V. Robinson, Observation of the noncovalent assembly and disassembly pathways of the chaperone complex MtGimC by mass spectrometry, Proc. Natl. Acad. Sci. USA, 97 (2000) 14151–14155.
70 I.A. Kaltashov and S.J. Eyles, Studies of biomolecular conformations and conformational dynamics by mass spectrometry, Mass Spectrom. Rev., 21 (2002) 37–71.
71 I.A. Kaltashov and S.J. Eyles, Mass Spectrometry in Biophysics, Wiley, Hoboken, NJ, 2005.
72 A.J.R. Heck and R.H.H. Van den Heuvel, Investigation of intact protein complexes by mass spectrometry, Mass Spectrom. Rev., 23 (2004) 368–389.
73 S.K. Chowdhury, V. Katta and B.T. Chait, Probing conformational changes in proteins by mass spectrometry, J. Am. Chem. Soc., 112 (1990) 9012–9013.
74 R. Grandori, I. Matecko and N. Muller, Uncoupled analysis of secondary and tertiary protein structure by circular dichroism and electrospray ionization mass spectrometry, J. Mass Spectrom., 37 (2002) 191–196.
75 O.V. Nemirovskiy, R. Ramanathan and M.L. Gross, Investigation of calcium-induced, noncovalent association of calmodulin with melittin by electrospray ionization mass spectrometry, J. Am. Soc. Mass Spectrom., 8 (1997) 809–812.
76 H. Vis, U. Heinemann, C.M. Dobson and C.V. Robinson, Detection of a monomeric intermediate associated with dimerization of protein HU by mass spectrometry, J. Am. Chem. Soc., 120 (1998) 6427–6428.
77 A.C. Dempsey, M.P. Walsh and G.S. Shaw, Unmasking the annexin I interaction from the structure of apo-S100A11, Structure, 11 (2003) 887–897.
78 J.X. Pan, A. Rintala-Dempsey, Y. Li, G.S. Shaw and L. Konermann, Folding kinetics of the S100A11 protein dimer studied by time-resolved electrospray mass spectrometry and pulsed hydrogen-deuterium exchange, Biochemistry, 45 (2006) 3005–3013.
79 R. Grandori, Detecting equilibrium cytochrome c folding intermediates by electrospray ionization mass spectrometry: Two partially folded forms populate the molten globule state, Protein Sci., 11 (2002) 453–458.
80 A. Dobo and I.A. Kaltashov, Detection of multiple protein conformational ensembles in solution via deconvolution of charge-state distributions in ESI MS, Anal. Chem., 73 (2001) 4763–4773.
81 A. Mohimen, A. Dobo, J.K. Hoerner and I.A. Kaltashov, A chemometric approach to detection and characterization of multiple protein conformers in solution using electrospray ionization mass spectrometry, Anal. Chem., 75 (2003) 4139–4147.
82 J.X. Pan, D.J. Wilson and L. Konermann, Pulsed hydrogen exchange and electrospray charge-state distribution as complementary probes of protein structure in kinetic experiments: Implications for ubiquitin folding, Biochemistry, 44 (2005) 8627–8633.
83 G. Wang, S. Zhang, D.G. Fernig, D. Spiller, M. Martin-Fernandez, H. Zhang, Y. Ding, P.S. Rudland and R. Barraclough, Heterodimeric interaction and interfaces of S100A1 and S100P, Biochem. J., 382 (2004) 375–383.
84 D.J. Wilson, S.P. Rafferty and L. Konermann, Kinetic unfolding mechanism of the inducible nitric oxide synthase oxygenase domain determined by time-resolved electrospray mass spectrometry, Biochemistry, 44 (2005) 2276–2283.
85 O.O. Sogbein, D.A. Simmons and L. Konermann, The effects of pH on the kinetic reaction mechanism of myoglobin unfolding studied by time-resolved electrospray ionization mass spectrometry, J. Am. Soc. Mass Spectrom., 11 (2000) 312–319.
86 S.W. Englander, Hydrogen exchange and mass spectrometry: A historical perspective, J. Am. Soc. Mass Spectrom., 17 (2006) 1481–1489.
87 M.M.G. Krishna, L. Hoang, Y. Lin and S.W. Englander, Hydrogen exchange methods to study protein folding, Methods, 34 (2004) 51–64.
88 A. Miranker, C.V. Robinson, S.E. Radford and C.M. Dobson, Investigation of protein folding by mass spectrometry, FASEB J., 10 (1996) 93–101.
89 L. Konermann and D.A. Simmons, Protein folding kinetics and mechanisms studied by pulse-labeling and mass spectrometry, Mass Spectrom. Rev., 22 (2003) 1–26.
90 D.A. Simmons, S.D. Dunn and L. Konermann, Conformational dynamics of partially denatured myoglobin studied by time-resolved electrospray mass spectrometry with online hydrogen-deuterium exchange, Biochemistry, 42 (2003) 5896–5905.
91 D.M. Ferraro, N.D. Lazo and A.D. Robertson, EX1 hydrogen exchange and protein folding, Biochemistry, 43 (2004) 587–594.
92 S.W. Englander, L.M. Bai and T.R. Sosnick, Hydrogen exchange: The modern legacy of Linderstrøm-Lang, Protein Sci., 6 (1997) 1101–1109.
93 A. Hvidt and S.O. Nielsen, Hydrogen exchange in proteins, Adv. Protein Chem., 21 (1966) 287–386.
94 A. Miranker, C.V. Robinson, S.E. Radford, R. Aplin and C.M. Dobson, Detection of transient protein folding populations by mass spectrometry, Science, 262 (1993) 896–900.
95 H. Xiao, J.K. Hoerner, S.J. Eyles, A. Dobo, E. Voigtman, A.I. Melcuk and I.A. Kaltashov, Mapping protein energy landscapes with amide hydrogen exchange and mass spectrometry: I. A generalized model for a two-state protein and comparison with experiment, Protein Sci., 14 (2005) 543–557.
96 T.E. Wales and J.R. Engen, Hydrogen exchange mass spectrometry for the analysis of protein dynamics, Mass Spectrom. Rev., 25 (2006) 158–170.
97 L. Cravello, D. Lascoux and E. Forest, Use of different proteases working in acidic conditions to improve sequence coverage and resolution in hydrogen/deuterium exchange of large proteins, Rapid Commun. Mass Spectrom., 17 (2003) 2387–2393.
98 J.R. Engen and D.L. Smith, Investigating protein structure and dynamics by hydrogen exchange MS, Anal. Chem., 73 (2001) 256A–265A.
99 W. Rist, F. Rodriguez, T.J.D. Jorgensen and M.P. Mayer, Analysis of subsecond protein dynamics by amide hydrogen exchange and mass spectrometry using a quench-flow setup, Protein Sci., 14 (2005) 626–632.
100 Y. Bai, J.S. Milne, L. Mayne and S.W. Englander, Primary structure effects on peptide group hydrogen exchange, Proteins: Struct. Funct. Genet., 17 (1993) 75–86.
101 H. Roder, G.A. Elöve and S.W. Englander, Structural characterization of folding intermediates in cytochrome c by H-exchange labelling and proton NMR, Nature, 335 (1988) 700–704.
102 S.W. Englander, In pursuit of protein folding, Science, 262 (1993) 848–849.
103 D.K. Heidary, L.A. Gross, M. Roy and P.A. Jennings, Evidence for an obligatory intermediate in the folding of Interleukin-1β, Nat. Struct. Biol., 4 (1997) 725–731.
104 D.A. Simmons and L. Konermann, Characterization of transient protein folding intermediates during myoglobin reconstitution by time-resolved electrospray mass spectrometry with on-line isotopic pulse labeling, Biochemistry, 41 (2002) 1906–1914.
105 H. Yang and D.L. Smith, Kinetics of cytochrome c folding examined by hydrogen exchange and mass spectrometry, Biochemistry, 36 (1997) 14992–14999.
106 D.J. Wilson and L. Konermann, Ultrarapid desalting of protein solutions for electrospray mass spectrometry in a microchannel laminar flow device, Anal. Chem., 77 (2005) 6887–6894.
107 T.L. Constantopoulos, G.S. Jackson and C.G. Enke, Effects of salt concentration on analyte response using electrospray ionization mass spectrometry, J. Am. Soc. Mass Spectrom., 10 (1999) 625–634.
108 N. Lion, J. Gellon, H. Jensen and H.H. Girault, On-Chip protein sample desalting and preparation for direct coupling with electrospray ionization mass spectrometry, J. Chromatogr. A, 1003 (2003) 11–19.
109 M.R. Holl, P. Galambos, F.K. Forster, J.P. Brody and P. Yager, Optimal design of a microfabricated diffusion-based extraction device, Proc. Am. Soc. Mech. Eng. DSC, 59 (1996) 189–195.
110 D.-Y. Chang, C.-C. Lee and J. Shiea, Detecting large biomolecules from high-salt solutions by fused-droplet electrospray ionization mass spectrometry, Anal. Chem., 74 (2002) 2465–2469.
111 M. Brivio, R.H. Fokkens, W. Verboom, D.N. Reinhoudt, N.R. Tas, M. Goedbloed and A. van den Berg, Integrated microfluidic system enabling (Bio)chemical reactions with on-line MALDI-TOF mass spectrometry, Anal. Chem., 74 (2002) 3972–3976.
[Plate 1, panels A–C: three-dimensional plots of signal intensity (cps) versus m/z and time (s). Panel labels: (A) Dimer20+, (B) holo-M14+, (C) apo-M34+.]
Plate 1 Unfolding kinetics of iNOSCOD monitored by time-resolved ESI-MS. Shown in this figure are intensity-time profiles obtained for ionic signals representing the (A) native protein dimer, (B) monomeric, heme-containing reaction intermediates, and (C) unfolded, heme-free monomers. Red lines represent the result of a global fitting procedure according to Equation (7). Adapted with permission from ref. [84]. Copyright 2005 American Chemical Society. (For Black and White version, see page 117.)
CHAPTER 6
Thermodynamic Analysis of Protein Folding and Ligand Binding by SUPREX
Michael C. Fitzgerald, Liangjie Tang and Erin D. Hopper
Contents
1. Introduction
2. The SUPREX Protocol
3. Evaluation of Thermodynamic Parameters
3.1 $C^{1/2}_{\mathrm{SUPREX}}$ values
3.2 ΔGf and m-values
3.3 Two-state folding
3.4 Denatured state structure
3.5 EX2 exchange
3.6 The proline effect
4. Quantitative Analysis of Ligand Binding
4.1 Evaluation of dissociation constants
4.2 Relative binding free energies
5. Unique Applications
5.1 High-throughput screening
5.2 Thermodynamic analysis of proteins in multi-component mixtures
5.3 Analysis of protein folding intermediates
6. Conclusion
References
Comprehensive Analytical Chemistry, Volume 52. ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00206-7. © 2009 Elsevier B.V. All rights reserved.
1. INTRODUCTION
SUPREX (Stability of Unpurified Proteins from Rates of H/D Exchange) is an H/D exchange- and MALDI mass spectrometry-based technique for measuring the solution-phase thermodynamic properties of proteins and protein–ligand complexes [1,2]. The technique is designed to evaluate the free energy (i.e., ΔGf value) and m-value (i.e., δΔGf/δ[Denaturant]) of a protein’s overall folding reaction. These values are fundamental biophysical parameters that are widely used in the study of proteins as they provide important information about protein folding and function.
SUPREX is distinguished from other H/D exchange- and mass spectrometry-based approaches for the analysis of protein stability and ligand binding (e.g., the PLIMSTEX technique [3]) in its use of chemical denaturant. In SUPREX the H/D exchange properties of the globally protected amide protons are measured as a function of denaturant concentration. Specifically, the mass increase (i.e., Δmass) of a protein sample at different denaturant concentrations is measured after a specific H/D exchange time, and these Δmass values are used to determine the thermodynamic parameters of a protein’s folding/unfolding reaction. Amide protons that exchange through local unfolding reactions do not have denaturant-dependent exchange reactions. In contrast, globally protected amide protons exchange only when the protein globally unfolds; thus the exchange rates of these protons are denaturant dependent. The denaturant-dependent exchange of globally protected amide protons is the most important factor for SUPREX analysis, and it is this factor that enables the derivation of thermodynamic parameters from SUPREX data.
SUPREX is analogous to conventional chemical denaturation techniques that utilize various optical spectroscopies (e.g., circular dichroism (CD), fluorescence, or absorbance) for studying the equilibrium unfolding properties of proteins [4]. However, the mass spectrometry readout in SUPREX gives the technique important experimental advantages over conventional spectroscopy-based techniques. In particular, SUPREX is amenable to automation and high-throughput analyses. It is also useful for the analysis of milligram to nanogram quantities of both purified and unpurified protein samples.
These unique advantages make SUPREX an attractive technique for several important applications. One application in which SUPREX has proven especially useful has been in the thermodynamic analysis of protein–ligand binding interactions. A number of studies have established that reasonably accurate and precise binding free energies and dissociation constants can be determined by SUPREX [2,5–9]. This chapter is focused on applications of SUPREX to the thermodynamic analysis of protein folding and ligand binding. The requisite experimental protocols for SUPREX data acquisition and analysis are briefly described. A more detailed description of the data acquisition and analysis methods necessary to perform SUPREX was recently published [10]. Herein, we summarize the types of protein and protein–ligand systems that are amenable to SUPREX analysis and the thermodynamic parameters that can be extracted from such analyses. Unique applications of the technique are also highlighted.
2. THE SUPREX PROTOCOL
The basic SUPREX protocol is outlined in Figure 1. The experiment begins with the distribution of a protein or protein–ligand complex into a series of deuterated H/D exchange buffers that contain increasing concentrations of a chemical denaturant such as guanidinium chloride (GdmCl) or urea. The protein samples in the series of exchange buffers are allowed to undergo H/D exchange for a specified amount of time. After this specified H/D exchange time (a time that is the same for each protein-containing exchange buffer in the series), the mass of each deuterated protein sample is determined using MALDI mass spectrometry. Ultimately, the deuterium content (i.e., Δmass value) is determined for the protein in each denaturant-containing H/D exchange buffer, and these values are used to generate a SUPREX curve (i.e., a plot of Δmass versus [denaturant]) at a specific exchange time (Figure 2A). Very stable proteins and proteins that are complexed with tight-binding ligands require relatively high concentrations of denaturant to fully exchange their amide hydrogens for deuterons. Such proteins yield SUPREX curves with sigmoidal transitions that occur at higher denaturant concentrations than less stable proteins and proteins in the absence of tight-binding ligands.
[Figure 1: schematic. A fully protonated protein solution is distributed into deuterated exchange buffers of increasing [denaturant]; samples are combined with ice-cold, low-pH MALDI matrix and spotted on a MALDI sample target; the change in mass (Δmass) with increasing [denaturant] is monitored using MALDI-MS.]
Figure 1 Schematic representation of the basic SUPREX protocol.
The SUPREX experiment requires an initial 10-fold dilution of the fully protonated test protein into the deuterated exchange buffers. This dilution is important because it places the protein in a largely (~90%) deuterated environment. However, only a fraction of the protein in each buffer is ultimately taken on to the MALDI analysis. Sample economy and sensitivity can be improved by adding a concentration and desalting step to the basic protocol to recover a greater percentage of the protein in the H/D exchange buffers after the specified exchange time. Such a protocol has made it possible to generate a SUPREX curve using as little as 10 pmol of protein material [11]. To obtain accurate thermodynamic parameters using SUPREX, the folding reactions of protein samples in the deuterated H/D exchange buffers must be at equilibrium.
Proteins with folding/unfolding reactions that are slow to equilibrate once the protein is diluted from the protonated stock solution into the deuterated exchange buffers can be problematic in SUPREX. If the H/D exchange times used in SUPREX are not significantly longer than the time required for the protein to reach equilibrium, then a pre-equilibrium protocol must be used. The pre-equilibrium protocol is identical to the basic SUPREX protocol outlined in Figure 1 except that it includes an equilibration step in protonated denaturant-containing buffers before H/D exchange is initiated in the deuterated denaturant-containing buffers [12].
[Figure 2: (A) Δmass (Da) versus urea concentration (M); (B) −ΔGapp (kcal mol⁻¹) versus $C^{1/2}_{\mathrm{SUPREX}}$ (M urea).]
Figure 2 Representative SUPREX data acquired on a 12 kDa protein, the S-protein. (A) SUPREX curves for the S-protein using H/D exchange times of 10 (circles), 20 (triangles), and 30 min (squares). The solid lines represent the best fit of each SUPREX curve to Equation (1). The dotted lines mark the $C^{1/2}_{\mathrm{SUPREX}}$ values. (B) The ΔGapp versus $C^{1/2}_{\mathrm{SUPREX}}$ plot for the S-protein. The filled symbols indicate the data points from the SUPREX curves shown in (A). The open symbols indicate the data points from additional SUPREX curves (data not shown). The solid line is the result of linear least squares fitting of the data to Equation (2). Adapted with permission from ref. [2]. Copyright 2003 American Chemical Society.
3. EVALUATION OF THERMODYNAMIC PARAMETERS
3.1 $C^{1/2}_{\mathrm{SUPREX}}$ values
An important parameter associated with a protein’s SUPREX curve is the concentration of denaturant at the transition midpoint (i.e., the $C^{1/2}_{\mathrm{SUPREX}}$ value). The $C^{1/2}_{\mathrm{SUPREX}}$ value is typically obtained by fitting the data points in a SUPREX curve to a sigmoidal equation, such as Equation (1), using a nonlinear regression analysis [2].
$$\Delta\mathrm{Mass} = \Delta M_0 + \frac{a}{1 + e^{-\left([\mathrm{Denaturant}] - C^{1/2}_{\mathrm{SUPREX}}\right)/b}} \qquad (1)$$
In Equation (1), ΔM0 is the mass change of the protein measured before exchange of its globally protected amide hydrogens (i.e., the Δmass at the pretransition baseline of the SUPREX curve), a is the amplitude of the transition in daltons, [Denaturant] is the denaturant concentration, $C^{1/2}_{\mathrm{SUPREX}}$ is the concentration of denaturant at the transition midpoint of the curve, and b is related to the steepness of the transition.
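As an illustration of how Equation (1) can be used, the following Python sketch evaluates the sigmoid and recovers the transition midpoint from a synthetic SUPREX curve. It is not the regression procedure of ref. [2]: the grid search over the midpoint and steepness (with ΔM0 and a obtained by linear least squares at each grid point) is only a simple stand-in for a nonlinear regression routine, and the function names, grid ranges, and parameter values (ΔM0 = 42 Da, a = 16 Da, midpoint 1.5 M, b = 0.2 M) are hypothetical.

```python
import math


def delta_mass(denaturant, dm0, a, c_half, b):
    """Equation (1): delta-mass (Da) at a given denaturant concentration (M)."""
    return dm0 + a / (1.0 + math.exp(-(denaturant - c_half) / b))


def fit_suprex_curve(conc, dmass, c_grid, b_grid):
    """Grid search over (c_half, b); dm0 and a come from linear least squares."""
    n = len(conc)
    y_mean = sum(dmass) / n
    best = None
    for c_half in c_grid:
        for b in b_grid:
            # Sigmoid term of Equation (1) for this trial midpoint and steepness.
            s = [1.0 / (1.0 + math.exp(-(x - c_half) / b)) for x in conc]
            s_mean = sum(s) / n
            sxx = sum((si - s_mean) ** 2 for si in s)
            if sxx == 0.0:
                continue  # a flat trial curve carries no information about a
            a = sum((si - s_mean) * (yi - y_mean) for si, yi in zip(s, dmass)) / sxx
            dm0 = y_mean - a * s_mean
            sse = sum((dm0 + a * si - yi) ** 2 for si, yi in zip(s, dmass))
            if best is None or sse < best[0]:
                best = (sse, dm0, a, c_half, b)
    return best[1:]


# Synthetic, noise-free curve (hypothetical parameter values).
conc = [0.25 * i for i in range(17)]          # 0 to 4 M denaturant
dmass = [delta_mass(x, 42.0, 16.0, 1.5, 0.2) for x in conc]
c_grid = [0.05 * i for i in range(81)]        # trial midpoints, 0 to 4 M
b_grid = [0.05 * j for j in range(1, 21)]     # trial steepness, 0.05 to 1 M
dm0_fit, a_fit, c_half_fit, b_fit = fit_suprex_curve(conc, dmass, c_grid, b_grid)
```

At the midpoint the exponential term equals one, so the curve sits halfway up the transition, at ΔM0 + a/2. With real, noisy data a dedicated nonlinear least-squares routine would normally replace the grid search.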
3.2 ΔGf and m-values
The relationship between a protein’s ΔGf value (i.e., folding free energy) and the $C^{1/2}_{\mathrm{SUPREX}}$ value obtained in a SUPREX experiment can be described by Equation (2), which is derived in the appendix of ref. [2].
$$-RT\left[\ln\frac{\dfrac{\langle k_{\mathrm{int}}\rangle t}{0.693} - 1}{\dfrac{n^{n}}{2^{\,n-1}}\,[P]^{\,n-1}}\right] = m\,C^{1/2}_{\mathrm{SUPREX}} + \Delta G_f \qquad (2)$$
In Equation (2), m is defined as δΔGf/δ[Denaturant], R is the gas constant, T is the temperature in Kelvin, ⟨kint⟩ is the average intrinsic exchange rate of an unprotected amide proton, t is the H/D exchange time, n is the oligomeric state of the protein, and [P] is the protein concentration expressed in n-mer equivalents. The left side of the equality in Equation (2) represents the apparent folding free energy of the protein at each denaturant concentration and will hereafter be referred to as ΔGapp. A protein’s ΔGf and m-value are typically determined by (1) generating multiple SUPREX curves in which the H/D exchange time is varied (see Figure 2A), (2) extracting a $C^{1/2}_{\mathrm{SUPREX}}$ value for each curve, and (3) plotting the ΔGapp (calculated using the left side of the equality in Equation (2)) as a function of the $C^{1/2}_{\mathrm{SUPREX}}$ values recorded at different exchange times (see Figure 2B). Ultimately, a linear least squares analysis of the data in the resulting plot yields the equation of a line in which the slope and y-intercept correspond to m and ΔGf, respectively. In such ΔGf and m-value calculations, an estimation of the ⟨kint⟩ value is necessary; this value may be estimated using one of two methods. The first method uses the relationship ⟨kint⟩ = 10^(pH−5) min⁻¹; this is valid for experiments performed at room temperature and in buffers with a pH > 4 [1]. The ⟨kint⟩ value can also be estimated using the SPHERE program, which calculates an estimated ⟨kint⟩ value using the protein’s primary amino acid sequence, the temperature, the buffer pH, and experimentally determined amide H/D exchange rates in model peptides [13,14].
The derivation of Equation (2) (see ref. [2]) requires three important assumptions about the denaturant-induced protein unfolding reaction. These assumptions include the following: (1) the unfolding reaction is reversible and well-modeled by a two-state process (i.e., partially folded intermediate state(s) are not significantly populated), (2) the protein is under EX2 exchange conditions (i.e., the protein folding rate is faster than the intrinsic exchange rate of an unprotected amide proton), and (3) the protein’s denatured state is similar to a random coil conformation. Below is a discussion of the impact of each of these assumptions on the accuracy and precision of ΔGf and m-value determinations by SUPREX.
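The three-step procedure built on Equation (2) in Section 3.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' analysis software: ΔGf is taken as negative for a stable protein, ⟨kint⟩ comes from the 10^(pH−5) min⁻¹ relationship, and the function names, the pH of 7.4, the exchange times, and the "true" values (m = 1.9 kcal/mol/M, ΔGf = −6.0 kcal/mol) used to generate the synthetic midpoints are all hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def k_int_rule_of_thumb(pH):
    """Estimate <k_int> in min^-1 as 10^(pH - 5); room temperature, pH > 4."""
    return 10.0 ** (pH - 5.0)


def delta_g_app(k_int, t, temp_k=298.15, n=1, protein_conc=1.0):
    """Left side of Equation (2) in kcal/mol.

    k_int and t must use matching time units; protein_conc is in molar
    n-mer equivalents and only matters when n > 1.
    """
    numerator = k_int * t / 0.693 - 1.0
    if numerator <= 0.0:
        raise ValueError("<k_int> * t must exceed 0.693")
    denominator = (n ** n / 2 ** (n - 1)) * protein_conc ** (n - 1)
    return -R_KCAL * temp_k * math.log(numerator / denominator)


def linear_least_squares(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Synthetic monomeric example (all numbers hypothetical).
k_int = k_int_rule_of_thumb(7.4)        # min^-1
times = [10.0, 20.0, 30.0, 60.0]        # H/D exchange times in min
dg_app = [delta_g_app(k_int, t) for t in times]
true_m, true_dgf = 1.9, -6.0            # kcal/mol/M and kcal/mol
# Midpoints that Equation (2) would imply for these exchange times.
c_half = [(g - true_dgf) / true_m for g in dg_app]

# Step (3): slope gives m, y-intercept gives delta-G_f.
m_fit, dgf_fit = linear_least_squares(c_half, dg_app)
```

In a real analysis the midpoints would come from fits of measured SUPREX curves, not from the model itself; the round trip here only checks that the slope and intercept are read off in the sense stated in the text.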
3.3 Two-state folding
Two-state folding is an important prerequisite for the evaluation of ΔGf and m-values by any chemical denaturation-based technique [4]. SUPREX curves and conventional denaturation curves for non-two-state folding proteins typically have either multiple transitions or one broad transition. Often additional biophysical data (such as consistent ΔGf and m-values from thermodynamic and kinetic data or coincident unfolding transitions using multiple structural probes) are needed to validate the two-state folding behavior of a protein. In the case of non-two-state folding proteins, accurate ΔGf and m-values cannot be measured by SUPREX or by conventional techniques. However, the transition midpoints of non-two-state folding proteins recorded by SUPREX (and by conventional techniques) can serve as qualitative measures of protein stability and ligand binding.
Listed in Table 1 are examples of two-state folding proteins for which SUPREX-derived ΔGf and m-values have been measured. For comparison, the ΔGf and m-values determined by other techniques (e.g., CD, fluorescence, or differential scanning calorimetry (DSC)) are also tabulated. The SUPREX-derived ΔGf and m-values for the first nine proteins summarized in Table 1 are in good agreement with the results derived using other techniques (i.e., differences are ≤15%). The precision of the SUPREX measurements in Table 1 is also comparable to that of other techniques (standard errors of fitting are generally on the order of 10% or less).
3.4 Denatured state structure
The last five proteins in Table 1 have SUPREX-derived m-values that agree well with those derived by conventional techniques. However, the SUPREX-derived ΔGf values of these five proteins are significantly larger (i.e., >15% larger) than the CD or fluorescence titration results obtained under similar conditions. One cause for such a systematic error in SUPREX can be the use of an aberrantly high ⟨kint⟩ value.
Table 1 Thermodynamic parameters obtained on two-state folding proteins
Protein      Method         m-value (kcal/mol/M)   ΔGf (kcal/mol)^a
Coil-VaLd    CD             2.7^b                  18.4^b
             SUPREX         2.6 ± 0.6^c            18.4 ± 2.0^c
GCN4p1       CD             1.9^d                  10.5^d
             SUPREX         2.0 ± 0.4^c            9.3 ± 0.9^c
Ubiquitin    CD             2.1 ± 0.2^e            8.5 ± 0.3^e
             SUPREX         1.9 ± 0.1^e            9.4 ± 0.2^e (9.0)
ArcR         CD             3.6 ± 0.1^f            10.6 ± 0.4^f
             SUPREX         3.8 ± 0.4^c            11.4 ± 0.2^c (10.8)
λ6-85        CD             2.1^g                  4.4 ± 0.2^g
             SUPREX         2.1 ± 0.1^g            5.0 ± 0.4^g
4-OT         CD             11.8 ± 1.4^h           61.7 ± 3.8^h
             SUPREX         11.1 ± 1.8^c           66.8 ± 6.6^c
TrpR         CD             2.1^i                  19.2 ± 1.1^i
             SUPREX         –                      20.4 ± 0.4^j
B1-domain    DSC, CD        –                      5.4^k, 3.8 ± 0.2^l
             SUPREX         –                      5.1 ± 0.4^j
Eglin C      Fluorescence   1.9 ± 0.1^m            6.8 ± 0.1^m
             SUPREX         2.1 ± 0.1^m            8.1 ± 0.1^m (7.4)
Protein L    CD             1.8 ± 0.1^n            4.7 ± 0.2^n
             SUPREX         2.0 ± 0.1^n            6.0 ± 0.1^n
abl-SH3      Fluorescence   0.6^o                  3.6^o
             SUPREX         0.6 ± 0.1^p            5.8 ± 0.2^p (5.4)
CI2          Fluorescence   1.8 ± 0.0^q            7.0 ± 0.2^q
             SUPREX         1.8 ± 0.1^m            8.9 ± 0.1^m (8.1)
RNase S      Fluorescence   1.3 ± 0.1^p            2.1 ± 0.1^p
             SUPREX         1.1 ± 0.1^p            5.2 ± 0.1^p (2.9)
RNase A      CD             2.8 ± 0.2^m            8.1 ± 0.4^m
             Fluorescence   2.5 ± 0.1^m            7.5 ± 0.2^m
             SUPREX         2.4 ± 0.1^m            11.4 ± 0.2^m (9.1)
a All values in parentheses are corrected for the proline effect under the assumption that each proline position attains a cis/trans ratio of 1/4 in the equilibrium unfolded state [21]. See text for explanation of correction factors.
b From ref. [36]. c From ref. [37]. d From ref. [38]. e From ref. [12]. f From ref. [39]. g From ref. [1]. h From ref. [40]. i From ref. [41]. j From ref. [6]. k From ref. [42]. l From ref. [43]. m From ref. [18]. n From ref. [44]. o From ref. [45]. p From ref. [2]. q From ref. [46].
The ⟨kint⟩ value is a critical parameter for accurate determination of ΔGf values in SUPREX. This value must be accurately approximated to obtain SUPREX-derived ΔGf values that are in good agreement with the data yielded by conventional spectroscopy-based techniques. The methods used for approximating ⟨kint⟩ values for SUPREX analysis (see Section 3.2) both assume that the denatured state of the protein is similar to a random coil configuration. The existence of partial structure in the unfolded state of a protein will cause an overestimation of the ⟨kint⟩ value, thus resulting in the calculation of a larger (i.e., more negative) ΔGf value in the SUPREX analysis.
Interestingly, there is experimental evidence for partial structure in the denatured states of four of the last five proteins in Table 1. An NMR study on the denatured state of protein L revealed that the chemical shifts of some residues were significantly different from their random coil values, suggesting that some residual structure was present in the denatured state of protein L [15]. NMR experiments were also performed on chymotrypsin inhibitor 2 (CI2) [16]. The denatured state of CI2 was observed to be highly unfolded; however, there was some residual native helical structure along with hydrophobic clustering in the center of the chain. Residual structure was also detected in the denatured states of RNase A and RNase S using H/D exchange-, NMR-, and calorimetry-based techniques [17]. The chemical-induced denatured state for abl-SH3 has not been studied in detail. The results in Table 1 provide intriguing evidence that the presence of structured denatured states may be a source of large discrepancies between ΔGf values derived using SUPREX and conventional spectroscopy-based techniques. It has also been proposed that such discrepancies may be useful for the biophysical characterization of structured denatured states [18].
3.5 EX2 exchange
Another important assumption in quantitative analyses of ΔGf and m-values using SUPREX is that the protein exhibits so-called EX2 exchange behavior (i.e., the protein refolding rate is significantly faster than the intrinsic exchange rate of an unprotected amide proton) [1,2]. Under EX2 exchange conditions, only one population of deuterated protein molecules is typically detected by mass spectrometry during the course of H/D exchange. In contrast, under EX1 exchange conditions, two distinct populations of deuterated protein molecules are typically detected by mass spectrometry during the course of H/D exchange: one in which none of the globally protected amide protons are exchanged, and one in which all of the globally protected amide protons are exchanged. Thus, in some cases it is possible to detect non-EX2 exchange behavior by the presence of two distinct protein ion signals in the MALDI readout of SUPREX. Such behavior is exclusively detected in the transition region of a SUPREX curve [7]. However, it is important to note that the sensitivity and resolving power of the mass spectral analysis can often preclude the detection of EX1 behavior.
Recently, we have shown that non-EX2 exchange behavior can also be detected by careful examination of SUPREX data [19]. Under EX2 exchange conditions, the plots used to extract ΔGf and m-values using Equation (2) should be linear.
[Figure 3: plot of −ΔGapp (kcal mol⁻¹) versus [Denaturant] (M).]
Figure 3 Theoretical SUPREX data showing the behavior expected from a protein exclusively in the EX2 exchange regime (circles) and from a protein not exclusively in the EX2 exchange regime (triangles). Both data sets are from theoretical proteins with a ΔGf value of 9.5 kcal/mol. The circles represent data from a protein with a refolding rate constant (kcl) of 10² s⁻¹ and a kcl denaturant dependence (mcl) of 0.5 kcal/mol/M, parameters that put the protein exclusively in the EX2 exchange regime. The triangles represent data from a theoretical protein with a kcl of 10² s⁻¹ and a mcl of 1.5 kcal/mol/M, parameters that do not put the protein exclusively in the EX2 exchange regime (particularly at [denaturant] > 2 M). Adapted with permission from ref. [19].
However, non-EX2 exchange behavior results in a pronounced nonlinearity in these plots (Figure 3). It is also noteworthy that such nonlinearities due to non-EX2 exchange behavior have been found to occur most often at denaturant concentrations > 2 M. Non-EX2 exchange conditions caused by the presence of denaturant can be minimized by employing longer H/D exchange times to allow the transition midpoints of the SUPREX curves to reside at lower concentrations of denaturant (i.e., < ~2 M). Protein folding rates are also generally not as sensitive to pH as ⟨kint⟩ values, which decrease by roughly an order of magnitude for every unit of decrease in pH between 14 and 4 [20]. Thus, non-EX2 exchange behavior can often be avoided by working at a lower pH.
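The effect of pH on the EX2 requirement can be made concrete with a short Python sketch. It assumes that ⟨kint⟩ follows the 10^(pH−5) min⁻¹ relationship of Section 3.2 and that the refolding rate constant is unaffected by pH; the value of 10² s⁻¹ is the kcl read from the Figure 3 caption and applies in the absence of denaturant, while the function names and the pH values are hypothetical. The sketch does not model the denaturant dependence of kcl.

```python
def k_int_per_second(pH):
    """<k_int> from the relationship 10^(pH - 5) min^-1, converted to s^-1."""
    return 10.0 ** (pH - 5.0) / 60.0


def ex2_margin(k_cl_per_second, pH):
    """Ratio of refolding rate to intrinsic exchange rate; EX2 needs this >> 1."""
    return k_cl_per_second / k_int_per_second(pH)


k_cl = 1.0e2  # s^-1, refolding rate constant without denaturant (Figure 3 value)
margins = {pH: ex2_margin(k_cl, pH) for pH in (5.0, 6.0, 7.0)}
```

Under these assumptions each unit decrease in pH widens the margin between refolding and intrinsic exchange tenfold, which is the sense in which working at lower pH favors EX2 behavior.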
3.6 The proline effect The cis–trans isomerization of proline residues can lead to differences in stability measurements made using H/D exchange- and non-H/D exchange-based methods [21]. When proline residues are present in the protein’s primary amino acid sequence, the unfolded state probed by the H/D exchange experiment is higher in energy than the unfolded state probed in non-H/D exchange experiments. This is because the amide H/D exchange time is short compared to the time it takes the Xxx-Pro bonds in an unfolded polypeptide to reach their
lowest free energy state (i.e., with the Xxx-Pro bonds fully equilibrated in their cis and trans states). In the unfolded state, the cis–trans equilibrium of Xxx-Pro bonds favors the trans conformation; thus, folded protein structures that contain cis prolines require the greatest correction of folding free energy in H/D exchange analyses. Under the assumption that each proline residue in a protein’s polypeptide chain attains a cis–trans ratio of 1/4 in the unfolded state, correction terms of 1.0 and 0.135 kcal/mol/residue have been proposed for cis and trans prolines, respectively [21]. Thus, if the three-dimensional structure of the native protein is known, then the above correction factors provide a means for adjusting ΔGf values derived by H/D exchange methods so that they can be better compared to the same values obtained using non-H/D exchange-based methods (see parenthetical values in Table 1). However, it is important to note that if the cis–trans ratio is not 1/4 in a protein’s unfolded state, then a different correction factor may be required.
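The size of the proline correction for a given protein follows directly from the per-residue terms quoted above. The Python sketch below only totals the correction; the function name is hypothetical, and the direction in which the total is applied to an H/D exchange-derived ΔGf value depends on the sign convention in use, so that step is deliberately left out.

```python
CIS_CORRECTION = 1.0      # kcal/mol per cis proline, value quoted from [21]
TRANS_CORRECTION = 0.135  # kcal/mol per trans proline, value quoted from [21]


def proline_correction(n_cis, n_trans):
    """Total proline correction (kcal/mol) for a native structure.

    Valid only under the stated assumption of a 1/4 cis-trans ratio
    for every proline in the unfolded state.
    """
    if n_cis < 0 or n_trans < 0:
        raise ValueError("proline counts must be non-negative")
    return n_cis * CIS_CORRECTION + n_trans * TRANS_CORRECTION


if __name__ == "__main__":
    # Hypothetical protein with one cis and four trans prolines.
    print(proline_correction(1, 4))
```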
4. QUANTITATIVE ANALYSIS OF LIGAND BINDING
An important application of SUPREX has been the quantitation of protein–ligand binding affinities. Quantitative SUPREX analyses of protein–ligand binding rely on the measurement of a protein’s ΔGf and m-value in the absence and presence of a ligand (Figure 4). This ultimately makes possible the calculation of a ΔΔGf value between the protein and protein–ligand complex (i.e., a binding free energy for the ligand). Described below is the use of SUPREX to evaluate the dissociation constants (i.e., Kd values) and relative binding free energies for different ligands to a given protein (i.e., ΔΔGf,Rel values).
4.1 Evaluation of dissociation constants
In cases where a protein has one or more independent ligand binding sites of equal affinity, Equation (3) [22] can be used to calculate a Kd value from a SUPREX-derived ΔΔGf value.

$$K_d = \frac{[L]}{e^{-\Delta\Delta G_f / NRT} - 1} \tag{3}$$
In Equation (3), [L] is the concentration of free ligand, N is the number of independent binding sites, and ΔΔGf is the change in folding free energy upon ligand binding. For SUPREX ligand binding experiments in which the total ligand concentration is greater (e.g., at least 10-fold greater) than the protein concentration, the free ligand concentration ([L] in Equation (3)) can be estimated as the total ligand concentration [23]. If the ligand concentration is not in large excess over the protein concentration, Equation (4) must be used to calculate Kd values from
SUPREX-derived ΔΔGf values [24].

$$K_d = \frac{4 L_{total}\, e^{-\Delta\Delta G_f / NRT} - 4 P_{total}\left(e^{-\Delta\Delta G_f / NRT} - 1\right)}{\left(2 e^{-\Delta\Delta G_f / NRT} - 1\right)^2 - 1} \tag{4}$$

In Equation (4), L_total and P_total are the total ligand and total protein concentrations, respectively.

[Figure 4: panel (A), ΔMass (Da) versus urea (M); panel (B), −ΔGapp (kcal mol^-1) versus C^{1/2}_SUPREX (M urea).]
Figure 4 SUPREX data acquired in a typical protein–ligand binding analysis. (A) Representative SUPREX curves for the abl-SH3 domain (2 mM) in the absence of ligand using an H/D exchange time of 7.5 min (triangles) and in the presence of a peptide ligand (180 mM) using an H/D exchange time of 10 min (circles). The solid lines represent the best fit of each SUPREX curve to Equation (1). The dotted lines mark the C^{1/2}_SUPREX values. (B) The ΔGapp versus C^{1/2}_SUPREX plot obtained for the abl-SH3 domain in the absence (circles) and in the presence (triangles) of the peptide ligand. Filled symbols indicate the data points from the SUPREX curves shown in (A). The open symbols denote data points from additional SUPREX curves (data not shown). The solid lines are the results of linear least squares fitting of the data to Equation (2). Adapted with permission from ref. [2]. Copyright 2003 American Chemical Society.
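Equations (3) and (4) are straightforward to evaluate numerically. The following Python sketch is one way to do so; it assumes that ΔΔGf is negative when the ligand stabilizes the protein (so that the exponential term exceeds 1) and that concentrations are supplied in consistent units. The function names and the 298.15 K default temperature are hypothetical choices made for this illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def _binding_factor(ddg_f, n_sites, temp_kelvin):
    """Return exp(-ddG_f / NRT); must exceed 1 for a stabilizing ligand."""
    x = math.exp(-ddg_f / (n_sites * R_KCAL * temp_kelvin))
    if x <= 1.0:
        raise ValueError("ddG_f must be negative (ligand stabilizes the protein)")
    return x


def kd_excess_ligand(ddg_f, free_ligand, n_sites=1, temp_kelvin=298.15):
    """Equation (3): valid when ligand is in large excess over protein."""
    x = _binding_factor(ddg_f, n_sites, temp_kelvin)
    return free_ligand / (x - 1.0)


def kd_general(ddg_f, ligand_total, protein_total, n_sites=1, temp_kelvin=298.15):
    """Equation (4): uses total ligand and total protein concentrations."""
    x = _binding_factor(ddg_f, n_sites, temp_kelvin)
    numerator = 4.0 * ligand_total * x - 4.0 * protein_total * (x - 1.0)
    denominator = (2.0 * x - 1.0) ** 2 - 1.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical inputs: ddG_f = -2.0 kcal/mol, concentrations in M.
    print(kd_excess_ligand(-2.0, 180e-6))
    print(kd_general(-2.0, 180e-6, 2e-6))
```

When the protein concentration is negligible compared with the ligand concentration, the two functions return nearly the same value, which is a convenient check that Equation (4) reduces to Equation (3) in the large-excess limit.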
Summarized in Table 2 are some of the model protein–ligand systems characterized to date by SUPREX. Overall, the SUPREX-derived dissociation constants (Kd values) are in good agreement (i.e., generally within 5-fold) with those derived by other techniques. SUPREX enables the evaluation of a wide range of Kd values. For example, the values in Table 2 range from 0.1 nM to 1.3 mM. SUPREX is also amenable to the analysis of a wide variety of protein–ligand complexes including those that contain peptides, nucleic acids, small molecules, and proteins as ligands.
It is noteworthy that not all of the protein systems in Table 2 are two-state folders. For example, CaM and BCAII are known to have non-two-state folding properties [25–27], and it is not clear whether or not CypA, MoaE, PKCθ, and MltB are two-state folders as the necessary biophysical studies on these proteins have not been performed. For non-two-state folding proteins, the SUPREX-derived ΔGf and m-values do not accurately reflect the biophysical properties of their folding reactions. Despite this, these values are still useful for comparative purposes (i.e., the SUPREX-derived ΔGf values could be used to generate the ΔΔGf values needed to evaluate Kd values). For example, a meaningful ΔΔGf value for the binding of melittin to calcium-loaded CaM could be ascertained from SUPREX-derived ΔGf values obtained on calcium-loaded CaM in the presence and absence of melittin (see Table 2) [24].
All of the SUPREX-derived Kd values tabulated in Table 2 are for ligands with one or more independent, equivalent binding sites on the target protein. Determining the Kd values for multiple, independent, nonequivalent ligand binding sites in a protein using SUPREX-derived ΔΔGf values is challenging. In theory, such Kd values could be determined by measuring ΔΔGf values as a function of ligand concentration and fitting the data to an appropriate binding polynomial. In practice, extracting multiple Kd values from such a data fitting would require greater precision in SUPREX-derived ΔΔGf values than is typically observed (±10–20%). Nonetheless, an overall ΔΔGf value evaluation of protein–ligand binding for systems with multiple, nonequivalent binding sites is possible using SUPREX.
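The text does not specify the binding polynomial. As a hedged illustration, the Python sketch below assumes the simplest case of independent sites, for which the polynomial is a product of one factor (1 + [L]/Kd,i) per site; with N equal Kd values this reduces to the relationship underlying Equation (3). The function name, the sign convention (ΔΔGf negative for stabilization), and the 298.15 K default are assumptions made here.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_f_independent_sites(free_ligand, kd_values, temp_kelvin=298.15):
    """Predicted ddG_f (kcal/mol) for independent, possibly nonequivalent sites.

    Assumes a binding polynomial that is the product of (1 + [L]/Kd_i).
    """
    rt = R_KCAL * temp_kelvin
    return -rt * sum(math.log(1.0 + free_ligand / kd) for kd in kd_values)


if __name__ == "__main__":
    # Hypothetical two-site protein with Kd values of 1 uM and 50 uM.
    for ligand in (1e-6, 1e-5, 1e-4):
        print(ligand, ddg_f_independent_sites(ligand, [1e-6, 50e-6]))
```

Fitting such a curve to measured ΔΔGf values would in principle return the individual Kd values, subject to the precision limits noted above.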
Table 2 SUPREX-derived dissociation constants for selected model protein systems. All proteins are known two-state folders except where noted

Protein complex | SUPREX-derived Kd | Literature Kd
ArcR-L (DNA fragment 1) | 0.3 ± 0.2 nM^a | 0.3 ± 0.2 nM^b
ArcR-R (DNA fragment 2) | 1.0 ± 0.6 nM^a | 1.5 ± 0.9 nM^b
ArcR-NS (DNA fragment 3) | 0.3 ± 0.2 nM^a | 1.3 ± 0.4 mM^b
B1 domain+Fc frag | 0.52 ± 0.14 mM^c | 0.24 mM^d
S-protein+S-peptide | 2.4 ± 0.6 nM^c | 1.1 nM^e
TrpR+W | 130 ± 20 mM^c | 42 mM^f
TrpR+W+DNA | 0.16 ± 0.09 nM^c | 0.25 nM^g
FIV Pr+TL3 | 520 ± 330 nM^c | 41 nM^h
Abl SH3+peptide 1 | 57 ± 25 mM^i | 118 mM^i
Abl SH3+peptide 2 | 28 ± 8 mM^i | 28.8 mM^i
Abl SH3+peptide 3 | 25 ± 8 mM^i | 24.1 mM^i
Abl SH3+peptide 4 | 5.2 ± 2.4 mM^i | 5.6 mM^i
S-protein+peptide 6 | 250 ± 110 mM^i | 16 mM^j
S-protein+peptide 7 | 9.7 ± 3.0 mM^i | ~7 mM^i
S-protein+peptide 8 | 400 ± 70 nM^i | 41 nM^k
S-protein+peptide 9 | 34 ± 12 nM^i | 6.3 nM^i
CypA+CsA^l | 77 ± 17 (32 ± 20) nM^m | 30–200 nM^m
CaM-Ca(II)-Melittin^n | 53 ± 20 nM^o | 110^p/18.4^q nM
BCAII-Zn(II)-SULFA^n | 93 ± 49 mM^o | 71 mM^r
BCAII-Zn(II)-CBS^n | 2.6 ± 1.0 mM^o | 0.730/0.760 mM^s
MoaE+MoaD^l | 17 ± 7 mM^t | 4.5 mM^u
PKCθ+Inhibitor 1^l | 0.1 nM^v | 0.2 nM (IC50)^v
PKCθ+Inhibitor 12^l | 0.6 nM^v | 1.0 nM (IC50)^v
PKCθ+Inhibitor 24^l | 16.7 nM^v | 20.1 nM (IC50)^v
MltB+MurNAc^l | 1.30 ± 0.283 mM^w | 1.67 ± 0.156 mM^w
MltB+MurNAc-dipeptide^l | 1.05 ± 0.356 mM^w | 0.309 ± 0.160 mM^w

^a From ref. [5]. ^b From ref. [47]. ^c From ref. [6]. ^d From ref. [48]. ^e From ref. [49]. ^f From ref. [50]. ^g From ref. [51]. ^h From ref. [52]. ^i From ref. [2]. ^j From ref. [53]. ^k From ref. [54]. ^l Not known whether or not the protein is a two-state folder. ^m From ref. [9]. ^n Protein is known non-two-state folder. ^o From ref. [24]. ^p From ref. [55]. ^q From ref. [3]. ^r From ref. [56]. ^s From ref. [57]. ^t From ref. [8]. ^u From ref. [31]. ^v From ref. [58]. ^w From ref. [59].

[Figure 5: panels (A) and (B), −ΔGapp (kcal mol^-1) versus C^{1/2}_SUPREX (M GdmCl).]
Figure 5 SUPREX analysis of Bcl-xL in the absence and presence of ligands. The ΔGapp versus C^{1/2}_SUPREX plot obtained for (A) Bcl-xL in the absence of ligand, and (B) Bcl-xL in the presence of either Bak 1 (open circles) or Bak 2 (closed circles). The solid lines are the results of linear least squares fitting of the data to Equation (2). The R^2 value for apo-Bcl-xL in (A) is 0.5790, indicating that the apoprotein is not well described by Equation (2) (see text). In contrast, the R^2 values for Bcl-xL-Bak 1 and Bcl-xL-Bak 2 in (B) are 0.9166 and 0.9120, respectively. Adapted with permission from ref. [24]. Copyright 2007 American Chemical Society.

4.2 Relative binding free energies
In some cases, SUPREX data for a protein–ligand complex is well described by Equation (2), but data on the protein alone is not well described by Equation (2) (Figure 5). In such cases, Kd value calculations are not possible because a ΔGf value for the protein cannot be determined. Consequently, the binding free energy (ΔΔGf) cannot be determined. Poor fits of the SUPREX data to Equation (2) can occur when the H/D exchange behavior of the protein is dominated by local unfolding events rather than global unfolding events [24]. Often, the presence of a tight-binding ligand can induce the H/D exchange behavior of the protein to be dominated by a global unfolding event (see Figure 5B). In such cases, having multiple known ligands for the protein can make possible the evaluation of relative binding free energies. If the first ligand has a unique binding site from the other ligands, then Kd measurements are still possible. This was the case for CaM and BCAII (see Table 2). Neither apo-CaM nor apo-BCAII yielded SUPREX data that was well
described by Equation (2) [24]. However, the metal-loaded proteins (i.e., calcium-loaded CaM and zinc-loaded BCAII) yielded SUPREX data that was well described by Equation (2). Thus, the metal ions served to reduce local unfolding events and allow a global unfolding event to dominate the H/D exchange behavior of these proteins. This allowed for Kd measurements for the CaM and BCAII ligands listed in Table 2.
In cases where the first ligand occupies the same binding site as the other ligand(s), Kd measurements are not possible, but relative binding free energies can still be measured. The anti-apoptotic protein Bcl-xL is one such example (Table 3 and Figure 5). Similar to CaM and BCAII, multiple ligands for Bcl-xL are known. However, unlike CaM and BCAII, the ligands of interest for Bcl-xL all bind to the same site on the protein [24]. Thus, a dissociation constant could not be calculated for the second ligand in the presence of the first ligand. However, it was possible to ascertain the relative binding affinities of the two ligands (see Table 3) [24]. These relative binding affinities were in reasonable agreement with the binding affinities estimated based on their relative Ki values [24].
In another example, SUPREX was used to analyze the binding of synergistic anions to ferric-binding protein (FbpA), a bacterial iron transport protein [7]. FbpA requires the binding of a synergistic anion to transport Fe³⁺ across the periplasm. SUPREX analysis of Fe³⁺FbpA in the absence of a synergistic anion was not feasible due to the difficulty of preparing an anion-free Fe³⁺FbpA sample. Thus, Kd measurements were not possible. However, even though Kd values could not be measured for Fe³⁺FbpA-synergistic anion complexes, relative binding free energies for a series of synergistic anions could be measured [7]. This information allowed for the relative binding affinities of different synergistic anions to be quantified (see Table 3).

Table 3 SUPREX-derived relative binding free energies

Protein complex | ΔGf (kcal/mol) | ΔΔGf,Rel (kcal/mol) | Literature ΔΔGf,Rel (kcal/mol)
FbpA+PO4^a | 11.1 ± 0.4 | 0^b | 0
FbpA+Cit^a | 9.7 ± 0.2 | 1.5 ± 0.4^b | 1.98
FbpA+AsO4^a | 10.5 ± 0.2 | 0.6 ± 0.5^b | 0.68
FbpA+SO4^a | 9.7 ± 0.1 | 1.4 ± 0.4^b | –
Bcl-xL-Bak1^c | 11.5 ± 0.1 | 0^d | 0
Bcl-xL-Bak2^c | 13.1 ± 0.2 | 1.6 ± 0.2^d | 2

^a From ref. [7]. ^b Value relative to FbpA+PO4. ^c From ref. [24]. ^d Value relative to Bcl-xL-Bak1.
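A relative binding free energy of the kind listed in Table 3 is simply the difference between two SUPREX-derived ΔGf values. The Python sketch below computes that difference together with an uncertainty; combining the two errors in quadrature is an assumption made here (the source does not state how the tabulated errors were obtained), and the function name is hypothetical. The example inputs are the Bcl-xL values as printed in Table 3.

```python
import math


def relative_binding_free_energy(dg_ligand, sd_ligand, dg_reference, sd_reference):
    """Return (ddG_rel, uncertainty) for a ligand relative to a reference ligand.

    Both dG_f values must use the same sign convention. The uncertainty is
    propagated in quadrature, which assumes independent errors.
    """
    ddg_rel = dg_ligand - dg_reference
    uncertainty = math.sqrt(sd_ligand ** 2 + sd_reference ** 2)
    return ddg_rel, uncertainty


if __name__ == "__main__":
    # Bcl-xL-Bak2 relative to Bcl-xL-Bak1, values as printed in Table 3.
    ddg, sd = relative_binding_free_energy(13.1, 0.2, 11.5, 0.1)
    print(round(ddg, 1), round(sd, 1))
```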
5. UNIQUE APPLICATIONS
SUPREX has several important experimental advantages over conventional biophysical methods (e.g., spectroscopy or calorimetry) for making
thermodynamic measurements on protein folding reactions. These advantages include the capacity for (1) high-throughput analysis, (2) analysis of small amounts of material, (3) analysis of unpurified proteins (i.e., proteins in multicomponent biological mixtures), and (4) measurement of free energies across a wide range of denaturant concentrations. The following are several applications of SUPREX that have exploited one or more of these advantages.
5.1 High-throughput screening
A high-throughput screening (HTS) strategy utilizing a single-point SUPREX protocol has been developed [28]. This strategy can be used for the rapid detection of protein–ligand binding events. It is based on the increase in protein stability that occurs upon ligand binding and therefore has several advantages over conventional HTS techniques. First, the technique does not require covalent labeling and thus is not susceptible to the alteration of ligand binding behavior that sometimes occurs in the presence of labels. Second, no immobilization of the compound or the target protein is necessary; this helps to prevent some of the complications in ligand binding behavior that occur in some HTS techniques. Third, no time-consuming separations are required. Finally, SUPREX is amenable to the analysis of multiple classes of ligands (e.g., small molecules, nucleic acids, peptides, and proteins) over a wide range of binding affinities, which makes the technique more flexible than many other HTS strategies.
The single-point SUPREX protocol involves the acquisition of Δmass data at a single denaturant concentration rather than at a range of different denaturant concentrations (as is done in a typical SUPREX analysis). The single point is chosen so that the magnitude of the Δmass value reveals whether the compound is interacting with the target protein (see dotted line in Figure 6A). If a ligand interacts with the target protein, a low Δmass value will be measured. However, if a ligand does not interact with the target protein, a high Δmass value will be measured. Thus, in a single-point SUPREX assay, compounds with a low Δmass are designated as hits (see Figure 6B). Subsequent SUPREX analyses involving the acquisition of complete SUPREX curves can be performed for hit validation.
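The hit-calling step of a single-point screen can be sketched in a few lines of Python. The cutoff rule used below (mean of the negative controls minus three standard deviations) is an assumption chosen for illustration and is not taken from ref. [28]; the function name, sample identifiers, and Δmass values are likewise hypothetical.

```python
from statistics import mean, stdev


def call_hits(delta_mass_by_sample, control_delta_masses, n_sd=3.0):
    """Flag samples whose delta-mass is well below the no-ligand controls.

    Returns (sorted list of hit IDs, cutoff in Da). A low delta-mass means
    fewer deuterons were incorporated, i.e. the protein was stabilized.
    """
    cutoff = mean(control_delta_masses) - n_sd * stdev(control_delta_masses)
    hits = [sid for sid, dm in delta_mass_by_sample.items() if dm < cutoff]
    return sorted(hits), cutoff


if __name__ == "__main__":
    controls = [65.0, 66.0, 64.0, 65.5]  # hypothetical negative controls, Da
    samples = {"c1": 64.8, "c2": 45.2, "c3": 63.9, "c4": 50.1}
    print(call_hits(samples, controls))
```

Compounds flagged in this way would still need full SUPREX curves for validation, as described above.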
The feasibility of the single-point SUPREX assay has been demonstrated in a proof-of-concept study in which a one-bead, one-compound library was screened against the S-protein [28]. In this study, four model peptide libraries were screened against the protein target. The first library demonstrated the ability of the technique to select for ligands with a range of binding affinities. Screening the remaining three libraries showed that single-point SUPREX was effective at detecting a tight-binding ligand in a library containing many weak-binding ligands. Unlike many conventional HTS assays, single-point SUPREX does not rely on the exploitation of a unique characteristic of the target protein, and it is amenable to screening virtually any type of ligand library (e.g., small molecule, peptide, nucleic acid, etc.). Thus, it is a relatively general screening technique. However, since this technique is based on SUPREX, several of the SUPREX assumptions still apply. For example, the exchange behavior of the target protein should be in the
EX2 regime, and the protein folding reaction must be reversible. The two-state assumption is not important for this assay as long as the H/D exchange behavior is dominated by global unfolding events.

[Figure 6: panel (A), ΔMass (Da) versus [Denaturant] (M), with curves labeled Protein (No ligand), 0.3 μM Kd, and 0.03 μM Kd; panel (B), ΔMass (Da) versus Sample Number.]
Figure 6 High-throughput screening (HTS) for protein–ligand binding using SUPREX. (A) Theoretical SUPREX curves for a model protein in the absence of ligand (left), in the presence of a ligand with a dissociation constant of 0.3 μM (middle), and in the presence of a ligand with a dissociation constant of 0.03 μM (right). The dotted line denotes a denaturant concentration that could be used in a single-point SUPREX screening experiment. (B) Single-point SUPREX screening results obtained at the designated [denaturant]. Bars marked by an asterisk represent the negative controls (protein in the absence of ligand), and black bars represent the hits. Adapted with permission from ref. [28]. Copyright 2004 American Chemical Society.
5.2 Thermodynamic analysis of proteins in multi-component mixtures
Several recent applications have exploited the ability of SUPREX to analyze unpurified proteins, particularly proteins in multi-component mixtures [8,32]. SUPREX can be performed on multiple protein components of a mixture as long as the ion signal for each component can be resolved in the MALDI readout. This is in
contrast to conventional spectroscopy-based techniques, which require the use of highly purified proteins. In addition to the ability to analyze complex mixtures, SUPREX has the additional advantage of not requiring labels for the analysis of intermolecular interactions. Thus, SUPREX can be used to quantify binding affinities for protein–protein interactions without the use of covalent labels. The technique is also amenable to the analysis of proteins in complex biological mixtures such as cell lysates. These unique advantages of SUPREX have made possible several measurements that were previously experimentally inaccessible.
5.2.1 Molybdopterin synthase
SUPREX was used to measure the binding affinities of the different subunits in the Escherichia coli molybdopterin (MPT) synthase heterotetramer [8]. The MPT synthase enzyme is composed of a central homodimer of MoaE subunits, each of which is bound to one MoaD subunit. MPT synthase is activated by the addition of a thiocarboxylate moiety to each MoaD subunit (MoaD-SH). SUPREX was used to measure the binding affinity of MoaE to MoaD and MoaD-SH. This was the first measurement of the binding affinity between the subunits in activated MPT synthase (i.e., MoaE-MoaD-SH), and it revealed that the subunits were more tightly bound when the enzyme was activated. This observation is consistent with the proposed mechanism of MPT synthase [29–31].
5.2.2 Cytoplasmic dynein light chain-intermediate chain complex
SUPREX has also been used to evaluate the thermodynamic properties of cytoplasmic dynein, a large protein complex that consists of four homodimeric protein components [32]. These components include the heavy chains, the intermediate chains, the light intermediate chains, and the light chains. Two of the light chain components, LC8 and TcTex1, interact with the intermediate chains; however, the weak dimerization of LC8 [33] has made thermodynamic analysis of the complex difficult [32]. Also, the multi-component nature of the complex makes this system nearly impossible to study using conventional spectroscopy-based methods. However, using SUPREX, the binding free energies for LC8 interacting with both an intermediate chain peptide and TcTex1 were measured [32]. This first-time measurement provided important insight into the roles of LC8 and TcTex1 and supported a possible regulatory role for the two proteins.
5.2.3 In vivo analysis
The measurement of protein stability in vivo is another application that exploits the ability of SUPREX to analyze multi-component mixtures [34]. SUPREX was recently used to compare the stability of monomeric λ repressor in vitro and in vivo. In vivo SUPREX analysis was possible because both D2O and urea are able to diffuse across the E. coli cell membrane. Monomeric λ repressor was overexpressed in E. coli grown in a protonated medium, and H/D exchange was initiated by placing the E. coli cells into a deuterated environment. After the exchange time was complete, the deuterated proteins were analyzed by MALDI mass spectrometry. Monomeric λ repressor was found to have the same stability in vivo as it did in vitro. However, when the cells were grown in a hyperosmotic
environment prior to performing SUPREX, the in vivo stability of the protein was significantly greater than the previously measured in vitro stability. In the case of this hyperosmotic environment, the in vitro stability measurements for monomeric λ repressor were not representative of the true in vivo protein stability.
5.3 Analysis of protein folding intermediates
SUPREX has proven useful in fundamental protein folding studies as well. For example, SUPREX was recently used in the detection of equilibrium intermediates in protein folding reactions [35]. This application relies on the ability of SUPREX to measure folding free energies across a wide range of denaturant concentrations. When conventional equilibrium unfolding techniques are used to measure the thermodynamic properties of a protein folding reaction, the midpoint of the unfolding transition is related to the ΔGf and m-value of the protein. The transition midpoint of a SUPREX curve, however, relies not only on the ΔGf and m-values of the protein, but also on ⟨kint⟩ and exchange time. This dependence of the transition on exchange time allows for SUPREX data to be collected over a wide range of denaturant concentrations. The ability to move the midpoint of a SUPREX curve is advantageous in two respects: (1) the linear extrapolation of the data to determine a ΔGf value can be performed over a shorter distance, and (2) the SUPREX data can be collected at lower denaturant concentrations, allowing the measurements to be taken under more physiologically relevant conditions. Collection of SUPREX data at lower denaturant concentrations has allowed for the detection of protein folding intermediate states that are not heavily populated in the presence of high concentrations of denaturant [35]. The intermediates could be detected with SUPREX because of a change in m-value at lower denaturant concentrations (Figure 7). When lower concentrations of denaturant are present, conditions are more favorable for the population of intermediate states. These intermediate states make the protein folding reaction less cooperative and thus lower the m-value of the protein. As shown in Figure 7, a change in m-value leads to a kink in the plot of ΔGapp versus C^{1/2}_SUPREX. Thus, the dependence of the SUPREX transition midpoint on the exchange time allows for the unfolding behavior of the protein to be examined under a wide range of denaturant concentrations, allowing for the detection of partially folded intermediate states. This type of analysis is not possible using conventional spectroscopy-based techniques.

[Figure 7: SUPREX −ΔGapp (kcal mol^-1, left axis) and CD −ΔGapp (kcal mol^-1, right axis) versus [GdmCl] (M); regions I and II are marked.]
Figure 7 Biphasic SUPREX behavior observed for cytochrome c. The ΔGapp versus C^{1/2}_SUPREX plot derived from SUPREX data (closed symbols and left y-axis) and from conventional spectroscopy data (open symbols and right y-axis) are shown. The solid lines represent the independent linear least squares fitting of the data in regions I and II to Equation (2). The different slopes (i.e., m-values) in regions I and II are indicative of the population of a partially folded intermediate state(s) in the protein’s equilibrium unfolding reaction. Figure adapted with permission from ref. [35].
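The kink analysis amounts to fitting two denaturant regions independently and comparing the slopes. The Python sketch below shows one way to do this with ordinary least squares, assuming (as the figure captions indicate) that Equation (2) is linear in C^{1/2}_SUPREX. The function names, the user-supplied breakpoint, and the synthetic data are hypothetical and are not taken from Figure 7.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two matched (x, y) points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared


def fit_two_regions(xs, ys, breakpoint):
    """Fit points below and at/above a chosen breakpoint independently."""
    low = [(x, y) for x, y in zip(xs, ys) if x < breakpoint]
    high = [(x, y) for x, y in zip(xs, ys) if x >= breakpoint]
    if len(low) < 2 or len(high) < 2:
        raise ValueError("each region needs at least two points")
    return linear_fit(*zip(*low)), linear_fit(*zip(*high))


if __name__ == "__main__":
    # Synthetic data with a slope change at 1.75 M (not from Figure 7).
    c_half = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    dg_app = [2.5, 3.0, 3.5, 4.25, 5.25, 6.25]
    region_1, region_2 = fit_two_regions(c_half, dg_app, 1.75)
    print(region_1[0], region_2[0])  # the two slopes (m-values)
```

The same `linear_fit` helper also returns an R^2 value, which is the statistic used in Figure 5 to judge whether a data set is adequately described by a single line.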
6. CONCLUSION
SUPREX is a useful tool for measuring protein folding free energies, binding free energies, and binding affinities. Unique experimental advantages of SUPREX over existing methods for making the same types of thermodynamic measurements include the technique’s high-throughput capabilities and the capacity for studying multi-component mixtures. These advantages make SUPREX a promising technique that could open the door to the study of proteins under conditions that more closely resemble the conditions inside a cell. For example, the technique could be used to study protein folding under the conditions of molecular crowding or in the presence of molecular chaperones. Additionally, there is the possibility of examining the thermodynamic properties of large multi-protein complexes by SUPREX. Future studies exploiting the unique advantages of SUPREX promise to provide important new information about protein folding and ligand binding interactions involved in fundamental biological processes.
REFERENCES
1 S. Ghaemmaghami, M.C. Fitzgerald and T.G. Oas, A quantitative, high-throughput screen for protein stability, Proc. Natl. Acad. Sci. USA, 97 (2000) 8296–8301.
2 K.D. Powell and M.C. Fitzgerald, Accuracy and precision of a new H/D exchange- and mass spectrometry-based technique for measuring the thermodynamic properties of protein–peptide complexes, Biochemistry, 42 (2003) 4962–4970.
3 M.M. Zhu, D.L. Rempel, Z.H. Du and M.L. Gross, Quantification of protein–ligand interactions by mass spectrometry, titration, and H/D exchange: PLIMSTEX, J. Am. Chem. Soc., 125 (2003) 5252–5253.
4 C.N. Pace, Determination and analysis of urea and guanidine hydrochloride denaturation curves, Methods Enzymol., 131 (1986) 266–280.
5 L.Y. Ma and M.C. Fitzgerald, A new H/D exchange- and mass spectrometry-based method for thermodynamic analysis of protein–DNA interactions, Chem. Biol., 10 (2003) 1205–1213.
6 K.D. Powell, S. Ghaemmaghami, M.Z. Wang, L.Y. Ma, T.G. Oas and M.C. Fitzgerald, A general mass spectrometry-based assay for the quantitation of protein–ligand binding interactions in solution, J. Am. Chem. Soc., 124 (2002) 10256; 125 (2003) 4398–4398.
7 P.L. Roulhac, K.D. Powell, S. Dhungana, K.D. Weaver, T.A. Mietzner, A.L. Crumbliss and M.C. Fitzgerald, SUPREX (stability of unpurified proteins from rates of H/D exchange) analysis of the thermodynamics of synergistic anion binding by ferric-binding protein (FbpA), a bacterial transferrin, Biochemistry, 43 (2004) 15767–15774.
8 Y. Tong, M.M. Wuebbens, K.V. Rajagopalan and M.C. Fitzgerald, Thermodynamic analysis of subunit interactions in Escherichia coli molybdopterin synthase, Biochemistry, 44 (2005) 2595–2601.
9 M.Z. Wang, J.T. Shetty, B.A. Howard, M.J. Campa, E.F. Patz and M.C. Fitzgerald, Thermodynamic analysis of cyclosporin A binding to cyclophilin A in a lung tumor tissue lysate, Anal. Chem., 76 (2004) 4343–4348.
10 M.C. Fitzgerald, S.Y. Dai and Y. Tong, In: M.L. Gross and R.M. Caprioli (Eds.), Encyclopedia of Mass Spectrometry, Elsevier Science, Boston, MA, 2006, Vol. 6, pp. 794–802.
11 K.D. Powell and M.C. Fitzgerald, Measurements of protein stability by H/D exchange and matrix-assisted laser desorption ionization mass spectrometry using picomoles of material, Anal. Chem., 73 (2001) 3300–3304.
12 S.Y. Dai, M.W. Gardner and M.C. Fitzgerald, Protocol for the thermodynamic analysis of some proteins using an H/D exchange- and mass spectrometry based technique, Anal. Chem., 77 (2005) 693–697.
13 Y.W. Bai, J.S. Milne, L. Mayne and S.W. Englander, Primary structure effects on peptide group hydrogen-exchange, Proteins, 17 (1993) 75–86.
14 Y.Z. Zhang, Ph.D. Thesis: Protein and Peptide Structure and Interactions Studied by Hydrogen Exchange and NMR, University of Pennsylvania, PA, 1995.
15 Q. Yi, M.L. Scalley-Kim, E.J. Alm and D. Baker, NMR characterization of residual structure in the denatured state of protein L, J. Mol. Biol., 299 (2000) 1341–1351.
16 S.L. Kazmirski, K.B. Wong, S.M.V. Freund, Y.J. Tan, A.R. Fersht and V. Daggett, Protein folding from a highly disordered denatured state: The folding pathway of chymotrypsin inhibitor 2 at atomic resolution, Proc. Natl. Acad. Sci. USA, 98 (2001) 4349–4354.
17 J.L. Neira, P. Sevilla, M. Menendez, M. Bruix and M. Rico, Hydrogen exchange in ribonuclease A and ribonuclease S: Evidence for residual structure in the unfolded state under native conditions, J. Mol. Biol., 285 (1999) 627–643.
18 Y. Dai, Ph.D. Thesis: Biophysical Applications of Amide Proton Hydrogen Exchange and MALDI Mass Spectrometry to Protein Folding, Duke University, Durham, NC, 2006.
19 S.Y. Dai and M.C. Fitzgerald, Accuracy of SUPREX (stability of unpurified proteins from rates of H/D exchange) and MALDI mass spectrometry-derived protein unfolding free energies determined under non-EX2 exchange conditions, J. Am. Soc. Mass Spectrom., 17 (2006) 1535–1542.
20 D.L. Smith, Y.Z. Deng and Z.Q. Zhang, Probing the non-covalent structure of proteins by amide hydrogen exchange and mass spectrometry, J. Mass Spectrom., 32 (1997) 135–146.
21 Y.W. Bai, J.S. Milne, L. Mayne and S.W. Englander, Protein stability parameters measured by hydrogen-exchange, Proteins, 20 (1994) 4–14.
22 J.A. Schellman, Macromolecular binding, Biopolymers, 14 (1975) 999–1018.
23 I.H. Segel, Enzyme Kinetics, John Wiley & Sons, New York, 1975.
24 L. Tang, E.D. Hopper, Y. Tong, J.D. Sadowsky, K.J. Peterson, S.H. Gellman and M.C. Fitzgerald, An H/D exchange- and mass spectrometry-based strategy for the thermodynamic analysis of protein–ligand binding, Anal. Chem., 79 (2007) 5869–5877.
25 D. Andersson, P. Hammarstrom and U. Carlsson, Cofactor-induced refolding: Refolding of molten globule carbonic anhydrase induced by Zn(II) and Co(II), Biochemistry, 40 (2001) 2653–2661.
26 R.W. Henkens, B.B. Kitchell, S.C. Lottich, P.J. Stein and T.J. Williams, Detection and characterization using circular-dichroism and fluorescence spectroscopy of a stable intermediate conformation formed in the denaturation of bovine carbonic-anhydrase with guanidinium chloride, Biochemistry, 21 (1982) 5918–5923.
27 L. Masino, S.R. Martin and P.M. Bayley, Ligand binding and thermodynamic stability of a multidomain protein, calmodulin, Protein Sci., 9 (2000) 1519–1529.
28 K.D. Powell and M.C. Fitzgerald, High-throughput screening assay for the tunable selection of protein ligands, J. Comb. Chem., 6 (2004) 262–269.
29 G. Gutzke, B. Fischer, R.R. Mendel and G. Schwarz, Thiocarboxylation of molybdopterin synthase provides evidence for the mechanism of dithiolene formation in metal-binding pterins, J. Biol. Chem., 276 (2001) 36268–36274.
30 M.J. Rudolph, M.M. Wuebbens, K.V. Rajagopalan and H. Schindelin, Crystal structure of molybdopterin synthase and its evolutionary relationship to ubiquitin activation, Nat. Struct. Biol., 8 (2001) 42–46.
31 M.M. Wuebbens and K.V. Rajagopalan, Mechanistic and mutational studies of Escherichia coli molybdopterin synthase clarify the final step of molybdopterin biosynthesis, J. Biol. Chem., 278 (2003) 14523–14532.
32 J.C. Williams, P.L. Roulhac, A.G. Roy, R.B. Vallee, M.C. Fitzgerald and W.A. Hendrickson, Structural and thermodynamic characterization of a cytoplasmic dynein light chain-intermediate chain complex, Proc. Natl. Acad. Sci. USA, 104(24) (2007) 10028–10033.
33 E. Barbar, B. Kleinman, D. Imhoff, M.G. Li, T.S. Hays and M. Hare, Dimerization and folding of LC8, a highly conserved light chain of cytoplasmic dynein, Biochemistry, 40 (2001) 1596–1605.
34 S. Ghaemmaghami and T.G. Oas, Quantitative protein stability measurement in vivo, Nat. Struct. Biol., 8 (2001) 879–882.
35 S.Y. Dai and M.C. Fitzgerald, A mass spectrometry-based probe of equilibrium intermediates in protein-folding reactions, Biochemistry, 45 (2006) 12890–12897.
36 J.A. Boice, G.R. Dieckmann, W.F. DeGrado and R. Fairman, Thermodynamic analysis of a designed three-stranded coiled coil, Biochemistry, 35 (1996) 14480–14485.
37 K.D. Powell, T.E. Wales and M.C. Fitzgerald, Thermodynamic stability measurements on multimeric proteins using a new H/D exchange- and matrix-assisted laser desorption/ionization (MALDI) mass spectrometry-based method, Protein Sci., 11 (2002) 841–851.
38 J.A. Zitzewitz, O. Bilsel, J.B. Luo, B.E. Jones and C.R. Matthews, Probing the folding mechanism of a leucine-zipper peptide by stopped-flow circular-dichroism spectroscopy, Biochemistry, 34 (1995) 12812–12819.
39 T.E. Wales and M.C. Fitzgerald, The energetic contribution of backbone-backbone hydrogen bonds to the thermodynamic stability of a hyperstable P22 arc repressor mutant, J. Am. Chem. Soc., 123 (2001) 7709–7710.
40 P. Silinski, M.J. Allingham and M.C. Fitzgerald, Guanidine-induced equilibrium unfolding of a homo-hexameric enzyme 4-oxalocrotonate tautomerase (4-OT), Biochemistry, 40 (2001) 4493–4502.
41 T. Fernando and C.A. Royer, Unfolding of Trp repressor studied using fluorescence spectroscopic techniques, Biochemistry, 31 (1992) 6683–6691.
42 S. Lindman, W.F. Xue, O. Szczepankiewicz, M.C. Bauer, H. Nilsson and S. Linse, Salting the charged surface: pH and salt dependence of protein G B1 stability, Biophys. J., 90 (2006) 2911–2921.
43 M. Ramirez-Alvarado and L. Regan, Does the location of a mutation determine the ability to form amyloid fibrils? J. Mol. Biol., 323 (2002) 17–22.
44 X.Y. Yang and M.C. Fitzgerald, Total chemical synthesis of the B1 domain of protein L from Peptostreptococcus magnus, Bioorg. Chem., 34 (2006) 131–141.
45 V.V. Filimonov, A.I. Azuaga, A.R. Viguera, L. Serrano and P.L. Mateo, A thermodynamic analysis of a family of small globular proteins: SH3 domains, Biophys. Chem., 77 (1999) 195–208.
46 S.E. Jackson and A.R. Fersht, Folding of chymotrypsin inhibitor-2. 2. Influence of proline isomerization on the folding kinetics and thermodynamic characterization of the transition-state of folding, Biochemistry, 30 (1991) 10436–10443.
47 B.M. Brown and R.T. Sauer, Assembly of the Arc repressor operator complex-cooperative interactions between DNA-bound dimers, Biochemistry, 32 (1993) 1354–1363.
48 D.J. Sloan and H.W. Hellinga, Dissection of the protein G B1 domain binding site for human IgG Fc fragment, Protein Sci., 8 (1999) 1643–1648.
49 R.P. Hearn, F.M. Richards, J.M. Sturtevant and G.D. Watt, Thermodynamics of binding of S-Peptide to S-Protein to form ribonuclease S′, Biochemistry, 10 (1971) 806–817.
50 J.J. He and K.S. Matthews, Effect of amino-acid alterations in the tryptophan-binding site of the Trp repressor, J. Biol. Chem., 265 (1990) 731–737.
51 V. Letilly and C.A. Royer, Fluorescence anisotropy assays implicate protein–protein interactions in regulating Trp repressor DNA-binding, Biochemistry, 32 (1993) 7753–7758.
52 T. Lee, G.S. Laco, B.E. Torbett, H.S. Fox, D.L. Lerner, J.H. Elder and C.H. Wong, Analysis of the S3 and S3′ subsite specificities of feline immunodeficiency virus (FIV) protease: Development of a
Thermodynamic Analysis of Protein Folding and Ligand Binding
53 54
55
56
57
58
59
149
broad-based protease inhibitor efficacious against FIV, SIV and HIV in vitro and ex vivo, Proc. Natl. Acad. Sci. USA, 95 (1998) 939–944. R. Varadarajan, P.R. Connelly, J.M. Sturtevant and F.M. Richards, Heat-capacity changes for protein peptide interactions in the ribonuclease-S system, Biochemistry, 31 (1992) 1421–1426. J. Thomson, G.S. Ratnaparkhi, R. Varadarajan, J.M. Sturtevant and F.M. Richards, Thermodynamic and structural consequences of changing a sulfur atom to a methylene group in the M13n1e mutation in ribonuclease-S, Biochemistry, 33 (1994) 8587–8593. T.T. Lam, J.K. Lanman, M.R. Emmett, C.L. Hendrickson, A.G. Marshall and P.E. Prevelige, Mapping of protein: Protein contact surfaces by hydrogen/deuterium exchange, followed by online high-performance liquid chromatography-electrospray ionization Fourier-transform ioncyclotron-resonance mass analysis, J. Chromatogr. A, 982 (2002) 85–95. D. Matulis, J.K. Kranz, F.R. Salemme and M.J. Todd, Thermodynamic stability of carbonic anhydrase: Measurements of binding affinity and stoichiometry using ThermoFluor, Biochemistry, 44 (2005) 5258–5266. Y.S.N. Day, C.L. Baird, R.L. Rich and D.G. Myszka, Direct comparison of binding equilibrium, thermodynamic, and rate constants determined by surface- and solution-based biophysical methods, Protein Sci., 11 (2002) 1017–1025. L. Frego, E. Gautschi, L. Martin and W. Davidson, The determination of high-affinity protein/ inhibitor binding constants by electrospray ionization hydrogen/deuterium exchange mass spectrometry, Rapid Commun. Mass Spectrom., 20 (2006) 2478–2482. C.W. Reid, D. Brewer and A.J. Clarke, Substrate binding affinity of pseudomonas aeruginosa membrane-bound lytic transglycosylase B by hydrogen-deuterium exchange MALDI MS, Biochemistry, 43 (2004) 11275–11282.
CHAPTER 7
Microsecond Time-Scale Hydroxyl Radical Profiling of Solvent-Accessible Protein Residues
David M. Hambly and Michael L. Gross
Contents
1. Introduction
   1.1 Why footprint protein surfaces with hydroxyl radicals?
   1.2 Advantages of mass spectrometry
2. Reagents for Surface Mapping
   2.1 D2O is a reversible reagent with high coverage and low specificity
   2.2 Irreversible reagents with high specificity that give limited coverage
   2.3 Radicals as an irreversible yet specific reagent
   2.4 Hydroxyl radical oxidized analytes in mass spectrometry
   2.5 Generation of hydroxyl radicals by irradiation of water
   2.6 The synchrotron method for protein footprinting
   2.7 Hydroxyl radical reactions and kinetics with amino acids
3. Fast Photochemical Oxidation of Proteins (FPOP)
   3.1 Practical pointers
   3.2 FPOP reaction variables
   3.3 Investigating the reaction time
   3.4 Conclusions
Acknowledgement
References
Comprehensive Analytical Chemistry, Volume 52. ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00207-9
© 2009 Elsevier B.V. All rights reserved.

1. INTRODUCTION

1.1 Why footprint protein surfaces with hydroxyl radicals?

The nature of the amino acids located at protein surfaces is the major factor in determining ligand selectivity and affinity. Understanding the surfaces of individual proteins, and the interactions between proteins and ligands, is a major undertaking. X-ray crystallography [1], NMR structure determination [2], and cryo-electron microscopy [3–5] are capable of producing models with high surface resolution. Other tools can be used if X-ray, NMR, and cryo-electron microscopy fail. Chemical footprinting of protein surfaces is one approach for investigating residues that are important for ligand selectivity and affinity. It originated in the late 1970s when Galas and Schmitz [6] combined it with limited digestion of DNA to evaluate solvent exposure and identify the binding site of the lac repressor protein on the regulatory sequence of the lac operon. Although gel analysis offers residue resolution when footprinting DNA, more powerful methods must be used for chemical footprinting of proteins.
1.2 Advantages of mass spectrometry

Although the various methods discussed above can interrogate protein surfaces, none has the sensitivity that mass spectrometry (MS) can provide. In particular, when proteins are digested, either chemically or proteolytically, the resulting peptides can be efficiently screened at high femtomole levels by using modern proteomic methods. With a modern matrix-assisted laser desorption ionization time-of-flight (MALDI-ToF) mass spectrometer or a reversed-phase liquid chromatograph coupled to an electrospray ionization mass spectrometer, thousands of peptides per hour can be identified [7]. The former technology allows proteins to be identified by their peptide fingerprint (a list of the masses of a unique set of peptides), whereas LC/ESI technology is an effective means to sequence peptides and confirm protein primary sequence [8]. Sequencing methods are invaluable when post-translational modifications, such as those generated by chemical footprinting, are to be identified because careful analysis of the product-ion spectrum usually allows the modified amino acid to be identified.

To monitor changes in the protein surface by MS, the protein molecular weight must be altered in a manner that allows a distinction to be made between, for example, the free protein and the protein in a macromolecular complex. It is also essential that the introduced modification does not disrupt the structure of the complex. The biological community, since the late 1970s, has accepted the idea of using reagents to footprint the interaction areas of macromolecules in complexes [6,9–11]. The approach can also be viewed as a probe of solvent-exposed residues on a protein surface. Hydrogen/deuterium (H/D) exchange monitored by 2D NMR was first demonstrated by Kurt Wüthrich in 1982 to identify residues involved in stable secondary structure [9].
Later work demonstrated that chemical cross-linking using photoactive probes [12,13] or protein cleavage by hydroxyl radicals [14,15] afforded data that permit similar conclusions; that is, amino acid residues with decreased solvent exposure (e.g., those located at an interface) are protected from cleavage or cross-linking. Therefore, the use of MS as a probe of protein surfaces requires a suitable reagent that modifies the solvent-exposed residues, and thereby changes their mass, but does not modify residues buried at a protein interface or beneath the surface. By repeating the experiment in the absence of the
binding partner, the residues that were previously protected should now be exposed and modified.
2. REAGENTS FOR SURFACE MAPPING

This introduction focuses on three types of reagents that are used to modify proteins in a way that allows distinctions to be made between a bound protein complex and its uncomplexed constituents. The basic premise for the use of these reagents is that they react with the protein at locations that are solvent accessible. As shown in Figure 1, those portions of a protein that are bound to a ligand will be modified differently than in the absence of the ligand [16]. The ideal reagent would modify each residue to afford a stable product that is amenable to analysis by tandem MS methods; further, the reagent would not affect the buffer system or the protein conformation.
Figure 1. Schematic showing that specific reagents modify only solvent-exposed target residues and indicate the protein–ligand interface when experiments ± ligand are performed.

2.1 D2O is a reversible reagent with high coverage and low specificity

A nearly ideal reagent for modifying proteins is deuterium used in the form of deuterated water. Amide hydrogens of peptide bonds readily exchange with deuterium, resulting in a mass increase of 1 Da for each amide. The rate of exchange is decreased if the amide hydrogen participates in stable hydrogen bonds [17–20]. Other than proline, all residues have an exchangeable amide proton. Generally, the presence of an interface will stabilize nearby secondary structure, strengthening those H-bonds near the interface and decreasing their rates of H/D exchange. Although hydrogens on carboxylic acids, alcohols, and amines are labile and deuterate rapidly, they are readily back-exchanged with hydrogen when, at the end of an experiment, non-deuterated water is added
under quench conditions (this is called a forward-exchange experiment because the rate of uptake of deuterium is monitored). Amide-bound deuteriums remain in place and are stable for tens of minutes during the solution quench, which is achieved by decreasing the temperature to 0 °C and the pH to ~2.5 [21]. In a typical H/D back-exchange experiment, one would deuterate the protein first, then form the complex in D2O, and then transfer the complex into H2O [22,23]. The ligand binding cooperatively stabilizes the protein structure at the interface, and consequently, the amide deuterons at the interface resist back-exchange to the hydrogen form, and the rate of loss of deuterium at these sites is slowed. This method takes advantage of nearly uniform reactivity for all amino acid residues (except proline). An underlying assumption is that the isotope, while causing a mass shift, does not disturb the buffer system or protein conformation. To locate the protected amide, tandem MS methods may be employed provided the energy input needed to fragment the peptide in MS/MS does not interconvert amide Ds and Hs (i.e., there is no scrambling). There are varying opinions on this point [24–29].

The method of H/D exchange combined with MS is producing significant results [30–36]. For example, monitoring the extent of H/D exchange of a protein when titrated with ligand allows binding affinities to be determined by modeling the titration curve [37]. This method, known as PLIMSTEX, works for protein complexes involving binding to metal ions, to small molecules, and to other peptides or proteins [38]. Affinities can also be obtained by titrating the complex with denaturant and following the extent of denaturation with H/D exchange. The approach, called SUPREX [39,40], takes advantage of the more facile denaturation (i.e., occurring at lower concentration of denaturant) of the protein than of the complex.
In some instances, protein conformational changes can be mapped using the H/D exchange MS/MS methodology [25,29,41].
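The counting rule behind the H/D exchange mass shift (about 1 Da per backbone amide, none for proline) can be illustrated with a minimal sketch. The function names are hypothetical, and the code assumes one-letter residue codes and that the N-terminal residue contributes a free amine rather than a peptide-bond amide; side-chain hydrogens are ignored because they back-exchange under quench conditions.

```python
def exchangeable_amides(sequence: str) -> int:
    """Count backbone amide hydrogens available for H/D exchange.

    Assumptions (not from the chapter): one-letter codes are used, and the
    first residue has a free amine instead of a peptide-bond amide.
    Proline has no amide hydrogen, so it is skipped.
    """
    seq = sequence.upper()
    # Skip residue 1 (N-terminus), then count every non-proline residue.
    return sum(1 for residue in seq[1:] if residue != "P")


def max_mass_increase_da(sequence: str) -> float:
    """Upper bound on the mass shift, taking ~1 Da per exchanged amide."""
    return 1.0 * exchangeable_amides(sequence)


if __name__ == "__main__":
    peptide = "ACDPEF"  # hypothetical peptide used only for illustration
    print(exchangeable_amides(peptide), max_mass_increase_da(peptide))
```

In practice the observed deuterium uptake is lower than this bound because protected amides exchange slowly and some label is lost during analysis.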
2.2 Irreversible reagents with high specificity that give limited coverage

A second class of reagent is one that irreversibly modifies specific residues [16,42–49]. The goal, when using this class of reagents, is to modify residues on a protein while it is in monomeric form and compare the modifications to those observed while the protein is in a complex, as is illustrated in Figure 1. The protein should not undergo modification at residues that are protected when the complex is formed. If the protein, in the absence of the complex, is now modified at this site, one concludes that the amino acid side chain is at, or very near, the complex interface. To locate the modified residue, the protein is digested, and standard proteomic methods are used for peptide analysis. One example of a reagent that targets primary amines is the N-hydroxysuccinimide (NHS) functional group, shown in Figure 2. The NHS leaving group drives stable amide bond formation with any exposed primary amine, thus tagging the residue as solvent accessible. Numerous publications report the successful use of these and related reagents to locate side chains whose reactivity was altered in the formation of a protein complex [43,50–52]. Reagents are also available for photoaffinity cross-linking, where the protein of interest is modified first using a specific reagent, then photo-activated to link covalently the two proteins that form a complex at a non-specific site. Standard proteomic methods enable the modified residues to be identified, revealing the linkage location [12,13].

Figure 2. Disuccinimidyl suberate contains two amine-reactive NHS groups, permitting cross-linking of primary amines separated by < 11 Å. The amines react at the ester positions, releasing NHS, and when both esters are modified, the labeled primary amines are covalently linked via the intervening C8 alkyl chain.

Given that high-specificity reagents probe only small areas of a protein surface (typically only lysine and the N-terminus), a reagent with properties common to the first two classes would offer improvements; that is, a reagent that irreversibly probes many amino acid residues.
2.3 Radicals as an irreversible yet specific reagent

Radical reagents are a third class of reagents. The radical most often used is the hydroxyl radical because it is readily formed, highly reactive, moderately selective, and closely approximates the size of a water molecule [53]. Hydroxyl radicals are short-lived in solution and react mainly with aromatic and sulfur-containing amino acid residues [54]. Forty-one publications as of the end of 2005 attest to the usefulness of hydroxyl-radical footprinting for measuring protein solvent accessibility.
2.4 Hydroxyl radical oxidized analytes in mass spectrometry

Hydroxyl radicals can be formed by at least four distinct processes. Two methods have been known since the 1900s and involve either electron donation to hydrogen peroxide [14,55–57] (via the Fenton reagent as shown in Scheme 1) or homolytic cleavage of hydrogen peroxide [58–61] (via UV irradiation as shown in Scheme 2). Two other options have also been explored. One uses high-voltage electrospray oxidation [62–67] to form an oxygen radical cation that reacts with water to give the radical (Scheme 3). The second utilizes synchrotron radiolysis to eject an electron from water.
Scheme 1. The Fenton reagent generates hydroxyl radical and hydroxide by oxidizing iron:
HO–OH + Fe²⁺ → HO• + HO⁻ + Fe³⁺

Scheme 2. Photolytic cleavage of hydrogen peroxide results in two hydroxyl radicals:
HO–OH + hν (UV) → 2 HO•

Scheme 3. Proposed reaction scheme for the electrospray generation of OH radicals to oxidize proteins and peptides. [Reaction scheme not reproduced.]

2.5 Generation of hydroxyl radicals by irradiation of water

High-energy processes, such as γ-radiation or synchrotron radiation [54], ionize water to form radical cations [68,69]. Transfer of a proton from the radical cation results in a hydroxyl radical as detailed in Scheme 4. This methodology has been used in the vast majority of protein-solvent-exposure work since 2001. An advantage of this method is that no reagents are needed; the water itself is the source of the radicals. One drawback is that the method must be performed in a solution devoid of other substances (e.g., buffering reagents) that would quench the radicals. Therefore, sodium cacodylate (Na(CH3)2AsO2), a non-radical scavenger, is the buffer of choice, providing pH control over one range (pKa = 6.2). The shortest irradiation is ~10 msec [70]. Although this time exceeds the ~100 μsec for super-secondary structure unfolding [71,72], most proteins will maintain their global conformation in this time frame. Although using a synchrotron light source is an excellent approach, its lack of wide availability is an obstacle to broad implementation of this method [73].

Scheme 4. Ionizing radiation generates hydroxyl radicals directly from water by electron abstraction followed by various water reactions. [Reaction scheme not reproduced.]
2.6 The synchrotron method for protein footprinting

2.6.1 Recent uses of hydroxyl radicals for protein oxidation

The OH radicals, generated by the methods that are outlined above, label the solvent-exposed residues of a protein in the same manner as do OH radicals in
DNA and RNA footprinting. Only the synchrotron method, however, has been validated in numerous studies, all from the group of Mark R. Chance [74]. The method was most recently used to monitor the kinetics of RNA polymerase motion along a transcript [75]. The synchrotron-based OH radical approach also can monitor the folding of a ribozyme [76], the unfolding of specific regions of apomyoglobin [77], and determine protein solvent-accessible residues [78] and protein–protein interfaces [79].
2.6.2 Avoiding oxidation-induced conformational changes

When the footprinting experiment requires times comparable to or greater than that of protein unfolding, the experiment should be conducted so that only one modification is observed per molecule (first-order decay kinetics are observed for the unmodified species). This "single hit" requirement is difficult to meet when using radicals because the probe reagent is highly reactive and promiscuous. Chance and co-workers observe multiple oxidations on each protein [65,80]; they argue that no conformational changes occur if the dose-dependent disappearance of the peptide corresponding to a given region of the protein, formed in digestion, follows first-order kinetics. The modified amino acids are then located by sequencing the modified peptide ion by tandem MS.
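The first-order criterion described above can be checked numerically: if the unmodified fraction decays as exp(−k × dose), then ln(fraction) is linear in dose. The following is a minimal sketch of such a check by ordinary least squares; the function name and the dose-response values are hypothetical, and this is not presented as the analysis procedure used in the cited work.

```python
import math


def fit_first_order(doses, fractions_unmodified):
    """Fit ln(fraction) = intercept - k * dose by ordinary least squares.

    Returns (k, intercept, r_squared). A good linear fit (r_squared near 1)
    is consistent with first-order loss of the unmodified peptide.
    """
    ys = [math.log(f) for f in fractions_unmodified]
    n = len(doses)
    mean_x = sum(doses) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in doses)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(doses, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(doses, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return -slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical exposure times (ms) and synthetic first-order data.
    doses = [0.0, 10.0, 20.0, 40.0, 80.0]
    fractions = [math.exp(-0.01 * d) for d in doses]
    k, intercept, r2 = fit_first_order(doses, fractions)
    print(f"k = {k:.4f} per ms, intercept = {intercept:.2e}, R^2 = {r2:.6f}")
```

Systematic curvature in the residuals, rather than a low R² alone, would be the more telling sign that the protein changes conformation as the dose increases.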
2.7 Hydroxyl radical reactions and kinetics with amino acids

The nature and rates of reaction of hydroxyl radicals with each amino acid were extensively reviewed by Garrison [81]. In the reactions with amino acids, the radicals can be formed in oxygen-saturated solutions after fast electrons or γ-radiation ionize water, generating hydroxyl radicals. The mechanism of hydroxyl radical reaction with an amino acid is thought to be the same irrespective of which method is used to produce the radical. In many cases, the products are different if the solution is degassed, indicating the role of dissolved oxygen in the overall reaction. Considering only the reactions in which oxygen is present, we see that the amino acid radicals formed in the first step react with oxygen at nearly diffusion-limited rates (i.e., with rate constants that are 10⁹ to 10¹⁰ M⁻¹ s⁻¹ [68]).
2.7.1 Hydroxyl radical and sulfur-containing residues

Methionine and cysteine are the most reactive amino acids toward OH radicals; the reactions occur at diffusion-controlled rates. Methionine is directly attacked by the hydroxyl radical, forming a radical intermediate on the sulfur atom. The sulfur radical can react with oxygen to form hydroperoxide, producing the methionine S-oxide (sulfoxide, +16 Da) [81]. Alternatively, the hydroxyl moiety on the radical sulfur can form the sulfoxide after a second hydroxyl radical removes the alcohol hydrogen, as shown on the first line in Scheme 5. A second round of oxidation affords the methionine S,S-dioxide (sulfone, +32 Da) along the same pathway as the first oxidation.

The OH radical reacts with cysteine by abstracting a hydrogen atom, yielding a sulfur radical. In the studies reviewed by Garrison [81], a free cysteine concentration of 1 mM at neutral pH is sufficient to give disulfide bond formation via a thiolate (RS⁻) species. In dilute protein solutions, however, the radical reacts with oxygen, forming a peroxide radical intermediate. Subsequent reactions form 3-sulfenoalanine (+16) as shown in Scheme 5. Although the 3-sulfinoalanine (+32) and cysteic acid (+48) products were reported in the literature, the mechanism is not yet elucidated [82].

Scheme 5. Reaction of radical with methionine (top) and cysteine (bottom) to form oxidized products. [Structures not reproduced.]
2.7.2 Phenylalanine, tyrosine, and hydroxyl radicals

The phenyl group of phenylalanine and the hydroxyphenyl group of tyrosine are only slightly less reactive than S-containing amino acids. A hydroxyl radical adds at an available ring position to give an unpaired electron that is delocalized over various ring sites (Scheme 6). A second radical may remove a hydrogen atom, reforming the aromatic ring. Another reaction involves attack of the carbon-centered radical on dissolved oxygen to give a hydroperoxide [83,84]. The hydroperoxide undergoes a hydrogen rearrangement, releasing the hydroperoxide radical and forming a monohydroxylated ring (reaction on phenylalanine) or dihydroxyphenylalanine (reaction on tyrosine).
2.7.3 Hydroxyl radical and tryptophan

When the hydroxyl radical reacts with tryptophan, there are numerous sites of attack, and each can lead to a different product. Approximately 40% of the oxidation occurs on the six-membered ring, leading to substitution on the 4, 5, 6, or 7 positions of the indole (Scheme 7) [85–91]. These reactions probably proceed through the same intermediates as the phenylalanine oxidations discussed earlier.
Scheme 6. Radical reactions with phenylalanine. [Structures not reproduced.]

Scheme 7. Reaction of tryptophan with the radical; one pathway accounts for ~40% of the reaction and the other for ~60%. [Structures not reproduced.]
Theoretical and experimental analyses indicate that the remaining radicals (60%) attack the C-2 position of the indole ring, leading to water loss and delocalization of the radical [85–87,89–91]. The radical can react with oxygen, leading to breaking of the indole ring when the hydroperoxide radical is released. The final product is N-formylkynurenine, a +32 species. There are other minor products, notably a +4 and +14 species [92], but the reaction mechanisms are not known at this time. Generally, oxidation of tryptophan forms products that have mass shifts of +16, +32, and +48.
2.7.4 Histidine products from oxidation

The reactions of histidine and hydroxyl radicals yield numerous end products. Initial attack of the radical is at the C-2, C-4, or C-5 position of the imidazole ring, as shown in Scheme 8, resulting in the formation of formyl-asparagine (+5 Da), 2-oxohistidine (+16 Da), a ketone-aldehyde (−10 Da), aspartate (−22 Da), or asparagine (−23 Da) [93–97]. Oxidation of histidine in proteins has been rarely reported, even though the kinetics of the reaction appears to be nearly identical to that of amino acids with other aromatic rings. (Histidine is an aromatic amino acid because the imidazole moiety has six electrons (Hückel's rule: n = 1) delocalized in the ring. Specifically, all atoms in the ring are coplanar and sp2-hybridized.) These rate constants were determined for the free amino acids in solution. Given that other aromatic residues were frequently reported as modified, whereas histidine modification was only sparingly reported [80], it may be questionable to assume that rate constants for an amino acid in solution and for an amino acid residue on the protein surface are identical.

Scheme 8. The multiple products of histidine oxidation with hydroxyl radical: asparagine (−23), aspartic acid (−22), ketone-aldehyde (−10), formyl-asparagine (+5), and 2-oxohistidine (+16). [Structures not reproduced.]
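When assigning a modified peptide, the nominal mass shifts quoted in Sections 2.7.1 to 2.7.4 can serve as a first-pass lookup. The sketch below is illustrative only: the dictionary holds just the shifts stated numerically in the text (methionine, cysteine, tryptophan, histidine), the function name is hypothetical, and real assignments rest on the product-ion spectrum rather than on the mass shift alone.

```python
# Nominal mass shifts (Da) stated in the text for hydroxyl-radical products.
# Residues whose shifts are not given numerically in the text are omitted.
OXIDATION_SHIFTS = {
    "M": [16, 32],               # sulfoxide, sulfone
    "C": [16, 32, 48],           # sulfeno-, sulfino-alanine, cysteic acid
    "W": [4, 14, 16, 32, 48],    # includes the minor +4 and +14 species
    "H": [-23, -22, -10, 5, 16], # Asn, Asp, ketone-aldehyde, formyl-Asn, 2-oxo-His
}


def candidate_residues(peptide, observed_shift):
    """Return (position, residue) pairs that could explain a nominal shift.

    Positions are 1-based. Only single modifications are considered.
    """
    return [
        (index + 1, residue)
        for index, residue in enumerate(peptide.upper())
        if observed_shift in OXIDATION_SHIFTS.get(residue, [])
    ]


if __name__ == "__main__":
    peptide = "AHMWK"  # hypothetical peptide for illustration
    print(candidate_residues(peptide, 16))
    print(candidate_residues(peptide, -22))
```

A +16 shift is ambiguous across several residues, which is why tandem MS sequencing of the modified peptide is needed to locate the site.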
2.7.5 Aliphatic amino acid oxidation

Free radicals can react at any C–H bond of an aliphatic side chain of an amino acid, and the rates of reaction depend on the stability of the radical formed [98]. Compared to attack of the radical at an aromatic side chain, the reaction rate constants for attack at an aliphatic carbon are between three-fold and three-hundred-fold smaller. When attack at an aliphatic carbon occurs, the hydroxyl radical abstracts a hydrogen atom, giving a carbon-centered radical. For leucine and isoleucine, a radical formed at the tertiary carbon is stabilized by the electron-donating methyl and methylene groups. As shown in Scheme 9, the newly formed radical can then attack dissolved oxygen, forming an alcohol or carbonyl [99–108].

Scheme 9. Radical reactions with leucine exemplifying aliphatic oxidation when the hydroxyl radical abstracts hydrogen from alkyl carbons. [Structures not reproduced.]

The relative reactivities of methyl and methylene groups are determined by factors other than simply the tertiary or secondary nature of the carbon atom. For example, the modifications of isoleucine in proteins have yet to be reported, but there is no difference in the rate constants for reaction with OH of the isolated amino acids, leucine and isoleucine [109,110]. In proteins, the reactivity difference may be due to steric hindrance around the Cβ of isoleucine, which would be less exposed to radical attack than the Cγ of a leucine at the same position, and is unlikely to be due to the electron-donating properties of an ethyl versus methyl group [98]. The reactivity of amino acids in proteins is principally determined by the nature of the amino acid and its solvent accessibility, which is the basis for the protein footprinting method. This further reinforces the proposal that the rates of reaction for free amino acids in solution may not be perfect references for the rate of reaction of the amino acid in a protein.
2.7.6 Reaction kinetics

The rate constants for reaction of hydroxyl radicals with the various free amino acids range from 1 × 10¹⁰ to 2 × 10⁷ M⁻¹ s⁻¹ [109]. Carmichael of the Notre Dame Radiation Research Laboratory developed a web-based database of peer-reviewed "best available data" on the reaction rates of hydroxyl radicals with over 1,700 compounds. Data on reactions with the amino acids shown in Table 1 are from the website http://www.rcdc.nd.edu/compilations/Hydroxyl/OH.HTM. The relative reactivities as ordered by the rate constants for reaction of OH with the free amino acids are: C > W, Y > M > F > P > H > R > I/L > V > P, Q, T > K > S > E > A > D > N > G. Chance also investigated the relative order of reactivity by comparing the oxidative products of peptides with two or more possible targets (e.g., both W and Y on a peptide, W and L on a different peptide) and determined the following reaction order: C > M > W > Y > F > H > L/I > R, K, V > S, T, P > Q, E > N, D > A > G [111]. The reason for the differences in relative reactivity is unclear at this time.
Table 1. Rate constants for reactions of hydroxyl radical with amino acids, a simple peptide, and a common denaturing agent

Molecule         Rate constant (M⁻¹ s⁻¹)
Cysteine         3 × 10¹⁰
Tryptophan       1 × 10¹⁰
Tyrosine         1 × 10¹⁰
Methionine       8 × 10⁹
Phenylalanine    7 × 10⁹
Histidine        5 × 10⁹
OH radical       5 × 10⁹
Arginine         4 × 10⁹
Leucine          2 × 10⁹
Isoleucine       2 × 10⁹
Valine           7 × 10⁸
Proline          5 × 10⁸
Glutamine        5 × 10⁸
Threonine        5 × 10⁸
Lysine           4 × 10⁸
Serine           3 × 10⁸
GlyGly           3 × 10⁸
Glutamate        2 × 10⁸
Alanine          8 × 10⁷
Aspartate        7 × 10⁷
Asparagine       5 × 10⁷
Glycine          2 × 10⁷
Urea             8 × 10⁵
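The rate constants in Table 1 can be combined with concentrations to estimate how competing solutes share the available hydroxyl radicals. The sketch below applies standard competition kinetics, in which the fraction captured by species i is k_i c_i / Σ k_j c_j. It assumes that each species reacts with OH by a simple bimolecular step and that the free-solution rate constants apply; the concentrations in the example are hypothetical.

```python
# Rate constants (M^-1 s^-1) transcribed from Table 1.
RATE_CONSTANTS = {
    "Cysteine": 3e10, "Tryptophan": 1e10, "Tyrosine": 1e10,
    "Methionine": 8e9, "Phenylalanine": 7e9, "Histidine": 5e9,
    "Arginine": 4e9, "Leucine": 2e9, "Isoleucine": 2e9,
    "Valine": 7e8, "Proline": 5e8, "Glutamine": 5e8, "Threonine": 5e8,
    "Lysine": 4e8, "Serine": 3e8, "GlyGly": 3e8, "Glutamate": 2e8,
    "Alanine": 8e7, "Aspartate": 7e7, "Asparagine": 5e7,
    "Glycine": 2e7, "Urea": 8e5,
}


def scavenging_fractions(concentrations_molar):
    """Fraction of OH radicals consumed by each solute.

    concentrations_molar maps a Table 1 name to its concentration in mol/L.
    Each pseudo-first-order rate is k * c; fractions are normalized to 1.
    """
    rates = {
        name: RATE_CONSTANTS[name] * conc
        for name, conc in concentrations_molar.items()
    }
    total = sum(rates.values())
    return {name: rate / total for name, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical mixture: 1 mM free methionine in 8 M urea.
    for name, fraction in scavenging_fractions(
        {"Methionine": 1e-3, "Urea": 8.0}
    ).items():
        print(f"{name}: {fraction:.3f}")
```

Even a slowly reacting solute can capture a large share of the radicals when it is present at molar concentration, which is the practical reason the choice of buffer and denaturant matters.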
2.7.7 Side reactions

There are other pathways for the reaction of radicals with proteins than those discussed earlier. Radical transfer between aromatics is a known pathway. Furthermore, a recent study of oxidatively inhibited proteins indicated that non-active-site methionines are preferentially oxidized, and these modifications do not cause enzyme activity to be lost [112–117]. However, if the methionine residues are removed through mutagenesis, the proteins lose activity with the first oxidation. Levine et al. [112–114] proposed that radicals generated at active sites in a protein may transfer the radical site to a nearby methionine, maintaining catalytic activity. Additionally, tyrosine acts as a radical sink for neutral indolyl radicals formed in the reaction of tryptophan with N3• [118]. For the dipeptide tryptophan-tyrosine, the rate constant for the reaction with OH radical is 7 × 10⁴ s⁻¹ [118]. The pentapeptide WPPPY, which uses three rigid proline residues as spacers to create a 1.3-nm gap between the radical donor and acceptor, has k = 0.2 × 10⁴ s⁻¹, 35-fold smaller [118,119].
3. FAST PHOTOCHEMICAL OXIDATION OF PROTEINS (FPOP)

3.1 Practical pointers

Given the promiscuous and highly reactive nature of the hydroxyl radical and the likely conformational changes induced by extensive radical damage in a protein, it is critical to initiate and complete the reaction as rapidly as possible. Given that the UV photolysis of H2O2 to form hydroxyl radicals is a well-characterized reaction, the use of a laser, instead of a simple UV lamp, seemed an optimal way to trigger the chemical reaction. An ideal laser wavelength would be exclusively absorbed by H2O2, and quantitatively generate two hydroxyl radicals with nearly identical energy distributions. Very high energy lasers, such as 157- and 193-nm lasers, are unsuitable because ambient moisture and the protein would absorb all the energy (Figure 3). The protein UV spectrum has a minimum at ~245 nm. The krypton fluoride (KrF) excimer laser, with an output wavelength of 248 nm, appears to be the optimum choice. Moreover, the quantum yield for production of OH radicals from H2O2 is 1.7. We attempted to use the fourth harmonic of a Nd:YAG laser (λ = 266 nm) but found insufficient oxidation in a test system, likely due to increased protein absorbance (~10%), lower laser power (12 mJ/pulse versus 50 mJ/pulse for KrF), and a 30% decrease in the H2O2 absorption compared to that at 248 nm (Figure 4) [58,59].

Two distinct setups were tested. In the first setup, the laser beam was fired through a solution that was loaded into a pipet tip. As the beam cannot pass through plastic, the beam needs to be properly aligned with the wide end of the pipet tip opening to ensure maximum irradiation of the sample. To reproducibly hold a 20-μL tip, we attached a 1/4" Swagelok fitting to a lab stand. A business card was used to align the Swagelok opening with the laser beam (using low power output).
Once aligned, a 200-µL pipet tip containing 10–15 µL of 10 µM apomyoglobin with 20 mM H2O2 in 10 mM NaHPO4 (pH 7.5) was inserted into the Swagelok fitting (this sample can serve as a positive control). The laser was then turned on and programmed for one shot per second. Once the laser is active, the pulse is
Figure 3 Absorption spectrum of 10 µM apomyoglobin from 190 to 300 nm. Inset is the spectrum from 240 to 300 nm, demonstrating that 248 nm is near the minimum for protein absorption. Absorbances are marked at 193, 248, and 266 nm.
David M. Hambly and Michael L. Gross
Figure 4 Absorption spectrum of 5 mM H2O2 in water. The absorbances are marked at 248 and 266 nm.
triggered by turning on the external pulse generator for one pulse (B&K Precision, Yorba Linda, CA). While this setup does produce oxidized protein, there is a large excess of H2O2 molecules (1.2 × 10¹⁷ molecules) compared to photons of 248-nm light (a maximum of 6.2 × 10¹⁶ photons at 50 mJ).
We modified the setup to maximize the photon:H2O2 ratio (Figure 5). A low-volume flow cell capable of withstanding high photon density is critical for this application. Fused silica (f.s.) tubing, stripped of its polyimide coating, makes an excellent low-volume flow cell (old f.s. tubing breaks more easily than new tubing). The silica must be held in place to obtain reproducible results. Although we used microclamps (Thorlabs), any similar setup would work. A beam stop behind the tubing serves as a minimum safety requirement, but this was augmented with a 1/4″-thick Plexiglas shield around the front of the laser. We found it necessary to focus the laser beam ~2× to achieve maximum protein oxidation. The protein and peroxide solution was delivered by a syringe pump at a fixed flow rate (e.g., 20 µL/min). Only one laser pulse was used per bolus plug of solution in the flow cell to prevent a second step of oxidation, so the timing of the laser pulse must be coordinated with the flow rate and flow-cell volume.
The protein solution containing the peroxide flows through a fused silica capillary tube, and the laser fires at defined intervals through a window cleared of polyimide coating. The laser beam passes through the transparent wall of the capillary and photolyzes the H2O2, and the hydroxyl radicals so formed oxidize the protein. Before the laser fires a second time, the sample has moved out of the reaction chamber and is replaced by a new bolus plug of material. To determine the appropriate laser frequency, the flow-cell volume was calculated by measuring the beam width at the tubing and multiplying this by the cross-sectional area of the tube bore (V = πr²h, where r is the inner radius of the tubing and h is the beam width).
For example, the volume irradiated in a flow cell made of 150-µm i.d. fused silica tubing by a laser beam of 2-mm width is 0.035 µL. By dividing
Figure 5 Optimized photochemical oxidation setup. Labels in the diagram: solvent flow; diffusion barrier; Eppendorf tube for sample collection.
this by the flow rate (in µL/sec), one can conclude that the irradiated sample spends 0.105 sec in the flow cell. The laser was fired 10% slower than 9.5 Hz (1/0.105 sec) to ensure that there is a small diffusion barrier between irradiated sample and unmodified sample entering the flow cell. The oxidized protein solution was collected in an Eppendorf tube to which a few microliters of 1 µM bovine catalase had previously been added. The catalase removed the H2O2 by converting it to H2O and O2 in a few seconds.
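The timing arithmetic described above can be sketched in a few lines of Python. This is only an illustrative check, not the authors' procedure: the function names are hypothetical, and the inputs (150-µm inner diameter, 2-mm beam width, 20 µL/min flow rate) are the example values quoted in the text.

```python
import math

def irradiated_volume_uL(inner_diameter_um, beam_width_mm):
    """Volume of the illuminated cylinder, pi * r^2 * h, in microliters."""
    radius_mm = inner_diameter_um / 1000.0 / 2.0
    return math.pi * radius_mm ** 2 * beam_width_mm  # 1 mm^3 equals 1 uL

def residence_time_s(volume_uL, flow_rate_uL_per_min):
    """Time the sample plug spends in the irradiated volume, in seconds."""
    return volume_uL / (flow_rate_uL_per_min / 60.0)

# Example values taken from the text (150-um i.d. tubing, 2-mm beam, 20 uL/min).
volume_uL = irradiated_volume_uL(150.0, 2.0)
t_res = residence_time_s(volume_uL, 20.0)
f_max_hz = 1.0 / t_res        # one pulse per plug, with no safety margin
f_laser_hz = 0.9 * f_max_hz   # fire 10% slower to leave a diffusion barrier

print(f"irradiated volume: {volume_uL:.4f} uL")
print(f"residence time:    {t_res:.3f} s")
print(f"maximum frequency: {f_max_hz:.2f} Hz, used: {f_laser_hz:.2f} Hz")
```

The results come out close to the 0.035 µL, 0.105 sec, and 9.5 Hz quoted in the text; small differences arise only from rounding the volume before dividing.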
3.2 FPOP reaction variables
With this setup, we tested various concentrations of H2O2 from 5 to 100 mM. We observed little change in the oxidation pattern of apomyoglobin above 20 mM H2O2. This may indicate that there are insufficient photons to photolyze the additional molecules of H2O2. The estimated number of photons impinging on the surface area of the capillary (5.2 × 10¹⁴) is approximately the same as the number of molecules of 25 mM H2O2 in the capillary (5.3 × 10¹⁴ molecules of H2O2; see Figure 6 for more data). The results clearly indicate that we are able to oxidize apomyoglobin with a single pulse of 248-nm KrF excimer laser light at 50 mJ/pulse output. The amount of oxidation is sensitive to the total energy output: when the energy output decreased below ~40 mJ/pulse, the fraction of oxidized protein decreased from 80% to ~50%. At this point, the laser output can be returned to 50 mJ/pulse by changing the KrF excimer gas.
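The photon and molecule counts quoted for the two setups can be reproduced with a short calculation. The sketch below is illustrative only: the function names are hypothetical, the physical constants are standard values, and the inputs are those given in the text (50 mJ at 248 nm; 10 µL of 20 mM H2O2 in the pipet tip; 0.035 µL of 25 mM H2O2 in the capillary). The number of photons that actually strike the capillary depends on the beam geometry and is not derived here.

```python
PLANCK_J_S = 6.62607015e-34       # Planck constant, J s
LIGHT_SPEED_M_S = 2.99792458e8    # speed of light, m/s
AVOGADRO = 6.02214076e23          # molecules per mole

def photons_per_pulse(pulse_energy_mJ, wavelength_nm):
    """Number of photons in one pulse: pulse energy divided by h*c/lambda."""
    photon_energy_J = PLANCK_J_S * LIGHT_SPEED_M_S / (wavelength_nm * 1e-9)
    return pulse_energy_mJ * 1e-3 / photon_energy_J

def molecules_in_volume(conc_mM, volume_uL):
    """Number of solute molecules in a given volume."""
    return conc_mM * 1e-3 * volume_uL * 1e-6 * AVOGADRO

photons_full_beam = photons_per_pulse(50.0, 248.0)    # whole 50-mJ pulse
h2o2_pipet_tip = molecules_in_volume(20.0, 10.0)      # first setup
h2o2_capillary = molecules_in_volume(25.0, 0.035)     # flow-cell setup

print(f"photons per 50-mJ pulse:   {photons_full_beam:.2e}")
print(f"H2O2 in pipet tip:         {h2o2_pipet_tip:.2e}")
print(f"H2O2 in irradiated volume: {h2o2_capillary:.2e}")
```

In the pipet tip the peroxide outnumbers the photons of the entire pulse by roughly two to one, whereas in the capillary the peroxide count at 25 mM is comparable to the photon estimate quoted in the text, consistent with the observed saturation near 20 mM.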
3.2.1 Limitations
As with any radical-based method, there are some limitations of this approach. One is the presence of aromatic buffers or sulfur-based reducing agents that can rapidly scavenge all radicals, directly competing with the protein for oxidation. When possible, these should not be added to the reaction mixture. Additionally, compounds that absorb significantly near 248 nm should be avoided when using FPOP at this wavelength.
Figure 6 Mass spectra of the oxidized protein to show the extents of apomyoglobin oxidation at various concentrations of H2O2: (A) 0 mM H2O2 (a small amount of column carryover is observed), (B) 5 mM H2O2, (C) 10 mM H2O2, (D) 15 mM H2O2, (E) 20 mM H2O2.
The method would seem to be ill suited to mapping protein–ligand interactions that are hydrophilic in nature. In these systems, the amino acid side chains that become solvent-excluded when the complex forms have low reaction rates with hydroxyl radicals. Little oxidation would occur on these residues, and minimal information would be gleaned about the site of interaction.
3.2.2 Mass spectrometric analysis
To desalt the protein, 5 µL of sample (containing 50 pmol of total protein) was loaded onto a C18 Opti-guard column (Optimize Technologies, Oregon City, OR) that had
been pre-equilibrated with 150 µL of water. The sample was then loaded and desalted with 350 µL of water. The protein was eluted at 20 µL/min with 50% CH3CN in 0.1% formic acid into a Q-ToF mass spectrometer. Data acquisition was carried out in the "W mode," which, by means of four passes through the mass analyzer, affords a mass resolving power of ~15,000 (full width at half maximum). The summed data were deconvoluted with MaxEnt 1, a protein deconvolution program made available by the instrument manufacturer (selecting appropriate mass ranges and a "resolution" of 0.5).
3.3 Investigating the reaction time
It would be preferable to have a method that completes the reaction before secondary unfolding of the protein can occur. Current photochemical and chemical methods require reaction times of 30 sec to 5 min. Synchrotron radiolysis requires at least 50 ms to carry out the dose-dependent footprinting measurements that are necessary to observe first-order decay kinetics. Any time is sufficiently short if the oxidation proceeds under single-hit conditions. Allowing more extensive oxidation, however, permits detection of more oxidation sites and increases the dynamic range of the experiment, but this requires more time (or more H2O2), opening the door to oxidation-induced protein unfolding. Given that proteins can undergo super-secondary-structure conformational dynamics in microseconds, as reviewed in the introduction [72,120,121], extending the oxidation beyond microseconds may lead to results complicated by oxidation-induced protein unfolding. Most proteins studied thus far require tens of microseconds to expose buried residues to solvent in a temperature-jump experiment in which the solution is heated by 20 °C in a few nanoseconds [71,72,122]. The multi-microsecond unfolding timescale represents a worst-case scenario and, as such, defines how quickly a chemical footprinting experiment must be completed so that only the native state of any protein is footprinted.
To determine the lifetime of hydroxyl radicals in solution, we can use the second-order rate of recombination of two hydroxyl radicals to re-form H2O2 as a worst-case scenario (k = 6 × 10⁹ M⁻¹ s⁻¹ [109]) to estimate the longest time for exposure. Considering this means of depleting OH radicals, one can show that the radicals exist at significant concentrations relative to the protein even after 100 µs (Figure 7).
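The recombination-limited lifetime can be illustrated with the integrated second-order rate law. The sketch below rests on stated assumptions: the rate law is written as d[OH]/dt = −k[OH]² (some tabulations define the constant so that a factor of 2 appears), the starting radical concentration is taken as 1 mM (the maximum value the text adopts for its calculated profiles), and the function name is hypothetical.

```python
def oh_after_recombination(c0_M, k_per_M_s, t_s):
    """Integrated second-order decay for d[OH]/dt = -k [OH]^2.

    Assumes self-recombination is the only loss channel (no scavenger,
    no reaction with protein), which is the worst case for exposure time.
    """
    return c0_M / (1.0 + k_per_M_s * c0_M * t_s)

K_RECOMBINATION = 6e9   # M^-1 s^-1, value quoted in the text
C0 = 1e-3               # M, assumed maximum starting radical concentration

for t_us in (0.1, 1.0, 10.0, 100.0):
    remaining = oh_after_recombination(C0, K_RECOMBINATION, t_us * 1e-6)
    print(f"t = {t_us:6.1f} us   [OH] = {remaining * 1e6:8.2f} uM")

oh_at_100_us = oh_after_recombination(C0, K_RECOMBINATION, 100e-6)
```

Under these assumptions the radical concentration after 100 µs is still in the low-micromolar range, which is comparable to the protein concentrations used in this work.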
Whereas global tertiary changes are unlikely to occur in 100 µs, changes in the organization of secondary structure can occur in the low-microsecond range. The time of exposure can be reduced by using a scavenger (in excess) to remove the radicals from the reaction medium more rapidly than they dimerize to re-form H2O2. Using the known rate constants for reactions of phenylalanine or glutamine with the hydroxyl radical and assuming a maximum starting concentration of 1 mM radical, one can calculate the time-dependent profile of hydroxyl radical concentration in the solution (Figure 7). The use of phenylalanine at 20 mM results in nearly complete OH radical interception and consequently almost no protein oxidation. Although we used glutamine at 15 or 20 mM, the concentration of the scavenger can be varied over a larger range. Increasing the concentration of the
Figure 7 Calculated time-dependent concentrations of OH radical as a function of the presence of a scavenger. Diamonds with no line (reference data): the second-order disappearance of hydroxyl radical as a result of self-reaction to re-form H2O2. Squares: the first-order disappearance of hydroxyl radical as a result of reaction with excess phenylalanine (20 mM), which quenches the radical in < 0.1 µs. Triangles: the first-order disappearance of hydroxyl radical as a result of reaction with excess glutamine (20 mM), which quenches the radical in < 1 µs.
scavenger should decrease the extent of hydroxyl radical modification because the glutamine competes increasingly effectively with the reactive residues on the protein for radicals. The result, at the protein level, is a decrease in the amount of oxidized protein. Figure 8 shows the deconvoluted mass spectra for oxidized myoglobin at scavenger concentrations of 0, 5, 10, 20, and 60 mM glutamine. This approach allows a calculation of the decay rate of an unmodified protein (or peptide constituent) to ensure that the native state is being probed. For example, if a protein should unfold as a result of oxidation, new channels of reactivity will open and the decay rate will become multiphasic [123]. A clean, first-order decay may be a gold standard in the protein-footprinting method and indicates no detectable unfolding. By taking the ratio of the unmodified protein signal to the total protein signal as the glutamine concentration varies, we obtained a linear decay when using different concentrations of glutamine (the radical dose decreases as the glutamine concentration increases). Figure 9 shows the linear decay on a semi-log plot with an R² value of 0.99. The 0, 5, and 10 mM glutamine data points were excluded because the scavenger concentration was probably not in excess of that of the radical. Smaller changes in dose can also be effected by changing the ionic strength (Figure 10). In this work, there was a background salt concentration of 1 mM NaH2PO4, pH 7.8, but below 50 mM NaCl, minimal change in the oxidation pattern was observed. Another way to alter the dose of hydroxyl radicals is to use varying concentrations of protein, as shown in Figure 11. One must be careful that the
Figure 8 Deconvoluted mass spectra of oxidized myoglobin in the presence of increasing concentrations of glutamine. Increasing the glutamine concentration decreases the signal for oxidized protein.
Figure 9 Semi-log plot demonstrating first-order decay of unmodified protein/total protein versus glutamine concentration. As the glutamine concentration was increased, the dose decreased. (Plot title: Decay of Unmodified Myoglobin versus Glutamine Concentration; y-axis: normal protein signal/total protein signal, 10% to 100%, logarithmic scale; x-axis: [Glutamine] (mM), 0 to 100; R² = 0.9923.)
protein does not aggregate or, more importantly, alter its oligomerization state if the goal is to obtain dose-dependent curves. This approach may be useful for self-association studies because the work can be performed in the presence of a constant concentration of a non-interfering protein (e.g., BSA). The protein concentration can be decreased to at least as low as 100 nM by diluting the protein prior to oxidation. We tested this idea by diluting myoglobin to 100 nM and submitting the protein to FPOP, and we observed no difference in the pattern of oxidized protein signals compared to that found for a 1 µM myoglobin sample. For samples that aggregate at the higher concentrations needed by other methods, MS provides a way to measure the solvent exposure at low concentrations, with only micrograms of material.
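The semi-log linearity test used for the dose-response data (Figure 9) amounts to an ordinary least-squares fit of the logarithm of the unmodified fraction against scavenger concentration. The following sketch shows one way to carry out such a check. The data points are synthetic, generated from an exact exponential with a hypothetical slope, and are not the measured values from this work; the function name is likewise hypothetical.

```python
import math

def semilog_fit(x_values, y_values):
    """Least-squares line through (x, ln y); returns slope, intercept, R^2."""
    ln_y = [math.log(y) for y in y_values]
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_ln_y = sum(ln_y) / n
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    sxy = sum((x - mean_x) * (ly - mean_ln_y) for x, ly in zip(x_values, ln_y))
    slope = sxy / sxx
    intercept = mean_ln_y - slope * mean_x
    ss_res = sum((ly - (intercept + slope * x)) ** 2
                 for x, ly in zip(x_values, ln_y))
    ss_tot = sum((ly - mean_ln_y) ** 2 for ly in ln_y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# SYNTHETIC illustration only: fractions generated from a single exponential
# with a made-up slope of 0.015 per mM; not data from the experiments.
glutamine_mM = [20.0, 40.0, 60.0, 80.0, 100.0]
unmodified_fraction = [0.2 * math.exp(0.015 * c) for c in glutamine_mM]

slope, intercept, r_squared = semilog_fit(glutamine_mM, unmodified_fraction)
print(f"slope = {slope:.4f} per mM, R^2 = {r_squared:.4f}")
```

With real data, a value of R² close to 1 over the range where the scavenger is in excess would be consistent with single-exponential behavior, whereas systematic curvature would suggest the multiphasic decay expected if oxidation induces unfolding.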
Figure 10 Deconvoluted mass spectra of oxidized myoglobin in the presence of increasing concentrations of NaCl. The results show increasing signal for oxidized protein as [NaCl] increases.
Figure 11 Deconvoluted mass spectra of oxidized myoglobin as a function of protein concentration.
Another means to alter the dose of hydroxyl radicals is to simply change the hydrogen peroxide concentration. Figure 12 shows the effect on the extent of protein oxidation in a solution of hydrogen peroxide at concentrations from 100 mM down to 1 mM. Most notably, there is almost no difference between oxidation in 50 mM versus 100 mM H2O2 concentrations. This strongly indicates that nearly 100% of the photons are absorbed at 50 mM, leading to saturation.
Figure 12 Deconvoluted mass spectra of oxidized myoglobin as the H2O2 concentration is varied.
A more powerful laser, such as the EX100, capable of 100 mJ/pulse of 248-nm light, should be able to photolyze H2O2 at higher concentrations.
3.4 Conclusions
The fact that hydroxyl radicals react with Cys, Met, Trp, Tyr, Phe, and His at nearly diffusion-limited rates [109] can be exploited in methods to footprint proteins. These residues are frequently found at protein–ligand interfaces, suggesting that reactions with OH radicals can be used to footprint protein–ligand interfaces [124,125]. Success requires choosing the appropriate laser, reaction conditions, and concentration of scavenger. The early stages of the development of a method involving photochemical production of OH radicals by laser irradiation of a solution of protein containing small amounts of H2O2 included testing various laser wavelengths capable of homolytically cleaving H2O2 into hydroxyl radicals. We call this method "fast photochemical oxidation of proteins" (FPOP). Although a Nd:YAG 266-nm laser was able to generate a very small amount of oxidized protein, the use of 248-nm light from a KrF laser resulted in substantial oxidation of the protein, strongly suggesting that this is a preferred wavelength at which to carry out FPOP. The approach is to irradiate the sample in a flow cell, matching the laser pulse frequency to the time required for a bolus or plug of sample to pass completely through the flow cell. The laser frequency was decreased by 10% to leave a diffusion barrier between oxidized protein and the next portion of sample to be irradiated; this ensures that a small amount of protein is not exposed to photolytically generated hydroxyl radicals, so that some unreacted protein remains as a reference. By adding at least 15 mM glutamine to the solution prior to irradiation, we found we could control the time and extent of oxidation. Inclusion of this
scavenger reduces the reaction duration from ~100 µs (without scavenger) to 1 µs. Carrying out the oxidation this rapidly ensures that no protein super-secondary structure unfolds during the reaction and that the oxidations observed are those of the native protein.
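The shortening of the radical lifetime by a scavenger in excess can be estimated from pseudo-first-order kinetics. The sketch below is an illustration under assumptions that are not stated in this section: the bimolecular rate constants for glutamine and phenylalanine with the hydroxyl radical are typical literature values chosen for illustration, "quenched" is taken arbitrarily to mean 1% of the radicals remaining, and the function name is hypothetical.

```python
import math

def quench_time_s(k_per_M_s, scavenger_M, fraction_remaining=0.01):
    """Time for [OH] to fall to a given fraction under pseudo-first-order decay.

    Assumes the scavenger is in large excess, so that
    [OH](t) = [OH]0 * exp(-k * [scavenger] * t).
    """
    return math.log(1.0 / fraction_remaining) / (k_per_M_s * scavenger_M)

# ASSUMED rate constants (illustrative literature-style values, not from this
# section): glutamine ~5.4e8 M^-1 s^-1, phenylalanine ~6.9e9 M^-1 s^-1.
K_GLUTAMINE = 5.4e8
K_PHENYLALANINE = 6.9e9
SCAVENGER_M = 20e-3   # 20 mM, the concentration used for Figure 7

t_gln = quench_time_s(K_GLUTAMINE, SCAVENGER_M)
t_phe = quench_time_s(K_PHENYLALANINE, SCAVENGER_M)
print(f"glutamine, 20 mM:     99% quenched in {t_gln * 1e6:.2f} us")
print(f"phenylalanine, 20 mM: 99% quenched in {t_phe * 1e6:.3f} us")
```

With these assumed constants, the estimate falls below 1 µs for glutamine and below 0.1 µs for phenylalanine, in line with the timescales given in the caption of Figure 7.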
ACKNOWLEDGEMENT This work was supported by a grant from the NIH National Centers for Research Resources (P41RR00954).
REFERENCES 1 M.H. Koch, P. Vachette and D.I. Svergun, Small-angle scattering: A view on the properties, structures and structural changes of biological macromolecules in solution, Q. Rev. Biophys., 36(2) (2003) 147–227. 2 W. Braun, J. Kallen, V. Mikol, M.D. Walkinshaw and K. Wuthrich, Three-dimensional structure and actions of immunosuppressants and their immunophilins, FASEB J., 9(1) (1995) 63–72. 3 R. Harrer, Associations between light-harvesting complexes and Photosystem II from Marchantia polymorpha L. determined by two- and three-dimensional electron microscopy, Photosynth. Res., 75(3) (2003) 249–258. 4 N. Volkmann, H. Liu, L. Hazelwood, E.B. Krementsova, S. Lowey, K.M. Trybus and D. Hanein, The structural basis of myosin V processive movement as revealed by electron cryomicroscopy, Mol. Cell, 19(5) (2005) 595–605. 5 C.L. Wolfe, J.A. Warrington, L. Treadwell and M.T. Norcum, A three-dimensional working model of the multienzyme complex of aminoacyl-tRNA synthetases based on electron microscopic placements of tRNA and proteins, J. Biol. Chem., 280(46) (2005) 38870–38878. 6 D.J. Galas and A. Schmitz, DNAse footprinting: A simple method for the detection of proteinDNA binding specificity, Nucleic Acids Res., 5(9) (1978) 3157–3170. 7 R.A. Bradshaw and A.L. Burlingame, From proteins to proteomics, IUBMB Life, 57(4–5) (2005) 267–272. 8 D.C. Chamrad, G. Koerting, J. Gobom, H. Thiele, J. Klose, H.E. Meyer and M. Blueggel, Interpretation of mass spectrometry data for high-throughput proteomics, Anal. Bioanal. Chem., 376(7) (2003) 1014–1022. 9 G. Wagner and K. Wuthrich, Amide protein exchange and surface conformation of the basic pancreatic trypsin inhibitor in solution. Studies with two-dimensional nuclear magnetic resonance, J. Mol. Biol., 160(2) (1982) 343–361. 10 T.D. Tullius, Physical studies of protein–DNA complexes by footprinting, Annu. Rev. Biophys. Biophys. Chem., 18 (1989) 213–237. 11 M. Zhong, L. Lin and N.R. 
Kallenbach, A method for probing the topography and interactions of proteins: Footprinting of myoglobin, Proc. Natl. Acad. Sci. USA, 92(6) (1995) 2111–2115. 12 S.A. McMahan and R.R. Burgess, Use of aryl azide cross-linkers to investigate protein–protein interactions: An optimization of important conditions as applied to Escherichia coli RNA polymerase and localization of a sigma 70-alpha cross-link to the C-terminal region of alpha, Biochemistry, 33(40) (1994) 12092–12099. 13 Y. Chen, Y.W. Ebright and R.H. Ebright, Identification of the target of a transcription activator protein by protein–protein photocrosslinking, Science, 265(5168) (1994) 90–92. 14 E. Heyduk and T. Heyduk, Mapping protein domains involved in macromolecular interactions: A novel protein footprinting approach, Biochemistry, 33(32) (1994) 9643–9650. 15 R. Miyake, K. Murakami, J.T. Owens, D.P. Greiner, O.N. Ozoline, A. Ishihama and C.F. Meares, Dimeric association of Escherichia coli RNA polymerase alpha subunits, studied by cleavage of single-cysteine alpha subunits conjugated to iron-(S)-1-[p-(bromoacetamido)benzyl]ethylenediaminetetraacetate, Biochemistry, 37(5) (1998) 1344–1349.
16 R. Kluger and A. Alagic, Chemical cross-linking and protein–protein interactions — A review with illustrative protocols, Bioorg. Chem., 32(6) (2004) 451–472. 17 L. Konermann and D.A. Simmons, Protein-folding kinetics and mechanisms studied by pulselabeling and mass spectrometry, Mass Spectrom. Rev., 22(1) (2003) 1–26. 18 M.M. Krishna, L. Hoang, Y. Lin and S.W. Englander, Hydrogen exchange methods to study protein folding, Methods, 34(1) (2004) 51–64. 19 K.A. Resing, A.N. Hoofnagle and N.G. Ahn, Modeling deuterium exchange behavior of ERK2 using pepsin mapping to probe secondary structure, J. Am. Soc. Mass Spectrom., 10(8) (1999) 685–702. 20 Z. Zhang and D.L. Smith, Determination of amide hydrogen exchange by mass spectrometry: A new tool for protein structure elucidation, Protein Sci., 2(4) (1993) 522–531. 21 S.W. Englander and A. Poulsen, Hydrogen-tritium exchange of the random chain polypeptide, Biopolymers, 7(3) (1969) 379–393. 22 J.G. Mandell, A. Baerga-Ortiz, A.M. Falick and E.A. Komives, Measurement of solvent accessibility at protein–protein interfaces, Methods Mol. Biol., 305 (2005) 65–80. 23 A. Baerga-Ortiz, S. Bergqvist, J.G. Mandell and E.A. Komives, Two different proteins that compete for binding to thrombin have opposite kinetic and thermodynamic profiles, Protein Sci., 13(1) (2004) 166–176. 24 D. Kuck, Half a century of scrambling in organic ions: Complete, incomplete, progressive and composite atom interchange, Int. J. Mass Spectrom., 213(2/3) (2002) 101–144. 25 Y. Deng, H. Pan and D.L. Smith, Selective isotope labeling demonstrates that hydrogen exchange at individual peptide amide linkages can be determined by collision-induced dissociation mass spectrometry, J. Am. Chem. Soc., 121(9) (1999) 1966–1967. 26 D.R. Reed and S.R. Kass, Hydrogen-deuterium exchange at non-labile sites: A new reaction facet with broad implications for structural and dynamic determinations, J. Am. Soc. Mass Spectrom., 12(11) (2001) 1163–1168. 27 R.S. Johnson, D. 
Krylov and K.A. Walsh, Proton mobility within electrosprayed peptide ions, J. Mass Spectrom., 30(2) (1995) 386–387. 28 J.A.A. Demmers, D.T.S. Rijkers, J. Haverkamp, J.A. Killian and A.J.R. Heck, Factors affecting gasphase deuterium scrambling in peptide ions and their implications for protein structure determination, J. Am. Chem. Soc., 124(37) (2002) 11191–11198. 29 M.Y. Kim, C.S. Maier, D.J. Reed and M.L. Deinzer, Site-specific amide hydrogen/deuterium exchange in E. coli thioredoxins measured by electrospray ionization mass spectrometry, J. Am. Chem. Soc., 123(40) (2001) 9860–9866. 30 S. Akashi and K. Takio, Conformational changes of proteins observed by hydrogen/deuterium exchange and electrospray ionization mass spectrometry, J. Mass Spectrom. Soc. Jpn., 46(1) (1998) 75–82. 31 P. Guy, H. Remigy, M. Jaquinod, B. Bersch, L. Blanchard, A. Dolla and E. Forest, Study of the new stability properties induced by amino acid replacement of tyrosine 64 in cytochrome c553 from Desulfovibrio vulgaris Hildenborough using electrospray ionization mass spectrometry, Biochem. Biophys. Res. Commun., 218(1) (1996) 97–103. 32 M. Jaquinod, P. Guy, F. Halgand, M. Caffrey, J. Fitch, M. Cusanovich and E. Forest, Stability study of Rhodobacter capsulatus ferrocytochrome c2 wild-type and site-directed mutants using hydrogen/ deuterium exchange monitored by electrospray ionization mass spectrometry, FEBS Lett., 380(1,2) (1996) 44–48. 33 C.S. Maier, O.-H. Kim and M.L. Deinzer, Conformational properties of the A-state of cytochrome c studied by hydrogen/deuterium exchange and electrospray mass spectrometry, Anal. Biochem., 252(1) (1997) 127–135. 34 S.J. Valentine and D.E. Clemmer, H/D exchange levels of shape-resolved cytochrome c conformers in the gas phase, J. Am. Chem. Soc., 119(15) (1997) 3558–3566. 35 D.S. Wagner and R.J. Anderegg, Conformation of Cytochrome c studied by deuterium exchangeelectrospray ionization mass spectrometry, Anal. Chem., 66(5) (1994) 706–711. 36 J.R. 
Engen and D.L. Smith, Investigating the higher order structure of proteins: Hydrogen exchange, proteolytic fragmentation, and mass spectrometry, Methods Mol. Biol. (Totowa, NJ), 146(Mass Spectrometry of Proteins and Peptides) (2000) 95–112.
37 M.M. Zhu, D.L. Rempel, Z. Du and M.L. Gross, Quantification of protein–ligand interactions by mass spectrometry, titration, and H/D exchange: PLIMSTEX, J. Am. Chem. Soc., 125 (2003) 5252–5253. 38 M.M. Zhu, D.L. Rempel and M.L. Gross, Modeling data from titration, amide H/D exchange, and mass spectrometry to obtain protein-ligand binding constants, J. Am. Soc. Mass Spectrom., 15(3) (2004) 388–397. 39 K.D. Powell, S. Ghaemmaghami, M.Z. Wang, L. Ma, T.G. Oas and M.C. Fitzgerald, A general mass spectrometry-based assay for the quantitation of protein-ligand binding interactions in solution, J. Am. Chem. Soc., 124(35) (2002) 10256–10257. 40 K.D. Powell and M.C. Fitzgerald, Accuracy and precision of a new H/D exchange- and mass spectrometry-based technique for measuring the thermodynamic properties of protein–peptide complexes, Biochemistry, 42(17) (2003) 4962–4970. 41 I.A. Kaltashov and S.J. Eyles, Crossing the phase boundary to study protein dynamics and function: Combination of amide hydrogen exchange in solution and ion fragmentation in the gas phase, J. Mass Spectrom., 37(6) (2002) 557–565. 42 G.H. Dihazi and A. Sinz, Mapping low-resolution three-dimensional protein structures using chemical cross-linking and Fourier transform ion-cyclotron resonance mass spectrometry, Rapid Commun. Mass Spectrom., 17(17) (2003) 2005–2014. 43 D.M. Schulz, C. Ihling, G.M. Clore and A. Sinz, Mapping the topology and determination of a lowresolution three-dimensional structure of the calmodulin–melittin complex by chemical crosslinking and high-resolution FTICRMS: Direct demonstration of multiple binding modes, Biochemistry, 43(16) (2004) 4703–4715. 44 A. Sinz, Chemical cross-linking and mass spectrometry for mapping three-dimensional structures of proteins and protein complexes, J. Mass Spectrom., 38(12) (2003) 1225–1237. 45 N.S. Green, E. Reisler and K.N. 
Houk, Quantitative evaluation of the lengths of homobifunctional protein cross-linking reagents used as molecular rulers, Protein Sci., 10(7) (2001) 1293–1304. 46 M.O. Glocker, C. Borchers, W. Fiedler, D. Suckau and M. Przybylski, Molecular characterization of surface topology in protein tertiary structures by Amino-Acylation and mass spectrometric peptide mapping, Bioconjug. Chem., 5(6) (1994) 583–590. 47 H. Ohguro, K. Palczewski, K.A. Walsh and R.S. Johnson, Topographic study of arrestin using differential chemical modifications and hydrogen/deuterium exchange, Protein Sci., 3(12) (1994) 2428–2434. 48 V. Amico, S. Foti, R. Saletti, A. Cambria and G. Petrone, Identification of iodination sites in cytochrome c by high-performance liquid chromatography and fast atom bombardment mass spectrometry, Biomed. Environ. Mass Spectrom., 16(1–12) (1988) 431–437. 49 E.R. Stadtman and R.L. Levine, Free radical-mediated oxidation of free amino acids and amino acid residues in proteins, Amino Acids, 25(3–4) (2003) 207–218. 50 B. Onisko, E.G. Fernandez, M.L. Freire, A. Schwarz, M. Baier, F. Camina, J.R. Garcia, S. RodriguezSegade Villamarin and J.R. Requena, Probing PrPSc structure using chemical cross-linking and mass spectrometry: Evidence of the proximity of Gly90 amino termini in the PrP 27-30 aggregate, Biochemistry, 44(30) (2005) 10100–10109. 51 K.L. Bennett, M. Kussmann, P. Bjork, M. Godzwon, M. Mikkelsen, P. Sorensen and P. Roepstorff, Chemical cross-linking with thiol-cleavable reagents combined with differential mass spectrometric peptide mapping — A novel approach to assess intermolecular protein contacts, Protein Sci., 9(8) (2000) 1503–1518. 52 J. Peterson James, M. Young Malin and J. Takemoto Larry, Probing alpha-crystallin structure using chemical cross-linkers and mass spectrometry, Mol. Vis. [Electronic], 10 (2004) 857–866. 53 T. Heyduk, N. Baichoo and E. Heyduk, Hydroxyl radical footprinting of proteins using metal ion complexes, Met. Ions. Biol. 
Syst., 38 (2001) 255–287. 54 S.D. Maleknia, M. Brenowitz and M.R. Chance, Millisecond radiolytic modification of peptides by synchrotron X-rays identified by mass spectrometry, Anal. Chem., 71(18) (1999) 3965–3973. 55 J.H. Fenton and H. Jackson, The oxidation of polyhydric alcohols in presence of iron, J. Chem. Soc., Trans., 75 (1899) 1. 56 H.J.H. Fenton and H.O. Jones, The oxidation of organic acids in presence of ferrous iron. Part I, J. Chem. Soc., Trans., 77 (1900) 69–76.
57 J.S. Sharp, J.M. Becker and R.L. Hettich, Protein surface mapping by chemical oxidation: Structural analysis by mass spectrometry, Anal. Biochem., 313(2) (2003) 216–225. 58 H.C. Urey, L.H. Dawsey and F.O. Rice, Absorption spectrum and decomposition of hydrogen peroxide by light, J. Am. Chem. Soc., 51 (1929) 1371–1383. 59 R.B. Holt, C.K. McLane and O. Oldenberg, Ultraviolet absorption spectrum of hydrogen peroxide, J. Chem. Phys., 16 (1948) 225–229. 60 S.G. Schrank, H.J. Jose, R.F. Moreira and H.F. Schroder, Applicability of Fenton and H2O2/UV reactions in the treatment of tannery wastewaters, Chemosphere, 60(5) (2005) 644–655. 61 J.S. Sharp, J.M. Becker and R.L. Hettich, Analysis of protein solvent accessible surfaces by photochemical oxidation and mass spectrometry, Anal. Chem., 76(3) (2004) 672–683. 62 K. Morand, G. Talbo and M. Mann, Oxidation of peptides during electrospray ionization, Rapid Commun. Mass Spectrom., 7(8) (1993) 738–743. 63 S.D. Maleknia, M.R. Chance and K.M. Downard, Electrospray-assisted modification of proteins: A radical probe of protein structure, Rapid Commun. Mass Spectrom., 13(23) (1999) 2352–2358. 64 J.W.H. Wong, S.D. Maleknia and K.M. Downard, Hydroxyl radical probe of the calmodulin– melittin complex interface by electrospray ionization mass spectrometry, J. Am. Soc. Mass Spectrom., 16(2) (2005) 225–233. 65 S.D. Maleknia, J.W.H. Wong and K.M. Downard, Photochemical and electrophysical production of radicals on millisecond timescales to probe the structure, dynamics and interactions of proteins, Photochem. Photobiol. Sci., 3(8) (2004) 741–748. 66 J.W.H. Wong, S.D. Maleknia and K.M. Downard, Study of the ribonuclease-S-protein-peptide complex using a radical probe and electrospray ionization mass spectrometry, Anal. Chem., 75(7) (2003) 1557–1563. 67 S.D. Maleknia and K. Downard, Radical approaches to probe protein structure, folding, and interactions by mass spectrometry, Mass Spectrom. Rev., 20(6) (2001) 388–401. 68 C.L. 
Hawkins and M.J. Davies, Generation and propagation of radical reactions on proteins, Biochim. Biophys. Acta, 1504(2–3) (2001) 196–219. 69 M.J. Davies and R.T. Dean, Radical-mediated protein oxidation: From chemistry to medicine, Oxford University Press, Oxford, 1997, pp 44–45. 70 B. Sclavi, M. Sullivan, M.R. Chance, M. Brenowitz and S.A. Woodson, RNA folding at millisecond intervals by synchrotron hydroxyl radical footprinting, Science (Washington, DC), 279(5358) (1998) 1940–1943. 71 M. Gulotta, R. Gilmanshin, T.C. Buscher, R.H. Callender and R.B. Dyer, Core formation in apomyoglobin: Probing the upper reaches of the folding energy landscape, Biochemistry, 40(17) (2001) 5137–5143. 72 D.M. Vu, J.K. Myers, T.G. Oas and R.B. Dyer, Probing the folding and unfolding dynamics of secondary and tertiary structures in a three-helix bundle protein, Biochemistry, 43(12) (2004) 3582–3589. 73 I. Shcherbakova, S. Mitra, R.H. Beer and M. Brenowitz, Fast Fenton footprinting: A laboratorybased method for the time-resolved analysis of DNA, RNA and proteins, Nucleic Acids Res., 34(6) (2006) e48. 74 K. Takamoto and M.R. Chance, Radiolytic protein footprinting with mass spectrometry to probe the structure of macromolecular complexes, Annu. Rev. Biophys. Biomol. Struct., 35 (2006) 251–276. 75 M. Brenowitz, D.A. Erie and M.R. Chance, Catching RNA polymerase in the act of binding: Intermediates in transcription illuminated by synchrotron footprinting, Proc. Natl. Acad. Sci. USA, 102(13) (2005) 4659–4660. 76 B. Sclavi, S. Woodson, M. Sullivan, M. Chance and M. Brenowitz, Following the folding of RNA with time-resolved synchrotron X-ray footprinting, Meth. Enzymol., 295(Energetics of Biological Macromolecules, Part B) (1998) 379–402. 77 M.R. Chance, Unfolding of apomyoglobin examined by synchrotron footprinting, Biochem. Biophys. Res. Commun., 287(3) (2001) 614–621. 78 H. Rashidzadeh, S. Khrapunov, M.R. Chance and M. 
Brenowitz, Solution structure and interdomain interactions of the Saccharomyces cerevisiae "TATA binding protein" (TBP) probed by radiolytic protein footprinting, Biochemistry, 42(13) (2003) 3655–3665.
176
David M. Hambly and Michael L. Gross
79 J.-Q. Guan, K. Takamoto, S.C. Almo, E. Reisler and M.R. Chance, Structure and dynamics of the actin filament, Biochemistry, 44(9) (2005) 3166–3175. 80 S.D. Maleknia, C.Y. Ralston, M.D. Brenowitz, K.M. Downard and M.R. Chance, Determination of macromolecular folding and structure by synchrotron X-ray radiolysis techniques, Anal. Biochem., 289(2) (2001) 103–115. 81 W.M. Garrison, Reaction mechanisms in the radiolysis of peptides, polypeptides, and proteins, Chem. Rev. (Washington, DC), 87(2) (1987) 381–398. 82 S.G. Reddy, K.K. Wong, C.V. Parast, J. Peisach, R.S. Magliozzo and J.W. Kozarich, Dioxygen inactivation of pyruvate formate-lyase: EPR evidence for the formation of protein-based sulfinyl and peroxyl radicals, Biochemistry, 37(2) (1998) 558–563. 83 O.H. Wheeler and R. Montalvo, Radiolysis of phenylalanine and tyrosine and aqueous solution, Radiat. Res., 40(1) (1969) 1–10. 84 S. Steenken and P. O’Neill, Oxidative demethoxylation of methoxylated phenols and hydroxybenzoic acids by the hydroxyl radical. An in situ electron spin resonance, conductometric pulse radiolysis and product analysis study, J. Phys. Chem., 81(6) (1977) 505–508. 85 R.C. Armstrong and A.J. Swallow, Pulse- and gamma-radiolysis of aqueous solutions of tryptophan, Radiat. Res., 40(3) (1969) 563–579. 86 S.V. Jovanovic and M.G. Simic, Repair of tryptophan radicals by antioxidants, J. Free Radic. Biol. Med., 1(2) (1985) 125–129. 87 L. Josimovic, I. Jankovic and S.V. Jovanovic, Radiation induced decomposition of tryptophan in the presence of oxygen, Radiat. Phys. Chem., 41(6) (1993) 835–841. 88 M.J. Davies, S. Fu, H. Wang and R.T. Dean, Stable markers of oxidant damage to proteins and their application in the study of human disease, Free Radic. Biol. Med., 27(11/12) (1999) 1151–1163. 89 R.V. Winchester and K.R. Lynn, X- and g-radiolysis of some tryptophan dipeptides, Int. J. Radiat. Biol. Relat. Stud. Phys. Chem. Med., 17(6) (1970) 541–548. 90 L.P. Candeias, P. Wardman and R.P. 
Mason, The reaction of oxygen with radicals from oxidation of tryptophan and indole-3-acetic acid, Biophys. Chem., 67(1–3) (1997) 229–237. 91 G.G. Jayson, G. Scholes and J. Weiss, Formation of formylkynurenine by the action of X-rays on tryptophan in aqueous solution, Biochem. J., 57 (1954) 386–390. 92 T.J. Simat and H. Steinhart, Oxidation of free Tryptophan and Tryptophan residues in peptides and proteins, J. Agric. Food Chem., 46(2) (1998) 490–498. 93 K. Uchida and S. Kawakishi, 2-Oxohistidine as a novel biological marker for oxidatively modified proteins, FEBS Lett., 332(3) (1993) 208–210. 94 K. Uchida and S. Kawakishi, Selective oxidation of imidazole ring in histidine residues by the ascorbic acid-copper ion system, Biochem. Biophys. Res. Commun., 138(2) (1986) 659–665. 95 J. Kopoldova and S. Hrncir, Gamma-radiolysis of aqueous solution of histidine, Zeitschrift fuer Naturforschung, C: J. Biosci., 32C(7–8) (1977) 482–487. 96 K.M. Bansal and R.M. Sellers, Polarographic and optical pulse radiolysis study of the radicals formed by hydroxyl radical attack on imidazole and related compounds in aqueous solutions, J. Phys. Chem., 79(17) (1975) 1775–1780. 97 P.S. Rao, M. Simic and E. Hayon, Pulse radiolysis study of imidazole and histidine in water, J. Phys. Chem., 79(13) (1975) 1260–1263. 98 V.A. Burgess, C.J. Easton and M.P. Hay, Selective reaction of glycine residues in hydrogen atom transfer from amino acid derivatives, J. Am. Chem. Soc., 111(3) (1989) 1047–1052. 99 C. Von Sonntag and H.-P. Schuchmann, Peroxyl radicals in aqueous solutions, Peroxyl Radic., (1997) 173–234. 100 J.E. Bennett, Kinetic electron paramagnetic resonance study of the reactions of tert-butylperoxyl radicals in aqueous solution, J. Chem. Soc., Faraday Trans., 86(19) (1990) 3247–3252. 101 P. Neta, R.E. Huie and A.B. Ross, Rate constants for reactions of peroxyl radicals in fluid solutions, J. Phys. Chem. Ref. Data, 19(2) (1990) 413–513. 102 E. Bothe, M.N. Schuchmann, D. 
Schulte-Frohlinde and C. Von Sonntag, Hydroperoxyl elimination from a-hydroxyalkylperoxyl radicals in aqueous solution, Photochem. Photobiol., 28(4–5 (Singlet Oxygen Relat. Species Chem. Biol.)) (1978) 639–644.
Hydroxyl Radical Profiling of Solvent-Accessible Protein Residues
177
103 Y. Ilan, J. Rabani and A. Henglein, Pulse radiolytic investigations of peroxy radicals produced from 2-propanol and methanol, J. Phys. Chem., 80(14) (1976) 1558–1562. 104 S. Abramovitch and J. Rabani, Pulse radiolytic investigations of peroxy radicals in aqueous solutions of acetate and glycine, J. Phys. Chem., 80(14) (1976) 1562–1565. 105 J. Rabani, D. Klug-Roth and A. Henglein, Pulse radiolytic investigations of OHCH2O2 radicals, J. Phys. Chem., 78(21) (1974) 2089–2093. 106 J.A. Howard, Absolute rate constants for reactions of oxyl radicals, Adv. Free Radical Chem. (London), 4 (1972) 49–173. 107 J.E. Bennett, D.M. Brown and B. Mile, Electron spin resonance of the reactions of alkylperoxy radicals. 2. Equilibrium between alkylperoxy radicals and tetroxide molecules, Trans. Faraday Soc., 66(2) (1970) 397–405. 108 K. Adamic, J.A. Howard and K.U. Ingold, Absolute rate constants for hydrocarbon autoxidation. XVI. Reactions of peroxy radicals at low temperatures, Can. J. Chem., 47(20) (1969) 3803–3808. 109 G.V. Buxton, C.L. Greenstock, W.P. Helman and A.B. Ross, Critical review of rate constants for reactions of hydrated electrons, hydrogen atoms and hydroxyl radicals (.OH/.O-) in aqueous solution, J. Phys. Chem. Ref. Data, 17(2) (1988) 513–886. 110 C. von Sonntag, The Chemical Basis of Radiation Biology, Taylor and Francis, London, 1987. 111 G. Xu and M.R. Chance, Radiolytic modification and reactivity of amino acid residues serving as structural probes for protein footprinting, Anal. Chem., 77(14) (2005) 4549–4555. 112 R.L. Levine, B.S. Berlett, J. Moskovitz, L. Mosoni and E.R. Stadtman, Methionine residues may protect proteins from critical oxidative damage, Mech. Ageing Dev., 107(3) (1999) 323–332. 113 R.L. Levine, J. Moskovitz and E.R. Stadtman, Oxidation of methionine in proteins: Roles in antioxidant defense and cellular regulation, IUBMB Life, 50(4–5) (2000) 301–307. 114 R.L. Levine, L. Mosoni, B.S. Berlett and E.R. 
Stadtman, Methionine residues as endogenous antioxidants in proteins, Proc. Natl. Acad. Sci. USA, 93(26) (1996) 15036–15040. 115 W. Vogt, Oxidation of methionyl residues in proteins: Tools, targets, and reversal, Free Radic. Biol. Med., 18(1) (1995) 93–105. 116 M.R. DeFelippis, M. Faraggi and M.H. Klapper, Evidence for through-bond long-range electron transfer in peptides, J. Am. Chem. Soc., 112(14) (1990) 5640–5642. 117 M.H. Klapper and M. Faraggi, Applications of pulse radiolysis to protein chemistry, Q. Rev. Biophys., 12(4) (1979) 465–519. 118 M. Faraggi, M.R. DeFelippis and M.H. Klapper, Long-range electron transfer between tyrosine and tryptophan in peptides, J. Am. Chem. Soc., 111(14) (1989) 5141–5145. 119 W.A. Prutz, F. Siebert, J. Butler, E.J. Land, A. Menez and T. Montenay-Garestier, Charge transfer in peptides: Intramolecular radical transformations involving methionine, tryptophan and tyrosine, Biochim. Biophys. Acta (BBA) - Protein Struct. Mol. Enzymol., 705(2) (1982) 139–149. 120 W.A. Eaton, V. Munoz, S.J. Hagen, G.S. Jas, L.J. Lapidus, E.R. Henry and J. Hofrichter, Fast kinetics and mechanisms in protein folding, Annu. Rev. Biophys. Biomol. Struct., 29 (2000) 327–359. 121 D.N. Ivankov and A.V. Finkelstein, Prediction of protein folding rates from the amino acid sequence-predicted secondary structure, Proc. Natl. Acad. Sci. USA, 101(24) (2004) 8942–8944. 122 R. Gilmanshin, S. Williams, R.H. Callender, W.H. Woodruff and R.B. Dyer, Fast events in protein folding: Relaxation dynamics of secondary and tertiary structure in native apomyoglobin, Proc. Natl. Acad. Sci. USA, 94(8) (1997) 3709–3713. 123 J.G. Kiselar, S.D. Maleknia, M. Sullivan, K.M. Downard and M.R. Chance, Hydroxyl radical probe of protein surfaces using synchrotron X-ray radiolysis and mass spectrometry, Int. J. Radiat. Biol., 78(2) (2002) 101–114. 124 B. Ma, T. Elkayam, H. Wolfson and R. 
Nussinov, Protein–protein interactions: Structurally conserved residues distinguish between binding sites and exposed protein surfaces, Proc. Natl. Acad. Sci. USA, 100(10) (2003) 5772–5777. 125 D.M. Hambly and M.L. Gross, Laser flash photolysis of hydrogen peroxide to oxidize protein solvent-accessible residues on the microsecond timescale, J. Am. Soc. Mass Spectrom., 16(12) (2005) 2057–2063.
CHAPTER 8
Intact Protein Mass Measurements and Top-Down Mass Spectrometry: Application to Integral Membrane Proteins
Julian P. Whitelegge
Contents
1. Introduction
2. Intact Protein Mass Measurements
2.1 Sample preparation and separations for integral membrane and other proteins
3. Ionization
3.1 Dissociation of intact proteins
3.2 Data interpretation
3.3 Future considerations
References
Comprehensive Analytical Chemistry, Volume 52. ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00208-0. © 2009 Elsevier B.V. All rights reserved.
1. INTRODUCTION
The ease with which peptides can be delivered to mass spectrometers, subjected to automated tandem mass spectrometry, and the resulting data screened for matches to a protein sequence database has resulted in an overwhelming predominance of ‘bottom-up’ proteomics strategies. Because the mass spectrometer is more sensitive to peptides than to proteins, a larger proportion of the proteome is accessible to bottom-up proteomics. However, a growing number of mass spectrometrists are embracing intact protein mass measurements and sophisticated ‘top-down’ tandem mass spectrometry experiments on intact proteins, because they realize that proteomic information is lost when the individual proteins of the proteome are cleaved into a complex mixture of small peptides. Furthermore, bottom-up strategies favor peptides that are easily recovered and have robust ionization properties (proteotypic peptides), such that proteins or regions of proteins with unfavorable properties are selected against. Thus, while integral membrane proteins (IMPs) constitute around 30% of the proteome, their transmembrane domains are easily excluded from the average bottom-up proteomics experiment. Progress has been made in this respect by improving protocols for membrane protein extraction and digestion [1,2]. In this chapter we show how top-down mass spectrometry provides a route toward proteomics experiments that embrace the transmembrane domain by addressing the whole intact protein.
2. INTACT PROTEIN MASS MEASUREMENTS
The mass spectrum of an intact protein defines the native covalent state of the gene product and its heterogeneity. To better understand the sort of information that can be lost in ‘shotgun’ proteomics strategies we will consider the PsbH protein from spinach (Spinacia oleracea). The measured mass of the protein is 7,598 Da (Figure 1). To calculate the predicted mass of the protein one first visits the protein sequence database at NCBI (http://www.ncbi.nlm.nih.gov/). A search for ‘psbh spinach’ returns five entries because NCBI keeps a ‘redundant’ database in which new information is added as a new entry rather than by updating a single entry. Among the five entries, visual inspection reveals one from the SwissProt database. Not all proteins have a SwissProt (sp) entry yet, but it is nearly always best to use the SwissProt entry if there is one, because this database is maintained in a non-redundant state. This means the entry has been annotated with the latest information on that protein, so it is usually (but not always) the most reliable source of information, including the primary protein sequence. Annotation of SwissProt entries is not immediate, however: data must be published and updates then submitted manually. The primary sequence and information on known post-translational modifications are found at the bottom of the SwissProt entry and can be used for mass calculation. In the case of spinach PsbH, the SwissProt entry (P05146) includes two sequence ‘conflicts’ because a later sequencing effort disagreed with an earlier version. Steve Gómez actually predicted the sequencing errors based upon the intact mass measurement and a consideration of sequence conservation across a wide range of PsbH sequences [3]. According to the SwissProt entry the initiating Met residue is removed such that the mature form covers amino acids 2–73 (‘mature chain’).
The average mass can then be calculated using the link to ‘PeptideMass’, a mass calculator at the EXPASY informatics site (http://us.expasy.org/tools/peptide-mass.html). Note that the mass setting is ‘M’ rather than ‘M+H+’ and ‘average’ rather than ‘monoisotopic’ mass. Cys residues are unmodified (there are none in PsbH) and the enzyme is set to ‘no cutting’. Of course, there are many other mass calculators but PeptideMass is generally reliable. PeptideMass returns the calculated mass for residues 2–73 as 7,598.8559 Da, in reasonable agreement with the measured mass (Figure 1). Mass accuracy on quadrupole mass spectrometers is around 0.01% (100 ppm) giving a
Figure 1 The mass spectrum of an intact protein defines the native covalent state of the gene product and its heterogeneity. Intact protein electrospray ionization mass spectrometry was used to profile the PsbH protein from spinach thylakoid membranes prepared from plants incubated in low light versus those exposed to high light for 45 min. The peak for the unmodified protein has a mass of 7,598 Da and phosphorylation adducts (+80 Da) are notable. PsbH is known to have two phosphorylation sites though double phosphorylation can be seen only in the high light sample (7,758). Minor oxidative modifications (+16 Da, circles) can be seen on each phosphorylated form distributed evenly between all three states. Under high light conditions an as yet unidentified +32 Da modification (squares), probably also oxidative, appears preferentially associated with the phosphorylated forms of PsbH, one on the singly phosphorylated species (7,710) and two on the doubly phosphorylated species (7,792; 7,825). Thus the +32 Da modification is related or linked to the +80 Da phosphorylation. Whether one increases the probability of the other will need testing in more developed experiments. Since the two modifications are likely on different sites, a proteolytic cleavage that separates the two modifications (as in a bottom-up proteomics experiment) would result in loss of the ‘linkage’ information. Modified from Gómez and coworkers [4]. Permission obtained from American Society for Biochemistry and Molecular Biology, 2002.
margin of error of 0.76 Da for the PsbH measurement. The SwissProt entry for spinach PsbH mentions a single phosphorylation site at Thr3. Unfortunately, PeptideMass does not have a convenient way to include mass calculations for modified forms, so one must look up the delta mass for phosphorylation and add it to the calculated mass. The Delta Mass tool maintained by ABRF is useful in this respect (http://www.abrf.org/index.cfm/dm.home). If you cannot find the modification online it will be necessary to consider the changes to the atomic formula of the molecule introduced by the modification. Thus for phosphorylation (a net addition of HPO3) we add 80 Da to the calculated mass, giving 7,678 Da. Figure 1 shows that in the low light sample the singly phosphorylated form is more abundant than the unmodified form, on the assumption that the two species have the same ionization efficiency. For intact proteins this assumption is usually satisfactory but this is not
always the case with peptides. For this reason such a conclusion about abundance is usually described as ‘semi-quantitative’. Absolute quantification in mass spectrometry is achieved using internal standards, while relative quantification can be achieved using isotopic labeling strategies (see Warscheid, Chapter 17). The minor peaks in the low light spectrum probably correspond to ‘noise’. The different measured masses for PsbH are called intact mass tags (IMTs) and it should be noted that a single protein can give rise to many. Note also that if there were alternative phosphorylation sites on this protein, singly phosphorylated isoforms with the modification at different sites would have the same IMT.
The high light spectrum of PsbH (Figure 1) is more complex. A second phosphorylation is apparent. Its presence is supported by a SwissProt entry for Arabidopsis PsbH that reports phosphorylation at Thr5 as well as Thr3 (P56780). So consideration of what is happening in one species can help interpret what is happening in another. Such logic underlies comparative physiology and biochemistry, a discipline that draws little attention at this time. The mass calculated for the doubly phosphorylated species is in reasonable agreement with that measured (7,758 Da). ‘Reasonable agreement’ is a ‘woolly’ term that simply means that calculated and measured masses are ‘within measurement error’, 100 ppm in this case. Confidence can be dramatically boosted by working with accurate mass measurements of better than 5 ppm, though distinguishing protein phosphate from sulfate modifications, for example, requires still higher mass accuracy. Besides phosphorylation, there are other adducts appearing in the high light PsbH mass spectrum. A +16 Da modification, marked with the solid black circle, appears at low levels on all three phosphoforms. This is likely due to Met oxidation to its sulfoxide (MetO).
More striking, however, is the appearance of a +32 Da adduct seen only on the phosphorylated forms. The nature of the modification is unknown but an oxidative addition of two oxygen atoms is a reasonable hypothesis. The doubly phosphorylated form appears to have a sub-population with two of these modifications, while the singly phosphorylated species exhibits just one. Thus there appears to be a link between the +32 Da modification and phosphorylation. The biological significance of this link is not yet clear and one can speculate that one modification leads to the other or that one is a response to the other, and so on. The mass spectral output provides information that allows us to develop new hypotheses for future testing. What is clear is that if the +32 Da modification occurs at a different site from the N-terminal phosphorylations, a bottom-up proteomics experiment in which the protein is proteolyzed could separate the two pieces of information [4]. The histone code is now being considered in this context and Allis has speculated on binary switches involving methylation and phosphorylation of adjacent residues [5]. It is likely that in the coming years the paradigm of single modifications acting as on/off switches will be expanded to include much more complicated logic through multiple modifications.
How did we know the protein was PsbH? It was possible to match many thylakoid membrane proteins to their intact masses because post-translational modifications are minimal and predictable in most cases in the chloroplast [4]. PsbH is the only thylakoid protein in that size range. The TagIdent tool at
EXPASY (http://us.expasy.org/tools/tagident.html) can be used to search for database entries by intact mass alone, but the user must be aware that it does not take modifications into account because it relies on a mass calculated from the complete genomic translation. Many other systems exhibit greater post-translational modification, so that assignments based upon intact mass alone become unreliable and are at best coincidental. Identification of a protein from its mass can be accomplished in two ways. Ions from the intact protein can be isolated in the mass spectrometer for tandem mass spectrometry (see Section 3.1 on top-down MS), or samples collected concomitantly with elution of the IMT can be subjected to chemical cleavage (CNBr) or digestion (trypsin) to yield peptides for bottom-up tandem mass spectrometry. The only direct way to identify the IMT is through top-down MS, because the bottom-up approach could identify several proteins in a collected fraction, providing several candidates for the IMT. In the case of bottom-up analysis it becomes necessary that an N- or C-terminal fragment is analyzed by mass spectrometry such that the bottom-up data and the intact mass are consistent with a single processed species. Since the intact mass tag summarizes the primary structure of the protein, proteome-wide studies of IMTs can provide powerful insights. Analysis of the IMTs for nuclear-encoded thylakoid membrane proteins imported from the cytoplasm allowed us to classify three different membrane insertion mechanisms based upon patterns of transit peptide cleavage [6].
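The mass arithmetic used in this section can be collected into a short sketch. The Python below is illustrative only and is not part of the published workflow: the function names are hypothetical, and the modification masses are standard values derived from atomic weights rather than figures taken from the chapter. The only chapter data used are the PeptideMass result (7,598.8559 Da) and the 100 ppm accuracy quoted for quadrupole instruments. The sketch reproduces the 0.76 Da margin of error, predicts average masses for the phosphoforms, and gives a rough estimate (using monoisotopic shifts) of the accuracy needed to separate phosphate from sulfate.

```python
# Illustrative sketch; names and constants are assumptions, not the author's code.

# Average mass shift (Da) from standard atomic weights.
PHOSPHO_AVG = 79.9799      # net addition of HPO3
# Monoisotopic shifts (Da), used only for the phosphate/sulfate comparison.
PHOSPHO_MONO = 79.96633    # HPO3
SULFO_MONO = 79.95682      # SO3

CALC_PSBH = 7598.8559      # PeptideMass average mass, residues 2-73 (from the text)


def tolerance_da(mass_da, ppm):
    """Absolute mass window (Da) corresponding to a relative accuracy in ppm."""
    return mass_da * ppm / 1e6


def ppm_error(measured_da, calculated_da):
    """Signed difference between measured and calculated mass, in ppm."""
    return (measured_da - calculated_da) / calculated_da * 1e6


def phosphoform_mass(unmodified_da, n_phospho):
    """Average mass of a protein carrying n_phospho phosphate groups."""
    return unmodified_da + n_phospho * PHOSPHO_AVG


def ppm_to_resolve(delta_da, mass_da):
    """Relative accuracy (ppm) that equals a mass difference delta_da at mass_da."""
    return delta_da / mass_da * 1e6


if __name__ == "__main__":
    print(f"100 ppm window: +/- {tolerance_da(CALC_PSBH, 100):.2f} Da")
    for n in range(3):
        print(f"{n} phosphate(s): {phosphoform_mass(CALC_PSBH, n):.2f} Da")
    gap = PHOSPHO_MONO - SULFO_MONO
    needed = ppm_to_resolve(gap, phosphoform_mass(CALC_PSBH, 1))
    print(f"phosphate vs sulfate: {gap:.4f} Da, about {needed:.1f} ppm")
```

Under these assumptions the phosphate/sulfate difference is a little under 0.01 Da, which for a protein of this size corresponds to roughly 1 ppm; this is why 5 ppm accuracy, although a large improvement over 100 ppm, would still not separate the two modifications.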
2.1 Sample preparation and separations for integral membrane and other proteins
The right sample preparation and purification workflow can turn a seemingly impossible task into a routine one. IMPs are amenable to both electrospray ionization (ESI) and matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry provided they can be purified away from salts, lipids, and detergents in aqueous/organic solvent mixtures. The best way to achieve this is typically determined empirically but some general trends can be related. IMPs are best left in their native configuration until sample preparation, either in the membrane or as native complexes extracted in mild non-ionic detergents. Disturbance of the tertiary and secondary structure, by running an SDS gel, for example, often renders IMPs highly susceptible to aggregation. Often it is beneficial to precipitate IMPs with organic solvents in order to remove some lipids and detergents but this can render the precipitate difficult to dissolve. High concentrations of organic acids (usually formic acid) are often then necessary for solubilization, with immediate HPLC to transfer the proteins to a less reactive environment. In the case of thylakoid membrane proteins it is necessary to precipitate the proteins with acetone, stripping them of bound chlorophyll and other cofactors. The Halobacterium purple membrane, dominated by bacteriorhodopsin, can be conveniently solubilized directly into formic acid without precipitation. Mammalian membrane systems can be very resistant to dissolution and require detergent disruption followed by organic precipitation to render the proteins suitable for analysis. Clues to successful preparation of a membrane
system can sometimes be found in older literature, under unlikely titles demanding time-consuming manual searches in the library. The goal is always the same: to generate a protein-enriched sample that can be quickly solubilized in formic acid for immediate HPLC. Residual amounts of lipids and detergents should be expected and it is necessary that the chromatography system in use should separate the protein of interest from such contaminants. A suite of chromatography systems has been described for analysis of membrane proteins [7] and these are illustrated in Figure 2. While we originally described a system that involved a high concentration of formic acid in the aqueous phase throughout the separation [8], systems that rapidly separate the protein from excess formic acid are now favored, due to the potential for covalent formylation of the protein (+28 Da adducts). Liquid chromatography (HPLC) combined with online ESI-MS (LC-MS) has been used for all our method development efforts, allowing us to monitor the covalent status of the eluting proteins and to include minimal modification as a criterion in successful method development. Methods reported in the literature that were developed without online MS as a readout of analyte integrity should not be assumed to be useful without testing this criterion. Thus, while elevated temperatures are often helpful in HPLC separations, excessive temperature can accelerate undesirable chemistry. The thylakoid membrane cytochrome b6f complex was analyzed by size-exclusion chromatography (SEC) and reverse-phase chromatography (RPC) coupled with online ESI mass spectrometry (Figure 2). In both cases the sample was prepared by acetone precipitation prior to dissolution in formic acid (90% in water, v/v) and immediate injection to HPLC. The resolution of the SEC system used is limited, such that larger proteins elute in the 6–9 min range and smaller ones over 8–11 min (Figure 2A).
SEC works very well in its size-exclusion context and the larger protein mass spectrum is free from interference from small proteins or other small molecules in the sample or the formic acid. The four larger subunits (17–35 kDa) were not separated from each other but could be
Figure 2 The right sample preparation and purification workflow makes analysis of a membrane protein complex routine. A sample of spinach cytochrome b6f complex (300 µg protein) was precipitated with acetone (80%, v/v, 20 °C, 1 h) and dissolved in 90% formic acid for immediate LC-MS analysis. (A) Size-exclusion separation at 250 µL/min in chloroform/methanol/1% aqueous formic acid (4/4/1, v/v) using a silica support (SW2000 XL, 4.6 mm × 30 cm, Tosoh Biosciences, Montgomeryville, PA) at 40 °C. (B) Reverse-phase separation at 100 µL/min in aqueous/organic trifluoroacetic acid (0.1% TFA) using a polystyrene-divinylbenzene copolymer support (PLRP/S, 300 Å, 2 mm × 15 cm, Varian Inc., Palo Alto, CA) at 40 °C. The column was equilibrated at 95% A (0.1% TFA in water), 5% B (0.05% TFA, 50% acetonitrile, 50% isopropanol) for 30 min before sample injection. A compound linear gradient was initiated 5 min after injection, ramping to 40% B at 30 min and 100% B at 150 min. Column eluent was directed to the ESI source of a triple quadrupole mass spectrometer (API III+, PE Sciex, Concord, Canada) via a liquid flow splitter that delivered approximately half the eluent flow to a fraction collector (LC-MS+). Data were processed using BioMultiView software. Modified from Whitelegge and coworkers [9]. Permission obtained from American Society for Biochemistry and Molecular Biology, 2002.
deconvoluted from the composite mass spectrum (Figure 2B). The smaller subunits elute in the tail of the larger subunits but could also be deconvoluted from the mass spectrum [9]. The separation achieved using RPC is much better than that achieved with SEC and worked well for the cytochrome b6f sample (Figure 2C). Note that smaller subunits (PetL, N, M, G) tend to give stronger ion currents than the larger ones (PetB, PetC), though this depends on the individual ionization efficiencies achieved under the conditions used, and cyt f and PetD gave strong signals. An essential feature of this work is the low chemical background achieved when no proteins are eluting, such that reasonable signal/noise is achieved even with poorly ionizing proteins. PetB has four transmembrane helix domains and was the most challenging IMP of this analysis; despite relatively low ionization efficiency the signal/noise is excellent (Figure 2D) and the molecular mass profile is clearly deconvoluted. Note however that the most intense ion in the spectrum was derived from a singly oxidized (+16 Da) isoform of PetN whose retention was shortened by the modification. The cytochrome b subunit (PetB) was concluded to have a covalently bound heme group (+615 Da) based upon the difference between measured and calculated masses [9], and subsequent crystallography revealed the presence of a Cys-linked c-type heme, as well as the two non-covalently associated b-type hemes known to be associated with the complex [10,11]. The X-ray structure confirmed that the LC-MS analysis included all the subunits of the complex. Recovery of IMPs is not always quantitative. Some dispersed, aggregated protein can be captured on the 0.2 µm filters used to protect columns, while more can make its way through the filter but end up in a bound/insoluble state (Figure 3A).
Blank injections of formic acid are generally effective at cleaning the filter in the chloroform/methanol/aqueous formic acid solvent used for SEC, though abundant proteins such as bacteriorhodopsin can take several injections to clear. Occasional cleaning of the filter with nitric acid is recommended. The bound insoluble material can be shifted to the mobile phase by equilibration of the reverse-phase column in the chloroform/methanol/aqueous formic acid buffer used for SEC prior to making a formic acid injection. By inserting the size-exclusion column in line after the reverse-phase column, the released protein can be separated from the formic acid for improved mass spectrometry and UV quantification. The addition of isopropanol to HPLC buffers to enhance elution of IMPs was noted by Tarr and Crabb [12]. The utility of the polymeric column at elevated temperature was described by Bowyer and colleagues [13]. The chloroform/methanol precipitation protocol described by Wessel and Flügge [14] has been useful for much of our work but readers are warned that some IMPs can partition into the chloroform phase, from which they can be recovered by SEC as described. While the chromatographic systems described here have performed well for thylakoid membrane proteins as well as bacterial systems, there is undoubtedly a need for development of chromatographic systems for membrane proteins from mammalian systems. Hydrophilic-interaction chromatography (HILIC) has proved useful in mitochondrial systems, with aqueous organic extracts being loaded onto polyhydroxyethyl aspartamide columns equilibrated at high organic concentrations and then eluted with a gradient of decreasing organic concentration [15].
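The compound linear gradient given in the legend to Figure 2 (5% B at injection, gradient started 5 min after injection, 40% B at 30 min, 100% B at 150 min) can be written as a small piecewise-linear function, which may be convenient when transferring the method to another pump controller. The Python sketch below is illustrative only. It assumes that all times are measured from sample injection, which the legend does not state explicitly, and the function name and the table of breakpoints are hypothetical.

```python
# Illustrative sketch of the reverse-phase gradient from the Figure 2 legend.
# Assumption: all times are minutes after sample injection.

# (time in min, % solvent B) breakpoints of the compound linear gradient
GRADIENT_POINTS = [(0.0, 5.0), (5.0, 5.0), (30.0, 40.0), (150.0, 100.0)]


def percent_b(t_min):
    """Return % B at t_min by linear interpolation between breakpoints."""
    if t_min <= GRADIENT_POINTS[0][0]:
        return GRADIENT_POINTS[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT_POINTS, GRADIENT_POINTS[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    # After the last breakpoint the column is held at the final composition.
    return GRADIENT_POINTS[-1][1]


if __name__ == "__main__":
    for t in (0, 5, 17.5, 30, 90, 150):
        print(f"{t:6.1f} min: {percent_b(t):5.1f}% B")
```

Holding the final composition after 150 min is a choice made for the sketch; the legend does not describe a wash or re-equilibration step.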
Figure 3 Ionization is straightforward provided the protein is in solution in a suitable solvent. (A) Reverse-phase separations tend to retain sub-populations of some IMPs in a bound, insoluble sink. This sub-population can be shifted to the soluble phase by equilibrating the column in the buffer used for SEC in Figure 2 (100 µL/min), and making a formic acid injection. (B) By eluting the reverse-phase column as described in A through the size-exclusion column, the released protein can be separated from the formic acid for mass spectrometry. Figure 3A was modified from Whitelegge and coworkers [23] with permission of Future Medicine Ltd., London, 2006. Figure 3B was modified from Whitelegge and coworkers [24]. Permission obtained from Elsevier Ltd., Oxford, 2005.
3. IONIZATION
Ionization, by both ESI and MALDI, is straightforward provided the protein is in solution in a suitable solvent, typically an aqueous organic mixture lacking nonvolatile salts or detergents. It is important to note that in the case of MALDI-TOF the matrix solution solvent should be identical to, or compatible with, that of the sample. If the protein precipitates upon mixing with the matrix solution the experiment will be unsuccessful. There has been little experimentation beyond ESI and MALDI for ionization of membrane proteins. Fast-atom bombardment (FAB) ionization was successful for a small proteolipid [16] and Halgand and coworkers used atmospheric pressure photoionization (APPI) for analysis of a hydrophobic peptide [17]. Since both techniques tend to produce predominantly singly charged ions it is unlikely they will find general favor in membrane protein research. One new development that may hold some promise for the future is a technique called laser-induced liquid bead ionization desorption (LILBID) [18]. In this technique, aqueous microdroplets are gently excited with infra-red laser photons, resulting in generation of low charge-state ions of intact complexes at the lowest fluences and of intact subunits at higher fluences. LILBID has already been applied to complexes III and IV of the respiratory chain from a bacterial source [18]. One can conceive of combining a technique such as this with an electrospray plume in order to multiply charge the subunits of a desorbed
Intact Protein Mass Measurements and Top-Down Mass Spectrometry
189
complex in a manner analogous to MALDESI [19]. The microdroplets of LILBID are aqueous such that detergent solubilized micelles can be analyzed without complex chromatography separations. Another possibility for membrane protein research is field-induced droplet ionization under investigation by Beauchamp and colleagues [20]. Analogous to ESI, this process also appears to be applicable to solutions containing salts for working with native conditions. Clearly, there is an exciting future with discoveries open to young scientists.
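The preference for multiply charged ions can be illustrated with a little arithmetic. The sketch below is only an illustration, not part of the published method: it computes the m/z of the protonated ion [M + zH]z+ for a hypothetical 8 kDa protein and lists which charge states fall under an assumed upper m/z limit of 2,000. The function names, the mass and the limit are all assumptions chosen for the example.

```python
# Illustrative sketch: m/z of a protonated protein at different charge states.
# The protein mass and the m/z limit below are hypothetical example values.
PROTON_MASS = 1.007276  # mass of a proton in Da


def mz_of_charge_state(neutral_mass, z):
    """Return m/z of the [M + zH]z+ ion for a neutral mass given in Da."""
    return (neutral_mass + z * PROTON_MASS) / z


def observable_charge_states(neutral_mass, mz_limit=2000.0, z_max=60):
    """Return charge states whose m/z lies at or below an assumed m/z limit."""
    return [z for z in range(1, z_max + 1)
            if mz_of_charge_state(neutral_mass, z) <= mz_limit]


if __name__ == "__main__":
    mass = 8000.0  # hypothetical 8 kDa protein
    for z in (1, 2, 5, 8):
        print(z, round(mz_of_charge_state(mass, z), 3))
    print(observable_charge_states(mass)[:3])
```

Under these assumptions the singly charged ion sits near m/z 8,001, well outside the assumed range, while charge states of 5+ and above fall within it.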
3.1 Dissociation of intact proteins
The beauty of tandem mass spectrometry is the highly efficient direct purification one can achieve with the m/z selective filter. Thus a molecule of specific molecular mass can be isolated from a relatively complex mixture with ease, very quickly. While there are only two basic dissociation chemistries available for tandem mass spectrometry, a variety of intact protein gas-phase dissociation strategies provide versatile options for top-down mass spectrometry of IMPs.
The first dissociation mechanism available is collision-activated dissociation (CAD) that combines the kinetic energy of the selected precursor ions and collisions with inert gases (Ar and He are common) to thermally (vibrationally) excite the ions. With sufficient energy input, CAD occurs resulting in a backbone cleavage at the peptide bond generating b ions that include the N-terminus and y ions that include the C-terminus of the peptide or protein (see Chapter 1). CAD was first applied to intact proteins by Loo and coworkers shortly after the discovery of ESI [21]. CAD has been used for analysis of membrane protein primary structure on triple quadrupole [9,22], quadrupole TOF [9,23] and, recently, Fourier-transform ion cyclotron resonance (FT-ICR) mass analyzers [24,25]. CAD tends to produce distinct patterns of product ions as some bonds are more easily cleaved than others, so full sequence coverage should not be expected. It should be noted that multiply charged ions fragment much more readily than singly charged ions by CAD so that ESI is the only practical option for top-down mass spectrometry. Furthermore, different charge states of the same molecular ion can require different threshold dissociation energies and may yield different fragmentation patterns. More recently, a second mechanism, electron-capture dissociation (ECD), was discovered [26].
Zubarev and McLafferty describe a non-ergodic mechanism whereby low energy electrons are reacted with multiply charged positive ions in an FT-ICR cell. ECD occurs before the energy of the excited ion equilibrates such that sites of N–Cα cleavage are largely sequence independent, typically yielding better sequence coverage than CAD. An early observation with ECD was that larger proteins (>20 kDa) tended to exhibit charge reduction rather than fragmentation and it was proposed that tertiary structure would hold the protein together despite cleavage events such that the molecule appeared uncleaved. The use of infra-red irradiation to thermally excite gas phase ions such that they become denatured, concomitant with ECD, alleviates this problem. Such thermal excitation allows for activated ion ECD (aiECD). The level of thermal excitation is adjusted to avoid excessive excitation and backbone cleavage by the CAD mechanism. Infra-red multi-photon dissociation (IRMPD) and black-body infra-red dissociation (BIRD) describe thermal excitation experiments that deliberately cleave the peptide backbone by the CAD mechanism.
The FT-ICR cell is ideal for ECD because of the ease of bringing together negative electrons with positive ions. Photons are also easily introduced with the use of an infra-red laser for aiECD and IRMPD. CAD is now usually performed outside of the ICR cell with subsequent transmission of product ions to the cell, using hybrid ion trap or quadrupole FT-ICR systems that allow the cell to be maintained at optimal pressure. Top-down FT-ICR experiments have been performed on bacteriorhodopsin, a seven-transmembrane helix IMP, and the c-subunit of the ATP synthase Fo that has two transmembrane helices [24,25]. In the case of the c-subunit ECD alone resulted in charge reduction and it was necessary to use aiECD for efficient dissociation of this 8 kDa protein. The use of aiECD gave better sequence coverage than CAD and yielded extensive sequence information from the transmembrane domains [25]. It was concluded that transmembrane helices are stable in the chromatographic system used and remained intact in the gas phase until thermal excitation.
Another development that extends the utility of ECD is the related technique, electron-transfer dissociation (ETD) whereby an anion is used to supply the electron for the dissociation [27]. ETD has been implemented on linear ion trap mass spectrometers and promises to make the technique more widely available than ECD. Since top-down really needs the resolution afforded by FT-ICR, ETD on the linear ion trap is unsatisfactory for larger precursor ions. Implementation of ETD on the Orbitrap analyzer will help in this respect. Typically, top-down mass spectrometry is not done online because the experiments take longer than the chromatographic timescale allows and must be set up manually. There is current excitement that ETD might be more suitable for online top-down but this remains to be demonstrated.
Currently, we collect fractions during LC-MS on a low-resolution mass analyzer (LC-MS+) and then use these fractions for top-down experiments. First a full mass range scan is performed to define suitable precursor ions. The most intense ions from some membrane proteins sometimes fall higher than m/z 2,000, dictating use of extended mass range on typical ion trap mass spectrometers. Then a selected ion scan is used to inspect the chosen precursor ion and define conditions for CAD whereby 70–90% of the precursor is dissociated. Finally the product ion spectrum is collected over the full mass range. The sequence coverage achieved in a top-down analysis of the c-subunit benefited immensely from scanning to m/z 3,000 [25]. For proteins up to around 5,000 Da, a single scan may give good sequence coverage but for most top-down work multiple scans are averaged. Scans (or preferably FT transients) are typically averaged until visual inspection confirms good signal to noise on product ions. Even prolonged data collection may not yield full sequence coverage, however, and it makes sense to supplement CAD experiments with ECD for practical expansion of coverage.
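The relationship between the two fragment series can be made concrete with a short calculation. The following is a minimal sketch, assuming standard monoisotopic residue masses and singly protonated fragments only; real top-down spectra contain multiply charged fragments, and the function name `fragment_ions` is a hypothetical helper, not part of any package mentioned in this chapter. It returns b and y ions (the CAD series) and c and z• ions (the ECD series) for every backbone cleavage site.

```python
# Illustrative sketch: singly protonated b/y (CAD) and c/z-dot (ECD) fragment m/z.
# Monoisotopic residue masses in Da; fragment_ions is a hypothetical helper name.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
AMMONIA = 17.026549
H_ATOM = 1.007825
PROTON = 1.007276


def fragment_ions(sequence):
    """Return dict of fragment m/z lists, one entry per backbone cleavage site.

    Index i in every list refers to cleavage after residue i + 1, so
    b[i] and y[i] are the complementary pair produced by the same cleavage.
    """
    masses = [MONO[aa] for aa in sequence]
    ions = {"b": [], "y": [], "c": [], "z": []}
    for site in range(1, len(masses)):
        n_term = sum(masses[:site])   # residues kept on the N-terminal piece
        c_term = sum(masses[site:])   # residues kept on the C-terminal piece
        b = n_term + PROTON
        y = c_term + WATER + PROTON
        ions["b"].append(b)
        ions["y"].append(y)
        ions["c"].append(b + AMMONIA)           # c = b + NH3
        ions["z"].append(y - AMMONIA + H_ATOM)  # z-dot = y - NH3 + H
    return ions


if __name__ == "__main__":
    for series, values in fragment_ions("GAS").items():
        print(series, [round(v, 4) for v in values])
```

A useful sanity check on any such table is that each complementary b/y pair sums to the neutral mass of the intact chain plus two protons.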
3.2 Data interpretation
Current data processing strategies for interpretation of top-down mass spectrometry datasets are laborious. Firstly, the MSMS data must be converted from m/z to m. It is easy to calculate z based upon the 12C/13C isotopomer spacing (1/z) when spectra have been collected on a high-resolution instrument. Unfortunately, for larger ions typical in top-down experiments the most abundant isotopomers contain several 13C atoms and the monoisotopic peak with all 12C is practically undetectable. Thus the problem arises as to how to assign monoisotopic mass. In practice this is achieved by modeling the theoretical isotopomer distribution of a known atomic formula onto the experimental profile. But of course for an unassigned peak in an MSMS spectrum the atomic formula is not known. In practice, the most abundant isotopomer peak is assumed to be close to the average mass of the molecule and the atomic formula is estimated by dividing the average mass by the mass of the average amino acid residue, known as ‘averagine’. With this artificial atomic formula it is then possible to map the position of the monoisotopic peak onto the experimental data. It is also possible to be off by 1 or 2 13C atoms, especially if the data quality is marginal. The first example of software to perform the m/z to m deconvolution was called THRASH [28] and more recent examples have appeared [29]. Mascot Distiller (Matrix Science) and Xtract (Thermo) are commercially available and Magtran is available as freeware.
Once the mass peaklist is obtained the dataset is ready for further analysis, typically extraction of ‘sequence tags’ for protein identification [30]. Sequence tags are derived from mass differences characteristic of amino acid residues in the peaklist and they are independent of the N- or C-terminus of the protein. The short sequence tags extracted from the mass peaklist are then used for a database homology search using software similar to BLAST [31]. The Prosight PTM website (https://prosightptm.scs.uiuc.edu/) has a suite of tools for top-down mass spectrometry data interpretation including extraction and searching of sequence tags [32,33].
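The m/z to m conversion described above can be sketched in a few lines. This is a simplified illustration, not the THRASH algorithm: it estimates z from the spacing of adjacent isotopomer peaks, converts m/z to neutral mass, and scales an average mass to an approximate monoisotopic mass using the commonly quoted averagine composition (C4.9384 H7.7583 N1.3577 O1.4773 S0.0417, average mass 111.1254 Da). The function names and the example peak list are assumptions for the example, and a real implementation fits the full isotopomer envelope rather than applying a single ratio.

```python
# Illustrative sketch of charge assignment and an averagine-based mass estimate.
# Simplified on purpose: real tools fit the whole isotopomer distribution.
ISOTOPE_SPACING = 1.003355  # 13C - 12C mass difference in Da
PROTON = 1.007276
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
AVERAGINE_AVG_MASS = 111.1254  # average mass of one averagine unit in Da
MONO_ATOM = {"C": 12.0, "H": 1.007825, "N": 14.003074,
             "O": 15.994915, "S": 31.972071}


def charge_from_spacing(mz_peaks):
    """Estimate z from adjacent isotopomer peaks, whose spacing is about 1/z."""
    gaps = [hi - lo for lo, hi in zip(mz_peaks, mz_peaks[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return round(ISOTOPE_SPACING / mean_gap)


def neutral_mass(mz, z):
    """Convert the m/z of an [M + zH]z+ ion to the neutral mass M."""
    return z * (mz - PROTON)


def averagine_monoisotopic(average_mass):
    """Approximate the monoisotopic mass from an average mass via averagine."""
    units = average_mass / AVERAGINE_AVG_MASS
    mono_unit = sum(AVERAGINE[el] * MONO_ATOM[el] for el in AVERAGINE)
    return units * mono_unit


if __name__ == "__main__":
    peaks = [1601.0, 1601.2, 1601.4, 1601.6]  # hypothetical isotopomer cluster
    z = charge_from_spacing(peaks)
    print(z, neutral_mass(peaks[0], z))
    print(averagine_monoisotopic(10000.0))
```

For a 10 kDa species the estimated monoisotopic mass lies roughly 6 Da below the average mass, which is why a misassignment by one or two 13C atoms is easy to make when the data quality is marginal.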
Commercial software for top-down analyses is starting to appear with Prosight PC (Thermo Fisher). Once the protein is identified, it is very rare that the mass calculated for the reported sequence will match the mass measured in the experiment, due to deviations in primary structure arising from sequence errors, post-translational modifications and so on. For complete assignment of primary structure from the MSMS dataset a complex manual interpretation phase is necessary to maximize the number of ions assigned to the structure (Figure 4). It is typical to work on adjusting the N- and the C-terminus until sets of ions (b and y, c and z•) start matching the sequence being tested. The overall goal is to assign a sequence that agrees with the measured mass of the parent ion, and matches as many fragments as possible to both N- and C-terminal fragments. Other ions in the MSMS spectrum could arise from water (−18 Da) or ammonia (−17 Da) loss, or from internal fragments where multiple dissociation events have occurred. It should be possible to assign all ions in the MSMS spectrum but this is rarely the case in practice. It is unlikely that the ions in the MSMS dataset will provide full sequence coverage across every bond so some reliance upon genomic data is retained. The endpoint in data interpretation is somewhat subjective but hopefully the majority of ions have been assigned and the measured and calculated masses are in full agreement (Figure 5). The typical ion isolation window used for top-down experiments is several daltons wide in order to span the entire isotopomer envelope of the molecular ion.
Figure 4 A variety of intact protein gas-phase dissociation strategies provide versatile options for top-down mass spectrometry. The 5+ protonated ion of the ATP synthase c-subunit (AtpH, Arabidopsis thaliana) was subjected to ECD, aiECD and CAD on a hybrid linear ion trap FT-ICR mass spectrometer. Fragment assignments from CAD (b and y ions), ECD, and aiECD (c and z• ions) experiments are mapped to the sequence of AtpH. The c and z• fragments marked by @ symbol were present in both conventional ECD and activated ion ECD spectra. The b and y fragments marked by symbol are present in both CAD and aiECD spectra; and by # symbol – only in aiECD spectra. The c and z• fragments in grey were manually annotated. Transmembrane domains are shaded, demonstrating the improved coverage afforded by aiECD. The numbering on the right-hand side of the figure is reversed for counting y and z• ions. Modified from Zabrouskov and Whitelegge [25]. Permission obtained from American Chemical Society, Washington, DC, 2007.
Narrowing the window typically cuts ion transmission to unacceptable levels, and complicates estimation of the monoisotopic peak. Consequently, although we usually consider that we are working with a single isolated ion, it should be remembered that a real protein population often exhibits microheterogeneity such that a mixture of isobaric/isomeric isoforms is included at a particular nominal mass. Thus, interpretation of the MSMS spectrum needs to take this into account. The PRP3 protein from human saliva with a mass of ~10,999 Da was recently demonstrated to be a mixture that included isobaric variation (N replacing D in the published sequence) and isomeric variation (D4N versus D50N) [34]. By careful analysis of the high-resolution MSMS spectrum it was possible to conclude that the D4N isoform constituted around 50% of the population while D50N made up around 30%. Such considerations should be noted when reviewing product ion mass accuracy in top-down experiments.
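A rough calculation shows why an N-for-D substitution is so easily hidden inside a multi-dalton isolation window. The sketch below is illustrative only and assumes standard monoisotopic residue masses; the function names are hypothetical. It computes the mass shift of the substitution and the resolving power that would be needed to separate the variant's isotopomer peaks from those of the unsubstituted form at a nominal mass of 10,999 Da.

```python
# Illustrative sketch: how close a D -> N variant sits to the parent isotopomers.
# Residue masses are standard monoisotopic values; names are hypothetical.
ASP_RESIDUE = 115.026943    # Da, aspartic acid residue
ASN_RESIDUE = 114.042927    # Da, asparagine residue
ISOTOPE_SPACING = 1.003355  # Da, 13C - 12C


def d_to_n_shift():
    """Mass change when Asn replaces Asp (negative: the variant is lighter)."""
    return ASN_RESIDUE - ASP_RESIDUE


def resolving_power_needed(mass):
    """m / delta-m needed to split variant peaks from parent isotopomer peaks.

    The variant is about 0.984 Da lighter, so its isotopomer series lands
    about 0.019 Da away from the parent series rather than on top of it.
    """
    gap = ISOTOPE_SPACING - abs(d_to_n_shift())
    return mass / gap


if __name__ == "__main__":
    print(round(d_to_n_shift(), 6))
    print(round(resolving_power_needed(10999.0)))
```

Under these assumptions the two isotopomer series are offset by only about 0.02 Da, so the variants share an isolation window and can be told apart only at very high resolving power or through their product ions.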
Figure 5 Current data processing strategies are laborious. The subject must first be identified, typically via use of sequence tags. An iterating process of manual sequence assignment then continues until an endpoint is reached – usually agreement of the calculated mass for the assigned structure and the measured mass within experimental error. Review of top-down data requires access to raw data, peaklists derived automatically and/or manually, and the output of the sequence assignment algorithm.
3.3 Future considerations
Exciting prospects for top-down mass spectrometry can be conceived if sophisticated data-interpretation algorithms can be brought to bear and eventually integrated with data-dependent acquisition strategies. The original vision of top-down proteomics described by Fred McLafferty and Neil Kelleher [35] will only be realized with the development of software to accelerate the throughput of the technique. Kelleher’s group has pioneered software development in top-down proteomics, and recently described the use of shotgun databases to encompass diverse combinations of post-translational modifications in histones [36]. The disadvantage of such an approach is that the database gets very large because of the need to house all the different structural combinations, as well as the problem that only structural possibilities included in the database are considered in the search. An ideal approach would remain unbiased in order to include previously unknown sequence variants and post-translational modifications [34]. It is likely that bio-informaticians experienced in genomics will contribute to software development in this arena [37].
Although the FT-ICR MS has been the accepted platform for top-down mass spectrometry, instrument development is advancing rapidly. The Makarov analyzer (Orbitrap, Thermo Fisher) [38,39] brings high resolution at a lower cost and will become competitive for smaller proteins [40]. Another point to consider is that resolving power on the Orbitrap decreases with the square root of m/z while on the FT-ICR it decreases linearly with m/z. Thus there may be advantages to the Orbitrap while working at extended m/z range (>3,000) [24,25]. There is currently speculation that the ETD process is faster than ECD, and thus implementation of ETD on the Orbitrap is eagerly anticipated for top-down proteomics. If quality top-down mass spectra can be generated on the chromatographic timescale there will likely be explosive growth in the field.
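The m/z dependence of resolving power can be compared numerically. The sketch below is a hedged illustration that assumes the ideal frequency relations for the two analyzers at a fixed acquisition time: Orbitrap axial frequency scales as (m/z)^-1/2 and ICR cyclotron frequency as (m/z)^-1, so resolving power is taken to scale in the same way. The reference value of 100,000 at m/z 400 is a hypothetical figure used for both analyzers purely to make the comparison easy to read; it is not a specification of any instrument.

```python
# Illustrative sketch: how resolving power falls with m/z under ideal scaling.
# Assumes R ~ (m/z)^-0.5 for the Orbitrap and R ~ (m/z)^-1 for FT-ICR at a
# fixed transient length; the reference values are hypothetical.

def scaled_resolving_power(r_ref, mz_ref, mz, exponent):
    """Scale a reference resolving power from mz_ref to mz."""
    return r_ref * (mz_ref / mz) ** exponent


def orbitrap_resolving_power(r_ref, mz_ref, mz):
    """Resolving power assuming inverse square-root dependence on m/z."""
    return scaled_resolving_power(r_ref, mz_ref, mz, 0.5)


def fticr_resolving_power(r_ref, mz_ref, mz):
    """Resolving power assuming inverse linear dependence on m/z."""
    return scaled_resolving_power(r_ref, mz_ref, mz, 1.0)


if __name__ == "__main__":
    for mz in (400.0, 1600.0, 3000.0):
        print(mz,
              round(orbitrap_resolving_power(100000.0, 400.0, mz)),
              round(fticr_resolving_power(100000.0, 400.0, mz)))
```

Starting from the same hypothetical reference value, the assumed Orbitrap scaling retains a larger fraction of its resolving power at m/z 3,000 than the assumed FT-ICR scaling, which is the qualitative point made in the text.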
The development of hybrid mass spectrometer combinations such as the linear ion trap FT-ICR or the linear ion trap Orbitrap has revolutionized the field. Further instrument development that embraces ion-mobility separations as well as the latest ionization technology, as described above, could bring exciting new possibilities to top-down membrane protein mass spectrometry.
REFERENCES
1 C.C. Wu, M.J. MacCoss, K.E. Howell and J.R. Yates, 3rd, A method for the comprehensive proteomic analysis of membrane proteins, Nat. Biotechnol., 21(5) (2003) 532–538.
2 J. Blonder, T.P. Conrads, L.R. Yu, A. Terunuma, G.M. Janini, H.J. Issaq, J.C. Vogel and T.D. Veenstra, A detergent- and cyanogen bromide-free method for integral membrane proteomics: Application to Halobacterium purple membranes and the human epidermal membrane proteome, Proteomics, 4(1) (2004) 31–45.
3 J.P. Whitelegge, S.M. Gómez and K.F. Faull, Proteomics of membrane proteins. Proteome characterization and proteomics. In: R.D. Smith and T. Veenstra (Eds.), Adv. Protein Chem., 65 (2003) 271–307.
4 S.M. Gómez, J.N. Nishio, K.F. Faull and J.P. Whitelegge, The chloroplast grana proteome defined by intact mass measurements from LC-MS, Mol. Cell Proteomics, 1 (2002) 45–59.
5 W. Fischle, Y. Wang and C.D. Allis, Binary switches and modification cassettes in histone biology and beyond, Nature, 425(6957) (2003) 475–479.
6 S.M. Gómez, K.Y. Bil’, R. Aguilera, J.N. Nishio, K.F. Faull and J.P. Whitelegge, Transit peptide cleavage sites of integral thylakoid membrane proteins, Mol. Cell Proteomics, 2 (2003) 1068–1085.
7 J.P. Whitelegge, HPLC and mass spectrometry of intrinsic membrane proteins. In: M.-I. Aguilar (Ed.), Methods in Molecular Biology (Volume 251). HPLC of Peptides and Proteins, Humana Press Inc., Totowa, N.J., 2004, pp. 323–339.
8 J.P. Whitelegge, C. Gundersen and K.F. Faull, Electrospray-ionization mass spectrometry of intact intrinsic membrane proteins, Protein Sci., 7 (1998) 1423–1430.
9 J.P. Whitelegge, R. Aguilera, H. Zhang, R. Taylor and W.A. Cramer, Full subunit coverage liquid chromatography electrospray-ionization mass spectrometry (LCMS+) of an oligomeric membrane protein: Cytochrome b6f complex from Spinach and the cyanobacterium, M. laminosus, Mol. Cell Proteomics, 1 (2002) 816–827.
10 G. Kurisu, H. Zhang, J.L. Smith and W.A. Cramer, Structure of the cytochrome b6f complex of oxygenic photosynthesis: Tuning the cavity, Science, 302(5647) (2003) 1009–1014.
11 D. Stroebel, Y. Choquet, J.L. Popot and D. Picot, An atypical haem in the cytochrome b(6)f complex, Nature, 426(6965) (2003) 413–418.
12 G.E. Tarr and J.W. Crabb, Reverse-phase high-performance liquid chromatography of hydrophobic proteins and fragments thereof, Anal. Biochem., 131(1) (1983) 99–107.
13 J.P. Whitelegge, P. Jewess, M.G. Pickering, C. Gerrish, P. Camilleri and J.R. Bowyer, Sequence analysis of photoaffinity-labelled peptides derived by proteolysis of photosystem 2 reaction centers from thylakoid membranes treated with (14C)-azidoatrazine, Eur. J. Biochem., 207 (1992) 1077–1084.
14 D. Wessel and U.I. Flugge, A method for the quantitative recovery of protein in dilute solution in the presence of detergents and lipids, Anal. Biochem., 138(1) (1984) 141–143.
15 J. Carroll, I.M. Fearnley and J.E. Walker, Definition of the mitochondrial proteome by measurement of molecular masses of membrane proteins, Proc. Natl. Acad. Sci. USA, 103(44) (2006) 16170–16175.
16 E. Terzi, P. Boyot, A. Van Dorsselaer, B. Luu and E. Trifilieff, Isolation and amino acid sequence of a novel 6.8-kDa mitochondrial proteolipid from beef heart. Use of FAB-MS for molecular mass determination, FEBS Lett., 260(1) (1990) 122–126.
17 A. Delobel, F. Halgand, B. Laffranchise-Gosse, H. Snijders and O. Laprévote, Characterization of hydrophobic peptides by atmospheric pressure photoionization-mass spectrometry and tandem mass spectrometry, Anal. Chem., 75(21) (2003) 5961–5968.
18 N. Morgner, T. Kleinschroth, H.D. Barth, B. Ludwig and B. Brutschy, A Novel approach to analyze membrane proteins by laser mass spectrometry: From protein subunits to the integral complex, J. Am. Soc. Mass Spectrom., 5 (2007). [Epub ahead of print].
19 J.S. Sampson, A.M. Hawkridge and D.C. Muddiman, Generation and detection of multiply-charged peptides and proteins by matrix-assisted laser desorption electrospray ionization (MALDESI) Fourier transform ion cyclotron resonance mass spectrometry, J. Am. Soc. Mass Spectrom., 17(12) (2006) 1712–1716.
20 R.L. Grimm and J.L. Beauchamp, Dynamics of field-induced droplet ionization: Time-resolved studies of distortion, jetting, and progeny formation from charged and neutral methanol droplets exposed to strong electric fields, J. Phys. Chem. B, 109(16) (2005) 8244–8250.
21 J.A. Loo, H.R. Udseth and R.D. Smith, Peptide and protein analysis by electrospray ionization-mass spectrometry and capillary electrophoresis-mass spectrometry, Anal. Biochem., 179(2) (1989) 404–412.
22 I.M. Fearnley and J.E. Walker, Analysis of hydrophobic proteins and peptides by electrospray ionization MS, Biochem. Soc. Trans., 24(3) (1996) 912–917.
23 J.P. Whitelegge, Tandem mass spectrometry of integral membrane proteins for top-down proteomics, Trends Anal. Chem., 24 (2005) 576–582.
24 J.P. Whitelegge, F. Halgand, P. Souda and V. Zabrouskov, Top-down mass spectrometry of integral membrane proteins, Expert Rev. Proteomics, 3(6) (2006) 585–596.
25 V. Zabrouskov and J.P. Whitelegge, Increased coverage in the transmembrane domain with activated-ion electron capture dissociation for top-down Fourier-transform mass spectrometry of integral membrane proteins, J. Proteome Res., 6(6) (2007) 2205–2210.
26 R.A. Zubarev, N.L. Kelleher and F.W. McLafferty, Electron capture dissociation of multiply charged protein cations. A nonergodic process, J. Am. Chem. Soc., 120(13) (1998) 3265–3266.
27 J.E. Syka, J.J. Coon, M.J. Schroeder, J. Shabanowitz and D.F. Hunt, Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry, Proc. Natl. Acad. Sci. USA, 101 (2004) 9528–9533.
28 D.M. Horn, R.A. Zubarev and F.W. McLafferty, Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules, J. Am. Soc. Mass Spectrom., 11(4) (2000) 320.
29 L. Chen, S.K. Sze and H. Yang, Automated intensity descent algorithm for interpretation of complex high-resolution mass spectra, Anal. Chem., 78(14) (2006) 5006–5018.
30 E. Mørtz, P.B. O’Connor, P. Roepstorff, N.L. Kelleher, T.D. Wood, F.W. McLafferty and M. Mann, Sequence tag identification of intact proteins by matching tandem mass spectral data against sequence databases, Proc. Natl. Acad. Sci. USA, 93 (1996) 8264–8267.
31 S.F. Altschul, W. Gish, W. Miller, E.W. Myers and D.J. Lipman, Basic local alignment search tool, J. Mol. Biol., 215 (1990) 403–410.
32 G.K. Taylor, Y.B. Kim, A.J. Forbes, F. Meng, R. McCarthy and N.L. Kelleher, Web and database software for identification of intact proteins using ‘‘top down’’ mass spectrometry, Anal. Chem., 75(16) (2003) 4081–4086.
33 R.D. LeDuc, G.K. Taylor, Y.B. Kim, T.E. Januszyk, L.H. Bynum, J.V. Sola, J.S. Garavelli and N.L. Kelleher, ProSight PTM: An integrated environment for protein identification and characterization by top-down mass spectrometry, Nucleic Acids Res., 32(Web Server issue) (2004) W340–W345.
34 J.P. Whitelegge, V. Zabrouskov, F. Halgand, P. Souda, S. Bassilian, W. Yan, L. Wolinsky, J.A. Loo, D.T. Wong and K.F. Faull, Protein-sequence polymorphisms and post-translational modifications in proteins from human saliva using top-down Fourier-transform ion cyclotron resonance mass spectrometry, Int. J. Mass Spectrom., 268 (2007) 190–197.
35 N.L. Kelleher, H.Y. Lin, G.A. Valaskovic, D.J. Aaserud, E.K. Fridriksson and F.W. McLafferty, Top down versus bottom up protein characterization by tandem high-resolution mass spectrometry, J. Am. Chem. Soc., 121 (1999) 806–807.
36 J.J. Pesavento, Y.B. Kim, G.K. Taylor and N.L. Kelleher, Shotgun annotation of histone modifications: A new approach for streamlined characterization of proteins by top down mass spectrometry, J. Am. Chem. Soc., 126(11) (2004) 3386–3387.
37 D. Tsur, S. Tanner, E. Zandi, V. Bafna and P.A. Pevzner, Identification of post-translational modifications by blind search of mass spectra, Nat. Biotechnol., 23(12) (2005) 1562–1567.
38 A. Makarov, Electrostatic axially harmonic orbital trapping: A high-performance technique of mass analysis, Anal. Chem., 72 (2000) 1156.
39 Q. Hu, R.J. Noll, H. Li, A. Makarov, M. Hardman and G. Cooks, The Orbitrap: A new mass spectrometer, J. Mass Spectrom., 40(4) (2005) 430–443.
40 B. Macek, L.F. Waanders, J.V. Olsen and M. Mann, Top-down protein sequencing and MS3 on a hybrid linear quadrupole ion trap-orbitrap mass spectrometer, Mol. Cell Proteomics, 5(5) (2006) 949–958.
CHAPTER 9
Probing the Structure and Function of Integral Membrane Proteins by Mass Spectrometry
Adam B. Weinglass
Contents
1. Introduction
2. Technical Aspects of Mass Spectrometry of Integral Membrane Proteins
3. MS of Integral Membrane Proteins Provides Insight into Structure, Function and Mechanism
3.1 Topology
3.2 Substrate-binding studies
3.3 Monitoring conformational change
3.4 Production of semi-synthetic and synthetic membrane proteins to analyze mechanism
3.5 MS as a crystallization tool
4. Conclusions
Acknowledgements
References
Comprehensive Analytical Chemistry, Volume 52. ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00209-2. © 2009 Elsevier B.V. All rights reserved.
1. INTRODUCTION
Membranes play a fundamental role in cellular structure by providing a physical barrier between the cell and its environment and between the various sub-cellular compartments within eukaryotic cells. The integral membrane proteins residing in these membranes are involved in critical biological processes, such as ion and solute transport (ion channels and transporters), energy generation (respiration, ATP synthase) and cell signaling (G-protein coupled receptors and growth factor receptors). Consequently, membrane proteins account for a significant proportion of all known pharmacological targets and modulating their activity relates to problems ranging from acid reflux through anxiety, depression and epilepsy to hypertension and cancer. In addition, a review of completed genome projects reveals a wealth of predicted membrane proteins with unknown function that may represent future drug targets.
The tendency of integral membrane proteins to be poorly expressed, and their intimate association with lipids, introduces additional complexities in understanding the relationship between structure and function which ultimately leads to detailed mechanistic insight. Although these difficulties can be overcome with heterologous expression systems [1,2] and the prudent use of detergents, structural biology investigations lag significantly behind those on soluble proteins. For example, genomic analyses reveal that 10–30% of all open reading frames encode membrane proteins; however, their representation in proteomic datasets is much less [3]. Likewise, while there are over 24,000 structures of soluble proteins, there are only ~155 unique membrane protein structures [March 2008, http://blanco.biomol.uci.edu/Membrane_Proteins_xtal.html]. The object of this chapter is to illustrate how mass spectrometry (MS) can be exploited to complement biochemical and biophysical approaches, thus providing detailed insights into the structure, function and mechanism of integral membrane proteins.
2. TECHNICAL ASPECTS OF MASS SPECTROMETRY OF INTEGRAL MEMBRANE PROTEINS
The discovery of mild ionization techniques for MS in the late 1980s (matrix-assisted laser desorption ionization time of flight (MALDI-TOF) [4] and electrospray ionization (ESI) [5]) permits routine analysis of biological macromolecules, yielding precise definition of the covalent state of a gene product and its heterogeneity. MALDI-TOF uses samples co-crystallized with a small UV-absorbing matrix molecule to transfer molecules from a dried solid phase to the gas phase, and is thus not directly compatible with HPLC, though spotting aliquots of sequential fractions for MALDI-TOF remains a useful option. In contrast, ESI introduces the ability to couple HPLC directly to MS and the first liquid chromatography/electrospray ionization mass spectrometry (LC/ESI-MS) experiments provided an exciting breakthrough in technology [6]. Often, the HPLC eluant is directed straight to the ESI source for continuous ionization while the mass spectrometer measures the mass to charge ratio (m/z) of ions detected during the course of a scan through the m/z range. As a consequence of the low tolerance of MS to involatile salts, LC/MS has been most widely applied to reverse phase (RP)-HPLC with volatile solvents (e.g. acetonitrile) and ion-pairing agents (e.g. trifluoroacetic acid), though success has also been achieved with size-exclusion (SE)-HPLC provided salts are excluded.
Integral membrane proteins and hydrophobic peptides have a tendency to aggregate and precipitate following extraction from the membrane unless stabilized in detergent micelles. However, the high detergent to protein ratio in such micelles leads to a significant loss of signal due to detergent peaks over a wide portion of the MS response and therefore it has been necessary to develop a new set of tools to permit MALDI-TOF [7,8] and ESI-MS of integral membrane proteins. The ultra-thin sample preparation method for MALDI-TOF [7] allows the resolution of both bacterial and mammalian membrane proteins in up to 1% of various non-ionic detergents. Furthermore, using α-cyano-4-hydroxycinnamic acid as the matrix molecule results in the production of several mass peaks representing a number of charge states leading to improved deconvolution and, potentially, mass accuracy of 0.01%. Similarly, the discovery of HPLC solvents compatible with the solubility and disaggregation of the analyte and ESI-MS [8,9] enables separation of intact proteins (containing up to 15 transmembrane helices) [10,11] and complex mixtures of hydrophobic peptides produced upon chemical/proteolytic cleavage [12,13] with mass accuracy at 0.01%. The recent application of Fourier Transform MS permits the mass analysis of polytopic integral membrane proteins to a mass accuracy of ~0.001% [14].
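It can help to translate these percentage accuracies into absolute mass windows. The sketch below is a simple illustration under stated assumptions: the 47,000 Da protein mass is a hypothetical example, the modification shifts are nominal monoisotopic values, and the rule that a shift is "detectable" when it exceeds the tolerance is a deliberately crude criterion chosen for the example.

```python
# Illustrative sketch: converting percentage mass accuracy to a window in Da.
# The protein mass is hypothetical and the detectability rule is simplistic.

def mass_tolerance(mass_da, accuracy_percent):
    """Return the absolute mass uncertainty in Da for a percentage accuracy."""
    return mass_da * accuracy_percent / 100.0


def detectable(shift_da, mass_da, accuracy_percent):
    """Crude test: a mass shift counts as detectable if it exceeds the tolerance."""
    return abs(shift_da) > mass_tolerance(mass_da, accuracy_percent)


if __name__ == "__main__":
    protein_mass = 47000.0  # hypothetical polytopic membrane protein
    shifts = {"oxidation": 15.995, "Asp to Asn": -0.984}
    for accuracy in (0.01, 0.001):
        print(accuracy, round(mass_tolerance(protein_mass, accuracy), 2))
        for name, shift in shifts.items():
            print("  ", name, detectable(shift, protein_mass, accuracy))
```

At 0.01% the window for this hypothetical protein is about 4.7 Da, enough to flag an oxidation but not a 1 Da substitution; at 0.001% the window narrows to about 0.5 Da.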
3. MS OF INTEGRAL MEMBRANE PROTEINS PROVIDES INSIGHT INTO STRUCTURE, FUNCTION AND MECHANISM
The ability to obtain precise mass measurements of integral membrane proteins and hydrophobic peptides permits MS to be incorporated into several stages of integral membrane protein characterization complementing traditional biochemical and biophysical approaches (Figure 1). Specifically, integration of MS into membrane protein studies is yielding insights into topology, ligand binding site(s) and conformational changes in membrane proteins of unknown and known structure. Furthermore, MS is being utilized to monitor the synthesis of semi-synthetic or synthetic membrane proteins that are refractory to expression in biological systems. Finally, by utilizing MS to precisely determine molecular mass, membrane protein crystallographers are monitoring their preparations, rationally designing ‘crystallizable cores’ and identifying protein/lipid interactions necessary for crystallization and/or diffraction. In this section, the utility of MS to these aspects of integral membrane protein characterization is discussed.
Figure 1 Utilizing mass spectrometry for integral membrane protein characterization. MS can be utilized to obtain detailed insights into the mechanism and analytical properties of integral membrane proteins. [Diagram labels: Membrane Protein; MS; Structure (crystallography and NMR); Function (genetics, biochemistry and biophysics); Dynamics (spectroscopy and time-resolved crystallography); Integral membrane protein characterization.]
3.1 Topology The determination of transmembrane topology is the first step of membrane protein characterization. Theoretical approaches such as hydropathy plots [15] and the positive inside rule [16] typically serve as a starting point for antibody or fusion protein studies. Alternatively, many groups now generate functional Cysless versions of their protein [17]. Following the introduction of single Cys residues with retention of activity, membrane permeable or impermeable radioactive Cys-modification reagents are applied to determine topology [18]. Recently, two MS-driven approaches have been developed [19,20] allowing rapid analysis of wild-type protein without the need for site-directed mutants, antibodies or radioactivity. The X-ray structure of bacteriorhodopsin (bR) reveals that limited proteolysis does not result in cleavages within the hydrophobic interior of the bilayer [21]. Likewise, treatment of bR with the small aqueous reagent tetranitromethane (TNM) modifies only solvent accessible tyrosine residues to o-nitrotyrosine, illustrating that those Tyr at the level of the acyl chains remain unmodified and TNM can serve as a topological probe [22,23]. Therefore, based on the assumption that polytopic integral membrane proteins in general will comply to these criteria, these approaches [19,20] have been utilized to probe the topology of the purified and reconstituted glycine receptor (GlyR), a member of the ligand-gated ion channel superfamily that rapidly mediates signaling across the synaptic cleft. However, rather than identifying sites of cleavage by Edman degradation, peptides are assigned by MS and confirmed by tandem MS [19]. Likewise, following reaction with TNM, GlyR peptides were isolated by in-gel trypsin digestion and modified peptides were identified by a shift in the absorbance maxima and confirmed by an expected mass deviation from the unmodified peptides by tandem MS [20]. 
Such insights can be applied across the entire superfamily by homology modeling.
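As a rough illustration of the theoretical starting point mentioned in this section, the following sketch computes a Kyte–Doolittle hydropathy profile [15] as a sliding-window average. The function names, the toy sequence and the choice of a 19-residue window with a cut-off of 1.6 are assumptions made for this example and are not taken from the studies cited; the scale values are those of the published index.

```python
# Kyte-Doolittle hydropathy index for each amino acid (published scale [15]).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Return 0-based start indices of windows whose mean exceeds threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, score in enumerate(profile) if score > threshold]


# Toy sequence (hypothetical): polar flank, hydrophobic stretch, polar flank.
toy = "DEKRDEKRDE" + "LIVALIVALIVALIVALIV" + "DEKRDEKRDE"
print(candidate_tm_windows(toy))
```

Windows flagged in this way only suggest candidate membrane-spanning segments; as noted above, such predictions are a starting point that still requires experimental confirmation.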
3.2 Substrate-binding studies
3.2.1 Group-specific modification reagents
For the many integral membrane proteins lacking a high-resolution structure, alternative approaches have been developed to gain detailed insights into the ligand-binding site(s) [18], including Cys-scanning mutagenesis [17]. For example, starting with a functional Cys-less version of the lactose permease (LacY) from Escherichia coli, each of the 417 residues was individually replaced with Cys, revealing that only six residues are essential for function [17]. However, probing the precise role of these essential residues in ligand binding is fraught
with difficulties (Figure 2): replacing the essential residue with a suitable reporter (e.g. Cys to study the effect of ligand on alkylation) abolishes the effect to be observed. An alternative is to monitor the ability of ligand to reduce covalent modification by a group-specific modification reagent. However, a polytopic integral membrane protein usually contains several amino acids with the same side chain, so fluorescent or radioactive modification reagents generate a complex signal. This problem can be overcome by resolving the hydrophobic peptides (routinely representing over 90% of the protein sequence) produced upon cleavage of the modified protein and quantitating them by LC/MS [12,13]. Using carboxyl-group-specific modification reagents (carbodiimides) followed by ESI-MS, the role of an essential residue (Glu269) in the binding of substrate to the lactose permease of E. coli was investigated. After demonstrating that the substrate p-NPGal partially protects intact LacY against carbodiimide
[Figure 2 panel labels: (A) residues E126, R144, W151, E269, R302, H322 and E325 with bound substrate (S) and H+; (B) thiol reagents: dynamics and accessibility; inactive, no substrate/pH effects; (C) group-specific reagents: accessibility; active, substrate/pH effects; complex background.]
Figure 2 Alternative approaches towards studying binding events in membrane transport proteins. A representation of the inward-facing conformation of LacY illustrates charge pairs between essential residues (A, E126/R144, E269/H322 and R302/E325). The environment of these essential residues in this and other conformations can be examined by replacing the residue with an environmental reporter (B, R144C, E269C) or by analyzing the environment of the native residue with group-specific modification reagents (C, R144, E269). While E269C and R144C yield insights into the dynamics and accessibility of positions 144 and 269, the mutants are inactive and there are no substrate effects. In contrast, modification of R144 and E269 with group-specific modification reagents yields environmental information; however, there are several potential sites of modification, yielding a complex background. (See Colour Plate Section at the end of this book.)
reactivity [13], CNBr peptides were resolved, indicating that a significant proportion of the decrease in carbodiimide reactivity occurs specifically in a nonapeptide containing Glu269 (Figure 3). Furthermore, by monitoring the ability of different substrate analogues to protect against carbodiimide modification of Glu269, it was speculated that the C-3 hydroxyl group of the galactopyranosyl ring plays an important role in specificity, possibly by H-bonding with Glu269 [13]. Remarkably, the X-ray crystal structure of LacY [24] confirms these detailed predictions, demonstrating the feasibility of the approach.
[Figure 3A panels: (I) unmodified; (II) DiPC; (III) p-NPGal/DiPC; (IV) p-NPGlc/DiPC. Selected ion chromatograms with peaks labelled m/z 899.7 and 1025.7; axes: relative intensity (%) versus time (min).]
Figure 3 Analyzing the role of an essential residue in substrate binding to LacY. (A) DiPC modifies Glu269 in a substrate-dependent manner. Purified single-Cys148 LacY was incubated with 2% dimethylsulfoxide (DMSO) (I), 20 mM DiPC (II), 20 mM DiPC in the presence of the substrate analogue p-nitrophenyl-α-D-galactopyranoside (p-NPGal) (III) or 20 mM DiPC in the presence of the non-substrate p-nitrophenyl-α-D-glucopyranoside (p-NPGlc) (IV). Data are displayed as selected ion chromatograms of unmodified and DiPC-modified peptide 268-GELLNASIM-276 (m/z 899.7 and 1025.7, respectively). (B) Substrate-binding site of LacY. Hydrogen bonds and salt bridges are represented by dashed black lines. The carboxyl group of Glu269 lies within 4 Å of the C3-OH of the substrate analogue thio-β-D-galactopyranoside (TDG). (See Colour Plate Section at the end of this book.)
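The m/z values quoted in Figure 3 can be checked approximately with a short calculation. The sketch below assumes that CNBr cleavage leaves the C-terminal Met as homoserine lactone, that the ions are singly protonated, and that DiPC denotes diisopropylcarbodiimide adding its full mass (C7H14N2) to the peptide; the function and constant names are illustrative only.

```python
# Monoisotopic residue masses (Da) for the amino acids used below.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "E": 129.04259, "M": 131.04049,
    "F": 147.06841, "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728
# CNBr converts the C-terminal Met to homoserine lactone (net loss of CH4S).
MET_TO_HSL = -48.00337
# Assumed adduct: one diisopropylcarbodiimide molecule (C7H14N2).
DIPC = 126.11570


def cnbr_peptide_mz(sequence, charge=1, adduct=0.0):
    """m/z of a protonated CNBr peptide ending in homoserine lactone."""
    if not sequence.endswith("M"):
        raise ValueError("a CNBr peptide is expected to end in Met")
    residues = sum(RESIDUE_MASS[aa] for aa in sequence)
    neutral = residues + WATER + MET_TO_HSL + adduct
    return (neutral + charge * PROTON) / charge


unmodified = cnbr_peptide_mz("GELLNASIM")
modified = cnbr_peptide_mz("GELLNASIM", adduct=DIPC)
print(f"{unmodified:.1f} {modified:.1f}")
```

Under these assumptions the calculated monoisotopic values (about 899.5 and 1025.6) agree with the reported ions at m/z 899.7 and 1025.7 to within a few tenths of a mass unit, and the difference of about 126 corresponds to a single carbodiimide adduct.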
Group-specific covalent modification combined with ESI-MS has also been exploited to examine conformational rearrangements in the substrate-binding site of LacY. In the absence of substrate, the essential residue Arg144 [18] is predicted to form a charge pair with Glu126 [25–28]. However, in the X-ray structure of LacY with substrate bound, Arg144 forms a salt bridge with Glu269 and is positioned to form an H-bond with the indole nitrogen of Trp151 [24]. To examine the environment of Arg144 in the absence of substrate and without replacing Arg144 with a reporter (e.g. Cys or His), selected non-essential residues in LacY were replaced with Met to engineer CNBr cleavage sites, permitting the isolation of a small, high-yield peptide containing a single reactive species (Arg144) [29]. Monitoring the covalent modification of Arg144 with the arginine-specific modification reagent butane-2,3-dione (BD) reveals that when Glu126, the putative charge-pair partner of Arg144, is replaced with Ala (Glu126-Ala), the reactivity of Arg144 with BD increases ~3-fold (Figure 4). Conversely, the replacement Glu269-Ala elicits no significant effect on the reactivity of Arg144 [29]. Interestingly, recent structures of LacY solved in the absence and presence of substrate confirm the rearrangement of charge pairs during substrate binding [30]. Similarly, group-specific modification combined with ESI-MS has been used to explore the role of a unique carboxyl residue in EmrE [31], a small multidrug transporter in E. coli that extrudes various positively charged drugs across the plasma membrane in exchange for protons, thereby rendering cells resistant to these compounds. Biochemical experiments indicate that the basic functional unit of EmrE is a dimer in which the common binding site for protons and substrate is formed by the interaction of an essential charged residue (Glu14) from both monomers [32].
Furthermore, carbodiimide modification studied using functional assays indicates that Glu14 is the target of the reaction [33,34]. By exploiting ESI-MS, it was possible to monitor directly the reaction of carbodiimide with each monomer rather than following inactivation of the functional unit. Such studies revealed that up to ~80% of the Glu14 residues in EmrE are modified by the carboxyl-specific modification reagent DiPC in a time-dependent fashion, indicating that each Glu14 residue in the oligomer is accessible to DiPC. Furthermore, pre-incubation with TPP+ reduced the reaction of Glu14 with DiPC by up to 80% (Figure 5 and [31]). Taken together with other biochemical data, these findings support the current ‘time-sharing’ mechanism, in which both Glu14 residues in a dimer are involved in TPP+ and H+ binding [32,35].
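The extent of modification and the degree of ligand protection described above can be estimated from the peak areas of selected ion chromatograms. The following minimal sketch assumes that the unmodified and modified peptides ionize with similar efficiency, which may not hold in practice; the function names are illustrative, and the peak areas are hypothetical values that are not data from [31].

```python
def fraction_modified(area_unmodified, area_modified):
    """Fraction of a peptide carrying the modification, from peak areas."""
    total = area_unmodified + area_modified
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return area_modified / total


def protection(fraction_without_ligand, fraction_with_ligand):
    """Fractional decrease in modification caused by the ligand."""
    if fraction_without_ligand <= 0:
        raise ValueError("no modification to protect against")
    return 1.0 - fraction_with_ligand / fraction_without_ligand


# Hypothetical peak areas (arbitrary units), chosen only for illustration.
no_ligand = fraction_modified(area_unmodified=200.0, area_modified=800.0)
with_ligand = fraction_modified(area_unmodified=840.0, area_modified=160.0)
print(no_ligand, with_ligand, protection(no_ligand, with_ligand))
```

Because both species are measured in the same LC/MS run, the ratio is independent of the absolute amount of peptide injected, although any difference in ionization efficiency between the two species would bias the estimate.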
3.2.2 Photoaffinity probes
Ecker and coworkers [36] have applied a shotgun approach to probe the ligand-binding site(s) of P-glycoprotein (P-gp), a multidrug resistance protein often found overexpressed in human cancer cells. A series of propafenone-related photoaffinity ligands was designed, combining high specificity and selectivity for P-gp with high labeling efficiency. Following irradiation and cleavage with proteases, peptide mass fingerprints correlating to ligand-modified peptides were identified
Figure 4 Butane-2,3-dione modification of R135M/R142S/C154G and E126A/R135M/R142S/C154G. Purified R135M/R142S/C154G LacY (~40 µM) in DDM (pH 8.0) was incubated for 30 min at 30 °C with 2% DMSO (I) or with 20 mM BD in the absence (II) or presence (III) of 20 mM p-NPGal. Similarly, purified E126A/R135M/R142S/C154G LacY (~40 µM) in DDM (pH 8.0) was incubated for 30 min at 30 °C with 2% DMSO (IV) or with 20 mM BD in the absence (V) or presence (VI) of 20 mM p-NPGal. For clarity of presentation, the data are displayed as an integrated mass spectrum of scans 230–280 (23.6–28.7 min) containing the unmodified and BD-modified peptide 135-SNFEFGSARM-145 (doubly charged ions at m/z 549.2 and 592.2, respectively) in the m/z range between 500 and 800. The asterisk (*) refers to a hydrolysis product of the BD-modified 135-SNFEFGSARM-145 [29] produced during sample preparation under acidic conditions.
[Figure 5 panels: (I) unmodified; (II) DiPC; (III) TPP+/DiPC. Selected ion chromatograms with peaks labelled m/z 1031.9 (2+) and 1094.9 (2+); axes: relative intensity (%) versus time (min), 52 to 60 min.]
Figure 5 Modification of Glu14 by DiPC. DiPC modifies Glu14 in a substrate-dependent manner. Purified EmrE (~40 µM) in DDM at pH 6.5 was incubated for 30 min at 30 °C with 2% DMSO (I), 2 mM DiPC (II) or 2 mM DiPC in the presence of 1 mM TPP+ (III). Data are displayed as selected ion chromatograms of unmodified and DiPC-modified peptide 2-NPYIYLGGAILAEVIGTTLM-21 (doubly charged ions at m/z 1031.9 and 1094.9, respectively).
by MALDI-TOF, and a number of confined protein regions contributing to drug binding were found [36]. The propafenone-related photoaffinity ligand GPV317, designed for P-gp [36], also competes with Hoechst 33342, a substrate of the proton motive force-driven multidrug transporter LmrP of Lactococcus lactis (Weinglass, A.B., unpublished data). Interestingly, by photolabeling LmrP with GPV317 in the absence and presence of Hoechst 33342, it is possible to demonstrate that there is a single GPV317 binding site per polypeptide (unpublished data, Figure 6). Since MS allows modified peptides to be readily identified by mass and/or sequence, photoaffinity labeling serves as a useful foundation for subsequent site-directed mutagenesis to gain insights into the mechanism of LmrP.
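The relationship between the intact mass of LmrP and the multiply charged ions observed by ESI (Figure 6) can be illustrated with a few lines of arithmetic. The sketch below takes the masses 46500 and 46928 as read from the labels of Figure 6B; the upper m/z limit of 2300 and all function names are assumptions introduced for this example and do not describe the instrument settings actually used.

```python
PROTON = 1.00728  # mass of a proton in Da


def mz(mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral mass."""
    return (mass + charge * PROTON) / charge


def min_charge_in_range(mass, mz_max):
    """Smallest number of protons that brings the ion to or below mz_max."""
    if mz_max <= PROTON:
        raise ValueError("mz_max must exceed the proton mass")
    charge = 1
    while mz(mass, charge) > mz_max:
        charge += 1
    return charge


def mass_from_adjacent_peaks(mz_low_charge, mz_high_charge):
    """Charge and neutral mass from two adjacent charge states (z, z + 1)."""
    z = round((mz_high_charge - PROTON) / (mz_low_charge - mz_high_charge))
    return z, z * (mz_low_charge - PROTON)


lmrp = 46500.0       # unmodified LmrP, read from Figure 6B
labelled = 46928.0   # labelled species, read from Figure 6B
scan_limit = 2300.0  # hypothetical upper m/z limit of the scan range

print(min_charge_in_range(lmrp, scan_limit))
z, mass = mass_from_adjacent_peaks(mz(lmrp, 21), mz(lmrp, 22))
print(z, round(mass, 1))
print(labelled - lmrp)
```

With this hypothetical scan limit the calculation reproduces the requirement for at least 21 protons stated in the legend to Figure 6, and the difference between the two reconstructed masses gives the size of the covalent adduct.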
3.3 Monitoring conformational change
In a significant proportion of soluble protein X-ray structures, the bound ligand is not accessible from the surface, indicating that a high-resolution snapshot may not always serve as the best template for rational drug design [37]. Thus, studies
[Figure 6 panels: (A) ESI mass spectrum with charge states +22 to +44 labelled; m/z axis 1000 to 2200 amu. (B) Reconstructed zero-charge spectra, mass axis 46000 to 48000 amu: (I) unmodified; (II) GPV317; (III) Hoechst 33342/GPV317; peaks labelled 46500 and 46928.]
Figure 6 LmrP is covalently modified by GPV317 in a substrate-protectable manner. (A) ESI mass spectrum of LmrP. ESI generates multiply charged ions, each with a different m/z ratio, through the addition of variable numbers of protons. For LmrP, it was necessary to generate ions with at least 21 protons so that their m/z fell within the scanning range of the mass spectrometer. (B) LmrP is covalently labeled by GPV317. Computer-generated reconstruction of the zero-charge protein spectrum (BioMultiView; PE Sciex, Applied Biosystems) for unmodified LmrP (I) and following photolabeling with GPV317 in the absence (II) or presence (III) of the competitive substrate Hoechst 33342.
identifying the preferred conformational states of the target and the conformational transitions between these states may identify alternative and perhaps more viable points of therapeutic intervention. The nicotinic acetylcholine receptor (AChR) undergoes significant conformational changes as it transitions between the closed, open and desensitized states. Recently, in an elegant study [38], these conformational changes were monitored in a coordinated MS and electrophysiological approach. Voltage-clamped oocytes expressing AChR were pre-incubated with various lipophilic benzophenone probes and continually exposed to acetylcholine; UV irradiation was applied during 500 ms pulses to +40 or −140 mV (which produced closed or 50% open receptors, respectively). MS and MS/MS of AChR peptides produced by proteolytic cleavage revealed the precise site of probe incorporation and provided insights into the different conformational states and their transitions. MS has also been used to resolve two conflicting models ([39,40] and [41,42]) for the gating transition from the closed to the open state of the bacterial
potassium channel (KcsA). These models were built upon the X-ray structure of the closed conformation and site-directed spin labeling (SDSL) studies of the open state. However, even under conditions favoring the open conformation, the closed state still predominates at equilibrium. To overcome these limitations, identify transient open states and reveal conformationally dynamic regions, a non-equilibrium technique named site-directed mass tagging (SDMT) has been developed [43]. In SDMT, a library of KcsA mutants, each containing a single Cys residue at a different position in the protein, is reacted with nitroxide spin label methanethiosulphonate (MTS) probes under conditions favoring the opening of the channel. By monitoring the corresponding mass change of the full-length molecule in a mass spectrometer, SDMT provides a means to assess the reactivity of each position in KcsA with the MTS probe. This approach reveals that KcsA is a dynamically modular molecule: the extracellular half of the membrane-spanning region remains rigid during gating, while the intracellular half undergoes a significant conformational change. SDMT represents an attractive extension of the substituted-cysteine accessibility method (SCAM) [44] and Cys-scanning mutagenesis [17]; however, rather than function or radioactivity, respectively, the read-out is a change in mass.
3.4 Production of semi-synthetic and synthetic membrane proteins to analyze mechanism
The lack of mechanistic insight into some integral membrane proteins can sometimes be attributed to insurmountable problems of over-expression in cellular systems and/or limitations in current site-directed strategies (e.g. complexities involved in the incorporation of unnatural amino acids). Such mechanistic studies would benefit significantly from the ability to chemically synthesize the membrane proteins and functionally characterize them following site-specific incorporation of fluorescent dyes, SDSL spin labels or 15N-labeled analogues for NMR. Recently, the semi-synthetic or synthetic production of membrane proteins by biological and/or solely chemical means has been reported. Notably, RP-HPLC in-line with ESI-MS has been critical for the successful purification and identification of synthetic and biological peptides and for confirmation of their successful ligation [45–47]. For example, KcsA was semi-synthesized by expressed protein ligation [45,46]. Following ligation of an N-terminal recombinant peptide α-thioester to a chemically synthesized C-terminal peptide, the resultant protein folded into a tetrameric state in lipid vesicles and bound agitoxin-2 [47]. However, limitations in solid-phase peptide synthesis meant that the 35 C-terminal amino acids required to open the pore were not present in the semi-synthetic KcsA; therefore, in contrast to full-length KcsA, the truncated version did not give measurable single-channel currents in planar lipid membranes [47]. Notably, a single mutation in KcsA (A98G) allowed semi-synthesis of functional KcsA in planar lipid membranes [48] and has permitted detailed studies of the ion selectivity filter [49–51].
Total chemical protein synthesis has been used to generate multi-milligram amounts of the mechanosensitive channel of large conductance from E. coli (Ec-MscL) and Mycobacterium tuberculosis (Tb-MscL) [52]. Following reconstitution of the synthetic channels into vesicles, single-channel recordings show that the conductance, pressure dependence and substate distribution of Ec-MscL and Tb-MscL are similar to those of the recombinant channels. Likewise, Vpu, an integral membrane protein encoded by the HIV-1 genome that plays a role in the release of virus particles from infected cells and in the degradation of the cellular receptor, has been generated synthetically by taking advantage of MS [53].
3.5 MS as a crystallization tool
Integral membrane proteins overexpressed at suitable levels for crystallization trials often contain micro-heterogeneities that hinder the formation of high-quality crystals [54]. Such micro-heterogeneities (e.g. cleavage of N-terminal methionine or partial cleavage of hydrophilic domains) are often not visible on silver- or Coomassie-stained SDS-gels. However, MS identifies these micro-heterogeneities rapidly using either the ultra-thin sample preparation method [7] for MALDI-TOF or size-exclusion HPLC in-line with ESI-MS [55]. Once flexible termini have been removed by engineering the DNA construct, the remaining protein core may crystallize more readily. Although this approach has been successful for ion channels [56] and transporters [57], its one limitation is that it cannot easily identify and eliminate flexible loops in the middle of the protein sequence. Another parameter of importance for the crystallization of membrane proteins is the lipid:protein ratio of purified protein in a detergent micelle, which must be optimized. For example, thin layer chromatography (TLC) demonstrates that the glycerol phosphate transporter (GlpT) of E. coli only produces high-quality diffraction patterns when ~20 lipid molecules are attached to each GlpT [58]. Recently, ESI-MS has been used to monitor the non-covalent interaction of membrane phospholipids with KcsA. Following reconstitution of KcsA into lipid vesicles of variable composition, phosphatidylglycerol (PG) and phosphatidylethanolamine (PE) are found to associate preferentially relative to phosphatidylcholine (PC), perhaps reflecting differences in the affinity of these phospholipids for KcsA in the membrane [59].
Interestingly, these observations are consistent with TLC studies on KcsA [60] and raise the possibility that MS will serve as a useful tool to understand lipid/protein interactions, provided that intact membrane assemblies and complexes with phospholipids can be detected at physiological concentrations while avoiding the use of organic modifiers in the ESI-MS experiments. Extraction of electron density and structural information from X-ray diffraction data of protein crystals requires knowledge of the magnitudes and phase angles of the diffracted X-rays. However, the diffraction data contain only magnitudes, and the associated phases must be obtained by other means. ‘Phasing’ by isomorphous replacement involves analyzing the X-ray diffraction pattern of a crystal containing protein derivatized with a heavy atom with minimal perturbation of the native protein fold. In multiple wavelength
anomalous dispersion (MAD) phasing, isomorphous replacement with an anomalous scatterer such as selenium (as selenomethionine, SeMet) is accomplished by recombinantly expressing the protein in an E. coli strain that is auxotrophic for methionine and is grown on a medium containing SeMet [61]. Alternatively, in multiple isomorphous replacement (MIR), protein crystals are soaked with selected heavy-atom derivatization reagents. For both of these approaches, MS has been used successfully to ensure high stoichiometric incorporation, and for crystal soaking experiments MS allows the extent of heavy-atom incorporation to be analyzed in single crystals [62].
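The way in which an intact-protein mass reports on SeMet incorporation can be sketched as follows. Each Met replaced by SeMet increases the average mass by the difference between the atomic masses of selenium and sulfur (about 46.9 Da), so the observed mass shift divided by the maximum possible shift gives an estimate of fractional incorporation. The function name and the protein masses below are hypothetical and serve only to illustrate the arithmetic.

```python
# Average atomic masses (Da); replacing S by Se adds their difference per Met.
SE_MINUS_S = 78.971 - 32.06


def semet_incorporation(mass_native, mass_labelled, n_met):
    """Estimate fractional SeMet incorporation from intact-protein masses."""
    if n_met <= 0:
        raise ValueError("the protein must contain at least one Met")
    return (mass_labelled - mass_native) / (n_met * SE_MINUS_S)


# Hypothetical protein with 10 Met residues; masses in Da.
print(round(semet_incorporation(30000.0, 30445.0, 10), 2))
```

A value close to 1 would indicate near-complete substitution, whereas a lower value, or a series of peaks spaced by about 47 Da, would point to a mixed population that could complicate phasing.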
4. CONCLUSIONS
Combining chemical and/or protease cleavage with HPLC in-line with MS, rather than Edman degradation, for the identification of peptides has led to the design of numerous experiments to investigate topology and substrate binding in membrane proteins. Likewise, the means to rapidly analyze the mass of full-length membrane proteins has led to technical advances that probe dynamics by non-equilibrium techniques. Moreover, the same approaches allow crystallographers to rapidly determine the quality of their protein and rationally engineer the DNA construct to remove micro-heterogeneities and flexible termini that are not visible by classical staining of SDS-gels. Additionally, the use of LC/MS to purify and identify synthetic and biologically produced peptides has led to its incorporation into the semi-synthetic and synthetic production of significant amounts of membrane protein that may contain unnatural amino acids or fluorescent, EPR and NMR probes for detailed mechanistic studies. Thus, integrating MS with structural tools such as crystallography and NMR will accelerate our understanding of the molecular mechanisms of membrane proteins.
ACKNOWLEDGEMENTS
The author would like to thank Peter Chiba and Gerhard Ecker for providing GPV317 and Wil Konings and Piotr Mazurkiewicz for bacterial strains expressing LmrP. Special thanks to Drs. H. Ronald Kaback, Julian Whitelegge and Kym Faull for their help, support and guidance with biochemical and mass spectral aspects of this research. Finally, special thanks go to our collaborators Shimon Schuldiner, Misha Soskine and Jose Luis Vazquez-Ibar.
REFERENCES
1 C.G. Tate, Overexpression of mammalian integral membrane proteins for structural studies, FEBS Lett., 504 (2001) 94–98.
2 R. Grisshammer and C.G. Tate, Overexpression of integral membrane proteins for structural studies, Q. Rev. Biophys., 28 (1995) 315–422.
3 C.C. Wu and J.R. Yates, 3rd., The application of mass spectrometry to membrane proteomics, Nat. Biotechnol., 21 (2003) 262–267. 4 M. Karas and F. Hillenkamp, Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons, Anal. Chem., 60 (1988) 2299–2301. 5 J.B. Fenn, M. Mann, C.K. Meng, S.F. Wong and C.M. Whitehouse, Electrospray ionization for mass spectrometry of large biomolecules, Science, 246 (1989) 64–71. 6 T.R. Covey, R.F. Bonner, B.I. Shushan and J. Henion, The determination of protein, oligonucleotide and peptide molecular weights by ion-spray mass spectrometry, Rapid Commun. Mass Spectrom., 2 (1988) 249–256. 7 M. Cadene and B.T. Chait, A robust, detergent-friendly method for mass spectrometric analysis of integral membrane proteins, Anal. Chem., 72 (2000) 5655–5658. 8 P.A. Schindler, A. Van Dorsselaer and A.M. Falick, Analysis of hydrophobic proteins and peptides by electrospray ionization mass spectrometry, Anal. Biochem., 213 (1993) 256–263. 9 J.P. Whitelegge, HPLC and mass spectrometry of intrinsic membrane proteins, Methods Mol. Biol., 251 (2004) 323–340. 10 E. Turk, O. Kim, J. le Coutre, J.P. Whitelegge, S. Eskandari, J.T. Lam, M. Kreman, G. Zampighi, K.F. Faull and E.M. Wright, Molecular characterization of Vibrio parahaemolyticus vSGLT: A model for sodium-coupled sugar cotransporters, J. Biol. Chem., 275 (2000) 25711–25716. 11 J. le Coutre, J.P. Whitelegge, A. Gross, E. Turk, E.M. Wright, H.R. Kaback and K.F. Faull, Proteomics on full-length membrane proteins using mass spectrometry, Biochemistry, 39 (2000) 4237–4242. 12 H. Venter, A.E. Ashcroft, J.N. Keen, P.J. Henderson and R.B. Herbert, Molecular dissection of membrane-transport proteins: mass spectrometry and sequence determination of the galactose-H+ symport protein, GalP, of Escherichia coli and quantitative assay of the incorporation of [ring-213C]histidine and (15)NH(3), Biochem. J., 363 (2002) 243–252. 13 A.B. Weinglass, J.P. Whitelegge, Y. Hu, G.E. Verner, K.F. 
Faull and H.R. Kaback, Elucidation of substrate binding interactions in a membrane transport protein by mass spectrometry, Embo. J., 22 (2003) 1467–1477. 14 J.P. Whitelegge, Thylakoid membrane proteomics, Photosynth. Res., 78 (2003) 265–277. 15 J. Kyte and R.F. Doolittle, A simple method for displaying the hydropathic character of a protein, J. Mol. Biol., 157 (1982) 105–132. 16 G. von Heijne, Control of topology and mode of assembly of a polytopic membrane protein by positively charged residues, Nature, 341 (1989) 456–458. 17 S. Frillingos, M. Sahin-Toth, J. Wu and H.R. Kaback, Cys-scanning mutagenesis: A novel approach to structure function relationships in polytopic membrane proteins, FASEB. J., 12 (1998) 1281–1299. 18 H.R. Kaback, M. Sahin-Toth and A.B. Weinglass, The kamikaze approach to membrane transport, Nat. Rev. Mol. Cell Biol., 2 (2001) 610–620. 19 J.F. Leite, A.A. Amoscato and M. Cascio, Coupled proteolytic and mass spectrometry studies indicate a novel topology for the glycine receptor, J. Biol. Chem., 275 (2000) 13683–13689. 20 J.F. Leite and M. Cascio, Probing the topology of the glycine receptor by chemical modification coupled to mass spectrometry, Biochemistry, 41 (2002) 6140–6148. 21 S. Fimmel, T. Choli, N.A. Dencher, G. Buldt and B. Wittmann-Liebold, Topography of surfaceexposed amino acids in the membrane protein bacteriorhodopsin determined by proteolysis and micro-sequencing, Biochim. Biophys. Acta, 978 (1989) 231–240. 22 H.D. Lemke and D. Oesterhelt, The role of tyrosine residues in the function of bacteriorhodopsin. Specific nitration of tyrosine 26, Eur. J. Biochem., 115 (1981) 595–604. 23 P. Scherrer and W. Stoeckenius, Selective nitration of tyrosines-26 and -64 in bacteriorhodopsin with tetranitromethane, Biochemistry, 23 (1984) 6195–6202. 24 J. Abramson, I. Smirnova, V. Kasho, G. Verner, H.R. Kaback and S. Iwata, Structure and mechanism of the lactose permease of Escherichia coli, Science, 301 (2003) 610–615. 25 P. Venkatesan and H.R. 
Kaback, The substrate-binding site in the lactose permease of Escherichia coli, Proc. Natl. Acad. Sci. USA, 95 (1998) 9802–9807. 26 M. Sahin-Toth, J. le Coutre, D. Kharabi, G. le Maire, J.C. Lee and H.R. Kaback, Characterization of Glu126 and Arg144, two residues that are indispensable for substrate binding in the lactose permease of Escherichia coli, Biochemistry, 38 (1999) 813–819.
27 M. Zhao, K.C. Zen, W.L. Hubbell and H.R. Kaback, Proximity between Glu126 and Arg144 in the lactose permease of Escherichia coli, Biochemistry, 38 (1999) 7407–7412. 28 C.D. Wolin and H.R. Kaback, Thiol cross-linking of transmembrane domains IV and V in the lactose permease of Escherichia coli, Biochemistry, 39 (2000) 6130–6135. 29 A. Weinglass, J.P. Whitelegge, K.F. Faull and H.R. Kaback, Monitoring conformational rearrangements in the substrate-binding site of a membrane transport protein by mass spectrometry, J. Biol. Chem., 279 (2004) 41858–41865. 30 O. Mirza, L. Guan, G. Verner, S. Iwata and H.R. Kaback, Structural evidence for induced fit and a mechanism for sugar/H+ symport in LacY, Embo. J., 25 (2006) 1177–1183. 31 A.B. Weinglass, M. Soskine, J.L. Vazquez-Ibar, J.P. Whitelegge, K.F. Faull, H.R. Kaback and S. Schuldiner, Exploring the role of a unique carboxyl residue in EmrE by mass spectrometry, J. Biol. Chem., 280 (2005) 7487–7492. 32 H. Yerushalmi and S. Schuldiner, A model for coupling of H(+) and substrate fluxes based on ‘time-sharing’ of a common binding site, Biochemistry, 39 (2000) 14711–14719. 33 H. Yerushalmi, S.S. Mordoch and S. Schuldiner, A single carboxyl mutant of the multidrug transporter EmrE is fully functional, J. Biol. Chem., 276 (2001) 12744–12748. 34 H. Yerushalmi and S. Schuldiner, An essential glutamyl residue in EmrE, a multidrug antiporter from Escherichia coli, J. Biol. Chem., 275 (2000) 5264–5269. 35 S. Schuldiner, When biochemistry meets structural biology: The cautionary tale of EmrE, Trends Biochem. Sci., (2007). 36 G.F. Ecker, E. Csaszar, S. Kopp, B. Plagens, W. Holzer, W. Ernst and P. Chiba, Identification of ligand-binding regions of P-glycoprotein by activated-pharmacophore photoaffinity labeling and matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry, Mol. Pharmacol., 61 (2002) 637–648. 37 S.J. Teague, Implications of protein flexibility for drug discovery, Nat. Rev. 
Drug Discov., 2 (2003) 527–541. 38 J.F. Leite, M.P. Blanton, M. Shahgholi, D.A. Dougherty and H.A. Lester, Conformation-dependent hydrophobic photolabeling of the nicotinic receptor: Electrophysiology-coordinated photochemistry and mass spectrometry, Proc. Natl. Acad. Sci. USA, 100 (2003) 13054–13059. 39 Y. Jiang, A. Lee, J. Chen, M. Cadene, B.T. Chait and R. MacKinnon, Crystal structure and mechanism of a calcium-gated potassium channel, Nature, 417 (2002) 515–522. 40 Y. Jiang, A. Lee, J. Chen, M. Cadene, B.T. Chait and R. MacKinnon, The open pore conformation of potassium channels, Nature, 417 (2002) 523–526. 41 E. Perozo, D.M. Cortes and L.G. Cuello, Structural rearrangements underlying K+-channel activation gating, Science, 285 (1999) 73–78. 42 Y.S. Liu, P. Sompornpisut and E. Perozo, Structure of the KcsA channel intracellular gate in the open state, Nat. Struct. Biol., 8 (2001) 883–887. 43 B.L. Kelly and A. Gross, Potassium channel gating observed with site-directed mass tagging, Nat. Struct. Biol., 10 (2003) 280–284. 44 A. Karlin and M.H. Akabas, Substituted-cysteine accessibility method, Methods Enzymol., 293 (1998) 123–145. 45 K. Severinov and T.W. Muir, Expressed protein ligation, a novel method for studying protein– protein interactions in transcription, J. Biol. Chem., 273 (1998) 16205–16209. 46 T.W. Muir, D. Sondhi and P.A. Cole, Expressed protein ligation: A general method for protein engineering, Proc. Natl. Acad. Sci. USA, 95 (1998) 6705–6710. 47 F.I. Valiyaveetil, R. MacKinnon and T.W. Muir, Semisynthesis and folding of the potassium channel KcsA, J. Am. Chem. Soc., 124 (2002) 9113–9120. 48 F.I. Valiyaveetil, M. Sekedat, T.W. Muir and R. MacKinnon, Semisynthesis of a functional K+ channel, Angew. Chem. Int. Ed. Engl., 43 (2004) 2504–2507. 49 F.I. Valiyaveetil, M. Sekedat, R. Mackinnon and T.W. Muir, Glycine as a D-amino acid surrogate in the K(+)-selectivity filter, Proc. Natl. Acad. Sci. USA, 101 (2004) 17045–17049. 50 F.I. Valiyaveetil, M. 
Sekedat, R. MacKinnon and T.W. Muir, Structural and functional consequences of an amide-to-ester substitution in the selectivity filter of a potassium channel, J. Am. Chem. Soc., 128 (2006) 11591–11599.
51 F.I. Valiyaveetil, M. Leonetti, T.W. Muir and R. MacKinnon, Ion selectivity in a semisynthetic K+ channel locked in the conductive conformation, Science, 314 (2006) 1004–1007.
52 D. Clayton, G. Shapovalov, J.A. Maurer, D.A. Dougherty, H.A. Lester and G.G. Kochendoerfer, Total chemical synthesis and electrophysiological characterization of mechanosensitive channels from Escherichia coli and Mycobacterium tuberculosis, Proc. Natl. Acad. Sci. USA, 101 (2004) 4764–4769.
53 G.G. Kochendoerfer, D.H. Jones, S. Lee, M. Oblatt-Montal, S.J. Opella and M. Montal, Functional characterization and NMR spectroscopy on full-length Vpu from HIV-1 prepared by total chemical synthesis, J. Am. Chem. Soc., 126 (2004) 2439–2446.
54 D.N. Wang, M. Safferling, M.J. Lemieux, H. Griffith, Y. Chen and X.D. Li, Practical aspects of overexpressing bacterial secondary membrane transporters for structural studies, Biochim. Biophys. Acta, 1610 (2003) 23–36.
55 J.P. Whitelegge, C.B. Gundersen and K.F. Faull, Electrospray-ionization mass spectrometry of intact intrinsic membrane proteins, Protein Sci., 7 (1998) 1423–1430.
56 D.A. Doyle, J. Morais Cabral, R.A. Pfuetzner, A. Kuo, J.M. Gulbis, S.L. Cohen, B.T. Chait and R. MacKinnon, The structure of the potassium channel: Molecular basis of K+ conduction and selectivity, Science, 280 (1998) 69–77.
57 Y. Huang, M.J. Lemieux, J. Song, M. Auer and D.N. Wang, Structure and mechanism of the glycerol-3-phosphate transporter from Escherichia coli, Science, 301 (2003) 616–620.
58 M.J. Lemieux, J. Song, M.J. Kim, Y. Huang, A. Villa, M. Auer, X.D. Li and D.N. Wang, Three-dimensional crystallization of the Escherichia coli glycerol-3-phosphate transporter: A member of the major facilitator superfamily, Protein Sci., 12 (2003) 2748–2756.
59 J.A. Demmers, A. van Dalen, B. de Kruijff, A.J. Heck and J.A. Killian, Interaction of the K+ channel KcsA with membrane phospholipids as studied by ESI mass spectrometry, FEBS Lett., 541 (2003) 28–32.
60 F.I. Valiyaveetil, Y. Zhou and R. MacKinnon, Lipids in the structure, folding, and function of the KcsA K+ channel, Biochemistry, 41 (2002) 10771–10777.
61 S. Doublie, Preparation of selenomethionyl proteins for phase determination, Methods Enzymol., 276 (1997) 523–530.
62 S.L. Cohen and B.T. Chait, Mass spectrometry as a tool for protein crystallography, Annu. Rev. Biophys. Biomol. Struct., 30 (2001) 67–85.
[Plate 2 image. Panel A: LacY residues E126, R144, W151, E269, R302, H322, and E325 with bound substrate (S) and H+. Panel B (thiol reagents; R144C, E269C): dynamics and accessibility; inactive, no substrate/pH effects. Panel C (group-specific reagents; R144, E269): accessibility; active, substrate/pH effects; complex background.]
Plate 2 Alternative approaches towards studying binding events in membrane transport proteins. A representation of the inward-facing conformation of LacY illustrates charge pairs between essential residues (A, E126/R144, E269/H322 and R302/E325). The environment of these essential residues in this, and other, conformations can be examined by replacing the residue with an environmental reporter (B, R144C, E269C) or by probing the native residue with group-specific modification reagents (C, R144, E269). While E269C and R144C yield insights into the dynamics and accessibility of positions 144 and 269, the mutants are inactive and show no substrate effects. In contrast, modification of R144 and E269 with group-specific reagents yields environmental information; however, there are several potential sites of modification, which produce a complex background. (For Black and White version, see page 201.)
[Plate 3 image. Panel A: selected ion chromatograms (relative intensity, %, vs. time, min) of the m/z 899.7 and m/z 1025.7 ions for samples I (unmodified), II (DiPC), III (p-NPGal/DiPC), and IV (p-NPGlc/DiPC). Panel B: structure of the substrate-binding site.]
Plate 3 Analyzing the role of an essential residue in substrate binding to LacY. (A) DiPC modifies Glu269 in a substrate-dependent manner. Purified single-Cys148 LacY was incubated with 2% dimethylsulfoxide (DMSO) (I) and 20 mM DiPC (II), 20 mM DiPC in the presence of a substrate analogue p-nitrophenyl-α-D-galactopyranoside (p-NPGal) (III) or 20 mM DiPC in the presence of the non-substrate, p-nitrophenyl-α-D-glucopyranoside (p-NPGlc) (IV). Data are displayed as selected ion chromatograms of unmodified and DiPC-modified peptide 268-GELLNASIM-276 (m/z 899.7 and 1025.7, respectively). (B) Substrate-binding site of LacY. Hydrogen bonds and salt bridges are represented by dashed black lines. The carboxyl group of Glu269 lies within 4 Å of the C3-OH of the substrate analogue thio-β-D-galactopyranoside (TDG). (For Black and White version, see page 202.)
CHAPTER 10
Bottom-Up Mass Spectrometry Analysis of Integral Membrane Protein Structure and Topology
Anna E. Speers and Christine C. Wu
Contents
1. Introduction
2. IMP Structure and Characterization
3. Mass Spectrometry Instrumentation
4. General Considerations for Sample Preparation
5. Localizing Glycosylation Sites
6. Limited Proteolysis
6.1 Benchmark study: The glycine receptor
6.2 The high pH-proteinase K (hppK) method
6.3 Targeting transmembrane domains
6.4 Limited proteolysis applied to global topology profiling
7. Residue-Specific Chemical Modification
7.1 Biotinylation of cell surface lysines
7.2 Labeling solvent-exposed cysteines
7.3 o-Nitrosylation of interfacial tyrosines
7.4 Oxidation of solvent-exposed methionines
7.5 Labeling binding-site glutamates and arginines
8. Photoaffinity Labeling of Binding-Site Residues
9. Cross-Linking
10. H/D Exchange
11. Summary and Future Directions
Abbreviations
Acknowledgement
References
Comprehensive Analytical Chemistry, Volume 52 ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00210-9
© 2009 Elsevier B.V. All rights reserved.
1. INTRODUCTION
Integral membrane proteins (IMPs) mediate a host of cellular processes, including intercellular communication, ion transport, and propagation of signaling cascades [1–3]. As such, there is great interest in elucidating their roles in patho/physiological processes and their potential as therapeutic drug targets. One crucial component of such insight is detailed structural/topological information, which can form the basis for homology classification, functional assignment, and the rational design of activators and inhibitors; however, owing to the difficult biophysical properties of IMPs, their characterization lags far behind that of their soluble counterparts. Consequently, a variety of low-to-moderate resolution biochemical approaches have been developed, which, in concert, can yield significant insight into IMP biology. In a broad sense, these techniques all rely on the same principle: assessing the susceptibility of specific residues to chemical modification, whether by endogenous enzymes (glycosylation), probe labeling (e.g., cell surface biotinylation), isotopic labeling (H/D exchange), or proteolytic cleavage (membrane shaving), to ascertain the nature of the local environment (solvent- or lipid-exposed). Such experiments can provide valuable information regarding IMP structure (tertiary and quaternary organization, ligand-binding sites) and topology (membrane-embedded vs. solvent-exposed domains, cytoplasmic orientation) in the absence of high-resolution data, as well as insight into dynamic protein conformations.
In recent years, mass spectrometry (MS) analysis has been integrated into a number of these methods. MS is an attractive technique owing to its high information content (accurate mass, sequencing, and localization of covalent modifications), low detection thresholds (routinely low fmol to attomol), and high-throughput capacity. This chapter will discuss the strengths and limitations of different methods for IMP structure/topology characterization and the potential benefits afforded by MS technology.
2. IMP STRUCTURE AND CHARACTERIZATION
The peptide backbone is an inherently polar structure. Consequently, to exist stably within the membrane, membrane-embedded protein domains must adopt secondary structures that shield the backbone from the hydrophobic core of the lipid bilayer. Both α-helices and β-sheets are motifs characteristic of transmembrane domains (TMDs) that allow for extensive hydrogen bonding of the peptide backbone, minimizing unfavorable interactions. β-Barrel proteins (porins) are composed of multiple, amphiphilic β-strands, having alternating polar and hydrophobic residues for interaction with the central pore and lipids, respectively. They are present in the outer membranes of chloroplasts, mitochondria, and Gram(−) bacteria, where their primary function is to regulate membrane integrity and allow for the passive influx/efflux of small molecules. β-Barrel proteins, however, are predicted to make up only a small fraction of the genome [4]. In contrast, proteins with transmembrane α-helices (TMHs) are abundant in nearly all membrane types, and are predicted to make up 20–25% of all open reading
frames for most organisms [5,6]. Given their abundance, α-helical IMPs have been the primary subject of structural/topological characterization.
α-Helical IMPs can be divided into two main categories: bitopic and polytopic. Bitopic, or single-pass, IMPs have one TMH and solvent-exposed globular domains on either side of the membrane. They can be oriented with either their C-terminus or N-terminus in the cytoplasm, with the opposite tail in the exoplasmic space (i.e., extracellular/luminal). Cell surface markers, receptors, and adhesion factors are often bitopic proteins, having cytoplasmic domains involved in cellular signaling or interaction with cytoskeletal components [7]. IMPs with multiple TMHs are classified as polytopic or multi-pass, with helices arranged in a bundle oriented approximately normal to the plane of the membrane [8]. Almost 5% of mammalian genes code for 7-TMH G-protein-coupled receptors (GPCRs), while numerous small molecule transport proteins in eubacteria/archaea/fungi/plants are polytopic IMPs with 6 or 12 TMHs [5].
IMPs have long resisted structural characterization by standard means. As a result, the Protein Data Bank [9] currently contains only ~160 unique high-resolution IMP structures (see http://blanco.biomol.uci.edu/Membrane_Proteins_xtal.html). It has been estimated that the number of solved structures doubles every ~3 years and lags ~15 years behind that of soluble proteins [10]. This discrepancy is largely accounted for by the technical challenges involved in IMP recombinant expression, purification, and solubilization in active form [1–3]. These species also resist crystallization and structural NMR studies, which require significant amounts of purified material and can be complicated by the presence of the structurally important lipid bilayer [2].
Unlike soluble proteins, however, primary sequence analysis of IMPs can yield significant insight into secondary structure. The defined composition of TMHs and the length constraints imposed by the membrane greatly facilitate identification of membrane-embedded segments, providing a basis for predicting protein folding with respect to the membrane. TMHs must be at least 15 (though typically 20–25) residues long to span the entire membrane, and are composed primarily of aliphatic residues for favorable interactions with lipid tails [11]. The aromatic amino acids Tyr and Trp are relatively abundant at the lipid–water interface, while charged/polar residues are almost entirely restricted to soluble domains [12]. Additionally, cytoplasmically exposed loops, which do not have to be translocated upon folding, tend to contain more positively charged residues than exoplasmic domains due to the energetic cost of moving polar species across the hydrophobic membrane core [7,11]. This so-called positive-inside rule [13,14] is used to indicate IMP orientation. Based on these parameters, a number of topology prediction algorithms have been developed that can predict (with at least some rough degree of accuracy) the overall topology of IMPs, i.e., the number and location of TMHs and their orientation with respect to the cytoplasm (for review see refs. [7,11,15]).
However, topology prediction has proven to be a far more complicated process than these simple rules suggest, owing to the now apparent structural diversity of IMPs. Membrane-spanning regions may be extremely long, containing 40 or more residues [16], highly tilted, flexed, kinked, or even interrupted by short intra-membrane breaks in helicity [17–19]. Additionally, helices may enter the membrane and then turn back to form
a re-entrant loop rather than an actual TMH [20,21]. There are also examples of dual topology proteins that co-exist in opposite orientations [22], and “frustrated” proteins that adopt multiple orientations due to conflicting structural requirements [23] (for reviews see refs. [4,24]). Given this diversity and the slow progress in high-resolution structural characterization, defining the structure/topology of IMPs has benefited greatly from multidisciplinary approaches that rely on immunogenic, fluorescent, or radioactive detection [25,26]: mutagenesis studies (cysteine scanning, cysteine accessibility, spin labeling, engineering N-linked glycosylation sites), localization of specific antibody epitopes, fusion with reporter proteins (for Escherichia coli and yeast), limited proteolysis, and chemical labeling with hydrophilic and lipophilic reagents. In recent years, MS platforms have begun to supplant, or at least provide a complement to, these traditional biochemical readouts.
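The sequence features described in this section (a hydrophobic stretch of roughly 20 residues, flanked by loops whose positive charge hints at orientation) can be illustrated with a minimal sliding-window sketch. This is not one of the prediction algorithms cited above. It assumes the Kyte-Doolittle hydropathy scale with a 19-residue window and a mean-hydropathy cutoff of 1.6, a commonly quoted heuristic that does not come from this chapter, and the sequence `toy` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not taken from this chapter).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_means(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]

def candidate_tmh_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy reaches the threshold.

    Returns 0-based, end-exclusive (start, end) tuples.
    """
    segments = []
    for i, mean in enumerate(window_means(seq, window)):
        if mean < threshold:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)   # extend the previous segment
        else:
            segments.append((start, end))
    return segments

def positive_residues(seq):
    """Count Lys + Arg, a crude proxy for the positive-inside rule."""
    return seq.count("K") + seq.count("R")

# Hypothetical bitopic sequence: charged N-terminal tail (8 residues),
# 21-residue hydrophobic stretch, polar C-terminal tail (11 residues).
toy = "MKRKSRDE" + "LIVALLAVGLIFLVAVLLAIG" + "SNDTEQSGDNS"
segments = candidate_tmh_segments(toy)
n_flank, c_flank = toy[:8], toy[29:]
print(segments, positive_residues(n_flank), positive_residues(c_flank))
```

Because every window that overlaps the hydrophobic stretch strongly enough passes the cutoff, the merged segment is wider than the stretch itself; such a window scan locates a candidate TMH but does not define its boundaries. The more positively charged flank would be assigned to the cytoplasm under the positive-inside rule.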
3. MASS SPECTROMETRY INSTRUMENTATION
The two basic strategies for protein analysis by MS are top-down and bottom-up. The top-down approach uses MS methods for the detailed analysis of intact proteins, whereas the bottom-up approach analyzes peptides generated from protein digests. This chapter addresses the bottom-up approach to IMP characterization. In terms of MS platforms, samples of low to moderate complexity can be analyzed directly, whereas more complex samples typically require prior separation (e.g., chromatographic, electrophoretic). The “soft” ionization techniques of matrix-assisted laser desorption/ionization (MALDI) [27] and electrospray ionization (ESI) [28] are almost exclusively used to introduce charged peptides into the mass spectrometer, with MALDI being more tolerant to detergents and other contaminants, while ESI is more readily interfaced with liquid chromatography (LC) separation. Inside the mass spectrometer, the parent mass of each species is determined, and peptide ions can undergo further fragmentation to generate tandem MS (MS/MS) spectra for peptide sequencing and localization of covalent modifications. Single-stage (e.g., time-of-flight, TOF) analyzers provide parent masses only. Multistage instruments like quadrupole-TOF hybrids (Q-TOF), ion trap (IT), linear ion trap (LIT), orbitrap, or Fourier transform-ion cyclotron resonance (FT-ICR) instruments have MS/MS capabilities.
As will be discussed, MS investigation of IMP structure/topology largely involves the analysis of modified peptides and/or complex mixtures. As such, critical instrumentation parameters include mass accuracy and resolving power (important for peptide identification and isotopic resolution), as well as sensitivity and high-throughput capacity. ITs and LITs are sensitive instruments capable of high-throughput online analysis of LC eluate via ESI. LITs have a higher trapping capacity, and thus greater sensitivity and dynamic range than ITs. However, in high-throughput mode, the resolution of IT-type analyzers is relatively low. In comparison, TOF analyzers have better resolution and mass accuracy (Figure 1). Multistage Q-TOF analyzers (having a collision cell between the Q and TOF, often a second Q), exhibit high resolution and mass accuracy in both the MS
Figure 1 Low-resolution vs. high-resolution spectra. Illustrations for (A) resolution of isotopic peaks for one peptide ion and (B) deconvolution of isotopic envelopes for two overlapping peptide ions using a high-resolution MS instrument.
and MS/MS modes; however, because they lack trapping capacity, quadrupole machines are less sensitive than IT devices for standard MS/MS data acquisition. Conversely, because quadrupoles are not affected by space-charging effects, they have better dynamic range than ITs/LITs. As such, they are often used for quantitation. For example, triple quadrupoles (Q-Q-Q) and hybrid Q-Q-LIT devices are excellent for quantitation via multiple reaction monitoring, where specific transitions between precursor and fragment ions are recorded [29]. Currently, FT-ICR analyzers have the highest resolution/mass accuracy, thus affording better quality spectra and increased peak capacity. Hybrid instruments are particularly useful for high-throughput experiments. FT-ICRs with external LITs allow for high-resolution/mass accuracy MS spectra (for parent ion identification) and parallel high-throughput acquisition of low-resolution MS/MS spectra (for sequencing). As a result, parent masses can be used to constrain search parameters of the MS/MS spectra, resulting in identification of more peptides with high confidence. Orbitraps have similar characteristics to FT-ICRs and can also be interfaced with LITs. However, they are generally easier to maintain as they do not require a superconducting magnet, relying instead on an oscillating electric field for ion separation. (For review of MS instrumentation see ref. [29].) For summary see Table 1.
To generate fragmentation ion spectra, peptides are typically collided with an inert gas (collision-induced dissociation, CID). However, alternative fragmentation techniques are also available. Electron capture dissociation (ECD) [30] on the
Table 1 Comparison of common mass spectrometers operating in MS/MS mode (a)

                   Mass accuracy   Resolution   Sensitivity   Throughput   Dynamic range
IT/LIT             Low             Low          High          Very high    Low
Q-TOF              High            High         Moderate      High         Moderate
Q-Q-Q              Moderate        Low          Moderate      High         High
FT-ICR/orbitrap    Very high       Very high    High          High         Moderate

(a) Parameters are a general guideline only and are dependent upon instrument configuration, ionization method, and scanning mode used.
FT-ICR and electron transfer dissociation (ETD) [31] on the LIT rely on the low-energy electron transfer from ions in the collision cell, resulting in a more even distribution of fragmentation, enhanced preservation of post-translational modifications (PTMs), and the ability to analyze long polypeptides. As such, these fragmentation techniques could hold promise for the localization of covalent modifications and the analysis of large, hydrophobic peptides from TMH domains [29,32]. Unlike CID, ECD and ETD may also prevent hydrogen/deuterium (H/D) scrambling of isotopically labeled peptides during fragmentation, allowing residue-specific localization of deuterium incorporation for H/D exchange experiments [30,33].
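The roles of mass accuracy and resolving power discussed in this section can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the constants are standard physical values rather than figures from this chapter, and the m/z values are invented. It shows how a mass error is expressed in ppm, how the charge state follows from the spacing of resolved isotopic peaks (the situation depicted in Figure 1), and how an accurate parent mass narrows a candidate list.

```python
PROTON_MASS = 1.007276       # Da
C13_C12_SPACING = 1.003355   # Da, mass difference between 13C and 12C

def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def resolving_power(mz, peak_width_fwhm):
    """Resolving power m/dm, with dm taken as the peak width at half maximum."""
    return mz / peak_width_fwhm

def charge_from_isotope_spacing(spacing_mz):
    """Isotopic peaks of a z+ ion lie about 1.003/z apart on the m/z axis."""
    return round(C13_C12_SPACING / spacing_mz)

def neutral_mass(mz, charge):
    """Monoisotopic neutral mass of an [M + zH]z+ ion."""
    return (mz - PROTON_MASS) * charge

def within_tolerance(observed_mz, candidate_mzs, tol_ppm):
    """Keep candidates whose m/z lies within +/- tol_ppm of the observed value."""
    return [c for c in candidate_mzs
            if abs(ppm_error(observed_mz, c)) <= tol_ppm]

# Hypothetical precursor and candidate peptide m/z values.
observed = 1000.504
candidates = [1000.500, 1000.750, 1001.200]
tight = within_tolerance(observed, candidates, tol_ppm=10)     # accurate-mass MS scan
loose = within_tolerance(observed, candidates, tol_ppm=1000)   # low-accuracy MS scan
print(tight, loose)
print(charge_from_isotope_spacing(0.5017), neutral_mass(500.5, 2))
```

With the tight tolerance only one candidate survives, which is the sense in which accurate parent masses constrain the MS/MS search; the tolerances used here are arbitrary placeholders, not instrument specifications.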
4. GENERAL CONSIDERATIONS FOR SAMPLE PREPARATION
A prerequisite for any bottom-up MS study is assessing whether or not it is possible to achieve adequate sequence coverage of the protein(s) of interest. Appropriate sample preparation is critical for maximizing sequence coverage. In general, IMPs are low-abundance species, and can be difficult to solubilize in aqueous solution owing to their amphipathic nature, hindering separation, digestion, and analysis. Consequently, specialized protocols have been developed for handling IMPs.
With complex proteomic samples, some type of separation is necessary prior to analysis. Gel-based methods are best avoided, especially isoelectric focusing (IEF)/sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), as IMPs do not solubilize well in IEF-compatible buffers and tend to precipitate at their isoelectric point. Even if the IEF step is avoided, it can still be difficult to extract hydrophobic peptides from gels following in-gel digest. While a variety of alternative methods exist for protein separation (e.g., solution-phase IEF, anion or cation exchange, reversed phase (RP)LC), when employing a bottom-up MS strategy, it can be advantageous to first digest proteins en masse and then separate the resulting peptides by one or more orthogonal means. This is termed the “shotgun” [34] strategy for MS-based proteomics. The most widely used implementation is microcapillary liquid chromatography (μLC) interfaced directly with ESI mass spectrometers. The μLC columns are fused silica capillaries
with ~100 μm inner diameter and a laser-pulled 5 μm tip; they are capable of flow rates of a few hundred nanoliters per minute (thus μLC is often referred to as nanoflow LC). Columns are packed at high pressure with RP resin, or, in the case of Multidimensional Protein Identification Technology (MudPIT) [35,36], a combination of strong cation exchange (SCX) and RP materials. Although less common, systems for the automated deposition of LC eluate on MALDI plates are also available (e.g., see ref. [37]).
Because most detergents (most notably SDS) used to facilitate IMP solubilization/denaturation for proteolysis are difficult to remove and can dramatically suppress peptide ionization [37–39], a variety of alternative solubilization and digestion techniques have been developed to optimize (1) the efficiency of digestion, (2) types of peptides generated, and (3) downstream MS compatibility (for review see ref. [40]). IMPs can be solubilized to some extent in 60% methanol and then digested with trypsin and/or chymotrypsin [41]. Organic acids can also be used for solubilization in combination with methionine-directed cyanogen bromide (CNBr) digestion [35,36]. A variety of “MS-compatible” detergents are also available, including acid-labile species like RapiGest (Waters) and PPS (Protein Discovery), which break down into innocuous byproducts without detergent-like properties. The proprietary detergent Invitrosol (Invitrogen) has a different LC retention time than most peptides, preventing interference, and sodium deoxycholate has been shown to precipitate upon acidification [42]. Alternatively, on-membrane proteolysis [43], which involves selective digestion of soluble domains while leaving membrane-embedded hydrophobic domains untouched, requires no detergent, only chaotropes (e.g., urea and guanidinium chloride), which are easily removed by desalting during LC, as they do not bind to RP or SCX resin. The application of on-membrane or “limited” proteolysis to IMP topology will be discussed below.
In terms of digestion techniques, trypsin is not ideal for IMPs, as TMHs have few charged residues and thus few tryptic cleavage sites (Lys, Arg) [12]. As a result, trypsin digests produce large, hydrophobic TMH-containing peptides that tend to adhere to RP and SCX resin and can be difficult to sequence without the aid of high mass accuracy/high-resolution instruments and high-charge state deconvolution. As such, trypsin can be used in combination with enzymes (e.g., chymotrypsin [41]) or reagents (CNBr [44,45]) that cleave at non-polar residues (Trp, Tyr, Phe for chymotrypsin; Met for CNBr) more likely to be found in TMHs.
While cleavage strategies directed at reducing the length and hydrophobicity of TMHs are critical for successful analysis, optimized conditions for μLC separation are also important, namely separation at elevated temperature. While most high-performance liquid chromatography (HPLC) systems have integrated thermostatting capability, allowing routine analysis above room temperature [46], the column heater functionality was largely omitted when transitioning to the μLC platform for shotgun proteomics applications. This omission may have hindered IMP analysis, as μLC at elevated temperature has been shown to significantly improve the elution of hydrophobic peptides, with separation at 60 °C giving 5-fold more unique peptide identifications as compared to room temperature [45].
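The argument for combining cleavage specificities can be illustrated with a simple in silico digest. This is a hedged sketch rather than a validated digestion model: the helper names and the sequence `toy` are hypothetical, only the cleavage residues named in the text are used (Lys/Arg for trypsin; Trp, Tyr, Phe for chymotrypsin; Met for CNBr), the usual "no tryptic cleavage before Pro" convention is an added assumption, and missed cleavages and the chemical conversion of Met by CNBr are ignored.

```python
def cleave(seq, residues, skip_before_proline=False):
    """Cut C-terminal to every residue in `residues`; return the fragments."""
    fragments, start = [], 0
    for i, aa in enumerate(seq[:-1]):          # never cut after the last residue
        if aa not in residues:
            continue
        if skip_before_proline and seq[i + 1] == "P":
            continue
        fragments.append(seq[start:i + 1])
        start = i + 1
    fragments.append(seq[start:])
    return fragments

def digest(seq, *rules):
    """Apply several cleavage rules in sequence to the same protein."""
    fragments = [seq]
    for residues, skip_p in rules:
        fragments = [piece for frag in fragments
                     for piece in cleave(frag, residues, skip_p)]
    return fragments

TRYPSIN = ("KR", True)          # Lys, Arg (assumed: not before Pro)
CHYMOTRYPSIN = ("FWY", False)   # Phe, Trp, Tyr, as listed in the text
CNBR = ("M", False)             # Met

# Hypothetical protein: short polar tails around a 21-residue TMH that
# contains no Lys/Arg but does contain Phe, Met, Trp, and Tyr.
toy = "GSDKR" + "LIVAFLLAVGMLIWLVAVLLY" + "KSDER"

for label, rules in [("trypsin", (TRYPSIN,)),
                     ("trypsin + chymotrypsin", (TRYPSIN, CHYMOTRYPSIN)),
                     ("trypsin + chymotrypsin + CNBr", (TRYPSIN, CHYMOTRYPSIN, CNBR))]:
    peptides = digest(toy, *rules)
    print(label, max(len(p) for p in peptides), peptides)
```

In this toy case trypsin alone leaves the entire TMH in a single long hydrophobic peptide, whereas adding cleavage at aromatic residues and Met breaks it into shorter pieces, which is the rationale given above for multi-reagent digests.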
For ESI-MS/MS applications, hydrophobic peptides are expected to ionize and fragment as well as, if not better than, hydrophilic species. ESI has also been shown to slightly favor the identification of hydrophobic peptides over MALDI, although not by enough to discount the use of the latter for IMP analysis (for discussion see ref. [40]). Thus, with the appropriate sample preparation and analysis conditions, and aided by high-resolution/mass accuracy instrumentation (as well as robust MS/MS analysis software, which is outside the scope of this chapter), good sequence coverage of IMPs is theoretically possible using bottom-up MS strategies, laying the groundwork for comprehensive structure/topology analysis.
5. LOCALIZING GLYCOSYLATION SITES
One of the most basic methods for topological characterization is identification of glycosylated residues. Because N-linked glycosylation (Asn modification) takes place in the lumen of the endoplasmic reticulum (ER), only the termini and loops of IMPs exposed to this environment should become glycosylated. (Note: luminally oriented domains become extracellular domains upon vesicle fusion with the plasma membrane.) As such, determination of glycosylation sites can be a valuable tool for basic topological determination. Lectin-mediated affinity purification is commonly applied to the isolation of glycoproteins/glycopeptides. Species bound to the affinity column are eluted upon treatment with peptide-N-glycosidase F (PNGase F), which removes the carbohydrate moiety, converting Asn to Asp. If this hydrolysis is carried out in 18O water, isotope incorporation marks the Asp as a formerly glycosylated residue.
Following this strategy, Kaji et al. [47] affinity purified glycopeptides resulting from trypsin digestion of a Caenorhabditis elegans preparation. Peptides were analyzed by μLC-MS/MS using a hybrid Q-TOF instrument well suited to the analysis of PTMs [29]. They identified almost 1,500 N-glycosylated sites on 829 proteins, including 175 predicted polytopic and 257 bitopic IMPs. Prediction algorithms (SignalP and ConPredII) were used to classify the location of the N-terminus of each bitopic protein as either cytoplasmic or luminal. Presence of a predicted N-terminal, ER-targeting cleavable signal sequence is a strong indication of N-terminal luminal orientation (Figure 2A). For 160 out of 181 proteins with a predicted cleavable signal sequence, experimentally determined glycosylation sites were localized exclusively to the N-terminal side of the predicted TMH, while the remaining few were glycosylated on the C-terminal side or on both sides. This 88% correlation between predicted and experimental topologies is quite high, and discrepancies call into question the prediction algorithm assignments. For IMPs lacking a signal sequence (Figure 2B), 48 were glycosylated exclusively on the C-terminal side and 26 on the N-terminal side of the predicted TMH. Interestingly, for IMPs lacking a cleavable signal sequence but with an N-terminal luminal orientation, it was observed that the translocated/glycosylated N-terminal side of the protein was, in general, quite long, an interesting finding given that translocation of this segment is thought to be post-translational and requires an
Figure 2 Glycosylation of bitopic IMPs. (A) IMPs with N-terminal signal sequences have an N-terminus with a luminal orientation. The signal sequence is cleaved after protein insertion into the membrane, and the soluble domain exposed within the ER lumen is susceptible to glycosylation (“Ys”). (B) For bitopic IMPs lacking a cleavable signal sequence, orientation is thought to be dictated by the “positive inside” rule, where the most positively charged soluble domain remains in the cytoplasm. The opposite soluble domain, located within the lumen, may become glycosylated.
unfolded structure, presumably more difficult if a large domain adopts spontaneous tertiary structure while still located in the cytosol. Identification of glycosylation sites is most applicable as a high-throughput technique for the analysis of IMP (particularly bitopic) orientation. The method is dependent upon the presence and location of glycosylation sites, and so resolution of intra- and extra-membrane domains is limited. In comparison, the limited proteolysis strategy discussed in the next section can be used to more precisely define the boundaries of TMHs.
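The logic of this section can be sketched in a few lines: locate candidate N-glycosylation sites, recognize the mass tag left by PNGase F in 18O water, and ask on which side of a predicted TMH the sites fall. The sketch rests on assumptions not stated in the chapter: the consensus sequon N-X-S/T (X not Pro), the standard monoisotopic mass shifts for Asn-to-Asp conversion with and without 18O, and hypothetical function names and example sequence. The final line simply reproduces the 160-of-181 agreement quoted above.

```python
import re

DEAMIDATION_16O = 0.984016      # Da, Asn -> Asp in ordinary water
O18_MINUS_O16 = 2.004245        # Da, mass difference between 18O and 16O
DEAMIDATION_18O = DEAMIDATION_16O + O18_MINUS_O16   # Asn -> Asp in 18O water

def sequon_asn_positions(seq):
    """1-based positions of Asn in N-X-S/T motifs (X != Pro); overlaps allowed."""
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", seq)]

def glycosylated_side(site_positions, tmh_start, tmh_end):
    """Which side(s) of a single predicted TMH carry glycosylation sites.

    Positions are 1-based; tmh_start and tmh_end bound the predicted helix.
    """
    sides = set()
    for pos in site_positions:
        if pos < tmh_start:
            sides.add("N-terminal")
        elif pos > tmh_end:
            sides.add("C-terminal")
    if len(sides) == 2:
        return "both"
    return sides.pop() if sides else "none"

def percent(part, whole):
    return 100.0 * part / whole

# Hypothetical peptide: N2 and N12 sit in sequons, N7 is followed by Pro.
sites = sequon_asn_positions("MNGTAANPSLLNKS")
print(sites, round(DEAMIDATION_18O, 4))
print(glycosylated_side(sites, tmh_start=20, tmh_end=40))
print(round(percent(160, 181)))   # agreement quoted in the text
```

A site confined to one side of the TMH assigns that side to the lumen; sites on both sides, as reported for a few proteins above, flag a conflict with the predicted helix position or orientation.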
6. LIMITED PROTEOLYSIS
Proteolytic digest of sealed, uniformly oriented vesicles (referred to as limited proteolysis, protease protection, or membrane shaving) is one method for identifying the location and orientation of solvent-exposed and membrane-embedded domains. For an overview of techniques discussed, see Figure 3. The underlying principle of this strategy is that the hydrophobic core of the bilayer is
Figure 3 General summary of limited proteolysis strategies. Starting membrane preparation (center) is an intact membrane vesicle preparation (or liposomes in the case of Leite et al.). IMP orientation may be assessed by proteolytic digest of intact membranes (Rodriguez-Ortega et al., Peck et al., Wu et al. — hppK with protease protection). While all methods generate soluble peptides for analysis, Blackler et al. and Leite et al. (as well as Bear and colleagues, who follow a method analogous to Leite et al., with the exception of using formic acid to solubilize membrane-embedded peptides) also carry out the complementary analysis of membrane-embedded domains. Abbreviations: aq, aqueous; FA, formic acid; MeOH, methanol; and ppt, precipitate.
inaccessible to soluble proteases (which has been shown for bacteriorhodopsin [48]). As a result, peptides will be generated exclusively from the extra-vesicular, solvent-exposed domains. Popular membrane preparations include caveolae (invaginations in the plasma membrane), thylakoids (pigmented vesicles within chloroplasts), plasma membranes, mitochondria, and intact bacteria [43]. Most structures are reported to be isolated more or less homogeneously in their natural orientation (e.g., cytosolic domains are concealed on the interior of plasma membrane vesicles but exposed on the exterior of pinched-off caveolae); however, different preparative methods can lead to inside-out conformations of some membrane types (e.g., plasma membrane [49–51]). Vesicles (re)forming with an inside-out orientation during sample preparation and/or contamination from other membranes can complicate topological assessment.
The term “limited” proteolysis has also been applied to mean “limited to soluble domains” rather than just limited to a specific side of the membrane. Vesicles can be permeabilized using high pH buffers [52], detergents [53], or sonication [48] to allow digestive enzymes access to both sides of the membrane. Technically speaking, such an experiment is not a characterization of topology, but rather of intra- vs. extra-membrane domains, as topological determination implies a vectorial assessment (i.e., cytoplasmic vs. exoplasmic) as well.
6.1 Benchmark study: The glycine receptor
Leite et al. [48] presented the first study using limited proteolysis coupled with MS analysis, in which they mapped the topology of the glycine receptor, a ligand-gated ion channel of the nicotinicoid superfamily whose members mediate signaling across the synapse. The receptor was recombinantly expressed and functionally reconstituted in lipid vesicles, where activity was taken to indicate that the receptor molecules were in a uniform, native conformation. The membrane preparation was permeabilized via sonication to allow enzyme (trypsin, V8, Lys-C) access to the vesicle interior. The authors were careful not to over-digest, citing concerns over subsequent unfolding/digestion of membrane-embedded domains. While this is generally considered not to be a significant problem, it is possible that protein structure may be altered upon removal of soluble domains. Soluble and membrane-embedded peptides were separated by centrifugation, and the membrane-embedded peptides extracted with an organic solvent/acid mix prior to analysis by μLC-MS and MS/MS (Q-Q-Q). In the soluble peptide fraction, they discovered proteolytic cleavage sites in regions expected to be inaccessible, suggesting the existence of unusually short TM peptide segments. The authors postulate that the TMDs may be composed of both α-helical and β-sheet components, as far fewer residues are needed to span the membrane in a β-sheet motif. Analysis of the membrane-embedded peptides suggested that the N-terminal domain, thought to be entirely solvent-exposed, contained a short re-entrant loop. Overall, the limited proteolysis data were in accord with findings from previous biochemical studies, including localization of PTMs and disulfides, lipophilic probe labeling, cysteine mutagenesis/chemical modification, and immunolabeling [48,54].
The authors do note some degree of overlap between the solvent-accessible and membrane-embedded peptide samples. It is known that, even with digestion of soluble proteins in the presence of a membrane, some peptides will associate with the lipid fraction (albeit in limited quantity) [48]. However, a more common phenomenon resulting in soluble peptides purifying with the membrane fraction is incomplete cleavage due to steric hindrance (with membrane, other regions of the protein, PTMs) or lack of cleavage sites close to the membrane-solvent interface. Thus, it is expected that the membrane fraction will contain more contamination from the soluble fraction than vice versa. Another potential grey area is IMP domains that may lie on the surface of the membrane, and thus not be readily classified (or biochemically segregated) as either soluble or membrane-embedded.
6.2 The high pH-proteinase K (hppK) method
The non-sequence-specific enzyme proteinase K is of particular benefit in limited proteolysis studies, as soluble domain cleavage sites are not limited to specific residues, allowing for enhanced cleavage of soluble domains. The protease has maximal activity at neutral pH, where exposed domains can be completely digested into di- and tri-peptides. However, at alkaline pH, activity is attenuated, generating peptides amenable to MS/MS sequencing [52]. While proteinase K has the potential to generate many more peptides than proteases with specific cleavage sites, in limited proteolysis experiments, proteinase K does appear to have preferred cleavage sites, possibly a result of local topology, thus avoiding excessively complex samples (unpublished results). The high pH environment also causes sealed vesicles to form open structures, allowing protease access to both sides of the membrane, but leaving the bilayer and most IMPs (with the possible exception of some bitopic proteins [55]) intact. Two different digestion schemes have been proposed by Wu et al. [52] using proteinase K (Figure 3). In the first scheme, membranes in high pH buffer are digested with proteinase K (the hppK method), and soluble peptides are separated by centrifugation and analyzed by MudPIT, providing information regarding intra- and extra-membrane domains. In the second scheme, intact vesicles are first digested with proteinase K at neutral pH, removing all solvent-exposed extra-vesicular domains. Upon subsequent high pH treatment, the vesicles open, and the formerly protected extra-membrane domains are digested (proteinase K) for MudPIT analysis, so the orientation, as well as the location, of the soluble domains can be determined. Both methods were applied to enriched Golgi membranes, allowing topological assignment of numerous resident Golgi IMPs.
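The logic by which the two digestion schemes yield orientation can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name, the peptide strings, and the side labels are hypothetical, and real data would require handling of overlapping or partially digested peptides.

```python
def assign_sides(scheme1_peptides, scheme2_peptides):
    """Assign a membrane side to each soluble peptide (illustrative only).

    scheme1_peptides: peptides released by high pH digestion of open
        membranes (both faces of the membrane are accessible).
    scheme2_peptides: peptides released at high pH after intact vesicles
        were first shaved at neutral pH (only formerly protected domains
        should remain).
    """
    scheme1, scheme2 = set(scheme1_peptides), set(scheme2_peptides)
    sides = {}
    for peptide in scheme1 | scheme2:
        # Peptides surviving the neutral pH shave were inside the vesicle.
        if peptide in scheme2:
            sides[peptide] = "intra-vesicular"
        else:
            sides[peptide] = "extra-vesicular"
    return sides


# Hypothetical peptide sequences, for illustration only.
sides = assign_sides(["AGTK", "LLPE", "WQRS"], ["LLPE"])
```

Here `sides["LLPE"]` is assigned to the vesicle interior because it is still observed after the neutral pH shave, whereas the other two peptides are assigned to the exterior.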
6.3 Targeting transmembrane domains
In the above example, only the soluble peptides were used to assess topology, as analysis of the membrane-embedded peptide fraction can be a much more complicated endeavor. As previously mentioned, owing to their length (20–25 residues on average) and high hydrophobicity, transmembrane-containing
peptides can be difficult to extract from gels and to elute from RP columns, and may not be within the m/z range of typical MS/MS instruments (for review see ref. [40]). Additionally, the high lipid content can interfere with μLC separation and analysis. A new protocol by Blackler et al. [44] employing re-digestion of membrane-embedded domains with CNBr, optimized lipid removal, and heated μLC largely resolved these issues. Following hppK digestion [52], shaved membranes are re-isolated by centrifugation, and then solubilized with a CNBr/formic acid solution [44]. CNBr cleaves at methionines, which are predicted to occur in approximately half of eukaryotic TMHs, a much higher frequency than the charged residues Arg and Lys [12,56]. Upon addition of an aqueous-organic buffer and centrifugation, selective precipitation of lipids is observed. Analysis of the transmembrane-containing peptides by μLC-MS/MS (LIT) at elevated temperature (60 °C) resulted in the identification of over 300 peptides, two-thirds of which overlapped a predicted TMH [45]. Using a MudPIT approach (60 °C) increased peptide identifications to over 1,300 per run, with over half containing full or partial TMHs [44]. Topological information afforded by analysis of TMHs may be particularly beneficial given that coverage of soluble domains may be incomplete, as exposed loops may be short [5] or heavily modified (e.g., glycosylated), hindering digestion and identification. Topology mapping using this optimized TMH analysis protocol is currently underway.
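The idea of cleaving at methionines and then asking which peptides overlap a predicted TMH can be illustrated with a minimal in silico sketch. The sequence and TMH coordinates below are invented for illustration, the function names are hypothetical, and the chemistry is simplified (cleavage is assumed to be complete, and conversion of Met to homoserine lactone is ignored).

```python
def cnbr_digest(sequence):
    """Cleave C-terminal to every Met; return (start, end, peptide) tuples.

    Coordinates are 0-based and half-open, i.e. sequence[start:end].
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue == "M":
            peptides.append((start, i + 1, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):
        # Trailing peptide after the last Met.
        peptides.append((start, len(sequence), sequence[start:]))
    return peptides


def overlaps_tmh(peptide, tmh):
    """True if a (start, end, seq) peptide overlaps a (start, end) TMH."""
    start, end, _ = peptide
    tmh_start, tmh_end = tmh
    return start < tmh_end and tmh_start < end


# Hypothetical sequence and predicted TMH span (residues 6-14, half-open 6:15).
peptides = cnbr_digest("MKTLLMAAVLLGFFIMSRK")
tmh_peptides = [p[2] for p in peptides if overlaps_tmh(p, (6, 15))]
```

For this toy sequence, the digest yields four peptides and only `AAVLLGFFIM` overlaps the assumed TMH.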
6.4 Limited proteolysis applied to global topology profiling
From the preceding discussion, it is apparent that the limited proteolysis technique is compatible with topology profiling on a proteome-wide scale. Rodriguez-Ortega et al. [57] applied limited proteolysis to the analysis of the group A Streptococcus bacterial surface proteome as a way to identify potential antigens for vaccine development. Following membrane shaving with trypsin or proteinase K, soluble peptides were analyzed by MudPIT-MALDI-MS/MS (TOF-TOF) and μLC-MS/MS (Q-TOF). Peptides from 72 proteins, including 37 predicted IMPs, were identified. When the extracellular location of identified peptides was compared with that predicted by the PSORT algorithm, good correlation was obtained for 26/37 (70%) of IMPs. For six of the remaining proteins, experimental evidence was strong enough to cast doubt on the PSORT topology assignment. In another example, Peck and colleagues [58] combined limited proteolysis with phosphoproteomics by trypsinizing the exposed cytosolic domains of inside-out (Brij58-treated) Arabidopsis plasma membranes, followed by phosphopeptide enrichment using immobilized metal ion affinity chromatography (IMAC). They identified several hundred phosphorylation sites from ~200 proteins following μLC-MS/MS analysis (Q-TOF). It was observed that most phosphorylation sites were located near the N- or C-termini of multi-pass proteins, thus defining that terminus as cytoplasmically located. When compared with predicted topology (TMHMM algorithm), assignments were in agreement for nearly 75% of IMPs. More evidence would be required to assign orientation to the remaining quarter. The level of detail obtained from the single protein (i.e., glycine receptor; see also work by Bear and colleagues [59] in the next section) vs. the global
proteome studies is quite different, with the latter being of significantly lower resolution. These examples highlight a natural trade-off between sequence coverage and number of proteins characterized. With improved sample preparation and analysis methods, and the application of higher resolution/mass accuracy MS instrumentation, it should become possible to obtain more detailed topological information from high-throughput analyses. Advancement of the limited proteolysis strategy is particularly attractive given that the method (like glycosylation analysis) can be applied to proteins in native membranes. Additionally, the technique has the potential to define TMH edges with fairly high (residue level) resolution, especially if used with non-sequence-specific proteases. One major caveat to the limited proteolysis technique is that data interpretation can be complicated by inclusion of soluble domains in the membrane-associated fraction, which should be addressable by careful data analysis and further refinement of the technique.
7. RESIDUE-SPECIFIC CHEMICAL MODIFICATION
Assessing the susceptibility of specific residues to chemical modification depending on their local environment is another method for investigating IMP topology. Hydrophilic probes should only react with solvent-exposed residues, whereas lipophilic reagents should only modify residues in a hydrophobic milieu. Such chemical labeling can provide information regarding residue environment as a function of conformation state (e.g., open or closed ion channels) and interaction with ligands (e.g., substrates, activators, inhibitors). Probes typically have electrophilic reactive groups, like N-hydroxysuccinimide (NHS) for lysine labeling or iodoacetamide/maleimide for cysteine labeling; however, a number of residues are susceptible to selective chemical modification, as discussed below. Examples can be found in Figure 4. Chemical labeling can be easily combined with limited proteolysis to increase the amount of topological information provided by a single experiment.
7.1 Biotinylation of cell surface lysines
Biotinylation reagents, consisting of a biotin affinity tag linked to an amine-reactive NHS (or NHS variant), are generally regarded as membrane-impermeable. Nunomura et al. [60] biotinylated the surface-exposed lysines of mouse embryonic stem cells and isolated the plasma membrane fraction for solubilization and trypsin digestion. Biotinylated peptides were affinity purified using immobilized avidin. Following elution with an aqueous/organic mix, MudPIT analysis (Q-TOF) identified almost 1,000 biotinylated peptides, with specific fragmentation of the biotin tag confirming labeling. In total, 200 predicted IMPs (SOSUI algorithm) were identified. The orientation of the N- and C-termini was experimentally assigned for 122 single-pass IMPs, localizing the N-terminus to the extracellular region in 80% of cases, with the rest unambiguously assigned by observed C-terminal modification. Like the previous two methods described
Figure 4 Representative chemical modifications for structure/topology characterization. Modifications include residue-specific (amino acid indicated by one-letter code) labeling using mono- and bi-functional (cross-linking) reagents and the photolabeling reagents used by Leite et al. Hydrophilic reagents label residues outside of the bilayer (horizontal lines), while hydrophobic reagents can label residues exposed to the lipid interior (as shown) or hydrophobic pockets.
above (glycosylation analysis, limited proteolysis), this example of cell surface biotinylation is amenable to the high-throughput analysis of IMP orientation. The remaining examples of chemical modification were applied to single proteins, allowing for more detailed studies of topology and/or binding-site architecture.
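For single-pass IMPs, the orientation assignment described above reduces to asking on which side of the lone TMH the biotinylated lysines fall. The following is a minimal sketch of that reasoning, assuming a strictly membrane-impermeable reagent and a known (or predicted) TMH span; the function name and residue positions are hypothetical.

```python
def n_terminus_side(labeled_lysine_positions, tmh_start, tmh_end):
    """Infer N-terminus location of a single-pass IMP (illustrative only).

    Assumes the labeling reagent only reaches extracellular lysines, so
    labels N-terminal to the TMH place the N-terminus outside the cell,
    and labels C-terminal to the TMH place the C-terminus outside.
    """
    before = [p for p in labeled_lysine_positions if p < tmh_start]
    after = [p for p in labeled_lysine_positions if p > tmh_end]
    if before and not after:
        return "extracellular"
    if after and not before:
        return "cytoplasmic"
    # No labels, or labels on both sides: data cannot decide.
    return "ambiguous"


# Hypothetical protein with a TMH spanning residues 45-67.
side = n_terminus_side([12, 30], tmh_start=45, tmh_end=67)
```

Labels on both sides of the TMH are returned as ambiguous rather than forced into an assignment, since they would contradict the impermeability assumption.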
7.2 Labeling solvent-exposed cysteines
Bear and colleagues [59] combined cysteine labeling with limited proteolysis for the topological characterization of a recombinantly expressed ClC-2 chloride ion
channel functionally reconstituted in liposomes. They labeled solvent-accessible cysteines with Alexa Fluor 488 maleimide, and then applied limited proteolysis (trypsin) to isolate the soluble domains, using sonication to give the enzyme access to the liposome interior. The soluble peptides were isolated for μLC-MS/MS (Q-TOF) analysis, and the membrane-embedded peptides were solubilized in formic acid for analysis by MALDI-MS/MS (Q-TOF). High sequence coverage from the limited proteolysis study allowed assignment of the boundaries of the N- and C-terminal domains, as well as several extra-membrane loops. Of the 16 cysteine residues in the protein, nine could be detected by MS, including six labeled residues from the soluble fraction, and one labeled Cys from the membrane fraction, indicating a position proximal to the membrane surface for the latter Cys residue. In total, their data confirmed and further refined the purported ClC-2 structure. It can also be instructive to see which cysteines may be involved in disulfide bonds, as this may provide information on proximal domains. Wang et al. [61] distinguished free cysteines from those involved in disulfide bonds of the chlamydial major outer membrane protein in native elementary body membranes by reacting free cysteines with 4-vinylpyridine, reducing remaining cysteines, and incubating with iodoacetamide. MALDI-TOF and μLC-MS/MS (Q-TOF) analysis identified several putative (carboxyamidomethylated) residues involved in disulfide bridges.
7.3 o-Nitrosylation of interfacial tyrosines
As previously mentioned, tyrosines occur with high frequency at the interfacial (polar head group) region of the bilayer, and are thus attractive targets for precisely defining TMH boundaries. Following up on their limited proteolysis study of the glycine receptor discussed above, Leite et al. [62] used the hydrophilic reagent tetranitromethane to modify tyrosines by o-nitrosylation. One potential advantage of using small chemical reagents is that they may not encounter the same steric hindrance as bulky proteases close to the membrane surface. The ability to label interfacial (and solvent-exposed) tyrosines, but not those in the hydrophobic core, has been confirmed for bacteriorhodopsin. The authors used LC-MS and LC-MS/MS (Q-Q-Q) to identify modified tyrosines, and, in agreement with limited proteolysis studies, found positive mass shifts for tyrosines previously thought to be buried within the lipid bilayer, indicating the existence of at least one TMD composed of unusually few amino acids.
7.4 Oxidation of solvent-exposed methionines
The hydrophilic oxidant chloramine T can convert accessible methionines to methionine sulfoxides, revealing extra-membrane residues. Using the polytopic anion transporter band 3 in native erythrocyte membranes as a model, Li et al. [63] exposed membrane preparations to oxidation, followed by solubilization, trypsin digestion, and LC-MS/MS analysis (IT). As expected, the authors identified a number of oxidation-sensitive methionines in regions previously characterized as
solvent exposed. One residue, Met 741, believed to reside in a TMH, was also found to be susceptible to oxidation, indicating localization to a soluble region. Interestingly, Met 741 was selectively protected from oxidation in the presence of known inhibitors of band 3, which fix the protein in an inactive outward- or inward-facing conformation (depending on the specific compound used). This result indicates that Met 741 may lie close to the interface region, being alternately solvent exposed or lipid-embedded depending on receptor conformation.
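In MS terms, conversion of a methionine to its sulfoxide appears as the addition of one oxygen atom (monoisotopic mass 15.9949 Da) per oxidized residue. A minimal sketch of how a measured peptide mass might be checked for this shift is given below; the function name, tolerance, and example masses are assumptions chosen for illustration, not values from the cited study.

```python
OXYGEN_MONOISOTOPIC = 15.994915  # Da, mass added per Met -> Met sulfoxide


def count_oxidations(unmodified_mass, observed_mass, tol_da=0.01):
    """Return the number of oxidations explaining a mass shift, or None.

    Both masses are neutral monoisotopic peptide masses in Da. The
    tolerance is an arbitrary illustrative value.
    """
    delta = observed_mass - unmodified_mass
    n = round(delta / OXYGEN_MONOISOTOPIC)
    if n >= 0 and abs(delta - n * OXYGEN_MONOISOTOPIC) <= tol_da:
        return n
    return None


# Hypothetical peptide masses.
single = count_oxidations(1000.0000, 1015.9949)
none_found = count_oxidations(1000.0000, 1008.0000)
```

A shift that is not a near-integer multiple of the oxygen mass returns `None`, signaling that something other than methionine oxidation is responsible.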
7.5 Labeling binding-site glutamates and arginines
Thus far, examples discussed have been geared towards characterization of IMP topology. However, residue-specific chemical labeling can also be used to investigate binding-site dynamics and uncover specific residue–residue or residue–ligand interactions, providing insight into transport mechanisms. In a series of experiments using membrane preparations of E. coli expressing the polytopic transporter lactose permease (LacY), Kaback and colleagues [64,65] characterized the chemical labeling susceptibility of specific glutamate and arginine residues in the sugar-binding pocket, both in the presence and absence of substrate and in response to specific mutations. Following glutamate labeling by carboxyl-reactive hydrophobic carbodiimides [64], or arginine labeling by guanidino-specific butane-2,3-dione, LacY was digested with CNBr/formic acid and analyzed by LC-MS/MS (Q-Q-Q or IT). To facilitate analysis, in one instance, a novel CNBr cleavage site was introduced to isolate a specific arginine of interest, excluding other potential labeling sites from that peptide [65]. The resulting labeling patterns revealed a variety of residue–ligand and residue–residue interactions, advancing a proposed charge-pairing rearrangement model that occurs upon substrate binding and translocation [65]. Weinglass et al. [66] also exploited hydrophobic carbodiimide labeling to help distinguish between two proposed structural models for the E. coli EmrE multidrug transporter: a homodimer and a tetramer composed of two heterodimers. EmrE has a single, highly conserved charged residue (Glu-14) that is located in a membrane-embedded domain and is required for catalysis. Inhibition of EmrE can be achieved via hydrophobic (but not hydrophilic) carbodiimide labeling, presumably at Glu-14; however, because previous studies had relied on functional readout for the entire complex, the stoichiometry of labeling was unknown.
Labeling of detergent-solubilized EmrE, CNBr/formic acid digest and LC-MS/MS analysis (IT) allowed for the specific identification of Glu-14 as the modified residue. Furthermore, the authors showed that nearly all Glu-14 residues undergo labeling, which could be blocked almost completely by substrate competition. These findings were key to distinguishing between the two structural models, as, in the tetramer model, the Glu-14 residues in each heterodimer are in strikingly different environments, one being solvent exposed and the other membrane-embedded, with only one pair in the tetramer involved in substrate binding. It follows that the two types of glutamates would have equally different labeling profiles, which is not the case. Near-complete labeling (and equal blocking by substrate) suggests that the homodimer conformation is
indeed correct, as it proposes nearly equivalent environments for the Glu-14 residues. Thus, the use of MS analysis allowed both the identification of the labeled residue and the determination of percent labeling, providing a more detailed structural picture than could be obtained by previous methods.
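The determination of percent labeling can be illustrated with a simple ratio of ion intensities for the labeled and unlabeled forms of the same peptide. This is a rough sketch only: it assumes the two forms ionize and are detected with equal efficiency, which need not hold in practice, and the function name and intensity values are hypothetical.

```python
def percent_labeled(labeled_intensity, unlabeled_intensity):
    """Estimate percent labeling from paired peptide ion intensities.

    Assumes equal response for labeled and unlabeled peptide forms.
    """
    total = labeled_intensity + unlabeled_intensity
    if total <= 0:
        raise ValueError("at least one intensity must be positive")
    return 100.0 * labeled_intensity / total


# Hypothetical intensities: labeling alone, then with competing substrate.
without_substrate = percent_labeled(95.0, 5.0)
with_substrate = percent_labeled(5.0, 95.0)
```

Comparing the two values mirrors the argument in the text: near-complete labeling that is almost fully blocked by substrate points to equivalent environments for the labeled residues.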
8. PHOTOAFFINITY LABELING OF BINDING-SITE RESIDUES
Photoaffinity reagents are typically natural substrate analogues with an added photoreactive group (e.g., diazirine, benzophenone, or aryl azide) and reporter tag (e.g., radioactive, fluorescent, immunoreactive, or affinity). Upon irradiation with long-wave UV light, a reactive intermediate (a carbene from diazirine, a diradical from benzophenone, or a nitrene from aryl azide groups) is generated, capable of incorporation into a variety of bonds (O–H, C=C, C–H), with carbenes being more reactive and less selective than nitrenes. Thus, unlike modification of reactive side chains discussed in the previous section, photoaffinity labeling is not nearly as dependent upon the presence of specific residues (for review see ref. [67]). Radiolabeled reporters are most frequently employed as a readout for photoaffinity agents. In a typical experiment, light-induced protein labeling is followed by partial digestion (e.g., CNBr), SDS-PAGE separation, detection by autoradiography, and band excision/N-terminal sequencing or MALDI-TOF MS for identification. Using this approach, sites of modification are generally only localized to large peptides. Using MS for peptide identification rather than amino acid sequencing is generally preferred for IMPs, as it requires less material and is not reliant upon efficient digestion [68]. If MS is used for both detection and identification, then proteins can be more extensively digested (e.g., trypsin, chymotrypsin) and analyzed directly by MALDI-TOF or μLC-MS/MS, allowing more precise determination of label incorporation. Moreover, no additional reporter group on the probe is necessary, as the mass shift induced by covalent labeling serves as a reporter tag; however, use of an isotopic signature may be of significant benefit, as it serves to distinguish labeled from unlabeled peptide ions during MS. For example, Sachon et al.
[69] used an equimolar mixture of deuterated and non-deuterated photoaffinity probes, which allowed modified species to be characterized by doublet peaks in the full mass spectra. Because photoaffinity reactions can be low yielding, the population of labeled species is typically greatly diluted by unlabeled peptides. As such, Sachon et al. found their characterization efforts facilitated by using benzophenone substrate analogues bearing a biotin affinity tag for streptavidin enrichment. Labeled peptides bound to the solid-phase resin were analyzed directly by MALDI-TOF, a technique also used by Becker and colleagues [68]. Lamos et al. [70] report a similar strategy using mixed isotopic (hydrogen/deuterium; H/D) biotinylated benzophenone probes. After photolabeling, proteins were affinity purified, digested on the solid support, eluted, and analyzed by LC-MS/MS to identify (1) unlabeled peptides from affinity-purified proteins via MS/MS sequencing and (2) labeled peptides via isotopic shift in the full MS, using the protein list from the unlabeled fraction to
validate labeled peptide assignments [70]. Splitting a single peptide into two peaks via differential isotopic labeling does reduce signal intensity; however, that can be offset by an increased number of identified labeling events. It is also worth noting that the use of a deuterium label gives peptides with slightly different LC retention times, so ¹³C or ¹⁸O might be better choices [71]. Even without isotope signatures and affinity purification, MALDI-TOF analysis can successfully identify photolabeled peptides from recombinantly expressed IMPs. Chiba and colleagues applied photoaffinity labeling using benzophenone-derivatized substrate analogues to model the substrate-binding domains of the multidrug transporters P-glycoprotein [72] and LmrA [73]. Recombinantly expressed transporters in insect cell/E. coli membrane preparations were subjected to photolabeling, SDS-PAGE, and in-gel digest (chymotrypsin), and then analyzed by MALDI-TOF. Labeled peptides from several TMHs for each protein were identified, suggesting potential binding-site architectures for both transporters. In a similar study using a detergent-solubilized polytopic IMP (as opposed to a membrane preparation), Wu et al. [74] mapped the substrate-binding sites of the recombinantly expressed human multidrug resistance protein 1 (ABCC1) using the endogenous substrate leukotriene C4, which has intrinsic photoactive properties due to its conjugated triene structure. After SDS-PAGE and in-gel digest (trypsin, chymotrypsin, V8), several potential probe-labeled peptides were identified, indicating a substrate-binding site bounded by at least four TMHs and a cytoplasmic loop. Individual conformations of receptor-binding sites can also be mapped using photoaffinity labeling, provided that the photoaffinity reaction occurs on a shorter timescale than structural rearrangement [75]. In a study by Leite et al.
[76] hydrophobic diazirine and benzophenone probes (Figure 4) were used to interrogate the topology of the nicotinic acetylcholine receptor α1 subunit in the open, closed, and desensitized states. Using μLC-MS/MS (IT) to analyze Glu-C peptide digests, Leite et al. were able to pinpoint sites of probe modification site-specifically or to within a few residues, identifying hydrophobic associations that changed depending on the conductance state. Sites of probe incorporation included lipid-exposed faces of TMHs, protein segments in contact with the membrane face, and hydrophobic binding pockets. However, several known integral membrane domains were not observed, likely due to peptide loss prior to or during μLC separation, as Glu-C digestion may result in rather long TMH peptides. Their ability to assign sites of labeling using μLC-MS/MS is quite rare among photoaffinity studies of membrane proteins. Interestingly, their labeling efficiency (up to 65%) [76] is significantly higher than most reports. Efficiency of photolabel incorporation is generally quite low [75], on the order of ~1% or less by some accounts [77–79]. For comparison, Leite et al. [76] report the labeling of synthetic peptides in vitro with ~10% yield, which is in accordance with similar experiments conducted in our lab (50 μM peptide labeled with 10-fold excess hydrophobic diazirine probe gives up to 10% labeling; 50 μM peptide labeled with 10-fold excess benzophenone gives up to 35% labeling, unpublished results). Thus, relatively high yielding reactions can be achieved under certain optimized circumstances, and the success of Leite et al. may be due to a particular set of
experimental conditions that allowed for extremely high labeling efficiency. Such conditions may be unusual, as similar results have yet to be reported since their publication in 2003. Owing to the low efficiency of most photolabeling reactions and the random nature of incorporation, labeled peptides will likely be a very heterogeneous mixture of species making up only a minor fraction of total peptides. As a result, MALDI-TOF analysis has a slight edge over LC-MS/MS using standard MS instrumentation (Figure 5B). In MALDI, all modified forms of a particular peptide will appear as one peak; however, if one were to fragment those species en masse, the spectrum would likely be too complicated to deconvolute given the presence of multiple sites of labeling; thus, the most useful information comes from the parent mass. Given the isomeric nature of the labeled peptides, it may also be quite difficult to fully resolve such a mixture by chromatographic means,
Figure 5 Comparison between (A) residue-specific and (B) photoaffinity labeling. Analysis by MALDI-TOF gives isotopic resolution of a single species in both cases, although intensity is expected to be much lower for the photolabeled peptide ion due to generally low-yielding reactions (note intensity scales). If species are separated by μLC, then the isomeric photolabeled species will separate to some extent, diluting signal intensity. If μLC is interfaced with a low-resolution IT/LIT analyzer, then only unit m/z will be resolved in the MS spectrum.
leading to overlapping peaks and multiple species in each MS/MS spectrum, which can be extremely difficult to deconvolute. Additionally, a chromatographic separation means that multiple labeled forms of the same peptide will be spread out over the LC trace instead of collapsing to a single parent peak (as in direct spotting on a MALDI plate), reducing the chances of obtaining even a parent mass for identification. However, even with the advantages of MALDI, labeled peptide identification can still be an extremely challenging task. For example, Vaughan et al. [80] characterized the dopamine transporter (DAT) ligand-binding site using radioactive and non-radioactive analogues of uptake blockers (e.g., cocaine). They report ~1 fmol of labeled peptide per one hundred 15 cm² plates of DAT-expressing HEK cells, a quantity readily analyzed by analytical SDS-PAGE/autoradiography, but not by standard LC-MALDI (or, indeed, LC-MS/MS). If the photolabeling reaction were routinely higher yielding, then the multiple sites of labeling would present less of an obstacle for identification and sequencing; however, such experimental conditions seem to be exceedingly rare. In contrast, the other types of covalent reagents (e.g., electrophilic) discussed above are both much higher yielding and site specific, making MS analysis more straightforward, as samples are less complex and sites of labeling more easily inferred (Figure 5). However, their site-specific incorporation can also be a hindrance; because labeling is restricted by the presence of specific residues, detailed information regarding potential labeling sites is likewise restricted. Because photolabeling is significantly less site-specific, if the technology could be advanced such that sites of probe incorporation could be routinely identified by MS/MS, it could be a powerful strategy for elucidating binding-site architecture and membrane topology.
As such, photoaffinity experiments should be greatly aided by (1) isotopic labeling of photoaffinity probes, as discussed above; (2) the affinity purification of labeled peptides prior to analysis, which has been demonstrated in a few cases (see above); however, appendage of large affinity reagents can compromise binding affinity and change the hydropathic nature of the probe, so alternate strategies for affinity purification [81] will need to be developed/applied; (3) optimized μLC separation; (4) use of higher resolution/mass accuracy MS/MS instrumentation (e.g., FT-ICR, Orbitrap), for increased peak capacity and resolution of isomeric species; and (5) analysis software that allows deconvolution of MS/MS spectra containing multiple species. Additionally, if non-productive probe fragmentation compromises sequencing, alternative fragmentation techniques (ECD or ETD) could provide better MS/MS spectra.
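The isotopic-signature strategy discussed in this section, in which a mixture of light and deuterated probes produces doublet peaks in the full mass spectrum, lends itself to a simple peak-pairing search. The sketch below is illustrative only: the function name, tolerances, and peak list are hypothetical, the number of deuterium atoms on the heavy probe must be supplied by the user, and real spectra would first require deisotoping and charge-state assignment.

```python
DEUTERIUM_SHIFT = 1.006277  # Da, mass difference between 2H and 1H


def find_doublets(peaks, n_deuterium, charge=1, mz_tol=0.01, max_ratio=1.5):
    """Find peak pairs spaced as expected for light/heavy probe labeling.

    peaks: list of (m/z, intensity) tuples.
    max_ratio: largest tolerated intensity ratio between partners, since
        an equimolar probe mixture should give roughly equal intensities.
    """
    spacing = n_deuterium * DEUTERIUM_SHIFT / charge
    ordered = sorted(peaks)
    doublets = []
    for i, (mz_light, int_light) in enumerate(ordered):
        for mz_heavy, int_heavy in ordered[i + 1:]:
            if abs((mz_heavy - mz_light) - spacing) > mz_tol:
                continue
            low, high = sorted((int_light, int_heavy))
            if low > 0 and high / low <= max_ratio:
                doublets.append((mz_light, mz_heavy))
    return doublets


# Hypothetical singly charged peak list; heavy probe assumed to carry 4 D.
peaks = [(1000.50, 200.0), (1004.525, 190.0), (1200.30, 500.0), (1201.30, 100.0)]
pairs = find_doublets(peaks, n_deuterium=4)
```

Only the 1000.50/1004.525 pair satisfies both the spacing and the intensity-ratio criteria in this toy spectrum; unlabeled peptide ions, lacking a partner, are ignored.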
9. CROSS-LINKING
Chemical cross-linking involves covalent bond formation between different regions of a molecule (intramolecular) or between different molecules (intermolecular) using bifunctional reagents generally targeted towards Lys and/or Cys (one example shown in Figure 4). Localization of cross-linking sites can provide information regarding the spatial organization of monomeric and multisubunit proteins, which can be used to refine IMP structure prediction
algorithms [82,83]; however, cross-linking is limited by the number and relative location of accessible reactive residues. Following enzymatic digestion, cross-linked products can be analyzed by MS methods. To facilitate identification, cross-linkers can employ isotope labels [84,85] or functional groups that give specific fragmentation patterns in MS/MS spectra [86]. Application of the high resolution/mass accuracy FT-ICR allows unambiguous identification of cross-linked products by parent mass alone, which can be quite beneficial given the potentially complex nature of MS/MS spectra [33,87]. Cross-linking experiments involving IMPs are exceedingly rare, and the work by Jacobsen et al. [87] on bovine rhodopsin reveals some of the potential reasons cross-linking has not been more widely employed. They used lysine- and cysteine-reactive bifunctional reagents to cross-link rhodopsin in native membrane preparations, followed by CNBr digestion and FT-ICR analysis. The authors note that the cross-linking experiment required significant optimization of (among other parameters) separation of monomeric species to isolate intra-protein cross-links, proteolysis, fragmentation of parent ions, and data analysis software. Identification of cross-linked products was compromised by the small amount of cross-linked product relative to unmodified peptides, as well as large peptide size, a function of CNBr cleavage. Double digestion strategies employing trypsin in addition to CNBr were unsuccessful due to significant missed cleavages, leading to a more complex mixture of low-abundance species. In cases where peptides contained multiple potentially reactive residues, MS/MS spectra were obtained to localize cross-linking sites. In addition to CID, ECD was employed in several instances to generate better fragmentation patterns.
Their results suggest that residue side chains have considerable range of motion in the timescale of the cross-linking reaction, and indicate backbone regions with enhanced flexibility. Jacobsen et al. make the important point that cross-linking of two residues only implies that, at some point during the course of the reaction, their side chains were in sufficient proximity for cross-linking to occur, but it does not imply that such proximity is necessarily common or energetically favorable [87].
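Identification of cross-linked products by parent mass alone amounts to asking which pair of peptides, plus the mass contributed by the linker, matches an observed mass within the instrument's accuracy. The following is a minimal sketch of such a search; the peptide names and masses, the linker mass increment, and the ppm tolerance are all hypothetical values chosen for illustration.

```python
from itertools import combinations_with_replacement


def crosslink_candidates(peptide_masses, linker_delta, observed_mass, tol_ppm=2.0):
    """List peptide pairs whose cross-linked mass matches an observed mass.

    peptide_masses: dict mapping a peptide name to its neutral
        monoisotopic mass in Da.
    linker_delta: net mass added by the cross-linker bridge, in Da.
    Returns (name_a, name_b, error_ppm) tuples.
    """
    hits = []
    items = sorted(peptide_masses.items())
    # Pairs with replacement, so a peptide linked to a second copy of
    # itself (e.g., in a homo-oligomer) is also considered.
    for (name_a, mass_a), (name_b, mass_b) in combinations_with_replacement(items, 2):
        theoretical = mass_a + mass_b + linker_delta
        error_ppm = (observed_mass - theoretical) / theoretical * 1e6
        if abs(error_ppm) <= tol_ppm:
            hits.append((name_a, name_b, error_ppm))
    return hits


# Hypothetical peptide masses and an illustrative linker increment.
masses = {"P1": 1000.5000, "P2": 1500.7500, "P3": 2000.0000}
hits = crosslink_candidates(masses, linker_delta=138.0681, observed_mass=2639.3181)
```

With a tight ppm tolerance, only the P1/P2 pair explains the observed mass in this toy example. Widening the tolerance, as would be necessary on a lower-accuracy instrument, quickly increases the number of candidate pairs, which is why high mass accuracy is so valuable here.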
10. H/D EXCHANGE
Amide H/D exchange is useful for the characterization of protein structure, dynamics, and protein–ligand interactions [88,89]. The rate of exchange depends in large part on the hydrogen exposure to the solvent and degree of hydrogen bonding, with solvent-exposed atoms exchanging more readily than inaccessible atoms. Native protein amide H/D exchange rates can vary by up to eight orders of magnitude, from a few milliseconds to days or longer. For proteins embedded in the membrane, H/D exchange is a function of solvent permeation in different regions of the bilayer as well as hydrogen bonding and tertiary/quaternary structure steric effects (Figure 6) [90]. Exchange kinetics can be measured by several methods, the two most widely used being nuclear magnetic resonance (NMR) and MS. NMR is limited to proteins less than ~50 kDa and requires
Figure 6 Hypothetical deuterium incorporation into backbone amides of an IMP. Degree of labeling should be highest in unstructured and semi-structured solvent-exposed domains. After quenching in a low temperature/low pH buffer, protein is digested and analyzed by MS to determine deuterium incorporation.
sufficient amounts of purified protein. Solubility can also be problematic for IMPs; however, NMR does allow for analysis of exchange at the atomic level, providing high spatial resolution (for review see ref. [89]). Like limited proteolysis and photoaffinity labeling, H/D exchange is not dependent upon the presence of specific residues. Consequently, H/D exchange experiments are potentially able to provide highly detailed maps of lipid-occluded TMHs, ligand-binding sites, and other solvent-inaccessible domains. In a typical H/D exchange experiment using an MS platform, a protein of interest is exposed to a deuterated solvent for various times to allow on-exchange, and then immediately quenched with a low pH/low temperature buffer to significantly slow (but not entirely arrest) further on-exchange and back-exchange. The protein is then denatured (e.g., with urea) and digested with an acid-stable protease (pepsin), and the peptides are separated and analyzed by LC-MS, the whole process (quench to MS) taking 30 min or less [91]. Deuterium enrichment can be calculated from analysis of the isotopic distribution for each peptide as a function of time, giving a profile for the overall solvent accessibility of the amide hydrogens. MS methods are generally only able to look at H/D exchange at the peptide rather than residue level, as CID may significantly scramble H and D atoms [92]. However, use of ECD in FT-ICR MS may not be plagued by the same problem due to the lower energy of the fragmentation process [30]. Given the extremely limited time for chromatographic separation before loss of the deuterium label, a high-resolution MS instrument is also beneficial for increasing peak capacity. Chalmers et al. [91] report identification of far more peptide ions (masses determined to within hundreds of ppb) and successful deconvolution of isotopic envelopes of overlapping peaks using an FT-ICR.
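As a minimal sketch of how deuterium enrichment might be computed from a peptide's isotopic distribution, the Python below takes the intensity-weighted centroid of each isotopic envelope and converts the centroid shift into a number of incorporated deuterons. It assumes that an undeuterated and a fully deuterated control are available to correct for back-exchange, a commonly used normalization that the text above does not describe; the function names, the envelopes, and the number of exchangeable amides are hypothetical.

```python
def centroid(mz, intensity):
    """Intensity-weighted average m/z of an isotopic envelope."""
    total = sum(intensity)
    if total <= 0:
        raise ValueError("envelope has no signal")
    return sum(m * i for m, i in zip(mz, intensity)) / total


def deuterium_uptake(m_t, m_0, m_full, n_amides):
    """Back-exchange-corrected deuterons incorporated at one time point.

    m_t      -- centroid of the peptide after on-exchange for time t
    m_0      -- centroid of the undeuterated control
    m_full   -- centroid of the fully deuterated control (assumed available)
    n_amides -- number of exchangeable backbone amides in the peptide

    All centroids must refer to the same charge state, so the charge
    cancels in the ratio.
    """
    return (m_t - m_0) / (m_full - m_0) * n_amides


# Hypothetical two-peak envelopes, chosen only to keep the arithmetic simple.
m_0 = centroid([500.0, 501.0], [1.0, 1.0])     # undeuterated control
m_full = centroid([502.0, 503.0], [1.0, 1.0])  # fully deuterated control
time_course = {                                # on-exchange time (s) -> envelope
    30: ([500.5, 501.5], [1.0, 1.0]),
    300: ([501.0, 502.0], [1.0, 1.0]),
}

uptake = {
    t: deuterium_uptake(centroid(mz, inten), m_0, m_full, n_amides=8)
    for t, (mz, inten) in time_course.items()
}
print(uptake)
```

Plotting such uptake values against on-exchange time for each peptide gives the solvent-accessibility profile referred to above.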
So far, no successful H/D exchange experiment has been reported for an IMP in a native membrane, although there are a few examples (all from the same group) where conformational changes have been investigated using IMP–detergent and IMP–phospholipid complexes. Armstrong and colleagues
Anna E. Speers and Christine C. Wu
[93] reported conformational analysis of the microsomal glutathione transferase 1 (MGT1), a homotrimeric IMP with four-helix bundle membrane-spanning domains in each subunit, using a Q-Q-Q instrument for MS analysis. Regions of fast H/D exchange for a Triton X-100–MGT1 complex were found to correlate well with solvent-exposed domains in the protein crystal structure, while TMHs largely exhibited slower exchange kinetics. Incubation with glutathione resulted in a substantial reduction in solvent accessibility for a region in the cytosolic domain, indicative of a binding site. There was also evidence that TMHs are involved in the conformational change that occurs upon binding. In an effort to assess the extent to which characterization of a detergent-solubilized complex is reflective of biology in a membrane, H/D exchange was carried out with a catalytically active protein–phospholipid complex. Based on similar kinetics profiles, the authors concluded that the detergent and phospholipid complexes were structurally quite similar, with major differences mostly localized to the TMH ends, in contact with phospholipid/detergent head groups [93]. Recently, Busenlehner et al. [94] used a footprinting technique to more precisely define peptides involved in the GSH and other substrate-binding sites of MGT1. After deuterium on-exchange, ligand is added to complex with the protein, which is then diluted with non-deuterated water to allow back-exchange before quenching. The end result should be retention of the deuterium label selectively within the ligand-binding site. They identified what appear to be distinct binding sites for fatty acids/phospholipids and other hydrophobic substrates, and regions of the protein potentially involved in conformational transitions. The same group [95] has also applied H/D exchange MS to catalytically active, detergent-solubilized cytochrome c oxidase (CcO), a redox-driven proton pump.
They identified several peptides potentially involved in conformational changes of the gating mechanism that allows alternate access to each side of the membrane. It should be noted that crystal structures exist for both MGT1 and CcO, which no doubt greatly facilitated interpretation of H/D data. In turn, information from H/D exchange was able to uncover dynamic features not revealed by X-ray crystallography. Given the paucity of literature on the subject, it is unclear whether H/D exchange is generally applicable to IMPs or if MGT1 and CcO belong to a small pool of well-behaved IMPs amenable to rapid low pH digestion and peptide separation/analysis. Additionally, high throughput of complex samples may be quite difficult due to analysis time constraints, which are necessary to avoid loss of the isotopic label. Hopefully further advances in H/D exchange methodology will allow for more widespread application to the study of IMP structure and topology.
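The footprinting readout described above for MGT1 (deuterium retained selectively where ligand protects against back-exchange) can be illustrated with a simple per-peptide comparison. The sketch below is hypothetical throughout: it assumes a parallel control without ligand, and the peptide labels, retained-deuteron values, and cutoff are invented for illustration rather than taken from refs. [94,95].

```python
def protected_peptides(retained_with_ligand, retained_without_ligand,
                       min_difference=1.0):
    """Flag peptides that retain more deuterium when ligand is bound.

    Both arguments map a peptide label to the number of deuterons still
    present after back-exchange. min_difference is an assumed cutoff in
    deuterons; peptides missing from the control are skipped.
    """
    flagged = {}
    for peptide, with_ligand in retained_with_ligand.items():
        without_ligand = retained_without_ligand.get(peptide)
        if without_ligand is None:
            continue
        difference = with_ligand - without_ligand
        if difference >= min_difference:
            flagged[peptide] = difference
    return flagged


# Hypothetical peptides (residue ranges) and retained deuterons.
with_ligand = {"res_10-25": 3.5, "res_40-52": 0.5, "res_90-104": 1.0}
without_ligand = {"res_10-25": 1.0, "res_40-52": 0.5, "res_90-104": 0.75}

flagged = protected_peptides(with_ligand, without_ligand)
print(flagged)
```

In practice the cutoff would need to reflect replicate variability, and a flagged peptide may report either direct ligand contact or a ligand-induced conformational change.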
11. SUMMARY AND FUTURE DIRECTIONS
Over the past ~20 years, traditional biochemical protein characterization assays, relying on ultraviolet absorbance, fluorescent, or radioactive readout, have been interfaced with or replaced by MS detection. On the heels of the genomic revolution, this paradigm shift has allowed for the characterization of protein
expression and sequence analysis on a global scale. In contrast, MS characterization of IMP structure/topology on a proteome-wide level is still in relative infancy. As mentioned previously, the three main advantages of an MS platform are high throughput, sensitivity, and high information content (i.e., accurate mass determination and sequencing of native/covalently modified peptides). However, detailed IMP structural/topological determination does not easily lend itself to high-throughput analysis: many traditional techniques rely on assaying the interaction between a protein and a specific ligand (activator, inhibitor, cofactor) or on systematic mutations. Additionally, due to the generally low abundance and difficult biophysical properties of IMPs, obtaining enough material may require painstaking optimization of a recombinant expression system and functional characterization. Finally, while data collection may be amenable to high throughput, interpretation of results for reconstruction of a structural/topological model may require significantly more time. Indeed, the global topology assays applied so far (identification of glycosylation sites, cell surface biotinylation, limited proteolysis) have yielded only low-resolution data on IMP orientation. In terms of the second parameter, sensitivity, routine biochemical readouts (radiography, fluorescence) have sensitivities (if not resolution) rivaling or exceeding that of standard MS instrumentation. Thus, the principal remaining factor motivating MS implementation is high information content, which may explain the lag in adoption of the technology for detailed structural/topological characterization. However, with the ongoing development of instrumentation capable of even higher mass accuracy, resolution, sensitivity, and fragmentation abilities, hopefully the next few years will see high-end instrumentation applied in earnest to the problem of solving IMP structure and topology.
In particular, advances in non-sequence-specific limited proteolysis, photoaffinity labeling, and H/D exchange, all potentially high-resolution techniques, at least relative to methods relying on residue-specific modification (Table 2), could be particularly revolutionary. Progress should be focused on increasing sequence coverage, localization of covalent modifications, and higher throughput analysis/automated data interpretation to allow for full structural and topological characterization of native, endogenously expressed IMPs on a global scale.

Table 2 Summary of MS-based IMP structure/topology characterization techniques

Technique                                 High throughput            Monitor dynamics
Low-resolution techniques (a)
  Locating glycosylation sites            Yes                        No
  Residue-specific chemical modification  Yes                        Yes
  Cross-linking                           Potential, but difficult   Potential
Moderate-resolution techniques (b)
  Limited proteolysis                     Yes                        Potential
  Photoaffinity labeling                  Potential                  Yes
  H/D exchange                            Potential, but difficult   Yes

(a) Low resolution: characterization dependent upon presence/location of specific residues.
(b) Moderate resolution: characterization largely independent of residue identity, enhanced ability to map TMH edges and binding-site surfaces.
ABBREVIATIONS
aq: Aqueous
CcO: Cytochrome c oxidase
CID: Collision-induced dissociation
CNBr: Cyanogen bromide
DAT: Dopamine transporter
DTT: Dithiothreitol
ECD: Electron capture dissociation
ER: Endoplasmic reticulum
ESI: Electrospray ionization
ETD: Electron transfer dissociation
FA: Formic acid
FT-ICR: Fourier transform-ion cyclotron resonance
GPCR: G-protein-coupled receptor
H/D: Hydrogen/deuterium
hppK: High pH-proteinase K
IEF: Isoelectric focusing
IMAC: Immobilized metal ion affinity chromatography
IMP: Integral membrane protein, specifically α-helical
IT: Ion trap
LacY: Lactose permease
μLC: Microcapillary liquid chromatography, RP unless otherwise specified
LC: Liquid chromatography
LIT: Linear ion trap
MALDI: Matrix-assisted laser desorption/ionization
MeOH: Methanol
MGT1: Microsomal glutathione transferase 1
MS: Mass spectrometry
MS/MS: Tandem mass spectrometry
MudPIT: Multidimensional protein identification technology
NHS: N-hydroxysuccinimide
PAGE: Polyacrylamide gel electrophoresis
PNGase F: Peptide-N-glycosidase F
ppb: Parts per billion
ppt: Precipitate
PTM: Post-translational modification
Q: Quadrupole
RP: Reversed phase
SCX: Strong cation exchange
SDS: Sodium dodecyl sulfate
TMD: Transmembrane domain
TMH: Transmembrane α-helix
TOF: Time-of-flight
UV: Ultraviolet
ACKNOWLEDGEMENT
Financial support was provided by NIH Grants DA021744 and AA016171.
REFERENCES 1 C.M. Ott and V.R. Lingappa, Integral membrane protein biosynthesis: Why topology is hard to predict, J. Cell Sci., 115(Pt 10) (2002) 2003–2009. 2 J. Torres, T.J. Stevens and M. Samso, Membrane proteins: The ‘Wild West’ of structural biology, Trends Biochem. Sci., 28(3) (2003) 137–144. 3 C. Zhou, Y. Zheng and Y. Zhou, Structure prediction of membrane proteins, Genomics Proteomics Bioinformatics, 2(1) (2004) 1–5. 4 A. Elofsson and G. von Heijne, Membrane protein structure: Prediction vs reality, Annu. Rev. Biochem., 76 (2007) 125–140. 5 E. Wallin and G. von Heijne, Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms, Protein Sci., 7(4) (1998) 1029–1038. 6 A. Krogh, B. Larsson, G. von Heijne and E.L. Sonnhammer, Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes, J. Mol. Biol., 305(3) (2001) 567–580. 7 N. Hurwitz, M. Pellegrini-Calace and D.T. Jones, Towards genome-scale structure prediction for transmembrane proteins, Philos. Trans. R. Soc. Lond., B, Biol. Sci., 361(1467) (2006) 465–475. 8 J.U. Bowie, Helix packing in membrane proteins, J. Mol. Biol., 272(5) (1997) 780–789. 9 H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov and P.E. Bourne, The protein data bank, Nucleic Acids Res., 28(1) (2000) 235–242. 10 S.H. White, The progress of membrane protein structure determination, Protein Sci., 13(7) (2004) 1948–1949. 11 M. Punta, L.R. Forrest, H. Bigelow, A. Kernytsky, J. Liu and B. Rost, Membrane protein prediction methods, Methods, 41(4) (2007) 460–474. 12 M.B. Ulmschneider, M.S. Sansom and A. Di Nola, Properties of integral membrane protein structures: Derivation of an implicit membrane potential, Proteins, 59(2) (2005) 252–265. 13 G. von Heijne and Y. Gavel, Topogenic signals in integral membrane proteins, Eur. J. Biochem., 174(4) (1988) 671–678. 14 J. Nilsson, B. Persson and G. 
von Heijne, Comparative analysis of amino acid distributions in integral membrane proteins from 107 genomes, Proteins, 60(4) (2005) 606–616. 15 J.U. Bowie, Helix-bundle membrane protein fold templates, Protein Sci., 8(12) (1999) 2711–2719. 16 C.P. Chen and B. Rost, Long membrane helices and short loops predicted less accurately, Protein Sci., 11(12) (2002) 2766–2773. 17 R.P. Riek, I. Rigoutsos, J. Novotny and R.M. Graham, Non-alpha-helical elements modulate polytopic membrane protein architecture, J. Mol. Biol., 306(2) (2001) 349–362. 18 P.L. Yeagle, M. Bennett, V. Lemaitre and A. Watts, Transmembrane helices of membrane proteins may flex to satisfy hydrophobic mismatch, Biochem. Biophys. Acta, 1768(3) (2007) 530–537. 19 E. Screpanti and C. Hunte, Discontinuous membrane helices in transport proteins and their correlation with function, J. Struct. Biol., 159(2) (2007) 261–267. 20 J.M. Cuthbertson, D.A. Doyle and M.S. Sansom, Transmembrane helix prediction: A comparative evaluation and analysis, Protein Eng. Des. Sel., 18(6) (2005) 295–308.
21 H. Viklund, E. Granseth and A. Elofsson, Structural classification and prediction of reentrant regions in alpha-helical transmembrane proteins: Application to complete genomes, J. Mol. Biol., 361(3) (2006) 591–603. 22 M. Rapp, E. Granseth, S. Seppala and G. von Heijne, Identification and evolution of dual-topology membrane proteins, Nat. Struct. Mol. Biol., 13(2) (2006) 112–116. 23 G. Gafvelin and G. von Heijne, Topological ‘‘frustration’’ in multispanning E. coli inner membrane proteins, Cell, 77(3) (1994) 401–412. 24 G. von Heijne, Membrane-protein topology, Nat. Rev. Mol. Cell Biol., 7(12) (2006) 909–918. 25 G. von Heijne, Principles of membrane protein assembly and structure, Prog. Biophys. Mol. Biol., 66(2) (1996) 113–139. 26 A.B. Weinglass, J.P. Whitelegge and H.R. Kaback, Integrating mass spectrometry into membrane protein drug discovery, Curr. Opin. Drug Discov. Devel., 7(5) (2004) 589–599. 27 M. Karas and F. Hillenkamp, Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons, Anal. Chem., 60(20) (1988) 2299–2301. 28 J.B. Fenn, M. Mann, C.K. Meng, S.F. Wong and C.M. Whitehouse, Electrospray ionization for mass spectrometry of large biomolecules, Science, 246(4926) (1989) 64–71. 29 B. Domon and R. Aebersold, Mass spectrometry and protein analysis, Science, 312(5771) (2006) 212–217. 30 R.A. Zubarev, N.L. Kelleher and F.W. McLafferty, Electron capture dissociation of multiply charged protein cations. A nonergodic process, J. Am. Chem. Soc., 120(13) (1998) 3265–3266. 31 J.E.P. Syka, J.J. Coon, M.J. Schroeder, J. Shabanowitz and D.F. Hunt, Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry, Proc. Natl. Acad. Sci. USA, 101(26) (2004) 9528–9533. 32 B. Bogdanov and R.D. Smith, Proteomics by FTICR mass spectrometry: Top down and bottom up, Mass Spectrom. Rev., 24(2) (2005) 168–200. 33 J. Borch, T.J. Jorgensen and P. Roepstorff, Mass spectrometric analysis of protein interactions, Curr. Opin. 
Chem. Biol., 9(5) (2005) 509–516. 34 J.R. Yates, 3rd, Mass spectrometry and the age of the proteome, J. Mass Spectrom., 33(1) (1998) 1–19. 35 M.P. Washburn, D. Wolters and J.R. Yates, 3rd, Large-scale analysis of the yeast proteome by multidimensional protein identification technology, Nat. Biotechnol., 19(3) (2001) 242–247. 36 D.A. Wolters, M.P. Washburn and J.R. Yates, 3rd, An automated multidimensional protein identification technology for shotgun proteomics, Anal. Chem., 73(23) (2001) 5683–5690. 37 N. Zhang, N. Li and L. Li, Liquid chromatography MALDI MS/MS for membrane proteome analysis, J. Proteome Res., 3(4) (2004) 719–727. 38 H. Zischka, C.J. Gloeckner, C. Klein, S. Willmann, M. Swiatek-de Lange and M. Ueffing, Improved mass spectrometric identification of gel-separated hydrophobic membrane proteins after sodium dodecyl sulfate removal by ion-pair extraction, Proteomics, 4(12) (2004) 3776–3782. 39 R.R. Loo, N. Dales and P.C. Andrews, Surfactant effects on protein structure examined by electrospray ionization mass spectrometry, Protein Sci., 3(11) (1994) 1975–1983. 40 A.E. Speers and C.C. Wu, Proteomics of integral membrane proteins — theory and application, Chem. Rev., 107(8) (2007) 3687–3714. 41 F. Fischer, D. Wolters, M. Rogner and A. Poetsch, Toward the complete membrane proteome: High coverage of integral membrane proteins through transmembrane peptide detection, Mol. Cell Proteomics, 5(3) (2006) 444–453. 42 J. Zhou, T. Zhou, R. Cao, Z. Liu, J. Shen, P. Chen, X. Wang and S. Liang, Evaluation of the application of sodium deoxycholate to proteomic analysis of rat hippocampal plasma membrane, J. Proteome Res., 5(10) (2006) 2547–2553. 43 A.V. Vener and P. Stralfors, Vectorial proteomics, IUBMB Life, 57(6) (2005) 433–440. 44 A.R. Blackler, A.E. Speers, M.S. Ladisnky and C.C. Wu, A shotgun proteomic method for the identification of membrane-embedded proteins and peptides. J. Proteome Res., in press (2008). 45 A.E. Speers, A.R. Blackler and C.C. 
Wu, Shotgun analysis of integral membrane proteins facilitated by elevated temperature, Anal. Chem., 79(12) (2007) 4613–4620. 46 L.R. Snyder, J.J. Kirkland and J.L. Glajch, Practical HPLC Method Development, 2nd ed., Wiley, New York, 1997, pp. 497–509.
47 H. Kaji, J.I. Kamiie, H. Kawakami, K. Kido, Y. Yamauchi, T. Shinkawa, M. Taoka, N. Takahashi and T. Isobe, Proteomics reveals N-linked glycoprotein diversity in Caenorhabditis elegans and suggests an atypical translocation mechanism for integral membrane proteins, Mol. Cell Proteomics, 6(12) (2007) 2100–2109. 48 J.F. Leite, A.A. Amoscato and M. Cascio, Coupled proteolytic and mass spectrometry studies indicate a novel topology for the glycine receptor, J. Biol. Chem., 275(18) (2000) 13683–13689. 49 P. Aanismaa and A. Seelig, P-glycoprotein kinetics measured in plasma membrane vesicles and living cells, Biochemistry, 46(11) (2007) 3394–3404. 50 B. Sarkadi, I. Szasz and G. Gardos, Characteristics and regulation of active calcium transport in inside-out red cell membrane vesicles, Biochem. Biophys. Acta, 598(2) (1980) 326–338. 51 F. Johansson, M. Olbe, M. Sommarin and C. Larsson, Brij 58, a polyoxyethylene acyl ether, creates membrane vesicles of uniform sidedness. A new tool to obtain inside-out (cytoplasmic side-out) plasma membrane vesicles, Plant J., 7(1) (1995) 165–173. 52 C.C. Wu, M.J. MacCoss, K.E. Howell and J.R. Yates, 3rd, A method for the comprehensive proteomic analysis of membrane proteins, Nat. Biotechnol., 21(5) (2003) 532–538. 53 L.M. Casano, H.R. Lascano, M. Martin and B. Sabater, Topology of the plastid Ndh complex and its NDH-F subunit in thylakoid membranes, Biochem. J., 382(Pt 1) (2004) 145–155. 54 J.F. Leite and M. Cascio, Structure of ligand-gated ion channels: Critical assessment of biochemical data supports novel topology, Mol. Cell. Neurosci., 17(5) (2001) 777–792. 55 C. Wei, J. Yang, J. Zhu, X. Zhang, W. Leng, J. Wang, Y. Xue, L. Sun, W. Li, J. Wang and Q. Jin, Comprehensive proteomic analysis of Shigella flexneri 2a membrane proteins, J. Proteome Res., 5(8) (2006) 1860–1865. 56 L.A. Eichacker, B. Granvogl, O. Mirus, B.C. Muller, C. Miess and E. Schleiff, Hiding behind hydrophobicity. Transmembrane segments in mass spectrometry, J. Biol. 
Chem., 279(49) (2004) 50915–50922. 57 M.J. Rodriguez-Ortega, N. Norais, G. Bensi, S. Liberatori, S. Capo, M. Mora, M. Scarselli, F. Doro, G. Ferrari, I. Garaguso, T. Maggi, A. Neumann, A. Covre, J.L. Telford and G. Grandi, Characterization and identification of vaccine candidate proteins through analysis of the group A Streptococcus surface proteome, Nat. Biotechnol., 24(2) (2006) 191–197. 58 T.S. Nuhse, A. Stensballe, O.N. Jensen and S.C. Peck, Phosphoproteomics of the Arabidopsis plasma membrane and a new phosphorylation site database, Plant Cell, 16(9) (2004) 2394–2405. 59 M. Ramjeesingh, C. Li, Y.M. She and C.E. Bear, Evaluation of the membrane-spanning domain of ClC-2, Biochem. J., 396(3) (2006) 449–460. 60 K. Nunomura, K. Nagano, C. Itagaki, M. Taoka, N. Okamura, Y. Yamauchi, S. Sugano, N. Takahashi, T. Izumi and T. Isobe, Cell surface labeling and mass spectrometry reveal diversity of cell surface markers and signaling molecules expressed in undifferentiated mouse embryonic stem cells, Mol. Cell Proteomics, 4(12) (2005) 1968–1976. 61 Y. Wang, E.A. Berg, X. Feng, L. Shen, T. Smith, C.E. Costello and Y.X. Zhang, Identification of surfaceexposed components of MOMP of Chlamydia trachomatis serovar F, Protein Sci., 15(1) (2006) 122–134. 62 J.F. Leite and M. Cascio, Probing the topology of the glycine receptor by chemical modification coupled to mass spectrometry, Biochemistry, 41(19) (2002) 6140–6148. 63 C. Li, S. Takazaki, X. Jin, D. Kang, Y. Abe and N. Hamasaki, Identification of oxidized methionine sites in erythrocyte membrane protein by liquid chromatography/electrospray ionization mass spectrometry peptide mapping, Biochemistry, 45(39) (2006) 12117–12124. 64 A.B. Weinglass, J.P. Whitelegge, Y. Hu, G.E. Verner, K.F. Faull and H.R. Kaback, Elucidation of substrate binding interactions in a membrane transport protein by mass spectrometry, EMBO J., 22(7) (2003) 1467–1477. 65 A. Weinglass, J.P. Whitelegge, K.F. Faull and H.R. 
Kaback, Monitoring conformational rearrangements in the substrate-binding site of a membrane transport protein by mass spectrometry, J. Biol. Chem., 279(40) (2004) 41858–41865. 66 A.B. Weinglass, M. Soskine, J.L. Vazquez-Ibar, J.P. Whitelegge, K.F. Faull, H.R. Kaback and S. Schuldiner, Exploring the role of a unique carboxyl residue in EmrE by mass spectrometry, J. Biol. Chem., 280(9) (2005) 7487–7492. 67 E.L. Vodovozova, Photoaffinity labeling and its application in structural biology, Biochemistry (Mosc), 72(1) (2007) 1–20.
68 C.D. Son, H. Sargsyan, G.B. Hurst, F. Naider and J.M. Becker, Analysis of ligand-receptor crosslinked fragments by mass spectrometry, J. Pept. Res., 65(3) (2005) 418–426. 69 E. Sachon, O. Tasseau, S. Lavielle, S. Sagan and G. Bolbach, Isotope and affinity tags in photoreactive substance P analogues to identify the covalent linkage within the NK-1 receptor by MALDI-TOF analysis, Anal. Chem., 75(23) (2003) 6536–6543. 70 S.M. Lamos, C.J. Krusemark, C.J. McGee, M. Scalf, L.M. Smith and P.J. Belshaw, Mixed isotope photoaffinity reagents for identification of small-molecule targets by mass spectrometry, Angew. Chem. Int. Ed. Engl., 45(26) (2006) 4329–4333. 71 A. Sinz, Isotope-labeled photoaffinity reagents and mass spectrometry to identify protein-ligand interactions, Angew. Chem. Int. Ed. Engl., 46(5) (2007) 660–662. 72 K. Pleban, S. Kopp, E. Csaszar, M. Peer, T. Hrebicek, A. Rizzi, G.F. Ecker and P. Chiba, P-glycoprotein substrate binding domains are located at the transmembrane domain/transmembrane domain interfaces: a combined photoaffinity labeling-protein homology modeling approach, Mol. Pharmacol., 67(2) (2005) 365–374. 73 G.F. Ecker, K. Pleban, S. Kopp, E. Csaszar, G.J. Poelarends, M. Putman, D. Kaiser, W.N. Konings and P. Chiba, A three-dimensional model for the substrate binding domain of the multidrug ATP binding cassette transporter LmrA, Mol. Pharmacol., 66(5) (2004) 1169–1179. 74 P. Wu, C.J. Oleschuk, Q. Mao, B.O. Keller, R.G. Deeley and S.P. Cole, Analysis of human multidrug resistance protein 1 (ABCC1) by matrix-assisted laser desorption ionization/time of flight mass spectrometry: Toward identification of leukotriene C4 binding sites, Mol. Pharmacol., 68(5) (2005) 1455–1465. 75 A. Mourot, T. Grutter, M. Goeldner and F. Kotzyba-Hibert, Dynamic structural investigations on the torpedo nicotinic acetylcholine receptor by time-resolved photoaffinity labeling, Chembiochem., 7(4) (2006) 570–583. 76 J.F. Leite, M.P. Blanton, M. Shahgholi, D.A. 
Dougherty and H.A. Lester, Conformation-dependent hydrophobic photolabeling of the nicotinic receptor: Electrophysiology-coordinated photochemistry and mass spectrometry, Proc. Natl. Acad. Sci. USA, 100(22) (2003) 13054–13059. 77 B.H. White and J.B. Cohen, Agonist-induced changes in the structure of the acetylcholine receptor M2 regions revealed by photoincorporation of an uncharged nicotinic noncompetitive antagonist, J. Biol. Chem., 267(22) (1992) 15770–15783. 78 H. Bayley and J. Staros, Photoaffinity labeling and related techniques. In: E. Scriven (Ed.), Azides and Nitrenes, Academic Press, New York, 1984, pp. 433–490. 79 G. Schuster and M. Platz, Photochemistry of phenyl azide, Adv. Photochem., 17 (1992) 69–143. 80 R.A. Vaughan, M.L. Parnas, J.D. Gaffaney, M.J. Lowe, S. Wirtz, A. Pham, B. Reed, S.M. Dutta, K.K. Murray and J.B. Justice, Affinity labeling the dopamine transporter ligand binding site, J. Neurosci. Methods, 143(1) (2005) 33–40. 81 A.E. Speers and B.F. Cravatt, A tandem orthogonal proteolysis strategy for high-content chemical proteomics, J. Am. Chem. Soc., 127(28) (2005) 10018–10019. 82 J.L. Faulon, K. Sale and M. Young, Exploring the conformational space of membrane protein folds matching distance constraints, Protein Sci., 12(8) (2003) 1750–1761. 83 K. Sale, J.L. Faulon, G.A. Gray, J.S. Schoeniger and M.M. Young, Optimal bundling of transmembrane helices using sparse distance constraints, Protein Sci., 13(10) (2004) 2613–2627. 84 J.W. Back, V. Notenboom, L.J. de Koning, A.O. Muijsers, T.K. Sixma, C.G. de Koster and L.Z. de Jong, Identification of cross-linked peptides for protein interaction studies using mass spectrometry and O-18 labeling, Anal. Chem., 74(17) (2002) 4417–4422. 85 D.R. Muller, P. Schindler, H. Towbin, U. Wirth, H. Voshol, S. Hoving and M.O. Steinmetz, Isotope tagged cross linking reagents. A new tool in mass spectrometric protein interaction analysis, Anal. Chem., 73(9) (2001) 1927–1934. 86 X.T. Tang, G.R. Munske, W.F. 
Siems and J.E. Bruce, Mass spectrometry identifiable cross-linking strategy for studying protein-protein interactions, Anal. Chem., 77(1) (2005) 311–318. 87 R.B. Jacobsen, K.L. Sale, M.J. Ayson, P. Novak, J.H. Hong, P. Lane, N.L. Wood, G.H. Kruppa, M.M. Young and J.S. Schoeniger, Structure and dynamics of dark-state bovine rhodopsin revealed by chemical cross-linking and high-resolution mass spectrometry, Protein Sci., 15(6) (2006) 1303–1317.
88 S.W. Englander, Hydrogen exchange and mass spectrometry: A historical perspective, J. Am. Soc. Mass Spectrom., 17(11) (2006) 1481–1489. 89 L.S. Busenlehner and R.N. Armstrong, Insights into enzyme structure and dynamics elucidated by amide H/D exchange mass spectrometry, Arch. Biochem. Biophys., 433(1) (2005) 34–46. 90 S.W. Englander, T.R. Sosnick, J.J. Englander and L. Mayne, Mechanisms and uses of hydrogen exchange, Curr. Opin. Struct. Biol., 6(1) (1996) 18–23. 91 M.J. Chalmers, S.A. Busby, B.D. Pascal, Y. He, C.L. Hendrickson, A.G. Marshall and P.R. Griffin, Probing protein ligand interactions by automated hydrogen/deuterium exchange mass spectrometry, Anal. Chem., 78(4) (2006) 1005–1014. 92 T.J.D. Jorgensen, H. Gardsvoll, M. Ploug and P. Roepstorff, Intramolecular migration of amide hydrogens in protonated peptides upon collisional activation, J. Am. Chem. Soc., 127(8) (2005) 2785–2793. 93 L.S. Busenlehner, S.G. Codreanu, P.J. Holm, P. Bhakat, H. Hebert, R. Morgenstern and R.N. Armstrong, Stress sensor triggers conformational response of the integral membrane protein microsomal glutathione transferase 1, Biochemistry, 43(35) (2004) 11145–11152. 94 L.S. Busenlehner, J. Alander, C. Jegerscohld, P.J. Holm, P. Bhakat, H. Hebert, R. Morgenstern and R.N. Armstrong, Location of substrate binding sites within the integral membrane protein microsomal glutathione transferase-1, Biochemistry, 46(10) (2007) 2812–2822. 95 L.S. Busenlehner, L. Salomonsson, P. Brzezinski and R.N. Armstrong, Mapping protein dynamics in catalytic intermediates of the redox-driven proton pump cytochrome c oxidase, Proc. Natl. Acad. Sci. USA, 103(42) (2006) 15398–15403.
CHAPTER 11
Covalent Trapping of Protein Interactions in Complex Systems
Rasanjala Weerasekera, Tujin Shi and Gerold Schmitt-Ulms
Contents
1. Introduction
2. Protein Crosslinking
3. Interactome Methods
3.1 Alternatives to crosslinking-based methods
3.2 Crosslinking-based methods
4. Interface and Topology Mapping
4.1 Alternatives to crosslinking-based methods
4.2 Crosslinking-based methods
5. Future Directions
Abbreviations
Acknowledgements
References
1. INTRODUCTION
It is now well appreciated that individual proteins engage in complex and dynamic interactions with other proteins to fulfill their diverse cellular roles [1–3]. The realization of the significance of protein assemblies for protein biology has important ramifications. For instance, following a ‘guilt-by-association’ heuristic, the cellular function of an uncharacterized protein may be inferred from the known functions of proteins with which it associates [4]. Further, a careful analysis of the molecular environment of a protein known to play an important role in disease may provide insights into novel aspects of disease manifestation. As a result, methodologies that provide insights into protein interactions of established disease target proteins are expected to play a key role in the development of novel therapeutic targets and diagnostic markers.
Comprehensive Analytical Chemistry, Volume 52 ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00211-0
© 2009 Elsevier B.V. All rights reserved.
For ease of discussion, protein interaction mapping techniques can be categorized into low-resolution interactome mapping protocols and higher resolution topology mapping strategies. Interactome mapping strategies reveal the individual protein constituents of a protein complex without establishing actual linkages among the proteins contained in the complex. Topology mapping, on the other hand, identifies regions within a protein that contribute either to internal contact sites (‘intraface’) or to the binding to another protein (interface) and, therefore, can provide information about the topology of an individual protein or a protein complex (Figure 1). Close scrutiny of current technologies available for the study of protein–protein interactions reveals at least two areas of investigation that have fallen behind. These are (1) the development of tools for studying the subset of physiological protein complexes that cannot be readily extracted intact from their cellular milieu (e.g. membrane proteins), and (2) the development of
Figure 1 Terminology employed to define the architecture of protein complexes. (a) Hypothetical interactome consisting of a ‘bait’ protein and its two interactors, IA1 and IA2. (b) Hypothetical complexome depicting the bait protein as a constituent of distinct protein complexes containing either IA1 (‘Complex 1’) or IA2 (‘Complex 2’). Please note that the direct interaction of proteins can sometimes be inferred following extensive biochemical characterization of highly purified protein complexes or systematic application of interactome protocols to a large number of bait proteins. Such complexome representations generally provide no insights, however, into the topology of protein complexes (a more detailed terminology of protein complexes can be found in Ref. [2]). (c) Hypothetical high-resolution topology depiction of Complex 1 built from atomic coordinates as revealed, e.g., by crystallography or NMR analysis (oversimplified for illustration purposes). (d) Hypothetical low-resolution topology model of Complex 1 as deduced by protein crosslinking and mass spectrometry (MS). Please note that such a low-resolution model may contain accurate contact sites of peptide strands but will not provide x, y space-coordinates of individual atoms.
Covalent Trapping of Protein Interactions in Complex Systems
methodologies for the elucidation of protein topologies. In vivo chemical crosslinking may provide a solution to the difficulties encountered in both areas mentioned above, a view taken by many, which has given rise to a range of crosslinking methodologies in recent years (Figure 2). The advent of soft ionization strategies for mass spectrometry (MS) [5–7] in the late 1980s and the increasing availability of genomic sequence repositories [8–10] in the 1990s set the stage for the rapid pace with which novel interactome and topology mapping strategies have become available. Detailed descriptions of procedures for the MS-based identification of peptides can be found elsewhere (e.g. Ref. [11] and references therein). Here, comments will be restricted to aspects pertinent to the analysis of crosslinked samples.
In this chapter, we review methodological advancements and applications aimed at the study of interactomes and protein complex topologies in cells and tissues. Emphasis is placed on concepts and considerations for the design of an in vivo crosslinking experiment. We restrict our presentation to generic approaches that crosslink all proteins to their respective nearest neighbors and do not rely on protein-specific characteristics for downstream sample processing. Because the study of topologies following in vivo crosslinking, in particular, is still in its infancy, we include in our discussion some methodologies that have so far only been shown to work in vitro, as long as their experimental concepts are in principle compatible with a move to in vivo work.
2. PROTEIN CROSSLINKING
The choice of crosslinking reagent can strongly influence methodology and outcome of any crosslinking experiment. Minimally, a choice needs to be made with regard to the tether length of the reagent and the nature of the reactive groups which mediate the intended covalent crosslink. For more specialized applications it may be helpful for the crosslinking reagent to be equipped with built-in functionalities which facilitate the enrichment of crosslinked products, allow reversal of crosslinks downstream in the sample purification procedure and/or provide a means by which to quantify crosslinks (Figure 3). Indispensable features of crosslinking reagents employed for in vivo work are solubility in water and the ability to readily overcome biological barriers such as cellular membranes. As excellent general reviews of crosslinking chemistries can be found in the literature [12–15], we will restrict our discussion to a short overview of considerations for the choice of crosslinking reagent and emphasize aspects relevant for in vivo crosslinking work.
Activation: A typical crosslinking reagent harbors one reactive group at each of its ends for covalent attachment to complementary chemical groups on proteins. Whereas with homobifunctional crosslinking reagents these reactive groups are identical, heterobifunctional reagents carry non-identical reactive groups at their ends. Reactive groups can be further divided into constitutively active and inducible groups. Given the right microenvironmental milieu (i.e. permissive temperature, pH, salt, etc.), constitutively active groups will engage
Rasanjala Weerasekera et al.
[Figure 2 flow chart, steps from top to bottom: Select crosslinker, optimize reaction parameters; Introduce tagged bait protein; In vivo crosslinking; Affinity purification of protein complexes containing bait protein; CNBr cleavage; Separation of crosslinked proteins/fragments; Enzymatic digestion; Enrichment of crosslinked peptides; Mass spectrometry analysis; Computational identification of crosslinked proteins or peptides; Structure model building.]
Figure 2 Flow chart summarizing key elements of crosslinking strategies. The varying thickness of arrows that connect the steps indicates the popularity of a design feature in the combined literature, i.e. thicker arrows indicate more frequent use.
[Figure 3 labels: Homobifunctional crosslinker; Heterobifunctional crosslinker; Reactive group; Spacer arm; Tether length. Additional functionalities of crosslinkers: Isotope tag (H/L); Crosslink reversal chemistry; Enhanced water solubility (SO3Na); Affinity tag for enrichment.]
Figure 3 Terminology and schematic presentation of functional elements of crosslinking reagents. Only a subset of crosslinking design features found in the crosslinking literature is shown. The intention here was to convey key concepts and restrict the presentation to elements that are most relevant for in vivo crosslinking work.
in covalent crosslinks as soon as they encounter a receptive chemical group on a target molecule. In contrast, inducible groups require an external stimulus, such as the energy provided by a light source, for coupling. Depending on the chemistry employed, the reactive group may provide relaxed specificity and thereby react promiscuously with a range of similar chemical groups exposed on proteins [16], or may provide exquisite specificity. While in principle a diverse group of crosslinking reagents with different reactivities could be devised, the current in vivo crosslinking literature is largely dominated by amino group-reactive crosslinkers that are either constitutively active or activated through irradiation with UV light.
Tether length: Naturally, the crosslinking reaction is affected by steric and distance constraints inherent to the chemical structure of both the crosslinking reagent and the peptide chains to which the crosslinker is applied. A large variety
of crosslinking reagents with different tether lengths have been described. While some crosslinkers require close contact between chemical target groups for efficient crosslinking (e.g. zero-length crosslinkers or formaldehyde), others may bridge through-space distances of more than 20 Å. The need to overcome lipid membranes somewhat restricts the practical range of tether lengths for in vivo crosslinking work. The most frequently employed in vivo crosslinking reagents, formaldehyde and disuccinimidyl suberate (DSS), are equipped with spacer arms of ~2 and 11.4 Å, respectively (Figure 4).
Reversibility: Formidable challenges in crosslinking studies include the validation of crosslinks following interactome work, and the identification of crosslinked peptides in interface studies. A reversible crosslinking reagent may aid in meeting both of the above challenges. Crosslinkers that can be cleaved by thiol, periodate or hydroxylamine have been devised for this purpose. In vivo studies which capitalized on this optional feature of the crosslinking reagent are dominated by two implementations. They either (i) used crosslinking reagents with spacer arms that harbor disulfide bridges and therefore could be reopened by treatment with thiols under reducing conditions (e.g. Refs. [17–19]), or (ii) employed formaldehyde for the formation of crosslink bonds that can be
[Figure 4 panels: (a) Formaldehyde crosslinking; (b) NHS-ester crosslinking (only first step shown), disuccinimidyl suberate (DSS).]
Figure 4 Chemistries of the two most frequently employed crosslinking reagents in the in vivo crosslinking literature. Formaldehyde and DSS rely on dramatically different chemistries for homobifunctional amino group reactive crosslinking. (See Colour Plate Section at the end of this book.)
reversed by heat treatment in the presence of an excess of amino groups and reducing agent (e.g. Refs. [20–22]).
Enrichment: The derivatization of crosslinking reagents with functional groups facilitates the enrichment of crosslinking products and is enjoying increasing popularity in the in vitro crosslinking literature. In particular, the attachment of a biotin group to the spacer arm has been frequently reported [23–26]. However, the efficient delivery of these bulkier reagents to cells has not yet been achieved, and as a result, a generic enrichment strategy based on the use of such derivatized crosslinking reagents is currently unavailable.
Membrane permeability: Except in studies of extracellular protein interactions, a critical requirement for in vivo work is the ability of the crosslinking reagent to readily traverse lipid membranes, either through passive diffusion or through active transport. This requirement is arguably the single most constraining one and disqualifies the vast majority of crosslinking reagents for in vivo crosslinking applications.
Water solubility: This property falls into the same category as the requirement for membrane permeability. However, compared with the limited options available for counteracting the membrane permeability constraint, there are well-tested chemical strategies for rendering an otherwise water-insoluble molecule soluble in water. The most popular methods include derivatization with strongly water-soluble functionalities such as a sulfonate group.
In conclusion, the design and choice of crosslinking reagents for in vivo crosslinking work are governed by additional considerations that do not apply to in vitro crosslinking experiments with highly purified protein complexes. Consequently, a much smaller repertoire of crosslinking reagents is available for these studies, and the in vitro trend towards ever more complex modular crosslinking reagents equipped with multiple functionalities [14,27] is thus far not reflected in the in vivo crosslinking literature.
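The tether-length consideration in this section can be illustrated with a minimal sketch. It assumes that the spacer-arm lengths quoted in the text (about 2 Å for formaldehyde, 11.4 Å for DSS) act as a simple upper bound on the distance between two reactive groups. The function name, the coordinates and the `slack` parameter (an allowance for side-chain flexibility) are hypothetical and serve illustration only.

```python
import math

# Approximate spacer-arm lengths in angstroms, as quoted in the text.
SPACER_ARM_ANGSTROM = {"formaldehyde": 2.0, "DSS": 11.4}


def within_reach(group_a, group_b, crosslinker, slack=0.0):
    """Return True if two reactive groups could be bridged by the crosslinker.

    group_a, group_b: (x, y, z) coordinates in angstroms (hypothetical input).
    slack: illustrative allowance for side-chain flexibility, in angstroms.
    """
    separation = math.dist(group_a, group_b)  # straight-line distance
    return separation <= SPACER_ARM_ANGSTROM[crosslinker] + slack


# Two hypothetical amino groups that are 10 angstroms apart.
amine_1 = (0.0, 0.0, 0.0)
amine_2 = (0.0, 0.0, 10.0)
for reagent in SPACER_ARM_ANGSTROM:
    print(reagent, within_reach(amine_1, amine_2, reagent))
```

A straight-line distance is only a rough proxy, since a real spacer arm has to follow a path around the protein surface. The sketch merely conveys why short reagents report on close contacts whereas longer reagents tolerate larger separations.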
3. INTERACTOME METHODS
3.1 Alternatives to crosslinking-based methods
A variety of strategies are presently available for mapping protein interactions. Techniques range from genetic approaches such as the Two-Hybrid System (THS) [28], synthetic lethal screens [29,30], fluorescence resonance energy transfer (FRET) [31,32] and the recently reported quantitative genetic interaction mapping method [33], to biochemical protocols that reveal the composition of a protein complex by immunoprecipitation [34] or affinity-tag purification [35] and MS [36]. Recently, a combination of conventional tandem affinity purification (TAP) and downstream analysis by MS enabled the generation of a three-dimensional interaction map of a heterodecameric protein complex through the application of gentle electrospray ionization (ESI) parameters that preserve non-covalent protein interactions [37]. Many of the above approaches provide complementary
and orthogonal interactome information and can therefore be used to cross-validate candidate protein interactions.
3.2 Crosslinking-based methods
Crosslinking-based interactome studies are well suited to complement some of the purely genetic interaction mapping approaches, which may point at functional interactions without establishing direct contacts of proteins. Furthermore, in vivo protein crosslinking serves as an important tool for the biochemical mapping of labile protein–protein interactions [38]. Studies involving protein complexes that require a particular milieu for integrity and/or that dissociate when subjected to common cell lysis and protein extraction conditions unless chemically stabilized greatly benefit from the use of in vivo crosslinking as an initial step.
3.2.1 Capture of bait protein
All current in vivo crosslinking protocols for interactome mapping require extensive enrichment of the protein complexes of interest prior to downstream analyses by MS. Two generic strategies for the affinity capture of protein complexes containing a bait protein dominate the literature. These are (i) co-immunoprecipitations with bait-specific antibodies and (ii) affinity purifications following genetic tagging and heterologous expression of bait proteins. Co-immunoprecipitations remain the method of choice in studies directed against endogenous proteins, investigations of the molecular environment of a protein in a complex tissue, and interactome mapping efforts of proteins that either do not tolerate the addition of tags or demonstrate altered biology following the addition of tags. While some of the most popular tags (FLAG, Myc, HA, V5, Protein A peptide) still require antibodies for interactome work, other tagging strategies such as polyhistidine tags, calmodulin or streptavidin binding peptides rely on antibody-independent affinity capture steps [35]. Tagging strategies are particularly powerful for systematic and large-scale interactome mapping efforts involving bait proteins purified from cells (e.g. Refs. [39,40]). Improvements in transgenic technologies also enable the transition to bait-specific tagging in complex organisms [41]. In recent years, genetic-chemical tagging strategies have been added to the repertoire and are gaining in popularity [42,43].
3.2.2 Recognition of unspecific interactors
The covalent stabilization of protein interactions translates into the ability to treat affinity-captured protein complexes with stringent salt and detergent washing steps. Compared with protocols that avoid crosslinking or employ in vitro crosslinking, the move to in vivo crosslinking further reduces the risk of being misled in situations where non-physiological interactors bind directly to the protein of interest or to the affinity-purified protein complex only when present in an extract, but do not physiologically interact with the bait protein when cellular integrity is maintained. Nevertheless, unspecific interactors can be found
in any interactome sample as a result of (i) aggregated proteins in the sample that co-sediment with the affinity matrix, (ii) proteins that bind directly to the affinity matrix, (iii) proteins which under physiological conditions are found in a different cellular compartment than the bait protein, but have an intrinsic propensity to bind to the bait protein when present in an extract, (iv) abundant cellular proteins that populate affinity purification eluate fractions when samples are subjected to less than stringent washing conditions, (v) proteins that originate from the affinity matrix itself, e.g. if crude antibody preparations were coupled to chemically activated matrix beads, and finally (vi) proteins such as trypsin, human skin and hair proteins, etc., introduced into the sample as a result of sample handling procedures. With this many possible sources of unspecific proteins, the objective is not to eliminate all unspecific contaminants but to minimize their occurrence and, more importantly, to know their identities.
A conceptually simple and highly reliable approach here is what we call the bait exclusion strategy. In this strategy, side-by-side affinity purifications are carried out from starting materials which differ in the presence or absence of the bait protein. As all steps during the procedure employ the same reagents and protocols, differences in the final list of candidate protein interactors are attributed to the expression versus ‘knock-out’ of the protein of interest. While the simplicity of this approach is compelling, caution is warranted, as a protein may end up in the eluate as a result of its ability to bind non-physiologically, and thus unspecifically, to the bait protein or to the affinity-captured protein complex upon disruption of sample tissue.
In addition, a protein for which expression is downregulated as an indirect result of bait protein knock-out, and which is thus conspicuously absent from the affinity-purified control material, may be mistaken for a physical interactor (Figure 5).
Two bait exclusion derivative strategies have been used in instances where no knock-out samples were available. The first of these employs a side-by-side affinity capture of identical wild-type extracts in which the affinity matrix to be used as the control is subjected to pre-saturation with a small ligand (e.g. peptide) that competes with the bait protein complex for binding to the affinity matrix. In the second strategy, the knock-out of a bait protein is replaced with a mere knock-down using RNA interference technology [44]. However, neither of these derivative strategies may achieve a complete suppression of bait protein capture in the control sample. As a result, it may be difficult to distinguish candidate protein interactors, as these proteins may also be present in control samples, albeit at lower quantities. It is therefore advisable to consider combining these derivative strategies with isotopic labeling protocols that afford quantitative comparison of samples [44–46], as discussed below.
More generally, quantitation methods should be incorporated into the experimental strategy wherever possible. The benefit is two-fold: quantitative data provide useful information for downstream validation and, more importantly, help distinguish specific from unspecific interactors by their relative abundances in side-by-side analyzed samples [47]. Many strategies for the generation of quantitative MS data have been reported [45,46,48,49]. The two most prevalent
[Figure 5 labels: Sample; Control; Extracts; Affinity capture; Affinity capture matrix; Sample List; Control List. Legend: 1. Bait protein; 2. Specific direct bait interactor; 3. Specific indirect bait interactor; 4. Matrix interactor; 5. Unspecific direct bait interactor; 6. Unspecific indirect bait interactor; 7. . . .]
Figure 5 Cartoon depicting the bait exclusion strategy for the identification of non-specific interactors in affinity purification experiments. The bait exclusion strategy employs parallel affinity purifications from starting materials which differ in the presence or absence of the bait protein. As a result, bait-specific candidate interactors represent the subpopulation of proteins exclusively found in eluate fractions derived from the bait-expressing starting material. Unspecific binders may be found exclusively in the sample list or in both sample and control lists, depending on whether these proteins interact with the bait protein and its specific interactors or with aspects of the affinity matrix itself.
strategies make use of isotopic labels which are incorporated into proteins either through metabolic labeling or through chemical derivatization of peptides. In particular, when combined with the bait exclusion concepts outlined above, quantitative MS data can readily reveal candidate interactors of interest [44,50]. Quantitative data are further indispensable for interactome studies aimed at abundant cellular proteins where, without quantitation, it may be impossible to distinguish bona fide interactors from unspecific contaminants. Spectral counting or integration of extracted ion currents collected from parent MS spectra may in some instances serve as a cost-efficient alternative to isotopic labeling protocols [51]. An advantage of isotopic labeling quantitation strategies over the spectral counting approach, however, remains the ability to combine samples. This benefit not only reduces the MS analysis time but also eliminates the risk of an MS-inherent sampling bias, a common phenomenon in which run-to-run variances in the analyses of complex samples may be misinterpreted as sample-to-sample differences [52].
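The combination of bait exclusion and quantitation described in this section can be sketched as a simple comparison of protein abundances between the bait-containing sample and the control. The following is a minimal illustration, not a published scoring scheme: the function name, the protein names, the abundance values and the ratio threshold of 3 are all assumptions. The abundances could stand for spectral counts or isotope-label intensities.

```python
def classify_interactors(sample, control, min_ratio=3.0):
    """Label each protein found in the sample as 'candidate' or 'unspecific'.

    sample, control: dicts mapping protein name to an abundance measure.
    min_ratio: illustrative enrichment threshold (sample over control).
    """
    calls = {}
    for protein, sample_abundance in sample.items():
        control_abundance = control.get(protein, 0.0)
        if control_abundance == 0:
            # Found only when the bait is present (strict bait exclusion).
            calls[protein] = "candidate"
        elif sample_abundance / control_abundance >= min_ratio:
            # Present in the control, but clearly enriched with the bait,
            # e.g. after incomplete knock-down or matrix pre-saturation.
            calls[protein] = "candidate"
        else:
            calls[protein] = "unspecific"
    return calls


# Hypothetical abundances from side-by-side affinity purifications.
sample = {"Bait": 40, "IA1": 12, "ProtX": 20, "Keratin": 9}
control = {"Bait": 2, "ProtX": 5, "Keratin": 10}
for protein, call in classify_interactors(sample, control).items():
    print(protein, call)
```

As cautioned in the text, a protein that passes such a filter may still bind non-physiologically to the bait once the tissue is disrupted, so the labels mark candidates for validation rather than confirmed interactors.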
3.2.3 Methodologies
A typical crosslinking experiment is based on reaction conditions that are compatible with the downstream proteolysis of proteins and that enable straightforward identification of proteins, based on the presence of unmodified peptides, with standard protein identification software. Crosslinking is, however, associated with sample losses as a result of additional sample handling steps (crosslinking, quenching and removal of chemicals) and inefficient recovery of crosslinked peptides from gels and chromatography matrices. Sample losses are expected to increase in instances where crosslinking and cleavage chemistries share overlapping specificity, exemplified by the frequently employed combination of amino group-specific crosslinkers and downstream fragmentation with trypsin. In any event, crosslinking is paralleled by an increase in the chemical complexity of protein samples and may therefore contribute to peak overlap and ion suppression effects during MS. Despite these restrictions, crosslinking does not typically impose a major difficulty for protein interactome work and usually does not require deviation from standard ‘bottom-up’ proteomics workflows.
The sensitivity of any protocol that interrogates protein interactions by in vivo crosslinking is further limited by the need to recover crosslinked proteins from cells following the crosslinking reaction. Recently, the global in vivo incorporation of photo-activatable amino acid derivatives was reported [53]. In this global strategy, the use of amino acids that closely resemble natural amino acids helped to avoid the cellular identity control mechanisms. As a result, photo-activatable amino acids are incorporated into proteins during the translation process in the same way as their natural amino acid counterparts. So far this approach has been successful with the insertion of photo-leucine and photo-methionine into proteins.
This choice of amino acids is of particular relevance for the study of membrane protein interactions because leucine, owing to its hydrophobic nature, is highly represented in the membrane-spanning regions of transmembrane proteins. The authors reported a 99% activation rate of the photo-activatable
group following a 3-min UV exposure. A concern with such a strategy, which employs global metabolic incorporation of amino acid derivatives, is the cellular stress that may be caused by the presence of large amounts of chemically modified proteins. No gross manifestations of toxicity were observed at levels of 0.7% photo-methionine incorporation [53].
Mild formaldehyde crosslinking has been utilized extensively for the study of nucleosomal protein interactions or protein–DNA interactions [22,54–58] and is increasingly being recognized as a useful tool for the study of protein interactions involving membrane proteins [59]. Features that make formaldehyde crosslinking attractive are (i) the water solubility of the reagent; (ii) the absence of reagent-induced rearrangements of the proteins and (iii) the short (2–3 Å) crosslink bonds that endure harsh, non-physiological treatments and are reversible [22]. Various groups have reported on the use of a strategy that employs mild formaldehyde crosslinking followed by SDS-PAGE analysis of crosslinked products and tandem mass spectrometry (MS/MS) to identify interactors of bait proteins [20,60–64]. Although the above studies use affinity purification to enrich for crosslinked protein complexes, the use of one-dimensional SDS-PAGE to separate crosslinked proteins from their uncrosslinked counterparts (i.e. uncrosslinked bait protein) is a limitation, since crosslinked proteins often do not resolve well on these gels. Moreover, many large protein complexes cannot be analyzed in this manner, since SDS-PAGE analysis has an upper molecular weight working limit of ~500 kDa.
The first strategy reported to crosslink protein interactions in their physiological milieu in a complex tissue, i.e. prior to the disruption of tissue integrity, is the time-controlled transcardiac perfusion crosslinking (tcTPC) method [65] (Figure 6).
tcTPC combines transcardiac perfusion and mild formaldehyde crosslinking for the study of protein interactions in complex tissues. Crosslinked complexes are immunoaffinity purified, trypsinized directly in solution (rather than being resolved on a gel first) and finally subjected to MS/MS analysis to reveal in vivo interactors of selected bait proteins. When applied to the prion protein (PrP), the tcTPC method enabled identification of more than 20 proteins which reside in spatial proximity to PrP in vivo, and a recent application of this method revealed the comprehensive molecular environment of the amyloid precursor protein in the brain [66]. Owing to its application to intact tissues, tcTPC can capture membrane protein interactions in trans with proteins on neighboring cells that frequently belong to a different cell type. It has recently been shown that prolonged storage of formalin-fixed and paraffin-embedded tissue over months or even years is compatible with subsequent MS-based identification of proteins [67,68]. It remains to be seen whether such tissue material can be utilized in studies aimed at deciphering protein–protein interactions. A downside of any method, including tcTPC, which targets endogenous proteins is the need for selective immunoaffinity reagents. The development of engineered mice which express tandem affinity tagged or biotinylated versions of bait proteins may serve as an interesting methodological advancement in this context [41,69,70].
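The overlapping specificity of amino group-specific crosslinkers and trypsin mentioned in this section can be illustrated with a small in silico digest. This sketch assumes the commonly used simplified rule that trypsin cleaves C-terminal to lysine (K) or arginine (R) unless the next residue is proline, and it further assumes that a lysine modified by the crosslinker is no longer cleaved. The sequence, the function name and the `blocked` parameter are hypothetical.

```python
def tryptic_peptides(sequence, blocked=frozenset()):
    """Digest a protein sequence in silico with a simplified trypsin rule.

    sequence: one-letter amino acid string (hypothetical input).
    blocked: 0-based positions of lysines assumed to be modified by an
             amino group-reactive crosslinker and thus not cleaved.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        cleavable = residue in "KR" and i not in blocked
        # Cleave after K/R, but not before proline and not at the C-terminus.
        if cleavable and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


protein = "AAKGGRPLKTTK"
print(tryptic_peptides(protein))               # unmodified protein
print(tryptic_peptides(protein, blocked={2}))  # lysine at position 2 modified
```

Blocking a lysine merges two peptides into one longer peptide. This is one way to picture why peptides that carry a crosslink tend to be larger and harder to recover than their unmodified counterparts.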
Figure 6 Schematic representation of the time-controlled transcardiac perfusion crosslinking procedure. (a) The crosslinking solution is pumped through the circulatory system of the mouse in a parameter-defined manner. (b) Protein complexes are purified in a stringent immunoaffinity step, then (c) reduced, alkylated and digested. (d) Two-dimensional liquid chromatography of peptides is coupled to (e) online ESI-MS/MS, which is followed by (f) computationally aided protein identification.
3.2.4 Validation
While the composition of an interactome data set for any new bait protein is somewhat unpredictable, certain criteria can be used to facilitate assessment of data quality and to guide the selection of candidate proteins for downstream validation. A critical requirement for meaningful downstream validation studies is that the bait protein not only be represented in a given data set but also give rise to the strongest protein identification in terms of both signal intensities of parent ions and percent protein coverage. It is further helpful to sort proteins according to the sequence coverage obtained, as this parameter is often a good correlate for the abundance with which a protein interaction occurs. Physiologically relevant but intrinsically dynamic or weak interactions are, however, generally difficult to capture (even with prior crosslinking). With regard to the predicted nature and size of data sets, it is reasonable to assume that any bait protein may engage in interactions with proteins that play a role in the various steps during its formation and maturation including translation, folding, transport, post-translational modification, cellular function and degradation. Clearly, an interactome study that reveals no interaction or an excessive number
of interactions (e.g. >100 specific candidate interactors) should raise suspicions regarding the quality of data. In such a scenario, the presence of a significant subset of proteins known to physically or genetically interact with one another may indicate an excessive presence of indirect interactors. Frequently, it is necessary to narrow lists of candidate interactors down to the most likely functional interactors. Caution needs to be exercised during these steps to avoid bias, and the following comments may serve merely as starting points in this difficult task.
Abundant and promiscuous interactors: The comparison of interactome data collected with a range of bait proteins in affinity purification studies generally reveals a subset of proteins that appear to reside in the vicinity of more than one bait protein. Frequently, these proteins are either known to promiscuously facilitate folding or degradation of bait proteins or represent highly abundant cellular proteins. While their presence in the vicinity of a bait protein may indicate physiologically relevant interactions, it may pose a formidable challenge to prove specificity and physiological relevance of these interactions in downstream validation studies.
Gene Ontology (GO) annotations: These describe the molecular function, biological process and cellular component a protein has been associated with and may help to group candidate interactors [71]. A reasonable strategy here is to limit initial validation efforts to the candidate interactor within a classification group for which the highest percentage of sequence coverage was obtained. Similarly, a detailed sequence analysis occasionally reveals the presence of shared protein subdomains in a subset of candidate interactors, and suggests selective affinity of the bait protein for proteins that harbor this domain.
As above, in such a scenario it may be advisable to initially restrict validation efforts to the candidate interactor containing this shared sequence domain which gave rise to the most confident identification.
‘Expert eyes’ and PubMed searches: This approach would be considered the most controversial as it is potentially fraught with bias. It is however unlikely that anyone would resist the temptation to screen literature databases for corroborating evidence supporting the notion of a possible interaction between a bait protein and its candidate interactors. Undoubtedly, valuable resources may be saved by subjecting the list of candidate interactors to a screen by experts who understand the biology of the bait protein, and others who have encountered dozens of similar interactome data sets.
Once promising interactors of a bait protein have been selected for further investigations, an array of biochemical methods needs to be employed to characterize these proteins and probe for their involvement in physiological activities that govern the biology of the bait protein. If more than just a few candidate interactors need to be validated in parallel, it may be most economical to base an initial screen on rtPCR. Underlying this recommendation is the observation that many functional interactors are subject to transcriptional coregulation. This approach requires parallel harvesting of RNA from cells or tissues that express differential levels of either the bait or the candidate interactor for cDNA synthesis. Alternatively, the effect of an RNAi-based knock-down of
candidate interactors on the expression and post-translational modification of the bait protein may be investigated. Both approaches generate data orthogonal to the physical interaction data set. Additional validation tools to be considered are overexpression analyses of selected targets, reciprocal immunoprecipitations, glycerol velocity gradient centrifugation, iodixanol gradient centrifugation, immunocytochemistry and functional cell-based assays. In addition to establishing whether a given candidate interactor represents a physiological interactor of the bait protein, the above studies should aim to delineate whether the bait protein engages in interactions with multiple candidate interactors as part of a single protein complex, or binds to a subset of its interactors within multiple distinct complexes. Naturally, the choice of methodology will depend on the nature of the candidate interactor, the availability of specific immunoreagents, cell or animal models and specific inhibitors/agonists, and whether a knock-down/overexpression of the target protein can be achieved.
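The data quality criteria and the GO-based prioritization described in this section could be applied to a candidate list as sketched below. This is a simplified illustration under stated assumptions: sequence coverage alone stands in for identification strength (the text also considers parent ion signal intensities), the limit of 100 candidates follows the example given in the text, and all function names, field names and data values are hypothetical.

```python
def quality_warnings(hits, bait, max_candidates=100):
    """Return a list of warnings about an interactome data set.

    hits: list of dicts with keys 'protein', 'coverage' (percent sequence
          coverage) and 'go_group' (a coarse GO-derived grouping).
    """
    warnings = []
    ranked = sorted(hits, key=lambda hit: hit["coverage"], reverse=True)
    if not ranked or ranked[0]["protein"] != bait:
        warnings.append("bait is not the strongest identification")
    n_candidates = sum(1 for hit in hits if hit["protein"] != bait)
    if n_candidates == 0:
        warnings.append("no candidate interactors")
    elif n_candidates > max_candidates:
        warnings.append("excessive number of candidate interactors")
    return warnings


def top_candidate_per_group(hits, bait):
    """Pick, for each group, the candidate with the highest coverage."""
    best = {}
    for hit in hits:
        if hit["protein"] == bait:
            continue
        group = hit["go_group"]
        if group not in best or hit["coverage"] > best[group]["coverage"]:
            best[group] = hit
    return {group: hit["protein"] for group, hit in best.items()}


# Hypothetical data set for illustration only.
hits = [
    {"protein": "Bait", "coverage": 62, "go_group": "bait"},
    {"protein": "IA1", "coverage": 30, "go_group": "chaperone"},
    {"protein": "IA2", "coverage": 45, "go_group": "chaperone"},
    {"protein": "IA3", "coverage": 12, "go_group": "cell adhesion"},
]
print(quality_warnings(hits, "Bait"))
print(top_candidate_per_group(hits, "Bait"))
```

Such a filter only orders the validation effort. As noted in the text, these criteria are starting points and do not replace biochemical validation.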
4. INTERFACE AND TOPOLOGY MAPPING
4.1 Alternatives to crosslinking-based methods
There is relatively little information available regarding the topology of multisubunit protein complexes except for a handful of well-characterized and abundant protein assemblies, for which high-resolution structural data are available [72,73]. Progress with high-resolution structure technologies is continuously being made. However, it is unlikely that X-ray or NMR-based strategies will provide routine access to interface data of multi-constituent membrane protein complexes or transient complexes in the near future. At the other end of the methodological spectrum, the THS and FRET (and its derivatives) are not only useful for the identification of protein interactors but are also generally considered powerful genetic tools for higher resolution interface mapping [31,74]. A downside of any such genetic approach for interface mapping is the hypothesis-driven nature of the experiment and the limited power it offers for the dissection of complex or non-linear interfaces. To minimize conceptual bias in such a study, one could either generate a large number of expression constructs, or narrow down interfaces through iterative cycles of expression construct cloning and testing. The same limitation also applies to biochemical surface plasmon resonance (SPR)-based interface mapping strategies that require the recombinant expression of a series of deletion constructs for individual protein complex subunits. An advantage of the SPR approach is that it not only reveals qualitative binding information but also generates data that allow calculation of affinity constants [75,76]. Valuable topology information can also be obtained by probing the solvent accessibility and hydrogen bonding characteristics of amino acids that are buried within a protein complex. For example, deuterium–hydrogen (D/H) exchange followed by MS is a well-suited methodology for this application [77–80].
Rasanjala Weerasekera et al.
A field that has received considerable attention in recent years is the computational modeling of protein–protein interfaces and the use of sophisticated algorithms for protein structure prediction. The reader is directed to some very good review articles available on this topic [72,81–84]. Naturally, the selection of methodologies we have pointed towards in this introductory paragraph can only represent a small window into a massive multidisciplinary effort with the objective to identify protein complex topologies. Accelerated progress in this direction will depend on increased integration of information derived from various experimental sources and on the development of complementary strategies.
4.2 Crosslinking-based methods
An inherent conceptual feature of any crosslinking-based interface mapping protocol is the need to employ relatively mild crosslinking conditions in order to enable downstream fragmentation of protein complexes. The consequences of this constraint are an overabundance of uncrosslinked peptides, which can make the search for crosslinked peptides challenging (Figure 7), and the requirement for a significantly higher amount of protein starting material than needed for the mere identification of proteins and peptides. The situation is further exacerbated by the fact that the crosslinking reaction frequently leads to merely derivatized but uncrosslinked peptides. This observation is not surprising, as one would expect instances where a crosslinking reagent which has reacted with a peptide through one of its reactive groups cannot execute the second reaction step of a crosslink reaction due to its restricted freedom of movement. Amongst peptides that are bona fide crosslinked, one can further distinguish between short-range crosslinks, which crosslink nearby residues within the same peptide strand and therefore provide limited topology information, and informative long-range crosslinks, which crosslink different domains or peptide strands through space (Figure 8a). If no prior structural knowledge restricts the crosslinks to a subset of those theoretically possible, the number of crosslinks to be considered for a given protein complex grows roughly with the square of the length of the crosslinked peptide chain. In other words, a 10-fold increase in molecular weight of a target complex would be paralleled by a roughly 100-fold theoretical increase in the number of crosslinks that one would need to consider (Figure 8b).
For the above reasons, identifying crosslinked peptides within protein complexes has been compared to the search for a needle in a haystack [85] and to date, this literature is dominated by studies that involve the characterization of topologies of highly purified proteins from relatively large quantities of starting material, and protein complexes with rarely more than two proteins as constituents [14,27,86]. No generic crosslinking-based strategy that allows sensitive interface mapping of protein interactions in vivo has been reported.
[Figure 7 plot: MALDI mass spectrum. x-axis: m/z [amu], 1000 to 4000; y-axis: relative intensity (%), 0 to 100. Individual peaks are labeled U, M or C as defined in the caption.]
Figure 7 Representative MALDI parent ion mass spectrum depicting relative signal intensities of unmodified, modified and crosslinked peptides following in vitro crosslinking and proteolytic digestion of test protein. Tryptic digest of bovine serum albumin (BSA) following in vitro crosslinking with DSS. All labeled peptides are tryptic peptides derived from BSA. C, crosslinked peptides; M, peptides modified by DSS but not crosslinked and U, unmodified or carbamidomethylated peptides. Adapted from Ref. [87].
[Figure 8, panel (a): four classes of reaction products are drawn: uncrosslinked; derivatized ('dead end'); short-range intracrosslink; through-space crosslink.]
[Figure 8, panel (b): 2 cleavage sites give 5 peptides and 5 + 4 + 3 + 2 + 1 = 15 through-space crosslinks; 3 cleavage sites give 7 peptides and 7 + 6 + ... + 2 + 1 = 28 through-space crosslinks; 700 cleavage sites give 1401 peptides and 1401 + 1400 + ... + 2 + 1 = approximately 1,000,000 through-space crosslinks.]
Figure 8 Illustration of the sample complexity problem inherent to crosslinking/MS-based topology work. (a) Classification of products resulting from the crosslinking reaction. The true theoretical possibilities of crosslink products are even larger as any combination of these four main products may occur (e.g. a peptide which carries both a derivatization and a short-range intracrosslink). (b) Correlation of theoretical crosslink possibilities to be considered and size of protein (complex). Calculated numbers of theoretical through-space crosslinks are based on the assumption that protein cleavage reactions may produce no more than one missed cleavage. Please note that for a protein complex of unknown topology, one would need to assume that any peptide might be crosslinked to self or any other peptide in the complex. No additional modifications were considered. (See Colour Plate Section at the end of this book.)
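The counts quoted for Figure 8b can be reproduced with a short calculation. The sketch below is illustrative only and rests on the assumptions stated in the caption: a linear chain, no more than one missed cleavage, and any peptide possibly crosslinked to itself or to any other peptide. The function names are hypothetical and not taken from the original work.

```python
def peptide_count(cleavage_sites: int) -> int:
    """Peptides obtained from a linear chain with at most one missed cleavage."""
    fully_cleaved = cleavage_sites + 1   # n sites cut the chain into n + 1 pieces
    one_missed = cleavage_sites          # each site can be skipped once, joining two neighbours
    return fully_cleaved + one_missed


def through_space_crosslinks(peptides: int) -> int:
    """Unordered peptide pairs, counting a peptide paired with itself."""
    return peptides * (peptides + 1) // 2


if __name__ == "__main__":
    for sites in (2, 3, 700):
        n = peptide_count(sites)
        print(f"{sites} cleavage sites: {n} peptides, "
              f"{through_space_crosslinks(n)} candidate crosslinks")
```

Under these assumptions, 700 cleavage sites yield 982,101 candidate pairs, which the figure rounds to 1,000,000. The quadratic growth is what makes exhaustive consideration of crosslink candidates impractical for large complexes.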
4.2.1 Direct detection
In the conventionally used direct detection strategy, a purified protein complex is first crosslinked and then digested, and the resulting peptides are analyzed by peptide mass fingerprinting in a single-stage MS experiment. The direct detection of masses matching calculated masses of crosslinked peptides provides structural constraints which can aid the generation of a simple topology model and may be utilized to restrict a given protein to possible fold families. As the size of target complexes increases, so does the number of possible crosslinks and, as a result, the need to base the assignment on tandem MS data (Figure 9).
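The mass-matching step of the direct detection strategy can be sketched as follows. This is a minimal illustration, not the procedure of any cited software: it assumes singly protonated ions (as in MALDI), monoisotopic peptide masses, and DSS as the crosslinker, for which a completed crosslink adds C8H10O2 (138.06808 Da). The peptide names, masses and tolerance are hypothetical.

```python
from itertools import combinations_with_replacement

PROTON = 1.007276     # Da, mass of a proton
DSS_LINK = 138.06808  # Da, net mass added by one DSS crosslink (C8H10O2); assumption of this sketch


def candidate_crosslinks(peptide_masses, observed_mz, tol_ppm=50.0):
    """Match observed singly charged m/z values against all peptide-pair crosslink masses.

    peptide_masses: dict of hypothetical peptide name -> monoisotopic neutral mass (Da)
    observed_mz:    iterable of observed m/z values
    Returns a list of (peptide_a, peptide_b, theoretical_mz, observed_mz) tuples.
    """
    hits = []
    pairs = combinations_with_replacement(sorted(peptide_masses.items()), 2)
    for (name_a, mass_a), (name_b, mass_b) in pairs:
        theoretical = mass_a + mass_b + DSS_LINK + PROTON
        for mz in observed_mz:
            if abs(mz - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((name_a, name_b, round(theoretical, 4), mz))
    return hits


if __name__ == "__main__":
    peptides = {"P1": 1000.5000, "P2": 1200.6000, "P3": 900.4000}  # made-up masses
    print(candidate_crosslinks(peptides, [2340.18, 1500.00]))
```

Because every peptide pair is tested against every peak, the number of comparisons grows with the square of the peptide count, and chance matches become more likely as the complex grows. This is one reason why assignments for larger complexes need to be confirmed by tandem MS.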
As on average the mass of a crosslinked peptide (two peptides plus the crosslinking reagent) is more than twice the mass of an uncrosslinked peptide, crosslinked peptides often exhibit poor ionization characteristics. Both ‘soft’ ionization strategies, MALDI and ESI, have been successfully employed for in vitro crosslinking work. MALDI ionization offers the advantage of singly charged ions and therefore simplified mass spectra. This advantage may, however, be offset by the poor fragmentation of these large ions due to their relatively low internal energies. ESI, in contrast, leads to the dilution of signals over multiple charge states for large peptides but compensates for this disadvantage with higher internal energies and, consequently, more informative fragmentation spectra. Various solutions have been offered that facilitate the detection of crosslinked peptides. The most commonly used protocols capitalize on labeling strategies which introduce predictable mass shifts. Here, 18O labeling introduced via proteolytic cleavage in ‘heavy’ water [87], labeling with a mixture of isotope-coded crosslinkers [88,89] or proteins [90] and fluorescently labeled crosslinkers [91,92] have found the widest application. Alternatively, crosslinking reagents have been equipped with features that afford assignment of crosslinks from signature ions in tandem MS spectra [93]. Crosslinking work generates a need for two types of MS data analysis algorithms: algorithms for the identification of candidate crosslink masses within MS spectra, and algorithms which aid in the interpretation of tandem MS spectra and allow assignment of crosslink sites. While various software solutions (e.g. Refs. [87,91,94,95]) and add-in algorithms to protein identification packages [96,97] to address this need have been reported, none of these solutions has found widespread application so far, and the computational analysis of crosslinks has remained a relatively young field of investigation.
A contributing factor here is a lack of systematic data on the fragmentation behaviour of crosslinked peptides. Thus, workflows for the assignment of crosslink sites typically require a semi-manual data analysis approach, i.e. candidate crosslink masses may be selected computationally following initial acquisition of MS spectra. Iterative tandem analyses of candidate crosslinked parent ions then generate informative fragment ion spectra. These can be partially interpreted by some of the algorithms mentioned above. Ultimately, however, crosslink assignments continue to require the manual verification of fragment ion spectra by experienced mass spectrometrists. It is important to note that the commonly used convention for confident protein identifications in interactome work, namely the presence of two strong and unique fragmentation spectra assigned to the same protein entry, cannot be fulfilled in topology/interface mapping investigations relying on the direct detection of crosslinked peptides. Unless data are collected following independent reactions with a range of different crosslinkers, assignments for the mapping of protein topologies/interfaces are based on the interpretation of single tandem MS spectra. In conclusion, the requirement to detect and fragment large peptides limits the sensitivity of direct detection strategies as these peptides frequently exhibit unsatisfactory ionization characteristics. The requirement to base detection on
the interpretation of single tandem MS spectra poses formidable obstacles for data analysis and translates into less confident assignments. Despite impressive progress in this field, the majority of direct detection strategies have so far been of modest practical benefit to mainstream biochemists, largely due to the high purity and quantity of samples required for a successful experiment. Also, the trend towards the use of more complex crosslinking reagents may be counterproductive if the long-term objective is to study protein complex topologies following in vivo crosslinking of cells or intact tissues.
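The idea behind the labeling strategies mentioned in this section, namely that a label introduces a predictable mass shift, can be illustrated for 18O labeling. The sketch below assumes that digestion in heavy water incorporates two 18O atoms at every peptide C-terminus, so that a crosslinked product with two C-termini shifts by twice as much as a linear peptide. Complete incorporation is an idealization, and the function names, tolerance and example values are hypothetical.

```python
O18_MINUS_O16 = 2.004245  # Da, mass difference between 18O and 16O


def expected_shift(c_termini: int, oxygens_per_terminus: int = 2) -> float:
    """Mass shift expected when every C-terminus carries the given number of 18O atoms."""
    return c_termini * oxygens_per_terminus * O18_MINUS_O16


def classify_pair(light_mz: float, heavy_mz: float, tol: float = 0.02) -> str:
    """Classify a light/heavy pair of singly charged ions by their mass difference."""
    shift = heavy_mz - light_mz
    if abs(shift - expected_shift(2)) <= tol:
        return "candidate crosslink (two C-termini)"
    if abs(shift - expected_shift(1)) <= tol:
        return "linear peptide (one C-terminus)"
    return "unassigned"


if __name__ == "__main__":
    print(classify_pair(2000.000, 2008.017))  # shift of about 8.017 Da
    print(classify_pair(1500.000, 1504.008))  # shift of about 4.008 Da
```

In practice incorporation is often incomplete, so real spectra show mixed isotope envelopes rather than clean shifts; a screen of this kind would only nominate candidates for tandem MS.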
4.2.2 Indirect detection
To overcome shortcomings associated with the direct detection of crosslinked peptides, we recently developed a first-generation indirect detection strategy for low-resolution interface mapping of protein complexes [87] (Figure 9). In this strategy crosslinking of purified protein complexes is followed by cyanogen bromide (CNBr) cleavage of the protein samples. The resulting CNBr-cleaved protein fragments are then resolved using two-dimensional (2D) gel electrophoresis. Differential analysis of 2D maps generated from crosslinked versus uncrosslinked samples reveals candidate spots harboring crosslinked CNBr fragments amongst the majority of signals common to both gels that originate from uncrosslinked CNBr fragments. To determine the nature of crosslinked CNBr fragments, spots of interest are excised from the gel, in-gel trypsinized and analyzed by matrix-assisted laser desorption ionization (MALDI) time-of-flight
Figure 9 Flow diagram depicting concepts of direct and indirect topology/interface mapping strategies. Chemical crosslinking of a hypothetical protein complex consisting of two proteins (symbolized by blue and green ribbons) stabilizes regions of spatial proximity either between juxtaposed strands (intercrosslink) or within one strand (intracrosslink). Direct strategy: (a) To enrich crosslinked reaction products, these may be separated from uncrosslinked material (e.g. by gel electrophoresis). (b) The crosslinked fraction is then digested (e.g. with trypsin) and (c) generated peptides are analyzed by MS. (d) On the basis of computational predictions, a subset of candidate crosslink masses is selected for tandem MS analysis. (e) Complex analyses of tandem MS spectra then may reveal the identity of crosslinked peptides and may lead to the identification of amino acids which reacted with the crosslinking reagent and (f) can be used for the generation of a topology model. Indirect strategy: (a) The crosslinked protein complex is fragmented into intermediate-sized fragments using chemical cleavage. An uncrosslinked sample of the same protein complex is also fragmented in a separate reaction. (b) Exploiting differences in their physico-chemical properties, fragments from both uncrosslinked and crosslinked samples are fractionated. Differential profiling enables the identification of crosslinked fragments of interest in the presence of an excess of uncrosslinked fragments. (c) Individual crosslinked fragments are further cleaved into smaller peptides, the majority of which are not directly involved in the crosslink and therefore are not modified by the crosslinking reagent. (d) Mass spectrometry analysis enables mapping of peptides to regions within the primary structure of the protein complex constituents (e). M = methionine residues (CNBr cleavage sites). (f) Thus obtained linkage constraints form the basis for the generation of a topology model. Adapted from Ref. [87].
(See Colour Plate Section at the end of this book.)
(TOF) MS and peptide mass fingerprinting (PMF) or, for large protein complexes, MS/MS. The objective of the MS analysis here is not to detect tryptic peptides that are directly crosslinked. Instead, CNBr fragments which give rise to a candidate crosslink spot are identified by their unmodified tryptic peptides and the use of standard peptide identification software (e.g. Mascot or Sequest). As most CNBr fragments are expected to contain multiple (5 on average) tryptic peptides with an average length of 10 amino acids, a move to an indirect detection strategy increases the chance that a contact site between peptide strands can be identified even if individual tryptic peptides exhibit poor ionization characteristics for MS. For confident inferred assignment of intra- or intermolecular contact regions, this protocol requires that the theoretical cumulative molecular weight and isoelectric point of CNBr fragments identified in a given 2D gel spot match the observed molecular weight and isoelectric point of that gel spot. The current in vitro implementation of this protocol requires 50 pmol starting material and thereby appears to be more sensitive than alternative protocols which rely on the direct detection of crosslinked peptides. It remains to be seen whether the above indirect detection protocol can tolerate the expected increase in sample complexity of samples derived from in vivo crosslinked material.
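The molecular weight part of the consistency check described above can be sketched in a few lines. This is a simplified illustration and not the implementation used in Ref. [87]: it cleaves each sequence after every methionine, uses standard average residue masses, ignores the conversion of C-terminal methionine to homoserine (lactone) by CNBr, omits the isoelectric point comparison, and assumes a DSS-sized linker of about 138.17 Da (average mass). All names, the toy sequence and the tolerance are hypothetical.

```python
from itertools import combinations_with_replacement

# Average residue masses in Da; one water is added per fragment.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def cnbr_fragments(sequence):
    """Split a sequence after every Met (M), the CNBr cleavage site."""
    fragments, current = [], ""
    for residue in sequence:
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


def average_mass(fragment):
    """Average mass of an unmodified fragment (simplification: Met left unchanged)."""
    return sum(RESIDUE_MASS[r] for r in fragment) + WATER


def fragment_pairs_matching_spot(sequences, spot_mw, tol_fraction=0.1, linker=138.17):
    """List fragment pairs whose cumulative mass matches the observed gel spot mass.

    Pairing a fragment with itself is only meaningful for homo-oligomers.
    """
    frags = [(name, idx, frag)
             for name, seq in sequences.items()
             for idx, frag in enumerate(cnbr_fragments(seq), start=1)]
    hits = []
    for a, b in combinations_with_replacement(frags, 2):
        total = average_mass(a[2]) + average_mass(b[2]) + linker
        if abs(total - spot_mw) <= tol_fraction * spot_mw:
            hits.append(((a[0], a[1]), (b[0], b[1]), round(total, 1)))
    return hits


if __name__ == "__main__":
    print(cnbr_fragments("AAMKKMGG"))
    print(fragment_pairs_matching_spot({"P1": "AAMKK"}, spot_mw=704.0, tol_fraction=0.01))
```

Gel-based molecular weight estimates are coarse, so a realistic tolerance would be much wider than the one used in the toy call. This is why the protocol additionally requires the isoelectric point to match and the fragments to be confirmed by their tryptic peptides.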
4.2.3 Validation
Once crosslinking data for a protein complex become available, the spatial constraints they provide form the basis for the generation of a low-resolution topology model. At this point, model-building strategies can be borrowed from high-resolution structure analysis techniques. A key objective during these steps is to derive a model that minimizes the internal energy of the assembled protein complex. Frequently, the high-resolution structure of a protein complex constituent or a homologue thereof may be available and can provide additional structural clues that serve as a starting point for the assembly of the model. Modern interface prediction algorithms can further facilitate the model-building process. A recent manuscript describes the use of a combination of the above strategies to assemble a low-resolution structural model for the bacterial signal recognition particle (Ffh) and its receptor (FtsY) from chemical crosslinking data [98].
5. FUTURE DIRECTIONS
Naturally, the challenges in the field of study reviewed here largely overlap with the general challenges the proteomics research community faces in its quest to fully characterize the protein complement of cells and organisms. As such, key challenges are (i) the spatial and temporal dynamics of protein assemblies, (ii) a dynamic range of observed protein expression levels that spans more than six orders of magnitude [99] and (iii) the complexity of protein regulation. Current technologies largely map stable interactions and are frequently blind towards the influence of post-translational modifications or splicing variants on interactions of individual proteins. Given the overwhelming task ahead, the
research reviewed here can only be considered first steps in a research field that is bound to expand. It is likely that research into protein–protein interactions in the coming years will be dominated by advancements in sample preparation protocols and MS technology. On the sample preparation side, it will be necessary to build sensitive, robust and generic strategies for interface/topology mapping applicable to a large number of diverse bait proteins. Given the invariably low yield of informative crosslink events, progress in this field will hinge upon the availability of a method for the selective retrieval of through-space crosslinked peptides following in vivo crosslinking. Current methods also still perform poorly in defining boundaries of protein complexes, assigning the same bait protein to distinct protein complexes and determining the stoichiometry of protein constituents within a given protein complex. Mass spectrometry offers the ability to monitor dynamic events such as the folding of a protein or the assembly of a protein complex (e.g. Ref. [100]). The current literature in this area is dominated by H/D or chemical derivatization experiments and in vitro crosslinking work. It is, however, conceivable that similar experiments could be carried out in vivo if differentially isotope-coded crosslinkers were added to cells at different time points. On the MS end, the very inefficient introduction of ions into contemporary mass spectrometers and the relatively high chemical and electronic noise levels can be singled out as areas where improvements are to be expected. The recent developments in top-down analysis strategies [83,101,102] offer the exciting possibility that it may some day be possible to routinely subject large protein complexes to comprehensive characterizations of post-translational modifications/alternative splicing. Such an implementation will need to await much needed developments in FT-ICR or related MS technologies.
It will also require improvements to front-end interfaces for MS and fragmentation strategies that provide efficient internal fragmentation of large molecules. With current electron capture dissociation protocols good sequence coverage of N- or C-terminal regions is routinely achieved, but the internal fragmentation of large proteins is hampered by strong hydrogen bonding forces of desolvated proteins in the gas phase. Once the next generation tools become available, the sheer scale of the undertaking will require advancements in sample throughput. Here, proteomics research will have to apply knowledge gained from large-scale DNA sequencing and genome mapping projects. Amongst these, the need for concerted action, consistent data formats and integrated and dynamic data storage are obvious areas for improvement. Eventually, we will hopefully see the emergence of novel imaging tools that narrow the gap between crosslinking-based low-resolution interactome mapping tools that were the focus of this chapter and contemporary high-resolution X-ray, electron microscopy and NMR strategies. Finally, research into protein–protein interfaces will be stimulated by the rapid advances in the field of computational research into the prediction of protein structures and protein–protein interfaces. However, even if technologies were available today that would provide full characterization of dynamic protein assemblies within an in vivo environment, the research community would still face the challenge of translating this knowledge into something useful for society, i.e. diagnostics to
detect and therapeutics to manipulate and defeat diseases. Therefore, it is important that advances in our understanding of protein assemblies are paralleled by equally productive translational and applied research.
ABBREVIATIONS
BSA      Bovine serum albumin
CID      Collision-induced dissociation
CNBr     Cyanogen bromide
DSS      Disuccinimidyl suberate
ESI      Electrospray ionization
FRET     Fluorescence resonance energy transfer
IEP      Isoelectric point
MALDI    Matrix-assisted laser desorption ionization
MS/MS    Tandem mass spectrometry
Mr       Molecular weight
PMF      Peptide mass fingerprinting
QqTOF    Quadrupole time-of-flight
tcTPC    Time-controlled transcardiac perfusion crosslinking
THS      Two-Hybrid System
ACKNOWLEDGEMENTS
We would like to thank the Canadian Institute for Health Research, the Canadian Foundation for Innovation and the W. Garfield Weston Foundation for their support.
REFERENCES
1 P. Aloy and R.B. Russell, The third dimension for protein interactions and complexes, Trends Biochem. Sci., 12 (2002) 633–638.
2 A.C. Gavin, P. Aloy, P. Grandi, R. Krause, M. Boesche, M. Marzioch, C. Rau, L.J. Jensen, S. Bastuck, B. Dumpelfeld, A. Edelmann, M.A. Heurtier, V. Hoffman, C. Hoefert, K. Klein, M. Hudak, A.M. Michon, M. Schelder, M. Schirle, M. Remor, T. Rudi, S. Hooper, A. Bauer, T. Bouwmeester, G. Casari, G. Drewes, G. Neubauer, J.M. Rick, B. Kuster, P. Bork, R.B. Russell and G. Superti-Furga, Proteome survey reveals modularity of the yeast cell machinery, Nature, 440 (2006) 631–636.
3 F. Sobott and C.V. Robinson, Protein complexes gain momentum, Curr. Opin. Struct. Biol., 12 (2002) 729–734.
4 S. Oliver, Proteomics: Guilt-by-association goes global, Nature, 403 (2000) 601–603.
5 J.D. Fenn, M. Mann, C.K. Meng, S.F. Wong and C.M. Whitehouse, Electrospray ionization for mass spectrometry of large biomolecules, Science, 246 (1989) 64–71.
6 M. Karas and F. Hillenkamp, Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons, Anal. Chem., 60 (1988) 2299–2301.
7 K. Tanaka, H. Waki, Y. Ido, S. Akita, Y. Yoshida and T. Yoshida, Protein and polymer analyses up to m/z 100 000 by laser ionization time-of-flight mass spectrometry, Rapid Commun. Mass Spectrom., 2 (1988) 151–153.
8 R.D. Fleischmann, M.D. Adams, O. White, R.A. Clayton, E.F. Kirkness, A.R. Kerlavage, C.J. Bult, J.F. Tomb, B.A. Dougherty and J.M. Merrick, Whole-genome random sequencing and assembly of Haemophilus influenzae Rd, Science, 269 (1995) 496–512.
9 E.S. Lander, L.M. Linton, B. Birren, C. Nusbaum, M.C. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, R. Funke, D. Gage, K. Harris, A. Heaford, J. Howland, L. Kann, J. Lehoczyky, R. Le Vine, P. McEwan, K. McKernan, J. Meldrim, J.P. Mesirov, C. Miranda, W. Morris, J. Naylor, C. Raymond, M. Rosetti, R. Santos, A. Sheridan, C. Sougnez, N. Stange-Thomann, N. Stojanovic, A. Subramanian, D. Wyman and I.H.G.S. Consortium, Initial sequencing and analysis of the human genome, Nature, 409 (2001) 860–921.
10 J.C. Venter, S. Levy, T. Stockwell, K. Remington and A. Halpern, Massive parallelism, randomness and genomic advances, Nat. Genet., 33 (2003) 219–227.
11 R. Aebersold and M. Mann, Mass spectrometry-based proteomics, Nature, 422 (2003) 198–207.
12 D.A. Fancy, Elucidation of protein–protein interactions using chemical crosslinking or label transfer techniques, Curr. Opin. Cell Biol., 4 (2000) 28–33.
13 K. Melcher, New chemical crosslinking methods for the identification of transient protein–protein interactions with multiprotein complexes, Curr. Protein Pept. Sci., 4 (2004) 287–296.
14 A. Sinz, Chemical cross-linkers and Fourier transform ion cyclotron resonance mass spectrometry for structural analysis of a protein/peptide complex, J. Am. Soc. Mass Spectrom., 17 (2006) 1100–1113.
15 S.S. Wong and L.J. Wong, Chemical crosslinking and the stabilization of proteins and enzymes, Enzyme Microb. Technol., 14 (1992) 866–874.
16 B. Metz, G.F. Kersten, P. Hoogerhout, H.F. Brugghe, H.A. Timmermans, A. deJong, H. Meiring, J. ten Hove, W.E. Hennink, D.J. Crommelin and W. Jiskoot, Identification of formaldehyde-induced modifications in proteins: Reactions with model peptides, J. Biol. Chem., 279 (2004) 6235–6243.
17 C. McNulty, J. Thompson, B. Barrett, L. Lord, C. Andersen and I.S. Roberts, The cell surface expression of group 2 capsular polysaccharides in Escherichia coli: The role of KpsD, RhsA and a multi-protein complex at the pole of the cell, Mol. Microbiol., 59 (2006) 907–922.
18 P. Percipalle, A. Jonsson, D. Nashchekin, C. Karlsson, T. Bergman, A. Guialis and B. Daneholt, Nuclear actin is associated with a specific subset of hnRNP A/B-type proteins, Nucleic Acids Res., 30 (2002) 1725–1734.
19 E.E. Weiss, M. Kroemker, A.H. Rudiger, B.M. Jockusch and M. Rudiger, Vinculin is part of the cadherin-catenin junctional complex: Complex formation between alpha-catenin and vinculin, J. Cell Biol., 141 (1998) 755–764.
20 G. Schmitt-Ulms, G. Legname, M.A. Baldwin, H.L. Ball, N. Bradon, P.J. Bosque, K.L. Crossin, G.M. Edelman, S.J. DeArmond, F.E. Cohen and S.B. Prusiner, Binding of neural cell adhesion molecules (N-CAMs) to the cellular prion protein, J. Mol. Biol., 314 (2001) 1209–1225.
21 J.T. Skare, B.M. Ahmer, C.L. Seachord, R.P. Darveau and K. Postle, Energy transduction between membranes. TonB, a cytoplasmic membrane protein, can be chemically crosslinked in vivo to the outer membrane receptor FepA, J. Biol. Chem., 268 (1993) 16302–16308.
22 V. Jackson, Formaldehyde cross-linking for studying nucleosomal dynamics, Methods, 17 (1999) 125–139.
23 S.C. Alley, F.T. Ishmael, A.D. Jones and S.J. Benkovic, Mapping protein–protein interactions in the bacteriophage T4 DNA polymerase holoenzyme using a novel trifunctional photo-crosslinking and affinity reagent, J. Am. Chem. Soc., 122 (2000) 6126–6127.
24 C.E. Brown, L. Howe, K. Sousa, S.C. Alley, M.J. Carrozza, S. Tan and J.L. Workman, Recruitment of HAT complexes by direct activator interactions with the ATM-related Tra1 subunit, Science, 292 (2001) 2333–2337.
25 F. Chu, S. Mahrus, C.S. Craik and A.L. Burlingame, Isotope-coded and affinity-tagged cross-linking (ICATXL): An efficient strategy to probe protein interaction surfaces, J. Am. Chem. Soc., 128 (2006) 10362–10363.
26 M. Trester-Zedlitz, K. Kamada, S.K. Burley, D. Fenyo, B.T. Chait and T.W. Muir, A modular crosslinking approach for exploring protein interactions, J. Am. Chem. Soc., 125 (2003) 2416–2425.
27 J.W. Back, L. de Jong, A.O. Muijsers and C.G. de Koster, Chemical cross-linking and mass spectrometry for protein structural modeling, J. Mol. Biol., 331 (2003) 303–313.
28 S. Fields and O. Song, A novel genetic system to detect protein–protein interactions, Nature, 340 (1989) 245–246.
29 S.L. Ooi, D.D. Shoemaker and J.D. Boeke, DNA helicase gene interaction network defined using synthetic lethality analyzed by microarray, Nat. Genet., 35 (2003) 277–286.
30 A.H. Tong, M. Evangelista, A.B. Parsons, H. Xu, G.D. Bader, N. Page, M. Robinson, S. Raghibizadeh, C.W. Hogue, H. Bussey, B. Andrews, M. Tyers and C. Boone, Systematic genetic analysis with ordered arrays of yeast deletion mutants, Science, 294 (2001) 2364–2368.
31 H. Wallrabe and A. Periasamy, Imaging protein molecules using FRET and FLIM microscopy, Curr. Opin. Biotechnol., 16 (2005) 19–27.
32 S.M. Fernandez and R.D. Berlin, Cell surface distribution of lectin receptors determined by resonance energy transfer, Nature, 264 (1976) 411–415.
33 S.R. Collins, K.M. Miller, N.L. Maas, A. Roguev, J. Fillingham, C.S. Chu, M. Schuldiner, M. Gebbia, J. Recht, M. Shales, H. Ding, H. Xu, J. Han, K. Ingvarsdottir, B. Cheng, B. Andrews, C. Boone, S.L. Berger, P. Hieter, Z. Zhang, G.W. Brown, C.J. Ingles, A. Emili, C.D. Allis, D.P. Toczyski, J.S. Weissman, J.F. Greenblatt and N. Krogan, Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map, Nature, 446 (2007) 806–810.
34 P. Ajuh, B. Kuster, K. Panov, J.C. Zomerdijk, M. Mann and A.I. Lamond, Functional analysis of the human CDC5L complex and identification of its components by mass spectrometry, EMBO J., 19 (2000) 6569–6581.
35 A. Bauer and B. Kuster, Affinity purification-mass spectrometry, Eur. J. Biochem., 270 (2003) 570–578.
36 T. Bouwmeester, A. Bauch, H. Ruffner, P.O. Angrand, G. Bergamini, K. Croughton, C. Cruciat, D. Eberhard, J. Gagneur, S. Ghidelli, C. Hopf, B. Hushe, R. Mangano, A.M. Michon, M. Schirle, J. Schlegl, M. Schwab, M.A. Stein, A. Bauer, G. Casari, G. Drewes, A.C. Gavin, D.B. Jackson, G. Joberty, G. Neubauer, J.M. Rick, B. Kuster and G. Superti-Furga, A physical and functional map of the human TNF-alpha/NF-kappa B signal transduction pathway, Nat. Cell Biol., 6 (2004) 97–105.
37 H. Hernandez, A. Dziembowski, T. Taverner, B. Seraphin and C.V. Robinson, Subunit architecture of multimeric complexes isolated directly from cells, EMBO J., 7 (2006) 605–610.
38 R. Reeves and M.S. Nissen, Interaction of high mobility group-I (Y) nonhistone proteins with nucleosome core particles, J. Biol. Chem., 268 (1993) 21137–21146.
39 N. Krogan, G. Cagney, H. Yu, G. Zhong, G. Xinghua, A. Ignatchenko, J. Li, S. Pu, N. Datta, A.P. Tikuisis, T. Punna, J.M. Peregrin-Alvarez, M. Shales, X. Zhang, M. Davey, M.D. Robinson, A. Paccanaro, J.E. Bray, A. Sheung, B. Beattie, D.P. Richards, V. Canadien, A. Lalev, F. Mena, P. Wong, A. Starostine, M.M. Canete, J. Vlasblom, S. Wu, C. Orsi, S.R. Collins, S. Chandran, R. Haw, J.J. Rilstone, K. Gandi, N.J. Thompson, G. Musso, P. St Onge, S. Ghanny, M.H. Lam, G. Butland, A.M. Altaf-Ul, S. Kanaya, A. Shilatifard, E. O’Shea, J.S. Weissman, C.J. Ingles, T.R. Hughes, J. Parkinson, M. Gerstein, S.J. Wodak, A. Emili and J.F. Greenblatt, Global landscape of protein complexes in the yeast Saccharomyces cerevisiae, Nature, 440 (2006) 637–643.
40 A.C. Gavin, M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J.M. Rick, A.M. Michon, C. Cruciat, M. Remor, C. Hofert, M. Schelder, M. Brajenovic, H. Ruffner, A. Merino, K. Klein, M. Hudak, D. Dickson, T. Rudi, V. Gnau, A. Bauch, S. Bastuck, B. Huhse, C. Leutwein, M.A. Heurtier, R.R. Copley, A. Edelmann, E. Querfurth, V. Rybin, G. Drewes, M. Raida, T. Bouwmeester, P. Bork, B. Seraphin, B. Kuester, G. Neubauer and G. Superti-Furga, Functional organization of the yeast proteome by systematic analysis of protein complexes, Nature, 415 (2002) 141–147.
41 D. Zhou, J.-X. Ren, T.M. Ryan, N.P. Higgins and T.M. Townes, Rapid tagging of endogenous mouse genes by recombineering and ES cell complementation of tetraploid blastocysts, Nucleic Acids Res., 32 (2004) e128.
42 I. Chen, M. Howarth, W. Lin and A.Y. Ting, Site-specific labeling of cell surface proteins with biophysical probes using biotin ligase, Nat. Methods, 2 (2005) 99–104.
43 B.A. Griffin, S.R. Adams and R.Y. Tsien, Specific covalent labeling of recombinant protein molecules inside live cells, Science, 281 (1998) 269–272.
44 M. Selbach and M. Mann, Protein interaction screening by quantitative immunoprecipitation combined with knockdown (QUICK), Nat. Methods, 3 (2006) 981–983.
Covalent Trapping of Protein Interactions in Complex Systems
45 M.B. Goshe and R.D. Smith, Stable isotope-coded proteomic mass spectrometry, Curr. Opin. Biotechnol., 14 (2003) 101–109. 46 S. Sechi and Y. Oda, Quantitative proteomics using mass spectrometry, Nat. Rev. Mol. Cell Biol., 5 (2003) 699–711. 47 J.A. Ranish, E.C. Yi, D.M. Leslie, S.O. Purvine, D.R. Goodlett, J. Eng and R. Aebersold, The study of macromolecular complexes by quantitative proteomics, Nat. Genet., 33 (2003) 349–355. 48 B. Blagoev and M. Mann, Quantitative proteomics to study mitogen-activated protein kinases, Methods, 40 (2006) 243–250. 49 S.P. Gygi, B. Rist, S.A. Gerber, F. Turecek, M.H. Gelb and R. Aebersold, Quantitative analysis of complex protein mixtures using isotope-coded affinity tags, Nat. Biotechnol., 17 (1999) 994–999. 50 A.J. Tackett, J.A. DeGrasse, M.D. Sekedat, M. Oeffinger, M.P. Rout and B.T. Chait, I-DIRT, a general method for distinguishing between specific and nonspecific protein interactions, J. Proteome Res., 4 (2005) 1752–1756. 51 W.M. Old, K. Meyer-Arendt, L. Aveline-Wolf, K.G. Pierce, A. Mendoza, J.R. Sevinsky, K.A. Resing and N.G. Ahn, Comparison of label-free methods for quantifying human proteins by shotgun proteomics, Mol. Cell Proteomics, 4 (2005) 1487–1502. 52 H. Liu, R.G. Sadygov and J.R. Yates, 3rd, A model for random sampling and estimation of relative protein abundance in shotgun proteomics, Anal. Chem., 76 (2004) 4193–4201. 53 M. Suchanek, A. Radzikowsk and C. Thiele, Photo-leucine and photo-methionine allow identification of protein–protein interactions in living cells, Nat. Methods, 2 (2005) 261–267. 54 G. Fragoso and G.L. Hager, Analysis of in vivo nucleosome positions by determination of nucleosome-linker boundaries in crosslinked chromatin, Methods, 11 (1997) 246–252. 55 V. Orlando, H. Strutt and R. Paro, Analysis of chromatin structure by in vivo formaldehyde crosslinking, Methods, 11 (1997) 205–214. 56 J. Wells and P.J. 
Franham, Characterizing transcription factor binding sites using formaldehyde crosslinking and immunoprecipitation, Methods, 26 (2002) 48–56. 57 T.I. Lee, S.E. Johnstone and R.A. Young, Chromatin immunoprecipitation and microarray-based analysis of protein location, Nat. Protoc., 1 (2006) 729–748. 58 T. Sandmann, J.S. Jakobsen and E.E. Furlong, ChIP-on-chip protocol for genome-wide analysis of transcription factor binding in Drosophila melanogaster embryos, Nat. Protoc., 1 (2006) 2839–2855. 59 M.J. Hannah, U. Weiss and W.B. Huttner, Differential extraction of proteins from paraformaldehyde-fixed cells: Lessons from synaptophysin and other membrane proteins, Methods, 16 (1998) 170–181. 60 C. Guerrero, C. Tagwerker, P. Kaiser and L. Huang, An integrated mass spectrometry-based proteomic approach: Quantitative analysis of tandem affinity-purified in vivo crosslinked protein complexes (QTAX) to decipher the 26 S proteasome-interacting network, Mol. Cell Proteomics, 5 (2006) 366–378. 61 J. Vasilescu, X.J. Guo and J. Kast, Identification of protein–protein interactions using in vivo crosslinking and mass spectrometry, Proteomics, 12 (2004) 3845–3854. 62 P. Hajek, A. Chomyn and G. Attardi, Identification of a novel mitochondrial complex containing mitofusin 2 and stomatin-like protein 2, J. Biol. Chem., 282 (2007) 5670–5681. 63 G. Layh-Schmitt, A. Pdtelejnikov and M. Mann, Proteins complexed to the P1 adhesin of Mycoplasma pneumoniae, Microbiology, 146 (2000) 741–747. 64 J.S. Rohila, M. Chen, R. Cerny and M.E. Fromm, Improved tandem affinity purification tag and methods for isolation of protein heterocomplexes from plants, Plant J., 38 (2004) 172–181. 65 G. Schmitt-Ulms, K. Hansen, J. Liu, C. Cowdrey, J. Yang, S. DeArmond, F.E. Cohen, S.B. Prusiner and M.A. Baldwin, Time-controlled transcardiac perfusion cross-linking for the study of protein interactions in complex tissues, Nat. Biotechnol., 22 (2004) 724–731. 66 Y. Bai, K. Markham, F. Chen, R. Weerasekera, J. Watts, P. 
Horne, Y. Wakutani, R. Bagshaw, P.M. Mathews, P.E. Fraser, D. Westaway, P. St. George-Hyslop and G. Schmitt-Ulms, The in vivo brain interactome of the amyloid precursor protein. Under review (2007). 67 B.L. Hood, T.P. Conrads and T.D. Veenstra, Mass spectrometric analysis of formalin-fixed paraffinembedded tissue: Unlocking the proteome within, Proteomics, 6 (2006) 4106–4114.
Rasanjala Weerasekera et al.
68 B.L. Hood, M.M. Darfler, T.G. Guiel, B. Furusato, D.A. Lucas, B.R. Ringeisen, I.A. Sesterhenn, T.P. Conrads, T.D. Veenstra and D.B. Krizman, Proteomics analysis of formalin-fixed prostate cancer tissue, Mol. Cell Proteomics, 4 (2005) 1741–1753. 69 P.O. Angrand, I. Segura, P. Volkel, S. Ghidelli, R. Terry, M. Brajenovic, K. Vintersten, R. Klein, G. Superti-Furga, G. Drewes, B. Kuster, T. Bouwmeester and A. Acker-Palmer, Transgenic mouse proteomics identifies new 14-3-3-associated proteins involved in cytoskeletal rearrangements and cell signaling, Mol. Cell Proteomics, 5 (2006) 2211–2227. 70 E. deBoer, P. Rodriguez, E. Bonte, J. Krijgsveld, E. Katsantoni, A. Heck, F. Grosveld and J. Strouboulis, Efficient biotinylation and single-step purification of tagged transcription factors in mammalian cells and transgenic mice, Proc. Natl. Acad. Sci. U.S.A., 100 (2003) 7480–7485. 71 J. Lomax, Get ready to GO! A biologist’s guide to Gene Ontology, Brief. Bioinform., 6 (2005) 298–304. 72 R.B. Russell, F. Alber, P. Aloy, F.P. Davis, D. Korkin, M. Pichaud, M. Topf and A. Sali, A structural perspective on protein–protein interactions, Curr. Opin. Struct. Biol., 14 (2004) 313–324. 73 A. Sali, R. Glaeser, T. Earnest and W. Baumeister, From words to literature in structural proteomics, Nature, 422 (2003) 216–225. 74 T. Zal and N.R. Gascoigne, Using live FRET imaging to reveal early protein–protein interactions during T cell activation, Curr. Opin. Immunol., 16 (2004) 674–683. 75 J. Buijs and G.C. Franklin, SPR-MS in functional proteomics, Brief. Funct. Genomic Proteomic, 4 (2005) 39–47. 76 D. Nedelkov and R.W. Nelson, Surface plasmon resonance mass spectrometry: Recent progress and outlooks, Trends Biotechnol., 21 (2003) 301–305. 77 L.S. Busenlehner and R.N. Armstrong, Insights into enzyme structure and dynamics elucidated by amide H/D exchange mass spectrometry, Arch. Biochem. Biophys., 433 (2005) 34–46. 78 S.J. Eyles and I.A. 
Kaltashov, Methods to study protein dynamics and folding by mass spectrometry, Methods, 34 (2004) 88–99. 79 C.S. Maier and M.L. Deinzer, Protein conformations, interactions, and H/D exchange, Meth. Enzymol., 402 (2005) 312–360. 80 T.E. Wales and J.R. Engen, Hydrogen exchange mass spectrometry for the analysis of protein dynamics, Mass Spectrom. Rev., 25 (2006) 158–170. 81 P. Aloy, M. Pichaud and R.B. Russel, Protein complexes: Structure prediction challenges for the 21st century, Curr. Opin. Struct. Biol., 15 (2005) 15–22. 82 W. DeLano, Unraveling hot spots in binding interfaces: Progress and challenges, Curr. Opin. Struct. Biol., 12 (2002) 14–20. 83 G.R. Smith and J.E. Sternberg, Prediction of protein–protein interactions by docking methods, Curr. Opin. Struct. Biol., 12 (2002) 28–35. 84 O. Schueler-Furman, C. Wang, P. Bradley, K. Misura and D. Baker, Progress in modeling of protein structures and interactions, Science, 310 (2005) 638–642. 85 A. Sinz, Chemical cross-linking and mass spectrometry for mapping three-dimensional structures of proteins and protein complexes, J. Mass Spectrom., 38 (2003) 1225–1237. 86 R. Weerasekera and G. Schmitt-Ulms, Crosslinking studies for the study of membrane protein complexes and protein interaction interfaces, Biotechnol. Genet. Eng. Rev., 23 (2006) 41–62. 87 R. Weerasekera, Y.M. She, K. Markham, Y. Bhai, N. Opalka, S. Orlicky, F. Sicheri, T. Kislinger and G. Schmitt-Ulms, Ineractome and interface protocol (2IP): A novel strategy for high sensitivity topology mapping of protein complexes, Proteomics, 7 (2007) 3835–3852. 88 D.R. Mueller, P. Schindler, H. Towbin, U. Wirth, H. Voshol, S. Hoving and M.O. Steinmetz, Isotopetagged cross-linking reagents. A new tool in mass spectrometric protein interaction analysis, Anal. Chem., 73 (2001) 1927–1934. 89 K.M. Pearson, L.K. Pannell and H.M. Fales, Intramolecular cross-linking experiments on cytochrome c and ribonuclease A using an isotope multiplet method, Rapid Commun. 
Mass Spectrom., 16 (2002) 149–159. 90 X. Chen, Y.H. Chen and V.E. Anderson, Protein cross-links: Universal isolation and characterization by isotopic derivatization and electrospray ionization mass spectrometry, Anal. Biochem., 273 (1999) 192–203.
91 A. Sinz and K. Wang, Mapping protein interfaces with a fluorogenic cross-linker and mass spectrometry: Application to nebulin–calmodulin complexes, Biochemistry, 40 (2001) 7903–7913. 92 R.N. Wine, J.M. Dial, K.B. Tomer and C.H. Borchers, Identification of components of protein complexes using a fluorescent photo-crosslinker and mass spectrometry, Anal. Chem., 74 (2002) 1939–1947. 93 J.W. Back, A.F. Hartog, H.L. Decker, A.O. Muijsers, L.J. Koning and L. De Jong, A new crosslinker for mass spectrometric analysis of the quarternary structure of protein complexes, J Am. Soc. Mass Spectrom., 12 (2001) 222–227. 94 T. Taverner, N.E. Hall, A.J. O’Hair and R.J. Simpson, Characterization of an antagonist interleukin-6 dimer by stable isotope labeling, cross-linking, and mass spectrometry, J. Biol. Chem., 277 (2002) 46487–46492. 95 B. Schilling, R.H. Row, B.W. Gibson, X. Guo and M.M. Young, MS2Assign, automated assignment and nomenclature of tandem mass spectra of chemically crosslinked peptides, J. Soc. Mass Spectrom., 14 (2003) 834–850. 96 S. Peri, H. Steen and A. Pandey, GPMAW — a software tool for analyzing proteins and peptides, Trends Biochem. Sci., 26 (2001) 687–695. 97 R.J. Chalkley, P.R. Baker, L. Huang, K.C. Hansen, N.P. Allen, M. Rexach and A.L. Burlingame, Comprehensive analysis of a multidimensional liquid chromatography mass spectrometry dataset acquired on a quadrupole selecting quadrupole collision cell, time-of-flight mass spectrometer. II. New developments in protein prospector allow for reliable and comprehensive automatic analysis of large datasets, Mol. Cell Proteomics, 4 (2005) 1194–1204. 98 F. Chu, S.O. Shan, D.T. Moustakas, F. Alber, P.F. Egea, R.M. Stroud, P. Walter and A.L. Burlingame, Unraveling the interface of signal recognition particle and its receptor by using chemical crosslinking and tandem mass spectrometry, Proc. Natl. Acad. Sci. U.S.A., 101 (2004) 16454–16459. 99 S. Ghaemmaghami, W.K. Huh, K. Bower, R.W. Howson, A. Belle, N. 
Dephoure, E.K. O’Shea and J.S. Weissman, Global analysis of protein expression in yeast, Nature, 425 (2003) 737–741. 100 E. Kurucz, I. Ando, M. Sumegi, H. Holzl, B. Kapelari, W. Baumeister and A. Udvardy, Assembly of the Drosophila 26S proteasome is accompanied by extensive subunit rearrangements, Biochem. J., 365 (2002) 527–536. 101 B. Bogdanov and R.D. Smith, Proteomics by FTICR mass spectrometry: Top down and bottom up, Mass Spectrom. Rev., 24 (2005) 168–200. 102 N.L. Kelleher, Top-down proteomics, Anal. Chem., 76 (2004) 197A–203A.
[Plate 4: reaction schemes, not reproduced. (a) Formaldehyde crosslinking. (b) NHS-ester crosslinking (only first step shown), disuccinimidyl suberate (DSS).]
Plate 4 Chemistries of the two most frequently employed crosslinking reagents in the in vivo crosslinking literature. Formaldehyde and DSS rely on dramatically different chemistries for homobifunctional amino group reactive crosslinking. (For Black and White version, see page 250.)
[Plate 5: schematic, not reproduced. (a) Product classes: uncrosslinked; derivatized ('dead end'); short-range intracrosslink; through-space crosslink. (b) Worked counts: 2 cleavage sites give 5 peptides and 5 + 4 + 3 + 2 + 1 = 15 through-space crosslinks; 3 cleavage sites give 7 peptides and 7 + 6 + ... + 2 + 1 = 28 through-space crosslinks; 700 cleavage sites give 1401 peptides and 1401 + 1400 + ... + 2 + 1, approximately 1,000,000, through-space crosslinks.]
Plate 5 Illustration of the sample complexity problem inherent to crosslinking/MS-based topology work. (a) Classification of products resulting from the crosslinking reaction. The true number of theoretical crosslink products is even larger, as any combination of these four main products may occur (e.g. a peptide which carries both a derivatization and a short-range intracrosslink). (b) Correlation between the number of theoretical crosslink possibilities to be considered and the size of the protein (complex). Calculated numbers of theoretical through-space crosslinks are based on the assumption that protein cleavage reactions may produce no more than one missed cleavage. Please note that for a protein complex of unknown topology, one would need to assume that any peptide might be crosslinked to itself or to any other peptide in the complex. No additional modifications were considered. (For Black and White version, see page 262.)
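The counts in panel (b) of Plate 5 can be reproduced with a few lines of arithmetic. The sketch below is only an illustration of the caption's stated assumptions (a linear chain, at most one missed cleavage, any peptide possibly crosslinked to itself or to any other peptide); the function names are hypothetical and do not come from the chapter. For 700 cleavage sites the sum evaluates to 982,101, which the plate rounds to 1,000,000.

```python
def peptide_count(cleavage_sites: int) -> int:
    """Number of distinct peptides from a linear chain when each
    peptide may contain at most one missed cleavage."""
    fully_cleaved = cleavage_sites + 1   # pieces between consecutive sites
    one_missed = cleavage_sites          # each site skipped once joins two neighbours
    return fully_cleaved + one_missed


def through_space_crosslinks(peptides: int) -> int:
    """Unordered peptide pairs, counting a peptide paired with itself:
    p + (p - 1) + ... + 2 + 1 = p * (p + 1) / 2."""
    return peptides * (peptides + 1) // 2


for sites in (2, 3, 700):
    n = peptide_count(sites)
    print(sites, n, through_space_crosslinks(n))
```

The quadratic growth of the pair count with the number of peptides is the sample complexity problem that the plate illustrates.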
[Plate 6: flow diagram, not reproduced. Two columns, "Direct detection" and "Indirect detection", each with steps a to f. Labels in the diagram include inter-crosslink, intra-crosslink, crosslinked, uncrosslinked, MW, pI, PMF, MS/MS, m/z and iterative analyses.]
Plate 6 Flow diagram depicting concepts of direct and indirect topology/interface mapping strategies. Chemical crosslinking of a hypothetical protein complex consisting of two proteins (symbolized by blue and green ribbons) stabilizes regions of spatial proximity either between juxtaposed strands (intercrosslink) or within one strand (intracrosslink). Direct strategy: (a) To enrich crosslinked reaction products, these may be separated from uncrosslinked material (e.g. by gel electrophoresis). (b) The crosslinked fraction is then digested (e.g. with trypsin) and (c) generated peptides are analyzed by MS. (d) On the basis of computational predictions, a subset of candidate crosslink masses is selected for tandem MS analysis. (e) Complex analyses of tandem MS spectra then may reveal the identity of crosslinked peptides and may lead to the identification of amino acids which reacted with the crosslinking reagent and (f) can be used for the generation of a topology model. Indirect strategy: (a) The crosslinked protein complex is fragmented into intermediate-sized fragments using chemical cleavage. An uncrosslinked sample of the same protein complex is also fragmented in a separate reaction. (b) Exploiting differences in their physico-chemical properties, fragments from both uncrosslinked and crosslinked samples are fractionated. Differential profiling enables the identification of crosslinked fragments of interest in the presence of an excess of uncrosslinked fragments. (c) Individual crosslinked fragments are further cleaved into smaller peptides, the majority of which are not directly involved in the crosslink and therefore are not modified by the crosslinking reagent. (d) Mass spectrometry analysis enables mapping of peptides to regions within the primary structure of the protein complex constituents (e). M = methionine residues (CNBr cleavage sites). (f) The linkage constraints thus obtained form the basis for the generation of a topology model. Adapted from Ref. [87]. (For Black and White version, see page 264.)
CHAPTER 12
Phosphoproteomics
Martin R. Larsen and Phillip J. Robinson

Contents
1. Introduction to Phosphoproteomics
2. Strategies for Enrichment of Phosphorylated Peptides
3. Mass Spectrometric Analysis of Phosphorylated Peptides
4. Quantitative Phosphoproteomics
5. Factors Affecting Phosphoproteomics
6. Conclusion
Acknowledgements
References
Comprehensive Analytical Chemistry, Volume 52. ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00212-2
© 2009 Elsevier B.V. All rights reserved.

1. INTRODUCTION TO PHOSPHOPROTEOMICS

Protein phosphorylation is the most widespread post-translational modification found in nature. This means of cellular regulation results from a complicated interplay between protein kinases and protein phosphatases. Protein kinases add a phosphate group to a specific amino acid in a protein, and protein phosphatases remove the phosphate group again, thereby controlling the regulation of any given protein. Cells utilize reversible phosphorylation of proteins to control all aspects of their operation, such as turning metabolism on or off, initiating or terminating the cell cycle, or responding to extracellular signals (e.g., [1]). More specifically, phosphorylation of a protein alters its function in a variety of ways, such as increasing or decreasing enzymatic activity, or regulating protein–protein, protein–lipid and protein–nucleic acid interactions.

Proteins are phosphorylated on specific amino acid residues in their sequence, mostly Ser, Thr or Tyr, and rarely His. Proteins are frequently phosphorylated on more than one site, occasionally on up to 50 sites. Multisite phosphorylation can allow control of distinct functions of the protein that are governed by different domains in the protein. It also allows complex regulation of which protein kinases can access a specific site, so that the protein acts as a signal transduction processing unit when, for
example, phosphorylation of one site by one protein kinase is required prior to phosphorylation of another site by another kinase. Often a complicated interplay between several phosphorylated amino acids in a protein determines its interaction partners and activity, and the technical ability to identify and characterize the phosphorylation pattern is essential for understanding the function of such a protein.

Phosphoproteomics is a relatively new field that aims to identify and map the phosphorylation sites in a single protein, in the multiple proteins inside a cell organelle, or, most commonly, in an entire cell or tissue. It is mostly achieved by mass spectrometry (MS) techniques, which have improved enormously in sensitivity and accuracy in recent years. It also aims to compare the phosphorylation sites in such samples before and after a key cell process or disease state, to better understand its biological regulation.

There are many difficult challenges in phosphoproteomics, including imperfect MS methods and inherent difficulties with the proteins themselves. Many abundant cellular proteins, such as metabolic enzymes or structural proteins, are phosphorylated. The proteins that are regulated by phosphorylation, however, are frequently present in very low copy numbers in the cell because they are signaling molecules. Sometimes only a small fraction of these molecules is phosphorylated at a given time (low stoichiometry). These problems make it a challenge to find functionally important phosphorylation sites. Fast, efficient and sensitive methods are required to characterize phosphorylation events.

MS identification and sequencing of phosphopeptides is inherently more difficult than that of non-phosphopeptides, and many problems arise from their much lower abundance. Attempts to apply traditional "shotgun" proteomic analysis rarely work well.
In this approach, the protein complement of a given cell is cleaved into peptides, followed by tandem mass spectrometric analysis for identification of as many peptides as possible. This rarely provides identification of phosphorylated peptides or, in general, of other post-translationally modified peptides. There are many reasons for this. Firstly, phosphorylated peptides are mostly present in only low amounts in comparison to the non-phosphorylated peptides. This is due either to the low abundance of the protein they originate from, or to the low stoichiometry of phosphorylation in an abundant protein (which reduces their effective abundance up to 100-fold). Secondly, phosphorylated peptides are more negatively charged and are therefore suppressed in the ionization process, giving rise to lower signals in the MS analysis. Since mass spectrometers perform fragmentation on a limited number of peptides at any one time, only the most intense ions are selected for fragmentation, and these are predominantly the non-phosphorylated peptides. Therefore, to identify the complete phosphoprotein complement in a cell at a given time (i.e., the phosphoproteome), enrichment methods are required to separate the phosphorylated peptides from the non-phosphorylated peptides prior to tandem MS analysis.

Numerous strategies have been developed for large-scale phosphoproteomics in recent years. The goal of all strategies is to simplify the sample prior to phosphopeptide enrichment in order to increase the number of phosphorylated peptides that can be obtained from a single sample. Strategies for simplifying the
target sample involve pre-purification of subproteomes or cell organelles or purification of individual protein complexes by pulldowns or immunoprecipitation (IP). These will not be discussed further here. Intense effort has focused on optimizing strategies for the enrichment of phosphopeptides prior to MS. A broad outline of phosphoproteomic strategies is provided in Figure 1.
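The argument above, that low stoichiometry and ion suppression keep phosphopeptides out of the set of precursors selected for fragmentation, can be made concrete with a toy calculation. This is a minimal sketch under stated assumptions: every number (copy numbers, the 1% occupancy, the 0.5 ionization factor, the choice of three precursors per cycle) is a hypothetical placeholder chosen for illustration, and the selection rule is a simplified "pick the N most intense ions" model rather than the acquisition logic of any particular instrument.

```python
def signal(copies, occupancy=1.0, ionization=1.0):
    """Toy MS signal: protein copies x fraction carrying the form of
    interest x relative ionization efficiency (all values hypothetical)."""
    return copies * occupancy * ionization


def top_n(peptides, n):
    """Names of the n most intense peptides, mimicking intensity-based
    selection of precursors for fragmentation."""
    ranked = sorted(peptides, key=lambda p: p["signal"], reverse=True)
    return [p["name"] for p in ranked[:n]]


# Hypothetical mixture: three unmodified peptides and one phosphopeptide
# from an abundant protein phosphorylated at 1% stoichiometry.
mixture = [
    {"name": "pep_A", "phospho": False, "signal": signal(1e5)},
    {"name": "pep_B", "phospho": False, "signal": signal(5e4)},
    {"name": "pep_C", "phospho": False, "signal": signal(2e4)},
    {"name": "pep_D_phos", "phospho": True,
     "signal": signal(1e5, occupancy=0.01, ionization=0.5)},
]

# Without enrichment the phosphopeptide is not among the selected ions.
selected_raw = top_n(mixture, 3)

# Idealized enrichment: only phosphopeptides remain before selection.
enriched = [p for p in mixture if p["phospho"]]
selected_enriched = top_n(enriched, 3)

print(selected_raw)
print(selected_enriched)
```

In this toy model the phosphopeptide is fragmented only after the non-phosphorylated peptides have been removed, which is the rationale for the enrichment strategies discussed in the next section.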
2. STRATEGIES FOR ENRICHMENT OF PHOSPHORYLATED PEPTIDES

Comprehensive phosphoproteomics requires the ability to take a complex peptide mixture, selectively isolate the phosphorylated peptides from the non-phosphorylated peptides, and then identify and sequence the total complement of phosphorylated peptides. Several strategies for enrichment of phosphorylated peptides have been developed for this purpose, of which the most commonly used are immobilized metal affinity chromatography (IMAC) [2–9], chemical derivatization [10,11] and titanium dioxide (TiO2) chromatography [12–14]. While all are effective, no method or combination thereof yet offers complete coverage of the phosphoproteome.

Traditionally, IMAC has been the method of choice for enrichment of phosphorylated peptides from total proteolytic digests. In this method, the phosphorylated peptides are reversibly bound to a chelating material containing metal ions, which target the phosphate group [2,4,5]. The most commonly used metal ions are iron and gallium. The phosphorylated peptides are bound to the IMAC material in the presence of a weak acid and are eluted from the material using: (i) high or low pH, (ii) metal chelators such as ethylenediaminetetra-acetate (EDTA), or (iii) reagents that interfere with the binding between the metal ion and the phosphate group, such as phosphoric acid or phosphate buffer.

Despite its advantages and ease of use, IMAC has three main problems. Firstly, IMAC provides a significant enrichment of phosphopeptides from simple mixtures, but its selectivity towards phosphorylated peptides decreases with increasing complexity of the initial sample. Secondly, peptides that are acidic (multiple Glu or Asp amino acids) but not phosphorylated bind strongly to IMAC, resulting in a phosphopeptide pool containing a significant amount of "non-specific" (or undesirable) peptides.
To circumvent the problem of non-specific binding, O-methylesterification has been suggested as a means to block the acidic amino acids [6]. However, this can result in incomplete methylation and unwanted side reactions such as deamidation, leading to additional methylation of the peptide. The O-methylesterification can generate multiple variants of the same peptide, making the peptide mixture more complex and decreasing the overall output from the MS analysis. Alternatively, non-specific binding of acidic peptides can be reduced by loading the total digest onto the IMAC material in a slightly more acidic buffer containing 0.1% trifluoroacetic acid (TFA) [9]. However, this can also result in less binding of phosphopeptides, especially mono-phosphorylated peptides [15]. Thirdly, IMAC has the unusual property of different enrichment efficiencies for different phosphopeptides, depending on the loading/incubation buffer [16]. Inclusion of various detergents, salts and low-molecular-weight molecules such as EDTA can decrease (or even abolish) the affinities of phosphopeptides for the IMAC material, especially for mono-phosphorylated peptides [16]. This results in different pools of IMAC-purified phosphopeptides from the same sample, depending on the buffers used in the initial experiment. Therefore, most experiments using IMAC include a reversed-phase (RP) chromatography purification step prior to the phosphopeptide enrichment on IMAC. This improves the sample quality but carries a risk of losing valuable material [17]; in particular, any short or hydrophilic phosphopeptide would be lost on the RP column.

Figure 1 Overview of phosphoproteomics strategies and combinations of methods.
[Flow diagram, not reproduced. Boxes, from top to bottom:
Cell culture: sub-proteome purification; organelle purification; protein pulldowns or IPs; total protein purification.
Protein mixture.
Enriched phosphoprotein mixture: Qiagen kit; Al(III) hydroxide; other.
Proteolysis.
Peptide mixture.
Pre-separation of peptides: isoelectric focusing; strong anion/cation exchange; hydrophilic interaction chromatography; other.
Affinity enrichment of phospho-tyrosine-containing peptides.
Enrichment of phosphopeptides: IMAC; beta-elimination; phosphoramidate chemistry; TiO2 or ZrO2 chromatography; other.
Tandem MS analysis of phosphopeptides: CID; ECD/ETD; pdMS3; multistage activation; other.
Phosphopeptide identification.]

The second strategy for enrichment of phosphopeptides from complex proteolytic digests involves two chemical derivatization methods. In the first method, the phosphate group attached to a serine or threonine residue is chemically removed by beta-elimination at high pH. This leaves behind a dehydro-amino acid product which is suitable for nucleophilic attachment by a Michael addition reaction [10,18]. A variety of chemical groups can be attached to the dehydro-amino acid, making it easier to detect the former location of the phosphorylation site by MS. By including stable isotope labeling, or affinity tagging with biotin or other tags, this strategy can be further adapted for peptide quantitation [19,20]. There are several limitations to this method, including co-purification of O-glycosylated peptides, lack of efficient derivatization of multiply-phosphorylated peptides and unwanted side reactions such as peptide backbone cleavage [18]. Therefore, it requires considerable attention to be used effectively. The second chemical derivatization method was introduced by Aebersold and co-workers and involves derivatization of phosphorylated peptides using phosphoramidate chemistry [11]. Here the phosphate group on each phosphopeptide is activated and coupled covalently via a phosphoramidate bond to a solid support.
This method requires that the peptide mixture is O-methylesterified to prevent acidic peptides from binding to the solid support, with the disadvantages that follow from this chemical step. Therefore, it requires substantial hands-on experience, is time-consuming and is not in widespread use.

The third strategy is rapidly becoming the most popular approach. TiO2 was recently introduced as an alternative to IMAC for enrichment of phosphopeptides [12–14]. When it was first introduced in 2004 [12], the initial peptide loading buffer conditions resulted in high binding of non-phosphorylated peptides, limiting its widespread utility. A breakthrough in the enrichment of phosphorylated peptides using TiO2 came with the introduction of more stringent buffer conditions for loading the peptides onto the TiO2 resin. Lower pH and substituted aromatic acids resulted in much less non-specific binding [14]. This procedure was further optimized [16,21] and is now routinely used for phosphoproteomics studies (e.g., [22–26]). The mechanism of binding of phosphate to TiO2 is unclear, but we speculate that the binding between TiO2 and the phosphate group is a bridging bidentate binding [14] which is very strong at low pH but which can be dissociated using high pH. Multiply-phosphorylated peptides bind much more strongly to the TiO2 resin and are difficult to recover even with high pH, suggesting a co-operative binding effect for multiply-phosphorylated peptides.

TiO2 chromatography has several advantages over IMAC. Firstly, biological samples can be applied to TiO2 columns in the presence of a wide variety of detergents, surfactants, phosphate buffers, EDTA and other reagents without decreasing the purification efficiency [16]. This makes the
procedure much more flexible. Secondly, in the optimized loading buffers TiO2 does not bind acidic peptides that are not phosphorylated. Thus, it allows for isolation of a phosphopeptide pool that is far less contaminated with non-phosphopeptides.

Tyrosine phosphorylation is mainly involved in receptor signaling upon external stimulation. It is the least abundant phosphorylation event in the mammalian cell. It is therefore more difficult to detect tyrosine-phosphorylated peptides in the presence of abundant serine- and threonine-phosphorylated peptides, unless substantial fractionation and phosphopeptide enrichment is used. Immunoaffinity enrichment of tyrosine-phosphorylated peptides using anti-phosphotyrosine antibodies has been applied to measure the temporal dynamics of tyrosine phosphorylation in insulin signaling [27]. Using 3T3-L1 adipocytes stimulated with insulin for increasing times, a total of 122 tyrosine phosphorylation sites on 89 proteins was reported. In comparison, Thingholm and co-workers identified 60 tyrosine phosphorylation sites in a plasma membrane preparation from non-stimulated human mesenchymal stem cells using TiO2 chromatography in combination with sodium pervanadate treatment (a tyrosine phosphatase inhibitor) [28]. Therefore, the optimal strategy for the study of the tyrosine-phosphoproteome is not yet clear, and TiO2 may prove very important in this area.

Despite the numerous strategies used in modern phosphoproteomic studies for affinity enrichment of phosphorylated peptides prior to mass spectrometric analysis, no single method is able to provide complete coverage of the phosphoproteome of a given sample [24].
One of the reasons for the large differences between the enrichment strategies is likely that phosphopeptides differ with respect to amino acid content and the number of phosphate groups attached to each peptide, providing a different binding affinity to each type of chromatographic material. In addition, the micro-environment of a peptide in which a phosphate group is located, represented by local amino acid functional groups in that part of the sequence, has a large influence on the physico-chemical properties of the resulting phosphopeptide and thereby contributes to differences in binding affinity to different chromatographic materials. This means that a peptide which has the same amino acid sequence but a different site of phosphorylation will most likely have different binding properties to different chelating materials.

Figure 2 Enrichment of phosphopeptides using TiO2 and IMAC. (A) MALDI-MS analysis of a peptide mixture derived from tryptic digestion of transferrin (human), serum albumin (bovine), beta-lactoglobulin (bovine), carbonic anhydrase (bovine), beta-casein (bovine), alpha-casein (bovine), ovalbumin (chicken), ribonuclease B (bovine pancreas), alcohol dehydrogenase (baker's yeast), myoglobin (whale skeletal muscle), lysozyme (chicken) and alpha amylase. (B) MALDI-MS analysis of TiO2 enriched phosphopeptides from 500 fmol of the peptide mixture. (C) MALDI-MS analysis of IMAC enriched phosphopeptides from 500 fmol of the peptide mixture. (D) MALDI-MS analysis of phosphopeptides purified from the flow-through from the IMAC beads using TiO2. The phosphopeptides are labeled with asterisks.
[Spectra not reproduced; individual peak labels omitted. Panel titles: (A) Total digest of 12 proteins; (B) TiO2; (C) Fe3+-IMAC; (D) Fe3+-IMAC then TiO2. Axes: intensity against m/z, with the m/z axis labeled from 1000 to 3500.]

A direct comparison of the same sample separated on IMAC vs TiO2 reveals a number of differences between them (Figure 2). The figure shows enrichment of
phosphorylated peptides from 500 fmol of a peptide mixture originating from tryptic digestion of 12 different proteins, including the phosphoproteins alpha-casein, beta-casein and ovalbumin, using IMAC (iron) and TiO2 chromatography. The peptide mixture was loaded onto an IMAC micro-column in 0.1% TFA/50% acetonitrile, whereas it was loaded onto a TiO2 micro-column using an optimized buffer consisting of 80% acetonitrile/5% TFA/1 M glycolic acid [16]. In both cases the phosphorylated peptides were eluted from the micro-columns using ammonia water (pH 11.3) and analyzed by MS. Figure 2A shows the direct analysis of the peptide mixture by matrix-assisted laser desorption ionization (MALDI)-MS. The MALDI-MS peptide mass maps of the phosphopeptides enriched using TiO2 and IMAC are shown in Figure 2B and C, respectively. The phosphorylated peptides are indicated by asterisks. Both strategies were able to selectively enrich the phosphorylated peptides from the mixture. However, the signal intensities were markedly higher for TiO2, suggesting a more efficient enrichment using this method. The ionization efficiency of mono-phosphorylated peptides is significantly higher than that of multiply-phosphorylated peptides. This, combined with the low signals from the mono-phosphorylated peptides in the IMAC purification, suggests a lower binding affinity of those peptides for IMAC in the loading buffer. Indeed, TiO2 chromatography of the IMAC flow-through revealed a substantial amount of mono-phosphorylated peptides that were not captured on IMAC (Figure 2D). These data show a substantial strength of TiO2 in isolating fewer non-specific peptides, making the identification of more phosphopeptides much easier. We have recently developed a new strategy for the separation of mono-phosphorylated peptides from multiply-phosphorylated peptides in complex mixtures which takes advantage of the different selectivity of IMAC and TiO2 [15].
MS analysis of complex peptide mixtures is greatly improved by a variety of pre-separation techniques that reduce the complexity of the proteolytic digest prior to loading onto a liquid chromatography (LC)–MS system. To increase coverage of the phosphoproteome, all of the enrichment strategies above can be readily combined with existing pre-separation techniques. Such techniques include isoelectric focusing, strong cation exchange, strong anion exchange and hydrophilic interaction chromatography. Instead of a single LC–MS run, these strategies result in many sequential runs using the pre-separated pools, which greatly increases proteome coverage. The phosphoproteome is no different, and the use of pre-separation methods is highly recommended in phosphoproteomics to split the phosphopeptides isolated by TiO2, IMAC or other methods into smaller pools prior to MS analysis of each.
3. MASS SPECTROMETRIC ANALYSIS OF PHOSPHORYLATED PEPTIDES
Tandem MS has now become the preferred way to identify phosphorylated peptides and locate the phosphorylation sites within them. This is due to its unique sensitivity, rapid analysis and ability to deal with very complex mixtures, especially
in combination with nano-LC. However, comprehensive phosphoproteomics is still in its infancy and various problems are specifically associated with the MS analysis of phosphorylated peptides.
Having obtained a pool enriched for phosphopeptides, the next step is to deliver them into the mass spectrometer for sequencing. MS depends on peptides being ionized prior to entry into the instrument, and the comprehensive analysis of phosphopeptides is compromised by the ionization process itself. Electrospray ionization favors the ionization of non-phosphorylated peptides over phosphorylated peptides. This results in an overall reduced analysis of phosphorylated peptides in the presence of non-phosphorylated peptides, because the mass spectrometer can only automatically select a limited number of ions for fragmentation before moving on to the next ions eluting from the LC. Thus, the software selects the stronger signals rather than the weaker phosphopeptide signals. Mono-phosphorylated peptides have a higher ionization efficiency than multiply-phosphorylated peptides (those carrying more than one phosphate group), adding a further bias to the MS analysis in favor of mono-phosphorylated peptides. These factors contribute to obtaining a less than complete phosphoproteome.
To obtain the amino acid sequence of an ionized peptide using tandem MS (MS/MS), peptides are traditionally fragmented by accelerating them through an electrical potential and colliding them with an inert gas. This is called collision-induced dissociation (CID). CID provides fragmentation of the peptide backbone and thus information on the amino acid sequence of the peptide and the location of the specific phosphorylation site. Fragmentation of phosphorylated peptides by CID frequently results in the loss of phosphoric acid from the phosphorylated amino acid (serine and threonine).
This is because it is the preferred fragmentation pathway at low energy, whereas fragmentation of the peptide backbone requires higher energy. In practice this means that the target phosphate is lost before fragmentation of the peptide backbone takes place. This significantly decreases the chances of assigning the phosphorylated peptide to a specific amino acid sequence in the subsequent database search. This is one of the reasons why large scale phosphoproteomics is normally able to identify only a relatively low number of the phosphopeptides automatically selected by the instrument for fragmentation (and hence for sequencing).
There are two phosphopeptide scanning techniques that offer alternative ways to identify phosphorylation sites in phosphopeptides by MS. These are normally performed on a total proteolytic digest without any prior peptide enrichment. The first is precursor ion scanning [29]. This technique allows detection of modified peptides by recording specific diagnostic fragment ions lost from the peptide during CID. Most phosphorylated peptides are detected in negative ion mode by the loss of PO3− (m/z 79) and then subsequently fragmented in positive ion mode. An instrument with fast and robust switching between negative and positive mode is required for this technique. Precursor ion scanning in positive ion mode for the immonium ion of phosphotyrosine (m/z 216.043) [30] can be used to selectively analyze the small pool of tyrosine-phosphorylated peptides normally present in large scale phosphoproteomics. Another method for selecting phosphorylated peptides from within a complex
mixture for fragmentation is neutral loss scanning, where the loss of phosphoric acid is detected after elevated collision energy [31]. In general, the phosphopeptide scanning methods used on whole proteolytic digests are no longer applicable to large scale phosphoproteomics due to the efficiency of the IMAC and TiO2 enrichment methods. These now ensure that a high percentage of the peptides automatically selected by the instrument for fragmentation are phosphorylated (up to >80% [15,16]). Detecting the presence of the phosphate group in such enriched fractions by ion scanning would only decrease the number of phosphopeptides that are then selected for fragmentation.
There are several techniques that can increase the number of phosphopeptides identified from the same starting pool. One is phosphopeptide-directed MS3 (pdMS3) using an ion trap MS. In this technique, an ion originating from a neutral loss of phosphoric acid detected in the normal MS2 is subsequently selected for a second round of CID (now called MS3) to obtain even more peptide sequence information [32]. To generate useful pdMS3 information more starting material is required, because the fragmentation only uses a subset of the original material that was used for tandem MS analysis. Decreasing the starting material will also decrease the benefit derived from the pdMS3 experiment. This technique has recently been successfully applied to large scale phosphoproteomics studies [8,22,32]. Several problems are associated with pdMS3, including the need for refilling the ion trap and the extra steps of isolating and fragmenting the neutral loss ion. An alternative to pdMS3 is multistage activation (MSA) [33]. In this technique, the neutral loss ion is subjected to CID while the fragments from the precursor ion are still present in the ion trap. This eliminates the re-filling and isolation steps. The result is a composite spectrum that contains product ions from both the precursor and the neutral loss product.
This spectrum contains much more structural information than MS/MS spectra and produces more confident database search scores. The outcome is also more convenient and easier to interpret, as each precursor ion provides only one fragment ion spectrum instead of two.
Another way to overcome the problems with poor fragmentation of phosphorylated peptides using CID is electron transfer dissociation (ETD) [34,35]. ETD is a technique whereby the peptides are dissociated by transferring electrons to positively charged peptide ions. The electron transfer results in cleavage of the amide groups along the peptide backbone, producing predominantly c and z ions [36]. The major benefit is that this leaves the amino acid side chains and post-translational modifications intact, so the peptides can be sequenced more readily. Recently, ETD has successfully been applied to large scale phosphoproteomics [26,37]. Combining CID and ETD on the same set of enriched phosphorylated peptides will result in a distinct overlap in the identified phosphorylated peptides, increasing coverage of the phosphoproteome. This is mainly a result of the different fragmentation mechanisms and the fact that CID performs better on peptides with lower charge states (<3), whereas ETD performs best on charge states >3. Therefore, the success of analyzing phosphorylated peptides by ETD largely relies on the method of proteolytic cleavage, which should ensure that peptides with 3 or more charges are generated.
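The neutral-loss trigger that pdMS3 and MSA rely on can be illustrated with a short sketch. It is not the acquisition logic of any particular instrument: the function names, the m/z tolerance and the intensity threshold are assumptions chosen for illustration, and the example peak intensities are invented. The idea is simply that a precursor of charge z that loses one molecule of phosphoric acid (about 97.977 Da) appears at an m/z that is lower by 97.977/z.

```python
# Minimal sketch of a neutral-loss check on an MS2 peak list.
# Assumptions (not from the text): tolerance, threshold and function names.

H3PO4 = 97.976896  # monoisotopic mass of phosphoric acid, Da


def neutral_loss_mz(precursor_mz, charge, n_losses=1):
    """m/z of the precursor after losing n_losses molecules of H3PO4."""
    return precursor_mz - n_losses * H3PO4 / charge


def find_neutral_loss_peak(peaks, precursor_mz, charge,
                           tol=0.5, min_rel_intensity=0.5):
    """Return the (m/z, intensity) peak matching a single H3PO4 loss, or None.

    peaks is a list of (m/z, intensity) tuples. A peak qualifies when it lies
    within tol of the expected neutral-loss m/z and reaches at least
    min_rel_intensity of the base peak intensity.
    """
    if not peaks:
        return None
    target = neutral_loss_mz(precursor_mz, charge)
    base_intensity = max(intensity for _, intensity in peaks)
    candidates = [
        (mz, intensity)
        for mz, intensity in peaks
        if abs(mz - target) <= tol
        and intensity >= min_rel_intensity * base_intensity
    ]
    if not candidates:
        return None
    # The most intense qualifying peak would be selected for MS3 or MSA.
    return max(candidates, key=lambda peak: peak[1])


# Illustrative peak list for a triply charged precursor at m/z 467.88.
# The intensities are made up for the example.
example_peaks = [(294.67, 40.0), (402.67, 50.0), (435.33, 100.0), (814.34, 10.0)]
trigger = find_neutral_loss_peak(example_peaks, precursor_mz=467.88, charge=3)
```

In this toy spectrum the peak at m/z 435.33 falls within the 0.5 m/z window around the expected neutral-loss position and would be the ion passed on for a further round of CID.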
A comparative example of these techniques using normal tandem MS/MS, pdMS3 and ETD is shown in Figure 3. The MS fragmentation analysis of a triply charged peptide derived from Lamin-A/C (sp|P02545) is shown. The peptide (LRLpSPpSPTSQR, (M+3H)3+ = 467.88) is phosphorylated on two of its serine residues. The MS/MS analysis resulted in the loss of one and two phosphate groups, and only a few signals originating from fragmentation of the peptide backbone were retained (Figure 3A). CID preferentially fragments on the N-terminal side of prolines (y5 and b6), which can result in poor fragmentation of phosphopeptides phosphorylated by proline-directed kinases (S/T-P). The subsequent pdMS3 analysis of the ion corresponding to the loss of the first phosphate group resulted in a fragment ion spectrum dominated by the loss of a second phosphoric acid and very little peptide backbone fragmentation (Figure 3B). Analysis of the same peptide by ETD results in no loss of phosphoric acid and almost full fragment ion coverage of the peptide by c and z ions [36] (Figure 3C). ETD, in contrast to CID, does not fragment N-terminal to prolines and will therefore provide better backbone coverage of such phosphopeptides. This illustrates the strength of using different fragmentation methods to increase the coverage of the phosphoproteome.
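The precursor m/z quoted above can be checked with a small calculation. The sketch below is an illustration rather than part of the original analysis: it sums standard monoisotopic residue masses, adds water and one HPO3 group (about 79.966 Da) per phosphorylation site, and then adds one proton per charge. The function name and the way the number of phosphate groups is passed in are assumptions made for the example.

```python
# Minimal sketch: monoisotopic m/z of a phosphopeptide ion.
# Residue masses are standard monoisotopic values in Da.

RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
HPO3 = 79.966331    # mass added per phosphorylated residue
PROTON = 1.007276   # mass of one charge-carrying proton


def peptide_mz(sequence, n_phospho, charge):
    """Monoisotopic m/z of [M + charge*H]^charge+ for an unmodified sequence
    carrying n_phospho phosphate groups."""
    neutral_mass = (sum(RESIDUE_MASS[aa] for aa in sequence)
                    + WATER + n_phospho * HPO3)
    return (neutral_mass + charge * PROTON) / charge


# LRLpSPpSPTSQR: unmodified sequence plus two phosphate groups, charge 3+.
mz_3plus = peptide_mz("LRLSPSPTSQR", n_phospho=2, charge=3)
print(round(mz_3plus, 2))
```

The result agrees with the value of 467.88 given for the triply charged peptide. The same function can be called with charge 1 or 2 to locate the other charge states of the peptide.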
4. QUANTITATIVE PHOSPHOPROTEOMICS
The ability to assess the dynamic changes in phosphorylation in a cell upon stimulation or another cellular event is essential for phosphoproteomics. This relies on efficient tools for relative or absolute quantification of the different phosphorylation sites. Standard MS approaches are qualitative only and provide no information about whether a phosphorylation has increased or decreased. A large number of specific quantitative methods have been developed based on preparing peptide standards and generating standard curves. However, this needs to be performed individually for each phosphopeptide and phosphorylation site and cannot be used for large scale phosphoproteomics. Recently, more suitable tools have been developed for quantitative proteomics which can also be used efficiently to quantify phosphorylated peptides.
The most straightforward methods for relative quantification are label-free methods, which do not introduce stable isotopes or chemical reactions into the experiment. Two methods for label-free quantification exist: ion current based quantification and spectral counting. The first technique takes advantage of the extracted ion chromatogram for each ion in an LC–MS experiment, and an intensity value is determined from the area under the ion count curve (peak intensity) [38]. However, the method relies on the reproducibility of the LC system, which can vary significantly. The second method for label-free quantification of protein abundance is spectral counting. In this technique, the number of MS/MS spectra acquired for each protein is correlated to the abundance of the protein [39]. This method is not of great value in phosphoproteomics since the phosphopeptide enrichment methods are optimized for low non-specific binding. Therefore, in many cases, only a single phosphopeptide is identified for each protein.
[Figure 3: three fragment ion spectra (relative abundance vs m/z) of LRLpSPpSPTSQR, panels A to C, with b/y and c/z fragment ion assignments marked on the peptide sequence and the neutral loss peaks labeled -H3PO4 and -2xH3PO4.]
Figure 3 Tandem MS analysis of the doubly phosphorylated peptide LRLpSPpSPTSQR from Lamin-A/C. (A) MS/MS analysis of fragment ions obtained by CID of the triply charged phosphopeptide at m/z 467.88. (B) MS3 fragment ion spectrum of the signal corresponding to the loss of phosphoric acid (m/z 435.33). (C) MS/MS analysis of fragment ions obtained by ETD of the triply charged phosphopeptide at m/z 467.88. Asterisks indicate ions originating from the loss of phosphoric acid.
[Figure 4: flow diagram with two panels. (A) In vivo metabolic stable isotope labelling: cells are grown in medium containing normal isotopes (12C or 14N) or heavy isotopes (13C or 15N), harvested and mixed 1:1, followed by protein extraction, purification of subproteome, proteolytic digest, enrichment of phosphorylated peptides and MS analysis. (B) In vitro stable isotope labelling: tissues or cells are grown in normal medium and harvested, followed by protein extraction, purification of subproteome and proteolytic digest, with samples mixed 1:1 either at the protein level or after proteolytic digest, then enrichment of phosphorylated peptides and MS analysis. Quantitation levels: protein stable isotopic labels (amines, cysteines); peptide stable isotopic labels (amines, cysteines, acidic amino acids, N-terminal, enzyme-mediated); label-free quantification.]
Figure 4 Overview of the current strategies for quantitative phosphoproteomics using stable isotope labeling.
In contrast, individual phosphorylation sites within the same protein are normally differentially regulated in vivo.
The most popular tools for relative quantification in phosphoproteomics employ stable (i.e., non-radioactive) isotopes. Molecules labeled with such isotopes have the same physico-chemical properties as the normal molecules and therefore behave in the same way in biochemical analyses. An overview of the different strategies in stable isotope quantitation is shown in Figure 4 and a list of the most common labeling procedures is provided in Table 1.
In vivo metabolic labeling, in which the proteins are labeled inside the cell during growth, is a popular tool for quantitation in proteomics when combined with high resolution MS (Figure 4A). One strategy for metabolic labeling is the incorporation of the heavy isotope of nitrogen (15N) into the protein backbone or amino acid side chains during growth of cells or whole animals [40,41]. This strategy is expensive, especially when applied to whole animals. It requires specialized software for quantitation, as the incorporation of 15N differs between peptides. Stable isotope labeling can also be performed in vivo by using isotopically labeled amino acids, as first proposed for nuclear magnetic resonance (NMR) studies [42]. This type of in vivo labeling was first introduced into proteomics as SILAC
Table 1 Scheme for stable isotopic labeling strategies used in mass spectrometry-based phosphoproteomics

In vivo metabolic stable isotope labeling (at the cellular level)
Quantification tools | Label | Comments
Growth of cells in stable isotopes [40,41] | 15N | Incorporation of 15N in the peptide backbone and selected amino acid side chains during cellular growth.
Isotopic labeled amino acids (SILAC) [43] | 13C, 15N | Cellular growth in the presence of the heavy isotopes of selected amino acids (e.g., arginine, lysine, or leucine).

In vitro stable isotope labeling (at the protein/peptide level)
Target | Label | Comments
Primary amines | iTRAQ [50] | Isobaric tags for relative and absolute quantitation. Chemical reagents consisting of four isobaric reagents which produce peptides with identical masses in MS mode but produce low mass quantitative ions (m/z 114, 115, 116 and 117) in MS/MS mode.
Primary amines | MCAT [65] | Mass coded abundance tagging. Labeling of the C-terminal lysine residues on tryptic peptides using stable isotope guanidination.
Primary amines | NIT [66] | N-terminal isotope-encoded tagging, e.g., using 13C or 2H labeled propionyl.
Cysteines | ICAT [47] | Isotope-coded affinity tags targeting free cysteines. Quantitation using 2H or 13C.
Cysteines | Alkylation reagents [67] | Alkylation of free cysteines using stable isotope containing acrylamide or iodoacetamide (e.g., 2H).
Peptide C-terminus | Esterification [68] | Derivatization of the carboxylic groups of C-terminal and acidic amino acids using isotopic alcohols (e.g., MeOH-d0/d3).
Peptide C-terminus | Enzyme-mediated [48] | Incorporation of 18O during specific hydrolysis of the peptide backbone by some enzymes including trypsin.
(stable isotope labeling using amino acids in cell culture), as another tool for in vivo metabolic stable isotope labeling of proteins in cell culture [43]. In this technique, two groups of cells are grown in culture media that are identical except in one respect: the first medium contains a “light” (or normal) and the second a “heavy” isotopic form of a particular amino acid
(e.g., L-arginine-12C6 or L-arginine-13C6). Through the use of essential amino acids, the cells in separate plates are forced to use either the labeled or the unlabeled form. Each cell doubling replaces at least half of the original form of the amino acid, eventually incorporating up to 100% of a given light or heavy form of the amino acid. In practice a lower incorporation is observed for some proteins with lower turnover rates, such as plasma membrane proteins. One of the two cell populations can now act as control (light), whereas the other (heavy) can be subjected to the cellular stimulation of interest or another biological experiment. After cell growth and stimulation of one set of cells, the two sets (control and stimulated) are harvested and mixed together in equal ratio. From this point on, standard protein or phosphoprotein extraction and MS analysis are applied. Since the stable isotope does not change the chemistry of the peptides or phosphopeptides, the peptides from control and stimulated cells always elute together from high-performance liquid chromatography (HPLC), TiO2, etc. The result is that the mass spectrum reveals pairs of peptides that differ in mass according to the total number of additional 15N or 13C atoms etc. When purified and analyzed together in this way, the area under the intensity curve for each member of a pair is a quantitative measure of its relative abundance. The advantage of this method is that the cells can be mixed at a very early stage, thereby eliminating variations introduced by the subsequent protein extraction, sub-proteomic fractionation and general sample handling. Unfortunately, this method is limited to cells that grow well in medium containing dialyzed serum, and therefore it cannot be easily applied to primary cells, tissues, body fluids, some specialized cell cultures or cell fragments lacking protein synthesis such as red blood cells, platelets or synaptosomes.
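The SILAC read-out described above comes down to two numbers per peptide pair: the expected m/z spacing between the light and heavy forms, and the ratio of their chromatographic peak areas. The following sketch illustrates both under simplifying assumptions: labeling with 13C6-arginine only, complete incorporation, and peak areas obtained by trapezoidal integration of perfectly co-eluting traces. The function names and the toy intensity traces are hypothetical.

```python
# Minimal sketch of SILAC pair spacing and heavy/light ratio.
# Assumptions: 13C6-arginine label, full incorporation, co-eluting traces.

C13_MINUS_C12 = 1.003355  # mass difference between 13C and 12C, Da


def silac_mz_shift(n_labeled_residues, charge, heavy_atoms_per_residue=6):
    """Expected m/z spacing between the light and heavy forms of a peptide."""
    return n_labeled_residues * heavy_atoms_per_residue * C13_MINUS_C12 / charge


def peak_area(times, intensities):
    """Area under an extracted ion chromatogram by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def heavy_to_light_ratio(times, light, heavy):
    """Relative abundance of the heavy (stimulated) over the light (control) form."""
    return peak_area(times, heavy) / peak_area(times, light)


# A peptide with two arginines observed as a 3+ ion.
spacing = silac_mz_shift(n_labeled_residues=2, charge=3)

# Invented co-eluting traces in which the heavy signal is twice the light one.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
light_trace = [0.0, 10.0, 20.0, 10.0, 0.0]
heavy_trace = [0.0, 20.0, 40.0, 20.0, 0.0]
ratio = heavy_to_light_ratio(times, light_trace, heavy_trace)
```

Here the pair would be separated by about 4.01 m/z units and the ratio of 2.0 would indicate a two-fold increase of the phosphopeptide in the stimulated cells. Real data would also need a correction for incomplete incorporation, which this sketch leaves out.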
A number of phosphoproteomic studies have been performed using SILAC labeling (e.g., [8,22,44–46]).
The in vitro stable isotope labeling methods are performed at the protein level or, more commonly, at the peptide level, after initial sample preparation (Figure 4B). These in vitro methods do not rely on cell labeling, but rather on chemical derivatization of proteins or peptides derived from two (control vs stimulated) or more populations of cells. Since a small change in protein extraction or peptide purification can result in substantial changes in the quantitation, reproducible sample handling and sub-proteome purification methods are essential for the successful outcome of the analysis. In vitro stable isotope labeling is performed by reacting a chemical reagent containing one or more heavy isotopes (e.g., 13C or 15N) with selected amino acids in the peptides (known as derivatization). This set of peptides is then mixed with a control sample labeled with the normal isotope. The mixed sample is then analyzed as one by MS for relative quantitation of the two samples. A large variety of appropriate covalent labels exists, and a list of those most commonly used is provided in Table 1.
A commonly used derivatization reagent in proteomics studies is the Isotope-Coded Affinity Tag (ICAT), which targets free cysteines [47]. ICAT uses stable isotope labeling of paired protein samples, based on light or heavy isotopic forms (2H or 13C). The ICAT reagent is based on alkylation of cysteines by iodoacetamide, but also includes a biotin tag for affinity purification of the derivatized peptides. The technique’s strength is that both quantitation and identification are achieved
in a single analysis of samples from any source. However, extensive sample fractionation is required prior to analysis by MS/MS. This label is not widely applicable to large scale phosphoproteomics, as most phosphorylated peptides do not include a cysteine residue. In a recent study, we found that cysteine-containing phosphopeptides represented a total of 5% of all the identified phosphopeptides [15].
Another in vitro stable isotope labeling strategy is incorporation of 18O during hydrolysis of proteins by different proteases [48]. Some proteases, e.g., trypsin and the endoproteinases Lys-C and Glu-C, are able to incorporate two oxygen atoms during proteolysis, giving a mass difference of 4 Da if performed in the presence of 18O. This property makes it useful for relative quantitation by MS/MS. However, there are several drawbacks to using 18O, such as back exchange and poor incorporation when the proteolytic cleavage site is placed next to acidic amino acid residues [49].
The most commonly used in vitro labeling reagents are the iTRAQ reagents. These are a set of four isobaric chemical reagents (i.e., they each have the same mass). After derivatization of the N-terminus or lysines of peptides they produce modified peptides with identical masses in MS mode. However, on peptide fragmentation in MS/MS mode, each tag releases a reporter ion of different mass (m/z 114, 115, 116 and 117). The ratio of the four reporter ion intensities provides the relative abundance of any given peptide in each digest. This enables a comparison of samples from cells in up to four different states [50]. This strategy has recently been applied to phosphoproteomic studies [51], and an 8-plex version of these chemical reagents is available to simultaneously quantify eight different samples [52].
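The reporter ion read-out of a 4-plex iTRAQ experiment can be sketched as follows. This is only an illustration: the decimal reporter m/z values and the matching tolerance are assumed approximations of the nominal m/z 114 to 117 given in the text, the function names are hypothetical, and the peak list is invented. Real workflows would also apply isotope impurity corrections, which are omitted here.

```python
# Minimal sketch: extract iTRAQ 4-plex reporter intensities from an MS/MS
# peak list and express them relative to one reference channel.
# Assumption: approximate reporter m/z values and a 0.05 m/z tolerance.

REPORTER_MZ = {114: 114.11, 115: 115.11, 116: 116.11, 117: 117.12}


def reporter_intensities(peaks, tol=0.05):
    """Sum the intensity found within tol of each reporter m/z.

    peaks is a list of (m/z, intensity) tuples from one MS/MS spectrum.
    """
    return {
        channel: sum(inten for mz, inten in peaks if abs(mz - target) <= tol)
        for channel, target in REPORTER_MZ.items()
    }


def relative_abundance(intensities, reference=114):
    """Reporter intensities divided by the intensity of the reference channel."""
    ref = intensities[reference]
    if ref == 0:
        raise ValueError("reference channel has no signal")
    return {channel: value / ref for channel, value in intensities.items()}


# Invented low-mass region of an MS/MS spectrum of a labeled phosphopeptide.
example_peaks = [
    (114.11, 1000.0), (115.11, 2000.0), (116.11, 500.0), (117.12, 1000.0),
    (175.12, 300.0),  # an ordinary fragment ion, ignored by the extraction
]
ratios = relative_abundance(reporter_intensities(example_peaks))
```

With channel 114 as the control, the example gives ratios of 2.0, 0.5 and 1.0 for channels 115, 116 and 117, which is the kind of per-peptide comparison across four states described above.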
5. FACTORS AFFECTING PHOSPHOPROTEOMICS
Despite the recent successes in optimization of methods for enrichment of phosphorylated peptides [9,14–16,21], several factors affect the outcome of phosphoproteomics studies.
The amount of starting material has a major impact on the number of phosphopeptides that can be identified in any given phosphoproteomic study. The more starting material available, the higher the number of phosphopeptides that can be identified and the more low abundance phosphorylation sites can be reached. This is evident from several large scale phosphoproteomics studies in which several milligrams of material were used (e.g., [8,22,32]). When working with large amounts of material the whole phosphoproteomic analysis becomes much easier because, for example, non-specific peptide losses due to binding to plastic surfaces, pipette tips and other surfaces, and losses due to multiple subsequent separation steps, become much less significant. A real challenge in phosphoproteomics lies in the analysis of low amounts of material, where miniaturized sample preparation methods need to be developed and optimized. In most cases large amounts of sample are not readily available, as most biological studies operate with relatively low cell numbers rather than large organs like a
liver. This is particularly challenging with primary cells, where even more limited cell numbers can be obtained. These types of experimental systems put much higher demands on phosphoproteomics sample preparation.
Almost all phosphoproteomics studies use conventional RP material to desalt and concentrate the phosphopeptides when performing separation prior to tandem MS identification and quantification. Many phosphorylated peptides, especially multiply-phosphorylated peptides and small tryptic phosphorylated peptides, are very hydrophilic and their affinity for RP material is therefore significantly reduced [17]. This results in losses of phosphorylated peptides that do not bind to the RP material. This is especially relevant in large scale studies with high amounts of starting material, where the chance of overloading the RP column is high. It is also significant because small hydrophilic phosphopeptides do not ionize well in MS and may therefore erroneously appear to be very low in abundance. Recently, we introduced graphite micro-columns as an alternative or supplement to RP material for efficient capture of the hydrophilic phosphorylated peptides that do not bind to RP material, followed by MS analysis [17]. An excellent way to monitor for such major losses of phosphopeptides is to track the 32P label and account for all of it. We have successfully used this strategy in phosphoproteomics studies (Larsen MR, unpublished data).
Proteolytic cleavage is traditionally performed in phosphoproteomics using trypsin, which cleaves proteins C-terminally to the basic amino acid residues lysine and arginine. Many consensus sites for protein kinases are located in a basophilic amino acid sequence environment, e.g., cAMP-dependent protein kinase (R-R/K-X-S/T), protein kinase C (S/T-X-X-R/K), cyclin-dependent kinases (S/T-P-X-R/K) and calmodulin-dependent protein kinase II (R-X-X-S/T).
Using a basophilic protease like trypsin will frequently lead to cleavage next to those phosphorylation sites and will increase the chance of generating small phosphorylated peptides. This is particularly important because the likelihood of having an additional basic cleavage site located nearby is high. This results in small hydrophilic phosphopeptides, which are notoriously difficult to retain on RP material and to analyze by MS. Therefore, the use of proteases with other specificities should be explored more often to increase the phosphoproteome coverage in large scale studies.
Since all phosphoproteomics studies employ phosphopeptide enrichment steps that selectively enrich for peptides based on the phosphate group, the chance of purifying other molecules with such a group is very high. The influence of, for example, nucleic acids and phospholipids on the purification and analysis of phosphopeptides has not yet been fully addressed. Such molecules are likely to have a large effect on the enrichment capacity and efficiency of most phosphopeptide enrichment materials. This could especially be an issue when working with phosphopeptides derived from membrane or nuclear preparations. In addition, such molecules will also have a large influence on the subsequent LC–MS/MS analysis, as they will greatly reduce the separation efficiency of the RP material. The influence of those molecules can be overcome by introducing protein precipitation steps, which eliminate most nucleic acids and lipids from the solutions, at the risk of losing valuable material.
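The point made above about trypsin and basophilic kinase motifs can be made concrete with a simple in silico digest. The sketch below is a deliberately simplified illustration: it cleaves after every lysine and arginine, ignores missed cleavages and the reduced cleavage before proline, and uses an invented sequence containing an R-R-X-S motif followed by a nearby lysine. The function names and the sequence are hypothetical.

```python
# Minimal sketch: in silico tryptic digest around a basophilic kinase motif.
# Assumptions: cleavage after every K or R, no missed cleavages, no proline rule.


def tryptic_digest(sequence):
    """Split a protein sequence C-terminally to each lysine (K) and arginine (R)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_covering(sequence, position):
    """Return the tryptic peptide that contains the residue at a 0-based position."""
    start = 0
    for peptide in tryptic_digest(sequence):
        end = start + len(peptide)
        if start <= position < end:
            return peptide
        start = end
    raise IndexError("position outside sequence")


# Invented sequence with an R-R-A-S motif and a lysine two residues after the site.
EXAMPLE = "AELGDFVRRASLKGEDLTPEVYAK"
site = EXAMPLE.index("S")  # the serine in the R-R-X-S motif
print(tryptic_digest(EXAMPLE))
print(peptide_covering(EXAMPLE, site))
```

In this example the phosphorylation site ends up in the four-residue peptide ASLK, the kind of small hydrophilic phosphopeptide that is hard to retain on RP material. Repeating the digest with the cleavage rule of another protease is one way to compare the peptides that would carry the same site.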
Every published large scale phosphoproteomics study on mammalian cells uses cell lines that have been subjected to serum starvation/deprivation for 12–18 h prior to external stimulation or other manipulations (e.g., [22,32,44,45]). One of the purposes of starvation is to reduce the basal phosphorylation state in the cell by removing traces of growth hormones and other factors that activate signaling cascades. As many signaling cascades use the same downstream pathways, this step can be important for detecting small but biologically significant differences in the phosphorylation status of the signaling proteins. Starvation is also thought to partly synchronize the cells in the G0/G1 phase of the cell cycle. Cells subjected to serum starvation will have a wide distribution of cell sizes, proteomes and content of DNA, and will therefore reflect cells of a specified stage in the division cycle [53,54]. It is thought that after starvation each cell will respond equally to the subsequent external stimulation.
However, there are several concerns associated with the use of serum starved cells in phosphoproteomic strategies. Serum starvation itself is cytotoxic to most cells and will induce stress and apoptosis responses or distinct changes in morphology and metabolism in many cultured cell lines as well as primary cells. Serum deprivation triggers initial steps in an apoptotic pathway but does not lead to the DNA fragmentation typically observed in other apoptotic pathways, suggesting a distinct difference between pathways leading to apoptosis [55]. Kulkarni and McCulloch found that withdrawal of stimulatory growth factors by serum deprivation induced apoptotic cell death in a subset of Balb/c 3T3 fibroblasts in vitro and reported several morphological changes associated with apoptosis [56]. Serum starvation has been considered to be among the strongest inducers of intracellular ceramide formation [57].
Ceramides are lipid molecules composed of sphingosine and a fatty acid, which are normally found in cell membranes. Ceramides and other sphingolipids have important functions integral to cell cycle arrest [58], apoptotic signaling [59–61], and the regulation of differentiation and proliferation [62], and are known to regulate both protein phosphatases [63] and protein kinases [64]. Therefore, serum starvation can lead to substantial changes in the proteome as well as the phosphoproteome, and the cellular response to external stimulation may also be strongly influenced by those changes. In many large scale experiments, the changes in phosphorylation that are discovered for some proteins frequently cannot be explained, and the influence of serum starvation or other stress factors should be considered. The degree of change inside the cell upon serum starvation will depend on the type of cells used in the experiment. Cancer-like cell lines will most likely be more tolerant than primary cells; however, the biological significance of the experiment will in most cases be higher using primary cells. The effect of starvation should be investigated further in order to eliminate this factor in large scale phosphoproteomics studies.
6. CONCLUSION Phosphoproteomics has only recently been introduced as a discipline for the comprehensive characterization of the phosphorylation status of large sets of
proteins, as opposed to characterization of individual phosphorylated proteins. Due to the contribution of a number of research groups around the world the methods for enriching phosphorylated peptides from complex mixtures of peptides in a proteolytic digest are now developed to a degree where they can be applied to large scale experiments. Also the development of more sensitive mass spectrometers that can perform advanced fragmentation of phosphorylated peptides such as phosphopeptide-directed pdMS3, MSA and ETD, has significantly increased the ability to perform phosphoproteomics. However, there are many unsolved questions in phosphoproteomics. The effect of the amount of material, proteases, serum starvation, and selecting the optimum strategies to perform the analysis (Figure 1) are some examples which have not yet been compared directly with each other. In addition, there has been a growth in the number of publications describing small and insignificant changes to current phosphopeptide enrichment strategies, leaving newcomers in phosphoproteomics with a tangle of too many strategic choices. While the ability to identify hundreds or even thousands of phosphorylation sites in a cell is an exciting and major technical advance in recent years, a complete phosphoproteome still remains elusive. In the future, major steps forward in methods or technology are required to greatly enhance the present strategies in order to more fully uncover the hidden phosphoproteome.
ACKNOWLEDGMENTS Terry Zhang from ThermoFisher (San Jose, US) is acknowledged for obtaining the ETD data of the doubly phosphorylated peptide LRLpSPpSPTSQR on their LTQ-ETD instrument. This work was supported by The Danish Natural Science Research Council (Grants No. 21-03-0167) and The Danish Strategic Research Council (Young Investigator awards (MRL)).
REFERENCES
1 J.D. Graves and E.G. Krebs, Protein phosphorylation and signal transduction, Pharmacol. Ther., 82(2–3) (1999) 111–121.
2 D.C.A. Neville, C.R. Rozanas, E.M. Price, D.B. Gruis, A.S. Verkman and R.R. Townsend, Evidence for phosphorylation of serine 753 in CFTR using a novel metal-ion affinity resin and matrix-assisted laser desorption mass spectrometry, Protein Sci., 6(11) (1997) 2436–2445.
3 D. Figeys, S.P. Gygi, Y. Zhang, J. Watts, M. Gu and R. Aebersold, Electrophoresis combined with novel mass spectrometry techniques: Powerful tools for the analysis of proteins and proteomes, Electrophoresis, 19(10) (1998) 1811–1818.
4 S.H. Li and C. Dass, Iron(III)-immobilized metal ion affinity chromatography and mass spectrometry for the purification and characterization of synthetic phosphopeptides, Anal. Biochem., 270(1) (1999) 9–14.
5 M.C. Posewitz and P. Tempst, Immobilized gallium(III) affinity chromatography of phosphopeptides, Anal. Chem., 71(14) (1999) 2883–2892.
6 S.B. Ficarro, M.L. McCleland, P.T. Stukenberg, D.J. Burke, M.M. Ross, J. Shabanowitz, D.F. Hunt and F.M. White, Phosphoproteome analysis by mass spectrometry and its application to Saccharomyces cerevisiae, Nat. Biotechnol., 20(3) (2002) 301–305.
7 T.S. Nuhse, A. Stensballe, O.N. Jensen and S.C. Peck, Large-scale analysis of in vivo phosphorylated membrane proteins by immobilized metal ion affinity chromatography and mass spectrometry, Mol. Cell. Proteomics, 2(11) (2003) 1234–1243.
8 A. Gruhler, J.V. Olsen, S. Mohammed, P. Mortensen, N.J. Faergeman, M. Mann and O.N. Jensen, Quantitative phosphoproteomics applied to the yeast pheromone signaling pathway, Mol. Cell. Proteomics, 4(3) (2005) 310–327.
9 M. Kokubu, Y. Ishihama, T. Sato, T. Nagasu and Y. Oda, Specificity of immobilized metal affinity-based IMAC/C18 tip enrichment of phosphopeptides for protein phosphorylation analysis, Anal. Chem., 77(16) (2005) 5144–5154.
10 K.A. Resing, R.S. Johnson and K.A. Walsh, Mass-spectrometric analysis of 21 phosphorylation sites in the internal repeat of rat profilaggrin, precursor of an intermediate filament-associated protein, Biochemistry, 34(29) (1995) 9477–9487.
11 H.L. Zhou, J.D. Watts and R. Aebersold, A systematic approach to the analysis of protein phosphorylation, Nat. Biotechnol., 19(4) (2001) 375–378.
12 M.W.H. Pinkse, P.M. Uitto, M.J. Hilhorst, B. Ooms and A.J.R. Heck, Selective isolation at the femtomole level of phosphopeptides from proteolytic digests using 2D-nanoLC-ESI-MS/MS and titanium oxide precolumns, Anal. Chem., 76(14) (2004) 3935–3943.
13 A. Sano and H. Nakamura, Titania as a chemo-affinity support for the column-switching HPLC analysis of phosphopeptides: Application to the characterization of phosphorylation sites in proteins by combination with protease digestion and electrospray ionization mass spectrometry, Anal. Sci., 20(5) (2004) 861–864.
14 M.R. Larsen, T.E. Thingholm, O.N. Jensen, P. Roepstorff and T.J.D. Jorgensen, Highly selective enrichment of phosphorylated peptides from peptide mixtures using titanium dioxide microcolumns, Mol. Cell. Proteomics, 4(7) (2005) 873–886.
15 T.E. Thingholm, O.N. Jensen, P.J. Robinson and M.R. Larsen, SIMAC: A phosphoproteomic strategy for the rapid separation of mono-phosphorylated from multiply phosphorylated peptides, Mol. Cell. Proteomics, 7(4) (2008) 661–671.
16 S.S. Jensen and M.R. Larsen, Evaluation of the impact of some experimental procedures on different phosphopeptide enrichment techniques, Rapid Commun. Mass Spectrom., 21(22) (2007) 3635–3645.
17 M.R. Larsen, M.E. Graham, P.J. Robinson and P. Roepstorff, Improved detection of hydrophilic phosphopeptides using graphite powder microcolumns and mass spectrometry: Evidence for in vivo doubly phosphorylated dynamin I and dynamin III, Mol. Cell. Proteomics, 3(5) (2004) 456–465.
18 D.T. McLachlin and B.T. Chait, Improved beta-elimination-based affinity purification strategy for enrichment of phosphopeptides, Anal. Chem., 75(24) (2003) 6826–6836.
19 W. Weckwerth, L. Willmitzer and O. Fiehn, Comparative quantification and identification of phosphoproteins using stable isotope labeling and liquid chromatography/mass spectrometry, Rapid Commun. Mass Spectrom., 14(18) (2000) 1677–1681.
20 M.B. Goshe, T.P. Conrads, E.A. Panisko, N.H. Angell, T.D. Veenstra and R.D. Smith, Phosphoprotein isotope-coded affinity tag approach for isolating and quantitating phosphopeptides in proteome-wide analyses, Anal. Chem., 73(11) (2001) 2578–2586.
21 T.E. Thingholm, T.J.D. Jorgensen, O.N. Jensen and M.R. Larsen, Highly selective enrichment of phosphorylated peptides using titanium dioxide, Nat. Protoc., 1(4) (2006) 1929–1935.
22 J.V. Olsen, B. Blagoev, F. Gnad, B. Macek, C. Kumar, P. Mortensen and M. Mann, Global, in vivo, and site-specific phosphorylation dynamics in signaling networks, Cell, 127(3) (2006) 635–648.
23 S. Rinalducci, M.R. Larsen, S. Mohammed and L. Zolla, Novel protein phosphorylation site identification in spinach stroma membranes by titanium dioxide microcolumns and tandem mass spectrometry, J. Proteome Res., 5(4) (2006) 973–982.
24 B. Bodenmiller, L.N. Mueller, M. Mueller, B. Domon and R. Aebersold, Reproducible isolation of distinct, overlapping segments of the phosphoproteome, Nat. Methods, 4(3) (2007) 231–237.
25 B. Macek, I. Mijakovic, J.V. Olsen, F. Gnad, C. Kumar, P.R. Jensen and M. Mann, The serine/threonine/tyrosine phosphoproteome of the model bacterium Bacillus subtilis, Mol. Cell. Proteomics, 6(4) (2007) 697–707.
26 H. Molina, D.M. Horn, N. Tang, S. Mathivanan and A. Pandey, Global proteomic profiling of phosphopeptides using electron transfer dissociation tandem mass spectrometry, Proc. Natl. Acad. Sci. USA, 104(7) (2007) 2199–2204.
27 K. Schmelzle, S. Kane, S. Gridley, G.E. Lienhard and F.M. White, Temporal dynamics of tyrosine phosphorylation in insulin signaling, Diabetes, 55(8) (2006) 2171–2179.
28 T.E. Thingholm, M.R. Larsen, C.R. Ingrell, M. Kassem and O.N. Jensen, Phosphoproteome analysis of human stem cell plasma membranes using TiO2 columns and mass spectrometry, J. Proteome Res., (2008).
29 G. Neubauer and M. Mann, Mapping of phosphorylation sites of gel-isolated proteins by nanoelectrospray tandem mass spectrometry: Potentials and limitations, Anal. Chem., 71(1) (1999) 235–242.
30 H. Steen, B. Kuster, M. Fernandez, A. Pandey and M. Mann, Detection of tyrosine phosphorylated peptides by precursor ion scanning quadrupole TOF mass spectrometry in positive ion mode, Anal. Chem., 73(7) (2001) 1440–1448.
31 R.H. Bateman, R. Carruthers, J.B. Hoyes, C. Jones, J.I. Langridge, A. Millar and J.P.C. Vissers, A novel precursor ion discovery method on a hybrid quadrupole orthogonal acceleration time-of-flight (Q-TOF) mass spectrometer for studying protein phosphorylation, J. Am. Soc. Mass Spectrom., 13(7) (2002) 792–803.
32 S.A. Beausoleil, M. Jedrychowski, D. Schwartz, J.E. Elias, J. Villen, J.X. Li, M.A. Cohn, L.C. Cantley and S.P. Gygi, Large-scale characterization of HeLa cell nuclear phosphoproteins, Proc. Natl. Acad. Sci. USA, 101(33) (2004) 12130–12135.
33 M.J. Schroeder, J. Shabanowitz, J.C. Schwartz, D.F. Hunt and J.J. Coon, A neutral loss activation method for improved phosphopeptide sequence analysis by quadrupole ion trap mass spectrometry, Anal. Chem., 76(13) (2004) 3590–3598.
34 M.J. Schroeder, D.J. Webb, J. Shabanowitz, A.F. Horwitz and D.F. Hunt, Methods for the detection of paxillin post-translational modifications and interacting proteins by mass spectrometry, J. Proteome Res., 4(5) (2005) 1832–1841.
35 J.E. Syka, J.J. Coon, M.J. Schroeder, J. Shabanowitz and D.F. Hunt, Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry, Proc. Natl. Acad. Sci. USA, 101(26) (2004) 9528–9533.
36 P. Roepstorff and J. Fohlman, Proposal for a common nomenclature for sequence ions in mass spectra of peptides, Biomed. Mass Spectrom., 11(11) (1984) 601.
37 A. Chi, C. Huttenhower, L.Y. Geer, J.J. Coon, J.E.P. Syka, D.L. Bai, J. Shabanowitz, D.J. Burke, O.G. Troyanskaya and D.F. Hunt, Analysis of phosphorylation sites on proteins from Saccharomyces cerevisiae by electron transfer dissociation (ETD) mass spectrometry, Proc. Natl. Acad. Sci. USA, 104(7) (2007) 2193–2198.
38 P.V. Bondarenko, D. Chelius and T.A. Shaler, Identification and relative quantitation of protein mixtures by enzymatic digestion followed by capillary reversed-phase liquid chromatography-tandem mass spectrometry, Anal. Chem., 74(18) (2002) 4741–4749.
39 H.B. Liu, R.G. Sadygov and J.R. Yates, A model for random sampling and estimation of relative protein abundance in shotgun proteomics, Anal. Chem., 76(14) (2004) 4193–4201.
40 Y. Oda, K. Huang, F.R. Cross, D. Cowburn and B.T. Chait, Accurate quantitation of protein expression and site-specific phosphorylation, Proc. Natl. Acad. Sci. USA, 96(12) (1999) 6591–6596.
41 C.C. Wu, M.J. MacCoss, K.E. Howell, D.E. Matthews and J.R. Yates, Metabolic labeling of mammalian organisms with stable isotopes for quantitative proteomic analysis, Anal. Chem., 76(17) (2004) 4951–4959.
42 A.P. Hansen, A.M. Petros, A.P. Mazar, T.M. Pederson, A. Rueter and S.W. Fesik, A practical method for uniform isotopic labeling of recombinant proteins in mammalian-cells, Biochemistry, 31(51) (1992) 12713–12718.
43 S.E. Ong, B. Blagoev, I. Kratchmarova, D.B. Kristensen, H. Steen, A. Pandey and M. Mann, Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics, Mol. Cell. Proteomics, 1(5) (2002) 376–386.
44 B. Blagoev, S.E. Ong, I. Kratchmarova and M. Mann, Temporal analysis of phosphotyrosine-dependent signaling networks by quantitative proteomics, Nat. Biotechnol., 22(9) (2004) 1139–1145.
45 R. Amanchy, D.E. Kalume, A. Iwahori, J. Zhong and A. Pandey, Phosphoproteome analysis of HeLa cells using stable isotope labeling with amino acids in cell culture (SILAC), J. Proteome Res., 4(5) (2005) 1661–1671.
46 G.A. Zhang, D.S. Spellman, E.Y. Skolnik and T.A. Neubert, Quantitative phosphotyrosine proteomics of EphB2 signaling by stable isotope labeling with amino acids in cell culture (SILAC), J. Proteome Res., 5(3) (2006) 581–588.
47 S.P. Gygi, B. Rist, S.A. Gerber, F. Turecek, M.H. Gelb and R. Aebersold, Quantitative analysis of complex protein mixtures using isotope-coded affinity tags, Nat. Biotechnol., 17(10) (1999) 994–999.
48 O.A. Mirgorodskaya, Y.P. Kozmin, M.I. Titov, R. Korner, C.P. Sonksen and P. Roepstorff, Quantitation of peptides and proteins by matrix-assisted laser desorption/ionization mass spectrometry using O-18-labeled internal standards, Rapid Commun. Mass Spectrom., 14(14) (2000) 1226–1232.
49 I.I. Stewart, T. Thomson and D. Figeys, O-18 Labeling: A tool for proteomics, Rapid Commun. Mass Spectrom., 15(24) (2001) 2456–2465.
50 P.L. Ross, Y.L.N. Huang, J.N. Marchese, B. Williamson, K. Parker, S. Hattan, N. Khainovski, S. Pillai, S. Dey, S. Daniels, S. Purkayastha, P. Juhasz, S. Martin, M. Bartlet-Jones, F. He, A. Jacobson and D.J. Pappin, Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents, Mol. Cell. Proteomics, 3(12) (2004) 1154–1169.
51 Y. Zhang, A. Wolf-Yadlin, P.L. Ross, D.J. Pappin, J. Rush, D.A. Lauffenburger and F.M. White, Time-resolved mass spectrometry of tyrosine phosphorylation sites in the epidermal growth factor receptor signaling network reveals dynamic modules, Mol. Cell. Proteomics, 4(9) (2005) 1240–1250.
52 B.L. Williamson, P.L. Ross, S. Pillai, B. Purkayastha, S. Daniels and D. Pappin, Protein quantitation using a novel 8-plex set of isobaric peptide labels, Mol. Cell. Proteomics, 5(10) (2006) S55.
53 S. Cooper, Mammalian cells are not synchronized in G(1)-phase by starvation or inhibition: Considerations of the fundamental concept of G(1)-phase synchronization, Cell Prolif., 31(1) (1998) 9–16.
54 S. Cooper, Reanalysis of the protocol for in vitro synchronization of mammalian astrocytic cultures by serum deprivation, Brain Res. Protoc., 15(3) (2005) 115–118.
55 W.A. Kues, J.W. Carnwath, D. Paul and H. Niemann, Cell cycle synchronization of porcine fetal fibroblasts by serum deprivation initiates a nonconventional form of apoptosis, Cloning Stem Cells, 4(3) (2002) 231–243.
56 G.V. Kulkarni and C.A.G. McCulloch, Serum deprivation induces apoptotic cell-death in a subset of Balb/c 3T3 fibroblasts, J. Cell Sci., 107 (1994) 1169–1179.
57 Y.A. Hannun, Functions of ceramide in coordinating cellular responses to stress, Science, 274(5294) (1996) 1855–1859.
58 S. Jayadev, B. Liu, A.E. Bielawska, J.Y. Lee, F. Nazaire, M.Y. Pushkareva, L.M. Obeid and Y.A. Hannun, Role for ceramide in cell-cycle arrest, J. Biol. Chem., 270(5) (1995) 2047–2052.
59 L.M. Obeid, C.M. Linardic, L.A. Karolak and Y.A. Hannun, Programmed cell-death induced by ceramide, Science, 259(5102) (1993) 1769–1771.
60 K. Thevissen, I. Francois, J. Winderickx, C. Pannecouque and B.P.A. Cammue, Ceramide involvement in apoptosis and apoptotic diseases, Mini-Rev. Med. Chem., 6(6) (2006) 699–709.
61 R. Caricchio, L. D’Adamio and P.L. Cohen, Fas, ceramide and serum withdrawal induce apoptosis via a common pathway in a type II Jurkat cell line, Cell Death Differ., 9(5) (2002) 574–580.
62 H. Wakita, Y. Tokura, H. Yagi, K. Nishimura, F. Furukawa and M. Takigawa, Keratinocyte differentiation is induced by cell-permeant ceramides and its proliferation is promoted by sphingosine, Arch. Dermatol. Res., 286(6) (1994) 350–354.
63 C.E. Chalfant, K. Kishikawa, M.C. Mumby, C. Kamibayashi, A. Bielawska and Y.A. Hannun, Long chain ceramides activate protein phosphatase-1 and protein phosphatase-2A: Activation is stereospecific and regulated by phosphatidic acid, J. Biol. Chem., 274(29) (1999) 20313–20317.
64 H. Le Stunff, L. Dokhac and S. Harbon, The roles of protein kinase C and tyrosine kinases in mediating endothelin-1-stimulated phospholipase D activity in rat myometrium: Differential inhibition by ceramides and cyclic AMP, J. Pharmacol. Exp. Ther., 292(2) (2000) 629–637.
65 G. Cagney and A. Emili, De novo peptide sequencing and quantitative profiling of complex protein mixtures using mass-coded abundance tagging, Nat. Biotechnol., 20(2) (2002) 163–170.
66 X. Zhang, Q.K. Jin, S.A. Carr and R.S. Annan, N-terminal peptide labeling strategy for incorporation of isotopic tags: A method for the determination of site-specific absolute phosphorylation stoichiometry, Rapid Commun. Mass Spectrom., 16(24) (2002) 2325–2332.
67 S. Sechi and B.T. Chait, Modification of cysteine residues by alkylation. A tool in peptide mapping and protein identification, Anal. Chem., 70(24) (1998) 5150–5158.
68 T. He, K. Alving, B. Feild, J. Norton, E.G. Joseloff, S.D. Patterson and B. Domon, Quantitation of phosphopeptides using affinity chromatography and stable isotope labeling, J. Am. Soc. Mass Spectrom., 15(3) (2004) 363–373.
CHAPTER 13
Analysis of Protein-Tyrosine Phosphorylation by Mass Spectrometry
Guoan Zhang, Chong-Feng Xu and Thomas A. Neubert
Contents
1. Introduction
2. Enrichment
2.1 Anti-pTyr antibodies
2.2 PTyr binding domains
2.3 Global phosphopeptide enrichment
3. Qualitative Analysis
3.1 Detection of phosphopeptides
3.2 MS/MS analysis of phosphopeptides
4. Quantitative Analysis
4.1 Study of cellular signaling by quantitative phosphotyrosine proteome analysis
5. Future Directions
5.1 MS instrumentation
5.2 PTyr enrichment
5.3 Bioinformatics
5.4 Quantitation
6. Conclusions
Abbreviations
Acknowledgement
References
Comprehensive Analytical Chemistry, Volume 52, ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00213-4
© 2009 Elsevier B.V. All rights reserved.
1. INTRODUCTION Phosphorylation is one of the most common posttranslational modifications of proteins and plays a key role in cellular signal transduction. In eukaryotic cells, phosphorylation mainly occurs on serine, threonine and tyrosine residues [1].
Tyrosine phosphorylation is catalyzed by enzymes called protein-tyrosine kinases (PTKs) that can add phosphate to specific tyrosine residues in target proteins. The phosphate is removed from phosphotyrosine by enzymes called protein-tyrosine phosphatases (PTPs) [2,3]. To date there are 90 genes known to encode PTKs and 107 genes encoding PTPs in the human genome [4,5]. Although pTyr is as stable as pSer and pThr at physiological pH, because of the high turnover numbers of PTPs and their generally tight negative regulation of PTK activity, most PTK substrates have a very low basal level of tyrosine phosphorylation. As a result, the overall level of phosphotyrosine (pTyr) in proteins in normal vertebrate cells is very low compared to the levels of phosphoserine (pSer) and phosphothreonine (pThr) (pSer:pThr:pTyr = 1,800:200:1) [1]. Despite its relatively infrequent occurrence, tyrosine phosphorylation is especially important in signal transduction. In evolutionary terms, true PTKs did not emerge until the appearance of metazoans, suggesting the need for more sophisticated signaling regulation by higher organisms. The reversible phosphorylation of tyrosine plays a key role in the control of most fundamental cellular processes including the cell cycle, cell migration, cell metabolism and survival, cell proliferation and differentiation. Aberrant tyrosine phosphorylation can induce many types of cancer and other diseases [1,6,7]. As a result, PTKs are becoming a major group of drug targets. Tyrosine phosphorylation can control signal transduction through multiple mechanisms. First, the phosphorylation of a regulatory tyrosine residue can induce a conformational change in an enzyme that stimulates its activity. Second, phosphorylated tyrosine residues can serve as specific binding sites for Src homology 2 (SH2) domains or phosphotyrosine binding (PTB) domains [8,9]. 
Through such pY-dependent protein–protein interactions, regulatory proteins are recruited to phosphorylated receptors and docking proteins, and thereby activate specific signaling pathways. Third, tyrosine phosphorylation can also reduce catalytic activity, as in the case of the SH2 domain of the Src tyrosine kinase. This regulatory SH2 domain engages an inhibitory C-terminal phosphotyrosine site, leading to an auto-inhibited conformation of the kinase domain [8]. To understand the underlying mechanisms of tyrosine phosphorylation-dependent signaling events, it is important to determine the relevant sites that are phosphorylated in vivo. Analysis of tyrosine phosphorylation has been challenging for a host of reasons. (1) Many signaling proteins that can be tyrosine phosphorylated are present at low abundance within the cell. (2) The stoichiometry of tyrosine-phosphorylated proteins is often very low. Most of the time, only a very small fraction of the available pool of a protein is phosphorylated. (3) Proteins often have multiple phosphorylation sites and are heterogeneously phosphorylated. (4) Tyrosine phosphorylation is highly dynamic. Several analytical techniques are available for the analysis of phosphorylation, including Edman sequencing, 32P labeling, and mass spectrometry (MS) [10,11]. Owing to the rapid improvement in MS instrumentation and related proteomic methodologies during the last decade, MS has quickly become the method of choice for phosphorylation analysis. Compared to other methods, MS
offers the advantages of high sensitivity, high speed, ability to localize the phosphorylation sites and suitability for large-scale analysis [10–15]. In a typical analysis, phosphorylated proteins are digested using a protease, usually trypsin, and the digest is analyzed by MS. The localization of the phosphorylation sites is achieved by performing tandem mass spectrometry (MS/MS) [16], in which the peptide is fragmented and the fragment ion masses are used to determine the peptide sequence as well as the sites of phosphorylation. Analysis at the peptide level is referred to as the “bottom-up” approach. Direct analysis of intact phosphorylated proteins by MS (“top-down” approach) has been very rare due to the difficulties in obtaining purified target proteins, the low sensitivity in MS of intact proteins, and difficulties in performing MS/MS on intact proteins and interpreting the resulting spectra. A comprehensive analysis of phosphorylation by MS has multiple stages: (1) identification of the phosphorylated proteins; (2) localization of the phosphorylation sites and (3) quantitation of the dynamic changes in phosphorylation. In this article, we will give an overview of MS-based techniques and strategies for analysis of tyrosine phosphorylation. The analysis of serine and threonine phosphorylation will be mentioned but not emphasized. Finally we will give a perspective on emerging techniques that may greatly enhance our ability to analyze phosphotyrosine in the future. Basic information about peptide and protein analysis by MS will not be covered in detail and readers are referred to recent reviews [16,17].
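The way fragment ion masses reveal a phosphorylation site can be sketched with a small calculation. The example below computes singly charged b- and y-ion m/z values for a short peptide with and without a phosphate on one residue, using standard monoisotopic residue masses. The peptide AYGK, the function names and the choice of singly charged ions only are assumptions made for illustration, not a method described in this chapter.

```python
# Monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
HPO3 = 79.966331  # mass added by one phosphate group


def fragment_ions(sequence, phospho_sites=()):
    """Return (b_ions, y_ions) as singly charged m/z lists.

    phospho_sites holds 0-based residue positions carrying a phosphate.
    """
    masses = [
        MONO[aa] + (HPO3 if i in phospho_sites else 0.0)
        for i, aa in enumerate(sequence)
    ]
    b_ions, running = [], 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), built from the N-terminus
        running += m
        b_ions.append(running + PROTON)
    y_ions, running = [], 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1), built from the C-terminus
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


def shifted(plain, phos, tol=1e-6):
    """1-based indices of fragment ions that are heavier by one HPO3."""
    return [i + 1 for i, (p, q) in enumerate(zip(plain, phos))
            if abs(q - p - HPO3) < tol]


# Hypothetical peptide; tyrosine at 0-based position 1 is phosphorylated.
b_plain, y_plain = fragment_ions("AYGK")
b_phos, y_phos = fragment_ions("AYGK", phospho_sites={1})
print("shifted b ions:", shifted(b_plain, b_phos))
print("shifted y ions:", shifted(y_plain, y_phos))
```

Only the fragments that contain the modified tyrosine (b2, b3 and y3 here) gain about 79.97 Da, while b1, y1 and y2 are unchanged. It is this pattern of shifted and unshifted ions that pins the phosphate to a particular residue.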
2. ENRICHMENT Analysis of phosphorylated proteins or peptides in complex samples is usually very difficult because of the low level of protein phosphorylation. Even for purified phosphoproteins, after protease digestion phosphorylated peptides usually represent only a small fraction of total peptides due to low stoichiometry of phosphorylation. The presence of nonphosphorylated peptides is detrimental for MS analysis for many reasons. (1) The MS signals of phosphopeptides are strongly suppressed by nonphosphopeptides due to the competitive nature of ionization in both electrospray ionization (ESI) and matrix assisted laser desorption/ionization mass spectrometry (MALDI-MS). (2) In automated data-dependent LC-MS analysis, only high abundance peptides are selected for sequencing. Phosphopeptides are often missed because of their low signal intensities [18]. (3) For complex samples, detection of phosphopeptides may be influenced by the presence of isobaric nonphosphopeptides. This is especially a problem for MALDI-MS when peptides are not fractionated before MS analysis. Therefore efficient enrichment is often the key to successful phosphorylation analysis. For the analysis of tyrosine phosphorylation, the need for enrichment is even greater than for serine and threonine phosphorylation because of its much less frequent occurrence. Currently several techniques are available for enrichment of phosphorylated proteins or peptides. Some of them enrich pSer, pThr and pTyr
phosphorylation in a generic fashion, while others only enrich specific forms of phosphorylation.
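Why low-intensity phosphopeptides are missed in data-dependent acquisition, and why enrichment helps, can be shown with a toy top-N selection. The sketch below is a deliberate simplification: the intensities, peptide labels and the value of top_n are invented for illustration, and enrichment is modeled simply as removing nonphosphorylated peptides, which ignores ion suppression and imperfect selectivity.

```python
def select_precursors(survey_scan, top_n=3):
    """Return the top_n most intense precursors from one survey scan."""
    ranked = sorted(survey_scan, key=lambda ion: ion["intensity"], reverse=True)
    return ranked[:top_n]


# Hypothetical survey scan: one weak phosphopeptide among abundant nonphosphopeptides.
survey_scan = [
    {"peptide": "nonphospho_A", "intensity": 9.0e6, "phospho": False},
    {"peptide": "nonphospho_B", "intensity": 7.5e6, "phospho": False},
    {"peptide": "nonphospho_C", "intensity": 4.2e6, "phospho": False},
    {"peptide": "nonphospho_D", "intensity": 3.1e6, "phospho": False},
    {"peptide": "phospho_X", "intensity": 2.0e5, "phospho": True},
]

before = [ion["peptide"] for ion in select_precursors(survey_scan)]

# Idealized enrichment: only phosphopeptides remain in the sample.
enriched_scan = [ion for ion in survey_scan if ion["phospho"]]
after = [ion["peptide"] for ion in select_precursors(enriched_scan)]

print("selected before enrichment:", before)
print("selected after enrichment:", after)
```

Before enrichment the phosphopeptide never reaches the sequencing queue; after enrichment it is the only candidate and is selected.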
2.1 Anti-pTyr antibodies There are commercially available antibodies against pTyr (e.g. 4G10, PY99, PT66 and PY100). These antibodies bind to pTyr with little to no detectable cross-reactivity to pSer, pThr or nonphosphorylated tyrosine. It has been shown that different pTyr antibodies can bind to the same pTyr protein in immunoanalysis, independent of the exact nature of the immunogen against which the antibodies were raised [19]. Unfortunately, equally effective antibodies against serine or threonine phosphorylated proteins are not currently available. The high specificity of pTyr antibodies may be due to the aromatic side chain of pTyr, which affords high binding affinity and selectivity compared to the aliphatic side chains of pSer or pThr. These antibodies have been routinely used in the isolation of tyrosine-phosphorylated proteins and have been very useful for enriching low abundance pTyr proteins for MS analysis [20]. Until recently it was generally believed that pTyr antibodies are not well suited to the enrichment of pTyr peptides [21–24]. In 2005, Rush et al. showed that pTyr peptides can be efficiently isolated from protease-digested cellular protein extracts using a pTyr antibody [25]. By combining this approach with LC-MS/MS, hundreds of pTyr sites were identified using cancer cell lines. Several recent studies have used this approach to profile pTyr changes in RTK signaling pathways [26,27]. For MALDI-MS analysis, the selectivity of the immunoaffinity isolation was improved by Zhang et al. by using an immunoprecipitation (IP) buffer containing the detergent n-octyl glucoside [28]. For pTyr analysis of complex samples, immunoaffinity enrichment of pTyr peptides has the advantage over other approaches that only pTyr peptides are enriched while pSer and pThr peptides are discarded. 
For generic phosphorylation enrichment approaches, all forms of phosphorylation are enriched such that even after enrichment, detection of pY peptides remains difficult in the presence of overwhelming amounts of pSer and pThr peptides.
2.2 PTyr binding domains Recombinant pTyr binding domains can be used to isolate their target proteins or peptides. Known domains that bind to pTyr include the SH2 and PTB domains, and the C2 domain of protein kinase C [29–31]. The SH2 domain of Grb2 has been used to selectively pull down interacting pTyr proteins for MS analysis [32]. Unlike anti-pTyr antibodies, SH2 domains recognize pTyr and three to five amino acids C-terminal to the pTyr site. Therefore the interactions are sequence specific and cannot be used for global pTyr enrichment.
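The sequence dependence of SH2 binding can be sketched as a motif search over the residues that follow each tyrosine. In the example below, the window of four C-terminal residues falls within the three to five stated above. The Y-x-N pattern is used as a stand-in for the commonly reported Grb2 SH2 preference and is an assumption not taken from this chapter; the protein sequence and function names are likewise hypothetical.

```python
import re


def sh2_windows(sequence, n_cterm=4):
    """Return (1-based position, window) for each Tyr plus n_cterm following residues."""
    return [
        (i + 1, sequence[i:i + 1 + n_cterm])
        for i, aa in enumerate(sequence)
        if aa == "Y"
    ]


def matches_motif(window, motif=r"^Y.N"):
    """True if the window fits the assumed binding motif (default: pY-x-N)."""
    return re.match(motif, window) is not None


# Hypothetical sequence with two tyrosines in different sequence contexts.
protein = "AEPEYVNQSGLDYEEIK"
windows = sh2_windows(protein)
binders = [(pos, win) for pos, win in windows if matches_motif(win)]

print(windows)
print(binders)
```

Only one of the two tyrosines sits in a context that fits the assumed motif, so a single SH2 domain would capture only a subset of pTyr sites. This is why such domains are unsuitable for global pTyr enrichment.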
2.3 Global phosphopeptide enrichment In addition to the immunoaffinity-based methods, in recent years several chromatographic and electrophoretic enrichment methods have been developed
for the isolation of phosphopeptides. These methods include immobilized metal affinity chromatography (IMAC) [33–35], metal oxide/hydroxide affinity chromatography (MOAC, e.g. TiO2, ZrO2 or Al(OH)3, etc.) [36–38], strong cation-exchange chromatography (SCX) [39] and isoelectric focusing (IEF) [40,41]. IMAC: IMAC has been the most popular technique for the isolation of phosphopeptides prior to MS analysis. IMAC utilizes the high affinity of phosphate groups towards a metal-chelated stationary phase, typically Fe3+ or Ga3+. The traditional IMAC method has been plagued by non-specific retention of peptides rich in acidic amino acids (aspartic acid and glutamic acid). Using an esterification reaction with methanolic HCl to convert the acidic residues to methyl esters before IMAC enrichment, Ficarro et al. significantly improved the selectivity of this method [35]. Despite this progress, IMAC methods can vary widely in effectiveness, mainly due to the requirement for metal ion loading and washing steps and the use of different sample loading/washing buffers [42]. TiO2 chromatography: Organic phosphate can be selectively adsorbed to the surface of titanium dioxide, which makes titanium dioxide chromatography a new and alternative technique to IMAC for the highly specific enrichment of phosphopeptides [36]. These two methods seem to be complementary because each method isolates a different but somewhat overlapping subset of phosphopeptides from a phosphoproteome [43]. Titanium dioxide chromatography is very robust, reproducible and easy to use. We and others have observed that the selectivity of TiO2 chromatography can be greatly improved by using very low pH sample loading and washing buffers (e.g. 1–5% TFA in 80% acetonitrile). SCX chromatography and multi-dimensional enrichment methods: SCX chromatography has been widely applied as the first fractionation dimension in shotgun proteomics [44]. Recently Beausoleil et al. 
showed that strong cation exchange at pH 2.7 could be used to enrich phosphorylated peptides, which generally eluted in early fractions because of their relatively low charges as compared to nonphosphorylated peptides [39]. Compared to IMAC and TiO2 chromatography, SCX chromatography suffers much more from contamination by nonphosphorylated acidic peptides. In addition, phosphorylated peptides with internal basic residues elute with the bulk of nonphosphorylated peptides and therefore may be missed in the analysis. To compensate for poor selectivity, in many cases SCX chromatography is combined with IMAC [45] or TiO2 chromatography [46] and functions as a pre-enrichment and prefractionation tool. Another reason for this combination is that SCX has a high capacity, and both IMAC and TiO2 chromatography work better for samples of low complexity.
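The charge argument behind SCX enrichment at pH 2.7 can be made concrete with a rough estimate. The sketch below assumes a simplified rule: the N-terminus and each K, R or H contribute +1, carboxyl groups are treated as neutral at this pH, and each phosphate contributes about -1. The rule, the function name and the peptides are illustrative assumptions that ignore partial ionization.

```python
def approx_charge_ph27(sequence, n_phospho=0):
    """Rough net charge of a peptide at about pH 2.7 under the stated assumptions."""
    basic = sum(sequence.count(aa) for aa in "KRH")
    return 1 + basic - n_phospho  # +1 accounts for the free N-terminus


# Hypothetical tryptic peptides.
examples = [
    ("SSLPEK", 0),    # typical nonphosphorylated tryptic peptide
    ("SSLPEK", 1),    # same peptide with one phosphate
    ("AHSSLPEK", 1),  # phosphopeptide with an internal basic residue
]
for seq, n in examples:
    print(seq, n, approx_charge_ph27(seq, n))
```

Under these assumptions a typical tryptic peptide carries a charge of about +2 and its singly phosphorylated form about +1, so the latter binds more weakly and elutes earlier. A phosphopeptide with an internal histidine returns to about +2 and would co-elute with the bulk of nonphosphorylated peptides, in line with the limitation noted above.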
3. QUALITATIVE ANALYSIS Characterization of phosphoproteins usually involves the isolation of phosphoproteins, enzymatic digestion, isolation of phosphopeptides, detection of phosphopeptides and phosphopeptide sequencing using tandem MS. Figure 1 summarizes the main strategies used in modern phosphoproteomics.
Guoan Zhang et al.
Figure 1 Main analytical pathways in modern phosphoproteomics. Recent useful tools or strategies are shown in italics and include the use of metal oxide affinity chromatography (MOAC) to purify phosphoproteins, use of IMAC or titanium dioxide (TiO2) to select phosphopeptides, strong cation-exchange chromatography (SCX) to separate phosphopeptides, multiple reaction monitoring (MRM) to select ions that correspond to potential phosphopeptides (in the case of already identified phosphoproteins) and additional fragmentation (MS3) to help with the identification of phosphorylation sites.
3.1 Detection of phosphopeptides
As summarized in Figure 1, there are several methods available for the specific detection of phosphopeptides from peptide mixtures [47,48]. Many of the detection
Analysis of Protein-Tyrosine Phosphorylation by MS
Table 1 Useful masses for phosphopeptide analysis by mass spectrometry
Method                         pSer/pThr      pTyr
Phosphatase treatment          80 (HPO3)      80 (HPO3)
Neutral loss scan              98 (H3PO4)     80 (HPO3)
Negative precursor ion scan    79 (PO3−)      79 (PO3−)
Positive precursor ion scan    N/A            216 (immonium ion)
methods are based on changes in the mass of phosphorylated amino acid residues in the gas phase during collision-induced dissociation (CID). The most useful masses for the detection of phosphopeptides are shown in Table 1. Phosphopeptides can be detected based on the n × 79.97 Da (HPO3) increase in the MW of a putative phosphopeptide, and the candidates can be verified by MS/MS sequencing [49]. Alternatively, phosphopeptides can be identified based on an n × 79.97 Da (HPO3) loss in mass after phosphatase treatment.
Besides the mass differences, phosphopeptides can also be detected based on the negative charges on the phosphates. Negative ion MALDI-MS has been used for identification of phosphopeptides, which demonstrate greater relative ion intensities in negative ion mode as compared to positive ion mode [50,51]. However, this approach suffers from poor specificity because of the high background of nonphosphorylated acidic peptides. Recently, Xu et al. showed that removal of these acidic groups by esterification with methyl groups could greatly diminish the ion intensity of these acidic nonphosphorylated peptides in negative ion mode and therefore greatly increase the selectivity of this method [52].
More commonly, phosphopeptides can be selectively detected using ‘‘neutral loss scan’’ [53] and ‘‘precursor ion scan’’ methods [54–57]. Phosphopeptides, especially those containing pSer and pThr, tend to lose the phosphate moiety during CID. Neutral loss of 98 Da has been very useful for the detection of phosphoserine/phosphothreonine-containing peptides [58]. Although pTyr can occasionally lose 80 Da by neutral loss in CID, neutral loss scanning is rarely applied to detect tyrosine-phosphorylated peptides because of the stability of the phosphate linkage to tyrosine. In negative ion mode, the fragment ion at m/z 79 (PO3−) has been most frequently used for phosphorylation-specific precursor ion scanning [53,57]. 
These ions are formed by all phosphopeptides, regardless of whether phosphorylation occurs on serine, threonine or tyrosine residues. This approach suffers from two limitations: (1) alkaline spray solvent conditions are required to achieve good sensitivity; and (2) MS/MS sequencing of phosphopeptides is almost always carried out in positive ion mode. A more specific method for the detection of tyrosine-phosphorylated peptides is phosphotyrosine-specific immonium ion scanning (PSI scanning). During PSI scanning, the immonium ion of phosphotyrosine at m/z 216.043 is a reporter for tyrosine-phosphorylated peptides. Because most of the other fragment ions at the same nominal mass can be resolved from this immonium ion using a high resolution tandem mass spectrometer such as a quadrupole TOF, this method can be very specific and sensitive for the detection of tyrosine-phosphorylated
peptides. The applicability of PSI scanning for the sensitive analysis of low abundance tyrosine-phosphorylated signaling proteins has been shown by the mapping of phosphotyrosine residues in the epidermal growth factor (EGF) [59] and fibroblast growth factor receptor [60] signaling pathways and in the Bcr/Abl fusion oncoprotein [55].
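The mass relationships used in this section can be sketched in a few lines. This is an illustrative sketch only: the monoisotopic masses are computed from standard atomic masses rather than taken from Table 1 (which lists nominal values), the function names and tolerances are assumptions, and the unmodified peptide mass of 1587.69 Da is simply the Figure 2 value (1667.66 Da) minus 79.97 Da.

```python
# Monoisotopic masses in Da, computed from standard atomic masses.
HPO3 = 79.96633          # phosphate group added on phosphorylation
H3PO4 = 97.97690         # neutral lost from pSer/pThr during CID
PTYR_IMMONIUM = 216.043  # pTyr immonium ion m/z quoted in the text


def phospho_count(observed_mass, unmodified_mass, max_sites=5, tol=0.02):
    """Return n if the mass difference matches n x HPO3 within tol, else None."""
    delta = observed_mass - unmodified_mass
    for n in range(1, max_sites + 1):
        if abs(delta - n * HPO3) <= tol:
            return n
    return None


def neutral_loss_mz(precursor_mz, charge, loss_mass=H3PO4):
    """m/z of the product ion after loss of a neutral, charge unchanged."""
    return precursor_mz - loss_mass / charge


def is_ptyr_reporter(fragment_mz, tol_ppm=50.0):
    """True if a fragment lies within tol_ppm of the pTyr immonium ion."""
    return abs(fragment_mz - PTYR_IMMONIUM) / PTYR_IMMONIUM * 1e6 <= tol_ppm


print(phospho_count(1667.66, 1587.69))   # one phosphate group
print(neutral_loss_mz(834.84, charge=2)) # doubly charged precursor
print(is_ptyr_reporter(216.10))          # same nominal mass, different ion
```

The neutral loss offset scales with charge (98/z in m/z units), which is why a neutral loss scan has to be set up per charge state, and the ppm window shows why a high resolution analyzer is needed to separate the reporter ion from other fragments at nominal m/z 216.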
3.2 MS/MS analysis of phosphopeptides
Tandem MS has been the most common method for determining protein phosphorylation sites. In the mass spectrometer, phosphopeptides tend to undergo facile loss of phosphate from pSer, pThr and less frequently pTyr residues upon CID. In positive ion mode, pSer/pThr residues exhibit neutral loss of 98 Da (due to the loss of H3PO4). pTyr residues are more stable, although neutral loss of 80 Da (due to the loss of HPO3) can be observed in MALDI-MS/MS using elevated collision energy. Figure 2 illustrates the different stabilities of pThr and pTyr residues during CID (unpublished data). Both ESI-MS/MS and MALDI-MS/MS work well for the sequence analysis of phosphorylated peptides. CID of singly charged nonphosphorylated peptides in MALDI analyses suffers from complex fragmentation patterns and abundant
Figure 2 Deconvoluted ESI Q-TOF MS/MS spectra of a singly phosphorylated peptide (TPEEGGYSYEISEK, MW = 1667.66 after phosphorylation) from mouse microtubule-associated protein 1B. In (a) Thr1 was phosphorylated and in (b) Tyr9 was phosphorylated.
internal fragments. However, CID of singly charged phosphopeptides is typically informative, and often yields mass spectra containing dominant y and b ions. While we are not aware of any studies addressing this phenomenon directly, these y and b ions may be due to an additional mobile proton provided by the phosphate group counter ion.
4. QUANTITATIVE ANALYSIS
Changes in tyrosine phosphorylation levels of signaling proteins play crucial roles in controlling cellular signal transduction. Therefore, it is particularly important to quantify these changes to understand the regulation of signaling. For relative quantitation of phosphorylated proteins by MS, stable isotope labeling approaches are generally used. Usually these approaches involve differential labeling of proteins or peptides using non-radioactive (stable) isotopes before the samples are mixed for MS analysis. The labeled/unlabeled samples have predictable mass differences which can be readily recognized in MS. Because the labeled/unlabeled pairs have the same chemical properties, the samples can be mixed after labeling and handled together so that the error associated with sample preparation and MS detection is minimized [61–63]. There are several methods for incorporating stable isotopes into proteins. Some methods such as ICAT only label a subset of total peptides (e.g. cysteine-containing peptides) [64]. For quantitation of phosphorylation, it is important to make sure all phosphopeptides are labeled. For this purpose several major strategies are available: (1) stable isotope labeling by amino acids in cell culture (SILAC) [61]; (2) chemical labeling through amine groups (N-termini and lysines are labeled) [65–71] or carboxyl groups (C-termini, aspartic acids and glutamic acids are labeled) [72] and (3) 18O labeling during protease digestion [73,74]. Of these methods, it appears that SILAC is becoming the method of choice. SILAC involves cell culture in media supplemented with ‘‘light’’ (natural) or ‘‘heavy’’ isotope-containing amino acids. The isotopes are incorporated into proteins during protein synthesis in the cells. After labeling, all proteins in two or more cell populations are encoded with either light or heavy versions of the labeling amino acid, which allows for relative quantitation in MS. 
If labeling takes place during several cell divisions, SILAC has high labeling efficiency (≥97% after 5 cell divisions) and does not involve complicated chemistry. More importantly, it allows combining the labeled and unlabeled samples at an early stage of the experiment to avoid variability caused by parallel sample handling. This is a major advantage over chemical labeling and proteolytic 18O labeling, especially when enrichment of phosphorylated proteins or peptides is required prior to MS analysis. Preferably, arginine and lysine are used as the labeling amino acids for SILAC in combination with trypsin as the protease so that all tryptic peptides contain a labeling amino acid. For pTyr analysis, tyrosine can be used as the labeling amino acid [75]. Alternatively, if a phosphorylation site is known, the corresponding phosphopeptide can be chemically synthesized with heavy isotopes and spiked into the sample in known quantities as an internal standard for quantitation. The
major advantages of this method include: (1) it allows absolute quantitation and (2) there is no need to label the samples. Gygi and colleagues have used this method to quantify phosphorylation changes during the Xenopus cell cycle [76,77]. Limitations include: (1) the internal standards need to be identified and synthesized and (2) it is more suitable for studies of a small number of targeted proteins and is difficult to apply to large-scale proteomic experiments.
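Two of the quantitative ideas in this section reduce to simple arithmetic, sketched below under explicit assumptions. The incorporation model assumes that pre-existing light protein is only diluted two-fold at each cell division (protein turnover is ignored), and the absolute quantitation function assumes equal MS response for the endogenous (light) peptide and the heavy synthetic standard. Function names, intensities and the spiked amount are hypothetical.

```python
def silac_incorporation(n_divisions: int) -> float:
    """Heavy-label fraction after n divisions, dilution-only model (no turnover)."""
    return 1.0 - 0.5 ** n_divisions


def absolute_amount(light_intensity: float, heavy_intensity: float,
                    spiked_amount: float) -> float:
    """Endogenous amount from the light/heavy ratio and a known heavy spike."""
    return light_intensity / heavy_intensity * spiked_amount


print(silac_incorporation(5))                 # 0.96875
print(absolute_amount(3.0e6, 1.5e6, 50.0))    # 100.0, in the units of the spike
```

The dilution-only model gives about 97% incorporation after five divisions, in line with the figure quoted above; protein turnover, which the model ignores, would only increase it.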
4.1 Study of cellular signaling by quantitative phosphotyrosine proteome analysis
Quantitative analysis of pTyr proteomes can be used to elucidate pTyr-dependent signaling mechanisms. Rather than focusing on one specific target protein at a time, such studies can provide a global view of the changes of tyrosine phosphorylation of proteins inside cells in response to stimuli. The quantitation of pTyr can be performed at the protein level or peptide level. Here we give an example of each.
Protein level: The main goal of these studies is to identify changes in the phosphorylation states of phosphotyrosine-containing proteins and their association with binding partners in response to a specific stimulus, for example, a ligand for a receptor tyrosine kinase. Matthias Mann and colleagues described a strategy to screen for effector proteins in the epidermal growth factor receptor (EGFR) signaling pathway based on quantitative analysis of pTyr-containing proteins [78]. The basic principle of the strategy is as follows. First, the proteins in two populations of cells are labeled with amino acids containing different stable isotopes (SILAC). Then one cell population is stimulated with EGF ligand to activate the EGFR pathway, while the other population is left untreated as a control. The two sets of cells are lysed and equal amounts of the lysates are combined for anti-pTyr IP. After tryptic digestion, the IPed proteins are identified and quantified using MS. The proteins with changed ratios are either tyrosine phosphorylated (or dephosphorylated) in response to EGF stimulation or they are tight binding partners of those differentially tyrosine-phosphorylated proteins. Those proteins not involved in EGF-mediated signaling or pulled down by the anti-pTyr IP through non-specific binding have ratios close to 1:1. Thus effector proteins can be easily recognized by their SILAC ratios after the anti-pTyr IP. 
To add another dimension to the analysis, a more sophisticated labeling scheme was used to obtain protein ratios from multiple time points. Three cell populations were encoded with different stable isotopic forms of arginine. Each population was stimulated by EGF for a different length of time, and anti-pTyr IP was performed. Arginine-containing peptides occurred in three forms in MS, based on which IPed proteins were quantified. Then two experiments were combined to generate five-point dynamic profiles for the signaling proteins (Figure 3). The time course experiments not only identified players in the pathway, but also the dynamic activation profiles of the involved proteins, thus providing an informative dataset for modeling signaling networks with a systems biology approach.
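Linking two three-state experiments through a shared time point can be sketched as a simple rescaling. This is a minimal illustration, not the data processing of ref. [78]: the function name, the time points (in minutes) and the ratios are hypothetical, and each experiment is assumed to report ratios relative to one of its own states.

```python
def link_time_courses(exp_a: dict, exp_b: dict, common) -> dict:
    """Merge two {time: ratio} profiles that share the time point `common`.

    Ratios from exp_b are rescaled so that its value at the common time
    point equals the value measured in exp_a.
    """
    scale = exp_a[common] / exp_b[common]
    merged = dict(exp_a)
    for time, ratio in exp_b.items():
        merged.setdefault(time, ratio * scale)
    return dict(sorted(merged.items()))


# Hypothetical ratios for one protein; experiment B is expressed relative
# to its own 5 min state.
exp_a = {0: 1.0, 1: 2.0, 5: 4.0}
exp_b = {5: 1.0, 10: 0.5, 20: 0.25}

print(link_time_courses(exp_a, exp_b, common=5))
```

The merged profile has five time points on a single scale, with the common time point serving as the anchor between the two datasets.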
Figure 3 Construction of a temporal profile of the phosphotyrosine proteome in response to EGFR activation. (A) Cells are SILAC labeled with one of three different forms of arginine. Each population is stimulated by EGF for the indicated time interval and lysed. Lysates are combined for anti-pTyr IP. IPed proteins are digested and analyzed by LC-MS/MS. Arg-0, Arg-6 and Arg-10 stand for ¹²C₆¹⁴N₄-Arg, ¹³C₆¹⁴N₄-Arg and ¹³C₆¹⁵N₄-Arg, respectively. (B) Two such experiments with different time points are then combined to obtain a temporal profile containing five time points. One common time point is used to link the two datasets. (See Colour Plate Section at the end of this book.)
Peptide level: Although the quantitation of pTyr proteins after anti-pTyr IP can be very useful, it does not provide information about dynamic changes of specific phosphorylation sites in signaling events. Because many signaling proteins have multiple pTyr sites, the dynamic change for each specific site cannot be deduced from the ratio of the phosphoproteins. Identification of pTyr peptides is more challenging than identification of pTyr-containing proteins, but the result is more informative because it provides links between specific phosphorylation sites and a specific signaling process. A good example of this approach is a comprehensive study of site-specific phosphorylation dynamics in EGFR signaling by Mann and colleagues [46]. Cell populations were labeled with SILAC and treated with EGF for different lengths of time. Cellular proteins were then digested. Phosphopeptides were fractionated and enriched using strong cation exchange and TiO2 before LC-MS identification and quantitation (Figure 4). The temporal dynamics of 6,600 phosphorylation sites in response to EGF stimulation were established. All three forms of phosphorylation were enriched in this experiment by the use of TiO2, but anti-pTyr antibodies
Figure 4 Generalized strategy for profiling the phosphorylation site dynamics for intracellular signaling systems. Three cell populations are differentially labeled with three different forms of arginine (¹²C₆¹⁴N₄-Arg, ¹³C₆¹⁴N₄-Arg and ¹³C₆¹⁵N₄-Arg) and lysine (¹²C₆¹H₄¹⁴N₂-Lys, ¹²C₆²H₄¹⁴N₂-Lys and ¹³C₆¹H₄¹⁵N₂-Lys), creating three states distinguished by mass. Each population is stimulated for a different length of time with a receptor tyrosine kinase ligand such as EGF or another stimulus, and two such experiments are combined to yield five time points. Cells are combined, lysed and fractionated into cytoplasmic and nuclear fractions. After tryptic digestion, peptides are fractionated using SCX. Phosphopeptides from each SCX fraction are then enriched using TiO2 and analyzed by MS.
can be used to specifically enrich pTyr peptides in targeted studies of tyrosine phosphorylation.
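The three labeling states described for Figures 3 and 4 differ by predictable mass increments, which can be worked out from the isotope substitutions. The sketch below is illustrative only: isotope mass differences are standard values, while the dictionary keys Lys-4 and Lys-8 and the function name are assumed shorthand for the lysine forms in Figure 4 rather than names used in the chapter.

```python
# Mass differences between heavy and light isotopes, in Da.
C13_SHIFT = 1.003355   # 13C - 12C
N15_SHIFT = 0.997035   # 15N - 14N
H2_SHIFT = 1.006277    # 2H - 1H

# Mass increase per labeled residue relative to the light form.
LABEL_SHIFT = {
    "Arg-0": 0.0,
    "Arg-6": 6 * C13_SHIFT,                    # 13C6 14N4-Arg
    "Arg-10": 6 * C13_SHIFT + 4 * N15_SHIFT,   # 13C6 15N4-Arg
    "Lys-0": 0.0,
    "Lys-4": 4 * H2_SHIFT,                     # 12C6 2H4 14N2-Lys
    "Lys-8": 6 * C13_SHIFT + 2 * N15_SHIFT,    # 13C6 1H4 15N2-Lys
}


def mz_spacing(label: str, charge: int, n_labeled: int = 1) -> float:
    """m/z offset from the light peak for a peptide with n labeled residues."""
    return n_labeled * LABEL_SHIFT[label] / charge


for label in ("Arg-6", "Arg-10", "Lys-4", "Lys-8"):
    print(label, round(LABEL_SHIFT[label], 4), round(mz_spacing(label, 2), 4))
```

For a doubly charged tryptic peptide ending in arginine, the three states therefore appear about 3.01 and 5.00 m/z units above the light peak, which is what allows the three populations to be quantified from a single spectrum.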
5. FUTURE DIRECTIONS
5.1 MS instrumentation
Just a few years ago, the analysis of pTyr by MS, especially on a proteomic scale, was a formidable task even for the most advanced MS centers. Now, although such experiments are still considered technologically challenging, hundreds of pTyr sites can be identified and quantified in a single experiment using techniques and instruments available to most MS laboratories. This progress has been largely attributed to the rapid development and improvement in MS instrumentation. In recent years the invention of new MS instruments, especially linear ion traps [79] and hybrid instruments such as LTQ-FT™ (linear ion trap-Fourier transform) [80] and LTQ-Orbitrap™ (linear ion trap-Orbitrap) [81], has greatly improved the sensitivity, mass accuracy, speed and dynamic range of MS analysis. Improvements to proven technologies such as the quadrupole time-of-flight, triple quadrupole, and tandem time-of-flight mass spectrometers have also greatly aided the study of posttranslational modifications such as phosphorylation. New MS/MS technologies, such as the newly developed electron capture dissociation (ECD) and electron transfer dissociation (ETD), have shown great promise in sequencing of phosphopeptides [82,83]. While top-down approaches for the analysis of pTyr are not yet widely used, as high resolution MS instrumentation and the appropriate software for data analysis improve and become easier to use, we anticipate that this powerful approach will become more and more popular due to its ability to achieve complete protein sequence coverage [84–86].
5.2 pTyr enrichment
Immunoaffinity enrichment of pTyr peptides by anti-pTyr antibodies has demonstrated its utility and will play a more important role in pTyr analysis. Some global phosphopeptide enrichment techniques such as IMAC and TiO2-based approaches have become fairly mature and robust after having been improved and optimized considerably over the last few years. They will continue to be employed in pTyr analysis as important alternatives to anti-pTyr antibodies.
5.3 Bioinformatics
With large numbers of pTyr sites being identified by MS, there is a need for more powerful bioinformatic tools for data interpretation and mining. New tools are being developed to facilitate database searching for phosphopeptides and localization of modification sites, which are important for large-scale pTyr proteome analysis [87–89]. By data mining large numbers of pTyr sites identified from proteomic experiments, novel phosphorylation motifs and signaling
networks may be discovered [90,91]. There also will be an increasing need for better documentation of the newly discovered pTyr sites. Conventional protein databases contain very little information about tyrosine phosphorylation of proteins. To enable efficient use of phosphorylation information in various proteomes, some phosphorylation databases have been established, such as PHOSIDA (M. Mann lab, Max Planck Institute of Biochemistry), PhosphoSite (Cell Signaling Technology), and Phospho.ELM (The European Molecular Biology Laboratory). Such databases are expected to expand considerably in size and utility in the future.
5.4 Quantitation
Quantitation of pTyr proteomes is a powerful tool to study pTyr-dependent cell signaling events. Experiments that involve temporal profiling of in vivo pTyr sites are especially useful in this regard, and we expect the number of time course experiments to increase as improvements in bioinformatics and instrumentation make these experiments more user friendly. Eventually this type of experiment will be applied to characterization of many or all RTK pathways in different biological contexts. SILAC and other stable isotope labeling based approaches will become even more popular in quantitative pTyr studies. The current goal of many labs, however, is to eliminate the need for stable isotope labeling for comparative and absolute quantitation studies of tyrosine phosphorylation. We anticipate that the popularity of label-free methods will increase as methodologies and bioinformatic tools improve in the future.
6. CONCLUSIONS
There are many newly developed techniques available for sample preparation, MS identification and quantitation. To choose the strategy for a specific pTyr study, it is important to understand the strengths and limitations of these techniques. Regardless of the method used, enrichment of pTyr proteins and peptides increases the likelihood of successful analysis. MS is already the technique of choice for analysis of tyrosine phosphorylation. It will become even more powerful in the near future.
ABBREVIATIONS
CID           Collision-induced dissociation
EGF           Epidermal growth factor
EGFR          Epidermal growth factor receptor
ESI           Electrospray ionization
IMAC          Immobilized metal ion affinity chromatography
IP            Immunoprecipitation
MALDI         Matrix assisted laser desorption ionization
MS            Mass spectrometry
MS/MS         Tandem mass spectrometry
pSer or pS    Phosphoserine
PSI           Phosphotyrosine-specific immonium ion scanning
PTB           Phosphotyrosine binding
pThr or pT    Phosphothreonine
pTyr or pY    Phosphotyrosine
SCX           Strong cation exchange
SH2           Src homology 2
SILAC         Stable isotope labeling by amino acids in cell culture
ACKNOWLEDGEMENT
We gratefully acknowledge NIH grants P30 CA016087 and P30 NS050276 to T.A.N. for support.
REFERENCES
1 T. Hunter, Philos. Trans. R. Soc. Lond. B. Biol. Sci., 353(1368) (1998) 583–605.
2 A. Ostman and F.D. Bohmer, Trends Cell Biol., 11(6) (2001) 258–266.
3 S. Zolnierowicz and M. Bollen, EMBO J., 19(4) (2000) 483–488.
4 A. Alonso, J. Sasin, N. Bottini, I. Friedberg, I. Friedberg, A. Osterman, A. Godzik, T. Hunter, J. Dixon and T. Mustelin, Cell, 117(6) (2004) 699–711.
5 G. Manning, D.B. Whyte, R. Martinez, T. Hunter and S. Sudarsanam, Science, 298(5600) (2002) 1912–1934.
6 T. Pawson, Eur. J. Cancer, 38(Suppl 5) (2002) S3–S10.
7 A.C. Porter and R.R. Vaillancourt, Oncogene, 17(11 Reviews) (1998) 1343–1352.
8 T. Pawson, Cell, 116(2) (2004) 191–203.
9 M.B. Yaffe, Nat. Rev. Mol. Cell Biol., 3(3) (2002) 177–186.
10 M. Mann, S.E. Ong, M. Gronborg, H. Steen, O.N. Jensen and A. Pandey, Trends Biotechnol., 20(6) (2002) 261–268.
11 E. Salih, Mass Spectrom. Rev., 24(6) (2005) 828–846.
12 S.J. Ding, W.J. Qian and R.D. Smith, Expert Rev. Proteomics, 4(1) (2007) 13–23.
13 M. Mann and O.N. Jensen, Nat. Biotechnol., 21(3) (2003) 255–261.
14 D.T. McLachlin and B.T. Chait, Curr. Opin. Chem. Biol., 5(5) (2001) 591–602.
15 S.A. Carr, R.S. Annan and M.J. Huddleston, Methods Enzymol., 405 (2005) 82–115.
16 H. Steen and M. Mann, Nat. Rev. Mol. Cell Biol., 5(9) (2004) 699–711.
17 R. Aebersold and M. Mann, Nature, 422(6928) (2003) 198–207.
18 H. Liu, R.G. Sadygov and J.R. Yates 3rd, Anal. Chem., 76(14) (2004) 4193–4201.
19 M.P. Kamps, Methods Enzymol., 201 (1991) 101–110.
20 A. Pandey, A.V. Podtelejnikov, B. Blagoev, X.R. Bustelo, M. Mann and H.F. Lodish, Proc. Natl. Acad. Sci. USA, 97(1) (2000) 179–184.
21 V. De Corte, H. Demol, M. Goethals, J. Van Damme, J. Gettemans and J. Vandekerckhove, Protein Sci., 8(1) (1999) 234–241.
22 M.S. Kalo and E.B. Pasquale, Biochemistry, 38(43) (1999) 14396–14408.
23 M.S. Kalo, H.H. Yu and E.B. Pasquale, J. Biol. Chem., 276(42) (2001) 38940–38948.
24 K. Marcus, D. Immler, J. Sternberger and H.E. Meyer, Electrophoresis, 21(13) (2000) 2622–2636.
25 J. Rush, A. Moritz, K.A. Lee, A. Guo, V.L. Goss, E.J. Spek, H. Zhang, X.M. Zha, R.D. Polakiewicz and M.J. Comb, Nat. Biotechnol., 23(1) (2005) 94–101.
26 K. Schmelzle, S. Kane, S. Gridley, G.E. Lienhard and F.M. White, Diabetes, 55(8) (2006) 2171–2179.
27 Y. Zhang, A. Wolf-Yadlin, P.L. Ross, D.J. Pappin, J. Rush, D.A. Lauffenburger and F.M. White, Mol. Cell. Proteomics, 4(9) (2005) 1240–1250.
28 G. Zhang and T.A. Neubert, Proteomics, 6(2) (2006) 571–578.
29 B.T. Seet, I. Dikic, M.M. Zhou and T. Pawson, Nat. Rev. Mol. Cell Biol., 7(7) (2006) 473–483.
30 C.H. Benes, N. Wu, A.E. Elia, T. Dharia, L.C. Cantley and S.P. Soltoff, Cell, 121(2) (2005) 271–280.
31 P. Nollau and B.J. Mayer, Proc. Natl. Acad. Sci. USA, 98(24) (2001) 13531–13536.
32 B. Blagoev, I. Kratchmarova, S.E. Ong, M. Nielsen, L.J. Foster and M. Mann, Nat. Biotechnol., 21(3) (2003) 315–318.
33 M.C. Posewitz and P. Tempst, Anal. Chem., 71(14) (1999) 2883–2892.
34 A. Stensballe, S. Andersen and O.N. Jensen, Proteomics, 1(2) (2001) 207–222.
35 S.B. Ficarro, M.L. McCleland, P.T. Stukenberg, D.J. Burke, M.M. Ross, J. Shabanowitz, D.F. Hunt and F.M. White, Nat. Biotechnol., 20(3) (2002) 301–305.
36 M.R. Larsen, T.E. Thingholm, O.N. Jensen, P. Roepstorff and T.J.D. Jorgensen, Mol. Cell. Proteomics, 4(7) (2005) 873–886.
37 H.K. Kweon and K. Hakansson, Anal. Chem., 78(6) (2006) 1743–1749.
38 F. Wolschin, S. Wienkoop and W. Weckwerth, Proteomics, 5(17) (2005) 4389–4397.
39 S.A. Beausoleil, M. Jedrychowski, D. Schwartz, J.E. Elias, J. Villen, J.X. Li, M.A. Cohn, L.C. Cantley and S.P. Gygi, Proc. Natl. Acad. Sci. USA, 101(33) (2004) 12130–12135.
40 G. Maccarrone, N. Kolb, L. Teplytska, I. Birg, R. Zollinger, F. Holsboer and C.W. Turck, Electrophoresis, 27(22) (2006) 4585–4595.
41 C.F. Xu, H.B. Wang, D.M. Li, X.P. Kong and T.A. Neubert, Anal. Chem., 79(5) (2007) 2007–2014.
42 Y.M. Ndassa, C. Orsi, J.A. Marto, S. Chen and M.M. Ross, J. Proteome Res., 5(10) (2006) 2789–2799.
43 B. Bodenmiller, L.N. Mueller, M. Mueller, B. Domon and R. Aebersold, Nat. Methods, 4(3) (2007) 231–237.
44 A.J. Link, J. Eng, D.M. Schieltz, E. Carmack, G.J. Mize, D.R. Morris, B.M. Garvik and J.R. Yates, Nat. Biotechnol., 17(7) (1999) 676–682.
45 J.C. Trinidad, C.G. Specht, A. Thalhammer, R. Schoepfer and A.L. Burlingame, Mol. Cell. Proteomics, 5(5) (2006) 914–922.
46 J.V. Olsen, B. Blagoev, F. Gnad, B. Macek, C. Kumar, P. Mortensen and M. Mann, Cell, 127(3) (2006) 635–648.
47 S. Laugesen, A. Bergoin and M. Rossignol, Plant Physiol. Biochem., 42(12) (2004) 929–936.
48 H. Steen and M. Mann, J. Am. Soc. Mass Spectrom., 13(8) (2002) 996–1003.
49 E.J. Chang, V. Archambault, D.T. McLachlin, A.N. Krutchinsky and B.T. Chait, Anal. Chem., 76(15) (2004) 4472–4483.
50 Y.L. Ma, Y. Lu, H.Q. Zeng, D. Ron, W.J. Mo and T.A. Neubert, Rapid Commun. Mass Spectrom., 15(18) (2001) 1693–1700.
51 K. Janek, H. Wenschuh, M. Bienert and E. Krause, Rapid Commun. Mass Spectrom., 15(17) (2001) 1593–1599.
52 C.F. Xu, Y. Lu, J.H. Ma, M. Mohammadi and T.A. Neubert, Mol. Cell. Proteomics, 4(6) (2005) 809–818.
53 M.J. Huddleston, R.S. Annan, M.F. Bean and S.A. Carr, J. Am. Soc. Mass Spectrom., 4(9) (1993) 710–717.
54 M. Salek, A. Alonso, R. Pipkorn and W.D. Lehmann, Anal. Chem., 75(11) (2003) 2724–2729.
55 H. Steen, M. Fernandez, S. Ghaffari, A. Pandey and M. Mann, Mol. Cell. Proteomics, 2(3) (2003) 138–145.
56 H. Steen, B. Kuster, M. Fernandez, A. Pandey and M. Mann, Anal. Chem., 73(7) (2001) 1440–1448.
57 H. Steen, B. Kuster and M. Mann, J. Mass Spectrom., 36(7) (2001) 782–790.
58 A. Schlosser, R. Pipkorn, D. Bossemeyer and W.D. Lehmann, Anal. Chem., 73(2) (2001) 170–176.
59 H. Steen, B. Kuster, M. Fernandez, A. Pandey and M. Mann, J. Biol. Chem., 277(2) (2002) 1031–1039.
60 A.M. Hinsby, J.V. Olsen, K.L. Bennett and M. Mann, Mol. Cell. Proteomics, 2(1) (2003) 29–36.
61 S.E. Ong, B. Blagoev, I. Kratchmarova, D.B. Kristensen, H. Steen, A. Pandey and M. Mann, Mol. Cell. Proteomics, 1(5) (2002) 376–386.
62 S.E. Ong, L.J. Foster and M. Mann, Methods, 29(2) (2003) 124–130.
63 S.E. Ong and M. Mann, Nat. Chem. Biol., 1(5) (2005) 252–262.
64 S.P. Gygi, B. Rist, S.A. Gerber, F. Turecek, M.H. Gelb and R. Aebersold, Nat. Biotechnol., 17(10) (1999) 994–999.
65 J.L. Hsu, S.Y. Huang, N.H. Chow and S.H. Chen, Anal. Chem., 75(24) (2003) 6843–6852.
66 Y.H. Lee, H. Han, S.B. Chang and S.W. Lee, Rapid Commun. Mass Spectrom., 18(24) (2004) 3019–3027.
67 D.E. Mason and D.C. Liebler, J. Proteome Res., 2(3) (2003) 265–272.
68 M. Munchbach, M. Quadroni, G. Miotto and P. James, Anal. Chem., 72(17) (2000) 4047–4057.
69 P.L. Ross, Y.N. Huang, J.N. Marchese, B. Williamson, K. Parker, S. Hattan, N. Khainovski, S. Pillai, S. Dey, S. Daniels, S. Purkayastha, P. Juhasz, S. Martin, M. Bartlet-Jones, F. He, A. Jacobson and D.J. Pappin, Mol. Cell. Proteomics, 3(12) (2004) 1154–1169.
70 A. Schmidt, J. Kellermann and F. Lottspeich, Proteomics, 5(1) (2005) 4–15.
71 X. Zhang, Q.K. Jin, S.A. Carr and R.S. Annan, Rapid Commun. Mass Spectrom., 16(24) (2002) 2325–2332.
72 D.R. Goodlett, A. Keller, J.D. Watts, R. Newitt, E.C. Yi, S. Purvine, J.K. Eng, P. von Haller, R. Aebersold and E. Kolker, Rapid Commun. Mass Spectrom., 15(14) (2001) 1214–1221.
73 O.A. Mirgorodskaya, Y.P. Kozmin, M.I. Titov, R. Korner, C.P. Sonksen and P. Roepstorff, Rapid Commun. Mass Spectrom., 14(14) (2000) 1226–1232.
74 X. Yao, A. Freas, J. Ramirez, P.A. Demirev and C. Fenselau, Anal. Chem., 73(13) (2001) 2836–2842.
75 N. Ibarrola, H. Molina, A. Iwahori and A. Pandey, J. Biol. Chem., 279(16) (2004) 15805–15813.
76 S.A. Gerber, J. Rush, O. Stemman, M.W. Kirschner and S.P. Gygi, Proc. Natl. Acad. Sci. USA, 100(12) (2003) 6940–6945.
77 D.S. Kirkpatrick, S.A. Gerber and S.P. Gygi, Methods, 35(3) (2005) 265–273.
78 B. Blagoev, S.E. Ong, I. Kratchmarova and M. Mann, Nat. Biotechnol., 22(9) (2004) 1139–1145.
79 J.C. Schwartz, M.W. Senko and J.E. Syka, J. Am. Soc. Mass Spectrom., 13(6) (2002) 659–669.
80 W. Haas, B.K. Faherty, S.A. Gerber, J.E. Elias, S.A. Beausoleil, C.E. Bakalarski, X. Li, J. Villen and S.P. Gygi, Mol. Cell. Proteomics, 5(7) (2006) 1326–1337.
81 A. Makarov, E. Denisov, A. Kholomeev, W. Balschun, O. Lange, K. Strupat and S. Horning, Anal. Chem., 78(7) (2006) 2113–2120.
82 N.L. Kelleher, R.A. Zubarev, K. Bush, B. Furie, B.C. Furie, F.W. McLafferty and C.T. Walsh, Anal. Chem., 71(19) (1999) 4250–4253.
83 J.E. Syka, J.J. Coon, M.J. Schroeder, J. Shabanowitz and D.F. Hunt, Proc. Natl. Acad. Sci. USA, 101(26) (2004) 9528–9533.
84 X.M. Han, M. Jin, K. Breuker and F.W. McLafferty, Science, 314(5796) (2006) 109–112.
85 M.T. Boyne, J.J. Pesavento, C.A. Mizzen and N.L. Kelleher, J. Proteome Res., 5(2) (2006) 248–253.
86 S.M. Patrie, J.T. Ferguson, D.E. Robinson, D. Whipple, M. Rother, W.W. Metcalf and N.L. Kelleher, Mol. Cell. Proteomics, 5(1) (2006) 14–25.
87 S.A. Beausoleil, J. Villen, S.A. Gerber, J. Rush and S.P. Gygi, Nat. Biotechnol., 24(10) (2006) 1285–1292.
88 R. Matthiesen, M.B. Trelle, P. Hojrup, J. Bunkenborg and O.N. Jensen, J. Proteome Res., 4(6) (2005) 2338–2347.
89 D. Tsur, S. Tanner, E. Zandi, V. Bafna and P.A. Pevzner, Nat. Biotechnol., 23(12) (2005) 1562–1567.
90 I. Kratchmarova, B. Blagoev, M. Haack-Sorensen, M. Kassem and M. Mann, Science, 308(5727) (2005) 1472–1477.
91 J. Villen, S.A. Beausoleil, S.A. Gerber and S.P. Gygi, Proc. Natl. Acad. Sci. USA, 104(5) (2007) 1488–1493.
Plate 7 Construction of a temporal profile of the phosphotyrosine proteome in response to EGFR activation. (A) Cells are SILAC labeled with one of three different forms of arginine. Each population is stimulated by EGF for the indicated time interval and lysed. Lysates are combined for anti-pTyr IP. IPed proteins are digested and analyzed by LC-MS/MS. Arg-0, Arg-6 and Arg-10 stand for ¹²C₆¹⁴N₄-Arg, ¹³C₆¹⁴N₄-Arg and ¹³C₆¹⁵N₄-Arg, respectively. (B) Two such experiments with different time points are then combined to obtain a temporal profile containing five time points. One common time point is used to combine the two datasets. (For Black and White version, see page 307.)
CHAPTER 14
Protein Histidine Phosphorylation
Xin-Lin Zu, Paul G. Besant and Paul V. Attwood
Contents
1. Introduction
2. Chemistry of Phosphohistidine
3. Protein Histidine Phosphorylation
   3.1 The two-component histidine kinases in plants, fungi and bacteria
   3.2 Histidine kinases in mammalian cells
   3.3 Histone H4 histidine kinase (HHK)
   3.4 Nucleoside diphosphate kinase
   3.5 Other mammalian histidine kinases
   3.6 Autophosphorylation of metabolic enzymes on histidine residues
   3.7 Mammalian two-component-like histidine kinases
   3.8 Protein histidine phosphatases (PHPs)
4. Detection of Histidine Phosphorylation
   4.1 Phosphoamino acid analysis
   4.2 Histidine kinase assays
   4.3 In-gel kinase detection of histidine kinases
   4.4 Use of mass spectrometric methods
5. Future Directions
   5.1 Fourier transform ion cyclotron resonance mass analysis (FTMS): ‘‘Top-down’’ mass spectrometry
   5.2 Thiophosphorylation of histidine improves chemical stability
   5.3 Edman degradation of radiolabelled, histidine-thiophosphorylated NDPK
   5.4 32P/33P differential phosphoprotein labelling
   5.5 Phosphohistidine-specific antibodies
   5.6 Two-component histidine kinase and histidine kinase consensus phosphorylation sites
6. Conclusion
Acknowledgements
References
Comprehensive Analytical Chemistry, Volume 52 ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00214-6
© 2009 Elsevier B.V. All rights reserved.
Xin-Lin Zu et al.
1. INTRODUCTION
Protein phosphorylation is recognised as probably the most important posttranslational modification (PTM) that occurs in cellular proteins. The kinases that catalyse the phosphorylation reactions and the phosphatases that catalyse the dephosphorylation reactions are involved in intracellular signalling pathways that control almost all aspects of cellular function. There are many published reviews of protein phosphorylation and the associated kinases and phosphatases in the context of cellular signalling, a recent example of which is presented by Pawson and Scott [1]. The vast majority of these reviews concentrate on serine/threonine and tyrosine kinases and their corresponding phosphoamino acid products. The structures of the side chains of phosphohydroxyamino acids are shown in Figure 1. Phosphorylation of amino acids other than the hydroxyamino acids has, however, been known to occur for many years. One situation where such phosphorylations occur is the phosphorylation of an active-site amino acid to produce a transient phosphoenzyme intermediate as a part of an enzymic catalytic mechanism, e.g. phosphoserine in phosphoglycerate mutase [2], phosphoaspartate in Ca2+-ATPase [3], phosphohistidine in nucleoside diphosphate kinase (NDPK) [4] and phosphocysteine in protein tyrosine phosphatases [5]. The structures of the side chains of phosphoamino acids other than the phosphohydroxyamino acids are shown in Figure 2. In recent times, the role of protein phosphorylation by protein kinases other than serine/threonine and tyrosine kinases and the importance of other phosphoamino acids in cellular signalling pathways have become evident. Specifically, the discovery of the two-component histidine kinases (TCHKs) in bacteria, fungi and plants has revealed new ways in which protein phosphorylation functions in intracellular signalling processes (see ref. [6] for a review). 
These signalling processes can involve the intra- and inter-protein transfers of a phosphoryl group between phosphohistidine and aspartate and between phosphoaspartate and histidine (see Section 2). TCHKs are often part of the signalling pathway used to initiate a cellular response of the organism to an external stimulus such as a change in osmolarity, ethylene concentration, nutrient availability, etc. Protein histidine phosphorylation and HKs are topics of growing interest, not just from the perspective of their importance in bacteria, fungi and
Figure 1 Structures of the side chains of the phosphohydroxyamino acids (phosphoserine, phosphothreonine and phosphotyrosine).
Protein Histidine Phosphorylation
Figure 2 Structures of the side chains of phosphoamino acids other than the phosphohydroxyamino acids (phosphocysteine, phosphoaspartate, 1-phosphohistidine, 3-phosphohistidine and 1,3-diphosphohistidine).
plants but also from the perspective of their occurrence in mammalian cells, which is attracting a small but growing interest (see refs. [7,8] for reviews). The aim of this chapter is to raise awareness of protein histidine phosphorylation as a phenomenon and to describe what is known about its roles in cellular function. The chapter will consider the chemical differences between phosphohistidine and other phosphoamino acids and its stability under a variety of conditions. We shall give an overview of what is known about protein histidine phosphorylation, protein HKs and protein histidine phosphatases (PHPs) and their roles in bacteria, fungi, plants and mammalian cells. Current methods of detection of phosphohistidine in proteins will be described, including HK assays, phosphoamino acid analysis and approaches involving mass spectrometric methods. Finally, we shall consider future technical developments that will improve our ability to study protein histidine phosphorylation and deepen our understanding of its biological roles, especially in mammalian cells.
2. CHEMISTRY OF PHOSPHOHISTIDINE
As can be seen in Figure 2, there are three possible forms of phosphohistidine: 1-phosphohistidine, 3-phosphohistidine and 1,3-diphosphohistidine. 1-Phosphohistidine has been identified in a number of phosphoproteins [9–13] whilst 3-phosphohistidine has been detected in others [14–19]. There have been no reports of 1,3-diphosphohistidine in phosphoproteins.
While the phosphohydroxyamino acids, i.e. phosphoserine, phosphothreonine and phosphotyrosine, all contain phosphoester bonds, phosphohistidine contains a phosphoamidate (P–N) bond. Thermodynamically, the phosphoester bonds in phosphohydroxyamino acids in proteins are more stable (ΔG° of hydrolysis = −6.5 to −9.5 kcal mol⁻¹) than the phosphoamidate bond in phosphohistidine (ΔG° of hydrolysis ≈ −12 to −14 kcal mol⁻¹) [20]. This greater thermodynamic propensity to transfer its phosphoryl group to other molecules makes phosphohistidine well suited to its roles in the two-component and multicomponent phosphorelay signalling systems found in bacteria, fungi and plants (see below). In addition to the thermodynamic property that distinguishes phosphohistidine from the phosphohydroxyamino acids, its kinetic stability under acidic and alkaline conditions is also different. The phosphohydroxyamino acids are stable under acidic conditions (in 1 M HCl at 100 °C the half-lives of phosphoserine and phosphothreonine are about 18 h whilst that of phosphotyrosine is about 5 h [21]). Phosphohistidine, however, is unstable under acidic conditions (in 1 M HCl at 49 °C, 1-phosphohistidine has a half-life of 18 s and 3-phosphohistidine has a half-life of 25 s [22]). The difference in stability between the 1- and the 3- forms of phosphohistidine is thought to be due to the proximity of the α-amino group to the 1-nitrogen of the imidazole ring [22]. Both phosphoserine and phosphothreonine are unstable under alkaline conditions (treatment at 37 °C in 1 M NaOH for 18–20 h results in complete dephosphorylation [21]). However, the lability of these phosphoamino acids in proteins and peptides is somewhat variable, depending on the neighbouring amino acids. Phosphotyrosine and phosphohistidine are both alkali-stable, although care does need to be taken that evaporation does not result in increased concentrations of the alkali (Figures 3 and 4) [23].
While the phosphohydroxyamino acids are stable, phosphohistidine is unstable in neutral 1 M hydroxylamine [21]. The acid-lability/alkali-stability of phosphohistidine is a basis for distinguishing it from the phosphohydroxyamino acids and from phosphocysteine, which has similar stability to phosphotyrosine [22]. Phosphoaspartate and phosphoglutamate are both acid- and alkali-labile. Phospholysine, however, has similar stability properties to phosphohistidine [22], and hence phosphoamino acid analysis is always required to confirm the presence of phosphohistidine. Derivatisation of histidine with diethyl pyrocarbonate (DEPC) prevents phosphorylation [21]. However, DEPC is not entirely specific for histidine and can modify tyrosine residues, so prevention of phosphorylation of a protein on treatment with DEPC cannot be taken as definitive proof of histidine phosphorylation [21].
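The acid-stability figures quoted above can be put on a common footing with a short calculation. The following is a minimal sketch, assuming simple first-order decay (which is what a single half-life implies); the function names and the 60 s time point are illustrative choices and are not taken from the cited studies.

```python
import math

def fraction_remaining(t, half_life):
    """Fraction of a phosphoamino acid left after time t.

    Assumes first-order decay; t and half_life must share the same unit.
    """
    return 0.5 ** (t / half_life)

def time_to_fraction(fraction, half_life):
    """Time needed for the remaining fraction to fall to `fraction`."""
    return half_life * math.log(fraction) / math.log(0.5)

# Half-lives quoted in the text for 1 M HCl, converted to seconds.
# Note the different temperatures: 49 C for phosphohistidine, 100 C for the others.
HALF_LIFE_S = {
    "1-phosphohistidine (49 C)": 18.0,
    "3-phosphohistidine (49 C)": 25.0,
    "phosphoserine (100 C)": 18 * 3600.0,
    "phosphotyrosine (100 C)": 5 * 3600.0,
}

if __name__ == "__main__":
    for name, t_half in HALF_LIFE_S.items():
        left = fraction_remaining(60.0, t_half)
        print(f"{name}: {left:.4f} remaining after 60 s")
```

Under these assumptions, about a tenth of the 1-phosphohistidine survives one minute in acid at 49 °C, whereas phosphoserine is essentially untouched over the same period even at 100 °C.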
3. PROTEIN HISTIDINE PHOSPHORYLATION
3.1 The two-component histidine kinases in plants, fungi and bacteria
Histidine kinases (HKs) catalyse the phosphorylation of histidine in proteins. They were first fully identified and characterised in plants, fungi and bacteria in
Figure 3 (A) Thin-layer electropherogram of 3-phosphohistidine, heated with 3 M KOH (with (i) and without (ii) an oil overlay), at 105 °C over a 3-h time period. (B) Density of the 3-phosphohistidine spots from (A) (i) and (ii) as a percentage of the density of the spots at time zero: (□) with oil overlay and (◇) without oil overlay. Reproduced with permission from [23].
the 1980s and the term ‘‘two-component histidine kinase’’ was coined (for a review see ref. [6]). Biochemical studies showed that they were involved in signalling systems for sensing changes in the external environment such as temperature, osmolarity and chemoattractants. For example, plants respond to changes in ethylene concentration through regulation of the HK activities of ethylene
Figure 4 (A) Thin-layer electropherogram of phosphotyrosine, heated with 3 M KOH (with (i) and without (ii) an oil overlay), at 105 °C over a 3-h time period. (B) Density of the phosphotyrosine spots from (A) (i) and (ii) as a percentage of the density of the spots at time zero: (□) with oil overlay and (◇) without oil overlay. Reproduced with permission from [23].
receptors [24]. The HKs in bacteria, fungi and plants have been described as two-component protein systems composed of two major functional parts: (1) the HK and (2) the response regulator protein. The receptor or sensor protein that has HK activity exists in the cell membrane as a preformed dimer or, in some cases, may dimerise in response to the extracellular signal. Each monomer has three domains: a sensing domain, a dimerisation domain and a kinase domain.
When the extracellular stimulus, such as a change in osmolarity, is sensed, the kinase domain is activated and phosphorylates the histidine in the dimerisation domain of its partner monomer using ATP as a phosphoryl donor. The high free energy of hydrolysis of the P–N bond in phosphohistidine facilitates the transfer of the phosphoryl group to an aspartate residue of the other component protein, a response regulator, which triggers the cellular response (Figure 5). In some cases, the phosphoryl group is initially transferred to an aspartate in another domain of the HK sensor protein (e.g. Sln1p and BvgS systems) or to an aspartate in another protein (e.g. Spo0F in the KinA system) (see Figure 5). In these more complex systems, the phosphoryl group is then transferred from phosphoaspartate back to histidine, either in yet another domain of the HK sensor protein (e.g. BvgS system) or in a separate phosphorelay protein (e.g. Spo0B in the KinA system, Ypd1p in the Sln1p system) (see Figure 5). Finally, the phosphoryl group is transferred to an aspartate in the response regulator protein (e.g. Spo0A, Ssk1p, BvgA; see Figure 5). In the Sln1p system, the phosphorylation state of Ssk1p controls the activity of the protein Hog1, which is a mitogen-activated protein kinase (MAPK) [25]. With the growing knowledge of HKs in bacteria, fungi and plants, researchers questioned whether HKs were present in other organisms, specifically in mammals.
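The alternating histidine/aspartate pattern of these phosphorelays can be written down as a small data model. This is only an illustrative sketch: the class and function names are hypothetical, the protein names follow standard nomenclature (Spo0F, Spo0B and Spo0A with a zero), and the validity rule simply encodes the pattern described in the text (start on histidine, alternate residues, end on the aspartate of the response regulator).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Site:
    protein: str   # protein carrying the phosphoryl group at this step
    residue: str   # "His" or "Asp"

# Relay paths as described in the text and in Figure 5.
SIMPLE_TCHK = [Site("HK", "His"), Site("response regulator", "Asp")]
KINA_RELAY = [Site("KinA", "His"), Site("Spo0F", "Asp"),
              Site("Spo0B", "His"), Site("Spo0A", "Asp")]
SLN1P_RELAY = [Site("Sln1p", "His"), Site("Sln1p", "Asp"),
               Site("Ypd1p", "His"), Site("Ssk1p", "Asp")]
BVGS_RELAY = [Site("BvgS", "His"), Site("BvgS", "Asp"),
              Site("BvgS", "His"), Site("BvgA", "Asp")]

def is_valid_relay(path):
    """True if the path starts on His, ends on Asp and alternates His/Asp."""
    if len(path) < 2:
        return False
    if path[0].residue != "His" or path[-1].residue != "Asp":
        return False
    return all(a.residue != b.residue for a, b in zip(path, path[1:]))

def transfers(path):
    """List each phosphotransfer step, flagging intramolecular ones."""
    steps = []
    for a, b in zip(path, path[1:]):
        kind = "intramolecular" if a.protein == b.protein else "intermolecular"
        steps.append(f"{a.protein} ({a.residue}) -> {b.protein} ({b.residue}) [{kind}]")
    return steps

if __name__ == "__main__":
    for step in transfers(BVGS_RELAY):
        print(step)
```

Listing the steps this way makes the difference between the systems explicit: the KinA relay uses three intermolecular transfers, whereas in the BvgS system the first two transfers occur within the sensor protein itself.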
3.2 Histidine kinases in mammalian cells
The functions of HKs in bacterial, fungal and plant cells are well understood, but little is known about them in mammalian cells. It has been estimated that about 6% of all phosphoamino acids in eukaryotes may be accounted for by phosphohistidine [26]. An early study of phosphorylated proteins in rat liver mitochondria showed a greater extent of phosphohistidine formation than of phosphoserine formation [27]. A significant amount of phosphohistidine was also found in bovine liver mitochondria and other cellular compartments [28,29]. Since then, phosphohistidine has attracted the attention of researchers and several HKs in mammalian cells have been detected. The following sections will focus on some of these HKs.
3.3 Histone H4 histidine kinase (HHK)
The protein that is responsible for histone H4 histidine phosphorylation is referred to as HHK. Study of HHK started in the 1970s, when Smith’s group intensively investigated histone phosphorylation in regenerating rat liver. It was found that phosphohistidine was one of the major phosphoamino acids formed in histone H4 in regenerating liver in vivo; more significantly, there was a correlation between HHK activity and DNA synthesis [30]. Thereafter, studies were undertaken to characterise the HHK. It was partially purified from Walker-256 carcinosarcoma cell nuclei. When incubated with histone H4, [γ-32P]ATP and different nucleoside triphosphates (NTPs) including dATP, GTP, CTP, dGTP, UTP, dTTP and dCTP, ATP was shown to be the effective phosphoryl donor whilst HHK activity was inhibited to different extents by the other NTPs [31].
Phosphorylation of histone H4 by HHK was found to occur on histidine residues of the pre-existing histone H4 molecules in the in vivo study of regenerating liver cell nuclei [32]. In an investigation of the chemical and enzymatic phosphorylation of HPr (a phosphocarrier protein), the NMR spectral characteristics of phosphohistidine isomers were described in detail [33]. This NMR technique was applied to detect and characterise in vitro histone H4 phosphorylation by nuclear HHK extracted from regenerating rat liver and Walker-256 carcinosarcoma cells.
It was demonstrated that H18 and H75 of histone H4 were both phosphorylated and 3-phosphohistidine was the exclusive product [34]. Since that time, Matthews’ group took the leading role in HHK research by developing methods of separating phosphohistidine from a mixture of phosphoamino acids [35] and extracting HHK from nuclei of the growing slime mould, Physarum polycephalum [17]. The slime mould nuclear extract was incubated with histone H4 and [γ-32P]ATP; the phosphorylated H4 was then hydrolysed under alkaline conditions, followed by phosphoamino acid analysis. 1-Phosphohistidine was detected in the hydrolysate. This work led to the establishment of the basic methods to estimate HHK activity and to purify HHK from proliferating yeast Saccharomyces cerevisiae [36,37]. The first attempt to identify HHK in yeast resolved a protein with a molecular weight of 32 kDa [12]. However, no amino acid sequence data for this protein have ever been published. It is noteworthy that HHK from rat liver and Walker-256 cells catalysed the formation of 3-phosphohistidine, whereas that from slime mould and yeast catalysed the formation of 1-phosphohistidine. This suggests that there are different HHKs in different species, which are responsible for catalysing the formation of specific phosphohistidine isomers. The discovery of HHK from yeast and development of the associated techniques helped restart research on HHK in mammalian cells. Attwood and
Figure 5 Two-component histidine kinases (TCHKs). (a) Mode of action of a simple system. H represents a conserved histidine in the dimerisation domain which is phosphorylated (P) by the HK domain of the dimer-partner HK, using ATP as a substrate. This phosphoryl group is then transferred from phosphohistidine to a conserved aspartate (D) in the regulatory domain of the response regulator protein. This triggers a conformational change in the effector domain of the response regulator protein which results in the initiation of the cellular response, usually via regulation of gene expression. (b) Multicomponent phosphorelay systems. KinA is the sensory HK component (see Figure 5a) of the multicomponent phosphorelay system that regulates sporulation in Bacillus subtilis. Spo0F is the first regulatory domain-like protein containing a conserved aspartate that is phosphorylated by the transfer of the phosphoryl group from phosphohistidine in KinA. Spo0B is a dimerisation domain-like protein containing a conserved histidine that is phosphorylated by the transfer of the phosphoryl group from the phosphoaspartate of Spo0F. The phosphoryl group is then transferred from the phosphohistidine of Spo0B to the aspartate of the response regulator protein Spo0A (see Figure 5a). In the osmoregulatory system of yeast, Sln1p is the HK component, which in this case also has an aspartate-containing regulatory domain that is similar to Spo0F. The first phosphotransfer is thus intramolecular, from the phosphohistidine of the dimerisation domain of Sln1p to the Spo0F-like domain. Ypd1p and Ssk1p correspond to Spo0B and Spo0A, respectively. The BvgS system regulates the transcription of virulence factors in Bordetella pertussis. BvgS is the HK component, which in this case also contains an extra dimerisation-like domain, similar to Spo0B, and a regulatory-like domain, similar to Spo0F.
In this instance the first two phosphotransfers are intramolecular, from the phosphohistidine of the dimerisation domain of BvgS to the aspartate of the Spo0F-like domain and then on to the histidine of the Spo0B-like domain. Reproduced with permission from [7].
co-workers developed several methods specific for HHK investigation, including an acid-labile/alkali-stable HHK assay and an in-gel kinase assay. Those methods will be described in the next major section. Using these methods, partially purified proteins from thymus were demonstrated to contain HHKs, whose molecular weights were between 34 and 41 kDa [38]. HHK activity was also found in regenerating rat liver, foetal rat liver, foetal human liver and human carcinosarcoma tissue [39]. In this study HHK activity of rat liver was found to increase and peak at 18 h after partial hepatectomy, at least 6 h before the peak of cellular proliferation. This differs somewhat from the earlier in vivo study of regenerating rat liver, in which HHK activity and DNA synthesis both peaked at 20 h after partial hepatectomy [30]. More importantly from a clinical perspective, HHK activity in human hepatocellular carcinoma (HCC) liver tissue was estimated to be about 400 times higher than that of normal liver tissue around the tumour and 4 times higher than that of human foetal liver [39]. In foetal rat liver, HHK activity was estimated to be 40 times that in postnatal rat liver [39]. In addition to liver tissues, HepG2 cells (an HCC cell line) and PIL-2 cells (a p53 knockout mouse-derived tumourigenic liver progenitor cell line) also displayed high HHK activity, at different levels. The above evidence strongly suggested that HHK activity was highly associated with liver growth and that the pattern of its expression was similar to that of oncodevelopmental markers [39]. HHK in mouse tumour (generated from the PIL-2 cell line) has been partially purified and two proteins with molecular weights of 56 and 67 kDa were found to have HHK activity [40].
3.3.1 The significance of HHK in cellular biology
Although there are few studies of HHK’s biological roles, theoretical possibilities for its function have been proposed. Histones belong to a nuclear protein family with five members: histones H1, H2A, H2B, H3 and H4. These proteins bind DNA: the DNA superhelix winds around an octamer composed of dimers of histones H2A, H2B, H3 and H4. The octamer assembles and the DNA associates with it to form the nucleosome, thus facilitating DNA condensation. Histone H4 is the most conserved core histone: its amino acid sequence differs by only 2 residues between the bovine and pea proteins [41], indicating its crucial role. Histone H4 has two histidine residues: H18 and H75. The imidazole ring of H75 hydrogen bonds to E4 of histone H2B within the nucleosome core, an interaction that is assumed to stabilise the nucleosome octamer [42]. It was thought that phosphorylation of H75 could result in destabilisation of nucleosome structure [7,43]. Two observations, that only pre-existing histone H4 was phosphorylated on histidine in regenerating liver cells [32] and that H75 of histone H4 in the nucleosome core could not be phosphorylated in vitro by a P. polycephalum nuclear extract containing HHK activity [44], have led reviewers to speculate that H75 of histone H4 is phosphorylated during DNA replication, when histones are disassembled, to prevent the premature reassembly of nucleosome complexes during DNA synthesis [7].
The H18 of histone H4 is in its N-terminal tail (K16-N25), which has been shown to interact with the H2A-H2B dimer of a neighbouring nucleosome [42] and may be involved in the folding of the chromatin fibre [45]. The carcinogenic elements nickel and copper were found to bind histone H4 on H18 [46]. Nickel also inhibits histone H4 acetylation on K12 in mammalian cells [47]. It was thought that if histidine phosphorylation is correlated with histone H4 acetylation, gene expression patterns would be affected [48]. More interestingly, the basic amino acids R17, H18 and R19 within the histone H4 tail were shown to be critical for the remodelling activity of ISWI (imitation switch protein), an ATPase that facilitates the sliding of histone octamers on DNA in the assembly of chromatin [49]. In the same study, it was demonstrated that the acetylation of K12 and K16, which are close to the R17H18R19 region, significantly compromised the stimulation of ISWI. Although there is no direct evidence, it is possible that the phosphorylation of H18 of the histone H4 tail could be involved in the regulation of chromatin remodelling.
3.4 Nucleoside diphosphate kinase
NDPK is an enzyme catalysing the interconversion of nucleoside diphosphates (NDPs) and NTPs via a phosphohistidyl-enzyme intermediate [50]. Being a housekeeping enzyme in cellular nucleotide metabolism, NDPK is ubiquitous and is found in plants, bacteria and mammalian species as soluble (NDPK A and B) and membrane-bound (NDPK C, D) isoforms. In humans, NDPK is encoded by the nm23 genes. Nm23-H1, a product of nm23, is a tumour metastasis suppressor protein which has 80% amino acid sequence identity with NDPK A polypeptides [51].
3.4.1 Histidine kinase activity of NDPK: Autophosphorylating enzymes
In a number of studies, the phosphoenzyme intermediate of NDPK was shown to apparently transfer its phosphoryl group to histidine residues on a number of proteins, i.e. ATP-citrate lyase [52,53], succinate thiokinase [54] and both EnvZ and CheA (TCHKs from Escherichia coli) [55]. However, all of the above enzymes are capable of histidine autophosphorylation, and Levit et al. [56] demonstrated that CheA and EnvZ only underwent histidyl phosphorylation in the presence of NDPK when ADP was also present, even at extremely low concentrations. It was proposed that the ADP was phosphorylated by NDPK to form ATP, which CheA and EnvZ then used in autophosphorylation reactions. A similar phenomenon could also explain the apparent phosphorylation of ATP-citrate lyase and succinate thiokinase by NDPK.
3.4.2 Histidine kinase activity of NDPK: Annexin I
Membrane-associated NDPK is thought to be a component in a Cl−-regulated signalling system in respiratory epithelia, which results in the histidyl phosphorylation of annexin I (and other proteins) [57–60]. Annexin I is a protein that participates in the regulation of arachidonic acid production [61] and was initially identified as a phospholipase A2 inhibitor [62]. Annexin I also plays a role in hepatocyte growth factor signalling [63] and intracellular calcium release [64].
3.4.3 Histidine kinase activity of NDPK: G-proteins
Evidence has accumulated that the β-subunit of heterotrimeric G-proteins (Gβ) could be phosphorylated by an HK and that this phosphoryl group could be subsequently transferred from the phosphohistidine to GDP bound on the α-subunit of the G-protein (Gα) [65–72]. The importance of this phosphotransfer reaction is that in the inactive state the G-protein has GDP bound to Gα, which is complexed with the β and γ subunits (Gβγ). The G-protein is normally activated by binding to a membrane receptor–ligand complex, whereupon GTP displaces GDP on Gα. The Gα–GTP complex then dissociates from Gβγ and activates an effector protein such as adenylate cyclase, thus initiating an intracellular signalling cascade. The GTP on Gα is gradually hydrolysed to GDP by the subunit’s intrinsic GTPase activity, thus causing its own deactivation and allowing rebinding of Gβγ. Transfer of the phosphoryl group from the phosphohistidyl-Gβ to GDP on Gα provides a potentially receptor-independent pathway of G-protein activation or a means of prolonging G-protein activation in the absence of a ligand–receptor complex. Studies have shown that NDPK and G-protein interact physically. Phosphorylated NDPK was demonstrated to be able to phosphorylate GDP bound to G-protein in vitro [73] and both proteins were extractable from rat liver membranes as a complex, the formation of which was regulated by hormone receptor to activate adenylate cyclase [74]. Adding [γ-32P]ATP to a mixture of GDP bound to G-protein and NDPK resulted in linear release of 32Pi without dissociation of GDP from Gα [75]. Adding GDP to canine cardiac sarcolemmal membranes induced an increase of adenylate cyclase activity, in contrast to the attenuated effect caused by GDPβS. This implicates a transphosphorylation step in the pathway leading to adenylate cyclase activation; the NDPK activity and G-protein stimulation both correlated with the elevated adenylate cyclase activity [76].
Recently, NDPK B and Gβ were co-purified from bovine brain membrane and co-immunoprecipitated [77]. Gβ phosphorylation was reconstituted by adding an NDPK-enriched fraction obtained from bovine brain membrane, and H266 of Gβ was confirmed to be the phosphorylation site [77]. The results were in good agreement with the observation of the enhancement of adenylate cyclase activity by dual overexpression of NDPK B and Gα in H10 cells (a cell line derived from neonatal rat cardiomyocytes), in contrast to cells overexpressing NDPK alone or untransfected cells [78]. Mutation of H118 of NDPK B resulted in loss of NDPK activity and Gβ phosphorylation compared to that of wild type NDPK B in overexpressing cells [78]. It was hypothesised that NDPK and Gβ formed a complex in which transphosphorylation occurred between H118 of NDPK and H266 of Gβ using NTP as the phosphoryl donor; the phosphoryl group was then transferred from phosphorylated Gβγ to GDP to produce GTP, which displaced the GDP bound to Gα and thus activated the Gα [78]. A similar scenario of NDPK- and Gβ-mediated Gα activation was found in studies of insulin secretagogue-stimulated histidine phosphorylation of NDPK and Gβ [72]. The author proposed that NDPK
activation contributes to the increased GTP/GDP ratio, which channels GTP to G-proteins, culminating in their activation.
3.4.4 Other protein kinase activities of NDPK
In accordance with the high ΔG° of hydrolysis of phosphohistidine, and analogously to the TCHKs described previously, it is not surprising that the phosphoenzyme form of NDPK is able to transfer its phosphoryl group to a number of different amino acids on other proteins. Nm23-H1 was found to be able to transfer a phosphoryl group from its active site phosphohistidine to an amino acid of a protein in bovine brain membrane [53]. The protein was purified and identified as aldolase C and the site of transphosphorylation was D319 [79]. The equivalent residue of aldolase A, which is glutamate, was not phosphorylated [79]. More notably, the histidine-aspartate phosphotransferase activity of Nm23-H1 was greater than that of the Nm23-H1 mutants nm23-H1(P96S) and nm23-H1(S120G), both of which were unable to suppress cancer cell motility. It was suggested that a signalling pathway including Nm23-H1 and aldolase C regulates metastasis. Evidence of Nm23-H1 acting as a histidine-serine phosphotransferase was collected using kinase suppressor of Ras (KSR) from either transfected 293T cells or MDA-MB-435 breast carcinoma cells. KSR is the scaffold protein of the MAPK cascade. It was found that both S392 and S434 can be phosphorylated by [32P]Nm23 in vitro [80]. The Nm23-H1-transfected breast carcinoma cells expressed lower levels of activated MAPK than the control, suggesting that elevated Nm23-H1 led to decreased MAPK activity.
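The thermodynamic argument for phosphotransfer from phosphohistidine can be illustrated with a rough calculation. The sketch below rests on several assumptions that are not in the source: the hydrolysis free energies quoted in Section 2 are treated as negative (exergonic) standard values, the temperature is set to 37 °C, and the transfer is treated as hydrolysis of the donor coupled to the reverse of hydrolysis of the product phosphoester. It gives an order-of-magnitude feel only, not a measured equilibrium constant for any particular protein pair.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1
T_KELVIN = 310.15   # 37 C, an assumed temperature

def transfer_free_energy(dg_hyd_donor, dg_hyd_product):
    """Standard free energy (kcal/mol) for donor-P + acceptor -> donor + acceptor-P.

    Computed as hydrolysis of the donor minus hydrolysis of the product.
    """
    return dg_hyd_donor - dg_hyd_product

def equilibrium_constant(dg, temperature=T_KELVIN):
    """K = exp(-dG / RT) for a standard free energy change in kcal/mol."""
    return math.exp(-dg / (R_KCAL * temperature))

# Ranges quoted in Section 2, taken here as negative values (assumption).
PHIS_HYDROLYSIS = (-12.0, -14.0)      # phosphoamidate bond in phosphohistidine
PESTER_HYDROLYSIS = (-6.5, -9.5)      # phosphoester bonds (Ser/Thr/Tyr)

# Least and most favourable combinations of the quoted ranges.
dg_least = transfer_free_energy(PHIS_HYDROLYSIS[0], PESTER_HYDROLYSIS[1])
dg_most = transfer_free_energy(PHIS_HYDROLYSIS[1], PESTER_HYDROLYSIS[0])

k_least = equilibrium_constant(dg_least)
k_most = equilibrium_constant(dg_most)

if __name__ == "__main__":
    print(f"dG transfer: {dg_least:.1f} to {dg_most:.1f} kcal/mol")
    print(f"K transfer:  {k_least:.3g} to {k_most:.3g}")
```

Even at the least favourable end of the quoted ranges, the transfer from phosphohistidine to a hydroxyamino acid is downhill by a few kcal mol⁻¹ under these assumptions, which is consistent with the phosphoenzyme acting as a phosphoryl donor.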
3.5 Other mammalian histidine kinases
Over decades of study of protein phosphorylation, reports of phosphohistidine-containing proteins have appeared regularly, although knowledge of the corresponding HKs is still limited.
3.5.1 HK in the synaptic plasma membrane
Synaptic plasma membrane fragments were found to contain proteins that were phosphorylated on histidine and serine in the presence of Mg2+. The two phosphorylations, however, differed in their kinetics: the phosphorylation of serine was cyclic AMP (cAMP)-dependent and had a Km of 0.12 mM, whereas histidine phosphorylation was cAMP-independent and did not follow Michaelis–Menten kinetics. It was thought that phosphorylation of both serine and histidine was a part of synaptic transmission regulation [81]. There were no other data about the protein substrate or the stoichiometry of the phosphorylations.
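For readers less familiar with the kinetic terminology, the meaning of the reported Km can be shown with the Michaelis–Menten rate law. This is a generic sketch: the function name is hypothetical, Vmax is normalised to 1 because no value is given in the text, and the identity of the varied substrate is not stated in this section, so concentrations are simply in mM.

```python
def michaelis_menten_rate(s, km, vmax=1.0):
    """Rate v = Vmax * [S] / (Km + [S]) for substrate concentration s.

    s and km must be in the same concentration unit (mM here).
    """
    if s < 0 or km <= 0:
        raise ValueError("s must be non-negative and km must be positive")
    return vmax * s / (km + s)

KM_SERINE_PHOSPHORYLATION_MM = 0.12  # value reported in the text

if __name__ == "__main__":
    for s in (0.012, 0.12, 1.2):
        v = michaelis_menten_rate(s, KM_SERINE_PHOSPHORYLATION_MM)
        print(f"[S] = {s} mM -> v/Vmax = {v:.2f}")
```

By definition the rate is half-maximal when the substrate concentration equals Km, i.e. at 0.12 mM for the serine phosphorylation; a reaction that does not follow this hyperbolic dependence, as reported for the histidine phosphorylation, cannot be summarised by a single Km.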
3.5.2 p38 and p36 HK
In 1987, a protein of 38 kDa from rat liver plasma membrane named p38 was found to be phosphorylated transiently by [γ-32P]ATP and the level of
phosphorylation was elevated by adding Ras protein and GDP [82]. The phosphorylation was removed by acid treatment of [32P]p38 and prevented by pretreatment of the cellular membrane with DEPC. It was thus concluded that a histidine of p38 was being phosphorylated, although there was no direct detection of phosphohistidine. An in vivo study of rat liver plasma membrane phosphorylation showed that a protein of 36 kDa, called p36, was phosphorylated and that the phosphorylation was increased by feeding rats clofibrate, a peroxisome proliferator [83]. The phosphorylation of [32P]p36 was also acid-labile/alkali-stable and was enhanced by Ras protein and GDP; the formation of phosphohistidine in [32P]p36 was confirmed by alkali hydrolysis of the phosphorylated protein followed by phosphoamino acid analysis [84]. Histidyl phosphorylation of p36 was also detected in rat hepatoma Fao cells [85]. The similarities between p38 and p36 led to the speculation that they may be identical proteins. However, p36 has never been identified, whilst p38 was identified as ornithine transcarbamylase by screening a library containing cDNA sequences with a p38 antibody [26]. When Motojima and co-workers tried to purify the HK from rat liver, a protein with a molecular weight of 70–75 kDa and p36 were co-solubilised from plasma membrane and co-eluted on HPLC [84]. The biological roles of p36 and p38 phosphorylations have never been addressed; there have been no reports of the position of histidine phosphorylation in the amino acid sequence of p38 or of the identity of the HK.
3.5.3 P-selectin HK
Crovello et al. [86] described the phosphorylation of P-selectin in human platelets induced by thrombin and collagen stimulation. Serine, threonine, tyrosine and histidine were all phosphorylated according to the phosphoamino acid analysis of the tryptic digest of [32P]P-selectin. Edman degradation analysis confirmed that H771 and H773 in the cytoplasmic tail of P-selectin were phosphorylated [86]. Thrombin and collagen have different receptors, with different downstream signalling effects. It was suggested that histidine phosphorylation of P-selectin may be involved in modulating two independent transduction pathways. However, the potential P-selectin HK has yet to be identified.
3.6 Autophosphorylation of metabolic enzymes on histidine residues
In addition to the histidine phosphorylation of proteins regulated by independent HKs, there are a number of metabolic enzymes which, as part of their mechanism of action, autophosphorylate on an active site histidine residue, yielding a phosphoenzyme intermediate that is critical for their catalytic activity. Apart from NDPK, ATP citrate lyase and succinate thiokinase, which have already been mentioned, there are fructose-2,6-bisphosphatase and phosphoglycerate mutase.
3.7 Mammalian two-component-like histidine kinases
The investigation of TCHKs in bacteria, fungi and plants led researchers to seek evidence of similar enzymes in mammalian cells. NDPK, as discussed in
Section 3.4, is a phosphotransferase that functions like a TCHK but has no structural similarity to it. In terms of structural homology, two mammalian proteins found in rat heart mitochondria have the four prototypical TCHK motifs: the branched-chain α-ketoacid dehydrogenase kinase (BCKDHK) and pyruvate dehydrogenase kinase (PDHK) [87,88]. However, in vitro investigations showed that both enzymes phosphorylate their substrates on serine rather than histidine [89,90]. The focus then turned to the possibility that phosphohistidine is formed in a phosphoenzyme intermediate. Mutation of H239 in rat PDHK, which was suggested to play an important catalytic role, resulted in a loss of 90% of its activity [91]. However, when Tovar-Mendez mutated the two histidine residues most likely to be involved in phosphotransfer, H121 and H168 of Arabidopsis thaliana PDHK, this failed to abolish its autophosphorylation and transphosphorylation activities [92]. In contrast, pretreating A. thaliana PDHK with the histidine-modifying reagents DEPC or dichloro-(2,2′:6′,2″-terpyridine)platinum(II) dihydrate (DTPD) did abolish PDHK autophosphorylation and inhibited the phosphorylation of its substrate, i.e. E1α of PDH [93]. Besant et al. [94] demonstrated that the autophosphorylation of BCKDHK and the yeast TCHK Sln1 was inhibited by the antifungal antibiotic radicicol, an inhibitor of the ATPase activity of heat shock protein 90 (HSP90). The authors proposed that BCKDHK, Sln1 and HSP90 share a common structural homology in the ATP-binding domain and, because radicicol does not inhibit serine/threonine and tyrosine kinases, that it might inhibit the intrinsic HK activity of BCKDHK and Sln1. To date, the evidence that BCKDHK and PDHK autophosphorylate on a histidine residue and subsequently transfer this phosphoryl group to their protein substrates is not clear.
3.8 Protein histidine phosphatases (PHPs)
If some HKs are enzymes that ‘‘turn on’’ signalling pathways, are there complementary phosphatases that turn them off?
3.8.1 Serine/threonine phosphatases as PHPs PHPs are widely present in rat tissues, including brain, heart, liver, kidney, lung, skeletal muscle and spleen [95]. In addition to phosphohistidine, phospholysine is also a substrate of PHPs. A rat brain cytosol PHP was isolated and identified as a 3-phosphohistidine/6-phospholysine phosphatase with a molecular weight of 150 kDa [96]. Soon afterwards it was found to be a dimeric acid phosphatase [97]. Synthetic phosphohistidine was used as the substrate in the above studies. In a study of bovine liver phosphoamidase, the non-synthetic substrates [32P]his-STK and [32P]his-NDPK were dephosphorylated by a 13-kDa PHP within 2 h [98]. Evidence for a PHP acting in vivo was also presented in an HK study of rat liver: a 45-kDa protein with [32P]his-p36 phosphatase activity was resolved by HPLC, eluting 5 fractions after the co-elution of the HK of p36 and p36 itself [84]. The 45-kDa PHP dephosphorylated [32P]his-p36 within 30 min in vitro. It was resistant to okadaic acid and dependent on Mg2+, which are typical characteristics of phosphatase-2C (PP2C).
Apart from PP2C, there are other Mg2+-dependent phosphatases, such as pyruvate dehydrogenase phosphatase, phosphatase-1 (PP1), phosphatase-2A (PP2A) and the Ca2+-dependent phosphatase-2B (PP2B), which are serine/threonine phosphatases, and PP1b, which is a tyrosine phosphatase [99]. When PP1, PP2A, PP2B and PP2C from rabbit skeletal muscle were used to dephosphorylate yeast HHK-phosphorylated histone H4 ([32P]his-histone H4) under different conditions, all of the phosphatases except PP2B were shown to be effective PHPs of [32P]his-histone H4. In addition, these phosphatases were at least as effective as PHPs as they were as serine/threonine phosphatases [100]. Later, PP1 and isoforms of PP2A were extracted from both rat liver and spinach leaves, and their PHP activity was measured using [32P]his-histone H4 as substrate. Although derived from two very different species, both PP1 and PP2A showed PHP activity, suggesting that phosphohistidine dephosphorylation is a highly conserved cellular event that is essential for physiological cellular regulation [101].
3.8.2 Novel PHPs Methods to detect PHP activity were also developed specifically for phosphohistidine-containing substrates, for which acid treatment has to be avoided. For example, Matthews and coworkers employed either a filter-based assay, which retains the substrate and allows the released phosphate to flow through [102], or SDS-PAGE [100] to demonstrate the dephosphorylation of the substrate. In the latter method, the dephosphorylation reaction mixture was subjected to SDS-PAGE after incubation and the radioactivity of the remaining phosphorylated substrate was detected by autoradiography and analysed quantitatively [102]. Klumpp and co-workers applied methanol/acetone precipitation after the PHP assay to separate the free phosphate (32Pi) from the [32P]substrate [103]. With these improvements in methodology, a novel PHP that is not a serine/threonine phosphatase was identified. Two groups of PHP researchers contributed to this discovery. The Klumpp group purified proteins with high PHP activity from rabbit liver using extensive column chromatography directed by PHP assay. After 11,000-fold purification, a 14-kDa protein was found to be a PHP, the homologue of which was highly expressed in neurons of Caenorhabditis elegans [104]. Simultaneously, a PHP from porcine liver cytosol was purified and identified through a similar procedure by Zetterqvist’s group; the enzyme was cloned using identical human cDNA and expressed as a 13.7-kDa protein with specific PHP activity [105]. A 56-kDa inorganic pyrophosphatase was also shown to dephosphorylate phosphohistidine and phospholysine in vitro [106]; it was then named phospholysine phosphohistidine inorganic pyrophosphate phosphatase (LHPPase) and cloned from a human HeLa cell cDNA library. The recombinant human LHPPase, which has enzymatic characteristics similar to those of bovine LHPPase, was found to be highly expressed in liver and kidney [107].
It is clear that PHPs exist; however, most of the investigations were undertaken in vitro and little work has been performed in vivo. With increasingly advanced techniques, the full picture of HK- and PHP-regulated cellular processes will emerge.
4. DETECTION OF HISTIDINE PHOSPHORYLATION 4.1 Phosphoamino acid analysis One of the simplest ways to confirm that the site of phosphorylation in a phosphoprotein is a particular amino acid is to perform phosphoamino acid analysis. In this process, the protein substrate is phosphorylated using a nucleotide in which the γ-phosphate is radiolabelled, commonly with 32P, resulting in the formation of a [32P]phosphoprotein product. The most commonly used radiolabelled nucleotide is [γ-32P]ATP, which can be purchased and used in an in vitro reaction comprising protein kinase, protein substrate and any cofactors (e.g. Mg2+). Alternatively, cells in culture can be metabolically labelled with 32Pi; in this case the cells are incubated for several hours in phosphate-free medium to which several millicuries of 32Pi have been added. This results in the formation of [γ-32P]nucleotides, which are then used as substrates in intracellular protein kinase reactions, resulting in the formation of [32P]phosphoproteins. Phosphoproteins are then subjected to partial hydrolysis to release free radiolabelled phosphoamino acids. The most commonly used method is partial acid hydrolysis; however, because of the acid-lability of phosphohistidine, this method cannot be employed when analysing phosphohistidyl phosphoproteins. Instead, alkaline hydrolysis has been widely used, in which the phosphoprotein is incubated at 100 °C for several hours with 3 M KOH (see e.g. refs. [17,38,108]). The resultant hydrolysate is then either diluted for analysis or the potassium ions are precipitated using perchloric acid prior to analysis [38,86]. Another way of hydrolysing phosphoproteins is by using a proteolytic enzyme, pronase E (Streptomyces griseus protease) [109].
Analysis of the hydrolysates may be performed in a number of different ways. These include anion-exchange chromatography, e.g. using Dowex-1-X8 [108] or Mono Q Sepharose [37], with detection of the radiolabelled phosphoamino acids by matching their elution times/volumes with those of authentic phosphoamino acid standards. Alternatively, reverse-phase chromatography can be employed [37]. It is possible to detect non-radiolabelled phosphoamino acids using an o-phthalaldehyde derivatisation procedure with fluorescence detection [37], although this method still does not have the sensitivity of radiolabel detection. A number of thin-layer chromatographic (TLC)/electrophoretic methods are also employed in conjunction with phosphoimaging to visualise radiolabelled amino acids, followed by ninhydrin staining to visualise the authentic phosphoamino acid standards. Thin-layer electrophoresis is usually performed on cellulose plates [23,38], although historically electrophoretic separation has also been done on paper [110]. Recently, reverse-phase thin-layer chromatography has been employed for the analysis of phosphohistidine-containing phosphoprotein hydrolysates [40,109,111]. Figure 6 shows a reverse-phase thin-layer chromatogram of histone H4 phosphorylated by a partially purified preparation of yeast HHK. On the left in Figure 6 is the phosphoimage of the TLC plate and on the right is the ninhydrin-stained plate, showing the phosphoamino acid standards.
Figure 6 Phosphoamino acid analysis of histone H4 phosphorylated by the yeast HK preparation. A digest of the 32P-labelled phosphorylated histone H4 was split into two and treated with either acid or water as described in Materials and Methods. The water-treated digest was run in lane 3 and the acid-treated digest was run in lane 4 of the reverse-phase TLC plate. A mixture of water-treated phosphoamino acid standards was run in lane 1 and a similar acid-treated mixture was run in lane 2. Individual phosphoamino acid standards were run in lanes 5–9: lane 5, phosphohistidine (P-H); lane 6, phosphotyrosine (P-Y); lane 7, phosphoarginine (P-R); lane 8, phosphothreonine (P-T); lane 9, phosphoserine (P-S). The left-hand panel shows the phosphorimage of the TLC plate while the right-hand panel shows the image of the ninhydrin-stained plate. Positions of the phosphoamino acid standards and histidine are indicated on the right of the figure. Reproduced with permission from [109].
This figure also shows the disappearance of both the radioactive spot and the ninhydrin-stained spot corresponding to phosphohistidine when the hydrolysate mixed with the phosphoamino acid standards was treated with acid. 31P and 1H NMR have also been used to detect phosphohistidine directly in proteins [33,34]. Whilst this method has the advantages of being non-destructive and not relying on radiochemical labelling, it is much less sensitive than the methods described above and hence its use has been very limited.
4.2 Histidine kinase assays Wei and Matthews [36] first described a filter-based assay that can be used to measure HK activity. It is based on the alkali-stability of phosphohistidine compared to the alkali-lability of phosphoserine and phosphothreonine. Following phosphorylation of the protein with [γ-32P]ATP, the reaction mixture is heated at 60 °C in 0.5 M NaOH to hydrolyse and thus deplete any phosphoserine and phosphothreonine present. The reaction mixture is then
spotted on to Nytran filters, which are washed under alkaline conditions to remove [γ-32P]ATP and 32Pi. The filters are then dried and the amount of [32P]phosphoprotein is determined by Cherenkov counting of the filters. This method does not, however, discriminate between histidine phosphorylation and tyrosine phosphorylation. Recently, Tan et al. [109] modified this assay to make it more specific for the measurement of HK activity. In the modified assay, following alkali treatment, the kinase reaction mixture is split into two. One half is heated in 1 M HCl at 60 °C for 30 min and then made alkaline again by addition of NaOH; NaCl is added to the other half of the reaction mixture. The two samples are then spotted on to separate Nytran filters (pretreated with 1 mM ATP to reduce [γ-32P]ATP binding) and washed in 10 mM tetrasodium pyrophosphate to improve removal of [γ-32P]ATP and 32Pi. The radioactivity associated with the acid-treated sample corresponds to tyrosine phosphorylation, whilst that of the NaCl-treated sample corresponds to both tyrosine and histidine phosphorylation. The difference between the radioactivity of the two samples gives the amount of histidine phosphorylation (Figure 7, ref. [109]).
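The subtraction at the heart of the modified assay can be sketched in a few lines. This is only an illustration of the arithmetic described above: the function name, the optional blank correction and the count values are hypothetical and are not taken from ref. [109].

```python
def split_filter_signal(nacl_counts, acid_counts, blank_counts=0.0):
    """Separate acid-stable from acid-labile signal in the paired-filter assay.

    nacl_counts:  radioactivity on the NaCl-treated filter (alkali-stable:
                  tyrosine plus histidine phosphorylation).
    acid_counts:  radioactivity on the HCl-treated filter (alkali-stable and
                  acid-stable: tyrosine phosphorylation).
    blank_counts: hypothetical no-enzyme background, subtracted from both.
    """
    alkali_stable = nacl_counts - blank_counts
    acid_stable = acid_counts - blank_counts
    # The acid-labile remainder is attributed to histidine phosphorylation.
    acid_labile = alkali_stable - acid_stable
    return acid_stable, acid_labile


# Made-up Cherenkov counts, for illustration only.
tyrosine_like, histidine_like = split_filter_signal(5200.0, 1300.0, 200.0)
print(tyrosine_like, histidine_like)
```

Because the histidine signal is obtained by difference, its uncertainty combines the errors of both filters, which is one reason replicate assays matter here.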
Figure 7 Phosphorylation of histone H4 in vitro by an HK preparation from yeast. Reactions were performed for the indicated periods of time and then terminated with NaOH, followed by heating at 60 °C for 30 min to deplete any phosphoserine or phosphothreonine present in the histone H4. Each reaction mixture was then split into two and one half was acid-treated; this and the rest of the assay procedure were performed as described in Section 4.2. The assays were performed in triplicate and results are presented as picomoles of Pi incorporated into histone H4 per milligram of total protein in the enzyme preparation, ±SEM. Time courses of alkali-stable phosphorylation, alkali-stable/acid-stable phosphorylation and acid-labile phosphorylation are shown. Lines were fitted to the alkali-stable and alkali-stable/acid-stable data by non-linear least squares regression analysis, fitting an equation describing a first-order exponential approach to a steady-state rate. The line was drawn through the acid-labile data using linear least squares regression analysis. In the curve fits, each datum point was weighted according to the standard error of the mean associated with that datum point [weighting = 1/(SEM)^2]. Reproduced with permission from [109].
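The 1/(SEM)^2 weighting mentioned in the caption can be illustrated for the linear case. The following is a minimal sketch of a weighted straight-line fit written from the standard closed-form expressions; it is not the fitting software used in ref. [109], and the function name is an assumption.

```python
def weighted_linear_fit(x, y, sem):
    """Fit y = intercept + slope * x, weighting each point by 1 / SEM**2."""
    w = [1.0 / s ** 2 for s in sem]
    sw = sum(w)
    # Weighted means of x and y.
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted sums of squares and cross-products about the means.
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope
```

Points with a small SEM pull the line towards themselves, whereas noisy points contribute little; the same weights would enter the non-linear fits of the exponential time courses.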
4.3 In-gel kinase detection of histidine kinases Besant and Attwood [38] were able to detect yeast HHK and putative thymus HHKs using an in-gel kinase assay in which the substrate, histone H4, was incorporated into the polyacrylamide gel matrix by mixing it with the acrylamide solution prior to polymerisation. The HHK-containing samples were prepared for SDS-PAGE in the usual way, by denaturing in SDS sample buffer containing β-mercaptoethanol. After electrophoresis, the gel was washed to remove SDS and the proteins in the gel were fully denatured by incubating the gel in 6 M guanidine HCl. Protein renaturation was then performed as described elsewhere [112] and the gel was incubated in a kinase reaction mixture containing [γ-32P]ATP. Finally, the gel was washed to remove any unincorporated radioactivity and subjected to phosphoimage analysis. Radioactive bands correspond to the positions in the gel of histone H4 kinases, which have phosphorylated the histone H4 incorporated into the gel at these places. As can be seen in Figure 8, there are
Figure 8 Phosphoimage of in-gel renaturation and phosphorylation using histone H4 as a substrate of putative thymus and yeast HKs as described in Section 4.3. The control portion of the gel is identical to the histone H4 substrate portion except that no histone H4 was incorporated into the polyacrylamide matrix. Molecular weights based on standards from the Coomassie-stained gel, indicated on the ordinate, show the yeast HK to be approximately 32 kDa and four putative porcine thymus HKs to be approximately 34–41 kDa. Reproduced with permission from [38].
radioactive bands corresponding to positions of histone H4 kinases; authentic, purified yeast HHK was used as a control and a band was present at the expected position on the gel corresponding to the molecular weight of the HHK. No histone H4 was incorporated into the right-hand half of the gel but the same samples were run on both halves. The lack of radioactive bands on this half of the gel indicates that under these conditions, no autophosphorylation occurred.
4.4 Use of mass spectrometric methods MS is well suited to detecting the presence of PTMs. Traditionally, a “bottom-up” approach has been used, in which the protein of interest is digested into peptides that are then analysed via MS or MSn. Recently, “top-down” studies of entire proteins using Fourier transform mass spectrometry (FTMS) have been used to detect multiple protein PTMs [113–115]. The identification of phosphorylation sites on proteins has been reported using both approaches; however, histidine phosphorylation of proteins has only been reported using the bottom-up approach. Given the labile nature of the phosphoramidate bond of phosphohistidine, it remains to be demonstrated whether phosphohistidine is stable enough for detection under the conditions required for FTMS analysis. Even using traditional bottom-up MS approaches, detecting phosphorylation of the basic residues histidine, lysine and arginine, and the phosphoanhydride bonds of phosphoaspartate and phosphoglutamate, remains challenging. The main reason for this is their limited stability under the acidic conditions that were traditionally applied during sample preparation. The chemical instability of phosphohistidine and the other acid-labile phosphoamino acids is particularly noticeable when they are analysed in positive ion mode using either ESI or MALDI-MS. It appears that in positive ion mode, protonated peptides containing phosphohistidine readily lose their phosphoryl group (HPO3, 80 amu) and go undetected as phosphopeptides. Fortunately, in negative ion mode, phosphohistidine-containing peptides are comparatively more stable and have been readily detected using various ionisation sources. However, the loss of a phosphoryl group is sometimes advantageous in the detection of phosphopeptides: neutral loss of HPO3 (80 amu) in positive ion mode or loss of PO3− (79 amu) in negative ion mode can in some instances be used as a marker of phosphorylation [116].
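As a rough guide to where such a neutral-loss signal would appear, the m/z spacing between a phosphopeptide ion and its HPO3-loss product can be computed from the charge state. The sketch below covers only the neutral loss of HPO3 from protonated ions; the constants are standard monoisotopic values, and the function names and the example mass are illustrative assumptions.

```python
HPO3 = 79.96633    # monoisotopic mass of HPO3 (Da)
PROTON = 1.00728   # mass of a proton (Da)


def mz_positive(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion of a peptide with the given neutral mass."""
    return (neutral_mass + charge * PROTON) / charge


def mz_after_hpo3_loss(neutral_mass, charge):
    """m/z of the same ion after neutral loss of HPO3 (charge unchanged)."""
    return mz_positive(neutral_mass - HPO3, charge)


# Hypothetical phosphopeptide of neutral mass 1500.0 Da, doubly protonated.
precursor = mz_positive(1500.0, 2)
product = mz_after_hpo3_loss(1500.0, 2)
print(round(precursor - product, 3))  # spacing equals HPO3 / charge
```

The spacing shrinks with charge state (about 80, 40 and 26.7 m/z units for 1+, 2+ and 3+ ions), which is worth keeping in mind when scanning spectra for dephosphorylated species.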
Despite the inherent problems associated with the acid-lability of phosphohistidine, it is still possible to detect it by MS under optimised conditions. For example, chemically synthesised and purified 3-phosphohistidine has been analysed by ESI-MS in negative ion mode [111] (Figure 9).
4.4.1 ESI-MS and ICP-MS Several different mass spectrometric techniques have been used to detect histidine phosphorylation of proteins. Wind et al. [117] have used a combination of reverse-phase liquid chromatography, electrospray ionisation mass spectrometry (ESI-MS) and element MS with phosphorus 31P detection (also known as
Figure 9 Negative ion mass spectrum of 3-phosphohistidine, [M−H]− = 234.1 amu. Reproduced with permission from [111].
inductively coupled plasma mass spectrometry (ICP-MS)) to examine histidine phosphorylation of the bacterial chemotaxis regulatory TCHK, CheA. After tryptic digestion, under mildly alkaline conditions, of the in vitro phosphorylated HK target domain of CheA, the phosphopeptides were analysed by reverse-phase μLC-ICP-MS with 31P detection (Figure 10). The second peak in the chromatogram, with a retention time of approximately 46.5 min, was later determined by tandem nanoESI-MS to be a phosphohistidine-containing peptide. What was significant in this study was the ability to demonstrate the acid-lability of a phosphohistidine-containing protein
Figure 10 Analysis of a tryptic digest of an incubation of the HK CheA-C with its substrate CheA-H (molar ratio 1:10) in the presence of ATP. Analysis by μLC-ICP-MS with 31P detection. The arrow points to the phosphohistidine-containing peptide. Reproduced with permission from [117].
Figure 11 Stability of phospho-CheA-H under typical solvent conditions for MS (50% acetonitrile, 1% formic acid). Reproduced with permission from [117].
by nanoESI-MS. Using the histidine-phosphorylated HK target domain in a solvent system typically used for optimal ionisation of peptides (50% aqueous acetonitrile and 1% formic acid), the acid hydrolysis of phosphohistidine could be followed over time (Figure 11). The initial degree of phosphorylation was established as 40% by measuring the peak intensities of the phosphorylated peptides in the deconvoluted nanoESI-MS spectrum. Over a 60-min period this dropped to 10%, illustrating (i) the difficulties associated with identifying histidine-phosphorylated proteins, (ii) the need for awareness of the chemical environment used in preparing histidine-phosphorylated samples and (iii) the probable underestimation of protein histidine phosphorylation simply due to the acid-lability of phosphohistidine and other acid-labile protein phosphoamino acids under standard MS sample solvent conditions.
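To give a feel for the time scale involved, the two values quoted above (40% at the start, 10% after 60 min) can be converted into an apparent rate constant and half-life. This is a back-of-the-envelope sketch that assumes simple first-order hydrolysis, which the source does not state; with only two points the numbers are indicative at best.

```python
import math


def first_order_rate(initial_fraction, final_fraction, minutes):
    """Apparent first-order rate constant (per min) from two time points."""
    return math.log(initial_fraction / final_fraction) / minutes


def fraction_remaining(initial_fraction, rate, minutes):
    """Phosphorylated fraction expected after the given time."""
    return initial_fraction * math.exp(-rate * minutes)


k = first_order_rate(0.40, 0.10, 60.0)   # about 0.023 per min
half_life = math.log(2) / k              # about 30 min under this assumption
print(round(k, 4), round(half_life, 1))
```

Under this assumption roughly a fifth of the phosphohistidine would be lost within the first 10 min in the acidic solvent, which underlines why sample handling time matters.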
4.4.2 IMAC and MALDI TOF-MS A popular phosphopeptide enrichment technique involves the use of immobilised metal affinity chromatography (IMAC), in which various divalent cations complexed to a solid support are used to bind phosphopeptides. A vast array of metal ions has been used for purifying phosphopeptides, but the only one used to date for purifying a histidine-phosphorylated peptide is copper(II) (Cu2+). Napper et al. [118] made use of this phosphopeptide enrichment technique prior to analysing histidine phosphopeptides via matrix-assisted laser desorption ionisation time-of-flight mass spectrometry (MALDI TOF-MS). The phosphohistidine-containing peptide identified in this study was from the protein HPr, found in the phosphoenolpyruvate-sugar phosphotransferase system (PTS). Phosphorylated HPr was digested under mildly alkaline conditions using V8 protease and the digest was applied to acidified Cu2+-charged IMAC zip tips (Millipore). Although IMAC matrices can also be used to bind proteins or peptides with histidine affinity tags at neutral to alkaline pH, in this instance false-positive non-phosphorylated peptides are avoided by loading the sample under mildly acidic conditions. Given that the pKa of the imidazole ring of histidine is 6.0, the ring becomes protonated at a pH lower than 6.0 and is repelled by the positively charged divalent copper ions, allowing only the phosphorylated peptides to bind. Given the acid-lability of phosphohistidine, the binding and washing steps are done rapidly to reduce hydrolysis of the phosphoramidate bond. Elution of histidine phosphopeptides is achieved using an alkaline ammonia solution, with subsequent neutralisation of the sample with trifluoroacetic acid [118]. The enriched phosphopeptides are then mixed with matrix and analysed by MALDI TOF-MS.
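The pH argument above follows from the Henderson-Hasselbalch relationship. As a small illustration, the fraction of imidazole side chains carrying a positive charge can be estimated for a given loading pH; the pKa of 6.0 is the value quoted in the text, while the function name and the example pH values are assumptions.

```python
def fraction_protonated(ph, pka=6.0):
    """Fraction of imidazole rings protonated at a given pH (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Illustrative pH values only; the actual loading pH is given in ref. [118].
for ph in (4.0, 5.0, 6.0, 7.0, 8.0):
    print(ph, round(fraction_protonated(ph), 3))
```

On this estimate about 91% of unmodified histidine side chains are protonated at pH 5 and about 99% at pH 4, whereas at pH 8 fewer than 1% are, consistent with histidine tags binding at neutral to alkaline pH.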
Similar to the findings for ESI-MS analysis of phosphohistidine peptides, it was found that in positive ion mode the relative abundance of the histidine-phosphorylated peptide to the non-phosphorylated peptide was 1:25. Identifying phosphohistidine-containing peptides in positive ion mode therefore appears to be challenging because of the lability of phosphohistidine. If the researcher knows that a particular peptide is histidine-phosphorylated, they can search for peptides of the appropriate mass in positive ion mode. However, where the site of phosphorylation is unknown, identifying phosphohistidine-containing peptides in positive ion mode is not recommended in the first instance. Even though phosphohistidine-containing peptides were identified in negative ion mode (Figure 12), there was still evidence of dephosphorylation, illustrating once again the lability of phosphohistidine during mass spectrometric analysis.
4.4.3 Chemical histidine phosphorylation of peptides Histidine phosphorylation of peptides or proteins can be achieved chemically using potassium phosphoramidate or phosphoryl chloride [119]. At alkaline pH this phosphoramidate reaction is specific for histidine and is convenient if phosphorylating a peptide or protein sequence with a known number of histidine residues. In an excellent case study of methods to analyse histidine-phosphorylated peptides, Medzihradszky et al. [120] used chemical phosphorylation of
Figure 12 Negative ion linear mode MALDI TOF-MS of the Cu2+ IMAC extract from a V8 digest of the histidine-phosphorylated protein HPr. Reproduced with permission from [118].
a synthetic peptide and a combination of ESI-MS and MALDI TOF-MS to specifically address the analysis of histidine-phosphorylated peptides. In Figure 13, the peptide Ac-PLSFTNPLHSDDWH-NH2 was chemically phosphorylated on histidine and analysed in positive ion (Figure 13A) and negative ion (Figure 13B) modes using ESI-MS. As can be clearly seen, in positive ion mode the spectrum is much more complex because of multiply charged sodium adducts of both the phosphorylated and unphosphorylated peptide. By contrast, the negative ion spectrum is much cleaner, with the most abundant ions being the doubly charged phosphorylated and dephosphorylated peptides without any sodium adducts. The negative ion MALDI TOF-MS spectrum in reflectron mode (Figure 14) is simpler still, showing only the phosphorylated peptide, with none of the metal ion adducts or dephosphorylated peptides seen in the ESI-MS spectra. Using mass spectrometric techniques to identify proteins and peptides phosphorylated on histidine still requires further investigation and optimisation. The acid-labile nature of the phosphoramidate bond of phosphohistidine is certainly the major factor influencing the success of detection using MS. With the majority of phosphopeptide detection and analysis focusing on the acid-stable phosphohydroxy amino acids, the standard MS approaches are likely to miss phosphohistidine-containing phosphopeptides. Together, the above examples highlight the problems associated with the acid-lability of phosphohistidine, but also demonstrate that histidine-phosphorylated proteins and peptides can be successfully detected using MS.
5. FUTURE DIRECTIONS 5.1 Fourier transform ion cyclotron resonance mass analysis (FTMS): ‘‘Top-down’’ mass spectrometry Recent advances in FTMS have made it possible to undertake what is referred to as ‘‘top-down’’ analysis of proteins. This type of mass spectrometer has the
Figure 13 Electrospray ionisation mass spectrum of the phosphorylated peptide Ac-PLSFIuNPLHSDDWH-NH2 in (A) positive ion and (B) negative ion modes. Asterisks designate signals derived from the dephosphorylated peptide. Reproduced with permission from [120], courtesy of Protein Science.
Figure 14 Negative ion MALDI spectrum of the peptide Ac-SFTNPLH(p)AAANH2 acquired in reflectron mode. CHCA was used as matrix. Only the phosphorylated species is detected in this mode. Reproduced with permission from [120], courtesy of Protein Science.
ability to search for multiple PTMs on an undigested protein or large peptide. This type of mass analyser has a much broader mass range than traditionally configured ESI-MS or MALDI TOF-MS instruments, allowing global analysis of entire proteins. Traditional FTMS is now being coupled to other sources to produce hybrid machines that are becoming more commonplace. Electrospray (ESI) sources are one example of an ionisation source being used for FTMS analysis of peptides. Such instruments have been used to investigate the PTMs of histones, histone variants and many other proteins that undergo methylation, acetylation, phosphorylation or other PTMs [113–115]. There are no reports of this type of analysis being used to identify histidine phosphorylation of proteins. The problems associated with the acid-lability of phosphohistidine in sample preparation or during enrichment processes prevail regardless of the type of mass analyser used. That said, the power of FTMS to analyse multiple PTMs might in the future prove useful in the identification of proteins phosphorylated on histidine.
5.2 Thiophosphorylation of histidine improves chemical stability Biochemical analysis of phosphohistidine residues in proteins is severely hampered by the acid-lability of the phosphoramidate bond. However, it has been demonstrated that by replacing the phosphoryl group linked to the
histidine residue with a thiophosphoryl group, a thiophosphohistidine derivative with increased stability is formed. This was demonstrated by Lasker et al. [119], who compared the relative acid-labilities of a phosphorylated versus a thiophosphorylated histidine-containing synthetic peptide, as well as of the catalytic subunit of the yeast HK Sln1. Thiophosphorylation can be achieved chemically using PSCl3 or enzymatically using ATPγS. Pirrung et al. [121] have used another reagent, thiophosphoramidate, as an effective means of thiophosphorylating histidine. As with the different isomers of phosphohistidine, they found the 3-thiophosphohistidine isomer to be the major product of the chemical synthesis, owing to its greater stability. Phosphorylation and thiophosphorylation of histidine-containing peptides can be detected by MS [119]. Monoprotonated peptide ions with the correct molecular masses were detected for the histidine-phosphorylated and histidine-thiophosphorylated peptides, which are characterised by molecular mass shifts of 80 amu (phospho-) and 96 amu (thiophospho-), respectively (Figure 15).
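The two mass shifts quoted above can be used to label an observed peptide mass relative to its unmodified form. The sketch below is illustrative only: the monoisotopic increments are standard values for HPO3 and HPO2S, while the tolerance, the function name and the example masses are hypothetical.

```python
PHOSPHO = 79.96633      # monoisotopic mass added by HPO3 (Da)
THIOPHOSPHO = 95.94349  # monoisotopic mass added by HPO2S (Da)


def classify_shift(unmodified_mass, observed_mass, tol=0.5):
    """Label an observed mass as unmodified, phospho, thiophospho or unassigned."""
    delta = observed_mass - unmodified_mass
    if abs(delta) <= tol:
        return "unmodified"
    if abs(delta - PHOSPHO) <= tol:
        return "phospho"
    if abs(delta - THIOPHOSPHO) <= tol:
        return "thiophospho"
    return "unassigned"


# Hypothetical masses for a peptide of 800.0 Da.
print(classify_shift(800.0, 880.0), classify_shift(800.0, 896.0))
```

The 16 amu gap between the two modifications corresponds to the replacement of one oxygen by sulphur, so the two species are easily distinguished even at modest mass accuracy.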
5.3 Edman degradation of radiolabelled, histidine-thiophosphorylated NDPK Edman degradation is commonly used as a chemical means of obtaining N-terminal sequence information for peptides and proteins. It has also found a use in determining certain PTMs, including protein phosphorylation. With respect to histidine phosphorylation, acid-lability becomes an issue; however, Lasker et al. [119] took a common autophosphorylating HK, NDPK, and used this methodology together with other supporting evidence to show that NDPK can be thiophosphorylated on histidine. The predicted histidine phosphorylation site of bovine liver NDPK is at position 118, within the tryptic peptide 115-NIIHGSDSVESAEK-128. To identify the phosphorylated histidine, radiolabelled tryptic peptides derived from phosphorylated and thiophosphorylated NDPK (using [γ-32P]ATP and [γ-35S]thioATP, respectively) were obtained from the major radioactive peaks of RP-HPLC runs and subjected to radioactive Edman degradation. The phosphohistidine-containing peptide gave spurious sequence data, with variable amounts of radioactivity released in successive Edman degradation cycles. However, owing to its increased chemical stability, the thiophosphohistidine-containing peptide showed a clear signal above background for a released radioactive anilinothiazolinone (ATZ) derivative at the amino acid in position 4. Since the only tryptic peptide derived from bovine liver NDPK with a histidine in position 4 is the one shown above, histidine was confirmed as being the target phosphoamino acid.
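The bookkeeping that links an Edman cycle to a residue in the parent protein is simple but easy to get wrong by one. A minimal sketch, using the tryptic peptide and numbering given above (the function name is an assumption):

```python
def residue_at_cycle(peptide, start_position, cycle):
    """Return (residue, protein position) released at a 1-based Edman cycle."""
    if not 1 <= cycle <= len(peptide):
        raise ValueError("cycle lies outside the peptide")
    return peptide[cycle - 1], start_position + cycle - 1


# Tryptic peptide 115-128 of bovine liver NDPK, as quoted in the text.
print(residue_at_cycle("NIIHGSDSVESAEK", 115, 4))
```

Radioactivity released in cycle 4 therefore maps to histidine 118, in agreement with the predicted phosphorylation site.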
5.4 32P/33P differential phosphoprotein labelling
In a similar vein to ICAT differential protein labelling tags [122], a recent detection method based on the differential labelling of proteins with 32Pi and 33Pi
Figure 15 Electrospray MS analyses of synthetic peptides purified by RP-HPLC; (A) unphosphorylated; (B) after treatment with potassium phosphoramidate; (C) after treatment with PSCl3; shown are the monoprotonated molecular ions of the peptides (A) Ac-HGGGGAAAL-NH2, (B) Ac-(pH)GGGGAAAL-NH2 and (C) Ac-(tpH)GGGGAAAL-NH2. Reproduced with permission from [119], courtesy of Protein Science.
can be used to look at protein phosphorylation in different cellular states [123]. With this method, normal cells might be labelled with 32Pi and cancerous cells with 33Pi. The cell extracts are mixed and separated by two-dimensional gel electrophoresis. An initial exposure of the gel reveals all proteins labelled with either 32Pi or 33Pi. Any proteins that are phosphorylated in the cancerous cells but not in the normal cells will carry only a 33Pi label. A second exposure is taken in which the gel is covered with an acetate sheet thick enough to block the less energetic β particles of 33P. The spots that disappear in this second exposure then provide an image of only those proteins independently phosphorylated in the cancerous state. The radiolabelled spots of interest can then be excised and identified by MS, and the phosphorylated amino acid residue can subsequently be identified using the techniques described elsewhere in this chapter. One application of this methodology that might be applied specifically to the detection of acid-labile protein phosphorylation is acid treatment of the gel directly, or of gels that have been blotted onto nylon membranes. Acid treatment would remove any phosphates on histidine, lysine, arginine, aspartate or glutamate. Thus, not only would cancer-specific phosphorylation be detected, but acid treatment of the same gel would also provide some evidence as to the nature of the phosphorylated amino acid.
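The comparison of exposures described above amounts to simple set operations on the spots detected in each image. The following sketch assumes that spots have already been matched between exposures and given identifiers; the function names and spot labels are hypothetical.

```python
def classify_spots(unshielded, shielded):
    """Split spots by label from two exposures of the same gel.

    unshielded: spots seen without the acetate sheet (32P and/or 33P).
    shielded:   spots still seen through the sheet (32P only).
    """
    only_33p = unshielded - shielded   # disappear when 33P is blocked
    with_32p = unshielded & shielded   # persist, so they carry 32P
    return only_33p, with_32p


def acid_labile_spots(before_acid, after_acid):
    """Spots lost after acid treatment, i.e. candidates for acid-labile phosphorylation."""
    return before_acid - after_acid


only_33p, with_32p = classify_spots({"s1", "s2", "s3"}, {"s1", "s2"})
print(sorted(only_33p), sorted(with_32p))
```

Intersecting the 33P-only set with the acid-labile set would then give the spots that are both specific to the second cellular state and likely to carry an acid-labile phosphoamino acid.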
5.5 Phosphohistidine-specific antibodies Anti-phosphotyrosine antibodies have proved extremely useful for the detection of phosphotyrosine residues in phosphoproteins and for immunoaffinity concentration of such phosphoproteins prior to digestion and mass spectrometric peptide analysis. Given that both phosphotyrosine and phosphohistidine have aromatic rings, it should be possible to obtain anti-phosphohistidine antibodies. However, we are aware of some attempts to generate such antibodies that have been unsuccessful. It is likely that when phosphohistidine or peptides containing this phosphoamino acid are part of the immunogen, hydrolysis occurs too quickly to elicit a strong immune response. It would seem that future attempts will use non-hydrolysable analogues of phosphohistidine [124] or more stable thiophosphohistidines. It is not clear whether two different antibodies will be required to detect the 1- and 3- forms of phosphohistidine.
5.6 Two-component histidine kinase and histidine kinase consensus phosphorylation sites
High levels of HHK activity in liver progenitor cells (LPCs), together with evidence of HHK activity from other sources [32,39], have led to speculation that other HK target substrates may exist. Although much of the interest in PTM of histone proteins centres on the N-terminal tails, there have been reports that H75 of histone H4 is also a site of phosphorylation. H75 lies within a structural motif known as a histone fold. Many proteins possess this structural motif, but a select few have a defined motif containing the canonical histidine residue along with other features, which suggests an HK-specific substrate consensus sequence. To identify intracellular targets of HHK activity other than histone H4, a database search using the sequence of the histone fold (see below) was carried out and yielded several potential target proteins. These are listed in Table 1. For the most part, this motif appears to have two conserved aspartate residues (shown in bold), one 7 residues upstream of the canonical histidine and the other 12 residues downstream. In addition, the third residue after the histidine is always a basic amino acid (lysine, arginine or histidine; shown underscored). Not surprisingly, many of these proteins have functions that are associated with the cell cycle. Since HHK activity is associated with cell proliferation and cancer, these putative HHK substrates will require further investigation using many of the MS and other proteomics techniques specific for the detection of histidine phosphorylation described elsewhere in this chapter.
TCHKs also possess a distinct ATP-binding motif and a consensus phosphorylation site similar to that of HHKs, in which the histidine residue usually has a basic residue at the third position downstream, e.g. TVFIANISHELRTPLNGIL (the histidine phosphorylation site of the yeast TCHK, Sln1). Two mammalian mitochondrial proteins, BCKDHK and PDHK, also possess the same ATP-binding and phosphorylation motifs as TCHKs, yet there is still debate as to whether these two mammalian proteins are bona fide TCHKs [90,94,125]. TCHKs, including the mammalian homologues BCKDHK and PDHK, all possess a distinct ATP-binding fold known as a Bergerat fold [126]. The Bergerat fold is also shared by other functionally unrelated proteins such as heat shock protein 90 (HSP90), DNA gyrase and topoisomerases. There is evidence to suggest that this distinct structural motif may be a hallmark of TCHKs and possibly other HKs [94,126]. This, however, has yet to be determined experimentally for other Bergerat fold-containing proteins.
Table 1
Sequences of histone fold proteins

HISTONE_H4     VFLENVIR.D    AVTYTEHAKR  KTVTAMDVV YA
HISTONE_H3     LVGLFE.D      TNLCVIHAKR  VTIMPKDIQ LARR
HUMAN P53      RCPHHERCSDSD  GLAPPQHLIR  VEGNLRVEYLDDRNTFRH
CENP-A^a       FLVHLFEED     AYLLTLHAGR  VTLFPKDVQ LAR
TAFII60^b      LKRIVQD       AAKFMNHAKR  QKLSVRDID
INCENP^a       HNYLNSDD      STDDEAHPRK  PIPTWARGTP LASQ
SURVIVIN^a     GWEPD         DDPIEEHKKH  SSGCAFLSVK
CBP^c          FITS          EASERCHGEK  RKTINGEDIL F
HUMAN ATBF1^d  AALQ          THFNEVHAKR  PQLPVSDRHVYKYRCN

a CENP-A/INCENP/Survivin, all centromere-associated proteins.
b TAFII60, TATA-binding protein-associated factor 60 (yeast TAF60, Drosophila TAF60/62, human TAF80).
c CBP, CAAT-box DNA-binding protein subunit B (also known as nuclear transcription factor Y subunit beta (NF-Y protein chain B)).
d ATBF1, Human AT-motif binding factor 1/alpha feto enhancer-binding protein.
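The consensus features described for the proteins in Table 1 can be checked against a sequence with a few lines of code. The following is a minimal sketch, not the database search used by the authors. The function names, the default offsets and in particular the downstream search window are assumptions: because the Table 1 alignment contains gaps, the downstream aspartate is looked for within a window of positions rather than at exactly 12 residues from the histidine. The histone H4 segment is the Table 1 row with alignment gaps removed.

```python
BASIC = set("KRH")  # lysine, arginine, histidine

def score_histidines(seq, upstream_asp=7, basic_offset=3,
                     downstream_window=(8, 16)):
    """For each His in seq, report which consensus features are present.

    Returns a list of (index, has_upstream_asp, has_basic, has_downstream_asp)
    with 0-based indices. The downstream window is an illustrative
    assumption that allows for gaps in the published alignment.
    """
    hits = []
    lo, hi = downstream_window
    for i, aa in enumerate(seq):
        if aa != "H":
            continue
        up = i - upstream_asp
        has_up = up >= 0 and seq[up] == "D"
        b = i + basic_offset
        has_basic = b < len(seq) and seq[b] in BASIC
        # Slicing past the end of the string simply yields a shorter slice.
        has_down = "D" in seq[i + lo : i + hi + 1]
        hits.append((i, has_up, has_basic, has_down))
    return hits

def candidate_sites(seq, **kwargs):
    """Indices of histidines that show all three features."""
    return [i for i, up, basic, down in score_histidines(seq, **kwargs)
            if up and basic and down]

# Histone H4 row of Table 1 (gaps removed) and the Sln1 peptide from the text.
H4_SEGMENT = "VFLENVIRDAVTYTEHAKRKTVTAMDVVYA"
SLN1_PEPTIDE = "TVFIANISHELRTPLNGIL"

print(candidate_sites(H4_SEGMENT))
print(score_histidines(SLN1_PEPTIDE))
```

Reporting the three features separately is deliberate: the Sln1 peptide quoted in the text carries the basic residue three positions after the histidine but not the flanking aspartates, so an all-or-nothing match would hide the feature it shares with the histone fold proteins.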
6. CONCLUSION
The investigation of protein histidine phosphorylation and its associated HKs is still very much in its infancy, especially in mammalian cells. Over the next few years, interest in this PTM is likely to grow, and the development of new methods for investigating protein phosphorylation will certainly have a major impact on the progress of research in this area. Without a doubt, MS, with its high sensitivity and precision, will be at the forefront of these investigations.
ACKNOWLEDGEMENTS
We would like to thank the Ada Bartholomew Medical Research Trust, the UWA Small Grant Scheme and the Raine Medical Research Foundation for their financial support.
REFERENCES
1 T. Pawson and J.D. Scott, Protein phosphorylation in signaling: 50 years and counting, Trends Biochem. Sci., 30 (2005) 286–290.
2 M.B. Potters, B.T. Solow, K.M. Bischoff, D.E. Graham, B.H. Lower, R. Helm and P.J. Kennelly, Phosphoprotein with phosphoglycerate mutase activity from the archaeon Sulfolobus solfataricus, J. Bacteriol., 185 (2003) 2112–2121.
3 M. Liu, M. Krasteva and A. Barth, Interactions of phosphate groups of ATP and aspartyl phosphate with the sarcoplasmic reticulum Ca2+-ATPase: An FTIR study, Biophys. J., 89 (2005) 4352–4363.
4 B. Schneider, A. Norda, A. Karlsson, M. Veron and D. Deville-Bonne, Nucleotide affinity for a stable phosphorylated intermediate of nucleoside diphosphate kinase, Protein Sci., 11 (2002) 1648–1656.
5 D. Barford, A.K. Das and M.P. Egloff, The structure and mechanism of protein phosphatases: Insights into catalysis and regulation, Annu. Rev. Biophys. Biomol. Struct., 27 (1998) 133–164.
6 A.M. Stock, V.L. Robinson and P.N. Goudreau, Two-component signal transduction, Annu. Rev. Biochem., 69 (2000) 183–215.
7 P.G. Besant, E. Tan and P.V. Attwood, Mammalian protein histidine kinases, Int. J. Biochem. Cell Biol., 35 (2003) 297–309.
8 P.G. Besant and P.V. Attwood, Mammalian histidine kinases, Biochim. Biophys. Acta, 1754 (2005) 281–290.
9 O. Walinder, Evidence of the presence of 1-phosphohistidine as the main phosphorylated component at the active site of bovine liver nucleoside diphosphate kinase, Acta Chem. Scand., 23 (1969) 339–341.
10 B. Edlund, Effects of chemical modification of lysine, tyrosine and tryptophan residues in pea seed nucleoside diphosphate kinase and inhibition of the enzyme with antibodies, Ups J. Med. Sci., 87 (1982) 251–258.
11 E.B. Waygood, K. Pasloske, L.T. Delbaere, J. Deutscher and W. Hengstenberg, Characterization of the 1-phosphohistidinyl residue in the phosphocarrier protein HPr of the phosphoenolpyruvate: Sugar phosphotransferase system of Streptococcus faecalis, Biochem. Cell Biol., 66 (1988) 76–80.
12 J.M. Huang, Y.F. Wei, Y.H. Kim, L. Osterberg and H.R. Matthews, Purification of a protein histidine kinase from the yeast Saccharomyces cerevisiae. The first member of this class of protein kinases, J. Biol. Chem., 266 (1991) 9023–9031.
13 J. Gross, M. Rajavel, E. Segura and C. Grubmeyer, Energy coupling in Salmonella typhimurium nicotinic acid phosphoribosyltransferase: Identification of His-219 as site of phosphorylation, Biochemistry, 35 (1996) 3917–3924.
14 A.M. Spronk, H. Yoshida and H.G. Wood, Isolation of 3-phosphohistidine from phosphorylated pyruvate, phosphate dikinase, Proc. Natl. Acad. Sci. USA, 73 (1976) 4415–4419.
15 S. Narindrasorasak and W.A. Bridger, Phosphoenolpyruvate synthetase of Escherichia coli: Molecular weight, subunit composition, and identification of phosphohistidine in phosphoenzyme intermediate, J. Biol. Chem., 252 (1977) 3121–3127.
16 M.R. El-Maghrabi and S.J. Pilkis, Rat liver 6-phosphofructo 2-kinase/fructose 2,6-bisphosphatase: A review of relationships between the two activities of the enzyme, J. Cell. Biochem., 26 (1984) 1–17.
17 V.D. Huebner and H.R. Matthews, Phosphorylation of histidine in proteins by a nuclear extract of Physarum polycephalum plasmodia, J. Biol. Chem., 260 (1985) 16106–16113.
18 A. Tauler, M.R. el-Maghrabi and S.J. Pilkis, Functional homology of 6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase, phosphoglycerate mutase, and 2,3-bisphosphoglycerate mutase, J. Biol. Chem., 262 (1987) 16808–16815.
19 K.D. Kumble, K. Ahn and A. Kornberg, Phosphohistidyl active sites in polyphosphate kinase of Escherichia coli, Proc. Natl. Acad. Sci. USA, 93 (1996) 14391–14395.
20 J.B. Stock, A.M. Stock and J.M. Mottonen, Signal transduction in bacteria, Nature, 344 (1990) 395–400.
21 B. Duclos, S. Marcandier and A.J. Cozzone, Chemical properties and separation of phosphoamino acids by thin-layer chromatography and/or electrophoresis, Meth. Enzymol., 201 (1991) 10–21.
22 D.E. Hultquist, The preparation and characterization of phosphorylated derivatives of histidine, Biochim. Biophys. Acta, 153 (1968) 329–340.
23 P.G. Besant and P.V. Attwood, Problems with phosphoamino acid analysis using alkaline hydrolysis, Anal. Biochem., 265 (1998) 187–190.
24 H. Sakai, J. Hua, Q.G. Chen, C. Chang, L.J. Medrano, A.B. Bleecker and E.M. Meyerowitz, ETR2 is an ETR1-like gene involved in ethylene signaling in Arabidopsis, Proc. Natl. Acad. Sci. USA, 95 (1998) 5812–5817.
25 J.M. Lu, R.J. Deschenes and J.S. Fassler, Saccharomyces cerevisiae histidine phosphotransferase Ypd1p shuttles between the nucleus and cytoplasm for SLN1-dependent phosphorylation of Ssk1p and Skn7p, Eukaryot. Cell, 2 (2003) 1304–1314.
26 H.R. Matthews, Protein kinases and phosphatases that act on histidine, lysine, or arginine residues in eukaryotic proteins: a possible regulator of the mitogen-activated protein kinase cascade, Pharmacol. Ther., 67 (1995) 323–350.
27 L.L. Bieber and P.D. Boyer, 32P-labeling of mitochondrial protein and lipid fractions and their relation to oxidative phosphorylation, J. Biol. Chem., 241 (1966) 5375–5383.
28 M. Deluca, K.E. Ebner, D.E. Hultquist, G. Kreil, J.B. Peter, R.W. Moyer and P.D. Boyer, The isolation and identification of phosphohistidine from mitochondrial protein, Biochem. Z., 338 (1963) 512–525.
29 O. Zetterqvist and L. Engstrom, Isolation of [32P]phosphohistidine from different rat-liver cell fractions after incubation with [32P]adenosine triphosphate, Brit. J. Haematol., 12 (1966) 520–530.
30 C.C. Chen, D.L. Smith, B.B. Bruegger, R.M. Halpern and R.A. Smith, Occurrence and distribution of acid-labile histone phosphates in regenerating rat liver, Biochemistry, 13 (1974) 3785–3789.
31 D.L. Smith, C.C. Chen, B.B. Bruegger, S.L. Holtz, R.M. Halpern and R.A. Smith, Characterization of protein kinases forming acid-labile histone phosphates in Walker-256 carcinosarcoma cell nuclei, Biochemistry, 13 (1974) 3780–3785.
32 C.C. Chen, B.B. Bruegger, C.W. Kern, Y.C. Lin, R.M. Halpern and R.A. Smith, Phosphorylation of nuclear proteins in rat regenerating liver, Biochemistry, 16 (1977) 4852–4855.
33 M. Gassner, D. Stehlik, O. Schrecker, W. Hengstenberg, W. Maurer and H. Ruterjans, The phosphoenolpyruvate-dependent phosphotransferase system of Staphylococcus aureus. 2. 1H and 31P-nuclear-magnetic-resonance studies on the phosphocarrier protein HPr, phosphohistidines and phosphorylated HPr, Eur. J. Biochem., 75 (1977) 287–296.
34 J.M. Fujitaki, G. Fung, E.Y. Oh and R.A. Smith, Characterization of chemical and enzymatic acid-labile phosphorylation of histone H4 using phosphorus-31 nuclear magnetic resonance, Biochemistry, 20 (1981) 3658–3664.
35 L. Carlomagno, V.D. Huebner and H.R. Matthews, Rapid separation of phosphoamino acids including the phosphohistidines by isocratic high-performance liquid chromatography of the orthophthalaldehyde derivatives, Anal. Biochem., 149 (1985) 344–348.
36 Y.F. Wei and H.R. Matthews, A filter-based protein kinase assay selective for alkali-stable protein phosphorylation and suitable for acid-labile protein phosphorylation, Anal. Biochem., 190 (1990) 188–192.
37 Y.F. Wei and H.R. Matthews, Identification of phosphohistidine in proteins and purification of protein-histidine kinases, Meth. Enzymol., 200 (1991) 388–414.
38 P.G. Besant and P.V. Attwood, Detection of a mammalian histone H4 kinase that has yeast histidine kinase-like enzymic activity, Int. J. Biochem. Cell Biol., 32 (2000) 243–253.
39 E. Tan, P.G. Besant, X.L. Zu, C.W. Turck, M.A. Bogoyevitch, S.G. Lim, P.V. Attwood and G.C. Yeoh, Histone H4 histidine kinase displays the expression pattern of a liver oncodevelopmental marker, Carcinogenesis, 25 (2004) 2083–2088.
40 E. Tan, Ph.D. Thesis: The Partial Purification of Mammalian Histidine Kinase and the Detection of Histone H4 Histidine Kinase Activity during Liver Injury and Hepatocellular Carcinoma, The University of Western Australia, 2005.
41 R.J. DeLange, D.M. Fambrough, E.L. Smith and J. Bonner, Calf and pea histone IV. 3. Complete amino acid sequence of pea seedling histone IV; comparison with the homologous calf thymus histone, J. Biol. Chem., 244 (1969) 5669–5679.
42 K. Luger, A.W. Mader, R.K. Richmond, D.F. Sargent and T.J. Richmond, Crystal structure of the nucleosome core particle at 2.8-angstrom resolution, Nature, 389 (1997) 251–260.
43 E. Tan, P.G. Besant and P.V. Attwood, Mammalian histidine kinases: Do they REALLY exist? Biochemistry, 41 (2002) 3843–3851.
44 Y.F. Wei, J.E. Morgan and H.R. Matthews, Studies of histidine phosphorylation by a nuclear protein histidine kinase show that histidine-75 in histone H4 is masked in nucleosome core particles and in chromatin, Arch. Biochem. Biophys., 268 (1989) 546–550.
45 F. Lenfant, R.K. Mann, B. Thomsen, X. Ling and M. Grunstein, All four core histone N-termini contain sequences required for the repression of basal transcription in yeast, EMBO J., 15 (1996) 3974–3985.
46 M.A. Zoroddu, T. Kowalik-Jankowska, H. Kozlowski, H. Molinari, K. Salnikow, L. Broday and M. Costa, Interaction of Ni(II) and Cu(II) with a metal binding sequence of histone H4: AKRHRK, a model of the H4 tail, Biochim. Biophys. Acta, 1475 (2000) 163–168.
47 L. Broday, W. Peng, M.H. Kuo, K. Salnikow, M. Zoroddu and M. Costa, Nickel compounds are novel inhibitors of histone H4 acetylation, Cancer Res., 60 (2000) 238–241.
48 P.S. Steeg, D. Palmieri, T. Ouatas and M. Salerno, Histidine kinases and histidine phosphorylated proteins in mammalian cell biology, signal transduction and cancer, Cancer Lett., 190 (2003) 1–12.
49 C.R. Clapier, K.P. Nightingale and P.B. Becker, A critical epitope for substrate recognition by the nucleosome remodeling ATPase ISWI, Nucleic Acids Res., 30 (2002) 649–655.
50 J. Munoz-Dorado, N. Almaula, S. Inouye and M. Inouye, Autophosphorylation of nucleoside diphosphate kinase from Myxococcus xanthus, J. Bacteriol., 175 (1993) 1176–1181.
51 M. Engel, M. Veron, B. Theisinger, M.L. Lacombe, T. Seib, S. Dooley and C. Welter, A novel serine/threonine-specific protein phosphotransferase activity of Nm23/nucleoside-diphosphate kinase, Eur. J. Biochem., 234 (1995) 200–207.
52 P.D. Wagner and N.D. Vu, Phosphorylation of ATP-citrate lyase by nucleoside diphosphate kinase, J. Biol. Chem., 270 (1995) 21758–21764.
53 P.D. Wagner, P.S. Steeg and N.D. Vu, Two-component kinase-like activity of nm23 correlates with its motility-suppressing activity, Proc. Natl. Acad. Sci. USA, 94 (1997) 9000–9005.
54 J.M. Freije, P. Blay, N.J. MacDonald, R.E. Manrow and P.S. Steeg, Site-directed mutation of Nm23-H1. Mutations lacking motility suppressive capacity upon transfection are deficient in histidine-dependent protein phosphotransferase pathways in vitro, J. Biol. Chem., 272 (1997) 5525–5532.
55 Q. Lu, H. Park, L.A. Egger and M. Inouye, Nucleoside-diphosphate kinase-mediated signal transduction via histidyl-aspartyl phosphorelay systems in Escherichia coli, J. Biol. Chem., 271 (1996) 32886–32893.
56 M.N. Levit, B.M. Abramczyk, J.B. Stock and E.H. Postel, Interactions between Escherichia coli nucleoside-diphosphate kinase and DNA, J. Biol. Chem., 277 (2002) 5163–5167.
57 K.J. Treharne, L.J. Marshall and A. Mehta, A novel chloride-dependent GTP-utilizing protein kinase in plasma membranes from human respiratory epithelium, Am. J. Physiol., 267 (1994) L592–L601.
58 R. Muimo, S.J. Banner, L.J. Marshall and A. Mehta, Nucleoside diphosphate kinase and Cl(-)-sensitive protein phosphorylation in apical membranes from ovine airway epithelium, Am. J. Respir. Cell Mol. Biol., 18 (1998) 270–278.
59 R. Muimo, Z. Hornickova, C.E. Riemen, V. Gerke, H. Matthews and A. Mehta, Histidine phosphorylation of annexin I in airway epithelia, J. Biol. Chem., 275 (2000) 36632–36636.
60 K.J. Treharne, C.E. Riemen, L.J. Marshall, R. Muimo and A. Mehta, Nucleoside diphosphate kinase: a component of the [Na(+)]- and [Cl(-)]-sensitive phosphorylation cascade in human and murine airway epithelium, Pflugers Arch., 443(Suppl. 1) (2001) S97–S102.
61 R.J. Flower and N.J. Rothwell, Lipocortin-1: Cellular mechanisms and clinical relevance, Trends Pharmacol. Sci., 15 (1994) 71–76.
62 B.P. Wallner, R.J. Mattaliano, C. Hession, R.L. Cate, R. Tizard, L.K. Sinclair, C. Foeller, E.P. Chow, J.L. Browning, K.L. Ramachandran and R.B. Pepinsky, Cloning and expression of human lipocortin, a phospholipase A2 inhibitor with potential anti-inflammatory activity, Nature, 320 (1986) 77–81.
63 G.G. Skouteris and C.H. Schroder, The hepatocyte growth factor receptor kinase-mediated phosphorylation of lipocortin-1 transduces the proliferating signal of the hepatocyte growth factor, J. Biol. Chem., 271 (1996) 27266–27273.
64 B.M. Frey, B.F. Reber, B.S. Vishwanath, G. Escher and F.J. Frey, Annexin I modulates cell functions by controlling intracellular calcium release, FASEB J., 13 (1999) 2235–2245.
65 T. Wieland, I. Ulibarri, P. Gierschik and K.H. Jakobs, Activation of signal-transducing guanine-nucleotide-binding regulatory proteins by guanosine 5'-[gamma-thio]triphosphate. Information transfer by intermediately thiophosphorylated beta gamma subunits, Eur. J. Biochem., 196 (1991) 707–716.
66 T. Wieland, M. Ronzani and K.H. Jakobs, Stimulation and inhibition of human platelet adenylylcyclase by thiophosphorylated transducin beta gamma-subunits, J. Biol. Chem., 267 (1992) 20791–20797.
67 T. Wieland, B. Nurnberg, I. Ulibarri, S. Kaldenberg-Stasch, G. Schultz and K.H. Jakobs, Guanine nucleotide-specific phosphate transfer by guanine nucleotide-binding regulatory protein beta-subunits. Characterization of the phosphorylated amino acid, J. Biol. Chem., 268 (1993) 18111–18118.
68 S. Kaldenberg-Stasch, M. Baden, B. Fesseler, K.H. Jakobs and T. Wieland, Receptor-stimulated guanine-nucleotide-triphosphate binding to guanine-nucleotide-binding regulatory proteins. Nucleotide exchange and beta-subunit-mediated phosphotransfer reactions, Eur. J. Biochem., 221 (1994) 25–33.
69 T. Wieland, K. Liedel, S. Kaldenberg-Stasch, D. Meyer zu Heringdorf, M. Schmidt and K.H. Jakobs, Analysis of receptor-G protein interactions in permeabilized cells, Naunyn Schmiedebergs Arch. Pharmacol., 351 (1995) 329–336.
70 B. Nurnberg, R. Harhammer, T. Exner, R.A. Schulze and T. Wieland, Species- and tissue-dependent diversity of G-protein beta subunit phosphorylation: Evidence for a cofactor, Biochem. J., 318 (1996) 717–722.
71 A. Kowluru, M. Tannous and H.Q. Chen, Localization and characterization of the mitochondrial isoform of the nucleoside diphosphate kinase in the pancreatic beta cell: Evidence for its complexation with mitochondrial succinyl-CoA synthetase, Arch. Biochem. Biophys., 398 (2002) 160–169.
72 A. Kowluru, Defective protein histidine phosphorylation in islets from the Goto-Kakizaki diabetic rat, Am. J. Physiol. Endocrinol. Metab., 285 (2003) E498–E503.
73 H. Uesaka, M. Yokoyama and K. Ohtsuki, Physiological correlation between nucleoside-diphosphate kinase and the enzyme-associated guanine nucleotide binding proteins, Biochem. Biophys. Res. Commun., 143 (1987) 552–559.
74 N. Kimura and N. Shimada, Direct interaction between membrane-associated nucleoside diphosphate kinase and GTP-binding protein(Gs), and its regulation by hormones and guanine nucleotides, Biochem. Biophys. Res. Commun., 151 (1988) 248–256.
75 S. Kikkawa, K. Takahashi, N. Shimada, M. Ui, N. Kimura and T. Katada, Conversion of GDP into GTP by nucleoside diphosphate kinase on the GTP-binding proteins, J. Biol. Chem., 265 (1990) 21536–21540.
76 F. Niroomand, R. Mura, K.H. Jakobs, B. Rauch and W. Kubler, Receptor-independent activation of cardiac adenylyl cyclase by GDP and membrane-associated nucleoside diphosphate kinase. A new cardiotonic mechanism?, J. Mol. Cell. Cardiol., 29 (1997) 1479–1486.
77 F. Cuello, R.A. Schulze, F. Heemeyer, H.E. Meyer, S. Lutz, K.H. Jakobs, F. Niroomand and T. Wieland, Activation of heterotrimeric G proteins by a high energy phosphate transfer via nucleoside diphosphate kinase (NDPK) B and Gbeta subunits. Complex formation of NDPK B with Gbeta gamma dimers and phosphorylation of His-266 in Gbeta, J. Biol. Chem., 278 (2003) 7220–7226.
78 H.J. Hippe, S. Lutz, F. Cuello, K. Knorr, A. Vogt, K.H. Jakobs, T. Wieland and F. Niroomand, Activation of heterotrimeric G proteins by a high energy phosphate transfer via nucleoside diphosphate kinase (NDPK) B and Gbeta subunits. Specific activation of Gsalpha by an NDPK B.Gbetagamma complex in H10 cells, J. Biol. Chem., 278 (2003) 7227–7233.
79 P.D. Wagner and N.D. Vu, Histidine to aspartate phosphotransferase activity of nm23 proteins: Phosphorylation of aldolase C on Asp-319, Biochem. J., 346 (2000) 623–630.
80 M.T. Hartsough, D.K. Morrison, M. Salerno, D. Palmieri, T. Ouatas, M. Mair, J. Patrick and P.S. Steeg, Nm23-H1 metastasis suppressor phosphorylation of kinase suppressor of Ras via a histidine protein kinase pathway, J. Biol. Chem., 277 (2002) 32389–32399.
81 M. Weller, Protein-bound histidine, as well as protein-bound serine, residues are sites of phosphorylation in the synaptic plasma membrane, Biochim. Biophys. Acta, 509 (1978) 491–498.
82 A.N. Hegde and M.R. Das, Ras proteins enhance the phosphorylation of a 38 kDa protein (p38) in rat liver plasma membrane, FEBS Lett., 217 (1987) 74–80.
83 K. Motojima and S. Goto, A protein histidine kinase induced in rat liver by peroxisome proliferators. In vitro activation by Ras protein and guanine nucleotides, FEBS Lett., 319 (1993) 75–79.
84 K. Motojima and S. Goto, Histidyl phosphorylation and dephosphorylation of P36 in rat liver extract, J. Biol. Chem., 269 (1994) 9030–9037.
85 K. Motojima, P. Passilly, B. Jannin and N. Latruffe, Histidyl phosphorylation of P36 in rat hepatoma Fao cells in vitro and in vivo, Biochem. Biophys. Res. Commun., 205 (1994) 899–904.
86 C.S. Crovello, B.C. Furie and B. Furie, Histidine phosphorylation of P-selectin upon stimulation of human platelets: A novel pathway for activation-dependent signal transduction, Cell, 82 (1995) 279–286.
87 K.M. Popov, Y. Zhao, Y. Shimomura, M.J. Kuntz and R.A. Harris, Branched-chain alpha-ketoacid dehydrogenase kinase. Molecular cloning, expression, and sequence similarity with histidine protein kinases, J. Biol. Chem., 267 (1992) 13127–13130.
88 K.M. Popov, N.Y. Kedishvili, Y. Zhao, Y. Shimomura, D.W. Crabb and R.A. Harris, Primary structure of pyruvate dehydrogenase kinase establishes a new family of eukaryotic protein kinases, J. Biol. Chem., 268 (1993) 26602–26606.
89 J.R. Davie, R.M. Wynn, M. Meng, Y.S. Huang, G. Aalund, D.T. Chuang and K.S. Lau, Expression and characterization of branched-chain alpha-ketoacid dehydrogenase kinase from the rat. Is it a histidine-protein kinase? J. Biol. Chem., 270 (1995) 19861–19867.
90 J.J. Thelen, J.A. Miernyk and D.D. Randall, Pyruvate dehydrogenase kinase from Arabidopsis thaliana: A protein histidine kinase that phosphorylates serine residues, Biochem. J., 349 (2000) 195–201.
91 A. Tuganova, M.D. Yoder and K.M. Popov, An essential role of Glu-243 and His-239 in the phosphotransfer reaction catalyzed by pyruvate dehydrogenase kinase, J. Biol. Chem., 276 (2001) 17994–17999.
92 A. Tovar-Mendez, J.A. Miernyk and D.D. Randall, Histidine mutagenesis of Arabidopsis thaliana pyruvate dehydrogenase kinase, Eur. J. Biochem., 269 (2002) 2601–2606.
93 B.P. Mooney, N.R. David, J.J. Thelen, J.A. Miernyk and D.D. Randall, Histidine modifying agents abolish pyruvate dehydrogenase kinase activity, Biochem. Biophys. Res. Commun., 267 (2000) 500–503.
94 P.G. Besant, M.V. Lasker, C.D. Bui and C.W. Turck, Inhibition of branched-chain alpha-keto acid dehydrogenase kinase and Sln1 yeast histidine kinase by the antifungal antibiotic radicicol, Mol. Pharmacol., 62 (2002) 289–296.
95 C. Wong, B. Faiola, W. Wu and P.J. Kennelly, Phosphohistidine and phospholysine phosphatase activities in the rat: Potential protein-lysine and protein-histidine phosphatases?, Biochem. J., 296 (1993) 293–296.
96 H. Ohmori, M. Kuba and A. Kumon, Two phosphatases for 6-phospholysine and 3-phosphohistidine from rat brain, J. Biol. Chem., 268 (1993) 7625–7627.
97 H. Ohmori, M. Kuba and A. Kumon, 3-Phosphohistidine/6-phospholysine phosphatase from rat brain as acid phosphatase, J. Biochem., 116 (1994) 380–385.
98 H. Hiraishi, F. Yokoi and A. Kumon, Bovine liver phosphoamidase as a protein histidine/lysine phosphatase, J. Biochem., 126 (1999) 368–374.
99 P.T. Cohen, Novel protein serine/threonine phosphatases: Variety is the spice of life, Trends Biochem. Sci., 22 (1997) 245–251.
100 Y. Kim, J. Huang, P. Cohen and H.R. Matthews, Protein phosphatases 1, 2A, and 2C are protein histidine phosphatases, J. Biol. Chem., 268 (1993) 18513–18518.
101 H.R. Matthews and C. Mackintosh, Protein histidine phosphatase activity in rat liver and spinach leaves, FEBS Lett., 364 (1995) 51–54.
102 Y. Kim and H.R. Matthews, Protein phosphatase assay suitable for acid-labile substrates, Anal. Biochem., 211 (1993) 28–33.
103 S. Klumpp, J. Hermesmeier and J. Krieglstein, Detection of protein histidine phosphatase in vertebrates, Meth. Enzymol., 366 (2003) 56–64.
104 S. Klumpp, J. Hermesmeier, D. Selke, R. Baumeister, R. Kellner and J. Krieglstein, Protein histidine phosphatase: A novel enzyme with potency for neuronal signaling, J. Cereb. Blood Flow Metab., 22 (2002) 1420–1424.
105 P. Ek, G. Pettersson, B. Ek, F. Gong, J.P. Li and O. Zetterqvist, Identification and characterization of a mammalian 14-kDa phosphohistidine phosphatase, Eur. J. Biochem., 269 (2002) 5016–5023.
106 H. Hiraishi, F. Yokoi and A. Kumon, 3-Phosphohistidine and 6-phospholysine are substrates of a 56-kDa inorganic pyrophosphatase from bovine liver, Arch. Biochem. Biophys., 349 (1998) 381–387.
107 F. Yokoi, H. Hiraishi and K. Izuhara, Molecular cloning of a cDNA for the human phospholysine phosphohistidine inorganic pyrophosphate phosphatase, J. Biochem., 133 (2003) 607–614.
108 O. Walinder, Identification of a phosphate-incorporating protein from bovine liver as nucleoside diphosphate kinase and isolation of 1-32P-phosphohistidine, 3-32P-phosphohistidine, and N-epsilon-32P-phospholysine from erythrocytic nucleoside diphosphate kinase, incubated with adenosine triphosphate-32P, J. Biol. Chem., 243 (1968) 3947–3952.
109 E. Tan, X. Lin Zu, G.C. Yeoh, P.G. Besant and P.V. Attwood, Detection of histidine kinases via a filter-based assay and reverse-phase thin-layer chromatographic phosphoamino acid analysis, Anal. Biochem., 323 (2003) 122–126.
110 D.E. Hultquist, R.W. Moyer and P.D. Boyer, The preparation and characterization of 1-phosphohistidine and 3-phosphohistidine, Biochemistry, 5 (1966) 322–331.
111 P.G. Besant, M.V. Lasker, C.D. Bui and C.W. Turck, Phosphohistidine analysis using reversed-phase thin-layer chromatography, Anal. Biochem., 282 (2000) 149–153.
112 I. Kameshita and H. Fujisawa, A sensitive method for detection of calmodulin-dependent protein kinase II activity in sodium dodecyl sulfate-polyacrylamide gel, Anal. Biochem., 183 (1989) 139–143.
113 V. Zabrouskov, M.W. Senko, Y. Du, R.D. Leduc and N.L. Kelleher, New and automated MSn approaches for top-down identification of modified proteins, J. Am. Soc. Mass Spectrom., 16 (2005) 2027–2038.
114 C.E. Thomas, N.L. Kelleher and C.A. Mizzen, Mass spectrometric characterization of human histone H3: A bird’s eye view, J. Proteome Res., 5 (2006) 240–247.
115 C.H. Borchers, R. Thapar, E.V. Petrotchenko, M.P. Torres, J.P. Speir, M. Easterling, Z. Dominski and W.F. Marzluff, Combined top-down and bottom-up proteomics identifies a phosphorylation site in stem-loop-binding proteins that contributes to high-affinity RNA binding, Proc. Natl. Acad. Sci. USA, 103 (2006) 3094–3099.
116 R.S. Annan, M.J. Huddleston, R. Verma, R.J. Deshaies and S.A. Carr, A multidimensional electrospray MS-based approach to phosphopeptide mapping, Anal. Chem., 73 (2001) 393–404.
117 M. Wind, A. Wegener, R. Kellner and W.D. Lehmann, Analysis of CheA histidine phosphorylation and its influence on protein stability by high-resolution element and electrospray mass spectrometry, Anal. Chem., 77 (2005) 1957–1962.
118 S. Napper, J. Kindrachuk, D.J. Olson, S.J. Ambrose, C. Dereniwsky and A.R. Ross, Selective extraction and characterization of a histidine-phosphorylated peptide using immobilized copper(II) ion affinity chromatography and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry, Anal. Chem., 75 (2003) 1741–1747.
119 M. Lasker, C.D. Bui, P.G. Besant, K. Sugawara, P. Thai, G. Medzihradszky and C.W. Turck, Protein histidine phosphorylation: Increased stability of thiophosphohistidine, Protein Sci., 8 (1999) 2177–2185.
120 K.F. Medzihradszky, N.J. Phillipps, L. Senderowicz, P. Wang and C.W. Turck, Synthesis and characterization of histidine-phosphorylated peptides, Protein Sci., 6 (1997) 1405–1411.
121 M.C. Pirrung, K.D. James and V.S. Rana, Thiophosphorylation of histidine, J. Org. Chem., 65 (2000) 8448–8453.
122 P.D. Von Haller, E. Yi, S. Donohoe, K. Vaughn, A. Keller, A.I. Nesvizhskii, J. Eng, X.J. Li, D.R. Goodlett, R. Aebersold and J.D. Watts, The application of new software tools to quantitative protein profiling via isotope-coded affinity tag (ICAT) and tandem mass spectrometry: II. Evaluation of tandem mass spectrometry methodologies for large-scale protein analysis, and the application of statistical tools for data analysis and interpretation, Mol. Cell Proteomics, 2 (2003) 428–442.
123 A. Wyttenbach and A.M. Tolkovsky, Differential phosphoprotein labelling (DIPPL), a method for comparing live cell phosphoproteomes using simultaneous analysis of 33P and 32P labelled proteins, Mol. Cell Proteomics, 5 (2005) 553–559.
124 C. Schenkels, B. Erni and J.L. Reymond, Phosphofurylalanine, a stable analog of phosphohistidine, Bioorg. Med. Chem. Lett., 9 (1999) 1443–1446.
125 M.V. Lasker, P. Thai, P.G. Besant, C.D. Bui, S. Naidu and C.W. Turck, Branched-chain α-ketoacid dehydrogenase kinase: A mammalian enzyme with histidine kinase activity, J. Biomol. Tech., 13 (2002) 1–9.
126 R. Dutta and M. Inouye, GHKL, an emergent ATPase/kinase superfamily, Trends Biochem. Sci., 25 (2000) 24–28.
CHAPTER 15
O-GlcNAc Proteomics: Mass Spectrometric Analysis of O-GlcNAc Modifications on Proteins
Robert J. Chalkley, Lance Wells and Keith Vosseller
Contents
1. Introduction
1.1 O-GlcNAcylation, a widespread intracellular post-translational modification responsive to signaling and changing cell states
1.2 Responsiveness of O-GlcNAc to signaling and changing cell states
1.3 O-GlcNAcylation interplay with phosphorylation
1.4 Elevated O-GlcNAcylation is linked with type II diabetic states
1.5 O-GlcNAcylation link to Alzheimer’s Disease (AD)
1.6 Site-specific regulatory functions of O-GlcNAc: The need for mass spectrometry based site-mapping strategies
2. Challenges to Mapping Sites of O-GlcNAc Modification
3. Early Efforts in O-GlcNAc Site-Mapping
4. Enzymatic Tagging of O-GlcNAc to Facilitate Enrichment and Identification of Modification Sites
5. Chemoenzymatic Approaches in O-GlcNAc Proteomics
6. Beta-Elimination/Michael Addition Strategies for O-GlcNAcylation Site-Mapping
7. Direct Enrichment of Native O-GlcNAc Modified Proteins with WGA Lectin Weak Affinity Chromatography (LWAC)
8. Ion Trap MS2/MS3 for O-GlcNAc Modified Peptide Identification
9. Electron Capture Dissociation (ECD) for O-GlcNAc Site-Mapping
10. Interpretation of O-GlcNAcylated Peptide Mass Spectrometry
11. Conclusions
References
Comprehensive Analytical Chemistry, Volume 52 ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00215-8
© 2009 Elsevier B.V. All rights reserved.
1. INTRODUCTION
In 1986, Gerald Hart and co-workers discovered a carbohydrate post-translational modification (PTM) that went against the dogma of the time that glycosylation occurred exclusively on cell surface, secreted, or luminal proteins [1]. The modification was a single O-linked N-acetylglucosamine (O-GlcNAc) residue, shown to be in beta linkage to serines and threonines of cytosolic and nuclear proteins. Since that time, O-GlcNAc has been shown to be a widespread modification found on diverse functional classes of proteins throughout higher eukaryotic cells and to have fundamental roles in a variety of cellular processes [2,3]. However, relative to its emerging regulatory importance in many biological systems and cell types, the modification is still little known and understudied. Additionally, relatively little is known about regulatory functions of specific O-GlcNAc modification sites, in part due to historical difficulties in mapping sites of modification.
In this chapter, we will briefly review evidence for specific functions of O-GlcNAcylation to establish its biological significance. We will then discuss why this modification has been historically difficult to detect, and make the case that the mass spectrometry-based O-GlcNAc site-mapping strategies now emerging will be a critical basis for expanding research on site-specific regulatory functions of this modification. In that context, we will review and compare recent efforts to develop methods that allow effective mass spectrometry-based site-specific O-GlcNAc proteomics.
In general, the field of proteomics is recognizing the importance of analyses beyond protein expression, as it is becoming clear that complex patterns of PTMs (even on single proteins) functionally diversify proteomes. Methods to specifically target these various modifications should be incorporated into analyses, in order to more fully and accurately characterize complex and dynamic proteomes.
1.1 O-GlcNAcylation, a widespread intracellular post-translational modification responsive to signaling and changing cell states
O-GlcNAcylation is a dynamic enzyme-mediated cytosolic and nuclear carbohydrate modification of serines and threonines by N-acetylglucosamine that is in many ways analogous to phosphorylation [4,5]. O-GlcNAcylation appears to be present in all cell types of metazoans, but has not been shown to occur in lower organisms such as yeast. Relative levels of O-GlcNAc modification appear to be differentially controlled in certain tissues by increased expression of O-GlcNAc transferase (OGT), the enzyme that catalyzes addition of O-GlcNAc. For example, OGT is most highly expressed in brain and pancreas [6,7], leading to relatively abundant levels of O-GlcNAc modification, perhaps suggesting critical evolutionarily conserved roles for O-GlcNAcylation in these tissues. Over one hundred specific proteins belonging to diverse functional classes have been shown to be O-GlcNAc modified [8]. However, based on O-GlcNAc western blotting of various cellular fractions on 2D gels, there are many O-GlcNAc modified proteins yet to be identified [3]. One of the key features of O-GlcNAcylation that sets it apart from complex glycosylation
is that it rapidly turns over on proteins and is responsive to specific cellular signals and changing cell states. Glycosyltransferases and glycosidases that act to produce complex glycans are sequestered in secretory compartments and are thus separated physically from potential extracellular substrates. Hence, they lack the ability to respond rapidly to signaling to effect changes in cell surface complex glycosylation. However, enzymes which add (OGT) and remove O-GlcNAc (O-GlcNAcase) are localized in the same cellular compartments as their substrates and are thus positioned to regulate O-GlcNAc modification levels rapidly in response to signaling. The dynamic nature of O-GlcNAcylation indicates it may play roles analogous to phosphorylation in signaling, acting as a rapid switching mechanism to control processes such as protein–protein interactions and subcellular localization. For example, O-GlcNAcylation has been implicated in the regulation of the Sp1 transcription factor association with the transcriptional co-activator TAF110 [9], and plays a role in Sp1 nuclear localization [10].
1.2 Responsiveness of O-GlcNAc to signaling and changing cell states
The pharmacological agents calcium ionophore and okadaic acid, which bypass agonist-induced signals to effect calcium- and phosphorylation-based signaling, cause changes in O-GlcNAc levels within 1 min, demonstrating the potential for rapid signaling-based modulation of O-GlcNAc modification [11]. In an example of more physiologically relevant rapid changes in O-GlcNAcylation levels, it was shown that treatment of neutrophils with the agonist fMLP, acting through a heterotrimeric G protein-coupled receptor (Gi subclass), caused increases in O-GlcNAcylation within 1 min [12]. This rapid agonist-induced increase in O-GlcNAc modification was functionally linked to neutrophil chemotactic responses to fMLP associated with increases in MAPK activation [13]. In response to a variety of cellular stresses, a complex protective adaptive program is initiated in eukaryotic cells, which is associated with rapidly elevated O-GlcNAc modification levels. For example, within 15 min after heat shock of cells, levels of O-GlcNAcylation significantly increase, and this increase is functionally linked to survival in response to thermal stress [14]. O-GlcNAc modification dynamics have also been shown to be critical to proper progression through specific phases of the cell cycle [15].
1.3 O-GlcNAcylation interplay with phosphorylation
O-GlcNAcylation and O-phosphorylation may sometimes be functionally reciprocal, as the two modifications can compete for the same residue in some proteins (Figure 1) [16]. Both O-GlcNAcylation and O-phosphorylation have been mapped to the same residue in several proteins, including c-Myc [17] and the estrogen receptor beta [18]. In addition to direct competition for the same residue, O-GlcNAcylation has been shown to influence phosphorylation of nearby residues, as in the case of RNA pol II [19,20]. A reciprocal relationship between phosphorylation and O-GlcNAcylation has also been demonstrated in neurons [21,22]. Thus, altered levels of O-GlcNAc modification may modulate
[Figure 1 scheme: a serine or threonine residue, X-Ser(Thr)-X, cycles between the unmodified, O-phosphorylated (kinase with ATP; phosphatase) and O-GlcNAc modified (OGT with UDP-GlcNAc; O-GlcNAcase) states, with the two modifications in competition.]
Figure 1 O-GlcNAc is a dynamic, cytosolic, enzyme-catalyzed PTM of serines and threonines which can compete with phosphorylation. (See Colour Plate Section at the end of this book.)
phosphorylation-based signaling. Conversely, interplay between these two modifications may include instances in which site-specific O-GlcNAcylation facilitates phosphorylation (e.g. O-GlcNAcylation-dependent activation of a kinase). Thus, regulation of O-GlcNAc modification levels likely modulates many phosphorylation-based signaling events.
1.4 Elevated O-GlcNAcylation is linked with type II diabetic states
The dynamic responsiveness of O-GlcNAcylation appears to act as a sensor of nutritional state. In type II diabetic hyperglycemic states, levels of O-GlcNAc modification are elevated due to increased flux of glucose through the hexosamine biosynthetic pathway (HBP). The end-product of the HBP is UDP-GlcNAc, the donor sugar nucleotide used by the transferase (OGT) which adds O-GlcNAc to proteins. OGT activity is particularly responsive to the range of UDP-GlcNAc concentrations observed in cells. Thus, increased flux of glucose through the HBP in type II diabetic states leads to increased O-GlcNAc modification of proteins [5,23–28]. A main pathology in type II diabetes is the inability of tissues like skeletal muscle and adipose to respond appropriately to insulin stimulation (referred to as insulin resistance). A causal link between O-GlcNAcylation and insulin resistance has been demonstrated, as specific elevation of O-GlcNAcylation causes insulin resistance associated with insulin signaling defects involving the IRS1/2 to Akt phosphorylation cascade [29–33]. Several additional specific links between O-GlcNAc modification and diabetic dysfunction have been reported. For example, elevated O-GlcNAc modification of eNOS in diabetic states was shown to inhibit Akt-mediated phosphorylation and activation of this enzyme [34], an effect thought to be linked to vascular defects in type II diabetes. O-GlcNAcylation has
also been linked to the pathology of diabetic cardiomyopathy, as high glucose-induced impairment of cardiomyocyte function, including altered calcium transients and transcriptional effects, was specifically linked to elevated O-GlcNAc modification [35]. The ability of pancreatic beta cells to sense glucose and secrete appropriate amounts of insulin is critical to glucose homeostasis but is compromised in type II diabetes. In beta cells, O-GlcNAcylation levels are strongly elevated in type II diabetes, and pharmacological elevation of O-GlcNAc modification by Streptozotocin (STZ) or PUGNAc blocks glucose-stimulated insulin secretion from beta cell islets [36,37], indicating an important functional link between O-GlcNAcylation and the difference between normal and type II diabetic insulin secretion. Importantly, while strong links between elevated O-GlcNAc modification and diabetic dysfunction are established, nothing is known about the site-specific mechanistic roles of O-GlcNAcylation in these defects. It will be important to identify specific O-GlcNAc modification sites in order to experimentally pursue such information.
1.5 O-GlcNAcylation link to Alzheimer's Disease (AD)
O-GlcNAc modification levels are reduced in post-mortem human AD brain [38] and on specific synaptic proteins [39,40]. Glucose metabolism/availability is reduced in AD [41], and thus reduced flux of glucose through the HBP may underlie reduced O-GlcNAcylation levels in AD. Hyperphosphorylated Tau is the major component of pathological neurofibrillary tangles in AD. Tau can also be modified by O-GlcNAc, but this modification is reduced in human AD brains, and elevation of O-GlcNAc modification is functionally reciprocal with specific hyperphosphorylated sites on Tau in vivo [22,38]. Thus, a specific functional connection between abnormal O-GlcNAcylation and AD pathology has been established. However, no specific sites of O-GlcNAc modification have been mapped on Tau or other proteins in the context of AD, highlighting the importance of site-specific O-GlcNAcylation proteomics in revealing potential disease-related mechanisms.
1.6 Site-specific regulatory functions of O-GlcNAc: The need for mass spectrometry-based site-mapping strategies
As described above, O-GlcNAcylation is now implicated in a variety of cellular processes and protein regulatory events. However, knowledge of site-specific O-GlcNAc regulatory mechanisms is very limited, and mapping sites of O-GlcNAc attachment is critical to gaining it. As single enzymes are believed to catalyze either addition or removal of O-GlcNAc, genetic or pharmacological targeting of their activities results in global modulations of O-GlcNAc levels, making it difficult to implicate specific modification events in regulatory processes. Experimentally, it will be important to modulate O-GlcNAc modification at specific sites through techniques such as site-directed mutagenesis, to establish site-specific functional roles, and this will require knowledge of specific O-GlcNAc
modified residues. Additionally, knowledge of sites modified within functional domains of proteins will rationally guide hypotheses concerning O-GlcNAcylation's role in protein behavior. There appears to be a loose consensus motif for O-GlcNAc modification on a small subset of the sites thus far identified that involves proline and valine in proximity to multiple hydroxyl-containing amino acids (serine and/or threonine) (the so-called "PVST" motif). However, the majority of O-GlcNAcylation sites are not associated with a recognizable primary amino acid sequence motif [42]. Thus, while the PVST motif may aid in predicting some O-GlcNAc modification sites, unbiased pursuit of novel O-GlcNAc modification events is still important.
2. CHALLENGES TO MAPPING SITES OF O-GlcNAc MODIFICATION
In part, the lack of knowledge about site-specific O-GlcNAc function stems from challenges in detecting and then mapping specific sites of O-GlcNAc attachment. There are several challenges that make site-mapping of O-GlcNAcylation more difficult than that of other PTMs such as phosphorylation. First, while phosphorylation studies have benefited from high sensitivity radiolabeling with 32P, the only radiolabel able to be incorporated in/tagged to O-GlcNAc is tritium, which has six orders of magnitude less radioactivity than 32P, resulting in very low sensitivity of detection. Second, PTMs in general are of low abundance relative to unmodified species of proteins. It has become apparent in the field of proteomics that effective targeting of PTMs greatly benefits from affinity steps to enrich for modified peptides/proteins of interest. In the case of phosphorylation, immobilized metal ion affinity chromatography (IMAC) has greatly facilitated large scale site-mapping of phosphorylation from complex protein mixtures [43–45]. However, until quite recently, an analogous enrichment procedure for in vivo O-GlcNAc modified peptides was not available [42]. Finally, relative to other PTMs such as phosphorylation, O-GlcNAc is quite labile during standard collision-induced dissociation (CID) fragmentation [46,47], the predominant fragmentation method used in mass spectrometers. CID is an excitatory fragmentation process that preferentially breaks the weakest bonds in a molecule. However, in order to be able to determine a site of modification it is necessary to observe fragment ions that retain the sugar moiety. The energy required to break the O-glycosidic linkage between sugar and peptide is much less than that required to fragment the peptide backbone. 
Thus, in CID MS/MS spectra of O-GlcNAc modified peptides, the primary fragmentation results in loss of N-acetylglucosamine from the peptide (appearing as the oxonium ion at m/z 204.08), and the appearance of the precursor having lost the mass of O-GlcNAc (mass of precursor minus 203.08 Da). Consequently, spectra with little information content are generated, leading to difficulties in identifying these peptides and assigning O-GlcNAc modification sites. Examples of the lability of O-GlcNAc in CID seen in MS/MS acquired on either a quadrupole time-of-flight (Q-TOF) geometry instrument (QSTAR) or in an ion trap instrument (LTQ) are shown in Figure 2A and B. Doubly charged precursor O-GlcNAc modified peptides have slightly differing characteristics in these two
[Figure 2 spectra, plotted as relative intensity vs. m/z. (A) CID in Q-TOF mass spectrometer: MS/MS of [M+2H]2+ 644.9, QLLPSTATVR (O-GlcNAc modified, site not determined); labeled peaks include the GlcNAc oxonium ion at m/z 204.08 and the precursor having lost O-GlcNAc at [M+2H]2+ 543.3 and [M+H]1+ 1085.7. (B) CID in ion trap mass spectrometer: MS/MS of [M+3H]3+ 579.9, HDTSASTQSTPASSR (O-GlcNAc modified, site not determined); labeled peaks include the O-GlcNAc oxonium ion at m/z 204.1 and the precursor having lost O-GlcNAc at [M+2H]2+ 766.9.]
Figure 2 (A) CID MS/MS of an O-GlcNAc modified peptide in Q-TOF mass spectrometry (QSTAR instrument), showing the abundance of the 204.08 GlcNAc oxonium ion lost from the peptide, as well as intact precursor ions having lost O-GlcNAc. (B) CID MS/MS of an O-GlcNAc modified peptide in ion trap mass spectrometry (LTQ instrument), showing the 204.08 GlcNAc oxonium ion (at relatively low intensity compared to Q-TOF spectra of O-GlcNAc peptides) and intact precursor ions having lost O-GlcNAc (at much higher relative intensity than in Q-TOF spectra of O-GlcNAc peptides). Due to the lability of O-GlcNAc in each case, there is not enough information to determine the exact site of modification in these peptides. (See Colour Plate Section at the end of this book.)
types of mass spectrometers. In QSTAR spectra, the full mass range is accessible and the m/z 204.08 oxonium ion of GlcNAc itself is almost always the dominant ion in the spectra. The abundance of this ion illustrates the lability of the modification, and highlights the fact that little or no O-GlcNAc will be retained on fragment ions. Additionally, as the majority of the energy of CID is lost in release of GlcNAc, minimal fragmentation of the peptide backbone is generally observed. This is highlighted by the presence of intact precursor having lost the mass of O-GlcNAc (Figure 2A), indicating incomplete fragmentation of the precursor peptide. In some cases, as in this example, sufficient fragment ion information exists to sequence the peptide. However, information on the exact site of O-GlcNAc modification is generally lost (as the sugar is not present on the fragment ions). By tailoring the fragmentation conditions to use threshold levels of energy for fragmentation it is possible to improve the chances of
observing modified fragment ions [48]. Using this method, novel O-GlcNAc modification sites have been identified [49]. However, this approach will not always work, is somewhat dependent on the peptide sequence and is not practical for global modification analysis. In ion trap instruments, the energy introduced to instigate CID fragmentation leaves ions that have an m/z less than a third of the m/z of the precursor ion with too much energy to be trapped. This often precludes observation of the m/z 204.08 O-GlcNAc oxonium ion, and even when it can be observed, its intensity in ion trap MS/MS spectra is much lower than that seen in instruments that use a quadrupole for CID fragmentation (Figure 2B). In ion trap MS/MS spectra, the precursor having lost O-GlcNAc is the strongly dominant ion, as seen in Figure 2B. For example, for a doubly charged precursor the most intense ion observed in MS/MS invariably lies 101.5 m/z units below the precursor (the neutral loss of the O-GlcNAc mass, 203.08 Da, divided by the charge of 2). Again, for ion trap MS/MS of O-GlcNAc peptides, fragment ion series are usually very poor and information on the original site of O-GlcNAc attachment is rarely retained. Indeed, because in ion trap CID a molecule is no longer resonantly excited once it has fragmented, fragments that are the product of two bond cleavages are rarely formed. In a quadrupole, by contrast, multiple collisions can take place, so there is a greater chance that the deglycosylated precursor ion will undergo further fragmentation, which facilitates peptide identification (although not site identification). For the reasons outlined earlier, it is perhaps not unexpected that O-GlcNAc modifications have gone unnoticed in most mass spectrometry analyses. 
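The arithmetic behind these observations can be sketched in a few lines of Python. This is a minimal illustration only, assuming the nominal values quoted in the text (203.08 Da for the O-GlcNAc loss, m/z 204.08 for the oxonium ion, and a trapping cutoff at one third of the precursor m/z); the function names are hypothetical and the exact cutoff of a given instrument depends on its settings.

```python
PROTON = 1.007276       # mass of a proton in Da
GLCNAC_LOSS = 203.08    # mass lost with O-GlcNAc, as quoted in the text (Da)
OXONIUM_MZ = 204.08     # GlcNAc oxonium ion, as quoted in the text


def neutral_mass(mz, z):
    """Neutral mass of an [M+zH]z+ ion observed at the given m/z."""
    return mz * z - z * PROTON


def mz_from_mass(mass, z):
    """m/z of the [M+zH]z+ ion for a given neutral mass."""
    return (mass + z * PROTON) / z


def deglycosylated_precursor_mz(precursor_mz, z, product_z=None):
    """m/z of the precursor after losing one O-GlcNAc.

    product_z defaults to the precursor charge (pure neutral loss);
    pass z - 1 for the charge-reduced species formed when the sugar
    leaves as the oxonium ion.
    """
    if product_z is None:
        product_z = z
    return mz_from_mass(neutral_mass(precursor_mz, z) - GLCNAC_LOSS, product_z)


def ion_trap_low_mass_cutoff(precursor_mz, fraction=1 / 3):
    """Approximate lowest m/z retained in ion trap CID (assumed 1/3 rule)."""
    return precursor_mz * fraction


def oxonium_trappable(precursor_mz):
    """True if the oxonium ion lies at or above the assumed low-mass cutoff."""
    return OXONIUM_MZ >= ion_trap_low_mass_cutoff(precursor_mz)


if __name__ == "__main__":
    # Doubly charged precursor of Figure 2A at m/z 644.9
    print(round(deglycosylated_precursor_mz(644.9, 2), 2))     # 2+ after loss
    print(round(deglycosylated_precursor_mz(644.9, 2, 1), 2))  # 1+ after loss
    # Triply charged precursor of Figure 2B at m/z 579.9
    print(oxonium_trappable(579.9))
```

For the doubly charged precursor of Figure 2A the sketch gives values of about 543.4 and 1085.7, in line with the labels in that figure. Under the assumed one-third rule the oxonium ion would fall below the cutoff for an m/z 644.9 precursor in an ion trap, but just above it for the m/z 579.9 precursor of Figure 2B.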
Not only does the lability make the identification of these peptides difficult with standard automated CID, but automated search algorithms have to be appropriately modified to search for O-GlcNAc modifications in MS/MS spectra allowing for the neutral loss of the modification (see Section 10 on interpretation of O-GlcNAc mass spectrometry data). It should be noted that the ion at m/z 204.08 may be confused with a y2 ion corresponding to the dipeptide GK (m/z 204.13), which is a relatively common product in tryptic digestions. However, TOF and other higher mass accuracy analyzers should be able to resolve this mass difference. Also, the GlcNAc oxonium ion undergoes multiple water losses to produce further fragments at m/z 186.07 and 168.06. Additionally, the genuine O-GlcNAc oxonium ion is characteristically the most intense in the spectra, and this abundance is diagnostic of that ion being GlcNAc rather than GK. The following sections will now deal with a variety of approaches in the context of mass spectrometry and proteomics designed to overcome the challenges of O-GlcNAc site-mapping described thus far.
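How much mass accuracy is needed to separate the two candidate assignments near m/z 204 can be illustrated with a short Python sketch. The four-decimal monoisotopic values used below (204.0867 for the GlcNAc oxonium ion, 204.1343 for the GK y2 ion, 18.0106 for water) are standard calculated values rather than numbers given in this chapter, and the function names and tolerances are hypothetical choices for illustration.

```python
OXONIUM_MZ = 204.0867   # GlcNAc oxonium ion, [C8H14NO5]+ (calculated value)
GK_Y2_MZ = 204.1343     # y2 ion of a peptide ending in Gly-Lys (calculated value)
WATER = 18.0106         # monoisotopic mass of H2O


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def assign_204(observed_mz, tol_ppm):
    """Return every candidate assignment within the given ppm tolerance."""
    candidates = {"GlcNAc oxonium": OXONIUM_MZ, "GK y2": GK_Y2_MZ}
    return sorted(
        name
        for name, mz in candidates.items()
        if abs(ppm_error(observed_mz, mz)) <= tol_ppm
    )


def water_loss_ions(n_losses=2):
    """m/z of the oxonium ion after successive water losses."""
    return [OXONIUM_MZ - i * WATER for i in range(1, n_losses + 1)]


if __name__ == "__main__":
    # Separation between the two candidates, in ppm
    print(round(ppm_error(GK_Y2_MZ, OXONIUM_MZ)))
    # A tight tolerance gives a unique answer, a loose one does not
    print(assign_204(204.087, tol_ppm=20))
    print(assign_204(204.10, tol_ppm=500))
    # Companion ions that support the oxonium assignment
    print([round(mz, 2) for mz in water_loss_ions()])
```

The two candidates are a little over 230 ppm apart, so a tolerance of a few tens of ppm separates them cleanly, whereas a tolerance of several hundred ppm leaves the assignment ambiguous. In that case the water-loss ions and the relative intensity of the peak, as described above, provide supporting evidence.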
3. EARLY EFFORTS IN O-GlcNAc SITE-MAPPING
Initial studies of O-GlcNAc site-mapping involved radiolabeling of O-GlcNAc by enzymatic tagging of the modification with [3H] galactose through the action of a
galactosyltransferase. Following proteolysis of the labeled protein, repeated rounds of high-performance liquid chromatography (HPLC)-based purification of [3H] labeled peptides were then followed by manual Edman degradation protein sequencing to track release of the [3H] label associated with a specific amino acid residue in a given peptide [50]. Galactosyltransferase will label any terminal GlcNAc on a protein and thus will also tag complex N-linked carbohydrates with terminal GlcNAc. Thus, care must be taken to first enzymatically remove N-linked carbohydrates from a sample with PNGase F in order to achieve specificity of the technique for O-GlcNAcylation. This approach is now largely of historical interest. Its sensitivity is poor, so successful site-mapping requires large amounts of starting protein, often obtained by overexpression in heterologous systems, and it is not likely to be useful in identifying in vivo endogenous sites of O-GlcNAc modification. Readers interested in more details on this technique are referred to Whelan and Hart [51].
4. ENZYMATIC TAGGING OF O-GlcNAc TO FACILITATE ENRICHMENT AND IDENTIFICATION OF MODIFICATION SITES
The lectin wheat germ agglutinin (WGA) has specificity for binding to N-acetylglucosamine (GlcNAc) and sialic acid [52]. However, early attempts to exploit this specificity to enrich O-GlcNAc modified peptides directly using WGA were not successful. WGA acts as a dimer containing four carbohydrate binding sites [53] and achieves high affinity interactions with complex carbohydrates through binding multiple sugar moieties simultaneously [54]. The single N-acetylglucosamine moiety of O-GlcNAc has a much lower affinity for WGA. The dissociation constant for free GlcNAc binding to WGA has been measured in the 10 mM range [55], suggesting that an O-GlcNAc interaction with WGA would be quite weak. For this reason, it is not unexpected that traditional protocols for WGA capture of glycosylated peptides/proteins involving binding, washing, and specific elution steps (while effective for complex carbohydrates) are not effective in isolating O-GlcNAc modified peptides. However, an alternative lectin-based approach has been used to isolate O-GlcNAc peptides for subsequent mass spectrometry analysis. Galactosyltransferase is first used to covalently add a galactose moiety to O-GlcNAc. The lectin Ricinus communis agglutinin (RCA), which binds specifically to galactose, is then used to affinity isolate the tagged peptides. This approach has been used to enrich O-GlcNAc modified peptides from complex mixtures of unmodified peptides [56,57]. Again, galactosyltransferase labeling is not necessarily specific for O-GlcNAc, and thus N-linked carbohydrates with terminal GlcNAc will also be targeted by this approach if not removed by prior enzymatic digestion with PNGase F. CID of galactose tagged O-GlcNAc peptides gives rise to carbohydrate-specific ions which are diagnostic of O-GlcNAc. 
The appearance of m/z 366.14 in MS/MS corresponds to the GlcNAc-Gal disaccharide, whereas 204.08 corresponds to the liberation of GlcNAc from that disaccharide. The relatively strong presence of these ions is due to the lability of the O-GlcNAc linkage in CID. The production of
m/z 366 was exploited to target precursor ions of interest utilizing a triple quadrupole mass spectrometer capable of scanning for precursors that produce m/z 366 under low energy CID (so-called precursor ion scanning), which then triggered high energy CID of that same precursor for peptide sequencing [57]. RCA chromatography of galactose labeled O-GlcNAc modified peptides, while validated on a known synthetic O-GlcNAc modified peptide, has not been demonstrated for discovery and identification of novel O-GlcNAc modification sites. Potential difficulties may include interference by components in complex lysates of cells and tissue that are not compatible with RCA chromatography. More direct lectin-based enrichment of native O-GlcNAc modified peptides leading to proteomic scale identification of novel O-GlcNAc sites has now been demonstrated using WGA (see Section 7, "Direct enrichment of native O-GlcNAc modified proteins with WGA lectin weak affinity chromatography (LWAC)") [58].
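The two diagnostic ions used in this strategy follow directly from the sugar residue masses, as the following Python sketch shows. The monoisotopic residue masses (203.0794 for HexNAc, 162.0528 for hexose) are standard calculated values rather than numbers taken from this chapter, and the function names and the 0.02 m/z tolerance are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da

# Monoisotopic masses of anhydro sugar residues (calculated values)
RESIDUE_MASS = {
    "HexNAc": 203.0794,  # N-acetylhexosamine, e.g. GlcNAc
    "Hex": 162.0528,     # hexose, e.g. galactose
}


def oxonium_mz(residues):
    """m/z of the singly charged oxonium ion built from the listed residues."""
    return sum(RESIDUE_MASS[r] for r in residues) + PROTON


def find_diagnostic_ions(peak_mzs, tol=0.02):
    """Report which diagnostic ions are present in a list of fragment m/z values."""
    targets = {
        "GlcNAc": oxonium_mz(["HexNAc"]),
        "GlcNAc-Gal": oxonium_mz(["HexNAc", "Hex"]),
    }
    return {
        name: any(abs(mz - target) <= tol for mz in peak_mzs)
        for name, target in targets.items()
    }


if __name__ == "__main__":
    print(round(oxonium_mz(["HexNAc"]), 2))          # GlcNAc oxonium ion
    print(round(oxonium_mz(["HexNAc", "Hex"]), 2))   # GlcNAc-Gal oxonium ion
    print(find_diagnostic_ions([204.087, 366.139, 500.25]))
```

The calculated values of about 204.09 and 366.14 agree with the ions described above. A check of this kind could serve as a simple software filter for spectra worth closer inspection, in the same spirit as the instrument-level triggering described in the text.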
5. CHEMOENZYMATIC APPROACHES IN O-GlcNAc PROTEOMICS
A potential drawback of lectin-based methods for O-GlcNAcylation enrichment is the relatively low affinity of lectins for O-GlcNAc. Like most regulatory PTMs, O-GlcNAcylation is generally of low stoichiometry. Thus, in the case of weak affinity lectin enrichment, much more abundant unmodified peptides may overlap in elution with O-GlcNAcylated peptides. Enrichment of O-GlcNAc modified peptides in theory may be performed successfully by immunopurification with antibodies against the modification (e.g. CTD110.6 monoclonal or RL2). However, such an approach has not been demonstrated. It is likely, given the small size of the O-GlcNAc epitope, that these antibodies, while specific for O-GlcNAcylation and useful in western blotting, have relatively low affinity for modified peptides and thus are expected to be only weakly effective in immunoprecipitation. However, it remains to be seen whether strategies for enrichment of O-GlcNAc modified peptides can be adapted to include the use of anti-O-GlcNAcylation antibodies.
In an effort to generate a high affinity tool for purification of O-GlcNAc modified peptides, a chemoenzymatic approach has been taken which introduces a chemical biotin tag or "handle" on O-GlcNAc which allows for subsequent purification with high affinity streptavidin chromatography [59,60]. In this scheme, an analogue of galactose containing a ketone functionality at the C-2 position of the galactose ring is first enzymatically attached to O-GlcNAc. To achieve this, a mutated version of galactosyltransferase (Y289L) is utilized which has increased activity in using the UDP-galactose ketone analogue for modification of O-GlcNAc, without any compromise of specificity. The incorporated ketone functionality is then reacted with aminooxy biotin. The biotinylated O-GlcNAcylated protein or peptide can then be isolated with streptavidin chromatography prior to mass spectrometry analysis. 
In addition to mass spectrometry, this approach also allows for high sensitivity detection of biotin labeled O-GlcNAc modified proteins on blotting membranes using streptavidin-HRP as a detection reagent. This approach was applied to tryptic digests of rat forebrain extracts [61], and the
enriched fractions displayed strong enrichment of O-GlcNAc modified peptides as assessed by the presence of relatively intense diagnostic ions corresponding to precursor ions having lost the mass of the GlcNAc-ketone-biotin moiety (due to lability of the O-GlcNAc linkage to the peptide). Thirty-four different peptides from MS/MS spectra containing such diagnostic ions were identified as O-GlcNAc modified. Due to the lability of the GlcNAc-ketone-biotin, specific modification site assignment was only possible in a few cases. The peptides identified arise from proteins of diverse functions in transcription, cytoskeletal dynamics and synaptic transmission. This approach can be used on any protein sample from cells or tissue as the tagging is performed in vitro subsequent to sample preparation.
An alternative chemoenzymatic method for introducing a high affinity tag on O-GlcNAc for subsequent purification utilizes a metabolic labeling approach [62]. A metabolic salvage pathway exists which converts GlcNAc to GlcNAc-1-phosphate en route to the production of UDP-GlcNAc, which is the donor sugar nucleotide used by OGT in the addition of O-GlcNAc to proteins. The promiscuity of enzymes in this pathway was exploited to introduce an unnatural azido derivative of GlcNAc (GlcNAz) into this salvage pathway, leading to production of UDP-GlcNAz. OGT was shown to be able to utilize UDP-GlcNAz in cells, leading to the incorporation of the derivative at O-GlcNAcylation sites on proteins. The azido functional group can then be covalently coupled with phosphine in a "Staudinger" ligation reaction [62]. This chemistry has been used with biotinylated phosphine as a means to subsequently enrich GlcNAz modified proteins with streptavidin chromatography [63]. The GlcNAz is peracetylated to facilitate entry into cells. 
This approach was used to capture 51 and 199 putative O-GlcNAc modified proteins from cytosolic cell culture lysates in two independent studies [63,64], but no specific evidence of O-GlcNAc modification (peptides containing the modification) or information on sites of modification was observed in the mass spectrometry analyses. As with any enrichment strategy, it is important to rigorously discriminate between background and specific binding. With this metabolic chemoenzymatic labeling technique, it is possible that introducing an unnatural UDP-GlcNAc analogue has unwanted biological effects. For example, the normal dynamic cycling of O-GlcNAc in cells may be somewhat affected, as O-GlcNAcase may not be able to utilize O-GlcNAz as a substrate as efficiently. Additionally, this approach is useful only in cell culture systems where the GlcNAz analogue can be loaded into cells, and cannot be used for in vivo animal studies.
6. BETA-ELIMINATION/MICHAEL ADDITION STRATEGIES FOR O-GlcNAcYLATION SITE-MAPPING
Base-catalyzed beta-elimination has been used for more than 40 years to release O-glycans from polypeptides for analysis [65]. In the past decade, several researchers have taken advantage of beta-elimination strategies to map sites of O-linked PTM, including phosphorylation and O-glycosylation. Upon beta-elimination, the modified serine or threonine forms dehydroalanine or methyldehydroalanine, respectively. The resulting mass difference of 18 Da relative to the unmodified residue is readily detected by
mass spectrometry, and Greis and co-workers applied this methodology to the mapping of O-GlcNAc sites in 1996 [46]. Since beta-elimination results in an alpha, beta unsaturated carbonyl, a nucleophile readily attacks and forms a covalent bond to the beta carbon through classical conjugate (Michael) addition. Several laboratories have taken advantage of this chemistry for mapping presumed phosphorylation sites on serines and threonines [66–69]. Hart's laboratory took advantage of this approach and developed a strategy for mapping O-GlcNAc sites termed BEMAD, for mild beta-elimination followed by Michael addition with the nucleophile dithiothreitol (DTT) [70]. BEMAD offers several advantages for site-mapping: the weak O-linkage is replaced with a linkage that is stable under CID; the tag can be used to enrich the modified protein/peptide (using thiol chromatography for DTT, or streptavidin affinity if a biotin pentylamine is used as the nucleophile); and an isotope can be introduced for relative quantification of differential site occupancy between two samples [70,71] (Figure 3). Using this approach (which adds 136.1 Da to the previously modified serine/threonine), multiple O-GlcNAc sites have been mapped and relative quantification using light vs. heavy isotopes contained in the tag has been achieved [42,71,72]. A distinct disadvantage of this strategy is that multiple types of modifications on multiple modified amino acids can undergo beta-elimination to form a common alpha, beta unsaturated carbonyl, thus introducing possible difficulties in determining specificity of the chemistry. In fact, BEMAD strategies have been used to enrich and identify not only sites of serine/threonine phosphorylation and serine/threonine O-GlcNAc modification, but also sites of cysteine alkylation [71], and serine/threonine O-Man addition, and chondroitin-containing
[Figure 3 scheme: BEMAD strategy for site-mapping. An O-glycosylated or O-phosphate modified serine (or threonine), or an alkylated cysteine, undergoes β-elimination to give dehydroalanine, followed by Michael addition of light DTT (d0, HSCH2CH(OH)CH(OH)CH2SH) or heavy DTT (d6, HSCD2CD(OH)CD(OH)CD2SH).]
Figure 3 Scheme for beta-elimination/Michael addition replacement of O-GlcNAc and serine/threonine phosphorylation with isotopic light or heavy (deuterated) DTT affinity tag which is a stable amino acid side chain in CID (unlike O-GlcNAc) and thus facilitates exact site-mapping in MS/MS.
glycosaminoglycan attachment sites [73,74]. However, approaches have been developed to deal with this lack of specificity. Indeed, the alkylated cysteine reactivity can be used as an advantage to enrich and perform relative quantification on cysteine-containing peptides, similar to the isotope-coded affinity tag (ICAT) strategy developed by Aebersold’s group [71]. For the O-linked PTMs of serine and threonine residues, BEMAD can be performed following enrichment of a particular class of modified proteins/peptides (e.g. phosphorylated vs. O-GlcNAc modified) and/or before and after treatment of the peptides with specific hydrolases such as phosphatases or specific glycosidases [42,71,73,74] to enhance specificity of targeting a particular PTM. A further note of caution is that under extreme beta-elimination conditions water can be eliminated from unmodified serines and threonines, resulting in false-positive sites of modification being assigned [75]. Finally, it should be noted that a proline at the +1 position relative to the site of modification slows down the chemistry substantially, most likely due to the lack of an amide proton to stabilize the carbonyl group during the beta-elimination [70]. However, BEMAD conditions have been developed to overcome the +1 proline at about 50% efficiency without destroying the peptide backbone [51]. In conclusion, while other site-mapping strategies certainly have advantages over BEMAD in terms of specificity and ease of analysis, BEMAD is a powerful method, when properly controlled, to tag, enrich and quantify sites of O-GlcNAc attachment.
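The mass bookkeeping behind BEMAD can be illustrated with a short calculation. The sketch below is not taken from the cited work: it assumes monoisotopic atomic masses, a light tag of composition C4H10O2S2 (d0-DTT) and a heavy tag in which six hydrogens are replaced by deuterium (d6-DTT), as drawn in Figure 3. All function and variable names are hypothetical. With these assumptions the light tag shift comes out close to, but not identical with, the rounded 136.1 Da quoted in the text.

```python
# Monoisotopic atomic masses in Da (standard values; using monoisotopic
# rather than average masses is an assumption of this sketch).
MONO = {"C": 12.0, "H": 1.00782503, "D": 2.01410178,
        "O": 15.9949146, "S": 31.9720707}


def mass(formula):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


WATER = mass({"H": 2, "O": 1})
DTT_LIGHT = mass({"C": 4, "H": 10, "O": 2, "S": 2})          # d0-DTT
DTT_HEAVY = mass({"C": 4, "H": 4, "D": 6, "O": 2, "S": 2})   # d6-DTT

# Beta-elimination leaves dehydroalanine, one water lighter than serine.
elimination_shift = -WATER

# Michael addition then adds one whole DTT molecule across the double bond,
# so the net shift is measured against the unmodified serine/threonine.
light_tag_shift = elimination_shift + DTT_LIGHT
heavy_tag_shift = elimination_shift + DTT_HEAVY


def pair_spacing_mz(charge):
    """m/z spacing between light- and heavy-tagged forms of one peptide."""
    return (DTT_HEAVY - DTT_LIGHT) / charge


print(f"beta-elimination shift: {elimination_shift:+.3f} Da")
print(f"light DTT tag shift:    {light_tag_shift:+.3f} Da")
print(f"heavy DTT tag shift:    {heavy_tag_shift:+.3f} Da")
for z in (1, 2, 3):
    print(f"light/heavy spacing at {z}+: {pair_spacing_mz(z):.3f} m/z")
```

For relative quantification, the light and heavy forms of a singly tagged peptide would then be expected as a doublet separated by roughly 6 Da divided by the charge state.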
7. DIRECT ENRICHMENT OF NATIVE O-GlcNAc MODIFIED PROTEINS WITH WGA LECTIN WEAK AFFINITY CHROMATOGRAPHY (LWAC)
O-GlcNAc enrichment approaches described thus far in the chapter depend on chemical or enzymatic derivatization of O-GlcNAc. Potential drawbacks of such approaches include lack of specificity in labeling, incomplete labeling and lowered sensitivity due to the additional labeling steps. A strategy for direct enrichment of native O-GlcNAc modified peptides would thus be advantageous in O-GlcNAc proteomics, and one has recently been described [42]. The approach takes advantage of the specificity of WGA lectin for binding to N-acetylglucosamine. This binding is quite weak, being in the 10 mM range, but application of low flow rate isocratic HPLC coupled with a relatively long WGA column allowed for retardation of O-GlcNAc modified peptides relative to unmodified peptides, such that a strongly enriched population of O-GlcNAc modified peptides could be recovered in later eluting fractions. The technique is referred to as “lectin weak affinity chromatography” (LWAC). The buffer utilized is Tris based and contains high salt to reduce background associations. No prior chemical or enzymatic treatment is required for this technique, and thus sensitivity and specificity are increased. The approach is nondestructive, and thus suitable for analysis of multiple PTMs in the same sample. Complex carbohydrates with terminal GlcNAc or sialic acid may also be targeted by WGA. However, it is observed that in LWAC, a subset of peptides modified by complex carbohydrate elutes significantly later than O-GlcNAc modified peptides (likely due to their relatively stronger affinity for WGA). Thus, LWAC allows for
differential enrichment of peptides modified by either O-GlcNAc or complex carbohydrate containing terminal GlcNAc. Succinylation of WGA (sWGA) increases specificity towards GlcNAc over sialic acid [76]. However, some degree of WGA affinity for GlcNAc is compromised by succinylation, making O-GlcNAcylated peptide enrichment more difficult. Enhancement of O-GlcNAc modification site-mapping may be achieved by increasing stoichiometry of O-GlcNAc modifications by means such as pharmacological inhibition of O-GlcNAcase with PUGNAc [77]. However, it is important to note that PUGNAc, as an analogue of GlcNAc, will interfere with WGA binding to genuine O-GlcNAcylated peptides, and thus must not be present in samples when performing WGA chromatography.
8. ION TRAP MS2/MS3 FOR O-GlcNAc MODIFIED PEPTIDE IDENTIFICATION
Sequence determination of natively modified O-GlcNAc peptides is possible in some cases from MS/MS spectra. However, the primary fragmentation is dissociation of the sugar residue from the peptide, and this fragmentation consumes the majority of the energy imparted to the molecule in the CID process. As a result, relatively little peptide backbone fragmentation is generally observed, hampering the ability to identify the peptide, and the intact precursor ion having lost O-GlcNAc is invariably observed in MS/MS as a very strong ion. As an example, ion trap (LTQ instrument) MS/MS of an LWAC enriched O-GlcNAc modified peptide at [M+2H]2+ m/z 1000.8 (Figure 4, upper panel) generates an unmodified precursor at [M+2H]2+ m/z 898.9 (neutral loss of 203.08 Da, which appears as a loss of m/z 101.5 for doubly charged ions) having an intensity such that 10× magnification is required to view other fragment ions. It should be noted that the neutral loss of O-GlcNAc for triply charged ions would appear as a loss of m/z 67.7 from the precursor m/z. These predictable neutral losses can be used to trigger MS/MS/MS (MS3) as a way to potentially generate additional fragment ion information on the precursor peptide. As the neutral loss species is no longer modified by O-GlcNAc, more extensive fragmentation of the peptide backbone is observed compared to the O-GlcNAc modified form. MS3 of the neutral loss [M+2H]2+ 898.9 (Figure 4, lower panel) generates a continuous series of y ions (y8–y14) that complements information from MS/MS and strengthens the identification of this peptide as TPVDYIDLPYSSSPSR. Revisiting the MS/MS scan, low intensity fragment ions retaining O-GlcNAc indicate that the site of modification is one of the four serines at the C-terminus of this peptide.
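The charge-dependent neutral-loss offsets quoted in this section, and the idea of using them to trigger MS3, can be sketched in a few lines. This is an illustration only, not the acquisition logic of any particular instrument: the function names, the 0.5 m/z tolerance and the 50% relative-intensity threshold are assumptions, and the low-intensity peaks in the example list are invented.

```python
GLCNAC = 203.0794  # monoisotopic mass of the GlcNAc residue C8H13NO5, in Da


def neutral_loss_offset(charge):
    """m/z shift seen when a precursor of this charge loses neutral GlcNAc."""
    return GLCNAC / charge


def find_neutral_loss_ion(precursor_mz, charge, peaks, tol=0.5):
    """Most intense (m/z, intensity) peak within tol of the expected
    sugar-loss precursor, or None if no peak qualifies."""
    target = precursor_mz - neutral_loss_offset(charge)
    hits = [peak for peak in peaks if abs(peak[0] - target) <= tol]
    return max(hits, key=lambda peak: peak[1]) if hits else None


def should_trigger_ms3(precursor_mz, charge, peaks, min_rel=0.5, tol=0.5):
    """True when the sugar-loss ion reaches min_rel of the base peak."""
    hit = find_neutral_loss_ion(precursor_mz, charge, peaks, tol)
    if hit is None:
        return False
    base_intensity = max(intensity for _, intensity in peaks)
    return hit[1] >= min_rel * base_intensity


# The 898.9 ion is the sugar-loss precursor from the Figure 4 example;
# the two weak peaks and all intensities are invented for illustration.
example_peaks = [(500.0, 2.0), (700.0, 4.0), (898.9, 100.0)]

for z in (2, 3):
    print(f"{z}+ neutral loss offset: {neutral_loss_offset(z):.2f} m/z")
print("trigger MS3:", should_trigger_ms3(1000.8, 2, example_peaks))
```

The tolerance is deliberately loose because the precursor and product m/z values in the ion trap example are reported to one decimal place.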
9. ELECTRON CAPTURE DISSOCIATION (ECD) FOR O-GlcNAc SITE-MAPPING
The recent developments of alternative fragmentation mechanisms that do not break the most labile bonds have the potential to revolutionize the use of mass spectrometry for O-GlcNAc modification site assignment. In electron capture
[Figure 4 spectra. Upper panel: MS/MS of [M+2H]2+ 1000.82, showing the precursor having lost O-GlcNAc at [M+2H]2+ 898.9 and [M+1H]1+ 1798.0. Lower panel: MS/MS/MS of [M+2H]2+ 898.9, with labeled ions y4 446.3, b5-H2O 558.5, b5 576.3, y6 620.4, b6 689.4, b7 804.4, y8 880.5, y9 993.5, y10 1108.5, y11 1221.6, y12 1384.6, y13 1499.7 and y14 1599.8, for TPVDYIDLPYSSSPSR (O-GlcNAc modified) from the protein Ponsin. x-axis: m/z.]
Figure 4 Ions displaying neutral loss of the mass of O-GlcNAc in MS/MS can be fragmented in MS/MS/MS (MS3) to generate enhanced fragment ion information. In this example, CID of a doubly charged O-GlcNAc modified peptide [M+2H]2+ 1000.82 gives rise to a dominant O-GlcNAc neutral loss ion (−102) at [M+2H]2+ 898.9 in MS/MS, which is targeted for MS/MS/MS, leading to increased fragment ion information that allows sequencing of the modified peptide as TPVDYIDLPYSSSPSR. (See Colour Plate Section at the end of this book.)
dissociation (ECD) mass spectrometry, the N–Cα bond in the peptide backbone is preferentially cleaved to form c and z ions [78]. Labile modifications are retained on the fragment ions, and ECD has been used to identify O-GlcNAc modification sites in mouse brain [42]. Figure 5 shows the ECD fragmentation spectrum of an O-GlcNAc modified peptide from Spectrin beta chain, where the site of O-GlcNAc modification can easily be determined from the mass difference of 290.12 Da between the c13 and c14 ions, which corresponds to the mass of an O-GlcNAc modified serine residue. There are, however, limitations to the widespread use of ECD for O-GlcNAc site identification. Unfortunately, this type of fragmentation can only be performed in a Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometer. These instruments use a superconducting magnet for mass measurement and are hence expensive. FT-ICR is also less sensitive than many other mass analyzers due to the ion cloud manipulation required to transport and trap the ions in the FT-ICR cell. ECD is a less efficient fragmentation mechanism than CID, and thus a significant amount of precursor ion is generally still present in the fragmentation spectrum. This further limits the sensitivity of the approach. The efficiency of ECD is also dependent on the charge state of the precursor ion, with higher charge state components more readily fragmented. Hence, there may be a
[Figure 5 spectrum: ECD MS/MS of [M+3H]3+ at m/z 579.26 for the peptide HDTSASTQSTPASSR, showing the charge-reduced species [M+3H]2+• at m/z 868.90, z1 at 159.10, and labeled c ions including c2 270.12, c3 371.17, c4 458.20, c5 529.24, c6 616.27, c7 717.32, c9 932.41, c12 1201.56, c13 1288.57 and c14 1578.69; the spacing between c13 and c14 is marked Ser + GlcNAc. Axes: relative abundance versus m/z (200–1600).]
Figure 5 ECD MS/MS spectrum of an O-GlcNAc modified peptide from spectrin. In this LC-ECD-MS/MS spectrum from an LWAC enriched fraction of peptides from post-synaptic density both the peptide and site of modification could be determined. (See Colour Plate Section at the end of this book.)
shift towards using enzymes other than trypsin to digest proteins, producing larger, more highly charged peptides for analysis. A second novel fragmentation mechanism, electron transfer dissociation (ETD), has recently been demonstrated [79]. This has similar peptide preferences and produces similar fragment ion types to ECD, but can be employed in a quadrupole ion trap mass spectrometer. This type of instrumentation is significantly cheaper and more widely available than FT-ICR instrumentation. Also, the efficiency of ion capture in a quadrupole ion trap is much higher than in an FT-ICR cell. Hence, ETD should be more sensitive than ECD. Commercial instrumentation employing ETD is just becoming available, and these instruments have the potential to become major tools for O-GlcNAc site identification using mass spectrometry.
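The site-assignment reasoning used for the spectrin peptide, in which the spacing between consecutive c ions reveals the modified residue, can be sketched as follows. This is a simplified illustration under stated assumptions: monoisotopic residue masses, singly charged c ions computed as the residue sum plus ammonia plus one proton, and a complete c-ion series (real ECD spectra have gaps, for example N-terminal to proline). The function names and the 0.02 Da tolerance are hypothetical.

```python
# Monoisotopic residue masses (Da) for the amino acids in this peptide only.
RESIDUE = {"H": 137.05891, "D": 115.02694, "T": 101.04768, "S": 87.03203,
           "A": 71.03711, "Q": 128.05858, "P": 97.05276, "R": 156.10111}
GLCNAC = 203.07937   # GlcNAc residue, C8H13NO5
AMMONIA = 17.02655   # NH3 carried by a c ion
PROTON = 1.00728


def c_ions(sequence, mods=None):
    """Singly charged c-ion m/z values c1..c(n-1).

    mods maps a 1-based residue position to an added mass.
    """
    mods = mods or {}
    ions, running = [], AMMONIA + PROTON
    for position, residue in enumerate(sequence[:-1], start=1):
        running += RESIDUE[residue] + mods.get(position, 0.0)
        ions.append(running)
    return ions


def localize_glcnac(sequence, ions, tol=0.02):
    """1-based positions where the c-ion spacing equals residue + GlcNAc.

    Assumes ions is a complete series starting at c1.
    """
    sites, previous = [], AMMONIA + PROTON
    for position, mz in enumerate(ions, start=1):
        expected = RESIDUE[sequence[position - 1]] + GLCNAC
        if abs((mz - previous) - expected) <= tol:
            sites.append(position)
        previous = mz
    return sites


peptide = "HDTSASTQSTPASSR"
theoretical = c_ions(peptide, mods={14: GLCNAC})  # GlcNAc placed on Ser14

print(f"c13 = {theoretical[12]:.2f}, c14 = {theoretical[13]:.2f}")
print(f"c14 - c13 = {theoretical[13] - theoretical[12]:.2f} Da")
print("modified position(s):", localize_glcnac(peptide, theoretical))
```

With the masses assumed here the c13 to c14 spacing is about 290.11 Da, in line with the value read from the spectrum, and the calculated c13 and c14 values agree with the peaks labeled in Figure 5.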
10. INTERPRETATION OF O-GlcNAcYLATED PEPTIDE MASS SPECTROMETRY
One strategy that has been used to try to identify potential modified peptides is to search for pairs of precursor ions that differ in mass by 203.08 Da [80].
However, it should be noted that O-GlcNAc modification of a serine or threonine has been observed to inhibit tryptic proteolytic cleavage at proximal lysines or arginines [80] and thus modified and unmodified versions of the same peptide will not necessarily be observed [49]. As has been previously discussed, O-GlcNAcylated peptides produce characteristic fragmentation spectra. Hence, a script has been written to search MS/MS spectra for the combination of m/z 204.08 and an intact precursor having lost the mass of O-GlcNAc (−203.08 amu), which is strongly diagnostic for an O-GlcNAc peptide (see boxed ions in Figure 2B) [42]. The script, when used to search LC-MS/MS data of non-enriched PSD peptides, predicted that zero out of a total of 501 MS/MS spectra corresponded to an O-GlcNAc modified peptide. However, when the script was used to search an “LWAC enriched” LC-MS/MS data file, it predicted 103 out of a total of 388 spectra corresponded to O-GlcNAc modified peptides, helping to guide analysis to peptides of interest. This result also affirms that the stoichiometry and abundance of O-GlcNAc modification is low, as no modified peptides were detected without prior modification enrichment by LWAC. It is important to note that in searching for the modification of O-GlcNAc using database search algorithms, simply allowing for variable modification of serines and threonines by the mass of 203.08 Da is not an optimal strategy, as most fragment ions will have lost the mass of O-GlcNAc due to its lability. Thus, it is important to search O-GlcNAc as a neutral loss modification in MS/MS spectra. For example, when Mascot was adapted to search for O-GlcNAc as a neutral loss modification the algorithm became effective at identifying sequences of O-GlcNAc modified peptides in LWAC enriched fractions. Of course, in this searching strategy, all potential information on specific site assignment is lost.
Hence, ideally one would want to search for both O-GlcNAc modified and unmodified fragment ions simultaneously for optimal peptide sequencing and site-assignment. It is useful in manual interpretations of MS/MS spectra corresponding to O-GlcNAc peptides to look for ion pairs differing by the mass of the sugar residue (203.08 Da) as this is likely to indicate O-GlcNAc modified fragment ions which can help in specific site-assignment and also identifies the corresponding lower molecular weight ion as an unmodified fragment to be used in sequencing.
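The diagnostic-ion screen described in this section (the GlcNAc oxonium ion together with a precursor that has lost the sugar) can be sketched as below. This is not the script used in [42], whose implementation is not given here. The data layout, function names and 0.2 m/z tolerance are assumptions, and the screen accepts a sugar-loss precursor at either the original or a reduced charge state. The m/z values in the first example spectrum are those labeled for QLLPSTATVR in this chapter; the intensities and the second spectrum are invented.

```python
PROTON = 1.00728
GLCNAC = 203.0794              # GlcNAc residue mass in Da
OXONIUM_MZ = GLCNAC + PROTON   # GlcNAc oxonium ion, about m/z 204.09


def has_peak(peaks, target_mz, tol):
    """True if any (m/z, intensity) peak lies within tol of target_mz."""
    return any(abs(mz - target_mz) <= tol for mz, _ in peaks)


def sugar_loss_precursors(precursor_mz, charge):
    """Expected m/z of the precursor minus GlcNAc at charges 1..charge."""
    neutral = (precursor_mz - PROTON) * charge - GLCNAC
    return [(neutral + z * PROTON) / z for z in range(1, charge + 1)]


def looks_like_oglcnac(precursor_mz, charge, peaks, tol=0.2):
    """Require both the oxonium ion and a sugar-loss precursor ion."""
    if not has_peak(peaks, OXONIUM_MZ, tol):
        return False
    return any(has_peak(peaks, mz, tol)
               for mz in sugar_loss_precursors(precursor_mz, charge))


def count_candidates(spectra, tol=0.2):
    """Number of spectra flagged as likely O-GlcNAc peptides."""
    return sum(
        looks_like_oglcnac(s["precursor_mz"], s["charge"], s["peaks"], tol)
        for s in spectra
    )


spectra = [
    # Labeled m/z values for QLLPSTATVR; intensities are invented.
    {"precursor_mz": 644.9, "charge": 2,
     "peaks": [(204.08, 100.0), (543.3, 60.0), (1085.7, 40.0)]},
    # Invented spectrum lacking both diagnostic ions.
    {"precursor_mz": 644.9, "charge": 2,
     "peaks": [(242.1, 30.0), (375.2, 50.0)]},
]

print(f"{count_candidates(spectra)} of {len(spectra)} spectra flagged")
```

Requiring both ions rather than either one alone mirrors the combination described in the text; a screen of this kind only guides attention to candidate spectra and does not by itself assign a sequence or a site.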
11. CONCLUSIONS
Progress in the understanding of the functional significance and importance of O-GlcNAcylation has been slow until recently due to the lack of sensitive, specific and global strategies available for characterization of the modification. However, the application of mass spectrometry for modification detection and site determination, especially using recently developed fragmentation mechanisms, combined with strategies for enrichment of modified components from complex mixtures, promises to accelerate the understanding and characterization of this widespread regulatory modification.
However, to really understand the functional role the modification is playing, it will be necessary to perform quantitative, comparative studies that can track changes in levels of modification in response to cellular stimuli, changing cell states and in the context of disease. Hence, isotopic labeling strategies will have to be introduced into the workflow. Also, as O-GlcNAcylation appears to have strong functional, regulatory links to phosphorylation, properly characterizing signaling cascades will require studying both O-GlcNAc modification and phosphorylation simultaneously. This type of approach should now be feasible with the availability of specific enrichment strategies for each modification combined with the ability to analyze the modifications in a non-derivatized state, which, as well as being more sensitive, also eliminates potential issues with misassignment to other O-linked modifications. Indeed, an understanding of which PTMs co-exist on a given protein species is likely to be vital to understand the co-operativity and reciprocity in function that different modifications perform. This will require strategies that analyze larger pieces (or in an ideal scenario intact proteins) to determine which modifications can co-occur on the same molecule. Significant progress has been made with this type of strategy in the study of PTMs of histones [81,82], but in many ways these are ideal proteins for this type of strategy in that they are small, abundant and have many high stoichiometry modifications. O-GlcNAc is generally a low stoichiometry modification, and if one wants to study its co-localization with other low stoichiometry modifications such as phosphorylation, then the low abundance of multiply modified species will present a huge challenge to this type of approach.
In conclusion, there are still many challenges to solve in the study of this elusive modification, but the recent rapid progress in techniques to facilitate its discovery and characterization promises to catalyze the understanding of O-GlcNAcylation’s role in signal transduction, homeostasis and disease.
REFERENCES
1 G.D. Holt and G.W. Hart, The subcellular distribution of terminal N-acetylglucosamine moieties. Localization of a novel protein-saccharide linkage, O-linked GlcNAc, J. Biol. Chem., 261(17) (1986) 8049–8057.
2 C. Slawson, M.P. Housley and G.W. Hart, O-GlcNAc cycling: How a single sugar post-translational modification is changing the way we think about signaling networks, J. Cell Biochem., 97(1) (2006) 71–83.
3 L. Wells, K. Vosseller and G.W. Hart, Glycosylation of nucleocytoplasmic proteins: Signal transduction and O-GlcNAc, Science, 291(5512) (2001) 2376–2378.
4 G.W. Hart, Dynamic O-linked glycosylation of nuclear and cytoskeletal proteins, Annu. Rev. Biochem., 66 (1997) 315–335.
5 K. Vosseller, K. Sakabe, L. Wells and G.W. Hart, Diverse regulation of protein function by O-GlcNAc: A nuclear and cytoplasmic carbohydrate post-translational modification, Curr. Opin. Chem. Biol., 6(6) (2002) 851–857.
6 R.N. Cole and G.W. Hart, Cytosolic O-glycosylation is abundant in nerve terminals, J. Neurochem., 79(5) (2001) 1080–1089.
7 J.A. Hanover, Z. Lai, G. Lee, W.A. Lubas and S.M. Sato, Elevated O-linked N-acetylglucosamine metabolism in pancreatic beta-cells, Arch. Biochem. Biophys., 362(1) (1999) 38–45.
8 G.W. Hart, M.P. Housley and C. Slawson, Cycling of O-linked beta-N-acetylglucosamine on nucleocytoplasmic proteins, Nature, 446(7139) (2007) 1017–1022.
9 M.D. Roos, K. Su, J.R. Baker and J.E. Kudlow, O glycosylation of an Sp1-derived peptide blocks known Sp1 protein interactions, Mol. Cell Biol., 17(11) (1997) 6472–6480.
10 G. Majumdar, A. Harrington, J. Hungerford, A. Martinez-Hernandez, I.C. Gerling, R. Raghow and S. Solomon, Insulin dynamically regulates calmodulin gene expression by sequential O-glycosylation and phosphorylation of sp1 and its subcellular compartmentalization in liver cells, J. Biol. Chem., 281(6) (2006) 3642–3650.
11 L.S. Griffith and B. Schmitz, O-linked N-acetylglucosamine levels in cerebellar neurons respond reciprocally to pertubations of phosphorylation, Eur. J. Biochem., 262(3) (1999) 824–831.
12 Z.T. Kneass and R.B. Marchase, Neutrophils exhibit rapid agonist-induced increases in protein-associated O-GlcNAc, J. Biol. Chem., 279(44) (2004) 45759–45765.
13 Z.T. Kneass and R.B. Marchase, Protein O-GlcNAc modulates motility-associated signaling intermediates in neutrophils, J. Biol. Chem., 280(15) (2005) 14579–14585.
14 N.E. Zachara, N. O’Donnell, W.D. Cheung, J.J. Mercer, J.D. Marth and G.W. Hart, Dynamic O-GlcNAc modification of nucleocytoplasmic proteins in response to stress. A survival response of mammalian cells, J. Biol. Chem., 279(29) (2004) 30133–30142.
15 C. Slawson, N.E. Zachara, K. Vosseller, W.D. Cheung, M.D. Lane and G.W. Hart, Perturbations in O-linked beta-N-acetylglucosamine protein modification cause severe defects in mitotic progression and cytokinesis, J. Biol. Chem., 280(38) (2005) 32944–32956.
16 K. Kamemura and G.W. Hart, Dynamic interplay between O-glycosylation and O-phosphorylation of nucleocytoplasmic proteins: A new paradigm for metabolic control of signal transduction and transcription, Prog. Nucleic Acid Res. Mol. Biol., 73 (2003) 107–136.
17 T.Y. Chou, G.W. Hart and C.V. Dang, c-Myc is glycosylated at threonine 58, a known phosphorylation site and a mutational hot spot in lymphomas, J. Biol. Chem., 270(32) (1995) 18961–18965.
18 X. Cheng, R.N. Cole, J. Zaia and G.W. Hart, Alternative O-glycosylation/O-phosphorylation of the murine estrogen receptor beta, Biochemistry, 39(38) (2000) 11609–11620.
19 W.G. Kelly, M.E. Dahmus and G.W. Hart, RNA polymerase II is a glycoprotein. Modification of the COOH-terminal domain by O-GlcNAc, J. Biol. Chem., 268(14) (1993) 10416–10424.
20 F.I. Comer and G.W. Hart, Reciprocity between O-GlcNAc and O-phosphate on the carboxyl terminal domain of RNA polymerase II, Biochemistry, 40(26) (2001) 7845–7852.
21 T. Lefebvre, C. Alonso, S. Mahboub, M.J. Dupire, J.P. Zanetta, M.L. Caillet-Boudin and J.C. Michalski, Effect of okadaic acid on O-linked N-acetylglucosamine levels in a neuroblastoma cell line, Biochim. Biophys. Acta, 1472(1–2) (1999) 71–81.
22 T. Lefebvre, S. Ferreira, L. Dupont-Wallois, T. Bussiere, M.J. Dupire, A. Delacourte, J.C. Michalski and M.L. Caillet-Boudin, Evidence of a balance between phosphorylation and O-GlcNAc glycosylation of Tau proteins: A role in nuclear localization, Biochim. Biophys. Acta, 1619(2) (2003) 167–176.
23 L. Wells, K. Vosseller and G.W. Hart, A role for N-acetylglucosamine as a nutrient sensor and mediator of insulin resistance, Cell Mol. Life Sci., 60(2) (2003) 222–228.
24 N.E. Zachara and G.W. Hart, O-GlcNAc a sensor of cellular state: The role of nucleocytoplasmic glycosylation in modulating cellular function in response to nutrition and stress, Biochim. Biophys. Acta, 1673(1–2) (2004) 13–28.
25 Y. Akimoto, G.W. Hart, H. Hirano and H. Kawakami, O-GlcNAc modification of nucleocytoplasmic proteins and diabetes, Med. Mol. Morphol., 38(2) (2005) 84–91.
26 M.G. Buse, Hexosamines, insulin resistance, and the complications of diabetes: Current status, Am. J. Physiol. Endocrinol. Metab., 290(1) (2006) E1–E8.
27 R.J. Konrad and J.E. Kudlow, The role of O-linked protein glycosylation in beta-cell dysfunction, Int. J. Mol. Med., 10(5) (2002) 535–539.
28 M. Hawkins, N. Barzilai, R. Liu, M. Hu, W. Chen and L. Rossetti, Role of the glucosamine pathway in fat-induced insulin resistance, J. Clin. Invest., 99(9) (1997) 2173–2182.
29 K. Vosseller, L. Wells, M.D. Lane and G.W. Hart, Elevated nucleocytoplasmic glycosylation by O-GlcNAc results in insulin resistance associated with defects in Akt activation in 3T3-L1 adipocytes, Proc. Natl. Acad. Sci. USA, 99(8) (2002) 5313–5318.
30 E.B. Arias and G.D. Cartee, Relationship between protein O-linked glycosylation and insulin-stimulated glucose transport in rat skeletal muscle following calorie restriction or exposure to O-(2-acetamido-2-deoxy-d-glucopyranosylidene)amino-N-phenylcarbamate, Acta Physiol. Scand., 183(3) (2005) 281–289.
31 E.B. Arias, J. Kim and G.D. Cartee, Prolonged incubation in PUGNAc results in increased protein O-linked glycosylation and insulin resistance in rat skeletal muscle, Diabetes, 53(4) (2004) 921–930.
32 S.Y. Park, J. Ryu and W. Lee, O-GlcNAc modification on IRS-1 and Akt2 by PUGNAc inhibits their phosphorylation and induces insulin resistance in rat primary adipocytes, Exp. Mol. Med., 37(3) (2005) 220–229.
33 D.A. McClain, W.A. Lubas, R.C. Cooksey, M. Hazel, G.J. Parker, D.C. Love and J.A. Hanover, Altered glycan-dependent signaling induces insulin resistance and hyperleptinemia, Proc. Natl. Acad. Sci. USA, 99(16) (2002) 10695–10699.
34 X.L. Du, D. Edelstein, S. Dimmeler, Q. Ju, C. Sui and M. Brownlee, Hyperglycemia inhibits endothelial nitric oxide synthase activity by posttranslational modification at the Akt site, J. Clin. Invest., 108(9) (2001) 1341–1348.
35 R.J. Clark, P.M. McDonough, E. Swanson, S.U. Trost, M. Suzuki, M. Fukuda and W.H. Dillmann, Diabetes and the accompanying hyperglycemia impairs cardiomyocyte calcium cycling through increased nuclear O-GlcNAcylation, J. Biol. Chem., 278(45) (2003) 44230–44237.
36 K. Liu, A.J. Paterson, R.J. Konrad, A.F. Parlow, S. Jimi, M. Roh, E. Chin, Jr and J.E. Kudlow, Streptozotocin, an O-GlcNAcase inhibitor, blunts insulin and growth hormone secretion, Mol. Cell Endocrinol., 194(1–2) (2002) 135–146.
37 Y. Akimoto, G.W. Hart, L. Wells, K. Vosseller, K. Yamamoto, E. Munetomo, M. Ohara-Imaizumi, C. Nishiwaki, S. Nagamatsu, H. Hirano and H. Kawakami, Elevation of the post-translational modification of proteins by O-linked N-acetylglucosamine leads to deterioration of the glucose-stimulated insulin secretion in the pancreas of diabetic Goto-Kakizaki rats, Glycobiology, 17(2) (2007) 127–140.
38 F. Liu, K. Iqbal, I. Grundke-Iqbal, G.W. Hart and C.X. Gong, O-GlcNAcylation regulates phosphorylation of tau: A mechanism involved in Alzheimer’s disease, Proc. Natl. Acad. Sci. USA, 101(29) (2004) 10804–10809.
39 P.J. Yao and P.D. Coleman, Reduced O-glycosylated clathrin assembly protein AP180: Implication for synaptic vesicle recycling dysfunction in Alzheimer’s disease, Neurosci. Lett., 252(1) (1998) 33–36.
40 K. Kanninen, G. Goldsteins, S. Auriola, I. Alafuzoff and J. Koistinaho, Glycosylation changes in Alzheimer’s disease as revealed by a proteomic approach, Neurosci. Lett., 367(2) (2004) 235–240.
41 D. Schubert, Glucose metabolism and Alzheimer’s disease, Ageing Res. Rev., 4(2) (2005) 240–257.
42 K. Vosseller, J.C. Trinidad, R.J. Chalkley, C.G. Specht, A. Thalhammer, A.J. Lynn, J.O. Snedecor, S. Guan, K.F. Medzihradszky, D.A. Maltby, R. Schoepfer and A.L. Burlingame, O-linked N-acetylglucosamine proteomics of postsynaptic density preparations using lectin weak affinity chromatography and mass spectrometry, Mol. Cell Proteomics, 5(5) (2006) 923–934.
43 S.B. Ficarro, M.L. McClel, P.T. Stukenberg, D.J. Burke, M.M. Ross, J. Shabanowitz, D.F. Hunt and F.M. White, Phosphoproteome analysis by mass spectrometry and its application to Saccharomyces cerevisiae, Nat. Biotechnol., 20(3) (2002) 301–305.
44 J.C. Trinidad, C.G. Specht, A. Thalhammer, R. Schoepfer and A.L. Burlingame, Comprehensive identification of phosphorylation sites in postsynaptic density preparations, Mol. Cell Proteomics, 5(5) (2006) 914–922.
45 V. Dehennaut, T. Lefebvre, C. Sellier, Y. Leroy, B. Gross, S. Walker, R. Cacan, J.C. Michalski, J.P. Vilain and J.F. Bodart, O-Linked N-acetylglucosaminyltransferase inhibition prevents G2/M transition in Xenopus laevis Oocytes, J. Biol. Chem., 282(17) (2007) 12527–12536.
46 K.D. Greis, B.K. Hayes, F.I. Comer, M. Kirk, S. Barnes, T.L. Lowary and G.W. Hart, Selective detection and site-analysis of O-GlcNAc-modified glycopeptides by beta-elimination and tandem electrospray mass spectrometry, Anal. Biochem., 234(1) (1996) 38–49.
47 M.J. Huddleston, M.F. Bean and S.A. Carr, Collisional fragmentation of glycopeptides by electrospray ionization LC/MS and LC/MS/MS: Methods for selective detection of glycopeptides in protein digests, Anal. Chem., 65(7) (1993) 877–884.
48 R.J. Chalkley and A.L. Burlingame, Identification of GlcNAcylation sites of peptides and alpha-crystallin using Q-TOF mass spectrometry, J. Am. Soc. Mass Spectrom., 12(10) (2001) 1106–1113.
49 R.J. Chalkley and A.L. Burlingame, Identification of novel sites of O-N-acetylglucosamine modification of serum response factor using quadrupole time-of-flight mass spectrometry, Mol. Cell Proteomics, 2(3) (2003) 182–190.
50 K.D. Greis, W. Gibson and G.W. Hart, Site-specific glycosylation of the human cytomegalovirus tegument basic phosphoprotein (UL32) at serine 921 and serine 952, J. Virol., 68(12) (1994) 8339–8349.
51 S.A. Whelan and G.W. Hart, Identification of O-GlcNAc sites on proteins, Methods Enzymol., 415 (2006) 113–133.
52 V.P. Bhavanandan and A.W. Katlic, The interaction of wheat germ agglutinin with sialoglycoproteins. The role of sialic acid, J. Biol. Chem., 254(10) (1979) 4000–4008.
53 K.A. Kronis and J.P. Carver, Wheat germ agglutinin dimers bind sialyloligosaccharides at four sites in solution: Proton nuclear magnetic resonance temperature studies at 360 MHz, Biochemistry, 24(4) (1985) 826–833.
54 C.S. Wright, Crystal structure of a wheat germ agglutinin/glycophorin-sialoglycopeptide receptor complex. Structural basis for cooperative lectin-cell binding, J. Biol. Chem., 267(20) (1992) 14345–14352.
55 L. Leickt, M. Bergstrom, D. Zopf and S. Ohlson, Bioaffinity chromatography in the 10 mM range of Kd, Anal. Biochem., 253(1) (1997) 135–136.
56 B.K. Hayes, K.D. Greis and G.W. Hart, Specific isolation of O-linked N-acetylglucosamine glycopeptides from complex mixtures, Anal. Biochem., 228(1) (1995) 115–122.
57 P.A. Haynes and R. Aebersold, Simultaneous detection and identification of O-GlcNAc-modified glycoproteins using liquid chromatography-tandem mass spectrometry, Anal. Chem., 72(21) (2000) 5402–5410.
58 K. Vosseller, J.C. Trinidad, R.J. Chalkley, C.G. Specht, A. Thalhammer, A.J. Lynn, J.H. Snedecor, S. Guan, K.F. Medzihradszky, D.A. Maltby, R. Schoepfer and A.L. Burlingame, O-GlcNAc proteomics of postsynaptic density preparations using lectin weak affinity chromatography (LWAC) and mass spectrometry, Mol. Cell Proteomics, 5(5) (2006) 923–934.
59 N. Khidekel, S. Arndt, N. Lamarre-Vincent, A. Lippert, K.G. Poulin-Kerstien, B. Ramakrishnan, P.K. Qasba and L.C. Hsieh-Wilson, A chemoenzymatic approach toward the rapid and sensitive detection of O-GlcNAc posttranslational modifications, J. Am. Chem. Soc., 125(52) (2003) 16162–16163.
60 H.C. Tai, N. Khidekel, S.B. Ficarro, E.C. Peters and L.C. Hsieh-Wilson, Parallel identification of O-GlcNAc-modified proteins from cell lysates, J. Am. Chem. Soc., 126(34) (2004) 10500–10501.
61 N. Khidekel, S.B. Ficarro, E.C. Peters and L.C. Hsieh-Wilson, Exploring the O-GlcNAc proteome: Direct identification of O-GlcNAc-modified proteins from the brain, Proc. Natl. Acad. Sci. USA, 101(36) (2004) 13132–13137.
62 D.J. Vocadlo, H.C. Hang, E.J. Kim, J.A. Hanover and C.R. Bertozzi, A chemical approach for identifying O-GlcNAc-modified proteins in cells, Proc. Natl. Acad. Sci. USA, 100(16) (2003) 9116–9121.
63 R. Sprung, A. Nandi, Y. Chen, S.C. Kim, D. Barma, J.R. Falck and Y. Zhao, Tagging-via-substrate strategy for probing O-GlcNAc modified proteins, J. Proteome Res., 4(3) (2005) 950–957.
64 A. Nandi, R. Sprung, D.K. Barma, Y. Zhao, S.C. Kim, J.R. Falck and Y. Zhao, Global identification of O-GlcNAc-modified proteins, Anal. Chem., 78(2) (2006) 452–458.
65 J.B. Adams, Linkage of carbohydrate to hydroxyamino acids in mucopolysaccharides and mucoproteins, Biochem. J., 97(2) (1965) 345–352.
66 M. Adamczyk, J.C. Gebler and J. Wu, Selective analysis of phosphopeptides within a protein mixture by chemical modification, reversible biotinylation and mass spectrometry, Rapid Commun. Mass Spectrom., 15(16) (2001) 1481–1488.
67 M.B. Goshe, T.P. Conrads, E.A. Panisko, N.H. Angell, T.D. Veenstra and R.D. Smith, Phosphoprotein isotope-coded affinity tag approach for isolating and quantitating phosphopeptides in proteome-wide analyses, Anal. Chem., 73(11) (2001) 2578–2586.
68 W. Weckwerth, L. Willmitzer and O. Fiehn, Comparative quantification and identification of phosphoproteins using stable isotope labeling and liquid chromatography/mass spectrometry, Rapid Commun. Mass Spectrom., 14(18) (2000) 1677–1681.
69 H.E. Meyer, E. Hoffmann-Posorske and L.M. Heilmeyer, Jr, Determination and location of phosphoserine in proteins and peptides by conversion to S-ethylcysteine, Methods Enzymol., 201 (1991) 169–185.
70 L. Wells, K. Vosseller, R.N. Cole, J.M. Cronshaw, M.J. Matunis and G.W. Hart, Mapping sites of O-GlcNAc modification using affinity tags for serine and threonine post-translational modifications, Mol. Cell Proteomics, 1(10) (2002) 791–804.
71 K. Vosseller, K.C. Hansen, R.J. Chalkley, J.C. Trinidad, L. Wells, G.W. Hart and A.L. Burlingame, Quantitative analysis of both protein expression and serine/threonine post-translational modifications through stable isotope labeling with dithiothreitol, Proteomics, 5(2) (2005) 388–398.
72 L.E. Ball, M.N. Berkaw and M.G. Buse, Identification of the major site of O-linked beta-N-acetylglucosamine modification in the C terminus of insulin receptor substrate-1, Mol. Cell Proteomics, 5(2) (2006) 313–323.
73 S.K. Olson, J.R. Bishop, J.R. Yates, K. Oegema and J.D. Esko, Identification of novel chondroitin proteoglycans in Caenorhabditis elegans: Embryonic cell division depends on CPG-1 and CPG-2, J. Cell Biol., 173(6) (2006) 985–994.
74 B. Woosley, M. Xie, L. Wells, R. Orlando, D. Garrison, D. King and C. Bergmann, Comprehensive glycan analysis of recombinant Aspergillus niger endo-polygalacturonase C, Anal. Biochem., 354(1) (2006) 43–53.
75 D.T. McLachlin and B.T. Chait, Improved beta-elimination-based affinity purification strategy for enrichment of phosphopeptides, Anal. Chem., 75(24) (2003) 6826–6836.
76 M. Monsigny, C. Sene, A. Obrenovitch, A.C. Roche, F. Delmotte and E. Boschetti, Properties of succinylated wheat-germ agglutinin, Eur. J. Biochem., 98(1) (1979) 39–45.
77 R.S. Haltiwanger, K. Grove and G.A. Philipsberg, Modulation of O-linked N-acetylglucosamine levels on nuclear and cytoplasmic proteins in vivo using the peptide O-GlcNAc-beta-N-acetylglucosaminidase inhibitor O-(2-acetamido-2-deoxy-d-glucopyranosylidene)amino-N-phenylcarbamate, J. Biol. Chem., 273(6) (1998) 3611–3617.
78 R.A. Zubarev, Electron-capture dissociation tandem mass spectrometry, Curr. Opin. Biotechnol., 15(1) (2004) 12–16.
79 J.E. Syka, J.J. Coon, M.J. Schroeder, J. Shabanowitz and D.F. Hunt, Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry, Proc. Natl. Acad. Sci. USA, 101(26) (2004) 9528–9533.
80 E.P. Roquemore, A. Dell, H.R. Morris, M. Panico, A.J. Reason, L.A. Savoy, G.J. Wistow, J.S. Zigler, Jr, B.J. Earles and G.W. Hart, Vertebrate lens alpha-crystallins are modified by O-linked N-acetylglucosamine, J. Biol. Chem., 267(1) (1992) 555–563.
81 A.L. Burlingame, X. Zhang and R.J. Chalkley, Mass spectrometric analysis of histone posttranslational modifications, Methods, 36(4) (2005) 383–394.
82 B.A. Garcia, J. Shabanowitz and D.F. Hunt, Characterization of histones and their posttranslational modifications by mass spectrometry, Curr. Opin. Chem. Biol., 11(1) (2007) 66–73.
Plate 8 O-GlcNAc is a dynamic cytosolic enzyme catalyzed PTM of serines and threonines which can compete with phosphorylation. (For Black and White version, see page 356.)
[Scheme labels: X-Ser(Thr)-X; OGT with UDP-GlcNAc; O-GlcNAcase; kinase with ATP; phosphatase; competition.]

Plate 9 CID MS/MS of an O-GlcNAc modified peptide in (A) Q-TOF mass spectrometry (QSTAR instrument) showing the abundance of the 204.08 GlcNAc oxonium ion having been lost from the peptide, as well as intact precursor ions having lost O-GlcNAc. (B) CID MS/MS of an O-GlcNAc modified peptide in ion trap mass spectrometry (LTQ instrument) shows the presence of the 204.08 GlcNAc oxonium ion (apparent at relatively low intensity compared to Q-TOF spectra of O-GlcNAc peptides) and intact precursor ions having lost O-GlcNAc (having very strong intensity relatively compared to Q-TOF spectra of O-GlcNAc peptides). Due to lability of O-GlcNAc in each case, there is not enough information to determine exact site of modification in these peptides. (For Black and White version, see page 359.)
[Panel A: CID in Q-TOF, MS/MS of [M+2H]2+ 644.9, QLLPSTATVR (O-GlcNAc modified, site not determined). Panel B: CID in ion trap, MS/MS of [M+3H]3+ 579.9, HDTSASTQSTPASSR (O-GlcNAc modified, site not determined). Axes: relative intensity versus m/z.]
Plate 10 Ions displaying neutral loss of the mass of O-GlcNAc in MS/MS can be fragmented in MS/MS/MS (MS3) to generate enhanced fragment ion information. In this example, CID of a doubly charged O-GlcNAc modified peptide [M+2H]2+ 1000.82 gives rise to a dominant O-GlcNAc neutral loss ion (102) at [M+2H]2+ 898.9 in MS/MS which is targeted for MS/MS/MS leading to increased fragment ion information that allows sequencing of the modified peptide as TPVDYIDLPYSSSPSR. (For Black and White version, see page 367.)
[Upper panel: MS/MS of [M+2H]2+ 1000.82. Lower panel: MS/MS/MS of [M+2H]2+ 898.9. Peptide: TPVDYIDLPYSSSPSR (O-GlcNAc modified) from the protein Ponsin. Horizontal axis: m/z.]
Plate 11 ECD MS/MS spectrum of an O-GlcNAc modified peptide from spectrin. In this LC-ECD-MS/MS spectrum from an LWAC enriched fraction of peptides from post-synaptic density both the peptide and site of modification could be determined. (For Black and White version, see page 368.)
[Spectrum: relative abundance versus m/z (200–1600); [M+3H]3+ precursor at m/z 579.26; c- and z-ion series annotated on HDTSASTQSTPASSR, with GlcNAc on a serine residue.]
CHAPTER 16
Analysis of Deamidation in Proteins
Jason J. Cournoyer and Peter B. O’Connor
Contents
1. What is Deamidation?
2. How Does Deamidation Occur?
3. Biological Significance of Deamidation
  3.1 Mass spectrometric methods for studying deamidation
  3.2 Deamidation in monoclonal antibodies
  3.3 Aging
  3.4 Amyloid diseases
  3.5 Autoimmune diseases: Lupus and celiac disease
  3.6 Cancer
  3.7 Cataracts
  3.8 Anthrax vaccine
  3.9 Other diseases
  3.10 The PIMT repair enzyme
  3.11 Chemical methods for detection of isoAsp
  3.12 Deamidation as a sample-handling artifact
4. Non-MS Based Methods for Studying Deamidation
  4.1 Proteolytic digestion
  4.2 Isoaspartyl antibody
  4.3 Reversed phase HPLC
  4.4 Ion exchange chromatography
  4.5 Electrophoresis
  4.6 Nuclear magnetic resonance (NMR)
  4.7 Edman degradation
  4.8 The PIMT enzyme
5. Mass Spectrometry Based Methods for Studying Deamidation
  5.1 Measuring deamidation by isotopic deconvolution and mass defect methods
  5.2 The diagnostic bn−1+H2O for isoaspartyl residues in CAD MS spectra
  5.3 Aspartyl versus isoAspartyl fragment ion ratios in CAD spectra
  5.4 Immonium ions of isoaspartyl residues
  5.5 Liquid chromatography/mass spectrometry (LCMS)
  5.6 Electron capture dissociation
6. Quantitation of Deamidation and Its Products
  6.1 HPLC combination methods
  6.2 CAD methods without HPLC
  6.3 ECD method without HPLC
7. Isotopic Labeling Methods
8. Summary
References

Comprehensive Analytical Chemistry, Volume 52. ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00216-X
© 2009 Elsevier B.V. All rights reserved.
1. WHAT IS DEAMIDATION?

Deamidation is the most common of all post-translational modifications (PTMs) in proteins, but it tends to be understudied for several reasons. First, since mass spectrometry (MS) is the most sensitive method for studying peptide PTMs, it is the obvious choice for the study of deamidation; however, deamidation results in a +0.984 Da shift, and the ubiquitous ion trap mass spectrometer barely has the mass resolution and frequently does not have the mass accuracy to clearly define such a shift. Second, this +0.984 Da mass shift usually generates a deamidated product whose isotopic peaks overlap with those of the non-deamidated precursor. Third, the products of deamidation are isomeric, so even if the deamidation is noticed, the ratio of the two products is frequently unclear.

Figure 1 shows the isotopic peak distributions of a peptide as it is exposed to high pH for 12 days. Before reaction, the peptide is primarily the native –NL– sequence, but it has deamidated ~50% within 3 days and is almost entirely deamidated within 12 days. The deamidated product is a mixture of Asp and isoAsp residues.

Deamidation, technically, implies loss of an amide group, usually as a neutral ammonia molecule. However, traditionally, “deamidation” refers to replacement of an amide with a hydroxyl group. Thus, the true net mass change is –NH2 (16.0188 Da) + OH (17.0027 Da) = +0.9839 Da. Detailed discussion of the chemistry of this process is available in the next section, but the essential reaction (under basic conditions) is shown in Figure 2 for asparagine residues and Figure 3 for glutamine residues. Deprotonation of the backbone amide promotes nucleophilic attack by the amide nitrogen on the side chain carbonyl, which results in loss of ammonia and formation of the cyclic succinimide intermediate. This intermediate is symmetric around the –C(O)–N–C(O)– imide group, and hydration at either of its two carbonyls generates the two products.

Whether the precursor residue is asparagine or glutamine, the two products are isomeric. Asparagine generates a mixture of aspartic and isoaspartic acid, also called α-aspartic (the natural aspartic acid) and β-aspartic acid. Glutamine generates a mixture of glutamic and γ-glutamic acids. Furthermore, aspartic acid and isoaspartic acid can spontaneously interconvert, as can the glutamic acid variants, although this occurs at a much slower rate. In both cases, the two isomers are difficult to differentiate due to their similarity in chemical structure and reactivity. This chapter focuses primarily on mass spectrometric methods that can be used to distinguish them, but Section 4 also discusses non-mass spectrometric methods.

Figure 1 A tryptic peptide from cytochrome c undergoing harsh deamidation conditions (pH 11, 37 °C) for 2 weeks. The mass spectra of the doubly charged peptide ion show a systematic shift from Asn to a mixture of Asp/isoAsp.
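The mass arithmetic above can be reproduced from standard monoisotopic masses. The following is a minimal sketch, not part of the original chapter: the function names and the 1500 Da peptide mass are illustrative assumptions, and the resolving-power estimate uses the simple m/Δm definition, which only approximates how instrument resolution is specified in practice.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
H = 1.00782503
N = 14.00307401
O = 15.99491462
C13_MINUS_C12 = 1.00335484  # spacing between adjacent carbon isotope peaks


def deamidation_shift():
    """Net mass change when a side-chain -NH2 is replaced by -OH."""
    return (O + H) - (N + 2 * H)


def mz_shift(charge):
    """Shift observed on the m/z axis for an ion of the given charge."""
    return deamidation_shift() / charge


def resolving_power_needed(peptide_mass):
    """Approximate m / delta-m needed to separate the deamidated
    monoisotopic peak from the 13C peak of the unmodified peptide."""
    gap = C13_MINUS_C12 - deamidation_shift()
    return peptide_mass / gap


if __name__ == "__main__":
    print(f"mass shift: {deamidation_shift():.4f} Da")
    print(f"m/z shift for a 2+ ion: {mz_shift(2):.4f}")
    # 1500 Da is a hypothetical peptide mass chosen for illustration.
    print(f"resolving power for 1500 Da: {resolving_power_needed(1500.0):.0f}")
```

The gap between the deamidated monoisotopic peak and the first 13C peak of the unmodified peptide is only about 0.019 Da, which is why the two isotopic envelopes overlap on lower-resolution instruments.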
2. HOW DOES DEAMIDATION OCCUR?

While this chapter is not intended to be a treatise on deamidation, it is important to note the two primary, canonical reaction mechanisms that are involved. These mechanisms constitute the “standard model” for deamidation, a term which correctly implies that these models are not necessarily correct but are useful for understanding and predicting products. A more detailed discussion is available elsewhere [1]. In particular, reaction rates as a function of the primary sequence of free, random-coil peptides have been extensively studied [2–4].
Figure 2 The canonical mechanism of deamidation involves cyclization of asparagine (Asn) to form the succinimide intermediate with loss of ammonia, followed by hydration at either amide bond to form a mixture of aspartic (Asp) and isoaspartic (isoAsp) acids. Deamidation is irreversible under physiological conditions, but the isomerization of the products occurs at a relatively slow rate.
Figure 3 Deamidation also occurs on glutamine, but at a substantially slower rate than on asparagine. The products, similarly, are a mixture of glutamic acid and γ-glutamic acid.
The deamidation rate is at a minimum around pH 5–6, with a prominent base-catalyzed reaction at physiological pH and higher and an acid-catalyzed reaction below about pH 4. The base-catalyzed reaction is shown in Figure 4. This reaction is noticeable at pH 7 in the case of rapidly deamidating sequences such as –NG– and is very rapid at pH 10 even for slowly deamidating sequences such as –NL–. Thus, it is significant under tryptic digestion conditions, which usually use ammonium bicarbonate buffer at pH ~8. Figures 4 and 5 are drawn for asparagine, but similar reaction mechanisms can be drawn for glutamine. The base-catalyzed reaction is initiated by deprotonation of the backbone amide, with the resulting negatively charged amide reacting with the side chain carbonyl to cyclize and generate the succinimide intermediate. The limiting factor in the base-catalyzed reaction is deprotonation of the backbone amide, so this reaction is accelerated under basic conditions and is essentially prohibited when the residue on the C-terminal side is proline, which has no backbone amide proton. The succinimide intermediate has an axis of symmetry (Figures 2 and 3; dotted line) if one ignores the rest of the protein.
Figure 4 Deamidation at neutral or basic pH (>6) starts by nucleophilic attack of the backbone amide nitrogen on the side chain carbonyl to form the succinimide.
Figure 5 Deamidation at acidic pH (<5) does not involve the cyclic intermediate, but instead involves direct hydrolysis of the side chain. Thus, deamidation under acidic conditions does not generate an isomeric product mixture.
Hydration of the amide bond above or below this axis of symmetry will generate the aspartic/glutamic acid and isoaspartic/γ-glutamic acid products, respectively. A chemist’s immediate assumption, therefore, is that these two products should be formed in a 50:50 mixture. In reality, the rest of the protein and steric effects influence this product ratio, with the isoaspartic acid form favored 3:1 on average; the glutamic acid variant is less studied, so there is no consensus about the standard product ratio for glutamine deamidation. The succinimide can also racemize at the α-carbon position to form a D-succinimide which, upon hydration, generates the D-amino acid forms. Distinguishing stereoisomers is sometimes possible with MS, but is beyond the scope of this chapter.

The acid-catalyzed reaction is less important at physiological pH, but can become important once a sample is acidified with formic, acetic, or trifluoroacetic acid (TFA) for electrospray mass spectrometry. Under acidic conditions, the side chain carbonyl becomes protonated to a hydroxyl, which places a positive charge at the γ-carbon/ε-nitrogen, making it highly susceptible to nucleophilic attack by water. This reaction results in only the aspartyl/glutamyl isoform; isoaspartyl/γ-glutamyl formation is not possible. A further reaction that is important at low pH is acid hydrolysis, which cleaves the backbone. Acid hydrolysis is dominant when the residue on the C-terminal side is proline.
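To make the product ratio concrete, the sketch below models deamidation as a simple first-order decay of Asn and splits the converted material 3:1 between isoAsp and Asp. This is an illustration under stated assumptions, not the authors' method: first-order kinetics, fast succinimide hydrolysis, no subsequent Asp/isoAsp interconversion, and a 3-day half-life (taken from the roughly 50% conversion in 3 days described for Figure 1) are all simplifications, and the function names are hypothetical.

```python
import math


def fraction_deamidated(t_days, half_life_days):
    """Fraction of Asn converted after t_days, assuming first-order decay."""
    rate_constant = math.log(2) / half_life_days
    return 1.0 - math.exp(-rate_constant * t_days)


def product_fractions(t_days, half_life_days, iso_to_asp=3.0):
    """Return (Asn, Asp, isoAsp) fractions at time t_days.

    iso_to_asp is the assumed isoAsp:Asp product ratio (3:1 on average
    according to the text; real values depend on sequence and structure).
    """
    converted = fraction_deamidated(t_days, half_life_days)
    iso = converted * iso_to_asp / (iso_to_asp + 1.0)
    asp = converted - iso
    return 1.0 - converted, asp, iso


if __name__ == "__main__":
    for day in (0, 3, 6, 12):
        asn, asp, iso = product_fractions(day, half_life_days=3.0)
        print(f"day {day:2d}: Asn {asn:.3f}  Asp {asp:.3f}  isoAsp {iso:.3f}")
```

Under these assumptions a 3-day half-life leaves about 6% Asn after 12 days, consistent with the "almost entirely deamidated" description of Figure 1.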
Both the acid- and the base-catalyzed mechanisms ignore the influence of the protein's higher-order structure. In general, proteins protect asparagine and glutamine residues from deamidation by keeping them hydrogen bonded in an alpha helix or beta sheet. However, there are many well-established cases (see below) where specific Asn or Gln residues are exposed in a loop region or in the active site of a protein, where deamidation occurs and alters the protein function. Frequently, this is related to aging, unfolding, and turnover of the protein, but it can also be related to disease.
3. BIOLOGICAL SIGNIFICANCE OF DEAMIDATION

Deamidation of asparagine (and to a lesser extent glutamine) residues is the most common of all PTMs [1,5]. Because the deamidation products, aspartic acid and isoaspartic acid, accumulate in the body over time, deamidation is strongly implicated in many diseases (see below). Currently, about 100 papers/year are published in biological journals regarding Asp/isoAsp formation in proteins, but to date only one systematic study has been published regarding the ratios of the rates of formation of these two products over time [6–8]. This study was a heroic effort which involved high-resolution high-performance liquid chromatography (HPLC) separation of the two aspartic acid isomers as well as synthesis of each individual peptide in both isoforms to verify their peak elution times. While deamidation itself is relatively easy to detect (due to a charge shift and a +0.984 Da mass shift), its products are difficult to analyze, in general, because of the mass spectral interference and similarity in chemical reactivity.

The standard model for the deamidation reaction (Figures 2 and 3) involves nucleophilic attack of the backbone amide nitrogen on the side chain carbonyl with loss of ammonia and formation of a cyclic succinimide [1,3,5,9–14]. Deprotonation of the backbone amide nitrogen facilitates this reaction, which is why it is accelerated under basic conditions and in polar solvent systems [15]. The succinimide is unstable in water and, due to the symmetry around the ring nitrogen, can hydrate to form a mixture of aspartic acid and isoaspartic acid, usually in a ~3:1 ratio in favor of isoaspartic acid [16–18], although this ratio is peptide sequence and conformation dependent [19–21]. The structural dependence of deamidation (for crystal structures, see Catazano et al. [22]) has been a matter of intense study over the last 3–4 decades (recently reviewed by Robinson and Robinson [1]). However, very little has been done to study the structural dependence of the formation of the two deamidation products, aspartic acid and isoaspartic acid.
3.1 Mass spectrometric methods for studying deamidation

MS has been extensively used to study deamidation because conversion of asparagine to aspartic acid and of glutamine to glutamic acid involves a +0.984 Da mass shift. However, isotopic interference usually complicates the assignment, requiring that the experimental isotopic distribution be fitted to a sum of two model isotopic distributions in order to quantify them [3] (which prohibits quantitative analysis of low abundance Asp/isoAsp products), although this fitting method is not necessary if sufficient resolution is available [23,24]. Furthermore, MS has difficulty distinguishing aspartic acid from isoaspartic acid residues because they are isomeric and, thus, have zero mass shift. Many attempts have been made to distinguish these isomers, with varying success: an old method using fast atom bombardment relied on signature immonium ions [25–27]; the abundance ratio of b/y fragments at Asp versus isoAsp varies widely but unpredictably [28,29]; and one group showed that b+H2O peaks were observed with isoAsp residues [28], although this effect appears to be sequence dependent [30]. A new method for distinguishing Asp and isoAsp residues in peptides [31,32] and proteins [33] has potential for clearly distinguishing these isomers. The most effective methods to date utilize liquid chromatography (LC) or capillary electrophoresis (CE) combined with MS and tandem mass spectrometry (MS/MS) [34–36]. CE and polyacrylamide gel electrophoresis (PAGE) are particularly useful for separating asparagine from Asp/isoAsp containing peptides and proteins due to the charge shift [36,37], and these separations can be followed up by MS, albeit with caveats described below.
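The fitting approach described above (an observed isotopic envelope modelled as the sum of two model distributions) can be sketched with a two-parameter linear least-squares fit. This is a simplified illustration, not the procedure of ref. [3]: it assumes unit-resolution data in which the deamidated envelope is the unmodified envelope shifted up by exactly one isotopic peak, and the model intensities used in the example are hypothetical values, not calculated from a real peptide formula.

```python
def fit_deamidated_fraction(observed, model):
    """Fit observed ~ a * unmodified + b * deamidated by least squares.

    model    : theoretical isotopic intensities of the unmodified peptide
    observed : measured intensities, one peak longer than `model`
    Returns b / (a + b), the estimated deamidated fraction.
    """
    if len(observed) != len(model) + 1:
        raise ValueError("observed must have one more peak than model")
    unmod = list(model) + [0.0]   # unmodified envelope
    deam = [0.0] + list(model)    # same envelope shifted by one peak

    # Normal equations for the two unknown amplitudes a and b.
    uu = sum(u * u for u in unmod)
    dd = sum(d * d for d in deam)
    ud = sum(u * d for u, d in zip(unmod, deam))
    uo = sum(u * o for u, o in zip(unmod, observed))
    do = sum(d * o for d, o in zip(deam, observed))
    det = uu * dd - ud * ud
    a = (uo * dd - ud * do) / det
    b = (uu * do - ud * uo) / det
    return b / (a + b)


if __name__ == "__main__":
    model = [0.55, 0.30, 0.11, 0.04]  # hypothetical isotopic envelope
    true_fraction = 0.3               # used only to build synthetic data
    unmod = model + [0.0]
    deam = [0.0] + model
    observed = [(1 - true_fraction) * u + true_fraction * d
                for u, d in zip(unmod, deam)]
    print(round(fit_deamidated_fraction(observed, model), 3))
```

With noisy data the estimate degrades quickly when one component is small, which is the limitation on low abundance products noted in the text.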
3.2 Deamidation in monoclonal antibodies

Stability of monoclonal antibody based therapeutics is becoming a critical concern for the US Food and Drug Administration and for the biotechnology and pharmaceutical companies that are developing these products, and among the primary modifications affecting the stability of monoclonal antibodies are deamidation and isomerization of aspartic acid residues [38]. The current methods generally involve LCMS and LCMS/MS [30,39,40], which are slow but relatively effective in detecting Asn, Asp, isoAsp, and sometimes the succinimide residues [30]. For example, Chelius et al. [30] did a detailed study of deamidation of a human immunoglobulin gamma antibody, identified four deamidation sites of the –NG– or –NN– motif, and found that b/y intensity ratios were an unreliable indicator of isoaspartic acid. Overall, LCMS and LCMS/MS methods rely entirely on isoaspartate-containing peptides eluting before aspartate-containing peptides in the chromatographic trace to differentiate them; there is, in general, no reliable mass spectrometric confirmation of this assignment.

Deamidation also affects other biotech products. For example, the hematopoietic growth factor known as stem cell factor or SCF shows a 50-fold decrease in activation if an isoAsp is formed at Asn10, but a slight increase in activity if Asp is formed there [41]. A potential thrombosis inhibitor, RGD, which is a small pseudopeptide including one aspartic acid residue, has been shown to isomerize, causing instability in its activity over time [29].
3.3 Aging

Increased deamidation is observed extensively in aged proteins, and presumably these deamidation sites are converted to a mixture of aspartic acid and isoaspartic acid. The ratio of the two products is not generally known. One of the many aging hypotheses, therefore, is that accumulation of deamidated proteins leads to cell death [4]. Examples include erythrocyte aging, in which the membrane protein 4.1b deamidates in vivo with a half-life of 41 days [42], which roughly correlates with erythrocyte lifespan. Overexpression of the protein L-isoaspartyl methyltransferase (PIMT) repair enzyme (see below) in Drosophila increased their lifespan [43,44], and PIMT knockout mice die quickly and can be rescued by PIMT gene therapy experiments [45–50].
3.4 Amyloid diseases

Deamidation (and presumably the formation of isoaspartic acid residues) is considered a possible initiation event for amyloid diseases such as Alzheimer’s disease (AD) [51–53] and Type 2 diabetes mellitus [54]. While the jury is still out on this hypothesis, there is substantial supporting information. Aspartic acids 1 and 7 of the A-beta peptide are known to be isomerized (and racemized) in amyloid plaques, with ~70% isoAsp content [55]. Deamidation of insulin leads to amyloid fibril formation [56]. Formation of isoAsp residues is correlated with increased formation of beta-sheet structures [11,57,58], and deamidation is suppressed when the asparagine is constrained within an alpha helix [59]. The hypothesis that amyloid diseases, which are characterized by misfolding of proteins followed by aggregation and formation of fibrils, could be caused by formation of isoaspartic acid is very reasonable at this point.
3.5 Autoimmune diseases: Lupus and celiac disease

Lupus is an autoimmune disease that is correlated with a high level of isoAsp formation on histone H2B, and knockdown studies of the isoaspartic acid repair enzyme PIMT (see below) have shown that histone H2B will accumulate ~1% isoAsp per day without PIMT. It is speculated that the isoAsp-containing H2B generates the immune response which causes the disease. In a systematic and fundamental study, McAdam et al. [60] showed that T-cells will recognize deamidated hen egg lysozyme but not native peptides, thus demonstrating that an immune response can be activated by deamidation. Mazzeo et al. [61] showed that the protein transglutaminase could deamidate 19 sites in alpha-gliadin. These modifications are likely responsible for the autoimmune response noted in celiac disease. Sjostrom et al. [62] have also correlated deamidation with celiac disease.
3.6 Cancer

Deamidation of two specific asparagine residues in Bcl-XL is a critical apoptotic switch in a range of tumor cells [63,64]. Normal cells suppress the apoptotic signal by suppressing deamidation of Bcl-XL, which is an apoptosis inhibitor. DNA-damage-induced removal of p53 and Rb, the two proteins that suppress Bcl-XL deamidation, in cancer cells therefore leads to cell death. Some tumor cells resist apoptosis by reducing the Bcl-XL deamidation rate or by increasing Bcl-XL production [65,66].
3.7 Cataracts

Cataracts are often associated with PTMs in crystallins in the eye. One of the most common of these PTMs is deamidation [67–73]. For example, Lapko et al. [68] found five Asn and nine Gln deamidation sites in γS-crystallin and correlated their positions with solvent exposure. Harms et al. [70] noted that deamidation of Gln146 in βB1-crystallin resulted in greater aggregation. Interestingly, crystallins seem to show more glutamine deamidation than other proteins, leading to speculation that an enzyme may be involved, as was shown above with transglutaminase. Another protein associated with cataracts, aquaporin, showed two sites of deamidation and one site of aspartic acid isomerization [74]. In the latter case, the b/y pattern found by Gonzalez et al. [28] was corroborated, and peptides were synthesized to confirm the assignment of Asp versus isoAsp residues.
3.8 Anthrax vaccine

A newly developed Bacillus anthracis vaccine is primarily composed of a recombinant 83 kDa protein called Protective Antigen (rPA) [75–79]. This protein deamidates readily at 7 of 68 asparagine residues, increasing isoform complexity in electrophoretic separations, and this complexity increases with in vitro aging experiments. Aged rPA shows a marked decrease in activity.
3.9 Other diseases

Deamidation has been observed in osteopontin, a sialic acid rich glycophosphoprotein, which is implicated in many bone diseases and in the formation of calcium deposits [80]. Similarly, calbindin’s calcium homeostasis capacity in the central nervous system is modulated by its deamidation state [81]. Beutow et al. [82] have noted, using a series of mutation and loop deletion studies, that cytotoxic necrotizing factor 1 (CNF1), which is implicated in urinary tract infections and neonatal meningitis, allows Escherichia coli entrance to the cell by specifically deamidating one Gln residue in a series of skeletal proteins, RhoA, Rac1, and Cdc-42. This result has also been observed by other groups [83–87]. Erythrocyte aging has been shown to be associated with an increase in isoaspartic acid, which is also associated with oxidative stress using mild oxidants such as H2O2 [88,89]. A disease associated with weakened red blood cell walls, hereditary spherocytosis, is correlated with enhanced methylation (presumably at isoAsp or D-Asp) on membrane proteins [90]. Finally, a number of other biochemical effects have been noted that depend on deamidation and aspartic acid isomerization. For example, cAMP-dependent protein kinase activity is dependent on Asn–Asp conversion [91], serine hydroxymethyltransferase deamidates and isomerizes in vivo [92], and human phenylalanine hydroxylase deamidation at Asn32 appears to have a regulatory function and increases the rate of phosphorylation of this enzyme [93].
3.10 The PIMT repair enzyme

One of the most important developments to date in the field of isoaspartic acid analysis is the discovery of PIMT (also sometimes called protein carboxyl methyltransferase, PCMT) [94,95]. This enzyme selectively methylates L-isoaspartyl residues, which promotes their cyclization back to the succinimide with the loss of methanol [10,96]. Over time it converts isoAsp residues (and D-Asp residues) to L-Asp residues, making it effectively an isoAsp repair enzyme [9,97]. Due to its selectivity for isoAsp residues, when this enzyme is combined with a radioactive methyl donor, the radioactivity can be used to trace formation of isoaspartic acid. This is the principle of the IsoQuant kit sold by Promega [98]. PIMT occurs naturally in almost all mammalian tissues, and studies in the literature concerning its substrates are too numerous to cite. As one example, calmodulin aged at pH 7.4 and 37 °C accumulates 1.2 mol of methylation sites per mol of protein, with one NG site as the major contributor, two DG sites and one NG site as additional prominent contributors, and seven more aspartic acid sites showing isomerization [99].

Particularly interesting, however, were a set of studies involving knockdown of PIMT levels in rat PC12 cells using PIMT inhibitors [100,101] and PIMT-deficient knockout mice [45,102]. In the former case, isoAsp accumulated when PIMT was knocked down and resulted in an immune response to histone H2B, which may be the cause of autoimmune diseases such as lupus. The knockout case was even more dramatic, with knockout mice suffering epileptic seizures and death within 1–2 months. In these cases isoAsp levels in brain tissue extracts were ~5–10× higher than in their heterozygous littermates (which could produce PIMT). Gene therapy techniques used to replace PIMT in knockout mice partially improved the symptoms, but only partially repaired the isoAsp residues in damaged proteins, which implies that once isoAsp forms, some proteins cannot be repaired by PIMT [48]. Clearly PIMT is a critical protein in central nervous system function. PIMT-based assays (including one using LC to detect a non-radioactive methanol [103]) are currently used extensively to check for isoAsp, but they can determine only the presence, not the position, of isoAsp residues.
3.11 Chemical methods for detection of isoAsp

Edman degradation fails at isoaspartic acid residues, a failure that can be used to assign isoAsp positions. The Asp-N digestion enzyme also fails at isoaspartic acid [104], so the different peptide patterns produced by this enzyme can be used to differentiate α- from β-aspartic acid. In addition, formation of the succinimide can be exploited to cleave the peptide backbone [105]. While such chemical methods work, they are time consuming, usually require at least microgram quantities of protein, and are subject to side reactions. Newer methods which are faster, more reliable, and more sensitive will greatly advance the field.
3.12 Deamidation as a sample-handling artifact

While deamidation, as discussed, has many in vivo consequences, it can also occur in vitro during sample handling and analysis if care is not taken [106]. Deamidation is frequently observed as an artifact of tryptic digestion, since ammonium bicarbonate buffers are often used at pH 8. Chelius et al. [30] observed that overnight digestion at pH 8 yielded 30% deamidation of labile sites in an antibody, while a 4-h digestion yielded no detectable deamidation. Use of deglycosylation enzymes [107] and use of beta elimination reactions [108] both cause deamidation, and separation of proteins on PAGE gels can sometimes cause deamidation as well [93]. Liu et al. [109] showed that sample handling in 18O-labeled water could be used to distinguish “real” deamidation from artifacts, because deamidation that occurs in the labeled water incorporates 18O and gives a +3 Da shift instead of the expected +1 Da shift. Finally, racemization can occur during deamidation as well [110]. Clearly deamidation can occur if proteins or peptides with labile sites are exposed to basic environments for any period of time. It is, therefore, imperative that care be taken to minimize artifactual deamidation and to develop methods which allow deamidation products that are real to be distinguished from those that are formed during sample handling.
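The +1 Da versus +3 Da distinction follows from replacing the incorporated 16O with 18O. The sketch below computes both expected shifts from standard monoisotopic masses and assigns an observed shift to one of them. The function names, the labels, and the 0.02 Da tolerance are assumptions made for illustration; they are not taken from ref. [109].

```python
# Monoisotopic masses (Da).
H = 1.00782503
N = 14.00307401
O16 = 15.99491462
O18 = 17.99915961


def expected_shift(in_18o_water):
    """Mass change for -NH2 -> -OH, with the oxygen drawn from the solvent."""
    oxygen = O18 if in_18o_water else O16
    return (oxygen + H) - (N + 2 * H)


def classify(observed_shift, tol=0.02):
    """Assign an observed mass shift; tol (Da) is an assumed tolerance."""
    if abs(observed_shift - expected_shift(False)) <= tol:
        return "deamidated in normal water"
    if abs(observed_shift - expected_shift(True)) <= tol:
        return "deamidated in 18O water"
    return "unassigned"


if __name__ == "__main__":
    print(f"16O shift: {expected_shift(False):.4f} Da")
    print(f"18O shift: {expected_shift(True):.4f} Da")
    print(classify(2.99))
```

If the sample is moved into 18O water before digestion, sites showing the smaller shift were already deamidated beforehand, whereas sites showing the larger shift deamidated during handling.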
4. NON-MS BASED METHODS FOR STUDYING DEAMIDATION

4.1 Proteolytic digestion

Kameoka used endoproteinase Asp-N and MS to detect deamidation of asparagine and isomerization of aspartyl residues in a protein [111]. Endoproteinase Asp-N is a residue-specific protease that cleaves on the N-terminal side of an L-aspartyl residue and not at D-aspartyl or D/L-isoaspartyl residues. Lysozyme with an isomerized aspartyl residue and lysozyme with a deamidated asparagine residue were mixed separately with 15N-labeled lysozyme, digested with Asp-N, and analyzed by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS. Modified sites were identified by the presence of new peptides (asparagine deamidation) or the lack of expected peptides (aspartyl isomerization) when compared to those generated from digestion of the 15N-labeled protein. Although most mass spectrometers can detect deamidation (+0.984 Da mass shift), protein digestion with Asp-N is a useful and simple technique for detecting aspartyl isomerization. Also, Asp-N could be used to differentiate aspartyl and isoaspartyl peptides, since only one of the two forms is an acceptable substrate for the enzyme.

Other proteases have been used to detect isoaspartyl residues in proteins and peptides because of the uncommon peptide linkage associated with this form. The tryptic digests of the separated aspartyl and isoaspartyl forms of ribonuclease A showed differences in their HPLC peptide maps [112,113]. The differences were due to an apparent missed cleavage in the isoaspartyl chromatogram. The isoaspartyl residue, D67, was found to be adjacent to the missed cleavage site (–KD67G–). Therefore, the abnormal linkage associated with the isoaspartyl residue was presumed to be the reason why the K66 site was resistant to cleavage by trypsin.
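The Asp-N logic described above (new peptides after Asn deamidation, missing peptides after Asp isomerization) can be illustrated with a small in silico digest. This is a toy sketch: the function name, the `blocked_positions` parameter, and the sequence AKDGSNGTDLR are hypothetical and are not taken from ref. [111]; real Asp-N specificity is also less absolute than modelled here.

```python
def asp_n_digest(sequence, blocked_positions=()):
    """Cleave on the N-terminal side of every 'D' in a one-letter sequence.

    blocked_positions lists 0-based indices of Asp residues that are not
    substrates (for example isoAsp or D-Asp), so no cut is made there.
    """
    blocked = set(blocked_positions)
    cut_sites = [i for i, residue in enumerate(sequence)
                 if residue == "D" and i > 0 and i not in blocked]
    peptides = []
    start = 0
    for site in cut_sites:
        peptides.append(sequence[start:site])
        start = site
    peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    native = "AKDGSNGTDLR"  # hypothetical sequence for illustration
    print("native:      ", asp_n_digest(native))
    # Asp at index 2 isomerized to isoAsp: an expected cleavage is lost.
    print("isoAsp at D3:", asp_n_digest(native, blocked_positions=[2]))
    # Asn at index 5 deamidated to L-Asp: a new cleavage site appears.
    deamidated = native[:5] + "D" + native[6:]
    print("N6 -> D:     ", asp_n_digest(deamidated))
```

Comparing the peptide lists against the native digest reproduces the two signatures: isomerization removes expected peptides, while deamidation to L-Asp creates new ones.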
4.2 Isoaspartyl antibody

Antibodies raised against isoaspartyl-containing peptides using the MAP procedure (multiple antigen peptide system [114]) have been successful in identifying racemized aspartyl residues and deamidated asparaginyl residues in peptides and proteins, particularly for the Aβ peptide associated with AD [115–117]. The antibodies raised are epitope specific, so each suspected isoaspartyl residue requires a specific antibody and thus a synthetic peptide for the MAP procedure. The antibodies raised against the synthetic peptides Aβ1-42(isoD7) and Aβ1-42(isoD23) were used to immunostain brain tissue from six AD brains and from non-AD brains that were used as control samples [116]. The Aβ1-42(isoD7) antibody failed to be of use because it stained both AD samples and control samples, but the Aβ1-42(isoD23) antibody was found to preferentially stain highly aggregated forms of Aβ1-42 in the amyloid-bearing vessels and the core of mature plaques. Therefore, isomerized D23 was suggested to play a role in Aβ1-42 sedimentation.

In another example, the distribution of isoaspartyl residues in the postmortem brain of a 65-year-old patient with cerebral amyloid angiopathy (CAA) was studied with the Aβ1-42(isoD7) and Aβ1-42(isoD23) antibodies [117]. The patient had a D23N mutation within the Aβ sequence (Iowa-type) that was suspected to undergo deamidation to the isoD23 form more easily than the D23 form, possibly triggering an early onset of fibrillogenesis in blood vessels. Immunostaining of tissue with the anti-isoaspartyl antibodies detected isoD7 in vascular and parenchymal deposits, but isoD23 was detected only in vascular deposits, suggesting that deamidation at the mutation site could have played a part in premature deposition of the Aβ peptide in this case.
4.3 Reversed phase HPLC
Separation of the asparaginyl, aspartyl, and isoaspartyl forms of peptides can be accomplished by RP-HPLC. Chromatography is an appealing methodology for separation because it can be used in combination with other techniques that assist in identifying the separated species. The eluent can be infused directly into a mass spectrometer or fractions can be collected so that a more detailed analysis by MS, Edman degradation, or PIMT methylation can be performed. Many studies have shown that the deamidation products elute in the following order: isoaspartyl, aspartyl, and then the aminosuccinyl form of the peptide [30,33,40,54,91,118–121]. This trend has been used to identify each peptide form [30], yet subsequent analysis by other techniques provides a more dependable result. When developing a method for separation, many factors need to be considered including the type of gradient, mobile phase composition, and column selection. Typically, a linear gradient is run on an RP-HPLC platform equipped with a C18 column, using a mobile phase system that consists of an acidified aqueous phase (mobile phase A), such as 0.1% TFA, and an organic phase (mobile phase B), such as acetonitrile; separation is achieved by varying the gradient. This approach was useful for measuring the deamidation rates of peptides and the enzyme kinetics and repair rates associated with the protein isoaspartyl
Analysis of Deamidation in Proteins
methyltransferase. A linear gradient of 0–40% B in 40 min was adequate to separate the isoforms of the peptide VYPDGA, corresponding to residues 22–27 of deamidated ACTH, wherein the isoaspartyl form was discovered to be an excellent substrate for PIMT [118]. Also, the system was used to follow the repair process of the small peptide, WMisoDF, to the aspartyl version by PIMT via time course plots based on the abundance of the intermediates and by-products in the reaction mixture [119]. All five products involved in the repair process (D/L-aspartyl, D/L-isoaspartyl, and succinimide versions of WMDF) were separated by a gradient of 20–40% in 40 min. Both the amino acid composition and the length of the peptide affect the gradient necessary for separation of their isomers, each to varying degrees. For example, although the peptide GFDLDGGGVG contained nearly twice as many residues as VYPDGA, they required the same chromatographic conditions to separate their isoforms [118,121]. Therefore, the unpredictable behavior of peptides makes predicting RP-HPLC conditions almost impossible and often requires custom gradients to be developed for every set of peptides. A general method can be developed but may require an extremely long gradient so that all possible deamidation sites are sufficiently separated. The products of three deamidation sites in a recombinant monoclonal antibody were separated in one HPLC run using a gradient of 0–65% B in 195 min [30]. Other types of gradients used for separation include step and concave gradients and even isocratic conditions. Although C18 is a popular choice for isoaspartyl/aspartyl separations, C8 columns can also separate the peptide isoforms, although a shallower gradient may be necessary [99]. Finally, acidic modifiers such as TFA, formic acid, and acetic acid provide adequate retention and peak shape, but changing the pH of the mobile phase can help shift the retention times of interfering species in the chromatogram.
For example, using a mobile phase system at pH 6 helped separate four forms of a peptide that was both deamidated and isomerized [99]. Since the aspartyl/isoaspartyl forms are ionized at pH 6, the native, non-deamidated form is shifted to a higher retention time away from the ionized forms therefore providing a less complex chromatogram.
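To compare how shallow the linear gradients quoted in this section are, the mobile phase composition can be written as a simple function of time. The sketch below is illustrative only: the function names are invented here, and it assumes an ideal linear ramp with no dwell volume or re-equilibration step.

```python
def percent_b(t_min, start_b, end_b, duration_min):
    """%B at time t_min for an ideal linear gradient, held at end_b afterwards."""
    if t_min <= 0:
        return float(start_b)
    if t_min >= duration_min:
        return float(end_b)
    return start_b + (end_b - start_b) * t_min / duration_min


def gradient_slope(start_b, end_b, duration_min):
    """Gradient steepness in %B per minute."""
    return (end_b - start_b) / duration_min


# Gradients quoted in the text: (start %B, end %B, duration in min).
gradients = {
    "VYPDGA isoforms": (0, 40, 40),
    "WMDF repair products": (20, 40, 40),
    "monoclonal antibody, three sites": (0, 65, 195),
}
for name, (start, end, duration) in gradients.items():
    print(f"{name}: {gradient_slope(start, end, duration):.3f} %B/min")
```

On this measure the 195 min general method is the shallowest of the three, which is consistent with the remark that a general method may need an extremely long gradient.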
4.4 Ion exchange chromatography
Ion exchange chromatography separates species based on their ionic charge and therefore can be used to separate the native from the deamidated form of a protein, since asparagine/glutamine is converted to its ionizable acid homologue. Mobile phase systems used for separation should have a pH that ensures deprotonation of the acidic group so that the difference in overall charge between the native and deamidated forms produces differential binding to the stationary phase. Bound proteins are eluted with a gradual increase in the concentration of a counter ion, i.e. salt. Both cation exchange and anion exchange can be used to differentiate the two forms. The deamidated form of the protein binds more strongly in anion exchange chromatography, therefore eluting later than the native form, and vice versa for cation exchange analysis. Cation exchange chromatography has been used to separate the forms of partially deamidated proteins [41,122], including ribonuclease A [113,123–124] and a monoclonal antibody [40], and anion exchange has been used for γS-crystallin [68] and protective antigen [75]. Separation of
isoglutamyl/isoaspartyl and glutamyl/aspartyl by ion exchange is difficult, since the pKa values of the isomers are so similar. The kinetics of isoaspartyl and aspartyl formation from deamidating ribonuclease A were measured using a very shallow KCl gradient on a cation exchange system [6].
4.5 Electrophoresis
Gel electrophoresis can be used to analyze proteins affected by deamidation since both the isoelectric point and shape are altered. Isoelectric focusing (IEF) can separate based on the change in charge and native gel electrophoresis can discriminate protein forms based on the change in shape. The more traditional method, SDS-PAGE (sodium dodecyl sulfate polyacrylamide gel electrophoresis), is not used since the molecular weight resolution on gels is insufficient to discriminate between the native and deamidated forms. IEF is performed on a pH gradient gel that allows proteins to migrate based on their isoelectric point (pI) in the presence of an applied electric field. Performing separations using buffers with a pH greater than the pKa of the aspartyl or isoaspartyl side chain allows separation of the deamidated from non-deamidated forms since their respective pIs are different. The method is especially useful for separating multiple forms of a protein due to several deamidations; the multiple forms of stem cell factor [41], protective antigen [76], and phenylalanine hydroxylase [93] were separated by IEF. Native gel electrophoresis is also a useful electrophoretic technique for analyzing deamidated proteins since it can also separate the isomeric products of deamidation. The rationale for separation is that formation of aspartyl/isoaspartyl residues affects the shape of the protein when compared to the native form. For example, the three forms of partially deamidated calmodulin (N97 to isoD/D97) were separated by native gel electrophoresis [125]. The isoD97 form migrated the slowest, followed by the D97 and then the N97 form. The larger molecular radii of isoD97 and D97 account for their slower migration; formation of isoaspartyl affects the shape of the protein more than the aspartyl form. Other proteins separated by native gel electrophoresis include the multiple deamidated forms of calbindin [81] and protective antigen [75].
Capillary electrophoresis (CE) has been used to separate the deamidation products and their respective stereoisomers. In CE, analytes are separated by their charge and interaction with the surface silanol groups of the column in the presence of a strong electric field and a pH-buffered mobile phase. All four isomers (D/L-aspartyl and D/L-isoaspartyl) from aspartyl isomerization in a tripeptide were successfully separated on a silica capillary column using sodium phosphate buffer solution at pH 3 [36]. Contrary to the elution order found in reversed-phase HPLC analysis, the isoaspartyl peptides were more strongly retained on the capillary since the pKa of their side chain is predicted to be lower than that of the aspartyl side chain. Also, the products of D130 isomerization from aged and digested rHGH were separated by CE using similar separation conditions [126].
4.6 Nuclear magnetic resonance (NMR)
2D NMR has been used to differentiate the isoaspartyl from aspartyl residues in peptides [127,128]. Scalar coupling of the 1H spin systems between the backbone nitrogen and the Cα, Cβ, and amide nitrogen of the neighboring amino acid
residue in NOESY 2D 1H NMR spectra allows the approximate distances between these groups to be determined for large molecular structures such as proteins [129]. For normal residue linkages, the Cα–N or N–N couplings are typically strong while Cβ–N couplings tend to be weaker. However, isoaspartyl linkages have a methylene group inserted into the backbone that changes the magnitude of these couplings. For example, 2D 1H NMR was used to detect an isoaspartyl residue (isoD56) within a 30-residue tryptic peptide from calbindin D9k that deamidated during purification [127]. The distance between the nitrogen of G57 and the Cβ of isoD56 (backbone methylene) was found to be less than 3.5 Å, while the distances of Cα and N (of isoD) from N (of G57) were 4.7 and 6 Å, respectively. This trend describes spatially what is attributed to an isoaspartyl residue in the peptide backbone. Additionally, the data showed that the backbone region containing an isoaspartyl residue was in a more extended helical shape compared to the aspartyl version. Another study used 2D 1H NMR to differentiate the isoforms of a 15-residue peptide using the presence of the Cβ–N coupling found in the isoaspartyl spectrum that was not found in the aspartyl spectrum [128]. The advantage of using 2D 1H NMR is the additional structural information acquired, which provides insight into how the presence of an isoaspartyl residue affects the higher-order structure of the region. However, the disadvantage is that each experiment requires a large amount of sample, which may make the method inapplicable to biological experiments; the samples in the NMR studies described were in the range of 10⁻³ M, which is close to the precipitation concentration of most proteins.
4.7 Edman degradation
Edman degradation, a chemical method used to sequence N-terminal α-linked peptides, was found to be blocked by an isoaspartyl residue when Smyth et al. attempted to sequence residues 11–18 of pancreatic ribonuclease [130]. For normal α-linked peptides, the addition of phenylisothiocyanate to the amino group facilitates nucleophilic attack of the carbonyl group by the thiol group, ultimately resulting in cleavage of the residue and generation of a peptide with n-1 residues and a new amino terminus. When an isoaspartyl residue is the next residue, the carbonyl group is beyond the reach of the attacking thiol due to the inserted methylene group, and the process is interrupted. The failure of Edman degradation at isoaspartyl residues has been used to detect these residues in proteolytic peptides. Furthermore, products of deamidation separated by HPLC and determined to have the same masses and sequences by MS/MS or amino acid hydrolysis can be subjected to Edman degradation for differentiation. This method has been useful for experiments involving proteolytic peptides from deamidated proteins [131,132], including ribonuclease A [112,130], calmodulin [99], and human stem cell factor [41], and isomerized Aβ peptide from AD brains [133].
4.8 The PIMT enzyme
The enzyme PIMT, which selectively methylates isoaspartyl residues, can be used for analyzing asparaginyl deamidation and aspartyl isomerization by detecting the
resulting isoaspartyl residues. Analytical methods employing PIMT use radioactively labeled S-adenosyl-L-methionine (AdoMet, 14C or 3H) as the methyl donor that is selectively incorporated onto the isoaspartyl carboxylate group. A popular technique for measuring the isoaspartyl content in peptides by radiolabeling with PIMT is the vapor diffusion method. In this method, [14C] or [3H] methanol released by quenching the methylation reaction with a mild base is measured by liquid scintillation [118,134]. The methylation step is carried out for 30 min to 1 h and then immediately quenched with a mild base, generating radiolabeled methanol. The opened reaction vial is then placed in a sealed tube containing a scintillation cocktail. The [14C] or [3H] methanol vapor diffuses from the reaction solution (or spotted filter paper) into the scintillation cocktail, which is then counted to determine the quantity of isoaspartyl residues versus radiolabeled methanol standards. For proteins, the vapor diffusion method is used, but the protein is precipitated with trichloroacetic acid after methylation and before treatment with base to remove unreacted [14C] or [3H] AdoMet [135,136]. Isoaspartyl residues in cells (E. coli [137], erythrocytes [90], and oocytes [136]) and tissue homogenates (Drosophila [43], Caenorhabditis elegans [138], and mice [45,46]), proteins [101,125,139–141], and peptides [93,99,118,121,122,133,142,143] have been characterized using the vapor diffusion method. Isoaspartyl detection using PIMT is often used in combination with separation techniques, such as HPLC and gel electrophoresis, to localize the isoaspartyl residues in a mixture of proteins or peptides. The vapor diffusion method can be used with HPLC to provide two concurrent chromatograms, one based on UV absorbance and the other derived from radioactive counts of collected fractions. The peptides or proteins can be labeled after being fractionated by HPLC or before separation.
Peptides from aged proteins [139–141], such as calmodulin [99], and a mixture of histones isolated from the nuclei of mouse brain tissue [101,144] were analyzed using HPLC and PIMT methylation. Also, a mixture of proteins can be reacted with PIMT in the presence of radiolabeled AdoMet, separated by gel electrophoresis, and then subjected to fluorography [101,136,139,143]. The fluorogram of the gel reveals which proteins, based on molecular weight, contain isoaspartyl residues. The relative intensity of the spots can also provide some information on the relative amount of isoaspartyl formation between the proteins. Finally, the Isoquant kit developed by Promega also uses the PIMT enzyme and HPLC to detect and measure the abundance of isoaspartyl residues by measuring the change in UV absorbance of AdoMet before and after methylation [103]. Localization is not possible, but the convenience of the kit helps to quickly and easily provide some information on isoaspartyl content.
5. MASS SPECTROMETRY BASED METHODS FOR STUDYING DEAMIDATION
5.1 Measuring deamidation by isotopic deconvolution and mass defect methods
The isotopic deconvolution method for measuring deamidation uses the +0.984 Da shift in the mass spectrum associated with the conversion of –NH2 to –OH. The
corresponding peaks in the spectrum are a combination of two overlapping species, resulting in an atypical isotopic pattern. Assuming the peak intensities are additive, the pattern can be deconvolved quantitatively into its two separate forms by fitting the theoretical isotopic patterns of each contributing species to the experimental pattern. Although a considerable amount of signal averaging is advised to obtain a pattern that accurately represents the sample composition, this method is still much faster and uses less sample than HPLC. Robinson and Robinson efficiently performed a comprehensive analysis of the deamidation rates of asparaginyl and glutaminyl residues in over 700 peptides that required thousands of analyses using the isotopic deconvolution method on data obtained from a quadrupole mass spectrometer [145,146]. The extent of deamidation in proteins has also been determined by analyzing the proteolytic peptide mixture on a MALDI-TOF mass spectrometer. The advantage of MALDI is that ionization generates only singly charged species, therefore producing a mass spectrum of minimal complexity, an important benefit considering the number of proteolytic peptides that can be obtained from a protein digest. For example, the extent of deamidation of human phenylalanine hydroxylase [93,147], calbindin [81], and protective antigen [76] was determined using the isotopic deconvolution method on data from a MALDI-TOF mass spectrometer. Isotopic deconvolution has also been used on high-resolution data from electrospray ionization Fourier transform mass spectrometry (ESI-FTMS) analysis of deamidated ribonuclease A [148] and crystallins from human eye lenses (A. B. Robinson, Private communication). Fragments obtained from top–down analysis of these proteins allowed the localization and extent of deamidation to be measured at several sites, therefore eliminating the need for digestion.
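The fitting step of isotopic deconvolution can be sketched as a one-parameter least-squares problem. The example below is a minimal illustration, not the procedure used in the cited studies: it assumes a singly charged ion, treats the deamidated pattern as the unmodified pattern shifted up by one isotopic peak, and uses a hypothetical theoretical pattern and simulated intensities.

```python
def deamidated_fraction(observed, theoretical):
    """Estimate the deamidated fraction f by least squares.

    Model: observed ~ (1 - f) * P + f * shift(P), where P is the theoretical
    isotopic pattern of the unmodified peptide and shift(P) is the same
    pattern moved up by one isotopic peak (the ~1 Da deamidation shift,
    which is not resolved from the isotope spacing at low resolution).
    """
    if len(observed) <= len(theoretical):
        raise ValueError("observed needs at least one more peak than theoretical")
    pad = len(observed) - len(theoretical)
    total_p = sum(theoretical)
    total_obs = sum(observed)
    p = [v / total_p for v in theoretical] + [0.0] * pad
    shifted = [0.0] + p[:-1]
    obs = [v / total_obs for v in observed]
    diff = [s - q for s, q in zip(shifted, p)]
    resid = [o - q for o, q in zip(obs, p)]
    f = sum(r * d for r, d in zip(resid, diff)) / sum(d * d for d in diff)
    return min(1.0, max(0.0, f))  # keep the estimate physically meaningful


# Hypothetical isotopic pattern and a simulated mixture that is 30% deamidated.
pattern = [0.55, 0.30, 0.11, 0.04]
mixture = [0.385, 0.375, 0.167, 0.061, 0.012]
print(round(deamidated_fraction(mixture, pattern), 3))
```

In practice the theoretical pattern would come from the elemental composition of the peptide, and noisy spectra would call for the signal averaging mentioned above before the fit is trusted.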
The mass defect method for measuring deamidation relies on the ability to resolve the 19 mDa mass difference between the A+1 isotope of a non-deamidated peptide and the monoisotopic peak of the deamidated form [23]. Once resolved, the relative intensity of the two peaks corresponds to the extent of deamidation, assuming the ionization and detection efficiencies of the two species are the same. FTMS can attain the high resolution necessary to resolve the two forms. For example, a resolution of 280,000 was achieved on an FTMS in order to separate the three deamidated forms of a 16-residue synthetic peptide [23]. The mass defect method can also be used on fragments generated by top–down fragmentation (A. B. Robinson, Private communication; [24]). For example, wild-type and mutant (Q162E) forms of a protein were mixed at known ratios and subjected to top–down fragmentation by IRMPD [24]. Both forms of the fragments containing the Glu/Gln162 site were resolved from one another, and the mixture composition revealed by mass defect analysis was found to be close to the expected value.
5.2 The diagnostic b(n-1)+H2O for isoaspartyl residues in CAD MS spectra
In 1992, Papayannopoulos [149] and Carr [150] both reported a b(n-1)+H2O (n is the position of isoaspartyl or aspartyl) fragment ion in the collisionally activated dissociation (CAD) spectrum of peptides with an isoaspartyl residue that was not
found in the spectrum of the aspartyl form of the same peptide. The fragment ion was also used to differentiate the isoaspartyl from aspartyl forms of a deamidated proteolytic peptide, separated by HPLC, from hirudin, an anticoagulant peptide [151]. The mechanism involves migration of the –OH from the isoaspartyl side chain to the n-1 carbonyl group via an oxazolidone intermediate that rearranges to generate the b(n-1)+H2O ion and an aldinine fragment. Since the isoaspartyl side chain resembles the C-terminus, generation of b(n-1)+H2O for isoaspartyl residues is suggested to resemble the fragmentation channel shown to occur in the low-energy MS/MS spectra of peptides wherein a C-terminal hydroxyl rearrangement generates b(m-1)+H2O fragment ions (m is the length of the peptide) [152,153]. Schindler et al. showed evidence of the hydroxyl transfer mechanism for isoaspartyl residues by performing MS/MS on an 18O-labeled peptide [151]. The labeled peptide was synthesized by incubating the succinimide derivative of the peptide in 18O water, so that upon hydrolysis, one of the equivalent oxygens of the carboxyl group of the isoaspartyl side chain was labeled with 18O. The low-energy MS/MS of the peptide showed the b(n-1)+H2O ion being split in a 1:1 ratio of 16O:18O, thus proving that there is a migration of –OH from the side chain to the diagnostic fragment. A study in 2000 by Gonzalez et al. showed that the b(n-1)+H2O fragment ion and its complement, the y(l-n)-46 fragment ion, can be used to differentiate isoaspartyl from aspartyl residues in sets of synthetic peptides, including D7 and D23 of β-amyloid peptide analogues, and to detect isoaspartyl residues in deamidated tryptic peptides from recombinant proteins [28]. The data also demonstrated that the b(n-1)+H2O intensity is much larger if a basic amino acid or the N-terminus is on the N-terminal side of the isoaspartyl residue.
Intermolecular interaction between the side chain of the basic and isoaspartyl residues is believed to facilitate rearrangement to generate the diagnostic ions. A severe limitation to using the b(n-1)+H2O ion to detect isoaspartyl residues may occur when analyzing tryptic peptides, since they should only have C-terminal basic residues unless there is incomplete digestion or there is a histidine present within the tryptic peptide. The complement y(l-n)-46 fragment ion can be used to detect isoaspartyl residues in the case of tryptic peptides, but this fragment is usually much less abundant than the b(n-1)+H2O fragment ion (the largest y(l-n)-46/y(l-n) intensity ratio reported was 0.039). Nonetheless, the isoaspartyl residues in two tryptic peptides were characterized by the presence of the y(l-n)-46 ions.
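The 19 mDa figure can be checked from monoisotopic atomic masses, and the same numbers give a rough idea of the resolving power involved. This is a back-of-the-envelope sketch: it assumes the A+1 peak is dominated by a single 13C substitution, uses rounded standard atomic masses, and the simple m/Δm estimate ignores peak shape and relative abundance.

```python
# Monoisotopic atomic masses in Da (rounded standard values).
H = 1.00782503
N = 14.00307401
O = 15.99491462
C13_MINUS_C12 = 1.00335484  # spacing produced by one 13C substitution

# Deamidation converts -NH2 to -OH: gain one O, lose one N and one H.
DEAMIDATION_SHIFT = O - N - H

# Gap between the A+1 peak of the native form and the monoisotopic peak
# of the deamidated form.
MASS_DEFECT_GAP = C13_MINUS_C12 - DEAMIDATION_SHIFT


def required_resolving_power(mz, charge=1):
    """Rough m/(delta m) needed to separate the two peaks at a given m/z."""
    return mz / (MASS_DEFECT_GAP / charge)


print(f"deamidation shift: {DEAMIDATION_SHIFT:.4f} Da")
print(f"gap to resolve:    {MASS_DEFECT_GAP * 1000:.1f} mDa")
print(f"resolving power at m/z 1000, 2+: {required_resolving_power(1000, 2):,.0f}")
```

Because the gap on the m/z axis shrinks with charge state, the resolving power needed grows in proportion to both m/z and charge, which is why the method is tied to FTMS-class instruments in the text.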
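For readers who want to know where to look for the diagnostic ion, its m/z follows from ordinary b-ion arithmetic: the b fragment that ends one residue before the isoaspartyl position, plus one water. The sketch below assumes a singly protonated, unmodified peptide and uses standard monoisotopic residue masses; the function names are invented for this illustration.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_ion_mz(sequence, k):
    """m/z of the singly charged b ion containing the first k residues."""
    return sum(RESIDUE_MASS[aa] for aa in sequence[:k]) + PROTON


def diagnostic_b_plus_water_mz(sequence, n):
    """m/z of the b fragment ending at residue n-1, plus H2O.

    n is the 1-based position of the (iso)aspartyl residue. IsoAsp and Asp
    have the same mass, so the residue is written as 'D' in the sequence.
    """
    if n < 2:
        raise ValueError("the diagnostic ion needs at least one residue before position n")
    return b_ion_mz(sequence, n - 1) + WATER


# VYPDGA is the ACTH-derived peptide discussed in Section 4.3; Asp is residue 4.
print(round(diagnostic_b_plus_water_mz("VYPDGA", 4), 4))
```

The calculation only predicts the position of the ion; whether it appears with useful intensity depends on the sequence effects described above.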
5.3 Aspartyl versus isoaspartyl fragment ion ratios in CAD spectra
Lloyd and coworkers were first, in 1988, to use fragment ion abundance ratios to differentiate aspartyl from isoaspartyl residues in the MS/MS spectra of peptides using a double focusing magnetic sector mass spectrometer [26]. The MS/MS of the peptides RKDVY and DIRKF-NH2 showed loss of CO from bn+1 to form an+1 while the same loss was much smaller for the isoaspartyl versions (bn+1/an+1(Asp) > bn+1/an+1(isoAsp)). The authors suggest that loss of CO to form the an+1, a stable iminium ion, for the aspartyl form is a much more favored pathway than loss of CO from the isoaspartyl bn+1 to form a primary carbocation. Additionally,
the same trend was found for a peptide containing a glutamyl residue and its isoglutamyl homologue. A study by Lehmann et al. in 2000 found that the intensity of fragment ions resulting from amide backbone bond cleavage (b and y ions) on either side of aspartyl/isoaspartyl showed a reproducible trend that can be used to distinguish isoaspartyl from aspartyl residues [154]. Based on the MS/MS spectra of 15 sets of isomeric peptides, the b/y intensity ratios of complementary b and y ions from cleavage on either side of the aspartyl/isoaspartyl residue were consistently larger for the aspartyl form than the isoaspartyl form. Fragmentation intensity ratios for aspartyl were typically less than 15 times larger than the same fragmentation for the isoaspartyl counterparts, although some values were much larger. The trend is believed to be a result of the competition between forming the oxazolone containing the N-terminus (b ion) and direct cleavage to form a terminal amine containing the C-terminus (y ion) [155]. In the ESI process, the amide nitrogen can be protonated, thus weakening the C–N bond and facilitating nucleophilic attack of the amide bond carbonyl on the C-terminal adjacent amino acid to form the oxazolone. When the residue is isoaspartic acid, formation of the oxazolone is hindered. On the N-terminal side of the isoaspartyl residue, oxazolone formation can be hindered by a similar interaction between the isoaspartyl side chain and the backbone carbonyl, since the carboxyl group is closer to the backbone than in the aspartyl form. On the C-terminal side, a six-membered ring must be formed that contains the isoaspartyl residue and is kinetically less favored than the five-membered oxazolone structure. Therefore, b ion formation is hindered on both sides of the isoaspartyl residue and direct cleavage is favored, resulting in a decreased b/y ion ratio compared to the aspartyl form.
5.4 Immonium ions of isoaspartyl residues
The structure proposed by Lloyd for the a1 ion found in the MS/MS spectrum of the peptide DIRKF-NH2, missing from the spectrum for the isoaspartyl version, is essentially the immonium ion for an aspartyl residue (m/z = 88) [26]. Immonium ions are small internal fragments containing one amino acid side chain that result from the cleavage of multiple backbone bonds and are useful for determining the amino acid composition of a peptide. Several studies since then have shown that the intensity of the aspartyl immonium ion found in the MS/MS spectrum of an isoaspartyl peptide is much smaller (or nonexistent) than that found in the aspartyl spectrum [28,154]. Lehmann showed that, for 15 sets of peptides, the aspartyl immonium ion intensity (normalized to another immonium ion in the spectrum) was on average 5.5 times higher for the aspartyl form than for the isoaspartyl form and suggested that such a trend could be used to differentiate the two forms [154]. Gonzalez et al. used the aspartyl and isoaspartyl immonium ion intensities to differentiate isomers that could not be differentiated using the b(n-1)+H2O and y(l-n)-46 fragment ions [28]. In addition to the aspartyl immonium ion, a fragment ion at m/z = 70 was found in the isoaspartyl spectrum that was not found in the aspartyl spectrum. The ion was suggested to be the immonium ion for an
isoaspartyl residue that results from a rearrangement of a primary carbocation, a structure suggested to be unstable and therefore not found in the isoaspartyl spectrum. Loss of water from the side chain of the carbocation yields a charged acylium structure of m/z = 70. This ion, however, cannot be used as an absolute indicator for the presence of an isoaspartyl residue since it has the same nominal mass as the proline immonium ion, which is typically a strong signal in the mass spectra of proline-containing peptides.
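The overlap at m/z 70 can be made concrete by computing exact masses from elemental compositions. The compositions below are assumptions for illustration: C3H6NO2+ for the aspartyl immonium ion, C4H8N+ for the proline immonium ion, and C3H4NO+ for the proposed isoaspartyl ion, taken here as the aspartyl immonium composition minus H2O as the text describes.

```python
# Monoisotopic atomic masses in Da (rounded standard values).
ATOM_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELECTRON = 0.00054858


def cation_mz(formula):
    """m/z of a singly charged cation given its elemental composition."""
    neutral = sum(ATOM_MASS[element] * count for element, count in formula.items())
    return neutral - ELECTRON


asp = cation_mz({"C": 3, "H": 6, "N": 1, "O": 2})  # aspartyl immonium ion
iso = cation_mz({"C": 3, "H": 4, "N": 1, "O": 1})  # assumed: Asp immonium minus H2O
pro = cation_mz({"C": 4, "H": 8, "N": 1})          # proline immonium ion

print(f"Asp immonium:        {asp:.4f}")
print(f"proposed isoAsp ion: {iso:.4f}")
print(f"Pro immonium:        {pro:.4f}")
print(f"isoAsp vs Pro gap:   {(pro - iso) * 1000:.1f} mDa")
```

Under these assumed compositions the two m/z 70 ions share a nominal mass but not an exact mass, so whether they can be told apart would depend on the resolution and mass accuracy of the instrument in the low m/z range.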
5.5 Liquid chromatography/mass spectrometry (LCMS)
LCMS methods use the RP-HPLC technique because the acid modifiers help the retention and peak shape of peptides on the LC column while assisting protonation and being volatile for MS analysis (see Section 4.3 for details). The localization and extent of deamidation in a protein can be determined from one LCMS run of the proteolytic peptides. For example, nine deamidation sites (asparagine and glutamine) were characterized in γS-crystallin from cataracts as determined from an LCMS analysis of the trypsin digest [68]. Deamidation was first determined by the isotopic deconvolution method, and localization of these sites was determined from the MS/MS data, which was necessary because multiple asparagine and glutamine residues were present in many of the peptides. In another study, the carboxyl groups of tryptic peptides were methyl esterified in order to simplify the detection and measurement of two deamidation sites [70]. Detecting the 1 Da shift in an ion trap can be difficult, but the +14 Da from the introduced methyl ester at the carboxyl residue (deamidation site) makes recognizing deamidation much easier. Also, the methyl ester changes the hydrophobicity of the peptide, shifting it away from the non-deamidated form and therefore simplifying the LC chromatogram for quantitative measurements. The relative quantification of deamidation products can also be determined by LCMS analysis, provided the LC separation is sufficient to separate the isomers. As mentioned above, the isoaspartyl form of a deamidated peptide typically elutes before the aspartyl form and this trend has been used to assign the identities of peptides found in an LC chromatogram [30], but supportive data are often needed to make such assignments unambiguously.
Analyses using synthetic peptide standards [60,91] and a mutant form of a protein [40] have been successful in confidently identifying deamidation products separated by HPLC, thereby providing reliable quantitative measurements. For example, LCMS/MS was used to measure the extent of in vivo deamidation of a monoclonal antibody using a mutant form as a standard [40]. The antibody in question was isolated and digested in parallel with the mutant form that had an aspartyl residue substituted for the deamidating asparaginyl residue. The peptide with the aspartyl substitution from the mutant protein had the same mass-to-charge ratio, MS/MS profile, and retention time as the aspartyl peptide (deamidation product) from the in vivo sample. The information allowed the extent of deamidation and relative quantification of the products to be determined from the LCMS/MS experiment (the identity of the isoaspartyl peptide was assumed based on its retention time with respect to the aspartyl form). Other experiments
used the retention time and MS/MS profiles of corresponding synthetic peptides (aspartyl, isoaspartyl, and asparaginyl) to measure the deamidation of peptides or the natural abundance of isoaspartyl residues in a protein [60,91].
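The benefit of methyl esterification described in this section comes down to simple mass arithmetic: each deamidation adds a carboxyl group, so after esterification the deamidated peptide carries one more methyl ester than the native peptide. The sketch below assumes complete esterification of every Asp, Glu, and C-terminal carboxyl, and uses a hypothetical tryptic-like sequence.

```python
DEAMIDATION_SHIFT = 0.984016    # Da, -NH2 converted to -OH
METHYL_ESTER_SHIFT = 14.015650  # Da, net gain of CH2 per esterified carboxyl


def carboxyl_groups(sequence):
    """Count Asp and Glu side chains plus the C-terminal carboxyl."""
    return sequence.count("D") + sequence.count("E") + 1


def esterified_gap(native, deamidated):
    """Mass gap between the fully esterified deamidated and native peptides.

    Assumes the two sequences differ only by N->D or Q->E substitutions.
    """
    extra_carboxyls = carboxyl_groups(deamidated) - carboxyl_groups(native)
    sites = sum(a != b for a, b in zip(native, deamidated))
    return sites * DEAMIDATION_SHIFT + extra_carboxyls * METHYL_ESTER_SHIFT


# Hypothetical peptide with one Asn that deamidates to Asp.
print(round(esterified_gap("AGNGSLEK", "AGDGSLEK"), 3))
```

Under these assumptions one deamidation site separates the two esterified forms by about 15 Da instead of about 1 Da, which is far easier to recognize at ion trap resolution.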
5.6 Electron capture dissociation
Differentiation of aspartyl from isoaspartyl residues in synthetic peptides can be accomplished using electron capture dissociation (ECD) based on fragments generated from cleavage of the Cα–Cβ bond [31]. ECD is performed in the ICR cell of an FTMS instrument by irradiating trapped, multiply protonated peptides and proteins with electrons. The fragments are then excited and detected with the high resolution and accuracy attributed to FTMS instrumentation. The c and z fragments typically generated by ECD are a result of N–Cα cleavage, but when the residue is an isoaspartyl residue, cleavage of the Cα–Cβ backbone bond generates the c(l-n)+57 and zn-57 diagnostic fragment ions (n is the position of the isoaspartyl/aspartyl residue, l is the length of the peptide, and the numbers are nominal masses in Daltons). Figure 6 shows spectra of two model synthetic peptides that demonstrate the z4-57 diagnostic peak. The top spectrum in Figure 6 has no z4-57 peak, an abundant z4-CO2 peak, and an abundant M-60 peak, clearly indicating that Asp4 is, indeed, the α-aspartic acid isomer. The bottom spectrum instead shows an abundant z4-57 peak (it is the second most intense fragment ion peak in the spectrum) and much lower z4-CO2 and M-60 peaks, clearly indicating that Asp4, in this case, is the isoaspartic acid isomer. The cleavage mechanism that generates these diagnostic peaks is similar to a McLafferty rearrangement [156] that results in a stable, even electron enol (zn-57)
Figure 6 Electron capture dissociation spectra of two synthetic peptides, one with Asp (D) and one with isoAsp (isoD). The two peaks, z4-CO2 and z4-57, are the primary diagnostic peaks. Because this peptide is similar to a tryptic peptide, with C-terminal arginine, few c ions are observed. (Reprinted with permission from ref. [159].)
and an odd electron, glycyl-like residue with a Ca radical (cln+57). Rauk has shown that an Ca radical is captodatively stabilized on a glycine residue due to the flanking amine and carbonyl groups [157]. The spectra of the aspartyl peptides showed a peak in the side chain loss region not found in the isoaspartyl peptide spectra representing neutral loss of the aspartyl side chain from the reduced molecular ion ((M+2 H)+d-60). The aspartyl side chain loss from Ca–Cb cleavage indicates the presence of an aspartyl residue, but is less informative when other aspartyl residues are present in the peptide. The cln+57 and zn-57 diagnostic ions were used to detect isoaspartyl residues in a deamidated tryptic peptide, a tryptic peptide from deamidated protein, and to differentiate isoaspartyl and aspartyl tryptic peptides from a protein separated by HPLC [33]. Figure 7 shows a similar separation of the two isoforms of a tryptic peptide from calmodulin, with the spectra below showing the regions for the z10-57, the c6+58, and the M-60 diagnostic peaks. Clearly, the left peak contains isoaspartic acid, and the right one contains aspartic acid. The cln+57 and zn-57 ions were also used to differentiate the isoforms of aspartic acid in synthetic peptides using electron transfer dissociation (ETD) in an ion trap mass spectrometer [32]. Similar to ECD, ETD cleaves N–Ca bonds in peptides and proteins but via interactions with gas-phase electron donors in an ion trap. The advantage of an ion trap is that it is inexpensive instrumentation compared to FTICRMS, but with inferior resolution. Using nitrobenzene anions as
Figure 7 Reversed-phase HPLC separation of the isomeric forms of the deamidated tryptic peptide (91)VFDKDGNGYISAAELR(106) from bovine calmodulin, which were then differentiated by ECD. The mass spectra are zoomed in on the regions of the ECD fragment spectra (shaded areas) where the diagnostic fragment ions used to differentiate the two forms should be located.
Analysis of Deamidation in Proteins
an electron donor, the c(l-n)+57 and zn-57 fragment ions were generated, localizing the position of three isoaspartyl residues in the peptide. In addition, b and y series backbone fragments, which are not typically found in ECD spectra, were generated and can provide extra sequence information. ETD therefore provides the diagnostic ions for detecting and localizing isoaspartyl residues together with additional fragment ions not found in ECD spectra, but it has more difficulty resolving higher charge state ions than ECD performed in an FTMS instrument.
6. QUANTITATION OF DEAMIDATION AND ITS PRODUCTS

6.1 HPLC combination methods

As mentioned above, quantitating the extent of deamidation and the relative abundance of deamidation products can be done with HPLC when used in combination with techniques that can discriminate the multiple forms. Edman degradation, mass spectrometric techniques, and PIMT assays can be performed on collected fractions to differentiate the peptides (aspartyl/isoaspartyl peptides) as long as the peptides are adequately separated.
6.2 CAD methods without HPLC

The relative amounts of aspartyl and isoaspartyl residues can be quantitated using CAD fragments (bn+H2O, b/y intensity ratio, and immonium ions) [154,158]. The experiments demonstrating this involve MS/MS analysis of mixtures of synthetic peptides that vary in isoaspartyl/aspartyl composition, intended to represent the possible outcomes of asparaginyl deamidation or aspartyl isomerization. Lehman showed that both the b/y intensity ratio and immonium ions could be used to calculate the relative abundance of the two forms [154]. A plot of the b/y intensity ratio from cleavage on the N-terminal side of the peptide VQ(D/isoD)GLR versus aspartyl content showed an asymptotic relationship that could be used as a calibration curve. Also, MS/MS analysis of mixtures of myrGDAAAK and its isoaspartyl counterpart showed a linear relationship between the aspartyl immonium ion (normalized to the lysine immonium ion) and aspartyl composition. However, since these fragment ions are not diagnostic and their relative intensities must be calculated, at least two points on the calibration curve are needed, and peptide standards are required to generate the calibration curves. Alternatively, the bn+H2O fragment ion is diagnostic [149–151]. The relative intensity of bn+H2O was shown to increase linearly with isoaspartyl content when compared to other backbone cleavages that were assumed to be unchanging regardless of sample composition [158]. Despite this advantage, an isoaspartyl peptide standard is still necessary to establish the calibration plot.
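The calibration-plot approach described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a linear response of the bn+H2O relative intensity to isoaspartyl content as reported in [158]. The function names and the standard intensities are hypothetical placeholders, not data from the cited work.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) calibration points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("calibration points must differ in composition")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_isoasp_fraction(relative_intensity, slope, intercept):
    """Invert the calibration line and clamp the result to [0, 1]."""
    if slope == 0:
        raise ValueError("flat calibration line cannot be inverted")
    fraction = (relative_intensity - intercept) / slope
    return min(1.0, max(0.0, fraction))


# Hypothetical standards: isoaspartyl fraction of each synthetic mixture and
# the measured bn+H2O intensity relative to a reference backbone fragment.
standard_fractions = [0.0, 0.25, 0.50, 0.75, 1.0]
standard_ratios = [0.00, 0.05, 0.10, 0.15, 0.20]

slope, intercept = fit_line(standard_fractions, standard_ratios)
unknown_fraction = estimate_isoasp_fraction(0.08, slope, intercept)
print(f"estimated isoaspartyl fraction: {unknown_fraction:.2f}")
```

The same inversion would not apply directly to the asymptotic b/y ratio curve, which would need a nonlinear calibration model.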
6.3 ECD method without HPLC

Similar to the CAD method using the bn+H2O isoaspartyl fragment ion, the zn-57 diagnostic ion found in the ECD spectra of isoaspartyl/aspartyl peptide mixtures
was shown to increase linearly with isoaspartyl content for three sets of isomeric peptides that varied in amino acid sequence [159]. Here the linear relationship was between isoaspartyl composition and the abundance of zn-57 relative to all backbone fragment ions, whereas the method using the bn+H2O CAD fragment ion used the abundance of the diagnostic ion relative to another fragment ion in the mass spectrum [158]. The relative abundances of most ECD fragment ions from backbone cleavages were found to change when one form of aspartic acid was substituted for the other, most likely due to gas-phase hydrogen bonding involving polar side chains. Therefore, relative quantitation using the zn-57 fragment ion normalized to another backbone fragment ion may be possible, but only if that ion does not vary with isoaspartyl composition. Otherwise, normalization to total backbone fragment ion abundance was shown to provide a consistent linear relationship. The ECD method still requires an isoaspartyl version of the peptide to construct the linear plot; however, deamidating the proteolytic peptide that contains the deamidation site should provide a 3:1 (isoaspartyl:aspartyl) standard for calibration [159]. The proteolytic peptide mixture obtained from protein digestion can be incubated under harsh conditions to provide the 75% isoaspartyl peptide standard, which is then used to calculate the composition of the mixture from in vivo or in vitro deamidation experiments (Figure 8).
Figure 8 Diagram of a methodology for systematic relative quantitation of Asp/isoAsp ratios in proteins. The method assumes that a small, random coil peptide from an enzymatic digest, under harsh deamidation conditions, will deamidate to the 75% isoAsp typically observed. Provided that assumption holds, the harsh deamidation conditions provide a second calibration point, which allows relative Asp/isoAsp ratios to be measured. This assumption has been verified in several cases, but it would have to be checked for any particular protein of interest. (Reprinted with permission from ref. [159].)
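The two-point calibration outlined in Figure 8 can be illustrated with a short Python sketch. It rests on two assumptions that should be stated plainly: the first calibration point is taken to be the origin (no isoaspartyl content gives no zn-57 signal, since the ion is diagnostic), and the harshly deamidated standard is taken to be 75% isoaspartyl. The function names and all intensities are hypothetical.

```python
def normalized_abundance(diagnostic_intensity, backbone_intensities):
    """Diagnostic ion intensity divided by the summed backbone fragment intensity."""
    total = sum(backbone_intensities)
    if total <= 0:
        raise ValueError("total backbone fragment intensity must be positive")
    return diagnostic_intensity / total


def isoasp_fraction_two_point(sample_signal, standard_signal,
                              standard_fraction=0.75):
    """Interpolate on the line through (0, 0) and (standard_fraction, standard_signal)."""
    if standard_signal <= 0:
        raise ValueError("the standard must show a diagnostic signal")
    return sample_signal * standard_fraction / standard_signal


# Hypothetical ECD intensities (arbitrary units).
standard_signal = normalized_abundance(30.0, [100.0, 200.0, 300.0, 400.0])
sample_signal = normalized_abundance(12.0, [150.0, 250.0, 300.0, 500.0])

sample_fraction = isoasp_fraction_two_point(sample_signal, standard_signal)
print(f"estimated isoaspartyl fraction in sample: {sample_fraction:.2f}")
```

Normalizing to the summed backbone fragments, rather than to a single fragment, follows the observation in the text that individual ECD fragment abundances change between the two isomers.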
7. ISOTOPIC LABELING METHODS

The most common ambiguity in mass spectrometric analysis of deamidation and the formation of isoaspartyl residues arises from the simple, nonenzymatic nature of the deamidation reaction. Because the reaction occurs spontaneously and pseudo-unimolecularly (it requires H2O) at basic pH, it will occur during sample-handling procedures. The most common observation to this effect is that deamidation occurs during overnight tryptic digestion at pH 8–9, which is a fairly standard condition. How, then, do we know which deamidation and isoaspartyl sites are formed under physiological conditions and which are formed as an artifact of our sample processing methods prior to analysis? One method is the use of 18O isotopic labeling. 18O labeling of peptides can be accomplished by digestion of proteins in the presence of H218O. When digestion is performed in H218O, at least one 18O is incorporated into the C-terminus of the newly formed peptide, since one molecule of H2O is used to hydrolyze the peptide bond. 18O labeling is readily amenable to analysis by MS, since labeled peptides should show a 2 Da shift in their mass spectrum. Interestingly, Schnolzer et al. used MS to show that trypsin can incorporate more than one 18O during peptide bond hydrolysis, as indicated by two overlapping isotopic distributions in the mass spectrum (2 and 4 Da shifts) [160]. The authors believed that cleaved peptides can interact with the protease again to incorporate another 18O and that the extent of these post-cleavage interactions is peptide dependent. Therefore, tryptic digestion in H218O results in peptide mixtures containing varying amounts of 18O incorporation, which was later corroborated by other studies [161,162].
However, using consistent digestion procedures can eliminate discrepancies between experiments in the amount of 18O that is incorporated, and such rigorous protocols have helped to make 18O labeling a viable method for proteomics. Identification of proteins’ C-termini [160,163] and glycosylation sites [164] has been accomplished by 18O labeling, and several strategies, reminiscent of ICAT methodology, have been proposed for evaluating relative protein expression using 18O labeling [162,165–167]. Additionally, Grossenbacher et al. labeled two DG sites of recombinant hirudin with 18O by opening their succinimide intermediates (from aspartyl dehydration) with the addition of H218O [168]. Finally, both Grossenbacher et al. [168] and Lui et al. [109] found that deamidation occurring in H218O incorporates 18O into the aspartyl or isoaspartyl side chain, suggesting the possibility of detecting aspartyl isomerization and asparaginyl deamidation via an 18O labeling experiment. Deamidation of labile asparaginyl residues often occurs during digestion (as opposed to aspartyl isomerization, which is typically 40 times slower than deamidation), since digestion often requires elevated temperature and pH for extended periods of time, which promotes deamidation. The typical +1 Da shift upon deamidation is often too subtle a change to determine
the extent of conversion, especially when complex isotopic clusters from larger, multiply charged peptides are analyzed. However, the 3 Da shift produced by deamidation in H218O (–NH2 to –18OH) can help determine the extent of unwanted deamidation that may result from prolonged digestion, especially when the deamidation of these larger, multiply charged peptides is investigated. Three proteins (calmodulin, reduced and alkylated ribonuclease A, and lysozyme) were digested in 95% H218O (0.1 M ammonium bicarbonate, pH 8.3) with trypsin (25:1 ratio, respectively) for extended periods of time, during which aliquots were taken periodically and analyzed by nanospray FTMS (Figure 9). First, the tryptic peptide from ribonuclease A (residues 67–85) showed only one 18O incorporation upon digestion and was completely deamidated after 1 day of incubation, as illustrated by an approximate 3 Da shift in the mass spectrum (Figure 9a). After 2 days, the peptide with two deamidations became the more prominent species; this second deamidation site is most likely N71. Although the predicted deamidation half-life for this site is ~55 days at pH 7.4, the digest was performed in an environment about 8 times more basic (pH 8.3), which may dramatically accelerate the rate of deamidation within the time frame studied. The tryptic peptide from calmodulin (residues 91–106) initially showed a 1:2 mixture of one and two 18O incorporations, respectively, which shifted to a 1:3 mixture after 4 h, indicating that this peptide is an excellent substrate for trypsin (Figure 9b). After 24 h, the relative ratio of one to two 18O incorporations remained at 1:3, but the entire isotopic pattern had shifted +3 Da as a result of complete deamidation.
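The arithmetic behind the mass shifts and the pH comparison quoted above can be checked with a short Python sketch. The monoisotopic masses are rounded standard values, and the function names are illustrative only. The sketch treats deamidation as the net replacement of –NH2 by –OH, and treats "times more basic" as the ratio of hydroxide concentrations.

```python
# Monoisotopic masses in Da (rounded standard values).
MASS_H = 1.00782503
MASS_N14 = 14.00307401
MASS_O16 = 15.99491462
MASS_O18 = 17.99915961


def deamidation_shift(oxygen_mass):
    """Mass change for -NH2 -> -OH: gain one O, lose one N and one H."""
    return oxygen_mass - (MASS_N14 + MASS_H)


def mz_shift(mass_shift, charge):
    """Spacing on the m/z axis for an ion carrying `charge` charges."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mass_shift / charge


def basicity_ratio(ph_low, ph_high):
    """Fold change in hydroxide concentration between two pH values."""
    return 10 ** (ph_high - ph_low)


shift_16o = deamidation_shift(MASS_O16)  # deamidation in ordinary water
shift_18o = deamidation_shift(MASS_O18)  # deamidation in 18O-labeled water
label_shift = MASS_O18 - MASS_O16        # one C-terminal 18O incorporation
fold = basicity_ratio(7.4, 8.3)

print(f"deamidation in 16O water: +{shift_16o:.3f} Da")
print(f"deamidation in 18O water: +{shift_18o:.3f} Da")
print(f"each C-terminal 18O:      +{label_shift:.3f} Da")
print(f"3+ ion, 18O deamidation:  +{mz_shift(shift_18o, 3):.3f} m/z")
print(f"pH 8.3 vs pH 7.4:         {fold:.1f}x hydroxide")
```

For a triply charged peptide the ordinary deamidation shift is only about 0.33 m/z, which falls inside the isotopic cluster, whereas the 18O-labeled shift is close to a full m/z unit.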
Similar to the calmodulin peptide, the tryptic peptide from lysozyme (residues 46–61) initially showed a mixture of one and two 18O incorporations (about 1:1, respectively), which shifted slightly in favor of the peptide with two 18O incorporations after 8 h (Figure 9c). Again, this minor shift could be due to the peptide continuing to interact with trypsin. Both forms (peptides with one and two 18O) showed their deamidated counterparts after 1 day of incubation, and these deamidated forms were the most prominent after 2 days of incubation. This slower deamidation is expected, since the predicted deamidation half-life of the –NS– sequence is approximately 10-fold longer than that of –NG–. All three experiments show that the +3 Da mass shift that accompanies deamidation in H218O may be a useful means of detecting and measuring deamidation, especially for routine proteomic experiments in which prolonged digestion periods are often required. The +3 Da mass shift simplifies the process of determining the relative abundance of the deamidated and native peptides, so that these values can be inferred directly from the spectrum without deconvolution calculations. Although multiple 18O incorporations from digestion may complicate the spectrum, once the relative abundance of the two forms (one and/or two 18O) is established, the +3 Da shifts should still provide a better method for detecting and measuring deamidation than the traditional 1 Da shift found with deamidation occurring in H216O.
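Reading the extent of deamidation directly from the spectrum amounts to a simple ratio, sketched below in Python. The sketch assumes that the intensity of each species (native or deamidated, with one or two 18O) has already been assigned and summed over its isotopic cluster. The function name and the numbers are hypothetical and are not taken from Figure 9.

```python
def fraction_deamidated(native_intensities, deamidated_intensities):
    """Share of total signal carried by the +3 Da (deamidated) species."""
    native = sum(native_intensities)
    deamidated = sum(deamidated_intensities)
    total = native + deamidated
    if total <= 0:
        raise ValueError("no signal assigned to either form")
    return deamidated / total


# Hypothetical summed cluster intensities for one peptide at one time point.
# Each list holds the forms with one and two 18O incorporations.
native_forms = [40.0, 40.0]
deamidated_forms = [60.0, 60.0]

extent = fraction_deamidated(native_forms, deamidated_forms)
print(f"extent of deamidation: {extent:.0%}")
```

Summing over the one- and two-18O forms keeps the estimate independent of how far the C-terminal exchange has progressed, provided the two forms deamidate at the same rate.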
Figure 9 Tryptic digestion of peptides in isotopically labeled water incorporates 18O up to twice at the C-terminus and once at each deamidation site.
8. SUMMARY

Deamidation of asparagine and glutamine residues to their acidic counterparts is a common PTM on proteins. It occurs non-enzymatically in solution, and with the exception of –NP– and –QP– sequences, it occurs at all Asn and Gln residues
in all proteins eventually. However, the rates vary dramatically with the primary and higher-order structure of proteins. Deamidation has substantial repercussions for protein stability, often leading to the unfolding and recycling of proteins by the cell. However, the misfolding of deamidated proteins is also associated with many diseases, from amyloid diseases to cataracts to cancer. Deamidation results in a charge change and a +0.984 Da mass shift, both of which are readily detectable by many methods, but it also produces a mixture of isomers that are relatively difficult to differentiate. A new mass spectrometric method was recently introduced (and discussed above) which uses ECD fragmentation patterns to distinguish aspartyl and isoaspartyl residues formed from deamidation of asparagine. This method is expected to work similarly for glutamine deamidation. The ECD method not only distinguishes the isomers but also suggests several methods to quantify them. One method, which has been demonstrated in the cytochrome c case, utilizes the characteristic 1:3 branching ratio on deamidation to provide a calibration point that allows relative quantitation. Finally, while deamidation occurs naturally, it also occurs as a sample-handling artifact, most notably during overnight tryptic digestion at basic pH. This artifact can be controlled and monitored by 18O isotopic labeling methods.
REFERENCES

1 N.E. Robinson and A.B. Robinson, Molecular Clocks: Deamidation of Asparaginyl and Glutaminyl Residues in Peptides and Proteins, Althouse Press, Cave Junction, OR, 2004. 2 N.E. Robinson and A.B. Robinson, Prediction of protein deamidation rates from primary and three-dimensional structure, Proc. Natl. Acad. Sci. USA, 98 (2001) 4367–4372. 3 N.E. Robinson and A.B. Robinson, Molecular clocks, Proc. Natl. Acad. Sci. USA, 98 (2001) 944–949. 4 A.B. Robinson, J.H. McKerrow and P. Cary, Controlled deamidation of peptides and proteins: An experimental hazard and possible biological timer, Proc. Natl. Acad. Sci. USA, 66 (1970) 753–757. 5 N.E. Robinson and A.B. Robinson, Deamidation of human proteins, Proc. Natl. Acad. Sci. USA, 98 (2001) 12409–12413. 6 S. Capasso and P. Di Cerbo, Kinetic and thermodynamic control of the relative yield of the deamidation of asparagine and isomerization of aspartic acid residues, J. Pept. Res., 56 (2000) 382–387. 7 S. Capasso, Estimation of the deamidation rate of asparagine side chains, J. Pept. Res., 55 (2000) 224–229. 8 S. Capasso, G. Balboni and P. Di Cerbo, Effect of lysine residues on the deamidation reaction of asparagine side chains, Biopolymers, 53 (2000) 213–219. 9 T.B. Brennan and S. Clarke, Deamidation and isoaspartate formation in peptides and proteins. In: D.W. Aswad (Ed.), CRC Press, Boca Raton, FL, 1995, pp. 65–90. 10 D.W. Aswad, M.V. Paranandi and B.T. Schurter, Isoaspartate in peptides and proteins: Formation, significance, and analysis, J. Pharm. Biomed. Anal., 21 (2000) 1129–1136. 11 M.L. Xie and R.L. Schowen, Secondary structure and protein deamidation, J. Pharm. Sci., 88 (1999) 8–13. 12 J.L. Radkiewicz, H. Zipse, S. Clarke and K.N. Houk, Neighboring side chain effects on asparaginyl and aspartyl degradation: An ab initio study of the relationship between peptide conformation and backbone NH acidity, J. Am. Chem. Soc., 123 (2001) 3499–3506.
13 C.D. Smith, M. Carson, A.M. Friedman, M.M. Skinner, L. Delucas, L. Chantalat, L. Weise, T. Shirasawa and D. Chattopadhyay, Crystal structure of human l-isoaspartyl-o-methyl-transferase with s-adenosyl homocysteine at 1.6 Å resolution and modeling of an isoaspartyl-containing peptide at the active site, Protein Sci., 11 (2002) 625–635. 14 F. Paradisi, J.L.E. Dean, K.F. Geoghegan and P.C. Engel, Spontaneous chemical reversion of an active site mutation: Deamidation of an asparagine residue replacing the catalytic aspartic acid of glutamate dehydrogenase, Biochemistry, 44 (2005) 3636–3643. 15 T.B. Brennan and S. Clarke, Spontaneous degradation of polypeptides at aspartyl and asparaginyl residues: Effects of the solvent dielectric, Protein Sci., 2 (1993) 331–339. 16 P. Bornstein and G. Balian, Cleavage at asn-gly bonds with hydroxylamine, Meth. Enzymol., 47 (1977) 132–145. 17 Y.C. Meinwald, E.R. Stimson and H.A. Scheraga, Deamidation of the asparaginyl-glycyl sequence, Int. J. Peptide. Protein Res., 28 (1986) 79–84. 18 T. Geiger and S. Clarke, Deamidation, isomerization, and racemization at asparaginyl and aspartyl residues in peptides, J. Biol. Chem., 262 (1987) 785–794. 19 K. Patel and R.T. Borchardt, Chemical pathways of peptide degradation. 2. Kinetics of deamidation of an asparaginyl residue in a model hexapeptide, Pharm. Res., 7 (1990) 703–711. 20 K. Patel and R.T. Borchardt, Chemical pathways of peptide degradation. 3. Effect of primary sequence on the pathways of deamidation of asparaginyl residues in hexapeptides, Pharm. Res., 7 (1990) 787–793. 21 N.P. Bhatt, K. Patel and R.T. Borchardt, Chemical pathways of peptide degradation. 1. Deamidation of adrenocorticotropic hormone, Pharm. Res., 7 (1990) 593–599. 22 F. Catanzano, G. Graziano, S. Capasso and G. Barone, Thermodynamic analysis of the effect of selective monodeamidation at asparagine 67 in ribonuclease a, Protein Sci., 6 (1997) 1682–1693. 23 D.G. Schmid, F. von der Mulbe, B. Fleckenstein, T. 
Weinschenk and G. Jung, Broadband detection electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry to reveal enzymatically and chemically induced deamidation reactions within peptides, Anal. Chem., 73 (2001) 6008–6013. 24 N.E. Robinson, K.J. Lampi, R.T. McIver, R.H. Williams, W.C. Muster, G. Kruppa and A.B. Robinson, Quantitative measurement of deamidation in lens beta-b2-crystallin and peptides by direct electrospray injection and fragmentation in a Fourier transform mass spectrometer, Mol. Vis., 11 (2005) 1211–1219. 25 S. Castet, C. Enjalbal, P. Fulcrand, J.F. Guichou, J. Martinez and J.L. Aubagnac, Characterization of aspartic acid and beta-aspartic acid in peptides by fast-atom bombardment mass spectrometry and tandem mass spectrometry, Rapid Commun. Mass Spectrom., 10 (1996) 1934–1938. 26 J.R. Lloyd, M.L. Cotter, D. Ohori and D.L. Doyle, Distinction of alpha-aspartyl and beta-aspartyl and alpha-glutamyl and gamma-glutamyl-transferase peptides by fast atom bombardment tandem mass-spectrometry, Biomed. Environ. Mass Spectrom., 15 (1988) 399–402. 27 J.J. Kusmierz and F.P. Abramson, Tracing n-15 with chemical-reaction interface mass-spectrometry – a demonstration using n-15-labeled glutamine and asparagine substrates in cell-culture, Biol. Mass Spectrom., 23 (1994) 756–763. 28 L.J. Gonzalez, T. Shimizu, Y. Satomi, L. Betancourt, V. Besada, G. Padron, R. Orlando, T. Shirasawa, Y. Shimonishi and T. Takao, Differentiating alpha- and beta-aspartic acids by electrospray ionization and low-energy tandem mass spectrometry, Rapid Commun. Mass Spectrom., 14 (2000) 2092–2102. 29 N.C. Luu, S. Robinson, R. Zhao, R. McKean and D.P. Ridge, Mass spectrometric differentiation of alpha- and beta-aspartic acid in a pseudo-tetrapeptide thrombosis inhibitor and its isomer, Eur. J. Mass Spectrom, 10 (2004) 279–287. 30 D. Chelius, D.S. Rehder and P.V. 
Bondarenko, Identification and characterization of deamidation sites in the conserved regions of human immunoglobulin gamma antibodies, Anal. Chem., 77 (2005) 6004–6011. 31 J.J. Cournoyer, J.L. Pittman, V.B. Ivleva, E. Fallows, L. Waskell, C.E. Costello and P.B. O’Connor, Deamidation: Differentiation of aspartyl from isoaspartyl products in peptides by electron capture dissociation, Protein Sci., 14 (2005) 452–463.
32 P.B. O’Connor, J.J. Cournoyer, S.J. Pitteri, P.A. Chrisman and S.A. McLuckey, Differentiation of aspartic and isoaspartic acids using electron transfer dissociation, J. Am. Soc. Mass Spectrom., 17 (2006) 15–19. 33 J.J. Cournoyer, C. Lin and P.B. O’Connor, Detecting deamidation products in proteins by electron capture dissociation, Anal. Chem., 78 (2006) 1264–1271. 34 J. Schmidt, Feasibility study: Fast liquid chromatography-mass spectrometry for the quantification of aspartic acid in an aspartate drug, Anal. Bioanal. Chem., 377 (2003) 1120–1123. 35 S. De Boni, C. Oberthur, M. Hamburger and G.K.E. Scriba, Analysis of aspartyl peptide degradation products by high-performance liquid chromatography and high-performance liquid chromatography-mass spectrometry, J. Chromatogr. A, 1022 (2004) 95–102. 36 S. De Boni, C. Neususs, M. Pelzing and G.K.E. Scriba, Identification of degradation products of aspartyl tripeptides by capillary electrophoresis-tandem mass spectrometry, Electrophoresis, 24 (2003) 874–882. 37 H. Sarioglu, F. Lottspeich, T. Walk, G. Jung and C. Eckerskorn, Deamidation as a widespread phenomenon in two-dimensional polyacrylamide gel electrophoresis of human blood plasma proteins, Electrophoresis, 21 (2000) 2209–2218. 38 D.T.Y. Liu, Deamidation – a source of microheterogeneity in pharmaceutical proteins, Trends Biotech., 10 (1992) 364–369. 39 W. Zhang and M.J. Czupryn, Analysis of isoaspartate in a recombinant monoclonal antibody and its charge isoforms, J. Pharm. Biomed. Anal., 30 (2003) 1479–1490. 40 L.H. Huang, J.R. Li, V.J. Wroblewski, J.M. Beals and R.M. Riggin, In vivo deamidation characterization of monoclonal antibody by LC/MS/MS, Anal. Chem., 77 (2005) 1432–1439. 41 Y.R. Hsu, W.C. Chang, E.A. Mendiaz, S. Hara, D.T. Chow, M.B. Mann, K.E. Langley and H.S. 
Lu, Selective deamidation of recombinant human stem cell factor during in vitro aging: Isolation and characterization of the aspartyl and isoaspartyl homodimers and heterodimers, Biochemistry, 37 (1998) 2251–2262. 42 M. Inaba and Y. Maede, Correlation between protein 4.1a/4.1b ratio and erythrocyte life span, Biochim. Biophys. Acta, 944 (1988) 256–264. 43 D.A. Chavous, F.R. Jackson and C.M. O’Connor, Extension of the drosophila lifespan by overexpression of a protein repair methyltransferase, Proc. Natl. Acad. Sci. USA, 98 (2001) 14814–14818. 44 C.M. O’Connor, D.A. Chavous and F.R. Jackson, Overexpression of a protein repair methyltransferase extends the drosophila lifespan, Mol. Biol. Cell, 12 (2001) 243A. 45 E. Kim, J.D. Lowenson, D.C. MacLaren, S. Clarke and S.G. Young, Deficiency of a protein-repair enzyme results in the accumulation of altered proteins, retardation of growth, and fatal seizures in mice, Proc. Natl. Acad. Sci. USA, 94 (1997) 6132–6137. 46 J.D. Lowenson, E. Kim, S.G. Young and S. Clarke, Limited accumulation of damaged proteins in l-isoaspartyl (d-aspartyl) o-methyltransferase-deficient mice, J. Biol. Chem., 276 (2001) 20695–20702. 47 E. Kim, J.D. Lowenson, S. Clarke and S.G. Young, Phenotypic analysis of seizure-prone mice lacking l-isoaspartate (d-aspartate) o-methyltransferase, J. Biol. Chem., 274 (1999) 20671–20678. 48 M. Ogawara, M. Takahashi, T. Shimizu, M. Nakajima, Y. Setoguchi and T. Shirasawa, Adenoviral expression of protein-l-isoaspartyl methyl transferase (pimt) partially attenuates the biochemical changes in pimt-deficient mice, J. Neurosci. Res., 69 (2002) 353–361. 49 K.J. Reissner, T.M. Luc, M.J. Mamula and D.W. Aswad, Isoaspartate to synapsin i from pimt deficient mice, Faseb J., 15 (2001) A888. 50 T. Shimizu, T. Ikegami, M. Ogawara, Y. Suzuki, M. Takahashi, H. Morio and T. 
Shirasawa, Transgenic expression of the protein-l-isoaspartyl methyltransferase (pimt) gene in the brain rescues mice from the fatal epilepsy of pimt deficiency, J. Neurosci. Res., 69 (2002) 341–352. 51 T. Yamada, J. Liepnieks, M.D. Benson and B. Kluve-Beckerman, Accelerated amyloid deposition in mice treated with the aspartic protease inhibitor, pepstatin, J. Immunol., 157 (1996) 901–907. 52 J. Orpiszewski and M.D. Benson, Induction of beta-sheet structure in amyloidogenic peptides by neutralization of aspartate: A model for amyloid nucleation, J. Mol. Biol., 289 (1999) 413–428. 53 J. Orpiszewski, N. Schormann, B. Kluve-Beckerman, J.J. Liepnieks and M.D. Benson, Protein aging hypothesis of Alzheimer disease, Faseb J., 14 (2000) 1255–1263.
54 M.R. Nilsson, M. Driscoll and D.P. Raleigh, Low levels of asparagine deamidation can have a dramatic effect on aggregation of amyloidogenic peptides: Implications for the study of amyloid formation, Protein Sci., 11 (2002) 342–349. 55 H. Yamaguchi, S. Sugihara, K. Ishiguro, A. Takashima and S. Hirai, Immunohistochemical analysis of cooh-termini of amyloid-beta protein (a-beta) using end-specific antisera for a-beta-40 and a-beta-42 in Alzheimers-disease and normal aging, Amyloid, 2 (1995) 7–16. 56 M.R. Nilsson and C.M. Dobson, Chemical modification of insulin in amyloid fibrils, Protein Sci., 12 (2003) 2637–2641. 57 M.L. Xie, Z. Shahrokh, M. Kadkhodayan, W.J. Henzel, M.F. Powell, R.T. Borchardt and R.L. Schowen, Asparagine deamidation in recombinant human lymphotoxin: Hindrance by three-dimensional structures, J. Pharm. Sci., 92 (2003) 869–880. 58 M. Xie, J. Aube, R.T. Borchardt, M. Morton, E.M. Topp, D. Vander Velde and R.L. Schowen, Reactivity toward deamidation of asparagine residues in beta-turn structures, J. Pept. Res., 56 (2000) 165–171. 59 A.A. Kosky, U.O. Razzaq, M.J. Treuheit and D.N. Brems, The effects of alpha-helix on the stability of asn residues: Deamidation rates in peptides of varying helicity, Protein Sci., 8 (1999) 2519–2523. 60 S.N. McAdam, B. Fleckenstein, I.B. Rasmussen, D.G. Schmid, I. Sandlie, B. Bogen, N.J. Viner and L.M. Sollid, T cell recognition of the dominant i-a(k)-restricted hen egg lysozyme epitope: Critical role for asparagine deamidation, J. Exp. Med., 193 (2001) 1239–1246. 61 M.F. Mazzeo, B. De Giulio, S. Senger, M. Rossi, A. Malorni and R.A. Siciliano, Identification of trans glutaminase-mediated deamidation sites in a recombinant alpha-gliadin by advanced massspectrometric methodologies, Protein Sci., 12 (2003) 2434–2442. 62 H. Sjostrom, K.E.A. Lundin, O. Molberg, R. Korner, S.N. McAdam, D. Anthonsen, H. Quarsten, O. Noren, P. Roepstorff, E. Thorsby and L.M. 
Sollid, Identification of a gliadin t-cell epitope in coeliac disease: General importance of gliadin deamidation for intestinal t-cell recognition, Scand. J. Immunol., 48 (1998) 111–115. 63 S.J. Weintraub and S.R. Manson, Asparagine deamidation: A regulatory hourglass, Mech. Ageing Dev., 125 (2004) 255–257. 64 B.E. Deverman, B.L. Cook, S.R. Manson, R.A. Niederhoff, E.M. Langer, I. Rosova, L.A. Kulans, X.Y. Fu, J.S. Weinberg, J.W. Heinecke, K.A. Roth and S.J. Weintraub, Bcl-x-l deamidation is a critical switch in the regulation of the response to DNA damage, Cell, 111 (2002) 51–62. 65 T. Takehara and H. Takahashi, Asparagine deamidation as a novel posttranslational modification of bcl-xl, Gastroenterology, 118 (2000) A443. 66 T. Takehara and H. Takahashi, Suppression of bcl-xl deamidation in human hepatocellular carcinomas, Cancer Res., 63 (2003) 3054–3057. 67 Y.H. Kim, D.M. Kapfer, J. Boekhorst, N.H. Lubsen, H.P. Bachinger, T.R. Shearer, L.L. David, J.B. Feix and K.J. Lampi, Deamidation, but not truncation, decreases the urea stability of a lens structural protein, beta b1-crystallin, Biochemistry, 41 (2002) 14076–14084. 68 V.N. Lapko, A.G. Purkiss, D.L. Smith and J.B. Smith, Deamidation in human gamma s-crystallin from cataractous lenses is influenced by surface exposure, Biochemistry, 41 (2002) 8638–8648. 69 R.J. Kapphahn, C.M. Ethen, E.A. Peters, L. Higgins and D.A. Ferrington, Modified alpha-a crystallin in the retina: Altered expression and truncation with aging, Biochemistry, 42 (2003) 15310–15325. 70 M.J. Harms, P.A. Wilmarth, D.M. Kapfer, E.A. Steel, L.L. David, H.P. Bachinger and K.J. Lampi, Laser light-scattering evidence for an altered association of beta b1-crystallin deamidated in the connecting peptide, Protein Sci., 13 (2004) 678–686. 71 P. Ahmann, K. Amyx and K.J. Lampi, Deamidation at the n-terminal domain interface (q70e), but not the homologue c-terminal domain interface (q162e) of beta b2 crystallin altered the protein stability, Invest. 
Ophthalmol. Vis. Sci., 46 (2005). 72 K.J. Lampi, M. Harms and D.M. Kapfer, Deamidation at asparagine 157 in human beta b1-crystallin, Invest. Ophthalmol. Vis. Sci., 44 (2003) U409. 73 K.J. Lampi, J.T. Oxford, H.P. Bachinger, T.R. Shearer, L.L. David and D.M. Kapfer, Deamidation of human beta b1 alters the elongated structure of the dimer, Exp. Eye Res., 72 (2001) 279–288.
74 L.E. Ball, D.L. Garland, R.K. Crouch and K.L. Schey, Post-translational modification of aquaporin 0 (aqp0) in the normal human lens: Spatial and temporal occurrence, Biochemistry, 43 (2004) 9856–9865. 75 W.J. Ribot, B.S. Powell, B.E. Ivins, S.F. Little, W.M. Johnson, T.A. Hoover, S.L. Norris, J.J. Adamovicz, A.M. Friedlander and G.P. Andrews, Comparative vaccine efficacy of different isoforms of recombinant protective antigen against Bacillus anthracis spore challenge in rabbits, Vaccine, 24 (2006) 3469–3476. 76 G. Zomber, S. Reuveny, N. Garti, A. Shafferman and E. Elhanany, Effects of spontaneous deamidation on the cytotoxic activity of the Bacillus anthracis protective antigen, J. Biol. Chem., 280 (2005) 39897–39906. 77 A.M. Friedlander, S.L. Welkos and B.E. Ivins, Anthrax Vaccines, Anthrax, 271 (2002) 33–60. 78 A.M. Friedlander, Microbiology – tackling anthrax, Nature, 414 (2001) 160–161. 79 B.E. Ivins, M.L.M. Pitt, P.F. Fellows, J.W. Farchaus, G.E. Benner, D.M. Waag, S.F. Little, G.W. Anderson, P.H. Gibbs and A.M. Friedlander, Comparative efficacy of experimental anthrax vaccine candidates against inhalation anthrax in rhesus macaques, Vaccine, 16 (1998) 1141–1148. 80 M. Keykhosravani, A. Doherty-Kirby, C. Zhang, D. Brewer, H.A. Goldberg, G.K. Hunter and G. Lajoie, Comprehensive identification of post-translational modifications of rat bone osteopontin by mass spectrometry, Biochemistry, 44 (2005) 6990–7003. 81 C. Vanbelle, F. Halgand, T. Cedervall, E. Thulin, K.S. Akerfeldt, O. Laprevote and S. Linse, Deamidation and disulfide bridge formation in human calbindin d-2gk with effects on calcium binding, Protein Sci., 14 (2005) 968–979. 82 L. Buetow and P. Ghosh, Structural elements required for deamidation of rhoa by cytotoxic necrotizing factor 1, Biochemistry, 42 (2003) 12784–12791. 83 G. Flatau, L. Landraud, P. Boquet, M. Bruzzone and P. 
Munro, Deamidation of rhoa glutamine 63 by the escherichia coli cnf1 toxin requires a short sequence of the gtpase switch 2 domain, Biochem. Biophys. Res. Commun., 267 (2000) 588–592. 84 G. Flatau, E. Lemichez, M. Gauthier, P. Chardin, S. Paris, C. Fiorentini and P. Boquet, Toxin-induced activation of the g protein p21 rho by deamidation of glutamine, Nature, 387 (1997) 729–733. 85 M. Lerm, J. Selzer, A. Hoffmeyer, U.R. Rapp, K. Aktories and G. Schmidt, Deamidation of cdc42 and rac by Escherichia coli cytotoxic necrotizing factor 1: Activation of cjun n-terminal kinase in hela cells, Infect. Immun., 67 (1999) 496–503. 86 G. Schmidt, M. Lerm, J. Selzer and K. Aktories, Deamidation and transglutamination of gln 63 of rho induced by E. coli cytotoxic necrotizing factor 1, Naunyn Schmiedebergs Arch. Pharmacol., 357 (1998) R59. 87 M. Lerm, G. Schmidt, U.-M. Goehring, J. Schirmer and L. Aktories, Identification of the region of rho involved in substrate recognition by Escherichia coli cytotoxic necrotizing factor 1 (cnf1), J. Biol. Chem., 274 (1999) 28999–29004. 88 D. Ingrosso, S. D’Angelo, E.d. Carlo, A.F. Perna, V. Zappia and P. Galletti, Increased methyl esterification of altered aspartyl residues in erythrocyte membrane proteins in response to oxidative stress, Eur. J. Biochem., 267 (2000) 4397–4405. 89 P. Galletti, D. Ingrosso, C. Manna, G. Clemente and V. Zappia, Protein damage and methylationmediated repair in the erythrocyte, Biochem. J., 306 (1995) 313–325. 90 D. Ingrosso, S. D’Angelo, A.F. Perna, A. Iolascon, E. Miragali Del Giudice, S. Perrota, V. Zappia and P. Galletti, Increased membrane-protein methylation in hereditary spherocytosis: A marker of cytoskeletal disarray, Eur. J. Biochem., 228 (1995) 894–898. 91 P.T. Jedrzejewski, A. Girod, A. Tholey, N. Konig, S. Thullner, V. Kinzel and D. 
Bossemeyer, A conserved deamidation site at asn 2 in the catalytic subunit of mammalian camp-dependent protein kinase detected by capillary LC-MS and tandem mass spectrometry, Protein Sci., 7 (1998) 457–469. 92 A. Artigues, A. Birkett and V. Schirch, Evidence for the in vivo deamidation and isomerization of an asparaginyl residue in cytosolic serine hydroxymethyltransferase, J. Biol. Chem., 265 (1990) 4853–4858. 93 T. Solstad, R.N. Carvalho, O.A. Andersen, D. Waidelich and T. Flatmark, Deamidation of labile asparagine residues in the autoregulatory sequence of human phenylalanine hydroxylase – structural and functional implications, Eur. J. Biochem., 270 (2003) 929–938.
Analysis of Deamidation in Proteins
407
94 D.W. Aswad, Substoichiometric methylation of porcine adrenocorticotropin by protein carboxyl methyltransferase requires deamidation of asparagine 25, J. Biol. Chem., 259 (1984) 10714–10721. 95 E.D. Murray and S. Clarke, Synthetic peptide substrates for the erythrocyte proein carboxyl methyltransferase, J. Biol. Chem., 259 (1984) 10722–10732. 96 J.D. Lowenson and S. Clarke, Structural elements affecting the recognition of l-isoaspartyl residues by the l-isoaspartyl d-aspartyl protein methyltransferase – implications for the repair hypothesis, J. Biol. Chem., 266 (1991) 19396–19406. 97 B.A. Johnson and D.W. Aswad, Deamidation and isoaspartate formation in peptides and proteins. In: D.W. Aswad (Ed.), CRC Press, Boca Raton, FL, 1995, pp. 91–113. 98 http://www.promega.com/pnotes/53/5015b/5015b.html 99 S.M. Potter, W.J. Henzel and D.W. Aswad, In-vitro aging of calmodulin generates isoaspartate at multiple asn-gly and asp-gly sites in calcium-binding domain-ii, domain-iii, and domain-iv, Protein Sci., 2 (1993) 1648–1663. 100 B.A. Johnson, J. Najbauer and D.W. Aswad, Accumulation of substrates for protein l-isoaspartyl methyltransferase in adenosine dialdehyde-treated pc12 cells, J. Biol. Chem., 268 (1993) 6174–6181. 101 A.L. Young, W.G. Carter, H.A. Doyle, M.J. Mamula and D.W. Aswad, Structural integrity of histone h2b in vivo requires the activity of protein l-isoaspartate o-methyltransferase, a putative protein repair enzyme, J. Biol. Chem., 276 (2001) 37161–37165. 102 A. Yamamoto, H. Takagi, D. Kitamura, H. Tatsuoka, H. Nakano, H. Kawano, H. Kuroyanagi, Y. Yahagi, S. Kobayashi, K. Koizumi, T. Sakai, K. Saito, T. Chiba, K. Kawamura, K. Suzuki, T. Watanabe, H. Mori and T. Shirasawa, Deficiency in protein l-isoaspartyl methyltransferase results in a fatal progressive epilepsy, J. Neurosci., 18 (1998) 2063–2074. 103 A.D. Carlson and R.M. 
Riggin, Development of improved high-performance liquid chromatography conditions for nonisotopic detection of isoaspartic acid to determine the extent of protein deamidation, Anal. Biochem., 278 (2000) 150–155. 104 D. Kameoka, T. Ueda and T. Imoto, A method for the detection of asparagine deamidation and aspartate isomerization of proteins by MALDI/TOF-mass spectrometry using endoproteinase asp-n, J. Biochem., 134 (2003) 129–135. 105 M.Y. Kwong and R.J. Harris, Identification of succinimide sites in proteins by n-terminal sequence analysis after alkaline hydroxylamine cleavage, Protein Sci., 3 (1994) 147–149. 106 P. Chaurand, B.B. DaGue, S. Ma, S. Kasper and R.M. Caprioli, Strain-based sequwnce variation and structure analysis of murine prostate specific binding protein using mass spectrometry, Biochemistry, 40 (2001) 9725–9733. 107 Y. Zhen, R.M. Caprioli and J.V. Staros, Characterization of glycosylation sites of the epidermal growth factor receptor, Biochemistry, 42 (2003) 5478–5492. 108 J.A. Karty and J.P. Reilly, Deamidation as a consequence of beta-elimination of phosphopeptides, Anal. Chem., 77 (2005) 4673–4676. 109 P. Liu and F.E. Regnier, Recognizing single amino acid polymorphism in proteins, Anal. Chem., 75 (2003) 4956–4963. 110 B. Li, R.T. Borchardt, E.M. Topp, D. VanderVelde and R.L. Schowen, Racemization of an asparagine residue during peptide deamidation, J. Am. Chem. Soc., 125 (2003) 11486–11487. 111 D. Kameoka, A method for the detection of asparagine deamidation and aspartate, J. Biochem., 134 (2003) 129–135. 112 A. Di Donato, P. Galletti and G. D’Alessio, Selective deamidation and enzymatic methylation of seminal ribonuclease, Biochemistry, 25 (1986) 8361–8363. 113 A. Di Donato, M.A. Ciardiello, M. Denigris, R. Piccoli, L. Mazzarella and G. Dalessio, Selective deamidation of ribonuclease-a – isolation and characterization of the resulting isoaspartyl and aspartyl derivatives, J. Biol. Chem., 268 (1993) 4745–4751. 114 J.P. 
Tam, Synthetic peptide vaccine design: Synthesis and properties of a high-density multiple antigenic peptide system, Proc. Natl. Acad. Sci. USA, 85 (1988) 5409–5413. 115 T. Shimizu, A. Watanabe, M. Ogawara, H. Mori and T. Shirasawa, Isoaspartate formation and neurodegeneration in Alzheimer’s disease, Arch. Biochem. Biophys., 381 (2000) 225–234. 116 T. Shimizu, H. Fukuda, S. Murayama, N. Izumiyama and T. Shirasawa, Isoaspartate formation at position 23 of amyloid beta peptide enhanced fibril formation and deposited onto
408
117
118
119
120 121
122
123 124 125 126
127
128
129 130 131 132
133
134 135 136
Jason J. Cournoyer and Peter B. O’Connor
senile plaques and vascular amyloids in Alzheimer’s disease, J. Neurosci. Res., 70 (2002) 451–461. Y. Shin, H.S. Cho, H. Fukumoto, T. Shimizu, T. Shirasawa, S.M. Greenberg and G.W. Rebeck, A beta species, including isoasp23 a beta, in iowa-type familial cerebral amyloid angiopathy, Acta Neuropathol., 105 (2003) 252–258. E.D. Murray and S. Clarke, Synthetic peptide-substrates for the erythrocyte protein carboxyl methyltransferase – detection of a new site of methylation at isomerized l-aspartyl residues, J. Biol. Chem., 259 (1984) 722–732. P.N. McFadden and S. Clarke, Conversion of isoaspartyl peptides to normal peptides – implications for the cellular repair of damaged proteins, Proc. Natl. Acad. Sci. USA, 84 (1987) 2595–2599. T.V. Brennan and S. Clarke, Spontaneous degradation of polypeptides at aspartyl and asparaginyl residues – effects of the solvent dielectric, Protein Sci., 2 (1993) 331–338. B.A. Johnson, E.D. Murray, S. Clarke, D.B. Glass and D.W. Aswad, Protein carboxyl methyltransferase facilitates conversion of atypical l-isoaspartyl peptides to normal l-aspartyl peptides, J. Biol. Chem., 262 (1987) 5622–5629. D.W. Aswad, Stoichiometric methylation of porcine adrenocorticotropin by protein carboxyl methyltransferase requires deamidation of asparagine-25 – evidence for methylation at the alphacarboxyl group of atypical l-isoaspartyl residues, J. Biol. Chem., 259 (1984) 714–721. S. Capasso and S. Salvadori, Effect of the three-dimensional structure on the deamidation reaction of ribonuclease a, J. Peptide Res., 54 (1999) 377–382. S.J. Wearne and T.E. Creighton, Effect of protein conformation on rate of deamidation – ribonuclease-a, Proteins, 5 (1989) 8–12. B.A. Johnson, E.L. Langmack and D.W. Aswad, Partial repair of deamidation-damaged calmodulin by protein carboxyl methyltransferase, J. Biol. Chem., 262 (1987) 12283–12287. A. Vinther, A. Holm, T. HoegJensen, A.M. Jesperson, N.K. Klaussen, T. Christensen and H.H. 
Sorensen, Synthesis of stereoisomers and isoforms of a tryptic heptapeptide fragment of human growth hormone and analysis by reverse-phase HPLC and capillary electrophoresis, Eur. J. Biochem., 235 (1996) 304–309. W.J. Chazin, J. Kordel, E. Thulin, T. Hofmann, T. Drakenberg and S. Forsen, Identification of an isoaspartyl linkage formed upon deamidation of bovine calbindin-d9k and structural characterization by 2d h-1-nmr, Biochemistry, 28 (1989) 8646–8653. M.J. Mamula, R.J. Gee, J.I. Elliott, A. Sette, S. Southwood, P.J. Jones and P.R. Blier, Isoaspartyl posttranslational modification triggers autoimmune responses to self-proteins, J. Biol. Chem., 274 (1999) 22321–22327. K. Wutrich, Nmr of Proteins and Nucleic Acids, Wiley, New York, 1986. D.G. Smyth, W.H. Stein and S. Moore, On the sequence of residues 11 to 18 in bovine pancreatic ribonuclease, J. Biol. Chem., 237 (1962) 1845–1850. J.D. Gary and S. Clarke, Purification and characterization of an isoaspartyl dipeptidase from Escherichia coli, J. Biol. Chem., 270 (1995) 4076–4087. C. Fledelius, A.H. Johnsen, P.A.C. Cloos, M. Bonde and P. Qvist, Characterization of urinary degradation products derived from type i collagen – identification of a beta-isomerized asp-gly sequence within the c-terminal telopeptide (alpha 1) region, J. Biol. Chem., 272 (1997) 9755–9763. A.E. Roher, J.D. Lowenson, S. Clarke, C. Wolkow, R. Wang, R.J. Cotter, I.M. Reardon, H.A. Zurcherneely, R.L. Heinrikson, M.J. Ball and B.D. Greenberg, Structural alterations in the peptide backbone of beta-amyloid core protein may account for its deposition and stability in Alzheimers disease, J. Biol. Chem., 268 (1993) 3072–3083. D.E. Macfarlane, Inhibitors of cyclic nucleotide phosphodiesterases inhibit protein carboxyl methylation in intact blood platelets, J. Biol. Chem., 259 (1984) 1357–1362. D.W. Aswad and E.A. Deight, Purification and characterization of two distinct isoozymes of protein carboxymethylase from bovine brain, J. 
Neurochem., 40 (1983) 1718–1725. C.M. O’Connor, Regulation and subcellular-distribution of a protein methyltransferase and its damaged aspartyl substrate sites in developing xenopus oocytes, J. Biol. Chem., 262 (1987) 10398–10403.
Analysis of Deamidation in Proteins
409
137 J. Kindrachuk, J. Parent, G.F. Davies, M. Dinsmore, S. Attah-Poku and S. Napper, Overexpression of l-isoaspartate o-methyltransferase in Escherichia coli increases heat shock survival by a mechanism independent of methyltransferase activity, J. Biol. Chem., 278 (2003) 50880–50886. 138 A. Niewmierzycka and S. Clarke, Do damaged proteins accumulate in caenorhabditis elegans l-isoaspartate methyltransferase (pcm-1) deletion mutants?, Arch Biochem. Biophys., 364 (1999) 209–218. 139 K.J. Reissner, M.V. Paranandi, T.M. Luc, H.A. Doyle, M.J. Mamula, J.D. Lowenson and D.W. Aswad, Synapsin i is a major endogenous substrate for protein l-isoaspartyl methyltransferase in mammalian brain, J. Biol. Chem., 281 (2006) 8389–8398. 140 M.V. Paranandi, A.W. Guzzetta, W.S. Hancock and D.W. Aswad, Deamidation and isoaspartate formation during in-vitro aging of recombinant tissue-plasminogen activator, J. Biol. Chem., 269 (1994) 243–253. 141 B.A. Johnson, J.M. Shirokawa, W.S. Hancock, M.W. Spellman, L.J. Basa and D.W. Aswad, Formation of isoaspartate at 2 distinct sites during invitro aging of human growth-hormone, J. Biol. Chem., 264 (1989) 14262–14271. 142 R. Tyler-Cross and V. Schirch, Effects of amino acid sequence, buffers, and ionic strength on the rate and mechanism of deamidation of asparagine residues in small peptides, J. Biol. Chem., 266 (1991) 22549–22556. 143 J.R. Barber and S. Clarke, Membrane-protein carboxyl methylation increases with humanerythrocyte age – evidence for an increase in the number of methylatable sites, J. Biol. Chem., 258 (1983) 1189–1196. 144 G.W. Young, S.A. Hoofring, M.J. Mamula, H.A. Doyle, G.J. Bunick, Y.L. Hu and D.W. Aswad, Protein l-isoaspartyl methyltransferase catalyzes in vivo racemization of aspartate-25 in mammalian histone h2b, J. Biol. Chem., 280 (2005) 26094–26098. 145 N.E. Robinson, A.B. Robinson and R.B. Merrifield, Mass spectrometric evaluation of synthetic peptides as primary structure models for peptide and protein deamidation, J. 
Peptide Res., 57 (2001) 483–493. 146 N.E. Robinson, Z.W. Robinson, B.R. Robinson, A.L. Robinson, J.A. Robinson, M.L. Robinson and A.B. Robinson, Structure-dependent nonenzymatic deamidation of glutaminyl and asparaginyl pentapeptides, J. Peptide Res., 63 (2004) 426–436. 147 R.N. Carvalho, T. Solstad, E. Bjorgo, J.F. Barroso and T. Flatmark, Deamidations in recombinant human phenylalanine hydroxylase – identification of labile asparagine residues and functional characterization of asn -W asp mutant forms, J. Biol. Chem., 278 (2003) 15142–15152. 148 V. Zabrouskov, X.M. Han, E. Welker, H.L. Zhai, C. Lin, K.J. van Wijk, H.A. Scheraga and F.W. McLafferty, Stepwise deamidation of ribonuclease a at five sites determined by top down mass spectrometry, Biochemistry, 45 (2006) 987–992. 149 I.A. Papayannopoulos and K. Biemann, Fast-atom-bombardment and tandem mass-spectrometry of synthetic peptides and by-products, Peptide Res., 5 (1992) 83–90. 150 M.F. Bean and S.A. Carr. Differentiation of alpha- and beta-aspartic acids in isomerized peptides by tandem ms, The 40th ASMS Conference on Mass Spectrometry and Allied Topics, Dallas, Texas, 1999. 151 P. Schindler, D. Muller, W. Marki, H. Grossenbacher and W.J. Richter, Characterization of a betaasp33 isoform of recombinant hirudin sequence variant 1 by low-energy collision-induced dissociation, J. Mass Spectrom., 31 (1996) 967–974. 152 G.C. Thorne, K.D. Ballard and S.J. Gaskell, Metastable decomposition of peptide (m+h)+ ions via rearrangement involving loss of the c-terminal amino acid residue, J. Am. Soc. Mass Spectrom., 1 (1990) 249–257. 153 K.D. Ballard and S.J. Gaskell, Intermolecular [o-18] isotopic exchange in the gas-phase observed during the tandem mass spectrometric analysis of peptides, J. Am. Chem. Soc., 114 (1992) 64–71. 154 W. Lehmann, A. Schlosser, G. Erben, R. Pipkorn, D. Bossemeyer and V. Kinzel, Analysis of isoaspartate in peptides by electrospray tandem mass spectrometry, Protein Sci., 9 (2000) 2260–2268. 
155 W. Lehmann and A. Schlosser, Five-membered ring formation in unimolecular reactions of peptides: A key structural element controlling low-energy collision-induced dissociation of peptides, J. Mass Spectrom., 35 (2000) 1382–1390.
410
Jason J. Cournoyer and Peter B. O’Connor
156 F.W. McLafferty and F. Turecek (Eds.), Interpretation of Mass Spectra, 4th ed., University Science Books, Sausalito, 1993, p. 72. 157 A. Rauk, D. Yu and D.A. Armstrong, Toward site specificity of oxidative damage in proteins: C-h and c-c bond dissociation energies and reduction potentials of the radicals of alanine, serine, and threonine residues – an ab initio study, J. Am. Chem. Soc., 119 (1997) 208–217. 158 I.A. Popov, S. Kozin, O.N. Kharybin, A.S. Kononikhin and E.N. Nikolaev. Recognition of individual amino acid isomeric form in peptides by ft-icr mass spectrometry. Application to Alzheimer’s disease peptides, 54th ASMS Conference on Mass Spectrometry and Allied Topics, Seattle, 2006. 159 J.J. Cournoyer, C. Lin, M.J. Bowman and P.B. O’Connor, Quantitating the relative abundance of isoaspartyl residues in deamidated proteins by electron capture dissociation, J. Am. Soc. Mass Spectrom., 18 (2006) 48–56. 160 M. Schnolzer, P. Jedrzejewski and W.D. Lehmann, Protease-catalyzed incorporation of o-18 into peptide fragments and its application for protein sequencing by electrospray and matrix-assisted laser desorption/ionization mass spectrometry, Electrophoresis, 17 (1996) 945–953. 161 I.I. Stewart, T. Thomson and D. Figeys, 18o labeling; a tool for proteomics, Rapid Commun. Mass Spectrom., 15 (2001) 2456–2465. 162 X.D. Yao, C. Afonso and C. Fenselau, Dissection of proteolytic o-18 labeling: Endoproteasecatalyzed o-16-to-o-18 exchange of truncated peptide substrates, J. Proteome Res., 2 (2003) 147–152. 163 T. Kosaka, T. Takazawa and T. Nakamura, Identification and c-terminal characterization of proteins from two-dimensional polyacrylamide gels by a combination of isotopic labeling and nanoelectrospray Fourier transform ion cyclotron resonance mass spectrometry, Anal. Chem., 72 (2000) 1179–1185. 164 B. Kuster and M. 
Mann, O-18-labeling of n-glycosylation sites to improve the identification of gel-separated glycoproteins using peptide mass mapping and database searching, Anal. Chem., 71 (1999) 1431–1440. 165 Y.K. Wang, Z.X. Ma, D.F. Quinn and E.W. Fu, Inverse o-18 labeling mass spectrometry for the rapid identification of marker/target proteins, Anal. Chem., 73 (2001) 3742–3750. 166 X.D. Yao, A. Freas, J. Ramirez, P.A. Demirev and C. Fenselau, Proteolytic o-18 labeling for comparative proteomics: Model studies with two serotypes of adenovirus, Anal. Chem., 73 (2001) 2836–2842. 167 M. Heller, H. Mattou, C. Menzel and X.D. Yao, Trypsin catalyzed o-16-to-o-18 exchange for comparative proteomics: Tandem mass spectrometry comparison using MALDI-TOF, ESI-QTOF, and ESI-ion trap mass spectrometers, J. Am. Soc. Mass Spectrom., 14 (2003) 704–718. 168 H. Grossenbacher, W. Marki, D. Coulot, D. Muller and W.J. Richter, Characterization of succinimide-type dehydration products of recombinant hirudin variant 1 by electrospray tandem mass spectrometry, Rapid Commun. Mass Spectrom., 7 (1993) 1082–1085.
CHAPTER 17
Mass Spectrometry-Driven Approaches to Quantitative Proteomics and Beyond
Silke Oeljeklaus, Jon Barbour, Helmut E. Meyer and Bettina Warscheid
Contents
1. Why to Use Mass Spectrometry in Quantitative Proteomics
2. MS-Based Approaches to Quantitative Proteomics
    2.1 Label-free protein quantification by MS
    2.2 Stable isotope labeling methods for quantitative MS
3. Applications in Functional Proteomics
    3.1 Protein localization
    3.2 Protein–protein interactions
    3.3 Protein dynamics
4. How to Obtain Meaningful Data in MS-Based Quantitative Proteomics
    4.1 Data validation
5. Perspectives
References
Comprehensive Analytical Chemistry, Volume 52, ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00217-1
© 2009 Elsevier B.V. All rights reserved.

1. WHY TO USE MASS SPECTROMETRY IN QUANTITATIVE PROTEOMICS

In order to advance our understanding of the molecular processes involved in the development, survival, or pathology of an organism, different but complementary strategic tracks can be followed. These approaches and, thus, the respective disciplines (genomics, transcriptomics, and proteomics) evolved from the central dogma of molecular biology of how gene products are formed. The genome contains all the basic information necessary for life and survival and describes the biological potential of an organism in general. However, the genome is predominantly static and, as a consequence, usually remains unaltered regardless of intra- and extracellular stimuli or perturbations. Associated effects are ultimately manifested in changes in abundance, structure, function, and interaction on the level of proteins.

The genome and the proteome are dynamically linked by the transcriptome, which reflects the subset of genes actively expressed in an organism at any given time. Modern transcriptomic techniques provide us with the tools for describing the entirety of mRNA transcripts present in a particular cell type, tissue, or organism in a quantitative manner. Information about the abundance of a distinct species of mRNA molecules has been used to predict the expression level of the corresponding protein. However, studies comparing mRNA and protein levels revealed that these do not necessarily correlate [1]. On this account, proteomics methodologies are essential for monitoring changes in protein abundance as well as for characterizing proteins.

Proteomics takes advantage of the existence of databases containing the sequences of diverse organisms which became available through various genome sequencing projects. This certainly helped to promote proteomics and to establish its current status in biological, medical, and pharmaceutical sciences. At the same time, rapid advancements in mass analyzers as well as ionization and dissociation methods of biomolecules elevated mass spectrometry (MS) to a key technology in proteomics. Modern proteomics methods combined with MS provide the capability to identify (sub)cellular proteomes, to characterize post-translational modifications, as well as to map functional protein complexes. Furthermore, increasing effort has been made to develop quantitative proteomics methodologies, which allow for assessing changes in protein abundance associated with, for example, a stimulus or a disease.
In contrast to descriptive proteomics, quantitative proteomics generally aims at identifying only those proteins that show changes in expression levels or undergo changes regarding the relative degree of post-translational modifications. It offers a unique opportunity for placing proteins into a functional context. To attain information on protein abundance, two alternative strategies can be followed. The first one is based on two-dimensional polyacrylamide gel electrophoresis (2-D PAGE), usually followed by single-stage MS for protein identification. The second strategy utilizes MS with or without stable isotope labeling for gaining both qualitative and quantitative information on proteins using the same analytical device. 2-D PAGE is a well-established technology that has been used for about 30 years to reveal quantitative differences between biological samples by densitometric analysis of spot patterns in gels after silver or Coomassie Blue staining [2]. Sensitivity and linear dynamic range of this method could be improved significantly by introducing the difference gel electrophoresis (DIGE) technology [3]. However, shortcomings such as a strong bias against hydrophobic proteins (i.e., membrane proteins) and proteins with an extreme isoelectric point and/or molecular weight remain inherent to 2-D PAGE. Furthermore, a high number of protein spots may contain more than a single protein species, since the complexity of samples often exceeds the resolution of 2-D PAGE. In such a case, protein quantification of high accuracy and specificity using densitometry is not always feasible. To circumvent these problems, gel-free approaches for
comparative quantitative analysis of proteins have been developed, in which MS is employed for both descriptive and quantitative purposes. In the present chapter, we discuss basic principles as well as novel, intriguing applications of MS in quantitative and functional proteomics. In these approaches, capability, versatility, and usefulness of biological MS are exploited at their best.
2. MS-BASED APPROACHES TO QUANTITATIVE PROTEOMICS

2.1 Label-free protein quantification by MS

Initial efforts in MS-based proteomics research focused on the mere identification of proteins expressed in a tissue or a cell type using high-performance liquid chromatography (HPLC) coupled to nano electrospray ionization (ESI) tandem MS (MS/MS). More recently, however, efforts have shifted to develop methods which enable the quantification of proteins by MS in order to reveal differences in protein abundance, for example in a set of diseased and "control" or perturbed and non-perturbed samples. While densitometric analysis of 2-D gels is already used on a routine basis in comparative proteomics, HPLC/ESI-MS/MS has only recently been exploited to simultaneously attain information on protein identities as well as quantities. At first glance, the latter approach appears straightforward. The mass spectrometer measures both the specific mass-to-charge (m/z) ratio and the ion current for each peptide. At relatively low concentrations, the ion current of an analyte correlates with its concentration. The MS response, however, levels off at higher analyte concentrations. This is mainly a consequence of saturation in the ESI process [4]. Suppression of ion formation is also caused by the presence of co-eluting components or solution constituents (e.g., buffer). Irrespective of the nature of ion suppression, each process results in the detection of peptide signal intensities that do not necessarily correlate with true peptide concentrations in a given sample [5–7]. Since the MS response strongly depends on the solvent as well as on sample composition and concentration, different signal intensities are likely to be recorded for the same peptide which is present in equal amounts in two different samples. Only the use of an adequate internal standard can compensate for those effects.
In order to determine quantitative changes in protein abundance in, for example, healthy and diseased tissues by label-free proteomics methods, same amounts of respective protein digests are sequentially analyzed by LC/ESI-MS/MS. When peptides are eluting from the column they are directly transferred into the ESI source where they become ionized. Subsequently, m/z ratios and intensities of peptides are measured in the mass analyzer (referred to as MS survey scan) followed by consecutive MS/MS scans of typically the three most abundant peptide ions observed in the survey scan. A complete LC/ESI-MS/MS run comprises a series of iterative cycles of MS survey and MS/MS scans in an effort to analyze each peptide across the entire LC gradient. Product ion scans provide information on peptide sequences; this represents the basis for protein identification via bioinformatics. Using consecutive MS survey scans, elution profiles of
individual peptides can be reconstructed and displayed as extracted ion chromatograms (XICs). From peptide XICs, quantitative information can be deduced using peptide peak areas as a measure of the abundance of respective proteins in the sample. For the relative quantitative comparison of two states, XICs of peptides from different LC/MS runs are extracted and then matched between LC runs to account for chromatographic shifts in peptide retention. The correct matching of peptide elution profiles is confirmed using sequence information retrieved in MS/MS scans. However, due to undersampling in the mass analyzer, only a fraction of all peptides is typically subjected to MS/MS when analyzing complex samples by LC/MS. Thus, the sequence signature of a given peptide is not necessarily provided in every run, which can hamper the correct matching of peptides between different LC/MS runs. The applicability of label-free approaches to relative quantitative protein analysis was first shown by Chelius et al. [8,9] using either 1-D or 2-D LC/ESI-MS/MS. Protein abundance ratios were calculated on the basis of peak areas of respective peptides, which allowed for relative quantification with an error of ~20% and a linear dynamic range of ~10^4. The lower limit of protein quantification is mainly determined by the MS detection limit, which is typically in the low femtomole range, and the signal-to-noise level. Factors such as sample complexity, loading capacity of the LC system, and point of saturation of the MS detector affect the upper limit of linear signal response. To minimize technical variability and, hence, quantification errors in label-free approaches, highly reproducible sample preparation as well as accurate pipetting and MS sample injection have to be ensured; among those, the latter step has been reported to limit reproducibility most [10,11].
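The use of XIC peak areas as a measure of abundance can be illustrated with a minimal sketch. It assumes that the XIC of one peptide has already been extracted and matched between two runs; the function names and all numerical values are hypothetical and serve only as an illustration, not as the procedure used in the cited studies.

```python
from typing import Sequence


def xic_peak_area(times: Sequence[float], intensities: Sequence[float]) -> float:
    """Integrate an extracted ion chromatogram (XIC) with the trapezoidal rule."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def abundance_ratio(area_a: float, area_b: float) -> float:
    """Relative abundance of a peptide in sample A versus sample B."""
    if area_b <= 0:
        raise ValueError("reference peak area must be positive")
    return area_a / area_b


# Hypothetical XICs of the same peptide in two LC/MS runs
# (retention time in minutes, ion current in arbitrary units).
times = [10.0, 10.1, 10.2, 10.3, 10.4]
healthy = [0.0, 2.0e5, 6.0e5, 2.0e5, 0.0]
diseased = [0.0, 4.0e5, 1.2e6, 4.0e5, 0.0]

ratio = abundance_ratio(xic_peak_area(times, diseased), xic_peak_area(times, healthy))
print(f"diseased/healthy ratio: {ratio:.2f}")
```

In this made-up example the ion current is doubled at every time point, so the peak-area ratio is 2. Real data would first require retention time alignment and peak boundary detection, which are omitted here.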
In addition, given potential variability in ESI efficiencies and peptide retention times across various chromatographic separations, the use of an adequate internal standard is a prerequisite when aiming at accurate relative quantification by MS. The normalization method best applicable to label-free quantitative proteomics is still under debate [12]. The accuracy of quantitative measurements can clearly be improved by either adding the same amount of a standard protein digest to each sample [13] or utilizing abundant unchanging proteolytic peptides from housekeeping proteins [8,14]. These are then used as landmarks between runs and for data normalization. Label-free LC/ESI-MS/MS analyses in conjunction with data normalization show linear correlation between the amounts of an analyte and respective peak areas for the analysis of simple mixtures [4,6] as well as for proteolytic peptides from a single or a few proteins spiked into a complex biological sample [9,13,14]. Continuative studies focused on the applicability of label-free approaches to: (1) the relative quantification of post-translationally modified proteins [8,15–18]; (2) the analysis of protein distributions across several density gradient fractions [19,20] (for detailed discussion see Section 3.1); and (3) large-scale quantitative proteomics experiments [8,21–24]. The latter approach is accompanied by the development of software tools suitable for automated processing and statistical analysis of large-scale datasets [11].
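Normalization to a spiked standard, as described above, can be sketched as follows. This is a simplified illustration that assumes a single standard peptide whose amount is identical in every run; the function name and the peak areas are hypothetical.

```python
from typing import Dict


def normalize_to_standard(peptide_areas: Dict[str, float],
                          standard_area: float) -> Dict[str, float]:
    """Divide each peptide peak area by the peak area of the spiked standard."""
    if standard_area <= 0:
        raise ValueError("standard peak area must be positive")
    return {peptide: area / standard_area for peptide, area in peptide_areas.items()}


# Hypothetical peak areas. Run 2 shows a twofold lower overall MS response,
# visible in the lower peak area of the standard.
run1 = normalize_to_standard({"peptide_A": 5.0e5, "peptide_B": 8.0e5}, standard_area=1.0e6)
run2 = normalize_to_standard({"peptide_A": 2.5e5, "peptide_B": 8.0e5}, standard_area=5.0e5)

# Ratios of normalized areas (run 2 versus run 1).
ratios = {peptide: run2[peptide] / run1[peptide] for peptide in run1}
print(ratios)
```

Without normalization, peptide_A would appear to be halved and peptide_B unchanged. After normalization, peptide_A is unchanged and peptide_B is doubled, which shows how the standard compensates for run-to-run differences in response.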
In an effort to improve the probability to detect significant differences not only of high-abundance but also of low-abundance proteins in two different samples, multiple LC/ESI-MS/MS analyses of the same binary sample set combined with advanced statistical data analysis were performed [21,25], thereby compensating for variability in signal intensities, noise levels, and experimental variations. Recently, Smith and co-workers [26] adopted the "accurate mass and time (AMT)" tag concept (for a review on the AMT approach see ref. [27]) to quantitatively compare the proteomes of Shewanella oneidensis cultured under aerobic and suboxic conditions using Fourier transform ion cyclotron resonance (FTICR) MS. In this elaborate approach, tryptic peptide mixtures were analyzed by 2-D LC/ESI-MS/MS on an ion trap instrument in order to generate a database consisting of theoretical accurate masses of identified peptides and their normalized elution times (NETs). For the purpose of relative quantification of proteins, AMT tags of peptides were obtained by MS survey scans of tryptic digests of differentially cultured cells on an LC-FTICR instrument operated at a resolution of ~10^5. Peptide identities were inferred by searches against accurate masses and NETs stored in the previously established database. Peak areas of peptides were then used to determine abundance ratios of respective proteins. Two biological experiments, each with three technical replicates, were performed. Only proteins for which peptides were detected in multiple replicates were considered for relative quantification, certainly improving the significance of data. A benefit of the AMT approach for label-free quantitative proteomics is a very sensitive and accurate mass analysis. Furthermore, the acquisition of AMT tags obviates the need for precursor ion selection, thereby improving peptide ion statistics, sensitivity, and accuracy of protein quantification.
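The database lookup at the core of the AMT concept can be sketched as a search within a mass tolerance and a NET tolerance. The following is a minimal illustration and not the software used in the cited work; the class and function names, the tolerance values, and the database entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AmtTag:
    peptide: str   # peptide identifier from the earlier MS/MS identification
    mass: float    # theoretical monoisotopic mass in Da
    net: float     # normalized elution time, scaled to the range 0..1


def ppm_error(observed_mass: float, reference_mass: float) -> float:
    """Relative mass deviation in parts per million."""
    return (observed_mass - reference_mass) / reference_mass * 1.0e6


def match_feature(observed_mass: float, observed_net: float, database: List[AmtTag],
                  ppm_tol: float = 5.0, net_tol: float = 0.02) -> Optional[AmtTag]:
    """Return the single AMT tag within both tolerances, or None if the
    feature has no match or matches more than one tag (ambiguous)."""
    hits = [tag for tag in database
            if abs(ppm_error(observed_mass, tag.mass)) <= ppm_tol
            and abs(observed_net - tag.net) <= net_tol]
    return hits[0] if len(hits) == 1 else None


# Hypothetical database with two peptides of nearly identical mass.
database = [
    AmtTag("peptide_A", 1479.7954, 0.412),
    AmtTag("peptide_B", 1479.8100, 0.655),
]

hit = match_feature(1479.7960, 0.415, database)
print(hit.peptide if hit else "no unambiguous match")
```

The observed feature lies about 0.4 ppm from peptide_A and about 9.5 ppm from peptide_B, so only peptide_A satisfies the assumed 5 ppm window. The example also indicates why high mass accuracy matters: with a wider mass tolerance, the NET would be the only criterion separating the two candidates.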
The improvement of ion statistics and signal-to-noise ratios via prolonged scan times is particularly beneficial for accurate peak recognition and, hence, for the quantification of low-abundance proteins. While the mass spectrometric system used primarily determines accuracy and dynamic range of peak ratio measurements, one should consider that reproducibility strongly depends on the entire experimental strategy employed. This aspect is particularly important for label-free quantitative proteomics approaches and is a good reason for minimizing the number of parallel sample processing and fractionation steps. Aiming at meaningful quantitative data on proteins, one has to consider both the technical and biological variability of the system. It is therefore essential to perform an adequate number of technical and biological replicates and to calculate protein abundance ratios on the basis of multiple peptide peaks.

To summarize, label-free proteomics is most appealing to researchers for its ease of use and cost-effectiveness. Further advantages are as follows: (1) it is applicable to virtually every type of sample, including material from tissue biopsy; (2) it does not affect growth or sample preparation; and (3) MS analysis is not affected by changes in sample complexity, peptide mass, ionization efficiency, or collisional activation conditions. Recent approaches appear to be promising for relative protein quantification on a global scale; however, further refinements are certainly needed before reaching the level of accuracy usually achieved by employing isotopically labeled internal standards as discussed in the following sections.
Nonetheless, this technique will certainly advance, particularly with ongoing software developments including statistical tools for relative quantitative analysis of large-scale LC/MS datasets.
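The recommendation to base protein abundance ratios on multiple peptide peaks and on replicate measurements can be sketched as follows. This is only one possible scheme, assuming that peak areas are averaged over replicates per peptide and that the protein ratio is taken as the median of the peptide log2 ratios; the function name, thresholds, and peak areas are hypothetical.

```python
import math
from statistics import mean, median


def protein_log2_ratio(areas_a, areas_b, min_replicates=2, min_peptides=2):
    """Median log2(B/A) over the peptides of one protein.

    areas_a, areas_b: dict mapping peptide -> list of peak areas, one per
    replicate run in which the peptide was detected.
    Returns None if too few peptides pass the replicate filter.
    """
    log_ratios = []
    for peptide in sorted(areas_a.keys() & areas_b.keys()):
        a, b = areas_a[peptide], areas_b[peptide]
        # Keep only peptides detected in multiple replicates of both states.
        if len(a) < min_replicates or len(b) < min_replicates:
            continue
        log_ratios.append(math.log2(mean(b) / mean(a)))
    if len(log_ratios) < min_peptides:
        return None
    return median(log_ratios)


# Hypothetical peak areas for three peptides of one protein.
state_a = {"PEP1": [100.0, 110.0, 90.0], "PEP2": [200.0, 200.0, 200.0], "PEP3": [50.0]}
state_b = {"PEP1": [200.0, 200.0, 200.0], "PEP2": [400.0, 420.0, 380.0], "PEP3": [500.0, 500.0]}

ratio = protein_log2_ratio(state_a, state_b)
print(ratio)
```

PEP3 is ignored because it was detected in only one replicate of state A, so its apparent tenfold change does not influence the protein ratio.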
2.2 Stable isotope labeling methods for quantitative MS
2.2.1 Metabolic labeling strategies
Metabolic labeling strategies are based on the in vivo incorporation of stable isotopic labels during growth of organisms. Nutrients or amino acids in a defined culture medium are replaced by their isotopically labeled (15N, 13C, or 2H) counterparts, which leads to uniform labeling of proteins during the processes of cell growth and protein turnover. For comparative, quantitative proteomics studies, different populations of cells are grown in “light” and “heavy” media, incorporating the label into the proteins at the earliest stage possible. Since the label introduces an inherent and distinct mass shift into the proteins, proteolytic peptides identical in sequence but from differently labeled populations can easily be distinguished by MS. An essential advantage of metabolic labeling approaches is the fact that for each peptide present in a proteolytic mixture, an internal standard allowing for accurate relative quantification is generated in vivo. Metabolically labeled cells can be combined directly after harvesting, thereby minimizing errors due to separate sample handling. Metabolic labeling is arguably one of the most accurate methods for relative quantification of proteins by MS. It is best suited to biological systems that can be maintained under controlled conditions, such as cell culture systems. However, metabolic labeling of multicellular organisms such as Caenorhabditis elegans and Drosophila melanogaster as well as small mammals (i.e., Rattus norvegicus) using 15N isotopes has been reported [28,29]. Figure 1 shows a schematic representation of the general workflow of metabolic labeling employed for relative quantitative MS analyses. The two most commonly used strategies, labeling with 15N and stable isotope labeling by amino acids, will be discussed below.
For a more comprehensive review addressing opportunities and pitfalls associated with metabolic labeling, see ref. [30].
2.2.1.1 15N-labeling. For global labeling of proteomes with 15N isotopes, cells are grown in media containing 15N-enriched (>96%) ammonium salts, which eventually leads to the replacement of almost all nitrogen atoms in proteins by 15N. After harvesting, differentially labeled cells are typically mixed in a 1:1 ratio with respect to weight or cell number, followed by cell lysis. At this point, the sample may be subjected to fractionation and purification steps in order to reduce sample complexity, thereby increasing the probability of detecting low-abundant proteins. Since the proteomes to be compared are combined in one sample, the accuracy of quantification is not affected by experimental variations. Proteins are then proteolytically digested and the resulting peptide mixtures are analyzed by MS. Differentially 14N/15N-labeled peptides, which are otherwise identical, can be
[Figure 1 diagram: cell populations in state A and state B (“light” and “heavy” cells) are mixed 1:1, followed by cell lysis; sample fractionation and/or purification (optional); separation before or after proteolytic digestion; and protein quantification and identification by mass spectrometry. Inset mass spectrum (relative intensity versus m/z): “light” peptide peaks at 490.3, 490.8, and 491.3; “heavy” peptide peaks at 492.3, 492.8, and 493.3.]
Figure 1 Schematic representation of metabolic labeling for relative quantitative MS analysis. Different populations of cells (state A and B) are grown in media containing either “light” or “heavy” stable isotopes, such as 14N/15N-coded nutrients or differently isotopically labeled amino acids, which eventually results in the incorporation of the respective isotopes into the entire proteome. After harvesting, cells can be combined in a 1:1 ratio. Following cell lysis, the sample may be subjected to further fractionation and purification steps in order to reduce sample complexity. Proteins are then proteolytically digested and resulting peptide mixtures are subjected to MS analysis. Differentially labeled peptides can be identified in mass spectra as doublets separated by the mass shift imparted by the label. Information about relative protein abundance can be extracted from intensities or areas of corresponding peptide peaks. The example shows a doubly charged peptide pair exhibiting a nominal mass shift of 4 Da.
identified in mass spectra as doublet ion clusters separated by the mass difference introduced by the 15N isotopes. Relative quantitative information on proteins in the two samples is obtained by comparing peak intensities or peak areas of the corresponding “light” and “heavy” peptides. For increased accuracy in relative quantification, a set of at least two differentially labeled proteolytic peptide pairs should be assignable to each protein. In 15N-labeling, quantitative analysis of MS data is complicated by the fact that each nitrogen atom in each amino acid is replaced by the “heavy” isotope. This results in varying mass shifts that depend on the length and amino acid composition of the peptide and can only be predicted if the amino acid sequence of the protein is known. In order to minimize peak overlap and facilitate interpretation of MS data, a reduction in sample complexity by pre-fractionation combined with the use of high-resolution MS was recommended [31]. However, low-resolution ion trap instruments in combination with suitable algorithms for data analysis have successfully been used in shotgun experiments aiming at analyzing differential expression of 14N/15N-labeled proteins in single-cell organisms [32–34]. Differential labeling of entire proteomes of bacteria and yeast using 15N-enriched media followed by quantitative MS analysis was first published in the late 1990s [35,36] and since then has been applied to mammalian cell lines [31].
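The sequence dependence of the 14N/15N mass shift can be made explicit with a short sketch. It assumes complete 15N incorporation and counts the nitrogen atoms of each residue (backbone plus side chain); the function names and the two example peptides are hypothetical.

```python
# Nitrogen atoms per amino acid residue (backbone N plus side-chain N).
N_PER_RESIDUE = {aa: 1 for aa in "ACDEFGILMPSTVY"}
N_PER_RESIDUE.update({"K": 2, "N": 2, "Q": 2, "W": 2, "H": 3, "R": 4})

# Mass difference between 15N and 14N in Da.
DELTA_15N = 15.000109 - 14.003074


def nitrogen_count(peptide):
    """Number of nitrogen atoms in an unmodified peptide."""
    return sum(N_PER_RESIDUE[aa] for aa in peptide)


def shift_15n(peptide, charge=1):
    """m/z spacing of the 14N/15N doublet, assuming complete labeling."""
    return nitrogen_count(peptide) * DELTA_15N / charge


# Two hypothetical peptides of equal length but different composition.
for peptide in ("AAAGK", "AHHRK"):
    print(peptide, nitrogen_count(peptide), round(shift_15n(peptide, charge=2), 3))
```

Both peptides contain five residues, yet the number of nitrogen atoms differs by more than a factor of two, so the doublet spacing cannot be predicted without the sequence.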
Heck and co-workers [28] applied 15N-labeling to D. melanogaster and C. elegans, which were fed 15N-labeled single-cell organisms. This resulted in at least 94–95% incorporation of “heavy” nitrogen atoms into all proteins, which is generally sufficient to enable accurate MS-based protein quantification. Engelsberger et al. [37] demonstrated the applicability of 15N-labeling to comparative proteomics of Arabidopsis thaliana cell cultures. In studies aiming at analyzing molecular mechanisms underlying leaf senescence in plants [38] and the influence of light and dark on plant growth [39], 15N-labeling has successfully been applied to label A. thaliana plants. Yates and colleagues [29] extended this labeling strategy to small mammalian organisms. A rat was fed 15N-enriched algae for 44 days, which led to incorporation rates of 15N ranging from approximately 75% in the brain to >90% in the liver and plasma proteins. A comparison with a littermate subjected to the same feeding regime but using 14N algae revealed that the 15N-rich diet appeared to have no adverse effects on growth and health of the labeled animal. However, it cannot be excluded that such a highly specialized diet impairs growth and development in other cases. Considering the fact that metabolic labeling using stable isotope-enriched media is quite cost-intensive, labeling with 15N is better suited for simpler organisms.
2.2.1.2 Stable isotope labeling by amino acids in cell culture. A very effective yet simple strategy for in vivo labeling of proteins that has gained widespread application is stable isotope labeling by amino acids in cell culture (SILAC) [40]. Before Mann and co-workers established this approach for comparative, quantitative proteomics, metabolic incorporation of isotopically coded amino acids in cell culture was employed to generate mass-tagged proteolytic peptides for increased specificity and accuracy of protein identification via peptide mass fingerprinting [41,42]. This method has been further applied to identify posttranslational modifications such as phosphorylation and methionine oxidation [43]. Jiang and English combined differential metabolic labeling of yeast cells using natural abundance and deuterated leucine with two-dimensional gel electrophoresis followed by MALDI-TOF-MS for assessing protein expression levels [44]. Differential labeling of proteomes of distinct cell populations is achieved by growing one set of cells in normal medium while another one is grown in medium containing one or more selected isotopically coded (2H, 13C and/or 15N) amino acids. After several cell cycles, the amino acid(s) will be effectively incorporated into the proteins. Analogous to the metabolic labeling approach using 15N, the workflow of a SILAC approach typically consists of pooling differentially labeled cells directly after harvesting, followed by protein extraction, multi-step purification processes, proteolytic digestion, and finally peptide LC/ESI-MS/MS analyses. Unlike 14N/15N-labeled peptide pairs, however, corresponding peptides in a SILAC experiment exhibit fixed mass differences that are defined by the combination of the isotopically coded amino acid used for labeling and the enzyme chosen for proteolytic digestion. For ease of MS data evaluation and accurate quantification of protein abundances, labeling of proteomes should be as complete as possible.
This can be achieved using amino acids that the organisms subjected to the study are
unable to synthesize, that is, essential amino acids or amino acids for which genetically manipulated organisms are auxotrophic. In either case, the cells are obliged to incorporate the particular form of the selected amino acid supplied in the medium. After an appropriate number of cell doublings (i.e., more than five cell doublings; ref. [40]), the degree of incorporation is generally limited by the purity of the isotope-coded amino acid used, which is usually at least 96%. The nature of the amino acid chosen for metabolic labeling can be crucial for the outcome of a SILAC experiment. In general, amino acids with relatively high abundances such as arginine, leucine, and lysine are employed since they guarantee global labeling of a proteome. This results in a high number of labeled proteolytic peptides, thereby providing as much information as possible for accurate relative quantification of proteins based on multiple peptide pairs. Ideally, the amino acid used for labeling should not be part of the biosynthetic pathway of any other amino acid in order to prevent accidental appearance of the label in a different amino acid through metabolic conversion. Conversion of 13C6-arginine to fully 13C-substituted proline has been observed in HeLa cells, presumably through arginine catabolism via the arginase pathway [45]. In a quantitative proteomics study, this can be taken into account by reducing the amount of arginine in the media or by omitting proline-containing peptides from relative protein quantification [46]. Besides 13C6-arginine, 13C6/15N4-arginine [47], 13C6-lysine [48], 13C6/15N2-lysine [49], 2H3-leucine [40,50], 13C6-leucine [51], 13C9-tyrosine [52], and 13C2H3-methionine [53] have been employed for SILAC experiments. However, when using deuterated forms of amino acids as labeling agents, one should bear in mind that corresponding peptide pairs may exhibit a shift in retention time in reversed phase (RP) HPLC.
This means that labeled and unlabeled peptides do not necessarily co-elute, which can complicate the accurate assessment of the relative abundance of peptide pairs [54]. 13C/12C-labeled peptide pairs do not exhibit chromatographic differences in RPLC and thus are jointly detected in the same MS scan [45]. Maximal information for the quantification of proteins is obtained when simultaneous labeling with 13C6-arginine and 13C6-lysine combined with tryptic digestion is employed [55,56]. Each tryptic peptide except for the C-terminal peptide of the precursor protein will bear an isotopically labeled amino acid that introduces a fixed mass shift of 6 Da or, in the case of missed cleavage sites, a multiple of 6 Da. This usually results in increased sensitivity and accuracy of relative quantification of proteins by MS. The availability of 12C6-arginine, 13C6-arginine, and 13C6/15N4-arginine allows for simultaneously comparing three different populations of cells [47]. In mass spectra obtained in such an experimental set-up, differentially labeled peptide species appear as triplets of ion clusters separated by mass shifts of 6 and 10 Da. The SILAC strategy has been successfully applied to study differences in protein expression during different biological processes, such as muscle cell differentiation [40], metastatic prostate cancer progression [48], resistance to anticancer drugs in breast cancer cell lines [56], B cell differentiation [51], microglial activation [57], stress-induced regulation of glutathione-S-transferase expression in suspension cells of A. thaliana [58], and CD95-induced apoptosis [59]. SILAC has
also proven to be a powerful tool for identifying and analyzing post-translational modifications. For example, it has been used for the analysis of signal transduction events mediated by phosphorylation/dephosphorylation in the yeast pheromone signaling pathway [60], as well as in ephrin B2 signaling in NG108 cells [61], in Her2/neu signaling using NIH 3T3 cells stably transfected with Her2 [62], and for identifying and quantifying methylation sites of proteins in HeLa cells [53]. Furthermore, SILAC was employed to dissect and quantify protein–protein interactions and multi-protein complexes involved, among others, in the epidermal growth factor receptor (EGFR) pathway [63], in focal adhesion complexes [64], and in the insulin-dependent translocation of the insulin-regulated glucose transporter, GLUT4, to the plasma membrane [65]. This list exemplifies the different SILAC applications for use in quantitative proteomics and reflects the very high potential of this method. SILAC is currently one of the most promising tools in quantitative proteomics and, certainly, will be increasingly exploited to gain deeper insights into functional aspects of changes in protein levels, protein dynamics including the determination of post-translational modifications, as well as protein–protein interactions in various biological systems.
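The fixed mass differences that distinguish SILAC from 15N-labeling follow directly from the number of labeled residues in a peptide. The sketch below computes them for the arginine and lysine labels mentioned above; the function name and the example peptides are hypothetical, and complete incorporation is assumed.

```python
D13C = 1.003355  # mass difference 13C - 12C in Da
D15N = 0.997035  # mass difference 15N - 14N in Da

ARG6 = 6 * D13C              # 13C6-arginine
LYS6 = 6 * D13C              # 13C6-lysine
ARG10 = 6 * D13C + 4 * D15N  # 13C6/15N4-arginine


def silac_shift(peptide, label_shifts):
    """Mass shift of a peptide, given the shift contributed by each labeled residue."""
    return sum(label_shifts.get(aa, 0.0) for aa in peptide)


double_label = {"R": ARG6, "K": LYS6}

# Fully cleaved tryptic peptide: one labeled residue, shift of about 6 Da.
print(round(silac_shift("AGFAGDDAPR", double_label), 2))
# One missed cleavage site: two labeled residues, shift of about 12 Da.
print(round(silac_shift("AKGGR", double_label), 2))
# Triplet experiment with three arginine forms: shifts relative to the light form.
print([round(silac_shift("AGFAGDDAPR", {"R": s}), 2) for s in (0.0, ARG6, ARG10)])
```

For a doubly charged ion the observed m/z spacing is half of the computed mass shift, in line with the doublet shown in Figure 1.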
2.2.2 Chemical tagging strategies
Aside from the incorporation of stable isotopes by metabolic labeling, various chemical reactions have been exploited to introduce isotope-encoded tags into proteins or their respective peptides. Generally, stable isotope labeling via chemical reaction enables quantitative MS-based proteomics investigations of tissue biopsy samples and any other biological material that is not amenable to metabolic labeling. All chemical labeling approaches in proteomics employ a common strategy whereby proteomes are quantitatively compared using isotope-encoded tags combined with MS. Following the labeling step, samples are combined in equal ratio and jointly processed for peptide MS analysis. In order to facilitate mass spectrometric discrimination of differentially labeled peptide pairs, the “heavy” isotope tag should confer a mass shift of at least 4 Da. Mass shifts of less than 4 Da generally result in an overlap of the isotopic distributions of peptide pairs, which complicates quantitative evaluation of MS data. The chemical tag is usually targeted to a specific functional group of an amino acid residue within a polypeptide, to which it is covalently attached. Figure 2 schematically represents a generic peptide and depicts current isotope-coding strategies that target various functional groups, such as the amine, sulfhydryl/thiol, and carboxyl groups.
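Why a mass shift of about 4 Da is needed can be estimated with a rough calculation. The sketch below is a deliberately crude model: it assumes an average ("averagine") peptide composition of about 4.94 carbon atoms per 111.13 Da, considers only 13C at a natural abundance of 1.07%, and approximates the isotope envelope by a Poisson distribution. The function name is hypothetical, and the numbers are meant as orders of magnitude only.

```python
import math

CARBONS_PER_DA = 4.9384 / 111.1254  # averagine: carbon atoms per Da of peptide mass
P_13C = 0.0107                      # natural abundance of 13C


def relative_isotope_intensity(peptide_mass, k):
    """Intensity of the M+k isotope peak relative to the monoisotopic peak.

    Poisson approximation that considers 13C only; other isotopes are ignored.
    """
    lam = peptide_mass * CARBONS_PER_DA * P_13C  # expected number of 13C atoms
    return lam ** k / math.factorial(k)


# Fraction of the "light" monoisotopic intensity that falls onto the
# monoisotopic peak of a "heavy" partner shifted by 2 Da or by 4 Da.
for shift in (2, 4):
    print(shift, round(relative_isotope_intensity(2000.0, shift), 3))
```

Under these assumptions, for a peptide of 2000 Da the "light" envelope still contributes almost half of its monoisotopic intensity at +2 Da, but only a few percent at +4 Da; the interference grows with peptide mass.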
2.2.2.1 Targeting sulfhydryl groups. Cysteine residues in proteins are appealing targets for stable isotope labeling by site-directed derivatization of the nucleophilic sulfhydryl (-SH) group. Several cysteine-directed labeling strategies were developed for quantitative purposes, although only one approach, the isotope-coded affinity tag (ICAT) technology, has become more widely accepted in proteomics research. Following the first report by Aebersold and co-workers in 1999 [66], ICAT was quickly commercialized as a stable isotope reagent for quantitative proteomics applications, enabling both labeling of intact proteins and
Figure 2 Schematic representation of current chemical isotope tagging strategies, which target specific functional groups of peptides/proteins, such as the amine, sulfhydryl/thiol, and carboxyl groups (for detailed information on the principles and applications of these methods, please refer to the main text).
subsequent enrichment of the respective tagged peptides prior to MS analysis. The original ICAT reagent consists of three components: a biotin tag, which is used to affinity-enrich tagged peptides; an oxyethylene linker region; and a thiol-specific reactive iodoacetyl group by which cysteine residues in proteins are derivatized. Through incorporation of stable isotopes into the linker region, a “light” and a “heavy” form of the ICAT reagent are available. In the “heavy” version, eight deuterium atoms (2H8) are incorporated, bestowing a mass difference of 8 Da per tag between differentially labeled peptides. Following the ICAT strategy, two protein samples are differentially tagged, pooled in equal ratio, and proteolyzed to peptides. Peptides carrying the ICAT tag are affinity-enriched using the biotin tag and subsequently analyzed by LC/ESI-MS/MS for identification and relative quantification. An important feature of the ICAT technology is the specific enrichment of cysteine-containing peptides, which are of comparatively low occurrence. As a result, sample complexity is significantly reduced prior to MS analysis. This alleviates problems related to peak overlap and peptide undersampling during MS analysis. However, no information on either the identity or the quantity of cysteine-free proteins is obtained, which limits proteome coverage. It is also known that iodoacetamide-based alkylating reagents may not completely alkylate all cysteine residues in a protein [67,68]. Quantitative tagging of cysteine residues with ICAT requires optimized reagent concentrations, buffer conditions, and reaction time [69]. Prolonged reaction times, for example, may result in partial
derivatization of other amino acid residues, such as lysine, histidine, methionine, tryptophan, and/or tyrosine. To avoid cleavage of the thioether linkage by reaction with molecular oxygen, labeling and further sample processing steps are preferably performed under inert gas. When using the original 1H8/2H8-encoded ICAT reagents, one has to consider that retention times of differentially tagged peptides differ during RP-HPLC [54], which complicates relative quantification. In addition, the large biotin tag (+442 Da) increases the molecular weight of the peptide to a significant extent. Since the biotin tag is also prone to fragmentation, interpretation of peptide MS/MS spectra is complicated [69]. These issues have been addressed by developing a new version of ICAT, termed acid-labile isotope-coded extractants (ALICE) [70]. The refined tag contains an acid-cleavable site in the tether between biotin tag and iodoacetyl group, which allows for the removal of biotin before MS analysis. Moreover, the new tag comprises a 13C9-coded linker. As a result, ALICE-labeled peptide pairs co-elute from the RP column and, thus, are observed in the same MS scans; this facilitates the process of relative quantification. The development of the various ICAT reagents and their application has been summarized by Tao and Aebersold [71]. More recently, the usability of the original and the acid-cleavable ICAT for MS-based analysis was compared by differentially labeling a tryptic digest from yeast proteins with either tag followed by ESI-MS/MS analyses [72]. The use of cleavable ICAT resulted in the identification of an additional 33% of proteins with no loss in accuracy of quantification. Cleavable ICAT was also used to complement 2-D PAGE analysis of the proteome of Pseudomonas putida KT2440 following induction with benzoate. Results obtained by ICAT and MS confirmed some 2-D PAGE findings and, moreover, identified a new set of proteins induced under these conditions [73].
An alternative to ICAT was recently reported, referred to as HysTag, in which the biotin tag is replaced with a six-histidine sequence [74]. The histidine motif can be used for affinity-enrichment using Ni2+-loaded, immobilized metal affinity chromatography (IMAC) or strong cation-exchange chromatography. HysTag is a derivatized decapeptide consisting of four components: (i) an affinity ligand; (ii) a tryptic cleavage site; (iii) a coding region containing different isotopic forms of alanine (1H4 or 2H4); and (iv) a thiol-reactive group. Using the HysTag method, protein lysates are differentially labeled, pooled, and digested with the endoprotease Lys-C. Labeled peptides are affinity-enriched and subsequently subjected to tryptic digestion, which also results in the cleavage of the initial HysTag to a dipeptide-containing tag. This approach was successfully utilized for relative quantitative MS analysis of membrane proteins isolated from the mouse hind- and forebrain [74].
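The two consequences of cysteine-directed tagging discussed above, the reduction in sample complexity and the fixed mass difference per tagged residue, can be illustrated for the original 1H8/2H8-encoded ICAT reagent. The sketch assumes complete derivatization of all cysteine residues; the function name and the peptide list are hypothetical.

```python
DELTA_2H = 2.014102 - 1.007825  # mass difference 2H - 1H in Da
ICAT_SHIFT = 8 * DELTA_2H       # eight deuterium atoms per "heavy" tag


def icat_pairs(peptides, charge=1):
    """Map each cysteine-containing peptide to the m/z spacing of its ICAT pair.

    Cysteine-free peptides are dropped, mimicking the affinity enrichment step.
    """
    return {
        peptide: peptide.count("C") * ICAT_SHIFT / charge
        for peptide in peptides
        if "C" in peptide
    }


# Hypothetical proteolytic peptides of a digested sample.
peptides = ["MACDEK", "LLNRPGSTK", "CYWR", "AAAK", "CAACK"]

for peptide, spacing in icat_pairs(peptides).items():
    print(peptide, round(spacing, 2))
```

Only three of the five peptides remain after the simulated enrichment, and a peptide with two cysteine residues shows twice the mass difference of a singly tagged one.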
2.2.2.2 Targeting primary amines. Derivatization of primary amines (i.e., the ε-amino group of lysine residues and amino termini of peptides/proteins) by isotope-encoded tags is an increasingly used strategy in quantitative MS-based proteomics. Chemical-labeling strategies targeting primary amines are generally referred to as “global internal standard technology” (GIST) approaches [75] (for a detailed review on stable isotope labeling of primary amines, see ref. [76]). GIST
methods offer several benefits, such as the relative ease of amine derivatization or the possibility to quantitatively analyze post-translationally modified proteins (e.g., phosphoproteins). Anhydrides or esters are utilized to covalently attach the isotope-encoded tags to peptides by nucleophilic acyl substitution of the α- and/or ε-amino groups. Since these amino groups differ in nucleophilicity, the pH of the reaction mixture is critical for labeling: acylation of α- and ε-amino groups is favored at pH 6.5 and pH 8.5, respectively, whereas at neutral pH both groups are acylated [77]. Two prominent GIST approaches, for which reagent kits are commercially available, are discussed below.
2.2.2.2.1 iTRAQ. The isobaric tag for relative and absolute quantitation (iTRAQ) technology is based on the global tagging of primary amino groups of peptides using standard N-hydroxysuccinimide (NHS) chemistry [78]. Unlike other tagging approaches, the iTRAQ strategy specifically aims at sample multiplexing. Current iTRAQ reagents are available in four distinct isotopic forms. They consist of three principal components: a reporter group based on N-methylpiperazine; a carbonyl balance group; and a peptide-reactive group (NHS ester). In all isotopic forms, the overall nominal mass of the reporter and balance group is 145.1 Da, owing to the selective inclusion of 13C, 15N, and 18O atoms. Consequently, peptides that were differentially labeled with these isobaric tags appear as single peaks in MS spectra. During peptide MS/MS experiments, the carbonyl balance group is cleaved off, which eventually results in the generation of four distinguishable reporter ions with the respective masses of 114.1, 115.1, 116.1, and 117.1 Da. The relative intensities of these reporter ions reflect the relative quantities of the respective peptides. Identification of peptides is performed using m/z information on all other fragment ions observed in the MS/MS spectra combined with database searches.
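The reporter-ion readout described above can be sketched as a simple peak-picking step on a single MS/MS spectrum. The sketch assumes a centroided peak list of (m/z, intensity) pairs and uses the reporter masses quoted in the text; the matching tolerance, the function names, and the example spectrum are hypothetical, and corrections such as isotope impurity adjustment are omitted.

```python
REPORTER_MZ = (114.1, 115.1, 116.1, 117.1)


def reporter_intensities(peaks, tol=0.05):
    """Sum the intensities of all peaks within tol (in m/z units) of each reporter ion."""
    return {
        reporter: sum(intensity for mz, intensity in peaks if abs(mz - reporter) <= tol)
        for reporter in REPORTER_MZ
    }


def reporter_ratios(intensities, reference=114.1):
    """Express each reporter channel relative to the chosen reference channel."""
    ref = intensities[reference]
    if ref == 0:
        raise ValueError("reference reporter ion was not observed")
    return {reporter: value / ref for reporter, value in intensities.items()}


# Hypothetical centroided MS/MS spectrum: four reporter ions plus sequence ions.
spectrum = [
    (114.11, 1000.0), (115.11, 2000.0), (116.11, 500.0), (117.11, 1000.0),
    (175.12, 300.0), (432.25, 800.0),
]

ratios = reporter_ratios(reporter_intensities(spectrum))
print(ratios)
```

The remaining fragment ions of the spectrum, which are ignored here, are the ones used for peptide identification.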
The iTRAQ strategy offers distinct advantages over conventional peptide coding approaches. The main advantage is the possibility of sample multiplexing without generating clusters of peptide pairs detected in MS scans. Instead, differentially labeled peptides appear as single peaks in MS spectra. Information on the abundance of each peptide is obtained through the respective iTRAQ reporter ions in the second stage of mass analysis. Moreover, the tags are modular and may be modified easily if needed. In order to enable multiplexing of up to eight samples, a second generation of iTRAQ tags has been designed [79]; these reagents may be useful to improve statistical significance through the analysis of multiple biological replicates within a single comparative experiment. Nevertheless, a significant limitation of the iTRAQ strategy arises from the principal feature of its design, namely its reliance on reporter groups that are only observable in MS/MS scans. Since iTRAQ-labeled peptides must be subjected to MS/MS for relative quantification, this approach is well suited only to the analysis of samples of low to moderate complexity. In any case, the use of multidimensional chromatography is recommended in order to reduce the number of co-eluting peptides and, thus, to obtain MS/MS spectra not only of high-abundant but also of low-abundant peptides. In addition, iTRAQ is not compatible with the use of conventional ion trap instruments as their peptide fragmentation spectra
lack information regarding the low m/z range in which iTRAQ reporter ions appear. Nevertheless, in spite of these limitations, iTRAQ has gained wide popularity since its introduction in 2004 and has been applied to address various biological questions (reviewed in ref. [80]). Recently, some efforts have been made to use iTRAQ for chemical labeling of intact proteins [81,82].
2.2.2.2.2 ICPL. Lottspeich and co-workers [83] reported a method, termed isotope-coded protein labeling (ICPL), that allows amine-reactive tags to be introduced into intact proteins. The ICPL method takes advantage of the generally high abundance of lysine residues in proteins, which provides multiple labeling sites and, hence, increases the probability that protein quantification can be based on multiple peptide pairs. The ICPL tag is an isotope-coded N-nicotinoyloxysuccinimide (Nic-NHS). Different versions such as 1H4/2H4- and 12C6/13C6-Nic-NHS have been synthesized for use in quantitative proteomics. The latter version ensures co-elution of peptide pairs, which facilitates accurate data evaluation when employing LC/MS. Using ICPL, samples are reduced and alkylated before the derivatization of proteins is performed. Following sample pooling, differentially labeled protein mixtures can either be separated by chromatography and/or gel electrophoresis or be directly subjected to proteolytic digestion. When using trypsin as the proteolytic agent, one has to consider that enzymatic cleavage of ICPL-labeled proteins only occurs C-terminal to arginine residues and not at the modified lysine residues. This results in tryptic peptides of increased size. Using ICPL combined with 2-D PAGE, both identification and relative quantification of proteins can be achieved by peptide mass fingerprinting on, for example, a MALDI-TOF instrument.
However, a critical feature of ICPL is a considerable change in the net charge of proteins upon derivatization with Nic-NHS, which results in a decrease in protein pI and, thus, affects isoelectric focusing during 2-D PAGE. While this effect may be advantageous for 2-D PAGE analysis of basic proteins (e.g., histones or ribosomal proteins), it generally reduces the resolution of conventional 2-D PAGE for the bulk of proteins [83]. The applicability of ICPL to quantitative MS-based proteomics was demonstrated by the differential analysis of (1) rat hepatoma cells exposed to a hepatocarcinogenic dioxin, resulting in the identification of several candidates implicated in cell cycle regulation, growth factor signaling, and the control of apoptosis [84]; (2) nuclei arrested at different stages of apoptosis, which allowed a previously unknown link between the chromatin protein DEK and cell death to be established [85]; and (3) the membrane proteome of a halobacterium cultured under aerobic versus anaerobic/phototrophic conditions [86].
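The effect of lysine derivatization on tryptic cleavage can be illustrated with a simple in silico digest. The sketch assumes the common simplified rule that trypsin cleaves C-terminal to lysine and arginine unless the next residue is proline, and it models ICPL-labeled proteins by allowing cleavage after arginine only; the function name and the sequence are hypothetical.

```python
import re


def digest(sequence, cleave_after="KR"):
    """Split a protein sequence C-terminal to the given residues, but not before proline."""
    pattern = rf"(?<=[{cleave_after}])(?!P)"
    return [peptide for peptide in re.split(pattern, sequence) if peptide]


protein = "AGKLLERSSKTTR"  # hypothetical sequence

unlabeled = digest(protein)                 # cleavage after K and R
icpl_labeled = digest(protein, "R")         # modified lysines are not cleaved

print(unlabeled)
print(icpl_labeled)
```

With cleavage restricted to arginine, the same sequence yields fewer and longer peptides, each of which may carry one or more labeled lysine residues.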
2.2.2.3 Targeting C-termini. Following a classical synthesis approach, isotopic variants of alcohols were used to differentially label peptides via esterification of their carboxyl groups [87]. This results in the derivatization of not only peptide C-termini, but also aspartic and glutamic acid residues. Derivatization of acidic side chains of amino acid residues may be avoided using enzyme-mediated
incorporation of stable isotopes from 18O-labeled water. The incorporation of two ‘‘heavy’’ oxygen atoms (18O) into peptide C-termini was demonstrated using various serine proteases (e.g., trypsin, chymotrypsin, Glu-C, and Lys-C) either in combination with protein digestion or after completion of amide bond hydrolysis using H218O [88–90]. Thus, protein hydrolysis performed in H218O results in the generation of C-terminally labeled proteolytic peptides, bestowing mass shifts of +4 Da on all peptides with exception of protein C-termini. For relative quantitative MS-based analysis, two samples are usually digested in parallel, one in H216O, the other in H218O, and subsequently mixed in a 1:1 ratio prior to chromatographic separation of the peptide mixture and MS analysis. Alternatively, differential labeling using H218O can be performed post digestion. Relative quantification is based on the evaluation of signal intensities or peak areas of 16O/18O-encoded peptide pairs observed in MS scans. MS/MS spectra of 18O-labeled peptides show characteristic mass shifts in the y-ion series, whereas b-ion series remain unchanged. This feature can be used to reliably distinguish between y- and b-ion series in peptide fragment spectra [91]. Since 18O labeling does not alter the physicochemical properties of peptides, corresponding peptide pairs co-elute in chromatography and exhibit identical ionization efficiencies. In addition, very small amounts of sample may be labeled using this strategy [92]. Nevertheless, there are some concerns regarding variability in the rate of isotopic incorporation, which was reported to be dependent on the type of amino acid [90], the size of peptide [91], the amino acid sequence [93], and the enzyme used [88]. 18O labeling has been employed successfully in a number of biological studies, such as in the differential analysis of detergent-resistant membrane microdomains [94] and the secretory proteome of adipose cells [95]. 
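The behavior of 18O-labeled peptides in MS/MS, with shifted y-ions and unchanged b-ions, can be reproduced with a short calculation of singly charged fragment masses. The sketch assumes complete incorporation of two 18O atoms at the peptide C-terminus and uses monoisotopic residue masses; the function name and the example peptide are hypothetical.

```python
# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
O18_SHIFT = 2 * (17.999161 - 15.994915)  # two 18O atoms replace two 16O atoms


def fragment_ions(peptide, c_term_shift=0.0):
    """Singly charged b- and y-ion m/z values; c_term_shift is added to the C-terminus."""
    n = len(peptide)
    b = [sum(MONO[aa] for aa in peptide[:i]) + PROTON for i in range(1, n)]
    y = [sum(MONO[aa] for aa in peptide[-i:]) + WATER + c_term_shift + PROTON
         for i in range(1, n)]
    return b, y


b_light, y_light = fragment_ions("SAMPLER")
b_heavy, y_heavy = fragment_ions("SAMPLER", c_term_shift=O18_SHIFT)

print(b_light == b_heavy)  # b-ions do not contain the C-terminus
print([round(h - l, 3) for l, h in zip(y_light, y_heavy)])
```

Because every y-ion contains the labeled C-terminus and no b-ion does, comparing the fragment spectra of a 16O/18O pair assigns each fragment to its ion series.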
As described above, various chemical labeling strategies have been developed for use in MS-based quantitative proteomics. When choosing a labeling method, the investigator should consider various factors, such as sample complexity, available quantities of the samples to be compared, as well as the instrumentation to be employed. For example, when using isotopic labels which introduce only a small mass shift into peptides, the use of a high-resolution mass spectrometer such as FTICR-MS is advantageous. Moreover, the biological question addressed is of fundamental importance for selecting the “right” approach. ICAT, for example, may facilitate the quantification of low-abundant proteins [96], whereas it is generally not applicable to the identification and quantification of cysteine-free proteins. For the latter purpose and, in addition, for detecting posttranslational modifications, a GIST approach may be better suited [76]. Using iTRAQ or ICPL, however, no sample buffers containing primary amino groups can be used as these would quench the labeling reaction. Furthermore, proteolytic 18O labeling has been reported to be incompatible with high concentrations of urea [97]. In any case, the researcher should ensure consistent sample handling within the entire experimental process. In this regard, chemical isotope labeling is best performed as early in the analytical process as possible. This generally improves technical reproducibility and supports highly accurate protein quantification.
Silke Oeljeklaus et al.
3. APPLICATIONS IN FUNCTIONAL PROTEOMICS
3.1 Protein localization
In eukaryotic cells, proteins are spatially organized into organelles or large cellular structures that not only provide adequate microenvironments but also control their specific functions and activities. Correct localization of proteins is generally of great importance for their interactions within the complex regulatory networks of a cell. In many cases, precise protein localization can be vital for the well-being of an entire organism. This is manifested in various inherited diseases that are caused by proteins mislocalized due to erroneous targeting signals or cellular transport. As an example, mistargeting of the enzyme alanine/glyoxylate aminotransferase from peroxisomes to mitochondria in patients with the hereditary disease primary hyperoxaluria type I results in kidney stones at an early age [98,99]. Knowledge of subcellular localization sites of proteins is therefore important to shed light on the functions and interactions of so far uncharacterized proteins in health and disease. Since mechanisms involved in protein targeting mainly rely on the recognition of distinct consensus sequences, bioinformatic approaches can be used to deduce information on subcellular localization sites of proteins in silico. Common protein localization predictors include SignalP [100], MitoProt [101], and PSORT [102], which are used to recognize specific signal sequences for the targeting of proteins to, for example, the endoplasmic reticulum (ER), mitochondria, or peroxisomes. In contrast, homology-based predictors utilize phylogenetic protein profiling to predict proteins residing in organelles of endosymbiotic origin [103]. To improve the reliability of localization predictions, a computational program, termed Proteome Analyst, has recently been developed. It uses SWISS-PROT database annotations from close homologs of the protein of interest in combination with machine learning [104].
Major advantages of bioinformatic prediction methods are certainly their ease of use and general applicability; however, since the overall accuracy is comparably low, results obtained using this approach need to be confirmed independently. This is especially true for localization predictions of proteins organized in small organellar structures as well as for proteins with transmembrane domains or with multiple localization sites. In any case, experimental strategies have to be followed to obtain further evidence that a protein is a true component of an organelle; among them immunomicroscopy is probably the most accepted method in cell biology. Yet, in vivo immunolocalization studies of proteins are labor-intensive and challenging, especially in mammalian systems. Using antibodies, the outcome largely depends on the availability of highly specific antisera, which have to specifically recognize proteins of interest under native conditions. If a protein of interest is fused to a reporter protein (e.g., GFP), great care has to be taken that overexpression of proteins does not saturate intracellular transport mechanisms, leading to aberrant subcellular localization. In addition, a problem frequently encountered with the use of GFP is that the reporter masks targeting signals within the recombinant proteins. It may therefore be critical if GFP has been fused to the C- or N-terminus of the protein [105]. Despite those potential pitfalls, large-scale
MS-Driven Approaches to Quantitative Proteomics and Beyond
epitope tagging of open reading frames using GFP followed by visualization of resulting fusion proteins via immunofluorescence microscopy provides a unique opportunity to systematically localize proteins in cells [106]. Besides immunofluorescence-based approaches, subcellular fractionation by density gradient centrifugation (DGC) in combination with enzyme activity measurements and/or immunoblot analyses is widely used to follow proteins of interest across an entire gradient in order to gain information on their subcellular localization. The protein profiles obtained are then evaluated in terms of consistency with abundance and/or activity profiles of specific organelle marker proteins. Currently, subcellular fractionation combined with MS-based analysis, referred to as organellar proteomics, potentially provides the most powerful means to attain reliable information on the cellular localization of proteins on a large scale. The pre-eminent potential of organellar proteomics has been demonstrated in various studies reviewed in ref. [107]. Comprehensive protein catalogs are now available for nuclear substructures [20,108,109] as well as cytoplasmic organelles [110–114]. It should be noted that organelles are not static entities, but rather dynamic cellular structures; therefore, the protein complements are only defined for a specific tissue, cell type and/or metabolic state at a given time. A major benefit of organellar proteomics is that the complexity of an organelle-enriched sample is in theory compatible with the sensitivity and dynamic range of current MS-based methods. Consequently, MS has the potential to allow for the identification of virtually every component in a given subproteome. In reality, however, subcellular structures usually cannot be purified to homogeneity due to the limited resolving power of subcellular fractionation techniques. In addition, the level of enrichment using DGC differs for various organelles.
Hence, subproteomes are typically contaminated with proteins from other cellular compartments — a circumstance that adds to sample complexity and that may hamper the identification of low-abundance organelle constituents. Although serial purification of organelles by, for example, immunoisolation can alleviate these problems, discrimination between co-purified contaminants and bona fide constituents still remains a major concern in organellar proteomics. Isolating organelles with high purity is therefore a key factor in obtaining meaningful data in organellar proteomics. In this respect, it is important to note that the reliability of proteomics datasets on organelles is generally not limited due to incorrect protein identification but rather wrong assignment of co-purifying proteins. To address these issues, organellar proteomics strategies have been refined by harnessing subtractive and quantitative methods discussed in the following sections (for a schematic overview see Figure 3).
3.1.1 Subtractive proteomics method
In subtractive proteomics introduced by Yates and co-workers [115–117], a control fraction and a fraction enriched for the organelle of interest are separately analyzed by MS. Proteins found in the control are then subtracted from the protein list of the organelle fraction in order to discriminate between co-purified
[Figure 3, panel titles and axes: (A) Gradient centrifugation and mass spectrometric analysis (lysosomes, mitochondria, peroxisomes; fractions #1–#5; ESI-MS/MS, intensity vs. m/z; data analysis). (B) Subtractive proteomics approach (protein lists for fractions #5, #3, and #2). (C) Relative quantitative proteomics approach (fraction #2 “light”, fraction #3 “heavy”; relative intensity vs. m/z; ratio ∼1:5). (D) Protein correlation profiling approach (normalized intensity vs. fractions #1–#5 for mitochondrial, peroxisomal, and lysosomal proteins and contaminants).]
Figure 3 Protein localization using (semi-) quantitative MS-based proteomics strategies. (A) Intact organelles such as lysosomes, mitochondria, and peroxisomes present in a cell lysate are separated by DGC. Following the fractionation of the resulting gradient, enriched organelle fractions of interest are analyzed by ESI-MS/MS for protein identification by bioinformatics. In order to distinguish between genuine proteins of an organelle and copurified proteins from other cellular structures, different strategies can be followed. (B) Using subtractive proteomics, protein lists obtained by MS analysis of individual organelle-enriched gradient fractions are compared. Proteins that were identified in both the organelle fraction of interest and ‘‘control’’ fractions are regarded as contaminants. (C) In the relative quantitative proteomics approach using stable isotope labeling, enriched organelle fractions are compared to ‘‘control’’ fractions based on relative quantitative measurements. Only those proteins that are more abundant in an organelle fraction of interest compared with the ‘‘control’’ are considered as true resident proteins of this organelle. (D) Protein profiles are established based on MS analysis of gradient fractions. Marker proteins for different organelles define the respective consensus profiles. Proteins that follow the characteristic consensus profile for an organelle are considered as resident proteins of this organelle, whereas deviations from the profile indicate contaminants. For further information on the different approaches, see corresponding sections in the main text.
contaminants and genuine proteins of the organelle. The high capability of the subtractive proteomics method was first shown in a study of the nuclear envelope (NE) [115]. To generate protein lists, NE fractions and microsomal membrane (MM) fractions were analyzed using the multi-dimensional protein identification technology (MudPIT) [118,119]. Resident proteins of the NE were then filtered by
subtracting all proteins identified in MM fractions from those proteins identified in NE fractions. The applicability of this approach to cataloging NE resident proteins was facilitated by the fact that MM fractions can readily be prepared devoid of NEs. All proteins identified in both fractions could thus be disregarded as contaminants with little doubt. Since the purification of organelles using DGC generally results in an enrichment profile following a Gaussian curve with overlapping profiles of both the organelle of interest and co-purified structures, Keller et al. [117] analyzed the peak fraction of basal bodies of Chlamydomonas centrioles and two neighboring fractions (“controls”) from a Nycodenz gradient. In order to determine specific components of this structure, all proteins identified in either of the controls were subtracted from the protein list obtained for the centriole peak fraction. This procedure resulted in the reliable identification of 73% of the known centriole-associated proteins in Chlamydomonas, as well as various Chlamydomonas homologs of centriole- or centrosome-associated proteins from other species. An additional advantage of subtractive proteomics methods applied to protein localization is that they do not require further sample treatment (e.g., stable isotope labeling) and, thus, can be adapted easily to proteomics studies of different biological systems. Nonetheless, one should note that the subtractive concept is only well suited to the analysis of subcellular structures that can be isolated with comparably high purity. Furthermore, subtractive methods fail to identify those proteins that possess functional activities in more than one structural compartment.
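The subtraction step described in this section amounts to a set difference between protein lists. The following Python sketch illustrates the idea under simple assumptions: proteins are represented by identifier strings, and any protein seen in at least one control fraction is treated as a contaminant. The function name and the identifiers are hypothetical and do not correspond to the software used in the cited studies.

```python
def subtract_controls(organelle_ids, *control_id_lists):
    """Return proteins found in the organelle fraction but in no control.

    organelle_ids: identifiers from the organelle-enriched fraction.
    control_id_lists: one list of identifiers per control fraction.
    """
    contaminants = set().union(*control_id_lists)
    return sorted(set(organelle_ids) - contaminants)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    peak_fraction = ["P1", "P2", "P3", "P4"]
    control_left = ["P2", "P9"]
    control_right = ["P4", "P7"]
    print(subtract_controls(peak_fraction, control_left, control_right))
```

Note that a protein with genuine activity in both the organelle of interest and a control compartment is removed by this procedure, which reflects the limitation stated above.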
3.1.2 Relative quantitative proteomics approaches
Lilley and co-workers [120] developed an alternative approach to organellar proteomics using stable isotope labeling combined with relative quantitative MS analysis, which is referred to as “localization of organelle proteins by isotope tagging (LOPIT).” Using LOPIT, the level of enrichment of organellar proteins across different gradient fractions can be determined by measuring peptide abundance ratios. It therefore provides a relative quantitative means to distinguish resident proteins of an organelle from co-purified proteins from other organelles. LOPIT was first used in conjunction with ICAT to study protein distributions of the endomembrane system of A. thaliana [120]. Plant organellar membranes were partially separated according to their density using iodixanol gradient centrifugation. The gradient was divided into individual fractions, which were then pairwise differentially labeled using cleavable ICAT reagents. Following pooling and tryptic digestion, peptides containing the ICAT label were enriched using avidin affinity chromatography and analyzed by LC-MS/MS. To obtain protein distributions across the gradient, ICAT ratios were determined for individual proteins identified in each fraction. Proteins of so far unknown localization were assigned to specific organelles (i.e., Golgi, ER) by matching their respective distributions to distributions of organelle marker proteins via cluster analysis.
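The matching of protein distributions to marker distributions can be illustrated with a small Python sketch. It is not the cluster analysis used in ref. [120]; as a simplified stand-in, each protein is assigned to the organelle whose averaged marker profile correlates best (Pearson) with its own profile of abundance ratios. All names and numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no distribution information
    return cov / (sx * sy)


def mean_profile(profiles):
    """Average several marker profiles position by position."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def assign_organelle(profile, markers):
    """Return (organelle, correlation) for the best-matching marker set.

    markers maps an organelle name to a list of marker protein profiles.
    """
    scores = {org: pearson(profile, mean_profile(ps))
              for org, ps in markers.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical abundance ratios across four fraction pairs.
    markers = {
        "Golgi": [[0.2, 1.0, 3.0, 1.0], [0.3, 1.2, 2.8, 0.9]],
        "ER": [[3.0, 1.5, 0.5, 0.2], [2.5, 1.4, 0.6, 0.3]],
    }
    print(assign_organelle([0.25, 1.1, 2.9, 1.0], markers))
```

A real analysis would also need a minimum-similarity cutoff, so that proteins resembling none of the marker sets remain unassigned.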
In an effort to further improve resolution and sensitivity of LOPIT, cysteine-specific labeling of proteins by ICAT was replaced with iTRAQ labeling of primary amino groups in tryptic peptides followed by 2-D LC/ESI-MS/MS [121]. In addition, the number of tandem MS runs and, hence, the time of analyses could then be reduced by conducting multiplexing experiments. To reliably identify and assign proteins to different subcellular structures, two independent density gradient separations of A. thaliana membranes were analyzed by LOPIT using iTRAQ. Only those proteins which were found in both preparations were considered for further evaluation. Protein abundance ratios of homologous proteins were calculated exclusively on the basis of unique peptides, thereby further reducing false positive assignments. Quantitative datasets were evaluated by multivariate analysis in combination with a suitable training set of organellar marker proteins (i.e., mitochondria/plastids, Golgi, ER, plasma membrane, and vacuoles). This resulted in the successful assignment of various so far undescribed proteins to specific plant organelles. From in vivo localization studies of a subset of those proteins using fluorescence microscopy, a false positive rate of about 10% was estimated for protein localization using LOPIT. Following the concept of LOPIT, Chen et al. [113] distinguished integral membrane proteins from membrane-associated proteins and soluble proteins by quantitative analysis of zymogen granule membranes across different levels of membrane purification using iTRAQ. Jiang et al. [122] aimed at identifying so far unknown mitochondrial proteins by relative quantitative MS analysis of rat liver mitochondria from two different stages of purification using ICAT. Aitchison and co-workers [123] utilized ICAT and quantitative MS to obtain new insight into the composition of the peroxisome membrane from yeast.
Peroxisomes are notoriously difficult to purify by DGC, since mitochondria and microsomes possess similar densities and therefore contaminate peroxisomal fractions to a significant extent. To identify proteins that specifically enrich in peroxisome fractions, two different experiments were conducted. Membrane-enriched fractions of peroxisomes from a density gradient were quantitatively compared with either (1) membrane-enriched mitochondria peak fractions or (2) peroxisomal membranes of higher purity obtained by additional affinity purification. In each of the two experiments ∼350 proteins were identified; only about 70 proteins (∼20%) of these were considered as peroxisomal candidates based on their enrichment factors. These numbers suggest that peroxisomal membranes were isolated in the presence of high background contamination. Consequently, the capability of the first experiment to detect a high number of proteins integral to or specifically associated with peroxisomal membranes was low, as such proteins are generally of low abundance and will therefore inevitably be masked by high-abundance membrane proteins from mitochondria or the ER. This limitation was alleviated in the second experiment, in which peroxisomal membranes of different degrees of purity were compared. As a result, significantly more peroxisomal membrane proteins were identified; Rho1, a small GTPase, could be linked to peroxisome biogenesis and movement for the first time [123].
3.1.3 Protein correlation profiling
In a very elegant approach, Mann and co-workers [19,20] adopted label-free quantitative MS-based proteomics to generate protein profiles across a density gradient. This method, termed protein correlation profiling (PCP), can be used to follow distributions of proteins of interest across a gradient in a fashion analogous to Western blotting. “MS-blotting,” however, provides the capability to simultaneously monitor hundreds of proteins through various fractions of a gradient with high specificity in the same experiment. To generate protein profiles for protein localization, proteolytic peptide mixtures of successive fractions of a density gradient are analyzed by LC/ESI-MS/MS. Normalized XICs of peptides are then plotted against respective gradient fractions. Using marker proteins for organelles of interest, consensus profiles are established and those proteins with profiles that either follow or deviate from these consensus profiles are considered to be true resident proteins and co-purifying contaminants, respectively. Furthermore, when peptides specific for protein isoforms are quantified, isoforms can be followed across a gradient. Protein profiles can also feature characteristics of consensus profiles of two or more distinct subcellular structures. While PCP is therefore a powerful means to detect potential multiple localization sites of proteins, subtractive proteomics methods do not provide this level of information. The high applicability of PCP to protein localization was demonstrated by the identification of novel constituents of human centrosomes [20] and mouse kidney peroxisomes [148], as well as by the generation of protein maps of 10 subcellular structures which were only partially separated across a mouse liver gradient [19].
The latter large-scale PCP study is certainly less suitable for the study of low-abundance organelles (e.g., peroxisomes) or organelles which show very similar profiles under the experimental conditions applied. As PCP does not rely on stable isotope labeling, it is in principle applicable to any type of sample (e.g., tissue, cell cultures) and, hence, provides a universal strategy for protein localization using peptide XICs observed in LC/MS runs. However, a high quality of quantitative data on peptide/protein abundances across different samples is only obtained if both sample preparation and analysis are highly reproducible; this prevents the use of multistage sample purification, preparation and/or separation steps in PCP analysis. In conclusion, MS-based subtractive or quantitative proteomics approaches to protein localization provide powerful means to exploit large-scale proteomics datasets and convert them into functional information on proteins. In combination with orthogonal tools, such as functional genomics, organellar proteomics can be expected to accelerate our knowledge and sharpen our picture of the organization, dynamics, and function of proteins within a cell.
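The comparison of normalized XIC profiles with a consensus profile, as described for PCP in Section 3.1.3, can be sketched in Python as follows. The sketch normalizes each profile to its maximum, averages marker profiles into a consensus, and scores a candidate by its summed squared deviation from that consensus. The scoring function, the threshold of 0.1, and all intensities are assumptions chosen for illustration and should not be read as the procedure of refs. [19,20].

```python
def normalize(xic):
    """Scale an XIC profile across gradient fractions to a maximum of 1."""
    peak = max(xic)
    if peak <= 0:
        raise ValueError("profile must contain a positive intensity")
    return [v / peak for v in xic]


def consensus(marker_xics):
    """Average the normalized profiles of organelle marker proteins."""
    normed = [normalize(x) for x in marker_xics]
    return [sum(col) / len(col) for col in zip(*normed)]


def deviation(xic, consensus_profile):
    """Summed squared difference between a profile and the consensus."""
    return sum((a - b) ** 2
               for a, b in zip(normalize(xic), consensus_profile))


def is_resident(xic, consensus_profile, max_deviation=0.1):
    """Call a protein resident if it follows the consensus closely.

    max_deviation is an arbitrary illustrative threshold.
    """
    return deviation(xic, consensus_profile) <= max_deviation


if __name__ == "__main__":
    # Hypothetical XIC intensities across five gradient fractions.
    markers = [[10, 50, 100, 40, 5], [8, 45, 90, 38, 6]]
    profile = consensus(markers)
    print(is_resident([20, 100, 210, 80, 12], profile))
    print(is_resident([100, 60, 20, 10, 5], profile))
```

A protein with two localization sites would deviate from each single consensus profile, so detecting it would require comparing against combinations of consensus profiles rather than this single-profile test.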
3.2 Protein–protein interactions
The fate and function of a cell depend on the smooth operation of a vast diversity of spatially and temporally regulated biological processes promoting, among
others, survival, proliferation, cell–cell communication, and apoptosis. Most of the events underlying these processes on the molecular level are mediated by specific protein–protein interactions forming stable or transient assemblies within a dynamic network of interacting proteins. The identification of a protein’s binding partners offers the possibility of placing it into the context of a biological pathway, which in turn may provide important insights into its function and regulation. Characterizing protein interactions on a broader scale further enables the reconstruction of interaction maps for proteins and protein complexes of a cell, which improves our understanding of fundamental biological processes [124–126]. MS-based proteomics in combination with adequate techniques to purify a protein complex of interest represents an effective tool for analyzing protein–protein interactions in both small- and large-scale experiments (reviewed in refs. [127,128]). Traditional biochemical approaches for the isolation of protein complexes rely on affinity chromatography techniques such as co-immunoprecipitation and epitope tagging, which generally utilize the protein of interest as bait to isolate its genuine binding partners. There is a variety of different tags that have successfully been employed to characterize protein interactions in various biological contexts (reviewed in refs. [129,130]). An inherent drawback of affinity purification techniques, however, is the inability to discriminate between true interacting proteins and co-purifying contaminants. There is a trade-off between specificity and sensitivity. For increased specificity and reduced level of background contaminants, more stringent washing conditions need to be applied. However, this will in all likelihood result in the loss of weakly bound specific binding partners. 
In a conventional MS-based protein interaction experiment, proteins of an affinity-purified complex and an adequate negative control performed in parallel are typically separated by 1-D PAGE. Subsequently, proteins are visualized and band patterns are compared. Protein bands that are only present in the experimental sample are excised from the gel and subjected to in-gel proteolytic digestion. In general, peptide mixtures are eventually analyzed by LC/ESI-MS/MS in order to determine the identity of the proteins. Alternatively, an affinity-purified protein sample and its control can be directly subjected to proteolytic digestion in solution followed by separate LC/ESI-MS/MS analyses. Non-specifically associated proteins can be filtered by comparing the lists of the respective protein composition. This strategy enables the detection of many co-purified contaminants in a sample. As a result of possible limitations in peptide sampling in the mass spectrometer, however, a reliable differentiation between specific and non-specific binding partners can be difficult. In addition, control and experimental samples are separately prepared and analyzed and, thus, exposed to errors based on uneven sample handling. The absence of a protein in a list of identified proteins does not supply evidence for the absence of this protein in the sample. Consequently, a protein that has only been identified in the experimental sample is not necessarily a specific component of a protein complex. An approach for increasing specificity in protein interaction studies is the application of quantitative MS strategies. Differential labeling of experimental and
control sample by either metabolic or chemical stable isotope-coding techniques allows for identification of specific interacting partners in protein complexes with high reliability. The principle of metabolic labeling strategies has already been described in detail in this chapter (Section 2.2.1). For studies of protein–protein interactions, distinct cell populations (e.g., one population expressing the protein of interest fused to an affinity tag, the other one expressing the endogenous, nontagged version of the protein as a control) are differentially labeled by growing in defined media containing either the natural or a heavy isotope-coded version of a nutrient or amino acid. Cells are combined in a 1:1 ratio. Proteins are extracted and protein complexes are then purified by affinity chromatography. After proteolytic digestion, the sample is analyzed by LC/ESI-MS/MS enabling both identification of the protein components in the sample and, based on peak intensities of ‘‘light’’ and ‘‘heavy’’ peptides, identification of genuine interacting proteins. The affinity purification step specifically enriches the bait protein and its binding partners. Hence, components of a protein complex can be identified by peptide ratios significantly higher than one. Contaminating proteins from labeled and unlabeled samples that typically bind equally to this complex exhibit peptide ratios of approximately one. As already discussed in this chapter, metabolic labeling is tailored to the use in cell culture systems. To analyze protein–protein interactions in different kinds of biological samples, chemical labeling using stable isotope-labeled tags can be exploited. The wide variety of tags reported so far enables chemical isotope labeling at either the protein or peptide level (see Section 2.2.2). Stable isotope labeling of intact proteins is certainly advantageous since it allows for the purification of a protein complex in the presence of a control. 
Using peptide-reactive isotope-labeled tags, experimental and control isolation must be performed separately. In either case, differentially labeled samples are jointly analyzed by LC/ESI-MS/MS; discrimination between true and non-specific interacting proteins follows the rules described above. There are several advantages of using quantitative proteomics strategies for protein–protein interaction studies. Since experimental and control samples are, at least partly, prepared and analyzed together, experimental variations due to uneven sample handling are reduced. In addition, peptides bearing stable isotopes can easily be distinguished from their “light” counterparts in MS allowing for relative quantification and, thus, identification of co-purifying contaminants. This abolishes the necessity of using highly purified complexes for MS-based studies. As a consequence, fewer purification steps and/or milder washing conditions during protein complex isolation can be tolerated, thereby increasing the probability of detecting transient and weak interaction partners. Aebersold and co-workers [131] demonstrated the potential of a relative quantitative proteomics approach for a detailed and accurate analysis of partially purified protein complexes using ICAT in combination with MS analysis. By comparing a sample containing the yeast RNA polymerase II preinitiation complex enriched by a single-step purification procedure with a sample in which the complex was not enriched, they were able to identify the majority of the known specific protein components of this complex. In addition, Ranish et al.
[131] showed that stable isotope labeling is also a suitable tool to study the stoichiometry of protein complexes and dynamically regulated interactions. STE12, a transcription factor involved in the yeast pheromone response, was affinity purified from cells exposed to different environmental stimuli. A comparative approach using ICAT followed by relative quantitative MS analysis revealed changes in the abundance as well as dynamic changes in the composition of the STE12 protein complexes isolated from yeast cells in different states [131]. The use of ICAT labeling has also successfully been adopted to demonstrate dynamic changes in proteins interacting with the MafK transcription factor induced by erythroid differentiation in MEL cells [132]. Apart from ICAT, which specifically reacts with cysteine residues in proteins, primary amine-reactive isotope-labeled tags have been used for the comparative profiling of protein complexes [133,134]. Lottspeich and co-workers [133] refined the quantitative MS approach based on stable isotope labeling and utilized it to determine the stoichiometry of the human small nuclear ribonucleoprotein complex in an absolute quantitative fashion. Mann and colleagues performed studies revealing protein–protein interactions that are involved in fundamental cell signaling pathways in mammalian cells. Firstly, they employed SILAC to characterize signaling complexes participating in the EGFR pathway [63]. They metabolically labeled proteins in EGF-stimulated and non-stimulated HeLa cells with 13C6- and 12C6-arginine, respectively. The SH2 domain of the adapter protein, Grb2, which is known to specifically bind to activated EGFR, was used as bait to affinity-purify protein complexes from whole cell lysates.
This approach enabled the identification of functional protein–protein interactions within a high background of non-specifically associated proteins based on relative quantitative evaluation of proteolytic peptide pairs containing 13C6/12C6-arginine. This study was extended to monitor dynamic, time-dependent changes in signaling events stimulated by EGF treatment using a triple-labeling approach and anti-phosphotyrosine antibodies for affinity purification [47]. In another study, SILAC was applied to identify proteins that interact with the insulin-regulated glucose transporter (GLUT4) in an insulin-dependent fashion [65]. Basal and insulin-treated myoblast cells stably expressing myc-tagged GLUT4 were differentially labeled with 2H3-leucine or 1H3-leucine. GLUT4 and associated proteins were immunoprecipitated from total membrane preparations using an antibody against the myc epitope. Analysis of co-precipitated proteins by LC/ESI-MS/MS resulted in the identification of a number of promising candidates, such as α-actinin-4 and other cytoskeletal proteins that may be involved in the insulin-triggered translocation of GLUT4 to the plasma membrane. The examples given above demonstrate that quantitative MS, in clever combination with biochemical, cell and molecular biological techniques, is preeminently suited to study the composition, stoichiometry, and dynamics of protein assemblies in different biological contexts. This facilitates greater knowledge regarding the function and regulation of fundamental cellular processes that are generally governed by protein–protein interaction networks in response to internal and external stimuli. Aside from characterizing protein complexes, the application of quantitative MS-based proteomics can be extended to investigate
interactions between proteins and other molecules, such as peptides [135], DNA [136], RNA, metabolites, and drugs [137], thus providing an effective tool in addressing many types of biological and medical questions.
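The ratio-based rule described in Section 3.2 (specific binding partners show bait-to-control peptide ratios well above one, whereas background proteins show ratios near one) can be written down as a short Python sketch. The use of the median peptide ratio and the cutoff of 3.0 are assumptions made for illustration; the text only requires ratios to be significantly higher than one. Function names and intensities are hypothetical.

```python
from statistics import median


def protein_ratio(peptide_pairs):
    """Median bait/control intensity ratio over a protein's peptide pairs.

    peptide_pairs: list of (bait_intensity, control_intensity) tuples.
    Pairs without a control signal are skipped here for simplicity,
    although such peptides may well stem from specific interactors.
    """
    ratios = [bait / ctrl for bait, ctrl in peptide_pairs if ctrl > 0]
    if not ratios:
        raise ValueError("no peptide pair with a measurable control signal")
    return median(ratios)


def classify(peptide_pairs, min_ratio=3.0):
    """Label a protein 'specific' or 'background' by its median ratio.

    min_ratio is an illustrative cutoff, not a recommended value.
    """
    if protein_ratio(peptide_pairs) >= min_ratio:
        return "specific"
    return "background"


if __name__ == "__main__":
    # Hypothetical peak intensities of heavy/light peptide pairs.
    interactor = [(900.0, 100.0), (800.0, 110.0), (1000.0, 95.0)]
    contaminant = [(100.0, 95.0), (120.0, 130.0), (80.0, 78.0)]
    print(classify(interactor), classify(contaminant))
```

In a real dataset the cutoff would more sensibly be derived from the observed distribution of background ratios than fixed in advance.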
3.3 Protein dynamics
The overall objective of functional proteomics is to gather detailed information about post-translational modifications, localization, and interactions of all the proteins of a given proteome in order to identify their specific structural and/or functional roles within a biological system. Integration of all this information will eventually result in a global description of the events occurring in a cell. Cellular activities mediated by proteins are highly dynamic; they change constantly depending on the developmental, environmental, or physiological context. To fully understand these cellular processes and their regulation, it is of great importance to take into account the temporal dimension of changes in protein abundance. Protein dynamics results from the antagonistic processes of protein biosynthesis and degradation as well as protein modifications and localization. Yet, most of the proteomics studies undertaken so far merely provide cellular “snapshots” reflecting a static picture of a defined proteome in a distinct state at a given time. As illustrated by selected examples described below, MS-based quantitative proteomics employing stable isotope labeling provides a powerful tool to follow temporal changes in protein abundance in living systems as well as to measure protein-turnover rates (for in-depth reviews about stable isotope labeling in proteome-wide studies of protein dynamics and turnover, see refs. [30,138,141]).
3.3.1 Measuring dynamics of protein phosphorylation
In order to study the dynamics of phosphotyrosine-based signaling events upon stimulation by EGF, Mann and colleagues [47] employed SILAC combined with relative quantitative MS analysis. Populations of HeLa cells were differentially labeled using three isotopic forms of arginine and treated with EGF for different lengths of time following complete labeling. After cell lysis, extracts were combined and jointly subjected to immunoprecipitation in order to specifically enrich tyrosine-phosphorylated proteins prior to peptide MS analysis. For increased resolution of the dynamics of tyrosine phosphorylation in proteins, two independent three-state experiments were combined, which enabled the comparison of five different time points after EGF stimulation. As a result, Blagoev et al. [47] were able to determine the activation dynamics of practically all known EGF-signaling proteins; they additionally identified signaling molecules with tyrosine phosphorylation sites which were not yet known to be involved in EGF signaling. In a similar approach, Gygi and co-workers [139] used SILAC to analyze the dynamics of EGF-induced changes in specific phosphorylation sites of signaling molecules involved in the mitogen-activated protein kinase signaling pathway in HEK 293E cells. As an alternative to metabolic labeling strategies, White and co-workers [140] exploited iTRAQ to simultaneously analyze four different time points of EGFR activation. Using this chemical labeling approach
combined with selective phosphotyrosine peptide enrichment employing anti-phosphotyrosine antibodies followed by IMAC, they were able to monitor the dynamics of tyrosine phosphorylation in the EGFR signaling pathway. A crucial feature of all the studies on phosphorylation events described above is the specific enrichment of phosphoproteins or phosphopeptides prior to MS analysis. Moreover, the results illustrate the general potential of enrichment strategies for the identification and quantification of low-abundance sample constituents (e.g., phosphoproteins).
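The combination of two three-state experiments into a single five-point time course, as described above for ref. [47], can be illustrated with a short sketch. It assumes that the two experiments share one time point, which serves as the bridge for rescaling; the function name, the time points (in minutes), and the ratio values are hypothetical and are not taken from the cited study.

```python
def merge_time_courses(first, second, shared_time):
    """Place two ratio profiles on one scale via a shared time point.

    Each profile maps a time point (min) to a relative abundance ratio.
    The second profile is rescaled so that its value at the shared time
    point equals the value of the first profile at that time point.
    """
    if shared_time not in first or shared_time not in second:
        raise ValueError("both profiles must contain the shared time point")
    scale = first[shared_time] / second[shared_time]
    merged = dict(first)
    for time, ratio in second.items():
        if time != shared_time:
            merged[time] = ratio * scale
    return dict(sorted(merged.items()))


# Hypothetical phosphopeptide ratios, not taken from ref. [47].
experiment_a = {0: 1.0, 1: 4.0, 5: 8.0}     # ratios relative to 0 min
experiment_b = {5: 1.0, 10: 0.5, 20: 0.25}  # ratios relative to 5 min
profile = merge_time_courses(experiment_a, experiment_b, shared_time=5)
print(profile)
```

Rescaling through a single shared state is only one conceivable way of joining the two data sets; any measurement error at the shared time point propagates to all rescaled values.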
3.3.2 Measuring rates of protein turnover
Traditional biochemical strategies for measuring protein-turnover rates employ pulse-chase experiments using radiolabeled compounds. MS-based quantitative proteomics modified this well-established concept by replacing radioactive compounds with stable isotopes. Since studies of this kind require direct incorporation of stable isotopes into the proteins, only metabolic labeling methods (see Section 2.2.1) are applicable. In order to monitor protein biosynthesis, a population of unlabeled cells is usually exposed to a stable isotope-encoded compound added to the medium. After a defined time of exposure, cells are harvested and lysed. Following extraction and proteolytic digestion of the proteins, peptides are analyzed by MS. The degree of incorporation of stable isotopes into newly synthesized proteins is calculated from the relative ion intensities or peak areas of labeled peptides and their unlabeled counterparts observed in MS spectra. The degree of stable isotope incorporation generally depends on the length of exposure to the label; time-dependent differences provide information about the rate of protein biosynthesis. In contrast, approaches aimed at measuring the degradation of proteins usually rely on tracing the loss of label from pre-labeled proteins. To date, a variety of labeled compounds have been employed successfully to obtain data on protein turnover in various organisms (reviewed in ref. [141]). In order to measure the relative dynamic turnover of proteins in Escherichia coli, Cargile et al. [142] used 13C-glucose as the labeling agent. The entire cell culture was harvested 30 min after the pulse. Proteins were extracted, separated on a 1-D gel, digested with trypsin, and analyzed by MS. This resulted in the determination of synthesis/degradation ratios for a variety of different proteins.
However, the choice of 13C-glucose generally results in a very complex labeling pattern, which complicates data analysis [141]. Alternatively, Gobom and co-workers [143] used a 15N-labeling strategy in order to determine protein turnover in HeLa cells exposed to heat stress. At the onset of the stimulus, the control medium of experimental populations was replaced by a medium supplemented with 15N. Cells were harvested at the beginning as well as at defined time points after perturbation. Subsequent separation of proteins by 2-D PAGE and MALDI-MS analysis of the respective 14N/15N-labeled peptides made it possible to distinguish the mechanisms by which the cells regulate protein concentrations, that is, increased or decreased protein synthesis and degradation, respectively.
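The degree of label incorporation described earlier in this section, calculated from the relative ion intensities of labeled and unlabeled peptides, can be sketched as follows. The conversion into a rate constant assumes a simple first-order model, f(t) = 1 - exp(-k t), which is an illustrative assumption and not a model prescribed by the studies cited here; the function names and intensity values are hypothetical.

```python
import math


def labeled_fraction(intensity_labeled, intensity_unlabeled):
    """Fraction of a peptide pool that carries the stable isotope label."""
    total = intensity_labeled + intensity_unlabeled
    if total <= 0:
        raise ValueError("summed ion intensity must be positive")
    return intensity_labeled / total


def first_order_rate(fraction, exposure_h):
    """Rate constant k (per hour), assuming f(t) = 1 - exp(-k * t)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must lie in [0, 1)")
    if exposure_h <= 0:
        raise ValueError("exposure time must be positive")
    return -math.log(1.0 - fraction) / exposure_h


# Hypothetical peak intensities for one peptide after a 30 min pulse.
fraction = labeled_fraction(1.0e6, 3.0e6)
rate = first_order_rate(fraction, exposure_h=0.5)
print(f"labeled fraction: {fraction:.2f}, k = {rate:.3f} per hour")
```

With complex labeling patterns, such as those produced by 13C-glucose, the labeled and unlabeled signals are not two clean peaks, and the intensities passed to such a function would first have to be derived from the full isotopologue distribution.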
Beynon and co-workers [144] employed SILAC to determine rates of protein degradation in Saccharomyces cerevisiae. After complete labeling of proteins with 2H10-leucine, the medium was switched to one containing 1H10-leucine. Samples were collected at defined time points before and after the chase and subsequently analyzed by MS. This approach enabled profiling of 50 unique proteins with degradation rates ranging from imperceptible to almost 10% per hour. More recently, the same group adopted the SILAC strategy to follow rates of protein turnover in chickens. For metabolic labeling of cellular proteins, chickens were fed a semi-synthetic diet containing the essential amino acid valine in its "light" (1H8) and "heavy" (2H8) forms in a 1:1 ratio [145]. This allowed for measuring relative turnover rates of 55–90% of individual muscle proteins by MS. Another very interesting strategy is the application of pulse-labeling experiments combined with MS to study the dynamic aspects of the assembly and turnover of protein complexes. This has been demonstrated by Nowaczyk and co-workers [146], who studied the dynamics of distinct subcomplexes of photosystem II from cyanobacteria using a combinatorial approach of epitope tagging, 15N-pulse labeling, affinity purification, and MALDI-MS. These examples clearly demonstrate that the use of stable isotope labeling and quantitative MS provides a powerful tool for monitoring the dynamics of individual proteins on a global scale. A prerequisite for the correct measurement of protein turnover and protein dynamics by MS is a sound design of the entire experiment, which also includes advanced data evaluation. Information gained from studies of this kind will without doubt make substantial contributions to a more comprehensive and detailed understanding of cellular processes and their regulation.
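Tracing the loss of label from pre-labeled proteins over a chase period, as in the degradation experiments described above, can be turned into a rate estimate along the following lines. This is a minimal sketch that assumes first-order loss of label and fits a straight line to the logarithm of the labeled fraction; it deliberately ignores dilution of the label by cell growth, which would have to be corrected for in a growing culture. The function names and the synthetic data are hypothetical and do not reproduce the evaluation used in refs. [144,145].

```python
import math


def degradation_rate(times_h, labeled_fractions):
    """Least-squares slope of ln(labeled fraction) versus time.

    Returns the first-order rate constant k (per hour) as a positive
    number when the labeled fraction decreases over time.
    """
    if len(times_h) != len(labeled_fractions) or len(times_h) < 2:
        raise ValueError("need at least two matching time points")
    if any(f <= 0 for f in labeled_fractions):
        raise ValueError("labeled fractions must be positive")
    logs = [math.log(f) for f in labeled_fractions]
    mean_t = sum(times_h) / len(times_h)
    mean_y = sum(logs) / len(logs)
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    return -sxy / sxx


def half_life(rate_per_h):
    """Half-life (hours) for first-order decay with rate constant k."""
    return math.log(2) / rate_per_h


# Synthetic chase data generated with k = 0.1 per hour (illustration only).
times = [0.0, 2.0, 4.0, 8.0]
fractions = [math.exp(-0.1 * t) for t in times]
k = degradation_rate(times, fractions)
print(f"k = {k:.3f} per hour, half-life = {half_life(k):.1f} h")
```

Under this model, a rate constant of 0.1 per hour, close to the upper end of the range quoted above, corresponds to a half-life of roughly 7 h.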
4. HOW TO OBTAIN MEANINGFUL DATA IN MS-BASED QUANTITATIVE PROTEOMICS
Quantitative proteomics provides the capability to follow the fate of proteins through different stages of development or disease in living, dynamic systems. It is used as an analytical instrument to identify biomarkers for diagnostic purposes, to reveal stimulus-induced changes in a proteome, or to shed light on the regulation of cellular processes in health and/or disease. Yet quantitative MS-based proteomics cannot be considered an established technology so far. The outcome is inevitably linked to the overall design of each study. In this respect, the robustness, reproducibility, and accuracy of each technique employed are key factors. The exceptional potential of stable isotope labeling combined with quantitative MS for tackling biological and medical questions is beyond doubt. Nonetheless, to fulfill high expectations, future challenges will be to further refine existing quantitative proteomics methods or even to develop novel ones. At the same time, a major focus should be on taking into account the variability of biological systems. For this purpose, an adequate number of biological replicates has to be analyzed, followed by advanced bioinformatic and statistical data evaluation. Only if these requirements are fulfilled can one reliably detect significant changes in protein
expression that usually concern only a rather small set of proteins. Those proteins are often directly linked to a regulatory event whose malfunction may become manifested in a serious disease. Quantitative proteomics studies have to be designed carefully in ways that maximize the probability of discovering biomarkers or novel drug targets, or of improving our general understanding of biological pathways and processes. When designing a quantitative proteomics study, various questions need to be considered, such as: Which subset or class of proteins is expected to undergo changes in expression? If necessary, are adequate techniques such as subcellular fractionation or affinity purification available to reproducibly enrich those proteins? Is the amount of starting material sufficient to avoid sample pooling? Is an adequate number of biological replicates available? Which labeling technology is most suitable, and does it work quantitatively in our hands? Is further chromatographic or gel electrophoretic separation required to reduce sample complexity? Which kind of mass analyzer is available and, consequently, which dynamic range, sensitivity, and mass resolution can be attained? Are suitable tools for large-scale MS data collection and evaluation by advanced bioinformatics and statistics at hand? Are the resources (time, money, people, and instruments) and knowledge available to conduct the study? Following integration and interrogation of the data gained by quantitative proteomics, validation of the results using complementary biological and/or biochemical techniques is recommended in order to ensure the reliability and significance of novel information. At best, those techniques not only substantiate the quantitative proteomics findings but may also provide further information on a protein of interest, such as its specific activity or cellular location.
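The requirement for biological replicates followed by statistical evaluation can be made concrete with a small sketch. It is one possible evaluation among many and is not the procedure of any study cited in this chapter: abundance ratios from replicates are log2-transformed and summarized by a one-sample t-statistic against "no change", and a set of p-values is then adjusted for multiple testing by the Benjamini-Hochberg procedure. The p-values are assumed to have been obtained beforehand (for example from a t-distribution with n - 1 degrees of freedom using a statistics package); all function names and numbers are hypothetical.

```python
import math
import statistics


def replicate_t_statistic(ratios):
    """Mean log2 ratio and one-sample t-statistic against 0 (no change)."""
    if len(ratios) < 2:
        raise ValueError("at least two biological replicates are required")
    logs = [math.log2(r) for r in ratios]
    mean = statistics.mean(logs)
    std_error = statistics.stdev(logs) / math.sqrt(len(logs))
    if std_error == 0:
        raise ValueError("replicate values are identical; t is undefined")
    return mean, mean / std_error


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical heavy/light ratios of one protein in three replicates.
mean_log2, t_value = replicate_t_statistic([2.0, 4.0, 2.0])
# Hypothetical p-values for four proteins, computed elsewhere.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print(mean_log2, t_value, q_values)
```

Working on log2 ratios treats a twofold increase and a twofold decrease symmetrically, which is why the transformation precedes the test in this sketch.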
4.1 Data validation
Western blotting is arguably the most frequently used biochemical method to detect differences in protein abundance. Thus, if specific antibodies against the protein(s) of interest are available, it is often used as a tool to validate data derived from comparative proteomics studies. Advanced techniques additionally offer the possibility to evaluate Western blot results in a quantitative manner. However, since most Western blot detection systems rely on chemiluminescence or fluorescence, the linear dynamic range of detection is limited. A further strategy that is widely used to corroborate proteomics data comprises (immuno)fluorescence-based techniques, such as immunocytochemistry, immunohistochemistry, and tagging of interesting candidates with a fluorescent probe (e.g., GFP). These
methods enable the visualization of proteins in vivo, which may not only verify data regarding protein abundance but also provide valuable information about the cellular localization of the protein under investigation. The latter may also be useful for the design of follow-up studies. In addition, quantitative real-time reverse transcription polymerase chain reaction (RT-PCR) is increasingly employed to verify proteomics data. This technique provides information about the abundance of mRNA transcripts by reverse transcribing the RNA molecules into their DNA complements and then simultaneously amplifying and quantifying the DNA. The use of quantitative real-time RT-PCR can be advantageous, since it facilitates the validation of several different proteins of interest in a single experiment; this renders it more time- and cost-effective than Western blotting. It is often the method of choice when specific antibodies for Western blotting and/or immunofluorescence-based analytical methods are not available. However, despite the fact that RT-PCR is a highly sensitive technique, the current performance of quantitative real-time RT-PCR is still associated with considerable limitations (reviewed in ref. [147]), which make it difficult to achieve accurate and biologically relevant results. In addition, as pointed out before, protein and mRNA levels do not necessarily correlate.
5. PERSPECTIVES
Proteomics research strives to dissect proteins on a proteome-wide scale in order to draw a molecular map of the proteins present in a biological system. Advanced levels of information comprise the location, abundance, functions, interactions, and dynamics of each protein in a cell as a function of internal and external stimuli. To eventually achieve such a holistic view, proteomics endeavours proceed by employing an array of established as well as novel, sophisticated methodologies. Among those, biological MS is the core technology. Rapid advancements in both the concept of mass analyzers and the physical techniques applicable to protein/peptide fragmentation have accelerated proteomics research. The synergy of MS and stable isotope labeling provides the capability to quantify proteins with high accuracy on a large scale. However, a future challenge in quantitative proteomics will be to take into account the variability of biological systems on a routine basis, which in turn requires advanced computational and statistical methods to integrate large-scale datasets. Moreover, quantitative MS-based proteomics acting in concert with biochemical as well as molecular and cell biological approaches promises to add new dimensions to the characterization of cellular processes, leading to an extended knowledge of biological systems.
REFERENCES
1 S.P. Gygi, Y. Rochon, B.R. Franza and R. Aebersold, Correlation between protein and mRNA abundance in yeast, Mol. Cell. Biol., 19 (1999) 1720–1730.
2 P.A. Haynes and J.R. Yates, 3rd, Proteome profiling-pitfalls and progress, Yeast, 17 (2000) 81–87.
3 M. Unlu, M.E. Morgan and J.S. Minden, Difference gel electrophoresis: A single gel method for detecting changes in protein extracts, Electrophoresis, 18 (1997) 2071–2077.
4 K. Tang, J.S. Page and R.D. Smith, Charge competition and the linear dynamic range of detection in electrospray ionization mass spectrometry, J. Am. Soc. Mass Spectrom., 15 (2004) 1416–1423.
5 R.D. Smith, Y. Shen and K. Tang, Ultrasensitive and quantitative analyses from combined separations-mass spectrometry for the characterization of proteomes, Acc. Chem. Res., 37 (2004) 269–278.
6 N.B. Cech and C.G. Enke, Practical implications of some recent studies in electrospray ionization fundamentals, Mass Spectrom. Rev., 20 (2001) 362–387.
7 L. Tang and P. Kebarle, Dependency of ion intensity in electrospray mass spectrometry on the concentration of the analytes in the electrosprayed solution, Anal. Chem., 65 (1993) 3654–3668.
8 D. Chelius, T. Zhang, G. Wang and R.F. Shen, Global protein identification and quantification technology using two-dimensional liquid chromatography nanospray mass spectrometry, Anal. Chem., 75 (2003) 6658–6665.
9 D. Chelius and P.V. Bondarenko, Quantitative profiling of proteins in complex mixtures using liquid chromatography and mass spectrometry, J. Proteome Res., 1 (2002) 317–323.
10 R.E. Higgs, M.D. Knierman, V. Gelfanova, J.P. Butler and J.E. Hale, Comprehensive label-free method for the relative quantification of proteins from biological samples, J. Proteome Res., 4 (2005) 1442–1450.
11 J. Listgarten and A. Emili, Statistical and computational methods for comparative proteomic profiling using liquid chromatography-tandem mass spectrometry, Mol. Cell. Proteomics, 4 (2005) 419–434.
12 S.J. Callister, R.C. Barry, J.N. Adkins, E.T. Johnson, W.J. Qian, B.J. Webb-Robertson, R.D. Smith and M.S. Lipton, Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics, J. Proteome Res., 5 (2006) 277–286.
13 P.V. Bondarenko, D. Chelius and T.A. Shaler, Identification and relative quantitation of protein mixtures by enzymatic digestion followed by capillary reversed-phase liquid chromatography-tandem mass spectrometry, Anal. Chem., 74 (2002) 4741–4749.
14 W. Wang, H. Zhou, H. Lin, S. Roy, T.A. Shaler, L.R. Hill, S. Norton, P. Kumar, M. Anderle and C.H. Becker, Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards, Anal. Chem., 75 (2003) 4818–4826.
15 J.D. Hoffert, T. Pisitkun, G. Wang, R.F. Shen and M.A. Knepper, Quantitative phosphoproteomics of vasopressin-sensitive renal cells: Regulation of aquaporin-2 phosphorylation at two sites, Proc. Natl. Acad. Sci. USA, 103 (2006) 7159–7164.
16 H. Steen, J.A. Jebanathirajah, M. Springer and M.W. Kirschner, Stable isotope-free relative and absolute quantitation of protein phosphorylation stoichiometry by MS, Proc. Natl. Acad. Sci. USA, 102 (2005) 3948–3953.
17 B.B. Willard, C.I. Ruse, J.A. Keightley, M. Bond and M. Kinter, Site-specific quantitation of protein nitration using liquid chromatography/tandem mass spectrometry, Anal. Chem., 75 (2003) 2370–2376.
18 C.I. Ruse, B. Willard, J.P. Jin, T. Haas, M. Kinter and M. Bond, Quantitative dynamics of site-specific protein phosphorylation determined using liquid chromatography electrospray ionization mass spectrometry, Anal. Chem., 74 (2002) 1658–1664.
19 L.J. Foster, C.L. de Hoog, Y. Zhang, X. Xie, V.K. Mootha and M. Mann, A mammalian organelle map by protein correlation profiling, Cell, 125 (2006) 187–199.
20 J.S. Andersen, C.J. Wilkinson, T. Mayor, P. Mortensen, E.A. Nigg and M. Mann, Proteomic characterization of the human centrosome by protein correlation profiling, Nature, 426 (2003) 570–574.
21 G. Wang, W.W. Wu, W. Zeng, C.L. Chou and R.F. Shen, Label-free protein quantification using LC-coupled ion trap or FT mass spectrometry: Reproducibility, linearity, and application with complex proteomes, J. Proteome Res., 5 (2006) 1214–1223.
22 J.C. Silva, R. Denny, C. Dorschel, M.V. Gorenstein, G.Z. Li, K. Richardson, D. Wall and S.J. Geromanos, Simultaneous qualitative and quantitative analysis of the Escherichia coli proteome: A sweet tale, Mol. Cell. Proteomics, 5 (2006) 589–607.
23 M. Ono, M. Shitashige, K. Honda, T. Isobe, H. Kuwabara, H. Matsuzuki, S. Hirohashi and T. Yamada, Label-free quantitative proteomics using large peptide data sets generated by nanoflow liquid chromatography and mass spectrometry, Mol. Cell. Proteomics, 5 (2006) 1338–1347.
24 W.M. Old, K. Meyer-Arendt, L. Aveline-Wolf, K.G. Pierce, A. Mendoza, J.R. Sevinsky, K.A. Resing and N.G. Ahn, Comparison of label-free methods for quantifying human proteins by shotgun proteomics, Mol. Cell. Proteomics, 4 (2005) 1487–1502.
25 M.C. Wiener, J.R. Sachs, E.G. Deyanova and N.A. Yates, Differential mass spectrometry: A label-free LC-MS method for finding significant differences in complex peptide and protein mixtures, Anal. Chem., 76 (2004) 6085–6096.
26 R. Fang, D.A. Elias, M.E. Monroe, Y. Shen, M. McIntosh, P. Wang, C.D. Goddard, S.J. Callister, R.J. Moore, Y.A. Gorby, J.N. Adkins, J.K. Fredrickson, M.S. Lipton and R.D. Smith, Differential label-free quantitative proteomic analysis of Shewanella oneidensis cultured under aerobic and suboxic conditions by accurate mass and time tag approach, Mol. Cell. Proteomics, 5 (2006) 714–725.
27 L. Pasa-Tolic, C. Masselon, R.C. Barry, Y. Shen and R.D. Smith, Proteomic analyses using an accurate mass and time tag strategy, Biotechniques, 37 (2004) 621–624, 626–633.
28 J. Krijgsveld, R.F. Ketting, T. Mahmoudi, J. Johansen, M. Artal-Sanz, C.P. Verrijzer, R.H. Plasterk and A.J. Heck, Metabolic labeling of C. elegans and D. melanogaster for quantitative proteomics, Nat. Biotechnol., 21 (2003) 927–931.
29 C.C. Wu, M.J. MacCoss, K.E. Howell, D.E. Matthews and J.R. Yates, 3rd, Metabolic labeling of mammalian organisms with stable isotopes for quantitative proteomic analysis, Anal. Chem., 76 (2004) 4951–4959.
30 R.J. Beynon and J.M. Pratt, Metabolic labeling of proteins for proteomics, Mol. Cell. Proteomics, 4 (2005) 857–872.
31 T.P. Conrads, K. Alving, T.D. Veenstra, M.E. Belov, G.A. Anderson, D.J. Anderson, M.S. Lipton, L. Pasa-Tolic, H.R. Udseth, W.B. Chrisler, B.D. Thrall and R.D. Smith, Quantitative analysis of bacterial and mammalian proteomes using a combination of cysteine affinity tags and 15N-metabolic labelling, Anal. Chem., 73 (2001) 2132–2139.
32 M.P. Washburn, R. Ulaszek, C. Deciu, D.M. Schieltz and J.R. Yates, 3rd, Analysis of quantitative proteomic data generated via multidimensional protein identification technology, Anal. Chem., 74 (2002) 1650–1657.
33 M.P. Washburn, R.R. Ulaszek and J.R. Yates, 3rd, Reproducibility of quantitative proteomic analyses of complex biological mixtures by multidimensional protein identification technology, Anal. Chem., 75 (2003) 5054–5061.
34 Y.K. Wang, Z. Ma, D.F. Quinn and E.W. Fu, Inverse 15N-metabolic labeling/mass spectrometry for comparative proteomics and rapid identification of protein markers/targets, Rapid Commun. Mass Spectrom., 16 (2002) 1389–1397.
35 Y. Oda, K. Huang, F.R. Cross, D. Cowburn and B.T. Chait, Accurate quantitation of protein expression and site-specific phosphorylation, Proc. Natl. Acad. Sci. USA, 96 (1999) 6591–6596.
36 L. Pasa-Tolic, P.K. Jensen, G.A. Anderson, M.S. Lipton, K.K. Peden, S. Martinovic, N. Tolic, J.E. Bruce and R.D. Smith, High throughput proteome-wide precision measurements of protein expression using mass spectrometry, J. Am. Chem. Soc., 121 (1999) 7949–7950.
37 W.R. Engelsberger, A. Erban, J. Kopka and W.X. Schulze, Metabolic labeling of plant cell cultures with K15NO3 as a tool for quantitative analysis of proteins and metabolites, Plant Methods, 2 (2006) 14.
38 R. Hebeler, S. Oeljeklaus, K.A. Reidegeld, M. Eisenacher, C. Stephan, B. Sitek, K. Stühler, H.E. Meyer, M.J.G. Sturre, P.P. Dijkwel and B. Warscheid, Study of early leaf senescence in Arabidopsis thaliana by quantitative proteomics using reciprocal 14N/15N-labeling and difference gel electrophoresis, Mol. Cell. Proteomics, 7 (2008) 108–120.
39 E.L. Huttlin, A.D. Hegeman, A.C. Harms and M.R. Sussman, Comparison of full versus partial metabolic labeling for quantitative proteomics analysis in Arabidopsis thaliana, Mol. Cell. Proteomics, 6 (2007) 860–881.
40 S.E. Ong, B. Blagoev, I. Kratchmarova, D.B. Kristensen, H. Steen, A. Pandey and M. Mann, Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics, Mol. Cell. Proteomics, 1 (2002) 376–386.
41 X. Chen, L.M. Smith and E.M. Bradbury, Site-specific mass tagging with stable isotopes in proteins for accurate and efficient protein identification, Anal. Chem., 72 (2000) 1134–1143.
42 J.M. Pratt, D.H. Robertson, S.J. Gaskell, I. Riba-Garcia, S.J. Hubbard, K. Sidhu, S.G. Oliver, P. Butler, A. Hayes, J. Petty and R.J. Beynon, Stable isotope labelling in vivo as an aid to protein identification in peptide mass fingerprinting, Proteomics, 2 (2002) 157–163.
43 H. Zhu, T.C. Hunter, S. Pan, P.M. Yau, E.M. Bradbury and X. Chen, Residue-specific mass signatures for the efficient detection of protein modifications by mass spectrometry, Anal. Chem., 74 (2002) 1687–1694.
44 H. Jiang and A.M. English, Quantitative analysis of the yeast proteome by incorporation of isotopically labeled leucine, J. Proteome Res., 1 (2002) 345–350.
45 S.E. Ong, I. Kratchmarova and M. Mann, Properties of 13C-substituted arginine in stable isotope labeling by amino acids in cell culture (SILAC), J. Proteome Res., 2 (2003) 173–181.
46 S.E. Ong and M. Mann, Mass spectrometry-based proteomics turns quantitative, Nat. Chem. Biol., 1 (2005) 252–262.
47 B. Blagoev, S.E. Ong, I. Kratchmarova and M. Mann, Temporal analysis of phosphotyrosine-dependent signaling networks by quantitative proteomics, Nat. Biotechnol., 22 (2004) 1139–1145.
48 P.A. Everley, J. Krijgsveld, B.R. Zetter and S.P. Gygi, Quantitative cancer proteomics: Stable isotope labeling with amino acids in cell culture (SILAC) as a tool for prostate cancer research, Mol. Cell. Proteomics, 3 (2004) 729–735.
49 Y. Hathout, J. Flippin, C. Fan, P. Liu and K. Csaky, Metabolic labeling of human primary retinal pigment epithelial cells for accurate comparative proteomics, J. Proteome Res., 4 (2005) 620–627.
50 L.J. Foster, C.L. De Hoog and M. Mann, Unbiased quantitative proteomics of lipid rafts reveals high specificity for signaling factors, Proc. Natl. Acad. Sci. USA, 100 (2003) 5813–5818.
51 E.P. Romijn, C. Christis, M. Wieffer, J.W. Gouw, A. Fullaondo, P. van der Sluijs, I. Braakman and A.J. Heck, Expression clustering reveals detailed co-expression patterns of functionally related proteins during B cell differentiation: A proteomic study using a combination of one-dimensional gel electrophoresis, LC-MS/MS, and stable isotope labeling by amino acids in cell culture (SILAC), Mol. Cell. Proteomics, 4 (2005) 1297–1310.
52 N. Ibarrola, H. Molina, A. Iwahori and A. Pandey, A novel proteomic approach for specific identification of tyrosine kinase substrates using [13C]tyrosine, J. Biol. Chem., 279 (2004) 15805–15813.
53 S.E. Ong, G. Mittler and M. Mann, Identifying and quantifying in vivo methylation sites by heavy methyl SILAC, Nat. Methods, 1 (2004) 119–126.
54 R. Zhang, C.S. Sioma, S. Wang and F.E. Regnier, Fractionation of isotopically labeled peptides in quantitative proteomics, Anal. Chem., 73 (2001) 5142–5149.
55 N. Ibarrola, D.E. Kalume, M. Gronborg, A. Iwahori and A. Pandey, A proteomic approach for quantitation of phosphorylation using stable isotope labeling in cell culture, Anal. Chem., 75 (2003) 6043–6049.
56 M.L. Gehrmann, Y. Hathout and C. Fenselau, Evaluation of metabolic labeling for comparative proteomics in breast cancer cells, J. Proteome Res., 3 (2004) 1063–1068.
57 Y. Zhou, Y. Wang, M. Kovacs, J. Jin and J. Zhang, Microglial activation induced by neurodegeneration: A proteomic analysis, Mol. Cell. Proteomics, 4 (2005) 1471–1479.
58 A. Gruhler, W.X. Schulze, R. Matthiesen, M. Mann and O.N. Jensen, Stable isotope labeling of Arabidopsis thaliana cells and quantitative proteomics by mass spectrometry, Mol. Cell. Proteomics, 4 (2005) 1697–1709.
59 B. Thiede, A. Kretschmer and T. Rudel, Quantitative proteome analysis of CD95 (Fas/Apo-1)-induced apoptosis by stable isotope labeling with amino acids in cell culture, 2-DE and MALDI-MS, Proteomics, 6 (2006) 614–622.
60 A. Gruhler, J.V. Olsen, S. Mohammed, P. Mortensen, N.J. Faergeman, M. Mann and O.N. Jensen, Quantitative phosphoproteomics applied to the yeast pheromone signaling pathway, Mol. Cell. Proteomics, 4 (2005) 310–327.
61 G. Zhang, D.S. Spellman, E.Y. Skolnik and T.A. Neubert, Quantitative phosphotyrosine proteomics of EphB2 signaling by stable isotope labeling with amino acids in cell culture (SILAC), J. Proteome Res., 5 (2006) 581–588.
62 R. Bose, H. Molina, A.S. Patterson, J.K. Bitok, B. Periaswamy, J.S. Bader, A. Pandey and P.A. Cole, Phosphoproteomic analysis of Her2/neu signaling and inhibition, Proc. Natl. Acad. Sci. USA, 103 (2006) 9773–9778.
63 B. Blagoev, I. Kratchmarova, S.E. Ong, M. Nielsen, L.J. Foster and M. Mann, A proteomics strategy to elucidate functional protein-protein interactions applied to EGF signalling, Nat. Biotechnol., 21 (2003) 315–318.
64 C.L. de Hoog, L.J. Foster and M. Mann, RNA and RNA binding proteins participate in early stages of cell spreading through spreading initiation centers, Cell, 117 (2004) 649–662.
65 L.J. Foster, A. Rudich, I. Talior, N. Patel, X. Huang, L.M. Furtado, P.J. Bilan, M. Mann and A. Klip, Insulin-dependent interactions of proteins with GLUT4 revealed through stable isotope labeling by amino acids in cell culture (SILAC), J. Proteome Res., 5 (2006) 64–75.
66 S.P. Gygi, B. Rist, S.A. Gerber, F. Turecek, M.H. Gelb and R. Aebersold, Quantitative analysis of complex protein mixtures using isotope-coded affinity tags, Nat. Biotechnol., 17 (1999) 994–999.
67 M. Hamdan and P.G. Righetti, Modern strategies for protein quantification in proteome analysis: Advantages and limitations, Mass Spectrom. Rev., 21 (2002) 287–302.
68 M. Galvani, L. Rovatti, M. Hamdan, B. Herbert and P.G. Righetti, Protein alkylation in the presence/absence of thiourea in proteome analysis: A matrix assisted laser desorption/ionization-time of flight-mass spectrometry investigation, Electrophoresis, 22 (2001) 2066–2074.
69 M.B. Smolka, H. Zhou, S. Purkayastha and R. Aebersold, Optimization of the isotope-coded affinity tag-labeling procedure for quantitative proteome analysis, Anal. Biochem., 297 (2001) 25–31.
70 Y. Qiu, E.A. Sousa, R.M. Hewick and J.H. Wang, Acid-labile isotope-coded extractants: A class of reagents for quantitative mass spectrometric analysis of complex protein mixtures, Anal. Chem., 74 (2002) 4969–4979.
71 W.A. Tao and R. Aebersold, Advances in quantitative proteomics via stable isotope tagging and mass spectrometry, Curr. Opin. Biotechnol., 14 (2003) 110–118.
72 E.C. Yi, X.J. Li, K. Cooke, H. Lee, B. Raught, A. Page, V. Aneliunas, P. Hieter, D.R. Goodlett and R. Aebersold, Increased quantitative proteome coverage with (13)C/(12)C-based, acid-cleavable isotope-coded affinity tag reagent, and modified data acquisition scheme, Proteomics, 5 (2005) 380–387.
73 Y.H. Kim, K. Cho, S.H. Yun, J.Y. Kim, K.H. Kwon, J.S. Yoo and S.I. Kim, Analysis of aromatic catabolic pathways in Pseudomonas putida KT 2440 using a combined proteomic approach: 2-DE/MS and cleavable isotope-coded affinity tag analysis, Proteomics, 6 (2006) 1301–1318.
74 J.V. Olsen, J.R. Andersen, P.A. Nielsen, M.L. Nielsen, D. Figeys, M. Mann and J.R. Wisniewski, HysTag-a novel proteomic quantification tool applied to differential display analysis of membrane proteins from distinct areas of mouse brain, Mol. Cell. Proteomics, 3 (2004) 82–92.
75 A. Chakraborty and F.E. Regnier, Global internal standard technology for comparative proteomics, J. Chromatogr. A, 949 (2002) 173–184.
76 F.E. Regnier and S. Julka, Primary amine coding as a path to comparative proteomics, Proteomics, 6 (2006) 3968–3979.
77 G. Gaudriault and J.P. Vincent, Selective labeling of alpha- or epsilon-amino groups in peptides by the bolton-hunter reagent, Peptides, 13 (1992) 1187–1192.
78 P.L. Ross, Y.N. Huang, J.N. Marchese, B. Williamson, K. Parker, S. Hattan, N. Khainovski, S. Pillai, S. Dey, S. Daniels, S. Purkayastha, P. Juhasz, S. Martin, M. Bartlet-Jones, F. He, A. Jacobson and D.J. Pappin, Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents, Mol. Cell. Proteomics, 3 (2004) 1154–1169.
79 L. Choe, M. D’Ascenzo, N.R. Relkin, D. Pappin, P. Ross, B. Williamson, S. Guertin, P. Pribil and K.H. Lee, 8-plex quantitation of changes in cerebrospinal fluid protein expression in subjects undergoing intravenous immunoglobulin treatment for Alzheimer’s disease, Proteomics, 7 (2007) 3651–3660.
80 K. Aggarwal, L.H. Choe and K.H. Lee, Shotgun proteomics using the iTRAQ isobaric tags, Brief Funct. Genomic. Proteomic., 5 (2006) 112–120.
81 S. Wiese, K.A. Reidegeld, H.E. Meyer and B. Warscheid, Protein labeling by iTRAQ: A new tool for quantitative mass spectrometry in proteome research, Proteomics, 7 (2007) 340–350.
82 S.R. Guertin, M. Minkoff, B. Williamson and S. Purkayastha, Relative quantitation of yeast proteins labeled with isobaric tags and fractionated by SDS-PAGE, Proceedings of the 54th ASMS Conference on Mass Spectrometry and Allied Topics, 2006, Seattle, Washington.
83 A. Schmidt, J. Kellermann and F. Lottspeich, A novel strategy for quantitative proteomics using isotope-coded protein labels, Proteomics, 5 (2005) 4–15.
84 H. Sarioglu, S. Brandner, C. Jacobsen, T. Meindl, A. Schmidt, J. Kellermann, F. Lottspeich and U. Andrae, Quantitative analysis of 2,3,7,8-tetrachlorodibenzo-p-dioxin-induced proteome alterations in 5L rat hepatoma cells using isotope-coded protein labels, Proteomics, 6 (2006) 2407–2421.
85 A. Tabbert, F. Kappes, R. Knippers, J. Kellermann, F. Lottspeich and E. Ferrando-May, Hypophosphorylation of the architectural chromatin protein DEK in death-receptor-induced apoptosis revealed by the isotope coded protein label proteomic platform, Proteomics (Epub ahead of print), (2006).
86 B. Bisle, A. Schmidt, B. Scheibe, C. Klein, A. Tebbe, J. Kellermann, F. Siedler, F. Pfeiffer, F. Lottspeich and D. Oesterhelt, Quantitative profiling of the membrane proteome in a halophilic archaeon, Mol. Cell. Proteomics, 5 (2006) 1543–1558.
87 D.R. Goodlett, A. Keller, J.D. Watts, R. Newitt, E.C. Yi, S. Purvine, J.K. Eng, P. von Haller, R. Aebersold and E. Kolker, Differential stable isotope labeling of peptides for quantitation and de novo sequence derivation, Rapid Commun. Mass Spectrom., 15 (2001) 1214–1221.
88 X. Yao, A. Freas, J. Ramirez, P.A. Demirev and C. Fenselau, Proteolytic 18O labeling for comparative proteomics: Model studies with two serotypes of adenovirus, Anal. Chem., 73 (2001) 2836–2842.
89 K.J. Reynolds, X. Yao and C. Fenselau, Proteolytic 18O labeling for comparative proteomics: Evaluation of endoprotease Glu-C as the catalytic agent, J. Proteome Res., 1 (2002) 27–33.
90 X. Yao, C. Afonso and C. Fenselau, Dissection of proteolytic 18O labeling: Endoprotease-catalyzed 16O-to-18O exchange of truncated peptide substrates, J. Proteome Res., 2 (2003) 147–152.
91 M. Schnolzer, P. Jedrzejewski and W.D. Lehmann, Protease-catalyzed incorporation of 18O into peptide fragments and its application for protein sequencing by electrospray and matrix-assisted laser desorption/ionization mass spectrometry, Electrophoresis, 17 (1996) 945–953.
92 L. Zang, D. Palmer Toy, W.S. Hancock, D.C. Sgroi and B.L. Karger, Proteomic analysis of ductal carcinoma of the breast using laser capture microdissection, LC-MS, and 16O/18O isotopic labelling, J. Proteome Res., 3 (2004) 604–612.
93 I.I. Stewart, T. Thomson and D. Figeys, 18O labeling: A tool for proteomics, Rapid Commun. Mass Spectrom., 15 (2001) 2456–2465.
94 J. Blonder, L.R. Yu, G. Radeva, K.C. Chan, D.A. Lucas, T.J. Waybright, H.J. Issaq, F.J. Sharom and T.D. Veenstra, Combined chemical and enzymatic stable isotope labeling for quantitative profiling of detergent-insoluble membrane proteins isolated using Triton X-100 and Brij-96, J. Proteome Res., 5 (2006) 349–360.
95 X. Chen, S.W. Cushman, L.K. Pannell and S. Hess, Quantitative proteomic analysis of the secretory proteins from rat adipose cells using a 2D liquid chromatography-MS/MS approach, J. Proteome Res., 4 (2005) 570–577.
96 L. DeSouza, G. Diehl, M.J. Rodrigues, J. Guo, A.D. Romaschin, T.J. Colgan and K.W. Siu, Search for cancer markers from endometrial tissues using differentially labeled tags iTRAQ and cICAT with multidimensional liquid chromatography and tandem mass spectrometry, J. Proteome Res., 4 (2005) 377–386.
97 F.E. Regnier, L. Riggs, R. Zhang, L. Xiong, P. Liu, A. Chakraborty, E. Seeley, C. Sioma and R.A. Thompson, Comparative proteomics based on stable isotope labeling and affinity selection, J. Mass Spectrom., 37 (2002) 133–145.
98 C.J. Danpure, P.R. Jennings, P. Fryer, P.E. Purdue and J. Allsop, Primary hyperoxaluria type 1: Genotypic and phenotypic heterogeneity, J. Inherit. Metab. Dis., 17 (1994) 487–499.
99 C.J. Danpure, M.J. Lumb, G.M. Birdsey and X. Zhang, Alanine:Glyoxylate aminotransferase peroxisome-to-mitochondrion mistargeting in human hereditary kidney stone disease, Biochim. Biophys. Acta, 1647 (2003) 70–75.
100 H. Nielsen, J. Engelbrecht, S. Brunak and G. von Heijne, A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites, Int. J. Neural. Syst., 8 (1997) 581–599.
MS-Driven Approaches to Quantitative Proteomics and Beyond
445
101 M.G. Claros and P. Vincens, Computational method to predict mitochondrially imported proteins and their targeting sequences, Eur. J. Biochem., 241 (1996) 779–786. 102 K. Nakai and M. Kanehisa, A knowledge base for predicting protein localization sites in eukaryotic cells, Genomics, 14 (1992) 897–911. 103 E.M. Marcotte, I. Xenarios, A.M. van Der Bliek and D. Eisenberg, Localizing proteins in the cell from their phylogenetic profiles, Proc. Natl. Acad. Sci. USA, 97 (2000) 12115–12120. 104 Z. Lu, D. Szafron, R. Greiner, P. Lu, D.S. Wishart, B. Poulin, J. Anvik, C. Macdonell and R. Eisner, Predicting subcellular localization of proteins using machine-learned classifiers, Bioinformatics, 20 (2004) 547–556. 105 R. Pepperkok, J.C. Simpson and S. Wiemann, Being in the right location at the right time, Genome Biol., 2 (2001) 1024. 106 J.C. Simpson and R. Pepperkok, Localizing the proteome, Genome Biol., 4 (2003) 240. 107 J.R. Yates, 3rd, A. Gilchrist, K.E. Howell and J.J. Bergeron, Proteomics of organelles and large cellular structures, Nat. Rev. Mol. Cell. Biol., 6 (2005) 702–714. 108 M. Dreger, L. Bengtsson, T. Schoneberg, H. Otto and F. Hucho, Nuclear envelope proteomics: Novel integral membrane proteins of the inner nuclear membrane, Proc. Natl. Acad. Sci. USA, 98 (2001) 11943–11948. 109 J.S. Andersen, Y.W. Lam, A.K. Leung, S.E. Ong, C.E. Lyon, A.I. Lamond and M. Mann, Nucleolar proteome dynamics, Nature, 433 (2005) 77–83. 110 A. Sickmann, J. Reinders, Y. Wagner, C. Joppich, R. Zahedi, H.E. Meyer, B. Schonfisch, I. Perschil, A. Chacinska, B. Guiard, P. Rehling, N. Pfanner and C. Meisinger, The proteome of Saccharomyces cerevisiae mitochondria, Proc. Natl. Acad. Sci. USA, 100 (2003) 13207–13212. 111 S.W. Taylor, E. Fahy, B. Zhang, G.M. Glenn, D.E. Warnock, S. Wiley, A.N. Murphy, S.P. Gaucher, R.A. Capaldi, B.W. Gibson and S.S. Ghosh, Characterization of the human heart mitochondrial proteome, Nat. Biotechnol., 21 (2003) 281–286. 112 V.K. Mootha, J. Bunkenborg, J.V. 
Olsen, M. Hjerrild, J.R. Wisniewski, E. Stahl, M.S. Bolouri, H.N. Ray, S. Sihag, M. Kamal, N. Patterson, E.S. Lander and M. Mann, Integrated analysis of protein composition, tissue diversity, and gene regulation in mouse mitochondria, Cell, 115 (2003) 629–640. 113 X. Chen, A.K. Walker, J.R. Strahler, E.S. Simon, S.L. Tomanicek-Volk, B.B. Nelson, M.C. Hurley, S.A. Ernst, J.A. Williams and P.C. Andrews, Organellar proteomics: Analysis of pancreatic zymogen granule membranes, Mol. Cell. Proteomics, 5 (2006) 306–312. 114 F. Forner, L.J. Foster, S. Campanaro, G. Valle and M. Mann, Quantitative proteomic comparison of rat mitochondria from muscle, heart, and live, Mol. Cell. Proteomics, 5 (2006) 608–619. 115 E.C. Schirmer, L. Florens, T. Guan, J.R. Yates, 3rd and L. Gerace, Nuclear membrane proteins with potential disease links found by subtractive proteomics, Science, 301 (2003) 1380–1382. 116 T.Y. Sam-Yellowe, L. Florens, T. Wang, J.D. Raine, D.J. Carucci, R. Sinden and J.R. Yates, 3rd, Proteome analysis of rhoptry-enriched fractions isolated from plasmodium merozoites, J. Proteome Res., 3 (2004) 995–1001. 117 L.C. Keller, E.P. Romijn, I. Zamora, J.R. Yates, 3rd and W.F. Marshall, Proteomic analysis of isolated chlamydomonas centrioles reveals orthologs of ciliary-disease genes, Curr. Biol., 15 (2005) 1090–1098. 118 M.P. Washburn, D. Wolters and J.R. Yates, 3rd, Large-scale analysis of the yeast proteome by multidimensional protein identification technology, Nat. Biotechnol., 19 (2001) 242–247. 119 D.A. Wolters, M.P. Washburn and J.R. Yates, 3rd, An automated multidimensional protein identification technology for shotgun proteomics, Anal. Chem., 73 (2001) 5683–5690. 120 T.P. Dunkley, P. Dupree, R.B. Watson and K.S. Lilley, The use of isotope-coded affinity tags (ICAT) to study organelle proteomes in Arabidopsis thaliana, Biochem. Soc. Trans., 32 (2004) 520–523. 121 T.P. Dunkley, S. Hester, I.P. Shadforth, J. Runions, T. Weimar, S.L. Hanton, J.L. Griffin, C. Bessant, F. 
Brandizzi, C. Hawes, R.B. Watson, P. Dupree and K.S. Lilley, Mapping the Arabidopsis organelle proteome, Proc. Natl. Acad. Sci. USA, 103 (2006) 6518–6523. 122 X.S. Jiang, J. Dai, Q.H. Sheng, L. Zhang, Q.C. Xia, J.R. Wu and R. Zeng, A comparative proteomic strategy for subcellular proteome research: ICAT approach coupled with bioinformatics prediction to ascertain rat liver mitochondrial proteins and indication of mitochondrial localization for catalase, Mol. Cell. Proteomics, 4 (2005) 12–34.
446
Silke Oeljeklaus et al.
123 M. Marelli, J.J. Smith, S. Jung, E. Yi, A.I. Nesvizhskii, R.H. Christmas, R.A. Saleem, Y.Y. Tam, A. Fagarasanu, D.R. Goodlett, R. Aebersold, R.A. Rachubinski and J.D. Aitchison, Quantitative mass spectrometry reveals a role for the GTPase rho1p in actin organization on the peroxisome membrane, J. Cell. Biol., 167 (2004) 1099–1112. 124 A.C. Gavin, M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J.M. Rick, A.M. Michon, C.M. Cruciat, M. Remor, C. Hofert, M. Schelder, M. Brajenovic, H. Ruffner, A. Merino, K. Klein, M. Hudak, D. Dickson, T. Rudi, V. Gnau, A. Bauch, S. Bastuck, B. Huhse, C. Leutwein, M.A. Heurtier, R.R. Copley, A. Edelmann, E. Querfurth, V. Rybin, G. Drewes, M. Raida, T. Bouwmeester, P. Bork, B. Seraphin, B. Kuster, G. Neubauer and G. Superti-Furga, Functional organization of the yeast proteome by systematic analysis of protein complexes, Nature, 415 (2002) 141–147. 125 Y. Ho, A. Gruhler, A. Heilbut, G.D. Bader, L. Moore, S.L. Adams, A. Millar, P. Taylor, K. Bennett, K. Boutilier, L. Yang, C. Wolting, I. Donaldson, S. Schandorff, J. Shewnarane, M. Vo, J. Taggart, M. Goudreault, B. Muskat, C. Alfarano, D. Dewar, Z. Lin, K. Michalickova, A.R. Willems, H. Sassi, P.A. Nielsen, K.J. Rasmussen, J.R. Andersen, L.E. Johansen, L.H. Hansen, H. Jespersen, A. Podtelejnikov, E. Nielsen, J. Crawford, V. Poulsen, B.D. Sorensen, J. Matthiesen, R.C. Hendrickson, F. Gleeson, T. Pawson, M.F. Moran, D. Durocher, M. Mann, C.W. Hogue, D. Figeys and M. Tyers, Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry, Nature, 415 (2002) 180–183. 126 A.C. Gavin, P. Aloy, P. Grandi, R. Krause, M. Boesche, M. Marzioch, C. Rau, L.J. Jensen, S. Bastuck, B. Dumpelfeld, A. Edelmann, M.A. Heurtier, V. Hoffman, C. Hoefert, K. Klein, M. Hudak, A.M. Michon, M. Schelder, M. Schirle, M. Remor, T. Rudi, S. Hooper, A. Bauer, T. Bouwmeester, G. Casari, G. Drewes, G. Neubauer, J.M. Rick, B. Kuster, P. Bork, R.B. Russell and G. 
Superti-Furga, Proteome survey reveals modularity of the yeast cell machinery, Nature, 440 (2006) 631–636. 127 R. Aebersold and M. Mann, Mass spectrometry-based proteomics, Nature, 422 (2003) 198–207. 128 T. Ito, K. Ota, H. Kubota, Y. Yamaguchi, T. Chiba, K. Sakuraba and M. Yoshida, Roles for the twohybrid system in exploration of the yeast protein interactome, Mol. Cell. Proteomics, 1 (2002) 561–566. 129 K. Terpe, Overview of tag protein fusions: From molecular and biochemical fundamentals to commercial systems, Appl. Microbiol. Biotechnol., 60 (2003) 523–533. 130 A. Bauer and B. Kuster, Affinity purification-mass spectrometry. Powerful tools for the characterization of protein complexes, Eur. J. Biochem., 270 (2003) 570–578. 131 J.A. Ranish, E.C. Yi, D.M. Leslie, S.O. Purvine, D.R. Goodlett, J. Eng and R. Aebersold, The study of macromolecular complexes by quantitative proteomics, Nat. Genet., 33 (2003) 349–355. 132 M. Brand, J.A. Ranish, N.T. Kummer, J. Hamilton, K. Igarashi, C. Francastel, T.H. Chi, G.R. Crabtree, R. Aebersold and M. Groudine, Dynamic changes in transcription factor complexes during erythroid differentiation revealed by quantitative proteomics, Nat. Struct. Mol. Biol., 11 (2004) 73–80. 133 E.O. Hochleitner, B. Kastner, T. Frohlich, A. Schmidt, R. Luhrmann, G. Arnold and F. Lottspeich, Protein stoichiometry of a multiprotein complex, the human spliceosomal U1 small nuclear ribonucleoprotein: Absolute quantification using isotope-coded tags and mass spectrometry, J. Biol. Chem., 280 (2005) 2536–2542. 134 I. Fierro-Monti, S. Mohammed, R. Matthiesen, R. Santoro, J.S. Burns, D.J. Williams, C.G. Proud, M. Kassem, O.N. Jensen and P. Roepstorff, Quantitative proteomics identifies Gemin5, a scaffolding protein involved in ribonucleoprotein assembly, as a novel partner for eukaryotic initiation factor 4E, J. Proteome Res., 5 (2006) 1367–1378. 135 W.X. Schulze and M. Mann, A novel proteomic screen for peptide–protein interactions, J. Biol. 
Chem., 279 (2004) 10756–10764. 136 C.L. Himeda, J.A. Ranish, J.C. Angello, P. Maire, R. Aebersold and S.D. Hauschka, Quantitative proteomic identification of six4 as the trex-binding factor in the muscle creatine kinase enhancer, Mol. Cell. Biol., 24 (2004) 2132–2143. 137 Y. Oda, T. Owa, T. Sato, B. Boucher, S. Daniels, H. Yamanaka, Y. Shinohara, A. Yokoi, J. Kuromitsu and T. Nagasu, Quantitative chemical proteomics for identifying candidate drug targets, Anal. Chem., 75 (2003) 2159–2165.
MS-Driven Approaches to Quantitative Proteomics and Beyond
447
138 R.J. Beynon, The dynamics of the proteome: Strategies for measuring protein turnover on a proteome-wide scale, Brief Funct. Genomic. Proteomic., 3 (2005) 382–390. 139 B.A. Ballif, P.P. Roux, S.A. Gerber, J.P. MacKeigan, J. Blenis and S.P. Gygi, Quantitative phosphorylation profiling of the ERK/p90 ribosomal S6 kinase-signaling cassette and its targets, the tuberous sclerosis tumor suppressors, Proc. Natl. Acad. Sci. USA, 102 (2005) 667–672. 140 Y. Zhang, A. Wolf-Yadlin, P.L. Ross, D.J. Pappin, J. Rush, D.A. Lauffenburger and F.M. White, Time-resolved mass spectrometry of tyrosine phosphorylatiopn sites in the EGF receptor signaling network reveals dynamic modules, Mol. Cell. Proteomics, 4 (2005) 1240–1250. 141 M.K. Doherty and R.J. Beynon, Protein turnover on the scale of the proteome, Expert Rev. Proteomics, 3 (2006) 97–110. 142 B.J. Cargile, J.L. Bundy, A.M. Grunden and J.L. Stephenson, Jr., Synthesis/degradation ratio mass spectrometry for measuring relative dynamic protein turnover, Anal. Chem., 76 (2004) 86–97. 143 N. Gustavsson, B. Greber, T. Kreitler, H. Himmelbauer, H. Lehrach and J. Gobom, A proteomic method for the analysis of changes in protein concentrations in response to systemic perturbations using metabolic incorporation of stable isotopes and mass spectrometry, Proteomics, 5 (2005) 3563–3570. 144 J.M. Pratt, J. Petty, I. Riba-Garcia, D.H. Robertson, S.J. Gaskell, S.G. Oliver and R.J. Beynon, Dynamics of protein turnover, a missing dimension in proteomics, Mol. Cell. Proteomics, 1 (2002) 579–591. 145 M.K. Doherty, C. Whitehead, H. McCormack, S.J. Gaskell and R.J. Beynon, Proteome dynamics in complex organisms: Using stable isotopes to monitor individual protein turnover rates, Proteomics, 5 (2005) 522–533. 146 M.M. Nowaczyk, R. Hebeler, E. Schlodder, H.E. Meyer, B. Warscheid and M. Ro¨gner, Psb27, a cyanobacterial lipoprotein involved in the repair cycle of photosystem II, Plant Cell, 18 (2006) 3121–3131. 147 S.A. Bustin and T. 
Nolan, Pitfalls of quantitative real-time reverse-transcription polymerase chain reaction, J. Biomol. Tech., 15 (2004) 155–166. 148 S. Wiese, T. Gronemeyer, R. Ofman, M. Kunze, C.P. Grou, J.A. Almeida, M. Eisenacher, C. Stephan, H. Hayen, L. Schollenberger, T. Korosec, H.R. Waterham, W. Schliebs, R. Erdmann, J. Berger, H.E. Meyer, W. Just, J.E. Azevedo, R.J.A. Wanders and B. Warscheid, Characterization of mouse kidney peroxisomes by tandem mass spectrometry and protein correlation profiling. Mol. Cell. Proteomics, 6 (2007) 2045–2057.
CHAPTER 18
Multiplexed Quantitative Proteomics Using Mass Spectrometry
Philip L. Ross, Xunming Chen, Esteban Toro, Leticia Britos, Lucy Shapiro and Darryl Pappin
Contents
1. Introduction
2. Isobaric N-Terminal Peptide Tagging
   2.1 Features of isobaric tagging chemistry
   2.2 Limitations
   2.3 Experimental and workflow conditions
   2.4 Chromatography
3. Mass Spectrometry
   3.1 MALDI-TOF-TOF mass spectrometry
   3.2 The LC-MALDI workflow
   3.3 Advantages of LC-MALDI
4. Quantitative Applications Using Isobaric Tagging
   4.1 Bacterial cell cycle protein quantitation
   4.2 Characterization of a novel protease
References
Comprehensive Analytical Chemistry, Volume 52. ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00218-3. © 2009 Elsevier B.V. All rights reserved.

1. INTRODUCTION

Chemical modification strategies have found widespread use in protein chemistry as researchers continue to answer questions pertaining to protein structure, function and quantity [1,2]. Similarly, mass spectrometry (MS) has become a primary tool in protein structure characterization. In the past decade, it has been realized that the synergy between chemistry and the analytical power of MS can add a new dimension to the scope of biological research activity. In particular, the use of isotope-specific chemistry combined with MS based on either electrospray ionization (ESI) or matrix-assisted laser desorption/ionization (MALDI) has enabled protein identification and quantitation at an unprecedented scale. In this chapter, we highlight one such ‘proteomics’ approach to give the reader a view of the current state of the art. This approach uses isobaric N-terminal peptide tagging with subsequent MALDI tandem time-of-flight (TOF/TOF) MS to perform multiplexed protein identification and quantitation.

The principle underlying the majority of quantitative proteomic methodologies is isotopic substitution. The stable isotopes of a given element (carbon, hydrogen, nitrogen or oxygen) are essentially chemically and biologically indistinguishable from each other. In a mass spectrometer, however, they are distinguishable since each isotope has a unique mass number, giving rise to the notion of a ‘heavy’ vs. a ‘light’ version of the same molecular species. Present-day comparative proteomics uses the following general workflow: proteins from two or more samples are isolated, digested into peptides, subjected to some form of chromatographic separation and finally analyzed by MS. Isotopic substitution is introduced at one of several points in the process (Figure 1) such that one or more sites in a protein or peptide contain an isotopically distinct portion. Sets of reactive tag molecules have been devised in which naturally occurring atoms at one or more positions are replaced with higher mass isotopes (typically ¹³C or ²H), giving each tag a distinct mass despite being otherwise biologically and chemically identical. The number of distinct tag masses corresponds to the number of samples that can be compared. The samples remain separate until after the labeling reaction; they are then combined and carried through the remaining steps, ultimately culminating in lists of MS peaks for each peptide identified in the experiment.
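To make the ‘heavy’ vs. ‘light’ distinction concrete, the following minimal Python sketch computes the mass difference produced by replacing atoms with heavier stable isotopes, and the resulting peak spacing at a given charge state. The function names (`mass_shift`, `mz_spacing`) and the example substitution counts are illustrative assumptions and do not describe any particular commercial tag; the isotope mass differences are standard monoisotopic values.

```python
# Monoisotopic mass differences (Da) between heavy and light stable isotopes.
ISOTOPE_DELTA = {
    "13C": 1.0033548,  # 13C minus 12C
    "2H": 1.0062767,   # 2H minus 1H
    "15N": 0.9970349,  # 15N minus 14N
    "18O": 2.0042450,  # 18O minus 16O
}


def mass_shift(substitutions):
    """Return the total mass shift (Da) for a dict of {isotope: atom count}."""
    return sum(ISOTOPE_DELTA[iso] * n for iso, n in substitutions.items())


def mz_spacing(substitutions, charge):
    """Return the m/z spacing between light and heavy forms at a given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mass_shift(substitutions) / charge


# Hypothetical tag in which six carbon atoms are replaced by 13C.
print(f"mass shift: {mass_shift({'13C': 6}):.4f} Da")
print(f"spacing at 2+: {mz_spacing({'13C': 6}, 2):.4f} m/z")
```

Note that the spacing observed in a spectrum shrinks with charge state, so a pair that is several Da apart as a singly charged species appears proportionally closer together when multiply charged.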
In the simplest case, where two samples are compared, the peak intensities of the ‘light’ and ‘heavy’ forms are compared to give the relative quantity between the two. Because the peak intensity data for the two samples are acquired in the same analysis, there is an element of internal consistency inherent in this approach.

Isotopic substitution has long been in widespread use in the mass spectrometric analysis of small molecules. The use of stable isotopic labeling (SIL) in proteomics is becoming a mature field, with the publication of numerous reviews from different perspectives [3,4]. Ultimately, all that is required is the placement of a mass differential detectable by MS or MS/MS between the two or more samples to be compared. Thus new methods and refinements continue to emerge. It is far beyond the scope of this chapter to describe every method adequately; however, we will highlight the basic attributes of several of the more commonly used methods. We will turn to a detailed description of one such approach, isobaric peptide labeling, in the next section.

Figure 1 Representation of a typical protein quantitation workflow using mass spectrometry (MS). There are multiple potential points of introduction of isotopic substitution to facilitate quantification between two or more samples.

Researchers have found ways to incorporate stable isotopes at virtually every stage in a protein analysis workflow. The most logical way to present these is in the order in which the isotopic enrichment is introduced (Figure 1). Stable isotope labeling by amino acids in cell culture (SILAC™ reagents) [5–7] introduces a differentially labeled amino acid at the cell growth stage through the use of specialized culture media. One normal amino acid is replaced with an isotopically enriched amino acid. The two cell states to be compared are grown separately in ‘light’ and ‘heavy’ media, respectively. After the cells are harvested, they can be combined for further processing.

Intact proteins can be labeled with techniques such as cleavable isotope-coded affinity tags (cleavable ICAT® reagents [8,9]) that place a differentially isotope-labeled group, including a biotin, at cysteine residues. After the two differentially labeled protein mixtures are combined, the sample is enzymatically digested and subjected to avidin affinity chromatography. Only cysteine-containing peptides are released for subsequent analysis, which offers a dramatic reduction in sample complexity, with the caveat that only proteins yielding at least one cysteine-containing peptide can be quantified (most proteins of interest do). Pairs of heavy and light versions of the same peptide can be identified and quantified from the 8 or 16 Da spacing in MS spectra.
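The pairing step described above can be sketched as follows: given a list of centroided MS peaks, find light/heavy partners separated by an expected nominal spacing and report the heavy/light intensity ratio. This is a minimal illustration only. The function name `find_pairs`, the tolerance and the peak values are hypothetical, and real spacings deviate slightly from integer values depending on the isotopes used.

```python
def find_pairs(peaks, spacing_da, charge=1, tol=0.02):
    """Find light/heavy peak pairs separated by spacing_da / charge in m/z.

    peaks: list of (mz, intensity) tuples.
    Returns a list of (light_mz, heavy_mz, heavy/light intensity ratio).
    """
    expected = spacing_da / charge
    ordered = sorted(peaks)
    pairs = []
    for i, (mz_light, int_light) in enumerate(ordered):
        for mz_heavy, int_heavy in ordered[i + 1:]:
            delta = mz_heavy - mz_light
            if delta > expected + tol:
                break  # peaks are sorted, so no later peak can match
            if abs(delta - expected) <= tol and int_light > 0:
                pairs.append((mz_light, mz_heavy, int_heavy / int_light))
    return pairs


# Hypothetical peak list: one pair 8 Da apart plus an unrelated peak.
example_peaks = [(1000.50, 2000.0), (1008.50, 1000.0), (1203.70, 500.0)]
for light, heavy, ratio in find_pairs(example_peaks, spacing_da=8.0):
    print(f"{light:.2f} / {heavy:.2f}: heavy/light = {ratio:.2f}")
```

The same function can be called with `spacing_da=16.0` to look for the wider of the two spacings mentioned in the text, or with a `charge` greater than 1 for multiply charged ESI precursors.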
A molecular weight differential for relative quantification can be introduced during enzymatic protein digestion itself. Serine proteases such as trypsin and endoprotease Glu-C incorporate two oxygen atoms from solvent during hydrolysis of peptide bonds and subsequent incubation [10,11]. If digestion is carried out in highly pure H₂¹⁸O, a high percentage of both carboxyl oxygens will be replaced by ¹⁸O. To perform relative quantification, the two samples to be compared are digested separately, one in normal water and the other in ¹⁸O-enriched water, and combined thereafter. The respective peptide pairs can then be identified by a characteristic 4 Da spacing in subsequent MS analysis.

Peptides produced from enzymatic digestion of a complex protein sample can be labeled at either the N- or C-terminus or at specific residues [1,12–16]. Many reagents have been reported that accomplish peptide-level derivatization with stable isotope labeling, including the technique described in detail in the next section. Most commonly it is the N-terminus of the peptide that is derivatized. The attractive element of such an approach is that all peptides in a complex digest mixture can, in principle, be labeled, including those containing post-translational modifications.

Finally, synthetic peptides with some form of differential isotopic labeling can be spiked into a proteolytic digest just prior to liquid chromatography and MS (LC/MS/MS) [17–19]. The amino acid sequences of these internal standard peptides match experimentally observed sequences, and the peptides can be synthesized with isotopically heavy analogues of one or more amino acids, as described in the AQUA (absolute quantification) approach [18]. Alternatively, test samples and internal standard peptides can be labeled with light and heavy N-terminal tags, respectively, as highlighted in the MIDAS™ (multiple reaction monitoring initiated detection and sequencing) workflow [19]. The internal standard peptides will chromatograph in an identical fashion to the native peptides. In both cases, absolute measurement of target proteins is possible since a known quantity of the internal standard peptide is added to the sample.
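The arithmetic behind spiked-in absolute quantification is a single proportion, sketched below in Python. The sketch assumes that the native and standard peptides co-elute and respond identically in the mass spectrometer, so that the ratio of their peak areas equals the ratio of their amounts; the function name `absolute_amount` and all numerical values are hypothetical.

```python
def absolute_amount(native_area, standard_area, standard_fmol):
    """Estimate the native peptide amount (fmol) from a spiked internal standard.

    Assumes identical chromatographic and ionization behavior for the native
    and standard peptides, so area ratio == amount ratio.
    """
    if standard_area <= 0:
        raise ValueError("standard peak area must be positive")
    return native_area / standard_area * standard_fmol


# Hypothetical values: 50 fmol of heavy standard spiked into the digest.
amount = absolute_amount(native_area=3.0e5, standard_area=1.5e5, standard_fmol=50.0)
print(f"native peptide: {amount:.1f} fmol")
```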
2. ISOBARIC N-TERMINAL PEPTIDE TAGGING

2.1 Features of isobaric tagging chemistry

A protein quantitation approach has emerged that makes use of isobaric tags (iTRAQ™ reagents, Applied Biosystems) to chemically modify peptide N-termini and lysine amine side chains [16]. This technology consists of a set of four different reagents whose chemical structures and nominal molecular masses are the same (Figure 2). The reagent system uses an N-hydroxysuccinimide ester (NHS ester) functional group to attach covalently to peptide amine groups in solution. When attached to a peptide, each of the four reagents gives the same mass addition, and thus the same molecular weight in an MS measurement. When subjected to fragmentation in an MS/MS experiment, each tag produces a unique low-mass signature ion in the m/z 114–117 region, while the remainder of the spectrum appears as a normal, single-component peptide fragmentation spectrum. On fragmentation, the tag molecule first breaks at the amide bond to the peptide. Through subsequent gas-phase unimolecular decomposition (‘neutral loss’), the carbon monoxide group is lost, leaving the N-methyl piperazino positive ion. The neutral loss group thus serves as a mass balance group, ensuring that the four reagents are isobaric when coupled to a peptide amine. The peak areas of the abundant low-mass signature ions enable relative quantification of peptides. In general, standard deviations of quantitative measurements with iTRAQ are observed to be in the vicinity of 20%. Alternatively, one or more of the reagents can be used to derivatize synthetic proteotypic peptides that are spiked into samples at known concentration as internal standards.

The isobaric tagging approach has several strengths in the context of proteomics applications. Since there are four unique signature ions, up to four separate samples can be compared quantitatively in a single experiment. This is arguably the highest degree of multiplexing currently available for widespread use in MS analysis of proteins. Sequence-derived fragment ions of labeled peptides produced by MS/MS are not differentially labeled, so a mixture of a peptide labeled with any or all of the four reagents will show a single-component MS/MS spectrum. In effect, the fragment ion intensities are summed over up to four samples, so there is an element of signal enhancement in MS/MS.
Figure 2 Chemical structures and isotope-coding scheme for four-plex isobaric tagging chemistry (ref. [16]). Reprinted with permission from Association of Biochemistry and Molecular Biology.
An alternative way of stating these features is that every peptide MS/MS spectrum contains both the sequence identification information and all the information needed for quantification. There is no requirement to compare separate spectra or to reconcile identification information with quantitative data after the analysis. The nature of the tag itself confers stability to the fragmentation process, so that MS/MS spectra are simplified into more information-rich ions. The fragmentation that produces the low-mass signature ions appears to be efficient on a number of MS platforms utilizing both MALDI and ESI. This aspect of the approach lends itself to use in laboratories that already have access to MS instrumentation for protein and peptide analysis. Since all peptides in a mixture will be tagged, the technique is applicable to the analysis of post-translational modifications. It can also be combined with absolute quantification approaches in which a known quantity of an internal standard peptide is spiked into an experiment. In this case, the internal standard peptides are tagged with one of the four reagents, and three quantitative ‘channels’ remain for samples. By comparing peak areas between the standard and the samples, absolute quantification of a particular protein in up to three different samples can be performed in a single experiment.
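As a rough illustration of how the signature ions described in this section are turned into numbers, the Python sketch below sums the peak areas found near each of the four reporter positions in one MS/MS peak list and expresses them relative to a chosen reference channel. The exact reporter m/z values (taken here as approximately 114.1 to 117.1), the tolerance window, the function names and the example spectrum are all assumptions for illustration; production software would also apply corrections that are not shown here.

```python
REPORTER_MZ = (114.1, 115.1, 116.1, 117.1)  # approximate signature-ion positions


def reporter_areas(peaks, tol=0.1):
    """Sum peak areas within +/- tol of each reporter m/z.

    peaks: list of (mz, area) tuples from a single MS/MS spectrum.
    Returns a dict mapping the nominal channel (114..117) to its summed area.
    """
    areas = {round(mz): 0.0 for mz in REPORTER_MZ}
    for mz, area in peaks:
        for target in REPORTER_MZ:
            if abs(mz - target) <= tol:
                areas[round(target)] += area
    return areas


def relative_ratios(areas, reference=114):
    """Express every channel relative to the reference channel."""
    ref = areas[reference]
    if ref <= 0:
        raise ValueError("reference channel has no signal")
    return {channel: area / ref for channel, area in areas.items()}


# Hypothetical MS/MS peak list: four reporter ions plus one sequence ion.
spectrum = [(114.11, 1000.0), (115.11, 2000.0), (116.11, 500.0),
            (117.11, 1000.0), (175.12, 8000.0)]
print(relative_ratios(reporter_areas(spectrum)))
```

If the reference channel carries a spiked standard of known quantity, multiplying each ratio by that quantity gives the absolute amounts in the remaining three channels.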
2.2 Limitations

No description of a scientific methodology would be complete without calling attention to the limitations that may be encountered. In the isobaric peptide tagging approach described here, it is necessary to collect MS/MS spectra on every detected precursor to ultimately identify those peptides that undergo a meaningful quantitative change between the different cell states measured. Conversely, in approaches based on mass difference labeling, all precursor pairs can be evaluated statistically and only those pairs observed to change by a predetermined statistical metric are subjected to further analysis by MS/MS. Nevertheless, the ability to perform four-way comparisons directly offers a three-fold reduction in the number of experiments required compared to two-way comparisons.

The low mass of the signature ions produced by iTRAQ reagents is, in general, not well suited to analysis in quadrupole ion trap mass spectrometers. The ion trap platform is itself quite versatile in allowing novel acquisition modes to be developed. It is therefore possible to enhance signal levels in the low-mass region to obtain adequate peak areas for some applications. To date, however, quadrupole and TOF platforms provide better transmission at m/z < 150.

Finally, any peptide-level quantitative approach has potential dynamic range limitations when used for the analysis of highly complex mixtures such as whole, unfractionated cell lysates. Even with cation exchange fractionation prior to reverse-phase LC and MS/MS, many fractions will be rich in peptides of equivalent or near-equivalent molecular weight. The result of this limitation in ‘mass space’ is that there will quite likely be a peptide signal, and hence a quantitation signal, at any mass selected for fragmentation. This background can reduce the effective dynamic range of quantitation changes that can be reliably measured.
Ultimately, experimental design represents the best tool for arriving at a more focused class of analyte molecules from which to derive quantitative information.
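The effect of co-fragmented background on the measurable dynamic range can be illustrated with a deliberately simple model. The Python sketch below assumes that background peptides contribute the same signal to both reporter channels being compared; this equal-background assumption, the function name `observed_ratio` and the numbers are hypothetical and serve only to show the direction and rough size of the effect.

```python
def observed_ratio(true_ratio, background_fraction):
    """Reporter ratio observed when background adds equal signal to both channels.

    background_fraction: background signal per channel, expressed as a fraction
    of the reference-channel signal of the peptide of interest.
    """
    if background_fraction < 0:
        raise ValueError("background_fraction must be non-negative")
    return (true_ratio + background_fraction) / (1.0 + background_fraction)


# A true 10-fold change measured against increasing background levels.
for background in (0.0, 0.5, 1.0):
    print(f"background {background:.1f}: observed ratio "
          f"{observed_ratio(10.0, background):.2f}")
```

Under this model any background pulls the observed ratio toward 1, which is one way to see why reducing sample complexity through experimental design improves the reliability of large measured changes.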
2.3 Experimental and workflow conditions

A number of sample processing steps are used before peptide mixtures are labeled for quantitative MS analysis. This process is depicted in Figure 3. Cells must first be lysed and the proteins isolated and digested prior to labeling. In a typical workflow, 6 M guanidine, 1% Triton X-100 in 100 mM triethylammonium bicarbonate (TEAB) is used for solubilizing proteins. For more difficult preparations, more aggressive solubilization agents may be used. The main restriction when considering solubilization conditions is that components containing primary or secondary amines will react with the NHS ester group of the iTRAQ reagent. After insoluble debris has been removed by centrifugation, proteins can either be precipitated (trichloroacetic acid or acetone) or directly enzymatically digested.

Figure 3 General workflow diagram for performing multiplexed protein quantification with isobaric tags starting from a complex cellular sample. Each sample is processed separately until after the peptide labeling step, where they are combined and carried through SCX chromatography and LC-MS/MS.

It is very beneficial to perform a total protein assay on each of the clarified protein samples before proceeding with further steps. The reason is that quantitation based on peptide-level chemistry requires parallel processing of all samples until labeling is complete. Supplying equal amounts of total protein for each sample helps to balance trypsin digestion efficiency and labeling reagent consumption, and simplifies data normalization. Typically, cysteine residues are reduced and alkylated to alleviate primary and secondary protein structure difficulties prior to digestion. Digestion without precipitation will only be successful after solubilizing agents have been appropriately diluted to avoid interference with trypsin or other proteases. Detergents pose particular problems: ionic detergents such as SDS are incompatible with reverse-phase chromatography, and non-ionic detergents interfere with cation exchange chromatography. Moreover, detergents and salts are highly incompatible with MS. TEAB is the buffering agent of choice because it is a volatile tertiary amine salt that is unreactive toward NHS ester reagents and can be removed by evaporation if needed.

Precipitation, or another specific isolation technique such as affinity chromatography, is a very convenient means of removing all other solution components from the protein sample. Numerous filtration, dialysis and chromatography cartridges are also available from multiple vendors to accomplish some form of biomolecule separation based on molecular weight. The primary drawback of precipitation is the loss of proteins due to downstream insolubility. Much of the protein pellet can be recovered by a two-step tryptic digestion in the presence of 0.1% SDS. The first step, using trypsin at a 1:40 enzyme:substrate ratio, helps to break up insoluble protein solids into soluble peptides. Another aliquot of trypsin is applied 4–6 h later to complete the digestion during a subsequent 18 h incubation.

Once digestion is complete, peptides are ready for derivatization. The use of the NHS ester functional group has particular implications when considering optimal reaction conditions. Reaction efficiency is essentially governed by the competition between aminolysis and hydrolysis (and hence deactivation) of the NHS ester group of the reagent itself. The ideal buffer conditions are 75% isopropanol, 0.25 M TEAB pH 8.5 in a total volume not exceeding 100 µl. A high organic solvent content reduces the rate of hydrolysis dramatically, allowing amine coupling to proceed favorably. Beyond 75–80% isopropanol, however, aminolysis rates begin to fall, and reagent solubility also decreases. A significant concentration of buffer is also required to maintain solution pH during the reaction. Typically, peptide quantities representing 50–100 µg of total protein can be derivatized at close to 100% in a 30-min reaction under these conditions, assuming a substantial excess of iTRAQ NHS ester reagent.
Commercially available kits supply the reagent in 2 µmol quantities, which is roughly a 10-fold excess of reagent over peptide amines in a 100-µg tryptic digest of cellular proteins. When the reaction is complete, all free peptide N-termini and lysine side chains will carry one molecule of the tag reagent. A lysine at the N-terminus of a peptide will have two tags, one at the N-terminus and one at the ε-amino side chain. Potential side reactions are limited to tyrosine and histidine side chains. Under the conditions described here, at most a small percentage (<4%) of any given peptide will actually undergo the side reaction, so it has a negligible effect on the final quantitative measurements. The reaction can be considered complete after 1 h at room temperature. At this time, the reagent has either coupled to free amines in the sample or hydrolyzed to the free acid salt of the reagent. Although quenching is not necessary, acidifying the reactions to pH < 5 with trifluoroacetic acid should effectively stop the reaction. The set of four samples, each labeled with a different reagent, can then be combined and carried through one or more rounds of chromatography.
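The reagent excess quoted in this section can be checked with a back-of-the-envelope calculation, sketched below in Python. The reagent quantity is read here as 2 µmol (2000 nmol), the value consistent with a roughly 10-fold excess. The mean tryptic peptide mass (1100 Da) and the average number of reactive amines per peptide (1.5, that is, one N-terminus plus a lysine on about half of the peptides) are rough assumptions introduced for this example, as is the function name `reagent_excess`.

```python
def reagent_excess(reagent_nmol, protein_ug, mean_peptide_mass_da=1100.0,
                   amines_per_peptide=1.5):
    """Estimate the molar excess of labeling reagent over peptide amines.

    mean_peptide_mass_da and amines_per_peptide are rough, assumed averages
    for a tryptic digest and should be adjusted for a real sample.
    """
    # ug * 1000 = ng, and ng divided by g/mol gives nmol.
    peptide_nmol = protein_ug * 1000.0 / mean_peptide_mass_da
    amine_nmol = peptide_nmol * amines_per_peptide
    return reagent_nmol / amine_nmol


# 2 umol of reagent applied to a 100-ug digest.
print(f"approximate molar excess: {reagent_excess(2000.0, 100.0):.0f}-fold")
```

With these assumed averages the estimate comes out on the order of 10-fold, in line with the figure given in the text; halving the protein input doubles the excess.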
2.4 Chromatography

The N-methyl piperazine acetate group comprising the isobaric tag behaves very similarly to a moderately basic amino acid. In strong cation exchange (SCX) chromatography, derivatized peptides remain bound slightly longer in a typical
Multiplexed Quantitative Proteomics Using Mass Spectrometry
KCl gradient. There is a large excess of unused reagent that hydrolyzes to the free acid salt form of the NHS ester during the reaction. This salt can potentially interfere with peptide binding to the cation exchange column; it is therefore necessary to dilute the sample 10–100-fold to reduce the salt concentration to below 10 mM. Drying the mixed sample may be useful to remove the 250-mM TEAB and isopropanol from the labeled sample. Typically, the labeled sample is loaded onto the SCX column in 5–10 ml volumes of cation exchange binding buffer. If each peptide sample is 50–100 µg of material, then the combined sample will be up to 400 µg, so a column of sufficient capacity should be chosen. A high-volume wash of several column volumes is used to remove unreacted reagent and other sample components, then a cation gradient from 0.005 to 0.5 M KCl over 15 min is sufficient to separate complex mixtures into 10–12 fractions. Alternatively, SCX cartridge procedures can be used as a form of sample clean-up prior to reverse-phase chromatography. Similarly, a high-capacity reversed-phase cartridge may also be used as a sample clean-up step prior to analytical reversed-phase chromatography. As with SCX, several column volumes of binding buffer should be used to fully wash away low-molecular-weight contaminants from the sample prior to elution of labeled peptides. Reverse-phase chromatography is performed using the same buffer systems, columns and gradient as conventional peptide rpHPLC. Direct separation of a labeled sample can also be performed by rpHPLC as long as capacity issues are addressed. Typically, the sample is first bound onto a trap column to wash residual reagent away for 10–20 min before starting the gradient.
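To make the SCX salt gradient described above concrete, the following Python sketch computes the KCl concentration at any time in the gradient and the concentration window covered by each collected fraction. It assumes that the gradient is linear and that fractions are collected in equal time windows; neither detail is specified in the text, and the function names are hypothetical.

```python
# Sketch of a linear 0.005-0.5 M KCl gradient over 15 min, cut into fractions.
# Linearity and equal-time fraction windows are assumptions.

def kcl_gradient(t_min, start_m=0.005, end_m=0.5, length_min=15.0):
    """KCl concentration (M) at time t_min, clamped to the gradient limits."""
    progress = min(max(t_min / length_min, 0.0), 1.0)
    return start_m + (end_m - start_m) * progress


def fraction_windows(n_fractions=10, length_min=15.0):
    """Return (start_min, end_min, kcl_start_M, kcl_end_M) for each fraction."""
    width = length_min / n_fractions
    return [
        (i * width, (i + 1) * width,
         kcl_gradient(i * width, length_min=length_min),
         kcl_gradient((i + 1) * width, length_min=length_min))
        for i in range(n_fractions)
    ]


if __name__ == "__main__":
    for start, end, c0, c1 in fraction_windows(10):
        print(f"{start:5.1f}-{end:5.1f} min: {c0:.3f}-{c1:.3f} M KCl")
```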
3. MASS SPECTROMETRY

3.1 MALDI-TOF-TOF mass spectrometry

The workhorse tool of peptide and protein sequencing is tandem MS. In MS/MS, ions are produced from a continuous liquid flow via ESI, or from dried spots on a target through MALDI. An ion of interest at a specific mass-to-charge ratio is isolated in the mass spectrometer and subjected to fragmentation, and the fragment masses are recorded and interpreted. There are numerous hardware configurations to accomplish this. One such recently emerging technology is TOF/TOF MS, which uses MALDI to produce ions for MS and MS/MS experiments [20,21]. The combination of the MALDI process with TOF technology confers particular benefits for quantitative analysis of complex protein samples. In simplest terms, a TOF mass spectrometer uses a simple positive voltage differential to push ions in a vacuum toward a detector that is usually held at ground potential. Because ions of a given charge acquire the same kinetic energy, lighter ions travel faster: the arrival time at the detector is proportional to the square root of the mass-to-charge ratio. In the MALDI process, a sample is combined with a large excess of a light-absorbing matrix molecule and deposited onto a surface. The surface is placed in the electric field, and pulsed laser light is used to generate ions that are accelerated to the detector. Two decades of
Philip L. Ross et al.
dedicated evolution of both physical and chemical processes have culminated in the high-performance MALDI-TOF instruments in use today, as exemplified in Figure 4. The TOF-TOF instrument consists of two accelerating regions — Source 1 and Source 2, with an ion selection device and a collision cell between. Source 1 uses a linear delayed extraction TOF [21,22] configuration to sharply focus ions in time and space to the timed ion selector (TIS) region. The TIS is programmed with a time delay generator and voltage pulsing to act as a gate allowing only ions of interest within a given m/z range to pass through the collision cell. Before reaching the collision cell, a voltage potential partially decelerates the ions so that collisions occur with a kinetic energy appropriate for peptide backbone fragmentation. After the fragments are formed, they travel to the second source region where a second delayed extraction TOF analyzer records a mass spectrum of the fragments. The second TOF analyzer is equipped with a two-stage reflector that provides additional energy and mass focusing of fragments formed in the
Figure 4 Schematic diagram of a MALDI-TOF/TOF mass spectrometer used for high-performance MS and quantitative MS/MS of complex peptide mixtures. The instrument shown is similar to the 4800 MALDI-TOF/TOF Analyzer marketed by Applied Biosystems/MDS Sciex.
collision cell. Several sets of tunable deflectors help to guide the ion beam through the instrument. This instrument is equipped with two detectors to support multiple modes of operation in a single platform. To acquire a simple, low-resolution mass spectrum, the elements in the second source and reflector are turned off, and the ions are delivered to the linear detector. To acquire a high-resolution mass spectrum, as is done to identify peptide precursors for subsequent analysis by MS/MS, the elements in the second source are kept off, and the ion mirror is on to provide time and energy focused ions to the reflector detector. Finally, for MS/MS to obtain a fragment spectrum, the second source, reflector and reflector detector are used.
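The time-of-flight principle underlying this section can be illustrated with a short Python sketch of an idealized linear TOF analyzer, in which an ion accelerated through a potential V acquires kinetic energy zeV and then drifts at constant velocity. The 20 kV accelerating voltage and 1 m drift length are illustrative assumptions rather than specifications of the instrument in Figure 4, and the model ignores delayed extraction, the reflector and the initial velocity spread.

```python
import math

# Idealized linear time-of-flight: z*e*V = (1/2)*m*v^2, then t = L / v.
# Accelerating voltage and drift length are illustrative assumptions.

DALTON_KG = 1.66053906660e-27           # kilograms per dalton
ELEMENTARY_CHARGE_C = 1.602176634e-19   # coulombs


def flight_time_us(mass_da, charge=1, accel_volts=20000.0, drift_m=1.0):
    """Ideal field-free flight time in microseconds for an ion of given mass."""
    kinetic_energy_j = charge * ELEMENTARY_CHARGE_C * accel_volts
    velocity_m_s = math.sqrt(2.0 * kinetic_energy_j / (mass_da * DALTON_KG))
    return drift_m / velocity_m_s * 1e6


if __name__ == "__main__":
    for mass in (1000.0, 2000.0, 4000.0):
        print(f"m/z {mass:6.0f}: {flight_time_us(mass):6.2f} us")
```

In this model a four-fold increase in mass doubles the flight time, which is the square-root dependence that lets arrival time be converted to m/z.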
3.2 The LC-MALDI workflow

The LC-MALDI workflow takes advantage of the high-performance characteristics of the MALDI-TOF/TOF platform combined with the powerful resolving characteristics of reverse-phase LC [23]. Using the MALDI-TOF/TOF™ instrument (Applied Biosystems/MDS Sciex), it is possible to collect typically 25–30, and at most about 50, individual MS/MS spectra on a single dried spot before the material becomes substantially depleted (Figure 5). For samples containing more than two or three digested proteins, some degree of separation is required to spread out the large number of peptides. In a typical configuration, the eluent from a 60-min reverse-phase chromatography run is combined post-column with a flow of MALDI matrix solution using a mixing tee. The combined liquid is then spotted sequentially at approximately 20-s intervals across the rectangular MALDI plate to form an array of spots (7 rows × 44 columns). Each spot has a pre-programmed position and elution time as established through the mass spectrometer operation software. As described in the previous chapter, peptides can be labeled with isobaric tags for quantitation or left underivatized; in either case largely the same set of chromatography and MS conditions is used. In the LC-MALDI workflow, MS data and MS/MS data are collected separately. The entire array of spots is first analyzed by reflector MS to obtain accurate precursor mass information for each spot in the array. Sometimes, an internal standard peptide is spiked into the matrix solution as a simple means of providing a calibration signal for higher mass accuracy (ca. 10 ppm). The MS scan and interpretation software generates a list of precursors for a subsequent MS/MS run. The primary purpose of the interpretation software is to generate a list in which a given precursor mass is analyzed only once, to avoid redundancy. 
Other parameters can also be applied in creating the precursor list, which is one of the main strengths of the LC-MALDI approach, as described in the next paragraph. The MS/MS acquisition run then starts, generating both fragmentation spectra for peptide sequence identification and quantitation data for each individual MS/MS spectrum.
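The precursor-list step described above can be sketched in Python as follows. This is a minimal illustration and not the vendor's interpretation software: it groups MS peaks by mass within an assumed tolerance, keeps each precursor mass once at the spot where its signal is strongest, and caps the number of precursors per spot. The 50 ppm tolerance, the cap of 25 and all names are assumptions, and elution time is deliberately ignored for simplicity.

```python
from collections import defaultdict

# Minimal precursor-list sketch: one entry per precursor mass, placed at the
# spot of maximum intensity. Tolerance and per-spot cap are assumptions.

def build_precursor_list(peaks, ppm_tol=50.0, max_per_spot=25):
    """peaks: iterable of (spot, mz, intensity) from the MS scan of the plate.

    Returns {spot: [(mz, intensity), ...]}, most intense precursors first.
    """
    clusters = []
    for spot, mz, intensity in sorted(peaks, key=lambda p: p[1]):
        last = clusters[-1] if clusters else None
        # Same precursor if within ppm_tol of the first mass seen in the cluster
        if last and (mz - last["anchor"]) / last["anchor"] * 1e6 <= ppm_tol:
            if intensity > last["intensity"]:
                last.update(mz=mz, spot=spot, intensity=intensity)
        else:
            clusters.append(
                {"anchor": mz, "mz": mz, "spot": spot, "intensity": intensity}
            )

    per_spot = defaultdict(list)
    for c in clusters:
        per_spot[c["spot"]].append((c["mz"], c["intensity"]))
    return {
        spot: sorted(items, key=lambda item: -item[1])[:max_per_spot]
        for spot, items in per_spot.items()
    }


if __name__ == "__main__":
    example = [(10, 1500.000, 100), (11, 1500.010, 900),
               (12, 1500.005, 300), (11, 2000.000, 50)]
    print(build_precursor_list(example))
```

A fuller implementation would also require the grouped peaks to come from neighbouring spots, so that two different peptides of the same mass eluting far apart are not merged.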
3.3 Advantages of LC-MALDI

The main advantages of the LC-MALDI workflow arise from the ability to intelligently and independently sift through precursor and elution time data
Figure 5 Schematic diagram of the LC-MALDI MS/MS workflow with multiplexed peptide quantification. Eluent from HPLC is combined with matrix and spotted sequentially across the MALDI plate. Typically, four individual HPLC runs can be collected on a single plate. The plate is introduced into the mass spectrometer where an MS scan is acquired to search for peptide precursors, typically in the range 900–4,000 Da. After automatic generation of a precursor list, MS/MS spectra are acquired. Typically in a whole-cell proteomics experiment, MS/MS spectra from up to 25 precursors can be recorded from a single MALDI spot. Each MS/MS spectrum contains fragmentation information to identify the corresponding peptide, and relative quantification information to show the pattern of change of the respective protein between the four samples.
using a variety of selection parameters and criteria. The collection of MS/MS spectra is therefore not limited by the chromatographic time scale. A given peptide will be found in several consecutive spots on the target plate and can be interrogated by MS/MS at the spot of maximum abundance. Precursor masses can be excluded for specified lengths of elution time so that MS/MS data can be acquired on less abundant precursors. The MALDI process itself is not limited by space-charge considerations in the same way as ESI, therefore considerably more material can be loaded into LC-MALDI MS/MS experiments. This additional loading makes it possible to collect spectra from less abundant proteins. Experiments can, in fact, be designed to specifically avoid collecting spectra from the most abundant proteins as determined by the intensity of the precursor ion. Precursor lists can be tailored to perform multiple rounds of MS/MS acquisition to take into account the progressive depletion of sample. Since the MS scan is recorded separately, alternate experiments can be performed after the first set of identification and quantification has been completed. A previously collected precursor list can be supplied as an exclusion list in both
mass and elution time dimensions. Finally, for quantification purposes, the MALDI-TOF/TOF mass spectra typically produce the most abundant signature ion peaks compared to other MS platforms. This, combined with the ability to accumulate hundreds of single spectra into a final MS/MS spectrum, confers particular statistical benefits for relative quantification measurements. There are several drawbacks encountered with the LC-MALDI workflow in comparison to LC-MS of peptides in large-scale proteomics experiments. Since many individual laser shots are averaged to give the cumulative spectrum, collection of MS/MS spectra from a single 60-min LC gradient can require several hours. This is in part a result of the fact that the actual efficiency of fragmentation of the precursor ion in the TOF/TOF hardware is substantially lower than in quadrupole mass spectrometers. The actual collision energy cannot typically be adjusted with the current TOF-TOF configurations. Therefore it is necessary to average at least 100–200 laser shots to obtain useful spectra from peptides at less than 20 fmol in a spot. Typically, 1,500 laser shots are acquired to take advantage of the signal-to-noise gain from signal averaging. Despite the apparent time drawback, it is possible to acquire 50,000–100,000 shots on a precursor of interest if desired.
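The trade-off between signal averaging and acquisition time described above can be estimated with a short Python sketch. It assumes that noise is uncorrelated between laser shots, so that the signal-to-noise ratio grows as the square root of the number of shots averaged, and it assumes a 200 Hz laser repetition rate; both are assumptions for illustration rather than figures from the text.

```python
import math

# Signal averaging sketch: S/N gain ~ sqrt(N) for uncorrelated noise.
# The laser repetition rate is an assumed value.

def snr_gain(n_shots):
    """Relative signal-to-noise gain from averaging n_shots single spectra."""
    return math.sqrt(n_shots)


def acquisition_hours(n_precursors, shots_per_spectrum=1500, laser_hz=200.0):
    """Laser-on time in hours to acquire MS/MS spectra for n_precursors."""
    return n_precursors * shots_per_spectrum / laser_hz / 3600.0


if __name__ == "__main__":
    print("gain, 100 shots:", snr_gain(100))
    print("gain, 1500 shots:", round(snr_gain(1500), 1))
    # 180 spots (60 min at 20 s per spot) with 25 precursors each
    print("hours for 180 x 25 precursors:", acquisition_hours(180 * 25))
```

Under these assumptions, a full precursor list from a single 60-min gradient takes several hours of laser time, consistent with the drawback noted in the text.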
4. QUANTITATIVE APPLICATIONS USING ISOBARIC TAGGING

The number of potential applications of multiplexed protein quantitation could itself occupy a large volume. Two disparate examples of this methodology are briefly highlighted in this chapter. The aim of this section is to give a sense of the wide scope of experiments that can benefit from the technology described in this chapter. These applications extend far beyond typical comparative analysis in proteomics into basic protein chemistry. Since the reagents central to this technology are amine reactive, there are clearly many other suitable classes of analyte extending beyond proteins and arguably the biological sciences.
4.1 Bacterial cell cycle protein quantitation

Caulobacter crescentus is a commonly encountered aquatic bacterium found in the vicinity of freshwater supplies. It is known to survive long periods of time under harsh conditions in the absence of nutrients. The bacterium exists in two forms: a motile ‘swarmer’ cell, which moves in response to its environment with the aid of a single tiny flagellum, and a ‘stalked’ cell, which has lost its flagellum and adheres to surfaces with the aid of a rigid peptidoglycan stalk. Over a period of 45 min, the swarmer cell differentiates into the stalked cell. The stalked cell then proceeds to replicate and divide. Caulobacter divides asymmetrically to form a swarmer cell and another stalked cell. This unusual bacterial cell cycle represents a simple model system in developmental biology from which fundamental aspects of differentiation and asymmetric cell division can be studied. The bacterium is also of considerable engineering interest, as it has been established that C. crescentus stalked cells produce one of the strongest natural adhesives, allowing cells to adhere to wet surfaces [24]. Genome sequencing and mRNA expression
measurements have been performed with regard to the C. crescentus cell cycle [25,26]. However, protein quantitative analysis has primarily emphasized individual proteins of interest. Focusing on differentiation from the swarmer cell to the stalked cell, a pure population of Caulobacter swarmers was isolated by centrifugation. The cells were then allowed to proceed through to formation of a population of stalked cells (Figure 6A). Cells were harvested at 15-min intervals from 0 to 45 min. To measure protein change across the time course, peptides isolated from each time point were separately labeled with one of four isobaric reagents. Once labeled, the four individual samples were combined and separated into 10 fractions by SCX chromatography. Each SCX fraction was then analyzed by LC-MALDI MS/MS to identify and measure changes in protein abundance during cellular differentiation. The entire set of MS/MS spectra (ca. 20,000) was submitted for database searching using the Mascot (www.matrixscience.com) search engine. Relative quantification data from each individual high-confidence peptide were then grouped into the corresponding proteins and the respective ratios were averaged. Quantitative results from the entire data set were normalized to correct for systematic variation between samples in the experiment. The relative levels of approximately 1,200 proteins were monitored through the 45-min period. Each individual peptide MS/MS spectrum contains a set of
Figure 6 (A) Workflow for quantitation of protein change over 15-min intervals during the 45-min differentiation of Caulobacter crescentus cells from flagellated mobile cells to stalked non-mobile cells. (B) Illustration of protein turnover of ejected flagellar proteins during C. crescentus differentiation. Raw spectra are shown at the top, and averaged normalized data are displayed in the graph below.
Figure 7 (A) Raw signature ion profiles for two separate peptides from the McpA protein, demonstrating consistency between individual peptide quantitative measurements. (B) Averaged, normalized protein profiles for C. crescentus McpA and McpC proteins during differentiation. The graph shows profiles from analysis of two separate synchronized C. crescentus cultures allowed to then proceed through the cell cycle.
four peaks whose areas reflect the abundance profile of the corresponding protein as the Caulobacter cells undergo differentiation. Several proteins whose abundance changes have been well studied in Caulobacter were also identified in the MS experiments, providing validation for the overall methodology. These include several flagellar proteins (Figure 6B) that are degraded as the cell ejects its flagellum and loses motility. Another example is the methyl-accepting chemotaxis proteins McpA and McpC (Figure 7). These proteins are membrane receptors responding to external chemical stimuli and, through a series of protein modification events, give information to direct motion of the cell via the flagellum. These proteins are targeted for degradation during differentiation as they are no longer required in the non-motile stalked cell. Several aspects of the quantitative LC-MALDI workflow are illustrated in Figure 7. Peptides belonging to the same protein, for example McpA in Figure 7A, are typically very consistent with respect to quantification. Secondly, biological replicates (Figure 7B) have demonstrated a high degree of reproducibility. The quantitative MS results obtained here demonstrate similar degradation patterns for both McpA and McpC. In addition to verifying temporal patterns of previously studied proteins, the power of LC-MALDI as a discovery tool reveals the time course profiles of several hundred proteins that have not previously been studied in C. crescentus.
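The grouping, averaging and normalization of peptide-level data described in this section can be sketched in Python as follows. The sketch assumes that each peptide carries four signature ion peak areas (one per time point), that systematic variation is corrected by dividing each channel by its median across all peptides, and that protein profiles are simple means of peptide ratios relative to the first time point. The median correction, the data layout and the function names are assumptions for illustration; the text does not specify how normalization was performed.

```python
from collections import defaultdict
from statistics import mean, median

# Peptide-to-protein quantification sketch for four-plex signature ions.
# Median normalization and the dict layout are assumptions.

def normalize_channels(peptides):
    """Divide each channel by its median area across all peptides."""
    n_channels = len(peptides[0]["areas"])
    medians = [median(p["areas"][c] for p in peptides) for c in range(n_channels)]
    return [
        {**p, "areas": [a / m for a, m in zip(p["areas"], medians)]}
        for p in peptides
    ]


def protein_profiles(peptides, reference=0):
    """Average peptide ratios (relative to the reference channel) per protein."""
    ratios = defaultdict(list)
    for p in peptides:
        ref_area = p["areas"][reference]
        ratios[p["protein"]].append([a / ref_area for a in p["areas"]])
    return {
        protein: [mean(column) for column in zip(*rows)]
        for protein, rows in ratios.items()
    }


if __name__ == "__main__":
    # Hypothetical peak areas at 0, 15, 30 and 45 min
    data = [
        {"protein": "McpA", "areas": [100.0, 80.0, 40.0, 20.0]},
        {"protein": "McpA", "areas": [200.0, 160.0, 80.0, 40.0]},
    ]
    print(protein_profiles(data))
```

In practice the two functions would be applied in sequence, with normalization computed over the full data set so that the many unchanged proteins define the correction.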
4.2 Characterization of a novel protease

Many proteolytic enzymes found in nature fall under one of several broad classes based on common functional or mechanistic characteristics of the enzyme. One rather unusual proteolytic enzyme discovered recently that seems to occupy a distinct category of its own is the metallo-endopeptidase from Grifola frondosa (GFMEP) [27,28]. This small (24 kDa) enzyme cleaves peptide bonds at the
N-terminal side of lysine and requires 1 equivalent of zinc for activity. Although some aspects of the kinetics and mechanism of this protein have been studied, we sought to examine the properties of this enzyme in the context of complete digestion of proteins into peptides. One study used quantitative proteomics with LC-MALDI to examine the time-based appearance profiles of peptides during a solution protein digest with GFMEP (Figure 8A). In this experiment, proteins were digested with GFMEP; at four time points, aliquots of the reaction were collected and the reaction was stopped by EDTA addition. Each separate aliquot was then labeled with one of the four iTRAQ reagents, and the labeled digestion mixtures were mixed together and analyzed by LC-MALDI MS/MS. A particularly subtle advantage of the isobaric approach is highlighted here. If the peptide has virtually no abundance at a particular time point, it will still be detected in the experiment since the MS/MS spectrum is an aggregate signal from all four time points. Several example peptides (from phosphorylase b) are shown in Figure 8B. Many of the peptides measured in these experiments follow nearly identical appearance kinetics. There is an approximately two-fold increase in abundance between 6 and 20 h, indicating a relatively slow digestion compared to trypsin. In similar experiments, trypsin digestion appears 90% complete within 2 h. There are a number of partial digestion products (Figure 9), whose appearance profiles also reflect a slow processivity. The peptide KDFYELEPHKFQN is clearly an
Figure 8 (A) Workflow used to monitor peptide appearance profiles during the solution digestion of proteins by GFMEP. (B) Time course profiles of three peptides from bovine phosphorylase b protein during solution digest with GFMEP.
Figure 9 Illustration of partial cleavage and R-specific activity characteristics of GFMEP during a 24-h solution digestion of phosphorylase b protein.
intermediate product, as it exhibits a maximum abundance at 1 h, then gives way to the truncated KDFYELEPH at longer times. This behavior indicates an apparent ‘clean-up’ of missed cleavages to produce fully digested peptides. Digestion with GFMEP also shows a small degree of N-terminal Arg-specific activity that follows slower reaction kinetics than the primary Lys-specific activity. This behavior is exemplified by peptide KNFNRHLHFTLV, which reaches a maximum abundance at 6 h. Between 6 and 20 h, there is a three-fold increase in the R-specific RHLHFTLV, suggesting that this alternate activity is more prevalent after 6 h.
There are perhaps many alternative methods to measure the reaction kinetics of proteolysis or amino acid biochemical processes, but the multiplexed isobaric approach shown here affords subtle advantages. Both structural characterization of reaction products and kinetic information are provided with a single data collection event. Since quantitative data from each time point are obtained in a single spectrum, intermediate reaction products can be observed even if they are present in negligible amounts during one or more time points.
REFERENCES

1 R.L. Lundblad, Chemical Reagents for Protein Modification, 3rd ed., CRC Press, Boca Raton, 2004.
2 R.L. Lundblad, The Evolution from Protein Chemistry to Proteomics: Basic Science to Clinical Application, CRC Press, Boca Raton, 2005.
3 A. Leitner and W. Lindner, Proteomics, 6 (2006) 5418–5434.
4 R. Aebersold and M. Mann, Nature, 422 (2003) 198–207.
5 S.E. Ong and M. Mann, Nat. Chem. Biol., 1 (2005) 252–262.
6 Y. Oda, K. Huang, F.R. Cross, D. Cowburn and B.T. Chait, Proc. Natl. Acad. Sci. USA, 96 (1999) 6591.
7 S.E. Ong, B. Blagoev, I. Kratchmarove, D.G. Kristensen, H. Steen, A. Pandey and M. Mann, Mol. Cell Proteomics, 1 (2002) 376–386.
8 S.P. Gygi, B. Rist, S.A. Gerber, T. Frantisek, M.H. Gelb and R. Aebersold, Nat. Biotechnol., 17 (1999) 994–999.
9 E.C. Yi, X.J. Li, K. Cooke, H. Lee, B. Raught, A. Page, V. Aneliunas, P. Hieter, D.R. Goodlett and R. Aebersold, Proteomics, 5 (2005) 380–387.
10 M. Schnolzer, P. Jedrzejewski and W.D. Lehmann, Electrophoresis, 17 (1996) 945–953.
11 X. Yao, A. Freas, J. Ramirez, P.A. Demirev and C. Fenselau, Anal. Chem., 73 (2001) 2836–2842.
12 M. Munchbach, M. Quadroni, G. Miotto and P. James, Anal. Chem., 72 (2000) 4047–4057.
13 J.-L. Hsu, S.-Y. Huang, N.-H. Chow and S.-H. Chen, Anal. Chem., 75 (2003) 6843–6852.
14 A. Thompson, J. Schaefer, K. Kuhn and S. Kienle, Anal. Chem., 75 (2003) 1895–1904.
15 T. Keough, R.S. Youngquist and M.P. Lacey, Anal. Chem., 75 (2003) 157A–165A.
16 P.L. Ross, Y.N. Huang, J.N. Marchese, B. Williamson, K. Parker, S. Hattan, et al., Mol. Cell Proteomics, 3 (2004) 1154–1169.
17 O.A. Mirgorodskaya, Y.P. Kozmin, M.I. Titov, R. Körner, K.P. Sönksen and P. Roepstorff, Rapid Commun. Mass Spectrom., 14 (2000) 1226–1232.
18 S.A. Gerber, J. Rush, O. Stemman, M.W. Kirschner and S.P. Gygi, Proc. Natl. Acad. Sci. USA, 100 (2003) 6940–6945.
19 R.D. Unwin, J.R. Griffiths, M.K. Leverentz, A. Grallert, I.M. Hagan and A.D. Whetton, Mol. Cell Proteomics, 4 (2005) 1134–1144.
20 M.L. Vestal and J.M. Campbell, Meth. Enzymol., 402 (2005) 79–108 (and references therein).
21 M.L. Vestal, P. Juhasz and S.A. Martin, Rapid Commun. Mass Spectrom., 9 (1995) 1044–1050.
22 P. Juhasz, M.T. Roskey, I.P. Smirnov, L.A. Haff, M.L. Vestal and S.A. Martin, Anal. Chem., 68 (1996) 941–946.
23 S.J. Hattan, J. Marchese, N. Khainovski, S. Martin and P. Juhasz, J. Proteome Res., 4 (2005) 1931–1941.
24 P.H. Tsang, G. Li, Y.V. Brun, L.B. Freund and J.X. Tang, Proc. Natl. Acad. Sci. USA, 103 (2006) 5764–5768.
25 M.T. Laub, H.H. McAdams, T. Feldblyum, C.M. Fraser and L. Shapiro, Science, 290 (2000) 2144–2148.
26 J. Holtzendorff, D. Hung, P. Brende, A. Reisenauer, P.H. Viollier, H.H. McAdams and L. Shapiro, Science, 304 (2004) 983–987.
27 T. Nonaka, N. Dohmae, Y. Hashimoto and K. Takio, J. Biol. Chem., 272 (1997) 30032–30039.
28 T. Nonaka, H. Ishikawa, Y. Tsumuraya, Y. Hashimoto and N. Dohmae, J. Biochem. (Tokyo), 118 (1995) 1014–1020.
CHAPTER 19

Large-Scale Subcellular Localization of Proteins by Protein Correlation Profiling

Leonard J. Foster
Contents
1. Introduction
   1.1 Biochemical enrichment of organelles
2. Peptide Correlation Profiling
   2.1 Limitations
3. Other Quantitative Methods
4. Software for PCP
5. Hardware Requirements for PCP
6. The Future for PCP and Organelle Proteomics
Acknowledgements
References
Comprehensive Analytical Chemistry, Volume 52, ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00219-5. © 2009 Elsevier B.V. All rights reserved.

1. INTRODUCTION

Organelles are differentiated structures within a cell that perform specialized functions. They are often thought of as being enclosed by a membrane, as in the case of the endoplasmic reticulum or a mitochondrion, but many structures within the eukaryotic nucleus are also considered organelles, such as the nucleolus or a spliceosome. With the rise of proteomics, organelles have quickly become one of the favourite subjects for proteomic projects [1–3], since their protein composition must be more narrowly defined than that of a whole cell and thus one can hope to understand an organelle system long before a whole cell could be fully understood. Despite this reduced complexity, to date no one has conclusively demonstrated the entire protein catalogue of any organelle, let alone a whole cell. Virtually every organelle that can be named has been targeted by a proteomics study, and our understanding of the composition of these intriguing entities is growing. As instruments and methods allow us to probe organellar proteomes more deeply, though, it is becoming abundantly clear that straightforward qualitative proteomic analysis of biochemically enriched fractions is no longer acceptable; it is impossible to objectively differentiate contaminants from specific components without more information. This chapter will describe the implementation, difficulties and future of a quantitative proteomics approach for unbiased assignment of co-purifying contaminants in organellar proteomes called peptide correlation profiling.
1.1 Biochemical enrichment of organelles

Organelles can only be studied in two ways: by microscopy of a sufficient resolution to resolve one compartment from another and by biochemical enrichment. George Palade championed the biochemical enrichment of subcellular structures using sucrose [4], for which he was recognized with the 1974 Nobel Prize in Physiology or Medicine. Since that time, reasonable enrichment procedures have been worked out for virtually every organelle, employing methods from the original sucrose technique to fluorescence-activated cell sorting [5] to immune complexing [6] to magnetization [7] to electrophoresis [8]. The problem with any biochemical fractionation procedure, however, is that nothing can be purified to absolute purity. Thus, there will always be contaminants, but too often in organelle proteomics this fact is overlooked [9]. Depending on the question being asked in an experiment, contaminants may not matter or may be safely ignored, but in a proteomic exercise designed to identify all the components of a compartment the issue of contamination cannot be ignored. Mass spectrometers are so exquisitely sensitive that contamination at the level of 0.1% or less can easily be picked up and can lead to incorrect conclusions being drawn about the functions of an organelle. One method used quite frequently to filter out contaminants in an organelle proteomic study is to use an annotated protein database such as UniProt [10] or an ontological classification system like Gene Ontology [11] to remove all proteins from one’s dataset that have not been previously annotated as belonging to the organelle of interest [12]. However, by this method one might as well not even go through the trouble of doing the wet experiment, since one could arrive at the same conclusion by simply querying the databases for all proteins found in that organelle.
2. PEPTIDE CORRELATION PROFILING

Classically in organelle biochemistry the gold standard for assigning the localization of a given protein was to show that it displayed a gradient distribution pattern similar to that of a commonly accepted marker of that location (Figure 1). Figure 1 also shows why looking only at the composition of a single fraction is insufficient to assign localization. For example, if one were studying the endoplasmic reticulum–Golgi intermediate compartment (ERGIC),
Large-Scale Subcellular Localization of Proteins by PCP
Figure 1 Western blotting of density gradient fractions to assign localization. Western blot distributions of three hypothetical proteins across a density gradient designed to isolate the endoplasmic reticulum/Golgi intermediate compartment (ERGIC). The top protein is a known marker of ERGIC and the middle protein can be putatively assigned to ERGIC since its profile matches very closely to that of the top protein. However, the profile of the bottom protein indicates that it is not a resident protein of ERGIC as it migrates differently in the density gradient.
one might look only at fraction 3 from Figure 1, since that is where an ERGIC marker protein peaks. However, by probing fraction 3 deeply enough one would also find another protein which clearly does not belong in the ERGIC, but in the absence of information from fractions 1, 2, 4, 5, 6, 7 and 8 that protein could not objectively be distinguished as a contaminant. Protein correlation profiling (PCP) is a proteomic method for using gradient information to improve the localization assignments in organelle proteomic studies in direct analogy to the Western blotting example in Figure 1. Conceptually PCP is no more complex than Western blotting. You separate an organelle sample on a density gradient, collect fractions from the gradient and analyze all the fractions for the protein(s) of interest. In practice there are many factors that can impact the success of the experiment.

Step (1). A sample containing the organelle(s) of interest is separated on a continuous density gradient and fractions are collected across the entire gradient. A continuous density gradient is preferred over a discontinuous (step) gradient as the sharp steps in density in a discontinuous gradient severely impair the gradient resolution and thus the ability to distinguish true organelle proteins from contaminants. The number of fractions taken from the gradient should be sufficient to yield more than just one fraction per peak of interest. In other words, if the marker that will be used to define a particular organelle is only detectable in one fraction then assigning another protein to that organellar location because it is also found only in that fraction does not instill a great degree of confidence in the assignment.

Step (2). Once fractions have been collected they need to be analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) [13]. 
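The comparison of a protein's gradient profile with that of a marker, as illustrated by the Western blotting analogy of Figure 1, can be sketched in Python as follows. Pearson correlation is used here purely as an assumed stand-in for whatever similarity measure a real PCP analysis applies, and the eight-fraction intensity profiles are invented for illustration.

```python
import math

# Profile comparison sketch: how closely does a protein's distribution across
# gradient fractions follow that of an organelle marker?
# Pearson correlation is an assumed similarity measure; profiles are invented.

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    marker      = [0, 2, 10, 4, 1, 0, 0, 0]    # peaks in fraction 3
    candidate   = [0, 1, 5, 2, 0.5, 0, 0, 0]   # same shape, lower abundance
    contaminant = [0, 0, 3, 6, 10, 8, 3, 1]    # present in fraction 3, peaks later
    print("candidate vs marker:  ", round(pearson(marker, candidate), 3))
    print("contaminant vs marker:", round(pearson(marker, contaminant), 3))
```

The hypothetical contaminant has appreciable signal in fraction 3, so examining that fraction alone would not exclude it; only the full profile separates it from the marker.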
Initial applications of PCP [14,15] have simply digested the proteins from each gradient fraction in solution and then used LC-MS/MS to analyze each fraction as a whole. Alternatively, each gradient fraction could be further fractionated to achieve a greater depth of proteome coverage but only certain types of fractionation are appropriate (see further for more discussion). Another important consideration is the profile of the chromatographic gradient used for the LC-MS/MS. For the ‘correlation’ aspect of PCP discussed next it is crucial to have highly reproducible chromatography so we find it best to use a simple one-step linear gradient from low % organic to high % organic [15], which is what most groups are using for
reversed-phase separations in LC-MS/MS. Another consideration is the data-dependent acquisition [16] method used for the MS/MS. It is generally better to acquire as many MS2 spectra as possible, in order to identify as many peptides as possible, but the MS1 spectra must also be frequent enough to ensure adequate sampling of the elution profile for each peptide as it comes off the analytical column; approximately six to eight data points within the time frame of the average LC peak width is appropriate.

Step (3). Correlation of LC gradients to find intensities for all peptides in all fractions. This is the most crucial step of the whole PCP procedure, but it is only required because of a technical limitation in current mass spectrometers. The aim of PCP is to be able to track the intensities of all peptides across all fractions from the density gradient, so if one were able to positively identify all peptides present during each and every LC-MS/MS analysis then this step would be trivial. However, even the fastest mass spectrometers available today can only identify one or two thousand peptides in a 1-h analysis, yet an organellar sample might easily contain tens or hundreds of thousands of peptides, so clearly the vast majority of peptides in a complex sample go unidentified. To complicate matters further, it is commonly observed that the sets of peptides/proteins identified by MS/MS in two sequential analyses of the same sample do not precisely overlap. Indeed, for complex samples the intersection of the two can be as low as 30% or 40%. The implication of this for PCP is that if one wants to measure the intensities of all identified peptides across all density gradient fractions then one cannot rely on MS/MS identification alone. Table 1 demonstrates this with some hypothetical data; some peptides are identified in all three experiments but most are not. 
The way that this limitation is overcome in PCP, and the origin of ‘correlation’ in the acronym, is to correlate the elution times of peptides common to all analyses in order to be able to accurately fill in the kinds of gaps depicted in Table 1. In reality, when analyzing a large number of density gradient fractions the number of peptides common to all fractions will probably be too small to accurately align the LC elution times, so in practice what is done is to make pair-wise comparisons (e.g., in Table 1 this would mean correlating LC-MS1 with LC-MS2 and LC-MS1 with LC-MS3). Once elution times for peptides can be predicted for LC-MS/MS experiments in which they were not fragmented then one can look for the peptide ion at the predicted elution time since its mass-to-charge ratio (m/z) is known (see below for a discussion of useful mass spectrometers for PCP and the importance of mass accuracy). Once all known peptide ions can be localized in all density gradient fractions then the ion intensities can be measured and used for Step 4.

Table 1 Incomplete overlap of peptides identified in repeated LC-MS/MS analyses

Peptide      Elution timeᵃ (s)
             LC-MS1    LC-MS2    LC-MS3
ACDCR        3,454     –         3,445
TRAFFICR     –         2,468     2,486
DARKER       4,444     4,434     4,454
LIGHTER      4,321     4,312     4,318
SLACKER      2,678     2,687     –

ᵃ Hypothetical peptide sequences and the times that they were subjected to tandem mass spectrometry in three independent LC-MS analyses of the same sample. Dashes (–) represent the lack of any MS/MS information supporting the identification of that peptide in the particular LC-MS experiment.

Step (4). Calculation of density gradient profiles for all proteins. Once peptide ion intensities across all density gradient fractions are known then the profiles for the originating proteins can be calculated. The simplest solution is to average the profiles of all peptides derived from a protein to arrive at the profile for that protein. However, there are several factors to consider here:
(1) The absolute magnitude of all the peptides from a single protein will not be equal. The amino acid composition of a peptide will dictate how well it is detected by the mass spectrometer (i.e., the magnitude of the ion intensity) so even assuming that all peptides from a protein are present at equal concentrations, the magnitude of their profiles can vary widely. This makes it essential to normalize the profiles for each peptide before averaging them for a protein.
(2) Not all peptides from a given protein give equally reliable profiles. This consideration is related to the one discussed above and it boils down to the detector in the particular mass spectrometer. For all detectors a profile based on ion intensities with a low signal-to-noise (s/n) ratio will be less reliable than one with a high s/n, but for some detectors a very high s/n is not necessarily good either. Time-of-flight detectors, for instance, can easily saturate, making a peptide profile flatter than it should be and thus useless for PCP. While they suffer other consequences of having too many ions, Fourier transform (FT) detectors are generally not prone to signal saturation, especially those where the number of ions entering the FT cell is tightly regulated as in the LTQ-FT and LTQ-Orbitrap [17,18] from ThermoFisher Scientific. An investigator should be aware of the characteristics of his/her mass spectrometer in this regard so that unreliable data can be left out of the analysis. For a recent PCP project employing an LTQ-FT, we calculated average protein profiles from the five peptide profiles with the highest magnitudes [15]. The profiles themselves were normalized before averaging but the absolute magnitude of each was stored and all peptides for a given protein were ranked by their absolute magnitude before choosing only the top-five normalized profiles (Figure 2).
(3) Peptides of the same sequence can be found in multiple proteins. Many proteins share one or more tryptic peptides [19], especially splice variants, so one must consider how to deal with the quantitative data from shared peptides: Should the values for shared peptides be averaged into each of the proteins they are found in? Should they be used only for the protein with the most peptides identified? Should they only be used in the absence of other quantitative data? Should they be disregarded entirely? This dilemma is not unique to PCP but nonetheless there is no universally agreed upon policy for handling it. Optimally one would only use proteotypic peptides [20,21] to quantify each protein but as there are no tools yet available to handle data in such a way the informatics behind this approach is prohibitive. Fortunately we find that there are not a great many instances where this problem occurs.

Figure 2 Higher magnitude peptide profiles are usually more reliable. (A) Density gradient profiles for four peptides of different absolute magnitudes from malate dehydrogenase in a mitochondrial PCP experiment. (B) Once normalized the three profiles of highest magnitude essentially overlap while the lowest magnitude profile differs considerably.

Step (5). Comparison of profiles of unlocalized proteins to known markers. The final step in PCP is to compare the profiles of all proteins to the profile(s) of one or more known markers. If the profile of a protein whose localization is unknown matches closely enough to the profile of a known marker of a compartment then the unknown protein very likely resides in/on that compartment. A chi-squared (χ²) test has been the method of choice for this approach, although what constitutes a reasonable cut-off for localization assignments depends a lot on the data so we intentionally avoid giving a cut-off value here.
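Steps 4 and 5 can be sketched in a few lines. The code below is only an illustration under stated assumptions: profiles are normalized to their maximum, ‘absolute magnitude’ is taken to be the peak intensity of the raw profile, and the comparison uses a Pearson-style χ² sum with the marker profile as the expected value. None of these choices is specified in this section, and the function names and the `floor` guard are hypothetical.

```python
from typing import List, Sequence


def normalize(profile: Sequence[float]) -> List[float]:
    """Scale a profile so its largest value is 1 (assumed normalization)."""
    peak = max(profile)
    if peak <= 0:
        raise ValueError("profile has no positive intensity")
    return [value / peak for value in profile]


def protein_profile(peptide_profiles: Sequence[Sequence[float]],
                    top_n: int = 5) -> List[float]:
    """Average the normalized profiles of the top_n highest-magnitude peptides."""
    # Rank on the raw (un-normalized) peak intensity, then keep the top_n.
    ranked = sorted(peptide_profiles, key=max, reverse=True)[:top_n]
    normalized = [normalize(p) for p in ranked]
    n_fractions = len(normalized[0])
    return [
        sum(p[i] for p in normalized) / len(normalized)
        for i in range(n_fractions)
    ]


def chi_squared(profile: Sequence[float], marker: Sequence[float],
                floor: float = 0.01) -> float:
    """Pearson-style chi-squared distance between a profile and a marker.

    `floor` (hypothetical) keeps near-zero marker values from dominating.
    """
    return sum((x - m) ** 2 / max(m, floor) for x, m in zip(profile, marker))


# Toy data: three peptides from one protein across five gradient fractions.
peptides = [
    [10.0, 40.0, 100.0, 40.0, 10.0],   # high magnitude
    [2.0, 8.0, 20.0, 8.0, 2.0],        # same shape, lower magnitude
    [0.5, 0.1, 0.3, 0.5, 0.2],         # lowest magnitude, noisy
]
profile = protein_profile(peptides, top_n=2)
marker_a = [0.1, 0.4, 1.0, 0.4, 0.1]   # hypothetical marker, same shape
marker_b = [1.0, 0.4, 0.1, 0.1, 0.1]   # hypothetical marker, different shape
print(chi_squared(profile, marker_a), chi_squared(profile, marker_b))
```

A small χ² value against one marker and a large value against the others would support assigning the protein to the first marker's compartment; as noted above, the cut-off itself has to be chosen from the data.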
2.1 Limitations
It goes without saying that the initial biochemistry used to isolate or enrich the organelle(s) of interest is crucial; if the enrichment is poor then the results of PCP are also likely to be poor. Beyond the biochemistry though, there are still some limitations and impediments to the widespread use of PCP:
(a) Computation and software. Our recent PCP study [15] required the equivalent of six months of a high-end desktop CPU running 24 h/day just to extract all the intensity data from the raw mass spectra. On top of this, virtually all of the quantitation had to be manually verified. The computation could be manually shared by multiple CPUs but the software platforms that can handle such data are not yet parallelizable. In any case, as computers get even faster this limitation will be minimized. The bigger stumbling block though is software. MSQuant (http://msquant.sourceforge.net/) is currently the most versatile suite of tools for quantifying MS data but it has difficulty defining chromatographic peaks precisely, especially at low s/n levels. The only viable solution we have found is to manually verify chromatographic peak assignments for all peptides in order to ensure accurate data. All other platforms we have tested have the same problem though, so despite its deficiencies MSQuant still out-performs all commercial packages and so we continue to use it. Improved analysis software is where the biggest advances are to be made in PCP and related techniques in the next few years.
(b) Mass spectrometer dynamic range. The best LC-MS systems available are only able to fragment ions at a rate of a few per second and to detect ions across a concentration range of about 10⁴ or 10⁵. This places some hard physical limits on the depth to which the proteome of a sample may be probed. Two solutions to this are (1) to fractionate the sample and (2) to build a better mass spectrometer. The latter option is beyond the capabilities of most laboratories so fractionation becomes key. Prior to LC-MS/MS, proteins are often fractionated by molecular weight using SDS-PAGE, so-called gel-enhanced LC-MS/MS (GeLC-MS/MS) [22], but this approach presents some additional challenges for PCP. If the gel lanes containing separated fractions from the density gradient are not cut at precisely the same molecular weights then a given protein could, for example, be in slice 7 in one lane and slice 8 in the next lane. Liquid-based fractionation methods [23–25] generally give more reproducible fractions so we no longer use GeLC-MS/MS for PCP.
3. OTHER QUANTITATIVE METHODS
PCP could be classified under a more general category of quantitative methods that make use of gradient information to improve the assignments of subcellular localization. Related to PCP, Location of Organelle Proteins by Isotope Tagging (LOPIT) was reported by two groups [26,27] at roughly the same time as PCP [14]
was first described. LOPIT makes use of stable isotope labeling (SIL) to compare the enrichment ratio for proteins between two or more fractions from a density gradient centrifugation (see [9] for more discussion of the differences between LOPIT and PCP). In PCP, quantitation is based on ion intensities measured in parallel LC-MS/MS experiments and its accuracy suffers slightly because a given peptide may not be in precisely the same ionization environment in one experiment as it is in the next. On the other hand, SIL [28,29] is the gold standard for MS-based quantitation because the analytes being compared are detected under the same ionization conditions. While SIL methods are more accurate, they are also more expensive and multiplexing (i.e., analyzing more than two conditions at the same time) can be problematic. SIL by amino acids in cell culture (SILAC) [30,31] can handle up to three conditions simultaneously and isobaric tags for relative and absolute quantitation (iTRAQ) [32] can now handle up to eight conditions at the same time, which could allow one to do a whole PCP experiment in a single analysis! SILAC can be, and has been, used on virtually every instrument in use in proteomics today but iTRAQ is limited to only a few instrument types. Another downside to such a high level of multiplexing is that the amount of fractionation required to probe as deeply into a proteome as with duplexed samples increases disproportionately.
Spectral counting [33] is a semiquantitative method that is gaining in popularity although, with few exceptions [34,35], we believe it is largely overinterpreted. The method relies on the ‘random’ sampling of peptide ions by a tandem mass spectrometer as a measure of protein abundance. More abundant peptides (from more abundant proteins) are more likely to be selected for MS/MS and so the number of spectra identifying a protein is related to its abundance.
This method works very nicely for proteins at the top end of the abundance scale [34,35] but usually does not provide enough sampling for accurate or precise quantitation of lower abundance proteins. Thus, while spectral counting is a good method for perhaps the most abundant one-quarter or one-half of all proteins identified in a proteomic study, it is often mistakenly applied across the entire dataset [36,37]. Spectral counting is often touted as less expensive than SIL, but to do it correctly the sampling needed for accurate quantitation across even a majority of proteins in a proteome would require at least five to ten times more mass spectrometer time than SIL, more than offsetting the consumable costs of SIL. Thus, spectral counting is an inappropriate quantitation method for PCP.
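The sampling argument can be made concrete with a simple counting model. Assuming, purely for illustration, that the number of spectra matched to a protein follows Poisson statistics (an assumption of this sketch, not a result stated in this chapter), the relative standard deviation of a count n is 1/√n, so proteins seen in only a handful of spectra carry a very large relative uncertainty. The function names are hypothetical.

```python
import math


def relative_sd(spectral_count: int) -> float:
    """Relative standard deviation of a Poisson-distributed count."""
    if spectral_count <= 0:
        raise ValueError("spectral count must be positive")
    return 1.0 / math.sqrt(spectral_count)


def spectra_needed(target_relative_sd: float) -> int:
    """Smallest count whose Poisson relative SD is at or below the target."""
    return math.ceil(1.0 / target_relative_sd ** 2)


for count in (1, 4, 25, 100):
    print(f"{count:>3} spectra -> relative SD {relative_sd(count):.0%}")
```

Under this model a protein identified by four spectra has a relative standard deviation of about 50%, and roughly 100 spectra are needed to bring it down to about 10%.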
4. SOFTWARE FOR PCP
There are several open-source and commercial software packages that can, in principle, be used for PCP. The important software feature required to extract the maximum amount of data from the mass spectra is the ability to locate ion signals in LC-MS/MS data where there is no MS/MS confirmation of the signal’s identity. This feature was specifically engineered into MSQuant for PCP and if a
QSTAR (Applied Biosystems/Sciex) or an LTQ-FT or an LTQ-Orbitrap (ThermoFisher Scientific) is available then MSQuant is likely to be the best option. We have also tested some of the commercial packages available for similar purposes and while they have some nice features the approach they use for correlation is too memory-intensive to handle high-resolution MS data efficiently.
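As a rough illustration of the correlation step that such software has to perform, the sketch below fits a straight line between the elution times of peptides identified in both of two runs and uses it to predict where an unfragmented peptide should elute in the second run. The linear model, the function names and the use of the hypothetical Table 1 values are all assumptions made for this example; the procedure actually implemented in MSQuant is not described here.

```python
from typing import Dict, List, Optional, Tuple

Run = Dict[str, Optional[float]]  # peptide -> elution time (s), or None


def fit_line(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two shared peptides")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_missing(run_a: Run, run_b: Run) -> Dict[str, float]:
    """Predict run_b elution times for peptides identified only in run_a."""
    # Peptides identified in both runs anchor the alignment.
    shared = [
        (t_a, run_b[pep])
        for pep, t_a in run_a.items()
        if t_a is not None and run_b.get(pep) is not None
    ]
    slope, intercept = fit_line(shared)
    return {
        pep: slope * t_a + intercept
        for pep, t_a in run_a.items()
        if t_a is not None and run_b.get(pep) is None
    }


# Hypothetical elution times taken from Table 1.
lc_ms1 = {"ACDCR": 3454, "TRAFFICR": None, "DARKER": 4444,
          "LIGHTER": 4321, "SLACKER": 2678}
lc_ms2 = {"ACDCR": None, "TRAFFICR": 2468, "DARKER": 4434,
          "LIGHTER": 4312, "SLACKER": 2687}

predicted = predict_missing(lc_ms1, lc_ms2)
print(predicted)  # predicted LC-MS2 elution time for ACDCR
```

The predicted time then defines where to look in the MS1 data of the second run for an ion of the known m/z; real gradients may need a more flexible model than a single straight line.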
5. HARDWARE REQUIREMENTS FOR PCP
PCP as described above relies on finding peptide ions in mass spectra where no MS/MS spectrum supports their identification directly but where an MS/MS spectrum did positively identify an ion of the same m/z at the same elution time in a parallel LC-MS/MS experiment. Thus, the accuracy of the m/z measurement of the precursor ion is absolutely essential to ensure reliable profiles. We recommend using an instrument capable of measuring precursor ion m/z with an accuracy of 10 parts-per-million (ppm) or better, such as quadrupole/time-of-flight (QTOF) or FT-based instruments. FT instruments are typically more accurate than QTOFs and they have a much higher resolving power but scan-to-scan variability in signal intensities leads to noisier data, even in instruments where the number of ions entering the FT cell is tightly controlled [17,18]. The reasons for this variability are unclear but it may be a result of ion statistics (too few ions sampled to get accurate measurements) or electronic effects. Where the ion count in the FT-ICR is not controlled, on-line LC/MS will not be at all useful for PCP and the only option will be static nanospray of the fractions. QTOF instruments do not display this noisiness but, on the flip side, FT instruments have a far superior linear dynamic range to that of QTOFs.
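The 10 ppm recommendation translates into a very narrow m/z window. The helper functions below (hypothetical names, illustrative numbers) convert a ppm tolerance into an absolute m/z window and check whether an observed ion falls within it.

```python
from typing import Tuple


def ppm_error(observed_mz: float, expected_mz: float) -> float:
    """Signed mass error of an observed m/z in parts per million."""
    return (observed_mz - expected_mz) / expected_mz * 1e6


def mz_window(expected_mz: float, tolerance_ppm: float = 10.0) -> Tuple[float, float]:
    """Lower and upper m/z bounds for a given ppm tolerance."""
    delta = expected_mz * tolerance_ppm * 1e-6
    return expected_mz - delta, expected_mz + delta


def matches(observed_mz: float, expected_mz: float,
            tolerance_ppm: float = 10.0) -> bool:
    """True if the observed m/z lies within the ppm tolerance."""
    return abs(ppm_error(observed_mz, expected_mz)) <= tolerance_ppm


low, high = mz_window(800.0)           # 10 ppm at m/z 800 is +/- 0.008
print(f"{low:.3f} to {high:.3f}")
print(matches(800.004, 800.0))         # about 5 ppm away
print(matches(800.012, 800.0))         # about 15 ppm away
```

At m/z 800 a 10 ppm tolerance corresponds to only ±0.008 m/z units, which is why precursor mass accuracy matters so much when no MS/MS spectrum is available to confirm the assignment.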
6. THE FUTURE FOR PCP AND ORGANELLE PROTEOMICS
Current software cannot take full advantage of data from current hardware so software is the current bottleneck, without question. As proteomics groups become more sophisticated though, the software gap will certainly be closed. Beyond software, MS hardware needs to be faster, to be more sensitive and to have a higher dynamic range, although the same could be said of virtually any application of MS. Accuracy used to be another parameter that people were always seeking to improve but current QTOF and FT instruments can operate near 1 ppm accuracy. While there is always room for improvement, better accuracy than that has limited practical use in proteomics. As for dynamic range, there are two possible solutions: increase the dynamic range of MS detectors or fractionate the analytes further. Diverse fractionation methods are already available, enough even to fractionate samples to homeopathic levels (i.e., diluted so much that there is less than one molecule in each fraction) if desired, but the amount of MS time required to analyze all the fractions from such a sample would make the experiment impossible. Therefore, MS systems need to become much faster and more sensitive to get the same amount of data out of a much shorter
chromatographic separation. If an LC-MS/MS analysis could be completed in one-tenth the time it currently takes then the sample could be fractionated ten times further, allowing its proteome to be probed perhaps two to three times deeper. In summary, we believe that PCP is a much stronger method for assigning localizations in proteomic studies than simply identifying a protein in a supposedly ‘pure’ fraction and in a few years it is likely that organelle proteomic studies will be required to use PCP or a similar method. Software and hardware are also still a few years away from routine application of PCP on a large scale [15], but PCP, LOPIT or a similar method can and should be applied to all proteomic studies of single organelles immediately.
ACKNOWLEDGEMENTS
This chapter is supported in part by a Canadian Institutes of Health Research operating grant (MOP-77688) to Leonard J. Foster. He is a Michael Smith Foundation Scholar, a Peter Wall Institute for Advanced Studies Early Career Scholar and the Canada Research Chair in Organelle Proteomics.
REFERENCES
1 J.R. Yates, 3rd, A. Gilchrist, K.E. Howell and J.J.M. Bergeron, Proteomics of organelles and large cellular structures, Nat. Rev. Mol. Cell Biol., 6 (2005) 702–714.
2 J.S. Andersen and M. Mann, Organellar proteomics: Turning inventories into insights, EMBO Rep., 7(9) (2006) 874–879.
3 S. Brunet, P. Thibault, E. Gagnon, P. Kearney, J.J. Bergeron and M. Desjardins, Organelle proteomics: Looking at less to see more, Trends Cell Biol., 13(12) (2003) 629–638.
4 G.H. Hogeboom, W.C. Schneider and G.E. Palade, Cytochemical studies of mammalian tissues. I. Isolation of intact mitochondria from rat liver; some biochemical properties of mitochondria and submicroscopic particulate material, J. Biol. Chem., 172 (1948) 619–635.
5 I. Fialka, P. Steinlein, H. Ahorn, G. Bock, P.D. Burbelo, M. Haberfellner, F. Lottspeich, K. Paiha, C. Pasquali and L.A. Huber, Identification of syntenin as a protein of the apical early endocytic compartment in Madin-Darby canine kidney cells, J. Biol. Chem., 274(37) (1999) 26233–26239.
6 A. Volchuk, R. Sargeant, S. Sumitani, Z. Liu, L. He and A. Klip, Cellubrevin is a resident protein of insulin-sensitive GLUT4 glucose transporter vesicles in 3T3-L1 adipocytes, J. Biol. Chem., 270(14) (1995) 8233–8240.
7 H.S. Li, D.B. Stolz and G. Romero, Characterization of endocytic vesicles using magnetic microbeads coated with signalling ligands, Traffic, 6(4) (2005) 324–334.
8 H. Zischka, G. Weber, P.J. Weber, A. Posch, R.J. Braun, D. Buhringer, U. Schneider, M. Nissum, T. Meitinger, M. Ueffing and C. Eckerskorn, Improved proteome analysis of Saccharomyces cerevisiae mitochondria by free-flow electrophoresis, Proteomics, 3(6) (2003) 906–916.
9 L.J. Foster, Mass spectrometry outgrows simple biochemistry: New approaches to organelle proteomics, Biophys. Rev. Lett., 1(2) (2006) 163–175.
10 The Universal Protein Resource (UniProt), Nucleic Acids Res., 35(Database issue) (2007) D193–D197.
11 The Gene Ontology (GO) project in 2006, Nucleic Acids Res., 34(Database issue) (2006) D322–D326.
12 C.C. Wu, M.J. MacCoss, G. Mardones, C. Finnigan, S. Mogelsvang, J.R. Yates, 3rd and K.E. Howell, Organellar proteomics reveals golgi arginine dimethylation, Mol. Biol. Cell, 15(6) (2004) 2907–2919.
13 R. Aebersold and M. Mann, Mass spectrometry-based proteomics, Nature, 422(6928) (2003) 198–207.
14 J.S. Andersen, C.J. Wilkinson, T. Mayor, P. Mortensen, E.A. Nigg and M. Mann, Proteomic characterization of the human centrosome by protein correlation profiling, Nature, 426(6966) (2003) 570–574.
15 L.J. Foster, C.L. de Hoog, Y. Zhang, Y. Zhang, X. Xie, V.K. Mootha and M. Mann, A mammalian organelle map by protein correlation profiling, Cell, 125(1) (2006) 187–199.
16 H. Steen and M. Mann, The ABC’s (and XYZ’s) of peptide sequencing, Nat. Rev. Mol. Cell Biol., 5(9) (2004) 699–711.
17 J.V. Olsen, L.M. de Godoy, G. Li, B. Macek, P. Mortensen, R. Pesch, A. Makarov, O. Lange, S. Horning and M. Mann, Parts per million mass accuracy on an orbitrap mass spectrometer via lock-mass injection into a C-trap, Mol. Cell Proteomics, 4(12) (2005) 2010–2021.
18 J.E. Syka, J.A. Marto, D.L. Bai, S. Horning, M.W. Senko, J.C. Schwartz, B. Ueberheide, B. Garcia, S. Busby, T. Muratore, J. Shabanowitz and D.F. Hunt, Novel linear quadrupole ion trap/FT mass spectrometer: Performance characterization and use in the comparative analysis of histone H3 post-translational modifications, J. Proteome Res., 3(3) (2004) 621–626.
19 J. Rappsilber and M. Mann, What does it mean to identify a protein in proteomics? Trends Biochem. Sci., 27(2) (2002) 74–78.
20 B. Kuster, M. Schirle, P. Mallick and R. Aebersold, Scoring proteomes with proteotypic peptide probes, Nat. Rev. Mol. Cell Biol., 6(7) (2005) 577–583.
21 P. Mallick, M. Schirle, S.S. Chen, M.R. Flory, H. Lee, D. Martin, J. Ranish, B. Raught, R. Schmitt, T. Werner, B. Kuster and R. Aebersold, Computational prediction of proteotypic peptides for quantitative proteomics, Nat. Biotechnol., 25(1) (2007) 125–131.
22 M. Schirle, M.A. Heurtier and B. Kuster, Profiling core proteomes of human cell lines by one-dimensional PAGE and liquid chromatography-tandem mass spectrometry, Mol. Cell Proteomics, 2(12) (2003) 1297–1305.
23 M.P. Washburn, D. Wolters and J.R. Yates, 3rd, Large-scale analysis of the yeast proteome by multidimensional protein identification technology, Nat. Biotechnol., 19(3) (2001) 242–247.
24 P. Horth, C.A. Miller, T. Preckel and C. Wenz, Efficient fractionation and improved protein identification by peptide OFFGEL electrophoresis, Mol. Cell Proteomics, 5(10) (2006) 1968–1974.
25 Y. Ishihama, J. Rappsilber and M. Mann, Modular stop and go extraction tips with stacked disks for parallel and multidimensional peptide fractionation in proteomics, J. Proteome Res., 5(4) (2006) 988–994.
26 T.P. Dunkley, R. Watson, J.L. Griffin, P. Dupree and K.S. Lilley, Localization of organelle proteins by isotope tagging (LOPIT), Mol. Cell Proteomics (2004).
27 M. Marelli, J.J. Smith, S. Jung, E. Yi, A.I. Nesvizhskii, R.H. Christmas, R.A. Saleem, Y.Y. Tam, A. Fagarasanu, D.R. Goodlett, R. Aebersold, R.A. Rachubinski and J.D. Aitchison, Quantitative mass spectrometry reveals a role for the GTPase Rho1p in actin organization on the peroxisome membrane, J. Cell Biol., 167(6) (2004) 1099–1112.
28 S.E. Ong, L.J. Foster and M. Mann, Mass spectrometric-based approaches in quantitative proteomics, Methods, 29(2) (2003) 124–130.
29 S.E. Ong and M. Mann, Mass spectrometry-based proteomics turns quantitative, Nat. Chem. Biol., 1(5) (2005) 252–262.
30 S.E. Ong, B. Blagoev, I. Kratchmarova, D.B. Kristensen, H. Steen, A. Pandey and M. Mann, Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics, Mol. Cell Proteomics, 1(5) (2002) 376–386.
31 S.E. Ong, I. Kratchmarova and M. Mann, Properties of 13C-substituted arginine in stable isotope labeling by amino acids in cell culture (SILAC), J. Proteome Res., 2(2) (2003) 173–181.
32 P.L. Ross, Y.N. Huang, J.N. Marchese, B. Williamson, K. Parker, S. Hattan, N. Khainovski, S. Pillai, S. Dey, S. Daniels, S. Purkayastha, P. Juhasz, S. Martin, M. Bartlet-Jones, F. He, A. Jacobson and D.J. Pappin, Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents, Mol. Cell Proteomics, 3(12) (2004) 1154–1169.
33 H. Liu, R.G. Sadygov and J.R. Yates, 3rd, A model for random sampling and estimation of relative protein abundance in shotgun proteomics, Anal. Chem., 76(14) (2004) 4193–4201.
34 P. Lu, C. Vogel, R. Wang, X. Yao and E.M. Marcotte, Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation, Nat. Biotechnol., 25(1) (2007) 117–124.
35 F. Blondeau, B. Ritter, P.D. Allaire, S. Wasiak, M. Girard, N.K. Hussain, A. Angers, V. Legendre-Guillemin, L. Roy, D. Boismenu, R.E. Kearney, A.W. Bell, J.J. Bergeron and P.S. McPherson, Tandem MS analysis of brain clathrin-coated vesicles reveals their critical involvement in synaptic vesicle recycling, Proc. Natl. Acad. Sci. USA, 101(11) (2004) 3833–3838.
36 A. Gilchrist, C.E. Au, J. Hiding, A.W. Bell, J. Fernandez-Rodriguez, S. Lesimple, H. Nagaya, L. Roy, S.J. Gosline, M. Hallett, J. Paiement, R.E. Kearney, T. Nilsson and J.J. Bergeron, Quantitative proteomics analysis of the secretory pathway, Cell, 127(6) (2006) 1265–1281.
37 Q. Liu, G. Tan, N. Levenkova, T. Li, E.N. Pugh, Jr, J. Rux, D.W. Speicher and E.A. Pierce, The proteome of the mouse photoreceptor sensory cilium complex, Mol. Cell Proteomics (2007).
CHAPTER 20
Metabolic Labeling Approaches for the Relative Quantification of Proteins
Edward L. Huttlin, Adrian D. Hegeman and Michael R. Sussman
Contents
1. Introduction
2. Selected Metabolic Labeling Strategies
2.1 Full metabolic labeling
2.2 Stable Isotopic Labeling by Amino Acids in Cell Culture (SILAC)
2.3 Partial metabolic labeling
3. Practical Experimental Considerations
3.1 Selection of isotopic label
3.2 Introduction of label into organism
3.3 The importance of steady state labeling
3.4 Computational approaches for automated data analysis
3.5 Reciprocal experimental design
4. Comparison of Full versus Partial Labeling
4.1 Spectral complexity
4.2 Effects on peptide identification via tandem mass spectrometry
4.3 Numbers of peptide identifications
4.4 Quantification
4.5 Dynamic range
4.6 Accuracy
4.7 Summary
4.8 Analogies with other quantitative proteomic techniques
5. Future Directions
5.1 Expansion to new organisms and biological systems
5.2 Extension of techniques and concepts to protein turnover analysis
5.3 Further developments of MS instrumentation
5.4 Conclusion
References
Comprehensive Analytical Chemistry, Volume 52 ISSN: 0166-526X, DOI 10.1016/S0166-526X(08)00220-1
© 2009 Elsevier B.V. All rights reserved.
1. INTRODUCTION
In recent years rapid developments in instrumentation and informatics have enabled the routine identification of thousands of peptides and proteins in complex biological samples using mass spectrometry (MS). When combined with advances in sample preparation and fractionation, these techniques enable the high-throughput study of proteins isolated from organisms living under a wide variety of biological conditions. This technology has subsequently been applied by a number of researchers who have performed broad survey experiments to catalog those proteins that are expressed in a wide variety of experimental systems under diverse conditions, providing insight into a variety of fields ranging from cancer biology [1] to insect physiology [2].
While surveying the identities of proteins identified in biological samples may in some cases provide useful biological information, this level of analysis on its own provides only the most cursory characterization of the proteome. Ideally, one must be able not only to identify the proteins present, but also to compare the relative abundances of each species under diverse conditions in a quantitative way. Recently, a variety of strategies for quantitative proteomic characterization have been developed that allow the comparison of abundances for specific proteins across multiple biological samples [3,4]. These strategies allow detection of more subtle proteomic changes and thus better characterization of the biological system in question.
When a peptide is detected by a mass spectrometer, its signal intensity is determined not only by its abundance but also by its own ionization properties and the ionization properties of any other chemical species it may accompany.
Competition for ionization potential during both electrospray ionization (ESI) and matrix-assisted laser desorption ionization (MALDI), broadly termed “matrix effects”, can produce dramatic changes in signal intensity for species in complex analyte mixtures. In addition, the signal observed for each peptide is prone to variability introduced by the extensive sample preparation that is generally required prior to mass spectrometric analysis. Since many of these effects vary from sample to sample, comparing intensities of even the same peptide to determine its relative abundance among multiple mass spectrometric analyses often leads to unacceptable uncertainty for quantitative work.
Many experimental approaches for quantitative proteomics address these sources of variability with stable isotopic labeling. In a typical isotopic labeling experiment, the goal is to compare protein abundances between two samples. While many different isotopic labels may be selected and introduced in many different ways, the general strategy is the same. At some point during growth or sample preparation, proteins or peptides from each sample are labeled with either “light” or “heavy” forms of a chemical tag. These tags are chemically identical, except that the “heavy” form is made with one or more stable (non-radioactive) heavy isotopes (¹³C, ¹⁸O, ¹⁵N, or ²H), whereas the “light” form contains all elements in their natural abundance states (predominantly ¹²C, ¹⁶O, ¹⁴N, and ¹H). Once they are isotopically labeled, both samples are combined to provide internal control for all subsequent steps in sample preparation and analysis. As normally
implemented, any isotope effects associated with these labeling techniques are minor and are generally undetectable via proteomic techniques. Because they display essentially identical chemical properties, “light” and “heavy” forms of each species behave identically during sample preparation and are analyzed simultaneously, experiencing exactly the same fractionation and ionization conditions. Yet “light” and “heavy” peptides produce distinct isotopic envelopes and can be distinguished in the mass spectrometer based on the mass difference introduced by the isotopic label. (See Figure 1 for an explanation of isotopic envelopes.) Ultimately, due to the excellent internal control, relative intensities of “light” and “heavy” forms of each peptide may be compared to determine its relative level of abundance in the original samples.
Though many approaches for stable isotopic labeling have been developed, they can generally be divided into two groups depending on the stage in sample preparation at which the isotopic label is introduced. In vitro isotopic labeling includes techniques that introduce an isotopic label either enzymatically or chemically onto partially isolated proteins or peptides during sample preparation and digestion (Figure 2). These approaches are the most versatile of quantitative proteomics techniques and can generally be applied to proteins or peptides from any biological system. However, because the isotopic label is introduced at a relatively late stage in sample preparation, these approaches are prone to the effects of sample handling prior to sample combination. Isotope-Coded Affinity Tag (ICAT) [6] and isobaric tag for relative and absolute quantitation (iTRAQ) [7] are popular commercialized approaches for chemical labeling that respectively introduce isotopic labels on sulfhydryl groups (cysteine) and primary amines (lysine and N-termini).
An isotopic label can also be introduced enzymatically via tryptic digestion of samples in either H₂¹⁶O or H₂¹⁸O [8,9].
Another way to introduce an isotopic label into a sample is through direct metabolic incorporation of isotopically labeled nutrients from media or food, often called in vivo or metabolic labeling (see Figure 2). When an isotopic label is provided through diet or as part of culture media, it will be naturally incorporated into the organism or biological system of interest during normal growth and development. Although relatively large isotope effects associated with ²H labeling generally limit its usefulness for this approach, organisms ranging from individual cells such as yeast and Escherichia coli to multicellular organisms such as rats may be grown entirely on ¹⁵N or ¹³C, with no obvious deleterious effects. Samples from these labeled organisms may be combined with samples from organisms grown on natural abundance media immediately upon sacrifice or harvest, allowing internal control for all steps in sample preparation. This includes potentially highly variable steps such as tissue homogenization and protein extraction that are generally not controlled when other labeling methods are used.
As a quantitative proteomic technique, metabolic labeling offers some unique advantages that make it especially well suited for analysis of selected biological systems [10]. Its primary strength is the unparalleled internal control it provides throughout sample preparation and analysis, even at the protein level. This provides an opportunity for extensive fractionation of complex protein samples prior to tryptic digestion, including immunohistochemical approaches
Edward L. Huttlin et al.
Figure 1 Peptide isotopic envelopes. Note: Peptides are predominantly made from carbon, nitrogen, oxygen, hydrogen, and sulfur. Because each of these elements exists as multiple isotopes with distinct masses, each peptide exists not as a single chemical species, but rather as a collection of species with identical chemical formulae that differ in their isotopic composition and thus in their mass. On mass spectrometers with sufficient resolution these variants of each peptide with different isotopic composition (isotopomers) produce separate signals reflecting their differences in mass. The collection of signals produced by all isotopomers of a particular peptide (or other chemical species) is called an isotopic envelope. The relative intensities of each peak in the isotopic envelope reflect the abundances of isotopomers at each mass. Plotted above is the isotopic envelope that is observed for the specified peptide sequence. Around five or six separate peaks are clearly visible in this isotopic envelope, and each peak is separated by a single Dalton mass difference. Since this peptide is doubly charged, this corresponds to a 0.5 m/z difference from peak to peak. The peak from the isotopomer that contains only the most common isotopic form of each element (12C, 14N, 1H, 16O, and 32S) is called the monoisotopic peak. Immediately to its right is the [M+1] isotopic peak due to isotopomers containing a single heavy atom (13C, 15N, or 2H). Similarly, the next peak to the right [M+2] is due to isotopomers containing one or more heavy isotopes that increase the mass by two (one 18O, or two 13C, 15N, or 2H). This pattern continues for all peaks in the isotopic envelope. The relative abundances of isotopomers at each mass are determined by the total numbers of each element in a particular peptide, as well as the abundances of each isotope. If the chemical formula and isotopic abundances are known, then the resulting shape of the isotopic envelope can be calculated from the binomial distribution [5].
or continuous separations such as SDS-PAGE, with complete internal control. Additionally, many model systems such as yeast, bacteria, plants, and cultured cells can be easily and economically grown in the presence of a variety of isotopic labels, despite some additional cost for labeling reagents. Several strategies exist for metabolic labeling, depending on the experimental system of interest. We will introduce each of these and discuss some practical considerations for the design and execution of metabolic labeling experiments. Then we will compare the performance of two such approaches for
Metabolic Labeling Approaches
Figure 2 Comparison of isotopic labeling techniques. Note: Two general strategies are shown for introducing stable isotopic labels for MS-based relative quantification. For in vitro labeling, protein is extracted separately from two biological samples (in this case plants A and B) and the heavy and light isotopic labels are introduced during a subsequent chemical or enzymatic step. Metabolic labeling, in contrast, relies on the biological system’s capacity to incorporate isotopically labeled nutrients during growth as a means of universally labeling target biomolecules. In vitro approaches are generally more broadly applicable and have fairly predictable spacing between heavy and light peaks, but suffer from variability introduced during processing prior to mixing. Metabolic labeling provides internal control for all extraction and processing steps, but is limited to biological systems where isotopically labeled nutrients can be incorporated to equilibrium and heavy and light peak spacing depends on the chemical formula.
characterization of complex proteomic samples. Finally, we will highlight analogies to other protein quantification strategies and consider future directions and applications of metabolic labeling for quantitative proteomics.
2. SELECTED METABOLIC LABELING STRATEGIES
There are three primary experimental approaches for using metabolic labeling to compare steady-state levels of protein abundance on a proteomic scale: Full Metabolic Labeling, Stable Isotopic Labeling by Amino Acids in Cell Culture (SILAC), and Partial Metabolic Labeling. Though the means and extent of isotopic labeling vary among them, all three techniques are based upon the fundamental idea that isotopically labeled and natural abundance forms of each peptide may be compared in a mixed sample to determine the relative
abundance of each peptide in the original samples. Because these techniques differ in the nature of isotopic labeling required, each is uniquely suited for application in particular experimental systems, and each presents unique challenges for data analysis and interpretation. Each approach is explained in detail below, highlighting its unique qualities.
2.1 Full metabolic labeling
In full metabolic labeling, the organism of interest is grown with access to food or media containing only labeled nutrients. The isotopic label is then incorporated into all proteins and other biological molecules through normal growth. If allowed to grow for a sufficient period of time, essentially complete substitution of 15N or 13C for their natural abundance counterparts may be achieved. When a completely labeled protein mixture is combined with a matched natural abundance sample and analyzed via MS, labeled and unlabeled forms of each peptide produce completely distinct isotopic envelopes, separated by 1 Da for each labeled atom in the peptide. Because the spacing between labeled and unlabeled isotopic envelopes depends on the number of labeled atoms in each peptide, it will vary depending on peptide sequence (Figure 3). Once the sequence of a peptide is known, one can find its matching envelope in the mass spectrum based on its elemental composition and can compare intensities of labeled and unlabeled peptide forms.
Full metabolic labeling has been applied to study many different organisms, from yeast [11] to multicellular eukaryotes including Caenorhabditis elegans and Drosophila melanogaster [12] as well as rats [13] and even intact potato [14] and Arabidopsis thaliana plants [15]. This approach is particularly useful for the study of model organisms such as bacteria and yeast that can be grown on defined media containing simple nitrogen or carbon sources.
2.2 Stable Isotopic Labeling by Amino Acids in Cell Culture (SILAC)
In SILAC, organisms are grown on food or media containing a specific amino acid that has been labeled with 15N and/or 13C [16]. It has also been applied to single-celled organisms such as yeast [17,18]. This approach is analogous to full metabolic labeling, except rather than isotopically labeling a particular atom throughout the organism, a selected amino acid is ubiquitously replaced with its isotopically labeled form. When labeled and unlabeled samples are combined, the heavy and light envelopes are separated by a fixed mass difference for each isotopically labeled amino acid in a given peptide sequence (see Figure 3). This fixed mass difference varies depending on how many heavy isotopes can be introduced in the amino acid that is chosen for substitution. Selection of the amino acid to be isotopically labeled depends on many factors. As its name implies, this technique is particularly useful for labeling cells grown in culture, although it may be applied in other experimental systems as well [19].
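The two spacing rules described in Sections 2.1 and 2.2 can be made concrete with a short calculation. The following is a minimal sketch, not a published tool: the function names are hypothetical, the peptide is assumed to be unmodified, and the SILAC case assumes arginine carrying six 13C atoms, which is one label consistent with the 6 Da shift quoted for Figure 3.

```python
# Side-chain nitrogen counts; every residue also contributes one backbone nitrogen.
SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}

# Mass differences between heavy and light isotopes, in Da.
DELTA_15N = 15.0001089 - 14.0030740
DELTA_13C = 13.0033548 - 12.0000000


def nitrogen_count(peptide):
    """Number of nitrogen atoms in an unmodified peptide (one-letter codes)."""
    return sum(1 + SIDE_CHAIN_N.get(residue, 0) for residue in peptide)


def full_15n_shift(peptide):
    """Mass shift in Da when every nitrogen is replaced by 15N."""
    return nitrogen_count(peptide) * DELTA_15N


def silac_shift(peptide, labeled_residue="R", heavy_atoms=6, delta=DELTA_13C):
    """Mass shift in Da for SILAC; the 13C6-arginine defaults are an assumption."""
    return peptide.count(labeled_residue) * heavy_atoms * delta


peptide = "VQLAETYLSQAALGDANADAIGR"  # peptide simulated in Figure 3
print(nitrogen_count(peptide), "nitrogen atoms")
print(f"full 15N shift: {full_15n_shift(peptide):.3f} Da")
print(f"SILAC shift:    {silac_shift(peptide):.3f} Da")
```

For this peptide the full 15N shift scales with its 29 nitrogens (about 28.9 Da), whereas the SILAC shift depends only on its single arginine (about 6.0 Da), which is why the first varies with sequence and the second with the number of labeled residues.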
Figure 3 Comparison of expected isotopic labeling patterns for multiple metabolic labeling strategies. Note: Because the label used is different, each of the primary metabolic labeling strategies produces distinctive patterns of labeled and unlabeled isotopic distributions that are used to determine the relative abundance of each peptide. Diagrammed above are simulated isotopic envelopes for the peptide VQLAETYLSQAALGDANADAIGR. Envelopes are displayed for full and partial metabolic labeling as well as for SILAC at multiple ratios of heavy and light peptide abundance. Note that both full metabolic labeling and SILAC result in two separate envelopes, whereas partial metabolic labeling results in a single composite envelope. For full metabolic labeling at 98% 15N, heavy and light envelopes are separated by 1 Da for each nitrogen in the peptide, whereas in SILAC the heavy and light envelopes are separated by a fixed mass for every labeled amino acid residue in the peptide. In this case the peptide contains a single Arg residue, resulting in a shift of 6 Da. In the partial metabolic labeling case, note how the shape of the composite isotopic envelope changes as the mixing ratio changes.
2.3 Partial metabolic labeling
Though they provide significant advantages for internal control of sample preparation and analysis, traditional full metabolic labeling and SILAC may
be difficult or expensive to apply in some experimental systems. For example, although intact plants such as Arabidopsis may be easily and economically labeled to high 15N incorporation, such labeling may require somewhat unusual growth conditions [15]. For some organisms, the growth conditions required for metabolic labeling may limit its application for investigation of some biological questions. Furthermore, labeling of mammals with 15N or 13C may quickly become expensive, requiring relatively large amounts of food and long growth times possibly spanning multiple generations to achieve a high and uniform incorporation of the labeled isotope [13,20]. If quantitative proteomic information could be obtained from samples using lower levels of isotopic incorporation, metabolic labeling could be applied more economically to a greater number of organisms under a wider variety of experimental conditions. An alternative strategy called partial metabolic labeling has been recently demonstrated to be an effective tool for quantitative proteomics. This approach is based upon a key observation made by Whitelegge and co-workers: when an unlabeled peptide mixture is combined with a second sample that is partially labeled to a specific low incorporation of 15N or 13C, the isotopic envelopes for the two forms of each peptide overlap, forming a composite distribution whose shape reflects the relative contributions of both labeled and unlabeled forms of each peptide [21]. We have subsequently demonstrated that when the incorporation of the partially labeled sample is known, the relative abundances of labeled and unlabeled forms of each peptide may be determined with high accuracy based on the shape of the composite isotopic envelope. Furthermore, this partial metabolic labeling strategy may be effectively applied for quantitative proteomic analysis of complex protein mixtures [22]. 
As a quantitative proteomic technique, partial metabolic labeling differs from both full metabolic labeling and SILAC in a fundamental way. In both full metabolic labeling and SILAC, labeled and unlabeled forms of the same peptide produce two distinct isotopic envelopes that are separated by several mass units. For these two approaches, the relative abundance of a particular peptide is determined by comparing the signal intensities of these distinct isotopic distributions. In contrast, when partial metabolic labeling is used, the labeled and unlabeled distributions coalesce. Differences in the shape of a single composite isotopic envelope are used to determine the relative abundances of labeled and unlabeled forms for each peptide (see Figure 3). Because it involves deconvolution of overlapping isotopic envelopes from related species, partial metabolic labeling bears some conceptual similarity to Mass Isotopomer Distribution Analysis (MIDA), a technique developed by Marc Hellerstein and Richard Neese and used to study processes of biopolymerization [23,24]. Though it differs from other metabolic labeling techniques in this key respect, partial metabolic labeling still provides the primary benefit of metabolic labeling: unparalleled internal control for all steps in sample preparation and analysis. Because a lower level of isotopic enrichment is required, partial metabolic labeling should be compatible with a greater variety of model organisms that are raised under a wider range of experimental conditions.
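The idea of reading relative abundance from the shape of a composite envelope can be illustrated with a simplified model. The sketch below is an assumption-laden illustration and not the algorithm of Huttlin et al. [22]: it models only carbon and nitrogen with binomial distributions, ignores hydrogen, oxygen, and sulfur, uses approximate natural abundances, and assumes that the incorporation of the labeled sample is known exactly. All function names are hypothetical; the atom counts (56 carbons, 14 nitrogens) are those quoted for the peptide in Figure 4, and 6% 15N is used as an illustrative low incorporation.

```python
from math import comb

# Approximate natural abundances of the heavy isotopes (assumed constants).
NATURAL_13C = 0.0107
NATURAL_15N = 0.00364


def binomial_envelope(n_atoms, p_heavy):
    """Probability of finding k heavy atoms among n_atoms, for k = 0..n_atoms."""
    return [comb(n_atoms, k) * p_heavy**k * (1 - p_heavy) ** (n_atoms - k)
            for k in range(n_atoms + 1)]


def convolve(a, b):
    """Combine the envelopes of two elements into one envelope."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def peptide_envelope(n_carbon, n_nitrogen, p_15n):
    """Envelope from C and N only; H, O, and S are ignored in this sketch."""
    return convolve(binomial_envelope(n_carbon, NATURAL_13C),
                    binomial_envelope(n_nitrogen, p_15n))


def fit_labeled_fraction(observed, light, heavy):
    """Least-squares f such that observed is close to (1 - f)*light + f*heavy."""
    total = sum(observed)
    obs = [x / total for x in observed]
    diff = [h - l for h, l in zip(heavy, light)]
    numerator = sum((o - l) * d for o, l, d in zip(obs, light, diff))
    denominator = sum(d * d for d in diff)
    return min(1.0, max(0.0, numerator / denominator))


light = peptide_envelope(56, 14, NATURAL_15N)   # natural abundance form
heavy = peptide_envelope(56, 14, 0.06)          # partially labeled form
composite = [0.75 * l + 0.25 * h for l, h in zip(light, heavy)]
fraction = fit_labeled_fraction(composite, light, heavy)
print(f"estimated labeled fraction: {fraction:.3f}")
```

On this noise-free composite the fit recovers the labeled fraction used to build it. With real spectra the estimate would also depend on noise and on how accurately the incorporation of the labeled sample has been measured.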
3. PRACTICAL EXPERIMENTAL CONSIDERATIONS
Metabolic labeling is essentially quite straightforward: one organism is raised on natural abundance food or media while another organism is raised on isotopically enriched food or media. The samples are then combined and analyzed to compare protein abundances in the two organisms. In practice, however, a number of technical issues must be considered if metabolic labeling is to be applied as a quantitative technique on a proteomic scale. Beyond selecting a particular metabolic labeling technique, one must select an isotopic label and determine how it will be introduced into the model organism of interest. Furthermore, one must ensure that complete and stable incorporation is achieved. Once samples are generated and data is collected, an automated system for quantitative analysis is required. Finally, certain control experiments must be incorporated into the overall experimental design to account for potential differences between the labeled and unlabeled nutrient sources. Each of these issues is discussed in greater detail below, with practical advice and suggestions for the successful execution of metabolic labeling experiments.
3.1 Selection of isotopic label
For full metabolic labeling, either 15N or 13C may be used as an isotopic label. Isotope effects associated with either of these labels are minor and are generally undetectable via quantitative proteomic techniques. In practice, 15N is usually selected because it is often easier and more economical to introduce in a controlled way. This is especially true for plants, which can fix gaseous carbon dioxide and thus require atmospheric control for exhaustive 13C incorporation. Another key advantage for metabolic labeling with 15N is its increased tolerance for incomplete isotopic enrichment. Because peptides contain fewer nitrogens than carbons, their isotopic envelopes broaden more slowly as the level of isotopic enrichment decreases (Figure 4). At realistically achievable levels of incorporation, 15N metabolic labeling generally provides more clearly defined isotopic envelopes at a better signal-to-noise ratio. Although one could arguably use other isotopes to metabolically label proteins as well, these are less commonly chosen for a variety of reasons. For example, deuterium labeling is rarely used because substitution of deuterium for hydrogen results in significant, often physiologically disruptive, isotope effects. Not only does deuterium tend to interfere on a biological level, but it can alter the behavior of peptides during fractionation as well [25]. Furthermore, metabolic labeling with 18O is generally not used due to expense, and because many oxygens occupy exchangeable positions in peptides and could be scrambled with natural abundance water during sample preparation and analysis. Thus, metabolic labeling is usually performed using 15N as the isotopic label.
Similarly, either 15N or 13C may be used for partial metabolic labeling with good success, although 15N is likely more tolerant of small variations in incorporation, as discussed earlier.
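The greater tolerance of 15N for incomplete enrichment follows from simple counting and can be checked with a binomial model. The sketch below is a simplification under stated assumptions: it considers only the labeled element, ignores the natural isotopes of all other elements, and uses hypothetical function names. The atom counts (14 nitrogens, 56 carbons) are those given for the peptide in Figure 4.

```python
from math import comb, sqrt


def envelope_width(n_atoms, incorporation):
    """Standard deviation, in Da, of the number of heavy atoms per peptide."""
    return sqrt(n_atoms * incorporation * (1 - incorporation))


def tallest_peak_fraction(n_atoms, incorporation):
    """Share of the total signal carried by the most intense peak."""
    p = incorporation
    return max(comb(n_atoms, k) * p**k * (1 - p) ** (n_atoms - k)
               for k in range(n_atoms + 1))


for incorporation in (0.5, 0.9, 0.98):
    print(f"incorporation {incorporation:.0%}")
    for label, n_atoms in (("15N", 14), ("13C", 56)):
        width = envelope_width(n_atoms, incorporation)
        share = tallest_peak_fraction(n_atoms, incorporation)
        print(f"  {label}: width {width:.2f} Da, tallest peak {share:.1%} of signal")
```

In this model the width grows with the square root of the number of labeled atoms, so at any given incorporation the 13C envelope of this peptide is twice as wide as the 15N envelope, and a smaller share of the signal falls in its most intense peak.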
The specific incorporation level chosen for partial metabolic labeling is critical for the success of this technique. Ideally, the
Figure 4 Effects of 15N and 13C incorporation on isotopic envelope shape. Note: Two families of simulated isotopic envelopes are shown for the peptide GLDIETAGHY[pT]V (a C-terminal tryptic peptide of the A. thaliana P-type proton ATPase AHA3 with regulatory phosphorylation site). The simulations show how the isotopic envelopes are affected by differences in isotopic incorporation of 15N and 13C when each is varied from natural abundance to 100%. In both cases incomplete incorporation distributes the total signal across many peaks in a broad envelope with the undesirable analytical consequence of reducing the observable signal to noise. Only at very high or low incorporation does the majority of the signal collapse into a minimum number of peaks resulting in optimal signal to noise. Carbon atoms are significantly more numerous than nitrogen atoms in typical peptide elemental compositions resulting in significantly broader envelopes for carbon than nitrogen at identical incorporation percentages. In this example, with 14 nitrogens and 56 carbons, there are clear differences in the widths of the 13C- and 15N-labeled distributions. Using 15N provides an added analytical advantage as it is more tolerant of incomplete incorporation.
incorporation of the isotopic label should be sufficiently high so that the labeled and unlabeled isotopic envelopes have distinct shapes when analyzed individually. Yet the incorporation must be low enough that existing software for data-dependent MS/MS analysis and peptide identification will work properly. The optimal isotopic incorporation depends on both the isotopic label used (15N or 13C) and the average size of the peptides to be analyzed, which is protease-dependent. For a more detailed mathematical treatment of this subject, see Huttlin et al. [22]. An optimal 15N incorporation of ~6% for proteomic analysis of tryptic peptides is demonstrated therein.
The proper selection of a labeled amino acid is critical for the success of SILAC experiments. To provide interpretable experimental results, the isotopically labeled amino acid should completely replace its natural abundance counterpart in all proteins in the sample. Furthermore, the heavy isotopes should be confined to the original amino acid and should not be permitted to spread to other molecules. In practice, the amino acid to be labeled must be essential, and cannot be synthesized by the organism or cell line in question. Furthermore, the
labeled amino acid must not be significantly degraded through normal metabolic processes on the timescale of the experiment. Finally, relative quantification is only possible for those peptides that contain the labeled amino acid within their sequence. Thus, a common amino acid should be selected for labeling so that it will be found in as many peptides as possible. In many cases, both arginine and lysine are labeled for SILAC experiments [26,27]. Then every tryptic peptide should contain at least one labeled residue. At first glance, it may be surprising that arginine is used, as it is not an essential amino acid for adult vertebrates. However, recent experiments have shown arginine to be essential for many cells in culture [28] and its effectiveness for SILAC has been verified for selected cell lines [29]. Yet caution must be taken because in some cell types arginine may be metabolized to produce proline, diluting its isotopic label among multiple amino acids [10,29]. Thus, it is a good practice to verify the appropriateness of arginine or any other amino acid for one’s chosen experimental system prior to any SILAC experiment. For a detailed discussion of practical experimental aspects of SILAC, see Ong and Mann [30]. In most metabolic labeling experiments, a single isotopically labeled sample is combined with a single natural abundance sample for comparison. However, multiple labels may be used for both SILAC and full metabolic labeling to compare more than two biological samples in a single analysis. Snijders et al. combined 15N- and 13C-labeled samples with a natural abundance control to simultaneously compare three samples via full metabolic labeling [31]. Additionally, Blagoev et al. combined unlabeled samples and samples labeled with two different forms of isotopically labeled arginine to compare a total of three samples in a single SILAC analysis [32]. 
For some experiments in selected model organisms, these approaches for multiplexed analysis of several samples can be very useful.
Until now we have discussed metabolic labeling strategies in which a sample bearing an isotopic label is compared with a natural abundance sample to determine relative levels of abundance. One could instead compare the natural abundance sample with an isotopically depleted sample. Such an approach was employed by Smith and co-workers to compare protein abundance among natural abundance and 15N-, 13C-depleted samples during intact protein analysis [33]. This approach can be used for comparison of relatively large molecules (such as intact proteins) in which the natural abundance and depleted envelopes would be resolvable. Alternatively, isotope-depleted samples could be used in conjunction with smaller peptides at isotopic incorporations that generate overlapping envelopes, via an approach analogous to partial metabolic labeling. The isotopic incorporation of the non-depleted sample could be adjusted to levels somewhat higher than natural abundance if needed as in the case of partial labeling, to optimize performance of the technique.
3.2 Introduction of label into organism
The means by which an isotopic label is introduced varies considerably depending on the organism in question. Some organisms can be grown
on relatively simple labeled precursors, such as 13C-glucose or a variety of 15N-containing salts. These organisms include numerous bacteria such as Deinococcus radiodurans [34,35], yeast [11], plant cells [36–39], and even intact potato plants [14] and Arabidopsis seedlings [15,40]. Some of these organisms, such as yeast, can also be grown economically on specific labeled amino acids for SILAC experiments [17]. Because the costs of their simple labeled precursors are low, metabolic labeling of these particular organisms is especially economical. Other systems, including cultured mammalian cells, cannot be raised on simple precursors and must instead be grown in the presence of specific labeled amino acids. These are often ideal candidates for SILAC, which has been used for a variety of biological systems that have been reviewed by Mann [19]. Cell types include those derived from mammals [16] as well as plants [41]. Other researchers have used media formulated with 15N-labeled amino acid mixtures to achieve full metabolic labeling of a variety of mammalian cell lines, including mouse B16 melanoma cells [34] and cultured HeLa cells [42].
While many simple organisms can be labeled directly on isotopically labeled precursors, other more complex organisms must be labeled indirectly via feeding on other isotopically labeled organisms. This approach was first reported by Krijgsveld et al., who labeled C. elegans and D. melanogaster through feeding on 15N-labeled E. coli and yeast, respectively [12]. This approach has also been extended to mammals. Researchers from the Yates laboratory have reported 15N metabolic labeling of rats that were fed a diet derived from labeled Spirulina platensis, a prokaryotic blue-green alga [13,20]. Summarized in Table 1 are those organisms that have been studied using steady-state metabolic labeling for proteomic characterization.
Due to rapid developments in this field the list is likely not complete, but is meant to demonstrate the versatility of metabolic labeling strategies for quantitative proteomics. References are included to specific protocols for metabolic labeling of each organism.
In a metabolic labeling experiment, the organisms labeled with heavy and light isotopes should be as nearly identical as possible. However, actually raising some organisms entirely on a diet containing a heavy isotope may be technically difficult or prohibitively expensive. As a result, some researchers have begun to investigate alternative sources of labeled proteins to expand the feasibility of metabolic labeling as a quantitative technique. Although intact mammals such as mice would be fairly difficult and expensive to raise on a diet containing 15N, mouse cells in culture may be quite easily labeled with high efficiency. Thus, researchers have tried using cultured mouse brain cells as isotopically labeled standards for quantitative proteomic characterization of intact mouse brains [59]. The labeled protein extracts from the cultured cells serve as internal standards and allow the indirect comparison of protein abundance in extracts from multiple types of intact mouse brains. Of course, only those proteins that are expressed in both the cultured brain cells and intact brain tissue may be quantified. Additionally, when employed in this way, metabolic labeling no longer provides internal control for tissue homogenization and protein extraction. However, this strategy can be useful for quantitative proteomic characterization of tissues that are especially difficult to label and for which a matching cell line is available.
Table 1 Steady-state metabolic labeling of selected organisms and cell types
Organism                                  Labeling approach (a)   Reference
Arabidopsis thaliana                      15N, 13C-FML            [15,40]
Arabidopsis thaliana                      15N-PML                 [22]
Caenorhabditis elegans                    15N-FML                 [12]
Deinococcus radiodurans                   15N-FML                 [33–35]
Drosophila melanogaster                   15N-FML                 [12]
Escherichia coli                          15N-FML                 [33,43]
Escherichia coli                          SILAC                   [44]
Glycine max (soybean)                     15N-FML                 [45]
Methanococcus maripaludis                 15N-FML                 [46]
Methanosarcina acetivorans                15N-FML                 [47]
Plasmodium falciparum (HB3 and Dd2)       SILAC                   [48]
Rattus norvegicus                         15N-FML                 [13,20]
Saccharomyces cerevisiae                  15N-FML                 [11,33,49]
Saccharomyces cerevisiae                  SILAC                   [17,50,51]
Solanum tuberosum (potato)                15N-FML                 [14]
Sulfolobus solfataricus                   15N-FML                 [31,51]
Cell culture
General protocols and considerations      SILAC                   Reviewed in [19,30]
Arabidopsis thaliana suspension culture   15N-FML                 [36–39,52]
B16 melanoma cells                        15N-FML                 [34]
C2C12 myoblast cells (murine)             SILAC                   [16]
D3 embryonic cells                        SILAC                   [53]
HeLa cells                                SILAC                   [32,54,55]
Human skin fibroblasts                    SILAC                   [18]
NIH 3T3 fibroblasts                       SILAC                   [16,56]
PC3M (human prostate carcinoma)           SILAC                   [57]
Sf9 insect cells                          SILAC                   [58]
(a) Abbreviations for labeling approaches include: FML, Full Metabolic Labeling; PML, Partial Metabolic Labeling; SILAC, Stable Isotopic Labeling by Amino Acids in Cell culture.
Similarly, Snijders et al. have investigated using metabolic labeling to compare protein abundances across multiple bacterial species using peptides that are shared between both organisms [60].
3.3 The importance of steady state labeling
During metabolic labeling, an isotopic label is incorporated into an organism from its food source through growth. This process is not instantaneous: during early stages of labeling, proteins incorporate isotopic labels at different rates, depending on their relative rates of turnover and the effective isotopic incorporation levels of their accessible precursor pools. These rates of turnover can vary significantly from protein to protein in any organism and from tissue to
tissue in complex multicellular organisms. Before uniform and complete labeling is achieved, significant variation in levels of isotopic incorporation may be seen across multiple proteins and tissues from a particular organism at any given time. While this variability in isotopic incorporation can provide interesting information regarding rates of protein turnover [61], it can complicate the use of metabolic labeling for comparison of protein abundances when uniform and complete isotopic labeling is not achieved.
3.3.1 Effects of incomplete labeling
The effects of incomplete isotopic labeling on quantification vary depending on the type of metabolic labeling that is used. In the case of full metabolic labeling, the effects vary depending on the stage of isotopic incorporation that is achieved. At early stages in isotopic labeling, the relative distribution of labeled and unlabeled atoms among proteins is complex. Different patterns are seen for single-celled and multicellular organisms, depending on the relative sizes and availability of unlabeled and labeled precursor pools as well as the degree to which labeled and unlabeled precursors are allowed to mix. These early stages of isotope incorporation are completely incompatible with full metabolic labeling. At later stages in isotopic incorporation when labeled and unlabeled precursor pools mix, the isotopically labeled sample may display single peptide envelopes with intermediate isotopic incorporation (Figure 4). Depending on the level of incorporation that is achieved, this stage may be compatible with full metabolic labeling, as long as isotopic envelopes from labeled and unlabeled peptide samples are clearly separated and no unlabeled peptides remain in the isotopically enriched sample. At lower incorporations, the isotopic envelope is spread over many possible m/z values, decreasing the signal-to-noise ratio and reducing the effectiveness of quantification. Overall, higher levels of isotopic incorporation give better quantitative results via full metabolic labeling. While quantification is possible for peptides with lower incorporation, nearly complete labeling is required for identification of labeled peptides via existing MS/MS peptide search algorithms. For a more thorough discussion of the effects of variable levels of 15N incorporation on peptide isotopic distributions, see Snijders et al. [62]. As with full metabolic labeling, incomplete incorporation interferes in SILAC analysis.
At low incorporations, multiple overlapping distributions are observed for each peptide. While corrections can occasionally be made to adjust for partially overlapping envelopes, lower incorporations complicate quantitative analysis. In practice, incomplete labeling is a more serious problem for SILAC than for full metabolic labeling because the spacing between the labeled and unlabeled envelopes for each peptide is much smaller. This results in greater interference between labeled and unlabeled isotopic envelopes when lower incorporation is achieved. Because the labeled and unlabeled forms of each peptide produce overlapping isotopic envelopes via partial metabolic labeling, one must assume constant and known levels of isotopic incorporation for both labeled and unlabeled peptides to accurately determine their relative abundance. In practice,
this requires analyzing a small portion of both labeled and unlabeled samples separately in order to determine average levels of isotopic incorporation. If incorporation levels have not stabilized at the target enrichment for both labeled and unlabeled samples, then this technique will provide inaccurate results. Because the heavy and light isotopic envelopes for each peptide completely overlap and are indistinguishable in mixed samples, this technique is the most sensitive to variations in levels of isotopic incorporation.
3.3.2 Achieving complete isotopic labeling
Ultimately complete and uniform isotopic labeling is achieved by allowing the model organism to grow on isotopically labeled food for a sufficiently long time that its unlabeled mass is diluted to a negligible level. For unicellular systems, growth for approximately eight doublings on labeled media is usually sufficient for complete labeling (~100%, or reagent-limited) [62]. Similar results have also been demonstrated within seven doublings in yeast [63] for full metabolic labeling and incorporations of better than 97% have been obtained within five doublings for SILAC labeling of cells in culture [30]. Intact plants, such as Arabidopsis, can also be fully labeled with 15N following growth from germination for two weeks in liquid culture [15].
In contrast to plants and unicellular organisms, complete (reagent-limited) metabolic labeling is more difficult to achieve in mammals. In the first published example of metabolic labeling of a mammal for proteomic analysis, an adult rat was fed a 15N-labeled diet for 48 days. At sacrifice the 15N incorporations varied widely from tissue to tissue, ranging from ~74% in the brain to ~91% in the liver [13]. These incorporations have subsequently been improved by raising multiple generations of rats on an 15N-labeled diet. When a mother rat was also fed a labeled diet from weaning through pregnancy, tissues taken from her pups at 45 days after birth displayed a uniform 15N incorporation of ~95% even in tissues such as the brain which have slow rates of protein turnover [20]. Though truly complete labeling has not yet been reported in mammals in a proteomic context, stable levels of 15N incorporation at 95% will work quite well for quantitative proteomics.
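The doubling counts quoted above for unicellular systems can be compared with a simple dilution estimate. The sketch below assumes a pure-dilution model in which the original unlabeled biomass is never degraded or recycled and all new biomass is built at the enrichment of the labeling reagent. This is an illustrative assumption rather than a model taken from the cited studies, and the function names are hypothetical.

```python
from math import ceil, log2


def labeled_fraction(doublings, reagent_enrichment=1.0):
    """Expected incorporation after a number of doublings (pure dilution)."""
    unlabeled_share = 0.5 ** doublings  # original biomass halves its share each doubling
    return reagent_enrichment * (1.0 - unlabeled_share)


def doublings_needed(target, reagent_enrichment=1.0):
    """Smallest whole number of doublings that reaches the target incorporation."""
    if not 0.0 <= target < reagent_enrichment:
        raise ValueError("target must be non-negative and below the reagent enrichment")
    remaining = 1.0 - target / reagent_enrichment
    return ceil(log2(1.0 / remaining))


for n in (5, 7, 8):
    print(f"{n} doublings: {labeled_fraction(n):.2%} labeled")
print("doublings needed for 99%:", doublings_needed(0.99))
```

Under this model five doublings give roughly 97% and eight doublings better than 99.5%, in line with the ranges reported above. Effects the model ignores, such as protein turnover and recycling of unlabeled amino acids, can shift real values in either direction, and the slow, tissue-dependent labeling seen in mammals is outside its scope.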
3.4 Computational approaches for automated data analysis An automated system for analyzing data is essential if metabolic labeling is to be used in a proteomic context. In recent years a number of systems have been developed that will automatically determine relative ratios of peptide and protein abundance between metabolically labeled samples. These systems have been designed for each of the metabolic labeling strategies discussed previously and are compatible with a range of different MS instrument platforms. These programs are generally freely available, either as compiled programs or as programming code that may be used or altered as needed. We provide below a brief overview of several software packages and algorithms for analysis of metabolic labeling data.
Edward L. Huttlin et al.
One package for analysis of data from metabolic labeling experiments is MsQuant (http://msquant.sourceforge.net). Created by members of the laboratory of Matthias Mann, this package was originally designed for the automated analysis of data from SILAC experiments [54]. Instructions and add-ons are provided to accommodate full metabolic labeling as well [52]. A particularly useful algorithm for automated quantitative analysis of metabolic labeling data from liquid chromatography (LC)-MS experiments uses linear regression to determine ratios of abundance of labeled and unlabeled forms of each peptide (Figure 5). Originally proposed by Thorne et al. [64] and implemented by MacCoss and co-workers for proteomic analysis [65], this algorithm is robust to unusual chromatographic profiles and naturally corrects for background noise. Additionally, the regression method provides a correlation coefficient that indicates the quality of quantification for each peptide. It is the basis for their software package, RelEx, which is designed for quantitative analysis of data from any of several isotopic labeling techniques, including full metabolic labeling, SILAC, and ICAT, among others. This approach was also adapted by our laboratory for the analysis of quadrupole time of flight (QTOF)
Figure 5 Correlation strategy for relative quantification of isotopically labeled peptides following LC–MS analysis. Note: The correlation strategy for relative quantification first relies on the identification of the peptide sequence from assignment of heavy or light species MS/MS data. Once the formula of the peptide is known the spacing between the monoisotopic peaks of the heavy and light envelopes is defined (A) and extracted ion chromatograms (EICs) can be generated (B). Each pair of heavy and light datapoints along the elution time axis of the EICs is then plotted against each other (C) as illustrated by the colored points in (B) and their corresponding point in (C). Because the heavy and light labeled peptides coelute exactly, the points in (C) form a line. A linear regression (D) of this plot provides the ratio of heavy to light as the slope and an estimate of noise/baseline as the y-intercept. The correlation coefficient from the regression analysis can be used to evaluate the success of the operation allowing rapid assessment without visual inspection. (See Colour Plate Section at the end of this book.)
Metabolic Labeling Approaches
data from full metabolic labeling experiments via a series of Mathematica scripts [15,22]. Most recently, researchers from the Yates laboratory have released an updated version of RelEx called "Census" that has been adapted for use with the LTQ-Orbitrap hybrid mass spectrometer [49]. This program takes advantage of the increased dynamic range and high mass accuracy as well as the fast sequencing speed of this state-of-the-art instrumentation for experiments involving full metabolic labeling. Researchers from Oak Ridge National Laboratory have also released an algorithm that uses principal component analysis (PCA) for automated analysis of data from full metabolic labeling experiments [66]. While mathematically equivalent to the linear regression method used by MacCoss and co-workers, the PCA method allows the automated estimation of signal-to-noise ratios for each peptide. This provides a reliable metric for the automated removal of poor measurements from large datasets. They have also provided tools that aid in estimation of protein abundance ratios from ratios for individual peptides [67]. Several other useful programs and algorithms for the automated interpretation of data from full metabolic labeling experiments are available as well. One program has been developed for interpretation of data collected on an LTQ-FTICR hybrid mass spectrometer. It takes advantage of the high-resolution MS data this instrument can collect, while employing novel filters to remove questionable peptide measurements [68]. Additionally, Du et al. have published an algorithm for use of full metabolic labeling in conjunction with top-down proteomics experiments. While demonstrating the usefulness of their algorithm, they also discuss experimental challenges that arise specifically when metabolic labeling is combined with top-down strategies [69].
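The regression strategy summarized in Figure 5 can be sketched in a few lines of Python. This is a minimal illustration of the idea (heavy intensities regressed on light intensities along the elution profile), not the RelEx or Census implementation; the function name and the synthetic chromatograms are hypothetical.

```python
from math import sqrt


def regress_heavy_on_light(light, heavy):
    """Ordinary least-squares fit of heavy = slope * light + intercept.

    `light` and `heavy` are paired intensities from the two extracted ion
    chromatograms at the same retention times. The slope estimates the
    heavy:light ratio, the intercept estimates the baseline, and r is the
    Pearson correlation coefficient used to judge the fit.
    """
    n = len(light)
    if n != len(heavy) or n < 3:
        raise ValueError("need at least three paired data points")
    mean_l = sum(light) / n
    mean_h = sum(heavy) / n
    sxx = sum((x - mean_l) ** 2 for x in light)
    syy = sum((y - mean_h) ** 2 for y in heavy)
    sxy = sum((x - mean_l) * (y - mean_h) for x, y in zip(light, heavy))
    if sxx == 0 or syy == 0:
        raise ValueError("intensities must vary along the elution profile")
    slope = sxy / sxx
    intercept = mean_h - slope * mean_l
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Synthetic example: the heavy form is twice as abundant, with a baseline of 5.
light_eic = [0, 10, 40, 100, 40, 10, 0]
heavy_eic = [2 * x + 5 for x in light_eic]
slope, intercept, r = regress_heavy_on_light(light_eic, heavy_eic)
print(f"ratio = {slope:.2f}, baseline = {intercept:.2f}, r = {r:.3f}")
```

In practice a threshold on r (chosen by the analyst) could be used to discard peptides whose heavy and light profiles do not track each other.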
Finally, an algorithm for the automated determination of incorporation for isotopically labeled peptide samples based on isotopic envelope shape has been published and released as a stand-alone program [70]. This algorithm has been implemented in Mathematica as well [15]. We have incorporated the Thorne/MacCoss linear regression algorithm for analysis of LC–MS data into an algorithm for interpretation of data from partial metabolic labeling experiments. Implemented as a series of Mathematica scripts, this program determines the relative heights of each peak in the observed isotopic distribution for each peptide and compares the overall shape of the observed distribution with the shapes of predicted distributions to determine the relative contributions of light and heavy forms for each peptide [22].
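The idea of comparing an observed envelope shape with predicted shapes can be illustrated as follows. This sketch is not the authors' Mathematica code: it models only the nitrogen contribution to the envelope with a binomial distribution (ignoring C, H, O, and S isotopes), uses an approximate natural abundance for 15N, and fits the heavy fraction by a closed-form least-squares step. All names are hypothetical.

```python
from math import comb

NATURAL_15N = 0.0037  # approximate natural abundance of 15N (assumption)


def nitrogen_envelope(n_nitrogens, enrichment):
    """Binomial probability of finding k heavy nitrogens, k = 0..n_nitrogens."""
    return [
        comb(n_nitrogens, k) * enrichment**k * (1 - enrichment) ** (n_nitrogens - k)
        for k in range(n_nitrogens + 1)
    ]


def fit_heavy_fraction(observed, light, heavy):
    """Least-squares fraction f such that observed ~ f*heavy + (1 - f)*light.

    `observed` may be in arbitrary intensity units; it is normalized to sum
    to one before fitting. The result is clamped to the range [0, 1].
    """
    total = sum(observed)
    obs = [x / total for x in observed]
    diff = [h - l for h, l in zip(heavy, light)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("light and heavy envelopes are identical")
    f = sum((o - l) * d for o, l, d in zip(obs, light, diff)) / denom
    return min(1.0, max(0.0, f))


# Hypothetical peptide with 20 nitrogens: natural abundance versus 6% 15N.
light = nitrogen_envelope(20, NATURAL_15N)
heavy = nitrogen_envelope(20, 0.06)
observed = [1000 * (0.25 * h + 0.75 * l) for h, l in zip(heavy, light)]
f = fit_heavy_fraction(observed, light, heavy)
print(f"heavy fraction = {f:.3f}, heavy:light = {f / (1 - f):.3f}")
```

A real implementation would predict full isotopic distributions from the peptide formula and would need to handle noise and overlapping envelopes; the closed-form fit here only shows why a single composite envelope is enough to recover a ratio.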
3.5 Reciprocal experimental design The goal of any metabolic labeling experiment in proteomics is to identify proteins and peptides that show differences in abundance between two biological samples. However, one must exercise caution when interpreting data from these experiments: differences that are observed in a particular experiment could be either due to the biological variable of interest, or due to differences in the growth conditions to which the labeled and unlabeled samples were exposed. These
differences in growth conditions could theoretically be due to the presence of the isotope label itself; however, this is not particularly likely. Maximal primary kinetic isotope effects for 15N and 13C are known to be on the order of a few percent [71], meaning that substituting either 15N or 13C for its natural abundance counterpart would lead to a change in reaction rates of a few percent for bond-making or bond-breaking steps that occur at the labeled atom. Any isotope effects that would be observed at the biological or analytical levels would likely be much more indirect equilibrium effects of isotope substitution and would likely be correspondingly smaller in magnitude, and ultimately undetectable via proteomic techniques. If any differences are seen, they are more likely due to the fact that the labeled and unlabeled media or food were prepared separately, perhaps under different conditions and with different reagents. Such differences are of particular concern with complex nutrient sources such as the Spirulina-based diet used for metabolic labeling of rats. To address these possible secondary effects of isotopic labeling, one should always incorporate replicates into the experimental design, exchanging the isotopic label between biological samples to be compared (Figure 6). When this reciprocal experimental design is used, it is easy to distinguish changes due to the biological variable of interest from other labeling artifacts. Changes due to the biological variable will show reciprocal ratios (heavy:light) when the isotopic label is exchanged, whereas changes due to media or growth-dependent variables will show either the same or inconsistent ratios (heavy:light) when the label is exchanged. As an additional benefit, reciprocal experimental design can also allow easy identification of exogenous contaminating peptides.
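The interpretation of reciprocal replicates can be expressed as a simple decision rule. The sketch below is illustrative only: the function name, the category labels, and the 1.5-fold threshold are assumptions, and a real analysis would also account for measurement error and replicate variability.

```python
from math import log2


def classify_reciprocal(ratio_forward, ratio_reverse, min_fold=1.5):
    """Classify a peptide from its heavy:light ratios in two reciprocal runs.

    `ratio_forward` is measured with the experimental sample labeled heavy;
    `ratio_reverse` is measured after the label has been exchanged.
    `min_fold` is a hypothetical threshold for calling a ratio changed.
    """
    if ratio_forward <= 0 or ratio_reverse <= 0:
        raise ValueError("ratios must be positive")
    if min_fold <= 1:
        raise ValueError("min_fold must be greater than 1")
    threshold = log2(min_fold)
    fwd, rev = log2(ratio_forward), log2(ratio_reverse)
    changed_fwd = abs(fwd) >= threshold
    changed_rev = abs(rev) >= threshold
    if not changed_fwd and not changed_rev:
        return "no change"
    if changed_fwd and changed_rev:
        # Opposite signs mean the ratio inverted when the label was exchanged.
        if fwd * rev < 0:
            return "reciprocal (biological change)"
        return "same direction (label or media artifact)"
    return "inconsistent (possible artifact)"


print(classify_reciprocal(2.0, 0.5))
print(classify_reciprocal(2.0, 2.1))
print(classify_reciprocal(1.1, 0.95))
```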
Peptides from a variety of sources may be introduced into protein mixtures during sample preparation, including peptides from human keratins, trypsin, as well as other sources. In most cases these peptides should be recognized by search algorithms. However, if this step fails, they will be distinguishable from peptides of interest because they will always appear with natural abundance isotope distributions and will be unaffected by exchange of the isotopic label. For SILAC and full metabolic labeling, when samples are analyzed in pairs of runs using reciprocal labeling, this also conveniently provides a quick method for identifying differentially expressed peptides, as described by Wang et al. [43]. After running the pair of reciprocal LC–MS analyses, the two are analyzed using standard software to compare intensities for particular m/z-retention time features in both LC–MS runs. If there is no significant difference between the biological samples to be compared for a particular peptide, then there should also be no significant difference between intensities of 14N and 15N envelopes for that particular peptide across the two LC–MS analyses when the labels are switched. However, when there is a difference in expression, this will be visible as significant differences in intensity both for the 14N envelopes and the 15N envelopes when compared between the two runs (see Figure 6). Once these candidates are identified by any standard program for comparing LC–MS profiles, the relative intensities of the labeled and unlabeled envelopes within each run can be measured and compared manually. These measurements
Figure 6 Isotopic labeling patterns and reciprocal experimental design. Note: When designing metabolic labeling experiments, it is wise to include replicates for which the isotopic label has been exchanged among experimental and control samples. When abundance ratios for a particular peptide are compared across reciprocal replicates, the effect of label exchange can allow one to differentiate among artifacts and changes that are actually due to the experimental variable of interest. Diagrammed above are simulated heavy and light isotopic envelopes for a hypothetical peptide across two reciprocal replicates. Also indicated is the proper interpretation for each pattern. Note in particular that one may differentiate among food or media-dependent differences and consistent changes due to the experimental variable of interest.
should be more precise than the initial comparisons of selected envelopes between LC–MS analyses because they are internally controlled. Thus, by incorporating reciprocal labeling into the experimental design, one may quickly identify peptides of interest using standard software tools for comparisons of
separate LC–MS analyses and can reasonably use manual analysis of metabolic labeling data to quantify changing peptides without requiring specialized programs.
4. COMPARISON OF FULL VERSUS PARTIAL LABELING Successful characterization of a peptide in a quantitative proteomics experiment requires completion of several distinct analytical steps: first, a particular precursor mass must be selected and fragmented for data-dependent MS/MS analysis; second, the resulting MS/MS spectrum must be used to identify the peptide via various search algorithms; and third, the relative levels of heavy and light forms of the peptide must be determined. The overall performance of any particular quantitative proteomic technique depends on the successful completion of each of these steps. Different techniques can influence the performance of each of these steps to variable extents, leading to differences in the overall success of each quantitative technique on a proteomic scale. We will compare the performance of full and partial metabolic labeling as quantitative proteomic techniques, highlighting how their performance at each of these stages of peptide characterization affects the overall performance of the technique. Our purposes for presenting this analysis are two-fold: first, by describing in detail the performance of each metabolic labeling strategy, we intend to provide a concrete indication of the performance that can be expected from these techniques using current instrumentation; second, and more generally, we intend to highlight the kinds of factors that may affect the performance of quantitative proteomic techniques overall. In this broader sense our work will illustrate those factors that should be considered in the evaluation of any other quantitative proteomic strategies. The data presented below are from a series of experiments our laboratory recently performed to benchmark full and partial metabolic labeling as quantitative proteomic techniques. Arabidopsis seedlings were grown in liquid culture at either natural abundance, 6% 15N, or >98% 15N levels of incorporation. These plants were identical except for their isotopic label.
Following harvest, protein extraction, and digestion, the resulting peptide mixtures were combined at known ratios to evaluate the performance of each technique. The natural abundance peptide mixture was combined with the 6% 15N peptide mixture at ratios ranging from 100:1 to 1:100 to evaluate the performance of partial metabolic labeling over four orders of magnitude. Similarly, the natural abundance mixture was combined with the >98% 15N peptide mixture at the same ratios to evaluate the performance of full metabolic labeling over the same dynamic range. All analyses were performed via micro-LC–MS with automated data-dependent MS/MS collection using a QTOF mass spectrometer. Peptides were identified using Mascot [72], while a series of Mathematica scripts were used for peptide quantification via either full or partial metabolic labeling approaches. For a more rigorous analysis of these results and a more extensive comparison of full and partial metabolic labeling, see Huttlin et al. [22].
4.1 Spectral complexity Aside from the reduced amount of isotopic label that is consumed by partial metabolic labeling, one of the most obvious differences between labeling techniques is that partial metabolic labeling produces a single composite isotopic envelope for each pair of labeled and unlabeled peptides, whereas full metabolic labeling produces two distinct envelopes. As a result, full metabolic labeling has two times as many isotopic envelopes in every MS precursor spectrum. When a complex peptide mixture is analyzed that contains envelopes from many individual peptides in each precursor scan, this two-fold difference in complexity contributes to a lower signal to noise ratio overall, and some isotopic envelopes may be obscured during analysis via full metabolic labeling. In contrast, precursor spectra from analysis of the same sample via partial metabolic labeling appear much simpler, with each isotopic envelope being much more clearly defined. Displayed in Figure 7 are representative MS precursor spectra from our LC–MS analyses of Arabidopsis soluble protein digests. On top are representative spectra from a full metabolic labeling experiment, while two equivalent spectra from a partial metabolic labeling analysis are displayed below. Note how much simpler the bottom spectra are, and how much more clearly each isotopic envelope is defined. During a proteomic experiment, this difference in spectral complexity and signal to noise ratio can influence both the numbers of peptides identified and the relative quality and quantity of peptide quantifications.
4.2 Effects on peptide identification via tandem mass spectrometry The process of peptide identification in a data-dependent mass spectrometric experiment occurs in a series of sequential steps. First, a precursor spectrum is collected. Second, a particular precursor ion is selected, isolated, and fragmented to produce a product ion spectrum. Third, the monoisotopic precursor mass is assigned and the MS/MS product ion spectrum is filtered and prepared for searching. Finally, the MS/MS product ion spectrum is searched against a genome database that has been previously filtered to include only those peptides that correspond to the observed precursor mass, within specified tolerances. Each of these steps may be influenced in different ways by partial and full metabolic labeling. In partial metabolic labeling, when the ratio of labeled to unlabeled peptide is high, the spectral processing software occasionally misassigns the monoisotopic mass for the precursor ion. If this error is not corrected, the peptide cannot be correctly identified because the peptide search engine does not normally consider peptides whose mass differs from the observed precursor by 1 Da. For our experiments, we were forced to widen the precursor mass tolerances for our database searches, filtering assigned peptides afterwards to remove incorrect peptide identifications that were made due to the wide mass tolerances. This problem could be ultimately addressed by optimizing precursor selection and assignment programs for partial labeling. Until then, we have found that acceptable results may be obtained by performing database searches with
Figure 7 Comparison of spectral complexity. Note: Differences in spectral complexity between LC–MS analyses of full and partial metabolically labeled peptide samples are shown in two matched sets of sample spectra A and B. For each labeling scheme peptide species are labeled i through viii as pairs of envelopes for full metabolic labeling or as broad single envelopes for partial metabolic labeling. Full metabolic labeling produces more peaks with more overlap resulting in ambiguities shown by question marks. Partial labeling produces simpler spectra.
unusually wide precursor mass tolerances and filtering questionable peptide identifications afterward. Because isotopic envelopes from completely 15N-labeled peptides (>98%) look very similar to their unlabeled counterparts, full metabolic labeling does not
adversely affect the performance of spectral processing routines at all. In fact, when complete 15N labeling is achieved, peptides may be identified based on their fragmentation patterns and precursor masses as long as the mass definitions for each amino acid are adjusted to reflect the added mass from each 15N. To a first approximation this works as well for >98% 15N-labeled peptides as it does for their natural abundance counterparts. However, on closer examination of the identifications that are made during a full metabolic labeling experiment, subtle differences in the confidence of heavy and light peptide identifications become apparent. Greater uncertainty is seen for identifications that are made against 15N-labeled peptides than for identifications against natural abundance peptides. This difference is not due to any errors in spectral processing, but is rather an inherent consequence of the increased amino acid ambiguity resulting from the increased mass that full 15N metabolic labeling introduces. When 15N is substituted ubiquitously for 14N in proteins, this significantly increases the numbers of isobaric amino acids that must be considered. For a natural abundance sample, Leu and Ile are isobaric, whereas Lys and Gln are nominally isobaric; however, when 15N is introduced in place of 14N, Asp and Asn as well as Glu and Gln become isobaric as well. This leads to a dramatic increase in numbers of indistinguishable amino acid sequences on a proteome-wide scale. As a result, there is more ambiguity in peptide sequence assignments for heavy peptides, leading to an increased likelihood of false positive identifications by peptide search algorithms. This needs to be considered when large datasets of 15N-labeled peptides are analyzed [15].
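The additional mass coincidences introduced by 15N can be checked directly from residue compositions. The sketch below covers only the residues named above; the atomic masses are standard monoisotopic values rounded to six decimals, and the "nominally isobaric" test (equal mass after rounding to the nearest integer) is a simplification chosen for illustration, not the criterion used by any particular search engine.

```python
from itertools import combinations

# Monoisotopic atomic masses in Da, rounded to six decimals.
MASS = {"C": 12.0, "H": 1.007825, "O": 15.994915, "14N": 14.003074, "15N": 15.000109}

# Residue compositions as (C, H, N, O); only residues discussed in the text.
RESIDUES = {
    "Leu": (6, 11, 1, 1),
    "Ile": (6, 11, 1, 1),
    "Lys": (6, 12, 2, 1),
    "Gln": (5, 8, 2, 2),
    "Glu": (5, 7, 1, 3),
    "Asn": (4, 6, 2, 2),
    "Asp": (4, 5, 1, 3),
}


def residue_mass(name, heavy_nitrogen=False):
    """Monoisotopic residue mass with either 14N or 15N at every nitrogen."""
    c, h, n, o = RESIDUES[name]
    n_mass = MASS["15N"] if heavy_nitrogen else MASS["14N"]
    return c * MASS["C"] + h * MASS["H"] + n * n_mass + o * MASS["O"]


def nominally_isobaric_pairs(heavy_nitrogen=False):
    """Residue pairs whose masses round to the same integer."""
    return [
        (a, b)
        for a, b in combinations(sorted(RESIDUES), 2)
        if round(residue_mass(a, heavy_nitrogen)) == round(residue_mass(b, heavy_nitrogen))
    ]


print("natural abundance:", nominally_isobaric_pairs(False))
print("fully 15N-labeled:", nominally_isobaric_pairs(True))
```

With these compositions, the Asp/Asn and Glu/Gln pairs appear only in the 15N case, because the residue with two nitrogens gains about 1 Da more than its one-nitrogen counterpart.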
4.3 Numbers of peptide identifications Plotted in Figure 8 are the numbers of peptides that were identified via both full and partial metabolic labeling for a variety of mixing ratios ranging from 0 (all natural abundance) to infinity (all 15N-labeled). Overall, fairly similar numbers of peptides are identified by both techniques across the range of mixing ratios, although full metabolic labeling generally does slightly better. Looking more closely, a few trends become apparent as the mixing ratio is varied. For full metabolic labeling, the numbers of peptide identifications tend to dip in the middle, with higher numbers of identifications on either extreme. This is a consequence of spectral complexity and the process of data-dependent MS/MS analysis. Before a precursor mass is selected for sequencing in a data-dependent experiment, all observed isotopic envelopes are ranked in order of decreasing intensity. Each species is then selected for sequencing in order, with higher intensity peptides being sequenced first. When the 15N and 14N forms of each peptide are roughly comparable in intensity, as is the case when the ratio is roughly 1:1, both tend to be selected for sequencing. Since the total number of sequencing events during an LC–MS experiment is finite, these redundant sequencing events reduce the number of unique peptides that may be identified in the analysis of complex peptide mixtures. As the ratio deviates from 1:1, the likelihood of redundant sequencing decreases and the number of unique peptides identified increases.
Figure 8 Influence of relative abundance on peptide identification and quantification. Note: In order to evaluate the robustness of the full and partial metabolic labeling strategies with regard to peptide identification and quantification, a series of controlled peptide mixtures from all natural abundance (ratio heavy to light = 0) to all labeled (either 5.2% for partial or 98.2% for full; ratio heavy to light = ∞) were prepared and analyzed. Results from these experiments are shown here with the full labeling data indicated with black bars and the partial labeling data with gray bars. The top panel shows how the number of peptides identified by each approach changes as a function of mixing ratio. The lower panel shows the percent of identified peptides that were successfully quantified by each approach also as a function of mixing ratio.
For partial metabolic labeling, the numbers of peptide identifications generally drop as the 15N contribution increases. This reflects difficulties with MS and MS/MS spectral processing that were discussed previously. Although errors in precursor monoisotopic ion assignment may be filtered and corrected,
these errors are more difficult to filter in MS/MS spectra. Ultimately optimization of spectral processing software will address this problem. However, in the range of ratios that are most commonly seen in biological experiments this problem does not appear to be too serious.
4.4 Quantification Once peptides were identified, each was also quantified using a series of Mathematica routines. Not all peptides can be quantified by either technique, depending on the signal to noise ratios of the relevant precursor spectra, as well as the presence of overlapping isotopic distribution and other interferences. Thus, results are filtered to remove questionable quantifications based on correlation thresholds, which serve as a simple screen for interferences without requiring manual validation. Plotted in Figure 8 are the percentages of identified peptides that were successfully quantified, separated by the ratio of heavy to light peptides for each combined sample. Note the different patterns for full and partial metabolic labeling. Partial metabolic labeling successfully quantifies almost every peptide that is identified, and does so consistently across the range of ratios tested. In contrast, full metabolic labeling displays a strong dependence on mixing ratio. The highest percentage of identified peptides is successfully quantified near a ratio of 1:1. However, as the ratio deviates toward either extreme, the rate of successful quantification drops dramatically. These trends are due to the fundamentally different way in which each ratio is determined. In the case of full metabolic labeling, one envelope is always compared with another. As the ratio approaches one extreme or the other, the lower intensity envelope tends to be lost in background noise. As a result, the quantification becomes unreliable and is necessarily excluded. In contrast, partial metabolic labeling considers only the shape of a single isotopic envelope, whether the ratio of heavy to light peptide is at 1:1 or is nearing either extreme. Thus, the success rate of these measurements based on isotopic envelope shape is relatively consistent across the range of ratios tested. 
Interestingly, this fundamental difference appears to give partial metabolic labeling an advantage over full metabolic labeling for successful quantification of peptides that are present at very uneven ratios.
4.5 Dynamic range In addition to considering the numbers of peptides identified and quantified, it is important to consider the range over which each technique may be used to measure peptide ratios. Plotted in Figure 9 are median ratios for all peptides identified in controlled mixtures of labeled and unlabeled Arabidopsis soluble protein digests. Note that the medians for both full and partial metabolic labeling are very similar and generally match the known mixing ratios within the range of 1:10 to 10:1. Outside this range, although both techniques report significant differences in relative abundance, the ratios become more qualitative
Figure 9 Normalized median ratios for mixtures of labeled and unlabeled Arabidopsis proteins. Note: To evaluate the accuracy of full and partial metabolic labeling as quantitative proteomic techniques, a series of mixtures of Arabidopsis peptides were prepared. Each was mixed at a specific expected ratio based on protein concentration. The expected ratio for each mixture is indicated along the x-axis, and is plotted as a darkened circle. Vertical bars are also plotted representing the median observed ratio for all peptides identified and quantified via full and partial metabolic labeling, respectively. Median ratios have been normalized with respect to the 1:1 mixture for each technique. All ratios are of the form heavy:light. Samples containing only labeled peptides are annotated to have an expected ratio of infinity (∞), whereas samples containing only unlabeled peptides are annotated to have an expected ratio of zero. For additional information, see Huttlin et al. [22].
than quantitative measurements. In general, full metabolic labeling tends to underestimate extreme ratios, whereas partial metabolic labeling tends to overestimate extreme ratios. It should be noted that the dynamic range observed for both full and partial metabolic labeling is very instrument-dependent. For example, we report a dynamic range spanning from ~1:10 to 10:1 using our QTOF mass spectrometer, which is similar to the dynamic range previously reported for full metabolic labeling by other researchers using an LCQ ion trap mass spectrometer [11]. However, a recent report using an LTQ-Orbitrap mass spectrometer for full metabolic labeling reports improved results measuring differences as large as 50- to 100-fold [49]. As instrumentation improves, the ability of either metabolic labeling strategy to accurately measure large ratios of abundance will certainly improve.
4.6 Accuracy While our previous analysis of dynamic range provides a general indication of the accuracy associated with measurements from each technique, the error associated
with individual measurements should be considered as well. Figure 10 shows a contour plot representing the error associated with quantification of specific peptides across known mixtures of labeled and unlabeled Arabidopsis proteins. Only those ratios between 1:10 and 10:1 are considered. Note that the distributions of error are generally very similar for both partial and full metabolic labeling. Additionally, the majority of errors are small, showing <20% relative error. Based
Figure 10 Distribution of relative error associated with ratios from individual peptides. Note: Displayed above is a density plot indicating the distribution of relative error associated with specific peptides that were quantified by full or partial metabolic labeling, respectively. These peptides come from mixtures of labeled and unlabeled Arabidopsis proteins that were combined at a range of specific known ratios to judge the accuracy of each technique. See Huttlin et al. [22] for the details. Measurements obtained via partial metabolic labeling are plotted in shades of cyan, whereas those obtained via full metabolic labeling are plotted as magenta. The intensity of the color is proportional to the density of individual peptide measurements in that particular region of the plot. When the cyan and magenta regions overlap, purple shades are produced. These indicate regions where measurements from both full and partial metabolic labeling are present in significant numbers. Note that most measurements are clustered in a fairly small area, with most displaying an error <20–30%. Overall, the distributions of errors associated with full and partial metabolic labeling are quite similar, although values from partial metabolic labeling tend to display slightly higher errors at small ratios. (See Colour Plate Section at the end of this book.)
on this analysis, both full and partial metabolic labeling appear to provide equally reliable measures of relative abundance within their quantitative range.
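The two summary quantities used in this comparison, a median ratio normalized to the 1:1 mixture and a relative error against the known mixing ratio, are simple to compute. The sketch below assumes relative error is defined as |observed − expected| / expected, which is a common convention but is not spelled out here; the function names and the example numbers are hypothetical.

```python
from statistics import median


def normalized_median(ratios, reference_ratios):
    """Median heavy:light ratio divided by the median of the 1:1 mixture."""
    return median(ratios) / median(reference_ratios)


def relative_error(observed, expected):
    """Absolute deviation from the expected ratio, as a fraction of it."""
    if expected <= 0:
        raise ValueError("expected ratio must be positive")
    return abs(observed - expected) / expected


# Hypothetical peptide ratios from a 5:1 mixture and from the 1:1 mixture.
ratios_5_to_1 = [4.2, 4.6, 5.1]
ratios_1_to_1 = [0.9, 1.0, 1.15]
observed = normalized_median(ratios_5_to_1, ratios_1_to_1)
print(f"normalized median = {observed:.2f}")
print(f"relative error = {relative_error(observed, 5.0):.1%}")
```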
4.7 Summary Overall, both full and partial metabolic labeling allow quantitative comparison of isotopically labeled and unlabeled samples with similar accuracy. Both techniques return values that are most accurate within a relatively limited dynamic range that likely varies depending on the specific instrument used. Outside this range, values become more qualitative in nature. Interestingly, partial metabolic labeling appears more likely to provide ratios for peptides that are present at extremely different levels, when full metabolic labeling tends to fail. This is due to the fundamentally different way in which full and partial metabolic labeling determine relative levels of abundance. Overall, both full and partial metabolic labeling are viable quantitative proteomic techniques that should make the advantages of metabolic labeling (excellent internal control for all steps in sample preparation and analysis) accessible in a wide variety of experimental contexts.
4.8 Analogies with other quantitative proteomic techniques Our previous analysis has focused in detail on two selected strategies for quantitative proteomics: full and partial metabolic labeling. However, many of the factors that influence the performance of partial and full metabolic labeling also influence other isotopic labeling techniques. For example, much like full metabolic labeling, SILAC produces two distinct isotopic envelopes. One might therefore expect SILAC and full metabolic labeling to display similar dynamic ranges and accuracy, and this is in fact what is seen in the literature [16]. One might likewise expect SILAC to display trends similar to full metabolic labeling with respect to numbers of peptides identified and quantified across varying ratios. Among in vitro labeling techniques, the ICAT reagent produces pairs of labeled and unlabeled isotopic envelopes, whereas the iTRAQ reagent produces single isotopic envelopes for each peptide. By analogy with our comparison of full and partial metabolic labeling, one might expect iTRAQ to provide advantages with respect to spectral complexity and numbers of peptide identifications. Finally, all isotopic labeling techniques require consistent and complete labeling for accurate results, and all isotopic labeling techniques are subject to limitations of dynamic range and accuracy. Recently, several studies have been published in which full 15N metabolic labeling is compared with other quantitative proteomic strategies with respect to overall performance. One study has compared full metabolic labeling with Difference Gel Electrophoresis (DIGE) for quantification of proteins following 2D gel separation [73]. Additionally, two groups have compared full metabolic labeling with spectral counting strategies for label-free quantitative proteomics [46,74]. While only a few alternative quantitative proteomic techniques have been evaluated via head-to-head comparisons in this way, these studies provide important context for evaluating the strengths and weaknesses of each approach.
5. FUTURE DIRECTIONS In the past few years metabolic labeling has developed into an important experimental strategy for quantitative proteomics that is now being applied by many different researchers in a wide variety of different biological systems. Today, software and other tools for the automated analysis of metabolic labeling data on a proteomic scale are readily available. Furthermore, researchers have gained a tremendous understanding of the technical details that can determine the success or failure of a metabolic labeling experiment. We are now poised to apply these metabolic labeling techniques to a wide range of biological systems and deepen our understanding of many aspects of biology. Looking forward, expansion of metabolic labeling in several areas seems especially likely.
5.1 Expansion to new organisms and biological systems First, as the advantages of metabolic labeling become better understood by a wider range of researchers, its use will likely expand to many additional organisms and experimental systems. Traditionally metabolic labeling has been applied to the study of bacterial strains, yeast, and cells in culture. These particular biological systems are easily controlled and especially amenable to metabolic labeling with inexpensive and readily available precursors. Metabolic labeling will certainly continue to be applied in these systems as an important tool for quantitative proteomic analysis. However, in recent years there has been great progress extending metabolic labeling to multicellular organisms. Metabolic labeling has now been applied in C. elegans and D. melanogaster [12] as well as rats [13,20] and intact plants [14,15]. These studies open the door for metabolic labeling of a much wider range of organisms. Furthermore, with the recent development of partial metabolic labeling as an alternative technique, metabolic labeling should now be applicable in an economical way to many other organisms under much more diverse experimental conditions.
5.2 Extension of techniques and concepts to protein turnover analysis We have predominantly limited our discussion to steady-state metabolic labeling techniques and their application for comparison of static levels of protein expression in different biological samples. However, by monitoring the process by which isotopic labels are incorporated into different proteins and different tissues prior to achieving a steady state, information can be gleaned regarding relative rates of protein turnover (synthesis and degradation) [61]. These studies incorporate and extend many of the concepts and tools that have been described for steady-state metabolic labeling, but allow additional characterization of important and often overlooked biological processes on a proteomic scale.
Edward L. Huttlin et al.
Characterization of protein turnover on a broad scale will complement existing genomic/proteomic/metabolomic technology and aid systems biology as a whole. Already a number of researchers have begun to apply metabolic labeling techniques to characterize protein turnover [63,75–77]. Undoubtedly interest in the application of metabolic labeling to characterize protein turnover will increase dramatically as many of the experimental details are clarified and the potential of this approach becomes clear.
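As a rough illustration of how label incorporation prior to steady state can be turned into a turnover estimate, the sketch below assumes a simple single-pool, first-order model in which the labeled fraction of a protein rises as f(t) = 1 - exp(-k t). This model, the function name turnover_rate, and the time course shown are illustrative assumptions and are not taken from the studies cited above; real analyses must also account for factors such as growth dilution and precursor pool enrichment.

```python
import math


def turnover_rate(times, labeled_fractions):
    """Estimate a first-order rate constant k from label incorporation.

    Assumes f(t) = 1 - exp(-k * t), so that -ln(1 - f) = k * t, and fits
    that line through the origin by least squares.
    """
    numerator = 0.0
    denominator = 0.0
    for t, f in zip(times, labeled_fractions):
        if not 0.0 <= f < 1.0:
            raise ValueError("labeled fraction must lie in [0, 1)")
        numerator += t * -math.log(1.0 - f)
        denominator += t * t
    if denominator == 0.0:
        raise ValueError("at least one nonzero time point is required")
    return numerator / denominator


# Hypothetical time course (hours) simulated with k = 0.1 per hour.
hours = [2.0, 4.0, 8.0, 16.0]
fractions = [1.0 - math.exp(-0.1 * t) for t in hours]

k = turnover_rate(hours, fractions)
half_life = math.log(2) / k
print(f"k = {k:.3f} per hour, half-life = {half_life:.2f} hours")
```

Under these assumptions the half-life follows directly as ln(2)/k, which makes it easy to compare proteins measured in the same labeling time course.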
5.3 Further developments of MS instrumentation Many of the basic tools used for proteomic analysis rely upon important technological developments made over the past few years. Mass spectrometers, HPLC systems, informatics tools, and general sample handling techniques have all improved dramatically, greatly increasing our ability to collect and analyze proteomic measurements on a large scale. Improvements in MS instrumentation alone have occurred at a dizzying pace in recent years, leading to dramatic gains in instrument performance, duty cycle, sensitivity, and reliability. Today, state-of-the-art hybrid mass spectrometers enable the collection of tremendous amounts of high resolution mass spectrometric measurements, accompanied by huge numbers of peptide identifications. There is no doubt that even greater advances in the areas of instrument design and data management will continue to enhance the effectiveness of all metabolic labeling strategies for years to come. As stated earlier, one of the key advantages of metabolic labeling and all isotopic labeling techniques is the internal control for peptide chromatographic separation and analysis that they provide. As chromatographic separations and methods of ionization improve, the interference of matrix effects and thus the need for internal control of these steps may be reduced or even eventually eliminated. However, metabolic labeling also provides internal control for all stages in sample preparation and fractionation. Regardless of future developments in instrumentation, metabolic labeling will likely remain an important tool for controlling the error associated with these variable steps in analysis.
5.4 Conclusion As we look ahead, we have considered many exciting areas of growth for metabolic labeling. While expanding its use to larger organisms and delving into protein flux measurements will certainly lead to exciting advances, perhaps the most important role for metabolic labeling in proteomics will turn out to be an extension of its most fundamental advantage: unparalleled control of all stages in sample preparation and analysis. At the present time, two key challenges face proteomics as a field: dynamic range and reproducibility. Improvements in instrumentation have certainly helped to address both of these issues. However, we are still able to identify only a small fraction of all proteins that are present in most complex biological samples, due to limitations in both sensitivity and dynamic range. One way to address these issues is through extensive sample fractionation as well as the use of targeted purification strategies such as immunoprecipitations [78] and immobilized metal affinity chromatography (IMAC) [79]. However, these kinds of procedures are prone to significant technical error. Already, researchers are beginning to employ metabolic labeling in conjunction with both immunoprecipitations [54] and IMAC [52] to make targeted quantitative comparisons. More generally, our fundamental analytical challenge is not only to detect the most obvious biological changes at the proteome level, but to observe even subtle changes in protein abundance and patterns of posttranslational modifications. To detect these subtle effects, we will require even higher reproducibility for our measurements. Ultimately, the unparalleled internal control that all metabolic labeling techniques provide will be essential for addressing these challenges in a comprehensive way.
REFERENCES
1 K.E. Hung, A.T. Kho, D. Sarracino, L.G. Richard, B. Krastins, S. Forrester, B.B. Haab, I.S. Kohane and R. Kucherlapati, Mass spectrometry-based study of the plasma proteome in a mouse intestinal tumor model, J. Proteome Res., 5(8) (2006) 1866–1878.
2 E. Brunner, C.H. Ahrens, S. Mohanty, H. Baetschmann, S. Loevenich, F. Potthast, E.W. Deutsch, C. Panse, U. de Lichtenberg, O. Rinner, H. Lee, P.G. Pedrioli, J. Malmstrom, K. Koehler, S. Schrimpf, J. Krijgsveld, F. Kregenow, A.J. Heck, E. Hafen, R. Schlapbach and R. Aebersold, A high-quality catalog of the Drosophila melanogaster proteome, Nat. Biotechnol., 25(5) (2007) 576–583.
3 S.-E. Ong and M. Mann, Mass spectrometry-based proteomics turns quantitative, Nat. Chem. Biol., 1(5) (2005) 252–262.
4 M.R. Roe and T.J. Griffin, Gel-free mass spectrometry-based high throughput proteomics: Tools for studying biological response of proteins and proteomes, Proteomics, 6 (2006) 4678–4687.
5 E. Hugentobler and J. Löliger, A general approach to calculating isotope abundance ratios in mass spectrometry, J. Chem. Educ., 49(9) (1972) 610–612.
6 S.P. Gygi, B. Rist, S.A. Gerber, F. Turecek, M.H. Gelb and R. Aebersold, Quantitative analysis of complex protein mixtures using isotope-coded affinity tags, Nat. Biotechnol., 17 (1999) 994–999.
7 P.L. Ross, Y.N. Huang, J.N. Marchese, B. Williamson, K. Parker, S. Hattan, N. Khainovski, S. Pillai, S. Dey, S. Daniels, S. Purkayastha, P. Juhasz, S. Martin, M. Bartlet-Jones, F. He, A. Jacobson and D.J. Pappin, Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents, Mol. Cell Proteomics, 3 (2004) 1154–1169.
8 X. Yao, A. Freas, J. Ramirez, P.A. Demirev and C. Fenselau, Proteolytic 18O labeling for comparative proteomics: Model studies with two serotypes of adenovirus, Anal. Chem., 73(13) (2001) 2836–2842.
9 M. Miyagi and K.C. Rao, Proteolytic 18O-labeling strategies for quantitative proteomics, Mass Spectrom. Rev., 26(1) (2007) 121–136.
10 R.J. Beynon and J.M. Pratt, Metabolic labeling of proteins for proteomics, Mol. Cell Proteomics, 4 (2005) 857–872.
11 M.P. Washburn, R. Ulaszek, C. Deciu, D.M. Schieltz and J.R. Yates, 3rd, Analysis of quantitative proteomic data generated via multidimensional protein identification technology, Anal. Chem., 74(7) (2002) 1650–1657.
12 J. Krijgsveld, R.F. Ketting, T. Mahmoudi, J. Johansen, M. Artal-Sanz, C.P. Verrijzer, R.H. Plasterk and A.J. Heck, Metabolic labeling of C. elegans and D. melanogaster for quantitative proteomics, Nat. Biotechnol., 21(8) (2003) 927–931.
13 C.C. Wu, M.J. MacCoss, K.E. Howell, D.E. Matthews and J.R. Yates, 3rd, Metabolic labeling of mammalian organisms with stable isotopes for quantitative proteomics analysis, Anal. Chem., 76(17) (2004) 4951–4959.
14 J.H. Ippel, L. Pouvreau, T. Kroef, H. Gruppen, G. Versteeg, P. van den Putten, P.C. Struik and C.P. van Mierlo, In vivo uniform 15N-isotope labelling of plants: Using the greenhouse for structural proteomics, Proteomics, 4(1) (2004) 226–234.
15 C.J. Nelson, E.L. Huttlin, A.D. Hegeman, A.C. Harms and M.R. Sussman, Implications of 15N-metabolic labeling for automated peptide identification in Arabidopsis thaliana, Proteomics, 7(8) (2007) 1279–1292.
16 S.-E. Ong, B. Blagoev, I. Kratchmarova, D.B. Kristensen, H. Steen, A. Pandey and M. Mann, Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics, Mol. Cell Proteomics, 1 (2002) 376–386.
17 H. Jiang and A.M. English, Quantitative analysis of the yeast proteome by incorporation of isotopically labeled leucine, J. Proteome Res., 1 (2002) 345–350.
18 H. Zhu, S. Pan, S. Gu, E.M. Bradbury and X. Chen, Amino acid residue specific stable isotope labeling for quantitative proteomics, Rapid Commun. Mass Spectrom., 16 (2002) 2115–2123.
19 M. Mann, Functional and quantitative proteomics using SILAC, Nat. Rev. Mol. Cell Biol., 7(12) (2006) 952–958.
20 D.B. McClatchy, M.Q. Dong, C.C. Wu, J.D. Venable and J.R. Yates, 3rd, 15N metabolic labeling of mammalian tissue with slow protein turnover, J. Proteome Res., 6(5) (2007) 2005–2010.
21 J.P. Whitelegge, J.E. Katz, K.A. Pihakari, R. Hale, R. Aguilera, S.M. Gomez, K.F. Faull, D. Vavilin and W. Vermaas, Subtle modification of isotope ratio proteomics; an integrated strategy for expression proteomics, Phytochemistry, 65(11) (2004) 1507–1515.
22 E.L. Huttlin, A.D. Hegeman, A.C. Harms and M.R. Sussman, Comparison of full versus partial metabolic labeling for quantitative proteomic analysis in Arabidopsis thaliana, Mol. Cell Proteomics, 6(5) (2007) 860–881.
23 M.K. Hellerstein and R.A. Neese, Mass isotopomer distribution analysis: A technique for measuring biosynthesis and turnover of polymers, Am. J. Physiol., 263(Endocrinol. Metab. 26) (1992) E988–E1001.
24 M.K. Hellerstein and R.A. Neese, Mass isotopomer distribution analysis at eight years: Theoretical, analytic, and experimental considerations, Am. J. Physiol., 276(Endocrinol. Metab. 39) (1999) E1146–E1170.
25 R. Zhang, C.S. Sioma, S. Wang and F.E. Regnier, Fractionation of isotopically labeled peptides in quantitative proteomics, Anal. Chem., 73(21) (2001) 5142–5149.
26 N. Ibarrola, D.E. Kalume, M. Gronborg, A. Iwahori and A. Pandey, A proteomic approach for quantitation of phosphorylation using stable isotope labeling in cell culture, Anal. Chem., 75(22) (2003) 6043–6049.
27 K.-S. Park, D.P. Mohapatra, H. Misonou and J.S. Trimmer, Graded regulation of the Kv2.1 potassium channel by variable phosphorylation, Science, 313 (2006) 976–979.
28 L. Scott, J. Lamb, S. Smith and D.N. Wheatley, Single amino acid (arginine) deprivation: Rapid and selective death of cultured transformed and malignant cells, Br. J. Cancer, 83(6) (2000) 800–810.
29 S.-E. Ong, I. Kratchmarova and M. Mann, Properties of 13C-substituted arginine in stable isotope labeling by amino acids in cell culture (SILAC), J. Proteome Res., 2(2) (2003) 173–181.
30 S.-E. Ong and M. Mann, A practical recipe for stable isotope labeling by amino acids in cell culture (SILAC), Nat. Protoc., 1(6) (2006) 2650–2660.
31 A.P. Snijders, M.G. de Vos and P.C. Wright, Novel approach for peptide quantitation and sequencing based on 15N and 13C metabolic labeling, J. Proteome Res., 4(2) (2005) 578–585.
32 B. Blagoev, S.-E. Ong, I. Kratchmarova and M. Mann, Temporal analysis of phosphotyrosine-dependent signaling networks by quantitative proteomics, Nat. Biotechnol., 22 (2004) 1139–1145.
33 R.D. Smith, L. Pasa-Tolic, M.S. Lipton, P.K. Jensen, G.A. Anderson, Y. Shen, T.P. Conrads, H.R. Udseth, R. Harkewicz, M.E. Belov, C. Masselon and T.D. Veenstra, Rapid quantitative measurements of proteomes by Fourier transform ion cyclotron resonance mass spectrometry, Electrophoresis, 22 (2001) 1652–1668.
34 T.P. Conrads, K. Alving, T.D. Veenstra, M.E. Belov, G.A. Anderson, D.J. Anderson, M.S. Lipton, L. Pasa-Tolic, H.R. Udseth, W.B. Chrisler, B.D. Thrall and R.D. Smith, Quantitative analysis of bacterial and mammalian proteomes using a combination of cysteine affinity tags and 15N-metabolic labeling, Anal. Chem., 73(9) (2001) 2132–2139.
35 R.D. Smith, G.A. Anderson, M.S. Lipton, L. Pasa-Tolic, Y. Shen, T.P. Conrads, T.D. Veenstra and H.R. Udseth, An accurate mass tag strategy for quantitative and high-throughput proteome measurements, Proteomics, 2 (2002) 513–523.
36 J.K. Kim, K. Harada, T. Bamba, E.-i. Fukusaki and A. Kobayashi, Stable isotope dilution-based accurate comparative quantification of nitrogen-containing metabolites in Arabidopsis thaliana T87 cells using in vivo 15N-isotope enrichment, Biosci. Biotechnol. Biochem., 69(7) (2005) 1331–1340.
37 K. Harada, E. Fukusaki, T. Bamba, F. Sato and A. Kobayashi, In vivo 15N-enrichment of metabolites in suspension cultured cells and its application to metabolomics, Biotechnol. Prog., 22(4) (2006) 1003–1011.
38 W.R. Engelsberger, A. Erban, J. Kopka and W.X. Schulze, Metabolic labeling of plant cell cultures with K15NO3 as a tool for quantitative analysis of proteins and metabolites, Plant Methods, 2 (2006) 14–25.
39 V. Lanquar, L. Kuhn, F. Lelievre, M. Khafif, C. Espagne, C. Bruley, H. Barbier-Brygoo, J. Garin and S. Thomine, 15N-metabolic labeling for comparative plasma membrane proteomics in Arabidopsis cells, Proteomics, 7(5) (2007) 750–754.
40 A.D. Hegeman, C.F. Schulte, Q. Cui, I.A. Lewis, E.L. Huttlin, H. Eghbalnia, E.L. Ulrich, A.C. Harms, J.L. Markley and M.R. Sussman, Stable isotope assisted assignment of elemental compositions for metabolomics, Anal. Chem., 79 (2007) 6912–6921.
41 A. Gruhler, W.X. Schulze, R. Matthiesen, M. Mann and O.N. Jensen, Stable isotope labeling of Arabidopsis thaliana cells and quantitative proteomics by mass spectrometry, Mol. Cell Proteomics, 4(11) (2005) 1697–1709.
42 G.T. Cantin, J.D. Venable, D. Cociorva and J.R. Yates, 3rd, Quantitative phosphoproteomic analysis of the tumor necrosis factor pathway, J. Proteome Res., 5(1) (2006) 127–134.
43 Y.K. Wang, Z. Ma, D.F. Quinn and E.W. Fu, Inverse 15N-metabolic labeling/mass spectrometry for comparative proteomics and rapid identification of protein markers/targets, Rapid Commun. Mass Spectrom., 16 (2002) 1389–1397.
44 F.W. Studier, Protein production by auto-induction in high density shaking cultures, Protein Expr. Purif., 41(1) (2005) 207–234.
45 M.A. Grusak and S. Pezeshgi, Uniformly 15N-labeled soybean seeds produced for use in human and animal nutrition studies: Description of a recirculating hydroponic growth system and whole plant nutrient and environmental requirements, J. Sci. Food Agric., 64(2) (1994) 223–230.
46 E.L. Hendrickson, Q. Xia, T. Wang, J.A. Leigh and M. Hackett, Comparison of spectral counting and metabolic stable isotope labeling for use with quantitative microbial proteomics, Analyst, 131(12) (2006) 1335–1341.
47 L. Li, Q. Li, L. Rohlin, U. Kim, K. Salmon, T. Rejtar, R.P. Gunsalus, B.L. Karger and J.G. Ferry, Quantitative proteomic and microarray analysis of the archaeon Methanosarcina acetivorans grown with acetate versus methanol, J. Proteome Res., 6(2) (2007) 759–771.
48 N. Nirmalan, P.F. Sims and J.E. Hyde, Quantitative proteomics of the human malaria parasite Plasmodium falciparum and its application to studies of development and inhibition, Mol. Microbiol., 52 (2004) 1187–1199.
49 J.D. Venable, J. Wohlschlegel, D.B. McClatchy, S.K. Park and J.R. Yates, 3rd, Relative quantification of stable isotope labeled peptides using a linear ion trap-orbitrap hybrid mass spectrometer, Anal. Chem., 79(8) (2007) 3056–3064.
50 T.C. Hunter, L. Yang, H. Zhu, V. Majidi, E.M. Bradbury and X. Chen, Peptide mass mapping constrained with stable isotope-tagged peptides for identification of protein mixtures, Anal. Chem., 73 (2001) 4891–4902.
51 A.P. Snijders, J. Walther, S. Peter, I. Kinnman, M.G. de Vos, H.J. van de Werken, S.J. Brouns, J. van der Oost and P.C. Wright, Reconstruction of central carbon metabolism in Sulfolobus solfataricus using a two-dimensional gel electrophoresis map, stable isotope labelling and DNA microarray analysis, Proteomics, 6(5) (2006) 1518–1529.
52 J.J. Benschop, S. Mohammed, M. O’Flaherty, A.J.R. Heck, M. Slijper and F.L.H. Menke, Quantitative phosphoproteomics of early elicitor signaling in Arabidopsis, Mol. Cell Proteomics, Feb. 21 (2007) (Epub ahead of print).
53 J.A. Vogt, K. Schroer, K. Holzer, C. Hunzinger, M. Klemm, K. Biefang-Arndt, S. Schillo, M.A. Cahill, A. Schrattenholz, H. Matthies and W. Stegmann, Protein abundance quantification in embryonic stem cells using incomplete metabolic labelling with 15N amino acids, matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry, and analysis of relative isotopologue abundances of peptides, Rapid Commun. Mass Spectrom., 17 (2003) 1273–1282.
54 W.X. Schulze and M. Mann, A novel proteomic screen for peptide–protein interactions, J. Biol. Chem., 279(11) (2004) 10756–10764.
55 J.S. Andersen, Y.W. Lam, A.K. Leung, S.E. Ong, C.E. Lyon, A.I. Lamond and M. Mann, Nucleolar proteome dynamics, Nature, 433 (2005) 77–83.
56 S.J. Berger, S.W. Lee, G.A. Anderson, L. Pasa-Tolic, N. Tolic, Y. Shen, R. Zhao and R.D. Smith, High-throughput global peptide proteomic analysis by combining stable isotope amino acid labeling and data-dependent multiplexed-MS/MS, Anal. Chem., 74 (2002) 4994–5000.
57 P.A. Everley, J. Krijgsveld, B.R. Zetter and S.P. Gygi, Quantitative cancer proteomics: Stable isotope labeling with amino acids in cell culture (SILAC) as a tool for prostate cancer research, Mol. Cell Proteomics, 3 (2004) 729–735.
58 M. Bruggert, T. Rehm, S. Shanker, J. Georgescu and T.A. Holak, A novel medium for expression of proteins selectively labeled with 15N-amino acids in Spodoptera frugiperda (Sf9) insect cells, J. Biomol. NMR, 25(4) (2006) 335–348.
59 Y. Ishihama, T. Sato, T. Tabata, N. Miyamoto, K. Sagane, T. Nagasu and Y. Oda, Quantitative mouse brain proteomics using culture-derived isotope tags as internal standards, Nat. Biotechnol., 23(5) (2005) 617–621.
60 A.P. Snijders, B. de Koning and P.C. Wright, Relative quantification of proteins across the species boundary through the use of shared peptides, J. Proteome Res., 6(1) (2007) 97–104.
61 M.K. Doherty and R.J. Beynon, Protein turnover on the scale of the proteome, Expert Rev. Proteomics, 3(1) (2006) 97–110.
62 A.P. Snijders, B. de Koning and P.C. Wright, Perturbation and interpretation of nitrogen isotope distribution patterns in proteomics, J. Proteome Res., 4(6) (2005) 2185–2191.
63 J.M. Pratt, J. Petty, I. Riba-Garcia, D.H.L. Robertson, S.J. Gaskell, S.G. Oliver and R.J. Beynon, Dynamics of protein turnover, a missing dimension in proteomics, Mol. Cell Proteomics, 1 (2002) 579–591.
64 G.C. Thorne, S.J. Gaskell and P.A. Payne, Approaches to the improvement of quantitative precision in selected ion monitoring: High resolution applications, Biomed. Mass Spectrom., 11 (1984) 415–420.
65 M.J. MacCoss, C.C. Wu, H. Liu, R. Sadygov and J.R. Yates, 3rd, A correlation algorithm for the automated quantitative analysis of shotgun proteomics data, Anal. Chem., 75(24) (2003) 6912–6921.
66 C. Pan, G. Kora, D.L. Tabb, D.A. Pelletier, W.H. McDonald, G.B. Hurst, R.L. Hettich and N.F. Samatova, Robust estimation of peptide abundance ratios and rigorous scoring of their variability and bias in quantitative shotgun proteomics, Anal. Chem., 78(20) (2006) 7110–7120.
67 C. Pan, G. Kora, W.H. McDonald, D.L. Tabb, N.C. VerBerkmoes, G.B. Hurst, D.A. Pelletier, N.F. Samatova and R.L. Hettich, ProRata: A quantitative proteomics program for accurate protein abundance ratio estimation with confidence interval evaluation, Anal. Chem., 78(20) (2006) 7121–7131.
68 V.P. Andreev, L. Li, T. Rejtar, Q. Li, J.G. Ferry and B.L. Karger, New algorithm for 15N/14N quantitation with LC-ESI-MS using an LTQ-FT mass spectrometer, J. Proteome Res., 5(8) (2006) 2039–2045.
69 Y. Du, B.A. Parks, S. Sohn, K.E. Kwast and N.L. Kelleher, Top-down approaches for measuring expression ratios of intact yeast proteins using Fourier transform mass spectrometry, Anal. Chem., 78(3) (2006) 686–694.
70 M.J. MacCoss, C.C. Wu, D.E. Matthews and J.R. Yates, 3rd, Measurement of the isotope enrichment of stable isotope-labeled proteins using high resolution mass spectra of peptides, Anal. Chem., 77(23) (2005) 7646–7653.
71 P. Huskey, In: P.F. Cook (Ed.), Enzyme mechanism from isotope effects, Boca Raton, FL, CRC Press, 1991, p. 37.
72 D.N. Perkins, D.J. Pappin, D.M. Creasy and J.S. Cottrell, Probability-based protein identification by searching sequence databases using mass spectrometry data, Electrophoresis, 20 (1999) 3551–3567.
73 A. Kolkman, E.H. Dirksen, M. Slijper and A.J. Heck, Double standards in quantitative proteomics: Direct comparative assessment of difference in gel electrophoresis and metabolic stable isotope labeling, Mol. Cell Proteomics, 4(3) (2005) 255–266.
74 B. Zybailov, A.L. Mosley, M.E. Sardiu, M.K. Coleman, L. Florens and M.P. Washburn, Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae, J. Proteome Res., 5 (2006) 2339–2347.
75 B.J. Cargile, J.L. Bundy, A.M. Grunden and J.L. Stephenson, Synthesis/degradation ratio mass spectrometry for measuring relative dynamic protein turnover, Anal. Chem., 76(1) (2004) 86–97.
76 M.K. Doherty, C. Whitehead, H. McCormack, S.J. Gaskell and R.J. Beynon, Proteome dynamics in complex organisms: Using stable isotopes to monitor individual protein turnover rates, Proteomics, 5 (2005) 522–533.
77 N. Gustavsson, B. Greber, T. Kreitler, H. Himmelbauer, H. Lehrach and J. Gobom, A proteomic method for the analysis of changes in protein concentrations in response to systemic perturbations using metabolic incorporation of stable isotopes and mass spectrometry, Proteomics, 5 (2005) 3563–3570.
78 M. Gronborg, T.Z. Kristiansen, A. Stensballe, J.S. Andersen, O. Ohara, M. Mann, O.N. Jensen and A. Pandey, A mass spectrometry-based proteomic approach for identification of serine/threonine-phosphorylated proteins by enrichment with phospho-specific antibodies: Identification of a novel protein, Frigg, as a protein kinase A substrate, Mol. Cell Proteomics, 1(7) (2002) 517–527.
79 S.B. Ficarro, M.L. McCleland, P.T. Stukenberg, D.J. Burke, M.M. Ross, J. Shabanowitz, D.F. Hunt and F.M. White, Phosphoproteome analysis by mass spectrometry and its application to Saccharomyces cerevisiae, Nat. Biotechnol., 20(3) (2002) 301–305.
Plate 12 Correlation strategy for relative quantification of isotopically labeled peptides following LC–MS analysis. Note: The correlation strategy for relative quantification first relies on the identification of the peptide sequence from assignment of heavy or light species MS/MS data. Once the formula of the peptide is known the spacing between the monoisotopic peaks of the heavy and light envelopes is defined (A) and extracted ion chromatograms (EICs) can be generated (B). Each pair of heavy and light datapoints along the elution time axis of the EICs is then plotted against each other (C) as illustrated by the colored points in (B) and their corresponding point in (C). Because the heavy and light labeled peptides coelute exactly, the points in (C) form a line. A linear regression (D) of this plot provides the ratio of heavy to light as the slope and an estimate of noise/baseline as the y-intercept. The correlation coefficient from the regression analysis can be used to evaluate the success of the operation allowing rapid assessment without visual inspection. (For Black and White version, see page 494.)
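The regression step described in this caption can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function name correlate_eics and the intensity values are hypothetical, the paired EIC intensities are assumed to have been extracted already, and an ordinary least-squares fit of heavy against light intensity is used. It is not the implementation of any particular quantification software.

```python
import math


def correlate_eics(light, heavy):
    """Fit heavy = slope * light + intercept over paired EIC data points.

    Returns (slope, intercept, r): the heavy/light ratio, the
    baseline estimate, and the Pearson correlation coefficient.
    """
    if len(light) != len(heavy) or len(light) < 3:
        raise ValueError("need at least three paired intensity points")
    n = len(light)
    mean_l = sum(light) / n
    mean_h = sum(heavy) / n
    sxx = sum((x - mean_l) ** 2 for x in light)
    syy = sum((y - mean_h) ** 2 for y in heavy)
    sxy = sum((x - mean_l) * (y - mean_h) for x, y in zip(light, heavy))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("intensities must vary across the elution profile")
    slope = sxy / sxx
    intercept = mean_h - slope * mean_l
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical co-eluting peak: heavy signal is twice the light signal
# plus a constant baseline of 5 intensity units.
light_eic = [0.0, 10.0, 40.0, 100.0, 40.0, 10.0, 0.0]
heavy_eic = [2.0 * x + 5.0 for x in light_eic]

ratio, baseline, r = correlate_eics(light_eic, heavy_eic)
print(f"heavy/light = {ratio:.3f}, baseline = {baseline:.3f}, r = {r:.4f}")
```

In practice, a threshold on r could serve as the automated quality check mentioned in the caption; any specific cutoff would need to be chosen empirically.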
Plate 13 Distribution of relative error associated with ratios from individual peptides. Note: Displayed above is a density plot indicating the distribution of relative error associated with specific peptides that were quantified by full or partial metabolic labeling, respectively. These peptides come from mixtures of labeled and unlabeled Arabidopsis proteins that were combined at a range of specific known ratios to judge the accuracy of each technique. See Huttlin et al. [22] for the details. Measurements obtained via partial metabolic labeling are plotted in shades of cyan, whereas those obtained via full metabolic labeling are plotted as magenta. The intensity of the color is proportional to the density of individual peptide measurements in that particular region of the plot. When the cyan and magenta regions overlap, purple shades are produced. These indicate regions where measurements from both full and partial metabolic labeling are present in significant numbers. Note that most measurements are clustered in a fairly small area, with most displaying an error of less than 20–30%. Overall, the distributions of errors associated with full and partial metabolic labeling are quite similar, although values from partial metabolic labeling tend to display slightly higher errors at small ratios. (For Black and White version, see page 505.)
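For readers who want to reproduce this kind of summary from their own known-ratio mixtures, the sketch below shows one plausible way to compute a per-peptide relative error. The definition used here, (measured - expected) / expected, is an assumption, as are the function name and the example values; the exact calculation behind the plate is described in Huttlin et al. [22].

```python
def relative_error(measured_ratio, expected_ratio):
    """Signed relative error of a measured ratio against the known mixing ratio."""
    if expected_ratio <= 0:
        raise ValueError("expected ratio must be positive")
    return (measured_ratio - expected_ratio) / expected_ratio


# Hypothetical (expected, measured) ratio pairs for four peptides.
peptide_ratios = [(1.0, 1.1), (1.0, 0.85), (5.0, 6.0), (10.0, 15.0)]

errors = [relative_error(measured, expected) for expected, measured in peptide_ratios]

# Fraction of peptides whose absolute relative error is at most 30%.
fraction_within_30 = sum(abs(e) <= 0.30 for e in errors) / len(errors)
print(f"{fraction_within_30:.0%} of peptides fall within 30% relative error")
```

With these made-up values, three of the four peptides fall within the 30% band, and the fourth (expected 10, measured 15) lies outside it.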
SUBJECT INDEX
accuracy, 8, 13–14, 16–17, 25, 28–30, 45, 65, 121, 132, 146–147, 174, 180, 182, 192, 180–181, 182, 199, 215–220, 226, 233–234, 237, 276, 309, 360, 376, 395, 412, 414–419, 422, 425–426, 437, 439, 459, 471, 474–475, 477, 486, 495, 504–506 accurate, 13, 29, 66, 104, 128–129, 132, 134, 182, 214, 237, 246, 295, 354, 391, 414–416, 418–419, 424, 433, 439, 459, 473–475, 506 acid-labile, 324, 328, 333, 335, 337, 339, 344 adduct, 5, 7, 10, 58, 68, 182, 339 backbone amide, 85–88, 93, 98, 117, 376, 378–380 bottom-up, 179–183, 216, 218, 220, 255, 299, 335 CAD, 34, 189–190, 192, 391–392, 397–398 carbodiimides, 201 charge, 5, 13, 16–17, 19–21, 27, 37–38, 64–65, 68, 70, 73–74, 76–77, 108, 113–115, 119–120, 177, 198–199, 201, 203, 215, 216, 219, 225, 229, 263, 284, 301, 303, 304, 305, 367, 379–381, 387–388, 402, 440, 457, 471, 482 charge state, 5, 9–12, 17, 19, 38, 69, 73–75, 77, 111, 113, 115, 119, 199, 219, 284, 367, 397 charge-state distribution, 49–50, 52, 55, 59 chromatography, 12–13, 40, 42, 66, 93, 104, 174, 184, 187, 189, 198, 208, 216, 218–219, 225, 238, 255, 257, 277, 279–280, 282, 289, 301–302, 310, 330–331, 335, 338, 358, 361–366, 380–381, 386–387, 394, 422–423, 425, 429, 432–433, 451, 455–457, 459, 462, 469, 494, 509 cleavage, 37–38, 67, 152, 155, 172, 183, 199, 214, 219, 223–224, 229, 234, 255, 262–263, 265, 279, 369, 384–385, 389, 393, 395–397, 403, 407, 419, 422, 424, 444, 465 collisionally-activated, 35, 391–393, 397 conformation, 48, 52, 54–55, 59 correlation, 414, 431, 468–470, 475, 494, 503
Coulomb repulsion, 73 cross section, 66, 71, 73–74, 77 crosslink, 247, 250, 256, 260, 262–263, 265–267 Dalton, 5, 131, 395, 482 deamidation, 376–383, 385–391, 394–395, 397–402 denaturant, 128–129, 131, 135, 142–143, 145–146, 154 denaturation, 48 denatured, 85, 87, 112, 114, 119, 132, 134, 235, 334 desorption, 2, 9, 14–15, 21, 34, 92, 147–148, 152, 198, 216, 238, 265, 268, 282, 299, 310, 338, 410, 443–444, 450, 480 detectors, 21, 26–27, 30–31, 66, 459, 471, 475 disorder, 49, 52, 54–55 dissociation, 25, 34–36, 38, 40, 43, 52, 58, 65–66, 78, 97–98, 110, 113, 128, 136, 138–139, 141, 143, 173, 189–192, 217–218, 238, 267–268, 283–284, 303, 309–310, 326, 358, 361, 366, 368, 391, 395–396, 412 ECD, 34–35, 38–40, 65–66, 75, 189–190, 192–193, 217–218, 233–235, 238, 267, 309, 366–368, 395–398, 402 electron multiplier, 4, 24, 30, 32 electron-capture, 46, 65–66, 75–76, 217–218, 233, 234, 235, 267, 374, 395–397, 397–398 electron-transfer, 34, 38–40, 218, 233, 278, 284–285, 293, 368, 396–397 electrospray, 2, 7, 9–10, 16–17, 61–62, 64, 73, 89, 92, 104, 149, 152, 155, 181, 183, 188, 198, 216, 238, 251, 268, 283, 299, 310, 335, 340–341, 343, 372, 379, 391, 413, 450, 480 ESI, 2–4, 9, 13, 17–19, 30, 47–55, 57–60, 64, 92, 104–108, 113, 115–116, 119, 121, 152, 183–184, 188–189, 198, 206, 216, 218, 220, 238, 251, 263, 268, 299, 304, 310, 335, 341, 393, 413–414, 450, 453, 457, 460, 480 ETD, 34–35, 38, 40, , 190, 193, 218, 233, 238, 284–286, 293, 309, 368, 396–397 EX1, 101, 118, 134
EX2, 118, 132, 134–135, 142 exchange, 83–93, 95–100, 112, 117–119, 127–132, 134–138, 140–146, 152–154, 203, 214, 218–219, 234–238, 259, 282, 290, 301, 308, 311, 349, 387–388, 444, 454–457, 496–497 fourier-transform, 27–29, 189 gas phase, 2, 4, 15, 17, 19, 21, 23, 34–38, 40, 49, 58, 75, 113, 121, 173–174, 198, 216–217, 220, 267, 303, 396, 398, 452 GlcNAc, 359–361, 363, 365–366 heterogeneity, 48, 51–52, 54–55, 58–59 histidine, 316–319, 321–329, 331–339, 341–342, 344–346 hybrid, 35, 82, 123, 160, 190, 192, 194, 217, 220, 251, 295, 309, 341, 495, 508 hydrogen bond, 85, 87, 90–91, 118–119, 153, 214, 234–236, 259, 267, 324, 398 hydrogen-deuterium, 84, 124–125, 149, 173, 234–236, 259 hydroxyl, 151–152, 155–164, 166–168, 170–171, 202, 250, 318, 358, 376, 379, 392 ICAT, 420–422, 425, 429–430, 433–434, 451, 481, 494, 506 identification, 3, 12, 37, 40, 65, 96, 122, 172, 174, 183, 191, 207, 209, 215–217, 219–221, 225, 229–230, 233–235, 237–238, 247, 250, 254–260, 263, 265–266, 276, 282, 289, 291, 302–303, 308, 310, 335, 341, 344, 360–362, 366–368, 399, 412–413, 418, 421–425, 427–429, 431–434, 436, 450, 453, 459–460, 470, 475, 480, 488, 492, 494, 496, 499, 502–503 IMAC, 277, 279–280, 282, 284, 301–302, 309–310, 338–339, 358, 422, 436, 509 intact mass tag, 183, 182, 183 intact protein, 40, 79, 89, 98, 113, 115–116, 179–181, 183, 189, 192, 299, 370, 420, 424, 433, 451, 489 intensity, 5, 10, 40, 42, 68, 105, 107, 109, 111, 113, 119, 202, 205, 231–232, , 285, 289–290, 303, 359–360, 366, 381, 390–393, 397, 440, 450, 452, 460, 471, 473, 480, 496, 501, 503, 505 Interact, 252, 254, 258 intermediates, 104, 109, 116–117, 119–120, 145–146, 158, 243, 371, 387, 399 interpretation, 10–11, 17, 19, 36, 38, 93, 95, 121, 172, 190–192, 226, 236–238, 263, 309,
352, 360, 368, 410, 417, 422, 459, 484, 495, 497 ion current, 5, 13, 30, 35, 107, 255, 285, 413 ion stability, 35–36 ionization, 2–6, 9, 12, 14–17, 34, 37, 40, 47–48, 64, 89, 92, 104, 107, 113, 147, 149, 152, 180–181, 183, 187–189, 194, 198, 216, 218–219, 238, 247, 251, 263, 265–266, 268, 276, 282–283, 299, 310, 372, 391, 412–413, 415, 425, 450, 474, 480–481, 508 ion-mobility, 66, 71 isoAsp, 376, 378, 381–384, 392, 395, 398 isobaric, 450–456, 459, 461–462, 464, 466, 474, 481, 501 isotopic, 105, 214, 216–217, 230–233, 235–236, 255, 287–289, 306, 364, 370, 376, 380, 390–391, 394, 399–400, 402, 416, 420, 422–425, 435, 450–451, 480–489, 491–501, 503, 506–508 isotopomer, 11, 482, 486 iTRAQ, 423–425, 430, 435, 452, 454–456, 464, 474, 481, 506 Kinase, 276, 291, 298, 300, 306, 308, 316, 319–321, 324–327, 329, 331–334, 344, 356, 383, 435, 513 kinetics, 103, 107, 109–112, 117, 119–121, 147–148, 157, 160–161, 167, 234, 236, 327, 386, 388, 464–466 Labile, 86, 117, 153, 219, 252, 335, 358, 366–367, 385, 399, 422 LC-MALDI, 459–464 lectin, 361–362, 365 Ligand, 74–75 linear ion-trap, 4, 26–27, 31, 26–27 LOPIT, 429–430, 473–474, 476 MALDI, 9, 14–16, 48, 92, 102, 121, 127, 129, 134, 143–144, 147–149, 152, 188, 183, 188, 198–199, 205, 208, 216, 219–220, 232–233, 238, 240, 261, 263, 282, 304, 310, 338–339, 341, 391, 407, 418, 424, 436, 450, 453, 457, 459–460, 480 mapping, 153, 225, 246–247, 251–252, 259–260, 263, 265, 267, 295–296, 304, 352, 354, 357–358, 364, 410, 445, 511 mass measurement, 5, 8, 13–14, 52, 89, 180, 199, 367 membrane, 65, 74–75, 105, 144, 180, 182–184, 187–190, 194, 197–201, 203, 207–209, 214–215, 220–226, 228–229, 231, 233–236, 238, 246, 251, 255–256, 259, 280, 289, 291,
320, 326–328, 362, 382–383, 412, 420, 422, 424–425, 428, 430, 434, 463, 467, 511, 513 metabolic, 416–420, 427, 433, 435–437, 481–487, 489–509 misfolding, 64, 67 mixing, 104–109, 113, 118–119, 121, , 334, 459, 483, 485, 501–503 multiple charging, 48, 52, 55, 58 multiple reaction monitoring, 20, 42 negative ion, 69 neutral, 284, 303–304, 318, 335, 338, 360, 366–367, 369, 376, 379, 396, 423, 452 non-covalent, 63–79 O-glycosyl, 279, 363 Oligomer, 67–71, 76 orbitrap, 4, 7, 14, 29–31, 45, 190, 193–194, 196, 190, 193–194 oxidation, 155–160, 162–172, 228–229, 418 parent ion, 20, 24, 34–36, 38, 41–43, 217, 261 partial, 483, 485–487, 489, 491–492, 495, 498–507 peaklist, 191, 191, 193 peptide mass, 180–181, 180–181 peptides, 3, 9, 11, 13, 16, 34, 36–38, 40–41, 48–51, 63–64, 67, 89, 92–93, 98–99, 110, 132, 138, 142, 152, 154, 161, 179–180, 182–183, 198–203, 205–207, 209, 216–220, 222–226, 228, 230–237, 247, 250, 252, 255, 257, 260–263, 265–267, 276–277, 279–280, 282–285, 287–291, 293, 299–301, 303–306, 308–310, 318, 335, 337–339, 341–344, 358–363, 365–366, 368–369, 377, 381–383, 385–401, 413–425, 429–431, 433, 435–436, 450–452, 454, 456–457, 459, 461–465, 470–474, 480–482, 487–489, 491–492, 494–506 phenylalanine, 158, 162, 167–168 phosphatase, 280, 303, 329–330, 356, 365 phospho, 275–292, 310, 318, 335, 354, 355–356, 358, 363–365, 370, 384, 418, 420, 423, 435–436, 464–465 photolysis, 163 post-translational modification, 38, 40, 66, 84, 152, 183, 238, 257, 259, 275, 316, 354, 408, 412, 420, 435 post-translational modification, 451, 454, precursor, 5, 38, 57, 112, 189–190, 217, 256, 283–284, 303, 358–360, 362–363, 366–369, 376, 415, 419, 454, 459–461, 475, 491–492, 498–503 pre-steady, 109, 112,
product, 25, 36, 41, 43, 66, 110, 153, 158–159, 180–181, 189–190, 192, 198, 204, 234, 247, 251, 256, 262, 265, 279, 284, 323, 325, 331, 360, 376, 379, 394, 413, 465, 499 product ion, 20, 41, 43, 66, 152, 284, 362, 413, 499 protein dynamics, 83–84, 86, 91, 125, 127–146, 174, 197–208, 243, 272, 277–285, 297–310, 316–346, 354–369, 376–401, 420, 435, 437, 452–457, 467–476, 480–508 proteinase, 224–225, 385 quadrupole, 4, 6–7, 14, 17–19, 22–26, 30–31, 35–36, 42, 65, 107, 189–190, 217, 238, 268, 295, 303, 309, 358, 360, 362, 368, 391, 454, 461, 475, 494 quadrupole ion-trap, 23–25 quantitative, 305–306, 310 quench, 88–89, 92–93, 123 quenching, 168, 235–236, 255, 390, 456 radiation, 155, 161, 203, 206, 230, 261 radical, 155–163, 167–168, 396 receptor, 74–75 resolution, 5–9, 11–13, 19, 21–23, 25, 27–29, 65, 93, 104, 107, 109, 112–113, 121, 147, 152, 167, 199, 214, 216–218, 221, 226, 232–235, 237, 246, 259, 287, 303, 309, 348, 376, 381, 388, 391, 395–396, 412, 415, 424, 430, 435, 438, 459, 468–469, 482, 495, 508 reverse-phase, 13, 184, 331, 336, 408, 455, 457, 459 sample preparation, 34, 64, 183–184, 199, 204, 208, 218, 220, 223, 226, 267, 289–291, 305, 310, 335, 341, 363, 414–415, 431, 480–481, 485–487, 496, 506, 508 scan modes, 35, 40, 42 selected reaction monitoring, 20 separation, 6–7, 11, 13, 20–21, 58, 65, 93–94, 98, 118, 142, 182, 184, 187, 199, 216–219, 225, 230–231, 233–236, 248, 282, 290–291, 331, 380, 385–388, 390, 394, 396, 425, 431, 436, 438, 450, 456–457, 459, 476, 506, 508 sequence, 19, 34, 40–41, 59, 75, 84, 92, 96, 110, 132, 135, 152, 179–180, 189–193, 201, 205, 208, 215, 218–221, 226, 228, 237, 247, 257–258, 267, 275, 277, 280, 283–284, 291, 299–300, 304, 309, 323–325, 328, 338, 342, 345, 358–360, 366, 376–377, 380–381, 386, 389, 397–398, 400, 414, 416–417, 422, 425, 451, 453, 459, 471, 482, 484, 489, 494, 501 sequence tag, 195, 191, 193
SILAC, 305–308, 310–311, 418–420, 434–435, 437, 451, 474, 483–486, 488–494, 496, 506 size-exclusion, 66, 184, 198, 208 stoichiometry, 298–299 substrate, 201–203, 205–206, 208–209, 229–231, 298, 323, 327, 329–331, 334, 337, 345, 363, 385, 387, 400, 456, 513 suprex, 127–138, 140–146, 154 tandem mass spectrometry, 4, 34, 40, 45–46, 80, 121, 153, 154, 157, 179, 183, 189, 194–195, 179–180, 183, 189, 238, 256, 268, 273, 294–295, 299, 311, 352, 374, 381, 403, 406, 409–410, 444, 447, 457, 470, 499 time-of-flight, 4, 20, 61, 65, 80, 107, 152, 183, 183, 188, 189, 198–199, 208, 216, 239, 265, 268, 273, 295, 309, 338, 352, 358, 373, 385, 450, 471, 475, 512 time-resolved, 104–108, 110–121, 175, 242, 296, 447 TiO2, 277, 279–280, 282, 284, 289, 301, 308
TOF-TOF, 458, 461 top-down, 40, 66, 179–180, 183, 189–194, 216, 267, 299, 309, 335, 339, 391, 495 Topology, 199–200, 209, 214–216, 219, 223–227, 229, 231, 233, 236–237, 246–247, 259–260, 262–263, 265–266 transmembrane, 180, 187, 190, 192, 199–200, 214, 224, 239 traveling wave, 66 trypsin, 37, 92, 172, 183, 200, 219–220, 223, 225–226, 228, 230–231, 234, 253, 255, 265, 288, 290–291, 299, 305, 368, 385, 394, 399–400, 424–425, 436, 451, 455–456, 464, 496 turnover, 416, 435–437, 462, 491–493, 507–508 two-component, 316, 318–320, 323, 344 tyrosine, 297–300, 303, 305–306, 308–310 unfolding, 48–52, 58, 87–88, 91, 113, 115, 117, 128, 132, 138, 140–141, 143, 145–146, 156–157, 167–168, 223, 380, 402