ACCELERATION AND IMPROVEMENT OF PROTEIN IDENTIFICATION BY MASS SPECTROMETRY
Acceleration and Improvement of Protein Identification by Mass Spectrometry Edited by
WILLY VINCENT BIENVENUT Biochemistry Institute, Protein Analysis Facility, Lausanne University, Switzerland
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN 1-4020-3318-4 (HB) ISBN 1-4020-3319-2 (e-book)
Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands. Sold and distributed in North, Central and South America by Springer, 101 Philip Drive, Norwell, MA 02061, U.S.A. In all other countries, sold and distributed by Springer, P.O. Box 322, 3300 AH Dordrecht, The Netherlands.
Printed on acid-free paper
All Rights Reserved © 2005 Springer No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Printed in the Netherlands.
"Conformity is the jailer of freedom and the enemy of growth." John F. Kennedy
DEDICATION
To my parents for their help and support during all the years.
TABLE OF CONTENTS
PREFACE
ACKNOWLEDGEMENTS
LIST OF CONTRIBUTORS
CHAPTER 1
W.V. Bienvenut
Introduction: Protein analysis using mass spectrometry
1. Introduction: from genome to proteomic analysis
2. Protein separation
2.1. Introduction
2.2. Electrophoretic separation
2.2.1. Gel separation
2.2.1.1. Molecular mass separation
2.2.1.2. Isoelectric focusing separation technique
2.2.1.3. Bi-Dimensional separation technique
2.2.1.4. Visualisation/staining methods for gel separated proteins
Organic dyes
Metallic ions staining
Covalently immobilized dyes
Radioisotope labelling
2.2.2. Capillary electrophoresis separation
2.3. Liquid chromatography
2.4. Multidimensional chromatography separation
2.5. Conclusion
3. Proteins electroblotting from gel to polymer membrane
3.1. Introduction
3.2. Transfer systems
3.3. Composition and influence of the blotting buffer and solvents
3.3.1. Buffers composition
3.3.2. Effects of the SDS and methanol contained in the buffer solution
3.3.3. Other influencing parameters
3.4. Membranes staining
3.4.1. Non-denaturing organic staining process
3.4.2. Radiolabelled protein detection
3.4.3. Denaturing staining process
3.5. Conclusion
4. Proteins identification
4.1. Introduction
4.2. Nuclear proteins identification procedures
4.3. Enzymatic cleavage of proteins
4.3.1. Introduction
4.3.2. Enzymatic cleavage description
4.3.2.1. Treatment and digestion of transblotted proteins
4.3.2.2. Treatment and digestion of gel separated proteins
4.3.2.3. Utilisation of immobilized endoproteinases
4.3.3. Trypsin
4.3.3.1. Enzymatic activity measurement
4.3.3.2. Cleavage specificity
4.3.4. Endoproteinase Lys-C
4.3.5. Chymotrypsin
4.3.6. Pepsin
4.3.7. Bacterial endopeptidases
4.3.8. Conclusion
4.4. Chemical cleavage of the proteins
4.4.1. Introduction
4.4.2. Acidic hydrolysis
4.4.3. Cyanogen bromide
4.4.4. Cleavage at the carbonyl side of the Trp
4.4.5. Cleavage at Cys residues
4.4.6. Conclusion
4.5. Sample preparation and clean-up for MALDI-MS analysis
4.5.1. Chromatographic treatment
4.5.2. Preparation of Samples for MALDI-MS analysis
4.5.2.1. Dry droplet method
4.5.2.2. Spin-coated drying
4.5.2.3. Slow crystallisation
4.5.2.4. Fast evaporation method
4.5.2.5. Crystalline germ method
4.5.2.6. Sprayed matrix
4.5.3. Sample desalting procedures
4.5.4. Conclusion
4.6. Proteins identification using mass spectrometry
4.6.1. Protein identification using PMF technique
4.6.1.1. Method description
4.6.1.2. MALDI-TOF-MS analysis technique
The laser
The matrix
The co-matrix
4.6.1.3. MALDI ionisation mechanism
4.6.1.4. Time of flight separation of the ions and its improvement
4.6.1.5. Signal detection and data acquisition
4.6.1.6. Separation and detection using ICR-FT
4.6.1.7. Signal reproducibility
4.6.1.8. Suppression effects
4.6.1.9. Quantification by mass spectrometry
4.6.1.10. Data treatment for protein identification
MALDI-MS spectrum calibration
Identification tools
Principal affecting factors during data processing
Other possible criteria for protein identification not directly integrated to identification tools
Interpretation of results and limits of validity
4.6.2. Protein identification from internal peptide sequence
4.6.2.1. Introduction
4.6.2.2. ESI-MS/MS analysis
5. Advanced techniques for protein identification
5.1. Introduction
5.2. Chemical modifications
5.2.1. Introduction
5.2.2. Reaction involving free amino groups of the peptides/proteins
5.2.2.1. Acetylation of the amino groups
5.2.2.2. Lys specific reactions
5.2.2.3. Iso-thiocyanate treatment of the free amino groups for N-terminal cleavage (Edman type reaction)
5.2.3. Reaction involving free carboxylic groups of the peptides/proteins
5.2.4. Labile hydrogen atoms exchange to deuterium atoms
5.2.5. Cysteine alkylation
5.2.6. Peptides modification using charged modifications
5.2.6.1. Positively charged modifications
5.2.6.2. Negatively charged modifications
5.2.6.3. Conclusion
5.2.7. Stable isotope labelling during the digestion
5.2.8. Conclusion
5.2. Biochemical approach
5.3. In-vivo labelling
6. Automated approach
CHAPTER 2
Molecular scanner development: Toward a clinical molecular scanner for proteome research: Parallel protein chemical processing before and during western blot.
Reprinted with permission from Bienvenut, W., Sanchez, J., Karmime, A., Rouge, V., Rose, K., Binz, P., et al. (1999). Toward a clinical molecular scanner for proteome research: parallel protein chemical processing before and during western blot. Anal Chem, 71(21), 4800-4807. Copyright 1999, American Chemical Society.
Abstract
Keywords
1. Introduction
2. Experimental section
2.1. Reagents
2.2. Covalent attachment of trypsin and blocking of the IAV membrane
2.3. Activity measurement of trypsin covalently bound to the IAV membrane
2.4. 1-DE and 2-DE separation
2.5. In Gel Digestion
2.6. On membrane Digestion
2.7. OSDT process
2.8. PIGD
2.9. DPD combined method
2.10. MALDI-TOF-MS
2.11. Post-acquisition processing and software identification tools
3. Results
3.1. Activity measurement of trypsin covalently bound to the IAV membrane
3.2. IGD
3.3. OMD
3.4. OSDT
3.5. PIGD
3.6. DPD applied to 1-DE
3.7. Comparative digestion between OSDT, PIGD and the DPD applied to 2-DE
3.8. DPD applied to 2-DE
4. Discussion and conclusion
5. Acknowledgments
6. References
CHAPTER 3
Quantitation during electroblotting step: Enhanced protein recovery after electrotransfer using square wave alternating voltage.
Reprint by permission of Elsevier Science from Bienvenut, W., Deon, C., Sanchez, J., & Hochstrasser, D. (2002). Enhanced protein recovery after electrotransfer using square wave alternating voltage. Anal Biochem, 307(2), 297-303. Copyright 2002.
Abstract
Keywords
1. Introduction
2. Material and methods
2.1. Mono-dimensional electrophoresis (1-DE)
2.2. Electroblot
2.3. Detection, quantification and statistics
2.4. [14C] signal linearity and influence of the accumulation time
3. Results and discussion
3.1. Comparison of the electric field and buffer composition effects
3.2. Statistical test for the transfer reproducibility
3.3. Gel residual protein after transblotting process
4. Concluding remarks
6. Acknowledgement
7. References
CHAPTER 4
Signal treatment and virtual imaging (1/2): A molecular scanner to automate proteomic research and to display proteome images.
Reprinted with permission from Binz, P., Muller, M., Walther, D., Bienvenut, W., Gras, R., Hoogland, C., et al. (1999). A molecular scanner to automate proteomic research and to display proteome images. Anal Chem, 71(21), 4981-4988. Copyright 1999, American Chemical Society.
Abstract
Keywords
1. Introduction
2. Experimental section
2.1. Materials and reagents
2.2. Description of the method
3. Results and discussion
3.1. Representation of the analysis of a 1-dimensional scan of 1-DE
3.2. Representation of the analysis of a two-dimensional scan from a single band of 1-DE
3.3. Identification by two-dimensional scan of human plasma proteins separated by 2-DE
4. Discussion
5. Conclusion
6. Acknowledgement
7. References
CHAPTER 5
Signal treatment and virtual imaging (2/2): Visualization and analysis of molecular scanner peptide mass spectra.
Reprint by permission of Elsevier Science from Mueller, M., Gras, R., Appel, R. D., Bienvenut, W. V., & Hochstrasser, D. F. (2002). Visualization and analysis of molecular scanner peptide mass spectra. J Am Soc Mass Spectrom, 13(3), 221-231. Copyright 2002, by the American Society of Mass Spectrometry.
Abstract
1. Introduction
2. Methods
3. Results and discussion
3.1. Visualization of spectra
3.2. Chemical noise
3.3. Calibration
3.4. Identification and clustering of masses
4. Conclusion
5. Acknowledgements
6. References
CHAPTER 6
Improvement in the peptide mass fingerprint protein identification (1/2): Hydrogen/deuterium exchange for higher specificity of protein identification by peptide mass fingerprinting.
Reprinted by permission of John Wiley & Sons, Inc., from Bienvenut, W., Hoogland, C., Greco, A., Heller, M., Gasteiger, E., Appel, R., et al. (2002). Hydrogen/deuterium exchange for higher specificity of protein identification by peptide mass fingerprinting. Rapid Commun. Mass Spectrom., 16(6), 616-626. Copyright 2002.
Abstract
1. Introduction
2. Methods
2.1. Chemicals
2.2. Protein separations
2.3. In-gel protein digestion
2.4. MALDI-ToF MS analysis
2.5. H/D exchange on the MALDI sample plate
3. Results and discussion
3.1. Visualization of spectra
3.2. Chemical noise
3.3. Calibration
3.4. Identification and clustering of masses
3.5. Application of the technique to tryptic bovine serum albumin digest
3.6. Application of the technique to an unknown protein digest
4. Discussion and conclusion
4.1. Influence of the matrix compound
4.2. Influence of the physico-chemical characteristic of the solvent
4.3. Influence of the amino acid composition of the peptide
4.4. Application of the technique as a validating and discriminating method
5. Challenge and future developments
6. Acknowledgements
7. References
CHAPTER 7
Improvement in the peptide mass fingerprint protein identification (2/2): MALDI-MS/MS with high resolution and sensitivity for identification and characterization of proteins.
Reprinted by permission of Wiley-Liss, Inc, a subsidiary of John Wiley & Sons, Inc., from Bienvenut, W., Deon, C., Pasquarello, C., Campbell, J., Sanchez, J., Vestal, M., et al. (2002). Matrix-assisted laser desorption/ionization-tandem mass spectrometry with high resolution and sensitivity for identification and characterization of proteins. Proteomics, 2(7), 868-876. Copyright 2002.
Abstract
Keywords
1. Introduction
2. Materials and methods
2.1. Reagents and apparatus
2.2. Protein solubilisation for preparative 2-D PAGE
2.3. 2-D PAGE
2.4. Image analysis
2.5. Protein digestion
2.6. Sample preparation
2.7. Database interrogation
3. Results
3.1. Peptide sequences discrimination
3.2. De Novo sequencing
3.3. Tryptophan oxidation
4. Conclusion
6. Acknowledgements
5. References
CHAPTER 8
Proteomic and mass spectrometry: Some aspects and recent developments.
Bienvenut, W. V., Mueller, M., Palagi, P. M., Gasteiger, E., Heller, M., Jung, E., Giron, M., et al. (2001). Proteomic and mass spectrometry: some aspects and recent developments, In J. N. Housby (Ed.), Mass spectrometry and genomic analysis (1st ed., Vol. 2, pp. 93-145). Dordrecht: Kluwer academic press.
1. Introduction to proteomics
2. Protein biochemical and chemical processing followed by mass spectrometric analysis
2.1. 2-DE gel protein separation
2.2. Protein identification using peptide mass fingerprinting and robots
2.2.1. MALDI-MS analysis
2.2.2. MS/MS analysis
2.2.2.1. MALDI-RETOF-PSD MS analysis
2.2.2.2. ESI-MS/MS analysis
2.2.3. Improvement of the identification by chemical modification of peptides
2.2.3.1. Esterification
2.2.3.2. H/D exchange: quantitation of labile protons on peptides
2.3. The molecular scanner approach
2.3.1. Double parallel digestion process
2.3.2. 14C quantitation of the transferred product and diffusion
2.3.2.1. Comparison of the influence of the electric field on the protein recovery
2.3.2.2. DPD quantification test
3. Protein identification using bioinformatics tools
3.1. Protein identification by PMF tools using MS data
3.1.1. Peak detection
3.1.2. Identification tools
3.2. MS/MS Ions Search
3.3. De novo sequencing
3.4. Other tools related to protein identification
3.5. Data storage and treatment with LIMS
3.6. Concluding remarks
4. Bioinformatics tools for the molecular scanner
4.1. Peak detection and spectrum intensity images
4.2. Protein identification
4.3. Validation of identifications
4.4. Concluding remarks
5. Conclusion
6. Acknowledgements
7. References
CHAPTER 9 Conclusions and perspectives
APPENDIX
Abbreviations used in this book
Abbreviations for usual amino acids and chemical constants
Index
PREFACE
Now that the human genome has been fully sequenced, the need for efficient protein analysis and characterization tools has never been so critical. Firstly, computer algorithms have been used to predict genes, and it is accepted that as many as 10% of them might have been missed. Only final gene products, i.e. the proteins, prove that gene sequences with signal sequences, introns and exons are correct. Secondly, it is nearly impossible at present to predict with high accuracy the final polypeptide product and its co- and post-translational modifications. A protein’s partial characterization therefore allows a definite identification of its processing, such as the amino-acid sequence modification induced by the editing of the mRNA during alternative splicing. Thirdly, there is a lack of correlation between the expression levels of mRNA and proteins. Their respective half-lives are very different, as are their levels of expression.
The major difficulty in analysing proteins is the tremendous diversity of their chemical and other properties. Their concentrations vary by more than 12 orders of magnitude in body fluids and by more than 7 orders of magnitude in cells. For example, in blood, the concentration of albumin is in the millimolar range and that of TNF (tumour necrosis factor) in the femtomolar range. While the pI (isoelectric point) of DNA or mRNA is around 4.2 to 4.5, the pI of proteins extends from less than 3 to more than 12. Whereas the solubility of nucleic acids is excellent, proteins, and especially membrane proteins, can be excessively hydrophobic. Consequently, no single method is available to fully analyse a complex mixture of proteins. In addition, no amplification process such as PCR or RT-PCR exists in the protein world. Therefore, extremely sensitive methods are required to detect low-abundance proteins. Many methods to separate, identify and partially characterize polypeptides have been available for a long time.
Until recently, many of them required a relatively high concentration or amount of protein. Miniaturization of the analytical techniques does not necessarily solve the difficulty in detecting low-abundance proteins. For example, if one injects a volume of nanolitres into equipment with attomole sensitivity, the limit of detection in concentration is still only about nanomolar, way above the concentration of many physiologically relevant proteins. Consequently, the critical step in working with complex protein samples is to select efficient pre-fractionation and separation techniques. Often the best methods are based on affinity pre-purification and a combination of chromatography and/or electrophoresis.
Many approaches could be used to detect the proteins and some of their modifications. Several developments in the field of mass spectrometry offer a new avenue, especially in the area of large-scale protein identification and partial characterization. Multi-compartment equipment allows the precise selection of precursor ions (peptides), their efficient fragmentation and final characterization (sequence and modifications) and can also provide accurate quantification methods. The latest improvements, at both the hardware and software levels, provide fully automated and rapid identification methods.
This book is timely. It reviews in a concise form most techniques that should be known by scientists working in a proteomics laboratory or analysing proteins of interest. It first reviews the electrophoretic and chromatographic separation methods. It then summarizes the quantification and identification methods such as immunoblotting, protein chemistry, peptide fingerprinting or sequencing by fragmentation. Several chapters highlight fascinating developments in the field of mass spectrometry and related techniques. The text gives the reader a perspective on the relatively new field of proteomics. Finally, the book lists numerous references to critical work done many years ago and unavailable in computer databases. It should therefore be part of every laboratory’s library.
Prof. Denis F. HOCHSTRASSER
ACKNOWLEDGEMENTS
First of all, I would like to thank all the people, scientists and non-scientists alike, who have contributed to the development of this work. Secondly, I would like to give special thanks to:
- Professor Denis F. Hochstrasser from the Medical University Department of Pathology, Science University at the Department of Pharmacology and responsible for the Clinical Chemistry Central Laboratory at the cantonal hospital of Geneva (Switzerland) for accommodating a research position in his laboratory, thereby improving my knowledge of the chemist’s role in protein chemistry, biochemistry and mass spectrometry techniques;
- Dr Jean-Charles Sanchez, responsible for the bi-dimensional electrophoresis laboratory at the cantonal hospital of Geneva, for integrating me into his research group and for his critical approach;
- Professor Jean-Luc Veuthey, from Geneva University of Science, responsible for the Pharmacology Section of the Pharmacy and Pharmaceutical Analysis Unit, who accepted the co-direction of my thesis project;
- Professor Jacques Weber, dean of the science faculty, and Dr Jérôme Garin, Research Director at the CEA centre (Grenoble, F), who took time to judge this thesis;
- Véronique Converset, Abderahim Karmime, Gérald Rossellat and Salvo Paesano, technicians at the University Cantonal hospital of Geneva, who conducted some of the experiments involved in this project;
- Dr Séverine Frutiger-Hughes from the Pathology Department and Dr Graham Hughes from the Biochemistry Department of the Medical University for their excellent knowledge of protein chemistry and helpful discussions;
- Danièle Roiron, head of Prof. Hochstrasser’s secretariat, and Dr Catherine Zimmerman from the Clinical Chemistry Central Laboratory at the cantonal hospital of Geneva (Switzerland), for their discussions;
- Professor Keith Rose, Scientific Director of GeneProt (Geneva, CH), for his help in organising my work;
- Professor Darryl Pappin from Applied Biosystems (Framingham, MA, USA), who received me into his laboratory (ICRF, London, UK) and taught me the techniques of peptide chemical modification;
- Dr Manfredo Quadroni for his help during the preparation of this manuscript.
Finally, I would like to thank all of the R&D laboratory as well as personnel from the Clinical Chemistry Central Laboratory for their help over the past 5 years and I address a large thank-you to my parents for all the sacrifices they had to make throughout the years...
LIST OF CONTRIBUTORS
Appel Ron D.: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Binz Pierre-Alain: Swiss Institute of Bioinformatics, CH-1211 Geneva 14, Switzerland
Campbell Jennifer M.: Applied Biosystems, 500 Old Connecticut Path, Framingham, MA 01701
Déon Catherine: Central Clinical Chemistry Laboratory, Pathology department, Geneva University Hospital, Rue Micheli-du-Crest 24, CH-1211 Geneva 14
Diaz Jean-Jacques: INSERM U369, Faculté de Médecine Lyon-R.T.H. Laennec, 7, Rue Guillaume Paradin, 69372 LYON CEDEX 08, France
Gasteiger Elisabeth: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Gays Steven: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Giron Marc: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Gras Robin: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Greco Anna: INSERM U369, Faculté de Médecine Lyon-R.T.H. Laennec, 7, Rue Guillaume Paradin, 69372 LYON CEDEX 08, France
Heller Manfred: Central Clinical Chemistry Laboratory, Geneva University Hospital, CH-1211 Geneva 14, Switzerland
Hochstrasser Denis F.: Central Clinical Chemistry Laboratory, Pathology department, Geneva University Hospital, Rue Micheli-du-Crest 24, CH-1211 Geneva 14
Hoogland Christine: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Hughes Graham J.: University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Jung Eva E.: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Karmime Abderahim: Central Clinical Chemistry Laboratory, Geneva University Hospital, CH-1211 Geneva 14, Switzerland
Müller Markus: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Palagi Patricia M.: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Pasquarello Carla: Central Clinical Chemistry Laboratory, Pathology department, Geneva University Hospital, Rue Micheli-du-Crest 24, CH-1211 Geneva 14
Rose Keith: University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Converset Véronique (previously Rouge Veronique): Central Clinical Chemistry Laboratory, Geneva University Hospital, CH-1211 Geneva 14, Switzerland
Sanchez Jean-Charles: Central Clinical Chemistry Laboratory, Pathology department, Geneva University Hospital, Rue Micheli-du-Crest 24, CH-1211 Geneva 14
Vestal Marvin L.: Applied Biosystems, 500 Old Connecticut Path, Framingham, MA 01701
CHAPTER 1 INTRODUCTION Protein analysis using mass spectrometry
WV. Bienvenut
1. INTRODUCTION: FROM GENOME TO PROTEOMIC ANALYSIS
The development during the 1980s of new mass spectrometry techniques such as “Matrix-Assisted Laser Desorption/Ionisation” (MALDI) (Karas & Hillenkamp, 1988; Tanaka et al., 1988) and “ElectroSpray Ionisation” (ESI) (Aleksandrov et al., 1984; Fenn, Mann, Meng, Wong, & Whitehouse, 1989; Yamashita & Fenn, 1984) allowed the analysis of large organic polymers. Both techniques have the advantage of producing stable molecular ions for biomolecules such as proteins and oligonucleotides. Limits of sensitivity are as low as 0.1 to 100 fmol, which makes these techniques compatible with low-abundance substrates.
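To give a feel for what a sensitivity of 0.1 to 100 fmol means in practice, the short sketch below converts femtomoles into mass and into a number of molecules. The 50 kDa molar mass is an assumed, purely illustrative value for a mid-sized protein; it is not taken from the text.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def fmol_to_picograms(amount_fmol: float, molar_mass_g_per_mol: float) -> float:
    """Mass in picograms of `amount_fmol` femtomoles of a compound."""
    grams = amount_fmol * 1e-15 * molar_mass_g_per_mol
    return grams * 1e12


def fmol_to_molecules(amount_fmol: float) -> float:
    """Number of molecules contained in `amount_fmol` femtomoles."""
    return amount_fmol * 1e-15 * AVOGADRO


if __name__ == "__main__":
    ASSUMED_MOLAR_MASS = 50_000  # g/mol, hypothetical 50 kDa protein
    for amount in (0.1, 100.0):  # sensitivity limits quoted in the text, in fmol
        print(
            f"{amount} fmol: "
            f"{fmol_to_picograms(amount, ASSUMED_MOLAR_MASS):.1f} pg, "
            f"{fmol_to_molecules(amount):.2e} molecules"
        )
```

Under this assumption, the quoted range corresponds to a few picograms up to a few nanograms of protein on the target.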
Figure 1. Occurrence of the words “proteome” and “proteomic” in the PubMed databank (bar chart; x-axis: years, 1993 to 2003; y-axis: number of articles).
1 W. V. Bienvenut (ed.), Acceleration and Improvement of Protein Identification by Mass Spectrometry, 1–118. © 2005 Springer. Printed in the Netherlands.
[Figure 2: gels A and B, lanes 1 to 6, with Mr standards (kDa): PHS2 (98), BSA (65), OVAL (45), CAH2 (31), ITRA (24), LYC (14).]
Figure 2. Nucleolar protein separation by SDS-PAGE followed by Coomassie blue staining (from Dr JJ Diaz (Scherl et al., 2002)). The acrylamide concentration was 12.5 % for gel A, which preferentially separates low-MW proteins, and 8 % for gel B, which preferentially separates high-MW proteins. Proteins are distributed over a range of 10 kDa to 120 kDa for gel A and of 30 kDa to 150 kDa for gel B. MW standard proteins are visible in lanes 1 and 6 and can be used to estimate the Mr of the separated proteins visible in lanes 3 and 4. Lanes 2 and 5 are empty.
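The use of standards to estimate Mr can be sketched as follows. A common approach, assumed here rather than taken from the text, is to fit log10(Mr) of the standards against their relative mobility (Rf) with a straight line and to read unknown bands off that line. The Mr values are those of the standards named in Figure 2; the Rf values are hypothetical numbers invented for illustration, not measurements from the gels.

```python
import math

# (name, Mr in kDa, relative mobility Rf). Mr values are the standards named
# in Figure 2; the Rf values are HYPOTHETICAL and serve only as an example.
STANDARDS = [
    ("PHS2", 98, 0.10),
    ("BSA", 65, 0.26),
    ("OVAL", 45, 0.41),
    ("CAH2", 31, 0.55),
    ("ITRA", 24, 0.65),
    ("LYC", 14, 0.87),
]


def fit_log_linear(standards):
    """Least-squares fit of log10(Mr) = intercept + slope * Rf."""
    xs = [rf for _, _, rf in standards]
    ys = [math.log10(mr) for _, mr, _ in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mr(rf, slope, intercept):
    """Estimated Mr (kDa) of a band with relative mobility `rf`."""
    return 10 ** (intercept + slope * rf)


if __name__ == "__main__":
    slope, intercept = fit_log_linear(STANDARDS)
    # Hypothetical unknown band that migrated to Rf = 0.48
    print(f"Estimated Mr: {estimate_mr(0.48, slope, intercept):.0f} kDa")
```

The log-linear relation only holds over the working range of a given gel concentration, so estimates outside the span of the standards should be treated with caution.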
Such techniques completely changed the protein characterisation approach. In 1993, five groups around the world (Henzel et al., 1993; James, Quadroni, Carafoli, & Gonnet, 1993; Mann, Hojrup, & Roepstorff, 1993; Pappin et al., 1995; Yates, III,
Speicher, Griffin, & Hunkapiller, 1993) demonstrated the possibility of utilising the peptide mass fingerprint obtained after a specific enzymatic cleavage, e.g. by trypsin, to identify the target protein. This simple technique was sufficient for confident protein identification. The process was based on the comparison of the theoretical protein fragment masses, generated in silico from databases, with the experimental values obtained by mass spectrometry. The genome sequencing resources are extremely important for such an approach (Lander et al., 2001; Venter et al., 2001). In 1994, during the first congress “From Genome to Proteome” in Siena, Marc Wilkins proposed the use of the term proteome to describe the proteins expressed by the genome (Wilkins, Pasquali et al., 1996; Williams & Hochstrasser, 1997) at a particular time in a given tissue, species, etc. Since 1994, the words “proteome” and “proteomic” have been used increasingly (Figure 1), illustrating the scientific interest in the translation of the genome: the proteins or proteomes. Nevertheless, although protein characterisation and identification techniques are robust and are distributed worldwide, purification of the protein mixtures is required before such analyses.
2. PROTEIN SEPARATION
2.1. Introduction
All cells contain protein mixtures that are complex in terms of pI, MW and hydrophobicity and that span a huge range of concentrations. For example, biological fluids as simple as milk or human serum contain mainly casein and albumin, respectively, but hundreds or thousands of other proteins are also present in such mixtures at various concentrations. In human plasma, albumin concentration is around 35–50 g per litre, corresponding to 500–750 µM (Doumas, Watson, & Biggs, 1971), whereas vitamin D binding protein concentration is around 200 mg per litre, corresponding to 4 µM (Dahl et al., 2003). In some fluids, such as vaginal secretions, Gaucherand et al.
(Gaucherand, Guibaud, Rudigoz, & Wong, 1994) have quantified α-fetoprotein concentrations for the diagnosis of premature rupture of membranes. The threshold was determined to be 30 µg/l of this protein, which corresponds to a concentration of about 500 pM. Purification and/or separation steps are therefore needed before the characterisation step.
2.2. Electrophoretic separation
One of the most frequently used techniques for protein separation is based on their amphoteric character. Depending on the sample pH, proteins carry negative and/or positive charges. Macroscopically, a protein has a net positive charge when the pH is below its isoelectric point (pI; acidic conditions), a net negative charge when the pH is above its pI (basic conditions), and no net charge when the pH of the medium equals its pI. Thus, depending on the sample pH, proteins can be charged and can then migrate under the influence of an electric field.
Usually a solid support (polyacrylamide gel) or liquid medium (buffer in a silica capillary) is used for such separation.
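The pH dependence of the net charge described above can be illustrated with a minimal sketch based on the Henderson–Hasselbalch equation: below the pI the computed net charge is positive, above it negative. The pKa values are approximate textbook figures chosen as an assumption (published tables differ slightly), the model ignores any influence of the protein environment on individual groups, and the test sequence is hypothetical.

```python
# Approximate side-chain and terminal pKa values (ASSUMED textbook figures).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a polypeptide at a given pH (Henderson-Hasselbalch)."""
    # Basic groups contribute +1 when protonated.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    # Acidic groups contribute -1 when deprotonated.
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-3) -> float:
    """pH at which the net charge is zero, found by bisection.

    The net charge decreases monotonically with pH, so bisection on the
    interval [0, 14] converges to the single zero crossing.
    """
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical test sequence
    for ph in (3.0, 7.0, 11.0):
        print(f"pH {ph}: net charge {net_charge(peptide, ph):+.2f}")
    print(f"estimated pI: {isoelectric_point(peptide):.2f}")
```

The same zero-crossing is what isoelectric focusing exploits experimentally: a protein stops migrating at the position in the pH gradient where its net charge vanishes.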
Figure 3. Bi-dimensional separation of proteins contained in a human plasma sample followed by silver staining. pI range is from 4 to 10 and MW range is 5 to 200 kDa (Copyright SWISS 2D-PAGE, http://ch.expasy.org/cgi-bin/map2/noid?PLASMA_HUMAN).
2.2.1. Gel separation
Two main techniques are used for protein separation, and both exploit the physical and chemical properties of the proteins being analysed:
- Separation as a function of the proteins’ molecular volume, which is usually considered to be sufficiently similar to the protein molecular weight,
- Separation as a function of the proteins’ isoelectric points.
Although both techniques can be used separately, their combination allows the efficient separation of hundreds to thousands of proteins in a single process, because the two separations are orthogonal.
2.2.1.1. Molecular mass separation
Protein separation using molecular volume (or mass equivalent) is one of the oldest partition techniques (Shapiro, Vinuela, & Maizel, 1967; Weber & Osborne, 1969). However, the proteins contained in a sample span a wide range of pI (typically from pH 2–3 to 11–12), so that, at a given pH, they do not all have the same charge density (charge per mass unit). To obtain a migration that depends only on the molecular mass, all proteins must have a similar charge density. To achieve this, the proteins are denatured and mixed with a detergent such as sodium dodecyl sulfate (SDS). These charged molecules bind to the protein backbone at a ratio of 1.4 g of SDS per gram of protein, which corresponds approximately to one molecule of SDS for every two amino acids (Pitt-Rivers & Impiombato, 1968). This separation technique requires a polyacrylamide matrix obtained by copolymerisation of acrylamide and a cross linker such as piperazine diacrylyl. The concentration of the cross linker defines the pore size, which directly influences the separation efficiency. The cross linker concentration can be homogeneous throughout the gel (e.g. Figure 2) or may vary to produce a “y-axis” gradient (e.g. Figure 3). The latter type of gel is much more difficult to produce, but it can more accurately separate proteins across a wide range of masses (typically from 5 to 200 kDa). Only a few minor modifications have been adopted since Laemmli (Laemmli, 1970) first described this technique. After the staining step, separated proteins appear as parallel bands on mono-dimensional gels (Figure 2), each containing from one to dozens of proteins (Scherl et al., 2002), or as spots on 2-DE gels (Figure 3).
2.2.1.2. Isoelectric focusing separation technique
Proteins are amphoteric molecules, i.e. they have both acidic and basic properties; at the particular pH value corresponding to its isoelectric point (pI), a protein has a net charge of zero. Solubility is lowest at that pH; under such conditions proteins precipitate, and a further solubilisation step is necessary if the sample is to be used for a second separation technique. IsoElectric Focusing (IEF) is a protein/polypeptide separation technique based on this amphoteric property. Protein samples are mixed with a buffer that charges all of the material (usually at basic pH) and are then loaded on a pH gradient. In an electric field, proteins migrate and concentrate at their pI, where they stop and usually precipitate. Older techniques used a polyacrylamide gel containing a mobile buffer, the so-called Carrier Ampholytes (CAs). Under the effect of the electric field, the CAs create a pH gradient (Seiler, Thobe, & Werner, 1970) on which proteins can be separated. This technique is very powerful, but its major drawback was the reproducibility of
the pH gradient due to the mobility of the CAs during the focusing step (also influenced by temperature). Newer buffering substances called Immobilines™ are copolymerised with the acrylamide (Bjellqvist et al., 1982; Bossi, Righetti, & Chiari, 1994; Rosengren, Bjellqvist, & Gasparic, 1976), which improves the reproducibility of the pH gradient and hence of the separation. A drawback of this technique is the limited amount of protein that can be separated on preparative gels in comparison with the CA system. This separation technique can be used alone (Etienne et al., 1999; Towbin, Staehelin, & Gordon, 1979), but the process is usually combined with a second separation technique such as SDS-PAGE, whereby proteins are separated according to their molecular volume, a good approximation of the molecular weight.

2.2.1.3. Bi-Dimensional separation technique

This separation technique is a combination of the two previously described techniques:
- Isoelectric focussing,
- SDS-PAGE.
The result of the combination is a strong increase in resolving power such that thousands of proteins/polypeptides can be separated in a single process. This method was first proposed in 1970 by Kenrik & Margolis, who used it for native protein separations (Kenrik & Margolis, 1970). Because native protein separations are not so frequent, the method was adapted for denatured samples. This development was carried out by O’Farrell, Klose and Scheele in 1975 (Klose, 1975; O'Farrell, 1975; Scheele, 1975). By convention, the pH gradient corresponds to the X-axis and the MW separation to the Y-axis. The method is highly efficient in the pH ranges from 3.5 to 10 and/or from 4 to 7 (Gorg, Postel, & Gunther, 1988). Narrower pH domains spanning only one pH unit were successfully used by Tonella et al. (Tonella et al., 1998) to visualise the proteomic expression in limited ranges. 
Such an approach involving a few 2-DE separations is able to separate a larger number of proteins than a single 2-DE separation over the 3.5 to 10 pH range. Nevertheless, this approach is limited by the commercial availability of such IEF gradients. As an example, pH values higher than 9–10 are difficult to reach and results are not always reproducible.

2.2.1.4. Visualisation/staining methods for gel separated proteins

Protein visualisation is important because it directly influences protein detection and the subsequent processing such as excision of proteins for PMF. Two different approaches are used for protein staining. The first uses metallic ions, e.g. silver or zinc. Alternatively, organic dyes, namely Coomassie brilliant blue (CBB), SYPRO® and Amido-Black (AB), are commonly used. A non-exhaustive list of staining agents is given in Table 1.
Organic dyes

Coomassie staining is probably the most widely used method for protein detection after SDS-PAGE. Two Coomassie dyes can be used:
- Coomassie brilliant blue (CBB R250),
- Coomassie brilliant blue colloidal (CBB G250),
which differ only by two methyl groups. The limit of detection of these staining techniques is 8–10 ng for the colloidal CBB version and 50–100 ng for the standard version. However, this staining protocol does not allow direct quantification of the material contained in the gel because of the large variability in staining intensity observed for different proteins (Chrambach, Reisfeld, Wyckoff, & Zaccar, 1967; Neuhoff et al., 1990). The dye molecules bind to proteins through an interaction of the dye sulfonate groups with basic residues of the polypeptides, e.g. the ε-amino groups of lysine residues. Indeed, the staining response is most likely linked to the concentration of basic sites at the surface of the proteins (Salih & Zenobi, 1998), but hydrophobic interactions also contribute to the staining process. In some cases, several dye molecules can aggregate on a single basic position (Tal, Silberstain, & Nusser, 1985). Nevertheless, staining by CBB is quite reproducible and shows good linearity within a limited range, so that the amount of a given protein can be determined fairly accurately given a calibration curve and a good scanner for densitometry. Finally, CBB staining is compatible with mass spectrometry for protein characterisation, and it is easy to carry out, fast and cheap. All these factors account for its popularity (Galvani, Bordini, Piubelli, & Hamdan, 2000; Shevchenko, Loboda, Shevchenko, Ens, & Standing, 2000). SYPRO® dyes are recently developed protein staining agents. Protein visualisation is obtained by interaction between the protein and a complex of europium or ruthenium with an organic ligand, e.g. bathophenanthroline. These stains are fluorescent and lower the limit of detection. 
In the case of SYPRO® ruby (Malone, Radabaugh, Leimgruber, & Gerstenecker, 2001), limits of detection are below the silver stain level with 0.25–8 ng of protein (Yan, Harry, Spibey, & Dunn, 2000). Other SYPRO® stains are available, namely:
- SYPRO® red (Steinberg, Haugland, & Singer, 1996; Steinberg, Jones, Haugland, & Singer, 1996): limit of detection similar to that of SYPRO® ruby (0.5–10 ng protein),
- SYPRO® orange (Malone et al., 2001; Steinberg, Haugland et al., 1996; Steinberg, Jones et al., 1996): limit of detection between 4 and 10 ng protein,
- SYPRO® tangerine (Steinberg et al., 2000): limit of detection between 5 and 25 ng protein.
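The calibration-curve quantitation mentioned above for CBB staining can be sketched as a straight-line fit followed by inversion. This is only an illustration under stated assumptions: the standard amounts and densitometric intensities below are hypothetical values, the function names are invented for this example, and the standards are assumed to be the same protein as the unknown, since staining intensity varies from one protein to another.

```python
# Hypothetical calibration standards: known amounts (ng) of the SAME protein
# as the unknown, with their densitometric intensities (arbitrary units).
STANDARD_AMOUNTS_NG = [100.0, 200.0, 400.0, 800.0]
STANDARD_INTENSITIES = [12.0, 22.0, 42.0, 82.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_amount_ng(intensity, slope, intercept, linear_range_ng):
    """Invert the calibration line; refuse values outside the calibrated range."""
    amount = (intensity - intercept) / slope
    low, high = linear_range_ng
    if not low <= amount <= high:
        raise ValueError("intensity falls outside the calibrated linear range")
    return amount


slope, intercept = fit_line(STANDARD_AMOUNTS_NG, STANDARD_INTENSITIES)
linear_range = (min(STANDARD_AMOUNTS_NG), max(STANDARD_AMOUNTS_NG))
unknown_ng = estimate_amount_ng(32.0, slope, intercept, linear_range)
print(f"estimated amount: {unknown_ng:.0f} ng")
```

Rejecting intensities that fall outside the calibrated range reflects the point made in the text that linearity holds only within a limited range.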
Table 1. Comparison of different stains for protein detection on gels or membranes and compatibility with different proteomic analyses.

Staining agent | Sensitivity (ng/band) | Reversibility | PMF Comp. | References
Amido-Black | 50–100 | NA | NA | (Chrambach et al., 1967)
Colloidal silver with glutaraldehyde | 1–10 | No | No | (Rabilloud, 1990, 1992; Switzer, Merril, & Shifrin, 1979)
Colloidal silver without glutaraldehyde | 1.5 | NA | Yes | (Shevchenko, Jensen et al., 1996)
PMF Compatible silver (commercial kit from Pharmacia) | NA | NA | Yes | (Yan et al., 2000)
Coomassie brilliant blue (CBB R250) | 50–100 | NA | Yes | (Chrambach et al., 1967; Shevchenko, Jensen et al., 1996)
Coomassie colloidal blue (CBB G250) | 8–10 | NA | Yes | (V Neuhoff, Amold, Taube, & Ernhardt, 1988; Neuhoff et al., 1990)
Copper | NA | NA | Yes | (Lee, Levin, & Branton, 1987)
2-Methoxy-2,4-diphenyl-3(2H)-furanone | NA | NA | No | (Alba & Daban, 1998)
Nile red | 5–25 | NA | NA | (Daban, 2001; Daban, Bartholomé, & Samsó, 1991)
Radioisotope labelling | NA | NA | Yes | (S. Patterson, Thomas, & Bradshaw, 1996)
SYPRO® orange | 4–10 | NA | Yes | (Malone et al., 2001; Steinberg, Haugland et al., 1996; Steinberg, Jones et al., 1996)
SYPRO® red | 0.5–10 | NA | Yes | (Steinberg, Haugland et al., 1996; Steinberg, Jones et al., 1996)
SYPRO® ruby | 0.25–8 | NA | Yes | (Malone et al., 2001)
SYPRO® tangerine | 4–10 | NA | Yes | (Steinberg et al., 2000)
Zinc | 7–15 | Yes | Yes | (Fernandez-Patron et al., 1994)
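As an illustration of how Table 1 might be used when choosing a stain, the sketch below encodes the sensitivity and PMF-compatibility columns as read from the table and returns the PMF-compatible stains whose upper detection limit does not exceed the amount of protein expected per band. The names `TABLE_1` and `stains_for` are hypothetical, and the selection rule (comparing against the upper end of each sensitivity range) is a conservative assumption made for this example, not a rule stated in the chapter.

```python
# Rows of Table 1: (stain, sensitivity range in ng/band or None when "NA",
# PMF compatibility: True, False, or None when "NA").
TABLE_1 = [
    ("Amido-Black", (50, 100), None),
    ("Colloidal silver with glutaraldehyde", (1, 10), False),
    ("Colloidal silver without glutaraldehyde", (1.5, 1.5), True),
    ("PMF compatible silver (commercial kit)", None, True),
    ("Coomassie brilliant blue (CBB R250)", (50, 100), True),
    ("Coomassie colloidal blue (CBB G250)", (8, 10), True),
    ("Copper", None, True),
    ("2-Methoxy-2,4-diphenyl-3(2H)-furanone", None, False),
    ("Nile red", (5, 25), None),
    ("Radioisotope labelling", None, True),
    ("SYPRO orange", (4, 10), True),
    ("SYPRO red", (0.5, 10), True),
    ("SYPRO ruby", (0.25, 8), True),
    ("SYPRO tangerine", (4, 10), True),
    ("Zinc", (7, 15), True),
]


def stains_for(amount_ng, table=TABLE_1):
    """PMF-compatible stains whose upper detection limit is <= amount_ng.

    Stains without a listed sensitivity are skipped because they cannot be
    compared. Results are ordered from most to least sensitive.
    """
    hits = [
        (limits[1], name)
        for name, limits, pmf_ok in table
        if pmf_ok and limits is not None and limits[1] <= amount_ng
    ]
    return [name for _, name in sorted(hits)]


for name in stains_for(10):
    print(name)
```

For an expected load of 10 ng per band, the zinc stain (7–15 ng) is excluded under this conservative rule, as is standard CBB R250 (50–100 ng).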
The major advantage of such dyes is their good compatibility with subsequent MS analysis, with the exception of SYPRO® tangerine (Lauber et al., 2001), which shows recovery yields similar to silver staining without glutaraldehyde. Furthermore, since detection is based on emission rather than absorption, the linear range for quantitation is greatly improved. The first major drawback of SYPRO® dyes is their cost, especially for SYPRO® ruby. A study (Malone et al., 2001) that compared MALDI spectra quality and cost per gel highlighted SYPRO® orange, which is less sensitive (by a factor of 1–10 depending on the protein) but much more cost effective ($6.00/gel compared to $133.00/gel for SYPRO® ruby). Moreover, no differences were observed in the mass spectra of stained proteins. Another more general problem with fluorescent dyes is that visualisation has to be performed under ultraviolet light; special scanners are therefore required to maximize detection performance. Manual inspection of gels is also sometimes difficult, and it is harder to verify spot excision accuracy when working with spot cutters. Amido-Black is a popular dye for staining proteins transferred onto membranes. It has also been used for in-gel staining (Chrambach et al., 1967) but, owing to its poor sensitivity, Coomassie blue is generally preferred. Nile red is an unusual dye (Daban et al., 1991) that binds proteins mostly by hydrophobic interactions. An advantage of this dye is that the protein can still be electroblotted after the staining step. This dye has been used directly for in-gel protein staining, followed by electrotransfer to a PVDF membrane (Daban, 2001). India ink (Lek, Yang, Wang, & Cheng, 1995) and Ponceau red (Gianazza et al., 1995) are also used as protein stains but mostly for visualisation of proteins transferred onto PVDF or nitrocellulose membranes (Breggren et al., 1999). 
Interactions between proteins and dyes are mainly non-covalent: electrostatic interactions and non-specific interactions such as hydrogen bonds and van der Waals forces (Salih & Zenobi, 1998). Ionic interactions occur between the sulfonate group of the dye and basic residues such as His, Lys and Arg. Hydrophobic interactions occur between the phenyl groups of the stains and hydrophobic parts of the proteins. As an example, Tal et al. (Tal et al., 1985) clearly showed that a single molecule of lysozyme is able to bind up to 48 molecules of CBB R-250, whereas only 28 basic residues are available in its sequence. It must also be noted that the staining intensity depends not only on protein concentration but also on the number of basic residues, the hydrophobic part of the sequence and, additionally, the size of the stained polypeptide. Since the protein is unknown, it is very difficult to determine the amount of material quantitatively.

Metallic ion staining

Silver staining is a widely used protein visualisation technique and is considered a denaturing method. The principle is to use the ability of the carboxylic groups of the proteins to bind silver ions, which are then reduced to metal, producing a brown-black metallic deposit at the position of the focussed protein. A large number of
different protocols have been reported in the literature, covering a wide range of sensitivities. As an example, colloidal silver staining using glutaraldehyde as a sensitiser and cross-linking agent can detect down to 1–10 ng of separated protein (Rabilloud, 1990, 1992; Sanchez & Hochstrasser, 1998; Switzer et al., 1979). This staining protocol is one of the most sensitive, but it is a long and tricky procedure. To be compatible with mass spectrometric analysis, such staining methods must be conducted without glutaraldehyde (Galvani et al., 2000; Jungblut & Seifert, 1990; Shevchenko, Wilm, Vorm, & Mann, 1996; Yan et al., 2000), since this reagent cross-links the amino groups of the proteins, producing complex and unidentifiable peptides (Lauber et al., 2001). Several kits for fast MS-compatible silver staining are commercially available. The negative staining process using zinc ions with imidazole buffer is also compatible with MS techniques (Fernandez-Patron et al., 1994) and can be achieved in less than 15 minutes (Fernandez-Patron, Calero et al., 1995). The principle is that zinc cations form insoluble complexes with imidazole molecules (Fernandez, Gharahdaghi, & Mische, 1998), producing a white background all over the SDS-PAGE gel. At the position of focussed proteins, the SDS bound to the protein by hydrophobic interaction inhibits the formation of the Zn/imidazole complex. As a result, the protein positions appear transparent on a white gel. To perform this staining, the gel is first incubated in a solution containing SDS/imidazole to improve staining contrast (Ortiz et al., 1992), after which it is incubated in a zinc cation solution. The limit of detection of this technique is 7–15 ng of protein loaded, depending mostly on the protein concerned (Fernandez-Patron et al., 1994). Another advantage of this procedure is the reversibility of the staining reaction. Indeed, to liberate the polypeptides for further analysis, e.g. 
PMF protein identification, the whole gel or gel plug can be incubated in a zinc-chelating solution (typically EDTA or citric acid), which disrupts the complexes. The major drawbacks of this staining process are the low contrast of the gel image (despite the SDS incubation step), which sometimes makes it difficult to localise spot positions precisely, and the impossibility of performing quantitation by densitometry. In 1995, Fernandez-Patron (Fernandez-Patron, Hardy, Sosa, Seoane, & Castellanos, 1995) proposed a double staining protocol using first Coomassie and then Zn/imidazole. Again, the advantages of this combined technique are the speed of the staining procedure, the compatibility with MS analyses and the fact that it allows visualisation of proteins using two different staining processes; the Zn/imidazole process especially highlights proteins that were not visible with CBB G-250 staining.

Covalently immobilized dyes

Dye molecules can also be covalently linked to the protein. As an example, Alba and Daban (Alba & Daban, 1998) proposed the use of 2-methoxy-2,4-diphenyl-3(2H)-furanone, a non-fluorescent compound. These molecules react with primary amino groups to produce fluorescent derivatives. This technique has the great advantage of
showing low background since only bound dye molecules give a signal. However, owing to the modification of the ε-amino group of lysine, subsequent trypsin digestion proceeds with very poor efficiency. One of the emerging methods for protein visualisation based on covalently linked dyes is “differential in-gel electrophoresis” (DIGE). This technique was proposed in 1997 when Unlu et al. (Unlu, Morgan, & Minden, 1997) reported a study in which two samples were separated on the same gel. Both samples, e.g. from a diseased and a healthy patient (Zhou et al., 2002), are treated before gel separation with different fluorescent dyes that bind covalently to the ε-amino groups of Lys. The emission wavelength is different for each dye. The gel is scanned twice and the two images are recorded and superimposed. This technique is very interesting for the accurate determination of differential protein expression, since the approach overcomes all problems related to the reproducibility of 2D-PAGE migration. Disadvantages of DIGE are the high cost of the reagents and scanners required for the analysis, as well as some concerns about positional accuracy for spot cutting, since the covalent dye modification induces a small but significant mass shift. Labelling of proteins is kept at sub-stoichiometric levels (below 5 %) to maximize recovery and prevent spot spreading.

Radioisotope labelling

Radioisotope-labelled AAs are frequently used for protein visualisation. In most cases, such a technique can be applied only if the sample is obtained from cell or bacterial cultures (Patterson et al., 1996) (see Section 5.4).

To conclude, silver staining is probably one of the preferred methods for visualisation of proteins previously separated by PAGE, with good sensitivity. However, the chemical modifications it induces are not easily compatible with MS analysis. 
SYPRO® ruby shows similar sensitivity to silver staining but with the advantage that it responds linearly to the protein concentration over a larger range than silver stain does and, moreover, that protein samples remain compatible with MS analysis. However, the commercially available stain SYPRO® ruby is expensive compared to CBB R250 or G250, SYPRO® orange or negative zinc staining, which are also compatible with MS analysis.

2.2.2. Capillary electrophoresis separation

Although bi-dimensional electrophoresis (2-DE) is a powerful technique for the simultaneous separation of hundreds to thousands of proteins/polypeptides contained in complex biological samples (Herbert, Sanchez, & Bini, 1997), it is time-consuming and expensive, and gel-to-gel reproducibility is not easy to obtain. Capillary electrophoresis has a lower resolution than 2-D PAGE separation on complex samples, but the technique allows separation of proteins/polypeptides in a few minutes. As for SDS-PAGE separation, the analytes are separated under high electric fields (1000 V/cm) in capillary columns of <100 µm. Analytes migrate
through the electric field as in SDS-PAGE or 2-D PAGE separations, but the main advantage is the ability to analyse positively and/or negatively charged ions as well as neutral species. Depending on the capillary capping, three different separation modes can be obtained:
- Isoelectric focussing (IEF): protein separation by pI (Jensen et al., 1999; Shen, Berger, & Smith, 2001).
- Capillary zone electrophoresis (CZE): protein separation is based on the (net charge)/(protein volume) ratio. Such a technique is not really helpful for the analysis of complex biological mixtures (Ding & P, 1999), but it can be efficient for peptide mixtures such as protein digests (Lin, Shao, & Xia, 2000).
- Micellar electrophoresis or micellar electro-kinetic chromatography (MEKC): proteins from the sample are mixed with a surfactant such as SDS. This detergent, which solubilises proteins, allows separation of proteins as a function of their hydrophobicity and mass/charge ratio. The main advantage is the ability to separate charged proteins and also neutral components. Nevertheless, the MEKC methodology is not well adapted to LC-MS technology because of the suppressing effect of salts and contaminants, e.g. SDS, that must be removed before further investigations (Amini, Dormady, Riggs, & Regnier, 2000). A new generation of detergents is now available that is intended to be MS compatible but, as usual, only at low levels of such compounds (Ishihama, Katayama, & Asakawa, 2000).
Capillary separation, owing to its high resolving power, is a developing technique for the analysis of proteins after a brief separation (Jensen et al., 1999). The sample preparation required is somewhat more complicated in order to ensure compatibility (Jensen et al., 1999; Lin et al., 2000).

2.3. Liquid chromatography

Liquid chromatography (LC) is another approach for protein fractionation. 
The separation is based on physico-chemical criteria of the peptides/proteins:
- Hydrophobicity with reversed-phase chromatography (Chen, Walkes, Wu, Timmons, & Kinsel, 1999; Hellman, Wernsted, Gonez, & Heldin, 1995; Lacey, Bergen, Magera, Naylor, & O'Brien, 2001; Link et al., 1999; Link, Hays, Carmack, & III, 1997; Miliotis et al., 2000; Neubaeur et al., 1998). This separation technique is the most frequently used. To summarise the principle of such separations, proteins and peptides are eluted at different organic solvent ratios. Peptides containing hydrophilic residues (Lys, Arg, Tyr, Trp,…) tend to be eluted at low organic solvent concentrations, whereas hydrophobic peptides (containing hydrophobic residues: Leu, Ile, Pro, Val, Phe,…) require higher organic solvent concentrations to be released. An advantage of the technique is the compatibility of the eluting solvent with mass spectrometric approaches such as MALDI-MS (Miliotis et al., 2000) and ESI MS (Link et al., 1999; A. J. Link et al., 1997). Reversed-phase chromatography is also integrated into multidimensional chromatography, for instance after ionic chromatography, in which the mobile phase contains high concentrations of salts; the reversed-phase column then helps to exchange the eluant for an MS-compatible solvent (Lacey et al., 2001).
- Ionic charge with ionic chromatography (Bethancourt et al., 2001; Hara & Yamakawa, 1996; Lacey et al., 2001; Link et al., 1999; Weitzhandler, Farman, Rohrer, & Avdalovic, 2001). This technique is particularly well adapted to the analysis of post-translational modifications such as sialylations (Lacey et al., 2001) or of N-terminally blocked proteins (Bethancourt et al., 2001).
- Mr with size exclusion chromatography. This technique can be used to separate proteins contained in complex samples depending on their MW (Davis, Spahr et al., 2001; Shen et al., 2001; S. Wang & Regnier, 2001) but it can also be used for a sample desalting step (Hara & Yamakawa, 1996).
These three criteria are the most frequently used, but other techniques with a more restricted application area are available for protein separation by liquid chromatography:
- Immobilized metal affinity columns (IMAC) allow enrichment of samples in phospho-peptides, but the interaction between the column and peptides has a low selectivity and this technique usually enriches samples in both basic peptides and phospho-peptides (Li, Dong, Miller, & Naylor, 1999; Posewitz & Tempst, 1999; Watts et al., 1994). Such columns can be integrated into on-line systems for ESI-MS analysis of phospho-peptides (Cao & Stults, 1999).
- Immuno-selective columns, where monoclonal or polyclonal antibodies are covalently linked to the inside surface of the column. This technique is frequently used to extract a specific protein or family of proteins (Lacey et al., 2001; S. Patterson et al., 1996). Some of these immunoreactive columns use not an antibody but a protein such as immobilized protein G for antibody extraction from serum (Nedonchelle, Pitiot, & Vijayalakshmi, 2000), or any other protein that can form a stable complex with agonists, such as concanavalin A, used for the diagnosis of ovarian cancers (Gercel-Taylor, Bazzett, & Taylor, 2001).
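The reversed-phase principle summarised above (hydrophilic peptides elute at low organic solvent concentration, hydrophobic peptides at higher concentration) can be illustrated with a rough ranking of peptides by mean hydropathy. The sketch assumes the Kyte-Doolittle hydropathy scale as a proxy, which is not the scale used in this chapter; real retention also depends on the column, the ion-pairing agent and the peptide length, and individual residues may rank differently on other scales. The function names and the example peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy values per residue (assumed proxy for retention).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Grand average of hydropathy: mean Kyte-Doolittle value of the residues."""
    if not peptide:
        raise ValueError("empty peptide sequence")
    return sum(KYTE_DOOLITTLE[residue] for residue in peptide.upper()) / len(peptide)


def predicted_elution_order(peptides):
    """Sort peptides from most hydrophilic (early) to most hydrophobic (late)."""
    return sorted(peptides, key=gravy)


# Hypothetical peptides: one rich in Lys/Arg, one neutral, one rich in Leu/Ile/Val/Phe.
for peptide in predicted_elution_order(["LIVF", "KRDE", "GASG"]):
    print(f"{peptide}: GRAVY = {gravy(peptide):+.2f}")
```

A simple mean was chosen here for readability; it ignores sequence context and is only meant to show the direction of the effect described in the text.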
2.4. Multidimensional chromatographic separation

Separation of proteins from complex samples with a single technique does not provide sufficient resolution to isolate each protein. Using several techniques in series improves protein separation. Shen et al. (Shen & Allison, 2000) obtained similar protein resolution with 2-DE and with size exclusion chromatography coupled to IEF capillary electrophoresis. Proteins were separated by MW during size exclusion chromatography, followed by pI separation using IEF capillary electrophoresis. Others have described systems that were coupled to reversed-phase columns (Davis, Beierle et al., 2001) or immobilised trypsin columns (Wang & Regnier, 2001) in the second dimension of the liquid chromatographic separation. Multidimensional chromatography has other advantages, such as sample desalting on a reversed-phase column after ionic exchange chromatography (Lacey et al., 2001); in addition, separating a complex sample directly ahead of the mass spectrometer rather than by 2-DE should limit sample contamination and save time.

2.5. Conclusion

Various methods for protein separation can be applied depending on the amount of sample and its composition. 2-DE (Klose, 1975; O'Farrell, 1975; Scheele, 1975) protein separation tends to give way to fully integrated approaches, as in multidimensional chromatography coupled to MS analysis. These techniques are usually more reproducible than gel separation of proteins owing to the low amount of manual intervention; they are also quicker and allow separation of proteins on a larger scale by physico-chemical properties. Moreover, these techniques can be applied to native samples, which facilitates identification of proteins involved in a protein complex in such investigations as those of protein/protein interactions (Butt et al., 2001) or protein/DNA or protein/drug interactions.

3. PROTEIN ELECTROBLOTTING FROM GEL TO POLYMER MEMBRANE

3.1. Introduction

Two main applications justify the use of protein electroblotting:
- Protein identification using the Edman sequencing technique,
- Identification or characterisation techniques in which the target protein must be easily available, such as in immuno-chemistry…
Proteins transblotted onto the surface of a chemically and mechanically stable membrane facilitate storage and sample accessibility. Polyacrylamide gels are fragile and can easily be broken, but Shevchenko et al. (Shevchenko, Loboda, Ens,
Schraven, & Standing, 2001) demonstrated that proteins separated in polyacrylamide gel could be used successfully for protein identification after 8 years of dried gel storage at room temperature, with no evidence of protein degradation or modification.

3.2. Transfer systems

Three techniques are usually used for such applications:
- Tank electrophoresis,
- Semi-dry electrophoresis,
- Diffusion process.
The first two techniques are the most frequently used for protein transfer. They are considered active transfer processes since the electric field drives proteins/peptides from the gel to the membrane. The third method is considered a passive method and is mostly used for RNA and DNA blotting.
Figure 4. Electroblotting "sandwich": 1, cathode; 2, maintaining grid; 3, sponge; 4, Whatman paper; 5, SDS-PAGE; 6, PVDF; 7, anode.
Tank transfer is the older electroblotting system. Schematically, the SDS-PAGE gel is overlaid on a membrane that is able to capture proteins/polypeptides through hydrophobic interactions or ionic or covalent bonds (Table 2). The gel and the membrane are securely immobilised in a “sandwich format” between two Whatman papers. This construction is immersed in a tank filled with the electroblotting buffer (Section 3.3) and a direct voltage is applied perpendicular to the gel (Figure 4). Usually, the blotting process is conducted at constant voltage with tank transfer, but other voltage signals can also be used, such as a square wave voltage, which
improves protein recovery on the hydrophobic membrane (Bienvenut, Deon, Sanchez, & Hochstrasser, 2002). Depending on the tank geometry and the buffer, the transfer process takes from one hour up to overnight (Bolt & Mahoney, 1997; Reim & Speicher, 1992; Vestling & Fenselau, 1994).

Hirano proposed semi-dry transfer in 1989 (Hirano, 1989). The major advantage of this electroblotting technique is the uniformity of the electrodes and the electric field. Indeed, the electrodes are flat metallic plates in direct contact with the Whatman paper. With this arrangement, the electric field should be the same at all gel positions, which is not the case when wire electrodes are used for tank transfer. Usually the blotting process can be completed in 30 minutes, but a longer period does not impair the transfer (Hirano, 1989). The second advantage is the small amount of buffer used in this technique. At least 1 litre of buffer is needed for tank electroblotting. In semi-dry transfer, the Whatman paper is the only “reservoir” of buffer, which decreases the volume of buffer to 200–300 ml. Moreover, the reservoir function of the Whatman paper permits a multi-buffer system owing to the complete separation of the anodic and cathodic buffers. As an example, Laurière proposed a complex transfer method able to increase protein recovery (Lauriere, 1993). Usually, a constant current (1 mA/cm²) is applied for the blotting process over a period of 1 hour (Eckerskorn et al., 1997; Lauriere, 1993).
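As a small worked example of the semi-dry settings quoted above, the current to set on the power supply follows from the current density (1 mA/cm²) multiplied by the gel area. The gel dimensions used below (16 × 18 cm) and the function names are hypothetical and serve only to illustrate the arithmetic.

```python
def semi_dry_current_ma(width_cm, height_cm, density_ma_per_cm2=1.0):
    """Total current (mA) for a gel of the given size at a fixed current density."""
    if width_cm <= 0 or height_cm <= 0:
        raise ValueError("gel dimensions must be positive")
    return width_cm * height_cm * density_ma_per_cm2


def transferred_charge_c(current_ma, minutes):
    """Electric charge (coulombs) passed during the transfer: Q = I * t."""
    return current_ma / 1000.0 * minutes * 60.0


# Hypothetical 16 cm x 18 cm gel, blotted for 1 hour at 1 mA/cm2.
current = semi_dry_current_ma(16, 18)
charge = transferred_charge_c(current, 60)
print(f"set current: {current:.0f} mA, charge passed: {charge:.1f} C")
```

Scaling the current with the gel area keeps the current density, and therefore the transfer conditions per unit of gel surface, the same for gels of different sizes.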
Figure 5. Transverse section of a passive diffusion blotting system. (1) Buffer tank, (2) plastic base, (3, 4) Whatman paper, (5) plastic film to limit the transfer to the gel surface, (6) SDS-PAGE, (7) stack of absorbing paper, (8) glass plate, (9) compression weight (Sealy & Southern, 1982).
Neumann and Mullner (Neumann & Mullner, 1998) proposed an interesting application of semi-dry blotting in which a single gel produces two membranes in a single process. In that scheme they used two PVDF membranes, i.e. one on each side of the gel. The applied voltage is reversed after a defined period, which allows a similar distribution on each PVDF membrane. It must be noted that the two membranes are mirror images of each other. The advantage of such a technique is that it produces very similar protein patterns from one sample. One membrane can then be used for denaturing staining, e.g. silver staining or carbohydrate staining, while the second membrane can be used for immuno-detection, Edman degradation, PMF, etc.

The third method uses a passive diffusion technique that is effective for RNA or DNA blotting (Chomczynski & Mackey, 1994). The technique has also been applied to protein transfer but, owing to its low material recovery and reproducibility, it has not been much used. The gel is laid on a Whatman paper that is soaked in and connected to the blotting buffer. The hydrophobic membrane is placed on top of the gel, and a stack of dry absorbing paper drives an upward migration of the buffer and the blotted material (Figure 5). This technique usually needs several hours (at least an overnight period) to give a satisfactory result.

Nitrocellulose is now less used than PVDF but it is still an interesting capture surface. Two main factors limit its use:
- The low binding capacity of such membranes. An example of the low binding capacity was shown by Bolt and Mahoney (Bolt & Mahoney, 1997), who used multiple layers of nitrocellulose membranes. With this technique, they showed that a non-negligible portion of the proteins crosses the first nitrocellulose membrane and is immobilised on the second.
- Moreover, this material is not as stable as PVDF membranes, e.g. it is susceptible to degradation on heating and it is soluble in various organic solvents. 
Nevertheless, even if the chemical properties of nitrocellulose limit its usability, it is a well-adapted protein-binding surface for the CNBr cleavage approach (Dukan et al., 1998). The two membranes described above have neutral surfaces and the main membrane/protein interactions are hydrophobic. Other interactions are also used to collect protein/polypeptide material, as with Immobilon™ CD or Ny+ from Millipore (no longer available). Such membranes are positively charged and capture the proteins/peptides (or more particularly nucleic acids in the case of Ny+) by ionic interactions. Accordingly, low molecular weight peptides can be efficiently captured, which is not possible with hydrophobic interaction (Patterson et al., 1996; Schreiner, Strupat, Lottspeich, & Eckerskorn, 1996). Table 2 shows a non-exhaustive list of membranes. Membrane pore sizes also have a direct influence on the protein adsorption capacity. As an example, the pore sizes of Immobilon™ P and PSQ membranes are
0.5 and 0.2 µm respectively. These parameters are directly linked to the adsorption capacity (Table 2) (Bolt & Mahoney, 1997; Mozdzanowski & Speicher, 1992). Other polymer-based membranes have been used for specific purposes, such as cellulose acetate (Iijima, Shiba, Inoue, Yoshida, & Kimura, 1997), Nylon (mainly used for nucleic acid transfer), polypropylene, or polyethylene (Baker, Dunn, & Yacoub, 1991; Jungblut, Eckerskorn, Lottspeich, & Klose, 1990). Before the development of polymeric membranes, blotting processes were conducted with derivatised glass fibre tissues, which were mostly used for the Edman sequencing approach. Glass fibre tissues treated with diisothiocyanate (Wachter, Machleidt, Hofner, & Otto, 1973) or with 3-aminopropyl triethoxysilane (Aebersold, Teplow, Hood, & Kent, 1986) allow the binding of proteins covalently or by ionic interaction.

3.3. Composition and influence of the blotting buffer and solvents

Buffers are crucial during the electroblotting operation (tank or semi-dry). The buffer supports the electric field that drives protein migration and carries ions from the anode to the cathode to maintain the electrical neutrality of the system. An aqueous buffer is generally used, but it usually contains a low concentration of organic solvents.

3.3.1. Buffers composition

A large number of buffering substances are used. One of the most frequently used was originally described by Towbin et al. (Towbin et al., 1979). This buffer is a mixture of Tris® base (25 mM) and Gly (192 mM) dissolved in an aqueous solution containing 20% methanol to facilitate protein extraction from the gel (Dukan et al., 1998). Some variations on this buffer have also been proposed, such as:
- Only 10% methanol (Patterson et al., 1996; Vestling & Fenselau, 1994),
- Half of the Tris® and Gly concentration (Mozdzanowski & Speicher, 1992; Reim & Speicher, 1992).
Schleuder et al. 
(Schleuder, Hillenkamp, & Strupat, 1999) used 2-amino-2-methyl-1,3-propanediol (50 mM) with 20% methanol as a transblotting buffer. The pH of this solution is 8.3, whereas that of the Towbin buffer (or its derivatives) is usually around 8.5. Both buffers show some problems directly linked to their neutral to slightly basic pH, which does not favour the transfer of highly basic proteins. Indeed, at pH 8.3–8.5 the basic proteins are mainly positively charged (especially through their free amino groups); they migrate backwards, in the cathodic direction, and are not collected on the membrane, which is usually at the anodic side. Some basic material will
Table 2. Physical characteristics of some collecting membranes commonly used in protein electroblotting. D: pore size (µm); Thick: thickness (µm); Adsorp: membrane capacity (µg/cm2); Interact: polypeptide/membrane interaction; C: covalent (C-S: stable or C-H: hydrolysable); H: hydrophobic; I: ionic. CO Act: activated carbonyl. References: 1, (Mozdzanowski & Speicher, 1992); 2, Pall catalogue; 3, Millipore catalogue; 4, Bio-Rad catalogue; 5, Millipore Tech protocol # 14 and 18.

Membrane         Supplier              Polymer           D                  Thick     Adsorp                     Interact  Chem. Mod.  Ref.
Immun-Blot™      Bio-Rad               PVDF              0.2                130       160                        H                     1, 4
Sequi-Blot™      Bio-Rad               PVDF              0.2                130       170-200                    H                     1, 4
Nitrocellulose   Bio-Rad               NC                0.45 & 0.2         NA        80-100                     H                     1, 4
Zeta-Probe       Bio-Rad               Nylon             NA                 NA        NA                         I         Cation      4
Immobilon™ P     Millipore             PVDF              0.45               100-130   175-200 (1), 85-294 (3)    H                     1, 3
Immobilon™ PSQ   Millipore             PVDF              0.25 (1), 0.2 (3)  NA        185-200 (1), 262-448 (3)   H         -           1, 3
Immobilon™ CD    Millipore             PVDF              0.1                NA        200-222                    I         Cation      3
Immobilon™ AV    Millipore             PVDF              0.65               130       150                        C-S       CO Act      5
Immobilon™ NC    Millipore             NC                0.45               NA        120-260                    H         -           3
Immobilon™ Ny+   Millipore             Nylon             0.45               NA        NA                         I         Cation      3
Immobilon™ Ny    Millipore             Nylon             0.45               NA        NA                         H         -           3
Biodyne® A       Pall                  Nylon 6,6         0.2, 0.45, 1.2     152 ± 13  NA                         H         -           2
Biodyne® B       Pall                  Nylon 6,6         0.45               152 ± 13  NA                         I         Cation      2
Biodyne® C       Pall                  Nylon 6,6         0.45               152 ± 13  NA                         I         Anion       2
UltraBind™       Pall                  Polyethersulfone  0.45               152       135                        C-H       Aldehyde    2
BioTrace™        Pall                  PVDF              0.45               165       NA                         H         -           2
BioTrace™        Pall                  NC                NA                 140       209                        H         -           2
Westran™ I       Schleicher & Schuell  PVDF              0.45               NA        NA                         NA        NA          1
Westran™ II      Schleicher & Schuell  PVDF              NA                 NA        NA                         NA        NA          1
thus be lost during this step, and a more basic buffer is preferred, such as 3-(cyclohexylamino)-1-propanesulfonic acid (10 mM) with 10% methanol, which has a pH around 11 (Gianazza et al., 1995; Vestling & Fenselau, 1994). As previously described, use of a semi-dry electroblotting system allows different anodic and cathodic buffers. This is especially true for methanol concentration, where the anodic buffer contains 20% methanol compared to 5% methanol at the cathodic side. The higher methanol concentration favours protein precipitation at the anodic side, where the binding membrane is located. Laurière (Lauriere, 1993) developed a more complex reservoir system for semi-dry electroblotting in which completely different buffers were used simultaneously. An advantage of this technique was the improvement of the protein extraction from the gel and an increase in the binding capacity of the membrane. Other, less common buffers have also been used for protein electroblotting, such as Tris/borate buffer at pH 8 (Fleming & Paull, 1988) or Tris/acetate buffer at pH 7.4 (Bolt & Mahoney, 1997).

3.3.2. Effects of the SDS and methanol in the buffer solution

SDS has a strong influence on the protein blotting process, since this compound is used to create micellar conditions that facilitate protein migration at a constant charge-to-mass ratio. While this property is used for efficient SDS-PAGE protein separation, it also facilitates protein extraction from the gel during an electroblotting process. However, at high SDS concentrations the detergent saturates the binding positions of the hydrophobic membranes, and proteins and SDS then compete for the adsorption sites on the membranes (Mozdzanowski & Speicher, 1992). It is extremely important to check this parameter carefully so as to have an SDS concentration sufficient to extract the proteins/peptides easily from the gel, but not an excess that will prevent hydrophobic interaction.
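The buffer compositions quoted in section 3.3.1 translate directly into weigh-out quantities. The following Python sketch is only an illustration: the molar masses are standard values that are not given in this chapter, the function name `buffer_recipe` is hypothetical, and the methanol percentages are assumed to be volume/volume.

```python
# Molar masses in g/mol: standard values, assumed here (not given in the chapter).
MOLAR_MASS = {"Tris base": 121.14, "glycine": 75.07, "CAPS": 221.32}


def buffer_recipe(solutes_mM, methanol_percent, volume_L=1.0):
    """Return grams of each solute and mL of methanol for the requested volume.

    solutes_mM maps a solute name to its concentration in mmol/L.
    methanol_percent is assumed to be a volume/volume percentage.
    """
    recipe = {}
    for name, conc_mM in solutes_mM.items():
        moles = conc_mM / 1000.0 * volume_L  # mol of solute needed
        recipe[name + " (g)"] = moles * MOLAR_MASS[name]
    recipe["methanol (mL)"] = methanol_percent / 100.0 * volume_L * 1000.0
    return recipe


# Towbin-type buffer: 25 mM Tris base, 192 mM Gly, 20% methanol.
towbin = buffer_recipe({"Tris base": 25, "glycine": 192}, methanol_percent=20)
# High-pH alternative: 10 mM CAPS (3-(cyclohexylamino)-1-propanesulfonic acid), 10% methanol.
caps = buffer_recipe({"CAPS": 10}, methanol_percent=10)

for label, recipe in (("Towbin", towbin), ("CAPS", caps)):
    print(label, {key: round(value, 1) for key, value in recipe.items()})
```

For one litre of the Towbin-type buffer this gives roughly 3.0 g of Tris base and 14.4 g of glycine with 200 mL of methanol. Adjusting or checking the pH is outside the scope of the sketch.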
Methanol in the buffer has less effect than the SDS, but this organic solvent helps to dissociate the protein/SDS complexes and decrease adsorption at the surface of the hydrophobic membrane. An increase of methanol in the buffer will counterbalance the negative effect of the SDS (Lauriere, 1993). The SDS concentration must be around 0.03–0.05% (m/v) to obtain a satisfactory result with 10% methanol (Bolt & Mahoney, 1997; Mozdzanowski & Speicher, 1992).

3.3.3. Other influencing parameters

While the influence of temperature is low in this technique, a cooling system is usually combined with the electroblotting apparatus to prevent buffer overheating due to Joule heating (which modifies the pH) as well as rapid evaporation of the solvent. This is more critical with semi-dry transfer, since the buffer volume is much more limited than in tank transfer. The optimum transfer time is a difficult parameter to set. Although semi-dry transfer allows a decrease in the blotting period in comparison to tank
transfer, usually a minimum of one hour is needed to obtain a correct and exploitable membrane. If a low voltage is used, e.g. 10 V, a longer period such as 10–15 hours can be needed, which does not impair the blotting process.

3.4. Membrane staining

Although membranes are mostly produced for immuno-blotting detection, it can be useful to know approximately the protein pattern on the membrane. For this, a quick staining step is sometimes applied before further investigation. However, not all staining methods are compatible with immuno-blotting (for examples see Table 3) and any membrane treatment must be done carefully. Table 3 provides a non-exhaustive list of the most-used current stains, with information about the reversibility of the staining procedure, the compatibility with PMF and immuno-blotting, and the limit of detection range.

3.4.1. Non-denaturing organic staining processes

With this type of staining technique, membranes can be reused for other analyses, e.g. protein immuno-detection (Sanchez, Wirth et al., 1997), Edman sequencing or AA composition (Fernandez et al., 1998). One of the most used dyes is Coomassie blue R250, also called Coomassie brilliant blue (CBB). It allows easy detection of proteins on the membrane down to 10–30 ng per spot or band. Descriptions of its use can be found in numerous articles (Gianazza et al., 1995; Reim & Speicher, 1992; Schreiner et al., 1996). A chemical modification of CBB R250 provides the colloidal Coomassie stain, also called Coomassie G250. The major advantage of this stain compared to CBB R250 (Iijima et al., 1997) is its high sensitivity. No direct investigations have been conducted to compare both stains on membranes, but the difference has been clearly demonstrated in gel-based staining (Scheler et al., 1998). The high percentage of organic solvent needed to prepare the staining solution is sometimes not compatible with some membranes, e.g. nitrocellulose.
Amido Black is also a frequently used staining agent for membrane samples (Lauriere, 1993; Towbin et al., 1979) and it is possible to detect proteins down to 15–60 ng per spot or band within a few minutes. Ponceau S, or Ponceau red, is a similar staining agent to Amido Black with a higher limit of detection of 60–100 ng per spot or band. It is thus a low-sensitivity stain and is less used in the proteomic area (Gianazza et al., 1995; Hirano et al., 1991). Fluorescent dyes have recently become powerful tools for protein visualisation on membranes. The stains of this family are based on complexes between metal ions such as europium or ruthenium and various ligands. The two most frequent are:
- Bathophenanthroline disulfonate/europium, also called SYPRO® orange (Kemper, Berggren, Diwu, & Patton, 2001),
Table 3. Comparison of different dyes for protein detection on membrane after electroblotting. Rever (reversibility), IC (immunochemistry) and PMF columns give the compatibility/reversibility of these techniques.

Staining agent                            Sensitivity (ng/band)  Membrane      Rever  IC   PMF  References
Amido Black                               15–60                  PVDF, NC      NA     No   No   (Berggren et al., 1999)
Amido Black                               NA                     PVDF          NA     Yes  NA   (Sanchez et al., 1997)
Colloidal silver with glutaraldehyde      15–30                  PVDF, NC      No     NA   Yes  (Berggren et al., 1999)
Bathophenanthroline disulfonate/europium  15–30                  PVDF, NC      Yes    Yes  Yes  (Berggren et al., 1999)
Coomassie brilliant blue (CBB R250)       10–30                  PVDF          No     No   Yes  (Berggren et al., 1999)
Coomassie colloidal blue (CBB G250)       NA                     PVDF          NA     No   NA   (Berggren et al., 1999)
India ink                                 4–8                    PVDF, NC      No     No   No   (Berggren et al., 1999)
India ink                                 NA                     PVDF          No     Yes  NA   (Eynard & Laurière, 1998)
Iodine                                    NA                     Immobilon CD  Yes    NA   NA   (Gianazza et al., 1995)
[14C] labelled proteins                   NA                     PVDF, NC      No     NA   No   (Bolt & Mahoney, 1997)
[125I] labelled proteins                  NA                     PVDF, NC      NA     NA   NA   (Mozdzanowski & Speicher, 1992)
[35S] labelled Met                        NA                     PVDF          No     NA   Yes  (Patterson et al., 1996)
Millipore staining kit for Immobilon CD   NA                     Immobilon CD  NA     NA   Yes  (Patterson et al., 1996)
MDPF                                      5                      PVDF          No     Yes  NA   (Alba & Daban, 1998)
Colloidal gold                            1–4                    PVDF, NC      No     No   No   (Berggren et al., 1999)
Ponceau S                                 60–100                 NC            Yes    Yes  Yes  (Berggren et al., 1999)
Ponceau S                                 NA                     PVDF          NA     NA   NA   (Hirano et al., 1991)
SYPRO® Rose                               10–20                  PVDF, NC      NA     Yes  NA   (Kemper et al., 2001)
SYPRO® Ruby                               2–8                    PVDF, NC      No     Yes  Yes  (Berggren et al., 1999)
- Commercially available SYPRO® ruby, which is a complex of ruthenium (Berggren et al., 1999).
The main advantages of these new stains are their impressive limits of detection, at 15–30 ng/band or spot for SYPRO orange and 2–8 ng/band or spot for SYPRO
ruby, which is comparable to colloidal silver staining sensitivity, the usual reference for protein visualisation. Alba & Daban (Alba & Daban, 1998) proposed using 2-methoxy-2,4-diphenyl-3(2H)-furanone (MDPF) as a reagent that reacts with the free amino groups of the proteins and leads to fluorescent-labelled proteins. Usually, protein labelling is conducted prior to the 2-DE separation. A side effect of that practice is a major difference between labelled and unlabelled proteins, e.g. the Mr difference observed with the DIGE staining protocol (Tonge et al., 2001). With MDPF, the chemical reaction is conducted after the electroblotting process and directly on the collecting membrane. This approach has the advantage that the protein separations are conducted with unmodified material, so that no spot shift is observed, in contrast to what occurs between DIGE labelled and unlabelled samples (Tonge et al., 2001). Moreover, this reagent needs to be covalently linked to the free amino groups to express fluorescence, whereas the excess of reagent and the by-products are not fluorescent. This is a crucial advantage of the technique, since a low fluorescence background improves the detection sensitivity to 5 ng per band or spot. The protein modification does not affect further investigation such as immuno-blotting, but it will affect protein cleavage using trypsin, since the ε-amino groups of Lys are modified and blocked. Limit of detection ranges of the most frequently used membrane stains are listed in Table 3. According to Berggren et al. (Berggren et al., 1999), some staining processes are not compatible with further analysis such as immuno-detection. However, other groups have conducted successful immuno-detection on AB stained membranes (Sanchez, Wirth et al., 1997).

3.4.2. Radiolabelled protein detection

Another frequently used technique for visualising proteins is to incorporate radioisotope-labelled AAs into the protein primary structure. This technique can be easily applied in vivo.
The growth medium is enriched with radiolabelled AAs, typically labelled on the sulfur atom of Met or Cys residues, e.g. 35S-labelled Met (Cossio, Sanchez, Wettstein, & Hochstrasser, 1997; Patterson et al., 1996), which are directly incorporated into the protein sequence. These radioactive proteins can then be visualised by radiography or fluorography after a separation step (mono- or polydimensional). Incorporation of radioisotope-labelled AAs into the sequence of proteins can be achieved only when the proteins are generated in vivo; it is not applicable to in vitro labelling. For that purpose, two other radiolabels are available:
- 14C labelling: covalent protein modification (Bolt & Mahoney, 1997),
- 125I labelling: non-covalent protein modification (Mozdzanowski & Speicher, 1992).
These modifications are usually conducted on purified proteins and not on complex mixtures (Bolt & Mahoney, 1997; Mozdzanowski & Speicher, 1992). The
techniques are limited to method development much more than protein quantitation, which is more accurate using 35S AAs.

3.4.3. Denaturing staining process

Some extremely sensitive staining methods are available for visualising gel-separated polypeptides on a membrane. Unfortunately, it is generally impossible to reuse such membranes for further analysis such as immuno-detection. As an example, India ink allows visualisation down to 4–8 ng of protein per spot (Lek et al., 1995). The intermediate solution is to produce a duplicate of the membrane (from a single gel (Neumann & Mullner, 1998) or from two gels prepared in parallel). One of the two membranes can be used with a sensitive stain, e.g. India ink staining, whereas the second is used for a non-denaturing process, e.g. immuno-detection (Hailat & Hanash, 1995). Nevertheless, another group obtained successful results with a single membrane used for a denaturing staining process followed by immuno-detection (Eynard & Laurière, 1998). Colloidal silver staining of membranes is also possible but is not frequently used (Christiansen & Houen, 1992; Gultekin & Heermann, 1988). The limit of detection of the technique is between 15 and 30 ng per spot or band; this is nearly the same as the CBB R250 detection limit, so this staining method is not especially useful. Furthermore, colloidal silver staining is much more time-consuming than CBB R250 staining, and the staining pattern is usually very different from that of other stains, which means that it can be difficult to link such images to images obtained with other staining processes (Tonella et al., 1998). Colloidal gold is used for protein staining, but mainly on PVDF membranes. This method shows some of the best sensitivities, with limits of detection around 1–4 ng (Berggren et al., 1999), similar to that of SYPRO® ruby stain, which is compatible with MS analysis.
Colloidal gold staining irreversibly modifies proteins, making it impossible to perform subsequent protein identification by mass spectrometry or immuno-detection (Cossio et al., 1997).

3.5. Conclusion

Blotting techniques aim to transfer proteins previously separated by SDS-PAGE onto a polymeric membrane. Such a process is sometimes needed, e.g. for Edman degradation or immuno-blotting, but this step is usually not used for PMF sample preparation, especially because of the loss of material during the electroblotting process (Vestling & Fenselau, 1994), which decreases the sensitivity of the approach. Because large endoproteinases such as endoproteinase V8 or Glu-C do not allow in-gel digestion, protein contained in gel plugs must be extracted in a liquid format or onto the surface of a PVDF membrane to make it much more accessible (Good et al., 1995; Rosen, Shoshani, Naor, & Sela, 2001).
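The detection limits quoted in sections 3.4.1 and 3.4.3 can be collected into a small lookup to shortlist stains for a given protein load. The Python sketch below is an illustration only: the function name `candidate_stains` is hypothetical, and the selection rule (a stain qualifies when the load reaches the upper end of its quoted limit range) is a conservative assumption rather than a rule stated in this chapter. The `denaturing` flag simply records whether the stain is described under section 3.4.3.

```python
# Detection limits (ng per band or spot) as quoted in sections 3.4.1 and 3.4.3.
# Each entry: (name, lower limit, upper limit, denaturing).
# "denaturing" is True for the stains described under section 3.4.3.
# MDPF is listed with section 3.4.1, although it covalently modifies amino groups.
STAINS = [
    ("Coomassie blue R250", 10, 30, False),
    ("Amido Black", 15, 60, False),
    ("Ponceau S", 60, 100, False),
    ("Bathophenanthroline disulfonate/europium", 15, 30, False),
    ("SYPRO ruby", 2, 8, False),
    ("MDPF", 5, 5, False),
    ("India ink", 4, 8, True),
    ("Colloidal silver", 15, 30, True),
    ("Colloidal gold", 1, 4, True),
]


def candidate_stains(amount_ng, allow_denaturing=False):
    """Return the stains whose quoted limit range lies at or below amount_ng.

    Assumption: a stain qualifies only if the load reaches the upper end of
    its quoted limit-of-detection range.
    """
    selected = []
    for name, _low, high, denaturing in STAINS:
        if denaturing and not allow_denaturing:
            continue  # keep the membrane reusable for later analyses
        if amount_ng >= high:
            selected.append(name)
    return selected


print(candidate_stains(10))                         # membrane to be reused
print(candidate_stains(10, allow_denaturing=True))  # duplicate membrane available
```

With a 10 ng load and a membrane that must be reused, the sketch keeps only SYPRO ruby and MDPF; allowing denaturing stains, as on a duplicate membrane, adds India ink and colloidal gold.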
4. PROTEIN IDENTIFICATION

4.1. Introduction

The primary sequences of proteins from an increasing number of complete genomes (http://www.tigr.org/mdb/mdbcomplete.html), such as Haemophilus influenzae Rd (Fleischmann et al., 1995), Escherichia coli (Blattner et al., 1997), Saccharomyces cerevisiae (Goffeau et al., 1996), Caenorhabditis elegans (C. elegans Sequencing Consortium, 1998), Arabidopsis thaliana (Arabidopsis Genome Initiative, 2000), Drosophila melanogaster (Adams et al., 2000) as well as Homo sapiens (Dunham et al., 1999; Lander et al., 2001; Venter et al., 2001), are now readily available. But direct translation of the DNA sequences into polypeptide sequences does not, at present, allow us to identify and understand the physiology of the organisms (Achaz, Coissac, Viari, & Netter, 2000; Hood, 1999), or the exact sub-cellular localisation, the yield of expression or the exact functions of such products. It must be noted that DNA sequences represent the storage of information, whereas the function depends on their translation products, i.e. the proteins.

4.2. Uncleaved protein identification procedures

The use of uncleaved proteins for their identification and characterisation is limited to a few methods previously used on a large scale. Two of them, and probably the most widely used, are the Edman sequencing technique (Edman, 1950; Edman & Begg, 1967) and immuno-detection (Towbin et al., 1979). In both cases, the analyses require a chemically stable substrate or matrix onto which proteins are transblotted from the gel, e.g. PVDF membranes (see section 3). The first method, Edman degradation, involves a chemical reaction between a reagent, e.g. phenyl isothiocyanate, and free amino groups. The N-Ter group as well as the ε-amino groups of the Lys residues present in the protein sequence are modified. A second chemical reaction selectively cleaves the chemically modified N-Ter AA. The "by-product" of this reaction is available for AA characterisation using liquid or gas chromatography.
Using a few cycles of the treatment described, the N-Ter sequence can be identified (Edman & Begg, 1967). Of course, such a process requires purified peptides, since by-product characterisation needs to be as clear as possible. With this technique, 40–50 residues can be identified per day (personal communication, Dr G. Hugues, University Medical Centre, Geneva). This technique was previously a method of choice for protein characterisation, e.g. of pig trypsin (Hermodson, Ericsson, Neurath, & Walsh, 1973), β-casein (Ribadeau Dumas, Brignon, Groschamet, & Mercier, 1972), or human albumin (Meloun, Moravek, & Kostka, 1975). At present, this powerful technique is mostly used for identification of small peptides (Kollisch, Lorenz, Kellner, Verhaert, & Hoffmann, 2000), for PTM characterisation (Lehr et al., 2000) and for quality control of synthetic peptides. It has also been used jointly with mass spectrometry to characterise AAs with similar
masses (Leu/Ile) or slightly different masses (Lys/Gln, Phe/Met-sulfoxide). This type of use has declined as a result of the development of MSn techniques, which are able to clearly identify the AA sequence even when the N-Ter group is modified by a post-translational modification, e.g. acetylation, which is frequent for proteins located in tissues.

Identification of uncleaved proteins is also possible by immuno-detection. This identification is based on protein recognition using antibodies. Two types of antibody are used, with different results:
- Polyclonal antibodies: these molecules have limited specificity and can cross-react with other, non-specific proteins,
- Monoclonal antibodies: these molecules must react fairly specifically with a limited range of proteins.
In both cases, protein mixtures must be previously separated (see section 2) using, for example, 1-DE or 2-DE followed by a transblotting step (see section 3). Location of the protein–antibody complex is achieved using chemi-luminescence, fluorescence or radioisotope emissions. This technique is extremely sensitive and allows detection down to 5–10 ng per band or spot, which corresponds to 100–200 fmol (Berggren et al., 1999). Nevertheless, good specificity of the test and clear protein identification are difficult to achieve due to possible cross-reaction. Also, it is very difficult to identify several proteins at the same time, notwithstanding the development by Sanchez et al. (Sanchez, Wirth et al., 1997) of a method able to identify nine oncogenic proteins at the same time with nine different antibodies.

The use of uncleaved proteins in mass spectrometry is limited to the determination of a few physico-chemical characteristics such as protein/protein interaction (Mackun & Downard, 2003; Rusconi, Guillonneau, & Praseuth, 2002) and molecular mass (Clauser, Baker, & Burlingame, 1999; Galvani et al., 2000; Karas & Hillenkamp, 1988; McComb et al., 1997; Vestling & Fenselau, 1994).
Usually, proteins are extracted out of the gel using a passive method (Cohen & Chait, 1997; Galvani et al., 2000; Mirza et al., 2000) or electroelution (Galvani et al., 2000; Haebel, Jensen, Andersen, & Roepstorff, 1995; Le Maire, Deschamps, Moller, Le Caer, & Rossier, 1993). Ogorzalek et al. (Ogorzalek Loo et al., 1997a) developed a technique for the determination of the protein MW directly in the separating gel. In fact, mass spectrometry is able to determine protein molecular weight with very high precision when a Fourier transform detector is used (Clauser et al., 1999), but due to the large number of possible PTMs it is nearly impossible to use such information for direct protein identification.
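The equivalence quoted above for immuno-detection (5–10 ng per band corresponding to 100–200 fmol) depends on the molecular mass of the protein. The short Python sketch below makes the conversion explicit; the function names are hypothetical, and the Mr of 50,000 is not stated in the text but is the value implied by the quoted figures.

```python
def ng_to_fmol(mass_ng, mr):
    """Convert a protein mass in ng to an amount in fmol for a given Mr (g/mol)."""
    # ng -> g is a factor 1e-9 and mol -> fmol a factor 1e15: combined factor 1e6.
    return mass_ng * 1e6 / mr


def implied_mr(mass_ng, amount_fmol):
    """Return the Mr (g/mol) for which mass_ng corresponds to amount_fmol."""
    return mass_ng * 1e6 / amount_fmol


# Assumption: Mr of 50,000, the value implied by the quoted equivalence.
ASSUMED_MR = 50_000

for mass_ng in (5, 10):
    print(f"{mass_ng} ng at Mr {ASSUMED_MR} = {ng_to_fmol(mass_ng, ASSUMED_MR):.0f} fmol")
# prints "5 ng at Mr 50000 = 100 fmol" and "10 ng at Mr 50000 = 200 fmol"
```

For a smaller protein the same mass represents more material: 5 ng of a 25 kDa protein is 200 fmol.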
4.3. Enzymatic cleavage of proteins

4.3.1. Introduction

Direct identification of uncleaved proteins is limited due to the previously described problems. Present techniques of protein identification based on mass spectrometric analysis require the use of cleaved proteins (see section 3.6.1). The object of digestion protocols prior to MS identification of proteins is to obtain sufficient enzymatic or chemical cleavage to successfully extract peptides from the matrix (solution, gel or membrane) in a form that is directly compatible with MS analysis.

4.3.2. Enzymatic cleavage description

Enzymatic cleavage is mostly conducted on previously separated proteins in order to decrease sample complexity. This approach allows us to clearly identify the analysed proteins. It is mostly used for protein identification and characterisation in a limited amount of sample. In some cases, the idea is not to characterise proteins but just to identify as many of them as possible in the sample. For this approach, an increasing number of processes called top-down proteomic analysis (Godovac-Zimmermann & Brown, 2003; Thevis, Loo, & Loo, 2003) start with the digestion of a complex mixture, which can be a whole-cell lysate or an enriched medium (enriched organelles or protein complexes, …). The peptide mixtures obtained are then treated by chromatography to decrease sample complexity before the mass spectrometric analysis step. Depending on the chemical treatment, all of the peptides can be available for the analysis (Geng, Ji, & Regnier, 2000; Link et al., 1999) or only a selection of them, e.g. cysteine-containing peptides (Gygi et al., 1999). In the early days of protein identification using mass spectrometry, proteins were commonly blotted onto PVDF membranes, as was done for protein identification based on Edman degradation. With time, it clearly appeared to be better to conduct protein digestion directly in-gel.
The main advantages are to limit sample loss during the blotting process and increase material extraction after the digestion.
4.3.2.1. Treatment and digestion of transblotted proteins

Although these methods are not frequently used, other identification processes such as immuno-detection or sugar labelling are sometimes integrated into the protein identification method. In such cases, the proteins bound on the PVDF membrane are stained and the spots of interest are cut out. After this, the first step is to remove the stain and thereby decrease artefactual stain peaks (Salih & Zenobi, 1998). Due to the hydrophobicity of PVDF, some hydrophobic binding sites remain available and can capture trypsin before the digestion. Different processes are therefore used to cap the membrane with non-proteic reagents such as polyvinylpyrrolidone (PVP-40) (Hirano
et al., 1991; Schleuder et al., 1999), octyl-β-glucopyranoside (Courchesne, Luethy, & Patterson, 1997; Pappin, 1997) or hydrogenated Triton X100 (Fernandez et al., 1998). Another common protein treatment before endoproteolytic digestion is reduction followed by alkylation of the sulfhydryl groups. After this modification, the primary sequence of the proteins is more easily accessible to the digesting enzyme. The reduction reaction is usually conducted using DTE/DTT or mercaptoethanol, followed by an alkylation reaction using iodoacetamide, iodoacetic acid, 4-vinylpyridine (Fernandez et al., 1998; Vestling & Fenselau, 1994) or other related compounds, e.g. ICAT reagents (Gygi et al., 1999). Sulfhydryl groups can also react during the gel separation process with non-polymerised acrylamide (Bordini, Hamdan, & Righetti, 2000). Polymeric membranes have the great advantage of retaining the protein substrate at the surface of the membrane, so that the excess of reagent can easily be washed away. The digestion step is then performed by immersing the piece of membrane in a buffer solution containing the endoproteolytic enzyme. Most of the related articles use trypsin in a bicarbonate buffer (Courchesne et al., 1997; Fernandez et al., 1998; Pappin, 1997). Reported digestion conditions range from 1 to 24 hours at a temperature of 20 to 37°C. It must be noted that the technique is particularly interesting for large endoproteolytic enzymes such as Asp-N or Glu-C, which are not very efficient for in-gel digestion. After the digestion, peptides are easily extracted using 50% MeCN with 0.1% TFA or FA.

4.3.2.2. Treatment and digestion of gel-separated proteins

At present, in-gel digestion is the most frequently used digestion technique for gel-separated proteins. Advantages of such direct treatment are the availability of the total protein sample and decreased contamination linked to intermediate treatments.
As described in the previous section, sulfhydryl groups of the cysteines are reduced and alkylated, especially for 1-DE separated proteins, where the reduction/alkylation step is not included in the separation process. Reagents are usually the same as previously described (Egelhofer, Bussov, Luebbert, Lehrach, & Nordhoff, 2000), and the treatment is followed by a washing and a destaining step. At that stage, the gel plug is dehydrated, rehydrated with the endoproteolytic solution (containing the correct buffer) and left for 1 to 24 hours at 20 to 37°C for digestion. At the end of this digestion process, the supernatant is collected. A first extraction is conducted using 0.1–1% TFA or FA (TFA is frequently used for the MALDI approach, whereas FA is used for the ESI technique). A final extraction can be conducted using 50% MeCN with 0.1% TFA or FA. The pooled fractions are partially dried to concentrate the sample and to remove the MeCN if a combined approach with LC-MS is used. Bonetto et al. (Bonetto, Bergman, Jörnvall, & Sillard, 1997) proposed alkylating peptides after the tryptic digestion using 2-bromoethyl-trimethylammonium (Itano & Robinson, 1972). Such modification of the cysteine-containing peptides facilitates carboxypeptidase P and Y degradation.
4.3.2.3. Utilisation of immobilized endoproteinases

Usually, the endoproteolytic digestion of proteic substrates is achieved by the addition of the endoproteinase with the corresponding buffer to lyophilised material, gel plugs or pieces of PVDF membrane. Nevertheless, a few examples in the literature use immobilised enzymes, and the most frequently cited is trypsin due to its specificity and its availability at low cost. Frequently, use of an endoproteinase to produce peptides involves autolysis of the enzyme. After a few minutes or hours, the enzyme loses most of its activity. This is the case for trypsin, where α- and β-trypsins are converted to ψ-trypsin (see trypsin cleavage specificity), which exhibits a completely different specificity. Overnight digestion at 37°C is then not always a good idea, since it can induce unspecific cleavage (see section 4.3.3.2). A better solution is usually 1–2 hours at 37°C or overnight at room temperature, to limit unspecific cleavages that are favoured mostly by time and temperature. If the endoproteinase is immobilised on a solid substrate, the autolytic reaction is limited and the enzyme activity is generally preserved. Krogh et al. (Krogh, Berg, & Hojrup, 1999) proposed immobilising trypsin on paramagnetic beads (Dynabeads®). Such a digestion process allows good protein digestion to be obtained without autolysis fragments of the trypsin. On-target digestion of the protein was also developed to limit the loss of hydrophobic peptides at the surface of the sample tubes. Dogruel et al. (Dogruel, Williams, & Nelson, 1995) developed a functionalised gold surface where the trypsin was covalently linked to the gold surface via bi-functional linkers. The sample, in the correct buffer, is loaded on the target and the digestion is conducted at 37°C in a humid atmosphere. The protein hydrolysis is stopped by the addition of an acidic solution of matrix.
This approach theoretically allows visualisation of all of the peptides without loss, although suppression effects in MALDI still limit complete peptide visualisation. Such an approach was used by Théberge et al. (Théberge, Connors, Skinner, & Costello, 2000), who captured transthyretin using an affinity column and loaded the bound material directly onto the reactive plate. Recently, Wang and Regnier (Wang & Fitzgerald, 2001) developed an on-line system in which proteins contained in samples are first separated using size exclusion chromatography, and the separated proteins are then loaded on-line onto an immobilised trypsin column. Successful protein digestion was obtained after 20 minutes at 37°C. Vecchione et al. (Vecchione et al., 2001) used a similar approach for on-line characterisation of human fibrinogen variants.

4.3.3. Trypsin

Trypsin (EC 3.4.21.4) is the most widely used endoproteinase for protein identification using the technique called peptide mass fingerprint (PMF) protein identification. Trypsin is a serine peptidase belonging to peptidase family S1 and
Table 4. Main substrates used for tryptic activity measurement. Us. Abr.: usual abbreviation; λ (nm): absorption wavelength used during hydrolysis kinetic measurement. For fluorescence analysis, the A value corresponds to the activation wavelength whereas the E value corresponds to the emission wavelength. Method (Direct or Indirect): Direct when the substrate is followed directly by optical density measurement, Indirect when the result is obtained after a second quantitative operation able to quantitate the amount of modified substrate.

Substrate                                                       Us. Abr.  λ (nm)            Method    References
1-Anilino-8-naphthalenesulfonate                                ANS       -                 Indirect  (Spencer, Titus, & Spencer, 1975)
N-Benzoyl-DL-arginine p-nitroanilide                            DL-BAPA   410               Indirect  (Stewart, Lee, & Dobson, 1963)
N-Glutaryl-L-phenylalanine p-nitroanilide                       L-GPNA    410               Indirect  (Stewart, Lee, & Dobson, 1963)
p-Nitrophenyl p-guanidinobenzoate                               NPGB                        Direct    (Ford, Chambers, & Cohen, 1973)
L-Lysyl-nitroaniline                                                      -                 Direct    (Whittaker & Bender, 1965)
Benzoyl-L-arginamide                                                      -                 Direct    (Whittaker & Bender, 1965)
Dimethyl casein                                                 DMC       -                 Indirect  (Y. Lin, Means, & Feeney, 1969)
Benzoyl-L-arginine ethyl ester (Bz-Arg-OEt)                     BAEE      253               Direct    (Schwert & Takenaka, 1955)
α-Tosyl-L-arginine methyl ester (Tos-Arg-OMe)                   TAME      247               Direct    (Hummel, 1959)
Nα-Benzoyl-DL-arginine-4-nitroanilide/HCl (Bz-Arg-NHPhNO2)      BAPNA     -                 Indirect  (Mole & Horton, 1973; Zassenhaus, Hanson, & Wolgemuth, 1976)
Benzoyl-DL-arginine 7-amino-4-methylcoumarin (Bz-Arg-AMC)       BAAMC     A = 380, E = 460  Direct    (Zimmerman, Ashe, Yurewicz, & Patel, 1977)
Carbobenzoxy-L-arginine 7-amino-4-methylcoumarin (Cbz-Arg-AMC)  CBAAMC    A = 380, E = 460  Direct    (Zimmerman et al., 1977)
was described for the first time in 1876 by Kühne, who reported a proteolytic activity in pancreatic secretions. This enzyme is available from a large variety of animals, such as bovine (SWISS-PROT entry: P00760) or porcine (SWISS-PROT entry: P00761) sources. It has some advantages compared to other enzymes:
- It produces a limited amount of autolysis products,
- The masses of the peptides produced are typically between 500 and 2500 Da, which is compatible with MS analysis (MALDI and ESI-MS),
- Its molecular mass is 23.5 kDa, which facilitates its incorporation in the gel,
- Resistance against buffers (phosphate, Tris…) and chaotropes (urea, SDS, …),
- Specificity of the cleavage.
Usually, the enzyme/substrate ratio must be around 1/20 to 1/100 (w/w) in a 50 mM ammonium bicarbonate buffer pH 8 to obtain a correct digestion.

4.3.3.1. Enzymatic activity measurement

The optimum pH for trypsin activity is around 8, whereas at pH 3 the enzyme is completely inactivated without being denatured. Calcium chloride added to the buffer tends to limit the formation of autolysis products. Nevertheless, the usual buffer for trypsin digestion is ammonium bicarbonate, in which calcium cations and carbonate anions form an insoluble material; this limits the advantage of calcium and introduces insoluble particles into the sample. Enzymatic activity can be measured by various direct or indirect spectrometric methods on the substrates listed in Table 4. The most frequently used substrates are TAME (Hummel, 1959), BAEE (Schwert & Takenaka, 1955) or BAAMC (Zimmerman et al., 1977) for free enzyme, whereas BAPNA (Mole & Horton, 1973) is preferred for immobilised enzyme.

4.3.3.2. Enzyme specificity

Trypsin is considered a specific enzyme. This specificity is due to the requirement for a basic side chain for the proteolytic activity. Cleavages are obtained predominantly on the carbonyl side of the two basic AAs, Lys and Arg.
Enzymatic activity is 2 to 10 times lower for Lys than for Arg residues, but it is still 10^5 times higher for these two AAs than for the others (Craik et al., 1985). This rule is usually sufficient for protein identification by mass spectrometry, but the actual cleavage rules are somewhat more complex. Keil (Keil, 1992) has extensively investigated the cleavage sites of various endoproteinases. Enzyme selectivity is usually not related to a single AA but to a pattern of 2–8 residues. The secondary interaction sites between the enzyme and the substrate are not involved in the cleavage reaction itself, but in the selectivity and in the stability of the enzyme/substrate complex (see Figure 6: trypsin interactions with the substrate are situated between S2–S4 and S’1–S’4). As an example, a prolyl residue has a negative effect when situated at the position P’1
but if situated at the position P2 or P’2, it increases the stability of the enzyme/substrate complex and so has a positive effect on the cleavage kinetics (Zimmerman et al., 1977). Like prolyl residues, an arginyl residue at the positions P3, P’1 and P’3 has a negative effect on cleavage kinetics. In some cases, however, a prolyl residue at the P’1 position does not inhibit cleavage, as in the patterns XWK-PX or XMR-PX (X corresponds to any AA and M, R, W, K and P correspond to the one-letter AA code) (Keil, 1992). Hill (Hill, 1965) mentioned in his study the negative effect on cleavage kinetics of charged or polar AAs at the position P’1. Recently, Thiede et al. (Thiede et al., 2000) studied the digestion products of 104 proteins from human Jurkat T cells and Mycobacterium with regard to missed cleavage sites. Three main reasons explain the missed cleavages:
- Position P’1 is occupied by a Pro,
- Position P2 and/or P’1 is occupied by Lys or Arg,
- Position P2 and/or P’1 is occupied by Glu or Asn.

Substrate:       Pn - P4 - P3 - P2 - P1 - P’1 - P’2 - P’3 - P’4 - P’n
Endoproteinase:     - S4 - S3 - S2 - S1 - S’1 - S’2 - S’3 - S’4 -
(the cleavage site lies between P1 and P’1; the S positions are the interaction sites)

Figure 6. Schematic representation of the enzyme/substrate complex; positions Pn and P’n are determined from the cleavage site position that is situated between P1 and P’1
“Non-specific” cleavages are also reported with trypsin. The most frequent are found at the C-terminal side of tryptophanyl, tyrosyl, phenylalanyl and methionyl residues. Cleavage after the first three residues can be explained by the chymotryptic activity contained in trypsin preparations, but cleavage after Met is much more difficult to explain. Moreover, this atypical cleavage is also found when chymotrypsin inhibitors such as L-1-tosylamido-2-phenylethyl chloromethyl ketone (TPCK) are added to the trypsin (Keil, 1982). In fact, commercially available trypsin contains three different isoforms: α, β and ψ. These three components can be easily separated by chromatography. The first two isoforms are well known, whereas ψ-trypsin, also called pseudo-trypsin, corresponds to an autolysis product of trypsin. In conclusion, trypsin can be considered a specific endoproteinase since its activity after lysyl and/or arginyl residues is far higher than after the other residues, but we must be aware that unspecific cleavages can also occur.
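The cleavage rules discussed in this section can be sketched as a small in-silico digestion. The following is only a minimal illustration, assuming the simplest form of the rule (cleavage after Lys or Arg, blocked by Pro at P’1) and a user-chosen number of missed cleavages; the function names and the example sequence are hypothetical, and the finer effects described by Keil (residues at P2, P3, P’2 or P’3) are deliberately ignored.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (free N- and C-termini)
PROTON = 1.007276   # singly protonated ion, as usually observed in MALDI


def tryptic_sites(seq, proline_rule=True):
    """Indices i where the bond between seq[i-1] (P1) and seq[i] (P'1) is cut."""
    sites = []
    for i in range(1, len(seq)):
        if seq[i - 1] in "KR":
            if proline_rule and seq[i] == "P":
                continue  # simplified rule: Pro at P'1 blocks the cleavage
            sites.append(i)
    return sites


def digest(seq, missed=0, proline_rule=True):
    """Peptides obtained with at most `missed` uncleaved sites per peptide."""
    bounds = [0] + tryptic_sites(seq, proline_rule) + [len(seq)]
    last = len(bounds) - 1
    peptides = []
    for i in range(last):
        for j in range(i + 1, min(i + 1 + missed, last) + 1):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


def mh(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def pmf_masses(seq, missed=1, low=500.0, high=2500.0):
    """Sorted [M+H]+ values inside the mass window used for fingerprinting."""
    return sorted(round(mh(p), 4) for p in digest(seq, missed)
                  if low <= mh(p) <= high)


if __name__ == "__main__":
    example = "AAKPGRMLKW"  # hypothetical sequence, for illustration only
    print(digest(example))             # complete digestion
    print(digest(example, missed=1))   # one missed cleavage allowed
```

The default window of 500 to 2500 Da in `pmf_masses` simply mirrors the typical tryptic peptide mass range quoted above; a real search would also have to consider modifications and the “non-specific” cleavages just described.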
4.3.4. Endoproteinase Lys-C

Lysyl endopeptidase or endoproteinase Lys-C (EC 3.4.21.50) is considered a specific enzyme of the serine proteases belonging to peptidase family S1. Cleavage occurs specifically at the C-terminal side of lysine. It is produced by Achromobacter lyticus (strain M497-1), but a similar enzyme can be found in the bacterial strain ATCC 29487 of Lysobacter enzymogenes. Achromobacter lyticus produces this extremely specific proteinase in a mixture that also contains a low quantity of two other endoproteinases of unknown specificity (Masaki, Nakmura, Isono, & Soejima, 1978). The optimal pH for the endoproteolytic activity is around 9–9.5, but the enzyme also exhibits an esterase activity at a pH around 8. Usually, the enzyme/substrate ratio must be around 1/200 to 1/400 (w/w) in a Tris/HCl buffer pH 9 to obtain a complete hydrolysis of the lysyl bonds in 6 hours at 37°C (Tsunasawa et al., 1987). This enzyme is stable from pH 4 to 10 and still active up to 40°C. The interaction domain is situated on the residues S1 to S3 (Sakiyama et al., 1990). In terms of specificity, some partial cleavages were found for the bonds Lys-Gln, Lys-Val and Lys-Pro (Jekel, Weijer, & Beintena, 1983) and, more generally, enzymatic activity decreases when the P’1 position is occupied by a prolyl residue or a charged AA (Tsunasawa et al., 1987). Unspecific cleavages were found for the bonds Arg-Gln, Thr-Phe, Phe-Gly (Jekel et al., 1983), Arg-Ser and Arg-Ala (Yonetsu et al., 1986). This enzyme is only moderately used for protein identification by PMF, probably for the following reasons:
- The average frequency of lysine in the proteins contained in SWISS-PROT (release 41) is around 5.96%; a cleavage therefore occurs every 16–18 residues on average, which represents an average peptide mass of 2000–2200 Da and requires an analysis window of at least 1000 to 4000 Da. Such peptides are not easily analysed by MALDI-MS.
- The biochemical properties of the peptides produced are similar to those produced by trypsin digestion (basic C-terminal residue). Thus, combining enzymes such as trypsin and Lys-C to improve sequence coverage is not particularly useful, since “similar” peptides are produced and generally similar peptide sequences are missing.
Up to now, only a few groups have used the lysyl endopeptidase on a large scale (Fountoulakis et al., 1998; Patterson et al., 1996).

4.3.5. Chymotrypsin

Chymotrypsin (EC 3.4.21.1) is considered a non-specific enzyme of the serine proteases belonging to peptidase family S1. Like trypsin, this enzyme is found in the pancreas of various animals and it is the main contaminant of trypsin. By
itself, this enzyme corresponds to an equimolar mixture of chymotrypsin A (Hartley, 1964; Meloun et al., 1966) and chymotrypsin B (Smillie, Furka, Nagabhushan, Stevenson, & Parkes, 1968), which share 80% sequence identity and have similar activity (Gráf, Szilágyi, & Venekei, 1998). This enzyme is considered to be of low specificity due to the large number of possible cleavage sites. The main activity is found at the C-terminal side of tryptophanyl, tyrosyl, phenylalanyl and leucyl residues (Bergmann & Fruton, 1941; Desnuelle, 1960; Neurath, 1957). These four AAs correspond to the most frequently hydrolysed bonds, but cleavage can also occur after methionyl and histidyl residues to a lesser extent. Even if only these four AAs are considered, which represent approximately one-fifth of all available residues (by comparison with the average AA composition of SWISS-PROT release 41), it is not guaranteed that cleavage will occur every time. In fact, for the leucyl residue, only 2 bonds out of 5 are effectively cleaved. Accordingly, since it is still impossible to predict the cleavage sites, it is not possible at present to use this endoproteinase for protein identification by the PMF approach. Nevertheless, this endoproteinase is of interest when the challenge is maximum sequence recovery, e.g. for post-translational modification characterisation (Kussmann & Roepstorff, 1998).

Table 5. A non-exhaustive list of pepsins with some physico-chemical values

Species | Protein name | SWISS-PROT entry | MW (kDa) | pI | Human identity (%)
Homo sapiens (Human) | Pepsin A | P00790 | 34.628 | 3.36 | 100
Macaca fuscata fuscata (Japanese macaque) | Pepsin A2/A3 | P27677 | 34.569 | 3.38 | 95
Macaca fuscata fuscata (Japanese macaque) | Pepsin A1 | P03954 | 34.497 | 3.24 | 94
Macaca fuscata fuscata (Japanese macaque) | Pepsin A4 | P27678 | 34.828 | 3.80 | 91
Macaca mulatta (Rhesus macaque) | Pepsin A | P11489 | 34.497 | 3.24 | 94
Canis familiaris (Dog) | Pepsin A | Q9GMY6 | 34.738 | 3.72 | 86
Gallus gallus (Chicken) | Pepsin A | P00793 | 35.532 | 4.26 | 61
Sus scrofa (Pig) | Pepsin A | P00791 | 34.699 | 3.24 | 82
Rana catesbeiana (Bull frog) | Pepsin | Q91322 | 35.786 | 3.47 | 52
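The reason why unpredictable cleavage defeats peptide mass fingerprinting can be made concrete by counting candidate peptides. The sketch below is an illustration only: it assumes that every potential site may or may not be cleaved independently, which is a simplification of the behaviour described above for chymotrypsin, and the function names and the 300-residue example are hypothetical.

```python
def count_sites(seq, residues):
    """Number of peptide bonds that follow one of the given P1 residues."""
    return sum(1 for aa in seq[:-1] if aa in residues)


def n_peptides(n_sites, max_missed=None):
    """Number of distinct peptides a search would have to consider.

    n_sites    -- potential cleavage sites in the protein
    max_missed -- maximum number of uncleaved sites inside one peptide;
                  None means any site may remain uncleaved.
    """
    n_fragments = n_sites + 1
    if max_missed is None or max_missed > n_sites:
        max_missed = n_sites
    # A peptide spanning m uncleaved sites can start at (n_fragments - m) positions.
    return sum(n_fragments - m for m in range(max_missed + 1))


if __name__ == "__main__":
    # Hypothetical 300-residue protein in which about one residue in five
    # is Trp, Tyr, Phe or Leu, i.e. roughly 60 potential sites.
    sites = 60
    print("complete digestion:        ", n_peptides(sites, 0))
    print("at most one missed site:   ", n_peptides(sites, 1))
    print("every site optional:       ", n_peptides(sites))
```

With a specific enzyme, only the first one or two figures matter; when any site may stay uncleaved, the number of candidate masses grows roughly with the square of the number of sites, which increases the risk of random matches.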
A large number of unspecific cleavages were identified after Asn, cysteic acid, Gln, Gly, His, Ile, Lys, Ser, Thr, Val with a low frequency. In fact, the interaction
sites between enzyme and substrate cover positions P5 to P’3, with a low influence for P4, P5 and P’3. Duan & Laursen (Duan & Laursen, 1994) quantified the dominant activity linked to the AA residue in position P1. Chymotryptic activity of the free enzyme can be determined by direct spectrometric measurement of the p-nitroaniline released during the hydrolysis reaction (Del Mar, Largman, Brodrick, & Geokas, 1979).

4.3.6. Pepsin

Pepsin (EC 3.4.23.1), an acidic protease discovered in the 18th century, is secreted by the stomach of various vertebrates (Table 5) from human to chicken or frog. This aspartic peptidase belongs to peptidase family A1 and is considered a non-specific enzyme. Its inactive precursor, pepsinogen, is produced in the stomach mucosa, where some other peptidases (pepsin B, C and D) are also present. Peptide hydrolysis occurs in the pH range 1 to 6 (Lin et al., 1992), with an optimum of activity at pH 3.5, but the enzyme is usually used at pH 2. Lockridge et al. (Lockridge, Adkins, & La, 1987) have shown a better specificity when the digestion procedure was conducted at pH 1.3 rather than pH 2, mainly for the cleavages involving phenylalanyl and leucyl residues.

Pepsin exhibits a low specificity due to the number of possible cleavages on the carbonyl as well as the amino side of aromatic AAs such as Phe, and of hydrophobic residues such as Leu. As usual, the enzyme specificity was determined with small substrates such as Phe-Phe or Glu-Tyr, but with poor efficiency. In fact, the specificity of pepsin was shown to be better on longer substrates (Tang & Koelsch, 1995). Positions P1 and P’1 are crucial in terms of recognition site specificity for large hydrophobic AAs, such as phenylalanyl or leucyl residues (Tang & Koelsch, 1995). The other subsites are not selective and can accommodate various residues such as Leu, Ala, norleucine, Glu, Ser, Asp, Arg and Ile in a peptide substrate.
Basic AAs have a negative influence on the kinetics of hydrolysis. For example, the pattern XRXFX (X = any AA) will not produce the cleavage around the phenylalanyl residue. In the case of a prolyl residue, positions P4 and P3 help cleavage, whereas positions P2 and P’3 decrease cleavage efficiency. Depending on all of these factors, the kinetics of hydrolysis differ as a function of the AAs involved (Rao & Dunn, 1995). This enzyme also shows transpeptidase and esterase activity (Ryle & Porter, 1959). To conclude, pepsin is not suitable for protein identification by peptide mass fingerprint because it produces unpredictable peptides. Nevertheless, pepsin is a good compromise for the characterisation of post-translational modifications, since cleavage is generally situated on the hydrophobic sites of the proteins.
4.3.7. Bacterial endopeptidases

Other endoproteinases from bacterial sources are commercially available with a broad range of specificity. Some of them have been moderately used for protein identification or characterisation.

The endoproteinase V8 or endopeptidase Glu-C (EC 3.4.21.19) is produced by Staphylococcus aureus (SWISS-PROT entry: P04188) and was first identified by Drapeau et al. (Drapeau, Boily, & Houmard, 1972). This serine peptidase belongs to peptidase clan SA, family S2, and is considered a specific enzyme with cleavage at the carbonyl side of negatively charged residues such as Glu and Asp. Nagata et al. (Nagata, Yoshida, Ogata, Araki, & Noda, 1991) showed that the enzymatic activity is optimum at different pH values for glutamyl and aspartyl residues (pH 8 and pH 7 respectively). Theoretically, bicarbonate buffers (50 mM pH 7.6) preferentially induce a cleavage after glutamyl residues, whereas phosphate buffers (100 mM pH 7.8) induce cleavages after both residues. However, bicarbonate buffer seems to act as an “inhibitor” during the hydrolysis reaction (Sørensen, Sørensen, & Breddam, 1991). A few Glu-specific serine proteases have been isolated from other bacteria (Birktoft & Breddam, 1994), e.g. Bacillus licheniformis (SWISS-PROT entry: P80057). Its utilisation is mostly limited to blotted proteins and protein solutions (Le Maire et al., 1993; Puri & Surolia, 1994).

Peptidyl-Asp metalloendopeptidase or endoproteinase Asp-N (EC 3.4.24.33) was first isolated from the supernatant of a Pseudomonas fragi culture (Noreau & Drapeau, 1979; Porzio & Pearson, 1975). The amino acid sequence of this enzyme from the clan MX, family M99, is not known, but its selectivity allows cleavage at the N-terminal side of either aspartic acid or cysteic acid residues, the latter being an artefact derived from cysteine residues (Mitsumoto et al., 2001; Yang et al., 2002), but with low specificity (Ponstingl, Maier, Little, & Krauhs, 1986).

Prolyl oligopeptidase or Pro endopeptidase (EC 3.4.21.26) was discovered in the human uterine mucosa (SWISS-PROT entry: P48147) (Walter, Shlank, Glass, Schwartz, & Kerenyi, 1971). This peptidase has high enzymatic activity against prolyl residues, but its specificity is low. Quéméneur et al. (Quéméneur, Moutiez, Charbonnier, & Ménez, 1998) produced a recombinant enzyme with a modification of the interaction site to increase the specificity and obtain “specific” cleavages.

A few other enzymes are commercially available, such as clostripain (EC 3.4.22.8) from Clostridium histolyticum (SWISS-PROT entry: P09870) (Kocholaty, Weil, & Smith, 1938) or subtilisin (EC 3.4.21.62), but none of them has sufficient selectivity to be used at present for protein identification by PMF. Nevertheless, such endoproteinases are valuable when used in parallel with trypsin to increase protein sequence recovery for further characterisation.
4.3.8. Conclusion

The number of available endoproteinases able to produce predictable peptides is limited (trypsin, endopeptidase Lys-C, Asp-N, Glu-C), and is even lower if they are to be used for in-gel digestion. In 1982, Keil (Keil, 1982) proposed some ways to identify new specific enzymes:
- Discover new proteases with cleavage rules targeted to infrequent AA residues,
- Define the specificity more accurately for well-known enzymes.
Twenty years later, nothing has changed and trypsin is still the enzyme of choice for PMF. Nevertheless, considering the work published by Keil in 1992 (Keil, 1992) and the improvement of the data-mining approach, it seems possible to define endoproteolytic cleavage rules for what are usually called “unspecific” enzymes. Indeed, the large influence of the site of interaction between the substrate and the enzyme has been shown (Quéméneur et al., 1998). Although at present enzyme specificity is defined on a single AA (position P1 and sometimes also P’1), enzymes such as pepsin show a larger interaction domain. The bio-informatic treatment of such data may be able to produce interesting predictive models. Alternatively, Quéméneur et al. (Quéméneur et al., 1998) showed that it is possible to modify the interaction sites of a protease to increase its specificity. Such remodelling may be applied to other enzymes to increase their specificity.

4.4. Chemical cleavage of proteins

4.4.1. Introduction

Chemical cleavage of proteins was an important approach for protein identification using Edman sequencing. Such treatments allow very selective cleavage, but usually a large amount of material is needed and a desalting step must frequently be included in the digestion protocol to remove the salts and/or buffers needed for the chemical reaction. Nevertheless, some of them could be usefully adapted for use in the proteomic area.

4.4.2. Acidic hydrolysis

This must be one of the oldest protein cleaving techniques (Braconnot, 1820). External parameters, namely the acid source, its concentration and the temperature of the reaction, have a direct influence on the specificity of the cleavage (Hill, 1965). As an example, 1.5 M HCl at 100°C gives a hydrolysis constant of 0.216 × 10^-3 min^-1 for the Gly-Thr bond, and 0.025 × 10^-3 min^-1 for the Ile-Gly bond. Using 6 M HCl at the same temperature increases the hydrolysis
kinetics by 10-fold and 4-fold respectively, and decreases the specificity of bond hydrolysis. Inglis et al. (Inglis, McKern, Roxburgh, & Strike, 1979) obtained a specific cleavage at the carbonyl side of the asparagine residue with an HCl solution at pH 2.00 ± 0.04 for 2 hours at 108°C. Similar results were obtained by Landon (Landon, 1983) when 10% acetic acid buffered at pH 2.5 was used for 24–120 hours at 40°C. Usually, a decrease of the pH value decreases the selectivity of such hydrolysis. Using adapted conditions, it is possible to obtain a recurrent hydrolysis of a protein/peptide (Mirgorodskaya, Hassan, Wandall, Clausen, & Roepstorff, 1999). When used on purified peptides, this technique allows one to read the AA sequence directly on the mass spectrum (Gobom, Migorodskaya, Nordhoff, Hojrup, & Roepstorff, 1999; Vorm & Roepstorff, 1994). However, the technique is influenced by so many parameters that it is difficult to obtain specific hydrolysis. Thus, it is not possible to use it for protein identification by PMF, but again it could be useful for protein characterisation.

4.4.3. Cyanogen bromide

Cyanogen bromide is a useful reagent for cleaving proteins specifically after Met residues. This method, developed by Gross and Witkop (Gross & Witkop, 1961, 1962), is one of the most specific techniques for chemical cleavage. A mixture of 0.3 M CNBr in 70% acetic acid allows a complete hydrolysis of the methionyl bond on the carbonyl side, with conversion of the Met into homoserine lactone. The peptides produced are generally large, with molecular weights from 1000 Da up to a few thousand Da (personal communication, E. Gasteiger, Swiss Institute of Bioinformatics, Geneva). An unspecific cleavage was noted for the pattern XM-(S/T)X, and the cleavage is sometimes not quantitative. This technique was used with success for the identification of membrane proteins (van Montfort, Canas, Duurkens, Godovac-Zimmermann, & Robillard, 2002; van Montfort, Doeven et al., 2002).
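The cyanogen bromide rule described above (cleavage on the carbonyl side of Met, with conversion of the Met into homoserine lactone) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the example sequence are hypothetical, a Met at the protein C-terminus is left unmodified, and the mass shift of -48.00337 Da is the loss of CH3SH that accompanies lactone formation. Fragments whose cleavage site is followed by Ser or Thr are only flagged, since the text reports atypical behaviour for the pattern XM-(S/T)X.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
MET_TO_HSL = -48.00337  # Met -> homoserine lactone (loss of CH3SH)


def cnbr_fragments(seq):
    """Split after every internal Met; returns (fragment, lactone, st_flag).

    lactone -- True when the fragment ends with a Met converted to the lactone
    st_flag -- True when the residue following the cleaved Met is Ser or Thr
    """
    fragments = []
    start = 0
    for i, aa in enumerate(seq):
        if aa == "M" and i < len(seq) - 1:
            fragments.append((seq[start:i + 1], True, seq[i + 1] in "ST"))
            start = i + 1
    fragments.append((seq[start:], False, False))  # C-terminal fragment
    return fragments


def neutral_mass(fragment, lactone):
    """Monoisotopic neutral mass, with the lactone shift where it applies."""
    mass = sum(MONO[aa] for aa in fragment) + WATER
    return mass + MET_TO_HSL if lactone else mass


if __name__ == "__main__":
    for frag, lactone, flag in cnbr_fragments("AGMSTMK"):  # hypothetical sequence
        note = " (Met-Ser/Thr bond: check)" if flag else ""
        print(f"{frag:8s} {neutral_mass(frag, lactone):10.4f}{note}")
```

If the lactone ring is opened to homoserine during sample handling, the observed mass is higher by one water (18.0106 Da); this sketch does not model that equilibrium.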
In fact, due to the hydrophobicity of such polypeptides, it is not easy to produce a sufficient amount of peptides for protein identification when using trypsin. Since Met residues are generally situated in the hydrophobic regions, digestion with CNBr could decrease the hydrophobicity of the peptides produced and so facilitate their analysis.

4.4.4. Cleavage at the carbonyl side of the Trp

Due to its indole ring, Trp is extremely reactive, and a few chemical reagents can produce peptide backbone cleavage at this residue. These cleavage reactions were developed mostly for the Edman sequencing approach to protein primary sequence identification. o-Iodosobenzoic acid (Mahoney & Hermodson, 1979) was one of the usual reagents for Trp cleavage (Hara & Yamakawa, 1996; Puri & Surolia, 1994), but it is almost abandoned at present. This cleavage procedure was based on the indole ring oxidation
(Mahoney, Smith, & Hermodson, 1981), which required prior reduction of the protein and alkylation of the sulfhydryl groups to limit cysteic acid formation. Met residues are nevertheless oxidised during such treatment. The specificity of the cleavage is usually good, but residues like Ile or Val in position P’1 partially impair the cleavage (around 70% cleavage). Non-specific cleavage at the carbonyl side of Tyr was mentioned, especially if a Gly or Phe residue is at position P’1 (Mahoney & Hermodson, 1979). Such cleavage after the Tyr residue seems to be due to the dismutation of the o-iodosobenzoic reagent to iodoxybenzoic acid. Addition of p-cresol during the cleavage reaction decreases by-product formation and non-specific cleavage (Mahoney, Smith, & Hermodson, 1981). Another possible explanation of the non-specific cleavage obtained after Tyr residues was proposed by Fontana et al. (Fontana, Dalzoppo, Grandi, & Zambonin, 1983). In that case, a buffer composed of guanidine/HCl was a source of halogen atoms that could react with the phenol ring of the Tyr. Such a modified AA produces a cleavage in the same way as protein treatment with N-halogenosuccinimides. For both explanations, p-cresol addition allows a decrease in such by-products.

N-halogenosuccinimides (NHS) were used in the presence of urea to produce Trp-specific cleavage. The activity of such agents is directly linked to the halogen atom contained in the NHS. The bromine derivative (NBS) is highly reactive but, as a side effect, non-specific cleavages were observed after Tyr and His. Using NBS together with 2,4,6-tribromo-4-methylcyclohexadione decreases this excess of reactivity and increases the specificity of the reaction (Fontana, Savige, & Zambonin, 1979). The chlorine derivative (NCS) is less reactive than NBS, which increases the specificity of the cleavage, but the reactions are usually not quantitative.
At low NCS concentration (equimolar), the main reaction occurring is the oxidation of Met. At higher concentrations (10-fold excess of reagent), 50% of specific cleavage is obtained, as well as oxidation of Met and Cys residues (Lischwe & Sung, 1977). The iodine derivative (NIS) is not frequently used. Other halogenated agents have also been used, such as:
- A mixture of dimethylsulfoxide and hydrobromic acid (Savige & Fontana, 1977; Wachter & Werhahn, 1979),
- Cyanogen bromide in heptafluorobutyric acid (Fontana et al., 1979; Ozols & Gerard, 1977),
- A large excess of 2-(2-nitrophenylsulfenyl)-3-bromoindolenine or BNPS-skatole (Omenn, Fontana, & Anfisen, 1970).

4.4.5. Cleavage at Cys residues

A specific cleavage reaction at the cysteinyl residue was reported by Wu & Watson (Wu & Watson, 1998). In that case, Cys must be cyanylated with 1-cyano-4-dimethylaminopyridinium tetrafluoroborate. The cleavage is obtained by incubation
of the cyanylated substrate in alkaline buffer: 4 M guanidine/HCl and 0.25 M Tris/HCl (pH 9.0), 0.02 M NaOH (near pH 12), 2 M ammonia solution. This quantitative cleavage technique can be applied directly to native proteins and helps in disulfide bridge characterisation, and also in PMF protein identification when the proteins are reduced/alkylated before the digestion. 2-Nitro-5-thiocyanobenzoic acid was also used with success for Cys-specific protein cleavage (Degani & Patchornik, 1974; Jacobson, Schaffer, Stark, & Vanaman, 1973).

4.4.6. Conclusion

The main problem associated with specific protein cleavage using chemical compounds is the large excess of reagent required to obtain quantitative cleavage. Furthermore, most of these chemical reactions involve a non-volatile buffer. At the end of the cleavage process, a desalting step is absolutely needed before mass spectrometric analysis. Nevertheless, chemical cleavage is an interesting approach when the major challenge is to characterise the protein primary sequence. An example of such multi-digestion was described during the characterisation of a Psophocarpus tetragonolobus lectin (Puri & Surolia, 1994).

Table 6. Effect of the salt concentration on peptide mass determination by MALDI-MS: (+) no or negligible influence, (-) negative influence (Reprinted from Journal of Chromatography, (Amini et al., 2000), © 2000 with permission from Elsevier Science).

Buffers (salts) listed: Ammonium acetate, Guanidine, Gly, Tris, Ammonium bicarbonate, Imidazol, EPPS, Sodium acetate, CAPS, Sodium carbonate, Sodium citrate, Sodium borate, Sodium phosphate, HEPES, MES, ADA.
Concentrations tested (mM): 20, 50, 100, 150, 200, 250, 300, 350, 400, 500.
[Table body: all sixteen buffers are marked (+) at 20 and 50 mM; the per-buffer entries at higher concentrations could not be recovered from the extracted layout.]
4.5. Sample preparation and clean-up for MALDI-MS analysis
4.5.1. Chromatographic treatment

Although MALDI-MS is considerably more tolerant of buffers and salts than ESI, it is still advisable to remove as many contaminants from a sample as possible, since these species will compete with the peptides/proteins for ionisation (protons) in the gas phase, resulting in suppression of ionisation for the species of interest. Lecchi and Caprioli (Lecchi & Caprioli, 1996) have elegantly demonstrated the dramatic effects of high sample purity on the detection sensitivity in MALDI-ToF analysis of peptides. Amini et al. (Amini et al., 2000) investigated the signal response during MALDI-MS analysis of contaminated samples. 500 mM Gly or ammonium acetate does not affect the signal, whereas 100–150 mM sodium phosphate buffer strongly inhibits the analyte’s ionisation. Like phosphate, sodium dodecyl sulfate (SDS), a surfactant widely used during SDS-PAGE separation, suppresses all signals when present at only 30 mM (Table 6). Despite the tolerance of MALDI-MS for salt contamination, samples must be as free of salt as possible.

A number of groups simultaneously developed methods for the clean-up of small quantities of peptide mixtures, which often, but not always, were derived from in-gel digests (e.g., Courchesne & Patterson, 1997; Gevaert, Demol, Sklyarova, Vandekerckhove, & Houthaeve, 1998; Jensen, Wilm, Shevchenko, & Mann, 1998; Shevchenko, Wilm et al., 1996; Zhang, Andren, & Caprioli, 1995). All of these approaches employ reversed-phase material to bind the peptides, allowing salts, buffers and other polar gel-related contaminants to be washed away (or significantly reduced in concentration). The bound peptides can be eluted in a small volume (from sub-µl to low-µl) of high-concentration organic, acidified solvent, e.g. 70% acetonitrile (v/v) in 0.1% formic acid (v/v), thereby effecting both concentration of the peptides and clean-up of the sample.
The major challenge of such a technique is the use of systems with a dead volume below 1 µl (Courchesne & Patterson, 1997; Gobom, Nordhoff, Mirgorodskaya, Ekman, & Roepstorff, 1999). In some cases, the matrix could be dissolved directly in the elution solvent (Gobom, Nordhoff et al., 1999). Although this technique is extremely powerful, it is generally time-consuming, especially the packing of the chromatographic column. Recently, Millipore (Bedford, MA, USA) has launched a commercially available reversed-phase desalting system for peptide mixtures, called Zip Tips™ (Jin, Chen, Lubman, Misek, & Hanash, 1999). These are micropipette tips packed with different media, such as C18 phase for peptide purification, C4 for protein purification, or IMAC for phospho-peptide enrichment. Such material can be used with robots, allowing high throughput and high reproducibility of the process (Alvarez, Larsen, Coldren, & Rice, 2000; Hale, Butler, Kniermann, & Becker, 2000).
Also based on “reversed-phase chromatography”, the use of hydrophobic membranes such as PVDF membranes (Brockman, Dodd, & Orlando, 1997; Vestling & Fenselau, 1994), polyethylene membranes (Worrall, Cotter, & Woods, 1998; Worrall, Lin, Cotter, & Woods, 2000), polyurethane membranes (McComb et al., 1997) or any other hydrophobic material allows efficient desalting of the sample, as in the reversed-phase technique. This is the origin of Surface Enhanced Laser Desorption/Ionisation MS (SELDI-MS), in which hydrophobic surfaces are produced directly on the MALDI sample target (Borrebaeck et al., 2001; Brockman & Orlando, 1995, 1996; Brockman, Shah, & Orlando, 1998; Nedelkov & Nelson, 2000; Nelson, Nedelkov, & Tubbs, 2000).

4.5.2. Preparation of samples for MALDI-MS analysis

MALDI-MS analysis is performed on samples that have been mixed with matrix (small organic molecules; see Section 4.6.1.2.3), resulting in co-crystallisation of the analyte within the crystal structure of the matrix. The matrix absorbs the energy carried by the laser photons to produce molecular gas-phase ions of the peptides contained in the sample, as well as various matrix-related ions. These matrices have several advantages:
- Protecting the analyte from the high-energy source (UV laser),
- Inducing the production of ions of non-volatile compounds in the gas phase,
- Producing singly charged ions due to the huge dilution of the sample in the matrix (usually the molecular ratio is between 1/100 and 1/5000).
Nevertheless, the crystallisation step is crucial, since the speed of solvent evaporation has a direct influence on the MALDI-MS signal. A number of sample preparation techniques have been proposed in the last 10 years.

4.5.2.1. Dry droplet method

The “dry droplet” method, proposed by Karas and Hillenkamp (Karas & Hillenkamp, 1988), was the first method used. The analyte and the matrix solution are mixed together, then loaded on the MALDI sample plate. Solvents are evaporated simply by air-drying.
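The analyte/matrix molar ratio quoted in Section 4.5.2 (between 1/100 and 1/5000) can be checked for a given preparation with a short calculation. The sketch below is an illustration only: the function names and the numerical values in the example (a 1 g/l matrix solution, 1 µl of each solution, 1 pmol/µl of peptide) are hypothetical, and the molecular weight of 224.21 g/mol is that of sinapinic acid, used here merely as an example matrix.

```python
def matrix_to_analyte_ratio(matrix_g_per_l, matrix_mw,
                            matrix_ul, analyte_pmol_per_ul, analyte_ul):
    """Molar matrix/analyte ratio of a mixed MALDI spot.

    matrix_g_per_l      -- matrix concentration (g/l, i.e. mg/ml)
    matrix_mw           -- matrix molecular weight (g/mol)
    matrix_ul           -- volume of matrix solution deposited (µl)
    analyte_pmol_per_ul -- analyte concentration (pmol/µl)
    analyte_ul          -- volume of analyte solution deposited (µl)
    """
    # (g/l) / (g/mol) = mol/l; mol/l x µl = µmol; x 1e6 = pmol
    matrix_pmol = matrix_g_per_l / matrix_mw * matrix_ul * 1e6
    analyte_pmol = analyte_pmol_per_ul * analyte_ul
    return matrix_pmol / analyte_pmol


def within_quoted_range(ratio, low=100.0, high=5000.0):
    """True when the matrix excess lies in the range quoted in the text."""
    return low <= ratio <= high


if __name__ == "__main__":
    # Hypothetical preparation: 1 µl of 1 g/l matrix + 1 µl of 1 pmol/µl peptide.
    ratio = matrix_to_analyte_ratio(1.0, 224.21, 1.0, 1.0, 1.0)
    print(f"matrix/analyte = {ratio:.0f}:1, in range: {within_quoted_range(ratio)}")
```

More concentrated matrix solutions or more dilute samples push the ratio above this range; the calculation only makes the dilution explicit and says nothing about the quality of the resulting crystals.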
The major disadvantage of such a technique is the lack of sample-to-sample reproducibility due to variable drying parameters (room temperature, moisture, …). Despite this lack of reproducibility, the method is still used, mostly for its robustness in obtaining a MALDI-MS signal in a few minutes for nearly all samples.

4.5.2.2. Spin-coated drying

This technique is directly linked to the dry droplet method. Due to the problem of homogeneity during solvent drying, Nielsen et al. (Nielsen et al., 1988) proposed to centrifuge the sample before crystallisation. The size of the final sample (dry droplet of matrix) is tuneable by the speed of the centrifugation. Solvent evaporation is
increased due to the larger surface of the sample and, because of the higher speed of crystallisation, reproducible sample preparations from 5 000 to 15 0000 Da were obtained (Perera, Perkins, & Kantartzoglou, 1995).
4.5.2.3. Slow crystallisation
During slow crystallisation (Beavis & Bridson, 1993), co-crystallisation is driven by thermodynamics, mainly on non-polar crystal faces, i.e. (103). This indicates that protein or peptide incorporation in the 3-D structure of matrix crystals acts directly on the MALDI signal (Chan, Colburn, Derrick, Gardiner, & Bowden, 1992; Doktycz, Savickas, & Krueger, 1991). Xiang and Beavis (Xiang & Beavis, 1994) proposed a very-low-speed crystallisation process (crystallisation over a few hours) that produces mono-crystals of sinapinic acid including the substrate. MALDI-MS analysis is conducted directly on that crystal. Furthermore, in highly salt-contaminated samples, slow crystallisation over a few hours produces a matrix mono-crystal in which analyte is incorporated but salts are not (Xiang & Beavis, 1994). This technique takes a few hours to produce the crystal and requires a large amount of material.
4.5.2.4. Fast evaporation method
For most samples, the peptidic mixture does not contain large amounts of salts and/or contaminants and quicker methods can be used. Vorm et al. (Vorm, Roepstorff, & Mann, 1994) proposed using a matrix solution in acetone that allows fast evaporation of the solvent. This results in a more uniform surface with smaller crystals than the previously described methods. The analyte solution is added on top of the matrix crystal layer, where the non-polar faces bind proteins/peptides. This approach allows improved mass accuracy and signal reproducibility (Vorm et al., 1994). Analyte solutions must not contain organic solvent since the matrix is used both as a “MALDI-matrix” and as a “hydrophobic surface” to bind proteins/peptides.
Comparison of this technique in different laboratories concluded that there was no dependency between matrix crystallisation and the signal sensitivity and/or limits of detection and resolution (Gobom et al., 2001; Lauber et al., 2001; Shevchenko, Wilm et al., 1996; Thiede, Wittmann-Liebold, Bienert, & Krause, 1995). Indeed, these two parameters are much more closely linked to the apparatus than to the sample preparation technique itself.
4.5.2.5. Crystalline seed method
A few groups have developed alternative “on-probe” clean-up approaches, which essentially involve the exclusion of salts and other contaminants from the crystal matrix by increasing the number of “nucleation” sites (sites where crystals form). Xiang and Beavis (Xiang & Beavis, 1994) accomplished sample clean-up on the probe by forming, on the metal probe surface, a thin layer of matrix crystals that are crushed. Non-adherent crystals were removed and the sample was added in a solvent, which resulted in partial dissolution of the preformed surface. The surface could be washed vigorously with water before the analysis step.
4.5.2.6. Sprayed matrix
For the same purpose, i.e. to improve the reproducibility of matrix crystallisation, the use of ES was proposed for MALDI-MS sample preparation (McNeal, Macfarlane, & Thurston, 1979). Such a technique does not really improve the result (Nielsen et al., 1988), but this method became of interest for samples analysed directly on hydrophobic membranes, such as in MS imaging approaches (see section 6 and following chapters) (Chaurand, Schwartz, & Caprioli, 2002; Kruse & Sweedler, 2003).
4.5.3. Sample desalting procedures
Although matrix crystals can be used as a hydrophobic surface, the low-energy interactions involved only retain the sample at the surface of the matrix. In the case of highly contaminated samples, other techniques must be employed. A low-cost and efficient technique for sample desalting is the use of standard hydrophobic membranes such as a PVDF membrane (Brockman et al., 1997; Vestling & Fenselau, 1994), polyethylene or polypropylene membranes (Worrall et al., 1998; Worrall et al., 2000), polyurethane membranes (McComb et al., 1997) or any other hydrophobic material.
Such treatment is a direct application of reversed-phase chromatographic separation: the polymeric surface adsorbs proteins/peptides by hydrophobic interaction, whereas salts and polar contaminants are removed from the hydrophobic surface during a washing step with acidified water. After that step, matrix solution containing organic solvent is added to the membrane surface and air-dried. Since 1994, chemically modified surfaces have been developed for the MALDI approach, especially reversed-phase (Brockman et al., 1998; Brockman et al., 1997; Vestling & Fenselau, 1994) but also immuno-specific interaction (Brockman & Orlando, 1995, 1996), strong cation exchange, and others. This approach is now called Surface-Enhanced Laser Desorption/Ionisation (SELDI) (Borrebaeck et al., 2001; Brockman & Orlando, 1995, 1996; Brockman et al., 1998; Nedelkov & Nelson, 2000; Nelson, Nedelkov, & Tubbs, 2000) and some commercial products
are available (Dare et al., 2002). This is an efficient method for sample cleaning, but the approach has limitations:
- If home made, e.g. PVDF membrane, it is a time-consuming technique,
- If commercial, e.g. Ciphergen products, the cost of the SELDI target is still high (around $100 for an 8-position single-use plate).
Thus, the simplest way to decrease the concentration of residual salts rapidly and efficiently is to rinse the crystals with cold acidified water, e.g. 0.1% trifluoroacetic acid (Beavis & Chait, 1990; Gobom et al., 2001). An improvement of this method is the use of nitrocellulose. Pure nitrocellulose is dissolved in acetone, loaded on the MALDI sample plate and dried immediately. The analyte solution is loaded on top of this hydrophobic surface, followed by a washing step (Landry, Lombardo, & Smith, 2000; Lauber et al., 2001; Shevchenko, Wilm et al., 1996). Nitrocellulose can also be mixed into the matrix solution. In that case, peptide ionisation and signal reproducibility are increased (Preston, Murray, & Russell, 1993).
4.5.4. Conclusion
A comparison was conducted between the three major MALDI-MS sample preparation techniques (Landry et al., 2000): dry droplet, fast evaporation and matrix with nitrocellulose. The highest sequence recovery was obtained with the nitrocellulose, but when samples are highly contaminated by salt, the fast evaporation technique is also very powerful.
4.6. Protein identification using mass spectrometry
The recent development of the MALDI (Karas & Hillenkamp, 1988; Tanaka et al., 1988) and ESI (Aleksandrov et al., 1984; Fenn et al., 1989) ionisation methods provides the opportunity to analyse and characterise biopolymers such as DNA, polypeptides and/or glycosidic chains. MALDI-MS is still a method of choice for protein identification due to the low cost per sample and the high throughput rate.
The interest in this analytical approach for biopolymer analysis is clearly visible in Figure 7: more than 1300 articles published in 2002 included "MALDI" and related terms. The basic technique linked to MALDI-MS development is protein identification using the “peptide mass fingerprint” technique (PMF).
4.6.1. Protein identification using PMF technique
4.6.1.1. Method description
[Figure 7: bar chart. Vertical axis: number of concerned articles (0 to 1400); horizontal axis: year (1990 to 2002).]
Figure 7. Occurrence of “MALDI” and related terms in the Medline database for the years 1990–2002.
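As a rough illustration of the peptide mass fingerprint idea presented in this section, the sketch below digests protein sequences in silico with trypsin-like rules and counts how many measured masses each database entry explains. The two toy sequences, the function names and the 0.2 Da tolerance are illustrative assumptions, not values taken from the cited publications; real search engines also handle missed cleavages, modifications and statistical scoring.

```python
# Minimal peptide-mass-fingerprint sketch (illustrative assumptions throughout).

# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the sum of residue masses
PROTON = 1.00728   # ionising proton, giving the [M+H]+ ion


def tryptic_peptides(sequence):
    """Cleave C-terminal to K or R, except before P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        following = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and following != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def rank_proteins(database, measured_masses, tol_da=0.2):
    """Rank database entries by the number of measured masses they explain."""
    ranking = []
    for name, sequence in database.items():
        theoretical = [mh_plus(p) for p in tryptic_peptides(sequence)]
        matched = sum(
            any(abs(m - t) <= tol_da for t in theoretical)
            for m in measured_masses
        )
        ranking.append((name, matched))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical two-entry database and a simulated spectrum of the first entry.
TOY_DATABASE = {
    "protA": "MKWVTFISLLLLFSSAYSR",
    "protB": "GAVLKPEPTIDER",
}
measured = [mh_plus(p) for p in tryptic_peptides(TOY_DATABASE["protA"])]
for name, hits in rank_proteins(TOY_DATABASE, measured):
    print(name, hits)
```

Counting matches is the simplest possible score; it is used here only because it mirrors the idea of highlighting the most frequently matched proteins.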
If the primary sequence of a protein is known, the calculation of its theoretical MW is easy, but the experimental value usually differs from the theoretical/calculated value due to alternative splicing, protein truncation, PTM, etc. MALDI-MS was mostly used at first for MW determination of recombinant proteins and for quality control (mostly orientated to PTM) (Nakanishi, Okamoto, Tanaka, & Shimizu, 1994). In 1989, at the 3rd symposium of the “Protein Society” (Seattle), Henzel W.J., Stults J.T. and Watanabe W. from Genentech Inc. proposed identifying proteins by mass spectrometry after digestion with specific endoproteinases such as trypsin. A second poster was presented by Yates J.R., Griffin P.R., Hunkapiller T., Speicher S. and Hood L.E. on the same subject in 1991 during the “Caltech Symposium” (Baltimore). In 1993, five independent groups published for the first time such an approach for protein identification, combining mass spectrometry, the specific endoproteinase trypsin and computer comparison of the experimental peptide masses with the theoretical masses of the in silico generated tryptic peptides (Henzel et al., 1993; James et al., 1993; Mann et al., 1993; Pappin, Hojrup, & Bleasby, 1993; J. R. Yates, III et al., 1993). The final treatment of the data is to match the experimental mass values to the theoretical values and highlight the most frequently matched proteins. Most of the development of this technique
was due to computer-related data treatment and software development (Clauser et al., 1995; Pappin et al., 1993). The technique allows identification of a large number of proteins, developing the concept of 2-DE maps or bi-dimensional cards (pI and Mr) of the proteins contained in a biological sample (see http://www.expasy.ch/ch2d/2d-index.html) (Fountoulakis et al., 1998; Hochstrasser et al., 1992; Rabilloud et al., 1998; L Tonella et al., 1998). The limitation of this technique was mainly the availability of the protein primary sequences in the databanks, which were less well documented: SWISS-PROT database release 25 in 1993 contained 29 955 entries (http://www.expasy.org/txt/old-rel/relnotes.25.txt) and, at that time, TrEMBL did not exist.
4.6.1.2. MALDI-ToF-MS analysis technique
In 1988, Karas and Hillenkamp (Karas & Hillenkamp, 1988) proposed a new strategy for the analysis of biopolymers with Mr higher than 10 000 Da. The technique emerged from the work of Tanaka et al. (Tanaka et al., 1988), who obtained molecular ion desorption using laser irradiation of the sample and metallic powder dispersed in glycerol, i.e. the matrix. The use of an organic matrix that absorbs the UV emission of the laser to replace the metallic powder in glycerol has several advantages:
- Easier to use,
- Larger choice of compounds,
- Variable solubility in various solvents,
- Very large UV range of absorbing molecules.
Investigations on organic molecules significantly improved the sensitivity of the method, making it compatible with protein analysis (Figure 8).
The laser
The laser is a crucial part of this system.
It delivers the energy needed for protein/peptide desorption/ionisation through the energy carried in the laser photons. Usually, the analyte is diluted in the matrix at a molecular ratio varying from 100 to 10 000 times excess of matrix. Then, photon energy heats the sample to convert solid crystals to matrix/analyte gas phase. The high pressure of the matrix gas induces a gas plume containing the matrix but also the desorbed analyte (Figure 8A). From this desorbed material a very limited portion (around 0.01% of the initial population) will be available for the analysis of charged ions. At the same time, part of the photon energy is used to produce ions (see section 4.6.1.3).
Figure 8. Schematic of MALDI process and instrument. (A) A sample co-crystallized with the matrix is irradiated by a laser beam, leading to sublimation and ionisation of peptides. (B) About 100–500 ns after the laser pulse, a strong acceleration field is switched on (delayed extraction), which imparts a fixed kinetic energy to the ions produced by the MALDI process. These ions travel down a flight tube and are turned around in an ion mirror, or reflector, to correct for initial energy differences. The mass-to-charge ratio is related to the time it takes an ion to reach the detector; the lighter ions arrive first. The ions are detected by an electron multiplier. (Reproduced with the permission of "The Annual Review of Biochemistry", © 2001, Mann et al.)
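The relation between mass-to-charge ratio and arrival time mentioned in the caption can be sketched for an idealised linear analyser, where an ion of mass m and charge z accelerated through a potential U drifts over a length L in a time t = L·sqrt(m / (2·z·e·U)). The 20 kV potential and 1 m flight tube below are assumed values chosen only for illustration, and the sketch ignores delayed extraction, the reflector and the initial energy spread.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # C
DALTON = 1.66053906660e-27            # kg


def flight_time_s(mass_da, charge=1, voltage_v=20_000.0, length_m=1.0):
    """Drift time of an ion in an idealised linear ToF analyser."""
    mass_kg = mass_da * DALTON
    kinetic_energy_j = charge * ELEMENTARY_CHARGE * voltage_v
    velocity = math.sqrt(2.0 * kinetic_energy_j / mass_kg)
    return length_m / velocity


def mass_from_time_da(time_s, charge=1, voltage_v=20_000.0, length_m=1.0):
    """Invert the relation: m = 2 z e U (t / L)^2, returned in Da."""
    energy_term = 2.0 * charge * ELEMENTARY_CHARGE * voltage_v
    return energy_term * (time_s / length_m) ** 2 / DALTON


# Lighter ions arrive first; quadrupling the mass doubles the flight time.
for mass in (500.0, 1000.0, 2000.0, 4000.0):
    print(f"{mass:7.1f} Da -> {flight_time_s(mass) * 1e6:6.2f} us")
```

Because the time scales with the square root of the mass, calibration in practice is usually done by fitting this relation to ions of known mass rather than by relying on nominal instrument dimensions.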
Lasers in the UV range are the most frequently used:
- 337 nm from a nitrogen laser (Schreiner et al., 1996; Strupat, Karas, & Hillenkamp, 1991),
- 355 nm (frequency-tripled) or 266 nm (frequency-quadrupled) from a Nd:YAG laser (Ingendoh, Karas, Hillenkamp, & Giessmann, 1994; Karas & Hillenkamp, 1988; Land & Kinsel, 2001).
IR lasers produce similar results and some groups have developed this approach. The most frequently used are the Er:YAG laser at 2.94 µm (Budnik, Jensen, Jorgensen, Haase, & Zubarev, 2000; Cramer, Hillenkamp, & Haglund, 1996; Eckershorn, Strupat, Karas, Hillenkamp, & Lottspeich, 1992; Schleuder et al., 1999) and the carbon dioxide laser at 10.6 µm (Cramer et al., 1996). Other less conventional adjustable-wavelength lasers were also used in research approaches, but their use was limited by their cost and the difficulty of obtaining reliable results. Cramer et al. (Cramer et al., 1996), researching an efficient matrix for IR emission, used a “free electron” laser to determine the influence of the wavelength on the ionisation. Menzel et al. (Menzel, Dreisewerd, Berkenkamp, & Hillenkamp, 2001) used a tuneable “parametric optic oscillation” laser to determine mechanisms of energy transfer from laser photons to matrix as a function of the wavelength. Such a laser system was also used to analyse intact proteins directly from the PVDF membrane (Ryzhov et al., 2000). In terms of the MALDI spectral pattern, UV and IR lasers produced similar results (Niu, Zhang, & Chait, 1998), but Budnik et al. (Budnik et al., 2000) clearly showed the advantage of IR lasers for fragile molecules, with a sensitivity at least 10 times higher than with UV lasers. Such improvement is particularly interesting for glycosylation and sulfation of proteins/peptides, which are visible with IR lasers but not with UV lasers (loss of the PTM during the desorption/ionisation process). Also, non-covalent protein complexes were successfully observed with IR-MALDI-MS, whereas using UV-MALDI-MS, only the separated partners are visible. Nevertheless, use of IR-MALDI-MS is not frequent because of the greater complexity of the laser.
Although the wavelength of the laser emission is an important factor, the laser fluence strongly influences the MALDI spectrum, especially in IR-MALDI-MS where the absorption bands are narrower than in UV, in a range from 2.7 to 4.0 µm (Cramer & Corless, 2001; Menzel et al., 2001). In UV-MALDI-MS, the absorption range covers 70 to 280 µm and there the problem is less important (Zenobi & Knochenmuss, 1998).
The matrix
Matrix choice is a crucial factor in MALDI-MS analysis. In most analyses, the matrix is mixed with the analyte and the two co-crystallise to produce matrix crystals in which the analyte is included. Other preparation methods can improve signal intensity and/or reproducibility (see sections 4.5.2. and 4.5.3.1). Since the beginning of the use of MALDI-MS with nicotinic acid (Karas, Bachmann, Bahr, & Hillenkamp, 1987; Karas & Hillenkamp, 1988), a large number of “matrix” molecules have been proposed. Table 7 lists some of them as well as the type of analyte for which they are preferentially used. Due to the absorption wavelength needed for the laser desorption/ionisation, some of them are aromatic (Beavis & Chait, 1989).
Of the compounds listed in Table 7, only a few are frequently used:
- α-Cyano-4-hydroxycinnamic acid for peptide mixtures below 5 kDa (Beavis, Chaudhary, & Chait, 1992; Egelhofer et al., 2000),
- 2,5-Dihydroxybenzoic acid (gentisic acid), also for peptide mixtures. This matrix was used for a few studies conducted to determine the MALDI ionisation mechanisms (Glückmann & Karas, 1999; Karas, Glückmann, & Schäfer, 2000; Land & Kinsel, 2001; Macha, Limbach, Hanton, & Owens, 2001; Stevenson, Breukert, & Zenobi, 2000; Strupat et al., 1991),
- Trans-3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid), mostly used for protein analysis (Beavis et al., 1992; Glückmann & Karas, 1999; Schreiner et al., 1996; Stevenson et al., 2000).
Table 7. Matrixes used for MALDI-MS analysis
Matrix
Usual Abbreviation
Laser
Analytes
References (Macha, Limbach, & Savickas, 2000) (Taranenko, Tang, Allman, Chang, & Chen, 1994) (Jespersen, Niessen, Tjaden, & van der Greef, 1998) (Beavis & Chait, 1989; Önnerfjord et al., 1999)
3-Amino picolinic acid
3APA
UV
Oligo nucleotides
2-Amino-4-methyl-5-nitropyridine
AMNP
UV
Proteins
CA
UV
4ACCA, ACCA
UV
DHBA
UV
SA
UV, IR
Proteins and peptides
(Beavis & Chait, 1989)
IPA
UV355
Proteins
(Bai, Liang, Liu, Zhu, & Lubman, 1996)
HMPPA
UV355
Proteins
(Bai et al., 1996)
Caffeic acid α-Cyano-4-hydroxycinnamic acid 2,5-Dihydroxybenzoic acid (gentisic acid) Trans-3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid) Indole-3-pyruvic acid 4-Hydroxy-3-methoxyphenyl pyruvic acid
Proteins and peptides Peptide, proteins and glycoproteins Peptides, glycopeptides and carbohydrates
(Beavis et al., 1992)
(Jespersen et al., 1998; Strupat et al., 1991)
HABA
UV
Peptide, proteins and glycoproteins
3HPA, HPA
UV
Oligo nucleotides
Trans-indoleacrylic acid
IAA
UV
Polymers
Trans-3-methoxy-4-hydroxycinnamic acid (ferulic acid)
FA
UV
Proteins and peptides
(Gusev, Wilkinson, Proctor, & Hercules, 1995)
2-(4-hydroxyphenylazo)-benzoic acid
picolinic acid
(Juhasz, Costello, & Biemann, 1993) (Wu, Shaler, & Becker, 1994; Wu, Steding, & Becker, 1993) (Green-church & Limbach, 1998; Macha et al., 2000) (Beavis & Chait, 1989; R. Beavis et al., 1992; Önnerfjord et al., 1999)
5-Methyl salicylic acid
MSA
UV
DHB comatrix for proteins and peptides
Nicotinic acid
NA
UV266
Proteins
(Karas et al., 1987; Karas & Hillenkamp, 1988)
Picolinic acid
PA
UV
Oligo nucleotides
(Tang et al., 1994)
UV
Non polar polymers
UV266, IR
Proteins
UV
Glycopeptides
Trans-retinoic acid
Succinic acid
(SA)
2-Amino-5-nitropyridine
3-Aminoquinoline Anthracene 6-Aza-2-thiothymine 2,6-Dihydroxy acetophenone
3-AQ
UV UV
ATT
UV
DHAP
UV
Proteins, peptides and polysaccharides Polymers Oligo nucleotides Peptides and proteins
(Schreiner et al., 1996; Whittal, Schriemer, & Li, 1997) (Nordhoff et al., 1993; Overberg, Karas, Bahr, Kaufmann, & Hillenkamp, 1990) (Schmidt, Krause, Beyermann, & Bienert, 1995) (Sze, Dominic, & Wang, 1998) (Macha et al., 2000) (Lecchi, Le, & Pannell, 1995) (Gorman, Fergusson, & Nguyen, 1996)
Ice (water)
IR
Proteins
Polymers and proteins Harmaline 1-Hydroxy isoquinolein p-Nitroaniline
UV HIC
UV UV
2-Nitrophenyl octylether
UV
Pyrene
UV
Triethylamine 1,8,9-Trihydroxy anthracene (Dithranol)
THAP
UV UV
2,3,4-Trihydroxy acetophenone
UV
2,4,6-Trihydroxy acetophenone
UV
Proteins and oligosaccharides Oligosaccharides Polymers Non-polar polymers Polymers Oligo nucleotides Non-polar polymers Fragile peptides, DNA fragments DNA fragments
(Berkenkamp, Karas, & Hillenkamp, 1996; Hunter, Lin, & Becker, 1997; Kraft, Mills, & Dratz, 2001) (Berkenkamp et al., 1996; Hunter et al., 1997; Kraft et al., 2001) (Nonami, Tanaka, & Fukuyama, 1998) (Mohr, Börnsen, & Widmer, 1995) (Jespersen et al., 1998) (Bahr, Deppe, Karas, Hillenkamp, & Giessmann, 1992) (Macha et al., 2000) (Harvey, 1993; Pieles, Zurcher, & al., 1993) (Deery et al., 1997; Jackson, Jennings, & Scrivens, 1997) (Schmidt et al., 1995; Zhu et al., 1996) (Schmidt et al., 1995; Zhu et al., 1996)
Most of the matrixes listed in Table 7 are used in conjunction with UV lasers, but they are usually compatible with IR lasers due to the carboxylic, amine or hydroxyl groups frequently present on such molecules. Nevertheless, some matrixes such as adipic acid (Menzel et al., 2001) or succinic acid (Schleuder et al., 1999) are much better adapted to IR laser irradiation. More generally, all of the organic acids could be used with varying degrees of success. Water is also a matrix of choice for IR-MALDI-MS (Berkenkamp et al., 1996). One of the most interesting advantages of water is its natural presence in biological samples. A sample can then be used directly after a freezing step. However, prolonged use of these samples is not possible because of the melting of ice during the laser shot and high sublimation rates in the spectrometer vacuum, which can reach 10⁻⁷ to 10⁻⁹ Torr (below 10⁻¹⁰ atm).
Liquid matrixes such as glycerol (Berkenkamp et al., 1996) or trimethylamine (Harvey, 1993; Pieles et al., 1993) were used with success under IR irradiation. The advantage of liquid matrixes is their fluidity, which continuously renews the matrix and the analyte under the laser beam. Such fluidity reduces sample exhaustion and local overheating. The first role of the matrix is to absorb the energy from laser photons, but the matrix also has a direct influence on the analyte integrity. Karas et al. (Karas et al., 1995) classified the matrixes into “hot” and “cold” categories. This differentiation is directly linked to the ejection speed in the matrix plume (see section 4.6.1.2.6) in vacuum. With “hot” matrixes, e.g. ACCA, peptide internal fragments (Lake, Johnson, McEwen, & Larsen, 2000) and side-chain fragmentation tend to appear more frequently than with “cold” matrixes, e.g. DHBA or HPA, which better preserve PTMs such as glycosylation (Karas et al., 1995; Strupat et al., 1991). For PMF analysis of peptide mixtures, internal and PTM fragmentations must be limited, and a matrix such as DHBA shows an advantage compared to ACCA, which is used for PSD-MALDI-MS where peptide fragmentation is an advantage (see section 4.6.2.3.1). Another related difficulty is the analysis of highly hydrophobic peptides that are not soluble in media like water/MeCN. Breaux et al. (Breaux, Green-Church, France, & Limbach, 2000) proposed a technique using surfactants as co-matrix to increase hydrophobic peptide solubility. The usual surfactants have a strong negative influence on the MALDI signal. This group also proposed the use of a chloroform/methanol solvent to prepare the sample (analyte and matrix). Using that solvent and 3-indoleacrylic acid as a matrix gave good results (Green-church & Limbach, 1998). Other exotic matrixes such as polycyclic aromatics, e.g. 
anthracene, pyrene, acenaphthene, were described for non-polar compounds, e.g. polystyrene or polybutadiene (Macha et al., 2000). The main problem related to the matrix is the crystallisation step, which suffers from a lack of reproducibility from one sample to another (see section 4.6.1.3). Moreover, due to the matrix cluster ions produced during the ionisation step, MALDI spectra are not really helpful for substrate masses below 800 Da. Some groups developed alternative approaches with immobilised matrix to decrease the low Mr range interferences. Hutchens & Yip (Hutchens & Yip, 1993) immobilised ACCA on agarose beads and this preparation was used with myoglobin. The corresponding MALDI spectrum shows the myoglobin peak without major matrix contamination at low mass. Mouradian et al. (Mouradian, Nelson, & Smith, 1996) immobilised the matrix directly at the surface of a gold plate using sulfhydryl anchors. Analysis of a protein mixture gave a correct spectrum, but part of the immobilised matrix was desorbed from the surface and was visible on the spectrum. More recently, Buriak & Allen (Buriak & Allen, 1998) proposed a new type of matrix based on doped silicon (DIOS), i.e. semiconductor material as used in microelectronics, activated by UV irradiation to produce a highly porous material with high specific surface. This modified surface could be chemically treated to produce a hydrophobic surface where peptides/proteins were captured, allowing
desalting steps. This surface can be used directly for MALDI analysis without mass range limitation (Wei, Buriak, & Siuzdak, 1999).
Figure 9. The new Ciphergen matrix for Surface Enhanced Neat Desorption (SEND) provides MALDI-like mass spectra of biopolymers without the need to add matrix compounds. The component carries three different functional groups linked to a hydrophilic backbone (a): a silanol group to create a covalent link between the surface and the matrix (b), a hydrophilic linker (c) and two matrix molecules (d). A C18 modified SEND array is created by depositing a thin film of a co-polymer of α-cyano-4-methacryloyloxycinnamic acid and stearyl methacrylate on the surface of a biochip array. The incorporation of a C18 chain into the polymer units gives additional reversed-phase characteristics (c). Sample clean-up can be directly carried out on the array surface. Further washing of the sample on the C18 modified SEND array allows selective retention and purification of hydrophobic peptides.
During the 2003 American Society of Mass Spectrometry congress (Montréal, CA), Ciphergen presented a new polyfunctional MALDI matrix (Figure 9). This organic chemical compound is formed by a backbone to which three different side chains are linked:
- A UV-absorbing matrix group: α-cyano-4-methacryloyloxycinnamic acid,
- A reversed-phase system: stearyl methacrylate, which adds C18 functionality,
- An anchor to link this molecule to the MALDI target surface: a reactive silanol group.
This new generation of matrixes should provide a very interesting approach where low-Mr compounds are detectable by MALDI-MS.
The co-matrix
Co-matrixes are usually chemical compounds that do not act directly in the ionisation/desorption process but can improve MALDI signal quality. In the case of phospho-peptide analysis, the phosphate group acts against the production of positive ions. The use of ammonium salts as co-matrix, e.g. ammonium citrate (Asara & Allison, 1999; Gorman et al., 1996; Wolfender et al., 1999) or ammonium acetate (Gorman et al., 1996), improves signal intensity. Fucose was proposed as a co-matrix (Gusev et al., 1995) to act as a moderator of the initial speed of matrix ions in the gas plume (Glückmann & Karas, 1999) and could be described as a “matrix-cooler”. Nitrocellulose has previously been described as a co-matrix. First of all, such hydrophobic compounds allow desalting of the sample, but they also increase signal reproducibility and improve direct quantitation (Landry et al., 2000; Lauber et al., 2001; Shevchenko, Jensen et al., 1996). Metallic ions such as silver or copper salts were used to improve ionisation of non-polar polymers (Macha et al., 2001). Although matrix physico-chemical properties have an influence on the MALDI signal, their influence is limited compared to that of sample preparation. Both of these parameters must be used to determine the preparation technique preferred for specific analytes (Table 8).
Table 8. Priority (1–3: high to low) for specific matrix utilisation for different analyses and their compatibility with sample preparation technique. A: dry droplet, B: thin layer, C: thick layer, D: sandwich (Reproduced with permission of Wiley-VCH Verlag GmbH, (Kussmann et al., 1997), ©1997)
Matrix
PMF
Small proteins
Large proteins
Glycopeptides
Glycoproteins
ACCA SA DHBA THAP
1
3 1 2
1 1
2 3 1
2 1
2 2
Compatible preparation technique A, B, C, D A, D A A, B
4.6.1.3. MALDI ionisation mechanism
Protein and peptide analysis techniques using mass spectrometry are recent and have been developed mainly on an empirical basis (Karas et al., 1987; Karas & Hillenkamp, 1988). Signal reproducibility, suppression effects on the analyte signal, and thermo-chemical relationships between matrix and ion production are not completely understood. As an example, the MALDI matrix is usually described as an organic component that can absorb the laser beam energy (Table 7), but not all UV and/or IR absorbing molecules are potential MALDI matrixes (Bai et al., 1996;
Mouradian et al., 1996; Taranenko et al., 1994). Ion formation is not completely explained and is still a subject of interest for some groups (Karas et al., 2000; Land & Kinsel, 2001; Q. Lin & Knochenmuss, 2001; Menzel et al., 2001; Stevenson et al., 2000; Wong, So, & Chan, 1998). The frequently asked questions are:
- Is the matrix involved in the ionisation process?
- Where does the ionisation proton come from?
Wong et al. (Wong et al., 1998) tried to determine the origin of protons when polymers were analysed by MALDI-MS and the role of the matrix in the ionisation process. The idea of this study was to determine whether acidic hydrogens of carboxylic or phenolic groups of the matrix could act in the ionisation mechanism, since their excited state pKa could explain such phenomena (Gimon, Preston, Solouki, White, & Russell, 1992; Preston-Schaffer, Kinsel, & Russell, 1994). Matrixes and analytes that could liberate deuterium rather than protons were used. In that case, ionisation of the analyte still occurs and the ionisation proton comes from the analyte itself, the matrix and/or traces of solvent used for the sample preparation. Grigorean et al. (Grigorean, Carey, & Amster, 1996) obtained similar results when the acidic groups of the matrix were esterified. During the analysis of non-polar polymers with non-polar matrix, metallic ions used as co-matrix contributed to substrate ionisation to produce adducts such as [M+Ag]+ in the case of silver salts (Knochenmuss, Lehmann, & Zenobi, 1998; Macha et al., 2000). All of these investigations are directly linked to experimental approaches. Another approach is to determine whether there is a relationship between physicochemical properties of the matrix and the ionisation process that could explain the energy transfer from the matrix to the analyte during the ionisation process. Stevenson et al. 
(Stevenson et al., 2000) found no correlation between matrix sublimation temperature and the ion internal energy or initial kinetic energy. There is, however, a correlation between the gas phase acidity of the matrix and the ion internal energy (Gimon et al., 1992; Preston-Schaffer et al., 1994). This relationship could explain the property of the matrix to induce (or reduce) the analyte fragmentation (Glückmann & Karas, 1999) that is directly linked to the initial ion velocity in the gas plume. Again, such correlation does not explain the energy transfer from the laser photons to the ions. The first theory proposed was photo-ionisation (Grigorean et al., 1996; Zenobi & Knochenmuss, 1998). This hypothesis is based on the chromophore property of the organic molecules used as matrix. Indeed, Karas et al. (Karas et al., 1987; Karas & Hillenkamp, 1988) investigated the use of a matrix compatible with the laser beam. Nevertheless, ion formation from laser photons has been the subject of numerous hypotheses. One of the most realistic is summarised in an article (Liao & Allison, 1995).
Equation 1. Hypothesis for matrix ion formation (M: matrix, hν: photon energy at the frequency ν)
$$\mathrm{M} \xrightarrow{h\nu} \mathrm{M}^{*} \xrightarrow{m(h\nu)} \mathrm{M}^{+\bullet} + e^{-}$$
A matrix molecule first absorbs a photon, leading to an excited state. Then, with one or more photons, the excited matrix molecule ejects one electron and a radical cation of the matrix molecule is produced. This hypothesis seems correct since radical cations have been observed (Ehring, Karas, & Hillenkamp, 1992) as well as free electrons around the MALDI source (Quist, Huth-Fehre, & Sundqvist, 1994). Two studies were conducted to validate the model. Land and Kinsel (Land & Kinsel, 2001) found a correlation between laser power and the intensity of the signal corresponding to a positively charged analyte/matrix complex containing one peptide molecule. Their investigation indicates a two-photon mechanism for the ionisation process with laser emission at 308 nm. From this ionised aggregate, one or more photons are needed to dissociate the matrix from the peptide ion. The energy requirement for the ionisation of the matrix/analyte complex is lower than the energy required for the ionisation of a single matrix molecule, since the energy supplied by two 337 or 355 nm laser photons (the usual wavelengths for UV-MALDI-MS) is not sufficient to completely explain this phenomenon thermodynamically. Land & Kinsel proposed a hypothesis involving a complex stabilisation energy corresponding to the missing energy, which represents more than 1 eV. This hypothesis was partially verified by Lin and Knochenmuss (Lin & Knochenmuss, 2001), who measured the stabilisation energy of such complexes as a function of the number of matrix molecules involved in the complex. The complex ionisation energy rapidly reaches a limiting value of 7.82 eV, whereas the energy needed for the ionisation of a single matrix molecule is 8.0475 eV. The stabilisation energy then corresponds to 0.2275 eV, while the energy of two 337 nm photons is 7.36 eV. Thus, 0.46264 eV is still missing to completely explain this ionisation mechanism.
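The energy balance discussed above can be checked with a short calculation. The sketch below converts laser wavelengths to photon energies with E = hc/λ and compares two photons with the 7.82 eV and 8.0475 eV values quoted in the text; the function and variable names are illustrative assumptions, and small differences in the last digits relative to the figures quoted above come from rounding of the physical constants.

```python
HC_EV_NM = 1239.84193  # Planck constant times speed of light, in eV·nm

# Ionisation energies quoted in the text above.
COMPLEX_IONISATION_EV = 7.82          # matrix/analyte complex
SINGLE_MATRIX_IONISATION_EV = 8.0475  # isolated matrix molecule


def photon_energy_ev(wavelength_nm):
    """Energy of one photon of the given wavelength, in eV."""
    return HC_EV_NM / wavelength_nm


def two_photon_deficit_ev(wavelength_nm, threshold_ev=COMPLEX_IONISATION_EV):
    """Energy still missing after two photons (negative if they suffice)."""
    return threshold_ev - 2.0 * photon_energy_ev(wavelength_nm)


stabilisation_ev = SINGLE_MATRIX_IONISATION_EV - COMPLEX_IONISATION_EV
print(f"stabilisation energy: {stabilisation_ev:.4f} eV")
for wavelength in (337.0, 355.0):
    two_photons = 2.0 * photon_energy_ev(wavelength)
    deficit = two_photon_deficit_ev(wavelength)
    print(f"{wavelength:.0f} nm: two photons = {two_photons:.2f} eV, "
          f"deficit = {deficit:.2f} eV")
```

The same comparison at 355 nm gives a larger deficit than at 337 nm, since the photons carry less energy.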
Nevertheless, the model was partially confirmed by experimental results (Ehring et al., 1992; Quist et al., 1994), and the goal now is to quantify the photothermal effect of the laser to determine whether it supplies the remaining energy required for peptide ionisation. Another peculiarity of MALDI ionisation is the production of mainly singly charged ions. In fact, a large number of multiply charged ions are produced during the initial stage of the ionisation process in the gas plume (Karas et al., 2000). Due to electrostatic interaction, the highly charged ions are neutralised by counter-ions. Ions of lower charge experience less electrostatic interaction, and singly charged ions are “the lucky survivors” among the ions produced. Moreover, this hypothesis could also explain the fragmentation phenomena occurring during PSD-MALDI-MS. Fewer studies have been conducted on the ionisation mechanism using IR lasers. Menzel et al. (Menzel et al., 2001) showed high correlation between the laser
wavelength and the vibrational absorption wavelengths of hydroxyl, amine and alkyl groups.
4.6.1.4. Time of flight separation of the ions and its improvement
The time of flight (ToF) analyser was developed by Wiley & McLaren in 1953 (Wiley & McLaren, 1953). At that time the technology was limited mostly by poor mass measurement accuracy, and electromagnetic analysers were preferred for their better resolution and mass accuracy. In recent decades, improvements in electronic components have allowed considerable progress in ToF technology, which in theory has no Mr range restriction. Moreover, such highly sensitive and accurate systems are affordable compared with similar quadrupole-based technologies. Standard time of flight analysers provide a resolution of a few hundred FWHM for analytes below 10 kDa. Ingendoh et al. (Ingendoh et al., 1994) list a series of factors involved in signal resolution, such as initial energy distribution and ion production time. Use of an ion reflectron system reduces the energy dispersion of the ions (Mamyrin, Karatajev, Shmikk, & Zagulin, 1973). The size of the laser beam also has an influence and must be around 10 µm. Coupled with 10^9 Hz oscilloscope signal acquisition, signal resolution was notably increased. Another proposed improvement to signal resolution was the system called “Delayed Extraction” (DE) (Mamyrin et al., 1973). This technique consists of accumulating ions before their acceleration into the ToF analyser, which compensates for the non-negligible duration of ion production (a few ns). It is in fact a pulsed extraction at the source level, obtained by increasing the acceleration potential from 0 to 3 kV in 300 ns, corresponding to the laser firing time and ion production time. This improvement allowed a resolution of 1024 FWHM to be reached using cytochrome C. The most recent improvement was the integration of both of the previously described systems, i.e. reflectron and DE, on MALDI sources equipped with a ToF analyser. With such a combined system, Mr measurement accuracy was brought down to ± 2–3 ppm (Jensen, Mortensen, Vorm, & Mann, 1997; Takach et al., 1997), whereas signal resolution was higher than 10 000 FWHM (Andersen & Mann, 2000) and the limit of detection was around the fmol level. Based on the matrix used during the ionisation process, Strupat et al. (Strupat et al., 1991) evaluated the amount of material used during spectrum acquisition. Assuming a uniform distribution/integration of the analyte into the matrix during co-crystallisation, only attomole levels of analyte are needed to produce a correct spectrum. To conclude, the ToF analyser has proved highly efficient for protein identification with the PMF technique.
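The principle behind ToF separation can be illustrated with the textbook relation for a linear instrument: an ion of mass m and charge z accelerated through a potential U drifts over a field-free length L in a time t = L·sqrt(m / (2zeU)). The sketch below is only an idealised model (it ignores the time spent in the acceleration region, the initial velocity spread, reflectron and delayed extraction); the 20 kV voltage and 1 m drift length are assumed values chosen for illustration, not parameters taken from the text.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def flight_time_s(mass_da, charge=1, accel_voltage_v=20_000.0, drift_length_m=1.0):
    """Idealised drift time (s) of an ion in a linear ToF analyser.

    accel_voltage_v and drift_length_m are hypothetical instrument settings.
    """
    mass_kg = mass_da * DALTON
    # Kinetic energy gained in the source: z * e * U = 1/2 * m * v^2
    velocity = math.sqrt(2 * charge * E_CHARGE * accel_voltage_v / mass_kg)
    return drift_length_m / velocity


def mass_resolution(flight_time, peak_width_fwhm):
    """m/dm for a ToF peak: since m is proportional to t^2, m/dm = t / (2*dt)."""
    return flight_time / (2 * peak_width_fwhm)


for mass in (1000.0, 4000.0, 16000.0):
    t = flight_time_s(mass)
    print(f"{mass:8.0f} Da -> {t * 1e6:6.2f} microseconds")
```

Because the flight time grows only with the square root of the mass, a fourfold increase in Mr doubles the flight time, which is why fast digitisers and narrow ion packets matter so much for resolution.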
4.6.1.5. Signal detection and data acquisition
Ion detection systems for MALDI-ToF MS are based on impact amplification surfaces. Such material converts the impact of an ion on the surface into a free electron. This electron is then amplified by cascade effects to produce a measurable signal. Micro-channel plates, electron multipliers or conversion diodes are the most frequently used systems (Westmacott et al., 2000). Usually such systems produce voltages proportional to the number of impacts on the detector. The second challenge is to measure such voltages over the smallest possible time interval, which corresponds to ΔMr after the conversion Mr = f(t). However, detector response is not linear since the secondary electron emission is dependent on the kinetic energy, which is proportional to the square of ion velocity. The main consequence is a rapid decrease in the detector response for analytes of Mr higher than 2000 Da (Beuhler & Friedman, 1980, 1983; Brunelle, Chaurand, Della-Negra, Le Beyec, & Parillis, 1997; Geno & Macfarlane, 1989; Westmacott, Ens, & Standing, 1996). Less used are cryogenic detectors such as the “hot electron micro-calorimeter” (Hilton et al., 1998; Twerenbold, 1996; Twerenbold et al., 1996) and the “superconducting tunnel junction” (Benner et al., 1997; Frank et al., 1996; Westmacott et al., 2000). These detectors measure the energy of the incident ion, and the Mr of the analyte is not involved in secondary electron production. Consequently, all analytes with an energy higher than the thermal energy of the environment and the electronic noise should be detected (Booth, Cabera, & Fiorini, 1996; Twerenbold, 1996). Ion detection is a crucial step, but signal acquisition is also very important since it has a direct impact on spectrum resolution. Indeed, the more recording channels the digitiser has, the more accurate the measurement and the better the resolution. For this part of the instrument, improvements in microelectronics have increased commercially available digitiser performance in recent years. As an example, in 1988 Karas & Hillenkamp (Karas & Hillenkamp, 1988) used a 2048-channel digitiser. At present, a common digitiser provides a few hundred thousand channels, and some can provide up to a billion channels. With such digitisers and the various improvements of MALDI, e.g. electrostatic optics, peak resolution is routinely between 10 000 and 15 000 FWHM.
4.6.1.6. Separation and detection using ICR-FT
ToF is the analyser most frequently used with MALDI sources, but others can also be used. One of the most powerful analyser/detectors is “Fourier Transform Ion Cyclotron Resonance” (FTICR). This technique is based on vibrational analysis of the ions, which is converted by Fourier transform to the corresponding Mr of the frequency-associated ion. Resolution of the signal is much better than with improved
ToF systems and Mr accuracy is routinely below 2 ppm (Palmblad, Wetterhall, Markides, Hakansson, & Bergquist, 2000). The major advantage of this technique is the very accurate measurement of ion resonant vibrations in comparison to time measurement with ToF systems, but one of the disadvantages is the high cost of such analyser/detectors. Sensitivity levels are usually similar to traditional MALDI-ToF MS, at the low fmol level (Palmblad et al., 2000). Experiments were also conducted successfully using IR as well as UV lasers (Budnik et al., 2000).
4.6.1.7. Signal reproducibility
Although the MALDI-MS technique is well adapted to protein analysis, and especially to protein digests, the low reproducibility of the signal is still one of the major problems. Sample preparation protocols and salt concentration strongly influence the signal, and variation is also due to other parameters such as pH or the water content of the solvent. Figueroa et al. (Figueroa, Torres, & Russell, 1998) determined the influence of the water content of the matrix solvent and found sequence-dependent responses. It is difficult to optimise this value in the case of a protein digest. Solvent pH also influences the result, and Cohen and Chait (Cohen & Chait, 1997) showed this effect as a function of the pH range:
- Between pH 1.1 and 1.6 (usually with formic acid solvent) observed peptides 2 kDa are detected,
- Between pH 2.4 and 2.9 high Mr peptides are not visible,
- Between those two domains, peptide analysis is correct for the Mr range from 1 to 6 kDa.
Finally, acquisition parameters can influence the signal reproducibility. Gobom et al. (Gobom et al., 2000) developed an optimised method able to quantify neurotensin in human brain using an internal standard. They noted that the number of laser shots was an influencing factor, since 10 consecutive shots showed a signal intensity variation of up to 20%. For 400 shots, variation was reduced to 2%. Thus, to reduce variability of signal intensity, a large number of spectra must be recorded during every analysis (at least a hundred consecutive shots).
4.6.1.8. Suppression effects
Suppression effects have a strong influence on sequence coverage when protein digests are analysed (Amini et al., 2000; Brancia, Oliver, & Gaskell, 2000; Kratzer, Eckerskorn, Karas, & Lottspeich, 1998; Wenschuh, Halada, Lamer, Jungblut, & Krause, 1998). Although this phenomenon is well known, it is definitely not predictable. The effect is clearly visible when an equimolar mixture of peptides is analysed directly by MALDI-MS. Kratzer et al. reported the use of dynorphine derivatives (Kratzer et al., 1998). This 17-residue peptide was prepared by chemical synthesis. At each coupling step (from the 6th cycle up to the 17th), an equal amount
of peptide is removed from the synthesiser reactor. The spectrum obtained from such a mixture shows a large variation in peptide peak intensities depending, presumably, on the peptide sequence. In summary, the ratio between the two extreme intensities is equal to 170. The authors first conclude that the effect of the primary structure of the peptide is mainly linked to hydrophobicity and basicity, but without fixed rules. Other studies found a correlation between the secondary structure of the peptide and the intensity (Wenschuh et al., 1998). Krause et al. (Krause, Wenschuh, & Jungblut, 1999) attempted to determine the correlation between the intensity of the five most intense peaks of a spectrum and the C-Ter AAs of tryptic peptides (Lys or Arg). A hundred identified mycobacterial proteins were used for the study, and 94% of the analysed samples showed that the five most intense peaks have Arg as the C-Ter residue. Similar results were obtained using two peptides with the same primary sequence except that the C-Ter residue is Arg in one case and Lys in the second (Brancia et al., 2000). The authors linked this effect to the higher charge stability due to the side chain of Arg and possible charge delocalisation. This theory could also explain the gain in intensity observed when Lys-C-Ter peptides are converted to homoarginine-C-Ter peptides. It must be noted that the activity of trypsin against Lys residues is 10 times lower than against Arg residues (Keil, 1992). Amini et al. (Amini et al., 2000) reported an impressive example of this phenomenon: a sample obtained from capillary electrophoresis separation was analysed by MALDI and showed a single peak (Figure 10 A & B). When the same sample was separated by MEKC, nine different analytes could be detected as well as one main product (Figure 10 C). From this example, it must be concluded that any peptide can act as a strong signal suppressor. Egelhofer et al. (Egelhofer et al., 2000) advise MS users against adding internal standards for internal calibration to the sample, since these could also act as signal suppressors. Recently, Wang & Fitzgerald (Wang & Fitzgerald, 2001) proposed the use of a solid mixture of matrix and analyte. This contains an analyte/matrix ratio from 1/2000 to 1/50 and is crushed finely, then loaded on a double-sided adhesive tape. The technique provides signal reproducibility, but such preparation requires a large amount of analyte (between 0.1 and 2 mg). To conclude, it must be noted that the AA composition strongly influences the MALDI signal. While a few empirical rules were defined, e.g. Arg/Lys at the C-Ter of the peptide, it is still not possible to predict whether a peptide will be visible or not, despite studies aimed in that direction (Amini et al., 2000; Brancia et al., 2001; Brancia, Oliver, & Gaskell, 2000; Brancia, Openshaw, & Kumashiro, 2002; Gay, Binz, Hochstrasser, & Appel, 1999; Gay, Binz, Hochstrasser, & Appel, 2002; Kratzer et al., 1998; Krause et al., 1999; Wenschuh et al., 1998).
Figure 10. Electropherogram obtained by running the peptide mixture in a capillary. Operating conditions: BioFocus 3000 Capillary Electrophoresis System (BIO RAD, CA, USA), equipped with a UV detector monitoring a wavelength of 214 nm. A fused-silica capillary, 38 cm long (34 cm effective length), 375 µm I.D. (Polymicro Technologies, Phoenix, AZ, USA), was kept at a constant temperature of 25.08°C and the applied voltage was 10 kV. (A) CZE separation of the peptide mixture; BGE was 10 mM acetate buffer, measured pH 7.01; (B) Mass spectrum obtained by analysing the CZE fraction (A) by MALDI-ToF; (C) MEKC analysis of Fr 14; BGE was 20 mM SDS in ammonium acetate (10 mM). (Reprinted from (Amini et al., 2000) © 2000, with permission from Elsevier Science)
4.6.1.9. Quantification by mass spectrometry
Due to the problems of signal reproducibility (see previous section), protein quantitation using MS techniques is difficult. Nevertheless, a few articles have addressed this goal; most of them concern relative quantitation, with few describing absolute quantitation. Due to the low reproducibility of the MALDI signal, absolute quantitation using MALDI-MS must be conducted using a well-established protocol. As an example, Hensel et al. (Hensel, King, & Owens, 1997) described a methodology able to quantitate directly the analyte contained in the sample. This technique uses a calibration curve of the well-defined analyte. Preparation of the MALDI samples is done using an ES deposition technique that allows reproducible samples to be obtained. Although such a technique is feasible, a calibration curve must be constructed for each new analyte. Moreover, the average coefficient of variation of signal reproducibility is 8.5%, which does not allow precise peptide concentration measurement. Relative quantitation of a substrate against an internal standard/reference is more frequently used than absolute quantitation. Gobom et al. (Gobom et al., 2000) described such a technique. The goal of the study was to quantify the neurotensin contained in the brain. The internal standard used in such a procedure must be similar to the analysed peptide to produce an identical response in terms of ionisation. Usually, stable isotopes are used to label the standard analyte. In the case of neurotensin quantitation, 13C-labelled neurotensin was used. This type of analysis is frequently used for analyte quantitation in biological matrices. Nevertheless, for every new analyte, the stable-isotope-labelled equivalent must be available. A new approach to quantitation uses relative quantitation of protein expression. This is the case of the “isotope-coded affinity tag” (ICAT™) proposed by Gygi et al. (Gygi et al., 1999). This technique allows determination of over- and/or underexpression of proteins between two similar samples, e.g. healthy/diseased, or more generally a sample and a control, in a single-track process. Each sample is treated with chemical reagents, which are differently labelled with stable isotopes (deuterium-labelled molecules for the ICAT™). Both samples are mixed together after the labelling step for 2-D gel separation or direct LC-MS/MS analysis. For the same tagged peptide sequence, the molecular masses of the peptides coming from the two samples will then differ. In that case, the MS response for both modified peptides must be similar in order to determine the relative proportion of one to the other. In the particular case of the ICAT, a biotin is covalently linked to the alkylating tag. After a tryptic digestion, it is possible to extract only cysteine-containing peptides using an avidin column, which greatly simplifies the sample. The idea of the avidin column is directly linked to multidimensional chromatography. Of course, proteins that do not contain cysteine are lost, and other labelling systems targeting different reactive sites are being developed. In chromatographic separation, the two differentially modified peptides do not have the same retention time due to the
difference of H/D hydrophobicity. A new type of ICAT™ reagent has been developed with 13C-labelled isomers. With such labelling the problem of chromatographic retention time is strongly reduced. Different “isotope-coded tags” without the affinity tail have also been used for differential protein expression determination using mass spectrometry. A few reagents have been proposed, such as N-acetoxysuccinimide and N-acetoxy-[2H3]succinimide (Chakraborty & Regnier, 2002; Zhang, Sioma, Wang, & Regnier, 2001), N-ethylmaleimide or N-methylmaleimide (Niwayama, Kurono, & Matsumoto, 2001), acrylamide (Gehanne et al., 2002), and N-t-butyliodoacetamide and iodoacetanilide (Pasquarello, Burgess, Hochstrasser, Sanchez, & Corthals, 2003).
Figure 11. Plot of the Mascot error distribution over the Mr range of matched peptide fragments on ESI-Q-Star for ELINSWVESQTNGIIR ([M+H]+ = 1858.97 Da), a correctly matched peptide from OVAL_CHICK (SWISS-PROT entry: P01012) that shows a “linear” distribution of the error over the peptide mass range (A), and DEMSDLSGRLALSGINAVCIGATLVNALLNQK ([M+H]+ = 3286.71 Da), incorrectly matched to “Random sequence for testing scoring statistics” of the Mascot random database (B).
In-vivo labelling can also be used for relative protein quantitation. One of the most frequently used reagents is Leu-D10, i.e. Leu in which 10 hydrogen atoms are replaced by deuterium atoms (Jiang & English, 2002; Martinovic, Veenstra, Anderson, Pasa-Tolic, & Smith, 2002). The idea is to grow a cell line or bacteria in enriched media. As an example, in one experimental condition, the medium is enriched with Leu-D10. Under different growing conditions, normal Leu is used (Jiang & English, 2002; Martinovic et al., 2002; Veenstra, Martinovic, Anderson, Pasa-Tolic, & Smith, 2000). At the end, equal amounts of both cultures are mixed and analysed after separation and/or purification when needed. Results similar to those of ICAT or ALICE are obtained, with a mass shift of 10 Da
between the two peak distributions for relative quantitation by integration of the signal area.
4.6.1.10. Data treatment for protein identification
MALDI-MS spectrum calibration
Protein identification using PMF techniques needs frequent spectrum calibration in order to decrease peptide mass errors. Indeed, in most of the PMF identification tools, the peptide mass error is one of the most important parameters since this value has a great influence on the number of proposed candidate proteins (Clauser et al., 1999). Default calibration of the apparatus allows determination of protein/peptide masses with an error below 1% for proteins and of 0.2 to 0.5 Da for peptides. Although limited, such errors are usually too large for correct protein identification with good accuracy (Jensen et al., 1999). Internal calibration of MALDI spectra is probably the most accurate method of data treatment. However, such improvement could also decrease the richness of the spectrum due to signal suppression (Egelhofer et al., 2000). In tryptic digestion, autolysis products of the trypsin can be used efficiently as internal standards. Usually, for porcine and/or bovine trypsin, 2 to 5 autolytic fragments from 842 to 2283 Da can easily be used for spectrum calibration. Such an approach allows the average mass error to be decreased below 10 ppm (Clauser, Baker, & Burlingame, 1999). Nevertheless, calibrations based on autolysis products of the endoproteinase cannot be achieved every time, and alternative calibration protocols must then be applied. If internal calibration using trypsin autolysis is not possible, a large number of contaminants can be used in such a process. As an example, keratin contamination could replace trypsin autolysis products. While these contaminants are not of interest in the protein identification process, they are very useful for internal calibration and obviate the addition of external peptidic standards. Such an approach, although it allows very high peptide mass precision to be obtained, is time consuming. Automated external calibration is usually a more rapid method for processing a large number of spectra, with sufficient accuracy to identify the target protein. For this approach, a mixture of synthetic peptides is generally used as a reference sample. A calibration file is then generated and applied to the acquired MALDI spectra of the samples. The technique is not as accurate as internal calibration, but an average error of 50–100 ppm can be achieved. One of the error sources is the ionisation source geometry. A calibration sample is loaded on the whole surface of the MALDI sample target and spectra are acquired for each position. Certain positions of the MALDI sample target are defined as calibration positions. The differences between calibration position and sample position, in terms of mass shift, are determined. Since part of the error is reproducible due to apparatus geometry, a 2-D calibrating matrix can be defined and used to “correct” geometrical error automatically and rapidly on the other samples
(Egelhofer et al., 2000). Nevertheless, the method requires the production of a recalibration matrix for each sample plate and apparatus. Another automated spectrum re-calibration is based on linear regression on the matched peptide masses identified during the first skimming step (Gras et al., 1999). Indeed, ToF analysers have the advantage of producing a typical error vs. peptide Mr pattern (Figure 11), even though the correct model is not a first-order phenomenon. Robust linear regression thus gives good information about the confidence of the peptide mass values. When linear regression is possible, i.e. correlation coefficient higher than 0.8, such a calculation represents the error values by Equation 2.
Equation 2. $\mathrm{Error} = A \times M_r + B$ (calculated from Figure 11) and $\mathrm{Error} = \mathrm{Theo}(M_r) - \mathrm{Exp}(M_r)$
$\mathrm{Recalc}(M_r) = \mathrm{Exp}(M_r) + \mathrm{Error}$
$\mathrm{Recalc}(M_r) = \mathrm{Exp}(M_r) + A \times M_r + B$
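The regression-based re-calibration can be sketched as follows. This is a minimal illustration, not the implementation of Gras et al.: it uses ordinary least squares rather than a robust regression, takes the error as theoretical minus experimental mass, and adds the fitted error back to each experimental mass. The function names and the example masses are hypothetical; only the 0.8 correlation threshold comes from the text.

```python
def fit_error_line(masses, errors):
    """Least-squares fit of error = a * mass + b; returns (a, b, r)."""
    n = len(masses)
    mean_x = sum(masses) / n
    mean_y = sum(errors) / n
    sxx = sum((x - mean_x) ** 2 for x in masses)
    syy = sum((y - mean_y) ** 2 for y in errors)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(masses, errors))
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Pearson correlation coefficient; 0.0 if the errors are all identical
    r = sxy / (sxx * syy) ** 0.5 if syy > 0 else 0.0
    return a, b, r


def recalibrate(exp_masses, a, b):
    """Add the modelled error (a * m + b) to every experimental mass."""
    return [m + (a * m + b) for m in exp_masses]


# Hypothetical matched peptides from a first search pass (masses in Da)
experimental = [842.38, 1045.40, 1479.60, 2210.83, 2282.90]
theoretical = [842.51, 1045.56, 1479.80, 2211.10, 2283.18]

errors = [t - e for t, e in zip(theoretical, experimental)]
a, b, r = fit_error_line(experimental, errors)

if abs(r) > 0.8:  # threshold quoted in the text
    corrected = recalibrate(experimental, a, b)
    residual_ppm = [(t - c) / t * 1e6 for t, c in zip(theoretical, corrected)]
    print("residual errors (ppm):", [round(x, 1) for x in residual_ppm])
else:
    print("correlation too weak, spectrum left unchanged")
```

A production tool would replace the least-squares fit with a regression that tolerates outliers, since a few wrongly matched masses in the first pass can otherwise bias the slope.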
Gras et al. (Gras et al., 1999) used such an approach to improve score calculation. Moreover, from this type of calculation, it is possible to re-calibrate the spectrum with high accuracy, with an error that can be lower than 20–30 ppm. Nevertheless, this method can diverge, and if the initial set of masses is not sufficiently accurate (from 100 ppm to 1000 ppm (Gras et al., 1999)), significant false positive identifications can be introduced. To reduce that problem, the combination of the two previously described methods strongly improves the peptide mass accuracy and the robustness of the final method. A further type of external calibration is based on the discontinuity of peptide masses. Indeed, due to the limited atomic composition of peptides, mostly C, H, N and O (Gly is frequently used as an AA model for mass spectrum deisotoping), preferential masses can be determined (Gay, Binz, Hochstrasser, & Appel, 1999). Peptide masses can then be re-calibrated to that theoretical value. The error must be lower than 0.5 Da to avoid data treatment divergence, i.e. wrong isotopic distribution. This technique permits a calibration error below 50 ppm.
Identification tools
Algorithms have been developed for protein identification through the PMF approach (Table 9). All of these tools have the same approach. The first step is to compare the experimental list of peptide masses to the theoretical masses of in silico digested proteins from the databases. The second step is to determine a score that can clearly discriminate true positives from false positives. One of the simplest scores is directly linked to the number of peptides that match a protein. This scoring method was used with “PeptIdent” (Wilkins & Gooley, 1997) and “PeptideSearch”. This type of scoring was adequate in the past for protein identification using the PMF approach, but it is no longer so due to the increasing size of the databases.
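The two steps described above, in silico digestion followed by a peptide-count score, can be sketched in a few lines. This is a simplified illustration and not the algorithm of any of the tools named: it assumes the common trypsin rule (cleavage after Lys or Arg unless followed by Pro), no missed cleavages, unmodified residues, monoisotopic [M+H]+ masses and a fixed tolerance in Da. The toy database and all function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(exp_masses, sequence, tol_da=0.2):
    """Number of experimental masses explained by the digest of one protein."""
    theo = [mh_plus(p) for p in tryptic_peptides(sequence)]
    return sum(any(abs(m - t) <= tol_da for t in theo) for m in exp_masses)


def rank_proteins(exp_masses, database, tol_da=0.2):
    """Rank database entries by the number of matched peptides."""
    scores = {name: count_matches(exp_masses, seq, tol_da)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy database (hypothetical sequences) and a one-peak "spectrum"
toy_db = {"target": "MKELINSWVESQTNGIIRAGK", "decoy": "MKAAAAAR"}
print(rank_proteins([1858.97], toy_db))
```

The peptide ELINSWVESQTNGIIR used here is the one shown in Figure 11. Because the score simply counts matches, a very long sequence generates many theoretical peptides and collects random hits more easily, which is the weakness of this scoring scheme.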
Moreover, large proteins such as titin (TrEMBL entry: Q8WZ42, 34 350 AA long sequence, MW = 3 816 262 Da) also frequently give false positive identifications. Indeed, with such a large number of AA combinations, a false positive identification of a very large protein is easy. With some tools it is possible to filter out such false positive identifications using a mass cut-off value. Other tools use a probability approach based on the peptide MW frequency of the in silico digested database. Applied to tryptic digests, high MW peptides (higher than 1500 Da) are less frequent in protein digestion and so they have the highest influence on the final protein identification score (Pappin et al., 1993). Following the same idea, the “Mascot” tool uses a statistical score for protein identification, based on a random set of “masses” submitted to a random sequence database (Perkins, Pappin, Creasy, & Cottrell, 1999). Others, such as “ProFound” and “ProFound New”, have their score determined by Bayesian calculation. Such an approach corresponds to the calculation of the mathematical distance between virtual values and experimental values (Zhang & Chait, 2000). Although some of these tools are powerful, they do not clearly identify the target protein from noisy spectra. Accordingly, using different tools and compiling the results has an advantage compared to a single answer. Moreover, the new ProFound is well adapted to multi-protein identification (up to four simultaneously) in MALDI-MS samples. A new version of on-site “Mascot” offers similar data treatment.
Principal factors affecting data processing
Although protein identification using the PMF approach is based on the comparison of mass lists obtained from MS spectra, more parameters are needed to improve data treatment. Most of the protein identification tools using PMF techniques, e.g. PeptIdent, Mascot, MS-Fit (Zhang & Chait, 2000), are extremely sensitive to peptide mass error. Indeed, peptide masses from mass spectra must be compared to in silico generated values. In most tools, the error value is required to exclude or validate each mass. In such tools, calibrated MALDI spectra must be used to produce reliable results. In that case, the error range must not be too small (Jensen, Podtelejnikov, & Mann, 1996), since there is a high risk of losing some information due to intrinsic instrument error (around 2–5 ppm depending mainly on the machine), nor so large as to induce a high yield of false positive matches. In a new generation of tools (Gras et al., 1999; Parker et al., 1998), exact internal calibration is no longer needed. In fact, these tools use the alignment of the error as a function of the peptide masses. With such a technique, proteins are identified with an error as large as 1000 ppm thanks to the use of a robust linear regression. Moreover, the correlation coefficient obtained during the statistical treatment of the data can be used as a discriminating factor for the final protein score.
Table 9. Protein PMF identification tools available on the Internet. The OWL* database has not been updated since May 1999 (release 31.4).
Program name | Score calculation | Database used | Internet address | References
MOWSE | Probabilistic model (MOWSE) | OWL* | http://srs.hgmp.mcr.ac.uk/ | (Pappin et al., 1993)
Mascot | MOWSE (Modified) | OWL*/NCBInr | http://matrixscience.com/ | (Perkins et al., 1999)
MS-Fit/MSTag/MS-Seq | MOWSE | SWISS-PROT/GenPept/pdbEST/OWL*/NCBInr | http://www.prospector.uscsf.edu/ | (Clauser et al., 1999)
ProFound | Bayesian Algorithm | NCBInr/SWISS-PROT | http://prowl.rockefeller.edu/cgi-bin/ProFound/ | (Zhang & Chait, 2000)
PeptideSearch | Number of identified peptides | NCBInr | http://www.mann.embl-heidelberg.de/Services/PeptideSearch/PeptideSearchIntro.html |
PeptIdent | Number of identified peptides | SWISS-PROT/TrEMBL | http://www.expasy.ch/ | (Wilkins & Gooley, 1997)
Smartident | Heuristic score based on learning method | SWISS-PROT/TrEMBL | http://ch.expasy.org/tools/ | (Gras et al., 1999)
PepMAPPER | | Various protein databases and complete organisms | http://wolf.bms.umist.ac.uk/mapper/ |
The error value can be defined in two different ways:
- m/z unit: Da or mDa. This unit is directly comparable to the Mr of the peptides.
- ppm unit: (error value)/Theo(Mr) × 10^6. The advantage of this calculation is that it scales the error window with the peptide Mr, i.e. the error window is smaller for smaller peptides and increases continuously as a linear function of the peptide mass, which is a correct model of the error distribution in ToF analysers.
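The two definitions of the error window can be compared with a short sketch. The sign convention (theoretical minus experimental) follows the error definition used for re-calibration; the function names and the 50 ppm example are illustrative assumptions.

```python
def ppm_error(exp_mass, theo_mass):
    """Relative mass error in ppm: (theoretical - experimental) / theoretical * 1e6."""
    return (theo_mass - exp_mass) / theo_mass * 1e6


def ppm_window_da(theo_mass, tol_ppm):
    """Half-width (Da) of a ppm tolerance window at a given theoretical mass."""
    return theo_mass * tol_ppm / 1e6


def within_tolerance(exp_mass, theo_mass, tol, unit="ppm"):
    """True if the experimental mass falls inside the chosen error window."""
    delta = abs(exp_mass - theo_mass)
    if unit == "ppm":
        return delta <= ppm_window_da(theo_mass, tol)
    if unit == "Da":
        return delta <= tol
    raise ValueError("unit must be 'ppm' or 'Da'")


# A 50 ppm window widens linearly with the peptide mass
for mass in (1000.0, 2000.0, 3000.0):
    print(f"{mass:6.0f} Da: +/- {ppm_window_da(mass, 50):.3f} Da")
```

With a fixed window in Da, a tolerance wide enough for a 3 kDa peptide is needlessly permissive at 1 kDa, whereas the ppm window tightens automatically for small peptides.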
Chemical modification of the proteins can have two different origins: incidental or controlled. Incidental modifications are usually linked to artefactual reactions between the analyte (free reactive groups of the proteins) and some chemicals present during the separation and/or treatment process. For example, any remaining unpolymerised acrylamide is well known to react with the free sulfhydryl group of Cys (Bordini et al., 2000). Met oxidation is also a common incidental modification. Samples previously purified on IMAC columns usually show high Met oxidation levels (Bethancourt et al., 1999). One of the most frequent controlled modifications is the alkylation of the sulfhydryl group of Cys to decrease artefactual modification. Usually, the Cys conversion must be total to prevent an increase of spectrum complexity (and calculation time). Bordini et al. (Bordini et al., 2000; Galvani et al., 2000) proposed using acrylamide as an alkylating agent, since a proportion of the Cys residues are modified with this reagent during the protein separation step anyway. Other reagents are more frequently used, including iodoacetamide, iodoacetic acid (not usable for IEF separation since it modifies the protein pI), and 4-vinylpyridine. The advantage of controlled modifications is their low computational impact, since the mass of the Cys residue is simply replaced by the mass of the Cys-CAM residue. Artefactual modifications usually strongly increase the spectrum complexity and can be the origin of false positive identifications. In recent years, chemical modification of the peptides has become a goal in improving protein identification. Some residues can be modified, such as Glu/Asp by esterification, or Met/Trp by complete oxidation (Gevaert & Vandekerkhove, 2000). A more complete description of chemical modification of peptides is available in section 5. One important parameter in protein identification by PMF is related to the databases used for the matching process. Indeed, this technique is very powerful if the protein sequence of the target sample is known and present in the database used for the protein identification. Some databases of protein and nucleotide sequences are freely available on the Internet, e.g. SWISS-PROT, TrEMBL, nrDB, dbEST, and most of the PMF tools use them. The two main sources for such data are:
- SWISS-PROT (the manually annotated database) and TrEMBL (the automatically annotated complement of SWISS-PROT) (Bairoch & Apweiler, 2000)
- GenPept, the translation of the Genbank database (NCBI nucleotide database) (Wheeler et al., 2001)
At present the SWISS-PROT database contains 163 496 entries (release 45.1) and TrEMBL contains 1 448 882 entries (release 28.1), treated automatically to decrease redundancy. Nevertheless, if these two databases are added together they correspond to around 1 612 378 gene products. GenPept is an equivalent to the SWISS-PROT/TrEMBL database with a high level of redundancy in terms of protein sequences. As an example, in SWISS-PROT all
proteins corresponding to a single gene are available in a single entry, including variants, alternative splicing and sequence conflicts; that is not the case in GenPept, where one entry corresponds to a single AA sequence. Other protein databanks of less interest exist, such as OWL or PIR. These banks cannot be considered as particularly useful since OWL has not been updated since May 1999 (last OWL release 31.4) and PIR is an annotated database similar to SWISS-PROT and with similar information. It is also possible to generate virtual protein databases where the random sequences can be orientated to a defined AA composition, a defined protein AA composition or a statistical distribution of AA composition. Whereas these databases are of no use for protein identification, running PMF protein identification against such data allows determination of the false positive score level. This type of calculation is done for Mascot (Perkins et al., 1999). Three major nucleotide databases are freely available (EMBL, GenBank, DDBJ). Their contents are similar and the major difference is related to the format and organisation of the data. They can also be used for protein identification but the DNA sequence must be translated in the six reading frames, which increases the calculation time considerably. Nevertheless, in a few cases that solution is the easiest way to identify a protein from the sequenced genome of an organism that is freely available (Link et al., 1999; Neubauer et al., 1997; Shevchenko, Sunyaev et al., 2001). EST databases are also used for protein identification when clear identification cannot be obtained with the previously discussed databases. This will be the case for poorly represented species, for which EST databases contain much more information (Neubauer et al., 1998), although interpretation of the results is usually very difficult.
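The six-frame translation mentioned above can be sketched as follows. This is only a minimal illustration, assuming the standard genetic code and an uninterrupted DNA string; the function names are hypothetical and do not correspond to any identification tool cited in the text. It shows why the search space grows: each nucleotide entry yields six candidate protein sequences.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse-complement strand of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return the translations of frames +1..+3 and -1..-3 as a dict."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Each of the six translated strings would then have to be digested in silico and compared with the experimental masses, which is the source of the extra calculation time.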
But exploitation of all of the available protein/DNA sequences is not a solution. Usually, the first step will be to compose a virtual database containing a pre-screened subset of the protein/DNA sequence databases. Information can be directly extracted from the sample itself, e.g. tissues, species, Mr and pI of the target proteins. When proteins have previously been separated by 2-DE, such values are easily available, whereas only the Mr of proteins is available from 1-DE. The Mr value obtained from the separating gels is only an estimation of the real protein MW based on steric interaction between the polypeptide and the polyacrylamide web. The real protein mass can show 10–20% error on such Mr values. This value can also show major variations against the theoretical MW value (higher than 50%). The two main reasons are:
- PTM of the protein. In the case of glycosylation, an important mass shift can be observed, especially for low-MW proteins,
- Protein fragments from in-vivo protein degradation or in-vitro sample alteration.
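As a rough illustration of such pre-screening, the sketch below keeps only the database entries of a given species whose theoretical MW falls inside a window around the Mr read from the gel. The `Entry` record, the `prescreen` function and the default 20% window are assumptions made for this example (the window simply mirrors the 10–20% error quoted above); as noted in the list, modified or degraded proteins may fall outside any such window.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    accession: str  # hypothetical identifier, for illustration only
    species: str
    mw: float       # theoretical MW in Da


def prescreen(entries, species, observed_mr, tolerance=0.20):
    """Keep entries of one species whose MW lies within +/- tolerance of the gel Mr."""
    low = observed_mr * (1 - tolerance)
    high = observed_mr * (1 + tolerance)
    return [e for e in entries if e.species == species and low <= e.mw <= high]
```

Widening `tolerance` trades speed for a lower risk of excluding the true protein before the search has started.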
The pI is less frequently used. The two main reasons are, first, that the theoretical calculation is made on an empirical basis and, second, that PTM can induce a pI shift. Usually the value must be used with a large error window of around 2–5 pH units. An opportunity to greatly decrease the number of candidate proteins is usually to set the correct species. Such drastic species restriction is not always helpful, since a large number of the species available in protein databanks contain only a few proteins. In that case, restricting the search to the family rather than to the species could help to identify the correct protein from another species if the homology between species is sufficient (usually higher than 90–95% at least for PMF identification, less for ESI MS/MS based identification) (Berndt, Hobohm, & Langen, 1999). Such information limits the number of potential candidates in a first rapid “skimming” of the databases (Berndt et al., 1999) that will be used further for the identification. Chemical modification of peptides (mainly artefactual, less so for controlled modifications) and the degree of completeness of the endoproteolytic digestion have the opposite effect, increasing the number of potential fragments (in silico generated) and calculation time. It is extremely important to balance correctly all of these factors to remove noise without loss of information. The endoproteolytic enzyme is a very important parameter since it will be involved during in silico peptide generation (Berndt et al., 1999). For a few of these enzymes, the cleavage rules are clear and apply to the N and/or C side of a limited number of residues, e.g. trypsin, Lys-C, CNBr. Other enzymes have less strict cleavage rules. In those cases, it could be difficult to compute all of the possible cleavages, e.g. with chymotrypsin or pepsin. In the latter case, it is usual to have a large number of unidentified peptides that correspond to one non-computable cleavage site (Keil, 1992).
Moreover, irrespective of the enzyme specificity, cleavage sites can be missed during the digestion. This may be explained by:
- PTM of the cleavage site,
- Deactivating AA pattern, e.g. Pro in P’1 with trypsin (Keil, 1992),
- Internal cleavage site on non-denatured protein, …
Bio-informatic tools can process the list of experimental masses with a limited number of potentially uncleaved sites, i.e. 1 or 2 missed cleavages (MC). The score weight of such a peptide can be processed differently from 0 MC peptide scores. In some tools this value is accessible, e.g. in Mascot, ProFound or MS-Fit. Users can themselves define the confidence of a 1 MC peptide by setting the “P value” (from 0 to 1) to define whether the 1 MC peptide will have no weight or a weight similar to that of 0 MC peptides. To conclude, the operator must critically interpret results to validate one or more candidate proteins. The highest score is not always the best identification. Tools have biases that over- or under-estimate peptide scores and manual validation is an essential step.
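The parameters discussed in this section (cleavage rule, missed cleavages, fixed Cys alkylation, mass tolerance and MC weighting) can be brought together in a small sketch. It is a toy illustration under stated assumptions, not the scoring scheme of Mascot, ProFound or MS-Fit: trypsin is taken to cleave after Lys/Arg except before Pro, every Cys is assumed to be carbamidomethylated, the experimental values are assumed to be neutral monoisotopic masses, and the score is a simple weighted count in which a peptide with n missed cleavages contributes `mc_weight ** n`. All function names are hypothetical.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
CAM = 57.02146  # carbamidomethylation of Cys, applied as a fixed modification


def cleavage_pieces(protein):
    """Split after K or R, except when the next residue is P."""
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])
    return pieces


def digest(protein, max_missed=1):
    """Return (peptide, missed_cleavages) pairs with up to max_missed MC."""
    pieces = cleavage_pieces(protein)
    peptides = []
    for i in range(len(pieces)):
        for mc in range(max_missed + 1):
            if i + mc < len(pieces):
                peptides.append(("".join(pieces[i:i + mc + 1]), mc))
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass, every Cys counted as Cys-CAM."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return mass + CAM * peptide.count("C")


def match(experimental, protein, tol_ppm=50.0, max_missed=1, mc_weight=0.5):
    """Toy score: each matched theoretical peptide adds mc_weight ** MC."""
    score, hits = 0.0, []
    for peptide, mc in digest(protein, max_missed):
        theoretical = peptide_mass(peptide)
        for observed in experimental:
            if abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((peptide, mc))
                score += mc_weight ** mc
                break
    return score, hits
```

Setting `mc_weight` to 0 ignores 1 MC peptides and setting it to 1 treats them like fully cleaved peptides, which mirrors the two extremes of the “P value” described above.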
Other possible criteria for protein identification not directly integrated into identification tools
The previously mentioned parameters are available in nearly all of the PMF identification tools. Nevertheless, the data-mining approach and interpretation of the identification data allow us to define a few rules for validating protein characterisation. Such rules are usually not yet available in current tools. For example, Met can undergo artefactual modification to the oxidised form. Such a modification is available in all tools, but usually both the oxidised and non-oxidised peptides are present in the analysed mixture. Such information can be used as a validation factor. Moreover, Met is not the only AA able to produce oxidative by-products. Trp must also be associated with some oxidative forms but such modifications are not frequently proposed (Thiede et al., 2000). As well as oxidation, some AA patterns are frequently associated with PTM. This is the case with pyroglutamylation of the Glu residues at the N-terminal position (Thiede et al., 2000). Krause et al. (Krause et al., 1999) investigated the signal intensity as a function of the peptide C-Ter residue. The five most intense peaks were used and the authors concluded that in 94% of these peptides Arg is the C-Ter residue. Incorporation of a result like this in the protein identification tools could greatly improve the identification since a second parameter from MALDI spectra would be used, i.e. relative peak intensity. At present the cleavage rules are set to a single AA at P1 position. Sometimes, the P’1 position is used as a deactivating factor, for example Pro residues during tryptic digestion. Nevertheless, it is quite clear that most of the endoproteinases use larger interaction sites (Keil, 1992). Better knowledge of these enzyme/substrate interactions is needed to be able to predict the cleavage sites with the highest accuracy, especially for enzymes regarded as unspecific, e.g. pepsin and papain. Improvements in bio-informatics must be able to treat such data to obtain a better fit between theoretical and experimental peptides (Thiede et al., 2000). Less critical is the protein staining after the gel separation. As an example, the silver staining procedure can be very sensitive, but the protein sequence recovery during the identification procedure is usually lower than the recovery obtained with any other staining, e.g. negative Zn/imidazole stain, colloidal Coomassie or SYPRO® (Lauber et al., 2001; Scheler et al., 1998; Valdes et al., 2000).
Interpretation of results and limits of validity
Due to the increasing size of the databases, protein identification using peptide mass fingerprint is becoming less and less convenient. In 1993, when peptide mass fingerprint was proposed (Henzel et al., 1993; James et al., 1993; Mann et al., 1993; Pappin et al., 1993; Yates, III et al., 1993), 4 to 5 peptides with a mass accuracy of 2–3 Da were required for a correct identification (Pappin et al., 1993). Today some theoretical studies suggest that a much tighter mass tolerance (down to 60 ppm) is needed for correct protein identification against a part of the database, e.g. for E. coli or
Homo sapiens (Axelsson, Naven, & Fenyo, 2001). If the search is conducted against the whole database, e.g. SWISS-PROT and TrEMBL (around 700 000 entries), the mass error tolerated for a correct identification must be lower than 5 ppm (Clauser et al., 1999). From the data submitted by the operator, the identification tool will return a list of potentially identified proteins, each linked to a score value. Criteria must be used to validate or invalidate the proposed result. First of all, the score value is the main validating factor. Indeed, calculation of such values is orientated to produce the highest score for a true positive identification, whereas the score must be as low as possible for a true negative. Between these limits, the false positive or negative results must be validated manually (Tang, Zhang, Fenyö, & Chait, 2000; Wang, Shoeman, & Traub, 2000). Then other information contained in the result output must be used, such as percentage of protein sequence recovery (Berndt et al., 1999), missed cleavage, or Met oxidation, as well as the error distribution between the theoretical and experimental Mr and pI values of the target protein (Figure 11). Nevertheless, some proteins are difficult to identify using the PMF approach. This is especially the case for low-MW proteins (below 10 kDa) or for highly modified (PTM) proteins. In the first case, small proteins produce a limited number of peptides. Moreover, due to the MALDI suppression effect, only a few real peptide masses are available during the analysis. In the second case, the modified peptide does not match the theoretical mass since PTM alters it. In some cases only a few unmodified peptides are available for the identification, which is then not always clear. Common false positive identifications especially concern very large proteins (MW higher than 0.5–1 MDa), where the huge number of theoretical peptides may match a few of the experimental masses.
4.6.2 Protein identification from internal peptide sequence
4.6.2.1 Introduction
The identification procedure is similar to protein identification using the PMF approach, where the masses of the peptides are compared to in silico generated proteolytic fragment masses. For the MS/MS approach, the first step consists of comparison of the experimental peptide masses to in silico generated peptide masses, as in the PMF approach. Then, potential peptides are processed to generate potential internal fragments following the possible cleavages, as summarized in Figure 13. The peptide bond shows several possible fragmentation positions all along the peptide/protein backbone. This second list of peptide fragment masses is then compared to the experimental one in the same way as in the PMF procedure. A score
Figure 12. Strategy for unambiguous identification of proteins by MS. Electrophoretically isolated proteins are digested in-gel with a sequence-specific protease and the resultant peptide mixture is analysed by MALDI-MS. The high mass accuracy peptide map is used to query a comprehensive protein sequence database. If a protein is unambiguously identified, the sample is added to the list of identified proteins and the next sample is processed. Otherwise, the sample is desalted and a number of peptides from the mixture are sequenced by nanoelectrospray MS/MS. Peptide sequence tags are constructed and used for database searches either in full-length sequence databases or in EST databases in order to identify the protein or a corresponding EST. In cases of unknown proteins, the full-length peptide sequence is extracted from tandem mass spectra and used to design oligonucleotide probes for cloning of the cognate gene. Refer to complete article for further details (Reprinted by permission of Wiley-Liss Inc, a subsidiary of John Wiley & Sons, Inc from (Norregaard, Larsen, & Roestorff, 1998), © 1998).
is calculated for each peptide. Usually for a tryptic protein digestion, 2–50 peptides (depending on the size and amount of the protein) can be identified and the final score is obtained by the mathematical combination of all of the peptide scores (Eng, McCormack, & Yates, 1994; Fenyo, Qin, & Chait, 1998; Mann & Wilm, 1994).
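The second list of theoretical masses described above can be illustrated for the two most commonly used series. The sketch below is a minimal example, assuming an unmodified peptide, monoisotopic masses and singly charged b and y ions only; it is not the fragment model of any of the cited tools, and `fragment_ions` is a hypothetical name.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return the singly charged b and y ion m/z lists (b1..b(n-1), y1..y(n-1))."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b ions keep the N-terminal part; y ions keep the C-terminal part plus water.
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions
```

For a peptide of n residues, the pair b(i) and y(n-i) always sums to the neutral peptide mass plus two protons, which gives a quick consistency check when assigning an experimental spectrum.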
Figure 13. Peptide fragmentation scheme and fragment nomenclature (Roepstorff & Fohlman, 1984)
4.6.2.2 ESI-MS/MS analysis
Peptide analysis using the ESI-MS/MS approach is the method of choice for unsuccessful PMF protein identifications. MALDI and ESI ionisation were developed during the same period (Aleksandrov et al., 1984; Fenn et al., 1989; Karas et al., 1987; Tanaka et al., 1988; Yamashita & Fenn, 1984) and facilitated developments in the area of biopolymer analysis. Whereas with the MALDI approach the analyte must be mixed with a matrix to favour energy transfer from the laser photons to the peptides, ESI directly uses the dilute solution of the analyte to produce gas phase ions through an ES interface. The liquid sample is pumped at rates from nanolitres (nano-spray) to microlitres (micro-spray) per minute through a hypodermic needle that is held at a high potential difference (from one thousand to a few thousand volts). This approach is easily compatible with liquid chromatography, which greatly decreases the complexity of the peptide mixture. The electrostatic field produces fine droplets with an excess of positive (or negative) charge. Use of a drying gas evaporates the solvent more rapidly, especially for the micro-spray approach where the volume of liquid is quite large. Due to the Coulomb repulsion of the
positive or negative charges, the droplet explodes (“Coulomb explosion”) to produce smaller droplets down to the critical Rayleigh diameter. These small droplets are air-dried and the analyte is converted to gas phase ions. This process favours the production of multiply charged gas-phase molecular ions. The number of charges is usually proportional to the MW of the analyte. As an example, tryptic peptides usually produce a doubly (or triply) charged peak, with one charge at the N-Ter side of the peptide and a second on the basic side chain of the Lys or Arg (C-Ter residue). These charges are mobile and this mobility seems to be related to the primary sequence (Schutz, Kapp, Simpson, & Speed, 2003). This charge distribution favours the production of yn and bn ion series. For intact proteins such as myoglobin, the charge state of the molecule ranges from 8–10 to 20–24 positive charges, with a maximum of signal intensity between 14 and 20 positive charges (Figure 14) that depends mainly on the ESI source and parameters used for such acquisition. The analyte usually appears in an m/z window between 500 and 3000 Da for proteins and one of only 500 to 2000 Da for tryptic peptides.
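The reconstruction of a molecular mass from such a multiply charged envelope rests on simple arithmetic, sketched below. The example assumes positive-mode ions that carry only protons as charge carriers and two neighbouring peaks that differ by exactly one charge; the function names are hypothetical and the protein mass used in any check is an illustrative input, not a value taken from Figure 14. If the peak at higher m/z carries z charges and its neighbour carries z + 1, then z = (m/z_low - 1.00728) / (m/z_high - m/z_low).

```python
PROTON = 1.00728  # mass of a proton in Da


def mz(mass, z):
    """m/z of a neutral mass carrying z protons."""
    return (mass + z * PROTON) / z


def charge_from_adjacent(mz_high, mz_low):
    """Charge z of the peak at mz_high, given its neighbour at mz_low with z + 1."""
    return round((mz_low - PROTON) / (mz_high - mz_low))


def neutral_mass(mz_value, z):
    """Neutral mass recovered from one peak of known charge."""
    return z * (mz_value - PROTON)
```

In practice the estimate would be averaged over all peaks of the envelope, which is what the reconstructed spectrum summarises.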
Figure 14. ESI-MS spectrum of myoglobin (a) and the reconstructed molecular ion (b)
Due to the low mass window needed for the analysis of such ions, quadrupolar systems are frequently used, e.g. Figure 15. These systems are extremely powerful in that mass range and they can filter the precursor ion mass with good accuracy for MS/MS analysis. Parent ion fragmentation takes place in a collision cell where an inert gas, e.g. argon, helium or nitrogen, is used as a collision gas to produce specific daughter fragment ions. Such fragments are then analysed using a quadrupolar or a ToF system (Figure 15). The last configuration, a hybrid combination of a
quadrupolar system with a ToF separation system, allows excellent signal resolution and mass accuracy to be obtained (Morris et al., 1996). Other analysers such as the Ion Trap (IT) are frequently coupled with ESI sources (Kaiser, Cooks, Syka, & Stafford, 1990; Louris et al., 1987). ITs (or 3-D trap systems) are electrostatic traps that can accumulate ions and release them depending on the high-frequency voltage applied to the electrodes. Such systems can produce MSn ion generation with n greater than 2; the previously described apparatus usually allows only MS2 analysis. These IT systems are usually not expensive but the Mr precision and signal resolution are quite poor in comparison with ToF separating systems. Furthermore, due to the intrinsic properties of such traps, it is not possible to visualise the whole mass range at once, i.e. only the mass range from one-third of the precursor ion mass to the precursor Mr is visible. MSn is a possible approach to scanning a larger mass window.
Figure 15. Scheme of a quadrupole ToF instrument. Ions are produced at atmospheric pressure in the ion source shown to the left. After traversing a counter-current gas stream (curtain gas), the ions enter the vacuum system and are focussed into the first quadrupole section (q0). They can be mass separated in Q1 and dissociated in q2. Ions enter the time-of-flight analyser through a grid and are pulsed into the reflector and onto the detector, where they are recorded. There are 14 000 pulsing events per second. (Reprinted with the authorisation of “The Annual Review of Biochemistry”, © 2001, (Mann, Hendrickson, & Pandley, 2001))
For all types of separation process, signal intensity is correlated with the analyte concentration up to a saturation level. The signal suffers no influence of the analyte
flow (Emmett & Caprioli, 1994; Wilm et al., 1996) and similar signals must be obtained from nl/min up to a few µl/min flow. The use of such low volumes favoured the combination of ESI ionisation with peptide separation using mono-dimensional (Chen et al., 1999; Lauber et al., 2001) or multi-dimensional (Link et al., 1999) capillary HPLC and/or capillary electrophoresis (Li et al., 2001). With such technological improvement, the sensitivity of the technique was strongly increased and protein identification could be conducted with less than a pmol of material, e.g. around the ng level of target protein (Andersen & Mann, 2000). The chromatographic approach allows reduction of the sample complexity. Such a system coupled with an automated MS/MS acquisition approach is a very powerful tool for accurate protein identification (Ducret, Bartone, Haynes, Blanchard, & Aebersold, 1998). However, this ionisation technique is highly sensitive to contaminants such as salts or surfactants. Samples from capillary electrophoresis coupled systems, multidimensional chromatographic separations and contaminated samples (by salts and/or surfactants) are usually desalted using an on-line reversed-phase column where the buffer can be changed to the correct mobile phase for ESI. This type of data can be used directly against protein databases for protein identification but, due to the high information content, it is also possible to search against complete genome sequences (Neubauer et al., 1997) and/or EST databases (Neubauer et al., 1998), which are usually nucleotide databases. Such information frequently contains errors (especially for EST databases) but is sufficient to identify partial or full protein sequences by homology rather than by identity as in the PMF approach. In these cases, the genome of a neighbouring species can be sufficient for protein identification (Shevchenko, Sunyaev et al., 2001). This approach uses automated treatment of the data against the database.
It should be noted that good-quality ESI-MS/MS spectra are not always linked to an identified protein. In such a case, the goal is to extract the peptide sequence directly from the raw data (MS/MS spectra): de novo sequencing. Programs are available for automated de novo peptide sequence identification. Dancik et al. (Dancik, Addona, Clauser, Vath, & Pevzner, 1999) proposed a tool able to determine peptide sequences based on learning algorithms. Such developments are not frequent and the results are still ambiguous. The best results obtained in that way remain sequence tags. Manual interpretation of the spectra is generally required to improve the results. Chemical modifications can be done to facilitate peptide sequence extraction, such as:
- 18O C-Ter labelling (Rose et al., 1991; Shevchenko et al., 1997; Uttenweiler-Joseph, Neubauer, Christoforidis, Zerial, & Wilm, 2001), which produces an easily identifiable pattern for the C-Ter peptides (see section 5.2.6),
- Positively (Schwartz & Jardine, 1996; Sonsmann, Romer, & Schomburg, 2002; Yates, Eng, McCormack, & Schieltz, 1995) or negatively (Keough, Lacey, Fieno et al., 2000; Lindh, Hjelmqvist, Bergman, Sjövall, & Griffiths, 2000) charged peptide derivatisation (see section 5.2.5),
- MS3 trapping approach, which facilitates spectrum interpretation and peptide sequence identification (Zhang & McElvain, 2000).
This last approach (MS3 at least) is an instrument-based approach since the idea is to use the MSn resources of trap systems to generate preferentially yn or bn ion series. Indeed, if yn (in the MS2 stage) is selected as the parent (precursor) ion for MS3, the daughter ions will be preferentially yn-x ions as well as bn, an or any other ion series. These MS3 spectra are then used for daughter ion assignment on the MS2 spectrum. These techniques for protein identification using MS/MS analysis allow protein identification with high confidence. Moreover, by comparison with the PMF approach, where protein databases are mostly the unique source of information, the MS/MS approach allows the use of nucleotide sequences from genome sequencing projects (generally good-quality data) and also EST databases (generally poor-quality data). In any case, if the protein sequence is not present in any databank, at least manual interpretation of the spectrum is able to give sufficient information to develop a cloning approach. However, such a technique is highly time consuming and must be restricted to a limited number of samples (Figure 12). Note that, due to the chromatographic approach, signal suppression is usually less with the LC-ESI-MS approach than with the MALDI-MS approach, in which all of the material is available at a single position. Percentage sequence recovery is usually higher with the LC-MS approach. Moreover, due to the different ionisation techniques (ESI or MALDI), different ions were identified with each technique (Shevchenko, Sunyaev et al., 2001). The information concerning the internal protein sequence will be helpful for gene cloning approaches and preparation of a recombinant protein when needed.
5. ADVANCED TECHNIQUES FOR PROTEIN IDENTIFICATION
5.1. Introduction
Protein identification was demonstrated in 1993 by five groups simultaneously (Henzel et al., 1993; James et al., 1993; Mann et al., 1993; Pappin et al., 1993; Yates, III et al., 1993).
At that time, Pappin et al. (Pappin et al., 1993) successfully identified proteins using 4 to 5 peptides with a mass accuracy of 2–3 Da against the SWISS-PROT database, which contained 29 955 entries (release 25). Today, the SWISS-PROT and TrEMBL databases contain respectively 133 312 and 939 599 entries (releases 41.21 and 24.8 respectively). During the last 10 years, the number of protein sequences available increased more than 35-fold and is bound to increase exponentially in the next few years (personal communication, Prof. A. Bairoch). It is no longer possible to submit requests against the whole databases, i.e. SWISS-PROT/TrEMBL, since the usual error on the measured peptide masses is higher than 5
ppm (Clauser et al., 1999). Some approaches could be used to obtain better protein discrimination after PMF identification:
- Reduction of the size of the database by the selection of taxonomic species or family,
- Complementary information such as peptide/protein tags (by MALDI-PSD-MS or MS/MS),
- Complementary information from protein/peptide primary sequence (determination of partial/total AA composition of the peptide).
This last technique allows validation of matched peptide sequences, which increases the accuracy of protein identification. A wide range of such determination methods were developed a few years ago when protein identification was mostly conducted via Edman degradation. The most frequently described modifications found in the literature are described below.
5.2. Chemical modification
5.2.1. Introduction
Chemical modification of peptide/protein compounds offers a large range of possibilities against reactive groups of the usual AAs. Indeed, most of these reactions were developed during Edman sequencing projects where it was important to produce internal peptides using chemical cleaving reagents. Various reagents to protect specific groups or induce residue-specific reactions were developed. At present, the major challenge is not to find new chemical reactions but mostly to adapt previous protocols to the new restrictions linked to the MS approach, including:
- Low amount of material,
- Low salts and buffer contaminants.
Usually such modifications involve the reactive groups of the AAs, e.g. the amino group of Lys or the carboxylic group of Glu or Asp.
5.2.2. Reactions involving free amino groups of peptides/proteins
Primary amine groups such as N-Ter amino groups as well as the ε-amino groups of Lys are highly reactive. Despite the slight differences in the pKa values of such groups, it is difficult to have specific reactions against one of the two types of amino groups.
5.2.2.1. Acetylation of the amino groups
Due to the reactivity of the ε-amino group of the lysine, this is one of the most frequent chemical modifications. As an example, Labouesse & Gervais (Labouesse & Gervais, 1967) studied the impact of acylation of the lysine ε-amino groups of
trypsin in terms of the remaining endoproteolytic activity. A similar technique was also used to quantify the frequency of lysine residues contained in a peptide/protein sequence or to determine the topology of the interaction sites of protein complexes (Ohguro, Palczewski, Walsh, & Johnson, 1994; Suckau et al., 1992). Such modification was also used to facilitate interpretation of CID/PSD spectra (Hisada, Konno, Itagaki, Naoki, & Nakajima, 2000; Pfeifer, Rucknagel, Kuellertz, & Schierhorni, 1999). Reactions of this type will modify all of the free amino groups of peptides/proteins. Briefly, the analyte is mixed with acetic anhydride and trimethylammonium buffer to obtain a quantitative conversion. A major advantage of this modification is the use of volatile reagents such as trimethylammonium buffer and acetic anhydride (or acetic acid after the hydrolysis) that can be removed under vacuum (Pfeifer et al., 1999). In some cases, acylation occurs on the hydroxyl groups of serine and tyrosine (Pfeifer et al., 1999) but such a functionality can be regenerated by treatment of the sample with dilute ammonia solution. Acetic anhydride is frequently used, but other reagents can be used, such as acetyl chloride or propionic anhydride (Chaurand, Luetzenkirchen, & Spengler, 1999).
5.2.2.2. Lys specific reactions
[Reaction scheme: lysyl residue + O-methylisourea → homoarginyl residue + methanol]
Figure 16. Conversion reaction of the lysine residue to a homoarginine residue
Some reactions are much more specific than acetylation: for example, the conversion of Lys to homoarginine, where the ε-amino group of the Lys reacts specifically with O-methylisourea to produce the new synthetic residue (Figure 16). This converts the primary amino group of the lysine (pKa 9.7) into a homoarginine group (pKa 10.8), which mimics the physico-chemical properties of the Arg residue (Beardsley, Karty, & Reilly, 2000; Brancia et al., 2001; Hale et al., 2000; Keough, Lacey, & Youngquist, 2000).
Figure 17. Linear (a) and reflectron (b) mode MALDI-ToF spectra for 2 pmol of tryptic digest of yeast alcohol dehydrogenase. The signals in the spectra are labelled as derived from lysine- (K-) or arginine- (R-) containing peptides. (Reproduced with permission of Wiley-VCH Verlag GmbH, (Brancia et al., 2000), ©2000).
This modification was helpful because similar peptides containing an Arg or Lys at the C-Ter of the peptide (same primary sequence except the C-Ter residue) show different signal intensities during MALDI-MS analysis. By this conversion of the Lys to homoarginine, some Lys C-Ter peptides give stronger signals than before, which increases the yield of sequence recovery. The conversion reaction is conducted at 37°C and the substrate is incubated with O-methylisourea in a basic buffer such as sodium carbonate (Hale et al., 2000), sodium hydroxide (pH 10.5) (Brancia et al., 2001), ammonia solution (pH 11.5) (Beardsley et al., 2000), or diisopropylamine solution (pH 11.3) (Keough, Lacey, & Youngquist, 2000). The conversion reaction is easy, but it involves a purification step to remove the large excess of reagent and buffer salts that are not vacuum compatible before further analysis. Such Lys modification can also help to suppress the ambiguity between Lys (MW = 128.17 Da) and Gln/Glu (MW = 128.13/129.11 Da) as the MW of the homoarginine product is 170.18 Da. Since this reaction allows modification of exclusively the ε-amino group of Lys, the N-Ter amino groups are available for different chemical modifications specific to these groups (Keough, Lacey, & Youngquist, 2000).
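The way this conversion resolves the Lys/Gln ambiguity can be expressed as a simple mass-shift calculation. The sketch below is an illustration under assumptions: monoisotopic masses are used, each converted Lys is taken to gain CH2N2 (42.02180 Da), the conversion is assumed to be complete and restricted to the ε-amino groups, and the function names are hypothetical.

```python
GUANIDINATION_SHIFT = 42.02180  # Da per converted Lys (net gain of CH2N2, monoisotopic)


def guanidinated_mass(peptide, mass):
    """Expected peptide mass after complete Lys -> homoarginine conversion."""
    return mass + GUANIDINATION_SHIFT * peptide.count("K")


def lysine_count_from_shift(mass_before, mass_after):
    """Number of Lys residues inferred from the observed mass shift."""
    return round((mass_after - mass_before) / GUANIDINATION_SHIFT)
```

A peptide whose mass does not move after the reaction contains no Lys, so a residue of nominal mass 128 in that peptide can be assigned to Gln rather than Lys.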
This modification also increases the percentage sequence recovery since the modification of the Lys acts directly on the ionisation potential of the peptides. Figure 17 shows clearly the improvement in signal intensity for converted Lys C-Ter residues (Beardsley et al., 2000). This modification is important for MALDI-based analyses but is not very useful for the ESI approach.
5.2.2.3. Isothiocyanate treatment of the free amino groups for N-Ter cleavage (Edman type reaction)
At present, Edman degradation is fully automated with robots able to process 40–50 residues a day. But when Edman & Begg (Edman & Begg, 1967) published their technique for AA sequencing, it was a manual approach. To summarize the reaction, the primary amino group of the peptides reacts with isothiocyanate derivatives, e.g. phenyl isothiocyanate (Edman & Begg, 1967), trifluoroethyl isothiocyanate (Bartlet-Jones, Jeffrey, Hansen, & Pappin, 1994), or allyl isothiocyanate (Gu & Preswich, 1997). The ε-amino group of the Lys will be converted into a thiourea derivative, whereas N-Ter residues may be subject to successive cleavage, as in Edman degradation using chemical cleavage, obtained by neat TFA treatment. The Edman approach places interest on the by-product of the cleavage reaction: the thiohydantoin residue. This compound carries a fluorophore group and the side chain of the corresponding AA. A chromatographic approach is able to identify most of the potential residues. In a mass spectrometry approach, the idea is no longer to identify the by-product but the peptide mass modification that corresponds to the N-Ter AA. Such sequence information (single AA tag) can be integrated into the identification procedure as a peptide validation step. Phenyl isothiocyanate is the reagent of choice in automated Edman degradation systems.
However, this reagent is highly toxic and for manual modification other reagents are preferred such as dimethylaminoazobenzene isothiocyanate (Chang, 1983; Wang et al., 2000) or trifluoroethyl isothiocyanate (Bartlet-Jones et al., 1994). This last reagent has the advantage of being volatile, so that the excess of reagent is easily removed by vacuum before MS analysis (Spengler, 1997). This compound was used successfully during seven successive cycles of manual cleavage coupled with MS analysis. Nevertheless, the high reactivity of such compounds seems to shows artefactual modification such as “acetylation” of the hydroxyl groups of the Ser and Thr. Allyl isothiocyanate shows better selectivity, but again this reagent is toxic and difficult to use in manual approaches (Gu & Preswich, 1997) If such free amino group modification is originally used to identify the primary sequence of a peptide/protein, it helps to remove the ambiguity against Lys, Q and E residues with the MS approach (Brancia et al., 2001).
5.2.3. Reactions involving free carboxylic groups of peptides/proteins
Protein/peptide esterification is a well-known chemical modification of the carboxylic groups of the side chains of Glu and Asp as well as of the C-Ter of peptides/proteins (Bartlet-Jones et al., 1994; Fraenkel-Conrat & Olcott, 1945). Because the reaction is an equilibrium with slow kinetics, a catalyst and a large excess of alcohol must be used to obtain quantitative modification. The original “recipe” proposed by Fraenkel-Conrat was also the simplest: hydrochloric acid solution, the catalyst of the reaction, was added to anhydrous methanol to obtain a 0.1–0.2 M HCl solution. This reagent was mixed with the dry analyte but, because of the small amount of water introduced with the HCl solution, the reaction did not go to completion. An improvement of this method was to prepare the reagent by bubbling HCl gas directly into methanol (Chibnall, Mangan, & Rees, 1958). At 0°C the esterification reaction may take a few days (Wilcox, 1967). An alternative was to produce HCl directly in situ: acetyl chloride is added to dry methanol to produce HCl, with methyl acetate as a by-product (Hunt, Yates, Shabanowitz, Winston, & Hauer, 1986; Powell et al., 1995). A similar approach using thionyl chloride was proposed by Patterson et al. (1996). The advantage of this reagent is its ability to convert the carboxylic group of the substrate into an activated carbonyl (reaction intermediate) that reacts immediately with the “solvent” (methanol) to give the methyl ester. This modification allows quantification of the Glu and Asp residues contained in the peptide/protein. Considerable advantages of this method are its speed and its high conversion yield.
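As a rough illustration of how esterification can be used to count acidic residues, the sketch below computes the expected mass shift for methyl esterification. It assumes quantitative conversion, a free C-terminus, and a net gain of one CH2 unit (14.01565 Da, monoisotopic) per carboxylic group; the function names and the example peptide are illustrative, not taken from the cited work.

```python
# Monoisotopic mass gained when -COOH becomes -COOCH3 (net addition of CH2).
METHYL_ESTER_SHIFT = 14.01565


def carboxyl_groups(peptide: str) -> int:
    """Count esterifiable groups: Asp (D) and Glu (E) side chains plus the C-terminus."""
    return peptide.count("D") + peptide.count("E") + 1


def expected_shift(peptide: str) -> float:
    """Expected mass increase (Da) after complete methyl esterification."""
    return carboxyl_groups(peptide) * METHYL_ESTER_SHIFT


def groups_from_shift(observed_shift: float) -> int:
    """Estimate the number of esterified groups from an observed mass shift."""
    return round(observed_shift / METHYL_ESTER_SHIFT)


peptide = "VVLEGLRYLHTR"  # one Glu plus the C-terminus
print(carboxyl_groups(peptide), round(expected_shift(peptide), 4))
```

Comparing the observed shift with the value expected from a candidate sequence gives a simple consistency check on the number of Asp and Glu residues. With a different alcohol, the per-group shift would have to be replaced accordingly.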
Methanol catalysed with hydrochloric acid is usually used for such treatment (Acharya, Maanjula, Murthy, & Vithayathil, 1977; Bartlet-Jones et al., 1994; Chibnall et al., 1958; Fraenkel-Conrat & Olcott, 1945; Hunt et al., 1986; Wilcox, 1967), but other alcohols can be used, e.g. ethanol (Nutkins & Williams, 1989), 2-propanol, 1-propanol, 1-hexanol, 1-octanol, or benzyl alcohol (Falick & Maltby, 1989). Other reagents have also been used for this type of chemical modification, such as diazoacetic acid (Riehm & Scheraga, 1965) to esterify the free carboxylic groups of a protein, and glycinamide (Akashi et al., 1997) to produce amide-converted groups quantifiable by MS. Another important reaction is C-Ter chemical degradation. The idea was to provide an alternative for chemical sequencing when the N-Ter is blocked and Edman degradation cannot be used. This development is actually older than Edman chemistry: it was proposed by Schlack & Kumpf (1926), using a reaction described by Johnson & Nicolet (1911). Reaction conditions are quite harsh, and cleavage after Asp and Pro is not always successful. Due to the similarity of the approaches for C-Ter and N-Ter chemical sequencing, automation of both techniques was developed simultaneously
and on similar apparatus (Inglis, 1991; Wittmann-Liebold, Matscull, Pilling, Bradaczek, & Graffunder, 1991). More recent improvements allow reliable analysis of Pro-containing peptides (Bailey, Tu, Issai, Ha, & Shively, 1994). This protein/peptide sequencing technique, an alternative to Edman degradation, is nevertheless infrequently used because improved MS techniques provide an easier way to characterise proteins. As an example, acidic hydrolysis allows recurrent degradation of the C-Ter residue (Takamoto, Kamo, Kubota, Satake, & Tsugita, 1995; Tsugita et al., 1992). Usually, the products of all of the degradation steps are present in the final mixture. The MALDI approach is then of interest for a previously purified peptide, since the AA sequence is directly accessible (Thiede, Salnikow, & Wittmann-Liebold, 1997). Nevertheless, these techniques are not much used because of:
- The lack of quantitative chemical cleavage, especially in C-Ter cleavage,
- The large amount of material needed for such an approach,
- The availability of a robust enzymatic approach (see section 5.3.1).
5.2.4. Labile hydrogen atom exchange with deuterium atoms
Hydrogen/deuterium (H/D) exchange is frequently used in biochemistry and is applied to characterise protein/protein interactions (Jones, Stott, Howard, & Perham, 2000), protein conformation changes (Katta & Chait, 1993; Villanueva, Canals, Villegas, Querol, & Avilés, 2000; Wang & Tang, 1996) and secondary structure of proteins (Kraus, Janck, Bienert, & Krause, 2000; Zhang & Chait, 2000). H/D exchange is mostly driven by the steric conformation of the protein (Engen & Smith, 2001). Protein analysis using NMR approaches after H/D exchange highlights secondary structures such as β-sheets (Zhang & Chait, 2000) or α-helices (Bhattacharjya & Balaram, 1997). This type of information, coupled to MS data analysis, is helpful in reconstructing three-dimensional structures of proteins. Spengler et al. (Spengler, Lutzenkirchen, & Kaufmann, 1993) proposed the use of H/D exchange not for 3-D protein characterisation but to help in MALDI-PSD-MS spectrum interpretation. Sepetov et al. (1993) developed the same method but applied it to ESI-MS. In either case, every residue except the prolyl residue, which has no labile proton, can exchange a limited number of hydrogen atoms (Table 10). It is then possible to define the theoretical number of H/D exchanges and compare it to the experimental value. It is essential that all labile protons be exchanged (Figueroa et al., 1998). Comparison of the PSD spectra before and after treatment facilitates spectrum interpretation, since H/D modification is residue-dependent (Chaurand, Luetzenkirchen, & Spengler, 1999; James, Quadroni, Carafoli, & Gonnet, 1994). A similar technique was used in ESI-based analysis (Sepetov et al., 1993) and also to help in peptide fragment validation. The use of these techniques in a validation procedure applied to PMF protein identification was
developed by Hochstrasser’s group (Bienvenut, Hoogland et al., 2002), and such chemical treatment was conducted with success on unknown proteins in gel digests. It facilitates protein identification through validation of the primary sequence, especially for low- to medium-MW proteins. This article is reproduced in full in Chapter 5.
Table 10. Exchangeable protons as a function of the AA residue. MSO, C-PAM and C-CAM correspond respectively to Met sulfoxide and the acrylamide or iodoacetamide alkylation product of the cysteine.
One-letter AA code          Exchangeable proton(s)
P                           0
F, G, I, L, M (MSO), V      1
C, D, E, S, T, W, Y         2
K, Q, N, C-CAM, C-PAM       3
–                           4
R                           5
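The theoretical number of exchangeable protons for a peptide can be obtained by summing the per-residue values of Table 10. The sketch below is a minimal illustration. The lower-case codes "m", "c" and "p" are hypothetical stand-ins for Met sulfoxide, C-CAM and C-PAM. Contributions from the peptide termini and from the ionising proton are not given in the table, so they are left to the caller through the `terminal_protons` parameter, which is an assumption of this sketch. Residues absent from the table raise an error instead of being guessed.

```python
# Exchangeable protons per residue, transcribed from Table 10.
# "m", "c" and "p" are hypothetical codes for MSO, C-CAM and C-PAM.
EXCHANGEABLE = {}
for residues, count in (
    ("P", 0),
    ("FGILMmV", 1),
    ("CDESTWY", 2),
    ("KQNcp", 3),
    ("R", 5),
):
    for aa in residues:
        EXCHANGEABLE[aa] = count


def exchangeable_protons(peptide: str, terminal_protons: int = 0) -> int:
    """Sum the Table 10 values over a peptide sequence.

    terminal_protons lets the caller add any contribution from the termini
    or the charge carrier; the default of 0 is a placeholder, not a
    recommendation.
    """
    missing = sorted({aa for aa in peptide if aa not in EXCHANGEABLE})
    if missing:
        raise KeyError(f"residues not listed in Table 10: {missing}")
    return sum(EXCHANGEABLE[aa] for aa in peptide) + terminal_protons


print(exchangeable_protons("VVLEGLR"))
```

The value computed this way can then be compared with the mass increase observed after deuteration, which is the comparison described in the text.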
5.2.5 Cysteine alkylation
Because of their sulfhydryl group, cysteines are reactive AAs (two cysteines linked by a disulfide bridge form a cystine residue). Usually, proteins are denatured during gel migration to prevent any disulfide bonds reforming during and/or after migration. Cysteine alkylation blocks the reactivity of such groups. Some common chemicals available for this purpose include:
- Iodoacetamide (Galvani, Hamdan, Herbert, & Righetti, 2001),
- Acrylamide (Gehanne et al., 2002; Sechi, 2002; Sechi & Chait, 1998),
- Iodoacetic acid (Sok & Sih, 2001),
- 4-Vinylpyridine (Ortiz & Bubis, 2001).
Gygi et al. (1999) proposed a new method for the relative quantitation of all proteins contained in a complex sample. The idea is to compare the MS intensity of cysteine-containing peptides against reference samples. They proposed two alkylating agents, called Isotope-Coded Affinity Tag (ICAT) labels (Griffin, Goodlett, & Aebersold, 2001), in a light (hydrogen atoms) and a heavy (deuterium atoms) isotope version. Each sample is alkylated separately with one of the reagents, and the samples are then mixed together for further treatment. Thanks to the affinity tag (biotin group), the cysteine-containing peptides are easily extracted using an avidin column, which greatly reduces sample complexity so that the extract can be used directly for multidimensional LC-MS/MS (Link et al., 1999). However, the avidin/biotin complex could be one reason why part of the sample is lost, due to the strong affinity between biotin and avidin, as well as for artefactual peaks/fragments
found during CID analysis and corresponding to the biotin tag. Qiu et al. (Qiu, Sousa, Hewick, & Wang, 2002) proposed a new version of the ICAT label called “Acid-Labile Isotope-Coded Extractants” (ALICE). The idea of this system is to cleave the biotin tag to liberate cysteine-containing peptides for multidimensional chromatography or any further analysis. Some groups (Chakraborty & Regnier, 2002) developed other reagents based on peptide labelling using stable isotopes.
5.2.6. Peptide modification using charged modifications
So far, peptide modifications have been conducted with neutral reagents, but it is also possible to use positively or negatively charged reagents to modify polypeptide properties. These modifications are usually used to improve fragmentation, especially for positively charged modifications, and to decrease the signal of the C-Ter or N-Ter ion series, especially for negatively charged modifications. Such modifications help especially with daughter fragment recognition in de-novo sequencing.
5.2.6.1 Positively charged modifications
Anchoring a positively charged group on one end of a peptide has certain advantages (Roth, Huang, Sadagopan, & Watson, 1998):
- Stabilising the positive charge at a fixed place,
- Favouring some fragmentations due to the charge position.
Two main classes of reagents are used with similar experimental approaches:
- Quaternary ammonium (Bartlet-Jones et al., 1994; Spengler et al., 1997; Stults, 1992; Stults, Lai, McCune, & Wetzel, 1993),
- Quaternary phosphonium (Huang, Shen, Wu, Gage, & Watson, 1999; Liao, Huang, & Allison, 1997; Shen & Allison, 2000).
These reagents react with the free amino groups to produce an acetylated peptide through an activated carbonyl carried by the charge tag (Figure 18). The modified peptides exhibit a strong positive charge at the N-Ter, and fragmentation is directed towards the aₙ and bₙ ion series (Huang et al., 1997; Stults et al., 1993). The positive charge also increases the absolute intensity of the signal 2- to 5-fold during MS analysis (Stults et al., 1993). While these modifications were mostly developed for the ESI-MS/MS approach, Bartlet-Jones et al. (1994) used such modification for MALDI-PSD-MS. In that case, Lys was pre-treated with o-methylisourea to protect the ε-amino group of Lys and obtain a single modification (or stable positive charge) per peptide. Again, the PSD-MS spectrum then shows mainly the aₙ and bₙ ion series (Spengler et al., 1997).
Figure 18. (2,4,6-Trimethoxyphenyl) phosphonium acetylation of peptides (Huang et al., 1999)
5.2.6.2 Negatively charged modifications
As previously described, a positively charged tag can be added to the peptide sequence to improve fragmentation. It is also possible to modify peptides with negatively charged reagents. In that case, the aₙ, bₙ and cₙ ion series are suppressed by the negative charge, whereas the xₙ, yₙ and zₙ ion series are not. Reagents proposed for this modification are usually bifunctional organic molecules carrying on one side an activated carbonyl able to react with the free amino groups and on the other side an acid group (Keough, Youngquist, & Lacey, 1999b). Two of the best reagents are sulfobenzoic acid and chlorosulfonylacetyl chloride (Keough et al., 1999b). With such modification the MALDI-PSD MS spectrum shows a complete yₙ ion series (Figure 19), and similar results were obtained in ESI-MS/MS (Bauer, Sun, Keough, & Lacey, 2000; Keough, Lacey, Fieno et al., 2000). These simplified spectra allow de-novo sequencing of unknown peptides with the PSD approach (Keough, Lacey, Fieno et al., 2000; Keough et al., 2000; Keough et al., 1999b). C-Ter derivatisation was also studied with the use of 4-aminonaphthalene sulfonic acid (Lindh, Griffiths, Bergman, & Sjövall, 1994). This reaction is not very helpful, since the conversion yield is sequence dependent, ranging from 10% to 100%, and artefactual reactions are frequent on the carboxylic side chains. Furthermore, this modification was tested on large amounts of peptide (10–100 nmol) and is difficult to apply to gel-separated proteins. The main advantage of such modification is the improvement of negative mode detection, which is increased 5- to 50-fold (Griffiths, Lindh, Bergman, & Sjövall, 1995).
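A spectrum reduced to a complete y-ion series can be read almost directly as a sequence, because successive y ions differ by one residue mass. The following sketch illustrates this under stated assumptions: standard monoisotopic residue masses (not given in the text), singly charged ions, and a hypothetical example peptide. Since y ions do not contain the N-terminus, an N-Ter tag does not shift them. The function names are illustrative.

```python
# Standard monoisotopic residue masses (Da); values are an assumption of this
# sketch, not taken from the text.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def y_ions(peptide: str) -> list:
    """Singly charged y-ion m/z values, from y1 (C-terminal residue) upwards."""
    ions, running = [], WATER + PROTON
    for aa in reversed(peptide):
        running += MONO[aa]
        ions.append(round(running, 4))
    return ions


def read_sequence(ions: list, tol: float = 0.01) -> list:
    """Assign residues from successive y-ion differences (C- to N-terminus).

    Isobaric residues are reported together (e.g. "I/L"); an unmatched
    difference is reported as "?".
    """
    out, prev = [], WATER + PROTON
    for mz in ions:
        delta = mz - prev
        hits = sorted(aa for aa, m in MONO.items() if abs(m - delta) <= tol)
        out.append("/".join(hits) if hits else "?")
        prev = mz
    return out


series = y_ions("VVLEGLR")  # hypothetical example peptide
print(series)
print(read_sequence(series))
```

The tolerance controls which ambiguities remain: at 0.01 Da, Lys and Gln are separated but Leu and Ile are not, since they have identical masses.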
Figure 19. MALDI-PSD MS spectrum, acquired in positive mode, of a modified peptide carrying a sulfonate tag at the N-Ter (Reproduced with permission of the National Academy of Sciences, (Keough et al., 1999b), ©1999)
5.2.6.3. Conclusion
The use of covalently immobilised charged tags is sometimes helpful, mostly for de-novo protein sequence identification, but the technique is difficult to apply to a protein digest. Above all, these chemical reactions are usually conducted in non-volatile buffers, so a purification step must be included in each of these modifications. The simplest way would be a ZipTip™ approach, but a chromatographic approach will also decrease the mixture complexity for further analyses. As a result of all these steps, part of the material is lost, and 100 fmol is generally the minimum amount of material needed for a successful procedure.
5.2.7. Stable isotope labelling during the digestion
Through the endopeptidase activity, the stable isotope 18O can be incorporated into the newly generated carboxylic group during the cleavage step. The Mr of the labelled peptide increases by two units compared with the theoretical isotopic distribution. Usually, the digestion is conducted in a 1:1 mixture of 16O and 18O water as the solvent for the endoproteinase and the buffer. During the hydrolysis of the peptide bond, half of the carboxylic groups receive 16O whereas the other half receive 18O. This labelling procedure is directly linked to the hydrolysis process and can be completed in a few hours, i.e. during the digestion step. The MS signal is then characteristic of that type of modification (C-Ter fragment), e.g. peak 413.3 Da in Figure 20 by comparison to the usual peak of 1016.2 Da.
Figure 20. Sequencing of a myosin I heavy-chain kinase tryptic peptide (VVLEGLRYLHTR) digested in solution with 18O labelling. The LC-MS/MS spectrum shows data obtained from a +2 charged peptide with m/z 714.8. The protein was digested in solution with 50% 18O-labelled water. The insets show the isotope distribution profiles for y and b type ions. A proper isotope distribution profile is maintained in the centroid mode, allowing the identification of y and b type ions. The y and b type ions are denoted with ● and X, respectively. Ions that are generated from loss of H2O/NH3 are marked with *. (Reprinted by permission of Wiley-VCH Verlag GmbH, (Qin, Herring, & Zhang, 1998), ©1998)
Pure 18O-labelled water can be used for the digestion step, but in that case the esterase activity of most serine proteases will exchange a second oxygen atom on the carboxylic group (Bender & Kemp, 1957; Rose et al., 1991; Schnolzer, Jedrzejewski, & Lehmann, 1996). This modification takes much more time, and usually at least 48 hours are required for a correct labelling procedure. Due to the large excess of 18O, the final modification of the peptide is a 4-Da mass increase compared with the theoretical value. This type of chemical modification has been used for protein quantitation (absolute (Mirgorodskaya et al., 2000) or relative (Yao, Freas, Ramirez, Demirev, & Fenselau, 2001)), but the reproducibility of its yield is low (Mirgorodskaya et al., 2000; Yao et al., 2001).
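The 2-Da and 4-Da shifts described above translate into smaller m/z spacings for multiply charged ions. The short sketch below computes the expected m/z of a labelled peptide. The 18O/16O mass difference (2.00425 Da) is a standard value rather than one quoted in the text, the function name is illustrative, and the example uses the +2 ion at m/z 714.8 from the Figure 20 caption.

```python
# Mass difference between 18O and 16O (Da); a standard value, assumed here.
O18_SHIFT = 2.00425


def labelled_mz(mz_unlabelled: float, charge: int, n_18o: int) -> float:
    """m/z of an ion after incorporation of n_18o atoms of 18O.

    n_18o = 1 corresponds to labelling during hydrolysis,
    n_18o = 2 to the additional exchange seen in pure 18O water.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz_unlabelled + n_18o * O18_SHIFT / charge


for n in (0, 1, 2):
    print(n, round(labelled_mz(714.8, 2, n), 3))
```

For a doubly charged ion, the labelled and unlabelled forms are therefore about 1 m/z unit apart per 18O atom, which is the spacing to look for in the isotope profile.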
5.2.8. Conclusion
Some of the chemical modifications of peptides described (summarized in Table 11) provide information on the primary peptide sequence. Such information is helpful in the process of protein identification, especially for PMF protein identification or de-novo sequencing. Nevertheless, these modifications
generally require a purification step and/or a decrease in sample complexity. These problems do not favour automated sample treatment, which restricts implementation to a few important samples.
5.3. Biochemical approach
Protein/peptide modification can be chemically based; an example is recurrent peptide hydrolysis in acidic media. Similar peptide modification can be obtained using biochemical approaches. Carboxypeptidases can produce results similar to C-Ter recurrent hydrolysis, and N-Ter recurrent hydrolysis is also available with aminopeptidases. Several carboxypeptidases are commercially available. One of these is carboxypeptidase A, whose principal AA targets are aromatic residues as well as branched residues such as Ile. Carboxypeptidase B is an enzyme with a specificity mostly for basic residues such as Lys or Arg (Folk, Piez, Carroll, & Gladner, 1960), more especially Arg (Tan & Eaton, 1995). Carboxypeptidase C is a generic term covering enzymes that show their highest activity against hydrophobic residues. Carboxypeptidase Y is one of the components of this mixture and is able to cleave after prolyl residues. Carboxypeptidase I and carboxypeptidase MI/MIII are also among the main components. Carboxypeptidase P is much more orientated towards prolyl residues (Dehm & Nordwig, 1970). In any case, the preferred cleavage sites of carboxypeptidase A and/or B are mostly related to the kinetics of the hydrolysis reaction and not to the ability to obtain a cleavage. As an example, these enzymes are able to cleave bonds involving Leu, Ile, Asn, Gly or Glu, but these take longer (Nishihira et al., 1995; Villegas, Vendrell, & Avilés, 1995). Because of this, it is almost impossible to obtain real recursive hydrolysis with a single enzyme in a single sample. Enzyme cocktails containing at least two enzymes are frequently used to achieve this. The most common mixtures are:
- Carboxypeptidase Y and P, to cleave all of the hydrophobic residues at the C-terminal side of the peptide (Bonetto et al., 1997; Patterson, Tarr, Regnier, & Martin, 1995; Thiede et al., 1995),
- Carboxypeptidase B with a second enzyme, to preferentially cleave tryptic peptides, since the first residue in such material is a basic residue and may limit further internal hydrolysis (Déon et al., 1997),
- A carboxypeptidase used alone in some cases (Wang et al., 2000), with the risk that the reaction stops if there is a prolyl residue in the sequence.
K ε-amino group
All (except P) C
C
C
D, E and C-termini N-termini
Homoarginyl conversion (MCAT)
Labile H
ALICE
Acrylamide labelling
Esterification
Nicotinic acid derivatives
ICAT
AA K and C-Terr
Chem. modif. ε-Amino group acetylation
4
Few
3
7
8
1–5
42
Few
ΔMr
+
+
+
+
+
–
-
Q –/ (±)
ML ML
ML ML ML
–
–
Ac. L
L
L
L
H
H
L
M
Th. L
Less expensive than ICAT or ALICE Simplify MS/MS data with only bn ions
Less expensive than ICAT or ALICE
Decrease mixture complexity
Decrease mixture complexity
Validate first AA sequence
Advantages Suppression of the ambiguity between K and Q, potential quantification Better ionisation for K C-Ter peptides,
Low MW difference, high computation needed, potentially incomplete conversion Potential side effect on highly modified molecules (D9) Time consuming preparation, noncleavable K after treatment
Loss of information
Loss of information
Non-volatile reagents, not adapted to quantification despite a literature description (Cagney & Emili, 2002) Long process
Drawbacks Potential side effect of the acylating agent during ionisation
B
B/A
B
B
B
A
A
Mod. B/A
Table 11. Summary of the chemical modifications described in the text (Chem. modif.). ΔMr: mass variation induced by the modification; “Few” corresponds to a variable mass depending on the chemicals used for the modification. Q: quantitation; +: possible; –: not possible. Ac.: accuracy of the quantitation; Th.: throughput; M: medium; L: low; H: high. Mod.: period when the protein is labelled; B: before digestion; A: after digestion; D: during digestion; CG: cell growth.
18O inverse labelling Positively charged modification 15N, 13C or 2H enriched AA Negatively charged modification
Succinic anhydride
N-acetoxy succinimide N-alkylmaleimide
Free NH2 groups Free NH2 groups Free NH2 groups Tryptic CTer peptides Free NH2 groups Various AA Free NH2 groups – + –
Few Few
–
M
–
+
2 or 4 Few
+
+
3 or 5 4
ML ML ML H
+
3
M
M
M
L
L
L
L
Decrease an and bn fragments intensity (carrying negative charge)
Information concerning the C-Terr residue (K or R) Less expensive than ICAT or ALICE Less expensive than ICAT or ALICE Facilitate yn ions identification Increases charge stability on an and bn fragments CG A
Needs purification prior to analysis
A
D
Needs cultured cell lines
Esterase activity, high computer treatment for quantification Needs purification prior analysis
A
Theoretically, such enzymatic treatment can produce a recursive digestion of the peptide from C-Ter to N-Ter, but in practice the result is less clear-cut. It is usually better to use an isolated peptide in such an approach (Schar, Borsen, Gassmann, & Widmer, 1991; Thiede et al., 1997). Furthermore, enzyme selectivity and kinetics play important roles and impair some cleavages, which are then not quantitative; e.g. native or alkylated Cys residues partially impair the enzymatic cleavage. Moreover, this type of biochemical treatment is extremely sensitive to reaction time: to obtain good recovery of the AA sequence, the digestion time generally has to be adjusted to cover the whole sequence. The suppression effect related to peptide mixtures may also have an adverse influence on the spectrum (Kratzer et al., 1998). Carboxypeptidase B has also been used to remove a single residue at the C-Ter of tryptic peptides. In that case, the treatment completely changes the physico-chemical properties of the protein digest mixture (Bethancourt et al., 2001), mostly in terms of peptide charge and hydrophobicity. Aminopeptidases are also available but, due to the high specificity of Edman degradation, they are not frequently used. The most frequently mentioned is aminopeptidase M, which is active against all residues (Wang et al., 2000) and was used to characterise a peptide in a peptide/oligonucleotide complex.
5.4. In-vivo labelling
Metabolic labelling is not possible for all kinds of sample: it is limited to cell culture or bacterial growth. The idea is similar to the previously described in-vivo labelling procedure (see section 4.6.1.9). In this case, the goal is not to quantify the relative abundance of a protein but rather to determine the presence or absence of a specific residue in a peptide sequence, or the number of such residues in a small protein (below 30 kDa) (Ong et al., 2002; Veenstra et al., 2000).
This method is not frequently used because it is an in-vivo labelling procedure and, furthermore, stable isotope labelled reagents are expensive, which limits their use.
6. AUTOMATED APPROACH
The task of identifying and characterising all proteins expressed by a genome is tremendous (Hochstrasser, 1998; Williams & Hochstrasser, 1997). The word proteome has been coined to refer to the expressed protein complement of a genome (Wilkins et al., 1995). Several genomes have already been fully sequenced and many others will be in the near future. Although it is possible to extract from genome data a complete set of the potentially expressed protein or AA sequences, in many cases this information is not sufficient to unravel the function of a newly discovered gene product: the proteins also need to be identified and characterised (Williams & Hochstrasser, 1997).
Massively parallel protein identification and characterisation techniques are required. Several groups around the world have developed methods using liquid chromatography–mass spectrometry (LC-MS) to sequentially identify and partially characterise proteins from complex biological samples. MALDI-ToF techniques have been developed to analyse intact proteins or their peptide fingerprint profiles (Scheler et al., 1998). Several software programs have been developed to assist protein identification by comparison of mass spectra obtained from MS or MS/MS experiments with theoretical spectra from protein and DNA databases (Davis, Spahr et al., 2001; Spahr et al., 2001). It has also been demonstrated that a transblotted membrane can be scanned on a MALDI-ToF-MS equipped with an infrared laser to detect intact proteins, with a detection sensitivity equal to or better than that obtained by silver staining. Ogorzalek Loo et al. (1997a) analysed proteins directly from a polyacrylamide gel with good sensitivity and mass accuracy. Peptide mass fingerprinting (PMF), a method of choice in proteome studies, requires specific chemical or enzymatic digestion followed by MS of the resulting peptides. Up to now, the digestion step has been a sequential process in which robotics can be used for spot excision, such as the “spot picker” proposed by Traini et al. (1998b). Due to the large number of samples produced during 2-DE separation, manual procedures are not suited to high-throughput protein identification, e.g. of whole proteomes of cell lines or bacteria. Robotized approaches are now possible. The goal is mainly to decrease manual treatments of the sample such as:
- Protein spot cutting,
- In-gel protein digestion,
- Peptide extraction,
- Sample loading on the MALDI sample plate,
- MALDI-MS analysis,
- Signal treatment,
- Data treatment for protein identification,
- Validation of the protein identification.
To summarize, five main steps can be identified:
- Spot cutting: When conducted manually, this step is the origin of some contaminations, such as keratin contamination (Lopez, 2000; Parker et al., 1998); automation also allows more reproducible samples (Traini et al., 1998b; Walsh, Molloy, & Williams, 1998). The speed of a robot depends on the guidance system and can be quite low.
- Sample treatment: This step corresponds to the liquid handling treatment of the sample. Automation of this step greatly decreases repetitive work and the risk of sample contamination (Houthaeve, Gausepohl, Ashman, Nillson, & Mann, 1997; Houthaeve, Gausepohl, Mann, & Ashman, 1995). This step covers in-gel protein digestion, peptide extraction from the sample and sample loading onto the MALDI sample plate (Ashman et al., 1997; Houthaeve et al., 1997; Houthaeve et al., 1995). Some robots are available commercially, but there is no single fully automated robot that can process all of the liquid handling treatment from gel plug destaining to digest sample loading onto the MALDI sample plate. Usually, two robots are used, or separate steps on a single robot: at the least, one for the gel plug treatment from the destaining steps to protein digest extraction, and a second for protein digest loading, with or without a desalting step, onto the MALDI sample plate.
- MALDI-MS analysis: This step is fully automated and corresponds to the acquisition of spectra of the submitted samples.
- Data treatment: On-line data treatment is now directly linked to the MALDI-MS analysis and decreases repetitive work and the risk of error. Nevertheless, such a solution is not always the most accurate, since the peptide mass list is the most important input for protein identification with PMF. The algorithm used for peak detection must harvest as many peaks as possible to clearly identify the protein without loss of information, but must not harvest so many masses as to introduce false positive masses and so decrease the identification confidence. Some of these algorithms use an average isotopic distribution to calculate the fit between experimental and theoretical patterns (Berndt et al., 1999) or a Poisson statistical law able to resolve complex mixtures (Breen, Hopwood, Williams, & Wilkins, 2000).
- Identification/validation: This last step usually requires an operator and is difficult to automate.
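The data treatment step can be illustrated with a minimal PMF matching sketch: a candidate sequence is digested in silico and the theoretical peptide masses are compared with the experimental peak list. It is a simplified illustration under stated assumptions, not the algorithm of any of the cited programs. It assumes trypsin specificity (cleavage after K or R, not before P), no missed cleavages, no modifications, singly protonated monoisotopic masses, and standard residue masses. The sequence and peak list are hypothetical. Python 3.7 or later is assumed, because the zero-width split relies on it.

```python
import re

# Standard monoisotopic residue masses (Da); assumed values, not from the text.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(protein: str, min_len: int = 5) -> list:
    """Cleave after K or R unless followed by P; no missed cleavages."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if len(p) >= min_len]


def mh_plus(peptide: str) -> float:
    """Monoisotopic mass of the singly protonated peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def match_peaks(peaks: list, protein: str, ppm: float = 50.0) -> list:
    """Return (peptide, theoretical m/z, observed m/z) for peaks within tolerance."""
    hits = []
    for pep in tryptic_peptides(protein):
        theo = mh_plus(pep)
        for mz in peaks:
            if abs(mz - theo) / theo * 1e6 <= ppm:
                hits.append((pep, round(theo, 4), mz))
    return hits


# Hypothetical candidate sequence and peak list, for illustration only.
candidate = "VVLEGLRYLHTRGASPK"
observed = [785.49, 689.37, 1000.0]
for hit in match_peaks(observed, candidate):
    print(hit)
```

The tolerance illustrates the trade-off described above: a wide window or a long peak list produces more matches but also more false positives, so the number of matched masses alone is a weak score.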
Robotized procedures increase the speed of large-scale protein identification, but the validation step, which is crucial before publication of the results, is still the bottleneck of such techniques. Another technique for high-throughput protein identification, proposed by Prof. Hochstrasser’s group (Bienvenut et al., 1999; Binz, Muller et al., 1999), is the “molecular scanner” process, based on MS imaging of protein digests. These workers wished to investigate whether it was possible to digest all separated proteins on a two-dimensional electrophoresis (2-DE) gel simultaneously, and whether it was possible to transfer all resulting protein fragments to a membrane without loss of spatial resolution and subsequent mass spectrometric sensitivity. In 1991, Prof. Hochstrasser (Hochstrasser et al., 1991) proposed the concept of the molecular scanner applied to the medical sciences. At that stage, the idea was essentially to make use of the emerging computing tools, especially for the comparison of two different samples after 2-DE separation. A tool called “Melanie” (Appel et al., 1988) was developed in order to analyse proteomic maps and
highlight the main differences in protein expression. The second step was to integrate into that system new functionalities able to identify proteins. Using a parallel process, all proteins of a 2-DE gel are simultaneously digested proteolytically and electro-transferred onto a PVDF membrane (Bienvenut et al., 1999). MALDI-ToF-MS scanning of the membrane would then provide a massively parallel way to rapidly and partially characterise thousands of proteins with an appropriate integrated software system (Hochstrasser, 1998) able to treat mass spectra and to create a fully annotated image. After automated protein identification from the obtained peptide mass fingerprints using SmartIdent software (Binz, Muller et al., 1999; Gras et al., 1999; Muller, Gras, Appel, Bienvenut, & Hochstrasser, 2002), a fully annotated 2-D map is created online. This is a multi-dimensional representation of a proteome, which contains interpreted PMF data in addition to protein identification results. This highly automated “MS-imaging” method can create a fully annotated 2-D map starting from a 2-DE gel. The correlation between neighbouring spectra is used to recalibrate the peptide masses (Muller et al., 2002) and also to validate the identified proteins. Visualisation of all peptide mass fingerprint data revealed that some masses are localized in spots, whereas other masses, especially in the lower mass region, spread out over the entire membrane. These latter masses were attributed to chemical noise and were discarded from the mass fingerprints. If only isolated spectra were available, the identification of chemical noise masses would be very difficult, and these masses could disturb the PMF identification. Since the membrane was slightly warped after sample plate preparation, the overall calibration of the spectra was poor. A few master spectra that permitted very clear PMF identifications could be used as calibrating files, using matched peptide masses as internal standards.
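Recalibration with internal standards can be sketched as a simple fit between measured and theoretical masses of the matched peptides. The example below uses an ordinary least-squares straight line. This linear model is an assumption made for illustration, since the text does not specify the calibration function used, and all names and numbers are hypothetical.

```python
def fit_calibration(measured: list, theoretical: list) -> tuple:
    """Least-squares line: theoretical ~ a * measured + b."""
    if len(measured) != len(theoretical) or len(measured) < 2:
        raise ValueError("need at least two matched (measured, theoretical) pairs")
    n = len(measured)
    mean_x = sum(measured) / n
    mean_y = sum(theoretical) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, theoretical))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def recalibrate(masses: list, a: float, b: float) -> list:
    """Apply the fitted correction to a list of measured masses."""
    return [a * m + b for m in masses]


# Hypothetical master spectrum: measured masses run 500 ppm too high.
theoretical = [1000.0, 1500.0, 2000.0]
measured = [1000.5, 1500.75, 2001.0]
a, b = fit_calibration(measured, theoretical)
print([round(m, 3) for m in recalibrate([1200.6, 1800.9], a, b)])
```

In this spirit, the coefficients obtained from a clearly identified master spectrum could be applied to other spectra, provided the mass error behaves similarly across them.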
The calibration of the remaining spectra is strongly improved by using the correlation between neighbouring spectra. By selecting the masses detected on a contiguous, but limited, region of the membrane, the noise in the data is reduced. The distributions of the peptide signal intensities of these masses seem to reflect the concentration of the proteins they stemmed from. Masses with similar peptide signal intensity distributions were put together in clusters, which allowed separation of masses that stemmed from overlapping proteins. Twenty different clusters were obtained in this way and were submitted to the PMF identification program, which provided clear identifications for 13 of them. Finally, a method that clusters peptide masses according to the similarity of the spatial distributions of their signal intensities finalized the image reconstruction and protein spot definition. This method allows many of the false positives that usually go along with PMF identifications to be discarded and allows identification of many weakly expressed proteins present in the gel (Muller et al., 2002). These are only some of the applications that are possible with molecular scanner data. The researchers are currently working on a new PMF identification scoring method that automatically takes into account the two-dimensional aspect of
the data. A very intriguing prospect for future development comes from a new generation of mass spectrometers such as MALDI-ToF/ToF and MALDI-QqToF machines, which could combine the MALDI scanning technique with MS/MS identification. The mass grouping method could then be used, after a first MS scan, to efficiently select parent masses for subsequent fragmentation analysis. This provides the technological basis for the development of a clinical molecular scanner, which will be adapted and dedicated to medical diagnostics (Hochstrasser et al., 1991).
6. REFERENCES
Acharya, A., Maanjula, B., Murthy, G., & Vithayathil, P. (1977). Int J Peptide Protein Res, 9, 213-219. Achaz, G., Coissac, E., Viari, A., & Netter, P. (2000). Mol Biol Evol, 17, 1268-1275. Adams, M., Celniker, S., Holt, R., Venter, J., et al. (2000). Science, 287, 2185-2191. Aebersold, R., Teplow, D., Hood, L., & Kent, S. (1986). Electroblotting onto activated glass: High efficiency preparation of proteins from analytical SDS-PAGE for direct sequence analysis. J. Biol Chem, 261(9), 4229-4238. Akashi, S., Shirouzu, M., Terada, T., Ito, Y., Yokoyama, S., & Takio, K. (1997). Anal. Biochem., 248, 15-25. Alba, F., & Daban, J. (1998). Electrophoresis, 19, 2407-2411. Aleksandrov, M., Gall, L., Krasnov, V., Nikolae, V., Pavlenko, V., Shkurov, V., et al. (1984). Bioorg. Khim., 10, 710. Alvarez, E., Larsen, B., Coldren, C., & Rice, J. (2000). Effects of residual acrylamide monomers from two-dimensional gels on matrix-assisted laser desorption/ionisation peptide mass mapping experiments. Rapid Commun. Mass Spectrom., 14, 974-978. Amini, A., Dormady, S., Riggs, L., & Regnier, F. (2000). The impact of buffers and surfactants from micellar electrokinetic chromatography on matrix-assisted laser desorption/ionisation (MALDI) mass spectrometry of peptides. Effect of buffer type and concentration on mass determination by MALDI-time-of-flight mass spectrometry. J. Chromatogr. A, 894, 345-355. Andersen, J., & Mann, M. 
(2000). Functional genomics by mass spectrometry. FEBS Lett., 480, 25-31. Appel, R., Hochstrasser, D., Roch, C., Funk, M., Muller, A., & Pellegrini, C. (1988). Electrophoresis, 9, 136-142. Asara, J., & Allison, J. (1999). Enhanced detection of phosphopeptides in MALDI-MS using ammonium salts. J. Am. Soc. Mass Spectrom., 10, 34-44. Ashman, K., Houthaeve, T., Clayton, J., Wilm, M., Podtelejnikov, A., & Jensen, O. (1997). Lett. Pept. Sci., 4, 57-65. Axelsson, J., Naven, T., & Fenyo, D. (2001, 4-6 April). Paper presented at From biology to pathology: the proteomics perspective, York, UK. Bahr, U., Deppe, A., Karas, M., Hillenkamp, F., & Giessmann, U. (1992). Anal. Chem., 64, 2866-2869. Bai, J., Liang, X., Liu, Y., Zhu, Y., & Lubman, D. (1996). Rapid Commun. Mass Spectrom., 10, 839-844. Bailey, J., Tu, O., Issai, G., Ha, A., & Shively, J. (1994). Anal. Biochem., 224, 588-596. Bairoch, A., & Apweiler, R. (2000). The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45-48. Baker, C., Dunn, M., & Yacoub, M. (1991). Evaluation of membranes used for electroblotting of proteins for direct automated microsequencing. Electrophoresis, 12, 342-348. Bartlet-Jones, M., Jeffrey, W., Hansen, H., & Pappin, D. (1994). Peptide ladder sequencing by MS using a novel volatile degradation reagent. Rapid Commun. Mass Spectrom., 8, 737-742. Bauer, M., Sun, Y., Keough, T., & Lacey, M. (2000). Sequencing of sulfonic acid derivatized peptides by electrospray MS. Rapid Commun. Mass Spectrom., 14, 924-929. Beardsley, R., Karty, A., & Reilly, J. (2000). Enhancing the intensities of lysine-terminated tryptic peptide ions in matrix-assisted laser desorption/ionisation mass spectrometry. Rapid Commun. Mass Spectrom., 14, 2147-2153.
Beavis, J., & Bridson, J. (1993). Epitaxial protein inclusion in sinaptic acid cristal. J. Phys. D.: Appl. Phys., 26, 442-447. Beavis, R., & Chait, B. (1989). Rapid Commun. Mass Spectrom., 3, 432-435. Beavis, R., & Chait, B. (1990). Rapid, sensitive analysis of protein mixtures by mass spectrometry. Proc. Natl. Acad. Sci. USA, 87, 6873-6877. Beavis, R., Chaudhary, T., & Chait, B. (1992). a-CHCA as a matrix for MALDI. Org mass spectrosm, 27, 156-158. Bender, M., & Kemp, K. (1957). The kinetics of the -chymo-trypsin-catalyzed oxygen exchange of carboxylic acids. J. Am. Chem. Soc., 79, 116-. Benner, W., Barfknecht, A., Frank, M., Horn, D., Jaklevic, J., Labov, S., et al. (1997). J. Am. Soc. Mass Spectrom., 8, 1094. Berggren, K., Steinberg, T., Lauber, W., Carroll, J., Lopez, M., Chernokalskaya, E., et al. (1999). A luminescent ruthenium complex for ultrasensitive detection of proteins immobilized on membrane supports. Anal Biochem, 276(2), 129-143. Bergmann, M., & Fruton, J. (1941). Adv. Enzymol., 1, 63-98. Berkenkamp, S., Karas, M., & Hillenkamp, F. (1996). Proc. Natl. Sci. USA, 93, 7003-7007. Berndt, P., Hobohm, U., & Langen, H. (1999). Reliable automatic protein identification from Matrixassisted laser desorption/ionisation mass spectrometric peptide fingerprints. Electrophoresis, 20, 3521-3526. Bethancourt, L., Besada, V., González, L., Morera, V., Padrón, G., Takao, T., et al. (2001). Selective isolation and identification of N-terminal blocked peptides from tryptic protein digests. J. Peptide Res., 57, 345-353. Bethancourt, L., Takao, T., Gonzalez, J., Reyes, O., Besada, V., Padron, G., et al. (1999). The metastable decomposition of a peptide containing oxydised methionines in MALDI-ToF-MS. Rapid Commun. Mass Spectrom., 13, 1075-1076. Beuhler, R., & Friedman, L. (1980). Nucl. Instrum. Methods, 170, 309-. Beuhler, R., & Friedman, L. (1983). J. Appl. Phys., 54, 4188. Bhattacharjya, S., & Balaram, P. (1997). Proteins, 29, 492-507. 
Bienvenut, W., Deon, C., Sanchez, J., & Hochstrasser, D. (2002). Enhanced protein recovery after electrotransfer using square wave alternating voltage. Anal Biochem, 307(2), 297-303. Bienvenut, W., Hoogland, C., Greco, A., Heller, M., Gasteiger, E., Appel, R., et al. (2002). Hydrogen/deuterium exchange for higher specificity of protein identification by peptide mass fingerprinting. Rapid Commun. Mass Spectrom., 16(6), 616-626. Bienvenut, W., Sanchez, J., Karmime, A., Rouge, V., Rose, K., Binz, P., et al. (1999). Toward a clinical molecular scanner for proteome research: parallel protein chemical processing before and during western blot. Anal Chem, 71(21), 4800-4807. Binz, P., Muller, M., Walther, D., Bienvenut, W., Gras, R., Hoogland, C., et al. (1999). A molecular scanner to automate proteomic research and to display proteome images. Anal Chem, 71(21), 49814988. Biriak, J., & Allen, M. (1998). Lewis acid mediatedd functionalisation of porous silicon with substitute alkenes and alkynes. J. Am. Chem. Soc, 120, 1339-1340. Birktoft, J., & Breddam, K. (1994). Methods Enzymol., 244, 114-126. Bjellqvist, B., Ek, P., Righetti, P., Gianazza, E., Gorg, A., Westermeir, R., et al. (1982). J. Biochem. Biophys., 6, 317-339. Blattner, F., Plunkett, G. r., Bloch, C., Perna, N., Burland, V., Riley, M., et al. (1997). Science, 277, 1453-1474. Bolt, M., & Mahoney, P. (1997). High-efficiency blotting of proteins of divers sizes following SDSPAGE. Anal. Biochem., 247, 185-192. Bonetto, V., Bergman, A., Jörnwall, H., & Sillard, R. (1997). C-Terminal sequence analysis of peptides and proteins using carboxypeptidases and mass spectrometry after derivatisation of lysine and cysteine residues. Anal. Chem., 69, 1315-1319. Booth, N., Cabera, B., & Fiorini, E. (1996). Ann. Rev. Nuc. Part. Sci., 49, 471. Bordini, E., Hamdan, M., & Righetti, P. (2000). Probing acrylamide alkylation sites in cysteine free proteins by MALDI-Tof. Rapid Commun. Mass Spectrom., 14, 840-848.
Borrebaeck, C., Ekström, S., Malmborg, H., AC, Nilsson, J., Laurell, T., & Marko-Varga, G. (2001). Bio/techniques, 30, 1126-1130. Bossi, A., Righetti, P., & Chiari, M. (1994). Electrophoresis, 15, 1112-1117. Braconnot, H. (1820). Ann. Chim. Phys., 13, 113-125. Brancia, F., Butt, A., Beynon, R., Hubbard, S., Gaskell, S., & Oliver, S. (2001). A combination of chemical derivatisation and improved bioinformatic tools optimises protein identification for proteomics. Electrophoresis, 22, 552-559. Brancia, F., Oliver, S., & Gaskell, S. (2000). Improved matrix-assisted laser desorption/ionisation mass spectrometric analysis of tryptic hydrolysates of proteins following guanidination of lysinecontaining peptides. Rapid Commun. Mass Spectrom., 14, 2070-2073. Brancia, F. L., Butt, A., Beynon, R. J., Hubbard, S. J., Gaskell, S. J., & Oliver, S. G. (2001). A combination of chemical derivatisation and improved m bioinformatic tools optimises protein identification for proteomics. Electrophoresis, 22(3), 552-559. Brancia, F. L., Oliver, S. G., & Gaskell, S. J. (2000). Improved matrix-assisted laser desorption/ionization mass spectrometric analysis of tryptic hydrolysates of proteins following guanidination of lysinecontaining peptides. Rapid Commun Mass Spectrom, 14(21), 2070-2073. Brancia, F. L., Openshaw, M. E., & Kumashiro, S. (2002). Investigation of the electrospray response of lysine-, arginine-, and homoarginine-terminal peptide mixtures by liquid chromatography/mass spectrometry. Rapid Commun Mass Spectrom, 16(24), 2255-2259. Breaux, G., Green-Church, K., France, A., & Limbach, P. (2000). surfractant-aided, MALDI-MS of hydrophobic and hydrophylic peptides. Anal. Chem., 72, 1169-1174. Breen, E., Hopwood, F., Williams, K., & Wilkins, M. (2000). Automatic Poisson peak harvesting for high throughput protein identification. Electrophoresis, 21, 2243-2251. Berggren, K., Steinberg, T., Lauber, W., Carroll, J., Lopez, M., Chernokalskaya, E., et al. (1999). 
A luminescent ruthenium complex for ultrasensitive detection of proteins immobilized on membrane supports. Anal. Biochem., 276, 129-143. Brockman, A., Dodd, B., & Orlando, R. (1997). A desalting approach for MALDI-MS using on-probe hydrophobic self-assembled monolayers. Anal. Chem., 69, 4716-4720. Brockman, A., & Orlando, R. (1995). Probe-immobilized affinity chromatography/mass spectrometry. Anal. Chem., 67, 4581-4585. Brockman, A., & Orlando, R. (1996). New immobilization chemistry for probe affinity mass spectrometry. Rapid Commun. Mass Spectrom., 10, 1688-1692. Brockman, A., Shah, N., & Orlando, R. (1998). Optimization of a hydrophobic solid-phase extraction interface for MALDI. J. Mass Spectrom., 33, 1141-1147. Brockmann, A., Dodd, B., & Orlando, R. (1997). A desalting approach for MALDI-MS using on-probe hydrophobic self-assembled monolayers. Anal. Chem., 69, 4716-4720. Brunelle, A., Chaurand, P., Della-Negra, S., Le Beyec, Y., & Parillis, E. (1997). Rapid Commun. Mass Spectrom., 11, 353-362. Budnik, B., Jensen, K., Jorgensen, T., Haase, A., & Zubarev, R. (2000). Benefit of 2.94um infrared matrix-assited laser desorption/ionisation for analysis of labile molecules by fourier transform mass spectrometry. Rapid Commun. Mass Spectrom., 14, 578-894. Butt, A., Davison, M., Smith, G., Young, J., Gaskell, S., Oliver, S., et al. (2001). Chromatographic separation as a prelude to two-dimentional electrophoresis in proteomics analysis. Proteomics, 1, 4253. Cagney, G., & Emili, A. (2002). De novo peptide sequencing and quantitative profiling of complex protein mixtures using mass-coded abundance tagging. Nat Biotechnol, 20(2), 163-170. Cao, P., & Stults, J. (1999). J. Chromatogr. A, 853, 225-235. Causer, A., Baker, P., & Burlingame, A. (1999). Role of accurate mass measurement (+- 10 ppm) in protein identification strategies employing MS or MS/MS and database searching. Anal. Chem., 71, 2871-2882. Chakraborty, A., & Regnier, F. (2002). 
Global internal standard technology for comparative proteomics. J Chromatogr A, 949(1-2), 173-184. Chan, T., Colburn, A., Derrick, P., Gardiner, D., & Bowden, M. (1992). Org. Mass Spectrom., 27, 188. Chang, J. (1983). Methods Enzym., 91, 455-466.
Chaurand, P., Luetzenkirchen, F., & Spengler, B. (1999). Peptide and protein identification by MALDIPSD ToF-MS. J. Am. Soc. Mass Spectrom., 10, 91-103. Chaurand, P., Luetzenkirchen, F., & Spengler, B. (1999). Peptide and protein identification by matrixassisted laser desorption ionization (MALDI) and MALDI-post-source decay time-of-flight mass spectrometry. J Am Soc Mass Spectrom, 10(2), 91-103. Chaurand, P., Schwartz, S., & Caprioli, R. (2002). Imaging mass spectrometry: a new tool to investigate the spatial organization of peptides and proteins in mammalian tissue sections. Curr Opin Chem Biol., 6(5), 676-681. Chen, C., Walkes, A., Wu, Y., Timmons, R., & Kinsel, G. (1999). Influence of sample preparation methodology on the reduction of peptide MALDI-ion signal by surface peptide binding. J Mass Spectrom, 34, 1205-1207. Chibnall, A., Mangan, J., & Rees, M. (1958). Biochem. J., 68, 114- 118. Chomczynski, P., & Mackey, K. (1994). Anal. Biochem., 221, 303-305. Chrambach, A., Reisfeld, R., Wyckoff, M., & Zaccar, i. J. (1967). A procedure for a rapid and sensitive staining of proteinprotein fractionated by polyacrylamide gel electrophoresis. Anal. Biochem., 20, 150-154. Christiansen, J., & Houen, G. (1992). Electrophoresis, 13, 179-183. Clauser, K., Baker, P., & Burlingame, A. (1999). Anal. Chem., 71, 2076-2084. Clauser, K., Hall, S., Smith, D., Webb, J., Andrews, L., Tran, H., et al. (1995). Rapid mass spectrometric peptide sequencing and mass matching for characterization of human melanoma proteins isolated by two-dimensional PAGE. Proc. Natl. Acad. Sci. USA, 92(11), 5072-5076. Cohen, S., & Chait, B. (1997). Mass spectrometry of whole proteins eluted from SDS-PAGE gels. Anal. Biochem., 247, 257-267. Cossio, G., Sanchez, J., Wettstein, R., & Hochstrasser, D. (1997). Spermatocytes and round spermatids of rat testis: the difference between in vivo and in vitro protein patters. Electrophoresis, 18, 548-552. Courchesne, L., Luethy, R., & Patterson, S. (1997). 
Comparison of in-gel and on-membrane digestion methods at low to sub-pmol level for subsequent peptide and fragment-ion mass analysis using matrix-assisted laser-desorption ionization mass spectrometry. Electrophoresis, 18(3-4), 369-381. Courchesne, L., & Patterson, S. (1997). Manual microcolumn chromatography for sample clean-up before mass spectrometry. BioTechniques, 22(2), 244-250. Courchesne, P., Luethy, R., & Patterson, S. (1997). Comparison of in gel and on membrane digestion methods at low to sub-pmol level for subsequent peptide and fragmentation mass analysis using MALDI-MS. Electrophoresis, 18, 369-381. Craik, C., Largman, C., Fletcher, T., Barr, P., Fletterick, R., & Rutter, W. (1985). Science, 228, 291-297. Cramer, R., & Corless, S. (2001). The nature off collision-induced dissociation processes of doubly protonated peptides: comparative study for the future use of matrix-assisted laser desorption/ionization on a hybrid quadrupole time-of-flight mass spectrometer in proteomics. Rapid Commun Mass Spectrom, 15(22), 2058-2066. Cramer, R., Hillenkamp, F., & Haglund, R. (1996). IR-MALDI by using a tunable mid Infra Red free electron laser. J. Am. Soc. Mass. Spectrom., 7, 1187-1193. Daban, J. (2001). Fluorescent labelling of proteins with Nile red and 2-methoxy-2,4-diphenyl-3(2H)furanone: physicochemical basis and application to the rapid staining of sodium dodecyl sulfate polyacrylamide gels ans western blots. Electrophoresis, 22, 874-880. Daban, J., Bartholomé, S., & Samsó, M. (1991). Anal. Biochem., 199, 169-174. Dahl, B., Schiodt, F., Ott, P., Wians, F., Lee, W., Balko, J., et al. (2003). Plasma concentration of Gcglobulin is associated with organ dysfunction and sepsis after injury. Crit Care Med, 31(1), 152-156. Dancik, V., Addona, T., Clauser, K., Vath, J., & Pevzner, P. (1999). De novo peptide sequencing via tandem mass spectrometry. J. Comput. Biol., 6, 327-342. Dare, T., Davies, H., Turton, J., Lomas, L., Williams, T., & York, M. (2002). 
Application of surface-enhanced laser desorption/ionization technology to the detection and identification of urinary parvalbumin-alpha: a biomarker of compound-induced skeletal muscle toxicity in the rat. Electrophoresis, 23(18), 3241-3251. Davis, M., Beierle, J., Bures, E., McGinley, M., Mort, J., Robinson, J., et al. (2001). Automated LC-LC-MS-MS platform using binary ion-exchange and gradient reversed-phase chromatography for improved proteomic analyses. J Chromatogr B Biomed Sci Appl, 752(2), 281-291.
Davis, M., Spahr, C., McGinley, M., Robinson, J., Bures, E., Beierle, J., et al. (2001). Towards defining the urinary proteome using liquid chromatography-tandem a mass spectrometry. II. Limitations of complex mixture analyses. Proteomics, 1(1), 108-117. Deery, M., Jennings, K., Jasieczek, C., Haddleton, D., Jackson, A., Yates, H., et al. (1997). Rapid Commun. Mass Spectrom., 11, 57-62. Degani, Y., & Patchornik, A. (1974). Biochemistry, 13, 1-14. Dehm, P., & Nordwig, A. (1970). Eur. J. Biochem., 17, 372-377. Del Mar, E., Largman, C., Brodrick, J., & Geokas, M. (1979). Anal. Biochem., 99, 316-320. Déon, C., Promé, J., Promé, D., Francina, A., Groff, P., Kalmes, G., et al. (1997). J. Mass, Spectrom.,. Desnuelle, P. (1960). In The Enzymes (Vol. Vol 4, pp. 93-118). New York: Academic Press. Ding, J., & P, V. (1999). Advances in CE/MS. Anal. Chem., 71, A378-A385. Dogruel, D., Williams, P., & Nelson, R. (1995). Rapid tryptic mapping using enzymatically active mass spectrometer probe tips. Anal. Chem., 67, 4343-4348. Doktycz, S., Savickas, P., & Krueger, D. (1991). Rapid Commun. Mass Spectrom., 145, 145. Doumas, B., Watson, W., & Biggs, H. (1971). Albumin standards and the measurement of serum albumin with bromcresol green. Clin Chim Acta, 31(1), 87-96. Drapeau, G., Boily, Y., & Houmard, J. (1972). J. Biol. Chem., 247, 6720-6726. Duan, Y., & Laursen, R. (1994). Anal. Biochem., 216, 431-438. Ducret, A., Bartone, N., Haynes, P., Blanchard, A., & Aebersold, R. (1998). A simplified gradient solvent delivery system for capillary liquid chromatography - electrospray ionization mass spectrometry. submitted. Dukan, S., Turlin, E., Biville, F., Bolbeck, G., Touati, D., Tabet, J., et al. (1998). Coupling 2D-SDSPAGE with CNBr cleavage and MALDI-ToF-MS: A strategy applied to the identification of proteins induced by a hypochlorous acid stress in E. Coli. Anal. Chem., 70, 4433-4440. Dunham, I., Shimizu, N., Roe, B., Chissoe, S., Hunt, A., Collins, J., et al. (1999). 
Nature, 402, 489-495. Eckershorn, C., Strupat, K., Karas, M., Hillenkamp, F., & Lottspeich, F. (1992). Mass spectrometric analysis of blotted proteins after gel electrophoretic separation by MALDI. Electrophoresis, 13, 664665. Eckerskorn, C., Strupat, K., Schleuder, D., Hochstrasser, D., Sanchez, J., Lottspeich, F., et al. (1997). Analysis of proteins by direct scanning IR-MALDI-MS after 2-D PAGE separation and electroblotting. Anal. Chem., 69, 2888-2892. Edman, P. (1950). Acta Chem. Scand., 4, 283. Edman, P., & Begg, G. (1967). Eur. J. Biochem., 1, 80-91. Egelhofer, V., Bussov, K., Luebbert, C., Lehrach, H., & Nordhoff, E. (2000). Improvements in protein identification by MALDI-ToF MS peptide mapping. Anal. Chem., 72, 2741-2750. Ehring, H., Karas, M., & Hillemkamp, F. (1992). Org. mass Spectrom., 27, 427-480. Emmett, M., & Caprioli, R. (1994). Micro-electrospray mass spectrometry: Ultra-high-sensitivity analysis of peptides and proteins. J. Am. Soc. Mass. Spectrom., 5, 605-613. Eng, J., McCormack, A., & Yates, J. r. (1994). An approach to correlate tandem mass spectral data pf peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom., 5(11), 976989. Engen, J., & Smith, D. (2001). Investigating protein structure and dynamics by hydrogen exchange MS. Anal. Chem., 73, J. Chromatogr. A 256-A265. Etienne, M., Jerome, M., Fleurence, J., Rehbein, H., Kundiger, R., Yman, I., et al. (1999). Electrophoresis, 20, 1923-1933. Eynard, L., & Laurière, M. (1998). Electrophoresis, 19, 1394-1396. Falick, A., & Maltby, D. (1989). Anal. Biochem., 182, 165-169. Fenn, J., Mann, M., Meng, C., Wong, S., & Whitehouse, C. (1989). Electrospray ionization for mass spectrometry of large biomolecules. Science, 246(4926), 64-71. Fenyo, D., Qin, J., & Chait, B. (1998). Protein identification using mass spectrometric information. Electrophoresis, 19(6), 998-1005. Fernandez, J., Gharahdaghi, F., & Mische, S. (1998). 
Routine identification of proteins from sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) gels or polyvinyl difluoride membranes using matrix assisted laser desorption/ionization time of flight mass spectrometry (MALDI-ToF-MS). Electrophoresis, 19(6), 1036-1045.
Fernandez-Patron, C., Calero, M., Collazo, P., Garcia, J., Madrazo, J., Musacchio, A., et al. (1995). Protein reverse staining: high-efficiency microanalysis of unmodified proteins detected on electrophoresis gels. Analytical Biochemistry, 224(1), 203-211. Fernandez-Patron, C., Calero, M., Collazo, P., Garcia, J., Madrazo, J., Musachio, A., et al. (1994). Anal. Biochem., 2, 203-211. Fernandez-Patron, C., Hardy, E., Sosa, A., Seoane, J., & Castellanos, L. (1995). Double staining of coomassie blue-stained polyacrylamide gels by imidazol-sodium dodecyl sulfate-zinc reverse staining: sensitive detection of coomassie blue-undetected proteins. Anal. Biochem., 224, 263-269. Figueroa, I., Torres, O., & Russell, D. (1998). Effects of the water content in the sample preparation for MALDI on the mass spectra. Anal. Chem., 70, 4527-4533. Fleischmann, R., Adams, M., White, O., Clayton, R., Kirkness, E., Kerlavage, A., et al. (1995). Science, 269, 496-512. Fleming, J., & Paull, T. (1988). Biotechniques, 6, 926-929. Folk, J., Piez, K., Carroll, W., & Gladner, J. (1960). Biol. Chem., 235. Fontana, A., Dalzoppo, D., Grandi, C., & Zambonin, M. (1983). Bovine pepsinogens and pepsins. Meth. Enzym., 91, 311-318. Fontana, A., Savige, W., & Zambonin, M. (1979). In Methods in peptide and protein sequence analysis (pp. 309-322). Amsterdam: Elsevier. Ford, J., Chambers, R., & Cohen, W. (1973). An Active-site Titration Method for Immobilized Trypsin. Biochim Biophys Acta, 309, 175-. Fountoulakis, M., Juranville, J., Roder, D., Evers, S., Brndt, P., & Langen, H. (1998). Reference map of the low molecular mass proteins of Haemophilus influenzae. Electrophoresis, 19, 1819-1257. Fraenkel-Conrat, H., & Olcott, H. (1945). J. Biol. Chem., 161, 259-268. Frank, M., Barfknecht, A., Benner, W., Horn, D., Jaklevic, J., Labov, S., et al. (1996). Rapid Commun. Mass Spectrom., 10, 1946-1950. Galvani, M., Bordini, E., Piubelli, C., & Hamdan, M. (2000). 
Effect of experiemntal conditions on the analysis of sodium dodecyl sulphate polyacrylamide gel electrophoresis separated protein by matrixassisted laser desorption/ionisation mass spectrometry. Rapid Commun. Mass Spectrom., 14, 18-25. Galvani, M., Hamdan, M., Herbert, B., & Righetti, P. (2001). Alkylation kinetics of proteins in preparation for two-dimensional maps: a matrix assisted laser desorption/ionization-mass spectrometry investigation. Electrophoresis, 22(10), 2058-2065. Gaucherand, P., Guibaud, S., Rudigoz, R., & Wong, A. (1994). Diagnosis of premature rupture of the membranes by the identification of alpha-feto-protein in vaginal secretions. Acta Obstet Gynecol Scand, 73(6), 456-459. Gay, S., Binz, P., Hochstrasser, D., & Appel, R. (1999). Electrophoresis, 20, 3527-3534. Gay, S., Binz, P. A., Hochstrasser, D. F., & Appel, R. D. (1999). Modeling peptide mass fingerprinting data using the atomic composition of peptides. Electrophoresis, 20(18), 3527-3534. Gay, S., Binz, P. A., Hochstrasser, D. F., & Appel, R. D. (2002). Peptide mass fingerprinting peak intensity prediction: extracting knowledge from spectra. Proteomics, 2(10), 1374-1391. Gehanne, S., Cecconi, D., Carboni, L., Righetti, P., Domenici, E., & Hamdan, M. (2002). Quantitative analysis of two-dimensional gel-separated proteins using isotopically marked alkylating agents and matrix-assisted laser desorption/ionization mass spectrometry. Rapid Commun Mass Spectrom, 16(17), 1692-1698. Geng, M., Fi, J., & Reignier, F. (2000). J. Chromatogr. A, 870, 295-313. Geno, P., & Macfarlane, R. (1989). Int. J. Mass Spectrom. Ions Processes, 92, 195. Gercel-Taylor, C., Bazzett, L., & Taylor, D. (2001). Gynecol. Oncol., 81, 71-76. Gevaert, K., Demol, H., Sklyarova, T., Vandekerckhove, J., & Houthaeve, T. (1998). A peptide concentration and purification method for protein characterization in the subpicomole range using matrix assisted laser desorption/ionization postsource decay (MALDI-PSD) sequencing. 
Electrophoresis, 19(6), 909-917. Gevaert, K., & Vandekerckhove, J. (2000). Protein identification methods in proteomics. Electrophoresis, 21, 1145-1154. Gianazza, E., Coari, P., Lovati, M., Manzoni, C., Ghibandi, E., & Salmona, M. (1995). Basic proteins and basic membranes adjusting blotting and staining conditions to Immobilon CD. J. Chromatogr. A, 698, 351-359.
Gimon, M., Preston, L., Solouki, T., White, M., & Russell, D. (1992). Org. Mass Spectrom., 27, 827-830. Glückmann, M., & Karas, M. (1999). The initial ion velocity and its dependence on matrix, analyte and preparation method in ultraviolet matrix-assisted laser desorption/ionisation. J. Mass Spectrom., 34, 467-477. Gobom, J., Kraeuter, K., Persson, R., Steen, H., Roepstorff, P., & Ekman, R. (2000). Detection and quantification of neurotensin in human brain tissue by matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry. Anal. Chem., 72, 3320-3326. Gobom, J., Migorodskaya, E., Nordhoff, E., Hojrup, P., & Roepstorff, P. (1999). Use of vapor-phase acid hydrolysis for mass spectrometric peptide mapping and protein identification. Anal. Chem., 71, 919927. Gobom, J., Nordhoff, E., Mirgorodskaya, E., Ekman, R., & Roepstorff, P. (1999). Sample purification and preparation technique based on nano-scaled reversed-phase columns for the sensitive analysis of complex peptide mixtures by maldi-assisted laser desortion/ionisation mass spectrometry. J. Mass Spectrom., 34, 105-116. Gobom, J., Schuerenberg, M., Mueller, M., Theiss, D., Lehrarch, H., & Nordhorff, E. (2001). a-cyano-4hydoxycinnamic acid affinity sample preparation. A protocol for MALDI-MS peptide analysis in proteomics. Anal. Chem., 73, 434-438. Godovac-Zimmermann, J., & Brown, L. R. (2003). Proteomics approaches to elucidation of signal transduction pathways. Curr Opin Mol Ther, 5(3), 241-249. Goffeau, A., Barrell, B., Bussey, H., Davis, R., Dujon, B., Feldmann, H., et al. (1996). Science, 274, 563567. Good, T. E., Weber, P. S., Ireland, J. L., Pulaski, J., Padmanabhan, V., Schneyer, A. L., et al. (1995). Isolation of nine different biologically and immunologically active molecular variants of bovine follicular inhibin. Biol Reprod, 53(6), 1478-1488. Gorg, A., Postel, W., & Gunther, S. (1988). The current state of two-dimensional electrophoresis with immobilized pH gradients. 
Electrophoresis, 9(9), 531-546. Gorman, J., Fergusson, B., & Nguyen, T. (1996). Use of 2,6-Dihydroxyphenone for analysis of fragile peptides, disulfidee bonding and small proteins by MALDI. Rapid Commun. Mass Spectrom., 10, 529-536. Gráf, L., Szilágyi, L., & Venekei, I. (1998). In Hand book of proteolytic enzymes (pp. 30-38). San Diego: Academic Press. Gras, R., Muller, M., Gasteiger, E., Gay, S., Binz, P., Bienvenut, W., et al. (1999). Improving protein identification from peptide mass fingerprinting through a parameterized multi-level scoring algorithm and an optimized peak detection. Electrophoresis, 20(18), 3535-3550. Green-church, K., & Limbach, P. (1998). Matrix-assisted laser desorption/ionisation mass spectrometry of hydrophobic peptides. Anal. Chem., 70, 5322-5325. Griffin, T., Goodlett, D., & Aebersold, R. (2001). Advances in proteome analysis by mass spectrometry. Curr Opin Biotechnol., 12(6), 607-612. Griffiths, W., Lindh, I., Bergman, T., & Sjövall, J. (1995). Rapid Commun. Mass Spectrom., 9, 667-676. Grigorean, G., Carey, R., & Amster, I. (1996). Eur. J. Mass Spectrom., 2, 139-143. Gross, E., & Witkop, B. (1961). J. Am. Chem. Soc., 83, 1510. Gross, E., & Witkop, B. (1962). J. Biol. Chem., 237, 1856-1860. Gu, Q., & Preswich, G. (1997). Efficient peptide ladder sequencing by MALDI-ToF-MS using allyl isothiocyanate. J. Peptide Res., 49, 484-491. Gultekin, H., & Heermann, K. (1988). Anal. Biochem., 172, 320-329. Gusev, A., Wilkinson, W., Proctor, A., & Hercules, D. (1995). Improvement of signal reproducibility and matrix/comatrix effects in MALDI analysis. Anal. Chem, 67, 1034-1041. Gygi, S., Rist, B., Gerber, S., Turecek, F., Gelb, M., & Aebersold, R. (1999). quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nature Biotechnol., 17, 994-999. Haebel, S., Jensen, C., Andersen, S. O., & Roepstorff, P. (1995). 
Isoforms of a cuticular protein from larvae of the meal beetle, Tenebrio molitor, studied by mass spectrometry in combination with Edman degradation and two-dimensional polyacrylamide gel electrophoresis. Protein Sci., 4, 394-404. Hailat, N., & Hanash, S. (1995). Indian J. Biochem. Biophys., 32, 240-244.
Hale, J. E., Butler, J. P., Kniermann, M. D., & Becker, G. W. (2000). Increased sensitivity of tryptic peptide detection by MALDI-ToF mass spectrometry r is achieved by conversion of lysine to homoarginine. Anal. Biochem., 287, 110-117. Hara, S., & Yamakawa, M. (1996). Biochem. Biophys. Res. Comm., 220, 644-649. Hartley, B. (1964). Nature, 201, 1284-1287. Harvey, D. (1993). Rapid Commun. Mass Spectrom., 7, 614-619. Hellman, U., Wernsted, C., Gonez, J., & Heldin, C. H. (1995). Improvement of an in-gel digestion procedure for the micropreparation of internal protein-fragments for amino acid sequencing. Anal. Biochem., 224(1), 451-455. Hensel, R. R., King, R. C., & Owens, K. G. (1997). Electrospay sample preparation for improved quantitation in matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry. Rapid Commun. Mass Spectrom., 11, 1785-1793. Henzel, W. J., Billeci, T. M., Stults, J. T., Wong, S. C., Grimley, C., & Watanabe, C. (1993). Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases. Proceedings of the National Academy off Sciences of the United States of America, 90(11), 5011-5015. Herbert, B., Sanchez, J., & Bini, L. (1997). In Proteome Research: New Frontiers in functional genomics (pp. 13-33). Berlin: Springer-Verlag. Hermodson, M., Ericsson, L., Neurath, H., & Walsh, K. (1973). Biochemistry, 12, 3146-3153. Hill, R. (1965). Hydrolysis of proteins. Adv. Prot. Chem, 20, 37-107. Hilton, G., Maritnis, J., Wollman, D., Irwin, K., Dulcie, L., Gerber, D., et al. (1998). Nature, 391, 672675. Hirano, H. (1989). J. Prot. Chem., 8, 115-130. Hirano, H., Komatsu, S., Kajiwara, H., Takakura, H., Sakiyama, F., & Tsunasawa, S. (1991). Microsequence analysis of N?-blocked proteins electroblotted onto an immobilizing matix from polyacrylamide gel. Analytical Sciences, Suppl 7, 941-944. Hisada, M., Konno, K., Itagaki, Y., Naoki, H., & Nakajima, T. (2000). 
Advantages of using nested collision induced dissociation/post-source decay with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry: sequencing of novel peptides from wasp venom. Rapid Commun Mass Spectrom., 14(19), 1828-1834. Hochstrasser, D. (1998). Proteome in perspective. Clin. Chem. Lab. Med., 36(11), 825-836. Hochstrasser, D., Frutiger, S., Paquet, N., Bairoch, A., Ravier, F., Pasquali, C., et al. (1992). Electrophoresis, 13, 992-1001. Hochstrasser, D. F., Appel, R. D., Vargas, R., Perrier, R., Vurlod, J. F., Ravier, F., et al. (1991). A clinical molecular scanner: the Melanie project. MD Comput, 8(2), 85-91. Hochstrasser, D., Appel, R., Vargas, R., Perrier, R., Vurlod, J., Ravier, F., et al. (1991). MD-Computing, 8, 85-91. Hood, D. (1999). Parasitology, 118, S3-S9. Houthaeve, T., Gausepohl, H., Ashman, K., Nillson, T., & Mann, M. (1997). Automated protein preparation techniques using a digest robot. J. Prot. Chem., 16(5), 343-348. Houthaeve, T., Gausepohl, H., Mann, M., & Ashman, K. (1995). Automation of micro-preparation and enzymatic cleavage of gel electrophoretically separated proteins. FEBS Lett., 376(1-2), 91-94. Huang, Z., Wu, J., Roth, K., Yang, Y., Gage, D., & Watson, J. (1997). Anal. Chem., 69, 137-144. Huang, Z. H., Shen, T., Wu, J., Gage, D. A., & Watson, J. T. (1999). Protein sequencing by matrix-assisted laser desorption ionization-postsource decay-mass spectrometry analysis of the N-Tris(2,4,6-trimethoxyphenyl)phosphine-acetylated tryptic digests. Anal Biochem, 268(2), 305-317. Hummel, B. (1959). Can. J. Biochem. Physiol., 37, 1393-1398. Hunt, D., Yates, J., Shabanowitz, J., Winston, S., & Hauer, C. (1986). Proc. Natl. Acad. Sci. USA, 83, 6233-6237. Hunt, D. F., Yates, J. R., 3rd, Shabanowitz, J., Winston, S., & Hauer, C. R. (1986). Protein sequencing by tandem mass spectrometry. Proc Natl Acad Sci U S A, 83(17), 6233-6237. Hunter, J., Lin, H., & Becker, C. (1997). 
Cryogenic frozen solution matrixes for analysis of DNA by time-of-flight mass spectrometry. Anal Chem, 69(17), 3608-3612. Hutchens, T., & Yip, T. (1993). New desorption strategies for the mass spectrometric analysis of macromolecules. Rapid Commun. Mass Spectrom., 7, 576-580.
Iijima, S., Shiba, K., Inoue, J., Yoshida, T., & Kimura, M. (1997). J. Clin. Lab. Anal., 11, 220-224. Ingendoh, A., Karas, M., Hillenkamp, F., & Giessmann, U. (1994). Factors affecting the resolution in MALDI-MS. Int. J. Mass Spectrom. Ion Proc., 131, 345-354. Inglis, A. (1991). Chemical procedures for C-Terminal sequencing of peptides and proteins. Anal. Biochem., 195, 183-196. Inglis, A., McKern, N., Roxburgh, C., & Strike, P. (1979). In Methods in peptide and protein sequence analysis (pp. 329-343). Amsterdam: Elsevier. Ishihama, Y., Katayama, H., & Asakawa, N. (2000). Surfactants usable for electrospray ionization mass spectrometry. Anal Biochem, 287(1), 45-54. Itano, H., & Robinson, E. (1972). J. Biol. Chem., 247, 4819-4824. Jackson, A., Jennings, K., & Scrivens, J. (1997). J. Am. Soc. Mass Spectrom., 8, 76-85. Jacobson, G., Schaffer, M., Stark, G., & Vanaman, T. (1973). J. Biol. Chem., 248, 6583-6591. James, P., Quadroni, M., Carafoli, E., & Gonnet, G. (1993). Protein identification by mass profile fingerprinting. Biochem Biophys Res Commun, 195(1), 58-64. James, P., Quadroni, M., Carafoli, E., & Gonnet, G. (1994). Protein identification in DNA databases by peptide mass fingerprinting. Protein Sci., 3(8), 1347-1350. Jekel, P., Weijer, W., & Beintema, J. (1983). Use of endoproteinase Lys-C from Lysobacter enzymogenes in protein sequence analysis. Anal. Biochem., 134, 347-354. Jensen, O. N., Mortensen, P., Vorm, O., & Mann, M. (1997). Automation of matrix-assisted laser desorption/ionisation mass spectrometry using fuzzy logic feedback control. Anal. Chem., 69, 1706-1714. Jensen, O. N., Podtelejnikov, A., & Mann, M. (1996). Delayed extraction improves specificity in database searches by matrix-assisted laser desorption/ionization peptide maps. Rapid Commun. Mass Spectrom., 10(11), 1371-1378. Jensen, O. N., Wilm, M., Shevchenko, A., & Mann, M. (1998). Peptide sequencing of 2-DE gel-isolated proteins by nanoelectrospray tandem mass spectrometry. In A. J. 
Link (Ed.), Methods in molecular biology, 2-D proteome analysis protocols. Totowa, NJ: Humana Press Inc. Jensen, P., Pasa-Tolic, L., Anderson, G., Horner, J., Lipton, M., Bruce, J., et al. (1999). Probing proteomes using capillary isoelectric focusing-electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry. Anal. Chem., 71(11), 2871-2882. Jespersen, S., Niessen, W., Tjaden, U., & van der Greef, J. (1998). Basic matrices in the analysis of noncovalent complexes by matrix-assisted laser desorption/ionization mass spectrometry. J Mass Spectrom, 33(11), 1088-1093. Jiang, H., & English, A. (2002). Quantitative analysis of the yeast proteome by incorporation of isotopically labeled leucine. J Proteome Res., 1(4), 345-350. Jin, X., Chen, Y., Lubman, D., Misek, D., & Hanash, S. (1999). Capillary electrophoresis/tandem mass spectrometry for analysis of proteins from two-dimensional sodium dodecyl sulfate polyacrylamide gel electrophoresis. Rapid Commun Mass Spectrom, 13(23), 2327-2334. Johnson, T., & Nicolet, B. (1911). J. Am. Chem. Soc., 33, 1973-1978. Jones, D., Stott, K., Howard, M., & Perham, R. (2000). Biochemistry, 39, 8448-8459. Juhasz, P., Costello, C., & Biemann, K. (1993). J. Am. Soc. Mass Spectrom., 4, 399-409. Jungblut, P., Eckerskorn, C., Lottspeich, F., & Klose, J. (1990). Blotting efficiency investigated by using two-dimensional electrophoresis, hydrophobic membranes and proteins from different sources. Electrophoresis, 11(7), 581-588. Jungblut, P., & Seifert, R. (1990). Analysis by high-resolution two-dimensional electrophoresis of differentiation-dependent alterations in cytosolic protein pattern of HL-60 leukemic cells. J. Biochem. Biophys., 21(1), 47-58. Kaiser, R., Jr., Cooks, R., Syka, J., & Stafford, G., Jr. (1990). Rapid Commun. Mass Spectrom., 4, 30-33. Karas, M., Bachmann, D., Bahr, U., & Hillenkamp, F. (1987). Int. J. Mass Spectrom. Ion Processes, 78, 53-68. 
Karas, M., Bahr, U., Strupat, K., Hillenkamp, F., Tsarbopoulos, A., & Pramanik, B. N. (1995). Matrix dependence of metastable fragmentation of glycoproteins in MALDI ToF mass spectrometry. Anal. Chem., 67, 675-679. Karas, M., Glückmann, M., & Schäfer, J. (2000). Ionization in matrix-assisted laser desorption/ionisation: singly charged molecular ions are the lucky survivors. J. Mass Spectrom., 35, 1-12.
Karas, M., & Hillenkamp, F. (1988). Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal Chem, 60(20), 2299-2301. Katta, V., & Chait, B. (1993). J. Am. Chem. Soc., 115, 6317-6321. Keil, B. (1982). In Methods in protein sequence analysis. Clifton: Humana Press. Keil, B. (1992). Specificity of proteolysis. Heidelberg/New York: Springer-Verlag. Kemper, C., Berggren, K., Diwu, Z., & Patton, W. (2001). An improved, luminescent europium based stain for detection of electroblotted proteins on nitrocellulose or polyvinylidene difluoride membranes. Electrophoresis, 22, 881-889. Kenrik, K., & Margolis, J. (1970). Anal. Biochem., 33, 204-207. Keough, T., Lacey, M., Fieno, A., Grant, R., Sun, Y., Bauer, M., et al. (2000). Tandem mass spectrometry methods for definitive protein identification in proteomics research. Electrophoresis, 21, 2252-2265. Keough, T., Lacey, M., & Youngquist, R. (2000). Rapid Commun. Mass Spectrom., 14, 2348-2356. Keough, T., Lacey, M. P., & Youngquist, R. S. (2000). Derivatization procedures to facilitate de novo sequencing of lysine-terminated tryptic peptides using postsource decay matrix-assisted laser desorption/ionization mass spectrometry. Rapid Commun Mass Spectrom, 14(24), 2348-2356. Keough, T., Youngquist, R. S., & Lacey, M. P. (1999). A method for high-sensitivity peptide sequencing using postsource decay matrix-assisted laser desorption ionization mass spectrometry. Proc Natl Acad Sci U S A, 96(13), 7131-7136. Klose, J. (1975). Protein mapping by combined isoelectric focusing and electrophoresis of mouse tissues. A novel approach to testing for induced point mutations in mammals. Humangenetik, 26(3), 231-243. Knochenmuss, R., Lehmann, E., & Zenobi, R. (1998). Eur. Mass Spectrom., 4, 421-427. Kocholaty, W., Weil, L., & Smith, L. (1938). Biochem. J., 32, 1685-1690. Kollisch, G., Lorenz, M., Kellner, R., Verhaert, P., & Hoffmann, K. (2000). Eur J Biochem, 267, 5502-5508. Kraft, P., Mills, J., & Dratz, E. 
(2001). Mass spectrometric analysis of cyanogen bromide fragments of integral membrane proteins at the picomole level: application to rhodopsin. Anal. Biochem., 292, 76-76. Kratzer, R., Eckerskorn, C., Karas, M., & Lottspeich, F. (1998). Suppression effects in enzymatic peptide ladder sequencing using UV-MALDI-MS. Electrophoresis, 19, 1910-1919. Kraus, M., Janck, K., Bienert, M., & Krause, E. (2000). Characterisation of intermolecular β-sheet peptides by mass spectrometry and hydrogen isotope exchange. Rapid Commun. Mass Spectrom., 14, 1094-1104. Krause, E., Wenschuh, H., & Jungblut, P. R. (1999). The dominance of Arg containing peptides in MALDI derived tryptic mass fingerprints of proteins. Anal. Chem., 71, 4160-4165. Krogh, T. N., Berg, T., & Hojrup, P. (1999). Protein analysis using enzymes immobilized to paramagnetic beads. Anal. Biochem., 274, 153-162. Kruse, R., & Sweedler, J. (2003). Spatial profiling invertebrate ganglia using MALDI MS. J Am Soc Mass Spectrom., 14(7), 752-759. Kussmann, M., Nordhoff, E., Rahbek-Nielsen, H., Haebel, S., Rossel-Larsen, M., Jakobsen, L., et al. (1997). Matrix-assisted laser desorption/ionisation mass spectrometry sample preparation techniques designed for various peptide and protein analytes. J. Mass Spectrom., 32, 593-601. Kussmann, M., & Roepstorff, P. (1998). Spectroscopy, 14, 1-27. Labouesse, J., & Gervais, M. (1967). Eur. J. Biochem., 2, 215-223. Lacey, J., Bergen, H., Magera, M., Naylor, S., & O'Brien, J. (2001). Clin. Chem. Lab. Med., 47(3), 513-518. Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature, 227(259), 680-685. Lake, D. A., Johnson, M. V., McEwen, C. N., & Larsen, B. S. (2000). Sample preparation for high throughput accurate mass analysis by MALDI-ToF-MS. Rapid Commun. Mass Spectrom., 14, 1008-1013. Land, C., & Kinsel, G. (2001). The mechanism of matrix to analyte proton transfer in clusters of 2,5-dihydroxybenzoic acid and the tripeptide VPL. 
J. Am. Soc. Mass Spectrom., 12, 726-731. Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921.
Landon, M. (1983). Cleavage at aspartyl-prolyl bonds. Meth. Enzym., 91, 145-149. Landry, F., Lombardo, C. R., & Smith, J. W. (2000). A method for application of samples to MALDI-ToF target that enhances peptide detection. Anal. Biochem., 279, 1-8. Lauber, W. M., Carroll, J. A., Dufield, D. R., Kiessel, J. R., Radabaugh, M. R., & Malone, J. P. (2001). Mass spectrometry compatibility of two-dimensional gel protein stains. Electrophoresis, 22, 906-918. Lauriere, M. (1993). A semi-dry electroblotting system efficiently transfers both high and low molecular weight proteins separated by SDS-PAGE. Anal. Biochem., 212, 206-211. Le Maire, M., Deschamps, S., Moller, J., Le Caer, J.-P., & Rossier, J. (1993). Anal. Biochem., 214, 50-57. Lecchi, P., & Caprioli, R. M. (1996). Matrix-assisted laser desorption mass spectrometry for peptide mapping. New Methods in Peptide Mapping For the Characterization of Proteins, 219-240. Lecchi, P., Le, H., & Pannell, L. (1995). Nucleic Acids Res., 23, 276-277. Lee, C., Levin, A., & Branton, D. (1987). Anal. Biochem., 166, 308-312. Lehr, S., Kotzka, J., Herkner, A., Sikmann, A., Meyer, H., Krone, W., et al. (2000). Biochemistry, 39, 10898-10907. Lek, L., Yang, E., Wang, D., & Cheng, L. (1995). J. Biochem. Biophys. Methods, 30, 9-20. Li, A., Sowder, R. C., Henderson, L. E., Moore, S. P., Garfinkel, D. J., & Fisher, R. J. (2001). Chemical cleavage at aspartyl residues for protein identification. Anal Chem, 73(22), 5395-5402. Li, F., Dong, M., Miller, L. J., & Naylor, S. (1999). Efficient removal of sodium dodecyl sulfate (SDS) enhances analysis of proteins by SDS-polyacrylamide gel electrophoresis coupled with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Rapid Commun Mass Spectrom, 13(5), 464-465. Liao, P., & Allison, J. (1995). Mass Spectrom., 30, 511-523. Liao, P., Huang, Z., & Allison, J. (1997). Charge remote fragmentation of peptides following attachment of a fixed positive charge: a MALDI PSD study. J. Am. 
Soc. Mass Spectrom., 8, 501-509. Lin, Q., & Knochenmuss, R. (2001). Two-photon ionization thresholds of matrix-assisted laser desorption/ionisation matrix clusters. Rapid Commun. Mass Spectrom., 15, 1422-1426. Lin, T., Shao, X., & Xia, Q. (2000). J. Chromatogr. A, 855, 695-707. Lin, Y., Fusek, M., Lin, X., Hartsuck, J., Kezdy, F., & Tang, J. (1992). J. Biol. Chem., 267, 18413-18418. Lin, Y., Means, G. E., & Feeney, R. E. (1969). The action of proteolytic enzymes on N,N-dimethyl proteins. Basis for a microassay for proteolytic enzymes. J Biol Chem, 244(3), 789-793. Lindh, I., Griffiths, W., Bergman, T., & Sjövall, J. (1994). Rapid Commun. Mass Spectrom., 8, 797-803. Lindh, I., Hjelmqvist, L., Bergman, T., Sjövall, J., & Griffiths, W. (2000). J. Am. Soc. Mass Spectrom., 11, 673-686. Link, A. J., Eng, J., Schieltz, D. M., Carmack, E., Mize, G. J., Morris, D. R., et al. (1999). Direct analysis of protein complexes using mass spectrometry. Nature Biotechnol., 17, 676-682. Link, A. J., Hays, L. G., Carmack, E. B., & Yates, J. R., III. (1997). Identifying the major proteome components of Haemophilus influenzae type strain NCTC-8143. Electrophoresis, 18, 1314-1334. Lischwe, M. A., & Sung, M. T. (1977). Use of N-Chlorosuccinimide/Urea for the selective cleavage of tryptophanyl peptide bonds in proteins. J. Biol Chem, 252(14), 4976-4980. Lockridge, O., Adkins, S., & La Du, B. N. (1987). J. Biol. Chem., 262, 12945-12952. Lopez, M. F. (2000). Better approach to finding the needle in a haystack: optimizing proteome analysis through automation. Electrophoresis, 21, 1082-1093. Louris, J., Cooks, R., Syka, J., Kelly, P., Stafford, G., & Todd, J. (1987). Anal. Chem., 59, 1677-1685. Macha, S. F., Limbach, P. A., Hanton, S. D., & Owens, K. G. (2001). Silver cluster interferences in matrix-assisted laser desorption/ionisation (MALDI) mass spectrometry of nonpolar polymers. J. Am. Soc. Mass Spectrom., 12, 732-743. Macha, S. F., Limbach, P. A., & Savickas, P. J. (2000). 
Application of nonpolar matrices for the analysis of low molecular weight nonpolar synthetic polymers by matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry. J. Am. Soc. Mass Spectrom., 11, 731-737. Mackun, K., & Downard, K. M. (2003). Strategy for identifying protein-protein interactions of gel-separated proteins and complexes by mass spectrometry. Anal Biochem, 318(1), 60-70. Mahoney, W., Smith, P., & Hermodson, M. (1981). Biochemistry, 20, 443-448. Mahoney, W. C., & Hermodson, M. A. (1979). High-yield cleavage of tryptophanyl peptide bonds by o-iodosobenzoic acid. Biochemistry, 18, 3810-3814.
Mahoney, W. C., Smith, P. K., & Hermodson, M. A. (1981). Fragmentation of proteins with o-iodosobenzoic acid: chemical mechanism and identification of o-iodoxybenzoic acid as a reactive contaminant that modifies Tyrosyl residues. Biochemistry, 20, 443-448. Malone, J. P., Radabaugh, M. R., Leimgruber, R. M., & Gerstenecker, G. S. (2001). Practical aspects of fluorescent staining for proteomic applications. Electrophoresis, 22, 919-932. Mamyrin, B., Karatajev, V., Shmikk, D., & Zagulin, V. (1973). JETP, 37, 45. Mann, M., Hendrickson, R., & Pandey, A. (2001). Analysis of proteins and proteomics by mass spectrometry. Ann. Rev. Biochem, 70, 437-473. Mann, M., Hojrup, P., & Roepstorff, P. (1993). Use of mass spectrometric molecular weight information to identify proteins in sequence databases. Biol Mass Spectrom, 22, 338-345. Mann, M., & Wilm, M. (1994). Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal Chem, 66, 4390-4399. Martinovic, S., Veenstra, T., Anderson, G., Pasa-Tolic, L., & Smith, R. (2002). Selective incorporation of isotopically labeled amino acids for identification of intact proteins on a proteome-wide level. J. Mass Spectrom., 37(1), 99-107. Masaki, T., Nakamura, K., Isono, M., & Soejima, M. (1978). Agric. Biol. Chem., 47, 1443-1445. McComb, M. E., Oleschuk, R. D., Manley, D. M., Donald, L., Chow, A., O'Neil, J. D. J., et al. (1997). Use of a non-porous polyurethane membrane as a sample support for MALDI-ToF MS of peptides and proteins. Rapid Commun. Mass Spectrom., 11, 1716-1722. McNeal, C., Macfarlane, R., & Thurston, E. (1979). Anal. Chem., 51, 2036-2039. Meloun, B., Kluh, I., Kostka, V., Morávek, L., Prusik, Z., Vanecek, J., et al. (1966). Biochim. Biophys. Acta, 130, 543-545. Meloun, B., Moravek, L., & Kostka, V. (1975). FEBS Lett., 58, 134-137. Menzel, C., Dreisewerd, K., Berkenkamp, S., & Hillenkamp, F. (2001). 
Mechanisms of energy deposition in infrared matrix-assisted laser desorption/ionization mass spectrometry. Int. J. Mass Spectrom. Ion Proc., 207, 73-96. Miliotis, T., Kjellström, S., Nilsson, J., Laurell, T., Edholm, L.-E., & Marko-Varga, E. (2000). J. Mass Spectrom., 35, 369-377. Mirgorodskaya, E., Hassan, H., Wandall, H. H., Clausen, H., & Roepstorff, P. (1999). Partial vapor-phase hydrolysis of peptide bonds: a method for mass spectrometric determination of o-glycosylated sites in glycopeptides. Anal. Biochem., 269, 54-65. Mirgorodskaya, O., YP, K., MI, T., R, K., CP, S., & P., R. (2000). Quantification of peptides and proteins by matrix-assisted laser desorption/ionization mass spectrometry using 18O-labeled internal standards. Rapid Commun. Mass Spectrom., 14, 1226-1232. Mirza, U. A., Liu, Y. H., Tang, J. T., Porter, F., Bondoc, L., Chen, G., et al. (2000). Extraction and characterisation of adenovirus proteins from sodium dodecylsulfate polyacrylamide gel electrophoresis by matrix assisted laser desorption/ionisation mass spectrometry. J. Am. Soc. Mass Spectrom., 11, 356-361. Mitsumoto, A., Nakagawa, Y., Takeuchi, A., Okawa, K., Iwamatsu, A., & Takanezawa, Y. (2001). Oxidized forms of peroxiredoxins and DJ-1 on two-dimensional gels increased in response to sublethal levels of paraquat. Free Radic Res, 35(3), 301-310. Mohr, M., Börnsen, K., & Widmer, H. (1995). Rapid Commun. Mass Spectrom., 9, 809-814. Mole, J., & Horton, H. (1973). Biochemistry, 12, 816-822. Morris, H. R., Paxton, T., Dell, A., Langhorne, J., Berg, M., Bordoli, R. S., et al. (1996). High sensitivity collisionally-activated decomposition tandem mass spectrometry on a novel quadrupole / orthogonal-acceleration time-of-flight mass spectrometer. Rapid Communications in Mass Spectrometry, 10, 889-896. Mouradian, S., Nelson, C. N., & Smith, L. M. (1996). A self-assembled matrix monolayer for ultraviolet matrix-assisted laser desorption/ionisation mass spectrometry. J. Am. Chem. 
Soc, 118, 8639-8645. Mozdzanowski, J., & Speicher, D. (1992). Microsequence analysis of electroblotted proteins. I. Comparison of electroblotting recoveries using different types of PVDF membranes. Anal Biochem, 207(1), 11-18. Muller, M., Gras, R., Appel, R. D., Bienvenut, W. V., & Hochstrasser, D. F. (2002). Visualization and analysis of molecular scanner peptide mass spectra. J Am Soc Mass Spectrom, 13(3), 221-231.
Nagata, K., Yoshida, N., Ogata, F., Araki, H., & Noda, K. (1991). Subsite mapping of an acidic amino acid-specific endopeptidase from Streptomyces griseus, GluSGP, and protease V8. J Biochem., 110, 859-862. Nakanishi, T., Okamoto, N., Tanaka, K., & Shimizu, A. (1994). Biol Mass Spectrom, 23, 230-233. Nedelkov, D., & Nelson, R. (2000). J. Mol. Recogn, 13, 140-145. Nedonchelle, E., Pitiot, O., & Vijayalakshmi, M. (2000). Appl. Biochem. Biotechnol., 83, 287-294. Nelson, R., Nedelkov, D., & Tubbs, K. (2000). Anal. Chem., 72, A404-A411. Nelson, R. W., Nedelkov, D., & Tubbs, K. A. (2000). Biosensor chip mass spectrometry: a chip-based proteomics approach. Electrophoresis, 21, 1155-1163. Neubauer, G., Gottschalk, A., Fabrizio, P., Séraphin, B., Lührmann, R., & Mann, M. (1997). Identification of the proteins of the yeast U1 small nuclear ribonucleoprotein complex by mass spectrometry. Proc. Natl. Acad. Sci. USA, 94, 385-390. Neubauer, G., King, A., Rappsilber, J., Calvio, C., Watson, M., Ajuh, P., et al. (1998). Mass spectrometry and EST-database searching allows characterisation of the multiprotein spliceosome complex. Nature Genetics, 20, 46-50. Neubauer, G., King, A., Rappsilber, J., Calvio, C., Watson, M., Ajuh, P., et al. (1998). Mass spectrometry and EST-database searching allows characterization of the multi-protein spliceosome complex. Nat Genet, 20(1), 46-50. Neuhoff, V., Amold, N., Taube, D., & Ernhardt, W. (1988). Electrophoresis, 9, 255-262. Neuhoff, V., Stamm, R., Pardowitz, L., Amold, N., Ernhardt, W., & Taube, D. (1990). Electrophoresis, 11, 101-117. Neumann, H., & Mullner, S. (1998). Two replica blotting methods for fast immunological analysis of common proteins in two-dimensional electrophoresis. Electrophoresis, 19, 752-757. Neurath, H. (1957). In Advances in Protein Chemistry (Vol. 12, pp. 319-386). New York: Academic Press. Nielsen, P., K, K., P, H., & P., R. (1988). 
Optimization of sample preparation for plasma desorption mass spectrometry of peptides and proteins using a nitrocellulose matrix. Biomed & Environ Mass Spectrom, 17, 355-362. Nishihira, J., Hibiya, Y., Sakai, M., Nishi, S., Kumazaki, T., Ohki, S., et al. (1995). Biochim. Biophys. Acta, 1252, 233-238. Niu, S., Zhang, W., & Chait, B. (1998). J. Am. Soc. Mass Spectrom., 9, 1-7. Niwayama, S., Kurono, S., & Matsumoto, H. (2001). Synthesis of d-labeled N-alkylmaleimides and application to quantitative peptide analysis by isotope differential mass spectrometry. Bioorg Med Chem Lett, 11(17), 2257-2261. Nonami, H., Tanaka, K., & Fukuyama, Y., Erra-Balsells, DM,. (1998). Rapid Commun. Mass Spectrom., 12, 285-296. Nordhoff, E., Cramer, R., Karas, M., Hillenkamp, F., Kirpekar, F., Kristiansen, K., et al. (1993). Ion stability of nucleic acids in infrared matrix-assisted laser desorption/ionization mass spectrometry. Nucleic Acids Res., 21(15), 3347-3357. Noreau, J., & Drapeau, G. (1979). Isolation and properties of the protease from the wild-type and mutant strains of Pseudomonas fragi. J Bacteriol, 140, 911-916. Norregaard, J., G, Larsen, M., & Roepstorff, P. (1998). Protein, Suppl2, 74-89. Nutkins, J., & Williams, D. (1989). Eur. J. Biochem. O'Farrell, P. H. (1975). High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem., 250(10), 4007-4021. Ogorzalek Loo, R., Mitchell, C., Stevenson, T., Martin, S., Hines, W., Juhasz, P., et al. (1997). Electrophoresis, 18, 382-390. Ohguro, H., Palczewski, K., Walsh, K., & Johnson, R. (1994). Prot Scien, 3, 2428-2434. Omenn, G., Fontana, A., & Anfinsen, C. (1970). J. Biol. Chem., 145, 1895-1902. Ong, S., Blagoev, B., Kratchmarova, I., Kristensen, D., Steen, H., Pandey, A., et al. (2002). Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics., 1(5). Önnerfjord, P., Ekström, S., Bergquist, J., Nilsson, J., Laurell, T., & Marko-Varga, G. 
(1999). Homogeneous sample preparation for automated high throughput analysis with matrix-assisted laser
desorption/ionisation time-of-flight mass spectrometry. Rapid Commun. Mass Spectrom., 13, 315-322. Ortiz, J., & Bubis, J. (2001). Effects of differential sulfhydryl group-specific labeling on the rhodopsin and guanine nucleotide binding activities of transducin. Arch Biochem Biophys., 387(2), 233-242. Ortiz, M. L., Calero, M., Fernandez, P.-C., Patron, C. F., Castellanos, L., & Mendez, E. (1992). Imidazole-SDS-Zn reverse staining of proteins in gels containing or not SDS and microsequence of individual unmodified electroblotted proteins. Febs Letters, 296(3), 300-304. Overberg, A., Karas, M., Bahr, U., Kaufmann, R., & Hillenkamp, F. (1990). Rapid Commun. Mass Spectrom., 4, 293-296. Ozols, J., & Gerard, C. (1977). J. Biol. Chem., 252, 5986-5989. Palmblad, M., Wetterhall, M., Markides, K., Hakansson, P., & Bergquist, J. (2000). Analysis of enzymatically digested proteins and protein mixtures using a 9.4 tesla fourier transform ion cyclotron mass spectrometer. Rapid Commun. Mass Spectrom., 14, 1029-1034. Pappin, D. (1997). Peptide mass fingerprint using MALDI-ToF-MS. Meth. Mol. Biol., 64, 165-173. Pappin, D., Hojrup, P., & Bleasby, A. (1993). Rapid identification of proteins by peptide mass fingerprint. Curr. Biol., 3(6), 327-332. Pappin, D., Rahman, Hansen, H., Bartlet-Jones, M., Jeffery, W., & Bleasby, A. (1995). Chemistry, mass spectrometry and peptide-mass databases: Evolution of methods for the rapid identification and mapping of cellular proteins. In A. L. Burlingame & S. A. Carr (Eds.), Mass Spectrometry in the Biological Sciences (pp. 135-150). Totowa, NJ: Humana Press. Parker, K., Garrels, J., Hines, W., Butler, E., McKee, A., Patterson, D., et al. (1998). Electrophoresis, 19, 1920-1932. Pasquarello, C., Burgess, J. A., Hochstrasser, D. F., Sanchez, J. C., & Corthals, G. L. (2003). New alkylating reagents allowing simultaneous biochemical characterisation and relative quantitation of proteins, Montreal. Patterson, D., Tarr, G., Regnier, R., & Martin, S. 
(1995). C-terminal ladder sequencing via matrix-assisted laser desorption/ionisation mass spectrometry coupled with carboxypeptidase Y time dependent and concentration dependent digestions. Anal. Chem., 67, 3971-3978. Patterson, S., Thomas, D., & Bradshaw, R. (1996). Application of combined mass spectrometry and partial amino acid sequence to the identification of gel separated proteins. Electrophoresis, 17, 877-891. Perera, I., Perkins, J., & Kantartzoglou, S. (1995). Spin-coated samples for high resolution matrix-assisted laser desorption/ionization time-of-flight mass spectrometry of large proteins. Rapid Commun. Mass Spectrom., 9, 180-187. Perkins, D., Pappin, D., Creasy, D., & Cottrell, J. (1999). Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis, 20(18), 3551-3567. Pfeifer, T., Rucknagel, P., Kuellertz, G., & Schierhorn, A. (1999). A strategy for rapid and efficient sequencing of Lys-C peptides by MALDI-ToF MS PSD. Rapid Commun. Mass Spectrom., 13, 362-369. Pieles, U., Zurcher, W., et al. (1993). MALDI-ToF-MS: a powerful tool for the mass and sequence analysis of natural and modified oligonucleotides. Nucleic acids research, 21, 3191-3196. Pitt-Rivers, R., & Impiombato, F. (1968). Biochem. J., 109, 825-830. Ponstingl, H., Maier, G., Little, M., & Krauhs, E. (1986). In Advanced Methods in Protein Microsequence Analysis (pp. 316-319). Berlin/Heidelberg: Springer. Porzio, M., & Pearson, A. (1975). Biochim. Biophys. Acta, 384, 235-241. Posewitz, M., & Tempst, P. (1999). Anal. Chem., 71, 2883-2892. Powell, J., Fisher, W., Park, M., Craig, A., Rivier, J., White, S., et al. (1995). Primary structure of solitary form of gonadotropin-releasing hormone (GnRH) in cichlid pituitary; three forms of GnRH in brain of cichlid and pumpkinseed fish. Regul. Pept, 57, 43-53. Preston, L., Murray, K., & Russell, D. (1993). Biol. Mass Spectrom., 22, 544-550. Preston-Schaffer, L., Kinsel, G., & Russell, D. 
(1994). J. Am. Soc. Mass Spectrom., 5, 800. Puri, K., & Surolia, A. (1994). J. Biol. Chem., 269, 30917-30926. Qin, J., Herring, C., & Zhang, X. (1998). Rapid Commun. Mass Spectrom., 12, 209-216.
Qiu, Y., Sousa, E., Hewick, R., & Wang, J. (2002). Acid-labile isotope-coded extractants: a class of reagents for quantitative mass spectrometric analysis of complex protein mixtures. Anal Chem, 74(19), 4969-4979. Quéméneur, E., Moutiez, M., Charbonnier, J., & Ménez, A. (1998). Nature, 391, 301-304. Quist, A., Huth-Fehre, T., & Sundqvist, B. (1994). Rapid Commun. Mass Spectrom., 5, 149. Rabilloud, T. (1990). Electrophoresis, 11. Rabilloud, T. (1992). A comparison between low background silver diammine and silver nitrate protein stains. Electrophoresis, 13(7), 429-439. Rabilloud, T., Kieffer, S., Procaccio, V., Louwagie, M., Courchesne, P., Patterson, S., et al. (1998). Two-dimensional electrophoresis of human placental mitochondria and protein identification by mass spectrometry: Toward a human mitochondrial proteome. Electrophoresis, 19(6), 1006-1014. Rao, C., & Dunn, B. M. (1995). Evidence for electrostatic interactions in the S2 subsite of porcine pepsin. Adv Exp Med Biol, 362, 91-94. Reim, D., & Speicher, D. (1992). Microsequence analysis of electroblotted proteins: part II. Anal. Biochem., 207, 19-23. Ribadeau Dumas, B., Brignon, G., Groschamet, F., & Mercier, J. (1972). Structure primaire de la caseine beta bovine. Eur. J. Biochem., 25, 505-514. Riehm, D., & Scheraga, H. (1965). Structural studies of ribonuclease. XVII. A reactive carboxyl group in Ribonuclease. Biochemistry, 4, 772-782. Roepstorff, P., & Fohlman, J. (1984). Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed. Mass Spectrom., 11, 601. Rose, K., Stocklin, R., Savoy, L., Regamey, P., Offord, R., Vuagnat, P., et al. (1991). Protein Eng., 4. Rosen, G., Shoshani, M., Naor, R., & Sela, M. N. (2001). The purification and characterization of an 88kDa Porphyromonas endodontalis 35406 protease. Oral Microbiol Immunol, 16(6), 326-331. Rosengren, A., Bjellqvist, B., & Gasparic, V. (1976). Method for generating a pH-function for use in electrophoresis. US. 
Roth, K., Huang, Z., Sadagopan, N., & Watson, J. (1998). Charge derivatization of peptides for analysis by mass spectrometry. Mass Spectrom. Rev., 17, 255-274. Rusconi, F., Guillonneau, F., & Praseuth, D. (2002). Contributions of mass spectrometry in the study of nucleic acid-binding proteins and of nucleic acid-protein interactions. Mass Spectrom Rev, 21(5), 305-348. Ryle, A., & Porter, R. (1959). Biochem. J., 73, 75-86. Ryzhov, V., Bundy, J., Fenselau, C., Taranenko, N., Doroshenko, V., & Prasad, C. (2000). Matrix-assisted laser desorption/ionisation time-of-flight analysis of bacillus spores using a 2.94 µm infrared laser. Rapid Commun. Mass Spectrom., 14, 1701-1706. Sakiyama, F., Suzuki, M., Yamamoto, A., Aimoto, S., Norioka, S., Masaki, T., et al. (1990). J. Protein Chem., 9, 297-298. Salih, B., & Zenobi, R. (1998). MALDI mass spectrometry of dye-peptide and dye-protein complexes. Anal. Chem., 70, 1536-1543. Sanchez, J., & Hochstrasser, D. (1998). In A. Link (Ed.), Methods in molecular biology: 2-D proteome analysis protocols (Vol. 112, pp. 227-233). Totowa, NJ: Humana Press. Sanchez, J., Wirth, P., Jaccoud, S., Appel, R., Sarto, C., Wilkins, M., et al. (1997). Electrophoresis, 18, 638-641. Sanchez, J. C., Wirth, P., Jaccoud, S., Appel, R. D., Sarto, C., Wilkins, M. R., et al. (1997). Simultaneous analysis of cyclin and oncogene expression using multiple monoclonal antibody immunoblots. Electrophoresis, 18(3-4), 638-641. Savige, W., & Fontana, A. (1977). Methods Enzymol., 47, 459-469. Schar, M., Borsen, K., Gassmann, E., & Widmer, H. (1991). Monitoring of carboxypeptidase digestion by MALDI-MS. Chimia, 45, 123-126. Scheele, G. (1975). J. Biol. Chem., 250, 5375-5385. Scheler, C., Lamer, S., Pan, Z., Li, X., Salnikov, J., & Jungblut, P. (1998). Peptide mass fingerprint sequence coverage from differently stained proteins on 2-DE patterns by MALDI-MS. Electrophoresis, 19, 918-927. Scherl, A., Coute, Y., Deon, C., Calle, A., Kindbeiter, K., Sanchez, J. C., et al. 
(2002). Functional proteomic analysis of human nucleolus. Mol Biol Cell, 13(11), 4100-4109.
Schlack, P., & Kumpf, W. (1926). Hoppe-Seyler’s Z. Physiol. Chem., 154, 125-170. Schleuder, D., Hillenkamp, F., & Strupat, K. (1999). IR-MALDI-mass analysis of electroblotted proteins directly from the membrane: comparison of different membranes, application to on-membrane digestion, and protein identification by database searching. Anal. Chem., 71, 3238-3247. Schmidt, M., Krause, E., Beyermann, M., & Bienert, M. (1995). Pept. Res., 8, 238-242. Schnolzer, M., Jedrzejewski, P., & Lehmann, W. (1996). Protease-catalyzed incorporation of O-18 into peptide-fragments and its application for protein sequencing by electrospray and matrix-assisted laser desorption/ionization mass-spectrometry. Electrophoresis, 17(5), 945-953. Schreiner, M., Strupat, K., Lottspeich, F., & Eckerskorn, C. (1996). UV-MALDI-MS of electroblotted proteins. Electrophoresis, 17, 954-961. Schutz, F., Kapp, E. A., Simpson, R. J., & Speed, T. P. (2003). Deriving statistical models for predicting peptide tandem MS product ion intensities. Biochem Soc Trans, 31(Pt 6), 1479-1483. Schwartz, J. C., & Jardine, I. (1996). Quadrupole ion trap mass spectrometry. Methods in Enzymology, 270, 552-586. Schwert, G., & Takenaka, Y. (1955). A Spectrophotometric Determination of Trypsin and Chymotrypsin. Biochim Biophys Acta, 16, 570. Sealy, P., & Southern, E. (1982). In Gel electrophoresis of nucleic acids: A practical approach (pp. 7172). Oxford: IRL Press. Sechi, S. (2002). A method to identify and simultaneously determine the relative quantities of proteins isolated by gel electrophoresis. Rapid Commun. Mass Spectrom., 16(15), 1416-1424. Sechi, S., & Chait, B. (1998). Modification of cysteine residues by alkylation. A tool in peptide mapping and protein identification. Anal Chem., 70(24), 5150-5158. Seiler, N., Thobe, J., & Werner, G. (1970). Hoppe Seylers Z. Physiol. Chem., 351, 865-868. Sepetov, N. F., Issakova, O. L., Lebl, M., Swiderek, K., Stahl, D. C., & Lee, T. D. (1993). 
The use of hydrogen-deuterium exchange to facilitate peptide sequencing by electrospry tandem mass spectrometry. Rapid Commun. Mass Spectrom., 7, 58-62. Shapiro, A., Vinuela, E., & Maizel, J. (1967). Biochem. Biophys. Res. Commun., 28, 815-820. Shen, T. L., & Allison, J. (2000). Interpretation of matrix-assisted laser desorption/ionization postsource decay spectra of charged-derivatized peptides: some examples of Tris[(2,4,6Trimethoxyphenyl)phosphonium]-tagged proteolytic digestion products of phosphoenolpyruvate carboxykinase. J. Am. Soc. Mass Spectrom., 11, 145-152. Shen, Y., Berger, S., & Smith, R. (2001). High-efficiency capillary isoelectric focusing of protein complexes from Escherichia coli cytosolic extracts. J. Chromatogr. A, 914(1-2), 257-264. Shevchenko, A., I, C., W, E., KG, S., B, T., M, W., et al. (1997). Rapid "De Novo" peptide sequencing by a combination of nanoelectrospray isotopic labelling and Q-ToF-MS. Rapid Commun. Mass Spectrom., 11, 1015-1024. Shevchenko, A., Jensen, O. N., Podtelejnikov, A. V., Sagliocco, F., Wilm, M., Vorm, O., et al. (1996). Linking genome and proteome by mass spectrometry: large-scale identification of yeast proteins from two dimensional gels. Proc. Natl. Acad. Sci U.S.A., 93(25), 14440-14445. Shevchenko, A., Loboda, A., Ens, W., Schraven, B., & Standing, K. G. (2001). Archived polyacrylamide gels as a resource for proteome characterization by mass spectrometry. Electrophoresis, 22(6), 11941203. Shevchenko, A., Sunyaev, S., Loboda, A., Bork, P., Ens, W., & Standing, K. G. (2001). Charting the proteomes of organisms with unsequenced genomes by MALDI-quadrupole time-of-flight mass spectrometry and BLAST homology searching. Anal Chem, 73(9), 1917-1926. Shevchenko, A., Wilm, M., Vorm, O., & Mann, M. (1996). Mass spectrometric sequencing of proteins silver-stained polyacrylamide gels. Analytical Chemistry, 68(5), 850-858. Shevchenko, S., Loboda, A., Shevchenko, A., Ens, W., & Standing, K. (2000). Anal. Chem., 72, 21322141. 
Smillie, L., Furka, A., Nagabhushan, N., Stevenson, K., & Parkes, C. (1968). Nature, 218, 343-346. Sok, D., & Sih, C. (2001). Difference in susceptibility of tyrosine residue to oxidative iodination between a thioredoxin box region and a hormonogenic region. Arch Pharm Res., 24(5), 446-454. Sonsmann, G., Romer, A., & Schomburg, D. (2002). Investigation of the influence of charge derivatization on the fragmentation of multiply protonated peptides. J Am Soc Mass Spectrom, 13(1), 47-58.
Sørensen, S., Sørensen, T., & Breddam, K. (1991). FEBS Lett., 294, 195-197. Spahr, C. S., Davis, M. T., McGinley, M. D., Robinson, J. H., Bures, E. J., Beierle, J., et al. (2001). Towards defining the urinary proteome using liquid chromatography-tandem mass spectrometry. I. Profiling an unfractionated tryptic digest. Proteomics, 1(1), 93-107. Spencer, P., Titus, J., & Spencer, R. (1975). Direct Fluorimetric Assay for Proteolytic Activity Against Intact Proteins. Anal Biochem, 64, 556-. Spengler, B. (1997). Post-source decay analysis in matrix*assisted laser desorption/ionisation mass spectrometry of biomolecules. J. Mass Spectrom., 32, 1019-1036. Spengler, B., Lutzenkirchen, F., & Kaufmann, R. (1993). On-target deuteration for peptide sequencing by laser mass spectrometry. Org. Mass Spectrom., 28, 1482-1490. Spengler, B., Lutzenkirchen, F., Metzger, S., Chaurand, P., Kaufmann, R., Jeffery, W., et al. (1997). Peptide sequencing of charged derivatives by postsource decay MALDI mass spectrometry. Int. J. Mass Spectrom. Ion Proc., 169/170, 127-140. Steinberg, T., Lauber, W., Berggren, K., Kemper, C., Yue, S., & Patton, W. (2000). Electrophoresis, 21, 497-508. Steinberg, T. H., Haugland, R. P., & Singer, V. L. (1996). Applications of SYPRO orange and SYPRO red protein gel stains. Analytical Biochemistry, 239(2), 238-245. Steinberg, T. H., Jones, L. J., Haugland, R. P., & Singer, V. L. (1996). SYPRO orange and SYPRO red protein gel stains: one-step fluorescent staining of denaturing gels for detection of nanogram levels of protein. Analytical Biochemistry, 239(2), 223-237. Stevenson, E., Breukert, K., & Zenobi, R. (2000). Internal energies of analyte ions generated from different matrix-assisted laser desorption/ionization matrices. J. Mass Spectrom., 35, 1035-1041. Stewart, J., Lee, H., & Dobson, J. (1963). Evidence for a Functional Carboxyl Group in Trypsin and Chymotrypsin. J Am Chem Soc, 85, 1537. Strupat, K., Karas, M., & Hillenkamp, F. (1991). 
2,5-Dihydroxy benzoic acid: a new matrix for laser desorption-ionization mass spectrometry. Int. J. Mass Spectrom. Ion Proc., 111, 89-102. Stults, J. (May 31-June 5, 1992). Proceeding of the 40th ASMS conference of Mass Spectrometry and Allied Topics. Paper presented at the ASMS, Washington, DC, USA. Stults, J., Lai, J., McCune, S., & Wetzel, R. (1993). Anal. Chem., 65, 1703-1708. Suckau, D., M, M., & M., P. (1992). Protein surface f topology-probing by selective chemical modification and mass spectrometric peptide mapping. Proc. Natl. Sci. USA, 89, 5630-5634. Switzer, R., Merril, C., & Shifrin, S. (1979). Anal. Biochem., 98, 231-237. Sze, E., Dominic, C., TW, & Wang, G. (1998). J. Am. Soc. Mass Spectrom., 9, 166-174. Takach, E., Hines, W., Patterson, D., Juhasz, P., Falick, A., Vestal, M., et al. (1997). Accurate mass measurements using MALDI-ToF with delayed extraction. J. Prot. Chem., 16(5), 363-369. Takamoto, K., Kamo, M., Kubota, K., Satake, K., & Tsugita, A. (1995). Eur. J. Biochem., 228, 362-372. Tal, M., Silberstain, A., & Nusser, E. (1985). Why does coomassie brillant blue R interact differently with different proteins ? a partial answer. J. Biol Chem, 260(18), 9976-9980. Tan, A., & Eaton, D. (1995). Biochemistry, 34, 5811-5816. Tanaka, K., Waki, H., Ido, Y., Akita, S., Yoshida, Y., & Yoshida, T. (1988). Rapid Commun. Mass Spectrom., 2, 151-153. Tang, C., Zhang, W., Fenyö, D., & Chait, B. (2000). Proceeding of the 48th ASMS conference of Mass Spectrometry and Allied Topics. Paper presented at the ASMS, Long Beach, CA, USA. Tang, J., & Koelsch, G. (1995). A possible function off the flaps of aspartic proteases: the capture of substrate side chains determines the specificity of cleavage positions. Protein Peptide Lett, 2, 257266. Tang, K., Taranenko, N., Allman, S., Chen, C., Chang, L., & Jacobson, K. (1994). Picolinic acid as a matrix for laser mass spectrometry of nucleic acids and proteins. Rapid Commun Mass Spectrom, 8(9), 673-676. Taranenko, N. 
I., Tang, K., Allman, S. L., Chang, L. Y., & Chen, C. H. (1994). 3-Aminopicolinic acid as a matrix for laser desorption mass spectrometry of biopolymers. Rapid Commun. Mass Spectrom., 8, 1001-1006. The Arabidopsis Initiative. (2000). Nature, 408, 796-815. The C. elegans Sequencing Consortium. (1998). Science, 282, 2012-2018.
Théberge, R., Connors, L. H., M.Skinner, & Costello, C. E. (2000). Detection of transthyretin variants using immunoprecipitation and MALDI bioreactive probes: a clinical application of mass spectrometry. J. Am. Soc. Mass Spectrom., 11, 172-175. Thevis, M., Loo, R. R., & Loo, J. A. (2003). Mass spectrometric characterization of transferrins and their fragments derived by reduction of disulfide bonds. J Am Soc Mass Spectrom, 14(6), 635-647. Thiede, B., Lamer, S., Mattow, J., Siejak, F., Dimmler, C., Rudel, T., et al. (2000). Analysis of missed cleavage sites, tryptophan oxydation and N-Terminal pyroglutamination after in-gel tryptic digestion. Rapid Commun. Mass Spectrom., 14, 496-502. Thiede, B., Salnikow, J., & Wittmann-Liebold, B. (1997). C terminal ladder sequencing by an approach combining chemical degradation with analysis by MALDI-MS. Eur. J. Biochem., 244, 750-754. Thiede, B., Wittmann-Liebold, B., Bienert, M., & Krause, E. (1995). MALDI-MS for C-terminal sequence determination of peptides ans proteins degradation by carboxypeptidase Y and P. FEBS letters, 357, 65-69. Tonella, L., Walsh, B., Sanchez, J.-C., Ou, K., Wilkins, M., Tyler, M., et al. (1998). '98 Escherichia coli SWISS-2DPAGE database update. Electrophoresis, 19(11), 1960-1971. Tonella, L., Walsh, B. J., Sanchez, J. C., Ou, K., Wilkins, M. R., Tyler, M., et al. (1998). '98 Escherichia coli SWISS-2DPAGE database update. Electrophoresis, 19(11), 1960-1971. Tonge, R., Shaw, J., Middleton, B., Rowlinson, R., Rayner, S., Young, J., et al. (2001). Validation and development of fluorescence two-dimensional differential gel electrophoresis proteomics technology. Proteomics, 1(3), 377-396. Towbin, H., Staehelin, T., & Gordon, J. (1979). Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: procedure and some applications. Proc. Natl. Sci. USA, 76, 4350-4354. Traini, M., Gooley, A. A., Ou, K., Wilkins, M. R., Tonella, L., Sanchez, J.-C., et al. (1998). 
Towards an automated approach for protein identification in proteome projects. Electrophoresis, 19, 1941-1949. Tsugita, A., K, T., M, K., & H., I. (1992). C-terminal sequencing of protein. A novel partial acid hydrolysis and analysis by mass spectrometry. Eur. J. Biochem., 206, 691-696. Tsunasawa, S., Sugihara, A., Masaki, T., Sakiyama, F., Takeda, Y., Miwatani, T., et al. (1987). J. Biochem., 101. Twerenbold, D. (1996). Rep. Progr. Phys., 59, 349-426. Twerenbold, D., Vuilleumier, J.-L., Gerber, D., Tadsen, A., Van, D., Brandt, B, & Gillevet, P. (1996). Appl. Phys. Lett, 68, 3503-3505. Unlu, M., Morgan, M. E., & Minden, J. S. (1997). Difference gel electrophoresis: a single gel method for detecting changes in protein extracts. Electrophoresis, 18, 2071-2077. Uttenweiler-Joseph, S., Neubauer, G., Christoforidis, S., Zerial, M., & Wilm, M. (2001). Automated de novo sequencing of proteins using the differential scanning technique. Proteomics, 1(5), 668-682. Valdes, I., A, P., C, G., A, B., M, L., C, N., et al. (2000). Novel procedure for the identification of proteins by mass fingerprinting combining two-dimentional electrophoresis with fluorescent SYPRO red staining. J. Mass Spectrom., 35, 672-682. van Montfort, B. A., Canas, B., Duurkens, R., Godovac-Zimmermann, J., & Robillard, G. T. (2002). Improved in-gel approaches to generate peptide maps a of integral membrane proteins with matrixassisted laser desorption/ionization time-of-flight mass spectrometry. J Mass Spectrom, 37(3), 322330. van Montfort, B. A., Doeven, M. K., Canas, B., Veenhoff, L. M., Poolman, B., & Robillard, G. T. (2002). Combined in-gel tryptic digestion and CNBr cleavage for the generation of peptide maps of an integral membrane protein with MALDI-ToF mass spectrometry. Biochim Biophys Acta, 1555(1-3), 111-115. Veenstra, T. D., Martinovic, S., Anderson, G. A., Pasa-Tolic, L., & Smith, R. D. (2000). Proteome analysis using selective incorporation of isotopically labeled amino acids. J. Am. Soc. 
Mass Spectrom., 11, 78-82. Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., et al. (2001). The sequence of the human genome. Science, 291(5507), 1304-1351. Vestling, M., & Fenselau, C. (1994). PVDF: an interface for gel electrophoresis and MALDI-MS. Biochem. Soc. Trans., 22(2), 547-551.
Villanueva, J., Canals, F., Villegas, V., Querol, E., & Avilés, F. (2000). Hydrogen exchange monitored by MALDI-ToF MS for rapid characterization of the stability and conformation of proteins. FEBS letters, 472, 27-33. Villegas, V., Vendrell, J., & Avilés, F. (1995). Protein Sci., 4, 1792-1800. Vorm, G., & Roepstorff, P. (1994). Biological mass spectrometry, 23, 734-740. Vorm, O., Roepstorff, P., & Mann, M. (1994). Improved Resolution and Very High-Sensitivity In MALDI ToF Of Matrix Surfaces Made By Fast Evaporation. Anal. Chem., 66(19), 3281-3287. Wachter, E., Machleidt, W., Hofner, H., & Otto, J. (1973). Aminopropyl glass and its p-phenylene diisothiocyanate deri^vative: a new support in solide phase Edman degradation of peptides and proteins. FEBS letters, 35(1), 97-102. Wachter, E., & Werhahn, R. (1979). In Methods in peptide and protein sequence analysis (pp. 21-33). Amsterdam: Elsevier. Walsh, B., Molloy, M., & Williams, K. (1998). Electrophoresis, 19, 1883-1890. Walter, R., Shlank, H., Glass, J., Schwartz, I., & Kerenyi, T. (1971). Science, 173, 827-829. Wang, F., & Tang, X. (1996). Biochemistry, 35, 4069-4078. Wang, M., & Fitzgerald, M. (2001). A solid sample preparation method that reduces signal suppression effects in the maldi analysis of peptide. Anal. Chem., 73, 625-631. Wang, Q., Shoeman, R., & Traub, P. (2000). Biochemistry, 39, 6645-6651. Wang, S., & Regnier, F. (2001). J. Chromatogr. A, 913, 429-436. Watts, J., Affolter, M., Krebs, D., Wange, R., Samelson, L., & Aebersold, R. (1994). J. Biol Chem, 269(47), 29520-29529. Weber, K., & Osborne, M. (1969). J. Biol. Chem., 244, 4406-4412. Wei, J., Buriak, J., & Siuzdak, G. (1999). Desorption ionization mass spectrometry on porous silicon. Nature, 399, 243-246. Weitzhandler, M., Farman, D., Rohrer, J., & Avdalovic, N. (2001). Proteomics, 1, 179-185. Wenschuh, H., Halada, P., Lamer, S., Jungblut, P., & Krause, E. (1998). 
The ease of peptide detection by MALDI-MS: the effect of secondary structure on signal intensity. Rapid Commun. Mass Spectrom., 12, 115-119. Westmacott, G., Ens, W., & Standing, K. (1996). Nucl, Instr, Methods Phys. Res. Sect., 108, 282. Westmacott, G., Zhong, F., Frank, M., Friedrich, S., Labov, E., & Benner, W. (2000). Rapid Commun. Mass Spectrom., 14, 600-607. Wheeler, D., Church, D., Lash, A., Leipe, D., Madden, T., Pontius, J., et al. (2001). Database resources of the national center for biotechnology information. Nucleic acids research, 29, 11.juin. Whittaker, J., & Bendr, H. (1965). Kinetics of papain-catalyzed hydrolysis of a N-benzoil-L-arginine ethyl ester and a N-benzoyl-L-arginamide. J Am Chem Soc, 87, 2728–2738. Whittal, R., Schriemer, D., & Li, L. (1997). Anal. Chem., 69, 2734-2741. Wilcox, P. (1967). Esterification. Meth. Enzym., 11, 605-616. Wiley, W., & McLaren, I. (1953). Time-of-flight mass spectrometer with improved resolution. Rev. Sci. Instrum., 26, 1150-1157. Wilkins, M., & Gooley, A. (1997). In Proteome Research: New Frontiers in functional genomics (pp. 3564). Berlin: Springer-Verlag. Wilkins, M., Pasquali, C., Appel, R., Ou, K., Golaz, O., Sanchez, J., et al. (1996). From proteins to proteomes: large scale protein identification by 2-D electrophoresis and amino acids analysis. Bio/techniques, 14, 61-65. Wilkins, M., Sanchez, J., Gooley, A., Appel, R., Humphery-Smith, J., Hochstrasser, D., et al. (1995). Progress with proteome projects: Why all proteins expressed by genome should be identified and how to do it. Biotechnology & genetic Engineering Reviews, 13, 19-50. Williams, K., & Hochstrasser, D. (1997). In Proteome Research: New Frontiers in functional genomics (pp. 1-12). Berlin: Springer-Verlag. Wilm, M., Shevchenko, A., Houthaeve, T., Breit, S., Schweigerer, L., Fotsis, T., et al. (1996). Femtomole sequencing of proteins from polyacrylamide gels by nano-electrospray mass spectrometry. Nature, 379(6564), 466-469. 
Wittmann-Liebold, B., Matscull, L., Pilling, O., Bradaczek, H., & Graffunder, H. (1991). In Methods in protein sequence analysis. Basel: Birkhäuser Verlag.
Wolfender, J., Chu, F., Ball, H., Wolfender, F., Fainzilber, M., Baldwin, M., et al. (1999). Identification of Tyrosine sulfation in conus pennaceus eonotoxin -PnIA and -PnIB: further investigation of labil sulfo and phosphopeptides by electrospray, MALDI and atmospheric presure MALDI-MS. J. Mass Spectrom., 34, 447-454. Wong, C., So, M., & Chan, T. (1998). Origins of the proton in the generation of protonated polymeres and peptides in matrix-assited laser desorption/ionisation. Eur. Mass Spectrom, 4, 223-232. Worrall, T., Cotter, R., & Woods, A. (1998). Purification of contaminated peptides and proteins on synthetic membrane surfaces for matrix-assisted d laser desorption/ionisation mass spectrometry. Anal. Chem., 70, 750-756. Worrall, T., Lin, H., Cotter, R., & Woods, A. (2000). On-probe sample purification of lipids for MALDIToF-MS. J. Mass Spectrom., 35, 647-650. Wu, J., & Watson, J. (1998). Optimisation of the cleavage reaction for cyanylated cysteinyl proteins for efficient and simplified mass mapping. Anal. Biochem., 258, 268-276. Wu, K., Shaler, T., & Becker, C. (1994). Time-of-flight mass spectrometry of underivatized singlestranded DNA oligomers by matrix-assisted laser desorption. Anal Chem., 66(10), 1637-1645. Wu, K., Steding, A., & Becker, C. (1993). Matrix-assisted laser desorption time-of-flight mass spectrometry of oligonucleotides using 3-hydroxypicolinic acid as an ultraviolet-sensitive matrix. Rapid Commun Mass Spectrom, 7(2), 142-146. Xiang, F., & Beavis, R. (1994). A method to increase contaminant tolerance in protein matrix-assisted laser desorption/ionization by the fabrication of thin protein-doped polycrystalline films. Rapid Commun Mass Spectrom, 8, 199-204. Yamashita, M., & Fenn, J. (1984). Phys. Chem., 88, 4451-4459. Yan, J., Harry, R., Spibey, C., & Dunn, M. (2000). Postelectrophoretic staining of proteins separated by two-dimentional gel electrophoresis using SYPRO dyes. Electrophoresis, 21, 3657-3665. 
Yan, J., Wait, R., Berkelman, T., Harry, R., Westbrook, J., Weeler, C., et al. (2000). A modified silver staining protocol for visualization of proteins compatible with matrix-assisted laser desorption/ionization and electrospray ionization mass spectrometry. Electrophoresis, 21, 3666-3672. Yang, K. S., Kang, S. W., Woo, H. A., Hwang, S. C., Chae, H. Z., Kim, K., et al. (2002). Inactivation of human peroxiredoxin I during catalysis as the result off the oxidation of the catalytic site cysteine to cysteine-sulfinic acid. J Biol Chem, 277(41), 38029-38036. Yao, X., Freas, A., Ramirez, J., Demirev, P., & Fenselau, C. (2001). Anal. Chem., 73, 2836-2842. Yates, J. R., III, Speicher, S., Griffin, P. R., & Hunkapiller, T. (1993). Peptide mass maps: A highly informative approach to protein identification. Anal Biochem, 214, 397-408. Yates, J. R. I., Eng, J. K., McCormack, A. L., & Schieltz, D. (1995). Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. Analytical Chemistry, 67(8), 1426-1436. Yonetsu, T., Higuchi, K., Tsunasawa, S., Takagi, S., Sakiyama, F., & Takeda, T. (1986). FEBS Lett., 203, 149-152. Zassenhaus, P., Hanson, K., & Wolgemuth, R. (1976). Anal. Biochem., 76, 321-329. Zenobi, R., & Knochenmuss, R. (1998). Ion formation in MALDI mass spectrometry. Mass Spectrom. Rev., 17, 337-366. Zhang, H., Andren, P., & Caprioli, R. (1995). Micro-preparation procedure for high-sensitivity matrixassisted laser desorption ionization mass spectrometry. J. Mass Spectrom., 30, 1768-1771. Zhang, R., Sioma, C., Wang, S., & Regnier, F. (2001). Fractionation of isotopically labeled peptides in quantitative proteomics. Anal Chem, 73(2), 5142-5149. Zhang, W., & Chait, B. (2000). Anal. Chem., 72, 2482-2489. Zhang, Z., & McElvain, J. (2000). Improvements in protein identification by MALDI-ToF MS peptide mapping. Anal. Chem., 72, 2337-2350. Zhou, G., Li, H., DeCamp, D., Chen, S., Shu, H., Gong, Y., et al. (2002). 
2D differential in-gel electrophoresis for the identification of esophageal scans cell cancer-specific protein markers. Mol Cell Proteomics, 1(2), 117-124. Zhu, Y., Chung, C., Taranenko, N., Allmann, S., Martin, S., Haff, L., et al. (1996). The study of 2,3,4-Trihydroxyacetophenone and 2,4,6-Trihydroxyacetophenone as matrix for DNA detection in MALDI-ToF-MS. Rapid Commun. Mass Spectrom., 10, 383-388.
Zimmerman, M., Ashe, B., Yurewics, E. C., & Patel, G. (1977). Sensitive assays for trypsin, elastase, and chymotrypsin using new fluorogenic substrates. Anal. Biochem., 78, 47-51.
CHAPTER 2
MOLECULAR SCANNER DEVELOPMENT
Toward a Clinical Molecular Scanner for Proteome Research: Parallel Protein Chemical Processing Before and During Western-blot.
Reprinted with permission from (Bienvenut et al., 1999), copyright 2003 American Chemical Society.
WV. Bienvenut, JC. Sanchez, A. Karmime, V. Rouge, K. Rose, PA. Binz, DF. Hochstrasser
ABSTRACT
In order to increase the throughput of protein identification and characterisation in proteome studies, we investigated three methods of performing protein digestion in parallel. The first, which we term "One-Step Digestion-Transfer" (OSDT), is based on protein digestion during the transblotting process. It involves the use of membranes containing immobilised trypsin which are intercalated between the gel and a PVDF collecting membrane. During electrotransfer, some digestion of the transferred proteins occurs, although poorly for basic and/or high molecular weight (MW) proteins. The second method, termed "Parallel In-Gel Digestion" (PIGD), is based on "In-Gel" digestion of all proteins in parallel. PIGD led to more efficient digestion of basic and high MW proteins (> 40 kDa) but suffered from a major drawback: loss of resolution for low MW polypeptides (< 60 kDa) through diffusion during the digestion process. The third method examined was the combination of the PIGD and OSDT procedures. This combination, called "Double Parallel Digestion" (DPD), led to greatly improved digestion of high MW and basic proteins without losses of low MW polypeptides. Peptides liberated during transblotting of proteins through the immobilised trypsin membrane were trapped on a PVDF membrane and identified by mass spectrometry in scanning mode (see Chapter 4 (Binz, Wilkins et al., 1999)).
119 W. V. Bienvenut (ed.), Acceleration and Improvement of Protein Identification by Mass Spectrometry, 119–137. © 2005 Springer. Printed in the Netherlands.
BIENVENUT ET AL.
KEYWORDS
Parallel protein digestion, semi-dry electroblot, 2-DE, MALDI-TOF-MS, peptide mass fingerprint, double parallel digestion, molecular scanner, automation, IAV-trypsin, proteome
1. INTRODUCTION
Several genomes have already been fully sequenced and many others will be in the near future. Although it is possible to extract from genome data a complete set of the potentially expressed protein amino acid sequences, in many cases this information is not sufficient to unravel the function of a newly discovered gene product: the proteins also need to be identified and characterised (Hochstrasser, 1998; Williams & Hochstrasser, 1997). The task of identifying and characterising all proteins expressed by a genome is tremendous (Williams & Hochstrasser, 1997). The word proteome has been coined to refer to the expressed protein complement of a genome (Wilkins, Pasquali et al., 1996). Massively parallel protein identification and characterisation techniques are therefore required. Several groups around the world have developed methods using liquid chromatography mass spectrometry (LC-MS) to sequentially identify and partially characterise proteins from complex biological samples (Figeys, Ducret, Yates, & Aebersold, 1996; Wilm & Mann, 1996). Matrix Assisted Laser Desorption/Ionisation-Time of Flight (MALDI-TOF) techniques have been developed to analyse intact proteins or their peptide fingerprints (Jungblut et al., 1996; Pappin, Coull, & Koster, 1990; Scheler et al., 1998; Shevchenko, Wilm et al., 1996). Several software programs have been developed to assist protein identification by comparing mass spectra obtained from mass spectrometry (MS) or MS-MS experiments with theoretical spectra from protein and DNA databases (Binz, Wilkins et al., 1999). Recently, Eckerskorn et al. (Eckerskorn et al., 1997) demonstrated the possibility of scanning a transblotted membrane with a MALDI-TOF mass spectrometer (MALDI-TOF-MS) equipped with an infrared laser and detecting intact proteins. The detection sensitivity was equal to or better than that obtained by silver-staining. Ogorzalek Loo et al. 
(Ogorzalek Loo et al., 1997a) analysed proteins directly from a polyacrylamide gel with good sensitivity and mass accuracy. Peptide mass fingerprinting (PMF), a method of choice in proteome studies, requires specific chemical or enzymatic digestion followed by MS of the resulting peptides. Up to now, the digestion step has been a sequential process in which robotics can be used for spot excision, such as the "spot picker" proposed by Traini et al. (Traini et al., 1998b). We wished to investigate whether it was possible to digest all separated proteins on a bi-dimensional electrophoresis (2-DE) gel simultaneously, and whether all resulting protein fragments could be transferred to a membrane without loss of spatial resolution or of subsequent mass spectrometric sensitivity. Were this all to be possible, proteins separated on 2-DE could be digested in parallel, transblotted to a membrane, and MALDI-TOF-MS scanning of the membrane would then provide a massively parallel way to rapidly and partially
characterise thousands of proteins with an appropriate integrated software system (Hochstrasser, 1998) able to treat mass spectra and to create a fully annotated image. In this article, we present three approaches to parallel sample preparation. The first method (OSDT) was developed to digest proteins previously separated by 1-DE or 2-DE during the transblotting process. The second one (PIGD) involves applying the standard IGD procedure to the whole gel, followed by transblotting. The third method, called double parallel digestion (DPD), is the combination of the two previous methods. These three methods are compared with the standard sequential methods for protein digestion, i.e. in-gel (IGD) or on-membrane digestion (OMD).
2. EXPERIMENTAL SECTION
2.1. Reagents
Sequencing-grade modified trypsin was purchased from Promega (Madison, WI, USA). Immobilon™ AV membranes were purchased from Millipore (Bedford, MA, USA). Acrylogel-PIP 2.6%C solution was purchased from BDH (Poole, England). Trans-Blot® PVDF membrane and Broad range Mr sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) standards containing bovine pancreatic trypsin inhibitor (BPT1, 6.5 kDa), chicken lysozyme (LYC, 14.3 kDa), soybean trypsin inhibitor (ITRA, 20.1 kDa), bovine carbonic anhydrase (CAH2, 28.9 kDa), chicken ovalbumin (OVAL, 42.7 kDa), bovine serum albumin (ALBU, 66.4 kDa), rabbit phosphorylase b (PHS2, 97.2 kDa), E. coli β-galactosidase (BGAL, 116.4 kDa) and rabbit myosin (MYSS, 223 kDa) were purchased from Bio-Rad (Richmond, CA, USA). Trifluoroacetic acid (TFA), tris(hydroxymethyl)aminomethane (Tris), 3-[cyclohexylamino]-1-propanesulfonic acid (CAPS), trypsin (type IX from porcine pancreas, dialysed and lyophilised) and α-cyano-4-hydroxy-trans-cinnamic acid (ACCA) were purchased from Sigma (St. Louis, MO, USA) and were of analytical grade. 
Acetonitrile (ACN), calcium chloride, ethanolamine, glycine, α-tosyl-L-arginine methyl ester (TAME) and SDS were purchased from Fluka (Buchs, Switzerland) and were of analytical grade except for ACN (preparative HPLC grade). Ethanol, hydrochloric acid, methanol, sodium bicarbonate, sodium chloride, sodium dihydrogen phosphate and polyoxyethylene sorbitan monolaurate (Tween 20) were purchased from Merck (Darmstadt, Germany). MilliQ water (Millipore) was used when necessary. Immobilised pH gradient strips were purchased from Amersham Pharmacia Biotech (Uppsala, Sweden).
2.2. Covalent attachment of trypsin and blocking of the IAV membrane
IAV membrane is a commercially available modified PVDF membrane. Its activated carboxyl groups are reactive towards nucleophiles such as amine groups of proteins
or peptides. Trypsin was immobilised on this membrane according to the manufacturer's instructions (Immobilon Tech Protocol: TP014, TP015, TP018). Briefly, a 10 × 12 cm IAV membrane was wetted in a solution of trypsin (2.0 mg/ml in 20 mM sodium dihydrogen phosphate buffer, pH 7.8) and then incubated in a rotating hybridiser HB-2D (Techne, Cambridge, England) at room temperature for 3 hours. The membrane was washed 3 times rapidly and vigorously in 10 ml of PBS-Tween solution (20 mM sodium dihydrogen phosphate, 140 mM sodium chloride and 0.5% Tween 20, pH 7.4) to remove unreacted trypsin, then incubated for 3 hours with 10 ml of ethanolamine (1 M in 1 M sodium bicarbonate buffer pH 9.5, final pH 10.5) at 4°C to block the remaining activated carboxyl groups of the membrane. After this capping step, the membrane was washed 3 times rapidly and vigorously in 10 ml of PBS-Tween solution and then twice for 30 minutes in 10 ml of PBS-Tween solution. Membranes were stored at 4°C in a 46 mM Tris-HCl, 1 mM calcium chloride and 0.1% sodium azide buffer solution, pH 8.1.
2.3. Activity measurement of trypsin covalently bound to the IAV membrane
The tryptic activity of the IAV-trypsin membrane was determined using the trypsin assay reagent TAME. One cm² of IAV-trypsin membrane was immersed in a mixture composed of 2.6 ml of 460 mM Tris-HCl, 11.5 mM calcium chloride, pH 8.1 solution, 0.3 ml of 10 mM TAME solution and 0.1 ml of 1 mM HCl solution. After 40 seconds of vigorous stirring, the optical density of the solution was measured at 247 nm with a UV-Visible spectrophotometer (Ultrospec III, Amersham Pharmacia Biotech). A second measurement was performed after 3 minutes of constant vigorous stirring. The value of ΔA247/min was used to calculate the equivalent amount of active trypsin (expressed per unit surface area) as described previously (ref. 14).
2.4. 1-DE and 2-DE separation
1-DE was conducted essentially according to Laemmli (Laemmli, 1970) with 12% T and 2.6% C for linear polyacrylamide. 
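The "12% T and 2.6% C" notation above can be made concrete with a short calculation. The sketch below assumes the usual convention that %T is the total monomer mass (acrylamide plus crosslinker) in grams per 100 ml, and %C is the crosslinker share of that total in percent. The function name and the 10 ml casting volume are illustrative and are not taken from the article, which used a premixed Acrylogel-PIP 2.6%C solution.

```python
def gel_monomer_masses(percent_t, percent_c, volume_ml):
    """Return (acrylamide_g, crosslinker_g) for a gel of the given volume.

    Assumed convention (not stated in the article):
      %T = total monomer (acrylamide + crosslinker) in g per 100 ml
      %C = crosslinker as a percentage of the total monomer mass
    """
    total_g = percent_t / 100.0 * volume_ml       # total monomer mass
    crosslinker_g = total_g * percent_c / 100.0   # crosslinker share of it
    acrylamide_g = total_g - crosslinker_g        # remainder is acrylamide
    return acrylamide_g, crosslinker_g


if __name__ == "__main__":
    # Illustrative 10 ml gel at 12% T, 2.6% C (volume is a made-up example).
    acrylamide, crosslinker = gel_monomer_masses(12.0, 2.6, 10.0)
    print(f"acrylamide: {acrylamide:.4f} g, crosslinker: {crosslinker:.4f} g")
```

With a premixed stock at fixed %C, only the dilution to the target %T has to be worked out at the bench; the function is mainly useful for checking what a given T/C pair means in mass terms.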
Protein migration was carried out using MiniProtean II electrophoresis apparatus (Bio-Rad) operated at 200 V for 45 minutes. For mini 2-DE, protein separation from human plasma was conducted according to Sanchez et al. (Sanchez & Hochstrasser, 1998) using immobilised pH gradient strips 3.5-10 and 5-5.5. When necessary, the gel was stained with Coomassie Brilliant Blue (CBB) R250 (0.1% w/v), methanol (30% v/v) and acetic acid (10% v/v) for 30 minutes and destained with repeated washes of methanol (40% v/v) and acetic acid (10% v/v).
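Returning to the activity measurement of section 2.3, the conversion from the two absorbance readings to an activity per unit membrane area can be sketched as follows. This is a minimal illustration and not the authors' calculation. The 3.0 ml assay volume follows from the volumes given in the text (2.6 + 0.3 + 0.1 ml), and the readings are taken at 40 s and 3 min. The molar absorbance change of 540 M⁻¹ cm⁻¹ at 247 nm and the 1 cm path length are assumptions commonly used for TAME assays and are not stated in the article. The absorbance values and the specific activity of the reference trypsin are purely hypothetical.

```python
def delta_a_per_min(a_start, a_end, t_start_s, t_end_s):
    """Absorbance change per minute between two timed readings."""
    return (a_end - a_start) / ((t_end_s - t_start_s) / 60.0)


def tame_units(delta_a_min, volume_ml=3.0, delta_epsilon=540.0, path_cm=1.0):
    """Micromoles of TAME hydrolysed per minute.

    delta_epsilon (M^-1 cm^-1) and path_cm are assumed values,
    not taken from the article.
    """
    molar_rate = delta_a_min / (delta_epsilon * path_cm)  # mol / (L * min)
    return molar_rate * (volume_ml / 1000.0) * 1e6        # micromol / min


def equivalent_trypsin_ug(units, specific_activity_units_per_mg):
    """Mass of free trypsin (micrograms) that would give the same activity."""
    return units / specific_activity_units_per_mg * 1000.0


if __name__ == "__main__":
    # Hypothetical readings for 1 cm^2 of membrane at 40 s and 180 s.
    rate = delta_a_per_min(0.100, 0.240, 40.0, 180.0)
    units = tame_units(rate)
    # 200 units/mg is an illustrative specific activity for free trypsin.
    mass = equivalent_trypsin_ug(units, 200.0)
    print(f"dA247/min = {rate:.3f}")
    print(f"activity = {units:.3f} micromol/min per cm^2")
    print(f"equivalent active trypsin = {mass:.2f} micrograms per cm^2")
```

Because the assay used exactly 1 cm² of membrane, the result is already expressed per unit surface area; for a different piece size the values would be divided by the area.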
BIENVENUT ET AL.
2.5. In-gel digestion

Protein spots were excised from the gel and then digested with trypsin using previously published procedures (Sanchez & Hochstrasser, 1998; Shevchenko, Wilm et al., 1996), modified as described below. The piece of gel was first destained with 200 µl of 50 mM ammonium bicarbonate, 50% ACN for 1 hour at 37°C. The destaining solution was removed and the gel was dried in a vacuum centrifuge (Speed Vac, Savant). Gel pieces were reswollen with 20 µl of 20 mM ammonium bicarbonate and 4 µl of 0.1 µg/µl trypsin. After overnight incubation at room temperature, the gel was dried under high vacuum to evaporate solvent and volatile salts. Then 20-40 µl of 50% ACN, 0.3% TFA were added and the gel was sonicated for 15 minutes to extract peptides. A control extraction (blank) was performed using a piece of the gel from a region between the protein bands.

2.6. On-membrane digestion

Proteins previously separated by 1-DE or 2-DE were electroblotted onto a PVDF membrane using the semi-dry method, essentially according to a previous description (Jin & Cerletti, 1992), with 10 mM CAPS pH 11 or half-strength Towbin's (½ Towbin) pH 8.4, both with 0.01% SDS in 10% methanol, in a laboratory-made semi-dry apparatus. Transfer was complete after 3 hours at 1 mA/cm². PVDF membranes were stained with amido black (0.5% w/v), isopropanol (25% v/v) and acetic acid (10% v/v) for 1 minute and destained with repeated washing with deionised water. Tryptic digestion was performed according to previous work (Pappin et al., 1996), modified as described below. Pieces of membrane were excised and destained with 500 µl of 50% methanol for 2 hours at room temperature. Following removal of the supernatant and drying of the membrane, 10 µl of 50 mM ammonium bicarbonate, 30% ACN and 4 µl of 0.1 µg/µl trypsin were added and incubated overnight at room temperature.
The supernatant was collected and the membrane was extracted with 20 µl of 80% ACN for 15 minutes with sonication to recover the peptides from the PVDF. The extract was pooled with the previous supernatant. After drying in the vacuum centrifuge, the digested material was resuspended in 30% ACN, 0.1% TFA. A control extraction (blank) was performed using a piece of the gel from a region between the protein bands.

2.7. OSDT process

Immediately after the SDS-PAGE protein separation, gels were soaked in deionised water for 5 minutes, and then equilibrated for 10 minutes in ½ Towbin buffer containing 0.01% (w/v) SDS. Electrotransfer was carried out in a laboratory-made semi-dry apparatus overnight at room temperature. In order to increase the migration time of
MOLECULAR SCANNER DEVELOPMENT
the protein through the IAV membranes during transfer (and thereby allow more time for digestion to take place), we used an asymmetrical alternating voltage. We selected a square-wave alternating voltage: +12.5 V for 125 ms followed by -5 V for 125 ms, repetitively. The transblotting process was completed after 12-18 hours. To perform the digestion during the electroblotting, a double layer of IAV-trypsin membrane was intercalated between the polyacrylamide gel (where the protein resided) and the PVDF membrane (which acted as the collecting surface), to create a transblot-digestion sandwich. After the transfer procedure, the PVDF membranes were washed in deionised water for 5 minutes (and stained when required).

2.8. PIGD

Immediately after SDS-PAGE protein separation, gels were soaked 3 times in deionised water for 5 minutes. The entire wet gel or a selected part of it was air dried at room temperature overnight. The gel was then rehydrated and incubated at 35°C with a volume (corresponding to 3-5 times the initial volume of the gel) of 0.1 mg/ml trypsin in 10 mM Tris-HCl, pH 8.2. After 30 minutes of incubation for rehydration and partial protein digestion, the excess trypsin solution was removed. The gel was then incubated for a further 30 minutes at 35°C to complete the digestion. Proteins and peptides contained in the gel were electroblotted onto PVDF membranes using the procedure described above.

2.9. DPD combined method

After migration, gels were soaked and dried as for the PIGD procedure. They were rehydrated with 0.05 mg/ml trypsin in 10 mM Tris-HCl, pH 8.2 for 30 minutes at 35°C. At this stage, the gel was transblotted onto a PVDF membrane using the OSDT process.

2.10. MALDI-TOF-MS

MS measurements from PVDF membranes and liquid solutions were conducted with a Voyager™ Elite MALDI-TOF mass spectrometer (PerSeptive Biosystems, Framingham MA, USA) equipped with a 337 nm nitrogen laser.
The analyser was used in the reflectron mode at an accelerating voltage of 20 kV, a delayed extraction parameter of 140 ns and a low mass gate of 850 Da. Laser power was set slightly above threshold for molecular ion production. Spectra were obtained by summation of 10 to 256 consecutive laser shots. For both IGD and OMD, solutions were used directly without further sample preparation or cleanup prior to MALDI-TOF-MS analysis. One µl of the digested protein solution was loaded on the stainless steel MALDI sample plate, and 1 µl of matrix solution (4 mg/ml ACCA in 30% ACN, 0.1% TFA) was added and air-dried. Autolysis products of trypsin were used as
internal calibrants (singly protonated peptides 98-107 and 58-77). For PVDF membranes, two different methods were used: sequential or automatic. The sequential method was used only to obtain a single MALDI spectrum from a limited portion of the membrane. Small pieces of PVDF (1 × 1 to 2 × 4 mm²) were cut and fixed with silicone grease to an appropriately modified MALDI sample plate. One µl of matrix solution (5 mg/ml ACCA in 70% MeOH) was deposited onto the PVDF membrane. For internal calibration purposes, the matrix solution also contained two synthetic peptides. Development of an automated procedure which enables scanning of the membrane is described in detail in the article by Binz et al. (Binz, Muller et al., 1999).

2.11. Post-acquisition processing and software identification tools

Measured masses were submitted to the PMF search tool PeptIdent (Binz, Wilkins et al., 1999; Wilkins et al., 1999) (http://www.expasy.ch/tools/) located on the ExPASy server (http://ch.expasy.org). Some restrictions were applied. The apparent masses of the parent protein based on electrophoretic migration were used with a margin of ±20%. The species of origin of the various proteins were also taken as known. No pI limits were introduced to restrict the search for the SDS-PAGE standards. For 2-DE, pI and MW values were determined by gel matching to the human plasma SWISS-2DPAGE master gel available on the ExPASy server (http://www.expasy.ch/ch2d/). These values were used with a tolerance of ±1 pI unit and 30% of the MW. The peptide mass tolerance used was ±0.2 Da for both 1-DE and 2-DE. Cysteine and methionine modifications were chosen on the web submission form depending on the chemical treatment applied to the protein sample. FindMod (http://www.expasy.ch/tools/) was used for peptide identification when comparing digestion efficiency. To compare the efficiency of different techniques, we calculated the percentage of amino acids covered by the identified fragments for each protein.
The better the digestion efficiency, the fewer missed cleavages (MC) were found.

3. RESULTS

3.1. Activity measurement of trypsin covalently bound to the IAV membrane

The enzyme surface density determined by the TAME test was 0.90 ± 0.20 µg of active trypsin per cm² (53 TAME tests and 18 membranes tested). No correlation was found between the initial trypsin concentration during the membrane preparation and the surface density (Corr.: -0.18). Activity remained stable for up to a year when membranes were stored in 46 mM Tris-HCl, 1 mM CaCl2, 0.1% azide buffer at 4°C. Membranes could be reused: tryptic activity decreased slightly after at least 3 cycles of transblot-digestion but was still sufficient.

Table 1. Results of IGD, OMD, OSDT, PIGD and DPD expressed as percentage of sequence coverage (% Cov) and efficiency of the digestion (peptides with 0 to 2 missed cleavages: 0MC, 1MC and 2MC). A dash indicates that no value is listed.

Protein  | IGD                        | OMD                        | OSDT                       | PIGD                       | DPD
         | % Cov    0MC    1MC    2MC | % Cov    0MC    1MC    2MC | % Cov    0MC    1MC    2MC | % Cov    0MC    1MC    2MC | % Cov    0MC    1MC    2MC
MYSS     |  18.1   44.8   33.3   22.2 |     -      -      -      - |     -      -      -      - |     -      -      -      - |   8.5   61.5   38.5    0.0
BGAL     |  35.3   90.0   10.0    0.0 |   9.7   87.5   12.5    0.0 |  32.4   50.0   50.0    0.0 |  27.0   74.4   19.9    5.7 |  35.0   87.0   13.0    0.0
PHS2     |  43.6   82.3   11.8    8.8 |  29.5   81.2   18.8    0.0 |  16.8   67.0   24.4    8.6 |  28.8   63.8   30.1    6.1 |  27.6   61.1   27.8   11.1
ALBU     |  49.4   87.5   12.5    0.0 |  21.3   57.4   33.2    9.4 |  12.8   55.2   42.9   15.7 |  35.4   56.5   21.1   22.3 |  25.4   57.1   42.9    0.0
OVAL     |  35.1   83.3   16.7    0.0 |  46.3   73.2   26.8    0.0 |  27.5   80.2   19.8    0.0 |  25.1   87.5   12.5    0.0 |  26.0   16.7   83.3    0.0
CAH2     |  39.4  100.0    0.0    0.0 |  42.3   85.4    8.3    6.3 |  25.8   75.6   21.4    3.3 |  26.3   87.5   12.5    0.0 |  24.7  100.0    0.0    0.0
ITRA     |  64.1  100.0    0.0    0.0 |  44.2   84.2   15.8    0.0 |  25.6   73.6   26.4    0.0 |  26.9   71.4   28.6    0.0 |  13.1   71.4   28.6    0.0
LYC      |  61.2   90.0   10.0    0.0 |  56.6   95.8    4.2    0.0 |     -      -      -      - |  55.6   43.2   56.8    0.0 |  16.6   85.7   14.3    0.0
BPT1     |  60.3   66.7   33.3    0.0 |  60.3   66.7   33.3    0.0 |     -      -      -      - |  69.3   75.0   25.0    0.0 |  48.3  100.0    0.0    0.0
Av.      |  45.2   82.7   13.8    3.5 |  38.8   78.9   19.1    2.0 |  23.5   67.1   30.6    2.3 |  34.4   69.9   25.8    4.3 |  25.0   78.6   20.2    2.0
±        |  14.3                      |  16.3                      |   6.7                      |  17.0                      |  11.9

3.2. IGD

Figure 1A shows the 1-DE separation of the broad range MW standard proteins. Protein IGD was performed on a gel with a similar separation, using 1 µg for each protein band. Results of this digestion are summarised in Table 1. The average protein sequence coverage obtained with tryptic digestion is 45.2 ± 14.0% (9 tested proteins). These results are better than those obtained by Scheler et al. (Scheler et al., 1998) for tryptic IGD of CBB R250-stained proteins (27.8 ± 3.0% coverage, 4 tested proteins) and similar to their results for tryptic digestion of CBB G250-stained proteins (39.5 ± 13.5% coverage, 9 tested proteins). Mass spectra gave sufficient information to identify all 9 studied proteins with the PMF technique.
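As a quick check on how an average and its spread relate to the individual values, the sketch below recomputes them from the nine IGD % Cov entries of Table 1. It assumes that the ± value is a standard deviation; the text does not say which estimator was used, so both the population and the sample form are shown.

```python
from statistics import mean, pstdev, stdev

# % Cov values listed for the nine proteins digested by IGD (Table 1).
igd_coverage = [18.1, 35.3, 43.6, 49.4, 35.1, 39.4, 64.1, 61.2, 60.3]

average = mean(igd_coverage)
population_sd = pstdev(igd_coverage)  # divides by n
sample_sd = stdev(igd_coverage)       # divides by n - 1

print(f"mean = {average:.1f}%")
print(f"population SD = {population_sd:.1f}, sample SD = {sample_sd:.1f}")
```

The mean comes out at 45.2% and the population form of the standard deviation at 14.3, the ± entry listed for IGD in Table 1.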
Figure 1. 1-DE separation of broad range molecular weight standards on 12% PAGE (Coomassie Blue stained) (A) or on PVDF membranes (amido black stained) obtained after different transblotting processes: standard transblotting using ½ Towbin buffer without SDS (B) or with 0.01% SDS (C), using CAPS buffer with 0.01% SDS (D), and different digestion methods including OSDT (E), PIGD (F) and DPD (G)
3.3. OMD

This technique can afford high recovery of protein during the transblotting process, although high MW and basic proteins present transblotting difficulties. Addition of a
small amount of SDS helps the migration of high MW proteins but has little to no effect on basic proteins (Mozdzanowski & Speicher, 1992). Addition of 0.01% SDS to the ½ Towbin buffer was found to have a positive effect on the transfer of phosphorylase b and β-galactosidase (Figure 1B-C). For the more basic proteins (pI > 8.5), a more basic buffer such as CAPS pH 11 is much more effective than ½ Towbin, pH 8.4. As an example in Figure 1B, myosin (MW = 220 kDa), lysozyme (pI = 9.3) and pancreatic trypsin inhibitor (pI = 9.2) are not detectable when ½ Towbin is used, in spite of the presence of 0.01% SDS. CAPS buffer is more effective for the transblot of basic proteins (Figure 1D), but high MW proteins such as myosin are still not detectable. Because of these difficulties in transblotting some proteins from 1-DE or 2-DE gels to the PVDF membrane, it is not possible to identify all proteins. The results of OMD obtained for transblotted proteins using the best conditions (CAPS buffer, 0.01% SDS) are summarised in Table 1. Mass spectra gave sufficient information to identify 8 of the 9 proteins studied (myosin is missing) with the PMF technique. The amino acid sequence coverage was 38.8 ± 16.3% (9 tested proteins). IGD and OMD were thus very similar for sequence coverage and also for digestion efficiency, with 70 to 80% of peptides with no MC, around 20% with 1 MC and 0 to 5% with 2 MC. These two methods, commonly used for protein identification by PMF, were the reference methods with which our new approaches were compared.

3.4. OSDT

The electrotransfer process was optimised using different voltage profiles. The continuous voltages usually applied for the electroblot process were not satisfactory in terms of protein digestion in the range studied (10 to 40 V, data not shown). We postulated that proteins did not stay long enough in the tryptic interface, and were not "shaken" enough within it, to be efficiently digested.
In order to reduce the effective migration rate of protein through the IAV membranes during the transfer, we applied an asymmetrical alternating voltage with a square wave form. Due to our choice of square-wave voltage, the effective voltage applied was 3.7 V and the transblotting process was completed after 12-18 hours. During this period, in which IAV-trypsin membranes were intercalated between the SDS-PAGE gel and the PVDF membrane, only a small loss of protein or polypeptide resolution by diffusion was evident (Figure 1E, to be compared with the normal transblot, Figure 1D). However, since this process is based on electroblotting, the problems of extracting high MW and basic proteins from the gel described above still apply. This is shown in Figure 1E, where pancreatic trypsin inhibitor, lysozyme and myosin are not visualised on the collecting PVDF membrane. The results obtained with this combined digestion-transfer method show lower sequence coverage (23.5 ± 6.7%, 27 tested proteins) as well as lower digestion efficiency (more missed cleavages) than IGD (45.2 ± 14.0% coverage) or OMD (38.8 ± 16.3% coverage). Mass spectra gave sufficient information to identify 6 of the 9 proteins studied with the PMF technique.
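The effective voltage quoted for the OSDT transfer can be reproduced if it is taken to mean the time-averaged voltage over one period of the square wave. That interpretation is an assumption, since the text does not define the term. A minimal sketch of the arithmetic, using the waveform given in section 2.7:

```python
def time_averaged_voltage(segments):
    """Time-weighted mean of a piecewise-constant waveform.

    segments: list of (voltage_in_volts, duration_in_ms) covering one period.
    """
    total_time = sum(duration for _, duration in segments)
    return sum(voltage * duration for voltage, duration in segments) / total_time


# One period of the asymmetrical square wave: +12.5 V for 125 ms, then -5 V for 125 ms.
period = [(12.5, 125.0), (-5.0, 125.0)]
effective_v = time_averaged_voltage(period)
print(f"time-averaged voltage = {effective_v:.2f} V")
```

With these values the time average is 3.75 V, in line with the figure of about 3.7 V given in the text. Lengthening the negative phase or raising its amplitude would lower the net migration rate further.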
3.5. PIGD
The rehydrated gel was transblotted using standard conditions, with CAPS buffer to improve the transfer of digested basic or high MW proteins. Most of the digested proteins were transblotted to the PVDF membrane (Figure 1F) under the best electroblotting conditions. The results obtained with this technique in terms of sequence coverage were good for basic and high MW proteins (69.3% coverage for pancreatic trypsin inhibitor, 55.6% for lysozyme, 35.4% for albumin, 28.8% for PHS2, see Table 1). In terms of digestion efficiency, this method is similar to OSDT, with 67% of 0 MC, 30% of 1 MC and 2% of 2 MC. Mass spectra gave sufficient information to identify 8 of the 9 proteins studied with the PMF technique.

3.6. DPD applied to 1-DE

Peptide masses corresponding to the peptides matched using the PeptIdent tool are labelled. The combination of the 2 techniques led to a great improvement in protein digestion and in the transfer of polypeptide fragments to the collecting membrane (Figure 1G). The results of this method are summarised in Table 1. In terms of sequence coverage, the results obtained are lower than those of the standard and PIGD methods but similar to those of the OSDT technique. The digestion quality appeared similar to that of the sequential digestion methods, with 80% of 0 MC, 20% of 1 MC and 0-4% of 2 MC (compare with IGD and OMD, Table 1). Mass spectra gave sufficient information to identify all 9 studied proteins with the PMF technique.

Table 2: Results for PIGD, OSDT and DPD applied to the mini 2-DE gel*

                                          | PIGD          | OSDT          | DPD
% Coverage                                | 46.5 ± 13.9   | 38.6 ± 14.8   | 67.2 ± 11.3
0 MC                                      | 29.6          | 38.3          | 40
1 MC                                      | 63            | 28.3          | 50
2 MC                                      | 7.4           | 33.3          | 10
No. of identified peptides                | 12 ± 5        | 10 ± 4        | 20 ± 3
Average intensity of the 5 highest peaks  | 18500 ± 7500  | 9500 ± 8000   | 35000 ± 6000

* Analyses were performed under the same conditions using the APO A1 spot
Figure 2: Fragment of the amido black stained PVDF membrane and MALDI-TOF-MS spectra of APA1 obtained directly from the collecting PVDF membrane with the 3 different digestion processes: A) PIGD, B) OSDT and C) DPD
3.7. Comparative digestion between OSDT, PIGD and the DPD applied to 2-DE We compared the performance of each technique with proteins of human plasma separated by mini 2-DE (Figure 2). One selected protein (Apolipoprotein A-1, APA1) was analyzed in each experiment. The MALDI-TOF-MS mass spectra obtained with the three different techniques are shown in Figure 2 and results are summarized in Table 2.
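The % coverage and missed-cleavage figures used to compare the techniques can be derived from the list of matched peptides. The sketch below is illustrative only: it assumes the common rule that trypsin cleaves after K or R except when the next residue is P, and the sequence and matched peptides are invented toy data, not the apolipoprotein A-1 results.

```python
from collections import Counter


def tryptic_boundaries(sequence):
    """Cut positions for trypsin: after K or R, except when followed by P."""
    cuts = [i + 1 for i, residue in enumerate(sequence[:-1])
            if residue in "KR" and sequence[i + 1] != "P"]
    return [0] + cuts + [len(sequence)]


def digest(sequence, max_missed_cleavages=2):
    """List (start, end, missed_cleavages) for every theoretical peptide."""
    bounds = tryptic_boundaries(sequence)
    peptides = []
    for i in range(len(bounds) - 1):
        for missed in range(max_missed_cleavages + 1):
            j = i + 1 + missed
            if j >= len(bounds):
                break
            peptides.append((bounds[i], bounds[j], missed))
    return peptides


def sequence_coverage(sequence, matched):
    """Percentage of residues covered by at least one matched peptide."""
    covered = set()
    for start, end, _ in matched:
        covered.update(range(start, end))
    return 100.0 * len(covered) / len(sequence)


def missed_cleavage_distribution(matched):
    """Percentage of matched peptides with 0, 1, 2, ... missed cleavages."""
    counts = Counter(missed for _, _, missed in matched)
    return {mc: 100.0 * n / len(matched) for mc, n in sorted(counts.items())}


# Toy example: hypothetical sequence and hypothetical set of matched peptides.
toy_sequence = "AGKLMRPEKDDR"
theoretical = digest(toy_sequence, max_missed_cleavages=1)
identified = {"AGK", "AGKLMRPEK"}
matched = [p for p in theoretical if toy_sequence[p[0]:p[1]] in identified]

print(sequence_coverage(toy_sequence, matched))
print(missed_cleavage_distribution(matched))
```

In a real analysis the matched list would come from comparing measured masses with the theoretical peptide masses within the chosen tolerance; here the match is done on the peptide strings to keep the example short.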
This comparison clearly shows the advantage of the combined method in terms of sequence coverage, which was similar for PIGD (46.5 ± 13.9%, 3 samples) and OSDT (38.6 ± 14.8%, 3 samples) but significantly higher for the combined technique (67.2 ± 11.3%, 3 samples). Signal intensity was also higher for the combined technique.

3.8. DPD applied to 2-DE

The DPD technique was applied to a mini 2-DE separation of an E. coli sample as described in materials and methods. The same sample was run on 3 gels: one was used for Coomassie blue staining (Figure 3A) and the second was electrotransferred to a PVDF membrane which was stained with amido black (Figure 3B). The DPD technique was applied to the last gel (Figure 3C) and the collecting PVDF membrane was stained with amido black; however, no spot was visible after the destaining step. A 9 x 13 mm area (representing a pI range from 5.1 to 5.2 and a MW range from 35 to 45 kDa) was cut from the collecting membrane and scanned every 300 µm by MALDI-MS. The 1536 spectra obtained were used to recreate the MS intensity image (Figure 3D). Each spectrum was also used for protein identification. Four different proteins located in 6 different positions were identified. These scanning results were confirmed by IGD of the corresponding spot of the Coomassie blue stained gel. Using this technique, the overlap of 3 proteins (IDH_ECOLI, AC: P08200; PGK_ECOLI, AC: P11665; METK_ECOLI, AC: P04384) was clearly visualised with the imaging software MELANIE (Appel, Palagi et al., 1997; Appel, Vargas, Palagi, Walther, & Hochstrasser, 1997).

4. DISCUSSION AND CONCLUSION

As a consequence of the increasing importance of proteome mapping for biological and clinical applications (Hochstrasser, 1997), new high throughput identification methods are needed.
The comparison of the different techniques proposed here highlights IGD as the gold standard for protein identification by the PMF technique, in terms of both percent coverage of the sequence (45.2 ± 14.0%) and the number of proteins identified (9 out of 9). At present, the traditional IGD method, involving sequential gel spot excision and digestion, is a bottleneck for protein identification (Hochstrasser, 1998; Houthaeve et al., 1997). Different ways to speed up the process have been proposed, e.g. using robotics for cutting spots from gels and for automating sample handling (Traini et al., 1998b). Although quicker and more reproducible than the manual procedure, the method remains a sequential one. In addition, the size of the spot to be excised is usually defined a priori and cannot be adapted to the protein quantity or to protein overlap. In contrast to these sequential approaches, we have proposed a parallel method of protein digestion. Proteins of 1-DE or 2-DE gels are treated simultaneously, thus providing a highly parallel digestion technique. Two new approaches were studied separately and also combined. All methods
produce a collection of digested protein fragments on a PVDF membrane after a transblotting process. PVDF membranes stained with amido black are shown in Figure 1B to G. Intensities of the stained proteins differ depending on the digestion technique, probably due to the different relative staining efficiencies of peptides and proteins. It is well known that protein-dye interactions are mostly electrostatic and non-specific in nature, i.e. Van der Waals or hydrogen bonds (Salih & Zenobi, 1998). Dyes such as amido black (sulfonate derivatives) act mainly through electrostatic interactions with the basic residues (lysine, arginine, histidine and the N-terminal amino group) of polypeptides. A minor part of the complex formation is due to low energy interactions. With large proteins, 1 dye molecule reacts with 1 amino group to create a negatively charged multidye (Salih & Zenobi, 1998) to which further dye molecules become associated: in the case of lysozyme, with 18 basic residues, 48 dye molecules can be bound (Tal et al., 1985). With small peptides, on the other hand, 1 dye molecule can complex more than one peptide (e.g. 2 peptides for one amido black molecule) and no low energy interactions can develop (Salih & Zenobi, 1998). Thus, the staining intensity of a protein decreases continuously as a function of the extent of digestion. Figure 2 shows apolipoprotein A1 after 2-DE and parallel digestion. For the OSDT sample, the staining intensity is the highest but the MALDI-MS spectrum intensity is the lowest, whereas for the DPD sample the spectrum intensity is the highest and the staining intensity of the protein on the collecting membrane is the lowest.

In the OSDT approach, proteins are extracted from the gel and digested during the transfer to the PVDF collecting membrane. Trypsin immobilised on IAV membranes was found suitable.
These membranes were originally designed for covalent protein microsequence analysis (Coull, Pappin, Mark, Aebersold, & Koster, 1991; Pappin, Coull, & Koster, 1990) and were also used for polypeptide or enzyme immobilisation (Canas, Dai, Lackland, Poretz, & Stein, 1993; Seo et al., 1993). In our procedure, trypsin was attached covalently to the IAV membrane and then used in the transblotting sandwich. Other enzymes could be similarly immobilised on the IAV membrane. A compromise must be found between a buffer which facilitates efficient polypeptide transfer (especially of high MW and basic fragments) and one in which trypsin is active. Due to the limited pH range for optimum trypsin activity, a basic buffer like CAPS cannot be used. With ½ Towbin at a pH suitable for tryptic activity, the transblotting buffer was not at the optimum pH and composition for the transfer of basic and high MW proteins. Low MW proteins (below 60 kDa) showed higher sequence coverage than larger proteins (except BGAL: in spite of its high MW, this protein transferred under all blotting conditions). Basic and high MW proteins generally presented problems for the transblot process and therefore for high yield digestion. As a consequence, OSDT is particularly well adapted to low MW polypeptides (< 60 kDa) with a pI below 8.5 and cannot be applied successfully to high MW proteins or to most basic proteins.
Figure 3: DPD result from a mini 2-DE of E. coli extract. A) Coomassie Blue stained mini 2-DE gel. B) Amido Black stained PVDF membrane after electrotransfer. The image corresponds to the dotted area in A). C) Amido Black stained PVDF membrane after DPD. The image corresponds to the dotted area in A). D) MS intensity image obtained after MS scanning of the area of C) delimited by the straight line (pI range: 5.1-5.2, MW range: 35-45 kDa, real size: ~ 9 x 13 mm, 1536 MS spectra). The corresponding areas in A) and B) are similarly represented. The circles correspond to the areas where proteins were identified and crosses indicate centers of spots.
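An MS intensity image such as the one in panel D is built by placing one intensity value per spectrum at its raster position. The sketch below illustrates that step under stated assumptions: the 32 x 48 grid is only one factorisation of the 1536 spectra that is roughly compatible with the scanned area and the 300 µm step (the actual grid dimensions are not given), the scan is assumed to run row by row, and the intensities are placeholder values rather than measured data.

```python
STEP_UM = 300              # raster step stated in the text
N_ROWS, N_COLS = 32, 48    # hypothetical grid: 32 * 48 = 1536 spectra


def build_intensity_image(intensities, n_rows, n_cols):
    """Arrange a flat, row-by-row list of intensities into a 2-D image."""
    if len(intensities) != n_rows * n_cols:
        raise ValueError("number of spectra does not match the grid size")
    return [intensities[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def position_um(index, n_cols, step_um):
    """(x, y) offset in micrometres of the spectrum with the given index."""
    row, col = divmod(index, n_cols)
    return col * step_um, row * step_um


# Placeholder intensities standing in for one summary value per spectrum.
placeholder = [float(i % 7) for i in range(N_ROWS * N_COLS)]
image = build_intensity_image(placeholder, N_ROWS, N_COLS)

print(len(image), "rows x", len(image[0]), "columns")
print(position_um(49, N_COLS, STEP_UM))
```

Any per-spectrum quantity could be mapped the same way, for example the summed intensity of the peaks matched to one protein, which is how overlapping spots can be displayed separately.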
The second method studied was IGD of proteins as previously described (Hellman et al., 1995; Rosenfeld, Capdevielle, Guillemot, & Ferrara, 1992; Shevchenko, Wilm et al., 1996) but adapted to treat a whole gel (PIGD). This technique is much better adapted to basic and high MW proteins since there is no compromise to be found between good digestion conditions and good transfer conditions: the steps are separated. Proteins are first partially digested to produce small fragments, then transblotted. These small polypeptides, which have a wide range of hydrophobicity and pI values, are ideal for transblotting and for accurate MS analysis. As an example, the theoretical tryptic fragments of lysozyme (MW = 14313 Da, pI = 9.3, GRAVY = 0.19), allowing at most 1 missed cleavage, have MWs ranging from 132 Da to 3163 Da, pI values from 6 to 10 (calculated with Compute pI/MW (Binz, Wilkins et al., 1999), http://www.expasy.ch/tools/) and GRAVY values from +0.26 to -0.81 (calculated with ProtParam, http://www.expasy.ch/tools/). Proteins that could not migrate intact can be transblotted as fragments. Thus, with this method, high MW and basic proteins are digested, and the fragments transblot well and permit characterisation of the respective proteins.

The time allowed for air-drying of the gel was critical. The gel needs to be dried as much as possible and kept flat to allow extensive and homogeneous rehydration with the trypsin solution. Usually, this digestion step was conducted at 35°C, but it could be performed at 4°C, which would slow down the digestion process. Two problems appeared. First, the gel rehydration must be incomplete to avoid the problem of gel sweating and the consequent diffusion of proteins and/or peptides. Extensive diffusion was avoided with an incubation time of around 30 minutes, corresponding to 60 to 80% gel rehydration.
The second problem, which is to deliver a suitable degree of protein digestion, is more difficult to handle and is linked to the first. With insufficient proteolytic digestion, a large number of peptides with multiple missed cleavages will disturb protein identification by database matching. On the other hand, extensive digestion after a longer incubation period promotes peptide diffusion and decreases the signal intensity (which is concentration dependent), an effect most marked with low MW proteins, as they are the most rapidly digested. The PIGD technique is thus relatively well adapted to high MW and basic proteins, but it is difficult to use if proteins with a wide range of MW are present.

Separately, these two techniques led to successful parallel digestion of either low molecular weight proteins (< 60 kDa, OSDT) or high molecular weight proteins (> 40 kDa, PIGD). The methods are thus complementary, and the combination of PIGD and OSDT was tested successfully. In this combined method, the first step involved modifying the physical and chemical properties of the proteins by partial enzymatic digestion. The resulting fragments were then transblotted (and further digested) using the OSDT procedure. Compared with the "gold standard" (IGD), the percent sequence coverage obtained with the DPD technique is so far lower (25.0 ± 11.9% for DPD and 45.2 ± 14.0% for IGD), but it was sufficient to allow identification of all 9 proteins in spite of their mixed characteristics (high and low MW, basic and hydrophobic). When this technique was applied to a mini 2-DE gel of E. coli, the
scanning process allowed us to obtain spectra from overlapping 2-DE proteins (IDH_ECOLI, AC: P08200; PGK_ECOLI, AC: P11665; METK_ECOLI, AC: P04384). The spatial resolution of this technique results from the relatively small dimension of the laser beam compared to the size of the gel pieces used for IGD. Another advantage of this method in comparison with IGD is that the digestion is highly parallel: with DPD, thousands of proteins may be digested simultaneously overnight. It might prove possible to add other enzymatic activities to the digestion sandwich. One could envisage using a phosphatase or a glycosidase followed by an endoproteinase. A major drawback of this technique would appear to be the low intensity of peptide staining compared to protein staining, which limits visualisation of the gel separation after polypeptide electroblotting. However, this parallel digestion approach was developed for detection by MALDI-TOF MS scanning. A molecular scanner providing virtual visualisation of the collecting membrane limits the impact of this problem and is proposed by Binz et al. (Binz, Muller et al., 1999). This scanning method is part of a highly automated integrated system involving automated scanning by MALDI-TOF MS, spectra treatment, identification of proteins by PMF and creation of a fully annotated 2-DE map.

5. ACKNOWLEDGMENTS

This work was supported by the Swiss National Fund for Scientific Research (grant 32-49314.96) and the Montus Foundation. PAB acknowledges financial support from the Helmut Horten Foundation.

6. REFERENCES

Appel, R., Palagi, P., Walther, D., Vargas, J., Sanchez, J., Ravier, F., et al. (1997). Melanie II--a third-generation software package for analysis of two-dimensional electrophoresis images: I. Features and user interface. Electrophoresis, 18(15), 2724-2734.
Appel, R., Vargas, J., Palagi, P., Walther, D., & Hochstrasser, D. (1997).
Melanie II--a third-generation software package for analysis of two-dimensional electrophoresis images: II. Algorithms. Electrophoresis, 18(15), 2735-2748.
Bienvenut, W., Sanchez, J., Karmime, A., Rouge, V., Rose, K., Binz, P., et al. (1999). Toward a clinical molecular scanner for proteome research: parallel protein chemical processing before and during western blot. Anal. Chem., 71(21), 4800-4807.
Binz, P., Muller, M., Walther, D., Bienvenut, W., Gras, R., Hoogland, C., et al. (1999). A molecular scanner to automate proteomic research and to display proteome images. Anal. Chem., 71(21), 4981-4988.
Binz, P., Wilkins, M., Gasteiger, E., Bairoch, A., Appel, R., & Hochstrasser, D. (1999). In R. Kellner, F. Lottspeich & H. Meyer (Eds.), Microcharacterisation of proteins (2nd ed., pp. 277-300). Berlin: Wiley-VCH.
Canas, B., Dai, Z., Lackland, H., Poretz, R., & Stein, S. (1993). Covalent attachment of peptides to membranes for dot-blot analysis of glycosylation sites and epitopes. Anal. Biochem., 211(2), 179-182.
Coull, J., Pappin, D., Mark, J., Aebersold, R., & Koster, H. (1991). Anal. Biochem., 194, 110-120.
Eckerskorn, C., Strupat, K., Schleuder, D., Hochstrasser, D., Sanchez, J., Lottspeich, F., et al. (1997). Analysis of proteins by direct scanning IR-MALDI-MS after 2-D PAGE separation and electroblotting. Anal. Chem., 69, 2888-2892.
Figeys, D., Ducret, A., Yates, J., & Aebersold, R. (1996). Protein identification by solid phase microextraction-capillary zone electrophoresis-microelectrospray-tandem mass spectrometry. Nature Biotechnology, 14(11), 1579-1583.
Hellman, U., Wernsted, C., Gonez, J., & Heldin, C. H. (1995). Improvement of an in-gel digestion procedure for the micropreparation of internal protein-fragments for amino acid sequencing. Anal. Biochem., 224(1), 451-455.
Hochstrasser, D. (1997). In M. Wilkins, K. Williams, R. D. Appel & D. Hochstrasser (Eds.), Proteome research: new frontiers in functional genomics. Berlin: Springer-Verlag.
Hochstrasser, D. (1998). Proteome in perspective. Clin. Chem. Lab. Med., 36(11), 825-836.
Houthaeve, T., Gausepohl, H., Ashman, K., Nillson, T., & Mann, M. (1997). Automated protein preparation techniques using a digest robot. J. Prot. Chem., 16(5), 343-348.
Jin, Y., & Cerletti, N. (1992). Appl. Theor. Electrophor., 3, 1342-1351.
Jungblut, P., Thiede, B., Zimmy-Arndt, U., Muller, E., Scheler, C., Wittmann-Liebold, B., et al. (1996). Resolution power of 2-DE and identification of proteins from gels. Electrophoresis, 17(5), 839-847.
Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature, 227(259), 680-685.
Ogorzalek Loo, R., Mitchell, C., Stevenson, T., Martin, S., Hines, W., Juhasz, P., et al. (1997). Electrophoresis, 18, 382-390.
Pappin, D., Coull, J., & Koster, H. (1990). In J. Villafranca (Ed.), Current research in protein chemistry (pp. 191-202). San Francisco: Academic Press.
Pappin, D., Coull, J., & Koster, H. (1990). Solid-phase sequence analysis of proteins electroblotted or spotted onto polyvinylidene difluoride membranes. Anal. Biochem., 187(1), 10-19.
Pappin, D., Rahman, D., Hansen, H., Bartlet-Jones, M., Jeffery, W., & Bleasby, A. (1996). In A. Burlingame & S. Carr (Eds.), Mass spectrometry in the biological science (pp. 135-150). Totowa, NJ: Humana Press.
Rosenfeld, J., Capdevielle, J., Guillemot, J., & Ferrara, P. (1992). In-gel digestion of proteins for internal sequence analysis after one- or two-dimensional gel electrophoresis. Analytical Biochemistry, 203(1), 173-179. Salih, B., & Zenobi, R. (1998). MALDI mass spectrometry of dye-peptide and dye protein complexe. Anal. Chem., 70, 1536-1543. Sanchez, J., & Hochstrasser, D. (1998). In A. Link (Ed.), Method in molecular biology: 2-d proteome analysis protocol (Vol. 112, pp. 227-233). Totowa, NJ: Humana press. Scheler, C., Lamer, S., Pan, Z., Li, X., Salnikov, J., & Jungblut, P. (1998). Peptide mass fingerprint sequence coverage from differently stained proteins on 2-DE patterns by MALDI-MS. Electrophoresis, 19, 918-927. Seo, M. L., Kim, J. S., Lee, S. S., Bae, Z. U., Lee, H. L., & Park, T. M. (1993). Amperometric enzyme electrode for the determination of NH4+. J. Korean Chem. Soc., 37(11), 937-942. Shevchenko, A., Wilm, M., Vorm, O., & Mann, M. (1996). Mass spectrometric sequencing of proteins silver-stained polyacrylamide gels. Analytical Chemistry, 68(5), 850-858. Tal, M., Silberstain, A., & Nusser, E. (1985). Why does coomassie brillant blue R interact differently with different proteins ? a partial answer. J. Biol Chem, 260(18), 9976-9980. Traini, M., Gooley, A. A., Ou, K., Wilkins, M. R., Tonella, L., Sanchez, J.-C., et al. (1998). Towards an automated approach for protein identification in proteome projects. Electrophoresis, 19, 1941-1949. Wilkins, M., & al., e. (1999). High throughput mass spectrometry discovery of protein post translational modification. J. Mol. Biol., 289, 645-657. Wilkins, M., Pasquali, C., Appel, R., Ou, K., Golaz, O., Sanchez, J., et al. (1996). From proteins to proteomes: large scale protein identification by 2-D electrophoresis and amino acids analysis. Bio/techniques, 14, 61-65. Williams, K., & Hochstrasser, D. (1997). In Proteome Research: New Frontiers in functional genomics (pp. 1-12). Berlin: Springer-Verlag. 
Wilm, M., & Mann, M. (1996). Analytical properties of the nanoelectrospray ion source. Anal Chem, 68(1), 1-8.
CHAPTER 3
QUANTITATION DURING ELECTROBLOTTING STEP
Enhanced Protein Recovery after Electrotransfer using Square Wave Alternating Voltage
Reprinted from Bienvenut, Deon, Sanchez et al. (2002), copyright 2002, with permission from Elsevier Science.
WV. Bienvenut, C. Deon, J-C. Sanchez, DF. Hochstrasser
ABSTRACT
Protein identification is becoming a complement to the available fully sequenced genomes. To meet the challenge, newly developed techniques for high throughput protein identification using matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-MS) and peptide mass fingerprinting are needed. Two years ago, a parallel protein digestion process was proposed. It provided a collecting polyvinylidene difluoride (PVDF) membrane that could be scanned by MALDI-MS. The acquired data were used to recreate a virtual multidimensional image. The voltage used during this protein electroblotting technique was an unusual square wave alternating voltage (SWAV). The goal of the current study is to evaluate quantitatively the efficiency of the SWAV compared with a classical electroblot process on intact proteins. The effects of the pulsed electric field and of the buffer composition were compared to a standard continuous transblotting process defined as the gold standard. Combining the pulsed asymmetric electric field with 3-(cyclohexylamino)-1-propanesulfonic acid (CAPS) buffer gave an average 65% increase in protein recovery. Moreover, the strongest effect was observed for high Mr proteins. In conclusion, the present study highlights a positive influence of the "shaking" effect of the asymmetric alternating voltage on the extraction of proteins from the gel.
KEYWORDS
Protein recovery / Electroblot / Electroelution / Square wave alternating voltage / Quantification
139 W. V. Bienvenut (ed.), Acceleration and Improvement of Protein Identification by Mass Spectrometry, 139–150. © 2005 Springer. Printed in the Netherlands.
1. INTRODUCTION
The rapid development of genome sequencing projects is producing a huge number of potentially expressed protein sequences from many species (Blattner et al., 1997; The C. elegans Sequencing Consortium, 1998; Venter et al., 2001; Consortium, 2001). By translation, most DNA sequences give the primary structures of potential proteins. These information sources are important for further investigations and for the ultimate comprehension of organism physiology (Achaz et al., 2000; Hood, 1999). However, identification and characterization of these proteins is an immense challenge due to the huge number of potential samples (proteins) in a single gel. Mass spectrometric techniques such as MALDI-MS (Karas & Hillenkamp, 1988) and electrospray tandem mass spectrometry (Aleksandrov et al., 1984; Yamashita & Fenn, 1984) developed during the last decade allow rapid identification of target proteins with acceptable accuracy (Joubert-Caron et al., 2000; Kaji et al., 2000; Raymackers et al., 2000). New techniques with a genuinely high throughput are needed, and our laboratory proposed a technique in which proteins are endoproteolytically cleaved in parallel. This technique was named "Double Parallel Digestion" (DPD). The collecting membrane was directly scanned by MALDI-MS and a virtual multidimensional image was obtained by bioinformatic treatment of the MALDI-MS data (Bienvenut et al., 1999; Binz, Muller et al., 1999). Because standard electroblotting techniques, which use a continuous current or voltage, differ markedly from our electroblotting conditions, namely SWAV with ½ Towbin buffer (Eckerskorn & Lottspeich, 1990), it was of interest to determine the influence of this method on protein recovery. The differences in protein recovery under different electric fields (continuous current and SWAV) and/or buffers (½ Towbin and CAPS) were evaluated.

2. MATERIAL AND METHODS
2.1. Mono-dimensional electrophoresis (1-DE)
1-DE was conducted essentially according to Laemmli (1970) with 12% T and 2.6% C linear polyacrylamide gels. Three different protein mixtures were used for this study. The first two mixtures were the commonly used wide range Mr standard, containing nine unlabelled proteins, and the low range Mr standard, containing six unlabelled proteins (PHS2, BSA, OVAL, CAH2, ITRA, LYC); both were obtained from Bio-Rad (Hercules, CA, USA). The nine proteins of the wide range standard were (SWISS-PROT accession numbers in parentheses): MYSS, mixture of myosins from rabbit skeletal muscle (Q28641 and P02562); BGAL, β-galactosidase from E. coli (P00722); PHS2, phosphorylase b from rabbit skeletal muscle (P00489); BSA, bovine serum albumin (P02769); OVAL, ovalbumin from chicken egg white (P01012); CAH2, carbonic anhydrase type 2 from bovine serum (P00921); ITRA, trypsin inhibitor from soybean (P01070); LYC, lysozyme from chicken egg white (P00698); BPT1, trypsin inhibitor from bovine pancreas (P00974). The third mixture, of six [14C] labelled proteins (MYSS, PHS2, BSA, OVAL, CAH2, LYC), was purchased from Amersham Pharmacia Biotech (Uppsala, Sweden). For all of these experiments, 1 µg of each unlabelled protein or 50 nCi of each [14C] radiolabelled protein was loaded on a single lane of the 1-DE gel. For all mixtures, proteins were diluted to the correct concentration in 3% β-mercaptoethanol in Tris-HCl (60 mM, pH 6.8), glycerol (10% v/v) and sodium dodecyl sulfate (SDS, 2% w/v), and reduced at 95°C for 5 minutes before 1-DE migration. Protein migration was carried out using the Mini-Protean II electrophoresis apparatus (Bio-Rad) operated at 200 V for 45 to 50 minutes. When necessary, the gel was stained for 30 minutes in Coomassie brilliant blue R250 (CBB R250, 0.1% w/v), water (60% v/v), methanol (30% v/v) and acetic acid (10% v/v), and destained with repeated washes of water (50% v/v), methanol (40% v/v) and acetic acid (10% v/v).

2.2. Electroblot
Immediately after the 1-DE protein separation, gels were soaked in deionised water for five minutes and then equilibrated twice for five minutes in the cathodic blotting buffer. Trans-Blot PVDF membranes (Bio-Rad) were equilibrated in the anodic buffer for 5 minutes. The different buffers used for this study and their compositions are detailed in Table 1. Electrotransfer was carried out overnight at room temperature in a laboratory-made semi-dry apparatus. A double layer of PVDF membrane was used to verify that no protein passed through the first membrane.
As previously described (Sanchez & Hochstrasser, 1998), the standard blotting technique uses a continuous current of 1 mA/cm2 of transferred gel with heterogeneous CAPS buffer. This technique is referred to as the gold standard method in the comparisons that follow because of its extensive use for protein electroblotting (Bolt & Mahoney, 1997; Jungblut et al., 1990; Mozdzanowski & Speicher, 1992; Neumann & Mullner, 1998; Sanchez & Hochstrasser, 1998). The voltage applied to the transblotting sandwich in the parallel digestion technique (Bienvenut et al., 1999) was an asymmetrical alternating voltage. This square wave alternating voltage (SWAV) delivers +12 V for 125 ms followed by -5 V for 125 ms, repetitively. This corresponds to a 4 Hz signal with an average voltage of 3.5 V. After the electroblotting step, the membranes were washed rapidly with deionized water and air-dried. When necessary after the transblotting operation, gels were stained with CBB R250 as previously described. PVDF membranes were stained for 1 minute in amido black (AB, 0.5% w/v), isopropanol (25% v/v) and acetic acid (10% v/v), and destained by repeated washes in deionized water.
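The frequency and average voltage quoted for the SWAV follow directly from the two voltage levels and their durations. The short Python sketch below reproduces that arithmetic; the class name `SquareWave` and its fields are illustrative names chosen here, not part of the original apparatus description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SquareWave:
    """Two-level periodic voltage (hypothetical helper, for illustration only)."""
    v_high: float    # volts during the first half of the cycle
    t_high_s: float  # seconds spent at v_high
    v_low: float     # volts during the second half of the cycle
    t_low_s: float   # seconds spent at v_low

    @property
    def period_s(self) -> float:
        return self.t_high_s + self.t_low_s

    @property
    def frequency_hz(self) -> float:
        return 1.0 / self.period_s

    @property
    def mean_voltage(self) -> float:
        # Time-weighted average of the two levels over one period.
        return (self.v_high * self.t_high_s + self.v_low * self.t_low_s) / self.period_s

    def voltage_at(self, t_s: float) -> float:
        # Instantaneous voltage, with the cycle starting on the high level.
        phase = t_s % self.period_s
        return self.v_high if phase < self.t_high_s else self.v_low


# SWAV as described in section 2.2: +12 V for 125 ms, then -5 V for 125 ms.
swav = SquareWave(v_high=12.0, t_high_s=0.125, v_low=-5.0, t_low_s=0.125)
print(f"frequency = {swav.frequency_hz:.1f} Hz, mean voltage = {swav.mean_voltage:.1f} V")
```

The positive mean voltage is what drives the net migration towards the membrane, while the negative half-cycle briefly reverses the field.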
2.3. Detection, quantification and statistics
The protein electroblotting technique has been widely used and has been the subject of many investigations aimed at quantifying protein recovery on the collecting membrane (18-23). Usually, this was carried out with [14C] radiolabelled proteins, which emit low-energy β-particles that are easily absorbed by the environment. Due to the thickness of the gel, it is not possible to obtain an accurate measurement of the β signal emitted by the proteins. Therefore, an absolute quantification of protein recovery on the collecting membrane was not possible. To overcome this problem, the signals acquired on the collecting membranes were compared with a reference obtained from a 1-DE separation of the [14C] labelled protein standard mixture. Detection of the [14C] radioactivity was performed with a Phosphor-Imager apparatus (Molecular Dynamics, AP Biotech, Uppsala, Sweden). Control experiments were also conducted with unlabelled proteins. The electroblotted material collected on the PVDF membranes was stained with AB and the bands were detected with an optical laser scanner (Molecular Dynamics, AP Biotech, Uppsala, Sweden). Melanie 3 software (GeneBio, Geneva, CH) (24) was used for image treatment and band quantification. The band volume and/or optical density (OD) were used throughout the study for recovery comparison. When possible, statistical analyses were conducted using the F test and Student's t test.

Table 1: Electroblotting buffer composition

Name                                                    Anodic buffer                                 Cathodic buffer
Heterogeneous CAPS (Traini et al., 1998)                10 mM CAPS, pH 11, 20% MeOH (v/v)             10 mM CAPS, pH 11, 5% MeOH (v/v)
Heterogeneous ½ Towbin (Eckerskorn & Lottspeich, 1990)  13 mM Tris, 100 mM glycine, 20% MeOH (v/v)    13 mM Tris, 100 mM glycine, 5% MeOH (v/v)
Homogeneous ½ Towbin (Kaji et al., 2000)                13 mM Tris, 100 mM glycine, 12.5% MeOH (v/v)  13 mM Tris, 100 mM glycine, 12.5% MeOH (v/v)
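As a practical aid, the concentrations in Table 1 can be converted into weighed amounts for a given buffer volume. The sketch below is only an illustration: the function and dictionary names are hypothetical, Tris is assumed to be weighed as Tris base, and the molar masses are standard values rather than figures taken from the original text. pH adjustment of the CAPS buffers is not covered.

```python
# Standard molar masses in g/mol (assumption: Tris is weighed as Tris base).
MOLAR_MASS_G_PER_MOL = {"CAPS": 221.32, "Tris": 121.14, "glycine": 75.07}

# Table 1 compositions: solutes in mM, methanol in % (v/v).
BUFFERS = {
    "heterogeneous CAPS, anodic": ({"CAPS": 10.0}, 20.0),
    "heterogeneous CAPS, cathodic": ({"CAPS": 10.0}, 5.0),
    "heterogeneous 1/2 Towbin, anodic": ({"Tris": 13.0, "glycine": 100.0}, 20.0),
    "heterogeneous 1/2 Towbin, cathodic": ({"Tris": 13.0, "glycine": 100.0}, 5.0),
    "homogeneous 1/2 Towbin": ({"Tris": 13.0, "glycine": 100.0}, 12.5),
}


def recipe(buffer_name: str, volume_l: float = 1.0) -> dict:
    """Return grams of each solute and mL of methanol for the requested volume."""
    solutes_mm, methanol_percent = BUFFERS[buffer_name]
    amounts = {
        f"{name} (g)": conc_mm / 1000.0 * MOLAR_MASS_G_PER_MOL[name] * volume_l
        for name, conc_mm in solutes_mm.items()
    }
    amounts["methanol (mL)"] = methanol_percent / 100.0 * volume_l * 1000.0
    return amounts


for name in BUFFERS:
    parts = ", ".join(f"{key}: {value:.2f}" for key, value in recipe(name).items())
    print(f"{name}: {parts}")
```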
2.4. [14C] signal linearity and influence of the accumulation time
Detection of [14C] labelled samples required a long exposure period to provide a valid signal, despite the use of a sensitive storage phosphor screen. To eliminate the dependence on exposure time, the signal intensity was not used directly but was always compared with a reference to obtain a ratio. One lane of [14C] labelled proteins separated by 1-DE was air dried between two cellophane sheets using an Easy Breeze Air Gel dryer (Hoefer, AP Biotech, Uppsala, Sweden). This reference gel was exposed on the storage phosphor screen for various periods from 4 to 70 hours. The signal response was linear and proportional to the exposure time for periods between 20 and 70 hours (data not shown). All further analyses were done using an accumulation time within that range. The ratio between the signal of the reference gel and the signal of the sample was calculated so that ratios from different samples could be compared directly, independently of the exposure time.

3. RESULTS AND DISCUSSION
Because the gold standard (Sanchez & Hochstrasser, 1998) and the SWAV electroblotting technique (Bienvenut et al., 1999) differ markedly, this study was conducted to determine the impact of the transblotting conditions, i.e. the applied electric field and the buffer composition, on protein recovery. The results are described in the following sections.

3.1. Comparison of the electric field and buffer composition effects
Figure 1: Images obtained during the comparison of protein recovery between gold standard and SWAV transfer. [14C] labelled proteins (A-D) or Bio-Rad Mr standard proteins followed by AB staining (E-F) were separated by 1-DE, then electroblotted onto PVDF membrane using different electric fields and buffers: A and E) Heterogeneous CAPS, 1 mA/cm2; B) Heterogeneous CAPS, SWAV; C) Heterogeneous ½ Towbin, SWAV; D and F) Homogeneous ½ Towbin, SWAV.
Two major parameters were tested in this section: the electric field and the buffer used for protein electroblotting, as described in section 2.2. The recovery of [14C] labelled proteins obtained using SWAV with three different buffers was compared to that of the gold standard method. The phosphorimages of the collecting PVDF membranes (Figure 1A-D) showed seven bands corresponding to the six separated proteins (MYSS, PHS2, BSA, OVAL, CAH2, LYC) plus the migration front. The ratios of the band intensities were calculated by dividing the intensity obtained with the SWAV transblotting process by the intensity obtained with the gold standard technique. The resulting percentage increases for the six proteins of the [14C] labelled mixture are given in Figure 2.

[Bar chart: % increase of protein recovery (y-axis, 0 to 250) for MYSS, PHS2, BSA, OVAL, CAH2, LYC and the average (x-axis); series B/A, C/A, D/A and F/E.]
Figure 2: Percentage increase in protein recovery with the SWAV transblotting process compared to the gold standard method, applied to [14C] labelled proteins (B/A, C/A, D/A) and Bio-Rad Mr standard proteins (F/E) (calculated from the values obtained from the images shown in Figure 1). B/A, C/A and D/A correspond respectively to the ratio of the [14C] labelled protein signal intensity of samples B, C and D (Figure 1) to the intensity of sample A (gold standard). A similar ratio is calculated for the AB stained proteins (F/E). Since BGAL (36% increase) and ITRA (21% increase) were not present in the [14C] labelled protein standard mixture, their ratio values are not shown. The MYSS ratio could not be calculated for F/E since this protein band is not visible on the PVDF membrane after staining. The figure illustrates the influence of buffer composition on protein recovery when SWAV is used, by comparison with the gold standard method. Direct comparison of the SWAV and the continuous current with the heterogeneous CAPS buffer shows an average 65% increase in [14C] labelled protein recovery, whereas 46% and 35% are obtained with heterogeneous and homogeneous ½ Towbin buffers respectively. For both series of proteins, the strongest increase is obtained for the high Mr proteins (MYSS or PHS2 for the [14C] labelled proteins and BGAL or PHS2 for the Bio-Rad Mr standard proteins).
The heterogeneous CAPS buffer with the SWAV showed the highest mean recovery, with a 65% increase compared with the same buffer used with the continuous current transfer (gold standard method). This result highlights the positive influence of the SWAV relative to the continuous current. The use of ½ Towbin in heterogeneous or homogeneous composition gave a lower recovery than the heterogeneous CAPS buffer, with increases of 46% and 35% respectively, which was still higher than the recovery with the gold standard method. The same experiment was conducted with Bio-Rad Mr standard proteins followed by AB staining of the PVDF membrane (Figure 1 E and F). Eight bands are visible, corresponding to BGAL, PHS2, BSA, OVAL, CAH2, ITRA, LYC and the migration front, which also contained BPT1. MYSS, due to its high Mr, was usually not extracted and thus remained undetectable on the membrane. This was confirmed by the presence of the MYSS band in the CBB R250 stained gel after the transfer (data not shown). It must be noted that the second layer of capture membrane in both experiments ([14C] labelled proteins and AB stained proteins) never showed any trace of protein (data not shown). The average increase in protein recovery corresponds to 35% and 24% for [14C] labelled and AB stained proteins respectively. Nevertheless, due to the great disparity of recovery values between proteins, this result is not a good representation of the transfer. When the calculation is restricted to the proteins present in both samples ([14C] labelled and AB stained proteins), the increase in protein recovery is 26%. The increase in protein recovery is not identical over the whole Mr range, and the strongest positive impact was found for high Mr proteins (MYSS and PHS2 for [14C] labelled proteins, BGAL and PHS2 for the AB stained membranes).

Table 2: Description of the parameters used for the comparison of protein recovery as a function of applied electric field and transblotting buffer. Experiment 1 corresponds to the reference for the comparison ratios summarized in Figure 2. Experiments 2, 3 and 4 use the SWAV with different buffers, which allows their influence to be compared. *: Gold standard

Experiment       1*                   2                    3                        4
Buffer           Heterogeneous CAPS   Heterogeneous CAPS   Heterogeneous ½ Towbin   Homogeneous ½ Towbin
Electric field   1 mA/cm2             SWAV                 SWAV                     SWAV
3.2. Statistical test for the transfer reproducibility
In order to verify the reproducibility of the observed increase in protein recovery, the electroblotting experiments conducted in section 3.1 were repeated (n=6) using the low range Mr standard proteins from Bio-Rad followed by AB staining of the PVDF membranes. Table 3 details the results obtained in this study. The average 23% increase in protein recovery was in line with the previous result (Figure 2). This statistical analysis showed clearly that the increase in protein recovery was significant (p < 0.05) for three of the six proteins (PHS2, OVAL and CAH2). The average increase for these proteins represents more than 20% and is also significant with p < 0.05. For the low Mr proteins, a net benefit of the SWAV compared with the gold standard method was not clearly established: the differences were not statistically significant for LYC and ITRA. This was also the case for BSA. It must be noted that this protein is highly soluble in aqueous solution and easily transferred under normal conditions.

Table 3: Reproducibility of the increase in protein recovery between electrotransfer using the gold standard method and SWAV transfer with ½ Towbin, applied to unlabelled proteins (low range Mr protein standard). The band volumes of both experiments were compared to determine whether the % increase in protein recovery was statistically significant. This was performed using an F test to determine whether the SDs of both band volume quantifications were comparable, followed by a Student's t test to verify the significance. The statistical result for the six-fold repeated experiment is shown in the last column: (+) p < 0.05, (-) not significant. The protein band volumes are given as "average value ± SD" (n=6). All six proteins, as well as the average value, showed an increase in protein recovery when the SWAV transfer method was used, but only the increases for PHS2, OVAL, CAH2 and the average value are significant with p < 0.05.

Proteins   Band volume, gold standard   Band volume, SWAV   % increase (SWAV vs. 1 mA/cm2)
PHS2       548 ± 72                     767 ± 117           40 (+)
BSA        913 ± 161                    1099 ± 261          20 (-)
OVAL       653 ± 118                    938 ± 94            44 (+)
CAH2       882 ± 91                     1093 ± 113          24 (+)
ITRA       839 ± 110                    869 ± 142           4 (-)
LYC        533 ± 125                    595 ± 294           12 (-)
Average    728 ± 91                     893 ± 144           23 (+)
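The percentages and significance flags in Table 3 can be re-derived from the tabulated means and SDs alone. The sketch below is one possible reading of the procedure, not the authors' actual computation: it assumes an unpaired, equal-variance Student's t test with n = 6 per group (10 degrees of freedom), a two-sided 5% level, and tabulated critical values (2.228 for t, 7.15 for the two-sided F test with 5 and 5 degrees of freedom). The "Average" row is treated like any other row, which is also an assumption.

```python
import math

N = 6                 # replicates per condition (from Table 3)
T_CRIT_DF10 = 2.228   # two-sided 5% critical value of Student's t, 10 degrees of freedom
F_CRIT_5_5 = 7.15     # upper 2.5% point of F(5, 5), i.e. two-sided 5% variance-ratio test

# (mean, SD) of band volumes: gold standard first, SWAV second.
TABLE3 = {
    "PHS2": ((548, 72), (767, 117)),
    "BSA": ((913, 161), (1099, 261)),
    "OVAL": ((653, 118), (938, 94)),
    "CAH2": ((882, 91), (1093, 113)),
    "ITRA": ((839, 110), (869, 142)),
    "LYC": ((533, 125), (595, 294)),
    "Average": ((728, 91), (893, 144)),
}


def compare(gold, swav, n=N):
    """Return (% increase, variance ratio, t statistic) from summary statistics."""
    (m1, s1), (m2, s2) = gold, swav
    percent_increase = 100.0 * (m2 - m1) / m1
    f_ratio = max(s1, s2) ** 2 / min(s1, s2) ** 2
    pooled_variance = (s1 ** 2 + s2 ** 2) / 2.0          # equal group sizes
    t_stat = (m2 - m1) / math.sqrt(pooled_variance * 2.0 / n)
    return percent_increase, f_ratio, t_stat


results = {name: compare(*row) for name, row in TABLE3.items()}
significant = [
    name for name, (_, f_ratio, t_stat) in results.items()
    if f_ratio < F_CRIT_5_5 and abs(t_stat) > T_CRIT_DF10
]

for name, (pct, f_ratio, t_stat) in results.items():
    flag = "+" if name in significant else "-"
    print(f"{name:8s} {pct:5.0f}%  F = {f_ratio:4.2f}  t = {t_stat:4.2f}  ({flag})")
```

Under these assumptions the recomputed flags agree with the last column of Table 3: only PHS2, OVAL, CAH2 and the average exceed the critical t value.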
3.3. Gel residual protein after transblotting process
The previous results showed an increase in material recovery, mostly for high Mr proteins. Consequently, the material remaining in the gel after the transblotting step must also be affected. To verify and quantify the amount of protein remaining in the gel after the electroblot step, gels were air dried as described in section 2.4. The phosphorimages of the gels resulting from the gold standard transfer (1 mA/cm2 with heterogeneous CAPS buffer) and from the SWAV transfer (homogeneous ½ Towbin with SDS) are shown in Figure 3. Ratios between the signal volume of the proteins remaining in the electroblotted gels (gold standard and SWAV electroblot) and the signal volume of the reference gel (unblotted gel) are shown in Figure 4. The average material remaining in the gel after gold standard electrotransfer corresponded to 34 ± 14% (n=6) of the material contained in the unblotted gel, whilst an average of 17 ± 9% (n=6) remained in the gel after SWAV transfer. The difference between these two samples was significant (p < 0.025). Nevertheless, a large disparity between proteins (SD = ± 14% and ± 9%, n = 6) can be observed in Figure 4. High molecular weight proteins such as MYSS are more affected by this problem: up to 58% of the material can remain in the gel. In section 3.2, the comparison of protein recovery using gold standard electroblotting conditions and the SWAV technique showed higher recovery for the latter. The results obtained by quantifying the material remaining in the gel confirmed this observation.
Figure 3: Phosphorimages of the 14C labelled proteins remaining in the gel after the electroblot process. A, Control unblotted gel used as a reference gel to calculate volume ratio in Figure 4; B, Gel after gold standard transfer (1 mA/cm2 of gel surface using heterogeneous CAPS); C, Gel after SWAV transfer using homogeneous ½ Towbin. In-gel remaining material is lower after the SWAV electrotransfer than after the gold standard electrotransfer. It is clearly visible that the high Mr proteins are more affected by this effect.
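The residual fractions reported in section 3.3 (34 ± 14% for the gold standard, 17 ± 9% for SWAV, n = 6) can be turned into the complementary fraction of material that left the gel. The sketch below does this arithmetic and, as a rough plausibility check only, computes an unpaired equal-variance t statistic from the summary values. The original test may well have been paired per protein, which cannot be reproduced without the per-protein values; all function names are illustrative.

```python
import math

N = 6
GOLD_REMAINING = (0.34, 0.14)   # mean, SD of the fraction left in the gel
SWAV_REMAINING = (0.17, 0.09)


def extracted_fraction(remaining_mean: float) -> float:
    """Fraction of the material that left the gel during the transfer."""
    return 1.0 - remaining_mean


def pooled_t(a, b, n=N) -> float:
    """Unpaired equal-variance t statistic from (mean, SD) pairs with equal n."""
    (m1, s1), (m2, s2) = a, b
    return (m1 - m2) / math.sqrt((s1 ** 2 + s2 ** 2) / n)


gold_out = extracted_fraction(GOLD_REMAINING[0])
swav_out = extracted_fraction(SWAV_REMAINING[0])
relative_reduction = 1.0 - SWAV_REMAINING[0] / GOLD_REMAINING[0]
t_stat = pooled_t(GOLD_REMAINING, SWAV_REMAINING)

print(f"extracted from gel: gold standard {gold_out:.0%}, SWAV {swav_out:.0%}")
print(f"residual material reduced by {relative_reduction:.0%}")
print(f"t statistic (df = 10, unpaired assumption): {t_stat:.2f}")
```

With these summary values the residual material is halved under SWAV, and the t statistic lies above 2.228, the tabulated one-sided 2.5% (two-sided 5%) critical value for 10 degrees of freedom.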
Quantification of the material remaining in the gel after the electroblotting step confirmed the advantage of the SWAV over the continuous current. The proposed voltage can increase protein recovery on the PVDF membrane by up to 200%.

4. CONCLUDING REMARKS
The present study had two objectives: first, to determine the effect of square wave alternating voltage versus continuous current on protein electroblotting, and second, to evaluate the influence of the electroblotting buffer composition. The SWAV with heterogeneous CAPS buffer showed a strong beneficial effect of the pulsed voltage, with a 65% average increase in protein recovery. The strongest effect was found for the high Mr proteins, i.e. MYSS, BGAL and PHS2. The effect was less pronounced for smaller proteins (< 60 kDa). It was also found that the buffer composition influenced the level of protein recovery. For example, compared to CAPS buffer with SWAV, the use of heterogeneous and homogeneous ½ Towbin buffers gave average increases in protein recovery of only 46% and 35% respectively. The material remaining in the gel after the electroblotting step also confirmed the higher recovery for high Mr proteins. Comparison of the retained material between the gold standard method and the SWAV method showed a decrease of the material in the gel, mostly for the larger proteins. The use of the SWAV could be generalized, since the average recovery of intact protein is 65% higher than with the gold standard method. This pulsed electric field technique is of great interest for any post-separation analysis using PVDF as a matrix. More generally, this technique is applicable to protein recovery from gels, e.g. electroelution.

[Bar chart: ratio of the proteins remaining in the gel after electroblot to the reference gel (y-axis, 0 to 0.7) for MYSS, PHS2, BSA, OVAL, CAH2, LYC and the mean value (x-axis); series: gel after gold standard electroblotting, gel after SWAV electrotransfer.]
Figure 4: Ratio of the signal volume of the 14C labelled proteins remaining in the gel after the gold standard and the SWAV electroblotting processes to the signal volume of the unblotted reference gel (values obtained from the gels shown in Figure 3).
6. ACKNOWLEDGEMENT
This work was supported by the Swiss National Fund for Scientific Research (grant 31-59095.99). The authors acknowledge Prof. Jacques Deshusses, Dr. Richard W. James, Dr. Manfred Heller, Dr. Patricia Palagi, Dr. Christine Hoogland, Dr. Sonja Voordijk and Applied Biosystems for their technical support.

7. REFERENCES
Achaz, G., Coissac, E., Viari, A., & Netter, P. (2000). Mol. Biol. Evol., 17, 1268-1275.
Aleksandrov, M., Gall, L., Krasnov, V., Nikolae, V., Pavlenko, V., Shkurov, V., et al. (1984). Bioorg. Khim., 10, 710.
Bienvenut, W., Deon, C., Sanchez, J., & Hochstrasser, D. (2002). Enhanced protein recovery after electrotransfer using square wave alternating voltage. Anal. Biochem., 307(2), 297-303.
Bienvenut, W., Sanchez, J., Karmime, A., Rouge, V., Rose, K., Binz, P., et al. (1999). Toward a clinical molecular scanner for proteome research: parallel protein chemical processing before and during western blot. Anal. Chem., 71(21), 4800-4807.
Binz, P., Muller, M., Walther, D., Bienvenut, W., Gras, R., Hoogland, C., et al. (1999). A molecular scanner to automate proteomic research and to display proteome images. Anal. Chem., 71(21), 4981-4988.
Blattner, F., Plunkett, G., 3rd, Bloch, C., Perna, N., Burland, V., Riley, M., et al. (1997). Science, 277, 1453-1474.
Bolt, M., & Mahoney, P. (1997). High-efficiency blotting of proteins of diverse sizes following SDS-PAGE. Anal. Biochem., 247, 185-192.
Consortium, I. H. G. S. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921.
Eckerskorn, C., & Lottspeich, F. (1990). Combination of two-dimensional gel electrophoresis with microsequencing and amino acid composition analysis: improvement of speed and sensitivity in protein characterization. Electrophoresis, 11, 554-561.
Hood, D. (1999). Parasitology, 118, S3-S9.
Joubert-Caron, R., Le Caer, J., Montandon, F., Poirier, F., Pontet, M., Imam, N., et al. (2000). Protein analysis by mass spectrometry and sequence database searching: a proteomic approach to identify human lymphoblastoid cell line proteins. Electrophoresis, 21(12), 2566-2575.
Jungblut, P., Eckerskorn, C., Lottspeich, F., & Klose, J. (1990). Blotting efficiency investigated by using two-dimensional electrophoresis, hydrophobic membranes and proteins from different sources. Electrophoresis, 11(7), 581-588.
Kaji, H., Tsuji, T., Mawuenyega, K., Wakamiya, A., Taoka, M., & Isobe, T. (2000). Profiling of Caenorhabditis elegans proteins using two-dimensional gel electrophoresis and matrix assisted laser desorption/ionization-time of flight-mass spectrometry. Electrophoresis, 21(9), 1755-1765.
Karas, M., & Hillenkamp, F. (1988). Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal. Chem., 60(20), 2299-2301.
Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature, 227(259), 680-685.
Mozdzanowski, J., & Speicher, D. (1992). Microsequence analysis of electroblotted proteins. I. Comparison of electroblotting recoveries using different types of PVDF membranes. Anal. Biochem., 207(1), 11-18.
Neumann, H., & Mullner, S. (1998). Two replica blotting methods for fast immunological analysis of common proteins in two-dimensional electrophoresis. Electrophoresis, 19, 752-757.
Raymackers, J., Daniels, A., De Brabandere, V., Missiaen, C., Dauwe, M., Verhaert, P., et al. (2000). Identification of two-dimensionally separated human cerebrospinal fluid proteins by N-terminal sequencing, matrix-assisted laser desorption/ionization-mass spectrometry, nanoliquid chromatography-electrospray ionization-time of flight-mass spectrometry, and tandem mass spectrometry. Electrophoresis, 21(11), 2266-2283.
Sanchez, J., & Hochstrasser, D. (1998). In A. Link (Ed.), Method in molecular biology: 2-d proteome analysis protocol (Vol. 112, pp. 227-233). Totowa, NJ: Humana Press.
The C. elegans Sequencing Consortium. (1998). Science, 282, 2012-2018.
Traini, M., Gooley, A. A., Ou, K., Wilkins, M. R., Tonella, L., Sanchez, J.-C., et al. (1998). Towards an automated approach for protein identification in proteome projects. Electrophoresis, 19, 1941-1949.
Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., et al. (2001). The sequence of the human genome. Science, 291(5507), 1304-1351.
Yamashita, M., & Fenn, J. (1984). J. Phys. Chem., 88, 4451-4459.
CHAPTER 4
SIGNAL TREATMENT AND VIRTUAL IMAGES PRODUCTION (1/2)
A Molecular Scanner to Highly Automate Proteomic Research and to Display Proteome Images
Reproduced with permission from Binz, Muller et al. (1999). Copyright (1999) American Chemical Society.
PA. Binz, M. Muller, D. Walther, WV. Bienvenut, R. Gras, C. Hoogland, G. Bouchet, E. Gasteiger, R. Fabbretti, S. Gay, P. Palagi, MR. Wilkins, V. Rouge, L. Tonella, S. Paesano, G. Rossellat, A. Karmime, A. Bairoch, JC. Sanchez, RD. Appel, DF. Hochstrasser
ABSTRACT
Identification and characterization of all proteins expressed by a genome in biological samples represent major challenges in proteomics. Today's commonly used high throughput approaches combine two-dimensional electrophoresis (2-DE) with peptide mass fingerprinting (PMF) analysis. Although automation is often possible, a number of limitations still adversely affect the rate of protein identification and annotation in 2-DE databases: the sequential excision of pieces of gel containing protein; the enzymatic digestion step; the interpretation of mass spectra (reliability of identifications); and the manual updating of 2-DE databases. We present a highly automated method that generates a fully annotated 2-DE map. Using a parallel process, all proteins of a 2-DE gel are first simultaneously digested proteolytically and electro-transferred onto a polyvinylidene difluoride (PVDF) membrane. The membrane is then directly scanned by MALDI-TOF MS. After automated protein identification from the obtained peptide mass fingerprints using PeptIdent software (http://www.expasy.ch/tools/peptident.html), a fully annotated 2-D map is created online. It is a multi-dimensional representation of a proteome that contains interpreted PMF data in addition to protein identification results. This "MS-imaging" method represents a major step towards the development of a clinical molecular scanner.
151 W. V. Bienvenut (ed.), Acceleration and Improvement of Protein Identification by Mass Spectrometry, 151–168. © 2005 Springer. Printed in the Netherlands.
KEYWORDS Molecular scanner, high throughput analysis, parallel protein digestion, bioinformatics, MALDI-TOF MS, imaging, peptide mass fingerprinting, database searching, proteome, DPD, OSDT
1. INTRODUCTION

Today's genome sequencing projects provide a huge amount of information in the form of nucleotide sequences that are stored in dedicated databases. As a first approach, this information can be interpreted to obtain the encoded amino acid (AA) sequences of all potentially expressed proteins. In their active forms, proteins often differ from the predicted AA sequence because they can be processed or carry post-translational modifications. Most of these modifications cannot be predicted from gene sequences. In fact, a single gene sequence may give rise to more than ten structurally different proteins (Wilkins, Sanchez, Williams, & Hochstrasser, 1996). For example, α-1-antitrypsin is known to exist in at least 22 different forms in the human plasma master image in the SWISS-2DPAGE database (Hoogland et al., 2000). By extrapolation, this yields between 500,000 and one million different protein forms expressed in humans. The description of a proteome (Wilkins et al., 1995), which involves the identification of all proteins contained in a biological sample, therefore represents a real experimental challenge. Methods involving high-resolution protein separation, parallelisation of sample preparation, automation of experimental processes and of database comparison, as well as powerful and specific visualization tools need to be developed and integrated (Hochstrasser, 1998; Williams & Hochstrasser, 1997).

Identifying a protein from a complex biological sample requires at least three steps. The protein is first isolated. Then very specific experimental attributes, such as peptide mass fingerprints (PMF) or partial amino acid sequences, are determined. In the third step, identification is attempted by matching these attributes with those computed for all entries in a protein sequence database.
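The third step, matching measured attributes against values computed from database sequences, can be illustrated for peptide masses with a minimal sketch. It assumes a simple trypsin rule (cleavage after K or R, not before P), singly protonated monoisotopic masses and a plain count of matching masses as the score. The function names, the tolerance and the two toy sequences are hypothetical, and the scoring is not meant to reproduce that of PeptIdent or any other identification tool.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water per peptide (free N- and C-termini)
PROTON = 1.00728   # singly protonated ion, [M+H]+


def tryptic_peptides(sequence, missed_cleavages=0):
    """In silico digest: cleave after K or R unless the next residue is P."""
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    # Join neighbouring pieces to model up to `missed_cleavages` missed sites.
    peptides = set()
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.add("".join(pieces[i:j + 1]))
    return peptides


def mh_plus(peptide):
    """Theoretical monoisotopic [M+H]+ mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed, theoretical, tol_da=0.2):
    """Number of observed masses lying within tol_da of a theoretical mass."""
    return sum(any(abs(m - t) <= tol_da for t in theoretical) for m in observed)


def rank_candidates(observed, database, tol_da=0.2, missed_cleavages=0):
    """Score every database entry and return (name, score), best first."""
    scores = {}
    for name, seq in database.items():
        theo = [mh_plus(p) for p in tryptic_peptides(seq, missed_cleavages)]
        scores[name] = count_matches(observed, theo, tol_da)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example: two made-up sequences and two "measured" masses.
toy_db = {"toy_A": "MDPNKCSTWHFR", "toy_B": "GASPVTK"}
print(rank_candidates([604.28, 936.41], toy_db))
```

A count of matches is the simplest possible score; real tools typically also weigh factors such as mass accuracy and protein size, which this sketch ignores.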
The 2-DE technique is a method of choice for separating a large number of proteins with high resolution in a single procedure, particularly when narrow-range pH gradients are used (Sanchez & Hochstrasser, 1998; Scheler et al., 1998). It provides a graphical representation of a proteome, where each protein form present in the so-called 2-DE map is represented by a spot or a series of spots and can be described by a pI, an apparent molecular weight and an intensity-related value. Among the methods in routine use, the PMF approach is generally accepted to be currently by far the most effective and rapid way to identify proteins from a 2-DE gel. In this method, protein spots are excised and the proteins they contain are proteolytically digested. The masses of the resulting peptides are measured by mass spectrometry and then matched against a database of theoretical peptide mass fingerprints deduced from protein sequences. A score is calculated which represents the similarity between the experimental and the theoretical peptide masses. In principle, the protein with the highest score should be the correct identification.

Various problems emerge with regard to the analysis of very complex biological samples such as human tissue. How can we attain a reasonable throughput when
performing a proteolytic digestion of all proteins after 2-DE? How can we reduce the number of manipulations required for sample preparation before MS measurement? How can we simultaneously reduce the size of the samples and therefore increase their number? How can we handle the huge amount of experimental data and represent the result in a simple and comprehensive way?

A number of solutions have been proposed to answer these questions, and various approaches have been described to automate and accelerate the method. Traini et al. (1998b) proposed the use of a prototype robotic system to image and excise a few hundred spots from a stained polyvinylidene difluoride (PVDF) blot. The protein samples were then enzymatically digested with an automated liquid handling system. The mass spectra of the peptide mass fingerprints were acquired using MALDI-TOF MS in automated mode. Proteins were identified using automated interrogation software. Even though this approach is automated, the time-consuming digestion process is partially sequential and involves expensive sample handling, due to material costs. In addition, since the size of a sample is limited by the size of the excised spot, problems occur when overlapping spots are present on a gel. In order to reduce sample handling and to decrease the analyzed sample size to that of the MALDI-TOF MS laser beam impact (a spot a few tens of µm in diameter), gels or membranes containing peptides or proteins have been used for direct MALDI-TOF MS measurements. Ogorzalek Loo et al. (1997b) measured protein masses directly from thin layer isoelectrofocusing gels.
Various types of membranes have also been used as sample supports for peptide or protein mass determination, such as polyethylene (Blackledge & Alexander, 1995), non-porous polyurethane (McComb et al., 1998; McComb et al., 1997), PVDF (Immobilon PSQ or Trans-Blot) (Eckerskorn et al., 1997; Fabris et al., 1995; Schreiner et al., 1996; Vestling & Fenselau, 1994) or the charged membrane Immobilon CD (Schreiner et al., 1996). Use of these sample supports allows the MS instrument to measure spectra separated by distances in the micrometer range. This opens the possibility of scanning such a surface and creating intensity images from the intensities of the MS signals, and therefore of localizing single peptides or proteins (Bienvenut et al., 1999; Caprioli, Farmer, & Gile, 1997; Eckerskorn et al., 1997).

In order to further increase the throughput of protein identification and to offer a flexible and powerful proteomic visualization tool, we designed a highly automated method that can create a fully annotated 2-D map starting from a 2-DE gel. This technology is called the “molecular scanner”. It combines parallel methods for protein digestion and electro-transfer (using the one-step digestion-transfer (OSDT) or the double parallel digestion (DPD) techniques described by Bienvenut et al. (1999)) with peptide mass fingerprinting approaches to identify proteins directly from PVDF membranes, the surface of which is scanned with MALDI-TOF MS. Using a set of dedicated tools, it makes it possible to create, analyze and visualize a proteome as a multi-dimensional image. This provides the technological
basis for the development of a clinical molecular scanner, which will be adapted and dedicated to medical diagnostics (Hochstrasser et al., 1991).

2. EXPERIMENTAL SECTION

2.1. Materials and reagents

IAV-trypsin membranes were prepared as described in Bienvenut et al. (1999). Low range SDS-PAGE standards and Trans-Blot® PVDF membrane were purchased from Bio-Rad (Richmond, CA, USA). Trifluoroacetic acid (TFA) and α-cyano-4-hydroxy-trans-cinnamic acid (ACCA) were purchased from Sigma (St. Louis, MO, USA). HPLC-grade acetonitrile (AcCN) was purchased from Fluka (Buchs, Switzerland). Methanol puriss. p.a. was purchased from Merck (Darmstadt, Germany). High vacuum grease was purchased from Labofur GmbH (Bern, Switzerland).

2.2. Description of the method

The method can be divided into 4 main sections (Figure 1).

A) Separation and digestion of the proteins. One-dimensional separation of SDS-PAGE standards and mini 2-DE of human plasma were performed according to Laemmli (1970) and Sanchez & Hochstrasser (1998), respectively. All proteins were proteolytically digested with trypsin and electro-blotted onto a PVDF membrane, using the OSDT parallel process described by Bienvenut et al. (1999) (Figure 1A). The collecting PVDF membrane thus contained sets of digestion products of all proteins, each of them localized at discrete positions on the surface. Where needed, PVDF membranes were stained with amido black after OSDT.

B) Acquisition of the peptide mass fingerprinting data. Matrix solution made of 5 mg/ml ACCA in 50% AcCN, 0.1% TFA or of 10 mg/ml ACCA in 70% MeOH was sprayed onto the PVDF membrane until the membrane became wet. After air-drying, the membrane was stuck onto a modified MALDI sample plate using high vacuum grease.
Figure 1: Scheme of the molecular scanner. A) Parallel digestion and simultaneous electro-transfer of proteins from a 2-D PAGE gel using the DPD/OSDT method (Bienvenut et al., 1999). B) MALDI-TOF MS scanning of the PVDF collecting membrane after spraying with matrix solution. (xi, yj) refers to the position where MS spectra were measured on the PVDF membrane. C) Identification procedure. Peak detection and mass calibration yield sets of PMF data. The MS signal measured at each (xi, yj) coordinate is represented by its m/z value m_ix and its MS intensity I_ix. The xi and yi values are interpreted as pI and Mr values. The PMF data are submitted to PeptIdent. Identification results are collected together with the PMF data. D) A visualization tool allows the analyzed data to be represented in different forms. Three examples of typical queries and representations are described here. (D1) An MS intensity image can be created that contains the identification data as database labels. It is generated in a Melanie-readable format. (D2) Another option is to search for a particular protein and to visualize it as an intensity plot. In this plot, the intensity represents the number of masses identified as belonging to the protein at each position. (D3) The program further allows a search for a set of predefined masses and generates an intensity image where the intensity represents the total intensity of the found mass peaks at each (x, y) position. This image can be smoothed if needed.

[Labels in the figure: 2D-gel containing the proteins; interface with immobilised endopeptidase, i.e. IAV-trypsin; PVDF membrane collecting the digested products; electrotransfer under alternative electric field; membrane + matrix solution sprayed on modified MALDI plate; laser; TOF-detector; automatic peak detection and calibration; peptide mass fingerprints: mass data + MS intensity data; identification in SWISS-PROT / TrEMBL (PeptIdent); x => pI, y => Mr; set of identification data; examples of typical queries: “Where is protein x? Plot as number of identified masses”, “Where are the masses x, y located? Plot as MS intensity”, “Show identified proteins” (annotated 2D-image).]

The stainless steel surface of the MALDI MS sample plate was flattened to allow the deposition of a 4 x 4 cm² PVDF membrane. An array of positions was
defined on the membrane. The membrane was then scanned by the MS, i.e. a mass spectrum was acquired at each position of the array (Figure 1B). The distance between adjacent MS acquisitions on the grid was constant for a given experiment (ranging between 0.2 and 0.5 mm). Mass spectra were acquired on a Voyager™ Elite MALDI-TOF mass spectrometer (Perseptive Biosystems, Framingham, MA, USA) equipped with a 337 nm nitrogen laser and a Delayed Extraction device. The accelerating voltage was 18 kV, a delayed extraction parameter of 140 ns was selected and the m/z value selected for the low mass gate was generally 850. Laser power was set about 20% above threshold. The diameter of the laser beam on the membrane was about 100 µm. Between 40 and 100 spectra were accumulated, depending on the amount of material analyzed. The coordinates of the laser shots on the sample plate and the naming of the MS files were controlled by special software to overcome limitations in the maximum number of coordinates and spectra allowed by the Voyager 4.03 acquisition software.

C) Processing of the MS data and protein identification. A flexible and interactive tool was developed to automatically treat all MS data consecutively and to perform the various steps of the analysis, starting with peak detection and calibration (Figure 1C). The positions on the sample plate were converted to apparent molecular mass (Mr) and pI values. The PMF data of all spectra, together with the calculated pI and Mr and other user-defined parameters (such as mass tolerance, chemical modifications considered, species taken into account, etc.), were automatically sent over the Internet for protein identification to PeptIdent, a PMF identification tool developed in Geneva (Binz, Wilkins et al., 1999) and available on the ExPASy server (http://www.expasy.ch; Appel, Bairoch, & Hochstrasser, 1994).

D) Analysis of the results: creation of virtual maps.
The identification results of PeptIdent were represented as an annotated image. All outputs of PeptIdent were acquired and stored in a modified format. The program generated a first virtual, annotated “2-D map”, a 3-D image in which the x and y coordinates related to pI and Mr values, respectively. The z values were represented in gray scale and reflected the intensity of the MS spectra, defined as the sum of the intensities of the MS signals in the spectrum considered. The range of m/z values to be considered was predefined. The intensity scale was chosen to be linear or logarithmic and the image was smoothed in some cases. The image file was stored in a graphical format which can be read by the Melanie 2-DE image analysis software package (Appel, Palagi et al., 1997; Appel, Vargas et al., 1997). The image also contained the identification results, which can be highlighted as labels in Melanie (Figure 1D). The number of distinct attributes contained in the image reflects the number of dimensions the image virtually contains. These are: pI, Mr, identification labels (SWISS-PROT or TrEMBL AC numbers, ID labels), peptide masses and MS intensities. In addition, for all potentially identified proteins, the annotations from PeptIdent (number of missed cleavages, annotated modifications, chemical modifications of Cys and Met residues, peptide sequences) are also available.
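The construction of the gray-scale image described above (one z value per scan position, equal to the summed signal intensity within a predefined m/z range, on a linear or logarithmic scale) can be sketched as follows. This is a minimal illustration under assumptions: the peak lists are taken as already detected and calibrated, the data layout and function names are hypothetical, and neither smoothing nor the Melanie file format is covered.

```python
import math


def intensity_image(spectra, n_cols, n_rows, mz_min, mz_max, log_scale=False):
    """Build a 2-D array of summed MS intensities.

    `spectra` maps a grid position (col, row) to a list of (m/z, intensity)
    peaks. Only peaks with mz_min <= m/z <= mz_max contribute. Positions
    without a spectrum stay at zero.
    """
    image = [[0.0] * n_cols for _ in range(n_rows)]
    for (col, row), peaks in spectra.items():
        total = sum(inten for mz, inten in peaks if mz_min <= mz <= mz_max)
        # log10(1 + total) keeps empty pixels at zero on the logarithmic scale.
        image[row][col] = math.log10(1.0 + total) if log_scale else total
    return image


def to_gray(image):
    """Rescale an image to integer gray levels 0..255 (0 = no signal)."""
    peak = max(max(row) for row in image)
    if peak == 0:
        return [[0] * len(row) for row in image]
    return [[round(255 * value / peak) for value in row] for row in image]


# Toy scan of two positions on a single row (made-up peak lists).
toy_spectra = {
    (0, 0): [(900.0, 50.0), (1500.0, 100.0), (2000.0, 200.0)],
    (1, 0): [(1200.0, 60.0)],
}
linear = intensity_image(toy_spectra, n_cols=2, n_rows=1, mz_min=1100.0, mz_max=3000.0)
print(to_gray(linear))
```

Whether the strongest signal is drawn dark or light is a display choice left out of this sketch.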
From all the data contained in this multi-dimensional image, the user can choose to filter and visualize only particular aspects (Figure 1D). Proteins or peptides can be searched on the image by filtering part of the total information. Thus, a protein can be visualized by the positions where it has been identified. The z intensity can be binary (black / white for present / absent, respectively) or a gray level. The darkness then represents either the number of peptides found to match the protein in the identification process using PeptIdent, or the sum of the MS intensities of the peptide masses matching the queried protein. Instead of searching for a protein, the user can specify and visualize a set of peptide masses. In this case, the image intensity scale can be defined from the number or the MS intensities of the masses detected out of the chosen list (Figure 1D).

3. RESULTS AND DISCUSSION

3.1. Representation of the analysis of a 1-dimensional scan of 1-DE

In order to set up the various experimental parameters of the method, we performed a number of analyses on a protein mixture of molecular weight standards separated by SDS-PAGE and treated by the DPD or OSDT method. The selected collecting membrane was PVDF. The membranes were initially stained with amido black to visualize the positions of the peptide fingerprinting bands (Figure 2A). Matrix solution was sprayed over the whole surface of the membrane. About 1.5 ml was used to spray a 4.4 x 0.5 cm PVDF membrane. The volume of matrix solution effectively deposited on the membrane was estimated to be 1-2 µl/mm². After air drying, the membrane was scanned in one dimension with MALDI-TOF MS. The summed intensity of the detected MS signals, for a given mass range, was plotted against the axis coordinate along the membrane (Figure 2B). The intensity of the MS spectra obtained from the stained membrane varied along the scanning axis.
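The one-dimensional intensity profile just described (summed signal intensity within a given mass range, plotted against the position along the scanning axis) can be sketched as below. The data layout and the function name are hypothetical. The optional exclusion windows are an added assumption, included so that known calibrant peaks such as the two internal standards named in the Figure 2 caption can be left out of the sum.

```python
def intensity_profile(scan, mz_min, mz_max=float("inf"), exclude=()):
    """Summed intensity per scan position.

    `scan` is a list of (position, peaks) pairs ordered along the scanning
    axis, where `peaks` is a list of (m/z, intensity) tuples. `exclude` is a
    sequence of (center, half_width) windows whose peaks are ignored.
    Returns a list of (position, summed_intensity) pairs.
    """
    def keep(mz):
        if not (mz_min <= mz <= mz_max):
            return False
        return all(abs(mz - center) > half_width for center, half_width in exclude)

    return [(pos, sum(inten for mz, inten in peaks if keep(mz)))
            for pos, peaks in scan]


# Toy scan with two positions (made-up peak lists). The exclusion windows
# use the internal standard masses given in the Figure 2 caption.
toy_scan = [
    (0, [(1000.0, 5), (1200.0, 10), (1498.8, 100)]),
    (1, [(1300.0, 20), (2095.1, 50), (2500.0, 7)]),
]
standards = [(1498.82, 1.0), (2095.08, 1.0)]
print(intensity_profile(toy_scan, mz_min=1100.0, exclude=standards))
```

Narrowing the range to a window of +/- 1 Da around a single peptide mass turns the same function into a single ion intensity profile.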
The positions of the 4 intensity maxima on the MS profiles correlated with the positions of the 4 stained bands. The MS profiles revealed distinctly resolved bands, suggesting that the separation of the peptide fingerprints was conserved during the DPD or OSDT step and during matrix deposition. The peptide-containing areas are separated by blank areas showing no MS intensity (position 2 in Figure 2A and MS spectrum in Figure 2D). No significant broadening of the bands was observed in comparison with the corresponding undigested, electro-transferred, stained protein bands. This would suggest that the
peptides do not diffuse significantly during the digestion, during the transfer process or on the membrane upon matrix application. From this membrane, protein identification was performed from each MS spectrum at a maximum of MS intensity (Figure 2D). All 4 proteins (positions 1, 3, 4 and 5 in Figure 2A) could be unambiguously identified in SWISS-PROT using PeptIdent. Different matrix solvents and techniques for applying the matrix solution were compared. As an example, the methanol-containing matrix solution wetted the surface of the PVDF much more homogeneously than the acetonitrile-containing solution. More intense MS spectra and more homogeneous MS profiles were obtained with the methanolic matrix solution (data not shown).
Figure 2. Result of a one-dimensional scan. Low range standard proteins were separated by 1-DE and then treated with the OSDT procedure. The one-dimensional MS scan was performed on an amido black stained PVDF membrane along the longitudinal dashed line of the PVDF image A. Plots B and C are MS intensity profiles. They represent the intensity of MS signals (number of counts from the MALDI-TOF MS detector) as a function of the position on the membrane (scanning axis, in units of 10⁻⁴ inch). In the lined plot, only m/z values greater than 1100 were considered. The intensities due to the two internal standards at masses 1498.82 +/- 1 and 2095.08 +/- 1 Da were excluded. MS spectra measured at the intensity maxima of the plot (positions 1, 3, 4 and 5 in A) and in the background (position 2 in A) are shown in D. From these spectra, the four standard proteins could be identified with PeptIdent; they are labeled with their SWISS-PROT ID identifiers (e.g. CAH2_BOVIN) on the respective PMF MS spectra. Plot C is made of single ion intensity profiles. The selected ions were chosen from the set of peptides specifically matching one of the four identified proteins. Values of 2198.2 (+), 1774.0 (*), 1440.0 (o) and 1426.7 (x) m/z were considered with a window of +/- 1 Da. Matrix solution was 10 mg/ml ACCA in 70% methanol. 110 spectra were acquired, each accumulated 100 times, over a total scanning length of 4.4 cm.
3.2. Representation of the analysis of a two-dimensional scan from a single band of 1-DE

Similarly, Figure 3 shows the result of a 2-D scan performed on a single protein band, together with its interpretation. Low range SDS standards were separated by 1-DE, processed with the OSDT method, and the PVDF membrane was stained with amido black. A 0.8 x 0.6 cm² area of the membrane containing the digested soybean trypsin inhibitor was scanned with a resolution of 0.5 mm. The amido black stained image of the band after OSDT on a PVDF membrane (Figure 3B) was compared with an MS intensity image calculated from all MS spectra (Figure 3C and Figure 3D), where only m/z values higher than 1100 Da were considered (there are interfering matrix
Figure 3: Two-dimensional MS scan of 1-DE: the soybean trypsin inhibitor band. From the same membrane as in Figure 2, a piece of 1.1 x 0.9 cm² was cut around the soybean trypsin inhibitor band and sprayed with a 10 mg/ml ACCA solution in 70% methanol. An array of 16 x 12 points was defined around the center of the band, with a distance between spots of 500 µm. A) 3-D MS intensity profile. All m/z values higher than 1100 Da were considered to create the smoothed image. B) Amido black stained image. C) MS intensity image. D) MS intensity image, plotted on a logarithmic scale. The white dots represent the positions where ITRA_SOYBN was unambiguously identified with PeptIdent, with a minimum of 5 matching m/z values.
signals below 1100 Da). A 3-D plot (Figure 3A) representing the absolute intensities as a function of the x,y position on the membrane was created with Matlab 5.2 (MathWorks, Inc., 24 Prime Parkway, Natick, Massachusetts, USA), and an MS intensity image was calculated under the same conditions (Figure 3C). The MS intensity profile followed relatively smooth curves, suggesting a relatively homogeneous quality of the matrix crystallization. This approach highlighted the possibility of creating intensity profiles from MS intensity values and of describing spot areas. Addition of a matrix solution did not seem to lead to any significant diffusion of peptides on the matrix surface. The sensitivity of such MS staining was equal to or better than that of amido black staining of the transferred protein. The intensity images obtained from these values show that the experimental noise, which may be due to inhomogeneity of the matrix crystallization quality on the membrane or to the low doping of the matrix by the analytes, was limited. From the 192 acquired spectra, the soybean trypsin inhibitor (SWISS-PROT ID: ITRA_SOYBN, AC: P01070) could be unambiguously identified 31 times (white dots in Figure 3D) with a minimum of 5 matched m/z values, representing a contiguous region in the center of the 2-D image.

3.3. Identification by two-dimensional scan of human plasma proteins separated by 2-DE

From a human plasma sample separated on a mini-2-DE gel and transferred using the OSDT method, we cut a section of the collecting PVDF membrane around the amido black stained spot of apolipoprotein-A1 (SWISS-PROT AC P02647). The smoothed MS intensity image is shown in Figure 4B. Of the 195 MS files measured from pixels at 380 µm steps, 77 yielded apolipoprotein-A1 as the identified protein with a minimum of 5 peptide masses matched. The image shows that MS intensities are detected around the area corresponding to the amido black stained visible surface.
The observed signal corresponded to the most intense peaks of the spectra from which apolipoprotein A1 was identified, thus suggesting that “MS staining” is more sensitive than the amido black staining method.

Various representations from a more complex protein mixture are shown in Figure 4. Another section of the amido black stained PVDF membrane, obtained after OSDT treatment of a mini-2-DE gel from a human plasma sample, was also scanned with the MALDI-TOF MS. Its size was about 1.83 x 0.37 cm² (Figure 4A and Figure 4C). A total of 890 spectra were measured at 400 µm steps. The chosen section of the scanned PVDF membrane contained a set of overlapping spots and trains of spots, as deduced from the known distribution of identified proteins in the human plasma image in the SWISS-2DPAGE database (http://www.expasy.ch/cgi-bin/map2/big?PLASMA_HUMAN). It also contained contamination from the adjacent and very abundant albumin, centered above the upper right corner of the excised PVDF surface. In addition, a probably large number of proteins whose sequences are not present in databases were also present in this sample. The MS intensity image reveals a continuous background of MS signals, represented by a
Figure 4. 2-D scan of a plasma mini-2-DE after OSDT: possibilities for extracting proteomic information from a 2-D MS scan. 250 µg of human plasma were separated by mini-2-DE. Proteins were digested and transferred onto PVDF using the OSDT procedure. A) Image of the membrane stained with amido black. B) Smoothed MS intensity image of the region where apolipoprotein-A1 (SWISS-PROT AC P02647) was identified. The circle indicates the size of the spot visible by amido black staining. This shows that MS intensities are still detected where amido black staining is blank. C) Enlargement of image A, showing the positions corresponding to the proteins identified in the area. The labels are the SWISS-2DPAGE ID names: AACT_HUMAN (alpha-1-antichymotrypsin, AC: P01011), VTDB_HUMAN (vitamin D binding protein, AC: P02774), ALBU_HUMAN (serum albumin, AC: P02768), A1AT_HUMAN (alpha-1-antitrypsin, AC: P01009), FIBG_HUMAN (fibrinogen gamma chain, AC: P02679), IGHA_HUMAN (immunoglobulin alpha chain, AC: P99002). D) Raw MS intensity image of the region zoomed from the amido black image. E) Same image, but smoothed. F) In this MS image, the intensity is related to the number of MS signals used to identify the query protein, i.e. AACT_HUMAN (SWISS-PROT AC P01011). One of the MS spectra of the spot is also shown. G) Smoothed MS image, where the intensity represents the summed intensity of the MS signals used to identify VTDB_HUMAN (SWISS-PROT AC P02774) at each pixel. One of the MS spectra allowing the identification of the protein is shown. H) MS intensity image of the same region, where only MS signals belonging to immunoglobulin alpha chain (IGHA_HUMAN) peptides are considered.
grey background (Figure 4D, Figure 4E). This suggests that a lot of peptide material is measured over the whole surface, and that the protein spots are not isolated entities. The analysis tool, however, made it possible to filter this complex feature and to extract spots corresponding to single proteins. Protein spots can therefore be isolated from chemical noise. As examples, Figure 4F and Figure 4G show two different regions of the image from which the alpha-1-antichymotrypsin
(SWISS-PROT ID: AACT_HUMAN, AC: P01011) and the vitamin D binding protein (SWISS-PROT ID: VTDB_HUMAN, AC: P02774), respectively, were identified and visualized as two isolated spots. The figure shows two possibilities for representing specific “intensity values”. In the first (the AACT_HUMAN spot in Figure 4F), the intensity of each pixel is proportional to the absolute number of peptides identified for the given protein. This intensity is related to the confidence of identification. In the second (the VTDB_HUMAN spot in Figure 4G), the intensity is proportional to the sum of the MS intensities of the peptide peaks identified for the given protein; the intensity was then smoothed. Here the intensity is more closely related to the protein concentration than in the first representation. In this way we have graphically extracted the contribution of the two proteins from the total MS intensity image shown in Figure 4C and Figure 4D. Some proteins are highly abundant and are present in multiple forms, such as the immunoglobulin alpha chain. They are detected over a large part of the area, thus yielding chemical noise for other proteins (Figure 4H). A number of proteins were clearly identified from this sample, and their relative positions on the membrane correlated with those identified in the human plasma master gel in the SWISS-2DPAGE database (Figure 4C).

4. DISCUSSION

The technique presented, known as the molecular scanner, provides a powerful tool for proteomics research. Firstly, it is a high-throughput method dedicated to protein identification, using peptide mass fingerprinting (or other methods in the future), and applied to the entire 2-DE gel. It uses a parallel method of protein digestion. Thus, in one experimental step, thousands of proteins can be chemically processed or digested simultaneously, under identical experimental conditions.
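The two per-protein representations used in section 3.3 for Figure 4F and Figure 4G (number of matched peptide masses per pixel, and summed intensity of the matched peaks per pixel) can be sketched as two small filters over the scan data. The pixel layout, the function names, the mass tolerance and the toy peak lists are hypothetical. The identification results are assumed to be already available for each pixel as lists of matched masses keyed by accession number.

```python
def count_map(pixels, accession):
    """Per pixel: number of peptide masses matched to `accession`."""
    return {xy: len(px["hits"].get(accession, [])) for xy, px in pixels.items()}


def intensity_map(pixels, accession, tol_da=0.5):
    """Per pixel: summed intensity of the peaks matched to `accession`."""
    result = {}
    for xy, px in pixels.items():
        matched = px["hits"].get(accession, [])
        result[xy] = sum(inten for mz, inten in px["peaks"]
                         if any(abs(mz - m) <= tol_da for m in matched))
    return result


# Toy data: two pixels with made-up peak lists. "hits" holds, per accession
# number, the masses that the identification step assigned to that protein.
toy_pixels = {
    (0, 0): {"peaks": [(1200.0, 10.0), (1500.0, 30.0)],
             "hits": {"P01011": [1200.0, 1500.0]}},
    (1, 0): {"peaks": [(1200.0, 5.0), (1800.0, 40.0)],
             "hits": {"P02774": [1800.0]}},
}
print(count_map(toy_pixels, "P01011"))
print(intensity_map(toy_pixels, "P02774"))
```

Thresholding either map at zero gives the binary present/absent view; smoothing, as applied to Figure 4G, is not included here.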
The sample obtained can be used directly for MS measurements. This method limits losses of material caused by sample manipulation. The size of each MS sample is reduced to the size of the laser beam used in the MALDI-TOF MS, i.e. about 10⁻² mm². A single 2-DE protein spot can therefore be represented by more than 100 spectra. Secondly, the PMF analysis is fully automated and can be modified in a modular way at any step, i.e. the choice of the peak detection algorithm, of the calibration procedure, of the masses considered for identification, of the arguments sent to PeptIdent and of the image representation.

The molecular scanner provides virtual images which can be considered as graphical projections of an automatically generated proteomic database. The database can be searched by protein identifiers (i.e. protein ‘name’) or by mass-related identification results. The user can choose to visualize a single protein by searching for the positions where the protein has been identified. As the position of a set of masses can be searched, a protein can be visualized as a function of the number and/or the intensity of MS signals matched by PeptIdent for this protein. Where a protein yields a train of spots on a 2-DE gel, the spot corresponding to one particular
form of the protein can be isolated by searching for a specific peptide mass in the spectra. This allows the systematic analysis of post-translational modifications. In this respect, all PeptIdent results could be used as input data for a characterization step using FindMod (Wilkins et al., 1999). FindMod is a tool which interprets unannotated MS signals for a given protein and PMF data. It looks, by mass difference, for the occurrence of post-translational modifications using a set of intelligent rules, as well as for potential amino acid substitutions. It can therefore be systematically linked to PeptIdent: after the identification step, it helps to further characterize and discriminate all spots of a train. In the future, different potential post-translational modifications will be automatically highlighted in various colors on the image obtained by this scanner.

The high resolution obtained by MS scanning becomes particularly useful when overlapping spots occur. Such an overlap can be interpreted as a mixture of proteins. Reconstitution of intensity envelopes from peptide mass fingerprinting makes it possible to discriminate two or more overlapping spots. One or the other spot can then be visualized by choosing the peptide masses specific to that particular protein form in order to create an image, or the spots can be represented by different coloring systems.

As an additional feature, neither the gel nor the PVDF membrane needs to be chemically stained. The MS intensity acts as a ‘coloring’ agent. Since spots can be localized, the image can be compared, aligned and matched with other gel images or PVDF images stained with conventional methods. As with chemical staining methods, the intensity of the MS signals is proportional neither to the amount of protein loaded nor to the amount of amino acid contained in the different spots. This depends on the desorption process and on the ionisation yields.
Thus, the intensity of the MS signals only partially correlates with the intensity of amido black staining (see Figure 3 of the accompanying paper (18)) or with the absolute amount of material. In the 1-D scan (Figure 2), the SDS gel was loaded with 1 µg of each protein. Therefore no estimate of protein amount can be deduced from a single MS image. However, comparative studies may be performed between several MS images in cases where identical spots are compared.

The experiments illustrated gave a preliminary idea of the sensitivity of the method. In the 1-D experiments we loaded 1 µg of each molecular weight standard, which corresponds to about 10 to 33 pmol of protein (see Figure 2). The bands visible on the control PVDF membrane, i.e. a membrane obtained without protein digestion and stained with amido black, covered about 15 mm². All proteins could be identified very clearly. The sensitivity here was thus about 66 ng/mm², or 0.66 to 2 pmol/mm². As a protein spot on a mini-2-D gel covers about this area, one could extrapolate that the detection limit for a clear identification currently lies in the low picomole range, with no optimization of the method. Another experiment was performed by loading 0.2 µg of each molecular weight standard. All proteins could be identified, but on fewer pixels (not shown). Moreover, one single MS measurement covers about 10⁻² mm². Although the efficiencies of the peptide extraction, co-crystallization and ionization processes are not known, every identification was performed on about 10 fmol of
initial protein. Note that these calculations are intended only as a rough estimate of the current sensitivity of the identification process. It should also be noted that spectra measured at pixels neighboring those of identified proteins still contain detectable peaks. The sensitivity of peptide peak detection is therefore higher than the sensitivity of protein identification. An interesting observation deserves mention. When comparing spectra measured after an in-gel digestion and after a membrane scan, differences are noticeable in terms of the peptides detected. Even if most of the mass signals are present in both cases, the intensity of the signals can differ strongly. Some signals are present in the in-gel spectrum and absent in the scanned spectrum. Unfortunately, there is no evident relationship between the presence and intensity of a signal and a physico-chemical property of the corresponding peptides. However, most of the peptides detected with the two methods are perfectly digested peptides, and they generally cover a similar percentage of the sequence. More detailed studies will have to be undertaken. As MS information represents an additional specific property of a spot, two images containing MS information could be aligned by matching their mass spectra and/or their resulting identifications in the Melanie 2-DE image analysis software. This procedure could replace and/or confirm manual and software-based alignment of matched gel images. The development of the molecular scanner required us to develop and integrate high throughput methods for sample preparation and analysis. Specific bioinformatics tools had to be created as well. The molecular scanner was designed as a set of interconnecting modules, which can be exchanged and modified in a very flexible manner. It can therefore easily be adapted for improvements and modifications. The current bottleneck of this technique is the time necessary to scan the membrane with the mass spectrometer.
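As a rough illustration of this bottleneck, acquisition time can be estimated as the number of grid positions times the number of laser shots per position, divided by the laser repetition rate. The sketch below is a back-of-the-envelope estimate only: the function name is hypothetical, and stage movement, data transfer and any other per-position overhead are ignored. The parameter values (0.4 mm step, 64 shots per position, 3 Hz laser) are those quoted in this discussion.

```python
def scan_time_hours(width_mm, height_mm, step_mm, shots_per_position, laser_hz):
    """Naive acquisition time when every grid position receives a fixed number of shots."""
    positions = round(width_mm / step_mm) * round(height_mm / step_mm)
    seconds = positions * shots_per_position / laser_hz
    return seconds / 3600.0


# 4 x 4 cm surface and full 16 x 16 cm membrane, same scan settings.
small = scan_time_hours(40, 40, 0.4, 64, 3)
full = scan_time_hours(160, 160, 0.4, 64, 3)
print(f"4 x 4 cm: about {small:.0f} h")
print(f"16 x 16 cm: about {full / 24:.0f} days")
```

With these inputs the naive estimate is about 59 hours for the 4 x 4 cm surface, which is of the same order as the roughly 55 hours reported, and it grows by a factor of 16 for a 16 x 16 cm membrane.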
Without optimization, the MS scanning time of a 4 x 4 cm² surface is about 55 hours at a 0.4 mm resolution with 64 laser shots per position. This means that a full 16 x 16 cm² membrane would require, under the same conditions, more than 36 days of continuous measurements and about 40 Gb of memory to archive the raw data. As people tend to stretch the pI axis using narrow pH gradient strips in the first dimension, the separation of the protein spots improves, but the measurement time needed increases as well. The acquisition rate of the MS spectra is currently limited by the 3 Hz frequency of the laser and by a fixed number of laser shots per pixel. In order to accelerate it, one should at least be able to control the number of required laser shots by software, i.e. to skip acquisition when spectra are empty or where the signal to noise ratio is above a given threshold. This may yield a gain of a factor of 2 to 5. Due to ion statistics, it is difficult to reduce drastically the number of laser shots per pixel. As the detector is inactive at least 99% of the time, the acquisition frequency should be increased, either by an increase of the laser repetition rate or by the use of multiple lasers at neighboring positions on the membrane. As time is required to allow relaxation of the crystals between two laser shots, there is a physical limit to what an increase of the pulse rate alone can achieve. As the specificity of a protein identification strongly depends on the mass accuracy, efforts can also be
focused on the comparison of mass patterns in neighboring pixels. Finally, as MS technology develops, we anticipate that the full scan of a 10 x 10 cm² mini-2-DE gel will be performed in a few hours.

5. CONCLUSION

In medicine, the development of computer-assisted tomography methods made it possible to visualize the complexity of the human body as a volume of anatomically related organs and tissues. The cellular components of a tissue can today be described using immunohistology and immunocytology. There is an obvious need to describe the protein content of a cell or of a biological fluid. The molecular scanner makes it possible to analyze many proteins in such a complex system. It reports, at the molecular level, the complexity of the protein content. The presentation of a proteome as a searchable database, which can be visualized as user-defined 3-D images, provides a powerful tool for comparative analysis in proteomics. The method, initially starting from 2-DE separation of proteins, can be adapted to other fields such as protein chips or other multidimensional separation methods. It can also be applied in clinical diagnostics where modifications occurring to proteins, i.e. mutations or changes in post-translational modifications, have to be monitored. These changes may be observed as changes in the PMF patterns, although they may not influence the migration of the protein itself. In addition to the presented approach, high throughput MS/MS sequencing methods (25, 26) or chip technology could represent complementary features. They yield additional information and provide a huge amount of data to be analyzed through visualization methods such as the one proposed here. Finally, this technique can be combined with additional types of analysis. The same surface can be reused for a new analysis, such as an MS scan under different conditions or with another laser, i.e. an IR laser (13).
In the case of particularly interesting spots, one can use the known coordinates of the location of the spot to perform additional chemistry on this particular area. The spot of interest can similarly be cut using a dedicated excision system to be submitted to further analyses, such as MS/MS. The molecular scanner is therefore a tool which can be fully integrated in any more general proteomics analysis process.

6. ACKNOWLEDGEMENT

We are deeply grateful to the Helmut Horten Foundation for its financial support. This work was also supported by the Swiss National Fund for Scientific Research (grants 32-49314.96 and 31-52974.97) and the Montus Foundation. We are also very thankful to Dr. Keith Rose for his technical and critical help.
7. REFERENCES

Appel, R., Bairoch, A., & Hochstrasser, D. (1994). Trends Biochem. Sci., 19, 258-260.
Appel, R., Palagi, P., Walther, D., Vargas, J., Sanchez, J., Ravier, F., et al. (1997). Melanie II--a third-generation software package for analysis of two-dimensional electrophoresis images: I. Features and user interface. Electrophoresis, 18(15), 2724-2734.
Appel, R., Vargas, J., Palagi, P., Walther, D., & Hochstrasser, D. (1997). Melanie II--a third-generation software package for analysis of two-dimensional electrophoresis images: II. Algorithms. Electrophoresis, 18(15), 2735-2748.
Bienvenut, W., Sanchez, J., Karmime, A., Rouge, V., Rose, K., Binz, P., et al. (1999). Toward a clinical molecular scanner for proteome research: parallel protein chemical processing before and during western blot. Anal. Chem., 71(21), 4800-4807.
Binz, P., Muller, M., Walther, D., Bienvenut, W., Gras, R., Hoogland, C., et al. (1999). A molecular scanner to automate proteomic research and to display proteome images. Anal. Chem., 71(21), 4981-4988.
Binz, P., Wilkins, M., Gasteiger, E., Bairoch, A., Appel, R., & Hochstrasser, D. (1999). In R. Kellner, F. Lottspeich & H. Meyer (Eds.), Microcharacterisation of proteins (2nd ed., pp. 277-300). Berlin: Wiley-VCH.
Blackledge, J., & Alexander, A. (1995). Polyethylene membrane as a sample support for direct MALDI MS of high mass proteins. Anal. Chem., 67, 843-848.
Caprioli, R., Farmer, T., & Gile, J. (1997). Molecular imaging of biological samples: localization of peptides and proteins using MALDI-TOF-MS. Anal. Chem., 69, 4751-4760.
Eckerskorn, C., Strupat, K., Schleuder, D., Hochstrasser, D., Sanchez, J., Lottspeich, F., et al. (1997). Analysis of proteins by direct scanning IR-MALDI-MS after 2-D PAGE separation and electroblotting. Anal. Chem., 69, 2888-2892.
Fabris, D., Vestling, M., Cordero, M., Doroshenko, V., Cotter, R., & Fenselau, C. (1995). Rapid Commun. Mass Spectrom., 9(11), 1051-1055.
Hochstrasser, D. (1998). Proteome in perspective. Clin. Chem. Lab. Med., 36(11), 825-836.
Hochstrasser, D., Appel, R., Vargas, R., Perrier, R., Vurlod, J., Ravier, F., et al. (1991). MD-Computing, 8, 85-91.
Hoogland, C., Sanchez, J., Tonella, L., Binz, P., Bairoch, A., Hochstrasser, D., et al. (2000). 28(1), 286-288.
Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature, 227(259), 680-685.
McComb, M., Oleschuk, R., Chow, A., Ens, W., Standing, K., Perreault, H., et al. (1998). Characterization of hemoglobin variants by MALDI-TOF MS using a polyurethane membrane as the sample support. Anal. Chem., 70(24), 300.
McComb, M. E., Oleschuk, R. D., Manley, D. M., Donald, L., Chow, A., O'Neil, J. D. J., et al. (1997). Use of a non-porous polyurethane membrane as a sample support for MALDI-TOF MS of peptides and proteins. Rapid Commun. Mass Spectrom., 11, 1716-1722.
Ogorzalek Loo, R., Mitchell, C., Stevenson, T., Martin, S., Hines, W., Juhasz, P., et al. (1997). Sensitivity and mass accuracy for proteins analyzed directly from polyacrylamide gels: implications for proteome mapping. Electrophoresis, 18(3-4), 382-390.
Sanchez, J., & Hochstrasser, D. (1998). In A. Link (Ed.), Methods in molecular biology: 2-D proteome analysis protocols (Vol. 112, pp. 227-233). Totowa, NJ: Humana Press.
Scheler, C., Lamer, S., Pan, Z., Li, X., Salnikov, J., & Jungblut, P. (1998). Peptide mass fingerprint sequence coverage from differently stained proteins on 2-DE patterns by MALDI-MS. Electrophoresis, 19, 918-927.
Schreiner, M., Strupat, K., Lottspeich, F., & Eckerskorn, C. (1996). UV-MALDI-MS of electroblotted proteins. Electrophoresis, 17, 954-961.
Traini, M., Gooley, A. A., Ou, K., Wilkins, M. R., Tonella, L., Sanchez, J.-C., et al. (1998). Towards an automated approach for protein identification in proteome projects. Electrophoresis, 19, 1941-1949.
Vestling, M., & Fenselau, C. (1994). PVDF: an interface for gel electrophoresis and MALDI-MS. Biochem. Soc. Trans., 22(2), 547-551.
Wilkins, M., et al. (1999). High throughput mass spectrometry discovery of protein post translational modification. J. Mol. Biol., 289, 645-657.
Wilkins, M., Sanchez, J., Gooley, A., Appel, R., Humphery-Smith, J., Hochstrasser, D., et al. (1995). Progress with proteome projects: Why all proteins expressed by a genome should be identified and how to do it. Biotechnology & Genetic Engineering Reviews, 13, 19-50.
Wilkins, M., Sanchez, J., Williams, K., & Hochstrasser, D. (1996). Current challenges and future applications for protein maps and post-translational vector maps in proteome projects. Electrophoresis, 17, 830-838.
Williams, K., & Hochstrasser, D. (1997). In Proteome Research: New Frontiers in functional genomics (pp. 1-12). Berlin: Springer-Verlag.
CHAPTER 5 SIGNAL TREATMENT AND VIRTUAL IMAGES PRODUCTION (2/2): Visualization and Analysis of Molecular Scanner Peptide Mass Spectra (Muller et al., 2002)
Muller M, Gras R, Appel RD, Bienvenut WV, Hochstrasser DF
ABSTRACT

The molecular scanner combines protein separation using gel electrophoresis with peptide mass fingerprinting (PMF) techniques to identify proteins in a highly automated manner. Proteins separated in a 2-dimensional polyacrylamide gel (2D-PAGE) are digested ‘in parallel’ and transferred onto a membrane keeping their relative positions. The membrane is then sprayed with a matrix and inserted into a matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometer, which measures a peptide mass fingerprint at each site on the scanned grid. First, visualization of PMF data allows surveying all fingerprints at once and provides very useful information on the presence of chemical noise. Chemical noise is shown to be a potential source for erroneous identifications and is therefore purged from the mass fingerprints. Then, the correlation between neighboring spectra is used to recalibrate the peptide masses. Finally, a method that clusters peptide masses according to the similarity of the spatial distributions of their signal intensities is presented. This method allows discarding many of the false positives that usually go along with PMF identifications and allows identifying many weakly expressed proteins present in the gel.
169 W. V. Bienvenut (ed.), Acceleration and Improvement of Protein Identification by Mass Spectrometry, 169–188. © 2005 Springer. Printed in the Netherlands.
MULLER ET AL.
1. INTRODUCTION

At present, as complete genomes for an increasing number of organisms are available, attention must be focused on the proteins encoded by the genes. In contrast to the static genome, the proteome of an organism is a highly dynamic and connected network, and new analytical methods have to be developed in order to describe its spatial and temporal changes and interactions (Godovac-Zimmermann & Brown, 2001). An important step in this task is the high throughput identification of proteins, which nowadays mostly relies on efficient protein separation, mass spectrometry, protein sequence databases as well as bioinformatics (Bienvenut et al., 2001). One of the most important methods for protein separation is 2-dimensional polyacrylamide gel electrophoresis (2D-PAGE) (Bjellqvist et al., 1982). This technique allows thousands of proteins to be separated simultaneously according to their isoelectric point (pI) and molecular weight (Mr) and displayed on a two-dimensional map. Mass spectrometry (MS) has become one of the most powerful techniques to identify organic molecules. Among various applications, peptide mass fingerprinting (PMF) is frequently used because, combined with matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (Karas & Hillenkamp, 1988; Tanaka et al., 1988), it provides a rapid and sensitive method for protein identification. PMF compares the list of experimental masses of peptides, the peptide mass fingerprint, obtained by specific endoproteolytic digestion of proteins with the theoretical mass values calculated by in silico digestion of protein sequences. A score evaluates how well the theoretical masses match the fingerprint (Henzel et al., 1993; James et al., 1993; Mann, Höjrup, & Roepstorff, 1993; Pappin et al., 1993; Yates, III et al., 1993). Gras et al.
(Gras et al., 1999) presented a PMF identification algorithm, which is based on a scoring scheme that takes into account important parameters like mass accuracy, protein coverage by matching peptides, number of missed cleavage sites and the deviation of the measured pI and Mr values (if available) from theoretical predictions. In order to learn the weights of these parameters for the PMF identification score, a set of 91 PMF test spectra was used and optimal values of these weights were calculated by means of a genetic algorithm. Eriksson et al. (Eriksson, Chait, & Fenyo, 2000) investigated the influence of different experimental parameters on statistical thresholds used to discern false matches for two different scoring schemes. Since the experimental mass fingerprint can match the theoretical peptide masses of a protein by chance, there is always a certain probability for false identifications in PMF. There is a trade-off between sensitivity and specificity of a database search: if the search is too restrictive, it might miss some proteins (false negatives) and if it is not restrictive enough, it might find too many erroneous matches (false positives).
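The matching step at the heart of PMF can be illustrated with a minimal sketch. It is not the scoring scheme of Gras et al. or of any published program: candidates are simply ranked by the number of theoretical peptide masses found in the fingerprint within a relative tolerance, with the matched fraction as a crude stand-in for coverage. All function names, the default tolerance, the minimum match count and the toy masses are assumptions made for illustration.

```python
def ppm_error(measured, theoretical):
    """Signed mass deviation in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def count_matches(fingerprint, theoretical_masses, tol_ppm):
    """Count theoretical peptide masses matched by at least one measured mass."""
    return sum(
        any(abs(ppm_error(m, t)) <= tol_ppm for m in fingerprint)
        for t in theoretical_masses
    )


def rank_candidates(fingerprint, database, tol_ppm=200.0, min_matches=3):
    """Return (protein, matches, matched fraction) tuples, best candidate first."""
    hits = []
    for protein, masses in database.items():
        n = count_matches(fingerprint, masses, tol_ppm)
        if n >= min_matches:
            hits.append((protein, n, n / len(masses)))
    return sorted(hits, key=lambda hit: (hit[1], hit[2]), reverse=True)


# Made-up masses for illustration only; these are not real digests.
toy_database = {
    "PROT_A": [1000.0, 1500.1, 2000.0, 2500.0],
    "PROT_B": [1000.0, 3000.0],
}
toy_fingerprint = [1000.1, 1500.0, 2000.2]
print(rank_candidates(toy_fingerprint, toy_database))
```

Raising `min_matches` or lowering `tol_ppm` makes the search more restrictive (fewer false positives, more false negatives), which is the trade-off described above.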
The precision of mass measurements certainly influences the sensitivity and specificity of PMF identification. Since the resolution of mass spectrometers has improved, calibration errors are now the limiting factor. These errors originate from uncertainties in the estimation of experimental parameters such as electric field strengths and initial ion velocities. Calibration of mass spectra is not a trivial problem even if internal standards are used. For TOF instruments, the function that relates the flight time to the m/z value and the algorithm used to calculate the calibration parameters have to be carefully chosen in order to get a good precision. Christian et al. (Christian, Arnold, & Reilly, 2000) described a method that is based on physical flight time equations (Juhasz, Vestal, & Martin, 1997; Vestal & Juhasz, 1998) and a simplex method to search for the optimal instrument parameters. This approach proved to be more robust than usual curve fitting methods, especially in the mass range where no standard masses were available. Several partially automated methods have been proposed to excise protein spots from a stained gel, to submit the excised material to endoproteolytic digestion and to extract peptides from the excised gel (Lopez, 2000). The peptides are then loaded onto a MALDI sample plate and introduced into a mass spectrometer for PMF acquisition (Traini et al., 1998b). These methods have the inconvenience that the location of protein spots must be known prior to excision, and that the excision precision is limited (> 1 mm). Recently, Binz et al. (Binz, Muller et al., 1999) introduced a new and highly automated approach, dubbed the molecular scanner, which combines 2-D PAGE separation techniques with PMF methods. In this approach, the proteins were digested first in the gel itself and then during transfer onto a collecting polyvinylidene fluoride (PVDF) membrane (Bienvenut et al., 1999).
This membrane was sprayed with a matrix solution (α-cyano-4-hydroxycinnamic acid), and the co-crystallisation of the matrix and the peptides allowed MALDI-MS analysis. Since diffusion in this process was not relevant, the location of the peptides on the PVDF membrane corresponded to the location of their proteins in the gel (Pacholski & Winograd, 1999). The membrane was then scanned by a MALDI-TOF mass spectrometer. For each scanned point the acquired peptide mass fingerprint was submitted to a PMF identification program, which returned a list of matching proteins. A threshold that was based on a statistical analysis of erroneous identifications was used to distinguish false identifications by their average identification score (Bienvenut et al., 2001). This method provided good results for the most abundant proteins, but it had difficulty distinguishing weakly expressed proteins from noise. A graphical display allowed visualising the matching proteins on a two-dimensional map. High throughput methods can produce a large amount of mass spectrometric data, and multidimensional visualization of these data is becoming more and more important. It allows surveying data and provides ideas for algorithmic solutions. One example is secondary ion mass spectrometry (SIMS) techniques, where natural tissues can be scanned with a spatial resolution of less than 100 nm and the resulting
spectra can be used to visualize the 2- or 3-dimensional distributions of secondary ions (Pacholski & Winograd, 1999). Stoeckli et al. (Stoeckli, Chaurand, Hallahan, & Caprioli, 2001) coated frozen thin sections of tissue with a solution of MALDI matrix, then dried and introduced them into a mass spectrometer, which scanned the sample. For a human brain tissue, an area of 8.5 mm x 8 mm was scanned with a grid spacing of 100 µm and the positions of 45 ions were recorded and rendered as 2-dimensional images. In this paper, visualization of all mass fingerprints provides important information on the presence of chemical noise that is shown to be a potential source for false matches in the PMF identification procedure. The correlation of neighbouring spectra is used to recalibrate the mass fingerprints. In order to simplify PMF identifications, an algorithm calculates distributions of peptide signal intensities and joins the masses with similar distributions into clusters. These clusters represent protein spots, and many of them yield a clear PMF identification. These methods were developed in the framework of the molecular scanner, but we think that they are of more general interest since they deal with issues such as chemical noise, calibration, weak signal detection and how contextual information can be used to improve results.

2. METHODS

In this experiment, 1 mg of E. coli proteins was separated by 2D-PAGE. After in-gel digestion, the proteins were submitted to a digestion-transfer and trapped on a PVDF membrane (Bio-Rad, Richmond CA). A portion with a size of approximately 9x13 mm (corresponding to a pI range of 5.1-5.2 and a Mr range of 35’000-45’000 Da) was cut out from the membrane and pasted on the sampling plate of a MALDI-TOF mass spectrometer (Voyager Elite, Applied Biosystems, Framingham MA), which was equipped with a 337-nm UV laser.
A solution of 5 mg/mL of α-cyano-4-hydroxycinnamic acid (4-HCCA from Sigma, St-Louis MO) dissolved in 70% methanol was sprayed on the PVDF membrane. Then the membrane was scanned on a 48x32 grid with a sampling distance of 0.25 mm. At each site, 64 laser shots were fired at a frequency of 3 Hz, leading to an acquisition time of about 9 hours. The disc space needed to store all the spectra was 350 MB, which could be compressed to 3 MB after peptide signal detection if just the mass fingerprints were stored. More details of the molecular scanner experiment discussed in this article can be found in (Bienvenut et al., 1999). The algorithms used for peptide signal detection and the PMF identification program SmartIdent are described in Gras et al. (Gras et al., 1999). Since the concentration of some proteins was low, only a few of their peptide masses were detectable and the minimal number of matching masses for the PMF search was set to 2 if deconvoluted peptide mass lists were used and to 3 otherwise (since the standard version of SmartIdent requires at least 3 matching masses, it was adapted to the needs of this experiment). The number of missed cleavages was set to one and
only chemical modifications of cysteine and methionine were considered. The mass tolerance was set to 200 ppm. A reduced version of Swiss-Prot (Release 39.22 of 20-Jun-2001) that contained all 4740 proteins from E. coli was searched for PMF identification. Calculations were performed on a 500 MHz Pentium with 128 MB RAM on Windows NT. Programs were written in C++ and Virtual Reality Modeling Language (VRML 2.0, http://www.sdsc.edu/vrml) was used for visualization. VRML is a software standard that defines the format of data files sent over the Internet for visualization and animation, and is therefore supported by Internet browsers. Netscape® Communicator 4.7 was used to render the VRML data files and ‘m/z’-software by Proteometrics to render single spectra.

3. RESULTS AND DISCUSSION

3.1. Visualization of spectra

The data obtained in the molecular scanner experiment consisted of a set of mass spectra: one for each scan point. The first aim was to get an idea of how the data were structured. Since there were 1536 spectra, it was impossible to inspect and compare them by means of conventional visualization tools that are only able to render a few spectra at a time. We designed a method that circumvents this problem and allows all spectra to be inspected at once. Each mass detected in a spectrum can be associated with a point in a 3-dimensional space (Figure 1a) where the horizontal plane corresponds to the scanned membrane and the vertical axis to the mass value. In Figure 1b all masses between 800 Da and 1000 Da are marked as points, revealing that some masses were detected on a contiguous region of the scanned membrane, while others were found only on isolated lattice sites. For the main part of this paper we considered only masses that could be reproducibly detected in a neighborhood, because this provided more reliable results than working with all masses.
Therefore a filter discarded a mass from a mass fingerprint if it could not be detected in the majority of the 8 surrounding sites. All lattice sites were treated simultaneously and this process was repeated until a stable configuration was obtained, i.e. the filter can be represented as a synchronous cellular automaton (Toffoli & Margolus, 1987). This filter is different from a filter that selects the most intense peptide signals in an isolated spectrum since it takes into account the spatial correlation of the data. There were several low intensity peptide signals detected on a contiguous region that proved to be essential for the identification of a protein. The masses that pass this filter and do not belong to chemical noise (see below) are called contiguous masses and are depicted in Figure 1c.
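The neighborhood filter can be sketched as a synchronous update on a boolean presence map for one mass (True where the mass was detected at a lattice site). This is an illustration under stated assumptions, not the original C++ routine: it assumes that masses have already been matched between sites within the mass tolerance, that sites outside the grid count as "not detected", and that "majority" means at least 4 of the 8 neighbors. The last assumption matters, because a strict majority of 5 would remove every finite spot under repeated application: the top-left-most site of any spot has at most 4 occupied neighbors. The function name and the `min_neighbors` parameter are hypothetical.

```python
import numpy as np


def contiguity_filter(present, min_neighbors=4):
    """Repeatedly drop sites with too few occupied neighbors until nothing changes."""
    present = np.asarray(present, dtype=bool)
    rows, cols = present.shape
    while True:
        # Pad with False so that sites outside the grid count as "not detected".
        padded = np.pad(present, 1, constant_values=False)
        counts = np.zeros((rows, cols), dtype=int)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                counts += padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
        # Synchronous update: all sites are evaluated on the same old state.
        updated = present & (counts >= min_neighbors)
        if np.array_equal(updated, present):
            return updated
        present = updated


# Toy map: one 4 x 4 spot plus one isolated detection.
toy = np.zeros((8, 8), dtype=bool)
toy[2:6, 2:6] = True
toy[0, 0] = True
print(contiguity_filter(toy).astype(int))
```

In this toy case the isolated detection and the four corners of the square are removed, and the remaining rounded spot is stable under further iterations.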
Figure 1. (a) The pI axis goes from 5.1 to 5.2, whereas the Mr axis is inverted and goes from 45’000 Da to 35’000 Da. Masses of one spectrum (m1,…,m5) are schematically depicted. (b) Masses between 800 Da and 1000 Da. The peptide signal detection threshold was set to the optimal value used for the identification where also small signals are detected (signal height > 2.2*noise). (c) Contiguous masses between 800 Da and 1000 Da. Only the masses that were detected in a contiguous, but well localized region are shown. (d) 800 Da to 1000 Da portion of a spectrum from the upper right part of the scanned membrane. Only an arbitrary selection of detected peptide signals is labeled.
3.2. Chemical noise

Figure 1b reveals an interesting feature: some masses cover the entire membrane while others are localized in spots. Figure 1d shows that the localized peptide signals at 951.5 Da and 999.7 Da are not distinguishable from ubiquitous masses at 804.4 Da, 820.4 Da, 838.2 Da and 936.1 Da by means of signal intensity. Figure 4 shows
that signal intensity distributions of ubiquitous masses are flat, in contrast to those of E. coli peptide masses. In order to find ubiquitous masses automatically, a routine tests how even and spread out an intensity distribution is. It divides the membrane into 26 regions of equal size (8x8) and calculates the deviation between each region’s mean intensity $I_i$ and the overall mean intensity $I_{tot}$. If the sum over all regions of the relative deviations, $\sum_i |I_i - I_{tot}| / I_{tot}$, is smaller than a certain threshold (=20) and if the mass is detected on more than 72 sites, it is called ubiquitous (Table 1).

Table 1. Ubiquitous masses. Mass: Ubiquitous mass value. Since this value is not exactly the same in all spectra where the mass was found, the median value is displayed (after calibration, see below). Number of sites: The number of spectra where the mass was found (maximal 1536). Sometimes, masses were detected with a deviation of about +1 Da from a keratin/trypsin peptide mass. This might be due to difficulties in detecting the monoisotopic mass for very small peptide signals. Alleged origin: If a mass matched a trypsin (Swiss-Prot entry: TRYP_PIG) or a keratin peptide, it is indicated in this field (one missed cleavage, maximal mass deviation 200 ppm). The following human keratins produced more than one match: (1) K1CM_HUMAN, (2) K2C1_HUMAN.

Mass (Da)    Number of sites    Alleged origin
804.5        1320               keratin (2)
820.4        761
823.4        112
829.3        139
832.5        377
833.4        265
834.4        451
838.3        636
839.3        315
842.5        1251               trypsin
845.3        665
859.5        202
861.3        97                 keratin (1)
871.2        577                keratin (1)
912.4        234
913.5        103
914.5        74
926.4        868
927.5        188
936.2        355
940.5        582
1027.2       305
1032.6       154
1045.7       366
1046.6       180                keratin (2)
1060.3       105                keratin (1)
1092.2       234
1126.7       170
1164.7       206
1480.0       72
1804.1       99
1994.3       93                 keratin (2)
2118.4       136                keratin (1)
2211.4       92                 trypsin
2250.2       89

Alleged-origin entries whose rows cannot be recovered from the source layout: keratin (1, 2); keratin (2); trypsin; keratin (1).
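The flatness test described before Table 1 can be sketched as follows. This is a minimal illustration, not the original routine: it assumes that the intensity map of one mass is a 2-D array with zero where the mass was not detected, that region and overall means are taken over all sites (detected or not), and that the grid dimensions are multiples of the block size, so the number of regions follows from the grid shape. The function and parameter names are hypothetical; the default threshold (20) and site count (72) are the values quoted in the text.

```python
import numpy as np


def is_ubiquitous(intensity, block=8, max_total_deviation=20.0, min_sites=72):
    """Flag a mass whose intensity map is both widespread and evenly distributed."""
    intensity = np.asarray(intensity, dtype=float)
    rows, cols = intensity.shape
    if rows % block or cols % block:
        raise ValueError("grid dimensions must be multiples of the block size")
    # The mass has to be detected on more than min_sites lattice sites.
    if np.count_nonzero(intensity > 0) <= min_sites:
        return False
    overall_mean = intensity.mean()
    # Mean intensity of each block x block region.
    region_means = intensity.reshape(
        rows // block, block, cols // block, block
    ).mean(axis=(1, 3))
    # Sum over regions of |I_i - I_tot| / I_tot.
    total_deviation = np.abs(region_means - overall_mean).sum() / overall_mean
    return bool(total_deviation < max_total_deviation)


# Toy maps on a 32 x 48 grid: a flat background and a localized spot.
flat_map = np.ones((32, 48))
spot_map = np.zeros((32, 48))
spot_map[0:8, 0:16] = 1.0
print(is_ubiquitous(flat_map), is_ubiquitous(spot_map))
```

A perfectly flat map gives a summed relative deviation of zero, whereas a map concentrated in a few regions gives a large value, so only the former passes the threshold.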
Since diffusion is limited in the molecular scanner technique (Binz et al., 1999), and since none of the ubiquitous masses (exception: 820.4 Da) could be associated with peptide masses of proteins annotated in the respective portion of the master SWISS-2DPAGE (Hoogland et al., 2000) gel (Swiss-Prot entries: IDH_ECOLI, METK_ECOLI, PGK_ECOLI, ACEA_ECOLI), these ubiquitous masses do not stem from proteins of the E. coli sample. However, some of these ubiquitous masses could be attributed to known impurities from tryptic autolysis and various forms of human keratin, whereas the remaining masses could stem from modified or unknown impurity peptides and matrix clusters. Matrix clusters form another source of chemical noise in the low mass range, especially if the amount of protein to be analyzed is low (Keller & Li, 2000; Land & Kinsel, 2001), but in contrast to contaminating peptides their mass and intensity are not reproducible and it is not certain whether they could be detected over the entire membrane. In addition, the ubiquitous masses could not be explained by a formula for matrix cluster masses as described by Keller et al. (Keller & Li, 2000). Whatever the source of the masses listed in Table 1 is, it would be impossible to discern them from low intensity peptides from the E. coli sample without the knowledge of their spatial distribution provided by the molecular scanner data.

3.3. Calibration

Masses detected over the entire membrane could be used to investigate the calibration of the mass spectrometer. Figure 2a reveals that mass values were locally quite stable, but varied significantly over the entire membrane. The difference between the minimal and maximal measured value of the trypsin peptide mass at 842.509 Da was about 1 Da, because the membrane was warped at its upper edge (high Mr values), and because physical conditions such as electric field strength depend on the position of the sampling plate (Egelhofer et al., 2000).
Therefore it was impossible to assign precise mass values valid for all spectra, and a large mass deviation of 700 ppm about the median values had to be taken into account. A re-calibration of the spectra would facilitate data handling, and we had to devise a method that does not rely on internal standard masses since these were not used in the experiment described here. Since we had no information about flight times and how they had been converted into mass values, it was not possible to apply the method described in (Christian et al., 2000) to our problem and we had to guess a function that calculates the corrected masses from the original masses. Egelhofer et al. (Egelhofer et al., 2000) used a linear relationship, which was a reasonably good approximation to their data and is easy to calculate with. We chose a different approach:
$$m_{corrected}^{1/2} = a_1 m^{1/2} + a_2 m + a_3 m^{3/2} + a_4 m^{2} \qquad (1)$$
which took account of additional terms and fitted well to the observed data (not shown). If the calibration correction is known for a set of masses $\{m_i, m_{i,corrected}\}$, then the parameters $a_k$ can be calculated by a robust fit (Press, Teukolsky, Vetterling, & Flannery, 1995) of equation 1.

Figure 2. (a) Masses between 841 Da and 845 Da. The masses around 842.5 Da, which are detected over the entire membrane, correspond to a trypsin peptide, whereas the masses around 843.5 Da stem from Isocitrate lyase (Swiss-Prot entry ACEA_ECOLI) and are localized in the pI-Mr plane except for a few outliers. The scattering of mass values is due to calibration errors that become larger (0.7 Da) towards the edges of the membrane. For better visualization, the mass values are rendered as a surface plot. (b) As in (a), but after calibration using the algorithm described in the text.

None of the trypsin or keratin peptide masses could be detected in all spectra and therefore they could not be used as internal standards. However, many masses found in one spectrum could also be detected in the spectra of the neighboring scan points with a relatively small mass deviation (<200 ppm), which would allow at least a relative adjustment of the masses of a spectrum with respect to its neighbors. If there was a way to calibrate some master spectra, the relative adjustment could be used to calibrate the remaining spectra. Some scanned points provided very clear PMF identifications even if large mass deviations were allowed. The peptide masses of the identified proteins can then be
178
MULLER ET AL.
used as standard masses for the calibration of the associated mass fingerprints. An iterative algorithm was then used to calibrate the remaining the spectra: 1. Choose some sites with very clear identifications and use the theoretical masses of the matching peptides as mass standards in order to calibrate the respective fingerprints with equation (1). 2. For each spectrum that was not calibrated in step 1, one of the following steps is performed: a) If a spectrum is found in a 3x3 neighborhood that has already been calibrated in step 1, adjust the masses with respect to this spectrum, i.e. find the masses that are common in both spectra (with a mass tolerance of 200ppm) and fit equation (1) to these values. If several such spectra are found, take an average adjustment, i.e. take the mean values of the parameters ak . b)
3.
If no such spectra are found, take the average adjustment with respect to all spectra in the 3x3 neighborhood. This step is performed simultaneously for all spectra and a new, corrected set of fingerprints is obtained that replaces the old fingerprints. Repeat step 2 until the variations of the masses over the membrane are small enough.
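The two numerical ingredients of this procedure, matching masses between neighboring spectra and fitting equation (1), can be sketched as follows. This is only an illustration under stated assumptions: it assumes equation (1) has the four-term form in powers of m^(1/2), it uses an ordinary least-squares fit from NumPy in place of the robust fit cited in the text, and the function names are hypothetical. Masses are rescaled to kDa inside the fit purely for numerical conditioning, so the returned coefficients refer to masses in kDa.

```python
import numpy as np

def design_matrix(masses_da):
    """Columns m^(1/2), m, m^(3/2), m^2, with m converted from Da to kDa."""
    x = np.asarray(masses_da, dtype=float) / 1000.0
    return np.column_stack([x ** 0.5, x, x ** 1.5, x ** 2])

def fit_calibration(observed_da, reference_da):
    """Ordinary least-squares estimate of a_1..a_4 (assumption: not the
    robust fit used in the chapter) from (observed, reference) mass pairs."""
    target = np.sqrt(np.asarray(reference_da, dtype=float) / 1000.0)
    coeffs, *_ = np.linalg.lstsq(design_matrix(observed_da), target, rcond=None)
    return coeffs

def apply_calibration(coeffs, masses_da):
    """Corrected masses in Da for the given observed masses."""
    root = design_matrix(masses_da) @ coeffs
    return 1000.0 * root ** 2

def common_masses(spectrum_a, spectrum_b, tol_ppm=200.0):
    """Pairs (mass_a, mass_b) of peaks that agree within tol_ppm,
    as needed for the relative adjustment of step 2a."""
    pairs = []
    for mass_a in spectrum_a:
        closest = min(spectrum_b, key=lambda mass_b: abs(mass_b - mass_a), default=None)
        if closest is not None and abs(closest - mass_a) / mass_a * 1e6 <= tol_ppm:
            pairs.append((mass_a, closest))
    return pairs
```

For a master spectrum, `reference_da` would hold the theoretical masses of the matched peptides; for a neighbor adjustment, the pairs returned by `common_masses` would supply both arguments.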
The result of this procedure is depicted in Figure 2b. 109 master spectra were selected, all in the upper part of the membrane where the abundant proteins were found. The remaining variation of the mass values over the entire membrane was smaller than 200 ppm. This method has one drawback: if no clear identifications could be found in an experiment, the calibration provides only a relative adjustment between neighbors. In this case the known masses of trypsin and keratin might serve as standards and nevertheless allow a recalibration of the mass fingerprints.

3.4. Identification and clustering of masses

A peptide mass fingerprint is usually contaminated with chemical noise and masses of fragmented or modified peptides. In addition, the resolution of the 2D-PAGE was limited and some spots overlapped and abundant proteins covered weakly expressed ones. On the sites of some weakly expressed proteins many of the detected peptide masses stemmed from their abundant neighbours, and the PMF identification of the spectrum obtained at the respective sites yielded a list of matches where the weakly expressed proteins were only found in a lower rank. All these ‘fake’ masses strongly enhance the number of mass combinations and produce false matches in the database search. Figure 3 shows SmartIdent scores if untreated fingerprints are submitted. Only the abundant proteins IDH_ECOLI, 6PGD_ECOLI, METK_ECOLI and PGK_ECOLI were coherently detected with the highest score, other proteins scored highest only at isolated sites and disappeared elsewhere in the mist of false identifications. The protein ATOC_ECOLI has peptides at 804.428 Da, 820.423 Da,
842.404 Da, 1045.484 Da and 1046.603 Da, which match signals produced by chemical noise, and is therefore detected over a large part of the membrane (Figure 3b). Its pI of 6.01 and molecular weight of 52176.39 Da are outside the scanned portion of the membrane and it is unlikely to be found over such a large region. Though its score is significant it is a false identification. ACEA_ECOLI is a protein annotated in SWISS-2DPAGE and it is identified over a contiguous region, but not with a very significant score (Figure 3c). Also YAGE_ECOLI is identified in the same region with a similar score (Figure 3d) and it is difficult to decide whether it is a true or a false identification (a more detailed analysis as described below discards it as a false one). This shows that the identification score does not provide sufficient information for weakly expressed proteins and we have to investigate PMF identifications in further detail.

Figure 3. SmartIdent scores for calibrated, but otherwise untreated spectra. a) Highest score for each scanned site. For a better visualization, score values were cut at 150’000. Sites are dark if one of the proteins of Figure 5 was detected with the highest score and light otherwise. b) Score of Acetoacetate metabolism regulatory protein atoC (ATOC_ECOLI), which produced matches with chemical noise. Chemical noise is sometimes suppressed in spots of abundant proteins, which explains the holes in the score landscape. c) Score of ACEA_ECOLI. d) Score of hypothetical protein yagE (YAGE_ECOLI).

A peptide mass fingerprint in the overlapping zone of the spots of Isocitrate dehydrogenase (IDH_ECOLI) and S-adenosylmethionine synthetase
(METK_ECOLI) was sent to SmartIdent. The protein with the highest score was IDH_ECOLI (score: 818034.11; 13 matching peptides) followed by Allantoate amidohydrolase (ALLC_ECOLI; score: 51341.17; 6 matching peptides) and others with slowly decreasing scores. METK_ECOLI (score: 25122.18; 5 matching peptides) was only found in the sixth rank. While the score of the first protein is significantly higher than the score of the second protein, there is almost no difference between the second and third rank, and it is very difficult to decide whether ALLC_ECOLI is an erroneous match without additional information. However, the signal intensities of the matching masses revealed interesting properties (Figure 4).
Figure 4. The vertical axis represents the peptide signal intensities (peptide signal heights) as a function of the position on the membrane. The intensity was set to 0 if no peptide mass could be detected at the respective position within ±100 ppm of the theoretical peptide mass. Note that the scale varies from case to case. (a)-(e) Intensity distribution of the matching peptides of METK_ECOLI: (a) 951.505 Da, (b) 999.558 Da, (c) 1108.517 Da, (d) 1155.684 Da, (e) 1254.742 Da. (f)-(k) Intensity distribution of the matching peptides of ALLC_ECOLI: (f) 804.419 Da, (g) 820.413 Da, (h) 951.456 Da, (i) 1177.629 Da, (j) 1193.623 Da, (k) 2086.002 Da.
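The intensity maps of Figure 4 can be built with a simple lookup per scan site. The sketch below is illustrative only: the data layout (a dictionary from grid position to a list of (mass, height) peaks) and the function names are assumptions, and when several peaks fall inside the tolerance window it picks the one closest in mass, a choice the text does not specify.

```python
def ppm_error(observed, theoretical):
    """Signed deviation of an observed mass from a theoretical mass, in ppm."""
    return (observed - theoretical) / theoretical * 1e6

def intensity_at(peaks, theoretical_mass, tol_ppm=100.0):
    """Height of the peak matching theoretical_mass within tol_ppm, else 0.0.

    peaks is a list of (mass, height) tuples for one scan site. If several
    peaks match, the one closest in mass is used (assumption).
    """
    matches = [
        (abs(ppm_error(mass, theoretical_mass)), height)
        for mass, height in peaks
        if abs(ppm_error(mass, theoretical_mass)) <= tol_ppm
    ]
    return min(matches)[1] if matches else 0.0

def intensity_map(spectra, theoretical_mass, tol_ppm=100.0):
    """Map each scan site (row, col) to the intensity of one peptide mass."""
    return {
        site: intensity_at(peaks, theoretical_mass, tol_ppm)
        for site, peaks in spectra.items()
    }
```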
For METK_ECOLI all peptides except the one at 1155.684 Da showed a similar intensity distribution. The peptide at 1155.684 Da, which stemmed from the neighboring protein Phosphoglycerate kinase (PGK_ECOLI), showed lower molecular weight and higher pI values in good correspondence to other peptides of PGK_ECOLI. The case of ALLC_ECOLI was very different because no intensity distribution specific to this protein could be found. The first two masses (804.419 Da and 820.413 Da) were not localized and were part of the chemical noise (Table 1). The next mass (951.456 Da) belonged to METK_ECOLI, and the peptide at 1193.623 Da was similar to the one at 1155.684 Da and also belonged to PGK_ECOLI, whereas the remaining two masses (1177.629 Da and 2086.002 Da) could be attributed to IDH_ECOLI. Therefore we assume that the identification of ALLC_ECOLI was erroneous. This analysis identified two possible causes for erroneous identifications: chemical noise and overlapping protein spots. Chemical noise could be identified using the method described above and purged from the mass fingerprints. In order to separate masses from overlapping proteins, the masses that had similar intensity distributions had to be identified and put into the same cluster. If each protein corresponds to a particular pattern of intensity distribution, then the clusters will only contain masses that stem from the same protein. It is known that peptide signal intensity in MALDI-MS has a poor shot-to-shot reproducibility due to matrix/analyte inhomogeneity, variation in laser power and detector nonlinearity (Gusev et al., 1995). Normalization of the signal intensity with internal standards improves reproducibility and allows a quantitative analysis over two orders of magnitude (Duncan, Matanovic, & Cerpa-Poljak, 1993; Gobom et al., 2000), at least if the internal standard is chemically similar to the measured peptides and if concentrations are low enough to avoid suppression effects (Gusev et al., 1995). 
Kratzer et al. (Kratzer et al., 1998) investigated suppression effects in MALDI-MS with 4-HCCA as matrix, and obtained a good reproducibility of the absolute signal intensities after averaging over 50 laser shots. They showed with a mixture of 10 peptides that up to 2.5 pmol/peptide the absolute signal intensities of all peptides increased nearly linearly with increasing peptide amount, but for higher amounts, complicated nonlinear suppression effects came into play depending on the presence of basic amino acids, hydrophobicity and peptide length. The longer, more hydrophobic and arginine-containing peptides did not decrease in signal intensity in the measured range (100 fmol – 25 pmol), but stayed constant or slightly increased at high concentrations, whereas other peptides were strongly suppressed. In the experiment discussed here and in similar experiments (18), the absolute signal intensity of the contiguously detected peptides always increased towards the center of a spot, where the peptide concentration is highest. For the amount of E. coli sample analyzed, the amount of protein in a spot is expected to be in the low pmol range, producing digested peptides in even lower amounts, which might well be in the linear range. For the well-expressed, contiguously detected peptides one can therefore
assume that the signal intensity is positively correlated with the concentration of its protein in the gel, and a similar intensity distribution of two peptides indicates that they stem from the same protein. In order to quantify the similarity between intensity distributions, a correlation measure had to be defined. We chose a modified version of the linear correlation that also takes into account how strongly two distributions overlap. The correlation between two masses m_i and m_j is defined by:
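$$\mathrm{corr}_{ij} = \frac{2 n_{ij}}{n_i + n_j} \cdot \frac{\frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} (h_{ik} - \bar{h}_i)(h_{jk} - \bar{h}_j)}{\sigma_i \sigma_j}, \qquad -1 \le \mathrm{corr}_{ij} \le 1 \qquad (2)$$

$$\bar{h}_i = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} h_{ik}, \qquad \sigma_i^2 = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} (h_{ik} - \bar{h}_i)^2$$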
where n_i (n_j) is the number of sites where m_i (m_j) is detected, and n_ij is the number of sites where both m_i and m_j are found. h_ik is the intensity of m_i at site k. The factor 2n_ij/(n_i + n_j) is 1 if m_i and m_j are found on exactly the same sites and 0 if there is no overlap at all. The sums in the above equations always go over all sites where both masses are detected. This correlation measure does not change if a signal is multiplied by a constant factor, and it is stable against small local variations in absolute signal intensity as long as the ratios h_ik/h_jk are kept constant. Table 2 shows the correlation value for some masses from Figure 4. Obviously there is a strong correlation between masses of the same protein and a negative correlation between masses belonging to different proteins.

We calculated the correlation (Equation 2) between the 124 contiguous masses and performed a hierarchical cluster analysis (Han & Kamber, 2001) in order to group the masses according to their intensity distribution, which yielded 20 clusters, 11 of which contained more than two masses. Since the intensity of a mass should be highest where the concentration of protein is maximal, the summit of an intensity distribution should indicate the center of a spot. Figure 5 shows the summits of all 124 masses colored according to the cluster they belong to. It shows that the summits stemming from the same cluster lie close together unless the protein corresponding to the cluster was found on different spots. There was no overlapping of the centers of different clusters, and several weakly expressed spots could be well separated from their intense neighbors.

Table 2. Correlation between intensity distributions. Negative values were close to 0 because peptides with a bad linear correlation usually had little overlap. A cut-off of 0.35 was used in the clustering algorithm.
Mass (Da)    999.558   1254.742   1155.684   1193.623   2086.002
999.558      1         0.606      -0.084     -0.078     -0.097
1254.742     0.606     1          -0.046     -0.007     -0.074
1155.684     -0.084    -0.046     1          0.747      -0.058
1193.623     -0.078    -0.007     0.747      1          -0.036
2086.002     -0.097    -0.074     -0.058     -0.036     1
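The correlation measure and the grouping step can be sketched in a few lines. The code below is a hedged illustration, not the authors' implementation: it assumes equation (2) is the overlap factor times the linear correlation over the common sites, returns 0 when a distribution is flat or there is no overlap (an assumption), and uses average-linkage agglomerative clustering because the text does not state which linkage was used. The names `correlation` and `cluster_masses` are hypothetical.

```python
from math import sqrt

def correlation(h_i, h_j):
    """Overlap-weighted linear correlation of two intensity distributions.

    h_i and h_j map a scan site to the intensity of one mass and contain
    only the sites where that mass was detected.
    """
    common = sorted(set(h_i) & set(h_j))
    n_ij = len(common)
    if n_ij == 0:
        return 0.0
    mean_i = sum(h_i[k] for k in common) / n_ij
    mean_j = sum(h_j[k] for k in common) / n_ij
    var_i = sum((h_i[k] - mean_i) ** 2 for k in common) / n_ij
    var_j = sum((h_j[k] - mean_j) ** 2 for k in common) / n_ij
    if var_i == 0.0 or var_j == 0.0:
        return 0.0  # assumption: a flat distribution carries no correlation
    cov = sum((h_i[k] - mean_i) * (h_j[k] - mean_j) for k in common) / n_ij
    overlap = 2.0 * n_ij / (len(h_i) + len(h_j))
    return overlap * cov / sqrt(var_i * var_j)

def cluster_masses(masses, corr, cutoff=0.35):
    """Average-linkage agglomerative clustering (assumed linkage).

    corr(a, b) returns the correlation of two masses. Clusters are merged
    while the best average correlation between two clusters is >= cutoff.
    """
    clusters = [[m] for m in masses]
    while len(clusters) > 1:
        best, pair = cutoff, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                values = [corr(x, y) for x in clusters[a] for y in clusters[b]]
                average = sum(values) / len(values)
                if average >= best:
                    best, pair = average, (a, b)
        if pair is None:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Applied to the five masses of Table 2 with the 0.35 cut-off, this sketch groups 999.558 Da with 1254.742 Da and 1155.684 Da with 1193.623 Da, and leaves 2086.002 Da on its own, which agrees with the protein assignments discussed in the text.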
The algorithm described above provides a means of clustering the contiguous peptide masses that belong to the same spot, and the masses of the same cluster can be submitted to the PMF identification program. Since chemical noise was removed and all the masses stemmed from the same protein (assuming that the spot centers of neighboring proteins are sufficiently separated), these identifications should contain fewer erroneous matches. Instead of the 1536 mass lists of all scanned points, only the 20 mass lists of all clusters had to be submitted to SmartIdent. Some of these mass lists had only a few entries and the identification score was not discriminative enough to clearly identify a protein. In this case the contiguous masses were not sufficient and we had to revert to the entire set of masses. Therefore all masses in a 3x3 neighborhood of the cluster center that appeared more than once were collected, and all the proteins that matched at least 2 contiguous masses were compared with these extended mass lists. If the extended mass list clearly distinguished a protein, this identification was accepted.
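One way to assemble such an extended mass list is sketched below. Several details are assumptions made for illustration: "appeared more than once" is read as "detected in more than one spectrum of the 3x3 neighborhood", two peaks are treated as the same mass if they lie within a 200 ppm tolerance of each other, and the function name and data layout are hypothetical.

```python
def extended_mass_list(spectra, centre, tol_ppm=200.0):
    """Masses found in more than one spectrum around a cluster centre.

    spectra maps a scan site (row, col) to a list of masses; centre is the
    (row, col) of the cluster centre. Returns the mean mass of each group
    of peaks that occurs on at least two sites of the 3x3 neighborhood.
    """
    row, col = centre
    observed = []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            site = (row + d_row, col + d_col)
            observed.extend((mass, site) for mass in spectra.get(site, []))
    observed.sort()

    # Chain consecutive masses that lie within tol_ppm into one group.
    groups, current = [], []
    for mass, site in observed:
        if current and (mass - current[-1][0]) / current[-1][0] * 1e6 > tol_ppm:
            groups.append(current)
            current = []
        current.append((mass, site))
    if current:
        groups.append(current)

    extended = []
    for group in groups:
        sites = {site for _, site in group}
        if len(sites) > 1:
            extended.append(sum(mass for mass, _ in group) / len(group))
    return extended
```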
Figure 5. Summits of the intensity distributions of all 124 masses found on a contiguous but localized region. The intensity distributions were smoothed using a median filter before the summits were calculated. The vertical axis indicates the number of summits found on the respective scan point. The groups that could be identified (13 of 20) carry a label: Aldehyde dehydrogenase A (ALDA_ECOLI), Ketol-acid reductoisomerase (ILVC_ECOLI), Seryl-tRNA synthetase (SYS_ECOLI), Isocitrate dehydrogenase (IDH_ECOLI), 6-Phosphogluconate dehydrogenase (6PGD_ECOLI), Isocitrate lyase (ACEA_ECOLI), S-adenosylmethionine synthetase (METK_ECOLI), Phosphoglycerate kinase (PGK_ECOLI), Enolase (ENO_ECOLI), Putrescine-binding periplasmic protein [precursor] (POTF_ECOLI), 3-Oxoacyl-[acyl-carrier-protein] synthase III (FABH_ECOLI), Glutathione synthetase (GSHB_ECOLI), Phosphoserine aminotransferase (SERC_ECOLI). IDH_ECOLI, PGK_ECOLI and ACEA_ECOLI were found on two spots.
Table 3 shows PMF identifications for those clusters that could be clearly identified. IDH_ECOLI, 6PGD_ECOLI, METK_ECOLI and PGK_ECOLI, the most abundant proteins, had a high score, which was much higher than the score of the protein in the second rank. It is remarkable that many of the contiguous masses attributed to the clusters of these proteins did not match a peptide mass, the most extreme case being PGK_ECOLI with 22 unmatched masses. A visual examination revealed that these masses really had a similar intensity distribution, and the problem did not lie in the clustering algorithm. We see two different explanations: first, the peptides could be highly modified, fragmented or produced by unspecific cleavage and, second, other proteins could be present in the same spot. In the case of PGK_ECOLI the intensity distributions almost always showed two spots, and it seems unlikely that another protein could be present in the same two spots. In addition, there was no other protein that matched a lot of masses from the extended mass list. Therefore the first hypothesis seems more likely. Of the 22 unmatched masses only three could be explained by modified peptides (one
carboxyamidomethyl cysteine, one di-methylation and one phosphorylation), therefore unspecific cleavage or fragmentation seems to have caused most of these masses, but further investigation has to be carried out in order to give a definite answer. Other identifications were less clear, but could be confirmed with the extended mass lists. Even if the protein database was small since it just contained the E. coli proteins, it is remarkable that some proteins could be identified with only two masses. Therefore, if the right masses are selected, a small number of masses might be sufficient for a clear PMF identification and we think that the algorithm presented here provides such a good selection. The groups that could not be identified still provided valuable information on the presence of a spot, which may be useful for gel matching.

4. CONCLUSION

The molecular scanner is a protein identification technique that is able to scan a gel without previous knowledge of spot locations. Since the distance between two points at which the membrane is sampled is smaller than the average spot size, several spectra per spot are obtained. This allows applying optimization methods that make use of the spatial correlation present in the data. Visualization of all peptide mass fingerprint data revealed that some masses are localized in spots whereas other masses, especially in the lower mass region, spread out over the entire membrane. These masses were attributed to chemical noise and were discarded from the mass fingerprints. If only isolated spectra were available, the identification of chemical noise masses would be very difficult and these masses could disturb the PMF identification.
Since the membrane was slightly warped after it had been pasted on the sampling plate of the spectrometer, and since the physical parameters that define the m/z value of peptides as a function of their flight time depended on the position on the sampling plate, the overall calibration of the spectra was poor. A few master spectra that permitted very clear PMF identifications could be calibrated using matched peptide masses as internal standards. The calibration of the remaining spectra was strongly improved by using the correlation between neighboring spectra. By selecting the masses that were detected on a contiguous, but limited region of the membrane, the noise in the data was reduced. The distributions of the peptide signal intensities of these masses seemed to reflect the concentration of the proteins they stemmed from. Masses with similar peptide signal intensity distributions were put together in clusters, which allowed separating masses that stemmed from overlapping proteins. 20 different clusters were obtained in this way and were submitted to the PMF identification program, which provided clear identifications for 13 of them.
Table 3. Identification results. Swiss-Prot entry: Swiss-Prot entry for the proteins that could be identified. Asterisks mark identifications that had to be verified by the extended mass lists as described in the text. Score: SmartIdent identification score. If the protein was found in the first rank, the value in parentheses represents the score of the second rank; if the protein was not found in the first rank, the value in parentheses represents the score of the first rank. Rank: Rank of the protein in the list of matching proteins sorted with respect to the score. Best number of matched masses: The highest number of matched masses of the respective protein found among the original 1536 mass fingerprints. Number of contiguous masses: Number of contiguous masses in a cluster that were submitted to SmartIdent. Number of matched masses: Number of contiguous masses of a cluster that matched peptides in the database search. The number of matching masses of the extended mass list (see text) is indicated in parentheses.

Swiss-Prot entry   Score                  Rank   Best number of     Number of            Number of
                                                 matched masses     contiguous masses    matched masses
ALDA_ECOLI         1066.50 (59.64)        1      5                  3                    3 (3)
ILVC_ECOLI*        152.48 (274.84)        2      9                  8                    3 (15)
SYS_ECOLI          179.83 (26.83)         1      4                  2                    2 (3)
IDH_ECOLI          870993.90 (946.29)     1      15                 17                   11 (16)
6PGD_ECOLI         56913.04 (855.43)      1      9                  7                    5 (8)
ACEA_ECOLI*        116.18 (53.67)         1      4                  2                    2 (3)
METK_ECOLI         313362.28 (379.00)     1      9                  12                   7 (9)
PGK_ECOLI          117051.12 (4221.04)    1      11                 30                   8 (11)
ENO_ECOLI*         6889.07 (1941.52)      1      7                  9                    4 (8)
POTF_ECOLI         977.17 (0.02)          1      5                  5                    3 (3)
FABH_ECOLI         84.14 (4.59)           1      4                  2                    2 (3)
GSHB_ECOLI*        98.68 (172.43)         2      4                  2                    2 (5)
SERC_ECOLI         74.93 (2.15)           1      4                  4                    2 (3)
These are only some applications that are possible with molecular scanner data. We are currently working on a new PMF identification scoring method that automatically takes into account the two-dimensional aspect of the data. A very intriguing prospect for future development comes from a new generation of mass spectrometers such as MALDI-TOF/TOF (Medzihradszky et al., 2000) and
MALDI-QqTOF (Loboda et al., 2000) machines, which could combine the MALDI scanning technique with MS/MS identification. The mass grouping method could then be used, after a first MS scan, to efficiently select parent masses for subsequent fragmentation analysis. A new technique (Wei et al., 1999), where the peptides are put on a porous silicon surface allowing desorption-ionization (DIOS) without a matrix directly from the surface, could also have a direct application in the framework of the molecular scanner.

5. ACKNOWLEDGEMENTS

This work was supported by the Swiss National Fund for Scientific Research (grant 31-52974.97) and the Helmut Horten Foundation. The authors would like to thank Pierre-Alain Binz, Salvo Peasano and Jean-Charles Sanchez for their very useful contributions.

6. REFERENCES

Bienvenut, W., Müller, M., Palagi, P., Gasteiger, E., Heller, M., Jung, E., et al. (2001). Proteomics and mass spectrometry: some aspects and recent developments. In J. Housby (Ed.), Mass spectrometry and genomic analysis (pp. 93-142). Amsterdam: Kluwer academic.
Bienvenut, W., Sanchez, J., Karmime, A., Rouge, V., Rose, K., Binz, P., et al. (1999). Toward a clinical molecular scanner for proteome research: parallel protein chemical processing before and during western blot. Anal Chem, 71(21), 4800-4807.
Binz, P., Muller, M., Walther, D., Bienvenut, W., Gras, R., Hoogland, C., et al. (1999). A molecular scanner to automate proteomic research and to display proteome images. Anal Chem, 71(21), 4981-4988.
Bjellqvist, B., Ek, P., Righetti, P., Gianazza, E., Gorg, A., Westermeir, R., et al. (1982). J. Biochem. Biophys., 6, 317-339.
Christian, N., Arnold, R., & Reilly, J. (2000). Improved calibration of time-of-flight mass spectra by simplex optimization of electrostatic ion calculations. Anal. Chem., 72(14), 3327-3337.
Duncan, M., Matanovic, G., & Cerpa-Poljak, A. (1993). Rapid Commun Mass Spectrom, 7, 1090-1094.
Egelhofer, V., Bussov, K., Luebbert, C., Lehrach, H., & Nordhoff, E. (2000). Improvements in protein identification by MALDI-TOF MS peptide mapping. Anal. Chem., 72, 2741-2750.
Eriksson, J., Chait, B., & Fenyo, D. (2000). A statistical basis for testing the significance of mass spectrometric protein identification results. Anal Chem, 72(5), 999-1005.
Gobom, J., Kraeuter, K., Persson, R., Steen, H., Roepstorff, P., & Ekman, R. (2000). Detection and quantification of neurotensin in human brain tissue by matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry. Anal. Chem., 72, 3320-3326.
Godovac-Zimmermann, J., & Brown, L. (2001). Perspectives for mass spectrometry and functional proteomics. Mass Spectrom Rev., 20(1), 1-57.
Gras, R., Muller, M., Gasteiger, E., Gay, S., Binz, P., Bienvenut, W., et al. (1999). Improving protein identification from peptide mass fingerprinting through a parameterized multi-level scoring algorithm and an optimized peak detection. Electrophoresis, 20(18), 3535-3550.
Gusev, A., Wilkinson, W., Proctor, A., & Hercules, D. (1995). Improvement of signal reproducibility and matrix/comatrix effects in MALDI analysis. Anal. Chem, 67, 1034-1041.
Henzel, W. J., Billeci, T. M., Stults, J. T., Wong, S. C., Grimley, C., & Watanabe, C. (1993). Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases. Proceedings of the National Academy of Sciences of the United States of America, 90(11), 5011-5015.
Hoogland, C., Sanchez, J., Tonella, L., Binz, P., Bairoch, A., Hochstrasser, D., et al. (2000). 28(1), 286-288.
James, P., Quadroni, M., Carafoli, E., & Gonnet, G. (1993). Protein identification by mass profile fingerprinting. Biochem Biophys Res Commun, 195(1), 58-64.
Juhasz, P., Vestal, M., & Martin, S. (1997). J Am Soc Mass Spectrom, 8, 209-217.
Karas, M., & Hillenkamp, F. (1988). Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal Chem, 60(20), 2299-2301.
Keller, B., & Li, L. (2000). J Am Chem Soc, 11, 88-93.
Kratzer, R., Eckerskorn, C., Karas, M., & Lottspeich, F. (1998). Suppression effects in enzymatic peptide ladder sequencing using UV-MALDI-MS. Electrophoresis, 19, 1910-1919.
Land, C., & Kinsel, G. (2001). The mechanism of matrix to analyte proton transfer in clusters of 2,5-dihydroxybenzoic acid and the tripeptide VPL. J. Am. Soc. Mass Spectrom., 12, 726-731.
Lopez, M. F. (2000). Better approach to finding the needle in a haystack: optimizing proteome analysis through automation. Electrophoresis, 21, 1082-1093.
Mann, M., Höjrup, P., & Roepstorff, P. (1993). Biol. Mass Spectrom., 22, 338.
Muller, M., Gras, R., Appel, R. D., Bienvenut, W. V., & Hochstrasser, D. F. (2002). Visualization and analysis of molecular scanner peptide mass spectra. J Am Soc Mass Spectrom, 13(3), 221-231.
Pacholski, M., & N, W. (1999). Chem. Rev., 99, 2977-3005.
Pappin, D., Hojrup, P., & Bleasby, A. (1993). Rapid identification of proteins by peptide mass fingerprint. Curr. Biol., 3(6), 327-332.
Press, W., Teukolsky, S., Vetterling, W., & Flannery, B. (1995). Numerical recipes in C. Cambridge: University press.
Stoeckli, M., Chaurand, P., Hallahan, D., & Caprioli, R. (2001). Imaging mass spectrometry: a new technology for the analysis of protein expression in mammalian tissues. Nature, 7, 493-496.
Tanaka, K., Waki, H., Ido, Y., Akita, S., Yoshida, Y., & Yoshida, T. (1988). Rapid Commun. Mass Spectrom., 2, 151-153.
Toffoli, T., & Margolus, N. (1987). Cellular automata machines. Cambridge (MA): MIT press.
Traini, M., Gooley, A. A., Ou, K., Wilkins, M. R., Tonella, L., Sanchez, J.-C., et al. (1998). Towards an automated approach for protein identification in proteome projects. Electrophoresis, 19, 1941-1949.
Vestal, M., & Juhasz, P. (1998). J Am Soc Mass Spectrom., 9, 892-911.
Yates, J. R., III, Speicher, S., Griffin, P. R., & Hunkapiller, T. (1993). Peptide mass maps: A highly informative approach to protein identification. Anal Biochem, 214, 397-408.
CHAPTER 6
IMPROVEMENTS IN THE PEPTIDE MASS FINGERPRINT PROTEIN IDENTIFICATION
Hydrogen/Deuterium Exchange for Higher Specificity of Protein Identification by Peptide Mass Fingerprinting (Bienvenut, Hoogland et al., 2002)
WV. Bienvenut, C. Hoogland, A. Greco, M. Heller, E. Gasteiger, RD. Appel, JJ. Diaz, JC. Sanchez, DF. Hochstrasser.
ABSTRACT

Genome sequencing projects produce large amounts of information that could be translated into potential protein sequences. Such amounts of material continuously increase protein database sizes. At present, 15 times more protein sequences are available in the SWISS-PROT and TrEMBL databases than 8 years ago in SWISS-PROT. One of the methods of choice for protein identification makes use of specific endoproteolytic cleavage followed by the MALDI-MS analysis of the digested product. Since 1993, when this technique was first demonstrated, the conditions required for a correct identification have changed dramatically. Whilst 4-5 peptides with an accuracy of 2-3 Da were sufficient for a correct identification in 1993, 10-13 peptides with less than 60 ppm mass error are now required for Human and E. coli proteins. This evolution is directly related to the continuous increase of protein database sizes, which causes an increase of the number of false positive matches in identification results. Utilisation of an information complement deduced from the primary protein sequence in the process of identification by peptide mass fingerprints can help to increase confidence in the identification results. In this article, we propose the exchange of labile hydrogen atoms with deuterium atoms. The exchange reaction with optimised techniques has shown an average 95% of hydrogen/deuterium exchange on tryptic peptides. This level of exchange was sufficient to single out one or more peptides from a list of potential candidate proteins due to the dependence of hydrogen/deuterium exchange on the peptide primary structure. This technique also has clear advantages in the identification of small proteins where direct protein identification is impaired by the limited number of endoproteolytic peptides. Then, primary sequence information obtained with this technique could help to identify proteins with high confidence without any expensive tandem mass spectrometer instruments.

W. V. Bienvenut (ed.), Acceleration and Improvement of Protein Identification by Mass Spectrometry, 189–207. © 2005 Springer. Printed in the Netherlands.
1. INTRODUCTION

During the last decades, DNA sequencing has progressed dramatically, and the rapid development of genome sequencing projects is producing a huge amount of information concerning potentially expressed proteins from a large range of organisms. Genomes from more than 30 bacteria and archaebacteria have been fully sequenced (Blattner et al., 1997; Fleischmann et al., 1995; Ng et al., 2000). On the eukaryotic side, organisms with larger genomes, e.g. Caenorhabditis elegans (Consortium, 1998), as well as Arabidopsis thaliana (Initiative, 2000), the first plant genome, were fully sequenced, and the entire Homo sapiens genome has been published (Adams et al., 2000; Consortium, 2001). Whole proteome analysis presents new challenges for the identification and characterisation of the actual protein products. Protein sequences can be deduced from DNA sequences, and potential proteins result in a continuous expansion of protein databases, e.g. TrEMBL and SWISS-PROT (Bairoch & Apweiler, 2000) or GenPept (Wheeler et al., 2001). As an example of the database size increase, SWISS-PROT database release 25 in 1993 contained 29955 entries (http://www.expasy.org/txt/old-rel/relnotes.25.txt) and, at that time, TrEMBL did not exist. Today, 9 years later, the SWISS-PROT and TrEMBL databases contain 103 373 and 562 098 entries respectively (release 40.7 and 19.1 respectively, http://www.expasy.org/sprot/). Thus, the number of protein sequences in databases has increased by a factor of more than 20, and the increase is currently exponential. Peptide mass fingerprinting (PMF), a technique involving the analysis of peptides obtained after specific proteolytic digestion of polypeptides, has shown its efficiency for protein identification (Henzel et al., 1993; James et al., 1993; Mann et al., 1993; Pappin et al., 1993; Yates, III et al., 1993) since 1993.
Whilst in 1993 only 4-5 peptides with an accuracy of 2-3 Da were generally enough to identify a protein (Pappin et al., 1993), these values have changed dramatically. A recent investigation (Axelsson et al., 2001) of the impact of the peptide mass error on protein identification demonstrates that a 60 ppm error on a limited number of submitted masses (usually less than 30) allows the correct identification of all tested proteins when searched against the E. coli or Homo sapiens databases with 10-13 peptide masses matched. Similarly, Clauser et al. (Clauser et al., 1999) found that, on expanding the database to all mammalian entries, a maximum of 50 ppm mass error can be allowed to identify a protein unambiguously. Mass accuracy must further be between 0.5-5 ppm if the identification is conducted against the whole protein database, e.g. SWISS-PROT and TrEMBL or GenPept, without species restriction. If sample preparation techniques and mass spectrometer accuracy do not follow the rapid increase in protein database sizes, the probability of obtaining unequivocal
BIENVENUT ET AL.
protein identification by PMF will decrease continuously. Lahm and Langen (2000) addressed this problem by proposing a few additional techniques. One approach is to use a second endoproteinase to generate a new PMF to be submitted to the identification tools, thus increasing the confidence of protein identification. Another is to supplement PMF with a short sequence tag obtained by tandem mass spectrometry (MS/MS). Whilst information such as amino acid sequence or amino acid composition may increase the confidence in any protein identification, the methods proposed by Lahm and Langen can be difficult to apply: sufficient material for two digestions is required and, furthermore, MS/MS sequence tag identification needs expensive instruments, such as electrospray ionisation (ESI) coupled to a tandem mass spectrometer (MS), that are not accessible to all laboratories. Another approach is to apply specific modifications to some amino acids in the peptide chain in order to obtain additional information that improves the confidence of protein identification by PMF. A few of these modifications have been described previously, e.g. the esterification of side chain carboxylic groups as well as the carboxy-terminal group (Acharya et al., 1977; Bartlet-Jones et al., 1994; Falick & Maltby, 1989; Fraenkel-Conrat & Olcott, 1945; Hunt et al., 1986; Nutkins & Williams, 1989; Patterson et al., 1996; Wilcox, 1967). Other modifications, like the combination of two different cysteine alkylating agents proposed by Gygi et al. (1999), could also furnish crucial information allowing improved confidence in protein identification.

Hydrogen/deuterium (H/D) exchange is a common practice in biochemistry. Numerous articles have described the exchange of labile hydrogen atoms on proteins using deuterated solvents.
This technique is mostly used to identify specific binding sites (ref. 28) and conformational changes (Katta & Chait, 1993; Villanueva et al., 2000; Wang & Tang, 1996) and to deduce the secondary structure of proteins (Kraus et al., 2000; Zhang & Chait, 2000). Spengler et al. (1993) proposed using H/D exchange on all labile hydrogen atoms of a peptide to facilitate the interpretation of MALDI post source decay (MALDI-PSD) mass spectra (Chaurand et al., 1999; Kaufmann, Spengler, & Lutzenkirchen, 1993). Such treatment allows one to filter a list of candidate PSD fragments based on the number of exchangeable hydrogen atoms. Since the number of exchangeable labile hydrogen atoms in a peptide is sequence dependent (see Table 1), two different peptides with a similar Mr may have a different 'exchanged Mr' in the H/D treated sample, and may then be distinguished from each other. At about the same time, Sepetov et al. (1993) also proposed the "use of H/D exchange to facilitate peptide sequencing by electro-spray tandem mass spectrometry". The concept was exactly the same, except that the H/D exchange process was applied to ESI-MS/MS to help interpret MS/MS spectra (James et al., 1994).
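The sequence dependence described above can be illustrated with a short sketch. It counts labile hydrogens as backbone amide NH groups (none for proline), the two terminal groups, the side-chain hydrogens bound to N, O or S, and one ionisation proton. The side-chain values in `SIDE_CHAIN_H` are standard chemistry and are only assumed to agree with Table 1; the function name and its `c_term_amide` flag are hypothetical helpers introduced for illustration.

```python
# Labile hydrogens carried by neutral side chains (N-H, O-H, S-H).
# Assumption: standard values, taken to agree with Table 1 of the chapter.
SIDE_CHAIN_H = {
    "R": 4, "K": 2, "H": 1, "N": 2, "Q": 2,
    "D": 1, "E": 1, "S": 1, "T": 1, "Y": 1, "W": 1, "C": 1,
}


def count_exchangeable_h(sequence, c_term_amide=False):
    """Count exchangeable hydrogens of a singly protonated peptide.

    sequence     -- one-letter amino acid sequence
    c_term_amide -- True if the C-terminus is amidated (CONH2 instead of COOH)
    """
    if not sequence:
        raise ValueError("empty sequence")
    # One amide N-H per peptide bond, except when the residue is proline.
    backbone = sum(1 for aa in sequence[1:] if aa != "P")
    # Free N-terminus: NH2 (NH for an N-terminal proline).
    n_term = 1 if sequence[0] == "P" else 2
    # C-terminus: COOH carries 1 labile H, CONH2 carries 2.
    c_term = 2 if c_term_amide else 1
    side_chains = sum(SIDE_CHAIN_H.get(aa, 0) for aa in sequence)
    ionisation_proton = 1
    return backbone + n_term + c_term + side_chains + ionisation_proton


print(count_exchangeable_h("RPKPQQFFGLM", c_term_amide=True))  # substance P: 23
print(count_exchangeable_h("EGVNDNEEGFFSAR"))                  # [Glu]-fibrinopeptide B: 30
```

Both counts match the values quoted in this chapter for the two synthetic peptides, which supports (but does not prove) that the counting scheme is the one behind Table 1.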
IMPROVEMENTS IN THE PEPTIDE MASS FINGERPRINT PROTEIN IDENTIFICATION

The purpose of this article was to determine whether applying H/D exchange to a protein digest on the MALDI sample plate improves the confidence of protein identification by PMF.

2. MATERIALS AND METHODS

2.1. Chemicals

Acrylogel-PIP 2.6%C solution was purchased from BDH (Poole, GB). Low range Mr standards containing chicken lysozyme (LYC, 14.3 kDa), soybean trypsin inhibitor (ITRA, 20.1 kDa), bovine carbonic anhydrase (CAH2, 28.9 kDa), chicken ovalbumin (OVAL, 42.7 kDa), bovine serum albumin (ALBU, 66.4 kDa) and rabbit phosphorylase b (PHS2, 97.2 kDa) were purchased from Bio-Rad (Richmond, CA, USA). β-Mercaptoethanol, trifluoroacetic acid (TFA), tris(hydroxymethyl)aminomethane (Tris), trypsin (type IX from porcine pancreas, dialysed and lyophilised), [Glu]-fibrinopeptide B (EGVNDNEEGFFSAR), substance P (RPKPQQFFGLM-NH2), dihydroxybenzoic acid (DHBA) and α-cyano-4-hydroxycinnamic acid (ACCA) were purchased from Sigma (St-Louis, MO, USA). Acetonitrile (AcN), glycerol and sodium dodecylsulfate (SDS) were purchased from Fluka (Buchs, CH) and were of analytical grade except for AcN (preparative HPLC grade). Sodium bicarbonate, methanol (MeOH) and absolute ethanol (EtOH) were purchased from Merck (Darmstadt, D). MilliQ water (Millipore, Bedford, USA) was used when necessary. Heavy water (isotopic purity = 99.95%), d1-TFA (isotopic purity = 99.50%), d6-EtOH (isotopic purity = 99.50%), d4-MeOH (isotopic purity = 99.95%) and d3-AcN (isotopic purity = 99.80%) were purchased from Dr Glaser (Basel, CH).

2.2. Protein separations

One-dimensional electrophoresis (1-DE) was conducted essentially according to Laemmli (1970) with 12% polyacrylamide home-made gels. The low range Mr protein standard (Bio-Rad) was diluted to the desired concentration in 3% β-mercaptoethanol, Tris-HCl (60 mM, pH 6.8), glycerol (10% v/v) and SDS (2% w/v), and reduced at 95°C for 5 min before 1-DE migration.
Six µg of the protein mixture (corresponding to one µg of each protein) were loaded on a single lane of the gel. Migration was carried out using the Mini-Protean II electrophoresis apparatus (Bio-Rad) operated at 200 V for 45 to 50 minutes. The gel was stained with CBB R250 (0.1% w/v) in water (60% v/v), methanol (30% v/v) and acetic acid (10% v/v) for 30 min and destained with repeated washes of water (50% v/v), methanol (40% v/v) and acetic acid (10% v/v).
Figure 1. Determination of the optimum number of sample treatments for H/D exchange. a) MALDI-MS signal of untreated substance P with DHBA as matrix (Mr = 1347.74 Da), where a sodium adduct peak is visible (*). b) Spectrum of a) after one treatment for H/D exchange using 2 µl of D2O/d6-EtOH (1:1). c) After two H/D exchange treatments. d) After three H/D exchange treatments. e) After four H/D exchange treatments. A maximum of 23 hydrogen atoms can theoretically be exchanged, corresponding to a theoretical mass of 1370.89 Da (100% H/D exchange). The horizontal double arrow denotes the isotopic separation and the vertical line the median of the isotopic distribution for the substance P peptide after the H/D exchange reactions.
Ribosomal proteins obtained after the infection of HeLa cells with herpes simplex virus type 1, strain 17 (HSV-1) were separated by two-dimensional gel electrophoresis (2-DE). The cell infection protocol, protein extraction and 2-DE separation are fully described by Greco et al. (2001).

2.3. In-gel protein digestion

Protein spots were excised from the 1-DE or 2-DE gels and then digested with trypsin using previously published procedures (Bienvenut et al., 1999; Shevchenko, Jensen et al., 1996) modified as described below. The gel pieces were destained in 100 µl of 50 mM ammonium bicarbonate, 30% AcN by incubation for 45 minutes at 37°C. The procedure was repeated until destaining was complete. The destaining solution was removed and the gel was dried in a vacuum centrifuge (Hetovac VR-1, Heto-Holten A/S, Allerød, Denmark) and subsequently reswollen in 20 µl of 20 mM ammonium bicarbonate containing 6.25 µg/µl of porcine trypsin. After overnight incubation at room temperature, the gel was dried under high vacuum to evaporate solvent and volatile salts. Then 20 µl of 1% TFA in water were added and incubated for 20 minutes at room temperature with occasional shaking. The supernatant was transferred to a new polypropylene vessel. A second extraction was conducted using 20 µl of 0.1% TFA in 50% AcN, incubated for 20 minutes at room temperature with occasional shaking. The supernatant was combined with the previous one and dried under high vacuum. Peptide material was re-solubilised in 5 µl of 0.1% TFA in 50% AcN. A control extraction (blank) was prepared using a piece of the gel from a region located between the protein bands.

2.4. MALDI-TOF MS analysis

MALDI-TOF MS was conducted using a Voyager™ Elite or Super STR (Applied Biosystems, Framingham, MA, USA) equipped with a 337 nm nitrogen laser operated at a repetition rate of 3 and 20 Hz respectively.
The analysers were used in reflectron mode with an accelerating voltage of 20 kV. Delayed extraction of 150 to 450 ns and a low mass gate setting of 900 m/z were applied. Laser power was set slightly above threshold for molecular ion production. Spectra were obtained by summation of 50-250 (Elite) and 100-1000 (Super STR) consecutive laser shots. To optimise the experimental conditions, solutions containing 1 pmol/µl of synthetic substance P and [Glu]-fibrinopeptide B were used and external calibration was applied. Usually, one µl of peptide solution was loaded on the MALDI sample plate followed by one µl of various matrix solutions (see the results section). The error of molecular ion mass measurement was below 100 ppm. Validation of the results was performed using in-gel digests of BSA and of proteins from HeLa cells infected with HSV-1. Again, one µl of the digest solution was loaded
on the MALDI stainless steel sample plate followed by 1 µl of 10 mg/ml of ACCA in 50% AcN, 0.1% TFA, and fast air-drying was applied. Autolysis products of porcine trypsin were used as internal standards for calibration (singly protonated peptides: sequence 98-107 = 1045.54 m/z and sequence 58-77 = 2211.107 m/z respectively). Each preparation was first analysed by MALDI-MS to acquire a reference "untreated" spectrum. The same samples were then submitted to hydrogen/deuterium exchange as described in the following section, and a second set of spectra corresponding to the "treated" sample was obtained by MALDI-MS.

2.5. H/D exchange on the MALDI sample plate

To exchange the labile protons of the peptide chain, two µl of various deuterated solvents were added to the sample on the MALDI sample plate, itself kept in a closed box flushed with dry nitrogen or argon as described by Spengler et al. (1993). After 30 seconds of incubation, the remaining solvent was evaporated under vacuum. In order to obtain a maximum exchange yield, the procedure was usually repeated three times as described previously (Spengler et al., 1993). To optimise the yield of exchange, several parameters were varied, as outlined in the results section: the number of treatments with the deuterated solvent, the solvent composition, the nature of the matrix, and the solvent acidity.

3. RESULTS

In 1993, Spengler et al. (1993) described techniques using hydrogen/deuterium (H/D) exchange to improve the recognition of MALDI-PSD ions with linear MALDI analysis. Their best results were reported to involve the treatment of one pmol of substance P, using DHBA as matrix at 10 mg/ml in 10% EtOH, with two successive treatments of the sample with D2O/d1-EtOH (1:1).
The yield of H/D exchange obtained was 98.5%, calculated with the following expression:

$$\%\ \text{H/D exchange} = \frac{(\text{H/D treated } m/z) - (\text{Untreated } m/z)}{(\text{Theo. Num. Exch. H}) \times \Delta_{\text{H/D}}} \times 100 \qquad (1)$$

where
Untreated m/z: experimental m/z of the native protonated peptide,
H/D treated m/z: experimental m/z of the H/D treated protonated peptide,
Theo. Num. Exch. H: theoretical number of exchangeable hydrogen atoms on the treated peptide, including the ionisation proton,
ΔH/D: mass difference between hydrogen and deuterium atoms, which corresponds to 1.0063 Da.
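As a minimal sketch of how expression (1) can be evaluated, the function below takes the mass shift as treated minus untreated m/z, so that the yield comes out positive, and divides it by the maximum possible shift. The function name is a hypothetical helper; the numerical values are those quoted for substance P in this chapter (untreated 1347.74 m/z, 23 exchangeable hydrogens, mean value 1369.87 m/z after treatment).

```python
DELTA_HD = 1.0063  # Da, mass difference between deuterium and hydrogen (value used in the text)


def hd_exchange_percent(untreated_mz, treated_mz, n_exchangeable):
    """Percentage of labile hydrogens replaced by deuterium, following expression (1)."""
    if n_exchangeable <= 0:
        raise ValueError("n_exchangeable must be positive")
    observed_shift = treated_mz - untreated_mz
    maximum_shift = n_exchangeable * DELTA_HD
    return observed_shift / maximum_shift * 100.0


# Substance P: the result is close to the 22-out-of-23 (about 96%) exchange reported in the text.
yield_percent = hd_exchange_percent(1347.74, 1369.87, 23)
print(round(yield_percent, 1))
```

Dividing the observed shift by ΔH/D alone instead gives the apparent number of exchanged hydrogens, which is how a figure such as "22 out of 23" can be read off a spectrum.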
The recent improvements of MALDI-MS technology, such as reflectron (ref. 43) and delayed extraction (Brown & Lennon, 1995; Jensen, Vorm, & Mann, 1996; Vestal, Juhasz, & Martin, 1995; Whittal & Li, 1995) systems, now allow more accurate measurement of the achieved level of H/D exchange thanks to the isotopic resolution now available. Thus, the previously optimised conditions (Spengler et al., 1993) needed to be reconsidered in order to increase the level of confidence of protein identification by PMF.

3.1. Optimized sample treatment

The level of H/D exchange on a synthetic peptide was tested as a function of the number of treatments with deuterated solvent. As a starting point, we used the optimised conditions described by Spengler et al. (1993). Isotopic purity of the solvents was 99.95% and 99.5% for D2O and d6-EtOH, respectively. The exchange was performed from one to four times, and the corresponding MALDI mass spectra are shown in Figure 1. The isotopic distribution pattern of the peptide peak was clearly altered after H/D exchange. Such a pattern could be explained by different exchange rates of protons on amide bonds compared to those on other bonds in the side chains. Engen & Smith (2001) have shown that the distribution of deuterium atoms incorporated in such substitution processes follows a binomial distribution. Therefore, to compare the efficiency of the different H/D exchange treatments, the width of the isotopic distribution and the average isotopic peak distribution centroid (Figueroa et al., 1998) were used as the measures of the reaction yield. In Table 1, the theoretical number of labile protons for each amino acid involved in a peptide is listed. Substance P contains a maximum of 23 exchangeable protons including the ionisation proton, producing a new peptide mass of 1370.89 m/z. The results summarised in Figure 1 clearly show that after three consecutive treatments the H/D exchange is at its maximum, resulting in the smallest isotopic distribution width (3 to 4 Da units).
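The two measures used here, the centroid and the width of the isotopic cluster, can be sketched as follows. This is an illustration under stated assumptions, not the authors' data treatment (which was manual): the centroid is taken as the intensity-weighted mean m/z, the width as the m/z span of the peaks above a relative intensity threshold, and both the 5% threshold and the peak list are hypothetical.

```python
def isotopic_centroid(peaks):
    """Intensity-weighted mean m/z of an isotopic cluster given as (m/z, intensity) pairs."""
    peaks = list(peaks)
    total_intensity = sum(intensity for _, intensity in peaks)
    if total_intensity <= 0:
        raise ValueError("cluster has no signal")
    return sum(mz * intensity for mz, intensity in peaks) / total_intensity


def cluster_width(peaks, rel_threshold=0.05):
    """m/z span of the peaks whose intensity is at least rel_threshold times the base peak."""
    peaks = list(peaks)
    base_peak = max(intensity for _, intensity in peaks)
    kept = [mz for mz, intensity in peaks if intensity >= rel_threshold * base_peak]
    return max(kept) - min(kept)


# Hypothetical, symmetric cluster centred on the mean value reported for treated substance P.
cluster = [(1367.87, 10), (1368.87, 30), (1369.87, 50), (1370.87, 30), (1371.87, 10)]
print(round(isotopic_centroid(cluster), 2))  # 1369.87
print(round(cluster_width(cluster), 2))      # 4.0
```

A narrower width at a higher centroid indicates a more complete and more homogeneous exchange, which is the criterion applied when comparing one to four treatments.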
The isotopic width could not, however, be narrowed by a fourth treatment, and the mean m/z value did not change either. This mean m/z corresponds to the mass of the substance P peptide with 22 out of 23 possible hydrogen atoms replaced by deuterium atoms, i.e. 96% exchange with an isotopic distribution width of ± 4 Da.

3.2. Influence of the matrix and its concentration

Different concentrations of DHBA were tested with the same amount of substance P to determine whether the matrix concentration affects the yield of H/D exchange. No major differences were observed between 10, 25 and 50 mg/ml of DHBA matrix (data not shown). The choice of DHBA (Figure 1) was dictated by the paper of Spengler et al. (1993). Although DHBA was a commonly used matrix at the time that work was done (ref. 50), ACCA is at present the preferred substance for peptide analysis. Therefore, the latter was also tested for its suitability in H/D exchange experiments using substance P as analyte. MALDI-MS spectra comparing results after three consecutive treatments of
substance P with deuterated solvent are shown in Figure 2. The H/D exchange rate using ACCA as matrix was not as good as with DHBA. The isotopic cluster width increased from ± 3 Da with DHBA to ± 4 to 5 Da with ACCA. The symmetrical isotopic distribution observed with DHBA (Figure 2b) could not be achieved with ACCA and, moreover, a decrease in H/D exchange efficiency was observed. The deuterated solvent used for the exchange reaction must completely resolubilise the matrix and the analyte embedded in it to allow the exchange reaction to achieve the maximum yield.

3.3. Influence of the deuterated solvent composition

The kinetics of H/D exchange reactions are strongly dependent on the amide's intramolecular hydrogen bonding and access to the solvent (Englander & Kallenbach, 1984). Kraus et al. (2000) studied the rate of exchange on synthetic peptides composed of a central β-sheet forming domain (VT-sequence). The yield of H/D exchange is related to the stability of the β-structure and decreases as the VT chain length increases. The polarity and protic strength of the solvent also directly influence the speed of the reaction and the yield of conversion. We compared three different deuterated mixtures in order to determine the most efficient solvent.
Figure 2: Effect of the DHBA and ACCA matrix on the H/D exchange rate of substance P. a) Control MALDI-MS signal of one pmol of untreated substance P using 25 mg/ml of DHBA as matrix. A similar MALDI-MS signal was obtained with 10 mg/ml of ACCA as matrix (data not shown). b) Spectrum of the sample shown in a) after three consecutive treatments with 2 µl of d6-EtOH/D2O (1:1). c) MALDI-MS spectrum of substance P prepared with 10 mg/ml of ACCA as matrix after three consecutive treatments with the same deuterated solvents.
The first solvent composition tested was the previously used d6-EtOH/D2O (1:1) mixture. The second mixture was composed of D2O/d4-MeOH (7:3), and the third of d3-AcN/D2O (3:7). For all solvent mixtures tested on substance P, no differences were noticed in terms of isotopic width (± 4 Da) or mean value (1369.87 m/z) (data not shown). Solvent pH has been shown to affect the rate of H/D exchange (Englander, Rogero, & Englander, 1985; Rosa & Richards, 1979). For most of the labile hydrogen atoms, e.g. those on hydroxyl groups, the concentration of free deuterium ions plays an important role in the exchange reaction. These ions facilitate the exchange of the hydrogen atoms through an intermediate ion well known in organic chemistry (McMurry, 1988). The d4-MeOH/D2O mixture was therefore spiked with d1-TFA at four different concentrations in order to increase the deuterium ion concentration. MALDI mass spectra obtained with these solvent compositions are shown in Figure 3. d1-TFA concentrations of 1 and 4% resulted in the best exchange: the mean value of the isotopic distribution was identical to the Mr of substance P with all labile hydrogen atoms exchanged for deuterium atoms. Moreover, the isotopic distribution was the narrowest obtained, with a minimal width of ± 2 to 3 Da. Similar results were obtained using d3-AcN/D2O/d1-TFA (30:70:1) (data not shown). A broader isotopic distribution and an unfavourable mean value were obtained when using 10% d1-TFA. This result may be explained by the hygroscopic nature of TFA, which could contaminate the deuterated solvent with water and thus compromise a high yield of conversion. The solvent with only 0.1% of TFA showed a lower efficiency for H/D exchange, probably because the concentration of deuterium ions was too low.

3.4. Influence of the amino acid sequence of the peptide

In order to confirm the results obtained with substance P, a second synthetic peptide, [Glu]-fibrinopeptide B, was used to verify the optimised parameters, i.e. three successive treatments with two µl of d4-MeOH/D2O/d1-TFA (70:30:1) using 10 mg/ml of ACCA as matrix. The isotopic width thus obtained was similar to that obtained with substance P, but the best mean value obtained with [Glu]-fibrinopeptide B was 1 Da below the fully exchanged theoretical mass of 1600.88 m/z for 30 hydrogen atoms exchanged for deuterium atoms (Figure 3 g-o).

3.5. Application of the technique to tryptic bovine serum albumin digest

The application of the above described technique to a protein digest does not require any modification. The spectrum obtained from an untreated tryptic BSA digest is shown in Figure 4a. Following data acquisition, the sample was subjected to H/D exchange. A second MALDI mass spectrum was acquired and is shown in Figure 4b. In general, the pattern of the spectrum after treatment is similar to that of the
untreated sample, although some minor peaks have been lost and relative peak intensities change slightly. As expected, isotopic clusters were shifted after H/D exchange. This effect is shown clearly for the higher Mr peptides, as outlined in the zoom boxes in Figure 5. The yield of H/D exchange was very high, ranging from 95 to 100% of labile hydrogen atoms exchanged for deuterium, which corresponds to a mean value of 97 ± 2% of exchange (Table 2). The H/D exchange data allowed the verification of the identities of 8 out of 13 peaks from the untreated sample.

3.6. Application of the technique to an unknown protein digest

In order to investigate further the usefulness of this method, it was applied to in-gel digests of two unknown proteins from a HeLa cell line infected with HSV-1 (Greco et al., 2001). Proteins were separated by 2-DE and digested as described above. The first selected protein had a Mr = 62 kDa and a pI = 9.5, whereas the second had a Mr = 12 kDa and a pI = 10.5. The first protein spot could be clearly identified by PMF as polyadenylate binding protein 1 (PAB1) or 2 (PAB2) (SWISS-PROT entries P11940 or Q15097 respectively) (Table 3a). However, it was not possible to distinguish by MS which of these proteins was present, because peptides from the variable region were missing. Nevertheless, biochemical investigations tend to validate the identification of the PAB1 protein (Greco et al., 2001). One of the matching peptide masses shown in Table 3b (1045.549 m/z) could also correspond to an autolysis product of porcine trypsin (TRYP_PIG, SWISS-PROT accession number P00761) that is frequently found in protein digests. Application of the H/D exchange method clearly identified this peptide as originating from PAB1. The trypsin peptide, LSSPATLNSR, with a theoretical mass of 1045.56 m/z, would be transformed into a deuterated peptide of 1067.71 m/z (22 labile hydrogen atoms).
In contrast, the corresponding PAB1 peptide, GFGFVSFER (1045.51 m/z), has only 18 exchangeable hydrogen atoms, resulting in a theoretical mass of 1063.62 m/z. The experimentally determined value after H/D exchange was 1062.61 m/z, much closer to the PAB1 prediction. Identification of the second protein was not as unequivocal (Table 4). In that case, the first 10 ranked proteins from the Virus (Table 4A) and Human (Table 4B) databases did not show a sufficiently discriminating score to identify the target protein correctly. The 3 or 4 matched peptide masses were not sufficient to identify a protein unequivocally. The theoretical Mr and pI values of the capsid protein VP26 (SWISS-PROT entry P10219; Greco et al., 2001) were consistent with the experimental position of the second protein on the 2-DE gel (Protein 5 in Table 4A). The H/D treatment allowed us to strongly validate three out of the five identified peaks (Figure 6). Such results could not be obtained with any other protein ranked in the SmartIdent output (Table 4). As a final proof, MS/MS analysis using a Q-TOF from Micromass (Manchester, UK) confirmed the identification (Table 4C).
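The discrimination between the trypsin autolysis peptide and the PAB1 peptide described earlier in this section amounts to comparing the observed treated m/z with the fully exchanged mass predicted for each candidate. The sketch below uses a simple nearest-prediction rule, which is an assumption made for illustration and only makes sense when the exchange is nearly complete (around 95% here); the function names and dictionary keys are hypothetical, while the masses and hydrogen counts are those quoted in the text.

```python
DELTA_HD = 1.0063  # Da, mass difference between deuterium and hydrogen (value used in the text)


def predicted_deuterated_mz(untreated_mz, n_exchangeable):
    """m/z expected if every labile hydrogen is replaced by deuterium."""
    return untreated_mz + n_exchangeable * DELTA_HD


def best_candidate(observed_treated_mz, candidates):
    """Return the candidate whose fully exchanged mass is closest to the observation.

    candidates -- dict mapping a label to (untreated m/z, number of exchangeable hydrogens)
    """
    def deviation(label):
        untreated_mz, n_exchangeable = candidates[label]
        return abs(observed_treated_mz - predicted_deuterated_mz(untreated_mz, n_exchangeable))

    return min(candidates, key=deviation)


candidates = {
    "TRYP_PIG LSSPATLNSR": (1045.56, 22),
    "PAB1 GFGFVSFER": (1045.51, 18),
}
print(best_candidate(1062.61, candidates))  # PAB1 GFGFVSFER
```

The observed value lies about 1 Da below the PAB1 prediction and about 5 Da below the trypsin prediction, so only the PAB1 assignment is compatible with an exchange yield in the usual range.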
Figure 4. Influence of the d1-TFA concentration in various deuterated solvents on the H/D exchange. MALDI-MS signal of untreated peptides with ACCA as matrix: a) substance P, g) [Glu]-fibrinopeptide B; b-f) and h-o): spectra of substance P or [Glu]-fibrinopeptide B respectively, as in a) and g), but treated three consecutive times with deuterated solvents using b) D2O/d4-MeOH (30:70), c) D2O/d4-MeOH/d1-TFA (30:70:0.1), d) D2O/d4-MeOH/d1-TFA (30:70:1), e) D2O/d4-EtOH/d1-TFA (30:70:4), f) D2O/d4-EtOH/d1-TFA (30:70:10), h) D2O/d4-EtOH (1:1), j) D2O/d4-MeOH (30:70), k) D2O/d4-MeOH/d1-TFA (30:70:1), m) D2O/d3-AcN/d1-TFA (70:30:1) and n) D2O/d3-AcN (70:30). The best results were obtained with solvents containing 1 and 4% of d1-TFA, allowing complete H/D exchange of substance P. Nevertheless, the same conditions applied to [Glu]-fibrinopeptide B exchanged only 29 of the 30 hydrogen atoms that can theoretically be exchanged; full exchange corresponds to a theoretical mass of 1600.88 Da (100% H/D exchange).
4. DISCUSSION AND CONCLUSION

H/D exchange is a technique mostly used for conformational structural analysis of proteins. For identification purposes, H/D exchange techniques were reported as early as 1993 (Spengler et al., 1993) but were only developed to validate MS/MS results (Fleischmann et al., 1995; James et al., 1993; Ohguro et al., 1994; Spengler et al., 1993). Here, the idea was to apply H/D exchange to a protein digest in order to increase the confidence of protein identification by PMF. In addition, the improvements in MALDI-MS resolution and sensitivity achieved during the last ten years have made it possible to update and improve this technique.

4.1. Influence of the matrix compound
Figure 5. a) MALDI-MS spectrum of untreated bovine serum albumin digest. b) MALDI-MS spectrum of the sample shown in a) after treatment with deuterated solvent. The zoom boxes show the isotopic distribution of the 2045.03 Da peak in the untreated sample and the same peak after the H/D exchange, appearing at 2075.63 Da.
In the previously published method (Spengler et al., 1993), two treatments of synthetic peptides with deuterated solvents were proposed to obtain the best H/D exchange yield. In our experiments, three repetitive treatments gave optimal results, with H/D exchange rates close to 100% (Figures 1 and 3). Inconsistencies in H/D exchange reproducibility were observed, and these could be directly related to the matrix substance (Figure 2). In fact, to obtain a complete and efficient exchange, matrix and analyte must be completely re-solubilised during the treatment. Therefore, the deuterated solvent used for sample deuteration must have a high solubility constant for the matrix. Additionally, a fast matrix drying method generally produces a thin matrix/analyte layer on the target. As a result, the matrix re-solubilisation during the H/D exchange treatment is facilitated and a better contact between matrix/analyte and the deuterated solvent is obtained.

4.2. Influence of the physico-chemical characteristic of the solvent

The solvent used for H/D exchange therefore has a strong influence on the exchange rate. The different polarities of the solvents, e.g. d6-EtOH, d4-MeOH and d3-AcN, did not lead to any major differences (Figure 3). The pD (-log[D+] ≈ -log[H+] = pH), on the other hand, showed a strong influence on the exchange rate. A limited amount of d1-TFA (from 1 to 4%) led to a strong increase in the yield of H/D exchange in the case of substance P, but did not affect the exchange with [Glu]-fibrinopeptide B. In fact, the latter peptide (EGVNDNEEGFFSAR) contains four acidic residues (one aspartic and three glutamic acids), giving a theoretical pI of 4.0 in water (http://www.expasy.org/tools/pi_tool.html). Although that pH value is dependent on the solvent, this peptide exhibits a high density of acidic residues in its primary structure. This physico-chemical property may cause an autocatalytic effect on the H/D exchange, in contrast to substance P, which does not contain any acidic residues.

4.3. Influence of the amino acid composition of the peptide

The yield of H/D exchange is related not only to the details of the technique, e.g. matrix quantity and deuterated solvent pH, but also to the peptide primary and secondary structure. Kraus et al.
(Kraus et al., 2000) showed that the secondary structure of the peptide is stable when (Val-Thr)5 or larger clusters are present in the peptide sequence, which limits the exchange yield. Below this value, i.e. with (VT)4 and smaller clusters, the secondary structure effect is much more limited and solvent access to the labile protons is favoured. Nevertheless, the exchange could still be limited by a hydrophobic region that prevents access of the hydrophilic solvent (ref. 51). In the case described above, and using mean mass values, substance P shows 100% exchange and [Glu]-fibrinopeptide B shows 96.7% exchange. Applying the technique to a larger peptide collection, such as a protein digest, confirmed the results obtained with synthetic peptides. For example, the result obtained with the bovine serum albumin digest shows an exchange of 98 ± 2% (calculated from 8 values), and the exchange rate for the first unknown protein (PAB1_HUMAN) was 94 ± 5% (calculated from 9 values). From these examples, the experimental yield is close to 95% of the theoretical maximum exchange.
4.4. Application of the technique as a validating and discriminating method

Applying this technique to protein digests is valuable in different cases. For example, masses used for protein identification could also be related to contaminants like matrix, keratins and/or autolysis products. A few peptides of these contaminant proteins are well known but, with only a MALDI mass spectrum, it is not possible to determine clearly whether a peptide mass is related to the identified protein or to a contaminant. The H/D exchange technique can then rapidly and precisely identify the sequence that best matches the experimental information. As an application, for the first unknown protein of the virus-infected HeLa cells separated by 2-DE, this technique was able to resolve that type of ambiguity and showed that the 1045.56 m/z peak derived from the PAB1 protein and not from the autolysis product of trypsin. In addition, this technique is not limited to a single peptide, but can also be applied to all protein-matched peptides. As described above, H/D exchange applied to a tryptic protein digest showed an average exchange generally close to 95%. Owing to the reproducibility of the mean exchange, this technique could be used as a validating method for PMF protein identification. H/D exchange is based on mass discrimination directly related to the primary sequence, thus giving a valuable complement of information. As an example, in the case of the second protein from infected HeLa cells, this technique was applied to the protein digest. Information collected from the untreated and treated digest mass spectra allowed us to validate a protein identification using all of the confirmed peptides. In that case, the complementary information allowed discrimination of the correct protein from the others listed in the PMF output.

5. CHALLENGE AND FUTURE DEVELOPMENTS

One of the major challenges related to this technique is the treatment of the sample. Indeed, although the first part of the sample preparation is the usual one for MALDI-MS analysis, the deuteration of the sample needs practice and additional handling. The H/D exchange must take place in a closed environment protected from atmospheric water, which may compromise a good exchange of hydrogen for deuterium. A continuous flush with dried, inert gas is a prerequisite, but a robotic approach could easily be considered for the sample preparation, given the repetitive treatment of the sample that is required. A second disadvantage is that all of the results presented in this paper were obtained by manual retrieval of the data from the MALDI-MS spectra and PMF identification tools. This exhaustive step is, at present, needed to exploit such data. Concerning data treatment, Spengler et al. (1993) and Sepetov et al. (1993) developed a computer analysis system to help MS/MS spectra interpretation using such data. The development of efficient bioinformatic tools should make it possible to extract the information contained in the data produced by the present
approach, i.e. utilisation of the average isotopic peak distribution centroid (Figueroa et al., 1998) as the reference m/z. The third drawback seems to be a loss of material or at least a loss of sensitivity. Although the MALDI mass spectrum of the treated sample lacks a few of the minor peaks, this does not affect the ability to retrieve further information. Furthermore, in Figure 4, a few clearly visible peaks, e.g. 1399.68 or 1439.82 m/z, are no longer found in the MALDI-MS spectrum after the H/D exchange. Conversely, peaks smaller than these, like 1163.63 m/z, are still visible after the treatment. The H/D treatment seems to modify physico-chemical properties of the treated peptides that influence, positively or negatively, their signal intensity in the spectrum of the H/D treated sample. Despite these small drawbacks, the great advantage of this technique is the ability to use the same sample previously analysed by MALDI-MS for a second analysis. Moreover, this technique could be applied with success to 2-DE samples. With our samples, this treatment provided information on the primary structure of the peptide. This type of investigation was sufficient in our cases to identify a peptide clearly among a few candidates, or more generally to facilitate protein identification without the help of expensive MS/MS systems. Further use will show whether such a technique is useful on a large range of proteins.

6. ACKNOWLEDGEMENTS

This work was supported by the Swiss National Fund for Scientific Research (grant 31-59095.99). The authors acknowledge Prof. Jacques Deshusses, Dr Catherine Déon and Gérald Rossellat for their technical support.

7. REFERENCES

Acharya, A., Manjula, B., Murthy, G., & Vithayathil, P. (1977). Int J Peptide Protein Res, 9, 213-219.
Adams, M., Celniker, S., Holt, R., Venter, J., et al. (2000). Science, 287, 2185-2191.
Axelsson, J., Naven, T., & Fenyo, D. (2001, 4-6th of April). Paper presented at the From biology to pathology: the proteomics perspective, York, UK.
Bairoch, A., & Apweiler, R. (2000). The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic acids research, 28, 45-48.
Bartlet-Jones, M., Jeffrey, W., Hansen, H., & Pappin, D. (1994). Peptide ladder sequencing by MS using a novel volatile degradation reagent. Rapid Commun. Mass Spectrom., 8, 737-742.
Bienvenut, W., Hoogland, C., Greco, A., Heller, M., Gasteiger, E., Appel, R., et al. (2002). Hydrogen/deuterium exchange for higher specificity of protein identification by peptide mass fingerprinting. Rapid Commun. Mass Spectrom., 16(6), 616-626.
Bienvenut, W., Sanchez, J., Karmime, A., Rouge, V., Rose, K., Binz, P., et al. (1999). Toward a clinical molecular scanner for proteome research: parallel protein chemical processing before and during western blot. Anal Chem, 71(21), 4800-4807.
Blattner, F., Plunkett, G., III, Bloch, C., Perna, N., Burland, V., Riley, M., et al. (1997). Science, 277, 1453-1474.
Brown, R., & Lennon, J. (1995). Mass resolution improvement by incorporation of pulsed ion extraction in a matrix-assisted laser desorption/ionisation linear time-of-flight mass spectrometer. Anal. Chem., 67, 1988-2003.
BIENVENUT ET AL.
Chaurand, P., Luetzenkirchen, F., & Spengler, B. (1999). Peptide and protein identification by MALDI-PSD TOF-MS. J. Am. Soc. Mass Spectrom., 10, 91-103.
Clauser, K., Baker, P., & Burlingame, A. (1999). Anal. Chem., 71, 2076-2084.
Consortium, I. H. G. S. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921.
Engen, J., & Smith, D. (2001). Investigating protein structure and dynamics by hydrogen exchange MS. Anal. Chem., 73, 256A-265A.
Englander, J., Rogero, J., & Englander, S. (1985). Anal. Biochem., 147, 234-.
Englander, S., & Kallenbach, N. (1984). Q. Rev. Biophys., 16, 521.
Falick, A., & Maltby, D. (1989). Anal. Biochem., 182, 165-169.
Figueroa, I., Torres, O., & Russell, D. (1998). Effects of the water content in the sample preparation for MALDI on the mass spectra. Anal. Chem., 70, 4527-4533.
Fleischmann, R., Adams, M., White, O., Clayton, R., Kirkness, E., Kerlavage, A., et al. (1995). Science, 269, 496-512.
Fraenkel-Conrat, H., & Olcott, H. (1945). J. Biol. Chem., 161, 259-268.
Greco, A., Bienvenut, W., Sanchez, J., Kindbeiter, K., Hochstrasser, D., Madjar, J., et al. (2001). Identification of ribosome-associated viral and cellular basic proteins during the course of infection with herpes simplex virus type 1. Proteomics, 1(4), 545-549.
Gygi, S., Rist, B., Gerber, S., Turecek, F., Gelb, M., & Aebersold, R. (1999). Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nature Biotechnol., 17, 994-999.
Henzel, W. J., Billeci, T. M., Stults, J. T., Wong, S. C., Grimley, C., & Watanabe, C. (1993). Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases. Proceedings of the National Academy of Sciences of the United States of America, 90(11), 5011-5015.
Hunt, D., Yates, J., Shabanowitz, J., Winston, S., & Hauer, C. (1986). Proc. Natl. Acad. Sci. USA, 83, 6233-6237.
James, P., Quadroni, M., Carafoli, E., & Gonnet, G. (1993). Protein identification by mass profile fingerprinting. Biochem Biophys Res Commun, 195(1), 58-64.
James, P., Quadroni, M., Carafoli, E., & Gonnet, G. (1994). Protein identification in DNA databases by peptide mass fingerprinting. Protein Sci., 3(8), 1347-1350.
Jensen, O. N., Vorm, O., & Mann, M. (1996). Sequence patterns produced by incomplete enzymatic digestion or one-step Edman degradation of peptide mixtures as probes for protein database searches. Electrophoresis, 17(5), 938-944.
Katta, V., & Chait, B. (1993). J. Am. Chem. Soc., 115, 6317-6321.
Kaufmann, R., Spengler, B., & Lutzenkirchen, F. (1993). Mass spectrometric sequencing of linear peptides by product-ion analysis in a reflectron time-of-flight mass spectrometer using matrix-assisted laser desorption/ionisation. Rapid Commun. Mass Spectrom., 7, 902-910.
Kraus, M., Janck, K., Bienert, M., & Krause, E. (2000). Characterisation of intermolecular β-sheet peptides by mass spectrometry and hydrogen isotope exchange. Rapid Commun. Mass Spectrom., 14, 1094-1104.
Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature, 227(259), 680-685.
Lahn, H. W., & Langen, H. (2000). Mass spectrometry: a tool for the identification of proteins separated by gels. Electrophoresis, 21, 2105-2114.
Mann, M., Höjrup, P., & Roepstorff, P. (1993). Biol. Mass Spectrom., 22, 338.
McMurry, J. (1988). Organic chemistry (2nd ed.). Belmont, CA: Brooks/Cole Publishing Company.
Ng, W., Kennedy, S., Mahairas, G., Berquist, B., Pan, M., Shukla, H., et al. (2000). Proc. Natl. Acad. Sci. USA, 97, 12176-12181.
Nutkins, J., & Williams, D. (1989). Eur. J. Biochem.
Ohguro, H., Palczewski, K., Walsh, K., & Johnson, R. (1994). Protein Sci., 3, 2428-2434.
Pappin, D., Hojrup, P., & Bleasby, A. (1993). Rapid identification of proteins by peptide mass fingerprint. Curr. Biol., 3(6), 327-332.
Patterson, S., Thomas, D., & Bradshaw, R. (1996). Application of combined mass spectrometry and partial amino acid sequence to the identification of gel separated proteins. Electrophoresis, 17, 877-891.
Rosa, J., & Richards, J. (1979). J. Mol. Biol., 133, 399-.
Sepetov, N. F., Issakova, O. L., Lebl, M., Swiderek, K., Stahl, D. C., & Lee, T. D. (1993). The use of hydrogen-deuterium exchange to facilitate peptide sequencing by electrospray tandem mass spectrometry. Rapid Commun. Mass Spectrom., 7, 58-62.
Shevchenko, A., Jensen, O. N., Podtelejnikov, A. V., Sagliocco, F., Wilm, M., Vorm, O., et al. (1996). Linking genome and proteome by mass spectrometry: large-scale identification of yeast proteins from two dimensional gels. Proc. Natl. Acad. Sci. U.S.A., 93(25), 14440-14445.
Spengler, B., Lutzenkirchen, F., & Kaufmann, R. (1993). On-target deuteration for peptide sequencing by laser mass spectrometry. Org. Mass Spectrom., 28, 1482-1490.
The Arabidopsis Initiative. (2000). Nature, 408, 796-815.
The C elegans Sequencing Consortium. (1998). Science, 282, 2012-2018.
Vestal, M. L., Juhasz, P., & Martin, S. A. (1995). Delayed extraction matrix-assisted laser desorption time-of-flight mass spectrometry. Rapid Commun. Mass Spectrom., 9, 1044-1050.
Wang, F., & Tang, X. (1996). Biochemistry, 35, 4069-4078.
Wheeler, D., Church, D., Lash, A., Leipe, D., Madden, T., Pontius, J., et al. (2001). Database resources of the national center for biotechnology information. Nucleic acids research, 29, 11-16.
Whittal, R., & Li, L. (1995). Anal. Chem., 67, 1950-1954.
Wilcox, P. (1967). Esterification. Meth. Enzym., 11, 605-616.
Yates, J. R., III, Speicher, S., Griffin, P. R., & Hunkapiller, T. (1993). Peptide mass maps: A highly informative approach to protein identification. Anal Biochem, 214, 397-408.
Zhang, W., & Chait, B. (2000). Anal. Chem., 72, 2482-2489.
CHAPTER 7 IMPROVEMENTS IN THE PEPTIDE MASS FINGERPRINT PROTEIN IDENTIFICATION (2/2) MALDI-MS/MS with high resolution and sensitivity for identification and characterization of proteins. (Bienvenut, Deon, Pasquarello et al., 2002)
WV. Bienvenut, C. Deon, C. Pasquarello, JM. Campbell, JC. Sanchez, ML. Vestal, DF. Hochstrasser
ABSTRACT
Although PMF is currently the method of choice for identifying proteins, the number of proteins available in databases is increasing constantly; hence, having sequence data on selected peptides to make database searching more effective is becoming more crucial. Until recently, the ability to identify proteins from peptide sequence was essentially limited to ESI tandem MS methods. The recent development of instruments with MALDI sources and true MS/MS capabilities makes it possible to obtain high quality tandem mass spectra of peptides. In this work, using the new high resolution MALDI-TOF/TOF tandem mass spectrometer from Applied Biosystems, we describe examples of successful identification and characterization of bovine heart proteins (SWISS-PROT entries: P02192, Q9XSC6, P13620) separated by 2-DE and blotted onto PVDF membrane. Tryptic protein digests were analyzed by MALDI-TOF to identify the peptide masses subsequently used for MS/MS. High-energy MALDI-TOF/TOF CID spectra were then recorded on selected ions. All data, both MS and MS/MS, were recorded on the same instrument. Tandem mass spectra were submitted to database searching using MS-Tag or were manually sequenced de novo. An interesting modification of the tryptophan residue, a “double oxidation”, was brought to light during these analyses.
KEYWORDS
MALDI-TOF/TOF-MS, Bovine, Protein identification, tandem MS
209 W. V. Bienvenut (ed.), Acceleration and Improvement of Protein Identification by Mass Spectrometry, 209–224. © 2005 Springer. Printed in the Netherlands.
1. INTRODUCTION
Protein identification by PMF is a rapid method for identifying proteins previously separated by 1- or 2-DE. However, in analyses of complete gels, there are always samples that cannot be identified from the information available in the PMF alone (Norregaard et al., 1998) and that therefore need to be analyzed with a method able to provide primary structure information on a peptide, such as tandem MS. Although MALDI-PSD-MS is able to provide information on the AA sequence (Gevaert et al., 1997), ESI-MS/MS has been adopted as the method of choice for rapidly obtaining peptide sequence (Neubauer et al., 1998). The use of ESI, however, requires an additional sample preparation step in order to analyze 2-D gels. Recently, the development of a new generation of MS/MS instruments with a MALDI source has allowed proteins to be identified directly from a common sample using the PMF and/or high quality tandem MS data (Loboda, Krutchinsky, Bromirski, Ens, & Standing, 2000; K. Medzihradszky et al., 2000; A. Shevchenko, Jensen et al., 1996; A. Shevchenko, Sunyaev et al., 2001). In this article, we show results obtained with the new MALDI-TOF/TOF-MS workstation from Applied Biosystems. Two examples illustrate the use of this instrument for peptide discrimination and sequence validation. In the third example, the five most intense peaks observed in the spectrum did not correspond to the protein identified by PMF. De novo interpretation of the MS/MS spectrum obtained from one of these peptides gave an AA sequence containing an unknown residue. This sequence corresponded to the previously identified protein, but with a “doubly oxidized” tryptophan as the unknown residue. Interestingly, the peptides with this unusual modification of tryptophan appear to have relatively high ionization efficiency.
In addition, this modification could be as helpful as oxidized methionine for protein identification if properly defined in database searching tools. Utilization of the TOF/TOF allows the same sample to be used not only to rapidly identify, but also to further characterize a digested protein. In cases where neither the PMF procedure nor database analysis of tandem data is successful, this analysis technique is also compatible with de novo sequencing.
2. MATERIALS AND METHODS
2.1. Reagents and apparatus
The reagents and apparatus used for 2-DE protein separation have been described in detail elsewhere (Hochstrasser et al., 1992).
2.2. Protein solubilisation for preparative 2-D PAGE
Seven mg of lyophilized heart ventricle tissue were dissolved in 450 µl of an IPG gel rehydration buffer containing 8 M urea, 4% w/v CHAPS, 65 mM DTT, 0.8% v/v carrier ampholytes (Resolytes™ 3.5-10) and a trace of bromophenol blue. Because such tissue solubilizes poorly, the solution must contain less than 3.5 mg of total protein.
2.3. 2-D PAGE
Figure 1: 2-DE separation of bovine heart ventricle proteins, blotted onto PVDF membrane and stained with amido black. Spots labeled 1 to 3 are the proteins used for tryptic digestion followed by MALDI-TOF-MS and MALDI-TOF/TOF-MS/MS analysis.
2-D PAGE was performed using immobilized pH gradient strips and an improved in-gel rehydration technique (Sanchez, Rouge et al., 1997) allowing proteins to enter the gel during the rehydration step. Separation was carried out using sigmoidal IPG
strip (18 cm) covering a pH range of 3.5-10 (Bjellqvist, Hughes et al., 1993; Bjellqvist, Pasquali, Ravier, Sanchez, & Hochstrasser, 1993). For the second dimensional separation, vertical gradient slab SDS-PAGE (9-16% T) was used. The cross linker was piperazine diacrylyl (2.6% C) and the adjunct catalyst was sodium thiosulfate (Hochstrasser & Merril, 1988; Hochstrasser, Patchornik, & Merril, 1988; Laemmli, 1970). After this separation, proteins were electroblotted onto PVDF membrane with 10 mM CAPS buffer, pH 11.0, in 5% v/v methanol (for the cathodic side) and 20% v/v methanol (for the anodic side) (Jin & Cerletti, 1992) using a homemade semidry apparatus. The transfer was carried out at 100 mA constant current for one hour, followed by an overnight transfer at 40 mA (28 V maximum). The PVDF membrane was stained for 2 minutes with 0.5% w/v amido black in 20% v/v isopropanol (Eckerskorn & Lottspeich, 1993), destained with water and air-dried.
2.4. Image analysis
The amido black stained PVDF membrane was scanned with an optical densitometer and the image was analyzed with the MELANIE 3 software program (Appel, Palagi et al., 1997; Appel, Vargas et al., 1997). Figure 1 shows the membrane obtained with this sample; the spots used for protein analyses are labeled from 1 to 3.
2.5. Protein digestion
Protein spots were excised from the membrane and then digested with trypsin using a previously published procedure (Bienvenut et al., 1999), modified as described below. The pieces of membrane were first destained for 2 hours at room temperature with 1 ml of methanol/water (1:1). The destaining solution was removed and the process was repeated a second time. Twenty-five µl of 60 mM ammonium bicarbonate, 40% v/v acetonitrile and 15 ng/µl of trypsin were added to the membrane and incubated at room temperature overnight.
The supernatant was collected and material at the surface of the PVDF membrane was extracted using 20 µl of 80% v/v acetonitrile, 0.1% v/v TFA solution, incubated for 3 minutes at room temperature with sonication. This supernatant was combined with the first one. The extracts were concentrated in a speed-vac down to 1-2 µl of solvent. Solvent (0.1% TFA in 50% ACN) was then added to bring the sample volume to 5 µl.
2.6. Sample preparation
0.8 µl of the digested protein was loaded on a MALDI 2x96-well target. The same volume of matrix (10 mg/ml α-cyano-4-hydroxycinnamic acid in 50% ACN, 0.1%
[Spectra: % Intensity versus Mass (m/z); panel B is annotated with the b and y fragment ions of LFTGHPETLEK.]
Figure 2: A) MALDI-MS spectrum of the tryptic digest of spot 1 (Figure 1). Labeled peaks correspond to the peptides matched to Bovine Myoglobin (SWISS-PROT entry: P02192). "T" labeled peaks correspond to autolysis products of the trypsin. Two peptide masses (1271.66 and 1521.91 Da) could each be matched to two different sequences. The 1521.91 Da peptide was matched to the sequence (K)KHGNTVLTALGGILK(KK); the second possible sequence is only a frame shift of the first, (KK)HGNTVLTALGGILKK(K). The MS/MS spectrum of this peptide validates the second sequence (data not shown). B) MS/MS spectrum of the 1271.66 Da peptide (HLAESHANKHK or LFTGHPETLEK) acquired with the MALDI-TOF/TOF-MS. No fragment corresponds to the first possible AA sequence, whereas the second sequence is validated by the bn and yn (n = 6-10) ion series.
TFA) was added to the digest on the target. Samples were dried as quickly as possible under vacuum. MS and MS/MS analyses were performed on the Applied Biosystems Voyager TOF/TOF™ Workstation (Medzihradszky et al., 2000), which uses a 200 Hz Nd:YAG laser operating at 355 nm. During MS/MS analysis, air was
used as the collision gas. Spectra were obtained by accumulating 200 to 2000 consecutive laser shots.
2.7. Database interrogation
PMF interrogations were performed using SmartIdent (http://www.expasy.org/tools/) (Gras et al., 1999). Peak picking was done automatically using the Data Explorer software. Peak resolution was calculated with the same software, with only baseline correction applied to the raw data. The query was restricted to the bovine species, with the minimum number of matched masses set to 4. The maximum mass tolerance was 50 ppm after internal calibration on trypsin autolysis products, at most one missed cleavage was allowed for tryptic peptides, and the modifications accepted were carbamidomethyl cysteine and artifactual oxidation of methionine. Mr and pI values of the analyzed spot were obtained from the 2-DE gel. The SWISS-PROT and TrEMBL databases were used for the search. MS/MS interrogations were carried out with the same parameters as described for the PMF search, using the MS-Tag or MS-Pattern tools (http://prospector.ucsf.edu/) depending on the type of interrogation. Precursor mass tolerance was set to 50-100 ppm and fragment tolerance to 500-1500 ppm. No internal calibration of the MS/MS data was performed.

Table 1. Peak resolution of ten peptides in the MALDI-MS spectrum shown in Figure 3A. The average resolution is around 13500 (FWHM), but the 1507.77 Da peak deviates significantly from this value (P > 90), which justifies computing the average without it. In that case, the average resolution is 14050 ± 1250.

Mr          Resolution (FWHM)
907.48      12395
1015.47     13623
1157.55     13498
1231.64     13960
1507.77     8760
1643.81     15636
1675.81     14146
1785.96     16594
1994.95     13776
2151.04     12775
Average     13500 ± 2000
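The averages quoted in Table 1 can be checked with a few lines of Python. This is only a sketch: it assumes the ± values are population standard deviations, which the text does not state but which is the choice that comes closest to the rounded figures reported.

```python
from statistics import mean, pstdev

# FWHM resolution of the ten peaks of Table 1, keyed by peak Mr (Da).
resolution = {
    907.48: 12395, 1015.47: 13623, 1157.55: 13498, 1231.64: 13960,
    1507.77: 8760, 1643.81: 15636, 1675.81: 14146, 1785.96: 16594,
    1994.95: 13776, 2151.04: 12775,
}

def summarize(values):
    """Return (mean, population standard deviation) of the values."""
    values = list(values)
    return mean(values), pstdev(values)

# All ten peaks, then the same calculation without the 1507.77 Da peak.
all_mean, all_sd = summarize(resolution.values())
kept = [r for mr, r in resolution.items() if mr != 1507.77]
kept_mean, kept_sd = summarize(kept)

print(f"all ten peaks:   {all_mean:.0f} +/- {all_sd:.0f}")
print(f"without 1507.77: {kept_mean:.0f} +/- {kept_sd:.0f}")
```

Leaving out the 1507.77 Da peak raises the mean and narrows the spread, in line with the caption's decision to treat that peak separately.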
[Spectra: % Intensity versus Mass (m/z); panel B is annotated with the b and y fragment ions of LSVEALNSLTGEFK and GGDDLDPNYVLSSR.]
Figure 3: A) MALDI-MS spectrum of the tryptic digest of spot 2 (Figure 1). Labeled peaks correspond to the peptides matched to Bovine Creatine Kinase (SWISS-PROT entry: Q9XSC6). "T" and "K" labeled peaks correspond to autolysis products of the trypsin and to peptides from Keratin contamination, respectively. The underlined peak is unmatched against the target protein but corresponds to the 1643.81 Da peptide carrying a double oxidation (described more precisely in section 3.3). B) MS/MS spectrum of the 1507.77 Da peptide (GGDDLDPNYVLSSR or LSVEALNSLTGEFK) acquired with the MALDI-TOF/TOF-MS. Fragments of both peptides could be validated. Underlined Mr values and labeled fragments correspond to the C-terminus arginine peptide. The others correspond to fragments of the C-terminus lysine peptide.
3. RESULTS
3.1. Peptide sequences discrimination
One possible use of MS/MS is the validation of the AA sequence of a peptide previously matched by PMF protein identification. As an example, the digest obtained from protein spot 1 (Figure 1) was analyzed directly on the MALDI-TOF/TOF workstation in single MS mode. The spectrum obtained is shown in Figure 2A, and the PMF interrogation unambiguously identified the protein as Bovine Myoglobin (SWISS-PROT entry: P02192) using 8 masses corresponding to 10 possible sequences. Indeed, two peptide masses, 1271.66 and 1521.91 Da, could each be matched with two different sequences. The 1521.91 Da peptide matched both the sequence (K)KHGNTVLTALGGILK(KK) and a frame shift of this sequence, (KK)HGNTVLTALGGILKK(K). The MS/MS spectrum of this peptide (data not shown) validated the second sequence. The known behaviour of trypsin toward this type of AA pattern (XKKZ, where X and Z are any of the usual AA, except proline for Z) also supports this sequence, since cleavage occurs preferentially at XKK-Z rather than at XK-KZ (Halfon & Craik, 1998; Keil, 1992). For the second peptide, with an experimental Mr of 1271.663 Da, the two possible AA sequences were completely different: HLAESHANKHK and LFTGHPETLEK. The theoretical Mr of the candidates were 1271.660 and 1271.663 Da respectively, so the experimental mass error for each peptide was lower than 5 ppm. An MS/MS spectrum was acquired from this peak and is shown in Figure 2B. Fragments of the first proposed sequence do not match any major peak visible in the spectrum. On the other hand, there was a clear match of the bn and yn ion series for n = 6-10 of the second peptide sequence. It was also possible to attribute a few other fragments to this second sequence, such as internal fragments mostly near the proline residue (GH, HP, PE, PET…), immonium ions (H, L, P, F) and x3 and a2 ions.
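The discrimination described above can be illustrated with a short calculation. The sketch below is not the software used in this work: it assumes standard monoisotopic residue masses and singly protonated ions, and the helper names (mh_plus, b_ion, y_ion) are purely illustrative. It shows that the two candidate sequences differ by only about 2 ppm as intact peptides, whereas their fragment ions are easy to tell apart.

```python
# Monoisotopic residue masses (Da); standard values, not taken from the text.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "T": 101.04768, "L": 113.08406, "N": 114.04293, "K": 128.09496,
    "E": 129.04259, "H": 137.05891, "F": 147.06841,
}
WATER = 18.01056
PROTON = 1.00728

def mh_plus(seq):
    """m/z of the singly protonated intact peptide."""
    return sum(RESIDUE[a] for a in seq) + WATER + PROTON

def b_ion(seq, n):
    """m/z of the singly charged b ion made of the first n residues."""
    return sum(RESIDUE[a] for a in seq[:n]) + PROTON

def y_ion(seq, n):
    """m/z of the singly charged y ion made of the last n residues."""
    return sum(RESIDUE[a] for a in seq[-n:]) + WATER + PROTON

first, second = "HLAESHANKHK", "LFTGHPETLEK"

# Precursor masses: too close for a PMF search to separate.
gap_ppm = (mh_plus(second) - mh_plus(first)) / mh_plus(first) * 1e6
print(f"{first}: {mh_plus(first):.3f}")
print(f"{second}: {mh_plus(second):.3f}")
print(f"difference: {gap_ppm:.1f} ppm")

# Fragment ions for n = 6-10: the two sequences give different ladders.
for n in range(6, 11):
    print(n,
          f"b={b_ion(second, n):.2f} y={y_ion(second, n):.2f}",
          f"(first candidate: b={b_ion(first, n):.2f} y={y_ion(first, n):.2f})")
```

With these assumptions, the computed b7 and y6 values of LFTGHPETLEK fall on the 782.38 and 716.41 peaks labeled in Figure 2B, well inside the 500-1500 ppm fragment tolerance used for the searches.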
Table 2: MS/MS fragment peak resolution obtained from the spectrum shown in Figure 3B.

Mr          Resolution (FWHM)
1507.84     4555
935.54      5149
581.38      3411
175.19      3951
Average     4267
In the second example of using MS/MS to confirm the identification proposed by PMF, protein spot 2 was identified (Figure 3A) as Bovine Creatine Kinase (SWISS-PROT entry: Q9XSC6). As in the previous example, one of the observed accurate masses could be related to two different peptide sequences: GGDDLDPNYVLSSR and LSVEALNSLTGEFK. These peptides correspond to theoretical Mr of 1507.702 and 1507.800 Da respectively, representing errors of -45 and +20 ppm. The MS/MS spectrum clearly validated the presence of the C-terminus lysine peptide (Figure 3B). However, a few very intense peaks, such as 935.48 Da, cannot be matched to that sequence but can be matched to the C-terminus arginine peptide. Moreover, Table 1 shows the calculated resolutions for the ten most intense peaks of the spectrum shown in Figure 3A. The average resolution was 14000. For the 1507.77 Da peptide, the resolution of the peak in the PMF was only 8700, which could be explained by the peak being a convolution of two peptides whose Mr differ by 0.1 Da.
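The ppm errors and the broadened peak can be related with a small calculation. This is a rough sketch under two assumptions that are not in the text: the error is taken as theoretical minus observed mass (the sign convention that reproduces the quoted values), and two peaks are considered separable only when the resolving power exceeds m/Δm.

```python
def ppm_error(theoretical, observed):
    """Signed mass error in ppm (theoretical minus observed)."""
    return (theoretical - observed) / observed * 1e6

OBSERVED = 1507.77
THEORETICAL = {"GGDDLDPNYVLSSR": 1507.702, "LSVEALNSLTGEFK": 1507.800}

errors = {seq: ppm_error(m, OBSERVED) for seq, m in THEORETICAL.items()}
for seq, err in errors.items():
    print(f"{seq}: {err:+.0f} ppm")

# Resolving power needed to separate the two peptides, taken as m / delta-m.
m1, m2 = THEORETICAL.values()
required_resolution = ((m1 + m2) / 2) / abs(m2 - m1)
print(f"required resolution: about {required_resolution:.0f}")
```

The required value of about 15400 is above the average resolution of roughly 14000 measured for the other peaks, which is consistent with the two peptides merging into a single, broader peak.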
Figure 4: MS-TAG output search results conducted using the MS/MS spectrum of the 1507.8 Da peak from the bovine Creatine kinase protein shown in Figure 3B. The result clearly identifies Creatine kinase protein and moreover, both peptides matched with the submitted mass values.
In the MS/MS spectrum, the fragments obtained from the two peptides were substantially different in terms of number, intensity and resolution (Table 2). In fact, P. Juhasz from Applied Biosystems (Framingham, MA) has previously observed this type of result, in which C-terminus lysine peptides give a less biased distribution of fragments in MS/MS spectra than C-terminus arginine peptides (personal communication). As the above discussion has shown, MALDI-TOF/TOF-MS can be used for peptide sequence discrimination efficiently and rapidly. Such spectra can not only be used directly to verify the sequence of a peptide but can also be used for protein identification with tools more generally applied to ESI-MS/MS spectra. As an example, Figure 4 shows the results obtained for the data extracted from the MS/MS spectrum shown in Figure 3B. That spectrum provided identification of the target protein and validation of both peptide sequences. Such a protein identification search was also conducted with success using the MS-Tag tool without enzyme specification (i.e., searching for unspecific cleavages). Hence, protein digest analyses for protein identification can be done using both conventional PMF techniques and MS/MS data obtained from the TOF/TOF workstation.
3.2. De Novo sequencing
Protein identification techniques using either PMF or MS/MS data are still limited by the availability of the protein in the databases. If the protein cannot be identified by such techniques, de novo sequencing becomes a valuable tool through which a short sequence tag can be identified. This technique is described in the literature for ESI-MS/MS spectra (Dancik et al., 1999; W. Zhang & Chait, 2000).
However, it was more difficult to apply this technique to MALDI-PSD-MS spectra, owing to the large amount of information included in such spectra and/or the lack of mass accuracy, without chemical modification of the peptides, including reversed-phase HPLC purification (Keough, Lacey, & Youngquist, 2000). The PMF of the digest products obtained from the third protein (Spot 3, Figure 1) allowed the identification of the ATP synthase D chain (SWISS-PROT entry: P13620) (Figure 5A). However, the five most intense peaks of the spectrum cannot be matched to this protein, or to any other contained in the databases. MS/MS data were acquired as a way to identify the protein source(s) of these peptides. The MS/MS spectrum of the 1548.80 Da peptide is shown in Figure 5B. Direct submission of the MS/MS data for protein identification against all species contained in the database, with no enzyme restriction, did not result in the identification of the protein.
[Spectra: % Intensity versus Mass (m/z); panel B is annotated with the residues deduced by de novo sequencing and a 218 Da gap.]
Figure 5: A) MALDI-MS spectrum of the tryptic digest of spot 3 (Figure 1). Labeled peaks correspond to the peptides matched to Bovine ATP synthase D chain (SWISS-PROT entry: P13620). "T" labeled peaks correspond to autolysis products of the trypsin. The five underlined peak masses correspond to the five most intense peaks unmatched against the target protein. B) MS/MS spectrum of the 1548.8 Da peptide. From these data, a de novo sequence was deduced: T(I/L)DXnVA(F/Mox)GE(I/L)(I/L)PR. It contains an unknown sequence of AA or PTM, labeled Xn, corresponding to a [delta]Mr of 218 Da.
Subsequently, de novo sequencing determined the nearly complete sequence pattern as [T(I/L)DXnVA(F/Mox)GE(I/L)(I/L)PR]. Only one unknown AA sequence, corresponding to 218 Da, remained unsolved and was annotated as Xn. This pattern was submitted for a sequence-only search allowing 1 to 3 residues for the Xn sequence. Only one peptide matching this pattern was returned, belonging to the
ATP synthase D chain. Its sequence, (K)TIDWVAFGEIIPR(N), corresponds to a theoretical Mr of 1516.83 Da. The unknown Xn sequence should in that case correspond to a single tryptophan residue carrying a +32 Da modification. That mass difference can be attributed to a double oxidation of this residue and will be discussed in the next section. Nevertheless, this example, and others that are not discussed in this article, clearly show that de novo sequencing can be routinely performed given the quality of the MS/MS spectra provided by the TOF/TOF and, as has already been demonstrated with ESI-MS/MS data, can be useful for identifying unknown proteins.
3.3. Tryptophan oxidation
AA residue oxidation is described in the literature; however, this type of modification is mostly known for methionine, which is easily oxidized during various chemical treatments, such as cleavage with cyanogen bromide. Although mono-oxidation of tryptophan during such procedures has been described, double oxidation is mentioned less often. As an example, “doubly oxidized tryptophan”, when used as a keyword in a Medline search, returns only two articles (Swiderek, Davis, & Lee, 1998; Thiede, Otto, Zimny, Muller, & Jungblut, 1996). One of these articles only mentions the possible advantage of using the doubly oxidized modification for protein identification in PMF database tools (Thiede et al., 1996). The second, using synthetic peptides as models for methionine and tryptophan oxidation, showed that oxidized tryptophan is silent in terms of the creation of unique CID fragments, such as the loss of 64 Da from methionine (Swiderek et al., 1998). Nevertheless, residues of 202 and 218 Da, corresponding respectively to β-(3-oxindolyl)alanine and N-formylkynurenine, are characteristic of mono- and doubly oxidized tryptophan and could help during de novo sequencing.
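The sequence-only search with a variable Xn gap can be pictured as a pattern match. The sketch below is only an illustration, not the MS-Pattern tool used in this work: the regular expression and the two decoy strings are hypothetical, and the (F/Mox) position is written as F or M because oxidized methionine and phenylalanine have nearly the same residue mass.

```python
import re

# Monoisotopic masses (Da); standard values, not taken from the text.
TRP = 186.07931      # tryptophan residue
OXYGEN = 15.99491    # oxygen atom

# De novo tag from the text, with Xn allowed to span 1 to 3 residues.
TAG = re.compile(r"T[IL]D(?P<xn>[A-Z]{1,3})VA[FM]GE[IL][IL]PR")

# Toy search space: the first entry is the flanked peptide quoted in the text,
# the other two are hypothetical decoys that must not match.
CANDIDATES = ["KTIDWVAFGEIIPRN", "KTIDVAFGEIIPRN", "KTLDGGGGVAMGELLPKN"]

hits = [(seq, m.group("xn")) for seq in CANDIDATES if (m := TAG.search(seq))]
print(hits)

# The 218 Da gap is explained by one tryptophan plus two oxygen atoms.
doubly_oxidized_trp = TRP + 2 * OXYGEN
print(f"W + 2 O = {doubly_oxidized_trp:.3f} Da")
```

Only the sequence from the ATP synthase D chain matches, with Xn reduced to a single W, and the mass of tryptophan plus two oxygen atoms accounts for the 218 Da gap.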
Other products resulting from tryptophan oxidation include kynurenine (Mr 190) and 3-hydroxykynurenine (Mr 206), which have already been identified by different analytical methods such as absorbance, fluorescence and mass spectrometry (Holt, Milligan, & Rivett, 1977; Finley, Dillon, Crouch, & Schey, 1998; Kotiaho, Eberlin, Vainiotalo, & Kostiainen, 2000). This residue is sensitive to oxidation either as a free AA or as part of a peptide chain. Figure 6 depicts an enlargement of the MS spectrum containing the peptide with the tryptophan residue. The peak at 1516.83 Da is attributed to the peptide of ATP synthase D containing the native tryptophan. De novo sequencing demonstrated that the peak at 1548.82 Da, corresponding to an increment of 32 Da, is the result of a “double oxidation” of tryptophan, for which the N-formylkynurenine structure can be proposed (Figure 6). Several minor peaks (1520.82, 1532.83, 1536.82, 1564.81) might also be attributed to other oxidation products of the tryptophan residue, such as kynurenine, β-(3-oxindolyl)alanine, 3-hydroxykynurenine and hydroxy-N-formylkynurenine.
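The attribution of the minor peaks can be sketched as a lookup of mass shifts relative to the unmodified peptide. The elemental shifts below (gain of oxygen atoms, and loss of one carbon for the kynurenine forms) are assumptions based on the usual formulae of these products rather than values given in the text, and the 0.02 Da tolerance is arbitrary.

```python
OXYGEN = 15.99491
CARBON = 12.00000

# Assumed mass shifts (Da) relative to unmodified tryptophan.
SHIFTS = {
    "kynurenine": OXYGEN - CARBON,
    "hydroxytryptophan / oxindolylalanine": OXYGEN,
    "3-hydroxykynurenine": 2 * OXYGEN - CARBON,
    "N-formylkynurenine": 2 * OXYGEN,
    "hydroxy-N-formylkynurenine": 3 * OXYGEN,
}

NATIVE = 1516.83  # peptide with unmodified tryptophan (Figure 6)

def assign(peak, tolerance=0.02):
    """Return the oxidation product whose shift matches the peak, or None."""
    delta = peak - NATIVE
    for name, shift in SHIFTS.items():
        if abs(delta - shift) <= tolerance:
            return name
    return None

for peak in (1520.82, 1529.83, 1532.83, 1536.82, 1548.82, 1564.81):
    print(f"{peak:.2f}  {peak - NATIVE:+6.2f}  {assign(peak)}")
```

Under these assumptions every peak except 1529.83 receives an assignment, which matches the description of that peak as an unknown by-product.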
The peak arising from the doubly oxidized form shows the highest intensity among the tryptophan derivatives, in contrast to the work presented by Thiede et al. (Thiede et al., 2000), where tryptophan-containing peptides were partly mono-oxidized and, to a smaller extent, doubly oxidized.
[Spectrum: % Intensity versus Mass (m/z), 1513.0 to 1570.0, annotated with the proposed structures of the tryptophan derivatives.]
Figure 6: Proposed tryptophan derivatives obtained from the 1516.83 m/z peak, which corresponds to the peptide containing the unmodified tryptophan residue. 1520.82 m/z could correspond to kynurenine, 1529.83 m/z to an unknown by-product found in all oxidized tryptophan patterns, 1532.83 m/z to hydroxytryptophan, 1536.82 m/z to 3-hydroxykynurenine, 1548.82 m/z to N-formylkynurenine and 1564.81 m/z to hydroxy-N-formylkynurenine.
The four other intense peaks also corresponded to ATP synthase D chain peptides containing the modified tryptophan residue, (Figure 5A) as confirmed by their fragmentation spectra (data not shown). In the case of arginine C-terminus peptides that generally provide a limited amount of fragment ions, the doubly oxidized peptides exhibit a MS signal of very high intensity, and a MS/MS spectrum with nearly a complete ion series that could be used for de novo sequencing. Therefore, oxidation of tryptophan residues could be considered as a valuable chemical modification if specified in PMF and MS/MS tools to improve protein identification.
4. CONCLUSION
These results clearly show that the MALDI-TOF/TOF mass spectrometer is able to produce highly valuable data. Such information was used successfully to identify proteins both from the PMF and directly from the MS/MS spectra using standard protein identification tools. Moreover, we were able to extract de novo peptide sequences from MS/MS spectra, as is usually done with ESI-MS/MS data. These data also show the large potential of tryptophan oxidation as a tool for protein identification using both the PMF and MS/MS techniques. In the first case, it seems that peptides containing oxidized tryptophan have a higher average ionization efficiency than those containing the non-oxidized residue, an observation that could both improve the limit of detection and increase the percentage of recovery. In the second case, these peptides gave nearly complete ion series from which de novo sequences could be efficiently created for peptides with either a C-terminal arginine or a C-terminal lysine.

5. ACKNOWLEDGEMENTS

This work was supported by the Swiss National Fund for Scientific Research (grant 31-59095.99). The authors acknowledge Dr. Peter Juhasz, Irene Fasso, Salvo Paesano, Veronique Converset and Alexander Scherl for their technical support.

6. REFERENCES

Acharya, A., Maanjula, B., Murthy, G., & Vithayathil, P. (1977). Int. J. Peptide Protein Res., 9, 213-219.
Adams, M., Celniker, S., Holt, R., Venter, J., et al. (2000). Science, 287, 2185-2191.
Axelsson, J., Naven, T., & Fenyo, D. (2001, 4-6 April). Paper presented at From biology to pathology: the proteomics perspective, York, UK.
Bairoch, A., & Apweiler, R. (2000). The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45-48.
Bartlet-Jones, M., Jeffrey, W., Hansen, H., & Pappin, D. (1994). Peptide ladder sequencing by MS using a novel volatile degradation reagent. Rapid Commun. Mass Spectrom., 8, 737-742.
Bienvenut, W., Hoogland, C., Greco, A., Heller, M., Gasteiger, E., Appel, R., et al. (2002). Hydrogen/deuterium exchange for higher specificity of protein identification by peptide mass fingerprinting. Rapid Commun. Mass Spectrom., 16(6), 616-626.
Bienvenut, W., Sanchez, J., Karmime, A., Rouge, V., Rose, K., Binz, P., et al. (1999). Toward a clinical molecular scanner for proteome research: parallel protein chemical processing before and during western blot. Anal. Chem., 71(21), 4800-4807.
Blattner, F., Plunkett, G., III, Bloch, C., Perna, N., Burland, V., Riley, M., et al. (1997). Science, 277, 1453-1474.
Brown, R., & Lennon, J. (1995). Mass resolution improvement by incorporation of pulsed ion extraction in a matrix-assisted laser desorption/ionisation linear time-of-flight mass spectrometer. Anal. Chem., 67, 1988-2003.
Chaurand, P., Luetzenkirchen, F., & Spengler, B. (1999). Peptide and protein identification by MALDI-PSD TOF-MS. J. Am. Soc. Mass Spectrom., 10, 91-103.
Clauser, K., Baker, P., & Burlingame, A. (1999). Anal. Chem., 71, 2076-2084.
Engen, J., & Smith, D. (2001). Investigating protein structure and dynamics by hydrogen exchange MS. Anal. Chem., 73, 256A-265A.
Englander, J., Rogero, J., & Englander, S. (1985). Anal. Biochem., 147, 234-.
Englander, S., & Kallenbach, N. (1984). Q. Rev. Biophys., 16, 521.
Falick, A., & Maltby, D. (1989). Anal. Biochem., 182, 165-169.
Figueroa, I., Torres, O., & Russell, D. (1998). Effects of the water content in the sample preparation for MALDI on the mass spectra. Anal. Chem., 70, 4527-4533.
Fleischmann, R., Adams, M., White, O., Clayton, R., Kirkness, E., Kerlavage, A., et al. (1995). Science, 269, 496-512.
Fraenkel-Conrat, H., & Olcott, H. (1945). J. Biol. Chem., 161, 259-268.
Greco, A., Bienvenut, W., Sanchez, J., Kindbeiter, K., Hochstrasser, D., Madjar, J., et al. (2001). Identification of ribosome-associated viral and cellular basic proteins during the course of infection with herpes simplex virus type 1. Proteomics, 1(4), 545-549.
Gygi, S., Rist, B., Gerber, S., Turecek, F., Gelb, M., & Aebersold, R. (1999). Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nature Biotechnol., 17, 994-999.
Henzel, W. J., Billeci, T. M., Stults, J. T., Wong, S. C., Grimley, C., & Watanabe, C. (1993). Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases. Proceedings of the National Academy of Sciences of the United States of America, 90(11), 5011-5015.
Hunt, D., Yates, J., Shabanowitz, J., Winston, S., & Hauer, C. (1986). Proc. Natl. Acad. Sci. USA, 83, 6233-6237.
International Human Genome Sequencing Consortium. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921.
James, P., Quadroni, M., Carafoli, E., & Gonnet, G. (1993). Protein identification by mass profile fingerprinting. Biochem. Biophys. Res. Commun., 195(1), 58-64.
James, P., Quadroni, M., Carafoli, E., & Gonnet, G. (1994). Protein identification in DNA databases by peptide mass fingerprinting. Protein Sci., 3(8), 1347-1350.
Jensen, O. N., Vorm, O., & Mann, M. (1996). Sequence patterns produced by incomplete enzymatic digestion or one-step Edman degradation of peptide mixtures as probes for protein database searches. Electrophoresis, 17(5), 938-944.
Katta, V., & Chait, B. (1993). J. Am. Chem. Soc., 115, 6317-6321.
Kaufmann, R., Spengler, B., & Lutzenkirchen, F. (1993). Mass spectrometric sequencing of linear peptides by product-ion analysis in a reflectron time-of-flight mass spectrometer using matrix-assisted laser desorption/ionisation. Rapid Commun. Mass Spectrom., 7, 902-910.
Kraus, M., Janck, K., Bienert, M., & Krause, E. (2000). Characterisation of intermolecular β-sheet peptides by mass spectrometry and hydrogen isotope exchange. Rapid Commun. Mass Spectrom., 14, 1094-1104.
Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature, 227(259), 680-685.
Lahn, H. W., & Langen, H. (2000). Mass spectrometry: a tool for the identification of proteins separated by gels. Electrophoresis, 21, 2105-2114.
Mann, M., Höjrup, P., & Roepstorff, P. (1993). Biol. Mass Spectrom., 22, 338.
McMurry, J. (1988). Organic chemistry (2nd ed.). Belmont, CA: Brooks/Cole Publishing Company.
Ng, W., Kennedy, S., Mahairas, G., Berquist, B., Pan, M., Shukla, H., et al. (2000). Proc. Natl. Acad. Sci. USA, 97, 12176-12181.
Nutkins, J., & Williams, D. (1989). Eur. J. Biochem.
Ohguro, H., Palczewski, K., Walsh, K., & Johnson, R. (1994). Protein Sci., 3, 2428-2434.
Pappin, D., Hojrup, P., & Bleasby, A. (1993). Rapid identification of proteins by peptide mass fingerprint. Curr. Biol., 3(6), 327-332.
Patterson, S., Thomas, D., & Bradshaw, R. (1996). Application of combined mass spectrometry and partial amino acid sequence to the identification of gel separated proteins. Electrophoresis, 17, 877-891.
Rosa, J., & Richards, J. (1979). J. Mol. Biol., 133, 399-.
Sepetov, N. F., Issakova, O. L., Lebl, M., Swiderek, K., Stahl, D. C., & Lee, T. D. (1993). The use of hydrogen-deuterium exchange to facilitate peptide sequencing by electrospray tandem mass spectrometry. Rapid Commun. Mass Spectrom., 7, 58-62.
Shevchenko, A., Jensen, O. N., Podtelejnikov, A. V., Sagliocco, F., Wilm, M., Vorm, O., et al. (1996). Linking genome and proteome by mass spectrometry: large-scale identification of yeast proteins from two dimensional gels. Proc. Natl. Acad. Sci. USA, 93(25), 14440-14445.
Spengler, B., Lutzenkirchen, F., & Kaufmann, R. (1993). On-target deuteration for peptide sequencing by laser mass spectrometry. Org. Mass Spectrom., 28, 1482-1490.
The Arabidopsis Initiative. (2000). Nature, 408, 796-815.
The C elegans Sequencing Consortium. (1998). Science, 282, 2012-2018.
Vestal, M. L., Juhasz, P., & Martin, S. A. (1995). Delayed extraction matrix-assisted laser desorption time-of-flight mass spectrometry. Rapid Commun. Mass Spectrom., 9, 1044-1050.
Wang, F., & Tang, X. (1996). Biochemistry, 35, 4069-4078.
Wheeler, D., Church, D., Lash, A., Leipe, D., Madden, T., Pontius, J., et al. (2001). Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 29, 11-16.
Whittal, R., & Li, L. (1995). Anal. Chem., 67, 1950-1954.
Wilcox, P. (1967). Esterification. Meth. Enzym., 11, 605-616.
Yates, J. R., III, Speicher, S., Griffin, P. R., & Hunkapiller, T. (1993). Peptide mass maps: A highly informative approach to protein identification. Anal. Biochem., 214, 397-408.
Zhang, W., & Chait, B. (2000). Anal. Chem., 72, 2482-2489.
CHAPTER 8 PROTEOMICS AND MASS SPECTROMETRY: Some aspects and recent developments
WV. Bienvenut, M. Müller, P.M. Palagi, E. Gasteiger, M. Heller, E. Jung, M. Giron, R. Gras, S. Gay, P-A. Binz, GJ. Hughes, J-C. Sanchez, RD. Appel, DF. Hochstrasser
1. INTRODUCTION TO PROTEOMICS

For several decades, DNA sequencing has progressed dramatically. Genomes from several bacteria, yeast and drosophila have been completely sequenced. Furthermore, the sequencing of the human genome is completed. In parallel, numerous genomic tools have been developed in order to study biological processes and explain physio-pathological findings in molecular terms. Indeed, the biological function of each gene should be understood. Therefore, after the stages of genome sequencing and gene discovery, attention must be focused on gene expression and the functions of the proteins they encode. DNA chip technology allows the simultaneous analysis of the expression of thousands of genes at the mRNA level and can unravel some biological processes. However, as previously demonstrated (Anderson & Seilhamer, 1997; Link, Tempel, & Hund, 1992), the correlation between the expression of mRNA and protein is low. In addition, many protein functions are related to their post-translational modifications, such as phosphorylation or glycosylation, and not to their level of expression. Consequently, large-scale studies of proteins or proteomes will be needed to complement genomic studies to better understand life processes.

The word proteome was proposed by Marc Wilkins (Williams & Hochstrasser, 1997) to depict the PROTEin complement of a genOME. There are numerous proteomes for a single genome and proteomes are much more complex than genomes. Proteomics is the science which deals with the high throughput analysis of proteins, and this includes their identification, the measure of their level of expression and their partial characterisation. Thus, proteomics should complement genomics. Proteomics relies on efficient protein separation techniques, mass spectrometry, bioinformatics as well as gene and protein databases.
One of the most powerful protein separation techniques is two-dimensional polyacrylamide gel electrophoresis (2-D PAGE or 2-DE gel), independently developed by Klose (Klose, 1975) and O'Farrell (O'Farrell, 1975). It has been further refined by the Andersons, who proposed in 1975 the concept of a human protein index (Anderson, Giometti, Gemmell, Nance, & Anderson, 1982; J. Taylor, Anderson, Scandora, Willard, & Anderson, 1982). It was certainly one of the early milestones of proteomics. Today, despite several drawbacks, 2-D PAGE is still very useful to analyse and display simultaneously thousands of proteins separated by charge and apparent size. In this chapter, new developments combining 2-D PAGE and mass spectrometry will be described. These include the parallel chemical processing of proteins and the extensive use of bioinformatics tools and protein databases.

W. V. Bienvenut (ed.), Acceleration and Improvement of Protein Identification by Mass Spectrometry, 225–281. © 2005 Springer. Printed in the Netherlands.

2. PROTEIN BIOCHEMICAL AND CHEMICAL PROCESSING FOLLOWED BY MASS SPECTROMETRIC ANALYSIS

Several years ago, identification of a single protein and its subsequent characterisation was a challenge for (bio-)chemists (Schwert & Takenaka, 1955). Historically, protein identification and characterisation was mainly conducted using the Edman degradation (Edman & Begg, 1967) in order to determine the primary sequence of a protein. Later on, Edman degradation was the method of choice to determine N-terminal or internal sequences which could be used to define an oligonucleotide probe specific for the mRNA encoding the protein in question. As Edman degradation provides a very accurate tool to determine a primary sequence, it is still in use. However, there are two major drawbacks. First, only 40 to 50 amino acids can be identified per day under normal conditions (Hughes G, unpublished results). Secondly, the N-terminal amino acid must be free of certain post-translational modifications, e.g. pyroglutamate formation or acylation, to be available for Edman degradation.
These days, this technique finds its application mostly in the characterisation of small proteins or peptides (Kollisch et al., 2000), in the quality control of recombinant proteins, in the determination of phosphorylation sites (Lehr et al., 2000) or in the discrimination of amino acid pairs that cannot be resolved by mass spectrometry because of identical or nearly identical masses, i.e. leucine/isoleucine, lysine/glutamine and phenylalanine/methionine sulfoxide (Ramsay, Steinborner, Waugh, Dua, & Bowie, 1995). To overcome the low throughput characteristic of Edman degradation, amino acid composition analysis was implemented in the protein identification scheme. This method is based on the chromatographic analysis of free amino acids obtained after acid hydrolysis (Blankenship, Krivanek, Ackermann, & Cardin, 1989; Einarsson, Josefsson, & Lagerkvist, 1983; Haynes, Sheumack, Greig, Kibby, & Redmond, 1991). Amino acid analysis can achieve high throughput protein identification, but at the cost of a lower confidence of identification (Golaz et al., 1996; Wilkins, Pasquali et al., 1996; Yan et al., 1996). A combination of amino acid analysis and Edman degradation, limited to 3-5 cycles to obtain a short sequence tag, was used to increase the confidence in protein identification (Gooley et al., 1997; Wilkins, Gasteiger, Sanchez, Appel, & Hochstrasser, 1996; M. R. Wilkins, Ou et al., 1996). At present, with the human genome nearly sequenced, whole proteome analysis presents new challenges for the identification and characterisation of the actual gene
products, i.e. the proteins. While the genome represents a more or less unique set of data, the proteome is far more diverse, as not all proteins are expressed at the same time and in the same tissues. Such a huge project, probably involving the identification and characterisation of close to 1 million gene products, is simply not feasible by Edman sequencing. This task needs accurate, reliable and rapid identification methods. Two-dimensional gel electrophoresis (2-DE), a biochemical technique used to separate proteins according to their molecular weight and isoelectric point, emerged in the mid-1970s as a revolutionary procedure in protein analysis (Klose, 1975; O'Farrell, 1975). Many of its technical problems had to be resolved and it was only in the last 10 years that 2-DE gels have proven their capacity (Bjellqvist et al., 1982). This revival was mainly due to the combination of improved 2-DE techniques with mass spectrometer instrumentation, better computing and software tools, and the emergence of large protein sequence databases from genome-sequencing projects. Peptide mass fingerprinting (PMF), which involves the analysis of peptides obtained after specific proteolytic digestion of polypeptides, has shown its efficacy for protein identification (Henzel et al., 1993; James et al., 1993; Mann et al., 1993; Pappin et al., 1993; Yates, III et al., 1993). With the recent development of mass spectrometers (MS) such as Matrix Assisted Laser Desorption/Ionisation MS (MALDI-MS) (Karas & Hillenkamp, 1988) and Electro-Spray Ionisation MS (ESI-MS) (Aleksandrov et al., 1984; Yamashita & Fenn, 1984), bio-polymers can be efficiently measured and rapidly identified by database searches.

2.1. 2-DE gel protein separation

A critical problem in the post-genome era is the capacity of current techniques to perform large-scale separation of complex protein mixtures.
A number of technologies have been investigated or are under development, such as capillary and gel electrophoresis, micro-channel networks (Rossier et al., 1999) and liquid chromatography (LC). LC separation is starting to be widely used in proteomics. Link et al. have shown that multi-dimensional liquid chromatography coupled to a tandem mass spectrometer can analyse protein complexes (Link et al., 1999). They applied their approach to the yeast 80S ribosome and identified ~100 proteins in a single run. Oda et al. (Oda, Huang, Cross, Cowburn, & Chait, 1999) have described an MS-based method for simultaneous identification and quantitation of 2-DE separated proteins. Changes in post-translational modifications at specific sites on proteins were also determined. Accurate quantitation has been achieved by the use of whole-cell stable isotope labelling. Gygi et al. (Gygi et al., 1999) have also published a method to quantify protein expression by using a new class of chemical reagents called isotope-coded affinity tags (ICATs) and tandem mass spectrometry (MS/MS). This ICAT technology provides a means to quantitatively compare global protein expression in semi-complex protein mixtures. However, it is still generally agreed that two-dimensional gel electrophoresis remains unrivalled for its capacity to resolve several thousand polypeptides and to detect differentially expressed proteins.

Table 1. List of available high throughput protein identification systems

Company: Pharmacia Biotech
Internet URL address: http://www.apbiotech.com
Robot for sample preparation: Under development
MS machinery: Ettan design LC-MS system (API-MS); Ettan design MALDI-TOF
Data handling system and related software: ImageMaster 2D Elite and Database; partnership with Cimarron Software Inc.

Company: Bruker Daltonics
Internet URL address: http://www.bruker.com
Robot for sample preparation: MAPII
MS machinery: esquire3000, BiflexIII or ProflexIII MS
Data handling system and related software: AutoXecute and MS Biotools supporting MASCOT

Company: Micromass
Internet URL address: http://www.micromass.co.uk
Robot for sample preparation: MassPREP™ Station
MS machinery: TOF Spec-2E MALDI-TOF MS or Q-TOF MS
Data handling system and related software: MassLynx - ProteinLynx

Company: Bio-Rad
Internet URL address: http://www.bio-rad.com
Robot for sample preparation: PROTEAN 2-D Spot Cutter
MS machinery: -
Data handling system and related software: Proteome Works System including PDQuest or Melanie II

Company: Genomic Solutions
Internet URL address: http://www.genomicsolutions.com
Robot for sample preparation: Flexys Proteomics Robot, ProGest Protein Digestion Station and ProMS MALDI Prep Station
MS machinery: -
Data handling system and related software: Investigator HT Analyzer Software

Company: PE Corp
Internet URL address: http://www.pecorporation.com, http://www.appliedbiosystems.com
Robot for sample preparation: Symbiot workstation
MS machinery: Voyager STR MALDI-TOF MS, the Voyager DE-PRO or the Mariner API-TOF LC-MS
Data handling system and related software: Protein Solution 1, SQL-LIMS, ProteinKeeper (under development) and Protein Prospector

Company: Protana
Internet URL address: http://www.protana.com
Robot for sample preparation: Home made: robot for excision of gel spots; in-gel digestion system
MS machinery: Q*STAR, supplied by PE Sciex; REFLEX III, supplied by Bruker Daltonics
Data handling system and related software: PPSS 2.2 (contains PepSea, ProteomeDB, Inspector, SoftSpot, PepSea FlowAgent)

Company: ThermoQuest
Internet URL address: http://www.thermoquest.com
Robot for sample preparation: -
MS machinery: Surveyor™ LC System, supporting Finnigan LCQ™DUO, LCQ DECA and TSQ®
Data handling system and related software: Xcalibur, TurboSEQUEST

In 1970, Kenrick and Margolis published the first two-dimensional protein separation technique using native isoelectric focusing (IEF) and gradient gel electrophoresis (Kenrick & Margolis, 1970). However, the main tool used today to display and evaluate the proteome complexity of any organism is the denaturing 2-DE independently developed by O'Farrell, Klose and Scheele in 1975 (Klose, 1975; O'Farrell, 1975; Scheele, 1975). Many thousand polypeptides can be separated on the basis of different molecular properties in each of the two dimensions: charge (pI) in the first dimension and molecular mass (Mr) in the second. IEF is the electrophoretic technique in which the proteins are fractionated according to their isoelectric point (pI) through an immobilised pH gradient (IPG). This is achieved by using a set of weak acids and bases named Immobilines™. When the electric field is applied, only the sample molecules and any non-grafted ions migrate. Upon termination of electrophoresis, the proteins are separated into stationary isoelectric zones (Bjellqvist et al., 1982). The majority of the laboratories running 2-DE are currently using pH 3.5-10 or 4-7 IPG strips to display a wide range of proteins (Gorg et al., 1988). However, the use of narrow range IPGs spanning 1 pH unit permits higher protein loading and the investigation of a smaller, but more detailed, "window" of the proteome. The work of Tonella et al. (Tonella et al., 1998) demonstrated that 1 pH unit range gels, with their high loading capacity (4 mg of proteins loaded), can display 85% of the E. coli proteome in the pI range from 5.09 to 6.09. After the first dimensional separation, the IPG strips are transferred onto classical vertical or horizontal SDS slab gels, first published by Laemmli (Laemmli, 1970), to dissociate all proteins into their individual polypeptide chains.
Thus, in addition to the analysis of the polypeptide composition of a sample, the investigator can also determine their apparent molecular mass.

Today, silver staining (Rabilloud, 1990) is probably the most popular non-radioactive protein detection method, as it is more sensitive than Coomassie Brilliant Blue (Neuhoff, Arold, Taube, & Ehrhardt, 1988) or reverse staining (Fernandez-Patron, Castellanos-Serra, & Rodriguez, 1992). However, other procedures including fluorescent staining (Patton, 2000) and 35S or 32P radiolabelled samples can also be used (Johnston, Pickett, & Barker, 1990). Two-dimensional polyacrylamide gel patterns can be digitised and analysed on a computer to allow quantitative image analysis and automatic gel comparison by querying specialised protein databases. Several image analysis programs have been developed since 1975. The Swiss Institute of Bioinformatics (SIB) has developed and commercialised Melanie 3 (GeneBio S.A., Geneva), which was designed with user friendliness in mind. It allows the study of similarities and differences between sets of 2-DE images obtained from biological samples under different conditions (e.g. healthy vs. diseased or drug treated vs. non-treated samples) and could further help to find diagnostic, prognostic and therapeutic molecular markers in specific diseases or states (Appel et al., 1991).

2.2. Protein identification using peptide mass fingerprinting and robots

Thanks to the improvements in mass spectrometric technologies, MALDI-MS analysis has become a powerful tool for protein identification by the easy and quick
PMF technique (Henzel et al., 1993; James et al., 1993; Mann et al., 1993; Pappin et al., 1993; Yates, III et al., 1993). However, spot excision, protein digestion, peptide extraction and MALDI-MS sample preparation are a major bottleneck in the high throughput protein identification and characterisation process. A few groups proposed robotic and/or computational approaches to increase sample throughput, i.e. excision robots to cut out protein spots from 2-DE gels (Traini et al., 1998a; Walsh et al., 1998), liquid handling systems to automatically pipette solvents (Ashman et al., 1997; Houthaeve et al., 1997; Houthaeve et al., 1995) or computer programs to automate peak detection in mass spectra (Breen et al., 2000; Gras et al., 1999).

In the standard manual procedure, protein spots of interest are chosen from the gel image. They are first excised from the gel and deposited in tubes or wells of a microtiter plate. They are then individually subjected to an endoproteolytic cleavage step. At the end of this digestion, the sample is used for MS analysis. This manual procedure requires numerous repetitive manipulations from the operator, and these repetitive steps are error prone. A linear workflow includes the following steps:
1. Scan the gels;
2. Match the image with a master image or another image that will be used for comparison;
3. Choose the spots to be excised;
4. Cut out the spots, transfer them to vials (either tubes or microtiter plate wells);
5. Perform the required steps to digest the proteins in order to obtain peptide fragments that can be measured by MS;
6. Perform MS measurements;
7. Extract the MS raw data, treat it and submit it to a protein identification software for database comparison.

During the whole procedure, a large amount of information is generated for each sample.
This includes the description of both the gel and the chosen spot, the treatment of each spot, the vials in which they are processed, the MS files they generate and the identification results. There is a definite need for an automated process and this can be achieved using robots and laboratory management systems. Automation also decreases human work and as a consequence reduces possible human errors. A number of groups around the world are developing, optimising and integrating robotic systems to obtain what can be called a "robotised integrated proteomic solution". The ultimate goal is to set up a pipeline that includes the hardware and software components, e.g. a gel imaging system, a spot picker, a liquid handling system, a MALDI-plate loading system, a MALDI-MS instrument and a PMF identification tool. An example of what is currently available is the ARRM-BR214 spot cutter distributed by Bio-Rad (Traini et al., 1998a). This robot takes an image of the gel or the PVDF blotted membrane, excises the spots and deposits them in a 96-well microtiter plate. There are more integrated systems available to perform multiple tasks. An example is the MultiprobeII robot from Packard Instruments, which can perform a complete proteolytic digestion and load the peptide extract onto MALDI-MS sample plates. In addition to the control of their main tasks, these individual robots include other dedicated software components, such as an image analysis program that automatically selects and excises spots of interest. However, there is no fully automated system available on the market. To this end, a number of collaborations and partnerships are actively working on these developments. Depending on the required throughput, the overall data size to handle and the flexibility needed, laboratories might decide to choose one system or another, or to purchase only parts of a package. Table 1 is a non-exhaustive list of systems that are currently in various phases of development and commercialisation.

2.2.1. MALDI-MS analysis

Since the introduction of a new type of mass spectrometer (Karas et al., 1987; Tanaka et al., 1988) capable of analysing intact proteins and polymers up to 100,000 Da, great interest has been shown in techniques allowing the analysis of proteins. Basically, the MALDI-MS technique consists of mixing the analyte (organic polymers, proteins or peptides) with an excess of an organic compound (matrix) able to absorb the energy of a UV laser shot and to transfer this energy to the analyte for desorption and ionisation. Today, the preferred matrices are:
- α-cyano-4-hydroxycinnamic acid (ACCA) (Li et al., 1997) for low molecular weight peptides (800-4500 Da);
- Sinapinic acid for larger polypeptides (Nakanishi et al., 1994);
- Dihydroxybenzoic acid (DHBA) for glycopeptides and oligosaccharides (Okamoto, Takahashi, Doi, & Takimoto, 1997; Stemmler, Buchanan, Hurst, & Hettich, 1995; Stemmler, Hettich, Hurst, & Buchanan, 1993).

Karas and Hillenkamp (Karas et al., 1987) also used nicotinic acid as a matrix and, more recently, Gusev et al. (Gusev et al., 1995) proposed to improve the signal reproducibility by using fucose as a co-matrix with ferulic acid and DHBA.
To prevent the metastable fragmentation of glycoproteins, 3-hydroxypicolinic acid was proposed by Karas's group (Karas et al., 1995). Other uncommon matrices were proposed, such as 3-aminopicolinic acid (Stemmler et al., 1995; Taranenko et al., 1994), various trihydroxyacetophenone compounds (Zhu et al., 1996), or meso-tetrakis(pentafluorophenyl)porphyrin (Ayorinde, Hambright, Porter, & Keith, 1999). A more innovative proposal was made recently by Wei's group (Wei et al., 1999). Here, the MALDI-MS sample plate is both the sample support and the matrix. This surface is made of porous silicon chemically modified to achieve desorption and ionisation of the analyte directly from the surface without the intervention of an organic matrix.

During the first years of the MALDI-MS era, these spectrometers were mostly used to verify protein masses or to identify and characterise post-translational modifications (Nakanishi et al., 1994). In 1989, Henzel WJ, Stults JT and Watanabe C of Genentech Inc. presented the idea of protein identification using PMF in a poster at the Third Symposium of the Protein Society in Seattle. A second poster proposing this identification technique was shown in 1991 by Yates JR, Griffin PR, Hunkapiller T, Speicher S and Hood LE of Caltech during the Baltimore
Symposium. Despite these two attempts, no real development took place until 1993, when five groups (Henzel et al., 1993; James, Quadroni, Carafoli, & Gonnet, 1993; Mann, Hojrup, & Roepstorff, 1993; Pappin et al., 1993; Yates, III, Speicher, Griffin, & Hunkapiller, 1993) used this technique successfully to identify proteins. The comparison of in silico digested protein fragments with experimental masses became possible because of the development of computer programs (Clauser et al., 1995; Pappin et al., 1993) and computer facilities. At that time, only 4-5 peptides with an accuracy of 1-2 Da were generally enough to identify a protein, compared to 5-10 peptides with an accuracy better than 50 ppm at present. The success of this peptide analysis technique was due to:
- The production of singly charged ions;
- High sensitivity (far less material needed than for Edman degradation);
- A large mass range (from 500-600 Da up to a few hundred thousand Da);
- Short analysis time;
- Low sensitivity to salts and contaminants.

The major drawbacks were the low resolution, a few hundred full-width half-maximum (FWHM) for ions above 10 kDa, and the low accuracy of mass measurement. Ingendoh et al. (Ingendoh et al., 1994) listed a series of causative factors, such as a broad initial energy distribution and a spread in apparent generation time. They proposed modifications to improve peak resolution, such as reducing the ion energy distribution with an "ion reflectron" (Mamyrin et al., 1973). The ion reflectron is able to compensate for the flight time error due to the energy spread, which can reach up to 15%. A better peak resolution was also obtained by focusing the laser beam (< 10 µm) and by the use of a 10^9 Hz digital oscilloscope. At that time the peak resolution was a few thousand FWHM for ions above 3000 Da.
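The comparison of in silico digested fragments with experimental masses described above can be sketched as follows. This is an illustrative sketch only, not one of the cited programs: the residue masses are standard monoisotopic values, the cleavage rule (after K or R, not before P, no missed cleavages) is a common simplification of trypsin specificity, and the function names and the example sequence are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, not taken from the chapter).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def match_peaks(observed_mz, sequence, tol_ppm=50.0):
    """Return (observed m/z, peptide, ppm error) for every match within tol_ppm."""
    theoretical = [(p, mh_plus(p)) for p in tryptic_peptides(sequence)]
    hits = []
    for mz in observed_mz:
        for peptide, theo in theoretical:
            err = ppm_error(mz, theo)
            if abs(err) <= tol_ppm:
                hits.append((mz, peptide, err))
    return hits


# Hypothetical sequence, for illustration only.
protein = "MAGKWTPEERLSVDKRPAAFGHK"
peaks = [mh_plus(p) for p in tryptic_peptides(protein)]
for mz, peptide, err in match_peaks(peaks, protein):
    print(f"{mz:10.4f}  {peptide}  {err:+.1f} ppm")
```

The tolerance explains why the number of peptides required has changed: at 1500 Da, an accuracy of 1 Da corresponds to roughly 667 ppm, more than ten times looser than the 50 ppm window used here, so each individual match at that time was far less specific.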
Other groups (Brown & Lennon, 1995; Jensen, Podtelejnikov et al., 1996; Katta & Chait, 1993; Vestal et al., 1995; Whittal & Li, 1995) obtained an improvement of mass accuracy and peak resolution by the use of a delayed extraction system (Wiley & McLaren, 1953). This system uses a pulsed ion extraction MALDI ionisation technique, increasing the accelerating voltage from 0 up to 3 kV in 300 nanoseconds. This technique allowed an increase of peak resolution for cytochrome c (12 kDa) from 350 FWHM obtained in linear mode to 1024 with a continuous ion extraction. The major recent improvement in terms of mass accuracy and peak resolution is certainly the combination of the reflecting analyser with the delayed extraction mode (Takach et al., 1997). They were able to achieve a mass measurement accuracy of ±2 ppm on a 1000 Da peak with a resolution as high as 10,000 FWHM. Such an improvement in accuracy has obvious implications for the reliability of protein identification using PMF. Higher resolution can be obtained when MALDI-MS is coupled with Fourier Transform Ion Cyclotron Resonance (FTICR), but the major drawback of such a system is its expense. Nevertheless, even though existing instruments are able to provide a highly accurate mass determination, this technique is extremely dependent on matrices (Karas et al., 1995) and sample preparation (Chen et al., 1999). Chen et al. investigated the interaction between the surface on which the sample is loaded and the peptide. They
PROTEOMICS AND MASS SPECTROMETRY
suggested that the surface, the solvent and the technique used to prepare the sample for MALDI-MS analysis influence the ion signal. Cohen et al. (Cohen & Chait, 1996) and Figueroa et al. (Figueroa et al., 1998) investigated the influence of the solvent composition and the rate of matrix crystallisation. Cohen's group concluded that high molecular mass peptides must be prepared with a solvent that includes formic acid (solvent pH < 1.8) and with slow crystallisation. On the other hand, for low Mr peptides, better results are obtained with a trifluoroacetic acid/acetonitrile solvent (solvent pH ≈ 2) and fast solvent evaporation. Figueroa's group also showed the importance of the water concentration in the solvent. Homogeneous co-crystallisation of the matrix and peptides/proteins is the critical step in sample preparation. The physical characteristics of the peptides in a protein digest are widely distributed in terms of mass, pI and hydrophobicity. Kratzer et al. (Kratzer et al., 1998) found a preference of hydrophobic peptide adsorption to the non-polar (103) face. This fact was demonstrated by the crystallographic investigation of Beavis (Beavis & Bridson, 1993). The speed of crystallisation can also explain the inclusion or exclusion of hydrophilic peptides in the growing crystal. The secondary structure of a peptide will also affect the signal intensity. Wenschuh et al. have shown (Wenschuh et al., 1998) that the MALDI-MS signal response of peptides displaying stable α-helical and β-sheet structures was different when two adjacent amino acids were replaced by their corresponding D-isomers. A simple D/L amino acid substitution may disrupt α-helical and β-sheet structures and therefore completely alter the MALDI-MS spectral pattern. Slow crystallisation of the matrix favours co-crystallisation with predominant adsorption of hydrophobic peptides in large crystals.
In this case, hydrophobic peptides occupy the adsorption sites on the (103) crystallographic face and this produces a stable architecture. If solvent is quickly evaporated (flash evaporation), matrix micro-crystals are obtained. Peptide adsorption on the hydrophobic adsorption sites is controlled by the kinetics of analyte diffusion to the adsorption sites on the (103) crystallographic face. This type of co-crystallisation is thermodynamically less stable than the previous one due to the higher potential energy. In this case, hydrophobic and hydrophilic peptides can be integrated in the crystal structure without site competition allowing a better distribution and decreased discrimination between them (Amado, Damingues, Santana-Marques, Ferrer-Correia, & Jones, 1997). This behaviour can also explain the suppression effect reported in the literature (Cohen & Chait, 1996; Patterson et al., 1996). In a few articles, the use of surfactants was proposed to decrease the suppression effect. N-Octylglucoside (Cohen & Chait, 1996) showed a positive effect on high Mr peptides but anionic surfactants (Breaux et al., 2000) were preferred to analyse hydrophobic peptides. The use of surfactants helps to create a more homogeneous distribution of hydrophobic and hydrophilic peptides integrated in the matrix by decreasing the stabilising hydrophobic effects between peptide and matrix. Krause (Krause et al., 1999) found that 94% of the most intense peaks bore an arginine (R) residue at the C-terminal side of the tryptic fragment obtained from digested proteins from mycobacteria. They obtained higher signal intensities for peptides containing R than for those with lysine (K) at the C-terminal amino acid.
BIENVENUT ET AL.
They attributed this effect to specific chemical properties of R, by comparing the signal intensities of similar peptides carrying K or R at the C-terminus. Meanwhile, an exhaustive study of endoproteinase specificity conducted by Keil (Keil, 1992) concluded that trypsin has a lower activity towards lysyl than towards arginyl residues. In this case, after tryptic digestion, peptides with a C-terminal R would be at a higher concentration in the protein digest, thus explaining the higher intensity of these peaks. The above facts do not favour protein or peptide quantitation using MALDI-MS. Several problems are associated with MALDI-MS quantification: i) low shot-to-shot reproducibility, ii) various signal suppression effects, and iii) a strong influence of sample preparation and matrix crystallisation. Nevertheless, it is possible to use MALDI-MS to obtain absolute or relative quantitation. In most cases, the idea is to use an internal standard for absolute quantitation, but this standard must have the same physico-chemical characteristics as the quantified peptide. A peptide with a different sequence may have different desorption and ionisation properties. Usually, the internal standard is the same peptide labelled with a stable isotope to shift its mass slightly. Gobom et al. (Gobom et al., 2000) developed a method to quantify neurotensin in human brain tissue. For a spectrum accumulated from 10 shots, they obtained more than 20% variation of the signal intensity. Because of the low shot-to-shot reproducibility of the signal intensity, this technique needed up to 400 cumulative acquisitions to minimise the problem. Under those conditions, they obtained a variation of ±2%. In this case, MALDI-MS as a quantitation technique was not as good as the reference method but allowed more specific information to be obtained. Hensel et al. (Hensel et al., 1997) proposed to use an electrospray method to prepare the sample.
With this technique, they were able to decrease the coefficient of variation more than threefold compared to air-dried samples. Gygi et al. (Gygi et al., 1999) proposed a relative quantitation of all proteins contained in a sample using a special alkylating agent called isotope-coded affinity tag (ICAT). This technique was developed for MS/MS analysis but it can also be used with MALDI-MS. To conclude, MALDI-MS has been greatly improved since Karas' and Tanaka's first descriptions (Karas et al., 1987; Tanaka et al., 1988). It is now possible to obtain routinely a peak resolution better than 2000 FWHM and a mass accuracy below 30 ppm, and under these conditions most digested proteins can be clearly identified. Unfortunately, some proteins cannot be directly identified by this method and more information about their primary structure is required. Such information can be obtained by MS/MS techniques or by specific chemical modification, as described below.
2.2.2. MS/MS analysis
PMF can give ambiguous results when the peak list has to be searched against large sequence databases, when the peptides carry post-translational modifications, or when the sequence of the protein under investigation is not known. In these cases, more sequence information is needed. Spectra acquired from fragmented peptides either by post-source decay (PSD) or collision induced dissociation (CID),
can be used to determine sequence tags or the complete sequence of a peptide, with the help of computer algorithms. This additional peptide sequence information makes protein identification less ambiguous and can be used to search expressed sequence tag (EST) databases in case the protein is not yet listed in a protein database.
2.2.2.1. MALDI-RETOF-PSD MS analysis
PSD of peptides relies on the metastable decay of ions in the first field-free drift tube of a time-of-flight (TOF) analyser. Metastable decay is initiated by low-energy collisions of neutral matrix molecules (dominant in the desorption plume) with ionised analyte molecules during the initial stage of acceleration. The fragment ions of different masses are separated by the dispersion in time of ions with different kinetic energies in the electrostatic reflector field. They are detected after the second field-free drift tube (Kaufmann, Kirsch, & Spengler, 1994; Kaufmann et al., 1993). As mentioned above, MALDI-PSD relies heavily on metastable ions, and the most common PSD fragment ions produced are a, b, y, z and d ions according to the defined nomenclature (Johnson, Martin, Biemann, Stults, & Watson, 1987; Roepstorff & Fohlman, 1984). These ions can also produce satellite ions that lose ammonia or water (Spengler, 1997). Direct use of MALDI-PSD MS for sequencing unknown peptides is not easy and generally not very sensitive. In contrast to CID with a collision gas in a collision cell, there is little control over the degree of dissociation, the reaction pathways and the fragments produced. Chemical modification of the peptides prior to PSD MS measurement was proposed to improve sequence identification (Spengler et al., 1993), to facilitate peptide-specific fragmentation and to suppress a part of the spectrum in order to simplify it. Pappin, Spengler and Allison's groups (Liao et al., 1997; Spengler, 1997) modified the N-terminal amino group with a quaternary ammonium ion.
This charged group at one end of the peptide facilitates fragmentation and gives much simpler spectra with mostly a-type ions. Lacey's group (Bauer et al., 2000; Keough et al., 2000; Keough, R.S. Youngquist, & Lacey, 1999a) modified the N-terminal end of the peptide with a negatively charged compound. In this case, y-type fragment ions are seen, while a-, b- and c-type ions are suppressed by the negative charge carried by the sulfonate on the N-terminus. Vandekerckhove's group developed an interesting technique to obtain MALDI-PSD spectra (Gevaert et al., 1997; Gevaert, Demol, Sklyarova, Houthaeye et al., 1998; Gevaert, Demol, Sklyarova, Vandekerckhove et al., 1998). POROS R2 beads were used to extract and concentrate diluted peptides from the protein digest. A washing step was then performed to remove contaminating salts. The whole sample, POROS R2 beads and peptides, was mixed with the matrix and loaded on the MALDI sample plate for PSD analysis. With this method, mostly y-ions were produced. Nevertheless, in a review dealing with protein identification methods, Gevaert and Vandekerckhove recognised (Gevaert, Houthaeve, & Vandekerckhove, 2000) that sequence determination for an unknown sequence was difficult using this technique and that de novo sequencing was impossible.
To overcome the limitations inherent in PSD peptide sequencing using MALDI instrumentation, considerable research efforts are under way to use a MALDI source in conjunction with tandem MS and a CID cell. One approach is MALDI-TOF/TOF in a linear configuration (Cornish & Cotter, 1993; Medzihradszky et al., 2000). Another solution is the use of the more mature orthogonal acceleration TOF technique as the second stage mass analyser, with either a quadrupole parent ion selector (Krutchinsky et al., 2000; Shevchenko, Loboda, Ens, & Standing, 2000) or a linear TOF separator in front of the CID cell. These are rather new developments in the field of tandem MS. The most common and most developed ionisation method used in conjunction with CID is electrospray ionisation (ESI).
2.2.2.2. ESI-MS/MS analysis
Mass spectral analysis requires that the analyte is introduced into the mass spectrometer as a gaseous ion. This is a major hurdle, especially for the ionisation of biological molecules, which consist mostly of large and therefore extremely non-volatile polymeric units. Nevertheless, several ionisation methods were developed during the last decades. Among them, MALDI and ESI were the most successful because of their high ionisation efficiencies, i.e. very high ratios of (molecular ions produced)/(molecules consumed) (Kaufmann et al., 1993). A major advantage of ESI is that it produces multiply charged ions, with an almost linear correlation between the number of charges added and the mass increment of a polypeptide (Smith, Loo, Edmonds, Barinaga, & Udseth, 1990). Thus, molecular ions can be analysed based on their mass-to-charge ratio (m/z), which is mainly in the range of m/z = 500-3000. As a result, the mass analyser can be kept relatively simple and mass measurements are very precise. These advantages were recognised independently by two research groups in the 1980s, Yamashita and Fenn (Yamashita & Fenn, 1984) in the US and Aleksandrov et al. (Aleksandrov et al., 1984) in the former USSR.
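The effect of multiple charging on the observed m/z can be illustrated with a few lines of code. This is a minimal sketch assuming that each charge is carried by one added proton, so that m/z = (M + z x 1.007276)/z; the 12 kDa neutral mass and the function names are hypothetical and serve only as an illustration.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz(neutral_mass, z):
    """m/z of an ion carrying z protons."""
    return (neutral_mass + z * PROTON) / z


def charge_states_in_window(neutral_mass, lo=500.0, hi=3000.0, z_max=100):
    """Charge states whose m/z falls inside the analyser window [lo, hi]."""
    return [z for z in range(1, z_max + 1) if lo <= mz(neutral_mass, z) <= hi]


# Hypothetical 12 kDa polypeptide: which charge states land in m/z 500-3000?
states = charge_states_in_window(12000.0)
```

For this example the charge states from 5+ to 24+ fall inside the window, which shows how an analyser with a limited m/z range can still measure a molecule of much larger mass.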
The latter group also accomplished for the first time the on-line coupling of liquid chromatography to a mass spectrometer (Aleksandrov et al., 1984), which is still one of the strengths of ESI. The actual mechanism of the ESI process is still a matter of debate and the interested reader may refer to the literature, e.g. Kebarle (Kebarle, 2000). In summary, the analyte solution is pumped through a narrow bore capillary held at a potential of a few kilovolts relative to a counter-electrode normally situated behind a first set of ion focusing lenses. This results in a fine mist (spray) of small charged droplets at atmospheric pressure. The charge on the droplets drives them through the inlet orifice and the sampling lenses/capillaries, under differential pumping, into the high vacuum system for mass analysis. A warmed drying gas, gradually reduced pressure and heating desolvate the charged droplets. The droplets go through a cascade of so-called Coulomb explosions, initiated when droplet shrinking leads to the critical Rayleigh diameter. This process finally leads to the production of completely desolvated and multiply charged analyte ions. The focused ion beam is then subjected to a first mass analyser, which is in most cases a quadrupole, using radio frequency signals to scan through the m/z range or select an
ion with a specific m/z to be analysed. ESI is a soft ionisation method that generally results in no fragmentation of the sample ions. However, ions of a specific m/z value separated in the first mass analyser can be fragmented in a collision cell in the presence of a few mTorr of a neutral and inert gas. This low energy collision-induced dissociation (CID) process is very efficient on doubly or triply charged peptides, inducing mainly fragmentation of amide bonds in the peptide backbone. Commonly, tryptic digests are used for this type of analysis, and on average peptides retain two positive charges when ionised in positive ion mode, one on the basic N-terminus and the second on the basic C-terminal amino acid, K or R. Thus, the product ions are generally singly charged and contain either the intact N-terminal or the intact C-terminal amino acid, giving ions of the b- or y-type, respectively. The masses of the fragment ions are then measured in the second mass analyser of a tandem mass spectrometer. The resulting MS/MS spectrum contains sufficient information to deduce sequence tags of the fragmented peptide, or it can be automatically correlated with sequence databases for rapid and reliable protein identification (see chapter 3.2 below). Fast instrument control software can switch the tandem mass spectrometer quickly from MS to MS/MS mode, thus enabling automated data acquisition of the fragments produced. Probably the simplest configuration of a tandem mass spectrometer is the so-called triple quadrupole setup, consisting of a first quadrupole to scan parent ions, a second quadrupole which serves as the CID cell, and a third quadrupole used to scan product ions. This type of tandem mass spectrometer was the first to be commercially available. Although they have a rather limited mass resolution of around 1000-2000 (m/Δm), these instruments are still used in many labs around the world because they allow single ion monitoring and parent ion scanning.
A more recent implementation of tandem MS was already described in the previous section, where product ions produced in the CID cell are measured in an orthogonal acceleration TOF compartment (for review see (Guilhaus, Selby, & Mlynski, 2000)). These instruments combine ESI-quadrupole technology with the superior mass resolution and sensitivity of a TOF analyser. A resolution of 5000 is standard with a Q-TOF instrument. Another type of instrument with a similar resolution as the triple quadrupole MS relies on a different technique by collecting ions in a potential trap. The ions can be scanned in MS mode by altering the radio frequency amplitude of the trap, which leads to the ejection of the ions into the detector. For MS/MS, the trap is filled with ions of a specific m/z value by adjusting the RF amplitude followed by introduction of the collision gas and scanning of the fragments. The great advantage of an ion-trap MS is its operation speed, which is up to ten times faster than a quadrupole instrument at identical sensitivity.
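[Figure 1: MS/MS spectrum, relative intensity (%) versus m/z, of the [M+2H]2+ precursor. The y-ion series is labelled R-F-E-E-S-V-S-A-A-D-V-G-A-E-A-S(phospho)-G; the inset shows the peaks at m/z 1537.5, 1704.4 and 1761.5 spanning phospho-S and G.]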
Figure 1. A tryptic digest of chicken ovalbumin was passed over an immobilised metal affinity column (IMAC) to isolate phosphorylated peptides. The phosphopeptide enriched eluate in 0.1 M sodium phosphate buffer was desalted with ZipTip and analysed with nano-ESI-MS/MS on a Q-Tof (Micromass, Manchester, UK). The peak at m/z = 1044.95 recorded in the MS survey scan was induced to fragmentation by collision with Argon gas and the recorded MS/MS spectrum was subjected to interpretation with SEQUEST, searching against a July 2000 release of SWISS-PROT. The peptide was identified as EVVGS*AEAGVDAASVSEEFR from ovalbumin with the serine residue at position five modified by a phosphate ester group. Peaks corresponding to the y-ion series of the underlined part of the above sequence were found in the spectrum denoted by dotted lines and single letter amino acid symbols in the figure. The inset represents a zoomed-in region of the spectrum showing the sequence phosphoserine-glycine.
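The masses quoted in Figure 1 can be checked with a short calculation. The sketch below uses standard monoisotopic residue masses and assumes that phosphorylation adds 79.96633 Da (HPO3) to the serine at position five; the function names are hypothetical, and the calculation is only an illustration, not the procedure used by SEQUEST.

```python
# Monoisotopic residue masses (Da) for the amino acids present in the peptide.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "D": 115.02694, "E": 129.04259, "F": 147.06841, "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728
PHOSPHO = 79.96633  # HPO3 added to a phosphorylated serine


def residue_masses(seq, phospho_pos=None):
    """Residue masses along the sequence; phospho_pos is 1-based."""
    masses = [RESIDUE[aa] for aa in seq]
    if phospho_pos is not None:
        masses[phospho_pos - 1] += PHOSPHO
    return masses


def precursor_mz(masses, z):
    """m/z of the intact peptide carrying z protons."""
    return (sum(masses) + WATER + z * PROTON) / z


def y_ion(masses, n):
    """Singly charged y-ion containing the last n residues."""
    return sum(masses[-n:]) + WATER + PROTON


masses = residue_masses("EVVGSAEAGVDAASVSEEFR", phospho_pos=5)
mz2 = precursor_mz(masses, 2)
y15, y16, y17 = (y_ion(masses, n) for n in (15, 16, 17))
```

The doubly charged precursor comes out close to the m/z = 1044.95 given in the caption. The steps y15 to y16 (phosphoserine, 167.0 Da) and y16 to y17 (glycine, 57.0 Da) reproduce the spacing of the three peaks labelled in the inset to within a few tenths of a Da.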
As mentioned above, ESI instruments were coupled to liquid chromatography. The biggest impact in ESI MS has been the adaptation to reduced flow capabilities in the 10-500 nl/min range. Wilm and Mann (Wilm & Mann, 1996) and Emmett and Caprioli (Emmett & Caprioli, 1994) developed such improvements in parallel. The combination of nano-flow LC with micro-capillary reversed phase HPLC and nanoES has i) dramatically improved the sensitivity of ESI-MS/MS and ii) enabled the automation of protein identification by using an autosampler for loading of samples onto the LC (Ducret et al., 1998). In Figure 1, an example of a MS/MS spectrum is shown illustrating one of the strengths of ESI-MS/MS, namely the characterisation of post-translational modifications of proteins. The recorded spectrum of a doubly charged peptide with m/z = 1044.95 contained all the information to identify and characterise unambiguously the phosphorylated peptide. Although the singly phosphorylated peptide with the sequence shown in Figure 1 contains three serine residues as potential phosphorylation sites, the fragment ion pattern of the y-ion series demonstrated that only serine in position five could be phosphorylated. Indeed, this
serine residue is a known phosphorylation site of chicken ovalbumin. The interpretation of this spectrum was greatly facilitated by the use of SEQUEST (see section 3.2), which scored this particular spectrum with a relatively high cross-correlation score.
2.2.3. Improvement of the identification by chemical modification of peptides
The expansion of protein sequence databases, e.g. TrEMBL, SWISS-PROT, NCBInr, brought about by genome sequencing projects, decreases the probability of obtaining an unequivocal protein identification by PMF alone (Lahm & Langen, 2000). Additional information such as amino acid sequence or amino acid composition increases the confidence in any protein identification. Lahm's group (Lahm & Langen, 2000) defined three ways to reduce this problem:
- Acquisition of another MALDI-MS spectrum with optimised parameters;
- Use of a different endoproteinase to generate a different PMF;
- Identification of a short sequence tag using MALDI-PSD and/or ESI-MS/MS.
The time required for such experiments prevents these methods from being used for high throughput protein identification (Lopez, 2000).
Table 2. Peptides identified in the tryptic digest of native ovalbumin. Bold characters in the sequence column represent potential esterification sites. The last column (Ester) shows whether the peptide was also found in the esterified sample listed in Table 4 ('+' if found, '-' otherwise). M*: methionine sulfoxide.
Peptide mass   Position   Sequence               Ester
1209.52        190-199    DEDTQAMPFR             +
1345.738       370-381    HIATNAVLFFGR           -
1465.776       111-122    YPILPEYLQCVK           -
1555.721       187-199    AFKDEDTQAMPFR          +
1571.716       187-199    AFKDEDTQAM*PFR         +
1581.721       264-276    LTEWTSSNVMEER          +
1597.716       264-276    LTEWTSSNVM*EER         +
1687.84        127-142    GGLEPINFQTAADQAR       +
1773.899       323-339    ISQAVHAAHAEINEAGR      +
1858.966       143-158    ELINSWVESQTNGIIR       -
2008.946       340-359    EVVGSAEAGVDAASVSEEFR   +
more easily supply the necessary information. Possible chemical reactions must meet the following requirements:
- They must be fast;
- They must have a high yield of conversion;
- They must be simple to handle;
- They must use reagents and buffers that do not leave by-products which may alter matrix crystallisation and the ionisation process;
- Ideally, the reaction should be done on the MALDI sample plate.
Alternatively, reactions should be done with a sample already embedded in matrix on the MALDI-MS sample plate. Two techniques involving the modification of samples are described in the following sections (Bienvenut, Hoogland et al., 2002).
[Figure 2: two spectra, panels A and B, plotted against m/z.]
Figure 2. A) MALDI-MS spectrum of the native OVAL_CHICK digest. Peak labels show peptide mass, peptide sequence, protein source and the peptide position within the protein. B) MALDI-MS spectrum of the esterified sample. Peak labels show peptide mass, protein source, experimental number of esterifications, and the peptide position within the protein.
2.2.3.1. Esterification
A well-documented modification is the esterification of the side chain carboxylic groups of glutamic (E) and aspartic (D) acids together with the carboxy-terminal group. Usually, methanol is used as the methylation reagent (Acharya et al., 1977; Bartlet-Jones et al., 1994; Fraenkel-Conrat & Olcott, 1945; Hunt et al., 1986; Patterson et al., 1996; Wilcox, 1967). Other alcohols can be used, e.g. 2-propanol, 1-butanol, 1-hexanol, 1-octanol, benzyl alcohol (Falick & Maltby, 1989) or ethanol (Nutkins & Williams, 1989). This type of treatment allows the determination of the number of D and E residues in a peptide, thus increasing the confidence of its identification.
The technique used in our laboratory is an adaptation of Pappin's method (Bartlet-Jones et al., 1994). Briefly, the esterification reagent is obtained by adding thionyl chloride to dry methanol at -80°C to form a 1% solution. Fresh reagent (10 µl) is added to an Eppendorf tube containing the dried peptide sample. After incubation at 55°C for 20 minutes, the excess reagent is removed by vacuum centrifugation. The modified peptides are re-suspended in 5 µl of acetonitrile/water/TFA solution (50:49:1 v:v:v). Two µl of the sample are loaded on the MALDI sample plate prior to matrix addition and MALDI-MS analysis. To compare the untreated and treated samples, both are loaded on the MALDI sample plate and analysed (Figure 2). The spectrum of the unmodified digest is used for a primary protein identification using PMF (Table 2). The modified material is also analysed by MALDI, and the masses of the modified peptides are used for protein identification using PMF with the Mascot program (see section 3.1.2.). In this case, esterified carboxylic groups are considered as permanent modifications, like chemical modifications of cysteines, e.g. carboxymethylation. The comparison of the two peak lists allows the determination of potentially esterified peptides and their degree of esterification. A mass difference of n x 14.0157 Da (the shift induced by n esterifications) is sought between the peak lists of the untreated and treated samples. Results are summarised in Table 3. Protein identification based on the PMF mass lists of the treated and untreated samples mostly resulted in the correct identification but with a non-significant score (Table 5). The combined use of the mass list of the untreated sample and the corresponding degree of esterification listed in Table 3 allows a clear identification of the correct protein with a highly significant score.
2.2.3.2. H/D exchange: Quantitation of labile protons on peptides
Hydrogen/deuterium (H/D) exchange is a common practice in biochemistry. Numerous articles have described the exchange of labile protein hydrogens using heavy water (D2O) and other deuterated solvents, e.g. D4-methanol (MeOD), D3-acetonitrile (D3-AcN), D1-trifluoroacetic acid (D-TFA). This technique is mostly used to identify specific binding sites (Jones et al., 2000) and conformational changes (Katta & Chait, 1993; Villanueva et al., 2000; Wang & Tang, 1996), and to deduce the secondary structure of proteins (Kraus et al., 2000; Zhang & Chait, 2000). Proteins are usually incubated in a deuterated solvent, and labile protons are exchanged with different kinetics during the incubation step. The hydrogen/hetero-atom binding energy and the steric position of the labile hydrogens in the protein are the most important factors in the kinetics of the reaction. For example, in a protein complex involving two agonists, binding and/or adsorbing sites are differently exposed to the solvent, and these hydrogens therefore have a different rate of exchange (Buijs et al., 1999). In a β-sheet, the hydrogens involved show a lower rate of H/D exchange, which correlates with their stabilisation by the secondary structure (Zhang & Chait, 2000). These modifications or kinetics of exchange can be visualised using nuclear magnetic resonance or quantified using MS. This information is used to reconstruct the 3D-protein configuration.
Table 3. Peak list comparison. List of masses showing a difference of "n x 14.0157 Da" with less than 15 ppm error between MALDI-MS spectra from untreated and esterified samples. Init. mass: list of masses in the native sample spectrum that matched with peak masses of the esterified sample; Mass ester: masses in the esterified sample spectrum; # ester: potential number of ester groups on the peptide chain (n: from 1 to 6); # D & E: potential number of D and E amino acids in the peptide sequence (n-1 due to the C-terminal esterification); Protein source: the protein to which the peptide corresponds (the proteins named O95678 and K2C1_HUMAN are two type II cytoskeletal keratins from hair); bold letters in the Sequence column represent potential positions of the esterification sites on D and E amino acids, and letters in italics represent potential esterification at the C-terminal amino acid. MSO: methionine sulfoxide.
Init. mass  Mass ester  # ester  # D & E  Protein source                      Sequence
1121.53     1149.56     2        1        Unknown
1121.53     1163.58     3        2        Unknown
1138.55     1180.60     3        2        Unknown
1165.56     1179.59     3        2        O95678                              YEELQVTAGR
1165.58     1221.64     4        3        O95678                              VRYEDEINK
1179.58     1221.64     3        2        K2C1_HUMAN                          DYQELMNTK
1209.52     1265.58     4        3        OVAL_CHICK                          DEDTQAMPFR
1277.67     1305.70     2        1        Unknown
1365.64     1407.68     3        2        O95678                              NTKQEISEMNR
1535.75     1563.78     2        1        Unknown
1555.73     1597.80     3        2        OVAL_CHICK, partially esterified    AFKDEDTQAMPFR
1555.72     1611.78     4        3        OVAL_CHICK                          AFKDEDTQAMPFR
1571.72     1627.78     4        3        OVAL_CHICK, MSO                     AFKDEDTQAMPFR
1581.73     1637.80     4        3        OVAL_CHICK                          LTEWTSSNVMEER
1584.68     1668.76     6        5        Unknown
1597.71     1653.75     4        3        OVAL_CHICK, MSO                     LTEWTSSNVMEER
1687.84     1729.89     3        2        OVAL_CHICK                          GGLEPINFQTAADQAR
1716.86     1744.89     2        1        Unknown
1773.87     1801.90     2        1        OVAL_CHICK, partially esterified    ISQAVHAAHAEINEAGR
1773.91     1815.96     3        2        OVAL_CHICK                          ISQAVHAAHAEINEAGR
2008.97     2079.04     5        4        OVAL_CHICK, partially esterified    EVVGSAEAGVDAASVSEEFR
2008.95     2093.04     6        5        OVAL_CHICK                          EVVGSAEAGVDAASVSEEFR
2211.06     2281.14     5        4        TRYP_PIG                            LGEHNIDVLEGNEQFINAAK
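The peak list comparison behind Table 3 can be sketched in a few lines. This is only an illustration of the principle (a shift of n x 14.0157 Da accepted within 15 ppm, n from 1 to 6), not the software used by the authors; the function name is hypothetical and the example masses are three native/esterified pairs taken from Table 4.

```python
METHYL_SHIFT = 14.0157  # Da added per esterified carboxyl group (value used in Table 3)


def find_ester_pairs(native, esterified, max_n=6, ppm=15.0):
    """Return (native mass, esterified mass, n) for peaks shifted by n x 14.0157 Da."""
    pairs = []
    for m0 in native:
        for n in range(1, max_n + 1):
            expected = m0 + n * METHYL_SHIFT
            for m1 in esterified:
                # Accept the pair when the relative error is inside the ppm window.
                if abs(m1 - expected) / expected * 1e6 <= ppm:
                    pairs.append((m0, m1, n))
    return pairs


native = [1209.520, 1773.899, 2008.946]
esterified = [1265.578, 1815.957, 2093.043]
pairs = find_ester_pairs(native, esterified)
```

With these values each native peak is paired with exactly one esterified peak, with 4, 3 and 6 ester groups respectively, as listed in Table 4.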
Table 4. Peptides identified from the peak list of the esterified sample. Mass ester: masses in the esterified sample spectrum; Init. mass: masses in the native sample spectrum; # ester: number of ester groups on the peptide chain; Seq. Pos.: sequence position; Sequence: amino acid sequence of the corresponding peptide; bold characters represent the position of the esterification site in the sequence; M*: methionine sulfoxide.
Mass ester  Init. mass  # ester  Seq. Pos.  Sequence
                                            EPINFQTAADQAR
1815.957    1773.899    3        323-339    ISQAVHAAHAEINEAGR
1265.578    1209.520    4        190-199    DEDTQAMPFR
1611.783    1555.721    4        187-199    AFKDEDTQAMPFR
1627.782    1571.716    4        187-199    AFKDEDTQAM*PFR
1637.796    1581.721    4        264-276    LTEWTSSNVMEER
1653.747    1597.716    4        264-276    LTEWTSSNVM*EER
2093.043    2008.946    6        340-359    EVVGSAEAGVDAASVSEEFR
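The number of ester groups expected for a fully esterified peptide follows directly from its sequence: one per D or E side chain plus one for the C-terminal carboxyl group. A minimal sketch, with a hypothetical function name, checked against sequences from Table 4:

```python
def expected_esters(sequence):
    """Side-chain carboxyls of D and E plus the C-terminal carboxyl group."""
    seq = sequence.replace("*", "")  # ignore the M* (methionine sulfoxide) marker
    return seq.count("D") + seq.count("E") + 1


counts = {
    seq: expected_esters(seq)
    for seq in (
        "DEDTQAMPFR",
        "ISQAVHAAHAEINEAGR",
        "LTEWTSSNVM*EER",
        "EVVGSAEAGVDAASVSEEFR",
    )
}
```

A measured number of esters below this value points to partial esterification, as noted for some entries of Table 3.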
Table 5. Identification results for chicken ovalbumin (OVAL_CHICK) and score using the Mascot PMF tool. Mass error is limited to 15 ppm, the Mr of the protein is fixed to 45 kDa, methionines could be oxidised, cysteines are native and the NCBInr database was used. Searches for protein identification were conducted against all entries contained in the database and a second time only against the "lobe-finned fish and tetrapod clade" to reduce the size of the database. Id. rank: OVAL_CHICK identification rank; Sc.: score; Stat.: significance of the result at P<0.05; scores are significant for values higher than 62 (see the description of the Mascot tool in section 3.1.2.); Id. Pept.: number of identified peptides in the PMF output.
                             All NCBInr entries                   Lobe-finned fish and tetrapod clade
Sample                       Id. rank  Sc.   Stat.  Id. Pept.     Id. rank  Sc.   Stat.  Id. Pept.
Native sample                1         60    -      11            1         60    -      11
Esterified sample            2         32    -      8             1         32    -      8
Combined results: Table 3    1         111   +      8             1         111   +      8
Spengler et al. (Spengler et al., 1993) proposed to use H/D exchange to replace all the labile hydrogens of a peptide. The comparison by MS of the native and H/D exchanged peptides allowed the number of labile protons carried by the peptide to be determined. This technique was shown to have two advantages. First, they used this type of information during MALDI-PSD analysis to confirm the identified amino acids with the number of exchanged protons (see Table 6). Comparison of the PSD spectra of treated and non-treated peptides facilitated the interpretation of the peptide amino acid sequence (Chaurand et al., 1999; Spengler et al., 1993). Second, direct comparison of the MALDI spectra of the native and treated peptides permitted the determination of the number of exchangeable protons.
When this technique was applied to the whole digest of a protein, such information was used to confirm or reject the sequence previously proposed using PMF.
Table 6. Number of exchangeable hydrogen atoms in common amino acids
Amino acid                    1 letter code   # of exchangeable protons
Alanine                       A               1
Arginine                      R               5
Asparagine                    N               3
Aspartic acid                 D               2
Cysteine (native)             C               2
Cysteine (carbamidomethyl)    C-CAM           3
Cysteine (propionamide)       C-PAM           3
Glutamic acid                 E               2
Glutamine                     Q               3
Glycine                       G               1
Histidine                     H               2
Isoleucine                    I               1
Leucine                       L               1
Lysine                        K               3
Methionine                    M               1
Phenylalanine                 F               1
Proline                       P               0
Serine                        S               2
Threonine                     T               2
Tryptophan                    W               2
Tyrosine                      Y               2
Valine                        V               1
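The values of Table 6 can be summed over a peptide sequence to estimate its number of exchangeable protons and the resulting mass shift. The sketch below is a hedged illustration: the function names are hypothetical, cysteine is taken as native, and the `extra` argument is an assumption standing for any additional labile protons (for example at the termini or the charge-carrying proton), whose treatment is not specified in the text.

```python
# Exchangeable protons per residue, from Table 6 (native cysteine).
EXCHANGEABLE = {
    "A": 1, "R": 5, "N": 3, "D": 2, "C": 2, "E": 2, "Q": 3, "G": 1,
    "H": 2, "I": 1, "L": 1, "K": 3, "M": 1, "F": 1, "P": 0, "S": 2,
    "T": 2, "W": 2, "Y": 2, "V": 1,
}
H_TO_D = 2.014102 - 1.007825  # mass gained per H replaced by D (Da)


def exchangeable_protons(sequence, extra=0):
    """Sum of the Table 6 values; `extra` is a caller-supplied assumption."""
    return sum(EXCHANGEABLE[aa] for aa in sequence) + extra


def mass_after_exchange(mass, n_sites, fraction=1.0):
    """Peptide mass after exchanging a given fraction of n_sites labile protons."""
    return mass + n_sites * fraction * H_TO_D


n_sites = exchangeable_protons("DEDTQAMPFR")
shifted = mass_after_exchange(1209.52, n_sites, fraction=0.9)
```

The `fraction` argument allows an incomplete exchange to be modelled; dividing a measured mass shift by `H_TO_D` gives, conversely, the experimental number of exchanged protons.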
To obtain reliable data, the H/D exchange must be as complete as possible. The quality and the composition of the deuterated solvent used are also crucial. Figueroa et al. (Figueroa et al., 1998) showed that the concentration of D2O in these solvents strongly affects the H/D exchange equilibrium. For example, a low percentage of D2O (1-3% in deuterated methanol) gave an equilibrium at 77% H/D exchange in bradykinin. When the D2O concentration was increased to 40%, the exchange reached its maximum, with an equilibrium exchange of 97.2%. The composition of the solvent is also important since the reaction is mainly limited by kinetics. For this type of reaction, where labile protons are replaced by deuterium, free D+ ions in the solvent can be considered as a catalyst for the exchange reaction. The hydrogens remaining in the deuterated solvent, which can be regarded as contaminants, also compete in the exchange reaction. To limit the competition of H with D in the exchange reaction, such solvents must be of the highest quality grade. In practice, after acquiring a first MALDI-MS spectrum of the native sample, the matrix/analyte crystals are treated with a solution of MeOD/D2O/TFA (70:30:1, v:v:v). A volume of 2 µl of the deuterated solvent is added to the sample in a closed box flushed with dry nitrogen. After 30 seconds of incubation, the remaining
PROTEOMICS AND MASS SPECTROMETRY
245
solvent is evaporated under vacuum. This step is repeated 3-4 times before a new MALDI-MS spectrum is acquired. Labile hydrogen exchange is usually around 90 to 95%. Due to this partial exchange, treated peptide masses show a larger isotopic distribution after deuteration (Zoom boxes in Figure 3). Therefore, the highest peak of the distribution is used to define the average mass of the modified peptide for further calculations.
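The bookkeeping behind this comparison can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' software: the per-residue counts are copied from Table 6, while the three extra hydrogens added per peptide (second N-terminal amine hydrogen, C-terminal hydroxyl and the ionising proton of [M+H]+) are an assumption made here because it reproduces theoretical values such as 17 for YLYEIAR in Table 7. The function names and the convention that "C*" marks a modified cysteine are likewise hypothetical.

```python
# Exchangeable hydrogens per residue, copied from Table 6.
EXCHANGEABLE_H = {
    "A": 1, "R": 5, "N": 3, "D": 2, "C": 2, "E": 2, "Q": 3, "G": 1,
    "H": 2, "I": 1, "L": 1, "K": 3, "M": 1, "F": 1, "P": 0, "S": 2,
    "T": 2, "W": 2, "Y": 2, "V": 1,
}
MODIFIED_CYS_H = 3  # C-CAM and C-PAM rows of Table 6

# Assumption (not stated in the text): one extra N-terminal amine hydrogen,
# the C-terminal OH and the ionising proton of [M+H]+ also exchange.
TERMINAL_AND_CHARGE_H = 3

# Mass gained when one hydrogen (1.007825 Da) is replaced by deuterium (2.014102 Da).
DEUTERIUM_SHIFT = 2.014102 - 1.007825


def theoretical_exchangeable_h(sequence):
    """Count exchangeable hydrogens; 'C*' is read as a modified cysteine."""
    total = TERMINAL_AND_CHARGE_H
    previous = ""
    for symbol in sequence:
        if symbol == "*":
            # Only modified cysteine changes the count; 'M*' is left unchanged.
            if previous == "C":
                total += MODIFIED_CYS_H - EXCHANGEABLE_H["C"]
        else:
            total += EXCHANGEABLE_H[symbol]
        previous = symbol
    return total


def observed_exchanges(initial_mass, deuterated_mass):
    """Number of H/D exchanges implied by the mass shift between two spectra."""
    return round((deuterated_mass - initial_mass) / DEUTERIUM_SHIFT)


def percent_exchange(sequence, initial_mass, deuterated_mass):
    """Observed exchanges as a percentage of the theoretical maximum."""
    observed = observed_exchanges(initial_mass, deuterated_mass)
    return 100.0 * observed / theoretical_exchangeable_h(sequence)


if __name__ == "__main__":
    print(theoretical_exchangeable_h("YLYEIAR"))
    print(observed_exchanges(927.50, 944.78))
    print(round(percent_exchange("DAFLGSFLYEYSR", 1567.76, 1592.92), 1))
```

A proposed sequence whose theoretical count is far above the observed mass shift becomes a less likely candidate, which is the discrimination principle used in the example that follows.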
Figure 3. A) MALDI-MS spectrum of bovine albumin digest: peak annotations correspond to the peptide Mr, the name of the protein and the peptide position in the sequence. B) MALDI-MS spectrum of the same sample after treatment with deuterated solvent: peak annotations correspond to the peptide Mr after treatment, Previous Peptide (PP) Mr and the number of H exchanged compared to the theoretical number.
Figure 3 shows an example of protein treatment with deuterated solvent for H/D exchange. The bovine albumin digest (Figure 3A) was treated directly on the MALDI sample plate as described above and a second spectrum was acquired (Figure 3B). PMF identification on the first spectrum, querying the SWISS-PROT and TrEMBL databases for mammalian species, identified bovine albumin (ALBU_BOVIN, P02769) at the first rank, followed by Yellow mealworm ecdysone receptor (O02035) at the second rank. In the case of bovine albumin, 10 peaks out of 22 were matched in both spectra (Table 7).
Tables 7 and 8: Identified peptides from ALBU_BOVIN (Table 7) or O02035 (Table 8) before and after treatment with deuterated solvent. Init. mass: mass of the peptide identified in the untreated spectrum; Seq. pos.: position of the peptide in the protein sequence; Sequence: sequence of the corresponding peptide; Theo. H/D: theoretical number of exchangeable hydrogens; H/D mass: peptide mass after sample treatment; Exp. H/D: experimental number of exchanged hydrogens; % H/D: % of hydrogens exchanged; NV: peak not visible after treatment; -: no value

Table 7 (ALBU_BOVIN)

Init. mass | Seq. pos. | Sequence | Theo. H/D | H/D mass | Exp. H/D | % H/D
927.50 | 161-167 | YLYEIAR | 17 | 944.78 | 17 | 100
1068.46 | 413-420 | QNC*DQFEK | 22 | NV | - | -
1163.64 | 66-75 | LVNELTEFAK | 20 | 1183.75 | 20 | 100
1166.49 | 460-468 | C*C*TKPESER | 25 | NV | - | -
1249.61 | 35-44 | FKDLGEEHFK | 21 | NV | - | -
1283.72 | 361-371 | HPEYAVSVLLR | 21 | NV | - | -
1305.72 | 402-412 | HLVDEPQNLIK | 22 | NV | - | -
1399.70 | 569-580 | TVMENFVAFVDK | 22 | NV | - | -
1439.82 | 360-371 | RHPEYAVSVLLR | 26 | NV | - | -
1479.81 | 421-433 | LGEYGFQNALIVR | 26 | 1505.98 | 26 | 100
1567.76 | 347-359 | DAFLGSFLYEYSR | 26 | 1592.92 | 25 | 96.2
1576.76 | 139-151 | LKPDPNTLC*DEFK | 26 | NV | - | -
1639.95 | 437-451 | KVPQVSTPTLVEVSR | 29 | NV | - | -
1724.84 | 469-482 | MPC*TEDYLSLILNR | 27 | 1754.97 | 26 | 96.3
1747.70 | 184-197 | YNGVFQECCQAEDK | 31 | NV | - | -
1880.91 | 508-523 | RPCFSALTPDETYVPK | 30 | 1911.10 | 30 | 100
1907.91 | 529-544 | LFTFHADIC*TLPDTEK | 29 | 1936.03 | 28 | 96.6
1927.80 | 581-597 | C*C*AADDKEAC*FAVEGPK | 33 | NV | - | -
2045.01 | 168-183 | RHPYFYAPELLYYANK | 31 | 2075.18 | 30 | 96.8
2247.90 | 267-285 | EC*C*HGDLLEC*ADDRADLAK | 41 | NV | - | -
2492.05 | 45-65 | GLVLIAFSQYLQQC*PFDEHVK | 38 | 2528.26 | 36 | 94.7
3774.58 | 168-197 | RHPYFYAPELLYYANKYNGVFQEC*C*QAEDK | 61 | 3833.91 | 59 | 96.7

Table 8 (O02035)

Init. mass | Seq. pos. | Sequence | Theo. H/D | H/D mass | Exp. H/D | % H/D
927.50 | 87-95 | SDTSSMSGR | 22 | 944.78 | 17 | 77.3
1146.60 | 233-242 | IEPELSDSEK | 20 | NV | - | -
1249.61 | 335-344 | AC*SSEVM*M*FR | 22 | NV | - | -
1283.72 | 324-334 | LLQEDQIALLK | 22 | NV | - | -
1537.80 | 458-471 | TLGNQNSEMCISLK | 30 | NV | - | -
1576.76 | 251-263 | ISPEQEELILIHR | 26 | 1592.92 | 22 | 84.6
1673.82 | 135-149 | ASGYHYNALTC*EGCK | 31 | NV | - | -
1907.91 | 70-86 | IWIPGHTIIASNHHLAK | 29 | 1936.03 | 28 | 96.6
2513.97 | 324-344 | LLQEDQIALLKAC*SSEVM*M*FR | 41 | NV | - | -
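The exchange percentages listed in Tables 7 and 8 can be summarised per candidate protein to compare the two hypotheses. The short sketch below does this with the Python standard library. The percentages are those of the two tables; the choice of the population standard deviation as the measure of spread is an assumption, since the text does not say which estimator was used, and the function name is hypothetical.

```python
from statistics import mean, pstdev

# % H/D values of the peptides still visible after deuteration.
TABLE7_ALBU_BOVIN = [100, 100, 100, 96.2, 96.3, 100, 96.6, 96.8, 94.7, 96.7]
TABLE8_O02035 = [77.3, 84.6, 96.6]


def summarise(percentages):
    """Return (mean, spread) of the exchange percentages, rounded to 0.1."""
    # pstdev is the population standard deviation; this choice is an assumption.
    return round(mean(percentages), 1), round(pstdev(percentages), 1)


if __name__ == "__main__":
    for name, values in (("ALBU_BOVIN", TABLE7_ALBU_BOVIN), ("O02035", TABLE8_O02035)):
        average, spread = summarise(values)
        print(f"{name}: {average} +/- {spread} % over {len(values)} peptides")
```

A candidate whose peptides exchange close to the theoretical maximum, with little scatter, is the more plausible identification.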
The average exchange of hydrogen to deuterium on the 10 identified peptides corresponded to 97.7 ± 2.0 %. The second scoring protein in the output of the PMF tool was identified with only 3 peaks out of 9 (Table 8), and its exchange rate was calculated to be 86.2 ± 8.0 %. Thus, this technique has a clear discriminating power enabling the unambiguous identification of a protein. Additionally, this technique has the major advantage that the same sample prepared for protein identification by PMF is reused for the H/D exchange. The technique is amenable to automation, and therefore large numbers of samples can be analysed in a high throughput mode.

2.3. The molecular scanner approach

Well known high throughput approaches combine two-dimensional electrophoresis (2-DE) with PMF analysis (Joubert-Caron et al., 2000; Thiede et al., 2000). Although automation is often possible, a number of limitations still adversely affect the rate of protein identification and annotation in 2-DE databases:
- The sequential excision of the pieces of gel containing protein; the enzymatic digestion step,
- The interpretation of mass spectra (reliability of identifications),
- The manual updating of 2-DE databases.

Methods involving high resolution protein separation, parallel sample preparation, automation of experimental processes and of database comparison, as well as powerful and specific visualisation tools need to be developed and integrated (Hochstrasser, 1998; Williams & Hochstrasser, 1997). In order to further increase the throughput of protein identification and to offer a flexible and powerful proteomic visualisation tool, we designed a highly automated method that can create a fully annotated 2-DE map (Binz et al., 1999). This technology, called "molecular scanner", combines parallel methods for protein digestion and electrotransfer in a PMF approach to identify proteins. MALDI-MS analysis is conducted directly on the PVDF membranes by a scanning procedure. Using a set of dedicated tools, the method creates, analyses and visualises a proteome as a multidimensional image. This provides the technological basis for the development of a clinical molecular scanner, which could, for example, be adapted to medical diagnostics (Hochstrasser et al., 1991).

2.3.1. Double parallel digestion process

At the 1998 Siena conference, our group presented a "parallel protein digestion during the electroblot" system (Bienvenut et al., 1999; Binz, Muller et al., 1999). During electrotransfer, a membrane (ImmobilonTM AV or IAV) containing covalently bound trypsin (IAV-trypsin) was present between the gel and the PVDF collecting surface. This results in tryptic cleavage of proteins during their migration to the PVDF membrane. The transfer voltage was therefore adapted to reduce the migration speed of the proteins by using an alternating square shape tension. The resulting effective tension was 3.5 V, and this electric field had the additional advantage that it modified the protein orientation during each pulse. With this technique, the
problems of low recovery of basic and high Mr polypeptides after electroblotting were still encountered. An improvement was achieved by pre-digesting the proteins in the gel prior to electroblotting. This combination, called "Double Parallel Digestion" (DPD), led to greatly improved digestion of high molecular weight and basic proteins without losses of low Mr polypeptides. This method allowed successful identification by PMF of proteins differing over a wide pI and Mr range directly on the collecting PVDF membrane using MALDI-MS (Bienvenut et al., 1999; Binz, Muller et al., 1999). The whole procedure is carried out as described by Bienvenut et al. (Bienvenut et al., 1999). Briefly, after SDS-PAGE protein separation, gels are soaked 3 times in de-ionised water for 5 minutes and then air dried at room temperature for 12 hours (for example overnight). For the first pre-digestion, gels are re-hydrated with 0.05 mg/ml trypsin in 10 mM Tris-HCl, pH 8.2, for 30 minutes at 35°C. Subsequently, the gels are transblotted onto PVDF membrane in a laboratory-made semidry apparatus at room temperature. In order to increase the migration time of the proteins through the IAV membrane during the transfer (thereby allowing more time for digestion to take place), an asymmetrical alternating voltage is used. A square waveform alternating voltage was selected: +12.5 V for 125 ms followed by -5 V for 125 ms, repetitively. The transblotting process is completed after 12-18 hours. To perform the digestion during the electroblotting, a double layer of IAV-trypsin membrane is inserted between the polyacrylamide gel (where the proteins are located) and the PVDF membrane (which acts as the collecting surface), to create a transblot-digestion sandwich. After the transfer procedure, the PVDF membrane is washed in de-ionised water for 5 minutes and stained if required.

2.3.2. 14C quantitation of the transferred product and diffusion

The technique of Western blotting is widely used, and many investigations have been undertaken in order to quantify protein recovery on the collecting membrane (Bolt & Mahoney, 1997; Jungblut et al., 1990; Mozdzanowski & Speicher, 1992; Neumann & Mullner, 1998; Reim & Speicher, 1992). One of the main difficulties in describing the DPD transfer process is the estimation of the yield of proteins transferred from the gel onto the collecting PVDF membrane. One solution was to use 14C radiolabelled proteins (Bienvenut, Deon, Sanchez et al., 2002). 14C decays by emission of low-energy β− particles, which are easily absorbed by the surrounding material. Due to the thickness of the gel, it is not possible to obtain an accurate measurement of the β− signal emitted by the gel separated proteins. Therefore, an absolute quantitation of the proteins recovered on the collecting membrane is not possible. To overcome this problem, the signals acquired on the collecting membranes were compared to a reference obtained from a one dimensional electrophoresis (1-DE) gel of the 14C labelled proteins. Protein recovery under different conditions was measured and the influence of the following parameters was evaluated:
- Effect of the buffer (heterogeneous CAPS and homogeneous ½ Towbin);
- Effect of the electric field used for the transfer: 1 mA/cm2 or square shape tension (SST);
- DPD versus standard transfer.
2.3.2.1. Comparison of the influence of the electric field on the protein recovery

The efficiency of the transfer using standard transblotting techniques or the adapted SST (used during the DPD process) was tested without the digestion step. The main parameters are shown in Table 9.

Table 9. Buffers and electrical fields used during the experiment for the comparison of recovery of undigested proteins

Experiment | 1 | 2
Buffer | Heterogeneous CAPS | Homogeneous 1/2 Towbin
Composition | 10 mM CAPS buffered at pH 11 | 13 mM Tris, 100 mM glycine
Anodic (MeOH) | 20% | 12.5%
Cathodic (MeOH) | 5% | 12.5%
Electric field | 1 mA/cm2 | SST
For each of these experiments, 2 lanes of a 1-DE mini-gel were used. One µg of Bio-Rad Mr standard was loaded on the first lane of the gel and 50 nCi of 14C radiolabelled proteins from Amersham-Pharmacia were loaded on the second lane. Proteins were separated using a standard technique (Laemmli, 1970). At the end of electrophoresis, the gel was washed for 3 minutes in de-ionised water, then for 1 minute in the 20% methanol buffer. PVDF membranes were equilibrated in the 5% methanol buffer for experiment 1. In experiment 2, the same homogeneous 1/2 Towbin buffer was used to equilibrate the gel and the PVDF membrane. After transfer, the membranes were washed rapidly with de-ionised water and air-dried. PVDF membranes were scanned with a Phospho-Imager apparatus for the 14C radiolabelled proteins and with an optical densitometer for the Bio-Rad Mr standard stained with Amido Black. The volumes of the spots were measured using the Melanie software (Appel, Vargas et al., 1997). The image obtained from the 14C labelled samples (Figure 4) showed 7 bands corresponding to myosin (MYSS), phosphorylase b (PHS2), albumin (BSA), ovalbumin (OVAL), carbonic anhydrase (CAH2) and lysozyme (LYC). The results are summarised in Table 10.
Table 10. Percentage increase in protein recovery using DPD type transblotting process compared to standard transfer; NP: protein not present in the sample; NV*: protein not visible on the PVDF after staining

Proteins | Bio-Rad Mr standard with Amido Black stain (experiment 1) | 14C labelled protein with autoradiography (experiment 2)
MYSS | NV* | 110
BGAL | 36 | NP
PHS2 | 37 | 70
BSA | 6 | 10
OVAL | 21 | 14
CAH2 | 20 | 21
ITRA | 21 | NP
LYC | 28 | 18
Average | 24 | 23
Both experiments showed the positive effect on protein recovery of using the square shape tension during the electrotransfer process. The average increase is equivalent for the Amido Black stained and the 14C labelled proteins. Nevertheless, this electric field does not act identically on all proteins. The effect is smaller for low Mr proteins (less than 20% for proteins smaller than 60 kDa) and strongest for the high Mr proteins, i.e. MYSS, BGAL and PHS2.

2.3.2.2. DPD quantification test

The staining intensity of a protein decreases progressively as a function of the extent of digestion (Salih & Zenobi, 1998; Tal et al., 1985), and thus a comparison of the staining intensities of the protein bands after normal and DPD transfer is not possible. One way to quantify the loss of material was the use of 14C radiolabelled proteins. The DPD process was compared to the standard protein electroblotting to PVDF membrane (i.e. experiment 2 using SST, see Table 9). The quantitation was done as above with the 6 14C labelled proteins (MYSS, PHS2, BSA, OVAL, CAH2, and LYC). Digestion control was performed in parallel with the Bio-Rad Mr standards. The results obtained after 24 hours of exposure are shown in Figure 5. The binding activity of the PVDF membrane is mostly due to electrostatic interaction, and smaller peptides do not bind strongly to the surface of the collecting membrane. 14C labelling of proteins is performed by modification of the ε-amino group of lysine (Dottavio-Martin & Ravel, 1978). The decrease in radioactivity during the DPD technique could therefore be due to the loss of small peptides carrying 14C labelled lysine. This loss of a few small peptides does not seem to be a problem
for protein identification. Interestingly, in the case of myosin and more generally for high Mr proteins, this technique transfers polypeptides to the collecting membrane more efficiently and collects more material than the traditional transfer process.

[Figure 4 image not reproduced. Samples: SDS-PAGE Mr standard from Bio-Rad and 14C SDS-PAGE Mr standard from Amersham. Transfer techniques for each sample: heterogeneous CAPS, 1 mA/cm2, and homogeneous Towbin, SST. Proteins: MYSS, BGAL, PHS2, ALBU, OVAL, CAH2, ITRA, LYC.]

Figure 4. Amido Black stained proteins and auto-radiography of 14C samples.
The field of protein identification has expanded over the last few years with the improvements in accuracy and sensitivity of MS instruments. At the same time, the development of computer resources has helped to speed up the analysis of the huge amounts of data generated by MS instruments, and to increase the number of nucleotide and protein sequence entries in specialised databases. Much progress has been made concerning high throughput facilities to prepare samples and run 2-DE gels. Furthermore, the automatic analysis from complete 2-DE gels up to the mass spectra data is already possible without human intervention (Binz et al., 1999; Traini et al., 1998a). Many available PMF identification and post-identification software tools are able to assist with protein identification, but the final analysis of the results still requires human interpretation and validation. Some bioinformatics software tools for proteomics combine data analysis, statistics and artificial intelligence methods to manage MS data, to identify proteins and to update databases. In this section, specific tools used to identify proteins are reviewed. They use lists of peptide mass values from MS or MS/MS as input, and they may also combine this information with amino acid sequence tag information or amino acid composition to enhance the identification of proteins. Figure 6 shows a simplified flow chart of sample preparation and MS data collection. It also shows the techniques and tools for protein identification described in this section. Bioinformatics must also deal with the huge amount of data generated experimentally by the wet-lab, as well as with the data generated by the outputs of its own tools. For this purpose, a laboratory information management system (LIMS) must be precisely designed for each specific need.

[Figure 5 image not reproduced. Bar chart of Abs. Spot Volume (0 to 70000) for the proteins MYSS, PHS2, BSA, OVAL, CAH2, LYC and their mean. Series: Control Transfer with SST; DPD.]

Figure 5. Absolute 14C-signal intensity of the control transfer and DPD process.
3. PROTEIN IDENTIFICATION USING BIOINFORMATICS TOOLS

3.1. Protein identification by PMF tools using MS data

PMF, currently the most common method used to identify proteins in a high throughput environment, is based on the comparison of a list of experimental peptide masses with theoretical peptide masses. The experimental masses are generated from the MS measurement of an enzymatically digested protein sample. The theoretical masses are obtained from an in silico digestion of all sequences in a database. The goal is to find the protein(s) whose peptide masses show the best match with the experimental fingerprint. The method can be divided into 3 steps. The first step is peak detection, i.e. the selection from the mass spectra of the masses most relevant for protein identification. Frequently, only a few experimental peptide masses in the fingerprint match the theoretical masses, and it is therefore crucial to detect low-intensity but "important" peaks as well, while at the same time avoiding the selection of too many "non-important" peaks. The second step is the comparison of the selected experimental peptide mass values to all protein sequences in a database, which are theoretically cleaved by applying the cleavage rule corresponding to the enzyme used for the sample digestion. Finally, a similarity rule (score) should provide a measure of the quality of fit of the matched values, in
order to either automatically interpret the result and choose the best-matching protein, or to help the user to identify the correct protein.
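The comparison and scoring steps described above can be illustrated with a minimal Python sketch. It is not the code of any published PMF tool. The assumptions are: trypsin cleaves after K or R except before P, one missed cleavage is allowed, masses are monoisotopic [M+H]+ values of unmodified peptides, the tolerance is 60 ppm, and the score is simply the number of experimental masses matched. The two database entries are toy sequences assembled here from peptides of Tables 7 and 8; they are not real protein sequences.

```python
# Monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(sequence, missed_cleavages=1):
    """In silico digestion: cut after K or R unless the next residue is P."""
    cuts = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(sequence))
    peptides = []
    for start in range(len(cuts) - 1):
        last = min(start + 2 + missed_cleavages, len(cuts))
        for end in range(start + 1, last):
            peptides.append(sequence[cuts[start]:cuts[end]])
    return peptides


def peptide_mh(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def count_score(experimental_masses, sequence, tolerance_ppm=60.0):
    """Simplest score: number of experimental masses matched within tolerance."""
    theoretical = [peptide_mh(p) for p in tryptic_peptides(sequence)]
    matched = 0
    for observed in experimental_masses:
        if any(abs(observed - t) / t * 1e6 <= tolerance_ppm for t in theoretical):
            matched += 1
    return matched


def rank_database(experimental_masses, database):
    """Return (entry name, score) pairs ordered by decreasing score."""
    scores = [(name, count_score(experimental_masses, seq)) for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical two-entry database built from peptides listed in Tables 7 and 8.
TOY_DATABASE = {
    "toy_albumin": "YLYEIARLVNELTEFAKHLVDEPQNLIK",
    "toy_receptor": "SDTSSMSGRIEPELSDSEKLLQEDQIALLK",
}

if __name__ == "__main__":
    fingerprint = [927.50, 1163.64, 1305.72]
    for name, score in rank_database(fingerprint, TOY_DATABASE):
        print(name, score)
```

Counting matches is the crudest possible similarity rule; it favours long sequences, which is why real tools restrict the protein mass range or weight the matches.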
Figure 6. Possible schematic and simplified data flow for protein identification using mass spectrometry.
In the comparison phase, apart from the experimental peptide masses and the proteinase used to digest the proteins, some optional attributes may be specified to reflect experimental conditions and to reduce the search space. These optional attributes may include information coming from the sample, such as the species of origin, the Mr or pI of the whole protein with the accepted error range, and possible chemical or artefactual modifications like carboxymethylation of cysteines or oxidation of methionines. Other parameters to be specified include the mass tolerance or the minimum number of matching peptides required for a protein to be
suggested as a possible match. Providing as much information as is available about the sample helps to decrease the number of candidate proteins, to reduce the probability of false positive matches, and thus to increase the confidence of the identification. However, one must take care that overly restrictive parameters do not exclude the correct protein.

3.1.1. Peak detection
Peak detection is an important step in the identification process. Sometimes only a few experimental peptide masses in the fingerprint match the theoretical masses, and therefore the failure to detect a relevant peak can hinder the correct identification of a protein. However, if too many false peaks are considered, this may lead to erroneous database matches causing false identifications, as well as increasing the search duration. Furthermore, it is important to determine the peptide masses precisely. Algorithms that perform peak detection usually take into consideration the probable isotopic distribution when looking for the relevant monoisotopic masses. For example, Breen et al. (Breen et al., 2000) use a Poisson model to calculate the isotope distribution in order to select the monoisotopic peaks. These algorithms should also be able to separate overlapping isotopic patterns. In some cases, the peak detection software delivered with the spectrometer hardware, designed to determine the monoisotopic masses, does not have the necessary flexibility. In our case, for example, the peak detection software had to be rewritten in order to be integrated into the automated high throughput identification pipeline. A genetic algorithm was proposed to optimise the thresholds needed for peak detection (Gras et al., 1999). We then showed that the identification results correlate strongly with the peak detection thresholds.

3.1.2. Identification Tools
Several tools are available to identify proteins using PMF. They all compare peptide masses obtained from mass spectrometry experiments to the theoretical peptide masses obtained from a theoretical digestion of all sequences in a protein sequence database. The programs generate a list of protein entries, ordered by a score that tries to reflect the fit between theoretical and experimental parameters. The order of the suggested proteins in the result list is therefore of paramount importance for an easy interpretation of the identification, in particular when manual intervention should be minimised. All programs compute scores for each hit; some of these scoring systems are very simple, while others use probabilistic methods to increase confidence in the matching protein. A list of PMF tools and their URLs is given in Table 11. The simplest scoring method counts the number of peptide masses matched. This is applied by the PeptideSearch tool (http://www.mann.embl-heidelberg.de/Services/PeptideSearch/PeptideSearchIntro.html), which queries a non-redundant database (nrdb), as well as by our PeptIdent program (Binz, Wilkins et al., 1999), which searches the SWISS-PROT and TrEMBL databases (Bairoch &
Apweiler, 2000). When using tools based on this scoring approach, it is important to note that an upper boundary for the intact protein mass should be specified, since a score based on the number of matched peptides alone clearly favours high molecular weight proteins. The MOWSE program (Pappin et al., 1993) determines a score by considering the frequency of each peptide mass in the NCBInr database, a process that gives stronger weights to heavy peptides, as these peptide masses are observed less frequently. This score also takes into account the presence of missed cleavage sites in matched peptides: the user can choose to down-weight the contribution of partially cleaved peptide fragments to the score by specifying a value for the so-called pFactor. The MS-Fit program (Clauser et al., 1995) uses a similar scoring method, and can be used to search several databases, including NCBInr, GenPept, pdbEST, and SWISS-PROT. The algorithm of the Mascot program (Perkins et al., 1999) is based on the one used by MOWSE, but it introduces a probability-based score, which considers the matches as random events depending on the number of entries in the database. ProFound (Zhang & Chait, 2000) calculates a probability for the identification of the correct protein, given by a Bayesian formula, and uses the distance between experimental and theoretical masses obtained from the NCBInr database. A more recent version of this tool (see Table 11 for the URL) takes into consideration many more attributes with the same scoring method (personal communication). For the sake of clarity, it will be referred to in this document as ProFound-New. Finally, the MassSearch program also determines an identification score, based on the probability of randomly obtaining a match of n experimental masses with n theoretical masses, given the interval of possible masses and the maximum distance between masses accepted in this match. This process is repeated through epochs. 
At each epoch, a new mass is added, by increasing the allowed maximal distance, until a maximum probability is reached. All these algorithms use various attributes (in addition to the mass values) to limit the number of candidate proteins. However, they make little use of this information in their score calculation, since they use at most one or two of these attributes, such as the presence of missed cleavage sites or the mass distribution in the database. These represent only a small part of the parameters that could influence the quality of identification. We proposed a scoring scheme that considers about 30 attributes with their respective contributions to the score values (Gras et al., 1999). As the importance of these contributions is difficult to estimate, this approach uses a learning algorithm: a genetic algorithm has been implemented to estimate the weight corresponding to each specific attribute. As a result, the discrimination rate of candidate proteins can be enhanced, i.e. the score makes it possible to distinguish between false positive and correct matches. This method, which is implemented in the SmartIdent tool, is also robust against mass calibration errors, since it uses a linear regression method to qualify the global goodness of the matches between the experimental masses and the theoretical ones.

Table 12 shows a comparison of the results of some of the available PMF programs when analysing a very difficult mass spectrum. The query was made for a list of 60 masses measured on a Voyager Elite MALDI-TOF MS. The sample corresponds to protein G3PC_ARATH (SWISS-PROT P25858) obtained after 2-DE separation and tryptic digestion as described by Bienvenut et al. (Bienvenut et al., 1999). In this comparison, the parameter values used in all identification programs were identical whenever possible, and they are detailed in the table legend. Where the parameters were not comparable, the default parameters were used. The difficulties that these different programs encountered in identifying the correct protein are worth analysing. SmartIdent, MS-Fit, ProFound and ProFound-New with the SWISS-PROT database retrieved the correct protein entry as the first hit. PeptIdent with the SWISS-PROT database ranked this protein 2nd, with the same score as the first hit, since both have the same number of peptide masses matched. PeptIdent with the SWISS-PROT and TrEMBL databases, and Mascot and ProFound-New with the NCBInr database, assigned ranks as high as 8 to this protein.

Table 11: Programs freely available on the Internet for protein identification

Program Name | Scoring Type | Search Database | Internet URL address | References
Protein attribute (MS or MS/MS):
Mowse | Mowse | OWL* | http://srs.hgmp.mcr.ac.uk/ | Pappin et al., 26
Mascot | Probabilistic models | OWL*/NCBInr | http://matrixscience.com/ | Perkins et al., 155
MS-Fit/MS-Tag/MS-Seq | Mowse | SwissProt/Genepept/pdbEST/OWL*/NCBInr | http://www.prospector.uscsf.edu/ | Clauser et al., 68
ProFound | Bayesian algorithm | NCBInr/SwissProt | http://prowl.rockefeller.edu/cgi-bin/ProFound/ | Zhang and Chait, 132
PeptideSearch | Number of peptides matched | NCBInr | http://www.mann.embl-heidelberg.de/Services/PeptideSearch/PeptideSearchIntro.html | -
PeptIdent | Number of peptides matched | SwissProt/TrEMBL | http://www.expasy.ch/ | Binz et al., 139
SmartIdent | Heuristic score based on learning method | SwissProt/TrEMBL | http://www.expasy.ch/ | Gras et al., 51
PepFrag | Probabilistic models | SwissProt/PIR/NR/dbEST and others | http://prowl.rockefeller.edu/PROWL/pepfragch.html | Fenyo et al., 159
Other tools:
SeqMS | - | - | http://www.protein.osaka-u.ac.jp/organic/SeqMS.html | F-Cossio et al., 167
MassXpert | - | - | http://frl.lptc.u-bordeaux.fr | -
GlycoMod | - | - | http://www.expasy.ch/tools/glycomod/ | Cooper et al., 174
* OWL was last updated in May 1999 (release 31.4)

Table 12: Identification of G3PC_ARATH (GLYCERALDEHYDE 3-PHOSPHATE DEHYDROGENASE, CYTOSOLIC), Species Arabidopsis thaliana, SWISS-PROT (SP) entry P25858 using different PMF tools. The restricting parameters used were Arabidopsis thaliana for the species, a minimum number of 4 matched masses, a maximal tolerance for masses of 60 ppm, at most one missed cleavage of tryptic peptides allowed, and the modifications accepted were oxidised methionines and cysteines treated with iodoacetamide to form carboxyamidomethyl cysteines. The databases queried by each program are listed in the right column. In the score column, the first value is the score of the first candidate protein, followed in parentheses by either the score of the second candidate protein (if the first one is the correct one) or the score of the correct protein

Program Name | Rank | Score | Discrimination value/rule | Database
SmartIdent | 1 | 130.41 (80.40) | 0.62 | SP/TrEMBL
PeptIdent | 2 | 0.08 (0.08) | - | SP
PeptIdent | 3 | 0.15 (0.08) | - | SP/TrEMBL
Mascot | 8 | 41 (29) | If score > 58, then significant | NCBInr
MS-Fit | 1 | 3890 (156) | - | SP
MS-Fit | 1 | 634 (633) | - | NCBInr
ProFound | 1 | 0.36 (0.29) | - | NCBInr
ProFound-New | 2 | 1.0 (0.00019) | - | NCBInr
ProFound-New | 1 | 0.99 (0.01) | - | SP
PeptideSearch | 9 | 6 (6) | - | SP/TrEMBL
Starlab (Gent) | 1 | - | - | nrdb
When querying the NCBInr database, the identification ranks differ completely depending on which PMF tool is used. For the example shown in Table 12, MS-Fit and ProFound ranked G3PC_ARATH at the top, while Mascot and ProFound-New gave this protein a lower score. The MS Fingerprinting Spectrum Analysis Tool from Starlab, Gent (http://genesis.rug.ac.be/~nikos/) was also tested, even though its scoring method is not described in this document (data not published, personal communication). It is also apparent that the size of the database has a strong influence on the PMF result. For example, in the SmartIdent output, the use of SWISS-PROT alone or of SWISS-PROT/TrEMBL as target databases modified the level of discrimination of the result. This problem is mainly related to the database size (88757 entries in SWISS-PROT Release 39.7 and 300152 entries in TrEMBL Release 14.17): a larger database increases the probability of generating a false match. In the outputs of the programs that ranked the protein correctly, the score values and the discrimination values gave good hints about the quality of the match, particularly when comparing the first hit against the second one. One must bear in mind that this mass fingerprint was very difficult to analyse and was specially chosen to outline the gambling aspects of PMF analysis.

Table 13: Query for 60 mass values of a MALDI-TOF MS spectrum from a mixture of PON2_HUMAN (SERUM PARAOXONASE/ARYLESTERASE 2, Species Homo sapiens, SWISS-PROT accession number Q15165) and the marker GT26_SCHJA (GLUTATHIONE S-TRANSFERASE 26 KDA, Species Schistosoma japonicum, SWISS-PROT accession number P08515). The query was made for all available species, the minimum number of matched masses was 4, the maximal tolerance for masses was 40 ppm, at most one missed cleavage for tryptic peptides was allowed, and the modifications accepted were artefactual modification of cysteines with acrylamide and oxidised methionines. Mr values were delimited between 20 and 40 kDa. The databases queried by each program are listed in the right column

Program Name | GT26_SCHJA: Rank | GT26_SCHJA: Number of peptides matched | PON2_HUMAN: Rank | PON2_HUMAN: Number of peptides matched | Database
PeptIdent | 1 | 15 | 7 | 10 | SP/TrEMBL
Mascot | 1 | 16 | 38 | 8 | NCBInr
MS-Fit | 1 | 15 | 2 | 10 | NCBInr
ProFound | 1 | 15 | 2 | 5 | NCBInr
PeptideSearch | 1 | 11 | 25 | 7 | SP/TrEMBL/TrEMBLnew
Starlab (Gent) | 1 | 13 | 32 | 9 | nrdb
Table 13 presents the analysis of PMF results for a mixture of two proteins. The PMF of the protein PON2_HUMAN (SERUM PARAOXONASE / ARYLESTERASE 2, Species Homo sapiens, SWISS-PROT accession number Q15165) and the marker GT26_SCHJA (GLUTATHIONE S-TRANSFERASE 26 KDA, Species Schistosoma japonicum, SWISS-PROT accession number P08515) was obtained after 1-DE separation and tryptic digestion as described by Bienvenut et al. (Bienvenut et al., 1999). All programs found the marker as the first hit and PON2_HUMAN as the next distinct protein: where PON2_HUMAN was not the second hit, the preceding hits in the list were variants of the marker. ProFound offers the possibility to decide whether the mass values correspond to one single protein or to a mixture of up to four proteins. Selecting one single protein or a mixture of two proteins does not change the results, i.e. PON2_HUMAN and GT26_SCHJA are always at the top of the hit list with high scores. In fact, these
results show that the performance of PMF tools depends strongly on the search databases. All the available tools have their own characteristics, and each of them has its own strengths and weaknesses. It is therefore not surprising that they can produce quite different results for the same sets of peptide masses, particularly for “difficult” query spectra such as the case of protein G3PC_ARATH. With the aim of giving users the opportunity to take advantage of the features of a large number of tools whilst having to fill in only a single submission form, the CombSearch tool (http://www.expasy.ch/tools/CombSearch/) was designed. It simultaneously submits the specified input data to several protein identification tools available on the internet, and tries to assist in the integration of the results.
3.2 MS/MS Ions Search
Peptide mass fingerprinting characterises each peptide by only one attribute, the peptide mass value. By itself, a single mass value does not reveal much about the peptide or the protein sequence. However, other attributes, such as ions from internal sequences of peptides obtained from successive MS fragmentation, may give better hints for protein identification. Algorithms similar to those of PMF are also used for MS/MS ions search. In general, all proteins contained in a database are digested in silico to find parent peaks. These theoretical parent peptides are then fragmented in silico, and the experimental MS/MS pattern is compared to the theoretical patterns (Eng et al., 1994; Mann & Wilm, 1994). The discrimination score is given by correlating theoretical and experimental fragments. Some of the scoring methods of peptide mass fingerprinting tools are used in the analysis of MS/MS mass spectra of peptides. PeptideSearch, for example, counts the number of experimental and theoretical fragments matched, whereas Mascot uses a probabilistic score (personal communication).
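The digest-fragment-compare scheme described above can be illustrated with a minimal Python sketch. It is not the implementation of any of the cited programs: the cleavage rule is a simplified tryptic rule without missed cleavages, only singly charged b- and y-ions are generated, the score is a plain match count, and the function names and the 0.5 Da tolerance are assumptions made for illustration.

```python
# Monoisotopic residue masses in Da (standard values, subset of amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056   # mass of H2O
PROTON = 1.00728   # mass of a proton


def tryptic_digest(protein):
    """Cleave after K or R unless followed by P (simplified, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def fragment_ions(peptide):
    """Return sorted m/z values of singly charged b- and y-ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return sorted(b_ions + y_ions)


def count_matches(experimental, theoretical, tol=0.5):
    """Count experimental peaks lying within tol (Da) of a theoretical fragment."""
    return sum(any(abs(e - t) <= tol for t in theoretical) for e in experimental)


# Hypothetical protein and hypothetical experimental MS/MS peak list.
peptides = tryptic_digest("GASKPLRVTEK")
spectrum = [147.11, 201.12, 276.16, 330.17]
scores = {p: count_matches(spectrum, fragment_ions(p)) for p in peptides}
best = max(scores, key=scores.get)
```

A simple count like this treats all fragments equally; probabilistic or correlation-based scores weight the evidence differently.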
The Sequest program (Eng et al., 1994) is a blend of two approaches and runs in two sequential steps. The first step counts the number of matches among all ion fractions. The second step takes a certain number of the best matches and builds a theoretical spectrum where the peak intensities depend on the ion types. The score is then given by correlating the theoretical and experimental spectra through their Fourier transforms and choosing the best correspondence in this space. In a more recent version, called Turbo Sequest (http://www.thermoquest.com/turbosequest.html), the computation time required for the first step was reduced by the use of an indexed database. In this case, the indexes are based on the peptide masses. Most of these algorithms are well adapted to peptides fragmented by CID, whereas the tool available at Starlab, Gent (http://genesis.rug.ac.be/~nikos/) is adapted to ions fragmented by PSD. Even though this tool was presented publicly in Siena 2000 (Gevaert & Vandekerckhove, 2000), its scoring method has not yet been published. Other possibilities to improve the results of MS/MS data comparison are to combine this information with peptide sequence tags or the amino acid composition. This information is taken into account in programs such as MS-Tag, MS-Seq, PepFrag (Fenyo et al., 1998) and PeptideSearch.
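The benefit of indexing a database by peptide mass can be sketched as follows. This is only an illustration of the general idea, not the Turbo Sequest index format, which is not described here; the function names are hypothetical and the peptide entries are illustrative.

```python
import bisect


def build_mass_index(peptides):
    """Sort (mass, sequence) pairs once and return parallel sorted lists."""
    ordered = sorted(peptides)
    masses = [mass for mass, _ in ordered]
    sequences = [seq for _, seq in ordered]
    return masses, sequences


def candidates(index, parent_mass, tol):
    """Return all indexed peptides whose mass lies within parent_mass +/- tol."""
    masses, sequences = index
    lo = bisect.bisect_left(masses, parent_mass - tol)
    hi = bisect.bisect_right(masses, parent_mass + tol)
    return sequences[lo:hi]


# Illustrative entries: VTEK and TVEK have the same composition and mass.
index = build_mass_index([
    (727.434, "GASKPLR"),
    (475.264, "VTEK"),
    (475.264, "TVEK"),
])
hits = candidates(index, 475.26, 0.01)
```

Each parent mass lookup is then a binary search rather than a scan of the whole database, and only the returned candidates need to be fragmented and scored. Isobaric candidates such as the two above can only be separated by their fragment ions.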
The identification of proteins with MS/MS is a powerful technique, especially for the identification of protein mixtures (McCormack et al., 1997). There is also the possibility to search in expressed sequence tag (EST) databases (Yates, 3rd, Eng, & McCormack, 1995). However, it is also important to note that many more modifications are possible when fragmenting the peptides, which results in an exponential growth of the combinatorial search space.
3.3 De novo sequencing
Very often, no exact database match can be found even with high quality MS/MS mass spectra. This depends directly on the completeness and accuracy of the database searched, i.e. whether the genome is complete or incomplete, and on the quality of the transcribed EST sequences. These problems raise the question whether it is a novel protein, a known protein with a post-translational modification, or whether the failure to produce a database match is due to inter-species variation, database sequence errors, or unexpected proteolytic cleavages. To address this problem, de novo interpretation of MS/MS data is an alternative where the amino acid sequence of peptides is derived by interpreting the mass differences between the generated MS/MS fragment ions. The automatic or visual interpretation of MS/MS data requires considerable effort. De novo peptide sequencing algorithms generate results that may be ambiguous since the analysis of MS/MS ions is not a simple task. Noisy and incomplete data, or ion types dependent on the ionisation method, are some of the experimental problems to be addressed. Although no tools freely available on the Internet integrate protein identification with de novo sequencing algorithms automatically, some isolated programs deduce peptide sequences from a list of fragment ion masses. Their algorithms fall mainly into two different approaches.
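Before turning to the two approaches, the basic idea of reading residues from mass differences can be illustrated with a short sketch. It assumes, for simplicity, that all supplied masses belong to a single ion series and uses a hypothetical tolerance of 0.02 Da; real spectra mix several ion types, which is precisely what makes the task difficult.

```python
# Monoisotopic residue masses in Da (standard values, subset of amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259,
}


def read_ladder(ion_masses, tol=0.02):
    """Translate consecutive mass differences of one ion series into residues.

    Ambiguous differences are reported as 'X/Y'; differences that match no
    single residue are reported as a bracketed mass gap.
    """
    ions = sorted(ion_masses)
    sequence = []
    for lower, upper in zip(ions, ions[1:]):
        diff = upper - lower
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - diff) <= tol]
        sequence.append("/".join(hits) if hits else f"[{diff:.2f}]")
    return sequence


# b-ion ladder of the hypothetical peptide VTEK (b1, b2, b3).
complete = read_ladder([100.07569, 201.12337, 330.16596])
# The same ladder with b2 missing leaves an unexplained gap.
incomplete = read_ladder([100.07569, 330.16596])
```

The second call shows why incomplete fragmentation leads to ambiguous results: the gap of about 230 Da corresponds to two residues whose order cannot be read from this ladder alone.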
The first approach described, a global one, generated all possible fragments of amino acids, compared them to theoretical fragments and kept the best matches (Sakurai, Matsuo, Matsuda, & I, 1984). Prefix pruning algorithms were also applied to reduce the combinatorial explosion, since the number of sequences grows exponentially with the length of the peptides (Johnson & Biemann, 1989). This approach restricts the search space to sequences that best match the experimental spectra according to their prefix. The drawback of this method is that regions of a sequence that are under-represented by fragment ions may be discarded before any analysis. The second paradigm, a local search, provides more efficient results and is based on graph theory. The peaks in an experimental spectrum are transformed into vertices in a spectrum graph, where the edges correspond to differences of masses between two ions. According to experimental conditions, these mass gaps can result from several amino acids, single amino acids, fragments of amino acids or modifications (Fernandez-de-Cossio, Gonzalez, & Besada, 1995; Fernandez-de-Cossio et al., 1998; Taylor & Johnson, 1997). The solution is given by searching for the path in the resulting acyclic graph which has the best score. Scores may be calculated based on probabilistic models, implemented in the so-called Sherenga algorithm (Dancik et al., 1999) and in the SeqMS tool (Fernandez-de-Cossio et al.,
1998), or on a combination of ion intensities and cross-correlation (Taylor & Johnson, 1997). For Zhang and McElvain (Zhang & McElvain, 2000), the peptide sequences are obtained by reading the intersection spectrum of an MS/MS daughter ion and its granddaughter ions from MS3 (a third stage of mass spectrometry). This intersection spectrum represents the peaks common to both MS2 and MS3 and is obtained through an arithmetic mean of the ion intensities. This algorithm also uses cross-correlation and ion intensities to calculate the similarity score of the different spectra. Other innovative algorithms for de novo sequencing have been developed based on learning methods from artificial intelligence. Stranz and Martin (Stranz & Martin, 1998), instead of using the usual graph theory, proposed an adapted genetic algorithm to optimise the search for possible combinations of amino acid masses. The Sherenga algorithm (Dancik et al., 1999) automatically learns fragment ion types and intensity thresholds from a collection of test spectra. This information is then used to help de novo interpretation of peptides. Scarberry et al. (Scarberry, Zhang, & Knapp, 1995) trained artificial neural networks to classify observed ion fragments into specific ion types (y, z, b, …) before deriving the sequence spectra.
3.4 Other tools related to protein identification
A range of proteomics tools is available that allows one to go beyond simple protein identification. Most proteins from higher eukaryotes undergo co- and/or post-translational modifications. These modifications can involve either a cleavage process (thus eliminating signal sequences, transit or pro-peptides and initiator methionines) or the addition or removal of many different simple chemical groups (e.g. hydroxyl, carboxyl, acetyl, methyl, phosphoryl, etc.), as well as the addition of more complex molecules, such as sugars and lipids.
In these cases, the in silico calculation of peptide masses using non-annotated databases will no longer match the masses obtained experimentally. These modifications are important properties of proteins and therefore need to be characterised in order to describe the mature protein. A comprehensive tool for high throughput mass spectrometric discovery of protein post-translational modifications is the FindMod tool (Wilkins et al., 1998) available on the ExPASy server (http://www.expasy.ch/tools/findmod/). This tool considers some 30 post-translational modifications, applying many different rules derived from documented post-translational modifications in SWISS-PROT (Bairoch & Apweiler, 2000) and from the PROSITE protein family and domain database (Hofmann, Bucher, Falquet, & Bairoch, 1999). FindMod can also suggest possible single amino acid substitutions. Such substitutions can occur in proteins translated from polymorphic genes. A similar tool that assists calculation and prediction of possible post-translational protein modifications is GlycoMod (Cooper, Wilkins, Williams, & Packer, 1999). This tool deals only with protein glycosylation, probably the most common and complex type of protein modification. GlycoMod is available on the ExPASy server (http://www.expasy.ch/tools/glycomod/) and not only computes possible monosaccharide compositions corresponding to the mass of a glycopeptide,
but also allows inclusion of a range of options such as oligosaccharide release or derivatisation strategies in the calculation. Peptide masses that do not immediately match theoretical masses in the identification process may result not only from a post-translational modification, but may also arise during the processing of the sample. A tool that can identify possible peptides that have resulted from non-specific chemical or enzymatic cleavage of proteins is the FindPept tool (http://www.expasy.ch/tools/). At this point we would also like to mention a versatile tool to model mass spectra and mass fragmentation spectra in silico. A range of proteomics tools is combined in MassXpert, a free application by F. Rusconi, which can be downloaded from http://frl.lptc.u-bordeaux.fr. Supported features include cleavage of a protein by different enzymes, fragmentation of peptides or small proteins, an m/z calculator for a given protein mass and many more.
3.5. Data storage and treatment with LIMS
The combination of 2-DE gels with mass spectrometry results in huge amounts of data. It is essential to manage these data in a centralised way in order to simplify their analysis and database updates. The bioinformatics analysis of experimental data also produces results and data that must be managed and available at all times. A LIMS (Laboratory Information Management System) is a software application that uses a relational database to assemble heterogeneous data, such as gel images, mass spectra, samples, experiments, and related documents, and also provides the tools to allow such data to be entered, tracked and reported (Avery, McGee, & Falk, 2000). A LIMS has or should have the following characteristics:
(1) Instrument management that allows centralised storage of maintenance and calibration records.
(2) Data management that enables the complete laboratory environment to be mapped onto a database. This allows organising information about personnel, instruments, analytical methods, work procedures and costs.
(3) A wide range of validation techniques that ensure the integrity of data.
(4) Sample management, which provides a variety of techniques for registration, processing, authorisation and archiving of routine and non-routine samples, standards and reference materials, as well as commonly used test sequences.
(5) Resource management capabilities that include instrument backlog reporting, as well as personnel time management. In addition, costs associated with analysis may be calculated and invoiced to client accounts.
(6) Communication management that ensures that important information reaches decision-makers with minimal delay. Access to data within a LIMS may be achieved through a wide variety of mechanisms including the industry standard SQL. Information can be communicated using all common networks and interfaces.
(7) Quality management, which is particularly important in regulated environments and can be achieved through audit trail and validation facilities. Quality control is enhanced through specification libraries and action triggers with graphical data interpretation.
(8) A security configuration that allows standard functionality to be used to create a secure system. This includes setting up passwords, authority levels and menus for each user.
A LIMS should provide great flexibility and a variety of functions without creating a system that requires expensive hardware platforms or one that is difficult to change, e.g. one in which small changes require major system rewrites. The best LIMS are those that have simple or straightforward software architectures and run on inexpensive platforms or networks. Invariably, the keys to their success are flexibility, adaptability, ease of evolution and support, and most importantly overall system speed. The speed issue is very critical, as personnel will not use something that is slow or awkward.
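To make the relational aspect of these characteristics more concrete, the following sketch builds a very small LIMS-like schema with Python's built-in sqlite3 module. The table and column names are hypothetical and are not taken from any existing LIMS. The foreign keys stand for data integrity (3), and the trigger stands for a rudimentary audit trail (7).

```python
import sqlite3

SCHEMA = """
CREATE TABLE sample (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    registered_by TEXT NOT NULL
);
CREATE TABLE gel (
    id INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    image_path TEXT
);
CREATE TABLE spectrum (
    id INTEGER PRIMARY KEY,
    gel_id INTEGER NOT NULL REFERENCES gel(id),
    peak_list TEXT
);
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    table_name TEXT,
    row_id INTEGER,
    action TEXT,
    logged_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Record every newly registered sample in the audit trail.
CREATE TRIGGER sample_audit AFTER INSERT ON sample
BEGIN
    INSERT INTO audit_log (table_name, row_id, action)
    VALUES ('sample', NEW.id, 'insert');
END;
"""


def open_lims(path=":memory:"):
    """Open a database, enforce foreign keys and create the hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = open_lims()
conn.execute(
    "INSERT INTO sample (name, registered_by) VALUES (?, ?)",
    ("E. coli extract", "user1"),
)
conn.execute("INSERT INTO gel (sample_id, image_path) VALUES (1, 'gel_001.tif')")
audit_rows = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
```

With foreign keys enforced, a gel that refers to a non-existent sample is rejected by the database itself rather than by application code.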
Figure 7. The architecture of a simplified client/server implementation of a LIMS.
An example of a simplified client/server architecture of a LIMS is given in Figure 7. To allow clients to access the LIMS, the system is centralised on a Web server. When connecting to the server, users receive, via the network, the forms corresponding to their request. Once completed, these forms are sent back to the server, which processes the information accordingly. Useful information is stored in the LIMS database. The software used in each stage may be, for example: Server: Java Web Server, IIS, Apache, etc.; Database: Oracle, Access, MySQL, etc.; Network: Internet; Client: PC, SUN, etc.; Forms: created with servlets (Java) or CGI and Perl. LIMS are very specialised software packages which require great financial and time investments. This may explain why freely available solutions have not been reported to date; one of the commercial versions adapted to proteomics is BioLIMS from PE Informatics.
3.6. Concluding remarks
One of the main objectives of research in proteomics is the automation of all procedures from sample acquisition to protein identification. The role of bioinformatics is to treat all available data to obtain unique identification of proteins without human intervention. The recent generation of mass spectrometers produces more accurate data, with precise mass values, using MSn, where n is the number of successive stages of mass spectrometry (Gates, Kearney, Jones, Leadlay, & Staunton, 1999; Sullards & Reiter, 2000). The complexity of the PMF must also be explored to optimise the identification tools by means of theoretical models that predict PMF spectra. Currently, most of the tools use a minimum of information to identify proteins. As described in this section, the experimental information required for protein identification is limited to a list of mass values which is compared to theoretical fragments. The rules used to produce such peptides do not reflect the complexity of the spectra, which is due to various factors.
Mass spectra samples contain peptides that are due to specific endoproteolytic cleavage of the target protein but also to unspecific cleavage (Keil, 1982; Thiede et al., 2000) as well as artefactual peptide modifications. The theoretical rules used for in silico protein digestion are often much simpler than the real ones (Keil, 1982, 1992). It is also difficult to prevent sample contamination by different kinds of contaminants, e.g. keratin and endoprotease autolysis products, or by disturbing agents, e.g. SDS or non-volatile salts, that could also affect the signal. The chemical and physical properties involved in the mechanisms of ion formation are particularly complex (Karas et al., 2000; Vestal & Juhasz, 1998; Zenobi & Knochenmuss, 1998) and as a consequence they are impossible to model. Preliminary studies towards the development of efficient spectra modelling tools have been published. These include, for example, the analysis of the distribution of peptide masses generated by in silico digestion tools from protein sequence databases (Gay et al., 1999; Pappin et al., 1993) and the influence of a sequence on spectra intensity (Jungblut et al., 1990).
Bioinformatics tools should be easily adaptable to integrate all the new data generated by their analyses. We should envisage that one day an automatic system would integrate the experimental data (such as a LIMS), all kinds of mass spectrum values (MS, MS/MS) and algorithms (PMF, ions search, de novo sequencing) to produce reliable and automatic protein identifications.
4. BIOINFORMATICS TOOLS FOR THE MOLECULAR SCANNER
As described previously, 2-DE gels are a method of choice to separate a large number of proteins with high resolution. If properly stained, a gel provides a two-dimensional graphical representation of a proteome. Peptide mass fingerprinting is at present one of the most effective and rapid methods of identifying proteins excised from a 2-DE gel. In order to apply this method to a large number of spots, various approaches have been described (Lopez, 2000; Traini et al., 1998b). The molecular scanner approach (Binz, Muller et al., 1999) combines parallel methods for protein digestion and electrotransfer (Bienvenut et al., 1999) with peptide mass fingerprinting methods. Two scanning experiments will be discussed. In the first one, a sample of 1 mg E. coli was separated with a mini 2-DE gel and the rectangle excised from the collecting PVDF membrane was scanned on a 48x32 grid with a sampling distance of 0.25 mm in both directions (Bienvenut et al., 1999). For the second experiment, human plasma was used and the membrane was scanned on an 80x16 grid with a sampling distance of 0.25 mm in the horizontal direction and 0.5 mm in the vertical direction (Binz, Muller et al., 1999). The software tools used to analyse the spectra, to perform the identifications and to visualise the results are described in this section.
4.1. Peak detection and spectrum intensity images
By means of home-made sample and data pipelining software, the user can launch an analysis of these data. First, the peptide peaks have to be detected in all spectra. The next task is to create a virtual image that shows the presence of proteins on the membrane. The height of a peptide peak depends on many parameters and does not give good quantitative information on the concentration of peptides (Kratzer et al., 1998). Still, if we sum up the heights of all the peaks detected in a spectrum, valuable qualitative information on the presence of proteins can be obtained. By doing this for every sampling point of the scan, we obtain an image representing the intensity of the spectra as depicted in Figure 8C and 8D. Figure 8 shows that there is a good correspondence between the SWISS-2DPAGE (Hoogland et al., 183) and intensity images. Since the SWISS-2DPAGE reference gels were run with an acrylamide concentration gradient in the second (Mr) dimension and the mini 2-DE gels were not, the Mr scales of the reference gels and the intensity images are different. While there was a very good correspondence for the E. coli scan, the correspondence was less obvious for the human plasma scan, because of the proximity of the immunoglobulin and albumin spots, whose peptides were abundant and disturbed the ionisation of peptides from other proteins.
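The construction of such an intensity image can be sketched as follows. The data layout (a dictionary mapping grid positions to peak lists) and the function names are assumptions made for illustration. The 3x3 mean filter is also an assumption, since the text only states that the images were smoothed.

```python
def intensity_image(spectra, n_cols, n_rows):
    """Sum the peak heights of each spectrum into a grid image.

    spectra maps (col, row) to a list of (m/z, height) peaks.
    """
    image = [[0.0] * n_cols for _ in range(n_rows)]
    for (col, row), peaks in spectra.items():
        image[row][col] = sum(height for _mz, height in peaks)
    return image


def smooth(image):
    """3x3 mean filter; edge pixels average over the neighbours that exist."""
    n_rows, n_cols = len(image), len(image[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
            ]
            out[r][c] = sum(window) / len(window)
    return out


# Hypothetical 2x2 scan: one point with two peaks, one empty, one with one peak.
spectra = {
    (0, 0): [(1000.5, 10.0), (1500.7, 5.0)],
    (1, 0): [],
    (1, 1): [(2000.0, 3.0)],
}
image = intensity_image(spectra, n_cols=2, n_rows=2)
smoothed = smooth(image)
```

For the E. coli scan described above, the grid would have 48 columns and 32 rows.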
Figure 8. E. coli (A,C) and human plasma (B,D) reference gels (A,B) and spectrum intensities (C,D) (throughout this section high intensities are black and low ones are white). (A) Portion (pI range 5.1-5.2, Mr 35’000-45’000 Dalton) of the SWISS-2DPAGE reference gel (http://www.expasy.ch/cgi-bin/map2/def?ECOLI4.5-5.5) with annotated spots. (B) Portion (pI range 4.2-5.6, Mr 43’000–65’000 Dalton) of the SWISS-2DPAGE reference gel (http://www.expasy.ch/cgi-bin/map2/def?PLASMA_HUMAN) with some annotated spots. (C) Total intensity of peaks in the E. coli scan and (D) in the human plasma scan, where grey lines indicate spot identities. In order to obtain a better correspondence with the master gels, the images (C)-(D) were smoothed.
4.2. Protein identification
The actual goal of the molecular scanner is the identification of proteins that were separated with a 2-DE gel. Therefore, for each scan point, the lists of peptide masses are submitted to the peptide mass fingerprint identification program SmartIdent (Gras et al., 1999), which searches the protein sequence database SWISS-PROT and returns a list of matching proteins and their score. The resolution power of a 2-DE gel is limited, and therefore several proteins may be found in the same position in a gel (Cavalcoli, VanBogelen, Andrews, & Moldover, 1997). In MALDI MS, the presence of one peptide can attenuate the signal of another and some peptides are difficult to detect (Kratzer et al., 1998), resulting in a limited number of peptides per protein expressed in the spectrum. For the E. coli and human plasma scans the protein concentration on the PVDF membrane was low and we had to set the minimal number of matching peptide masses to 3 in order to identify some weakly expressed proteins. The mass tolerance was set to a value as high as 0.6 Dalton due to calibration errors and one missed cleavage was accepted. Peak lists and identification results are then written into a text file. The 2-DE gel analysis tool Melanie (Appel, Palagi et al., 1997) can read this file and allows loading the molecular scanner data for visualisation. It allows selecting scan points and viewing the identification results and spectra of these points. Figure 9 shows a Melanie image of the human plasma scan, where two scan points were selected. In the point on the right side, immunoglobulin α-1 constant region protein (ALC1_HUMAN) was detected with the best score and good spatial correlation, i.e. the same protein was detected at least two times in the eight surrounding points. The following proteins in the scoring list have a significantly lower score and some of them are isolated identifications (indicated by an asterisk).
α-1-antitrypsin (A1AT_HUMAN) matched with the best score in the other selected point, whereas the next protein (STK2_HUMAN) has a similar score. Two hypotheses could explain this observation: either both proteins are present or there is a false identification. This problem must be carefully investigated and rules must be defined that decide whether the identification is correct or false.
4.3. Validation of identifications
In PMF, false identifications occur if a protein matches by chance some peptide masses detected in a spectrum. These peptide masses might stem from other proteins, from impurities or matrix clusters or might be erroneously detected peaks. The more selective the parameters for the identification program are, the lower is the chance for a false match, but the higher is the chance that a true match is missed (Eriksson et al., 2000).
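This trade-off can be made tangible with a toy example. The sketch below is not SmartIdent and does not compute a score; it simply counts matching masses. The protein names and all mass values are invented for illustration; only the permissive setting (0.6 Da, at least 3 matching masses) echoes the parameters mentioned in the previous section.

```python
def matched(observed, theoretical, tol_da):
    """Observed masses lying within tol_da of any theoretical peptide mass."""
    return [m for m in observed if any(abs(m - t) <= tol_da for t in theoretical)]


def identify(observed, database, tol_da, min_matches):
    """Return {protein: number of matched masses} for proteins passing min_matches."""
    hits = {}
    for name, masses in database.items():
        n = len(matched(observed, masses, tol_da))
        if n >= min_matches:
            hits[name] = n
    return hits


# Invented peak list and invented theoretical peptide masses.
observed = [1045.56, 1179.60, 1475.79, 2025.10]
database = {
    "TRUE_PROT": [1045.55, 1179.62, 1475.75, 1890.00],
    "CHANCE_PROT": [1045.10, 1180.05, 2025.60, 3000.00],
}

permissive = identify(observed, database, tol_da=0.6, min_matches=3)
strict = identify(observed, database, tol_da=0.1, min_matches=3)
too_strict = identify(observed, database, tol_da=0.1, min_matches=4)
```

With the permissive setting both proteins are reported, with the strict tolerance only the correct one survives, and raising the minimum number of matches further loses the correct protein as well.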
Figure 9. Identification data of the human plasma scan. The data were imported into Melanie and two scan points were selected. pI and Mr were calculated from the position of a point on the gel and are shown in the first field. The second field contains a list of the matching proteins (SWISS-PROT accession number and SWISS-PROT name) and their score. Clicking on this field displays the whole list (an asterisk means that the protein was not found a third time in a 3x3 neighbourhood of the point). Clicking on the third field renders the spectrum attached to the scan point with DataExplorer, which allows verifying peak detection and gives information about the intensity of the peaks.
The values chosen for the two scans were not very selective because the aim was to detect also weakly expressed proteins. We therefore had to deal with a large number of false matches. To discard the bulk of them, two selections were applied. In order to avoid isolated matches, we applied a cellular automaton (Toffoli & Margolus, 1987) to the list of matching proteins. A match was discarded if it was not found again in at least two of the eight neighbouring sites and this process was repeated until a stable configuration was reached. Then a threshold for the median score, i.e. the median of the scores of all sites where the protein was found, had to be set. Figure 10A depicts the distribution of the median protein score for the E. coli scan. In Figure 10B the peptide mass fingerprints were compared with the mouse proteins in SWISS-PROT. It was assumed that the mouse proteins in the scanned pI and molecular weight window bear little resemblance to the E. coli proteins and thus provided a statistic for false matches. 95% of these false matches had an average score lower than 340, and this value was taken as the threshold.

Table 14. Protein identifications examined according to the criteria described in the text

SWISS-PROT Name   Median Score   Status   Eliminating Criterion
6PGD_ECOLI        4620.96        Ok
IDH_ECOLI         4533.88        Ok
METK_ECOLI        1860.03        Ok
ALDA_ECOLI        1307.66        Ok
PGK_ECOLI         1104.28        Ok
DHPS_ECOLI        606.52         False    1,2
EUTB_ECOLI        508.55         False    2
FIXC_ECOLI        497.26         False    3
YAGE_ECOLI        475.95         False    2
YBHE_ECOLI        441.67         Ok
ACEA_ECOLI        433.07         Ok
6PG9_ECOLI        430.30         False    2
HSLU_ECOLI        354.67         False    1,2
ATOC_ECOLI        349.61         False    1,2
In order to purge false matches from the list in Table 14, a detailed analysis of every protein was performed using the following criteria.
1. Spot shape
2. Matches with matrix cluster and impurity peaks
3. Reuse of matching peptide masses
All these criteria are rules of thumb and the user has to check them using visualisation tools. The first one serves to eliminate identifications that are poorly localised and therefore do not look like a spot. The second tries to eliminate matches with matrix cluster (Keller & Li, 2000) and impurity peaks. For the third criterion it is assumed that every peptide mass belongs to one matching protein. If a protein matches with a lower score and reuses two or more peptide masses of proteins that matched with a significantly higher score, these masses are discarded. If these masses are indispensable for the match of the former protein, that protein will be eliminated from the list. A detailed discussion of these criteria will be given in Müller et al. (Muller et al., 2002). An important point in criteria 1) and 2) is that they take account of the spatial correlation and distribution of the data. This allows an improvement in the results to a much greater extent than if only localised information was available. We believe that this is one of the strongest features of the molecular scanner.
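The spatial correlation exploited here first enters through the cellular automaton described earlier in this section: a match is kept only if the same protein is found in at least two of the eight neighbouring sites, and the rule is repeated until nothing changes. A minimal sketch of that rule for a single protein is given below. The representation of the matches as a set of grid coordinates and the synchronous update are assumptions of this sketch.

```python
def neighbour_filter(sites, min_neighbours=2):
    """Iteratively discard isolated matches of one protein.

    sites is a set of (x, y) scan points where the protein matched. A site
    survives an iteration if at least min_neighbours of its eight neighbours
    are also in the set. Iteration stops at a stable configuration.
    """
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    current = set(sites)
    while True:
        kept = {
            (x, y)
            for (x, y) in current
            if sum((x + dx, y + dy) in current for dx, dy in offsets) >= min_neighbours
        }
        if kept == current:
            return kept
        current = kept


# Hypothetical matches: a compact 2x2 spot, a tail, an isolated point and a short line.
spot = {(0, 0), (0, 1), (1, 0), (1, 1)}
matches = spot | {(2, 2), (5, 5), (10, 0), (11, 0), (12, 0)}
filtered = neighbour_filter(matches)
```

The short line illustrates why the rule has to be iterated: its two end points are removed in the first pass, and only then does the middle point lose its neighbours.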
[Figure 10: two histograms of Frequency versus Median Score. Panel A: E. coli. Panel B: E. coli - False Matches.]
Figure 10. Histograms of the median score for the E. coli scan. (A) shows the median score resulting from matches of all E. coli proteins in the SWISS-PROT database, whereas (B) shows the average score of matches of all mouse proteins in SWISS-PROT. The matches in (B) provide a statistic for false matches, since the E. coli and mouse databases have a similar size (4602 and 4066 entries, respectively). The vertical lines mark the 95% confidence threshold.
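The way such a threshold could be derived from a decoy search can be sketched as follows. The exact percentile convention used for Figure 10 is not stated, so the nearest-rank definition below is an assumption, and all scores are invented for illustration.

```python
import statistics


def median_score(site_scores):
    """Median of the scores of all sites where a protein was found."""
    return statistics.median(site_scores)


def decoy_threshold(decoy_scores, percent=95):
    """Nearest-rank percentile: smallest score with at least percent % of decoys at or below it."""
    ordered = sorted(decoy_scores)
    rank = (percent * len(ordered) + 99) // 100  # integer ceil(percent / 100 * n)
    return ordered[max(rank - 1, 0)]


# Invented decoy median scores (e.g. from a search against unrelated proteins).
decoys = list(range(10, 410, 20))  # 20 values: 10, 30, ..., 390
threshold = decoy_threshold(decoys)

# Invented per-site scores for two candidate proteins.
candidates = {"PROT_A": [4600, 4650, 4500], "PROT_B": [300, 420, 350]}
accepted = [name for name, scores in candidates.items() if median_score(scores) > threshold]
```

Using the median rather than the mean makes the per-protein score less sensitive to a single outlying scan point.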
Figure 11. Spots detected in the E. coli scan. The spots were calculated using a dendrogram algorithm (Saporta, 189) and the contours enclose 50% of the points where the protein was detected. The proteins PGK_ECOLI and IDH_ECOLI are split into two spots, which might correspond to modifications of these proteins.
Figure 11 shows the proteins that fulfil all the above criteria. IDH_ECOLI and PGK_ECOLI were found in two spots. ACEA_ECOLI and ALDA_ECOLI were only weakly expressed and formed tiny spots. ALDA_ECOLI, 6PGD_ECOLI and YBHE_ECOLI were not identified on the master gel. Apart from identifying spots automatically, a user might be interested in certain features of the data. An important question is whether a protein is in a modified form or not. The user can specify the mass change of this modification and the name of the protein under investigation and then draw a map where the modified peptides are found. Figure 12 summarises the steps needed to analyse a scan. All steps apart from the last one can be fully automated. The automation of criteria 1) and 2) is a more difficult task and we are devising algorithms that can solve this problem.
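The search for a modified form can be sketched as a simple query over the scan data. The data layout and function name are hypothetical. The peptide mass is invented, and the mass change of 79.966 Da (phosphorylation) is chosen purely as an example.

```python
def modification_map(spectra, peptide_masses, delta, tol=0.7):
    """Return the scan points where a peptide mass shifted by delta is observed.

    spectra maps (x, y) to a list of detected peak masses; peptide_masses are
    the unmodified peptide masses of the protein under investigation.
    """
    targets = [mass + delta for mass in peptide_masses]
    return sorted(
        point
        for point, peaks in spectra.items()
        if any(abs(peak - target) <= tol for peak in peaks for target in targets)
    )


# Invented scan data with one invented unmodified peptide mass.
spectra = {
    (0, 0): [1945.0, 1200.3],
    (1, 0): [2025.2],
    (2, 0): [800.0],
}
points = modification_map(spectra, peptide_masses=[1945.0], delta=79.966)
```

Plotting the returned points on the scan grid gives the map described above; a region that coincides with a spot of the protein supports the hypothesis of a modified peptide.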
[Figure 12: workflow diagram, panels A-F. Box labels: Peak Detection; Identification; Intensity Images; Peak Lists; Identification Data (XML); Threshold Setting for Spot Selection; Examination of Identifications; Datamining.]
Figure 12. (A) By means of the LIMS, the user defines all the parameters needed for peak detection and identification (SmartIdent) and launches the data analysis program. For each scan point peptidic peaks are detected and the peak list is submitted to SmartIdent. (B) The results are written into TIF- and XML-files. (C) Melanie is used to visualise them. The user can examine identifications and (D) set thresholds for the spot detection. (E) All the proteins that pass these thresholds are selected and the contours of the regions where they were detected are drawn. (F) The intensity of peaks within 2025 ± 0.7 Da is shown. The region where these masses are detected corresponds well to the PGK_ECOLI spot on the right side. Since these masses do not appear in the list of standard peptide masses of PGK_ECOLI, they might correspond to a modified peptide.
4.4. Concluding remarks
The molecular scanner is a powerful tool to analyse an entire proteome based on 2-D gel electrophoresis and PMF. For every scan point it yields a list of peptide masses and, after searching a peptide sequence database, a list of matching proteins. The two dimensional structure of the data allows easy visualisation and comparison of samples and the spatial correlation can be effectively used to enhance the quality of the results. Algorithms for highly automated spot detection were developed and many spots in 2-DE gels of E. coli and human plasma could be identified. Otherwise, the user can launch data mining applications and search the data for some predefined properties like modified peptide masses. As a result hypotheses can be checked or statistical tasks can be performed. Current development focuses on automation, data mining and visualisation. We are working on the automation of criteria to exclude falsely matched proteins using the spatial correlation of the data. To gain more insight into the data, better visualisation techniques incorporating more user interaction, 3D-animation and multidimensionality will be required. It should also be possible to compare different scans and automatically retrieve information on differently expressed proteins. In combination with a MALDI-TOF/TOF spectrometer (K. F. Medzihradszky et al., 2000) the molecular scanner approach could be used to identify proteins with peptide fragmentation data. We believe that this is a very promising approach for future development.
5. CONCLUSION
This chapter has dealt with the whole process of protein identification using mass spectrometry, starting with protein separation, followed by mass spectra acquisition and the analysis of these data with bioinformatics tools. Examples of these techniques applied to real biological samples have been given. However, all the steps involved should be optimised. For example, various purification techniques are available, and amongst them 2-D gel electrophoresis is one of the most powerful; however, only the most abundant proteins are seen. This problem can be avoided by using narrow-range IPG gradients or by pre-fractionation of the sample, for example by affinity chromatography, before electrophoresis. A direct consequence of the completion of so many genome sequencing projects is that the databases containing protein sequence information are growing exponentially. To obtain experimental data discriminative enough to identify a protein unambiguously, more experimental information has to be added to the mass spectral data, and the available bioinformatics tools must take this information into account. To enlarge the scope of mass spectrometry, data such as sequence tags obtained from MS/MS or PSD-MS measurements, or information resulting from chemically modified peptides, should also be taken into account.
All the processes from separation to identification should be automated. The current approach using robotics has a limited throughput because not all of its steps can be automated or run in parallel. The molecular scanner technique could circumvent these drawbacks owing to its high capacity for automation and parallel processing. In combination with a MALDI-TOF/TOF MS instrument it provides a very powerful and rapid means of analysing proteins separated on a 2-DE gel. Proteomics concerns not only the identification of proteins but also their control and function. This will involve considerable effort for the foreseeable future.
6. ACKNOWLEDGEMENTS
We would like to thank Gérard Bouchet, Diego Chiappe, Veronique Converset, Isabelle Demalte, Jacques Deshusses, Pavel Dobrokhotov, Roberto Fabbretti, Irene Fasso, Severine Frutiger-Hughes, Christine Hoogland, Ivan Ivanyi, Sylviane Jacoud, Christian Juillet, Salvo Paesano, Luisa Tonnella, Sonja Voordijk, Daniel Walther and Catherine Zimmermann for their technical assistance.
7. REFERENCES
Acharya, A., Maanjula, B., Murthy, G., & Vithayathil, P. (1977). Int J Peptide Protein Res, 9, 213-219. Aleksandrov, M., Gall, L., Krasnov, V., Nikolae, V., Pavlenko, V., Shkurov, V., et al. (1984). Bioorg. Khim., 10, 710. Amado, F., Damingues, P., Santana-Marques, M., Ferrer-Correia, A., & Jones, K. (1997). Discrimination effects and sensitivity variations in MALDI. Rapid Commun. Mass Spectrom., 11, 1347-1352. Anderson, L., & Seilhamer, J. (1997). A comparison of selected mRNA and protein abundances in human liver. Electrophoresis, 18(3-4), 533-537. Anderson, N. L., Giometti, C. S., Gemmell, M. A., Nance, S. L., & Anderson, N. G. (1982). A twodimensional electrophoretic analysis of the heat-shock-induced proteins of human cells. Clin Chem, 28(4 Pt 2), 1084-1092. Appel, R., Hochstrasser, D., Funk, M., Vargas, R., Pellegrini, C., Muller, A., et al. (1991). Electrophoresis, 12, 722-735. Appel, R., Palagi, P., Walther, D., Vargas, J., Sanchez, J., Ravier, F., et al. (1997). Melanie II--a thirdgeneration software package for analysis of two-dimensional electrophoresis images: I. Features and user interface. Electrophoresis, 18(15), 2724-2734. Appel, R., Vargas, J., Palagi, P., Walther, D., & Hochstrasser, D. (1997). Melanie II--a third-generation software package for analysis of two-dimensional electrophoresis images: II. Algorithms. Electrophoresis, 18(15), 2735-2748. Ashman, K., Houthaeve, T., Clayton, J., Wilm, M., Podtelejnikov, A., & Jensen, O. (1997). Lett. Pept. Sci., 4, 57-65. Avery, G., McGee, C., & Falk, S. (2000). Implementing LIMS: a "how-to" guide. Anal Chem, 72(1), 57A-62A. Ayorinde, F. O., Hambright, P., Porter, T. N., & Keith, Q. L., Jr. (1999). Use of mesotetrakis(pentafluorophenyl)porphyrin as a matrix for low molecular weight alkylphenol ethoxylates in laser desorption/ ionization time-of-flight mass spectrometry. Rapid Commun Mass Spectrom, 13(24), 2474-2479. Bairoch, A., & Apweiler, R. (2000). 
The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research, 28, 45-48. Bartlet-Jones, M., Jeffrey, W., Hansen, H., & Pappin, D. (1994). Peptide ladder sequencing by MS using a novel volatile degradation reagent. Rapid Commun. Mass Spectrom., 8, 737-742.
Bauer, M., Sun, Y., Keough, T., & Lacey, M. (2000). Sequencing of sulfonic acid derivatized peptides by electrospray MS. Rapid Commun. Mass Spectrom., 14, 924-929. Beavis, J., & Bridson, J. (1993). Epitaxial protein inclusion in sinaptic acid cristal. J. Phys. D.: Appl. Phys., 26, 442-447. Bienvenut, W., Deon, C., Sanchez, J., & Hochstrasser, D. (2002). Enhanced protein recovery after electrotransfer using square wave alternating voltage. Anal Biochem, 307(2), 297-303. Bienvenut, W., Hoogland, C., Greco, A., Heller, M., Gasteiger, E., Appel, R., et al. (2002). Hydrogen/deuterium exchange for higher specificity of protein identification by peptide mass fingerprinting. Rapid Commun. Mass Spectrom., 16(6), 616-626. Bienvenut, W., Sanchez, J., Karmime, A., Rouge, V., Rose, K., Binz, P., et al. (1999). Toward a clinical molecular scanner for proteome research: parallel protein chemical processing before and during western blot. Anal Chem, 71(21), 4800-4807. Binz, P., Muller, M., Walther, D., Bienvenut, W., Gras, R., Hoogland, C., et al. (1999). A molecular scanner to automate proteomic research and to display proteome images. Anal Chem, 71(21), 49814988. Binz, P., Wilkins, M., Gasteiger, E., Bairoch, A., Appel, R., & Hochstrasser, D. (1999). In R. Kellner, F. Lottspeich & H. Meyer (Eds.), Microcharacterisation of proteins (2nd ed., pp. 277-300). Berlin: Wiley-VCH. Binz, P. A., Muller, M., Walther, D., Bienvenut, W. V., Gras, R., Hoogland, C., et al. (1999). A molecular scanner to automate proteomic research and to display proteome images. Anal Chem, 71(21), 4981-4988. Bjellqvist, B., Ek, P., Righetti, P., Gianazza, E., Gorg, A., Westermeir, R., et al. (1982). J. Biochem. Biophys., 6, 317-339. Blankenship, D. T., Krivanek, M. A., Ackermann, B. L., & Cardin, A. D. (1989). High-sensitivity amino acid analysis by derivatization with O-phthalaldehyde and 9-fluorenylmethyl chloroformate using fluorescence detection: applications in protein structure determination. 
Anal Biochem, 178(2), 227232. Bolt, M., & Mahoney, P. (1997). High-efficiency blotting of proteins of divers sizes following SDSPAGE. Anal. Biochem., 247, 185-192. Breaux, G., Green-Church, K., France, A., & Limbach, P. (2000). surfractant-aided, MALDI-MS of hydrophobic and hydrophylic peptides. Anal. Chem., 72, 1169-1174. Breen, E., Hopwood, F., Williams, K., & Wilkins, M. (2000). Automatic Poisson peak harvesting for high throughput protein identification. Electrophoresis, 21, 2243-2251. Brown, R., & Lennon, J. (1995). Mass resolution improvment by incorporation of pulsed ion extraction in a matrix-assisted laser desorption/ionisation linear time-of-flight mass spectrometer. Anal. Chem, 67, 1988-2003. Buijs, J., Costa Vera, C., Ayala, E., Steensma, E., Hakansson, P., & Oscarsson, S. (1999). Conformational stability of adsorbed insulin studied with mass spectrometry and hydrogen exchange. Anal Chem, 71(15), 3219-3225. Cavalcoli, J. D., VanBogelen, R. A., Andrews, P. C., & Moldover, B. (1997). Unique identification of proteins from small genome organisms: theoretical feasibility of high throughput proteome analysis. Electrophoresis, 18(15), 2703-2708. Chaurand, P., Luetzenkirchen, F., & Spengler, B. (1999). Peptide and protein identification by MALDIPSD TOF-MS. J. Am. Soc. Mass Spectrom., 10, 91-103. Chen, C., Walkes, A., Wu, Y., Timmons, R., & Kinsel, G. (1999). Influence of sample preparation methodology on the reduction of peptide MALDI-ion signal by surface peptide binding. J Mass Spectrom, 34, 1205-1207. Clauser, K., Hall, S., Smith, D., Webb, J., Andrews, L., Tran, H., et al. (1995). Rapid mass spectrometric peptide sequencing and mass matching for characterization of human melanoma proteins isolated by two-dimensional PAGE. Proc. Natl. Acad. Sci. USA, 92(11), 5072-5076. Cohen, S., & Chait, B. (1996). Influence of matrix solution condition on the MALDI-MS analysis of peptides and proteins. Anal. Chem, 68, 31-37. Cohen, S. L., & Chait, B. T. (1996). 
Influence of matrix solution conditions on the MALDI-MS analysis of peptides and proteins. Anal Chem, 68(1), 31-37.
Cooper, C. A., Wilkins, M. R., Williams, K. L., & Packer, N. H. (1999). BOLD--a biological O-linked glycan database. Electrophoresis, 20(18), 3589-3598. Cornish, T., & Cotter, R. (1993). A curved-field reflectron for improved énergie focussing of product ions in time-of-flight mass spectrometry. Rapid Commun. Mass Spectrom., 7, 1037-1040. Dancik, V., Addona, T., Clauser, K., Vath, J., & Pevzner, P. (1999). De novo peptide sequencing via tandem mass spectrometry. J. Comput. Biol., 6, 327-342. Dottavio-Martin, D., & Ravel, J. M. (1978). Radiolabeling of proteins by reductive alkylation with [14C]formaldehyde and sodium cyanoborohydride. Anal Biochem, 87(2), 562-565. Ducret, A., Bartone, N., Haynes, P., Blanchard, A., & Aebersold, R. (1998). A simplified gradient solvent delivery system for capillary liquid chromatography - electrospray ionization mass spectrometry. submitted. Edman, P., & Begg, G. (1967). Eur. J. Biochem., 1, 80-91. Einarsson, S., Josefsson, B., & Lagerkvist, S. (1983). J. Chromatogr., 282, 609. Emmett, M., & Caprioli, R. (1994). Micro-electrospray mass spectrometry: Ultra-high-sensitivity analysis of peptides and proteins. J. Am. Soc. Mass. Spectrom., 5, 605-613. Eng, J., McCormack, A., & Yates, J. r. (1994). An approach to correlate tandem mass spectral data pf peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom., 5(11), 976989. Eriksson, J., Chait, B., & Fenyo, D. (2000). A statistical basis for testing the significance of mass spectrometric protein identification results. Anal Chem, 72(5), 999-1005. Falick, A., & Maltby, D. (1989). Anal. Biochem., 182, 165-169. Fenyo, D., Qin, J., & Chait, B. (1998). Protein identification using mass spectrometric information. Electrophoresis, 19(6), 998-1005. Fernandez-de-Cossio, J., Gonzalez, J., & Besada, V. (1995). A computer program to aid the sequencing of peptides in collision-activated decomposition experiments. Comput Appl Biosci, 11(4), 427-434. 
Fernandez-de-Cossio, J., Gonzalez, J., Betancourt, L., Besada, V., Padron, G., Shimonishi, Y., et al. (1998). Automated interpretation of high-energy collision-induced dissociation spectra of singly protonated peptides by 'SeqMS', a software aid for de novo sequencing by tandem mass spectrometry. Rapid Commun Mass Spectrom, 12(23), 1867-1878. Fernandez-Patron, C., Castellanos-Serra, L., & Rodriguez, P. (1992). Reverse staining of sodium dodecyl sulfate polyacrylamide gels by imidazole-zinc salts: sensitive detection of unmodified proteins. Biotechniques, 12(4), 564-573. Figueroa, I., Torres, O., & Russell, D. (1998). Effects of the water content in the sample preparation for MALDI on the mass spectra. Anal. Chem., 70, 4527-4533. Fraenkel-Conrat, H., & Olcott, H. (1945). J. Biol. Chem., 161, 259-268. Gates, P. J., Kearney, G. C., Jones, R., Leadlay, P. F., & Staunton, J. (1999). Structural elucidation studies of erythromycins by electrospray tandem mass spectrometry. Rapid Commun Mass Spectrom, 13(4), 242-246. Gay, S., Binz, P. A., Hochstrasser, D. F., & Appel, R. D. (1999). Modeling peptide mass fingerprinting data using the atomic composition of peptides. Electrophoresis, 20(18), 3527-3534. Gevaert, K., De Mol, H., Verschelde, J. L., Van Damme, J., De Boeck, S., & Vandekerckhove, J. (1997). Novel techniques for identification and characterization of proteins loaded on gels in femtomole amounts. J Protein Chem, 16(5), 335-342. Gevaert, K., Demol, H., Sklyarova, T., Houthaeye, T., De Broeck, S., & Vandekerckove, J. (1998). Sample preparation procedures for ultra sensitive protein identification by PSD-MALDI-TOF-MS. J. Prot. Chem., 17(6), 560. Gevaert, K., Demol, H., Sklyarova, T., Vandekerckhove, J., & Houthaeve, T. (1998). A peptide concentration and purification method for protein characterization in the subpicomole range using matrix assisted laser desorption/ionization postsource decay (MALDI-PSD) sequencing. Electrophoresis, 19(6), 909-917. 
Gevaert, K., Houthaeve, T., & Vandekerckhove, J. (2000). Techniques for sample preparation including methods for concentrating peptide samples. Exs, 88, 29-42. Gevaert, K., & Vandekerckhove, J. (2000). Protein identification methods in proteomics. Electrophoresis, 21(6), 1145-1154.
Gobom, J., Kraeuter, K., Persson, R., Steen, H., Roepstorff, P., & Ekman, R. (2000). Detection and quantification of neurotensin in human brain tissue by matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry. Anal. Chem., 72, 3320-3326. Golaz, O., Wilkins, M. R., Sanchez, J. C., Appel, R. D., Hochstrasser, D. F., & Williams, K. L. (1996). Identification of proteins by their amino acid composition: an evaluation of the method. Electrophoresis, 17(3), 573-579. Gooley, A., Ou, K., Russell, J., Wilkins, M., Sanchez, J., Hochstrasser, D., et al. (1997). Electrophoresis, 18, 1068. Gorg, A., Postel, W., & Gunther, S. (1988). The current state of two-dimensional electrophoresis with immobilized pH gradients. Electrophoresis, 9(9), 531-546. Gras, R., Muller, M., Gasteiger, E., Gay, S., Binz, P., Bienvenut, W., et al. (1999). Improving protein identification from peptide mass fingerprinting through a parameterized multi-level scoring algorithm and an optimized peak detection. Electrophoresis, 20(18), 3535-3550. Guilhaus, M., Selby, D., & Mlynski, V. (2000). Orthogonal acceleration time-of-flight mass spectrometry. Mass Spectrom Rev, 19(2), 65-107. Gusev, A., Wilkinson, W., Proctor, A., & Hercules, D. (1995). Improvement of signal reproducibility and matrix/comatrix effects in MALDI analysis. Anal. Chem, 67, 1034-1041. Gygi, S., Rist, B., Gerber, S., Turecek, F., Gelb, M., & Aebersold, R. (1999). quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nature Biotechnol., 17, 994-999. Haynes, P., Sheumack, D., Greig, L., Kibby, J., & Redmond, J. (1991). J. Chromatogr., 588, 107-114. Hensel, R. R., King, R. C., & Owens, K. G. (1997). Electrospay sample preparation for improved quantitation in matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry. Rapid Commun. Mass Spectrom., 11, 1785-1793. Henzel, W. J., Billeci, T. M., Stults, J. T., Wong, S. C., Grimley, C., & Watanabe, C. (1993). 
Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases. Proceedings of the National Academy off Sciences of the United States of America, 90(11), 5011-5015. Hochstrasser, D. (1998). Proteome in perspective. Clin. Chem. Lab. Med., 36(11), 825-836. Hochstrasser, D. F., Appel, R. D., Vargas, R., Perrier, R., Vurlod, J. F., Ravier, F., et al. (1991). A clinical molecular scanner: the Melanie project. MD Comput, 8(2), 85-91. Hofmann, K., Bucher, P., Falquet, L., & Bairoch, A. (1999). The PROSITE database, its status in 1999. Nucleic Acids Res, 27(1), 215-219. Houthaeve, T., Gausepohl, H., Ashman, K., Nillson, T., & Mann, M. (1997). Automated protein preparation techniques using a digest robot. J. Prot. Chem., 16(5), 343-348. Houthaeve, T., Gausepohl, H., Mann, M., & Ashman, K. (1995). Automation of micro-preparation and enzymatic cleavage of gel electrophoretically separated proteins. FEBS Lett., 376(1-2), 91-94. Hunt, D., Yates, J., Shabanowitz, J., Winston, S., & Hauer, C. (1986). Proc. Natl. Sci. USA, 83, 62336237. Ingendoh, A., Karas, M., Hillenkamp, F., & Giessmann, U. (1994). Factors affecting the resolution in MALDI-MS. Int. J. Mass Spectrom. Ion Proc., 131, 345-354. James, P., Quadroni, M., Carafoli, E., & Gonnet, G. (1993). Protein identification by mass profile fingerprinting. Biochem Biophys Res Commun, 195(1), 58-64. Jensen, O. N., Podtelejnikov, A., & Mann-M. (1996). Delayed extraction improves specificity in database searches by matrix-assisted laser desorption/ionization peptide maps. Rapid Commun. Mass Spectrom., 10(11), 1371-1378. Johnson, R. S., Martin, S. A., Biemann, K., Stults, J. T., & Watson, J. T. (1987). Novel fragmentation process of peptides by collision-induced decomposition in a tandem mass spectrometer: differentiation of leucine and isoleucine. Anal Chem, 59(21), 2621-2625. Johnston, R. F., Pickett, S. C., & Barker, D. L. (1990). 
Autoradiography using storage phosphor technology. Electrophoresis, 11(5), 355-360. Jones, D., Stott, K., Howard, M., & Perham, R. (2000). Biochemistry, 39, 8448-8459. Johnson, R., & Biemann, K. (1989). Computer program (seqpep) to aid in the interpretation of high-energy collision tandem mass spectra of peptides. Biomed. Mass Spectrom., 18, 945-.
Joubert-Caron, R., Le Caer, J., Montandon, F., Poirier, F., Pontet, M., Imam, N., et al. (2000). Protein analysis by mass spectrometry and sequence database searching: a proteomic approach to identify human lymphoblastoid cell line proteins. Electrophoresis, 21(12), 2566-2575. Jungblut, P., Eckerskorn, C., Lottspeich, F., & Klose, J. (1990). Blotting efficiency investigated by using two-dimensional electrophoresis, hydrophobic membranes and proteins from different sources. Electrophoresis, 11(7), 581-588. Karas, M., Bachmann, D., Bahr, U., & Hillenkamp, F. (1987). Int. J. Mass Spectrom. Ion Processes, 78, 53-68. Karas, M., Bahr, U., Strupat, K., Hillenkamp, F., Tsarbopoulos, A., & Pramanik, B. N. (1995). Matrix dependence of metastable fragmentation of glycoproteins in MALDI TOF mass spectrometry. Anal. Chem., 67, 675-679. Karas, M., Glückmann, M., & Schäfer, J. (2000). Ionization in matrix-assisted laser desorption/ionisation: singly charged molecular ions are the lucky survivors. J. Mass Spectrom., 35, .01-12. Karas, M., & Hillenkamp, F. (1988). Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal Chem, 60(20), 2299-2301. Katta, V., & Chait, B. (1993). J. Am. Chem. Soc., 115, 6317-6321. Kaufmann, R., Kirsch, D., & Spengler, B. (1994). Sequencing of peptides in a time-of-flight mass spectrometer: evaluation of postsource decay following matrix-assisted laser desorption/ionisation (MALDI). Int. J. Mass Spectrom. Ion Proc., 131, 355-385. Kaufmann, R., Spengler, B., & Lutzenkirchen, F. (1993). Mass spectrometric sequencing of linear peptides by product-ion anaylsis in a reflectron time-of-flight mass spectrometer using matricassisted laser desorption/ionisation. Rapid Commun. Mass Spectrom., 7, 902-910. Kebarle, P. (2000). A brief overview of the present status of the mechanisms involved in electrospray mass spectrometry. J Mass Spectrom, 35(7), 804-817. Keil, B. (1982). In Methods in protein sequence analysis. 
Clifton: Humana Press. Keil, B. (1992). Specificity of proteolysis. Heidelberg/New York: Springer-Verlag. Keller, B., & Li, L. (2000). J Am Chem Soc, 11, 88-93. Kenrik, K., & Margolis, J. (1970). Anal. Biochem., 33, 204-207. Keough, T., Lacey, M. P., & Youngquist, R. S. (2000). Derivatization procedures to facilitate de novo sequencing of lysine-terminated tryptic peptides using postsource decay matrix-assisted laser desorption/ionization mass spectrometry. Rapid Commun Mass Spectrom, 14(24), 2348-2356. Keough, T., Youngquist, R. S., & Lacey, M. P. (1999). A method for high sensitivity peptide sequencing using post source decay MALDI-MS. Proc. Natl. Sci. USA, 96, 7131-7136. Klose, J. (1975). Protein mapping by combined isoelectric focusing and electrophoresis of mouse tissues. A novel approach to testing for induced point mutations in mammals. Humangenetik, 26(3), 231-243. Kollisch, G., Lorenz, M., Kellner, R., Verhaert, P., & Hoffmann, K. (2000). Eur J Biochem, 267, 55025508. Kratzer, R., Eckerskorn, C., Karas, M., & Lottspeich, F. (1998). Suppression effects in enzymatic peptide ladder sequencing using UV-MALDI-MS. Electrophoresis, 19, 1910-1919. Kraus, M., Janck, K., Bienert, M., & Krause, E. (2000). Characterisation of intermolecular ?-sheet peptides by mass spectrometry and hydrogen isotope exchange. Rapid Commun. Mass Spectrom., 14, 1094-1104. Krause, E., Wenschuh, H., & Jungblut, P. R. (1999). The dominance of Arg containing peptidesin MALDI derived tryptic mass fingerprints of proteins. Anal. Chem., 71, 4160-4165. Krutchinsky, A., W, Z., & BT., C. (2000). Rapidly switchable MALDI and electrospray quadrupole time of flight for protein identification. J. Am. Soc. Mass Spectrom., 11, 493-504. Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature, 227(259), 680-685. Lahn, H. W., & Langen, H. (2000). Mass spectrometry: a tool for the identifiaction of proteins separated by gels. 
Electrophoresis, 21, 2105-2114. Lehr, S., Kotzka, J., Herkner, A., Sikmann, A., Meyer, H., Krone, W., et al. (2000). Biochemistry, 39, 10898-10907. Li, G., Waltham, M., Anderson, N. L., Unsworth, E., Treston, A., & Weinstein, J. N. (1997). Rapid mass spectrometric identification of proteins from two-dimensional polyacrylamide gels after in gel proteolytic digestion. Electrophoresis, 18(3-4), 391-402.
Liao, P., Huang, Z., & Allison, J. (1997). Charge remote fragmentation of peptides following attachement of a fixed positive charge: a MALDI PSD study. J. Am. Soc. Mass Spectrom., 8, 501-509. Link, A., Tempel, K., & Hund, M. (1992). RNA metabolism, DNA damage and cellular resistance to Xrays: investigations in chick embryo and rat cells. Z Naturforsch [C], 47(3-4), 249-254. Link, A. J., Eng, J., Schieltz, D. M., Carmack, E., Mize, G. J., Morris, D. R., et al. (1999). Direct analysis of protein complexes using mass spectrometry. Nature Biotechnol., 17, 676-682. Lopez, M. F. (2000). Better approach to finding the needle in a haystack: optimizing proteome analysis through automation. Electrophoresis, 21, 1082-1093. Mamyrin, B., Karatajev, V., Shmikk, D., & Zagulin, V. (1973). JEPT, 37, 45-. Mann, M., Hojrup, P., & Roepstorff, P. (1993). Use off mass spectrometric molecular weight information to identify proteins in sequence databases. Biol Mass Spectrom, 22, 338-345. Mann, M., & Wilm, M. (1994). Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal Chem, 66, 4390-4399. McCormack, A. L., Schieltz, D. M., Goode, B., Yang, S., Barnes, G., Drubin, D., et al. (1997). Direct analysis and identification of proteins in mixtures by LC/MS/MS and database searching at the lowfemtomole level. Analytical Chemistry, 69(4), 767-776. Medzihradszky, K. F., Campbell, J. M., Baldwin, M. A., Falick, A. M., Juhasz, P., Vestal, M. L., et al. (2000). The characteristics of peptide collision-induced dissociation using a high-performance MALDI-TOF/TOF tandem mass spectrometer. Anal Chem, 72(3), 552-558. Mozdzanowski, J., & Speicher, D. (1992). Microsequence analysis of electroblotted proteins. I. Comparison of electroblotting recoveries using different types of PVDF membranes. Anal Biochem, 207(1), 11-18. Muller, M., Gras, R., Appel, R. D., Bienvenut, W. V., & Hochstrasser, D. F. (2002). 
Visualization and analysis of molecular scanner peptide mass spectra. J Am Soc Mass Spectrom, 13(3), 221-231. Nakanishi, T., Okamoto, N., Tanaka, K., & Shimizu, A. (1994). Biol Mass Spectrom, 23, 230-233. Neuhoff, V., Arold, N., Taube, D., & Ehrhardt, W. (1988). Improved staining of proteins in polyacrylamide gels including isoelectric focusing gels with clear background at nanogram sensitivity using Coomassie Brilliant Blue G-250 and R-250. Electrophoresis, 9(6), 255-262. Neumann, H., & Mullner, S. (1998). Two replica blotting methods for fast immunological analysis of common proteins in two-dimentional electrophoresis. Electrophoresis, 19, 752-757. Nutkins, J., & Williams, D. (1989). Eur. J. Biochem. Oda, Y., Huang, K., Cross, F. R., Cowburn, D., & Chait, B. T. (1999). Accurate quantitation of protein expression and site-specific phosphorylation. Proc Natl Acad Sci U S A, 96(12), 6591-6596. O'Farrell, P. H. (1975). High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem., 250(10), 4007-4021. Okamoto, M., Takahashi, K., Doi, T., & Takimoto, Y. (1997). High-sensitivity detection and postsource with matrix-assisted laser decay of 2-aminopyridine-derivatized oligosaccharides desorption/ionization mass spectrometry. Anal Chem, 69(15), 2919-2926. Pappin, D., Hojrup, P., & Bleasby, A. (1993). Rapid identification of proteins by petide mass fingerprint. Curr. Biol., 3(6), 327-332. Patterson, S., Thomas, D., & Bradshaw, R. (1996). Application of combined mass spectrometry and partial amino acid sequence to the identification of gel separated proteins. Electrophoresis, 17, 877891. Patton, W. F. (2000). A thousand points of light: the application of fluorescence detection technologies to two-dimensional gel electrophoresis and proteomics. Electrophoresis, 21(6), 1123-1144. Perkins, D., Pappin, D., Creasy, D., & Cottrell, J. (1999). Probability-based protein identification by searching sequence databases using mass spectrometry data. 
Electrophoresis, 20(18), 3551-3567. Rabilloud, T. (1990). Electrophoresis, 11. Ramsay, S. L., Steinborner, S. T., Waugh, R. J., Dua, S., & Bowie, J. H. (1995). A simple method for differentiating Leu and Ile in peptides. The negative-ion mass spectra of [M-H]- ions of phenylthiohydantoin Leu and Ile. Rapid Commun Mass Spectrom, 9(13), 1241-1243. Reim, D., & Speicher, D. (1992). Microsequence analysis of electroblotted proteins: part II. Anal. Biochem., 207, 19-23. Roepstorff, P., & Fohlman, J. (1984). Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed. Mass Spectrom., 11, 601.
Rossier, J. S., Schwarz, A., Reymond, F., Ferrigno, R., Bianchi, F., & Girault, H. H. (1999). Microchannel networks for electrophoretic separations. Electrophoresis, 20(4-5), 727-731. Sakurai, T., Matsuo , T., Matsuda, H., & I, K. (1984). Paas3: A computer progtram to decide probable sequences of peptides from mass spectrometric data. Biomed. Mass Spectrom., 11(8), 396-402. Salih, B., & Zenobi, R. (1998). MALDI mass spectrometry of dye-peptide and dye protein complexe. Anal. Chem., 70, 1536-1543. Scarberry, R., Zhang, Z., & Knapp, D. (1995). J. Am. Soc. Mass Spectrom., 6, 947-. Scheele, G. (1975). J. Biol. Chem., 250, 5375-5385. Schwert, G., & Takenaka, Y. (1955). A Spectrophotometric Determination of Trypsin and Chymotrypsin. Biochim Biophys Acta, 16, 570. Shevchenko, A., Loboda, A., Ens, W., & Standing, K. G. (2000). MALDI quadrupole time-of-flight mass spectrometry: a powerful tool for proteomic research. Anal Chem, 72(9), 2132-2141. Smith, R., Loo, J., Edmonds, C., Barinaga, C., & Udseth, H. (1990). Anal. Chem., 62, 882-899. Spengler, B. (1997). Post-source decay analysis in matrix*assisted laser desorption/ionisation mass spectrometry of biomolecules. J. Mass Spectrom., 32, 1019-1036. Spengler, B., Lutzenkirchen, F., & Kaufmann, R. (1993). On-target deuteration for peptide sequencing by laser mass spectrometry. Org. Mass Spectrom., 28, 1482-1490. Stemmler, E. A., Buchanan, M. V., Hurst, G. B., & Hettich, R. L. (1995). Analysis of modified oligonucleotides by matrix-assisted laser desorption/ionization Fourier transform mass spectrometry. Anal Chem, 67(17), 2924-2930. Stemmler, E. A., Hettich, R. L., Hurst, G. B., & Buchanan, M. V. (1993). Matrix-assisted laser desorption/ionization Fourier-transform mass spectrometry of oligodeoxyribonucleotides. Rapid Commun Mass Spectrom, 7(9), 828-836. Stranz, D., & 3rd, M. L. (1998). J. Biomol. Techniques, 9. Sullards, M. C., & Reiter, J. A. (2000). 
Primary and secondary locations of charge sites in angiotensin II (M + 2H)2+ ions formed by electrospray ionization. J Am Soc Mass Spectrom, 11(1), 40-53. Takach, E., Hines, W., Patterson, D., Juhasz, P., Falick, A., Vestal, M., et al. (1997). Accurate mass measurements using MALDI-TOF with delayed extraction. J. Prot. Chem., 16(5), 363-369. Tal, M., Silberstain, A., & Nusser, E. (1985). Why does coomassie brillant blue R interact differently with different proteins ? a partial answer. J. Biol Chem, 260(18), 9976-9980. Tanaka, K., Waki, H., Ido, Y., Akita, S., Yoshida, Y., & Yoshida, T. (1988). Rapid Commun. Mass Spectrom., 2, 151-153. Taranenko, N. I., Tang, K., Allman, S. L., Chang, L. Y., & Chen, C. H. (1994). 3-Aminopicolinic acid as a matrix for laser desorption mass spectrometry of biopolymeres. Rapid Commun. Mass Spectrom., 8, 1001-1006. Taylor, J., Anderson, N. L., Scandora, A. E., Jr., Willard, K. E., & Anderson, N. G. (1982). Design and implementation of a prototype Human Protein Index. Clin Chem, 28(4 Pt 2), 861-866. Taylor, J. A., & Johnson, R. S. (1997). Sequence database searches via de novo peptide sequencing by tandem mass spectrometry. Rapid Commun Mass Spectrom, 11(9), 1067-1075. Thiede, B., Lamer, S., Mattow, J., Siejak, F., Dimmler, C., Rudel, T., et al. (2000). Analysis of missed cleavage sites, tryptophan oxydation and N-Terminal pyroglutamination after in-gel tryptic digestion. Rapid Commun. Mass Spectrom., 14, 496-502. Toffoli, T., & Margolus, N. (1987). Cellular automata machines. Cambridge (MA): MIT press. Tonella, L., Walsh, B. J., Sanchez, J. C., Ou, K., Wilkins, M. R., Tyler, M., et al. (1998). '98 Escherichia coli SWISS-2DPAGE database update. Electrophoresis, 19(11), 1960-1971. Traini, M., Gooley, A. A., Ou, K., Wilkins, M. R., Tonella, L., Sanchez, J. C., et al. (1998a). Towards an automated approach for protein identification in proteome projects. Electrophoresis, 19(11), 19411949. Traini, M., Gooley, A. A., Ou, K., Wilkins, M. 
R., Tonella, L., Sanchez, J.-C., et al. (1998b). Towards an automated approach for protein identification in proteome projects. Electrophoresis, 19, 1941-1949. Vestal, M., & Juhasz, P. (1998). J Am Soc Mass Spectrom., 9, 892-911. Vestal, M. L., Juhasz, P., & Martin, S. A. (1995). Delayed extraction matrix-assisted laser desorption time-of-flight mass spectrometry. Rapid Commun. Mass Spectrom., 9, 1044-1050.
Villanueva, J., Canals, F., Villegas, V., Querol, E., & Avilés, F. (2000). Hydrogen exchange monitored by MALDI-TOF MS for rapid characterization of the stability and conformation of proteins. FEBS letters, 472, 27-33. Walsh, B., Molloy, M., & Williams, K. (1998). Electrophoresis, 19, 1883-1890. Wang, F., & Tang, X. (1996). Biochemistry, 35, 4069-4078. Wei, J., Buriak, J., & Siuzdak, G. (1999). Desorption ionization mass spectrometry on porous silicon. Nature, 399, 243-246. Wenschuh, H., Halada, P., Lamer, S., Jungblut, P., & Krause, E. (1998). The ease of peptide detection by MALDI-MS: the effect of secondary structure on signal intensity. Rapid Commun. Mass Spectrom., 12, 115-119. Whittal, R., & Li, L. (1995). Anal. Chem., 67, 1950-1954. Wilcox, P. (1967). Esterification. Meth. Enzym., 11, 605-616. Wiley, W., & McLaren, I. (1953). Time-of-flight mass spectrometer with improved resolution. Rev. Sci. Instrum., 26, 1150-1157. Wilkins, M., Gasteiger, E., Sanchez, J., Appel, R., & Hochstrasser, D. (1996). Curr. Biol., 6, 1543. Wilkins, M., Gasteiger, E., Tonella, L., Ou, K., Tyler, M., Sanchez, J.-C., et al. (1998). J. Mol. Biol., 278, 599-608. Wilkins, M. R., Ou, K., Appel, R. D., Sanchez, J. C., Yan, J. X., Golaz, O., et al. (1996). Rapid protein identification using N-terminal "sequence tag" and amino acid analysis. Biochem Biophys Res Commun, 221(3), 609-613. Wilkins, M. R., Pasquali, C., Appel, R. D., Ou, K., Golaz, O., Sanchez, J. C., et al. (1996). From proteins to proteomes: large scale protein identification by two-dimensional electrophoresis and amino acid analysis. Biotechnology (N Y), 14(1), 61-65. Williams, K., & Hochstrasser, D. (1997). In Proteome Research: New Frontiers in functional genomics (pp. 1-12). Berlin: Springer-Verlag. Wilm, M., & Mann, M. (1996). Analytical properties of the nanoelectrospray ion source. Anal Chem, 68(1), 1-8. Yamashita, M., & Fenn, J. (1984). Phys. Chem., 88, 4451-4459. Yan, J. X., Wilkins, M. R., Ou, K., Gooley, A. 
A., Williams, K. L., Sanchez, J. C., et al. (1996). Largescale amino-acid analysis for proteome studies. J Chromatogr A, 736(1-2), 291-302. Yates, J. R., 3rd, Eng, J. K., & McCormack, A. L. (1995). Mining genomes: correlating tandem mass spectra of modified and unmodified peptides to sequences in nucleotide databases. Analytical Chemistry, 67(18), 3202-3210. Yates, J. R., III, Speicher, S., Griffin, P. R., & Hunkapiller, T. (1993). Peptide mass maps: A highly informative approach to protein identification. Anal Biochem, 214, 397-408. Zenobi, R., & Knochenmuss, R. (1998). Ion formation in MALDI mass spectrometry. Mass Spectrom. Rev., 17, 337-366. Zhang, W., & Chait, B. (2000). Anal. Chem., 72, 2482-2489. Zhang, Z., & McElvain, J. (2000). Improvements in protein identification by MALDI-TOF MS peptide mapping. Anal. Chem., 72, 2337-2350. Zhu, Y., Chung, C., Taranenko, N., Allmann, S., Martin, S., Haff, L., et al. (1996). The study of 2,3,4Trihydroxyacetophenone and 2,4,6-Trihydroxyacetophenone as matrix for DNA detection in MALDI-TOF-MS. Rapid Commun. Mass Spectrom., 10, 383-388.
CHAPTER 9 CONCLUSIONS AND PERSPECTIVES
The development of genome sequencing projects has produced an enormous quantity of data on the organisms studied. Nevertheless, gene identification shows its limits, especially in eukaryotes, where short open reading frames are difficult to locate correctly with conventional predictive bioinformatics tools. Furthermore, even a clearly identified gene may give little information about its translation product: at present, most PTMs cannot be predicted from DNA sequences, nor can the concentration of a given protein in a single cell. Large-scale proteomic protein identification is now a powerful complement to the genomic approach. While powerful experimental techniques allow DNA to be sequenced automatically and with confidence, protein analysis is a very different matter. At present, it is very difficult to analyse complex protein mixtures, e.g. cell lysates or body fluids, without a purification/separation step to decrease sample complexity. After such sample preparation, proteins are hydrolysed using endoproteolytic enzymes that cleave the polypeptide at predictable sites. Analysis of these fragments by mass spectrometry, combined with bioinformatics treatment of the data, enables identification of the protein. Although automation of this approach has greatly increased the throughput of protein identification by mass spectrometry, the validation step, i.e. validation of the protein identification results using other information such as the species, the MW or the pI, cannot be automated and becomes the bottleneck of the technique. This work proposes a new approach to protein identification using mass spectrometry combined with a sample fractionation step by SDS-PAGE.
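The idea of cleavage at predictable sites can be illustrated with a minimal sketch of an in silico tryptic digestion. It assumes the commonly used simplified trypsin rule (cleavage after Lys or Arg, except before Pro); the function name `tryptic_digest` and the `missed_cleavages` parameter are hypothetical and do not come from any software described in this book.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from a simplified in silico tryptic digestion.

    Assumed rule: cleave C-terminal to K or R unless the next residue is P.
    Peptides spanning up to `missed_cleavages` uncleaved sites are included.
    """
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # C-terminal peptide that does not end with K or R
        fragments.append(sequence[start:])

    peptides = []
    for i in range(len(fragments)):
        for extra in range(missed_cleavages + 1):
            if i + extra < len(fragments):
                peptides.append("".join(fragments[i:i + extra + 1]))
    return peptides


if __name__ == "__main__":
    # Hypothetical test sequence, chosen only to exercise the K/R-before-P rule.
    for peptide in tryptic_digest("AKRPGKT", missed_cleavages=1):
        print(peptide)
```

The theoretical peptide list produced this way is what a PMF search tool compares with the measured masses; real tools apply more detailed cleavage rules and modifications than this sketch does.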
The parallel digestion technique allows all of the proteins previously separated in an SDS-PAGE run to be processed for endoproteolytic digestion at once and requires little manual work, features that may decrease external contamination, e.g. by keratins, and operator errors. The digestion step is conducted in parallel during protein electroblotting from the SDS-PAGE gel to a collecting membrane (usually a PVDF-based membrane). During migration, the polypeptides cross a hydrophilic membrane on which trypsin is immobilized. With an adapted voltage signal, proteins are correctly digested during the electroblotting process. The PVDF collecting membrane is then used directly in MALDI-MS for sample analysis, which decreases material loss during sample preparation. In addition, spatial resolution is much better than that obtained by spot excision: the smallest gel plug must be around 1 mm, whereas the molecular scanner approach is limited only by the laser resolution, which is usually around 50 µm. Finally, for the manual or the robotized approaches,
an accurate validation step of the protein identification result is needed to remove false positive matches. In the molecular scanner approach, the redundancy of the data due to the high spatial resolution of the technique (100–400 spectra per mm²) allows cross-validation of the true results over a large number of spectra within a limited surface such as a band (1-DE) or a spot (2-DE), using a clustering algorithm that also recreates the protein spot on the virtual image. A collaboration with the company Applied Biosystems, which is involved in the development of MALDI mass spectrometry, allows this technique to be reproduced accurately at different sites and by different users. The main impact of this collaboration is the improvement of some aspects of this approach:
- Industrial production of trypsin-bonded hydrophilic membranes with high proteolytic density,
- Development of a hydrophobic collecting membrane better adapted to peptide capture than standard PVDF membranes,
- Homogeneous matrix application at the surface of the collecting membrane.
Subsequently, other developments can be implemented to improve and enlarge the scope of the technique. For example, hydrophilic membranes for endoproteolytic cleavage of the polypeptide could also be developed for enzymes such as endoproteinase Lys-C or less specific enzymes such as chymotrypsin. It would also be of interest to combine such approaches with the characterization of protein PTMs using enzymatic or chemical treatments. A hydrophilic membrane carrying alkaline phosphatase was developed and showed phosphatase activity against casein test solutions. However, these membranes were never used in electroblotting preparations. In chapter 4, the loss of confidence in protein identification using the PMF approach was clearly demonstrated.
Complementary information concerning the primary structure is frequently used as a confirmation step; for example, H/D exchange on peptides was previously developed on samples prepared by the dry-droplet method and may also be possible on the hydrophobic collecting membranes. Moreover, Applied Biosystems and Bruker have developed new tandem mass spectrometers with MALDI sources (4700 Proteomics Analyzer and Ultraflex™). These instruments first analyse the sample in MS mode for PMF protein identification, then switch to tandem MS mode to improve protein identification confidence. With the continuous growth of proteomics, such tools for high-throughput protein identification, including sample preparation, MS analysis, data treatment and automated validation of the identified proteins, should prove helpful.
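The cross-validation by spatial redundancy discussed in this chapter can be illustrated with a short sketch. It is not the clustering algorithm of the molecular scanner software, whose details are not given here. It only assumes that each scan position carries the top-ranked protein identification of its spectrum, and it keeps an identification when enough neighbouring positions agree. The function names, the 4-neighbour connectivity and the `min_size` threshold are all hypothetical. The helper `spectra_per_mm2` shows how, for an assumed square scan grid, steps of 100 µm and 50 µm correspond to 100 and 400 spectra per mm².

```python
from collections import deque


def spectra_per_mm2(step_um):
    """Acquisition points per mm^2 for a square scan grid with the given step (in µm)."""
    per_mm = 1000 / step_um
    return per_mm * per_mm


def cluster_identifications(grid, min_size=3):
    """Group 4-connected scan positions that share the same protein identification.

    grid: list of rows; each cell holds a protein ID string, or None when the
    spectrum gave no identification. Clusters smaller than `min_size` positions
    are discarded as probable false positives (assumed threshold).
    Returns {protein_id: [list of (row, col) positions per cluster]}.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = {}
    for r in range(rows):
        for c in range(cols):
            protein = grid[r][c]
            if protein is None or seen[r][c]:
                continue
            # Breadth-first flood fill over neighbours with the same ID
            queue, members = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                members.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == protein):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(members) >= min_size:
                clusters.setdefault(protein, []).append(members)
    return clusters


if __name__ == "__main__":
    # Hypothetical 3 x 4 scan: "P1" forms a spot, the isolated hits are rejected.
    scan = [
        ["P1", "P1", None, "P2"],
        ["P1", "P1", None, None],
        [None, None, "P3", "P1"],
    ]
    print(cluster_identifications(scan))
    print(spectra_per_mm2(50), spectra_per_mm2(100))
```

The positions retained for each protein also outline its band or spot, which is the sense in which such a clustering step can redraw a virtual image of the membrane.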
APPENDIX
GENERAL ABBREVIATIONS
2-D: Bi-Dimensional
2-DE: Bi-Dimensional Electrophoresis
3-D: Tri-Dimensional
AA: Amino Acid
AB: Amido Black
ACN: Acetonitrile or MeCN
ALICE: Acid-Labile Isotope-Coded Extractants
ANS: 1-Anilino-8-NaphthaleneSulfonate
BAAMC: Benzoyl-DL-Arginine 7-Amino-4-MethylCoumarin or Bz-Arg-AMC
BAEE: Benzoyl-L-Arginine Ethyl Ester or Bz-Arg-OEt
BAPNA: N(α)-Benzoyl-DL-Arginine-4-NitroAnilide/HCl or Bz-Arg-NHPhNO2
Bz-Arg-AMC: Benzoyl-DL-Arginine 7-Amino-4-MethylCoumarin or BAAMC
Bz-Arg-NHPhNO2: N(α)-Benzoyl-DL-Arginine-4-NitroAnilide/HCl or BAPNA
Bz-Arg-OEt: Benzoyl-L-Arginine Ethyl Ester or BAEE
CAPS: 3-(CyclohexylAmino)-1-PropaneSulfonic acid
CBAAMC: Carbobenzoxy-L-Arginine 7-Amino-4-MethylCoumarin or Cbz-Arg-AMC
CBB: Coomassie Brilliant Blue
Cbz-Arg-AMC: Carbobenzoxy-L-Arginine 7-Amino-4-MethylCoumarin or CBAAMC
Corr.: Correlation coefficient of a linear regression model
C-Ter: C-terminus part of the sequence of a peptide/protein
CZE: Capillary Zone Electrophoresis
DIGE: DIfference Gel Electrophoresis
DL-BAPA: N-Benzoyl-DL-Arginine p-NitroAnilide
DMC: DiMethyl Casein
DNA: DeoxyriboNucleic Acid
DPD: Double Parallel Digestion
ES: Electro-Spray
ESI: Electro-Spray Ionisation
EST: Expressed Sequence Tag
FTICR: Fourier Transform Ion Cyclotron Resonance
FWHM: Full Width at Half Maximum
H/D: Hydrogen/Deuterium
HPLC: High Pressure Liquid Chromatography
IAV: Immobilon AV™ (activated membrane from Millipore, no longer commercially available)
ICAT: Isotope-Coded Affinity Tags
IEF: IsoElectric Focussing
IGD: In-Gel Digestion
IMAC: Immobilized Metal Affinity Columns
IR: InfraRed
IT: Ion-Trap
LC: Liquid Chromatography
L-GPNA: N-Glutaryl-L-Phenylalanine p-NitroAnilide
MALDI: Matrix-Assisted Laser Desorption/Ionisation
MC: Missed Cleavage
MCAT: Mass-Coded Abundance Tag
MDPF: 2-Methoxy-2,4-DiPhenyl-3(2H)-Furanone
MEKC: Micellar Electrophoresis or Micellar Electro-Kinetic Chromatography
Mr: Relative mass
MS/MS: Tandem mass spectrometry or MS2
MS: Mass Spectrometry
MW: Molecular Weight
NA: Not Available
NBS: N-BromoSuccinimide
NC: NitroCellulose
NCS: N-ChloroSuccinimide
NHS: N-HalogenoSuccinimide
NIS: N-IodoSuccinimide
NPGB: p-Nitro-Phenyl p-Guanidino Benzoate
N-Ter: N-terminus part of the sequence of a peptide/protein
OMD: On-Membrane Digestion
OSDT: One Step Digestion-Transfer
PAGE: PolyAcrylamide Gel Electrophoresis
pH: Negative logarithm of the hydrogen ion concentration in an aqueous solution
pI: Isoelectric point
PIGD: Parallel In-Gel Digestion
PMF: Peptide Mass Fingerprint
PSD: Post Source Decay
PTM: Post-Translational Modification
PVDF: PolyVinylidene DiFluoride
RNA: RiboNucleic Acid
SDS: Sodium Dodecyl Sulphate
SELDI: Surface-Enhanced Laser Desorption/Ionisation
TAME: α-Tosyl-L-Arginine Methyl Ester or Tos-Arg-OMe
TFA: TrifluoroAcetic Acid
TNF: Tumour Necrosis Factor
ToF: Time of Flight
Tos-Arg-OMe: α-Tosyl-L-Arginine Methyl Ester or TAME
TPCK: L-1-Tosylamido-2-Phenylethyl Chloromethyl Ketone
UV: Ultra Violet

µl: micro-litre
µmol: micro-mole
amol: atto-mole
atm: atmosphere, or 101325 Pa, or 1.01325 bar
eV: electron Volt
fmol: femto-mole
mbar: milli-bar
kDa: kiloDalton
MDa: megaDalton
mmol: milli-mole
ms: milli-second
ng: nano-gram
nl: nano-litre
nm: nano-meter
Pa: Pascal (pressure unit)
pmol: pico-mole
ppm: parts per million (×10⁻⁶)
Torr: 1 mm of Hg = 1/760 atm = 1.33322 mbar = 133.322 Pa
ABBREVIATIONS FOR USUAL AMINO ACIDS AND CHEMICAL CONSTANTS

3 letters code | 1 letter code | Trivial name | Systematic name | Av. MW | Monoisotopic MW | pKa1 | pKa2 | pKa3 (side chain) | pI | Hydrophobicity*
Ala | A | Alanine (L) | 2-Aminopropanoic acid | 71.08 | 71.03711 | 2.35 | 9.87 | - | 6 | 1.8
Arg | R | Arginine | 2-Amino-5-guanidinopentanoic acid | 156.19 | 156.10111 | 2.01 | 9.04 | 12.48 | 11.2 | -4.5
Asn | N | Asparagine | 2-Amino-3-carbamoylpropanoic acid | 114.1 | 114.04293 | 2.02 | 8.8 | - | 5.41 | -3.5
Asp | D | Aspartic acid | 2-Aminobutanedioic acid | 115.09 | 115.02694 | 2.1 | 9.82 | 3.86 | 2.77 | -3.5
Cys | C | Cysteine | 2-Amino-3-mercaptopropanoic acid | 103.14 | 103.00919 | 2.05 | 10.25 | 8 | 5.02 | 2.5
Gln | Q | Glutamine | 2-Amino-4-carbamoylbutanoic acid | 128.13 | 128.05858 | 2.17 | 9.13 | - | 5.65 | -3.5
Glu | E | Glutamic acid | 2-Aminopentanedioic acid | 129.12 | 129.04259 | 2.1 | 9.47 | 4.07 | 3.22 | -3.5
Gly | G | Glycine | Aminoethanoic acid | 57.05 | 57.02146 | 2.35 | 9.78 | - | 5.97 | -0.4
His | H | Histidine | 2-Amino-3-(1H-imidazol-4-yl)propanoic acid | 137.14 | 137.05891 | 1.77 | 9.18 | 6.1 | 7.47 | -3.2
Ile | I | Isoleucine | 2-Amino-3-methylpentanoic acid | 113.16 | 113.08406 | 2.32 | 9.76 | - | 5.94 | 4.5
Leu | L | Leucine | 2-Amino-4-methylpentanoic acid | 113.16 | 113.08406 | 2.33 | 9.74 | - | 5.98 | 3.8
Lys | K | Lysine | 2,6-Diaminohexanoic acid | 128.17 | 128.09496 | 2.18 | 8.95 | 10.53 | 9.59 | -3.9
Met | M | Methionine | 2-Amino-4-(methylthio)butanoic acid | 131.19 | 131.04049 | 2.28 | 9.21 | - | 5.74 | 1.9
Phe | F | Phenylalanine | 2-Amino-3-phenylpropanoic acid | 147.18 | 147.06841 | 2.58 | 9.24 | - | 5.48 | 2.8
Pro | P | Proline | Pyrrolidine-2-carboxylic acid | 97.12 | 97.05276 | 2 | 10.6 | - | 6.3 | -1.6
Ser | S | Serine | 2-Amino-3-hydroxypropanoic acid | 87.08 | 87.03203 | 2.21 | 9.15 | - | 5.68 | -0.8
Thr | T | Threonine | 2-Amino-3-hydroxybutanoic acid | 101.11 | 101.04768 | 2.09 | 9.1 | - | 5.64 | -0.7
Trp | W | Tryptophan | 2-Amino-3-(1H-indol-3-yl)propanoic acid | 186.21 | 186.07931 | 2.38 | 9.39 | - | 5.89 | -0.9
Tyr | Y | Tyrosine | 2-Amino-3-(4-hydroxyphenyl)propanoic acid | 163.18 | 163.06333 | 2.2 | 9.11 | 10.07 | 5.66 | -1.3
Val | V | Valine | 2-Amino-3-methylbutanoic acid | 99.13 | 99.06841 | 2.29 | 9.72 | - | 5.96 | 4.2

*: Kyte, J., & Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J Mol Biol, 157(1), 105-132.
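As an illustration of how the constants tabulated above are used, the following sketch computes the monoisotopic mass of a peptide, the m/z of its protonated ion, its mean Kyte-Doolittle hydropathy and a mass error in ppm. The function names are hypothetical. The masses of water and of the proton are not listed in the table; they are standard values added here as assumptions, on the understanding that the tabulated masses are residue masses (the amino acid minus one water molecule).

```python
# One-letter code -> (monoisotopic residue mass in Da, Kyte-Doolittle hydropathy),
# transcribed from the amino acid table.
RESIDUES = {
    "A": (71.03711, 1.8),   "R": (156.10111, -4.5), "N": (114.04293, -3.5),
    "D": (115.02694, -3.5), "C": (103.00919, 2.5),  "Q": (128.05858, -3.5),
    "E": (129.04259, -3.5), "G": (57.02146, -0.4),  "H": (137.05891, -3.2),
    "I": (113.08406, 4.5),  "L": (113.08406, 3.8),  "K": (128.09496, -3.9),
    "M": (131.04049, 1.9),  "F": (147.06841, 2.8),  "P": (97.05276, -1.6),
    "S": (87.03203, -0.8),  "T": (101.04768, -0.7), "W": (186.07931, -0.9),
    "Y": (163.06333, -1.3), "V": (99.06841, 4.2),
}
WATER = 18.01056   # assumption: monoisotopic mass of H2O, not part of the table
PROTON = 1.00728   # assumption: mass of a proton, not part of the table


def peptide_mass(sequence):
    """Monoisotopic mass of the neutral peptide: sum of residues plus one water."""
    return sum(RESIDUES[aa][0] for aa in sequence) + WATER


def mz(sequence, charge=1):
    """m/z of the peptide carrying `charge` protons, e.g. [M+H]+ for charge=1."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


def gravy(sequence):
    """Mean Kyte-Doolittle hydropathy of the peptide."""
    return sum(RESIDUES[aa][1] for aa in sequence) / len(sequence)


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    peptide = "GAS"  # hypothetical tripeptide used only as an example
    print(round(peptide_mass(peptide), 4), round(mz(peptide), 4), round(gravy(peptide), 2))
```

Singly protonated ions, as mostly observed in MALDI, correspond to `charge=1`; multiply charged ESI ions are handled by the same `mz` function with a higher charge.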